Nat!

Senior Mull

A small x86_64 optimization brainteaser [with complete solution]

Can you make this code, specifically the method abc, run faster on x86_64?

#import <Foundation/Foundation.h>

@interface Foo : NSObject 
{
   long  a;
   long  b;
   long  c;
}

- (long) abc;

@end

@implementation Foo

- (long) abc
{
   return( self->a + self->b + self->c);
}

@end

Hint: Think about reorganization or restructuring.
Reference toolchain is stock Xcode 3.2 with default gcc 4.2 and Release setting.

And this is the solution:

#import <Foundation/Foundation.h>

@interface Foo : NSObject 
{
   struct 
   {
      long  a;
      long  b;
      long  c;
   } abc;
}

- (long) abc;

@end

@implementation Foo

- (long) abc
{
   return( self->abc.a + self->abc.b + self->abc.c);
}

@end

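So why is this faster? A look at the disassembled code shows that, because of the struct, there is no need to fetch an offset for each value. In the 64-bit runtime every ivar is reached through an offset that is loaded from a global variable at run time; this dynamic offsetting is a new "feature" of that runtime. With the three values grouped into one struct ivar, only a single offset is fetched, and the fields are then read at fixed displacements from it.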

Before:

        pushq   %rbp
        movq    %rsp, %rbp

        movq    _OBJC_IVAR_$_Foo.a(%rip), %rax
        movq    (%rdi,%rax), %rax
        movq    _OBJC_IVAR_$_Foo.b(%rip), %rdx
        addq    (%rdi,%rdx), %rax
        movq    _OBJC_IVAR_$_Foo.c(%rip), %rdx
        addq    (%rdi,%rdx), %rax

        leave
        ret

After:

        pushq   %rbp
        movq    %rsp, %rbp

        addq    _OBJC_IVAR_$_Foo.abc(%rip), %rdi

        movq    (%rdi), %rax
        addq    8(%rdi), %rax
        addq    16(%rdi), %rax

        leave
        ret
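
The difference between the two listings can be modelled in plain C, which also compiles as Objective-C. This is only a rough sketch, not what the runtime actually does: the structs `foo_flat` and `foo_grouped`, the global offset variables, and the two functions are hypothetical stand-ins invented for illustration. The globals play the role of the `_OBJC_IVAR_$_Foo.*` symbols, which the compiler has to load because their values may change at load time.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two object layouts (not real runtime types). */
struct foo_flat {
    void *isa;              /* placeholder for the object header */
    long  a, b, c;          /* three separate ivars */
};

struct abc_group { long a, b, c; };

struct foo_grouped {
    void             *isa;
    struct abc_group  abc;  /* one ivar holding all three values */
};

/* Assumed model of the ivar offset symbols: plain, non-const globals,
   so the compiler must load them instead of folding them to constants. */
ptrdiff_t foo_offset_a   = offsetof(struct foo_flat, a);
ptrdiff_t foo_offset_b   = offsetof(struct foo_flat, b);
ptrdiff_t foo_offset_c   = offsetof(struct foo_flat, c);
ptrdiff_t foo_offset_abc = offsetof(struct foo_grouped, abc);

/* "Before": one offset load per value, three in total. */
long abc_flat(const void *self)
{
    const char *base = (const char *) self;

    return *(const long *) (base + foo_offset_a)
         + *(const long *) (base + foo_offset_b)
         + *(const long *) (base + foo_offset_c);
}

/* "After": a single offset load, then fixed displacements inside the struct. */
long abc_grouped(const void *self)
{
    const struct abc_group *p =
        (const struct abc_group *) ((const char *) self + foo_offset_abc);

    return p->a + p->b + p->c;
}
```

Calling `abc_flat` on a `struct foo_flat` and `abc_grouped` on a `struct foo_grouped` holding the same three values returns the same sum; only the number of offset loads differs.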

I don't know why gcc doesn't optimize the stack frame away in these simple cases.