
A small x86_64 optimization brainteaser [with complete solution]

Can you make this code (the method abc) run faster on x86_64?
#import <Foundation/Foundation.h>

@interface Foo : NSObject 
{
   long  a;
   long  b;
   long  c;
}

- (long) abc;

@end

@implementation Foo

- (long) abc
{
   return( self->a + self->b + self->c);
}

@end
Hint: Think about reorganization or restructuring.
The reference toolchain is stock Xcode 3.2 with the default gcc 4.2 and the Release configuration.

And this is the solution:
#import <Foundation/Foundation.h>

@interface Foo : NSObject 
{
   struct 
   {
      long  a;
      long  b;
      long  c;
   } abc;
}

- (long) abc;

@end

@implementation Foo

- (long) abc
{
   return( self->abc.a + self->abc.b + self->abc.c);
}

@end
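So why is this faster? The disassembly shows that, because of the struct, there is no longer any need to fetch an offset for each value: a single offset locates the struct, and its three members then sit at fixed displacements from that address. Dynamically offset ivars (non-fragile instance variables) are a new "feature" of the 64-bit runtime.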

Before:

	pushq	%rbp
	movq	%rsp, %rbp

	movq	_OBJC_IVAR_$_Foo.a(%rip), %rax
	movq	(%rdi,%rax), %rax
	movq	_OBJC_IVAR_$_Foo.b(%rip), %rdx
	addq	(%rdi,%rdx), %rax
	movq	_OBJC_IVAR_$_Foo.c(%rip), %rdx
	addq	(%rdi,%rdx), %rax

	leave
	ret

After:

	pushq	%rbp
	movq	%rsp, %rbp

	addq	_OBJC_IVAR_$_Foo.abc(%rip), %rdi

	movq	(%rdi), %rax
	addq	8(%rdi), %rax
	addq	16(%rdi), %rax

	leave
	ret
I don't know why gcc doesn't optimize the stack frame away in these simple cases.

About

This page contains a single entry from the blog posted on November 19, 2009 12:54 PM.