Lameness Disclaimer: All this is written to the best of my knowledge. Corrections, additions etc. are certainly welcome.
Method and function call innards
Ok, I lied last time when I said that we'd be covering different allocation strategies this time. While writing some more articles (which I unfortunately lost by accident, thanks to Mac OS X beta and my stupidity) I noticed that it became awkward to justify many a decision, because the basics of calls hadn't been covered. (Though not necessary, it doesn't hurt to have read the previous articles.)
Inside calling C-Functions and Objective-C methods
There will be a lot of disassembled code in this article. Although it can't hurt to understand what is really happening in the code, for further discussions it is only necessary that you get an impression of the effort involved to do "certain things". Also this article is very much geared to coding on a PPC machine under Mach (i.e. your Mac OS X (Server) box). Things on an Intel machine may very well be different in certain areas, for instance in the way shared libraries are invoked. Nevertheless most of the comments should also be valid for Intel machinery.
O tempora O mores - or how to read disassembled code
Since presumably some people have never seen disassembled code, let's just quickly examine a typical output of gdb and see what it means:
gdb>x/1i $pc
0x1ddc <call_test+12>: stw r0,8(r1)
The first number 0x1ddc is the memory address(1) that this particular code (a C function or an Obj-C method, for example) happened to be loaded to. If there is a symbol (a function or a variable) associated with this address, or with a nearby lower address, gdb will print the symbol's name and the offset from it in angle brackets following the address: <call_test+12>. The data (usually 4 bytes) encountered at this address will then be disassembled into the mnemonic assembler format stw r0,8(r1).
C calls
Since C is the basis of Obj-C, let's review the options we have available for invoking a C function.
If you have a function defined as inline (0) that is short and not too complicated, the compiler (given the -O option) will happily avoid the actual function call and inline the code in the caller's code. This is the fastest way to invoke a C function. It is on par with the more old-fashioned method of using macros (aka #define). If you have set your compiler to a more aggressive optimization level, it may even inline some of your small functions automatically.
The following example shows the generated code for two C functions. Function inline_foo is inlined by the compiler, whereas plain foo is called the "usual" way as a subroutine.
The listing that follows shows the disassembly of the object code generated by the compiler; the source code it was generated from comes right after it.
In the disassembled code listing the inlined code is marked green (<call_test+32> - <call_test+44>); it is just four instructions. The functionally equivalent code wrapped in the normal C function header and footer is marked purple (<foo+16> - <foo+28>). All instructions that comprise the overhead of a regular C call over an inlined C call have been marked blue.
|
0x1dac <foo>: mflr r0
0x1db0 <foo+4>: bcl 20,4*cr7+so,0x1db4 <foo+8>
0x1db4 <foo+8>: mflr r12
0x1db8 <foo+12>: mtlr r0
0x1dbc <foo+16>: addis r9,r12,0
0x1dc0 <foo+20>: addi r9,r9,568
0x1dc4 <foo+24>: lfd f0,0(r9)
0x1dc8 <foo+28>: fmul f1,f1,f0
0x1dcc <foo+32>: blr
0x1dd0 <call_test>: mflr r0
0x1dd4 <call_test+4>: stfd f31,-8(r1)
0x1dd8 <call_test+8>: stw r31,-12(r1)
0x1ddc <call_test+12>: stw r0,8(r1)
0x1de0 <call_test+16>: stwu r1,-80(r1)
0x1de4 <call_test+20>: bcl 20,4*cr7+so,0x1de8
0x1de8 <call_test+24>: mflr r31
0x1dec <call_test+28>: fmr f31,f1
0x1df0 <call_test+32>: addis r9,r31,0
0x1df4 <call_test+36>: addi r9,r9,524
0x1df8 <call_test+40>: lfd f0,0(r9)
0x1dfc <call_test+44>: fmul f31,f31,f0
0x1e00 <call_test+48>: bl 0x1dac <foo>
0x1e04 <call_test+52>: fadd f1,f31,f1
0x1e08 <call_test+56>: addi r1,r1,80
0x1e0c <call_test+60>: lwz r0,8(r1)
0x1e10 <call_test+64>: mtlr r0
0x1e14 <call_test+68>: lwz r31,-12(r1)
0x1e18 <call_test+72>: lfd f31,-8(r1)
0x1e1c <call_test+76>: blr
|
static inline double inline_foo( float x)
{
return( x * 2.1);
}
static double foo( float x)
{
return( x * 2.1);
}
double call_test( float x)
{
return( inline_foo( x) + foo( x));
}
|
There is no need to wax on over this trivial issue. Inlining rocks!
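To check for yourself that a macro and an inline function compute the same thing, here is a minimal sketch in plain C. The names SCALE_MACRO and scale_inline are made up for this illustration and do not appear in the listings above. Whether the compiler really inlines scale_inline still depends on the optimization level.

```c
#include <stdio.h>

/* Old-fashioned macro: pure text substitution, no type checking.
 * (Hypothetical name, used for this sketch only.) */
#define SCALE_MACRO( x)   ((x) * 2.1)

/* Inline function: the same arithmetic, but with a typed parameter.
 * (Hypothetical name, used for this sketch only.) */
static inline double scale_inline( float x)
{
   return( x * 2.1);
}

/* Prints both results side by side so they can be compared. */
static void print_both( float x)
{
   printf( "macro: %f  inline: %f\n", SCALE_MACRO( x), scale_inline( x));
}
```

Both variants promote the float to a double before multiplying, so they yield the same value. The inline function additionally gets its argument type checked by the compiler.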
Calling a function in a shared library
You very often call functions that are provided by a shared library. Any function that resides in a Framework is in a shared library. As an example let's see how a call to malloc works:
0x1dd4 <call_test2>: mflr r0
0x1dd8 <call_test2+4>: stw r0,8(r1)
0x1ddc <call_test2+8>: stwu r1,-64(r1)
0x1de0 <call_test2+12>: li r3,128
0x1de4 <call_test2+16>: bl 0x1fa4 <dyld_stub_malloc>
0x1de8 <call_test2+20>: addi r1,r1,64
0x1dec <call_test2+24>: lwz r0,8(r1)
0x1df0 <call_test2+28>: mtlr r0
0x1df4 <call_test2+32>: blr
0x1fa4 <dyld_stub_malloc>: mflr r0
0x1fa8 <dyld_stub_malloc+4>: bcl 20,4*cr7+so,0x1fac
0x1fac <dyld_stub_malloc+8>: mflr r11
0x1fb0 <dyld_stub_malloc+12>: addis r11,r11,0
0x1fb4 <dyld_stub_malloc+16>: mtlr r0
0x1fb8 <dyld_stub_malloc+20>: lwz r12,112(r11)
0x1fbc <dyld_stub_malloc+24>: mtctr r12
0x1fc0 <dyld_stub_malloc+28>: addi r11,r11,112
0x1fc4 <dyld_stub_malloc+32>: bctr
0x5ace3800 <malloc>: mflr r0
0x5ace3804 <malloc+4>: stmw r30,-8(r1)
0x5ace3808 <malloc+8>: stw r0,8(r1)
0x5ace380c <malloc+12>: stwu r1,-80(r1)
0x5ace3810 <malloc+16>: bcl 20,4*cr7+so,0x5ace3814
0x5ace3814 <malloc+20>: mflr r31
0x5ace3818 <malloc+24>: mr r30,r3
0x5ace381c <malloc+28>: addis r9,r31,14
0x5ace3820 <malloc+32>: lwz r9,3020(r9)
0x5ace3824 <malloc+36>: cmpwi r9,0
0x5ace3828 <malloc+40>: bne 0x5ace3830 <malloc+48>
0x5ace382c <malloc+44>: bl 0x5ace74a0
0x5ace3830 <malloc+48>: addis r9,r31,14
0x5ace3834 <malloc+52>: lwz r9,3024(r9)
0x5ace3838 <malloc+56>: lwz r3,0(r9)
0x5ace383c <malloc+60>: mr r4,r30
0x5ace3840 <malloc+64>: bl 0x5ace3860
0x5ace3844 <malloc+68>: addi r1,r1,80
0x5ace3848 <malloc+72>: lwz r0,8(r1)
0x5ace384c <malloc+76>: mtlr r0
0x5ace3850 <malloc+80>: lmw r30,-8(r1)
0x5ace3854 <malloc+84>: blr
|
void call_test2()
{
void *p;
p = malloc( 128);
}
|
The generated code for the malloc call in call_test2 is just the two blue lines at addresses 0x1de0 and 0x1de4. Consider the other code wrappage :)
Instead of directly calling malloc, another piece of code named dyld_stub_malloc gets called. This stub code, which gets statically linked to your code, provides the interface to the shared library function malloc. (I have no idea why it needs to be done this way.)
Therefore a few extra instructions (red) are executed before we are in malloc(2).
From the disassembled output even the layman can easily deduce that this is a more involved procedure and therefore slower than calling a statically linked C function like "foo" above.
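One plausible reason for such a stub: the final address of malloc is only known once the shared library has been loaded, so the caller jumps through a pointer slot that gets filled in at run time. The following is a toy model of that idea in plain C, not dyld's actual implementation. The names lazy_malloc_ptr, resolve_and_call_malloc, stub_malloc and resolve_count are all invented for this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

static void   *resolve_and_call_malloc( size_t size);

/* Hypothetical pointer slot. It starts out pointing at the resolver
 * and is patched to the real function on first use. */
static void   *(*lazy_malloc_ptr)( size_t) = resolve_and_call_malloc;

/* Counts how often the (slow) resolver ran. */
static int    resolve_count = 0;

/* First call only: "look up" the real address, patch the slot,
 * then forward the call. */
static void   *resolve_and_call_malloc( size_t size)
{
   resolve_count++;
   lazy_malloc_ptr = malloc;
   return( lazy_malloc_ptr( size));
}

/* The stub: load the pointer from the slot and jump through it. */
static void   *stub_malloc( size_t size)
{
   return( (*lazy_malloc_ptr)( size));
}
```

After the first call through stub_malloc the slot points directly at malloc, so every later call pays only for the extra indirection and not for the lookup.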
|
Anatomy of an Obj-C method call
Given an object and a selector, the Objective-C runtime system somehow determines the address of the code that it should call, and then calls it with the appropriate parameters. Now how does that work in detail? The first thing to note is that writing
[p callWith:x and:y];
is really just the same as writing
objc_msgSend( p, @selector( callWith:and:), x, y);
|
The objc_msgSend function must determine from the object and the selector what code should be executed. Since the method callWith:and: could be implemented by any number of different objects (or rather their classes), just examining the selector cannot be sufficient for objc_msgSend. What it needs to do is to examine the class of the object and try to find an implementation for this selector in the definition of this class or of its superclasses.
The actual mechanics are nicely explained in the Objective-C book, so I will forego duplicating this. See your local copy on your hard disk or http://www.toodarkpark.org/computers/objc/coreobjc.html#1522.
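As a rough illustration of "look in the class, then in its superclasses", here is a toy lookup in plain C. This is a sketch of the idea only, not the runtime's real data structures. toy_class, toy_method, toy_lookup and the two sample classes are hypothetical names made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Toy implementation pointer: receiver, selector name, one argument. */
typedef double   (*toy_imp)( void *self, const char *sel, double arg);

struct toy_method
{
   const char   *name;   /* selector name */
   toy_imp      imp;     /* code to run for that selector */
};

struct toy_class
{
   const struct toy_class    *superclass;   /* NULL for the root class */
   const struct toy_method   *methods;
   size_t                    n_methods;
};

/* Walk from the class up through its superclasses until some class
 * implements the selector. Returns NULL if nobody does. */
static toy_imp   toy_lookup( const struct toy_class *cls, const char *sel)
{
   size_t   i;

   for( ; cls != NULL; cls = cls->superclass)
      for( i = 0; i < cls->n_methods; i++)
         if( strcmp( cls->methods[ i].name, sel) == 0)
            return( cls->methods[ i].imp);
   return( NULL);
}

/* Two sample classes: the derived class inherits fooMethod: */
static double   base_foo( void *self, const char *sel, double x)
{
   (void) self; (void) sel;
   return( x * 2.1);
}

static double   derived_bar( void *self, const char *sel, double x)
{
   (void) self; (void) sel;
   return( x + 1.0);
}

static const struct toy_method   base_methods[]    = { { "fooMethod:", base_foo } };
static const struct toy_method   derived_methods[] = { { "barMethod:", derived_bar } };
static const struct toy_class    base_class    = { NULL, base_methods, 1 };
static const struct toy_class    derived_class = { &base_class, derived_methods, 1 };
```

Looking up "fooMethod:" on derived_class finds nothing in the class itself and then succeeds in base_class, which is the inheritance case described above.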
|
|
What is a selector anyway ?
|
A selector in the current Apple implementation is just the address of a C String, that contains the name of the selector. So if @selector( callWith:and:) yields 0x10210
you would find at address 0x10210 'c', 'a', 'l', 'l', 'W', 'i', 't', 'h', ':', 'a', 'n', 'd', ':', 0
These selector strings are uniqued by the mach runtime during loading. Therefore all selectors of the various frameworks and your main code that have the same name share the same selector address.
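The point of uniquing is that two selectors can afterwards be compared with a simple pointer comparison instead of a string comparison. A minimal sketch of such a uniquing table in plain C follows. toy_sel_register, sel_table and MAX_SELECTORS are invented for this illustration, and the real runtime certainly uses something smarter than a linear search.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SELECTORS   64   /* arbitrary limit for this sketch */

static const char   *sel_table[ MAX_SELECTORS];
static size_t       sel_count = 0;

/* Returns the one canonical pointer for a selector name. The first
 * string registered under a name becomes the canonical one, so the
 * caller must keep that string alive. Returns NULL if the table is full. */
static const char   *toy_sel_register( const char *name)
{
   size_t   i;

   for( i = 0; i < sel_count; i++)
      if( strcmp( sel_table[ i], name) == 0)
         return( sel_table[ i]);      /* already known: reuse the address */

   if( sel_count == MAX_SELECTORS)
      return( NULL);

   sel_table[ sel_count] = name;
   return( sel_table[ sel_count++]);
}
```

Two different copies of the string "callWith:and:" yield the same address once both have gone through toy_sel_register, so a later comparison needs only one pointer compare.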
|
|
So let's check out a simple Objective-C method and step through the instructions executed when calling it.
0x2d18 <-[CallTest3 fooMethod:]>: addis r9,r12,0
0x2d1c <-[CallTest3 fooMethod:]+4>: addi r9,r9,732
0x2d20 <-[CallTest3 fooMethod:]+8>: lfd f0,0(r9)
0x2d24 <-[CallTest3 fooMethod:]+12>: fmul f1,f1,f0
0x2d28 <-[CallTest3 fooMethod:]+16>: blr
|
- (double) fooMethod:(float) x
{
return( x * 2.1);
}
|
It is interesting that fooMethod: itself looks very slender compared to the C function we saw in the previous example. Except for the return instruction blr, there is no overhead for stack maintenance and "environmental" register setup, because all this is taken care of in the stub and objc_msgSend code.
|
0x2d2c <call_test3>: mflr r0
0x2d30 <call_test3+4>: stw r31,-4(r1)
0x2d34 <call_test3+8>: stw r0,8(r1)
0x2d38 <call_test3+12>: stwu r1,-80(r1)
0x2d3c <call_test3+16>: bcl 20,4*cr7+so,0x2d40
0x2d40 <call_test3+20>: mflr r31
0x2d44 <call_test3+24>: addis r4,r31,0
0x2d48 <call_test3+28>: lwz r4,4800(r4)
0x2d4c <call_test3+32>: stfs f1,56(r1)
0x2d50 <call_test3+36>: lwz r5,56(r1)
0x2d54 <call_test3+40>: bl <objc_msgSend>
0x2d58 <call_test3+44>: addi r1,r1,80
0x2d5c <call_test3+48>: lwz r0,8(r1)
0x2d60 <call_test3+52>: mtlr r0
0x2d64 <call_test3+56>: lwz r31,-4(r1)
0x2d68 <call_test3+60>: blr
|
double call_test3( CallTest3 *p, float x)
{
return( [p fooMethod:x]);
}
|
The blue code sets up the parameter, the selector and the object (it puts them into registers, r5, r4 and r3 respectively) and calls the stub function.
|
0x2fc0 <dyld_stub_objc_msgSend>
0x2fc0: mflr r0
0x2fc4: bcl 20,4*cr7+so,0x2fc8
0x2fc8: mflr r11
0x2fcc: addis r11,r11,0
0x2fd0: mtlr r0
0x2fd4: lwz r12,88(r11)
0x2fd8: mtctr r12
0x2fdc: addi r11,r11,88
0x2fe0: bctr
|
Since objc_msgSend resides in a shared library, the processor has to trudge through the stub code (red) to jump to the objc_msgSend implementation.
|
0x720bb088 <objc_msgSend>: cmplwi r3,0
0x720bb08c <objc_msgSend+4>: beq 0x720bb1f4
0x720bb090 <objc_msgSend+8>: stw r8,44(r1)
0x720bb094 <objc_msgSend+12>: stw r9,48(r1)
0x720bb098 <objc_msgSend+16>: stw r10,52(r1)
0x720bb09c <objc_msgSend+20>: lwz r12,0(r3)
0x720bb0a0 <objc_msgSend+24>: lwz r12,32(r12)
0x720bb0a4 <objc_msgSend+28>: lwz r11,0(r12)
0x720bb0a8 <objc_msgSend+32>: addi r9,r12,8
0x720bb0ac <objc_msgSend+36>: and r12,r4,r11
0x720bb0b0 <objc_msgSend+40>: rlwinm r0,r12,2,0
0x720bb0b4 <objc_msgSend+44>: lwzx r10,r9,r0
0x720bb0b8 <objc_msgSend+48>: cmplwi r10,0
0x720bb0bc <objc_msgSend+52>: beq 0x720bb0f4
0x720bb0c0 <objc_msgSend+56>: addi r12,r12,1
0x720bb0c4 <objc_msgSend+60>: lwz r8,0(r10)
0x720bb0c8 <objc_msgSend+64>: and r12,r12,r11
0x720bb0cc <objc_msgSend+68>: lwz r10,8(r10)
0x720bb0d0 <objc_msgSend+72>: cmplw r8,r4
0x720bb0d4 <objc_msgSend+76>: bne- 0x720bb0b0
0x720bb0d8 <objc_msgSend+80>: mr r12,r10
0x720bb0dc <objc_msgSend+84>: mtctr r10
0x720bb0e0 <objc_msgSend+88>: lwz r8,44(r1)
0x720bb0e4 <objc_msgSend+92>: lwz r9,48(r1)
0x720bb0e8 <objc_msgSend+96>: lwz r10,52(r1)
0x720bb0ec <objc_msgSend+100>: li r11,0
0x720bb0f0 <objc_msgSend+104>: bctr
0x720bb0f4 <objc_msgSend+108>: stw r3,24(r1)
0x720bb0f8 <objc_msgSend+112>: stw r4,28(r1)
0x720bb0fc <objc_msgSend+116>: stw r5,32(r1)
0x720bb100 <objc_msgSend+120>: stw r6,36(r1)
0x720bb104 <objc_msgSend+124>: stw r7,40(r1)
0x720bb108 <objc_msgSend+128>: mflr r0
0x720bb10c <objc_msgSend+132>: stw r0,8(r1)
0x720bb110 <objc_msgSend+136>: stfd f13,-8(r1)
0x720bb114 <objc_msgSend+140>: stfd f12,-16(r1)
0x720bb118 <objc_msgSend+144>: stfd f11,-24(r1)
0x720bb11c <objc_msgSend+148>: stfd f10,-32(r1)
0x720bb120 <objc_msgSend+152>: stfd f9,-40(r1)
0x720bb124 <objc_msgSend+156>: stfd f8,-48(r1)
0x720bb128 <objc_msgSend+160>: stfd f7,-56(r1)
0x720bb12c <objc_msgSend+164>: stfd f6,-64(r1)
0x720bb130 <objc_msgSend+168>: stfd f5,-72(r1)
0x720bb134 <objc_msgSend+172>: stfd f4,-80(r1)
0x720bb138 <objc_msgSend+176>: stfd f3,-88(r1)
0x720bb13c <objc_msgSend+180>: stfd f2,-96(r1)
0x720bb140 <objc_msgSend+184>: stfd f1,-104(r1)
0x720bb144 <objc_msgSend+188>: stwu r1,-160(r1)
0x720bb148 <objc_msgSend+192>: lwz r3,0(r3)
0x720bb14c <objc_msgSend+196>: mflr r0
0x720bb150 <objc_msgSend+200>: bl 0x720bb154
0x720bb154 <objc_msgSend+204>: mflr r12
0x720bb158 <objc_msgSend+208>: mtlr r0
0x720bb15c <objc_msgSend+212>: addis r12,r12,5
0x720bb160 <objc_msgSend+216>: lwz r12,-1156(r12)
0x720bb164 <objc_msgSend+220>: mtctr r12
0x720bb168 <objc_msgSend+224>: mflr r0
0x720bb16c <objc_msgSend+228>: stw r0,8(r1)
0x720bb170 <objc_msgSend+232>: stwu r1,-56(r1)
0x720bb174 <objc_msgSend+236>: bctrl
0x720bb178 <objc_msgSend+240>: addic r1,r1,56
0x720bb17c <objc_msgSend+244>: lwz r0,8(r1)
0x720bb180 <objc_msgSend+248>: mtlr r0
0x720bb184 <objc_msgSend+252>: mr r12,r3
0x720bb188 <objc_msgSend+256>: mtctr r3
0x720bb18c <objc_msgSend+260>: lwz r1,0(r1)
0x720bb190 <objc_msgSend+264>: lwz r0,8(r1)
0x720bb194 <objc_msgSend+268>: mtlr r0
0x720bb198 <objc_msgSend+272>: lfd f13,-8(r1)
0x720bb19c <objc_msgSend+276>: lfd f12,-16(r1)
0x720bb1a0 <objc_msgSend+280>: lfd f11,-24(r1)
0x720bb1a4 <objc_msgSend+284>: lfd f10,-32(r1)
0x720bb1a8 <objc_msgSend+288>: lfd f9,-40(r1)
0x720bb1ac <objc_msgSend+292>: lfd f8,-48(r1)
0x720bb1b0 <objc_msgSend+296>: lfd f7,-56(r1)
0x720bb1b4 <objc_msgSend+300>: lfd f6,-64(r1)
0x720bb1b8 <objc_msgSend+304>: lfd f5,-72(r1)
0x720bb1bc <objc_msgSend+308>: lfd f4,-80(r1)
0x720bb1c0 <objc_msgSend+312>: lfd f3,-88(r1)
0x720bb1c4 <objc_msgSend+316>: lfd f2,-96(r1)
0x720bb1c8 <objc_msgSend+320>: lfd f1,-104(r1)
0x720bb1cc <objc_msgSend+324>: lwz r3,24(r1)
0x720bb1d0 <objc_msgSend+328>: lwz r4,28(r1)
0x720bb1d4 <objc_msgSend+332>: lwz r5,32(r1)
0x720bb1d8 <objc_msgSend+336>: lwz r6,36(r1)
0x720bb1dc <objc_msgSend+340>: lwz r7,40(r1)
0x720bb1e0 <objc_msgSend+344>: lwz r8,44(r1)
0x720bb1e4 <objc_msgSend+348>: lwz r9,48(r1)
0x720bb1e8 <objc_msgSend+352>: lwz r10,52(r1)
0x720bb1ec <objc_msgSend+356>: li r11,0
0x720bb1f0 <objc_msgSend+360>: bctr
0x720bb1f4 <objc_msgSend+364>: mflr r0
0x720bb1f8 <objc_msgSend+368>: bl 0x720bb1fc
0x720bb1fc <objc_msgSend+372>: mflr r11
0x720bb200 <objc_msgSend+376>: mtlr r0
0x720bb204 <objc_msgSend+380>: addis r11,r11,5
0x720bb208 <objc_msgSend+384>: lwz r11,-1328(r11)
0x720bb20c <objc_msgSend+388>: lwz r11,0(r11)
0x720bb210 <objc_msgSend+392>: cmplwi r11,0
0x720bb214 <objc_msgSend+396>: beqlr
0x720bb218 <objc_msgSend+400>: mflr r0
0x720bb21c <objc_msgSend+404>: stw r0,8(r1)
0x720bb220 <objc_msgSend+408>: addi r1,r1,-64
0x720bb224 <objc_msgSend+412>: mtctr r11
0x720bb228 <objc_msgSend+416>: bctrl
0x720bb22c <objc_msgSend+420>: addi r1,r1,64
0x720bb230 <objc_msgSend+424>: lwz r0,8(r1)
0x720bb234 <objc_msgSend+428>: mtlr r0
0x720bb238 <objc_msgSend+432>: li r3,0
0x720bb23c <objc_msgSend+436>: blr
|
Let's not go too deeply into the objc_msgSend implementation itself. The first two lines handle messaging to nil (orange). The code branches to objc_msgSend+364 if the object is nil; we will ignore this part of the routine.
The brownish part (objc_msgSend +8 to +104) is the code that searches for and jumps to cached methods.
Whenever a method is called for the first time for a class, objc_msgSend will not find an entry for it in its cache. In this case the other "black" part (objc_msgSend+108 ff.) will be used to traverse the class hierarchy, locate the proper implementation and fill the method cache with this information.
This lookup code is very slow. It loops and branches a lot into various subroutines and it can take 500 or more instructions to execute.
When a method has been found and put into the cache, lookup times are very fast! Only a very few loops of the khaki colored code (objc_msgSend+40 ff.) need to be executed to find the cache entry then.
The overhead of objc_msgSend is therefore at least 30 instructions per call, sometimes a little more.
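My reading of the cache search, sketched in plain C: the selector address is masked to get a starting bucket, and the buckets are then probed one after another until either the selector or an empty bucket is found. This is a simplified model under my own assumptions. The names toy_cache, toy_cache_entry, toy_hash, toy_cache_find and toy_cache_fill are invented, and the real cache layout and hash function may well differ.

```c
#include <stddef.h>
#include <stdint.h>

typedef void   (*toy_fn)( void);

struct toy_cache_entry
{
   const char   *sel;   /* uniqued selector address */
   toy_fn       imp;    /* implementation to jump to */
};

#define TOY_CACHE_SIZE   8u                       /* must be a power of two */
#define TOY_CACHE_MASK   (TOY_CACHE_SIZE - 1u)

struct toy_cache
{
   struct toy_cache_entry   *buckets[ TOY_CACHE_SIZE];   /* NULL means empty */
};

/* Assumed hash: drop the low address bits, then mask. */
static size_t   toy_hash( const char *sel)
{
   return( (size_t) (((uintptr_t) sel >> 2) & TOY_CACHE_MASK));
}

/* Fast path: returns the cached implementation or NULL on a miss. */
static toy_fn   toy_cache_find( const struct toy_cache *c, const char *sel)
{
   size_t   i = toy_hash( sel);
   size_t   probes;

   for( probes = 0; probes < TOY_CACHE_SIZE; probes++)
   {
      const struct toy_cache_entry   *e = c->buckets[ i];

      if( e == NULL)
         return( NULL);               /* empty bucket: not cached */
      if( e->sel == sel)
         return( e->imp);             /* pointer compare is enough */
      i = (i + 1) & TOY_CACHE_MASK;   /* try the next bucket */
   }
   return( NULL);
}

/* Slow path epilogue: remember an entry. Returns 0 if the cache is full. */
static int   toy_cache_fill( struct toy_cache *c, struct toy_cache_entry *entry)
{
   size_t   i = toy_hash( entry->sel);
   size_t   probes;

   for( probes = 0; probes < TOY_CACHE_SIZE; probes++)
   {
      if( c->buckets[ i] == NULL)
      {
         c->buckets[ i] = entry;
         return( 1);
      }
      i = (i + 1) & TOY_CACHE_MASK;
   }
   return( 0);
}

/* Sample implementation to put into the cache. */
static void   toy_method_a( void)
{
}
```

The first lookup of a selector misses and has to take the slow route. Once toy_cache_fill has stored the entry, toy_cache_find usually needs only one or two probes.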
|
|
Calling Obj-C methods directly, avoiding objc_msgSend
Objective-C gives you the possibility to resolve a method's address for an object or a class at runtime and to call that method using this address directly. Since this address is dependent on the class of the object (or the class itself), you have to make sure that you do not erroneously use this method address for another object of a different class. For example it would be dangerous to assume that every NSArray subclass uses the same objectAtIndex: method; most probably this will not be the case! Therefore caching objectAtIndex: by asking the NSArray class directly is wrong. You need to ask the specific object, whose actual class will return the proper address, and you should only use it on this particular object and on objects you know are of identical class.
You determine a method's address with the methodForSelector: method and get a so-called IMP returned.
IMP imp = [anObject methodForSelector:@selector( fooMethod:)];
|
An IMP is a type defined by the Objective-C runtime headers. It is a pointer to a function returning an id, with a variable number of arguments, whose first two parameters are the object and the selector (just like objc_msgSend). Here is its definition copied from /System/Library/Frameworks/System.framework/Headers/objc/objc.h:
/* objc.h
* Copyright 1988-1996, NeXT Software, Inc.
*/
#ifndef _OBJC_OBJC_H_
#define _OBJC_OBJC_H_
#import <objc/objc-api.h> // for OBJC_EXPORT
typedef struct objc_class *Class;
typedef struct objc_object {
Class isa;
} *id;
typedef struct objc_selector *SEL;
typedef id (*IMP)(id, SEL, ...);
|
Here is a little function that uses an IMP and its object to call the method with one parameter. Note that for many methods that do not return an int or an id, like our fooMethod:, it is necessary(3) that the IMP is cast to the correct type.
double call_test3b( CallTest3 *p, double (*f)( id, SEL, ...), float x)
{
return( (*f)( p, @selector( fooMethod:), x));
}
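The same casting discipline can be shown in plain C without any Objective-C runtime. In this sketch toy_id, toy_sel, toy_generic_imp, toy_foo_method and call_via_imp are stand-ins I made up for id, SEL, IMP and a method. The generic pointer is only used for storage and is cast back to the function's real type before the call.

```c
#include <stddef.h>

typedef void         *toy_id;    /* stand-in for id  */
typedef const char   *toy_sel;   /* stand-in for SEL */

/* Generic implementation pointer, shaped like IMP. */
typedef toy_id   (*toy_generic_imp)( toy_id, toy_sel, ...);

/* A "method" that returns a double and not an id. */
static double   toy_foo_method( toy_id self, toy_sel _cmd, double x)
{
   (void) self; (void) _cmd;
   return( x * 2.1);
}

/* Cast the generic pointer back to the real signature, then call.
 * Calling through the uncast pointer would fetch the result as if
 * it were an id, which is the wrong place for a double. */
static double   call_via_imp( toy_generic_imp imp, toy_id obj, toy_sel sel, double x)
{
   double   (*typed)( toy_id, toy_sel, double);

   typed = (double (*)( toy_id, toy_sel, double)) imp;
   return( (*typed)( obj, sel, x));
}
```

A call then looks like call_via_imp( (toy_generic_imp) toy_foo_method, NULL, "fooMethod:", 2.0), with the cast on the way in mirroring the cast on the way out.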
|
And you would call it like this, notice (again) the cast:
call_test3b( p,
(double (*)( id, SEL, ...)) /* cast */
[p methodForSelector:@selector( fooMethod:)],
2.1);
|
This is our now well known fooMethod (green).
|
0x2d24 <-[CallTest4 fooMethod:]>: addis r9,r12,0
0x2d28 <-[CallTest4 fooMethod:]+4>: addi r9,r9,732
0x2d2c <-[CallTest4 fooMethod:]+8>: lfd f0,0(r9)
0x2d30 <-[CallTest4 fooMethod:]+12>: fmul f1,f1,f0
0x2d34 <-[CallTest4 fooMethod:]+16>: blr
|
|
As the disassembly of call_test3b (listed after the following sidebar) shows, three more instructions are introduced by using a function pointer call (blue). What you do not see disassembled is the stub code, because it and the objc_msgSend code are now circumvented.
Caching shared lib C functions
|
Interestingly enough you can cache shared library functions using C function pointers and call them directly, circumventing the stub code. An example to speed up a read loop (marginally):
ssize_t (*f_read)( int fd, void *buf, size_t len);
f_read = read;
while( (*f_read)( fd, buf, 1) == 1)
...
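Fleshed out into a complete function, the read loop could look like the sketch below. The function name count_bytes is made up for this illustration. Whether calling through the pointer really saves anything depends on your platform and linker, so measure before you rely on it.

```c
#include <sys/types.h>
#include <unistd.h>

/* Counts the bytes readable from fd, one byte at a time, calling
 * read() through a function pointer that is fetched once up front. */
static long   count_bytes( int fd)
{
   ssize_t   (*f_read)( int, void *, size_t);
   char      buf[ 1];
   long      n;

   f_read = read;      /* resolve the address once, outside the loop */
   n      = 0;
   while( (*f_read)( fd, buf, 1) == 1)
      n++;
   return( n);
}
```

Reading byte by byte is of course slow in itself. The loop is kept that way only because it makes the call overhead the dominant cost, as in the fragment above.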
0x2cd4 <call_test3b>: mflr r0
0x2cd8 <call_test3b+4>: stw r31,-4(r1)
0x2cdc <call_test3b+8>: stw r0,8(r1)
0x2ce0 <call_test3b+12>: stwu r1,-80(r1)
0x2ce4 <call_test3b+16>: bcl 20,4*cr7+so,0x2ce8
0x2ce8 <call_test3b+20>: mflr r31
0x2cec <call_test3b+24>: mr r0,r4
0x2cf0 <call_test3b+28>: addis r4,r31,0
0x2cf4 <call_test3b+32>: lwz r4,4888(r4)
0x2cf8 <call_test3b+36>: stfd f1,56(r1)
0x2cfc <call_test3b+40>: lwz r5,56(r1)
0x2d00 <call_test3b+44>: lwz r6,60(r1)
0x2d04 <call_test3b+48>: mtlr r0
0x2d08 <call_test3b+52>: mr r12,r0
0x2d0c <call_test3b+56>: blrl
0x2d10 <call_test3b+60>: addi r1,r1,80
0x2d14 <call_test3b+64>: lwz r0,8(r1)
0x2d18 <call_test3b+68>: mtlr r0
0x2d1c <call_test3b+72>: lwz r31,-4(r1)
0x2d20 <call_test3b+76>: blr
Using the method address can only pay off with the second call of the method. That's because getting the address takes one objc_msgSend also.
What you gain: Speed.
What you lose: Possibly generality. There are also hidden pitfalls (see next).
Hidden pitfalls when using IMPs
With methodForSelector: you are getting the address of a method implementation for that class. Which particular method implementation is used (remember method overriding) is determined at runtime. The method implementation to be used for a class can possibly change during runtime.
Whenever a class is posed away with (+poseAsClass:), the implementation of a method is likely to change. Whenever a category for this class is added, for example by loading an NSBundle, that implementation could possibly change. It is therefore in theory only safe to use the direct call, when you can be certain that the class system remains fixed for the duration of the use of this IMP. For the fainthearted that would mean "never".
But let's check with reality. Whenever you are using an NSNotificationCenter, for example, you are potentially running into the same pitfall, since NSNotificationCenter happens to store the implementation address and not the selector. So if Foundation can do it, so can you. Rationalizing this a bit further, poser classes and categories in NSBundles should not be surprised if their code doesn't work as expected when they are loaded into a nicely running system (via an NSBundle, for example). So we should give the "Schwarzer Peter" (the blame) to them, not to methodForSelector:.
An often implemented compromise is to cache the method implementation during the lifetime of the caller, as in the following example.
_array is an instance variable of an NSArray subclass. It contains objects that implement the methods operate and annihilate.
The task of operateAnnihilate is to loop through all objects of the array and call both methods in succession. The code hoists the lookup of NSArray's objectAtIndex: method out of the loop by resolving the implementation address once beforehand.
If it were ensured that only objects of one and the same class are ever stored in _array, it would be possible to optimize this method further by manually resolving not only the objectAtIndex: method address, but also the operate and annihilate method addresses.
By NOT doing this we keep the implementation more general and versatile.
|
- (void) operateAnnihilate
{
SEL sel;
IMP f;
int n;
int i;
id p;
sel = @selector( objectAtIndex:);
f = [_array methodForSelector:sel];
n = [_array count];
for( i = 0; i < n; i++)
{
p = (f)( _array, sel, i);
[p operate];
[p annihilate];
}
}
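The pattern itself, resolving once and then calling many times, is not specific to Objective-C. Here is a plain C analogue with a lookup counter to make the saving visible. All names (toy_getter, toy_method_for, toy_item_at, toy_sum, toy_lookup_count) are hypothetical and exist only in this sketch.

```c
#include <stddef.h>
#include <string.h>

typedef int   (*toy_getter)( const int *items, size_t index);

/* Counts how often the (expensive) lookup ran. */
static int   toy_lookup_count = 0;

static int   toy_item_at( const int *items, size_t index)
{
   return( items[ index]);
}

/* Stand-in for methodForSelector: a lookup by name. */
static toy_getter   toy_method_for( const char *sel)
{
   toy_lookup_count++;
   return( strcmp( sel, "objectAtIndex:") == 0 ? toy_item_at : NULL);
}

/* Resolves the getter once before the loop, then calls it directly. */
static long   toy_sum( const int *items, size_t n)
{
   toy_getter   f;
   long         sum;
   size_t       i;

   f = toy_method_for( "objectAtIndex:");
   if( f == NULL)
      return( 0);

   sum = 0;
   for( i = 0; i < n; i++)
      sum += (*f)( items, i);
   return( sum);
}
```

However long the array is, the lookup runs exactly once per call of toy_sum. The calls inside the loop cost no more than an ordinary function pointer call.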
Using inline functions instead of method calls
An inline function can only access instance variables declared private or protected if the function is declared within the @implementation of the class. Unfortunately you cannot declare functions inside of the @interface clause. So this does not work:
@interface Foo : NSObject
{
int someVar;
}
// it ain't compiling folks
static inline int someComputedValue( id obj)
{
return( ((Foo *) obj)->someVar * STRANGE_CONSTANT);
}
@end
|
But using @defs (thanks to ZNeK for the suggestion) it can be written like this:
@interface Foo : NSObject
{
int someVar;
}
@end
static inline int someComputedValue( id obj)
{
return( ((struct { @defs( Foo) } *) obj)->someVar * STRANGE_CONSTANT);
}
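What @defs hands you is, in effect, a plain C struct with the same layout as the object. The following plain C sketch shows the resulting access pattern without any Objective-C. struct foo_object is my hand-written stand-in for the struct that @defs( Foo) would produce (the isa pointer followed by someVar), and the value given to STRANGE_CONSTANT is just an assumption for the example.

```c
#include <stddef.h>

#define STRANGE_CONSTANT   3   /* assumed value, for illustration only */

/* Hand-written stand-in for struct { @defs( Foo) } */
struct foo_object
{
   const void   *isa;       /* class pointer, inherited from the root class */
   int          someVar;
};

/* Reads the instance variable directly through the struct view. */
static inline int   some_computed_value( const void *obj)
{
   return( ((const struct foo_object *) obj)->someVar * STRANGE_CONSTANT);
}
```

No message is sent here at all: the function reads the memory of the object directly, which is why it is fast and also why it bypasses any subclass overrides.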
|
What you gain: speed
What you lose: run time binding of methods.
The lossage is obvious: since we are not calling a method, we are just calling/inlining a C function.
Wrap Up (The Executive Version)
<Insert the usual optimization disclaimer here, that I am just too tired of writing AND reading>
Obj-C method calls aren't slow, but they are slower than plain C calls. Outside of inline code, the fastest invocations are calls to static C functions (or to any function that doesn't cross shared library boundaries) and, equally fast if not even a smidgen faster, calls through method implementations (IMPs).
In times of desperation when you feel that you must abandon objects in favor of performance, think again. Using a mixture of C-technology for performance and Objective-C methods for convenience oughta sound more attractive than dropping Objective-C altogether.
Optimize shared library C calls by using function pointers.
Optimize Objective-C message calls by caching the instance or class method address.
If you want to discuss this article, please do so in this thread in the Mulle kybernetiK Optimization Forum.
(0) inline is a common compiler extension, though not properly part of the C language (yet).
(1) This address is, by the way, usually not as random as one might think. Most shared libraries, although in principle relocatable, have a certain default address they usually load to. The position of the application code is determined by the linker. If your binary doesn't change, it is always loaded to the same virtual memory address.
(2) Some of the code in the stub would have to be inlined into the caller if it weren't in the stub. So strictly speaking the overhead is not 9 instructions.
(3)Necessary as in object code necessary, rather than necessary for lack of warnings or to please your coding style fascist colleagues :).