Nat!

Senior Mull

mulle-objc: fast methods make mulle_objc_object_call even faster

Continued from mulle-objc: investigating the pros and cons of inlining mulle_objc_object_call.

To speed up the most frequently used methods, mulle-objc uses a per-class, table-based indexing mechanism called _mulle_objc_fastmethodtable, or vtab for short, that is quite similar to the _vptr of C++.

Support for this faster indexing scheme is enabled when you use compiler optimization level -O1 or above. At -O1 the following function, mulle_objc_object_constant_methodid_call, is used:

MULLE_C_ALWAYS_INLINE
static inline void   *mulle_objc_object_constant_methodid_call( void *obj,
                                                                mulle_objc_methodid_t methodid,
                                                                void *parameter)
{
   struct _mulle_objc_class   *cls;
   int                        index;

   if( __builtin_expect( ! obj, 0))
      return( obj);

   cls   = _mulle_objc_object_get_isa( obj);
   index = mulle_objc_get_fastmethodtable_index( methodid);
   if( index >= 0)
      return( _mulle_objc_fastmethodtable_invoke( obj, methodid, parameter, &cls->vtab, index));

   return( (*cls->call)( obj, methodid, parameter, cls));
}

The inline function mulle_objc_get_fastmethodtable_index converts a methodid into an index from 0 to 23 when it matches a fast method, and into a negative value otherwise. For instance, the selector for init has an index of 1. Remember that the compiler knows the selector of the method, and this selector is an integer constant. So the compiler can evaluate this function at compile time. The conversion and the if( index >= 0) test are therefore “for free”.
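To illustrate why the lookup costs nothing at runtime, here is a minimal sketch of how such an index function might look. The names sketch_fastmethod_index, SKETCH_ALLOC_ID and SKETCH_INIT_ID and the ID values are hypothetical and chosen for illustration; they are not taken from the mulle-objc headers. Only the idea of mapping a constant method ID to a slot index comes from the text above.

```c
#include <stdint.h>

/* Hypothetical method IDs for illustration; real method IDs are different. */
enum
{
   SKETCH_ALLOC_ID = 0x1001,
   SKETCH_INIT_ID  = 0x1002
};

/* Maps a method ID to a fast-method slot, or -1 if the ID has no slot.
   When the argument is a compile-time constant and the function is inlined,
   an optimizing compiler can typically fold the whole switch into a constant. */
static inline int   sketch_fastmethod_index( uint32_t methodid)
{
   switch( methodid)
   {
   case SKETCH_ALLOC_ID : return( 0);
   case SKETCH_INIT_ID  : return( 1);
   }
   return( -1);
}
```

With a constant argument such as SKETCH_INIT_ID, the caller's if( index >= 0) compares two constants, so the compiler can usually drop the branch that is not taken.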

Assuming we know that self is not nil and we make a call to -init, this inlined method call simplifies to:

{
   struct _mulle_objc_class   *cls;

   cls   = _mulle_objc_object_get_isa( obj);
   return( _mulle_objc_fastmethodtable_invoke( obj, @selector( init), parameter, &cls->vtab, 1));
}

_mulle_objc_fastmethodtable_invoke will then read the appropriate function from the vtab and call the method.
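What "read the function from the vtab and call it" amounts to can be sketched in a few lines. This is an illustration under assumptions, not the runtime's code: struct sketch_vtab, sketch_imp_t, sketch_vtab_invoke and sketch_init are hypothetical names, and the sketch assumes every slot has already been filled with a valid function.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_N_FASTMETHODS   24   /* slots 0 to 23, as in the text */

/* Same shape as the calls in the text: object, method ID, one parameter. */
typedef void   *(*sketch_imp_t)( void *obj, uint32_t methodid, void *parameter);

struct sketch_vtab
{
   sketch_imp_t   methods[ SKETCH_N_FASTMETHODS];
};

/* One indexed load plus one indirect call.
   Assumption: the slot at index holds a valid function pointer. */
static inline void   *sketch_vtab_invoke( void *obj,
                                          uint32_t methodid,
                                          void *parameter,
                                          struct sketch_vtab *vtab,
                                          int index)
{
   return( (*vtab->methods[ index])( obj, methodid, parameter));
}

/* Stand-in for an -init implementation: returns the receiver. */
static void   *sketch_init( void *obj, uint32_t methodid, void *parameter)
{
   (void) methodid;
   (void) parameter;
   return( obj);
}
```

Storing sketch_init in slot 1 and invoking with index 1 returns the receiver, which mirrors the simplified -init call shown above.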

Calculating the memory overhead

Obviously it’s much faster to call Objective-C methods this way, so why not do it for all methods ? The answer is space constraints. Due to the nature of message sending, each class of a class-pair must have a vtab like this. Furthermore, every class-pair of the runtime must have a compatible vtab. Lastly, the indexes of the selectors must be common across all classes. That is quite restrictive.

The overhead per added fast method is therefore sizeof( IMP) * 2 * #classes. Assuming 512 classes on a 64-bit machine, that would be an 8 KB overhead (8 bytes * 2 * 512 = 8192 bytes).
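The arithmetic can be written down as a small sketch. The names sketch_imp_t, sketch_slot_overhead and sketch_table_overhead are hypothetical, and the IMP type is assumed to be an ordinary function pointer.

```c
#include <stddef.h>

/* Stand-in for the runtime's IMP type (assumed to be an ordinary function pointer). */
typedef void   *(*sketch_imp_t)( void *, unsigned int, void *);

/* Bytes added per fast-method slot: one IMP in each class of every class-pair. */
static size_t   sketch_slot_overhead( size_t n_classes)
{
   return( sizeof( sketch_imp_t) * 2 * n_classes);
}

/* Bytes for a complete table of n_slots slots. */
static size_t   sketch_table_overhead( size_t n_classes, size_t n_slots)
{
   return( sketch_slot_overhead( n_classes) * n_slots);
}
```

With 8-byte function pointers, sketch_slot_overhead( 512) yields 8192 bytes, the 8 KB from the text. A full table of 24 slots would then cost 24 * 8 KB = 192 KB, which shows why the number of slots is kept small.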

How many classes are there ?

In Xcode 7.2.1, the number of classes with an NS prefix was 489.

grep -Rho "@interface [a-zA-Z]\+" \
   /Applications/Xcode.app/Contents/Developer/Platforms | \
cut -d" " -f2 | awk '!a[$0]++' | sort > classes.txt
egrep '^NS' classes.txt | wc -l

Choosing fast-method selectors

As we have seen in mulle-objc: some research about selectors, an average Objective-C program uses a small set of selectors a lot, and most of the other selectors only rarely:

703,484 class
483,335 release
343,421 retain
300,315 dealloc
299,138 autorelease
294,309 characterAtIndex:
260,189 isEqual:
250,537 hash
228,409 allocWithZone:
197,911 alloc
195,324 count
179,983 length
171,437 objectForKey:
156,440 self
138,796 superview
130,172 init

It therefore makes sense to use these for fast methods. The mulle-objc-runtime predefines indices 0-7 for the following selectors in mulle_objc_fastmethodtable.h:

Index Selectors
0 @selector( alloc)
1 @selector( init)
2 @selector( finalize)
3 @selector( dealloc)
4 @selector( instantiate)
5 @selector( autorelease)
6 @selector( retain)
7 @selector( release)

leaving 16 free slots (indices 8-23) for a Foundation or user applications to use.

MulleObjC currently defines indices 8-21 for its own purposes in ns_fastmethodids.h:

Index Selectors
8 @selector( length)
9 @selector( count)
10 @selector( self)
11 @selector( hash)
12 @selector( nextObject)
13 @selector( timeIntervalSinceReferenceDate)
14 @selector( lock)
15 @selector( unlock)
16 @selector( class)
17 @selector( isKindOfClass:)
18 @selector( objectAtIndex:)
19 @selector( characterAtIndex:)
20 @selector( methodForSelector:)
21 @selector( respondsToSelector:)

A method like count is really a no-brainer. It is a method that one can assume is just a simple accessor. It is also called very often. Therefore the regular Objective-C method call overhead will be a significant part of the runtime cost of calling count.

A method like timeIntervalSinceReferenceDate is debatable, because it may not be called often enough. Also it could be making a system call, which would negate the advantage of making this a fast method.

How to use a fast method for your own code

As MulleObjC is greedy, it leaves you with only two indexes: 22 and 23.

Say we want to speed up a method called fastMethod:. It doesn’t matter if it is a class or an instance method. The selector for this method happens to be 0x006f9586.

Now wherever you call this method, you must define MULLE_OBJC_FASTMETHODHASH_23 ahead of including <mulle_objc/mulle_objc.h>:

#define MULLE_OBJC_FASTMETHODHASH_23   0x006f9586
#include <MulleObjC/MulleObjC>

See: fastmethod.m for another example

Code that forgets to do this will not get the benefits of fast method calls, but that is the only consequence.

Obviously it’s a serious problem if you have multiple, differing definitions for MULLE_OBJC_FASTMETHODHASH_23 throughout your code.
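How a preprocessor macro can configure a slot, and why all files must agree on it, can be sketched as follows. SKETCH_FASTMETHODHASH_23 and sketch_user_index are hypothetical stand-ins for illustration; the runtime's actual header may be organized differently.

```c
#include <stdint.h>

/* A file that wants the fast path defines this before the function below is
   seen; a file that forgets leaves it undefined. */
#define SKETCH_FASTMETHODHASH_23   0x006f9586

static inline int   sketch_user_index( uint32_t methodid)
{
   switch( methodid)
   {
#ifdef SKETCH_FASTMETHODHASH_23
   case SKETCH_FASTMETHODHASH_23 : return( 23);
#endif
   }
   return( -1);   /* not a fast method: the caller takes the regular call path */
}
```

Without the define, the function returns -1 for every ID, which corresponds to the harmless case of forgetting the definition. If two files defined the macro with different values, they would disagree about which selector slot 23 belongs to, which is the serious problem described above.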


Continued to mulle-objc: tagged pointers, boon or bane ?
