Nat!

Senior Mull.

mulle-objc: inlined messaging

Continued from mulle-objc: removing superflous ifs

Last time I showed how inlining lets the compiler remove superfluous code from the method calling code. The logical next step is to try to inline the message sending itself.

The mulle-objc runtime uses pretty much the same method cache mechanism as explained in Obj-C Optimization: The faster objc_msgSend, which is the one used in the Apple runtime. I will use a slightly modified version of the c_objc_msgSend code that I showed there as the basis for discussion. (The only real difference is the change in the call signature):

void   *c_objc_msgSend( void *self, SEL _cmd, void *_param)
{
   struct objc_cache    *cache;
   struct objc_class    *cls;
   struct objc_method   *method;   
   unsigned int         hash;
   unsigned int         index;
   
   if( self)
   {
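      cls   = ((id) self)->isa;   /* self is a void *, so cast before dereferencing */
      cache = cls->cache;
      hash  = cache->mask;
      index = (unsigned int) (uintptr_t) _cmd & hash;   /* uintptr_t: <stdint.h> */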
      
      do
      {
         method = cache->buckets[ index];
         if( ! method)
            goto recache;
         index = (index + 1) & cache->mask;
      }
      while( method->method_name != _cmd);
      return( (*method->method_imp)( self, _cmd, _param));
   }
   return( self);

recache:
   /* ... */
   return( 0);
}

The preferred slot

Let's assume that the cache has an average fill rate of 25%. Then the average chance of a method being in its preferred slot, the one its selector hashes to, is ~80%.

Because I am not a math guy, I wrote a Monte Carlo simulation that fills a cache with random indices at predetermined fill rates and then counts the slot clashes. You can look at it here: research-cache-spread.c.

Being in the preferred slot means that the do-while loop does not have to iterate a second time.
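To make the idea concrete, here is a minimal sketch of such a simulation. It is not research-cache-spread.c; the function name preferred_slot_rate and its parameters are made up for illustration, and it assumes a power-of-two cache with linear probing, like the lookup loop above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: fill a cache of n_slots (a power of two) with
 * n_entries random selectors, n_trials times over, and return the
 * fraction of entries that ended up in their preferred slot.
 */
static double   preferred_slot_rate( unsigned int n_slots,
                                     unsigned int n_entries,
                                     unsigned int n_trials)
{
   unsigned char   *used;
   unsigned long   hits;
   unsigned int    mask;
   unsigned int    index;
   unsigned int    i;
   unsigned int    trial;

   assert( n_slots && ! (n_slots & (n_slots - 1)));  /* power of two */
   assert( n_entries && n_entries < n_slots);        /* never completely full */
   assert( n_trials);

   used = malloc( n_slots);
   assert( used);

   mask = n_slots - 1;
   hits = 0;
   for( trial = 0; trial < n_trials; trial++)
   {
      memset( used, 0, n_slots);
      for( i = 0; i < n_entries; i++)
      {
         /* a random number stands in for the selector */
         index = (unsigned int) rand() & mask;
         if( ! used[ index])
            ++hits;                       /* preferred slot was free */
         else
            do                            /* clash: probe linearly */
               index = (index + 1) & mask;
            while( used[ index]);
         used[ index] = 1;
      }
   }
   free( used);

   return( (double) hits / ((double) n_entries * n_trials));
}
```

Calling preferred_slot_rate( 64, 16, 10000) models a 64 slot cache at a 25% fill rate. The result depends on how the fill is modelled, so it need not reproduce the ~80% quoted above exactly.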

The inlining ObjC message sender

Stripping the above code down to the 80% case, we get some code that looks very inlinable:

static inline void   *c_objc_msgSend( void *self, SEL _cmd, void *_param)
{
   struct objc_cache    *cache;
   struct objc_class    *cls;
   struct objc_method   *method;   
   unsigned int         hash;
   unsigned int         index;
   
   if( ! self)
      return( self);

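   cls    = ((id) self)->isa;   /* self is a void *, so cast before dereferencing */
   cache  = cls->cache;
   hash   = cache->mask;
   index  = (unsigned int) (uintptr_t) _cmd & hash;   /* uintptr_t: <stdint.h> */
   method = cache->buckets[ index];
   /* an empty preferred slot is NULL and must take the slow path too */
   if( method && method->method_name == _cmd)
      return( (*method->method_imp)( (id) self, _cmd, _param));
   return( c_objc_msgSend_2( (id) self, _cmd, _param));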
}

Whenever the preferred slot is missed, the code will execute quite a bit slower, as c_objc_msgSend_2 must duplicate some of the effort. Yet with a hit chance of 80% this looks very much worthwhile, especially since you have the runtime under your control and can hint to it which kinds of methods should be in the preferred slot.

The actual mulle-objc runtime code is similar but even better. It is possible to reduce the inlining overhead on x86_64 to 32 bytes.
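The split between the inlined fast path and the out-of-line fallback can be sketched as a small self-contained program. This is only an illustration under assumptions, not the mulle-objc or Apple code: all demo_ names and types are made up, the cache has a fixed size of four slots, and a cache miss simply returns NULL where a real runtime would search the class and refill the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* hypothetical stand-ins for SEL, IMP and the runtime structures */
typedef uintptr_t   demo_sel;
typedef void        *(*demo_imp)( void *self, demo_sel _cmd, void *_param);

struct demo_method  { demo_sel name; demo_imp imp; };
struct demo_cache   { unsigned int mask; struct demo_method *buckets[ 4]; };
struct demo_class   { struct demo_cache *cache; };
struct demo_object  { struct demo_class *isa; };

/* slow path: repeats the lookup, then probes linearly (cache is never full) */
static void   *demo_msgSend_2( void *self, demo_sel _cmd, void *_param)
{
   struct demo_cache    *cache;
   struct demo_method   *method;
   unsigned int         index;

   cache = ((struct demo_object *) self)->isa->cache;
   index = (unsigned int) _cmd & cache->mask;
   for(;;)
   {
      method = cache->buckets[ index];
      if( ! method)
         return( NULL);  /* a real runtime would search the class here */
      if( method->name == _cmd)
         return( (*method->imp)( self, _cmd, _param));
      index = (index + 1) & cache->mask;
   }
}

/* fast path: checks only the preferred slot */
static inline void   *demo_msgSend( void *self, demo_sel _cmd, void *_param)
{
   struct demo_cache    *cache;
   struct demo_method   *method;

   if( ! self)
      return( self);

   cache  = ((struct demo_object *) self)->isa->cache;
   method = cache->buckets[ (unsigned int) _cmd & cache->mask];
   if( method && method->name == _cmd)
      return( (*method->imp)( self, _cmd, _param));
   return( demo_msgSend_2( self, _cmd, _param));
}

static void   *demo_return_param( void *self, demo_sel _cmd, void *_param)
{
   (void) self; (void) _cmd;
   return( _param);
}

static void   *demo_return_self( void *self, demo_sel _cmd, void *_param)
{
   (void) _cmd; (void) _param;
   return( self);
}

/* usage: selectors 1 and 5 both prefer slot 1, so 5 was pushed to slot 2 */
static int   demo_usage( void)
{
   struct demo_method   first  = { 1, demo_return_param };
   struct demo_method   second = { 5, demo_return_self };
   struct demo_cache    cache  = { 3, { NULL, &first, &second, NULL } };
   struct demo_class    cls    = { &cache };
   struct demo_object   obj    = { &cls };
   int                  value  = 42;

   return( demo_msgSend( &obj, 1, &value) == (void *) &value  /* fast path */
        && demo_msgSend( &obj, 5, &value) == (void *) &obj    /* slow path */
        && demo_msgSend( &obj, 2, &value) == NULL             /* not cached */
        && demo_msgSend( NULL, 1, &value) == NULL);           /* nil self */
}
```

The duplicated effort is visible in demo_msgSend_2: it has to reload the cache and recompute the index, because passing them along would make the inlined part bigger.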

But that is getting in too deep for now.

Food for thought

  1. How many method calls are there in a typical ObjC program?
  2. Which selectors are hot and which aren't?
  3. How would one mark a method as preferred?

Continue to mulle-objc: some research about selectors