Nat! bio photo


Senior Mull

Twitter Github Twitch

mulle-objc: caches pivot - selectors mask - methods preload

Continued from mulle_objc: method searching and accidental override protection.

So now that we know how a method is looked up by its selector, let’s look at the few particularities of the mulle-objc method cache in a bit more detail.

Read Let’s Build objc_msgSend and Obj-C Optimization: The faster objc_msgSend for some more background on the used cache algorithm. It won’t be talked about here.

Cache entry: struct _mulle_objc_cacheentry

A cache entry is just a pairing of a selector (SEL) with its implementation (IMP). It is known, that the cache entry size is always a power of two.

struct _mulle_objc_cacheentry
      mulle_objc_uniqueid_t         uniqueid;
      mulle_atomic_pointer_t        pointer;
   } key;
      mulle_atomic_functionpointer_t   functionpointer;
      mulle_atomic_pointer_t           pointer;
   } value;
Field Description
key.uniqueid This is merely used for debugging (it may not work so well on bigendian machines).
key.pointer Stores for example the selector or the classid.
value.functionpointer Stores for example the IMP.
value.pointer Stores for example the class.

Cache: struct _mulle_objc_cache

The cache is a variably sized struct. It contains an array of cache entries, with some information about the cache itself:

struct _mulle_objc_cache
   mulle_atomic_pointer_t          n;
   mulle_objc_cache_uint_t         size;
   mulle_objc_cache_uint_t         mask;
   struct _mulle_objc_cacheentry   entries[ 1];
Field Description
n number of entries in the cache
size maximum number of entries the cache can hold. This is always a power of two (2,4,8,16,…)
mask bitmask for the uniqueid to offset into entries (more about this later)
entries the cache entries, indexable by 0 .. size-1

Once a cache is created, size and mask never change. It is not possible to grow or shrink a cache. A class that needs to store more entries will create a new,larger cache and replace the old one with it.

A simple cache of size 8, containing one entry could look like this:

      { 0, 0 },  // 0x00
      { 0, 0 },  // 0x10
      { 0, 0 },  // 0x20
      { 0, 0 },  // 0x30
      { 0, 0 },  // 0x40
      { @selector( foo:), my_foo }  // 0x50
      { 0, 0 },  // 0x60
      { 0, 0 },  // 0x70
      { 0, 0 }   // 0x80

To index into the cache we take the address of cache->entries[ 0] and then add the masked-off bits of the selector to it.

Let’s say that we are on a 64 bit machine, a cache entry will be 16 bytes large. The offset from (char *) &entries[ 0] to (char *) &entries[ 1] is therefore also 16 bytes and from (char *) &entries[ 0] to (char *) &entries[ 2] is 32 bytes and so on. So for all 8 entries we get this table:

Index  Decimal Offset as Hex as Binary
0 0 00 00000000
1 16 10 00010000
2 32 20 00100000
3 48 30 00110000
4 64 40 01000000
5 72 50 01010000
6 96 60 01100000
7 112 70 01110000

As you can see the variations are in three bits only. These bits make up the mask value.

The selector is a FNV1 hashed value it has fairly good entropy over all bits, it doesn’t really matter where we grab the bits off to index into the cache all of them are equally likely to contain a 0 or a 1.

The value of @selector( foo:) is 0xb4b117d3 and the mask is 0x70. So apply the mask unto the selector and get the proper offset to add to (char *) &entries[0].

0xb4b117d3 & 0x70 == 0x50

And that’s how mulle-objc indexes into the cache.

We could diffuse the bits even further by using a murmur hash avalanche on it, but I believe this to be unnecessary.

This is superior to other runtimes, which use selector addresses. Addresses do not have good bit entropy and often need to be shifted appropriately.

Cache pivot: struct _mulle_objc_cachepivot

The cache pivot is a similar idea to the way the isa pointer is stored in instances. See: mulle_objc: object layout, retain counting, finalize.

Instead of memorizing the beginning of the cache, memorize the most interesting part of it. The compiler is able to produce better indexing code into the cache if the initial pointer is pointing to &entries->entries[ 0] instead of to just to the cache.

The following structs main raison d’être is readability:

struct _mulle_objc_cachepivot
   mulle_atomic_pointer_t   entries; // for atomic XCHG with pointer indirection
Field Description
entries Address of a cache’s entries

To get the address of the struct _mulle_objc_cache the entries belong to, you do some arithmetic, which _mulle_objc_cachepivot_nonatomic_get_cache does for you.

Method calls combine cache access and method lookup

This is a mind-map of what a method call like [foo doSomething] actually does.


  1. At first the cache will be empty.
  2. So on the first -doSomething call, where no other method has been called for this class, the right “cache entry available” path will be used. (“cache miss”)
  3. When -doSomething is called a second time, the cache will now have an occupied entry for -doSomething and the left path is used. (“cache hit”)
  4. Other methods also occupy slots, they may hash to the same index. (“cache collison”) Then the cache search has to loop through the entries to either find a matching entry or an available one.

For often used selectors it would be preferable to ensure, that there are ahead of other colliding selectors to have the fewest or no cache collisions.

There are provisions for that in the runtime:

  1. There is a bit in the method descriptor, that advises the cache algorithm to preload this method. The compiler does not support this bit yet, though.
  2. The runtime contains a list of selectors to preload. These are preloaded after the class specific ones.

Preloading gives a method a preferable entry in the cache, hopefully allowing it to be indexed immediately.

Ultimately the cache is completely filled up and all calls will be fast.

Continue to mulle-objc: divining the guts of message sending.

Post a comment

All comments are held for moderation; basic HTML formatting accepted.

E-mail: (not published)