mulle-objc: caches pivot - selectors mask - methods preload
Continued from mulle_objc: method searching and accidental override protection.
So now that we know how a method is looked up by its selector, let’s look at a few particularities of the mulle-objc method cache in a bit more detail.
Read Let’s Build objc_msgSend and Obj-C Optimization: The faster objc_msgSend for more background on the cache algorithm used; it won’t be covered here.
Cache entry: struct _mulle_objc_cacheentry
A cache entry is just a pairing of a selector (SEL) with its implementation (IMP). Note that the size of a cache entry is always a power of two.
struct _mulle_objc_cacheentry
{
union
{
mulle_objc_uniqueid_t uniqueid;
mulle_atomic_pointer_t pointer;
} key;
union
{
mulle_atomic_functionpointer_t functionpointer;
mulle_atomic_pointer_t pointer;
} value;
};
Field | Description |
---|---|
key.uniqueid | Merely used for debugging (it may not work so well on big-endian machines). |
key.pointer | Stores, for example, the selector or the classid. |
value.functionpointer | Stores, for example, the IMP. |
value.pointer | Stores, for example, the class. |
Cache: struct _mulle_objc_cache
The cache is a variably sized struct. It contains an array of cache entries, with some information about the cache itself:
struct _mulle_objc_cache
{
mulle_atomic_pointer_t n;
mulle_objc_cache_uint_t size;
mulle_objc_cache_uint_t mask;
struct _mulle_objc_cacheentry entries[ 1];
};
Field | Description |
---|---|
n | number of entries in the cache |
size | maximum number of entries the cache can hold; always a power of two (2, 4, 8, 16, …) |
mask | bitmask applied to the uniqueid to offset into entries (more about this later) |
entries | the cache entries, indexable by 0 .. size-1 |
Once a cache is created, size and mask never change. It is not possible to grow or shrink a cache. A class that needs to store more entries will create a new, larger cache and replace the old one with it.
A simple cache of size 8, containing one entry, could look like this:
{
   1,
   8,
   0x70,
   {
      { 0, 0 },                    // 0x00
      { 0, 0 },                    // 0x10
      { 0, 0 },                    // 0x20
      { 0, 0 },                    // 0x30
      { 0, 0 },                    // 0x40
      { @selector( foo:), my_foo }, // 0x50
      { 0, 0 },                    // 0x60
      { 0, 0 }                     // 0x70
   }
}
To index into the cache we take the address of cache->entries[ 0] and then add the masked-off bits of the selector to it.
Let’s say we are on a 64 bit machine; a cache entry will then be 16 bytes large. The offset from (char *) &entries[ 0] to (char *) &entries[ 1] is therefore also 16 bytes, from (char *) &entries[ 0] to (char *) &entries[ 2] it is 32 bytes, and so on.
So for all 8 entries we get this table:
Index | Decimal Offset | as Hex | as Binary |
---|---|---|---|
0 | 0 | 00 | 00000000 |
1 | 16 | 10 | 00010000 |
2 | 32 | 20 | 00100000 |
3 | 48 | 30 | 00110000 |
4 | 64 | 40 | 01000000 |
5 | 80 | 50 | 01010000 |
6 | 96 | 60 | 01100000 |
7 | 112 | 70 | 01110000 |
As you can see, the variations are in three bits only. These bits make up the mask value.
The selector is an FNV-1 hashed value with fairly good entropy over all bits. It doesn’t really matter where we grab the bits to index into the cache; all of them are equally likely to contain a 0 or a 1.
The value of @selector( foo:) is 0xb4b117d3 and the mask is 0x70. So apply the mask to the selector to get the proper offset to add to (char *) &entries[ 0]:
0xb4b117d3 & 0x70 == 0x50
And that’s how mulle-objc indexes into the cache.
We could diffuse the bits even further by using a murmur hash avalanche on it, but I believe this to be unnecessary.
This is superior to other runtimes, which use selector addresses. Addresses do not have good bit entropy and often need to be shifted appropriately.
Cache pivot: struct _mulle_objc_cachepivot
The cache pivot is a similar idea to the way the isa pointer is stored in instances. See: mulle_objc: object layout, retain counting, finalize.
Instead of memorizing the beginning of the cache, memorize the most interesting part of it. The compiler is able to produce better indexing code into the cache if the initial pointer points to &cache->entries[ 0] instead of just to the cache.
The following struct’s main raison d’être is readability:
struct _mulle_objc_cachepivot
{
mulle_atomic_pointer_t entries; // for atomic XCHG with pointer indirection
};
Field | Description |
---|---|
entries | Address of a cache’s entries |
To get the address of the struct _mulle_objc_cache the entries belong to, you do some pointer arithmetic, which _mulle_objc_cachepivot_nonatomic_get_cache does for you.
Method calls combine cache access and method lookup
This is a mind-map of what a method call like [foo doSomething] actually does.
- At first the cache will be empty.
- So on the first -doSomething call, where no other method has been called for this class, the right “cache entry available” path will be used. (“cache miss”)
- When -doSomething is called a second time, the cache will now have an occupied entry for -doSomething and the left path is used. (“cache hit”)
- Other methods also occupy slots and may hash to the same index. (“cache collision”) Then the cache search has to loop through the entries to either find a matching entry or an available one.
For often used selectors it would be preferable to ensure that they are entered ahead of other colliding selectors, so that they have the fewest or no cache collisions.
There are provisions for that in the runtime:
- There is a bit in the method descriptor that advises the cache algorithm to preload this method. The compiler does not support this bit yet, though.
- The runtime contains a list of selectors to preload. These are preloaded after the class specific ones.
Preloading gives a method a preferable entry in the cache, hopefully allowing it to be indexed immediately.
Eventually the cache fills up completely and all calls will be fast.
Continue to mulle-objc: divining the guts of message sending.