Nat! bio photo

Nat!

Senior Mull

Twitter Github Twitch

mulle-objc: hashes for classes, selectors and protocols

Continued from mulle-objc: some research about selectors

One of the main problems for a runtime is to get all selectors and classes uniqued across all the various shared libraries. In a single link unit, you can probably depend upon the linker to fold the various strings together to one address. OS X even seems to do it across shared library boundaries. Then you can use the address of the selector string as a unique ID. But FreeBSD for example doesn’t unique strings across shared library boundaries.

So what should @selector( foo:) be in a runtime, that is loaded by any random C linker ? mulle-objc solves the problem like this:

@selector returns a hash the size of uint32_t

The hash of a selector is a FNV1 hash. It was chosen because it is fast and has very little collisions for english text keys. So how much hash collisions are there with my 234,365 unique selectors test case ?

With FNV1 there were no collisions!

None of the selectors had a hash of 0x00000000 or 0xffffffff. In fact the lowest observed hash was 0000ed91 gotoNext and the highest was ffff8baf _unlockMenuBarIfNeeded. No selectors suffered from a collision.

Some math

32 bit could optimally hash 4294967296 different selectors. Removing 0x00000000 and 0xffffffff leaves 4294967294. When there are 234,365 selectors in a system, their hashes occupy 5 permille of the available hashspace.

Nevertheless a collision at runtime is unacceptable. The solution for mulle-objc is, that each selector is added to the runtime by name. If there is a hash collision, the selector can not be added. The solution for the programmer would be to rename the method to avoid the clash. Note, that the number of selectors used in any given application is usually much, much less than 100,000 selectors.

  • TextEdit.app has 35922 selectors
  • Foundation has 6560 selectors
  • AppKit has 35785 selectors

Advantages of using a hash value as @selector

Because the hash diffuses its bit very well, such a selector is excellent for indexing into a hash table. You only need to mask it and there is no need to shift it. It’s much better than using parts of a uniqued string address because of better entropy.

I made a map of the spread of the hashed selectors in 2D space, which produces this picture, (you need to zoom in a little so see anything). If there is anything to deduce from it, it is, that the bit diffusion looks good.

The selector is immediate. It’s trivial to write inlinable code, that does something special with certain selectors at zero cost while compiling.

Example:

static inline void  *c_obj_msgSend( void *self, SEL _cmd, void *_param)
{
   if( sel == MY_PREFERRED_SEL)
     return( do_something_preferred( self, _cmd, _param);
   return( c_obj_msgSend2( self, _cmd, _param);
}

The selector is globally unique within it’s bit-size domain. A selector for a string in 32 bit is the same everywhere. And in general it’s smaller than its string representation, which may also help sometimes.

Classes and Protocols are uniqued the same way

Classes and protocol also get unique IDs by their hash, exactly like selectors. Protocols are a little different, as they are just a hash (and nothing more) at runtime.

Continued to mulle-objc: object layout, retain counting, finalize