Nat!


Senior Mull.


mulle-objc: tagged pointers, boon or bane ?

Continued from mulle-objc: fast methods make mulle_objc_object_call even faster.

Something that I've always passed over in the previous articles is the function _mulle_objc_object_get_isa. Let's have a look at it:

static inline struct _mulle_objc_class   *_mulle_objc_object_get_isa( void *obj)
{
   unsigned int                 index;
   struct _mulle_objc_runtime   *runtime;

   // index 0 is the expected case: a conventional object with a header
   index = _mulle_objc_object_get_taggedpointer_index( obj);
   if( __builtin_expect( ! index, 1))
      return( _mulle_objc_objectheader_get_isa( _mulle_objc_object_get_objectheader( obj)));

   // tagged pointer: the class comes from a table kept in the runtime
   runtime = mulle_objc_inlined_get_runtime();
   return( runtime->taggedpointers.pointerclass[ index]);
}

We can see that there is the "classical" way of getting the isa from self, by reading the Class pointer from an offset from self:

     return( _mulle_objc_objectheader_get_isa( _mulle_objc_object_get_objectheader( obj)));

The other parts of the code deal with tagged pointers, or TPS for short.

The scheme used by _mulle_objc_object_get_taggedpointer_index is the following:

Architecture   Bitmask
32 bit         0x3
64 bit         0x7

See: mulle_objc_taggedpointer.h for more details.

If the value of self ANDed with the bitmask is zero, then self is a conventional object. Any other value indicates a tagged pointer. Classes for TPS are stored in the runtime. And here we have the first problem. In mulle-objc the runtime is usually accessed via the class, but we don't have the class yet.
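To make the check concrete, here is a minimal sketch of the masking step. All `sketch_` names are made up for illustration and are not the runtime's API; the real definitions are in mulle_objc_taggedpointer.h. The sketch assumes the index sits in the low bits of the pointer, which is what the bitmasks in the table suggest, and that the payload is simply shifted above them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not the mulle-objc API.
   Mask per the table: 0x3 for 4-byte pointers, 0x7 for 8-byte pointers. */
static inline uintptr_t   sketch_tag_mask( void)
{
   return( sizeof( void *) == 4 ? 0x3 : 0x7);
}

/* Number of low bits taken up by the index (assumption: payload sits above) */
static inline unsigned int   sketch_tag_shift( void)
{
   return( sizeof( void *) == 4 ? 2 : 3);
}

/* 0 means "conventional object", anything else is a TPS class index */
static inline unsigned int   sketch_get_taggedpointer_index( void *obj)
{
   return( (unsigned int) ((uintptr_t) obj & sketch_tag_mask()));
}

static inline int   sketch_is_taggedpointer( void *obj)
{
   return( sketch_get_taggedpointer_index( obj) != 0);
}

/* Build a tagged pointer from a payload and a non-zero class index */
static inline void   *sketch_make_taggedpointer( uintptr_t payload,
                                                 unsigned int index)
{
   assert( index != 0 && index <= sketch_tag_mask());
   return( (void *) ((payload << sketch_tag_shift()) | index));
}
```

A properly aligned heap or stack object has zero low bits, so it yields index 0 and takes the "classical" path. Anything built with `sketch_make_taggedpointer` yields its index instead.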

Getting the TPS class

There are two possible configurations for the runtime: global and thread-local. Global is the default. In this case the runtime is stored in a global variable. Access to it is assumed to be reasonably fast, but it's still another overhead incurred on every method call. In the thread-local case, though, the runtime is retrieved via mulle_thread_tss_get, which does a pthread_getspecific on many platforms.
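The difference between the two configurations can be sketched as follows. This is not the runtime's code: `struct sketch_runtime` and the `sketch_` functions are placeholders, and the thread-local variant calls pthread_getspecific directly, where the runtime goes through mulle_thread_tss_get.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct _mulle_objc_runtime */
struct sketch_runtime
{
   int   dummy;
};

/* global configuration: the runtime is one plain global variable */
static struct sketch_runtime   sketch_global_runtime;

static inline struct sketch_runtime   *sketch_get_runtime_global( void)
{
   return( &sketch_global_runtime);
}

/* thread-local configuration: every access is a key lookup */
static pthread_key_t   sketch_runtime_key;

static int   sketch_runtime_key_init( void)
{
   return( pthread_key_create( &sketch_runtime_key, NULL));
}

static int   sketch_set_runtime_threadlocal( struct sketch_runtime *runtime)
{
   assert( runtime != NULL);
   return( pthread_setspecific( sketch_runtime_key, runtime));
}

static inline struct sketch_runtime   *sketch_get_runtime_threadlocal( void)
{
   return( pthread_getspecific( sketch_runtime_key));
}
```

The global getter compiles down to taking an address, while the thread-local getter is a function call that the compiler usually cannot inline.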

Now pthread_getspecific is very fast. Here the call jumps through a stub into a function that is just two instructions long:


->  0x100000f28 <+0>: jmpq   *0xe2(%rip)               ; (void *)0x00007fff86670d4c: pthread_getspecific

->  0x7fff86670d4c <+0>: movq   %gs:(,%rdi,8), %rax
    0x7fff86670d55 <+9>: retq

Still, calling a shared library function could put even more of a damper on the proceedings. But none of this has really been benchmarked so far.

pthreads really should provide an inline function for pthread_getspecific.

Pros and Cons of TPS

What can we fit into a TPS ? Small strings, such as those in mulle_char5_t encoding, for example.

A standard object in mulle-objc has a guaranteed footprint of at least 2 * sizeof( uintptr_t), which translates on 64 bit to 16 bytes. This memory is used for the retain-count and isa. Now add the data required for the characters. In an app that holds 16 M unique strings of 7 ASCII characters each, that is 256 MB overhead for a payload of about half the size.
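The arithmetic behind these figures, as a small sketch. The function names are invented for illustration, and "16 M" is taken to mean 16 * 2^20 objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-object header: retain count plus isa, one pointer-sized word each */
static inline size_t   sketch_header_size( size_t word_size)
{
   return( 2 * word_size);
}

/* Header bytes spent on n_objects conventional objects */
static inline uint64_t   sketch_total_overhead( uint64_t n_objects,
                                                size_t word_size)
{
   assert( word_size == 4 || word_size == 8);
   return( n_objects * sketch_header_size( word_size));
}
```

With a word size of 8 and 16 * 2^20 objects this gives 256 MB of headers, against 7 * 16 * 2^20 bytes = 112 MB of actual characters.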

With tagged pointers you can eliminate this overhead, if the strings fit the TPS encoding. Creating a TPS object is also cheaper than creating a conventional object, since you don't call malloc. Retain and release are very cheap too, as they are NOPs.
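As an illustration of how a short string can live entirely inside a pointer-sized value, here is a sketch for the 64 bit case. It is explicitly not the mulle_char5_t encoding: it stores plain 8 bit ASCII, and every name in it is invented for this example. It uses a uint64_t instead of a real pointer so that it behaves the same on any platform.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_TAG_BITS    3     /* 64 bit: bitmask 0x7 */
#define SKETCH_TAG_MASK    0x7
#define SKETCH_MAX_CHARS   7     /* 7 * 8 = 56 payload bits, 61 available */

/* Pack up to 7 ASCII chars plus a class index into one 64 bit value.
   Returns 0 if the string is too long or contains non-ASCII bytes. */
static uint64_t   sketch_pack_string( const char *s, unsigned int index)
{
   uint64_t        bits;
   size_t          i, len;
   unsigned char   c;

   assert( index >= 1 && index <= SKETCH_TAG_MASK);

   len = strlen( s);
   if( len > SKETCH_MAX_CHARS)
      return( 0);

   bits = 0;
   for( i = 0; i < len; i++)
   {
      c = (unsigned char) s[ i];
      if( c > 0x7F)
         return( 0);
      bits |= (uint64_t) c << (8 * i);
   }
   return( (bits << SKETCH_TAG_BITS) | index);
}

/* Unpack into buf; a zero byte in the payload ends the string.
   Returns the string length. */
static size_t   sketch_unpack_string( uint64_t value,
                                      char buf[ SKETCH_MAX_CHARS + 1])
{
   uint64_t   bits;
   size_t     i;

   bits = value >> SKETCH_TAG_BITS;
   for( i = 0; i < SKETCH_MAX_CHARS && (bits & 0xFF); i++)
   {
      buf[ i] = (char) (bits & 0xFF);
      bits  >>= 8;
   }
   buf[ i] = 0;
   return( i);
}
```

No memory is allocated anywhere, which is why creation is cheap and retain/release have nothing to do. A string that does not fit falls back to a conventional object.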

The big downside of TPS is that it slows down method calls on all other, non-TPS objects. The carefully crafted inlinable code section of mulle_objc_object_call now suddenly grows by quite a bit. This might make first stage inlining prohibitive. But if we remove this inlining, we will slow down other objects even more.

So Boon or Bane ?

I don't know!

My gut feeling is that TPS will pay off in most programs. Currently the compiler compiles with TPS by default. This will define __MULLE_OBJC_TPS__ (in a "future" version). The runtime checks this and adds the TPS-related code.

You can turn off the generation of tagged pointers with -fno-objc-tps. Since you cannot mix TPS with non-TPS code, the runtime checks that you don't load classes with mixed settings.
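How such a consistency check could look is sketched below. Only the macro __MULLE_OBJC_TPS__ comes from the text; the struct, its field, and the function are assumptions made for this example and do not describe the runtime's actual load mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Record how this translation unit was compiled */
#ifdef __MULLE_OBJC_TPS__
# define SKETCH_COMPILED_WITH_TPS   1
#else
# define SKETCH_COMPILED_WITH_TPS   0
#endif

/* Hypothetical per-object-file record handed to the runtime at load time */
struct sketch_loadinfo
{
   int   uses_tps;
};

/* Returns 0 if the loaded code matches the runtime's setting,
   -1 if TPS and non-TPS code would be mixed. */
static int   sketch_check_loadinfo( const struct sketch_loadinfo *info)
{
   assert( info != NULL);
   return( info->uses_tps == SKETCH_COMPILED_WITH_TPS ? 0 : -1);
}
```

Each object file would fill in `uses_tps` from its own copy of the macro, so a mismatch shows up as soon as the classes are loaded rather than as a crash on the first method call.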
