Nat! bio photo

Nat!

Senior Mull

Twitter Github Twitch

mulle-aba: How it works, part 3

Continued from mulle-aba: How it works, part 2

lock-free data structure “world”

Remember that the CAS instruction and the various atomic increment/decrement instruction operate on register wide data. So how do you solve an atomicity problem, with data that is wider than a register ? Let’s say I have only 64 bit atomic instructions available, how do I atomically increment 128 bit ? You create a 128 bit struct containing two 64 bit fields and reference it by its 64 bit address.

This struct is the world state and the reference to it is the world pointer. Both together form the world. The world pointer has a unique location, there is only one world pointer to a world.

A world state is read only, once it has been referenced by a world pointer. Whenever one needs to evolve the world, a copy is made of the current world state. Then the copy is changed and then you atomically change the 64 bit world pointer using CAS to the copy. If the CAS succeeds the copy has become the new world state. If the CAS fails you just have to do it all over again with the now current world state.

This is an implementation of just such a extremely limited 128 bit world.

struct _mulle_int128_world
{
   unsigned long   hi;
   unsigned long   lo;
};


static inline void   _mulle_int128_increment( struct _mulle_int128_world *p)
{
   if( ! ++p->lo)
     ++p->hi;
}


void   _mulle_int128_atomic_increment( struct _mulle_int128_world **p_world_pointer)
{
   _mulle_int128_world   *new_world;
   _mulle_int128_world   *old_world;

   new_world = malloc( sizeof( struct _mulle_int128_world));
   do
   {
      old_world = _mulle_atomic_read_pointer( p_world_pointer);
      memcpy( new_world, old_world, sizeof( struct _mulle_int128_world));
      _mulle_int128_increment( new_world);
   }
   while( ! _mulle_atomic_compare_and_swap_pointer( p_world_pointer, new_world, old_world));

   // old_world  leaks -> use mulle_aba_free to free it
}

The name world is apt, because it implies, that all values are semantically connected - analogous to the Butterfly effect. Conversely, if the values aren’t connected, it’s by definition not a world.

If the world state could be reconstructed at anytime by any thread, it is not a world but it is a cache. This distinction is important.

World rules

  1. All data in a world state is co-dependent.
  2. There is one and only one world pointer for a world. This implies that at any given time there is also only one current world state. (though threads may operate temporarily on stale world states)
  3. A world state can not be recreated, if lost (otherwise it’s a cache. A cache has different semantics)
  4. A world state, that is or has been referenced by the world pointer is read only until freed (-> ABA!)
  5. A world evolves, by changing the world pointer to a new world state. You only change the world pointer, if the world pointer is still pointing to the assumed previous world state.

If you leave it at that, everything is easy and basically foolproof and lock-free. The rules give a good guideline designing your own world, what to put in and what to leave out.

A cache is different. Since all threads are able to reconstitute the cache contents it is permissible to leave parts of the cache empty and modify them when needed (atomically of course).

What is in the mulle-aba world ?

The world is defined in mulle_aba_storage.h. A world is needed because mulle-aba needs to update the time segment and the “number of threads” atomically.

struct _mulle_aba_world
{
   uintptr_t                             _timestamp;       // current timestamp
   uintptr_t                             _n_threads;       // currently active threads

   // storage for registered threads
   uintptr_t                             _offset;          // _min_timestamp / _mulle_aba_timestamp_storage_n_entries
   unsigned int                          _size;            // storage of timestamp, rc, blocks
   unsigned int                          _n;
   struct _mulle_aba_timestamp_storage   *_storage[];
};
  • _timestamp contains the current index of the time segment. The head of the Snafu worm.
  • _n_threads has the number of registered threads
  • the rest of the world contains information about the configuration of the Snafu worm. The shape of the worm is co-dependent on the _timestamp, so I believe this is correctly still a world.

But then there is the need to optimize…

Use part of the world pointer as world state

mulle-aba assumes that each world is at least 32 bit aligned. That means, that the lower 2 bits of a world state copy are assumed to be always zero.

In mulle-aba threads periodically check in. Whenever that happens, a new time segment needs to be used for the next free operation. In order to avoid allocating a new world state every time that happens, the least significant bit is used as a dirty bit.

The second bit is used for soft-locking. Soft locking blocks concurrent access from threads that register or unregister (but not those that free and checkin as they ignore the soft-lock bit and are free to overwrite the world pointer with new contents).

It’s ABA all over again

The astute reader will notice, that old worlds need to be freed. Of course there is the problem of ABA all over again. But this problem mulle-aba can solve with mulle-aba without going into a recursion. New worlds are most often created during the free operation, the check in only sets the world pointer bit. When a new time segment is opened, the old world will appear on its linked list as a resource to be freed along with the freed resource.

The end

This pretty much concludes the “how it works series” unless someone has questions. :)


Post a comment

All comments are held for moderation; basic HTML formatting accepted.

Name:
E-mail: (not published)
Website: