Optimized Hashing in Mac OS X

This article starts with an introduction to hashing for the budding coder. Seasoned veterans may amble down to "Unfortunately on Mac OS X, it's not so simple".

You can download my test cases. Since these are projects made with ProjectBuilderWO, you will have to import them into Xcode if you don't have ProjectBuilderWO installed. On pre-Panther Mac OS X, you can install ProjectBuilderWO from the older developer CDs as a custom install option, or from the new developer CD as an optional package in the packages folder.

- (unsigned int) hash

Every instance of NSObject or any of its subclasses (or more exactly, every class that implements the NSObject protocol) responds to the message -hash. To really drive this home: 99% of all classes respond to hash.

One use of the number returned by hash is as a shortcut for the isEqual: message. If a comparison between two objects yields YES, then their hash values are also equal. The reverse is not true though: two unequal objects may very well return the same hash value by chance. This means that comparing hash values is no substitute for a true equality test. But it can very well be used as a quick test to ensure that two objects are NOT equal. In a loop, for example, you would code it like this:

//
// example code, looking through an array counting
// the number of occurrences of "object" in it
//
- (unsigned int) countOccurrencesOfObject:(id) object
                                  inArray:(NSArray *) array
{
   unsigned int   count;
   unsigned int   hash;
   unsigned int   i, n;
   id             p;

   count = 0;
   hash  = [object hash];
   n     = [array count];
   for( i = 0; i < n; i++)
   {
      p = [array objectAtIndex:i];
      if( [p hash] == hash)
         if( [p isEqual:object])
            ++count;
   }
   return( count);
}

If computing the hash value of an object is much faster than comparing two objects with isEqual:, then this is likely to be faster than the equivalent code without the hash comparison.

But the hash is usually used in a much more clever way, and that's where NSDictionary, NSSet and friends come into play. Instead of sequentially comparing all objects with the hash value, a portion (or all) of the hash value is used as an index to the most likely candidates. Here is an excerpt of a hypothetical class implementing a simple dictionary:

@interface SimpleDictionary : ...
{
   NSMutableArray   *keyBuckets;
   NSMutableArray   *objectBuckets;
}
...

- (void) setObject:(id) anObject
            forKey:(id) aKey
{
   NSMutableArray   *keyBucket;
   NSMutableArray   *objectBucket;
   unsigned int     bucketIdx;

   // Get buckets using the key's hash value
   // (non-optimally, as we will soon see)
   bucketIdx    = [aKey hash] % [keyBuckets count];
   keyBucket    = [keyBuckets objectAtIndex:bucketIdx];
   objectBucket = [objectBuckets objectAtIndex:bucketIdx];

   // Now add the key and object to their buckets.
   // They will end up at the same position in
   // each array.
   [keyBucket addObject:aKey];
   [objectBucket addObject:anObject];
}

- (id) objectForKey:(id) aKey
{
   NSMutableArray   *keyBucket;
   NSMutableArray   *objectBucket;
   unsigned int     bucketIdx;
   unsigned int     i, n;

   // Get buckets using the key's hash value
   bucketIdx    = [aKey hash] % [keyBuckets count];
   keyBucket    = [keyBuckets objectAtIndex:bucketIdx];
   objectBucket = [objectBuckets objectAtIndex:bucketIdx];

   // Search for key in its bucket
   n = [keyBucket count];
   for( i = 0; i < n; i++)
      if( [[keyBucket objectAtIndex:i] isEqual:aKey])
         break;

   // If i equals n we didn't find the key
   // and must return nil
   if( i == n)
      return( nil);

   // Otherwise i is the position of the key and the
   // object in their respective buckets.
   // Return the object
   return( [objectBucket objectAtIndex:i]);
}

This class maintains two arrays of buckets, one for the keys and one for the objects. Each bucket is an NSArray that contains keys or objects.
The graphic shows a very small dictionary where only two buckets each are used for key and object storage. The key and corresponding object are stored at the same index in their buckets. The hash is now used to preselect the bucket to insert into or search. This makes searching much faster, because only a small portion of all keys must be examined.

It is clear that the number of buckets should grow proportionally with the number of objects kept in the dictionary so that performance does not degrade.

Good hash, bad hash

What happens if the hash method is bad and, for example by programmer error, always returns 0? All objects now reside in a single bucket. The loop would then always have to go through the equality check, and it would always perform the (now useless) hash call on top. In the end that would be slower than just testing for equality with isEqual:.

Here is the charted output of a small demo program, which shows how a really bad hash (X=0) is actually worse than no hash (X=-1). Any half-assed hash (X=2) is already very helpful, and things get better as the hash gets better (X=32).

The demo program creates a number of random strings of equal size with a very restricted character set (a and b).

The mutable pitfall
An interesting problem is the addition of mutable instances to a collection. As the hash is often based on the contents of the object, changing the contents (and therefore the hash) of a mutable object while it is a member of a hashing collection will lead to problems:

s   = [NSMutableString stringWithString:@"A"];
set = [NSMutableSet set];
[set addObject:s];
[s appendString:@"B"];
[set addObject:s];
NSLog( @"A  : %d", [set containsObject:@"A"]);
NSLog( @"AB : %d", [set containsObject:@"AB"]);

This code will crash at the second containsObject: call (on 10.2).

Now you know the reason why NSDictionary does not retain its keys but copies them.

The hash X=2 has four possible hash values; the X=32 hash uses the full unsigned int range.

It is therefore important that a hash function has the following properties: equal objects must return equal hash values, and the values should be well distributed over the available range.

[Example hashvaluetest]

Unfortunately on Mac OS X, it's not so simple

Hash values are used in many places in Foundation and CoreFoundation. Most notably of course NSDictionary, where the hash is used to quickly search for the key of a dictionary for the subsequent lookup of its associated value. Then there are also NSSet, NSHashTable, NSMapTable, CFDictionary and probably a bunch of other facilities that serve as collections, lookup tables and so on.

During the development of CoreFoundation, which is now the real brains behind most of Foundation, hashing and its usage were apparently not very well thought through, as becomes apparent in the following example.

Assume that we build a class where every object is unequal to every other. An easy way (not the best, but certainly a perfectly valid one) of doing this is to store a unique number as an instance variable during init, like this:

- (unsigned int) _hash1
{
   static unsigned int   counter;

   ++counter;
   return( counter);
}

- (id) init
{
   [super init];
   hash = [self _hash1];
   return( self);
}

- (unsigned int) hash
{
   return( hash);
}

Another perfectly valid way (without the benefit of hindsight) would be to generate the hash like this:

- (unsigned int) _hash2
{
   static unsigned int   counter;

   ++counter;
   return( ((counter >> 24) & 0xFF)       |
           ((counter >> 16) & 0xFF) << 8  |
           ((counter >>  8) & 0xFF) << 16 |
           ((counter)       & 0xFF) << 24);
}
which reverses the byte order in the hash value.

Both methods create unique unsigned int numbers that should be equally serviceable as hash values. Alas, it is not so! (This is true for Mac OS X versions 10.0 through 10.2.)

Running a test program that inserts 5000 objects into an NSHashTable, the runtime using _hash1 is 0.03 s; the runtime for _hash2 is 7.4 s.

That's 250 times slower!

It is very apparent that CoreFoundation favors the lower bit positions over the higher bit positions; or rather, the higher bit positions aren't used at all. Presumably the code to select a hash bucket is much the same as in the little example dictionary above.

A look into the relevant CFDictionary code reveals that indeed - as feared - only the lower bits are used to index the bucket:

static CFIndex __CFDictionaryFindBuckets1a(CFDictionaryRef dict, const void *key) {
    CFHashCode keyHash = (CFHashCode)key;
    const void **keys = dict->_keys;
    uint32_t mask = dict->_bucketsNum - 1;
    uintptr_t marker = dict->_marker;
    CFIndex probe = keyHash & mask;
    CFIndex probeskip = ((keyHash << 1) + 1) & mask;
    CFIndex start = probe;
    for (;;) {
        uintptr_t currKey = (uintptr_t)keys[probe];
        if (marker == currKey) {             /* empty */
            return kCFNotFound;
        } else if (~marker == currKey) {     /* deleted */
            /* do nothing */
        } else if (currKey == (uintptr_t)key) {
            return probe;
        }
        probe = (probe + probeskip) & mask;
        if (start == probe)
            return kCFNotFound;
    }
}

There are two ways to fix such an "issue". The first is to fold the upper bits of the hash value into the lower bits before indexing:

//
// enter number of bits to fold hash
// code into
//
static unsigned int   fold( unsigned int hash, unsigned int bits)
{
#if PEDANTIC
   int            n;
   unsigned int   result;
   unsigned int   mask;

   result = hash;
   if( bits)
   {
      mask = (1UL << bits) - 1;
      n    = 32;
      do
      {
         hash   >>= bits;
         result  ^= hash & mask;
         n       -= bits;
      }
      while( n >= 0);
   }
   return( result);
#else
   // heuristic, very likely no worse
   return( (hash >> 24) ^ (hash >> 12) ^ hash);
#endif
}

The superior alternative is to use only a prime number of hash buckets and to calculate the index thusly:

   return( hash % n_buckets);

The prime modulo guarantees that all hash bits are used in the index computation.

So, for now, a hash function should also obey the following rule: keep the entropy high in the lowest and lower bits.

A better NSObject hash? A better CFHash?

Disassembling the NSObject hash method, we find that the address of the object is used as its hash value:

rlwinm r3,r3,30,2,31
blr

which retranslates back to

- (unsigned int) hash
{
   return( (unsigned int) self >> 2);
}

That's a good idea, because isEqual: on NSObject tests for address equality. When hash values are compared first, this obviates the need to call isEqual: in most cases: since no two objects share the same memory, different objects are guaranteed to have different hash values. The entropy of the lower bits is pretty good (but not without optimization potential), as objects are typically allocated close to each other in memory. This can be seen in a random sample run of code that did 16 object allocations and output their memory addresses:

4f600 4f520 4fc20 4f580
51040 5bb40 5bb50 5bb60
5bb70 5bb80 5bb90 5bba0
5bbb0 5bbc0 5bbd0 5bbe0

The reason for the shift

The author of the -[NSObject hash] method was undoubtedly aware that objects need to be at least 4 byte aligned on the PPC (because of the leading isa pointer). By that reasoning, any object address will have the lower two bits cleared. Given an inclination towards preferable entropy in the lower bits, this makes for a better hash value.

Since objects are, in general, allocated with malloc, and malloc on Mac OS X guarantees 16 byte alignment on the PPC, the shift count should actually be 4 nowadays (compare this with the example allocation output above):

- (unsigned int) hash
{
   return( (unsigned int) self >> 4);
}

Running a little test case shows the following improvement: 0.9 s versus 1.4 s with the stock -[NSObject hash].

A very interesting experiment is to do no shifting at all. With that, the time rises to 3.7 s!

The bad touch

Now let's have a peek into CoreFoundation to see the code for CFHash:

CFHashCode CFHash(CFTypeRef cf) {
#if defined(DEBUG)
    if (NULL == cf) HALT;
#endif
    CFTYPE_OBJC_FUNCDISPATCH0(CFHashCode, cf, "hash");
    __CFGenericAssertIsCF(cf);
    if (NULL != __CFRuntimeClassTable[__CFGenericTypeID(cf)]->hash) {
        return __CFRuntimeClassTable[__CFGenericTypeID(cf)]->hash(cf);
    }
    return (CFHashCode)cf;
}

As you notice, no shifting is performed whatsoever. CoreFoundation objects that do not implement their own hash function will therefore suffer. For those objects, the performance of dictionary operations is only 25% of what it could be. Fortunately most CFRuntimeClasses, the CoreFoundation equivalent of Objective-C classes, do implement their own hash functions (1).

CFHash's implementation should be of concern to you if you implement your own class based on CFRuntimeClass, outside of the NSObject hierarchy. If you subclass from NSObject, though, you will inherit NSObject's hash method.

A further (academic) step

The implementation of hash that shifts the address value is suboptimal, because the shift zeroes the top bits. Every shift operation therefore destroys a little of the uniqueness of the hash value. It is more appropriate to code the hash function as a rotation instead of a shift:

- (unsigned int) hash
{
   return( ((unsigned int) self >> 4) |
           ((unsigned int) self << (32 - 4)));
}

This code is the best of both worlds, increased entropy in the lower bits and full use of all address bits. (This exercise is largely academic, because due to the expected 16 byte alignment of the memory block only zero bits will be rotated up.)

I was very pleasantly surprised to see, that the gcc compiler was able to compile this (with -O3) into the optimal machine code:

rotlwi r3,r3,28
blr

Put it in a category on NSObject, and get a little Cocoa speedup for free! It remains to be seen if this does anything for overall speed. Please let me know.

[Example nsobjecthashtest]

Some Foundation hash implementations discussed

Let's look at the hash implementations of a few selected, often used and usually numerously instantiated Foundation classes. As these are all class clusters, avoid the temptation of optimizing their hash methods: each subclass's hash needs to stay return-value compatible (for isEqual:) with all other subclasses of the same class cluster. The implementations are those of Mac OS X 10.3; later versions may differ.

NSString/CFString/NSURL

The hash is a convolution of the first and last eight bytes plus the length of the string: basically, the byte values are shifted and added to the string length.

This is a sensible hash for many computer language and natural language strings.
NSURL uses the same hash as NSString. This is a very bad choice, as the first eight and last eight characters of URLs are likely to be very similar, if not identical.

Consequently storing a few thousand URLs in an NSDictionary can already lead to performance problems.

NSNumber/CFNumber

The hash for integers is the absolute value. For doubles it is the absolute value of the integral part plus the absolute value of the fractional part scaled to an integer (e.g. 0.5 * 0xFFFFFFFF).

Sounds good.

NSData/CFData

The hash is the ELF hash over the first 16 bytes

ELF isn't bad, but the implementation could be better (1). In certain cases where much is known about the expected data, a self-written data class can be fruitful, especially if the first 16 bytes are known not to vary much.

NSDate/CFDate/NSCalendarDate

The hash used is floor( [self timeIntervalSinceReferenceDate])

This gives a value in seconds. Pretty much perfect for a hash value.

NSDictionary/CFDictionary

The hash implementation is [self count].

It's not a good idea to put a lot of equal-sized dictionaries into NSSets. An optimization that readily comes to mind would be to use a "primary key": create a subclass of NSDictionary and write hash like this:

- (unsigned int) hash
{
   return( [[self objectForKey:@"key"] hash]);
}

This may not be a good solution, because the hash of an NSDictionary subclass should stay compatible with NSDictionary's hash. Also, the two method calls alone may not make it worth it, as hash is supposed to be fast. If your subclass has direct access to the primary key, go for it.

You might try to put this code into a category on NSDictionary. This would work properly only if no NSDictionary subclasses implement their own hash methods. Even if that is the case, it is still likely to be dependent on the Foundation version and therefore dangerous.

The most robust but also most expensive solution appears to be a wrapper class, subclassed from NSObject, where you store an NSDictionary as an instance variable.

NSArray/CFArray

The implementation is [self count].

It's not a good idea to put lots of equal-sized NSArrays into an NSSet, as the hash will always be the same.

Wrap Up (The Executive Version)

With Mac OS X in its current state, it is doubly and triply important to check your hash method.

Keep entropy (randomness, distribution) high in the lowest and lower bits.

Put a category on NSObject to override hash, and enjoy faster performance on all NSObject-derived classes that do not implement their own isEqual:. If you need a good hash but computing it is expensive, consider caching it in an instance variable.

Do not add mutable objects to hashing collections.


If you want to discuss this article, please do so in this thread in the Mulle kybernetiK Optimization Forum.
(1) As of 10.2 the CoreFoundation classes not implementing hash are: CFUserNotification, CFNull, CFAllocator, CFTree, CFBoolean, CFXMLParser, CFBundle, CFPlugInInstance, CFRunLoop, CFSocket.

A little bonus anecdote for the thorough reader who also reads the footnotes. The hash function of CFData uses the ELF hash algorithm. Unfortunately that algorithm is not implemented in a way that is beneficial to the PPC architecture.

The algorithm itself also has a flaw: the upper nybble of the returned hash value is needlessly always zeroed. Here is a modified ELF hash that preserves the upper nybble and is 30% faster on my Cube with gcc -O3:

#define MULLE_ELF_STEP( B)          \
do                                  \
{                                   \
   H  = (H << 4) + B;               \
   H ^= (H >> 24) & 0xF0;           \
}                                   \
while( 0)

// original ELF_STEP
// #define ELF_STEP(B)  T1 = (H << 4) + B; T2 = T1 & 0xF0000000; \
//                      if (T2) T1 ^= (T2 >> 24); T1 &= (~T2); H = T1

unsigned int   mulle_elf_hash( unsigned char *bytes, unsigned int length)
{
   unsigned int   H = 0;
   int            rem = length;

   while( 3 < rem)
   {
      MULLE_ELF_STEP( bytes[ length - rem]);
      MULLE_ELF_STEP( bytes[ length - rem + 1]);
      MULLE_ELF_STEP( bytes[ length - rem + 2]);
      MULLE_ELF_STEP( bytes[ length - rem + 3]);
      rem -= 4;
   }
   switch( rem)
   {
   case 3 : MULLE_ELF_STEP( bytes[ length - 3]);
   case 2 : MULLE_ELF_STEP( bytes[ length - 2]);
   case 1 : MULLE_ELF_STEP( bytes[ length - 1]);
   case 0 : ;
   }
   return( H);
}

#undef MULLE_ELF_STEP
[Example elfhashtest]

The example source contains another hash function, which I believe is no worse and is about 40% faster than the original.