|
Lameness Disclaimer: All this is written to the best of my knowledge. Corrections, additions etc. are certainly welcome.
Optimizing Foundation ClassesEnough of the theory from the previous articles, lets get down to some practical optimizations. This article only covers Obj-C/Foundation specific optimizations this time. Some generic optimizatons like "improving the algorithm" or loop unrolling will not be mentioned. The previous articles are prerequisite knowledge!
Into the Code
|
At this point you should download the Example. Compile it and run it. Since it is a project made with ProjectBuilderWO you will have to import it into the new Project Builder, if you don't have ProjectBuilderWO installed. Run each tests multiple times to let the system "warm up". The lowest timing returned counts. Turn off iTunes and other Apps with some - maybe just periodic - activity (like Mail.app polling for email). |
The program reads a UNIX text file, breaks the contents up into an array of lines and then reverses the order of the lines. To do that the code uses an reverse enumerator to access the lines in the array from back to front and to store them in another array to return. This will be the part that will be optimized.
result = [NSMutableArray arrayWithCapacity:[lines count]]; rover = [lines reverseObjectEnumerator]; while( s = [rover nextObject]) [result addObject:s]; return( result);
Running this over 4233000 lines on a Cube 450Mhz took 9.5 seconds. The task: reduce these 9.5 seconds as much as possible.
[ExampleApp: Unoptimized]
IMP address; adress = [someObject methodForSelector:@selector( someMethod:someOtherArgument:)]; adress( someObject, @selector( someMethod:someOtherArgument:), argument1, argument2);
if you do this for one call only it will obviously be even SLOWER than calling [someObject someMethod:argument1 someOtherArgument:argument2], but the payoff will come if you are calling this method repeatedly, e.g. in a loop.
So in the code above there are two method calls in the loop, which will be replaced with IMP function calls.
The code will change like this.
IMP impNextObject; IMP impAddObject; result = [NSMutableArray arrayWithCapacity:[lines count]]; rover = [lines reverseObjectEnumerator]; impAddObject = [result methodForSelector:@selector( addObject:)]; impNextObject = [rover methodForSelector:@selector( nextObject)]; while( s = impNextObject( rover, @selector( nextObject))) impAddObject( result, @selector( addObject:), s); return( result);The result is 8.5 seconds. That's an improvement of more than 10% which is nice but not really crushing.
In terms of Objective-C there is nothing left to be done at this stage. Now one has to consider whether there isn't something one can do to make Foundation move faster.
for( i = n, j = 0; --i >= 0; j++) result[ j] = lines[ i];Coding up a test case for this, the result is fairly exciting. The test runs in a fraction of a second!
There are some quality differences between what Foundation does and what the simple C version does. These are the enhancements of the Foundation version, that you get for free and that the C code does not provide
Of the 1481 samples spent in imped - the optimized reordering routine - over 90% are spent in NSMutableArray's addObject:! Surprisingly only 4% are spent in the enumerators nextObject method. Obviously NSMutableArray's addObject: method will be the focus of the next optimization step.
@interface PrivateMutableArray : NSMutableArray { id *pointers_; unsigned int size_; unsigned int nPointers_; } @end @implementation PrivateMutableArray - (id) initWithCapacity:(unsigned int) n { [super init]; pointers_ = malloc( sizeof( id) * n); nPointers_ = 0; size_ = n; return( self); } - (void) dealloc { unsigned int i; for( i = 0; i < nPointers_; i++) [pointers_[ i] release]; free( pointers_); [super dealloc]; } - (void) grow { unsigned int newsize; if( (newsize = size_ + size_) < size_ + 4) newsize = size_ + 4; if( ! (pointers_ = realloc( pointers_, newsize * sizeof( id)))) [NSException raise:NSMallocException format:@"%@ can't grow from %u to %u entries", self, size_, newsize]; size_ = newsize; } - (void) addObject:(id) p { if( ! p) [NSException raise:NSInvalidArgumentException format:@"null pointer"]; if( nPointers_ >= size_) [self grow]; self->pointers_[ self->nPointers_++] = [p retain]; } - (unsigned int) count { return( nPointers_); } - (id) objectAtIndex:(unsigned int) i { if( i >= nPointers_) [NSException raise:NSInvalidArgumentException format:@"index %d out of bounds %d", i, nPointers_]; return( pointers_[ i]); } @endThe reader will notice, that NSObject and retain/release compatibility has been kept. Also there is boundary checking implemented and the dynamic growth of the array.
There are a few methods lacking for a proper NSMutableArray subclass, namely insertObject:atIndex:, removeLastObject, removeObjectAtIndex:, replaceObjectAtIndex:withObject:. They have not been implemented for simplicity. Here we are optimizing strictly for addObject: performance. It would be interesting and tricky to optimize for both insertObject:atIndex: and addObject: performance, maybe a topic for another article. Straightforward implementations of the leftout methods would result in a fully functional NSMutableArray subclass that is much faster in addObject: and replaceObjectAtIndex:withObject: and removeLastObject performance and probably much slower in executing removeObjectAtIndex: or insertObject:atIndex:.
Putting this through its paces, it finishes in 3s. Not a bad improvement over 9 seconds, and it wasn't a very tricky subclass that was used, but rather a straightforward implementation. No reason to rest on laurels, since it is still about a 100 times slower than the C-code.(3)
[ExampleApp: PrivateMutableArray]
The time spent in addObject: has been reduced to 62%. The enumeration still takes up less than 10% of the loop. The rest of the time is spent in creation of the enumerator 16%, surprisingly large, and the creation of the array and the code of the actual function.
addObject: then spends 60% of its time in retain and 20% calling it through objc_msgSend.
One should not completely do away with the retain, because that would break the NSMutableArray specification (2).
There is no way for the normal developer to enhance the retain routines of all existing and future classes. The only power that exists that could do this is Apple. One might pursue the political route and ask Apple to optimize their library code. In a perfect world that could work...
This implementation is only safe in singletreaded applications. Another article explains how to do this fast and thread safe
@interface PrivateInlineRefCountingString : PrivateString { unsigned int refCountMinusOne_; } @implementation PrivateInlineRefCountingString - (id) retain { refCountMinusOne_++; return( self); } - (void) release { if( ! refCountMinusOne_) { [self dealloc]; return; } refCountMinusOne_--; } - (unsigned int) retainCount { return( refCountMinusOne_ + 1); } @endThe problem of this approach is that one has to recode all the NSString functionality one needs, because care must be taken that one is only dealing with instances of PrivateInlineRefCountingString. Any method that returns a NSString, is likely to return a Foundation NSString and not an instance of PrivateInlineRefCountingString. This recoding is done in the class PrivateString not shown here, but of course available in the example code.
Writing a NSString subclass properly means a huge amount of effort, because one will lose a lot of Foundations optimizations, that you used to get from the built-in Foundation subclasses. A NSString subclass only needs to implement length and characterAtIndex:. Therefore if you as an example do not implement your own substringWithRange: there will be a lot of characterAtIndex: calls from the generic implementation and your NSString subclass will be very very slow. A optimized variant should use the internal representation and generate the substring directly. Taking a look at the NSString specification shows, that there is a lot of routines to implement...
Anyway putting the subclass to work, the payoff of the custom NSString subclass is pretty good. With this optimization another second could be knocked off to a total of less than 2 seconds. [ExampleApp: PrivateInlineRefCountingString]
Since PrivateInlineRefCountingString does it's own reference counting, using CFretain will actually hurt our performance, because CFRetain will have to call it's retain method with an Objective-C method call, so the call of CFRetain is unneeded overhead.
n = [lines count]; result = [PrivateUnsafeMutableArray arrayWithCapacity:n]; for( i = n; --i >= 0;) { s = PrivateUnsafeMutableArrayObjectAtIndex( lines, i); PrivateUnsafeMutableArrayAddObject( result, s); }PrivateUnsafeMutableArray is a subclass of PrivateMutableArray that provides two inline C functions for instance variable access. Check out the previous article if you are unfamiliar with the (struct { @defs( Class) } *) self) "pattern".
#import "PrivateMutableArray.h" @interface PrivateUnsafeMutableArray : PrivateMutableArray { } @end #define PrivateUnsafeMutableArrayCast( self) \ ((struct { @defs( PrivateUnsafeMutableArray) } *) self) static inline void PrivateUnsafeMutableArrayAddObject( PrivateUnsafeMutableArray *self, id p) { if( PrivateUnsafeMutableArrayCast( self)->nPointers_ >= PrivateUnsafeMutableArrayCast( self)->size_) [self grow]; PrivateUnsafeMutableArrayCast( self)-> pointers_[ PrivateUnsafeMutableArrayCast( self)-> nPointers_++] = [p retain]; } static inline id PrivateUnsafeMutableArrayObjectAtIndex( PrivateUnsafeMutableArray *self, unsigned int index) { return( PrivateUnsafeMutableArrayCast( self)->pointers_[ index]); }With this the running time is halved again.
The final sampler output shows, that the majority of time is spent in the Obj-C message dispatch, the one dynamic retain call one can't optimize away. The reference counting percentage has decreased to a mere 10%!
So in the end a 10 fold increase rolling custom Foundation subclasses was achieved, which is much more than I was expecting when I started writing this article.
Writing custom subclasses of Foundation classes is a good method to gain plenty of speed. Obvious candidates are the "non-technical" Foundation classes like NSNumber, NSDictionary, NSString, NSArray etc, especially when you need only a certain part of the functionality provided by Foundation and can optimize for that. Subclassing NSTask for performance reasons is most certainly a waste of time. Be careful though that you aren't deoptimizing certain other functionality with your subclass, as your code might leave out optimizations provided by "hidden" Foundation subclasses, whereas your subclass might fall back on the generic functionality.
(2) If one was to create a NSMutableNonRetainingArray that would not retain its objects, one should at least proclaim an interface where objects have to be retained before adding and will be released upon destruction of the array. Otherwise the risk of leaks will be very high.
This would only give a benefit, if the objects to be added have been freshly created and not autoreleased.
Check out PrivateMutableNonRetainingArray in the Example code.
(3) Another possible reason, why addObject: performs comparatively poorly could be that Foundation ignores the capacity parameter in the arrayWithCapacity: call and then needs to reallocate their internal array often, when the array is growing. From the class sepecification it is free to do that.
There is no way to influence this behaviour though.