The Mullocator - Part V - Empirical Investigation

This part is largely driven by empirical findings and some general musings. Again, some of this may seem a little too easy, but lots of little easy parts make up the complexity. Forget one and you won't get the complete picture.

  1. malloc and free calls come in pairs: (We ignore that you can call free( NULL) and nothing bad happens.) There will always be at most as many free calls as there are malloc calls. In some applications there may be significantly fewer free calls, but on average the statistical expectation is that one malloc has one free call.
  2. malloc is a little more important than free: Let's assume an "algorithmic cost" for a malloc/free pair. Would it make sense to cannibalize free's performance in order to improve malloc's speed? I believe yes. In a Cocoa program, for example, allocations made at the start of the run loop are recorded in an NSAutoreleasePool and freed later. The application would appear more responsive, because the output would appear sooner, and the freeing would statistically happen more often at idle time. It would most likely not make sense to exploit this to the point where the combined "algorithmic cost" of the pair increases in favor of malloc alone.
  3. Handling small allocations well is more important than handling large ones: This follows from two insights. a) Filling a large memory block with data takes longer than filling a small one, obviously; the larger the block gets, the more insignificant the malloc/free times become. b) There are more small allocations than large allocations, because memory space is finite. So, distributed over the many applications using malloc, we would naturally expect more small allocations than large ones.

Empirical Data

These measurements are now four years old, but that shouldn't matter...

Below are first the statistics for a sample session in the Apple developer example drawing application, running a custom CFAllocator, followed by the statistics from a gawk benchmark that ran for a minute or so.

Drawing application:

  size             mallocs
  all              451476
  <= 8 bytes       244
  <= 16 bytes      35286
  <= 32 bytes      270517
  <= 64 bytes      79047
  <= 128 bytes     32771
  <= 256 bytes     14944
  <= 1024 bytes    5987
  <= 4096 bytes    6394
  <= 16384 bytes   6136
  <= 65536 bytes   116
  > 65536 bytes    34

gawk:

  size             mallocs
  all              9032868
  <= 8 bytes       721818
  <= 16 bytes      510126
  <= 32 bytes      4080271
  <= 64 bytes      570480
  <= 128 bytes     1260028
  <= 256 bytes     1260015
  <= 1024 bytes    630127
  <= 4096 bytes    3
  <= 16384 bytes   0
  <= 65536 bytes   0
  > 65536 bytes    0

The data nicely supports the aforementioned claims.

Another interesting statistic, one that as far as I know is unique :), is the lifetime of a malloc'ed block. A lifetime is the number of other malloc and free calls that occurred between a block's malloc and its free. This is a useful indicator of how short-lived memory allocations statistically are. Again the drawing application first and gawk second.

Drawing application:

  lifetime   count
  1          173617
  2          18340
  <= 4       42944
  <= 8       51151
  <= 16      58868
  <= 32      24687
  <= 64      10591
  > 64       2407

gawk:

  lifetime   count
  1          2460218
  2          660015
  <= 4       1050252
  <= 8       1080404
  <= 16      30097
  <= 32      378
  <= 64      240355
  > 64       3510530

So at the end of this installment, we are armed with two findings: allocations are usually fairly small (less than 256 bytes) and usually pretty short-lived, with rarely more than 32 malloc and free calls between allocation and deallocation. That's something to exploit!