The Mullocator - Part V - Empirical Investigation

This part is largely driven by empirical findings and some general musings. Again, some of this may seem a little too easy, but lots of little easy parts make up the complexity. Forget one and you won't get the complete picture.

  1. malloc and free calls come in pairs: (We ignore that you can call free( NULL) and nothing bad happens.) There will always be at most as many free calls as there are malloc calls. In some applications there may be significantly fewer free calls, but on average the statistical expectation is that one malloc has one free call.
  2. malloc is a little more important than free: Let's assume an "algorithmic cost" for a malloc/free pair. Would it make sense to cannibalize free's performance in order to improve malloc's speed? I believe yes. In a Cocoa program, for example, allocations made at the start of the run loop are recorded in an NSAutoreleasePool and freed later. The application would appear more responsive, because the output would appear sooner, and the freeing would statistically happen more often at idle time. It would most likely not make sense to exploit this to the point where the combined "algorithmic cost" of the pair increases in favor of malloc alone.
  3. Handling small allocations well is more important than handling large ones: This follows from two insights. a) Filling a large memory block with data takes longer than filling a small one, obviously; the larger the block gets, the more insignificant the malloc/free times become. b) There are more small allocations than large allocations, because memory space is finite. So, distributed over the many applications using malloc, we would naturally expect more small allocations than large ones.

Empirical Data

These measurements are now four years old, but that shouldn't matter...

Below are first the statistics for a sample session in the Apple developer example drawing application, running a custom CFAllocator, followed by the statistics from a gawk benchmark that ran for a minute or so.

Drawing application:

  size             mallocs
  all              451476
  <= 8 bytes       244
  <= 16 bytes      35286
  <= 32 bytes      270517
  <= 64 bytes      79047
  <= 128 bytes     32771
  <= 256 bytes     14944
  <= 1024 bytes    5987
  <= 4096 bytes    6394
  <= 16384 bytes   6136
  <= 65536 bytes   116
  > 65536 bytes    34

gawk:

  size             mallocs
  all              9032868
  <= 8 bytes       721818
  <= 16 bytes      510126
  <= 32 bytes      4080271
  <= 64 bytes      570480
  <= 128 bytes     1260028
  <= 256 bytes     1260015
  <= 1024 bytes    630127
  <= 4096 bytes    3
  <= 16384 bytes   0
  <= 65536 bytes   0
  > 65536 bytes    0

The data nicely supports the aforementioned claims.

Another interesting statistic, one that as far as I know is unique :), is the lifetime of a malloc'ed block. A lifetime is the number of other malloc and free calls that occurred between a block's malloc and its free. This is a useful indicator of how short-lived memory allocations statistically are. Again the drawing application first and gawk second.

Drawing application:

  lifetime   count
  1          173617
  2          18340
  <= 4       42944
  <= 8       51151
  <= 16      58868
  <= 32      24687
  <= 64      10591
  > 64       2407

gawk:

  lifetime   count
  1          2460218
  2          660015
  <= 4       1050252
  <= 8       1080404
  <= 16      30097
  <= 32      378
  <= 64      240355
  > 64       3510530

So at the end of this installment, we are armed with two findings: allocations are usually fairly small (less than 256 bytes) and usually pretty short-lived, with rarely more than 32 malloc and free calls between allocation and deallocation. That's something to exploit!