« March 2007 | Main | May 2007 »

April 2007 Archives

April 9, 2007

The Mullocator - Part III - Testing Boundaries

Fixing point #4 in the most straightforward way, is to use an indirection. We keep a current block and if we exhaust it, we grab a new one from the OS.

This poses the question how to get memory from the OS. We can't use malloc obviously, because we're trying to replace it. In the old unix days one used sbrk and brk. It still can be used today - maybe I should use it :) - but instead we are going to use the slightly more "modern" mmap feature. There is also vm_allocate which can be used to allocate memory under Mach. I prefer to use mmap because it's a little bit more universal and at least a few years ago, also faster. The actual implementation details are for another installment. I weasel my way out of the predicament by using a os_alloc function, that will call the required OS function. os_alloc will be written later.

Because malloc(0) is guaranteed by convention to return a unique pointer and not NULL (if memory is available), let's also check for that and align the size at the same time.

Also we now support errno. It should be set to ENOMEM if memory is exhausted.

static inline void  *malloc( size_t size)
{
   extern long  *memory;
   extern long  *sentinel;
   extern long  *buf;
   void         *p;
   long         *q;
   size_t       len;

   size += (size == 0);
   size  = (size + sizeof( long) - 1) / sizeof( long);

   q = &buf[ size];  
   if( q > sentinel)
   {
       len    = size < BLOCK_SIZE_LONGS ? BLOCK_SIZE_LONGS : size;
       memory = os_alloc( len);
       if( ! memory)
       {
           errno = ENOMEM;  // yes should be setting errno..
           return( NULL);
       }
       sentinel = &memory[ len];
       buf      = memory;
       q        = &buf[ size];  
   }

   p   = buf;
   buf = q;
   return( p);
}
Although this code does solve problem #4, it also introduces a few problems.
  • The BLOCK_SIZE_LONGS constant is an arbitrary choice. If we set it to 0x1000000 / sizeof( longs) we haven't gained anything compared to the code before. If we set it to 1, we'll be calling os_alloc for every malloc call. The truth is somewhere in the unknown middle.
  • The old memory block is immediately forgotten once we cannot satisfy a memory request. This is very wasteful. In a call sequence of
        for(;;)
        {
             malloc( 1);
             malloc( BLOCK_SIZE_LONGS * sizeof( long));
        }
    
    we'd be burning memory at almost twice the rate necessary. This is probably too wasteful.
It's good to be aware of these problems. We'll suspend this stories thread for now and turn the attention to free in the next installment.

April 16, 2007

The Mullocator - Part IV - Testing Boundaries

As we'll see soon, the complicated part of malloc isn't actually malloc, it's free. Let's look at the code from Part I again, for simplicity.
static inline void  *malloc( size_t size)
{
   extern long  memory[];
   extern long  *sentinel;
   extern long  *buf;
   void         *p;
   long         *q;
   
   /* assume the compiler does reduce that to shifting, it will... */
   q = &buf[ (size + sizeof( long) - 1) / sizeof( long)];  
   if( q > sentinel)
      return( NULL);

   p   = buf;
   buf = q;
   return( p);
}
We have a huge memory block, where we are cutting of pieces and returning them. There is ONE extremely easy free operation, and that is if the last recently allocated block of memory is returned. We could just bump the buf pointer down to it's previous value and that memory could be reused again. (It'd be easy if we would remember something about it...)

main()
{
    long   *p;

   p = malloc( 1848 * sizeof( long));
   free( p);
}
Before the malloc call, buf will be pointing at memory. Afterwards buf is now pointing to &memory[ 1848]. If the address of memory (now contained in p) gets freed, we could and would want to shrink buf back to memory.
void free( void *p)
{
   if( p == memory)
      buf = memory;
}

The problem is we can't do that. We do not know if p is the most recently allocated block, because we don't remember it. As a simple example like this shows:

main()
{
    long  *p, *q;

   p = malloc( 1 * sizeof( long));    // but now &memory[ 1]
   q = malloc( 1 * sizeof( long));    // but now &memory[ 2]
   free( p);                          // but now &memory[ 0] *BAD*
   p = malloc( 2 * sizeof( long));
   
   assert( p >= q);
   assert( &p[ 2] <= q);  // crash
}
Which leads to the conclusion that we need to store the size of a memory block if we want to be able to free it again. It doesn't matter if we save the end address or the length of the block, because we can compute either from the other.

The more interesting question is where do we save the length of the block ?

This open question concludes the first four parts. We now have a feeling for what would be the fastest malloc and free speed. What we don't really know is, what kind of real life stress malloc faces. That's for the next series, where we actually make some empirical testing and things become a little more interesting...

Hopefully :)

Continue reading "The Mullocator - Part IV - Testing Boundaries" »

April 18, 2007

MacFUSE + sshfs :: Truely Sucks

This is a follow up to http://www.mulle-kybernetik.com/weblog/2007/02/macfuse_sshfs_really_great.html.

The problem in real life day to day operationis, it just hangs too often. And when it hangs, it haaaangs. It happened way too often that I got stuck saving a file I was editing. The way to recover was to reboot. That's bad.

I went into the affair goodnatured and with high hopes. But the day to day praxis shows, this ain't working.

April 22, 2007

ASUS P5B-E PLUS : I can't recommend it

The board runs stable and fast. The BIOS got a lot of tweaking options. Fine....

But the drivers are HELL. The JRAID driver is a piece of shit, that I wasted quite some time on trying to get it to work. That's until I read in some forums, that I am better off NOT installing it over the default Windows drivers.

The SoundMAX driver is just hilarious. At first everything was working. Then unplugged my headphones and plugged my headset in. The software "smartly" detected a new device connected to the soundcards in- and ouputs and asked me what it was. I answered "microphone" for the pink jack. When I was asked for the green jack, I answered "headphones". That was incorrect according to the software, which told me I should connect the phones to the "green" jack. It then decided to turn off that jack completely!

Motherf@cker!

There was NO way to tell the driver to turn that output on again. Finally I reinstalled the driver and I got sound again. Hurray.

The problem is now, the microphone stopped working. And I am not getting it back, by reinstalling the software....

April 24, 2007

The Mullocator - Part V - Empirical Investigation

This part is largely driven by some empirical findings and some general musings. Again some of this stuff may seem a little too easy, but lots of little easy parts make up the complexity. Forget one and you won't be getting the complete picture.
  1. malloc free calls come in pairs: (We ignore that you can call free( NULL) and nothing bad happens) There will always be at most as many free calls as there are malloc calls. In some applications, there may be significantly less free calls, but on average the statistical expectation is, that one malloc has one free call.
  2. malloc is a little more important than free: Let's assume an "algorithmic cost" for an malloc/free pair. Would it make sense to cannibalize free's performance in order to improve malloc's speed ? I believe yes. In a Cocoa program for example, at the start of the Runloop, there will be allocations, that are then recorded in an NSAutoreleasePool and then freed. The application would appear more responsive, because the output would appear sooner, and the freeing operation would be done statistically more often at an idle time. It would most likely not make sense to exploit this to the point, where the "algorithmic cost" of the sum increases in favor of malloc alone.
  3. Handling small allocations well is more impotant than large ones: This is because of two insights. a) Filling a large memory block with data is slower than a small one. Obviously. The larger the block gets, the more insignificant malloc/free times become. b) there are more small allocations than large allocations, because memory space is finite. So distributed over many applications using malloc we would naturally expect more small allocations than larger ones.

Empirical Data

These measurements are now four years old, but that shouldn't matter...

Here to the left side are the statistics for a sample session in Sketch.app (The Apple developers example drawing application) running a custom CFAllocator and on the right side are the statistics from a gawk benchmarking program, that ran for a minute or so.

mallocstotal
all451476
<= 8 bytes 244
<= 16 bytes 35286
<= 32 bytes 270517
<= 64 bytes 79047
<= 128 bytes 32771
<= 256 bytes 14944
<= 1024 bytes 5987
<= 4096 bytes 6394
<= 16384 bytes 6136
<= 65536 bytes 116
mallocs > 65536 bytes 34
 
mallocstotal
all 9032868
<= 8 bytes 721818
<= 16 bytes 510126
<= 32 bytes 4080271
<= 64 bytes 570480
<= 128 bytes 1260028
<= 256 bytes 1260015
<= 1024 bytes 630127
<= 4096 bytes 3
<= 16384 bytes 0
<= 65536 bytes 0
> 65536 bytes 0

The data nicely supports the aforementioned claims.

Another intersting statistic, that as far as I know is unique :), is the lifetime of a malloced block. A lifetime is the number of other malloc and free calls that occurred between a block's malloc and free. This is a useful indicator to see how shortlived memory allocations statistically are. Again Sketch.app on the left and gawk on the right.

total lifetimecount
1173617
2 18340
<= 4 42944
<= 8 51151
<= 16 58868
<= 32 24687
<= 64 10591
>=64 2407
 
total lifetimecount
1 2460218
2 660015
4 1050252
8 1080404
16 30097
32 378
64 240355
>=64 3510530

So at the end of this installment, we are armed with the two findings that allocations are usually fairly small (less than 256 bytes) and usually pretty shortlived. Usually not much more than 32 malloc, free calls inbetween allocation and deallocation. That's something to exploit!

About April 2007

This page contains all entries posted to Nat!'s Web Journal in April 2007. They are listed from oldest to newest.

March 2007 is the previous archive.

May 2007 is the next archive.

Many more can be found on the main index page or by looking through the archives.

Powered by
Movable Type 3.34