Nat!

Zerofilling - Part IV - New and Improved Application

Here's a new version of the application, quite a bit nicer than the previous one. Here are the highlights:

  • Use mmap/munmap instead of vm_allocate/vm_deallocate for memory allocation (Thanks to Jim Magee for this)
  • It is more apparent which fields are editable and which aren't.
  • There is an option to loop the benchmark to stabilize the values (Apple-L)
  • You can have more than one window, but you can't close them :)
Download
Binary MulleZerofillCalc.app.tgz
Source MulleZerofillCalc.src.tgz

The mmap feature is the big one in this release. Allocating memory with mmap is much faster than with vm_allocate, as you will find out. On my Cube 450, mmap is almost twice as fast as vm_allocate for pure allocation and deallocation. The actual page fault and zerofill costs are, from my observations, virtually :) unaffected. So while this is certainly of great interest, it doesn't in itself change the comments made in part III.

Operation          mmap   vm_allocate
1p alloc            9.7          17.6
1p alloc + fault   41.0          51.5
2p alloc           11.1          18.9
2p alloc + fault   65.9          77.3
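
For readers who want to try the "alloc" versus "alloc + fault" distinction themselves, here is a minimal sketch of the mmap side of such a measurement. It is not taken from the MulleZerofillCalc source: the helper names alloc_cycle and avg_ns are made up for illustration, and the timing uses clock_gettime, which is an assumption about what is available on your system.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS and clock_gettime on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* BSD / Mac OS X spelling */
#endif

/* Hypothetical helper (not from the application's source): map `pages`
 * anonymous pages, optionally write to each one to force a zerofill
 * page fault, then unmap. Returns 0 on success, -1 if mmap or munmap
 * fails, -2 if a freshly mapped page was not zero. */
static int alloc_cycle(size_t pages, int touch)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t len  = pages * page;
    int    rc   = 0;

    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    if (touch) {
        volatile char *p = base;
        for (size_t i = 0; i < pages; i++) {
            if (p[i * page] != 0)    /* a fresh anonymous page must read as zero */
                rc = -2;
            p[i * page] = 1;         /* the write forces the page to be faulted in */
        }
    }

    if (munmap(base, len) != 0)
        return -1;
    return rc;
}

/* Average wall-clock nanoseconds per cycle over `loops` runs,
 * or -1.0 if any cycle fails. */
static double avg_ns(size_t pages, int touch, int loops)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < loops; i++)
        if (alloc_cycle(pages, touch) != 0)
            return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double) (t1.tv_sec - t0.tv_sec) * 1e9
              + (double) (t1.tv_nsec - t0.tv_nsec);
    return ns / loops;
}
```

Calling avg_ns(1, 0, n), avg_ns(1, 1, n), avg_ns(2, 0, n) and avg_ns(2, 1, n) roughly corresponds to the four rows of the mmap column. The table does not state its units, so don't expect the absolute numbers to be comparable.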

If you want to know why mmap is faster, read on...
Excerpt from a mail from Jim Magee:
The Mach version of the call is an RPC into the kernel targeting whatever task/map happened to be passed in. Almost always, it is the current map. But the API (and the transport mechanism to get to the API) can't assume that. So, it has to do all the dirty work of adding Mach port rights to the destination port for this message, looking up a per-thread reply port and adding a reply port right, formatting and sending a message to that port, validating and atomically translating that destination port send right into a reference on a vm_map_t, and then finally making the vm_allocate() call in the kernel. Some of the same overhead kicks in on the reply (tear down the temporary map reference, build a reply message, etc...). This all adds up to a bit of IPC-related overhead.

When you use the BSD API, it always refers to the map that the current thread is running in. Since each thread already holds a reference against their own address map, there is no need to mess around with trying to add a reference for the duration of this call. Just take the trap arguments, grab the cached reference on the map, and make the call. Copyout a single reply argument and away you go.
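
The difference Jim describes is visible in the call signatures alone. The following is a rough sketch, not code from the application: the wrapper names bsd_alloc, bsd_free, mach_alloc and mach_free are invented here, and the Mach half is guarded so that it only compiles on a system that ships <mach/mach.h>.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* expose MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON   /* BSD / Mac OS X spelling */
#endif

/* BSD path: there is no task or map argument. The call always acts on
 * the address map of the calling thread. */
static void *bsd_alloc(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static int bsd_free(void *p, size_t len)
{
    return munmap(p, len);
}

#ifdef __APPLE__
#include <mach/mach.h>

/* Mach path: the target task is an explicit argument. Here it is
 * mach_task_self(), but the API cannot assume that, which is where
 * the port and message handling described above comes from. */
static void *mach_alloc(size_t len)
{
    vm_address_t  addr = 0;
    kern_return_t kr   = vm_allocate(mach_task_self(), &addr,
                                     (vm_size_t) len, VM_FLAGS_ANYWHERE);
    return kr == KERN_SUCCESS ? (void *) addr : NULL;
}

static int mach_free(void *p, size_t len)
{
    kern_return_t kr = vm_deallocate(mach_task_self(),
                                     (vm_address_t) p, (vm_size_t) len);
    return kr == KERN_SUCCESS ? 0 : -1;
}
#endif
```

Both variants hand back zero-filled memory. The extra mach_task_self() argument is the only visible hint of the additional work done on the Mach side.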