Zerofilling - Part IV - New and Improved Application
Here's a new version, quite a bit nicer than the previous version. Here are the highlights:
- Use `mmap`/`munmap` instead of `vm_allocate`/`vm_deallocate` for memory allocation (thanks to Jim Magee for this)
- It is more apparent which fields are editable and which aren't.
- There is an option to loop the benchmark to stabilize the values (Apple-L)
- You can have more than one window, but you can't close them :)
Download | |
---|---|
Binary | MulleZerofillCalc.app.tgz |
Source | MulleZerofillCalc.src.tgz |
The `mmap` feature is the biggy in this release. Using `mmap` to allocate memory is much faster than using `vm_allocate`, as you will find out. On my Cube 450, `mmap` is almost twice as fast as `vm_allocate`. That is for pure allocation and deallocation. The actual page fault and zerofill costs are, from my observations, virtually :) unaffected. So while this is certainly of big interest, it doesn't in itself change the comments made in part III.
Operation | mmap | vm_allocate |
---|---|---|
1p alloc | 9.7 | 17.6 |
1p alloc + fault | 41.0 | 51.5 |
2p alloc | 11.1 | 18.9 |
2p alloc + fault | 65.9 | 77.3 |
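The rows above separate pure allocation from allocation plus first touch. Here is a minimal sketch of how such a measurement could be set up for the `mmap` column. It is not the code from MulleZerofillCalc: the helper name `bench_mmap`, the use of `clock_gettime`, and the nanosecond result are assumptions for illustration, and it presumes a reasonably modern POSIX system.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANON and clock_gettime on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS   /* some systems only provide this spelling */
#endif

/* Hypothetical helper: map and unmap `npages` anonymous pages `iterations`
 * times. If `touch` is nonzero, one byte is written to every page, so each
 * page is faulted in (and zero-filled by the kernel) before it is unmapped.
 * Returns the average time per iteration in nanoseconds, or -1.0 on error. */
double bench_mmap(size_t npages, int touch, long iterations)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && npages > 0 && iterations > 0);
    size_t len = npages * (size_t)page;

    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);

    for (long i = 0; i < iterations; i++) {
        void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_ANON | MAP_PRIVATE, -1, 0);
        if (mem == MAP_FAILED)
            return -1.0;

        if (touch) {
            /* volatile keeps the compiler from dropping the stores */
            volatile unsigned char *p = mem;
            for (size_t off = 0; off < len; off += (size_t)page)
                p[off] = 1;      /* first write to a page triggers the fault */
        }
        munmap(mem, len);
    }

    clock_gettime(CLOCK_MONOTONIC, &end);
    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iterations;
}
```

Calling `bench_mmap(1, 0, 100000)` and `bench_mmap(1, 1, 100000)` would roughly correspond to the "1p alloc" and "1p alloc + fault" rows. The absolute numbers depend entirely on the machine and OS, so don't expect them to match the table.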
If you want to know why `mmap` is faster, read on...
Excerpt from a mail from Jim Magee:
The Mach version of the call is an RPC into the kernel
targeting whatever task/map happened to be passed in. Almost
always, it is the current map. But the API (and the transport
mechanism to get to the API) can't assume that. So, it has to do
all the dirty work of adding Mach port rights to the destination
port for this message, looking up a per-thread reply port and
adding a reply port right, formatting and sending a message to that
port, validating and atomically translating that destination port
send right into a reference on a `vm_map_t`, and then finally making
the `vm_allocate()` call in the kernel.
Some of the same overhead kicks in on the reply (tear down the
temporary map reference, build a reply message, etc...). This all
adds up to a bit of IPC-related overhead.
When you use the BSD API, it always refers to the map that the current thread is running in. Since each thread already holds a reference against their own address map, there is no need to mess around with trying to add a reference for the duration of this call. Just take the trap arguments, grab the cached reference on the map, and make the call. Copyout a single reply argument and away you go.
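The difference Jim describes is visible in the call signatures themselves: the Mach call takes the target task as an explicit argument, while `mmap` has no such parameter. The sketch below puts both paths behind one pair of wrappers. The names `region_alloc`/`region_free` and the `USE_MACH_VM` switch are invented here for illustration and are not taken from the application source. The Mach branch is only compiled on Apple platforms when `USE_MACH_VM` is defined; everything else uses `mmap`.

```c
#define _DEFAULT_SOURCE          /* expose MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

#if defined(__APPLE__) && defined(USE_MACH_VM)
#include <mach/mach.h>
#endif

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS   /* some systems only provide this spelling */
#endif

/* Hypothetical wrapper: returns `len` bytes of zero-filled memory or NULL. */
void *region_alloc(size_t len)
{
    assert(len > 0);
#if defined(__APPLE__) && defined(USE_MACH_VM)
    /* Mach path: the task whose map is modified is passed explicitly,
     * even though it is almost always the caller's own task. */
    vm_address_t addr = 0;
    kern_return_t kr = vm_allocate(mach_task_self(), &addr, len,
                                   VM_FLAGS_ANYWHERE);
    return kr == KERN_SUCCESS ? (void *)addr : NULL;
#else
    /* BSD path: no task argument, the call always acts on the
     * address map of the calling process. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_ANON | MAP_PRIVATE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
#endif
}

/* Hypothetical wrapper: returns 0 on success, -1 on failure. */
int region_free(void *p, size_t len)
{
#if defined(__APPLE__) && defined(USE_MACH_VM)
    kern_return_t kr = vm_deallocate(mach_task_self(), (vm_address_t)p, len);
    return kr == KERN_SUCCESS ? 0 : -1;
#else
    return munmap(p, len);
#endif
}
```

Keeping the choice behind a compile-time switch makes it easy to build the same benchmark twice and compare the two columns on one machine.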