What the code measures

The code does the following. First it does a little warm-up, performing some untimed allocations. This gives the memory system a chance to prepare for what is coming :)
The program will then do a million operations of
The first loop just does |
The measured results

I had to name one entry Anonymous because it was submitted with 10.3 and AFAIK that's already problematic with Apple's touchy-feely lawyers...
The filesystem decided to mess up an important directory in the perl library hierarchy. Thanks to ZNeK everything's back to normal again.
I have - for your utmost convenience - created a little application that does the benchmarking and the analysis in one step. Ready for your download.
Well, the bad news is there is going to be a part III. Unfortunately the application took longer than expected, so you will have to wait for the third part, where I will tell you what all the numbers mean. :) The third part might manifest as late as next Wednesday.
| Download | |
|---|---|
| Binary | MulleZerofillCalc.app.tgz |
| Source | MulleZerofillCalc.src.tgz |
These were the measurements taken. Enter them in the first four fields of the MulleZerofillCalculator.
| Operation | Time | Comment |
|---|---|---|
| 1p alloc | 11.2 | vm_allocate( 1p), vm_deallocate( 1p) |
| 1p alloc + fault | 24.0 | vm_allocate( 1p), page touched - therefore faulted in, vm_deallocate( 1p) |
| 2p alloc | 11.5 | vm_allocate( 2p), vm_deallocate( 2p) |
| 2p alloc + fault | 33.6 | vm_allocate( 2p), 2 pages touched - therefore faulted in, vm_deallocate( 2p) |
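The four operations in the table can be sketched as a small timing loop. This is only an illustration, not the original benchmark: it assumes Python's `mmap` module as a stand-in for `vm_allocate`/`vm_deallocate`, and the helper names `run` and `measure` are made up here. Interpreter overhead will dominate the numbers, so only the shape of the measurement carries over.

```python
import mmap
import time

PAGE = mmap.PAGESIZE  # page size reported by the OS


def run(pages, touch, iterations):
    """Time `iterations` rounds of map, optionally touch, unmap."""
    start = time.perf_counter()
    for _ in range(iterations):
        block = mmap.mmap(-1, pages * PAGE)  # anonymous mapping (stand-in for vm_allocate)
        if touch:
            for p in range(pages):
                block[p * PAGE] = 1  # write one byte per page so each page is faulted in
        block.close()  # unmap (stand-in for vm_deallocate)
    return time.perf_counter() - start


def measure(iterations=1_000_000):
    """Return elapsed seconds for the four operations of the table."""
    run(1, True, 1000)  # untimed warm-up, as described in the text
    return {
        "1p alloc": run(1, False, iterations),
        "1p alloc + fault": run(1, True, iterations),
        "2p alloc": run(2, False, iterations),
        "2p alloc + fault": run(2, True, iterations),
    }
```

Calling `measure(iterations=10_000)` gives a quick run; the four values line up with the four rows of the table.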
vm_allocate and vm_deallocate calls to allocate and free one page of memory take 11.2. Doing the same number of calls for a 2-page memory block takes 11.5. There is apparently a 0.3 overhead incurred per additional page (page extra cost).
page extra cost = 2p alloc - 1p alloc or 11.5 - 11.2 = 0.3

Let's assume that allocation and deallocation take pretty much the same time. Then calls to either vm_allocate or vm_deallocate would be responsible for half the time measured. With the page extra cost taken into account, we arrive at this formula for allocation cost:
allocation cost = (1p alloc - page extra cost) / 2 or (11.2 - 0.3) / 2 = 5.45

For each page allocated in one call, the formula would be
allocation cost + page extra cost / 2
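The allocation formulas can be checked with a few lines of Python. This is a minimal sketch; the function names `allocation_costs` and `call_cost` are made up for illustration and are not taken from the MulleZerofillCalc source.

```python
def allocation_costs(alloc_1p, alloc_2p):
    """Derive page extra cost and allocation cost from the two alloc timings."""
    page_extra = alloc_2p - alloc_1p          # extra time for the second page
    allocation = (alloc_1p - page_extra) / 2  # half goes to allocate, half to deallocate
    return page_extra, allocation


def call_cost(allocation, page_extra, pages):
    """Cost of one allocate (or deallocate) call covering `pages` pages."""
    return allocation + page_extra / 2 * pages


page_extra, allocation = allocation_costs(11.2, 11.5)
print(round(page_extra, 2), round(allocation, 2))
print(round(call_cost(allocation, page_extra, 1), 2))
```

With the measurements from the table, the first line printed holds the page extra cost and the allocation cost, and the second the cost of a single one-page call.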
The actual mapping of the physical page into userspace and the zerofilling is done during the page fault. Since we measured allocation, page fault and deallocation, the time added because of the actual use of the memory page is the page fault extra cost
page fault extra cost = 1p alloc + fault - 1p alloc or 24.0 - 11.2 = 12.8

The time spent in allocation can be expected to be unchanged, since the OS can't very well know in advance whether a page will actually be mapped in or not. Deallocation time can be expected to increase, since physical memory pages must now be mapped out. Here another assumption is made, namely that the extra time spent for mapping in and mapping out is the same, so the cost is divided equally between page faulting and deallocation. This object cost is computed as
object cost = 2 * (1p alloc + fault - 1p alloc) - (2p alloc + fault - 2p alloc) or 2 * (24.0 - 11.2) - (33.6 - 11.5) = 25.6 - 22.1 = 3.5

This will be shared evenly between the page fault and the deallocation. So the time spent for one page fault is estimated as
fault + zerofill cost = (2p alloc + fault - 2p alloc) / 2 - object cost / 2 or (33.6 - 11.5) / 2 - 3.5 / 2 = 9.3

and the time for deallocation rises to
allocation cost + page extra cost / 2 * pages + object cost / 2 or 5.45 + 0.3 / 2 * 1 + 3.5 / 2 = 7.35

To get an estimate of the time spent for zerofilling, I assumed that approximately a quarter of the time used for allocation is spent for the actual page faulting. And I compute zerofill cost like this:
fault + zerofill cost - allocation cost / 4 - object cost / 2 or 9.3 - 5.45 / 4 - 3.5 / 2 ≈ 6.19

This in effect weighs zerofilling in at around 50% to 80%, something that, from what I observed on my machine using Shark, is rather realistic.
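The whole chain of estimates can be reproduced from the four measurements. The following is a minimal sketch under the assumptions stated in the text (equal split between allocation and deallocation, object cost shared evenly, a quarter of the allocation cost attributed to faulting). The function name `fault_costs` and the dictionary keys are made up for illustration. It evaluates the formulas exactly as written, so the last digits may differ slightly from rounded figures.

```python
def fault_costs(alloc_1p, fault_1p, alloc_2p, fault_2p, pages=1):
    """Estimate fault-related costs from the four measured timings."""
    page_extra = alloc_2p - alloc_1p
    allocation = (alloc_1p - page_extra) / 2

    # Time added by actually touching the memory.
    fault_extra_1p = fault_1p - alloc_1p
    fault_extra_2p = fault_2p - alloc_2p

    # Fixed part that does not double when the page count doubles.
    object_cost = 2 * fault_extra_1p - fault_extra_2p

    # Per-page fault cost, with half of the object cost removed.
    fault_zerofill = fault_extra_2p / 2 - object_cost / 2

    # Deallocation now also pays for half of the object cost.
    dealloc = allocation + page_extra / 2 * pages + object_cost / 2

    # Assumption from the text: a quarter of the allocation cost is fault overhead.
    zerofill = fault_zerofill - allocation / 4 - object_cost / 2

    return {
        "page fault extra cost": fault_extra_1p,
        "object cost": object_cost,
        "fault + zerofill cost": fault_zerofill,
        "deallocation cost": dealloc,
        "zerofill cost": zerofill,
    }


for name, value in fault_costs(11.2, 24.0, 11.5, 33.6).items():
    print(f"{name}: {value:.2f}")
```

Dividing the zerofill cost by the fault + zerofill cost gives the share of a page fault that goes into zerofilling under these assumptions.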
vm_allocate will be followed by an access to the first page immediately afterwards.
If faulting optimizations are in effect, then the drag of zerofilling will be even more pronounced. You can optimize zerofilling by not zerofilling :)
mmap/munmap instead of vm_allocate/vm_deallocate for memory allocation (Thanks to Jim Magee for this)
| Download | |
|---|---|
| Binary | MulleZerofillCalc.app.tgz |
| Source | MulleZerofillCalc.src.tgz |
The mmap feature is the biggy in this release. Using mmap to allocate memory is much faster than using vm_allocate, as you will find out. On my Cube 450, mmap is almost twice as fast as vm_allocate. That is for pure allocation and deallocation. The actual page fault and zerofill costs are - from my observations - virtually :) unaffected. So while this certainly is of big interest, it in itself doesn't change the comments made in part III.
| Operation | mmap | vm_allocate |
|---|---|---|
| 1p alloc | 9.7 | 17.6 |
| 1p alloc + fault | 41.0 | 51.5 |
| 2p alloc | 11.1 | 18.9 |
| 2p alloc + fault | 65.9 | 77.3 |
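The two claims, "almost twice as fast" for pure allocation and fault costs that are "virtually unaffected", can be read off the table with a little arithmetic. This is a minimal sketch; the names `TIMES`, `alloc_speedup` and `fault_extra` are made up for illustration, and the last vm_allocate entry is read as 77.3.

```python
# Timings copied from the table above.
TIMES = {
    "mmap":        {"1p alloc": 9.7,  "1p alloc + fault": 41.0,
                    "2p alloc": 11.1, "2p alloc + fault": 65.9},
    "vm_allocate": {"1p alloc": 17.6, "1p alloc + fault": 51.5,
                    "2p alloc": 18.9, "2p alloc + fault": 77.3},
}


def alloc_speedup(pages):
    """How many times faster mmap is for pure allocation and deallocation."""
    key = f"{pages}p alloc"
    return TIMES["vm_allocate"][key] / TIMES["mmap"][key]


def fault_extra(api, pages):
    """Time added by touching the pages, on top of pure allocation."""
    return TIMES[api][f"{pages}p alloc + fault"] - TIMES[api][f"{pages}p alloc"]


for pages in (1, 2):
    print(f"{pages}p speedup: {alloc_speedup(pages):.2f}x,",
          f"fault extra mmap {fault_extra('mmap', pages):.1f}",
          f"vs vm_allocate {fault_extra('vm_allocate', pages):.1f}")
```

The speedup applies to the allocation columns only; the fault extra values of the two APIs stay within a few units of each other.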
If you want to know why mmap is faster, read on...
Continue reading "Zerofilling - Part IV - New and Improved Application" »
| Download | |
|---|---|
| Binary | MulleZerofillCalc.app.tgz |
| Source | MulleZerofillCalc.src.tgz |
Maynard
A good suggestion and a suggestion that would work. I had asked Jim Magee the same thing off list:
In the back of my mind lurks that factoid, that Mac OS X has a queue of zeroed RAM pages. If the queue is full, then probably a page fault can be serviced in 4-5 us. When it becomes exhausted, like my test program will do it will take 15 us. Correct ? How big is that queue ? And at what rate does it refill ?
The answer was:
Nope. Nothing is done ahead in this regard. Other Mach implementations experimented with this (having the idle loop zero pages ahead instead of doing "nothing"). But it tended to play hell with caches and/or power-management (we _really_ put the processor to sleep in idle).
I would suspect the cache is more of an issue, since the system oughta quiet down after enough pages have been zeroed. Since cleaning a page does invalidate a sizeable part of the cache, this would be quite detrimental to running code I would think.
If I get a little more idle time, I might wrap this up and make Parts I to VI a bona fide "Mulle" article. :)
This page contains all entries posted to Nat!'s Web Journal in August 2003. They are listed from oldest to newest.
July 2003 is the previous archive.
September 2003 is the next archive.