Nat!

msync KILLPAGES and a post from Jim Magee

With msync( KILLPAGES ...) you can give the kernel a hint that a page is unused and does not need to be swapped out (the flag is spelled MS_KILLPAGES in <sys/mman.h>). This way your app still retains the memory, and if space runs tight the system can reclaim those pages without writing them to disk first.
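As a rough sketch of what such a call could look like: the helper name discard_pages is made up for illustration, and it assumes the range is page aligned. Where MS_KILLPAGES does not exist, the sketch falls back to madvise( MADV_DONTNEED), which is a different call with a similar intent and not an equivalent.

```c
#define _DEFAULT_SOURCE   /* for madvise() and MAP_ANON on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: tell the kernel that the contents of
 * [addr, addr + len) are garbage. The range stays mapped, so the
 * app keeps the address space and can write to it again later. */
static int   discard_pages( void *addr, size_t len)
{
   long   pagesize;

   pagesize = sysconf( _SC_PAGESIZE);
   assert( pagesize > 0);
   assert( (uintptr_t) addr % (uintptr_t) pagesize == 0);

#ifdef MS_KILLPAGES
   /* Darwin: invalidate the pages, leave them mapped */
   return( msync( addr, len, MS_KILLPAGES));
#else
   /* Assumption: stand-in for systems without MS_KILLPAGES */
   return( madvise( addr, len, MADV_DONTNEED));
#endif
}
```

After a successful call the old contents should be treated as gone. Writing to the range again is fine, reading stale data from it is not something to rely on.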

Now this ain't a cheap operation, so I figured I'd start using it only once 4 MB worth of free memory has accumulated.
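The bookkeeping for that threshold could be sketched like this. The struct and function names are hypothetical, and only the 4 MB figure comes from the text above. What happens when the threshold trips (which ranges get killed) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define KILL_THRESHOLD   ((size_t) 4 * 1024 * 1024)   /* 4 MB */

/* Hypothetical tracker: counts bytes freed since the last kill pass */
struct free_tracker
{
   size_t   pending;
};

/* Record 'bytes' of newly freed memory. Returns true when enough
 * has piled up that a kill pass is worth its cost; the counter is
 * reset in that case. */
static bool   tracker_note_free( struct free_tracker *t, size_t bytes)
{
   assert( t != NULL);

   t->pending += bytes;
   if( t->pending < KILL_THRESHOLD)
      return( false);

   t->pending = 0;
   return( true);
}
```

An allocator would call tracker_note_free() from its free path and run the msync pass over its free ranges whenever it returns true.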

Here's a post from Jim Magee (IMO Apple's best guy) about msync from the darwin-developers mailing list:

I think this is going to be application dependent. Our general rule was it was better to just vm_deallocate() them (from profiling "typical" large allocation patterns in malloc()) so that's why the default [scalable malloc zone] algorithm does that. You, obviously, see a different pattern in your application (maybe you free and re-allocate large regions more often than is typical of all applications). So, you would need to find your own algorithm.

But here are some things to consider:

  • Killing a page will take about 4-6us total with an optional zero-fill fault(4-5us) if it gets stolen later. The cost of the kill can be amortized over a range of pages - but there is still per-page overhead. Total overhead: between 4 and 11us per page (slightly lower if large ranges in play)
  • Deallocation (~4us), re-allocation (~4us) and zero-fill faulting (4-5us). The cost of the allocation and deallocation may be amortized over several pages - with no per page overhead. Total overhead: ~12us per page (much lower if large allocations in play)
  • Swapping a page in and out will cost ~20ms total. If the data is useless to you (and you have no assurance that you are going to come back and reuse/re-reference it in the quite near future), do one or the other of the above. But if you KNOW you are going to keep the pages hot, do this (zero cost per page).

--Jim
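To make the arithmetic in those three bullets explicit, here is a small sketch that adds up the quoted per-page figures. The numbers are Jim's, taken straight from the post; the struct and function names are made up for illustration, and the amortization over larger ranges is ignored.

```c
#include <assert.h>
#include <stdio.h>

/* A cost range in microseconds per page */
struct range_us
{
   double   lo;
   double   hi;
};

static struct range_us   make_range( double lo, double hi)
{
   assert( lo <= hi);
   return( (struct range_us) { lo, hi });
}

/* kill: 4-6 us, plus an optional zero-fill fault of 4-5 us */
static struct range_us   kill_cost( void)
{
   return( make_range( 4.0, 6.0 + 5.0));
}

/* deallocate ~4 us + re-allocate ~4 us + zero-fill fault 4-5 us */
static struct range_us   realloc_cost( void)
{
   return( make_range( 4.0 + 4.0 + 4.0, 4.0 + 4.0 + 5.0));
}

/* swapping a page out and back in: ~20 ms, expressed in us */
static double   swap_cost( void)
{
   return( 20.0 * 1000.0);
}

static void   print_costs( void)
{
   struct range_us   k = kill_cost();
   struct range_us   r = realloc_cost();

   printf( "kill:    %.0f-%.0f us per page\n", k.lo, k.hi);
   printf( "realloc: %.0f-%.0f us per page\n", r.lo, r.hi);
   printf( "swap:    %.0f us per page\n", swap_cost());
}
```

Either of the first two options comes out at around a dozen microseconds per page at worst, while a swap round trip is more than a thousand times that.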

Killing a page means calling msync with the KILLPAGES parameter. The timings must come from his reference system; they obviously can't be absolute. I doubt that the zero-fill fault is as fast as he indicates :) Not according to what I see, where it is more like 15 us. You can test this yourself with the program below and Shark.

Shark is in the new CHUD tools and it's really cool. Much better than Shakira.

#import <Foundation/Foundation.h>
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>
#include <mach/mach_init.h>
#include <mach/vm_map.h>
#include <mach/vm_statistics.h>

/* Try this to see cost of zero page faulting 
 * in relationship to allocate and deallocate
 * just run and do the math :)
 * PAGES=1 LOOPS=200000
 */
#define PAGES   1
#define LOOPS   (1000*200)

/* Try this to see cost of zero page faulting
 * in relationship to "other activity" during
 * faulting. Some believe zerofill is cheap.
 * Check it out with Shark. Use a little fuzz,
 * set samples to 50 us and let run for 300000
 * samples or so when it's zero filling
 * PAGES=2048 LOOPS=1000
 */


void   alloc_loop()
{
   unsigned int   i;
   char           *block;

   for( i = 0; i < LOOPS; i++)
   {
      if( vm_allocate( mach_task_self(), (vm_address_t *) &block, PAGES * vm_page_size, VM_FLAGS_ANYWHERE))
         abort();

      if( vm_deallocate( mach_task_self(), (vm_address_t) block, PAGES * vm_page_size))
         abort();
   }
}


void   touch_loop()
{
   unsigned int   i, j;
   char           *block;

   for( i = 0; i < LOOPS; i++)
   {
      if( vm_allocate( mach_task_self(), (vm_address_t *) &block, PAGES * vm_page_size, VM_FLAGS_ANYWHERE))
         abort();

      for( j = 0; j < PAGES * vm_page_size; j += vm_page_size)
         block[ j] = 1;

      if( vm_deallocate( mach_task_self(), (vm_address_t) block, PAGES * vm_page_size))
         abort();
   }
}




int  main()
{
   alloc_loop();        /* warm up */

   NSLog( @"Just alloc start\n");
   alloc_loop();
   NSLog( @"Just alloc stop\n");

   NSLog( @"Zero fill start\n");
   touch_loop();
   NSLog( @"Zero fill stop\n");

   return( 0);
}
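"Doing the math" on the output could look like the sketch below. It assumes you read the elapsed seconds for each phase off the NSLog timestamps yourself; the function name is hypothetical. Since touch_loop() does everything alloc_loop() does plus one write per page, the difference between the two runs is roughly the zero-fill fault cost.

```c
#include <assert.h>

/* Hypothetical helper: estimate the zero-fill fault cost per page.
 *
 * alloc_secs: elapsed time of the "Just alloc" phase
 * touch_secs: elapsed time of the "Zero fill" phase
 * loops, pages: the LOOPS and PAGES values the program was built with
 *
 * Returns microseconds per page. */
static double   zero_fill_us_per_page( double alloc_secs,
                                       double touch_secs,
                                       unsigned long loops,
                                       unsigned long pages)
{
   assert( loops > 0 && pages > 0);

   return( (touch_secs - alloc_secs) * 1e6 /
           ((double) loops * (double) pages));
}
```

The same elapsed time for "Just alloc", divided by LOOPS, gives the combined cost of one vm_allocate() and vm_deallocate() pair.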