January 6, 2012

Using --eh-frame-hdr when using -fobjc-exceptions on GNU/Linux: good, but not necessarily good enough

This is on Linux with gcc 4.6, not OS X.

When you compile code with -fobjc-exceptions, you presumably do so because somewhere you execute a throw. Let's say this is my call stack:

abort                                   // glibc
objc_exception_throw                    // libobjc
-[MyClass methodThrowingAnException]    // myclass.so
myClassMethodBouncer                    // myclass.so
function_with_a_callback                // thirdparty.so
-[MyClass waitingForAnException]        // myclass.so
main                                    // main

What happened? I compiled all the MyClass code with -fobjc-exceptions and did not even forget to link the resulting shared library with --eh-frame-hdr. So all my code and libobjc.so have PT_GNU_EH_FRAME information, which is necessary for the "modern", C++-like stack unwinding.

Still not good enough. The third-party library code, function_with_a_callback, was not linked with --eh-frame-hdr, and the unwinding stops right there. The unwinder never reaches the catch handler in -[MyClass waitingForAnException], so objc_exception_throw treats this as an uncaught exception and calls abort.
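To find out which loaded objects lack that information, a process can walk its own program headers with glibc's dl_iterate_phdr(). The following is a minimal sketch, assuming glibc on GNU/Linux; the names check_eh_frame_hdr and count_objects_without_eh_frame_hdr are hypothetical, made up for this example. It lists every loaded object without a PT_GNU_EH_FRAME segment on stderr and returns how many there are.

```c
#define _GNU_SOURCE          /* dl_iterate_phdr() is a GNU extension */
#include <assert.h>
#include <link.h>            /* dl_iterate_phdr, ElfW, PT_GNU_EH_FRAME (via <elf.h>) */
#include <stddef.h>
#include <stdio.h>

/*
 * dl_iterate_phdr() callback: looks for a PT_GNU_EH_FRAME program
 * header in one loaded object and reports the object if it has none.
 */
static int   check_eh_frame_hdr( struct dl_phdr_info *info, size_t size, void *data)
{
   unsigned int   *missing = data;
   ElfW(Half)     i;

   (void) size;
   assert( missing != NULL);

   for( i = 0; i < info->dlpi_phnum; i++)
      if( info->dlpi_phdr[ i].p_type == PT_GNU_EH_FRAME)
         return( 0);          /* found; returning 0 continues the iteration */

   ++*missing;
   fprintf( stderr, "no PT_GNU_EH_FRAME: %s\n",
            (info->dlpi_name && info->dlpi_name[ 0])
               ? info->dlpi_name
               : "(unnamed, usually the main program)");
   return( 0);
}

/*
 * Hypothetical helper: returns how many currently loaded objects
 * lack a PT_GNU_EH_FRAME segment, listing them on stderr.
 */
unsigned int   count_objects_without_eh_frame_hdr( void)
{
   unsigned int   missing = 0;

   dl_iterate_phdr( check_eh_frame_hdr, &missing);
   return( missing);
}
```

Called from main after all libraries are loaded, this prints the offending objects; in the scenario above one would expect thirdparty.so to show up in that list.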

An example of the rule that every increase in complexity also brings an increase in brittleness.

December 2, 2011

Measuring context switches :: a small expedition. Part IV

The results are similar for all the other CPUs/cores, except for CPU 0, where they are reproducibly different:
(27 -> 540) 0.002849s

18859 : 0
19009 : 0
19197 : 0
19306 : 0
19438 : 0
19444 : 0
19444 : 0
19600 : 0
19650 : 0
19765 : 0
19874 : 0
19903 : 0
19912 : 0
19941 : 0
19965 : 0
20441 : 0
20547 : 0
20617 : 0
20629 : 0
20880 : 0
21409 : 0
21862 : 0
23376 : 0
First thing to notice: the program only runs for 0.0028s. That means the frequency of context switches is much higher on CPU 0, about 80 times higher in this case (0.002849s versus 0.229874s for the same 23 switches). The duration of each context switch is also longer: the median is about 20,000 cycles (19,903), almost 6 times the roughly 3,400 cycles measured on CPU 18.
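The comparison with CPU 18 from Part III comes down to two small numbers per run: switches per second and the typical gap. A minimal sketch of that arithmetic, assuming the gap lists are sorted ascending, as the test program prints them; switch_rate and median_gap are hypothetical helper names. With the numbers above, 23 switches in 0.002849s are about 8,070 per second, versus about 100 per second for 23 switches in 0.229874s, and the median gap is 19,903 cycles versus 3,421.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Context switches per second: n recorded switches over the elapsed
 * wall-clock time printed by the test program.
 */
double   switch_rate( unsigned int n, double elapsed_seconds)
{
   assert( elapsed_seconds > 0.0);
   return( n / elapsed_seconds);
}

/*
 * Median of gaps that are already sorted ascending. For an even
 * count this returns the lower of the two middle values, so the
 * result stays an integer cycle count.
 */
unsigned int   median_gap( const unsigned int *sorted_gaps, size_t n)
{
   assert( sorted_gaps != NULL && n > 0);
   return( sorted_gaps[ (n - 1) / 2]);
}
```

The median is used rather than the mean because a single outlier (like the 23,376 cycles at the end of the list) shifts the mean but not the median.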

December 1, 2011

Measuring context switches :: a small expedition. Part III

Let's see some results:
(24 -> 480) 0.229874s

3356 : 18
3365 : 18
3376 : 18
3382 : 18
3388 : 18
3397 : 18
3403 : 18
3409 : 18
3412 : 18
3412 : 18
3415 : 18
3421 : 18
3429 : 18
3429 : 18
3432 : 18
3456 : 18
3479 : 18
3585 : 18
3647 : 18
3691 : 18
4471 : 18
4756 : 18
8265 : 18
The program took about 0.23s to run, and in that time it received 23 context switches. That means a context switch happened about every 0.01s, i.e. at 100Hz. That is, not surprisingly, the documented preemption time slice (10ms) for Mac OS X.

A context switch on CPU 18 typically took about 3,400 cycles (the median is 3,421; a few outliers, up to 8,265 cycles, pull the mean up to about 3,750). One may assume that at these times the CPU is otherwise idle.

100 times 3,400 cycles is the approximate number of cycles lost per second: 340,000 cycles. Since my CPU runs at about 3.4GHz with Turbo Boost, that means, relative to 3,400,000,000 cycles, context switching cost me about 0.01% of performance.
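The overhead estimate is switch rate times cost per switch, divided by the clock rate. A minimal sketch of that arithmetic, computing the rate directly from the measured 23 switches in 0.229874s (about 100 per second); context_switch_overhead is a hypothetical helper name, and the 3.4GHz clock is the figure from the text, not something the code measures.

```c
#include <assert.h>

/*
 * Fraction of CPU cycles spent in context switches, given how many
 * switches were recorded, over how many seconds, the typical cost of
 * one switch in cycles and the clock rate in Hz.
 */
double   context_switch_overhead( unsigned int n_switches,
                                  double       elapsed_seconds,
                                  double       cycles_per_switch,
                                  double       clock_hz)
{
   double   switches_per_second;

   assert( elapsed_seconds > 0.0 && clock_hz > 0.0);
   switches_per_second = n_switches / elapsed_seconds;
   return( switches_per_second * cycles_per_switch / clock_hz);
}
```

With 23 switches, 0.229874s, 3,400 cycles and 3.4e9 Hz this yields about 0.0001, i.e. 0.01%. The same function applied to the CPU 0 numbers from Part IV (23 switches in 0.002849s at roughly 20,000 cycles each) comes out at about 4.7%.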

Thinking about this opens up some interesting questions, but let's first gather another CPU's results to get some more data... Next time.

Measuring context switches :: a small expedition. Part II

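Here is the main routine used for testing. I can't really say more than what's in the comments. The program waits until a number of context switches (currently 23) have hit it. It first prints, in parentheses, the number of cycles between two back-to-back rdtsc calls and the threshold (20 times that) above which a gap counts as a context switch. It then prints the number of seconds elapsed and a list of the durations of those context switches in cycles, sorted, each followed by the APIC ID of the CPU it was recorded on.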
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>


//
// implemented elsewhere in the project (not shown here):
// _rdtsc() reads the CPU time stamp counter,
// _apicID() returns the APIC ID of the CPU we are running on
//
extern unsigned long long   _rdtsc( void);
extern unsigned int         _apicID( void);


#define N_RECORDS   23


typedef struct
{
   unsigned int   gap;
   unsigned int   apicID;
} context_switch_record;


//
// qsort() comparator: sort records by ascending gap
//
static int   compare_context_switch_record( const void *p_a, 
                                            const void *p_b)
{
   const context_switch_record   *a = p_a;
   const context_switch_record   *b = p_b;

   return( (a->gap > b->gap) - (a->gap < b->gap));
}


int   main( int argc, char *argv[]) 
{
   context_switch_record   infos[ N_RECORDS];
   unsigned long long      a, b;
   unsigned long long      normal;
   unsigned long long      diff;
   unsigned int            i;
   struct timeval          start;
   struct timeval          stop;
   struct timeval          elapsed;
   
   gettimeofday( &start, NULL);

   //
   // establish a "gap" between two rdtsc() calls, that 
   // should be considered as normal (no interrupt)
   //
   a      = _rdtsc();
   b      = _rdtsc();
   normal = (b - a) * 20;
   printf( "(%llu -> %llu) ", b - a, normal);
   
   for( i = 0; i < N_RECORDS; i++)
   {
      b = _rdtsc();
      for(;;)
      {
         a    = b;
         b    = _rdtsc();
         diff = b - a;

         //
         // if the delay between a and b is too large,
         // then record it
         //
         if( diff > normal)
            break;
      }
      assert( diff <= INT_MAX);
      
      // 
      // record cycles spent and the APIC ID of the current CPU
      // 
      infos[ i].gap    = (unsigned int) diff;
      infos[ i].apicID = _apicID();
   }
   
   gettimeofday( &stop, NULL);

   //
   // compute time spent waiting for n context switches
   //   
   elapsed.tv_sec  = stop.tv_sec - start.tv_sec;
   elapsed.tv_usec = stop.tv_usec - start.tv_usec;
   if( stop.tv_usec < start.tv_usec)
   {
      elapsed.tv_sec -= 1;
      elapsed.tv_usec = 1000000L - start.tv_usec + stop.tv_usec;
   }
   printf( "%d.%06lds\n\n", (int) elapsed.tv_sec, (long) elapsed.tv_usec);
   
   //
   // output data sorted by gap and done
   //   
   qsort( infos, N_RECORDS, sizeof( context_switch_record), 
          compare_context_switch_record);
   
   for( i = 0; i < N_RECORDS; i++)
      printf( "%u : %u\n", infos[ i].gap, infos[ i].apicID);

   return( 0);
}
Here is the Xcode Project.
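The two extern helpers are not shown in the post. For readers without the project, here is a minimal sketch of what they could look like, assuming x86/x86-64 with GCC or Clang inline assembly and the compilers' <cpuid.h>; this is not necessarily how the project implements them. The names only match the extern declarations in main; note that identifiers with a leading underscore are technically reserved at file scope in C.

```c
#include <assert.h>
#include <cpuid.h>      /* __get_cpuid(), provided by GCC and Clang on x86 */

/*
 * Read the 64-bit time stamp counter. RDTSC returns the low half in
 * EAX and the high half in EDX.
 */
unsigned long long   _rdtsc( void)
{
   unsigned int   lo, hi;

   __asm__ __volatile__( "rdtsc" : "=a" (lo), "=d" (hi));
   return( ((unsigned long long) hi << 32) | lo);
}

/*
 * Initial APIC ID of the CPU executing this code: CPUID leaf 1
 * reports it in bits 31..24 of EBX. Returns 0 if leaf 1 is not
 * available.
 */
unsigned int   _apicID( void)
{
   unsigned int   eax, ebx, ecx, edx;

   if( ! __get_cpuid( 1, &eax, &ebx, &ecx, &edx))
      return( 0);
   return( ebx >> 24);
}
```

RDTSC is not a serializing instruction, which should be acceptable here, since the program only looks for gaps far above the normal back-to-back cost.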
