Nat!

Senior Mull

Gawk benchmark

I downloaded "gawk" from GNU and built it twice: once just using configure and the standard Apple allocator, and once with the Mullocator inserted. Then I ran this little benchmark I found somewhere...

# Contributed by Eiso AB <eiso@chem.rug.nl>

BEGIN {
  Switch["123"] = " abc "
  Switch["82"] = " def "
  Switch["985"] = " ghi "
  Switch["20"] = " jkl "
  Switch["1098"] = " mno "
  Switch["3874"] = " pqr "
  Switch["272"] = " stu "

  Switch_R["123"] = " 123 "
  Switch_R["82"] = " 82 "
  Switch_R["985"] = " 985 "
  Switch_R["20"] = " 20 "
  Switch_R["1098"] = " 1098 "
  Switch_R["3874"] = " 3874 "
  Switch_R["272"] = " 272 "

  for (i=0; i <30000; i++)
  {
    s1 = s2 = s3 = " 123 82 985 20 1098 3874 272 "

    for (j in Switch)
    {
      # Manually doing a gsub
      while (match(s1, j))
        s1 = substr(s1, 1, RSTART-1) Switch[j] substr(s1, RSTART+RLENGTH)

      # Use gsub
      gsub(j, Switch[j], s2)

      # gsub, and prevent RE recompile
      gsub(Switch_R[j], Switch[j], s3)
    }
  }
}

Results (before wedging)

bash-2.05a$ time ./mullegawk -f bench1.awk 

real    0m16.922s
user    0m14.140s
sys     0m2.480s

bash-2.05a$ time ./applegawk -f bench1.awk 

real    0m22.372s
user    0m19.540s
sys     0m2.310s

OK, since I already had the Mullocator serving as the malloc library in the MulleMallocTracerLib, I just removed all the tracing code (with clever use of #ifdef (take that, Java)) to get a fairer comparison between the Mullocator and the Apple malloc. I then ran the test again (slightly differently, because I need to use a shell script wrapper):

bash-2.05a$ time ./gawk.sh 

real    0m24.526s
user    0m21.810s
sys     0m2.240s

Catastrophe! My code was actually slower than Apple's! Now how could that happen :) ? Fortunately, I had compiled it with -DDEBUG, which does lots of checking. With -DDEBUG removed I got

bash-2.05a$ time ./gawk.sh 

real    0m18.709s
user    0m15.290s
sys     0m2.030s
bash-2.05a$ 
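
The two compile-time switches in play here (tracing and -DDEBUG checking) follow a common pattern, which might look roughly like the sketch below. The names TRACE, TRACE_ALLOC, CHECK and example_malloc are made up for illustration and are not taken from the Mullocator sources; only -DDEBUG is mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch: build with -DTRACE to log every allocation. */
#ifdef TRACE
# define TRACE_ALLOC( ptr, size) \
   fprintf( stderr, "alloc %p (%lu bytes)\n", (ptr), (unsigned long) (size))
#else
# define TRACE_ALLOC( ptr, size)   ((void) 0)
#endif

/* Build with -DDEBUG to keep the sanity checks, leave it off to drop them. */
#ifdef DEBUG
# define CHECK( expr)   assert( expr)
#else
# define CHECK( expr)   ((void) 0)
#endif

/* Stand-in entry point (assumption), forwarding to the system malloc. */
void   *example_malloc( size_t size)
{
   void   *ptr;

   CHECK( size != 0);          /* vanishes without -DDEBUG */
   ptr = malloc( size);
   TRACE_ALLOC( ptr, size);    /* vanishes without -DTRACE */
   return( ptr);
}
```

Without either flag both macros expand to an empty expression, so the checking and tracing cost nothing at run time.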

This indicates to me that I am losing a lot of time because of the need to bridge into shared library land, which I could avoid before. The extraneous wrapping call adds to the punishment, since e.g. malloc is now coded like this:

void   *malloc( size_t size)
{
   void  *ptr;

   ptr = mulle_malloc( size);
   return( ptr);
}
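
To put the "real" timings quoted above into proportion, here is a small helper for the arithmetic. The function names are made up for this sketch; the numbers fed into it come straight from the runs above.

```c
#include <assert.h>

/* Convert a `time` reading of the form XmY.YYYs into seconds. */
double   seconds( int minutes, double secs)
{
   assert( minutes >= 0 && secs >= 0.0);
   return( minutes * 60.0 + secs);
}

/* How much longer (in percent) `slower` took compared to `faster`. */
double   percent_slower( double faster, double slower)
{
   assert( faster > 0.0);
   return( (slower - faster) / faster * 100.0);
}
```

With the real times from above, percent_slower( seconds( 0, 16.922), seconds( 0, 22.372)) says applegawk took roughly 32% longer than mullegawk. The shared library version without -DDEBUG (18.709s) took roughly 11% longer than mullegawk, and applegawk in turn took roughly 20% longer than that.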

As I believe that my allocator performs even better when lots of memory is allocated and active, I am looking for some more benchmarks that stress the memory system even more.
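
One possible shape for such a benchmark, purely as a sketch and not something that was part of the runs above: keep a large number of blocks alive at once and keep replacing random ones. The function name churn, its parameters and the little random number generator are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Keep `slots` blocks alive at once and replace one per round, so the
   allocator always has a large live working set to manage.
   Returns the number of successful allocations, 0 if setup failed. */
unsigned long   churn( size_t slots, unsigned long rounds, size_t max_size)
{
   void            **live;
   unsigned long   done;
   unsigned long   i;
   unsigned int    state;
   size_t          k;

   assert( slots > 0 && max_size > 0);

   live = malloc( slots * sizeof( void *));
   if( ! live)
      return( 0);
   for( k = 0; k < slots; k++)
      live[ k] = NULL;

   state = 12345;
   done  = 0;
   for( i = 0; i < rounds; i++)
   {
      state = state * 1103515245u + 12345u;   /* small LCG, fixed seed */
      k     = (state >> 8) % slots;
      free( live[ k]);                        /* free( NULL) is a no-op */
      live[ k] = malloc( 1 + (state >> 16) % max_size);
      if( live[ k])
         done++;
   }

   for( k = 0; k < slots; k++)
      free( live[ k]);
   free( live);
   return( done);
}
```

Called from a main and run under time once per allocator, with slots and rounds turned up until the run takes a few seconds, this would lean on the allocator far harder than the gawk script does.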