I downloaded "gawk" from GNU and built it twice: once using just configure and the standard Apple allocator, and once with the Mullocator inserted. Then I ran this little benchmark I found somewhere...
# Contributed by Eiso AB <eiso@chem.rug.nl>
BEGIN {
    Switch["123"] = " abc "
    Switch["82"] = " def "
    Switch["985"] = " ghi "
    Switch["20"] = " jkl "
    Switch["1098"] = " mno "
    Switch["3874"] = " pqr "
    Switch["272"] = " stu "
    Switch_R["123"] = " 123 "
    Switch_R["82"] = " 82 "
    Switch_R["985"] = " 985 "
    Switch_R["20"] = " 20 "
    Switch_R["1098"] = " 1098 "
    Switch_R["3874"] = " 3874 "
    Switch_R["272"] = " 272 "
    for (i = 0; i < 30000; i++)
    {
        s1 = s2 = s3 = " 123 82 985 20 1098 3874 272 "
        for (j in Switch)
        {
            # Manually doing a gsub
            while (match(s1, j))
                s1 = substr(s1, 1, RSTART-1) Switch[j] substr(s1, RSTART+RLENGTH)
            # Use gsub
            gsub(j, Switch[j], s2)
            # gsub, and prevent RE recompile
            gsub(Switch_R[j], Switch[j], s3)
        }
    }
}
Results (before wedging)
bash-2.05a$ time ./mullegawk -f bench1.awk
real 0m16.922s
user 0m14.140s
sys 0m2.480s
bash-2.05a$ time ./applegawk -f bench1.awk
real 0m22.372s
user 0m19.540s
sys 0m2.310s
OK, since I already had the Mullocator as the malloc library in the MulleMallocTracerLib, I just removed all the tracing code (with clever use of #ifdef (take that, Java)) to get a fairer comparison between the Mullocator and the Apple malloc. I then ran the test again (slightly differently, because I needed to use a shell script wrapper):
bash-2.05a$ time ./gawk.sh
real 0m24.526s
user 0m21.810s
sys 0m2.240s
Catastrophe! My code was actually slower than Apple's! Now how could that happen :) ? Fortunately, I had compiled it with -DDEBUG, which does lots of checking. With -DDEBUG removed I got
bash-2.05a$ time ./gawk.sh
real 0m18.709s
user 0m15.290s
sys 0m2.030s
bash-2.05a$
This indicates to me that I am losing a lot of time (18.7s versus 16.9s real) because of the need to bridge into shared library land, which I could avoid before. The extraneous wrapping call adds to the punishment; malloc, for example, is now coded like this:
void *malloc( size_t size)
{
   void *ptr;

   ptr = mulle_malloc( size);
   return( ptr);
}
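To get a rough feel for what an extra wrapping call costs on its own, a timing loop along the following lines could be used. This is only a sketch under assumptions: demo_alloc is a hypothetical stand-in for mulle_malloc that simply forwards to the system malloc, and all demo_ names are made up for illustration. It also does not reproduce the shared library hop, since inside a single file the compiler is free to inline the wrapper, which is exactly what cannot happen across a library boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for mulle_malloc: forwards to the system
   allocator so that this sketch compiles without the real library. */
static void *demo_alloc(size_t size)
{
    return malloc(size);
}

/* Same shape as the malloc wrapper above: one extra call layer. */
static void *demo_wrapped_alloc(size_t size)
{
    void *ptr;

    ptr = demo_alloc(size);
    return ptr;
}

/* Allocates and frees `count` blocks of `size` bytes through `alloc_fn`.
   Returns the CPU time used in seconds, or -1.0 if an allocation fails. */
static double demo_time_allocs(void *(*alloc_fn)(size_t),
                               unsigned long count, size_t size)
{
    clock_t start;
    unsigned long i;
    void *volatile ptr; /* volatile discourages eliding the malloc/free pair */

    assert(alloc_fn != NULL);

    start = clock();
    for (i = 0; i < count; i++) {
        ptr = alloc_fn(size);
        if (ptr == NULL)
            return -1.0;
        free(ptr);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling demo_time_allocs once with demo_alloc and once with demo_wrapped_alloc, using the same count and size, gives two numbers to compare. To see a cost closer to the real situation, the wrapper would presumably have to live in a separately built shared library.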
As I believe that my allocator performs even better when lots of memory is allocated and active, I am looking for some more benchmarks that stress the memory system even more.
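A benchmark of that kind might look roughly like the following sketch, which keeps several thousand blocks alive at once and keeps replacing random ones. Everything here is an assumption for illustration: the demo_ names, the slot count and the size range are arbitrary, and it calls plain malloc and free, so which allocator gets measured depends on what the program is linked against.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_SLOTS 4096 /* number of blocks kept live at once (arbitrary) */

/* Small deterministic pseudo-random generator so runs are repeatable. */
static unsigned long demo_next(unsigned long *state)
{
    *state = *state * 1103515245UL + 12345UL;
    return (*state >> 16) & 0x7fffUL;
}

/* Each round frees one randomly chosen slot and refills it with a new
   block of random size (1..max_size bytes), so many blocks stay live
   while the allocator is churning. Returns the number of allocations
   that succeeded; this equals `rounds` unless malloc failed. */
static unsigned long demo_churn(unsigned long rounds, size_t max_size)
{
    static void *slots[DEMO_SLOTS]; /* starts out as all NULL */
    unsigned long state = 1;
    unsigned long done = 0;
    unsigned long i;
    size_t slot, size;

    assert(max_size > 0);

    for (i = 0; i < rounds; i++) {
        slot = demo_next(&state) % DEMO_SLOTS;
        size = demo_next(&state) % max_size + 1;
        free(slots[slot]);               /* free(NULL) is a no-op */
        slots[slot] = malloc(size);
        if (slots[slot] == NULL)
            break;
        memset(slots[slot], 0xA5, size); /* touch the block so it is really in use */
        done++;
    }

    /* Release everything so the function can be called again. */
    for (slot = 0; slot < DEMO_SLOTS; slot++) {
        free(slots[slot]);
        slots[slot] = NULL;
    }
    return done;
}
```

Running something like demo_churn(10000000UL, 256) under time, once per allocator, would give numbers comparable in spirit to the gawk runs above, though with a very different allocation pattern.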