Nat!

Part 2 - i7, Quad Core and Dual Core -- Compilation & Geekbench Speeds

Intro

This is the follow-up to the previous posting, Part 1 - i7, Quad Core and Dual Core -- Compilation Times. I copied some of the base information from there for easier lookup.


The Tested Systems and Configurations

T7300: A white MacBook with a 2 GHz Intel Core 2 Duo and 2 modules (4 GB total) of aftermarket DDR2 RAM at 667 MHz (5.33 GB/s per channel), running 10.6.4 in 32 bit from a 160 GB Intel X25-M SSD. I assume this is dual-channel memory, but I am not sure. Estimated memory bandwidth: 10.66 GB/s.

Q9650: A Gigabyte X48-DS5 with a 3 GHz Intel Q9650 quad core and 2 modules (4 GB total) of 5-5-5-15 DDR2 RAM at 800 MHz (6.4 GB/s per channel) on two channels (12.8 GB/s), running 10.6.4 in 32 bit from a 256 GB Crucial RealSSD.

i7-980x: A Gigabyte X58A-UD7 rev2 with a 3.33 GHz Intel i7 980 Extreme and 3 modules (6 GB total) of 7-7-7-20 DDR3 RAM at 1600 MHz (12.8 GB/s per channel) on three channels (38.4 GB/s), running 10.6.4 in 64 bit from a 256 GB Crucial RealSSD.
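
The bandwidth figures quoted for each system can be reproduced with a little arithmetic. The sketch below assumes a 64-bit (8-byte) wide memory channel and treats the quoted "MHz" value as the effective transfer rate; the function name and the dictionary are made up for illustration. The results agree with the numbers above to within rounding (nominal 667 MHz memory actually runs at 666.67).

```python
def peak_bandwidth_gb_s(data_rate_mt_s, channels=1, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s (decimal, 1 GB/s = 1000 MB/s).

    data_rate_mt_s: effective transfer rate, the "MHz" figure quoted for the RAM
    bus_bytes:      assumed 64-bit (8-byte) wide channel
    """
    return data_rate_mt_s * bus_bytes * channels / 1000.0


# (data rate, channels); the channel count of the T7300 is only an assumption
systems = {
    "T7300": (667, 2),
    "Q9650": (800, 2),
    "i7-980x": (1600, 3),
}

for name, (rate, channels) in systems.items():
    per_channel = peak_bandwidth_gb_s(rate)
    total = peak_bandwidth_gb_s(rate, channels)
    print(f"{name}: {per_channel:.2f} GB/s per channel, {total:.2f} GB/s total")
```

These are theoretical peaks only; measured throughput, such as the STREAM scores further down, is a different matter.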

Machine                 GHz   Cores  Threads  L1     L2      L3     RAM   Type  MHz   Channels  Timing
T7300                   2     2      2        64 KB  4 MB    -      4 GB  DDR2  667   ?         ?
Q9650                   3     4      4        64 KB  12 MB   -      4 GB  DDR2  800   2         5-5-5-15
i7-980x                 3.33  6      12       64 KB  256 KB  12 MB  6 GB  DDR3  1600  3         7-7-7-20
i7-980x 6 GB 3CH-800    3.33  6      12       64 KB  256 KB  12 MB  6 GB  DDR3  800   3         7-7-7-20
i7-980x 4 GB 2CH-1600   3.33  6      12       64 KB  256 KB  12 MB  4 GB  DDR3  1600  2         7-7-7-20
i7-980x 3C/6TH          3.33  3      6        64 KB  256 KB  12 MB  6 GB  DDR3  1600  3         7-7-7-20
i7-980x 3C/3TH          3.33  3      3        64 KB  256 KB  12 MB  6 GB  DDR3  1600  3         7-7-7-20
i7-980x 3.6 GHz         3.6   6      12       64 KB  256 KB  12 MB  6 GB  DDR3  1600  3         7-7-7-20


The Measurements

The Geekbench numbers look more precise, judging by the digits produced, than they really are. Over a number of consecutive test runs, my Geekbench scores on the i7-980x configuration vary from 13750 to 13950. This is of course due to the vagaries of a multitasking system. I didn't run a hundred benchmarks for each configuration, only two, and picked the better one.

Best results in each column are marked with an asterisk (*).

Machine                 GEEKBENCH  INT     FP      MEM    STREAM
T7300                   2866       2319    4151    2137   1751
Q9650                   6267       5423    9912    3275   2454
i7-980x 3C/3TH          7443       5396    10479   5504*  7868*
i7-980x 3C/6TH          8535       6656    13003   4895   6761
i7-980x 4 GB 2CH-1600   13861      12002   23210   4622   6129
i7-980x 6 GB 3CH-800    13888      11921   23596   4355   5866
i7-980x                 13894      11987   23250   4805   6008
i7-980x 3.6 GHz         14975*     12923*  25132*  5010   6547


Analysis of Geekbench Data

One measurement that immediately grabs the attention is the 3 core/3 thread configuration, which is a lot faster in the memory and stream benchmarks than any other. My theories as to why this is so are:

  • fewer threads competing for the same cache lines
  • longer sequential access to consecutive memory allows longer bursts

If you're moving lots of memory around, adding more threads may not be the answer.

Geekbench results for various configurations

Changing the memory system configuration had very little effect on the overall Geekbench numbers. This is again very surprising to me, but I can only report what I measured. It's amusing that the two-channel configuration (i7-980x 4 GB 2CH-1600) has a slightly better STREAM score (6129) than the three-channel configuration (i7-980x, 6008).

Over-clocking to 3.6 GHz made sense, because when all cores are running, the i7-980x does not go into turbo mode (which is also 3.6 GHz, but only for a small number of cores). Going from 3.33 GHz to 3.6 GHz is an increase of about 8% in CPU speed, and it yields an increase of about 8% in the Geekbench score (13894 to 14975).


Analysis of Geekbench Data and Compile Times

One of my pet theories has been that compilation is a stream-bound process. The algorithms behind a compiler are fairly simple, and it seemed to me that the main work was simply a matter of pushing all the files through memory.

Machine COMPILE 1
#1: T7300 243
#2: Q9650 172
#3: i7-980x 3C/3TH 81
#4: i7-980x 3C/6TH 72
#5: i7-980x 4 GB 2CH-1600 47
#6: i7-980x 6 GB 3CH-800 48
#7: i7-980x 47
#8: i7-980x 3.6 GHz 44

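This turns out not to be the case. If compilation were stream bound, the 3 core/3 thread configuration, which has by far the best STREAM score, should also compile fastest. Instead its compile time (81) is about 70% longer than that of the 6 core/12 thread configuration (47).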

Normalizing to the T7300 as the base configuration, I calculated the compile speed of a configuration as the factor by which it runs faster than the T7300 (e.g. 243/47, or about 5.2, for the i7-980x). Then I normalized the Geekbench scores to the T7300 in the same way and combined the results in the following diagram.
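
The normalization can be sketched in a few lines. The numbers are copied from the two tables in this post; the helper names `speedup` and `normalize` are only illustrative. Note that compile times are divided the other way round from the scores, because a lower time is better.

```python
# Compile times and overall Geekbench scores, copied from the tables in this post.
compile_time = {
    "T7300": 243, "Q9650": 172,
    "i7-980x 3C/3TH": 81, "i7-980x 3C/6TH": 72,
    "i7-980x 4 GB 2CH-1600": 47, "i7-980x 6 GB 3CH-800": 48,
    "i7-980x": 47, "i7-980x 3.6 GHz": 44,
}
geekbench = {
    "T7300": 2866, "Q9650": 6267,
    "i7-980x 3C/3TH": 7443, "i7-980x 3C/6TH": 8535,
    "i7-980x 4 GB 2CH-1600": 13861, "i7-980x 6 GB 3CH-800": 13888,
    "i7-980x": 13894, "i7-980x 3.6 GHz": 14975,
}


def speedup(times, base="T7300"):
    """Factor by which each configuration compiles faster than the base."""
    return {name: times[base] / t for name, t in times.items()}


def normalize(scores, base="T7300"):
    """Benchmark score of each configuration relative to the base."""
    return {name: s / scores[base] for name, s in scores.items()}


compile_factor = speedup(compile_time)
geekbench_factor = normalize(geekbench)

for name in compile_time:
    print(f"{name:24s} COMPILE {compile_factor[name]:.2f}  GEEKBENCH {geekbench_factor[name]:.2f}")
```

The other Geekbench columns (INT, MEM, STREAM) can be passed through `normalize` in exactly the same way.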

I didn't chart FP because, in my results, it's basically a scaled INT score. I also still hold on to my belief that very little FP calculation is done during compilation. I am basing this belief on my computer science studies, admittedly a long, long time ago, where compiler construction was my main field.

Geekbench and compile times, normalized and charted

As an aid to interpretation: each vertical line matches a configuration entry in the results table. So at the second vertical line (#2), which represents the Q9650, you see that the compile time factor COMPILE closely matches that of STREAM, whereas for the configuration i7-980x (#7) the INT factor and the compile time factor are nearly identical.

It would have been nice if any one of the Geekbench lines had been a linearly scaled version of the compile time line. Then there would have been a direct linear correlation between that class of Geekbench results and compile time. Unfortunately that is NOT the case. By visual inspection, the GEEKBENCH score seems to be the one with the least deviation overall.
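
One rough way to put a number on that visual impression is to take, for each charted Geekbench line, the largest gap between its normalized factor and the normalized compile factor. This is only a sketch: the choice of "largest gap" as the measure is an assumption, and other measures, such as the mean absolute deviation, can rank the lines differently. The data is copied from the tables in this post; FP is left out because it isn't charted.

```python
# (machine, GEEKBENCH, INT, MEM, STREAM, COMPILE), copied from the tables in this post.
ROWS = [
    ("T7300",                  2866,  2319, 2137, 1751, 243),
    ("Q9650",                  6267,  5423, 3275, 2454, 172),
    ("i7-980x 3C/3TH",         7443,  5396, 5504, 7868,  81),
    ("i7-980x 3C/6TH",         8535,  6656, 4895, 6761,  72),
    ("i7-980x 4 GB 2CH-1600", 13861, 12002, 4622, 6129,  47),
    ("i7-980x 6 GB 3CH-800",  13888, 11921, 4355, 5866,  48),
    ("i7-980x",               13894, 11987, 4805, 6008,  47),
    ("i7-980x 3.6 GHz",       14975, 12923, 5010, 6547,  44),
]
METRICS = ("GEEKBENCH", "INT", "MEM", "STREAM")
COMPILE = 5  # index of the compile time column


def max_deviation(rows):
    """Largest |score factor - compile factor| per metric, relative to rows[0]."""
    base = rows[0]
    worst = {metric: 0.0 for metric in METRICS}
    for row in rows:
        compile_factor = base[COMPILE] / row[COMPILE]  # lower time is better
        for index, metric in enumerate(METRICS, start=1):
            score_factor = row[index] / base[index]    # higher score is better
            worst[metric] = max(worst[metric], abs(score_factor - compile_factor))
    return worst


deviation = max_deviation(ROWS)
for metric, value in sorted(deviation.items(), key=lambda item: item[1]):
    print(f"{metric:10s} {value:.2f}")
```

By this particular measure the combined GEEKBENCH line strays least from the COMPILE line, with INT close behind and MEM far off.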

i7-980x 3C/3TH has a GEEKBENCH score of 7443 and the Q9650 has a score of 6267, which I would consider to be in the same ballpark. Yet the i7-980x 3C/3TH compiles about twice as fast as the Q9650 (81 vs. 172), proving that you can't use Geekbench scores to accurately estimate compilation times.

Over-clocking yields an 8% increase in the GEEKBENCH score and a similar speedup in compilation (47 to 44, which is about 7%).
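
The over-clocking percentages can be checked with a short calculation. This is a sketch using the figures from the tables in this post; `percent_gain` is just an illustrative helper. Keep in mind that the compile times are whole numbers, so the compile figure is fairly coarse.

```python
def percent_gain(before, after):
    """Relative increase from `before` to `after`, in percent."""
    return (after / before - 1.0) * 100.0


clock_gain = percent_gain(3.33, 3.6)          # CPU clock in GHz
score_gain = percent_gain(13894, 14975)       # GEEKBENCH score
# Compile speed is the inverse of compile time, so 47 -> 44 is a gain in speed.
compile_gain = percent_gain(1 / 47, 1 / 44)

print(f"clock     +{clock_gain:.1f}%")
print(f"GEEKBENCH +{score_gain:.1f}%")
print(f"compile   +{compile_gain:.1f}%")
```

All three gains land within a point or two of each other, which fits the idea that on this system compilation scales roughly with the clock.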

With all this in mind, I am tempted to formulate the hypothesis that the Geekbench line closest to the COMPILE line indicates the main bottleneck for compilation on that configuration. (Please discuss :))


Conclusions

For those who skipped everything and are only reading the conclusions: this screed is solely concerned with code compilation performance, nothing else.

  1. Geekbench is no substitute for actual compilation benchmarks.
  2. Of all the numbers Geekbench produces, the combined Geekbench score seems to correlate best with compile times.
  3. The memory system is not much of a factor when compiling on an i7-980x system.
  4. With ample main memory, I/O, whether to a hard disk or to an SSD, is a negligible factor.
  5. Data intensive applications may actually be slowed down by a larger number of worker threads.
  6. Hyper-threading seems to lose its luster with the addition of more cores. Hyper-threading made a significant difference with 3 cores, but that advantage almost vanished with six cores. I would guess that cache contention is the limiting factor.
  7. Over-clocking your system brings rewards to those who dare.
  8. Using Geekbench and compile time results together may hint at the bottleneck of the system.
  9. I may have crippled my Q9650 system with slow memory.