Part 2 - i7, Quad Core and Dual Core -- Compilation & Geekbench Speeds
Intro
This is the follow-up to the previous posting, Part 1 - i7, Quad Core and Dual Core -- Compilation Times. I copied some of the base information here for easier lookup.
The Tested Systems and Configurations
Machine | GHz | Cores | Threads | L1 | L2 | L3 | RAM | Type | MHz | Channels | Timing |
T7300 | 2 | 2 | 2 | 64 KB | 4 MB | - | 4 GB | DDR2 | 667 | ? | ? |
Q9650 | 3 | 4 | 4 | 64 KB | 12 MB | - | 4 GB | DDR2 | 800 | 2 | 5-5-5-15 |
i7-980x | 3.33 | 6 | 12 | 64 KB | 256 KB | 12 MB | 6 GB | DDR3 | 1600 | 3 | 7-7-7-20 |
i7-980x 6 GB 3CH-800 | 3.33 | 6 | 12 | 64 KB | 256 KB | 12 MB | 6 GB | DDR3 | 800 | 3 | 7-7-7-20 |
i7-980x 4 GB 2CH-1600 | 3.33 | 6 | 12 | 64 KB | 256 KB | 12 MB | 4 GB | DDR3 | 1600 | 2 | 7-7-7-20 |
i7-980x 3C/6TH | 3.33 | 3 | 6 | 64 KB | 256 KB | 12 MB | 6 GB | DDR3 | 1600 | 3 | 7-7-7-20 |
i7-980x 3C/3TH | 3.33 | 3 | 3 | 64 KB | 256 KB | 12 MB | 6 GB | DDR3 | 1600 | 3 | 7-7-7-20 |
i7-980x 3.6 GHz | 3.6 | 6 | 12 | 64 KB | 256 KB | 12 MB | 6 GB | DDR3 | 1600 | 3 | 7-7-7-20 |
The Measurements
The Geekbench measurements look more accurate, judging by the number of digits produced, than they really are. Over a number of consecutive test runs, my Geekbench scores on the i7-980x configuration varied from 13750 to 13950. This is of course due to the vagaries of a multitasking system. I didn't run a hundred benchmarks for each configuration, only two, and picked the better one.
Best results are marked green.
Machine | GEEKBENCH | INT | FP | MEM | STREAM |
T7300 | 2866 | 2319 | 4151 | 2137 | 1751 |
Q9650 | 6267 | 5423 | 9912 | 3275 | 2454 |
i7-980x 3C/3TH | 7443 | 5396 | 10479 | 5504 | 7868 |
i7-980x 3C/6TH | 8535 | 6656 | 13003 | 4895 | 6761 |
i7-980x 4 GB 2CH-1600 | 13861 | 12002 | 23210 | 4622 | 6129 |
i7-980x 6 GB 3CH-800 | 13888 | 11921 | 23596 | 4355 | 5866 |
i7-980x | 13894 | 11987 | 23250 | 4805 | 6008 |
i7-980x 3.6 GHz | 14975 | 12923 | 25132 | 5010 | 6547 |
Analysis of Geekbench Data
One measurement that immediately grabs attention is the 3 core/3 thread configuration, which is a lot faster in the memory and stream benchmarks than any other. My theories as to why this is so are:
- fewer threads competing for the same cache lines
- longer sequential access to consecutive memory allows longer bursts
If you're moving lots of memory around, adding more threads may not be the answer.
Changing the memory system configuration had very little effect on the overall Geekbench numbers. This is again very surprising to me, but I can only relate what I experienced. It's amusing that the two-channel configuration (i7-980x 4 GB 2CH-1600) has a better STREAM score than the three-channel configuration (i7-980x).
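To put "very little effect" in perspective, the differences between the memory configurations can be compared with the run-to-run variation quoted in The Measurements section. The following is a small sketch in Python; the variable names are illustrative, and since the 13750 to 13950 range was only observed on the stock i7-980x configuration, applying it to the other two configurations is an assumption.

```python
# Overall Geekbench scores of the three memory configurations,
# copied from the results table.
scores = {
    "i7-980x 4 GB 2CH-1600": 13861,
    "i7-980x 6 GB 3CH-800": 13888,
    "i7-980x": 13894,
}

# Run-to-run range observed on the stock i7-980x configuration.
# Assumption: the other configurations vary by a similar amount.
noise_low, noise_high = 13750, 13950

noise_spread = noise_high - noise_low
config_spread = max(scores.values()) - min(scores.values())

# If the configurations differ by less than one run differs from the next,
# the benchmark cannot really tell them apart.
within_noise = config_spread <= noise_spread

print(f"run-to-run spread:        {noise_spread} points")
print(f"spread between configs:   {config_spread} points")
print(f"difference within noise?  {within_noise}")
```

The three overall scores differ by 33 points, well inside the 200 points seen between runs of the same configuration.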
Over-clocking to 3.6 GHz made sense because, when all cores are running, the i7-980x does not go into turbo mode (which is also 3.6 GHz, but only for a few cores). Raising the clock from 3.33 to 3.6 GHz, an increase of about 8%, yields an increase of about 8% in the Geekbench score (13894 to 14975).
Analysis of Geekbench Data and Compile Times
One of my pet theories has been that compilation is a stream-bound process. The algorithms behind a compiler are fairly simple, and it appeared to me that the main work was simply a matter of pushing all the files through memory.
Machine | COMPILE 1 |
#1: T7300 | 243 |
#2: Q9650 | 172 |
#3: i7-980x 3C/3TH | 81 |
#4: i7-980x 3C/6TH | 72 |
#5: i7-980x 4 GB 2CH-1600 | 47 |
#6: i7-980x 6 GB 3CH-800 | 48 |
#7: i7-980x | 47 |
#8: i7-980x 3.6 GHz | 44 |
This turns out not to be the case: the 3 core/3 thread configuration has by far the best STREAM score (7868), yet its compile time (81) is about 70% longer than that of the 6 core/12 thread configuration (47).
Normalizing to the T7300 as the base configuration, I calculated the compile speed of each configuration as the factor by which it runs faster than the T7300 (e.g. 243/47, or about 5.2, for the i7-980x). I then normalized the Geekbench scores to the T7300 as well and combined the results in the following diagram.
I didn't chart FP because, in my results, it's basically a scaled INT score. I also still hold on to my belief that there is very little FP calculation done during compilation. I am basing this belief on my computer science studies, admittedly a long, long time ago, where compiler construction was my main field.
As an aid to interpretation: each vertical line corresponds to a configuration entry in the results table. So at the second vertical line (#2), which represents the Q9650, you see that the compile time factor COMPILE closely matches that of STREAM, whereas for the i7-980x configuration (#7) the INT factor and the compile time factor are nearly identical.
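The normalization behind the chart can be reproduced from the two results tables. This is a minimal sketch, not the original calculation: the function names and the table layout are illustrative, and the compile factor is taken as baseline time divided by machine time because a lower compile time is better.

```python
BASELINE = "T7300"
CATEGORIES = ("GEEKBENCH", "INT", "MEM", "STREAM")

# machine: (COMPILE 1, GEEKBENCH, INT, MEM, STREAM), copied from the tables.
DATA = {
    "T7300":                 (243,  2866,  2319, 2137, 1751),
    "Q9650":                 (172,  6267,  5423, 3275, 2454),
    "i7-980x 3C/3TH":        (81,   7443,  5396, 5504, 7868),
    "i7-980x 3C/6TH":        (72,   8535,  6656, 4895, 6761),
    "i7-980x 4 GB 2CH-1600": (47,  13861, 12002, 4622, 6129),
    "i7-980x 6 GB 3CH-800":  (48,  13888, 11921, 4355, 5866),
    "i7-980x":               (47,  13894, 11987, 4805, 6008),
    "i7-980x 3.6 GHz":       (44,  14975, 12923, 5010, 6547),
}


def compile_factor(machine):
    """How many times faster than the baseline the machine compiles.

    Compile time is lower-is-better, so the baseline goes in the numerator.
    """
    return DATA[BASELINE][0] / DATA[machine][0]


def score_factor(machine, category):
    """Geekbench score relative to the baseline (higher is better)."""
    column = 1 + CATEGORIES.index(category)
    return DATA[machine][column] / DATA[BASELINE][column]


# Print one row per configuration: the values that the chart plots.
print(f"{'Machine':<24}{'COMPILE':>10}" + "".join(f"{c:>10}" for c in CATEGORIES))
for machine in DATA:
    row = f"{machine:<24}{compile_factor(machine):>10.2f}"
    row += "".join(f"{score_factor(machine, c):>10.2f}" for c in CATEGORIES)
    print(row)
```

For the Q9650 the COMPILE and STREAM factors both come out at about 1.4, and for the i7-980x the COMPILE and INT factors both come out at about 5.17, which matches the two observations above.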
It would have been nice if any one of the Geekbench lines had been a linear scaling of the compile time line. Then there would have been a direct linear correlation between that class of Geekbench results and compile time. Unfortunately that is NOT the case. By visual inspection, the GEEKBENCH score seems to be the one with the least deviation overall.
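The visual impression can be given a number. The sketch below uses the root-mean-square distance between each normalized Geekbench line and the normalized COMPILE line; that choice of measure is an assumption (the comparison in the text was made by eye), and the helper names are illustrative.

```python
import math

BASELINE = "T7300"
CATEGORIES = ("GEEKBENCH", "INT", "MEM", "STREAM")

# machine: (COMPILE 1, GEEKBENCH, INT, MEM, STREAM), copied from the tables.
DATA = {
    "T7300":                 (243,  2866,  2319, 2137, 1751),
    "Q9650":                 (172,  6267,  5423, 3275, 2454),
    "i7-980x 3C/3TH":        (81,   7443,  5396, 5504, 7868),
    "i7-980x 3C/6TH":        (72,   8535,  6656, 4895, 6761),
    "i7-980x 4 GB 2CH-1600": (47,  13861, 12002, 4622, 6129),
    "i7-980x 6 GB 3CH-800":  (48,  13888, 11921, 4355, 5866),
    "i7-980x":               (47,  13894, 11987, 4805, 6008),
    "i7-980x 3.6 GHz":       (44,  14975, 12923, 5010, 6547),
}


def factors(machine):
    """Return (compile factor, {category: score factor}) relative to the baseline."""
    base, row = DATA[BASELINE], DATA[machine]
    compile_f = base[0] / row[0]  # lower time is better, so baseline on top
    score_f = {cat: row[i + 1] / base[i + 1] for i, cat in enumerate(CATEGORIES)}
    return compile_f, score_f


def rms_deviation(category):
    """Root-mean-square gap between one Geekbench line and the COMPILE line."""
    squared = []
    for machine in DATA:
        compile_f, score_f = factors(machine)
        squared.append((score_f[category] - compile_f) ** 2)
    return math.sqrt(sum(squared) / len(squared))


rms = {cat: rms_deviation(cat) for cat in CATEGORIES}
best = min(rms, key=rms.get)

for cat in sorted(rms, key=rms.get):
    print(f"{cat:<10} {rms[cat]:.3f}")
print("smallest deviation:", best)
```

With these numbers the overall GEEKBENCH line comes out lowest, which agrees with the visual impression, though a different measure (mean absolute deviation, say) need not rank the lines the same way.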
The i7-980x 3C/3TH has a GEEKBENCH score of 7443 and the Q9650 has a score of 6267, which I would consider to be in the same ballpark. Yet the i7-980x 3C/3TH compiles about twice as fast as the Q9650 (81 vs. 172), proving that you can't use Geekbench scores to accurately estimate compilation times.
Over-clocking yields an 8% increase in the GEEKBENCH score and about the same percentage increase in compilation speed (47 vs. 44, roughly 7%).
With all this in mind, I am tempted to formulate the hypothesis that the Geekbench line closest to the COMPILE line indicates the main bottleneck for compilation on that configuration. (Please discuss :))
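As a starting point for that discussion, the hypothesis can be applied mechanically: for each configuration, pick the Geekbench category whose normalized factor lies closest to the normalized compile factor. This is only a sketch of the idea; "closest" is taken as the smallest absolute difference, which is an assumption, and the helper names are illustrative.

```python
BASELINE = "T7300"
CATEGORIES = ("GEEKBENCH", "INT", "MEM", "STREAM")

# machine: (COMPILE 1, GEEKBENCH, INT, MEM, STREAM), copied from the tables.
DATA = {
    "T7300":                 (243,  2866,  2319, 2137, 1751),
    "Q9650":                 (172,  6267,  5423, 3275, 2454),
    "i7-980x 3C/3TH":        (81,   7443,  5396, 5504, 7868),
    "i7-980x 3C/6TH":        (72,   8535,  6656, 4895, 6761),
    "i7-980x 4 GB 2CH-1600": (47,  13861, 12002, 4622, 6129),
    "i7-980x 6 GB 3CH-800":  (48,  13888, 11921, 4355, 5866),
    "i7-980x":               (47,  13894, 11987, 4805, 6008),
    "i7-980x 3.6 GHz":       (44,  14975, 12923, 5010, 6547),
}


def factors(machine):
    """Return (compile factor, {category: score factor}) relative to the baseline."""
    base, row = DATA[BASELINE], DATA[machine]
    compile_f = base[0] / row[0]  # lower time is better, so baseline on top
    score_f = {cat: row[i + 1] / base[i + 1] for i, cat in enumerate(CATEGORIES)}
    return compile_f, score_f


def closest_category(machine):
    """Category whose normalized score is nearest the normalized compile factor."""
    compile_f, score_f = factors(machine)
    return min(CATEGORIES, key=lambda cat: abs(score_f[cat] - compile_f))


for machine in DATA:
    if machine == BASELINE:
        continue  # every factor is 1.0 for the baseline, so nothing to compare
    print(f"{machine:<24} -> {closest_category(machine)}")
```

Read this way, the Q9650 lands on STREAM and the full six-core i7-980x configurations land on INT, while the two three-core configurations land on the overall GEEKBENCH score, which does not name a single subsystem.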
Conclusions
For those who skipped everything and are only reading the conclusions: this screed is solely concerned with code compilation performance, nothing else.
- Geekbench is no substitute for actual compilation benchmarks.
- Of all the numbers Geekbench produces, the combined Geekbench score seems to correlate best with compile times.
- The memory system is not much of a factor when compiling on an i7-980x system.
- With ample main memory, I/O either to the hard-disk or to the SSD is a negligible factor.
- Data intensive applications may actually be slowed down by a larger number of worker threads.
- Hyper-threading seems to lose its luster with the addition of more cores. Hyper-threading made a significant difference with 3 cores, but that advantage almost vanished with six cores. I would guess that cache contention is the limiting factor.
- Over-clocking your system brings rewards to those who dare.
- Using Geekbench and compile time results together may hint at the bottleneck of the system.
- I may have crippled my Q9650 system with slow memory.