Thanks for the effort hornet, graphs make things so much clearer than a whole bunch of numbers! ;)
How do you spell ownage? :ROTF:
I asked about this some pages back in this thread but got no viable answer... We need Nehalem ppd statistics (the actual work it does) in order to measure the SMT influence in WCG.
I see JC has posted that all the latest tests were done with 3-channel memory, so that's it as far as memory efficiency goes.
Please explain?
I for one am not exceptionally impressed with the initial Nehalem performance, even less so if this is with triple-channel memory.
That a lower-clocked, TLB-patch-ridden old Socket F system with poor memory performance does this well compared to Nehalem should worry you more than gloating about "ownage". If Shanghai corrects TDP and clocks, I do not see Nehalem as being as superior as most people in this thread do.
Intel employee dr.who's appearance in this thread supports my assumptions, IMO.
PS: This is not for starting flame war, but based on quite obvious results. I spent quite a long time doing benches and comparing yesterday, so please don't misunderstand this as threadcrapping :)
BTW, why does the Gainestown result show 16 threads? Bug?
Yes, but I see now that it is the other run, with the Bloomfield @ 3.07GHz ;)
You do realize what you are comparing here?
8 Barcelona cores @ ~2.4GHz vs. 4 Nehalem cores @ 2.93GHz.
8 Barcelona cores @ 2.4GHz under 64-bit (which improves performance by 15-20%) are losing to 4 Nehalem cores @ 2.93GHz under 32-bit.
Granted, your numbers are a bit lower than they should be for a Barcelona system. If you read Anandtech's MP Barcelona review, a dual Opteron 8356 (2.3GHz) setup scores 14,487 under 64-bit. Were Nehalem also in a 64-bit environment, that would mean that 4 Nehalem cores @ 2.93GHz would be a good 15% faster than 8 Barcelona cores @ 2.3GHz. Work that out and it means that 4 Nehalem cores are not that far behind clock-for-clock parity with 8 Barcelona cores.
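Rough sketch of that math in C (only the 14,487 Anandtech score is a real number; the Nehalem 32-bit figure is just a placeholder since the actual score isn't quoted here, and the 15% 64-bit uplift is the assumption above):

    /* Back-of-envelope version of the argument above; the Nehalem number is hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        double barcelona_64 = 14487.0;  /* 8 Barcelona cores @ 2.3GHz, 64-bit (Anandtech) */
        double nehalem_32   = 14487.0;  /* HYPOTHETICAL: assume the 32-bit run merely ties */
        double x64_gain     = 1.15;     /* assumed ~15% uplift from going 64-bit */

        double nehalem_64_est = nehalem_32 * x64_gain;
        printf("Estimated Nehalem 64-bit score: %.0f\n", nehalem_64_est);
        printf("Lead over 8 Barcelona cores: %.0f%%\n",
               100.0 * (nehalem_64_est / barcelona_64 - 1.0));
        return 0;
    }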
If you look at the Gainestown DP results, they are just as impressive. With SMT enabled (the full 16 threads; for some reason only 8 threads were run in the posted result), Dual Gainestown @ 2.4GHz will effectively tie Quad AMD Opteron 8356 @ 2.3GHz in Cinebench 64-bit.
Now imagine the performance of Quad Beckton (32 cores, 64 threads, 24MB L3/CPU) and you know that AMD is in trouble.
This is a comparison of 8 Nehalem threads @ 2.93GHz vs. 8 Barcelona threads @ 2.375GHz. HT or not, a thread is a thread. That is a more correct description.
? Not sure if you were replying to me or not.
A thread doesn't mean anything, because the 8 threads in Nehalem don't cost you anything. Quad core Nehalem is fully comparable to Quad core Barcelona. DP Gainestown (8 cores) is fully comparable to DP Barcelona (8 cores).
(The only area where a thread would "cost" you is if you were paying for an OS license per CPU; then HT would probably be disabled. But that doesn't apply to anything here.)
I think it's not that easy, because when you look at the Gainestown score and then at the Bloomfield score you see where the difference lies. Gainestown has 8 threads on 8 physical cores, while Bloomfield has 8 threads on 4 physical cores plus 4 virtual cores.
And if you compare Gainestown's 8 threads (i.e. HT disabled) with your Opteron 8-thread scores, the Gainestown results are quite impressive.
lol, you want to put every other system to shame with a score of 22k+ in Cinebench?
Well, I'm certainly not gonna stop you. :D
So... because Nehalem supports SMT, you quickly say AMD is better because 2 quad-cores can compete with 1 quad-core?
Try 2 quads vs. 2 quads. Or try to see what a Bloomfield costs vs. those two 2000-series Barcelonas.
1 Nehalem quad @ 2.93GHz beats 2 Barcelona quads @ 2.4GHz in Cinebench.
Not to mention that Barcelona runs under 64-bit and Nehalem under 32-bit (Nehalem would perform some 10% better with 64-bit).
1 CPU vs. 2 CPUs. $500 vs. $1,500.
Or use the Gainestown with SMT disabled: about 50% faster than the dual Barcelonas at the same 2.4GHz.
It is weird that we discuss the numbers Nehalem puts out based on one app that favors Intel hardware, and that's Cinebench (what's the use of Cinebench anyway?).
I would like to see some 3D Studio, Lightwave, Maya etc. numbers, plus some head-on encoding/decoding comparisons using optimal compiler tweaks for each uarch (Core 2, K10, Nehalem).
You know, the real stuff people actually use. WCG ppd counts too.
Cinebench is a 3D rendering benchmark. So it is a good benchmark of how well these CPUs will perform with rendering. And I wouldn't say it favors Intel hardware at all. K8 actually isn't too far behind Conroe in Cinebench. Unfortunately K10 does very little to improve performance in Cinebench beyond adding more cores.
To me one of the most exciting things will be that a quad-core Nehalem overclocked to 4.0GHz will roughly equal the performance of 16 Barcelona cores @ 2.3GHz in Cinebench.... 4x Opteron 8356 is currently a $6,000 setup. A 4GHz Nehalem might be possible out of a $284 CPU.... and certainly the $999 XE part. Obviously you can't overclock a CPU when you are doing critical work, but the power that is going to be available to enthusiasts with Nehalem is quite amazing.
3D rendering :off::lol: ... cutting down and saving 3D rendering time, I think = time is money :ROTF:
http://news.yahoo.com/s/nm/20080708/..._dreamworks_dc
http://i245.photobucket.com/albums/g...aneous/Opt.jpg
http://i245.photobucket.com/albums/g...aneous/P_I.jpg
...
It's more weird that since 2006 AMD enthusiasts suddenly claim that every benchmark out there favours Intel and therefore isn't valid (except for ScienceMark and SPECfp_rate).
What do you think happens in BOINC? E.g. look at SETI: the client is optimized by the community to support the specific instructions of certain CPUs.
But sadly only Intel always gets the newest optimisations -> AMD is stuck at SSE3 while the optimized Intel client already supports SSE4.1.
And no, that has nothing to do with fanboyism on their side; it's about the bigger userbase they can reach when they optimize for Intel CPUs.
Also, why should a dev use generic compiler flags? To limit his program artificially? If there is the option, I would always try to provide optimizations for both CPUs, or even for certain architectures.
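For what it's worth, here's a minimal sketch of what such per-architecture dispatch could look like, assuming GCC/Clang on x86 (sum_sse3/sum_sse41 are made-up names, not from any real SETI/BOINC client; in a real client they would be built separately with -msse3 / -msse4.1 and contain genuinely different code):

    #include <stdio.h>

    /* Plain C fallback path. */
    static double sum_generic(const float *v, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += v[i];
        return s;
    }

    /* Hypothetical optimized paths; here they just reuse the generic loop
       so the sketch compiles anywhere. */
    static double sum_sse3(const float *v, int n)  { return sum_generic(v, n); }
    static double sum_sse41(const float *v, int n) { return sum_generic(v, n); }

    int main(void)
    {
        float data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        double (*sum)(const float *, int) = sum_generic;

        __builtin_cpu_init();                      /* must run before the feature checks */
        if (__builtin_cpu_supports("sse4.1"))
            sum = sum_sse41;                       /* newest path */
        else if (__builtin_cpu_supports("sse3"))
            sum = sum_sse3;                        /* older path, e.g. K8/K10 */

        printf("sum = %f\n", sum(data, 8));
        return 0;
    }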
Actually, Cinebench uses scalar SSE instead of vector SSE instructions, thus you see no change between K8 and K10 scores. The use of vector SSE is the main point of even optimizing for that instruction extension set. Thus Cinebench is not a "good" benchmark.
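To illustrate the scalar vs. vector SSE point (a generic example, not Cinebench's actual code): a scalar add (addss) works on one float, while a packed add (addps) does four per instruction, and the packed case is what K10's widened 128-bit FP units actually speed up:

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    int main(void)
    {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, out[4];

        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);

        /* Scalar SSE: only the lowest element is added (addss),
           the upper three are passed through from va. */
        __m128 scalar = _mm_add_ss(va, vb);

        /* Packed (vector) SSE: all four elements added in one instruction (addps). */
        __m128 packed = _mm_add_ps(va, vb);

        _mm_storeu_ps(out, scalar);
        printf("scalar: %g %g %g %g\n", out[0], out[1], out[2], out[3]);
        _mm_storeu_ps(out, packed);
        printf("packed: %g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }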
Also, I know people somehow think that SuperPi represents a valid test of strong FP performance (Intel clearly owns that test), but there are programs that calculate Pi faster on AMD hardware, and I don't see anyone using those (http://www.xtremesystems.org/forums/...&highlight=gmp).