Thanks for the effort hornet, graphs make things so much clearer than a whole bunch of numbers! ;)
How do you spell ownage? :ROTF:
I asked about this some pages back in this thread but got no viable answer... We need Nehalem ppd statistics (the actual work it does) in order to measure the SMT influence in WCG.
I see JC has posted that all the latest tests were done with 3-channel memory, so that's it as far as memory efficiency goes.
Please explain?
I for one am not exceptionally impressed with the initial Nehalem performance, even less so if this is with triple-channel memory.
That a lower-clocked, TLB-patch-ridden old Socket F system with poor memory performance does this well compared to Nehalem should worry you more than gloating about "ownage". If Shanghai corrects TDP and clocks, I do not see Nehalem as being as superior as most people in this thread do.
Intel employee dr.who's appearance in this thread supports my assumptions, IMO.
PS: This is not for starting flame war, but based on quite obvious results. I spent quite a long time doing benches and comparing yesterday, so please don't misunderstand this as threadcrapping :)
BTW, why does the Gainestown result show 16 threads? Bug?
Yes, but I see now that it is the other run, with the Bloomfield @ 3.07GHz ;)
You do realize what you are comparing here?
8 Barcelona cores @ ~2.4GHz vs. 4 Nehalem cores @ 2.93GHz.
8 Barcelona cores @ 2.4GHz under 64-bit (which improves performance by 15-20%) are losing to 4 Nehalem cores @ 2.93GHz under 32-bit.
Granted, your numbers are a bit lower than they should be for a Barcelona system. If you read Anandtech's MP Barcelona review, a dual Opteron 8356 (2.3GHz) setup scores 14,487 under 64-bit. Were Nehalem also in a 64-bit environment, that would mean that 4 Nehalem cores @ 2.93GHz would be a good 15% faster than 8 Barcelona cores @ 2.3GHz. Work that out and it means that 4 Nehalem cores are not that far behind clock-for-clock parity with 8 Barcelona cores.
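Rough sketch of that math in C (only the 14,487 Anandtech score is a real number; the Nehalem 32-bit figure is just a placeholder since the actual score isn't quoted here, and the 15% 64-bit uplift is the assumption above):

    /* Back-of-envelope version of the argument above; the Nehalem number is hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        double barcelona_64 = 14487.0;  /* 8 Barcelona cores @ 2.3GHz, 64-bit (Anandtech) */
        double nehalem_32   = 14487.0;  /* HYPOTHETICAL: assume the 32-bit run merely ties */
        double x64_gain     = 1.15;     /* assumed ~15% uplift from going 64-bit */

        double nehalem_64_est = nehalem_32 * x64_gain;
        printf("Estimated Nehalem 64-bit score: %.0f\n", nehalem_64_est);
        printf("Lead over 8 Barcelona cores: %.0f%%\n",
               100.0 * (nehalem_64_est / barcelona_64 - 1.0));
        return 0;
    }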
If you look at the Gainestown DP results, they are just as impressive. With SMT enabled (the full 16 threads; for some reason only 8 threads were run in the posted result), Dual Gainestown @ 2.4GHz will effectively tie Quad AMD Opteron 8356 @ 2.3GHz in Cinebench 64-bit.
Now imagine the performance of Quad Beckton (32 cores, 64 threads, 24MB L3/CPU) and you know that AMD is in trouble.
This is a comparison of 8 Nehalem threads @ 2.93GHz vs. 8 Barcelona threads @ 2.375GHz. HT or not, a thread is a thread. That is a more correct description.
? Not sure if you were replying to me or not.
A thread doesn't mean anything, because the 8 threads in Nehalem don't cost you anything. Quad core Nehalem is fully comparable to Quad core Barcelona. DP Gainestown (8 cores) is fully comparable to DP Barcelona (8 cores).
(The only area where a thread would "cost" you is if you were paying for an OS license per CPU; then HT would probably be disabled. But that doesn't apply to anything here.)
I think it's not that easy, because when you look at the Gainestown score and then at the Bloomfield score you see where the difference lies. Gainestown has 8 threads on 8 physical cores, while Bloomfield has 8 threads on 4 physical cores plus 4 virtual cores.
And if you compare Gainestown's 8 threads (i.e. HT disabled) with your Opteron 8-thread scores, the Gainestown results are quite impressive.
lol, you want to put every other system to shame with a score of 22k+ in Cinebench?
Well, I'm certainly not gonna stop you. :D
So... because Nehalem supports SMT, you quickly say AMD is better because 2 quad-cores can compete with 1 quad-core?
Try 2 quads vs. 2 quads. Or try to see what a Bloomfield costs vs. those two 2000-series Barcelonas.
1 Nehalem quad @ 2.93GHz beats 2 Barcelona quads @ 2.4GHz in Cinebench.
Not to mention that Barcelona runs under 64-bit and Nehalem under 32-bit (Nehalem would perform some 10% better with 64-bit).
1 CPU vs. 2 CPUs. $500 vs. $1,500.
Or use the Gainestown with SMT disabled: about 50% faster than the dual Barcelonas at the same 2.4GHz.
It is weird that we discuss the numbers Nehalem puts out based on one app that favors Intel hardware, and that's Cinebench (what's the use of Cinebench anyway?).
I would like to see some 3D Studio, Lightwave, Maya etc. numbers, plus some head-on encoding/decoding comparisons using optimal compiler tweaks for each uarch (Core 2, K10, Nehalem).
You know, the real stuff people actually use. WCG ppd counts too.
Cinebench is a 3D rendering benchmark. So it is a good benchmark of how well these CPUs will perform with rendering. And I wouldn't say it favors Intel hardware at all. K8 actually isn't too far behind Conroe in Cinebench. Unfortunately K10 does very little to improve performance in Cinebench beyond adding more cores.
To me one of the most exciting things will be that a quad-core Nehalem overclocked to 4.0GHz will roughly equal the performance of 16 Barcelona cores @ 2.3GHz in Cinebench.... 4x Opteron 8356 is currently a $6,000 setup. A 4GHz Nehalem might be possible out of a $284 CPU.... and certainly the $999 XE part. Obviously you can't overclock a CPU when you are doing critical work, but the power that is going to be available to enthusiasts with Nehalem is quite amazing.
3D rendering :off::lol: ... cutting down and saving 3D rendering time, I think = time is money :ROTF:
http://news.yahoo.com/s/nm/20080708/..._dreamworks_dc
http://i245.photobucket.com/albums/g...aneous/Opt.jpg
http://i245.photobucket.com/albums/g...aneous/P_I.jpg
...
It's more weird that since 2006 AMD enthusiasts suddenly claim that every benchmark out there favours Intel and therefore isn't valid (except for ScienceMark and SPECfp_rate).
What do you think happens in BOINC? E.g. look at SETI: the client is optimized by the community to support the specific instructions of certain CPUs.
But sadly only Intel always gets the newest optimisations -> AMD is stuck at SSE3 while the optimized Intel client already supports SSE4.1.
And no, that has nothing to do with fanboyism on their side; it's about the bigger userbase they can reach when they optimize for Intel CPUs.
Also, why should a dev use generic compiler flags? To limit his program artificially? If there is the option, I would always try to provide optimizations for both CPUs, or even for certain architectures.
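For what it's worth, here's a minimal sketch of what such per-architecture dispatch could look like, assuming GCC/Clang on x86 (sum_sse3/sum_sse41 are made-up names, not from any real SETI/BOINC client; in a real client they would be built separately with -msse3 / -msse4.1 and contain genuinely different code):

    #include <stdio.h>

    /* Plain C fallback path. */
    static double sum_generic(const float *v, int n)
    {
        double s = 0.0;
        for (int i = 0; i < n; i++)
            s += v[i];
        return s;
    }

    /* Hypothetical optimized paths; here they just reuse the generic loop
       so the sketch compiles anywhere. */
    static double sum_sse3(const float *v, int n)  { return sum_generic(v, n); }
    static double sum_sse41(const float *v, int n) { return sum_generic(v, n); }

    int main(void)
    {
        float data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        double (*sum)(const float *, int) = sum_generic;

        __builtin_cpu_init();                      /* must run before the feature checks */
        if (__builtin_cpu_supports("sse4.1"))
            sum = sum_sse41;                       /* newest path */
        else if (__builtin_cpu_supports("sse3"))
            sum = sum_sse3;                        /* older path, e.g. K8/K10 */

        printf("sum = %f\n", sum(data, 8));
        return 0;
    }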
Actually, Cinebench uses scalar SSE instead of vector SSE instructions, thus you see no change between K8 and K10 scores. The use of vector SSE is the main point of even optimizing for that instruction extension set. Thus Cinebench is not a "good" benchmark.
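To illustrate the scalar vs. vector SSE point (a generic example, not Cinebench's actual code): a scalar add (addss) works on one float, while a packed add (addps) does four per instruction, and the packed case is what K10's widened 128-bit FP units actually speed up:

    #include <stdio.h>
    #include <xmmintrin.h>   /* SSE intrinsics */

    int main(void)
    {
        float a[4] = {1, 2, 3, 4}, b[4] = {10, 20, 30, 40}, out[4];

        __m128 va = _mm_loadu_ps(a);
        __m128 vb = _mm_loadu_ps(b);

        /* Scalar SSE: only the lowest element is added (addss),
           the upper three are passed through from va. */
        __m128 scalar = _mm_add_ss(va, vb);

        /* Packed (vector) SSE: all four elements added in one instruction (addps). */
        __m128 packed = _mm_add_ps(va, vb);

        _mm_storeu_ps(out, scalar);
        printf("scalar: %g %g %g %g\n", out[0], out[1], out[2], out[3]);
        _mm_storeu_ps(out, packed);
        printf("packed: %g %g %g %g\n", out[0], out[1], out[2], out[3]);
        return 0;
    }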
Also, I know people somehow think that SuperPi represents a valid test of strong FP performance (Intel clearly owns that test), but there are programs that calculate Pi faster on AMD hardware, and I don't see anyone using those (http://www.xtremesystems.org/forums/...&highlight=gmp).