actually these are very benchmarks, i can still access the pictures through url :)
JJ is right.
Even though it's very sweet of justhefax to "normalize" the results, his calculations say exactly nothing.
A calculation that would come closer is to first find the scaling coefficient for those benchmarks with a "normal" Agena (compare 2.3GHz to 2.7GHz performance to see how well it scales) and then use those coefficients to track back how the Shanghai would do at 2.3GHz.
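A rough sketch of that idea in Python, assuming a simple linear scaling model. The function names and all scores are hypothetical placeholders, not measured results; only the 2.3GHz and 2.7GHz clocks come from the thread.

```python
def scaling_coefficient(perf_low, perf_high, clock_low, clock_high):
    """Share of a clock increase that shows up as a performance increase (1.0 = perfect)."""
    return (perf_high / perf_low - 1.0) / (clock_high / clock_low - 1.0)


def project_down(perf_high, clock_low, clock_high, coefficient):
    """Estimate the score at clock_low from a score measured at clock_high."""
    return perf_high / (1.0 + coefficient * (clock_high / clock_low - 1.0))


# Hypothetical Agena scores at 2.3GHz and 2.7GHz (placeholders, not measured).
agena_23, agena_27 = 100.0, 112.0
k = scaling_coefficient(agena_23, agena_27, 2.3, 2.7)

# Hypothetical Shanghai score at 2.7GHz, tracked back to 2.3GHz with the Agena coefficient.
shanghai_27 = 120.0
shanghai_23 = project_down(shanghai_27, 2.3, 2.7, k)
print(f"coefficient {k:.2f}, Shanghai at 2.3GHz ~ {shanghai_23:.1f}")
```

The coefficient would have to be measured per benchmark, since each one scales differently with clock speed.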
All in all this looks very promising. We already knew AMD would not get the performance crown any time soon, but it seems that at this point they could be taking the perf/watt crown back from Intel.
Ran a few of those Shanghai benchmarks here at 2.3GHz with mem at 667MHz (1) and at 2.7GHz with mem at 800MHz (2). I repeated each benchmark 5 times and the deviation stays below 2% for all results.
http://www.abload.de/img/shanghai-projectionp1s0.jpg
Based on that I estimated how that 2356 Barcelona would operate at 2.7GHz with mem at 800MHz.
The last row shows the improvement of the 2384 over the estimated 2356 at the same clocks. I assumed both Opterons ran with a 2GHz NB speed.
Don't know where to get those other benchmarks; MySQL especially would be interesting.
This should be the test config:
Intel: Intel S5000PSL, 8 GB DDR2-800 FBDIMM (8 modules)
AMD: ASUS KFSN5-D, 8 GB DDR2-800 (8 modules)
Both: Samsung Spinpoint F1 1TB, Intel X25M 80GB SSD (database benchmark only), Cooler Master iGreen 850W
No, he is not right. There is a reason why I included Xeon 3.33 vs. Xeon 2.66 here: http://www.xtremesystems.org/forums/...7&postcount=14. Notice that there is only a 16% improvement when the clock difference is 25%. So you can't make an accurate estimate between a 2.3 Barcelona and a simulated 2.3 Shanghai the way justthefax did (thanks anyway), which assumed 17% less clock means 17% less perf.
So if Barcelona's perf scaling is similar to what we see with Xeon 2.66 - 3.33, then I guess there is an ~8-10% average improvement clock for clock between Barcelona and Shanghai in these benchmarks (and except for MySQL, virtualization or perhaps Sungard, they aren't server workloads at all). And that performance improvement is just fine, keeping in mind we are only talking about a new process with a few tweaks, not a new arch.
Of course, on the other hand, the power consumption improvement is superb.
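To make the scaling argument above concrete, here is a small Python sketch. The 16% and 25% Xeon figures and the 2.3GHz and 2.7GHz clocks are from the post; the observed Shanghai-over-Barcelona gain is a hypothetical placeholder, and assuming Barcelona scales like the Xeon is exactly the assumption the post makes, not a measured fact.

```python
def scaling_efficiency(perf_gain, clock_gain):
    """Fraction of a clock gain that turns into a performance gain."""
    return perf_gain / clock_gain


def clock_for_clock_gain(observed_gain, expected_clock_gain):
    """Gain left over once the part explained by the higher clock is divided out."""
    return (1.0 + observed_gain) / (1.0 + expected_clock_gain) - 1.0


# Xeon 2.66 -> 3.33: 16% more performance for 25% more clock (figures from the post).
xeon_efficiency = scaling_efficiency(0.16, 0.25)

# Assumption: Barcelona scales like the Xeon between 2.3GHz and 2.7GHz.
clock_gain = 2.7 / 2.3 - 1.0
expected_gain_from_clock = xeon_efficiency * clock_gain

# Hypothetical observed gain of the 2.7GHz Shanghai over the 2.3GHz Barcelona.
observed_gain = 0.20
per_clock = clock_for_clock_gain(observed_gain, expected_gain_from_clock)
print(f"efficiency {xeon_efficiency:.2f}, clock-for-clock gain {per_clock:.1%}")
```

With perfect scaling assumed instead (efficiency 1.0), the same observed gain would leave a much smaller clock-for-clock improvement, which is why the two estimates in this thread disagree.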
JJ stated they are not accurate, and I concur. Saying they tell us nothing is an exaggeration.
It's true that extrapolation from those numbers underestimates K10.5 performance slightly, but the numbers do not lie, it's just a shrink with very good power consumption.
I have an idea. Let's compare Intel scaling numbers, which are available from the same source. Intel and AMD have the same clock scaling; Intel scales less if limited by bandwidth in some tests, but on the whole they're similar. I've seen the clock-scaling tests done.
Quote:
A calculation that would come closer, is to first find out the scaling co-efficient for those benchmarks with a "normal" Agena (compare 2.3 to 2.7 performance to see how well it scales) and then use those coefficients to track back how the shanghai would do at 2.3 Ghz.
Unsurprisingly, Cinebench and POV-Ray scale almost perfectly with a 25% clock change on Intel's crippled FSB; FlamMap, FSPRO and Sungard AA scale nicely, plus or minus a couple of percent.
So we can use those benchmarks to assess K10 vs K10.5, because the clock difference is only ~17% and they should show perfect scaling.
In those 4 tests K10.5 is on average 3-4% faster. It's a die shrink, QED.
EDIT:
I'm fine with that, I actually expect something like that. 8-10% seems healthy for a die shrink (Penryn is in that range too, probably 8% or slightly lower).
EDIT2:
I'm no expert, correct me if I'm wrong, but isn't this an intentional crippling of the Intel system? Doesn't every single FB-DIMM controller consume power? So if they had used 2*4GB they could have reduced the FB-DIMM consumption by up to 75%?
EDIT3:
Yes, you do, but the overall success of an arch can be predicted more easily when you compare average IPC change at the same power draw, clock scaling, die size and yields.
Personally I believe it's going to be a tough time for AMD because of i7. It will eat into their only high-ASP market (server, HPC) and push down the prices of the already high-yielding (smaller die than K10.5) and well-scaling (clocks!) Penryns. But that's just an economic prediction, I don't want to steal their thunder. Good job AMD.
His calculations tell us nothing because they are grossly inaccurate.
How inaccurate do you want it to be, before you call such calculations useless?
For me, being off by ±7% when even one percent makes a big difference is no longer in the realm of useful in any way, but that's just me.
Link
16 Shanghai cores at 2.7GHz are 42% faster on the SPECjbb2005 benchmark than 24 Dunnington cores running at 2.66GHz. Even with perfect frequency scaling, Intel would need 24 cores at roughly 3.8GHz (2.66GHz × 1.42) just to match the Shanghai 4P server.
Quote:
IBM posts leadership 4-processor blade score on SPECjbb2005 benchmark
November 13, 2008 ... IBM® BladeCenter® LS42 using IBM Java™ 6 Runtime Environment,
achieved a leadership 4-processor blade result of 721,843 SPECjbb2005® business operations
per second (SPECjbb2005 bops) and 180,461 SPECjbb2005 bops/JVM, running SPECjbb2005
(Java Business Benchmark), SPEC’s benchmark for evaluating the performance of servers
running typical Java applications.
The LS42 was configured with the AMD Opteron™ Model 8384 quad-core processor at 2.7GHz
with 2MB L2 cache and 6MB L3 cache (4 chips/16 cores/4 cores per chip), 64GB of memory, one
36.4GB disk drive, and IBM Java 6 (using a 1875MB heap), and Microsoft® Windows® Server
2008 Enterprise x64 Edition. (1)
The LS42’s score demonstrates the highest performance achieved to date by a 4-processor
blade server, surpassing the score of 383,456 SPECjbb2005 bops of the Dell PowerEdge M905,
which ran BEA JRockit® 6.0, and used the AMD Opteron Model 8356 quad-core processor at
2.3GHz (4 chips/16 cores/4 cores per chip). (2)
The LS42’s score also handily beats—by 42%—the 508,240 SPECjbb2005 bops of the Dell
PowerEdge R900, which ran BEA JRockit 6.0, and used the Intel® Xeon® Processor X7460 at
2.66GHz (4 chips/24 cores/6 cores per chip). (3)
BladeCenter LS42 blade servers, coupled with the BladeCenter chassis, deliver advanced
application serving with performance, power efficiency, and scalability ideal for enterprise
environments.
Results referenced are current as of November 13, 2008. The SPECjbb2005 results have been
submitted to SPEC for review. Upon successful review, the result will be posted at www.spec.org,
which contains a complete list of published SPECjbb2005 results.
(1) The LS42 model using the AMD Opteron Model 8384 quad-core processor is planned to be
generally available on November 30, 2008.
(2) Dell PowerEdge M905: 383,456 SPECjbb2005 bops and 95,854 SPECjbb2005 bops/JVM,
using four AMD Opteron 8356 quad-core processors at 2.3GHz (4 chips/16 cores/4 cores per
chip), 32GB of memory, one 36GB disk drive, and BEA JRockit 6.0 P27.5.0. The comparison is
based on Dell’s best SPECjbb2005 score for a 4-processor blade server published at SPEC as of
November 13, 2008.
(3) Dell PowerEdge R900: 508,240 SPECjbb2005 bops and 127,060 SPECjbb2005 bops/JVM,
using four Intel Xeon Processor X7460 at 2.66GHz (4 chips/24 cores/6 cores per chip), 64GB of
memory, two 36GB disk drives, and Oracle JRockit 6.0 P27.5.0. The comparison is based on
Dell’s best SPECjbb2005 score for a 4-processor server published at SPEC as of November 13,
2008.
View all published results at www.spec.org/jbb2005/results/jbb2005.html
IBM and System x are trademarks or registered trademarks of IBM Corporation.
BEA JRockit is a registered trademark of BEA Systems, Inc.
Intel and Xeon are registered trademarks of Intel Corporation.
Java and all Java-based trademarks are trademarks of Sun Microsystems, Inc., in the United
States, other countries, or both.
Microsoft and Windows are registered trademarks of Microsoft Corporation.
SPEC and SPECjbb2005 are trademarks or registered trademarks of Standard Performance
Evaluation Corporation (SPEC).
All other company/product names and service marks may be trademarks or registered
trademarks of their respective companies.
Who cares? Are you kidding me?? Well, maybe an "Xtreme OCer" doesn't...
These are server CPUs and in this particular Java benchmark they just scream: 60% faster per clock than Barcelona, which was beaten by Dunnington. Now it is reversed again.
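For anyone who wants to check the 42% and 60% figures, here is a quick Python sketch using the bops scores and clocks quoted in the IBM release. The variable names are mine, and the per-clock number simply divides out the clock ratio, so it says nothing about the different JVMs and OS versions used in the three submissions.

```python
# SPECjbb2005 bops and clocks as quoted in the IBM release above.
ls42_bops, ls42_ghz = 721_843, 2.7    # 4x Opteron 8384 (Shanghai), 16 cores
m905_bops, m905_ghz = 383_456, 2.3    # 4x Opteron 8356 (Barcelona), 16 cores
r900_bops, r900_ghz = 508_240, 2.66   # 4x Xeon X7460 (Dunnington), 24 cores

# Raw lead of the Shanghai blade over the Dunnington server.
lead_over_r900 = ls42_bops / r900_bops

# Clock Dunnington would need to match it, assuming perfect frequency scaling.
clock_to_match = r900_ghz * lead_over_r900

# Shanghai vs Barcelona with the clock difference divided out.
per_clock_vs_m905 = (ls42_bops / m905_bops) / (ls42_ghz / m905_ghz)

print(f"lead over R900: {lead_over_r900 - 1:.0%}")
print(f"clock needed to match: {clock_to_match:.2f}GHz")
print(f"per-clock vs M905: {per_clock_vs_m905 - 1:.0%}")
```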
Sounds like the new IBM java run-time is either really good, or IBM has found an optimization trick with SPECjbb2005 much like Sun's compiler did with SPECfp2000.
I have to admit, the new 45nm chips look good. But what about Core i7 based Xeon chips? That will be the true test for AMD in the server market.
There are no i7-based Xeons at the moment. Intel chose to introduce the desktop part first, AMD chose to introduce the server part first (which looks like the smarter move to me, financially).
But maybe AMD could only do this because only the CPUs had to be validated, not the entire platform.
New SPEC submissions are in for the 2.7GHz Opterons (2384 and 8384). For now only int_rate and fp_rate:
2384:
http://spec.org/cpu2006/results/res2...024-05683.html
http://spec.org/cpu2006/results/res2...024-05684.html
8384:
http://spec.org/cpu2006/results/res2...024-05685.html
http://spec.org/cpu2006/results/res2...024-05686.html
Before Shanghai, the top scores for AMD in these two tests were held by the Opteron 2360. From my early calculations with clock normalization (8% difference between the 2.5GHz Barcelona and the 2.7GHz Shanghai; best submitted Barcelona scores used; 2P scores used) you can see that the 45nm part is ~18% faster per clock in int_rate and ~20% faster per clock in fp_rate than Barcelona.
Using the 8xxx series scores, the lead is ~19% in int_rate and ~17% in fp_rate (per clock, of course).
No...
Dell uses the BEA/Oracle JRockit 6 JVM + Windows 2003 SP1.
Quote:
The LS42 was configured with the AMD Opteron™ Model 8384 quad-core processor at 2.7GHz
with 2MB L2 cache and 6MB L3 cache (4 chips/16 cores/4 cores per chip), 64GB of memory, one
36.4GB disk drive, and IBM Java 6 (using a 1875MB heap), and Microsoft® Windows® Server
2008 Enterprise x64 Edition. (1)
IBM uses IBM Java 6 JVM + Windows 2008 SP1.
I just went through the int_rate/fp_rate numbers for 24-core Dunnington systems at 2.66GHz and compared them to a 16-core Shanghai server at 2.7GHz. The Dunnington system leads by 16% in the int_rate test and falls behind by 44% in the fp_rate test (24 versus 16 cores; clocks are essentially the same, with about a 1% difference).
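Since those are throughput scores with different core counts, a per-core view may help. This is only a sketch in Python: it uses the 16% int_rate lead and the 24 and 16 core counts from the post, ignores the small clock difference, and leaves out fp_rate because "falls behind by 44%" can be read two ways.

```python
dunnington_cores, shanghai_cores = 24, 16

# Dunnington's int_rate throughput relative to Shanghai (16% lead, from the post).
int_rate_ratio = 1.16

# Same ratio expressed per core: throughput ratio divided by the core-count ratio.
per_core_ratio = int_rate_ratio / (dunnington_cores / shanghai_cores)

# Flip it to see Shanghai's per-core advantage in int_rate.
shanghai_per_core_lead = 1.0 / per_core_ratio - 1.0

print(f"Dunnington per core: {per_core_ratio:.2f}x Shanghai")
print(f"Shanghai per-core lead: {shanghai_per_core_lead:.0%}")
```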
Patience, young Skywalker :), they will be available soon. The rate scores are also valid for clock-for-clock evaluation.
It's interesting to see how well these things scale. I'm willing to bet that the 6-core Istanbul will scale great too, so we can "predict" its scores using the present Shanghai numbers (it should scale pretty much in line with core count, and if it does, it will simply destroy Dunnington, which is already pretty much matched by Shanghai).
I said that I found the performance disappointing but the power improvements are excellent. How is it a crap post? Are you happy with the IPC improvement? What was your guess for the IPC improvement, and what is it in reality? Does every opinion have to be 100% positive for AMD for you not to be insulting?