Barcelona vs Clovertown
I did some tests recently of OpenLDAP on Opteron 875, Opteron 2347, and Xeon 5345. The Opteron systems both outperformed the Xeon, despite a clock speed disadvantage. The 2.2GHz Opteron 875 is still better than the 2.3GHz Xeon 5345 under heavy load, and the 1.9GHz Opteron 2347 is just amazing.
http://connexitor.com/blog/pivot/entry.php?id=191
Always makes me wonder how these standard benchmark programs like SuperPi etc. were written and compiled. Personally, I think anything closed source simply can't be trusted. You have no idea what compiler optimizations were used, you have no idea whether large computations take chip cache sizes into account, etc. etc... If you want to benchmark the absolute performance potential of a system, you have to make sure the software you're testing with can actually take advantage of that system.
systemviper
01-07-2008, 11:13 PM
It's still an AMD, "yawn" :rofl:
Sorry, I'm an Intel fanboy....
linhvndiy
01-07-2008, 11:56 PM
Wow, it's very cool.
My server customers also report that a single Barcelona 2347 chip outperforms dual Opteron 280s by a large margin (more than 30%).
Can you test MySQL and Apache on those systems?
Many thanks.
I don't have any test harness for mysql or apache, sorry.
I just noticed in this post http://www.xtremesystems.org/forums/showpost.php?p=2681946&postcount=534 that folks are puzzling over how a 15% increase in clock yielded an 18% increase in performance. AMD has really given us a few puzzles it seems. Just like how my server consistently got a 111% performance gain going from one core to two cores in my tests. (Look at the back-null auth rate for 1 core and 2 cores for Opteron 2347 - 13760 to 29085...)
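For anyone checking the arithmetic, the 111% figure follows directly from the two back-null auth rates quoted above. A minimal sketch (the function names are illustrative and not part of the test harness):

```python
def speedup(base_rate: float, new_rate: float) -> float:
    """Return how many times faster new_rate is than base_rate."""
    return new_rate / base_rate


def gain_percent(base_rate: float, new_rate: float) -> float:
    """Return the relative gain as a percentage of base_rate."""
    return (speedup(base_rate, new_rate) - 1.0) * 100.0


# Back-null auth rates quoted in the post for the Opteron 2347.
one_core = 13760
two_cores = 29085

print(f"speedup: {speedup(one_core, two_cores):.2f}x")    # speedup: 2.11x
print(f"gain: {gain_percent(one_core, two_cores):.0f}%")  # gain: 111%
```

Doubling the core count would give at most a 100% gain under ideal linear scaling, so anything above that is the puzzle being described.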
JohannesRS
01-08-2008, 10:27 AM
It's not the first time this has happened.
If you look here:
http://classic.chem.msu.su/gran/gamess/barcelona.html
You will see many tests where the Barcelonas start out behind but then beat Wolfdale by twisting Amdahl's Law's neck... ;)
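For reference, Amdahl's Law bounds the speedup from adding cores when only part of the work can run in parallel. A minimal sketch, assuming a hypothetical workload that is 95% parallel (that figure is made up for illustration and does not come from the linked tests):

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup predicted by Amdahl's Law.

    parallel_fraction: share of single-core run time that can be parallelized (0..1).
    cores: number of cores used (>= 1).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical 95%-parallel workload on 1, 2, 4 and 8 cores.
for cores in (1, 2, 4, 8):
    print(cores, round(amdahl_speedup(0.95, cores), 2))
```

The bound never exceeds the number of cores, so a measured speedup above that suggests something else changed between runs (cache behaviour, clock speed, and so on) rather than the law itself being broken.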
Nice tests, thanks guys. :)
From the few quick tests, I put the higher-than-expected scaling numbers down to a) inter-core bandwidth saturation, b) memory bandwidth saturation, and c) core efficiency that depends on how much data bandwidth the cores are fed.
Like I've stated before, software is never perfectly optimized for a given architecture, and if that could be managed, you would see even higher performance at the same clock speeds. Some software is better optimized for one architecture than for another, though.
But servers do have one major draw toward AMD, and it is the main reason why Intel wants the IMC/native quad design: memory/inter-core bandwidth and latency. With throughput that depends on such large bandwidth, AMD's architectural design has a clear advantage, and HPC/data centers usually rely on this at their most intricate level, so they will see the maximum benefit of the architectural design, as shown above. AMD's core design performance advantage increases in these scenarios the higher the number of nodes you connect (>2-4-8-16-32-64 cores). This is why there's still market share to capture even with their current Barcelona releases (if they could have them errata-free) vs Harpertown, as the server market is very unlike Desktop/Mobile in its workloads, conditions, requirements, intensities, capacities and software demands.
Lightman
01-08-2008, 12:01 PM
I just noticed in this post http://www.xtremesystems.org/forums/showpost.php?p=2681946&postcount=534 that folks are puzzling over how a 15% increase in clock yielded an 18% increase in performance. AMD has really given us a few puzzles it seems. Just like how my server consistently got a 111% performance gain going from one core to two cores in my tests. (Look at the back-null auth rate for 1 core and 2 cores for Opteron 2347 - 13760 to 29085...)
I think I know :) !
Try to run your tests with Cool'N'Quiet ON and then retest with C'n'Q OFF ;)
I can reproduce over 400% scaling for some applications on my Quad, but only with C'n'Q ON. When OFF scaling seems to be more natural (read - below 400% :p: ).
BTW. Great test :up: ! Can't wait for SUN Niagara 2 tests from you :yepp:
Jacky
01-08-2008, 12:17 PM
Always makes me wonder how these standard benchmark programs like SuperPi etc. were written and compiled.
SuperPi is by no means 'standard'; it's merely an outdated benchmark that some overclockers play with (but you have to admit it's fun...)
If you want to benchmark the absolute performance potential of a system, you have to make sure the software you're testing with can actually take advantage of that system.
This very much depends on the system's intended purpose, because "the absolute performance potential" may not be achieved in most environments.
For instance, desktops have to work mostly with precompiled closed-source programmes (which use no optimisations, or are Intel-optimised if at all), because that's what most programmes are, and the average Joe doesn't even know what "compile" means [but he wants decent price/performance...]
Laptops are the same, imo.
So a chip that handles any code you throw at it (optimised or unoptimised) would work best.
Maybe some HPC people, *nix geeks & enthusiasts can and do optimise their code; I don't know about servers.
However, optimisations can show dramatic improvements, like SSE4 for encoding on Penryn.
justapost
01-08-2008, 04:14 PM
Always makes me wonder how these standard benchmark programs like SuperPi etc. were written and compiled. Personally, I think anything closed source simply can't be trusted. You have no idea what compiler optimizations were used, you have no idea whether large computations take chip cache sizes into account, etc. etc... If you want to benchmark the absolute performance potential of a system, you have to make sure the software you're testing with can actually take advantage of that system.
Guess they used Intel's compiler suite.
Thank you for the review. Could it be that for your one-core benchmarks slapd ran on core0, which was already under load from the ethernet card? That would explain the >100% increase.
Have you run the benchmarks with unoptimized packages? I'd be interested in the benefit from -march=amdfam10 with gcc.
No, in all the tests from 1 to 7 cores, the ethernet driver was always using a separate core from my server process.
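The post doesn't say how that separation was enforced. One possible way to do it on Linux is to pin the server process to cores other than the one handling the NIC, roughly as sketched below. The choice of core 0 for the ethernet driver and the helper names are assumptions for illustration only, not the setup actually used in these tests.

```python
import os


def pick_server_cores(available, reserved, count):
    """Choose `count` cores for the server, skipping any reserved cores."""
    candidates = sorted(set(available) - set(reserved))
    if count > len(candidates):
        raise ValueError("not enough unreserved cores")
    return set(candidates[:count])


def pin_current_process(cores):
    """Pin the calling process to `cores`; return False if unsupported or refused."""
    # os.sched_setaffinity is only available on some platforms (e.g. Linux).
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, cores)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    if hasattr(os, "sched_getaffinity"):
        available = set(os.sched_getaffinity(0))
    else:
        available = set(range(os.cpu_count() or 1))
    nic_cores = {0}  # assumption: NIC interrupts are handled on core 0
    free = available - nic_cores
    if free:
        cores = pick_server_cores(available, nic_cores, min(2, len(free)))
        print("pinned:", pin_current_process(cores), "cores:", sorted(cores))
```

Keeping the core selection in a pure function makes it easy to check separately from the platform-specific pinning call.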
Yes, I've run with optimizing turned off too. That is the subject of yet another puzzle. http://gcc.gnu.org/ml/gcc/2007-11/msg00703.html
The numbers I reported in these tests were all using -O2, but under heavy load I actually got higher throughput using -O0. I haven't reported the -O0 numbers for this test, because I didn't want to mix the -O0 and -O2 numbers, and I already had a large amount of data to tabulate.