even considering two processors at once?
Thanks a lot, KTE. ;)
Now, the only question for me is: what will be the HT version of FASN8?
Yep. The figures of around 260W sustained peak draw were for a dual-socket Opty 2350 with 12x 15k SAS drives in RAID 0 on a RAID storage card and 8x 1GB RAM (disk, RAM and CPU all loaded). A 400W, 80%+ efficient PSU is roomy enough unless you'll be adding a video card, in which case a little higher is better (esp. if both are OC'd).
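A quick sanity check of the PSU sizing above, as a sketch. It assumes the ~260W figure is the DC load on the PSU and treats "80%+" as exactly 80% efficiency; if the 260W was measured at the wall, the DC load is lower and the headroom larger.

```python
# PSU headroom sketch for the dual Opteron 2350 box described above.
# Assumptions: 260 W is the DC load on the PSU, and "80%+" is taken as exactly 80%.

def wall_draw_w(dc_load_w: float, efficiency: float) -> float:
    """AC power drawn from the wall for a given DC load."""
    return dc_load_w / efficiency


def psu_load_fraction(dc_load_w: float, rated_w: float) -> float:
    """Share of the PSU's rated DC output in use."""
    return dc_load_w / rated_w


LOAD_W = 260.0   # sustained peak quoted in the post
RATED_W = 400.0  # suggested PSU rating
EFF = 0.80       # lower bound of "80%+"

print(f"PSU load: {psu_load_fraction(LOAD_W, RATED_W):.0%}")  # 65%
print(f"Wall draw: ~{wall_draw_w(LOAD_W, EFF):.0f} W")        # ~325 W
print(f"Headroom: {RATED_W - LOAD_W:.0f} W")                   # 140 W
```

At 65% load the PSU sits in its comfortable range, which is why adding an overclocked video card is the point where a bigger unit starts to make sense.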
No idea about FASN8 (that would require more power by design).
You're welcome :)
I assumed that you were referring to Johan's cache ping-pong test, and one of your later postings confirmed this. I followed the development of this test on the original Ace's Hardware forums a while ago, and many ideas were discussed back then. You can find the full discussion and an early version of the code here:
http://web.archive.org/web/200505281...0681&forumid=2
First I have to say that this test measures one specific variant of core-to-core communication. I think K10 took a performance hit in this benchmark due to its write buffering and maybe even its L3 cache (which, BTW, adds ~20ns to memory latency on a miss). The benchmark tells us nothing about how fast a core can access data in another core's cache when that data was written not immediately before the access but at least tens of cycles earlier. Apart from semaphores and the like, an access pattern like the tested one would simply indicate bad multithreaded coding style. ;)
Most SSE(2) instructions are double-decoded on K8, while SSE was vector-decoded on K7. On K8 the two separate ops (one per 64-bit register half) finish one cycle apart, which led to a nice 4-cycle latency for standard ops (add, sub, mul).
But as pointed out in the past (google for "k8 sse bottleneck"), K8 showed strange behaviour with SSE loads, as you can see in the tests here again. Maybe because of the double decoding, such a decoded instruction had to use a single FP unit sequentially. While x87 or MMX code could load two 64-bit values per cycle, aligned 128-bit loads could not, resulting in only 0.5 SSE loads/cycle. This has been solved (maybe simply by avoiding the double decoding), and the result is a quadrupled SSE load throughput compared to K8.
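The throughput arithmetic behind "quadrupled", as a sketch. It assumes K10's figure is simply 4x K8's 0.5 loads/cycle (i.e. 2 aligned 128-bit loads/cycle); the 2.0 GHz clock is only an example value, not a measured result. It also shows why the K8 behaviour looked so odd: the wide SSE path moved fewer bytes per cycle than the narrow x87/MMX path.

```python
# Per-core load throughput implied by the figures above (a sketch).
# Assumption: K10's "quadrupled" rate is 4 x 0.5 = 2 aligned 128-bit loads/cycle.
XMM_BYTES = 16                   # one 128-bit SSE register
MMX_X87_BYTES_PER_CYCLE = 2 * 8  # two 64-bit loads per cycle on K8


def load_bytes_per_cycle(loads_per_cycle: float) -> float:
    return loads_per_cycle * XMM_BYTES


def load_gb_per_s(loads_per_cycle: float, clock_ghz: float) -> float:
    # bytes/cycle * 1e9 cycles/s = GB/s when the clock is given in GHz
    return load_bytes_per_cycle(loads_per_cycle) * clock_ghz


K8_SSE = 0.5
K10_SSE = 4 * K8_SSE

print("K8 x87/MMX:", MMX_X87_BYTES_PER_CYCLE, "B/cycle")      # 16
print("K8 SSE:    ", load_bytes_per_cycle(K8_SSE), "B/cycle")  # 8.0
print("K10 SSE:   ", load_bytes_per_cycle(K10_SSE), "B/cycle") # 32.0
print("K10 SSE at a hypothetical 2.0 GHz:", load_gb_per_s(K10_SSE, 2.0), "GB/s")  # 64.0
```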
Yes, and for that matter, has no one out there got working boards from Tyan or Supermicro???
Quote:
Originally Posted by cky2k6
This thread has been the source of my frustration for the past couple of weeks... why can't ASUS get a working BIOS?
This hit the Inquirer (link):
Quote:
BARCELONA NEEDS HELP FAST.
Apparently Barcelona will not work in older mainboards (OLDER meaning yesterday's, today's, and for some time to come), as it uses multiple voltages at the same time in the core, so its input pins are also multiple? Anyway, due to the unique simultaneous voltages in the core, there are no mainboards for it, AT ALL.
Testers stuck a pair into a Tyan 29XX: no go. They took a working system and stuck in Barcelona: 50% lower scores than dual-core Opterons. That's a BIG PROBLEM, I think.
Maybe it's a simple hardware socket issue or just a rewire/upgrade of controllers, YET what if Barcelona's internal CROSSBARS are all tuned wrong to put out such slow output?
extra!!!extra:kb940520 tests redfiboost/butt locker. Get it ,,ha.ha.ha?ah.
Signed: PHYSICIAN THOMAS STEWART VON DRASHEK M.D.
posted by : THOMAS STEWART VON DRASHEK, 10 October 2007
This slowdown is at least keeping the fanboys from bragging about Barc... :)
This was just someone's comment about the new AMD roadmap that the Inq posted. It was listed in the comments section. Who knows what validity it has.
Yes, it's not news, it's a comment. The point is that, all in all, it's a big, lengthy delay; the comment holds the same validity as someone ranting in these forums.
But as a dual-socket Opteron user, it's a frustrating experience.
my 2 cents
OEMs get their chips months, or at least weeks, before launch, but not being able to see a decent working board weeks after launch is the problem; forget about benchmarking and losing-the-crown kind of stuff.
Stabilizing takes time, that's OK, but getting to a minimally working board shouldn't.
Is the problem with the board makers, the BIOS writers, the Barcs, or a combination?
PS: Nothing against you, Phil; it's just the frustration of an AMD chip buyer/little fan.
May be true, but what I know is that we haven't seen one run correctly on this site. I suspect the two main reviews I have seen had them running correctly, and I would imagine there are AMD customers with Barcelona systems running. I also believe there are significant problems with the present platform.
I'm not sure what the problem is here, but I suspect it's mainly because we're a little farther down the food chain than Dell, etc. AMD's support is too busy stamping out fires elsewhere.
No offense to Dave, Steven, etc who have worked hard to give us some insight.
In the end, I do believe the writer is correct in the sense that the present MBs do not work well with K10, and we will not see the full potential of this CPU until new MBs are designed. Unfortunately, all those who thought they could just buy the CPU will have to get a new MB as well. I'm glad I just upgraded my 939 and waited.
PS. No offense was taken, and I'm frustrated as well.
Hypothetical situation:
What if Barcelona is actually all it's cracked up to be, and a little more. What kind of demand would that cause for these chips? What if the supply couldn't fill demand, considering the HPC and supercomputer obligations AMD needs to fill first. My guess is the backlash from that would be as blown out of proportion as the situation now, if not worse.
I imagine AMD is ramping at 110%, and their first priority is getting these chips into the hands of the big contractors who already signed for them. I'm sure that is much more important to them than keeping a handful of enthusiasts happy. After all, these are server and HPC chips.
Just, what if...! :D
The doom and gloom makes me :rofl:
Yeah I know, Phil. My doom and gloom blabbering wasn't directed at you at all, just a general observation from all the forums I lurk at around the net. :p: Although it was expected, given the massive PR blitz launched against them over a year ago now. :yepp:
For sure, OEMs and supercomputers are their priority, and that's where they make bucks; it's a business fundamental.
Quote:
Just, what if...! :D
The doom and gloom makes me :rofl:
Reminder: It's not about AMD and chips, it's about boards and BIOS.
The topic here is the lack of a decent board and passing that on to others, not some kid saying AMD is gone or other immature remarks. Also note that HPC and supercomputer manufacturers won't read AnandTech or XS before buying chips/boards, or wait for benchmarks. :)
Here it is all about small business owners and retailers for small clients. Enthusiasts buy C2D/Quad, no offense.
If someone thinks ALL of us here are a bunch of enthusiasts playing games on dual-socket Opterons, then they may need to rethink.
That Inq comment looks too much like someone who reads here posting, and the word "apparently" means it's not his own source. It could very well be an Inq employee posting a rumor while escaping any backlash this way, which is never beneath them. It means nothing more than what we already know.
There's no doubt Family 10h has been a disappointment and a pure source of frustration up until now, made even worse by the future roadmaps all being so badly delayed. No dual-core, no tri-core, and no FX until at least March '08 means the 3GHz FX shown in January will not even be released 18 months from the marketing. What a shambles that is. Then you have crap MB and BIOS issues. :(
Under normal circumstances, that's exactly how things should work, but this time it didn't go as smoothly as planned. As you may or may not have heard, there was a HUGE problem with yields on the first run of Barcs. This problem wasn't easily remedied, but ultimately things got fixed. In the meantime, AMD had fallen months behind schedule, and instead of delaying the launch any further, they still targeted the "original revised" launch date. This resulted in an insanely short development time for even AMD's largest OEM clients. TBH, I wouldn't be surprised to hear that mobo manufacturers were shipped samples as late as a few weeks before launch. IMO, Barcelona is a polished processor, probably exactly how AMD had originally planned it. The issues we're seeing involve compatibility with current mobos. The BIOS developers can't be blamed either, as they attempted to squeeze what normally takes months of testing into a few weeks. I'm confident that things will sort themselves out in the coming weeks and we'll finally see what K10 is really capable of.
Back to the thread: Sorry I haven't had much time to update here lately; I'll find some time to try to fill all the requests for benches. Oh, BTW, did I happen to mention that ATI's Hammerhead arrived last week? Hopefully Phenom isn't far behind......
Hammerhead? I don't remember what that is...
I think it's the RD790 MB.
Come on Steve, give Kribibench a run! :D
The Hammerhead
http://www.xtremesystems.org/forums/...1&d=1188924377
I like me some shark!
Read the whole thread!....
It only took two days. S7e9h3n, you mentioned that the new BIOS you've acquired (I won't even ask ^_^) works, but you've still had no luck with the Opterons :-\ Does the BIOS you have on hand leave the old overclocking options on the L1N open? Have you had any luck at all getting either CPU to boot?
I really appreciate the numbers you've thrown up, as well as everyone's contributions of information regarding these chips.
Bananas to that mobo...
I want this:
http://i115.photobucket.com/albums/n...na_fx_mobo.jpg
The arrangement is just too ideal! My TV tuner in the PCIe slot just above the first graphics card, then a couple of 2950s, and then an Auzentech Prelude to round out the whole package :eleph:
Edit: I know that it's apparently a mock-up, I don't care! Give me one made of cardboard for all I care!
Edit2: Perhaps it's not a mock-up board, or AMD just wired the fans up to a car battery for this demonstration...
http://i115.photobucket.com/albums/n.../agena/1_s.jpg
that's my dream machine....
One day, one day... :)
Nah the fans are plugged in. Circled in red in the attached pic, the wires are silver instead of the normal red/black/yellow.
Look at the green circled area - hmm, PCIe 2.0 slots apparently with only one 6pin plugged in.
I wonder what those two molexes I circled in yellow are for, though? :confused:
I see two 8-pin EPS connections on this board! Circled in blue - why not just one, are the CPUs really that power hungry that they can't run off of just 4 pins each?
This was a fairly old pic, and IIRC they were running at 3.0 GHz, so perhaps they were overclocking early silicon and needed the extra stability of 8 pins per CPU. The molexes are totally unnecessary; I have one on my 580X CrossFire board, and it's not required. I noticed the fan wires, and yes, that is a PCIe 2.0 slot; the R600 is being powered by it and a sole PCIe connector (= win).
I'm not going to respond to this except for this:
AnandTech's Linpack test clearly showed that the processor scaled from one socket to two by better than 100%: it was more than twice as fast on two as on one (better than linear)!
As for the rest of what you said, you don't own the hardware and are repeating what others have said.
Then you can explain why Quad FX performs better at 5-5-5-18 800 than at 4-4-4-12 800. We know an Intel wouldn't react the same way, because the platforms are very different.
This is a server processor, not a desktop one. A single socket will get more memory bandwidth on the desktop platform (single socket); obviously, more processors on a NUMA system provide more bandwidth, that's why AMD made it.
Last point: years ago my friend ran SETI. He had an 866 Coppermine Intel, and over in the corner was an old Xeon P2 400. The P2 400 slaughtered the 866 on SETI at half the clock speed, so he decided to try games on it; games weren't even playable.
Moral of the story: trying to guess desktop performance based on server chips doesn't always work out the way you think (pointless).
SPEC clearly showed AMD smashes Core on FP with proper code, it's that simple, by over 100% on some types of code. I could be wrong, but I believe there are 19 FP tests, and AMD at 2GHz beat Intel at 3GHz on 17 of them. It also clearly showed that Intel wins on POV-Ray, which everyone loves to run. What happens is this: a new processor gets run on SPEC, and Intel, being a huge beast and media hound, runs over to spec.org to see where they beat AMD. Then they use the media to make the tests they win into standard benchmarks, then Intel throws in their shady compiler and bam, they have a winner. AnandTech also clearly showed that on code not compiled with the Intel compiler, AMD wins again.
http://aceshardware.freeforums.org/v...r=asc&start=60
Hey, if you can't win fairly, cheat. LOL
AnandTech's Linpack test clearly showed that the processor scaled from one socket to two by better than 100%: it was more than twice as fast on two as on one (better than linear)!
20-25 to 35-45, that's 100%+ alright. OTOH, 25-30 to 45-55, seems Intel scales better here.
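Since both sets of scores are eyeballed ranges, one way to see what they can and cannot prove is to compute the worst-case and best-case 1-to-2-socket speedup for each platform. A sketch using only the ranges quoted in the reply above:

```python
# Worst/best-case 1->2 socket speedup from chart ranges (eyeballed readings).

def speedup_bounds(one_socket: tuple[float, float],
                   two_socket: tuple[float, float]) -> tuple[float, float]:
    lo1, hi1 = one_socket
    lo2, hi2 = two_socket
    worst = lo2 / hi1  # slowest 2-socket reading over fastest 1-socket reading
    best = hi2 / lo1   # fastest 2-socket reading over slowest 1-socket reading
    return worst, best


amd = speedup_bounds((20, 25), (35, 45))
intel = speedup_bounds((25, 30), (45, 55))
print(f"AMD:   {amd[0]:.2f}x to {amd[1]:.2f}x")      # 1.40x to 2.25x
print(f"Intel: {intel[0]:.2f}x to {intel[1]:.2f}x")  # 1.50x to 2.20x
```

The two intervals overlap almost completely, so readings this coarse can't settle which platform scales better.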
Last point: years ago my friend ran SETI. He had an 866 Coppermine Intel, and over in the corner was an old Xeon P2 400. The P2 400 slaughtered the 866 on SETI at half the clock speed, so he decided to try games on it; games weren't even playable.
Moral of the story: trying to guess desktop performance based on server chips doesn't always work out the way you think (pointless).
I don't see the comparison. You're equating PIII vs. PII, totally different archs, with Barcelona vs. Phenom, which share the same core.
SPEC clearly showed AMD smashes Core on FP with proper code
SPECfp and SPECfp_rate aren't the same. SPECfp measures pure FP performance, while SPECfp_rate runs multiple instances (not threads) of the same bench, which places exorbitant demands on bandwidth. http://www.realworldtech.com/forums/...83478&roomid=2
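To illustrate why a speed score and a rate score can rank machines differently, here is a simplified sketch. It assumes SPEC CPU2006-style scoring as I understand it (per-benchmark ratio of reference time to elapsed time, multiplied by the number of copies for rate runs, then a geometric mean across benchmarks); all times below are hypothetical.

```python
from statistics import geometric_mean


def speed_ratio(ref_s: float, elapsed_s: float) -> float:
    """Speed-style ratio: how many times faster than the reference machine."""
    return ref_s / elapsed_s


def rate_ratio(copies: int, ref_s: float, elapsed_s: float) -> float:
    """Rate-style ratio: copies run concurrently; elapsed_s is the wall time for all of them."""
    return copies * ref_s / elapsed_s


# Hypothetical reference times for two benchmarks (seconds).
REF = [1000.0, 2000.0]

# Hypothetical platforms with identical single-copy times...
single = {"A": [100.0, 200.0], "B": [100.0, 200.0]}
# ...but different slowdowns when 8 copies compete for memory bandwidth.
eight_copies = {"A": [120.0, 240.0], "B": [200.0, 400.0]}

for p in ("A", "B"):
    speed = geometric_mean([speed_ratio(r, t) for r, t in zip(REF, single[p])])
    rate = geometric_mean([rate_ratio(8, r, t) for r, t in zip(REF, eight_copies[p])])
    print(f"{p}: speed {speed:.1f}, rate {rate:.1f}")
```

Both platforms tie on the speed-style score, but the one whose copies slow down less under contention wins the rate-style score by a wide margin, which is the bandwidth point made above.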
What happens is this: a new processor gets run on SPEC, and Intel, being a huge beast and media hound, runs over to spec.org to see where they beat AMD. Then they use the media to make the tests they win into standard benchmarks, then Intel throws in their shady compiler and bam, they have a winner.
:ROTF:
AnandTech also clearly showed that on code not compiled with the Intel compiler, AMD wins again
I don't recall Anandtech disclosing which apps were compiled with what besides Linpack.
don't you people get tired of spreading "false misinformations"?? omg!
false misinformations = true informations :shrug:
My personal favorite is "I could care less".
So many people are too lazy to say "I couldn't care less", not realizing that by saying they "could care less" they mean they do care about the issue.
I just got dual 2347s and am now running them with DDR2-800 ECC registered memory and an 8800GTX on a Tyan S2932. Now installing Windows Server 2003 64-bit. What do you want me to test? I will do it tonight. :D
And I will compare it with dual Xeon 5320 2.0GHz tomorrow. :D
Run SuperPi, Sandra memory bandwidth, 3DMark06, Cinebench. Basically, just look over this thread for more ideas.
Prime95 benchmark results might be interesting for the crowd at mersenneforum.org. Most recent version seems to be http://www.mersenne.org/gimps/p95v255a.zip
which includes multithreaded tests.
1st: Cinebench R10. How about that point?
I will do more. And the good news is this Barcelona can OC: I OC'd it to 2.0GHz. :D
The bad news: Barce needs better mainboards.
I will have a Supermicro board that supports DDPM and will try more.
I am tweaking this mainboard too. The S2932 now has an option to run DDR2-1066 in the BIOS. I want to know exactly how to enable NUMA in the BIOS; please help me.
Super PI is 41s. Barce is not for PI, and it means nothing.
I will test render in 3Dmax tomorrow.
And the Cinebench score of 12541 he got almost exactly matches the scores we saw from the techreport Barc reviews last month (slightly slower in fact).
http://techreport.com/articles.x/13224/5
http://techreport.com/articles.x/13176/5
Is there any meaningful benchmark evidence as yet that the barcs that had been sent out for review last month were somehow performance crippled as someone here had claimed?
Good job linhvndiy. :)
Regardless of what transpires, I have bad news, in a way inevitable. I asked my boss at work about any news or contact with AMD. He told me he had contacted all the major and local server vendors many times about a possible order of Barcelona servers and was told repeatedly that they have no Barcelona processors for their platforms, although everything else is ready, because of very short supply and high demand. The next availability date is not known, but most likely November. Every one of them stated that AMD has been having supply problems since the launch. The communication took place over the last week.
http://bp2.blogger.com/_S625jUOxSwA/...PECfp_rate.jpg
The straight line is Intel.
AnandTech:
"Notice that the Intel CPU has the advantage when it comes to raw processing power: it is about 19% faster in a single CPU configuration. Once you add a second CPU in both systems, that 19% lead is turned into a 3% advantage for AMD"
What he didn't say here is that the Intel is clocked 17% higher and still got beaten by 3% on two processors.
That would be scaling better than 100%.
Sorry, it wasn't the test using Intel's compiler.
My Barcelonas will be here Thursday and my KFSN4-DRE is on the table :D
I also have prepared KFSN4-DRE, now waiting arrival of rental Barcelona:)
http://222.151.147.26/c-board/file/K...1CPU_setup.jpg
This board has DDR2-400, 533, 667, 800, and 1066 setting.
http://222.151.147.26/c-board/file/K..._mem-clock.jpg
And old ClockGen for nForce 4 works fine on this board.
http://222.151.147.26/c-board/file/K...250_MEM400.png
Beta BIOS 1004 (though it's not for the -DRE but for the -DRE/SAS) shows that it includes AGESA version 3.1.1.1... which is the latest I've ever seen.
http://222.151.147.26/c-board/file/1...SA-3.1.1.1.jpg
Sorry for hijacking your thread, s7e9h3n...
I deeply appreciate your exclusive support;)
ASUS now shows a 1006 BIOS for the DRE :D Oops, mistake, wrong board.
But what you have is exactly what I got, same rev :D How about sharing that BIOS?
Hey, he "somehow lied" about superpi, but he commented about Cinebench!
And now that there are finally BAs around, I'm really looking forward to seeing direct comparisons against Xeons!
By the way, can someone confirm or deny the ccNUMA and dual-channel bugs? Are they working properly on this BA processor?
This is a 3ds Max 9 64-bit render. Barcelona at 2.0GHz: 2 min 4 s.
Dual Xeon 5355 2.66GHz takes about 1 min 4x s (not my test).
I will test this file with the Xeon 5320.
I think Barcelona is equal to or a little bit faster than Xeon in 3ds Max rendering.
And an AMD X2 6000+ renders this file in more than 6 min.
But 3ds Max is not optimized for 8 cores, I think. A high-clocked Q6600 may get a better result.
I want to test server app now, hic.
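One way to read "equal or a little bit faster" is per clock rather than by wall-clock time. A sketch under stated assumptions: both systems are 8-core (2x quad-core), the render keeps every core busy, and render time scales inversely with clock speed. "1 min 4x s" is treated as anywhere from 100 to 109 seconds, and 2.0 GHz is the clock given in the post.

```python
# Clock-normalized comparison of the 3ds Max render times quoted above.
# Assumptions: both boxes are 8-core, all cores stay busy for the whole render,
# and time scales inversely with clock. The Xeon time was given as "1min4x",
# so both ends of 100-109 s are checked.

def gcycles(time_s: float, clock_ghz: float) -> float:
    """Billions of cycles each core spent on the render."""
    return time_s * clock_ghz


BARCELONA = gcycles(124, 2.0)  # 2 min 4 s at 2.0 GHz

for xeon_s in (100, 109):
    xeon = gcycles(xeon_s, 2.66)
    wall = 124 / xeon_s           # >1 means the Xeon finished first
    per_clock = xeon / BARCELONA  # >1 means Barcelona needed fewer cycles
    print(f"Xeon at {xeon_s}s: wall-clock {wall:.2f}x in Xeon's favor, "
          f"per-clock {per_clock:.2f}x in Barcelona's favor")
```

By wall-clock time the Xeon 5355s finish first either way; per clock, the Barcelona setup comes out roughly 7% to 17% ahead under these assumptions.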
You have not read this graph right; it's not even a scaling graph, it's a comparative graph. The values you quote also do not show scaling above 100%, I'm afraid, because they are comparative too.
I am sorry, you cannot get better than 100% scaling with CPUs. It's impossible. If you could, then IPC at one frequency would become IPC+N at a higher frequency, which is ludicrous. It's even more ludicrous if you are talking about multiple CPUs, which you seem to be doing above. CPUs work downwards from 100%, not up from it.
Unless of course the K10 has its own laws of physics; maybe that is why they are in such short supply?
Regards
Andy
I think he's trying to use the ratio of Intel's performance factor to AMD's as his example of an over-100% performance increase, which it is not. It is only an increase in performance per clock cycle.
Stephen, PM incoming.
"Notice that the Intel CPU has the advantage when it comes to raw processing power: it is about 19% faster in a single CPU configuration. Once you add a second CPU in both systems, that 19% lead is turned into a 3% advantage for AMD"
What he didn't say here is that the Intel is clocked 17% higher and still got beaten by 3% on two processors.
That would be scaling better than 100%.
Sorry, it wasn't the test using Intel's compiler.
OK: on one processor (socket) Intel won by 19%, so on two (sockets) it should have been 38% (that would be perfect 100% scaling), but in fact the AMD ran more than twice as fast and WON by 3%. That's more than 100% scaling on two processors, and yes, it's Phenomenal!! That's 122% scaling.
The graph had nothing to do with the scaling comment. I was responding to two posts in one; I'll be sure not to do that again, as it seems to confuse people.
The focus should be moved from "100% scaling" vs. "122% scaling" (surely meant ironically) to something like 60% vs. 80%. Then all people here should be happy ;) As we all know, one type of CPU in a multi CPU setup is hampered more by FSB, inter core traffic and mem BW than another type of CPU with its direct connection links, a cache shared by 4 cores and so on..
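The two AnandTech percentages only fix the ratio between the two platforms' socket scaling, not either platform's absolute scaling, and the 17% clock difference doesn't enter that ratio at all. A small worked sketch; the Intel speedups in the loop are hypothetical, purely to show the implied AMD numbers.

```python
# What the AnandTech percentages actually pin down.
# Given: Intel 19% faster with one socket, AMD 3% faster with two sockets.
# Let A1, A2, I1, I2 be the AMD/Intel scores on 1 and 2 sockets:
#   I1 = 1.19 * A1   and   A2 = 1.03 * I2
# so (A2 / A1) / (I2 / I1) = 1.03 * 1.19.

def relative_scaling(intel_lead_1s: float, amd_lead_2s: float) -> float:
    """AMD's 1->2 socket speedup divided by Intel's."""
    return (1 + amd_lead_2s) * (1 + intel_lead_1s)


ratio = relative_scaling(0.19, 0.03)
print(f"AMD's speedup is {ratio:.3f}x Intel's")  # 1.226x

# Absolute speedups are not determined by these two numbers alone.
for intel_speedup in (1.5, 1.6):  # hypothetical values
    print(f"Intel {intel_speedup:.2f}x -> AMD {intel_speedup * ratio:.2f}x")

# AMD's speedup would only exceed 2x if Intel's exceeded:
print(f"{2 / ratio:.2f}x")  # 1.63x
```

So "122%" really means "AMD's socket scaling is about 1.23x Intel's"; it only implies superlinear scaling if Intel itself scaled by more than about 1.63x from one socket to two.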
Will a car run twice as fast by doubling the number of cylinders?
I can't understand this result. Can someone explain it to me?
I got 1000 more points in Cinebench R10 with DDR2-667 (better than the old result with DDR2-800). And I changed from Windows 2003 to WinXP 64.
I think motherboard makers have a lot of work to do tuning their BIOS to get the best performance out of Barce.
And I think the S2932 has something wrong with memory, or is not optimized for the Barce memory controller.
So the other Barcelona reviews may not be correct either. :D
linhvndiy, I kindly ask you to run "openssl speed" :) On Barcelona, and also your dual-Xeon 5320 if possible (for comparison). See http://www.xtremesystems.org/forums/...45&postcount=1
- Z
I looked at the older screenshot again, because this was my first thought. But it seems he runs Win 2003 64-bit edition: the older CB screenshot shows "(64 Bit)". See for yourself: http://www.xtremesystems.org/forums/...postcount=1048
But there is another problem with your argument. CB doesn't do calculations using the integer registers (GPRs) but SSE registers. But these are also doubled from 8 to 16 in 64 bit mode.
Are you on drugs? :yepp:
No lies, we haven't even remotely seen the power of k10 as yet ;)
Mr.Ass is clutching at straws :yepp:
Yeh he knows mate, he's merely trying to alleviate the trouble he has sleeping at night worrying about whether or not I'm right with my K10-SUPERPi predictions :p:
Yes, but predicting is not the same as having the privilege of seeing a real result.
You're getting into the business of running benchmarks seldom or never run by others, while still offering nice information. ;)
While you are on it, please run the benchmarks in the most recent Prime95 32-/64-bit versions.
http://www.mersenne.org/gimps/p95v255a.zip
http://www.mersenne.org/gimps/p64v255.zip
(Options -> Benchmark)
To dresdenboy: I will do it tonight.
I just did the openssl test, but I don't think it ran well: only one CPU ran. It should be run on Linux. Can you let me know how?
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md2 1449.00k 3054.78k 4214.85k 4665.85k 4804.30k
mdc2 0.00 0.00 0.00 0.00 0.00
md4 15350.40k 54126.60k 157606.54k 298261.62k 405246.76k
md5 14174.73k 49422.88k 141059.09k 262657.00k 312424.88k
hmac(md5) 21801.33k 71048.50k 180837.68k 291207.91k 249707.40k
sha1 14510.02k 47249.78k 119463.93k 194800.77k 239247.29k
rmd160 11639.53k 34359.30k 76082.83k 109708.79k 125589.71k
rc4 224624.66k 256885.87k 263172.02k 267750.02k 269124.41k
des cbc 53687.09k 55342.95k 55489.39k 55784.59k 55924.05k
des ede3 19719.34k 19884.10k 19994.27k 20051.53k 19993.81k
idea cbc 36091.68k 38692.84k 39475.80k 39765.86k 39770.57k
rc2 cbc 20258.67k 20911.40k 21074.26k 21115.37k 21136.65k
rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
blowfish cbc 81029.78k 85576.21k 86592.08k 86928.58k 86951.11k
cast cbc 75352.42k 79531.72k 80737.32k 81029.78k 81029.78k
aes-128 cbc 52766.84k 56660.64k 57723.09k 58032.57k 58042.61k
aes-192 cbc 45497.53k 49483.01k 50526.17k 50770.82k 50770.82k
aes-256 cbc 40900.09k 43919.41k 44644.00k 44924.93k 44930.95k
camellia-128 cbc 0.00 0.00 0.00 0.00 0.0
camellia-192 cbc 0.00 0.00 0.00 0.00 0.0
camellia-256 cbc 0.00 0.00 0.00 0.00 0.0
sha256 9979.01k 24648.36k 45642.97k 57999.97k 58674.42k
sha512 3135.02k 12525.45k 19688.10k 27898.67k 31767.51k
sign verify sign/s verify/s
rsa 512 bits 0.000535s 0.000044s 1869.0 22676.5
rsa 1024 bits 0.002409s 0.000118s 415.1 8472.5
rsa 2048 bits 0.013613s 0.000386s 73.5 2589.7
rsa 4096 bits 0.087500s 0.001354s 11.4 738.5
sign verify sign/s verify/s
dsa 512 bits 0.000397s 0.000478s 2517.8 2090.3
dsa 1024 bits 0.001103s 0.001329s 906.8 752.3
dsa 2048 bits 0.003602s 0.004389s 277.6 227.8
I never claimed to have seen any record-breaking fast K10s at all. I said that, from benchmarks I've been privileged (lucky enough) to see, K10 CPUs will run SPi 1M in less than 23 seconds... and when OC'd will manage to bring that down to 13 seconds. I never said which K10 CPUs, so I'm sorry if any of you managed to slide an exact model number in there :rolleyes:
As for now, we have around 6 users on XS with a K10... all running bugged to the max in server boards/AM2 boards with ECC memory/no OC, not to mention most guys have trouble getting theirs to boot and run properly....... yet you guys already seem to have decided these are the final product???? :rofl:
Come on....:clap:
I never claimed to have seen any record-breaking fast K10s at all. I said that, from benchmarks I've been privileged (lucky enough) to see, K10 CPUs will run SPi 1M in less than 26 seconds... and when OC'd will manage to bring that down to 17 seconds. I never said which K10 CPUs, so I'm sorry if any of you managed to slide an exact model number/revision/stepping in there :rolleyes:... I certainly didn't mean the first ones out of the factory with no BIOS.
What are you guys on......10yr old child pills?
As for now, we have around 6 users on XS with a K10... all running bugged to the max in server boards/AM2 boards with ECC memory/no OC, not to mention most guys have trouble getting theirs to boot and run properly....... yet you guys already seem to have decided these are the final product???? :rofl:
Come on....:clap:
Quote:
K10 cpu @ 3Ghz & DDR2 @ 1066+mhz 5,5,5,18 & AM2+ MOBO = <17s
K10 cpu @ 2Ghz & DDR2 @ 667mhz 5,5,5,15 & AM2 MOBO = 26
Strangely enough out of the 9 vendors he did contact the 8347 was the exact CPU 1 of them said he might be able to get hold of late October. Maybe it was you he contacted afterall :D (jk)
Thanks Ste, but he's far up the ladder from me at the corp, and the best I get to speak to is his secretary. :( They order batch servers directly from the likes of HP, Sun, Dell and other server builders and never go to typical retail outlets for single CPU buys. It's gear for a research lab at a government-funded hospital BTW, so the order would be quite large.
BTW: Soldner, I didn't intend to say that you lied. That's why I said it "in between these signs"... ;)
[edit: I updated my percentages after learning that linhvndiy was running his 2347's at 1950 MHz to obtain the above scores]
Thanks linhvndiy ! These scores are very interesting.
I have compared them with the scores of K8 at the same speed to do a clock-for-clock comparison between K8 and K10. Technically I ran openssl on a dual Opteron 280 (2.4 GHz), but I scaled its scores down to simulate a 1950 MHz K8 (all of the openssl speed tests scale linearly with the clock frequency, so that's a pretty good estimation, see the end of this post for my K8 results.) For the tests using different buffer sizes (16 to 8192 bytes), I only took into account the three larger buffer sizes (256, 1024 and 8192 bytes), because the overhead of the OpenSSL API with the 2 smaller buffer sizes (16 and 64 bytes) is too high and cripples the results on both K10 and K8 which makes any sort of direct comparison difficult.
So here is how K10 fares against K8 in the most popular encryption/hashing algorithms:
o md4: K10 is between 3% and 6% faster than K8 (+6%, +4%, +3% for the three different buffer sizes)
o md5: the throughput varies too much between the three buffer sizes (the K10 scores vary by +6%, +5%, -8% compared to K8), are you sure your machine was idle ?
o hmac(md5): the throughput varies too much here too (+7%, +5%, -18%)
o sha1: K10 seems to have a negligible advantage over K8 (+7%, +5%, and +4%)
o rc4: K10 is as fast as K8 (within 1% of each other)
o blowfish: K10 is consistently 5% faster than K8
o aes-128: K10 is between 16% and 17% faster than K8 with the three different buffer sizes (for example: 8192-byte test: 58042 kB/s vs. 49500 kB/s)
o aes-192: K10 is exactly 18% faster than K8 no matter what the buffer size is
o aes-256: K10 is also 18% faster
o sha256: the throughput varies too much here (+11%, +11%, +3%), weird
o sha512: K10 is between 4% and 5% faster than K8
o rsa 1024-bit: K10 is +12% faster than K8 on sign operations (415 vs. 370 sign/s)
o rsa 1024-bit: K10 is +10% faster than K8 on verify operations (8473 vs. 7730 verify/s)
o dsa 1024-bit: K10 is +9% faster than K8 (907 vs. 830 sign/s)
o dsa 1024-bit K10 is +9% faster than K8 (752 vs. 690 verify/s)
Overall, K10 is, clock-for-clock, 0% to 18% faster than K8 on these 32-bit (100% ALU) OpenSSL speed tests. (It would have been interesting if the guys who ported OpenSSL to Windows enabled the SSE2 assembly implementation of sha512...)
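The clock-for-clock method described above can be reproduced in a few lines. A sketch, assuming the K8 numbers were scaled by 1950/2400 (the Opteron 280's nominal 2.4 GHz rather than the 2405 MHz that cpuinfo reports), which reproduces the percentages in the list; only the 8192-byte buffer and 1024-bit RSA/DSA figures are included.

```python
# Clock-for-clock K10 vs. K8 from the two "openssl speed" runs in this thread.
# Assumption: K8 (dual Opteron 280) scores are scaled linearly by 1950/2400,
# i.e. to linhvndiy's 1950 MHz from the Opteron 280's nominal 2.4 GHz.
K10_MHZ = 1950
K8_MHZ = 2400

# (K10 score, K8 score at 2.4 GHz). Cipher/hash rows: 8192-byte buffers,
# in 1000s of bytes/s. RSA/DSA rows: operations per second.
SCORES = {
    "md5": (312424.88, 417473.49),
    "rc4": (269124.41, 327808.05),
    "blowfish cbc": (86951.11, 102144.39),
    "aes-128 cbc": (58042.61, 60919.45),
    "aes-192 cbc": (50770.82, 52849.95),
    "aes-256 cbc": (44930.95, 46929.28),
    "rsa 1024 sign/s": (415.1, 455.2),
    "rsa 1024 verify/s": (8472.5, 9512.6),
    "dsa 1024 sign/s": (906.8, 1020.4),
    "dsa 1024 verify/s": (752.3, 849.9),
}


def clock_for_clock_pct(k10: float, k8: float,
                        k10_mhz: float = K10_MHZ, k8_mhz: float = K8_MHZ) -> float:
    """Percent by which K10 beats K8 after scaling K8 linearly to the K10 clock."""
    k8_scaled = k8 * k10_mhz / k8_mhz
    return (k10 / k8_scaled - 1.0) * 100.0


for name, (k10, k8) in SCORES.items():
    print(f"{name:18s} {clock_for_clock_pct(k10, k8):+6.1f}%")
```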
linhvndiy, you said you wanted to benchmark the 8 cores at the same time. This is possible with "openssl speed -multi 8" under Linux/*BSD/Solaris... But you should know that these tests all scale linearly with the number of cores and the frequency clock (they all fit in the L2 cache), so just multiplying your scores by 8 gives a very precise estimation.
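Under the linear-scaling assumption stated above, the 8-core estimate is just a multiplication. A sketch; its output is a projection from the single-core 32-bit Windows numbers, not a measurement.

```python
# Projection, not a measurement: the post says these tests fit in L2 and scale
# linearly with cores, so "openssl speed -multi 8" should report about 8x one core.
SINGLE_CORE_RSA_SIGN_PER_S = {  # linhvndiy's 2347 @ 1950 MHz, 32-bit Windows build
    512: 1869.0,
    1024: 415.1,
    2048: 73.5,
    4096: 11.4,
}


def estimate_aggregate(single_core: float, cores: int = 8) -> float:
    """Linear-scaling estimate of an all-cores result from a one-core result."""
    return single_core * cores


for bits, score in SINGLE_CORE_RSA_SIGN_PER_S.items():
    print(f"rsa {bits} bits: ~{estimate_aggregate(score):.0f} sign/s on 8 cores")
```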
Now, if I can have one more wish ( ;) ) I would ask you to run the same benchmark under 64-bit Linux/*BSD/Solaris. The RSA scores would jump by about x3 (the BN lib just loves 64-bit archs), and the RC4 and MD5 throughput would increase by 15-30% (at least that's what is observed with K8). Running openssl speed in 64-bit mode (and with -multi 8) is how the guys at http://www.tecchannel.de/server/proz...28/index9.html obtained their excellent RSA scores.
If you don't know what 64-bit distro to run, I recommend the 64-bit Ubuntu 7.10 (release candidate), the OpenSSL version they distribute is very recent (package named "openssl", version 0.9.8e).
- Z
Code:
"openssl speed" using the 32-bit Windows port of OpenSSL 0.9.8e running on:
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 280
stepping : 2
cpu MHz : 2405.476
cache size : 1024 KB
OpenSSL 0.9.8e 23 Feb 2007
built on: Wed Feb 28 01:35:20 2007
options:bn(64,32) md2(int) rc4(idx,int) des(idx,cisc,4,long) aes(partial) idea(int) blowfish(idx)
compiler: cl /MD /Ox /O2 /Ob2 /W3 /WX /Gs0 /GF /Gy /nologo -DOPENSSL_SYSNAME_WIN32 -DWIN32_LEAN_AND_MEAN -DL_ENDIAN -DDSO_WIN32 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DBN_ASM -DMD5_ASM -DSHA1_ASM -DRMD160_ASM -DOPENSSL_USE_APPLINK -I. /Fdout32dll -DOPENSSL_NO_CAMELLIA -DOPENSSL_NO_RC5 -DOPENSSL_NO_MDC2 -DOPENSSL_NO_KRB5 -DOPENSSL_NO_DYNAMIC_ENGINE
available timing options: TIMEB HZ=1000
timing function used: ftime
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md2 1597.75k 3568.29k 4965.69k 5497.35k 5674.21k
mdc2 0.00 0.00 0.00 0.00 0.00
md4 17908.11k 62660.00k 183608.38k 352740.42k 485592.36k
md5 16226.33k 56910.50k 163840.00k 308830.48k 417473.49k
hmac(md5) 24636.15k 81670.76k 208121.77k 341955.99k 425412.77k
sha1 16008.79k 52702.61k 137180.83k 228728.23k 283938.50k
rmd160 13354.47k 39887.58k 89472.52k 130245.25k 150081.32k
rc4 256140.70k 306881.58k 320910.79k 326595.60k 327808.05k
des cbc 60699.04k 63226.74k 63937.56k 64071.86k 64305.16k
des ede3 22338.35k 22840.12k 22980.88k 23171.21k 23027.67k
idea cbc 43123.55k 46596.91k 47527.52k 47798.34k 47798.34k
rc2 cbc 23512.32k 24422.76k 24637.96k 24690.53k 24692.35k
rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
blowfish cbc 93077.48k 100013.21k 101526.27k 102144.39k 102144.39k
cast cbc 87267.70k 92820.01k 94121.83k 94626.15k 94920.60k
aes-128 cbc 57367.81k 59726.65k 61030.25k 61030.25k 60919.45k
aes-192 cbc 50541.39k 51949.89k 52675.72k 52849.95k 52849.95k
aes-256 cbc 44792.99k 46154.65k 46661.70k 46863.73k 46929.28k
camellia-128 cbc 0.00 0.00 0.00 0.00 0.0
camellia-192 cbc 0.00 0.00 0.00 0.00 0.0
camellia-256 cbc 0.00 0.00 0.00 0.00 0.0
sha256 10971.24k 27588.43k 50805.41k 64589.86k 70135.20k
sha512 3692.13k 14766.89k 23183.76k 32873.14k 37448.10k
sign verify sign/s verify/s
rsa 512 bits 0.000492s 0.000041s 2032.8 24127.0
rsa 1024 bits 0.002308s 0.000108s 455.2 9512.6
rsa 2048 bits 0.012288s 0.000333s 83.4 3055.7
rsa 4096 bits 0.077100s 0.001137s 13.0 879.7
sign verify sign/s verify/s
dsa 512 bits 0.000361s 0.000429s 2769.9 2331.6
dsa 1024 bits 0.000980s 0.001177s 1020.4 849.9
dsa 2048 bits 0.003127s 0.003701s 319.8 270.2
I did test at 1950MHz. And I feel this 32-bit SSL test is not the right test for Barcelona. I think Barce will show its real performance in Linux and with multiple threads. But I have to research how to test under Linux.
Given that the OpenSSL assembly code is absolutely not optimized for the K10 architecture (yet) and that it exclusively uses the ALU, which is the unit that changed the least between K8 and K10 (compared to, say, the FPU and SSE units), I think that 5%-18% of perf improvement across the range of algorithms is pretty good (I'll edit my post with updated percentages later now that I know the clock was 1950 MHz).
Doing 64-bit OpenSSL speed tests is very easy. Just follow my link to download the Ubuntu 7.10 64-bit DVD image. Burn it. Install it. Then in a terminal: "sudo apt-get install openssl", type in your user passwd, then "openssl speed".
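To compare a 64-bit Linux run against the 32-bit Windows results posted above, it helps to pull the numbers out of the text output. A sketch of a parser, assuming the table layout OpenSSL 0.9.8e prints in this thread (other versions may format things differently); `parse_speed_table` and `parse_sign_verify` are hypothetical helpers, not part of OpenSSL.

```python
import re

# Hypothetical helpers for comparing "openssl speed" runs; they assume the
# OpenSSL 0.9.8e output layout shown in this thread.

_VALUE = re.compile(r"\d+\.\d+k?")  # e.g. "52766.84k" or "0.00"
_SIGN_VERIFY = re.compile(r"(rsa|dsa)\s+(\d+)\s+bits\s+\S+s\s+\S+s\s+([\d.]+)\s+([\d.]+)")


def parse_speed_table(text: str) -> dict[str, list[float]]:
    """Throughput rows -> {algorithm: [16, 64, 256, 1024, 8192-byte results]} in 1000s of bytes/s."""
    rows = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 6 or not all(_VALUE.fullmatch(v) for v in parts[-5:]):
            continue  # header, sign/verify rows, blank lines
        rows[" ".join(parts[:-5])] = [float(v.rstrip("k")) for v in parts[-5:]]
    return rows


def parse_sign_verify(text: str) -> dict[str, tuple[float, float]]:
    """rsa/dsa rows -> {"rsa1024": (sign/s, verify/s), ...}."""
    out = {}
    for line in text.splitlines():
        m = _SIGN_VERIFY.fullmatch(line.strip())
        if m:
            out[f"{m[1]}{m[2]}"] = (float(m[3]), float(m[4]))
    return out


SAMPLE = """\
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
aes-128 cbc 52766.84k 56660.64k 57723.09k 58032.57k 58042.61k
camellia-128 cbc 0.00 0.00 0.00 0.00 0.0
sign verify sign/s verify/s
rsa 1024 bits 0.002409s 0.000118s 415.1 8472.5
dsa 1024 bits 0.001103s 0.001329s 906.8 752.3
"""

print(parse_speed_table(SAMPLE)["aes-128 cbc"][-1])
print(parse_sign_verify(SAMPLE)["rsa1024"])
```

Save the output of each run (for example the 64-bit Ubuntu one) to a text file, pass the contents to both functions, and divide matching rows to get per-algorithm speedups.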
- Z