Here's a little teaser....

even considering two processors at once?

Thanks a lot, KTE. ;)

Now, the only question for me is: what will be the HT version of FASN8?

Yep. The figures of around 260W sustained peak draw were for a dual socket Opty 2350 with 12x 15k SAS in RAID 0 on a RAID storage card and 8x1GB RAM (disk, RAM and CPU all being loaded). A 400W 80%+ efficient PSU is roomy enough unless you'll be adding a video card, in which case a little higher is better (esp. if both are oc'd).
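A rough way to sanity-check that headroom, as a sketch only. It assumes the ~260W figure is the DC load on the PSU; if it was measured at the wall, the DC load is lower still and the margin only grows. The function names are made up for illustration, and 0.80 is simply the "80%+" rating taken at its floor.

```python
def psu_headroom(load_w, rated_w):
    """Return (fraction of rated output in use, spare watts)."""
    return load_w / rated_w, rated_w - load_w


def wall_draw(load_w, efficiency):
    """Estimate AC draw at the wall for a given DC load and PSU efficiency."""
    return load_w / efficiency


# Figures from the post: ~260 W sustained peak on a 400 W unit.
used, spare = psu_headroom(260, 400)
print(f"{used:.0%} of rated output in use, {spare} W spare")
print(f"about {wall_draw(260, 0.80):.0f} W at the wall at 80% efficiency")
```

Plug in the extra draw of a video card as a larger `load_w` to see how quickly the spare margin shrinks.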

No idea of FASN8 (that would require more power by design).

You're welcome :)

Quote:

Originally Posted by MR_SmartAss

Depends on what is "very quickly". The cores on a K10 @ 2.4GHz or less are communicating slower than the cores of the different dies of the Core2 Quad MCM at same frequency.

I assumed, that you refer to Johan's cache ping pong test. One of your later postings confirmed this. Well, I followed the development of this test on the original aceshardware forums a while ago and many ideas have been discussed back then. You can find the full discussion and an early version of the code here:
http://web.archive.org/web/200505281...0681&forumid=2

First I have to say that this test covers one special variant of core-to-core communication. I think K10 takes a performance hit in this benchmark due to its write buffering and maybe even its L3 cache (which BTW adds ~20ns to mem latency in case of a miss). The benchmark doesn't tell us anything about how fast a core can access data in another core's cache when that data was written not right before the access but at least tens of cycles earlier. Except for semaphores and the like, the access pattern this benchmark measures would just stand for bad multithreaded coding style. ;)

Quote:

Originally Posted by MR_SmartAss

Depends of what kind of SSE code. For some code it is true, for some it isn't. For example during the decode phase the 128bit SSE instructions on the K8 are being split(vector path code) in two 64bit and executed in 2 cycles. K10 doesn't split the 128bit SSE instructions and it is executing them in 1 cycle.

SSE(2) instructions are mostly being double decoded on K8. SSE was vector decoded on K7. Since these 2 separate ops for both register halves on K8 finished one half one cycle earlier than the other half, it led to a nice 4 cycle latency for standard ops (add, sub, mul).

But as pointed out in the past (google for "k8 sse bottleneck"), there was a strange behaviour regarding SSE loads, as you can see in the tests here again. Maybe due to the double decode, such a decoded instruction had to use a single FP unit sequentially. While x87 or MMX loads could fetch two 64-bit values per cycle, aligned 128-bit loads could not, resulting in 0.5 SSE loads/cycle. This has been solved (maybe simply by avoiding the double decoding), leading to quadrupled SSE load performance compared to K8.
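To put those load rates in bytes per cycle, here is a small sketch. It only restates the figures from the paragraph above (two 64-bit loads/cycle for x87/MMX, 0.5 aligned 128-bit loads/cycle for SSE on K8, and a 4x improvement on K10); the helper name is made up, and nothing here is measured.

```python
def bytes_per_cycle(loads_per_cycle, bits_per_load):
    """Convert a load rate into bytes moved per clock cycle."""
    return loads_per_cycle * bits_per_load / 8


# K8 with x87/MMX loads: two 64-bit loads per cycle.
k8_mmx = bytes_per_cycle(2, 64)

# K8 with aligned 128-bit SSE loads: 0.5 loads per cycle.
k8_sse = bytes_per_cycle(0.5, 128)

# K10: the post's "quadrupled" SSE load rate, i.e. 0.5 * 4 loads per cycle.
k10_sse = bytes_per_cycle(0.5 * 4, 128)

print(k8_mmx, k8_sse, k10_sse)
```

On these numbers, K8's 128-bit loads moved only half the data per cycle that its own 64-bit loads did, which is why it was called a bottleneck.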

Quote:

Originally Posted by Dresdenboy

I assumed, that you refer to Johan's cache ping pong test. One of your later postings confirmed this. Well, I followed the development of this test on the original aceshardware forums a while ago and many ideas have been discussed back then. You can find the full discussion and an early version of the code here:
http://web.archive.org/web/200505281...0681&forumid=2

First I have to say that this test covers one special variant of core-to-core communication. I think K10 takes a performance hit in this benchmark due to its write buffering and maybe even its L3 cache (which BTW adds ~20ns to mem latency in case of a miss). The benchmark doesn't tell us anything about how fast a core can access data in another core's cache when that data was written not right before the access but at least tens of cycles earlier. Except for semaphores and the like, the access pattern this benchmark measures would just stand for bad multithreaded coding style. ;)

SSE(2) instructions are mostly being double decoded on K8. SSE was vector decoded on K7. Since these 2 separate ops for both register halves on K8 finished one half one cycle earlier than the other half, it led to a nice 4 cycle latency for standard ops (add, sub, mul).

But as pointed out in the past (google for "k8 sse bottleneck"), there was a strange behaviour regarding SSE loads, as you can see in the tests here again. Maybe due to the double decode, such a decoded instruction had to use a single FP unit sequentially. While x87 or MMX loads could fetch two 64-bit values per cycle, aligned 128-bit loads could not, resulting in 0.5 SSE loads/cycle. This has been solved (maybe simply by avoiding the double decoding), leading to quadrupled SSE load performance compared to K8.

Really enjoy your posts here, they're very informative and always a nice addition to the forums :up:

Quote:

Originally Posted by cky2k6

this thread is the source of my frustration for the past couple of weeks... why can't asus get a working bios...

Has anyone tried it with registered RAM?

Quote:

Originally Posted by cky2k6

this thread is the source of my frustration for the past couple of weeks... why can't asus get a working bios...

Yes, and for that matter, has no one out there got working boards? Tyan, Supermicro???

This hit the Inquirer:

Quote:

BARCELONA NEEDS HELP FAST.
Apparently Barcelona will not work in older Mainboards, OLDER Meaning Yesterday & Today & for some time, as it useS multiple voltages at same time, in core, so its input pins are also multiple ?, anyway, due to unique Voltages simultaneously in core, there are no Mainboards, AT ALL.

Testers Stuck Pair into tyan 29XX, No Go, took working system & stuck in Barcelona & 50% lower scores than opteron dual cores. Thats BIG PROBLEM, I think.

Maybe its simple Hardware Socket or just rewire/upgrade of controllers, YET What if Barcelona internal CROSSBARS are all tuned wrong to put out such slow output.

extra!!!extra:kb940520 tests redfiboost/butt locker. Get it ,,ha.ha.ha?ah.

Signed:PHYSICIAN THOMAS STEWART VON DRASHEK M.D..
posted by : THOMAS STEWART VON DRASHEK, 10 October 2007

link

At the least, this slowdown is keeping the fanboys from bragging about Barc... :)

This was just someone's comment about the new AMD roadmap that the Inq posted. It was listed in the comments section. Who knows what validity it has.

Yes, it's not news, it's a comment. The point is that, all in all, it's a big, lengthy delay. The comment holds the same validity as someone ranting in these forums.

But being a dual socket Opteron user its a frustrating experience.

my 2 cents

OEMs get their chips months, or at least weeks, before launch, but not being able to see a decent working board weeks after launch is the problem. Forget about benchmarking and losing-the-crown kind of stuff.

Stabilizing takes time, that's OK, but getting a minimal working setup should not.

Is the problem with the board makers, the BIOS writers, the Barcs, or all combined?

PS: Nothing against you Phil,its an AMD chip buyer/little fan frustration.

May be true, but what I know is that we haven't seen one run correctly on this site. I suspect that the two main reviews I have seen had them running correctly, and I would imagine that there are AMD customers that have Barcelona systems running. I also believe that there are significant problems with the present platform.

I'm not sure what the problem is here, but I suspect it's mainly because we're just a little farther down the food chain than Dell, etc. AMD's support is too busy stamping out fires elsewhere.

No offense to Dave, Steven, etc who have worked hard to give us some insight.

In the end, I do believe the writer is correct in the sense that the present mbs do not work well with K10 and we will not see the full potential of this cpu until new mbs are designed. Unfortunately, all of those who thought they could just buy the cpu will have to get a new mb as well. I'm glad I just upgraded my 939 and waited.

PS. No offense was taken and I'm frustrated as well

Hypothetical situation:

What if Barcelona is actually all it's cracked up to be, and a little more. What kind of demand would that cause for these chips? What if the supply couldn't fill demand, considering the HPC and supercomputer obligations AMD needs to fill first. My guess is the backlash from that would be as blown out of proportion as the situation now, if not worse.
I imagine AMD is ramping at 110%, and their first priority is getting these chips into the hands of the big contractors who already signed for them. I'm sure that is much more important to them than worrying about keeping a handful of enthusiasts happy. After all, these are server and HPC chips.

Just, what if...! :D

The doom and gloom makes me :rofl:

Quote:

Originally Posted by flippin_waffles

Hypothetical situation:

What if Barcelona is actually all it's cracked up to be, and a little more. What kind of demand would that cause for these chips? What if the supply couldn't fill demand, considering the HPC and supercomputer obligations AMD needs to fill first. My guess is the backlash from that would be as blown out of proportion as the situation now, if not worse.
I imagine AMD is ramping at 110%, and their first priority is getting these chips into the hands of the big contractors who already signed for them. I'm sure that is much more important to them than worrying about keeping a handful of enthusiasts happy. After all, these are server and HPC chips.

Just, what if...! :D

The doom and gloom makes me :rofl:

Oh, I'm not gloom and doom, just frustrated. You did notice that I said we wouldn't see the real potential of these chips until we had better mb. I meant that in a very positive way.:up:

Quote:

Originally Posted by PhilDoc

Oh, I'm not gloom and doom, just frustrated. You did notice that I said we wouldn't see the real potential of these chips until we had better mb. I meant that in a very positive way.:up:

Yeah I know, Phil. My doom and gloom blabbering wasn't directed at you at all. Just an in general observation from all the forums I lurk at around the net. :p: Although it was expected with the massive PR blitz that was launched against them over a year ago now. :yepp:

Quote:

Just, what if...! :D

The doom and gloom makes me :rofl:

For sure, OEMs and supercomputers are their priority and that's where they make their bucks. It's a business fundamental.

Reminder: It's not about AMD and chips, it's about boards and BIOS.

The topic here is that a decent board isn't getting out, and I keep repeating that to others. It's not a kid saying AMD is gone or some other immature remark. Also note that HPC and supercomputer manufacturers won't read AnandTech or XS to buy chips/boards or wait for benchmarks. :)

Here it is all about small biz owners and retailers for small clients. Enthusiasts buy C2D Quad, no offense.

If someone thinks ALL here are a bunch of enthusiasts playing games on dual socket Opterons, then one may need to rethink.

Quote:

Originally Posted by flippin_waffles

Yeah I know, Phil. My doom and gloom blabbering wasn't directed at you at all. Just an in general observation from all the forums I lurk at around the net. :p: Although it was expected with the massive PR blitz that was launched against them over a year ago now. :yepp:

Well, let's hope that it goes the way you've stated it. Either way, I think Phenom is going to do a little better than the naysayers predict.

That Inq comment looks too much like someone who reads here posting, and the word "apparently" means it's not from his own source. The way it looks, it could very well be an Inq employee posting a rumor and escaping any backlash this way, which would never be beneath them. It means nothing more than what we already know.

There's no doubt Family 10h has been a disappointment and a pure source of frustration up until now, and it's even worse with the future roadmaps all being so badly delayed. No dual core, no tri-core, and no FX till at least March 08 means the FX 3GHz shown in January will not even be released 18 months from the marketing. What a shambles that is. Then you have crap MB and BIOS issues. :(

Quote:

Originally Posted by mutambo

OEMs get their chips months, or at least weeks, before launch, but not being able to see a decent working board weeks after launch is the problem. Forget about benchmarking and losing-the-crown kind of stuff.

Stabilizing takes time, that's OK, but getting a minimal working setup should not.

Is the problem with the board makers, the BIOS writers, the Barcs, or all combined?

Under normal circumstances, that's exactly how things should work, but this time it didn't go as smoothly as planned. As you may or may not have heard, there was a HUGE problem with yields on the first run of Barcs. This problem wasn't something which was easily remedied, but ultimately things got fixed. In the meantime, AMD had fallen months behind schedule, and instead of delaying the launch any further, the "original revised" launch date was still targeted. This resulted in an insanely short development time for even AMD's largest OEM clients. TBH, I wouldn't be surprised to hear that mobo manufacturers were shipped samples as late as a few weeks before launch. IMO, Barcelona is a polished processor, probably exactly how AMD had originally planned it. The issues we're seeing involve compatibility with current mobos. The BIOS developers can't be blamed either, as they attempted to squeeze what normally takes months of testing into a period of a few weeks. I'm confident that things will sort themselves out in the coming weeks and we'll finally see what K10 is really capable of.

Back to the thread: Sorry I haven't had much time to update here lately. I'll find some time to attempt to fill all the requests for benches. Oh, btw, did I happen to mention that ATI's Hammerhead arrived last week? Hopefully Phenom isn't far behind......

Hammerhead? I don't remember what that is...

I think its the RD790 mb.

come on steve, give kribibench a run! :D

The Hammerhead
http://www.xtremesystems.org/forums/...1&d=1188924377

I like me some shark!

Read the whole thread!....

It only took two days. S7e9h3n, you mentioned that the new bios you've acquired (I won't even ask ^_^) work, but you've had no luck with the opterons still :-\ Do the bios you have on hand leave the old options for overclocking on the L1N open? Have you had any luck at all getting either CPU to boot?

I really appreciate the numbers you've thrown up, as well as everyones' contributions of information regarding these chips.

Quote:

Originally Posted by doompc

The Hammerhead
http://www.xtremesystems.org/forums/...1&d=1188924377

Must...have....this........mobo :bounce:

Quote:

Originally Posted by doompc

The Hammerhead
http://www.xtremesystems.org/forums/...1&d=1188924377

Am I the only one who thinks its funny that the wire for that fan is so long and bunched up right there? :lol2:

Quote:

Originally Posted by SparkyJJO

Am I the only one who thinks its funny that the wire for that fan is so long and bunched up right there? :lol2:

yes you are.....:D

Bananas to that mobo...

I want this:

http://i115.photobucket.com/albums/n...na_fx_mobo.jpg

The arrangement is just too ideal! My TV tuner in the PCIe slot just above the first graphics card, then a couple of 2950's and then a Auzentech Prelude to round the whole package out :eleph:

Edit: I know that it's apparently a mock-up, I don't care! Give me one made of cardboard for all I care!

Edit2: Perhaps it's not a mock-up board, or AMD just wired the fans up to a car battery for this demonstration...

http://i115.photobucket.com/albums/n.../agena/1_s.jpg

that's my dream machine....

One day, one day... :)

1 Attachment(s)

Quote:

Originally Posted by RonindeBeatrice

Edit2: Perhaps it's not a mock-up board, or AMD just wired the fans up to a car battery for this demonstration...

http://i115.photobucket.com/albums/n.../agena/1_s.jpg

Nah the fans are plugged in. Circled in red in the attached pic, the wires are silver instead of the normal red/black/yellow.

Look at the green circled area - hmm, PCIe 2.0 slots apparently with only one 6pin plugged in.

Wonder what those two molexes are for though that I circled in yellow? :confused:

I see two 8-pin EPS connections on this board! Circled in blue - why not just one, are the CPUs really that power hungry that they can't run off of just 4 pins each?

Quote:

Originally Posted by SparkyJJO

Nah the fans are plugged in. Circled in red in the attached pic, the wires are silver instead of the normal red/black/yellow.

Look at the green circled area - hmm, PCIe 2.0 slots apparently with only one 6pin plugged in.

Wonder what those two molexes are for though that I circled in yellow? :confused:

I see two 8-pin EPS connections on this board! Circled in blue - why not just one, are the CPUs really that power hungry that they can't run off of just 4 pins each?

This was a fairly old pic, and IIRC they were running at 3.0 GHz, so perhaps they were overclocking early silicon and needed the extra stability of 8 pins per CPU. The molexes are totally unnecessary; I have one on my 580X CrossFire board and it's not required. I noticed the fan wires, and yes, that is a PCIe 2.0 slot; the R600 is being powered by it and a sole PCIe connector ( = win)

Lol

Quote:

Originally Posted by MR_SmartAss

Because we care about desktop performance. Most of the desktop software today is single or dual threaded. Four threaded software is rarity and we haven't seen any application that can fully utilize all 4 cores yet.
About the "native" epithet, thats only a marketing which means nothing.

Depends on what is "very quickly". The cores on a K10 @ 2.4GHz or less are communicating slower than the cores of the different dies of the Core2 Quad MCM at same frequency.

Depends of what kind of SSE code. For some code it is true, for some it isn't. For example during the decode phase the 128bit SSE instructions on the K8 are being split(vector path code) in two 64bit and executed in 2 cycles. K10 doesn't split the 128bit SSE instructions and it is executing them in 1 cycle.

You would notice the same on every system, but it is more noticeable on a K8(regardless of the number of CPUs).
Sometimes yes, but sometimes it performs faster without NUMA. Depends of the OS and the code which is being processed.

The HT3 is useless on the desktop and it won't offer any performance benefit over HT2 or HT1.
The bandwidth on the AMD platforms scales with the number of sockets. So the single desktop CPU won't have more bandwidth than Barcelona(2 or more CPUs, ccNUMA). It will only have RAM with lower latency, which would boost its performance for sure. But how much, we can only speculate. Having two IMCs and a large L3 as a medium between the cores and the RAM leads me to a conclusion that it won't bring any dramatical performance improvements. 5% would be impressive.

I don't know who is the person, but I know that at higher frequency K10(and every CPU made up to date) doesn't scale better in performance. At certain frequencies(such are 1.6GHz, 2.4GHz and 3.2GHz) it will run the RAM at a little bit higher(5% to 10%) frequency, but it won't make any noticeable difference in performance. The same happens with the K8, but we don't see a 2.4GHz K8 offering any noticeable IPC advantage over a 2.3GHz.

This is nonsense. Barcelona doesn't scale better than linear, nor it scales linear.
http://img231.imageshack.us/img231/8483/scalingrg0.png
http://www.anandtech.com/cpuchipsets...spx?i=3092&p=6
Note, that this comparison is between the "bugged?" B1 and the new B2, so if you compare B2 to B2, the scaling would be even lower.
Also there was a guy(I don't remember who) from AMD's server division who officially said that K10 @2.5GHz would be around 15% faster than a 2GHz K10.

im not going to respond to this except for this

anandtech's Linpack test clearly showed that the processor scaled from one socket to two better than 100% it was more than twice as fast on 2 as one(better than linear)!

As for the rest of what you said you dont own the hardware and are repeating what others have said .

Then you can explain why Quad FX performs better at 5-5-5-18 800 than at 4-4-4-12 800. We know an Intel wouldn't react the same way, because the platforms are way different.
This is a server processor, not desktop. A single socket will get more memory bandwidth on the desktop platform (single socket); obviously more processors on a NUMA system provide more bandwidth, that's why AMD made it.

Last point: years ago my friend did Seti. He had an 866 Coppermine Intel, and over in the corner was an old Xeon P2 400. The P2 400 slaughtered the 866 on Seti at half the clock speed, so he decided to try games on it; games weren't even playable.
Moral of the story: trying to guess desktop performance based on server chips doesn't always work out the way you think (pointless).
Spec clearly showed AMD smashes Core on FP with proper code, it's that simple, by over 100% on some types of code. I could be wrong, but I believe there are 19 FP tests and AMD @ 2GHz beat Intel @ 3GHz on 17 of them. It also clearly showed that Intel wins on POV-Ray, which everyone loves to run. See, what happened is this: a new processor gets run on Spec; Intel, being a huge beast and media hound, runs over to Spec.org to see where they beat AMD. Then they use the media and such to make the tests they win into standard benchmarks, then Intel throws in their shady compiler and bam, they have a winner. Anandtech also clearly showed that on code not compiled with the Intel compiler AMD wins again.
http://aceshardware.freeforums.org/v...r=asc&start=60

hey if you cant win fairly cheat LOL

Quote:

Originally Posted by RonindeBeatrice

Bananas to that mobo...

I want this:

http://i115.photobucket.com/albums/n...na_fx_mobo.jpg

The arrangement is just too ideal! My TV tuner in the PCIe slot just above the first graphics card, then a couple of 2950's and then a Auzentech Prelude to round the whole package out :eleph:

Edit: I know that it's apparently a mock-up, I don't care! Give me one made of cardboard for all I care!

Edit2: Perhaps it's not a mock-up board, or AMD just wired the fans up to a car battery for this demonstration...

http://i115.photobucket.com/albums/n.../agena/1_s.jpg

That looks like one of the old old old boards in my garage. Way too green for my taste. Functionality is concern #1, but jeez, put some black or something in there so it doesn't look like a fern is growing in my PC.

anandtech's Linpack test clearly showed that the processor scaled from one socket to two better than 100% it was more than twice as fast on 2 as one(better than linear)!

20-25 to 35-45, that's 100%+ alright. OTOH, 25-30 to 45-55, seems Intel scales better here.

Last point: years ago my friend did Seti. He had an 866 Coppermine Intel, and over in the corner was an old Xeon P2 400. The P2 400 slaughtered the 866 on Seti at half the clock speed, so he decided to try games on it; games weren't even playable.
Moral of the story: trying to guess desktop performance based on server chips doesn't always work out the way you think (pointless).

I don't see a comparison. You equate PIII vs PII, totally different archs, with Barcelona vs Phenom, which are the same core.

Spec clearly showed AMD smashes core on FP with proper code

SpecFP and SpecFP_rate aren't the same. SpecFP measures pure FP performance, specfp_rate runs multiple instances (not threads) of the same bench, using exorbitant bandwidth requirements. http://www.realworldtech.com/forums/...83478&roomid=2

See, what happened is this: a new processor gets run on Spec; Intel, being a huge beast and media hound, runs over to Spec.org to see where they beat AMD. Then they use the media and such to make the tests they win into standard benchmarks, then Intel throws in their shady compiler and bam, they have a winner.

:ROTF:

Anandtech also clearly showed that on code not compiled with the Intel compiler AMD wins again

I don't recall Anandtech disclosing which apps were compiled with what besides Linpack.

don't you people get tired of spreading "false misinformations"?? omg!

Quote:

Originally Posted by MR_SmartAss

If you don't understand the charts or if you don't know what does mean "better than linear", don't spread false misinformations. Here is the article from anand.
http://img229.imageshack.us/img229/5...scalinget3.png
As it is obvious to anyone who can read the numbers and the charts:
Matrix size = 5000: 34 / 21 = 1.62 or 62% scaling
Matrix size = 30000: 44 / 23.5 = 1.87 or 87% scaling
There is no matrix size at which K10 scales 100% going from 1 to 2 sockets.
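The same arithmetic as a few lines of Python, for anyone who wants to plug in other readings from the chart. This is only a sketch: the readings (21, 34, 23.5, 44) are the ones quoted above, and the function name is just for illustration.

```python
def scaling(one_socket, two_sockets):
    """Return (speedup, gain in percent) when going from 1 to 2 sockets."""
    speedup = two_sockets / one_socket
    return speedup, (speedup - 1) * 100


# (matrix size, 1-socket result, 2-socket result) as read off the chart
readings = [(5000, 21, 34), (30000, 23.5, 44)]

for size, one, two in readings:
    speedup, gain = scaling(one, two)
    print(f"matrix size {size}: {speedup:.2f}x, {gain:.0f}% gain")
```

Linear scaling from one socket to two would be a 2.00x speedup (a 100% gain), so "better than linear" would need a speedup above 2.00x.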

Some of us have already been playing with K10, both before and after it was released. Do you own a K10?

err..:rolleyes: Can you support this nonsense with anything?

Again, can you support this or you are just guessing?

What has the bandwidth to do with the performance scaling?

Neither P2 nor P3 has anything in common with K10, nor does Seti have anything in common with the test from anand. Also, instead of trying to teach us with your noob knowledge, educate yourself. P3 performs so great in games because it supports SSE, while P2 doesn't. The reason why Core2 kicks K8's ass in gaming is Core2's SSE performance. Because K10 has a better SSE engine than K8, K10 performs better than K8 in games.
You don't have to be a genius to work out the performance scaling of an architecture similar to K8 on the same platform. QuadFX performs like a dual Opteron with the exception of a few synthetic RAM bandwidth benchmarks. Why should anyone expect this to be different with K10 on the same platform?

Again you are spreading FUD. Spec clearly showed that Core2 smashes K10(as well as K8) on both INT and FP. Also, it seems that you don't understand, "the proper" code is the code made by SPEC and thats only code executed on these tests.

Again you are guessing something, and you are guessing wrong. Before you spread FUD, do a little research.

Are you copy-pasting this from AMDZone, Scientia's or Sharikou's blog? :ROTF:

Quick someone argue with this guy! i can't hes making perfect sense....

false misinformations = true informations :shrug:

Quote:

Originally Posted by LowRun

false misinformations = true informations :shrug:

LOL

Quote:

Originally Posted by LowRun

false misinformations = true informations :shrug:

heh, that's right up there with my favourite phrases "i didn't say nothing," and "you don't know nothing."

Quote:

Originally Posted by LowRun

false misinformations = true informations :shrug:

mon pointe exactly

My personal favorite is "I could care less".

So many people are too lazy to say "I couldn't care less" without realizing that by saying they "could care less" means they do care about the issue.

I just got dual 2347s, now running with DDR2-800 ECC reg & an 8800GTX on a Tyan S2932. Now installing Windows Server 2003 64-bit. What do you want me to test now? I will do it tonight.:D
And I will compare it with a dual Xeon 5320 2.0GHz tomorrow.:D

Quote:

Originally Posted by linhvndiy

I just got dual 2347s, now running with DDR2-800 ECC reg & an 8800GTX on a Tyan S2932. Now installing Windows Server 2003 64-bit. What do you want me to test now? I will do it tonight.:D
And I will compare it with a dual Xeon 5320 2.0GHz tomorrow.:D

NICE, looking forward to your results, hopefully this is the first non buggy k10 system on xs lol

run superpi, sandra mem bandwidth, 3dmark06, cinebench. basically, just look over this thread for more ideas.

Quote:

Originally Posted by linhvndiy

I just got dual 2347s, now running with DDR2-800 ECC reg & an 8800GTX on a Tyan S2932. Now installing Windows Server 2003 64-bit. What do you want me to test now? I will do it tonight.:D
And I will compare it with a dual Xeon 5320 2.0GHz tomorrow.:D

About friggin time (not you personally).

Run EVERYTHING.

Personally requesting - wPrime, Cinebench, some form of DivX encoding, RAR/UNRAR/ZIP test, and whatever else you can think of that has not been mentioned here.

Quote:

Originally Posted by linhvndiy

I just got dual 2347s, now running with DDR2-800 ECC reg & an 8800GTX on a Tyan S2932. Now installing Windows Server 2003 64-bit. What do you want me to test now? I will do it tonight.:D
And I will compare it with a dual Xeon 5320 2.0GHz tomorrow.:D

Ehhhh uuummmm lets see.........maybe every benchmark known to man would be a good start :up:

Prime95 benchmark results might be interesting for the crowd at mersenneforum.org. Most recent version seems to be http://www.mersenne.org/gimps/p95v255a.zip
which includes multithreaded tests.

1 Attachment(s)

1st: Cinebench R10. How about that point?
I will do more. And the good news is this Barcelona can OC. I OC'd to 2.0GHz.:D
The bad news: Barcelona needs a better mainboard.
I will have a Supermicro that supports DDPM and will try more.
I am tweaking this mainboard too. The S2932 now has an option to run DDR2-1066 in the BIOS. I want to know exactly how to enable NUMA in the BIOS, please help me.
Super PI is 41s. Barcelona is not for PI, and it means nothing.
I will test rendering in 3ds Max tomorrow.

Quote:

Originally Posted by linhvndiy

1st: Cinebench R10. How about that point?

Very nice score for a 2ghz run :)

Quote:

Originally Posted by linhvndiy

1st: Cinebench R10. How about that point?
I will do more. And the good news is this Barcelona can OC. I OC'd to 2.0GHz.:D
The bad news: Barcelona needs a better mainboard.
I will have a Supermicro that supports DDPM and will try more.
I am tweaking this mainboard too. The S2932 now has an option to run DDR2-1066 in the BIOS. I want to know exactly how to enable NUMA in the BIOS, please help me.
Super PI is 41s. Barcelona is not for PI, and it means nothing.
I will test rendering in 3ds Max tomorrow.

Thanks in advance for your time! :)

And as your screen shot shows CPU-Z is recognizing K10 rev. BA as A in stepping field!
Can you also make shot of CPU-Z memory information TAB? I like to see how your board is clocking it! :up:

Quote:

Originally Posted by Lightman

And as your screen shot shows CPU-Z is recognizing K10 rev. BA as A in stepping field!

And the Cinebench score of 12541 he got almost exactly matches the scores we saw from the techreport Barc reviews last month (slightly slower in fact).

http://techreport.com/articles.x/13224/5
http://techreport.com/articles.x/13176/5

Is there any meaningful benchmark evidence as yet that the barcs that had been sent out for review last month were somehow performance crippled as someone here had claimed?

Good job linhvndiy. :)

Regardless of what transpires I have bad news, in a way inevitable. I asked my boss at work of any news or contact with AMD and he told me he had contacted all the major and local server vendors for a possible order on Barcelona servers many times and was told repeatedly that they have no Barcelona processors for their platforms although everything else is ready because of very short supply and high demand and the next availability date is not known but most likely November. Every one of them stated AMD is having supply problems since the launch. The communication took place over the last week.

WOW Whats that !!!!!!!!!!!!!!!!!!

http://bp2.blogger.com/_S625jUOxSwA/...PECfp_rate.jpg

the straight line is INTEL

anandtech

"Notice that the Intel CPU has the advantage when it comes to raw processing power: it is about 19% faster in a single CPU configuration. Once you add a second CPU in both systems, that 19% lead is turned into a 3% advantage for AMD"

What he didn't say here is that the Intel is clocked 17% higher and still got beat by 3% on 2 processors.

That would be scaling better than 100%.
Sorry, it wasn't the test using Intel's compiler.
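For what it's worth, here is what those two percentages imply on their own, as a hedged sketch. It assumes "19% faster" means Intel = 1.19 x AMD on one CPU and "3% advantage" means AMD = 1.03 x Intel on two; the function name is made up. The quote gives no absolute 1-to-2 speedup for either chip, so the code can only relate the two speedups to each other.

```python
def relative_scaling(single_cpu_lead, dual_cpu_deficit):
    """Ratio of AMD's 1->2 socket speedup to Intel's.

    single_cpu_lead:  Intel's lead with one CPU (0.19 for 19%).
    dual_cpu_deficit: AMD's lead with two CPUs (0.03 for 3%).
    """
    return (1 + single_cpu_lead) * (1 + dual_cpu_deficit)


ratio = relative_scaling(0.19, 0.03)

# AMD only reaches a 2.0x (100%) speedup if Intel's own speedup is at least:
intel_speedup_needed = 2.0 / ratio

print(f"AMD speedup = {ratio:.3f} x Intel speedup")
print(f"Intel would need >= {intel_speedup_needed:.2f}x for AMD to hit 2.0x")
```

So the quote shows AMD scaling roughly 23% better than Intel in this test; whether that clears 100% depends on how well the Intel system itself scaled, which the quote doesn't say.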

My Barcelonas will be here Thursday and my KFSN4-DRE is on the table :D

Quote:

Originally Posted by KTE

Good job linhvndiy. :)

Regardless of what transpires I have bad news, in a way inevitable. I asked my boss at work of any news or contact with AMD and he told me he had contacted all the major and local server vendors for a possible order on Barcelona servers many times and was told repeatedly that they have no Barcelona processors for their platforms although everything else is ready because of very short supply and high demand and the next availability date is not known but most likely November. Every one of them stated AMD is having supply problems since the launch. The communication took place over the last week.

I can get you a few 8347's ;) :D

Amd

Quote:

Originally Posted by s7e9h3n

I can get you a few 8347's ;) :D

Send them here, I'll take 'em

Quote:

Originally Posted by Viper666

Send them here, I'll take 'em

If you're serious about purchasing them, drop me a PM and we'll work out the details ;)

Quote:

Originally Posted by Viper666

My Barcelonas will be here Thursday and my KFSN4-DRE is on the table :D

I have also prepared a KFSN4-DRE and am now waiting for the rental Barcelona to arrive :)
http://222.151.147.26/c-board/file/K...1CPU_setup.jpg
This board has DDR2-400, 533, 667, 800, and 1066 settings.
http://222.151.147.26/c-board/file/K..._mem-clock.jpg
And the old ClockGen for nForce 4 works fine on this board.
http://222.151.147.26/c-board/file/K...250_MEM400.png
The beta BIOS 1004 (though it's not for the -DRE but for the -DRE/SAS) shows that
it includes AGESA version 3.1.1.1...which is the latest I've ever seen.
http://222.151.147.26/c-board/file/1...SA-3.1.1.1.jpg

Sorry for hijacking your thread, s7e9h3n...
I deeply appreciate your exclusive support;)

Quote:

Originally Posted by Viper666

http://bp2.blogger.com/_S625jUOxSwA/...PECfp_rate.jpg

the straight line is INTEL

anandtech

"Notice that the Intel CPU has the advantage when it comes to raw processing power: it is about 19% faster in a single CPU configuration. Once you add a second CPU in both systems, that 19% lead is turned into a 3% advantage for AMD"

What he didn't say here is that the Intel is clocked 17% higher and still got beaten by 3% on 2 processors

that would be scaling better than 100%
Sorry, it wasn't the test using Intel's compiler

My Barcelonas will be here Thursday and my KFSN4-DRE is on the table :D

You actually want to get one of those misbehaving things? Better than 100% scaling or no, I prefer working systems. :D

ASUS now shows a 1006 BIOS for the DRE :D Oops, my mistake, wrong board.

But what you have is exactly what I got, same rev :D How about sharing that BIOS?

Lol

Quote:

Originally Posted by JVguest

You actually want to get one of those misbehaving things? Better than 100% scaling or no, I prefer working systems. :D

LOL Im sure that ASUS made sure it works !

Quote:

Originally Posted by Viper666

ASUS now shows a 1006 bios for dre :d

Oh, thanks for your info...I must ask my contact :)

Quote:

Originally Posted by kyosen

Oh, thanks for your info...I must ask my contact :)

Cool, waiting for your results. I will test the 3ds Max render now, then compare with the Xeon 5320. I don't believe Anand's result. :D

Hey, he "somehow lied" about superpi, but he commented about Cinebench!

And now that there are finally BAs around, I'm really looking forward to seeing direct comparisons against Xeons!

By the way, can someone confirm/deny the ccNUMA and dual-channel bugs? Are they working properly on this BA processor?

2 Attachment(s)

This is a 3ds Max 9 64-bit render. The Barcelona 2.0 GHz result is 2min4'.
A dual Xeon 5355 at 2.66 GHz takes about 1min4x' (it's not my test).
I will test this file with the Xeon 5320.
I think Barcelona is equal to or a little bit faster than the Xeon in 3ds Max rendering.
An AMD X2 6000+ renders this file in more than 6 min.
But 3ds Max is not optimized for 8 cores, I think. A high-clocked Q6600 may give a better result.
I want to test server apps now, hic.

Quote:

Originally Posted by Viper666

http://bp2.blogger.com/_S625jUOxSwA/...PECfp_rate.jpg

the straight line is INTEL

anandtech

"Notice that the Intel CPU has the advantage when it comes to raw processing power: it is about 19% faster in a single CPU configuration. Once you add a second CPU in both systems, that 19% lead is turned into a 3% advantage for AMD"

What he didn't say here is that the Intel is clocked 17% higher and still got beaten by 3% on 2 processors

that would be scaling better than 100%
Sorry, it wasn't the test using Intel's compiler

My Barcelonas will be here Thursday and my KFSN4-DRE is on the table :D

You have not read this graph right. It's not even a scaling graph, it is a comparative graph. The values you quote also do not show scaling above 100%, I'm afraid, because they are comparative too.

I am sorry, you cannot get better than 100% scaling with CPUs. It's impossible. If you could, then IPC at one frequency becomes IPC+N at a higher frequency, which is ludicrous. It's even more ludicrous if you are talking about multiple CPUs, which you seem to be doing above. CPUs work downwards from 100%, not up from it.

Unless of course the K10 has its own laws of physics. Maybe that is why they are in such short supply?

Regards
Andy

I think he's trying to use the intel performance factor to amd performance factor as his example for over 100% performance increase, which it is not. It is only an increase in performance per clock cycle.

Stephen, PM incoming.

"Notice that the Intel CPU has the advantage when it comes to raw processing power: it is about 19% faster in a single CPU configuration. Once you add a second CPU in both systems, that 19% lead is turned into a 3% advantage for AMD"

What he didn't say here is that the Intel is clocked 17% higher and still got beaten by 3% on 2 processors

that would be scaling better than 100%
Sorry, it wasn't the test using Intel's compiler

OK, on one processor (socket) Intel won by 19%, so on 2 sockets it should have been 38% (that would be perfect 100% scaling), but in fact the AMD ran more than twice as fast to WIN by 3%. That's more than 100% scaling on 2 processors, and yes, it's phenomenal!! That's 122% scaling.

The graph had nothing to do with the scaling comment. I was responding to two posts in one. I'll be sure not to do it again, as it seems to confuse people.

Quote:

Originally Posted by linhvndiy

I want to know exactly how to enable NUMA in the BIOS, please help me.

The s3992 has a BIOS option called "node interleave"; if it's disabled, NUMA support is enabled.

Quote:

Originally Posted by Viper666

OK, on one processor (socket) Intel won by 19%, so on 2 sockets it should have been 38% (that would be perfect 100% scaling), but in fact the AMD ran more than twice as fast to WIN by 3%. That's more than 100% scaling on 2 processors, and yes, it's phenomenal!! That's 122% scaling.

The focus should be moved from "100% scaling" vs. "122% scaling" (surely meant ironically) to something like 60% vs. 80%. Then all people here should be happy ;) As we all know, one type of CPU in a multi CPU setup is hampered more by FSB, inter core traffic and mem BW than another type of CPU with its direct connection links, a cache shared by 4 cores and so on..
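
To put rough numbers on this, the quoted AnandTech figures can be turned into a relative scaling ratio. This is only a sketch: it assumes the 19% single-socket Intel lead and the 3% dual-socket AMD lead refer to the same workload, and the function name is made up for this example.

```python
def relative_scaling(intel_single_lead, amd_dual_lead):
    """Return AMD's 1->2 socket speedup divided by Intel's.

    Leads are fractions, e.g. 0.19 means "19% faster".
    """
    # Normalise AMD's single-socket score to 1.0.
    amd_single = 1.0
    intel_single = 1.0 + intel_single_lead
    # With two sockets AMD ends up ahead, so amd_dual / intel_dual = 1 + lead.
    dual_ratio = 1.0 + amd_dual_lead
    # (amd_dual / amd_single) / (intel_dual / intel_single)
    return dual_ratio * intel_single / amd_single


ratio = relative_scaling(0.19, 0.03)
# Even if AMD doubled its score perfectly (speedup 2.0), this bounds Intel's speedup.
intel_speedup_if_amd_perfect = 2.0 / ratio
print(f"AMD's speedup was {ratio:.3f}x Intel's speedup")
print(f"Intel speedup if AMD scaled perfectly: {intel_speedup_if_amd_perfect:.2f}x")
```

The ratio comes out at roughly 1.23, which says AMD gained about 23% more from the second socket than Intel did. It says nothing about either platform exceeding 100% scaling on its own.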

Will a car run twice as fast by doubling the number of cylinders?

1 Attachment(s)

I can't understand this result. Can someone explain it to me?
I got 1000 more Cinebench R10 points with DDR2-667 (better than the old result with DDR2-800), and with Windows 2003 changed to WinXP 64.
I think motherboard makers have a lot of work to do tuning their BIOSes to get the best performance out of Barcelona.
And I think the s2932 has something wrong with its memory, or is not optimized for Barcelona's memory controller.
So the other Barcelona reviews may not be correct. :D

linhvndiy, I kindly ask you to run "openssl speed" :) On Barcelona, and also your dual-Xeon 5320 if possible (for comparison). See http://www.xtremesystems.org/forums/...45&postcount=1

- Z

Quote:

Originally Posted by linhvndiy

I can't understand this result. Can someone explain it to me?
I got 1000 more Cinebench R10 points with DDR2-667 (better than the old result with DDR2-800), and with Windows 2003 changed to WinXP 64.

This could have been caused by the OS (different memory and CPU thread handling). CB might also not depend too much on memory latency/bandwidth.

Quote:

Originally Posted by MR_SmartAss

16x64bit vs 8x32bit GPRs

I looked at the older screenshot again, because this was my first thought. But he ran the 64-bit edition of Win 2003, it seems. The older CB screenshot shows "(64 Bit)". See for yourself: http://www.xtremesystems.org/forums/...postcount=1048

But there is another problem with your argument. CB doesn't do calculations using the integer registers (GPRs) but SSE registers. But these are also doubled from 8 to 16 in 64 bit mode.

Quote:

Originally Posted by MR_SmartAss

Oh nooooo.....It is very bad!!!!
Remember? :rolleyes:

http://www.xtremesystems.org/forums/...&postcount=536

http://www.xtremesystems.org/forums/...&postcount=522

:clap:

You're mixing his Pi results up with Cinebench. That's what he was referring to.

Quote:

Originally Posted by zpdixon

linhvndiy, I kindly ask you to run "openssl speed" :) On Barcelona, and also your dual-Xeon 5320 if possible (for comparison). See http://www.xtremesystems.org/forums/...45&postcount=1

- Z

Yah, i will run it in some mins.

Quote:

Originally Posted by linhvndiy

Yah, i will run it in some mins.

Woohoo. I've finally found people willing to run it. Perseverance pays off ! :D

- Z

Quote:

Originally Posted by MR_SmartAss

Oh nooooo.....It is very bad!!!!
Remember? :rolleyes:

Are you on drugs? :yepp:

Quote:

Originally Posted by JohannesRS

Hey, he "somehow lied" about superpi, but he commented about Cinebench!

No lies, we haven't even remotely seen the power of k10 as yet ;)

Mr.Ass is clutching at straws :yepp:

Quote:

Originally Posted by BeardyMan

You're mixing his Pi results up with Cinebench. That's what he was referring to.

Yeh he knows mate, he's merely trying to alleviate the trouble he has sleeping at night worrying about whether or not I'm right with my K10-SUPERPi predictions :p:

Yes, but predicting is not the same as having the privilege of seeing a real result.

Quote:

Originally Posted by linhvndiy

Yah, i will run it in some mins.

You are entering the business of running benchmarks that are seldom or never run by others but still offer nice information. ;)

While you are on it, please run the benchmarks in the most recent Prime95 32-/64-bit versions.

http://www.mersenne.org/gimps/p95v255a.zip
http://www.mersenne.org/gimps/p64v255.zip
(Options -> Benchmark)

To dresdenboy: I will do it tonight.
I just did the openssl run. But I don't think it ran well: only one CPU was used. It should be run on Linux. Please let me know how.

type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md2 1449.00k 3054.78k 4214.85k 4665.85k 4804.30k
mdc2 0.00 0.00 0.00 0.00 0.00
md4 15350.40k 54126.60k 157606.54k 298261.62k 405246.76k
md5 14174.73k 49422.88k 141059.09k 262657.00k 312424.88k
hmac(md5) 21801.33k 71048.50k 180837.68k 291207.91k 249707.40k
sha1 14510.02k 47249.78k 119463.93k 194800.77k 239247.29k
rmd160 11639.53k 34359.30k 76082.83k 109708.79k 125589.71k
rc4 224624.66k 256885.87k 263172.02k 267750.02k 269124.41k
des cbc 53687.09k 55342.95k 55489.39k 55784.59k 55924.05k
des ede3 19719.34k 19884.10k 19994.27k 20051.53k 19993.81k
idea cbc 36091.68k 38692.84k 39475.80k 39765.86k 39770.57k
rc2 cbc 20258.67k 20911.40k 21074.26k 21115.37k 21136.65k
rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
blowfish cbc 81029.78k 85576.21k 86592.08k 86928.58k 86951.11k
cast cbc 75352.42k 79531.72k 80737.32k 81029.78k 81029.78k
aes-128 cbc 52766.84k 56660.64k 57723.09k 58032.57k 58042.61k
aes-192 cbc 45497.53k 49483.01k 50526.17k 50770.82k 50770.82k
aes-256 cbc 40900.09k 43919.41k 44644.00k 44924.93k 44930.95k
camellia-128 cbc 0.00 0.00 0.00 0.00 0.0
camellia-192 cbc 0.00 0.00 0.00 0.00 0.0
camellia-256 cbc 0.00 0.00 0.00 0.00 0.0

sha256 9979.01k 24648.36k 45642.97k 57999.97k 58674.42k
sha512 3135.02k 12525.45k 19688.10k 27898.67k 31767.51k
sign verify sign/s verify/s
rsa 512 bits 0.000535s 0.000044s 1869.0 22676.5
rsa 1024 bits 0.002409s 0.000118s 415.1 8472.5
rsa 2048 bits 0.013613s 0.000386s 73.5 2589.7
rsa 4096 bits 0.087500s 0.001354s 11.4 738.5
sign verify sign/s verify/s
dsa 512 bits 0.000397s 0.000478s 2517.8 2090.3
dsa 1024 bits 0.001103s 0.001329s 906.8 752.3
dsa 2048 bits 0.003602s 0.004389s 277.6 227.8
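
For anyone who wants to compare listings like this one without retyping numbers, here is a minimal parsing sketch. It assumes the plain-text layout shown above (an algorithm name followed by five throughput columns); the function name is hypothetical, and it deliberately skips the header and the rsa/dsa rows.

```python
def parse_throughput_row(line):
    """Parse one throughput row into (name, [five values in 1000s of bytes/s]).

    Returns None for headers, rsa/dsa rows, and anything else that does not
    end in five numeric columns.
    """
    tokens = line.split()
    if len(tokens) < 6:
        return None
    # The algorithm name may contain spaces ("aes-128 cbc"), so take the
    # last five tokens as the columns and everything before them as the name.
    name = " ".join(tokens[:-5])
    try:
        values = [float(token.rstrip("k")) for token in tokens[-5:]]
    except ValueError:
        return None
    return name, values


row = parse_throughput_row(
    "aes-128 cbc 52766.84k 56660.64k 57723.09k 58032.57k 58042.61k"
)
print(row)
```

Rows such as "rsa 1024 bits ..." return None here because they carry timings and operation rates rather than five throughput columns, so they would need their own parser.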

Quote:

Originally Posted by linhvndiy

To dresdenboy: I will do it tonight.

Thanks in advance. BTW, Prime95 will write its results into results.txt in the same dir where the exe file resides. It will automatically do tests on one and multiple cores.

Quote:

Originally Posted by BeardyMan

Yes, but predicting is not the same as having the privilege of seeing a real result.

My predictions are based on what I've seen already, from some people with a little inside help :). That's all there is to it I'm afraid.

Quote:

Originally Posted by SOLDNER-MOFO64

My predictions are based on what I've seen already, from some people with a little inside help :). That's all there is to it I'm afraid.

It sure seems you are talking BS. I mean, you somehow saw a record-breaking fast K10 preview when everyone else only had the so-called bugged K10s. Can you tell us where you saw this?

Quote:

Originally Posted by gallag

It sure seems you are talking BS. I mean, you somehow saw a record-breaking fast K10 preview when everyone else only had the so-called bugged K10s. Can you tell us where you saw this?

WERD.

He's all talk and no show.

Quote:

Originally Posted by gallag

It sure seems you are talking BS. I mean, you somehow saw a record-breaking fast K10 preview when everyone else only had the so-called bugged K10s. Can you tell us where you saw this?

I never claimed to have seen any record-breaking fast K10s at all. I said that from benchmarks I've been privileged (lucky enough) to see, K10 CPUs will run SPi 1M in less than 23 seconds...and when OC'd will manage to bring that down to 13 seconds. I never said which K10 CPUs, so I'm sorry if any of you managed to slide an exact model number in there :rolleyes:

As for now, we have around 6 users on XS with a K10...all running bugged to the max in server boards/AM2 boards with ECC mem/no OC, not to mention most guys have trouble trying to get theirs to boot and run properly.......yet you guys already seem to have decided these are the final product???? :rofl:

Come on....:clap:

Quote:

Originally Posted by gallag

It sure seems you are talking BS. I mean, you somehow saw a record-breaking fast K10 preview when everyone else only had the so-called bugged K10s. Can you tell us where you saw this?

I never claimed to have seen any record-breaking fast K10s at all. I said that from benchmarks I've been privileged (lucky enough) to see, K10 CPUs will run SPi 1M in less than 26 seconds...and when OC'd will manage to bring that down to 17 seconds. I never said which K10 CPUs, so I'm sorry if any of you managed to slide an exact model number/revision/stepping in there :rolleyes:...I certainly didn't mean the first ones out of the factory with no BIOS.
What are you guys on......10yr old child pills?

As for now, we have around 6 users on XS with a K10...all running bugged to the max in server boards/AM2 boards with ECC mem/no OC, not to mention most guys have trouble trying to get theirs to boot and run properly.......yet you guys already seem to have decided these are the final product???? :rofl:

Come on....:clap:

Quote:

K10 cpu @ 3Ghz & DDR2 @ 1066+mhz 5,5,5,18 & AM2+ MOBO = <17s

K10 cpu @ 2Ghz & DDR2 @ 667mhz 5,5,5,15 & AM2 MOBO = 26

Quote:

Originally Posted by SOLDNER-MOFO64

I said that from benchmarks I've been privileged (lucky enough) to see, K10 CPUs will run SPi 1M in less than 26 seconds...

26 sec is K10's 3+ GHz result.
It is obvious. :)

The only reason Core 2 achieves such amazing SuperPi results is its extremely fast cache. K10 is only 5, 10 or 15 percent faster than K8 (depending on the installed memory type) when running SuperPi. It is a fact.

Quote:

Originally Posted by kyosen

I have also prepared a KFSN4-DRE and am now waiting for the rental Barcelona to arrive :)

Sorry for hijacking your thread, s7e9h3n...
I deeply appreciate your exclusive support;)

Not a hijack whatsoever my friend....Get ready, the cpu's go in the mail tomorrow and should reach you by the weekend :toast:

Quote:

Originally Posted by s7e9h3n

I can get you a few 8347's ;) :D

Strangely enough, out of the 9 vendors he did contact, the 8347 was the exact CPU that 1 of them said he might be able to get hold of in late October. Maybe it was you he contacted after all :D (jk)

Thanks Ste, but he's far up the ladder from me at the corp and the best I get to speak to is his secretary. :( They order batch servers directly from the likes of HP, Sun, Dell and other server builders and never go to typical retail outlets for single CPU buys. It's gear for a research lab at a government-funded hospital BTW, so the order would be quite large.

Quote:

Originally Posted by KTE

Strangely enough, out of the 9 vendors he did contact, the 8347 was the exact CPU that 1 of them said he might be able to get hold of in late October. Maybe it was you he contacted after all :D (jk)

Thanks Ste, but he's far up the ladder from me at the corp and the best I get to speak to is his secretary. :( They order batch servers directly from the likes of HP, Sun, Dell and other server builders and never go to typical retail outlets for single CPU buys. It's gear for a research lab at a government-funded hospital BTW, so the order would be quite large.

If you had caught me a couple of weeks ago, I could have supplied him with as many 8350's as he needed ;)

Quote:

Originally Posted by SOLDNER-MOFO64

I never claimed to have seen any record-breaking fast K10s at all. I said that from benchmarks I've been privileged (lucky enough) to see, K10 CPUs will run SPi 1M in less than 26 seconds...and when OC'd will manage to bring that down to 17 seconds. I never said which K10 CPUs, so I'm sorry if any of you managed to slide an exact model number/revision/stepping in there :rolleyes:...I certainly didn't mean the first ones out of the factory with no BIOS.
What are you guys on......10yr old child pills?

As for now, we have around 6 users on XS with a K10...all running bugged to the max in server boards/AM2 boards with ECC mem/no OC, not to mention most guys have trouble trying to get theirs to boot and run properly.......yet you guys already seem to have decided these are the final product???? :rofl:

Come on....:clap:

Where did you witness these K10s running? And how come they were not also bugged? Sorry if this sounds like I am interrogating you, I am genuinely interested.

Btw: Soldner, I didn't intend to say that you lied. That's why I said it "in between these signs"... ;)

Quote:

Originally Posted by SOLDNER-MOFO64

I never claimed to have seen any record-breaking fast K10s at all. I said that from benchmarks I've been privileged (lucky enough) to see, K10 CPUs will run SPi 1M in less than 26 seconds...and when OC'd will manage to bring that down to 17 seconds. I never said which K10 CPUs, so I'm sorry if any of you managed to slide an exact model number/revision/stepping in there :rolleyes:...I certainly didn't mean the first ones out of the factory with no BIOS.
What are you guys on......10yr old child pills?

As for now, we have around 6 users on XS with a K10...all running bugged to the max in server boards/AM2 boards with ECC mem/no OC, not to mention most guys have trouble trying to get theirs to boot and run properly.......yet you guys already seem to have decided these are the final product???? :rofl:

Come on....:clap:

A difference of nearly 20 secs is just too much to cover up with faster mem.

Quote:

Originally Posted by linhvndiy

I just did the openssl run. But I don't think it ran well: only one CPU was used. It should be run on Linux. Please let me know how.

Code:

type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md2 1449.00k 3054.78k 4214.85k 4665.85k 4804.30k
mdc2 0.00 0.00 0.00 0.00 0.00
md4 15350.40k 54126.60k 157606.54k 298261.62k 405246.76k
md5 14174.73k 49422.88k 141059.09k 262657.00k 312424.88k
hmac(md5) 21801.33k 71048.50k 180837.68k 291207.91k 249707.40k
sha1 14510.02k 47249.78k 119463.93k 194800.77k 239247.29k
rmd160 11639.53k 34359.30k 76082.83k 109708.79k 125589.71k
rc4 224624.66k 256885.87k 263172.02k 267750.02k 269124.41k
des cbc 53687.09k 55342.95k 55489.39k 55784.59k 55924.05k
des ede3 19719.34k 19884.10k 19994.27k 20051.53k 19993.81k
idea cbc 36091.68k 38692.84k 39475.80k 39765.86k 39770.57k
rc2 cbc 20258.67k 20911.40k 21074.26k 21115.37k 21136.65k
rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
blowfish cbc 81029.78k 85576.21k 86592.08k 86928.58k 86951.11k
cast cbc 75352.42k 79531.72k 80737.32k 81029.78k 81029.78k
aes-128 cbc 52766.84k 56660.64k 57723.09k 58032.57k 58042.61k
aes-192 cbc 45497.53k 49483.01k 50526.17k 50770.82k 50770.82k
aes-256 cbc 40900.09k 43919.41k 44644.00k 44924.93k 44930.95k
camellia-128 cbc 0.00 0.00 0.00 0.00 0.0
camellia-192 cbc 0.00 0.00 0.00 0.00 0.0
camellia-256 cbc 0.00 0.00 0.00 0.00 0.0
sha256 9979.01k 24648.36k 45642.97k 57999.97k 58674.42k
sha512 3135.02k 12525.45k 19688.10k 27898.67k 31767.51k
sign verify sign/s verify/s
rsa 512 bits 0.000535s 0.000044s 1869.0 22676.5
rsa 1024 bits 0.002409s 0.000118s 415.1 8472.5
rsa 2048 bits 0.013613s 0.000386s 73.5 2589.7
rsa 4096 bits 0.087500s 0.001354s 11.4 738.5
sign verify sign/s verify/s
dsa 512 bits 0.000397s 0.000478s 2517.8 2090.3
dsa 1024 bits 0.001103s 0.001329s 906.8 752.3
dsa 2048 bits 0.003602s 0.004389s 277.6 227.8

[edit: I updated my percentages after learning that linhvndiy was running his 2347's at 1950 MHz to obtain the above scores]

Thanks linhvndiy ! These scores are very interesting.

I have compared them with the scores of K8 at the same speed to do a clock-for-clock comparison between K8 and K10. Technically I ran openssl on a dual Opteron 280 (2.4 GHz), but I scaled its scores down to simulate a 1950 MHz K8 (all of the openssl speed tests scale linearly with the clock frequency, so that's a pretty good estimation, see the end of this post for my K8 results.) For the tests using different buffer sizes (16 to 8192 bytes), I only took into account the three larger buffer sizes (256, 1024 and 8192 bytes), because the overhead of the OpenSSL API with the 2 smaller buffer sizes (16 and 64 bytes) is too high and cripples the results on both K10 and K8 which makes any sort of direct comparison difficult.

So here is how K10 fares against K8 in the most popular encryption/hashing algorithms:

o md4: K10 is between 3% and 6% faster than K8 (+6%, +4%, +3% for the three different buffer sizes)
o md5: the throughput varies too much between the three buffer sizes (the K10 scores vary by +6%, +5%, -8% compared to K8), are you sure your machine was idle ?
o hmac(md5): the throughput varies too much here too (+7%, +5%, -18%)
o sha1: K10 has a small advantage over K8 (+7%, +5%, and +4%)
o rc4: K10 is as fast as K8 (within 1% of each other)
o blowfish: K10 is consistently 5% faster than K8
o aes-128: K10 is between 16% and 17% faster than K8 with the three different buffer sizes (for example: 8192-byte test: 58042 kB/s vs. 49500 kB/s)
o aes-192: K10 is exactly 18% faster than K8 no matter what the buffer size is
o aes-256: K10 is also 18% faster
o sha256: the throughput varies too much here (+11%, +11%, +3%), weird
o sha512: K10 is between 4% and 5% faster than K8
o rsa 1024-bit: K10 is +12% faster than K8 on sign operations (415 vs. 370 sign/s)
o rsa 1024-bit: K10 is +10% faster than K8 on verify operations (8473 vs. 7730 verify/s)
o dsa 1024-bit: K10 is +9% faster than K8 on sign operations (907 vs. 830 sign/s)
o dsa 1024-bit: K10 is +9% faster than K8 on verify operations (752 vs. 690 verify/s)

Overall, K10 is, clock-for-clock, 0% to 18% faster than K8 on these 32-bit (100% ALU) OpenSSL speed tests. (It would have been interesting if the guys who ported OpenSSL to Windows enabled the SSE2 assembly implementation of sha512...)
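
The clock-for-clock normalization described above can be sketched in a few lines. The scores are copied from the two result listings in this thread; the 2400 MHz K8 clock is an assumption (the nominal Opteron 280 speed rather than the measured 2405.476 MHz), and the function and variable names are made up for this example.

```python
K8_CLOCK_MHZ = 2400.0   # assumed: nominal Opteron 280 clock
K10_CLOCK_MHZ = 1950.0  # clock reported for the 2347 run


def clock_for_clock_gain(k10_score, k8_score,
                         k10_mhz=K10_CLOCK_MHZ, k8_mhz=K8_CLOCK_MHZ):
    """Percent advantage of K10 over K8 after scaling K8 down to the K10 clock.

    Relies on the post's premise that these tests scale linearly with clock.
    """
    k8_scaled = k8_score * k10_mhz / k8_mhz
    return (k10_score / k8_scaled - 1.0) * 100.0


# (K10 score, K8 score) pairs taken from the listings in the thread.
samples = {
    "aes-128 cbc, 8192 bytes": (58042.61, 60919.45),
    "rsa 1024 sign/s": (415.1, 455.2),
    "dsa 1024 verify/s": (752.3, 849.9),
}

gains = {name: clock_for_clock_gain(k10, k8) for name, (k10, k8) in samples.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:+.1f}%")
```

With these inputs the three gains round to the +17%, +12% and +9% figures listed above. Using the measured 2405.476 MHz instead would shift each result by a fraction of a percent.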

linhvndiy, you said you wanted to benchmark the 8 cores at the same time. This is possible with "openssl speed -multi 8" under Linux/*BSD/Solaris... But you should know that these tests all scale linearly with the number of cores and the clock frequency (they all fit in the L2 cache), so just multiplying your scores by 8 gives a very precise estimate.
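
As a rough illustration of that estimate, the sketch below multiplies a single-core score by the core count and builds the matching command line. It assumes perfectly linear scaling, as claimed above; the helper names are hypothetical, and the command is only constructed, not executed.

```python
def estimate_aggregate(per_core_score, cores=8):
    """Estimate the all-core score, assuming perfectly linear scaling."""
    return per_core_score * cores


def speed_command(cores=8):
    """Build the argument list for a parallel run (Linux/*BSD/Solaris)."""
    return ["openssl", "speed", "-multi", str(cores)]


# Single-core aes-128 cbc, 8192-byte score from the Barcelona run above.
single_core = 58042.61
estimated = estimate_aggregate(single_core)
print(f"estimated 8-core aes-128 cbc throughput: {estimated:.2f}k")
print("command to measure it:", " ".join(speed_command()))
```

The list returned by speed_command() can be handed to subprocess.run on a machine with OpenSSL installed to check the estimate against a real measurement.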

Now, if I can have one more wish ( ;) ) I would ask you to run the same benchmark under 64-bit Linux/*BSD/Solaris. The RSA scores would jump by about x3 (the BN lib just loves 64-bit archs), and the RC4 and MD5 throughput would increase by 15-30% (at least that's what is observed with K8). Running openssl speed in 64-bit mode (and with -multi 8) is how the guys at http://www.tecchannel.de/server/proz...28/index9.html obtained their excellent RSA scores.

If you don't know what 64-bit distro to run, I recommend the 64-bit Ubuntu 7.10 (release candidate), the OpenSSL version they distribute is very recent (package named "openssl", version 0.9.8e).

- Z

Code:

"openssl speed" using the 32-bit Windows port of OpenSSL 0.9.8e running on:
vendor_id : AuthenticAMD
cpu family : 15
model : 33
model name : Dual Core AMD Opteron(tm) Processor 280
stepping : 2
cpu MHz : 2405.476
cache size : 1024 KB

OpenSSL 0.9.8e 23 Feb 2007
built on: Wed Feb 28 01:35:20 2007
options:bn(64,32) md2(int) rc4(idx,int) des(idx,cisc,4,long) aes(partial) idea(int) blowfish(idx)
compiler: cl /MD /Ox /O2 /Ob2 /W3 /WX /Gs0 /GF /Gy /nologo -DOPENSSL_SYSNAME_WIN32 -DWIN32_LEAN_AND_MEAN -DL_ENDIAN -DDSO_WIN32 -D_CRT_SECURE_NO_DEPRECATE -D_CRT_NONSTDC_NO_DEPRECATE -DBN_ASM -DMD5_ASM -DSHA1_ASM -DRMD160_ASM -DOPENSSL_USE_APPLINK -I. /Fdout32dll -DOPENSSL_NO_CAMELLIA -DOPENSSL_NO_RC5 -DOPENSSL_NO_MDC2 -DOPENSSL_NO_KRB5 -DOPENSSL_NO_DYNAMIC_ENGINE
available timing options: TIMEB HZ=1000
timing function used: ftime
The 'numbers' are in 1000s of bytes per second processed.
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes
md2 1597.75k 3568.29k 4965.69k 5497.35k 5674.21k
mdc2 0.00 0.00 0.00 0.00 0.00
md4 17908.11k 62660.00k 183608.38k 352740.42k 485592.36k
md5 16226.33k 56910.50k 163840.00k 308830.48k 417473.49k
hmac(md5) 24636.15k 81670.76k 208121.77k 341955.99k 425412.77k
sha1 16008.79k 52702.61k 137180.83k 228728.23k 283938.50k
rmd160 13354.47k 39887.58k 89472.52k 130245.25k 150081.32k
rc4 256140.70k 306881.58k 320910.79k 326595.60k 327808.05k
des cbc 60699.04k 63226.74k 63937.56k 64071.86k 64305.16k
des ede3 22338.35k 22840.12k 22980.88k 23171.21k 23027.67k
idea cbc 43123.55k 46596.91k 47527.52k 47798.34k 47798.34k
rc2 cbc 23512.32k 24422.76k 24637.96k 24690.53k 24692.35k
rc5-32/12 cbc 0.00 0.00 0.00 0.00 0.00
blowfish cbc 93077.48k 100013.21k 101526.27k 102144.39k 102144.39k
cast cbc 87267.70k 92820.01k 94121.83k 94626.15k 94920.60k
aes-128 cbc 57367.81k 59726.65k 61030.25k 61030.25k 60919.45k
aes-192 cbc 50541.39k 51949.89k 52675.72k 52849.95k 52849.95k
aes-256 cbc 44792.99k 46154.65k 46661.70k 46863.73k 46929.28k
camellia-128 cbc 0.00 0.00 0.00 0.00 0.0
camellia-192 cbc 0.00 0.00 0.00 0.00 0.0
camellia-256 cbc 0.00 0.00 0.00 0.00 0.0
sha256 10971.24k 27588.43k 50805.41k 64589.86k 70135.20k
sha512 3692.13k 14766.89k 23183.76k 32873.14k 37448.10k
sign verify sign/s verify/s
rsa 512 bits 0.000492s 0.000041s 2032.8 24127.0
rsa 1024 bits 0.002308s 0.000108s 455.2 9512.6
rsa 2048 bits 0.012288s 0.000333s 83.4 3055.7
rsa 4096 bits 0.077100s 0.001137s 13.0 879.7
sign verify sign/s verify/s
dsa 512 bits 0.000361s 0.000429s 2769.9 2331.6
dsa 1024 bits 0.000980s 0.001177s 1020.4 849.9
dsa 2048 bits 0.003127s 0.003701s 319.8 270.2

I did the test at 1950 MHz. I feel this 32-bit OpenSSL run is not the right test for Barcelona. I think Barcelona will show its real performance under Linux with multiple threads. But I have to research how to test under Linux.

Quote:

Originally Posted by linhvndiy

I did the test at 1950 MHz. I feel this 32-bit OpenSSL run is not the right test for Barcelona. I think Barcelona will show its real performance under Linux with multiple threads. But I have to research how to test under Linux.

Given that the OpenSSL assembly code is absolutely not optimized for the K10 architecture (yet) and that it exclusively uses the ALUs, which are the units that changed the least between K8 & K10 (compared to, say, the FPU and SSE units), I think that a 5%-18% perf improvement across the range of algorithms is pretty good (I'll edit my post with updated percentage numbers later, now that I know the clock was 1950 MHz).

Doing 64-bit OpenSSL speed tests is very easy. Just follow my link to download the Ubuntu 7.10 64-bit DVD image. Burn it. Install it. Then in a terminal: "sudo apt-get install openssl", type in your user passwd, then "openssl speed".

- Z

Quote:

Originally Posted by zpdixon

Doing 64-bit OpenSSL speed tests is very easy. Just follow my link to download the Ubuntu 7.10 64-bit DVD image. Burn it. Install it. Then in a terminal: "sudo apt-get install openssl", type in your user passwd, then "openssl speed".

- Z

I hope there won't be K10-related problems while running Linux. Might sudo apt-get also work in a live system? That would make things easier, I suppose.

However, Prime95 benchmarking won't cause such troubles ;)