The L3 latency is 25 cycles; that's a lot.
Some quick testing to show K8 improvement with MHz.
1-year-old WinXP install on my 24/7 OS (that's why the times are slow; also 4x512MB and not tweaked)
DFI Venus
Opty 165
4x512MB BH-5
Used ClockGen in Windows to lower and raise the clocks (memory speed and FSB were not kept the same, like in the K10 scores)
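Time / CPU clock / FSB:
54.907 s, 1600 MHz, 177 MHz FSB
43.641 s, 2000 MHz, 222 MHz FSB
36.266 s, 2400 MHz, 267 MHz FSB (~7.375 s improvement over 2000 MHz)
34.843 s, 2500 MHz, 277 MHz FSB
33.453 s, 2600 MHz, 289 MHz FSB
32.203 s, 2700 MHz, 300 MHz FSB
31.078 s, 2800 MHz, 311 MHz FSB
30.000 s, 2900 MHz, 322 MHz FSB
28.938 s, 3000 MHz, 333 MHz FSB
28.047 s, 3100 MHz, 344 MHz FSB (~4.156 s improvement over 2700 MHz)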
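To sanity-check the quoted deltas and see how run time tracks clock, you can treat performance as 1/time and compare it with the clock ratio. A minimal Python sketch, using the times from the list (read as seconds with three decimals, e.g. 54.907 s); the helper names are just for illustration:

```python
# K8 (Opteron 165) run times in seconds, keyed by CPU clock in MHz, from the list above.
runs = {
    1600: 54.907, 2000: 43.641, 2400: 36.266, 2500: 34.843, 2600: 33.453,
    2700: 32.203, 2800: 31.078, 2900: 30.000, 3000: 28.938, 3100: 28.047,
}

def time_saved(slow_mhz, fast_mhz):
    """Seconds saved going from slow_mhz to fast_mhz."""
    return runs[slow_mhz] - runs[fast_mhz]

def scaling(slow_mhz, fast_mhz):
    """Return (clock gain %, performance gain %), with performance taken as 1/time."""
    clock_gain = (fast_mhz / slow_mhz - 1) * 100
    perf_gain = (runs[slow_mhz] / runs[fast_mhz] - 1) * 100
    return clock_gain, perf_gain

print(f"2000 -> 2400 MHz saves {time_saved(2000, 2400):.3f} s")
print(f"2700 -> 3100 MHz saves {time_saved(2700, 3100):.3f} s")
for lo, hi in [(1600, 3100), (2000, 2400), (2700, 3100)]:
    clk, perf = scaling(lo, hi)
    print(f"{lo} -> {hi} MHz: clock +{clk:.1f}%, performance +{perf:.1f}%")
```

Over the full 1600 to 3100 MHz range the computed performance gain comes out slightly larger than the clock gain, which is plausible here because the FSB was raised along with the CPU clock rather than only the multiplier.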
Thanks for all the hard work, Kyosen. The L3 latency was pretty much expected. At this point no one seems to know how it will affect performance, but time will tell. Very surprised about the SPi scores with ganged and unganged. Could have had a better choice of terminology, lol, but maybe repeat the runs just to be sure. Looking forward to anything you'd like to run; looks great so far.
L3 cache had a good impact back in the days of the P4 Northwood (Gallatin) CPU... You could gain about 4%-8% more depending on the application... So IF Barchy gets ONLY 4-8% more than K8, that's the L3 cache, so AMD isn't going to be that fast... If it gains more, then it will be OK... ;)
Any chance at some 2.6ghz love (or higher)? :D
BTW...good job! Finally nice to see some overclocking with these chips.
Hmm, strange.
It's a quad core, but look at that BIOS shot.
It says:
L1 = 512 KB (128 KB x4)
L2 = 2048 KB (512 KB x4)
L3 = 2048 KB
Is it me, or is that wrong?
The L3 is a shared 2 MB, so it's correct.
You are wrong. Barca has 4x128 KB L1 + 4x512 KB L2 + 2 MB shared L3 cache.
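The totals in that BIOS shot follow from the per-core sizes: the private L1 and L2 are multiplied by the core count, while the shared L3 is counted once. A small Python sketch of that arithmetic (the function name is just for illustration):

```python
# Per-level cache sizes for a quad-core Barcelona, in KB, as described above.
# L1 and L2 are private to each core; L3 is one block shared by all four cores.
CORES = 4
L1_PER_CORE_KB = 128   # 64 KB instruction + 64 KB data
L2_PER_CORE_KB = 512
L3_SHARED_KB = 2048

def bios_totals(cores=CORES):
    """Per-level totals as a BIOS would report them."""
    return {
        "L1": cores * L1_PER_CORE_KB,
        "L2": cores * L2_PER_CORE_KB,
        "L3": L3_SHARED_KB,  # shared, so not multiplied by the core count
    }

totals = bios_totals()
print(totals, "total:", sum(totals.values()), "KB")
```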
Thanks for the work Kyosen. :) Very much appreciated.
Does it clock any higher?
Funny, the L3 runs at the speed of the memory controller. On Barcy that's 1.8 GHz with a 2 GHz core; on Phenom it's a lot higher! I think I saw 4 GHz somewhere.
So we can overclock the L3 cache? Wow...
I doubt it will without some more juice pumping. We're looking at an NF4 here, not a 790.
Kyosen and Stephen are owed a debt of gratitude. They've taken crap chips to a level we haven't seen yet. On crap boards, I might add...
As far as I know... these guys are the only ones on the internet that have given us a glimpse of the AMD future.
Thanks guys,
BRUNO
IMHO, chipsets have always been a major problem for AMD and have hampered performance, whether their own or other manufacturers'. The NF4 did happen to be one of the better ones, though. Let's hope the new AMD 790 finally gives AMD users a great one.
I don't want to dwell on this here, especially out of respect for others, but I'd like to make it clear. My method was correct, but I made an error: I forgot to normalize the values so that a decreasing number means increasing performance. The formula stays the same, but the math in this scenario becomes:
((1/32.672 - 1/38.968) / (1/38.968)) x 100
= ((0.0306072 - 0.0256621) / 0.0256621) x 100
= 19.27% performance gain with a 20% clock increase
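The normalization can be checked in a few lines: treating performance as 1/time reproduces the 19.27% figure (the expression simplifies to (old/new - 1) x 100), while dividing the time saved by the old time gives a smaller number. A minimal Python sketch; the function names are illustrative:

```python
def perf_gain_percent(new_time_s, old_time_s):
    """Performance gain when run time drops from old_time_s to new_time_s.

    Performance is taken as 1/time, so a lower time means higher performance.
    ((1/new - 1/old) / (1/old)) * 100 simplifies to (old/new - 1) * 100.
    """
    return (1 / new_time_s - 1 / old_time_s) / (1 / old_time_s) * 100

def time_reduction_percent(new_time_s, old_time_s):
    """The un-normalized view: time saved relative to the old time."""
    return (old_time_s - new_time_s) / old_time_s * 100

print(f"{perf_gain_percent(32.672, 38.968):.2f}%")       # performance view
print(f"{time_reduction_percent(32.672, 38.968):.2f}%")  # time-reduction view
```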
Hope it helps. :)
========
C'mon Kyosen-san, we're all waiting for some of your usual magic :D
... And for Stephen :stick:
Excuse me, "CRAP CHIPS" ? Oh you mean the first revision of new chips which is obviously labelled as CRAP by the hardcore uber elite clockers like you?
Explain how the nForce 2, 3, 4, 500, 550, 570, 590 or AMD 690G chipsets had MAJOR problems and hampered performance?
Today, I got my own Barcelona, Opteron 2346 HE:)
Bulk, but BA stepping:)
http://222.151.153.254/c-board/file/...6_GE_BA_x2.jpg
BIOS screen shot...NorthBridge clock is 1.60GHz
http://222.151.153.254/c-board/file/...BA_BIOS_NB.jpg
AMD Power Monitor, Everest, and CPU-Z
http://222.151.153.254/c-board/file/...rMon_Evrst.png
The memory clock is shown differently in CPU-Z and Everest...
I have no idea which one is correct...
Now I'm thinking about a Vcore & Vnb mod,
but yeah, I should run several benchmark programs
before I break this board by mistake :D
best of luck m8
We'd have to hijack the entire thread, but an example would be the NF3 and other early 939 chipsets. If you gave it too much vdimm voltage, you couldn't boot at 200 FSB. Most thought the boards were all buggered until Misteroadster and I figured out you had to boot at around 230-ish. There's an entire sticky about problems with the DFI NF4.
Anyway, to me this is a plus for AMD. It means there's extra performance to be gained with an excellent chipset, which, so far, it appears they have with the 790.
This is certainly a thread to watch. I'd love to see how the scaling shows up in real-world apps.
Following the thread Kyosen-san, waiting for you to make those chips fly.
I hope you don't mind me doing a quick comparison at your MHz.
Vista 32bit, all services running.
http://222.151.147.26/c-board/file/C...t2350_2.4G.png
http://fugger.netfirms.com/c3.jpg
Charles, it'd be nice if you had done a comparison using hardware that's currently shipping :rolleyes: Last I checked, Newegg doesn't list any Yorkfield QX9770s along with Corsair PC3-16000s (as well as an "unknown" mb) :p: Just wait a bit; a fairer comparison would be with B2-stepping Barcelonas.
BTW, I'm surprised the Barcelona is even running that close to the Yorky @ 3.2....maybe there is hope for Phenom after all :yepp:
"BTW, I'm surprised the Barcelona is even running that close to the Yorky @ 3.2....maybe there is hope for Phenom after all "
Come on S7, that's normal: Intel's $2500 processor always beats AMD's $400 one, and it's always the comparison that gets made. One thing's for sure: that Yorky can't beat Cray's Red Storm (powered by K8 Opterons). Intel doesn't make anything that can; only IBM does.
Yeah, I've got this feeling in my gut that says there's gonna be a real big shocker coming soon,
followed by major fanboy whining and crying.
Oops... or is the Yorky running @ 2.4? I say you're cheating! :p: j/k
I clocked down, below the Q9450 with Vista 32bit.
Showing where things are now is interesting.
I think what FUGGER is trying to say is that IF a person in the near future buys TWO machines, one with a Phenom at 2.4 GHz and one with a Yorkfield at 2.4 GHz TOO (NOT expensive at all), and compares them, he'll get the above results... Nothing more and nothing less... :)
Sorry, I don't know him, and no disrespect, but I've already seen benches of that proc, and it didn't beat Barcy at 2.5 by that much!
FUGGER, to make it a fair comparison, would you mind running your bench again with only ONE stick of RAM, or two in single channel for that matter?
And DDR2 at 400 MHz 5-5-5 would be equal to your 800 MHz DDR3 at what? 9-9-9?
I would really like to see such a comparison.
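One way to answer the "at what?" question is to compare absolute latency in nanoseconds rather than cycle counts. A minimal Python sketch, assuming "400mhz" and "800mhz" are the actual memory clocks (DDR2-800 and DDR3-1600) and looking at CAS only; tRCD and tRP scale the same way:

```python
def cas_latency_ns(cas_cycles, clock_mhz):
    """Absolute CAS latency in nanoseconds: cycle count divided by the memory clock."""
    return cas_cycles * 1000 / clock_mhz

def equivalent_cas(cas_cycles, from_mhz, to_mhz):
    """CAS setting at to_mhz that gives the same absolute latency as cas_cycles at from_mhz."""
    return cas_cycles * to_mhz / from_mhz

print(cas_latency_ns(5, 400))       # DDR2 at 400 MHz (DDR2-800), CL5
print(cas_latency_ns(7, 800))       # DDR3 at 800 MHz (DDR3-1600), CL7
print(equivalent_cas(5, 400, 800))  # CL at 800 MHz matching CL5 at 400 MHz
```

By this CAS-only arithmetic, DDR3 at 800 MHz would need about 10-10-10 to match DDR2 at 400 MHz 5-5-5 in nanoseconds; bandwidth is a separate question.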
We have to "match" the memory bandwidth on BOTH systems if we're going to do it the way you say... How much Everest memory bandwidth do the Barchies get, and how much does the York? Latencies too... Because of the Barchie's integrated memory controller, its memory bandwidth is high... So to be more "accurate", BOTH memory bandwidths/latencies MUST be "matched" somehow... ;)
A 'fair' comparison would be with a BA (won't exist AFAIK) or B2 Phenom :)
And I smell some Phenom benches on the horizon in the very near future :)
By the way, no application with a 3.57x SMP speedup (like Cinebench R10) on a Core 2 Quad can depend on memory speed that much.
Memory-intensive apps usually see a 2-3x speedup on this processor.
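Expressing those speedups as parallel efficiency (speedup divided by the four cores) makes the argument concrete: an app that scales almost linearly is unlikely to be bottlenecked on shared memory bandwidth. A trivial Python sketch using the figures quoted:

```python
def parallel_efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved: speedup / core count."""
    return speedup / cores

cases = [
    ("Cinebench R10 (quoted)", 3.57),
    ("memory-intensive, low end", 2.0),
    ("memory-intensive, high end", 3.0),
]
for label, s in cases:
    print(f"{label}: {s}x on 4 cores = {parallel_efficiency(s, 4):.0%} efficiency")
```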
Actually, the Yorky probably nets a much higher BW mark than the Barcelona in Everest. IIRC, I got something like ~5500 with the 2350s + 8 GB DDR2-667, 5-5-5-x. Phenom will do much better, since it's able to run regular DDR2, and at a higher speed.
BA's WON'T do any better in these benches than the B1's.
Hi Kyosen,
I got your PM, we'll work on that.
Everest assumes that on K10 the memory clock is derived from the bus clock, while CPU-Z computes it from the CPU clock (as on K8).
IMO Everest is correct, but Tamas and I have no confirmation at the moment. We're hoping a benchmark can tell what the real clock is.
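The two readings differ only in which clock the memory speed is derived from. A minimal Python sketch of the two models as described above; the DDR2-667 setting (a hypothetical 5/3 ratio, i.e. 333 MHz at a 200 MHz reference) and the "round the divider up" rule for the CPU-derived case are assumptions for illustration, not values read from Kyosen's screenshots:

```python
import math
from fractions import Fraction

def mem_clock_from_cpu(ref_mhz, cpu_mult, mem_ratio):
    """K8-style (what CPU-Z reportedly assumes): CPU clock / integer divider.

    Assumed rule: divider = CPU multiplier / memory ratio, rounded up,
    so the memory never runs above its target.
    """
    divider = math.ceil(Fraction(cpu_mult) / mem_ratio)
    return Fraction(ref_mhz * cpu_mult) / divider

def mem_clock_from_bus(ref_mhz, mem_ratio):
    """Bus-derived (what Everest reportedly assumes on K10): reference clock * ratio."""
    return ref_mhz * mem_ratio

DDR2_667 = Fraction(5, 3)  # hypothetical setting: 333.3 MHz at a 200 MHz reference

for ref, mult in [(200, 9), (200, 10), (222, 9)]:
    cpu_derived = float(mem_clock_from_cpu(ref, mult, DDR2_667))
    bus_derived = float(mem_clock_from_bus(ref, DDR2_667))
    print(f"{ref}x{mult}: CPU-derived {cpu_derived:.1f} MHz, bus-derived {bus_derived:.1f} MHz")
```

In this model the two agree at 200x10, but at 200x9 (the 2346's stock 1.8 GHz) the CPU-derived value drops below the bus-derived one, which could produce the kind of mismatch seen between CPU-Z and Everest.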
Nice rig, I'm impatient to see results on the new BA stepping :up:
Heh, heh, heh... I was JUST informed that after the Phenom presentation in Poland, a mobo and a Phenom CPU will come to my place for testing... :D :p:
EDIT: Of course I won't touch it with my own hands... I'll bring esdee to do it... He's a "painted" AMD guy... :p:
tictac, are you still making modified BIOSes? Or have I got the wrong guy?
I'm still rocking the DFI Ultra infinity NF2 mobo :)
As far as I've tested, Cinebench 10 scales very well with CPU and RAM speed, gaining as much as 40 CB marks from just 40 MHz (80 MHz DDR), all else kept constant.
Intel with DDR2, all the way up to 1000 MHz, doesn't get much in terms of benchmark bandwidth/latency compared to AMD K8 with DDR/DDR2, but the new Intel/DDR3 combo quite obviously gets higher bandwidth once you go past DDR3-1333 at good timings. Charles' system was running 7-7-7-15 @ 1600 MHz, which is very close to your Barcelona timings but at twice the RAM speed, and twice your bandwidth. His system was 693 CB ahead, which is within reach of high-MHz/low-latency RAM if the bandwidth is high.
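The "twice the RAM speed, twice the bandwidth" point follows from the usual theoretical-peak formula: transfer rate times 8 bytes per 64-bit channel times the number of channels. A minimal Python sketch; the memory speeds are examples, and real benchmark figures (Everest etc.) land well below these peaks:

```python
def peak_bandwidth_gbs(transfer_rate_mts, channels=2, bus_width_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s: MT/s * bytes per transfer * channels."""
    return transfer_rate_mts * bus_width_bytes * channels / 1000

print(peak_bandwidth_gbs(1600))  # DDR3-1600, dual channel
print(peak_bandwidth_gbs(800))   # DDR2-800, dual channel
print(peak_bandwidth_gbs(667))   # DDR2-667, dual channel
```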
Sorry for being slow; I'm happy to see Intel guys also enjoying this thread :D
I have three types of CPUs
K8 Opteron 2212HE = 2.0GHz = 200x10
K10 Opteron 2350(B1) = 2.0GHz = 200x10
K10 Opteron 2346(BA) = 1.8GHz = 200x9
We have no way to change the K10 multiplier so far,
so I tested the K8 Opteron at x9 with CrystalCPUID.
# KFSN4-DRE's BIOS has no multiplier option...
# but Franck (cpuz here) is now working on K10 functions :up:
I used two 1 GB memory modules.
They run in Dual-channel mode with K8 and Unganged mode with K10.
Memory timings were basically Auto, but for K8 I adjusted them to match K10 as closely as possible.
OS: Windows Server 2003, with unused services set to manual start.
MAXMEM, RAM disk, and copy-waza weren't applied.
SuperPI 1M comparison...screenshots are located on my BBS:
http://www.oohashi.jp/c-board/c-boar...ne;no=5229;id=
K8-1.8G=200x9: 46.468s
K10(BA)-1.8G=200x9: 43.172s -> 46.468/43.172=1.0763
K8-2.0G=222x9: 41.968s
K10(BA)-2.0G=222x9: 38.781s -> 41.968/38.781=1.0821
K8-2.0G=200x10: 41.828s
K10(B1)-2.0G=200x10: 39.125s -> 41.828/39.125=1.0690
K8-2.2G=220x10: 37.953s
K10(B1)-2.2G=220x10: 35.578s -> 37.953/35.578=1.0667
K8-2.4G=240x10: 34.734s
K10(B1)-2.4G=240x10: 32.484s -> 34.734/32.484=1.0692
SuperPI 4M comparison...screenshots are located on my BBS:
http://www.oohashi.jp/c-board/c-boar...ne;no=5230;id=
K8-1.8G=200x9: 3m59.359s
K10(BA)-1.8G=200x9: 3m47.125s -> 239.359/227.125=1.0538
K8-2.0G=222x9: 3m34.547s
K10(BA)-2.0G=222x9: 3m24.250s -> 214.547/204.250=1.0504
K8-2.0G=200x10: 3m35.047s
K10(B1)-2.0G=200x10: 3m26.953s -> 215.047/206.953=1.0391
K8-2.2G=220x10: 3m15.141s
K10(B1)-2.2G=220x10: 3m07.797s -> 195.141/187.797=1.0391
K8-2.4G=240x10: 2m59.297s
K10(B1)-2.4G=240x10: 2m51.906s -> 179.297/171.906=1.0429
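The K8/K10 ratios in both lists can be recomputed from the raw times, converting the 4M "XmYs" times to seconds first. A minimal Python sketch (the helper name and labels are illustrative):

```python
def to_seconds(t):
    """Convert SuperPI times like '3m59.359s' or '46.468s' to seconds."""
    t = t.rstrip("s")
    if "m" in t:
        minutes, seconds = t.split("m")
        return int(minutes) * 60 + float(seconds)
    return float(t)

# (label, K8 time, K10 time) from the SuperPI 1M and 4M lists above
pairs = [
    ("1M 1.8G BA 200x9", "46.468s", "43.172s"),
    ("1M 2.0G BA 222x9", "41.968s", "38.781s"),
    ("1M 2.0G B1 200x10", "41.828s", "39.125s"),
    ("1M 2.2G B1 220x10", "37.953s", "35.578s"),
    ("1M 2.4G B1 240x10", "34.734s", "32.484s"),
    ("4M 1.8G BA 200x9", "3m59.359s", "3m47.125s"),
    ("4M 2.0G BA 222x9", "3m34.547s", "3m24.250s"),
    ("4M 2.0G B1 200x10", "3m35.047s", "3m26.953s"),
    ("4M 2.2G B1 220x10", "3m15.141s", "3m07.797s"),
    ("4M 2.4G B1 240x10", "2m59.297s", "2m51.906s"),
]
for label, k8, k10 in pairs:
    ratio = to_seconds(k8) / to_seconds(k10)  # >1 means K10 is faster
    print(f"{label}: K8/K10 = {ratio:.4f} ({(ratio - 1) * 100:.1f}% faster)")
```

The ratios in the post appear to be truncated to four decimals, while Python's formatting rounds, so several values print one unit higher in the last digit.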
Quick conclusion from above:
* There is no strange scaling behaviour within the same CPU type.
* L3 helps the shorter 1M calculation, and its effect shrinks for the longer 4M calculation.
* K10(BA) looks slightly...very slightly...faster than K10(B1).
Here, we need to pay attention to the NorthBridge multiplier...
The 2346's NB runs at x8, versus x9 in the 2350's case.
So, if compared at the same core and NB multipliers, BA's advantage would increase a little more, I think.
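The NB point can be made concrete by listing the core and NB (and hence L3) clocks for each tested setting. A minimal Python sketch, assuming the NB clock is simply the reference clock times the NB multiplier (consistent with the 1.60 GHz shown in the 2346's BIOS at 200 MHz):

```python
# NorthBridge multipliers as reported above: x8 on the 2346 (BA), x9 on the 2350 (B1).
NB_MULT = {"2346 BA": 8, "2350 B1": 9}

def clocks(ref_mhz, cpu_mult, chip):
    """Return (core MHz, NorthBridge/L3 MHz), assuming NB = reference * NB multiplier."""
    return ref_mhz * cpu_mult, ref_mhz * NB_MULT[chip]

settings = [
    (200, 9, "2346 BA"), (222, 9, "2346 BA"),
    (200, 10, "2350 B1"), (220, 10, "2350 B1"), (240, 10, "2350 B1"),
]
for ref, mult, chip in settings:
    core, nb = clocks(ref, mult, chip)
    print(f"{chip} {ref}x{mult}: core {core} MHz, NB {nb} MHz, NB/core = {nb / core:.2f}")
```

At the 2.0 GHz comparison point the BA's NB comes out at about 1776 MHz against 1800 MHz on the B1, so under this assumption the BA keeps up with a slightly slower NB/L3.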
So far, at least for SuperPI, K10 is indeed an improved K8.
I wonder whether an immature BIOS may be holding back K10's real power,
but only AMD (and SOLDNER-MOFO64 too!?) knows :)
I'll try CineBench10 in 32-bit & 64-bit Windows...or 3DMarks?... tonight.
I was just looking in the SUSE 10.3 Hardware Info tool: under the processor, it shows prefetch for 3DNow! (3dnowprefetch), but I don't see anything about SSE prefetch. Anyone know why?
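On Linux, the flags such tools display can also be read directly from /proc/cpuinfo. A minimal Python sketch that collects them and reports which prefetch-related names the kernel lists; which flag names exist depends on the kernel version, and the names checked here are only examples:

```python
def cpu_flags(cpuinfo_text):
    """Collect feature flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

def prefetch_report(flags):
    """Report which of a few prefetch-related flag names are present."""
    return {name: name in flags for name in ("3dnowprefetch", "sse", "sse2")}

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(prefetch_report(cpu_flags(f.read())))
    except OSError:
        print("no /proc/cpuinfo on this system")
```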
Very nice work Kyosen, thank you. :)
Can you run Barcelona at one CPU speed and benchmark it twice in Cinebench 10, at two different memory speeds (like 667 and 800)? Is that possible?
I'd like to see what effect RAM has on the score to see what we can expect from 1066 CAS4 RAM equipped Phenoms.
So:
@1.8G Increase between K8:K10(BA)-> 46.468/43.172=1.0763 - 7.6%
@2.0G Increase between K8:K10(BA)-> 41.968/38.781=1.0821 - 8.2%
@2.0G Increase between K8:K10(B1)-> 41.828/39.125=1.0690 - 6.9%
@2.2G Increase between K8:K10(B1)-> 37.953/35.578=1.0667 - 6.7%
@2.4G Increase between K8:K10(B1)-> 34.734/32.484=1.0692 - 6.9%
Thus, including the RAM/NB speed gain:
K8 from 1.8GHz -> 2.4GHz = 33.33% Clock Increase - 33.78% Performance Increase
K10 (B1/BA) from 1.8GHz -> 2.4GHz = 33.33% Clock Increase - 32.90% Performance Increase
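The clock-versus-performance comparison uses the same 1/time normalization. A minimal Python sketch with the SuperPI 1M times above; note that, as in the post, the K10 pair mixes the BA at 1.8 GHz with the B1 at 2.4 GHz:

```python
def pct_gain(old, new):
    """Percentage increase from old to new."""
    return (new / old - 1) * 100

clock = pct_gain(1.8, 2.4)                     # GHz
k8_perf = pct_gain(1 / 46.468, 1 / 34.734)     # performance taken as 1/time
k10_perf = pct_gain(1 / 43.172, 1 / 32.484)    # BA at 1.8G, B1 at 2.4G

print(f"clock +{clock:.2f}%, K8 +{k8_perf:.2f}%, K10 +{k10_perf:.2f}%")
```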
So far this is like Penryn over Core 2, although I'm not sure how the Intel chips scale in SPi.
3dmark06 would be good :D
Excuse me for this n00b question:
If the memory clock is derived from the CPU clock divided by some number,
then what is the northbridge clock for??
Thanks
I think it's not really an NB; it's just the IMC speed. On K8 it was always 1:1 with the CPU; apparently now it's separate.
This seems interesting to me, as on K8 the IMC was suspected to play a major role in the max CPU clock speed. A typical good K8 had memory revision B1 or BB, as you'll know, which would get you higher than a good-week BW.
Thanks for the reply...
So the IMC speed is an 'internal speed' only?
To communicate with the core, perhaps?
I got an 8800GT yesterday :)
http://222.151.153.254/c-board/file/...E_GF8800GT.jpg
...yes, I can't run dual-CPU operation with a long PCIe card :D
I used the 169.01 driver with a modified INF file... it looks a bit faster than 167.26.
I tried 2.0G, 2.2G, and 2.4G... and in the 2.4G case, I also tried 3 cores (1 core disabled) and
2 cores (2 cores disabled) to emulate Toliman and Kuma :)
I also tested a P5K3-D/Kentsfield with mild memory settings...
Results...screenshots are located on my BBS:
http://www.oohashi.jp/c-board/c-boar...ne;no=5231;id=
CPU clock: 3DMark Score / CPU Score
K10(BA) 2.0G=222x9: 9482 / 2999
K10(B1) 2.0G=200x10: 9353 / 2979
K10(B1) 2.2G=220x10: 10148 / 3267
K10(B1) 2.4G=240x10: 10796 / 3560
K10(B1, 3core) 2.4G=240x10: 10420 / 2830
K10(B1, 2core) 2.4G=240x10: 9423 / 1888
Kentsfield 2.4G=267x9: 11901 / 3845
Kentsfield 3.0G=334x9: 12965 / 4792
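The gap to Kentsfield and the core-count scaling can be read off the scores directly. A minimal Python sketch using the 3DMark06 numbers above (labels are illustrative):

```python
# (3DMark06 total, CPU score) from the results above
results = {
    "K10 B1 2.4G 4c": (10796, 3560),
    "K10 B1 2.4G 3c": (10420, 2830),
    "K10 B1 2.4G 2c": (9423, 1888),
    "Kentsfield 2.4G": (11901, 3845),
    "Kentsfield 3.0G": (12965, 4792),
}

def gap(a, b, idx):
    """Percent by which result a trails result b (idx 0 = total score, 1 = CPU score)."""
    return (1 - results[a][idx] / results[b][idx]) * 100

print(f"K10 vs Kentsfield @2.4G: total -{gap('K10 B1 2.4G 4c', 'Kentsfield 2.4G', 0):.1f}%, "
      f"CPU -{gap('K10 B1 2.4G 4c', 'Kentsfield 2.4G', 1):.1f}%")
for cores in ("3c", "2c"):
    share = results[f"K10 B1 2.4G {cores}"][1] / results["K10 B1 2.4G 4c"][1]
    print(f"K10 {cores} CPU score = {share:.2f} of the 4-core score")
```

At 2.4 GHz this works out to roughly a 9% gap on the total score and about 7% on the CPU score, in the same range as the "about 10% lower" estimate.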
Quick conclusion from above:
* K10(BA) looks a bit faster than K10(B1), though the multipliers aren't the same.
* K10's score seems about 10% lower than Kentsfield's at the same clock.
So, K10 needs more clock!!!
I'll try CineBench10 next time...
...now very sleepy...it's early morning here in Japan :)
Hmm, so IF the IMC was the limiting factor when overclocking K8, now that K10 lets us run the IMC at a separate speed, does that mean we could get Intel-like overclocks (4.5 GHz+)? :O
Thanks Kyosen.
Too bad it lags 10% behind Kentsfield now. Maybe Phenom will close the gap to Kentsfield at least.
Communication is done through the HT bus, which has a separate speed. The integrated memory controller communicates with the memory (reads/writes). I invite anyone with more technical background to fill in the details. :D
An interesting theory, is it not? But of course there are other factors.
beardy,
How do you know the Barcelona was in single-channel mode?
smartypants,
K10 is fine; I think it's you that has a problem. :D
K10(B1) 2.4G=240x10: 10796 / 3560
Kentsfield 2.4G=267x9: 11901 / 3845
It would be interesting to see 267x9 on the AMD system, or 240x10 on the Intel, and then see how that CPU bench scales ;)
OK, I'm gonna come out and say it...
SHUT UP
No need for petty arguments.
Now, maybe I didn't see it, but is the board you're running a 1207+, so it can use HT3? Just curious.
Yes, it would... hard one to predict *concentrating very very hard, channelling my clairvoyant powers* oh my god, I got it! No, the FSB wouldn't have any damn effect; when the hell has it ever??
I wasn't talking about the FSB. HT3 is on the die, correct? And it would provide direct links to the RAM and whatnot, instead of going through the FSB.
FSB x the multiplier matters. Mobo FSB? Don't mean sh*t.
Things change every day, though. :cool:
I don't know how Ganged mode works, whether it's a single read/write or two, but Unganged mode uses two separate DCTs whose parts aren't shared... each one reads or writes on its own... And just to add, K10 doesn't alternate reads and writes anymore... it keeps reading until the DCT buffer is out of space, then bursts all the waiting writes, to avoid the constant read/write switch delay.
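The benefit of holding writes and bursting them shows up in a toy model that just counts read/write turnarounds. A minimal Python sketch; the request pattern and buffer size are hypothetical, and the model ignores real-controller details such as a read that hits a queued write:

```python
def turnarounds(issued):
    """Count read<->write switches in an issued sequence of 'R'/'W' operations."""
    switches = 0
    prev = None
    for op in issued:
        if prev is not None and op != prev:
            switches += 1
        prev = op
    return switches

def issue_with_write_buffer(requests, buffer_size):
    """Issue reads immediately; hold writes until the buffer is full, then burst them."""
    issued, pending = [], 0
    for op in requests:
        if op == "R":
            issued.append("R")
        else:
            pending += 1
            if pending == buffer_size:
                issued.extend(["W"] * pending)
                pending = 0
    issued.extend(["W"] * pending)  # drain any writes still queued at the end
    return issued

requests = "RW" * 8  # hypothetical strictly alternating read/write pattern
print("in order:        ", turnarounds(requests))
print("4-deep write buf:", turnarounds(issue_with_write_buffer(requests, 4)))
```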
I'm curious about how the relative clock speed between the L3 and the cores affects the L3 cache latency. If I read this right, some combinations should give lower latency than others.
To support independent clocking and modular design, asynchronous dynamic FIFO buffers are used to communicate between different cores and the northbridge/L3 cache. These FIFOs absorb any global skew or clock rate variation, but the latency for passing through depends on the skew and frequency variance – which is why the L3 cache latency is variable.
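A back-of-the-envelope way to think about it: whatever part of an L3 access happens in the NorthBridge clock domain costs core_mhz / nb_mhz core cycles per NB cycle, and crossing the FIFO adds a phase-dependent wait. A minimal Python sketch; the split into core-domain and NB-domain cycles and the one-NB-cycle synchronization window are hypothetical parameters, not AMD figures:

```python
def l3_latency_range(core_mhz, nb_mhz, core_cycles, nb_cycles, sync_nb_cycles=1):
    """(best, worst) L3 latency in core cycles for a toy two-domain model.

    core_cycles:    hypothetical part of the access spent in the core clock domain.
    nb_cycles:      hypothetical part spent in the NB/L3 clock domain.
    sync_nb_cycles: the clock-crossing wait varies with phase between 0 and this many NB cycles.
    """
    ratio = core_mhz / nb_mhz  # core cycles per NB cycle
    best = core_cycles + nb_cycles * ratio
    worst = best + sync_nb_cycles * ratio
    return best, worst

for nb in (1600, 1800, 2000, 2200):
    best, worst = l3_latency_range(2000, nb, core_cycles=4, nb_cycles=16)
    print(f"core 2000 / NB {nb} MHz: {best:.1f} to {worst:.1f} core cycles")
```

In this toy model a faster NB relative to the core shrinks both the latency and its spread in core cycles.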