AMD Zambezi news, info, fans !


08-13-2011, 11:48 PM
chew*

Quote:

Originally Posted by Dumo

Any prediction of how good/bad BD is just speculation...until we have an actual FX chip on our mobo's socket:)

I don't need to speculate, but I surely am not going to be anyone's "source" either.
08-13-2011, 11:56 PM
TESKATLIPOKA

chew* we don't want any trouble for you so we won't ask.
08-14-2011, 07:36 AM
freeloader

Quote:

Originally Posted by TESKATLIPOKA

freeloader BD won't crush SB but will be an excellent competitor, and that's what we all want. To crush the competition BD would need a core at least 10% better than SB. Even if it had the same core-to-core performance, it would only be better with 5-8 threads, because SB has HT. You need to be reminded that SB is 30-35% better than Deneb core to core, and you want BD to have a core 40-50% better than Deneb. That's pretty much unreal.
Because you won't get that much improvement, BD is a disappointment for you, so what would you buy then, SB? BD should be on par with SB. Or you can get SB-E, but that would be pointless if you don't use applications with more than 4 threads, because SB and SB-E will perform almost the same in single thread and only the higher core count will make the real difference.
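
The 40-50% figure in the quoted post follows from compounding the two percentages. A minimal sketch of that arithmetic in Python, using only the numbers from the post (the function name is made up for illustration):

```python
def required_gain_over_deneb(sb_over_deneb, bd_over_sb):
    """Per-core gain BD needs over Deneb to beat SB by bd_over_sb.

    Both arguments and the result are fractions, e.g. 0.30 means 30%.
    """
    # Relative speedups multiply rather than add.
    return (1 + sb_over_deneb) * (1 + bd_over_sb) - 1


for sb in (0.30, 0.35):
    gain = required_gain_over_deneb(sb, 0.10)
    print(f"SB +{sb:.0%} over Deneb -> BD needs +{gain:.1%} over Deneb")
```

So a core that is 10% ahead of SB would have to be roughly 43% to 48.5% ahead of Deneb, which is where the 40-50% range comes from.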

I was hoping that BD would be faster than SB by 10%+ in most benches. I bought into the hype too heavily. AMD will have a chip that's within 10% of the performance of an I2600K. It's a good chip at a good price, I was just expecting a bit more. Because of my current setup (955@4ghz), I'm still going to buy the top model BD just to overclock the heck out of it when it's released. Roughly eight weeks to go. :)
08-14-2011, 07:40 AM
Manicdan

Quote:

Originally Posted by freeloader

I was hoping that BD would be faster than SB by 10%+ in most benches. I bought into the hype too heavily. AMD will have a chip that's within 10% of the performance of an I2600K. It's a good chip at a good price, I was just expecting a bit more. Because of my current setup (955@4ghz), I'm still going to buy the top model BD just to overclock the heck out of it when it's released. Roughly eight weeks to go. :)

within 10% on what tests? single threaded tests? up to 4 threads? 8 threads? stock clocks? OCed?
they are way different in their design, and both chips are designed to be competitive in all the different segments, but i do expect one to be a clear winner in some and a clear loser in others.
08-14-2011, 07:56 AM
Dumo

Quote:

Originally Posted by chew*

I don't need to speculate, but I surely am not going to be anyone's "source" either.

At least it will scale with cold... or will it? :D
08-14-2011, 07:57 AM
rintamarotta

It is a bit weaker than expected a few months ago, however I really think that Socket FM2 Bulldozer will show the true performance of this architecture.

AMD always scales better with cold due to its SOI.
08-14-2011, 08:22 AM
BeepBeep2

Quote:

Originally Posted by Daveburt714

Beep is a fanboy... :D

It don't make you bad man...I am too. ;)

I'll shaddup up now... I always drink beer, damn, whiskey! sigh...

I'm not a :banana::banana::banana::banana: folks, sorry for even posting. It doesn't make me bad (hic). :cool:

Dave there is a key word in my post -
"Competitively priced"

I really want a 2500K and UD5 right now, actually I have for a while... but I don't have the money. I do prefer AMD hardware over Intel a little bit, but with the X6 lagging this far behind it's time for something new to play with. I want to buy Bulldozer, so money has been put aside.
08-14-2011, 09:21 AM
undone

Quote:

Originally Posted by rintamarotta

It is a bit weaker than expected a few months ago,

Are you referring to the 'beat 2600k and 990x with ease' statement? I have always speculated that Zambezi will be on par with the 990x.
08-14-2011, 09:43 AM
Olivon

Quote:

Originally Posted by rintamarotta

It is a bit weaker than expected a few months ago, however I really think that Socket FM2 Bulldozer will show the true performance of this architecture.

AMD always scales better with cold due to its SOI.

:rofl: :ROTF:
08-14-2011, 09:56 AM
TESKATLIPOKA

freeloader I think you should see drfedja's chart. That's the best-case scenario for how BD could perform on average against the competition. It doesn't mean it will really happen, but if it does I would be more than happy, and then the only thing I need is a Trinity 13'' notebook :D.

Manicdan BD in SuperPI will be the clear loser, that's for sure :up:
08-14-2011, 02:53 PM
demonkevy666

Quote:

Originally Posted by TESKATLIPOKA

freeloader I think you should see drfedja's chart. That's the best-case scenario for how BD could perform on average against the competition. It doesn't mean it will really happen, but if it does I would be more than happy, and then the only thing I need is a Trinity 13'' notebook :D.

Manicdan BD in SuperPI will be the clear loser, that's for sure :up:

who uses x87?
08-15-2011, 01:35 AM
TESKATLIPOKA

demonkevy666 I meant it as a joke;)
08-15-2011, 05:18 AM
Warwian

According to the Swedish site Sweclockers there are rumors circulating in the Swedish sales channel that Bulldozer deliveries will begin in the last week of September (somewhere around the 26th to 30th of September).

If true it looks like we might see a September release after all, which is good news!

Link:
http://www.sweclockers.com/nyhet/143...an-i-september

Translated:
http://translate.google.se/translate...an-i-september
08-15-2011, 08:50 AM
RussC

AMD said in the conference call that "BD will ship this qtr for revenue". Well, that's all the way out to Oct. This Sept. 19 date seems to be the next bogey. It'll be real telling then.

edit: once the CPUs start shipping to OEMs, to me that's when the real "performance leaks" will start in earnest.

RussC
08-15-2011, 10:45 AM
Formula350

Quote:

Originally Posted by Warwian

According to the Swedish site Sweclockers there are rumors circulating in the Swedish sales channel that Bulldozer deliveries will begin in the last week of September (somewhere around the 26th to 30th of September).

If true it looks like we might see a September release after all, which is good news!

Quote:

Originally Posted by RussC

AMD said in the conference call that "BD will ship this qtr for revenue". Well, that's all the way out to Oct. This Sept. 19 date seems to be the next bogey. It'll be real telling then.

I've been gone all of July (left the 8th) but what I gleaned upon coming back (Aug 5th, was w/o internet at the cabin) is Sep 22nd, and first week or so in Oct for being in retail. Before I left it was an early/mid August slated release :(

I think this date will stick though, finally :\
08-15-2011, 06:25 PM
rintamarotta

I don't have a 990X here, so I don't know the performance compared to the 990X.
08-16-2011, 01:13 AM
flyck

Quote:

Originally Posted by freeloader

You're the one who said AMD is going to have a $300 dollar price tag on BD, not me. If you had a product that was faster than your competitors, you would definitely charge more for it. We will see in two months who is right and who was wrong. A lot of AMD peeps around here are going to end up with egg on their faces, including myself. I already set up a CHV system with a Phenom II 955 (waiting for BD to arrive when I built it), but now I feel I made a poor decision. :(

The rumoured quad-core 3.6GHz-3.9GHz SB-E is priced at $294. If BD is close to that in speed, then that is a great feat.
08-16-2011, 04:09 AM
memmem

Quote:

Originally Posted by rintamarotta

I don't have a 990X here, so I don't know the performance compared to the 990X.

Do you have a 2600k?
How does it compare?
08-16-2011, 06:45 AM
tbone8ty

just curious iFANyBOdY knows what the NB freq will be on Bulldozer?

is it still the same 2000mhz HT link and 2000mhz NB freq? will the NB have an unlocked Multi?
08-16-2011, 07:20 AM
xVeinx

HT 3.1 should be at 3 Ghz (9xx series chipset), along with 2.6 Ghz NB frequency I believe. No idea on the multis though...
08-16-2011, 09:06 AM
Mechanical Man

Quote:

Originally Posted by xVeinx

HT 3.1 should be at 3 Ghz (9xx series chipset), along with 2.6 Ghz NB frequency I believe. No idea on the multis though...

With PII you can't have a lower clock on HT than on NB, so I doubt that.
08-16-2011, 09:24 AM
Smartidiot89

Quote:

Originally Posted by RussC

AMD said in the conference call that "BD will ship this qtr for revenue". Well, that's all the way out to Oct. This Sept. 19 date seems to be the next bogey. It'll be real telling then.

edit: once the CPUs start shipping to OEMs, to me that's when the real "performance leaks" will start in earnest.

RussC

AMD said more than that: Bulldozer would ship to their partners during August.

Quote:

Originally Posted by Mechanical Man

With PII you can't have a lower clock on HT than on NB, so I doubt that.

Yes, the Opterons can have that so I don't see why it would be a hard limitation for AMD's brand spanking new architecture.
08-16-2011, 09:37 AM
charged3800z24

Quote:

Originally Posted by Mechanical Man

With PII you can't have a lower clock on HT than on NB, so I doubt that.

I think you meant you can't have a lower clock on NB than on HT.
IIRC, some Opterons have a higher HT Link than NB.
Believe it was the 4000 series with 2.2ghz HT and 2.0ghz NB
08-16-2011, 11:04 AM
Mechanical Man

Quote:

Originally Posted by Smartidiot89

AMD said more than that: Bulldozer would ship to their partners during August.

Yes, the Opterons can have that so I don't see why it would be a hard limitation for AMD's brand spanking new architecture.

Did not know that. In that case it must not be a limiting factor.
08-16-2011, 11:41 AM
undone

Quote:

The things i can tell you.

1. Bulldozer is here mid-Sept
2. Bulldozer will be very price competitive vs Sandy Bridge
3. Bulldozer is awesome - ha lol.
4. Every part is unlocked
5. It can OC very well!
6. AMD will be doing something special with LN2 very soon - 1-2 weeks.

http://forums.aria.co.uk/showthread.php?t=75187

http://forums.aria.co.uk/showpost.ph...7&postcount=11

It can just confirm it's worth waiting.
08-16-2011, 11:45 AM
andos

I'm seriously thinking about freezing myself, then unfreeze when it's launched..
08-16-2011, 11:52 AM
Manicdan

that was a really bad idea for cartman with the Wii

i'm betting you should invest your money in LN2, cause i'm thinking demand will go up and so will prices, and you might turn a big enough profit to buy a second BD :up:
08-16-2011, 12:08 PM
andos

Quote:

Originally Posted by Manicdan

that was a really bad idea for cartman with the Wii

i'm betting you should invest your money in LN2, cause i'm thinking demand will go up and so will prices, and you might turn a big enough profit to buy a second BD :up:

Yeah, I was thinking it wouldn't be too bad waking up in about 200~300 years, then I could be some old, ancient chip wise man. By that time, they will probably have invented CPUs (or probably just APUs) which are basically modified atoms (not Intel Atoms, the small atoms we're kind of made of). Then there will hopefully be some kind of chip museum or something, where they will hopefully have some kind of AMD section, and BD hopefully not their last chip. :)
08-16-2011, 01:37 PM
rintamarotta

Quote:

Originally Posted by memmem

Do you have a 2600k?
How does it compare?

Don't have a mobo :(, my Gigabyte broke while overclocking :(.
08-16-2011, 03:06 PM
memmem

Quote:

Originally Posted by rintamarotta

Don't have a mobo :(, my Gigabyte broke while overclocking :(.

Sorry about that...

Don't want you to break an NDA, but don't you know the numbers to compare?
08-16-2011, 08:40 PM
rintamarotta

Zambezi numbers yes i know, i5 2500k or i7 2600k numbers no, ran only graphics benchmarks.
08-16-2011, 11:40 PM
imamage

Where is the quote from :confused:
08-17-2011, 02:05 AM
Olivon

Quote:

Originally Posted by rintamarotta

Second is Zambezi performance beating SB 2600K as well as i7 990x with ease.

Quote:

Originally Posted by rintamarotta

It is a bit weaker than expected a few months ago, however I really think that Socket FM2 Bulldozer will show the true performance of this architecture.

AMD always scales better with cold due to its SOI.

Quote:

Originally Posted by rintamarotta

I don't have a 990X here, so I don't know the performance compared to the 990X.

Quote:

Originally Posted by rintamarotta

Zambezi numbers yes i know, i5 2500k or i7 2600k numbers no, ran only graphics benchmarks.

http://tof.canardpc.com/view/0b1c7ad...14904816b8.jpg
08-17-2011, 02:23 AM
TESKATLIPOKA

Olivon I agree with you. If he really had BD I would be surprised.
08-17-2011, 03:41 AM
FlanK3r

I don't know... what I do know is that we will see the real truth soon.
08-17-2011, 04:35 AM
Olivon

Quote:

Originally Posted by TESKATLIPOKA

Olivon I agree with you. If he really had BD I would be surprised.

I smell an HFR private joke ... :D

Anyway, I never said that Rintamarotta has no BD, if you look back onto this thread it really seems he got one :

Quote:

Originally Posted by rintamarotta

I'll leave the screens to Zambezi release but 4.8ghz 1.47volts, Noctua NH-D14 3-fan push-pull/push-pull config.

Quote:

Originally Posted by rintamarotta

It's a little bit cooler than a 1090T OC'd to 4.1GHz with 1.47 volts, will not say more.
08-17-2011, 04:39 AM
TESKATLIPOKA

FlanK3r probably something new will be presented this Friday at the Hot Chips conference. Of course no benches, but maybe some new info about the architecture.

Olivon maybe I am wrong and he really has BD samples, but that's something only he knows.
His post about how he can't compare BD to SB or Gulftown, despite the fact that so many reviews with different tests are out, didn't add to his credibility, but maybe I am just misunderstanding.
08-17-2011, 05:44 AM
ilkkahy

Quote:

Originally Posted by rintamarotta

Zambezi numbers yes i know, i5 2500k or i7 2600k numbers no, ran only graphics benchmarks.

Is it retail or ES Bulldozer then?
08-17-2011, 05:52 AM
memmem

Quote:

Originally Posted by TESKATLIPOKA

FlanK3r probably something new will be presented this Friday at the Hot Chips conference. Of course no benches, but maybe some new info about the architecture.

Olivon maybe I am wrong and he really has BD samples, but that's something only he knows.
His post about how he can't compare BD to SB or Gulftown, despite the fact that so many reviews with different tests are out, didn't add to his credibility, but maybe I am just misunderstanding.

I hope that, at Hot Chips, they can explain the performance degradations described in the Software Optimization Guide:

-----------------------

Note: For best performance, do not mix streaming instructions on a cache line with non-streaming store instructions.
The following performance caveats apply when using streaming stores on AMD Family 15h cores.

•When writing out a single stream of data sequentially, performance of AMD Family 15h processors is comparable to previous generations of AMD processors.

•When writing out two streams of data, AMD Family 15h version 1 processors can be up to three times slower than previous-generation AMD processors. AMD Family 15h version 2 processor performance is approximately 1.5 times slower than previous AMD processors.

•When writing out four non-temporal streams, AMD Family 15h version 1 can be up to three times slower than previous AMD processors. AMD Family 15h version 2 processor performance is comparable to previous AMD processors.

•Using non-temporal stores but not writing out an entire cacheline may cause performance to be up to six times slower than previous AMD processors.

------------------------

It seems that this problem will still be there (to a lesser degree) on BDv2.

How will this affect BD performance on benches and "real world"?
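
One rough way to frame that question: the factors quoted above only apply to the part of the run time that is actually spent in streaming stores. A minimal Python sketch, assuming a simple Amdahl-style model; the 10% store-bound share below is a made-up example value, not a measurement, and the function name is hypothetical:

```python
# Worst-case slowdown factors for streaming stores, taken from the guide
# text quoted above (relative to previous-generation AMD processors).
WORST_CASE_FACTOR = {
    ("v1", 1): 1.0, ("v2", 1): 1.0,   # single stream: comparable
    ("v1", 2): 3.0, ("v2", 2): 1.5,   # two streams
    ("v1", 4): 3.0, ("v2", 4): 1.0,   # four non-temporal streams
}


def overall_slowdown(store_fraction, factor):
    """Whole-program slowdown if only store_fraction of the run time slows down.

    store_fraction is an ASSUMED share of run time spent in streaming stores.
    """
    if not 0.0 <= store_fraction <= 1.0:
        raise ValueError("store_fraction must be between 0 and 1")
    return (1.0 - store_fraction) + store_fraction * factor


# Hypothetical workload: 10% of its time in two-stream streaming stores.
for version in ("v1", "v2"):
    factor = WORST_CASE_FACTOR[(version, 2)]
    print(version, round(overall_slowdown(0.10, factor), 3))
```

With those assumed inputs the whole program ends up about 1.2x slower on version 1 and about 1.05x slower on version 2, so code that rarely streams data out should barely notice.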
08-17-2011, 06:19 AM
drfedja

I hope these are rare situations where prefetchnt instructions are critical. Maybe it could be critical when the CPU works with high data parallelism and SIMD instructions. However, it is critical when we have data stream conflicts. When we use one single stream, writing out data is comparable to 10h.
The lower performance of the prefetchnt instructions is probably caused by the WT data cache policy. Because L1D is WT, every write to the cache causes a synchronous write to the backing store.
To avoid a performance drop, AMD designers included a WCC (Write Coalescing Cache) for WT stores from both integer cores.
In general, the PREFETCHNTA instruction hints to the processor to fetch the data non-temporally (i.e. this data is not going to be used again, or is used only once), e.g. when you're copying data from one location to another you can use this instruction. The PREFETCHTn instructions hint to the processor that the data is needed repeatedly, e.g. when you're doing calculations on the same data.
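
The write-through point can be illustrated with a toy model. This Python sketch is not how AMD's WCC is built; it assumes 64-byte lines and a merge buffer that holds just one line (both assumptions for illustration), and simply counts how many writes reach the backing store:

```python
LINE_BYTES = 64  # assumed cache-line size


def backing_store_writes(addresses, coalesce):
    """Count writes that reach the backing store for a sequence of stores."""
    if not coalesce:
        # Plain write-through: every store is forwarded on its own.
        return len(addresses)
    writes = 0
    open_line = None  # the one line currently being merged (toy assumption)
    for addr in addresses:
        line = addr // LINE_BYTES
        if line != open_line:
            if open_line is not None:
                writes += 1  # flush the previously merged line
            open_line = line
    if open_line is not None:
        writes += 1  # flush whatever is still buffered at the end
    return writes


one_stream = list(range(0, 256, 8))  # 32 sequential 8-byte stores
# Two streams far apart in memory, with their stores interleaved.
two_streams = [base + off for off in range(0, 128, 8) for base in (0, 1 << 20)]

print(backing_store_writes(one_stream, coalesce=False))
print(backing_store_writes(one_stream, coalesce=True))
print(backing_store_writes(two_streams, coalesce=True))
```

In this model one sequential stream merges 32 stores into 4 line writes, while two interleaved streams defeat a single-line buffer completely. A real design can hold more than one line, so treat this only as intuition for why stream conflicts hurt.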
08-17-2011, 09:05 AM
memmem

Quote:

Originally Posted by drfedja

I hope these are rare situations where prefetchnt instructions are critical. Maybe it could be critical when the CPU works with high data parallelism and SIMD instructions. However, it is critical when we have data stream conflicts. When we use one single stream, writing out data is comparable to 10h.
The lower performance of the prefetchnt instructions is probably caused by the WT data cache policy. Because L1D is WT, every write to the cache causes a synchronous write to the backing store.
To avoid a performance drop, AMD designers included a WCC (Write Coalescing Cache) for WT stores from both integer cores.
In general, the PREFETCHNTA instruction hints to the processor to fetch the data non-temporally (i.e. this data is not going to be used again, or is used only once), e.g. when you're copying data from one location to another you can use this instruction. The PREFETCHTn instructions hint to the processor that the data is needed repeatedly, e.g. when you're doing calculations on the same data.

Thanks drfedja,

Do you think playing games on bulldozer will be affected by this?
08-17-2011, 10:05 AM
FlanK3r

memmem: maybe by "version 2" they mean revision number 2...
08-17-2011, 11:47 AM
memmem

Quote:

Originally Posted by FlanK3r

memmem: maybe by "version 2" they mean revision number 2...

But FlanK3r, then there would be no need to mention it in the Guide.
08-17-2011, 12:58 PM
Baam

Quote:

Originally Posted by rintamarotta

Zambezi numbers yes i know, i5 2500k or i7 2600k numbers no, ran only graphics benchmarks.

Would you say the Bulldozer is worth waiting for?
08-17-2011, 01:00 PM
halofanman

Quote:

Originally Posted by andos

Yeah, I was thinking it wouldn't be too bad waking up in about 200~300 years, then I could be some old, ancient chip wise man. By that time, they will probably have invented CPUs (or probably just APUs) which are basically modified atoms (not Intel Atoms, the small atoms we're kind of made of). Then there will hopefully be some kind of chip museum or something, where they will hopefully have some kind of AMD section, and BD hopefully not their last chip. :)

its more likely you would wake up in a borderlands / fallout period and you'd be lucky to get your hands on an Am386 :p:
08-17-2011, 02:43 PM
rintamarotta

Quote:

Originally Posted by TESKATLIPOKA

Olivon maybe I am wrong and he really has BD samples, but that's something only he knows.
His post about how he can't compare BD to SB or Gulftown, despite the fact that so many reviews with different tests are out, didn't add to his credibility, but maybe I am just misunderstanding.

Yes, there is a misunderstanding: I do not look at other people's results, or if I look I don't take notes of them.

I cannot compare SB or Gulftown to BD since I don't have a working X58 or Z/P67 board at the moment. I have 980X, 2500K and 2600K CPUs though, and the 980X is pretty close to the 990X.
I lost a lot of hardware in a thunderstorm this summer. The servers and my main PC are behind a UPS unit, but the other PCs and benching hardware weren't.
I didn't lose the BD and 990FX since they weren't plugged in at that moment.

Lost $6000 worth of hardware and I'm not going to risk that again, now everything is behind UPS units.

Quote:

Originally Posted by Baam

Would you say the Bulldozer is worth waiting for?

I'd say yes, need to see its price first though.
08-17-2011, 03:20 PM
drfedja

Quote:

Originally Posted by memmem

Thanks drfedja,

Do you think playing games on bulldozer will be affected by this?

I hope not. Maybe there could be exceptions, but in general I think that BD will be much faster than Thuban. If you write your own memory copy routine to avoid the standard C/C++ library, you can use these NT (non-temporal) instructions.
A typical use for non-temporal stores is copying memory regions that are too large to fit in the cache. Using ordinary stores for that would waste memory bandwidth by unnecessarily fetching all the data in the destination region into the cache before overwriting it. Any useful data that is in the cache before the copy would also be replaced with data from the source and destination regions.
Another typical use is initialization of data structures too large to fit in the cache, for example, setting a large array to all zeros.

A major drawback of non-temporal stores is that they are fairly complex to work with. If they are improperly used they can easily cause performance degradations, or even hard-to-debug bugs in the case of multithreaded programs. The main reason to think that BD performance will not be sacrificed because of streaming conflicts is that those conflicts come from improper usage of NT store instructions.
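
The bandwidth argument for large copies can be put into numbers. A back-of-the-envelope Python sketch, assuming the copy is far larger than the cache so nothing gets reused; the function name is made up for illustration:

```python
def copy_traffic_bytes(size, non_temporal):
    """Estimate memory traffic for copying `size` bytes that do not fit in cache."""
    read_source = size
    write_destination = size
    # Ordinary stores first fetch each destination line into the cache
    # before overwriting it; non-temporal stores skip that fetch.
    fetch_destination = 0 if non_temporal else size
    return read_source + fetch_destination + write_destination


GIB = 1 << 30
ordinary = copy_traffic_bytes(GIB, non_temporal=False)
streaming = copy_traffic_bytes(GIB, non_temporal=True)
print(ordinary // GIB, streaming // GIB)  # traffic in GiB for a 1 GiB copy
```

Under that assumption a 1 GiB copy moves about 3 GiB with ordinary stores and about 2 GiB with non-temporal stores, a third less.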
08-18-2011, 07:31 AM
PatRaceTin

hmmmm i read next thread

FM2 socket ? very confused
08-18-2011, 07:41 AM
Evantaur

Quote:

Originally Posted by PatRaceTin

hmmmm i read next thread

FM2 socket ? very confused

well they're getting rid of AM3 socket
08-18-2011, 09:01 PM
Daveburt714

One of the most objective articles I've seen on FX performance...
I'm not sure how true it will turn out to be, but it does seem to make sense, and would
explain some of the wild swings we've seen from pre-release "benchmarks :rolleyes:"...

http://www.extremetech.com/computing...s-next-gen-cpu

Maybe it's just me, but these chips look like a blast to play with even if they can't crush BigBlue in all situations... ;)
08-18-2011, 10:19 PM
FlanK3r

next crazy Chinese fake?.... Now with a very impressive score; they say this is an FX 4120
Attachment 119104
http://diybbs.zol.com.cn/11/11_100430.html

I think it is not possible, 2 modules scoring higher than a 980x....
08-18-2011, 10:44 PM
informal

I don't think that is a score for a QC Zambezi, Flanker. It may be a score for an 8C one, it fits almost perfectly (my estimate for 8C was 19220pts :) ).
08-18-2011, 11:15 PM
haschioz

its the uber quadcore....
08-19-2011, 12:27 AM
FlanK3r

Informal: I think the same, logic tells me a similar score is only possible for 4 modules. Maybe Fritz reads modules as the number of cores?

But...what is interesting is this:
from RAWZ
08-19-2011, 01:13 AM
TheBreezyBB

That is some Super Exciting news from RawZ.
08-19-2011, 01:50 AM
Mats

Quote:

It'll be interesting to see how BD handles the well known Intel loving Super Pi benchmark. If AMD BD can outperform SB in that benchmark, then fooook me, that'll cause a massive storm.

:rofl::rofl::rofl::rofl:
Yeah Super Pi has the highest priority, not some toy benchmark like Cinebench or whatever..
08-19-2011, 02:47 AM
TESKATLIPOKA

AMD doesn't care about SuperPI.
08-19-2011, 10:05 AM
imamage

Quote:

Originally Posted by FlanK3r

Informal: I think the same, logic tells me a similar score is only possible for 4 modules. Maybe Fritz reads modules as the number of cores?

But...what is interesting is this:
from RAWZ

Quote:

clock for clock BD is approx the same as Sandy Bridge.

Sorry for my poor english , but from my understanding it sounds like Bulldozer will be FUN to overclock with
08-19-2011, 10:43 AM
Manicdan

the supposed LN2 AMD event is coming really soon, it would be nice to see if they can break 9ghz
08-19-2011, 10:57 AM
El Gappo

Quote:

Originally Posted by Manicdan

the supposed LN2 AMD event is coming really soon, it would be nice to see if they can break 9ghz

lol, talk about greedy :p

RawZ is just speculating and dreaming the same as you lot btw.
08-19-2011, 12:16 PM
TESKATLIPOKA

A HTT bug causing performance problems could have been the reason for the delay.
http://donthatethegeek.com/2011/08/1...in-production/

credit goes to yuri.cs from pctuning forum who I got it from
08-19-2011, 02:03 PM
Dirk Diggler

Quote:

Originally Posted by TESKATLIPOKA

A HTT bug causing performance problems could have been the reason for the delay.
http://donthatethegeek.com/2011/08/1...in-production/

credit goes to yuri.cs from pctuning forum who I got it from

Well, that would explain some of the horrible scores we've seen.
08-19-2011, 02:12 PM
TESKATLIPOKA

Dirk Diggler I don't know, the scores I saw were also in multi-threaded applications and didn't seem low. Maybe they will be that much better in B2, depending on how much of a perf. penalty this bug causes.
08-19-2011, 03:15 PM
Olivon

Comp_Nou has changed the mobo with M5A97 EVO one. Results are better than the first try with AMD760 :

http://tof.canardpc.com/preview2/041...17774c817d.jpg

The guy is waiting for better BIOS

http://www.chiphell.com/thread-250461-1-1.html
08-19-2011, 03:31 PM
undone

Quote:

Originally Posted by Olivon

Comp_Nou has changed the mobo with M5A97 EVO one. Results are better than the first try with AMD760 :

AMD760? OMG Are they serious?
08-19-2011, 03:35 PM
TESKATLIPOKA

undone yeah, they are serious Chinese amateurs.:shakes: Nothing to be surprised about.
08-19-2011, 03:35 PM
CrazyNutz

Quote:

Originally Posted by informal

Pi stresses the x87 logic inside the chip. That is not representative of single thread performance.

That is a very incorrect statement. PI will stress a single FPU unit on a single core, within a single thread. This is an exceptional way to compare FPU execution speed between AMD, and Intel.
08-19-2011, 03:44 PM
informal

Quote:

Originally Posted by CrazyNutz

That is a very incorrect statement. PI will stress a single FPU unit on a single core, within a single thread. This is an exceptional way to compare FPU execution speed between AMD, and Intel.

Super pi uses ancient x87 instructions which have been deprecated and obsolete since SSE2 launched. Super pi is in no way representative of the FPU power of a single core. It just shows how fast you can do x87 math. Not to mention all those BD ES results are borked, so move along.
BTW you quoted a post from January 2011,just FYI...
08-19-2011, 03:46 PM
TESKATLIPOKA

CrazyNutz no, he is right, it doesn't represent single threaded performance

Quote:

Super PI utilizes the x87 instruction set. These instructions date all the way back to the 8087 math coprocessor. While they were important for 80386, 80486, and Pentium they became obsolete when 3DNow! and Streaming SIMD Extensions were released.

AMD doesn't care about this old instruction set which is no longer used in modern applications.
08-19-2011, 03:46 PM
muziqaz

Quote:

Originally Posted by CrazyNutz

That is a very incorrect statement. PI will stress a single FPU unit on a single core, within a single thread. This is an exceptional way to compare FPU execution speed between AMD, and Intel.

You have to be kidding, right?
I will correct informal here: Super Pi is not representative of any performance. The only thing it does is show how quickly (or rather, how slowly) this app calculates Pi digits. That is all it does.
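
For a sense of what "calculating Pi digits" involves: Super PI is usually described as using the Gauss-Legendre algorithm. The Python sketch below is not Super PI's code, just a small single-threaded version of that algorithm built on the standard decimal module; the iteration count and the ten guard digits are deliberately generous assumptions.

```python
from decimal import Decimal, localcontext


def pi_digits(digits):
    """Return pi as a string with `digits` decimals, via Gauss-Legendre."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    with localcontext() as ctx:
        ctx.prec = digits + 10  # guard digits to absorb rounding error
        a = Decimal(1)
        b = Decimal(1) / Decimal(2).sqrt()
        t = Decimal(1) / Decimal(4)
        p = Decimal(1)
        # Each pass roughly doubles the number of correct digits.
        for _ in range(digits.bit_length() + 1):
            a_next = (a + b) / 2
            b = (a * b).sqrt()
            t -= p * (a - a_next) ** 2
            a = a_next
            p *= 2
        pi = (a + b) ** 2 / (4 * t)
    # Keep "3." plus the requested number of decimals (truncated, not rounded).
    return str(pi)[: digits + 2]


print(pi_digits(30))
```

Timing a loop like this says how fast one core runs this particular mix of arithmetic, and nothing more general than that.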
08-19-2011, 03:48 PM
TESKATLIPOKA

informal

Quote:

BTW you quoted a post from January 2011,just FYI...

really, what a CrazyNutz guy:rofl:.
08-19-2011, 03:53 PM
muziqaz

Quote:

Originally Posted by TESKATLIPOKA

informal

really, what a CrazyNutz guy:rofl:.

:D Great way to get yourself into discussion :D
08-19-2011, 05:22 PM
CrazyNutz

Oops my mistake quoting something as far back as 01/11. I'll own up here.

However one thing you guys must understand is "A Large Percentage of software is NOT optimized with SIMD", and some software cannot make effective use of SIMD. It takes manual optimizations to make real use of SIMD. If you were to disassemble most of the software you use on a daily basis you would know what I'm talking about. You would see a whole lot of non-SIMD instructions in use, i.e. legacy float and integer instructions.

Some of you seem to think the addition of new instructions make the original x86/87 instructions obsolete, well that is not always the case especially with vector instructions. Fellow programmers will know what I'm saying here.

Edit: added this

Quote:

Originally Posted by informal

Super pi uses ancient x87 instructions which have been deprecated and obsolete since SSE2 launched.

They are not obsolete. They are used far more than you realize. That's like saying x86 is obsolete.

Quote:

Originally Posted by informal

Not to mention all those BD ES results are borked so move along.
.

Don't get defensive, I'm not bashing BD at all :)
08-19-2011, 05:32 PM
informal

Since the AMD64 instruction set launched along with 64-bit Windows, x87 IS obsolete. In a 64-bit OS x87 is not used (like literally, it's replaced by SSE, which duplicates its capabilities). There is no discussion there.
08-19-2011, 07:40 PM
CrazyNutz

Quote:

Originally Posted by informal

Since the AMD64 instruction set launched along with 64-bit Windows, x87 IS obsolete. In a 64-bit OS x87 is not used (like literally, it's replaced by SSE, which duplicates its capabilities). There is no discussion there.

It does NOT fully duplicate its capabilities, this is a misconception: x87 has higher precision, 80-bit vs. 64-bit for SSE2. Also, compilers try to optimize code to use faster instructions like MMX/SSE etc., however most of the time they fail to produce an executable that uses these instructions. Where SIMD (SSE2/MMX/AVX etc.) shines is when you have a string of data that needs the same operation performed on it. It really speeds things up, however again compilers are rarely smart enough to do these optimizations on their own, so most of the time these instructions are hand written and included as inline assembly. I write software this way, and so do my colleagues, but only when it's necessary; otherwise we allow the compiler to spit out x86/x87 instructions.

EDIT: Looking further into this, when compiling for 64-bit you are correct that SSE replaces x87 instructions by default with most recent compilers (i.e. gcc defaults to -mfpmath=sse for 64-bit).
However, there are still a lot of 32-bit games and programs in use, and when compiled for 32-bit most compilers default to x87 instructions for floating point math.
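
The precision difference mentioned above can be made concrete. The 80-bit x87 format carries a 64-bit significand, while a 64-bit double (what scalar SSE2 math uses) carries 53 bits. This Python sketch does not execute x87 or SSE2 code; it only emulates rounding to a given significand width with exact fractions, and the helper name is made up for illustration:

```python
from fractions import Fraction


def round_to_significand(x, bits):
    """Round a positive Fraction to the nearest value with `bits` significand bits."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    scale = Fraction(2) ** (bits - 1 - e)
    # round() on a Fraction rounds half to even, like IEEE 754 default rounding.
    return Fraction(round(x * scale)) / scale


x = 1 + Fraction(1, 2 ** 60)               # 1 plus a very small increment
as_double = round_to_significand(x, 53)    # double precision (SSE2 scalar)
as_extended = round_to_significand(x, 64)  # x87 extended precision

print(as_double == 1)      # the increment is rounded away
print(as_extended == x)    # the increment survives
print(float(x) == 1.0)     # a real Python float (a double) agrees
```

Whether those extra 11 bits matter depends entirely on the program; for most benchmark code they do not change the result that gets reported.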
08-20-2011, 12:19 AM
TESKATLIPOKA

What about Hot chips conference? Some new info about BD architecture would be great.
08-20-2011, 01:27 AM
drfedja

Quote:

Originally Posted by informal

Since the AMD64 instruction set launched along with 64-bit Windows, x87 IS obsolete. In a 64-bit OS x87 is not used (like literally, it's replaced by SSE, which duplicates its capabilities). There is no discussion there.

It is used, if you are running 32-bit code.
08-20-2011, 01:53 AM
TESKATLIPOKA

drfedja Even if you are right, which program except SuperPI or HyperPI still uses the x87 instruction set?
08-20-2011, 02:41 AM
chew*

Quote:

Originally Posted by FlanK3r

Informal: I think the same, logic tells me a similar score is only possible for 4 modules. Maybe Fritz reads modules as the number of cores?

But...what is interesting is this:
from RAWZ

This guy has no cpu's and has not tested any clearly. I would take anything he says with a grain of salt.

It's merely speculation.
08-20-2011, 02:43 AM
drfedja

Quote:

Originally Posted by TESKATLIPOKA

drfedja Even if you are right, which program except SuperPI or HyperPI still uses the x87 instruction set?

Not necessarily. If you had a hypothetical 64-bit version of SPi or HyperPi, it would normally use only SIMD or scalar SSE instructions. 32-bit programs can use both x87 and/or SSEx.

64-bit programs run in long mode. On a 64-bit operating system 32-bit programs also run in long mode, but in its compatibility sub-mode, where all legacy instructions are allowed. As you know, 32-bit programs run unchanged on a 64-bit system, without recompiling. ;)

Quote:

Originally Posted by TESKATLIPOKA

drfedja which program except SuperPI or HyperPI still uses the x87 instruction set?

Every one that was compiled as 32-bit code with x87.
08-20-2011, 02:58 AM
TESKATLIPOKA

drfedja, actually I was asking which other programs used for testing CPUs still use x87.
I am asking because this is the only test where the SB 2500 is almost 2x faster than the X4 980.
08-20-2011, 03:08 AM
rintamarotta

I don't think any other testing program uses x87 except the Pi benchmarks.
At least not when 64-bit versions of the other benchmarks are used.
08-20-2011, 04:52 AM
drfedja

Quote:

Originally Posted by TESKATLIPOKA

drfedja, actually I was asking which other programs used for testing CPUs still use x87.
I am asking because this is the only test where the SB 2500 is almost 2x faster than the X4 980.

For example, the 32-bit version of Cinebench may use x87, as may some game engines, the nVidia physics engine on the CPU (Physics87 :d ), some old codecs, and FP-intensive applications that don't use SSE, for example Euler3D. I don't know them all, but if you want to know which apps use x87 you can use AMD CodeAnalyst or a similar code profiling tool, e.g. Intel VTune.
No modern app uses x87 any more. That's for sure. ;) But SPi isn't a modern app. It is a single-threaded, 32-bit program coded with legacy x87 FP. :P
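
If you don't have a profiler at hand, a rough static check is to count FP mnemonics in a disassembly listing. The sketch below assumes tab-separated text in the shape that `objdump -d` prints (address, bytes, instruction). The mnemonic lists are short, incomplete heuristics picked for this example, and `SAMPLE` is a hand-written listing, not output from any real program.

```python
from collections import Counter

# Heuristic, incomplete lists (assumption); a real profiler decodes properly.
X87_PREFIXES = ("fld", "fst", "fild", "fist", "fadd", "fsub",
                "fmul", "fdiv", "fsqrt", "fxch", "fcom")
SSE_STEMS = ("add", "sub", "mul", "div", "sqrt", "mov", "cvt")
SSE_SUFFIXES = ("ss", "sd", "ps", "pd")

def classify(mnemonic: str) -> str:
    """Label a mnemonic as 'x87', 'sse' or 'other' by its spelling alone."""
    m = mnemonic.lower()
    if m.startswith(X87_PREFIXES):
        return "x87"
    # Caveat: in Intel syntax the string instruction movsd would also match.
    if m.startswith(SSE_STEMS) and m.endswith(SSE_SUFFIXES):
        return "sse"
    return "other"

def count_fp_instructions(listing: str) -> Counter:
    """Count instruction classes in an objdump-style listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split("\t")
        if len(fields) < 3 or not fields[2].strip():
            continue  # labels, blank lines, byte-only continuation lines
        counts[classify(fields[2].split()[0])] += 1
    return counts

SAMPLE = (
    "  401000:\tdd 45 f8\tfldl   -0x8(%ebp)\n"
    "  401003:\tdc 4d f0\tfmull  -0x10(%ebp)\n"
    "  401006:\tdd 5d e8\tfstpl  -0x18(%ebp)\n"
    "  401009:\tf2 0f 59 c1\tmulsd  %xmm1,%xmm0\n"
    "  40100d:\tc3\tret\n"
)
print(count_fp_instructions(SAMPLE))
```

A static count only shows which instructions are present in the binary, not which ones dominate the run time, so a profiler is still the better tool for that question.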

Quote:

Originally Posted by rintamarotta

I don't think any other testing program uses x87 except the Pi benchmarks.
At least not when 64-bit versions of the other benchmarks are used.

Of course, because 64-bit compiled apps normally don't use the legacy x87 instructions. x87 is still available in 64-bit mode, but 64-bit compilers generate SSE2 for floating point by default.
08-20-2011, 06:58 AM
TESKATLIPOKA

drfedja Thanks, that's what I wanted to know. That means SuperPI is not fit to be representative for single threaded performance in modern applications.
08-20-2011, 03:50 PM
drfedja

In general, a floating-point-intensive application is roughly 55% FP code and 45% integer code, while an integer app is all integer code, with more memory operations (mov, push, pop, call, ret) and fewer logical and arithmetic ones.

Quote:

That means SuperPI is not fit to be representative for single threaded performance in modern applications.

No, it isn't, because that is like benchmarking a modern CPU with a 16-bit DOS program. It is pointless.
If you want to see how a CPU performs at calculating Pi, there are much better programs that calculate Pi digits a hundred times faster than Super Pi. For example, Wolfram Mathematica calculates 1M digits of Pi in a fraction of a second on an old dual-core Opteron 170 (S939).
What can Super Pi tell you about a new CPU? Almost nothing, except how it performs on cache and stack memory operations. SPi can't tell you how fast a CPU handles legacy x87 FP operations, because SPi is mixed code and is limited by cache misses and memory reordering.
The only real application for SuperPi is in overclocking competitions.
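
To illustrate that Pi digits depend far more on the algorithm than on x87 speed, here is a minimal sketch that computes them with integer arithmetic only, using Machin's formula. This is neither Super Pi's method nor Mathematica's; Machin's formula is swapped in simply because it fits in a few lines. The function names and the 10 guard digits are choices made for this example.

```python
def arctan_inv(x: int, unity: int) -> int:
    """arctan(1/x) scaled by `unity`, from the alternating Taylor series."""
    total = term = unity // x
    x2 = x * x
    n = 1
    while term:
        term //= x2
        n += 2
        # Series: 1/x - 1/(3x^3) + 1/(5x^5) - ...
        total += -(term // n) if (n // 2) % 2 else term // n
    return total

def pi_digits(digits: int) -> str:
    """Pi truncated to `digits` decimal places, as a string."""
    guard = 10  # extra digits to absorb truncation error
    unity = 10 ** (digits + guard)
    # Machin: pi/4 = 4*arctan(1/5) - arctan(1/239)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    s = str(pi_scaled // 10 ** guard)
    return s[0] + "." + s[1:]

print(pi_digits(50))
# 3.14159265358979323846264338327950288419716939937510
```

No floating-point instruction of any kind is involved here, x87 or SSE; the work is all big-integer division.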
08-20-2011, 04:48 PM
zir_blazer

Talking about raw hardware evolution: as GPGPU potential is tapped for floating-point performance, you will be less required to use legacy instruction sets and hardware like the x87 FPU (which was supposedly a pain to program for in assembler). x87 is already obsolete if you're running in long mode, something that has been available for 8 years, so die size and power are wasted on a lot of legacy components that will make less sense as time advances but are kept around just for legacy compatibility.
Bulldozer shows a bit of this type of evolution because it boasts much more integer potential compared to its floating-point capabilities. However, it's not the first time we've seen something like this. Don't you guys remember that the Pentium 4's x87 FPU was quite weak and Intel was pushing applications to use SSE/SSE2 instead? AMD's timing on this one seems a better bet, though, and we still don't know how much legacy performance is sacrificed. If GPGPU takes over FPU tasks, that is when Fusion will really kick in, as you will have powerful GPU resources in place ready to take over.
Also, I'm very interested in seeing how future designs evolve. You may not need dedicated hardware for x87 or other obsolete instruction sets beyond the hardware decoder stage. After all, while the multiple instruction sets and extensions are compatible across a ton of x86 processor architectures, each one works very differently internally once instructions are converted to micro-ops. As Fusion evolves you may see a very powerful FPU/GPU that is fed decoded x87, SSE and GPU micro-ops.
08-21-2011, 02:01 AM
drfedja

Quote:

Originally Posted by zir_blazer

Talking about raw hardware evolution: as GPGPU potential is tapped for floating-point performance, you will be less required to use legacy instruction sets and hardware like the x87 FPU (which was supposedly a pain to program for in assembler). x87 is already obsolete if you're running in long mode, something that has been available for 8 years, so die size and power are wasted on a lot of legacy components that will make less sense as time advances but are kept around just for legacy compatibility.

I agree, but x87 FPU performance is still relatively important today. I don't think AMD has sacrificed x87 FP, because it uses the same hardware resources as SIMD.
FMAC 0 and FMAC 1 can handle every FMUL or FADD instruction, including x87. There is one difference from 10h: the BD FPU can handle two FMUL or two FADD instructions at the same time. For a thread that issues only one kind of operation, that can bring some performance. However, that is only a side effect of building a 4-way FMAC FPU.

Quote:

Bulldozer shows a bit of this type of evolution because it boasts much more integer potential compared to its floating-point capabilities. However, it's not the first time we've seen something like this. Don't you guys remember that the Pentium 4's x87 FPU was quite weak and Intel was pushing applications to use SSE/SSE2 instead? AMD's timing on this one seems a better bet, though, and we still don't know how much legacy performance is sacrificed. If GPGPU takes over FPU tasks, that is when Fusion will really kick in, as you will have powerful GPU resources in place ready to take over.

The P4's x87 FPU was weak because of the P4 microarchitecture. The P4 front end is quite narrow: there is only one decoder and no L1 instruction cache, only a trace cache. The trace cache isn't that bad an idea, but with such a front end it is pointless. Second, the P4 has only two FP pipes: one for FP move and store, and a second for FP ADD/MUL covering SIMD and x87, including MMX and integer SSE; this unit is 128 bits wide. Because of that, 128-bit SSE2 throughput isn't that bad on the P4, but it is limited to 2 DP or 4 SP FP ops/cycle. FADD or FMUL on NetBurst has the same throughput as FADD or FMUL on Nehalem or 10h, and double the throughput of K8, because a 128-bit SSEx instruction on K8 is executed as two macro-ops (double dispatch). K8 has two 64-bit SIMD units and 10h has two 128-bit SIMD units; they were widened so that 128-bit SIMD instructions retire in one cycle. 10h has double the FP throughput per core.
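
The arithmetic behind those figures, as a tiny sketch. It only restates the widths mentioned above (128-bit registers, 64-bit and 128-bit units); `lanes` and `passes` are illustrative helper names, and nothing here models real pipeline behaviour.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """How many elements of a given size fit in one SIMD register."""
    return register_bits // element_bits

def passes(op_bits: int, unit_bits: int) -> int:
    """How many trips through a unit of `unit_bits` an `op_bits` operation needs."""
    return -(-op_bits // unit_bits)  # ceiling division

print(lanes(128, 64), lanes(128, 32))       # 2 DP or 4 SP elements per 128-bit op
print(passes(128, 64), passes(128, 128))    # 2 passes on a 64-bit unit, 1 on a 128-bit unit
```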

Today's GPUs can handle a variety of tasks very well, but not all. The next generation of AMD GPUs will be more CPU-like. With OpenCL, FP-intensive tasks can be handled by the CPU and GPU simultaneously, and that is the beauty of using OCL. OpenCL code is highly optimised for SIMD usage on CPUs.
The future is Fusion! ;) Next generations of AMD GPUs could execute CPU code, especially SIMD (SSE, AVX, FMA), through some kind of software layer like the Java VM or Flash. That software switch is named AMD IL (intermediate language) or something like that. It makes a lot of sense for next-gen APUs, which have an address space shared with the integrated GPU (this is an APU-only feature, and the IGP in next-gen APUs will use x86-64 address pointers).

Quote:

Also, I'm very interested in seeing how future designs evolve. You may not need dedicated hardware for x87 or other obsolete instruction sets beyond the hardware decoder stage. After all, while the multiple instruction sets and extensions are compatible across a ton of x86 processor architectures, each one works very differently internally once instructions are converted to micro-ops. As Fusion evolves you may see a very powerful FPU/GPU that is fed decoded x87, SSE and GPU micro-ops.

There is no dedicated hardware, because the same hardware is used for SIMD and x87. In AMD's microarchitecture, x86 instructions are decoded into MOPs (macro-operations). A MOP is a pair of ALU/MEM operations. In later pipeline stages the ALU and MEM operations are dispatched to the execution units as micro-operations once they reach the reservation stations or schedulers. There is one difference between 10h and BD. 10h, like all previous designs since K7, has a dedicated integer memory scheduler, and macro-ops have lanes; you cannot easily switch a macro-op to another lane. BD has a unified integer scheduler: every macro-op and micro-op in the scheduler can execute on any free execution unit once its operands have been loaded from the data cache into the register file.
How do you think GPU instructions could be decoded and scheduled in the CPU? That isn't possible. I've explained here what AMD wants to do.
CPU instructions are decoded inside the CPU and GPU instructions are executed inside the GPU, but the GPU can execute all SIMD. Because of that, AMD can make a software context switch between CPU and GPU so that all instructions can be handled by all the hardware.
Here is the picture:
Attachment 119179
Every x86 instruction proceeds directly to the CPU, and every CPU SIMD instruction either proceeds to the CPU or is translated for the GPU. That is the concept: the CPU handles serial and parallel workloads, and the GPU works on massively parallel data.

Attachment 119180
08-21-2011, 06:49 AM
radaja

i have been out of it for a couple of weeks,any thing new on BD?
08-21-2011, 02:33 PM
undone

Quote:

Originally Posted by radaja

i have been out of it for a couple of weeks,any thing new on BD?

Someone in Asia recently got an ES (B2 rev.) and tested it, and found that some mobo and BIOS limitations cause Zambezi to work improperly. It seems that not only the immature chip but also compatibility problems restrict the performance.
08-21-2011, 02:45 PM
Particle

Quote:

Originally Posted by radaja

i have been out of it for a couple of weeks,any thing new on BD?

Just a minor thing: AMD has shown a Zambezi based system at Hot Chips 2011 running Dirt 3.
08-21-2011, 04:18 PM
Olivon

http://tof.canardpc.com/view/0ef0ddf...62769287c1.jpg

Quote:

315mm^2 according to AMD.

http://semiaccurate.com/forums/showp...&postcount=257

Edit :

AMD presents details of the bulldozers Archtitektur at Hot Chips 23 - Technic3D

http://tof.canardpc.com/view/9ef886d...bea958a146.jpg

http://tof.canardpc.com/view/9cbde70...8e8f3ea5fc.jpg

Quote:

Performance cannot be judged from this information, however; only future tests will provide more insight. Based on the known rates it can be roughly estimated, though, that the Zambezi processors cannot compete with the top models of Intel's upcoming Sandy Bridge EX line. According to rumors, the per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market. The "near future" that AMD has announced will therefore show whether Bulldozer surprises or even disappoints.
08-21-2011, 04:40 PM
undone

Quote:

Originally Posted by Olivon

http://semiaccurate.com/forums/showp...&postcount=258

Edit :

AMD presents details of the bulldozers Archtitektur at Hot Chips 23 - Technic3D

http://tof.canardpc.com/view/9ef886d...bea958a146.jpg

http://tof.canardpc.com/view/9cbde70...8e8f3ea5fc.jpg

Nice slides, but they repeat what was stated last year. Still no hint about real performance.
08-21-2011, 10:03 PM
mongoled

It's not the slide he is pushing but the quote below it!

Quote:

Performance cannot be judged from this information, however; only future tests will provide more insight. Based on the known rates it can be roughly estimated, though, that the Zambezi processors cannot compete with the top models of Intel's upcoming Sandy Bridge EX line. According to rumors, the per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market. The "near future" that AMD has announced will therefore show whether Bulldozer surprises or even disappoints.

He is a Dr Who lover..............................
08-21-2011, 11:08 PM
TESKATLIPOKA

Quote:

According to rumors, the per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market.

This part is BS. Single-threaded performance at the same frequency can't be lower than K10.5 or BD could never be faster in multi than SB2600 even if HT gives much smaller performance gains than a dedicated core or module.
08-21-2011, 11:51 PM
drfedja

Quote:

Originally Posted by TESKATLIPOKA

This part is BS. Single-threaded performance at the same frequency can't be lower than K10.5 or BD could never be faster in multi than SB2600 even if HT gives much smaller performance gains than a dedicated core or module.

That makes no sense, because AMD has no reason to make a chip that defeats its own concept. Shared resources in a module aren't much of a problem for a variety of workloads. Because of that, BD must have slightly stronger per-thread, per-core IPC and much stronger per-module IPC. Otherwise the whole BD concept is nonsense and AMD R&D are complete fools, because they could simply "upgrade" 10h to eight cores with Turbo Core 2 logic. That hypothetical X8 could outpace SB in multithreading.
08-22-2011, 12:18 AM
FlanK3r

More slides from Hot Chips:

http://img18.imageshack.us/img18/9189/a5464745s.th.jpg http://img198.imageshack.us/img198/7...464735s.th.jpg

http://img577.imageshack.us/img577/4...464736s.th.jpg http://img844.imageshack.us/img844/5...464737s.th.jpg

http://img820.imageshack.us/img820/701/a5464738s.th.jpg http://img40.imageshack.us/img40/7336/a5464739s.th.jpg

http://img828.imageshack.us/img828/5...464740s.th.jpg http://img812.imageshack.us/img812/6...464741s.th.jpg

http://img688.imageshack.us/img688/8...464742s.th.jpg http://img847.imageshack.us/img847/5...464743s.th.jpg http://img836.imageshack.us/img836/5...464744s.th.jpg
08-22-2011, 01:00 AM
freeloader

I won't say, "I told you so" until official release...
08-22-2011, 01:24 AM
undone

http://h5.abload.de/img/52484_4bbk8w.jpg

http://h5.abload.de/img/52484_1b28jp.jpg
08-22-2011, 01:43 AM
muziqaz

did they have a security guard near the cage? AMD is showing it as some priceless relic :(

on the other hand that case looks really nice. Anyone know who made it and what the name is? :)
08-22-2011, 01:51 AM
Olivon

http://www.youtube.com/watch?v=IO7NcbcUjEM
