Uhh, sorry, didn't notice that #3571 was also from you; I thought Olivon was referring to #3566. :D
Ah yes, that was then too much speculation. The big step will probably come with BDv3 then.
Edit:
However, what catches the eye is that they had just two BD versions on that early slide: only BD and BD Next generation. It seems to me that the current BDv2 is more like a BDv1.5, according to that early roadmap.
To make the numbers even, they probably renumbered the next gen BD to BDv3 and the "enhanced BD" was numbered BDv2.
Could be too, but the correspondence of 3 codenames, 3 numbers and 3 names (BD / enh. BD / BD NextGen) is too striking to overlook ;)
Quote:
I remembered this slide, but thought the Enhanced one is bdver1.1, or something like that...
Correct, I used it too, but now I've become lazy *g* and everybody should be able to understand BDv1,2,3 in the same way as BDver1,2,3 ;-)
Quote:
(I'm using the bdverX format, instead of BDvX, because the former is used in some include files for a certain compiler.)
Probably. So far nothing is known about that one yet. My guess now is that it will be used in the 28nm shrink chips, but I am not sure. Dresdenboy already caught the code name very early, but in the meantime Intel released their AVX2 specification. I guess AMD would like to add this for BDv3, if they can. Therefore, maybe they changed plans. Furthermore, maybe BDv3 was originally planned for 22nm. Maybe it will be more like a BDv2.5 then ;-)
Quote:
Well, they are not entirely the same... Indeed bdver3(?), that one seems to be much more than some little enhancements.
The first source I could find with Google is from September '10:
http://journal.mycom.co.jp/articles/...cat/index.html
Intel just released AVX2 this June... I wonder if AMD could adopt it that quickly, or if it is already too late for Steamroller.
@rog:
Ok thanks, then 6.4 ;-)
Would score 7 then already without speed-bump.
Do you guys even read what I wrote? In floating-point-heavy code that employs all 8 threads, Turbo will almost never engage. Turbo will engage across all 8 integer cores, but Cinebench will use the FlexFP coprocessors most of the time, where TDP will be maxed out. You can read all about BD execution-unit power draw and clock characteristics in the AMD blog posts following the ISSCC event.
There's now almost a week remaining before release day. Is anyone still looking forward to the mid-October announcement?
I can't believe people can be so easily deceived by FUD nowadays :D;)
I started a thread in a local hardware forum and posted some FUD crap about Bulldozer failing to beat FX
Many "Big blue" fanboys just jumped in and said AMD failed; some replies said they still have faith that FX will perform quite well (based on the same link I provided)
Anyway, I heard a rumor that the NDA for "something" will end on Oct 12, 2011. I guess MAYBE it's Bulldozer. I still don't see any "early" retail parts out there yet, but I will keep an eye out for one for sure ;)
Some interesting talk about optimizing x264 for Bulldozer
http://www.planet3dnow.de/vbulletin/...&postcount=562
http://www.planet3dnow.de/vbulletin/...&postcount=585
Looks like FMA4 & XOP help more than AVX on Bulldozer. Seems we'll get the most revolutionary change since MMX.
Quote:
2011-09-16 23:42:16 < Dark_Shikari> Oh YI, we know now why AVX is useless on bulldozer
2011-09-16 23:42:20 < Dark_Shikari> *FYI
2011-09-16 23:42:22 < Dark_Shikari> Move elimination
2011-09-16 23:42:29 < Dark_Shikari> Their OOE engine eliminates moves and resolves them before ALU stage
2011-09-16 23:42:34 < Dark_Shikari> So moves are free, so AVX doesn't help
2011-09-16 23:42:39 < Dark_Shikari> Except reducing code size ofc
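To illustrate the argument in the quote above, here is a toy count of how many instructions need an execution slot with and without move elimination. This is only a sketch: the `executed_ops` helper and the two tiny instruction lists are made up for illustration and do not model Bulldozer's real pipeline.

```python
# Toy model (hypothetical, for illustration only): count the instructions
# that actually need an execution slot.

def executed_ops(program, move_elimination):
    """Return how many instructions in `program` reach an execution unit."""
    count = 0
    for op, *_operands in program:
        if op == "mov" and move_elimination:
            # The move is resolved at rename time: the destination simply
            # aliases the source register, so nothing is executed.
            continue
        count += 1
    return count

# c = a + b while keeping a intact, in destructive 2-operand (SSE-style) form:
sse_style = [("mov", "c", "a"), ("add", "c", "b")]
# The same computation in non-destructive 3-operand (AVX-style) form:
avx_style = [("add", "c", "a", "b")]

print(executed_ops(sse_style, move_elimination=False))  # 2
print(executed_ops(avx_style, move_elimination=False))  # 1
print(executed_ops(sse_style, move_elimination=True))   # 1
```

With moves eliminated, both forms execute the same number of operations in this model, which is why the 3-operand encoding would mostly save code size rather than cycles.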
Quote:
2011-09-23 18:56:03 < Dark_Shikari> Okay, so I have a massive series of bulldozer profiles ready
2011-09-23 18:56:13 < Dark_Shikari> It has instruction-based sampling and all sorts of awesome stuff
2011-09-23 18:56:43 < JEEB> AMD? Awesome stuff? This sounds like something that doesn't happen very often
2011-09-23 18:57:21 < Gramner> any NDA?
2011-09-23 18:59:53 < Dark_Shikari> Technically yeah
2011-09-23 19:00:08 < Dark_Shikari> Though a lot of the stuff isn't bulldozer-specific, its performance counters are just awesome
2011-09-23 19:00:32 < Dark_Shikari> Unsurprisingly, our load/store queue is full in pixel_avg functions.
2011-09-23 19:01:25 < Dark_Shikari> Er, load queue.
2011-09-23 19:01:36 < Dark_Shikari> Our store queue, on the other hand, fills in plane_copy, mc_copy...
2011-09-23 19:01:38 < Dark_Shikari> slicetype_mb_cost?
2011-09-23 19:02:12 < Dark_Shikari> cache_load and cache_save, guess that's obvious
2011-09-23 19:02:33 < Dark_Shikari> analyse_init, naturally
2011-09-23 19:02:50 < Dark_Shikari> Okay, time for INEFFECTIVE_SW_PREFETCHES
2011-09-23 19:03:05 < Dark_Shikari> Oh, this is awesome. It tells you when a prefetch is useless, i.e. the data was already in L1 cache
2011-09-23 19:03:12 < Dark_Shikari> Almost all of the "useless prefetches", pengvado, are in hpel_filter
2011-09-23 19:03:21 < Dark_Shikari> The rest are in cache_load
2011-09-23 19:03:23 < Dark_Shikari> Guess that's expected.
2011-09-23 19:04:02 < Dark_Shikari> Next: DECODER_EMPTY.
2011-09-23 19:04:17 < Dark_Shikari> I... think this is where the instruction decoder... hmm. Is this where the decoder is too fast, or too slow?
2011-09-23 19:04:43 < Dark_Shikari> Okay, it's where the decoder is too slow (there's nothing to dispatch)
(...)
2011-09-23 21:47:40 < Dark_Shikari> Thank you performance counters, I think I just made CABAC RD way faster
2011-09-23 21:48:37 < LordRPI> nice
2011-09-23 21:49:22 < Dark_Shikari> 50% of the branch mispredictions in cabac were on one line of code
2011-09-23 21:49:26 < Dark_Shikari> a restructure of the function, kabam
Quote:
2011-09-27 00:55:51 < Dark_Shikari> pengvado: oh oops, vpermilps and pd are 5-operand (!!!!!)
2011-09-27 00:55:57 < Dark_Shikari> dst,src1,src2,selector,imm8
2011-09-27 00:56:25 < Dark_Shikari> I mean seriously wtf
2011-09-27 01:04:02 < Dark_Shikari> Also, they apparently dropped 3DNOW
Quote:
2011-09-28 01:33:41 < Dark_Shikari> AVX mbtree propagate is slower than sse2
2011-09-28 01:33:49 < Dark_Shikari> FMA only barely manages to get it fast again.
2011-09-28 01:33:49 < kemuri-_9> lol
2011-09-28 01:33:52 < Sean_McG> hahah
2011-09-28 01:33:59 < Dark_Shikari> SSE2: 342 cycles
2011-09-28 01:34:00 < Dark_Shikari> AVX: 374
2011-09-28 01:34:05 < Dark_Shikari> FMA4: 340
2011-09-28 01:34:18 < kemuri-_9> lol
2011-09-28 01:34:26 < Dark_Shikari> I guess this makes sense given that it only has 128-bit execution units
2011-09-28 01:34:34 < Dark_Shikari> and the INT16_TO_FLOAT code is obnoxiously slow because avx sucks
2011-09-28 01:34:41 < Dark_Shikari> i.e. avx has no way of doing int16_t -> float fast
2011-09-28 01:35:18 < Dark_Shikari> Hmm. I wonder if FMA4 supports sse registers?
2011-09-28 01:35:37 < Dark_Shikari> Oh. It *does*...
2011-09-28 01:35:38 < Dark_Shikari> Let me try that.
2011-09-28 01:37:45 * codestr0m ears perk up
2011-09-28 01:49:29 < Dark_Shikari> FMA4: 314 cycles. Much better
2011-09-28 01:49:46 < codestr0m> Dark_Shikari: what was the change?
2011-09-28 02:01:21 < Dark_Shikari> using the sse instead of avx version
2011-09-28 02:01:26 < Dark_Shikari> as the basis for xop
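For reference, the cycle counts quoted above can be put relative to the SSE2 version with a few lines of Python. The numbers come straight from the log; the labels "AVX-based" and "SSE-based" are my reading of the last two FMA4 results, and `relative_speed` is just a helper name chosen for this sketch.

```python
# Cycle counts quoted in the log above for mbtree propagate (lower is better).
cycles = {
    "SSE2": 342,
    "AVX": 374,
    "FMA4 (AVX-based)": 340,
    "FMA4 (SSE-based)": 314,
}

def relative_speed(baseline, other):
    """Speed of `other` relative to `baseline`; above 1.0 means faster."""
    return baseline / other

baseline = cycles["SSE2"]
for name, count in cycles.items():
    ratio = relative_speed(baseline, count)
    print(f"{name:18s} {count:4d} cycles  {ratio:.3f}x vs SSE2")
```

So the AVX version runs at roughly 0.91x the SSE2 speed, while the SSE-based FMA4 version ends up about 9% faster than SSE2.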
Quote:
2011-10-01 02:09:51 < Dark_Shikari> xop will make this a lot easier, but I'm trying to do ssse3 first
Quote:
2011-10-04 04:46:38 < Dark_Shikari> C, with mode analysis shortcuts: 253 cycles
2011-10-04 04:46:45 < Dark_Shikari> My crappy, badly optimized XOP asm: 93 cycles
2011-10-04 04:46:56 < Dark_Shikari> This is kinda awesome
2011-10-04 04:49:35 < Dark_Shikari> Oh, and old without shortcuts: 379 cycles
2011-10-04 04:49:45 < Dark_Shikari> My asm is 4 times faster than the existing... wait where have we seen this before? XD
2011-10-04 04:49:57 < Dark_Shikari> It's just like SAD_4x4_x9 all over again!
2011-10-04 04:50:10 < JEEB>
2011-10-04 04:50:18 < JEEB> that sounds pretty awesome
2011-10-04 04:50:21 < Dark_Shikari> Except this time I'm still wondering how best to do it without vpperm
2011-10-04 04:50:33 < Dark_Shikari> Thanks AMD, for bringing back the best instruction ever after 15+ years of hiatus.
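For anyone wondering what makes vpperm so handy: it is a byte permute that can pick each output byte from either of two 128-bit sources. Below is a simplified Python model of that idea. It only uses the low 5 bits of each selector byte as an index into the 32-byte pool; the real instruction also uses the upper selector bits for optional per-byte operations, which are left out here. The function name `byte_permute` and the pool ordering (first source in the low 16 positions) are assumptions of this sketch, not a statement about the actual operand order.

```python
def byte_permute(src_a, src_b, selector):
    """Pick 16 output bytes from the 32-byte pool src_a + src_b.

    Simplified model: only the low 5 bits of each selector byte are used.
    The pool ordering (src_a first) is an assumption of this sketch.
    """
    if not (len(src_a) == len(src_b) == len(selector) == 16):
        raise ValueError("all operands must be 16 bytes long")
    pool = bytes(src_a) + bytes(src_b)
    return bytes(pool[s & 0x1F] for s in selector)

a = bytes(range(0x00, 0x10))
b = bytes(range(0xA0, 0xB0))

# Selector that interleaves the low 8 bytes of a with the low 8 bytes of b.
interleave = bytes(i // 2 + (16 if i % 2 else 0) for i in range(16))

out = byte_permute(a, b, interleave)
print(out.hex())  # 00a001a102a203a304a405a506a607a7
```

Any fixed shuffle across two registers becomes a single table-driven operation this way, which is presumably why doing the same thing without it takes more thought.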
Would it be possible for AMD to launch Bulldozer and Radeon HD7000 together? :rolleyes:
Not in this story... BD launch is set for 12 October.
So will the 125W TDP FX-8120 be better than the 95W version because of more overclocking headroom?
At stock speeds, the 125W will probably be the better buy (if you're not concerned with power usage) because the higher TDP ceiling means that more cores can be activated to the turbo speed. The 95W version will likely not have the same headroom, and may only let one or two cores scale to the turbo speed before they all throttle down.
As for overclocking, that's yet to be determined. As all FX series will have an unlocked multiplier, unless the higher end models are much higher binned, it would seem almost foolish to me to spend the extra on an 8150 vs an 8100. But, we'll see when they come out if the 8150 has higher overclocking headroom than the 8100 and 8120.
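The stock-speed argument above can be sketched as a simple power budget. All wattages below are hypothetical placeholders picked only to show the mechanism (they are not AMD figures), and `cores_at_turbo` is a made-up helper, not how the real power management works.

```python
def cores_at_turbo(tdp_w, n_cores, base_w_per_core, turbo_extra_w_per_core, uncore_w):
    """Return how many cores fit their turbo power bump inside the TDP budget."""
    headroom = tdp_w - uncore_w - n_cores * base_w_per_core
    if headroom <= 0:
        return 0
    return min(n_cores, headroom // turbo_extra_w_per_core)

# Hypothetical numbers: 8 cores at 10 W each at base clock, 5 W extra per
# core at turbo, 10 W for everything else on the chip.
for tdp in (125, 95):
    n = cores_at_turbo(tdp, n_cores=8, base_w_per_core=10,
                       turbo_extra_w_per_core=5, uncore_w=10)
    print(f"{tdp} W TDP: {n} core(s) at turbo")
```

With these made-up inputs the 125W part can turbo 7 cores and the 95W part only 1, which is the kind of gap the post is speculating about.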
Two chips of the same model should perform nearly identically. I don't believe it will really be set at 125W anyway; I think it will be ~100W, and the 95W version will have very similar headroom, with an extremely minor bump in efficiency that lets it reach that headroom.
I read it, but do you have a BD? So why should I believe you?
Cinebench 11.5 scores are rather bad; even CB10 is doing better. Hence I do not believe that the FPU is maxed out at all, especially as neither FMAC nor XOP/"MMX" code (the other 2 pipes in the FPU) is used. Thus I think there is enough headroom for the 3.9 GHz Turbo stage. Anyway, we'll know in less than 1 week ;-)
Then you have to wait a few weeks longer, because there are only 8120 and 8150 models at launch ;-)
However, there are 2 versions of the 8120, 125W and 95W, but from experience with the Phenom IIs, I would assume that the 95W part will go directly into the OEM market, to Dell, HP, etc.
It took me less than 2 mins to find these in retail... and at one retailer!
http://www.newegg.com/Product/Produc...82E16819103856
http://www.newegg.com/Product/Produc...82E16819103809
http://www.newegg.com/Product/Produc...82E16819103921
Well, I should have been more specific. Yes, these are all 95W parts, but there is no 125W version of the same model. I was referring to the 1055T. As with the FX-8120, there are 2 versions of it, a 95W and a 125W:
http://products.amd.com/en-us/Deskto...?id=641&id=652
However, the 95W part only appeared recently in the etail market (2-3 weeks back here in Europe), probably because OEMs are filling their stocks with Bulldozers now.
Also note that there is no "shop now" link for the 95W part in the above link.
Well yes, the parts are indeed different. However, your point was that it would be shipped to OEMs and retail wouldn't see any (if at all). Unless it is an OEM special, like the 960T, where AMD clarified as much, I don't have a reason to believe that the 95W chip won't be seen in retail. If you look at it, the Phenom II 945 is merely a locked version of the 940 at a lower TDP. This being FX, I don't think AMD would muck about.
hmmm:
FX-6100 $189.99
http://www.tigerdirect.com/applicati...893&CatId=7246
FX-8120? $219.99
http://www.tigerdirect.com/applicati...896&CatId=7246
FX-8150? $259.99
http://www.tigerdirect.com/applicati...899&CatId=7246