You do realise that benchmark is highly unfair?
The SB system has 16GB of RAM, the BD system only 4GB.
The SB system has a 120GB SSD, the BD system only a 500GB HDD... Anomalies like this usually make me question the author's intent...
Hyperthreading seems to work well in BF3, and that indicates that there are a lot of cache misses. When a CPU core waits on memory it can jump to another thread. Cache misses also indicate that a lot of memory and/or complicated memory access patterns are used. Higher frequency is good, but avoiding trips to RAM for data is also important.
If BF3 had dedicated threads for different tasks, like one render thread, one AI thread etc., then hyperthreading would not be good.
When work is sliced into smaller jobs it is also important that threads are able to synchronize fast. I don't know if there are improvements there on Bulldozer.
F.E.A.R vs BF3 beta
http://gamegpu.ru/Action-/-FPS-/-TPS...-test-GPU.html
http://gamegpu.ru/plugins/content/ju...5wbmd8QlJ8MzA=
http://gamegpu.ru/Action-/-FPS-/-TPS...-test-GPU.html
http://gamegpu.ru/plugins/content/ju...BuZ3xCUnwzMA==
Yer looks like it - I forgot to mention I used Cat 11.x (older ones) for the first bit of testing and the newer 11.10 BETA drivers for the AA testing - Them 11.10 drivers are well crap - First I got a black screen (could just see team members lol!) - Then I started to get loads of glitching. At least with the older drivers I was completely stable!!
Probably not a fair test - but from what I'm reading it helps to have 1.5GB+ cards when running this game. Hey ho - Guess I'm still happy I can run it maxed out (without AA of course lol)
If anyone out there is testing - don't forget to restart the game, as video setting changes DEFINITELY don't take effect until then.
Yes, I have to agree: after running some quick tests on my system, it seems that I was wrong and BF3 does prefer more cores over more frequency. I drew too quick a conclusion from the max CPU usage pattern.
I tested with different CPU affinity settings and different CPU clocks. The most noticeable difference was that with only two cores @ 3.6GHz there was noticeable lag, and a lot of it, but with four cores @ 1.8GHz it ran very smoothly.
It was not in any way scientific; I only used Fraps, Task Manager and AOD. No charts were made. Btw, it ran at 38+ fps with 6 cores @ 1.8GHz.
GPU used is HD6870 @ stock. Settings: All high, no-aa, aniso 16x, HBAO, 1680*1050, cat 11.10pre.
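A rough way to read the affinity test above is to compare the summed clock of each configuration. The sketch below only uses the core counts and clocks reported in the post; the "aggregate GHz" figure is a deliberately crude model that ignores IPC, memory latency and turbo.

```python
# Core count and per-core clock (GHz) for the three configurations in the post.
configs = {
    "2 cores @ 3.6 GHz": (2, 3.6),
    "4 cores @ 1.8 GHz": (4, 1.8),
    "6 cores @ 1.8 GHz": (6, 1.8),
}


def aggregate_ghz(cores, ghz):
    """Sum of the per-core clocks: a crude upper bound on total cycle budget."""
    return cores * ghz


# Round to one decimal so floating-point noise does not obscure the comparison.
budgets = {name: round(aggregate_ghz(c, f), 1) for name, (c, f) in configs.items()}

for name, total in budgets.items():
    print(f"{name}: {total} GHz aggregate")
```

The two-core and four-core setups come out with the same aggregate budget, yet only the two-core run lagged, which fits the poster's conclusion that the game cares more about the number of cores than about raw clock.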
Interesting! BF3 almost acts like a server application. Frequency isn't always that important; more cores and fast memory access are sometimes better.
I know that many gamers buy the i5-2500K because they want to be prepared for future games. But if this is the future then that CPU will not be as future-proof. It has less L3 cache (6MB) and is only 12-way set associative. With games using a lot of memory and/or complicated memory access patterns, the i5-2500K's cache is thrashed in no time.
Quote: Originally Posted by gosh
Hyperthreading seems to work well in BF3, and that indicates that there are a lot of cache misses. When a CPU core waits on memory it can jump to another thread. Cache misses also indicate that a lot of memory and/or complicated memory access patterns are used. Higher frequency is good, but avoiding trips to RAM for data is also important.
If BF3 had dedicated threads for different tasks, like one render thread, one AI thread etc., then hyperthreading would not be good.
When work is sliced into smaller jobs it is also important that threads are able to synchronize fast. I don't know if there are improvements there on Bulldozer.
Question??????? the I5-2500k is a 4 core 4 thread processor???? If so and gamers bought it to play games, then they are not going to be happy as they don't have the HT to help them out.
You're speaking nonsense here. The 2500K's L3 cache is 2-3X faster than Phenom's and Thuban's.
Not to mention the IMC is also much, much faster.
So whether it is 6 or 8MB doesn't count.
How come BD, with a total of 16MB of cache, doesn't beat the 2600K overall?
If it did, then the price wouldn't be $245.
The most stupid thing I ever heard... Until now we don't even have games that use more than 4 cores or threads.
Quote:
I5-2500k is a 4 core 4 thread processor???? If so and gamers bought it to play games, then they are not going to be happy as they don't have the HT to help them out.
Gamers should take care to buy a strong VGA card.
We'll see about that, if BF3 can make use of more.
And testing at 640*480 is also the biggest nonsense I ever heard.
thanks for the reality check. if the 2600k blew past bulldozer in math benchmarks by that much, I don't see what place bulldozer would have in servers.
also, even in the misleading benchmarks where bulldozer loses everything, the one thing it wins is floating point math. if bulldozer wins at floating point by such a large margin, that alone may make up for its IPC weaknesses WHEN PLAYING GAMES!
Look at this: http://semiaccurate.com/forums/showp...postcount=2208
If this is true, I was _very_ close with my math skillz =)
radaja, did you post the below over here yet? I seriously laughed at this.
"well i can tell you it should be bright and sunny, based on what i have been told
from my contacts.....
my brother works at AMD and he said all is superb with BD
my sister works for Global foundries and she said 32nm is 100% perfect
my grandpa works for the Guiness Book of world records and he said BD is the fastest CPU in the world.
my uncle works for Cray and he said Interlagos is the most powerful CPU ever made.
my other sister works for intel and she said everyone is very scared right now at intel and are struggling
to come up with a solution to BD
my mom works for Cyberpower PC's and they are planning on dropping all intel based machines in favor
of AMD's BD for the next 5 years
but this is all just a bunch of BS on my part,nothing i just said is true"
AM3+ Socket?
No dates when these slides were made?
I notice not only the integer performance increase (like Phenom to Llano); more important is the FMA3 support, which is only planned for Haswell on the Intel side! :eek:
How should there be PCI-E 3.0 and USB 3.0 when they are still using the AM3+ platform with the 990FX chipset?
Technically it might be possible to use the A75 FCH instead of the old SB850/950 to gain USB 3.0 but it doesn't solve the problem of PCI-E 2.0 in the aging Northbridge.
If those slides are legit, they can't be that old, since 2nd Gen Bulldozer was supposed to use socket FM2 and the change to stick with AM3+ was made only recently.
If you pay more attention to recent slides, you can notice that many of them are without a date, or the part that stated the date has been cut off. Be wary of those suspicious slides because they might already be obsolete, like some ridiculous roadmap.
PD is in both Trinity and Komodo; Trinity doesn't even seem to have a higher frequency than Zambezi.
My point is, do you really believe the next gen would not support PCI-E 3.0 and USB 3.0? Piledriver was originally planned for the second half of 2012 if I remember correctly; also PCI-E 3.0 wouldn't be part of a chipset anymore since it's supposed to be integrated into the CPU.
Sure, that's funny.
Should be up-to-date, as the previous plan was to release an IGP-less Piledriver-based chip named Komodo on FM2. It was only recently replaced by this Piledriver on AM3+ one. (AFAIK it's called Vishera.)
Quote:
Edit: obviously not very up-to-date slide, there is no native PCI-E 3.0 or USB 3.0 either
Edit: alternatively, it was a plan B from way back...
So, "bdver2" is indeed Piledriver, not Steamroller, then... (?)
Dresdenboy was wrong here, it seems: http://citavia.blog.de/2010/10/21/si...llano-9726240/
1. You're speaking of Komodo, but it seems that's delayed or even canceled...
2. Piledriver will come first with Trinity 2012Q2 (on FM2).
3. Seems they are releasing a Piledriver-based chip for the AM3+ owners, as well, so that platform won't last a few months only...
But, to design a brand new chipset, as well? Not worth it.
4. Perhaps FM2 will support those feats.
that would bring it around the same area as IB, which is running on a smaller node.... so how doesn't that look good?
FX8170 = (8150 + 8%)
Piledriver = FX8170 * 1.1
IB = SB+20% due to clocks.
Looking at everything in a negative way is very easy, especially if you don't know the performance of the current BD..
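The three estimates in the post above can be chained to see what they imply. This is only a sketch of the poster's arithmetic: the multipliers are the speculative figures from the post, not measurements, and the variable names are made up for illustration.

```python
# Multipliers quoted in the post; everything is relative to the FX-8150 (= 1.0).
FX8170_OVER_FX8150 = 1.08      # "FX8170 = (8150 + 8%)"
PILEDRIVER_OVER_FX8170 = 1.10  # "Piledriver = FX8170 * 1.1"
IB_OVER_SB = 1.20              # "IB = SB+20% due to clocks"

# Compounded gain of Piledriver over the FX-8150.
piledriver_over_fx8150 = FX8170_OVER_FX8150 * PILEDRIVER_OVER_FX8170


def bd_over_sb_needed_for_parity():
    """FX-8150 / SB performance ratio at which Piledriver would equal IB."""
    return IB_OVER_SB / piledriver_over_fx8150


print(f"Piledriver vs FX-8150: +{(piledriver_over_fx8150 - 1) * 100:.1f}%")
print(f"FX-8150 must reach {bd_over_sb_needed_for_parity():.3f}x SB for parity")
```

Under these numbers Piledriver lands about 18.8% above the FX-8150, so it would only sit level with IB if the FX-8150 already roughly matches the SB part it is compared against, which is exactly the point being argued in the thread.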
and where will you place my oc'ed q6600?
will i see any benefit in games on min fps if i upgrade and will the price to upgrade be worth the extra fps?
Ok. Let's see October 12th then. You are forgetting that even if BD can compete with one SB chip it doesn't mean it could compete with the architecture. SB is a relaxed architecture with lots of headroom; if AMD offered resistance then SB would be sold at higher frequencies and LGA 2011 would have been hurried out, not handled the way it has been. BD doesn't seem to be that relaxed at all. And as it looks now, BD is far behind in single threaded performance; if it's between the 2500K and 2600K in a bunch of tests which are mostly multithreaded then it's not on the same level as I see it. You can talk multithreaded all you want but I don't work with or play Cinebench. I use Firefox and play games. And single thread performance is important there, and yes, games are multithreaded but only to a limit.
And, if the 10% performance increase with Piledriver is frequency based, then it's a sign that we won't see much more than that until the next update in 2013-2014. But if it's IPC they mean, then there's still room for frequency also. And raising performance with frequency is a bit more desperate, a last resort thing to do. You have to agree that it is nicer to see IPC improvements rather than frequency improvements. The end result might look the same but the latter usually has bigger drawbacks like heat, has much more limited possibilities and is a sign of a lack of innovation.
-Boris- ffs, do you even know what you are talking about?
BD might compete with an SB chip but not with its arch
SB is a relaxed arch and BD isn't. Can you even get more BS than this?
What does relaxed arch mean here?
He probably means SB has room to scale in frequency while BD probably wouldn't.
Given the process problems and the frequencies BD gets at stock and OCed, it seems like BD has a lot more frequency to gain than SB in the same power envelope though. (FX8150 -> FX8170 = +300MHz (300MHz turbo); 2600 -> 3820 = +200MHz (100MHz turbo) with an increased TDP.)
Feel like something got lost in translation here.
If the best BD is capable of competing with a single product like the 2500K, that's not the same thing as the whole BD architecture being able to compete with the whole SB architecture. Intel has put SB more or less on hold until now. How well will BD compete with LGA 2011? So, the fact that BD can compete with some SB chips does not mean that the architectures are comparable.
It means that Intel hasn't pushed SB at all. SB has lots of potential but there is no reason for Intel to hurry. With the 3960X Intel is pushing SB a bit more; can BD compete with that? No. So BD and SB aren't at the same level.
First off, the FX8150 is already at a higher TDP than the 2600K. Second, frequency isn't everything. LGA 2011 SB gets a lot more than frequency. Third, you still aren't comparing the best with the best. The 3960X is Intel's true high end, where they are pushing SB a bit.
08-12-2011 01:32 AM #1
Daveburt714
I killed my 1090T :{
1st time I've ever killed a CPU, and I've been OC'ing since the K6-2 400 ...
I was trying to push my memory the other night on the new Sabertooth 990FX.
Long story short, I was doing pretty well @ 245 HTRef , 3.9 cores, 2.94 IMC and 1960 7-9-7 ram (no crazy voltage either).
I had some problems getting over that, so instead of cranking v's higher I turned "IMC Current Capacity" up to 120% in bios.
After that, I had a No Post situation...
I tried everything else first, CMOS reset, change MEM, 1 stick, I even changed the VidCard but still had the red light under the SB (HDD Boot Device Error).
My first thought was the board had died, so I pulled the CPU and put it in the C4F....
Same no post situation. Fortunately, I still have a 965, so I put it in the main rig and everything is fine.
MORAL OF THE STORY: Be careful cranking the current capacity on the IMC with Thuban....
There's no doubt it could have just been my hardware, but I'll be real hesitant to crank it that high again.
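For anyone following the numbers in the post above, the quoted clocks all derive from the 245 MHz reference clock. The sketch below shows that relationship; note that the multipliers and the memory ratio are not stated in the post and are only inferred here because they reproduce the quoted 3.9 / 2.94 / 1960 figures.

```python
# Reference clock from the post; the multipliers below are ASSUMED, not quoted.
HT_REF_MHZ = 245
CPU_MULTIPLIER = 16  # assumed: 245 * 16 = 3920 MHz, matching "3.9 cores"
NB_MULTIPLIER = 12   # assumed: 245 * 12 = 2940 MHz, matching "2.94 IMC"
MEM_RATIO = 4        # assumed memory ratio; DDR doubles the effective rate

cpu_mhz = HT_REF_MHZ * CPU_MULTIPLIER   # core clock in MHz
imc_mhz = HT_REF_MHZ * NB_MULTIPLIER    # CPU-NB / IMC clock in MHz
ram_mts = HT_REF_MHZ * MEM_RATIO * 2    # effective DDR3 data rate in MT/s

print(f"CPU: {cpu_mhz} MHz, IMC: {imc_mhz} MHz, RAM: DDR3-{ram_mts}")
```

The point of the sketch is simply that raising the reference clock moves the cores, the IMC and the memory together, so pushing memory this way also pushes the IMC.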
PhenomII X6 1090T (1025 GPMW) | Asus SaberTooth 990FX | 2x2Gb Mushkin 1600 CL7 | Sapphire HD 6870 | OCZ 120 SSD / WD 1TB SATA3 | Corsair TX750 PSU
Watercooled ST 120.3 & TC 120.1 / MCP350 XSPC Top / Apogee XT | WIN7 64 Bit HP | Corsair 800D Obsidian Case
First Computer: Commodore Vic 20 (circa 1981).
Is not the IMC part of the Uncore???
uncore - technical definition
Not in the processing core. The designation by Intel for circuits on a chip that are not executing program instructions. For example, the memory controller in the Core i7 CPU is in the "uncore" area of the chip.
Does the AMD Phenom II follow this description also????
Thank You for any knowledge that you may part with in teaching me:up:
I think Boris should lay off the speculation pipe for the time being,he is wandering to some strange lands :). 9 more days and many things will be much clearer.
No, not at all. ;)
I've said I believe it to be IPC improvements. I've just tossed an if into the equation, then people started arguing that those 10% would be enough to put PD at the same performance as IB, and that BD already had the same performance as SB.
To summarize.
I think PD will be 10% faster than BD clock for clock.
I don't think BD is equal to SB.
I don't think PD will be equal to IB.
I think it would be bad IF these 10% were just frequency improvements.
And people are saying I'm wrong and speculating when presenting the last three points here. And I can't see why. :( If that's being negative then I don't want to be positive, cause it seems delusional.
Microcenter Monthly ad has prices expire the 16th of October, which is not normal; my guess is that a new ad will be released with bulldozer deals, et al. Looks like the 12th or there-abouts is the real deal...
Here's a link with some good info.... if you can get around the bad German>English translation.:welcome:
http://translate.googleusercontent.c...LwmU_kXexnggow
"10% more" can mean a lot of things
if the module as a whole got 10% more then it could be because a single thread got stronger, or the second thread is getting 90% instead of 70% gains thus turning a module into 190% instead of 170% which is ~10%.
or it could just be a few % IPC and a few % clock rates
i honestly hope single threaded perf got 10% better IPC, without any sacrifice in clock rates (and maybe like 5% more OC room too just cause the process is more mature)
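The scenarios in the post above can be put into numbers. This is a small sketch: the 70% and 90% second-thread figures come from the post, while the 5% IPC plus 5% clock split is a purely hypothetical stand-in for "a few % IPC and a few % clock rates".

```python
def module_gain(old_second_thread, new_second_thread):
    """Relative throughput gain of a two-thread module when the second
    thread's contribution changes (first thread counted as 1.0)."""
    return (1 + new_second_thread) / (1 + old_second_thread) - 1


# Second thread adds 90% instead of 70%: 190% vs 170% per module.
scaling_gain = module_gain(0.70, 0.90)

# Hypothetical split: 5% IPC and 5% clock, compounded.
mixed_gain = 1.05 * 1.05 - 1

print(f"Better module scaling: +{scaling_gain * 100:.1f}%")
print(f"5% IPC x 5% clock:     +{mixed_gain * 100:.2f}%")
```

Both routes end up in the neighbourhood of 10% (about 11.8% and 10.25% respectively), which is why a bare "10% more" on a slide does not say where the gain comes from.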
IPC gains, as high as 10%, could be seen in simply using an optimized code path in certain software environments.
Just sayin' ...
The FX-8120 is unlocked, right?
Yes, it has been stated that all FX chips are unlocked.
Dude, I turned up the "Current Capacity" NOT the voltage! IMC voltage was @ 1.278 which is well within acceptable range IMO.
It was probably just a fluke, but thanks for trying to make me look stupid.... :shakes:
And yes, I was refering to the "uncore" which is actually called CPU-NB or IMC (integrated memory controller) on AMD systems.
Sir,
I was not trying to make you look stupid. I'm sorry if it sounded or came out that way. If you look at my post and the numbers that I got, to me it seemed like I was getting poorer performance with each voltage increase. With my x4 955 it was the opposite, a better score with more voltage. :am: I've just started playing with my x6 and I'm trying to acquire as much knowledge as I can from more experienced members. I apologize for my many stupid questions, but as I have been told, "You can not learn if you don't ask". I may not be putting them in the correct format or with complete clarity for you MASTERS :worship: As I have said in several of my posts, "Thank You for any Knowledge that you are willing to part with...in teaching me" :help:
The way I understood the post was that you had allowed too much voltage to be sent to the IMC. From my limited knowledge, this is the CPU-NB setting in the BIOS. I read somewhere in a post that increasing this voltage can help improve performance by lowering the memory latency.
And again
Thank You for any Knowledge that you are willing to part with...in teaching me :hitself::censored:
I am quite sure AMD knows what IPC-gains awaits them with Piledriver, especially since they have shown Trinity up and running.
So even if "projections" is read as "we have no clue" (and I personally believe they do have a clue), the only thing they don't know for sure is probably what clock speeds it will get, as that is the final thing they settle with a new architecture. Since it's a refresh of Bulldozer I am quite confident it should clock the same or even better.
How could they know the stock clock rates (within a given TDP class) long before volume production? Also, the clock rate/power consumption ratio is to be raised even for a given die (new steppings/revs, better yields, etc.), and through the lifecycle of the uarch.
That could be said about BD, as well.
Perhaps they're referring to the Converged BMI Instructions. But, would they've written "x86 performance", then?
I agree, but sometimes the final performance isn't known this early. That said I do think they have more than a clue about IPC on Piledriver at this point, just that you can't be certain. I also believe that it's mostly clock speeds they don't know at this point. And I hope that there will be a bit larger headroom for Piledriver, even if Bulldozer seems to have some promising overclocking potential.
Only if AMD had marketed the octo core as an 8 threaded processor w/ 4 cores... all these benchmarks, speculations and so forth would make WAY more sense.
I knew from the initial announcement, with its release projected for April this year... as soon as they started detailing the uarch... I had a feeling it should be marketed as a quad core rather than a freakin 8 core.
Cuz, as an 8 core it looks weak...but you change perspective to a quad core w/ 8 threads and the thing becomes a monster.
bout sums it all up!
Damnit AMD market them as quads...not freakin octo's!
Personally I think that Piledriver slide is pure BS. There's no way AMD would talk about that to anybody outside of AMD, before BD has even been released. Not even to big partners like Cray, it would be utterly moronic.
Should be, they are talking about the core-architecture, i.e. not about the whole chip in that paragraph. I just wonder what "digital media workload" is. Cinebench? That would mean a score around 6.60. If they could also up the clock a bit, they could reach ~7.00. Sounds good, but the chip won't be here before ~H2/12. Intel will have IB until then and they could probably clock that thing crazy @22nm.
Apart from that the whole slide seems amateurish, even for an AMD slide.
Yes and that was already known for quite some time. AMD changed plans. The roadmap from Dresdenboy's blog is simply outdated. Remember the Bulldozer / enhanced BD and BD Next Generation names, from the Bulldozer introduction slides?
Attachment 120780
That's basically BDv1, BDv2 and BDv3, or if you want to use the fancy codenames: Bulldozer, Piledriver and Steamroller. Lots of names for the same .. marketing at its best.
Don't forget that these are just the names for the core architecture; the real chips have their own names, too. Trinity for the APU with Piledriver cores, etc. etc.
The blue text is Dresdenboy's guesses.
Sure that there are big changes, really? Piledriver is still a moderately enhanced Bulldozer, not an entirely new generation one.
I remembered this slide, but thought the Enhanced one is bdver1.1, or something like that...
Quote:
Remember the Bulldozer / enhanced BD and BD Next Generation names, from the Bulldozer introduction slides?
Attachment 120780
That's basically BDv1, BDv2 and BDv3, or if you want to use the fancy codenames: Bulldozer, Piledriver and Steamroller.
(I'm using the bdverX format, instead of BDvX, because the former is used in some include files for a certain compiler.)
Well, they are not entirely the same... Indeed bdver3(?), that one seems to be much more than some little enhancements.
Quote:
Lots of names for the same .. marketing at its best.
I won't forget, I promise! ;) (I know, of course.)
Quote:
Don't forget that these are just the names for the core architecture; the real chips have their own names, too. Trinity for the APU with Piledriver cores, etc. etc.
Rog seems to be very well informed :D
Rog? Wasn't it hyc who believed he himself is? There are official roadmaps and many slides out there...
And, knowing that Piledriver will be available also on AM3+ is actually a selling point for current AM3+ systems, as this way it won't be so shortlived.
F.ex. originally I was going for Zambezi, but recently thought better wait for FM2 and Trinity. Now I'm in a dilemma, again. :)
Turbo won't kick in under fully MT, FP-heavy workloads...
Uhh, sorry, didn't notice #3571 was also from you, thought Olivon was referring to #3566. :D
Ah yes, that was then too much speculation. The big step will probably come with BDv3 then.
Edit:
However, what is catching the eye, is the fact that they just had two BD versions on that early slide. Only BD and BD Next generation. Seems to me that the current BDv2 is more like a BDv1.5, according to that early roadmap.
To make the numbers even, they probably renumbered the next gen BD to BDv3 and the "enhanced BD" was numbered BDv2.
Could be too, but the equivalence of 3 codenames and 3 numbers and 3 BD/enh.BD/BD NextGen is too big to overlook ;)
Quote:
I remembered this slide, but thought the Enhanced one is bdver1.1, or something like that...
Correct, I used it too, but now I've become lazy *g* and everybody should be able to understand BDv1,2,3 in the same way as BDver1,2,3 ;-)
Quote:
(I'm using the bdverX format, instead of BDvX, because the former is used in some include files for a certain compiler.)
Probably. So far nothing is known about that one yet. My guess now is that it will be used in the 28nm shrink chips. However, I am not sure. Dresdenboy already caught the codename very early, but in the meantime Intel released their AVX2 specification. I guess AMD would like to add this for BDv3, if they can. Therefore, maybe they changed plans. Furthermore, maybe BDv3 was originally planned for 22nm. Maybe it will be more like a BDv2.5 then ;-)
Quote:
Well, they are not entirely the same... Indeed bdver3(?), that one seems to be much more than some little enhancements.
First source I could find with google is from September '10:
http://journal.mycom.co.jp/articles/...cat/index.html
Intel just released AVX2 this June .. i wonder if AMD could adapt it that quickly, or if it is already too late for Steamroller.
@rog:
Ok thanks, then 6.4 ;-)
Would score 7 then already without speed-bump.
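The Cinebench projections traded in the posts above are simple scaling. Here is a minimal sketch, assuming the 10% figure applies directly to the Cinebench score; the 6.0 baseline is not quoted anywhere and is only what the earlier "around 6.60" estimate implies.

```python
IPC_GAIN = 0.10  # the 10% improvement under discussion


def projected(score, gain=IPC_GAIN):
    """Score after applying a relative performance gain."""
    return score * (1 + gain)


# Baseline implied by the earlier "around 6.60" estimate (an inference).
implied_baseline = 6.60 / (1 + IPC_GAIN)

# With the corrected 6.4 baseline, 10% alone already passes 7.
score_from_6_4 = projected(6.4)

# Clock bump that the earlier estimate would have needed to go from 6.60 to 7.00.
clock_bump_needed = 7.00 / 6.60 - 1

print(f"Implied baseline:  {implied_baseline:.2f}")
print(f"6.4 with +10%:     {score_from_6_4:.2f}")
print(f"Clock bump needed: +{clock_bump_needed * 100:.1f}%")
```

So the shift from a 6.0 to a 6.4 starting point is what removes the need for the roughly 6% clock increase mentioned earlier.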
Do you guys even read what I wrote? In floating point heavy code that employs all 8 threads, Turbo will almost never engage. Turbo will engage across all 8 integer cores though, but Cinebench will use the FlexFP coprocessors most of the time, where TDP will be maxed out. You can read all about the BD exec. units' power draw and clock characteristics on the AMD blogs from after the ISSCC event.
Now there's almost a week remaining before release day. Anyone still looking forward to the mid October announcement?
I can't believe people can be so easily deceived by FUD nowadays :D;)
I started a thread in a local hardware forum and posted some FUD crap about Bulldozer failing to beat FX
Many "Big blue" fanboys just jumped in and said AMD failed; some replies said they still have faith that FX will perform quite well (based on the same link I provided)
Anyway, I heard a rumor that the NDA for "something" will end on Oct 12, 2011; I guess MAYBE it's Bulldozer. I still don't see any "early" retail units out there yet, but I will keep an eye out for one for sure ;)
Some interesting talk about x264 optimizing on bulldozer
http://www.planet3dnow.de/vbulletin/...&postcount=562
http://www.planet3dnow.de/vbulletin/...&postcount=585
Looks like FMA4 & XOP bring more benefit than AVX on Bulldozer. Seems we may get the most revolutionary change since MMX.
Quote:
2011-09-16 23:42:16 < Dark_Shikari> Oh YI, we know now why AVX is useless on bulldozer
2011-09-16 23:42:20 < Dark_Shikari> *FYI
2011-09-16 23:42:22 < Dark_Shikari> Move elimination
2011-09-16 23:42:29 < Dark_Shikari> Their OOE engine eliminates moves and resolves them before ALU stage
2011-09-16 23:42:34 < Dark_Shikari> So moves are free, so AVX doesn't help
2011-09-16 23:42:39 < Dark_Shikari> Except reducing code size ofc
Quote:
2011-09-23 18:56:03 < Dark_Shikari> Okay, so I have a massive series of bulldozer profiles ready
2011-09-23 18:56:13 < Dark_Shikari> It has instruction-based sampling and all sorts of awesome stuff
2011-09-23 18:56:43 < JEEB> AMD? Awesome stuff? This sounds like something that doesn't happen very often
2011-09-23 18:57:21 < Gramner> any NDA?
2011-09-23 18:59:53 < Dark_Shikari> Technically yeah
2011-09-23 19:00:08 < Dark_Shikari> Though a lot of the stuff isn't bulldozer-specific, its performance counters are just awesome
2011-09-23 19:00:32 < Dark_Shikari> Unsurprisingly, our load/store queue is full in pixel_avg functions.
2011-09-23 19:01:25 < Dark_Shikari> Er, load queue.
2011-09-23 19:01:36 < Dark_Shikari> Our store queue, on the other hand, fills in plane_copy, mc_copy...
2011-09-23 19:01:38 < Dark_Shikari> slicetype_mb_cost?
2011-09-23 19:02:12 < Dark_Shikari> cache_load and cache_save, guess that's obvious
2011-09-23 19:02:33 < Dark_Shikari> analyse_init, naturally
2011-09-23 19:02:50 < Dark_Shikari> Okay, time for INEFFECTIVE_SW_PREFETCHES
2011-09-23 19:03:05 < Dark_Shikari> Oh, this is awesome. It tells you when a prefetch is useless, i.e. the data was already in L1 cache
2011-09-23 19:03:12 < Dark_Shikari> Almost all of the "useless prefetches", pengvado, are in hpel_filter
2011-09-23 19:03:21 < Dark_Shikari> The rest are in cache_load
2011-09-23 19:03:23 < Dark_Shikari> Guess that's expected.
2011-09-23 19:04:02 < Dark_Shikari> Next: DECODER_EMPTY.
2011-09-23 19:04:17 < Dark_Shikari> I... think this is where the instruction decoder... hmm. Is this where the decoder is too fast, or too slow?
2011-09-23 19:04:43 < Dark_Shikari> Okay, it's where the decoder is too slow (there's nothing to dispatch)
(...)
2011-09-23 21:47:40 < Dark_Shikari> Thank you performance counters, I think I just made CABAC RD way faster
2011-09-23 21:48:37 < LordRPI> nice
2011-09-23 21:49:22 < Dark_Shikari> 50% of the branch mispredictions in cabac were on one line of code
2011-09-23 21:49:26 < Dark_Shikari> a restructure of the function, kabam
Quote:
2011-09-27 00:55:51 < Dark_Shikari> pengvado: oh oops, vpermilps and pd are 5-operand (!!!!!)
2011-09-27 00:55:57 < Dark_Shikari> dst,src1,src2,selector,imm8
2011-09-27 00:56:25 < Dark_Shikari> I mean seriously wtf
2011-09-27 01:04:02 < Dark_Shikari> Also, they apparently dropped 3DNOW
Quote:
2011-09-28 01:33:41 < Dark_Shikari> AVX mbtree propagate is slower than sse2
2011-09-28 01:33:49 < Dark_Shikari> FMA only barely manages to get it fast again.
2011-09-28 01:33:49 < kemuri-_9> lol
2011-09-28 01:33:52 < Sean_McG> hahah
2011-09-28 01:33:59 < Dark_Shikari> SSE2: 342 cycles
2011-09-28 01:34:00 < Dark_Shikari> AVX: 374
2011-09-28 01:34:05 < Dark_Shikari> FMA4: 340
2011-09-28 01:34:18 < kemuri-_9> lol
2011-09-28 01:34:26 < Dark_Shikari> I guess this makes sense given that it only has 128-bit execution units
2011-09-28 01:34:34 < Dark_Shikari> and the INT16_TO_FLOAT code is obnoxiously slow because avx sucks
2011-09-28 01:34:41 < Dark_Shikari> i.e. avx has no way of doing int16_t -> float fast
2011-09-28 01:35:18 < Dark_Shikari> Hmm. I wonder if FMA4 supports sse registers?
2011-09-28 01:35:37 < Dark_Shikari> Oh. It *does*...
2011-09-28 01:35:38 < Dark_Shikari> Let me try that.
2011-09-28 01:37:45 * codestr0m ears perk up
2011-09-28 01:49:29 < Dark_Shikari> FMA4: 314 cycles. Much better
2011-09-28 01:49:46 < codestr0m> Dark_Shikari: what was the change?
2011-09-28 02:01:21 < Dark_Shikari> using the sse instead of avx version
2011-09-28 02:01:26 < Dark_Shikari> as the basis for xop
Quote:
2011-10-01 02:09:51 < Dark_Shikari> xop will make this a lot easier, but I'm trying to do ssse3 first
Quote:
2011-10-04 04:46:38 < Dark_Shikari> C, with mode analysis shortcuts: 253 cycles
2011-10-04 04:46:45 < Dark_Shikari> My crappy, badly optimized XOP asm: 93 cycles
2011-10-04 04:46:56 < Dark_Shikari> This is kinda awesome
2011-10-04 04:49:35 < Dark_Shikari> Oh, and old without shortcuts: 379 cycles
2011-10-04 04:49:45 < Dark_Shikari> My asm is 4 times faster than the existing... wait where have we seen this before? XD
2011-10-04 04:49:57 < Dark_Shikari> It's just like SAD_4x4_x9 all over again!
2011-10-04 04:50:10 < JEEB>
2011-10-04 04:50:18 < JEEB> that sounds pretty awesome
2011-10-04 04:50:21 < Dark_Shikari> Except this time I'm still wondering how best to do it without vpperm
2011-10-04 04:50:33 < Dark_Shikari> Thanks AMD, for bringing back the best instruction ever after 15+ years of hiatus.
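To put the cycle counts from the quoted logs side by side, here is a small sketch that turns them into speedups. The numbers are copied from the logs; the labels "AVX-based" and "SSE-based" for the two FMA4 results and the dictionary names are my own reading of the conversation, not terms used in it.

```python
# Cycle counts for "mbtree propagate" from the 2011-09-28 log.
mbtree = {
    "SSE2": 342,
    "AVX": 374,
    "FMA4 (AVX-based)": 340,
    "FMA4 (SSE-based)": 314,
}

# Cycle counts from the 2011-10-04 log (the function is not named there).
xop_case = {
    "C, old": 379,
    "C, with shortcuts": 253,
    "XOP asm": 93,
}


def speedup(baseline_cycles, cycles):
    """How many times faster than the baseline (values below 1 are slower)."""
    return baseline_cycles / cycles


for name, cycles in mbtree.items():
    print(f"{name}: {speedup(mbtree['SSE2'], cycles):.2f}x vs SSE2")

for name, cycles in xop_case.items():
    print(f"{name}: {speedup(xop_case['C, old'], cycles):.2f}x vs old C")
```

Read this way, AVX is about 9% slower than SSE2 on this function, the SSE-based FMA4 version about 9% faster, and the XOP asm roughly 4 times faster than the old C code, which matches the "4 times faster" remark in the log.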
Is it possible that AMD will launch Bulldozer and the Radeon HD7000 together? :rolleyes:
Not in this story... BD launch is set for 12 October.
So will the 125W TDP FX-8120 be better than the 95W version because of more overclocking headroom?
At stock speeds, the 125W will probably be the better buy (if you're not concerned with power usage) because the higher TDP ceiling means that more cores can be activated to the turbo speed. The 95W version will likely not have the same headroom, and may only let one or two cores scale to the turbo speed before they all throttle down.
As for overclocking, that's yet to be determined. As all FX series will have an unlocked multiplier, unless the higher end models are much higher binned, it would seem almost foolish to me to spend the extra on an 8150 vs an 8100. But, we'll see when they come out if the 8150 has higher overclocking headroom than the 8100 and 8120.
2 chips of the same model should perform near identically, i dont believe it will really be set at 125W anyway, i think it will be ~100W and the 95w version will have very similar headroom with extremely minor bump in efficiency that lets it reach the headroom.
I read it, but do you have a BD? So why should I believe you?
Cinebench 11.5 scores are rather bad; even CB10 is doing better. Hence I do not believe that the FPU is maxed out at all, especially as neither FMAC nor XOP/"MMX" code (the other 2 pipes in the FPU) is used. Thus I think there is enough headroom for the 3.9GHz Turbo stage. Anyways, we'll know in less than 1 week ;-)
Then you have to wait a few weeks longer, because there is only a 8120 and 8150 model at launch ;-)
However, there are 2 versions of the 8120, 125W and 95W, but from experience with the Phenom2s, I would assume that the 95W part will go directly into the OEM market to Dell, hp, etc.
It took me less than 2 mins to find these in retail... and at one retailer!
http://www.newegg.com/Product/Produc...82E16819103856
http://www.newegg.com/Product/Produc...82E16819103809
http://www.newegg.com/Product/Produc...82E16819103921
Well, I should have been more specific. Yes, these are all 95W parts, but there is no 125W version of the same model. I was referring to the 1055T. Like the FX8120, it comes in 2 versions, a 95W and a 125W:
http://products.amd.com/en-us/Deskto...?id=641&id=652
However, the 95W part just appeared recently in the etail market (2-3weeks back here in Europe), probably because OEMs are filling their stocks with Bulldozers now.
Also note that there is no "shop now" link for the 95W part in the above link.
Well yes, the parts are indeed different. However, your point was that it would be shipped to OEMs and retail wouldn't see any (if at all). Unless it is an OEM special, like the 960T, where AMD clarified as much, I don't have a reason to believe that the 95W chip won't be seen in retail. If you look at it, the Phenom II 945 is merely a locked version of the 940 at a lower TDP. This being FX, I don't think AMD would muck about.
hmmm:
FX-6100 $189.99
http://www.tigerdirect.com/applicati...893&CatId=7246
FX-8120? $219.99
http://www.tigerdirect.com/applicati...896&CatId=7246
FX-8150? $259.99
http://www.tigerdirect.com/applicati...899&CatId=7246