I don't need to speculate, but I surely am not going to be anyone's "source" either.
chew*, we don't want any trouble for you, so we won't ask.
I was hoping that BD would be faster than SB by 10%+ in most benchmarks. I bought into the hype too heavily. AMD will have a chip that's within 10% of the performance of an i7 2600K. It's a good chip at a good price; I was just expecting a bit more. Because of my current setup (955 @ 4 GHz), I'm still going to buy the top-model BD just to overclock the heck out of it when it's released. Roughly eight weeks to go. :)
Within 10% on what tests? Single-threaded tests? Up to 4 threads? 8 threads? Stock clocks? Overclocked?
They are very different in their design, and both chips are designed to be competitive across different segments, but I do expect one to be a clear winner in some and a clear loser in others.
It is a bit weaker than expected a few months ago; however, I really think that Socket FM2 Bulldozer will show the true performance of this architecture.
AMD always scales better with cold due to its SOI.
Dave, there is a key word in my post:
"Competitively priced"
I really want a 2500K and a UD5 right now; actually, I have for a while... but I don't have the money. I do prefer AMD hardware over Intel a little, but since the X6 lags behind, it's time for something new to play with. I want to buy Bulldozer, so money has been put aside.
freeloader, I think you should see drfedja's chart; that's the best-case scenario for how BD could perform on average against the competition. It doesn't mean it will really happen, but if it does I would be more than happy, and then the only thing I need is a Trinity 13'' notebook :D
Manicdan, BD in SuperPI will be the clear loser, that's for sure :up:
demonkevy666, I meant it as a joke ;)
According to the Swedish site Sweclockers there are rumors circulating in the Swedish sales channel that Bulldozer deliveries will begin in the last week of September (somewhere around the 26th to 30th of September).
If true it looks like we might see a September release after all, which is good news!
Link:
http://www.sweclockers.com/nyhet/143...an-i-september
Translated:
http://translate.google.se/translate...an-i-september
AMD said in the conference call that "BD will ship this quarter for revenue". Well, that's all the way out to October. This Sept. 19 date seems to be the next bogey. It'll be really telling then.
Edit: once the CPUs start shipping to OEMs, to me that's when the real "performance leaks" will start in earnest.
RussC
I've been gone all of July (left the 8th), but what I gleaned upon coming back (Aug 5th; I was without internet at the cabin) is Sep 22nd, and the first week or so of October for retail availability. Before I left, it was slated for an early/mid-August release :(
I think this date will stick though, finally :\
I don't have a 990X here, so I don't know how it performs compared to a 990X.
Just curious if anybody knows what the NB frequency will be on Bulldozer?
Is it still the same 2000 MHz HT link and 2000 MHz NB frequency? Will the NB have an unlocked multiplier?
HT 3.1 should be at 3 GHz (9xx-series chipset), along with a 2.6 GHz NB frequency, I believe. No idea on the multipliers though...
http://forums.aria.co.uk/showthread.php?t=75187
Quote:
The things I can tell you:
1. Bulldozer is here mid-Sept
2. Bulldozer will be very price competitive vs Sandy Bridge
3. Bulldozer is awesome - ha lol.
4. Every part is unlocked
5. It can OC very well!
6. AMD will be doing something special with LN2 very soon - 1-2 weeks.
http://forums.aria.co.uk/showpost.ph...7&postcount=11
It just confirms it's worth waiting.
I'm seriously thinking about freezing myself, then unfreezing when it's launched...
That was a really bad idea for Cartman with the Wii.
I'm betting you should invest your money in LN2, because I'm thinking demand will go up and so will prices, and you might turn a big enough profit to buy a second BD :up:
Yeah, I was thinking it wouldn't be too bad waking up in about 200~300 years; then I could be some old, ancient chip wise man. By then they will probably have invented CPUs (or probably just APUs) that are basically modified atoms (not Intel Atoms, the small atoms things are made of). Then there will hopefully be some kind of chip museum, where they will hopefully have an AMD section, and BD will hopefully not be their last chip. :)
Zambezi numbers, yes, I know; i5 2500K or i7 2600K numbers, no. I ran only graphics benchmarks.
Where is the quote from :confused:
Olivon, I agree with you. If he really had BD, I would be surprised.
I don't know... What I do know is that we will see the truth soon.
FlanK3r, probably something new will be presented this Friday at the Hot Chips conference; no benchmarks of course, but maybe some new info about the architecture.
Olivon, maybe I am wrong and he really has BD samples, but that's something only he knows.
His post about how he can't compare BD to SB or Gulftown, despite so many reviews with different tests being out, didn't add to his credibility, but maybe I am just misunderstanding.
I hope that, at Hot Chips, they can explain the performance degradations on the Software Optimization Guide:
-----------------------
Note: For best performance, do not mix streaming instructions on a cache line with non-streaming store instructions.
The following performance caveats apply when using streaming stores on AMD Family 15h cores.
• When writing out a single stream of data sequentially, performance of AMD Family 15h processors is comparable to previous generations of AMD processors.
• When writing out two streams of data, AMD Family 15h version 1 processors can be up to three times slower than previous-generation AMD processors. AMD Family 15h version 2 processor performance is approximately 1.5 times slower than previous AMD processors.
• When writing out four non-temporal streams, AMD Family 15h version 1 can be up to three times slower than previous AMD processors. AMD Family 15h version 2 processor performance is comparable to previous AMD processors.
• Using non-temporal stores but not writing out an entire cacheline may cause performance to be up to six times slower than previous AMD processors.
------------------------
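A minimal C sketch of the access pattern the guide recommends for streaming stores: each loop iteration writes one complete 64-byte cache line with SSE2 `_mm_stream_si128` (MOVNTDQ), and no ordinary stores touch those lines. The 64-byte line size and the name `nt_fill_lines` are assumptions for illustration; none of the guide's slowdown figures are measured here.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_stream_si128, _mm_set1_epi8 (pulls in _mm_sfence) */
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines */

/* Fill a buffer with one byte value using non-temporal (streaming) stores.
 * Each iteration writes a whole 64-byte line as four 16-byte stores, so no
 * line is left partially written and no normal stores are mixed in. */
void nt_fill_lines(void *dst, unsigned char value, size_t bytes)
{
    assert((uintptr_t)dst % CACHE_LINE == 0);  /* line-aligned destination */
    assert(bytes % CACHE_LINE == 0);           /* whole lines only */

    __m128i v = _mm_set1_epi8((char)value);
    __m128i *p = (__m128i *)dst;
    for (size_t i = 0; i < bytes / sizeof(__m128i); i += 4) {
        _mm_stream_si128(p + i + 0, v);
        _mm_stream_si128(p + i + 1, v);
        _mm_stream_si128(p + i + 2, v);
        _mm_stream_si128(p + i + 3, v);
    }
    _mm_sfence();  /* order the streaming stores before any later stores */
}
```

A tail that is not a whole line would need ordinary stores in a separate pass, which is exactly the partial-line case the guide warns about.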
It seems that this problem will persist (to a lesser degree) even on BDv2.
How will this affect BD performance in benchmarks and the "real world"?
I hope these are rare situations where PREFETCHNT instructions are critical. Maybe it could matter when the CPU works with high data parallelism and SIMD instructions. However, it becomes critical when we have data stream conflicts. When we use one single stream, writing out data is comparable to 10h.
The lower performance of PREFETCHNT instructions is probably caused by the WT (write-through) data cache policy. Because the L1D is WT, every write to the cache causes a synchronous write to the backing store.
To avoid a performance drop, AMD's designers included a WCC (Write Coalescing Cache) for WT stores from both integer cores.
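To make the write-through argument concrete, here is a toy C model (not AMD's actual design): with a write-through L1D and no coalescing, every store becomes one write to the next level, while a small coalescing buffer merges stores that land in a line it already holds. The 4-entry FIFO buffer, 64-byte lines and function names are hypothetical; the real WCC's size and replacement policy are not given in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SHIFT 6    /* 64-byte lines (assumed) */
#define WCC_ENTRIES 4   /* hypothetical buffer size, not AMD's real figure */

/* Pure write-through: every store is one write to the next level. */
size_t writes_without_coalescing(const uint64_t *addr, size_t n)
{
    assert(addr != NULL || n == 0);
    return n;
}

/* Toy coalescing buffer: a store to a line already buffered merges into it;
 * a store to a new line, when the buffer is full, evicts the oldest entry
 * (FIFO), and each eviction counts as one write. Leftovers flush at the end. */
size_t writes_with_coalescing(const uint64_t *addr, size_t n)
{
    assert(addr != NULL || n == 0);
    uint64_t line[WCC_ENTRIES];
    size_t used = 0, oldest = 0, writes = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t l = addr[i] >> LINE_SHIFT;
        int hit = 0;
        for (size_t j = 0; j < used; j++)
            if (line[j] == l) { hit = 1; break; }
        if (hit)
            continue;                       /* merged, nothing sent */
        if (used < WCC_ENTRIES) {
            line[used++] = l;               /* free slot */
        } else {
            line[oldest] = l;               /* evict oldest, send it */
            oldest = (oldest + 1) % WCC_ENTRIES;
            writes++;
        }
    }
    return writes + used;                   /* final flush */
}
```

For 64 sequential 8-byte stores (512 bytes, 8 lines), the first function reports 64 writes and the second reports 8.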
In general, the PREFETCHNTA instruction hints to the processor to fetch the data non-temporally (i.e. the data will not be used again, or will be used only once), e.g. when you're copying data from one location to another, you can use this instruction. The PREFETCHTn instructions hint to the processor that the data is needed repeatedly, e.g. when you're doing calculations on the same data.
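A small C sketch of the two hints via the `_mm_prefetch` intrinsic: `_MM_HINT_NTA` (PREFETCHNTA) for data read once, `_MM_HINT_T0` (PREFETCHT0) for data reused across passes. The prefetch distance and function names are illustrative assumptions; hints are only hints, and how each CPU treats them varies.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_NTA, _MM_HINT_T0 */

/* Illustrative prefetch distance in elements (a guess; tune per CPU). */
#define AHEAD 64

/* Data read only once: PREFETCHNTA, so it should not displace useful data. */
double sum_read_once(const double *a, size_t n)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i % 8 == 0 && i + AHEAD < n)   /* roughly one hint per 64-byte line */
            _mm_prefetch((const char *)&a[i + AHEAD], _MM_HINT_NTA);
        s += a[i];
    }
    return s;
}

/* Data reused across several passes: PREFETCHT0 asks for all cache levels. */
double sum_reused(const double *a, size_t n, int passes)
{
    assert(a != NULL || n == 0);
    double s = 0.0;
    for (int p = 0; p < passes; p++)
        for (size_t i = 0; i < n; i++) {
            if (i % 8 == 0 && i + AHEAD < n)
                _mm_prefetch((const char *)&a[i + AHEAD], _MM_HINT_T0);
            s += a[i];
        }
    return s;
}
```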
memmem: maybe they mean "version 2" as revision number 2...
Yes, there is a misunderstanding: I do not look at other people's results, or if I look, I don't take notes of them.
I cannot compare SB or Gulftown to BD since I don't have a working X58 or Z/P67 board at the moment. I have 980X, 2500K and 2600K CPUs though, and the 980X is pretty close to the 990X.
I lost a lot of hardware in a thunderstorm this summer. The servers and my main PC are behind a UPS unit, but the other PCs and benching hardware weren't.
I didn't lose BD and the 990FX since they weren't plugged in at that moment.
I lost $6000 worth of hardware and I'm not going to risk that again; now everything is behind UPS units.
I'd say yes, though I need to see its price first.
I hope not. Maybe there could be exceptions, but in general I think BD will be much faster than Thuban. If you write your own memory copy routine to avoid the standard C/C++ library, you can use these NT (non-temporal) instructions.
A typical use for non-temporal stores is copying memory regions that are too large to fit in the cache. Using ordinary stores for that would waste memory bandwidth by unnecessarily fetching all the data in the destination region into the cache before overwriting it. Any useful data that is in the cache before the copy would also be replaced with data from the source and destination regions.
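A sketch of the large-copy case in C with SSE2 intrinsics: ordinary 16-byte loads from the source, `_mm_stream_si128` stores to the destination, and an `_mm_sfence` at the end. It assumes 64-byte-aligned buffers whose size is a multiple of 64 (a real memcpy replacement would handle unaligned heads and tails with ordinary stores), and `nt_copy` is a hypothetical name. Only the destination bypasses the cache here; the source is still read through it.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_load_si128, _mm_stream_si128 */
#include <stddef.h>
#include <stdint.h>

/* Copy `bytes` from src to dst, streaming the destination past the cache so
 * its lines are not fetched before being overwritten. Preconditions
 * (assumed for brevity): both pointers 64-byte aligned, bytes % 64 == 0. */
void nt_copy(void *dst, const void *src, size_t bytes)
{
    assert((uintptr_t)dst % 64 == 0 && (uintptr_t)src % 64 == 0);
    assert(bytes % 64 == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < bytes / 16; i += 4) {  /* 4 x 16 B = one line */
        __m128i v0 = _mm_load_si128(s + i);
        __m128i v1 = _mm_load_si128(s + i + 1);
        __m128i v2 = _mm_load_si128(s + i + 2);
        __m128i v3 = _mm_load_si128(s + i + 3);
        _mm_stream_si128(d + i,     v0);
        _mm_stream_si128(d + i + 1, v1);
        _mm_stream_si128(d + i + 2, v2);
        _mm_stream_si128(d + i + 3, v3);
    }
    _mm_sfence();  /* make the streamed data ordered before later stores */
}
```

For copies that fit in the cache, plain `memcpy` is usually the better choice, since the destination is likely to be read again soon.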
Another typical use is initialization of data structures too large to fit in the cache, for example, setting a large array to all zeros.
A major drawback of non-temporal stores is that they are fairly complex to work with. If they are used improperly, they can easily cause performance degradation, or even hard-to-debug bugs in multithreaded programs. That is the main reason to think BD performance will not be sacrificed much to streaming conflicts: the slow cases come from improper usage of NT store instructions.
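One classic multithreaded pitfall, sketched in C11 with SSE2 intrinsics: streaming stores are weakly ordered, so a producer that fills a buffer with `_mm_stream_si128` and then sets a ready flag should issue `_mm_sfence` before the flag store, or a consumer on another core may see the flag before the data. The names `produce`, `consume`, `payload` and `ready` are hypothetical; in real use the two functions would run on different threads (thread creation is left out).

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics, _mm_sfence, _mm_pause */
#include <stdatomic.h>
#include <stdint.h>

#define N_INTS 256       /* 1 KiB payload */
static_assert(N_INTS % 16 == 0, "payload must be whole 64-byte lines");

static _Alignas(64) int32_t payload[N_INTS];
static atomic_int ready = 0;

/* Producer: payload[j] = base + j, written with streaming stores. */
void produce(int32_t base)
{
    __m128i *p = (__m128i *)payload;
    for (int i = 0; i < N_INTS / 4; i++)
        _mm_stream_si128(p + i, _mm_set_epi32(base + 4 * i + 3, base + 4 * i + 2,
                                              base + 4 * i + 1, base + 4 * i));
    _mm_sfence();  /* NT stores are weakly ordered: drain them before the flag */
    atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Consumer: wait for the flag, then read. Returns the payload sum. */
int64_t consume(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        _mm_pause();
    int64_t sum = 0;
    for (int i = 0; i < N_INTS; i++)
        sum += payload[i];
    return sum;
}
```

Dropping the `_mm_sfence` line would still pass a single-threaded check, which is part of why such bugs are hard to find.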
Hmmm, I read the next thread.
FM2 socket? Very confused.
One of the most objective articles I've seen on FX performance...
I'm not sure how true it will turn out to be, but it does seem to make sense, and would explain some of the wild swings we've seen from pre-release "benchmarks" :rolleyes:...
http://www.extremetech.com/computing...s-next-gen-cpu
Maybe it's just me, but these chips look like a blast to play with even if they can't crush BigBlue in all situations... ;)
Next crazy Chinese fake?... Now with a very impressive score; they say this is an FX 4120.
Attachment 119104
http://diybbs.zol.com.cn/11/11_100430.html
I think it is not possible: 2 modules with a higher score than a 980X...
I don't think that is a score for a QC Zambezi, Flanker. It may be a score for the 8C one; it fits almost perfectly (my estimate for 8C was 19220 pts :) ).
It's the uber quad-core...
Informal: I think the same. Logic tells me a similar score is possible only with 4 modules. Maybe Fritz reads modules as the number of cores?
But...what is interesting is this:
from RAWZ
That is some Super Exciting news from RawZ.
:rofl::rofl::rofl::rofl:
Quote:
It'll be interesting to see how BD handles the well known Intel loving Super Pi benchmark. If AMD BD can outperform SB in that benchmark, then fooook me, that'll cause a massive storm.
Yeah, Super Pi has the highest priority, not some toy benchmark like Cinebench or whatever...
AMD doesn't care about SuperPI.
The supposed AMD LN2 event is coming really soon; it would be nice to see if they can break 9 GHz.
An HTT bug causing performance problems could have been the reason for the delay.
http://donthatethegeek.com/2011/08/1...in-production/
Credit goes to yuri.cs from the pctuning forum, who I got it from.
Dirk Diggler, I don't know; the scores I saw were also in multi-threaded applications and didn't seem low. Maybe they will be much better in B2, depending on how much of a performance penalty this bug causes.
Comp_Nou has swapped the mobo for an M5A97 EVO. Results are better than the first try with the AMD760:
http://tof.canardpc.com/preview2/041...17774c817d.jpg
The guy is waiting for better BIOS
http://www.chiphell.com/thread-250461-1-1.html
undone, yeah, they are serious Chinese amateurs :shakes: Nothing to be surprised about.
Super Pi uses ancient x87 instructions, which have been deprecated and obsolete since SSE2 launched. Super Pi is in no way representative of the FPU power of a single core; it just shows how fast you can do x87 math. Not to mention that all those BD ES results are borked, so move along.
BTW you quoted a post from January 2011,just FYI...
CrazyNutz, no, he is right: it doesn't represent single-threaded performance.
AMD doesn't care about this old instruction set, which is no longer used in modern applications.
Quote:
Super PI utilizes the x87 instruction set. These instructions date all the way back to the 8087 math coprocessor. While they were important for 80386, 80486, and Pentium they became obsolete when 3DNow! and Streaming SIMD Extensions were released.
informal
Really, what a CrazyNutz guy :rofl:
Quote:
BTW you quoted a post from January 2011,just FYI...
Oops, my mistake quoting something as far back as 01/11. I'll own up here.
However, one thing you guys must understand is that a large percentage of software is NOT optimized with SIMD, and some software cannot make effective use of SIMD. It takes manual optimization to make real use of SIMD. If you were to disassemble most of the software you use on a daily basis, you would know what I'm talking about: you would see a whole lot of non-SIMD instructions in use, i.e. legacy float and integer instructions.
Some of you seem to think the addition of new instructions makes the original x86/x87 instructions obsolete; well, that is not always the case, especially with vector instructions. Fellow programmers will know what I'm saying here.
Edit: added this
They are not obsolete. They are used far more than you realize. That's like saying x86 is obsolete.
Don't get defensive, I'm not bashing BD at all :)
Since the AMD64 instruction set launched along with 64-bit Windows, x87 IS obsolete. In a 64-bit OS x87 is not used (literally; it's replaced by SSE, which duplicates its capabilities). There is no discussion there.
It does NOT fully duplicate its capabilities; this is a misconception. x87 has higher precision: 80-bit vs. SSE2's 64-bit. Also, compilers try to optimize code to use faster instructions like MMX/SSE etc.; however, most of the time they fail to produce an executable using these instructions. Where SIMD (SSE2/MMX/AVX etc.) shines is when you have a stream of data that needs the same operation performed on it; it really speeds things up. However, again, compilers are rarely smart enough to do these optimizations on their own, so most of the time these instructions are hand-written and included as inline assembly. I write software this way, and so do my colleagues, but only when it's necessary; otherwise we let the compiler spit out x86/x87 instructions.
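A small C check of the precision point: the x87 extended format carries a 64-bit significand versus 53 bits for an SSE2 double, so adding 2^-60 to 1.0 survives in the former and rounds away in the latter. This assumes a compiler where `long double` maps to the x87 80-bit format (GCC or Clang on x86 Linux); with MSVC, `long double` is plain double. The function names are hypothetical.

```c
#include <assert.h>
#include <float.h>

/* IEEE 754 binary64 assumed for double (53-bit significand). */
static_assert(DBL_MANT_DIG == 53, "expected IEEE 754 double");

/* 1 if 1 + 2^-60 is still distinguishable from 1 after rounding to double.
 * The volatile store forces rounding to a real 64-bit double even when the
 * compiler evaluates the sum in x87 extended precision. */
int double_keeps_2_pow_minus_60(void)
{
    volatile double one = 1.0, tiny = 0x1p-60;
    volatile double sum = one + tiny;
    return sum != one;
}

/* Same test in long double: with LDBL_MANT_DIG == 64 (x87 extended),
 * 2^-60 is well above half an ulp of 1.0, so it survives. */
int long_double_keeps_2_pow_minus_60(void)
{
    volatile long double one = 1.0L, tiny = 0x1p-60L;
    volatile long double sum = one + tiny;
    return sum != one;
}
```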
EDIT: Looking further into this, when compiling for 64-bit you are correct that SSE replaces x87 instructions by default with most recent compilers (e.g. GCC defaults to -mfpmath=sse for 64-bit).
However, there are still a lot of 32-bit games and programs in use, and when compiling for 32-bit, most compilers default to x87 instructions for floating-point math.
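To check which floating-point model a given build actually uses, C99's `FLT_EVAL_METHOD` macro from `<float.h>` is a quick tell: compilers typically report 0 when math is done in SSE registers (the x86-64 default) and 2 when float/double expressions are evaluated in x87 extended precision (common for 32-bit x87 builds). The helper names are hypothetical.

```c
#include <float.h>

/* Map a FLT_EVAL_METHOD value (C99 <float.h>) to a short description. */
const char *describe_eval_method(int method)
{
    switch (method) {
    case 0:  return "each type evaluated in its own precision (SSE-style)";
    case 1:  return "float and double evaluated as double";
    case 2:  return "float and double evaluated as long double (x87-style)";
    default: return "indeterminable or implementation-defined";
    }
}

/* The evaluation method this translation unit was compiled with. */
int build_eval_method(void)
{
    return FLT_EVAL_METHOD;
}
```

Compiling the same file with `gcc -m32 -mfpmath=387` and with a plain x86-64 `gcc` typically reports 2 and 0 respectively; `-m32 -msse2 -mfpmath=sse` moves a 32-bit build to SSE math.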
What about the Hot Chips conference? Some new info about the BD architecture would be great.
drfedja, even if you are right, which program except SuperPI or HyperPI still uses the x87 instruction set?
Not necessarily. A hypothetical 64-bit version of SPi or HyperPi would use only SIMD or scalar SSE instructions. 32-bit programs can use x87 and/or SSEx.
64-bit programs run in long mode, and 32-bit programs on a 64-bit operating system also run in long mode, but in compatibility mode, where all legacy instructions are allowed. 32-bit programs, as you know, run on a 64-bit system unchanged, without recompiling. ;)
Everyone who has compiled with x87 and 32-bit code.
drfedja, actually I was asking which other programs used for testing CPUs still use x87?
I am asking because this is the only test where the SB 2500 is almost 2x faster than the X4 980.
I don't think any other testing program uses x87 except Pi benchmarks.
At least not when 64bit versions of other benchmarks are used.
For example, the 32-bit version of Cinebench may use x87, as may some game engines, nVidia's physics engine on the CPU (Physics87 :D), some old codecs, and FP-intensive applications that don't use SSE, for example Euler3D. I don't know for sure, but if you want to know which apps use x87, you can use AMD CodeAnalyst or a similar code profiling tool, e.g. Intel VTune.
No modern app today uses x87 any more, that's for sure ;) But SPi isn't a modern, present-day app. It is a single-threaded, 32-bit, legacy x87 FP-coded program :P
Of course, because a 64-bit compiled app can't use legacy x87 instructions. That is an architectural limitation.
drfedja, thanks, that's what I wanted to know. That means SuperPI is not fit to be representative for single threaded performance in modern applications.
In general, a floating-point-intensive application has 55% FP code and 45% integer code, while an integer app has all integer code, which contains more memory operations (mov, push, pop, call, ret) and fewer logical and arithmetic ones.
No, it isn't, because it is like benchmarking a modern CPU with a 16-bit DOS program. It is pointless.
Quote:
That means SuperPI is not fit to be representative for single threaded performance in modern applications.
If you want to see how some CPU performs at calculating Pi, there are much better apps and programs that calculate Pi digits hundreds of times faster than Super Pi. For example, Wolfram Mathematica calculates 1M Pi digits in a fraction of a second on an old dual-core Opteron 170 (S939).
What can Super Pi tell you about a new CPU? Almost nothing, except how it performs with cache and stack memory operations. SPi can't tell you how fast a CPU handles legacy x87 FP operations, because SPi is mixed code and is limited by cache misses and memory reordering.
The only real application for Super Pi is in overclocking competitions.
Talking about raw hardware evolution: as GPGPU potential is tapped for floating-point performance, you will be less required to use legacy instruction sets and hardware like the x87 FPU (which supposedly was a pain to program in assembler). x87 is already obsolete if you're running in long mode, something that has been available for 8 years, so a lot of die area and power is wasted on legacy components that will make less sense as time advances but are kept around just for legacy compatibility.
Bulldozer shows a bit of this type of evolution because it boasts much more integer potential compared to its floating-point capabilities; however, it's not the first time we've seen something like this. Don't you guys remember that the Pentium 4 x87 FPU was quite weak and Intel was pushing applications to use SSE/SSE2 instead? However, AMD's timing on this one seems a better bet, and we still don't know how much legacy performance is sacrificed. If GPGPU takes over FPU tasks, that is when Fusion will really kick in, as you will have powerful GPU resources in place, ready to take over.
Also, I'm very interested in seeing how future designs evolve. You may not need dedicated hardware for x87 or other obsolete instruction sets beyond the hardware decoder stage; after all, while the multiple instruction sets and extensions are compatible with a ton of x86 processor architectures, each works internally very differently after instructions are converted to micro-ops. As Fusion evolves, you may see a very powerful FPU/GPU that is fed decoded x87, SSE and GPU micro-ops.
I agree, but x87 FPU performance is relatively important today. I don't think AMD has sacrificed x87 FP, because it uses the same hardware resources as SIMD.
FMAC 0 and FMAC 1 can handle every FMUL or FADD instruction, including x87. There is one difference from 10h: the BD FPU can handle two FMUL or two FADD instructions at the same time. For a single thread of operations that can bring some performance. However, that is only a side effect of building a 4-way FMAC FPU.
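The FADD/FMUL pipes discussed here operate on 128-bit SIMD registers, which is where per-instruction figures like "2 DP or 4 SP" come from: one 128-bit register holds two doubles or four floats, while scalar SSE2 or x87 code uses a single value. A minimal SSE2 C sketch (function names hypothetical) showing the three cases; it illustrates lane counts, not the throughput of any particular core.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* A 128-bit XMM register holds 2 doubles or 4 floats. */
static_assert(sizeof(__m128d) == 2 * sizeof(double), "2 DP lanes");
static_assert(sizeof(__m128) == 4 * sizeof(float), "4 SP lanes");

/* Packed double add (typically ADDPD): two DP additions per instruction. */
void add_pd2(const double *a, const double *b, double *out)
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* Packed single add (typically ADDPS): four SP additions per instruction. */
void add_ps4(const float *a, const float *b, float *out)
{
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* Scalar double add (typically ADDSD): only the low lane does useful work. */
double add_sd1(double a, double b)
{
    return _mm_cvtsd_f64(_mm_add_sd(_mm_set_sd(a), _mm_set_sd(b)));
}
```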
The P4 x87 FPU was weak because of the P4 microarchitecture. The P4's front end is quite narrow: there is only one decoder and no L1-I cache, only a trace cache instead. A trace cache isn't that bad an idea, but with such a front end it is pointless. Second, P4 has only two FP pipes: one for FP move and store, and a second for FP ADD/MUL (SIMD and x87, including MMX and integer SSE), and this unit is 128 bits wide. Because of that, 128-bit SSE2 throughput isn't that bad on P4, but it is limited to 2 DP or 4 SP FP ops/cycle. FADD or FMUL on NetBurst has the same throughput as FADD or FMUL on Nehalem or 10h, and double the throughput of K8, because 128-bit SSEx instructions on K8 are executed as two macro-ops (double dispatch). K8 has two 64-bit SIMD units and 10h has two 128-bit SIMD units; they were widened in order to retire 128-bit SIMD instructions in one cycle. 10h has double the FP throughput per core.
Quote:
Bulldozer shows a bit of this type of evolution because it boasts much more integer potential compared to its floating-point capabilities; however, it's not the first time we've seen something like this. Don't you guys remember that the Pentium 4 x87 FPU was quite weak and Intel was pushing applications to use SSE/SSE2 instead? However, AMD's timing on this one seems a better bet, and we still don't know how much legacy performance is sacrificed. If GPGPU takes over FPU tasks, that is when Fusion will really kick in, as you will have powerful GPU resources in place, ready to take over.
Today's GPUs can handle a variety of tasks very well, but not all. The next generation of AMD GPUs will be more CPU-like. With OpenCL, FP-intensive tasks can be handled by both the CPU and GPU simultaneously, and that is the beauty of using OCL. OpenCL code is highly optimised for SIMD usage on CPUs.
The future is Fusion! ;) Next generations of AMD GPUs could execute CPU code, especially SIMD (SSE, AVX, FMA), with some kind of software layer like the Java VM or Flash. That software layer is called AMD IL (Intermediate Language) or something like that. It makes a lot of sense to use with next-gen APUs, which have a shared address space with the integrated GPU (this is an APU-only feature, and the IGP in next-gen APUs will use x86-64 address pointers).
There is no dedicated hardware, because the same hardware is used for SIMD and x87. In AMD's microarchitecture, x86 instructions are decoded into MOPs (macro-operations). A MOP is a pair of ALU/MEM operations. In later pipeline stages, the ALU and MEM operations are dispatched to the execution units as micro-operations when they reach the reservation stations or schedulers. There is one difference between 10h and BD. 10h, like all previous designs from K7 onward, has a dedicated integer memory scheduler, and macro-ops have lanes: you cannot easily switch a macro-op to another lane. BD has a unified integer scheduler. Every macro- and micro-op in the scheduler can execute on any free execution unit once its operands have been loaded from the data cache into the register file.
Quote:
Also, I'm very interested in seeing how future designs evolve. You may not need dedicated hardware for x87 or other obsolete instruction sets beyond the hardware decoder stage; after all, while the multiple instruction sets and extensions are compatible with a ton of x86 processor architectures, each works internally very differently after instructions are converted to micro-ops. As Fusion evolves, you may see a very powerful FPU/GPU that is fed decoded x87, SSE and GPU micro-ops.
How do you think GPU instructions could be decoded and scheduled inside the CPU? That isn't possible. I've explained here what AMD wants to do.
CPU instructions are decoded inside the CPU and GPU instructions are executed inside the GPU, but the GPU can execute all SIMD; because of that, AMD can make a software context switch between CPU and GPU to handle all instructions with all hardware.
Here is the picture:
Attachment 119179
Every x86 instruction proceeds directly to the CPU, and every CPU SIMD instruction proceeds to the CPU or is translated for the GPU. That is the concept: the CPU handles serial and parallel workloads, and the GPU works on massively parallel data.
Attachment 119180
I have been out of it for a couple of weeks; anything new on BD?
http://tof.canardpc.com/view/0ef0ddf...62769287c1.jpg
http://semiaccurate.com/forums/showp...&postcount=257
Quote:
315mm^2 according to AMD.
Edit:
AMD presents details of the Bulldozer architecture at Hot Chips 23 - Technic3D
http://tof.canardpc.com/view/9ef886d...bea958a146.jpg
http://tof.canardpc.com/view/9cbde70...8e8f3ea5fc.jpg
Quote:
The performance cannot be judged from this information; only future tests will provide more insight. Based on the known figures, it can be estimated, however roughly, that the Zambezi processors cannot compete with the top models of the upcoming Sandy Bridge EX line from Intel. According to rumors, the per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market. AMD will reveal in the "near future" whether Bulldozer surprises or even disappoints.
It's not the slide he is pushing but the quote below it!
He is a Dr Who lover...
Quote:
The performance cannot be judged from this information; only future tests will provide more insight. Based on the known figures, it can be estimated, however roughly, that the Zambezi processors cannot compete with the top models of the upcoming Sandy Bridge EX line from Intel. According to rumors, the per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market. AMD will reveal in the "near future" whether Bulldozer surprises or even disappoints.
This part is BS. Single-threaded performance at the same frequency can't be lower than K10.5, or BD could never be faster in multi-threaded work than the SB 2600, even if HT gives much smaller performance gains than a dedicated core or module.
Quote:
According to rumors, the per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market.
That makes no sense, because AMD has no reason to make a chip that discards its own concept. Shared resources in a module aren't that much of a problem for a variety of workloads. Because of that, BD must have slightly stronger per-thread, per-core IPC, and much stronger per-module IPC. Otherwise the whole BD concept is nonsense and AMD R&D are complete fools, because they could have simply "upgraded" 10h to eight cores with Turbo Core 2 logic. That hypothetical X8 could outpace SB in multithreading.
Next slides from Hot Chips:
http://img18.imageshack.us/img18/9189/a5464745s.th.jpg
http://img198.imageshack.us/img198/7...464735s.th.jpg
http://img577.imageshack.us/img577/4...464736s.th.jpg
http://img844.imageshack.us/img844/5...464737s.th.jpg
http://img820.imageshack.us/img820/701/a5464738s.th.jpg
http://img40.imageshack.us/img40/7336/a5464739s.th.jpg
http://img828.imageshack.us/img828/5...464740s.th.jpg
http://img812.imageshack.us/img812/6...464741s.th.jpg
http://img688.imageshack.us/img688/8...464742s.th.jpg
http://img847.imageshack.us/img847/5...464743s.th.jpg
http://img836.imageshack.us/img836/5...464744s.th.jpg
I won't say, "I told you so" until official release...
Did they have a security guard near the cage? AMD is showing it like some priceless relic :(
On the other hand, that case looks really nice. Anyone know who made it and what it's called? :)