What about the Hot Chips conference? Some new info about the BD architecture would be great.
drfedja, even if you are right, which programs other than SuperPI or HyperPI still use the x87 instruction set?
Not necessarily. A hypothetical 64-bit version of SPi or HyperPi would use only SIMD or scalar SSE instructions. 32-bit programs can use both x87 and/or SSEx.
64-bit programs run in long mode. On a 64-bit operating system, 32-bit programs also run in long mode, but in compatibility mode, where all legacy instructions are allowed. As you know, 32-bit programs run unchanged on a 64-bit system, without recompiling. ;)
Every program that was compiled as 32-bit code with x87.
drfedja, actually I was asking which other programs used for testing CPUs still use x87.
I am asking because this is the only test where the SB 2500 is almost 2x faster than the X4 980.
I don't think any other test program uses x87 except the Pi benchmarks.
At least not when 64-bit versions of other benchmarks are used.
For example, the 32-bit version of Cinebench may use x87, as may some game engines, the nVidia physics engine for CPU (Physics87 :d ), some old codecs, and FP-intensive applications that don't use SSE, for example Euler3D. I don't know for sure, but if you want to know which apps use x87, you can use AMD CodeAnalyst or a similar code profiling tool, e.g. Intel VTune.
No modern app uses x87 any more. That's for sure. ;) But SPi isn't a modern app. It is a single-threaded, 32-bit program coded with legacy x87 FP. :P
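For a rough check without a profiler, you can also count mnemonics in a disassembly. Below is a minimal Python sketch, assuming `objdump -d` style AT&T text as input. The mnemonic lists are deliberately incomplete, and `count_fp_mix` and the `SAMPLE` listing are made-up names and hand-written data for illustration, not output from any real benchmark.

```python
import re
from collections import Counter

# Heuristic, incomplete mnemonic lists (assumption: AT&T syntax from objdump).
X87_PREFIXES = ("fld", "fst", "fild", "fist", "fadd", "fsub", "fmul",
                "fdiv", "fsqrt", "fxch", "fcom", "fchs", "fabs")
SSE_SCALAR = {"addss", "addsd", "subss", "subsd", "mulss", "mulsd",
              "divss", "divsd", "sqrtss", "sqrtsd"}
SSE_PACKED = {"addps", "addpd", "subps", "subpd", "mulps", "mulpd",
              "divps", "divpd", "movaps", "movapd", "movups", "movupd"}

# Matches "  address:<TAB>hex bytes<TAB>mnemonic operands".
LINE_RE = re.compile(r"^\s*[0-9a-f]+:\t[0-9a-f ]+\t(\S+)")


def classify(mnemonic):
    """Put one mnemonic into a coarse instruction class."""
    if mnemonic.startswith(X87_PREFIXES):
        return "x87"
    if mnemonic in SSE_SCALAR:
        return "sse_scalar"
    if mnemonic in SSE_PACKED:
        return "sse_packed"
    return "other"


def count_fp_mix(disassembly):
    """Count instruction classes in objdump-style disassembly text."""
    counts = Counter()
    for line in disassembly.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[classify(match.group(1))] += 1
    return counts


# Hand-written sample lines, not taken from a real program.
SAMPLE = "\n".join([
    " 8048400:\tdd 45 f8             \tfldl   -0x8(%ebp)",
    " 8048403:\tdc 4d f0             \tfmull  -0x10(%ebp)",
    " 8048406:\tdd 5d e8             \tfstpl  -0x18(%ebp)",
    " 8048409:\tf2 0f 58 c1          \taddsd  %xmm1,%xmm0",
    " 804840d:\t89 e5                \tmov    %esp,%ebp",
])

if __name__ == "__main__":
    print(dict(count_fp_mix(SAMPLE)))
```

On a real binary you would feed it the text produced by `objdump -d` instead of `SAMPLE`. This is a static count only; a profiler such as CodeAnalyst or VTune still tells you more, because it shows which of those instructions are actually hot.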
Of course, because a 64-bit compiled app normally doesn't use legacy x87 instructions. x87 is still there in 64-bit mode, but 64-bit compilers default to SSE2 for floating point.
drfedja, thanks, that's what I wanted to know. That means SuperPI is not fit to be representative for single threaded performance in modern applications.
In general, a floating-point-intensive application has roughly 55% FP code and 45% integer code, while an integer app is all integer code, with more memory operations (mov, push, pop, call, ret) and fewer logical and arithmetic ones.
No, it isn't, because it is like benchmarking a modern CPU with a 16-bit DOS program. It is pointless.
Quote:
That means SuperPI is not fit to be representative for single threaded performance in modern applications.
If you want to see how a CPU performs at calculating Pi, there are much better apps and programs that calculate Pi digits a hundred times faster than Super Pi. For example, Wolfram Mathematica calculates 1M Pi digits in a fraction of a second on an old dual-core Opteron 170 (S939).
What can Super Pi tell you about a new CPU? Almost nothing, except how it performs with cache and stack memory operations. SPi can't even tell you how fast a CPU handles legacy x87 FP operations, because SPi is mixed code and is limited by cache misses and memory reordering.
The only real application for SuperPi is in overclocking competitions.
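To illustrate that computing Pi digits does not need x87 at all, here is a minimal Python sketch using Machin's formula with plain integer arithmetic. This is an assumption-laden illustration: it is neither Super Pi's nor Mathematica's method, and the names `arctan_inv`, `pi_digits` and the `guard` parameter are made up for this example.

```python
def arctan_inv(x, unity):
    """Return arctan(1/x) scaled by `unity`, using the Taylor series
    arctan(1/x) = 1/x - 1/(3x^3) + 1/(5x^5) - ... in integer arithmetic."""
    term = unity // x          # current power term: unity / x^(2k+1)
    total = term
    x_squared = x * x
    divisor = 3
    sign = -1
    while term:
        term //= x_squared
        total += sign * (term // divisor)
        sign = -sign
        divisor += 2
    return total


def pi_digits(digits, guard=10):
    """Return Pi as a digit string: '3' followed by `digits` decimals.

    Machin's formula: pi = 4 * (4*arctan(1/5) - arctan(1/239)).
    `guard` extra digits absorb the truncation error of the integer divisions.
    """
    unity = 10 ** (digits + guard)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return str(pi_scaled // 10 ** guard)


if __name__ == "__main__":
    print(pi_digits(30))
```

Machin's formula is far from the fastest approach for millions of digits, so do not read any timing out of this. It only shows that the whole job can be done with integer operations, which is why a Pi result says little about a CPU's FP hardware as such.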
Talking about raw Hardware evolution, as GPGPU potential is tapped for floating point performance, you will be less required to use legacy Instruction Sets and Hardware like the x87 FPU (that supposedly was a pain to program for in Assembler). x87 is already obsolete if you're running in Long Mode, something that has been available for 8 years, so die area and power are wasted on a lot of legacy components that make less sense as time advances but are kept around just for legacy compatibility.
Bulldozer shows a bit of this type of evolution because it boasts much more integer potential compared to its floating point capabilities. However, it's not the first time we see something like this. Don't you guys remember that the Pentium 4 x87 FPU was quite weak and Intel was pushing applications to use SSE/SSE2 instead? However, AMD's timing on this one seems a better bet, and we still don't know how much legacy performance is sacrificed. If GPGPU takes over FPU tasks, that is when Fusion will really kick in, as you will have powerful GPU resources in place ready to take over.
Also, I'm very interested in seeing how future designs evolve. You may not need dedicated Hardware for x87 or other obsolete Instruction Sets beyond the Hardware decoder stage. After all, while the many Instruction Sets and Extensions are compatible across a ton of x86 Processor architectures, each architecture works very differently internally once instructions are converted to MicroOps. As Fusion evolves you may see a very powerful FPU/GPU that is fed decoded x87, SSE and GPU MicroOps.
I agree, but x87 FPU performance is relatively important today. I don't think AMD has sacrificed x87 FP, because it uses the same hardware resources as SIMD.
FMAC 0 and FMAC 1 can handle every FMUL or FADD instruction, including x87. There is one difference from 10h: the BD FPU can handle two FMUL or two FADD instructions at the same time. For a thread that issues mostly one type of operation, that can bring some extra performance. However, that is only a side effect of building a 4-way FPU around FMACs.
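As a side note on what the "fused" in FMAC (fused multiply-accumulate) means numerically: a fused unit rounds once, after the multiply and the add, while a separate FMUL followed by FADD rounds twice. The Python sketch below only emulates this with exact rationals. It says nothing about how the BD hardware is built, and `fused_multiply_add` and `separate_multiply_add` are made-up helper names.

```python
from fractions import Fraction


def separate_multiply_add(a, b, c):
    """FMUL then FADD: the product is rounded to double before the add."""
    return a * b + c


def fused_multiply_add(a, b, c):
    """Emulated fused multiply-add: compute a*b + c exactly with rationals,
    then round to double a single time at the end."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Inputs chosen so that the low bit of the product is lost by the first rounding.
a = 1.0 + 2.0 ** -27
b = 1.0 + 2.0 ** -27
c = -(1.0 + 2.0 ** -26)

if __name__ == "__main__":
    print("separate:", separate_multiply_add(a, b, c))
    print("fused:   ", fused_multiply_add(a, b, c))
```

With these inputs the exact product is 1 + 2^-26 + 2^-54. The separate version drops the 2^-54 term when the product is rounded and ends up with zero, while the fused version keeps it.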
The P4's x87 FPU was weak because of the P4 microarchitecture. The P4 front end is quite narrow: there is only one decoder and no L1-I cache, only a trace cache. A trace cache isn't that bad an idea, but with such a front end it is pointless. Second, the P4 has only two FP pipes: one for FP move and store, and a second for FP ADD/MUL (SIMD and x87, including MMX and integer SSE), and this unit is 128 bits wide. Because of that, 128-bit SSE2 throughput isn't that bad on the P4, but it is limited to 2 DP or 4 SP FP ops/cycle. FADD or FMUL on NetBurst has the same throughput as FADD or FMUL on Nehalem or 10h, and double the throughput of K8, because a 128-bit SSEx instruction on the K8 is executed as two macro-ops (double dispatch). K8 has two 64-bit SIMD units and 10h has two 128-bit SIMD units. They were widened in order to retire 128-bit SIMD instructions in one cycle, so 10h has double the FP throughput per core.
Quote:
Bulldozer shows a bit of this type of evolution because it boasts much more integer potential compared to its floating point capabilities. However, it's not the first time we see something like this. Don't you guys remember that the Pentium 4 x87 FPU was quite weak and Intel was pushing applications to use SSE/SSE2 instead? However, AMD's timing on this one seems a better bet, and we still don't know how much legacy performance is sacrificed. If GPGPU takes over FPU tasks, that is when Fusion will really kick in, as you will have powerful GPU resources in place ready to take over.
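The SSE2 numbers in the P4/K8/10h comparison above are just unit width divided by element size. A minimal Python sketch of that arithmetic follows, taking the unit widths exactly as stated in the post (they are not verified here); the function names are made up for illustration.

```python
def elements_per_cycle(unit_bits, element_bits):
    """FP elements one SIMD unit can process per cycle at full width."""
    return unit_bits // element_bits


def passes_per_128bit_op(unit_bits):
    """How many macro-ops (passes) one 128-bit SSE instruction needs
    on a unit of the given width."""
    return max(1, 128 // unit_bits)


if __name__ == "__main__":
    # 128-bit unit: 2 double-precision or 4 single-precision elements per cycle.
    print("128-bit unit, DP:", elements_per_cycle(128, 64))
    print("128-bit unit, SP:", elements_per_cycle(128, 32))
    # 64-bit unit (K8 as described): a 128-bit op is split in two.
    print("64-bit unit, passes per 128-bit op:", passes_per_128bit_op(64))
    print("128-bit unit, passes per 128-bit op:", passes_per_128bit_op(128))
```

This is peak arithmetic only. It ignores latency, the front end and memory, which is exactly where the post says the P4 loses out.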
Today's GPUs can handle a variety of tasks very well, but not all. The next generation of AMD GPUs will be more CPU-like. With OpenCL, FP-intensive tasks can be handled by both the CPU and the GPU simultaneously, and that is the beauty of using OCL. OpenCL code is highly optimised for SIMD usage on CPUs.
The future is Fusion! ;) Next generations of AMD GPUs could execute CPU code, especially SIMD (SSE, AVX, FMA), through some kind of software layer like the Java VM or Flash. That software switch is named AMD IL (intermediate language) or something like that. It makes a lot of sense for next-gen APUs, which have an address space shared with the integrated GPU (this is an APU-only feature, and the IGP in next-gen APUs will use x86-64 address pointers).
There is no dedicated hardware, because the same hardware is used for SIMD and x87. In AMD's microarchitecture, x86 instructions are decoded to MOPs (macro-operations). A MOP is a pair of ALU/MEM operations. In later pipeline stages, the ALU and MEM operations are dispatched to the execution units as micro-operations once they reach the reservation stations or schedulers. There is one difference between 10h and BD. 10h, like all previous designs since K7, has a dedicated integer memory scheduler, and macro-ops are bound to lanes: you cannot easily switch a macro-op to another lane. BD has a unified integer scheduler. Every macro-op and micro-op in the scheduler can execute on any free execution unit once its operands have been loaded from the data cache into the register file.
Quote:
Also, I'm very interested in seeing how future designs evolve. You may not need dedicated Hardware for x87 or other obsolete Instruction Sets beyond the Hardware decoder stage. After all, while the many Instruction Sets and Extensions are compatible across a ton of x86 Processor architectures, each architecture works very differently internally once instructions are converted to MicroOps. As Fusion evolves you may see a very powerful FPU/GPU that is fed decoded x87, SSE and GPU MicroOps.
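The lane-bound versus unified scheduler difference described above for 10h and BD can be pictured with a toy model. The Python sketch below is a heavy simplification under stated assumptions: every op is independent, ready and takes one cycle, and the lane and unit counts are just parameters, not figures from AMD. The function names are hypothetical.

```python
from math import ceil


def cycles_lane_bound(ops_per_lane):
    """Each op is tied to the lane it was dispatched to, one op per lane
    per cycle, so the fullest lane decides when everything is done."""
    return max(ops_per_lane)


def cycles_unified(ops_per_lane, units=None):
    """Any ready op may issue to any free unit, so the same total work
    spreads evenly over the units."""
    if units is None:
        units = len(ops_per_lane)
    return ceil(sum(ops_per_lane) / units)


if __name__ == "__main__":
    unbalanced = [6, 1, 2]   # ops that happened to land in lanes 0, 1, 2
    print("lane-bound:", cycles_lane_bound(unbalanced), "cycles")
    print("unified:   ", cycles_unified(unbalanced), "cycles")
```

When the lanes are evenly loaded the two models give the same answer. The unified scheduler only helps when the dispatch order leaves some lanes idle, and real schedulers are limited by dependencies and operand readiness, which this toy ignores.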
How do you expect GPU instructions to be decoded and scheduled inside the CPU? That isn't possible. I've explained here what AMD wants to do.
CPU instructions are decoded inside the CPU and GPU instructions are executed inside the GPU, but the GPU can execute all SIMD. Because of that, AMD can do a software context switch between CPU and GPU so that all instructions can be handled with all the hardware.
Here is the picture:
Attachment 119179
Every x86 instruction proceeds directly to the CPU, and every CPU SIMD instruction either proceeds to the CPU or is translated for the GPU. That is the concept: the CPU can handle serial and parallel workloads, and the GPU works on massively parallel data.
Attachment 119180
I have been out of it for a couple of weeks, anything new on BD?
http://tof.canardpc.com/view/0ef0ddf...62769287c1.jpg
http://semiaccurate.com/forums/showp...&postcount=257
Quote:
315mm^2 according to AMD.
Edit :
AMD presents details of the Bulldozer architecture at Hot Chips 23 - Technic3D
http://tof.canardpc.com/view/9ef886d...bea958a146.jpg
http://tof.canardpc.com/view/9cbde70...8e8f3ea5fc.jpg
Quote:
Performance cannot be judged on the basis of this information; only future tests will provide more insight. Based on the known figures it can be roughly estimated, however, that the Zambezi processors cannot compete with the top model among Intel's upcoming Sandy Bridge EX models. According to rumors, per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market. The "near future", as AMD put it, will therefore show whether Bulldozer surprises or even disappoints.
It's not the slide he is pushing but the quote below it!
He is a Dr Who lover..............................
Quote:
Performance cannot be judged on the basis of this information; only future tests will provide more insight. Based on the known figures it can be roughly estimated, however, that the Zambezi processors cannot compete with the top model among Intel's upcoming Sandy Bridge EX models. According to rumors, per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market. The "near future", as AMD put it, will therefore show whether Bulldozer surprises or even disappoints.
This part is BS. Single-threaded performance at the same frequency can't be lower than K10.5, or BD could never be faster in multi-threaded work than the SB 2600, even if HT gives much smaller performance gains than a dedicated core or module.
Quote:
According to rumors, per-MHz performance is even lower than that of the current Phenom CPUs. The focus is thus clearly on multi-threading and the server market.
That makes no sense, because AMD has no reason to make a chip that defeats its own concept. Shared resources in a module aren't much of a problem for a variety of workloads. Because of that, BD must have slightly stronger per-thread, per-core IPC and much stronger per-module IPC. Otherwise the whole BD concept is nonsense and AMD's R&D people are complete fools, because they could simply "upgrade" 10h to eight cores with Turbo Core 2 logic. That hypothetical X8 could outpace SB in multithreading.
next slides from Hotchips
http://img18.imageshack.us/img18/9189/a5464745s.th.jpg
http://img198.imageshack.us/img198/7...464735s.th.jpg
http://img577.imageshack.us/img577/4...464736s.th.jpg
http://img844.imageshack.us/img844/5...464737s.th.jpg
http://img820.imageshack.us/img820/701/a5464738s.th.jpg
http://img40.imageshack.us/img40/7336/a5464739s.th.jpg
http://img828.imageshack.us/img828/5...464740s.th.jpg
http://img812.imageshack.us/img812/6...464741s.th.jpg
http://img688.imageshack.us/img688/8...464742s.th.jpg
http://img847.imageshack.us/img847/5...464743s.th.jpg
http://img836.imageshack.us/img836/5...464744s.th.jpg
I won't say, "I told you so" until official release...
did they have a security guard near the cage? AMD is showing it as some priceless relic :(
on the other hand that case looks really nice. Anyone know who made it and what the name is? :)