Is there a Bulldozer wafer at CES, JF-AMD?
I know what pipeline stalls are. By the way, a stall can have causes other than a cache miss, for example a mispredicted branch. But my question was not about cache misses or pipeline stalls; it was about core efficiency (in other words, about how well the core's resources are utilized). Some time ago I saw some research on instruction-level parallelism. Its main conclusion was that an average program sustains about 2 instructions per cycle. Phenom has more than 6 execution units and 3 pipelines per core, and Nehalem has 5 execution ports and 4 pipelines per core, which means core resource utilization on those CPUs is under 50% on average. I would say the main advantage of HT is not the ability to run one thread while another stalls, but the ability to run both threads in parallel, with each thread able to schedule instructions to whichever execution units are free in a given cycle. This is why Nehalem, despite its shorter pipeline and faster caches, still benefits from HT more than the Pentium 4 did: Nehalem simply has more execution resources.
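The utilization argument above can be sketched with a toy issue-slot model. This is a minimal sketch under stated assumptions, not a model of Phenom or Nehalem: the 4-wide issue width, the per-cycle instruction counts, and the `utilization` helper are all hypothetical, and the model ignores dependencies that carry over between cycles.

```python
def utilization(width, *threads):
    """Fraction of issue slots filled, for one or more threads sharing a core.

    Each thread is a list giving how many independent instructions it could
    issue in each cycle (a hypothetical ILP trace). In every cycle the threads'
    ready instructions compete for `width` slots; in this toy model anything
    beyond `width` is dropped rather than carried into the next cycle.
    """
    cycles = len(threads[0])
    used = 0
    for c in range(cycles):
        ready = sum(t[c] for t in threads)
        used += min(ready, width)
    return used / (width * cycles)


# Hypothetical traces, each averaging 2 instructions per cycle.
thread_a = [3, 1, 2, 0, 4, 2]
thread_b = [1, 2, 2, 3, 0, 4]

print(utilization(4, thread_a))            # one thread alone on a 4-wide core
print(utilization(4, thread_a, thread_b))  # two SMT threads sharing the slots
```

With these made-up traces one thread fills half of the slots, while two threads together fill 22 of 24, which is the effect the post attributes to HT.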
I was not talking about "thread-level parallelism" but about "instruction-level parallelism".
Quote:
For the most part, how much parallelism can grow depends on the OS schedulers. OSes deployed 3 years ago were written when single cores ruled the earth. OSes deployed today were designed more around dual cores, and to a small extent quad cores, so they do a better job of scheduling. The OSes you will use in 3 years will do much better than today's. It is all a progression. Saying you won't need more cores in the future because today's OSes don't use all of them is like saying that a 1TB drive is too big. Give people enough storage space and they will fill it. Give them enough cores and they will figure out how to use them.
My notebook probably has 50 different services running (and 3-4 actual programs). There is always a use for more cores, the OS just needs to come along for the ride - and that will be happening.
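To illustrate why a scheduler can put extra cores to work when many independent tasks are runnable, here is a toy sketch. It uses greedy longest-task-first list scheduling, a textbook heuristic and not the algorithm of any particular OS; the task costs and the `assign` helper are hypothetical.

```python
import heapq


def assign(tasks, n_cores):
    """Greedy list scheduling: hand each task (longest first) to the core
    with the least accumulated work. Returns the total work per core."""
    heap = [(0, core) for core in range(n_cores)]  # (current load, core id)
    heapq.heapify(heap)
    loads = [0] * n_cores
    for cost in sorted(tasks, reverse=True):
        load, core = heapq.heappop(heap)  # least-loaded core so far
        loads[core] = load + cost
        heapq.heappush(heap, (loads[core], core))
    return loads


# Hypothetical work units for a handful of background services and programs.
tasks = [5, 3, 3, 2, 2, 1, 1, 1]
for n in (1, 2, 4):
    # The busiest core determines when all of the work is finished.
    print(n, "cores -> finish time", max(assign(tasks, n)))
```

Here the finish time drops from 18 to 9 to 5 units as cores are added, but only because there are enough independent tasks to spread across them.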
Sorry for my bad English.
I won't comment on single thread performance because I do not have the numbers. The bulldozer architecture is a multi-core, multithreaded architecture.
Single core performance is going to become less important over time as applications become more multithreaded. I am on the server side, I don't deal with client systems or applications at all. For my customers single threaded performance is far less important because all of the apps are multi-threaded.
The proper server metric is throughput, which is a measure of all cores, not the clock speed of one core.
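A back-of-the-envelope way to see the difference between throughput and per-core speed. The core counts, clock speeds, and the flat IPC of 1.0 are hypothetical round numbers, not figures for any real AMD or Intel part, and the model optimistically assumes throughput scales linearly with core count.

```python
def single_thread_rate(ghz, ipc):
    """Billions of instructions per second for one thread on one core."""
    return ghz * ipc


def throughput(cores, ghz, ipc):
    """Aggregate billions of instructions per second with every core busy,
    assuming (optimistically) linear scaling with the number of cores."""
    return cores * single_thread_rate(ghz, ipc)


# Hypothetical parts: a high-clocked dual core vs. a lower-clocked 12-core.
dual = dict(cores=2, ghz=3.0, ipc=1.0)
many = dict(cores=12, ghz=2.0, ipc=1.0)

print(single_thread_rate(dual["ghz"], dual["ipc"]), throughput(**dual))
print(single_thread_rate(many["ghz"], many["ipc"]), throughput(**many))
```

In this example the dual core wins on single-thread speed (3.0 vs. 2.0), but the 12-core part delivers four times the aggregate throughput (24.0 vs. 6.0).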
Customers that care about single-threaded performance in servers would be staying with dual cores, because those have the highest clocks. But we see customers abandoning duals in the server world like crazy.
Bulldozer will have more cores than magny cours, but the increase in performance between the two will be higher than the percentage increase in cores, so, to answer in a roundabout way, the cores will be faster, not slower.
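The "roundabout" argument is simple arithmetic: if total performance grows by a larger factor than the core count, performance per core must have gone up. The sketch below uses 12 and 16 cores as example counts and a 1.5x total speedup that is purely hypothetical, not an AMD figure; `per_core_ratio` is an illustrative helper.

```python
def per_core_ratio(perf_ratio, old_cores, new_cores):
    """Per-core performance of the new chip relative to the old one,
    given the ratio of total (all-core) performance."""
    return perf_ratio * old_cores / new_cores


# Hypothetical: 12 -> 16 cores (+33%) with +50% total performance.
print(per_core_ratio(1.5, 12, 16))  # above 1.0 means each core is faster
# Break-even case: total performance grows exactly as fast as core count.
print(per_core_ratio(16 / 12, 12, 16))
```

Any total speedup above the core-count ratio (here 16/12) implies faster individual cores.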
Selling a slower chip would in fact be suicide, and AMD doesn't want to commit suicide.
Bulldozer will be faster than the previous CPU.
QED.
That's true, and from a server point of view that's perfectly acceptable. However, increases in single-thread performance, meaning increases in each core's performance, will have a direct impact on throughput. Also, the Bulldozer architecture is going to be in desktop and mobile too, where single-thread performance is still much more important than multithreaded performance, and I can't see that changing for a long time. If BD cores are not "significantly" faster than Shanghai, I can't see AMD changing today's situation, where the competition is faster at almost all levels, including the server space, where they're attacking you with parts that are less expensive (for them), have better perf/watt, and have better absolute performance.
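The point about single-thread performance can be made concrete with Amdahl's law. This is a minimal sketch assuming a workload where a hypothetical 50% of the work parallelizes perfectly and the rest is strictly serial; the fraction, core counts, and 20% per-core speedup are illustrative values, not measurements.

```python
def amdahl_speedup(p, n, core_speed=1.0):
    """Speedup relative to one baseline core (Amdahl's law).

    p: fraction of the work that parallelizes perfectly (0..1)
    n: number of cores the parallel part is spread over
    core_speed: per-core speed relative to the baseline core
    """
    serial_time = (1 - p) / core_speed
    parallel_time = p / (n * core_speed)
    return 1 / (serial_time + parallel_time)


p = 0.5  # hypothetical: half of the program is parallel
print(amdahl_speedup(p, 4))                  # 4 baseline cores
print(amdahl_speedup(p, 8))                  # twice as many cores
print(amdahl_speedup(p, 4, core_speed=1.2))  # 4 cores, each 20% faster
```

In this example doubling the core count from 4 to 8 raises the speedup only from 1.6 to about 1.78, while making each of the 4 cores 20% faster raises it to 1.92, because faster cores also shrink the serial part.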
I think the problem is the balance between tech that works great now and tech that will last a long time.
Most people are still happy with the X2 4400+ they bought 5 years ago. I just upgraded my 4850 to a 5850 and saw no increase in WoW; I guess that game is somehow CPU-limited, even though my PII 940 never has one core above 70%. Some things just don't scale today, but if everyone offered only quads and above, maybe software would catch up faster. It would be nice to know the single-threaded jump, since many consumer programs probably won't be rewritten for quads and above. But I think if I got an 8-core BD, I should hopefully be set for 4 years before I need to upgrade again.
@JF-AMD Is this an accurate approximation?
http://img710.imageshack.us/img710/6794/quadcore.jpg
Can the 128-bit FMACs be combined for 256-bit operations?
That's a diagram of a quad-core Zambezi, yes. The FPU in one module (two 128-bit units) can be used either by one integer core as a single 256-bit unit, or by both integer cores sharing it SMT-style as 2x128-bit. Each of the 128-bit units in the FPU is FMAC-capable, so in effect it is much, much faster than what we have today in K10 or Nehalem.
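The FPU arrangement described above can be put into rough numbers. This is a toy model, assuming that a 256-bit operation executes as two 128-bit halves, one per FMAC unit, and that an FMAC completes one fused multiply-add (counted as 2 FLOPs) per 64-bit lane per cycle; the K10 line assumes one 128-bit add unit plus one 128-bit multiply unit per core. The helper names are hypothetical, and real scheduling and latencies are ignored.

```python
DOUBLE_BITS = 64


def dp_flops_per_cycle(unit_bits, units, flops_per_lane):
    """Peak double-precision FLOPs per cycle for a set of identical SIMD units."""
    lanes = unit_bits // DOUBLE_BITS
    return lanes * units * flops_per_lane


def cycles_for_256bit_ops(n_ops, units_available):
    """Cycles to issue n 256-bit ops when each is split into two 128-bit
    halves and each available 128-bit unit accepts one half per cycle."""
    halves = 2 * n_ops
    return -(-halves // units_available)  # ceiling division


# Bulldozer module (assumed): two 128-bit FMAC units, FMA = multiply + add per lane.
print(dp_flops_per_cycle(128, 2, 2))
# K10 core (assumed): one 128-bit FADD and one 128-bit FMUL, 1 FLOP per lane each.
print(dp_flops_per_cycle(128, 2, 1))

# One integer core owning both FMACs vs. effectively having only one of them.
print(cycles_for_256bit_ops(10, 2), cycles_for_256bit_ops(10, 1))
```

Under these assumptions a module peaks at 8 DP FLOPs per cycle versus 4 for a K10 core; note that the module figure is shared by its two integer cores when both are issuing FP work.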
If you take the time to test some games, you'll see how CPU-limited they are where it matters most: minimum FPS. Crysis (yes, Crysis) runs exactly the same through almost all of the first and second levels with or without AA on my 5850. Play with the CPU frequency and you'll see some surprising numbers. Not to mention that any "old" game (FEAR counts as old here) is single-threaded only, and the situation there is exactly the same as with Crysis. Walk into a crowd in Assassin's Creed and vary GPU and CPU power, and more interesting numbers will come up. Getting a constant 120 FPS with VSync is almost impossible in most games because of the CPU, and that is with the fastest Intel processor, overclocked.
I think what I need to play with is resolution; I bet I get the same FPS at 1920x1200 as at 2560x1600 in WoW. The really annoying part with WoW was that no matter which settings I lowered, the FPS did not go up by much. I turned off shadows, particle effects, and projected textures; no matter what, I could only get maybe 30-40% more frames with the settings severely reduced. I'm not sure what the CPU does for WoW, but that game sure could use an update (or a bigger monitor; I really wish I could play with Eyefinity before I have to give the card up).
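The CPU-limited behaviour described in these posts fits a simple bottleneck model: if the CPU and GPU work on frames in a pipelined way, the frame rate is set by whichever takes longer per frame. In this sketch the CPU cost per frame is assumed not to depend on resolution, while the GPU cost is assumed to scale with pixel count; every millisecond figure is hypothetical.

```python
def frame_rate(cpu_ms, gpu_ms_per_mpixel, width, height):
    """Frames per second in a toy pipelined CPU/GPU model.

    cpu_ms: CPU time per frame (game logic, draw calls), resolution-independent
    gpu_ms_per_mpixel: GPU time per frame for each million pixels rendered
    """
    gpu_ms = gpu_ms_per_mpixel * width * height / 1_000_000
    return 1000 / max(cpu_ms, gpu_ms)


# Hypothetical: the CPU needs 25 ms per frame, the GPU 4 ms per megapixel.
print(frame_rate(25, 4, 1920, 1200))  # CPU-bound
print(frame_rate(25, 4, 2560, 1600))  # still CPU-bound at a higher resolution
print(frame_rate(25, 2, 2560, 1600))  # GPU twice as fast: no change
print(frame_rate(20, 4, 2560, 1600))  # CPU 25% faster: frame rate rises
```

As long as the CPU time dominates, lowering resolution or detail (GPU work) barely moves the frame rate, which matches what the WoW and Crysis posts report.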
Thanks, informal and JF-AMD, for clearing that up. Now I was wondering how STORE is handled in Bulldozer: if, say, the result of a 256-bit operation has to be stored, can that be done in one cycle, or will it need two?
Is a 256-bit STORE possible in one cycle, or does it have to be broken into two 128-bit STOREs?
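The question can be framed as how many store-datapath slots a 256-bit value needs. The thread does not establish Bulldozer's actual store width, so the sketch below treats the datapath width as a parameter and assumes one chunk per cycle; `store_256` is a hypothetical helper writing into a Python `bytearray` that stands in for memory.

```python
def store_256(memory, addr, value, datapath_bits=128):
    """Store a 256-bit unsigned integer little-endian into `memory`, in
    chunks no wider than the store datapath. Returns the cycles used,
    assuming one chunk can be written per cycle."""
    data = value.to_bytes(32, "little")  # 256 bits = 32 bytes
    chunk = datapath_bits // 8
    cycles = 0
    for off in range(0, 32, chunk):
        memory[addr + off: addr + off + chunk] = data[off: off + chunk]
        cycles += 1
    return cycles


mem = bytearray(64)
value = (1 << 255) | 0xABCD  # hypothetical 256-bit result
print(store_256(mem, 16, value))       # 128-bit datapath: two stores
print(store_256(mem, 16, value, 256))  # 256-bit datapath: one store
print(int.from_bytes(mem[16:48], "little") == value)
```

Either way the bytes in memory end up identical; only the number of store slots (cycles, in this model) changes with the datapath width.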
My degree is in economics, you're outside my range on that one. However, I seem to recall the answer is yes, but there is no knowledge behind my answer, only overhearing a similar response from an engineer in a meeting.
You won't hear anything about Bulldozer at CES. This far out from launch, all news will come out of the AMD Analyst Day presentations. We do them twice a year, in spring and fall. The fall one was just in November, and that is where the bulk of the Bulldozer data came from. Also keep an eye on my blogs; I will talk about Bulldozer from time to time, and there will be data clarifications.
JF,
Do you have any information on AMD's turbo implementation in Bulldozer?
Do you also have any information on the type of L3 cache Bulldozer will use?
Cheers :D.
There will be a turbo-type function in Bulldozer. I am not a big fan of features like this (and have said so on many occasions) because most server customers want lower power more than higher performance. I get more people asking me how to downclock the processor (we have a feature that turns off P-States) to reduce power consumption. I can't give details but I believe we have a better implementation.
As for the caches we have not released cache size data, that typically comes out at launch. Releasing cache sizes makes it easier for the other guys to try to model performance. The less they know, the better.
Lower power P-states make a ton of sense. The interesting thing is that all of these chips were always capable of running at more states than listed in the Thermal Spec guide. I never understood why AMD chose to limit it and hide all of the P-state setup in the BIOS. I reprogrammed the ACPI tables on all of my Opteron systems to add a new lower power state. It was easy enough once I had read enough documentation, but there's no good reason for this stuff to be hidden from users like that. (Of course, I'm also of the opinion that hiding *anything* is wrong, in the BIOS or wherever. Open source firmware, please.)
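Why an extra low P-state is worth having follows from the usual first-order CMOS estimate: dynamic power scales roughly with capacitance × voltage² × frequency, and lower P-states cut both voltage and frequency. The voltage/frequency pairs below are hypothetical, not values from any Opteron datasheet or ACPI table, and static (leakage) power is ignored.

```python
def dynamic_power(c_eff, volts, ghz):
    """First-order dynamic power estimate: P ~ C * V^2 * f (arbitrary units)."""
    return c_eff * volts ** 2 * ghz


# Hypothetical P-state table: name -> (core voltage in V, frequency in GHz).
p_states = {
    "P0": (1.30, 2.6),
    "P1": (1.15, 1.9),
    "P_low": (0.95, 0.8),  # an extra, lower state like the one added in the post
}

base = dynamic_power(1.0, *p_states["P0"])
for name, (v, f) in p_states.items():
    rel = dynamic_power(1.0, v, f) / base
    print(f"{name}: {f} GHz, {rel:.0%} of P0 dynamic power")
```

With these made-up numbers, P1 uses about 57% of P0's dynamic power and the extra low state about 16%, at under a third of the clock.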
AMD's OverDrive software for desktop/consumers was a step forward, making tweaks more accessible. But *Under*drive is more interesting, for machines that are on 24/7 but not always crunching at full load.
Welcome, JF-AMD. I have read your posts on AMDZone over time, and you seem to bring some good info to the forums, but I think it is only fair that you tell people you are a marketing guy at AMD, so when you post here you are working and naturally make things look good for AMD and awful for Intel.
@gallag He is doing the same thing Mark does for Intel; nothing wrong with that. Given that he cleared up some doubts I had about 256-bit execution, I would say it's a good thing to have marketing people with some inside knowledge; otherwise it's just people making claims and cursing...
Maybe all industry reps should just add it to their signature.
"I work for company X as Y"
That would be good.
Is Bulldozer supposed to be compatible with 800-series motherboards?
I hope to see 900-series boards with dual CPUs and onboard Lucid in the near future.