AMD says (http://www.youtube.com/watch?v=VIs1CxuUrpc)
"Synthesizable with small number of custom arrays"
Together with what was said before, I think one of the main goals AMD wants to achieve is easily customizable processors: add a GPU core here, some cache there, another core over there. From the slide it looks like much of the design is already capable of being laid out by a computer.
We have the caches, the integer units, and the floating-point units as the fixed, hand-optimized blocks, with logic like the x86 decode organically filling up the space in between. AMD also says it makes it easier to port the whole thing to a different process.
I have only limited knowledge of modern synthesis and floorplanning, from working with some FPGAs.
Maybe Hans or somebody in the industry can say something about Bobcat?
Not exactly sure which context you mean when you say they "do not floorplan", but they definitely allow floorplanning at some level. The first step of synthesis, RTL -> netlist, doesn't floorplan (it only cares about standard-cell usage and timing estimates/constraints), if that's what you're trying to get at. However, the second step of synthesis, netlist -> placement (the placement tool), definitely does floorplanning.
Tools like Cadence Encounter take floorplan constraints and allow sub-modules to be partitioned; however, as the picture above shows, the results tend to look like a jumbled mess, since strict boundaries aren't adhered to.
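To illustrate the partitioning step mentioned above, here is a toy min-cut bipartitioner in the spirit of Kernighan-Lin. It is purely a sketch: the cell and net names are invented, and real partitioners inside tools like Encounter are multilevel and timing-aware.

```python
# Toy min-cut bipartitioning (Kernighan-Lin flavored): given a netlist,
# repeatedly swap cell pairs across the cut while the number of cut nets
# shrinks. Only shows the core idea, not a production algorithm.

def cut_size(nets, side):
    """Number of nets whose endpoints land in different partitions."""
    return sum(1 for u, v in nets if side[u] != side[v])

def improve_partition(nets, side):
    """Greedy pairwise swaps until no single swap reduces the cut."""
    cells = sorted(side)
    best = cut_size(nets, side)
    improved = True
    while improved:
        improved = False
        for a in cells:
            for b in cells:
                if side[a] == side[b]:
                    continue
                side[a], side[b] = side[b], side[a]
                c = cut_size(nets, side)
                if c < best:
                    best, improved = c, True
                else:
                    side[a], side[b] = side[b], side[a]  # revert the swap
    return best

# Two tightly connected clusters joined by one net (names are made up):
nets = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
side = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}  # bad initial split
print(improve_partition(nets, side))  # → 1 (only the c-d net stays cut)
```

Starting from a deliberately scrambled assignment, the swap loop recovers the two natural clusters, leaving only the single inter-module net in the cut.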
Well, historically [x86] chips from both camps have mainly used custom layout in the datapath with a varying amount of synthesized control logic, so seeing pictures like this is a bit of an eye-opener compared to the norm.
Another example is Intel's Pine-Trail (bigger):
The huge purple blotch running down the middle is all synthesized logic
That pretty much sums up my thoughts on the matter: it looks like they went for a semi-custom approach, supplying some of the main datapath logic (not necessarily the whole FPU, just the important chunks) and the arrays as hard macros/external IP (in- or out-of-house, it doesn't matter) while synthesizing the rest.
While they're definitely not unique in this approach, it should certainly allow quicker process adaptation, since only a standard-cell library and select logic/array IP pieces would technically be necessary. Granted, there's still a bit more work involved than just swapping libraries/IP and pressing a few buttons.
Just pointing out that while we humans can definitely be more adept at coming up with clever (sometimes novel) solutions to layout area/timing/congestion constraints, it's also a significant capital and time investment, so for ROI and time-to-market reasons it doesn't always work out. Bobcat is an obvious example of this, and Atom for that matter.
Honestly, I would expect finding the logically optimal Euler path to be much easier for a computer to solve.
But yes, computerized tools aren't very good at balancing the plethora of added constraints of the physical world, hence our restricting them to sub-optimal standard cells plus wiring constraints.
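The Euler-path claim above is easy to demonstrate: finding such a path (the graph problem behind diffusion-sharing transistor ordering in a cell layout) is mechanically solvable with Hierholzer's algorithm. A minimal sketch, with a made-up 4-edge example graph:

```python
from collections import defaultdict

def euler_path(edges):
    """Hierholzer's algorithm: a walk using every edge exactly once."""
    adj = defaultdict(list)
    degree = defaultdict(int)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
        degree[u] += 1
        degree[v] += 1
    # An Euler path must start at an odd-degree vertex if one exists.
    odd = [n for n in degree if degree[n] % 2]
    start = odd[0] if odd else edges[0][0]
    stack, path = [start], []
    while stack:
        v = stack[-1]
        if adj[v]:                 # unused edge left: keep walking
            u = adj[v].pop()
            adj[u].remove(v)
            stack.append(u)
        else:                      # dead end: retire this vertex
            path.append(stack.pop())
    return path[::-1]

# A small series/parallel network as an undirected graph:
print(euler_path([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]))
# → ['a', 'c', 'b', 'a', 'd']
```

This is exactly the part computers handle trivially; the hard part is the surrounding physical constraints the following posts discuss.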
Another nice example is the 1.9 W TDP, 2 GHz hard-macro version of the dual-core ARM Cortex-A9 in the TSMC 40G process (total size of only 6.7 mm2).
http://www.arm.com/products/CPUs/Cor...ard-Macro.html
Regards, Hans
~~~~ http://www.chip-architect.org ~~~~ http://www.physics-quest.org ~~~~
The synthesis starts with rectangular shapes, but the logic migrates during the optimization process. Some pieces of one unit even end up in the middle of other units (typically interface logic between the two units). For some reason the hardware synthesizer concludes that it's electrically/timing-wise better to move it there.
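The migration effect described here can be reproduced with a toy 1-D placement sketch: cells get soft floorplan regions, but a net crossing two units can pull its cells out of their region when the wirelength gain beats the region penalty, which is how interface logic ends up between (or inside) other units. All names, numbers, and the greedy-descent optimizer are invented for illustration.

```python
import random

random.seed(0)

REGIONS = {"alu": (0.0, 0.5), "dec": (0.5, 1.0)}    # (xmin, xmax) per unit
cells = {"a1": "alu", "a2": "alu", "d1": "dec", "d2": "dec"}
nets = [("a1", "a2"), ("d1", "d2"), ("a2", "d1")]   # last net crosses units

def cost(pos, region_weight=0.1):
    """Total wirelength plus a soft penalty for leaving the home region."""
    wirelength = sum(abs(pos[u] - pos[v]) for u, v in nets)
    penalty = 0.0
    for c, unit in cells.items():
        lo, hi = REGIONS[unit]
        penalty += max(0.0, lo - pos[c]) + max(0.0, pos[c] - hi)
    return wirelength + region_weight * penalty

pos = {c: random.random() for c in cells}
best = cost(pos)
for _ in range(20000):                              # greedy random descent
    c = random.choice(list(cells))
    old = pos[c]
    pos[c] = min(1.0, max(0.0, old + random.gauss(0, 0.1)))
    new = cost(pos)
    if new < best:
        best = new
    else:
        pos[c] = old                                # reject worsening move

# The interface cells a2 and d1 drift toward the region boundary at 0.5,
# even though their home regions lie on opposite sides of it.
print(round(best, 3), round(pos["a2"], 2), round(pos["d1"], 2))
```

Because the soft penalty is weak relative to the crossing net, the optimizer happily parks interface cells at the unit boundary, just as the synthesized plots show.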
How do you know? In a true 12-thread load, do you think it would be over 70%? Maybe it uses four cores, and the HT threads easily get maxed out. Four fully loaded cores are around 66% of a hexa-core.
My point is: a hexa-core being 70% utilized can only mean at least four cores' worth of work. It could be spread over all twelve threads, but there is no way for you to see it, and since it isn't 100%, it seems like it isn't using all cores.
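A quick arithmetic check of the core-count argument above (a sketch only; scheduler time-slicing is ignored), which supports the "at least four cores" reading:

```python
import math

# Total utilization U on an n-core chip equals U*n "core-equivalents" of
# work, so at least ceil(U*n) physical cores must be at least partly busy,
# while k fully loaded cores show up as k/n overall utilization.
def min_busy_cores(utilization, n_cores):
    return math.ceil(utilization * n_cores)

print(min_busy_cores(0.70, 6))  # → 5 (70% of a hexa-core = 4.2 core-equivalents)
print(round(4 / 6, 2))          # → 0.67 (four loaded cores ≈ the "~66%" figure)
```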
Finally, chips from AMD have always been nicely ordered, pointing at a mostly hand-made layout. I can imagine that this leads to uneven power usage, unnecessarily long circuits and timings, and wasted die space.
Lol, it is the exact opposite: hand layout is much better. Humans are better at finding Eulerian paths and coming up with clever layouts; computers can't really do that as effectively with all of the design rules and other parameters. The difference in performance is 2.6-7x faster with custom-designed circuits.
Really, what happens is that a coder will simulate his module and make sure it reaches the targeted timing, which is usually set well above the actual required delay to assure robust operation. If the logic can't reach the speed, it is either rewritten or circuit designers optimize it. In certain logic families it must be entirely custom-designed.
Circuits that are custom-designed are usually things like power gating, clock distribution, and analog circuits such as PLLs, DLLs, and memory controllers / I/O pads.
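The simulate-check-rewrite loop just described can be sketched as a simple slack report. This is a hypothetical illustration: the path names, delays, and guard margin are all invented, not from any real flow.

```python
# Each path's estimated delay is checked against the target clock period
# minus a guard margin; failing paths go back for a logic rewrite or for
# circuit-level hand optimization.
def timing_report(paths, clock_period_ns, margin_ns=0.1):
    """Split (name, delay_ns) paths into (passing, failing) by slack."""
    passing, failing = [], []
    for name, delay_ns in paths:
        slack = clock_period_ns - margin_ns - delay_ns
        (passing if slack >= 0 else failing).append((name, round(slack, 3)))
    return passing, failing

paths = [("decode_stage", 0.45), ("alu_bypass", 0.62), ("sched_pick", 0.58)]
ok, bad = timing_report(paths, clock_period_ns=0.625)  # 1.6 GHz target
print(bad)  # → [('alu_bypass', -0.095), ('sched_pick', -0.055)]
```

The negative-slack paths are the ones a designer would either recode or hand over for custom circuit work.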
Given the information that AMD has released today, is it possible for anyone here to make an educated guess on how much faster BD will be clock for clock over Deneb? Those slides go beyond my basic understanding of CPU design.
As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"
Phenom was bad at branch prediction, so AMD improved it far beyond Intel's best; just look at the die space used by it on the little Bobcat. It's just impressive. The ALU pipes seem to have changed from 3 combined ALU/AGU units that can do about 1.5 of each per cycle to 2 AGU + 2 ALU that can do 2 of each per cycle. So more IPC.
About the L1D it's not very clear; we need to know whether it's inclusive or not. So it could be incredibly faster, or simply as good as the old Phenom II; latency is said to be masked.
L2 latency is 17 cycles, so it's not bad, and it seems to be 1 MB shared between the 2 ALU cores, which means less data will have to go through the L3 when work moves between cores.
The pipeline is longer, but that's aimed at ramping up clocks, and the far better prediction should hide the bad effects of a long pipe.
It's gonna be the "Core 2" effect, I think.
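The pipe-width arithmetic above can be made explicit with a toy peak-throughput comparison (the 1.5-per-cycle and 2-per-cycle figures follow the post, not official AMD documentation):

```python
# Peak sustained ops per cycle for an even ALU/AGU instruction mix: shared
# combined lanes split their issue slots 50/50 between the two op types,
# while dedicated lanes serve only their own type.
def per_cycle(alu_lanes, agu_lanes, shared_lanes=0):
    alu = alu_lanes + shared_lanes / 2
    agu = agu_lanes + shared_lanes / 2
    return alu, agu

print(per_cycle(0, 0, shared_lanes=3))  # → (1.5, 1.5)  3 combined ALU/AGU
print(per_cycle(2, 2))                  # → (2.0, 2.0)  dedicated 2 ALU + 2 AGU
```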
Computers need to be programmed by humans in order to make stuff up, but even then we don't make perfect stuff, so it would be easier for a human to design the best layout by hand than to let a robot do the work. That's all I'm saying. Some logic can be done by computers, but some complicated logic portions of a chip would indeed benefit from human intervention.