AMD to start Bulldozer AM3+ production by March 2011, launch in April 2011

**JF-AMD** · 01-05-2011, 06:38 AM

Originally Posted by Apokalipse

A Bulldozer module is completely different.

A HyperThreading core only appears as two cores to the OS.
In reality, it only uses the second thread to reduce the amount of time the core spends doing nothing whenever a thread stalls.

So, for a core without HyperThreading:
cache miss occurs -> requests the correct data from memory -> waits for it to arrive -> continues the thread

A core with HyperThreading:
cache miss occurs -> requests the correct data from memory -> instead of waiting for it and then continuing the first thread, it starts processing the second thread.

Each Bulldozer module does actually have the hardware to process two threads simultaneously. It is actually two cores.

HyperThreading doesn't and can't make the one core process two threads simultaneously. It just reduces the time it spends waiting because of stalled threads.

It's like trying to put 10 pounds of data in a 5 pound bag. While HT allows a single core to handle 2 threads, it can only handle one at a time. There is only 1 set of integer pipelines, so while 2 threads are initiated, only one is active.

It is like the ability to date 2 people. Anyone can date 2 people, but the ability to do both at once is very rare, if not non-existent.
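A rough way to picture the stall-hiding flow from the quoted post (miss, then switch to the other thread instead of idling) is a toy cycle-count model. Everything here is an assumption for illustration: one instruction per cycle, a fixed hypothetical 10-cycle miss latency, and made-up function names (`run_single`, `run_switch_on_stall`). It models switch-on-stall as the post describes it, not the internals of any real Intel or AMD core.

```python
# Toy model of the "switch to the other thread on a cache miss" flow.
# Each thread is a list of instructions; True marks a load that misses.
# MISS_LATENCY is a hypothetical round number, not a measured figure.
MISS_LATENCY = 10


def run_single(thread):
    """One thread, no second context: every miss stalls the core."""
    cycles = 0
    for is_miss in thread:
        cycles += 1                   # issue the instruction
        if is_miss:
            cycles += MISS_LATENCY    # idle until the data arrives
    return cycles


def run_switch_on_stall(threads):
    """Several thread contexts: on a miss, run whichever thread is ready."""
    pcs = [0] * len(threads)          # next instruction per thread
    ready_at = [0] * len(threads)     # cycle at which each thread may resume
    cycle, current = 0, 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        order = [current] + [i for i in range(len(threads)) if i != current]
        runnable = [i for i in order
                    if pcs[i] < len(threads[i]) and ready_at[i] <= cycle]
        if not runnable:
            cycle += 1                # every thread is waiting: core idles
            continue
        current = runnable[0]         # stay on the current thread if possible
        is_miss = threads[current][pcs[current]]
        pcs[current] += 1
        cycle += 1
        if is_miss:
            ready_at[current] = cycle + MISS_LATENCY
    # the last outstanding miss still has to return before we are done
    return max([cycle] + ready_at)


t = [False, False, True] * 4          # two hits, then a miss, repeated
print(2 * run_single(t))              # both threads back to back: 104 cycles
print(run_switch_on_stall([t, t]))    # sharing one core: 55 cycles
```

With these made-up numbers, two threads that would take 104 cycles back to back finish in 55 when the core switches on each miss. The gain comes only from filling stall cycles, which is the point the quoted post makes.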

**Manicdan** · 01-05-2011, 06:41 AM

Originally Posted by JF-AMD

It's like trying to put 10 pounds of data in a 5 pound bag. While HT allows a single core to handle 2 threads, it can only handle one at a time. There is only 1 set of integer pipelines, so while 2 threads are initiated, only one is active.

It is like the ability to date 2 people. Anyone can date 2 people, but the ability to do both at once is very rare, if not non-existent.

lol, grats for making a non-car analogy
and shame on you for having a girl in every city you travel to

**qcmadness** · 01-05-2011, 06:45 AM

Originally Posted by JF-AMD

It's like trying to put 10 pounds of data in a 5 pound bag. While HT allows a single core to handle 2 threads, it can only handle one at a time. There is only 1 set of integer pipelines, so while 2 threads are initiated, only one is active.

It is like the ability to date 2 people. Anyone can date 2 people, but the ability to do both at once is very rare, if not non-existent.

It depends on workload of the threads.

**kl0012** · 01-05-2011, 07:01 AM

Originally Posted by JF-AMD

It's like trying to put 10 pounds of data in a 5 pound bag. While HT allows a single core to handle 2 threads, it can only handle one at a time. There is only 1 set of integer pipelines, so while 2 threads are initiated, only one is active.

It is like the ability to date 2 people. Anyone can date 2 people, but the ability to do both at once is very rare, if not non-existent.

You have no idea what you're talking about, do you?
Your comparisons are really wrong and misleading.
Interleaved hyperthreading (only one thread at a time can be active on each core) is used only in Itanium. All x86 hyperthreaded cores can really handle 2 threads at a time (depending on available execution resources). And since in most cases there are spare execution resources (a single thread rarely utilizes even 70% of a core's execution resources), HT mostly has a positive effect. It is a really simple and elegant way to utilize resources which are already in place but not used.
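The difference described here (issuing from both threads in the same cycle when execution resources are free) can be sketched with an even smaller toy model. It assumes a hypothetical 4-wide core and invented per-cycle counts of ready instructions for each thread; `issued_per_cycle` and `utilization` are illustrative names, and real SMT cores also share queues, caches and rename resources that are ignored here.

```python
# Toy model of SMT issue-slot sharing. Each thread is described only by how
# many independent instructions it has ready in each cycle; ISSUE_WIDTH and
# the ready counts below are hypothetical. Unissued instructions are simply
# dropped rather than carried to the next cycle, to keep the model tiny.
ISSUE_WIDTH = 4


def issued_per_cycle(threads, width=ISSUE_WIDTH):
    """Fill up to `width` slots per cycle from all threads (thread 0 first)."""
    issued = []
    for ready in zip(*threads):       # one tuple of ready counts per cycle
        slots_left = width
        for r in ready:
            take = min(r, slots_left)
            slots_left -= take
        issued.append(width - slots_left)
    return issued


def utilization(issued, width=ISSUE_WIDTH):
    """Fraction of the core's issue slots that did useful work."""
    return sum(issued) / (width * len(issued))


a = [2, 1, 0, 3]   # thread A: ready instructions per cycle
b = [1, 2, 2, 0]   # thread B
print(utilization(issued_per_cycle([a])))      # 0.375
print(utilization(issued_per_cycle([a, b])))   # 0.6875
```

In the first cycle thread A fills 2 slots and thread B fills 1, so both threads are active at once; utilization rises from 0.375 with one thread to 0.6875 with two, purely because B uses slots A left empty.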

**badboy18187** · 01-05-2011, 07:13 AM

BTW, I don't know if I'm posting this in the right place, but are there any rumours regarding the APU that will use a Zambezi CPU and a high-end 28nm GPU? I think I saw some slides somewhere saying it might launch this year. And one other thing: where did the rumour that BD's IPC < SB come from? Did AMD themselves actually say that it will be lower?

**zalbard** · 01-05-2011, 07:47 AM

Originally Posted by badboy18187

BTW, I don't know if I'm posting this in the right place, but are there any rumours regarding the APU that will use a Zambezi CPU and a high-end 28nm GPU? I think I saw some slides somewhere saying it might launch this year. And one other thing: where did the rumour that BD's IPC < SB come from? Did AMD themselves actually say that it will be lower?

This is false. We won't even see a high-end discrete 28nm GPU from AMD this year. And in APU? LOL.

**demonkevy666** · 01-05-2011, 07:53 AM

Originally Posted by kl0012

You have no idea what you're talking about, do you?
Your comparisons are really wrong and misleading.
Interleaved hyperthreading (only one thread at a time can be active on each core) is used only in Itanium. All x86 hyperthreaded cores can really handle 2 threads at a time (depending on available execution resources). And since in most cases there are spare execution resources (a single thread rarely utilizes even 70% of a core's execution resources), HT mostly has a positive effect. It is a really simple and elegant way to utilize resources which are already in place but not used.

The number of threads is irrelevant compared to the amount of execution happening at one time.

**Apokalipse** · 01-05-2011, 07:53 AM

Originally Posted by zalbard

This is false. We won't even see a high-end discrete 28nm GPU from AMD this year. And in APU? LOL.

Actually, we were supposed to have 28nm before 2011, but TSMC didn't get it ready in time.

**Mechromancer** · 01-05-2011, 08:44 AM

HOLD ON...

If the desktop version of BD, Zambezi, will be here in April, the server parts should arrive a few months earlier, correct? Can we expect Interlagos in February?

**Oliverda** · 01-05-2011, 08:47 AM

Originally Posted by Apokalipse

Actually, we were supposed to have 28nm before 2011, but TSMC didn't get it ready in time.

Fortunately, there is also GlobalFoundries, which is working hard on a 28nm node.

Originally Posted by Mechromancer

HOLD ON...

If the desktop version of BD, Zambezi, will be here in April, the server parts should arrive a few months earlier, correct? Can we expect Interlagos in February?

Server parts will come later in this case. This time they won't wait for the Opterons to launch before starting the desktop parts. It's a smarter choice.

**Mechromancer** · 01-05-2011, 08:54 AM

Originally Posted by Oliverda

Fortunately, there is also GlobalFoundries, which is working hard on a 28nm node.

Server parts will come later in this case. This time they won't wait for the Opterons to launch before starting the desktop parts. It's a smarter choice.

Does this mean they are confident about BD's desktop performance? It really looks like another server-biased chip. With 8 physical cores, it should surprise us, though.

**zalbard** · 01-05-2011, 08:56 AM

Originally Posted by Apokalipse

Actually, we were supposed to have 28nm before 2011, but TSMC didn't get it ready in time.

I know. Doesn't change the fact, though.

**Nintendork** · 01-05-2011, 08:57 AM

And since server parts are "the more cores the merrier", I think there's no point in releasing a "simple" 8-core Bulldozer; better to focus on the 12-16-core MCM parts (Interlagos) instead.

badboy18187, AFAIK:

Trinity APU 32nm 2012
Bulldozer 1.5 (based on Komodo?)
Better IGP than Llano

Krishna/Wichita APU 28nm Q4 2011 (according to rumors)
Improved Bobcat cores (probably based on Llano cores)
Better IGP (160SP Caicos HD6450 or low-end Llano IGP?)

**-Boris-** · 01-05-2011, 09:04 AM

Originally Posted by badboy18187

BTW, I don't know if I'm posting this in the right place, but are there any rumours regarding the APU that will use a Zambezi CPU and a high-end 28nm GPU? I think I saw some slides somewhere saying it might launch this year. And one other thing: where did the rumour that BD's IPC < SB come from? Did AMD themselves actually say that it will be lower?

Zacate is 40nm; it's impossible to have one part of the chip at 40nm and another at 28nm. It's one or the other, but not both.
Llano will be interesting, though: it's at 32nm, has 6 times as many shaders as Zacate, and will have 4 enhanced Phenom II cores, but without the L3.

And about that IPC rumor: much in BD has been streamlined for as low a transistor count as possible, and sometimes they made a performance trade-off to lower the die size and enhance performance per watt or performance per mm².
So it might be a very efficient processor per watt, or per mm², at the cost of IPC. But the other gains might make up for these trade-offs.
Anyway, these enhancements allow 8 cores on a relatively small die, and they might allow a high-frequency design that runs relatively cool. That might mean 4GHz+ at introduction.
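The trade-off described above boils down to per-core throughput ≈ IPC × clock (ignoring instruction count and memory stalls). A tiny sketch with hypothetical numbers, not real Bulldozer or Sandy Bridge figures, shows how much clock a lower-IPC design needs to break even; `throughput_gips` and `clock_to_match` are illustrative names only.

```python
def throughput_gips(ipc, clock_ghz):
    """Single-core throughput in billions of instructions per second."""
    return ipc * clock_ghz


def clock_to_match(ref_ipc, ref_clock_ghz, new_ipc):
    """Clock (GHz) a lower-IPC core needs to equal a reference core."""
    return throughput_gips(ref_ipc, ref_clock_ghz) / new_ipc


# Hypothetical: a core with 10% lower IPC than a 3.4 GHz reference core
print(round(clock_to_match(1.0, 3.4, 0.9), 2))   # 3.78
```

With a 10% IPC deficit, the core needs about 3.78 GHz to match a 3.4 GHz reference, roughly an 11% higher clock, which is why a cool-running high-frequency design could make up for lower IPC.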

**Nintendork** · 01-05-2011, 09:07 AM

Llano now has 4MB of L2, or 1MB of L2 cache per core, if I'm correct.

**nn_step** · 01-05-2011, 12:55 PM

Originally Posted by kl0012

You have no idea what you're talking about, do you?
Your comparisons are really wrong and misleading.
Interleaved hyperthreading (only one thread at a time can be active on each core) is used only in Itanium. All x86 hyperthreaded cores can really handle 2 threads at a time (depending on available execution resources). And since in most cases there are spare execution resources (a single thread rarely utilizes even 70% of a core's execution resources), HT mostly has a positive effect. It is a really simple and elegant way to utilize resources which are already in place but not used.

Actually, hyper-threading has been criticized for being energy-inefficient. For example, ARM has stated that SMT can use up to 46% more power than dual-core designs. Furthermore, they claim SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease.

Not to mention that in May 2005 Colin Percival demonstrated that a malicious thread operating with limited privileges can monitor the execution of another thread through their influence on a shared data cache, allowing for the theft of cryptographic keys.

Hyperthreading is largely a strong negative effect on any system.

**cobra_kai** · 01-05-2011, 01:20 PM

Originally Posted by nn_step

Actually, hyper-threading has been criticized for being energy-inefficient. For example, ARM has stated that SMT can use up to 46% more power than dual-core designs. Furthermore, they claim SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease.

Not to mention that in May 2005 Colin Percival demonstrated that a malicious thread operating with limited privileges can monitor the execution of another thread through their influence on a shared data cache, allowing for the theft of cryptographic keys.

Hyperthreading is largely a strong negative effect on any system.

Link? It seems to me that the performance of recent Intel processors has proven the effectiveness of HT.

**kl0012** · 01-05-2011, 01:26 PM

Originally Posted by nn_step

Actually, hyper-threading has been criticized for being energy-inefficient. For example, ARM has stated that SMT can use up to 46% more power than dual-core designs.

Adding a performance feature will add some power consumption. This is natural. I can't comment on ARM's designs, but SB is very efficient even with HT.

Furthermore, they claim SMT increases cache thrashing by 42%, whereas dual core results in a 37% decrease.

Cache thrashing is mostly a function of cache size and software quality. In fact, Bulldozer will be affected in the same way as SB with HT, since each module uses a shared cache (while the dedicated L1 was reduced to just 16KB).
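To make "thrashing is mostly a function of cache size" concrete, here is a toy fully associative LRU cache (the function name and access patterns are made up for illustration). Two threads whose working sets each fit comfortably can thrash a shared cache once their combined working set exceeds its capacity, and a bigger cache makes the problem disappear.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    hits = 0
    for line in accesses:
        if line in cache:
            hits += 1
            cache.move_to_end(line)          # now the most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)    # evict the least recently used
            cache[line] = True
    return hits / len(accesses)


one_thread = list(range(6)) * 10             # 6-line working set, looped
two_threads = [x for pair in zip(range(6), range(100, 106)) for x in pair] * 10

print(lru_hit_rate(one_thread, 8))    # 0.9: 6 lines fit in 8
print(lru_hit_rate(two_threads, 8))   # 0.0: 12 lines cycle through 8
print(lru_hit_rate(two_threads, 16))  # 0.9: a bigger cache stops thrashing
```

With a cyclic access pattern, LRU keeps evicting exactly the line that is needed next, so 12 lines cycling through 8 slots never hit at all.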

Not to mention that in May 2005 Colin Percival demonstrated that a malicious thread operating with limited privileges can monitor the execution of another thread through their influence on a shared data cache, allowing for the theft of cryptographic keys.

Hyperthreading is largely a strong negative effect on any system.

Well... this is funny. In fact, no absolutely secure hardware exists. There is the Blue Pill malware, which uses security holes in AMD's virtualization technology. Can we say AMD-V is a "strong negative effect" on any system?

Update:
Since you used Wikipedia as your source of information, I just want to add a sentence which you forgot to copy:

In May 2005 Colin Percival demonstrated that a malicious thread operating with limited privileges can monitor the execution of another thread through their influence on a shared data cache, allowing for the theft of cryptographic keys.[15] Note that while the attack described in the paper was demonstrated on an Intel Pentium 4 processor with HTT, the same techniques could theoretically apply to any system where caches are shared between two or more non-mutually-trusted execution threads; see also side channel attack.

Also:
In 2010, ARM stated that it will include simultaneous multithreading in its chips in the future.

**Epsilon84** · 01-05-2011, 01:45 PM

Originally Posted by nn_step

Hyperthreading is largely a strong negative effect on any system.

(source: hardwarecanucks.com)

**gosh** · 01-05-2011, 01:52 PM

Originally Posted by kl0012

Cache thrashing is mostly a function of cache size and software quality. In fact, Bulldozer will be affected in the same way as SB with HT, since each module uses a shared cache (while the dedicated L1 was reduced to just 16KB).

The cache on Intel gets thrashed before the cache on Phenom: the L3 on i7 is 16-way set-associative, while on Phenom it is 48-way set-associative.
I think that Bulldozer will have strong prefetchers that work on the L2 cache. The L2 size suggests they are using that trick to gain speed. It's the same trick that Intel has worked hard on and that, I think, many applications are optimized for.
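Associativity matters when many hot lines land in the same set. A toy set-associative LRU cache (hypothetical names and sizes, not the real i7 or Phenom L3 geometry or indexing) shows 20 conflicting lines thrashing a 16-way set while fitting in a 48-way one.

```python
from collections import OrderedDict


def set_assoc_hit_rate(lines, num_sets, ways):
    """Hit rate of a set-associative cache with LRU replacement in each set."""
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = 0
    for line in lines:
        s = sets[line % num_sets]          # low bits of the line number pick the set
        if line in s:
            hits += 1
            s.move_to_end(line)
        else:
            if len(s) >= ways:
                s.popitem(last=False)      # evict the LRU line of this set only
            s[line] = True
    return hits / len(lines)


NUM_SETS = 64
# 20 distinct lines whose numbers are multiples of NUM_SETS, so all map to set 0
conflicting = [i * NUM_SETS for i in range(20)] * 5

print(set_assoc_hit_rate(conflicting, NUM_SETS, 16))  # 0.0
print(set_assoc_hit_rate(conflicting, NUM_SETS, 48))  # 0.8
```

Note that 63 of the 64 sets stay empty, so this is conflict thrashing rather than a lack of total capacity. Real L3s also differ in total size and index hashing, so this alone does not settle which real chip thrashes first.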

**gosh** · 01-05-2011, 01:57 PM

Originally Posted by Epsilon84

(source: hardwarecanucks.com)

HT works well if you run long calculation jobs. For other types of applications (most of them), where the code and data are not that predictable, HT doesn't have the same impact on performance.

**Solus Corvus** · 01-05-2011, 02:02 PM

Originally Posted by xdan

Time * quantity / per core = performance
Even if AMD has more cores, if they don't also reduce the time in which a core does the operations, it's useless...

Only if you are using a weakly parallel or single threaded program, and only one instance/program at a time.

There are only a few real-world, high-performance usage scenarios I can think of where that is the case (i.e., gaming). For most high-performance work, the people I know run multiple programs at the same time, and those programs often have multiple intensive threads. There is a reason I have mostly had multi-processor or multi-core machines since the Pentium Pro days.

If it comes down to a case where SB is faster with low thread counts and BD is faster with high thread counts, then you can't go by a statement as simplistic as the one you made. Nor can you simply go by most review sites, since they usually run a single benchmark at a time. You will have to look at the types of programs you use AND how you use them before you can determine what will perform best in your scenario.
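The point about thread counts can be sketched with a deliberately idealised model: throughput = min(threads, cores) × per-core speed. The two chips below are hypothetical ("fewer, faster cores" vs. "more, slower cores"), not SB or BD figures, and SMT, turbo and shared resources are all ignored.

```python
def throughput(threads, cores, per_core_speed):
    """Idealised throughput: each busy core runs one thread at full speed.
    Threads beyond the core count add nothing; SMT, turbo, and shared
    caches or memory bandwidth are all ignored."""
    return min(threads, cores) * per_core_speed


# Hypothetical chips: "fewer, faster cores" vs. "more, slower cores"
few_fast = dict(cores=4, per_core_speed=1.25)
many_slow = dict(cores=8, per_core_speed=1.0)

for n in (1, 2, 4, 5, 8):
    print(n, throughput(n, **few_fast), throughput(n, **many_slow))
```

With these numbers the fewer-but-faster chip wins at 1 to 4 threads, the two tie at 5, and the many-core chip pulls ahead at 8, so a review that runs one benchmark at a time can point the wrong way for a heavily multitasked machine.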

Are most people going to put that much thought into their purchasing decisions? Lol, hell no.

**nn_step** · 01-05-2011, 02:05 PM

Originally Posted by cobra_kai

Link? It seems to me that the performance of recent Intel processors has proven the effectiveness of HT.

Quality of implementation is largely independent of architectural superiority.

Similar arguments can be made about many poor designs outperforming superior architectures.

Originally Posted by kl0012

Adding a performance feature will add some power consumption. This is natural. I can't comment on ARM's designs, but SB is very efficient even with HT.

Cache thrashing is mostly a function of cache size and software quality. In fact, Bulldozer will be affected in the same way as SB with HT, since each module uses a shared cache (while the dedicated L1 was reduced to just 16KB).

Well... this is funny. In fact, no absolutely secure hardware exists. There is the Blue Pill malware, which uses security holes in AMD's virtualization technology. Can we say AMD-V is a "strong negative effect" on any system?

Update:
Since you used Wikipedia as your source of information, I just want to add a sentence which you forgot to copy:

In May 2005 Colin Percival demonstrated that a malicious thread operating with limited privileges can monitor the execution of another thread through their influence on a shared data cache, allowing for the theft of cryptographic keys.[15] Note that while the attack described in the paper was demonstrated on an Intel Pentium 4 processor with HTT, the same techniques could theoretically apply to any system where caches are shared between two or more non-mutually-trusted execution threads; see also side channel attack.

Also:
In 2010, ARM stated that it will include simultaneous multithreading in its chips in the future.

Cache thrashing generally refers to the data cache (which Bulldozer keeps separate).

Also, I wish to note that I am attempting to discuss hyper-threading independently of any implementation, and I am not discussing security faults in x86 in general.

Originally Posted by Epsilon84

(source: hardwarecanucks.com)

See above

**gosh** · 01-05-2011, 02:09 PM

Originally Posted by xdan

Time * quantity / per core = performance
Even if AMD has more cores, if they don't also reduce the time in which a core does the operations, it's useless...

If the L2 is fast and the prefetchers are effective, it will be very fast. Cores are mostly waiting for data to arrive, so having almost all accesses hit in the L1 and L2 will improve speed. It has a lot of L2 to work with.
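Why hit rates matter so much can be shown with the standard average memory access time (AMAT) formula for a two-level cache. The latencies and miss rates below are hypothetical round numbers, not measured Bulldozer values, and `amat` is just an illustrative name.

```python
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    """Average memory access time (cycles) for an L1 + L2 hierarchy.

    l2_miss_rate is the local miss rate: the fraction of L1 misses that
    also miss in L2 and have to go to memory."""
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)


# Hypothetical latencies in cycles; only the L2 miss rate changes
print(amat(4, 0.10, 20, 0.50, 200))  # L2 often misses
print(amat(4, 0.10, 20, 0.10, 200))  # L2 plus prefetching catches most misses
```

With these numbers, cutting the L2 local miss rate from 50% to 10% drops the average access time from 4 + 0.1 × (20 + 0.5 × 200) = 16 cycles to 4 + 0.1 × (20 + 0.1 × 200) = 8 cycles.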

**informal** · 01-05-2011, 02:10 PM

Originally Posted by gosh

The cache on Intel gets thrashed before the cache on Phenom: the L3 on i7 is 16-way set-associative, while on Phenom it is 48-way set-associative.
I think that Bulldozer will have strong prefetchers that work on the L2 cache. The L2 size suggests they are using that trick to gain speed. It's the same trick that Intel has worked hard on and that, I think, many applications are optimized for.

I think the shared 2MB L2 per core pair is going to be a big deal for many desktop apps. Redpriest over at the SA forums once said that AMD suffered a lot because many apps were optimized for exactly that cache size as a sweet spot (Intel hardware since Core 2 has had a very fast and large shared L2 cache). AMD finally tackled this issue with a fairly large and fast L2. Compared to Thuban, in single- or poorly threaded desktop workloads there is now 4x more L2 cache, running at full clock with 2x greater load/store bandwidth than a Thuban core. Further, the L3 is now 33% larger and partitioned, with a 2.4GHz+ frequency (a 20%+ improvement vs. Thuban).
Just looking at the cache subsystem, BD is on a whole other level compared to Thuban.
