AMD Tapes Out First "Bulldozer" Microprocessors.


07-18-2010, 11:46 PM
qurious63ss

JF-AMD are you a process or design engineer for AMD?
07-19-2010, 12:00 AM
FlanK3r

He is AMD's director of product marketing for servers...
07-19-2010, 03:26 AM
informal

Quote:

Originally Posted by qurious63ss

JF-AMD are you a process or design engineer for AMD?

I don't think AMD's engineers post technical data on various forums; they have better things to do.
JF is a server marketing dude (very high up in the hierarchy); you can check out his blog.
07-19-2010, 05:55 AM
JF-AMD

Quote:

Originally Posted by god_43

ok well what is cmt then, it is being included with BD right? how does it fit?

CMT was a term used a while ago, long before I started working with Bulldozer. When I sat down with engineering and product management to get the Bulldozer story for the first time (I jump in much further downstream than the other teams), I asked about CMT and they basically said that they were not using that term. I never pursued it.
07-19-2010, 05:59 AM
Manicdan

how about we pursue the term "Standardized Threading Definitions" so all cpus can have STDs
07-19-2010, 06:06 AM
Particle
To be brief, I'll bulletize my bones to pick in this thread.
- SMT - Listed as threads instead of cores for a reason. Intel's implementation means you'll only ever have [core count] threads active/executing even though you have [core count] * 2 "threads". Let's not lose sight of that. I don't know why Sav is going off on a tangent about this.
- CMT - Genuinely has [core count] threads active/executing at any given moment. Each module contains two integer units capable of chewing on instructions at the same exact moment. It's two cores per module.
- Threads - When it comes to comparing thread counts, realize that Intel's SMT-enabled chip thread counts aren't the same thing as AMD's CMT thread counts. In the case of AMD, all those threads are actually executing in parallel. In Intel's case they are not.
Maybe that will help clear up some confusion.
07-19-2010, 06:36 AM
savantu
Quote:
Originally Posted by Particle

To be brief, I'll bulletize my bones to pick in this thread.

SMT - Listed as threads instead of cores for a reason. Intel's implementation means you'll only ever have [core count] threads active/executing even though you have [core count] * 2 "threads". Let's not lose sight of that. I don't know why Sav is going off on a tangent about this.
CMT - Genuinely has [core count] threads active/executing at any given moment. Each module contains two integer units capable of chewing on instructions at the same exact moment. It's two cores per module.
Threads - When it comes to comparing thread counts, realize that Intel's SMT-enabled chip thread counts aren't the same thing as AMD's CMT thread counts. In the case of AMD, all those threads are actually executing in parallel. In Intel's case they are not.

Maybe that will help clear up some confusion.
Actually, you're contributing to the confusion because you do not understand (neither does JF, apparently, or he does it intentionally for FUD) what SMT really is.

As a hint, you should pay attention to the S part in SMT ( simultaneous multithreading ). There is plenty of literature on the subject, 5min of reading would help you get it settled.
07-19-2010, 06:56 AM
ajaidev

Quote:

Originally Posted by savantu

Actually, you're contributing to the confusion because you do not understand (neither does JF, apparently, or he does it intentionally for FUD) what SMT really is.

As a hint, you should pay attention to the S part in SMT ( simultaneous multithreading ). There is plenty of literature on the subject, 5min of reading would help you get it settled.

??

http://i473.photobucket.com/albums/r...anadej/HTT.gif

Straight from Intel and in such a pretty diagram that most people understand right away....
07-19-2010, 06:59 AM
Dresdenboy

Quote:

Originally Posted by savantu

Actually, you're contributing to the confusion because you do not understand (neither does JF, apparently, or he does it intentionally for FUD) what SMT really is.

As a hint, you should pay attention to the S part in SMT ( simultaneous multithreading ). There is plenty of literature on the subject, 5min of reading would help you get it settled.

To help out a bit, I think in this case a picture is worth a thousand words:

http://info.nuje.de/intel_smt.png

For some more variants, see http://molesterwaterball.blogspot.co...luster-mt.html

There are pipeline stages in Nehalem, where only one thread is active during one cycle (e.g. decoding) and other stages, where multiple subunits (like EUs) can be used by two threads simultaneously - but still one thread per EU.

Edit: Fixed image (didn't allow direct linking).
07-19-2010, 07:35 AM
savantu

1 Attachment(s)

Quote:

Originally Posted by ajaidev

??

http://i473.photobucket.com/albums/r...anadej/HTT.gif

Straight from Intel and in such a pretty diagram that most people understand right away....

So what should we understand ? That instructions from 2 different threads are in flight at the same time in various execution stages ?
Sounds awfully similar to what I'm saying: SMT means the simultaneous execution of 2 or more threads in parallel.
Maybe, you should answer why it's called simultaneous in the first place.

Quote:

Originally Posted by Dresdenboy

To help out a bit, I think in this case a picture is worth a thousand words:

http://pics.computerbase.de/2/3/1/6/9/51_m.png

For some more variants, see http://molesterwaterball.blogspot.co...luster-mt.html

There are pipeline stages in Nehalem, where only one thread is active during one cycle (e.g. decoding) and other stages, where multiple subunits (like EUs) can be used by two threads simultaneously - but still one thread per EU.

What's your point, somehow I am missing it ? What do execution units have to do with executing threads simultaneously ?

Maybe instead of amateur sources and interpretations, we should look into real technical articles, written by the people who invented these technologies and published at conferences and in tech journals.

I've attached a diagram of a NetBurst execution core to show the simultaneous execution of 2 threads; you can find it in this paper:
ftp://download.intel.com/technology/...technology.pdf
07-19-2010, 07:52 AM
Particle

Quote:

Originally Posted by savantu

Actually, you're contributing to the confusion because you do not understand (neither does JF, apparently, or he does it intentionally for FUD) what SMT really is.

As a hint, you should pay attention to the S part in SMT ( simultaneous multithreading ). There is plenty of literature on the subject, 5min of reading would help you get it settled.

You don't appear to understand what is really going on yourself. There is only one execution unit. You can't have two threads with instructions that compete for the same resources executing on the same clock cycle in the same execution unit. That's the end of the story. HT is, as we've been claiming all along, just a way to maximize the utilization of the core's resources by scheduling work where there would normally be none being done (misses and whatnot). It does not magically let you execute two threads at the same time the way two real cores do.
07-19-2010, 07:55 AM
JF-AMD

Let me put it in simple terms:

With actual cores, throughput generally goes up ~90% when you go from 1 core to 2 cores.

With SMT, throughput generally goes up ~14% for int and ~20% for FP (from SPEC.org, on Intel-based submissions).

SMT may double the number of threads, but it does not double the number of pipelines. You can only fit so many executions per cycle based on the pipelines. SMT might give you better utilization, but you are still limited on pipelines.

Doubling the number of cores will double the number of pipelines and allow for more simultaneous execution. That is the key to this whole discussion. Everyone can argue about how many angels can dance on the head of a pin, but in reality, having more cores means that you have a larger dancefloor.
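
To put rough numbers on the comparison above, here is a minimal Python sketch. It only applies the approximate uplift figures quoted in this post (~90% for a second core, ~14% int and ~20% FP for SMT) to a baseline normalized to 1.0. The function names and the normalized baseline are assumptions made for illustration; nothing here is measured on a specific chip.

```python
# Rough arithmetic only. The uplift figures are the approximate ones quoted
# in the post (~90% for a second core, ~14% int / ~20% FP for SMT).
# The names below are hypothetical and exist only for this illustration.

UPLIFT = {
    "second core": 0.90,
    "SMT (int)": 0.14,
    "SMT (FP)": 0.20,
}


def relative_throughput(baseline, uplift):
    """Return throughput after applying a fractional uplift to a baseline."""
    return baseline * (1.0 + uplift)


def compare(baseline=1.0):
    """Map each option to its throughput relative to one core without SMT."""
    return {name: relative_throughput(baseline, u) for name, u in UPLIFT.items()}


if __name__ == "__main__":
    for name, value in compare().items():
        print(f"{name}: {value:.2f}x of one core without SMT")
```

Under these assumed figures, adding a second core lands near 1.9x, while enabling SMT on one core lands near 1.14x to 1.2x. Real results depend heavily on the workload.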
07-19-2010, 07:55 AM
Movieman

Just a thought but does it really matter the path that the two companies have taken?
What should matter is the effectiveness of the choice they made.
IE: Take a $1000.00 intel chip and a $1000.00 AMD chip and see which one does the work you need done better.
Maybe that's too black and white for you smart guys here, but to me that's all that counts..
The rest is just a way to kill time typing in a forum..
( Puts on flamesuit):rofl:
07-19-2010, 08:00 AM
JF-AMD

Quote:

Originally Posted by Movieman

Just a thought but does it really matter the path that the two companies have taken?
What should matter is the effectiveness of the choice they made.
IE: Take a $1000.00 intel chip and a $1000.00 AMD chip and see which one does the work you need done better.
Maybe that's too black and white for you smart guys here, but to me that's all that counts..
The rest is just a way to kill time typing in a forum..
( Puts on flamesuit):rofl:

That is the craziest thing that I have ever heard ;)

NOBODY ever buys like that. Just customers, but outside of customers, who would ever do that?

http://cdn.mos.bikeradar.com/images/...n36-399-75.jpg
07-19-2010, 08:01 AM
richierich

Quote:

Originally Posted by Movieman

Just a thought but does it really matter the path that the two companies have taken?
What should matter is the effectiveness of the choice they made.
IE: Take a $1000.00 intel chip and a $1000.00 AMD chip and see which one does the work you need done better.
Maybe that's too black and white for you smart guys here, but to me that's all that counts..
The rest is just a way to kill time typing in a forum..
( Puts on flamesuit):rofl:

Newegg:

AMD Opteron 6172 12-core 2.1GHz $1009

Intel Xeon X5550 Quad-Core 2.66GHz $1016.49
Intel Xeon X5650 Hexa-Core 2.66GHz $1024.71

Is this what you mean, MM?
07-19-2010, 08:02 AM
OhNoes!

Quote:

Originally Posted by Particle

You don't appear to understand what is really going on yourself. There is only one execution unit. You can't have two threads with instructions that compete for the same resources executing on the same clock cycle in the same execution unit. That's the end of the story. HT is, as we've been claiming all along, just a way to maximize the utilization of the core's resources by scheduling work where there would normally be none being done (misses and whatnot). It does not magically let you execute two threads at the same time the way two real cores do.

The key to all the confusion is workload. Though SMT allows for the simultaneous execution of threads, it's entirely dependent on shared resources. This is a liability CMT design should theoretically overcome.
07-19-2010, 08:06 AM
Particle

Quote:

Originally Posted by OhNoes!

The key to all the confusion is workload. Though SMT allows for the simultaneous execution of threads, it's entirely dependent on shared resources. This is a liability CMT design should theoretically overcome.

Exactly. I do think most of the people here get it, but there are a couple of stubborn ones who don't.
07-19-2010, 08:10 AM
Movieman

Quote:

Originally Posted by richierich

Newegg:

AMD Opteron 6172 12-core 2.1GHz $1009

Intel Xeon X5550 Quad-Core 2.66GHz $1016.49
Intel Xeon X5650 Hexa-Core 2.66GHz $1024.71

Is this what you mean, MM?

That is exactly what I mean..
Look at the type of work you're doing, then look at the strengths of the two approaches and choose the one that works best for you..
I imagine that in different types of work there are places where both excel and others where both don't..
07-19-2010, 08:10 AM
savantu

Quote:

Originally Posted by Particle

You don't appear to understand what is really going on yourself. There is only one execution unit. You can't have two threads with instructions that compete for the same resources executing on the same clock cycle in the same execution unit. That's the end of the story. HT is, as we've been claiming all along, just a way to maximize the utilization of the core's resources by scheduling work where there would normally be none being done (misses and whatnot). It does not magically let you execute two threads at the same time the way two real cores do.

Last I knew a CPU typically has 3-4 ALUs and 3-4 FP units. No single thread will use all of them.

And it seems you are missing my point: I'm not saying HT is equal to having another core, I'm saying it allows you to execute 2 threads in parallel. End of story.

Quote:

Originally Posted by JF-AMD

Let me put it in simple terms:

With actual cores, throughput generally goes up ~90% when you go from 1 core to 2 cores.

With SMT, throughput generally goes up ~14% for int and ~20% for FP (from SPEC.org, on Intel-based submissions).

SMT may double the number of threads, but it does not double the number of pipelines. You can only fit so many executions per cycle based on the pipelines. SMT might give you better utilization, but you are still limited on pipelines.

Doubling the number of cores will double the number of pipelines and allow for more simultaneous execution. That is the key to this whole discussion. Everyone can argue about how many angels can dance on the head of a pin, but in reality, having more cores means that you have a larger dancefloor.

That's a strawman; you originally claimed

Quote:

Originally Posted by JF-AMD

...

As I understand SMT, a 4 core die has 4 pipelines. If one thread stalls, another can take over those pipelines and continue. So, while you technically have 8 threads active, only 4 are running in any given cycle. SMT takes advantage of thread stalls to fill the pipelines with the "on deck" thread. This is why the throughput increase is 15-20% (for servers).

That is incorrect. SMT means your pipeline has 2 threads active at the same time. Which approach is better, is another discussion.
07-19-2010, 08:15 AM
freeloader

Quote:

Originally Posted by Movieman

Just a thought but does it really matter the path that the two companies have taken?
What should matter is the effectiveness of the choice they made.
IE: Take a $1000.00 intel chip and a $1000.00 AMD chip and see which one does the work you need done better.
Maybe that's too black and white for you smart guys here, but to me that's all that counts..
The rest is just a way to kill time typing in a forum..
( Puts on flamesuit):rofl:

Bingo, :banana::banana::banana::banana:ing Yahtzee! I don't care for any of the technical side to any CPU. I just want what works for me, the fastest and most cost efficient. I'm the type of person who would also give up 10 to 15% performance if it was going to save me a few hundred dollars.
07-19-2010, 08:18 AM
Movieman

Let's get away from theory and into reality for a minute, ok?
You guys know I have both Intel and AMD systems here, yes?
Both excellent, but they are different.
My Westmere systems (2 actually, one in an SR2 board, one in an SM X8DA3 board) are powerful but suck electricity and generate a lot of heat.
My dual Magny-Cours system is a little less powerful (6168 chips, 1900MHz, 24 cores) but runs at 39-41C and takes a lot less electricity to run.
I also noticed that in the DC work I do there are far fewer page faults with the MC system.. Why, I don't know, but it's true.
It is also solid as a rock and has been at 100% load since built 2 months ago.
Bottom line is, as said, different strengths and characteristics but both good systems.
07-19-2010, 08:18 AM
Manicdan

Quote:

Originally Posted by savantu

Last I knew a CPU typically has 3-4 ALUs and 3-4 FP units. No single thread will use all of them.

why would cpus be built in such a way they are never used to maximum capacity?
07-19-2010, 08:18 AM
savantu

Quote:

Originally Posted by OhNoes!

The key to all the confusion is workload. Though SMT allows for the simultaneous execution of threads, it's entirely dependent on shared resources. This is a liability CMT design should theoretically overcome.

Now this is an interesting point : what if being dependent on shared resources isn't a liability, but actually a desirable feature ?

SMT allows me to increase the utilization of an underutilized core. CMT duplicates the core or part of it, thus duplicating the lack of utilization also.

Example: we take a 4-issue-wide core with 4 ALUs. Let's say that, most of the time, only 1 or 2 of those units are used by a single thread.
-with SMT, we have 2 threads running in parallel on that core, the second thread being dispatched to the idle units. Thus we now have 3 or even all of the units in use.
-with CMT, I add another cluster of 4 ALUs for a total of 8. I have 2 threads, but I also have 2x as many resources available and each thread uses most of the time 1 or 2 ALUs. Thus, out of 8 in the module, I'm constantly using 3-4 units.
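
The arithmetic in this example can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of any real core: each thread is reduced to a fixed per-cycle ALU demand, the SMT case is modelled as two threads sharing one pool of 4 ALUs, and the CMT case as two private clusters of 4 ALUs each. All function names are hypothetical.

```python
# Toy model of the ALU example above. Assumptions (not from any vendor doc):
# a thread is just a per-cycle ALU demand, SMT shares one pool of units,
# CMT gives each thread its own private cluster of units.

def smt_busy(per_thread_demand, total_units):
    """Busy units when all threads share one pool, capped by the pool size."""
    return min(sum(per_thread_demand), total_units)


def cmt_busy(per_thread_demand, units_per_cluster):
    """Busy units when each thread has its own cluster, capped per cluster."""
    return sum(min(d, units_per_cluster) for d in per_thread_demand)


def utilization(busy_units, total_units):
    """Fraction of the available units doing work in a cycle."""
    return busy_units / total_units


if __name__ == "__main__":
    for demand in [(1, 2), (2, 2), (3, 3)]:
        smt = smt_busy(demand, total_units=4)
        cmt = cmt_busy(demand, units_per_cluster=4)
        print(
            f"demand={demand}: "
            f"SMT {smt}/4 busy ({utilization(smt, 4):.0%}), "
            f"CMT {cmt}/8 busy ({utilization(cmt, 8):.0%})"
        )
```

With demands of (1, 2) or (2, 2), both layouts keep the same number of units busy per cycle, so the shared pool shows the higher utilization. With a demand of (3, 3), the shared pool caps at 4 busy units while the two clusters keep 6 busy, which is the contention case raised earlier in the thread. Which situation dominates in practice depends on the workload.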
07-19-2010, 08:23 AM
savantu

Quote:

Originally Posted by Manicdan

why would cpus be built in such a way they are never used to maximum capacity?

Because it is damn hard to achieve high utilization in just about every domain. Look at your body; when do you use it at max capacity ? The arms for example ? I really doubt you use more than 30-40% of the capacity of the right arm and 10-20% of the left one ( if you're right handed ).

But, I suppose, you don't think in this way : " why does the human body have 2 arms if they are never used to maximum capacity ? ".
07-19-2010, 08:28 AM
Manicdan

Quote:

Originally Posted by savantu

Because it is damn hard to achieve high utilization in just about every domain. Look at your body; when do you use it at max capacity ? The arms for example ? I really doubt you use more than 30-40% of the capacity of the right arm and 10-20% of the left one ( if you're right handed ).

But, I suppose, you don't think in this way : " why does the human body have 2 arms if they are never used to maximum capacity ? ".

well if i could ask god why i have 2 arms when i almost always use just one i would, but i can't. so instead i'm asking why build a chip of which you say, and i quote:

Quote:

No single thread will use all of them.

so are cpu designers idiots who overbuild chips knowing it's a waste of money and resources? or is it that SMT only gets a 15% bonus because chips are almost always being fully utilized?
