AMD to Disclose Details About Bulldozer Micro-Architecture in August

**haylui** · 06-30-2010, 08:14 AM

Originally Posted by w0mbat

i mean the latter.

surprising for whom? i dont think for u

THANKS.
i thought you would say former
scare the **** out of me

**Chumbucket843** · 06-30-2010, 08:25 AM

Originally Posted by Dresdenboy

That's correct. But several BD-related patents indicate that there could be many buffers and queues to help loosen the coupling between different units. There are also patents talking about data crossing clock domains. And a simple case would be to have units with twice the clock frequency. You could even interleave the accesses of slow (half) clock frequency units on a half-cycle basis.

the buffers and queues are there to keep communication overhead of a clustered uarch down.

could you give a link to that patent? i really would like to know how they are going to handle sequencing with high clockspeeds. even if you double the clockspeed for a pipeline stage there will still be a lot of complex issues like clock skew, power, and area. in the past AMD has made a few borked synchronizers. idk if they want to go that direction with BD but that was 30 years ago.

**Dresdenboy** · 06-30-2010, 11:57 AM

Originally Posted by Chumbucket843

the buffers and queues are there to keep communication overhead of a clustered uarch down.

could you give a link to that patent? i really would like to know how they are going to handle sequencing with high clockspeeds. even if you double the clockspeed for a pipeline stage there will still be a lot of complex issues like clock skew, power, and area. in the past AMD has made a few borked synchronizers. idk if they want to go that direction with BD but that was 30 years ago.

They should have some experience now with a differently clocked NB or HT PHY. However, using a specific clock and 2x or 0.5x that clock as a second clock should be OK to handle. Cell did it this way.

And some more recent patents:
http://www.freepatentsonline.com/y2008/0288805.html
http://www.freepatentsonline.com/y2009/0261869.html
http://www.freepatentsonline.com/y2010/0049887.html
http://www.freepatentsonline.com/7636803.html
and a paper (NB related, by some of the inventors):
http://www.computer.org/portal/web/c.../ASYNC.2007.21

**Chrysalis** · 06-30-2010, 04:41 PM

Originally Posted by Sn0wm@n

havent we addressed that same subject last page???

we addressed nothing. I said it cant be coded for. your response is they should code for it.

I am thinking what the ???? to your response.

I know in linux and freebsd htt threads show up as real cpus; what I dont know is if its the same case in windows or not.

eg. on a freebsd server I have access to right now it is reporting 8 processors on a quad core htt cpu.

**~~wuttz~~** · 06-30-2010, 04:55 PM

doesn't the windows scheduler have some mechanism that assigns threads onto the first 2/4/6 physical/real cores first-
and only when all those real cores have been used then it starts to allocate more threads to the htt-pipeline?

and that coders have access to this mechanism if their threads initiate this resource call?

6.2 Improving Application Performance on Hyper-Threading-Enabled Systems
In general, multithreaded Windows applications perform better when running unmodified on an HT processor than they do on a similarly equipped single-threaded processor. To optimize the application performance benefit on HT-enabled systems, the application should ensure that the threads executing on the two logical processors have minimal dependencies on the same shared resources on the physical processor. With an understanding of how the application threads and processes utilize the shared resources on an HT processor, setting processor affinity to minimize competition for these system resources can help application performance.

The following example scenarios describe good and bad ways to set thread affinities:
Good HT thread affinity example. Where an application has threads that produce data and threads that consume data, setting affinities so that consumer/producer thread pairs run on the logical processors of the same physical processor should improve performance. This configuration allows the threads to share cached data and to overlap operation. That is, the producer thread can produce future items while the consumer thread is consuming older items.

i.e., the more stalls/bubbles an app has, the more htt will be beneficial. otherwise, if an app is optimized as indicated above, htt will also increase performance.

On HT-enabled systems, each logical processor is treated as an individual processor by the operating system and is represented by a bit in the system affinity mask. This is true for both HT-aware and non-HT-aware releases of the Windows operating system.
The system processor affinity mask can be read using the GetProcessAffinityMask function. The mask has a bit set for each processor in the system. The mask can be used by applications to set processor affinity for its threads and processes using the SetThreadAffinityMask or SetThreadIdealProcessor functions.

so, theres the call parameters.
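
For anyone who wants to see how that mask is laid out, here is a minimal sketch in Python. It only models the bit arithmetic on the kind of value that GetProcessAffinityMask returns and SetThreadAffinityMask accepts; it does not call the Windows API. The helper names `cpus_in_mask` and `mask_for_cpus` are made up for illustration, and the 8-bit mask is an assumed example (a quad core with HT), not output from a real system.

```python
def cpus_in_mask(mask):
    """Return the indices of the logical processors whose bit is set in mask."""
    cpus = []
    index = 0
    while mask:
        if mask & 1:
            cpus.append(index)
        mask >>= 1
        index += 1
    return cpus


def mask_for_cpus(cpus):
    """Build an affinity mask with one bit set per logical processor index."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


# Assumed example: 8 logical processors, so bits 0..7 of the system mask are set.
system_mask = 0xFF
print(cpus_in_mask(system_mask))

# Restrict a thread to logical processors 0 and 1 (a placeholder choice).
print(hex(mask_for_cpus([0, 1])))
```

Which bits belong to the same physical core is not encoded in the mask itself, so picking 0 and 1 above is only a placeholder; the pairing has to come from a separate topology query.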

5.3 Using the YIELD (PAUSE) Instruction to Avoid Spinlock Contention
Where two logical processors on the same physical HT processor are competing for access to the same piece of data, the shared resources on the device can have the effect of "starving" one of the logical processors by, in effect, denying it access to the data. This is particularly significant when the piece of data is a spinlock, because the logical processor that is starved of access might own the spinlock. Intel recommends that logical processors be paused while executing spinlocks to alleviate this problem.

5.2 Aggressive HALT of Processors in the Idle Loop
When a processor in a system running the Windows operating system has no work to do, it enters the idle loop. If the first logical processor on an HT processor is executing instructions in the idle loop, that is, if it is not doing any real work, it is competing for shared resources, which degrades the performance capability of the second logical processor on the same physical processor. The result of this is to degrade the rate at which the second logical processor could do real work.
To minimize the impact of this, the idle loop in Windows XP and the Windows Server 2003 family has been modified to more aggressively HALT processors that are executing in the idle loop. After a logical processor has been halted, it no longer executes instructions and no longer competes for shared resources.

The performance increase that is delivered when transitioning from one active logical processor to two active logical processors, on the same physical processor, is typically in the range of 10% (10) to 30% (30). So on average the total system performance would be likely to increase from 200 to 220 (that is, it goes up by 10%).
This lower performance increase is due to the fact that two threads are competing for the use of the shared resources on one of the physical HT processors. So scheduling a thread onto an HT processor that already has an active logical processor has the following effects:
o Slowing down the performance of that active logical processor
o Limiting the performance of the new scheduled thread on the second logical processor
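
The numbers in that passage can be checked with a little arithmetic. The sketch below assumes, as the "(10)" and "(30)" figures suggest, that one busy physical processor is scored as 100 units and that the starting point is two physical processors running one thread each (200). The function name and the use of the midpoint of the quoted range are assumptions for illustration.

```python
def total_after_adding_thread(baseline_total, ht_gain_units):
    """Total throughput after a thread lands on an already busy physical processor."""
    return baseline_total + ht_gain_units


baseline = 200        # two physical processors, one thread each (assumed 100 apiece)
low, high = 10, 30    # gain range quoted in the text, in the same units
midpoint = (low + high) / 2

new_total = total_after_adding_thread(baseline, midpoint)
percent_increase = (new_total - baseline) * 100 / baseline
print(new_total, percent_increase)
```

By the same scoring, a thread placed on an idle physical processor would add roughly 100 instead of about 20, which seems to be the comparison the passage is drawing.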

[ m$ ]

**Hornet331** · 07-01-2010, 02:53 AM

Originally Posted by Chrysalis

we addressed nothing. I said it cant be coded for. your response is they should code for it.

I am thinking what the ???? to your response.

I know in linux and freebsd htt threads show up as real cpus; what I dont know is if its the same case in windows or not.

eg. on a freebsd server I have access to right now it is reporting 8 processors on a quad core htt cpu.

Obviously it was addressed, you just failed to read/understand it.

Linux has been SMT (HT) aware since kernel 2.4.18 and Windows since XP. The OS knows which cores are real cores and which are logical cores.

And you can code for it; at least in Windows there are certain commands available to the programmer to retrieve the mapping of the cores if you want to assign thread affinity manually.

http://www.xtremesystems.org/forums/...&postcount=262

Read the doc that is linked in that post. I am also fairly certain that there is a similar option for Linux.
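
On the Linux side, one way to see the mapping is the sysfs topology files, for example `/sys/devices/system/cpu/cpu0/topology/thread_siblings_list`, which lists the logical CPUs that share a core with cpu0. Below is a minimal Python sketch, assuming a Linux system with sysfs mounted. The function names are hypothetical, and the sample strings are made up to stand in for an 8-thread quad core.

```python
import glob


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,4' or '0-3,8' into sorted indices."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def group_siblings(sibling_lists):
    """Collapse per-CPU sibling lists into one tuple per physical core."""
    return sorted({tuple(parse_cpu_list(text)) for text in sibling_lists})


def read_sibling_lists():
    """Read thread_siblings_list for every CPU (Linux only; empty elsewhere)."""
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"
    lists = []
    for path in glob.glob(pattern):
        try:
            with open(path) as handle:
                lists.append(handle.read())
        except OSError:
            continue  # skip entries that cannot be read
    return lists


# Made-up sample: logical CPUs n and n+4 share a core.
sample = ["0,4", "1,5", "2,6", "3,7", "0,4", "1,5", "2,6", "3,7"]
print(group_siblings(sample))

# On a real Linux box, use the live data instead.
print(group_siblings(read_sibling_lists()))
```

Each tuple in the result is one physical core and its logical CPUs, which is the information needed before setting thread affinity by hand.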

**Chrysalis** · 07-01-2010, 01:24 PM

yep know about affinity so yeah at the very least thats available, will read up on the doc and say my thoughts after.

**Dimitriman** · 07-01-2010, 10:03 PM

I am officially on board the Bulldozer Bandwagon...

'Give me 16 cores or give me death!'

**Jowy Atreides** · 07-04-2010, 03:20 PM

I hope the socket platform is disclosed.

I just bought AM3 with the intention of upgrading to BD and now there's net rumours about an AM3 rev2

**Kej** · 07-04-2010, 03:42 PM

Originally Posted by Jowy Atreides

I hope the socket platform is disclosed.

I just bought AM3 with the intention of upgrading to BD and now there's net rumours about an AM3 rev2

Socket AM3r2 has been known for some time.

Sampsa put up a slide here on the forum last year.

IIRC socket AM2+ was called AM2r2 before it was released, so AM3r2 could be
called AM3+ at launch perhaps.
Hopefully a BIOS upgrade is the only thing that is needed, and the manufacturers
aren't so lazy that they try to avoid releasing them.

**freeloader** · 07-04-2010, 08:07 PM

I'm pretty sure I read somewhere that AM3 will be compatible with Bulldozer. Unless something has changed since this slide was made.

**Jowy Atreides** · 07-04-2010, 08:26 PM

Originally Posted by freeloader

I'm pretty sure I read somewhere that AM3 will be compatible with Bulldozer. Unless something has changed since this slide was made.

**pokipoki** · 07-04-2010, 09:12 PM

It's a real pity that Opterons can't be overclocked due to lack of motherboards. The cost-effectiveness of the platform is of great value, especially to advanced home users. Socket longevity, more cores for lower prices etc. I'd be willing to pay more for a multi-processor Opteron board with overclocking abilities. I'm sure most duallie fans will do the same. Maybe we should setup a poll to measure the consensus?

**Wishmaker** · 07-05-2010, 03:46 AM

Can anyone confirm the 9 core rumour?

**informal** · 07-05-2010, 04:53 AM

Rumor about 9 cores is not true.

**JF-AMD** · 07-05-2010, 05:09 AM

Originally Posted by pokipoki

It's a real pity that Opterons can't be overclocked due to lack of motherboards. The cost-effectiveness of the platform is of great value, especially to advanced home users. Socket longevity, more cores for lower prices etc. I'd be willing to pay more for a multi-processor Opteron board with overclocking abilities. I'm sure most duallie fans will do the same. Maybe we should setup a poll to measure the consensus?

Opteron is targeted at commercial server applications. It would be extremely expensive to support the consumer market so that will not happen. I have gone through the economics on several forums, but let me dispel the two biggest myths quickly:

1. There is not a "huge market" for server parts in consumer environments. There are definitely people that will want to do this, but it is a very small part of the market.

2. It is not inexpensive to "just add support". Essentially you are doubling a lot of the back end costs.

I would never stop a consumer from doing this because it is my job to sell more processors. But I would warn that if you go down that path, you won't see the level of support that you will see on Phenom and other consumer brands.

**JF-AMD** · 07-05-2010, 05:09 AM

Originally Posted by Wishmaker

Can anyone confirm the 9 core rumour?

Not true.

**zir_blazer** · 07-05-2010, 07:23 AM

Originally Posted by JF-AMD

1. There is not a "huge market" for server parts in consumer environments. There are definitely people that will want to do this, but it is a very small part of the market.

Who was the guy responsible for the Socket 939 Opteron 1xx? They were extremely famous and popular here around 2006 because you could get Athlon 64 FX-worthy bins in an Opteron 165 for 250 U$D or so. Besides that, they used Toledo JH-E6 parts with 1 MB of L2 cache per core, while comparable A64 X2s were Manchester BH-E4 with 512 KB of L2. The enthusiast market ate up those "Server parts".

About the 9 Core issue, the "128 Bits Core that interconnects the 8 64 Bits ones" sounds like the 128-bit IMC and a crossbar or something.

**freeloader** · 07-05-2010, 07:29 AM

Originally Posted by pokipoki

It's a real pity that Opterons can't be overclocked due to lack of motherboards. The cost-effectiveness of the platform is of great value, especially to advanced home users. Socket longevity, more cores for lower prices etc. I'd be willing to pay more for a multi-processor Opteron board with overclocking abilities. I'm sure most duallie fans will do the same. Maybe we should setup a poll to measure the consensus?

I'd love an overclocking board based on two 4 or 6 core Opteron 4000 series processors.

**JF-AMD** · 07-05-2010, 07:44 AM

Originally Posted by zir_blazer

Who was the guy responsible for the Socket 939 Opteron 1xx? They were extremely famous and popular here around 2006 because you could get Athlon 64 FX-worthy bins in an Opteron 165 for 250 U$D or so. Besides that, they used Toledo JH-E6 parts with 1 MB of L2 cache per core, while comparable A64 X2s were Manchester BH-E4 with 512 KB of L2. The enthusiast market ate up those "Server parts".

About the 9 Core issue, the "128 Bits Core that interconnects the 8 64 Bits ones" sounds like the 128-bit IMC and a crossbar or something.

I can guarantee you that while those were popular with enthusiasts, that was not a net benefit for AMD. And we did not sell nearly as many as you probably think. I have a Fox Vanilla 140 on my bike and so do a lot of the people that I ride with, but that does not mean it is the most popular fork out there. Just real popular with my friends.

**zir_blazer** · 07-05-2010, 07:56 AM

Originally Posted by JF-AMD

I can guarantee you that while those were popular with enthusiasts, that was not a net benefit for AMD. And we did not sell nearly as many as you probably think. I have a Fox Vanilla 140 on my bike and so do a lot of the people that I ride with, but that does not mean it is the most popular fork out there. Just real popular with my friends.

Opteron 1xx parts were in short supply back in their time, so I suppose that the enthusiast market did have an impact on them. Obviously it wasn't a net benefit because you were selling the highest quality bin at very cheap prices that cannibalized sales of the A64 counterparts. But for those who got their hands on them, it was wonderful.

**Dresdenboy** · 07-06-2010, 03:04 PM

Each BD core has 4 ALUs and at least 3 AGUs. I found some nice numbers in Open64 sources again. More (plus SB BOINC stats) here http://citavia.blog.de/2010/07/06/bu...-core-8927293/

It seems, Hiroshige Goto needs to redraw his diagrams (and me too).

**informal** · 07-06-2010, 04:25 PM

Thanks for the new update, looks very interesting. Those integer cores should be quite powerful now that we know that Bobcat is pretty speedy too.

**slaveondope** · 07-06-2010, 04:29 PM

Originally Posted by freeloader

I'd love an overclocking board based on two 4 or 6 core Opteron 4000 series processors.

They make them with nvidia chipsets

**vietthanhpro** · 07-06-2010, 05:55 PM

K8->K8L: 2 load or 1 load and 1 store per cycle
Bobcat: 1 load and 1 store per cycle
Bulldozer: 2 load and 1 store per cycle
Core: 2 load or 2 store per cycle
