Main Rig:
Processor & Motherboard:AMD Ryzen5 1400 ' Gigabyte B450M-DS3H
Random Access Memory Module:Adata XPG DDR4 3000 MHz 2x8GB
Graphic Card:XFX RX 580 4GB
Power Supply Unit:FSP AURUM 92+ Series PT-650M
Storage Unit:Crucial MX 500 240GB SATA III SSD
Processor Heatsink Fan:AMD Wraith Spire RGB
Chassis:Thermaltake Level 10GTS Black
the buffers and queues are there to keep communication overhead of a clustered uarch down.
could you give a link to that patent? i really would like to know how they are going to handle sequencing with high clockspeeds. even if you double the clockspeed for a pipeline stage there will still be a lot of complex issues like clock skew, power, and area. in the past AMD has made a few borked synchronizers. idk if they want to go that direction with BD but that was 30 years ago.
They should have some experience now with a differently clocked NB or HT PHY. However using a specific clock and 2x or 0.5x that clock as a second clock, should be ok to handle. Cell did it this way.
And some more recent patents:
http://www.freepatentsonline.com/y2008/0288805.html
http://www.freepatentsonline.com/y2009/0261869.html
http://www.freepatentsonline.com/y2010/0049887.html
http://www.freepatentsonline.com/7636803.html
and a paper (NB related, by some of the inventors):
http://www.computer.org/portal/web/c.../ASYNC.2007.21
we addressed nothing. I said it can't be coded for. your response is they should code for it.
I am thinking what the ???? to your response.
I know that in linux and freebsd htt logical cpus show up as real cpus; what I don't know is whether it's the same in windows.
e.g. on a freebsd server I have access to right now it is reporting 8 processors on a quad core htt cpu.
doesn't the windows scheduler have some mechanism that assigns threads onto the first 2/4/6 physical/real cores first-
and only when all those real cores have been used then it starts to allocate more threads to the htt-pipeline?
and that coders have access to this mechanism if their threads initiate this resource call?
ist, the more stalls/bubbles an app has, the more htt will be beneficial. otherwise, if an app is optimized as indicated above, htt will also increase performance.
6.2 Improving Application Performance on Hyper-Threading-Enabled Systems
In general, multithreaded Windows applications perform better when running unmodified on an HT processor than they do on a similarly equipped single-threaded processor. To optimize the application performance benefit on HT-enabled systems, the application should ensure that the threads executing on the two logical processors have minimal dependencies on the same shared resources on the physical processor. With an understanding of how the application threads and processes utilize the shared resources on an HT processor, setting processor affinity to minimize competition for these system resources can help application performance.
The following example scenarios describe good and bad ways to set thread affinities:
Good HT thread affinity example. Where an application has threads that produce data and threads that consume data, setting affinities so that consumer/producer thread pairs run on the logical processors of the same physical processor should improve performance. This configuration allows the threads to share cached data and to overlap operation. That is, the producer thread can produce future items while the consumer thread is consuming older items.
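As a rough illustration of the pairing described above, here is a minimal Python sketch that builds one affinity mask per thread so that each producer/consumer pair lands on the two logical processors of the same physical processor. It assumes logical processors 2k and 2k+1 are the siblings of physical processor k, which is only an assumption; the real mapping has to be queried from the OS. The function name `pair_affinity_masks` is made up for this example.

```python
def pair_affinity_masks(num_physical):
    """Return (producer_mask, consumer_mask) for each physical processor.

    Assumption (not from the quoted document): logical processors 2k and
    2k+1 are the two HT siblings of physical processor k.
    """
    pairs = []
    for k in range(num_physical):
        producer_mask = 1 << (2 * k)       # first logical processor of core k
        consumer_mask = 1 << (2 * k + 1)   # its HT sibling on the same core
        pairs.append((producer_mask, consumer_mask))
    return pairs


if __name__ == "__main__":
    for k, (p, c) in enumerate(pair_affinity_masks(4)):
        print(f"physical {k}: producer mask {p:#04x}, consumer mask {c:#04x}")
```

Each mask has exactly one bit set, so a thread given that mask can only run on that one logical processor.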
so, there's the call parameters.
On HT-enabled systems, each logical processor is treated as an individual processor by the operating system and is represented by a bit in the system affinity mask. This is true for both HT-aware and non-HT-aware releases of the Windows operating system.
The system processor affinity mask can be read using the GetProcessAffinityMask function. The mask has a bit set for each processor in the system. The mask can be used by applications to set processor affinity for its threads and processes using the SetThreadAffinityMask or SetThreadIdealProcessor functions.
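To make the mask layout concrete, here is a small Python sketch of the bit arithmetic only: it lists which logical processors a mask covers and builds a mask restricted to a chosen subset. On Windows the mask value would come from GetProcessAffinityMask; here it is just a hard-coded example integer, and the helper names are hypothetical.

```python
def processors_in_mask(mask):
    """Return the indices of the logical processors whose bit is set."""
    indices = []
    bit = 0
    while mask:
        if mask & 1:
            indices.append(bit)
        mask >>= 1
        bit += 1
    return indices


def mask_for(indices):
    """Build an affinity mask with one bit set per logical processor index."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return mask


if __name__ == "__main__":
    system_mask = 0xFF  # example value only: 8 logical processors
    print(processors_in_mask(system_mask))
    print(hex(mask_for([0, 2, 4, 6])))
```

A mask built this way is the kind of value that would be handed to a call such as SetThreadAffinityMask.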
5.3 Using the YIELD (PAUSE) Instruction to Avoid Spinlock Contention
Where two logical processors on the same physical HT processor are competing for access to the same piece of data, the shared resources on the device can have the effect of "starving" one of the logical processors by, in effect, denying it access to the data. This is particularly significant when the piece of data is a spinlock, because the logical processor that is starved of access might own the spinlock. Intel recommends that logical processors be paused while executing spinlocks to alleviate this problem.
5.2 Aggressive HALT of Processors in the Idle Loop
When a processor in a system running the Windows operating system has no work to do, it enters the idle loop. If the first logical processor on an HT processor is executing instructions in the idle loop, that is, if it is not doing any real work, it is competing for shared resources, which degrades the performance capability of the second logical processor on the same physical processor. The result of this is to degrade the rate at which the second logical processor could do real work.
To minimize the impact of this, the idle loop in Windows XP and the Windows Server 2003 family has been modified to more aggressively HALT processors that are executing in the idle loop. After a logical processor has been halted, it no longer executes instructions and no longer competes for shared resources.
[ m$ ] The performance increase that is delivered when transitioning from one active logical processor to two active logical processors, on the same physical processor, is typically in the range of 10% (10) to 30% (30). So on average the total system performance would be likely to increase from 200 to 220 (that is, it goes up by 10%).
This lower performance increase is due to the fact that two threads are competing for the use of the shared resources on one of the physical HT processors. So scheduling a thread onto an HT processor that already has an active logical processor has the following effects:
o Slowing down the performance of that active logical processor
o Limiting the performance of the new scheduled thread on the second logical processor
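The arithmetic behind those figures can be sketched in Python. It assumes, as the 200 baseline suggests, two physical HT processors each delivering 100 units with one active logical processor, and takes 20% (the midpoint of the quoted 10% to 30% range) as the gain from the second logical processor. The function name and the per-processor baseline are assumptions for illustration.

```python
def system_performance(num_physical, num_with_second_thread, ht_gain, base=100.0):
    """Total throughput when some physical processors run two logical processors.

    ht_gain is the fractional gain from activating the second logical
    processor on one physical processor (the quoted range is 0.10 to 0.30).
    """
    single = (num_physical - num_with_second_thread) * base
    doubled = num_with_second_thread * base * (1.0 + ht_gain)
    return single + doubled


if __name__ == "__main__":
    before = system_performance(2, 0, 0.20)  # both processors run one thread
    after = system_performance(2, 1, 0.20)   # one processor runs two threads
    print(before, after, (after - before) / before)
```

Under these assumptions a 20% gain on one processor is a 10% gain for the whole system, which matches the 200 to 220 step in the quoted text.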
Last edited by wuttz; 07-01-2010 at 02:36 AM.
Obviously it was addressed, you just failed to read/understand it.
Linux has been SMT (HT) aware since kernel 2.4.18 and Windows since XP. The OS knows which cores are real cores and which are logical cores.
And you can code for it; at least in Windows there are certain commands available to the programmer to retrieve the mapping of the cores if you want to assign thread affinity manually.
http://www.xtremesystems.org/forums/...&postcount=262
Read the doc that is linked in that post. I am also fairly certain that there is a similar option for Linux.
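On Linux the sibling mapping is exposed through sysfs, in files such as /sys/devices/system/cpu/cpu0/topology/thread_siblings_list. A minimal Python sketch of parsing that format and grouping logical CPUs by physical core follows. The helper names are made up for this example, and it simply returns an empty list on systems without those files.

```python
import glob


def parse_cpu_list(text):
    """Parse a kernel CPU list such as '0,4' or '0-3,8' into sorted indices."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def sibling_groups(
    pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list",
):
    """Return the distinct groups of logical CPUs that share a physical core."""
    groups = set()
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                groups.add(tuple(parse_cpu_list(f.read())))
        except OSError:
            continue  # CPU offline or file unreadable
    return sorted(groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print(group)
```

On an HT-enabled machine each printed group should contain two logical CPUs; with HT off or absent, each group has one.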
yep know about affinity so yeah at the very least thats available, will read up on the doc and say my thoughts after.
I am officially on board the Bulldozer Bandwagon...
'Give me 16 cores or give me death!'
Gigabyte Z77X-UD5H
G-Skill Ripjaws X 16Gb - 2133Mhz
Thermalright Ultra-120 eXtreme
i7 2600k @ 4.4Ghz
Sapphire 7970 OC 1.2Ghz
Mushkin Chronos Deluxe 128Gb
Socket AM3r2 has been known for some time.
Sampsa put up a slide here on the forum last year.
IIRC socket AM2+ was called AM2r2 before it was released, so AM3r2 could be
called AM3+ at launch perhaps.
Hopefully a BIOS upgrade is the only thing that is needed, and the manufacturers
aren't so lazy that they try to avoid releasing them.
I'm pretty sure I read somewhere that AM3 will be compatible with Bulldozer. Unless something has changed since this slide was made.
[image]
Last edited by freeloader; 07-04-2010 at 08:12 PM.
As quoted by LowRun......"So, we are one week past AMD's worst case scenario for BD's availability but they don't feel like communicating about the delay, I suppose AMD must be removed from the reliable sources list for AMD's products launch dates"
It's a real pity that Opterons can't be overclocked due to lack of motherboards. The cost-effectiveness of the platform is of great value, especially to advanced home users. Socket longevity, more cores for lower prices etc. I'd be willing to pay more for a multi-processor Opteron board with overclocking abilities. I'm sure most duallie fans would do the same. Maybe we should set up a poll to measure the consensus?
Can anyone confirm the 9 core rumour?
MOBO: Gigabyte U3DR /1.6 Bios FI
CPU: I7 920@ 3.8 GHz
COOLER: Wobble X (CF2)/ Push-Pull ZF3
RAM : 6GB TRI KIT G-SKILL DDR3 CL7
Video: CrossFireX XFX ATI 4890 XXX 1GB DDR5 GPU: 1 GHz / MEM: 4 GHz
HDD: 1 * 500GB WD Caviar+RAID 0 2 * 1 TB SAMSUNG SPINPOINT
Optical: Sony Optiarc Labelflash
Audio : X-FI TITANIUM
PSU: Corsair CMPSU-750TX
Case: Antec 902
Rumor about 9 cores is not true.
Opteron is targeted at commercial server applications. It would be extremely expensive to support the consumer market so that will not happen. I have gone through the economics on several forums, but let me dispel the two biggest myths quickly:
1. There is not a "huge market" for server parts in consumer environments. There are definitely people that will want to do this, but it is a very small part of the market.
2. It is not inexpensive to "just add support". Essentially you are doubling a lot of the back end costs.
I would never stop a consumer from doing this because it is my job to sell more processors. But I would warn that if you go down that path, you won't see the level of support that you will see on Phenom and other consumer brands.
Who was the guy responsible for the Socket 939 Opteron 1xx? They were extremely famous and popular here around 2006 because you could get Athlon 64 FX-worthy bins in a 250 U$D or so Opteron 165. Besides that, they used Toledo JH-E6 parts with 1 MB of L2 cache per core, while comparable A64 X2s were Manchester BH-E4 with 512 KB of L2. The enthusiast market ate up those "Server parts".
About the 9 Core issue, the "128 Bits Core that interconnects the 8 64 Bits ones" sounds like the 128-bit IMC and a crossbar or something.
I can guarantee you that while those were popular with enthusiasts, that was not a net benefit for AMD. And we did not sell nearly as many as you probably think. I have a Fox Vanilla 140 on my bike and so do a lot of the people that I ride with, but that does not mean it is the most popular fork out there. Just real popular with my friends.
Opteron 1xx parts were in short supply back in their day, so I suppose the enthusiast market did have an impact on them. Obviously it wasn't a net benefit, because you were selling the highest-quality bin at very cheap prices that cannibalized sales of the A64 counterparts. But for those who got their hands on them, it was wonderful.
Each BD core has 4 ALUs and at least 3 AGUs. I found some nice numbers in Open64 sources again. More (plus SB BOINC stats) here http://citavia.blog.de/2010/07/06/bu...-core-8927293/
It seems, Hiroshige Goto needs to redraw his diagrams (and me too).
Thanks for the new update, looks very interesting. Those integer cores should be quite powerful now that we know that Bobcat is pretty speedy too.
K8->K8L: 2 load or 1 load and 1 store per cycle
Bobcat: 1 load and 1 store per cycle
Bulldozer: 2 load and 1 store per cycle
Core: 2 load or 2 store per cycle
When AMD had 64-bit and Intel had only 32-bit, they tried to tell the world there was no need for 64-bit. Until they got 64-bit.
When AMD had IMC and Intel had FSB, they told the world "there is plenty of life left in the FSB" (actual quote, and yes, they had *math* to show it had more bandwidth). Until they got an IMC.
When AMD had dual core and Intel had single core, they told the world that consumers don't need multi core. Until they got dual core.
When Intel was using MCM, they said it was a better solution than native dies. Until they got native dies. (To be fair, we knocked *unconnected* MCM, and still do, we never knocked MCM as a technology, so hold your flames.)
by John Fruehe