I think this offers a peek at some of the tech in the BD cores... This is an intro to Power7:
http://www.eetimes.com/news/semi/sho...leID=219400955
Power7 looks incredible. Memory bandwidth is no joke!
The eDRAM cache of more than 16 Mbytes, improved off-chip signaling techniques "and a few more ingredients," helped IBM get beyond the 300 Gbyte/second memory bandwidth of the Power6. In addition, Power7 is said to pack as many as eight DDR3 memory channels.
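For a rough sense of what "eight DDR3 memory channels" buys, peak theoretical bandwidth is just channels × transfer rate × bytes per transfer. The sketch below assumes standard 64-bit (8-byte) DDR3 channels and picks DDR3-1066 and DDR3-1333 purely as example speed grades. The article doesn't say which grade or memory interface Power7 uses, or how IBM counts the 300 GB/s Power6 figure, so this only illustrates the formula.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Peak theoretical bandwidth in GB/s (10^9 bytes/s).

    Assumes each channel moves `bus_bytes` bytes per transfer
    (8 bytes for a standard 64-bit DDR3 channel).
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Example speed grades only; not IBM's actual Power7 memory configuration.
for grade in (1066, 1333):
    print(f"8 x DDR3-{grade}: {peak_bandwidth_gb_s(8, grade):.1f} GB/s peak")
```

Real sustained bandwidth is lower than this peak because of refresh, bank conflicts and read/write turnarounds.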
With server and desktop systems drifting apart, I wonder how much this would actually help on a desktop, though...
4 threads per core would mean 24 threads for a 6-core processor... how the h3ll are we supposed to keep such a CPU busy with games that use 1-4 threads?
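The thread arithmetic in that question, as a quick sketch: hardware contexts = cores × threads per core, and a game with N busy threads can occupy at most N of them. This assumes one busy software thread fills one context, which overstates things (SMT contexts share a core's execution units), so treat the result as a ceiling, not a throughput estimate.

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Total hardware thread contexts on the chip."""
    return cores * threads_per_core


def max_occupancy(busy_threads: int, contexts: int) -> float:
    """Upper bound on the fraction of contexts a workload can keep occupied,
    assuming one busy software thread fills one hardware context."""
    return min(busy_threads, contexts) / contexts


contexts = hardware_threads(cores=6, threads_per_core=4)  # the 6-core, 4-way case from the post
for game_threads in (1, 2, 4):
    share = max_occupancy(game_threads, contexts)
    print(f"{game_threads} busy thread(s): at most {share:.0%} of {contexts} contexts occupied")
```

With 6 cores × 4 threads, a 4-thread game touches at most 4 of 24 contexts, about one sixth of the chip's thread slots.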
Fast computers breed slow, lazy programmers
The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
http://www.lighterra.com/papers/modernmicroprocessors/
Modern RAM makes an old overclocker miss BH-5 and the fun it was
And now something a bit different: an official statement from a high-ranking principal member of technical staff at AMD regarding Bulldozer's key improvement:
There's a glimpse of a CMT definition somewhere in that statement. "The next big turn of the screw for AMD will involve plugging its next-generation Bulldozer core into a Magny-Cours design. The new core expands what has been the single-threaded nature of the AMD cores 'in a different fashion than Hyperthreading,' said Conway, referring to Intel's method for supporting two threads on a core."
+1
I think you are right. I think the next-gen arch will have tidbits from the Power7 arch, and CMT is right there. "Going to CMT is a very sad move, though, after the whole 'real men, real cores' thing."
Power7 seems like a very good server processor but not as good a desktop one. Then again, the Nehalem arch is proof of how well a server processor can perform in desktop apps.
Also notice this:
"Basically we are taking a leaf from [Intel's] book but doing it differently." Read old IBM arch white papers and you tend to notice how differently CMT is implemented; it's more civic, to say the least. "In a different fashion than Hyperthreading."
Last edited by ajaidev; 08-25-2009 at 08:36 AM.
Looking at patents tells you very little, far less than what is required to make an educated guess about what AMD's next uarch is going to look like. Companies patent like crazy, even if they will probably never use half of those patents. If you can, why not? Just in case...
I'm pretty sure we could look at thousands of Intel/IBM patents from the '80s and build, in our imagination, a top-notch uarch for the 21st century. Which is, with all due respect, what Dresdenboy did: he looked over hundreds or thousands of patents and picked what he considered would make a great uarch. Somebody's wishes aren't necessarily what AMD will produce.
And BTW, Sandy Bridge has been redone twice, IIRC. You can imagine how high Intel is aiming after Nehalem.
More news at XbitLabs:
AMD’s Bulldozer Processors to Feature Multi-Threading Technology [UPDATED]
I'll get Dresdenboy in here to respond to your statements.
Hehe, they didn't OFFICIALLY announce multi-threading. Probably something they want under NDA for a good long time. Still, it's interesting that Orochi, the Bulldozer desktop variant, is only supposed to have 4 cores. It wouldn't make much sense to run only 4 threads all the way into 2011. I'm sure we'll hear more about all of this soon, so let's stay tuned.
Last edited by Mechromancer; 08-25-2009 at 01:18 PM.
More BS from XbitlaBS. They have now updated the article with AMD's denial of any form of SMT in BD cores (no surprise there; SMT was never on AMD's priority list since, if you look at the diagram and the performance/efficiency it shows, it is the least desirable of all 4 approaches).
BD cores will have advanced multithreading enhancements (CMT) as well as improved single-thread performance, but SMT is simply not on the list. That doesn't mean it will never be there, though.
@Mechromancer
Orochi has more than 4 cores according to the latest roadmap, so likely 6 and going up from there. Also, if adding more cores in the BD design could help non-multithreaded workloads to some degree, then it is not wasted die space in any case.
Last edited by informal; 08-25-2009 at 01:41 PM.
This CPU is supposed to come out in 2 years, not 10...
Even with extensive effort, I find it very hard to believe that 16+ threads will make sense in 2 years... just look at how long it took for 2 threads to make sense within a single application (for multitasking it has been a blessing from day 1), and AFAIK implementing that is way easier than going truly multithreaded and making use of 4+ threads.
Not to mention that there's a huge number of apps out there that can't be and never will be multithreaded... and from reading that paper, it sounds like implementing CMP results in 30% lower single-threaded performance...
Sacrificing single-thread performance for a huge number of threads and better overall IPC makes no sense in the desktop segment... it's great for servers, though... the two segments are really growing further apart, I think...
It would be nice if AMD went for a heterogeneous design: one or two fast single-thread cores and then several CMP cores...
Must be tricky to implement and balance the resources, though...
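The trade-off in this post (a few fast threads vs. many slower ones) is what Amdahl's law captures: if only a fraction p of the work runs in parallel, the speedup on n threads is 1 / ((1 - p) + p/n). The toy model below multiplies that by a per-thread speed factor, using the post's "30% lower single-threaded performance" as a 0.7 factor. The 4-core vs. 16-thread designs are hypothetical, and the model assumes the serial part also runs at the reduced speed, so it sketches the reasoning rather than predicting any real chip.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Amdahl's law: speedup over one thread when only `parallel_fraction` of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / threads)


def relative_performance(parallel_fraction: float, threads: int, per_thread_speed: float) -> float:
    """Performance relative to one baseline-speed thread.

    Toy assumption: every thread, including the one running the serial
    part, runs at `per_thread_speed` times the baseline.
    """
    return per_thread_speed * amdahl_speedup(parallel_fraction, threads)


# Hypothetical designs: 4 fast cores vs. 16 contexts that are each 30% slower.
for p in (0.0, 0.5, 0.9, 0.99):
    fast = relative_performance(p, threads=4, per_thread_speed=1.0)
    wide = relative_performance(p, threads=16, per_thread_speed=0.7)
    print(f"p={p:.2f}: 4 fast cores {fast:.2f}x, 16 slower threads {wide:.2f}x")
```

With these made-up parameters, the wider-but-slower design loses when half the work is serial and only pulls ahead once most of the work parallelizes, which is why the serial fraction of desktop apps matters so much in this argument.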
It appears to me that the primary difference between AMD's CMT and Intel's Hyperthreading is that AMD is putting more focus on single-thread performance and Intel is putting more focus on multi-threading performance. AMD's design appears to be able to decode 8 instructions in parallel via 4 fast-path and 4 micro-decoders, in sharp contrast with Intel's Nehalem, which has only 3 fast-path decoders and 1 micro-decoder.
And of course we can always speculate whether the SIMD unit can effectively be used as 8 64-bit floating-point units to execute 8 separate floating-point instructions per clock cycle.
Hmm... reading the papers linked above, I get exactly the opposite feeling. Clustered multithreading may have worse internal (instruction?) latencies than SMT. Also, a wider architecture may hurt the final clock frequency (Itanium, anyone?). And Hans claims that more than 3-way width is almost pointless for single-thread performance (which is of course arguable), so I don't see how an 8-way decoder would help single-thread performance.
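Hans' point about width can be illustrated with a tiny list scheduler: once the issue width exceeds the instruction-level parallelism the code actually exposes, extra width buys nothing. This is a toy model (unit latency, perfect branch prediction, no cache misses, oldest-first issue), and the example program, three independent chains of four dependent ops, is made up for the illustration. Real code has different ILP, so the plateau at width 3 below is a property of this example, not a general law.

```python
from typing import Dict, List


def schedule_cycles(deps: Dict[int, List[int]], width: int) -> int:
    """Cycles needed to issue every instruction with a greedy list scheduler.

    Toy assumptions: unit latency, perfect branch prediction, no cache misses,
    at most `width` instructions issued per cycle, oldest (lowest-numbered)
    ready instructions first. `deps[i]` lists the instructions i depends on.
    """
    ready_at: Dict[int, int] = {}  # instruction -> cycle its result becomes available
    pending = sorted(deps)
    cycle = 0
    while pending:
        ready = [i for i in pending
                 if all(d in ready_at and ready_at[d] <= cycle for d in deps[i])]
        if not ready:
            raise ValueError("dependency on an unknown instruction (or a cycle)")
        issued = set(ready[:width])
        for i in issued:
            ready_at[i] = cycle + 1  # unit latency
        pending = [i for i in pending if i not in issued]
        cycle += 1
    return cycle


# Made-up program: three independent chains, each of four dependent ops.
chains: Dict[int, List[int]] = {}
for start in (0, 4, 8):
    for k in range(4):
        chains[start + k] = [start + k - 1] if k else []

for w in (1, 2, 3, 4, 8):
    cycles = schedule_cycles(chains, w)
    print(f"width {w}: {cycles} cycles, IPC {len(chains) / cycles:.2f}")
```

Going from width 3 to 8 leaves this program at 4 cycles because the dependency chains, not the decoders, set the limit. Width 2 takes 8 cycles here partly because oldest-first issue is naive; a smarter interleaving of the three chains fits in 6.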
Last edited by kl0012; 08-25-2009 at 09:33 PM.
K7 has proven to be better than P6.
K7 is the grandfather of K10.5, and P6 is the grandfather of all of Intel's current work.
So what's wrong with K10 and K10.5?
AMD got a lot of the work on K6, K7 and K8 from outside. K10 is the first one fully developed in-house.
And the architecture is a lot more complex than it was 10 years ago.
So what's going wrong?
K10 had a huge problem with 65nm. I'm sure AMD tried to make K10 4-issue but ran into heat problems at 65nm.
They didn't go 4-wide at 45nm because they thought 6 cores would be a bigger improvement.
They're right, but to beat Intel, AMD needs a 4-wide design to match Intel's 4-wide architecture.
With 4-wide single-thread execution, AMD is going to be a lot faster than Intel in single-threaded work at the same frequency.
But I think they can't do 6 cores and 4-wide at the same time. 4-wide means a lot of heat (remember K7), so they are waiting for a better node, 32nm.
And a 4-wide design with current technology needs HT to be really efficient, and the best option is dynamic HT that can be configured on the fly.
I'm seriously waiting for G34 in a 2-CPU setup.
I'll be able to get more and faster RAM (with HT 3.0 tech) and more cores, even if they are slower ones. Right now, more cores is better, and clock for clock AMD is very close to Intel: a bit slower than i7 but the same speed as the latest 45nm Core 2 Quad. It would be nice to be able to upgrade to K11 without changing anything else. AMD hasn't said that yet, but it seems likely.
After reading the first post in the thread, I now understand why Intel is trying to drive AMD into bankruptcy. Fewer dollars for AMD's R&D means less for Intel to defend against when K11 enters the fight.
x86 is a hard world ^^
Josh Walrath gave us an honorable mention in PCPer's podcast #71: http://www.pcper.com/article.php?aid=411. Check it out!
Here's a link to a paper from Cornell University that describes the CMT design in detail, as well as possible implementations and efficiency gains compared to the SMT and partitioned-SMT approaches. In conclusion, the researchers state that the CMT4 approach (a hypothetical 4-cluster design) performs better than the SMT16 approach (hypothetical 16-thread SMT) while drawing much less power.
I do find one thing funny about the CMT design when looking at the linked paper: the name the internet rumor mill gave it 2 or 3 years ago, the so-called "reverse Hyperthreading" in AMD's next-gen cores. Looking at the design of CMT and what it actually does, it's clear that the name was not far from the truth. The actual Hyperthreading tech in Intel's P4 and i7 distributes multiple threads into one pipeline and effectively tries to eliminate the "bubbles" in it. The CMT approach, on the other hand, attempts to execute one thread not across several cores but across several integer clusters (inside one core), which are actually independent pipelines (bar the decoding stage), so the "reverse Hyperthreading" nickname is actually correct.
AMD left the FP/SSE unit "unclustered", probably for a good reason. The clusters themselves (2-way) would be much simpler units from a design POV, could be designed with a high level of power management in mind, and could be clocked more aggressively.
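On the clustered side, one classic question in any clustered design is which cluster each instruction goes to. A common heuristic from the clustered-microarchitecture literature is dependence-based steering: send an instruction to the cluster that produced its source operands (avoiding cross-cluster bypass delays), otherwise to the least-loaded cluster. The sketch below applies that heuristic to a hypothetical toy instruction format, with "load" as a simple running count rather than real queue occupancy; nothing in the thread says AMD steers this way.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy instruction: (destination register, source registers).
Instr = Tuple[str, Sequence[str]]


def steer(program: List[Instr], clusters: int = 2) -> List[int]:
    """Dependence-based steering: assign each instruction to a cluster.

    Prefer the cluster that produced the most of the instruction's source
    operands; on a tie (including no known producers), prefer the cluster
    with fewer instructions assigned so far, then the lowest index.
    """
    producer: Dict[str, int] = {}  # register -> cluster that last wrote it
    load = [0] * clusters          # instructions assigned to each cluster so far
    assignment: List[int] = []
    for dest, srcs in program:
        votes = [0] * clusters
        for reg in srcs:
            if reg in producer:
                votes[producer[reg]] += 1
        # max() keeps the first of equal keys, so full ties fall to the lowest index.
        target = max(range(clusters), key=lambda c: (votes[c], -load[c]))
        assignment.append(target)
        load[target] += 1
        producer[dest] = target
    return assignment


program: List[Instr] = [
    ("r1", []),            # chain A starts
    ("r2", []),            # chain B starts
    ("r1", ["r1"]),        # continues chain A
    ("r2", ["r2"]),        # continues chain B
    ("r3", ["r1", "r2"]),  # joins both chains
]
print(steer(program))
```

In the example, the two independent chains end up on separate clusters, and the final instruction that joins them has to pull one operand across clusters whichever way it is steered.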
Last edited by informal; 08-27-2009 at 01:57 PM.
Hello folks, thanks to Mechromancer for inviting me, although I had already found the thread while looking for links to my blog. I will try to contribute a bit as time allows.
@informal:
I found the same paper a while back when I was looking for CMT research papers.
@savantu:
Let me bring up my famous predecessor Hans de Vries and his various microarchitecture analyses, which can be found here:
http://www.chip-architect.com/
If we want to look at what somebody wishes, then I'd draw µArchs with 6-way execution clusters and added DSPs, an MEU (multimedia execution unit, which was one candidate for being TFP capable - technical floating point with 3 operands and 32 registers - and which died because of SSE2 in 64-bit mode with 16 registers), or older 8-way archs in 2-cluster variants. All of that has been in older patents, where it appeared and disappeared in phases, maybe depending on whether designs were continued or scrapped. Remember all those scrapped K9 designs and the reiterated Bulldozer etc., as we heard from certain news/rumor sites.
Some interesting architectures, still originating from rather old patents, already look somewhat similar to what we find today:
http://www.chip-architect.com/news/2...hitecture.html
But in AMD's case I think we can read patents differently. Intel, IBM and other large companies have a lot of people working on such designs and try to cover many ideas to fill their IP pool. AMD is not so IP-oriented (it just protects itself to some degree) because it can't afford to pay for tons of potentially useless patents or to waste a lot of its design teams' time developing "fun architectures" and patenting them. However, if someone developed an idea with some future potential, they might patent it just in case. That likely happens often during the early design stages.
There are many Intel patents that don't seem related to anything known or planned, just ideas which might be useful at some point in the future. But I don't look at that stuff. I just try to find common threads across a lot of patents and ideas which look useful or fit current and older academic research (CPU manufacturers and designers often trail current research by many years, even before starting a design).
Since AMD is posting loss after loss each quarter, they have to focus on the really important things.
And if there is some truth in what Charlie Demerjian seems to know about the core, then what I found fits rather well with his statements, such as a shared FPU, 2 int clusters and so on.
Are you just here for (attempted) comic relief, Hornet? That fad of trashing IT journalists is long dead; did you miss the memo? Or maybe you think it's the popular opinion? Maybe among trolls, but for the most part, many in the industry give these rumors a lot more credit than you do. Just so you know.