
Thread: Dresdenboys' blog: AMD Bulldozer - Patent based research

  1. #26
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Quote Originally Posted by informal View Post
    I doubt the core(s) would be able to execute 4 threads (maybe Mechromancer meant 4 instructions per clock?). The patent simply describes a 4-way design (4-way as in 4-way decoding; in K8/K10 we are stuck at 3-way). The new approach is flexible: it keeps the superior efficiency of 2-way decoding and combines the 2 int pipelines to get as much efficiency as possible from integer code (as close as possible to ideal 4-way execution). Similarly, in the FP/SSE case we have a 4-way, although "single", SSE unit which is "super" wide (256bit, supporting the AVX and FMA4 extensions) and which is capable of splitting in various ways (1x256, 2x128 or even 4x64bit), similar to the "Itanium way", which in turn could make it much more efficient than the present, as Hans calls it, dumb way of doing things.

    The described design would still receive many more improvements to other aspects of the core and uncore parts, but the underlying uarchitecture is pretty well presented in dresdenboy's blog. The described design should be very efficient in both multithreading and singlethreading, relying on the split int pipelines for great efficiency and even possible speculative execution (it could be useful with branch prediction and data reliability). There are also many patents on improvements in the area of power management, GPU/CPU integration (2nd gen. of Fusion in ~2012) etc.
    Also, the design can always be extended in the future by adding one more int "cluster", making a possibly efficient 3x2-way integer "super cluster" (a real unified 6-way design, i.e. the natural extension of what we have today, would be a power hog and much less efficient). The FP/SSE part would need appropriate rework, and this could be a challenge since in present-day patents the SSE unit is still unified and not split into smaller clusters.
    Ok I will go with this as Informal is much more educated on the subject. Thank you Informal! I think it's safe to say we'll see a higher IPC out of BD. Throughput Architecture anyone?
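    The split modes mentioned in the quote (1x256, 2x128, 4x64bit) are just different ways of dividing one 256-bit datapath into equal lanes. A rough sketch of that arithmetic, assuming equal-width lanes; the function name is made up for illustration, and nothing here reflects how the actual hardware would schedule those lanes:

```python
def lane_split(unit_bits: int, lane_bits: int) -> int:
    """Number of equal-width lanes a SIMD unit of unit_bits can be divided into.

    Illustrative helper only (not from any patent or AMD document).
    """
    if lane_bits <= 0 or unit_bits % lane_bits != 0:
        raise ValueError("lane width must evenly divide the unit width")
    return unit_bits // lane_bits


# The three modes named in the post, for a 256-bit unit.
for lane_bits in (256, 128, 64):
    print(f"{lane_split(256, lane_bits)}x{lane_bits}bit")
```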
    Core i7 2600K@4.6Ghz| 16GB G.Skill@2133Mhz 9-11-10-28-38 1.65v| ASUS P8Z77-V PRO | Corsair 750i PSU | ASUS GTX 980 OC | Xonar DSX | Samsung 840 Pro 128GB |A bunch of HDDs and terabytes | Oculus Rift w/ touch | ASUS 24" 144Hz G-sync monitor

    Quote Originally Posted by phelan1777 View Post
    Hail fellow warrior albeit a surat Mercenary. I Hail to you from the Clans, Ghost Bear that is (Yes freebirth we still do and shall always view mercenaries with great disdain!) I have long been an honorable warrior of the mighty Warden Clan Ghost Bear the honorable Bekker surname. I salute your tenacity to show your freebirth sibkin their ignorance!

  2. #27
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Quote Originally Posted by ajaidev View Post
    I think the PowerPC7 tech may have a deeper impact on BD than SPARC... It was long rumored that PowerPC7 and future AMD server processors will share the same socket. Also, SPARC uses SMT to get to 8 TPC, and AMD already said they don't like it. I even asked a question on SMT in AMD's work, and the reply was pretty anti-SMT, to say the least.
    I think this is a peek at some of the tech in the BD cores... This is an intro to PowerPC7

    http://www.eetimes.com/news/semi/sho...leID=219400955

  3. #28
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Power7 looks incredible. Memory bandwidth is no joke!
    The eDRAM cache of more than 16 Mbytes, improved off-chip signaling techniques "and a few more ingredients," helped IBM get beyond the 300 Gbyte/second memory bandwidth of the Power6. In addition, Power7 is said to pack as many as eight DDR3 memory channels.

  4. #29
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Wonder what latency the eDRAM has, bandwidth is nice, but cpus also like low latencies.

  5. #30
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    with servers and desktop systems spreading apart i wonder how much this would actually help in a desktop though...
    4 threads per core would mean 24 threads for a 6core processor... how in h3ll are we supposed to keep such a cpu busy with games using 1-4 threads...
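    To put numbers on that worry, here is a minimal sketch of the arithmetic. The 4-threads-per-core and 6-core figures come from the post above; the function names, and the assumption that each game thread keeps exactly one hardware thread busy, are illustrative only:

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Total hardware threads the OS would see."""
    return cores * threads_per_core


def busy_fraction(software_threads: int, hw_threads: int) -> float:
    """Upper bound on the share of hardware threads kept busy,
    assuming one software thread occupies one hardware thread."""
    return min(software_threads, hw_threads) / hw_threads


total = hardware_threads(cores=6, threads_per_core=4)
for game_threads in (1, 2, 4):
    share = busy_fraction(game_threads, total)
    print(f"{game_threads} game thread(s): {share:.0%} of {total} hardware threads busy")
```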

  6. #31
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by saaya View Post
    with servers and desktop systems spreading apart i wonder how much this would actually help in a desktop though...
    4 threads per core would mean 24 threads for a 6core processor... how in h3ll are we supposed to keep such a cpu busy with games using 1-4 threads...
    Based on the current schematics that are posted, it would be dual-thread only, but it should show up as two distinct, separate cores to the operating system.
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  7. #32
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by saaya View Post
    with servers and desktop systems spreading apart i wonder how much this would actually help in a desktop though...
    4 threads per core would mean 24 threads for a 6core processor... how in h3ll are we supposed to keep such a cpu busy with games using 1-4 threads...
    magic word: speculative threading

  8. #33
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    And now something a bit different, an official statement from a high-ranking principal member of technical staff at AMD regarding Bulldozer's key improvement:


    The next big turn of the screw for AMD will involve plugging its next-generation Bulldozer core into a Magny-Cours design. The new core expands what has been the single-threaded nature of the AMD cores "in a different fashion than Hyperthreading," said Conway, referring to Intel's method for supporting two threads on a core.
    There's a glimpse of CMT definition somewhere in that statement .

  9. #34
    Registered User
    Join Date
    Mar 2009
    Posts
    72
    Quote Originally Posted by saaya View Post
    with servers and desktop systems spreading apart i wonder how much this would actually help in a desktop though...
    4 threads per core would mean 24 threads for a 6core processor... how in h3ll are we supposed to keep such a cpu busy with games using 1-4 threads...
    10 years from now current games might resemble pong.

  10. #35
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Quote Originally Posted by informal View Post
    And now something a bit different, an official statement from a high-ranking principal member of technical staff at AMD regarding Bulldozer's key improvement:


    There's a glimpse of CMT definition somewhere in that statement .
    +1

    I think you are right. I think the next-gen arch will have tidbits from the PowerPC7 arch, and CMT is right there "going to CMT is a very sad move though, the whole thing about real men real cores"

    The PowerPC7 seems like a very good server processor but not as good a desktop one. But the Nehalem arch is proof of how well a server processor can perform in desktop apps.


    Also notice this:-

    Basically we are taking a leaf from [Intel's] book but doing it differently,
    in a different fashion than Hyperthreading,
    Read old IBM arch white papers and one tends to notice how differently CMT is implemented; it's more civic, to say the least.
    Last edited by ajaidev; 08-25-2009 at 08:36 AM.

  11. #36
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,730
    Looking at patents tells you very little, far less than what is required to do an educated guess on what AMD's next uarch is going to look like. Companies patent like crazy, even if they will probably never use half of those patents. If you can, why not ? Just in case....
    I'm pretty sure we can look at thousands of Intel/IBM patents from the '80s and build in our imagination a top-notch uarch for the 21st century. Which is, with all due respect, what Dresdenboy did: looked over hundreds/thousands of patents and picked what he considered would make a great uarch. Somebody's wishes aren't necessarily what AMD will produce.

    And btw Sandy Bridge has been redone twice if IIRC. You can imagine how high Intel aims after Nehalem.
    Quote Originally Posted by Heinz Guderian View Post
    There are no desperate situations, there are only desperate people.

  12. #37
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,128
    Quote Originally Posted by savantu View Post
    Looking at patents tells you very little, far less than what is required to do an educated guess on what AMD's next uarch is going to look like. Companies patent like crazy, even if they will probably never use half of those patents. If you can, why not ? Just in case....
    I'm pretty sure we can look at thousands of Intel/IBM patents from the '80s and build in our imagination a top-notch uarch for the 21st century. Which is, with all due respect, what Dresdenboy did: looked over hundreds/thousands of patents and picked what he considered would make a great uarch. Somebody's wishes aren't necessarily what AMD will produce.

    And btw Sandy Bridge has been redone twice if IIRC. You can imagine how high Intel aims after Nehalem.
    Word. Good bye AMD!

    Oh wait...


  13. #38
    Xtreme Mentor
    Join Date
    Jun 2008
    Location
    France - Bx
    Posts
    2,601

  14. #39
    Xtreme Member
    Join Date
    May 2009
    Location
    Calgary, Alberta
    Posts
    115
    Quote Originally Posted by Olivon View Post
    Thank you sir!

  15. #40
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Quote Originally Posted by savantu View Post
    Looking at patents tells you very little, far less than what is required to do an educated guess on what AMD's next uarch is going to look like. Companies patent like crazy, even if they will probably never use half of those patents. If you can, why not ? Just in case....
    I'm pretty sure we can look at thousands of Intel/IBM patents from the '80s and build in our imagination a top-notch uarch for the 21st century. Which is, with all due respect, what Dresdenboy did: looked over hundreds/thousands of patents and picked what he considered would make a great uarch. Somebody's wishes aren't necessarily what AMD will produce.

    And btw Sandy Bridge has been redone twice if IIRC. You can imagine how high Intel aims after Nehalem.
    I'll get Dresdenboy in here to respond to your statements.

    Quote Originally Posted by Olivon View Post
    Hehe, they didn't OFFICIALLY announce multi-threading. Probably something they want under NDA for a good long time. Still interesting is the fact that Orochi, the Bulldozer desktop variant, is still only supposed to have 4 cores. It wouldn't make much sense to only run 4 threads all the way in 2011. I'm sure we'll hear more about all of this soon, so let's stay tuned.
    Last edited by Mechromancer; 08-25-2009 at 01:18 PM.

  16. #41
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Olivon View Post
    More BS from XbitlaBS. They have now updated the article with AMD's denial of any form of SMT in BD cores (no surprise there; SMT was never on AMD's priority list since, if you look at the diagram and the performance/efficiency it shows, it is the least desirable of all 4 approaches).
    BD cores will have advanced multithreading enhancements (CMT), as well as improved single-thread performance, but SMT is simply not on the list. It doesn't mean it will never be there, though.

    @Mechromancer

    Orochi is >4 cores according to the last roadmap, so likely 6 and going up from there. Also, if adding more cores in the BD design could help non-multithreaded workloads to some degree, then it is not wasted die space in any case.
    Last edited by informal; 08-25-2009 at 01:41 PM.

  17. #42
    Xtreme X.I.P.
    Join Date
    Nov 2002
    Location
    Shipai
    Posts
    31,147
    Quote Originally Posted by lockee View Post
    10 years from now current games might resemble pong.
    this cpu is supposed to come out in 2 years, not 10...
    even with extensive efforts i find it very hard to believe 16+ threads make sense in 2 years... just look at how long it took for 2 threads to make sense (for one application; for multitasking it has been a blessing from day 1), and implementing that is way easier than going for true multithreading and making use of 4+ threads afaik.

    and not to mention that there's a huge amount of apps out there that can't and never will be multithreaded... and from reading that paper it sounds like implementing cmp results in 30% lower single threaded performance...

    sacrificing single thread performance to have a huge amount of threads and better overall ipc makes no sense in the desktop segment... it's great for servers though... the two segments really grow apart more and more i think...

    would be nice if amd would go for a heterogeneous core design, one or two fast single thread cores and then several cmp cores...
    must be tricky to implement and balance the resources though....

  18. #43
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    It appears to me that the primary difference with AMD's CMT and Intel's Hyperthreading is that AMD is putting more focus on single thread performance and Intel is putting more focus on multi-threading performance. AMD's design appears to have the ability to decode 8 instructions in parallel via 4 fast path and 4 micro-decoders; in sharp contrast with intel's nehalem which only has 3 fast path and 1 micro-decoder


    and of course we can always speculate if the SIMD unit can effectively be used as 8 64bit floating point units to execute 8 separate floating point instructions per clock cycle.

  19. #44
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by nn_step View Post
    It appears to me that the primary difference with AMD's CMT and Intel's Hyperthreading is that AMD is putting more focus on single thread performance and Intel is putting more focus on multi-threading performance.
    Hmm.. reading the papers linked above, I got exactly the opposite feeling. Clustered multi-threading may have worse internal (instruction?) latencies than SMT. A wider architecture may also affect final frequency (Itanium, anyone?). And Hans claims that more than 3-way width is almost pointless for single-thread performance (which is of course arguable), so I don't see how an 8-way decoder would help single-thread performance.
    Last edited by kl0012; 08-25-2009 at 09:33 PM.

  20. #45
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    K7 has proven to be better than P6.

    K7 is grandfather of K10.5, & P6 is grandfather of all actual intel's work.

    What's wrong with K10 & K10.5?

    AMD got a lot of the work on K6, K7 and K8 from outside. K10 is the first one fully developed in-house.

    And the architecture is a lot more complex than 10 years ago.

    What's going wrong, then?

    K10 had a huge problem with 65nm. I'm sure AMD tried to get a 4-issue design for K10 but ran into a heat problem on 65nm.

    They didn't go 4-way on 45nm because they thought 6 cores would be a better improvement.

    They're right, but to beat Intel, AMD needs 4-way to match Intel's 4-way architecture.

    With 4-way in single thread, AMD is going to be a lot faster than Intel in single thread at the same frequency.

    But they can't do 6 cores & 4-way, I think. 4-way means a lot of heat (remember K7), so they are waiting for a better node, 32nm.

    And 4-way with current technology needs HT to be really efficient, and the best is dynamic HT, set up on the fly.

    I'm seriously waiting for G34 with a 2-CPU setup.
    I will be able to get more & faster RAM (with HT 3.0 tech) and more cores, even if slower. Now more cores is better, & clock for clock AMD is very near Intel: a bit slower than i7 but the same speed as the last 45nm Core 2 Quad. It's nice to have the ability to upgrade to K11 without any change. AMD hasn't said that yet, but it's likely.

    With the first post in the thread, I understand now why Intel is trying to drive AMD to bankruptcy. Fewer dollars in R&D mean less work for them to defend when K11 is in the fight.

    x86 is hard world ^^

  21. #46
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Josh Walrath gave us an honorable mention in PCPer's podcast #71: http://www.pcper.com/article.php?aid=411. Check it out!

  22. #47
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Here's a link to a paper from Cornell Uni that describes the CMT design in detail, as well as the possible implementation and efficiency gains compared to the SMT and Partitioned SMT approaches. In conclusion, the researchers state that the CMT4 (hypothetical 4-cluster design) approach performs better than the SMT16 (hypothetical 16-thread SMT) approach while drawing much less power.

    I do find one thing funny in the CMT design while looking at the linked paper: the name it was given by the internet rumor mill 2 or 3 years ago, the so-called Reverse Hyperthreading in AMD's next-gen cores. Looking at the design of CMT and what it actually does, it's clear that the name was not far from the truth. The actual Hyperthreading tech in Intel's P4 and i7 distributes multiple threads in one pipeline and effectively tries to eliminate the "bubbles" in it. The CMT approach, on the other hand, attempts to execute one thread not across several cores but across several integer clusters (inside one core), which are actually independent pipelines (bar the decoding stage), so the "reverse hyperthreading" nickname is actually correct. AMD left the FP/SSE unit "unclustered", probably for a good reason. The clusters themselves (2-way) would be much simpler units from a design POV, could be designed with a high level of power management in mind, and could be clocked more aggressively.
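    A toy model of the "filling the bubbles" half of that comparison, nothing more. It assumes a single pipeline that issues at most one instruction per cycle, and it represents each thread as a per-cycle list where True means "has an instruction ready" and False means "stalled". The stall patterns and function names are invented for illustration and are not taken from the Cornell paper or from any real CPU:

```python
def utilization_single(thread):
    """Share of issue slots used when one thread owns the pipeline."""
    return sum(thread) / len(thread)


def utilization_smt(threads):
    """Share of issue slots used when the pipeline may issue, each cycle,
    from whichever thread is ready (at most one instruction per cycle)."""
    cycles = len(threads[0])
    used = sum(1 for c in range(cycles) if any(t[c] for t in threads))
    return used / cycles


# Made-up readiness patterns over 8 cycles (True = ready, False = bubble).
thread_a = [True, False, True, False, True, True, False, True]
thread_b = [False, True, True, False, False, True, True, False]

print("thread A alone:", utilization_single(thread_a))
print("thread B alone:", utilization_single(thread_b))
print("A and B sharing one pipeline:", utilization_smt([thread_a, thread_b]))
```

    The model ignores everything that makes real SMT hard (shared caches, fetch policy, the fact that a thread's stalls depend on when it last issued), so it only shows why a second thread can raise slot usage, not by how much.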
    Last edited by informal; 08-27-2009 at 01:57 PM.

  23. #48
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    Hello folks, thanks to Mechromancer for inviting me, although I had already found the thread by looking for links to my blog. I will try to contribute a bit as time allows.

    @informal:
    The same paper I found a while back when I was looking for CMT research papers.

    @savantu:
    Quote Originally Posted by savantu View Post
    Looking at patents tells you very little, far less than what is required to do an educated guess on what AMD's next uarch is going to look like. Companies patent like crazy, even if they will probably never use half of those patents. If you can, why not ? Just in case....
    I'm pretty sure we can look at thousands of Intel/IBM patents from the '80s and build in our imagination a top-notch uarch for the 21st century. Which is, with all due respect, what Dresdenboy did: looked over hundreds/thousands of patents and picked what he considered would make a great uarch. Somebody's wishes aren't necessarily what AMD will produce.
    Let me bring up my famous predecessor Hans de Vries and his various microarchitecture analyses, which can be found here:
    http://www.chip-architect.com/

    If we want to look at what somebody wishes, then I'd draw the µArchs of 6 way execution clusters with added DSPs, MEU (multimedia execution unit, was one candidate for being TFP capable - technical floating point with 3 operands and 32 registers.. - which died because of the 64bit mode SSE2 with 16 registers). Or older 8 way Archs, 2 cluster variants. That all has been in older patents, where it appeared and disappeared in phases, maybe in relation to designs being continued or getting scrapped. Let's remember all those scrapped K9 designs and reiterated Bulldozer etc as we heard it from certain news/rumor sites.

    Some interesting architecture, still originating from rather old patents, already looks somewhat similar to what we find today:
    http://www.chip-architect.com/news/2...hitecture.html

    But in case of AMD I think we can take patents in a different way. Intel, IBM and other large companies have a lot of people working on such designs and try to cover many ideas to fill their IP pool. AMD is not so much IP oriented (just protecting itself somehow) because it can't afford to pay for tons of potentially useless patents and waste a lot of the design teams' time for developing "fun architectures" and patenting them. However, if someone developed an idea with some future potential, they might patent it just in case. That likely happens often during the early design stages.

    There are many Intel patents looking like not related to anything known or planned, just ideas, which might be useful at some point in the future. But I don't look at that stuff. I just try to find common things in a lot of patents and ideas which look useful or fit to current and older academic research (CPU manufacturers or designers are often trailing current research by many years even before starting a design).

    So since AMD is producing loss after loss each quarter, they have to focus on the really important things to do.

    And if there is some truth in what Charlie Demerjian seems to know about the core, then what I found fits rather well with his statements, like the shared FPU, 2 int clusters and so on.

  24. #49
    I am Xtreme
    Join Date
    Jul 2007
    Location
    Austria
    Posts
    5,485
    Quote Originally Posted by Dresdenboy View Post
    Charlie Demerjian
    know....
    ...

  25. #50
    Xtreme Enthusiast
    Join Date
    Feb 2005
    Posts
    970
    Are you just here for (attempted) comic relief, Hornet? That fad of trashing IT journalists is long dead, did you miss the memo? Or maybe you think it's the popular opinion? Maybe among trolls, but for the most part, many in the industry give these rumors a lot more credit than you do. Just so you know.
