Results 151 to 175 of 262

Thread: Dresdenboys' blog: AMD Bulldozer - Patent based research

  1. #151
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Quote Originally Posted by informal View Post
    Just in from web cast:


    Bulldozer won't have classic "core" but something AMD calls modules !
    Bobcat is alive: sub-1W operation, super low power, yet 90% of the performance of today's mainstream CPUs! Fully modular and ready for APU implementation, has OoO abilities, 2-way execution, very high performance and IMO looks like one BD "module".

    Now on to BD: confirmed CMT design! More in a minute!
    In: Int units are shared (2x 2-way execution), one 256b-wide FPU. My God, DDboy hit the nail on the head; he is 99% correct in his speculation.


    more: highly advanced clock gating, shutting down individual modules for best perf./watt ratio, Turbo-like APM functionality.
    AMD states all of this is going to be a game changer.
    ALL HAIL DRESDENBOY!

    Hyperthreading = PWNT
    Core i7 2600K@4.6Ghz| 16GB G.Skill@2133Mhz 9-11-10-28-38 1.65v| ASUS P8Z77-V PRO | Corsair 750i PSU | ASUS GTX 980 OC | Xonar DSX | Samsung 840 Pro 128GB |A bunch of HDDs and terabytes | Oculus Rift w/ touch | ASUS 24" 144Hz G-sync monitor

    Quote Originally Posted by phelan1777 View Post
    Hail fellow warrior albeit a surat Mercenary. I Hail to you from the Clans, Ghost Bear that is (Yes freebirth we still do and shall always view mercenaries with great disdain!) I have long been an honorable warrior of the mighty Warden Clan Ghost Bear the honorable Bekker surname. I salute your tenacity to show your freebirth sibkin their ignorance!

  2. #152
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    ^^ In the above quote there is a chance it's 2x 4-way int clusters instead of DDboy's speculated 2x 2-way, since AMD lists 4 "pipes" in the BD module diagram. But I have no idea whether those pipes handle simple or complex instructions. The patents mention a possible total of 8 (eight!) instructions being executed in parallel (due to the ability to execute an additional 4 fastpath ones in the same clock cycle).
    Last edited by informal; 11-11-2009 at 01:10 PM.

  3. #153
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Quote Originally Posted by informal View Post
    ^^ In the above quote there is a chance it's 2x 4-way int clusters instead of DDboy's speculated 2x 2-way, since AMD lists 4 "pipes" in the BD module diagram. But I have no idea whether those pipes handle simple or complex instructions. The patents mention a possible total of 8 (eight!) instructions being executed in parallel (due to the ability to execute an additional 4 fastpath ones in the same clock cycle).
    Once again I say:

    AMD =

  4. #154
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    Quote Originally Posted by Mechromancer View Post
    Hyperthreading = PWNT
    CMT = paper

  5. #155
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    I just checked again and it is indeed 4-way execution, with 2x 2-way clusters within one module (CPU core), and these two share one wide (256b) SIMD unit. The front end for 4x4-way would be way too complex and expensive, at least for this generation of products. But it is still an option for future iterations of this (previously) unseen design approach. The fastpath comment still stands (even more so now), since 4 fastpath on top of 4 complex instructions gives precisely a total of 8 instructions in one cycle, as dresdenboy found in his research.
    What is amazing is the level of detail he "guessed"; he has been correct in almost every part of his speculation. I remember Savantu bashing ddboy's blog: how it was just pure wishful thinking and imagination, how semi companies patent useless stuff all the time, etc. Looks like he is this year's honorable bunnysuit winner.
    Quote Originally Posted by Chumbucket843 View Post
    CMT = paper
    Yes, for now, but it is a mini-revolution in 2011. The approach is novel and needs to be applauded, since it's a brave move from AMD.
    CMT has been all paper for years now; there are academic research papers, but not one firm ever even presented a possible design solution. The design is much more potent than half-threading (SMT in Intel's way of doing things), since resource sharing is done much better in hardware (via a common front end, separate int execution units that can share data, and one shared dual-threaded SIMD unit: a best-of-both-worlds approach). How it will work in practice we'll have to wait and see, but AMD stated that one small Bobcat core (based on the same Bulldozer) is at the 90% level of today's mainstream performance, all with that very low power draw.

    edit: let's not forget Hans de Vries and his chip-architect website, which detailed this very same approach 7 years ago (IIRC). This was the original Hammer design, not the Sledgehammer aka K8 which AMD launched back in 2003 (not to say K8 wasn't good, quite the opposite). Back in those days Hans presented a possible future core from AMD that resembles exactly what dresdenboy depicted in his diagrams and what AMD presented today.
    Last edited by informal; 11-11-2009 at 01:38 PM.

  6. #156
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Quote Originally Posted by informal View Post
    I just checked again and it is indeed 4-way execution, with 2x 2-way clusters within one module (CPU core), and these two share one wide (256b) SIMD unit. The front end for 4x4-way would be way too complex and expensive, at least for this generation of products. But it is still an option for future iterations of this (previously) unseen design approach. The fastpath comment still stands (even more so now), since 4 fastpath on top of 4 complex instructions gives precisely a total of 8 instructions in one cycle, as dresdenboy found in his research.
    What is amazing is the level of detail he "guessed"; he has been correct in almost every part of his speculation. I remember Savantu bashing ddboy's blog: how it was just pure wishful thinking and imagination, how semi companies patent useless stuff all the time, etc. Looks like he is this year's honorable bunnysuit winner.

    Yes, for now, but it is a mini-revolution in 2011. The approach is novel and needs to be applauded, since it's a brave move from AMD.
    CMT has been all paper for years now; there are academic research papers, but not one firm ever even presented a possible design solution. The design is much more potent than half-threading (SMT in Intel's way of doing things), since resource sharing is done much better in hardware (via a common front end, separate int execution units that can share data, and one shared dual-threaded SIMD unit: a best-of-both-worlds approach). How it will work in practice we'll have to wait and see, but AMD stated that one small Bobcat core (based on the same Bulldozer) is at the 90% level of today's mainstream performance, all with that very low power draw.
    Quote Originally Posted by Chumbucket843 View Post
    CMT = paper
    And we'll soon have a runner-up.

  7. #157
    Xtreme Addict
    Join Date
    Nov 2006
    Posts
    1,402
    If I understand correctly, 2 cores share 8 int pipelines. So on a dual core with a dual-threaded application you have up to 8 int/clock.
    And on the same processor, with a single-threaded application, you can still have up to 8 int/clock, because it's shared between the 2 cores.
    On a quad, a multithreaded bench with 4 threads can get up to 16 int/clock, and with only 2 threads you can get up to 16 int/clock if the "good cores" are used. With only one thread, 8/clock.

    Phenom II is based on athlon with only 3/clock/core.

    The performance increase could be amazing if they increase the L3 to feed that monster.

  8. #158
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Madcho, you are mixing some things up. You need to go back over the webcast and look again at dresdenboy's blog.

    Anyhow,Charlie D. has a new dirty tidbit :
    http://www.semiaccurate.com/2009/11/...rth-has-moved/

    Bulldozer has taped out, the earth has moved
    More analyst day dirt dug up
    by Charlie Demerjian

    November 11, 2009

    THREE VERY INTERESTING tidbits snuck out in the Q&A session at the AMD analyst day today. It seems that Fusion and the new cores have taped out and are at the fabs.

    The new cores were said to begin sampling to OEMs in 2010. When pressed on the timing of tapeouts, one AMD spokesperson said that the fabs were 'running product now'. That means the chips have taped out and the fun is about to begin.

    Next up was the process the Fusion cores will be on. The first of them will be made on a silicon-on-insulator (SOI) process, something that makes a lot of sense. It is much easier to port a GPU from bulk silicon to SOI than to do things the other way around. The answer did not preclude bulk silicon variants of Fusion in the future, but since the first generation cores are not made on it, I would not expect that to happen for a while.

    The last bit was confirmation of what we have known, or at least have strongly suspected for a while, that the first generation of Fusion products will be a 'stars' core. The optimistic view of this is that AMD is reusing the old K10 variant for time to market reasons. Basically the uncore was done first, and since it is modular, why not use it?

    If you are pessimistic, you could see this as the Bulldozer and Bobcat cores being massively late. Given that they were on the roadmap for 45nm and delayed about 2 years ago to 32nm, this has a ring of truth to it. Because it was a planned move, and one that rationalizes a likely untenable earlier schedule, I don't think this is a delay, or even a bad thing. The 'delay' probably avoided another "Barcelona".

    In the end, it looks like AMD is on track. 2010 will likely be full of pain, but you can finally see the light at the end of the tunnel. The first of the new parts have taped out, so it is only a matter of time before details start leaking. Then we will know if the grand plan is working, at least on a technical level. S|A
    Last edited by informal; 11-12-2009 at 05:52 AM.

  9. #159
    Xtreme Member
    Join Date
    Jul 2004
    Location
    Berlin
    Posts
    275
    I've updated my blog regarding Bulldozer's FMAC units.

    The information provided during the Analyst Day simply was not enough to satisfy me (and maybe most of us)

  10. #160
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Thanks for the update dresdenboy!

    It's amazing how many things "you got right". I still remember some skeptical Intel fans (savantu, where art thou?) who claimed that your patent-based research would not be successful at all, since companies "patent all kinds of stuff daily", and that the Bulldozer you predicted was wishful thinking. We all know how that turned out.

    Very interesting find on the fmac possible structure(especially that not-so-confidential-anymore paper ).

  11. #161
    Xtreme Enthusiast
    Join Date
    Jun 2005
    Posts
    960
    This way no [instruction] fusion of FADDs and FMULs (in todays code) is necessary, which would have not only added complexity in the decoders but would only work for certain combinations.
    That rules out some sort of micro-op fusion like the core architecture has?

  12. #162
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    So, "in english for the rest of us", how much performance will BD have over Phenom II; roughly?

  13. #163
    Xtreme Cruncher
    Join Date
    May 2009
    Location
    Bloomfield
    Posts
    1,968
    no one here has any idea.

  14. #164
    YouTube Addict
    Join Date
    Aug 2005
    Location
    Klaatu barada nikto
    Posts
    17,574
    Quote Originally Posted by freeloader View Post
    So, "in english for the rest of us", how much performance will BD have over Phenom II; roughly?
    In what exactly? Because the performance is going to vary based upon the task being used as the basis for comparison.
    Fast computers breed slow, lazy programmers
    The price of reliability is the pursuit of the utmost simplicity. It is a price which the very rich find most hard to pay.
    http://www.lighterra.com/papers/modernmicroprocessors/
    Modern Ram, makes an old overclocker miss BH-5 and the fun it was

  15. #165
    Xtreme Addict
    Join Date
    Jun 2002
    Location
    Ontario, Canada
    Posts
    1,782
    Quote Originally Posted by nn_step View Post
    In what exactly? Because the performance is going to vary based upon the task being used as the basis for comparison.
    Things that matter to me are Folding@Home and video & audio editing/transcoding.

  16. #166
    Xtreme Addict
    Join Date
    Jan 2009
    Posts
    1,445
    It will for sure have i7-level power (more, probably), but I think it is really up in the air. From what I understand, this design is very different/new, and because of this it is hard to tell what kind of power it will yield. Any of the gurus care to correct me?
    [MOBO] Asus CrossHair Formula 5 AM3+
    [GPU] ATI 6970 x2 Crossfire 2Gb
    [RAM] G.SKILL Ripjaws X Series 16GB (4 x 4GB) 240-Pin DDR3 1600
    [CPU] AMD FX-8120 @ 4.8 ghz
    [COOLER] XSPC Rasa 750 RS360 WaterCooling
    [OS] Windows 8 x64 Enterprise
    [HDD] OCZ Vertex 3 120GB SSD
    [AUDIO] Logitech S-220 17 Watts 2.1

  17. #167
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Location
    Singapore
    Posts
    970
    Quote Originally Posted by Chumbucket843 View Post
    CMT = paper
    AMD didn't adopt multi-threading in the Athlon 64 the way Intel did for the Pentium 4, and they realized that was a mistake. So they won't do this again!!!
    Main Rig:
    Processor & Motherboard:AMD Ryzen5 1400 ' Gigabyte B450M-DS3H
    Random Access Memory Module:Adata XPG DDR4 3000 MHz 2x8GB
    Graphic Card:XFX RX 580 4GB
    Power Supply Unit:FSP AURUM 92+ Series PT-650M
    Storage Unit:Crucial MX 500 240GB SATA III SSD
    Processor Heatsink Fan:AMD Wraith Spire RGB
    Chasis:Thermaltake Level 10GTS Black

  18. #168
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Location
    Singapore
    Posts
    970
    Quote Originally Posted by freeloader View Post
    So, "in english for the rest of us", how much performance will BD have over Phenom II; roughly?
    At the very least 50% over Phenom II, because AMD's engineers know very well that if a minimum of 50% can't be achieved it will be doomed, as Intel will be launching a new architecture to counter Bulldozer.

    From the paper, it is very clear that Bulldozer is going to benefit from the new design in terms of power dissipation and much higher IPC in ALU and FPU. Hopefully Bulldozer can make use of a built-in GPU to do much of the FPU-intensive work.

    I expect about 80%~100% over the current Phenom II in certain areas like encoding and ALU; overall, 60%.

  19. #169
    Xtreme X.I.P. Particle's Avatar
    Join Date
    Apr 2008
    Location
    Kansas
    Posts
    3,219
    Quote Originally Posted by Chumbucket843 View Post
    CMT = paper
    I don't think that's accurate. Since CMT isn't something they're likely to just tack on at the end and AMD is likely to be experimenting with pieces on silicon at this point, I think it's rather more likely that it isn't just some neat concept paper. At the very least, its physical implementation has probably been designed.
    Particle's First Rule of Online Technical Discussion:
    As a thread about any computer related subject has its length approach infinity, the likelihood and inevitability of a poorly constructed AMD vs. Intel fight also exponentially increases.

    Rule 1A:
    Likewise, the frequency of a car pseudoanalogy to explain a technical concept increases with thread length. This will make many people chuckle, as computer people are rarely knowledgeable about vehicular mechanics.

    Rule 2:
    When confronted with a post that is contrary to what a poster likes, believes, or most often wants to be correct, the poster will pick out only minor details that are largely irrelevant in an attempt to shut out the conflicting idea. The core of the post will be left alone since it isn't easy to contradict what the person is actually saying.

    Rule 2A:
    When a poster cannot properly refute a post they do not like (as described above), the poster will most likely invent fictitious counter-points and/or begin to attack the other's credibility in feeble ways that are dramatic but irrelevant. Do not underestimate this tactic, as in the online world this will sway many observers. Do not forget: Correctness is decided only by what is said last, the most loudly, or with greatest repetition.

    Rule 3:
    When it comes to computer news, 70% of Internet rumors are outright fabricated, 20% are inaccurate enough to simply be discarded, and about 10% are based in reality. Grains of salt--become familiar with them.

    Remember: When debating online, everyone else is ALWAYS wrong if they do not agree with you!

    Random Tip o' the Whatever
    You just can't win. If your product offers feature A instead of B, people will moan how A is stupid and it didn't offer B. If your product offers B instead of A, they'll likewise complain and rant about how anyone's retarded cousin could figure out A is what the market wants.

  20. #170
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Particle is correct, since Mr. Bergman stated in the Q&A session of the Analyst Day that they are twiddling around with the first samples at this moment and that they will be shipping the product to their partners (for evaluation and testing purposes) in the first half of 2010, right around the time the whole range of Magny-Cours and Lisbon products is launched.

  21. #171
    Xtreme Member
    Join Date
    Apr 2008
    Location
    Stockholm, Sweden
    Posts
    324
    Quote Originally Posted by http://www.sun.com/processors/throughput/faqs.html#5
    What is chip multithreading (CMT)? How does it differ from chip multiprocessing (CMP) and simultaneous multithreading (SMT)?

    Today's traditional single-core processors can only process one thread at a time, spending a majority of time waiting for data from memory. In sharp contrast, chip multithreading (CMT) refers to a processor's ability to process multiple software threads. A CMT processor could implement this multithreaded capability using a variety of methods, such as (i) having multiple cores on a single chip (CMP), (ii) executing multiple threads on a single core (SMT), or (iii) combination of both CMP and SMT.
    Didn't AMD say SMT was nothing for them and they focused on CMP?

  22. #172
    Xtreme Enthusiast
    Join Date
    Oct 2007
    Location
    Singapore
    Posts
    970
    Quote Originally Posted by Eson View Post
    Didn't AMD say SMT was nothing for them and they focused on CMP?

    At the time they said that, less than 0.1% of software supported this, and VMware was only used on servers.
    Now VMware is entering the desktop and more and more software is taking advantage of multi-core and multi-threading.
    Things change and so do trends. Intel once thought their CPUs would reach 10GHz in a few years. Weren't they right at the time they said it??
    Do not just take a paragraph out of context.


  23. #173
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    • Two integer clusters share fetch and decode logic but have their own dedicated Instruction and Data cache
    • Integer clusters can not be shared between threads: integer cores act like a Chip Multi Processing (CMP) CPU.
    • The extra integer core (schedulers, D-cache and pipelines) adds only 5% die space
    • L1-caches are similar to Barcelona/Shanghai (64 KB 2-way? Not confirmed)
    • Up to 4 modules share a L3-cache and Northbridge
    Two times 4 Bulldozer modules (2 x 8 "cores" or 16 cores) are about 60 to 80% faster than the twelve core Opteron 6100 CPU in SPECInt_rate.
    http://it.anandtech.com/IT/showdoc.aspx?i=3681&p=3

    Very interesting article, for whoever was asking about L1 instruction cache and CMP-related info. Lastly, SPECInt_rate: hehe, I am too tired to use the calculator; someone put those percentages into numerical values.
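
    For anyone who wants the core counting spelled out, here is a rough sketch of the layout in the bullet list above. The class and field names (Module, Die, package_cores) are made up for illustration and come from neither AMD nor the article; the only inputs are the figures quoted above (two integer clusters per module, up to four modules per L3/northbridge, two such groups per package).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Module:
    """One Bulldozer module as described in the quoted list (hypothetical model)."""
    integer_clusters: int = 2   # each cluster runs its own thread, CMP-style
    shared_front_ends: int = 1  # fetch/decode shared by both clusters
    shared_fpus: int = 1        # one FPU shared by both clusters


@dataclass(frozen=True)
class Die:
    """A group of modules sharing one L3 cache and northbridge."""
    modules: int = 4  # "up to 4 modules" per the list above
    module: Module = field(default_factory=Module)

    def cores(self) -> int:
        # What gets counted as a "core" is an integer cluster.
        return self.modules * self.module.integer_clusters


def package_cores(dies: int, die: Die = Die()) -> int:
    """Total "cores" in a package built from several dies (assumed MCM)."""
    return dies * die.cores()


if __name__ == "__main__":
    print(Die().cores())      # cores behind one shared L3
    print(package_cores(2))   # "two times 4 Bulldozer modules"
```

    With the defaults this gives 8 "cores" per die and 16 for the two-die part, matching the "2 x 8" wording above. It says nothing about performance, only about how the counting works.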
    Coming Soon

  24. #174
    Xtreme Addict
    Join Date
    Jan 2005
    Posts
    1,366
    Quote Originally Posted by ajaidev View Post
    http://it.anandtech.com/IT/showdoc.aspx?i=3681&p=3

    Very interesting article, for whoever was asking about L1 instruction cache and CMP-related info. Lastly, SPECInt_rate: hehe, I am too tired to use the calculator; someone put those percentages into numerical values.
    Depends on frequency, but here it is:
    A+ Server 1021M-UR+B, AMD Opteron 2439 SE (12 cores, 2.8GHz) - 215
    CELSIUS R670, Intel Xeon W5590 (8 cores, 3.3GHz) - 274 (+27%)
    Bulldozer (16 cores 2.8GHz?) - 344-387 (+60%-80%)
    SandyBridge (12-16 core server version?) - ???
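
    The arithmetic behind those figures, as a quick sketch. The baseline (215) and the Xeon score (274) are the numbers quoted above; the function names are made up, and using that 215 as the reference for the "60 to 80% faster" claim is an assumption carried over from this post, not something the article states.

```python
BASELINE_SCORE = 215  # Opteron 2439 SE result quoted above (assumed reference point)
XEON_SCORE = 274      # Xeon W5590 result quoted above


def projected_score(baseline: int, speedup_pct: int) -> float:
    """Score after applying a percentage speedup to the baseline."""
    return baseline * (100 + speedup_pct) / 100


def relative_gain_pct(score: int, baseline: int) -> float:
    """How much faster `score` is than `baseline`, in percent."""
    return (score - baseline) * 100 / baseline


if __name__ == "__main__":
    low = projected_score(BASELINE_SCORE, 60)
    high = projected_score(BASELINE_SCORE, 80)
    print(f"Bulldozer projection: {low:.0f} to {high:.0f}")
    print(f"Xeon vs baseline: +{relative_gain_pct(XEON_SCORE, BASELINE_SCORE):.0f}%")
```

    This is only a scaling of the quoted percentages; clocks, memory configuration, and compiler settings would all move the real numbers.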

  25. #175
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    First of all, we don't know the clocks of Interlagos ATM. Second, there will also be a 2P version of the 16-core variant (4 modules/8 cores per die, two dies in an MCM via Direct Connect 2.0, resulting in 16 cores within a single MPU; 4 DDR3 channels). That one will have massive int/fp rate results. And judging by Dresdenboy's latest blog post about the actual implementation of the FMAC units (bridged, as described in the patents), the fp/sse part will be brutally strong.
    Last edited by informal; 11-24-2009 at 01:09 PM.
