Thread: AMD Cayman info (or rumor)

  1. #326
    Xtreme Addict
    Join Date
    Feb 2007
    Location
    Arizona, USA
    Posts
    1,700
    Quote Originally Posted by zerazax View Post
    Why?

    The drivers say at least 2 cards within NI are going to be VLIW4, and since it's not Barts, that leaves Cayman and Antilles still to come...
    Hmm... haven't seen that.. mind linking me?


    VLIW5 stands for "Very Long Instruction Word" 5: VLIW is the architectural basis, and 5 is the number of execution units.
    Each shader 'unit' is composed of 5 execution units. Four of them are 'simple', while the 5th can handle all functions, including transcendental functions.
    This is one (of many) reasons that AMD's microarchitectures generally have more shaders than nvidia's yet deliver comparable performance: unless the developer specifically codes for this unique computing style, a game will only use some of the execution units. It is also one reason why the theoretical compute power of AMD's microarchitectures is far greater than what real applications achieve, because AMD's figures assume that all execution units are utilized.
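To put rough numbers on the utilization point, here is a minimal sketch of a toy packing model. It assumes each "step" of a shader offers some number of independent ops and that ops from different steps cannot share a bundle. The `pack` helper and the `steps` list are made up for illustration, and the model ignores the difference between the simple units and the 5th unit.

```python
import math

def pack(steps, width):
    """Pack dependent steps into VLIW bundles of the given width.

    steps[i] is the number of independent ops available at step i.
    Ops from different steps never share a bundle (simplifying assumption).
    Returns (bundles_issued, fraction_of_lanes_busy).
    """
    bundles = sum(math.ceil(n / width) for n in steps)
    return bundles, sum(steps) / (bundles * width)

# Hypothetical instruction mix, not measured from any real shader.
steps = [4, 3, 1, 4, 2, 5]

for width in (5, 4):
    bundles, util = pack(steps, width)
    print(f"VLIW{width}: {bundles} bundles, {util:.0%} of lanes busy")
```

With this made-up mix the 5-wide design issues 6 bundles at about 63% lane utilization, while the 4-wide design needs 7 bundles but keeps about 68% of its lanes busy. Real shader code will behave differently; the point is only that peak figures assume every lane is filled every cycle.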


    Core i7 920 D0 B-batch (4.1) (Kinda Stable?) | DFI X58 T3eH8 (Fed up with its' issues, may get a new board soon) | Patriot 1600 (9-9-9-24) (for now) | XFX HD 4890 (971/1065) (for now) |
    80GB X25-m G2 | WD 640GB | PCP&C 750 | Dell 2408 LCD | NEC 1970GX LCD | Win7 Pro | CoolerMaster ATCS 840 {Modded to reverse-ATX, WC'ing internal}

    CPU Loop: MCP655 > HK 3.0 LT > ST 320 (3x Scythe G's) > ST Res >Pump
    GPU Loop: MCP655 > MCW-60 > PA160 (1x YL D12SH) > ST Res > BIP 220 (2x YL D12SH) >Pump

  2. #327
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    Quote Originally Posted by nascasho View Post
    Whooa, looks sweet!

    So educate me on what VLIW4 is...
    In Cypress (and back to R600) there were 5 SPUs (Stream Processing Units) per SP (Streaming Processor).
    Like in the first image here: http://www.anandtech.com/show/2841/4
    Now there will be 4 SPUs per SP. And as ajaidev points out, there may be a different arrangement of simple and complex units within the SP instead of 4+1.

    If they keep a grouping of 16 SPs per SIMD, and there are 1920 SPUs, then Cayman will have 30 SIMDs per chip.

    The reason for such a change might be to increase utilization. It could have good results if one or two of the units in your 5 wide design are usually sitting idle. That's why RV770 and Evergreen had such a huge difference between low utilization (games) and high utilization (furmark) in terms of power-draw and heat output.

    Quote Originally Posted by ColonelCain View Post
    Hmm... haven't seen that.. mind linking me?


    VLIW5 stands for "Very Long Instruction Word" 5: VLIW is the architectural basis, and 5 is the number of execution units.
    Each shader 'unit' is composed of 5 execution units. Four of them are 'simple', while the 5th can handle all functions, including transcendental functions.
    This is one (of many) reasons that AMD's microarchitectures generally have more shaders than nvidia's yet deliver comparable performance: unless the developer specifically codes for this unique computing style, a game will only use some of the execution units. It is also one reason why the theoretical compute power of AMD's microarchitectures is far greater than what real applications achieve, because AMD's figures assume that all execution units are utilized.
    They have had a 5-wide design for a while now. Why assume that's never going to change? Especially on a generational change?

  3. #328
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Lansing, MI / London / Stinkaypore
    Posts
    1,788
    ^ Evergreen had OCP so Furmark didn't use more than board power.

    WXYZT turns to WXYZ with T probably done in emulation (WXYZ or T for GPGPU).
    The ultimate reason for this is die size. And yet Cayman is still big. Makes you wonder.
    Quote Originally Posted by radaja View Post
    so are they launching BD soon or a comic book?

  4. #329
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Effectively, Cayman could have 50% more SP resources than Cypress, taking into account the grouping per SIMD. If the clocks stay in the range of Cypress, this thing will be the fastest single-GPU card on the market, hands down. Now, how they will manage to put two of those monsters on an Antilles card and keep a "reasonable" TDP is another question.

  5. #330
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    Quote Originally Posted by Macadamia View Post
    ^ Evergreen had OCP so Furmark didn't use more than board power.

    WXYZT turns to WXYZ with T probably done in emulation (WXYZ or T for GPGPU).
    The ultimate reason for this is die size. And yet Cayman is still big. Makes you wonder.
    I disagree about die space. Reducing the width of the SP increases the ratio of logic per SPU. Without knowing more specific architectural details I'd guess 5D->4D is ambiguous or detrimental to die space.

  6. #331
    Xtreme Addict
    Join Date
    Jun 2007
    Location
    Thessaloniki, Greece
    Posts
    1,307
    If it is indeed a 4-port VLIW architecture, this will be very expensive for AMD, as I speculate they will have to recoup the R&D costs of moving the SI design from its initial 28nm target to 40nm using just one derivative. Of course, this will allow them to finalize the design on a well-known process before they move it to the murky waters of 28nm, and therefore lower the risk. I suspect we will see a 5770 replacement as a 28nm test vehicle as soon as the process is ready for production.
    Quote Originally Posted by Solus Corvus View Post
    In Cypress (and back to R600) there were 5 SPUs (Stream Processing Units) per SP (Streaming Processor).
    Like in the first image here: http://www.anandtech.com/show/2841/4
    Now there will be 4 SPUs per SP. And as ajaidev points out, there may be a different arrangement of simple and complex units within the SP instead of 4+1.

    If they keep a grouping of 16 SPs per SIMD, and there are 1920 SPUs, then Cayman will have 30 SIMDs per chip.

    The reason for such a change might be to increase utilization. It could have good results if one or two of the units in your 5 wide design are usually sitting idle. That's why RV770 and Evergreen had such a huge difference between low utilization (games) and high utilization (furmark) in terms of power-draw and heat output.


    They have had a 5-wide design for a while now. Why assume that's never going to change? Especially on a generational change?
    IIRC, to take advantage of all the SPUs per SP, certain scheduling conditions have to be met, which makes it very difficult to even get close to the theoretical max performance with real-world code.
    Last edited by BrowncoatGR; 10-29-2010 at 01:05 PM.
    Seems we made our greatest error when we named it at the start
    for though we called it "Human Nature" - it was cancer of the heart
    CPU: AMD X3 720BE@ 3,4Ghz
    Cooler: Xigmatek S1283(Terrible mounting system for AM2/3)
    Motherboard: Gigabyte 790FXT-UD5P(F4) RAM: 2x 2GB OCZ DDR3 1600Mhz Gold 8-8-8-24
    GPU:HD5850 1GB
    PSU: Seasonic M12D 750W Case: Coolermaster HAF932(aka Dusty )

  7. #332
    Xtreme Mentor
    Join Date
    Jul 2008
    Location
    Shimla , India
    Posts
    2,631
    Quote Originally Posted by informal View Post
    Effectively, Cayman could have 50% more SP resources than Cypress, taking into account the grouping per SIMD. If the clocks stay in the range of Cypress, this thing will be the fastest single-GPU card on the market, hands down. Now, how they will manage to put two of those monsters on an Antilles card and keep a "reasonable" TDP is another question.
    There are three possible ways to do that:

    1. Decrease clock
    2. Decrease number of SIMD
    3. Using a new spin of the silicon with other enhancements/modifications.

    Since Cayman is a 4-way design, I would think Antilles will be too...
    Coming Soon

  8. #333
    Xtreme Member
    Join Date
    Jun 2005
    Posts
    442
    Quote Originally Posted by ajaidev View Post
    There are three possible ways to do that:

    1. Decrease clock
    2. Decrease number of SIMD
    3. Using a new spin of the silicon with other enhancements/modifications.

    Since Cayman is a 4-way design, I would think Antilles will be too...
    What I would like to see is an implementation of AMD's "Turbo Core" on their dual-GPU solutions. For cards like Antilles, this would mean that in games that can use more than one GPU, the chips would clock down slightly and run in a Crossfire configuration. However, in games that DON'T have a Crossfire profile or aren't optimized for multi-GPU solutions, one of the GPU cores would clock up to the full speed of a Cayman XT card and do all the work while the other core is essentially turned off.


    This would be an intelligent solution much like the Turbo mode that Intel implements on their i3/i5/i7 CPUs, only applied to GPUs. It would also let them keep Antilles within the 300 watt PCIe spec in dual-GPU mode while still performing very well in games that only utilize one GPU. They've already got the technology implemented in their cards (notice the 17 watts of power consumption at idle). Now they just need to expand upon it and add intelligent power gating to the mix. It could probably be done via software.

    That would be a win-win design. However, it's probably wishful thinking to hope for something like that.
    Last edited by Mad Pistol; 10-29-2010 at 01:16 PM.
    PII 965BE @ 3.8Ghz /|\ TRUE 120 w/ Scythe Gentle Typhoon 120mm fan /|\ XFX HD 5870 /|\ 4GB G.Skill 1600mhz DDR3 /|\ Gigabyte 790GPT-UD3H /|\ Two lovely 24" monitors (1920x1200) /|\ and a nice leather chair.

  9. #334
    Xtreme Enthusiast
    Join Date
    Jan 2008
    Posts
    743
    Quote Originally Posted by Solus Corvus View Post
    In Cypress (and back to R600) there were 5 SPUs (Stream Processing Units) per SP (Streaming Processor).
    Like in the first image here: http://www.anandtech.com/show/2841/4
    Now there will be 4 SPUs per SP. And as ajaidev points out, there may be a different arrangement of simple and complex units within the SP instead of 4+1.

    If they keep a grouping of 16 SPs per SIMD, and there are 1920 SPUs, then Cayman will have 30 SIMDs per chip.

    The reason for such a change might be to increase utilization. It could have good results if one or two of the units in your 5 wide design are usually sitting idle. That's why RV770 and Evergreen had such a huge difference between low utilization (games) and high utilization (furmark) in terms of power-draw and heat output.


    They have had a 5-wide design for a while now. Why assume that's never going to change? Especially on a generational change?
    Quote Originally Posted by informal View Post
    Effectively, Cayman could have 50% more SP resources than Cypress, taking into account the grouping per SIMD. If the clocks stay in the range of Cypress, this thing will be the fastest single-GPU card on the market, hands down. Now, how they will manage to put two of those monsters on an Antilles card and keep a "reasonable" TDP is another question.
    What about the low ROP count listed? Wouldn't that still be a bottleneck?

  10. #335
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Lansing, MI / London / Stinkaypore
    Posts
    1,788
    Quote Originally Posted by Solus Corvus View Post
    I disagree about die space. Reducing the width of the SP increases the ratio of logic per SPU. Without knowing more specific architectural details I'd guess 5D->4D is ambiguous or detrimental to die space.
    Hm? How does logic go up? Logic should go down (minorly, 10-15% die size would be fortunate) and perf-logic ratio should get boosted just slightly less.

    Unless it's WT-XT-YT-ZT which might be what you mean. That'd be quite an increase actually, but the ratio's way overdone.

    I'm thinking of WXYZ + logic in the SIMD blocks that allows transcendentals to be performed through looping or such.
    Quote Originally Posted by radaja View Post
    so are they launching BD soon or a comic book?

  11. #336
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    Quote Originally Posted by BrowncoatGR View Post
    IIRC, to take advantage of all the SPUs per SP, certain scheduling conditions have to be met, which makes it very difficult to even get close to the theoretical max performance with real-world code.
    Which might be exactly the reason for a change from 5D. Theoretical power going unused only drags down energy efficiency. Higher real utilization of units will increase overall power usage, but will also increase performance/watt - maybe to a greater proportion.

    Quote Originally Posted by kadozer View Post
    What about the low ROP count listed? Wouldn't that still be a bottleneck?
    Maybe, but we don't know if they are directly comparable to current ROPs anyway.

    Quote Originally Posted by Macadamia View Post
    Hm? How does logic go up? Logic should go down (minorly, 10-15% die size would be fortunate) and perf-logic ratio should get boosted just slightly less.

    Unless it's WT-XT-YT-ZT which might be what you mean. That'd be quite an increase actually, but the ratio's way overdone.

    I'm thinking of WXYZ + logic in the SIMD blocks that allows transcendentals to be performed through looping or such.
    First off we are assuming that they didn't increase the size of the register file, complexity of the branch predictor, size of the texture units, size of the L1 cache, amount of shared memory, and complexity of the controlling logic - if they did then every SP and SIMD would have an increased ratio of logic to SPUs. Even if they kept it the same we are talking about 25% more SPs for the same number of SPUs (400 SPs vs 320 SPs). But they seem to be increasing the SPU count to 1920 if the rumors are true. So that's a 50% increase in the number of SPs (480 SPs vs 320 SPs). And if 16 SPs per SIMD is the same then it will be a 50% increase in texture units, L1 cache, shared memory, and controlling logic (30 SIMDs vs 20 SIMDs).
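The counts above can be checked with a few lines of arithmetic. This is only a sketch: the `layout` helper is a made-up name, the 1920 SPU figure is the rumor under discussion, and 16 SPs per SIMD is the "if they keep it the same" assumption.

```python
def layout(spus, vliw_width, sps_per_simd=16):
    """Return (SPs, SIMDs) for a chip with `spus` lanes in total.

    Raises ValueError if the lanes do not divide evenly into SPs and SIMDs.
    """
    sps, rem = divmod(spus, vliw_width)
    if rem:
        raise ValueError("SPU count is not a multiple of the VLIW width")
    simds, rem = divmod(sps, sps_per_simd)
    if rem:
        raise ValueError("SP count is not a multiple of SPs per SIMD")
    return sps, simds

cypress = layout(1600, 5)        # 5-wide, as in Cypress
same_lanes_4d = layout(1600, 4)  # hypothetical: same SPU count, 4-wide
rumored_4d = layout(1920, 4)     # rumored Cayman SPU count, 4-wide

for name, (sps, simds) in [("Cypress", cypress),
                           ("1600 SPUs, 4-wide", same_lanes_4d),
                           ("1920 SPUs, 4-wide", rumored_4d)]:
    print(f"{name}: {sps} SPs, {simds} SIMDs, "
          f"{sps / cypress[0]:.2f}x Cypress SP count")
```

That gives 320 SPs in 20 SIMDs for Cypress, 400 SPs in 25 SIMDs for the same lane count at 4-wide, and 480 SPs in 30 SIMDs for the rumored configuration, which matches the 25% and 50% figures.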

    This may be a very powerful chip.

  12. #337
    Banned
    Join Date
    May 2006
    Location
    Brazil
    Posts
    580
    Quote Originally Posted by Dimitriman View Post
    Doesn't anyone else find this 32 ROPs number too low?
    Me.. I hope it doesn't hold back Cayman.

    At least a 2GB framebuffer is confirmed.

  13. #338
    Xtreme Enthusiast
    Join Date
    Jan 2008
    Posts
    743
    Quote Originally Posted by -Sweeper_ View Post
    Me.. I hope it doesn't hold back Cayman.

    At least a 2GB framebuffer is confirmed.
    Hope this is for the 6950. They didn't want to show their cards on that slide, listing only the power for both.

  14. #339
    Xtreme Member
    Join Date
    Dec 2008
    Location
    Raleigh, NC
    Posts
    318
    Quote Originally Posted by Solus Corvus View Post
    In Cypress (and back to R600) there were 5 SPUs (Stream Processing Units) per SP (Streaming Processor).
    Like in the first image here: http://www.anandtech.com/show/2841/4
    Now there will be 4 SPUs per SP. And as ajaidev points out, there may be a different arrangement of simple and complex units within the SP instead of 4+1.

    If they keep a grouping of 16 SPs per SIMD, and there are 1920 SPUs, then Cayman will have 30 SIMDs per chip.

    The reason for such a change might be to increase utilization. It could have good results if one or two of the units in your 5 wide design are usually sitting idle. That's why RV770 and Evergreen had such a huge difference between low utilization (games) and high utilization (furmark) in terms of power-draw and heat output.
    I appreciate you taking the time to educate me on that, thanks a million.

    Btw... reading all of what you guys are typing about ABCs, SIMDs and all that good stuff makes me feel duuuuuuumb.

  15. #340
    Xtreme Member
    Join Date
    Aug 2010
    Location
    Athens, Greece
    Posts
    116
    Can we have 32 SPs per SIMD with 4 Texture Units ??

    Something like 16x2 SPs with 4-way VLIW (128 shaders) and 4 Texture Units per SIMD
    Intel Core i7 920@4GHz, ASUS GENE II, 3 x 4GB DDR-3 1333MHz Kingston, 2x ASUS HD6950 1G CU II, Intel SSD 320 120GB, Windows 7 Ultimate 64bit, DELL 2311HM

    AMD FX8150 vs Intel 2500K, 1080p DX-11 gaming evaluation.

  16. #341
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    Quote Originally Posted by Aten-Ra View Post
    Can we have 32 SPs per SIMD with 4 Texture Units ??

    Something like 16x2 SPs with 4-way VLIW (128 shaders) and 4 Texture Units per SIMD
    With 1920 SPUs this would put them at 15 SIMDs and 4 Texture units per SIMD would be 60 Texture Units overall. It seems low.

  17. #342
    Xtreme Member
    Join Date
    Aug 2010
    Location
    Athens, Greece
    Posts
    116
    Yes, I was thinking of 16 SIMDs (32 SPs each, 2048 shaders) with 64 Texture Units

    2x Tessellator engines (one for each Ultra-Threaded Dispatch Processor) and 256-bit memory with 32 ROPs.
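For anyone who wants to check the shader and texture unit totals for these wider-SIMD layouts, a small sketch follows. The `totals` helper is made up for illustration, and every configuration here is speculation from this thread except the Cypress row (20 SIMDs, 16 SPs each, 5-wide, 4 texture units per SIMD).

```python
def totals(simds, sps_per_simd, vliw_width=4, tus_per_simd=4):
    """Return (shaders, texture_units) for a given SIMD arrangement."""
    shaders = simds * sps_per_simd * vliw_width
    texture_units = simds * tus_per_simd
    return shaders, texture_units

configs = {
    "Cypress (20 x 16 SPs, 5-wide)": totals(20, 16, vliw_width=5),
    "15 SIMDs x 32 SPs, 4-wide": totals(15, 32),
    "16 SIMDs x 32 SPs, 4-wide": totals(16, 32),
    "30 SIMDs x 16 SPs, 4-wide": totals(30, 16),
}

for name, (shaders, tus) in configs.items():
    print(f"{name}: {shaders} shaders, {tus} texture units")
```

The 32-SP layouts come out at 1920 shaders with 60 texture units and 2048 shaders with 64 texture units, both below the 80 texture units Cypress already has. Keeping 16 SPs per SIMD at 1920 shaders would give 120.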
    Intel Core i7 920@4GHz, ASUS GENE II, 3 x 4GB DDR-3 1333MHz Kingston, 2x ASUS HD6950 1G CU II, Intel SSD 320 120GB, Windows 7 Ultimate 64bit, DELL 2311HM

    AMD FX8150 vs Intel 2500K, 1080p DX-11 gaming evaluation.

  18. #343
    Xtreme Enthusiast
    Join Date
    Jun 2006
    Location
    Space
    Posts
    769
    so without starting a flamewar...

    ...is this going to be the Dogs bollox?

  19. #344
    Xtreme Addict
    Join Date
    Jul 2007
    Posts
    1,488
    Sure, they don't have to stick with 16 SPs per SIMD. They would have to make fast ROP and Texture Units to keep them from being a bottleneck in this arrangement.

    But 2048 shaders? This could have some serious real shader power.

  20. #345
    Xtreme Cruncher
    Join Date
    Apr 2006
    Posts
    3,012
    AMD have shot themselves in the foot before with low ROP counts; I sure hope they don't do it again. With the shader improvements and the shader increase they will have more than enough shading power, but if they sacrifice TMU and ROP power to get there it would turn out very badly. Sure, the shading performance is stupid high, but games still need good TMU and ROP performance, otherwise those shaders don't do jack.
    CPU: Intel Core i7 3930K @ 4.5GHz
    Mobo: Asus Rampage IV Extreme
    RAM: 32GB (8x4GB) Patriot Viper EX @ 1866mhz
    GPU: EVGA GTX Titan (1087Boost/6700Mem)
    Physx: Evga GTX 560 2GB
    Sound: Creative XFI Titanium
    Case: Modded 700D
    PSU: Corsair 1200AX (Fully Sleeved)
    Storage: 2x120GB OCZ Vertex 3's in RAID 0 + WD 600GB V-Raptor + Seagate 1TB
    Cooling: XSPC Raystorm, 2x MCP 655's, FrozenQ Warp Drive, EX360+MCR240+EX120 Rad's

  21. #346
    Registered User
    Join Date
    Oct 2009
    Posts
    1
    Quote Originally Posted by Macadamia View Post
    Hm? How does logic go up? Logic should go down (minorly, 10-15% die size would be fortunate) and perf-logic ratio should get boosted just slightly less.
    That ~10-15% number was also my estimate. It of course depends on the actual code running. It will be possible to construct cases where Cayman is slower than Cypress (if Cayman doesn't have significantly more than 1920 SPs). But generally, it will gain the most in situations where the VLIW5 architecture fared worst in comparison to nvidia.
    Quote Originally Posted by Macadamia View Post
    Unless it's WT-XT-YT-ZT which might be what you mean. That'd be quite an increase actually, but the ratio's way overdone.

    I'm thinking of WXYZ + logic in the SIMD blocks that allows transcendentals to be performed through looping or such.
    Actually, it is already known how the VLIW4 units will be organized. The codepath for that arch has been functional in the driver since Catalyst 10.4; I posted some stuff about that over at B3D 10 days ago.

    The transcendental functions are done by the xyz units working together (just like it is done for double precision already now, only that it takes 3 slots), so 3 of the 4 slots of the VLIW unit are used to calculate a transcendental. The fourth slot (w) does not take part in that and is still free to use in the same cycle. That means a good part of the t unit got split up in three parts and is distributed to the x, y and z units.
    Another function of the t unit was doing format conversions and roundings. This functionality got replicated to all subunits. That means for this kind of stuff Cayman will fly.
    24-bit integer arithmetic is now fully supported by Cayman and can be done in all 4 slots (Evergreen had only partial support, which was not really used).
    A 32-bit integer multiplication will unfortunately block all 4 slots (in Evergreen it could be done by the t unit, leaving the xyzw slots free for other instructions), but this is probably the price to pay to get some transistor savings from the change.
    All other integer instructions can again be done in all 4 slots (as before).

    Double precision instructions behave the same way as in Cypress. Everything involving a multiplication (MUL, FMA) takes 4 slots while the other stuff (like ADD and conversions) takes 2 slots. That means the DP:SP ratio is 1:4.
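The slot costs described above can be tabulated to see what fits in one 4-slot bundle. This is a rough sketch, not anything from the driver: the op names and the `per_bundle` and `ratio_to_sp` helpers are made up, and a cost of 1 slot for an ordinary single-precision op is an assumption.

```python
from fractions import Fraction

SLOTS = 4  # x, y, z, w

# Slots occupied per instruction, as described in the post above.
SLOT_COST = {
    "sp_alu": 1,          # ordinary single-precision op (assumed 1 slot)
    "int24": 1,           # 24-bit integer arithmetic
    "convert_round": 1,   # format conversions and roundings
    "transcendental": 3,  # x, y, z work together; w stays free
    "int32_mul": 4,       # blocks the whole bundle
    "dp_mul_fma": 4,      # double-precision MUL / FMA
    "dp_add": 2,          # double-precision ADD and conversions
}

def per_bundle(op):
    """Return (how many `op` fit in one bundle, slots left over)."""
    return divmod(SLOTS, SLOT_COST[op])

def ratio_to_sp(op):
    """Per-bundle rate of `op` relative to ordinary single-precision ops."""
    return Fraction(per_bundle(op)[0], per_bundle("sp_alu")[0])

for op in SLOT_COST:
    count, free = per_bundle(op)
    print(f"{op}: {count} per bundle, {free} slot(s) free, "
          f"{ratio_to_sp(op)} of SP rate")
```

On these numbers a transcendental leaves one slot free, double-precision MUL/FMA runs at 1/4 of the single-precision rate (the 1:4 ratio quoted), and double-precision ADD at 1/2.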
    Last edited by Gipsel; 10-29-2010 at 02:46 PM.

  22. #347
    Xtreme Addict
    Join Date
    Jan 2008
    Posts
    1,176
    Quote Originally Posted by Motiv View Post
    so without starting a flamewar...

    ...is this going to be the Dogs bollox?
    Hell yes.

    No one can speculate the GTX 580 performance with any degree of validity or truth but Cayman will be off the wall.
    Whether the 580 will be even faster than Cayman is unknown but fast is fast in subjective terms.

    I'll put it this way; all these crappy fud sites are holding on to some fake 36.6 fps benchmark from 2 months ago like it has value.

    What is true however is that the 6870 with 145W power draw can come close to the previous gen. Cayman will have 300W, 2G of vram, a new 4 shader system and a lot more of them.

    It's more than just "30%" faster than the 6870 as fud sites say.

  23. #348
    Xtreme Addict
    Join Date
    May 2007
    Posts
    2,125
    Quote Originally Posted by ColonelCain View Post
    Hmm... haven't seen that.. mind linking me?


    VLIW5 stands for "Very Long Instruction Word" 5: VLIW is the architectural basis, and 5 is the number of execution units.
    Each shader 'unit' is composed of 5 execution units. Four of them are 'simple', while the 5th can handle all functions, including transcendental functions.
    This is one (of many) reasons that AMD's microarchitectures generally have more shaders than nvidia's yet deliver comparable performance: unless the developer specifically codes for this unique computing style, a game will only use some of the execution units. It is also one reason why the theoretical compute power of AMD's microarchitectures is far greater than what real applications achieve, because AMD's figures assume that all execution units are utilized.
    I don't have the link, but Gipsel posted about it underneath and was the one who posted about it in the R9xx speculation thread on b3d

    Quote Originally Posted by Motiv View Post
    so without starting a flamewar...

    ...is this going to be the Dogs bollox?
    As I posted here at the beginning of this thread: http://www.xtremesystems.org/forums/...55&postcount=2

    Reviews on Barts have given hints that Cayman is a superset of Barts, meaning it will have more features than Barts has, and that Cayman is focused on high end GPU performance. Given that this is a break from the old AMD strategy, I bet it's a big deal

  24. #349
    Xtreme Member
    Join Date
    Jun 2005
    Posts
    442
    Quote Originally Posted by zerazax View Post
    Reviews on Barts have given hints that Cayman is a superset of Barts, meaning it will have more features than Barts has, and that Cayman is focused on high end GPU performance. Given that this is a break from the old AMD strategy, I bet it's a big deal
    I seriously think that Barts was released first as a simple teaser for what's to come with Cayman and Antilles. If Cayman is a beast, Antilles will be Chuck Norris.
    PII 965BE @ 3.8Ghz /|\ TRUE 120 w/ Scythe Gentle Typhoon 120mm fan /|\ XFX HD 5870 /|\ 4GB G.Skill 1600mhz DDR3 /|\ Gigabyte 790GPT-UD3H /|\ Two lovely 24" monitors (1920x1200) /|\ and a nice leather chair.

  25. #350
    Xtreme Member
    Join Date
    May 2008
    Posts
    336
    is there a release date for these yet?
