
Thread: AMD embraces AVX making a new superset with SSE5(256bit support)

  1. #1
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215

    AMD embraces AVX making a new superset with SSE5(256bit support)

    Original find here.

    Link to pdf:
    http://support.amd.com/us/Processor_TechDocs/43479.pdf

    This is BIG news. AMD is playing it safe this time, and it seems that on paper Bulldozer will have one of Sandy Bridge's main innovations (the new 256-bit-wide instruction set).

    edit: it looks like 4-operand instruction support is also there, so another (previously AVX-exclusive) advantage of SB is matched by this. <- after edit 2: this is an error on my part; Sandy Bridge won't support FMA4 or FMA3 (Ivy Bridge will). See edit 2.

    edit 2:

    To recap: after seeing additional info directly from AMD's devcentral (engineering dept. blog), we now know which instruction set extensions Bulldozer cores will support, plus some info on the Sandy Bridge uarch due out in 2010 (Sandy Bridge won't support FMA at all):

    1) As AMD's senior fellow stated in his blog, Bulldozer will support Intel AVX version 5 (meaning full AVX spec support with 256-bit-wide vectors), Intel FMA version 3, and a new set of extensions called XOP, CVT16 and FMA4 (former SSE5 instructions with the new VEX encoding that were not covered by AVX v5 but could still be very convenient for HPC etc.).
    2) While supporting AVX, Intel's next tick (Sandy Bridge) won't have FMA support in any form. FMA3 is reserved and planned for the tock, a successor to the Sandy Bridge cores (planned for 2011). Sandy Bridge will support 256-bit-wide vectors, among other things AVX will bring, but won't have FMA.
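For readers wondering why FMA support keeps coming up: a fused multiply-add computes a*b + c with a single rounding at the end, while a separate multiply followed by an add rounds twice. The sketch below only models that rounding behaviour in plain Python using exact rationals; the function names are made up for illustration and nothing here reflects how Bulldozer or any Intel core implements it. On operand forms, FMA4 takes a destination plus three sources, whereas FMA3 reuses one of the sources as the destination.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Model a*b + c with one rounding at the very end (FMA behaviour)."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # exact rational arithmetic
    return float(exact)  # single rounding to the nearest double


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Multiply, round, then add and round again (no FMA)."""
    product = a * b      # first rounding happens here
    return product + c   # second rounding happens here


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

print(separate_multiply_add(a, b, c))  # 0.0, the small term is lost
print(fused_multiply_add(a, b, c))     # -2**-60, the small term survives
```

Besides accuracy, the usual selling point is throughput: one instruction does the work of two.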
    Last edited by informal; 05-07-2009 at 04:43 AM.

  2. #2
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Texas
    Posts
    1,663
    Holy crap! Bulldozer will be the best of both worlds. This is big news informal. Good find brother.

    ...I'm in shock I think....yea....

    EDIT: The PDF doesn't specifically say AMD is adding Intel's AVX, but it is adding AVX-like instructions and capabilities. How about some of you engineers take a look and let us know what it all means.
    Last edited by Mechromancer; 05-01-2009 at 08:23 PM.
    Core i7 2600K@4.6Ghz| 16GB G.Skill@2133Mhz 9-11-10-28-38 1.65v| ASUS P8Z77-V PRO | Corsair 750i PSU | ASUS GTX 980 OC | Xonar DSX | Samsung 840 Pro 128GB |A bunch of HDDs and terabytes | Oculus Rift w/ touch | ASUS 24" 144Hz G-sync monitor

    Quote Originally Posted by phelan1777 View Post
    Hail fellow warrior albeit a surat Mercenary. I Hail to you from the Clans, Ghost Bear that is (Yes freebirth we still do and shall always view mercenaries with great disdain!) I have long been an honorable warrior of the mighty Warden Clan Ghost Bear the honorable Bekker surname. I salute your tenacity to show your freebirth sibkin their ignorance!

  3. #3
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,476
    In english?
    i3 2100, MSI H61M-E33. 8GB G.Skill Ripjaws.
    MSI GTX 460 Twin Frozr II. 1TB Caviar Blue.
    Corsair HX 620, CM 690, Win 7 Ultimate 64bit.

  4. #4
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    It basically adds almost all AVX instructions except the SSE4.2 ones (iirc 7 instructions) that are in i7. The big additions are the FMA instructions. What AMD did with K8->K10 in terms of SSE throughput, they will do again with the new uarch, which will have 256-bit-wide support for media instructions.
    Last edited by informal; 05-01-2009 at 08:56 PM.

  5. #5
    Xtreme Addict
    Join Date
    Jan 2008
    Location
    Lubbock, Texas
    Posts
    2,133
    Quote Originally Posted by Glow9 View Post
    In english?
    it means amd is not being retarded and going with an instruction set half the speed of their competitor.

  6. #6
    Registered User
    Join Date
    Mar 2009
    Posts
    72
    Quote Originally Posted by Glow9 View Post
    In english?
    The same guy who found the pdf had this to say.

    Basically, this instruction set extension allows AMD to "patch" the only two deficiencies of SSE5 (compared to AVX): 256-bit vectors and 4-operand instructions, while preserving all of SSE5's (many) strengths.

    In fact, looking at the instruction encoding, AMD did not intend or attempt to follow AVX. For example, AVX's 2-byte prefix format was not used, while AVX's 3-byte prefix format was used to allow access to the 4th operand and the 256-bit YMM registers.
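To make the prefix talk a bit more concrete, here is a small sketch that splits the two payload bytes of Intel's 3-byte VEX prefix (the bytes following the 0xC4 escape) into their fields. It covers only the Intel VEX layout as I understand it and makes no claim about the exact bytes in AMD's document. The function name is made up, and the sample bytes are assumed to be the 3-byte form of vaddps ymm0, ymm1, ymm2 (C4 E1 74 58 C2).

```python
def decode_vex3(byte1: int, byte2: int) -> dict:
    """Split the two payload bytes that follow the 0xC4 escape byte."""
    return {
        # R, X, B are stored inverted; they extend register numbers to 16.
        "R": ((byte1 >> 7) & 1) ^ 1,
        "X": ((byte1 >> 6) & 1) ^ 1,
        "B": ((byte1 >> 5) & 1) ^ 1,
        # Opcode map selector (1 means the 0F map).
        "map_select": byte1 & 0x1F,
        "W": (byte2 >> 7) & 1,
        # vvvv names the extra source register and is stored inverted.
        "vvvv": (~(byte2 >> 3)) & 0xF,
        # L selects the vector length: 0 = 128-bit XMM, 1 = 256-bit YMM.
        "vector_bits": 256 if (byte2 >> 2) & 1 else 128,
        # pp stands in for a legacy 66/F3/F2 prefix (0 = none).
        "pp": byte2 & 0x3,
    }


fields = decode_vex3(0xE1, 0x74)
print(fields["vvvv"], fields["vector_bits"])  # extra source = register 1, 256-bit
```

The vvvv field is what provides the additional non-destructive source operand, and the L bit is what selects the YMM registers.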

    This is IMHO an aggressive move. It is a confidence call from AMD, saying whatever we're going to do in our next generation will be better than yours (Intel's).

  7. #7
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    It's SSE5, that's AMD's own invention. In short, it's another 3DNow! or SSE4A. Good luck!

    AVX and SSE5 are fully incompatible, so to say.

    Nothing new really, nothing surprising.

    http://en.wikipedia.org/wiki/Advanced_Vector_Extensions
    http://softwarecommunity.intel.com/i...e-31943302.pdf
    http://en.wikipedia.org/wiki/SSE5

    SSE5 is more of a competitor to SSE4 than to AVX.
    Last edited by Shintai; 05-01-2009 at 11:37 PM.
    Crunching for Comrades and the Common good of the People.

  8. #8
    Xtreme Addict
    Join Date
    Apr 2007
    Posts
    2,476
    Quote Originally Posted by lockee View Post
    The same guy who found the pdf had this to say.
    Okay so to try and understand since I have no clue (note I'm being honest). The guy above said "instruction set half the speed of their competitor." So are you saying they have confidence because they are confident enough with their architecture that they can run it with something half the speed as Intel? I guess I don't follow that.

  9. #9
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Shintai View Post
    It's SSE5, that's AMD's own invention. In short, it's another 3DNow! or SSE4A. Good luck!

    AVX and SSE5 are fully incompatible, so to say.

    Nothing new really, nothing surprising.
    The *new* SSE5, the one AMD presents in the tech doc, is a superset of AVX. Whether it gets used is up to OS and compiler support. Remember the AMD64 and EM64T hardware implementations (not the same at the hardware level, but still compatible)? The fact is that AMD just matched AVX in performance by adding 4-operand instructions and 256-bit vectors...
    Quote Originally Posted by Glow9 View Post
    Okay so to try and understand since I have no clue (note I'm being honest). The guy above said "instruction set half the speed of their competitor." So are you saying they have confidence because they are confident enough with their architecture that they can run it with something half the speed as Intel? I guess I don't follow that.
    He was referring to the (now old) specification of 128-bit media instructions with 2-operand support in Bulldozer versus the newly released spec with 256-bit vectors and 4-operand support, and that is a big difference.
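A rough back-of-the-envelope for the "half the speed" remark: a 256-bit vector holds twice as many elements as a 128-bit one, so the same array needs half as many vector instructions. The helper names below are made up, and this counts instructions only. Real speed depends on how wide the execution units actually are, which the spec does not tell us.

```python
def lanes(vector_bits: int, element_bits: int) -> int:
    """How many elements of the given size fit in one vector register."""
    return vector_bits // element_bits


def vector_ops_needed(n_elements: int, vector_bits: int, element_bits: int = 32) -> int:
    """Vector instructions needed to touch every element once."""
    per_op = lanes(vector_bits, element_bits)
    return -(-n_elements // per_op)  # ceiling division


print(lanes(128, 32), lanes(256, 32))     # 4 vs 8 single-precision floats
print(vector_ops_needed(1000, 128))       # 250
print(vector_ops_needed(1000, 256))       # 125
```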


    PS: The SSE5 name is gone (as it seems), and AMD now calls this superset "128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions".
    Last edited by informal; 05-01-2009 at 11:38 PM.

  10. #10
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by informal View Post
    The *new* SSE5, the one AMD presents in the tech doc, is a superset of AVX. Whether it gets used is up to OS and compiler support. Remember the AMD64 and EM64T hardware implementations (not the same at the hardware level, but still compatible)? The fact is that AMD just matched AVX in performance by adding 4-operand instructions and 256-bit vectors...

    He was referring to the (now old) specification of 128-bit media instructions with 2-operand support in Bulldozer versus the newly released spec with 256-bit vectors and 4-operand support, and that is a big difference.

    PS: The SSE5 name is gone (as it seems), and AMD now calls this superset "128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions".
    It's very, very far from AVX. Calling it a superset means it should contain all of AVX, and since it contains close to nothing, or actually nothing, of AVX, then no.
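Since the argument hinges on the word "superset", here is what that test looks like as a set operation: one extension is a superset of another only if it contains every instruction of the other, regardless of how many instructions it has in total. The mnemonics below are real ones, but the three groupings are tiny hypothetical samples made up for illustration, not the actual instruction lists from either vendor's document.

```python
# Hypothetical samples, NOT the real instruction lists.
AVX_SAMPLE = {"VADDPS", "VMULPS", "VBROADCASTSS"}
EXTENSION_A = AVX_SAMPLE | {"VFMADDPS", "VPCMOV"}             # all of the sample plus extras
EXTENSION_B = {"VFMADDPS", "VPCMOV", "VPROTB", "VCVTPH2PS"}   # more entries, no overlap


def is_superset(candidate: set, base: set) -> bool:
    """True only if every instruction in base is also in candidate."""
    return base <= candidate


def missing(candidate: set, base: set) -> set:
    """Instructions in base that candidate does not provide."""
    return base - candidate


print(is_superset(EXTENSION_A, AVX_SAMPLE))       # True
print(is_superset(EXTENSION_B, AVX_SAMPLE))       # False, despite having more entries
print(sorted(missing(EXTENSION_B, AVX_SAMPLE)))
```

So settling the question means checking the two documents instruction by instruction, not comparing totals.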

    I don't think you actually understand this. Read the paper I linked.

    Something compiled for AVX and something compiled for SSE5 are 100% incompatible. SSE5 is more of a catch-up to SSE4, with a few new things added that will yet again prove useless in the long run. Not because they can't be used, but simply because of lack of support, like always.

    AMD64 and EM64T weren't even fully compatible. Plus, I think you're missing how the technology moves. You don't add it overnight or even beforehand. Look at all the previous times. It's always a catch-up scenario, and this won't change anything.
    Last edited by Shintai; 05-01-2009 at 11:51 PM.

  11. #11
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Quote Originally Posted by Shintai View Post
    It's very, very far from AVX. Calling it a superset means it should contain all of AVX, and since it contains close to nothing, or actually nothing, of AVX, then no.

    I don't think you actually understand this. Read the paper I linked.

    Something compiled for AVX and something compiled for SSE5 are 100% incompatible. SSE5 is more of a catch-up to SSE4, with a few new things added that will yet again prove useless in the long run. Not because they can't be used, but simply because of lack of support, like always.

    AMD64 and EM64T weren't even fully compatible. Plus, I think you're missing how the technology moves. You don't add it overnight or even beforehand. Look at all the previous times. It's always a catch-up scenario, and this won't change anything.
    I said that compiler support needs to be there, just as it needs to be there for AVX in order for it to be fully utilized (as Intel designed it to be). That's why AMD is releasing the tech doc ~2 years before Interlagos hits the market: to get the necessary info out to developers and partners (and M$ is a partner, and an important one). Did you forget AMD64 and who adopted whose implementation?

    BTW, the original SSE5 was introduced approx. 2 years ago, and in those days Interlagos (or whatever market name the BD core had) was scheduled for a 2009 release. We now know AMD postponed Bulldozer and redesigned it a bit, and we now know why they did it and which parts they redesigned (256-bit support, 4-operand instructions, new AVX-like instructions).
    Even the original SSE5 with those specs promised some hefty performance improvements over the previous implementation, so AMD hardware will run even better with the extended instruction set once software gets coded properly for it. PGI and GCC were on board back in 2007 with the original SSE5; you can bet they will be by the time Interlagos hits.

    The important thing is that AMD improved and matched AVX at the hardware level, and there are 2 years to get developer and compiler support for it. Also, it will be backwards compatible with all pre-SSE4-optimized software (which is the large majority).

  12. #12
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Lansing, MI / London / Stinkaypore
    Posts
    1,788
    Yeah shintai, because AMD will probably be faster this round with compliant hardware. Look who's actually playing catchup.


    All Intel did was just release specs to give devs an idea how it will be coded before they do it on AMD CPUs. Ooooh satisfaction.
    Quote Originally Posted by radaja View Post
    so are they launching BD soon or a comic book?

  13. #13
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by informal View Post
    I said that compiler support needs to be there, just as it needs to be there for AVX in order for it to be fully utilized (as Intel designed it to be). That's why AMD is releasing the tech doc ~2 years before Interlagos hits the market: to get the necessary info out to developers and partners (and M$ is a partner, and an important one). Did you forget AMD64 and who adopted whose implementation?

    BTW, the original SSE5 was introduced approx. 2 years ago, and in those days Interlagos (or whatever market name the BD core had) was scheduled for a 2009 release. We now know AMD postponed Bulldozer and redesigned it a bit, and we now know why they did it and which parts they redesigned (256-bit support, 4-operand instructions, new AVX-like instructions).
    Even the original SSE5 with those specs promised some hefty performance improvements over the previous implementation, so AMD hardware will run even better with the extended instruction set once software gets coded properly for it. PGI and GCC were on board back in 2007 with the original SSE5; you can bet they will be by the time Interlagos hits.

    The important thing is that AMD improved and matched AVX at the hardware level, and there are 2 years to get developer and compiler support for it. Also, it will be backwards compatible with all pre-SSE4-optimized software (which is the large majority).
    SSE5 ain't AVX. And AMD64 was a whole other ballgame. AMD only won because MS said F U to Intel.

    Everyone and their mother uses Intel compilers. They don't use MS or AMD compilers since they are slower. And again, SSE5 has nothing to do with AVX and it's not a superset. Look up that word, please.

    AMD doesn't even have full SSE4.x support yet.

    And in the best case, SSE5 is a dumbed-down version of SSE4 with some 256-bit mixed in.

    Please read Intel's AVX paper. It will save a lot of posting.

    What did SSE4A end up as today? Wasted transistors and design time?

    SSE5 is about 170 instructions with what, 50 of them 256-bit? AVX is close to 400, with ~half of them being 256-bit.

    And you claim SSE5 is a superset of AVX? Oh, that's hilarious!

    SSE5 is mostly Intel's SSE4 with a little 256-bit mixed in. You could say it's like SSE3+SSE4A when Intel had SSE3+SSE4.1.
    Last edited by Shintai; 05-02-2009 at 12:21 AM.

  14. #14
    Banned
    Join Date
    Jan 2008
    Location
    Canada
    Posts
    707
    Quote Originally Posted by Shintai View Post
    AMD only won because MS said F U to Intel.
    That is not the whole story. AMD brought x86 into the 64-bit realm while Intel was trying to say Eff you to all of us and shove Itanium into the market, all the while telling us "64-bit computing is not needed on the desktop."

  15. #15
    Banned
    Join Date
    Jan 2008
    Location
    Canada
    Posts
    707
    Quote Originally Posted by Shintai View Post
    Oh, that's hilarious!
    Your posts certainly are, in a tragic sort of way.

  16. #16
    Xtreme X.I.P.
    Join Date
    Apr 2005
    Posts
    4,475
    Please keep it peaceful. Attack the arguments, not the persons.

  17. #17
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Shintai, there are about 100 or fewer new instructions in AVX (new as in not previously supported by Intel hardware). The rest, more than 300 legacy SSE instructions, are updated for better performance on future hardware (and fewer than 100 of those 300 are widened to 256 bits to support the FP vector instructions).

    BTW, SSE4a is 4 (in words: four) instructions, so the die space that was "wasted" is really huge (I can see AMD pulling their hair out over this "wasted" space). Two of those four (LZCNT and POPCNT) don't even need to be included in an SSE4 in order to be available and used.
    Last edited by informal; 05-02-2009 at 12:34 AM.

  18. #18
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Lansing, MI / London / Stinkaypore
    Posts
    1,788
    So tell me Shintai, please tell us where is the fused multiply add acceleration in Sandy Bridge or AVX?



    We're absolutely begging to know, just like the 2010 DX11 GPUs.


    Running through SSE5's rectification, it looks like it has everything AVX touted, with extra few features that SSE5 had from the start.

  19. #19
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by informal View Post
    Shintai, there are about 100 or fewer new instructions in AVX (new as in not previously supported by Intel hardware). The rest, more than 300 legacy SSE instructions, are updated for better performance on future hardware (and fewer than 100 of those 300 are widened to 256 bits to support the FP vector instructions).
    And there are 46 in SSE5.

    Quote Originally Posted by Macadamia View Post
    So tell me Shintai, please tell us where is the fused multiply add acceleration in Sandy Bridge or AVX?



    We're absolutely begging to know, just like the 2010 DX11 GPUs.


    Running through SSE5's rectification, it looks like it has everything AVX touted, with extra few features that SSE5 had from the start.
    Would be easier if you checked my links....

  20. #20
    Xtreme Addict
    Join Date
    Apr 2008
    Location
    Lansing, MI / London / Stinkaypore
    Posts
    1,788
    Wow, counting instruction numbers?

    I bet most of them are useless compared to a non-emulated implementation of FMA (which is what AMD likely has), which will blow the panties off any previous FPU.


    And please tell all 3 of those developers that their SSE4.1/2 support is really appreciated (Cinebench x64 has K10.5 on par with Core, people are avoiding DivX like the plague because x264 has it beat on all fronts, and probably some other no-name is wasting their time on this kind of really limited support instead).

  21. #21
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Shintai, actually in the linked PDF there are 75 new instructions in AMD's new SSE extension set, not 46 (your wiki source?).

  22. #22
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by informal View Post
    Shintai, actually in the linked PDF there are 75 new instructions in AMD's new SSE extension set, not 46 (your wiki source?).
    You do know some of those are what you call "converted" SSE4? Not AVX style. In that case Intel's AVX is ~200....

    Where is this superset of AVX? Superset requires SSE5 amount > AVX amount.

    SSE5 is more something halfway between SSE4 and AVX.

    Quote Originally Posted by Macadamia View Post
    Wow, counting instruction numbers?

    I bet most of them are useless compared to a non-emulated implementation of FMA (which is what AMD likely has), which will blow the panties off any previous FPU.
    The last FPU was x87. We only use SIMD in 64-bit (Windows).
    Last edited by Shintai; 05-02-2009 at 01:02 AM.

  23. #23
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    Having more instructions doesn't mean much if most of them won't be used much (if at all). AMD weighed in and decided to incorporate some of the AVX instructions (probably the most important ones). The big news in the spec is the FMA addition plus 4-operand instructions with 256-bit vector support.
    AVX will also mean jack (pardon my French) if it is not coded for. We have had SSSE3 (16 new instructions) for 3 years now and SSE4.1 (47) for 2 years, with both spec documents out much earlier than that, and how many applications actually use any of those?

    As for the FPU comment, I believe Macadamia was referring to SIMD.
    Last edited by informal; 05-02-2009 at 01:19 AM.

  24. #24
    Xtreme Cruncher
    Join Date
    Aug 2006
    Location
    Denmark
    Posts
    7,747
    Quote Originally Posted by informal View Post
    Having more instructions doesn't mean much if most of them won't be used much (if at all). AMD weighed in and decided to incorporate some of the AVX instructions (probably the most important ones). The big news in the spec is the FMA addition plus 4-operand instructions with 256-bit vector support.
    AVX will also mean jack (pardon my French) if it is not coded for. We have had SSSE3 (16 new instructions) for 3 years now and SSE4.1 (47) for 2 years, with both spec documents out much earlier than that, and how many applications actually use any of those?

    As for the FPU comment, I believe Macadamia was referring to SIMD.
    So now having it doesn't matter? How is it big news then?

    Plus, what happens when you compile for, say, SSSE3, but your CPU lacks one instruction because it doesn't fully support the set? That's right, it doesn't run in that mode. It falls back to an older, more legacy mode.
    If you didn't have full SSE2 support, you couldn't run 64-bit Windows either.

    A lot of applications use the newest instructions. Just because Crysis or some other popular title doesn't use them doesn't mean they aren't used in large amounts. And you will never really see it unless you bench it, because otherwise you just run it with SSE3 etc. instead.
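The fallback behaviour described above is usually implemented as runtime dispatch: the program checks which extensions the CPU reports and picks the fastest code path whose requirements are all met. Below is a minimal sketch of that selection logic. The table, the path names and the feature strings are made up for illustration; a real program would obtain the feature set from CPUID or an OS/compiler facility rather than a hand-written set.

```python
# Code paths ordered from most to least demanding (hypothetical table).
CODE_PATHS = [
    ("ssse3", {"sse2", "sse3", "ssse3"}),
    ("sse3", {"sse2", "sse3"}),
    ("sse2", {"sse2"}),
]


def pick_code_path(cpu_features: set) -> str:
    """Return the first path whose required features are all present."""
    for name, required in CODE_PATHS:
        if required <= cpu_features:  # every required feature is available
            return name
    return "scalar"  # plain non-SIMD fallback


print(pick_code_path({"sse2", "sse3", "ssse3"}))  # ssse3
print(pick_code_path({"sse2", "sse3"}))           # sse3, one missing feature drops a level
print(pick_code_path(set()))                      # scalar
```

This is also why a partially supported extension goes unused: the check is all-or-nothing per code path.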
    Last edited by Shintai; 05-02-2009 at 01:26 AM.

  25. #25
    Xtreme Cruncher
    Join Date
    Jun 2006
    Posts
    6,215
    What is important is performance. You can have hardware support and an app coded to make use of SSSE3 (hypothetical example), and it makes a 3-5% difference... Let's wait and see what comes out of all this in 2 years. By that time both designs will be well studied, and we will probably have hardware on the shelves from both companies.
