AMD embraces AVX, making a new superset with SSE5 (256-bit support)
Original find here.
Link to pdf:
http://support.amd.com/us/Processor_TechDocs/43479.pdf
This is BIG news. AMD is playing it safe this time, and it seems that, on paper, Bulldozer will have one of Sandy Bridge's main innovations (the new 256-bit-wide instruction set).
edit: it looks like 4-operand instruction support is also there, so another (previously AVX-exclusive) advantage of SB is matched by this. <- after edit 2: this is an error on my part; Sandy Bridge won't support FMA4 or FMA3 (Ivy Bridge will). See edit 2.
edit 2:
To recap: after seeing additional info directly from AMD's devcentral (the engineering department's blog), we now know which instruction set extensions the Bulldozer cores will support, plus some info on the Sandy Bridge microarchitecture due out in 2010 (Sandy Bridge won't support FMA at all):
1) As AMD's senior fellow stated in his blog, Bulldozer will support Intel AVX version 5 (meaning full AVX spec support with 256-bit-wide vectors), the FMA instructions as defined in version 3 of Intel's spec (which AMD is renaming FMA4), and the new XOP and CVT16 extensions (former SSE5 instructions, now with VEX-style encoding, that were not covered by AVX v5 but could still be very convenient for HPC etc.).
2) While it supports AVX, Intel's next new microarchitecture (Sandy Bridge) won't have FMA support in any form. FMA3 is reserved for a later step in Intel's tick-tock schedule, a successor to the Sandy Bridge cores (planned for 2011). Sandy Bridge will support 256-bit-wide vectors, among the other things AVX brings, but won't have FMA.
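Since FMA4 gets its own CPUID feature flag (see the Dave Christie quote further down), software will have to test for each extension separately. Here is a minimal sketch in C of how that decoding could look. The bit positions are not from the blog posts; they are the ones documented later in the Intel and AMD CPUID references, so treat them as an assumption relative to this thread. The struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: which of the extensions discussed here
 * a given CPU reports through CPUID. */
typedef struct {
    bool avx;   /* 256-bit AVX */
    bool fma3;  /* Intel's three-operand FMA */
    bool fma4;  /* AMD's four-operand FMA */
    bool xop;   /* AMD's XOP (recast SSE5 instructions) */
} simd_features;

/* Decode feature bits from raw CPUID register values.
 *   leaf1_ecx: ECX returned by CPUID leaf 0x00000001
 *   ext1_ecx:  ECX returned by CPUID leaf 0x80000001
 * Assumed bit positions (from the later published CPUID docs):
 *   leaf 1 ECX[28] = AVX,  leaf 1 ECX[12] = FMA (FMA3),
 *   leaf 0x80000001 ECX[16] = FMA4, ECX[11] = XOP. */
static simd_features decode_simd_features(uint32_t leaf1_ecx, uint32_t ext1_ecx)
{
    simd_features f;
    f.avx  = ((leaf1_ecx >> 28) & 1u) != 0;
    f.fma3 = ((leaf1_ecx >> 12) & 1u) != 0;
    f.fma4 = ((ext1_ecx  >> 16) & 1u) != 0;
    f.xop  = ((ext1_ecx  >> 11) & 1u) != 0;
    return f;
}
```

On GCC or Clang the two register values could be read with `__get_cpuid` from `<cpuid.h>`. A real check would also need to confirm that the OS saves the 256-bit register state (OSXSAVE plus XGETBV) before using any of these instructions; that part is left out here.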
AMD's official stance on AVX/SSE5 and changes in specs
Links of interest:
http://forums.amd.com/devblog/
http://blogs.amd.com/nigeldessau/200...-than-fiction/
Quote:
Originally Posted by N.Dessau
There is a commonly held fallacy that there is one single x86 instruction set. In reality, while all x86 chips run about 99% similar instructions, no two suppliers run exactly the same base. We have a different set to Intel, which is a different set to Via and so on. In fact, one of the things that differentiates our server line from Intel’s is that they don’t even have the same set of functions across the Nehalem line, whereas we run all the same functions on the entire family of Quad-Core AMD Opteron™ processors.
This is one of the reasons why AMD Opteron processor-based servers make such good disaster recovery solutions - you really can failover running virtual machines to newer, smaller standby systems without worrying that some of the processor functions may not be supported.
While the AMD Opteron processor retains backward compatibility, it is fair to point out that as we deliver new function at each generation, we often have to add extensions to the x86 instruction set (examples are virtualization and 64-bit extensions).
Changing the instruction set can be both complex and expensive for application developers and painful for system designers. AMD recognizes this, and we are trying to reduce some of this cost and complexity by helping to unify the x86 instruction set with the adoption of the Advanced Vector Extensions (AVX).
AMD has always been a champion of open and industry standards, and by adopting the AVX instructions for x86 processors initially announced by Intel in 2008, we can help move this ideal forward. We believe that by proposing and embracing enhancements to the instruction set, AMD provides software developers with a great step towards a more standard platform for innovation.
Now, originally we had focused on what we had called SSE5, a specification we proposed for review by the industry in 2007. However, due to the overlap of functionality between the AVX instructions and SSE5, AMD has decided to recast the SSE5 instructions into the AVX framework. AMD made this decision to ensure the continued compatibility of x86 software, and plans to incorporate AVX instructions into AMD processors in 2011.
And, still, we want to continue to advance the ball. In addition to embracing the AVX specification, AMD is proposing further enhancement to the current version of this specification called eXtended Operations (XOP). Given there are features of the SSE5 specification that were positively reviewed in the news and not in the current version of AVX, we have incorporated them into the new proposal. Examples of the functionality include:
* Supporting Enhanced Vectorization
* Accelerating traditional DSP Multi-Media algorithms
* Accelerating floating point algorithms for High Performance Computing
If you want to review the AVX or XOP, AMD is posting these specifications here. I also encourage you to go read a blog written by Dave Christie, a Fellow in our Design Engineering team, to get more insight into the technical details and read what some of our technology partners have to say about this change.
You know, when I hear people cry, “Do not fork the x86 instruction set!” what I really hear is people saying, “Give up driving instruction set innovations!”
Well, there are two reasons why this won’t happen:
* Innovation ‘R US. We believe that bringing innovation to the market is one of our key values and we plan to continue to do what we can to bring users systems that better serve their needs
* There really isn’t a single static x86 instruction set and we need as an industry to make evolution of this instruction set. That’s why we publish changes we are proposing for discussion (and haven’t done it in secret). Our users and the application developers may have good ideas too.
The x86 instruction set will continue to evolve and change and wouldn’t it be great if we could do it together?
Nigel Dessau is senior vice president and chief marketing officer at AMD. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites are provided for convenience and unless explicitly stated, AMD is not responsible for the contents of such linked sites and no endorsement is implied.
Quote:
Originally Posted by Dave Christie, senior architect
Since we don't control the definition of AVX, all we can say for sure is that we expect our initial products to be compatible with version 5 of the specification (the most recent one, as of this writing, published in January of 2009), except for the FMA instructions, which we expect will be compatible with version 3 (published in August of 2008).
Why the FMA difference? This was not something we did lightly. In December of 2008, Intel made significant changes to the FMA definition, which we found we could not accommodate without unacceptable risk to our product schedules. Yet we did not want to deprive customers of the significant performance benefits of FMA. So we decided to stick with the earlier definition, renaming it FMA4 (for four-operand FMA - Intel's newer definition uses what we believe to be a less capable three-operand, destructive-destination format). It will have a different CPUID feature flag from Intel's FMA extension. At some future point, we will likely adopt Intel's newer FMA definition as well, coexisting with FMA4. But as you might imagine, we may wait until we're sure the specification is stable.
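To make the operand-format difference from the quote concrete, here is a rough model in plain C. It is only a sketch: the function names are made up, and it uses an ordinary multiply and add, so it models which operands get overwritten, not the single rounding step of a real fused multiply-add.

```c
/* Four-operand form (FMA4 style): a separate destination plus three
 * sources. All three sources are still intact afterwards. */
static double fma4_model(double a, double b, double c)
{
    return a * b + c;
}

/* Three-operand, destructive-destination form (FMA3 style): the
 * destination doubles as one of the sources, so its old value is
 * lost. Here the first multiplicand is the one overwritten. */
static void fma3_model(double *dst_and_src, double b, double c)
{
    *dst_and_src = (*dst_and_src) * b + c;
}
```

With the four-operand form, computing `d = a*b + c` leaves `a`, `b` and `c` available for later use. With the destructive form, one input has to be copied first if it is still needed, which is presumably what Christie means by "less capable".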