SIMD: Difference between revisions

No edit summary
Line 17: Line 17:
* '''word''': 32-bit two's complement integer
* '''word''': 32-bit two's complement integer
* '''doubleword''', '''dword''': 64-bit two's complement integer
* '''doubleword''', '''dword''': 64-bit two's complement integer
 
==x86 YMM==
The [http://software.intel.com/en-us/avx/ AVX] (Advanced Vector eXtensions) were introduced on Intel's [[Sandy Bridge]] (2010) and AMD's [[Bulldozer]] (2011). They operate on the 256-bit YMM registers (YMM0..YMM15), which are aliased by the [[SIMD#x86_XMM|XMM]] registers. They are encoded using the [[VEX]] scheme. Support for AVX can be determined via:
* Determine that [[CPUID]].1:ECX.OSXSAVE[bit 27] is set (XGETBV is enabled for application use)
* Issue XGETBV and verify that XFEATURE_ENABLED_MASK[2:1] = 0x3 (XMM state and YMM state are enabled by OS)
* Determine that [[CPUID]].1:ECX.AVX[bit 28] is set (AVX instructions supported)
==x86 XMM==
==x86 XMM==
The Streaming SIMD Extensions operate on the 128-bit XMM registers (XMM0..XMM7 in 32-bit mode, XMM0..XMM15 in 64-bit mode). In its original incarnation on the PIII, execution units (but not registers) were shared with the x87 floating-point architecture. The execution units were separated in the NetBurst microarchitecture. In the Core microarchitecture, the execution engine has been widened for greater SSE throughput.
The Streaming SIMD Extensions operate on the 128-bit XMM registers (XMM0..XMM7 in 32-bit mode, XMM0..XMM15 in 64-bit mode). In its original incarnation on the PIII, execution units (but not registers) were shared with the x87 floating-point architecture. The execution units were separated in the NetBurst microarchitecture. In the Core microarchitecture, the execution engine has been widened for greater SSE throughput.
Line 78: Line 82:


===Future Directions===
===Future Directions===
* [http://software.intel.com/en-us/avx/ AVX] (Advanced Vector eXtensions) -- to be introduced on Intel's [[Sandy Bridge]] (2010) and AMD's [[Bulldozer]] (2011), and implemented within the [http://en.wikipedia.org/wiki/VEX_prefix VEX coding scheme]
* The [http://en.wikipedia.org/wiki/FMA_instruction_set FMA instruction set] extension to x86 should hit around 2011, providing floating-point fused multiply-add
* The [http://en.wikipedia.org/wiki/FMA_instruction_set FMA instruction set] extension to x86 should hit around 2011, providing floating-point fused multiply-add
** AMD appears to call this [http://en.wikipedia.org/wiki/FMA_instruction_set FMA4], part of what was SSE5
** AMD appears to call this [http://en.wikipedia.org/wiki/FMA_instruction_set FMA4], part of what was SSE5