SIMD

From dankwiki

Revision as of 20:52, 19 September 2009

x86

  • Terminology:
    • single: 32-bit IEEE 754 floating-point (bias-127)
    • double: 64-bit IEEE 754 floating-point (bias-1023)
    • long double: 80-bit "double extended" IEEE 754-1985 floating-point (bias-16383)
      • not an actual SIMD type, but an artifact of x87
    • word: 16-bit two's complement integer
    • doubleword, dword: 32-bit two's complement integer
  • These do not necessarily map to the C data types of the same name, for any given compiler!
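
One way to see how a given compiler maps the floating-point terms above onto C types is to query `<float.h>`. The following is a minimal sketch, not a definitive test: the `fp_layout` struct and the `layout_of_*` helpers are hypothetical names introduced here, and the code assumes a binary (`FLT_RADIX == 2`) floating-point implementation.

```c
#include <float.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper type: how one C floating-point type is laid out. */
typedef struct {
    size_t storage_bits;   /* sizeof(type) * CHAR_BIT, padding included */
    int significand_bits;  /* precision in bits, assuming FLT_RADIX == 2 */
} fp_layout;

static fp_layout layout_of_float(void) {
    fp_layout l = { sizeof(float) * CHAR_BIT, FLT_MANT_DIG };
    return l;
}

static fp_layout layout_of_double(void) {
    fp_layout l = { sizeof(double) * CHAR_BIT, DBL_MANT_DIG };
    return l;
}

/* 64 significand bits suggests the 80-bit x87 format (usually padded to
 * 96 or 128 bits of storage); 53 means long double is merely a double. */
static fp_layout layout_of_long_double(void) {
    fp_layout l = { sizeof(long double) * CHAR_BIT, LDBL_MANT_DIG };
    return l;
}
```

With GCC on x86 Linux, `layout_of_long_double()` commonly reports 64 significand bits, while compilers that treat `long double` as `double` report 53. Check the values for the toolchain at hand rather than assuming either.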

Future

  • AVX (Advanced Vector eXtensions) -- to be introduced on Intel's Sandy Bridge (2010) and AMD's Bulldozer (2011), and implemented within the VEX coding scheme
  • The FMA instruction set extension to x86 should hit around 2011, providing floating-point fused multiply-add

SSE4

SSE4a

SSE4.1

SSE4.2

SSE3

  • movddup -- move a double from a 64-bit memory location (no alignment required) or from the lower half of an XMM register into the lower half of the target XMM register, then duplicate it into the upper half
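
The data movement movddup performs can be sketched with baseline SSE2 intrinsics, which avoids needing SSE3 compiler flags. This is an illustration under that assumption: `broadcast_double` and `dup_low` are hypothetical names, and whether the compiler actually emits movddup depends on the target options. With SSE3 enabled, `_mm_loaddup_pd` and `_mm_movedup_pd` from `<pmmintrin.h>` are the direct intrinsics.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Memory form: read one double (no alignment requirement) and place
 * it in both halves of an XMM register. */
static __m128d broadcast_double(const double *p) {
    return _mm_set1_pd(*p);
}

/* Register form: copy the low double of x into both halves. */
static __m128d dup_low(__m128d x) {
    return _mm_unpacklo_pd(x, x);
}
```

A broadcast of this kind is handy when one scalar (a matrix element, say) has to be multiplied against a packed pair with mulpd.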

SSSE3

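  • pmaddwd -- multiply packed signed words, then horizontally add adjacent pairs of the products, storing the sums as packed doublewords. pmaddwd itself dates from MMX and gained XMM operands with SSE2; the SSSE3 addition is pmaddubsw, which multiplies unsigned bytes by signed bytes and sums adjacent pairs into saturated words.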
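
A typical use of pmaddwd is an integer dot product. The sketch below uses the SSE2 intrinsic `_mm_madd_epi16`, which corresponds to pmaddwd on XMM registers. The function name `dot_i16` is hypothetical, and the code assumes `n` is a multiple of 8 and that the 32-bit sums do not overflow.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Dot product of two arrays of 16-bit signed integers.
 * Assumes n is a multiple of 8 and no 32-bit overflow occurs. */
static int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n) {
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 8) {
        /* Unaligned loads, so callers need not align their arrays. */
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* Eight 16x16 products, adjacent pairs summed to four 32-bit lanes. */
        __m128i pairs = _mm_madd_epi16(va, vb);
        acc = _mm_add_epi32(acc, pairs);
    }
    /* Horizontal sum of the four 32-bit lanes. */
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Each call to `_mm_madd_epi16` does eight multiplications and four additions, which is why the instruction shows up in fixed-point filters and dot products.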


SSE2

  • movapd -- move two packed doubles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers. A misaligned memory operand raises a general-protection fault.
    • movupd -- movapd without the alignment requirement: safe for unaligned memory references, but typically slower, especially when the access straddles a cache line.
  • mulpd -- multiply two pairs of packed doubles elementwise. The multiplier is a 16-byte-aligned memory location or XMM register; the target XMM register serves as the multiplicand and receives the products.
  • addpd -- add two pairs of packed doubles elementwise. The addend is a 16-byte-aligned memory location or XMM register; the target XMM register serves as the augend and receives the sums.
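
The four instructions above combine naturally into an elementwise multiply-add. The following is a minimal sketch using the SSE2 intrinsics that typically compile to them (`_mm_load_pd`/`_mm_store_pd` for movapd, `_mm_mul_pd` for mulpd, `_mm_add_pd` for addpd). The name `muladd_pd` is hypothetical, and the code assumes every pointer is 16-byte aligned and `n` is even.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* y[i] = a[i] * b[i] + c[i], two doubles per iteration.
 * Assumes all four pointers are 16-byte aligned and n is even;
 * a misaligned pointer would fault in the aligned load or store. */
static void muladd_pd(double *y, const double *a, const double *b,
                      const double *c, size_t n) {
    for (size_t i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(a + i);    /* aligned load (movapd) */
        __m128d vb = _mm_load_pd(b + i);
        __m128d vc = _mm_load_pd(c + i);
        __m128d prod = _mm_mul_pd(va, vb);  /* mulpd */
        __m128d sum = _mm_add_pd(prod, vc); /* addpd */
        _mm_store_pd(y + i, sum);           /* aligned store (movapd) */
    }
}
```

In C11 the alignment assumption can be met with `_Alignas(16) double a[4];`. If alignment cannot be guaranteed, swapping in `_mm_loadu_pd` and `_mm_storeu_pd` gives the movupd behavior.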

SSE

  • movaps -- move four packed singles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers. A misaligned memory operand raises a general-protection fault.
    • movups -- movaps without the alignment requirement: safe for unaligned memory references, but typically slower, especially when the access straddles a cache line.
  • mulps -- multiply two sets of four packed singles elementwise. The multiplier is a 16-byte-aligned memory location or XMM register; the target XMM register serves as the multiplicand and receives the products.
  • addps -- add two sets of four packed singles elementwise. The addend is a 16-byte-aligned memory location or XMM register; the target XMM register serves as the augend and receives the sums.
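
When the caller's arrays may be unaligned or of arbitrary length, the unaligned moves plus a scalar tail loop are the usual pattern. Below is a sketch of a single-precision `y = k*x + y` update using the SSE intrinsics that typically map to movups, mulps and addps. The function name `saxpy_ps` is hypothetical, not something from this page.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* y[i] = k * x[i] + y[i] for any n; no alignment assumed. */
static void saxpy_ps(float *y, const float *x, float k, size_t n) {
    const __m128 vk = _mm_set1_ps(k);   /* k in all four lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);               /* movups */
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(vk, vx), vy);       /* mulps, addps */
        _mm_storeu_ps(y + i, vy);                      /* movups */
    }
    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; ++i) {
        y[i] = k * x[i] + y[i];
    }
}
```

If both arrays are known to be 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` (movaps) can replace the unaligned variants.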

Other Architectures

See Also