
SIMD

From dankwiki

Revision as of 20:49, 19 September 2009 by Dank (talk | contribs)


x86

Terminology:
- single: 32-bit IEEE 754 floating-point (bias-127)
- double: 64-bit IEEE 754 floating-point (bias-1023)
- long double: 80-bit "double extended" IEEE 754-1985 floating-point (bias-16383)
  - not an actual SIMD type, but an artifact of x87
- word: 16-bit two's complement integer
- doubleword, dword: 32-bit two's complement integer
- quadword, qword: 64-bit two's complement integer
These do not necessarily map to the C data types of the same name, for any given compiler!
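
To make the bias figures above concrete, here is a minimal sketch that pulls the exponent field out of a single and a double and subtracts the bias. The function names are hypothetical, and the code assumes that C's float and double are the 32-bit and 64-bit IEEE 754 formats, which holds on x86 but is not guaranteed by the C standard.

```c
#include <stdint.h>
#include <string.h>

/* Unbiased exponent of a normal IEEE 754 single.
 * Assumes float is the 32-bit IEEE 754 format (true on x86). */
static int single_exponent(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* reinterpret the bits safely */
    return (int)((bits >> 23) & 0xff) - 127;  /* 8 exponent bits, bias 127 */
}

/* Unbiased exponent of a normal IEEE 754 double.
 * Assumes double is the 64-bit IEEE 754 format (true on x86). */
static int double_exponent(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (int)((bits >> 52) & 0x7ff) - 1023; /* 11 exponent bits, bias 1023 */
}
```

Zeros, denormals, infinities and NaNs use the reserved all-zeros and all-ones exponent fields, so this sketch is only meaningful for normal values.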

Future

AVX (Advanced Vector eXtensions) -- to be introduced on Intel's Sandy Bridge (2010) and AMD's Bulldozer (2011), and encoded using the VEX prefix scheme

SSE4

SSE4a

SSE4.1

SSE4.2

SSE3

movddup -- load a double from a 64-bit memory location (no alignment requirement) or from the lower half of an XMM register into the lower half of the target XMM register, then duplicate it into the upper half
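
A minimal sketch of movddup from C, using the SSE3 intrinsic _mm_loaddup_pd. The function name dup_double is hypothetical, and the target attribute is a GCC/Clang extension assumed here so that the example builds without passing -msse3; other compilers will need a different mechanism.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics */

/* Broadcast *src into both halves of an XMM register and store the pair.
 * src needs no particular alignment; out is written with an unaligned store.
 * The target attribute (a GCC/Clang extension, assumed here) enables SSE3
 * for this one function. */
__attribute__((target("sse3")))
static void dup_double(const double *src, double out[2])
{
    __m128d v = _mm_loaddup_pd(src);  /* typically compiles to movddup */
    _mm_storeu_pd(out, v);
}
```

After the call, out[0] and out[1] both hold the value that src pointed to.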

SSSE3

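pmaddwd -- multiply packed signed 16-bit words, then sum each adjacent pair of 32-bit products into a packed doubleword. The instruction itself dates to MMX and was widened to XMM registers by SSE2; the SSSE3 relative is pmaddubsw, which multiplies unsigned bytes by signed bytes and sums adjacent pairs into saturated words.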
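
The multiply-then-pairwise-add behaviour of pmaddwd is what makes it useful for integer dot products. The sketch below uses the SSE2 intrinsic _mm_madd_epi16 on eight 16-bit words, producing four 32-bit doublewords that are then summed in scalar code. The function name dot8_i16 is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Dot product of two arrays of eight signed 16-bit words.
 * Neither pointer needs to be aligned (unaligned loads are used). */
static int32_t dot8_i16(const int16_t *a, const int16_t *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pmaddwd: eight 16x16 products, adjacent pairs summed into four dwords */
    __m128i sums = _mm_madd_epi16(va, vb);
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, sums);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

The final four-way sum is done in scalar code to keep the sketch short; a longer loop would normally keep accumulating in an XMM register with paddd and reduce once at the end.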

SSE2

movapd -- move two packed doubles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers. A misaligned memory operand raises a general-protection fault.
- movupd -- movapd without the alignment requirement; historically much slower, especially when the access crosses a cache-line boundary.
mulpd -- multiply two pairs of packed doubles element-wise. The multiplier is a 16-byte-aligned memory location or XMM register. The target XMM register serves as the multiplicand and is overwritten with the products.
addpd -- add two pairs of packed doubles element-wise. The addend is a 16-byte-aligned memory location or XMM register. The target XMM register serves as the augend and is overwritten with the sums.
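
A minimal sketch of how the three packed-double instructions above combine, written with the SSE2 intrinsics _mm_load_pd, _mm_mul_pd, _mm_add_pd and _mm_store_pd. The function name muladd2 is hypothetical, and the compiler is free to choose different instructions than the comments suggest.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* out[i] = a[i] * b[i] + c[i] for i in {0, 1}.
 * All four pointers must be 16-byte aligned, as movapd requires. */
static void muladd2(const double *a, const double *b,
                    const double *c, double *out)
{
    __m128d va = _mm_load_pd(a);   /* aligned load (movapd) */
    __m128d vb = _mm_load_pd(b);
    __m128d vc = _mm_load_pd(c);
    __m128d r  = _mm_add_pd(_mm_mul_pd(va, vb), vc);  /* mulpd, then addpd */
    _mm_store_pd(out, r);          /* aligned store (movapd) */
}
```

Callers have to guarantee the alignment themselves, for example by declaring the arrays with C11's _Alignas(16); passing a misaligned pointer faults at run time rather than failing to compile.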

SSE

movaps -- move four packed singles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers. A misaligned memory operand raises a general-protection fault.
- movups -- movaps without the alignment requirement; historically much slower, especially when the access crosses a cache-line boundary.
mulps -- multiply two sets of four packed singles element-wise. The multiplier is a 16-byte-aligned memory location or XMM register. The target XMM register serves as the multiplicand and is overwritten with the products.
addps -- add two sets of four packed singles element-wise. The addend is a 16-byte-aligned memory location or XMM register. The target XMM register serves as the augend and is overwritten with the sums.
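
For arrays whose alignment is not known, the unaligned forms are the safe choice. The sketch below is a hypothetical saxpy routine (y += a * x over singles) built on the SSE intrinsics _mm_loadu_ps, _mm_mul_ps, _mm_add_ps and _mm_storeu_ps, with a scalar loop for the leftover elements when the length is not a multiple of four.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* y[i] += a * x[i] for i in [0, n). Uses unaligned loads and stores
 * (movups), so x and y may have any alignment. */
static void saxpy(size_t n, float a, const float *x, float *y)
{
    __m128 va = _mm_set1_ps(a);               /* broadcast a to all four lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);  /* mulps, then addps */
        _mm_storeu_ps(y + i, vy);
    }
    for (; i < n; i++)                        /* scalar tail: n % 4 leftovers */
        y[i] += a * x[i];
}
```

If both arrays are known to be 16-byte aligned, _mm_load_ps and _mm_store_ps can be swapped in to get the movaps forms.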

Fused Multiply-Add

The FMA instruction set extension to x86 is expected to arrive around 2011.

Other Architectures

PowerPC implements AltiVec

See Also

"Why no FMA in AVX in Sandy Bridge?", Intel Developers Forum
SSE5 guide at AMD
2007-04-19 post to http://virtualdub.org, "SSE4 finally adds dot products"

Retrieved from "https://nick-black.com/dankwiki/index.php?title=SIMD&oldid=1010"
