
SIMD

From dankwiki

Revision as of 22:20, 19 September 2009 by Dank



x86

Terminology:
- half precision: 16-bit IEEE 754 floating-point (bias-15) (IEEE 754 2008 binary16)
- single: 32-bit IEEE 754 floating-point (bias-127) (IEEE 754 2008 binary32)
- double: 64-bit IEEE 754 floating-point (bias-1023) (IEEE 754 2008 binary64)
- long double: 80-bit "double extended" IEEE 754-1985 floating-point (bias-16383)
  - not an actual SIMD type, but an artifact of x87
- word: 16-bit two's complement integer
- doubleword, dword: 32-bit two's complement integer
- quadword, qword: 64-bit two's complement integer
These do not necessarily map to the C data types of the same name, for any given compiler!
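A minimal C sketch of the bias figures above: each function recovers the unbiased exponent from the raw bits of one binary format. It assumes `float` and `double` are IEEE 754 binary32/binary64 (true for mainstream x86 compilers). Half precision is taken as raw `uint16_t` bits because C has no standard half type. Only normal numbers are handled, since an all-zeros exponent field means zero or subnormal and all-ones means infinity or NaN. The function names are illustrative, not from any library.

```c
#include <stdint.h>
#include <string.h>

/* Unbiased exponent of a normal half-precision value given as raw bits:
   bits 14..10 store (exponent + 15). */
int half_exponent(uint16_t bits) {
    return (int)((bits >> 10) & 0x1F) - 15;
}

/* Unbiased exponent of a normal single: bits 30..23 store (exponent + 127).
   Assumes float is IEEE 754 binary32. */
int single_exponent(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the bits without aliasing UB */
    return (int)((bits >> 23) & 0xFF) - 127;
}

/* Unbiased exponent of a normal double: bits 62..52 store (exponent + 1023).
   Assumes double is IEEE 754 binary64. */
int double_exponent(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (int)((bits >> 52) & 0x7FF) - 1023;
}
```

For example, 8.0f is 1.0 * 2^3, stored with a biased exponent field of 130, so `single_exponent(8.0f)` yields 130 - 127 = 3.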

Future

AVX (Advanced Vector eXtensions) -- to be introduced with Intel's Sandy Bridge (2010) and AMD's Bulldozer (2011), using the new VEX encoding scheme
The FMA instruction set extension to x86 should arrive around 2011, providing floating-point fused multiply-add (a*b+c computed with a single rounding)
- AMD calls its variant FMA4, formerly part of SSE5; it takes four operands, where Intel's FMA uses three

SSE5 (AMD)

Unimplemented extensions competing with SSE4, encoded using a method incompatible with VEX
Withdrawn, converted into VEX-compatible encodings, and split into:
- FMA4: Fused floating-point multiply-add (compare Intel's FMA)
- XOP: Fused integer multiply-add, byte permutations, shifts, rotates, integer vector horizontal operations (compare Intel's SSE4)
- CVT16: Half-precision conversion

SSE4 (Intel)

SSE4a (AMD)

SSE4.1

Figure: DPPD instruction dataflow

dpps -- dot product of two vectors having four single components each
dppd -- dot product of two vectors having two double components each
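insertps -- insert a single, taken from a lane of an XMM register or from memory, into any lane of the destination XMM register, optionally zeroing other lanes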
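A minimal sketch of dpps and dppd through their standard intrinsics, `_mm_dp_ps` and `_mm_dp_pd` from `<smmintrin.h>`. It assumes GCC or Clang on x86-64: the `target("sse4.1")` function attribute lets only these helpers use SSE4.1 without building the whole program with `-msse4.1`, and the `__builtin_cpu_supports` builtin picks a scalar fallback on CPUs lacking SSE4.1. The wrapper names `dot4`/`dot2` are hypothetical. The immediate operand selects which lane products are summed (high bits) and which destination lanes receive the sum (low bits).

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics: _mm_dp_ps (dpps), _mm_dp_pd (dppd) */

/* Compiled for SSE4.1 via a GCC/Clang function attribute. */
__attribute__((target("sse4.1")))
static float dot4_sse41(const float a[4], const float b[4]) {
    /* imm8 0xF1: bits 7..4 = 0xF multiply all four lane pairs,
       bits 3..0 = 0x1 write the sum to lane 0 and zero the others. */
    __m128 r = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1);
    return _mm_cvtss_f32(r);
}

__attribute__((target("sse4.1")))
static double dot2_sse41(const double a[2], const double b[2]) {
    /* imm8 0x31: bits 5..4 include both products, bits 1..0 write lane 0 only. */
    __m128d r = _mm_dp_pd(_mm_loadu_pd(a), _mm_loadu_pd(b), 0x31);
    return _mm_cvtsd_f64(r);
}

/* Use dpps when the CPU reports SSE4.1, else scalar code
   (rounding can differ slightly from dpps for inexact inputs). */
float dot4(const float a[4], const float b[4]) {
    if (__builtin_cpu_supports("sse4.1"))
        return dot4_sse41(a, b);
    return (a[0] * b[0] + a[1] * b[1]) + (a[2] * b[2] + a[3] * b[3]);
}

/* Same dispatch for the two-lane double version (dppd). */
double dot2(const double a[2], const double b[2]) {
    if (__builtin_cpu_supports("sse4.1"))
        return dot2_sse41(a, b);
    return a[0] * b[0] + a[1] * b[1];
}
```

Keeping the SSE4.1 code behind a runtime check is one common design; building everything with `-msse4.1` instead is simpler but makes the binary require SSE4.1.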

SSE4.2

SSE3

movddup -- load a double from memory (or take the lower half of an XMM register) and duplicate it into both halves of the destination XMM register
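A minimal sketch of the broadcast movddup performs, via the SSE3 intrinsic `_mm_loaddup_pd` from `<pmmintrin.h>`, which compilers typically emit as movddup when SSE3 is enabled. As above, it assumes GCC or Clang on x86-64 (`target("sse3")` attribute and `__builtin_cpu_supports`), and `splat_double` is a hypothetical name. The fallback uses SSE2's `_mm_set1_pd`, which produces the same result with different instructions.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics; also pulls in SSE2's <emmintrin.h> */

__attribute__((target("sse3")))
static void splat_sse3(double out[2], const double *p) {
    /* Load one double and copy it into both lanes (movddup xmm, m64). */
    _mm_storeu_pd(out, _mm_loaddup_pd(p));
}

/* Writes *p into both out[0] and out[1]. */
void splat_double(double out[2], const double *p) {
    if (__builtin_cpu_supports("sse3"))
        splat_sse3(out, p);
    else
        _mm_storeu_pd(out, _mm_set1_pd(*p));  /* SSE2 fallback, same result */
}
```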

SSSE3

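pmaddwd -- multiply packed signed words, then horizontally add adjacent pairs of the 32-bit products, writing the sums as doublewords (pmaddwd itself dates to MMX and gained XMM forms in SSE2; SSSE3's analogue is pmaddubsw, which multiplies unsigned bytes by signed bytes and adds adjacent pairs into saturated words)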
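A minimal sketch of pmaddwd used for an integer dot product. `_mm_madd_epi16` is the SSE2 intrinsic for the 128-bit form of pmaddwd, so on x86-64, where SSE2 is baseline, it needs no extra compiler flags. The function name `dot8_i16` is hypothetical, and the code assumes the final total fits in an `int32_t`.

```c
#include <emmintrin.h>  /* SSE2: _mm_madd_epi16 (pmaddwd) */
#include <stdint.h>

/* Dot product of eight signed 16-bit values. */
int32_t dot8_i16(const int16_t a[8], const int16_t b[8]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pmaddwd: four int32 lanes {a0*b0 + a1*b1, a2*b2 + a3*b3, ...} */
    __m128i sums = _mm_madd_epi16(va, vb);
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, sums);
    /* Finish the horizontal sum in scalar code; assumes no int32 overflow. */
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

Because pmaddwd already halves the number of lanes, only four additions remain after it, which is why it is a common building block for 16-bit fixed-point filters and dot products.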

SSE2

movapd -- move two packed doubles from a 16-byte-aligned memory location into an XMM register, or vice versa, or between two XMM registers.
- movupd -- movapd without the alignment requirement (movapd faults on a misaligned address); often considerably slower on older CPUs.
mulpd -- multiply two packed doubles. The source operand (a 16-byte-aligned memory location or an XMM register) is the multiplier; the destination XMM register holds the multiplicand and receives the products.
addpd -- add two packed doubles. The source operand (a 16-byte-aligned memory location or an XMM register) is the addend; the destination XMM register holds the augend and receives the sums.
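A minimal sketch of these three SSE2 instructions working together, computing y[i] += a * x[i] in the style of the BLAS daxpy routine. It uses the standard SSE2 intrinsics from `<emmintrin.h>`: `_mm_load_pd`/`_mm_store_pd` correspond to the aligned movapd forms, `_mm_mul_pd` to mulpd and `_mm_add_pd` to addpd, though the compiler is free to choose the exact instructions. The name `daxpy_sse2` is hypothetical, and the code assumes 16-byte-aligned arrays and an even `n` rather than handling a remainder.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 */
#include <stddef.h>
#include <stdint.h>

/* y[i] += a * x[i] for n doubles, two per iteration. */
void daxpy_sse2(size_t n, double a, const double *x, double *y) {
    /* The aligned loads/stores fault on misaligned addresses. */
    assert((uintptr_t)x % 16 == 0 && (uintptr_t)y % 16 == 0);
    assert(n % 2 == 0);
    __m128d va = _mm_set1_pd(a);                  /* {a, a} */
    for (size_t i = 0; i < n; i += 2) {
        __m128d vx = _mm_load_pd(x + i);          /* aligned load (movapd) */
        __m128d vy = _mm_load_pd(y + i);
        vy = _mm_add_pd(vy, _mm_mul_pd(va, vx));  /* mulpd, then addpd */
        _mm_store_pd(y + i, vy);                  /* aligned store (movapd) */
    }
}
```

Callers can get suitable storage with C11 `_Alignas(16)` or `aligned_alloc`; swapping in `_mm_loadu_pd`/`_mm_storeu_pd` (movupd) lifts the alignment requirement.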

SSE

movaps -- move four packed singles from a 16-byte-aligned memory location into an XMM register, or vice versa, or between two XMM registers.
- movups -- movaps without the alignment requirement (movaps faults on a misaligned address); often considerably slower on older CPUs.
mulps -- multiply four packed singles. The source operand (a 16-byte-aligned memory location or an XMM register) is the multiplier; the destination XMM register holds the multiplicand and receives the products.
addps -- add four packed singles. The source operand (a 16-byte-aligned memory location or an XMM register) is the addend; the destination XMM register holds the augend and receives the sums.
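A minimal sketch of movaps, mulps and addps in a dot-product loop, using the standard SSE intrinsics from `<xmmintrin.h>` (`_mm_load_ps` for aligned loads, `_mm_mul_ps`, `_mm_add_ps`, and `_mm_storeu_ps` for the unaligned movups store). The name `dot_sse` is hypothetical, and the code assumes 16-byte-aligned inputs whose length is a multiple of 4. Four partial sums live in one register and are combined at the end, so results can differ from a sequential scalar loop in the last bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE */

/* Dot product of two aligned float arrays whose length n is a multiple of 4. */
float dot_sse(size_t n, const float *a, const float *b) {
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0 && n % 4 == 0);
    __m128 acc = _mm_setzero_ps();               /* four partial sums */
    for (size_t i = 0; i < n; i += 4) {
        __m128 prod = _mm_mul_ps(_mm_load_ps(a + i), _mm_load_ps(b + i));  /* movaps, mulps */
        acc = _mm_add_ps(acc, prod);                                       /* addps */
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);  /* unaligned store (movups): lanes[] is not aligned */
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```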

Other Architectures

PowerPC: AltiVec (also called VMX or Velocity Engine), implemented on processors such as the G4 and G5

See Also

"Why no FMA in AVX in Sandy Bridge?", Intel Developers Forum
SSE5 guide, AMD
"SSE4 finally adds dot products", http://virtualdub.org, 2007-04-19

Retrieved from "https://nick-black.com/dankwiki/index.php?title=SIMD&oldid=1020"
