SIMD
From dankwiki
Revision as of 20:51, 19 September 2009
x86
- Terminology:
- single: 32-bit IEEE 754 floating-point (bias-127)
- double: 64-bit IEEE 754 floating-point (bias-1023)
- long double: 80-bit "double extended" IEEE 754-1985 floating-point (excess-16383)
- not an actual SIMD type, but an artifact of x87
- word: 16-bit two's complement integer
- doubleword, dword: 32-bit two's complement integer
- quadword, qword: 64-bit two's complement integer
- These do not necessarily map to the C data types of the same name, for any given compiler!
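The bias figures above describe how the exponent field is stored: the encoded exponent is the true exponent plus 127 (single) or plus 1023 (double). Here is a minimal C sketch that reads that field back out. It assumes `float` and `double` are IEEE 754 binary32/binary64, which holds on x86 compilers. The helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float/double are IEEE 754 single/double (true on x86 targets). */
_Static_assert(sizeof(float) == 4 && sizeof(double) == 8, "unexpected FP sizes");

/* Stored (biased) exponent of a single: bits 30..23 of its encoding. */
unsigned single_biased_exponent(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);      /* copy the bit pattern, no aliasing UB */
    return (bits >> 23) & 0xFFu;
}

/* Stored (biased) exponent of a double: bits 62..52 of its encoding. */
unsigned double_biased_exponent(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (unsigned)((bits >> 52) & 0x7FFu);
}

/* True exponent of a normal (non-zero, non-denormal) number: field minus bias. */
int single_exponent(float f)  { return (int)single_biased_exponent(f) - 127; }
int double_exponent(double d) { return (int)double_biased_exponent(d) - 1023; }
```

For example, 1.0f is stored with exponent field 127 and 1.0 with 1023, because both represent 2^0.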
Future
- AVX (Advanced Vector eXtensions) -- to be introduced with Intel's Sandy Bridge (2010) and AMD's Bulldozer (2011), using the new VEX instruction encoding scheme
- The FMA instruction set extension to x86 is expected around 2011, providing floating-point fused multiply-add (a*b + c computed with a single rounding step)
SSE4
SSE4a
SSE4.1
SSE4.2
SSE3
- movddup -- load a double from a 64-bit memory location (or take the lower half of an XMM register) and duplicate it into both halves of the destination XMM register
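In C, movddup is available through the SSE3 intrinsics `_mm_loaddup_pd` (memory source) and `_mm_movedup_pd` (register source). The sketch below assumes GCC or Clang on x86-64 and a CPU with SSE3. The `target` attribute lets these functions use SSE3 without compiling everything with `-msse3`. The function names are hypothetical.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics (pulls in SSE2 as well) */

/* Memory form: load one double and place it in both lanes (typically movddup xmm, m64),
 * then store the pair to out[0..1]. */
__attribute__((target("sse3")))
void broadcast_double(const double *src, double out[2]) {
    __m128d v = _mm_loaddup_pd(src);
    _mm_storeu_pd(out, v);
}

/* Register form: copy the lower lane of a pair into both lanes (movddup xmm, xmm). */
__attribute__((target("sse3")))
void duplicate_low(const double in[2], double out[2]) {
    __m128d v = _mm_loadu_pd(in);
    _mm_storeu_pd(out, _mm_movedup_pd(v));
}
```

A common use is splatting a scalar coefficient across a register before a packed mulpd.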
SSSE3
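- pmaddwd -- multiply packed signed 16-bit words, then add adjacent pairs of the 32-bit products, producing packed 32-bit doublewords. (Not new in SSSE3: pmaddwd dates to MMX, and SSE2 extended it to XMM registers; SSSE3 added the byte-wise pmaddubsw.)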
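pmaddwd corresponds to the SSE2 intrinsic `_mm_madd_epi16`. The instruction writes its pairwise sums to the destination rather than adding them to its previous contents, so a running total needs an explicit paddd (`_mm_add_epi32`). This sketch assumes an x86-64 compiler, where SSE2 is always available. `madd_words` and `dot_i16` are hypothetical helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2: _mm_madd_epi16, _mm_add_epi32 */

/* One pmaddwd: out[i] = a[2i]*b[2i] + a[2i+1]*b[2i+1] for i = 0..3. */
void madd_words(const int16_t a[8], const int16_t b[8], int32_t out[4]) {
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_madd_epi16(va, vb));
}

/* 16-bit integer dot product built on pmaddwd.
 * Assumes n is a multiple of 8 and the total fits in int32_t. */
int32_t dot_i16(const int16_t *a, const int16_t *b, size_t n) {
    __m128i acc = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 8) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(va, vb));  /* explicit accumulate */
    }
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];      /* horizontal sum */
}
```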
SSE2
- movapd -- move two packed doubles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers.
- movupd -- like movapd, but safe for unaligned memory references, at a substantial performance cost.
- mulpd -- multiply two pairs of packed doubles, lane by lane. The multiplier is a 16-byte-aligned memory location or XMM register; the target XMM register serves as the multiplicand and receives the products.
- addpd -- add two pairs of packed doubles, lane by lane. The addend is a 16-byte-aligned memory location or XMM register; the target XMM register serves as the augend and receives the sums.
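Together, movapd, mulpd and addpd are enough for a double-precision dot product. This sketch assumes an x86-64 compiler (SSE2 is baseline), 16-byte-aligned inputs and an even length. `dot_pd` is a hypothetical name. The instruction noted beside each intrinsic is the typical lowering, not a guarantee.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 */

/* Dot product of two double arrays, two lanes at a time.
 * Assumes a and b are 16-byte aligned and n is even. */
double dot_pd(const double *a, const double *b, size_t n) {
    __m128d acc = _mm_setzero_pd();
    for (size_t i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(a + i);           /* movapd: faults if misaligned */
        __m128d vb = _mm_load_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb)); /* mulpd, then addpd */
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);                     /* movupd: no alignment needed */
    return lanes[0] + lanes[1];                    /* combine the two partial sums */
}
```

Callers can obtain suitable buffers with `_Alignas(16)` or an aligned allocator. Swapping `_mm_load_pd` for `_mm_loadu_pd` (movupd) lifts the alignment requirement.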
SSE
- movaps -- move four packed singles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers.
- movups -- like movaps, but safe for unaligned memory references, at a substantial performance cost.
- mulps -- multiply four pairs of packed singles, lane by lane. The multiplier is a 16-byte-aligned memory location or XMM register; the target XMM register serves as the multiplicand and receives the products.
- addps -- add four pairs of packed singles, lane by lane. The addend is a 16-byte-aligned memory location or XMM register; the target XMM register serves as the augend and receives the sums.
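The single-precision instructions combine the same way, four lanes at a time. This sketch computes y = a*x + y. It uses the unaligned movups forms so callers need not align their arrays, and a scalar loop handles leftover elements. It assumes an x86-64 compiler; `saxpy_ps` is a hypothetical name. Note that mulps followed by addps rounds twice, unlike a fused multiply-add.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE */

/* y[i] = a * x[i] + y[i] for i in [0, n); no alignment requirement on x or y. */
void saxpy_ps(float a, const float *x, float *y, size_t n) {
    __m128 va = _mm_set1_ps(a);                   /* broadcast a into all four lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);          /* movups */
        __m128 vy = _mm_loadu_ps(y + i);
        vy = _mm_add_ps(_mm_mul_ps(va, vx), vy);  /* mulps, then addps */
        _mm_storeu_ps(y + i, vy);                 /* movups back to memory */
    }
    for (; i < n; i++)                            /* scalar tail for n % 4 elements */
        y[i] = a * x[i] + y[i];
}
```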
Other Architectures
- PowerPC implements AltiVec
See Also
- "Why no FMA in AVX in Sandy Bridge?", Intel Developers Forum
- SSE5 guide at AMD
- 2007-04-19 post to http://virtualdub.org, "SSE4 finally adds dot products"