SIMD: Difference between revisions


Revision as of 20:49, 19 September 2009

x86

Terminology:
- single: 32-bit IEEE 754 floating-point (exponent bias 127)
- double: 64-bit IEEE 754 floating-point (exponent bias 1023)
- long double: 80-bit "double extended" IEEE 754-1985 floating-point (exponent bias 16383)
  - not an actual SIMD type, but an artifact of x87
- word: 16-bit two's complement integer
- doubleword, dword: 32-bit two's complement integer
- quadword, qword: 64-bit two's complement integer
These do not necessarily map to the C data types of the same name for any given compiler!
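The exponent biases above can be checked directly from the bit patterns. This is a minimal sketch that assumes `float` and `double` are IEEE 754 binary32 and binary64, as they are on mainstream x86 compilers; the helper names are hypothetical. It only handles normal numbers, and it leaves out `long double` because the x87 80-bit format stores an explicit integer bit and is laid out differently.

```c
#include <stdint.h>
#include <string.h>

/* Unbiased exponent of a normal single: 8-bit stored field minus bias 127.
   Assumes float is IEEE 754 binary32. */
static int single_exponent(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);           /* reinterpret bits without aliasing issues */
    return (int)((bits >> 23) & 0xFFu) - 127;
}

/* Unbiased exponent of a normal double: 11-bit stored field minus bias 1023.
   Assumes double is IEEE 754 binary64. */
static int double_exponent(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return (int)((bits >> 52) & 0x7FFu) - 1023;
}
```

For example, `single_exponent(8.0f)` yields 3 (stored field 130), and `double_exponent(0.5)` yields -1 (stored field 1022).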

Future

AVX (Advanced Vector Extensions) -- to be introduced on Intel's Sandy Bridge (2010) and AMD's Bulldozer (2011), with its instructions encoded using the VEX prefix scheme

SSE4

SSE4a

SSE4.1

SSE4.2

SSE3

movddup -- load a double from a 64-bit memory location, or take the lower half of an XMM register, and write it to both halves of the destination XMM register
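A minimal sketch of a movddup-style broadcast using compiler intrinsics, assuming an x86 compiler that provides `<pmmintrin.h>`; the function name is hypothetical. `_mm_loaddup_pd` is the SSE3 intrinsic for the memory form of movddup and is only compiled when SSE3 is enabled (for example with `-msse3`). Otherwise the code falls back to the SSE2 `_mm_set1_pd`, and the compiler chooses the shuffle.

```c
#include <emmintrin.h>   /* SSE2: __m128d, _mm_set1_pd, _mm_storeu_pd */
#ifdef __SSE3__
#include <pmmintrin.h>   /* SSE3: _mm_loaddup_pd (movddup xmm, m64) */
#endif

/* Broadcast *p into both lanes of an XMM register, then write the lanes to out[0..1]. */
static void broadcast_double(const double *p, double out[2])
{
#ifdef __SSE3__
    __m128d v = _mm_loaddup_pd(p);   /* movddup: one 64-bit load, duplicated */
#else
    __m128d v = _mm_set1_pd(*p);     /* SSE2 fallback with the same result */
#endif
    _mm_storeu_pd(out, v);           /* movupd: out need not be 16-byte aligned */
}
```

Both paths produce the same register contents. The SSE3 path simply guarantees a single instruction for the load and duplicate.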

SSSE3

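pmaddwd -- multiply packed signed words, then add each adjacent pair of 32-bit products, producing packed doublewords (the results overwrite the destination rather than accumulating into it). Note that the XMM form of pmaddwd is actually SSE2, and the instruction originally dates from MMX; SSSE3 adds the byte-sized counterpart pmaddubsw.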
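As an illustration of the multiply-then-pairwise-add behaviour, here is a minimal sketch of an 8-element 16-bit dot product using the SSE2 intrinsic `_mm_madd_epi16`, which compiles to pmaddwd on XMM registers. The function name is hypothetical. pmaddwd reduces eight products to four 32-bit sums, and plain C adds those four sums at the end.

```c
#include <emmintrin.h>  /* SSE2: _mm_madd_epi16 (pmaddwd), unaligned loads/stores */
#include <stdint.h>

/* Dot product of two vectors of eight signed 16-bit words. */
static int32_t dot8_i16(const int16_t a[8], const int16_t b[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* pmaddwd: lane k = a[2k]*b[2k] + a[2k+1]*b[2k+1], as 32-bit integers */
    __m128i sums = _mm_madd_epi16(va, vb);
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, sums);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3]; /* final horizontal add */
}
```

With `a` and `b` both equal to {1, 2, ..., 8}, the result is 204.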

SSE2

movapd -- move two packed doubles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers.
- movupd -- like movapd, but safe for unaligned memory references; considerably slower on many current processors.
mulpd -- multiply two packed doubles. The multiplier is a 16-byte-aligned memory location or an XMM register; the destination XMM register holds the multiplicand and receives the products.
addpd -- add two packed doubles. The addend is a 16-byte-aligned memory location or an XMM register; the destination XMM register holds the augend and receives the sums.
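The SSE2 instructions above can be combined into a loop like the following minimal sketch, which computes y = a*x + y (a "daxpy"). It uses the standard intrinsics `_mm_load_pd`/`_mm_store_pd` (movapd), `_mm_mul_pd` (mulpd) and `_mm_add_pd` (addpd). The function name is hypothetical, and the sketch assumes that `n` is even and that both arrays are 16-byte aligned, as movapd requires.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_load_pd, _mm_mul_pd, _mm_add_pd */
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for i < n.
   Assumes n is even and x, y are 16-byte aligned (movapd faults otherwise). */
static void daxpy_sse2(size_t n, double a, const double *x, double *y)
{
    __m128d va = _mm_set1_pd(a);                  /* a in both lanes */
    for (size_t i = 0; i < n; i += 2) {
        __m128d vx = _mm_load_pd(x + i);          /* movapd: aligned load */
        __m128d vy = _mm_load_pd(y + i);
        vy = _mm_add_pd(_mm_mul_pd(va, vx), vy);  /* mulpd, then addpd */
        _mm_store_pd(y + i, vy);                  /* movapd: aligned store */
    }
}
```

The product is rounded before the add, which is exactly the double rounding that a fused multiply-add avoids.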

SSE

movaps -- move four packed singles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers.
- movups -- like movaps, but safe for unaligned memory references; considerably slower on many current processors.
mulps -- multiply four packed singles. The multiplier is a 16-byte-aligned memory location or an XMM register; the destination XMM register holds the multiplicand and receives the products.
addps -- add four packed singles. The addend is a 16-byte-aligned memory location or an XMM register; the destination XMM register holds the augend and receives the sums.
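The single-precision instructions above can be used for a dot product, as in this minimal sketch built on the SSE intrinsics `_mm_load_ps` (movaps), `_mm_mul_ps` (mulps), `_mm_add_ps` (addps) and `_mm_storeu_ps` (movups). The function name is hypothetical, and the sketch assumes that `n` is a multiple of 4 and that both arrays are 16-byte aligned.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_mul_ps, _mm_add_ps */
#include <stddef.h>

/* Dot product of two float arrays.
   Assumes n is a multiple of 4 and x, y are 16-byte aligned (movaps requirement). */
static float sdot_sse(size_t n, const float *x, const float *y)
{
    __m128 acc = _mm_setzero_ps();                  /* four running partial sums */
    for (size_t i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(x + i);             /* movaps */
        __m128 vy = _mm_load_ps(y + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(vx, vy));  /* mulps, then addps */
    }
    float lanes[4];
    _mm_storeu_ps(lanes, acc);                      /* movups: lanes may be unaligned */
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
}
```

Because the four lanes accumulate separately, the summation order differs from a plain scalar loop, so results on general data can differ in the last bits.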

Fused Multiply-Add

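The FMA (fused multiply-add) instruction set extension to x86, which computes a*b+c with a single rounding instead of rounding after both the multiply and the add, should arrive around 2011.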
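As a worked illustration of why the single rounding matters (the numbers are chosen for this example), let the circle operator denote rounding to the nearest double. A separate mulpd and addpd round twice, while a fused multiply-add rounds once:

```latex
% \circ(\cdot): round to nearest double, ties to even
\begin{aligned}
\text{mulpd then addpd:}\quad & \circ\big(\circ(a \times b) + c\big)\\
\text{fused multiply-add:}\quad & \circ(a \times b + c)\\[4pt]
a = 1 + 2^{-52},\; b = 1 - 2^{-52},\; c = -1:\quad & a \times b = 1 - 2^{-104}\\
& \circ(a \times b) = 1 \;\Rightarrow\; \circ\big(\circ(a \times b) + c\big) = 0\\
& \circ(a \times b + c) = \circ(-2^{-104}) = -2^{-104}
\end{aligned}
```

The spacing between doubles just below 1 is 2^-53, so the exact product rounds to 1. In the unfused version the residual is then lost entirely, while the fused version returns it exactly.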

Other Architectures

PowerPC implements AltiVec
