SIMD: Difference between revisions
From dankwiki
Revision as of 18:36, 19 September 2009
x86
- AVX (Advanced Vector Extensions) -- to be introduced on Intel's Sandy Bridge (2010) and AMD's Bulldozer (2011), and encoded using the VEX prefix scheme
SSE3
- movddup -- load a double from an 8-byte memory location (no 16-byte alignment required) or from the lower half of an XMM register into the lower half of the destination XMM register, then duplicate it into the upper half
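
A minimal sketch of the movddup broadcast using the SSE3 intrinsic _mm_loaddup_pd from pmmintrin.h. The helper name broadcast_double is hypothetical, and the target attribute is a GCC/Clang convenience (an assumption about the toolchain) so the file builds without -msse3; the CPU must still support SSE3 at run time.

```c
#include <pmmintrin.h>  /* SSE3 intrinsics */

/* Hypothetical helper: copy *src into both halves of an XMM register
 * (typically a single movddup), then store the pair to out[0..1].
 * Neither pointer needs 16-byte alignment: the load reads 8 bytes and
 * the store uses the unaligned form. */
__attribute__((target("sse3")))
static void broadcast_double(double out[2], const double *src)
{
    __m128d v = _mm_loaddup_pd(src);  /* low = *src, high = *src */
    _mm_storeu_pd(out, v);
}
```

Calling broadcast_double(out, &x) leaves out[0] and out[1] both equal to x.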
SSE2
- movapd -- move two packed doubles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers.
- mulpd -- multiply two pairs of packed doubles. The source (multiplier) is a 16-byte-aligned memory location or an XMM register; the destination XMM register supplies the multiplicand and receives the product.
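
The two SSE2 instructions above can be sketched with compiler intrinsics: _mm_load_pd and _mm_store_pd usually compile to movapd, and _mm_mul_pd to mulpd. The helper name mul_packed_doubles and its signature are hypothetical, chosen only for illustration; the asserts spell out the 16-byte alignment requirement.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: dst[i] = a[i] * b[i] for n doubles, two at a time.
 * n must be even, and all three pointers must be 16-byte aligned,
 * because the aligned load/store forms fault on misaligned addresses. */
static void mul_packed_doubles(double *dst, const double *a,
                               const double *b, size_t n)
{
    assert(n % 2 == 0);
    assert((uintptr_t)dst % 16 == 0);
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);

    for (size_t i = 0; i < n; i += 2) {
        __m128d va = _mm_load_pd(a + i);            /* aligned load  */
        __m128d vb = _mm_load_pd(b + i);            /* aligned load  */
        _mm_store_pd(dst + i, _mm_mul_pd(va, vb));  /* multiply, aligned store */
    }
}
```

Arrays passed to this helper can be aligned with _Alignas(16) in C11 or an equivalent compiler attribute.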
SSE
- movaps -- move four packed singles from a 16-byte-aligned memory location to an XMM register, or vice versa, or between two XMM registers.
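
As a rough illustration of movaps, the SSE intrinsics _mm_load_ps and _mm_store_ps move four singles through an XMM register and usually compile to that instruction. The helper name copy_four_singles is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: copy four floats from src to dst via one XMM
 * register. Both pointers must be 16-byte aligned. */
static void copy_four_singles(float *dst, const float *src)
{
    assert((uintptr_t)dst % 16 == 0);
    assert((uintptr_t)src % 16 == 0);

    __m128 v = _mm_load_ps(src);  /* memory -> XMM */
    _mm_store_ps(dst, v);         /* XMM -> memory */
}
```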
Fused Multiply-Add
- The FMA instruction set extension to x86 is expected around 2011
Other Architectures
- PowerPC implements AltiVec
See Also
- "Why no FMA in AVX in Sandy Bridge?", Intel Developers Forum
- SSE5 guide at AMD