Check out my first novel, midnight's simulacra!

SIMD: Difference between revisions

From dankwiki
No edit summary
Line 6: Line 6:
===SSE2===
===SSE2===
*<tt>[http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc180.htm movapd]</tt> -- move two packed doubles from a 16-byte-aligned memory location to XMM registers, or vice versa, or between two XMM registers.
*<tt>[http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc180.htm movapd]</tt> -- move two packed doubles from a 16-byte-aligned memory location to XMM registers, or vice versa, or between two XMM registers.
**<tt>[http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc205.htm movupd]</tt> -- <tt>movapd</tt> safe for unaligned memory references, with far inferior performance.
*<tt>[http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc209.htm mulpd]</tt> -- the multiplier is a 16-byte-aligned memory location or XMM register. the target XMM register serves as the multiplicand.
*<tt>[http://www.sesp.cse.clrc.ac.uk/html/SoftwareTools/vtune/users_guide/mergedProjects/analyzer_ec/mergedProjects/reference_olh/mergedProjects/instructions/instruct32_hh/vc209.htm mulpd]</tt> -- the multiplier is a 16-byte-aligned memory location or XMM register. the target XMM register serves as the multiplicand.



Revision as of 18:38, 19 September 2009

x86

  • AVX (Advanced Vector eXtensions) -- to be introduced on Intel's Sandy Bridge (2010) and AMD's Bulldozer (2011), and implemented within the VEX coding scheme

SSE3

  • movddup -- move a double from a 8-byte-aligned memory location or lower half of XMM register to upper half, then duplicate upper half to lower half

SSE2

  • movapd -- move two packed doubles from a 16-byte-aligned memory location to XMM registers, or vice versa, or between two XMM registers.
    • movupd -- movapd safe for unaligned memory references, with far inferior performance.
  • mulpd -- the multiplier is a 16-byte-aligned memory location or XMM register. the target XMM register serves as the multiplicand.

SSE

  • movaps -- move four packed singles from a 16-byte-aligned memory location to XMM registers, or vice versa, or between two XMM registers.
    • movups -- movaps safe for unaligned memory references, with far inferior performance.

Fused Multiply-Add

Other Architectures

See Also