Single Instruction Multiple Data (SIMD) and MMX Registers


  • 8/14/2019 Single Instruction Multiple Data (SIMD) and MMX Registers


    CS220

    April 23, 2007


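    Some tips for lab 7

    #include <stdio.h>

    int main()
    {
        float f1 = 1.1, f2 = 2.2;
        float result;

        /* Load f1, add f2, then store and pop so the FPU stack is left empty. */
        __asm__ (
            "flds %1\n\t"
            "fadds %2\n\t"
            "fstps %0"
            : "=m"(result)
            : "m"(f1), "m"(f2)
        );
        printf("f1 + f2 = %f\n", result);
        return 0;
    }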

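    #include <stdio.h>

    int main()
    {
        float f1 = 1.1, f2 = 2.2;
        float result;

        /* f1 is placed in st(0) ("0" ties it to the "=t" output) and f2 in
           st(1) ("u"). faddp pops one register, so st(1) is listed as clobbered. */
        __asm__ (
            "faddp\n\t"
            : "=t"(result)
            : "0"(f1), "u"(f2)
            : "st(1)"
        );
        printf("f1 + f2 = %f\n", result);
        return 0;
    }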


    Single Instruction Multiple Data (SIMD)

    Data-level parallelism

    Multimedia Extensions (MMX)
      Integers
      Reuses the FP registers

    Streaming SIMD Extensions (SSE)
      Expanded with 32-bit floating-point support
      Additional registers


    SSE/SSE2/SSE3

    SSE
      Performs SIMD operations on floating-point data.
      128-bit packed single-precision floating-point data type: contains four single-precision floating-point values
      Eight 128-bit registers (XMM0 through XMM7)

    SSE2
      128-bit packed double-precision floating-point value: contains two double-precision values
      128-bit packed byte integer value: contains 16 single-byte integer values
      128-bit packed word integer value: contains eight word integer values
      128-bit packed doubleword integer value: contains four doubleword integer values
      128-bit packed quadword integer value: contains two quadword integer values

    SSE3
      No new data types


    MMX Registers

    MMX utilizes the 80-bit FPU registers.
    MM0 through MM7 are directly mapped to FPU registers R0 through R7.
    Random access, in contrast to the register stack in the FPU.
    Only 64 bits are used; the upper 16 bits are set to all ones (NaNs or infinities in the FP view).
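The aliasing described above can be pictured with a small toy model. This is only a sketch in portable C, not real hardware access: the `fpu_reg` struct and the function names are hypothetical, and the 80-bit register is simply split into its low 64 bits and its upper 16 bits.

```c
#include <stdint.h>

/* Toy model of one 80-bit x87 register (hypothetical names):
   a 64-bit significand plus 16 bits of sign and exponent. */
typedef struct {
    uint64_t significand;    /* low 64 bits; this is the part MMn aliases */
    uint16_t sign_exponent;  /* upper 16 bits */
} fpu_reg;

/* Model of an MMX write: fill the low 64 bits and set the upper 16 to ones. */
void mmx_write(fpu_reg *r, uint64_t value)
{
    r->significand = value;
    r->sign_exponent = 0xFFFF;
}

/* In the FP view, an all-ones exponent field means NaN or infinity. */
int fp_view_is_nan_or_inf(const fpu_reg *r)
{
    return (r->sign_exponent & 0x7FFF) == 0x7FFF;
}
```

After `mmx_write`, the same storage read as a floating-point value would no longer be an ordinary number, which is one reason FP and MMX code should not share register contents.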


    Two New Principles

    1. Operations on packed data
       Four new 64-bit data types:
         Packed byte: eight bytes packed into one 64-bit quantity
         Packed word: four words packed into one 64-bit quantity
         Packed doubleword: two doublewords packed into one 64-bit quantity
         Quadword: one 64-bit quantity

    2. Saturation arithmetic


    MMX Data Types

    Note that the same register contents can be interpreted in different ways, depending on the instruction that operates on them.
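To make the "different interpretations" point concrete, here is a minimal sketch in portable C that reads the same 64-bit value as bytes, words, or doublewords. The helper `lane` is a hypothetical name, and lane 0 is assumed to sit in the least significant bits.

```c
#include <stdint.h>

/* Hypothetical helper: return lane `i` of a 64-bit value when it is viewed
   as lanes of `bits` width (8, 16, or 32). Lane 0 is the lowest lane. */
uint32_t lane(uint64_t reg, unsigned bits, unsigned i)
{
    uint64_t mask = (bits == 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    return (uint32_t)((reg >> (bits * i)) & mask);
}
```

For the value 0x0807060504030201, `lane(v, 8, 0)` is 0x01, `lane(v, 16, 0)` is 0x0201, and `lane(v, 32, 1)` is 0x08070605: the bits never change, only the lane width used to read them.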


    Saturation and Wraparound

    Wraparound: any overflow is truncated and only the lower bits are returned. The carry is ignored.
      Example: add the two eight-bit values 0x02 and 0xFF. The actual sum is 0x101, but the ninth bit is truncated, and the result is 0x01.

    Saturation: results are clipped (saturated) to a maximum or minimum value. Limits by data type:

    Data Type        Decimal                      Hexadecimal
                     Lower Limit   Upper Limit    Lower Limit   Upper Limit
    Signed Byte      -128          +127           0x80          0x7F
    Unsigned Byte    0             255            0x00          0xFF
    Signed Word      -32768        +32767         0x8000        0x7FFF
    Unsigned Word    0             65535          0x0000        0xFFFF
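The difference between the two modes is easy to check in plain C. The sketch below emulates 8-bit wraparound and saturating addition in software; the function names are made up for illustration and no MMX instructions are involved.

```c
#include <stdint.h>

/* Wraparound: keep only the low 8 bits of the sum; the carry is dropped. */
uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Unsigned saturation: clip the sum to the range 0..255. */
uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 0xFF ? 0xFF : sum);
}

/* Signed saturation: clip the sum to the range -128..+127. */
int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > 127) return 127;
    if (sum < -128) return -128;
    return (int8_t)sum;
}
```

With the inputs 0x02 and 0xFF, `add_wrap_u8` gives 0x01 while `add_sat_u8` gives 0xFF, the unsigned byte upper limit.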


    Cannot mix FPU and MMX instructions

    MMX instructions can begin at any time.
    Use EMMS (Empty MMX State) to reset the FP state. After any MMX instruction, the entire floating-point tag word is set to Valid (00s). EMMS sets the entire floating-point tag word to Empty (11s).
    Register states (both FP and MMX) can be saved and restored by the FNSAVE and FRSTOR instructions.
    Do not rely on register contents across transitions.

    FP_code:
        ...
    MMX_code:
        ...
        EMMS    (* mark the FP tag word as empty *)
    FP_code 1:
        ...
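The tag-word behaviour above can be sketched as a tiny state model. This is a toy in portable C with hypothetical names (`fpu_state`, `after_mmx_instruction`, `emms`); it only tracks the 16-bit tag word, two bits per register, and does not touch the real FPU.

```c
#include <stdint.h>

#define TAG_ALL_VALID 0x0000u  /* eight 2-bit tags, each 00 (Valid) */
#define TAG_ALL_EMPTY 0xFFFFu  /* eight 2-bit tags, each 11 (Empty) */

/* Hypothetical model of the part of the FPU state that matters here. */
typedef struct {
    uint16_t tag_word;
} fpu_state;

/* Any MMX instruction marks all eight registers as Valid. */
void after_mmx_instruction(fpu_state *s)
{
    s->tag_word = TAG_ALL_VALID;
}

/* EMMS marks all eight registers as Empty again. */
void emms(fpu_state *s)
{
    s->tag_word = TAG_ALL_EMPTY;
}

/* FP code that pushes onto the register stack needs Empty registers. */
int ready_for_fp_code(const fpu_state *s)
{
    return s->tag_word == TAG_ALL_EMPTY;
}
```

In this model, `ready_for_fp_code` is false after any MMX instruction and true again only after `emms`, which mirrors the FP_code / MMX_code / EMMS / FP_code ordering shown above.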


    Instruction Groups

    Fifty-seven MMX instructions:
      Arithmetic Instructions
      Comparison Instructions
      Conversion Instructions
      Logical Instructions
      Shift Instructions
      Data Transfer Instructions
      Empty MMX State (EMMS) Instruction


    MMX Instruction Set

    Category       Mnemonic             Different Opcodes  Description
    Arithmetic     PADD[B,W,D]          3                  Add with wraparound on [byte, word, doubleword]
                   PADDS[B,W]           2                  Add signed with saturation on [byte, word]
                   PADDUS[B,W]          2                  Add unsigned with saturation on [byte, word]
                   PSUB[B,W,D]          3                  Subtract with wraparound on [byte, word, doubleword]
                   PSUBS[B,W]           2                  Subtract signed with saturation on [byte, word]
                   PSUBUS[B,W]          2                  Subtract unsigned with saturation on [byte, word]
                   PMULHW               1                  Packed multiply high on words
                   PMULLW               1                  Packed multiply low on words
                   PMADDWD              1                  Packed multiply on words and add resulting pairs
    Comparison     PCMPEQ[B,W,D]        3                  Packed compare for equality [byte, word, doubleword]
                   PCMPGT[B,W,D]        3                  Packed compare greater than [byte, word, doubleword]
    Conversion     PACKUSWB             1                  Pack words into bytes (unsigned with saturation)
                   PACKSS[WB,DW]        2                  Pack [words into bytes, doublewords into words] (signed with saturation)
                   PUNPCKH[BW,WD,DQ]    3                  Unpack (interleave) high-order [bytes, words, doublewords] from MMX register
                   PUNPCKL[BW,WD,DQ]    3                  Unpack (interleave) low-order [bytes, words, doublewords] from MMX register
    Logical        PAND                 1                  Bitwise AND
                   PANDN                1                  Bitwise AND NOT
                   POR                  1                  Bitwise OR
                   PXOR                 1                  Bitwise XOR
    Shift          PSLL[W,D,Q]          6                  Packed shift left logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
                   PSRL[W,D,Q]          6                  Packed shift right logical [word, doubleword, quadword] by amount specified in MMX register or by immediate value
                   PSRA[W,D]            4                  Packed shift right arithmetic [word, doubleword] by amount specified in MMX register or by immediate value
    Data Transfer  MOV[D,Q]             4                  Move [doubleword, quadword] to MMX register or from MMX register
    State Mgmt     EMMS                 1                  Empty MMX state
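To show what "packed" means for a few of the entries in the table, here is a software emulation in portable C of three byte-wise operations on a 64-bit value. The functions are named after the mnemonics only for readability; they are sketches of the per-lane behaviour, not the real instructions.

```c
#include <stdint.h>

/* PADDB-like: add eight byte lanes independently, with wraparound.
   No carry crosses from one lane into the next. */
uint64_t paddb(uint64_t dest, uint64_t src)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(dest >> (8 * i));
        uint8_t b = (uint8_t)(src >> (8 * i));
        out |= (uint64_t)(uint8_t)(a + b) << (8 * i);
    }
    return out;
}

/* PADDUSB-like: add eight byte lanes with unsigned saturation at 0xFF. */
uint64_t paddusb(uint64_t dest, uint64_t src)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        unsigned sum = (unsigned)(uint8_t)(dest >> (8 * i))
                     + (unsigned)(uint8_t)(src >> (8 * i));
        if (sum > 0xFF) sum = 0xFF;
        out |= (uint64_t)sum << (8 * i);
    }
    return out;
}

/* PCMPEQB-like: each lane becomes 0xFF if the bytes are equal, else 0x00. */
uint64_t pcmpeqb(uint64_t dest, uint64_t src)
{
    uint64_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t a = (uint8_t)(dest >> (8 * i));
        uint8_t b = (uint8_t)(src >> (8 * i));
        if (a == b) out |= (uint64_t)0xFF << (8 * i);
    }
    return out;
}
```

Note that `paddb(0xFF, 0x02)` yields 0x01, whereas an ordinary 64-bit addition would yield 0x101: the carry out of the lowest byte is discarded instead of spilling into the next lane.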


    Data Transfer Instructions

    The MOVD (Move 32 Bits) instruction transfers 32 bits of packed data from memory to MMX registers and vice versa, or from integer registers to MMX registers and vice versa. Examples:

        movd %eax, %mm0
        movd my32bits, %mm0
        movd %mm0, my32bits
        movd %mm0, %mm1      (WRONG!)

    The MOVQ (Move 64 Bits) instruction transfers 64 bits of packed data from memory to MMX registers and vice versa, or transfers data between MMX registers. Examples:

        movq %mm0, my64bits
        movq my64bits, %mm0

    MOVD cannot move data between two MMX registers; it works like a load/store. Use MOVQ for register-to-register moves.
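As a rough picture of what the two moves do to the bits, the sketch below models them in portable C. The function names are hypothetical; the one behaviour worth noting is that a 32-bit move into an MMX register zero-fills the upper 32 bits, and a 32-bit move out of one takes only the low 32 bits.

```c
#include <stdint.h>

/* MOVD into an MMX register: the 32-bit source lands in the low half
   and the upper 32 bits of the register are zeroed. */
uint64_t movd_to_mmx(uint32_t src)
{
    return (uint64_t)src;
}

/* MOVD out of an MMX register: only the low 32 bits are transferred. */
uint32_t movd_from_mmx(uint64_t mm)
{
    return (uint32_t)(mm & 0xFFFFFFFFu);
}

/* MOVQ: all 64 bits are copied unchanged. */
uint64_t movq(uint64_t src)
{
    return src;
}
```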


    Instruction format

    OPERATION SRC, DEST (AT&T syntax)
    would be decoded as:
    DEST = DEST OPERATION SRC

    A typical MMX instruction mnemonic has this form:
      Prefix: P for Packed
      Instruction operation: for example ADD, CMP, or XOR
      Suffix:
        US for unsigned saturation
        S for signed saturation
        B, W, D, Q for the data type: packed byte, packed word, packed doubleword, or quadword
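The operand order matters most for non-commutative operations. As a sketch, the C function below emulates a PSUBW-style subtraction with its parameters in AT&T order (source first, destination second), so that a call reads like `psubw %mm1, %mm0`. It is a software model with wraparound per 16-bit lane, not the real instruction.

```c
#include <stdint.h>

/* AT&T order: psubw src, dest  means  dest = dest - src,
   computed independently on four 16-bit lanes with wraparound. */
void psubw(uint64_t src, uint64_t *dest)
{
    uint64_t out = 0;
    for (int i = 0; i < 4; i++) {
        uint16_t d = (uint16_t)(*dest >> (16 * i));
        uint16_t s = (uint16_t)(src >> (16 * i));
        out |= (uint64_t)(uint16_t)(d - s) << (16 * i);
    }
    *dest = out;
}
```

Starting with a destination of 0x0005000500050005 and a source of 0x0001000200030004, the destination ends up as 0x0004000300020001, that is, destination minus source in every lane.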


    For the rest of today's class, we explain the MMX instructions on this page:

    http://www.tommesani.com/MMXPrimer.html

    Note the difference between Intel syntax and AT&T syntax:

    http://www.imada.sdu.dk/~kslarsen/dm18/Litteratur/IntelnATT.htm

    The MMX primer page uses Intel syntax, so the positions of source and destination in each instruction are exchanged compared to AT&T syntax.

    The pseudo-code explanation of each instruction is the same.

    You may also want to refer to the official Intel MMX reference manual for a better explanation (also Intel syntax):

    ftp://download.intel.com/ids/mmx/MMX_Manual_%20Prog_Ref.pdf

    Examples and applications of MMX instructions will be covered in the next class.