Arithmetic Coprocessor

Advanced Microprocessor


Coprocessor basics:

• The 80x87 can multiply, divide, add, subtract, find square roots, and calculate transcendental functions and logarithms.

• Data types include 16-, 32- and 64-bit signed integers; 18-digit BCD data; and 32-, 64- and 80-bit (extended-precision) floating-point numbers.

• An operation performed by the 80x87 generally executes much faster than the equivalent operation written with the microprocessor's normal instructions.



Data Formats for the Arithmetic Coprocessor:

Signed Integers:
• 16 bit (word): range -32768 to +32767
• 32 bit (short integer): range about -2×10^9 to +2×10^9
• 64 bit (long integer): range about -9×10^18 to +9×10^18

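3 forms of signed integers (s = sign bit, in the most significant position; negative numbers are stored in two's-complement form):
- Word: s = bit 15, remaining value bits = bits 14-0
- Short integer: s = bit 31, remaining value bits = bits 30-0
- Long integer: s = bit 63, remaining value bits = bits 62-0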



• The directives DW, DD and DQ are used to declare signed-integer storage:
- DW defines a word
- DD defines a short integer
- DQ defines a long integer
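The ranges above follow from two's-complement encoding, and the bytes that DW, DD and DQ reserve are stored little-endian (lowest byte at the lowest address). Below is a small Python sketch that shows both using the standard `struct` module. The helper names `int_range` and `storage_bytes` are illustrative only.

```python
import struct

# struct format codes for little-endian 16-, 32- and 64-bit signed integers
FORMATS = {16: "<h", 32: "<i", 64: "<q"}

def int_range(bits: int) -> tuple[int, int]:
    """Smallest and largest two's-complement value of the given width."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1

def storage_bytes(value: int, bits: int) -> str:
    """Hex bytes of value as laid out in memory, lowest address first."""
    return struct.pack(FORMATS[bits], value).hex(" ").upper()

for bits, name in [(16, "word (DW)"), (32, "short integer (DD)"), (64, "long integer (DQ)")]:
    lo, hi = int_range(bits)
    print(f"{name:20} {lo} to {hi}")

print(storage_bytes(-2, 16))  # FE FF
```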

Each microprocessor has a matching coprocessor: the 8086 with the 8087, the 8088 with the 8087, the 80186 with the 80187, and so on.



Binary Coded Decimal (BCD):

• BCD form requires 80 bits of memory.

• Each number is stored as an 18-digit packed integer in 9 bytes of memory (2 digits per byte); the 10th byte holds the sign bit.

• Both positive and negative numbers are stored in true (uncomplemented) form. Examples:

DATA1 DT 20      ; 20 as BCD:     00 00 00 00 00 00 00 00 00 20
DATA2 DT -220    ; -220 as BCD:   80 00 00 00 00 00 00 00 02 20
DATA3 DT 50000   ; 50000 as BCD:  00 00 00 00 00 00 00 05 00 00

Format: S | D17 D16 D15 ... D2 D1 D0 (the sign byte, then 18 BCD digits from most to least significant)
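Here is a minimal Python sketch of the packed-BCD layout. It stores two digits per byte in bytes 0-8 and puts the sign in bit 7 of byte 9. The byte order is reversed for display so the output matches the DT comments above. The function names are hypothetical.

```python
def to_packed_bcd(value: int) -> bytes:
    """Encode value as a 10-byte packed BCD number in memory order (byte 0 first)."""
    digits = abs(value)
    if digits >= 10**18:
        raise ValueError("packed BCD holds at most 18 digits")
    out = bytearray(10)
    for i in range(9):                    # bytes 0..8 hold two digits each
        low = digits % 10
        high = (digits // 10) % 10
        out[i] = (high << 4) | low
        digits //= 100
    out[9] = 0x80 if value < 0 else 0x00  # bit 79 is the sign; the other 7 bits are unused
    return bytes(out)

def listing_order(bcd: bytes) -> str:
    """Show the bytes most significant first, as in the DT examples."""
    return " ".join(f"{b:02X}" for b in reversed(bcd))

print(listing_order(to_packed_bcd(-220)))  # 80 00 00 00 00 00 00 00 02 20
```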



Floating point:

• Floating-point numbers hold signed integers, fractions and mixed numbers.

• A floating-point number has 3 parts:
- Sign bit
- Biased exponent
- Significand

• The Intel family arithmetic coprocessor supports 3 types of floating-point numbers:
- Short (32 bit): single precision, with a bias of 7FH (127)
- Long (64 bit): double precision, with a bias of 3FFH (1023)
- Temporary (80 bit): extended precision, with a bias of 3FFFH (16383)



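Bit layouts (s = sign, exp = biased exponent):
- Short (single): s = bit 31, exp = bits 30-23, fraction = bits 22-0
- Long (double): s = bit 63, exp = bits 62-52, fraction = bits 51-0
- Temporary (extended): s = bit 79, exp = bits 78-64, significand = bits 63-0, where bit 63 is an explicit leading 1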

Converting Decimal to Floating-point form:

- Convert the decimal number into binary.
- Normalize the binary number.
- Calculate the biased exponent.
- Store the number in the floating-point format.
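The four steps can be followed literally in a short Python sketch for single precision. It assumes a normal, nonzero input that is exactly representable, such as 100.25, and does no rounding. `math.frexp` performs the normalization. The result is then checked by letting `struct` reinterpret the assembled bits. The helper names are illustrative.

```python
import math
import struct

def encode_single(x: float) -> tuple[int, str, str]:
    """Return (sign, 8 exponent bits, 23 significand bits) for a single-precision value."""
    sign = 1 if x < 0 else 0
    # Steps 1-2: normalize to 1.f x 2^e (frexp gives 0.5 <= m < 1, so rescale by 2)
    m, e = math.frexp(abs(x))
    m, e = m * 2, e - 1
    # Step 3: biased exponent = true exponent + 127 (7FH)
    biased = e + 127
    # Step 4: the 23 fraction bits after the implied leading 1
    frac, bits = m - 1.0, ""
    for _ in range(23):
        frac *= 2
        bit = int(frac)
        bits += str(bit)
        frac -= bit
    return sign, format(biased, "08b"), bits

def pack_fields(sign: int, exp_bits: str, frac_bits: str) -> float:
    """Reassemble the three fields and read them back as an IEEE single."""
    word = (sign << 31) | (int(exp_bits, 2) << 23) | int(frac_bits, 2)
    return struct.unpack("<f", word.to_bytes(4, "little"))[0]

print(encode_single(100.25))  # (0, '10000101', '10010001000000000000000')
```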



Ex: convert 100.25 (decimal) to floating-point form

1 - Convert to binary: 100 = 1100100 and .25 = .01, so 100.25 = 1100100.01
2 - Normalize the binary number: 1100100.01 = 1.10010001 × 2^6
3 - Calculate the biased exponent (bias 7FH = 127 for single precision): 110 + 01111111 = 10000101 (6 + 127 = 133)
4 - Floating-point number: sign = 0, exponent = 10000101, significand = 10010001000000000000000



Ex: convert -100.25 (decimal) to floating-point form

1 - Convert to binary: -100.25 = -1100100.01
2 - Normalize the binary number: -1100100.01 = -1.10010001 × 2^6
3 - Calculate the biased exponent (bias 7FH = 127 for single precision): 110 + 01111111 = 10000101 (6 + 127 = 133)
4 - Floating-point number: sign = 1, exponent = 10000101, significand = 10010001000000000000000 (the sign is carried only by the sign bit)



Special Rules:

- The number 0 is stored as all 0s (except for the sign bit).
- +/- infinity is stored as logic 1s in the exponent, with a significand of all 0s. The sign bit distinguishes +infinity from -infinity.
- A NaN (not-a-number) is an invalid floating-point result that has all 1s in the exponent with a significand that is NOT all zeros.
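These special rules can be checked directly on a bit pattern. The following Python sketch uses `struct` to get the 32-bit pattern of a single-precision value and then classifies it by the exponent and significand fields. The function name `classify_single` is hypothetical. Anything that is not zero, infinity or NaN is reported as finite.

```python
import struct

def classify_single(x: float) -> str:
    """Apply the special rules to the single-precision bit pattern of x."""
    word = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = word >> 31
    exponent = (word >> 23) & 0xFF
    significand = word & 0x7FFFFF
    if exponent == 0 and significand == 0:
        return "-0" if sign else "+0"          # all 0s except possibly the sign
    if exponent == 0xFF:
        if significand == 0:
            return "-infinity" if sign else "+infinity"
        return "NaN"                            # all-1s exponent, nonzero significand
    return "finite"
```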

Converting Floating-point to Decimal:

- Separate the sign bit, biased exponent and significand.
- Convert the biased exponent into a true exponent by subtracting the bias.
- Write the number as a normalized binary number.
- Convert it to a de-normalized binary number.
- Convert the de-normalized binary number to decimal.



Ex: convert floating point to decimal

1 - Separate the fields: sign = 0, exponent = 10000011, significand = 10010010000000000000000
2 - Convert the biased exponent to the true exponent: 10000011 - 01111111 = 100 (131 - 127 = 4; the bias is 7FH = 127 for single precision)
3 - Write the normalized binary number: 1.1001001 × 2^4
4 - Convert to a de-normalized binary number: 11001.001
5 - Convert to decimal: 25.125



Ex: the same fields with the sign bit set

1 - Separate the fields: sign = 1, exponent = 10000011, significand = 10010010000000000000000
2 - Convert the biased exponent to the true exponent: 10000011 - 01111111 = 100 (131 - 127 = 4)
3 - Write the normalized binary number: -1.1001001 × 2^4
4 - Convert to a de-normalized binary number: -11001.001
5 - Convert to decimal: -25.125
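Here is the reverse direction as a Python sketch. It follows the five steps for a 32-bit single-precision pattern and handles normal numbers only (biased exponent 1 to 254). `decode_single` is an illustrative name. Binary literals with `_` separators mirror the sign | exponent | significand split.

```python
def decode_single(word: int) -> float:
    """Convert a normal single-precision bit pattern to its decimal value."""
    # Step 1: separate the fields
    sign = (word >> 31) & 1
    biased = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    # Step 2: true exponent = biased exponent - 127 (7FH)
    exponent = biased - 127
    # Step 3: normalized number 1.fraction x 2^exponent, restoring the implied 1
    significand = (1 << 23) | fraction
    # Steps 4-5: move the binary point and convert to decimal
    value = significand * 2.0 ** (exponent - 23)
    return -value if sign else value

print(decode_single(0b0_10000011_10010010000000000000000))  # 25.125
```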



The 8087 Architecture:

• The 8087 is designed to operate concurrently with the microprocessor.

• The 8087 executes 68 different instructions.

• The microprocessor and the coprocessor can execute their respective instructions simultaneously (concurrently).

• The numeric (arithmetic) coprocessor is a special-purpose microprocessor designed to execute arithmetic and transcendental operations.

• The microprocessor intercepts and executes the normal instruction set; the coprocessor intercepts and executes its own instructions.


Internal Structure of the 80x87:



• Control unit (CU): interfaces the coprocessor to the microprocessor's data bus. If an instruction is an ESC (coprocessor) instruction, the coprocessor executes it; otherwise the microprocessor executes it.

• Numeric execution unit (NEU):
- Responsible for executing all coprocessor instructions
- Has an 8-register stack that holds the operands and results of arithmetic instructions
- Also contains the status, tag and control registers and the exception pointers
- Each of the 8 stack registers is 80 bits wide and holds an 80-bit extended-precision floating-point number
- Data are converted between their memory format and the extended-precision format as they move between memory and the coprocessor register stack
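The control unit's decision can be illustrated by how coprocessor instructions are encoded: they begin with one of the ESC opcodes D8H-DFH (binary 11011xxx). Below is a minimal Python sketch of that routing test. It ignores prefix bytes such as segment overrides. The function names are hypothetical.

```python
def is_esc_opcode(first_byte: int) -> bool:
    """True for D8H-DFH, the ESC opcodes (binary 11011xxx) that start coprocessor instructions."""
    return (first_byte & 0xF8) == 0xD8

def route(instruction: bytes) -> str:
    """Decide which unit executes an instruction, based on its first byte."""
    return "coprocessor" if is_esc_opcode(instruction[0]) else "microprocessor"

print(route(bytes([0xD9, 0xE8])))   # FLD1 -> coprocessor
print(route(bytes([0x9B])))         # WAIT -> microprocessor
```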



Status register:
• Reflects the overall operation of the coprocessor.

• The status register is accessed by executing FSTSW, which stores the contents of the status register into a word of memory.

• In 80286 and later systems, coprocessor/microprocessor communication is carried out through I/O ports.

B – busy bit: indicates that the coprocessor is busy. It can be checked by testing the status register or with the FWAIT instruction.



• C3 to C0 – condition code bits: indicate the condition of the coprocessor

• TOP – top of stack (ST): these 3 bits indicate the register currently addressed as the top of the stack

• ES – error summary: set if any unmasked error bit (PE, UE, OE, ZE, DE or IE) is set. On the 8087 coprocessor the error summary also causes a coprocessor interrupt

• PE – precision error: the result cannot be represented exactly at the selected precision

• UE – underflow error: a nonzero result is too small to represent with the current precision

• OE – overflow error: the result is too large to represent. If this error is masked, the coprocessor generates infinity

• ZE – zero divide error: the divisor is zero while the dividend is a nonzero, non-infinite number

• DE – denormalized error: at least one of the operands is denormalized

• IE – invalid error: indicates a stack underflow or overflow, an indeterminate form, or the use of a NaN as an operand, e.g. the square root of a negative number
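Below is a Python sketch of decoding a status word that a program has stored with FSTSW. The bit positions are the standard 80x87 ones: IE through PE in bits 0-5, ES in bit 7, C0-C2 in bits 8-10, TOP in bits 13-11, C3 in bit 14 and B in bit 15. The bit-6 stack-fault flag of later coprocessors is omitted because the list above does not describe it. The names are illustrative.

```python
FLAG_BITS = {"IE": 0, "DE": 1, "ZE": 2, "OE": 3, "UE": 4, "PE": 5, "ES": 7,
             "C0": 8, "C1": 9, "C2": 10, "C3": 14, "B": 15}

def decode_status(sw: int) -> dict:
    """Split a 16-bit status word into its single-bit flags and the 3-bit TOP field."""
    fields = {name: (sw >> bit) & 1 for name, bit in FLAG_BITS.items()}
    fields["TOP"] = (sw >> 11) & 0b111
    return fields

# Example: TOP = 7 with an unmasked divide-by-zero (ZE and ES set)
status = decode_status(0x3884)
```

Testing bit position 2 with TEST AX,4 is the same check as `status["ZE"]`.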

Control register:
- selects the precision, rounding control and infinity control
- masks or unmasks the exception bits that correspond to the rightmost 6 bits of the status register
- the FLDCW instruction loads a value into the control register



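Control register fields:
- IM (bit 0): invalid operation mask
- DM (bit 1): denormalized operand mask
- ZM (bit 2): division by zero mask
- OM, UM, PM (bits 3-5): overflow, underflow and precision masks
- PC (bits 9-8): precision control: 00 = single, 01 = reserved, 10 = double, 11 = extended
- RC (bits 11-10): rounding control: 00 = round to nearest or even, 01 = round down toward minus infinity, 10 = round up toward plus infinity, 11 = chop (truncate toward zero)
- IC (bit 12): infinity control: 0 = projective, 1 = affine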


• IC – infinity control: affine allows +infinity and -infinity; projective treats infinity as unsigned

• RC – rounding control: determines the type of rounding

• PC – precision control: sets the precision of the results

• Exception masks: determine whether an error indicated by an exception bit in the status register causes an error (interrupt). If a logic 1 is placed in one of the exception mask bits, the corresponding bit in the status register is masked off
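Here is a Python sketch of reading and modifying a control word. It uses the standard bit positions: masks in bits 0-5, PC in bits 9-8, RC in bits 11-10 and IC in bit 12. The example value 037FH is the control word set by FINIT on the 80387 and later. `set_rounding` shows the read-modify-write that a program does between FSTCW and FLDCW, for instance to select chop mode before FIST. All names are hypothetical.

```python
PRECISION = {0b00: "single", 0b01: "reserved", 0b10: "double", 0b11: "extended"}
ROUNDING = {0b00: "nearest or even", 0b01: "down (toward -infinity)",
            0b10: "up (toward +infinity)", 0b11: "chop (toward zero)"}
MASKS = ["IM", "DM", "ZM", "OM", "UM", "PM"]          # bits 0..5

def decode_control(cw: int) -> dict:
    """Report which exceptions are masked and the PC, RC and IC settings."""
    return {
        "masked": [name for bit, name in enumerate(MASKS) if (cw >> bit) & 1],
        "PC": PRECISION[(cw >> 8) & 0b11],
        "RC": ROUNDING[(cw >> 10) & 0b11],
        "IC": "affine" if (cw >> 12) & 1 else "projective",
    }

def set_rounding(cw: int, rc: int) -> int:
    """Return cw with the RC field (bits 11-10) replaced by rc."""
    return (cw & ~0x0C00) | ((rc & 0b11) << 10)
```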



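Examples (FSTSW AX requires an 80287 or later):

fdiv  DATA1          ; divide ST by DATA1
fstsw ax             ; copy the status register to AX
test  ax, 4          ; test bit position 2 (ZE)
jnz   DIVIDE_ERROR

fcom  DATA1          ; compare DATA1 to ST0 and set the status bits
fstsw ax
sahf                 ; copy the status bits to the flags
je    ST_EQUAL
jb    ST_BELOW
ja    ST_ABOVE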
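The FCOM / FSTSW AX / SAHF sequence works because SAHF copies AH into the flags. C0 (status bit 8) becomes CF, C2 (bit 10) becomes PF, and C3 (bit 14) becomes ZF. Below is a Python model of that mapping and of the branch taken. The `unordered` outcome (a NaN operand, which sets C3, C2 and C0) is an addition for illustration. The listing above does not test for it, and in that case its JE would be taken. A JP test before JE would catch it. The function names are hypothetical.

```python
def fcom_status(st0: float, src: float) -> int:
    """Condition codes FCOM sets in the status word (C3 = bit 14, C2 = bit 10, C0 = bit 8)."""
    if st0 != st0 or src != src:              # NaN operand: unordered
        return (1 << 14) | (1 << 10) | (1 << 8)
    if st0 > src:
        return 0
    if st0 < src:
        return 1 << 8                          # C0 = 1: ST below the operand
    return 1 << 14                             # C3 = 1: equal

def sahf_flags(status_word: int) -> dict:
    """Flags after FSTSW AX + SAHF: AH bit 0 -> CF, bit 2 -> PF, bit 6 -> ZF."""
    ah = (status_word >> 8) & 0xFF
    return {"CF": ah & 1, "PF": (ah >> 2) & 1, "ZF": (ah >> 6) & 1}

def branch_taken(st0: float, src: float) -> str:
    flags = sahf_flags(fcom_status(st0, src))
    if flags["PF"]:
        return "unordered"                     # would need a JP test
    if flags["ZF"]:
        return "ST_EQUAL"                      # JE
    if flags["CF"]:
        return "ST_BELOW"                      # JB
    return "ST_ABOVE"                          # JA
```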



TAG 7 | TAG 6 | TAG 5 | TAG 4 | TAG 3 | TAG 2 | TAG 1 | TAG 0

Tag register:

- Indicates the contents of each location in the coprocessor stack (two bits per register)

- A program can view the tag register by storing the coprocessor environment with FSTENV or FSAVE (FLDENV and FRSTOR reload it)

- 00 = valid, 01 = zero, 10 = invalid or infinity, 11 = empty
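Below is a Python sketch of decoding a 16-bit tag word, such as the one an FSTENV image contains. TAG 0 is in bits 1-0 and TAG 7 is in bits 15-14. The tags are indexed by physical register number, not by stack position ST(i). The function name is illustrative.

```python
TAG_MEANING = {0b00: "valid", 0b01: "zero", 0b10: "invalid or infinity", 0b11: "empty"}

def decode_tags(tag_word: int) -> list[str]:
    """Meaning of TAG 0 .. TAG 7, two bits per register."""
    return [TAG_MEANING[(tag_word >> (2 * i)) & 0b11] for i in range(8)]
```

After FINIT every register is empty, so `decode_tags(0xFFFF)` reports `"empty"` eight times.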



Instruction Set:

• The coprocessor executes over 68 different instructions.

• The coprocessor uses the data bus for data transfers during coprocessor instructions; the microprocessor uses it during normal instructions.

Types of instructions:
- data transfer instructions
- arithmetic instructions
- comparison instructions
- transcendental operations
- constant operations
- coprocessor control instructions



i) Data transfer instructions:
- floating-point
- signed integer
- BCD
- Pentium Pro through Pentium 4 FCMOV instructions

The coprocessor stores data internally as 80-bit extended-precision floating-point numbers.

Floating-point data transfer

FLD (load real):
- Loads floating-point data onto the stack top (ST).
- The stack pointer is decremented by 1, so the loaded value becomes the new ST.
- Data can be retrieved from memory or from another stack position.



Ex: FLD ST(2)     ; copies the contents of ST(2) to the top of the stack

The top of the stack is register 0 when the coprocessor is reset or initialized.

    FLD data7     ; copies the contents of memory location data7 to the top of the stack

The size of the transfer is determined automatically by the assembler from the directive used to define the memory data.

FST (store real):

- Stores a copy of the top of the stack into memory or another coprocessor register.
- Rounding occurs when the store completes, according to the control register.
- A copy instruction: ST is not removed.



FSTP (floating-point store and pop):
- Stores a copy of the top of the stack into memory or another coprocessor register.
- Pops the data from the top of the stack (a removal instruction).

FXCH (exchange):
- Exchanges the contents of a register with the top of the stack.

Ex: FXCH ST(2)    ; exchanges the top of the stack with register 2
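To see how FLD, FST, FSTP and FXCH move data, here is a toy Python model of the eight-register stack. ST(i) is physical register (TOP + i) mod 8. A push decrements TOP before writing, and a pop marks ST empty and then increments TOP. The class `FpuStack` and its methods are hypothetical. Values are ordinary Python floats, not 80-bit numbers.

```python
class FpuStack:
    """Toy model of the coprocessor register stack."""

    def __init__(self):
        self.regs = [None] * 8    # None marks an empty register (tag 11)
        self.top = 0              # after FINIT, register 0 is the top of the stack

    def st(self, i: int):
        return self.regs[(self.top + i) % 8]

    def fld(self, value: float) -> None:
        # Push: decrement TOP, then write the new ST
        self.top = (self.top - 1) % 8
        if self.regs[self.top] is not None:
            raise OverflowError("stack overflow (IE)")
        self.regs[self.top] = value

    def fst(self) -> float:
        # Copy ST without removing it
        return self.st(0)

    def fstp(self) -> float:
        # Copy ST, mark it empty, then increment TOP
        value = self.st(0)
        self.regs[self.top] = None
        self.top = (self.top + 1) % 8
        return value

    def fxch(self, i: int = 1) -> None:
        # Swap ST with ST(i)
        a, b = self.top, (self.top + i) % 8
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]
```

For example, `s.fld(s.st(2))` models FLD ST(2), because the source is read before the push.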

Integer data transfer instruction

- FILD (load integer)
- FIST (store integer)
- FISTP (store integer and pop)

While transferring data, the coprocessor automatically converts between integer data and extended-precision floating-point numbers.



BCD data transfer instructions

- FBLD: loads the top of the stack with BCD memory data
- FBSTP: stores the top of the stack as BCD data and pops

Pentium Pro through Pentium 4 instructions: FCMOV

- a conditional move: if the condition is true, the source is copied to the destination
- the conditions test for either an ordered or an unordered result; the instructions do not themselves check for NaN or denormalized operands

- FCMOVB: move if below
- FCMOVE: move if equal
- FCMOVBE: move if below or equal
- FCMOVU: move if unordered
- FCMOVNB: move if not below
- FCMOVNE: move if not equal
- FCMOVNBE: move if not below or equal
- FCMOVNU: move if not unordered


ii) Arithmetic instructions:

- addition, subtraction, multiplication, division and square root

- arithmetic-related operations: scaling, rounding, absolute value and changing the sign

Addressing modes:

Mode            Form          Example
Stack           ST(1),ST      FADD
Register        ST,ST(n)      FADD ST,ST(2)
Register        ST(n),ST      FADD ST(2),ST
Register pop    ST(n),ST      FADDP ST(3),ST
Memory          operand       FADD data2

- Stack addressing mode is restricted to ST (the stack top) and ST1.

- The source operand is ST and the destination operand is ST1.

- After the operation, the source is popped, leaving the destination (the result) at ST.



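Stack addressing mode:
• Uses the top of the stack as the source operand and the next-to-top as the destination.
• Afterwards the top is popped, so the result is left at the top of the stack. Examples:
- FADD: adds ST and ST1; the result goes to ST1, which becomes ST after the pop
- FSUB: subtracts ST from ST1 (ST1 - ST); the result ends up in ST
- FSUBR (reverse): subtracts ST1 from ST (ST - ST1); the result ends up in ST
- FDIVR (reverse divide, ST / ST1): the result is stored in ST. It can compute a reciprocal: after FLD1 pushes 1.0 above X, FDIVR leaves 1/X in ST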
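Here is a Python sketch of the no-operand stack forms. ST(1) is the destination and ST is the source, and the stack is popped afterwards. A Python list stands in for the stack, with `stack[-1]` as ST and `stack[-2]` as ST(1). The function and table names are hypothetical.

```python
# Each entry computes the new ST(1) from (ST, ST(1))
STACK_OPS = {
    "FADD":  lambda st, st1: st1 + st,
    "FSUB":  lambda st, st1: st1 - st,    # ST(1) - ST
    "FSUBR": lambda st, st1: st - st1,    # reversed: ST - ST(1)
    "FMUL":  lambda st, st1: st1 * st,
    "FDIV":  lambda st, st1: st1 / st,    # ST(1) / ST
    "FDIVR": lambda st, st1: st / st1,    # reversed: ST / ST(1)
}

def stack_op(stack: list, op: str) -> None:
    """Apply a stack-addressing-mode instruction: write ST(1), then pop ST."""
    stack[-2] = STACK_OPS[op](stack[-1], stack[-2])
    stack.pop()                           # the result becomes the new ST

# Reciprocal of X = 4.0: FLD1 pushes 1.0 above X, then FDIVR leaves 1/X in ST
stack = [4.0]
stack.append(1.0)
stack_op(stack, "FDIVR")
```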



Register addressing mode:

• MUST use ST as one of the operands.

• The other operand can be any register, including ST0, which is ST. Note that the destination can be either ST or STn.

• Unlike stack addressing, non-popping versions can be used.

Memory addressing mode:

• Always uses ST as the destination, because the coprocessor is a stack-oriented machine.



Arithmetic operations:

The following letters additionally qualify the operation:

• P: perform a register pop after the operation, e.g. FADD vs. FADDP.

• R: reverse mode for subtraction and division.

• I: the memory operand is an integer. I appears as the second letter of the instruction, e.g. FIADD, FISUB, FIMUL, FIDIV.



Arithmetic-related operations:

• FSQRT: finds the square root of the operand at ST and leaves the result there. To detect an invalid result (e.g. the operand was negative), check the IE bit with FSTSW AX followed by TEST AX,1.

• FSCALE: adds the contents of ST1 (interpreted as an integer) to the exponent of ST. The value of ST1 must be between -2^15 and +2^15.

• FPREM1: performs modulo division of ST by ST1. The resulting remainder is left at ST.

• FRNDINT: rounds ST to an integer, using the rounding mode selected in the control register.

• FXTRACT: decomposes ST into an unbiased exponent and a significand. The extracted significand is at ST and the unbiased exponent at ST1.

• FABS: makes the sign of ST positive (absolute value).

• FCHS: inverts the sign of ST.
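Python's `math` module has close counterparts to several of these operations, which makes their behavior easy to try out. The sketch below maps FSCALE to `ldexp` with ST1 truncated toward zero, and FXTRACT to `frexp` rescaled so that the significand is in [1, 2). It maps FPREM1 to the IEEE remainder `math.remainder` and FRNDINT under round-to-nearest-even to `round`. These are double-precision analogs under those assumptions, not the coprocessor's 80-bit arithmetic. The function names simply mirror the mnemonics.

```python
import math

def fscale(st: float, st1: float) -> float:
    # ST x 2^trunc(ST1): the integer part of ST1 is added to ST's exponent
    return math.ldexp(st, int(st1))

def fxtract(st: float) -> tuple[float, float]:
    # (significand in [1, 2), unbiased exponent) for a nonzero finite ST
    m, e = math.frexp(st)          # st = m * 2**e with 0.5 <= |m| < 1
    return m * 2, float(e - 1)

def fprem1(st: float, st1: float) -> float:
    # IEEE remainder: the quotient is rounded to the nearest (even) integer
    return math.remainder(st, st1)

def frndint(st: float) -> float:
    # Rounding control 00: round to nearest or even
    return float(round(st))
```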

iii) Comparison instructions:

- These instructions compare the data at the top of the stack with another operand and return the result of the comparison in the status register condition codes C3-C0.

• FCOM: compares ST with a memory or register operand. FCOM by itself compares ST and ST1.

• FCOMP/FCOMPP: compare and pop once or twice.

• FICOM/FICOMP: compare ST with an integer memory operand and optionally pop the stack.

• FTST: compares ST with 0.0.

• FXAM: examines ST and modifies the condition-code bits to indicate whether the contents are positive, negative, normalized, etc.

• FCOMI/FUCOMI (Pentium Pro and later): the same as FCOM, with one additional feature: they set the microprocessor's flag register directly, replacing the FNSTSW AX and SAHF sequence.

iv) Transcendental operations

• FPTAN: finds the partial tangent, y/x = tan θ. On the 8087 and 80287 the value of θ at the top of the stack must be between 0 and π/4; on the 80387 through the Pentium 4 it must be less than 2^63.

• FPATAN: partial arctangent; computes θ = arctan(ST1/ST).

• F2XM1: computes 2^x - 1.

• FSIN/FCOS: sine or cosine; the result is found in ST.

• FSINCOS: sine and cosine; afterwards ST holds the cosine and ST1 holds the sine.

• FYL2X: computes Y·log2(X), with X in ST and Y in ST1; the result is left at the top of the stack. X must be between 0 and infinity; Y may be any value.

• FYL2XP1: computes Y·log2(X + 1).

Function    Equation
10^y        $10^y = 2^{y \log_2 10}$
e^y         $e^y = 2^{y \log_2 e}$
x^y         $x^y = 2^{y \log_2 x}$
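The table works because the coprocessor only has a 2^x primitive (F2XM1). Below is a Python sketch of building x^y from it. FYL2X gives t = y·log2 x. Rounding splits t into an integer i and a small fraction f, F2XM1(f) + 1 gives 2^f, and FSCALE multiplies by 2^i. The Python functions only stand in for the instructions (F2XM1 on the 80387 and later accepts -1 to +1). `power` is a hypothetical name.

```python
import math

def f2xm1(x: float) -> float:
    # 2^x - 1, computed accurately for small x
    return math.expm1(x * math.log(2))

def fyl2x(x: float, y: float) -> float:
    # y * log2(x), for x > 0
    return y * math.log2(x)

def power(x: float, y: float) -> float:
    """x^y = 2^(y log2 x), assembled from FYL2X, FRNDINT, F2XM1 and FSCALE steps."""
    t = fyl2x(x, y)
    i = round(t)                              # integer part (FRNDINT)
    f = t - i                                 # fraction with |f| <= 0.5
    return math.ldexp(f2xm1(f) + 1.0, i)      # scale 2^f by 2^i (FSCALE)
```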


v) Constant operations

• The coprocessor instruction set includes opcodes that load constants onto the top of the stack.

- FLDZ: loads +0.0 into ST.

- FLD1: loads +1.0 into ST.

- FLDPI: loads π into ST.

- FLDL2T: loads log2(10) into ST.

- FLDL2E: loads log2(e) into ST.

- FLDLG2: loads log10(2) into ST.

- FLDLN2: loads loge(2) into ST.



vi) Coprocessor control instructions

- Control instructions are used for initialization, exception handling and task switching.

• FINIT/FNINIT: performs a reset operation. Register 0 becomes the top of the stack, rounding is set to round to nearest, all exceptions are masked, all registers are tagged empty, and the busy flag is cleared.

• FSETPM: changes the addressing mode of the coprocessor to the protected addressing mode.

• FLDCW: loads the control register with the word addressed by the operand.

• FSTCW/FNSTCW: stores the control register into a word-sized memory operand.

• FSTSW AX/FNSTSW AX: copies the contents of the status register to AX (not available on the 8087).

• FCLEX/FNCLEX: clears the error flags in the status register and also the busy flag.

• FSAVE/FNSAVE: writes the entire state of the machine to memory.

• FRSTOR: restores the state of the machine from memory.

• FSTENV/FNSTENV: stores the environment of the coprocessor, in real-mode or protected-mode format.

• FLDENV: reloads the environment.

• FINCSTP: increments the stack pointer. FDECSTP: decrements the stack pointer.

• FFREE: frees a register (marks its contents empty).

• FNOP: the floating-point coprocessor's NOP.

• FWAIT: causes the microprocessor to wait for the coprocessor to finish an operation. It should be used before the microprocessor accesses memory data that are affected by the coprocessor.



Coprocessor instruction timings:

- The following list gives the instructions for the coprocessors from the 8087 through the Pentium, with the number of clock periods required to execute each instruction on the 8087, 80287, 80387, 80486 and Pentium.

General:
reg = floating-point register, st(0), st(1) ... st(7)
mem = memory address
mem16, mem32, mem64, mem80 = memory address of a 16-, 32-, 64- or 80-bit item
+EA = plus the clocks needed to calculate the effective address
FX = pairs with FXCH
NP = no pairing

Instruction clock cycles:

• F2XM1 Compute 2^x - 1
    8087          287       387       486         Pentium
    310-630       310-630   211-476   140-279     13-57 NP

• FABS Absolute value
    8087          287       387       486         Pentium
    10-17         10-17     22        3           1 FX



• FADD Floating point add
• FADDP Floating point add and pop
    variation       8087          287       387       486         Pentium
    fadd            70-100        70-100    23-34     8-20        3/1 FX
    fadd mem32      (90-120)+EA   90-120    24-32     8-20        3/1 FX
    fadd mem64      (95-125)+EA   95-125    29-37     8-20        3/1 FX
    faddp           75-105        75-105    23-31     8-20        3/1 FX

• FBLD Load BCD
    operand         8087          287       387       486         Pentium
    mem             (290-310)+EA  290-310   266-275   70-103      48-58 NP



• FBSTP Store BCD and pop
    8087          287       387       486         Pentium
    (520-540)+EA  520-540   512-534   172-176     148-154 NP

• FCHS Change sign
    8087          287       387       486         Pentium
    10-17         10-17     24-25     6           1 FX

• FCLEX Clear exceptions
• FNCLEX Clear exceptions, no wait
    variation       8087          287       387       486         Pentium
    fclex           2-8           2-8       11        7           9 NP
    fnclex          2-8           2-8       11        7           9 NP
    The wait version may take additional cycles.



• FCOM Floating point compare
• FCOMP Floating point compare and pop
• FCOMPP Floating point compare and pop twice
    variation       8087          287       387       486         Pentium
    fcom reg        40-50         40-50     24        4           4/1 FX
    fcom mem32      (60-70)+EA    60-70     26        4           4/1 FX
    fcom mem64      (65-75)+EA    65-75     31        4           4/1 FX
    fcomp           42-52         42-52     26        4           4/1 FX
    fcompp          45-55         45-55     26        5           4/1 FX

• FCOS Floating point cosine (387+)
    8087          287       387       486         Pentium
    -             -         123-772   257-354     18-124 NP
    Additional cycles are required if the operand > pi/4 (~3.141/4 = ~0.785).



• FDISI Disable interrupts (8087 only, others do fnop)
• FNDISI Disable interrupts, no wait (8087 only, others do fnop)
    variation       8087          287       387       486         Pentium
    fdisi           2-8           2         2         3           1 NP
    fndisi          2-8           2         2         3           1 NP
    The wait version may take additional cycles.

• FDIV Floating divide
• FDIVP Floating divide and pop
    variation       8087          287       387       486         Pentium
    fdiv reg        193-203       193-203   88-91     73          39 FX
    fdiv mem32      (215-225)+EA  215-225   89        73          39 FX
    fdiv mem64      (220-230)+EA  220-230   94        73          39 FX
    fdivp           197-207       197-207   91        73          39 FX



• FDIVR Floating divide reversed
• FDIVRP Floating divide reversed and pop
    variation       8087          287       387       486         Pentium
    fdivr reg       194-204       194-204   88-91     73          39 FX
    fdivr mem32     (216-226)+EA  216-226   89        73          39 FX
    fdivr mem64     (221-231)+EA  221-231   94        73          39 FX
    fdivrp          198-208       198-208   91        73          39 FX

• FENI Enable interrupts (8087 only, others do fnop)
• FNENI Enable interrupts, no wait (8087 only, others do fnop)
    variation       8087          287       387       486         Pentium
    feni            2-8           2         2         3           1 NP
    fneni           2-8           2         2         3           1 NP



• FFREE Free register
    8087          287       387       486         Pentium
    9-16          9-16      18        3           1 NP

• FIADD Integer add
    operand         8087          287       387       486         Pentium
    mem16           (102-137)+EA  102-137   71-85     20-35       7/4 NP
    mem32           (108-143)+EA  108-143   57-72     19-32       7/4 NP

• FINIT Initialize floating point processor
• FNINIT Initialize floating point processor, no wait
    variation       8087          287       387       486         Pentium
    finit           2-8           2-8       33        17          16 NP
    fninit          2-8           2-8       33        17          12 NP
    The wait version may take additional cycles.



• FICOM Integer compare
• FICOMP Integer compare and pop
    variation       8087          287       387       486         Pentium
    ficom mem16     (72-86)+EA    72-86     71-75     16-20       8/4 NP
    ficom mem32     (78-91)+EA    78-91     56-63     15-17       8/4 NP
    ficomp mem16    (74-88)+EA    74-88     71-75     16-20       8/4 NP
    ficomp mem32    (80-93)+EA    80-93     56-63     15-17       8/4 NP

• FIMUL Integer multiply
    operand         8087          287       387       486         Pentium
    mem16           (124-138)+EA  124-138   76-87     23-27       7/4 NP
    mem32           (130-144)+EA  130-144   61-82     22-24       7/4 NP



• FIDIV Integer divide
• FIDIVR Integer divide reversed
    variation       8087          287       387       486         Pentium
    fidiv mem16     (224-238)+EA  224-238   136-140   85-89       42 NP
    fidiv mem32     (230-243)+EA  230-243   120-127   84-86       42 NP
    fidivr mem16    (225-239)+EA  225-239   135-141   85-89       42 NP
    fidivr mem32    (231-245)+EA  231-245   121-128   84-86       42 NP

• FILD Load integer
    operand         8087          287       387       486         Pentium
    mem16           (46-54)+EA    46-54     61-65     13-16       3/1 NP
    mem32           (52-60)+EA    52-60     45-52     9-12        3/1 NP
    mem64           (60-68)+EA    60-68     56-67     10-18       3/1 NP


Arithmetic CoprocessorArithmetic Coprocessor•FIST Store integer•FISTP Store integer and popvariations/operand 8087 287 387 486 Pentiumfist mem16 (80-90)+EA 80-90 82-95 29-34 6 NPfist mem32 (82-92)+EA 82-92 79-93 28-34 6 NPfistp mem16 (82-92)+EA 82-92 82-95 29-34 6 NPfistp mem32 (84-94)+EA 84-94 79-93 28-34 6 NPfistp mem64 (94-105)+EA 94-105 80-97 28-34 6 NP

•FISUB Integer subtract•FISUBR Integer subtract reversedvariations/Operand 8087 287 387 486 Pentiumfisub mem16 (102-137)+EA 102-137 71-85 20-35 7/4 NPfisubr mem32 (108-143)+EA 108-143 57-82 19-32 7/4 NP



• FINCSTP Increment floating point stack pointer
    8087          287       387       486         Pentium
    6-12          6-12      21        3           1 NP

• FLD Floating point load
    operand         8087          287       387       486         Pentium
    reg             17-22         17-22     14        4           1 FX
    mem32           (38-56)+EA    38-56     20        3           1 FX
    mem64           (40-60)+EA    40-60     25        3           1 FX
    mem80           (53-65)+EA    53-65     44        6           3 NP

• FLDCW Load control word
    operand         8087          287       387       486         Pentium
    mem16           (7-14)+EA     7-14      19        4           7 NP

Load floating point constants:
• FLDZ Load constant onto stack, 0.0
• FLD1 Load constant onto stack, 1.0
• FLDL2E Load constant onto stack, logarithm base 2 (e)
• FLDL2T Load constant onto stack, logarithm base 2 (10)
• FLDLG2 Load constant onto stack, logarithm base 10 (2)
• FLDLN2 Load constant onto stack, natural logarithm (2)
• FLDPI Load constant onto stack, pi (3.14159...)
    variation       8087          287       387       486         Pentium
    fldz            11-17         11-17     20        4           2 NP
    fld1            15-21         15-21     24        4           2 NP
    fldl2e          15-21         15-21     40        8           5/3 NP
    fldl2t          16-22         16-22     40        8           5/3 NP
    fldlg2          18-24         18-24     41        8           5/3 NP
    fldln2          17-23         17-23     41        8           5/3 NP
    fldpi           16-22         16-22     40        8           5/3 NP



• FLDENV Load environment state
    operand         8087          287       387       486         Pentium
    mem             (35-45)+EA    35-45     71        44/34       37/32-33 NP
    Cycles are given for real mode/protected mode.

• FMUL Floating point multiply
• FMULP Floating point multiply and pop
    variation       8087          287       387       486         Pentium
    fmul reg s      90-105        90-105    29-52     16          3/1 FX
    fmul reg        130-145       130-145   46-57     16          3/1 FX
    fmul mem32      (110-125)+EA  110-125   27-35     11          3/1 FX
    fmul mem64      (154-168)+EA  154-168   32-57     14          3/1 FX
    fmulp reg s     94-108        94-108    29-52     16          3/1 FX
    fmulp reg       134-148       134-148   29-57     16          3/1 FX
    s = register with 40 trailing zeros in the fraction



• FNOP No operation
    8087          287       387       486         Pentium
    10-16         10-16     12        3           1 NP

• FPATAN Partial arctangent
    8087          287       387       486         Pentium
    250-800       250-800   314-487   218-303     17-173

• FPREM Partial remainder
• FPREM1 Partial remainder (IEEE compatible, 387+)
    variation       8087          287       387       486         Pentium
    fprem           15-190        15-190    74-155    70-138      16-64 NP
    fprem1          -             -         95-185    72-167      20-70 NP

• FPTAN Partial tangent
    8087          287       387       486         Pentium
    30-540        30-540    191-497   200-273     17-173 NP
    Additional cycles are required if the operand > pi/4 (~3.141/4 = ~0.785).



• FRNDINT Round to integer
    8087          287       387       486         Pentium
    16-50         16-50     66-80     21-30       9-20 NP

• FRSTOR Restore saved state
    variation       8087          287       387       486         Pentium
    frstor mem      (197-207)+EA  197-207   308       131/120     75-95/70 NP
    frstorw mem     -             -         308       131/120     75-95/70 NP
    frstord mem     -             -         308       131/120     75-95/70 NP
    Cycles are given for real mode/protected mode.



• FSAVE Save FPU state
• FSAVEW Save FPU state, 16-bit format (387+)
• FSAVED Save FPU state, 32-bit format (387+)
• FNSAVE Save FPU state, no wait
• FNSAVEW Save FPU state, no wait, 16-bit format (387+)
• FNSAVED Save FPU state, no wait, 32-bit format (387+)
    variation       8087          287       387       486         Pentium
    fsave           (197-207)+EA  197-207   375-376   154/143     127-151/124 NP
    fsavew          -             -         375-376   154/143     127-151/124 NP
    fsaved          -             -         375-376   154/143     127-151/124 NP
    fnsave          (197-207)+EA  197-207   375-376   154/143     127-151/124 NP
    fnsavew         -             -         375-376   154/143     127-151/124 NP
    fnsaved         -             -         375-376   154/143     127-151/124 NP
    Cycles are given for real mode/protected mode. The wait version may take additional cycles.



• FSCALE Scale by factor of 2
    8087          287       387       486         Pentium
    32-38         32-38     67-86     30-32       20-31 NP

• FSETPM Set protected mode (287 only, 387+ = fnop)
    8087          287       387       486         Pentium
    -             2-8       12        3           1 NP

• FSIN Sine (387+)
• FSINCOS Sine and cosine (387+)
    variation       8087          287       387       486         Pentium
    fsin            -             -         122-771   257-354     16-126 NP
    fsincos         -             -         194-809   292-365     17-137 NP
    Additional cycles are required if the operand > pi/4 (~3.141/4 = ~0.785).

• FSQRT Square root
    8087          287       387       486         Pentium
    180-186       180-186   122-129   83-87       70 NP



• FST Floating point store
• FSTP Floating point store and pop
    variation       8087          287       387       486         Pentium
    fst reg         15-22         15-22     11        3           1 NP
    fst mem32       (84-90)+EA    84-90     44        7           2 NP
    fst mem64       (96-104)+EA   96-104    45        8           2 NP
    fstp reg        17-24         17-24     12        3           1 NP
    fstp mem32      (86-92)+EA    86-92     44        7           2 NP
    fstp mem64      (98-106)+EA   98-106    45        8           2 NP
    fstp mem80      (52-58)+EA    52-58     53        6           3 NP



• FSTCW Store control word
• FNSTCW Store control word, no wait
    variation       8087          287       387       486         Pentium
    fstcw mem       12-18         12-18     15        3           2 NP
    fnstcw mem      12-18         12-18     15        3           2 NP
    The wait version may take additional cycles.



• FSTENV Store FPU environment
• FSTENVW Store FPU environment, 16-bit format (387+)
• FSTENVD Store FPU environment, 32-bit format (387+)
• FNSTENV Store FPU environment, no wait
• FNSTENVW Store FPU environment, no wait, 16-bit format (387+)
• FNSTENVD Store FPU environment, no wait, 32-bit format (387+)
    variation       8087          287       387       486         Pentium
    fstenv mem      (40-50)+EA    40-50     103-104   67/56       48-50 NP
    fstenvw mem     -             -         103-104   67/56       48-50 NP
    fstenvd mem     -             -         103-104   67/56       48-50 NP
    fnstenv mem     (40-50)+EA    40-50     103-104   67/56       48-50 NP
    fnstenvw mem    -             -         103-104   67/56       48-50 NP
    fnstenvd mem    -             -         103-104   67/56       48-50 NP
    Cycles are given for real mode/protected mode. The wait version may take additional cycles.


• FSTSW Store status word
• FNSTSW Store status word, no wait
    variation       8087          287       387       486         Pentium
    fstsw mem       12-18         12-18     15        3           2 NP
    fstsw ax        -             10-16     13        3           2 NP
    fnstsw mem      12-18         12-18     15        3           2 NP
    fnstsw ax       -             10-16     13        3           2 NP
    The wait version may take additional cycles.

• FSUB Floating point subtract
• FSUBP Floating point subtract and pop
    variation       8087          287       387       486         Pentium
    fsub reg        70-100        70-100    26-37     8-20        3/1 FX
    fsub mem32      (90-120)+EA   90-120    24-32     8-20        3/1 FX
    fsub mem64      (95-125)+EA   95-125    28-36     8-20        3/1 FX
    fsubp reg       75-105        75-105    26-34     8-20        3/1 FX

• FSUBR Floating point reverse subtract
• FSUBRP Floating point reverse subtract and pop
    variation       8087          287       387       486         Pentium
    fsubr reg       70-100        70-100    26-37     8-20        3/1 FX
    fsubr mem32     (90-120)+EA   90-120    24-32     8-20        3/1 FX
    fsubr mem64     (95-125)+EA   95-125    28-36     8-20        3/1 FX
    fsubrp reg      75-105        75-105    26-34     8-20        3/1 FX

• FTST Floating point test for zero
    8087          287       387       486         Pentium
    38-48         38-48     28        4           4/1 FX

• FWAIT Wait while FPU is executing
    8087          287       387       486         Pentium
    4             3         6         1-3         1-3 NP



• FXAM Examine condition flags
    8087          287       387       486         Pentium
    12-23         12-23     30-38     8           21 NP

• FXCH Exchange floating point registers
    8087          287       387       486         Pentium
    10-15         10-15     18        4           0-1 *
    * FXCH is pairable in the V pipe with all FX pairable instructions.

• FXTRACT Extract exponent and significand
    8087          287       387       486         Pentium
    27-55         27-55     70-76     16-20       13 NP

• FYL2X Compute Y * log2(X)
• FYL2XP1 Compute Y * log2(X + 1)
    variation       8087          287       387       486         Pentium
    fyl2x           900-1100      900-1100  120-538   196-329     22-111 NP
    fyl2xp1         700-1000      700-1000  257-547   171-326     22-103 NP


MMX Technology

• Multi Media eXtensions ( MMX )

• Designed to accelerate multimedia and communication applications

- motion video, image processing, audio synthesis, speech synthesis and compression, video conferencing, 2D and 3D graphics

• Includes new instructions and data types to significantly improve application performance

• Exploits the parallelism inherent in many multimedia and communications algorithms

• Maintains full compatibility with existing operating systems and applications



Data Types:

• Packed data types:
- 8 packed, consecutive 8-bit bytes
- 4 packed, consecutive 16-bit words
- 2 packed, consecutive 32-bit doublewords
- the formats occupy consecutive memory addresses and use little-endian form

Bit boundaries:
- Packed bytes: 63-56, 55-48, 47-40, 39-32, 31-24, 23-16, 15-8, 7-0
- Packed words: 63-48, 47-32, 31-16, 15-0
- Packed doublewords: 63-32, 31-0
- Quadword: 63-0
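Here is a Python sketch that views one 64-bit quantity as packed bytes, words or doublewords, with lane 0 in the lowest bits. Because the format is little-endian, the byte at the lowest memory address lands in bits 7-0. The function names are hypothetical.

```python
def from_memory(data: bytes) -> int:
    """Load 8 bytes little-endian: the lowest address becomes bits 7-0."""
    return int.from_bytes(data, "little")

def lanes(value: int, width: int) -> list[int]:
    """Split a 64-bit value into unsigned lanes of width bits, lane 0 first."""
    mask = (1 << width) - 1
    return [(value >> shift) & mask for shift in range(0, 64, width)]

v = from_memory(bytes([1, 2, 3, 4, 5, 6, 7, 8]))
```

Here `lanes(v, 16)` gives `[0x0201, 0x0403, 0x0605, 0x0807]`, which are the four packed words.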


• MMX technology registers have the same format as a 64-bit quantity in memory.

• There are 2 data access modes:
- 64-bit access mode: 64-bit memory and register transfers, which use the MMX registers (the floating-point coprocessor registers)
- 32-bit access mode: 32-bit memory and register transfers, including transfers to and from the microprocessor registers

The MMX registers MM7 ... MM0 share the eight coprocessor registers and their tags.



• MMX adds 57 new instructions to the instruction set of the Pentium through Pentium 4.

Instruction Set:
- arithmetic
- comparison
- conversion
- logical
- shift
- data transfer

• The instruction types are similar to those of the microprocessor, but MMX instructions use packed data types.

Arithmetic instructions: addition, subtraction, multiplication, and a special multiplication with an addition.

• Addition is performed on packed signed or unsigned bytes (B), packed words (W) or packed doubleword data (D).

• Any carry or borrow that is generated is dropped.
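The following Python sketch models packed addition. Each lane is added independently, and in the basic PADD form the carry out of a lane is dropped (wraparound). For comparison, it also shows the signed and unsigned saturating forms (PADDS, PADDUS) listed later, and the multiply-and-add, whose real mnemonic is PMADDWD. That instruction multiplies four signed word pairs and adds adjacent products into two doublewords. All function names are hypothetical.

```python
def to_signed(raw: int, width: int) -> int:
    """Reinterpret a width-bit pattern as a two's-complement integer."""
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw

def add_lane(a: int, b: int, width: int, mode: str = "wrap") -> int:
    """One lane of PADD ('wrap'), PADDS ('signed') or PADDUS ('unsigned'), on raw bit patterns."""
    mask = (1 << width) - 1
    if mode == "wrap":
        return (a + b) & mask                        # carry out of the lane is dropped
    if mode == "unsigned":
        return min(a + b, mask)                      # clamp at all 1s, e.g. FFH
    half = 1 << (width - 1)
    total = to_signed(a, width) + to_signed(b, width)
    return max(-half, min(half - 1, total)) & mask   # clamp to the signed range

def padd(a: int, b: int, width: int, mode: str = "wrap") -> int:
    """Apply add_lane to every lane of two 64-bit operands."""
    mask = (1 << width) - 1
    out = 0
    for shift in range(0, 64, width):
        out |= add_lane((a >> shift) & mask, (b >> shift) & mask, width, mode) << shift
    return out

def pmaddwd(a: int, b: int) -> int:
    """Multiply four signed word pairs, then add adjacent products into two doublewords."""
    wa = [to_signed((a >> (16 * i)) & 0xFFFF, 16) for i in range(4)]
    wb = [to_signed((b >> (16 * i)) & 0xFFFF, 16) for i in range(4)]
    low = wa[0] * wb[0] + wa[1] * wb[1]
    high = wa[2] * wb[2] + wa[3] * wb[3]
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)
```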

Comparison instruction:

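• There are 2 comparisons: PCMPEQ (equal) and PCMPGT (greater than).

• They compare bytes, words or doublewords.

• They do not change the microprocessor flag bits; instead they return all 1s in a lane for true and all 0s for false.

• For example, if MM2 is compared with MM1 and their least significant bytes are equal, the least significant byte of MM2 becomes FFH; otherwise it becomes 00H. Every other byte is handled the same way.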
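Below is a Python sketch of how PCMPEQ and PCMPGT build their masks. Each lane is compared on its own, and PCMPGT compares lanes as signed values. A true lane becomes all 1s and a false lane all 0s. The function name `pcmp` and its `op` parameter are illustrative.

```python
def pcmp(a: int, b: int, width: int, op: str) -> int:
    """PCMPEQ (op='eq') or signed PCMPGT (op='gt') across all lanes of two 64-bit values."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    result = 0
    for shift in range(0, 64, width):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        if op == "gt":
            # reinterpret both lanes as two's-complement numbers
            x, y = x - ((x & sign) << 1), y - ((y & sign) << 1)
            hit = x > y
        else:
            hit = x == y
        if hit:
            result |= mask << shift      # true: all 1s (FFH for a byte)
    return result
```

With byte lanes, `pcmp(0x05, 0x06, 8, "eq")` returns FFH in the seven upper bytes, because 0 equals 0 there, and 00H in byte 0.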



Conversion Instruction:

• There are 2 conversion instructions: PACK (with signed or unsigned saturation) and PUNPCK (unpack high data and unpack low data).

• They operate on packed bytes (B), packed words (W) and packed doublewords (D).

• The size letters must be used in combination, e.g. WB (word to byte) or DW (doubleword to word).

• In an unsigned conversion, if a word does not fit, the destination byte becomes FFH (a negative word becomes 00H).
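Here is a Python sketch of PACKUSWB and PACKSSWB. Each signed source word is saturated to the byte range, 0 to 255 for the unsigned form and -128 to +127 for the signed form. The destination's four words become result bytes 0-3, and the source's words become bytes 4-7. The helper names are hypothetical.

```python
def to_signed(raw: int, width: int) -> int:
    return raw - (1 << width) if raw & (1 << (width - 1)) else raw

def saturate(value: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, value))

def pack_words_to_bytes(dest: int, src: int, unsigned: bool) -> int:
    """PACKUSWB (unsigned=True) or PACKSSWB (unsigned=False) on 64-bit operands."""
    lo, hi = (0, 255) if unsigned else (-128, 127)
    words = [to_signed((dest >> (16 * i)) & 0xFFFF, 16) for i in range(4)]
    words += [to_signed((src >> (16 * i)) & 0xFFFF, 16) for i in range(4)]
    result = 0
    for i, w in enumerate(words):
        result |= (saturate(w, lo, hi) & 0xFF) << (8 * i)
    return result
```

For example, the words 0042H, FFFFH (-1) and 0100H (256) become the bytes 42H, 00H and FFH under PACKUSWB, and 42H, FFH and 7FH under PACKSSWB.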



Logical instruction:

• AND, OR, AND NOT (PANDN) and XOR

• These instructions have no size extension.

• They perform bitwise operations on all 64 bits of the data.

Shift instructions:

• logical shifts (left and right) and arithmetic shift right

• performed on words (W), doublewords (D) and quadwords (Q); the arithmetic shift right is available only for words and doublewords
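Below is a Python sketch contrasting the logical and arithmetic right shifts on packed lanes. PSRL shifts zeros in from the left, while PSRA copies each lane's sign bit. Counts that exceed the lane width fill an arithmetic lane with its sign bit. The function names follow the mnemonics but are illustrative.

```python
def psrl(value: int, count: int, width: int) -> int:
    """Logical right shift of each lane: zeros enter from the left."""
    mask = (1 << width) - 1
    out = 0
    for shift in range(0, 64, width):
        lane = (value >> shift) & mask
        out |= (lane >> count) << shift
    return out

def psra(value: int, count: int, width: int) -> int:
    """Arithmetic right shift of each lane: copies of the sign bit enter from the left."""
    mask = (1 << width) - 1
    count = min(count, width - 1)
    out = 0
    for shift in range(0, 64, width):
        lane = (value >> shift) & mask
        signed = lane - (1 << width) if lane >> (width - 1) else lane
        out |= ((signed >> count) & mask) << shift
    return out
```

For a word lane holding 8000H, shifting right by 4 gives 0800H with PSRLW and F800H with PSRAW.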



Data transfer instruction:

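• Data transfers are done register to register or between a register and memory.

• When moving to a microprocessor register, only the rightmost 32 bits are copied; there is no instruction that transfers the leftmost 32 bits directly.

• To transfer the leftmost 32 bits, first shift them right into the rightmost position.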
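A short Python model of the shift-then-move idea follows. MOVD to a 32-bit register keeps only bits 31-0, so the upper doubleword is first shifted down by 32, as PSRLQ MMn,32 would do. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def movd_to_reg(mm: int) -> int:
    # MOVD EAX,MMn: only the rightmost 32 bits are copied
    return mm & MASK32

def high_dword(mm: int) -> int:
    # PSRLQ MMn,32 then MOVD: bring the leftmost 32 bits into the rightmost position first
    return movd_to_reg(mm >> 32)
```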

EMMS instruction:

• EMMS empties the MMX state: all the tags in the floating-point unit are marked empty, so the floating-point registers are listed as empty.

• This instruction should be executed before the return instruction at the end of an MMX procedure; otherwise a subsequent floating-point operation will cause a floating-point interrupt error, crashing Windows or the application.



• EMMS – empty MMX state

Ex: EMMS

(xreg = MMX register, reg = 32-bit microprocessor register, mem = memory)

• MOVD – move doubleword

Ex: MOVD MM3,EAX     ; reg to xreg
    MOVD EAX,MM4     ; xreg to reg
    MOVD MM3,DATA    ; mem to xreg
    MOVD DATA1,MM3   ; xreg to mem

• MOVQ – move quadword

Ex: MOVQ MM3,MM1     ; xreg to xreg
    MOVQ MM3,DATA    ; mem to xreg
    MOVQ DATA1,MM3   ; xreg to mem



• PACKSSDW – pack signed doubleword to word

Ex: PACKSSDW MM1,MM2    ; xreg to xreg
    PACKSSDW MM1,DATA   ; mem to xreg

• PACKSSWB – pack signed word to byte

Ex: PACKSSWB MM1,MM2    ; xreg to xreg
    PACKSSWB MM1,DATA   ; mem to xreg

• PACKUSWB – pack unsigned word to byte

Ex: PACKUSWB MM1,MM2    ; xreg to xreg
    PACKUSWB MM1,DATA   ; mem to xreg


• PADD – add with truncation (wraparound): byte, word and doubleword

Ex: PADDB MM1,MM3     ; xreg to xreg
    PADDW MM1,MM3
    PADDD MM1,MM3
    PADDB MM1,DATA    ; mem to xreg
    PADDW MM1,DATA
    PADDD MM1,DATA

• PADDS – add with signed saturation: byte and word

Ex: PADDSB MM1,MM3    ; xreg to xreg
    PADDSW MM1,MM3
    PADDSB MM1,DATA   ; mem to xreg
    PADDSW MM1,DATA

• PADDUS – add with unsigned saturation: byte and word

Ex: PADDUSB MM1,MM3   ; xreg to xreg
    PADDUSW MM1,MM3
    PADDUSB MM1,DATA  ; mem to xreg
    PADDUSW MM1,DATA

• PAND – AND

Ex: PAND MM1,MM2      ; xreg to xreg
    PAND MM1,DATA     ; mem to xreg

• PANDN – AND NOT (the destination is inverted, then ANDed with the source)

Ex: PANDN MM1,MM2     ; xreg to xreg
    PANDN MM1,DATA    ; mem to xreg


• PCMPEQ – compare for equality

Ex: PCMPEQB MM1,MM2   ; xreg to xreg
    PCMPEQW MM1,MM2
    PCMPEQD MM1,MM2
    PCMPEQB MM1,DATA  ; mem to xreg
    PCMPEQW MM1,DATA
    PCMPEQD MM1,DATA

• PCMPGT – compare for greater than

Ex: PCMPGTB MM1,MM2   ; xreg to xreg
    PCMPGTW MM1,MM2
    PCMPGTD MM1,MM2
    PCMPGTB MM1,DATA  ; mem to xreg
    PCMPGTW MM1,DATA
    PCMPGTD MM1,DATA



• PMADDWD – multiply and add (four word products, added in pairs into two doublewords)

Ex: PMADDWD MM1,MM4   ; xreg to xreg
    PMADDWD MM1,DATA  ; mem to xreg

• PMULHW – multiplication, keeping the high 16 bits of each product

Ex: PMULHW MM1,MM4    ; xreg to xreg
    PMULHW MM1,DATA   ; mem to xreg

• PMULLW – multiplication, keeping the low 16 bits of each product

Ex: PMULLW MM1,MM4    ; xreg to xreg
    PMULLW MM1,DATA   ; mem to xreg

• POR – OR

Ex: POR MM1,MM4       ; xreg to xreg
    POR MM1,DATA      ; mem to xreg



• PSLL – shift left: word, doubleword and quadword

Ex: PSLLW MM1,MM3     ; xreg to xreg
    PSLLD MM1,MM3
    PSLLQ MM1,MM3
    PSLLW MM1,DATA    ; mem to xreg
    PSLLD MM1,DATA
    PSLLQ MM1,DATA
    PSLLW MM1,5       ; xreg by count
    PSLLD MM1,4
    PSLLQ MM1,7

• PSRA – shift arithmetic right: word and doubleword only (there is no quadword form)

Ex: PSRAW MM1,MM3     ; xreg to xreg
    PSRAD MM1,MM3
    PSRAW MM1,DATA    ; mem to xreg
    PSRAD MM1,DATA
    PSRAW MM1,5       ; xreg by count
    PSRAD MM1,4

• PSRL – shift right logical: word, doubleword and quadword

Ex: PSRLW MM1,MM3     ; xreg to xreg
    PSRLD MM1,MM3
    PSRLQ MM1,MM3
    PSRLW MM1,DATA    ; mem to xreg
    PSRLD MM1,DATA
    PSRLQ MM1,DATA
    PSRLW MM1,5       ; xreg by count
    PSRLD MM1,4
    PSRLQ MM1,7


• PSUB – subtraction with truncation (wraparound): byte, word and doubleword

Ex: PSUBB MM1,MM3     ; xreg to xreg
    PSUBW MM1,MM3
    PSUBD MM1,MM3
    PSUBB MM1,DATA    ; mem to xreg
    PSUBW MM1,DATA
    PSUBD MM1,DATA

• PSUBS – subtraction with signed saturation: byte and word

Ex: PSUBSB MM1,MM3    ; xreg to xreg
    PSUBSW MM1,MM3
    PSUBSB MM1,DATA   ; mem to xreg
    PSUBSW MM1,DATA

• PSUBUS – subtraction with unsigned saturation: byte and word

Ex: PSUBUSB MM1,MM3   ; xreg to xreg
    PSUBUSW MM1,MM3
    PSUBUSB MM1,DATA  ; mem to xreg
    PSUBUSW MM1,DATA

• PXOR – exclusive OR

Ex: PXOR MM1,MM3      ; xreg to xreg
    PXOR MM4,DATA     ; mem to xreg



• PUNPCKH – unpack high: byte to word, word to doubleword, doubleword to quadword

Ex: PUNPCKHBW MM1,MM3   ; xreg to xreg
    PUNPCKHWD MM1,MM3
    PUNPCKHDQ MM1,MM3
    PUNPCKHBW MM1,DATA  ; mem to xreg
    PUNPCKHWD MM1,DATA
    PUNPCKHDQ MM1,DATA

• PUNPCKL – unpack low: byte to word, word to doubleword, doubleword to quadword

Ex: PUNPCKLBW MM1,MM3   ; xreg to xreg
    PUNPCKLWD MM1,MM3
    PUNPCKLDQ MM1,MM3
    PUNPCKLBW MM1,DATA  ; mem to xreg
    PUNPCKLWD MM1,DATA
    PUNPCKLDQ MM1,DATA
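Here is a Python sketch of the byte-unpack instructions. PUNPCKLBW interleaves the low four bytes of the destination and the source, d0 s0 d1 s1 d2 s2 d3 s3, from the least significant byte up. PUNPCKHBW does the same with the high four bytes. Unpacking against a zero source is a common way to zero-extend bytes to words. The function name and `high` flag are hypothetical.

```python
def punpckbw(dest: int, src: int, high: bool = False) -> int:
    """PUNPCKLBW (high=False) or PUNPCKHBW (high=True) on 64-bit operands."""
    base = 4 if high else 0               # which half of each operand to take
    result = 0
    for i in range(4):
        d = (dest >> (8 * (base + i))) & 0xFF
        s = (src >> (8 * (base + i))) & 0xFF
        result |= d << (16 * i)           # destination byte: low half of word i
        result |= s << (16 * i + 8)       # source byte: high half of word i
    return result
```

For example, `punpckbw(0x04030201, 0)` gives 0004000300020001H, which is each byte zero-extended to a word.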
