
Arithmetic For Computers

Mehran Rezaei

Introduction

• What happens if an operation generates a number too big to be represented in the space originally given?

• How does hardware multiply and divide numbers?

• What about fractions, floating-point numbers, and real numbers? How does a computer deal with them?

Addition and subtraction

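[Figure: 32-bit adder/subtractor built from a chain of 1-bit adders, with inputs A0..A31 and B0..B31, control signals op and a/s, result bits R0..R31, and a zero flag z]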

Have you thought of performance?

[Figure: 1-bit full adder with inputs a, b, cin and outputs s, cout]

Big-O notation

• f(n) is O(g(n)) if two constants $n_0$ and $c$ can be found such that:

$f(n) < c\,g(n)$ for all $n > n_0$

• g(n) is a simple function: $1$, $n$, $\log_2 n$, $n^2$, $n^3$, $2^n$

• The following are $O(n^2)$:
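As a small illustration of the definition, the sketch below checks the inequality numerically for one function. The function `f`, the constants `c = 4` and `n0 = 5`, and the helper `is_bounded` are hypothetical choices made for this example; a finite check like this only illustrates the definition and is not a proof.

```python
def is_bounded(f, g, c, n0, n_max=10_000):
    """Check f(n) < c * g(n) for every integer n with n0 < n <= n_max."""
    return all(f(n) < c * g(n) for n in range(n0 + 1, n_max + 1))


# Hypothetical example function: f(n) = 3n^2 + 5n + 2, compared against g(n) = n^2.
f = lambda n: 3 * n * n + 5 * n + 2
g = lambda n: n * n

print(is_bounded(f, g, c=4, n0=5))  # True: the bound holds for every tested n > 5
print(is_bounded(f, g, c=3, n0=5))  # False: c = 3 is too small for this f
```

Here $3n^2 + 5n + 2 < 4n^2$ exactly when $n^2 - 5n - 2 > 0$, which holds for every integer $n \ge 6$, so $f(n)$ is $O(n^2)$.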

Big-O notation (cont’d)


Have you thought of performance?

[Figure: 1-bit full adder with inputs a, b, cin and outputs s, cout]

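$T_{pd}(\text{result}_3) = 3 \cdot (T_{pd}(\text{nand3}) + T_{pd}(\text{nand2})) + T_{pd}(\text{xor})$

$T_{pd}(\text{result}_{n-1}) = (n-1) \cdot (T_{pd}(\text{nand3}) + T_{pd}(\text{nand2})) + T_{pd}(\text{xor})$

$T_{pd}(\text{result}_{n-1}) = (n-1) \cdot \text{constant}_1 + \text{constant}_2 = O(n)$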

Ripple carry adder

• What seems to be the problem?

• N-bit (32-bit or 64-bit) ripple carry adder


Delay = O(n)

Area = O(n)
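A minimal bit-level sketch of a ripple carry adder may make the O(n) delay concrete: each stage has to wait for the carry from the stage before it. The function names `full_adder` and `ripple_carry_add` are illustrative, not taken from the slides.

```python
def full_adder(a, b, cin):
    """1-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def ripple_carry_add(a, b, n=32, cin=0):
    """Add two n-bit numbers one bit at a time; the carry ripples from bit 0 up."""
    carry = cin
    result = 0
    for i in range(n):  # n sequential stages, hence delay proportional to n
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_carry_add(0x4E, 0x1F, n=8))  # (109, 0), i.e. 0x6D with no carry out
```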

Carry select adder

$T_{pd}(\text{32-bit CSA}) = T_{pd}(\text{16-bit RCA}) + T_{pd}(\text{multiplexer})$

Slide courtesy of Chris Terman, Computation Structures, MIT

Speedup: 2.5 times faster than a 32-bit ripple carry adder

(at the cost of twice as much area)
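The carry-select idea can be sketched as follows: the upper half is added twice, once assuming carry-in 0 and once assuming carry-in 1, and the real carry out of the lower half selects between the two results. This is a behavioral sketch; `add_block` stands in for a 16-bit ripple carry adder and both function names are hypothetical.

```python
def add_block(a, b, cin, width):
    """Stand-in for a width-bit ripple carry adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, n=32):
    """Carry select adder for even n, split into two halves of n // 2 bits."""
    half = n // 2
    mask = (1 << half) - 1
    low_sum, low_carry = add_block(a & mask, b & mask, 0, half)
    a_hi, b_hi = (a >> half) & mask, (b >> half) & mask
    # In hardware these two additions run in parallel with the low half.
    hi_if_0 = add_block(a_hi, b_hi, 0, half)
    hi_if_1 = add_block(a_hi, b_hi, 1, half)
    # The multiplexer: pick the precomputed upper result using the low carry.
    hi_sum, carry_out = hi_if_1 if low_carry else hi_if_0
    return (hi_sum << half) | low_sum, carry_out
```

The duplicated upper adder is where the extra area goes, and the delay becomes one half-width addition plus the multiplexer.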

Different flavors of adders

• Ripple Carry Adder (small waves are called "ripples")

– RCA

• Carry Select Adder

– Delay: $O(\log_2 n)$

– Area: $O(n)$

• Carry Lookahead Adder (CLA)

– Delay: $O(\log_2 n)$

– Area: $O(n \log_2 n)$

• Carry skip adder

– Delay: $O(n^{1/2})$

– Area: $O(n)$

Carry Lookahead Adder

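With $G_i = A_i B_i$ (generate) and $P_i = A_i + B_i$ (propagate):

$C_1 = A_0 B_0 + A_0 C_0 + B_0 C_0 = A_0 B_0 + C_0 (A_0 + B_0) = G_0 + C_0 P_0$

$C_2 = G_1 + C_1 P_1$

$= G_1 + (G_0 + C_0 P_0) P_1$

$C_3 = G_2 + C_2 P_2$

$= G_2 + (G_1 + G_0 P_1 + C_0 P_0 P_1) P_2$

$C_4 = G_3 + G_2 P_3 + G_1 P_2 P_3 + G_0 P_1 P_2 P_3 + C_0 P_0 P_1 P_2 P_3$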

Carry Lookahead Adder (cont’d)

$C_4 = G_3 + G_2 P_3 + G_1 P_2 P_3 + G_0 P_1 P_2 P_3 + C_0 P_0 P_1 P_2 P_3$

Delay: $O(\log_2 n)$

Area: $O(n \log_2 n)$
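The carry equations can be written out directly in code. The sketch below assumes $G_i = A_i B_i$ and $P_i = A_i + B_i$ and computes every carry of a 4-bit adder from the inputs and $C_0$ alone, without waiting for a ripple. The names `cla4_carries` and `cla4_add` are illustrative.

```python
def cla4_carries(a, b, c0=0):
    """Return [C0, C1, C2, C3, C4] for 4-bit a and b using the lookahead equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate: Gi = Ai.Bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]  # propagate: Pi = Ai + Bi
    c1 = g[0] | (c0 & p[0])
    c2 = g[1] | (g[0] & p[1]) | (c0 & p[0] & p[1])
    c3 = g[2] | (g[1] & p[2]) | (g[0] & p[1] & p[2]) | (c0 & p[0] & p[1] & p[2])
    c4 = (g[3] | (g[2] & p[3]) | (g[1] & p[2] & p[3])
          | (g[0] & p[1] & p[2] & p[3]) | (c0 & p[0] & p[1] & p[2] & p[3]))
    return [c0, c1, c2, c3, c4]


def cla4_add(a, b, c0=0):
    """4-bit addition: each sum bit is Ai xor Bi xor Ci; returns (sum, C4)."""
    c = cla4_carries(a, b, c0)
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return s, c[4]
```

Every carry is a two-level AND/OR expression of the inputs, which is why the delay stops growing linearly, while the number and width of the gates grow quickly with the word size.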

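Hybrid (CLA & RCA)

[Figure: four 8-bit CLA blocks, each taking two 8-bit operands and producing an 8-bit sum, chained in ripple fashion through the carries C0, C8, C16, C24, and C32]

Group Generate / Group Propagate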

Final note on CLA

• Could I change $P_i = A_i + B_i$ to $P_i = A_i \oplus B_i$?

If Cin = 0 then "carry is generated"

Else "carry is propagated"

End if

$C_{i+1} = G_i + C_i P_i$ where $G_i = A_i B_i$ and $P_i = A_i + B_i$

$P_i = A_i \oplus B_i \Rightarrow$

$C_{i+1} = G_i + C_i P_i$

$= A_i B_i + C_i (A_i B_i' + A_i' B_i)$

$= A_i B_i + C_i A_i B_i' + C_i A_i' B_i$

$= A_i B_i + C_i A_i B_i' + A_i B_i + C_i A_i' B_i$

$= A_i (B_i + C_i B_i') + B_i (A_i + C_i A_i')$

$= A_i (B_i + B_i')(B_i + C_i) + B_i (A_i + A_i')(A_i + C_i)$

$= A_i (B_i + C_i) + B_i (A_i + C_i)$

$= A_i B_i + C_i (A_i + B_i)$
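The algebra above can be cross-checked by brute force over all eight input combinations. This is only a sanity check of the derivation; the function names are hypothetical.

```python
from itertools import product


def carry_with_or(a, b, c):
    """C(i+1) = Gi + Ci.Pi with Pi = Ai + Bi (OR)."""
    return (a & b) | (c & (a | b))


def carry_with_xor(a, b, c):
    """C(i+1) = Gi + Ci.Pi with Pi = Ai xor Bi."""
    return (a & b) | (c & (a ^ b))


equivalent = all(
    carry_with_or(a, b, c) == carry_with_xor(a, b, c)
    for a, b, c in product((0, 1), repeat=3)
)
print(equivalent)  # True
```

The two definitions of $P_i$ differ only when $A_i = B_i = 1$, and in that case $G_i = 1$ already forces the carry, so the result is the same.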

Final note on CLA (cont’d)

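[Figure: full adder (FA) with inputs $A_i$, $B_i$, $C_i$ and outputs $S_i$, $G_i$, $P_i$]

$C_{oH} = G_H + C_{iH} P_H = G_H + (G_L + C_{iL} P_L) P_H$

$= G_H + G_L P_H + C_{iL} P_L P_H$

$= G_{HL} + C_{iL} P_{HL}$

where $G_{HL} = G_H + G_L P_H$ and $P_{HL} = P_L P_H$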

[Figure: tree of generate/propagate units; delay $O(\log N)$]

8 bit CLA (with Generate and Propagate)

Addition/Subtraction and overflow

• Examples

0x4E + 0x1F

0x4E – 0x1F

• Overflow

operation   condition       result
A + B       A > 0, B > 0    < 0
A + B       A < 0, B < 0    > 0
A - B       A > 0, B < 0    < 0
A - B       A < 0, B > 0    > 0
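The overflow table can be turned into a small check for 8-bit two's complement values. In this sketch the operands are ordinary Python integers in the range -128..127, a sign bit of 0 is treated as "positive" (so zero counts as positive), and the function names are illustrative.

```python
def to_signed(x, n=8):
    """Interpret the low n bits of x as an n-bit two's complement number."""
    x &= (1 << n) - 1
    return x - (1 << n) if x >> (n - 1) else x


def add_overflows(a, b, n=8):
    """A + B overflows when both operands share a sign and the result does not."""
    r = to_signed(a + b, n)
    return (a >= 0 and b >= 0 and r < 0) or (a < 0 and b < 0 and r >= 0)


def sub_overflows(a, b, n=8):
    """A - B overflows when the operand signs differ and the result takes B's sign."""
    r = to_signed(a - b, n)
    return (a >= 0 and b < 0 and r < 0) or (a < 0 and b >= 0 and r >= 0)


print(hex(to_signed(0x4E + 0x1F)), add_overflows(0x4E, 0x1F))  # 0x6d False
print(hex(to_signed(0x4E - 0x1F)), sub_overflows(0x4E, 0x1F))  # 0x2f False
print(to_signed(0x4E + 0x4E), add_overflows(0x4E, 0x4E))       # -100 True
```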

What does the ALU (hardware) do when overflow happens?

• Ignore it

– The programmer is responsible for handling it

• Leave it to the OS

– The OS either takes care of it completely

– Or signals the application

• What does MIPS do?

– For signed operations, it raises an exception if overflow occurs

– It ignores overflow in unsigned operations

Signed addition with status

• Adder with

– Carry-in: needs an extra bit (LSB)

– Carry-out: needs an extra bit (MSB)

– Overflow: the two operands have the same sign but the sum has a different sign

– Zero: 1 if the result is zero and there is no overflow; 0 otherwise

– Sign (of the addition result): if there is no overflow, the MSB of the result; otherwise, the complement of the MSB of the result
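The status outputs listed above can be sketched for an n-bit adder as follows. The function name `add_with_status` and the dictionary layout are hypothetical; operands are passed as raw n-bit patterns.

```python
def add_with_status(a, b, cin=0, n=8):
    """Add two n-bit patterns and return the result with its status flags."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    total = a + b + cin
    result = total & mask

    def msb(x):
        return (x >> (n - 1)) & 1

    carry_out = total >> n
    # Overflow: operands share a sign, but the sum has a different sign.
    overflow = int(msb(a) == msb(b) and msb(result) != msb(a))
    # Zero: set only when the result is zero and no overflow occurred.
    zero = int(result == 0 and not overflow)
    # Sign: MSB of the result, complemented when overflow occurred.
    sign = msb(result) ^ overflow
    return {"result": result, "carry": carry_out,
            "overflow": overflow, "zero": zero, "sign": sign}
```

For example, adding 0x80 and 0x80 (that is, -128 + -128) yields a result pattern of zero with carry 1 and overflow 1, so the zero flag stays 0 and the sign flag reports a negative true sum.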

Array Multiplier

$y = \sum_{i=0}^{n-1} a \cdot b_i \cdot 2^i$

[Figure sequence: the array multiplier built up row by row; each row of full adders (FA) adds one more shifted partial product]
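The summation $y = \sum_{i=0}^{n-1} a \cdot b_i \cdot 2^i$ translates directly into a loop over the multiplier bits, with one partial product per row of the array. This is a behavioral sketch with a hypothetical function name; it does not model the individual full adders.

```python
def array_multiply(a, b, n=4):
    """Sum the n partial products a * b_i * 2**i of an unsigned multiplication."""
    y = 0
    for i in range(n):
        b_i = (b >> i) & 1   # bit i of the multiplier
        y += (a * b_i) << i  # partial product, shifted into position i
    return y


print(array_multiply(0b1101, 0b1011))  # 143
```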

Add and Shift Multiplier


Add and Shift Multiplier (Cont’d)

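Multiplicand 1101, multiplier 1011. Each row shows Cout followed by the 8-bit Product.

action        Cout  Product
initial       0     0000 0000
add 1101      0     1101 0000
shift right   0     0110 1000
add 1101      1     0011 1000
shift right   0     1001 1100
add 0000      0     1001 1100
shift right   0     0100 1110
add 1101      1     0001 1110
shift right   0     1000 1111

Product = 1000 1111 (143 = 13 × 11)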



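[Figure: registers Cout and Product, with X (multiplier) and Y (multiplicand)]

for (i = 0; i < 4; i++) {
    if (Product[0] == 1)                            // current multiplier bit
        (Cout, Product[7-4]) <-- Product[7-4] + Y;  // add Y to the upper half
    ShiftRight(Cout, Product);                      // shift Cout and Product right by one
}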
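The loop above can be simulated in a few lines. This sketch assumes the multiplier X is loaded into the lower half of Product before the loop starts and that Cout is 0 whenever no addition takes place; the function name is hypothetical.

```python
def add_shift_multiply(x, y, n=4):
    """Unsigned add-and-shift multiplication of n-bit x (multiplier) and y (multiplicand)."""
    mask = (1 << n) - 1
    product = x & mask  # lower half starts out holding the multiplier
    y &= mask
    for _ in range(n):
        cout = 0
        if product & 1:                  # Product[0] == 1
            upper = (product >> n) + y   # upper half + Y, may produce a carry
            cout = upper >> n
            product = ((upper & mask) << n) | (product & mask)
        # ShiftRight(Cout, Product): Cout moves into the top bit of Product
        product = (cout << (2 * n - 1)) | (product >> 1)
    return product


print(bin(add_shift_multiply(0b1011, 0b1101)))  # 0b10001111, i.e. 143
```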

Data path and control unit

Data path

• Definition:

– A collection of computational components (e.g., adders/subtractors, multipliers/dividers, FP computational units, …) and memory elements (e.g., flip-flops, registers, shift registers, …), connected to each other via routing networks (buses), that carries out all the operations required by the system's specification

Control Unit

• Definition

– A combinational or sequential circuit that controls the flow of data in the data path so that it carries out all the operations required by the system's specification

Synchronous vs Asynchronous circuits

• Globally synchronous circuit: all memory elements (D FFs) are controlled (synchronized) by a common global clock signal

• Globally asynchronous but locally synchronous circuit (GALS)

• Globally asynchronous circuit

– Uses D FFs, but not with a global clock

– Uses no clock signal

Synchronous Circuit

• The big idea: synchronous methodology

– Group all D FFs together under a single clock

– Only need to deal with the timing constraints of one memory element

Routing Networks (Buses)

• Tri-state buffer:

– Output with “high-impedance”


Routing Networks (Bidirectional)

Routing Networks (Multiplexers)

What are the differences between tri-state buffers and multiplexers?

Routing Networks (Cont’d)

[Figure: routing network connecting blocks labeled S and D]

[Figure: inputs i1 and i2, select signals sel1 and sel2, and output o]

Side note wrap up

• Methodology

– Separate data path from control unit

• Defined data path and control unit

– Synchronous circuit design

• Routing networks and buses

– Tri-state buffers

– Multiplexers as routing networks



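[Figure: registers Cout, Product, A, X (multiplier), and Y (multiplicand)]

for (i = 0; i < 4; i++) {
    if (Product[0] == 1)                            // current multiplier bit
        (Cout, Product[7-4]) <-- Product[7-4] + Y;  // add Y to the upper half
    ShiftRight(Cout, Product);                      // shift Cout and Product right by one
}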

A&S multiplier – data path

• Requirements

– 4 bit registers

• Shift register A (clear, load, and shift right)

• Shift register X (load, and shift right) - multiplier

• Register Y (load) - multiplicand

– A flip flop (load and clear)

– 4 bit adder

– A 2 to 1 multiplexer (if we have a 4 bit wide output bus)

A&S multiplier – data path (cont’d)

[Figure: data path with registers A, X, and Y, the adder (result and carry out), the Cout flip flop, the Input Bus, and the Output Bus]

A&S multiplier – data path (cont’d)

[Figure: the same data path annotated with control signals: clear, load, and shift on the registers, clear and load on Cout, and sel for the output multiplexer]

A&S multiplier – control unit


Examples


A catch

[Figure: two counter state diagrams that perform n ← n - 1 and branch on n = 0 or n != 0; the second adds a wait state]

A&S multiplier – control unit

[Figure: control-unit state diagram. Inputs: start, X[0]. Actions: load X, load Y, clear A, clear Cout; add (load A, load Cout); shift (shift A, shift X). Final states: done = 1 with sel = 0, then done = 1 with sel = 1.]

Add & Shift Multiplier

[Figure: Product register (64 bits) with the Multiplier in its lower half, Multiplicand register (32 bits), 32-bit ALU, and Control]

What happens on signed multiplications?

Signed Multiplications

• Consider again our 4 bit word multiplication, where X is the multiplicand and Y is the multiplier

– If x3 = y3 = 0

• Same as unsigned multiplication

– If x3 = 0 and y3 = 1

• For the first 3 steps, do the normal add and shift; and finally P = P – X

– If x3 = 1 and y3 = 0

• Do the normal shift until the first one is reached (in Y); from this point on, shift in 1 instead of 0

– If x3 = 1 and y3 = 1

• Think about this as a homework problem

Signed Multiplication X Positive and Y negative

Multiplicand 0110 (6), multiplier 1001 (-7)

step  Multiplicand  Multiplier  Result                           + Product  Product   Shift
1     0110          1001        0110                             00000000   01100000  00110000
2     0110          0100        0000                             00110000   00110000  00011000
3     0110          0010        0000                             00011000   00011000  00001100
4     0110          0001        0110, two's complement is 1010   00001100   10101100  11010110

Final product: 11010110 (-42)

Booth’s algorithm (1951)

• In all the approaches we have seen so far, the multiplier was examined bit by bit

• Can we take advantage of both addition and subtraction?

• In Booth's algorithm, every two adjacent bits of the multiplier indicate the action

Booth’s algorithm (Cont’d)

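[Figure: the multiplier 0 0 0 1 1 1 1 0 with the run of 1s annotated: beginning of run (right end), middle of run, and end of run (left end); each step examines the current bit position and the previous one]

Observation: $011110_{two} = 100000_{two} - 000010_{two}$, so a run of 1s can be handled with one subtraction at the beginning of the run and one addition at its end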

Booth’s algorithm (example)

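Multiplicand 0110 (6), multiplier 1001 (-7). The rightmost bit of each row is the extra "previous" bit.

action                          Prd
initial                         0000 1001 0
bits 10: Sub (add 1010)         1010 1001 0
Shift                           1101 0100 1
bits 01: Add (add 0110)         0011 0100 1
Shift                           0001 1010 0
bits 00: no operation           0001 1010 0
Shift                           0000 1101 0
bits 10: Sub (add 1010)         1010 1101 0
Shift                           1101 0110

Result: 1101 0110 (-42)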
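The example above can be reproduced with a short simulation. This is a sketch of the textbook radix-2 Booth procedure with an n-bit upper register, so it assumes the multiplicand is not the most negative n-bit value (negating that value does not fit in n bits). The function and variable names are hypothetical.

```python
def booth_multiply(multiplicand, multiplier, n=4):
    """Signed n-bit multiplication using Booth's algorithm; returns a Python int."""
    mask = (1 << n) - 1
    m = multiplicand & mask
    upper = 0              # upper half of the product register
    q = multiplier & mask  # lower half starts out holding the multiplier
    prev = 0               # the extra bit to the right of the product
    for _ in range(n):
        pair = (q & 1, prev)
        if pair == (1, 0):    # beginning of a run of 1s: subtract
            upper = (upper - m) & mask
        elif pair == (0, 1):  # end of a run of 1s: add
            upper = (upper + m) & mask
        # Arithmetic shift right of (upper, q, prev), keeping the sign bit.
        prev = q & 1
        q = (q >> 1) | ((upper & 1) << (n - 1))
        upper = (upper >> 1) | (upper & (1 << (n - 1)))
    product = (upper << n) | q
    if product >> (2 * n - 1):  # reinterpret as a signed 2n-bit value
        product -= 1 << (2 * n)
    return product


print(booth_multiply(6, -7))  # -42
```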

Divide: Paper & Pencil

          1001
1000 ) 1001010
      -1000
      -----
          10
          101
          1010
         -1000
         -----
            10

Divisor 1000, dividend 1001010, quotient 1001, remainder 10

Dividend : Divisor = Quotient

Divisor * Quotient + Remainder = Dividend

Divide: Paper & Pencil (Cont’d)

Example: 7 : 2, or 0000 0111 : 0010

• Initial values: Divisor Rg = 0010 0000, Remainder Rg = 0000 0111

• Rg = Rg – Div (computed as Rg + 1110 0000)

• Rg < 0 -> Q0 = 0 and Rg = Rg + Div (+ 0010 0000)

• Rg >= 0 -> Q0 = 1

[Figure: Divisor Rg, adder/subtractor, Remainder Rg, and Quotient register]


• Shift Div to the right

[Figure: Divisor Rg = 0001 0000, Remainder Rg = 0000 0111, adder/subtractor, and Quotient register]

• Check whether iteration N+1 has been reached: if yes, done; if no, repeat

The rest of the example

Page 268, figure 4.38


V1 algorithm

Start: Place Dividend in Remainder

1. Subtract the Divisor register from the Remainder register, and place the result in the Remainder register.

Test Remainder

2a. Remainder ≥ 0: Shift the Quotient register to the left, setting the new rightmost bit to 1.

2b. Remainder < 0: Restore the original value by adding the Divisor register to the Remainder register, and place the sum in the Remainder register. Also shift the Quotient register to the left, setting the new least significant bit to 0.

3. Shift the Divisor register right 1 bit.

n+1 repetitions? (n = 4 here)

No: < n+1 repetitions, go back to step 1

Yes: n+1 repetitions, Done
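The steps of the V1 algorithm can be simulated directly. This sketch assumes unsigned n-bit operands and a nonzero divisor, and it starts with the divisor placed in the upper half of a 2n-bit register, as in the 7 : 2 example. The function name is hypothetical.

```python
def restoring_divide(dividend, divisor, n=4):
    """Unsigned restoring division (version 1): returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    remainder = dividend  # Start: place Dividend in Remainder
    div = divisor << n    # Divisor starts in the upper half
    quotient = 0
    for _ in range(n + 1):                 # n + 1 repetitions
        remainder -= div                   # step 1: subtract
        if remainder >= 0:
            quotient = (quotient << 1) | 1  # step 2a: shift in a 1
        else:
            remainder += div               # step 2b: restore, then shift in a 0
            quotient <<= 1
        div >>= 1                          # step 3: shift Divisor right
    return quotient & ((1 << n) - 1), remainder


print(restoring_divide(7, 2))  # (3, 1)
```

The first of the n + 1 iterations always shifts in a 0 for n-bit operands, because the divisor in the upper half is larger than any n-bit dividend.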

DIVIDE HARDWARE Version 1

[Figure: Divisor register (64 bits, shift right), 64-bit ALU, Remainder register (64 bits, write), Quotient register (32 bits, shift left), and Control]

Observations on Divide Version 1

• 1/2 of the bits in the divisor are always 0

=> 1/2 of the 64-bit adder is wasted

=> 1/2 of the divisor register is wasted

– Cut the divisor and the ALU in half

• Instead of shifting the divisor to the right, shift the remainder to the left?


[Figure: revised divide hardware: Divisor register (32 bits), 32-bit ALU, Remainder register (64 bits, shift left, write), Quotient register (32 bits, shift left), and Control]


• If the quotient receives a 1 at the first iteration, then the quotient register will not be long enough to hold the value (one bit for any iteration!)

• Eliminate the Quotient register by combining it with the Remainder as it is shifted left

– Start by shifting the Remainder left as before.

– Thereafter the loop contains only two steps, because the shifting of the Remainder register shifts both the remainder in the left half and the quotient in the right half

– The consequence of combining the two registers and the new order of the operations in the loop is that the remainder will be shifted left one time too many.

– Thus the final correction step must shift back only the remainder in the left half of the register


[Figure: Divisor register (32 bits), 32-bit ALU, and a 64-bit Remainder (Quotient) register with "HI" and "LO" halves (shift left, write), plus Control]

V3 example




• Same hardware as multiply: just need an ALU to add or subtract, and a 64-bit register to shift left or shift right

• Hi and Lo registers in MIPS combine to act as a 64-bit register for multiply and divide

• Signed divides: the simplest approach is to remember the signs, make both operands positive, and complement the quotient and remainder if necessary

– Note: Dividend and Remainder must have the same sign

– Note: Quotient is negated if the Divisor sign and Dividend sign disagree, e.g., –7 ÷ 2 = –3, remainder = –1

• Possible for quotient to be too large: if divide 64-bit integer by 1, quotient is 64 bits ("called saturation")
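The sign rules for signed division can be sketched as below. Python's own `//` operator rounds toward negative infinity, so the sketch divides the magnitudes and then fixes the signs, as the slide describes. The function name is hypothetical.

```python
def signed_divide(dividend, divisor):
    """Divide magnitudes, then fix signs: returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    quotient = abs(dividend) // abs(divisor)
    remainder = abs(dividend) % abs(divisor)
    if (dividend < 0) != (divisor < 0):  # signs disagree: negate the quotient
        quotient = -quotient
    if dividend < 0:                     # remainder takes the dividend's sign
        remainder = -remainder
    return quotient, remainder


print(signed_divide(-7, 2))  # (-3, -1)
print(signed_divide(7, -2))  # (-3, 1)
```

In every case Divisor * Quotient + Remainder = Dividend still holds.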

Multiplication and division Inst.



Review of Numbers

• Computers are made to deal with numbers

• What can we represent in N bits?

– Unsigned integers: $0$ to $2^N - 1$

– Signed integers (two's complement): $-2^{(N-1)}$ to $2^{(N-1)} - 1$

Other Numbers

• What about other numbers?

– Very large numbers? (seconds/century): $3{,}155{,}760{,}000_{10}$ ($3.15576_{10} \times 10^9$)

– Very small numbers? (atomic diameter): $0.00000001_{10}$ ($1.0_{10} \times 10^{-8}$)

– Rationals (repeating pattern): $2/3$ ($0.666666666\ldots$)

– Irrationals: $2^{1/2}$ ($1.414213562373\ldots$)

– Transcendentals: $e$ ($2.718\ldots$), $\pi$ ($3.141\ldots$)

• All represented in scientific notation

Scientific Notation Review

[Figure: $6.02 \times 10^{23}$ with the mantissa (6.02), decimal point, radix or base (10), and exponent (23) labeled]

• Normalized form: no leading 0s (exactly one digit to the left of the decimal point)

• Alternatives for representing 1/1,000,000,000

– Normalized: $1.0 \times 10^{-9}$

– Not normalized: $0.1 \times 10^{-8}$, $10.0 \times 10^{-10}$

Scientific Notation for Binary Numbers

[Figure: $1.0_{two} \times 2^{-1}$ with the mantissa (1.0), "binary point", radix or base (2), and exponent (-1) labeled]

• Computer arithmetic that supports it is called floating point, because it represents numbers where the binary point is not fixed, as it is for integers

– Declare such a variable in C as float

Floating Point Representation (1/2)

• Normal format: $+1.xxxxxxxxxx_{two} \times 2^{yyyy_{two}}$

• Multiple of Word Size (32 bits)

bits    31      30 ... 23   22 ... 0
field   S       Exponent    Significand
width   1 bit   8 bits      23 bits

• S represents Sign

Exponent represents y's

Significand represents x's

• Represent numbers as small as $2.0 \times 10^{-38}$ to as large as $2.0 \times 10^{38}$

Floating Point Representation (2/2)

• What if the result is too large? ($> 2.0 \times 10^{38}$)

– Overflow!

– Overflow => the exponent is larger than can be represented in the 8-bit Exponent field

• What if the result is too small? ($> 0$, $< 2.0 \times 10^{-38}$)

– Underflow!

– Underflow => the negative exponent is larger than can be represented in the 8-bit Exponent field

• How to reduce the chances of overflow or underflow?

Double Precision Fl. Pt. Representation

Next Multiple of Word Size (64 bits)

bits    31      30 ... 20   19 ... 0
field   S       Exponent    Significand
width   1 bit   11 bits     20 bits

Significand (cont'd): 32 bits

• Double Precision (vs. Single Precision)

– C variable declared as double

– Represent numbers almost as small as $2.0 \times 10^{-308}$ to almost as large as $2.0 \times 10^{308}$

– But the primary advantage is greater accuracy due to the larger significand

IEEE 754 Floating Point Standard

• Single Precision, DP similar

• Sign bit: 1 means negative, 0 means positive

• Significand:

– To pack more bits, the leading 1 is implicit for normalized numbers

– 1 + 23 bits single, 1 + 52 bits double

– Always true: Significand < 1 (for normalized numbers)

• Note: 0 has no leading 1, so reserve exponent value 0 just for number 0


• Kahan wanted FP numbers to be used even if there is no FP hardware; e.g., sort records with FP numbers using integer compares

• Could break an FP number into 3 parts: compare signs, then compare exponents, then compare significands

• Wanted it to be faster, with a single compare if possible, especially for positive numbers

• Then want this order:

– Highest order bit is sign (negative < positive)

– Exponent next, so big exponent => bigger #

– Significand last: exponents same => bigger #


• Called Biased Notation, where the bias is the number subtracted to get the real number

– IEEE 754 uses a bias of 127 for single precision

– Subtract 127 from the Exponent field to get the actual value of the exponent

– 1023 is the bias for double precision

bits    31      30 ... 23   22 ... 0
field   S       Exponent    Significand
width   1 bit   8 bits      23 bits

• $(-1)^S \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - 127)}$

– Double precision is identical, except with an exponent bias of 1023
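The formula above can be evaluated directly from a 32-bit pattern. The sketch below handles only normalized numbers (Exponent field between 1 and 254) and cross-checks the result against Python's standard `struct` module; the function name `decode_single` is hypothetical.

```python
import struct


def decode_single(bits):
    """Apply (-1)^S * (1 + Significand) * 2^(Exponent - 127) to a 32-bit pattern."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    significand = fraction / 2**23  # the 23 stored bits as a fraction below 1
    return (-1) ** sign * (1 + significand) * 2.0 ** (exponent - 127)


bits = 0xBF400000  # 1 01111110 100 0000 0000 0000 0000 0000
print(decode_single(bits))                                   # -0.75
print(struct.unpack(">f", struct.pack(">I", bits))[0])       # -0.75
```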

“Father” of the Floating point standard

IEEE Standard 754 for Binary Floating-Point Arithmetic.

Prof. Kahan, 1989 ACM Turing Award Winner

www.cs.berkeley.edu/~wkahan/ …/ieee754status/754story.html

Converting Decimal to FP

• Simple case: if the denominator is a power of 2 (2, 4, 8, 16, etc.), then it's easy.

• Show the MIPS representation of -0.75

$-0.75 = -3/4$

$-11_{two}/100_{two} = -0.11_{two}$

Normalized to $-1.1_{two} \times 2^{-1}$

$(-1)^S \times (1 + \text{Significand}) \times 2^{(\text{Exponent} - 127)}$

$(-1)^1 \times (1 + .100\ 0000 \ldots 0000) \times 2^{(126 - 127)}$

1 0111 1110 100 0000 0000 0000 0000 0000

Hairy Example

• How to represent 1/3 in MIPS?

• 1/3

$= 0.33333\ldots_{10}$

$= 0.25 + 0.0625 + 0.015625 + 0.00390625 + 0.0009765625 + \ldots$

$= 1/4 + 1/16 + 1/64 + 1/256 + 1/1024 + \ldots$

$= 2^{-2} + 2^{-4} + 2^{-6} + 2^{-8} + 2^{-10} + \ldots$

$= 0.0101010101\ldots_{2} \times 2^{0}$

$= 1.0101010101\ldots_{2} \times 2^{-2}$

Hairy Example

• Sign: 0

• Exponent $= -2 + 127 = 125_{10} = 01111101_{2}$

• Significand $= 0101010101\ldots$

0 0111 1101 0101 0101 0101 0101 0101 010
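Both conversions can be checked with Python's standard `struct` module, which packs a value as an IEEE 754 single. The helper names `single_bits` and `fields` are hypothetical. Note that the bit pattern shown above truncates the repeating significand of 1/3, whereas packing rounds to nearest, so the last significand bit comes out as 1 instead of 0.

```python
import struct


def single_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def fields(x):
    """Format the pattern as 'sign exponent significand' bit strings."""
    bits = single_bits(x)
    return f"{bits >> 31:01b} {(bits >> 23) & 0xFF:08b} {bits & 0x7FFFFF:023b}"


print(fields(-0.75))  # 1 01111110 10000000000000000000000
print(fields(1 / 3))  # 0 01111101 01010101010101010101011
```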

Floating point instructions