Digital Kommunikationselektronik TNE027 Lecture 2

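Binary Adders

Ripple-Carry Adder (See Fig. 2.6)

[Figure: n-bit ripple-carry adder. A chain of full adders (FA) runs from the LSB position (inputs $x_0$, $y_0$, $c_0$; outputs $s_0$, $c_1$) to the MSB position (inputs $x_{n-1}$, $y_{n-1}$, $c_{n-1}$; outputs $s_{n-1}$, $c_n$).]

Longest delay (critical-path delay):

$d_c(n) = n \times d_{\text{carry}} = 2n$ gate delays

$d_s(n-1) = (n-1) \times d_{\text{carry}} + d_{\text{sum}} = 2n$ gate delays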


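Full adder

(a) Truth table

c_i  x_i  y_i  |  c_{i+1}  s_i
 0    0    0   |     0      0
 0    0    1   |     0      1
 0    1    0   |     0      1
 0    1    1   |     1      0
 1    0    0   |     0      1
 1    0    1   |     1      0
 1    1    0   |     1      0
 1    1    1   |     1      1

(b) Karnaugh maps

Sum bit (rows: c_i, columns: x_i y_i):

c_i \ x_i y_i   00  01  11  10
      0          0   1   0   1
      1          1   0   1   0

$s_i = x_i \oplus y_i \oplus c_i$

Carry-out bit (rows: c_i, columns: x_i y_i):

c_i \ x_i y_i   00  01  11  10
      0          0   0   1   0
      1          0   1   1   1

$c_{i+1} = x_i y_i + x_i c_i + y_i c_i$

(c) Circuit

[Figure: full-adder circuit with inputs $x_i$, $y_i$, $c_i$ and outputs $s_i$, $c_{i+1}$.]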
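The full-adder equations and the ripple-carry chain can be checked with a small bit-level model. The following is only an illustrative Python sketch, not part of the lecture material; the function names (`full_adder`, `ripple_carry_add`, `to_bits`, `from_bits`) are hypothetical helpers introduced here.

```python
def full_adder(x, y, c):
    """One-bit full adder: returns (sum bit, carry-out bit)."""
    s = x ^ y ^ c                        # s_i = x_i XOR y_i XOR c_i
    c_out = (x & y) | (x & c) | (y & c)  # c_{i+1} = x_i y_i + x_i c_i + y_i c_i
    return s, c_out


def ripple_carry_add(x_bits, y_bits, c0=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, c_n)."""
    carry = c0
    sum_bits = []
    for x, y in zip(x_bits, y_bits):
        s, carry = full_adder(x, y, carry)  # the carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, n):
    """n-bit unsigned value as a bit list, LSB first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << i for i, b in enumerate(bits))


s_bits, c_n = ripple_carry_add(to_bits(14, 4), to_bits(11, 4))
print(from_bits(s_bits), c_n)  # 14 + 11 = 25 = 1 1001b, so this prints "9 1"
```

Because each stage waits for the carry of the previous one, the loop mirrors the critical path: n carry stages in series.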


• Fast-carry logic in FPGAs
  – The fast-carry logic is about an order of magnitude faster than the delay through a regular logic LUT.

See Fig. 2.7.

[Figure: chain of fast-carry logic (FCL) blocks. Each block takes $x_i$, $y_i$ and the incoming carry $c_i$ and passes $c_{i+1}$ on to the next block, from $c_0$ up to $c_n$; XOR gates form the sum bits $s_0$, $s_1$, ..., $s_{n-1}$.]


Instruction Pipelining

• Instruction pipeline for a RISC
  – IF: Instruction fetch
  – ID: Instruction decode and register fetch
  – EX: Execution and address calculation
  – MEM: Memory access
  – WB: Result write back

• Total latency: Total delay time from instruction fetch to result write back to a register

• Throughput (maximum frequency, registered performance): Number of results (instructions) per second


Instruction Pipelining (continued)

Clock number       1    2    3    4    5    6    7    8    9    10
Instruction i      IF   ID   EX   MEM  WB
Instruction i+1         IF   ID   EX   MEM  WB
Instruction i+2              IF   ID   EX   MEM  WB
Instruction i+3                   IF   ID   EX   MEM  WB
Instruction i+4                        IF   ID   EX   MEM  WB
Instruction i+5                             IF   ID   EX   MEM  WB
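The relation between latency and throughput in the timing diagram can be reproduced with a few lines of Python. This is a minimal sketch under the assumption that every stage takes exactly one clock and there are no stalls; `pipeline_schedule` and `total_clocks` are hypothetical names introduced for illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions, stages=STAGES):
    """For each instruction, map clock number (starting at 1) to its stage."""
    return [
        {k + 1 + s: name for s, name in enumerate(stages)}
        for k in range(num_instructions)
    ]


def total_clocks(num_instructions, num_stages=len(STAGES)):
    # The first result needs num_stages clocks (the latency);
    # each further instruction adds only one clock (the throughput).
    return num_stages + num_instructions - 1


schedule = pipeline_schedule(6)
print(total_clocks(6))  # 10
print(schedule[5])      # {6: 'IF', 7: 'ID', 8: 'EX', 9: 'MEM', 10: 'WB'}
```

Six instructions finish in 10 clocks instead of the 30 an unpipelined five-step machine would need, while the latency of each single instruction stays at five clocks.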


Arithmetic Pipelining

• Pipelined adders
  – The pipelining principle can be applied to FPGA designs at little or no additional cost, since each logic element contains a flip-flop that would otherwise be unused.
  – An arithmetic operation is broken into small primitive operations. The result of each primitive operation is saved in registers after each pipeline stage.
  – If one part of the data is not processed at a pipeline stage, this part of the data must still be saved in registers after that stage.

See Fig. 2.9.
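A behavioural model may help to see why unprocessed data must also be registered. The Python sketch below assumes a two-stage split (lower half of the operands in stage 1, upper half plus carry in stage 2); the split, the function name `pipelined_add` and the register layout are illustrative assumptions, not the exact structure of Fig. 2.9.

```python
def pipelined_add(pairs, half=4):
    """Two-stage pipelined adder model for unsigned 2*half-bit operands.

    Each loop iteration is one clock. Returns the list of sums.
    """
    mask = (1 << half) - 1
    reg = None          # pipeline register between stage 1 and stage 2
    results = []
    for item in list(pairs) + [None]:   # one extra clock drains the pipeline
        # Stage 2: uses only values registered during the previous clock.
        if reg is not None:
            low_sum, carry, a_hi, b_hi = reg
            results.append(((a_hi + b_hi + carry) << half) | low_sum)
        # Stage 1: add the lower halves of the new operands. The upper halves
        # are not processed yet, but they must be saved in the register too.
        if item is None:
            reg = None
        else:
            a, b = item
            low = (a & mask) + (b & mask)
            reg = (low & mask, low >> half, a >> half, b >> half)
    return results


print(pipelined_add([(0x0F, 0x01), (0x7A, 0x26)]))  # [16, 160]
```

Two operand pairs need three clocks here: one clock of latency to fill the pipeline, then one result per clock.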


• Modulo Adders
  – Building block of RNS-DSP design
  – The modulo operation is performed by (a) an extra adder or (b) a ROM look-up table.

See Fig. 2.11.
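The two options can be sketched in Python as follows. This is a hedged illustration that assumes both operands are already reduced (0 ≤ a, b < m); the function names are hypothetical and the code does not claim to match the exact structure of Fig. 2.11.

```python
def mod_add_extra_adder(a, b, m):
    """(a + b) mod m using a second adder that subtracts the modulus."""
    s = a + b                   # first adder
    t = s - m                   # extra adder computes s - m
    return t if t >= 0 else s   # a multiplexer selects the valid result


def build_mod_rom(m):
    """ROM contents: one entry for every possible sum 0 .. 2m-2."""
    return [s % m for s in range(2 * m - 1)]


def mod_add_rom(a, b, rom):
    """(a + b) mod m where the adder output addresses a look-up table."""
    return rom[a + b]


rom7 = build_mod_rom(7)
print(mod_add_extra_adder(5, 6, 7), mod_add_rom(5, 6, rom7))  # 4 4
```

The ROM variant trades the second adder for memory whose size grows with the modulus, which is why it suits small moduli.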


Multiplication and Division by $2^k$

• Multiplication by $2^k$ can be implemented by shifting the bits of the operand to the left by k positions.

• Division by $2^k$ can be implemented by shifting the bits of the operand to the right by k positions. For signed numbers, the sign must be preserved: the bits are shifted to the right and the vacated positions on the left are filled with the value of the sign bit.
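Both shifts can be tried on fixed-width bit patterns. The Python sketch below is an illustration only; the helper names and the 8-bit width are assumptions chosen for the example, and `arithmetic_shift_right` expects 0 ≤ k ≤ width.

```python
def shift_left(value, k, width):
    """Multiply by 2**k in a width-bit register (bits shifted out are lost)."""
    return (value << k) & ((1 << width) - 1)


def arithmetic_shift_right(value, k, width):
    """Divide a width-bit two's-complement pattern by 2**k, keeping the sign."""
    sign = (value >> (width - 1)) & 1
    shifted = value >> k
    if sign:
        # Fill the k vacated MSB positions with copies of the sign bit.
        shifted |= ((1 << k) - 1) << (width - k)
    return shifted


print(format(shift_left(0b00000101, 2, 8), "08b"))              # 00010100 (5 * 4 = 20)
print(format(arithmetic_shift_right(0b11110100, 2, 8), "08b"))  # 11111101 (-12 / 4 = -3)
```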


Binary Multipliers

Multiplication of unsigned numbers

Multiplicand M (14)          1 1 1 0
Multiplier Q (11)          × 1 0 1 1
                             1 1 1 0
                           1 1 1 0
                         0 0 0 0
                       1 1 1 0
Product P (154)      1 0 0 1 1 0 1 0

Multiplication of unsigned numbers by hand


Multiplication of unsigned numbers

Multiplicand M (14)            1 1 1 0
Multiplier Q (11)            × 1 0 1 1
Partial product 0              1 1 1 0
                           + 1 1 1 0
Partial product 1          1 0 1 0 1
                         + 0 0 0 0
Partial product 2        0 1 0 1 0
                       + 1 1 1 0
Product P (154)        1 0 0 1 1 0 1 0

Multiplication for implementation in hardware
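The step-by-step accumulation of partial products corresponds to the following shift-add loop. This Python sketch is illustrative only; `shift_add_multiply` is a hypothetical name, and the accumulated values are kept at full weight rather than with the low bits dropped as in the figure.

```python
def shift_add_multiply(m, q, n):
    """Multiply multiplicand m by the n-bit unsigned multiplier q.

    Returns (product, list of accumulated partial products).
    """
    acc = 0
    partials = []
    for i in range(n):
        if (q >> i) & 1:      # multiplier bit q_i selects M or 0
            acc += m << i     # add the multiplicand, shifted i positions left
        partials.append(acc)
    return acc, partials


product, partials = shift_add_multiply(0b1110, 0b1011, 4)
print(format(product, "b"))  # 10011010 (154)
print(partials)              # [14, 42, 42, 154]
```

The intermediate values 14, 42 and 42 are the figure's partial products 0, 1 and 2 with the already final low-order bits included (for example 42 = 101010b, of which the figure shows 10101).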


• Shift-add multiplier
  – Serial/parallel multiplier
    • N full adders + shift registers
  – Serial/serial multiplier
    • One carry-save adder + shift registers
  – Parallel/parallel multiplier (array multiplier)
    • $N^2$ full adders (See Fig. 2.13.)

• Fast array multiplier for FPGAs (See Fig. 2.14.)

• Multiplier blocks

$P = Y X = (Y_2 2^N + Y_1)(X_2 2^N + X_1)$

$\phantom{P} = Y_2 X_2 2^{2N} + (Y_2 X_1 + Y_1 X_2) 2^N + Y_1 X_1$
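The multiplier-block decomposition can be verified numerically. The Python sketch below is an illustration; `split_multiply` is a hypothetical name and the operand width (two N-bit halves) follows the formula above.

```python
def split_multiply(y, x, n):
    """Multiply two 2n-bit unsigned numbers using four n-bit by n-bit products."""
    mask = (1 << n) - 1
    y2, y1 = y >> n, y & mask   # Y = Y2 * 2**n + Y1
    x2, x1 = x >> n, x & mask   # X = X2 * 2**n + X1
    return ((y2 * x2) << (2 * n)) + ((y2 * x1 + y1 * x2) << n) + y1 * x1


print(split_multiply(0xAB, 0xCD, 4) == 0xAB * 0xCD)  # True
```

Each of the four small products fits an N-bit by N-bit multiplier block; only shifts and additions are needed to combine them.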


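Carry-save adder

[Figure: carry-save adder. A full adder with inputs a and b produces the sum bit s; its carry-out is stored in a D flip-flop (with Clock and Reset) and fed back to the adder.]

Serial/serial multiplier

[Figure: serial/serial multiplier. The multiplicand and multiplier bits are combined by an AND gate (&) and fed to a carry-save adder, which accumulates the partial product.]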


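Serial/parallel multiplier using carry-save adders

[Figure: serial/parallel multiplier. One cell per multiplicand bit $A_{n-1}, A_{n-2}, \ldots, A_0$, each consisting of an AND gate (&), a full adder (FA) and D flip-flops (D); the multiplier bits $B_{n-1}, B_{n-2}, \ldots, B_0$ are shifted in serially.]

Multiplicand: $A_{n-1} A_{n-2} \ldots A_0$
Multiplier: $B_{n-1} B_{n-2} \ldots B_0$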


Binary Dividers

• Division
  – Division is the most complex of the four basic arithmetic operations.
  – Let N denote the numerator and D the denominator. Two results are produced, the quotient Q and the remainder R:

$N / D = Q + R / D$


  – In division, each quotient bit is determined in a sequential “trial-and-error” procedure. (In multiplication, all partial products can be produced in parallel.)
  – The result should be constrained:

$Q \le N$ and $|R| < D$

  – For signed numbers, R and N are assumed to have the same sign.


Division Algorithms

Linear Convergence Division Algorithms

• Restoring divider
  – A trial-and-error method translated directly from the “pencil-and-paper” method. See Example 2.18.
  – The main disadvantage of the restoring divider is that two steps, subtract and add (i.e., restore), are needed to determine one quotient bit.
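The subtract-then-restore step can be modelled in Python as below. This is a minimal sketch assuming unsigned operands with 0 ≤ n < 2^width and d > 0; the function name and the operand values are chosen for illustration and are not claimed to reproduce Example 2.18.

```python
def restoring_divide(n, d, width):
    """Unsigned restoring division; returns (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    r, q = n, 0
    for k in range(width - 1, -1, -1):
        r -= d << k             # trial subtraction of the aligned denominator
        if r < 0:
            r += d << k         # restore: the trial failed, quotient bit is 0
            q = q << 1
        else:
            q = (q << 1) | 1    # the trial succeeded, quotient bit is 1
    return q, r


print(restoring_divide(234, 50, 8))  # (4, 34)
```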


• Nonperforming divider
  – A temporary remainder value is tested before the remainder register is updated.
  – Note that the following VHDL code describes a combinational circuit.

    t := r - d;          -- Temporary remainder value
    IF t >= 0 THEN       -- Nonperforming test
      r := t;            -- Update remainder
      q := q * 2 + 1;    -- Shift left and add 1
    ELSE
      q := q * 2;        -- Shift left
    END IF;
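For comparison with the restoring scheme, the same test can be modelled in Python. This sketch assumes that the VHDL variable `d` holds the denominator aligned to the current bit position, which is written here as `d << k`; that alignment and the function name are assumptions made for the example.

```python
def nonperforming_divide(n, d, width):
    """Unsigned nonperforming division; returns (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    r, q = n, 0
    for k in range(width - 1, -1, -1):
        t = r - (d << k)    # temporary remainder value
        if t >= 0:          # nonperforming test
            r = t           # update remainder
            q = q * 2 + 1   # shift left and add 1
        else:
            q = q * 2       # shift left; r is left unchanged, nothing to restore
    return q, r


print(nonperforming_divide(234, 50, 8))  # (4, 34)
```

Only one subtraction per quotient bit is needed, because the remainder register is written only when the test succeeds.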


• Nonrestoring divider
  – Always perform the subtraction.
  – If the remainder is negative, perform an addition of $d_k/2$ in the next step, instead of the restoring addition of $d_k$ in the present step followed by the subtraction of $d_k/2$ in the next step.
  – The quotient digit can be positive or negative, i.e., $q_k = \pm 1$, but not zero. This is a signed-digit representation. The negative ones can be saved in the quotient register as zeros.


  – The signed-digit representation should be converted to 2’s complement representation.

For example: $q_{SD} = 1\bar{1}\bar{1}1$ (i.e., 8 − 4 − 2 + 1 = 3) is saved as 1001 in the quotient register.

To convert:

      1001   (positive ones)
    – 0110   (negative ones)
    = 0011

Alternatively, compute 2 × q + 1 on the stored value: 2 × 1001 + 1 = 10011, which gives 0011 once the overflow bit is dropped.

  – Correct the remainder if r < 0: r := r + D and q := q – 1.
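The nonrestoring steps, the signed-digit conversion and the final correction can be put together in one Python sketch. It assumes unsigned operands with 0 ≤ n < 2^width and d > 0; the function names are hypothetical and the alignment `d << k` is an assumption of this model.

```python
def sd_to_binary(stored, width):
    """Convert a stored signed-digit quotient (1 = +1, 0 = -1) to binary.

    Uses 2 * stored + 1 and drops the overflow bit.
    """
    return (2 * stored + 1) % (1 << width)


def nonrestoring_divide(n, d, width):
    """Unsigned nonrestoring division; returns (quotient, remainder)."""
    if d == 0:
        raise ZeroDivisionError("denominator must be nonzero")
    r, stored = n, 0
    for k in range(width - 1, -1, -1):
        if r >= 0:
            r -= d << k                  # subtract, quotient digit +1
            stored = (stored << 1) | 1
        else:
            r += d << k                  # add instead of restoring, digit -1
            stored = stored << 1         # -1 is saved as 0
    q = sd_to_binary(stored, width)
    if r < 0:                            # final remainder correction
        r += d
        q -= 1
    return q, r


print(sd_to_binary(0b1001, 4))          # 3
print(nonrestoring_divide(234, 50, 8))  # (4, 34)
```

For 234 / 50 the loop ends with the signed-digit quotient 5 and remainder −16, which the correction step turns into quotient 4 and remainder 34.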


Fast Divider Design

• Division through multiplication with the reciprocal of the denominator
  – The reciprocal can be computed via a look-up table for small bit widths.
  – The Newton algorithm can be used to compute the reciprocal:

$f(x) = 1/x - D \to 0$

$x_\infty = 1/D$

$x_{k+1} = x_k - f(x_k) / f'(x_k)$

$x_{k+1} = x_k (2 - D x_k)$
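The iteration is easy to try in floating point. The Python sketch below is illustrative only; the starting value is an assumption (the iteration converges for 0 < x_0 < 2/D), and a hardware design would use fixed-point arithmetic with a table-based initial guess.

```python
def newton_reciprocal(d, x0, iterations):
    """Approximate 1/d with the iteration x_{k+1} = x_k * (2 - d * x_k)."""
    x = x0
    for _ in range(iterations):
        x = x * (2 - d * x)   # needs two multiplications and one subtraction
    return x


x = newton_reciprocal(0.8, 1.0, 5)
print(abs(x - 1.25) < 1e-9)  # True
```

The relative error $1 - D x_k$ is squared in every step (0.2, 0.04, 0.0016, ...), so the number of correct bits roughly doubles per iteration.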


• Division by Convergence
  – Both numerator N and denominator D are multiplied by approximation factors $f_k$.
  – After a sufficient number of iterations k (quadratic convergence),

$D \prod f_k \to 1$ and $N \prod f_k \to Q$

Algorithm:
  – Normalize N and D such that D is close to 1.
  – Initialize $x_0 = N$ and $t_0 = D$.
  – Repeat the following loop until $x_k$ shows the desired precision:

$f_k = 2 - t_k$

$x_{k+1} = x_k f_k$

$t_{k+1} = t_k f_k$

See Example 2.21.
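The loop above can be run directly in floating point. This Python sketch is a hedged illustration; the operand values are chosen here and are not claimed to be those of Example 2.21, and the inputs are assumed to be normalized already so that D is close to 1.

```python
def divide_by_convergence(n, d, iterations):
    """Approximate n / d by driving the denominator towards 1."""
    x, t = n, d               # x_0 = N, t_0 = D
    for _ in range(iterations):
        f = 2 - t             # approximation factor f_k
        x = x * f             # numerator converges to Q
        t = t * f             # denominator converges to 1
    return x


q = divide_by_convergence(0.6, 0.8, 5)
print(abs(q - 0.75) < 1e-9)  # True
```

Unlike the Newton reciprocal iteration, the two multiplications per step are independent of each other, so they can be computed in parallel.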
