9. ALU Arithmetic - Rutgers Universitypaull/chapt9.pdf · 9.2. 2’s Complement Addition and...


9. ALU Arithmetic

The ALU contains circuitry for doing elementary functions, both logical and arithmetic. The ALU's view or representation of numbers is typically different from that used for paper calculations. The representation is generally binary or perhaps trinary, and is not usually the most obvious such representation. The choice has been strongly influenced by the hardware cost of the basic arithmetic operations required by the various representations.

9.1. Representation

A number of representations of fixed point numbers have been proposed for use in computer arithmetic. Amongst these, sign-magnitude and 2's complement are of prime importance: the first because it is the simple and natural representation, and the second because, with it, arithmetic operations can be implemented efficiently.

In sign-magnitude, addition of two numbers that are both positive or both negative requires addition of their magnitudes. The sign of the result is respectively positive or negative. If one is positive and the other negative, addition can be accomplished by subtracting the smaller magnitude from the larger, and assigning the result the sign of the larger. Sign-magnitude subtraction just changes the sign of the subtrahend and then follows the procedure for adding given above. Overflow is only possible if both numbers are of the same sign. It is indicated by a 1 in the sign bit of the magnitude sum. So both an adder and a subtractor are required for the sign-magnitude representation.

Adding and subtracting in the 2's complement representation is simple and efficient. To add binary numbers requires an Adder circuit. Subtraction of N2 from N1, both in 2's complement, whether positive or negative, can be achieved with the same Adder, augmented with a simple circuit to take the 2's complement of N2. Any series of additions and subtractions can be so accomplished with the numbers remaining in 2's complement representation. This accounts for its efficiency. The simplicity of addition, subtraction and detection of overflow has an interesting explanation.

9.2. 2's Complement

Addition and subtraction are often done on numbers in the 2's complement representation, so a short review of this representation and its behavior in arithmetic operations is in order.

9.2.0.1. 2's Complement Definitions

The following notation is not intended to make actual computation of the 2's complement any easier, but rather to make the explanation of arithmetic operations using 2's complement arithmetic simpler.

$TC[Q]$ is the 2's complement representation of the number $Q$ using $n + 1$ bits, $[n \ldots 0]$. Bit $n$ is the sign bit, 0 for positive and 1 for negative numbers. If $M$ is a positive $n$ bit number, then

$$TC[M] = M$$

so the largest positive number representable is $2^n - 1$ ($[01 \ldots 1]$).

$TC[-M]$ is easily computed by complementing $M$ and adding 1. The following alternative definition is basic to the subsequent development:

$$TC[-M] = 2^n + (2^n - M)$$

where the first $2^n$ makes the sign bit 1. It follows that

$$TC[-M] = 2^{n+1} - M$$

which is the same as subtracting $M$ from all 0s in bits 0 through $n$ (what happens in bit $n + 1$ doesn't matter since it is not part of the result).

For example, when $M = 2^n$, $TC[-M] = 2^n + (2^n - 2^n) = 2^n$. So $-2^n$ ($[10 \ldots 0]$) is the smallest number representable.

Applying the 2's complement operation to $TC[-M]$ recovers $M$:

$$TC[TC[-M]] = 2^n + 2^n - [2^n + (2^n - M)] = M$$

Examples of the 2's complement transformations of positive and negative numbers are given in figure 9-1.

[Figure 9-1: 2's Complement Representation, TC[M] - Examples]
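The definitions above can be checked numerically. The following is a minimal sketch, not part of the original text: the function name `tc` and the choice $n = 7$, $M = 20$ are assumptions made for illustration. Bit patterns are held as non-negative Python integers.

```python
def tc(q, n):
    """Return the (n+1)-bit 2's complement pattern of q as a non-negative int.

    Hypothetical helper: q must lie in the representable range -2^n .. 2^n - 1.
    """
    if not -(1 << n) <= q <= (1 << n) - 1:
        raise ValueError("q is out of range for n + 1 bits")
    return q % (1 << (n + 1))


n = 7                      # bits [7..0], bit 7 is the sign bit (assumed example size)
m = 20                     # a positive number (assumed example value)
mask = (1 << (n + 1)) - 1  # n + 1 ones

# TC[M] = M for positive M
assert tc(m, n) == m
# TC[-M] = 2^n + (2^n - M) = 2^(n+1) - M
assert tc(-m, n) == (1 << n) + ((1 << n) - m) == (1 << (n + 1)) - m
# Complementing M and adding 1 gives the same pattern
assert tc(-m, n) == ((m ^ mask) + 1) & mask
# Taking the 2's complement of TC[-M] recovers M
assert (1 << (n + 1)) - tc(-m, n) == m
# -2^n is the smallest representable number, with pattern [10...0]
assert tc(-(1 << n), n) == 1 << n
```

The modulo operation plays the role of discarding bit $n + 1$ and everything above it.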

9.2.1. Addition-Subtraction

Subtraction is done by taking the 2's complement of the number to be subtracted and adding, so only addition need be considered.

When two numbers in 2's complement, both in range of the $n + 1$ bit representation, are added, they might very well produce a result which is outside that range (out-of-range). It is one of the advantages of the 2's complement representation that such out-of-range results, or overflows, are easily detected. How it is done depends on the signs of the addends.

Two Positive Numbers: The Overflow Condition

Consider the addition of two positive numbers (bit $n$ is 0) of magnitudes $M$ and $N$ in 2's complement representation. It is clear that straightforward binary addition of $M$ and $N$, so represented, will give the correct representation of the sum as long as the sign bit ($n$) of the result is 0. If the sign bit is 1 then there has been an overflow (the number is larger than can be represented in the $n + 1$ bits allowed).

In summary, a 1 in the sign bit indicates overflow.

Two Negative Numbers: The Overflow Condition

Consider addition of the $n + 1$ bit negative numbers of magnitude $M$ and $N$ in 2's complement representation, both within range, i.e.,

$$TC[-M] = 2^n + (2^n - M)$$

$$TC[-N] = 2^n + (2^n - N)$$

Simply adding $TC[-M]$ and $TC[-N]$ gives:

$$TC[-M] + TC[-N] = 2^n + 2^n + (2^n - N) + (2^n - M)$$

$$= 2^n + 2^n + 2^n + (2^n - (M + N))$$

$$= 2^{n+1} + (2^n + (2^n - (M + N)))$$

Certainly $M + N \le 2^n + 2^n$, whether or not there is overflow, so it follows that $2^{n+1} + (2^n + (2^n - (M + N))) \ge 2^{n+1}$. Thus the bit at position $n + 1$ does not differentiate amongst outcomes, and is in any case outside the number representation $[n \ldots 0]$. Therefore it is discarded, leaving

$$TC[-M] + TC[-N] = 2^n + (2^n - (M + N))$$

By definition, $TC[-(M + N)] = 2^n + (2^n - (M + N))$. So:

If $2^n \ge (M + N)$ then

$$TC[-(M + N)] = TC[-M] + TC[-N]$$

and there is a 1 in position $n$. There is no overflow, and addition gives the correct 2's complement representation.

If $2^n < (M + N)$ (an overflow condition) then

$$TC[-M] + TC[-N] = 2^n - |2^n - (M + N)|$$

and $TC[-M] + TC[-N]$ is then less than $2^n$.

So, even though adding two negative numbers has a negative result, there is a 0 in the sign bit, indicating there is overflow.

In summary, a 0 in the sign bit is the overflow condition. With a 1 in the sign bit the correct 2's complement representation has been obtained.

One Positive And One Negative Number: Overflow Impossible

Notice immediately that if $M$ and $N$ are in range, their difference must be in range. So no overflow should ever be indicated after this addition.

Consider addition of an $n + 1$ bit positive number $M$ and an $n + 1$ bit negative number of magnitude $N$. In 2's complement the representations are:

$$TC[M] = M$$

$$TC[-N] = 2^n + (2^n - N)$$

$$TC[M] + TC[-N] = 2^n + 2^n - N + M = 2^n + (2^n - [N - M])$$

If $N > M$:

The resulting number is the sum of $2^n$ and a number less than $2^n$, namely $(2^n - [N - M])$, and thus will have a 1 in the $n$'th bit, which is fine since the result is negative.

$$TC[M] + TC[-N] = 2^n + (2^n - [N - M]) = TC[M - N] \quad \text{when } M < N$$

which is the correct 2's complement representation of the sum.

If $M \ge N$:

$$TC[M] + TC[-N] = M + (2^n + (2^n - N)) = 2^{n+1} + [M - N]$$

The result is obviously $\ge 2^{n+1}$, guaranteeing a 1 in bit $n + 1$, so $2^{n+1}$ can be discarded.

$$TC[M] + TC[-N] = M - N = TC[M - N] \quad \text{when } M \ge N$$

which is the correct 2's complement representation of the sum.

In summary, there cannot be overflow. The correct 2's complement representation of the sum always results.
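The three cases above reduce to one rule: overflow has occurred exactly when both addends have the same sign bit and the sum has the other one. The sketch below is illustrative only (the names `add_tc` and `sign_bit` are assumptions, not from the text) and checks the rule exhaustively for a small word size.

```python
def sign_bit(pattern, n):
    """Bit n of an (n+1)-bit pattern."""
    return (pattern >> n) & 1


def add_tc(a, b, n):
    """Add two (n+1)-bit 2's complement patterns.

    Returns (sum_pattern, overflow). Bit n + 1 of the raw sum is discarded.
    """
    mask = (1 << (n + 1)) - 1
    s = (a + b) & mask
    # Same-sign addends whose sum has the opposite sign bit: overflow.
    overflow = sign_bit(a, n) == sign_bit(b, n) and sign_bit(s, n) != sign_bit(a, n)
    return s, overflow


# Exhaustive check with n = 3, i.e. 4-bit numbers in the range -8 .. 7.
n = 3
mask = (1 << (n + 1)) - 1
for x in range(-(1 << n), 1 << n):
    for y in range(-(1 << n), 1 << n):
        s, overflow = add_tc(x & mask, y & mask, n)
        in_range = -(1 << n) <= x + y <= (1 << n) - 1
        assert overflow == (not in_range)
        if in_range:
            assert s == (x + y) & mask  # correct 2's complement pattern
```

Here `x & mask` is used to obtain the bit pattern of a signed Python integer; this relies on Python treating negative integers as infinitely sign-extended 2's complement values.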

9.2.1.1. Picture Of Overflow

The properties of overflow in 2's complement addition have been derived with an algebraic notation. This was done largely to introduce that notation for later use. As derived above, the properties of 2's complement seem fortuitous, and it feels like luck that it worked out so well.

A much better view of those properties comes from picturing the ultimately modulo nature of 2's complement addition. If $n + 1$ bit numbers are considered, these start at 0 and increase by 1 through the positive numbers, passing through $2^{n-3}$, $2^{n-3} + 1$, continuing to increase by 1 to $2^{n-2}$ and ultimately to the highest representable number $2^n - 1$. Throughout this range bit $n$ is 0. When 1 is added to this highest positive number the result is the bit pattern $2^n$, with a 1 in bit $n$, which is the 2's complement representation of the negative number of largest magnitude, $-2^n$. Now 1 is added to get $2^n + 1$, and 1 continues to be added, generating the 2's complement forms of the negative numbers. Finally the negative number of smallest magnitude, $-1$, with pattern $2^n + 2^n - 1$ (all 1s), is reached. The addition of 1 more gives $2^{n+1}$, which within the representable range becomes 0.

This is pictured by writing the set of numbers, positive and negative, around a circle as shown at the top of figure 9-2. At the bottom of that figure additions of numbers with the various combinations of signs are shown for non-overflow and overflow cases. These make it clear why the overflow indications are so simple.
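The walk around the circle can be reproduced in a few lines. This is a small sketch under stated assumptions: the helper names `to_signed` and `circle` are hypothetical, and $n = 2$ is chosen only to keep the list short.

```python
def to_signed(pattern, n):
    """Value represented by an (n+1)-bit 2's complement pattern."""
    return pattern - (1 << (n + 1)) if (pattern >> n) & 1 else pattern


def circle(n):
    """Values met when repeatedly adding 1 to the pattern, starting at 0."""
    size = 1 << (n + 1)  # addition is modulo 2^(n+1)
    return [to_signed(p, n) for p in range(size)]


n = 2
print(circle(n))  # [0, 1, 2, 3, -4, -3, -2, -1]

# Adding 1 to the highest positive number 2^n - 1 lands on -2^n ...
size = 1 << (n + 1)
assert to_signed(((1 << n) - 1 + 1) % size, n) == -(1 << n)
# ... and adding 1 to the all-1s pattern (-1) wraps around to 0.
assert to_signed((size - 1 + 1) % size, n) == 0
```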

9.2.1.2. Implementations Of Adders

A number of implementations of two input adders, including standard sum-carry (O(n)), carry-skip and carry-select (O(sqrt(n))), and carry-lookahead (O(log n)), are given in chapter . For addition of more than two number inputs, tree arrays and recursive adders are given in chapter .

9.2.2. Multiplication

With the sign-magnitude representation multiplication can be achieved by simply multiplying the magnitudes of the numbers, and basing the sign of the result on the signs of the numbers multiplied.

Multiplication in binary is done by adding the multiplicand for each 1 in the multiplier, shifted according to that 1's position in the multiplier. Notice that shifting is equivalent to multiplication by a power of two. This gives the correct result when using sign-magnitude representation, provided the multiplier and multiplicand are the magnitudes of the numbers being multiplied. The sign of the result is simply determined from the signs of the numbers multiplied.

[Figure 9-2: 2's Complement Representation Pictured. Top: the numbers from $-2^n$ to $2^n - 1$ arranged around a circle. Bottom panels: Positive + Negative (overflow impossible), Negative + Negative, and Positive + Positive, each shown for the overflow and non-overflow cases.]

The same procedure works with numbers in 2's complement representation without special consideration for the sign, provided there is appropriate padding, i.e., bits added to the left of the number.

In 2's complement, the multiplication of two numbers, each with $n + 1$ bits, $[n \ldots 0]$, with bit $n$ being the sign bit, will result in a product with magnitude as high as $2^{2n}$, if the multiplicand and multiplier are both the lowest possible (negative) number, $-2^n$. This highest product is positive and requires for its representation $2n + 2$ bits, $2n + 1$ for the magnitude and 1 (= 0) for the sign bit. (The same number of bits is necessary if the highest positive number is multiplied by the lowest negative one.) Therefore, to get the correct answer the multiplicand is represented with $2n + 2$ bits. This is done with padding. For positive multiplicands adding 0s to the left of the $n + 1$ bit position will suffice. The appropriate representation of a negative number with $2n + 2$ bits is the 2's complement of its $2n + 2$ bit positive magnitude. This is done simply by padding the $n + 1$ bit representation on the left with $n + 1$ 1s. (In doing the summation of a multiplicand shifted $j$ positions, only its low order $[2n + 2 - j]$ bits need be considered.) The padding is justified by the following consideration:

One is given the $n + 1$ bit representation $TC[-M] = 2^n + (2^n - M)$ and wishes to get the $2n + 2$ bit representation (high order bit at position $2n + 1$), $2^{2n+1} + (2^{2n+1} - M)$.

$$2^{2n+1} + (2^{2n+1} - M) = 2^{2n+2} - M + (2^{n+1} - 2^{n+1})$$

$$= (2^{2n+2} - 2^{n+1}) + 2^n + (2^n - M)$$

$$= (2^{2n+2} - 2^{n+1}) + TC[-M]$$

$(2^{2n+2} - 2^{n+1})$ is the padding: $n + 1$ 1s in bit positions $n + 1$ through $2n + 1$.
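The padding identity can be confirmed directly. The sketch below assumes a hypothetical helper `pad` that extends an $n + 1$ bit pattern to $2n + 2$ bits; it is illustrative and not taken from the text.

```python
def pad(pattern, n):
    """Extend an (n+1)-bit 2's complement pattern to 2n + 2 bits.

    Negative patterns (bit n set) get n + 1 ones on the left, i.e. the
    padding 2^(2n+2) - 2^(n+1); positive patterns get zeros.
    """
    if (pattern >> n) & 1:
        return pattern + ((1 << (2 * n + 2)) - (1 << (n + 1)))
    return pattern


n = 3
for m in range(1, (1 << n) + 1):
    short = (1 << (n + 1)) - m          # TC[-M] in n + 1 bits
    wide = (1 << (2 * n + 2)) - m       # 2^(2n+1) + (2^(2n+1) - M)
    assert pad(short, n) == wide

print(format(pad(0b1100, 3), "08b"))    # 11111100, i.e. -4 in 8 bits
```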

In summary, multiplication can be done directly with 2's complement numbers if they are padded. When the multiplicand, $N$, and multiplier, $M$, are both positive this is straightforward (since 0s are always assumed to the left of a number). However, if either is negative it should be represented as a $2n + 2$ bit number. Multiplication with this 2's complement representation is obviously valid when multiplicand and multiplier are both positive. Validity is demonstrated below for cases in which one or both of these are negative.

[Figure 9-3: Multiplication Of 2's Complement Numbers - Examples. Panels (a), (b1) and (b2) show shift-and-add multiplication of padded 6-bit operands, including $-4 \times -4 = 16$ (111100 × 111100 = 010000), $-3 \times 4 = -12$ (111101 × 000100 = 110100) and $-4 \times 3 = -12$ (111100 × 000011 = 110100).]

Multiplier and Multiplicand Negative

See figure 9-3

$N$ and $M$ are positive. To represent $-N$ and $-M$ for purposes of multiplication they are to be represented as $2n + 2$ bit numbers $[2n+1 \ldots n \ldots 0]$. In general, to indicate the high order bit position of the number, say $P$, for which a 2's complement is taken, the notation $TC_x[P]$ is used, where the subscript $x$ is the position of the high order bit in $P$.

$$TC_{2n+1}[-N] = 2^{2n+1} + (2^{2n+1} - N) = 2^{2n+2} - N$$

$$TC_{2n+1}[-M] = 2^{2n+2} - M$$

Multiplying:

$$TC_{2n+1}[-N] \times TC_{2n+1}[-M] = 2^{4n+4} - (M + N)\,2^{2n+2} + NM$$

The result that is of concern is the low order $2n + 2$ bits, bits $[2n+1 \ldots 0]$, of this product. Notice that in the $2^{4n+4} - (M + N)\,2^{2n+2}$ part of this product the low order $2n + 2$ bits are all 0: $(M + N)\,2^{2n+2}$ shifts the sum $M + N$ so its low-order bit is in position $2n + 2$. So the product $NM$, which is at most $2^{2n}$ and therefore lies entirely within bits $[2n+1 \ldots 0]$, is represented correctly in the low order $2n + 2$ bits (in fact bit $2n + 1$ will be 0).

Multiplier, Multiplicand: One Positive One Negative

Again $N$ and $M$ are positive. To represent $-N$ and $M$ for purposes of multiplication they are to be represented as $2n + 2$ bit numbers $[2n+1 \ldots n \ldots 0]$.

$$TC_{2n+1}[-N] = 2^{2n+2} - N$$

$$TC_{2n+1}[M] = M$$

$$TC_{2n+1}[-N] \times TC_{2n+1}[M] = M\,2^{2n+2} - NM$$

Note that $M\,2^{2n+2}$ leaves 0s in bits 0 through $2n + 1$. So when $NM$ is subtracted, the low order $2n + 2$ bits hold $2^{2n+2} - NM$, which is the correct 2's complement representation of $-NM$.

Figure 9-3 shows all the addends in the multiplication. In practice the all-0 addends need not be included (although hardware arrays for adding in parallel may have to be designed for the maximum number of addends). Furthermore, each addend can be truncated on the left to remove any bits to the left of the high order bit in the product.
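The padded shift-and-add procedure can be sketched as follows. The function names are assumptions made for this illustration. Python's `&` on a negative integer is used to produce the padded $2n + 2$ bit pattern, and every partial sum is truncated to $2n + 2$ bits, as the text allows.

```python
def multiply_padded(x, y, n):
    """Multiply signed (n+1)-bit values x and y by shift-and-add.

    Both operands are padded to 2n + 2 bits; the result is the low-order
    2n + 2 bits of the sum of shifted multiplicands.
    """
    width = 2 * n + 2
    mask = (1 << width) - 1
    xp = x & mask  # padded multiplicand (1s on the left if x is negative)
    yp = y & mask  # padded multiplier
    acc = 0
    for j in range(width):
        if (yp >> j) & 1:                    # one addend per 1 in the multiplier
            acc = (acc + (xp << j)) & mask   # bits above 2n + 1 are dropped
    return acc


def to_signed(pattern, width):
    """Value of a width-bit 2's complement pattern."""
    return pattern - (1 << width) if (pattern >> (width - 1)) & 1 else pattern


# Exhaustive check for n = 3 (operands in -8 .. 7, products in 8 bits).
n = 3
for x in range(-(1 << n), 1 << n):
    for y in range(-(1 << n), 1 << n):
        assert to_signed(multiply_padded(x, y, n), 2 * n + 2) == x * y
```

A negative multiplier contributes a 1 in every padded position, which is why many addends are needed; the following sections describe ways of reducing that number.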

Multiplication directly in 2's complement often requires including many 1s in the multiplier (padding). This results in the need to sum many copies of the multiplicand. There are a number of ways of reducing the number of addends.

9.2.2.1. Speeding Up Multiplication, Padding-Save Multiplication

Multiplier and Multiplicand-One Negative And One Positive

Consider the multiplication of a negative number, $-N$, by a positive number, $M$. Now, however, a number will be represented as the sum of an $n + 1$ bit number and the number representing the padding bits, all 1s if the number is negative and all 0s (or no padding) otherwise.

A negative number must be padded, so instead of just $2^{n+1} - N$ it is necessary to use, for $-N$,

$$[2^{n+1} - N] + [2^{n+1}(2^{n+1} - 1)]$$

Then

$$\mathrm{product}(M, -N) = M\,([2^{n+1} - N] + [2^{n+1}(2^{n+1} - 1)])$$

$$= M\,[2^{n+1} - N] + M\,(2^{2n+2} - 2^{n+1})$$

$$= M\,[2^{n+1} - N] + (2^{2n+2} M - 2^{n+1} M)$$

Let $A = M\,[2^{n+1} - N]$ and $B = 2^{2n+2} M - 2^{n+1} M = 2^{n+1}(2^{n+1} M - M)$, with $M = [m_n \ldots m_1 m_0]$.

$A = M \; TC_n[-N]$. Notice that here the 2's complement is computed for an $n + 1$ bit number, whereas in the previous approach to this multiplication a $2n + 2$ bit number was used, with many more 1s.

$B$ is a subtraction of $M$ shifted left $n + 1$ from $M$ shifted left $2n + 2$. This subtraction is shown in figure 9-4.

[Figure 9-4: Subtraction Part Of Multiplication, B. The figure shows $B = 2^{2n+2} M - 2^{n+1} M$ over bit positions 0 through $3n + 2$, marking positions above $2n + 1$ as too large and positions $2n + 1$ through 0 as the bit positions needed.]

Within the bit positions needed, $[2n+1 \ldots 0]$, $M$ shifted left $2n + 2$ contributes only 0s. So $M$ is in effect subtracted from all 0s and shifted so that its low order bit ($m_0$) is at position $n + 1$. This is equivalent to taking the 2's complement of $M$ and shifting it left $n + 1$ positions. So $B = 2^{n+1}\,TC_n[-M]$. Therefore

$$\mathrm{product}(M, -N) = M \; TC_n[-N] + 2^{n+1}\,TC_n[-M]$$

Multiplier and Multiplicand-Both Negative

Consider the multiplication of a negative number, $-N$, by a negative number, $-M$. These negative numbers must be padded. So in place of $2^{n+1} - N$ and $2^{n+1} - M$, it is necessary to use:

$$(2^{n+1} - N) + 2^{n+1}(2^{n+1} - 1) \quad \text{and} \quad (2^{n+1} - M) + 2^{n+1}(2^{n+1} - 1)$$

So

$$\mathrm{product}(-M, -N) = [(2^{n+1} - M) + 2^{n+1}(2^{n+1} - 1)]\,[(2^{n+1} - N) + 2^{n+1}(2^{n+1} - 1)]$$

The product of the two padding terms is a multiple of $2^{2n+2}$, so it leaves only 0s in bits $[2n+1 \ldots 0]$ and is dropped. What remains is

$$= (2^{n+1} - M)(2^{n+1} - N) + (2^{2n+2} - 2^{n+1})\,[(2^{n+1} - M) + (2^{n+1} - N)]$$

$$= TC_n[-M]\;TC_n[-N] + (2^{2n+2} - 2^{n+1})\,TC_n[-M] + (2^{2n+2} - 2^{n+1})\,TC_n[-N] \qquad (a)$$

By reasoning analogous to that in the previous "one negative, one positive" development, this is seen to be equivalent to

$$= TC_n[-M]\;TC_n[-N] + 2^{n+1}\,TC_n(TC_n[-M]) + 2^{n+1}\,TC_n(TC_n[-N])$$

That amounts to taking the product of the $n + 1$ bit 2's complements of $-M$ and $-N$, and adding $M$ plus $N$ shifted left $n + 1$ positions (since $TC_n(TC_n[-M]) = M$).

$$\mathrm{product}(-M, -N) = TC_n[-M]\;TC_n[-N] + 2^{n+1}(M + N)$$

This way of getting the sum requires either one or two additions beyond that required if no padding were used.

Another expression for $\mathrm{product}(-M, -N)$ that can be used is derived from equation (a) above.

$$= TC_n[-M]\,(TC_n[-N] + (2^{2n+2} - 2^{n+1})) + (2^{2n+2} - 2^{n+1})\,TC_n[-N]$$

$$= TC_n[-N]\,(TC_n[-M] + 2^{n+1}(2^{n+1} - 1)) + 2^{2n+2}\,TC_n[-M] - 2^{n+1}\,TC_n[-M]$$

Examples of padding-save multiplication are given in figure 9-5. Multiplication of a negative and a positive number is shown in (a), and of two negative numbers in (b1) and (b2) for the two formulations given above.
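The two padding-save formulas can be verified numerically. The sketch below is an illustration only: the function names are assumptions, ordinary integer multiplication stands in for the shift-and-add hardware, and all results are taken modulo $2^{2n+2}$.

```python
def tc_n(q, n):
    """(n+1)-bit 2's complement pattern of q, as a non-negative int."""
    return q % (1 << (n + 1))


def product_pos_neg(m, big_n, n):
    """Pattern of M * (-N) via  M * TC_n[-N] + 2^(n+1) * TC_n[-M]."""
    mask = (1 << (2 * n + 2)) - 1
    return (m * tc_n(-big_n, n) + (tc_n(-m, n) << (n + 1))) & mask


def product_neg_neg(m, big_n, n):
    """Pattern of (-M) * (-N) via  TC_n[-M] * TC_n[-N] + 2^(n+1) * (M + N)."""
    mask = (1 << (2 * n + 2)) - 1
    return (tc_n(-m, n) * tc_n(-big_n, n) + ((m + big_n) << (n + 1))) & mask


n = 3
mask = (1 << (2 * n + 2)) - 1
for big_n in range(1, (1 << n) + 1):      # magnitude of a negative operand
    for m in range(1, 1 << n):            # positive operand M
        assert product_pos_neg(m, big_n, n) == (-(m * big_n)) & mask
    for m in range(1, (1 << n) + 1):      # magnitude of the other negative operand
        assert product_neg_neg(m, big_n, n) == m * big_n

print(format(product_pos_neg(5, 7, 3), "08b"))  # 11011101, i.e. -35
print(product_neg_neg(7, 7, 3))                 # 49
```

The two printed cases use the operand values that appear in figure 9-5 ($M = 5$, $N = 7$ and $M = N = 7$ with $n = 3$).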

9.2.2.2. Multiplication Using Booth's Representation

The padding-save approach decreases the length of the binary numbers that need to be handled in a multiplication. Another approach uses a trinary representation of the multiplier to speed up multiplications.

The Booth transformation of a 2's complement number, $N$ (positive or negative), produces a trinary number, $N'$, in which each position has a trit, namely one of three values, 0, 1 or $-1$. A trit at position $j$ contributes respectively $0 \cdot 2^j$, $1 \cdot 2^j$ or $-1 \cdot 2^j$ to the sum that $N'$ represents. $N$ and $N'$ represent the same number. The transformation itself is covered in section .

In a multiplication using the Booth representation the multiplicand, say $X$, is typically represented in standard 2's complement, and the multiplier, say $Y$, in Booth representation.

The multiplication is then carried out as follows. For each trit in the trinary multiplier $Y$, an addend is chosen: all 0s for the trit 0, the multiplicand $X$ itself for the trit 1, or the 2's complement of the multiplicand, $TC[X]$, for the trit $-1$. Each addend is shifted by $j$ (effectively multiplying by $2^j$) for the trit in the $j$'th multiplier position, and the shifted addends are added to get the final product. When these shifted numbers are added, the sum need not be carried beyond bit position $2n + 1$. The product will be in 2's complement representation with its high order bit at position $2n + 1$.

An example of such a multiplication is given in figure 9-6 in which a −1 is represented by −1 .
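The procedure just described can be sketched as below, assuming the multiplier is already available as a list of trits with the trit for position $j$ at index $j$. The function name and the sample trit string are assumptions for illustration; producing the trits is the subject of section 9.3.

```python
def booth_multiply(x, trits, n):
    """Multiply signed multiplicand x by a multiplier given as trits.

    trits[j] in {-1, 0, 1} is the trit at position j. Addends are padded
    to 2n + 2 bits and the sum is not carried beyond bit 2n + 1.
    """
    width = 2 * n + 2
    mask = (1 << width) - 1
    acc = 0
    for j, t in enumerate(trits):
        if t == 1:
            addend = x & mask        # the multiplicand itself
        elif t == -1:
            addend = (-x) & mask     # 2's complement of the multiplicand
        else:
            continue                 # a 0 trit needs no addition
        acc = (acc + (addend << j)) & mask
    return acc


# 5 = -1 + 2 - 4 + 8, so one trit string for 5 is [-1, 1, -1, 1] (low order first).
result = booth_multiply(-7, [-1, 1, -1, 1], 3)
print(format(result, "08b"))  # 11011101, the 8-bit pattern of -35
```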

9.2.2.3. Implementation Of Multipliers

The multiplication of two $n$ bit numbers requires the addition of, at most, $n$ numbers shifted relative to each other, for a result which requires $2n + 2$ bits. This can be implemented efficiently by a tree of adders. Pairs of numbers are added at the base of the tree, the result of each pair of these feeds an adder at the next level, and so on. The tree networks developed previously depend on associativity and are applicable to any associative operator which can be realized with an iterative circuit. However, addition has properties beyond associativity, which make possible even faster addition of a set of numbers. That property is: the state or carry bits have the same character as the external output or sum bits; both constitute numbers which, once formed, can be added in further stages. That is, instead of passing carry bits to succeeding stages, they can be output from each cell, to be added later. This property is exploited simply in the circuit in figure 9-7, where an O(n) adder of $n$ $n$-bit binary numbers using this property is shown. This property can also be incorporated in a tree arrangement to construct an O(log n) adder, which will be considered subsequently.

[Figure 9-5: Padding-Save Multiplication - Examples. (a) $M = 5$ times $TC_n(-N)$ with $N = 7$, $n = 3$, giving $A + B$ as the 2's complement of 35; (b1) and (b2) $N = 7$, $M = 7$, giving 49 by the two formulations.]

[Figure 9-6: Multiplication With A Booth Multiplier. The example multiplies $-155$ by $-91$, with the multiplier in Booth form, to give 14105.]

Circuitry for implementing a sequence of additions is shown in figure 9-7. In the network at the top of the figure the binary number $t = t_1 \ldots t_N$ is added to ($u = u_1 \ldots u_N$ shifted once relative to $t$) and the result is added to ($v = v_1 \ldots v_N$ shifted once relative to $u$), all in the lowest or 1st line or stage of full adders. The output of this stage is two binary numbers, the first consisting of its sum outputs, and the second of its carry outputs. The carry outputs are shifted once relative to the sum outputs. In the second stage the sum and carry of the 1st stage are added to the binary number ($w = w_1 \ldots w_N$ shifted once relative to the sum). Again a sum and carry are produced. At each successive stage thereafter, moving upwards, a new binary number shifted relative to the sum of the previous stage is added in. This implementation is a 2-dimensional space network drawn to reflect how such additions with successive addends shifted would look on paper (actually an inverted, flipped image). The final stage differs from the others. It is a 2 input iterative adder, and can be implemented in any of the ways (O(n), O(sqrt(n)), O(log n)) previously described. If $n$ is the length of the multiplier this circuitry requires O(n) time. It must be built large enough to handle $n$ addends, even though in a multiplication on average half the bits in the multiplier are 0.

The circuit in the center of the figure is, in fact, the same parallel array as that at the top. It has been redrawn by shifting successively higher rows of blocks by successively greater distances to the left, so as finally to form a rectangle. This is done so as to show its relation to the time reduced version shown at the bottom. The lowest network shown is 1-dimensional and timed. In it the binary numbers $u$, $v$, $w$, $x$, etc. are applied on succeeding clock pulses. The connections to simulate the last stage binary adder of the space network equivalent are not shown in the timed version, to simplify the figure. They would include a carry from each block to the block to its immediate right, and an output from each block, in addition to the connections already shown. This circuit has the same O(n) complexity as the parallel array at the top, but since addends are entered serially only those generated by 1s in the multiplier need be added, thus allowing an average case speedup.

[Figure 9-7: Product Of An n Bit By A 5 Bit Number, Circuit For Adding Bit Products. Top: the parallel array of full adders with addends t, u, v, w, x, y and outputs $O_0 \ldots O_{n+5}$; center: the same array redrawn as a rectangle; bottom: the timed 1-dimensional version with sum and carry fed back.]

Now consider a tree arrangement in which, as above, sums and carries are treated as separate numbers. Figure 9-8(a) shows the basic 3 input carry save unit, CS-3. CS-3 has 3 binary numbers as input and 2 outputs, the carry (C), shifted one position, and the sum (S) numbers. It is essentially the same as the lowest stage of the circuit in figure 9-7. The two outputs of CS-3 can then be summed in a parallel adder. Two CS-3 units then take a total of 6 binary numbers and each produces a sum and a carry output. The 4 output numbers thus produced can then be added in two additional CS-3 units to again give just a single sum and a single carry number output. This is done in an additional two stages. The resultant arrangement is shown in part (b) of the figure. It is a special case of the recursive tree arrangement in part (c), with $i = j = 1$. In the general case there is a CS-3i unit and a CS-3j unit, $i, j \ge 1$. Their outputs are added in two CS-3 units in two stages to again produce a single sum and carry output. The result is a CS-3i + 3j unit, one with $3i + 3j$ input numbers and just 2 output numbers. These outputs can then be sent to a parallel adder for a final summing. This recursive schema gives a way in which tree networks of these adders can be constructed. In general, if $i = j$, it is clear that two additional stages of CS-3s are used for each doubling of the number of binary number inputs to be added. Assuming the time of computation is equal to the number of stages:

$n = 3 \cdot 2^m$ is the number of inputs, $t_p$ is the time taken by the parallel adder, and $\tau(x)$ is the time taken by the tree network without the final stage of iterative adder. It is clear that:

$$\tau(3) = 1$$

$$\tau(n) = \tau(n/2) + 2$$

and it is easily verified that this is solved by

$$\tau(n) = 1 + 2 \log_2 (n/3)$$

so that the total time $t(n)$, including the parallel adder, is

$$t(n) = 1 + 2 \log_2 (n/3) + t_p$$

$$t(n) = 1 + 2 \log_2 n - 2 \log_2 3 + t_p$$

$$t(n) < 1 + 2 \log_2 n + t_p$$

Since $t_p$ can reasonably be O(log n), this is better at large $n$ than other tree adders we have considered.
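The carry-save idea and the stage count can be sketched as follows. This is an illustration under stated assumptions: `cs3` models a CS-3 unit on non-negative Python integers, `carry_save_sum` uses a simple greedy layer-by-layer reduction (an assumption of this sketch, not the recursive arrangement of figure 9-8), and `tau` evaluates the recurrence given above.

```python
def cs3(a, b, c):
    """Carry-save unit: 3 numbers in, (sum, carry) out, with a + b + c == sum + carry."""
    s = a ^ b ^ c                                   # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1      # carries, shifted one position
    return s, carry


def carry_save_sum(numbers):
    """Reduce non-negative ints with CS-3 units until two remain, then add them.

    Returns (total, stages), where stages counts layers of CS-3 units.
    """
    nums = list(numbers)
    stages = 0
    while len(nums) > 2:
        nxt = []
        i = 0
        while i + 3 <= len(nums):
            nxt.extend(cs3(*nums[i:i + 3]))   # each group of 3 becomes 2 numbers
            i += 3
        nxt.extend(nums[i:])                  # leftovers pass to the next layer
        nums = nxt
        stages += 1
    return sum(nums), stages                  # final two-input (parallel) adder


def tau(n):
    """Stages from the recurrence tau(3) = 1, tau(n) = tau(n/2) + 2, n = 3 * 2^m."""
    return 1 if n == 3 else tau(n // 2) + 2


total, stages = carry_save_sum([3, 5, 7, 9, 11, 13])
assert total == 48 and stages == 3

# tau(n) = 1 + 2 * log2(n / 3) for n = 3 * 2^m
for m in range(6):
    assert tau(3 * 2 ** m) == 1 + 2 * m
```

No carry propagates inside `cs3`, so each stage takes constant time regardless of word length; only the final two-input addition needs carry propagation.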

The numbers to be added in this circuit could come from a 2's complement multiplication. Each bit of the multiplier, starting from the low order bit, selects the multiplicand or all 0s to be added to get the product. These must each be shifted according to the position of the multiplier bit which selects them. The circuitry above assumes the addends are all of the same length, so for use in multiplication assume that the quantities sent to the adder are padded out with 0s to the right, filling the spaces left as a result of the shift, and with 0s or 1s to the left, depending on whether the multiplicand is positive or negative.

In the iterative circuit chapter, carry-lookahead circuitry, which can be developed to give O(log n) time complexity for adding two numbers, was generalized to a system which could implement any iterative circuit in O(log n). To what extent can the current circuitry to add $n$ numbers in O(log n) time be generalized? The essence of this design is the fact that the problem of adding 3 numbers can be transformed into that of adding 2 numbers in constant time with the CS-3 circuit. This certainly does not apply to all iterative circuits, even though each of those with 3 inputs/cell has 2 outputs/cell.

9.3. The Booth Algorithm.

In the implementation of multiplication with a circuit whose addends arrive serially, the average complexity depends on the number of 1s in the multiplier. Each 1 in the multiplier requires the addition of a shifted multiplicand, while a 0 requires only a shift to be applied to the next copy of the multiplicand. In the usual $n$ bit binary representation of the numbers 0 through $2^n - 1$ the average number of 1s is $n/2$. In a more expansive representation with three symbols allowed at a position, 1, 0 or $-1$ (a trit), the total number of 1s of both polarities can be as low as $n/3$ on average. This will only occur with an appropriate choice among the multiple ways to represent a number with trits. Multipliers in 2's complement representation can be transformed uniquely into a trit representation with the desired 1/3 property using the Booth transformation. This transformation of a binary multiplier $x$ is conveniently given as a composition of two simple transformations. The first, $T_1(x) = y$, produces a basic trit representation, but one in which the average number of 1s is not decreased. The second, $T_2(y)$, does finally reduce the average number of 1s. Both transformations are described below and illustrated in figure 9-9.

[Figure 9-8: Carry Save Multiplication, The Adder. (a) the CS-3 unit with inputs T, U, V and outputs sum (S) and carry (C); (b) the CS-3 + 3 unit with inputs X1 through X6; (c) the recursive CS-3i + 3j unit with inputs X1 through X3i+3j, built from a CS-3i unit, a CS-3j unit and two CS-3 units. In each case the S and C outputs go to the parallel adder.]

1. $T_1$ simply replaces a run of 1s from position $j$ to $j + k - 1$, shown as $a$ in the $T_1$ part of the figure, by 1 at $j + k$ and $-1$ at $j$, shown collectively as $c - b$ on the right of the figure. On the left of the figure it is shown that $a + b = c$, justifying the replacement of $a$ with $c - b$.

$T_1$ can be implemented by scanning from the low order to the high order bit (right to left) of the 2's complement multiplier, $b$. On finding 0s it generates 0s in the corresponding positions of its trit representation. On encountering a 1 directly after a 0 (or initially), say at position $j$, at the right of a continuous sequence of 1s, $T_1$ generates a $-1$ in the $j$'th position of the trit representation. For each 1 in the following contiguous sequence of 1s, $T_1$ puts a 0 in the corresponding position of the trit representation. Then when it encounters the first 0 following the contiguous sequence of 1s it produces a $+1$. When $T_1$ finishes scanning the leftmost bit of $b$ it stops.

[Figure 9-9: Booth Transformations T1 and T2. The $T_1$ part shows a run $a$ of 1s between positions $j$ and $j + k$, with $a + b = c$ and hence $a = c - b$; the $T_2$ part shows the replacement of adjacent trits at positions $j$ and $j - 1$.]

In figure 9-10 (a) a state diagram for a deterministic finite state machine with outputs (D FSM-O), called M1, that implements $T_1$ is shown. A is the starting state and the input sequence is assumed to be ended with the special symbol e, which puts M1 in the final state Final. (The Final state has one input, namely e, which takes it to itself.)

$T_1$ replaces a contiguous sub-string of $k$ 1s by two nonzero trits, a $+1$ and a $-1$. So there is a decrease in the number of 1s if $k \ge 3$, but an increase if $k = 1$. The net effect is that the average number of 1s is again $n/2$. Notice also that the output of $T_1$ does not contain any adjacent pairs of $+1$s or of $-1$s. In fact the result, reading right to left, consists exactly of $-1$s alternating with $+1$s, interspersed with 0s. Every way of placing $k$ 1s in $n$ positions, with $n \ge k \ge 1$, will occur, and those 1s will have alternating signs.

2. T2 depends on the fact that $2^j - 2^{j-1} = 2^{j-1}$ and $-2^j + 2^{j-1} = -2^{j-1}$, as illustrated at the bottom of figure 9-9.

T2 takes the output of the T1 transformation and scans it right to left. The input to T2 then is a string t of trits. T2 is looking for either +1 followed directly by −1, or −1 followed directly by +1. If it detects a −1 at position j − 1 followed by +1 at position j, it replaces this with +1 at position j − 1 and 0 at position j. If it detects +1 at position j − 1 followed by −1 at position j, it replaces this with −1 at position j − 1 and 0 at position j.

T2 eliminates all pairs of adjacent 1's, but the 1's no longer alternate between negative and positive with interspersed 0's as after T1.

In figure 9-10 (b) a non-deterministic FSM-O (ND FSM-O), called M2, that implements T2 is given. 1 is the starting state and the input sequence is assumed to be ended with the end symbol e, which puts M2 in state Final. Starting in state 1 the detection of a 1 input may result in a 1 or −1 output depending respectively on whether the input following the 1 is 0 or −1. So there are two transitions shown from initial state 1, both with inputs of 1, but the following states 2 and 3 each allow a single input, namely −1 and 0 respectively. A similar description applies to the non-deterministic representation of the results of an initial input of −1, which depends on the input following it.
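The two scans described in items 1 and 2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the FSM implementation of the text: the names `t1`, `t2` and `trit_value` are hypothetical, trits are stored least significant first, and `t2` assumes that opposite-sign pairs are resolved greedily from the right.

```python
def t1(x, n):
    """T1: recode the n-bit 2's complement pattern of x into trits (LSB first)."""
    trits = []
    prev = 0  # imaginary 0 to the right of position 0
    for pos in range(n):
        bit = (x >> pos) & 1
        # 1 after 0 gives -1 (run starts), 0 after 1 gives +1 (run ended),
        # and an unchanged bit gives 0
        trits.append(prev - bit)
        prev = bit
    return trits


def t2(trits):
    """T2: replace each adjacent opposite-sign pair by a single nonzero trit."""
    out = list(trits)
    j = 1
    while j < len(out):
        if out[j - 1] != 0 and out[j] == -out[j - 1]:
            # -1 at j-1, +1 at j  ->  +1 at j-1, 0 at j  (and symmetrically)
            out[j - 1] = out[j]
            out[j] = 0
            j += 2  # assumption: the pair just rewritten is not reused
        else:
            j += 1
    return out


def trit_value(trits):
    """Value of a trit string stored least significant first."""
    return sum(t * (1 << i) for i, t in enumerate(trits))
```

For the 4-bit multiplier 5 (binary 0101), `t1(5, 4)` gives `[-1, 1, -1, 1]` and `t2` turns this into `[1, 0, 1, 0]`, which still has the value 5 but contains only two nonzero trits.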

In figure 9-10 (c) an ND FSM-O, M, which is the composition of those in (a) and (b), is shown. That is, it is mechanically constructed as follows: Assume M1 and M2 are both in their initial states, A and 1; this corresponds to M being in its initial state (A,1). Now an input i is presented to M1, resulting in next state δ(A,i) and output o(A,i); that output is the input to M2 in state 1, resulting in a next state and output of M2, namely δ(1,o(A,i)) and o(1,o(A,i)). So in M the state that results from input i when M is in state (A,1) is (δ(A,i), δ(1,o(A,i))) and its output is o(1,o(A,i)). The same procedure applies if (A,1) is replaced by any state in M.
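The mechanical composition can be sketched as follows. The table `M1` follows the description of T1 above; the table `M2` and the set `M2_END_OK` are an assumed reading of figure 9-10 (b) (state 2 or 4 waits for the opposite trit, state 3 or 5 waits for a 0), and all function names are hypothetical. The sketch explores every non-deterministic run and keeps those that can accept the end symbol e.

```python
# M1 of figure 9-10 (a): (state, input bit) -> (next state, output trit)
M1 = {("A", 0): ("A", 0), ("A", 1): ("B", -1),
      ("B", 1): ("B", 0), ("B", 0): ("A", 1)}

# Assumed reading of M2: (state, input trit) -> [(next state, output trit), ...]
M2 = {
    (1, 0): [(1, 0)],
    (1, 1): [(2, -1), (3, 1)],
    (1, -1): [(4, 1), (5, -1)],
    (2, -1): [(1, 0)],
    (3, 0): [(1, 0)],
    (4, 1): [(1, 0)],
    (5, 0): [(1, 0)],
}
M2_END_OK = {1, 3, 5}  # assumption: states of M2 to which e is applicable


def compose_step(state, bit):
    """All (next state, output) moves of M from the composite state (p, q)."""
    p, q = state
    p_next, middle = M1[(p, bit)]        # the output of M1 ...
    return [((p_next, q_next), out)      # ... is the input of M2
            for q_next, out in M2.get((q, middle), [])]


def run_composite(bits):
    """Outputs of every run of M on bits (LSB first) that can accept e."""
    paths = [(("A", 1), [])]             # initial state (A,1), no output yet
    for bit in bits:
        paths = [(nxt, outs + [out])
                 for state, outs in paths
                 for nxt, out in compose_step(state, bit)]
    return [outs for (p, q), outs in paths if q in M2_END_OK]
```

Under these assumptions `run_composite([1, 0, 1, 0])` (the multiplier 5, low order bit first) leaves exactly one surviving run, with output `[1, 0, 1, 0]`; the other branches die because the trit they guessed would follow never arrives.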

The result of the transformations given above is a trit representation which is called the Optimal Booth Transformation, or OB. M will be converted to a D FSM-O to realize the OB in the context of a general discussion of such transformations.

9.3.0.1. The Booth Transformation Is Correct

It is now clear that the transformation T2(T1(X)) results in a legitimate trit representation of X if X is a positive number. In figure 9-11 a picture of a binary number X is shown. It is 0 at the extreme left (the sign bit) and moving rightward rises to 1 for a number of positions, then again drops to 0, then rises to 1 for a single position, dropping again to 0 for several positions, etc. The result of the T1 transformation on this number is shown directly above it. And then the result of T2 applied to that is shown above that.

The T2(T1(−X)), OB, transformation is also legitimate if −X is the 2's complement of X. Just below X the 1's complement of X is shown. Below that is shown the 1 which is added to the 1's complement of X to get the 2's complement of X, called −X, shown next in the figure. Below this the result of the T1 transformation on −X is shown. And finally the result of T2 applied to that is shown at the bottom of figure 9-11. The proof that, at least for this example, this is legitimate is that for every 1 in T2(T1(X)) at the top there is a 1 of opposite polarity in the T2(T1(−X)) representation at the bottom. To the extent that X represents all the possibilities in a binary number which can affect the outcome of the transformations this is a proof. One place where it may fail to do this is at its low order bit, which, if it were 1 instead of 0, would give a different picture. However this is not a significant difference since in either case the addition of a 1 to the 1's complement produces, at the rightmost 0 in the 1's complement, a 1 with all 0's to its right in the resultant 2's complement.

Figure 9-10: Development Of T1-T2 Optimal Booth (OB) Non-Deterministic FSM-O
[Panels: (a) M1, (b) M2, (c) M.]

9.3.1. Division

Division is the inverse of multiplication. Given a multiplication's product and its multiplier, the division algorithm computes the multiplicand. When doing division these quantities are named, respectively, the dividend, the divisor and the quotient. Division will require a shifted series of subtractions to undo the shifted series of additions of the corresponding multiplication.

Consider the multiplication X ⋅ Y = Z using 2's complement numbers with Z being negative. The final 2's complement representation of Z is actually a truncation of the result of that multiplication. This is illustrated in figure 9-3. In (b1) −3 ⋅ 4 = −12 and in (b2) −4 ⋅ 3 = −12 are shown in detail. Although the truncated results of the multiplications are the same, the complete results, of 8 bits, including the truncated bits, are different. A straightforward division by −3 in (b1) and −4 in (b2) gives the correct result provided the complete products are used as the dividends. If the certainly

Figure 9-11: Transformations T1 and T2 Work For 2s-complement Numbers

correct representation of −12 in the truncated 6 bits were used the division results would be wrong.

On the other hand, division in which both the given dividend and divisor are positive, representing the magnitudes of their signed values, is straightforward. If the numbers are given in 2's complement representation they can be converted to sign-magnitude. Dealing directly with the numbers in their 2's complement representation generally requires handling many extra bits. Subtraction is a frequent operation in the division algorithm. Subtraction can be done by adding the 2's complement of the number to be subtracted. So, if an adder is to be used for subtraction in the division algorithm, it will be necessary to apply 2's complement transformations in any case.

Basically the division algorithm requires choosing the highest power of 2 by which to multiply the divisor, d, and still have a result ≤ the dividend, D. The binary number representing that power of 2 is or'ed (+) into a quotient register, Q. Then, performing the multiplication of that power of 2 with d and subtracting the result from D, we get D1. D1 is the number remaining that must still be given as a product of d and other powers of 2, whose representations will be or'ed into Q. The division process can be succinctly defined recursively:

    $D_0 = D$ = dividend
    $d$ = divisor
    $Q = 0$

    If $D_{i-1} \ge d$:
        $D_i = D_{i-1} - 2^h d$
        $Q = Q + [\text{binary representation of } 2^h]$
        where $h$ is the highest power of 2 for which this result is non-negative.

    If $D_{i-1} < d$:
        Quotient = $Q$, Remainder = $D_{i-1}$

In figure 9-12 a register transfer program and a corresponding register transfer block diagram for implementing this definition are given. A complete example is also given. $D_i$ is obtained from $D_{i-1}$ by assigning the modification of D back to D. The program determines the highest power of 2 times d which is ≤ D by determining the number of shifts necessary to left justify d with the current value of D. This requires shifting d and D relative to each other. To do this auxiliary registers d1 and Q1 are used. These shifts can be handled in a number of ways. In the example we have always shifted d left until its high order bit lines up with the high order bit of D. There are other ways of doing the shifting which may be more efficient. The register transfer block diagram is somewhat non-committal on the shifting. It assumes the Q and Q1 registers are shift registers. It sends the D and d1 registers to the "Justify" functional block. This block determines the shift (power of 2 product) necessary for d1 to have its high order bit aligned with that of D, and sends this value to d1 and Q1. This can be accomplished by shifting those registers relative to D, or even moving their contents the distance necessary for alignment.
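The recursive definition and the left-justify step can be sketched in Python as follows. This is a minimal sketch for non-negative operands, not the register transfer program itself: the function name `divide` is hypothetical, Python integers stand in for the registers D, d1, Q and Q1, and `int.bit_length` is used in place of the "Justify" block.

```python
def divide(D, d):
    """Shift-and-subtract division of D by d; returns (quotient, remainder)."""
    if d <= 0 or D < 0:
        raise ValueError("sketch assumes D >= 0 and d > 0")
    Q = 0
    while D >= d:
        d1, Q1 = d, 1
        # left justify d1 with D: line up the high order 1 bits
        shift = D.bit_length() - d1.bit_length()
        d1 <<= shift
        Q1 <<= shift
        if D < d1:
            # one place too far: shift d1 and Q1 back right by 1
            d1 >>= 1
            Q1 >>= 1
        Q |= Q1      # or the power of 2 into the quotient
        D -= d1      # D_i = D_{i-1} - 2^h * d
    return Q, D
```

With the operands of the example, `divide(109, 5)` subtracts 80, 20 and 5 in turn and returns `(21, 4)`, that is 5 × 21 + 4 = 109.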

9.3.2. Floating Point

A typical floating point number consists of

1. an exponent called E with e + 1 bits: [e...0], typically represented in 2′s complement.

2. a fractional part, called M′, with m + 1 bits: [m...0]. It is always in sign-magnitude form. Bit m is the sign bit; bits [m − 1...0] are positive and are referred to as the fractional magnitude, symbolized by M.

Throughout the discussion examples will be given using the numerical values e = 7, m = 23.

In this discussion, unless otherwise stated, we assume base 2 representation of numbers.

Determining the larger of two numbers is frequently required. With floating point numbers, comparison of exponents is basic to that determination. In order to make the comparison easier, exponents are often kept in excess $2^e$ (excess 128) form, which is always positive.

The actual exponent in 2′s complement form E is converted to $E_x = E + 2^e$ before it is stored. So

1. If E is negative, E = −X, X positive, so in 2′s complement:
   $E = 2^e + (2^e - X)$
   $E_x = 2^e + (2^e - X) + 2^e$
   $E_x = 2^{e+1} + (2^e - X)$, which because of overflow out of the sign bit leaves: $E_x = 2^e - X$.

DIVISION (register transfer program; ← is assignment, + = logical or, − = subtraction)

    Initially: d is the divisor (≥ 0)
               D is the dividend (≥ 0)
               Q ← 0 [it will contain the quotient at the end]
               Q1 ← 1

    L: if (D > d) then
         { d1 = d;
           (d1,Q1) ← shft(d1,Q1) so d1(hiorder) = D(hiorder);
           if (D < d1) then (d1,Q1) ← shft->(d1,Q1)->1;
           Q ← Q + Q1; Q1 ← 1; D ← D − d1;
           goto L; }
       else done (Quotient = Q, Remainder = D)

EXAMPLE:

    d  = 0 0 0 0 0 0 1 0 1 = 5
    D  = 0 0 1 1 0 1 1 0 1 = 109 [initial]
    Q  = 0 0 0 0 0 0 0 0 0
    Q1 = 0 0 0 0 0 0 0 0 1

    D > d so d1 = 0 0 0 0 0 0 1 0 1
      d1 = 0 0 1 0 1 0 0 0 0 (shift(d1,Q1)<- 4)   Q1 = 1 0 0 0 0
      D > d1; no shft(d1,Q1)->1
      Q = Q+Q1 = 1 0 0 0 0 = 16
      D  = 0 0 1 1 0 1 1 0 1
      d1 = 0 0 1 0 1 0 0 0 0
      D  = 0 0 0 0 1 1 1 0 1

    D > d so d1 = 0 0 0 0 0 0 1 0 1
      d1 = 0 0 0 0 1 0 1 0 0 (shift(d1,Q1)<- 2)   Q1 = 1 0 0
      D > d1; no shft(d1,Q1)->1
      Q = Q+Q1 = 1 0 1 0 0 = 20
      D  = 0 0 0 0 1 1 1 0 1
      d1 = 0 0 0 0 1 0 1 0 0
      D  = 0 0 0 0 0 1 0 0 1

    D > d so d1 = 0 0 0 0 0 0 1 0 1
      d1 = 0 0 0 0 0 1 0 1 0 (shft(d1,Q1)<-1)     Q1 = 0 0 0 0 0 0 0 1 0
      D < d1; yes shft(d1,Q1)->1
      d1 = 0 0 0 0 0 0 1 0 1                      Q1 = 0 0 0 0 0 0 0 0 1
      Q = Q+Q1 = 1 0 1 0 1 = 21
      D  = 0 0 0 0 0 1 0 0 1
      d1 = 0 0 0 0 0 0 1 0 1
      D  = 0 0 0 0 0 0 1 0 0 = 4 [final]

    D < d so done (Q = 1 0 1 0 1, Remainder = 1 0 0)

    d X Q + D [final] = D [initial]
    5 X 21 + 4 = 109

[Register transfer block diagram: registers d, d1, D, Q, Q1 on buses A and B; functional blocks or, subtract, Justify and the comparison >?; CONTROL signals shft->1, shft<-1, tst (yes/no), or, subtract, lft just shft.]

Figure 9-12: Division

2. If E is a positive number X: $E_x = 2^e + X$.

Now the exponents $E_x$ are interpreted for comparison as positive numbers with e + 1 bits. So it is only necessary to decide which of two positive numbers is greater.

For operations other than comparison the exponents should be put in their original form again. This may be done by again adding $2^e$. This will return the negative excess number to its original form. It does the same for the positive number, with overflow clearing the sign bit.
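The conversion to excess $2^e$ form and back can be sketched on (e + 1)-bit patterns as follows, using e = 7 as in the running example. The names `E_BITS`, `BIAS`, `MASK` and the four helper functions are hypothetical; masking with `MASK` models the overflow out of the sign bit.

```python
E_BITS = 8                    # e + 1 bits, with e = 7
BIAS = 1 << (E_BITS - 1)      # 2^e = 128
MASK = (1 << E_BITS) - 1      # keeps e + 1 bits, dropping any overflow


def twos_pattern(E):
    """(e + 1)-bit 2's complement bit pattern of the exponent E."""
    return E & MASK


def to_excess(pattern):
    """Add 2^e to a 2's complement pattern; overflow is discarded."""
    return (pattern + BIAS) & MASK


def from_excess(ex):
    """Adding 2^e once more restores the 2's complement pattern."""
    return (ex + BIAS) & MASK


def signed(pattern):
    """Read an (e + 1)-bit pattern as a 2's complement number."""
    return pattern - (1 << E_BITS) if pattern & BIAS else pattern
```

For E = −3 the stored value is `to_excess(twos_pattern(-3))`, which is 125, and for E = 3 it is 131, so comparing the stored values as unsigned numbers orders the exponents correctly.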

The fractional magnitude, M, is usually given as a number less than 1. To do that the magnitude of the original number, N, which may not be a fraction, is normalized for storage by shifting N and adjusting the exponent accordingly. If N is not a fraction, it is shifted right r times so that the binary point is just to the left of the most significant (non-0) bit, and r is added to the exponent E. Similarly if N is less than $2^{-1}$, it is shifted left l times until the first 1 in N is in the leftmost position in the fractional part, and l is subtracted from the exponent.
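Normalization can be sketched with exact rational arithmetic as follows. The function name `normalize` is hypothetical, and `fractions.Fraction` is used only so that the shifts lose no bits; a real register has a fixed width.

```python
from fractions import Fraction


def normalize(N, E=0):
    """Return (M, E2) with N * 2**E == M * 2**E2 and 1/2 <= M < 1, for N > 0."""
    if N <= 0:
        raise ValueError("sketch assumes a positive magnitude")
    M = Fraction(N)
    while M >= 1:                 # not a fraction: shift right, add to exponent
        M /= 2
        E += 1
    while M < Fraction(1, 2):     # leading zeros: shift left, subtract
        M *= 2
        E -= 1
    return M, E
```

For example `normalize(5)` returns `(Fraction(5, 8), 3)`, since 5 = 0.101 (binary) × 2^3, and `normalize(Fraction(3, 32))` returns `(Fraction(3, 4), -3)`. For ordinary Python floats the standard library function `math.frexp` performs the same split.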

Often, when normalized mod 2 as above, the leading 1 in the fractional part is omitted and understood to be present. (If it is omitted, 0 and 1/2 have the same representation. An additional convention is needed to distinguish them, e.g. a leading 1.) The standard convention is that $2^e$ in the exponent is to represent 0 independent of the fractional part. This means that the value of $-2^{2^e}$ is no longer available. When this is done excess $2^e - 1$ (excess 127) is the standard coding for comparison. It is enough to still assure that all numbers will be positive after the addition. It is also common to represent numbers in base $2^k$ and normalize them mod $2^k$, in which case the shifts are done until the first k bits contain a 1 (but in this case all bits are retained). The different normalizations result in different ranges of numbers represented and different average amounts of shifting required.

The exponent E in 2′s complement can have any value from $-2^e$ through 0 to $2^e - 1$. So E always falls within a range of $2^{e+1}$.

M is positive, and less than 1, and because of normalization, the leftmost bit of M is 1 for every number, except 0, so M is between $2^{-1}$ and $2^0 - 2^{-m}$ (bits m − 1 through 0 are all 1's). This is shown in figure 9-13.

[Bit positions m (sign), m − 1, ..., 1, 0 carry the values sign, $2^{-1}$, $2^{-2}$, ..., $2^{-m+1}$, $2^{-m}$; with bits m − 1 through 0 all 1 the magnitude is $2^0 - 2^{-m}$.]

Figure 9-13: Floating Point Range

Assuming normalization, the largest value of M is $2^0 - 2^{-m}$, and the largest exponent represented by E is $2^e - 1$. So the largest number that can be represented is $(2^0 - 2^{-m})\,2^{2^e - 1}$. The most negative number is $-(2^0 - 2^{-m})\,2^{2^e - 1}$. So the range is $(2^0 - 2^{-m})\,2^{2^e}$, which is approximately $2^{2^e}$ ($2^{128}$).

The range is enormous, but the numbers that can be represented are scattered non-uniformly over that range. The smallest difference between two successive numbers is $2^{-m}(2^E)$. So when $2^E$ is smallest, E is at its most negative and therefore at small numbers, that difference is $2^{-m}(2^{-2^e}) = 2^{(-2^e - m)}$ ($2^{-151}$), a very small number. On the other hand when E is at its highest, $E = 2^e - 1$, the difference is $2^{-m}(2^{2^e - 1}) = 2^{(2^e - (m + 1))}$ ($2^{104}$). Since generally $2^e - 1 \gg (m + 1)$, it has a positive exponent, and is very large. So there is a great range in the difference between successive numbers; being very small at small numbers, rising by multiples of 2, to very large at large numbers.
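The quoted figures can be checked with exact arithmetic. The sketch below assumes e = 7 and m = 23 (the values that reproduce the exponents −151 and 104 above); the function name `fp_range` is hypothetical.

```python
from fractions import Fraction


def fp_range(e, m):
    """Largest magnitude and the smallest/largest gap between neighbours."""
    max_M = 1 - Fraction(1, 2 ** m)       # 2^0 - 2^-m
    max_E = 2 ** e - 1                    # largest exponent
    min_E = -(2 ** e)                     # most negative exponent
    two = Fraction(2)
    largest = max_M * two ** max_E
    gap_small = two ** (min_E - m)        # 2^-m * 2^E at the smallest E
    gap_large = two ** (max_E - m)        # 2^-m * 2^E at the largest E
    return largest, gap_small, gap_large
```

Calling `fp_range(7, 23)` gives a smallest gap of 2^−151 and a largest gap of 2^104, and twice the largest magnitude is (1 − 2^−23) × 2^128.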

9.3.3. Implementing Floating Point Addition-Pipeline

The design of a floating point adder requires execution of a number of steps, namely,

1. determination of the larger exponent,

2. determining its difference from the smaller exponent, say d,

3. shifting the fractional part of the number with the smaller exponent d places right,

4. equalizing the exponents to the larger value

5. adding the resultant fractional parts, and finally

6. re-normalizing.

So, assuming the existence of the necessary functional units (2's complement adders, difference or circuit, shifter, and an assortment of registers), it is still necessary to sequence these operations. The same approach used for design of the CPU is applicable. That is, assuming the functional units as well as a few registers are connected to common buses, it is only necessary to specify a sequence of register transfers to and from functional units, and the execution of these functions. The sequencing is to be implemented by a control unit as in the CPU. The configuration of registers and functional units required, their register transfer code, and instruction task table (union) is shown in figure 9-14. In that figure functional unit names start with "f" and those of registers start with "r" or "R". The goal is to add $f_1 2^{e_1}$ to $f_2 2^{e_2}$.
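The six steps listed above can be sketched, without any pipelining, as a single Python function. This is a simplified illustration: the name `fp_add` is hypothetical, both operands are assumed to be positive and normalized, and `fractions.Fraction` keeps all bits, so the truncation that a real shifter would cause is not modelled.

```python
from fractions import Fraction


def fp_add(f1, e1, f2, e2):
    """Add f1 * 2**e1 and f2 * 2**e2; returns (fraction, exponent)."""
    # 1. determine the larger exponent
    if e1 >= e2:
        big_f, big_e, small_f, small_e = f1, e1, f2, e2
    else:
        big_f, big_e, small_f, small_e = f2, e2, f1, e1
    # 2. its difference from the smaller exponent
    d = big_e - small_e
    # 3. shift the fraction with the smaller exponent d places right
    small_f = Fraction(small_f) / 2 ** d
    # 4. equalize the exponents to the larger value
    e_sum = big_e
    # 5. add the resultant fractional parts
    f_sum = Fraction(big_f) + small_f
    # 6. re-normalize: the sum is below 2, so one right shift suffices
    if f_sum >= 1:
        f_sum /= 2
        e_sum += 1
    return f_sum, e_sum
```

For example 5 = (5/8) × 2^3 and 3 = (3/4) × 2^2, and `fp_add(Fraction(5, 8), 3, Fraction(3, 4), 2)` returns `(Fraction(1, 2), 4)`, that is 8.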

In figure 9-15 the task table from figure 9-14 is again shown at (a). In the first stage re1 and re2 (re(1&2)), and rf1 and rf2 (rf(1&2)), are assumed initially to hold the exponents and fractional parts of the floating point numbers to be added. The re(1&2) are read, "r", and written, "w", into fDIF, which is executed to give e2-e1 at the output of fDIF. In the table r means read when there actually is a read from one unit to another, so there is always an accompanying write, w, in another row. r′ indicates that during the stage the information is maintained in the unit though not used until a later stage. x indicates an execution of a functional unit.

Based on this table the optimal task interval, OTI, for synchronous pipelining is 4. This can be improved to an OTI of 2 by duplicating fSHFTREG and fEXP as shown in (b). The actual time represented by this "2" is the maximum time of every odd-even pair of successive stages. The stages (columns) that take the most time are those requiring an execution (x's). So an odd-even pair of successive stages with x's is undesirable. It might then pay to add some stages to split up such pairs of stages. Such additions will increase the latency, i.e., the time from start to finish of a single task, but will increase the throughput. In (c) the result of adding 3 stages is shown, so latency is now 8. There are now executions in columns 1, 3 and 5 only, thus decreasing the actual time involved in a task interval of 2. However the table now has a maximum forbidden interval of 4. This however can be reduced to 2 again by replication of devices similar to the replications which resulted in (b) from (a). It appears that as many as 7 registers used as shift registers would be needed. Is it worth it?

    TRANSFERS                   OPERATION        BUS   STAGE

    fDIF1 <--- re1                               A     1
    fDIF2 <--- re2                               B     1
    rdif <--- fDIF3                              C     1
    if fDIF3 > 0 then           [dif-dec]              2
      { rEsum <--- re1                           B     2
        fSHFTRG <--- rf2        [shift->rdif]    A     2
        r2 <--- rf1 }                            C     2
    else                        [dif-def]              2
      { rEsum <--- re2                           B     3
        rdif <--- rdif(noop)    [compl]                2
        fSHFTRG <--- rf1(rdif)  [shift->rdif]    A     3
        r2 <--- rf2 }                            C     3
    fSUM1 <--- fSHFTRG                                 4
    fSUM2 <--- r2                                B     4
    rFsum <--- fSUM3                             C     5
    if fSUM3 >= 1 then          [sum-dec]              5
        fSHFTRG <--- rFsum      [shift->1]       A     5
        fINCR <--- rEsum        [incr]           C     5

[Task table: devices fSUM, fSHFTRG, fDIF, fINCR, re(1&2), rf(1&2), r2, rdif, rEsum, rFsum against stages 1 to 5, with entries r, w, x and r′. Block diagram: the same devices on buses A, B and C, with CONTROL signals compl, shft->dif, dif-dec, shft->1, add, incr, sum-dec.]

Figure 9-14: Pipeline For Floating Point Addition

9.4. Implementation Of Booth Algorithms And Finite State Machines

In this section the implementation of Booth Algorithms (there are two such algorithms considered here), whose study starts with the Non-Deterministic FSM given in figure 9-10, is studied. The implementations and their properties are considered in great detail. The techniques developed for their study are generalized so they may be applied to any FSM. Included also is the remarkably efficient implementation of addition of Booth notation numbers.

Figure 9-15: Task Tables For Floating Point Additions

9.4.1. From Non-Deterministic To Deterministic FSM-O

The ND FSM-O in figure 9-10 describes the actions necessary to implement the OB algorithm. In general however an ND FSM-O may or may not define a unique output for every acceptable input. Generally there are three possibilities:

1. It does not actually specify a unique output sequence for every input sequence.

2. It does specify a unique output sequence for every input sequence which can beimplemented deterministically, but requires a more powerful machine than an FSM-O.

3. It does specify a unique output sequence for every input sequence which can beimplemented deterministically with an FSM-O.

It is the last case we are interested in. First, some more detailed definitions are given.

9.4.2. ND To k-Lookahead D FSM-O, Definitions and Tests

A Non-Deterministic FSM-O consists of:

1. A finite set of states Q, including a single Initial state and a single Final state. The Final state has one input, namely e, which takes it to itself.

2. A finite set of inputs I, including a unique end symbol e.

3. A finite set of outputs O.

4. For each state q ∈ Q there is a subset of the inputs in I which are applicable to q.

5. A next state function: ∆(q,i) = S, where q ∈ Q, i is applicable to q, and S, a subset of Q, is the set of all next states which result when i is applied to q. ∆(q,e) = {Final}.

6. An output function o(q, i, q′) ∈ O, where q ∈ Q, q′ ∈ ∆(q,i), i ∈ I.

[Panels: (a) M’2, (b) M’, (c) M’ (Simplified). A label e/j indicates a transition to state Final with input e and output j.]

Figure 9-16: 1-Lookahead Deterministic M’2 / M’ Equivalent To M2 / M

The input sequence $I = \langle i_1, i_2, \ldots, i_n \rangle$, with $i_n = e$, generates the state sequence $R = \langle q_0, q_1, q_2, \ldots, q_n \rangle$ where $q_0$ = Initial and, for all j, $i_j$ is applicable to $q_{j-1}$ and $q_j \in \Delta(q_{j-1}, i_j)$; there may be more than one such state sequence. The state sequence in turn generates the output sequence $\langle o_1, o_2, \ldots, o_n \rangle$ where $o_j = o(q_{j-1}, i_j, q_j)$ and $i_j$ is applicable to $q_{j-1}$; there is only one output sequence generated by each generated state sequence R.

An ND FSM-O, M, is state k-lookahead if each input sequence generates a unique state sequence in which each next state is determined uniquely by the current state, the current input, and no more than k subsequent inputs; k being the smallest integer for which this is true. If k = 0 then M is a deterministic FSM-O.

If a machine M satisfies the conditions of the above paragraph with state sequence replaced with output sequence, then the ND FSM-O M is output k-lookahead.

Note that, because each input sequence ends with e, M will be in a state, s, other than Final when fewer than k subsequent inputs are still to come. Thus for M to be k-lookahead the next state after s must be determined by fewer than k subsequent inputs, at least for some states of M. An example is given in figure 9-10 (c) where from B1 with a 0 input M can go to either A2 or A3, looking 1 input ahead. Which of these two it will go to is determined, but it may terminate without ever getting 1 more input. The e indicates termination and this determines that the next state must be A3, so the output is 1.

A general test will be developed for determining whether an ND FSM-O, M, is state k-lookahead; then it will be expanded to give a sufficient condition for M to be output k-lookahead.

9.4.2.1. Design Of D FSM-O Equivalent To k-Lookahead ND FSM-O

The task now is to convert an ND FSM-O M which is k-lookahead into an equivalent D FSM-O, M’, whose starting state is Start, and whose output is delayed. For an input sequence I to M let O(Initial, I) be the resultant output sequence; then, with input sequence $I \,\|\, e^k$, the output from M’ is $\lambda^k \,\|\, O(\text{Initial}, I)$. ($\lambda^k$ and $e^k$ are k-length sequences of null output and special input respectively.)

The nondeterministic FSM-Os M2 and M in figure 9-10 (b) and (c) are both 1-lookahead. The FSM-Os M’2 and M’ in figure 9-16 (a) and (b) are respectively the deterministic 1-lookahead equivalents to M2 and M in figure 9-10.

If a non-deterministic FSM-O, M, is 1-lookahead then knowledge of M's state q, its current input i, and the next input j is sometimes necessary and always sufficient to determine both the next state after q, designated δ′(q,i⋅j), and the output that results during this transition, designated o′(q,i⋅j). But having to know the current and next input to determine the state means that a deterministic equivalent to M, M’, must have state designations different than M's. In order to gather all the information required to determine a new state into a state-input designation, M’ combines a state of M and an input as its state designation. So:

M’ starts in an initial state, Start; after the first input M’ is never in this state again. All other states of M’ are designated with a pair (q,i) where q is a state of M, and i an input to M or an e. In M’ the next state δ((q,i),j) = δ′(q,i⋅j) and the output o((q,i),j) = o′(q,i⋅j) must each be a function of M’'s state and its current input. In figure 9-17 rules for constructing M’ from M are given. The first line under 1-lookahead says that with any input i from state Start of M’ the output is λ, and the next state is the state-input pair (Initial, i) where Initial is the starting state of M. The next line says that from the state (q,i) of M’, an input of j results in an output of o((q,i),j) = o′(q,i⋅j) and a next state of δ((q,i),j) = δ′(q,i⋅j). The interpretation of the following lines is similar to that of the lines just described. They deal with

1-Lookahead  [The input sequence should end with 2 e's]

    Start   --i / λ-->             (Initial, i)         ∀ i applicable to Initial
    (q, i)  --j / o((q,i), j)-->   (δ((q,i), j), j)     ∀ i,j applicable to q
    (q, i)  --e / o((q,i), e)-->   (δ((q,i), e), e)     ∀ i,e applicable to q
    (q, e)  --e / o(q,e)-->        Final                ∀ e applicable to q
    Final   --e / λ-->             Final

    where (δ((q,i), j), j) = (δ’(q,i:j), j) and o((q,i), j) = o’(q,i:j);
    δ’(q,i:j) / o’(q,i:j) is the next state/output after state q given that the current input is i and the following one is j.

    - - - - - - - - - - - - - - - - - - - -

    Start          --i / λ-->             ( [∆({Initial}, i)], (Initial, i) )
    ( [Q], (q,i) ) --j / o((q,i), j)-->   ( [∆(Q, j)], (δ((q,i), j), j) )

2-Lookahead  [The input sequence should end with 3 e's]

    Start           --i / λ-->                 (Initial, <i>)               ∀ i applicable to Initial
    (Initial, <i>)  --j / λ-->                 (Initial, <i,j>)             ∀ i,j applicable to Initial
    (q, <i,j>)      --k / o((q,<i,j>), k)-->   (δ((q,<i,j>), k), <j,k>)     ∀ i,j,k applicable to q
    (q, <i,j>)      --e / o((q,<i,j>), e)-->   (δ((q,<i,j>), e), <j,e>)     ∀ i,j,e applicable to q
    (q, <j,e>)      --e / o((q,<j,e>), e)-->   (δ((q,j), e), e)             ∀ j,e applicable to q
    (q, e)          --e / o(q,e)-->            Final                        ∀ e applicable to q
    Final           --e / λ-->                 Final

    where (δ((q,<i,j>), k), <j,k>) = (δ’(q,i:j,k), <j,k>) and o((q,<i,j>), k) = o’(q,i:j,k);
    δ’(q,i:j,k) / o’(q,i:j,k) is the next state/output after state q given that the current input is i and the following two are j, k.

Figure 9-17: Transforming Of Non-Deterministic To k-Lookahead Deterministic FSM-O

the end input e. Notice that two e's are necessary to tease out the final outputs, because of the delayed output. These five lines give a complete construction of M’. Below this description is a dashed line, and below this is another alternative way to implement M’.
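The 1-lookahead construction can be sketched in Python for the machine M2. The transition table `M2` and the set `END_OK` are an assumed reading of figure 9-10 (b), not a transcription of it, and the names `applicable`, `build_lookahead1` and `run` are hypothetical. Each state of the deterministic machine is `"Start"`, `"Final"` or a pair (q, i) of a state of M2 and the pending input.

```python
E = "e"          # end symbol
LAMBDA = None    # null output

# Assumed reading of M2: (state, input) -> [(next state, output), ...]
M2 = {
    (1, 0): [(1, 0)],
    (1, 1): [(2, -1), (3, 1)],
    (1, -1): [(4, 1), (5, -1)],
    (2, -1): [(1, 0)],
    (3, 0): [(1, 0)],
    (4, 1): [(1, 0)],
    (5, 0): [(1, 0)],
}
END_OK = {1, 3, 5}   # assumption: states to which e is applicable


def applicable(q, j):
    return q in END_OK if j == E else (q, j) in M2


def build_lookahead1(initial=1):
    """Table (state, input) -> (next state, delayed output) of the D FSM-O."""
    table = {}
    for i in (0, 1, -1):
        if applicable(initial, i):
            table[("Start", i)] = ((initial, i), LAMBDA)
    for (q, i), moves in M2.items():
        for j in (0, 1, -1, E):
            # the next input j selects the move whose target accepts j
            live = [(nq, out) for nq, out in moves if applicable(nq, j)]
            if len(live) > 1:
                raise ValueError("machine is not 1-lookahead")
            if live:
                nq, out = live[0]
                table[((q, i), j)] = ((nq, j), out)
    for q in END_OK:
        table[((q, E), E)] = ("Final", LAMBDA)
    return table


def run(table, trits):
    """Feed a non-empty trit string (LSB first) followed by two e's."""
    state, outputs = "Start", []
    for symbol in list(trits) + [E, E]:
        state, out = table[(state, symbol)]
        if out is not LAMBDA:
            outputs.append(out)
    return outputs
```

With this table `run(build_lookahead1(), [-1, 1, -1, 1])` returns `[1, 0, 1, 0]`: every output appears one input late, and the first e flushes the output for the last trit.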

This alternative has a more extensive designation for a state. It includes the designation (q,i), the state-input designation of the previous construction, and adds a set-of-states [Q]. It is this set-of-states component which requires additional explanation. Recall that ∆(Q,i) = {δ(q,i) | q ∈ Q}. The form of an M’ state when this information is included is ([Q],(q,i)). Note that such a state designation has the interpretation that ∆(q,i) = Q in M.

The state transitions and outputs with this additional state information are indicated by the rules in figure 9-17. This formulation of M’ is used in determining equivalent states in the resultant deterministic 1-lookahead FSM-O, as well as determining if, in fact, M’ is 1-lookahead (and more generally k-lookahead).

1. If the set-of-states component (Q) of a state of M’ is a single state then the output o(q,i) is unique and j/o(q,i) is the output associated with the application of j in M’.

2. If Q, the set-of-states component of a state of M’, has more than one state, either

   a. going from q with input i to each state in Q gives the same output, or

   b. it gives different outputs but j is only applicable to states, x, in Q, for which the transition from q to x with i gives the same outputs.

Certainly two states of M’2 or M’ are equivalent if both their set-of-states and state-input components are respectively equal. They may also be equivalent if only the set-of-states parts are equal.

In figure 9-16 the deterministic equivalents have been constructed according to the rules in figure 9-17. In M’2 there are two equivalent states, namely (2,1), [1] and (4,1), [1]. Since only the set-of-states components of the two are identical they must be shown equivalent by the usual methods. In M’ all three states with set-of-states component [A1] are equivalent, as are all in which that component is [B1]. The result of combining these equivalent states into a single state is shown in (c).

In addition to the rules for 1-lookahead transformation figure 9-17 gives a set of rules for a 2-lookahead transformation. These simply involve gathering two inputs without producing any outputs and making these part of the state of the deterministic equivalent. Delayed outputs are generated for each additional input. Generally the states represent the state of the generating ND FSM-O, and the next two inputs to be received. The state and two subsequent inputs are updated with each input except at the end. Notice that because of the delayed output, three e's are necessary to tease out the final outputs.

General sufficient conditions for an ND FSM-O to be state k-lookahead and output k-lookahead will be developed. First however, being now in a position to determine the number of zeroes in the output of the OB transformation, we digress.

9.4.2.2. Digression-The Number Of Zeroes In An OB Multiplier

This number can be obtained by finding the probability that M’ of figure 9-16 produces an output of 0 or 1 given that each of its inputs at any state is equally likely. This in turn can be computed if the probability of M’ being in any of its states is determined. This is obtained by tracing the states that M’ can be in after one input, and from all the states that M’ can be in after an additional input, etc. until all states that will be reached are reached. Given the probability of being in a state after step n and the fact that next states can result with equal probability given a 0 or 1, the probability of being in the states arrived at after step n + 1 can be computed. In general from such a trace a general expression is derived for state probabilities after any number of inputs.

This procedure is illustrated in figure 9-18. There the states that M’ is in after 1, 2, . . . , 6 inputs for all input sequences, together with the probability that M’ will be in each of its states after these inputs, are shown.

From this figure a general expression for this probability for any number of inputs is derived. It

[Trace of the states Start, A1, A23, B1, B45 of M’ over successive inputs, with the probability of each state. The chart in the figure gives:]

    k                   1       2       3       4
    $P_{A1,B1}(k)$      5/8     11/16   21/32   43/64
    $P_{A23,B45}(k)$    3/8     5/16    11/32   21/64

$$P_{A1,B1}(k) = \tfrac{1}{2} P_{A1,B1}(k-1) + P_{A23,B45}(k-1), \qquad P_{A23,B45}(k) = \tfrac{1}{2} P_{A1,B1}(k-1)$$

$$P_{A1,B1}(k) = \tfrac{2}{3}\left(1 + \frac{(-1)^{k}}{2^{k+3}}\right), \qquad P_{A23,B45}(k) = \tfrac{1}{3}\left(1 + \frac{(-1)^{k+1}}{2^{k+2}}\right)$$

Figure 9-18: Analysis For Probability Of States In M’

is 2 + k (we start k at the third input). In the figure, let

$P_{A1,B1}(k)$ be the probability that M’ is in states A1 or B1 after 2 + k inputs and

$P_{A23,B45}(k)$ be the probability that M’ is in states A23 or B45 after 2 + k inputs.

The values of these probabilities are given for a number of values of k in the chart in figure 9-18. For example

$P_{A1,B1}(1) = 5/8$, $P_{A1,B1}(2) = 11/16$, $P_{A23,B45}(1) = 3/8$, $P_{A23,B45}(2) = 5/16$.

The connection pattern of the trace arrives at a steady state, as it must, after which the interconnection pattern remains the same. This allows the recursive formulation of these probabilities and, in fact, the general solution of the recursions. All this is given in the figure.

So, quite rapidly, as k increases, PA1,B1(k) approaches 2/3 and PA23,B45(k) approaches 1/3.

Since the output when M is in states A1 or B1 and receives an input is always 0, no matter what the input is, and that from states A23 and B45 is always 1, on average 2/3 of the trits in the OB output are 0's.
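The convergence to 2/3 and 1/3 can be checked numerically. The sketch below is illustrative only: it assumes the recursion PA1,B1(k) = PA1,B1(k−1)/2 + PA23,B45(k−1), PA23,B45(k) = PA1,B1(k−1)/2, which is inferred here from the chart values 5/8, 11/16, 21/32, 43/64, and the function names are hypothetical.

```python
from fractions import Fraction


def state_probabilities(k_max):
    """Return [(P_A1B1(k), P_A23B45(k)) for k = 1..k_max] as exact fractions."""
    p_a1b1, p_a23b45 = Fraction(5, 8), Fraction(3, 8)  # chart values for k = 1
    table = [(p_a1b1, p_a23b45)]
    for _ in range(k_max - 1):
        # Assumed recursion: half of A1/B1 moves to A23/B45, all of A23/B45 returns.
        p_a1b1, p_a23b45 = p_a1b1 / 2 + p_a23b45, p_a1b1 / 2
        table.append((p_a1b1, p_a23b45))
    return table


def closed_form_a1b1(k):
    """Solution of the assumed recursion: 2/3 + (1/12) * (-1/2)**k."""
    return Fraction(2, 3) + Fraction(1, 12) * Fraction(-1, 2) ** k


if __name__ == "__main__":
    for k, (p, q) in enumerate(state_probabilities(8), start=1):
        print(k, p, q, float(p))
```

The deviation from 2/3 halves in magnitude and alternates in sign at each step, which is why the limit is approached so quickly.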

9.4.3. Do N Future Inputs Determine The Next State?

Given an ND FSM-O M there are a number of ways to determine whether there is an equivalent state k-lookahead machine.

Given an ND FSM-O one can construct a k-lookahead machine for k = 1, 2, etc. This construction follows the same rules as given for 1 and 2-lookahead construction in figure 9-17. If for some k the construction results in a deterministic machine then M is a state k-lookahead machine. The problem with this approach is that if M has n states its state k-lookahead equivalent may have a k as large as [n(n−1)/2] + 1.

Figure 9-19: State k-Lookahead: Pair-Path Graph Test

[Panels (a1), (a2) and (a3) show three ND FSM-Os on states A to E; panels (b1), (b2) and (b3) show the corresponding pair-path graphs GM.]

An approach whose termination is easily established involves consideration of pairs of states of an ND FSM-O, M, and a test of whether such pairs can be resolved with a bounded lookahead, that is, whether every input sequence of length n is applicable to at most one member of the pair.

DEFINITIONS

Two states s and q are 1-resolvable iff no single input is applicable to both s and q.

States s and q are said to imply states (with input i) δ(s,i) and δ(q,i).

States s and q of M are j-resolvable if for each input i, δ(s,i) and δ(q,i) are k-resolvable, where k ≤ j − 1, and for at least one such input they are (j−1)-resolvable.

States s and q of M are 1-ambiguous if they both imply the same state with input i for some i.

Constructing The Test Graph GSj

Using these definitions one may determine whether states of M are or are not n-input resolvable for any finite n, by study of a related graph.

First one finds all pairs of states in ∆(sj, i) for all i, for the state sj of M. The set Sj initially consists of all these starting pairs.

For each pair in Sj there is an associated vertex in the graph GSj. For each pair P in Sj, add to Sj all pairs implied by P and not already in Sj, and add an associated vertex for each to GSj. If pair P implies pair Q with input i then also add an edge from P to Q with label i. This process is repeated until no new vertices can be added to GSj.

Testing GSj For State k-Lookahead

If the graph GSj is cyclic or ambiguous then it is not n-resolvable for any n. If the longest path in the acyclic graph contains nj vertices then, if M starts in state sj, with a lookahead of length nj the next state is determined.

We can construct a separate graph like GSj for each state sj of M, and if none are cyclic or 1-ambiguous then M is state max(nj)-lookahead, the maximum being taken over the states sj of M. Alternatively, since some vertices will be in more than one graph, we can build the union of the graphs GSj for all states sj in M by adding new pairs formed by consideration of each state. After the set of new vertices associated with new pairs, and edges between them associated with implications, are added for each state, the resultant graph can again be tested for ambiguity and cycles. If after all pairs generated by all states in M have been added there is still no ambiguity or cycle, one may conclude that M is state k-lookahead, where k is the length of the longest path in the final pair associated graph, GM. Instead of making the test incremental with the addition of vertices and edges associated with each state in M, one could postpone all testing till GM is completely constructed.

Start with the straightforward task of finding 1-resolvable pairs of states. Make each of these a vertex of the graph GS1. It will be a vertex with out-degree of 0.

If no pair of states meets this condition then M is not n-resolvable for any finite number n.

If there are some 1-resolvable state pairs then the determination of 2-resolvable pairs is straightforward, following the definition of j-resolvable above.

Again, if there are no 2-resolvable state pairs and some pairs which are not 1-resolvable, then M is not n-resolvable for any finite number n.

This procedure continues until for some j all remaining pairs of states which have not already been found k-resolvable, k < j, are determined to be j-resolvable, or none are so found and M is thus determined not to be finitely determined.
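The pair-path graph test described above can be sketched in a few lines. This is a minimal illustration, not the book's construction verbatim: the machine is assumed to be given as a dictionary `delta[state][input] = set of next states` with every state present as a key, outputs are ignored, and the three small machines in the test are hypothetical (they are not the machines of figure 9-19).

```python
from itertools import combinations


def pair_graph(delta):
    """Build the pair graph GM; return (edges, ambiguous)."""
    start = set()
    for moves in delta.values():
        for nxt in moves.values():
            # Starting pairs: two states entered from one state on one input.
            start.update(combinations(sorted(nxt), 2))
    edges, ambiguous, todo = {}, False, list(start)
    while todo:
        pair = todo.pop()
        if pair in edges:
            continue
        edges[pair] = set()
        s, q = pair
        for i in delta[s].keys() & delta[q].keys():  # inputs applicable to both
            for s2 in delta[s][i]:
                for q2 in delta[q][i]:
                    if s2 == q2:
                        ambiguous = True  # both members imply the same state
                    else:
                        implied = tuple(sorted((s2, q2)))
                        edges[pair].add(implied)
                        todo.append(implied)
    return edges, ambiguous


def state_lookahead(delta):
    """Return k if M is state k-lookahead (0 if deterministic), else None."""
    edges, ambiguous = pair_graph(delta)
    if ambiguous:
        return None
    memo, on_path = {}, set()

    def longest(pair):
        # Longest path from `pair`, counted in vertices; a revisit is a cycle.
        if pair in memo:
            return memo[pair]
        if pair in on_path:
            raise ValueError("cycle")
        on_path.add(pair)
        best = 1 + max((longest(p) for p in edges[pair]), default=0)
        on_path.discard(pair)
        memo[pair] = best
        return best

    try:
        return max((longest(p) for p in edges), default=0)
    except ValueError:
        return None
```

A 1-resolvable pair appears here as a vertex with no outgoing edges, so the longest path counted in vertices gives the lookahead directly.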

EXAMPLES

Figure 9-19 shows some examples of the state k-lookahead test. At (a1) an ND FSM-O with two 0 inputs accepted in state A, and two 1 inputs accepted in state B, is shown. Outputs are not shown; they are irrelevant when testing for state k-lookahead. So states A and B each go to a pair, {B,C} and {C,E} respectively. These pairs imply other pairs. The resultant graph, with vertices associated with state pairs and edges with pair implications, is shown in (b1). In this case the graph is acyclic and contains no ambiguous vertices. The longest path (counting vertices) is that starting at {B,C} and continuing through {C,E} and {D,E}, so the machine in (a1) is state 3-lookahead. Also the state B has two transitions on 0, going to the pair C,D. The pair path generated here is also acyclic and unambiguous and requires only a lookahead of 2; 3 dominates.

At (a2) another ND FSM-O is shown. In it two inputs are accepted in state A, and two in state B. A again goes to the pair {B,C}. The graph generated by this pair and all pairs implied is shown in (b2). This graph is 1-ambiguous. The vertex associated with pair {D,E} implies that associated with {B,B}; that is, states D and E of (a2) both go to the same state, B, with input 0. Although, as in (a1), there is also a pair path started by state B, it need not be considered further because state k-lookahead is already ruled out.

The ND FSM-O at (a31) has only one state, A, that implies a pair. The associated graph is shown in (b31). It contains a cycle, which again means the machine cannot be state k-lookahead.

9.4.3.1. Output K-lookahead: A Test

The given test is sufficient to determine whether an ND FSM-O is state k-lookahead. If it is, then it is also output k-lookahead. If it is not then it may still be output k-lookahead, and that is necessary and sufficient for constructing an equivalent D FSM-O.

Testing for output k-lookahead requires attending to the outputs associated with each input in the given ND FSM-O, M. It is initiated after the test for state k-lookahead fails. So a cycle and/or an ambiguous pair exists in the pair associated graph we call GM (this may be the partially developed one based on only some of the states in M or that based on all states).

For each ambiguous pair there is in M one or more pairs of paths, or paths of pairs, of transitions, each starting at a state of M, going with the same input on a pair of transitions to a pair of states of M, from there to another pair of states with a pair of transitions both under the same input, etc., till a pair of states, through a pair of transitions with the same input, both pass to a single state. If any of the pairs of transitions occur with different outputs then M is clearly not output k-lookahead. However if the outputs of every pair of transitions with the same input on this path of pairs are the same then that ambiguous path does not prevent M from being output k-lookahead. For each cycle in GM also there is a pair of paths in M, and if each transition pair with the same input in such a cycle has equal outputs it too will not prevent output k-lookahead. On the other hand if any such transition pair with the same input has unequal outputs M is unremovably ambiguous. However M is k-lookahead even with ambiguities and cycles provided that within ambiguities and cycles all transition pairs with equal inputs have equal outputs. If M is in a state leading to a path of transition pairs on an ambiguous path/cycle then, by looking far enough ahead to determine that subsequent input leads to the end of that path/cycle and no other path or cycle, one can determine the output that may result.

When there are equal outputs along paths of transition pairs it is possible to build equivalents to M in which such paths of pairs become paths of single transitions. If all such paths can be reduced to single paths because of equal outputs the result is an equivalent to M, say M’, with no cycles or ambiguity in its GM′. Note that ultimately, if all outputs of M were the same, M can be implemented trivially without lookahead.

The procedure we give for removing an ambiguous pair path from the ND FSM-O, M, at (a2) is illustrated in figure 9-20, where there is a pair path started from state A going to {B,C} on input 0 and continuing through {D,E} on input 1 and thence to {B,B} with a 0 input.

Ambiguity Removal

1. Add to M one new state for each pair of states in the pair path, interconnected with edges labelled with the input taking each pair to the next. The result of doing so to M is called M’1 (see the example).

2. Assume S, representing state pair (s1, s2), is on the added path in M’1 with transitions labelled i which take s1 to q1 and s2 to q2. Then, if there is any other transition from s1 or s2 to any state r in M, an edge should be added from S to r in M’1. The result of adding all such edges is M’2 (see M’2 in the example).

3. Remove both of the edges from the state starting the ambiguous path (A in the example) to the first pair of states of that path. This gives M’3.

4. Simplify: Two states are equivalent if there is a 1-1 correspondence between their transitions in which the "input/output" labels of corresponding transitions are identical. This gives M’, the result of removing the ambiguous path; there may be others remaining. In the example there are.

In the example M’ still has an ambiguous path with starting state X. It too is removable if the appropriate outputs are equal. That path has X going to {C,E} on input 1, to {E,D} on input 1 and thence to B with input 0. The steps above have been applied to M’ in figure 9-20 to get M’’. The dashed portion is not in the final result; it is removed in the final step of simplification.

EXAMPLES

Consider M of figure 9-21 (a3) (without the dashed line), which is a repeat, with outputs specification added, of (a3), M, of figure 9-19. This failed to be state k-lookahead, as demonstrated, because of the cycle in GM at (b3) of figure 9-19.

But using the cycle removal procedure an equivalent of M called M’ is shown at (c3), assuming that u1 = u2 = u, v1 = v2 = v and w1 = w2 = w. The graph of pairs for M’, G1M′, is empty (DETERMINISTIC at (d31)). So it can be implemented with no lookahead. The development of the equivalent M’ is the key to the test. In order for there to be an equivalent the equality of outputs specified above is necessary. Now we review the removal procedure as applied to this case. Having identified the cycle as involving pairs B,C and D,E, the equivalent, M’, has two states not in M, X and Y, each corresponding to one of the pairs. State X is entered with a 0 with output u, corresponding to state A entering both B and C in M. Similarly the transitions in the cycle of B,C and D,E, with their associated outputs, become single transitions in a cycle involving X and Y in M’. Also all the individual states in M will generally be included separately in the M’ equivalent. Consider a transition in M from a single member of any of the pairs represented by X and Y in M’ which is not applicable to both members of the pair. Such a transition is that from E to D in M labelled with 1/y, i.e., from a member of D,E represented by X to D. This requires an edge from X to D in M’ labelled with 1/y and also the addition of all states that can be reached from D, etc. (If the dashed line in (a3) is included M’ will also have an edge from X to C labelled 0/z and all states reachable from C; in this case GM′ has a longest path of 1 vertex ((d32)) and so a lookahead of 1 is required in a deterministic equivalent.)

Figure 9-20: State k-Lookahead: Procedure For Removal Of Ambiguities

[Starting from M at (a2), the figure shows the intermediate machines M’1, M’2 and M’3 and the results M’ and M’’, with added states X, Y and Z representing state pairs.]

Figure 9-21: State k-Lookahead: Removal Of Ambiguities And Cycles

[Panels (a2), (c2), (d2), (e2) and (f2) show M, M’, GM’, M’’ and GM’’ for the ambiguous example; panels (a3), (c3) and (d3) show M, M’ and the DETERMINISTIC result for the cyclic example.]

In figure 9-21 an equivalent of M at (a2), called M’, is shown at (c2), assuming that u1 = u2 = u, v1 = v2 = v and w1 = w2 = w. M’ is the result of the removal procedure whose details are shown in figure 9-20. The graph of pairs for M’, G1M′ at (d2), shows that ambiguity still exists in M’. If output z ≠ v or y ≠ v then the ambiguity cannot be removed. If however all these are equal (all designated v) then we can get a second equivalent M’’ following the removal procedure. The pair path graph for M’’ is at (f2); it shows M to be 2-lookahead.

9.4.4. Other Uses Of ND To D FSM-O Transformation

In developing the FSM-O for the modified Booth algorithm the non-determinism came directly from the mode of thinking about the algorithm. But there are other modes that might have been adopted. One might have incorporated the notion of lookahead directly. Non-determinism is not a necessity. On the other hand there are a number of ways ND FSM-Os arise naturally, without ever thinking about non-determinism.

9.4.4.1. The Inverse Of An FSM-O, Booth Example

ND FSM-Os arise in development of the inverse of the input-output transformation of a given FSM-O. Realization of an inverse of a given D FSM-O was first studied in [Huffman, David A.; "Canonical Forms for Information-Lossless Finite State Logical Machines" Selected Papers Addison-Wesley, 1964]. It is not discussed there directly as the problem of realizing an ND FSM-O, but involves equivalent considerations.

Consider an FSM-O which has a unique output sequence, o(I), of length n for each input sequence of length n. By interchanging the "input/output" designations (ex. 0/1 becomes 1/0) on each state transition edge in the state diagram one obtains an FSM-O which produces the inverse. However this inverse FSM-O may very well be non-deterministic. Even then it may be implementable as a k-lookahead deterministic machine, a more complex machine. Or it may simply be unimplementable, i.e., there is no inverse. Notice that the inverse for a D FSM-O still has no more transitions from any state than the number of inputs of the machine.
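The label interchange is mechanical, as the following sketch suggests. It assumes a machine is a list of `(source, input, output, destination)` tuples; `M1` is transcribed from the state diagram of M1 as read here (states A and B), and `LOSSY` is a hypothetical machine added only to show an inverse that comes out non-deterministic.

```python
# M1 as read from its state diagram: (source, input, output, destination).
M1 = [("A", "0", "0", "A"), ("A", "1", "-1", "B"),
      ("B", "1", "0", "B"), ("B", "0", "1", "A")]

# Hypothetical machine: two transitions from P share the output 1.
LOSSY = [("P", "0", "1", "P"), ("P", "1", "1", "Q"),
         ("Q", "0", "0", "P"), ("Q", "1", "0", "Q")]


def inverse(transitions):
    """Interchange the input and output label on every edge."""
    return [(src, out, inp, dst) for (src, inp, out, dst) in transitions]


def is_deterministic(transitions):
    """True if no state has two different transitions on the same input."""
    seen = {}
    for src, inp, out, dst in transitions:
        if seen.setdefault((src, inp), (out, dst)) != (out, dst):
            return False
    return True


if __name__ == "__main__":
    print(is_deterministic(inverse(M1)))     # inverse of M1
    print(is_deterministic(inverse(LOSSY)))  # inverse of the hypothetical machine
```

Because the edges themselves are unchanged, each state of the inverse has exactly as many outgoing transitions as before; only their grouping by input label can collide.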

Relating this to the inverse of a combinational circuit in Chapter I: It may be that the combinational circuit which is used to implement a given D FSM-O has an inverse, as described in Chapter I, in that there is a unique input and next state for every output and current state. This will occur if the inverse machine is deterministic. However even if that function does not have an inverse, i.e., the inverse machine is non-deterministic, there may be an effective inverse D FSM-O, obtained by allowing lookahead of k inputs.

As an example, the inverse transformation is applied to the D FSM-O for OB, M’, which is given in figure 9-16 (c) and repeated in figure 9-22. In the latter figure the inverse resulting from interchanging inputs and outputs is also shown. The deterministic equivalent, which is 1-lookahead, is given in (b) of that figure. It is tested and developed by the general techniques given previously.

9.4.4.2. Reversed FSM-O M1 Example

As is true for the inverse, ND FSM-Os arise naturally in development of the reverse of the input-output transformation of a given FSM-O.

An FSM-O gives for each input sequence, I, of length n an output sequence O(I) of length n. By reversing each transition arrow, without changing its "input/output" designations, an FSM-O is generated which produces the output in the reverse order when the input is received in the reversed order. However this reverse FSM-O is very likely to be non-deterministic. Then it may be implementable as a k-lookahead deterministic machine, a more complex machine, or simply unimplementable, i.e., there is no unique reverse.

Figure 9-22: Inverse Of Optimal Booth (OB) Transformation, T2(T1(N)), With FSM-O M

[The panels show the deterministic FSM-O, its non-deterministic inverse, and the deterministic 1-lookahead equivalent of the inverse. In the figure, f/e indicates a transition to state Final with input f and output e (λ).]

An example is given in figure 9-23 (b) of a machine which produces the reverse of the output of M1 (giving the T1 transformation, figure 9-16) when the input is reversed. The deterministic 1-lookahead equivalent, again developed by the given techniques, is shown also.
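Reversing the arrows can be sketched the same way. The fragment below is an illustration under stated assumptions: machines are lists of `(source, input, output, destination)` tuples, `M1` is transcribed from the state diagram of M1 as read here, and the start and Final states of the figure are left out.

```python
# M1 as read from its state diagram: (source, input, output, destination).
M1 = [("A", "0", "0", "A"), ("A", "1", "-1", "B"),
      ("B", "1", "0", "B"), ("B", "0", "1", "A")]


def reverse(transitions):
    """Reverse every arrow, keeping its input/output label unchanged."""
    return [(dst, inp, out, src) for (src, inp, out, dst) in transitions]


def nondeterministic_choices(transitions):
    """Return {(state, input): [(output, destination), ...]} where choices exist."""
    table = {}
    for src, inp, out, dst in transitions:
        table.setdefault((src, inp), []).append((out, dst))
    return {key: moves for key, moves in table.items() if len(moves) > 1}


if __name__ == "__main__":
    for key, moves in nondeterministic_choices(reverse(M1)).items():
        print(key, moves)
```

For this M1 the reverse offers two choices in state A on input 0 and two in state B on input 1, which is the non-determinism that the lookahead construction then has to resolve.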

Notice that the reverse of a D FSM-O cannot be ambiguous, though its pair path graph may have cycles.

As will be shown later, the reverse of a D FSM-O, M, can be used to determine if M's current state can be determined by a finite number of previous inputs.

Figure 9-23: FSM-O M1 For Booth Transformation T1 With Input Reversed

[Panel (a): M1; panel (b): its reverse, which is non-deterministic; panel (c): the deterministic 1-lookahead equivalent of the reverse.]

9.5. Alternative Implementations Of Booth Transformations

A deterministic FSM-O with an input of length n requires O(n) time. It can be implemented as an iterative circuit with circuit cost O(n), in time O(n), or, with additional circuitry, in O(n−2) or even O(log n). In some cases, ex. when the current state of the FSM-O depends on no more than p previous inputs, a time O(1) is sufficient. The possibility of achieving O(1) time, with Booth type algorithms as our main example, is now considered.

9.5.0.1. Optimal Non-O(1) Design

The T1 transformation can be implemented with FSM-O, M1 figure 9-10 (a), in which the current state is completely determined by the single previous input (in either state a 1/0 input puts M1 in state 1/0). Therefore the output of each cell in the iterative equivalent depends on the previous input (giving the current state) and the current input determining, with the state, the output (so the output at each cell is determined by the current and previous input). Therefore an iterative equivalent with O(1) time is available. However in the T2 transformation implementation, M2 (b), the current state cannot be determined by a finite number of previous inputs and so the operating time of M2 is dependent on n, the number of inputs. Therefore the combined M1-M2 implementation similarly requires time which increases with n.

The OB design given for the T2 transformation replaces the maximum possible number of successive pairs of ones, 1 −1 or −1 1, with a single 1 and a 0, namely by 0 −1 or 0 1 respectively. If there is a sequence of alternating positive/negative and negative/positive 1's of even length before a 0 (or just before the end) then each pair will be replaced with a 0 and a 1. If the alternating sequence is of odd length then, excluding the last, all pairs will be replaced. One cannot reduce the number of 1's any more, thus "Optimal Booth", as shown next.

9.5.0.2. Minimum Number of 1s In Booth Notation

An n+1-trit-representation is an n+1 position string of 0's, 1's, and −1's, representing numbers in the range −2^n to 2^n − 1 (the same range as the n+1 bit 2's complement numbers). It represents a number in the same way that the output of OB does.

An n+1-trit-1−biased-representation is an n+1 position string of 0's, 1's, and −1's representing numbers in the range −2^n to 2^n − 1 in which two successive 1's (+ or −) never occur.

It will be shown that for any n+1-trit-representation R, with two or more successive 1's, representing the number N in its range, there is an n+1-trit-1−biased representation representing N which has no more 1's than R.

Procedure

Consider the following procedure to convert an n+1-trit-representation R to an equivalent n+1-trit-1−biased-representation. R is scanned right to left (low-order to high-order) in search of:

1. a pair of successive 1's of different sign.

2. a sequence of two or more successive +1's followed either by a 0 or a −1

3. a sequence of two or more successive −1's followed either by a 0 or a +1

The following transformations are performed on detecting each of the above respectively:

1. replace with a +1/−1 and then a 0 if the first member of the pair is −/+.

2. replace with a −1 at the first +1 in the sequence, with 0's at subsequent +1's in the sequence and with a +1 at the 0 following the sequence if there is such a 0, or a 0 at the −1 following the sequence if there is such a −1.

3. replace with a +1 at the first −1 in the sequence, with 0's at subsequent −1's in the sequence and with a −1 at the 0 following the sequence if there is such a 0, or a 0 at the +1 following the sequence if there is such a +1.

NOTE: because of the range of numbers represented, each sequence of at least two +1's/−1's must be followed by at least one −1/+1 or a 0. That is, though the leftmost (high-order) trit may be +1 or −1, the leftmost k > 1 trits cannot all be +1's nor can they all be −1's.

Each replacement eliminates adjacent 1's, but a 1 at the end of a replacement 2) or 3) may become adjacent to a 1 generated at the beginning of any of the replacements 1), 2) or 3). To take care of these adjacencies, after each replacement and before continuing the scan, back up one trit before the first of the replacement trits.

This procedure clearly converts an n+1-trit-representation R to an equivalent (representing the same number) n+1-trit-1−biased-representation R′. And R′ has no more 1's than R because each replacement has either fewer or an equal number of 1's than those trits replaced.
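The end result of the procedure can be cross-checked numerically. The sketch below does not implement the right-to-left scan itself; as a stand-in it uses the standard non-adjacent-form recoding of the represented number, which also yields a string with no two successive 1's. Trit strings are assumed to be lists written high-order first, and all function names are hypothetical.

```python
from itertools import product


def value(trits):
    """Number represented by a trit string written high-order first."""
    v = 0
    for t in trits:
        v = 2 * v + t
    return v


def one_biased(number, width):
    """1-biased trit string of `number`, high-order first, padded to `width`."""
    digits = []  # collected low-order first
    n = number
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)  # +1 if n = 1 (mod 4), -1 if n = 3 (mod 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    if len(digits) > width:
        raise ValueError("number out of range for this width")
    return [0] * (width - len(digits)) + digits[::-1]


def weight(trits):
    """Number of +1's and -1's in the string."""
    return sum(1 for t in trits if t != 0)


if __name__ == "__main__":
    r = [0, 1, 1, 1]  # represents 7
    print(one_biased(value(r), len(r)))
    # Compare weights for every 4-trit string whose value lies in -8..7.
    for r in product((-1, 0, 1), repeat=4):
        if -8 <= value(r) <= 7:
            assert weight(one_biased(value(r), 4)) <= weight(r)
```

For R = 0 1 1 1 this agrees with transformation 2 above: a −1 at the first +1, 0's at the remaining +1's, and a +1 at the 0 that follows the sequence.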

Figure 9-24: A And B are 1-Biased Representations of the Same Number

[The figure aligns A = . . . aj+1 aj aj−1 . . . a1 a0 = . . . 0 1 0 . . . 0 0 with −B = . . . bj+1 bj bj−1 . . . b1 b0 and their sum C = 0 . . . 0 0 0 . . . 0 0.]

1. The n+1 position string of 0's, 1's, and −1's that results from the OB Transformations is an n+1-trit-1−biased representation.

No such string has two successive 1's (+ or −). This follows from the nature of the component transformations. In the output of T1, in any subsequence of two 1's one is positive and the other negative (in either order). And then T2 replaces each such subsequence with a 0 and 1 (or −1). This makes the conclusion plausible. A tighter proof is provided by the state diagram for M′ (c), the deterministic FSM-O for T2(T1(N)) in figure 9-16. Every state transition of M′ with a 1 or −1 output goes to a state, X, with the property that for every input to X, the output is 0.

2. Two different n+1-trit-1−biased representations represent different numbers.

The number represented by the n+1 trit 1−biased representation X = xn xn−1 . . . x0 is xn 2^n + xn−1 2^(n−1) + . . . + x0 2^0. Given any two such representations, A and B, that represent the same number, they must be identical. Clearly if they represent the same number then A + (−B) = C = 0. (−B is simply B with the signs of all its 1's changed.) Consider the rightmost 1 (+ or −) in A, say at aj. Because the number is 1−biased, aj+1 = 0. More particularly assume aj = 1 and consider figure 9-24, which shows the relation A + (−B) = C = 0. In what follows bi denotes the trit of −B in position i, as in the figure.

In order for ci to be 0 for all i,

a. b0 through bj−1 must each be 0.

b. bj must = −1, because if it were 0 then cj ≠ 0, and if it were 1 then, since bj+1 = 0 (1−biased), cj = 0 (carry 1) and cj+1 = 1 ≠ 0.

A similar argument holds if aj = −1, in which case bj = 1.

The same argument can then be made for each successive 1 in A.

In conclusion then B must be identical to A for the sum of A and −B to equal 0 and thus for both to represent the same number.

So, for each of the 2^(n+1) n+1 bit binary input strings, N, OB produces a unique n+1-trit-1−biased representation, R. No other n+1 trit representation of N has fewer 1's. So R is the n+1-trit representation with the minimum number of 1's.

Incidentally the 2^(n+1) n+1-trit-1−biased numbers that result from the 2^(n+1) possible binary inputs to the OB transformation are a fraction of the number of n+1-trit-1−biased representations possible.

9.5.1. An Alternative O(1) Design, SOB

There is another implementation of the Booth type transformations in which T1 is followed by T2′. T2′ takes successive pairs of inputs and for each pair produces a 0, 1, −1, 2 or −2. These are used in multiplication to represent a multiplier; as such they result in

1. +0: shifting the position of the next multiplicand to be added,

2. +1: shifting and adding the multiplicand,

3. −1: shifting and adding the 2's complement of the multiplicand,

4. +2: shifting twice and adding the multiplicand,

5. −2: shifting twice and adding the 2's complement of the multiplicand.

This transformation is called the Almost Optimum Booth (AOB) transformation because it produces a slightly greater number of 1's than OB.
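How such a digit string drives a multiplication can be sketched as follows. The sketch assumes that T1 is the usual Booth recoding (each output trit is the previous input bit minus the current one, low-order first) and that T2′ combines each successive pair of trits into the digit low + 2·high; both are read here from the state diagrams of M1 and M2′, and the function names are hypothetical.

```python
def t1_trits(x, n):
    """Booth recoding of the n-bit 2's complement number x, low-order first."""
    trits, prev = [], 0
    for i in range(n):
        bit = (x >> i) & 1  # Python's >> on a negative int sign-extends
        trits.append(prev - bit)
        prev = bit
    return trits


def t2_prime_digits(trits):
    """Combine successive trit pairs into digits in {0, 1, -1, 2, -2}."""
    if len(trits) % 2:
        trits = trits + [0]  # pad the high-order end; the value is unchanged
    return [trits[i] + 2 * trits[i + 1] for i in range(0, len(trits), 2)]


def multiply(multiplier, multiplicand, n):
    """Add a shifted multiple of the multiplicand for each digit."""
    product = 0
    for j, d in enumerate(t2_prime_digits(t1_trits(multiplier, n))):
        product += d * (multiplicand << (2 * j))  # digit j has weight 4**j
    return product


if __name__ == "__main__":
    print(t2_prime_digits(t1_trits(7, 4)), multiply(7, -3, 4))
```

Only half as many additions are needed as there are multiplier bits, at the price of also having to supply twice the multiplicand.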

In the implementation there is a null output, λ, on the first of each successive pair of inputs. But there is an output on each second member of the pair. The state diagram for a D FSM-O implementation M2′, of T2′, is given in figure 9-25 (b). (The heavy line returning to state S represents all the state transitions from states 0, 1 and −1 to S.)

The state diagram for M2′ at (b), the FSM-O implementation of T2′, shows that the current state is not determined by a finite number of previous inputs. For example, if the previous inputs were (01)^n, for any n, then the state of M2′ could be S or 1.

However examination of (b) shows that the states of M2′ can be partitioned into Q1 = {S} and Q2 = {0, 1, −1}. M2′ is in a state of Q1 when the first, and every odd, input is received; it is in a state of Q2 on even input reception. This information along with that of a finite number of previous inputs is enough to determine the state of M2′ at each cell of the iterative circuit. Since M2′ is in S on every odd input, no matter what the previous inputs were, every odd cell in the iterative circuit has a state input of S, so its output is λ no matter what the cell's input is. Furthermore the state input to the next, even, cell is determined by its previous input, namely state 0, 1 or −1 respectively as that previous input is 0, 1 or −1. This together with the input to each even cell is enough to determine the output of each even cell. So the state of M2′ is determined by the parity of the input (even-odd position in the equivalent iterative circuit) and at most one previous input. So M2′ can be implemented as an iterative circuit in O(1) time.

M1 also can be implemented in O(1) time. The iterative circuit for M1 is shown at the top of Figure 9-25 (c). Its outputs serve as inputs to the iterative circuit for M2′ (lower order inputs are on the left). These iterative circuits show the dependencies of states on previous inputs or simply on position (state S). The state of M1 depends on one previous input, as does the state of even cells for M2′. As can be seen for the composition M1 → M2′, for the composed circuit the state depends on the previous 2 inputs, and the cell output depends on that state and the current input, that is, on three consecutive inputs (observe the darkened lines).
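The dependence on three consecutive inputs can be made explicit. Under the assumptions that T1 outputs the previous bit minus the current bit and that T2′ outputs low + 2·high for each trit pair (both read here from the state diagrams, not stated as formulas in the text), the output of even cell j works out to b(2j−1) + b(2j) − 2·b(2j+1), where b(i) is input bit i and b(−1) = 0. A small sketch with hypothetical names:

```python
def input_bit(x, i):
    """Bit i of the 2's complement number x; bit -1 is taken as 0."""
    return 0 if i < 0 else (x >> i) & 1  # >> sign-extends negative ints


def digits_by_window(x, n):
    """Each output digit computed independently from three adjacent input bits."""
    return [input_bit(x, 2 * j - 1) + input_bit(x, 2 * j) - 2 * input_bit(x, 2 * j + 1)
            for j in range((n + 1) // 2)]


if __name__ == "__main__":
    n = 6
    for x in (-32, -5, 0, 11, 31):
        digits = digits_by_window(x, n)
        print(x, digits, sum(d * 4 ** j for j, d in enumerate(digits)))
```

Since no digit needs anything beyond its own three-bit window, all cells can work at once, which is the O(1) behaviour the text describes.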

In figure 9-26 M1 and M2′ of figure 9-25 are shown again in (a) and (b). In (c) the composed state diagram for M1-M2′, M1−2′, is shown. As for M2′ in figure 9-25, so too for M1−2′ in (c): a finite number of previous inputs does not determine the state, ex. if the previous inputs were (01)^n, for any n, then the state of M1−2′ could be SB or B−1. Also, consistently, the states of this state diagram can be partitioned into two sets, {SA, SB} and {A0, A1, B0, B−1}, which M1-M2′ enters alternately. So alternate cells of the iterative circuit have their input states in these sets. For those cells in {SA, SB} the previous input determines whether the state is SA or SB. And for the alternate cells in states {A0, A1, B0, B−1} the precise state can be determined by two previous inputs; then with the current input the output of that cell is determined. This example illustrates the input dependent property of an FSM-O which leads to an O(1) implementation.

Figure 9-25: M1-M2’, Booth (0,1) To (0,1,-1,2,-2) Iterative Circuit

[Panel (a): state diagram of M1 (states A, B); panel (b): state diagram of M2’ (states S, 0, 1, −1); panel (c): the iterative circuits for M1 and M2’ on inputs i1 . . . i7.]

Figure 9-26: M1-M2’, Booth (0,1) To (0,1,-1,2,-2) State Diagram

[Panel (a): M1; panel (b): M2’; panel (c): the composed machine M1-2’ with states SA, SB, A0, A1, B0 and B−1.]

A general procedure for detecting state dependence on previous input will be given in a later section. As in this example the procedure can be simplified if the set of states in which an FSM-O finds itself varies periodically with the cardinality of the input.

Figure 9-27: Determination Of State Dependence On Previous Inputs, Booth Example

[The figure repeats the composed state diagram of figure 9-26 (c) and traces the sets of states {SA, SB} and {A0, A1, B0, B−1} entered on successive inputs.]

9.5.1.1. Can The States Be Broken Into Proper Subsets Entered Periodically?

If there are subsets of the states of D FSM-O M entered periodically then tests for determining if states are determined by a finite number of previous inputs (n-input determined) can be simplified. Here we note the straightforward procedure which determines whether there are such subsets; the resultant simplifications in deciding if M is n-input determined are considered later.

Starting with the initial state, list all the states that can result with a single input, with two inputs, etc. These sets of states may settle down to a single set which eventually repeats itself at each new input. Alternatively, and this is the interesting case, they may cycle through different proper subsets of the set of all states, as happens for the state diagram of figure 9-27(a). This test terminates when a set of states repeats.
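The subset-listing test is short enough to sketch directly. The next-state table below for M2′ is transcribed from the description of its state diagram given earlier (S goes to 0, 1 or −1 according to the input; those states return to S), with the Final state and end input left out; the representation and function name are assumptions made for illustration.

```python
# Next-state table of M2' (outputs omitted): state -> {input: next state}.
M2_PRIME = {
    "S": {"0": "0", "1": "1", "-1": "-1"},
    "0": {"0": "S", "1": "S", "-1": "S"},
    "1": {"0": "S", "-1": "S"},
    "-1": {"0": "S", "1": "S"},
}


def periodic_subsets(next_state, start):
    """List the state sets reached after 0, 1, 2, ... inputs until one repeats.

    Returns (transient, cycle): the sets seen before the repeating part,
    and the sets that are then entered periodically.
    """
    current = frozenset([start])
    seen = []
    while current not in seen:
        seen.append(current)
        current = frozenset(t for s in current for t in next_state[s].values())
    first = seen.index(current)
    return seen[:first], seen[first:]


if __name__ == "__main__":
    transient, cycle = periodic_subsets(M2_PRIME, "S")
    print(transient, [sorted(c) for c in cycle])
```

For M2′ the cycle consists of the two sets Q1 = {S} and Q2 = {0, 1, −1} named in the text, and there is no transient part.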

Having decided on the periodic subsets, it may happen that M is n-input determined with a small n. Although the following brief test will decide this question if M is n-input determined, it will not decide when it is not.

Cycling through proper subsets puts M periodically in a state within each such subset. So M passes through a state in S0, then one in S1, then S2, . . . , then Sk−1, and back to S0, etc. The investigation starts by assuming M is in some state of S0. Now apply each input to each of those states to obtain the next possible state set, S1. If each state of S1 is reached from states in S0 by inputs different from the inputs by which any other state of S1 is reached from states in S0, then knowledge of the previous single input is sufficient to determine a state in S1. But the same input may lead to two different states in S1. Then again test each input applicable to states in S1 to reach all states in S2; this may result in each state in S2 being reached by one or more instances of two successive inputs, after M is in S0, with which no other state in S2 can be reached. If this is not the case then consider inputs from S2 to S3, etc. This continues until a set S(j mod k) is reached, where j is the number of sets that must be considered before we know either that:

1. j previous inputs are sufficient to determine each state in S(j mod k), or

2. this will never happen (how this is determined has not been considered; it will be later).

Now if S(j mod k) has the first property, then knowledge of the state in S(j mod k) and the next input are sufficient to determine the state in S((j+1) mod k). So each state in S((j+1) mod k) is determined by no more than j + 1 previous inputs, and perhaps fewer. And similarly the states in S((j+2) mod k) are determined by the previous j + 2 inputs.

The subset forming and n-input determination are illustrated for the state diagram of the O(1) Booth design in figure 9-26(c). This diagram is repeated in figure 9-27(a), and the subset formation test is shown in (b) of that figure. M given by (a) cycles through the sets {SA, SB} and {A0, A1, B0, B−1}. The state Final is not included in the figure; it is reached from either SA or SB by input e, and so should be added to the set {A0, A1, B0, B−1}.

9.5.1.2. AOB: A Non-Optimal But Efficient Approximation Algorithm

Returning to figure 9-25, each non-λ output is a number representing T2′: 0, 1, −1, 2 or −2. Instead of being presented at one output it can be spread over two, one of which replaces the previous λ. So in place of 0, 1, −1, 2 or −2, there would be 0 0, 1 0, −1 0, 0 1 or 0 −1 (the lo-order is on the left). This circuit is then equivalent to scanning pairs, ij ij+1, j = 1, 3, . . . from the output of T1, and replacing pairs 1 −1 with −1 0, and −1 1 with 1 0, and other pairs with themselves. However this is non-optimal because, for example, if the output of T1, with i1 on the left, is 0 −1 1 0 the result will be 0 −1 1 0, whereas the optimal would be 0 1 0 0. In this case a second pass looking at pairs, but starting on the second trit, would yield 0 1 0 0, which is fine. However, if the first pass, starting on the first trit, encountered 0 −1 1 −1 1, the result would be 0 −1 −1 0 1, which is not optimal and cannot be improved by a second pass of pairs starting at the second trit. The actual ratio of the number of 1s to 0s is developed in the next section.
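The pair-scanning view of this transformation can be checked on the examples just given. The sketch below is an illustration only: the names `pair_pass` and `value` are hypothetical, trits are held lo-order first in a Python list, and the `start` index is an assumption used to model a pass beginning on the first or the second trit.

```python
def pair_pass(trits, start=0):
    """One pass over adjacent pairs (lo-order trit first), beginning at index start.

    The pair 1 -1 (value 1 - 2 = -1) becomes -1 0, the pair -1 1
    (value -1 + 2 = 1) becomes 1 0, and every other pair is left alone.
    """
    out = list(trits)
    for j in range(start, len(out) - 1, 2):
        pair = (out[j], out[j + 1])
        if pair == (1, -1):
            out[j], out[j + 1] = -1, 0
        elif pair == (-1, 1):
            out[j], out[j + 1] = 1, 0
    return out


def value(trits):
    """Number represented by a lo-order-first trit list."""
    return sum(t * 2**k for k, t in enumerate(trits))


first = pair_pass([0, -1, 1, 0])            # pass starting on the first trit
second = pair_pass(first, start=1)          # pass starting on the second trit
stuck = pair_pass([0, -1, 1, -1, 1])        # the example that cannot be repaired
stuck_again = pair_pass(stuck, start=1)
```

Each pass preserves the represented value; only the number of non-zero trits changes, and in the last example it stays at three although two would suffice.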


This approach could have been designed directly as shown in figure 9-28. In (b) is shown the state diagram for M2′′. Instead of, like M2′, outputting λ followed by 0, or by 1, or by −1, or by 2, or by −2, M2′′ outputs 0 0, 1 0, −1 0, 0 1 or 0 −1 (the lo-order is on the left). The FSM-O M2′′ starts in state S, and thereafter alternately, on successive inputs, is in a state in one of two sets of states, SETa = {0x, 1x, −1x} and SETb = {0y, 1y, −1y}. Alternate cells in the iterative circuit have their input state in one of these two sets. (This could have been arrived at by the general procedure for finding such partitions given previously.) As seen from the state diagram, in a cell whose input state is in SETa the precise state is determined by the previous input. If that input was i then that state is ix. It then follows that for those cells whose input state is in SETb two previous inputs are enough to determine the exact state (the first of the two determines which state in SETa was a result of that input; the second of the two, together with that state, determines the precise state in SETb).

Both M1 and M2′′ are O(1); the composition of the iterative circuits for M1 and M2′′ is shown in (c).

Looking at each pair of inputs starting with i1, to replace pairs 1 −1 and −1 1 with −1 0 and 1 0 respectively, does not result in removing as many 1s as possible. How about, then, following this transformation by one in which the same replacements are made but the pairs looked at are those starting with i2? This replacement (transformation T3) can be accomplished by an iterative circuit which, similar to that for T2′′, has a cell input state which can be determined by the cell’s position and by two previous inputs. So the composition of the three transformations T1 − T2′′ − T3 is O(1). It will always leave as few, and sometimes fewer, 1s in the result as T1 − T2′′. Will it, however, always leave as few as the optimal design? No! If after T1 the result was 0 −1 1 −1 1, after T2′′ the result would be 0 −1 −1 0 1, and after T3 still 0 −1 −1 0 1.

9.5.1.3. The Number Of Zeroes In An AOB Multiplier

The analysis of the number of zeroes in an OB multiplier is now repeated for the AOB multiplier generated by the D FSM-O for the AOB M1−2′ in figure 9-26. The objective is to find the probability of being in each state after k inputs. Analysis of the states that result after all possible input sequences to M1−2′ shows that it cycles through a member of the set S1 = {SA, SB}, then a member of S2 = {A0, A1, B0, B−1}, etc. When in each set, it is equally likely to be in any of the states in that set, assuming 0 and 1 are always equally likely. From S1 a λ output is always produced no matter which of its states M1−2′ is in. From S2, given that any of its states (as well as either input) is equally likely, the following representations of the outputs are all equally likely: 00, 01, 0−1, −10 (from states A0, B−1) and 00, 01, 0−1, 10 (from states B0, A1). It follows that 10/16 = .625 of the trits are 0, a slightly lower fraction than the .667 produced by M′, figure 9-16, for the OB transformation.
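The count behind the 10/16 figure can be reproduced directly from the eight equally likely output pairs listed above. The snippet is only a worked check; the variable names are illustrative.

```python
from fractions import Fraction

# The eight equally likely two-trit outputs named in the text.
pairs = [
    (0, 0), (0, 1), (0, -1), (-1, 0),   # from states A0, B-1
    (0, 0), (0, 1), (0, -1), (1, 0),    # from states B0, A1
]
zeros = sum(t == 0 for pair in pairs for t in pair)
fraction_zero = Fraction(zeros, 2 * len(pairs))
```

Ten of the sixteen trits are 0, giving 5/8 = .625.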

9.5.1.4. O(1) Implementations In General

The general point is that:

Figure 9-28: M1-M2’’, Booth (0,1) To (0,1,-1) (Approximation Algorithm) [panels: (a) M1, (b) M2’’, (c) iterative circuit]

If

1. The states of an FSM-O, M, can be partitioned into subsets of states, Ri, i = 0 to k−1, such that M passes through a state of R0, then a state of R1, then . . . Rk−1, and then repeats this set of transitions. For the iterative equivalent circuit this implies that the state inputs to cells are periodically members of R0, R1, . . . Rk−1.

2. For one of the subsets of states, say Rx, the precise state of Rx that M occupies when periodically in a member of that set is completely determined by a finite number, say v, of previous inputs of M.

Then the state at any position in the iterative circuit can be determined by a finite number of previous cell inputs.

Once the state at Rx is known from its v previous inputs, the state at the next cell, being one of the states in R((x+1) mod k), is certainly known with one additional input. And so forth until a cell whose possible state inputs are in Rx is again reached, requiring only v previous inputs for its determination.

9.5.2. Iterative And FSM-O Models: Lookahead, Lookbehind

The designs of an FSM-O and of an equivalent n-input-output iterative circuit are almost identical. Generally each cell of the iterative circuit contains the logic of the FSM-O whose requirements are given by the state diagram for the FSM-O. (There is however the significant difference that the i th cell has information that it is in fact the i th cell, which information is not available in the FSM-O without adding a counting facility to its design, reflected in an altered state diagram. This was illustrated in the design of the AOB transformation.) Equivalence means that for every input sequence I both give the same output sequence. Generally, particularly for cases in which the state and output depend on a finite number of previous inputs (AOB designs), we are interested in the iterative circuit. In either case designs depend on the state diagram. In the simple standard model of an FSM-O (iterative circuit) the output and state (of any cell) are dependent on the current state as well as the current input (of that cell). But we have seen how a non-standard lookahead model can also result in FSM-O and iterative circuit designs. In general, in fact, design can be based on an FSM-O model with both finite lookahead and finite lookbehind, which we call the generalized model.

The generalized FSM-O-iterative model consists of a sequence of similar cells in which the output and state of any cell can be made dependent on inputs to a finite number of later as well as earlier cells. By way of summary we now show that for any generalized-model FSM-O-iterative machine there is an equivalent standard-model FSM-O-iterative one. This can be done by merely adding some additional dummy inputs, renumbering, and reinterpreting outputs and states. The resultant equivalent machine is in the simple standard model, in which the output and state of any cell depend on the current state as well as the current input.

In a standard FSM-O, with input sequence $I = \langle i_1, i_2, \ldots, i_n \rangle$:

$$q_0 = S, \qquad q_j = \delta(q_{j-1}, i_j), \qquad o_j = o(q_{j-1}, i_j)$$

In the general model a machine’s next state and output depend on its current state as well as its r previous, its current, and k − 1 subsequent inputs, and it is described as follows. The initial state is $q_0$, and

$$q_j = \delta(\langle i_{j-r}, \ldots, i_{j-1} \rangle,\ q_{j-1},\ \langle i_j, \ldots, i_{j+k-1} \rangle)$$

$$o_j = o(\langle i_{j-r}, \ldots, i_{j-1} \rangle,\ q_{j-1},\ \langle i_j, \ldots, i_{j+k-1} \rangle)$$

To get the equivalent standard form, the input stream is augmented at the end by k dummy inputs, e s, and then the description is given in a standard form, in which:

1. The states, q′ s, carry information about sequences of inputs.

2. Outputs, o′ s, are delayed k time units, or in the equivalent iterative circuit, appear k cells later than they would appear if it were possible to look at future inputs. Also in the iterative circuit the first k cells are different from the remaining cells, which are generally more complex; the first k cells only gather inputs.

$$I' = \langle i_1, i_2, \ldots, i_n, e, \ldots, e \rangle$$

$$q'_0 = \langle \langle i_{1-r}, i_{2-r}, \ldots, i_0 \rangle,\ q_0,\ \langle\,\rangle \rangle$$

$$\delta(q'_0, i_1) = \langle \langle i_{1-r}, i_{2-r}, \ldots, i_0 \rangle,\ q_0,\ \langle i_1 \rangle \rangle = \langle V_0, q_0, W_1 \rangle$$

For j = 2 to k:

$$\delta(q'_{j-1}, i_j) = \langle V_0,\ q_0,\ W_{j-1} \,\|\, \langle i_j \rangle \rangle, \qquad W_j = W_{j-1} \,\|\, \langle i_j \rangle$$

$$q'_k = \langle \langle i_{1-r}, i_{2-r}, \ldots, i_0 \rangle,\ q_0,\ \langle i_1, i_2, \ldots, i_k \rangle \rangle = \langle V_0, q_0, W_k \rangle$$

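Let $V^{-p}$ be the vector V with the p th component removed, and $V_x$ be the x th component of V.

$$\delta(q'_k, i_{k+1}) = \langle \langle i_{2-r}, i_{3-r}, \ldots, i_1 \rangle,\ q_0,\ \langle i_2, \ldots, i_{k+1} \rangle \rangle = \langle V_0^{-1} \,\|\, [W_k]_1,\ \delta(q_0, i_1),\ [W_k]^{-1} \,\|\, [V_0]_{last} \rangle = \langle V_1, q'_{k+1}, W_{k+1} \rangle$$

$$\delta(q'_{j-1}, i_j) = \langle V_{j-k-1}^{-1} \,\|\, [W_{j-1}]_1,\ \delta(q'_{j-1}, [W_{j-1}]_1),\ W_{j-1}^{-1} \,\|\, [V_{j-k-1}]_{last} \rangle$$

$$o'(q'_{j-1}, i_j) = o(\langle V_{j-k-1}, q'_{j-1}, W_{j-1} \rangle, i_1)$$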
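The construction can be tried out on a toy machine. The sketch below is an assumption-laden illustration, not the chapter's formal definition: the names `run_generalized`, `run_standard`, and `toy_step` are hypothetical, the unspecified inputs before i1 are filled with the dummy symbol e, and k is assumed to be at least 1 (the current input counts as the first of the k lookahead inputs). The standard-form machine keeps the triple (V, q, W) as its state, gathers inputs for its first k steps, and emits each output k steps late.

```python
def run_generalized(step, q0, inputs, r, k, pad="e"):
    """Generalized model: step sees r previous inputs, the state, and k inputs
    starting with the current one. step returns (next_state, output)."""
    ext = [pad] * r + list(inputs) + [pad] * k
    q, outs = q0, []
    for j in range(len(inputs)):
        behind = tuple(ext[j:j + r])
        ahead = tuple(ext[j + r:j + r + k])
        q, o = step(behind, q, ahead)
        outs.append(o)
    return outs


def run_standard(step, q0, inputs, r, k, pad="e"):
    """Standard model: one input per step, state is the triple (V, q, W).
    The input stream is augmented at the end by k dummy inputs."""
    V, q, W = (pad,) * r, q0, ()
    outs = []
    for x in list(inputs) + [pad] * k:
        if len(W) == k:                      # window full: emit the delayed output
            q, o = step(V, q, W)
            outs.append(o)
            V = (V + W[:1])[-r:] if r else ()
            W = W[1:]
        W = W + (x,)                         # the first k steps only gather inputs
    return outs


def toy_step(behind, q, ahead):
    """Hypothetical machine: counts steps and reports what it can see."""
    seen = "".join(map(str, behind)) + "|" + "".join(map(str, ahead))
    return q + 1, f"{q}:{seen}"


general = run_generalized(toy_step, 0, [1, 0, 1, 1], r=1, k=2)
standard = run_standard(toy_step, 0, [1, 0, 1, 1], r=1, k=2)
```

Both runs produce the same output sequence; the only difference is that the standard machine produces its j th output while reading input j + k.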

9.5.3. Do A Finite Number Of Previous Inputs Determine The Current State?

The state of an FSM-O, M, is n-input determined if n previous inputs are sufficient to determine its current state. If no two states of M are entered with the same single input, as is the case for M in figure 9-10, then one previous input is sufficient to determine the state; M is 1-input determined. Of course a single input can only be sufficient if the number of states is less than or equal to the number of inputs, N. If a single input is not adequate, all pairs of inputs can be tested. If the same two-input sequence never leads to more than one state then M is 2-input determined (again, this can’t be true for more than N^2 states). This can be continued to tests for 3-input determined, and so on. It is not obvious when to terminate this approach if the FSM-O’s states are not in fact finitely determined.
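The direct test just described (try single inputs, then pairs, then triples) can be sketched as below. It is a brute-force illustration under assumptions: the function names are hypothetical, the machine is encoded as a dictionary from (state, input) to next state, the search is cut off at an arbitrary `max_n` because the direct approach has no natural stopping point, and the 9-state shift-register-style machine is a made-up example.

```python
from itertools import product


def image(delta, states, seq):
    """States M can be in after the input sequence seq, starting from any state."""
    current = set(states)
    for i in seq:
        current = {delta[(s, i)] for s in current if (s, i) in delta}
    return current


def n_input_determined(delta, max_n):
    """Smallest n <= max_n such that every n-input sequence leads to at most one state."""
    states = {s for s, _ in delta} | set(delta.values())
    inputs = sorted({i for _, i in delta})
    for n in range(1, max_n + 1):
        if all(len(image(delta, states, seq)) <= 1
               for seq in product(inputs, repeat=n)):
            return n
    return None   # undecided: not n-input determined for any n up to max_n


# Hypothetical machine: the state remembers the last two inputs (3 values each).
shift = {(s, i): 3 * i + s // 3 for s in range(9) for i in range(3)}
# Hypothetical machine whose state is never pinned down by its inputs.
toggle = {("A", 0): "A", ("B", 0): "B", ("A", 1): "B", ("B", 1): "A"}

n_shift = n_input_determined(shift, max_n=4)
n_toggle = n_input_determined(toggle, max_n=4)
```

The first machine is 2-input determined; for the second the search runs out without an answer, which is exactly the termination problem noted above.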

An alternative approach, whose termination is easily established, is to determine whether pairs of states of a deterministic FSM-O, M, can be distinguished by a "difference in the input sequences by which each can be reached". In order to simplify the discussion a bit we first transform M to Mrev. The determination of a "difference in the input sequences applicable to each" state in Mrev is equivalent to the problem as posed for M. To obtain Mrev, reverse all edges and leave inputs unchanged (outputs are irrelevant). An example is given in figure 9-29.

Two states s and q are 1-distinguishable iff no single input is applicable to both s and q.

Figure 9-29: Determining If Previous Inputs Determine Current State [panels: M; Mrev; Pair Graph; Number Of Previous Inputs Necessary To Distinguish Pairs Of States]

States s and q of Mrev are j-distinguishable if, for every input i applicable to both, the states s′ and q′ to which s and q respectively go under i are k-distinguishable for some k ≤ j − 1, and for at least one such input s′ and q′ are (j−1)-distinguishable.

(Note that s and q cannot both go to the same state with the same input because, though Mrev may be non-deterministic, M is deterministic.)

Using these definitions one may organize the determination of whether states of M are or are not n-input determined, for any finite n, in a number of ways.

Procedure I

Start with the straightforward task of finding 1-distinguishable pairs of states. a and b are 1-distinguishable if either

1. a or b has no applicable inputs, or

2. no input applicable to a is applicable to b.

If no pair of states meets this condition then M is not n-input determined for any finite number n.

If there are some 1-distinguishable state pairs then the determination of 2-distinguishable pairs is straightforward, following the definition of j-distinguishable above.

Again, if there are no 2-distinguishable state pairs and some pairs which are not 1-distinguishable, then M is not n-input determined for any finite number n.

This procedure continues until, for some j, either all remaining pairs of states which have not already been found k-distinguishable, k < j, are determined to be j-distinguishable, or none are so found and M is thus determined not to be finitely determined.

Procedure II

Another approach is to build a distinguishing graph with one vertex for every pair of states in Mrev. There is an edge from vertex A to vertex B if some input i is applicable to both members of the pair in A, and the next states of the A states under i give the pair in B. An example is given in figure 9-29. If that graph has no cycles then the states are finitely distinguishable.
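Procedure II lends itself to a short sketch. Everything named here is an assumption made for illustration: `reverse_machine` and `inputs_needed` are hypothetical helpers, the machine is a dictionary from (state, input) to next state, and the two test machines are made up. The pair graph is explored depth first; a pair with no outgoing edge is 1-distinguishable, and a pair whose longest outgoing path has j − 1 edges is j-distinguishable.

```python
from itertools import combinations


def reverse_machine(delta):
    """Mrev: reverse every edge of M, keeping the input labels."""
    rev = {}
    for (s, i), t in delta.items():
        rev.setdefault((t, i), set()).add(s)
    return rev


def inputs_needed(delta):
    """Return n if the pair graph is acyclic (every pair is at most
    n-distinguishable), or None if the pair graph contains a cycle."""
    states = {s for s, _ in delta} | set(delta.values())
    inputs = {i for _, i in delta}
    rev = reverse_machine(delta)
    depth, on_path = {}, set()

    def visit(pair):
        if pair in depth:
            return depth[pair]
        if pair in on_path:
            return None                       # cycle in the pair graph
        on_path.add(pair)
        s, q = tuple(pair)
        best = 1                              # no common input: 1-distinguishable
        for i in inputs:
            for s2 in rev.get((s, i), ()):
                for q2 in rev.get((q, i), ()):
                    d = visit(frozenset((s2, q2)))
                    if d is None:
                        return None
                    best = max(best, d + 1)
        on_path.discard(pair)
        depth[pair] = best
        return best

    worst = 0
    for s, q in combinations(states, 2):
        d = visit(frozenset((s, q)))
        if d is None:
            return None
        worst = max(worst, d)
    return worst


# Hypothetical 9-state machine whose state records the last two inputs.
shift = {(s, i): 3 * i + s // 3 for s in range(9) for i in range(3)}
# Hypothetical machine in which input 0 loops on two different states.
toggle = {("A", 0): "A", ("B", 0): "B", ("A", 1): "B", ("B", 1): "A"}

n_shift = inputs_needed(shift)
n_toggle = inputs_needed(toggle)
```

Unlike the direct test, this one always terminates, since the pair graph is finite.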

9.5.3.1. Previous Inputs And Periodic Subsets

If the states of an FSM-O, M, can be partitioned into sets, say S1, . . . , Sp, so that the state of M is periodically in one of them as the inputs arrive, no matter the value of those inputs, then it is only necessary to determine whether the pairs of states from one (any one) of the sets, say Sj, are n-input determined. This will generally require consideration of pairs of states outside of Sj also. A distinguishing graph, with vertices associated with state pairs, can be built recursively starting with a vertex for each pair from Sj.

If all the state pairs in Sj are found to be k-distinguishable then the pairs in Sj+1 are k1-distinguishable where k1 ≤ k + 1, because the state in Sj is determined by the previous k inputs and therefore the next state, entered with the arrival of the next input, is determined. It is in S((j+1) mod p). In general, if all the state pairs in Sj are k-distinguishable then all such pairs in S((j+v) mod p) are kv-distinguishable where kv ≤ k + (v mod p).

Further examples of FSM-Os and tests for whether they are n-input determined are illustrated in figure 9-30. Here, in (a), the state dependencies are given in tables for a 9 state FSM-O, M, and its reverse, Mrev. Instead of building a graph of pairs in Mrev, all the states of Mrev are shown in column i, with transition arrows going to column ii with the associated input represented: a dashed line for 0, a light solid line for 1 and a heavy solid line for 2. We can see from this first set of transitions that many pairs of states are 1-distinguishable. Only a 0 input is applicable to states in the set {0, 1, 2}, only a 1 to those in {3, 4, 5}, only a 2 to those in {6, 7, 8}. Taking one member from each of two of these sets produces a pair which is 1-distinguishable. Then from ii to iii the transitions of Mrev are shown again. No pair taken from within any of the sets {0, 1, 2}, {3, 4, 5} or {6, 7, 8} is 1-distinguishable. So consider where these pairs are taken by transitions of Mrev.

Consider any pair within {0, 1, 2}. With 0 applied to state 0, Mrev goes to all states in {0, 1, 2}; when 0 is applied to state 1 it goes to {3, 4, 5}; and when applied to 2 it goes to {6, 7, 8}. So with 0 input, states 0, 1, and 2 go to disjoint sets of states, and every pair, one state of which comes from one of these sets and the second from another, is a pair which is 1-distinguishable. So as far as input 0 is concerned every pair in {0, 1, 2} is 2-distinguishable. The same holds for inputs 1 and 2. This covers all the state pairs, so M is 2-input determined. Also, tracing from i to iii, it is evident that each sequence of 2 inputs is applicable to at most 1 state. Notice that the connections between stages of these interwoven trees form a perfect shuffle.

Figure 9-30: State Dependence On Previous Inputs, Some Possibilities [panels: (a) M and Mrev, (b) M’]

The FSM-O in (b), M’, can be partitioned into 2 sets each with 9 states. M’ passes from a state in the set a0 through a8 to one in the set b0 through b8, then back to a0 through a8, etc. So the states are partitioned periodically. As in (a), pairs are not considered explicitly; rather the effect of each input on states of M’rev is. (M’rev is not shown.) Starting with the set of b states at i, the result of all inputs applied to these is shown at ii, which are all a states. As demonstrated by the applicability of these inputs, states within the sets {b0, b1, b2}, {b3, b4, b5} and {b6, b7, b8} are not 1-distinguishable, but all other pairs of b states are. Now some of the other relations between pairs of states will be traced to illustrate the final conclusion. Since the same inputs (1 and 2) are applicable at ii to all states in {a0, a1, a2}, those in {b0, b1, b2} remain non-distinguishable after a second input. On the other hand, going from ii to iii we see that states within the sets {a0, a1, a2}, {a3, a4, a5} and {a6, a7, a8} also are not 1-distinguishable, nor are pairs with one member in {a3, a4, a5} and the other in {a6, a7, a8}, because both sets accept 0 and 1. However, members of these two sets go to {b3, b4, b5} and {b6, b7, b8} respectively with the 0 input, and since pairs with one member from each of these are 1-distinguishable, pairs with one member from each of {a3, a4, a5} and {a6, a7, a8} are 2-distinguishable. Since b1 and b2 go to sets {a3, a4, a5} and {a6, a7, a8} respectively, b1 and b2 must be 3-distinguishable. An alternative view is obtained by tracing from i to iiii. One can see that the same sequence of three inputs is never applicable to a pair of b states. Because of periodicity, since any of the b states is determined by 3 previous inputs, each a state, reached by a single input from a b state, will be determined by no more than 4 previous inputs.

9.5.4. Addition With Booth Trits

A state diagram for a binary adder is given in figure 9-31, at (a1). The current state gives the carry in, the inputs are the j th pair of adder inputs, the output is the sum, and the next state is the carry out. This D FSM-O, M, when implemented as an iterative circuit, with all inputs available at time 0, has a time to completion which increases with n, the number of inputs. It is not O(1). In the state diagram an input of 1 1 always causes a transition to state 1, while an input of 0 0 always puts M in state 0. But an input of 0 1 (1 0) can take M to either state. However, suppose that in addition to outputs of 0 and 1 an output of −1 is allowed (a −1 at position j contributes −2^j to the number). This gives alternative ways to represent the same output. An input of 0 1 at position j is equivalent to a 1 at position j+1 and a −1 at position j. This alternative can be used in our interpretation of a 0 1 input so as to get the input 0 1, like 0 0 and 1 1, to always lead to the same state. The state diagram at (a2) accomplishes this end: input 0 1 always takes M to state 1. Thus the previous input determines the current state for this adder.

However, once a −1 output is allowed it makes good sense to allow a −1 input also, since often a series of additions is necessary. So it is sensible to develop an expanded D FSM-O which works with outputs and inputs taking all three values −1, 0, and 1. Under this condition there is still a choice of how 0 1 (1 0) and, analogously, 0 −1 (−1 0) are handled. Again the goal is to have a few previous inputs determine the current state. This is met by the state diagram at "THE SOLUTION". This is straightforward for inputs 0 0, 1 1, and −1 −1. For 0 1 (1 0) and 0 −1 (−1 0) the alternatives allow them to go to state 1 and −1 respectively for most occurrences, but not for all occurrences. So a test, using pair-path graphs to determine whether finite lookahead determines the current state, is appropriate. The test need only be carried out for that part of THE SOLUTION involving inputs 0 1 (1 0) and 0 −1 (−1 0), as shown at (c1). The test is carried out on the reverse of (c1) at (c2). The test is recorded at (c3) and shows that the solution is 3-lookahead.

Figure 9-31: Evolution Of State Diagram For Adding Booth Numbers [panels: (a1), (a2), THE IDEA, THE SOLUTION, THE PROOF (c1), (c2), (c3)]
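The (a2) idea for binary inputs, as described in the prose above, can be sketched as follows. This is an illustration under assumptions, not a transcription of the figure: the names `add_with_trits`, `value`, and `to_bits` are hypothetical, bits and trits are held lo-order first, and the rule table is inferred from the text (0 0 always gives carry 0; 0 1 or 1 0 always gives carry 1 with an interim digit of −1; 1 1 always gives carry 1 with an interim digit of 0).

```python
def add_with_trits(a_bits, b_bits):
    """Add two equal-length lo-order-first bit lists, producing trits in {-1, 0, 1}.

    The carry out of a position depends only on that position's input pair,
    so each output trit depends only on the current and the previous pair.
    """
    rule = {0: (0, 0), 1: (-1, 1), 2: (0, 1)}   # pair sum -> (interim digit, carry out)
    out, carry_in = [], 0
    for a, b in zip(a_bits, b_bits):
        interim, carry_out = rule[a + b]
        out.append(interim + carry_in)
        carry_in = carry_out
    out.append(carry_in)                         # final carry becomes the top trit
    return out


def value(digits):
    """Number represented by a lo-order-first list of bits or trits."""
    return sum(d * 2**k for k, d in enumerate(digits))


def to_bits(x, n):
    """n-bit lo-order-first representation of a non-negative integer x."""
    return [(x >> k) & 1 for k in range(n)]


example = add_with_trits(to_bits(3, 4), to_bits(5, 4))
```

Because no carry ever ripples past one position, every cell can compute its output from two adjacent input pairs, which is what makes the iterative circuit O(1).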

After completing a series of additions with the Booth representation it may be necessary to convert the result back to standard 2′s-complement representation. The inverse of the Booth representation given in figure 9-22 is not usable since it assumes the Booth notation was generated by OB, which is not the current case. So we assume that the input is a string I of n+1 trits (positions [n...0]), each −1, 0, or 1, and just to the left of that we need an end signal, e (although a 0 would do just as well). I represents a number in the range −(2^(n+1)−1) to 2^(n+1)−1, and does so in the same way as any such trit representation. Note however that there are 3^(n+1) representations possible, so the same number may be represented in many different ways. The output consists of 0s and 1s only and represents the same number as I in 2′s-complement. The state diagram in figure 9-32 gives this transformation. It is tested for previous-input determination of the current state as shown in the figure. Its pair graph has cycles which cannot be removed.
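One way to realize such a conversion is a two-state machine that carries a borrow from the low-order end. The sketch below is an assumption made for illustration and is not copied from figure 9-32: the names `trits_to_twos_complement` and `decode` are hypothetical, digits are held lo-order first, and the end signal e is modelled by appending the final borrow as the sign bit.

```python
from itertools import product


def trits_to_twos_complement(trits):
    """Convert lo-order-first trits (each -1, 0 or 1) to 2's-complement bits.

    Returns len(trits) + 1 bits, lo-order first; the last bit is the sign bit.
    """
    bits, borrow = [], 0
    for t in trits:
        v = t - borrow              # v is one of -2, -1, 0, 1
        bits.append(v % 2)          # output bit for this position
        borrow = 1 if v < 0 else 0  # borrow from the next higher position
    bits.append(borrow)             # end signal e: emit the sign bit
    return bits


def decode(bits):
    """Value of a lo-order-first 2's-complement bit list."""
    magnitude = sum(b * 2**k for k, b in enumerate(bits[:-1]))
    return magnitude - bits[-1] * 2 ** (len(bits) - 1)


minus_three = trits_to_twos_complement([1, 0, -1, 0, 0])   # 1 - 4 = -3
```

The borrow can persist across an arbitrarily long run of 0 trits, which is consistent with the observation that no bounded number of previous inputs determines the state here.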

Figure 9-32: From Trit To 2s-Complement Numbers [panels: (a), (b); labels: REVERSED, TEST, unremovable cycles; TEST FOR STATE DEPENDENCE ON A BOUNDED NUMBER OF PREVIOUS INPUTS]

9.6. Problems

1. Consider three ways of representing integers (positive and negative):

sign-magnitude, ex., -3 = 1 0 0 1 1

2′s complement, ex., -3 = 1 1 1 0 1

Booth notation, ex., -3 = 0 0 -1 0 +1

The following questions should be answered (brevity and clarity count).

a. Given that the numbers have n bits (trits in the Booth case), WHAT is the range representable in each representation?

b. HOW do you get the negative of a given number N in each representation?

c. WHY would any of the representations be favored over another for addition and subtraction?

d. WHY would any of the representations be favored over another for deciding whether two numbers were equal?

e. HOW would conversion from sign-magnitude to each of the other two representations be done?

2. SHOW the multiplication of −3 by −3 in each of the three 5 bit representations. In each case stay within the representation as much as possible; taking the negative of a number within a representation is allowed. For example, to show the multiplication of 3 by 3 in 2′s complement, the following is enough.

      00011
      00011
      -----
      00011
     00011
    -------
      01001

3. Give an O(1) average time design for a circuit to produce the 2′s complement of an n bit binary number.

4. GIVE a non-deterministic finite state diagram for T1, the first Booth transformation, if instead of reading the multiplier starting from the low order bit it is read starting with the high order bit.

5. GIVE a non-deterministic STATE DIAGRAM for an FSM-O that implements the second BOOTH transformation T2 by scanning the input left (high-order) to right (low-order). Assume that the input can be any sequence of n 0s, 1s, and −1s.

a. Can a k-look-ahead deterministic FSM-O be built to implement this specification? Explain.

b. If the input to transformation T2 is the output of transformation T1 then T2’s input is restricted. In what way?

c. Can a k-look-ahead deterministic FSM-O be built to implement this transformation T2 assuming the input is the restricted output of transformation T1 and the input is scanned left to right? Explain.

6. Instead of combining M1 and M2 into a single machine, M, as in figure 9-10, one can design a 1-lookahead implementation of M2, say M2’, and make M1 and M2’ the segments of a 2 segment pipeline; the first segment then implements T1 and the second T2.

a. Give the state diagram for M2’.

b. Why might this pipelined implementation be better than the implementation of M of figure 9-10?

c. Give state diagrams for cells in an iterative circuit to realize the FSM-O in figure 9-16 (c), taking advantage of the position of the cells in the iterative circuit.

7. How would one combine a deterministic lookahead FSM-O, A, with a deterministic non-lookahead FSM-O, B, into one deterministic lookahead FSM-O when the output of A is the input to B, and also when the output of B is the input to A?

8. Assuming that each binary representation of a 2′s-complement number is equally likely, PROVE that, on average, the number of 1s is the same as the number of 0s after the T1 transformation.

9. As an alternative to the two transformations T1−T2 one could apply a modified form of T1 called T′1:

a. Whenever there is a run of two or more 1s and the space between two such runs is more than a single 0, T′1 = T1.

b. In T′1 if there is a 1 between two 0s it is left as is, and

c. If there is a single 0 between two runs, say B to the left of A, each of length ≥ 0, again T′1 ≠ T1. Instead in T′1 that 0 is replaced by a −1 and 0 replaces the −1 that would otherwise result from the rightmost 1 of B.

PROVE that T′1(X) = T1(T2(X)). (Consider the state diagram for T′1(X).)


Table of Contents

9. ALU Arithmetic
9.1. Representation
9.2. 2’s Complement
9.2.0.1. 2’s Complement Definitions
9.2.1. Addition-Subtraction
9.2.1.1. Picture Of Overflow
9.2.1.2. Implementations Of Adders
9.2.2. Multiplication
9.2.2.1. Speeding Up Multiplication, Padding-Save Multiplication
9.2.2.2. Multiplication Using Booth’s Representation
9.2.2.3. Implementation Of Multipliers
9.3. The Booth Algorithm
9.3.0.1. The Booth Transformation Is Correct
9.3.1. Division
9.3.2. Floating Point
9.3.3. Implementing Floating Point Addition-Pipeline
9.4. Implementation Of Booth Algorithms And Finite State Machines
9.4.1. From Non-Deterministic To Deterministic FSM-O
9.4.2. ND To k-Lookahead D FSM-O, Definitions and Tests
9.4.2.1. Design Of D FSM-O Equivalent To k-Lookahead ND FSM-O
9.4.2.2. Digression: The Number Of Zeroes In An OB Multiplier
9.4.3. Do N Future Inputs Determine The Next State?
9.4.3.1. Output K-lookahead: A Test
9.4.4. Other Uses Of ND To D FSM-O Transformation
9.4.4.1. The Inverse Of An FSM-O, Booth Example
9.4.4.2. Reversed FSM-O M1 Example
9.5. Alternative Implementations Of Booth Transformations
9.5.0.1. Optimal Non-O(1) Design
9.5.0.2. Minimum Number of 1s In Booth Notation
9.5.1. An Alternative O(1) Design, SOB
9.5.1.1. Can The States Be Broken Into Proper Subsets Entered Periodically?
9.5.1.2. AOB: A Non-Optimal But Efficient Approximation Algorithm
9.5.1.3. The Number Of Zeroes In An AOB Multiplier
9.5.1.4. O(1) Implementations In General
9.5.2. Iterative And FSM-O Models: Lookahead, Lookbehind
9.5.3. Do A Finite Number Of Previous Inputs Determine The Current State?
9.5.3.1. Previous Inputs And Periodic Subsets
9.5.4. Addition With Booth Trits
9.6. Problems

List of Figures

Figure 9-1: 2’s Complement Representation, TC[M] - Examples
Figure 9-2: 2’s Complement Representation Pictured
Figure 9-3: Multiplication Of 2’s complements Numbers-Examples
Figure 9-4: Subtraction Part Of Multiplication, B
Figure 9-5: Padding-Save Multiplication-Examples
Figure 9-6: Multiplication With A Booth Multiplier
Figure 9-7: Product An n Bit By A 5 Bit Number, Circuit For Adding Bit Products
Figure 9-8: Carry Save Multiplication, The Adder
Figure 9-9: Booth Transformations T1 and T2
Figure 9-10: Development Of T1-T2 Optimal Booth (OB) Non-Deterministic FSM-O
Figure 9-11: Transformations T1 and T2 Work For 2s-complement Numbers
Figure 9-12: Division
Figure 9-13: Floating Point Range
Figure 9-14: Pipeline For Floating Point Addition
Figure 9-15: Task Tables For Floating Point Additions
Figure 9-16: 1-Lookahead Deterministic M’2 / M’ Equivalent To M2 / M
Figure 9-17: Transforming Of Non-Deterministic To k-Lookahead Deterministic FSM-O
Figure 9-18: Analysis For Probability Of States In M’
Figure 9-19: DState k-Lookahead: Pair-Path Graph Test
Figure 9-20: State k-Lookahead: Procedure For Removal Of Ambiguities
Figure 9-21: State k-Lookahead: Removal Of Ambiguities And Cycles
Figure 9-22: Inverse Of Optimal Booth (OB) Transformation, T2(T1(N)), With FSM-O M
Figure 9-23: FSM-O M1 For Booth Transformation T1 With Input Reversed
Figure 9-24: A And B are 1-Biased Representations of the Same Number
Figure 9-25: M1-M2’, Booth (0,1) To (0,1,-1,2,-2) Iterative Circuit
Figure 9-26: M1-M2’, Booth (0,1) To (0,1,-1,2,-2) State Diagram
Figure 9-27: Determination Of State Dependence On Previous Inputs, Booth Example
Figure 9-28: M1-M2’’, Booth (0,1) To (0,1,-1) (Approximation Algorithm)
Figure 9-29: Determining If Previous Inputs Determine Current State
Figure 9-30: State Dependence On Previous Inputs, Some Possibilities
Figure 9-31: Evolution Of State Diagram For Adding Booth Numbers
Figure 9-32: From Trit To 2s-Complement Numbers
