Coping With the Carry Problem
1. Limit Carry to Small Number of Bits
• Hybrid Redundant
• Residue Number Systems
2. Detect the End of Propagation Rather Than Wait for Worst-case Time
• Asynchronous (Self-Timed) Design
3. Speed-up Propagation Using Carry Lookahead and Other Methods
• Lookahead
• Carry-skip
• Ling Adder
• Carry-select
• Prefix Adders
• Conditional Sum
4. Eliminate Carry Propagation Altogether
• Redundant Number Systems
• Signed-Digit Representations
Residue Number Systems (RNS)
• Convert Arithmetic on Large Numbers to Arithmetic on Small Numbers
• Significant Speedup in Some Signal Processing Algorithms
• Valuable Tool for Theoretical Studies of the Limits of Fast Arithmetic
Residue Number Systems (RNS)
• Integer System
• Addition, Subtraction, Multiplication are Carry Free
• Division, Comparison, Sign Detection are Complex and Slow
• Inconvenient For Fractional Representations
• Generally Used For Special Purpose Applications such as DSP Filters
Residue Number Systems (RNS)
• Radix is an n-tuple of Integers $(m_n, m_{n-1}, \ldots, m_1)$, Not a Single Base Value
• Integer $X$ is Represented by the n-tuple $(x_n, x_{n-1}, \ldots, x_1)$
• $x_i$ is the Residue of $X$ mod $m_i$
• $q_i$ is the Largest Integer Such That:
$X = m_i q_i + x_i$
$0 \le x_i \le (m_i - 1)$
RNS Example Problem
Chinese Scholar Sun Tzu wrote (1500 years ago): What number has the remainders of 2, 3 and 2 when divided by the values 7, 5 and 3 respectively?
NOTATION: $x_i = X \bmod m_i = \langle X \rangle_{m_i}$
Sun Tzu’s Problem:
$X = (2|3|2)_{\text{RNS}(7|5|3)}$
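Sun Tzu's problem can be checked by exhaustive search, since exactly one value in [0, 7×5×3) has the required residues. The sketch below is only an illustration; the function name `solve_by_search` is hypothetical and not part of the slides.

```python
def solve_by_search(residues, moduli):
    """Return the smallest non-negative X with X mod m == r for every (r, m) pair."""
    bound = 1
    for m in moduli:
        bound *= m  # number of distinct representable values
    for x in range(bound):
        if all(x % m == r for r, m in zip(residues, moduli)):
            return x
    return None  # only reached if the residues are inconsistent


# Remainders 2, 3, 2 for divisors 7, 5, 3
print(solve_by_search((2, 3, 2), (7, 5, 3)))  # 23
```

Brute force is fine for a 105-value range; the Chinese Remainder Theorem later in the deck gives the direct formula.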
Residue (Modulo) of a Number
$11 \bmod 3 = 2$
$-1 \bmod 3 = (-1 + 3) \bmod 3 = 2 \bmod 3 = (2 + 3) \bmod 3 = 5 \bmod 3 = 2$
Many Examples in Chapter 4 of Text Use:
RNS(8|7|5|3)
Moduli Selection
• Dynamic Range is the Product of k Relatively Prime Moduli
• Product, M, is the Number of Different Representable Values in the RNS
DEFINITION
$m_i$ and $m_j$ are Relatively Prime if $\gcd(m_i, m_j) = 1$
EXAMPLE
$m_i = 4$ and $m_j = 9$, $\gcd(4, 9) = 1$
Although Neither 4 Nor 9 is Prime, They are Relatively Prime
RNS Representation
• Consider RNS(8|7|5|3) (our default RNS in this class)
• 840 Distinct Representable Values, Since $M = 8 \times 7 \times 5 \times 3 = 840$
• Since $\langle -X \rangle_{m_i} = \langle M - X \rangle_{m_i}$, Can Represent Any Interval of 840 Consecutive Values:
$X \in [0, 839],\ [-420, 419],\ \ldots$
Example RNS Values, RNS(8|7|5|3)
(0|0|0|0)RNS = 0, 840, 1680, ...
(1|1|1|1)RNS = 1, 841, 1681, ...
(2|2|2|2)RNS = 2, 842, 1682, ...
(0|1|3|2)RNS = 8, 848, 1688, ...
(5|0|1|0)RNS = 21, 861, 1701, ...
(0|1|4|1)RNS = 64, 904, 1744, ...
(2|0|0|2)RNS = -70, 770, 1610, ...
(7|6|4|2)RNS = -1, 839, 1679, ...
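RNS Example: $1701_{10}$ in RNS(8|7|5|3)
$1701_{10} / 8_{10}$: q = $212_{10}$, remainder = $5_{10}$
$1701_{10} / 7_{10}$: q = $243_{10}$, remainder = $0_{10}$
$1701_{10} / 5_{10}$: q = $340_{10}$, remainder = $1_{10}$
$1701_{10} / 3_{10}$: q = $567_{10}$, remainder = $0_{10}$
$1701_{10} = (5|0|1|0)_{\text{RNS}}$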
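The digit-by-digit division above amounts to taking one remainder per modulus. A minimal sketch, assuming the default RNS(8|7|5|3) and a hypothetical helper name `to_rns`:

```python
MODULI = (8, 7, 5, 3)  # default RNS used throughout the slides


def to_rns(x, moduli=MODULI):
    """Encode integer x as a tuple of residues, one per modulus."""
    # Python's % returns a non-negative result for a positive modulus,
    # so negative x is encoded the same way as M + x.
    return tuple(x % m for m in moduli)


print(to_rns(1701))  # (5, 0, 1, 0)
print(to_rns(21))    # (5, 0, 1, 0), since 1701 = 21 + 2 * 840
print(to_rns(-1))    # (7, 6, 4, 2)
```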
RNS Complementation
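• Given RNS Representation of X, -X is Obtained by Complementing Each Digit With Respect to its Modulus. Zero Digits are Unchanged.
EXAMPLE
$21_{10} = (5|0|1|0)_{\text{RNS}}$
$-21_{10} = (8-5\,|\,0\,|\,5-1\,|\,0)_{\text{RNS}} = (3|0|4|0)_{\text{RNS}}$
CHECK
$21_{10} / 8_{10}$: q = $2_{10}$, remainder = $5_{10}$, and $-5 \bmod 8 = 3$
$21_{10} / 7_{10}$: q = $3_{10}$, remainder = $0_{10}$
$21_{10} / 5_{10}$: q = $4_{10}$, remainder = $1_{10}$, and $-1 \bmod 5 = 4$
$21_{10} / 3_{10}$: q = $7_{10}$, remainder = $0_{10}$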
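Complementation can be sketched as one independent operation per digit. The helper name `rns_negate` is hypothetical; the trailing `% m` keeps zero digits at zero, as the rule above requires.

```python
MODULI = (8, 7, 5, 3)


def rns_negate(digits, moduli=MODULI):
    """Return the RNS representation of -X given that of X."""
    # (m - d) % m maps d = 0 to 0 and every other digit d to m - d.
    return tuple((m - d) % m for d, m in zip(digits, moduli))


print(rns_negate((5, 0, 1, 0)))  # (3, 0, 4, 0), i.e. -21
```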
Chinese Remainder Theorem
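• RNS can be viewed as a weighted system.
RNS(8|7|5|3) has weights $w_i = (105_{10}, 120_{10}, 336_{10}, 280_{10})$
EXAMPLE
$(1|2|4|0)_{\text{RNS}} = \langle (1 \times 105) + (2 \times 120) + (4 \times 336) + (0 \times 280) \rangle_{840} = \langle 1689 \rangle_{840} = 9$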
RNS Encoding Efficiency
• Example RNS(8|7|5|3) Requires 3 + 3 + 3 + 2 = 11 Bits (mod 8, mod 7, mod 5, mod 3 fields)
• 840 Different Values Represented
• $2^{11} = 2048$
• Efficiency $= 840 / 2048 \approx 41\%$
• $\log_2(840) = 9.714$, so $11 - 9.714 \approx 1.3$ Bits Wasted
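The efficiency figures can be reproduced with a few lines. This is a sketch under the assumption that each digit is stored in the minimum number of bits, $\lceil \log_2 m_i \rceil$; the function name `encoding_efficiency` is hypothetical.

```python
import math


def encoding_efficiency(moduli):
    """Return (M, total bits, efficiency, wasted bits) for a set of moduli."""
    M = math.prod(moduli)
    # (m - 1).bit_length() is the number of bits needed for digits 0 .. m-1
    bits = sum((m - 1).bit_length() for m in moduli)
    return M, bits, M / 2 ** bits, bits - math.log2(M)


M, bits, eff, wasted = encoding_efficiency((8, 7, 5, 3))
print(M, bits, round(eff, 2), round(wasted, 1))  # 840 11 0.41 1.3
```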
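RNS Arithmetic
• Addition, Subtraction, Multiplication Can be Performed with Independent Operations on Each Digit
• Following Examples Show This Process
$X = 5_{10} = (5|5|0|2)_{\text{RNS}}$
$Y = -1_{10} = (7|6|4|2)_{\text{RNS}}$
$X + Y = (\langle 5+7 \rangle_8 \,|\, \langle 5+6 \rangle_7 \,|\, \langle 0+4 \rangle_5 \,|\, \langle 2+2 \rangle_3)_{\text{RNS}} = (4|4|4|1)_{\text{RNS}}$
$X - Y = (\langle 5-7 \rangle_8 \,|\, \langle 5-6 \rangle_7 \,|\, \langle 0-4 \rangle_5 \,|\, \langle 2-2 \rangle_3)_{\text{RNS}} = (6|6|1|0)_{\text{RNS}}$
$X \times Y = (\langle 5 \times 7 \rangle_8 \,|\, \langle 5 \times 6 \rangle_7 \,|\, \langle 0 \times 4 \rangle_5 \,|\, \langle 2 \times 2 \rangle_3)_{\text{RNS}} = (3|2|0|1)_{\text{RNS}}$
• For Subtraction, Can Also Complement the Subtrahend and Add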
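The three examples share one pattern: apply the operation digit by digit and reduce each result by its own modulus. A small sketch, with the hypothetical helper `rns_op`:

```python
import operator

MODULI = (8, 7, 5, 3)


def rns_op(op, x, y, moduli=MODULI):
    """Apply a binary integer operation independently to each digit pair."""
    return tuple(op(a, b) % m for a, b, m in zip(x, y, moduli))


X = (5, 5, 0, 2)  # 5
Y = (7, 6, 4, 2)  # -1

print(rns_op(operator.add, X, Y))  # (4, 4, 4, 1) =  4
print(rns_op(operator.sub, X, Y))  # (6, 6, 1, 0) =  6
print(rns_op(operator.mul, X, Y))  # (3, 2, 0, 1) = -5 (or 835)
```

No digit position ever reads another position's result, which is why the hardware units can run in parallel.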
RNS Circuit Structure
[Figure: the mod 8, mod 7, mod 5 and mod 3 digits of the operands feed four independent units: a mod-8 unit, a mod-7 unit, a mod-5 unit and a mod-3 unit.]
Choosing RNS Moduli
• Assume we wish to represent $100{,}000_{10}$ Values
• Standard Binary: $\lceil \log_2(100{,}000) \rceil = \lceil 16.6096 \rceil = 17$ bits
• RNS(13|11|7|5|3|2), Dynamic Range $M = 30{,}030_{10}$
– Insufficient Dynamic Range
– Maximum Digit Width = 4 bits, Total = 17 bits
• RNS(17|13|11|7|5|3|2), Dynamic Range $M = 510{,}510_{10}$
– Dynamic Range 5.1 Times Too Large
– Maximum Digit Width = 5 bits, Total = 22 bits
• Adding More Prime Moduli is Inefficient
Choosing RNS Moduli
• Remove $m_i = 5$ From RNS(17|13|11|7|5|3|2)
• RNS(17|13|11|7|3|2), Dynamic Range $M = 102{,}102_{10}$
• Still Have Relatively Prime Moduli
– Maximum Digit Width = 5 bits, Total = 19 bits
– One 5-bit, Two 4-bit, One 3-bit, One 2-bit and One 1-bit Modulo Units Required
• Maximum Delay is a 5-bit Carry-Propagate
• Can Combine (3,7) and (2,13) Moduli With no Speed Penalty
• RNS(26|21|17|11), Dynamic Range $M = 102{,}102_{10}$
– Maximum Digit Width = 5 bits, Total = 19 bits
– Three 5-bit and One 4-bit Modulo Units Required
Relatively Prime Values
• Powers of Distinct Smaller Primes are Relatively Prime
Example
• $\gcd(3^2, 2^2) = 1$ But $\gcd(3^2, 3) = 3$
– Can REPLACE a Modulus With its Power
– Try to Use a Sequence of the SMALLEST Valued Moduli
RNS($2^2$|3), Dynamic Range $M = 12_{10}$
RNS($3^2$|$2^3$|7|5), Dynamic Range $M = 2{,}520_{10}$
RNS(11|$3^2$|$2^3$|7|5), Dynamic Range $M = 27{,}720_{10}$
RNS(13|11|$3^2$|$2^3$|7|5), Dynamic Range $M = 360{,}360_{10}$
– Maximum Digit Width = 4 bits, Total = 21 bits
– Dynamic Range 3.6 times that Needed
Relatively Prime Values
RNS(13|11|$3^2$|$2^3$|7|5), Dynamic Range $M = 360{,}360_{10}$
– Maximum Digit Width = 4 bits, Total = 21 bits
– Dynamic Range 3.6 times that Needed
• Reduce the Above by a Factor of 3
• Replace $3^2$ with 3 and Combine 3 and 5 to Get 15
RNS(15|13|11|$2^3$|7), Dynamic Range $M = 120{,}120_{10}$
– Maximum Digit Width = 4 bits, Total = 18 bits
– Dynamic Range 1.2 times that Needed
• Using This Strategy Can Generally Find the “Best” Moduli in Terms of Speed and Representation Efficiency
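The candidate moduli sets discussed so far can be compared mechanically. The sketch below reports, for each set, whether the moduli are pairwise relatively prime, the dynamic range M, the widest digit and the total bit count; the function name `describe` is hypothetical, and digit widths assume minimum-width binary encoding of each residue.

```python
import math
from itertools import combinations


def describe(moduli):
    """Return (pairwise coprime?, M, max digit width, total bits)."""
    coprime = all(math.gcd(a, b) == 1 for a, b in combinations(moduli, 2))
    widths = [(m - 1).bit_length() for m in moduli]
    return coprime, math.prod(moduli), max(widths), sum(widths)


for candidate in [
    (13, 11, 7, 5, 3, 2),
    (17, 13, 11, 7, 5, 3, 2),
    (26, 21, 17, 11),
    (13, 11, 9, 8, 7, 5),
    (15, 13, 11, 8, 7),
]:
    print(candidate, describe(candidate))
```

For a target of 100,000 values, the last set gives M = 120,120 with 4-bit digits and 18 bits in total, matching the figures above.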
Moduli Choice for Simple Arithmetic Unit Design
• Simple Units Also Lead to Speed and Cost Benefits
• Modulo-ADD, SUBTRACT, MULTIPLY Units are Simple to Design if $m_i = 2^{a_i}$ or $2^{a_i} - 1$
• Power of 2 Moduli Lead to Simple Design
– Standard a-bit Binary Adder
– Example: Use 16 Instead of 13
– Exception in Case of Lookup Table Implementation
• Moduli of the Form $2^a - 1$ Lead to Simple Design
– Standard a-bit Binary Adder with End-around Carry
– Referred to as “Low-cost” Moduli
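To illustrate why $2^a - 1$ moduli are cheap, here is a software model of an a-bit adder with end-around carry. It is a sketch, not a circuit description: the function name is hypothetical, inputs are assumed to be ordinary residues in $[0, 2^a - 2]$, and mapping the all-ones pattern back to 0 is an extra normalization step chosen here so the result is directly comparable with `%`.

```python
def add_mod_2a_minus_1(x, y, a):
    """Add two residues modulo 2**a - 1 using an end-around carry."""
    mask = (1 << a) - 1          # 2**a - 1, also the all-ones pattern
    s = x + y                    # (a+1)-bit sum from the binary adder
    s = (s & mask) + (s >> a)    # feed the carry-out back into the LSB
    # All ones is a second encoding of zero; normalize it (assumed step).
    return 0 if s == mask else s


print(add_mod_2a_minus_1(9, 11, 4))  # 5, since 20 mod 15 = 5
```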
RNS Low-Cost Moduli
Theorem: A sufficient condition for $2^a - 1$ and $2^b - 1$ to be a relatively prime pair is that a and b are relatively prime.
• Any List of Relatively Prime Numbers:
$a_{k-2} > \cdots > a_1 > a_0$
• Can be Used as a BASIS of a k-modulus RNS:
RNS($2^{a_{k-2}}$ | $2^{a_{k-2}} - 1$ | ... | $2^{a_1} - 1$ | $2^{a_0} - 1$)
• Widest Residues (Longest Carry-chain) are $a_{k-2}$-bit Values
Low-Cost Moduli Example
• Consider the Example From Earlier
$X \in [0, 100{,}000]$
• Choosing the Moduli From Smallest to Largest:
RNS($2^3$ | $2^3 - 1$ | $2^2 - 1$), Basis: 3, 2, $M = 168_{10}$
RNS($2^4$ | $2^4 - 1$ | $2^3 - 1$), Basis: 4, 3, $M = 1680_{10}$
RNS($2^5$ | $2^5 - 1$ | $2^3 - 1$ | $2^2 - 1$), Basis: 5, 3, 2, $M = 20{,}832_{10}$
RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$), Basis: 5, 4, 3, $M = 104{,}160_{10}$
• Can’t Include 2 and 4 in Same Basis Set, gcd(2,4)=2
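The construction of a low-cost RNS from a basis can be sketched as follows. The helper names are hypothetical; the code builds the moduli list $2^{a_{max}}, 2^{a_{max}} - 1, \ldots$ from the basis exponents and checks pairwise relative primality directly with `math.gcd`.

```python
import math
from itertools import combinations


def low_cost_moduli(basis):
    """Build [2**a_max, 2**a_max - 1, ..., 2**a_0 - 1] from the basis exponents."""
    exps = sorted(basis, reverse=True)
    return [2 ** exps[0]] + [2 ** a - 1 for a in exps]


def pairwise_coprime(values):
    return all(math.gcd(a, b) == 1 for a, b in combinations(values, 2))


moduli = low_cost_moduli([5, 4, 3])
print(moduli, math.prod(moduli), pairwise_coprime(moduli))
# [32, 31, 15, 7] 104160 True

# A basis containing both 2 and 4 fails, since gcd(2**4 - 1, 2**2 - 1) = 3
print(pairwise_coprime(low_cost_moduli([4, 2])))  # False
```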
Low-Cost Moduli Example
RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$), Basis: 5, 4, 3, $M = 104{,}160_{10}$
= RNS(32|31|15|7)
• Requires 5+5+4+3=17 bits
• Requires Two 5-bit, One 4-bit and One 3-bit Modules
• 4 RNS Digits
• Efficiency = 100,001/104,160 ≈ 0.96, i.e. About 96%
• Comparing With Unrestricted Moduli:
RNS($2^5$ | $2^5 - 1$ | $2^4 - 1$ | $2^3 - 1$), 17 bits, $M = 104{,}160_{10}$: 5-bit Carry-ripple but Simpler Circuit, Fewer Digits
RNS(15|13|11|$2^3$|7), 18 bits, $M = 120{,}120_{10}$: 4-bit Carry-ripple, 1 Extra Digit
Encoding and Decoding
• Advantages of Alternative Number Systems Must Not be Outweighed By Conversions to/from the System
• Encoding From Fixed Positional System to RNS is Easily Accomplished Using Table Lookup and Modulo Addition Circuits
Encoding with Lookup Table
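• Conversion of Signed-Magnitude or 2’s Complement is Accomplished by Converting the Magnitude and Taking the RNS Complement
• Consider the Following Identity:
$\langle (y_{k-1} \cdots y_1 y_0)_{\text{two}} \rangle_{m_i} = \langle \langle 2^{k-1} y_{k-1} \rangle_{m_i} + \cdots + \langle 2 y_1 \rangle_{m_i} + \langle y_0 \rangle_{m_i} \rangle_{m_i}$
• Idea is to Precompute $\langle 2^j \rangle_{m_i}$ for all i, j, Store the Values in a Table, Then Add the Entries Selected by the 1 Bits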
Example Lookup Table
• Use Default RNS=(8|7|5|3)
j    2^j    <2^j>_7    <2^j>_5    <2^j>_3
0    1      1          1          1
1    2      2          2          2
2    4      4          4          1
3    8      1          3          2
4    16     2          1          1
5    32     4          2          2
6    64     1          4          1
7    128    2          3          2
8    256    4          1          1
9    512    1          2          2
• For $m_i = 8$ We Can Use the 3 LSBs of the Value
Example Encoding
$Y = (10100100)_2 = 164_{10}$, RNS(8|7|5|3)
$x_3 = \langle Y \rangle_8 = (100)_2 = 4$ (the 3 LSBs of Y)
$Y = 2^7 + 2^5 + 2^2$, so $j \in \{7, 5, 2\}$
$x_2 = \langle Y \rangle_7 = \langle 2 + 4 + 4 \rangle_7 = 3$
$x_1 = \langle Y \rangle_5 = \langle 3 + 2 + 4 \rangle_5 = 4$
$x_0 = \langle Y \rangle_3 = \langle 2 + 2 + 1 \rangle_3 = 2$
$Y = (x_3|x_2|x_1|x_0)_{\text{RNS}} = (4|3|4|2)_{\text{RNS}}$
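The table-lookup encoding can be sketched in software as follows. This is an illustration only: the names `TABLE` and `encode` are hypothetical, and inputs are assumed to be non-negative and at most 10 bits wide, matching the table with j = 0..9.

```python
K = 10                   # table covers bit positions j = 0 .. 9
TABLE_MODULI = (7, 5, 3)
# TABLE[m][j] holds <2^j>_m, precomputed once
TABLE = {m: [pow(2, j, m) for j in range(K)] for m in TABLE_MODULI}


def encode(y):
    """Encode 0 <= y < 2**K into RNS(8|7|5|3) using table lookups and modular adds."""
    digits = [y & 0b111]             # mod 8: just the 3 least significant bits
    for m in TABLE_MODULI:
        total = 0
        for j in range(K):
            if (y >> j) & 1:         # bit j is set: add its table entry
                total = (total + TABLE[m][j]) % m
        digits.append(total)
    return tuple(digits)


print(encode(0b10100100))  # (4, 3, 4, 2), i.e. 164
```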
RNS to Mixed-Radix Form
• CRT States That a Mixed-Radix Number System (MRS) is Associated with any RNS
• Solves comparison, sign detection, and overflow problems
• MRS is a k-digit Weighted Positional Number System
MRS($m_{k-1}$|$m_{k-2}$|...|$m_2$|$m_1$|$m_0$)
• MRS Weights are Products:
$(m_{k-2} \cdots m_2 m_1 m_0,\ \ldots,\ m_2 m_1 m_0,\ m_1 m_0,\ m_0,\ 1)$
• MRS Digit Sets in Each of k Positions:
$[0, m_{k-1} - 1], \ldots, [0, m_2 - 1], [0, m_1 - 1], [0, m_0 - 1]$
• MRS Digits are in the Same Range as RNS Digits
RNS to MRS Example
• Example Position Weights for MRS(8|7|5|3):
(7)(5)(3)=105, (5)(3)=15, 3, 1
• $(0|3|1|0)_{\text{MRS}(8|7|5|3)} = (0)(105)+(3)(15)+(1)(3)+(0)(1) = 48_{10}$
• RNS to MRS Conversion Requires Finding the $z_i$ that Correspond to the $y_i$ in:
$Y = (y_{k-1}|\cdots|y_2|y_1|y_0)_{\text{RNS}} = (z_{k-1}|\cdots|z_2|z_1|z_0)_{\text{MRS}}$
RNS to MRS Conversion
• From MRS Definition we Have:
$Y = z_{k-1}(m_{k-2} \cdots m_2 m_1 m_0) + \cdots + z_2(m_1 m_0) + z_1(m_0) + z_0(1)$
• Easy to See that $z_0 = y_0$. Subtracting This Value From the RNS and MRS Values Results in:
$Y - y_0 = (y'_{k-1}|\cdots|y'_2|y'_1|0)_{\text{RNS}} = (z_{k-1}|\cdots|z_2|z_1|0)_{\text{MRS}}$, where $y'_j = \langle y_j - y_0 \rangle_{m_j}$
RNS to MRS Conversion (cont)
• Next, Divide Both Representations by $m_0$:
$(y''_{k-1}|\cdots|y''_2|y''_1)_{\text{RNS}} = (z_{k-1}|\cdots|z_2|z_1)_{\text{MRS}}$
• Thus, if We Can Divide by $m_0$, We Have an Iterative Approach for Conversion
• Dividing $y'$ (a Multiple of $m_0$) by $m_0$ is SCALING, Which is Easier Than Normal RNS Division
• Accomplished by Multiplying by the Multiplicative Inverse of $m_0$
Multiplicative Inverses
• Multiplicative Inverse is a Value That, When Multiplied by a Given Quantity, Yields a Product of 1 (Modulo $m_i$)
• Example Multiplicative Inverses of 3 Relative to $m_i$ = 8, 7, 5:
$\langle 3 \times 3 \rangle_8 = 1$
$\langle 3 \times 5 \rangle_7 = 1$
$\langle 3 \times 2 \rangle_5 = 1$
• Thus, Multiplicative Inverses are 3, 5 and 2
• Can Build a Lookup Table Circuit to Store Inverses
CRT Lookup Table
i    m_i    y_i    <M_i <α_i y_i>_{m_i}>_M
3    8      0      0
            1      105
            2      210
            3      315
            4      420
            5      525
            6      630
            7      735
2    7      0      0
            1      120
            2      240
            3      360
            4      480
            5      600
            6      720
1    5      0      0
            1      336
            2      672
            3      168
            4      504
0    3      0      0
            1      280
            2      560
Multiplicative Inverses Example
• Divide the Number Y' = (0|6|3|0)RNS by 3
• Accomplish Through Multiplication by (3|5|2|-)RNS
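$(0|6|3|0)_{\text{RNS}} / 3 = (0|6|3|0)_{\text{RNS}} \times (3|5|2|-)_{\text{RNS}} = (\langle 0 \times 3 \rangle_8 \,|\, \langle 6 \times 5 \rangle_7 \,|\, \langle 3 \times 2 \rangle_5 \,|\, -)_{\text{RNS}} = (0|2|1|-)_{\text{RNS}}$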
RNS/MRS Conversion Example
• Convert Y=(0|6|3|0)RNS to MRS: z0 = y0 = 0
• Divide by 3:
(0|6|3|0)RNS / 3 = (0|2|1|-)RNS
• Now, We Have z1 = 1. Subtract 1 and Divide by 5:
(7|1|0|-)RNS / 5 = (7|1|0|-)RNS × (5|3|-|-)RNS = (3|3|-|-)RNS
• This Gives z2 = 3. Subtract 3 and Divide by 7:
RNS/MRS Conversion Example
(0|0|-|-)RNS / 7 = (0|-|-|-)RNS, so z3 = 0
• Thus Y=(0|6|3|0)RNS is (0|3|1|0)MRS
• Position Weights MRS (8|7|5|3)
(7)(5)(3)=105, (5)(3)=15, 3, 1
So, Y=(0|6|3|0)RNS = (0|3|1|0)MRS = (48)10
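The iterative subtract-and-scale procedure can be sketched as below. The helper names are hypothetical, and the three-argument `pow(m0, -1, mj)` used for the multiplicative inverse requires Python 3.8 or later. Moduli and digits are listed most significant first, as on the slides.

```python
MODULI = (8, 7, 5, 3)


def rns_to_mrs(digits, moduli=MODULI):
    """Convert RNS digits to mixed-radix digits (same left-to-right order)."""
    y = list(digits)
    m = list(moduli)
    z = []
    while m:
        m0 = m.pop()   # lowest remaining modulus
        z0 = y.pop()   # its residue is the next MRS digit
        z.append(z0)
        # Subtract z0, then scale by m0 via its inverse in each remaining modulus.
        y = [((yj - z0) * pow(m0, -1, mj)) % mj for yj, mj in zip(y, m)]
    return tuple(reversed(z))


def mrs_to_int(z, moduli=MODULI):
    """Evaluate mixed-radix digits with weights (m2*m1*m0, m1*m0, m0, 1)."""
    value = 0
    for zi, mi in zip(z, moduli):
        value = value * mi + zi   # Horner's rule over the mixed radices
    return value


z = rns_to_mrs((0, 6, 3, 0))
print(z, mrs_to_int(z))  # (0, 3, 1, 0) 48
```

Because MRS is a weighted positional system, two numbers can then be compared digit by digit from the most significant position.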
RNS/MRS Conversion
• Consider Conversion of (3|2|4|2)RNS from RNS(8|7|5|3) to Decimal
(3|2|4|2)RNS = (3|0|0|0)RNS + (0|2|0|0)RNS + (0|0|4|0)RNS + (0|0|0|2)RNS
= 3×(1|0|0|0)RNS + 2×(0|1|0|0)RNS + 4×(0|0|1|0)RNS + 2×(0|0|0|1)RNS
• Need to Determine Values of (1|0|0|0)RNS, (0|1|0|0)RNS, (0|0|1|0)RNS and (0|0|0|1)RNS
RNS/MRS Conversion
• From the Definition of RNS, a Position Holding 0 Means the Value is a Multiple of That Position's Modulus, and the Position Holding 1 Means <Y>mi = 1
(1|0|0|0)RNS = 105
(0|1|0|0)RNS = 120
(0|0|1|0)RNS = 336
(0|0|0|1)RNS = 280
$(3|2|4|2)_{\text{RNS}} = \langle (3 \times 105) + (2 \times 120) + (4 \times 336) + (2 \times 280) \rangle_{840} = \langle 2459 \rangle_{840} = 779$
Chinese Remainder Theorem
• How Did We Find w3 = (1|0|0|0)RNS = 105?
• Since Digits in the 7, 5, 3 Places are 0, w3 Must be a Multiple of (7)(5)(3)=105
• Must Pick the Multiple of 105 Such That its Residue With Respect to 8 is 1
• Accomplished by Multiplying 105 by its Multiplicative Inverse with Respect to 8
• This Process is Formalized in the Chinese Remainder Theorem
Chinese Remainder Theorem
THEOREM: Chinese Remainder Theorem (CRT)
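The magnitude of an RNS number can be obtained from the CRT formula:
$Y = (y_{k-1}|\cdots|y_2|y_1|y_0)_{\text{RNS}} = \left\langle \sum_{i=0}^{k-1} M_i \langle \alpha_i y_i \rangle_{m_i} \right\rangle_M$
where, by definition, $M_i = M/m_i$ and $\alpha_i = \langle M_i^{-1} \rangle_{m_i}$ is the multiplicative inverse of $M_i$ with respect to $m_i$.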
Chinese Remainder Theorem
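$Y = (y_{k-1}|\cdots|y_2|y_1|y_0)_{\text{RNS}} = \left\langle \sum_{i=0}^{k-1} M_i \langle \alpha_i y_i \rangle_{m_i} \right\rangle_M$
• Can Avoid Multiplications in Conversion Process by Storing $\langle M_i \langle \alpha_i y_i \rangle_{m_i} \rangle_M$ in a Table
• Example Table Given on page 64 of Textbook (and also in slide 33)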
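A direct software rendering of the CRT formula is sketched below, assuming the default RNS(8|7|5|3). The function name `crt_decode` is hypothetical; `pow(Mi, -1, m)` computes $\alpha_i$ and requires Python 3.8 or later. A hardware design would replace the per-digit products by the stored table entries.

```python
import math

MODULI = (8, 7, 5, 3)


def crt_decode(digits, moduli=MODULI):
    """Return the value in [0, M) represented by the given RNS digits."""
    M = math.prod(moduli)
    total = 0
    for y, m in zip(digits, moduli):
        Mi = M // m                 # M_i = M / m_i
        alpha = pow(Mi, -1, m)      # alpha_i = inverse of M_i modulo m_i
        total += Mi * ((alpha * y) % m)
    return total % M


print(crt_decode((3, 2, 4, 2)))  # 779
print(crt_decode((1, 2, 4, 0)))  # 9
print(crt_decode((0, 6, 3, 0)))  # 48
```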
Difficult RNS Operations
• Sign Test
• Magnitude Comparison
• Overflow Detection
• Generalized Division
It suffices to discuss the first three in the context of magnitude comparison, since they are essentially the same operation if M is such that M = N + P + 1, where the values represented are in the interval [-N, P].
Difficult RNS Operations
• Sign Test same as Comparison with P
• Overflow Detection accomplished using Signs
of Operands and Results
Focus On:
• Magnitude Comparison
• Generalized Division
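Magnitude Comparison
• Could Convert to Weighted Representation Using CRT
– Too Complicated, Too Much Overhead
• Use Approximate CRT Instead
• Divide CRT Equality by M:
$Y = (y_{k-1}|\cdots|y_2|y_1|y_0)_{\text{RNS}} = \left\langle \sum_{i=0}^{k-1} M_i \langle \alpha_i y_i \rangle_{m_i} \right\rangle_M$
$\dfrac{Y}{M} = \dfrac{(y_{k-1}|\cdots|y_2|y_1|y_0)_{\text{RNS}}}{M} = \left\langle \sum_{i=0}^{k-1} m_i^{-1} \langle \alpha_i y_i \rangle_{m_i} \right\rangle_1$
since $M_i = M/m_i$ by Definition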
Approximate CRT
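• Addition of Terms is Modulo-1
• All $m_i^{-1} \langle \alpha_i y_i \rangle_{m_i}$ Are in [0,1)
• Whole Part of Result is Discarded and Fractional Part Kept
• Much Easier than CRT Modulo-M Addition
• $m_i^{-1} \langle \alpha_i y_i \rangle_{m_i}$ Can be Precomputed for all y and i
• Use Table Lookup Circuit and Fractional Adder (ignore carry-outs)
$\dfrac{Y}{M} = \dfrac{(y_{k-1}|\cdots|y_2|y_1|y_0)_{\text{RNS}}}{M} = \left\langle \sum_{i=0}^{k-1} m_i^{-1} \langle \alpha_i y_i \rangle_{m_i} \right\rangle_1$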
Approximate CRT Lookup Table
i    m_i    y_i    m_i^{-1} <α_i y_i>_{m_i}
3    8      0      .0000
            1      .1250
            2      .2500
            3      .3750
            4      .5000
            5      .6250
            6      .7500
            7      .8750
2    7      0      .0000
            1      .1429
            2      .2857
            3      .4286
            4      .5714
            5      .7143
            6      .8571
1    5      0      .0000
            1      .4000
            2      .8000
            3      .2000
            4      .6000
0    3      0      .0000
            1      .3333
            2      .6667
Magnitude Comparison Example
Use approximate CRT decoding to determine the larger of the two numbers.
X = (0|6|3|0)RNS, Y = (5|3|0|0)RNS
Reading the Values from the Tables:
$X/M \approx \langle .0000 + .8571 + .2000 + .0000 \rangle_1 \approx .0571$
$Y/M \approx \langle .6250 + .4286 + .0000 + .0000 \rangle_1 \approx .0536$
Thus, we conclude that: X > Y
Approximate CRT Error
If the Maximum Error in Each Approximate CRT Table Entry is ε, then Approximate CRT Decoding Yields the Scaled Value of an RNS Number with Error No Greater than kε
Previous Example Table Entries are Rounded to 4 Digits
Maximum Error in Each Entry is ε = 0.00005
With k = 4 Digits, the Error is at Most 4ε = 0.0002
0.0571 - 0.0536 = 0.0035 > 4ε = 0.0002, so X > Y is Safe
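The approximate-CRT comparison can be sketched as follows. All names are hypothetical. One deliberate difference from the slide is flagged here as an assumption: the code uses a margin of 2kε rather than kε, because each of the two decoded values may be off by up to kε; the slide's example passes either test (0.0035 > 0.0004).

```python
import math

MODULI = (8, 7, 5, 3)
M = math.prod(MODULI)
EPS = 0.00005  # maximum rounding error of a 4-digit table entry


def build_table(digits=4):
    """Precompute m_i^-1 * <alpha_i * y>_{m_i}, rounded, for every modulus and residue."""
    table = {}
    for m in MODULI:
        alpha = pow(M // m, -1, m)
        table[m] = [round(((alpha * y) % m) / m, digits) for y in range(m)]
    return table


TABLE = build_table()


def approx_scaled(rns):
    """Approximate Y / M: add the table entries and keep only the fractional part."""
    return sum(TABLE[m][y] for y, m in zip(rns, MODULI)) % 1.0


def compare(x, y):
    """Return 1 if x > y, -1 if x < y, 0 if the estimates are too close to call."""
    margin = 2 * len(MODULI) * EPS  # assumed conservative bound, see lead-in
    diff = approx_scaled(x) - approx_scaled(y)
    if diff > margin:
        return 1
    if diff < -margin:
        return -1
    return 0


print(compare((0, 6, 3, 0), (5, 3, 0, 0)))  # 1, i.e. X > Y
```

When `compare` returns 0, a design would need an exact method such as full CRT or mixed-radix conversion.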
Redundant RNS Representations
• Do Not Have to Restrict Digits in RNS to the Set $[0, m_i - 1]$
• If Digits are in $[0, \beta_i]$ Where $\beta_i \ge m_i$, Then the RNS is Redundant
• Redundant RNS Simplifies the Modular Reduction Step for Each Arithmetic Operation
Redundant RNS Example
• Consider mod-13 with Digit Set [0,15]
• Redundant since:
0 mod 13 is Represented by 0 or 13
1 mod 13 is Represented by 1 or 14
2 mod 13 is Represented by 2 or 15
• Addition Using Pseudo-redundancies Can be Done with Two 4-bit Adders
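[Figure: two cascaded 4-bit adders. X and Y feed the first adder; its sum and carry-out Cout feed the second adder, whose other operand has 00 in its upper bits; the second adder produces SUM and its carry-out is ignored.]
Example: X = 1101 (13), Y = 1101 (13)
X + Y = 1 1010 (26), so Cout = 1
1010 + 0011 = 1101 (13), which represents 0 mod 13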
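The two-adder scheme can be modelled in software as shown below. This is a sketch under stated assumptions: the function name is hypothetical, and the correction constant 3 = 16 - 13 is inferred from the 0011 operand in the worked example. The result is guaranteed to fit in 4 bits when X + Y is at most 28, for instance when one operand is an ordinary residue in [0, 12] and the other a pseudo-residue in [0, 15].

```python
def pseudo_add_mod13(x, y):
    """Add 4-bit mod-13 pseudo-residues; the result is again in [0, 15]."""
    s = x + y                # first 4-bit adder: 4-bit sum plus carry-out
    cout = s >> 4
    low = s & 0xF
    if cout:
        # Dropping the carry subtracts 16; adding 3 makes the net change -13.
        low = (low + 3) & 0xF   # second adder, its carry-out is ignored
    return low


print(pseudo_add_mod13(13, 13))  # 13, which represents 0 mod 13
```

No comparison against 13 is needed: the carry-out of the first adder alone decides whether the correction is applied, which is what makes the reduction step cheap.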