Ece465 High Level Design Strategies
8/3/2019 Ece465 High Level Design Strategies
ECE 465
High Level Design Strategies
Lecture Notes # 9
Shantanu Dutt
Electrical & Computer Engineering
University of Illinois at Chicago
Outline
Circuit Design Problem
Solution Approaches:
  Truth Table (TT) vs. Computational/Algorithmic (yes, hardware, just like software, can implement any algorithm!)
  Flat vs. Divide-&-Conquer
  Divide-&-Conquer:
    Associative operations/functions
    General operations/functions
Other Design Strategies for fast circuits:
  Speculative computation
  Best of both worlds (best average and best worst-case)
  Pipelining
Summary
Circuit Design Problem
Design an 8-bit comparator that compares two 8-bit #s available in two registers A[7..0] and B[7..0], and that o/ps F = 1 if A > B and F = 0 if A <= B.
Circuit Design Problem (contd)
Approach 2: Think computationally/algorithmically about what the ckt is supposed to compute.
Approach 2(a): Flat algorithmic approach
Note: A TT can be expressed as a sequence of if-then-elses:
  If A = 00000000 and B = 00000000 then F = 0
  else if A = 00000000 and B = 00000001 then F = 0
  ...
  else if A = 00000001 and B = 00000000 then F = 1
  ...
Essentially a rehashing of the TT, so it has the same problems as the TT approach.
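To make the size problem of the flat approach concrete, here is a minimal Python sketch that enumerates the TT of an n-bit "A > B" comparator. The function name flat_truth_table is hypothetical, and the sketch only models behavior; it is not a circuit description.

```python
from itertools import product

def flat_truth_table(n):
    """Return {(A, B): F} for an n-bit 'A > B' comparator, one entry per TT row."""
    return {(a, b): int(a > b) for a, b in product(range(2 ** n), repeat=2)}

# A 2-bit comparator already needs 2^(2*2) = 16 rows.
small = flat_truth_table(2)

# The 8-bit comparator has 16 input bits, so its flat TT has 2^16 rows.
rows_8bit = 2 ** (2 * 8)
```

Each added bit in A and B multiplies the row count by 4, which is why neither the TT nor its if-then-else rewrite scales.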
Circuit Design Problem: Strategy 1: Divide-&-Conquer
Approach 2(b): Structured algorithmic approach
Be more innovative: think of the structure/properties of the computational problem.
E.g., think if the problem can be solved in a hierarchical or divide-&-conquer (D&C) manner.
[Figure: D&C tree. Root problem A is broken into Subprob. A1 and Subprob. A2, which are in turn broken into A1,1, A1,2 and A2,1, A2,2. The solns. to A1 and A2 are stitched up to form the complete soln. to A. The breakup is done recursively until the subprob. size is s.t. a TT-based design is doable.]
D&C approach: See if the problem can be broken up into 2 or more smaller subproblems whose solutions can be stitched up to give a soln. to the parent prob.
Do this recursively for each large subprob. until the subprobs. are small enough for TT-based solutions.
If the subprobs. are of a similar kind (but of smaller size) to the root prob., then the breakup and stitching will also be similar.
Shift Gears: Design of a Parity Detection Circuit: A Series of XORs
[Figure (a): A linearly-connected circuit. x(0) and x(1) feed the first XOR, whose o/p is XORed with x(2), then with x(3), and so on up to x(15), giving f.]
[Figure (b): 16-bit parity tree. Inputs x(15), x(14), .., x(1), x(0) feed 8 gates with o/ps w(3,0) .. w(3,7); these feed 4 gates with o/ps w(2,0) .. w(2,3); these feed 2 gates with o/ps w(1,0), w(1,1); the last gate gives w(0,0) = f. Delay = (# of levels in the tree) * td = log2(n) * td.]
An example of simple designer ingenuity: a bad design would have resulted in a linear delay that the VHDL code & the synthesis tool would have been at the mercy of.
No concurrency in design (a): the actual problem has available concurrency, though, and it is not exploited well in the linear design.
Complete sequentialization leads to a delay that is linear in the # of bits n (delay = (n-1)*td, i.e., approx. n*td), where td = delay of 1 gate.
All the available concurrency is exploited in design (b), a parity tree.
Question: When can we have a tree-structured circuit for an operation on multiple operands?
Answer: (1) When the operation makes sense for any # of operands. (2) When it is possible to break it down into operations w/ fewer operands. (3) When the operation is associative.
An oper. x is said to be associative if: a x b x c = (a x b) x c = a x (b x c).
Thus if we have 4 operands a x b x c x d, we can either perform this as a x (b x (c x d)) [getting a linear delay of 3 units] or as (a x b) x (c x d) [getting a logarithmic (base 2) delay of 2 units and exploiting the available concurrency due to the fact that x is associative].
We can extend this idea to n operands (& n-1 operations) to perform as many of the pairwise operations as possible in parallel (& do this recursively for every level of remaining operations), similar to design (b) for the parity detector [xor is an associative operation!], and thus get a delay of log2(n) units.
f = (((x(15) xor x(14)) xor (x(13) xor x(12))) xor ((x(11) xor x(10)) xor (x(9) xor x(8)))) xor (((x(7) xor x(6)) xor (x(5) xor x(4))) xor ((x(3) xor x(2)) xor (x(1) xor x(0))))
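The difference between designs (a) and (b) can be sketched in Python by counting gate levels. This is an illustrative model only; the function names parity_linear and parity_tree and the sample input x are assumptions, not part of the lecture.

```python
from functools import reduce
from operator import xor

def parity_linear(bits):
    """Design (a): a chain of XORs. Returns (parity, # of gate levels)."""
    return reduce(xor, bits), len(bits) - 1

def parity_tree(bits):
    """Design (b): XOR adjacent pairs level by level. Returns (parity, # of levels)."""
    level, depth = list(bits), 0
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd count: the last wire passes through unchanged
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth

# Hypothetical 16-bit input with nine 1s, so the parity is 1.
x = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
linear_result = parity_linear(x)   # 15 gate levels
tree_result = parity_tree(x)       # 4 gate levels
```

Both versions use n-1 = 15 XOR gates; only the way the gates are connected, and hence the delay, differs.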
D&C for Associative Operations
Let f(x(n-1), .., x(0)) be an associative function.
What is the D&C principle involved in the design of an n-bit xor/parity function? Can it also lead automatically to a tree-based ckt?
[Figure: f(x(n-1), .., x(0)) is computed as f(a, b), where a = f(x(n-1), .., x(n/2)) and b = f(x(n/2-1), .., x(0)). The stitch-up function is the same as the original function for 2 inputs.]
Using the D&C approach for an associative operation results in the stitch-up function being the same as the original function (not the case for non-assoc. operations), but w/ a constant # of operands (2, if the orig. problem is broken into 2 subproblems).
If the two subproblems of the D&C approach are balanced (of the same size or as close to it as possible), then unfolding the D&C results in a balanced operation tree of the type seen earlier for the xor/parity function.
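A minimal Python sketch of this principle, assuming a generic 2-operand associative operation op: the stitch-up is op itself applied to the results of the two halves. The name dc_reduce is hypothetical.

```python
from operator import xor, and_

def dc_reduce(op, xs):
    """Balanced D&C evaluation of an associative 2-operand op over the list xs.

    Returns (value, depth), where depth is the # of op levels on the longest path.
    """
    if len(xs) == 1:                  # leaf: a single operand needs no gate
        return xs[0], 0
    mid = len(xs) // 2                # balanced breakup into 2 subproblems
    a, depth_a = dc_reduce(op, xs[:mid])
    b, depth_b = dc_reduce(op, xs[mid:])
    return op(a, b), 1 + max(depth_a, depth_b)   # stitch-up = op on 2 operands
```

For example, dc_reduce(xor, bits) on 8 bits unfolds into a 3-level XOR tree, and the same code works unchanged for AND, OR, max, or any other associative operation.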
D&C Approach for Non-Associative Opers: n-bit Comparator
1-bit comparator TT:
A[i] B[i] | f1(i) f2(i)
 0    0   |  0     1
 0    1   |  0     0
 1    0   |  1     0
 1    1   |  0     1
If A[i] = B[i] then { f1(i) = 0; f2(i) = 1 }
  /* the f2(i) o/p is an i/p to the stitch logic; f2(i) = 1 means the f1( ), f2( ) o/ps of the LS half of this subtree should be selected by the stitch logic as its o/ps */
else if A[i] < B[i] then { f1(i) = 0; f2(i) = 0 }
  /* f1(i) = 0 indicates <; f2(i) = 0 indicates the f1(i), f2(i) o/ps should be selected by the stitch logic as its o/ps */
else if A[i] > B[i] then { f1(i) = 1; f2(i) = 0 }
  /* f1(i) = 1 indicates > */
The TT may be derived directly or by first thinking of and expressing its computation in a high-level programming language and then converting it to a TT.
Useful property: At any level, the comp. of the MS (most significant) half determines the o/p if its result is > or <; else the comp. of the LS half determines the o/p.
Can thus break up the problem at any level into MS and LS comparisons & based on their results determine which o/p to choose for the higher-level (parent) result.
[Figure: D&C tree for the 8-bit comparator. Root problem A = Comp. A[7..0], B[7..0] is broken into A1 = Comp. A[7..4], B[7..4] and A2 = Comp. A[3..0], B[3..0]. Stitch-up of the solns. to A1 and A2 to form the complete soln. to A: if the A1 reslt. is > or < take the A1 reslt., else take the A2 reslt. A1 is broken into A1,1 = Comp. A[7..6], B[7..6] and A1,2 = Comp. A[5..4], B[5..4]: if the A1,1 reslt. is > or < take the A1,1 reslt., else take the A1,2 reslt. A1,1 is broken into A1,1,1 = Comp. A[7], B[7] and A1,1,2 = Comp. A[6], B[6]: if the A1,1,1 reslt. is > or < take the A1,1,1 reslt., else take the A1,1,2 reslt. The leaves are small enough to be designed using a TT (2-bit 2-o/p comparator).]
Is this associative? Not sure. For a non-associative func., determine its properties that allow determining a break-up & a correct stitch-up function.
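The breakup and stitch-up described above can be sketched behaviorally in Python. The function names (leaf, stitch, compare, to_bits, F) are hypothetical; the (f1, f2) encoding follows the TT, w/ f1 = 1 meaning > and f2 = 1 meaning equal.

```python
def leaf(a_bit, b_bit):
    """1-bit comparator from the TT: f1 = 1 iff a > b, f2 = 1 iff a == b."""
    return int(a_bit > b_bit), int(a_bit == b_bit)

def stitch(ms, ls):
    """Take the MS result if it is decisive (f2 = 0, i.e. > or <), else the LS result."""
    return ms if ms[1] == 0 else ls

def compare(a_bits, b_bits):
    """D&C comparator; the bit lists are MSB first. Returns (f1, f2)."""
    if len(a_bits) == 1:
        return leaf(a_bits[0], b_bits[0])
    mid = len(a_bits) // 2
    return stitch(compare(a_bits[:mid], b_bits[:mid]),    # MS half
                  compare(a_bits[mid:], b_bits[mid:]))    # LS half

def to_bits(value, n=8):
    """n-bit binary representation of value, MSB first."""
    return [(value >> i) & 1 for i in reversed(range(n))]

def F(a, b, n=8):
    """Final o/p of the comparator: 1 if a > b, else 0."""
    return compare(to_bits(a, n), to_bits(b, n))[0]
```

In hardware the two recursive calls are separate blocks that operate concurrently; the recursion here only mirrors the structure of the tree.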
Comparator Circuit Design Using D&C (contd.)
[Figure: the D&C tree for Comp. A[7..0], B[7..0] and the 1-bit comparator TT, both repeated from the previous slide.]
Once the D&C tree is formulated it is easy to get the low-level & stitch-up designs. The stitch-up design is shown here.
Stitch-up logic details:
If f2(i) = 0 then { my_op1 = f1(i); my_op2 = f2(i) }   /* select MS comp. o/ps */
else { my_op1 = f1(i-1); my_op2 = f2(i-1) }   /* select LS comp. o/ps */
Stitch-up logic i/ps: f1(i), f2(i), f1(i-1), f2(i-1); o/ps: my_op1, my_op2.
Compact TT:
f1(i) f2(i) f1(i-1) f2(i-1) | my_op1  my_op2
 X     0     X       X      | f1(i)   f2(i)
 X     1     X       X      | f1(i-1) f2(i-1)
OR
Direct design: a 2-bit 2:1 Mux w/ 2-bit data i/ps I0 = f(i) and I1 = f(i-1), select i/p f2(i), and 2-bit o/p my_op.
Comparator Circuit Design Using D&C: Final Design
[Figure: final 8-bit comparator. Eight 1-bit comparators take A[i], B[i] (i = 7..0) and produce the 2-bit o/ps f(7) .. f(0). A first level of four 2-bit 2:1 Muxes produces my(3), my(2), my(1), my(0), selected by f(7)(2) = f2(7), f(5)(2), f(3)(2) and f(1)(2) respectively. A second level of two 2-bit 2:1 Muxes produces my(5) and my(4), selected by my(3)(2) and my(1)(2). A final 1-bit 2:1 Mux, selected by my(5)(2), w/ i/ps my(5)(1) and my(4)(1), produces F = my1(6). There are log n levels of Muxes.]
Delay(8-bit comp.) = 3 * (delay of 2:1 Mux) + delay of 2-bit comp.
Note parallelism at work: multiple logic blocks are processing simultaneously.
Delay(n-bit comp.) = log n * (delay of 2:1 Mux) + delay of 2-bit comp.
H/W_cost(8-bit comp.) = 7 * (H/W_cost(2:1 Mux)) + 8 * (H/W_cost(2-bit comp.))
H/W_cost(n-bit comp.) = (n-1) * (H/W_cost(2:1 Mux)) + n * (H/W_cost(2-bit comp.))
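The delay and cost expressions can be checked w/ a short Python sketch. The function names and the unit values used in the checks are assumptions for illustration; count_blocks unfolds the D&C tree to confirm the n leaf comparators and n-1 Muxes.

```python
def mux_levels(n):
    """# of Mux levels for an n-bit comparator, n a power of 2 (= log2 n)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    return n.bit_length() - 1

def comparator_delay(n, t_mux, t_leaf):
    """Delay(n-bit comp.) = log n * (Mux delay) + (leaf comparator delay)."""
    return mux_levels(n) * t_mux + t_leaf

def comparator_cost(n, c_mux, c_leaf):
    """H/W_cost(n-bit comp.) = (n-1) * (Mux cost) + n * (leaf comparator cost)."""
    return (n - 1) * c_mux + n * c_leaf

def count_blocks(n):
    """Unfold the D&C tree and return (# of leaf comparators, # of stitch-up Muxes)."""
    if n == 1:
        return 1, 0
    l1, m1 = count_blocks(n // 2)
    l2, m2 = count_blocks(n - n // 2)
    return l1 + l2, m1 + m2 + 1       # one stitch-up Mux per breakup
```

With hypothetical delays of 2 units per Mux and 3 units per leaf comparator, comparator_delay(8, 2, 3) evaluates 3*2 + 3.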
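D&C: Top-Down vs Bottom-Up: Mux Design
[Figure (a) Top-Down: a 2^n:1 MUX built from two 2^(n-1):1 MUXes, both selected by S(n-2) .. S0, whose o/ps feed a 2:1 MUX selected by S(n-1). One 2^(n-1):1 MUX takes i/ps I(0) .. I(2^(n-1) - 1): all bits except the msb should have different combinations; the msb should be at a constant value (here 0). The other takes i/ps I(2^(n-1)) .. I(2^n - 1): all bits except the msb should have different combinations; the msb should be at a constant value (here 1). The MSB value should differ among these 2 groups.]
[Figure (b) Bottom-Up: 2^(n-1) 2:1 MUXes, all selected by S0, whose o/ps feed a 2^(n-1):1 MUX selected by S(n-1) .. S1.]
Generally better to try top-down first.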
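Both breakups can be sketched behaviorally in Python and checked against each other. The function names mux_top_down and mux_bottom_up are hypothetical; sel is assumed to be the list of select bits [S(n-1), .., S0], MSB first.

```python
def mux_top_down(inputs, sel):
    """2^n:1 Mux, top-down: two half-size Muxes followed by a 2:1 Mux on S(n-1)."""
    if len(inputs) == 1:
        return inputs[0]
    half = len(inputs) // 2
    low = mux_top_down(inputs[:half], sel[1:])    # i/ps whose index msb = 0
    high = mux_top_down(inputs[half:], sel[1:])   # i/ps whose index msb = 1
    return high if sel[0] else low                # final 2:1 Mux selected by S(n-1)

def mux_bottom_up(inputs, sel):
    """2^n:1 Mux, bottom-up: 2:1 Muxes on S0 followed by a half-size Mux."""
    if len(inputs) == 1:
        return inputs[0]
    s0 = sel[-1]
    # First level: one 2:1 Mux per pair (I(2j), I(2j+1)), all selected by S0.
    stage = [inputs[2 * j + 1] if s0 else inputs[2 * j]
             for j in range(len(inputs) // 2)]
    return mux_bottom_up(stage, sel[:-1])         # remaining Mux on S(n-1) .. S1

def select_bits(k, n):
    """Select value k as a list of n bits, MSB first."""
    return [(k >> i) & 1 for i in reversed(range(n))]
```

Either function returns inputs[k] when the select bits encode k, so the two breakups describe the same Mux w/ a different grouping of the i/ps.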
An 8:1 MUX example (bottom-up)
[Figure: an 8:1 MUX w/ i/ps I0 .. I7, select i/ps S2 S1 S0 and o/p Z, built bottom-up. Four 2:1 MUXes, all selected by S0, take the i/p pairs (I0, I1), (I2, I3), (I4, I5) and (I6, I7); their o/ps feed a 4:1 MUX selected by S2 S1, which produces Z. I0, I2, I4, I6 are selected when S0 = 0; I1, I3, I5, I7 are selected when S0 = 1.]
The i/ps of a 2:1 Mux at the first level should have different lsb (S0) values, since their selection is based on S0 (all the other remaining, i.e., unselected, bit values should be the same). Similarly for the other i/p pairs at the 2:1 Muxes at this level.
Opening up the 8:1 MUX's hierarchical design and a top-down view
[Figure: the 8:1 MUX (i/ps I0 .. I7, select i/ps S2 S1 S0, o/p Z) w/ its hierarchy opened up. Four 2:1 MUXes selected by S0 take (I0, I1), (I2, I3), (I4, I5) and (I6, I7); I0, I2, I4, I6 are selected when S0 = 0. Two 2:1 MUXes selected by S1 follow; I2 and I6 are selected when S0 = 0, S1 = 1, and these i/ps should differ in S2. A final 2:1 MUX selected by S2 produces Z; I6 is selected when S0 = 0, S1 = 1, S2 = 1.]
Top-down view: the same design is two 4:1 Muxes followed by a 2:1 Mux. In one 4:1 Mux, all bits except the msb should have different combinations and the msb should be at a constant value (here 0); in the other, the msb should be at a constant value (here 1). The MSB value should differ among these 2 groups.
Adder Design using D&C
Example: Ripple-Carry Adder (RCA)
  Stitching up: The carry from the LS n/2 bits is input to the carry-in of the MS n/2 bits at each level of the D&C tree.
  Leaf subproblem: Full Adder (FA)
Example: Carry-Lookahead Adder (CLA)
  Division: 4 subproblems per level
  Stitching up: A more complex stitching-up process (generation of super P, Gs to connect up the subproblems)
  Leaf subproblem: 4-bit basic CLA with small p, g bits.
More intricate techniques (like P, G generation in CLA) for complex stitching up for fast designs may need to be devised that are not directly suggested by D&C. But D&C is a good starting point.
[Figure (a) D&C for Ripple-Carry Adder: "Add n-bit #s X, Y" is broken into "Add MS n/2 bits of X, Y" and "Add LS n/2 bits of X, Y", down to FA leaves.]
[Figure (b) D&C for Carry-Lookahead Adder: "Add n-bit #s X, Y" is broken into "Add ms n/4 bits", "Add 3rd n/4 bits", "Add 2nd n/4 bits" and "Add ls n/4 bits", down to 4-bit CLA leaves.]
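The RCA breakup of figure (a) can be sketched in Python as follows. This is a behavioral model w/ hypothetical function names; bit lists are assumed to be LSB first, and only the ripple-carry case is shown (the CLA's P, G stitch-up is not modeled).

```python
def full_adder(x, y, cin):
    """Leaf subproblem: 1-bit full adder. Returns (sum, carry-out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout

def rca(x_bits, y_bits, cin=0):
    """D&C ripple-carry adder on LSB-first bit lists. Returns (sum_bits, carry-out)."""
    if len(x_bits) == 1:
        s, c = full_adder(x_bits[0], y_bits[0], cin)
        return [s], c
    mid = len(x_bits) // 2
    ls_sum, carry = rca(x_bits[:mid], y_bits[:mid], cin)    # LS half first
    ms_sum, cout = rca(x_bits[mid:], y_bits[mid:], carry)   # stitch-up: carry feeds the MS half
    return ls_sum + ms_sum, cout

def to_bits(value, n):
    return [(value >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))

def add(x, y, n=8):
    """Add two n-bit numbers w/ the D&C RCA; the carry-out becomes bit n of the result."""
    s, c = rca(to_bits(x, n), to_bits(y, n))
    return from_bits(s) + (c << n)
```

Note that the MS half cannot start until the LS half delivers its carry, which is the data dependency that limits the speed of this design.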
Dependency Resolution in D&C: (1) The Wait Strategy
So far we have seen D&C breakups in which there is no data dependency between the two (or more) subproblems of the breakup.
Data dependency leads to increased delays.
We now look at various ways of speeding up designs that have subproblem dependencies in their D&C breakups.
Strategy 1: Wait for the required o/p of A1 and then perform A2, e.g., as in a ripple-carry adder: A = n-bit addition, A1 = (n/2)-bit addition of the L.S. n/2 bits, A2 = (n/2)-bit addition of the M.S. n/2 bits.
No concurrency between A1 and A2:
t(A) = t(A1) + t(A2) + t(stitch-up)
     = 2*t(A1) + t(stitch-up) if A1 and A2 are the same problems of the same size (w/ different i/ps)
[Figure: Root problem A contains Subprob. A1 and Subprob. A2, w/ data flow from A1 to A2.]
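Unrolling the recurrence t(A) = 2*t(A1) + t(stitch-up) shows why waiting gives a delay linear in n, whereas a breakup w/o dependencies gives a logarithmic one. The sketch below assumes n is a power of 2; the function names and the delay values in the checks are hypothetical.

```python
def t_wait(n, t_leaf, t_stitch=0):
    """Wait strategy: A2 starts only after A1 finishes, so the two delays add up."""
    if n == 1:
        return t_leaf
    return 2 * t_wait(n // 2, t_leaf, t_stitch) + t_stitch

def t_concurrent(n, t_leaf, t_stitch):
    """No dependency: A1 and A2 run in parallel, so t(A) = t(A1) + t(stitch-up)."""
    if n == 1:
        return t_leaf
    return t_concurrent(n // 2, t_leaf, t_stitch) + t_stitch
```

With a hypothetical 10 units per leaf and a free stitch-up, t_wait grows as 10*n, while t_concurrent grows as t_leaf + log2(n)*t_stitch.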
Example of the Wait Strategy in Adder Design
Note: Gate delay is proportional to the # of inputs (since, generally, there is a series connection of transistors in either the up or down network whose length = # of inputs; the Rs of the transistors in series add up and are prop. to the # of inputs, so delay ~ RC (C is the capacitive load) is prop. to the # of inputs).
Assume each gate i/p contributes 2 ns of delay.
For a 16-bit adder the delay will be 160 ns.
For a 64-bit adder the delay will be 640 ns.
Dependency Resolution in D&C: (2) The Design-for-all-cases-&-select or Speculative Strategy
Strategy 2: For a k-bit i/p from A1 to A2, design 2^k copies of A2, each with a different hardwired k-bit i/p to replace the one from A1.
Select the correct o/p from all the copies of A2 via a (2^k)-to-1 Mux that is selected by the k-bit o/p from A1 when it becomes available (e.g., carry-select adder).
t(A) = max(t(A1), t(A2)) + t(Mux) + t(stitch-up)
     = t(A1) + t(Mux) + t(stitch-up) if A1 and A2 are the same problems
[Figure: Root problem A contains Subprob. A1 and four copies of Subprob. A2 w/ hardwired i/ps 00, 01, 10 and 11. Their o/ps go to the 00, 01, 10 and 11 i/ps of a 4-to-1 Mux whose select i/p comes from A1.]
Other variations. Predict Strategy: Have a single copy of A2, but choose a highly likely value of the k-bit i/p and perform A1, A2 concurrently. If, after the k-bit i/p from A1 is available, the selection turns out to be incorrect, re-do A2 w/ the correct available value.
t(A) = p(correct-choice)*max(t(A1), t(A2)) + [1-p(correct-choice)]*t(A2) + t(Mux) + t(stitch-up),
where p(correct-choice) is the probability that our choice of the k-bit i/p for A2 is correct.
Need a completion signal to indicate when the final o/p is available for A; assuming the worst-case time (when the choice is incorrect) is meaningless in such designs.
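A minimal behavioral sketch of the carry-select idea in Python, assuming a single split into LS and MS halves so that k = 1 (the carry). The function names are hypothetical, and each half is modeled w/ Python integer addition rather than gate by gate.

```python
def half_add(x, y, cin, n):
    """Behavioral n-bit adder block. Returns (sum mod 2^n, carry-out)."""
    total = x + y + cin
    return total & ((1 << n) - 1), total >> n

def carry_select_add(x, y, n=16):
    """Add two n-bit numbers: A1 = LS half, A2 = MS half, 1-bit carry from A1 to A2."""
    h = n // 2
    mask = (1 << h) - 1
    ls_sum, carry = half_add(x & mask, y & mask, 0, h)            # A1
    # 2^k = 2 copies of A2 w/ hardwired carry-in 0 and 1; in h/w they run concurrently w/ A1.
    copies = [half_add(x >> h, y >> h, c, n - h) for c in (0, 1)]
    ms_sum, cout = copies[carry]                                  # 2-to-1 Mux selected by A1's carry
    return (cout << n) | (ms_sum << h) | ls_sum
```

The price of the speedup is visible in the code: the MS half is computed twice, and only one of the two results is used.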
Example of the Speculative Strategy in Adder Design
For a 16-bit adder, the delay is (9*4 - 8)*2 = 56 ns (2 ns is the delay for a single i/p); a 65% improvement ((160-56)*100/160).
For a 64-bit adder, the delay is (9*8 - 8)*2 = 128 ns; an 80% improvement.
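The quoted figures can be re-derived w/ a few lines of Python. This assumes the delay expressions read (9*4 - 8)*2 and (9*8 - 8)*2, which is the only reading consistent with the stated 56 ns and 128 ns; the wait-strategy delays of 160 ns and 640 ns come from the earlier example, and the function name is hypothetical.

```python
def improvement_pct(t_wait, t_spec):
    """Percentage reduction in delay of the speculative design over the wait design."""
    return (t_wait - t_spec) * 100 / t_wait

t16 = (9 * 4 - 8) * 2    # 16-bit speculative adder delay in ns
t64 = (9 * 8 - 8) * 2    # 64-bit speculative adder delay in ns

gain16 = improvement_pct(160, t16)
gain64 = improvement_pct(640, t64)
```

The relative gain grows w/ the adder width, since the wait design's delay grows linearly in n while the speculative design's delay grows more slowly.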
Dependency Resolution in D&C: (3) The Lookahead or Pre-Computation Strategy
Strategy 3: Redo the design of A2 so that it can do as much processing as possible that is independent of the i/p from A1 (A2_indep = A2_lookahd). This is the lookahead computation that prepares for the final computation of A2 (A2_dep), which can start once A2_indep and A1 are done.
t(A) = max(t(A1), t(A2_indep)) + t(A2_dep) + t(stitch-up)
E.g., carry-lookahead adder: it does lookahead computation; also, the lookahead computation is associative, so doable in O(log n) time. The overall computation is also doable in O(log n) time.
A less structured example: Let a1 be the i/p from A1 to A2. If A2 has the logic
a2 = vx + uvx + wxy + wza1 + uxa1,
and this is implemented using 2-i/p AND/OR gates, the delay will be 8 delay units (1 unit = delay for 1 i/p) after a1 is available. If the logic is restructured as
a2 = (vx + uvx + wxy) + (wz + ux)a1,
and if the logic in the 2 brackets is performed before a1 is available (these constitute A2_indep), then the delay is only 4 delay units after a1 is available.
[Figure (Concept): Root problem A contains Subprob. A1 and Subprob. A2; A2 is split into A2_indep (or A2_lookahd) and A2_dep, w/ data flow from A1 to A2_dep.]
[Figure (Example of an unstructured logic for A2): the original A2 has a critical path w/ an 8-unit delay after a1 is available; the restructured A2, split into A2_indep and A2_dep, has a critical path w/ a 4-unit delay after a1 is available.]
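That the restructured logic computes the same function can be checked exhaustively in Python. The sketch takes the literals exactly as written in the text (any complement bars lost in the transcript are not reconstructed, which is an assumption); the function names are hypothetical.

```python
from itertools import product

def a2_original(u, v, w, x, y, z, a1):
    """a2 = vx + uvx + wxy + wz*a1 + ux*a1, as one flat sum of products."""
    return (v & x) | (u & v & x) | (w & x & y) | (w & z & a1) | (u & x & a1)

def a2_indep(u, v, w, x, y, z):
    """Lookahead part: both brackets, computable before a1 arrives."""
    return (v & x) | (u & v & x) | (w & x & y), (w & z) | (u & x)

def a2_dep(t0, t1, a1):
    """Final part: one 2-i/p AND and one 2-i/p OR once a1 arrives (4 delay units)."""
    return t0 | (t1 & a1)

# Compare the two forms on all 2^7 input combinations.
equivalent = all(
    a2_original(u, v, w, x, y, z, a1) == a2_dep(*a2_indep(u, v, w, x, y, z), a1)
    for u, v, w, x, y, z, a1 in product((0, 1), repeat=7)
)
```

The restructuring is just the distributive law applied to the terms containing a1, so it holds for any choice of literals.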
D&C Summary
For complex digital design, we need to think of the computation underlying the design in an algorithmic manner: are there properties of this computation that can be exploited for faster, less expensive, modular design? Is it amenable to the D&C approach?
The design is then developed in an algorithmic manner & the corresponding circuit may be synthesized by hand or described compactly using an HDL.
For an operation/func. x on n operands (a(n-1) x a(n-2) x .. x a(0)), if x is associative, the D&C approach gives an easy stitch-up function, which is x on 2 operands (the o/ps of applying x on each half). As a result, a tree-structured circuit with O(log n) delay can be synthesized instead of a linearly-connected circuit with O(n) delay.
If x is non-associative, more ingenuity and determination of the properties of x are needed to determine the stitch-up function. The resulting design may or may not be tree-structured.
D&C can be done top-down or bottom-up. Top-down is generally the better way to think for beginners.
If there is a dependency between the 2 subproblems, then we saw strategies for addressing these dependencies:
  Wait (slowest, least hardware cost)
  Speculative (fastest, highest hardware cost)
  Lookahead (medium speed, medium hardware cost)
Strategy 2: A general view of speculative computations (w/ or w/o D&C)
If there is a data dependency between two or more portions of a computation (which may be obtained w/ or w/o using D&C), don't wait for the previous computation to finish before starting the next one.
Assume all possible input values for the next computation/stage B (e.g., if it has 2 inputs from the prev. stage there will be 4 possible input value combinations) and perform it using a copy of the design for each possible input value.
All the different o/ps of the diff. copies of B are Muxed using the prev. stage A's o/p.
E.g. design: Carry-Select Adder (at each stage it performs two additions, one for a carry-in of 0 and another for a carry-in of 1 from the previous stage).
[Figure (a) Original design: stage A's o/ps x, y feed stage B, which produces z. Time = T(A) + T(B).]
[Figure (b) Speculative computation: four copies of B w/ hardwired i/ps, B(0,0), B(0,1), B(1,0) and B(1,1), feed a 4:1 Mux selected by A's o/ps x, y; the Mux o/p is z. Time = max(T(A), T(B)) + T(Mux). Works well when T(A) is approx. equal to T(B) and T(A) >> T(Mux).]
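The general scheme can be sketched in Python for a stage B that depends only on a k-bit o/p of stage A. All names here are hypothetical, and the timing functions simply restate the two expressions from the figure.

```python
from itertools import product

def speculative(stage_a, stage_b, k):
    """Build a function equivalent to stage_b(stage_a(inp)) using 2^k copies of B."""
    def run(inp):
        # One copy of B per possible k-bit i/p; in h/w these run concurrently w/ A.
        copies = {bits: stage_b(bits) for bits in product((0, 1), repeat=k)}
        return copies[stage_a(inp)]    # Mux selected by A's k-bit o/p
    return run

def time_original(t_a, t_b):
    """(a) Original design: B waits for A."""
    return t_a + t_b

def time_speculative(t_a, t_b, t_mux):
    """(b) Speculative computation: A and the copies of B overlap, then the Mux."""
    return max(t_a, t_b) + t_mux

# Hypothetical stages: A extracts the 2 low bits of its i/p, B XORs its 2 i/ps.
stage_a = lambda n: (n & 1, (n >> 1) & 1)
stage_b = lambda bits: bits[0] ^ bits[1]
run = speculative(stage_a, stage_b, k=2)
```

With T(A) = T(B) = 10 units and T(Mux) = 1 unit, the speculative version takes 11 units instead of 20, at the cost of 4 copies of B.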
Strategy 3: Get the Best of Both Worlds (Average and Worst Case Delays)!
Use 2 circuits with different worst-case and average-case behaviors.
Use the first available output.
Get the best of both (ave-case, worst-case) worlds.
[Figure: the inputs go through registers to a Unary Division Ckt (good ave case, bad worst case) and a Non-Restoring Div. Ckt (bad ave case, good worst case). An external FSM receives start and the done1 and done2 signals and drives the select of a Mux that picks one of the two outputs, which is stored in a register.]
In the above schematic, we get the good ave-case performance of unary division (assuming uniformly distributed inputs) w/o the disadvantage of its bad worst-case performance.
Strategy 4: Pipeline It!
[Figure: the original ckt or datapath is converted to a simple level-partitioned pipeline w/ Stage 1, Stage 2, .., Stage k (a level partition may not always be possible, but other pipelineable partitions may be).]
Throughput is defined as # of outputs / sec.
Non-pipelined throughput = 1/D, where D = delay of the original ckt's datapath.
Pipeline throughput = 1/(max stage delay + register delay).
Special case: If the original ckt's datapath is divided into n stages, each of equal delay, and dr is the delay of a register, then pipeline throughput = 1/((D/n) + dr). If dr is negligible compared to D/n, then pipeline throughput = n/D, n times that of the original ckt.
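The throughput expressions translate directly into Python. The function names and the delay values in the checks are hypothetical; delays are in arbitrary time units, so throughput is in outputs per unit time.

```python
def throughput_non_pipelined(D):
    """Non-pipelined throughput = 1/D, where D is the delay of the whole datapath."""
    return 1 / D

def throughput_pipelined(stage_delays, d_r):
    """Pipeline throughput = 1/(max stage delay + register delay)."""
    return 1 / (max(stage_delays) + d_r)

def speedup(stage_delays, d_r):
    """Throughput gain of the pipeline over the original datapath."""
    D = sum(stage_delays)
    return throughput_pipelined(stage_delays, d_r) / throughput_non_pipelined(D)
```

For 4 equal stages of 2 units each and a negligible register delay, speedup returns 4, matching the n/D special case; unequal stages or a non-negligible dr reduce the gain because the slowest stage plus the register sets the clock.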