Algebraic Techniques To Enhance Common Sub-expression Extraction for Polynomial System Synthesis
Sivaram Gopalakrishnan, Synopsys Inc., Hillsboro, OR 97124
Priyank Kalla, Department of Electrical and Computer Engineering,
University of Utah, Salt Lake City, UT 84112
Outline
Problem context: Polynomial datapath synthesis
• Our Focus: Integrating CSE and Algebraic methods
• Applications: DSP for audio, video, multimedia…
Motivation
Previous Work and Limitations
Integrated Approach
• Square-free factorization
• Common Coefficient Extraction
• Common Cube Extraction
• Algebraic Division
Results: Area Optimization
Conclusions & Future Work
The Synthesis Flow
Polynomial representation?
Quadratic filter design for polynomial signal processing
y = a_0 x_1^2 + a_1 x_1 + b_0 x_0^2 + b_1 x_0 + c x_0 x_1
Motivation
Direct implementation: 17 Mults & 4 Adds
P_1 = x^2 + 6xy + 9y^2
P_2 = 4xy^2 + 12y^3
P_3 = 2zx^2 + 6xyz
Horner form: 15 Mults & 4 Adds
P_1 = x(x + 6y) + 9y^2
P_2 = 4xy^2 + 12y^3
P_3 = x(2zx + 6yz)
Factorization + CSE: 12 Mults & 4 Adds
P_1 = x(x + 6y) + 9y^2
P_2 = y^2(4x + 12y)
P_3 = xz(2x + 6y)
Motivation
Our approach: 8 Mults & 1 Add
d_1 = x + 3y
P_1 = d_1^2
P_2 = 4 d_1 y^2
P_3 = 2xz d_1
d_1 is a good building block
How to identify such building blocks across multiple polynomial datapaths?
Need a methodology to expose many common expressions!
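The claim that d_1 = x + 3y rebuilds all three polynomials can be sanity-checked numerically. The sketch below is only an illustration: the function names `direct` and `shared` are made up here, and comparing values on a small integer grid is a quick check, not the authors' tool flow.

```python
from itertools import product

def direct(x, y, z):
    """The three polynomials exactly as written in expanded form."""
    p1 = x**2 + 6*x*y + 9*y**2
    p2 = 4*x*y**2 + 12*y**3
    p3 = 2*z*x**2 + 6*x*y*z
    return p1, p2, p3

def shared(x, y, z):
    """The same system rebuilt from the shared block d1 = x + 3y."""
    d1 = x + 3*y
    return d1 * d1, 4 * d1 * y * y, 2 * x * z * d1

# Compare both forms on every point of a small integer grid.
grid = range(-3, 4)
mismatches = [pt for pt in product(grid, repeat=3) if direct(*pt) != shared(*pt)]
print("mismatching points:", len(mismatches))
```

Written this way, `shared` uses one multiplication and one addition for d1, then 1, 3 and 3 multiplications for the three outputs, which matches the 8 Mults & 1 Add quoted for the approach.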
Conventional Methods
Extracting control-dataflow graphs (CDFGs) from RTL
• Scheduling
• Resource sharing
• Retiming
• Control synthesis
Algebraic Transforms for arithmetic designs
• Factorization [Hosangadi et al, ICCAD 04]
• Common Sub-expression Elimination [Hosangadi et al, VLSI 05]
• Term-rewriting [Arvind et al, IEEE Micro 98]
• Tree-Height Reduction [De Micheli 94]
Lack of symbolic computer algebra manipulation
Conventional Methods…
Kernel/Co-kernel Extraction (Factorization + CSE)
Integrates CSE with cube/coefficient extraction
Uses coefficients and variables to identify cubes (co-kernels) to obtain kernels
Subsequently uses CSE for further optimization
P = 5x^2 + 10y^3 + 15pq
Uses {5, 10, 15, x, y, p, q} for kernel/co-kernel extraction
Does not perform algebraic division
Cannot determine the decomposition 5(x^2 + 2y^3 + 3pq)
P = x^2 + 2xy + y^2 -> (x + y)^2
Cannot determine the above decomposition
Symbolic algebra techniques
Polynomial models for complex computational blocks
Guiding synthesis engines using Gröbner bases [Peymandoust and De Micheli, TCAD 02]
• Given polynomial F and library elements <I_1, …, I_n>
• F = h_1 I_1 + … + h_n I_n
• Restricted to library elements
Datapath optimization using word-length information [Gopalakrishnan et al, ICCAD 07]
• Restricted to fixed-size datapaths
• Cannot address systems of polynomials
Optimization techniques
• Canonical Form representation
∑ c_k Y_k
• c_k: Coefficient in the range (0 ≤ c_k ≤ b_k)
• Y_k: Falling factorial
• F = 3x^2y^2 - 3x^2y - 3xy^2 + 3xy = 3x(x-1)y(y-1)
f_1 = 5x^3y^2 - 5x^3y - 15x^2y^2 + 15x^2y + 10xy^2 - 10xy + 3z^2
f_2 = 3x^2y^2 - 3x^2y - 3xy^2 + 3xy + z + 1
d_1 = x(x-1)y(y-1)
f_1 = 5d_1(x-2) + 3z^2
f_2 = 3d_1 + z + 1
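To make the falling-factorial basis concrete, here is a minimal sketch that rewrites a univariate integer polynomial in terms of Y_k = x(x-1)…(x-k+1). It is a simplification under stated assumptions: one variable only, plain integer arithmetic, and no handling of the coefficient bounds b_k mentioned above. The function name `to_falling_factorial` is hypothetical.

```python
def to_falling_factorial(coeffs):
    """Convert power-basis coefficients (highest degree first) into
    falling-factorial coefficients [c_0, c_1, ..., c_n], so that
    p(x) = sum of c_k * x(x-1)...(x-k+1)."""
    work = list(coeffs)
    out = []
    k = 0
    while work:
        # Synthetic (Horner) division of `work` by (x - k).
        acc = 0
        partial = []
        for c in work:
            acc = acc * k + c
            partial.append(acc)
        out.append(partial.pop())  # the last value is the remainder c_k
        work = partial             # what is left is the quotient
        k += 1
    return out

# x^2 - x is exactly Y_2 = x(x-1), the factor seen in d_1 for both x and y.
print(to_falling_factorial([1, -1, 0]))
# x^3 - 3x^2 + 2x is Y_3 = x(x-1)(x-2), the x-part of f_1's leading block.
print(to_falling_factorial([1, -3, 2, 0]))
```

Seen through this basis, the x-dependence of F is Y_2(x) and that of the first six terms of f_1 is Y_3(x) = Y_2(x)(x-2), which is why d_1 appears in both.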
Optimization techniques
Square-free factorization
Let F be the integral domain Z
A polynomial u in F[x] is square-free if there is no polynomial v in F[x] with deg(v, x) > 0 such that v^2 | u.
u_1 = x^2 + 3x + 2 = (x+1)(x+2) is square-free
u_2 = x^4 + 7x^3 + 18x^2 + 20x + 8 = (x+1)(x+2)^3 is not square-free!
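A standard way to test the definition above is to check whether gcd(u, u') is a constant. The sketch below does this with exact rational arithmetic; it is an illustrative check, not the factorization routine used in the talk, and the helper names (`trim`, `poly_rem`, `poly_gcd`, `is_square_free`) are made up for this example. Polynomials are coefficient lists, highest degree first.

```python
from fractions import Fraction

def trim(p):
    """Drop leading zero coefficients, keeping at least one entry."""
    i = 0
    while i < len(p) - 1 and p[i] == 0:
        i += 1
    return p[i:]

def poly_rem(a, b):
    """Remainder of a divided by b, computed over the rationals."""
    a = [Fraction(c) for c in a]
    while len(a) >= len(b) and any(a):
        factor = a[0] / b[0]
        for i, c in enumerate(b):
            a[i] -= factor * c
        a = a[1:]  # the leading coefficient is now zero
    return trim(a) if a else [Fraction(0)]

def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])] or [0]

def poly_gcd(a, b):
    """Euclidean algorithm; the result is a gcd up to a constant factor."""
    a, b = trim(list(a)), trim(list(b))
    while any(b):
        a, b = b, poly_rem(a, b)
    return a

def is_square_free(u):
    """u is square-free iff gcd(u, u') has degree 0."""
    return len(poly_gcd(u, derivative(u))) == 1

print(is_square_free([1, 3, 2]))          # u_1 = x^2 + 3x + 2
print(is_square_free([1, 7, 18, 20, 8]))  # u_2 = x^4 + 7x^3 + 18x^2 + 20x + 8
```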
Optimization techniques
Common Coefficient Extraction
P = 8x + 16y + 24z
P_1 = 2(4x + 8y + 12z)
P_2 = 4(2x + 4y + 6z)
P_3 = 8(x + 2y + 3z) (best transformation)
Use GCD computation
Get the coefficients (the a_i)
Compute the GCD of every pair (a_i, a_j)
Retain GCDs > at least (a_i, a_j)
Arrange the GCDs in decreasing order, perform extraction
Update the GCD list and continue…
Optimization techniques
Common Coefficient Extraction (Example)
P = 8x + 16y + 24z + 15a + 30b
Coefficients {8, 16, 24, 15, 30}
GCD list {8, 8, 1, 2, 8, 1, 2, 3, 6, 15}
Reduced GCD list {8, 15} -> decreasing order {15, 8}
Extracting 15 results in
P = 8x + 16y + 24z + 15(a + 2b)
Similarly, extracting 8 results in
P = 8(x + 2y + 3z) + 15(a + 2b)
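The example can be reproduced with a short greedy sketch. Two points are assumptions rather than the slides' stated method: the retention rule is taken to mean "keep a pairwise GCD only when it equals one of the two coefficients", which happens to yield the reduced list {8, 15}, and the GCD list is computed once instead of being updated after each extraction. The function names are hypothetical.

```python
from itertools import combinations
from math import gcd

def candidate_gcds(coeffs):
    """Pairwise GCDs, keeping those equal to one member of the pair
    (ASSUMED retention rule), in decreasing order."""
    kept = set()
    for a, b in combinations(coeffs, 2):
        g = gcd(a, b)
        if g > 1 and g in (a, b):
            kept.add(g)
    return sorted(kept, reverse=True)

def extract_coefficients(terms):
    """terms: list of (coefficient, variable) pairs.
    Returns ({g: [(coefficient // g, variable), ...]}, leftover terms)."""
    remaining = list(terms)
    groups = {}
    for g in candidate_gcds([c for c, _ in terms]):
        picked = [(c, v) for c, v in remaining if c % g == 0]
        if len(picked) >= 2:  # extraction only pays off for two or more terms
            groups[g] = [(c // g, v) for c, v in picked]
            remaining = [t for t in remaining if t not in picked]
    return groups, remaining

P = [(8, "x"), (16, "y"), (24, "z"), (15, "a"), (30, "b")]
groups, rest = extract_coefficients(P)
print(groups, rest)
```

Here `groups` maps 15 to (a + 2b) and 8 to (x + 2y + 3z), mirroring the two extraction steps in the example.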
Optimization techniques
Common Cube Extraction
Similar to kernel/co-kernel extraction (for variables…)
P_1 = x^2y + xyz
P_2 = ab^2c^3 + b^2c^2x
P_3 = axz + x^2z^2b
Kernel/co-kernel extraction results in
P_1 = xy(x + z)
P_2 = b^2c^2(ac + x)
P_3 = xz(a + xzb)
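For a single polynomial, the largest common cube is the per-variable minimum exponent over all terms. A minimal sketch, assuming monomials are stored as `{variable: exponent}` dictionaries (a representation chosen here for illustration, with hypothetical function names):

```python
def common_cube(terms):
    """Largest cube dividing every monomial: for each variable present in
    all terms, take the minimum exponent."""
    shared = set(terms[0])
    for t in terms[1:]:
        shared &= set(t)
    return {v: min(t[v] for t in terms) for v in sorted(shared)}

def divide_out(terms, cube):
    """Divide every monomial by the cube, leaving the kernel."""
    kernel = []
    for t in terms:
        reduced = {v: e - cube.get(v, 0) for v, e in t.items()}
        kernel.append({v: e for v, e in reduced.items() if e > 0})
    return kernel

# P_2 = a b^2 c^3 + b^2 c^2 x
P2 = [{"a": 1, "b": 2, "c": 3}, {"b": 2, "c": 2, "x": 1}]
cube = common_cube(P2)
kernel = divide_out(P2, cube)
print(cube, kernel)
```

For P_2 this gives the cube b^2c^2 and the kernel ac + x, as in the extracted form above.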
Optimization techniques
Polynomial long division
Given two polynomials a(x) and b(x), algebraic division determines q(x) and r(x) such that
a(x) = b(x) q(x) + r(x)
a(x) = x^4 - 2x^3 + 5
b(x) = x^2 + 3x - 2
a(x) = b(x)(x^2 - 5x + 17) - 61x + 39
with q(x) = x^2 - 5x + 17 and r(x) = -61x + 39
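The division above can be reproduced with a few lines of schoolbook long division. This is a univariate sketch with exact rational arithmetic; the name `poly_divmod` and the coefficient-list representation (highest degree first) are choices made for this example.

```python
from fractions import Fraction

def poly_divmod(a, b):
    """Long division of coefficient lists so that a = b*q + r with
    deg(r) < deg(b). Returns (q, r)."""
    r = [Fraction(c) for c in a]
    q = []
    while len(r) >= len(b):
        factor = r[0] / b[0]
        q.append(factor)
        for i, c in enumerate(b):
            r[i] -= factor * c
        r.pop(0)  # the leading coefficient has been cancelled
    return q, r

a = [1, -2, 0, 0, 5]  # x^4 - 2x^3 + 5
b = [1, 3, -2]        # x^2 + 3x - 2
q, r = poly_divmod(a, b)
print([int(c) for c in q], [int(c) for c in r])
```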
Optimization techniques
Common Sub-Expression Elimination
Identify isomorphic patterns in an arithmetic expression tree and merge them!
Before:
k = x + y;
m = x + y + z;
n = xy + x + y;
After:
k = x + y;
m = k + z;
n = xy + k;
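A heavily simplified sketch of the idea follows. It assumes each expression is a flat sum stored as a set of term names, and it only reuses an expression that is already named (such as k); a real CSE pass works on expression trees and discovers new common patterns. The function name is hypothetical.

```python
def eliminate_common_sums(exprs):
    """exprs: dict mapping a name to the set of terms it adds up.
    Replace the terms of a smaller sum by its name wherever they all
    occur inside a larger sum."""
    result = {name: set(terms) for name, terms in exprs.items()}
    # Visit smaller sums first so they can be reused inside larger ones.
    for name, terms in sorted(exprs.items(), key=lambda kv: len(kv[1])):
        for other in result:
            if other != name and len(terms) >= 2 and terms < result[other]:
                result[other] = (result[other] - terms) | {name}
    return result

system = {
    "k": {"x", "y"},
    "m": {"x", "y", "z"},
    "n": {"xy", "x", "y"},
}
merged = eliminate_common_sums(system)
print(merged)
```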
Integrated approach
Input: The polynomial system P_orig (list of arrays)
Perform canonization, square-free factorization
Get best initial cost: C_initial
Perform coefficient extraction: P_cce
Perform cube extraction: P_cce_cube, get linear blocks
Get the lists representing the system
For every linear block, for each list perform algebraic division
Pick the best cost
Illustration
Integrated approach (Example)
P_orig:
P_1 = 13x^2 + 26xy + 13y^2 + 7x - 7y + 11
P_2 = 15x^2 - 30xy + 15y^2 + 11x + 11y + 9
Square-free factorization does not work!
Initial cost: 16 M and 10 A
After common coefficient extraction (P_cce)
P_1 = 13(x^2 + 2xy + y^2) + 7(x - y) + 11
P_2 = 15(x^2 - 2xy + y^2) + 11(x + y) + 9
Linear blocks: (x - y), (x + y)
Integrated approach (Example…)
After common cube extraction (P_cce_cube)
P_1 = 13(x(x + 2y) + y^2) + 7(x - y) + 11
P_2 = 15(x(x - 2y) + y^2) + 11(x + y) + 9
Linear blocks: (x - y), (x + y), (x + 2y), (x - 2y)
Perform algebraic division using the linear blocks
P_cce is the best-cost implementation, with (x + y) and (x - y)
d_1 = x + y; d_2 = x - y
P_1 = 13d_1^2 + 7d_2 + 11
P_2 = 15d_2^2 + 11d_1 + 9
Cost: 6 M and 6 A
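The two cost figures in this example (16 M / 10 A initially, 6 M / 6 A after decomposition) can be recounted with a small operator counter. The cost model here is an assumption made for illustration: every n-ary product costs n-1 multipliers, every n-ary sum or difference costs n-1 adders, and a shared block is paid for once. The tuple encoding and the `ref` tag are hypothetical.

```python
def cost(expr, seen):
    """Return (multiplications, additions) for a nested-tuple expression.
    ('*', a, b, ...) is a product; ('+', ...) or ('-', ...) is a sum or
    difference; ('ref', name, body) is a shared block counted only once."""
    if not isinstance(expr, tuple):
        return 0, 0  # a variable or a constant
    op, *args = expr
    if op == "ref":
        name, body = args
        if name in seen:
            return 0, 0
        seen.add(name)
        return cost(body, seen)
    mults = adds = 0
    for arg in args:
        m, a = cost(arg, seen)
        mults += m
        adds += a
    if op == "*":
        mults += len(args) - 1
    else:
        adds += len(args) - 1
    return mults, adds

def system_cost(polys):
    seen = set()  # shared blocks already built somewhere in the system
    totals = [cost(p, seen) for p in polys]
    return sum(m for m, _ in totals), sum(a for _, a in totals)

original = [
    ("+", ("*", 13, "x", "x"), ("*", 26, "x", "y"), ("*", 13, "y", "y"),
     ("*", 7, "x"), ("*", -7, "y"), 11),
    ("+", ("*", 15, "x", "x"), ("*", -30, "x", "y"), ("*", 15, "y", "y"),
     ("*", 11, "x"), ("*", 11, "y"), 9),
]
d1 = ("ref", "d1", ("+", "x", "y"))
d2 = ("ref", "d2", ("-", "x", "y"))
decomposed = [
    ("+", ("*", 13, d1, d1), ("*", 7, d2), 11),
    ("+", ("*", 15, d2, d2), ("*", 11, d1), 9),
]
print(system_cost(original), system_cost(decomposed))
```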
Results
Average area improvement: 42%
Benchmark  Var/Deg/m  Factor/CSE  Proposed  Area improvement (%)  Delay improvement (%)
SG3X2      2/2/16     204805      102386    50                    21.3
SG4X2      2/2/16     449063      197599    55.9                  -24.1
SG4X3      2/3/16     690208      557252    19.2                  -16.3
SG5X2      2/2/16     570384      271729    52.3                  -13.9
SG5X3      2/3/16     1365774     614955    54.9                  -20.7
Quad       2/2/16     36405       30556     16                    -9.5
Mibench    3/2/8      20359       8433      58.6                  -3.7
MVCS       2/3/16     31040       22214     28.4                  -32
Conclusions & Future Work
Polynomial decomposition approach for arithmetic datapaths
Arithmetic datapaths modeled as polynomial systems
Integrating CSE with algebraic manipulation
Performing algebraic decomposition to enhance the power of CSE
Impressive area savings
But delay penalty!
Future Work:
• Address the concerns in delay
• Retarget the approach towards power savings?
Questions???