R. Arce-Nazario, M. Jimenez, and D. Rodriguez
Electrical and Computer Engineering
University of Puerto Rico – Mayagüez
WALSAIP
2
Motivation and Objective
Discrete Signal Transforms (DSTs)
  DFT, DCT, lots of applications
  Hardware accelerated, but at high area cost
Distributed (dedicated) hardware architectures (DHAs)
  Cost-effective
  Partitioning plays a key role
Objective: Use inherent properties of DSTs to improve their partitioning onto distributed hardware architectures.
[Figure: DST, Partitioning, DHA]
3
Previous Work
Automated partitioning of DSTs to DHAs
  DSTs treated as any other algorithm/benchmark [Srinivasan01][Bringmann00]
  Converted to a high-level or structural DFG and treated as such
Manual partitioning and automated code generation
  DST-specific properties exploited [Kumhom01]
  New formulations developed to exploit architectural features [VanLoan92]
  SPIRAL and FFTW: code generation platforms exploring the space of equivalent algorithms ([Pueschel05], [Frigo05])
[Arce05]: automated partitioning methodology that incorporates DST features and formulation exploration
4
Partitioning Methodology
[Figure: methodology block diagram. Labels: KPA DST Formulation, Architectural Description, Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, Estimators, High-level partition solution, KPA Formulation, DFG, Cost and Indicators, Rule Selection, Hypergraph Representation]
5
DSTs – General Concepts
$$X[k_1,\ldots,k_d] = \sum_{n_1}\cdots\sum_{n_d} x[n_1,\ldots,n_d]\,\alpha_1(n_1,k_1)\cdots\alpha_d(n_d,k_d)$$
General formula for d-dimensional DST
Essentially a vector-matrix multiplication
Fast versions exist, using divide-and-conquer techniques
Highly regular
Highly connected
Rules can be applied at the formulation level: permutation, index-set, ...
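α's determine the type of transform, e.g. DFT: $\alpha_i(n_i,k_i) = e^{-j 2\pi n_i k_i / N_i}$
$$F_8 = (F_2 \otimes I_4)\, T_1\, (I_2 \otimes F_2 \otimes I_2)\, T_0\, (I_4 \otimes F_2)\, R_8$$
Computational stages, applied right to left: $R_8$; $(I_4 \otimes F_2)$; $(I_2 \otimes F_2 \otimes I_2)\, T_0$; $(F_2 \otimes I_4)\, T_1$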
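As a rough illustration of the general formula, the sketch below evaluates a DST directly from its kernel, first in one dimension and then in two. The function names (`dft_kernel`, `dst_1d`, `dst_2d`) are hypothetical, and the DFT kernel uses the common negative-exponent convention, which is an assumption since the sign is not recoverable from the slide.

```python
import cmath

def dft_kernel(size):
    """Return alpha(n, k) = exp(-2j*pi*n*k/size), the assumed 1-D DFT kernel."""
    def alpha(n, k):
        return cmath.exp(-2j * cmath.pi * n * k / size)
    return alpha

def dst_1d(x, alpha):
    """Direct 1-D transform: X[k] = sum over n of x[n] * alpha(n, k)."""
    size = len(x)
    return [sum(x[n] * alpha(n, k) for n in range(size)) for k in range(size)]

def dst_2d(x, alpha1, alpha2):
    """Direct 2-D transform with the separable kernel alpha1 * alpha2."""
    rows, cols = len(x), len(x[0])
    return [
        [
            sum(
                x[n1][n2] * alpha1(n1, k1) * alpha2(n2, k2)
                for n1 in range(rows)
                for n2 in range(cols)
            )
            for k2 in range(cols)
        ]
        for k1 in range(rows)
    ]

# A unit impulse transforms to a constant spectrum under the DFT kernel.
impulse_spectrum = dst_1d([1, 0, 0, 0], dft_kernel(4))
```

The direct evaluation costs on the order of N^2 operations per dimension, which is why the fast, factored formulations matter for hardware.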
6
Kronecker Algebra
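$$F_{4 \times 4} = F_4 \otimes F_4$$
$$F_8 = (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}$$
[Figure: dataflow graph of $F_8$ with four $F_2$ blocks, twiddle factors W, and two $F_4$ blocks]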
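The factorization rule $F_n = (F_p \otimes I_m)\, T_{n,p}\, (I_p \otimes F_m)\, P_{n,p}$, with $n = pm$, can be checked numerically. The sketch below does so in plain Python. The slides do not define the twiddle matrix or the permutation, so the conventions in `twiddle` and `stride_permutation` are assumptions (the usual Cooley-Tukey ones), and all function names are hypothetical.

```python
import cmath

def dft_matrix(n):
    """n x n DFT matrix with entries exp(-2j*pi*r*c/n)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols]
            for row in a]

def twiddle(n, p):
    """Assumed T_{n,p}: diagonal with exp(-2j*pi*i*j/n) at index i*m + j."""
    m = n // p
    diag = [cmath.exp(-2j * cmath.pi * i * j / n)
            for i in range(p) for j in range(m)]
    return [[diag[r] if r == c else 0 for c in range(n)] for r in range(n)]

def stride_permutation(n, p):
    """Assumed P_{n,p}: output position i*m + j reads input position j*p + i."""
    m = n // p
    perm = [[0] * n for _ in range(n)]
    for i in range(p):
        for j in range(m):
            perm[i * m + j][j * p + i] = 1
    return perm

def factored_dft(n, p):
    """Multiply out (F_p x I_m) T_{n,p} (I_p x F_m) P_{n,p} for n = p*m."""
    m = n // p
    stages = [kron(dft_matrix(p), identity(m)), twiddle(n, p),
              kron(identity(p), dft_matrix(m)), stride_permutation(n, p)]
    result = stages[0]
    for stage in stages[1:]:
        result = matmul(result, stage)
    return result

def max_abs_diff(a, b):
    return max(abs(x - y) for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))

# Difference between the factored product and the dense F_8 (rounding error only).
error = max_abs_diff(factored_dft(8, 4), dft_matrix(8))
```

Each Kronecker factor maps to one computational stage of the dataflow graph: `kron(identity(p), dft_matrix(m))` is p independent size-m transforms, and `kron(dft_matrix(p), identity(m))` is m interleaved size-p transforms.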
7
Target topology
Similar to existing platforms in the market and academia:
  Annapolis Micro Systems (Wildforce)
  Gidel (PROC20KE)
  Berkeley Emulation Engine (BEE): being proposed as a cost-effective alternative to traditional high-performance computing systems
[Figure: target topology with memories M0, M1, ..., Mk-1, devices D0, D1, ..., Dk-1, and a crossbar]
8
Partitioning Methodology
[Figure: methodology block diagram. Labels: KPA DST Formulation, Architectural Description, Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, Estimators, High-level partition solution, KPA Formulation, DFG, Cost and Indicators, Rule Selection, Hypergraph Representation]
9
DST properties in our methodology
Graph considerations incorporated into the partitioning/placement process (Partition/Placement)
Exploration of equivalent formulations (Formulation Manipulator)
[Figure: methodology block diagram. Labels: Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, Estimators, KPA Formulation, DFG, Cost and Indicators, Rule Selection]
10
Graph partitioning considerations
Focus on horizontal partitioning schemes (SIMD-like implementation)
Initial solution = balanced horizontal linear partitioning
Scheduling consideration: swap only nodes from the same computational stage.
[Figure: target topology with memories M0, M1, ..., Mk-1, devices D0, D1, ..., Dk-1, and a crossbar]
Kernighan-Lin bipartitioning, extended to heterogeneous-channel k-way partitioning
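The two graph considerations on this slide can be sketched as follows. This is only an illustration under stated assumptions: the DFG is modeled as a grid of (stage, row) nodes, the horizontal partition assigns contiguous bands of rows to devices, and the function names are hypothetical. It is not the Kernighan-Lin procedure itself, only the initial solution and the move restriction that would feed it.

```python
def initial_partition(stages, rows, devices):
    """Assign node (stage, row) to a device by contiguous, balanced row bands."""
    return {(s, r): (r * devices) // rows
            for s in range(stages) for r in range(rows)}

def swap_nodes(assignment, a, b):
    """Swap the devices of nodes a and b; allowed only within one stage."""
    if a[0] != b[0]:
        raise ValueError("nodes must belong to the same computational stage")
    swapped = dict(assignment)
    swapped[a], swapped[b] = assignment[b], assignment[a]
    return swapped

# Three stages of eight nodes each, spread over four devices.
partition = initial_partition(stages=3, rows=8, devices=4)
moved = swap_nodes(partition, (1, 0), (1, 7))
```

Restricting swaps to nodes of the same stage keeps every device holding the same number of nodes per stage, so the SIMD-like schedule stays balanced while the search space shrinks.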
11
Formulation exploration
$$F_n = (F_p \otimes I_m)\, T_{n,p}\, (I_p \otimes F_m)\, P_{n,p}, \quad n = pm$$
[Figure: exploration loop. Labels: Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, KPA Formulation, DFG, Cost and Indicators, Rule Selection]
Formulation Manipulator
Applies permutation and factorization to Kronecker formulation of DSTs to obtain equivalent formulations
Number of possible reformulations grows exponentially with DST size
Heuristic control method; first answer these questions:
  Do reformulations have an effect on solution quality?
  How can we effectively explore the equivalent formulation space to find more apt formulations?
Experiments: gain an understanding of algorithmic-level effects on solution quality and convergence.
Example: applying the factorization rule to $F_8$ rewrites
$$(F_8 \otimes I_2)\, T_{16,8}\, (I_8 \otimes F_2)\, P_{16,8}$$
as
$$\big(\big[(F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes F_4)\, P_{8,2}\big] \otimes I_2\big)\, T_{16,8}\, (I_8 \otimes F_2)\, P_{16,8}$$
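A formulation manipulator of this kind can be approximated by treating a Kronecker formulation as an expression tree and rewriting one transform node at a time. The sketch below is a minimal version under assumptions: the tuple encoding, the function names, and the choice to rewrite only the first matching node are all hypothetical, not the authors' tool.

```python
def factor(n, p):
    """Factorization rule: F_n -> (F_p x I_m) T_{n,p} (I_p x F_m) P_{n,p}."""
    if n % p != 0 or p in (1, n):
        raise ValueError("p must be a proper divisor of n")
    m = n // p
    return ("mul",
            ("kron", ("F", p), ("I", m)),
            ("T", n, p),
            ("kron", ("I", p), ("F", m)),
            ("P", n, p))

def rewrite_first(expr, size, p):
    """Replace the first F_size node with its factorization; report success."""
    if expr == ("F", size):
        return factor(size, p), True
    if expr[0] in ("kron", "mul"):
        parts = list(expr[1:])
        for idx, part in enumerate(parts):
            new_part, changed = rewrite_first(part, size, p)
            if changed:
                parts[idx] = new_part
                return (expr[0], *parts), True
    return expr, False

def show(expr, nested=False):
    """Render an expression tree as text."""
    kind = expr[0]
    if kind in ("F", "I"):
        return f"{kind}_{expr[1]}"
    if kind in ("T", "P"):
        return f"{kind}_{{{expr[1]},{expr[2]}}}"
    if kind == "kron":
        return "(" + " ⊗ ".join(show(e, True) for e in expr[1:]) + ")"
    body = " ".join(show(e) for e in expr[1:])
    return f"({body})" if nested else body

f16 = factor(16, 8)
finer, changed = rewrite_first(f16, 8, 2)
print(show(f16))
print(show(finer))
```

Because every rewrite yields another tree of the same kind, the rule can be applied repeatedly, which is also why the number of reachable formulations grows so quickly with transform size.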
12
Measuring quality of solution
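$$\text{Cost} = \langle c_0, c_1, \ldots, c_{m-1} \rangle, \quad c_i = W_i R_i$$
where $W_i$ is the 'weight' of channel $i$ and $R_i$ is the required communications through $i$.
[Figure: devices D0, D1, D2, D3, shown twice]
Example: $W_{01} = W_{12} = W_{23} = 1$, $W_{XBAR} = 2$; Cost = $\langle 4, 4, 4, 8 \rangle$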
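The cost measure can be sketched in a few lines, assuming each entry is the channel weight multiplied by the number of communications required through that channel. That product form, the function name, and the per-channel communication counts below are assumptions chosen to be consistent with the example weights on this slide.

```python
def cost_vector(weights, required):
    """Per-channel cost: channel weight times communications through it."""
    return tuple(weights[ch] * required[ch] for ch in weights)

# Channel weights from the slide's example.
weights = {"01": 1, "12": 1, "23": 1, "XBAR": 2}
# Hypothetical communication counts, four per channel.
required = {"01": 4, "12": 4, "23": 4, "XBAR": 4}

cost = cost_vector(weights, required)
```

Keeping the cost as a vector rather than a single sum preserves which channel is the bottleneck, which a heuristic can use when choosing the next move.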
13
Experiment #1 – Inter-stage permutations
Since Cooley-Tukey's FFT, several common formulations have become available.
$$F_8 = (F_2 \otimes I_4)\, T_1\, (I_2 \otimes F_2 \otimes I_2)\, T_0\, (I_4 \otimes F_2)\, R_8$$
[Pease formulation here]
Experiment: several sizes of 5 common formulations were partitioned.
Inter-stage permutations (ISPs) have an effect on solution quality, yet no formulation is a clear winner.
[Figure: results for the Stockham, Transposed Stockham, Cooley-Tukey, Gentleman-Sande, and Pease formulations]
14
Experiment #2 – Granularity
Granularity: the weight of the nodes for the various computational stages of the transform.
[Figure: two dataflow graphs for $F_{16}$; the coarser one uses only $F_4$ nodes, the finer one mixes $F_4$ and $F_2$ nodes]
Coarser:
$$F_{16} = (F_4 \otimes I_4)\, T_{16,4}\, (I_4 \otimes F_4)\, P_{16,4}$$
Finer:
$$F_{16} = (F_4 \otimes I_4)\, T_{16,4}\, \big(I_4 \otimes \big[(F_2 \otimes I_2)\, T_{4,2}\, (I_2 \otimes F_2)\, P_{4,2}\big]\big)\, P_{16,4}$$
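To make the notion of granularity concrete, the sketch below counts the nodes in each computational stage of a formulation. It assumes a formulation can be summarized as a list of stage sizes whose product is the transform size, written like "4/2/2"; that reading of the slash notation, and the function names, are assumptions.

```python
from math import prod

def parse_formulation(text):
    """Turn a string such as '4/2/2' into the list of stage sizes [4, 2, 2]."""
    return [int(part) for part in text.split("/")]

def stage_nodes(factors):
    """For each stage of size p in an n-point transform, return (p, n // p)."""
    size = prod(factors)
    return [(p, size // p) for p in factors]

coarser = stage_nodes(parse_formulation("4/4"))    # two stages of F_4 nodes
finer = stage_nodes(parse_formulation("4/2/2"))    # one F_4 stage, two F_2 stages
```

A coarser formulation has fewer, heavier nodes per stage, which is the sense in which decomposition rules act like node clustering for the partitioner.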
15
Experiment #2 – Granularity
Decomposition rules: a large DST = a combination of smaller DSTs, analogous to node clustering
Size  Array 4             Ring 4              Array 8                  Ring 8
      Cost  Formulation   Cost  Formulation   Cost  Formulation        Cost  Formulation
32    11    2/2/2/4*      7     2/2/2/4       32    8/2/2*             16    2/4/2/2
64    22    8/2/4*        14    2/2/8*        48    2/2/2/2/4          20    4/2/2/4
128   43    8/2/8*        26    16/2/2/2*     92    2/2/2/2/2/4        32    2/2/2/2/2/4
256   86    4/2/32*       55    16/8/2*       132   4/2/2/2/2/4        58    2/2/2/2/2/2/4
512   171   4/2/64*       106   64/4/2*       276   2/2/2/2/2/2/4/2    116   2/2/2/2/2/2/8
* Multiple formulations achieved best cost. Coarsest granularity is shown.
Effect of topology: ring vs. linear: 57% cost reduction
Finest granularity not necessarily best.
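$$F_8 = (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}$$
$$F_8 = (F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes F_4)\, P_{8,2}$$
$$F_8 = (F_2 \otimes I_4)\, T_{8,2}\, \big(I_2 \otimes \big[(F_2 \otimes I_2)\, T_{4,2}\, (I_2 \otimes F_2)\, P_{4,2}\big]\big)\, P_{8,2}$$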
16
Experiment #3 – Breakdown strategy
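Breakdown strategy: the order and divisors with which a transform is decomposed.
Split trees: a common graphical representation of the breakdown strategy.
Example: two split trees for a DFT of size 64.
(a)
$$F_{64} = \big(\big[(F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}\big] \otimes I_8\big)\, T_{64,8}\, (I_8 \otimes F_8)\, P_{64,8}$$
(b)
$$F_{64} = (F_2 \otimes I_{32})\, T_{64,2}\, \big(I_2 \otimes \big[(F_2 \otimes I_{16})\, T_{32,2}\, (I_2 \otimes F_{16})\, P_{32,2}\big]\big)\, P_{64,2}$$
[Figure: split trees. (a) node labels 6; 3, 3; 2, 1. (b) node labels 6; 1, 5; 4, 1.]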
17
Experiment #3 – Results
Procedure
  Exhaustive generation of split trees for DFT sizes n = 16 to 256
  Formulations partitioned for various topologies
  Observation of split-tree decisions that lead to 'partition friendly' formulations
  Generation of n > 256 formulations using rules
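The exhaustive generation step can be illustrated with a small enumerator. The sketch assumes power-of-two sizes, labels each node with the base-2 logarithm of its transform size, and lets any node either stay a leaf or split in two; the encoding and the function name are hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def split_trees(k):
    """All split trees for a transform of size 2**k.

    A leaf is the integer k; a split node is the tuple (k, left, right),
    where the exponents of left and right add up to k.
    """
    trees = [k]
    for i in range(1, k):
        for left in split_trees(i):
            for right in split_trees(k - i):
                trees.append((k, left, right))
    return tuple(trees)

# Number of split trees for sizes 16, 32, 64, 128 and 256.
counts = {2 ** k: len(split_trees(k)) for k in range(4, 9)}
print(counts)
```

Under this encoding, a tree such as `(6, (3, 2, 1), 3)` describes splitting size 64 into 8 by 8 and then splitting one 8 into 4 by 2. The count grows quickly with size, which is consistent with enumerating exhaustively only up to n = 256 and using rules beyond that.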
18
Conclusions and Future Work
Methodology for partitioning of DSTs to DHAs:
  DST graph considerations
  Formulation exploration
Graph considerations
  Generation of a linear initial partition: provides better results than random.
  Limitation of node moves: faster convergence time.
Exploration at the algorithmic level: experiments
  Isolated features such as permutations and granularity
    Effect was evidenced, but hard to establish a relation to solution quality.
    Coarse granularity = better convergence, good solution quality
  Breakdown strategy: 'partition friendly' formulations generated.
Current Work:
  Experimentation with DCTs.
  Experimentation with other properties to define an overall exploration strategy
19
Acknowledgements
Puerto Rico Experimental Program to Stimulate Competitive Research (PR-EPSCoR)
WALSAIP - Wide-Area Large Scale Automated Information Project
Puerto Rico NASA Space Grant
QUESTIONS?