Post on 20-Dec-2015
Gaussian Elimination
By Yequn Zhang, Yu Zhang
Contents
Introduction
Problem Analysis
Proposed Algorithm
Evaluation
Gaussian Elimination
Forward Elimination
Back Substitution
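For reference, the two phases named on this slide can be sketched as a plain serial routine. This is a minimal illustration, not the authors' implementation: the function names `forward_eliminate`, `back_substitute`, and `solve` are hypothetical, and partial pivoting is included because the later slides mention "find max row" and "swap".

```python
def forward_eliminate(a, b):
    """Reduce the system a x = b to upper-triangular form, in place."""
    n = len(a)
    for k in range(n):
        # Partial pivoting: choose the row with the largest |a[i][k]|.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the pivot row.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]


def back_substitute(a, b):
    """Solve an upper-triangular system, last row first."""
    n = len(a)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


def solve(a, b):
    """Solve a x = b without modifying the caller's data."""
    a = [row[:] for row in a]
    b = b[:]
    forward_eliminate(a, b)
    return back_substitute(a, b)
```

Calling `solve` on a small system returns the solution vector as a list of floats; a singular matrix raises `ValueError`.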
Problem Analysis
Data size used by kernels changes continuously
Difficult to find an appropriate block size to avoid divergence
Block-based approach
Assign part of the computation to the CPU: leave the irregularity to the CPU
Manually make the data size change in steps of the block size
The number of blocks per grid is then easy to set
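One possible reading of the block-based approach is sketched below: for pivot `k`, the rows up to the next block boundary go to the CPU, and the remaining rows, whose count is a whole multiple of the block size, go to the GPU. The function `split_work`, the default `BLOCK_SIZE = 64` (the value quoted on the back-substitution slide), and the assumption that the matrix dimension is a multiple of the block size are illustrative, not taken from the slides.

```python
BLOCK_SIZE = 64  # assumed threads per block


def split_work(n, k, block_size=BLOCK_SIZE):
    """Split rows k+1 .. n-1 into an irregular CPU part and a block-aligned GPU part.

    Assumes n is a multiple of block_size, so the GPU part is always
    a whole number of blocks.
    """
    first = k + 1
    # Round `first` up to the next multiple of block_size (capped at n).
    boundary = min(n, -(-first // block_size) * block_size)
    cpu_rows = range(first, boundary)   # irregular remainder, left to the CPU
    gpu_rows = range(boundary, n)       # size is a multiple of block_size
    blocks_per_grid = len(gpu_rows) // block_size
    return cpu_rows, gpu_rows, blocks_per_grid
```

With `n = 256` and `k = 10`, rows 11 to 63 fall to the CPU and rows 64 to 255 form three full blocks, so the grid size changes only when the pivot crosses a block boundary.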
Forward Elimination
A block-based approach
Try to avoid divergence
Try to use the GPU
Try to be fine-grained
[Figure: forward elimination flow. Labeled steps: find max row; swap on CPU; calculate coefficients; elimination of the current block of data on CPU; swap on GPU; elimination on GPU. Kernels are labeled K1 to K5. An intra-block loop is nested inside an inter-block loop; the last inter-block loop is processed on the CPU.]
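The loop structure in the figure can be approximated by the following CPU-only sketch. It is an interpretation under stated assumptions, not the authors' code: the matrix dimension is assumed to be a multiple of the block size, the "GPU" path is an ordinary function call standing in for a kernel launch, and the names `eliminate_rows` and `blocked_forward_eliminate` are hypothetical. The returned log records how many rows each path handled per pivot.

```python
def eliminate_rows(a, b, k, rows):
    """Subtract a multiple of pivot row k from each row in `rows`."""
    for i in rows:
        m = a[i][k] / a[k][k]          # coefficient for row i
        for j in range(k, len(a)):
            a[i][j] -= m * a[k][j]
        b[i] -= m * b[k]


def blocked_forward_eliminate(a, b, block_size):
    """Blocked forward elimination with partial pivoting, in place."""
    n = len(a)
    if n % block_size != 0:
        raise ValueError("n must be a multiple of block_size in this sketch")
    log = []
    for start in range(0, n, block_size):       # inter-block loop
        end = start + block_size
        for k in range(start, end):             # intra-block loop
            # Find the max row for column k and swap it into place.
            p = max(range(k, n), key=lambda i: abs(a[i][k]))
            if a[p][k] == 0.0:
                raise ValueError("matrix is singular")
            if p != k:
                a[k], a[p] = a[p], a[k]
                b[k], b[p] = b[p], b[k]
            cpu_rows = range(k + 1, end)   # irregular count, inside current block
            gpu_rows = range(end, n)       # whole blocks below the current one
            eliminate_rows(a, b, k, cpu_rows)
            eliminate_rows(a, b, k, gpu_rows)   # stand-in for a kernel launch
            log.append((k, len(cpu_rows), len(gpu_rows)))
    return log
```

In the final inter-block iteration `gpu_rows` is empty, which matches the figure's note that the last inter-block loop is processed on the CPU.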
Back Substitution
Launch a kernel when the number of coefficients per row exceeds four times the block size (64 * 4 = 256)
A fine-grained approach, similar to forward elimination: part on the CPU and part on the GPU
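The threshold rule can be illustrated with a CPU-only sketch. Here "number of coefficients per row" is assumed to mean the number of already-solved terms in the row's sum, and the "GPU" path is a chunked summation that only mimics per-block partial sums followed by a final reduction. The names `GPU_THRESHOLD`, `dot_serial`, and `dot_in_chunks` are hypothetical.

```python
BLOCK_SIZE = 64
GPU_THRESHOLD = 4 * BLOCK_SIZE  # 256, as quoted on the slide


def dot_serial(row, x, lo, hi):
    """CPU path: plain serial sum of row[j] * x[j] for j in [lo, hi)."""
    return sum(row[j] * x[j] for j in range(lo, hi))


def dot_in_chunks(row, x, lo, hi, chunk=BLOCK_SIZE):
    """Stand-in for a kernel: one partial sum per chunk, then a final reduction."""
    partial = [
        sum(row[j] * x[j] for j in range(s, min(s + chunk, hi)))
        for s in range(lo, hi, chunk)
    ]
    return sum(partial)


def back_substitute(a, b):
    """Solve an upper-triangular system; also report which path each row took."""
    n = len(a)
    x = [0.0] * n
    paths = []
    for i in range(n - 1, -1, -1):
        count = n - 1 - i               # terms in this row's sum
        if count > GPU_THRESHOLD:
            s = dot_in_chunks(a[i], x, i + 1, n)
            paths.append("gpu")
        else:
            s = dot_serial(a[i], x, i + 1, n)
            paths.append("cpu")
        x[i] = (b[i] - s) / a[i][i]
    return x, paths
```

For a 300-row system, only the 43 topmost rows have more than 256 terms, so most rows stay on the serial path; this is consistent with the slide's point that short rows are not worth a kernel launch.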
Block size effect
The contribution of swap and find max row
Is it necessary to implement every part on GPU?
Performance breakdown
Contribution of each part to the total performance, including kernels as well as the CPU part
Speedup
Questions?