Performance Analysis Of Generics In Scientific Computing Laurentiu Dragan Stephen M. Watt Ontario...

Performance Analysis of Generics in Scientific Computing
Laurentiu Dragan, Stephen M. Watt
Ontario Research Centre for Computer Algebra
University of Western Ontario
SYNASC 2005

Overview
Motivation
Parametric Polymorphism Implementation
Generalizing a Numeric Benchmark
Language Issues
Results
Potential Optimizations
Conclusion

Motivation
Increasing demand for generic code
Scientific code requires high performance, making optimizations very important
Generic code – not as fast as specialized code
No tools to measure performance of generic code
Benchmarks – tool to measure the performance
SciGMark – benchmark for generic code
Compilers – optimize the generic code – performance close to hand-specialized code

Parametric Polymorphism Implementation
Some languages with support for generics
– Aldor, C++
– Java, C#
Some types can be given as parameters
Implementations
– Homogeneous: Java, C#
  Share the generic code
  Example: Vector<Integer> → Vector with elements of type Object
– Heterogeneous: C++, C#
  Specialize the generic code
  Example: std::vector<int> → new specialized class
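
The homogeneous strategy can be observed directly on the JVM. The sketch below is illustrative only (the class name `ErasureDemo` is made up for this note): it checks that two instantiations of the same generic collection share a single runtime class, which is what "share the generic code" means in practice.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch, not part of SciGMark: demonstrates type erasure in Java.
class ErasureDemo {
    // Returns true when both instantiations are backed by the same runtime class.
    static boolean sharesRuntimeClass() {
        List<Integer> ints = new ArrayList<Integer>();
        List<String> strings = new ArrayList<String>();
        // After erasure both objects are plain ArrayList instances.
        return ints.getClass() == strings.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sharesRuntimeClass());
    }
}
```

A heterogeneous implementation such as C++ templates would instead emit separate code for each instantiation, so there is no shared class to compare.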

Generalizing a Numeric Benchmark
SciMark 2
Polynomial Multiplication
Implemented in Aldor, C++, C#, Java

SciMark 2
Fast Fourier transform – 1024
– Complex arithmetic, shuffling, non-constant memory reference, trigonometric functions
Jacobi successive over-relaxation – 100x100
– Typical access patterns in finite difference applications
Monte Carlo integration
– Random number generator, function inlining
Sparse matrix multiplication – 1000, 5000 non-zero
– Indirection addressing, non-regular memory references
Dense LU factorization – 100x100
– Dense matrix operations

From SciMark 2 to SciGMark
SciMark – double hardcoded
– Arrays are of type double
– Any change – extensive modifications to the code
SciGMark – classes are parametric
– Change representation – minimal code changes
– Double becomes parameter R
[Diagram: double and its + operator are replaced by the type parameter R (for example DoubleRing) and the methods R a(R o) and void ae(R o)]
class SOR { double[] array; }
class SOR<R extends IRing<R>> { R[] array; }
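
A minimal Java sketch of this generalization step follows. Only the signatures `R a(R o)` and `void ae(R o)` and the names `IRing` and `DoubleRing` come from the slide; their meaning (non-mutating and mutating addition) and everything else, including `SpecializedSum` and `GenericSum`, are assumptions made for illustration.

```java
// Assumed meaning: a = add returning a new element, ae = add in place.
interface IRing<R extends IRing<R>> {
    R a(R o);
    void ae(R o);
}

// Wrapper for double, as named on the slide; the field layout is an assumption.
final class DoubleRing implements IRing<DoubleRing> {
    double v;
    DoubleRing(double v) { this.v = v; }
    public DoubleRing a(DoubleRing o) { return new DoubleRing(v + o.v); }
    public void ae(DoubleRing o) { v += o.v; }
}

// SciMark style: the element type double is hardcoded.
class SpecializedSum {
    static double sum(double[] array) {
        double s = 0.0;
        for (double x : array) s += x;
        return s;
    }
}

// SciGMark style: the element type is the parameter R.
class GenericSum {
    // The zero element is passed in because erased generics cannot create an R.
    static <R extends IRing<R>> R sum(R[] array, R zero) {
        R s = zero;
        for (R x : array) s = s.a(x); // allocates one wrapper object per addition
        return s;
    }
}
```

The algorithm is unchanged between the two versions; only the element type and the way arithmetic is spelled differ, which is why the change can be described as mechanical.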

Basic Generic Types
IRing
– Provides operations for addition, subtraction, multiplication, division – mutable, non-mutable
– Conversions to and from int and double
– Factories to produce new elements of these types
DoubleRing – wrapper for double
– Implements IRing
Complex
– Implements IComplex (simple extension to IRing)
– Complex<R extends IRing<R>> implements IComplex<Complex<R>,R>
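
The type declarations above can be made concrete with a small sketch. The declaration of `Complex` follows the slide; the method names `a`, `s`, `m`, `fromDouble`, `re` and `im` are assumptions chosen for brevity and are not claimed to match the SciGMark sources.

```java
// Hypothetical ring interface: non-mutating operations plus one factory method.
interface IRing<R extends IRing<R>> {
    R a(R o);               // add
    R s(R o);               // subtract
    R m(R o);               // multiply
    R fromDouble(double d); // factory: new element of the same type
}

// Simple extension of IRing exposing the real and imaginary parts.
interface IComplex<C extends IComplex<C, R>, R extends IRing<R>> extends IRing<C> {
    R re();
    R im();
}

final class DoubleRing implements IRing<DoubleRing> {
    final double value;
    DoubleRing(double value) { this.value = value; }
    public DoubleRing a(DoubleRing o) { return new DoubleRing(value + o.value); }
    public DoubleRing s(DoubleRing o) { return new DoubleRing(value - o.value); }
    public DoubleRing m(DoubleRing o) { return new DoubleRing(value * o.value); }
    public DoubleRing fromDouble(double d) { return new DoubleRing(d); }
}

final class Complex<R extends IRing<R>> implements IComplex<Complex<R>, R> {
    private final R re;
    private final R im;
    Complex(R re, R im) { this.re = re; this.im = im; }
    public R re() { return re; }
    public R im() { return im; }
    public Complex<R> a(Complex<R> o) { return new Complex<R>(re.a(o.re), im.a(o.im)); }
    public Complex<R> s(Complex<R> o) { return new Complex<R>(re.s(o.re), im.s(o.im)); }
    public Complex<R> m(Complex<R> o) {
        // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        R real = re.m(o.re).s(im.m(o.im));
        R imag = re.m(o.im).a(im.m(o.re));
        return new Complex<R>(real, imag);
    }
    public Complex<R> fromDouble(double d) {
        // An existing element acts as the factory for new elements of type R.
        return new Complex<R>(re.fromDouble(d), re.fromDouble(0.0));
    }
}
```

With these definitions, `Complex<DoubleRing>` is the instantiation a generic FFT would use, and every arithmetic step goes through interface calls on wrapper objects.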

Generic Tests
GenFFT
– Uses R: Complex<DoubleRing>
– Complex numbers – two consecutive entries in the array
  Depending on the application – different representation (e.g. Hermitian matrix)
GenMat, GenLU
– Use R: DoubleRing
– The classes contain more methods – the whole class contains a type parameter
GenSOR, GenMonteCarlo
– Use R: DoubleRing
– Have single static method with a type parameter

Polynomial Multiplication
40 coefficients
Dense representation, unidimensional array
Regular memory access, temporary objects creation (memory allocation)
Implementation
– DensePolynomial
  DensePolynomialG<E extends IRing<E>> implements IRing<DensePolynomialG<E>>
– SmallPrimeField
  Represented by an int
  SmallPrimeFieldG implements IRing<SmallPrimeFieldG>

Specializing Polynomial Multiplication
The code was initially implemented using generics
Inlined all the calls to SmallPrimeField
Replaced all the instances of SmallPrimeField with int
Essentially the inverse of the operation performed to "generalize" SciMark
No changes to the algorithm – all changes could be performed automatically
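
The specialization steps listed above can be sketched in Java as a generic dense multiplication next to its hand-specialized form. This is an illustration under stated assumptions, not the SciGMark code: the modulus 10007, the method names `a` and `m`, and the class `PolyMul` are all hypothetical.

```java
import java.util.Arrays;

// Hypothetical minimal ring interface (add and multiply only).
interface IRing<R extends IRing<R>> {
    R a(R o);
    R m(R o);
}

// Prime field element represented by an int; the prime 10007 is an assumption.
final class SmallPrimeFieldG implements IRing<SmallPrimeFieldG> {
    static final int P = 10007;
    final int v;
    SmallPrimeFieldG(int v) { this.v = ((v % P) + P) % P; }
    public SmallPrimeFieldG a(SmallPrimeFieldG o) { return new SmallPrimeFieldG(v + o.v); }
    public SmallPrimeFieldG m(SmallPrimeFieldG o) {
        return new SmallPrimeFieldG((int) ((long) v * o.v % P));
    }
}

class PolyMul {
    // Generic version: the caller supplies 'out' (length f.length + g.length - 1)
    // and 'zero', because erased generics can create neither an E[] nor an E.
    static <E extends IRing<E>> E[] mulGeneric(E[] f, E[] g, E[] out, E zero) {
        Arrays.fill(out, zero);
        for (int i = 0; i < f.length; i++)
            for (int j = 0; j < g.length; j++)
                out[i + j] = out[i + j].a(f[i].m(g[j])); // two temporaries per step
        return out;
    }

    // Specialized version: SmallPrimeFieldG replaced by int, ring calls inlined.
    // Coefficients are assumed to be already reduced into [0, P).
    static int[] mulSpecialized(int[] f, int[] g) {
        final int p = SmallPrimeFieldG.P;
        int[] out = new int[f.length + g.length - 1];
        for (int i = 0; i < f.length; i++)
            for (int j = 0; j < g.length; j++)
                out[i + j] = (int) ((out[i + j] + (long) f[i] * g[j]) % p);
        return out;
    }
}
```

Both methods run the same double loop; the specialized one simply has no wrapper objects and no interface calls, which is the difference the benchmark is designed to measure.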

Language Issues
Java
– No operator overloading
– Homogeneous – erasure technique – subclassing
– Implemented at language level – no virtual machine support – limitations – require object factory
– Type inference for generics is invariant – pass the type as argument
  Complex<R extends IRing<R>> implements IComplex<Complex<R>,R>
C#
– Reference types (homogeneous) – Java; primitive types (heterogeneous) – C++
– Structures instead of classes – structures in collections are boxed
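
The "require object factory" limitation in the Java bullets can be illustrated with a short sketch. Because the type parameter is erased, an expression such as `new R(...)` does not compile, so element creation is delegated to an object passed in by the caller. The names `ElementFactory`, `DoubleRingFactory` and `FactoryDemo` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical factory interface standing in for the missing 'new R(...)'.
interface ElementFactory<R> {
    R create(double value);
}

final class DoubleRing {
    final double value;
    DoubleRing(double value) { this.value = value; }
}

final class DoubleRingFactory implements ElementFactory<DoubleRing> {
    public DoubleRing create(double value) { return new DoubleRing(value); }
}

class FactoryDemo {
    // Builds n elements of type R without knowing R's constructor.
    static <R> List<R> filled(int n, double value, ElementFactory<R> factory) {
        List<R> out = new ArrayList<R>(n);
        for (int i = 0; i < n; i++) out.add(factory.create(value));
        return out;
    }
}
```

A call such as `FactoryDemo.filled(3, 1.5, new DoubleRingFactory())` then yields three `DoubleRing` objects; a heterogeneous implementation would not need the extra argument.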

Language Issues
C++
– Heterogeneous
– Parametric polymorphism (templates) macro processor
– No bounded polymorphism
– No way to test the generic class until it is instantiated
Aldor
– Homogeneous
– Supports dependent types
– Polymorphic types constructed using domain-constructing functions

SciGMark Results
Results in MFlops
Testing environment:
– Pentium IV – 3.2GHz (1MB cache), 2 GB RAM
– Windows XP SP2
– Cygwin/GCC 3.4.4
– Sun JDK 1.5.0_04
– Microsoft .NET v2.0.50215
– Aldor 1.0.2
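
SciGMark Results (MFlops; Spe = specialized, Gen = generic)

| Test | Size | Aldor Spe | Aldor Gen | C# Spe | C# Gen | Java Spe | Java Gen | C++ Spe | C++ Gen |
|---|---|---|---|---|---|---|---|---|---|
| FFT | 1024 | 340 | 1 | 242 | 7 | 321 | 23 | 365 | 59 |
| SOR | 100x100 | 417 | 15 | 417 | 22 | 681 | 66 | 419 | 71 |
| MC | N/A | 203 | 90 | 62 | 28 | 26 | 22 | 65 | 46 |
| MM | 1000, 5000 | 485 | 4 | 477 | 39 | 410 | 111 | 739 | 87 |
| LU | 100x100 | 553 | 5 | 403 | 18 | 982 | 74 | 780 | 103 |
| PM | 40 | 156 | 6 | 321 | 28 | 227 | 48 | 365 | 62 |
| Comp. | N/A | 359 | 20 | 320 | 24 | 441 | 57 | 434 | 71 |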
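
As a worked check, the composite (Comp.) row can be read as four specialized/generic pairs, 359/20, 320/24, 441/57 and 434/71. Assigning these pairs to Aldor, C#, Java and C++ in that order follows the order of the language header and is an assumption. The slowdown of generic code relative to specialized code is then:

```latex
\begin{align*}
\text{Aldor:} \quad & \frac{359}{20} \approx 18.0 \\
\text{C\#:}   \quad & \frac{320}{24} \approx 13.3 \\
\text{Java:}  \quad & \frac{441}{57} \approx 7.7 \\
\text{C++:}   \quad & \frac{434}{71} \approx 6.1
\end{align*}
```

The smallest and largest ratios agree with the 6-18 times range quoted on the Potential Optimizations and Conclusion slides.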

Aldor Results
Testing environment:
– Pentium IV – 3.2GHz (1MB cache), 2 GB RAM
– Linux Fedora Core 3
– Aldor 1.0.2
Stanford benchmark
– Aldor's performance can be almost as good as C++
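
Aldor Results

| Test | C++ Iterations | C++ Time | Aldor Iterations | Aldor Time |
|---|---|---|---|---|
| Permutations | 26901 | 0.37 | 23400 | 0.43 |
| Towers | 46924 | 0.21 | 17297 | 0.58 |
| 8-Queen | 21987 | 0.45 | 19700 | 0.52 |
| Mat Mult | 49155 | 0.20 | 15386 | 0.65 |
| Puzzle | 4626 | 2.16 | 3484 | 2.89 |
| Quick Sort | 15214 | 0.66 | 12538 | 0.79 |
| Bubble Sort | 19089 | 0.53 | 13526 | 0.74 |
| Tree Sort | 10 | 2.00 | 10 | 1.00 |
| FP Mat Mult | 20355 | 0.49 | 14342 | 0.69 |
| Oscar FFT | 40719 | 0.24 | 26838 | 0.38 |

Comp FP: C++ 1.05, Aldor 1.07
Comp int: C++ 1.29, Aldor 1.43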

Potential Optimizations
6-18 times performance improvement
Specialized code
– Same algorithm
– Generic types replaced by specialized types
– Eliminate generic wrapper objects – primitive types

Test Case Aldor
Domain producing function:

    PolynomialVect(C: Ring) == add {
        Rep == Vector Polynomial C;
        (f: %) + (g: %): % == {
            res := new(#f);
            rf := rep f; rg := rep g;
            for k in 1..#f for i in rf for j in rg repeat
                res(k) := i + j;
            per res
        }
    }
    PC == PolynomialVect(Complex DoubleFloat);
    PQ == PolynomialVect(Rational);

Test Case Aldor
Specialize the domain producing function

    PC == add {
        Rep == Vector Polynomial Complex DoubleFloat;
        (f: %) + (g: %): % == {
            res := new(#f);
            rf := rep f; rg := rep g;
            for k in 1..#f for i in rf for j in rg repeat
                res(k) := i + j;    -- '+' from Complex
            per res
        }
    }

Optimize Data Representation
Scalar product of vector of complex numbers

    dot(u: Vector Complex R, v: Vector Complex R): Complex R == {
        s: Complex R := 0;
        for i in 1..n repeat
            s := s + u.i*v.i;
        return s;
    }

    dot(u: Vector Complex R, v: Vector Complex R): Complex R == {
        x: R := 0; y: R := 0;
        for i in 1..n repeat {
            x := x + real(u.i)*real(v.i) - imag(u.i)*imag(v.i);
            y := y + real(u.i)*imag(v.i) + imag(u.i)*real(v.i);
        }
        return complex(x,y);
    }
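
The same data-representation idea can be sketched in Java. For brevity this version fixes the component type to `double` rather than a generic R, and the names `Cx` and `DotDemo` are hypothetical. The first method allocates temporary complex objects in every iteration; the second keeps the running sum in two scalars and builds a single result object at the end.

```java
// Immutable complex number over double (an assumption; the Aldor code is generic in R).
final class Cx {
    final double re;
    final double im;
    Cx(double re, double im) { this.re = re; this.im = im; }
    Cx add(Cx o) { return new Cx(re + o.re, im + o.im); }
    Cx mul(Cx o) { return new Cx(re * o.re - im * o.im, re * o.im + im * o.re); }
}

class DotDemo {
    // Object-based accumulation: two temporary Cx objects per iteration.
    static Cx dotObjects(Cx[] u, Cx[] v) {
        Cx s = new Cx(0.0, 0.0);
        for (int i = 0; i < u.length; i++) s = s.add(u[i].mul(v[i]));
        return s;
    }

    // Scalar accumulation: real and imaginary parts kept in local variables.
    static Cx dotScalars(Cx[] u, Cx[] v) {
        double x = 0.0;
        double y = 0.0;
        for (int i = 0; i < u.length; i++) {
            x += u[i].re * v[i].re - u[i].im * v[i].im;
            y += u[i].re * v[i].im + u[i].im * v[i].re;
        }
        return new Cx(x, y);
    }
}
```

Both methods compute the same unconjugated product sum as the Aldor code; the second is the form an optimizer would aim to derive automatically from the first.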

Conclusion
Generics are important for scientific computing – rich mathematical models – easy to implement with generic code
Need a tool to measure the compiler's ability to produce efficient code
We have seen a difference of 6-18 times between generic and specialized code – room for improvement in compiler capabilities
Presented some optimization ideas
http://www.orrca.on.ca/benchmarks/scigmark/1.0/