The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The...

37
KIT – The Research University in the Helmholtz Association Hartwig Anzt & FiNE@KIT in collaboration with Jack Dongarra & ICL, Ulrike Meier Yang, Enrique Quintana-Orti, and many others... www.kit.edu The Multiprecision Effort in the US Exascale Computing Project ICERM: Variable Precision in Mathematical and Scientific Computing May 7 th / May 8 th 2020

Transcript of The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The...

Page 1: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

KIT – The Research University in the Helmholtz Association

HartwigAnzt &FiNE@KIT incollaboration with JackDongarra &ICL,UlrikeMeierYang,EnriqueQuintana-Orti,and manyothers...

www.kit.edu

The Multiprecision Effort in the US Exascale Computing ProjectICERM: Variable Precision in Mathematical and Scientific ComputingMay 7th / May 8th 2020

Page 2: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

2 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

WhatistheMultiprecision EffortinECP

• CoordinatedeffortacrossallmathlibraryprojectsoftheUSExascale ComputingProject;

• AdministrativelypartofthexSDK4ECPprojectledbyUlrikeMeierYang;

• Linkbetweenmultiprecision effortsofECPprojectpartnersandcreatesynergiesacrosstheindividualefforts;

• Evaluatestatusquoanddevelopanddeployproduction-readysoftware;

• Algorithmfocusonlinearsolvers,eigenvaluesolvers,preconditioners,multigridmethods,FFT,MachineLearning(ML)technology;

• Hardwarefocusonleadershipcomputers(Summit,Frontier…);

• Wearefocusingonperformance,not(bit-wise)reproducibility;

UlrikeMeierYang(LLNL)

Page 3: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

3 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

FloatingpointformatsandperformanceonGPUs

200620072008200920102011201220132014201520162017201820192020

Tesla Fermi Kepler Maxwell Pascal Volta

1 :8 1 :24 1 :32 1 :2 :16*

1 :2

1 :2 :41 :8

1 :2 1 :2 1 :2 :41 :2 1 :2 :4

Rel.computeperformance

Rel.memoryperformance

double :single :half

*Tensorcores

NVIDIAGPUgeneration

Page 4: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

4 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

FloatingpointformatsandperformanceonGPUs

200620072008200920102011201220132014201520162017201820192020

Tesla Fermi Kepler Maxwell Pascal Volta

1 :8 1 :24 1 :32 1 :2 :16*

1 :2

1 :2 :41 :8

1 :2 1 :2 1 :2 :41 :2 1 :2 :4

Rel.computeperformance

Rel.memoryperformance

double :single :half

*Tensorcores

NVIDIAGPUgeneration

Forcompute-bound applications,theperformancegainsfromusinglowerprecisiondependonthearchitecture.

Upto16xforFP16onVolta,upto32xforFP32onMaxwell.

Formemory-boundapplications,theperformancegainsfromusinglowerprecisionarearchitecture-independentandcorrespondtothefloatingpointformatcomplexity(#bits).

Generally,2xforFP32,4xforFP16.

Page 5: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

5 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

• Performanceofcompute-bound algorithmsdependsonformatsupportofhardware.

• Performanceofmemory-bound algorithmsscaleshardware-independent withinverseofformatcomplexity.

Take-Away

Page 6: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

6 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

IEEE754FloatingPointFormats

SignificandExponentSignbit

Broadlyspeaking….• Thelengthofthe exponentdeterminestherange ofthevalues

thatcanberepresented;• Thelengthofthesignificand determineshowaccurate values

canberepresented;

FigurecourtesyofIgnacioLaguna,LLNL

IDEASWebinar#34byIgnacioLagunaonToolsandTechniquesforFloating-PointAnalysis

Page 7: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

7 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

IEEE754FloatingPointFormats

IDEASWebinar#34byIgnacioLagunaonToolsandTechniquesforFloating-PointAnalysis

doubleprecision(FP64)

singleprecision(FP32)

halfprecision(FP16)

Broadlyspeaking….• Thelengthofthe exponentdeterminestherange ofthevalues

thatcanberepresented;• Thelengthofthesignificand determineshowaccurate values

canberepresented;

FigurecourtesyofIgnacioLaguna,LLNL

Page 8: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

8 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

• Thelengthofthe exponentdeterminestherange ofthevalues thatcanberepresented;• Thelengthofthesignificand determineshowaccurate valuescanberepresented;

• Roundingeffectsaccumulateoverasequenceofcomputations;

LetusfocusonlinearsystemsoftheformAx=b.

• Theconditioning ofalinearsystemreflectshowsensitivethesolutionxiswithregardtochangesintheright-handsideb.

• Ruleofthumb:

relativeresidualaccuracy= (unitround-off)*(linearsystem’sconditionnumber)

N.Higham:Accuracyandstabilityofnumerical algorithms.SIAM,2002.

Page 9: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

9 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

LinearSystemAx=bwithcond(A)≈ 104

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

Page 10: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

10 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

DoublePrecision

Accuracyimprovement~1012

LinearSystemAx=bwithcond(A)≈ 104

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

relativeresidualaccuracy=(unitround-off)*(linearsystem’sconditionnumber)

…- ValueType = double;+ ValueType = float;

Page 11: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

11 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

DoublePrecision

Accuracyimprovement~1012

LinearSystemAx=bwithcond(A)≈ 104

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

SinglePrecision

Accuracyimprovement~104

relativeresidualaccuracy=(unitround-off)*(linearsystem’sconditionnumber)

Page 12: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

12 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

DoublePrecision

Accuracyimprovement~1013

LinearSystemAx=bwithcond(A)=103

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

SinglePrecision

Accuracyimprovement~104

Page 13: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

13 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

DoublePrecision

SinglePrecisionis10%faster!

LinearSystemAx=bwithcond(A) ≈ 104

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

SinglePrecision

Page 14: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

14 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

SinglePrecisionDoublePrecision

LinearSystemAx=bwithcond(A)≈ 107

NoimprovementAccuracyimprovement~109

apache2fromSuiteSparse

Page 15: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

15 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

• Performanceofcompute-bound algorithmsdependsonformatsupportofhardware.

• Performanceofmemory-bound algorithmsscaleshardware-independent withinverseofformatcomplexity.

• relativeresidualaccuracy=(unitround-off)*(linearsystem’sconditionnumber)

• Iftheproblem iswell-conditioned,andalow-accuracysolutionisacceptable,usealowprecisionformat.(i.e.IEEEsingleprecision,orevenIEEEhalfprecision.)

Take-Away

Page 16: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

16 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

• Performanceofcompute-bound algorithmsdependsonformatsupportofhardware.

• Performanceofmemory-bound algorithmsscaleshardware-independent withinverseofformatcomplexity.

• relativeresidualaccuracy=(unitround-off)*(linearsystem’sconditionnumber)

• Iftheproblem iswell-conditioned,andalow-accuracysolutionisacceptable,usealowprecisionformat.(i.e.IEEEsingleprecision,orevenIEEEhalfprecision.)

Frameworkforexploring theeffectoffloatingpointformatiniterativesolvers:

https://github.com/ginkgo-project/ginkgo

PratikNayakTerryCojean GoranFlegar ThomasGrützmacher

TobiasRibizel MikeTsaiFritzGöbel

Take-Away

Page 17: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

17 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

• Preconditioningiterativesolvers.

• Idea:Approximateinverseofsystemmatrixtomakethesystem“easiertosolve”:

andsolve.

P�1 ⇡ A�1

Lowprecisionforsolvingill-conditionedproblems

Ax = b , P

�1Ax = P

�1b , Ax = b

<latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit>

Page 18: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

18 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

• Preconditioningiterativesolvers.

• Idea:Approximateinverseofsystemmatrixtomakethesystem“easiertosolve”:

andsolve.

• Whyshouldwestorethepreconditionermatrixinfull(high)precision?

P�1 ⇡ A�1

P�1<latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit>

Lowprecisionforsolvingill-conditionedproblems

Ax = b , P

�1Ax = P

�1b , Ax = b

<latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit>

Page 19: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

19 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

• Preconditioningiterativesolvers.

• Idea:Approximateinverseofsystemmatrixtomakethesystem“easiertosolve”:

andsolve.

• Whyshouldwestorethepreconditionermatrixinfull(high)precision?

• Jacobimethodbasedon diagonalscaling

• Block-Jacobi isbasedonblock-diagonalscaling:

• Eachblockcorresponds toone(small) linearsystem.

• Larger blockstypicallyimproveconvergence.

• Larger blocksmakeblock-Jacobimoreexpensive.

P = diagB(A)

P = diag(A)

P�1 ⇡ A�1

P�1<latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit>

Lowprecisionforsolvingill-conditionedproblems

Ax = b , P

�1Ax = P

�1b , Ax = b

<latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit>

Page 20: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

20 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

DataAccessor• ValueClustering

DataCompression

IEEE754DP

Memory

ProcessingUnits

MemoryOperations

ArithmeticOperations

• IEEE• CustomFormats• Lossy/Lossless• Unum, Posits…

• Preconditioningiterativesolvers.

• Idea:Approximateinverseofsystemmatrixtomakethesystem“easiertosolve”:

andsolve.

• Whyshouldwestorethepreconditionermatrixinfull(high)precision?

• Jacobimethodbasedon diagonalscaling

• Block-Jacobi isbasedonblock-diagonalscaling:

• Eachblockcorresponds toone(small) linearsystem.

• Larger blockstypicallyimproveconvergence.

• Larger blocksmakeblock-Jacobimoreexpensive.

Idea:Storetheinverteddiagonalinlowprecision

P = diagB(A)

P = diag(A)

P�1 ⇡ A�1

P�1<latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit><latexit sha1_base64="YRg4p+8JR3cdv4qGWi/HAKL8RTw=">AAAB7XicdVDLSgMxFL3js9ZX1aWbYBHcOMyUgi4LblxWsA9ox5JJM21sJhmSjFCG/oMbF4q49X/c+Tdm2imo6IGQwzn3cu89YcKZNp736aysrq1vbJa2yts7u3v7lYPDtpapIrRFJJeqG2JNORO0ZZjhtJsoiuOQ0044ucr9zgNVmklxa6YJDWI8EixiBBsrtZt32bk/G1SqnlvzciDPrS9Jofju/PeqUKA5qHz0h5KkMRWGcKx1z/cSE2RYGUY4nZX7qaYJJhM8oj1LBY6pDrL5tjN0apUhiqSyTxg0V793ZDjWehqHtjLGZqx/e7n4l9dLTXQZZEwkqaGCLAZFKUdGovx0NGSKEsOnlmCimN0VkTFWmBgbUNmGsLwU/U/aNdf3XP+mXm3UizhKcAwncAY+XEADrqEJLSBwD4/wDC+OdJ6cV+dtUbriFD1H8APO+xcDo460</latexit>

Lowprecisionforsolvingill-conditionedproblems

Ax = b , P

�1Ax = P

�1b , Ax = b

<latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit><latexit sha1_base64="FWLyx6b9LwbJAgeqlSEIFqmud+c=">AAACUXicdVFNbxMxEJ0stA1JKYEeuVhElbgQ7aJKcImUwIUDhyCRDylJI693NrHq/cCeLYlW+xd7KCf+Ry8cQDibPUACI1l68948jf3sp0oact3vNefBw6Pjk/qjRvP08dmT1tNnI5NkWuBQJCrRE58bVDLGIUlSOEk18shXOPav32/18Q1qI5P4M21SnEd8GctQCk6WWrRW/XXXZ7MvGQ/Y7COGpOVyRVzr5GvFDq7yV17B+mzNulXzH0NFklQB5v2iNFSdXyxabbfjlsUOgVeBNlQ1WLTuZkEisghjEoobM/XclOY51ySFwqIxywymXFzzJU4tjHmEZp6XiRTswjIBCxNtT0ysZP905DwyZhP5djLitDL72pb8lzbNKHw7z2WcZoSx2C0KM8UoYdt4WSA1ClIbC7jQ0t6ViRXXXJD9hIYNwdt/8iEYve54bsf7dNnuvaviqMNzeAEvwYM30IMPMIAhCLiFe/gJv2rfaj8ccJzdqFOrPOfwVznN3wIqstY=</latexit>

Page 21: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

21 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

AdaptivePrecisionPreconditioning

• Choosehowmuchaccuracyofthepreconditionershouldbepreservedbythestorageformat.

• Allcomputationsusedoubleprecision,butstoreblocksinlowerprecision.

Page 22: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

22 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

AdaptivePrecisionPreconditioning

+ Regularitypreserved;

+ Flexibleintheaccuracypreserved;

+ NoflexibleKrylov solverneeded

(Preconditionerconstantoperator);

+ Canhandlenon-spd problems

(inversionfeaturespivoting);

+ Preconditionerforanyiterativepreconditionable solver;

- Overheadoftheprecisiondetection

(conditionnumbercalculation);

- Overheadfromstoringprecisioninformation

(needtoadditionallystore/retrieveflag);

- Speedups /preconditionerqualityproblem-dependent;

• Choosehowmuchaccuracyofthepreconditionershouldbepreservedbythestorageformat.

• Allcomputationsusedoubleprecision,butstoreblocksinlowerprecision.

2digits

Page 23: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

23 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

AdaptivePrecisionPreconditioningBlockdistrib

ution

100%

0%

Page 24: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

24 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

AdaptivePrecisionPreconditioningBlockdistrib

ution

100%

0%

Page 25: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

25 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

AdaptivePrecisionPreconditioningBlockdistrib

ution

100%

0%

Page 26: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

26 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

DoublePrecision+MixedPrecisionPreconditionerDoublePrecision+DoublePreconditioner

LinearSystemAx=bwithcond(A)≈107

16%runtime improvement

apache2fromSuiteSparse

Page 27: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

27 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Floatingpointformatsandaccuracy

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

DoublePrecision+MixedPrecisionPreconditionerDoublePrecision+DoublePreconditioner

LinearSystemAx=bwithcond(A)≈107

16%runtime improvement

apache2fromSuiteSparseginkgo/examples/adaptiveprecision-blockjacobi/adaptiveprecision-blockjacobi.cpp

…auto solver_gen =

cg::build().with_criteria(gko::share(iter_stop), gko::share(tol_stop)).with_preconditioner(bj::build()

.with_max_block_size(16u)

.with_storage_optimization(gko::precision_reduction::autodetect())

.on(exec)).on(exec);

Page 28: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

28 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

NVIDIAV100GPUSpeedup

2

4

0.5

0.25

1

Page 29: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

29 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Problemswherethepreconditionerhashigheraccuracythan2digits.

NVIDIAV100GPUSpeedup

2

4

0.5

0.25

1

Page 30: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

30 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Roughly20%faster.

Flegar,Anzt,Cojean,Quintana-Orti.”Customized-PrecisionBlock-JacobiPreconditioningforKrylov IterativeSolversonData-ParallelManycore Processors”. TOMS,submitted.

ArtifactEvaluation:https://github.com/ginkgo-project/ginkgo-data/tree/2019toms-adaptive-bj

Production-ready code:https://ginkgo-project.github.io

\

NVIDIAV100GPUSpeedup

2

4

0.5

0.25

1

Page 31: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

31 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

• Performanceofcompute-bound algorithmsdependsonformatsupportofhardware.

• Performanceofmemory-bound algorithmsscaleshardware-independent withinverseofformatcomplexity.

• relativeresidualaccuracy=(unitround-off)*(linearsystem’sconditionnumber)

• Iftheproblem iswell-conditioned,andalow-accuracysolutionisacceptable,usealowprecisionformat.(i.e.IEEEsingleprecision,orevenIEEEhalfprecision.)

• Lowprecisionpreconditionerscanbeused toaccelerateiterativesolvers.• Adaptprecision tonumericalrequirementsandpreconditioner quality.

• Toincreasetheperformancebenefits,shiftmostoftheworktothelowprecisionpreconditioner.

Take-Away

Page 32: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

32 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Usingalowprecisionsolveraspreconditioner

• Toincreasetheperformancebenefits,shiftmostoftheworktothelowprecisionpreconditioner.

• Useasimple(cheap)iterativesolverinhighprecisionandasophisticated(expensive)solver inlowprecisionaspreconditioner.

• Mostoftheworkisdoneinlowprecision (fast).• Thehighprecisionoutersolverensureshighqualityofthesolution.

• Popularexample:IterativeRefinement(seeJackDongarra’s talk)

Foranapproximatesolution ,theresidualcomputesas.Theexactsolution foriswhereisthesolutionof.

r = b�Ax

(k)<latexit sha1_base64="dz4yJX0V7/cXCfrp7MhEtMPb0Go=">AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBahHixJKehFqHjxWMF+QBvLZrtpl242YXdTLKH/xIsHRbz6T7z5b9y2OWjrg4HHezPMzPNjzpR2nG8rt7a+sbmV3y7s7O7tH9iHR00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aPbmd8aU6lYJB70JKZeiAeCBYxgbaSebUt0jXx0gW6eHtPS6Hzas4tO2ZkDrRI3I0XIUO/ZX91+RJKQCk04VqrjOrH2Uiw1I5xOC91E0RiTER7QjqECh1R56fzyKTozSh8FkTQlNJqrvydSHCo1CX3TGWI9VMveTPzP6yQ6uPJSJuJEU0EWi4KEIx2hWQyozyQlmk8MwUQycysiQywx0SasggnBXX55lTQrZdcpu/fVYq2axZGHEziFErhwCTW4gzo0gMAYnuEV3qzUerHerY9Fa87KZo7hD6zPH80ykcY=</latexit><latexit sha1_base64="dz4yJX0V7/cXCfrp7MhEtMPb0Go=">AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBahHixJKehFqHjxWMF+QBvLZrtpl242YXdTLKH/xIsHRbz6T7z5b9y2OWjrg4HHezPMzPNjzpR2nG8rt7a+sbmV3y7s7O7tH9iHR00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aPbmd8aU6lYJB70JKZeiAeCBYxgbaSebUt0jXx0gW6eHtPS6Hzas4tO2ZkDrRI3I0XIUO/ZX91+RJKQCk04VqrjOrH2Uiw1I5xOC91E0RiTER7QjqECh1R56fzyKTozSh8FkTQlNJqrvydSHCo1CX3TGWI9VMveTPzP6yQ6uPJSJuJEU0EWi4KEIx2hWQyozyQlmk8MwUQycysiQywx0SasggnBXX55lTQrZdcpu/fVYq2axZGHEziFErhwCTW4gzo0gMAYnuEV3qzUerHerY9Fa87KZo7hD6zPH80ykcY=</latexit><latexit sha1_base64="dz4yJX0V7/cXCfrp7MhEtMPb0Go=">AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBahHixJKehFqHjxWMF+QBvLZrtpl242YXdTLKH/xIsHRbz6T7z5b9y2OWjrg4HHezPMzPNjzpR2nG8rt7a+sbmV3y7s7O7tH9iHR00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aPbmd8aU6lYJB70JKZeiAeCBYxgbaSebUt0jXx0gW6eHtPS6Hzas4tO2ZkDrRI3I0XIUO/ZX91+RJKQCk04VqrjOrH2Uiw1I5xOC91E0RiTER7QjqECh1R56fzyKTozSh8FkTQlNJqrvydSHCo1CX3TGWI9VMveTPzP6yQ6uPJSJuJEU0EWi4KEIx2hWQyozyQlmk8MwUQycysiQywx0SasggnBXX55lTQrZdcpu/fVYq2axZGHEziFErhwCTW4gzo0gMAYnuEV3qzUerHerY9Fa87KZo7hD6zPH80ykcY=</latexit><latexit sha1_base64="dz4yJX0V7/cXCfrp7MhEtMPb0Go=">AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBahHixJKehFqHjxWMF+QBvLZrtpl242YXdTLKH/xIsHRbz6T7z5b9y2OWjrg4HHezPMzPNjzpR2nG8rt7a+sbmV3y7s7O7tH9iHR00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aPbmd8aU6lYJB70JKZeiAeCBYxgbaSebUt0jXx0gW6eHtPS6Hzas4tO2ZkDrRI3I0XIUO/ZX91+RJKQCk04VqrjOrH2Uiw1I5xOC91E0RiTER7QjqECh1R56fzyKTozSh8FkTQlNJqrvydSHCo1CX3TGWI9VMveTPzP6yQ6uPJSJuJEU0EWi4KEIx2hWQyozyQlmk8MwUQycysiQywx0SasggnBXX55lTQrZdcpu/fVYq2axZGHEziFErhwCTW4gzo0gMAYnuEV3qzUerHerY9Fa87KZo7hD6zPH80ykcY=</latexit>

x = x

(k) + c

<latexit sha1_base64="UgDq8r8noA6sd5XuIWYWVuWNCag=">AAAB+HicbVBNS8NAEJ3Ur1o/GvXoZbEIFaEkUtCLUPDisYL9gDaWzXbTLt1swu5GWkN/iRcPinj1p3jz37htc9DWBwOP92aYmefHnCntON9Wbm19Y3Mrv13Y2d3bL9oHh00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aObmd96pFKxSNzrSUy9EA8ECxjB2kg9uzhG12j8kJZHZ1N0jkjPLjkVZw60StyMlCBDvWd/dfsRSUIqNOFYqY7rxNpLsdSMcDotdBNFY0xGeEA7hgocUuWl88On6NQofRRE0pTQaK7+nkhxqNQk9E1niPVQLXsz8T+vk+jgykuZiBNNBVksChKOdIRmKaA+k5RoPjEEE8nMrYgMscREm6wKJgR3+eVV0ryouE7FvauWatUsjjwcwwmUwYVLqMEt1KEBBBJ4hld4s56sF+vd+li05qxs5gj+wPr8AU20kYA=</latexit><latexit sha1_base64="UgDq8r8noA6sd5XuIWYWVuWNCag=">AAAB+HicbVBNS8NAEJ3Ur1o/GvXoZbEIFaEkUtCLUPDisYL9gDaWzXbTLt1swu5GWkN/iRcPinj1p3jz37htc9DWBwOP92aYmefHnCntON9Wbm19Y3Mrv13Y2d3bL9oHh00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aObmd96pFKxSNzrSUy9EA8ECxjB2kg9uzhG12j8kJZHZ1N0jkjPLjkVZw60StyMlCBDvWd/dfsRSUIqNOFYqY7rxNpLsdSMcDotdBNFY0xGeEA7hgocUuWl88On6NQofRRE0pTQaK7+nkhxqNQk9E1niPVQLXsz8T+vk+jgykuZiBNNBVksChKOdIRmKaA+k5RoPjEEE8nMrYgMscREm6wKJgR3+eVV0ryouE7FvauWatUsjjwcwwmUwYVLqMEt1KEBBBJ4hld4s56sF+vd+li05qxs5gj+wPr8AU20kYA=</latexit><latexit sha1_base64="UgDq8r8noA6sd5XuIWYWVuWNCag=">AAAB+HicbVBNS8NAEJ3Ur1o/GvXoZbEIFaEkUtCLUPDisYL9gDaWzXbTLt1swu5GWkN/iRcPinj1p3jz37htc9DWBwOP92aYmefHnCntON9Wbm19Y3Mrv13Y2d3bL9oHh00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aObmd96pFKxSNzrSUy9EA8ECxjB2kg9uzhG12j8kJZHZ1N0jkjPLjkVZw60StyMlCBDvWd/dfsRSUIqNOFYqY7rxNpLsdSMcDotdBNFY0xGeEA7hgocUuWl88On6NQofRRE0pTQaK7+nkhxqNQk9E1niPVQLXsz8T+vk+jgykuZiBNNBVksChKOdIRmKaA+k5RoPjEEE8nMrYgMscREm6wKJgR3+eVV0ryouE7FvauWatUsjjwcwwmUwYVLqMEt1KEBBBJ4hld4s56sF+vd+li05qxs5gj+wPr8AU20kYA=</latexit><latexit sha1_base64="UgDq8r8noA6sd5XuIWYWVuWNCag=">AAAB+HicbVBNS8NAEJ3Ur1o/GvXoZbEIFaEkUtCLUPDisYL9gDaWzXbTLt1swu5GWkN/iRcPinj1p3jz37htc9DWBwOP92aYmefHnCntON9Wbm19Y3Mrv13Y2d3bL9oHh00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aObmd96pFKxSNzrSUy9EA8ECxjB2kg9uzhG12j8kJZHZ1N0jkjPLjkVZw60StyMlCBDvWd/dfsRSUIqNOFYqY7rxNpLsdSMcDotdBNFY0xGeEA7hgocUuWl88On6NQofRRE0pTQaK7+nkhxqNQk9E1niPVQLXsz8T+vk+jgykuZiBNNBVksChKOdIRmKaA+k5RoPjEEE8nMrYgMscREm6wKJgR3+eVV0ryouE7FvauWatUsjjwcwwmUwYVLqMEt1KEBBBJ4hld4s56sF+vd+li05qxs5gj+wPr8AU20kYA=</latexit>

Ac = r<latexit sha1_base64="KPkOLAyN+nBZmcPjWo2YU8ULNxU=">AAAB7XicbVBNSwMxEJ3Ur1q/qh69BIvgqexKQS9CxYvHCvYD2qVk02wbm02WJCuUpf/BiwdFvPp/vPlvTNs9aOuDgcd7M8zMCxPBjfW8b1RYW9/Y3Cpul3Z29/YPyodHLaNSTVmTKqF0JySGCS5Z03IrWCfRjMShYO1wfDvz209MG67kg50kLIjJUPKIU2Kd1Lqh+BrrfrniVb058Crxc1KBHI1++as3UDSNmbRUEGO6vpfYICPacirYtNRLDUsIHZMh6zoqScxMkM2vneIzpwxwpLQrafFc/T2RkdiYSRy6zpjYkVn2ZuJ/Xje10VWQcZmklkm6WBSlAluFZ6/jAdeMWjFxhFDN3a2Yjogm1LqASi4Ef/nlVdK6qPpe1b+vVeq1PI4inMApnIMPl1CHO2hAEyg8wjO8whtS6AW9o49FawHlM8fwB+jzB0tvjjs=</latexit><latexit sha1_base64="KPkOLAyN+nBZmcPjWo2YU8ULNxU=">AAAB7XicbVBNSwMxEJ3Ur1q/qh69BIvgqexKQS9CxYvHCvYD2qVk02wbm02WJCuUpf/BiwdFvPp/vPlvTNs9aOuDgcd7M8zMCxPBjfW8b1RYW9/Y3Cpul3Z29/YPyodHLaNSTVmTKqF0JySGCS5Z03IrWCfRjMShYO1wfDvz209MG67kg50kLIjJUPKIU2Kd1Lqh+BrrfrniVb058Crxc1KBHI1++as3UDSNmbRUEGO6vpfYICPacirYtNRLDUsIHZMh6zoqScxMkM2vneIzpwxwpLQrafFc/T2RkdiYSRy6zpjYkVn2ZuJ/Xje10VWQcZmklkm6WBSlAluFZ6/jAdeMWjFxhFDN3a2Yjogm1LqASi4Ef/nlVdK6qPpe1b+vVeq1PI4inMApnIMPl1CHO2hAEyg8wjO8whtS6AW9o49FawHlM8fwB+jzB0tvjjs=</latexit><latexit sha1_base64="KPkOLAyN+nBZmcPjWo2YU8ULNxU=">AAAB7XicbVBNSwMxEJ3Ur1q/qh69BIvgqexKQS9CxYvHCvYD2qVk02wbm02WJCuUpf/BiwdFvPp/vPlvTNs9aOuDgcd7M8zMCxPBjfW8b1RYW9/Y3Cpul3Z29/YPyodHLaNSTVmTKqF0JySGCS5Z03IrWCfRjMShYO1wfDvz209MG67kg50kLIjJUPKIU2Kd1Lqh+BrrfrniVb058Crxc1KBHI1++as3UDSNmbRUEGO6vpfYICPacirYtNRLDUsIHZMh6zoqScxMkM2vneIzpwxwpLQrafFc/T2RkdiYSRy6zpjYkVn2ZuJ/Xje10VWQcZmklkm6WBSlAluFZ6/jAdeMWjFxhFDN3a2Yjogm1LqASi4Ef/nlVdK6qPpe1b+vVeq1PI4inMApnIMPl1CHO2hAEyg8wjO8whtS6AW9o49FawHlM8fwB+jzB0tvjjs=</latexit><latexit sha1_base64="KPkOLAyN+nBZmcPjWo2YU8ULNxU=">AAAB7XicbVBNSwMxEJ3Ur1q/qh69BIvgqexKQS9CxYvHCvYD2qVk02wbm02WJCuUpf/BiwdFvPp/vPlvTNs9aOuDgcd7M8zMCxPBjfW8b1RYW9/Y3Cpul3Z29/YPyodHLaNSTVmTKqF0JySGCS5Z03IrWCfRjMShYO1wfDvz209MG67kg50kLIjJUPKIU2Kd1Lqh+BrrfrniVb058Crxc1KBHI1++as3UDSNmbRUEGO6vpfYICPacirYtNRLDUsIHZMh6zoqScxMkM2vneIzpwxwpLQrafFc/T2RkdiYSRy6zpjYkVn2ZuJ/Xje10VWQcZmklkm6WBSlAluFZ6/jAdeMWjFxhFDN3a2Yjogm1LqASi4Ef/nlVdK6qPpe1b+vVeq1PI4inMApnIMPl1CHO2hAEyg8wjO8whtS6AW9o49FawHlM8fwB+jzB0tvjjs=</latexit>

x

(k)<latexit sha1_base64="DSmavFgbyE8yLz2MwnZO7G2nB3E=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBahXkoiBT0WvHisYFuhjWWznbRLN5uwuxFL6I/w4kERr/4eb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGr+4BqFFxiy3Aj8D5RSKNAYCcYX8/8ziMqzWN5ZyYJ+hEdSh5yRo2VOk8PWXV8Pu2XK27NnYOsEi8nFcjR7Je/eoOYpRFKwwTVuuu5ifEzqgxnAqelXqoxoWxMh9i1VNIItZ/Nz52SM6sMSBgrW9KQufp7IqOR1pMosJ0RNSO97M3E/7xuasIrP+MySQ1KtlgUpoKYmMx+JwOukBkxsYQyxe2thI2ooszYhEo2BG/55VXSvqh5bs27rVca9TyOIpzAKVTBg0towA00oQUMxvAMr/DmJM6L8+58LFoLTj5zDH/gfP4A6waPPA==</latexit><latexit sha1_base64="DSmavFgbyE8yLz2MwnZO7G2nB3E=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBahXkoiBT0WvHisYFuhjWWznbRLN5uwuxFL6I/w4kERr/4eb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGr+4BqFFxiy3Aj8D5RSKNAYCcYX8/8ziMqzWN5ZyYJ+hEdSh5yRo2VOk8PWXV8Pu2XK27NnYOsEi8nFcjR7Je/eoOYpRFKwwTVuuu5ifEzqgxnAqelXqoxoWxMh9i1VNIItZ/Nz52SM6sMSBgrW9KQufp7IqOR1pMosJ0RNSO97M3E/7xuasIrP+MySQ1KtlgUpoKYmMx+JwOukBkxsYQyxe2thI2ooszYhEo2BG/55VXSvqh5bs27rVca9TyOIpzAKVTBg0towA00oQUMxvAMr/DmJM6L8+58LFoLTj5zDH/gfP4A6waPPA==</latexit><latexit sha1_base64="DSmavFgbyE8yLz2MwnZO7G2nB3E=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBahXkoiBT0WvHisYFuhjWWznbRLN5uwuxFL6I/w4kERr/4eb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGr+4BqFFxiy3Aj8D5RSKNAYCcYX8/8ziMqzWN5ZyYJ+hEdSh5yRo2VOk8PWXV8Pu2XK27NnYOsEi8nFcjR7Je/eoOYpRFKwwTVuuu5ifEzqgxnAqelXqoxoWxMh9i1VNIItZ/Nz52SM6sMSBgrW9KQufp7IqOR1pMosJ0RNSO97M3E/7xuasIrP+MySQ1KtlgUpoKYmMx+JwOukBkxsYQyxe2thI2ooszYhEo2BG/55VXSvqh5bs27rVca9TyOIpzAKVTBg0towA00oQUMxvAMr/DmJM6L8+58LFoLTj5zDH/gfP4A6waPPA==</latexit><latexit sha1_base64="DSmavFgbyE8yLz2MwnZO7G2nB3E=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBahXkoiBT0WvHisYFuhjWWznbRLN5uwuxFL6I/w4kERr/4eb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGr+4BqFFxiy3Aj8D5RSKNAYCcYX8/8ziMqzWN5ZyYJ+hEdSh5yRo2VOk8PWXV8Pu2XK27NnYOsEi8nFcjR7Je/eoOYpRFKwwTVuuu5ifEzqgxnAqelXqoxoWxMh9i1VNIItZ/Nz52SM6sMSBgrW9KQufp7IqOR1pMosJ0RNSO97M3E/7xuasIrP+MySQ1KtlgUpoKYmMx+JwOukBkxsYQyxe2thI2ooszYhEo2BG/55VXSvqh5bs27rVca9TyOIpzAKVTBg0towA00oQUMxvAMr/DmJM6L8+58LFoLTj5zDH/gfP4A6waPPA==</latexit>

Ax = b

<latexit sha1_base64="g/Ub0N8eIUIqalzNgwRMs/8w+Nw=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoBeh4sVjBdMW2lA22027dHcTdjdiCf0LXjwo4tU/5M1/46bNQVsfDDzem2FmXphwpo3rfjultfWNza3ydmVnd2//oHp41NZxqgj1Scxj1Q2xppxJ6htmOO0mimIRctoJJ7e533mkSrNYPphpQgOBR5JFjGCTSzdP1+GgWnPr7hxolXgFqUGB1qD61R/GJBVUGsKx1j3PTUyQYWUY4XRW6aeaJphM8Ij2LJVYUB1k81tn6MwqQxTFypY0aK7+nsiw0HoqQtspsBnrZS8X//N6qYmugozJJDVUksWiKOXIxCh/HA2ZosTwqSWYKGZvRWSMFSbGxlOxIXjLL6+S9kXdc+vefaPWbBRxlOEETuEcPLiEJtxBC3wgMIZneIU3RzgvzrvzsWgtOcXMMfyB8/kDpkON7A==</latexit><latexit sha1_base64="g/Ub0N8eIUIqalzNgwRMs/8w+Nw=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoBeh4sVjBdMW2lA22027dHcTdjdiCf0LXjwo4tU/5M1/46bNQVsfDDzem2FmXphwpo3rfjultfWNza3ydmVnd2//oHp41NZxqgj1Scxj1Q2xppxJ6htmOO0mimIRctoJJ7e533mkSrNYPphpQgOBR5JFjGCTSzdP1+GgWnPr7hxolXgFqUGB1qD61R/GJBVUGsKx1j3PTUyQYWUY4XRW6aeaJphM8Ij2LJVYUB1k81tn6MwqQxTFypY0aK7+nsiw0HoqQtspsBnrZS8X//N6qYmugozJJDVUksWiKOXIxCh/HA2ZosTwqSWYKGZvRWSMFSbGxlOxIXjLL6+S9kXdc+vefaPWbBRxlOEETuEcPLiEJtxBC3wgMIZneIU3RzgvzrvzsWgtOcXMMfyB8/kDpkON7A==</latexit><latexit sha1_base64="g/Ub0N8eIUIqalzNgwRMs/8w+Nw=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoBeh4sVjBdMW2lA22027dHcTdjdiCf0LXjwo4tU/5M1/46bNQVsfDDzem2FmXphwpo3rfjultfWNza3ydmVnd2//oHp41NZxqgj1Scxj1Q2xppxJ6htmOO0mimIRctoJJ7e533mkSrNYPphpQgOBR5JFjGCTSzdP1+GgWnPr7hxolXgFqUGB1qD61R/GJBVUGsKx1j3PTUyQYWUY4XRW6aeaJphM8Ij2LJVYUB1k81tn6MwqQxTFypY0aK7+nsiw0HoqQtspsBnrZS8X//N6qYmugozJJDVUksWiKOXIxCh/HA2ZosTwqSWYKGZvRWSMFSbGxlOxIXjLL6+S9kXdc+vefaPWbBRxlOEETuEcPLiEJtxBC3wgMIZneIU3RzgvzrvzsWgtOcXMMfyB8/kDpkON7A==</latexit><latexit sha1_base64="g/Ub0N8eIUIqalzNgwRMs/8w+Nw=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoBeh4sVjBdMW2lA22027dHcTdjdiCf0LXjwo4tU/5M1/46bNQVsfDDzem2FmXphwpo3rfjultfWNza3ydmVnd2//oHp41NZxqgj1Scxj1Q2xppxJ6htmOO0mimIRctoJJ7e533mkSrNYPphpQgOBR5JFjGCTSzdP1+GgWnPr7hxolXgFqUGB1qD61R/GJBVUGsKx1j3PTUyQYWUY4XRW6aeaJphM8Ij2LJVYUB1k81tn6MwqQxTFypY0aK7+nsiw0HoqQtspsBnrZS8X//N6qYmugozJJDVUksWiKOXIxCh/HA2ZosTwqSWYKGZvRWSMFSbGxlOxIXjLL6+S9kXdc+vefaPWbBRxlOEETuEcPLiEJtxBC3wgMIZneIU3RzgvzrvzsWgtOcXMMfyB8/kDpkON7A==</latexit>

c<latexit sha1_base64="ykyXXryT0qS3g8DIJalovrnOKSA=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkUI8FLx5bsB/QhrLZTtq1m03Y3Qgl9Bd48aCIV3+SN/+N2zYHbX0w8Hhvhpl5QSK4Nq777RS2tnd294r7pYPDo+OT8ulZR8epYthmsYhVL6AaBZfYNtwI7CUKaRQI7AbTu4XffUKleSwfzCxBP6JjyUPOqLFSiw3LFbfqLkE2iZeTCuRoDstfg1HM0gilYYJq3ffcxPgZVYYzgfPSINWYUDalY+xbKmmE2s+Wh87JlVVGJIyVLWnIUv09kdFI61kU2M6Imole9xbif14/NeGtn3GZpAYlWy0KU0FMTBZfkxFXyIyYWUKZ4vZWwiZUUWZsNiUbgrf+8ibp3FQ9t+q1apVGLY+jCBdwCdfgQR0acA9NaAMDhGd4hTfn0Xlx3p2PVWvByWfO4Q+czx/CL4zZ</latexit><latexit sha1_base64="ykyXXryT0qS3g8DIJalovrnOKSA=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkUI8FLx5bsB/QhrLZTtq1m03Y3Qgl9Bd48aCIV3+SN/+N2zYHbX0w8Hhvhpl5QSK4Nq777RS2tnd294r7pYPDo+OT8ulZR8epYthmsYhVL6AaBZfYNtwI7CUKaRQI7AbTu4XffUKleSwfzCxBP6JjyUPOqLFSiw3LFbfqLkE2iZeTCuRoDstfg1HM0gilYYJq3ffcxPgZVYYzgfPSINWYUDalY+xbKmmE2s+Wh87JlVVGJIyVLWnIUv09kdFI61kU2M6Imole9xbif14/NeGtn3GZpAYlWy0KU0FMTBZfkxFXyIyYWUKZ4vZWwiZUUWZsNiUbgrf+8ibp3FQ9t+q1apVGLY+jCBdwCdfgQR0acA9NaAMDhGd4hTfn0Xlx3p2PVWvByWfO4Q+czx/CL4zZ</latexit><latexit sha1_base64="ykyXXryT0qS3g8DIJalovrnOKSA=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkUI8FLx5bsB/QhrLZTtq1m03Y3Qgl9Bd48aCIV3+SN/+N2zYHbX0w8Hhvhpl5QSK4Nq777RS2tnd294r7pYPDo+OT8ulZR8epYthmsYhVL6AaBZfYNtwI7CUKaRQI7AbTu4XffUKleSwfzCxBP6JjyUPOqLFSiw3LFbfqLkE2iZeTCuRoDstfg1HM0gilYYJq3ffcxPgZVYYzgfPSINWYUDalY+xbKmmE2s+Wh87JlVVGJIyVLWnIUv09kdFI61kU2M6Imole9xbif14/NeGtn3GZpAYlWy0KU0FMTBZfkxFXyIyYWUKZ4vZWwiZUUWZsNiUbgrf+8ibp3FQ9t+q1apVGLY+jCBdwCdfgQR0acA9NaAMDhGd4hTfn0Xlx3p2PVWvByWfO4Q+czx/CL4zZ</latexit><latexit sha1_base64="ykyXXryT0qS3g8DIJalovrnOKSA=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkUI8FLx5bsB/QhrLZTtq1m03Y3Qgl9Bd48aCIV3+SN/+N2zYHbX0w8Hhvhpl5QSK4Nq777RS2tnd294r7pYPDo+OT8ulZR8epYthmsYhVL6AaBZfYNtwI7CUKaRQI7AbTu4XffUKleSwfzCxBP6JjyUPOqLFSiw3LFbfqLkE2iZeTCuRoDstfg1HM0gilYYJq3ffcxPgZVYYzgfPSINWYUDalY+xbKmmE2s+Wh87JlVVGJIyVLWnIUv09kdFI61kU2M6Imole9xbif14/NeGtn3GZpAYlWy0KU0FMTBZfkxFXyIyYWUKZ4vZWwiZUUWZsNiUbgrf+8ibp3FQ9t+q1apVGLY+jCBdwCdfgQR0acA9NaAMDhGd4hTfn0Xlx3p2PVWvByWfO4Q+czx/CL4zZ</latexit>

Choose initial guess xdo {

Compute r = b – Ax Solve A * c = r Update x = x + c

} while ( ||r||>tol )

IterativeRefinement

Page 33: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

33 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

Usingalowprecisionsolveraspreconditioner

• Toincreasetheperformancebenefits,shiftmostoftheworktothelowprecisionpreconditioner.

• Useasimple(cheap)iterativesolverinhighprecisionandasophisticated(expensive)solver inlowprecisionaspreconditioner.

• Mostoftheworkisdoneinlowprecision (fast).• Thehighprecisionoutersolverensureshighqualityofthesolution.

• Popularexample:IterativeRefinement(seeJackDongarra’s talk)

Foranapproximatesolution ,theresidualcomputesas.Theexactsolution foriswhereisthesolutionof.

r = b�Ax

(k)<latexit sha1_base64="dz4yJX0V7/cXCfrp7MhEtMPb0Go=">AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBahHixJKehFqHjxWMF+QBvLZrtpl242YXdTLKH/xIsHRbz6T7z5b9y2OWjrg4HHezPMzPNjzpR2nG8rt7a+sbmV3y7s7O7tH9iHR00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aPbmd8aU6lYJB70JKZeiAeCBYxgbaSebUt0jXx0gW6eHtPS6Hzas4tO2ZkDrRI3I0XIUO/ZX91+RJKQCk04VqrjOrH2Uiw1I5xOC91E0RiTER7QjqECh1R56fzyKTozSh8FkTQlNJqrvydSHCo1CX3TGWI9VMveTPzP6yQ6uPJSJuJEU0EWi4KEIx2hWQyozyQlmk8MwUQycysiQywx0SasggnBXX55lTQrZdcpu/fVYq2axZGHEziFErhwCTW4gzo0gMAYnuEV3qzUerHerY9Fa87KZo7hD6zPH80ykcY=</latexit><latexit sha1_base64="dz4yJX0V7/cXCfrp7MhEtMPb0Go=">AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBahHixJKehFqHjxWMF+QBvLZrtpl242YXdTLKH/xIsHRbz6T7z5b9y2OWjrg4HHezPMzPNjzpR2nG8rt7a+sbmV3y7s7O7tH9iHR00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aPbmd8aU6lYJB70JKZeiAeCBYxgbaSebUt0jXx0gW6eHtPS6Hzas4tO2ZkDrRI3I0XIUO/ZX91+RJKQCk04VqrjOrH2Uiw1I5xOC91E0RiTER7QjqECh1R56fzyKTozSh8FkTQlNJqrvydSHCo1CX3TGWI9VMveTPzP6yQ6uPJSJuJEU0EWi4KEIx2hWQyozyQlmk8MwUQycysiQywx0SasggnBXX55lTQrZdcpu/fVYq2axZGHEziFErhwCTW4gzo0gMAYnuEV3qzUerHerY9Fa87KZo7hD6zPH80ykcY=</latexit><latexit sha1_base64="dz4yJX0V7/cXCfrp7MhEtMPb0Go=">AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBahHixJKehFqHjxWMF+QBvLZrtpl242YXdTLKH/xIsHRbz6T7z5b9y2OWjrg4HHezPMzPNjzpR2nG8rt7a+sbmV3y7s7O7tH9iHR00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aPbmd8aU6lYJB70JKZeiAeCBYxgbaSebUt0jXx0gW6eHtPS6Hzas4tO2ZkDrRI3I0XIUO/ZX91+RJKQCk04VqrjOrH2Uiw1I5xOC91E0RiTER7QjqECh1R56fzyKTozSh8FkTQlNJqrvydSHCo1CX3TGWI9VMveTPzP6yQ6uPJSJuJEU0EWi4KEIx2hWQyozyQlmk8MwUQycysiQywx0SasggnBXX55lTQrZdcpu/fVYq2axZGHEziFErhwCTW4gzo0gMAYnuEV3qzUerHerY9Fa87KZo7hD6zPH80ykcY=</latexit><latexit sha1_base64="dz4yJX0V7/cXCfrp7MhEtMPb0Go=">AAAB+XicbVBNS8NAEJ3Ur1q/oh69LBahHixJKehFqHjxWMF+QBvLZrtpl242YXdTLKH/xIsHRbz6T7z5b9y2OWjrg4HHezPMzPNjzpR2nG8rt7a+sbmV3y7s7O7tH9iHR00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aPbmd8aU6lYJB70JKZeiAeCBYxgbaSebUt0jXx0gW6eHtPS6Hzas4tO2ZkDrRI3I0XIUO/ZX91+RJKQCk04VqrjOrH2Uiw1I5xOC91E0RiTER7QjqECh1R56fzyKTozSh8FkTQlNJqrvydSHCo1CX3TGWI9VMveTPzP6yQ6uPJSJuJEU0EWi4KEIx2hWQyozyQlmk8MwUQycysiQywx0SasggnBXX55lTQrZdcpu/fVYq2axZGHEziFErhwCTW4gzo0gMAYnuEV3qzUerHerY9Fa87KZo7hD6zPH80ykcY=</latexit>

x = x

(k) + c

<latexit sha1_base64="UgDq8r8noA6sd5XuIWYWVuWNCag=">AAAB+HicbVBNS8NAEJ3Ur1o/GvXoZbEIFaEkUtCLUPDisYL9gDaWzXbTLt1swu5GWkN/iRcPinj1p3jz37htc9DWBwOP92aYmefHnCntON9Wbm19Y3Mrv13Y2d3bL9oHh00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aObmd96pFKxSNzrSUy9EA8ECxjB2kg9uzhG12j8kJZHZ1N0jkjPLjkVZw60StyMlCBDvWd/dfsRSUIqNOFYqY7rxNpLsdSMcDotdBNFY0xGeEA7hgocUuWl88On6NQofRRE0pTQaK7+nkhxqNQk9E1niPVQLXsz8T+vk+jgykuZiBNNBVksChKOdIRmKaA+k5RoPjEEE8nMrYgMscREm6wKJgR3+eVV0ryouE7FvauWatUsjjwcwwmUwYVLqMEt1KEBBBJ4hld4s56sF+vd+li05qxs5gj+wPr8AU20kYA=</latexit><latexit sha1_base64="UgDq8r8noA6sd5XuIWYWVuWNCag=">AAAB+HicbVBNS8NAEJ3Ur1o/GvXoZbEIFaEkUtCLUPDisYL9gDaWzXbTLt1swu5GWkN/iRcPinj1p3jz37htc9DWBwOP92aYmefHnCntON9Wbm19Y3Mrv13Y2d3bL9oHh00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aObmd96pFKxSNzrSUy9EA8ECxjB2kg9uzhG12j8kJZHZ1N0jkjPLjkVZw60StyMlCBDvWd/dfsRSUIqNOFYqY7rxNpLsdSMcDotdBNFY0xGeEA7hgocUuWl88On6NQofRRE0pTQaK7+nkhxqNQk9E1niPVQLXsz8T+vk+jgykuZiBNNBVksChKOdIRmKaA+k5RoPjEEE8nMrYgMscREm6wKJgR3+eVV0ryouE7FvauWatUsjjwcwwmUwYVLqMEt1KEBBBJ4hld4s56sF+vd+li05qxs5gj+wPr8AU20kYA=</latexit><latexit sha1_base64="UgDq8r8noA6sd5XuIWYWVuWNCag=">AAAB+HicbVBNS8NAEJ3Ur1o/GvXoZbEIFaEkUtCLUPDisYL9gDaWzXbTLt1swu5GWkN/iRcPinj1p3jz37htc9DWBwOP92aYmefHnCntON9Wbm19Y3Mrv13Y2d3bL9oHh00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aObmd96pFKxSNzrSUy9EA8ECxjB2kg9uzhG12j8kJZHZ1N0jkjPLjkVZw60StyMlCBDvWd/dfsRSUIqNOFYqY7rxNpLsdSMcDotdBNFY0xGeEA7hgocUuWl88On6NQofRRE0pTQaK7+nkhxqNQk9E1niPVQLXsz8T+vk+jgykuZiBNNBVksChKOdIRmKaA+k5RoPjEEE8nMrYgMscREm6wKJgR3+eVV0ryouE7FvauWatUsjjwcwwmUwYVLqMEt1KEBBBJ4hld4s56sF+vd+li05qxs5gj+wPr8AU20kYA=</latexit><latexit sha1_base64="UgDq8r8noA6sd5XuIWYWVuWNCag=">AAAB+HicbVBNS8NAEJ3Ur1o/GvXoZbEIFaEkUtCLUPDisYL9gDaWzXbTLt1swu5GWkN/iRcPinj1p3jz37htc9DWBwOP92aYmefHnCntON9Wbm19Y3Mrv13Y2d3bL9oHh00VJZLQBol4JNs+VpQzQRuaaU7bsaQ49Dlt+aObmd96pFKxSNzrSUy9EA8ECxjB2kg9uzhG12j8kJZHZ1N0jkjPLjkVZw60StyMlCBDvWd/dfsRSUIqNOFYqY7rxNpLsdSMcDotdBNFY0xGeEA7hgocUuWl88On6NQofRRE0pTQaK7+nkhxqNQk9E1niPVQLXsz8T+vk+jgykuZiBNNBVksChKOdIRmKaA+k5RoPjEEE8nMrYgMscREm6wKJgR3+eVV0ryouE7FvauWatUsjjwcwwmUwYVLqMEt1KEBBBJ4hld4s56sF+vd+li05qxs5gj+wPr8AU20kYA=</latexit>

Ac = r<latexit sha1_base64="KPkOLAyN+nBZmcPjWo2YU8ULNxU=">AAAB7XicbVBNSwMxEJ3Ur1q/qh69BIvgqexKQS9CxYvHCvYD2qVk02wbm02WJCuUpf/BiwdFvPp/vPlvTNs9aOuDgcd7M8zMCxPBjfW8b1RYW9/Y3Cpul3Z29/YPyodHLaNSTVmTKqF0JySGCS5Z03IrWCfRjMShYO1wfDvz209MG67kg50kLIjJUPKIU2Kd1Lqh+BrrfrniVb058Crxc1KBHI1++as3UDSNmbRUEGO6vpfYICPacirYtNRLDUsIHZMh6zoqScxMkM2vneIzpwxwpLQrafFc/T2RkdiYSRy6zpjYkVn2ZuJ/Xje10VWQcZmklkm6WBSlAluFZ6/jAdeMWjFxhFDN3a2Yjogm1LqASi4Ef/nlVdK6qPpe1b+vVeq1PI4inMApnIMPl1CHO2hAEyg8wjO8whtS6AW9o49FawHlM8fwB+jzB0tvjjs=</latexit><latexit sha1_base64="KPkOLAyN+nBZmcPjWo2YU8ULNxU=">AAAB7XicbVBNSwMxEJ3Ur1q/qh69BIvgqexKQS9CxYvHCvYD2qVk02wbm02WJCuUpf/BiwdFvPp/vPlvTNs9aOuDgcd7M8zMCxPBjfW8b1RYW9/Y3Cpul3Z29/YPyodHLaNSTVmTKqF0JySGCS5Z03IrWCfRjMShYO1wfDvz209MG67kg50kLIjJUPKIU2Kd1Lqh+BrrfrniVb058Crxc1KBHI1++as3UDSNmbRUEGO6vpfYICPacirYtNRLDUsIHZMh6zoqScxMkM2vneIzpwxwpLQrafFc/T2RkdiYSRy6zpjYkVn2ZuJ/Xje10VWQcZmklkm6WBSlAluFZ6/jAdeMWjFxhFDN3a2Yjogm1LqASi4Ef/nlVdK6qPpe1b+vVeq1PI4inMApnIMPl1CHO2hAEyg8wjO8whtS6AW9o49FawHlM8fwB+jzB0tvjjs=</latexit><latexit sha1_base64="KPkOLAyN+nBZmcPjWo2YU8ULNxU=">AAAB7XicbVBNSwMxEJ3Ur1q/qh69BIvgqexKQS9CxYvHCvYD2qVk02wbm02WJCuUpf/BiwdFvPp/vPlvTNs9aOuDgcd7M8zMCxPBjfW8b1RYW9/Y3Cpul3Z29/YPyodHLaNSTVmTKqF0JySGCS5Z03IrWCfRjMShYO1wfDvz209MG67kg50kLIjJUPKIU2Kd1Lqh+BrrfrniVb058Crxc1KBHI1++as3UDSNmbRUEGO6vpfYICPacirYtNRLDUsIHZMh6zoqScxMkM2vneIzpwxwpLQrafFc/T2RkdiYSRy6zpjYkVn2ZuJ/Xje10VWQcZmklkm6WBSlAluFZ6/jAdeMWjFxhFDN3a2Yjogm1LqASi4Ef/nlVdK6qPpe1b+vVeq1PI4inMApnIMPl1CHO2hAEyg8wjO8whtS6AW9o49FawHlM8fwB+jzB0tvjjs=</latexit><latexit sha1_base64="KPkOLAyN+nBZmcPjWo2YU8ULNxU=">AAAB7XicbVBNSwMxEJ3Ur1q/qh69BIvgqexKQS9CxYvHCvYD2qVk02wbm02WJCuUpf/BiwdFvPp/vPlvTNs9aOuDgcd7M8zMCxPBjfW8b1RYW9/Y3Cpul3Z29/YPyodHLaNSTVmTKqF0JySGCS5Z03IrWCfRjMShYO1wfDvz209MG67kg50kLIjJUPKIU2Kd1Lqh+BrrfrniVb058Crxc1KBHI1++as3UDSNmbRUEGO6vpfYICPacirYtNRLDUsIHZMh6zoqScxMkM2vneIzpwxwpLQrafFc/T2RkdiYSRy6zpjYkVn2ZuJ/Xje10VWQcZmklkm6WBSlAluFZ6/jAdeMWjFxhFDN3a2Yjogm1LqASi4Ef/nlVdK6qPpe1b+vVeq1PI4inMApnIMPl1CHO2hAEyg8wjO8whtS6AW9o49FawHlM8fwB+jzB0tvjjs=</latexit>

x

(k)<latexit sha1_base64="DSmavFgbyE8yLz2MwnZO7G2nB3E=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBahXkoiBT0WvHisYFuhjWWznbRLN5uwuxFL6I/w4kERr/4eb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGr+4BqFFxiy3Aj8D5RSKNAYCcYX8/8ziMqzWN5ZyYJ+hEdSh5yRo2VOk8PWXV8Pu2XK27NnYOsEi8nFcjR7Je/eoOYpRFKwwTVuuu5ifEzqgxnAqelXqoxoWxMh9i1VNIItZ/Nz52SM6sMSBgrW9KQufp7IqOR1pMosJ0RNSO97M3E/7xuasIrP+MySQ1KtlgUpoKYmMx+JwOukBkxsYQyxe2thI2ooszYhEo2BG/55VXSvqh5bs27rVca9TyOIpzAKVTBg0towA00oQUMxvAMr/DmJM6L8+58LFoLTj5zDH/gfP4A6waPPA==</latexit><latexit sha1_base64="DSmavFgbyE8yLz2MwnZO7G2nB3E=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBahXkoiBT0WvHisYFuhjWWznbRLN5uwuxFL6I/w4kERr/4eb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGr+4BqFFxiy3Aj8D5RSKNAYCcYX8/8ziMqzWN5ZyYJ+hEdSh5yRo2VOk8PWXV8Pu2XK27NnYOsEi8nFcjR7Je/eoOYpRFKwwTVuuu5ifEzqgxnAqelXqoxoWxMh9i1VNIItZ/Nz52SM6sMSBgrW9KQufp7IqOR1pMosJ0RNSO97M3E/7xuasIrP+MySQ1KtlgUpoKYmMx+JwOukBkxsYQyxe2thI2ooszYhEo2BG/55VXSvqh5bs27rVca9TyOIpzAKVTBg0towA00oQUMxvAMr/DmJM6L8+58LFoLTj5zDH/gfP4A6waPPA==</latexit><latexit sha1_base64="DSmavFgbyE8yLz2MwnZO7G2nB3E=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBahXkoiBT0WvHisYFuhjWWznbRLN5uwuxFL6I/w4kERr/4eb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGr+4BqFFxiy3Aj8D5RSKNAYCcYX8/8ziMqzWN5ZyYJ+hEdSh5yRo2VOk8PWXV8Pu2XK27NnYOsEi8nFcjR7Je/eoOYpRFKwwTVuuu5ifEzqgxnAqelXqoxoWxMh9i1VNIItZ/Nz52SM6sMSBgrW9KQufp7IqOR1pMosJ0RNSO97M3E/7xuasIrP+MySQ1KtlgUpoKYmMx+JwOukBkxsYQyxe2thI2ooszYhEo2BG/55VXSvqh5bs27rVca9TyOIpzAKVTBg0towA00oQUMxvAMr/DmJM6L8+58LFoLTj5zDH/gfP4A6waPPA==</latexit><latexit sha1_base64="DSmavFgbyE8yLz2MwnZO7G2nB3E=">AAAB7nicbVBNS8NAEJ3Ur1q/qh69LBahXkoiBT0WvHisYFuhjWWznbRLN5uwuxFL6I/w4kERr/4eb/4bt20O2vpg4PHeDDPzgkRwbVz32ymsrW9sbhW3Szu7e/sH5cOjto5TxbDFYhGr+4BqFFxiy3Aj8D5RSKNAYCcYX8/8ziMqzWN5ZyYJ+hEdSh5yRo2VOk8PWXV8Pu2XK27NnYOsEi8nFcjR7Je/eoOYpRFKwwTVuuu5ifEzqgxnAqelXqoxoWxMh9i1VNIItZ/Nz52SM6sMSBgrW9KQufp7IqOR1pMosJ0RNSO97M3E/7xuasIrP+MySQ1KtlgUpoKYmMx+JwOukBkxsYQyxe2thI2ooszYhEo2BG/55VXSvqh5bs27rVca9TyOIpzAKVTBg0towA00oQUMxvAMr/DmJM6L8+58LFoLTj5zDH/gfP4A6waPPA==</latexit>

Ax = b

<latexit sha1_base64="g/Ub0N8eIUIqalzNgwRMs/8w+Nw=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoBeh4sVjBdMW2lA22027dHcTdjdiCf0LXjwo4tU/5M1/46bNQVsfDDzem2FmXphwpo3rfjultfWNza3ydmVnd2//oHp41NZxqgj1Scxj1Q2xppxJ6htmOO0mimIRctoJJ7e533mkSrNYPphpQgOBR5JFjGCTSzdP1+GgWnPr7hxolXgFqUGB1qD61R/GJBVUGsKx1j3PTUyQYWUY4XRW6aeaJphM8Ij2LJVYUB1k81tn6MwqQxTFypY0aK7+nsiw0HoqQtspsBnrZS8X//N6qYmugozJJDVUksWiKOXIxCh/HA2ZosTwqSWYKGZvRWSMFSbGxlOxIXjLL6+S9kXdc+vefaPWbBRxlOEETuEcPLiEJtxBC3wgMIZneIU3RzgvzrvzsWgtOcXMMfyB8/kDpkON7A==</latexit><latexit sha1_base64="g/Ub0N8eIUIqalzNgwRMs/8w+Nw=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoBeh4sVjBdMW2lA22027dHcTdjdiCf0LXjwo4tU/5M1/46bNQVsfDDzem2FmXphwpo3rfjultfWNza3ydmVnd2//oHp41NZxqgj1Scxj1Q2xppxJ6htmOO0mimIRctoJJ7e533mkSrNYPphpQgOBR5JFjGCTSzdP1+GgWnPr7hxolXgFqUGB1qD61R/GJBVUGsKx1j3PTUyQYWUY4XRW6aeaJphM8Ij2LJVYUB1k81tn6MwqQxTFypY0aK7+nsiw0HoqQtspsBnrZS8X//N6qYmugozJJDVUksWiKOXIxCh/HA2ZosTwqSWYKGZvRWSMFSbGxlOxIXjLL6+S9kXdc+vefaPWbBRxlOEETuEcPLiEJtxBC3wgMIZneIU3RzgvzrvzsWgtOcXMMfyB8/kDpkON7A==</latexit><latexit sha1_base64="g/Ub0N8eIUIqalzNgwRMs/8w+Nw=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoBeh4sVjBdMW2lA22027dHcTdjdiCf0LXjwo4tU/5M1/46bNQVsfDDzem2FmXphwpo3rfjultfWNza3ydmVnd2//oHp41NZxqgj1Scxj1Q2xppxJ6htmOO0mimIRctoJJ7e533mkSrNYPphpQgOBR5JFjGCTSzdP1+GgWnPr7hxolXgFqUGB1qD61R/GJBVUGsKx1j3PTUyQYWUY4XRW6aeaJphM8Ij2LJVYUB1k81tn6MwqQxTFypY0aK7+nsiw0HoqQtspsBnrZS8X//N6qYmugozJJDVUksWiKOXIxCh/HA2ZosTwqSWYKGZvRWSMFSbGxlOxIXjLL6+S9kXdc+vefaPWbBRxlOEETuEcPLiEJtxBC3wgMIZneIU3RzgvzrvzsWgtOcXMMfyB8/kDpkON7A==</latexit><latexit sha1_base64="g/Ub0N8eIUIqalzNgwRMs/8w+Nw=">AAAB63icbVBNS8NAEJ3Ur1q/qh69LBbBU0mkoBeh4sVjBdMW2lA22027dHcTdjdiCf0LXjwo4tU/5M1/46bNQVsfDDzem2FmXphwpo3rfjultfWNza3ydmVnd2//oHp41NZxqgj1Scxj1Q2xppxJ6htmOO0mimIRctoJJ7e533mkSrNYPphpQgOBR5JFjGCTSzdP1+GgWnPr7hxolXgFqUGB1qD61R/GJBVUGsKx1j3PTUyQYWUY4XRW6aeaJphM8Ij2LJVYUB1k81tn6MwqQxTFypY0aK7+nsiw0HoqQtspsBnrZS8X//N6qYmugozJJDVUksWiKOXIxCh/HA2ZosTwqSWYKGZvRWSMFSbGxlOxIXjLL6+S9kXdc+vefaPWbBRxlOEETuEcPLiEJtxBC3wgMIZneIU3RzgvzrvzsWgtOcXMMfyB8/kDpkON7A==</latexit>

c<latexit sha1_base64="ykyXXryT0qS3g8DIJalovrnOKSA=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkUI8FLx5bsB/QhrLZTtq1m03Y3Qgl9Bd48aCIV3+SN/+N2zYHbX0w8Hhvhpl5QSK4Nq777RS2tnd294r7pYPDo+OT8ulZR8epYthmsYhVL6AaBZfYNtwI7CUKaRQI7AbTu4XffUKleSwfzCxBP6JjyUPOqLFSiw3LFbfqLkE2iZeTCuRoDstfg1HM0gilYYJq3ffcxPgZVYYzgfPSINWYUDalY+xbKmmE2s+Wh87JlVVGJIyVLWnIUv09kdFI61kU2M6Imole9xbif14/NeGtn3GZpAYlWy0KU0FMTBZfkxFXyIyYWUKZ4vZWwiZUUWZsNiUbgrf+8ibp3FQ9t+q1apVGLY+jCBdwCdfgQR0acA9NaAMDhGd4hTfn0Xlx3p2PVWvByWfO4Q+czx/CL4zZ</latexit><latexit sha1_base64="ykyXXryT0qS3g8DIJalovrnOKSA=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkUI8FLx5bsB/QhrLZTtq1m03Y3Qgl9Bd48aCIV3+SN/+N2zYHbX0w8Hhvhpl5QSK4Nq777RS2tnd294r7pYPDo+OT8ulZR8epYthmsYhVL6AaBZfYNtwI7CUKaRQI7AbTu4XffUKleSwfzCxBP6JjyUPOqLFSiw3LFbfqLkE2iZeTCuRoDstfg1HM0gilYYJq3ffcxPgZVYYzgfPSINWYUDalY+xbKmmE2s+Wh87JlVVGJIyVLWnIUv09kdFI61kU2M6Imole9xbif14/NeGtn3GZpAYlWy0KU0FMTBZfkxFXyIyYWUKZ4vZWwiZUUWZsNiUbgrf+8ibp3FQ9t+q1apVGLY+jCBdwCdfgQR0acA9NaAMDhGd4hTfn0Xlx3p2PVWvByWfO4Q+czx/CL4zZ</latexit><latexit sha1_base64="ykyXXryT0qS3g8DIJalovrnOKSA=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkUI8FLx5bsB/QhrLZTtq1m03Y3Qgl9Bd48aCIV3+SN/+N2zYHbX0w8Hhvhpl5QSK4Nq777RS2tnd294r7pYPDo+OT8ulZR8epYthmsYhVL6AaBZfYNtwI7CUKaRQI7AbTu4XffUKleSwfzCxBP6JjyUPOqLFSiw3LFbfqLkE2iZeTCuRoDstfg1HM0gilYYJq3ffcxPgZVYYzgfPSINWYUDalY+xbKmmE2s+Wh87JlVVGJIyVLWnIUv09kdFI61kU2M6Imole9xbif14/NeGtn3GZpAYlWy0KU0FMTBZfkxFXyIyYWUKZ4vZWwiZUUWZsNiUbgrf+8ibp3FQ9t+q1apVGLY+jCBdwCdfgQR0acA9NaAMDhGd4hTfn0Xlx3p2PVWvByWfO4Q+czx/CL4zZ</latexit><latexit sha1_base64="ykyXXryT0qS3g8DIJalovrnOKSA=">AAAB6HicbVBNS8NAEJ3Ur1q/qh69LBbBU0mkUI8FLx5bsB/QhrLZTtq1m03Y3Qgl9Bd48aCIV3+SN/+N2zYHbX0w8Hhvhpl5QSK4Nq777RS2tnd294r7pYPDo+OT8ulZR8epYthmsYhVL6AaBZfYNtwI7CUKaRQI7AbTu4XffUKleSwfzCxBP6JjyUPOqLFSiw3LFbfqLkE2iZeTCuRoDstfg1HM0gilYYJq3ffcxPgZVYYzgfPSINWYUDalY+xbKmmE2s+Wh87JlVVGJIyVLWnIUv09kdFI61kU2M6Imole9xbif14/NeGtn3GZpAYlWy0KU0FMTBZfkxFXyIyYWUKZ4vZWwiZUUWZsNiUbgrf+8ibp3FQ9t+q1apVGLY+jCBdwCdfgQR0acA9NaAMDhGd4hTfn0Xlx3p2PVWvByWfO4Q+czx/CL4zZ</latexit>

Choose initial guess x high precisiondo {

Compute r = b – Ax high precisionSolve A * c = r low precisionUpdate x = x + c high precision

} while ( ||r||>tol ) high precision

N.Higham:Accuracyandstabilityofnumericalalgorithms.SIAM,2002.

MixedPrecisionIterativeRefinement

https://github.com/SrikaraPranesh/Multi_precision_NLA_kernels

SriPranesh’s mixedprecisionMatlab suite:

Page 34: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

34 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

MixedPrecisionIterativeRefinementusingsparseiterativesolvers

DoublePrecision

Accuracyimprovement~1013

LinearSystemAx=bwithcond(A)≈104

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/mixed-precision-ir/mixed-precision-ir.cpp

SinglePrecision

Accuracyimprovement~104

relativeresidualaccuracy= (unitroundoff )*(linearsystem’sconditionnumber)

Page 35: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

35 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

MixedPrecisionIterativeRefinementusingsparseiterativesolvers

DoublePrecisionIterativeRefinement

Accuracyimprovement~1014

LinearSystemAx=bwithcond(A)≈104

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/mixed-precision-ir/mixed-precision-ir.cpp

MixedPrecisionIterativeRefinement

Accuracyimprovement~1014

16%runtime improvement

Page 36: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

36 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

MixedPrecisionIterativeRefinementusingsparseiterativesolvers

DoublePrecisionIterativeRefinement

Accuracyimprovement~1014

LinearSystemAx=bwithcond(A)≈104

ExperimentsbasedontheGinkgolibraryhttps://ginkgo-project.github.io/ginkgo/examples/mixed-precision-ir/mixed-precision-ir.cpp

MixedPrecisionIterativeRefinement

Accuracyimprovement~1014

16%runtime improvement

Somereferences:

Strzodka etal.PipelinedMixedPrecisionAlgorithmsonFPGAsforFastandAccuratePDESolversfromLowPrecisionComponents,IEEESymposiumonField-ProgrammableCustomComputingMachines,2006.

Goedekke etal.Performanceandaccuracyofhardware-orientednative-,emulated- andmixed-precisionsolversinFEMsimulations,InternationalJournalofParallel,EmergentandDistributedSystems,2007.

Burattietal.UsingMixedPrecisionforSparseMatrixComputationstoEnhancethePerformancewhileAchieving64-bitAccuracy,ACMTOMS2008.

Baboulin etal.Acceleratingscientificcomputationswithmixedprecisionalgorithms,CPC,2009.

Anzt etal.Mixedprecisioniterativerefinementmethodsforlinearsystems:ConvergenceanalysisbasedonKrylov subspacemethods,PARA2010.

• Forsparseiterativemethods,thebenefitsrelatetothebandwidthsavings;

Page 37: The MultiprecisionEffort in the US Exascale Computing Project · 2020. 5. 8. · 2 H. Anzt: The Multiprecision Effort in the US ExascaleComputing Project 05/08/2020 What is the Multiprecision

37 05/08/2020H.Anzt:TheMultiprecision EffortintheUSExascale ComputingProject

DataAccessor• ValueClustering

DataCompression

IEEE754DP

Memory

ProcessingUnits

MemoryOperations

ArithmeticOperations

• IEEE• CustomFormats• Lossy/Lossless• Unum, Posits…

• Decouplearithmeticprecisionfrommemoryprecision/communication.

• Usingcustomizedprecisionsformemoryoperations.

• Useproblem-adapted lowerprecisioninpreconditioning• Adaptiveprecisionblock-Jacobi /SAI• ILUpreconditioning• MixedPrecisionIterativeRefinement• Multigrid (UlrikeMeierYang@LLNL …)• Polynomialpreconditioning (JenniferLoe andErikBoman @SNL)

• MixedprecisionKrylov solvers(ErinCarson,SteveThomas,BarrySmith…)

• MixedPrecisionSparseLU(SherryLi...)

Futureplansforsparsemethodsusingmixedprecision

This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of EnergyOffice of Science and the National Nuclear Security Administration and the Helmholtz Impuls und VernetzungsfondVH-NG-1241.