Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load

Post on 24-Feb-2016

Javier Cuenca, Domingo Giménez, José González

Jack Dongarra, Kenneth Roche

Optimisation of Linear Algebra Routines

• Traditional method: Hand-Optimisation for each platform

› Time-consuming
› Incompatible with hardware evolution
› Incompatible with changes in the system (architecture and basic libraries)
› Unsuitable for systems with variable load
› Misuse by non-expert users

Our Approach

Modelling the Linear Algebra Routine (LAR):

Texec = f(SP, AP, n)

› SP: System Parameters
› AP: Algorithmic Parameters
› n: Problem size

› DESIGN: Modelling the LAR
› INSTALLATION: Estimation of SP
› RUN-TIME: Selection of AP values, then Execution of LAR

Our Approach

LARs
› Jacobi methods for the symmetric eigenvalue problem
› Gauss elimination
› LU factorisation
› QR factorisation

Platforms
› Cluster of Workstations
› Cluster of PCs
› SGI Origin 2000
› IBM SP2

Static Model of LAR: Situation of platform at installation time

Dynamic Model of LAR: Situation of platform at run-time.

DESIGN PROCESS

LAR: Linear Algebra Routine
› Made by the LAR Designer
› Example of LAR: Parallel Block LU factorisation

Modelling the LAR

[Diagram, DESIGN phase: LAR → Modelling the LAR → MODEL]

MODEL: Texec = f(SP, AP, n)

› SP: System Parameters
› AP: Algorithmic Parameters
› n: Problem size

Made by the LAR Designer, only once per LAR

MODEL for the LAR Parallel Block LU factorisation:

› SP: k3, k2, ts, tw
› AP: p, b
› n: Problem size
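The slides name the parameters of the LU model, but the transcript does not preserve the formula itself. As a rough sketch only, the Python function below shows what a model of the form Texec = f(SP, AP, n) could look like for these parameters. Every cost term is an illustrative assumption, not the authors' model: k3 and k2 are read here as per-operation costs of block (level-3) and panel (level-2) arithmetic, ts as a message start-up time, and tw as a per-word sending time.

```python
import math

def lu_time_model(n, p, b, k3, k2, ts, tw):
    """Illustrative T_exec = f(SP, AP, n) for a parallel block LU.

    SP: k3, k2, ts, tw.  AP: p (number of processes), b (block size).
    The three terms below are assumptions made for this sketch.
    """
    steps = math.ceil(n / b)                    # number of block steps
    t_level3 = (2.0 / 3.0) * n**3 * k3 / p      # block updates, shared by p processes
    t_level2 = b * n**2 * k2 / p                # panel work, grows with the block size
    # One tree-shaped broadcast of roughly b*n words per block step (assumed).
    t_comm = steps * math.log2(p) * (ts + b * n * tw)
    return t_level3 + t_level2 + t_comm

# Example call with made-up SP values:
print(lu_time_model(n=1024, p=4, b=32, k3=1e-9, k2=5e-9, ts=1e-4, tw=1e-7))
```

The point of such a function is only that it is cheap to evaluate, so it can be called for many candidate AP values before the routine is executed.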

Implementation of SP-Estimators

[Diagram, DESIGN phase: LAR → Modelling the LAR → MODEL → Implementation of SP-Estimators → SP-Estimators]

Estimators of Arithmetic-SP
› Computation kernel of the LAR
› Similar storage scheme
› Similar quantity of data

Estimators of Communication-SP
› Communication kernel of the LAR
› Similar kind of communication
› Similar quantity of data
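To make the idea of an arithmetic SP-estimator concrete, here is a minimal sketch in Python. It times a small block multiplication and divides by the operation count to obtain a per-operation cost. The function name `estimate_k3` and the naive triple loop are assumptions for illustration; an estimator in the spirit of the slide would instead call the actual computation kernel of the LAR (for example the BLAS routine it uses) with a similar storage scheme and data size.

```python
import time

def estimate_k3(b, repeats=3):
    """Estimate the time per floating-point operation of a b x b block multiply."""
    a = [[1.0] * b for _ in range(b)]
    c = [[2.0] * b for _ in range(b)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Naive product standing in for the routine's real computation kernel.
        _ = [[sum(a[i][k] * c[k][j] for k in range(b)) for j in range(b)]
             for i in range(b)]
        best = min(best, time.perf_counter() - start)
    flops = 2.0 * b**3      # b^3 multiplications plus b^3 additions
    return best / flops

# One estimate per candidate block size, as an installation run might record:
for block in (16, 32):
    print(block, estimate_k3(block))
```

Taking the best of several repeats is a common way to reduce timing noise; it is a design choice of this sketch, not something the slides prescribe.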

INSTALLATION PROCESS

[Diagram: the DESIGN phase (LAR → Modelling the LAR → MODEL → Implementation of SP-Estimators → SP-Estimators) is followed by the INSTALLATION phase]

› Only once per platform
› Done by the System Manager

Estimation of Static-SP

[Diagram, INSTALLATION phase: SP-Estimators + Basic Libraries + Installation-File → Estimation of Static-SP → Static-SP-File]

Basic Libraries
› Basic Communication Library: MPI, PVM
› Basic Linear Algebra Library: reference-BLAS, machine-specific-BLAS, ATLAS

Installation File
› SP values are obtained using the information (n and AP values) of this file.

Platform: Cluster of Pentium III + Fast Ethernet
Basic Libraries: ATLAS and MPI

Estimation of the Static-SP tw-static (in sec)

Message size (Kbytes)   32      256     1024    2048
tw-static               0.700   0.690   0.680   0.675

Estimation of the Static-SP k3-static (in sec)

Block size   16       32       64       128
k3-static    0.0038   0.0033   0.0030   0.0027
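The values listed above are what a Static-SP-File would hold for this platform. A minimal sketch of how such a file might be queried at run time follows. The dictionary layout and the nearest-key rule in `lookup_sp` are assumptions for illustration; the slides do not say how values between the measured points are obtained.

```python
from bisect import bisect_left

# Values copied from the estimates above (Pentium III cluster, ATLAS and MPI).
K3_STATIC = {16: 0.0038, 32: 0.0033, 64: 0.0030, 128: 0.0027}    # keyed by block size
TW_STATIC = {32: 0.700, 256: 0.690, 1024: 0.680, 2048: 0.675}    # keyed by message size (Kbytes)

def lookup_sp(table, key):
    """Return the stored SP value whose key is closest to `key` (assumed rule)."""
    keys = sorted(table)
    i = bisect_left(keys, key)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = keys[max(i - 1, 0):i + 1]
    nearest = min(candidates, key=lambda k: abs(k - key))
    return table[nearest]

print(lookup_sp(K3_STATIC, 64))     # exact hit on a measured block size
print(lookup_sp(TW_STATIC, 500))    # falls back to the nearest measured message size
```

Interpolating between neighbouring entries would be an equally reasonable choice; nearest-key lookup is used here only because it is the simplest.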

RUN-TIME PROCESS

[Diagram: DESIGN and INSTALLATION phases (ending in the Static-SP-File), followed by the RUN-TIME phase]

RUN-TIME PROCESS: Static approach

[Diagram, RUN-TIME phase: Static-SP-File → Selection of Optimum AP → Optimum-AP → Execution of LAR]

RUN-TIME PROCESS: Dynamic Approach

Call to NWS

[Diagram, RUN-TIME phase: Call to NWS → NWS Information]

The NWS (Network Weather Service) is called and it reports:

› the fraction of available CPU (fCPU)
› the current word-sending time (tw-current) for specific n and AP values (n0, AP0)

Then the fraction of available network is calculated from tw-current.
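The transcript does not preserve the formula for the fraction of available network. One natural reading, assumed here and not confirmed by the slides, is the ratio of the word-sending time measured at installation to the one currently reported, both taken for the same (n0, AP0). The function name and the clamp to 1.0 are likewise assumptions of this sketch.

```python
def network_fraction(tw_static, tw_current):
    """Fraction of the network currently available (assumed: static / current)."""
    if tw_static <= 0 or tw_current <= 0:
        raise ValueError("word-sending times must be positive")
    # Clamp so that a faster-than-installation reading does not exceed 100%.
    return min(1.0, tw_static / tw_current)

# With tw-static = 0.7 and a reported tw-current of 1.8:
print(network_fraction(0.7, 1.8))
```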

Dynamic Adjustment of SP

[Diagram, RUN-TIME phase: Static-SP-File + NWS Information → Dynamic Adjustment of SP → Current-SP]

The values of the SP are adjusted according to the current situation: the static values from the Static-SP-File are combined with the NWS information to give the Current-SP.
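The adjustment rule itself is not preserved in the transcript. The sketch below shows one plausible form, stated purely as an assumption: arithmetic costs are divided by the fraction of available CPU and the word-sending time by the fraction of available network, so that a half-loaded resource doubles the corresponding cost. The function name `adjust_sp`, the dictionary keys, and the decision to leave ts unchanged are all hypothetical.

```python
def adjust_sp(static_sp, f_cpu, f_net):
    """Scale static SP values to the current load (assumed scaling rule)."""
    if not (0 < f_cpu <= 1 and 0 < f_net <= 1):
        raise ValueError("fractions must lie in (0, 1]")
    return {
        "k3": static_sp["k3"] / f_cpu,   # arithmetic slows as the CPU share drops
        "k2": static_sp["k2"] / f_cpu,
        "ts": static_sp["ts"],           # start-up time left unchanged (assumption)
        "tw": static_sp["tw"] / f_net,   # sending slows as the network share drops
    }

# k3 and tw are taken from the installation estimates; k2 and ts are made-up values.
static = {"k3": 0.0030, "k2": 0.01, "ts": 1e-4, "tw": 0.7}
print(adjust_sp(static, f_cpu=0.6, f_net=0.7 / 1.8))
```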

Selection of Optimum AP

[Diagram, RUN-TIME phase: Current-SP → Selection of Optimum AP → Optimum-AP]
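Selecting the optimum AP amounts to evaluating the model for each candidate AP with the current SP values and keeping the cheapest. A minimal sketch in Python follows, assuming an exhaustive search over a small candidate set. The stand-in `model` function, its cost terms, and the candidate lists are hypothetical; the slides also express the number of nodes as a grid p = r × c, which this sketch simplifies to a single p.

```python
import math
from itertools import product

def model(n, p, b, sp):
    """Stand-in cost model; the terms are illustrative assumptions."""
    compute = (2.0 / 3.0) * n**3 * sp["k3"] / p
    comm = math.ceil(n / b) * math.log2(p) * (sp["ts"] + b * n * sp["tw"])
    return compute + comm

def select_optimum_ap(n, sp, procs=(1, 2, 4, 8), blocks=(16, 32, 64, 128)):
    """Return the (p, b) pair with the smallest predicted execution time."""
    return min(product(procs, blocks), key=lambda ap: model(n, ap[0], ap[1], sp))

# Made-up SP values, for illustration only:
current_sp = {"k3": 1e-9, "ts": 1e-4, "tw": 1e-8}
print(select_optimum_ap(2048, current_sp))
```

Exhaustive search is viable here because the candidate set is tiny and each model evaluation is a handful of arithmetic operations, so the selection cost is negligible next to the routine itself.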

Execution of LAR

[Diagram, RUN-TIME phase: Call to NWS → NWS Information → Dynamic Adjustment of SP → Current-SP → Selection of Optimum AP → Optimum-AP → Execution of LAR]

Platform load: different situations studied

Situation                 node1   node2   node3   node4   node5   node6   node7   node8
A   CPU avail.            100%    100%    100%    100%    100%    100%    100%    100%
    tw-current            0.7 sec (all nodes)
B   CPU avail.            80%     80%     80%     80%     100%    100%    100%    100%
    tw-current            0.8 sec (node1 to node4), 0.7 sec (node5 to node8)
C   CPU avail.            60%     60%     60%     60%     100%    100%    100%    100%
    tw-current            1.8 sec (node1 to node4), 0.7 sec (node5 to node8)
D   CPU avail.            60%     60%     60%     60%     100%    100%    80%     80%
    tw-current            1.8 sec (node1 to node4), 0.7 sec (node5 and node6), 0.8 sec (node7 and node8)
E   CPU avail.            60%     60%     60%     60%     100%    100%    50%     50%
    tw-current            1.8 sec (node1 to node4), 0.7 sec (node5 and node6), 4.0 sec (node7 and node8)

Optimum AP for the different situations studied

Block size

         Situations of the platform load
n        A      B      C      D      E
1024     32     32     64     64     64
2048     64     64     64     128    128
3072     64     64     128    128    128

Number of nodes to use, p = r × c

         Situations of the platform load
n        A      B      C      D      E
1024     4×2    4×2    2×2    2×2    2×1
2048     4×2    4×2    2×2    2×2    2×1
3072     4×2    4×2    2×2    2×2    2×1

Experimental Time: deviations from the Optimum

[Figure: three charts, one each for n = 1024, n = 2048 and n = 3072. x-axis: situations of the platform load (A to E); y-axis: deviation from the optimum (0% to 100% for n = 1024 and n = 3072, 0% to 160% for n = 2048); series: Static Model and Dynamic Model]
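The deviation reported for each situation is presumably the relative excess of the measured time over the best time found experimentally. A small sketch of that calculation follows; the exact definition is an assumption, and the function name is hypothetical.

```python
def deviation_from_optimum(t_measured, t_optimum):
    """Percentage by which a measured time exceeds the best observed time."""
    if t_optimum <= 0:
        raise ValueError("t_optimum must be positive")
    return 100.0 * (t_measured - t_optimum) / t_optimum

# A run taking 15 s when the best AP values give 10 s deviates by 50%.
print(deviation_from_optimum(15.0, 10.0))
```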

Conclusions and Future Work

• The use of the proposed methodology is viable in systems where the load is stable or variable.

• Software like NWS is suitable for the adjustment of the system parameters’ values obtained at installation time.

• The heterogeneous load case offers many more possibilities than the one studied.