From Damaris to CALCioM - Inria · ICS 2011 M. Dorier, G. Antoniu, F. Cappello, M. Snir, L. Orf....

Post on 11-Aug-2020


From Damaris to CALCioM
Mitigating I/O Interference in HPC Systems

Matthieu Dorier – ENS Rennes, IRISA, Inria Rennes, KerData project team
Joint work with Rob Ross, Dries Kimpe, Gabriel Antoniu, Shadi Ibrahim

Within the Data@Exascale associated team

10th workshop of the Joint Lab for Petascale Computing, Urbana-Champaign, November 2013

Outline

•  Damaris: After 3 years of collaboration…
•  CALCioM: Towards cross-application coordination

Damaris after 3 years…

Overview

Originating from the “Shared Buffering System” designed in 2010 during an internship at NCSA, Damaris proposes to dedicate cores in multicore SMP nodes to data management, i.e. storage, in situ analysis and visualization.

Implementation

•  Version 0.7.3 available
•  http://damaris.gforge.inria.fr
•  Version 1.0 for summer 2014
•  15095 lines of code
•  API for C, C++ and Fortran simulations
•  Easy configuration with XML
•  In situ visualization with VisIt
•  Python and C++ plugins
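The idea of dedicating a core to data management can be illustrated with a minimal Python sketch. This is not the Damaris API: the names `run_simulation`, `compute_worker` and `dedicated_worker` are hypothetical, threads stand in for cores, an in-memory queue stands in for a shared buffer, and a list stands in for storage.

```python
import queue
import threading


def run_simulation(num_compute_workers=3, iterations=4):
    """Compute workers hand their output to one dedicated data-management worker."""
    outbox = queue.Queue()   # hypothetical stand-in for a shared buffer
    stored = []              # hypothetical stand-in for the storage backend
    stop = object()          # sentinel telling the dedicated worker to exit

    def dedicated_worker():
        # The only thread that touches "storage"; compute workers just enqueue.
        while True:
            item = outbox.get()
            if item is stop:
                break
            stored.append(item)

    def compute_worker(rank):
        for step in range(iterations):
            result = rank * 1000 + step  # placeholder for real computation
            outbox.put((rank, step, result))

    io_thread = threading.Thread(target=dedicated_worker)
    io_thread.start()
    workers = [threading.Thread(target=compute_worker, args=(rank,))
               for rank in range(num_compute_workers)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    outbox.put(stop)         # all compute output is queued; let the worker drain it
    io_thread.join()
    return stored
```

Calling `run_simulation(3, 4)` returns the twelve `(rank, step, result)` records collected by the dedicated worker; their order depends on thread scheduling.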

People involved

Matthieu Dorier, Gabriel Antoniu, Lokman Rahmani, Roberto Sisneros, Dave Semeraro, Bob Wilhelmson, Rob Ross, Tom Peterka, Dries Kimpe, Marc Snir, Franck Cappello, Leigh Orf

Evaluated on

Blue Waters, Intrepid, Kraken, Jaguar, Grid’5000, BluePrint, Surveyor

Evaluated with CM1, Nek5000, OLAM

Publications

•  M. Dorier, advised by G. Antoniu. Damaris - Using Dedicated I/O Cores for Scalable Post-petascale HPC Simulations. ICS 2011.
•  M. Dorier, G. Antoniu, F. Cappello, M. Snir, L. Orf. Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O. In Proc. of IEEE CLUSTER 2012.
•  M. Dorier, advised by G. Antoniu. Efficient I/O using Dedicated Cores in Large-Scale HPC Simulations. PhD forum of IPDPS 2013.
•  M. Dorier, R. Sisneros, T. Peterka, G. Antoniu, D. Semeraro. Damaris/Viz, a Nonintrusive, Adaptable and User-Friendly In Situ Visualization Framework. In Proc. of IEEE LDAV 2013.

Mitigating I/O Interference in HPC Systems

Introduction to cross-application interference

Interference: Performance degradation observed by an application in contention with other applications for the access to a shared resource.

•  How often does I/O interference occur?
•  What is the effect of I/O interference?
•  How do we quantify and visualize it?
•  How to mitigate it?

How often does I/O interference occur?

“Intrepid has a really weird workload compared to most other systems, because of the large number of large jobs.”

Narayan Desai (ANL)

I am an application, I start writing: what is the probability that at least one other application is also accessing the file system?

$$ P(\text{another is doing I/O}) = 1 - \sum_{n=0}^{+\infty} P(X = n)\,(1 - E(\mu))^n $$

where X is the number of running applications (random variable) and μ is the ratio of I/O time to computation time of the applications (random variable), assuming independence between X and μ.

On Intrepid: assuming E(μ) = 5%, P(another is doing I/O) = 64%
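The probability formula can be evaluated numerically once a distribution for X is available. The sketch below is only an illustration: the function name `prob_other_io` and the example distribution are hypothetical, and the actual Intrepid job distribution behind the 64% figure is not given in the slides.

```python
def prob_other_io(pmf_x, mean_mu):
    """Probability that at least one other application is doing I/O.

    pmf_x   : dict mapping n to P(X = n), the distribution of X
    mean_mu : E(mu), the expected I/O-to-computation time ratio
    """
    if abs(sum(pmf_x.values()) - 1.0) > 1e-9:
        raise ValueError("pmf_x must sum to 1")
    # Probability that none of the n applications is in an I/O phase,
    # averaged over the distribution of n.
    none_busy = sum(p * (1.0 - mean_mu) ** n for n, p in pmf_x.items())
    return 1.0 - none_busy


# Hypothetical workload (not Intrepid data): 0, 10 or 30 applications running.
example_pmf = {0: 0.1, 10: 0.5, 30: 0.4}
print(prob_other_io(example_pmf, 0.05))
```

Because the sum is truncated to the values present in `pmf_x`, the function expects a distribution with finite support.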

What is the effect of I/O interference?

IOR running on 336 cores, writing every 10 seconds in a 35-server PVFS file system on Grid’5000

A second instance is started on 336 other cores, writing the same amount of data every 7 seconds

I/O interference has a large impact on caching mechanisms

How do we quantify and visualize I/O interference?

Interference factor

•  The user is interested in the factor by which interference increases the I/O time:

$$ I_X = \frac{T_X}{T_X(\text{alone})} > 1 $$

•  Considering n applications, we could (for example) want to minimize the sum of access times:

$$ f = \sum_{X \in \text{app}} T_X $$

•  These metrics can be adapted to anything (energy consumption, CPU cycles, etc.): f can be generalized as a metric for machine-wide efficiency.
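As a small worked illustration of these two metrics, the sketch below computes the interference factor of one application and the sum of access times over a set of applications. The function names and the timings in the usage note are hypothetical, not measurements from the talk.

```python
def interference_factor(t_observed, t_alone):
    """I_X = T_X / T_X(alone): how much interference stretches the I/O time."""
    if t_alone <= 0:
        raise ValueError("t_alone must be positive")
    return t_observed / t_alone


def sum_of_access_times(observed_times):
    """f = sum of T_X over all applications X (one possible machine-wide metric)."""
    return sum(observed_times.values())
```

For example, an application that needs 4 s alone and 12 s under contention has `interference_factor(12.0, 4.0) == 3.0`, and `sum_of_access_times({"A": 10.0, "B": 12.0})` gives `22.0`.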

Delta-graph

[Diagram: App A’s access and App B’s access, with App B starting dt after App A.]

Results on Surveyor (2x2048 cores), each core writes 8 MB contiguously. The graph represents the point of view of one of the 2 applications.

[Graph annotations: I/O time when the application is alone; performance degradation due to interference.]
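To get a feel for the shape of a delta-graph, one can tabulate the access time of an application against dt under a toy model. The sketch below assumes that two overlapping accesses each proceed at exactly half speed (fair bandwidth sharing). This assumption is made only for illustration and is not taken from the slides; real file systems can behave very differently. The names `shared_times` and `delta_graph` are hypothetical.

```python
def shared_times(t_a, t_b, dt):
    """Observed access times (T_A, T_B) when B starts dt after A.

    t_a, t_b : access times of A and B when running alone
    dt       : start of B minus start of A (negative if B starts first)
    Toy model: while both are active, each progresses at half speed.
    """
    if dt < 0:
        # B starts first: solve the mirrored problem and swap the results back.
        t_b_obs, t_a_obs = shared_times(t_b, t_a, -dt)
        return t_a_obs, t_b_obs
    if dt >= t_a:
        return t_a, t_b                    # no overlap at all
    remaining_a = t_a - dt                 # work A still has when B arrives
    if remaining_a <= t_b:
        # A finishes first, then B runs alone for the rest of its work.
        return 2 * t_a - dt, remaining_a + t_b
    # B finishes first, then A runs alone for the rest of its work.
    return t_a + t_b, 2 * t_b


def delta_graph(t_a, t_b, dts):
    """Points (dt, T_A / t_a) as seen from application A."""
    return [(dt, shared_times(t_a, t_b, dt)[0] / t_a) for dt in dts]
```

For instance, `delta_graph(10, 4, range(-4, 11))` returns a factor of 1.0 at both ends of the range, where the accesses do not overlap, and larger factors in between.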

Bad luck for small applications

Experiment on Grid’5000, App B on 24 cores, App A on 744, writing 8 MB per process

The smallest app observes up to a 14x decrease in performance! The biggest one does not even see it!

How to mitigate I/O interference?
The CALCioM approach

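The CALCioM architecture

[Diagram: App A and App B each sit on top of an I/O library and MPI I/O, and both issue Read/Write requests to the shared file system. CALCioM links the two stacks and carries the coordination between them.]

CALCioM: Cross-Application Layer for Coordinated I/O Management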

CALCioM’s API

CALCioM_Init(MPI_Comm c)
CALCioM_Prepare(MPI_Comm c, MPI_Info i)
CALCioM_Ask()
CALCioM_Check(int* status)
CALCioM_Wait()
CALCioM_Release()
CALCioM_Complete()
CALCioM_Finalize()
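The slides list only the signatures, so the following Python mock is a guess at how three of these calls might interact under a first-come-first-served policy. The semantics attached to each method are assumptions: `ask` requests access, `check` tests without blocking whether access is granted, and `complete` ends the I/O phase. The classes `MockCoordinator` and `MockCalciom` are hypothetical, and `Prepare`, `Wait`, `Release` and `Finalize` are not modeled.

```python
from collections import deque


class MockCoordinator:
    """Hypothetical shared state; stands in for the cross-application channel."""

    def __init__(self):
        self.waiting = deque()  # applications in order of arrival


class MockCalciom:
    """In-process mock, NOT the real CALCioM library."""

    def __init__(self, name, coordinator):
        # Plays the role of CALCioM_Init (assumed meaning: join the coordination).
        self.name = name
        self.coordinator = coordinator

    def ask(self):
        # Assumed meaning of CALCioM_Ask: request access to the file system.
        if self.name not in self.coordinator.waiting:
            self.coordinator.waiting.append(self.name)

    def check(self):
        # Assumed meaning of CALCioM_Check: non-blocking test for granted access.
        waiting = self.coordinator.waiting
        return bool(waiting) and waiting[0] == self.name

    def complete(self):
        # Assumed meaning of CALCioM_Complete: the I/O phase is over.
        if not self.check():
            raise RuntimeError(f"{self.name} does not hold access")
        self.coordinator.waiting.popleft()
```

With two mock applications sharing one coordinator, the first to call `ask()` sees `check()` return `True`; the second sees `True` only after the first calls `complete()`.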

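Possible coordination strategies

•  “First come first served” (FCFS) serialization [Diagram: App A’s access, followed by App B’s access, which was requested dt after App A started.]
•  Interruption [Diagram: App A’s access is split in two parts; App B’s access, requested dt after App A started, runs in between.]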

How to choose a coordination strategy

Q: Given application A with expected access time TA and application B with expected access time TB, starting dt time units after application A started its access,

Should A be interrupted in favor of B? Or should B wait for A to terminate its access?

Example: if neither A nor B has something else to do, optimizing global performance, i.e. minimizing an interference effect given by

$$ f = \frac{T_A}{T_A(\text{alone})} + \frac{T_B}{T_B(\text{alone})} $$

tells us that B should interrupt A if and only if

$$ dt < \frac{T_A(\text{alone})^2 - T_B(\text{alone})^2}{T_A(\text{alone})} $$

With $f = T_A + T_B$ instead, the condition becomes

$$ dt < T_A(\text{alone}) - T_B(\text{alone}) $$
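Both conditions can be checked numerically by computing f for each strategy and comparing. The sketch below assumes a simple cost model that is consistent with the two conditions but is not spelled out in the slides: under serialization B waits for the remaining TA − dt of A, and under interruption A pauses for exactly TB and resumes with no overhead. All function names are hypothetical.

```python
def access_times(t_a, t_b, dt, interrupt):
    """Observed (T_A, T_B) when B arrives dt after A started, with 0 <= dt <= t_a.

    Assumed model: no overhead for pausing or resuming an access.
    """
    if interrupt:
        return t_a + t_b, t_b          # A is paused while B performs its access
    return t_a, (t_a - dt) + t_b       # B waits for the rest of A's access


def f_sum(times, t_a, t_b):
    """f = T_A + T_B."""
    return times[0] + times[1]


def f_normalized(times, t_a, t_b):
    """f = T_A / T_A(alone) + T_B / T_B(alone)."""
    return times[0] / t_a + times[1] / t_b


def should_interrupt_sum(t_a, t_b, dt):
    """Condition from the slide for f = T_A + T_B."""
    return dt < t_a - t_b


def should_interrupt_normalized(t_a, t_b, dt):
    """Condition from the slide for the normalized f."""
    return dt < (t_a ** 2 - t_b ** 2) / t_a


def best_strategy(t_a, t_b, dt, metric):
    """Pick the strategy with the smaller f; ties go to serialization."""
    f_int = metric(access_times(t_a, t_b, dt, True), t_a, t_b)
    f_ser = metric(access_times(t_a, t_b, dt, False), t_a, t_b)
    return "interrupt" if f_int < f_ser else "serialize"
```

With TA(alone) = 10 and TB(alone) = 4, `best_strategy(10, 4, 8, f_sum)` picks serialization while `best_strategy(10, 4, 8, f_normalized)` picks interruption, which shows that the choice of metric changes the decision.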

Integration in MPICH

•  MPI_Init and MPI_Finalize overridden in libcalciom.a
•  MPI_File_open(“myfile”)
   →  MPI_File_open(“calciom:myfile”)
•  MPI_File_open(“pvfs2:myfile”)
   →  MPI_File_open(“calciom:pvfs2:myfile”)
•  Connection between applications: could be done through MPI_Comm_connect/accept (ideally would benefit from MPI_Comm_iconnect/iaccept) + interaction with the job scheduler
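The file name rewriting shown in the bullets amounts to prepending a `calciom:` prefix in front of any existing file system prefix. A minimal Python sketch of that string manipulation follows; the helpers `add_calciom_prefix` and `split_prefixes` are hypothetical and only mimic the naming convention, they do not call MPI.

```python
CALCIOM_PREFIX = "calciom:"


def add_calciom_prefix(filename):
    """Return the name an application would pass to MPI_File_open to use CALCioM."""
    if filename.startswith(CALCIOM_PREFIX):
        return filename                 # already routed through CALCioM
    return CALCIOM_PREFIX + filename


def split_prefixes(filename):
    """Split 'calciom:pvfs2:myfile' into (['calciom', 'pvfs2'], 'myfile').

    Assumes the path itself contains no ':' character.
    """
    *prefixes, path = filename.split(":")
    return prefixes, path
```

For example, `add_calciom_prefix("pvfs2:myfile")` yields `"calciom:pvfs2:myfile"`, and `split_prefixes` recovers the chain of prefixes in the order they were stacked.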

Experimental evaluation

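Example of application

2x2048 cores on Surveyor
•  App A: 4 files, 4 MB per file per process, contiguous layout
•  App B: 1 file, 4 MB per file per process, contiguous layout

Metric: $f = T_A + T_B$; interruption condition: $dt < T_A(\text{alone}) - T_B(\text{alone})$

[Figure legend: App B (small I/O load), App A (big I/O load)]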

App B arrives first, App A is serialized after B

[Figure legend: App B (small I/O load), App A (big I/O load)]

App B arrives during the write of the first 3 files of App A. The condition indicates that A should be interrupted.

The level of interruption produces different patterns.

[Figure legend: App B (small I/O load), App A (big I/O load)]

App B arrives during the last write of App A. The condition dictates that B is serialized after A.

[Figure legend: App B (small I/O load), App A (big I/O load)]

Synthesis

CALCioM manages to improve the computational efficiency of the set of applications by avoiding interference, and thus improves the efficiency of the entire machine.

Conclusion

•  Interference between applications impacts system efficiency
•  CALCioM:
   •  Communication layer between independent applications
   •  Cross-application coordination through exchange of knowledge on I/O patterns
   •  Several policies implemented: FCFS, interruption

real life interference

Thank you! Questions?