HPC & Big Data Convergence

HPC & Big Data Convergence The Cutting Edge & Outlook Rashid Mehmood King Abdulaziz University @ IXPUG 2018, KAUST 13 March 2018



Data analytics and computing ecosystem compared

Rashid Mehmood HPC Big Data Convergence 2

HPC and Big Data
• HPC technologies are needed by Big Data to deal with the ever-increasing Vs of data in order to forecast and extract insights from existing and new domains, faster, and with greater accuracy.
• Increasingly more data is being produced by scientific experiments from areas such as bioscience, physics, and climate, and therefore HPC needs to adopt data-driven paradigms.
• Moreover, there are synergies between them with unimaginable potential for developing new computing paradigms, solving long-standing grand challenges, and making new explorations and discoveries.


Big Data Analytics Workloads


Technical Challenges for a Converged System
• Higher Energy Efficiency
– Is the most crucial challenge
– Sunway TaihuLight: 15.37 MW power consumption for 93.014 PFLOPS (Rmax), 125.435 PFLOPS (Rpeak)
– Tianhe-2: 17.81 MW for 33.86 PFLOPS (54.9 PFLOPS peak)
– Power and cooling technologies
– Circuit technologies
– Software
– We need exascale performance while keeping the power consumption at the existing levels
– Power-aware algorithms and software would be another trend
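The efficiency gap implied by these figures can be worked out directly. The sketch below is illustrative arithmetic only: it uses the Rmax and power values quoted on this slide, while the function name and the target of 1 EFLOPS within TaihuLight's 15.37 MW are assumptions made for the example.

```python
def gflops_per_watt(pflops: float, megawatts: float) -> float:
    """Sustained performance per watt, in GFLOPS/W."""
    flops = pflops * 1e15
    watts = megawatts * 1e6
    return flops / watts / 1e9


# Figures quoted on the slide: (Rmax in PFLOPS, power in MW).
systems = {
    "Sunway TaihuLight": (93.014, 15.37),
    "Tianhe-2": (33.86, 17.81),
}

efficiency = {name: gflops_per_watt(p, mw) for name, (p, mw) in systems.items()}

# Hypothetical target: 1 EFLOPS (1000 PFLOPS) at TaihuLight's power draw.
exascale_target = gflops_per_watt(1000.0, 15.37)
improvement_needed = exascale_target / efficiency["Sunway TaihuLight"]

for name, value in efficiency.items():
    print(f"{name}: {value:.2f} GFLOPS/W")
print(f"Exascale at 15.37 MW: {exascale_target:.2f} GFLOPS/W "
      f"({improvement_needed:.1f}x TaihuLight)")
```

Under these assumptions, reaching exascale without raising power consumption would take roughly a tenfold gain in performance per watt over TaihuLight.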


Communications and networking
• Thousands of processes and billions of threads
• Fine-grained interaction between the processes is needed
• Advances in communication technologies have been slower compared to advances in processors
• High bandwidth and low latency technologies are needed
• Algorithms and software are required that localise and minimise communication
– Use additional computing rather than communication
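One common way to trade computing for communication is the ghost-zone (halo) technique: each process fetches k neighbouring cells at once and then performs k stencil sweeps locally, redoing a little of its neighbour's work instead of exchanging data before every sweep. The sketch below is a toy illustration, not taken from the talk: the two "ranks" are simulated inside one Python process, no real MPI is used, and all function names are hypothetical.

```python
def step(u):
    """One sweep of a 3-point average; the two end values are held fixed.

    Assumes len(u) >= 2.
    """
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def serial(u, steps):
    """Reference result: apply `steps` sweeps to the whole array."""
    for _ in range(steps):
        u = step(u)
    return u


def two_ranks(u, steps, k):
    """Simulate two 'ranks' that swap k ghost cells once every k sweeps."""
    mid = len(u) // 2
    assert steps % k == 0 and 1 <= k <= mid
    left, right = u[:mid], u[mid:]
    exchanges = 0
    for _ in range(steps // k):
        # One exchange: each rank receives k ghost cells from its neighbour.
        left_ext = left + right[:k]
        right_ext = left[-k:] + right
        exchanges += 1
        # k local sweeps; one more ghost cell goes stale on each sweep,
        # but the owned cells are still exact after k sweeps.
        for _ in range(k):
            left_ext = step(left_ext)
            right_ext = step(right_ext)
        left, right = left_ext[:mid], right_ext[k:]
    return left + right, exchanges


u0 = [float((i * i) % 7) for i in range(16)]
reference = serial(u0, 8)
every_step, n_every = two_ranks(u0, 8, 1)  # exchange before every sweep
batched, n_batched = two_ranks(u0, 8, 4)   # exchange once per 4 sweeps
```

Both variants reproduce the serial result; the batched one needs a quarter of the exchanges, at the price of each rank recomputing up to k extra cells per sweep.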


Challenges (memory and data locality)
• Cost of moving data is higher than a floating point operation
– Energy cost of an add operation is around 0.9 nJ, L1 to register is 1 nJ, while moving data off-chip to register is around 100 nJ
– 28 to 40% of the total energy consumption is spent moving data
– 19 to 36% of it is wasted in stalled cycles
• Bandwidth
– The Xeon bandwidth is only up to 60 GB/s
– IBM's Power8 has a bandwidth of up to 192 GB/s
– NVIDIA's latest GPUs have over 280 GB/s
• Memory capacity, latency and bandwidth are critical for exascale computing
• New technologies such as stacked memory may help
• But memory per core is decreasing
• Design of algorithms requiring lower memory would be the trend
• Data locality would be among the most important goals
• Needs to be intrinsic in the programming models
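To see why data locality dominates, the per-operation energy figures quoted above can be plugged into a very rough cost model. This is a sketch under stated assumptions: the kernel (summing n values), the function name, and the simplification that every operand is fetched exactly once are all hypothetical, and only the three energy constants come from the slide.

```python
ADD_NJ = 0.9        # energy of one add (from the slide)
L1_NJ = 1.0         # L1 -> register move (from the slide)
OFFCHIP_NJ = 100.0  # off-chip -> register move (from the slide)


def kernel_energy_nj(adds, l1_loads, offchip_loads):
    """Return (compute, movement) energy in nJ for a simple kernel."""
    compute = adds * ADD_NJ
    movement = l1_loads * L1_NJ + offchip_loads * OFFCHIP_NJ
    return compute, movement


n = 1_000_000

# Summing n values streamed from off-chip memory: n adds, n off-chip loads.
c_cold, m_cold = kernel_energy_nj(n, 0, n)
# The same sum when every value is already resident in L1.
c_hot, m_hot = kernel_energy_nj(n, n, 0)

share_cold = m_cold / (c_cold + m_cold)
share_hot = m_hot / (c_hot + m_hot)
ratio = (c_cold + m_cold) / (c_hot + m_hot)

print(f"off-chip: {share_cold:.1%} of energy is data movement")
print(f"L1-resident: {share_hot:.1%} of energy is data movement")
print(f"off-chip run costs {ratio:.1f}x the energy of the L1-resident run")
```

In this toy model the arithmetic is identical in both runs; only where the operands live changes, which is the argument for making locality intrinsic to the programming model.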


Challenges (Fault Tolerance)
• HPC software is developed with the assumption of a low fault occurrence probability
• Fault occurrence is typically proportional to the number of processes and interactions
• Billions of fine-grained processes in a computation will increase the likelihood of fault occurrence
• Big data systems are typically more fault-tolerant, partly due to machine virtualisation and failover technologies
– But not to the desired level
• This will push towards requiring more loosely-coupled, intelligent, somewhat failure-aware algorithms and software design
• Resilience needs to be intrinsic in programming paradigms
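The scaling argument above can be made concrete with a simple probability model. The sketch assumes that processes fail independently with the same probability during a run, which is a simplification, and the per-process probability of 1e-9 is a hypothetical value chosen only for illustration.

```python
import math


def p_any_failure(p_single: float, n: int) -> float:
    """P(at least one of n independent components fails) = 1 - (1 - p)^n."""
    # log1p/expm1 keep precision when p_single is tiny and n is huge.
    return -math.expm1(n * math.log1p(-p_single))


P_SINGLE = 1e-9  # hypothetical per-process failure probability for one run

for n in (10**3, 10**6, 10**9):
    print(f"{n:>13,d} processes: P(any failure) = {p_any_failure(P_SINGLE, n):.6f}")
```

While n times p stays small, the chance of hitting at least one fault grows almost linearly with the number of processes; at a billion processes it is no longer negligible even for this tiny per-process rate.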


Challenges (4 Vs of Big Data)
• Volume, Velocity, Variety and Veracity
• Large volume of data typically means more memory, more computing, and more interaction
• Fast data requires faster I/O, memories, computational elements, workflows, applications and strategies for data management
• Big data may have many more files than the current file systems are able to deal with
• Restructuring and management of current scientific workflows is required to meet the current and future developments in HPC
• In situ data analysis would also need to be integrated where efficient
• Variety and veracity require intelligent self-aware methods
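The in situ idea mentioned above can be illustrated with a streaming statistic that is updated while the data is produced, so raw output never has to be written to the file system first. This is a minimal sketch using Welford's online algorithm; the `simulate` generator is a hypothetical stand-in for a real simulation loop.

```python
class RunningStats:
    """Welford's online algorithm: mean and variance in a single pass."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far."""
        return self.m2 / self.n if self.n else 0.0


def simulate(steps: int):
    """Hypothetical simulation: yields one scalar observable per time step."""
    for t in range(steps):
        yield float(t % 5)


stats = RunningStats()
for value in simulate(10):
    stats.update(value)  # analysed in place; nothing is stored or written

print(stats.n, stats.mean, stats.variance)
```

The memory footprint is three numbers regardless of how many time steps the simulation runs.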


Challenges (programming models)
• MPI (Message Passing Interface) is relatively tightly coupled and a typical choice among the HPC community
• BSP (Bulk Synchronous Parallel) and Map-Reduce are typically used by Big Data users
• Big data environments are more expressive and productive but offer lower performance compared to HPC
• New programming models are needed to express the fine-grained parallelism among millions of cores and billions of threads.
– Plus heterogeneity of systems
– Resilience
– Data locality
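As a reminder of what the Map-Reduce model asks of the programmer, the sketch below runs the three phases (map, shuffle, reduce) in a single Python process. It is a conceptual illustration only: it does not use Hadoop, Spark or any real framework API, and all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(record: str):
    """Emit one (key, value) pair per word."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Group values by key; a real framework does this across the network."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values that share a key."""
    return key, sum(values)


def map_reduce(records):
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = map_reduce(["HPC needs big data", "big data needs HPC"])
print(counts)
```

The user writes only `map_phase` and `reduce_phase`; partitioning, data movement and recovery are left to the framework, which is where the productivity gain and much of the performance cost relative to hand-written MPI come from.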


Software and Hardware Gap
• The hardware has already changed at a much faster rate than the software
• Exascale level parallelism would create even more heterogeneity and change
• The effort in developing software is huge for a given architecture
• Maintaining and adapting existing software for new architectures is daunting
• Even more challenging is to adapt existing codes to fine-grained (cores) heterogeneous system environments
• Reformulation of scientific problems, algorithms and workflows is needed to move to exascale computing
– E.g. compute rather than fetch data if possible


Additional Technical Challenges
• Correctness of algorithms and software
• Efficiency of the scientific process to set up experiments and use HPC/BD resources
• Usability and impact are more important than benchmarks
• Application scientists are understandably more focussed on results and accuracy
– and less on energy, system and workflow/process efficiency
– Coordination to consider the bigger picture would improve efficiency, productivity and time to innovation


Current Convergence Efforts
Specific Solutions: Convergence Approach
• Mellanox UDA, RDMA-Hadoop, DataMPI, Hadoop-IPoIB, HMOR: HPC oriented MapReduce solution
• myHadoop: Hadoop on-demand on traditional HPC resources
• LibHDFS: HPC application's interface to HDFS
• MPI, ad-hoc Hadoop, CloudBlast, Spark, HTCondor: Parallelization of many task applications with different workflow systems
• dataMPI: Overlapping of map, shuffle and merge phases
• Virtualized Analytics Shipping (VAS), Spark on demand: Map-reduce like framework for in situ data analysis
• iRODS, MapReduce-MPI, Pilot-MapReduce, SRM etc.: Solutions to deal with massive amount of data in data intensive applications
• (Triple-H): Hybrid design to reduce I/O bottleneck in HDFS


Design Patterns Based Converged Future


• Businesses, Customers, Scientists, Administrators
• Visualization & Management Layer
– Distributed and parallel visualizations
– Live analysis, advanced searches, recommendations
– Interactive data exploration, rendering data visualizations, customized & user-friendly experience, real-time monitoring
• Analytics/Processing Layer
– Analytics pattern for unstructured & structured data, Processing Abstraction Pattern, high-velocity real-time processing etc.
– Advanced algorithms, computations in parallel, Structural Patterns, Computational Patterns, Parallel Algorithmic Strategy patterns etc.
– Real-time analysis & batch analysis, Resilience Design Patterns, energy efficiency, Trade-off Design Patterns
• Storage/Access Layer
– Distributed & parallel file systems, e.g. Lustre, HDFS etc.
– Cognitive storage
– Unstructured data, e.g. HDFS, GFS, NoSQL (MongoDB)
– Structured data, e.g. BigTable, HBase
– Data size reduction, high volume; hierarchical, linked, tabular & binary storage; real-time access
• Data Types: Structured, Un-Structured, Semi-Structured
• Data Sources: IoT, social media, scientific simulations, geographical data
• Hardware: Commodity + State-of-the-Art

Acknowledgement
• Sardar Usman
• Furqan Alam





Be part of our journey towards realizing Saudi Vision 2030…

Thank you…

[email protected]
