Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing


Transcript of Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Page 1: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Big Data Meets HPC – Exploiting HPC Technologies for Accelerating Big Data Processing

Dhabaleswar K. (DK) Panda, The Ohio State University

E-mail: [email protected]

http://www.cse.ohio-state.edu/~panda

Keynote Talk at HPCAC-Switzerland (Mar 2016)

Page 2: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Introduction to Big Data Applications and Analytics

•  Big Data has become one of the most important elements of business analytics

•  Provides groundbreaking opportunities for enterprise information management and decision making

•  The amount of data is exploding; companies are capturing and digitizing more information than ever

•  The rate of information growth appears to be exceeding Moore's Law

Page 3: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

4V Characteristics of Big Data

•  Commonly accepted 3 V's of Big Data: Volume, Velocity, Variety
   –  Michael Stonebraker: Big Data Means at Least Three Different Things, http://www.nist.gov/itl/ssd/is/upload/NIST-stonebraker.pdf

•  4/5 V's of Big Data – 3V + *Veracity, *Value

Courtesy: http://api.ning.com/files/tRHkwQN7s-Xz5cxylXG004GLGJdjoPd6bVfVBwvgu*F5MwDDUCiHHdmBW-JTEz0cfJjGurJucBMTkIUNdL3jcZT8IPfNWfN9/dv1.jpg

Page 4: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Data Generation in Internet Services and Applications

•  Web pages (content, graph)
•  Clicks (ad, page, social)
•  Users (OpenID, FB Connect, etc.)
•  e-mails (Hotmail, Y! Mail, Gmail, etc.)
•  Photos, Movies (Flickr, YouTube, Video, etc.)
•  Cookies / tracking info (see Ghostery)
•  Installed apps (Android Market, App Store, etc.)
•  Location (Latitude, Loopt, Foursquare, Google Now, etc.)
•  User-generated content (Wikipedia & co, etc.)
•  Ads (display, text, DoubleClick, Yahoo, etc.)
•  Comments (Disqus, Facebook, etc.)
•  Reviews (Yelp, Y! Local, etc.)
•  Social connections (LinkedIn, Facebook, etc.)
•  Purchase decisions (Netflix, Amazon, etc.)
•  Instant messages (YIM, Skype, GTalk, etc.)
•  Search terms (Google, Bing, etc.)
•  News articles (BBC, NY Times, Y! News, etc.)
•  Blog posts (Tumblr, WordPress, etc.)
•  Microblogs (Twitter, Jaiku, Meme, etc.)
•  Link sharing (Facebook, Delicious, Buzz, etc.)

Number of apps in the Apple App Store, Android Market, Blackberry, and Windows Phone (2013): Android Market: <1200K; Apple App Store: ~1000K. Courtesy: http://dazeinfo.com/2014/07/10/apple-inc-aapl-ios-google-inc-goog-android-growth-mobile-ecosystem-2014/

Page 5: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Velocity of Big Data – How Much Data Is Generated Every Minute on the Internet?

The global Internet population grew 18.5% from 2013 to 2015 and now represents 3.2 billion people.

Courtesy: https://www.domo.com/blog/2015/08/data-never-sleeps-3-0/

Page 6: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Not Only in Internet Services – Big Data in Scientific Domains

•  Scientific data management, analysis, and visualization

•  Application examples
   –  Climate modeling
   –  Combustion
   –  Fusion
   –  Astrophysics
   –  Bioinformatics

•  Data-intensive tasks
   –  Run large-scale simulations on supercomputers
   –  Dump data on parallel storage systems
   –  Collect experimental/observational data
   –  Move experimental/observational data to analysis sites
   –  Visual analytics – help understand data visually

Page 7: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Typical Solutions or Architectures for Big Data Analytics

•  Hadoop: http://hadoop.apache.org
   –  The most popular framework for Big Data analytics
   –  HDFS, MapReduce, HBase, RPC, Hive, Pig, ZooKeeper, Mahout, etc.

•  Spark: http://spark-project.org
   –  Provides primitives for in-memory cluster computing; jobs can load data into memory and query it repeatedly

•  Storm: http://storm-project.net
   –  A distributed real-time computation system for real-time analytics, online machine learning, continuous computation, etc.

•  S4: http://incubator.apache.org/s4
   –  A distributed system for processing continuous unbounded streams of data

•  GraphLab: http://graphlab.org
   –  Consists of a core C++ GraphLab API and a collection of high-performance machine learning and data mining toolkits built on top of the GraphLab API

•  Web 2.0: RDBMS + Memcached (http://memcached.org)
   –  Memcached: a high-performance, distributed memory-object caching system

Page 8: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Big Data Processing with Hadoop Components

•  Major components included in this tutorial:
   –  MapReduce (Batch)
   –  HBase (Query)
   –  HDFS (Storage)
   –  RPC (Inter-process communication)

•  The underlying Hadoop Distributed File System (HDFS) is used by both MapReduce and HBase (a client-level example follows below)

•  The model scales, but the high volume of communication during intermediate phases can be further optimized

[Figure: Hadoop framework stack – user applications run over MapReduce and HBase, which sit on Hadoop Common (RPC) and HDFS]
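Since HDFS is the storage layer that the other components funnel through, it helps to see what a client interaction looks like. Below is a minimal write sketch using libhdfs, the C client API shipped with Apache Hadoop; the connection target and file path are placeholder values:

    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include "hdfs.h"   /* libhdfs header, shipped with Apache Hadoop */

    int main(void) {
        /* "default" picks up fs.defaultFS (the NameNode) from the Hadoop config. */
        hdfsFS fs = hdfsConnect("default", 0);
        if (!fs) { fprintf(stderr, "hdfsConnect failed\n"); return 1; }

        /* Create a file; the three 0s request default buffer size,
         * replication factor, and block size. */
        hdfsFile f = hdfsOpenFile(fs, "/tmp/example.txt",
                                  O_WRONLY | O_CREAT, 0, 0, 0);
        if (!f) { fprintf(stderr, "hdfsOpenFile failed\n"); return 1; }

        const char *msg = "hello HDFS\n";
        hdfsWrite(fs, f, msg, (tSize)strlen(msg));  /* goes to the DataNode pipeline */
        hdfsFlush(fs, f);

        hdfsCloseFile(fs, f);
        hdfsDisconnect(fs);
        return 0;
    }

In stock Hadoop, every such write ends up as socket traffic through the DataNode pipeline; that is exactly the path the RDMA-based designs discussed later replace.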

Page 9: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Spark Architecture Overview

•  An in-memory data-processing framework
   –  Iterative machine learning jobs
   –  Interactive data analytics
   –  Scala-based implementation
   –  Standalone, YARN, Mesos

•  Scalable and communication intensive
   –  Wide dependencies between Resilient Distributed Datasets (RDDs)
   –  MapReduce-like shuffle operations to repartition RDDs
   –  Sockets-based communication

[Figure: a master node running the SparkContext, the HDFS driver, and ZooKeeper coordinates multiple worker nodes]

http://spark.apache.org

Page 10: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Memcached Architecture

•  Distributed caching layer
   –  Allows spare memory from multiple nodes to be aggregated
   –  General purpose

•  Typically used to cache database queries and results of API calls (a client sketch follows below)

•  Scalable model, but typical usage is very network intensive

[Figure: multiple Memcached servers, each with main memory, CPUs, SSD, and HDD, connected by high-performance networks]
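To make the cache-aside usage pattern concrete, here is a minimal sketch with the standard libmemcached C client; the server address, key, and value are placeholders:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <libmemcached/memcached.h>

    int main(void) {
        memcached_return_t rc;
        memcached_st *memc = memcached_create(NULL);

        /* Point the client at one (placeholder) server. */
        memcached_server_st *servers =
            memcached_server_list_append(NULL, "127.0.0.1", 11211, &rc);
        memcached_server_push(memc, servers);

        /* Cache a (key, value) pair, e.g. the result of a database query. */
        const char *key = "user:42", *value = "cached-row";
        rc = memcached_set(memc, key, strlen(key), value, strlen(value),
                           (time_t)0 /* no expiry */, (uint32_t)0 /* flags */);

        /* Later: consult the cache before touching the database. */
        size_t len; uint32_t flags;
        char *hit = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
        if (rc == MEMCACHED_SUCCESS) {
            printf("cache hit: %.*s\n", (int)len, hit);
            free(hit);            /* caller owns the returned buffer */
        } /* else: cache miss -> query the database and memcached_set() it */

        memcached_server_list_free(servers);
        memcached_free(memc);
        return 0;
    }

Each memcached_set/memcached_get is one small network round trip, which is why the slide calls typical usage "very network intensive".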

Page 11: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Data Management and Processing on Modern Clusters

•  Substantial impact on designing and utilizing data management and processing systems in multiple tiers
   –  Front-end data accessing and serving (online): Memcached + DB (e.g. MySQL), HBase
   –  Back-end data analytics (offline): HDFS, MapReduce, Spark

[Figure: the Internet feeds a front-end tier of web servers backed by Memcached + DB (MySQL) and NoSQL DB (HBase) for data accessing and serving, and a back-end tier where data analytics apps/jobs run on MapReduce and Spark over HDFS]

Page 12: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Open Standard InfiniBand Networking Technology

•  Introduced in Oct 2000

•  High-performance data transfer
   –  Interprocessor communication and I/O
   –  Low latency (<1.0 microsec), high bandwidth (up to 12.5 GigaBytes/sec -> 100 Gbps), and low CPU utilization (5-10%)

•  Multiple operations
   –  Send/Recv
   –  RDMA Read/Write (sketched below)
   –  Atomic operations (very unique)
      •  Enable high-performance and scalable implementations of distributed locks, semaphores, and collective communication operations

•  Leading to big changes in designing
   –  HPC clusters
   –  File systems
   –  Cloud computing systems
   –  Grid computing systems
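For readers new to the verbs interface these designs build on, the sketch below shows the core of a one-sided RDMA Write with libibverbs. It assumes an already-connected reliable (RC) queue pair and a remote address/rkey exchanged out of band (e.g., over a socket or rdma_cm); buffer size and contents are illustrative:

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    /* Post a one-sided RDMA Write: the remote CPU is not involved.
     * Assumes 'qp' is a connected RC queue pair and the peer has shared
     * the remote buffer address 'raddr' and its 'rkey' beforehand. */
    int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                           uint64_t raddr, uint32_t rkey)
    {
        char *buf = malloc(4096);
        if (!buf) return -1;
        strcpy(buf, "hello over RDMA");

        /* Register the local buffer so the HCA can DMA from it. */
        struct ibv_mr *mr = ibv_reg_mr(pd, buf, 4096, IBV_ACCESS_LOCAL_WRITE);
        if (!mr) { free(buf); return -1; }

        struct ibv_sge sge = {
            .addr   = (uintptr_t)buf,
            .length = 4096,
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr, *bad_wr = NULL;
        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided operation */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;
        wr.wr.rdma.remote_addr = raddr;              /* obtained from the peer */
        wr.wr.rdma.rkey        = rkey;

        /* Hand the work request to the HCA; the completion is later
         * reaped with ibv_poll_cq() on the QP's send completion queue. */
        return ibv_post_send(qp, &wr, &bad_wr);
    }

The kernel is not on the data path here: once the buffer is registered, the HCA moves the bytes directly, which is where the low latency and low CPU utilization cited above come from.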

Page 13: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Interconnects and Protocols in OpenFabrics Stack

The OpenFabrics stack offers several paths from the application/middleware interface (sockets or verbs) down through the protocol, adapter, and switch:

•  1/10/40/100 GigE: sockets over kernel-space TCP/IP; Ethernet adapter and Ethernet switch
•  10/40 GigE-TOE: sockets over TCP/IP with hardware offload; Ethernet adapter and Ethernet switch
•  IPoIB: sockets over kernel-space IPoIB; InfiniBand adapter and InfiniBand switch
•  RSockets: sockets over user-space RSockets; InfiniBand adapter and InfiniBand switch
•  SDP: sockets over RDMA (SDP); InfiniBand adapter and InfiniBand switch
•  iWARP: sockets over user-space TCP/IP (iWARP); iWARP adapter and Ethernet switch
•  RoCE: verbs over user-space RDMA; RoCE adapter and Ethernet switch
•  IB Native: verbs over user-space RDMA; InfiniBand adapter and InfiniBand switch

Page 14: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

How Can HPC Clusters with High-Performance Interconnect and Storage Architectures Benefit Big Data Applications?

Bring HPC and Big Data processing into a "convergent trajectory"!

•  What are the major bottlenecks in current Big Data processing middleware (e.g. Hadoop, Spark, and Memcached)?

•  Can the bottlenecks be alleviated with new designs by taking advantage of HPC technologies?

•  Can RDMA-enabled high-performance interconnects benefit Big Data processing?

•  Can HPC clusters with high-performance storage systems (e.g. SSD, parallel file systems) benefit Big Data applications?

•  How much performance benefit can be achieved through enhanced designs?

•  How to design benchmarks for evaluating the performance of Big Data middleware on HPC clusters?

Page 15: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Designing Communication and I/O Libraries for Big Data Systems: Challenges

[Figure: layered challenge stack – applications over Big Data middleware (HDFS, MapReduce, HBase, Spark and Memcached), over programming models (sockets; other protocols?), over a communication and I/O library (point-to-point communication, threaded models and synchronization, virtualization, I/O and file systems, QoS, fault-tolerance, benchmarks; upper-level changes?), running on commodity computing system architectures (multi- and many-core architectures and accelerators), networking technologies (InfiniBand, 1/10/40/100 GigE and intelligent NICs), and storage technologies (HDD, SSD, and NVMe-SSD)]

Page 16: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Can Big Data Processing Systems be Designed with High-Performance Networks and Protocols?

•  Sockets are not designed for high performance
   –  Stream semantics are often a mismatch for upper layers
   –  Zero-copy is not available for non-blocking sockets

[Figure: current design – Application -> Sockets -> 1/10/40/100 GigE network; our approach – Application -> OSU Design -> Verbs interface -> 10/40/100 GigE or InfiniBand]

Page 17: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

The High-Performance Big Data (HiBD) Project

•  RDMA for Apache Spark

•  RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x)
   –  Plugins for Apache, Hortonworks (HDP) and Cloudera (CDH) Hadoop distributions

•  RDMA for Apache Hadoop 1.x (RDMA-Hadoop)

•  RDMA for Memcached (RDMA-Memcached)

•  OSU HiBD-Benchmarks (OHB)
   –  HDFS and Memcached micro-benchmarks

•  http://hibd.cse.ohio-state.edu

•  User base: 155 organizations from 20 countries

•  More than 15,500 downloads from the project site

•  RDMA for Apache HBase and Impala (upcoming)

Available for InfiniBand and RoCE

Page 18: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

RDMA for Apache Hadoop 2.x Distribution

•  High-performance design of Hadoop over RDMA-enabled interconnects
   –  High-performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs level for HDFS, MapReduce, and RPC components
   –  Enhanced HDFS with in-memory and heterogeneous storage
   –  High-performance design of MapReduce over Lustre
   –  Plugin-based architecture supporting RDMA-based designs for Apache Hadoop, CDH and HDP
   –  Easily configurable for different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB)

•  Current release: 0.9.9
   –  Based on Apache Hadoop 2.7.1
   –  Compliant with Apache Hadoop 2.7.1, HDP 2.3.0.0 and CDH 5.6.0 APIs and applications
   –  Tested with
      •  Mellanox InfiniBand adapters (DDR, QDR and FDR)
      •  RoCE support with Mellanox adapters
      •  Various multi-core platforms
      •  Different file systems with disks and SSDs, and Lustre
   –  http://hibd.cse.ohio-state.edu

Page 19: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

RDMA for Apache Spark Distribution

•  High-performance design of Spark over RDMA-enabled interconnects
   –  High-performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs level for Spark
   –  RDMA-based data shuffle and SEDA-based shuffle architecture
   –  Non-blocking and chunk-based data transfer
   –  Easily configurable for different protocols (native InfiniBand, RoCE, and IPoIB)

•  Current release: 0.9.1
   –  Based on Apache Spark 1.5.1
   –  Tested with
      •  Mellanox InfiniBand adapters (DDR, QDR and FDR)
      •  RoCE support with Mellanox adapters
      •  Various multi-core platforms
      •  RAM disks, SSDs, and HDD
   –  http://hibd.cse.ohio-state.edu

Page 20: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

RDMA for Memcached Distribution

•  High-performance design of Memcached over RDMA-enabled interconnects
   –  High-performance RDMA-enhanced design with native InfiniBand and RoCE support at the verbs level for Memcached and libMemcached components
   –  High-performance design of SSD-assisted hybrid memory
   –  Easily configurable for native InfiniBand, RoCE and the traditional sockets-based support (Ethernet and InfiniBand with IPoIB)

•  Current release: 0.9.4
   –  Based on Memcached 1.4.24 and libMemcached 1.0.18
   –  Compliant with libMemcached APIs and applications
   –  Tested with
      •  Mellanox InfiniBand adapters (DDR, QDR and FDR)
      •  RoCE support with Mellanox adapters
      •  Various multi-core platforms
      •  SSD
   –  http://hibd.cse.ohio-state.edu

Page 21: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

OSU HiBD Micro-Benchmark (OHB) Suite – HDFS & Memcached

•  Micro-benchmarks for Hadoop Distributed File System (HDFS)
   –  Sequential Write Latency (SWL) Benchmark, Sequential Read Latency (SRL) Benchmark, Random Read Latency (RRL) Benchmark, Sequential Write Throughput (SWT) Benchmark, Sequential Read Throughput (SRT) Benchmark
   –  Support benchmarking of
      •  Apache Hadoop 1.x and 2.x HDFS, Hortonworks Data Platform (HDP) HDFS, Cloudera Distribution of Hadoop (CDH) HDFS

•  Micro-benchmarks for Memcached
   –  Get Benchmark, Set Benchmark, and Mixed Get/Set Benchmark

•  Current release: 0.8

•  http://hibd.cse.ohio-state.edu

Page 22: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Different Modes of RDMA for Apache Hadoop 2.x

•  HHH: Heterogeneous storage devices with hybrid replication schemes are supported in this mode of operation for better fault-tolerance as well as performance. This mode is enabled by default in the package.

•  HHH-M: A high-performance in-memory based setup has been introduced in this package that can be utilized to perform all I/O operations in-memory and obtain as much performance benefit as possible.

•  HHH-L: With parallel file systems integrated, HHH-L mode can take advantage of the Lustre available in the cluster.

•  MapReduce over Lustre, with/without local disks: Besides HDFS-based solutions, this package also provides support to run MapReduce jobs on top of Lustre alone. Here, two different modes are introduced: with local disks and without local disks.

•  Running with Slurm and PBS: Supports deploying RDMA for Apache Hadoop 2.x with Slurm and PBS in different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre).

Page 23: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Acceleration Case Studies and Performance Evaluation

•  RDMA-based designs and performance evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (basic, hybrid and non-blocking APIs)
   –  HDFS + Memcached-based burst buffer
   –  OSU HiBD Benchmarks (OHB)

Page 24: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Design Overview of HDFS with RDMA

•  Enables high-performance RDMA communication, while supporting the traditional socket interface

•  A JNI layer bridges the Java-based HDFS with the communication library written in native code (a generic JNI shim is sketched below)

•  Design features
   –  RDMA-based HDFS write
   –  RDMA-based HDFS replication
   –  Parallel replication support
   –  On-demand connection setup
   –  InfiniBand/RoCE support

[Figure: applications call into HDFS; the write path goes through the Java Native Interface (JNI) to the OSU design over Verbs on RDMA-capable networks (IB, iWARP, RoCE ..), while other operations use the Java socket interface over the 1/10/40/100 GigE or IPoIB network]

N. S. Islam, M. W. Rahman, J. Jose, R. Rajachandrasekar, H. Wang, H. Subramoni, C. Murthy and D. K. Panda, High Performance RDMA-Based Design of HDFS over InfiniBand, Supercomputing (SC), Nov 2012

N. Islam, X. Lu, W. Rahman, and D. K. Panda, SOR-HDFS: A SEDA-based Approach to Maximize Overlapping in RDMA-Enhanced HDFS, HPDC '14, June 2014
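The JNI bridge in such designs is conceptually a thin native shim. Below is a minimal, hypothetical sketch of one (the package, class, and method names are placeholders, not the actual RDMA-for-Apache-Hadoop symbols):

    #include <jni.h>

    /* Hypothetical native method for a Java class such as
     *     package hdfs.rdma;
     *     class RdmaBridge { native int rdmaWrite(byte[] buf, int len); }
     * The exported function name encodes the package/class/method. */
    JNIEXPORT jint JNICALL
    Java_hdfs_rdma_RdmaBridge_rdmaWrite(JNIEnv *env, jobject self,
                                        jbyteArray buf, jint len)
    {
        /* Pin (or copy) the Java byte[] so native code can read it. */
        jbyte *data = (*env)->GetByteArrayElements(env, buf, NULL);
        if (data == NULL) return -1;  /* OutOfMemoryError already pending */

        /* ... hand 'data'/'len' to the verbs-level communication library
         * here, e.g. register the buffer and post an RDMA write as in
         * the earlier libibverbs sketch ... */

        /* Release; JNI_ABORT = read-only use, do not copy changes back. */
        (*env)->ReleaseByteArrayElements(env, buf, data, JNI_ABORT);
        return 0;
    }

The same bridging pattern recurs in the MapReduce, RPC, HBase, and Spark designs on the following slides, with only the Java/Scala side changing.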

Page 25: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Enhanced HDFS with In-Memory and Heterogeneous Storage (Triple-H)

•  Design features
   –  Three modes
      •  Default (HHH)
      •  In-Memory (HHH-M)
      •  Lustre-Integrated (HHH-L)
   –  Policies to efficiently utilize the heterogeneous storage devices
      •  RAM, SSD, HDD, Lustre
   –  Eviction/promotion based on data usage pattern
   –  Hybrid replication
   –  Lustre-Integrated mode:
      •  Lustre-based fault-tolerance

[Figure: Triple-H design – applications sit on hybrid replication, data placement policies, and eviction/promotion over heterogeneous storage (RAM disk, SSD, HDD, and Lustre)]

N. Islam, X. Lu, M. W. Rahman, D. Shankar, and D. K. Panda, Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture, CCGrid '15, May 2015

Page 26: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Design Overview of MapReduce with RDMA

•  Enables high-performance RDMA communication, while supporting the traditional socket interface

•  A JNI layer bridges the Java-based MapReduce with the communication library written in native code

•  Design features
   –  RDMA-based shuffle
   –  Prefetching and caching of map output
   –  Efficient shuffle algorithms
   –  In-memory merge
   –  On-demand shuffle adjustment
   –  Advanced overlapping
      •  map, shuffle, and merge
      •  shuffle, merge, and reduce
   –  On-demand connection setup
   –  InfiniBand/RoCE support

[Figure: applications drive MapReduce (JobTracker; TaskTrackers running Map and Reduce), which uses either the Java socket interface over the 1/10/40/100 GigE or IPoIB network, or the JNI path to the OSU design over Verbs on RDMA-capable networks (IB, iWARP, RoCE ..)]

M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects, ICS, June 2014

Page 27: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Advanced Overlapping among Different Phases

•  A hybrid approach to achieve the maximum possible overlapping in MapReduce across all phases, compared to other approaches
   –  Efficient shuffle algorithms
   –  Dynamic and efficient switching
   –  On-demand shuffle adjustment

[Figure: phase timelines comparing the default architecture, enhanced overlapping with in-memory merge, and advanced hybrid overlapping]

M. W. Rahman, X. Lu, N. S. Islam, and D. K. Panda, HOMR: A Hybrid Approach to Exploit Maximum Overlapping in MapReduce over High Performance Interconnects, ICS, June 2014

Page 28: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Design Overview of Hadoop RPC with RDMA

•  Enables high-performance RDMA communication, while supporting the traditional socket interface

•  A JNI layer bridges the Java-based RPC with the communication library written in native code

•  Design features
   –  JVM-bypassed buffer management
   –  RDMA or send/recv based adaptive communication
   –  Intelligent buffer allocation and adjustment for serialization
   –  On-demand connection setup
   –  InfiniBand/RoCE support

[Figure: applications use Hadoop RPC either through the default Java socket interface over the 1/10/40/100 GigE or IPoIB network, or through JNI to our OSU design over Verbs on RDMA-capable networks (IB, iWARP, RoCE ..)]

X. Lu, N. Islam, M. W. Rahman, J. Jose, H. Subramoni, H. Wang, and D. K. Panda, High-Performance Design of Hadoop RPC with RDMA over InfiniBand, Int'l Conference on Parallel Processing (ICPP '13), October 2013.

Page 29: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Benefits – RandomWriter & TeraGen in TACC-Stampede

[Figure: execution time (s) vs. data size (80, 100, 120 GB) for RandomWriter and TeraGen, IPoIB (FDR) vs. the RDMA-based design; RandomWriter reduced by 3x, TeraGen reduced by 4x]

Cluster with 32 nodes with a total of 128 maps

•  RandomWriter
   –  3-4x improvement over IPoIB for 80-120 GB file size

•  TeraGen
   –  4-5x improvement over IPoIB for 80-120 GB file size

Page 30: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Benefits – Sort & TeraSort in TACC-Stampede

[Figure: execution time (s) vs. data size (80, 100, 120 GB) for Sort (cluster with 32 nodes with a total of 128 maps and 64 reduces) and TeraSort (cluster with 32 nodes with a total of 128 maps and 57 reduces), IPoIB (FDR) vs. OSU-IB (FDR); Sort reduced by 52%, TeraSort reduced by 44%]

•  Sort with a single HDD per node
   –  40-52% improvement over IPoIB for 80-120 GB data

•  TeraSort with a single HDD per node
   –  42-44% improvement over IPoIB for 80-120 GB data

Page 31: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Evaluation of HHH and HHH-L with Applications

[Figure: CloudBurst execution time – HDFS (FDR) 60.24 s vs. HHH (FDR) 48.3 s; MR-MSPolyGraph execution time (s) vs. concurrent maps per host (4, 6, 8) for HDFS, Lustre, and HHH-L, reduced by 79%]

•  MR-MSPolygraph on OSU RI with 1,000 maps
   –  HHH-L reduces the execution time by 79% over Lustre, 30% over HDFS

•  CloudBurst on TACC Stampede
   –  With HHH: 19% improvement over HDFS

Page 32: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Evaluation with Spark on SDSC Gordon (HHH vs. Tachyon/Alluxio)

•  For 200 GB TeraGen on 32 nodes
   –  Spark-TeraGen: HHH has 2.4x improvement over Tachyon; 2.3x over HDFS-IPoIB (QDR)
   –  Spark-TeraSort: HHH has 25.2% improvement over Tachyon; 17% over HDFS-IPoIB (QDR)

[Figure: execution time (s) vs. cluster size : data size (8:50, 16:100, 32:200 GB) for TeraGen and TeraSort, comparing IPoIB (QDR), Tachyon, and OSU-IB (QDR); TeraGen reduced by 2.4x, TeraSort reduced by 25.2%]

N. Islam, M. W. Rahman, X. Lu, D. Shankar, and D. K. Panda, Performance Characterization and Acceleration of In-Memory File Systems for Hadoop and Spark Applications on HPC Clusters, IEEE BigData '15, October 2015

Page 33: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Design Overview of Shuffle Strategies for MapReduce over Lustre

•  Design features
   –  Two shuffle approaches
      •  Lustre-read based shuffle
      •  RDMA based shuffle
   –  Hybrid shuffle algorithm to take benefit from both shuffle approaches
   –  Dynamically adapts to the better shuffle approach for each shuffle request, based on profiling values for each Lustre read operation
   –  In-memory merge and overlapping of different phases are kept similar to the RDMA-enhanced MapReduce design

[Figure: Map1-Map3 write to an intermediate data directory on Lustre; Reduce1 and Reduce2 fetch the data via Lustre read or RDMA, then perform in-memory merge/sort and reduce]

M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, High Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA, IPDPS, May 2015

Page 34: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Improvement of MapReduce over Lustre on TACC-Stampede

•  Local disk is used as the intermediate data directory

•  For 500 GB Sort on 64 nodes
   –  44% improvement over IPoIB (FDR)

•  For 640 GB Sort on 128 nodes
   –  48% improvement over IPoIB (FDR)

[Figure: job execution time (sec) for IPoIB (FDR) vs. OSU-IB (FDR) – 300-500 GB on 64 nodes (reduced by 44%), and weak scaling from 20 GB on 4 nodes to 640 GB on 128 nodes (reduced by 48%)]

M. W. Rahman, X. Lu, N. S. Islam, R. Rajachandrasekar, and D. K. Panda, MapReduce over Lustre: Can RDMA-based Approach Benefit?, Euro-Par, August 2014.

Page 35: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Case Study – Performance Improvement of MapReduce over Lustre on SDSC-Gordon

•  Lustre is used as the intermediate data directory

•  For 80 GB Sort on 8 nodes
   –  34% improvement over IPoIB (QDR)

•  For 120 GB TeraSort on 16 nodes
   –  25% improvement over IPoIB (QDR)

[Figure: job execution time (sec) vs. data size (GB) comparing IPoIB (QDR), OSU-Lustre-Read (QDR), OSU-RDMA-IB (QDR), and OSU-Hybrid-IB (QDR); Sort reduced by 34%, TeraSort reduced by 25%]

Page 36: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Acceleration Case Studies and Performance Evaluation

•  RDMA-based designs and performance evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (basic, hybrid and non-blocking APIs)
   –  HDFS + Memcached-based burst buffer
   –  OSU HiBD Benchmarks (OHB)

Page 37: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

HBase-RDMA Design Overview

•  A JNI layer bridges the Java-based HBase with the communication library written in native code

•  Enables high-performance RDMA communication, while supporting the traditional socket interface

[Figure: applications call into HBase, which uses either the Java socket interface over 1/10/40/100 GigE or IPoIB networks, or JNI to the OSU-IB design over IB Verbs on RDMA-capable networks (IB, iWARP, RoCE ..)]

Page 38: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

HBase – YCSB Read-Write Workload

[Figure: read and write latency (us) vs. number of clients (8-128), comparing OSU-IB (QDR), IPoIB (QDR), and 10 GigE]

•  HBase Get latency (QDR, 10 GigE)
   –  64 clients: 2.0 ms; 128 clients: 3.5 ms
   –  42% improvement over IPoIB for 128 clients

•  HBase Put latency (QDR, 10 GigE)
   –  64 clients: 1.9 ms; 128 clients: 3.5 ms
   –  40% improvement over IPoIB for 128 clients

J. Huang, X. Ouyang, J. Jose, M. W. Rahman, H. Wang, M. Luo, H. Subramoni, Chet Murthy, and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, IPDPS '12

Page 39: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

HBase – YCSB Get Latency and Throughput on SDSC-Comet

•  HBase Get average latency (FDR)
   –  4 client threads: 38 us
   –  59% improvement over IPoIB for 4 client threads

•  HBase Get total throughput
   –  4 client threads: 102K ops/sec
   –  2.4x improvement over IPoIB for 4 client threads

[Figure: average Get latency (ms) and total Get throughput (K ops/sec) vs. number of client threads (1-4), IPoIB (FDR) vs. OSU-IB (FDR); 59% lower latency and 2.4x higher throughput]

Page 40: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Acceleration Case Studies and Performance Evaluation

•  RDMA-based designs and performance evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (basic, hybrid and non-blocking APIs)
   –  HDFS + Memcached-based burst buffer
   –  OSU HiBD Benchmarks (OHB)

Page 41: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Design Overview of Spark with RDMA

•  Enables high-performance RDMA communication, while supporting the traditional socket interface

•  A JNI layer bridges the Scala-based Spark with the communication library written in native code

•  Design features
   –  RDMA based shuffle
   –  SEDA-based plugins
   –  Dynamic connection management and sharing
   –  Non-blocking data transfer
   –  Off-JVM-heap buffer management
   –  InfiniBand/RoCE support

[Figure: Spark applications (Scala/Java/Python) run on Spark (Scala/Java); tasks served by the BlockManager/BlockFetcherIterator choose among shuffle servers and fetchers – Java NIO (default), Netty (optional), or RDMA (plug-in) – going either over Java sockets on the 1/10 Gig Ethernet / IPoIB (QDR/FDR) network, or through the RDMA-based shuffle engine (Java/JNI) on native InfiniBand (QDR/FDR)]

X. Lu, M. W. Rahman, N. Islam, D. Shankar, and D. K. Panda, Accelerating Spark with RDMA for Big Data Processing: Early Experiences, Int'l Symposium on High Performance Interconnects (HotI '14), August 2014

Page 42: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Evaluation on SDSC Comet – SortBy/GroupBy

•  InfiniBand FDR, SSD, 64 worker nodes, 1536 cores (1536M 1536R)

•  RDMA-based design for Spark 1.5.1

•  RDMA vs. IPoIB with 1536 concurrent tasks, single SSD per node
   –  SortBy: total time reduced by up to 80% over IPoIB (56 Gbps)
   –  GroupBy: total time reduced by up to 57% over IPoIB (56 Gbps)

[Figure: total time (sec) vs. data size (64, 128, 256 GB) for the SortBy and GroupBy tests on 64 worker nodes / 1536 cores, IPoIB vs. RDMA; reductions of 80% (SortBy) and 57% (GroupBy)]

Page 43: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Evaluation on SDSC Comet – HiBench Sort/TeraSort

•  InfiniBand FDR, SSD, 64 worker nodes, 1536 cores (1536M 1536R)

•  RDMA-based design for Spark 1.5.1

•  RDMA vs. IPoIB with 1536 concurrent tasks, single SSD per node
   –  Sort: total time reduced by 38% over IPoIB (56 Gbps)
   –  TeraSort: total time reduced by 15% over IPoIB (56 Gbps)

[Figure: total time (sec) vs. data size (64, 128, 256 GB) for the HiBench Sort and TeraSort tests on 64 worker nodes / 1536 cores, IPoIB vs. RDMA; reductions of 38% (Sort) and 15% (TeraSort)]

Page 44: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Evaluation on SDSC Comet – HiBench PageRank

•  InfiniBand FDR, SSD, 32/64 worker nodes, 768/1536 cores (768/1536M 768/1536R)

•  RDMA-based design for Spark 1.5.1

•  RDMA vs. IPoIB with 768/1536 concurrent tasks, single SSD per node
   –  32 nodes / 768 cores: total time reduced by 37% over IPoIB (56 Gbps)
   –  64 nodes / 1536 cores: total time reduced by 43% over IPoIB (56 Gbps)

[Figure: total time (sec) for the Huge, BigData, and Gigantic HiBench data sizes on 32 worker nodes / 768 cores and on 64 worker nodes / 1536 cores, IPoIB vs. RDMA; reductions of 37% and 43%]

Page 45: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Acceleration Case Studies and Performance Evaluation

•  RDMA-based designs and performance evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (basic, hybrid and non-blocking APIs)
   –  HDFS + Memcached-based burst buffer
   –  OSU HiBD Benchmarks (OHB)

Page 46: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Memcached-RDMA Design

•  Server and client perform a negotiation protocol
   –  The master thread assigns clients to the appropriate worker thread (a generic dispatch sketch follows below)

•  Once a client is assigned a verbs worker thread, it can communicate directly and is "bound" to that thread

•  All other Memcached data structures are shared among RDMA and sockets worker threads

•  The Memcached server can serve both socket and verbs clients simultaneously

•  Memcached applications need not be modified; the verbs interface is used if available

[Figure: sockets and RDMA clients connect to a master thread, which binds each client to a sockets or verbs worker thread; all workers share the common data (memory slabs, items, ...)]
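The master/worker binding described above is a classic dispatch pattern. Below is a minimal, generic pthreads sketch of it (a simplified stand-in, not the actual RDMA-Memcached code; each worker holds one pending slot where real code would use a queue):

    #include <pthread.h>
    #include <stdio.h>

    #define NUM_WORKERS 4

    typedef struct {
        pthread_t tid;
        pthread_mutex_t lock;
        pthread_cond_t cv;
        int pending;     /* client handle; -1 = empty (real code: a queue) */
        int is_verbs;    /* 1 = verbs worker, 0 = sockets worker */
    } worker_t;

    static worker_t workers[NUM_WORKERS];

    /* Each worker serves the clients bound to it: a verbs worker would
     * poll its completion queues here, a sockets worker its descriptors. */
    static void *worker_loop(void *arg) {
        worker_t *w = arg;
        for (;;) {
            pthread_mutex_lock(&w->lock);
            while (w->pending < 0)
                pthread_cond_wait(&w->cv, &w->lock);
            int client = w->pending;
            w->pending = -1;
            pthread_mutex_unlock(&w->lock);
            /* From here on, 'client' is bound to this thread: all further
             * communication with it happens on this worker. */
            printf("%s worker serving client %d\n",
                   w->is_verbs ? "verbs" : "sockets", client);
        }
        return NULL;
    }

    /* Master thread: after the negotiation, hand the client to a worker
     * of the matching transport type. */
    static void dispatch(int client, int wants_verbs) {
        for (int i = 0; i < NUM_WORKERS; i++) {
            worker_t *w = &workers[i];
            if (w->is_verbs != wants_verbs) continue;
            pthread_mutex_lock(&w->lock);
            w->pending = client;
            pthread_cond_signal(&w->cv);
            pthread_mutex_unlock(&w->lock);
            return;
        }
    }

    int main(void) {
        for (int i = 0; i < NUM_WORKERS; i++) {
            workers[i].pending = -1;
            workers[i].is_verbs = (i % 2);   /* half verbs, half sockets */
            pthread_mutex_init(&workers[i].lock, NULL);
            pthread_cond_init(&workers[i].cv, NULL);
            pthread_create(&workers[i].tid, NULL, worker_loop, &workers[i]);
        }
        dispatch(7, 1);  /* bind (placeholder) client 7 to a verbs worker */
        pthread_join(workers[1].tid, NULL);  /* workers run forever */
        return 0;
    }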

Page 47: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

RDMA-Memcached Performance (FDR Interconnect)

[Figure: Memcached Get latency (us) vs. message size (1 byte to 4 KB), OSU-IB (FDR) vs. IPoIB (FDR), latency reduced by nearly 20x; Memcached throughput (thousands of transactions per second, TPS) vs. number of clients (16-4080), 2x improvement]

•  Memcached Get latency
   –  4 bytes – OSU-IB: 2.84 us; IPoIB: 75.53 us
   –  2K bytes – OSU-IB: 4.49 us; IPoIB: 123.42 us

•  Memcached throughput (4 bytes)
   –  4080 clients – OSU-IB: 556K ops/sec; IPoIB: 233K ops/sec
   –  Nearly 2x improvement in throughput

Experiments on TACC Stampede (Intel Sandy Bridge cluster, IB: FDR)

Page 48: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Micro-benchmark Evaluation for OLDP Workloads

•  Illustration with a Read-Cache-Read access pattern using a modified mysqlslap load testing tool

•  Memcached-RDMA can
   –  Improve query latency by up to 66% over IPoIB (32 Gbps)
   –  Improve throughput by up to 69% over IPoIB (32 Gbps)

[Figure: latency (sec) and throughput (Kq/s) vs. number of clients (64-400), Memcached-IPoIB (32 Gbps) vs. Memcached-RDMA (32 Gbps); latency reduced by 66%]

D. Shankar, X. Lu, J. Jose, M. W. Rahman, N. Islam, and D. K. Panda, Can RDMA Benefit On-Line Data Processing Workloads with Memcached and MySQL, ISPASS '15

Page 49: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Benefits of Hybrid Memcached (Memory + SSD) on SDSC-Gordon

•  ohb_memlat & ohb_memthr latency & throughput micro-benchmarks

•  Memcached-RDMA can
   –  Improve query latency by up to 70% over IPoIB (32 Gbps)
   –  Improve throughput by up to 2x over IPoIB (32 Gbps)
   –  Incur no overhead in hybrid mode when all data fits in memory

[Figure: average latency (us) vs. message size, and throughput (million transactions/sec) vs. number of clients (64-1024), comparing IPoIB (32 Gbps), RDMA-Mem (32 Gbps), and RDMA-Hybrid (32 Gbps); 2x throughput]

Page 50: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Evaluation on IB FDR + SATA/NVMe SSDs

•  Memcached latency test with a Zipf distribution; server with 1 GB memory, 32 KB key-value pair size; total size of data accessed is 1 GB (when data fits in memory) and 1.5 GB (when data does not fit in memory)

•  When data fits in memory: RDMA-Mem/Hybrid gives 5x improvement over IPoIB-Mem

•  When data does not fit in memory: RDMA-Hybrid gives 2x-2.5x over IPoIB/RDMA-Mem

[Figure: Set/Get latency (us) broken down into slab allocation (SSD write), cache check + load (SSD read), cache update, server response, client wait, and miss penalty, for IPoIB-Mem, RDMA-Mem, RDMA-Hybrid-SATA, and RDMA-Hybrid-NVMe, with data fitting and not fitting in memory]

Page 51: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Accelerating Hybrid Memcached with RDMA, Non-blocking Extensions and SSDs

•  RDMA-accelerated communication for Memcached Get/Set

•  Hybrid 'RAM+SSD' slab management for higher data retention

•  Non-blocking API extensions
   –  memcached_(iset/iget/bset/bget/test/wait) (see the overlap sketch below)
   –  Achieve near in-memory speeds while hiding bottlenecks of network and SSD I/O
   –  Ability to exploit communication/computation overlap
   –  Optional buffer re-use guarantees

•  Adaptive slab manager with different I/O schemes for higher throughput

[Figure: blocking vs. non-blocking API request/reply flows from a client through the libMemcached library and the RDMA-enhanced communication libraries to the server's hybrid slab manager (RAM+SSD)]

D. Shankar, X. Lu, N. S. Islam, M. W. Rahman, and D. K. Panda, High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits, IPDPS, May 2016
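The slides give only the names of the non-blocking extensions, not their signatures, so no code for them is shown here. As a rough analogue, stock libmemcached already offers an issue-now/fetch-later pattern via memcached_mget and memcached_fetch_result, sketched below, which conveys the kind of communication/computation overlap the extensions generalize ('memc' is assumed to be a connected client):

    #include <stdio.h>
    #include <string.h>
    #include <libmemcached/memcached.h>

    /* Issue several Gets at once, compute while they are in flight,
     * then drain the replies. */
    void overlapped_gets(memcached_st *memc)
    {
        const char *keys[]        = { "k1", "k2", "k3" };
        size_t      key_lengths[] = { 2, 2, 2 };
        memcached_return_t rc;

        /* Send all three requests without waiting for the replies. */
        memcached_mget(memc, keys, key_lengths, 3);

        /* ... useful computation here overlaps with network (and, in the
         * hybrid store, SSD) latency ... */

        /* Drain the replies one by one. */
        memcached_result_st *result;
        while ((result = memcached_fetch_result(memc, NULL, &rc)) != NULL) {
            printf("%s -> %.*s\n",
                   memcached_result_key_value(result),
                   (int)memcached_result_length(result),
                   memcached_result_value(result));
            memcached_result_free(result);
        }
    }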

Page 52: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Performance Evaluation with Non-Blocking Memcached API

•  When data does not fit in memory, the non-blocking Memcached Set/Get API extensions can achieve
   –  >16x latency improvement vs. the blocking API over RDMA-Hybrid/RDMA-Mem w/ penalty
   –  >2.5x throughput improvement vs. the blocking API over default/optimized RDMA-Hybrid

•  When data fits in memory, the non-blocking extensions perform similar to RDMA-Mem/RDMA-Hybrid and give >3.6x improvement over IPoIB-Mem

[Figure: average Set/Get latency (us) broken down into miss penalty (back-end DB access overhead), client wait, server response, cache update, cache check + load (memory and/or SSD read), and slab allocation (w/ SSD write on out-of-memory), for IPoIB-Mem, RDMA-Mem, H-RDMA-Def, H-RDMA-Opt-Block, H-RDMA-Opt-NonB-i, and H-RDMA-Opt-NonB-b]

H = hybrid Memcached over SATA SSD; Opt = adaptive slab manager; Block = default blocking API; NonB-i = non-blocking iset/iget API; NonB-b = non-blocking bset/bget API w/ buffer re-use guarantee

Page 53: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Acceleration Case Studies and Performance Evaluation

•  RDMA-based designs and performance evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (basic, hybrid and non-blocking APIs)
   –  HDFS + Memcached-based burst buffer
   –  OSU HiBD Benchmarks (OHB)

Page 54: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Accelerating I/O Performance of Big Data Analytics through RDMA-based Key-Value Store

•  Design features
   –  Memcached-based burst-buffer system
      •  Hides the latency of parallel file system access
      •  Read from local storage and Memcached
   –  Data locality achieved by writing data to local storage
   –  Different approaches of integration with the parallel file system to guarantee fault-tolerance

[Figure: a Map/Reduce task and DataNode issue I/O through an I/O forwarding module either to local disk (data locality) or to the Memcached-based burst buffer system backed by Lustre (fault-tolerance)]

Page 55: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Evaluation with PUMA Workloads

Gains on OSU RI with our approach (Mem-bb) on 24 nodes:

•  SequenceCount: 34.5% over Lustre, 40% over HDFS

•  RankedInvertedIndex: 27.3% over Lustre, 48.3% over HDFS

•  HistogramRating: 17% over Lustre, 7% over HDFS

[Figure: execution time (s) of SeqCount, RankedInvIndex, and HistoRating for HDFS (32 Gbps), Lustre (32 Gbps), and Mem-bb (32 Gbps); gains of 40%, 48.3%, and 17% respectively]

N. S. Islam, D. Shankar, X. Lu, M. W. Rahman, and D. K. Panda, Accelerating I/O Performance of Big Data Analytics with RDMA-based Key-Value Store, ICPP '15, September 2015

Page 56: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Acceleration Case Studies and Performance Evaluation

•  RDMA-based designs and performance evaluation
   –  HDFS
   –  MapReduce
   –  RPC
   –  HBase
   –  Spark
   –  Memcached (basic, hybrid and non-blocking APIs)
   –  HDFS + Memcached-based burst buffer
   –  OSU HiBD Benchmarks (OHB)

Page 57: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Are the Current Benchmarks Sufficient for Big Data?

•  The current benchmarks provide some performance behavior

•  However, they do not provide any information to the designer/developer on:
   –  What is happening at the lower layer?
   –  Where are the benefits coming from?
   –  Which design is leading to benefits or bottlenecks?
   –  Which component in the design needs to be changed, and what will be its impact?
   –  Can performance gain/loss at the lower layer be correlated to the performance gain/loss observed at the upper layer?

Page 58: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Challenges in Benchmarking of RDMA-based Designs

[Figure: the layered stack from earlier – applications and Big Data middleware (HDFS, MapReduce, HBase, Spark and Memcached) over programming models (sockets; other protocols?) and RDMA protocols, a communication and I/O library (point-to-point communication, threaded models and synchronization, virtualization, I/O and file systems, QoS, fault-tolerance, benchmarks), and the underlying system architectures, networking technologies (InfiniBand, 1/10/40/100 GigE and intelligent NICs), and storage technologies (HDD, SSD, and NVMe-SSD) – annotated to show that current benchmarks exist only at the upper layers, no benchmarks exist for the lower-layer RDMA designs, and correlating results between the two remains an open question]

Page 59: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

OSU MPI Micro-Benchmarks (OMB) Suite

•  A comprehensive suite of benchmarks to
   –  Compare performance of different MPI libraries on various networks and systems
   –  Validate low-level functionalities
   –  Provide insights into the underlying MPI-level designs

•  Started with basic send-recv (MPI-1) micro-benchmarks for latency, bandwidth and bi-directional bandwidth

•  Extended later to
   –  MPI-2 one-sided
   –  Collectives
   –  GPU-aware data movement
   –  OpenSHMEM (point-to-point and collectives)
   –  UPC

•  Has become an industry standard

•  Extensively used for design/development of MPI libraries, performance comparison of MPI libraries and even in procurement of large-scale systems

•  Available from http://mvapich.cse.ohio-state.edu/benchmarks

•  Available in an integrated manner with the MVAPICH2 stack

Page 60: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Iterative Process – Requires Deeper Investigation and Design for Benchmarking Next-Generation Big Data Systems and Applications

[Figure: the same layered stack, with an iterative loop between applications-level benchmarks at the top and micro-benchmarks at the communication and I/O library level driving the design of RDMA-based Big Data systems]

Page 61: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

OSU HiBD Benchmarks (OHB)

•  HDFS benchmarks
   –  Sequential Write Latency (SWL) Benchmark
   –  Sequential Read Latency (SRL) Benchmark
   –  Random Read Latency (RRL) Benchmark
   –  Sequential Write Throughput (SWT) Benchmark
   –  Sequential Read Throughput (SRT) Benchmark

•  Memcached benchmarks
   –  Get Benchmark
   –  Set Benchmark
   –  Mixed Get/Set Benchmark

•  Available as a part of OHB 0.8 (a minimal Get-latency loop is sketched below for illustration)

N. S. Islam, X. Lu, M. W. Rahman, J. Jose, and D. K. Panda, A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters, Int'l Workshop on Big Data Benchmarking (WBDB '12), December 2012

D. Shankar, X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks, BPOE-5 (2014)

X. Lu, M. W. Rahman, N. Islam, and D. K. Panda, A Micro-Benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks, Int'l Workshop on Big Data Benchmarking (WBDB '13), July 2013

(The MapReduce and RPC benchmark suites above are to be released.)
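To give a feel for what a Get-latency benchmark in this style measures, here is a minimal timing loop written against plain libmemcached (a simplified stand-in for illustration, not the actual OHB code; the iteration count and key are placeholders):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <libmemcached/memcached.h>

    #define ITERS 10000   /* placeholder iteration count */

    /* Time ITERS back-to-back Gets of one key and return the average
     * round-trip latency in microseconds. */
    double get_latency_us(memcached_st *memc, const char *key)
    {
        size_t len; uint32_t flags; memcached_return_t rc;
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < ITERS; i++) {
            char *v = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
            free(v);                  /* caller owns the returned buffer */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                    (t1.tv_nsec - t0.tv_nsec) / 1e3;
        return us / ITERS;
    }

Run with a warmed-up key against IPoIB vs. an RDMA-enabled server, a loop like this exposes exactly the per-operation latency gap shown on the earlier RDMA-Memcached slides.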

Page 62: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

On-going and Future Plans of OSU High Performance Big Data (HiBD) Project

•  Upcoming releases of RDMA-enhanced packages will support
   –  HBase
   –  Impala

•  Upcoming releases of OSU HiBD Micro-Benchmarks (OHB) will support
   –  MapReduce
   –  RPC

•  Advanced designs with upper-level changes and optimizations
   –  Memcached with non-blocking API
   –  HDFS + Memcached-based burst buffer

Page 63: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Concluding Remarks

•  Discussed challenges in accelerating Hadoop, Spark and Memcached

•  Presented initial designs to take advantage of InfiniBand/RDMA for HDFS, MapReduce, RPC, Spark, and Memcached

•  Results are promising

•  Many other open issues need to be solved

•  Will enable the Big Data community to take advantage of modern HPC technologies to carry out their analytics in a fast and scalable manner

Page 64: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Funding Acknowledgments

Funding support by [sponsor logos]

Equipment support by [sponsor logos]

Page 65: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Personnel Acknowledgments

Current Students: A. Augustine (M.S.), A. Awan (Ph.D.), S. Chakraborthy (Ph.D.), C.-H. Chu (Ph.D.), N. Islam (Ph.D.), M. Li (Ph.D.), M. Rahman (Ph.D.), D. Shankar (Ph.D.), A. Venkatesh (Ph.D.), J. Zhang (Ph.D.)

Past Students: P. Balaji (Ph.D.), S. Bhagvat (M.S.), A. Bhat (M.S.), D. Buntinas (Ph.D.), L. Chai (Ph.D.), B. Chandrasekharan (M.S.), N. Dandapanthula (M.S.), V. Dhanraj (M.S.), T. Gangadharappa (M.S.), K. Gopalakrishnan (M.S.), W. Huang (Ph.D.), W. Jiang (M.S.), J. Jose (Ph.D.), S. Kini (M.S.), M. Koop (Ph.D.), R. Kumar (M.S.), S. Krishnamoorthy (M.S.), K. Kandalla (Ph.D.), K. Kulkarni (M.S.), P. Lai (M.S.), J. Liu (Ph.D.), M. Luo (Ph.D.), A. Mamidala (Ph.D.), G. Marsh (M.S.), V. Meshram (M.S.), A. Moody (M.S.), S. Naravula (Ph.D.), R. Noronha (Ph.D.), X. Ouyang (Ph.D.), S. Pai (M.S.), S. Potluri (Ph.D.), R. Rajachandrasekar (Ph.D.), G. Santhanaraman (Ph.D.), A. Singh (Ph.D.), J. Sridhar (M.S.), S. Sur (Ph.D.), H. Subramoni (Ph.D.), K. Vaidyanathan (Ph.D.), A. Vishnu (Ph.D.), J. Wu (Ph.D.), W. Yu (Ph.D.)

Past Research Scientist: S. Sur

Current Research Scientists: H. Subramoni, X. Lu

Current Senior Research Associate: K. Hamidouche

Current Post-Docs: J. Lin, D. Banerjee

Past Post-Docs: H. Wang, X. Besseron, H.-W. Jin, M. Luo, E. Mancini, S. Marcarelli, J. Vienne

Current Programmer: J. Perkins

Past Programmer: D. Bureddy

Current Research Specialist: M. Arnold

Page 66: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Second International Workshop on High-Performance Big Data Computing (HPBDC)

HPBDC 2016 will be held with the IEEE International Parallel and Distributed Processing Symposium (IPDPS 2016), Chicago, Illinois USA, May 27th, 2016

Keynote Talk: Dr. Chaitanya Baru, Senior Advisor for Data Science, National Science Foundation (NSF); Distinguished Scientist, San Diego Supercomputer Center (SDSC)

Panel Moderator: Jianfeng Zhan (ICT/CAS)

Panel Topic: Merge or Split: Mutual Influence between Big Data and HPC Techniques

Six Regular Research Papers and Two Short Research Papers

http://web.cse.ohio-state.edu/~luxi/hpbdc2016

HPBDC 2015 was held in conjunction with ICDCS '15

http://web.cse.ohio-state.edu/~luxi/hpbdc2015

Page 67: Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Processing

Thank You!

[email protected]

The High-Performance Big Data Project: http://hibd.cse.ohio-state.edu/

Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/

The MVAPICH2 Project: http://mvapich.cse.ohio-state.edu/