San disk axel rosenberg

Exascale Architectures DISAGGREGATED STORAGE & COMPUTE Director ISV & Strategic Partners Axel – C. Rosenberg

Transcript of San disk axel rosenberg

Page 1: San disk axel rosenberg

Exascale Architectures: Disaggregated Storage & Compute

Director ISV & Strategic Partners

Axel-C. Rosenberg

Page 2: San disk axel rosenberg

The Consequences of Infinite Storage Bandwidth

Page 3: San disk axel rosenberg

Creation of a Global Leader in Storage Technology

Enhanced scale and diversity strengthens ability to capture opportunities in an evolving landscape

1 LTM revenues based on most recent public filings and Wall Street research; Western Digital and SanDisk LTM as of 7/1/2016; Toshiba represents March 2016 LTM revenue.

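[Bar chart: LTM revenue (note 1) by vendor; vendor names were not captured. Bar values as printed: $17,8; $15,9; $11,2; $11,2; $10,3; $5,5; $4,8; $3,4; $2,5. Segment labels: (Information Storage), (NAND), (NAND), (NAND), (Storage & Memory), (NAND).]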

© 2016 Western Digital Corporation or affiliates. All rights reserved.

Page 4: San disk axel rosenberg

Leader in Low-Latency Fabrics & Driver Technologies

• Demonstrated industry-leading next-generation low-latency networking technologies suitable for emerging NVMs
• Low-latency RDMA Ethernet fabrics and protocols: NVMe over Fabrics, RDMA fabrics for emerging NVMs, and NVMe fabrics storage/memory appliances
• Enabling use of emerging NVM in networked (data center) environments
• Linux open-source driver implementation ensures flexible support for future products

NVMe over Fabrics Linux implementation: 7 us latency

RDMA to PCIe memory-mapped ReRAM as fast as DRAM!

Page 5: San disk axel rosenberg

Driver: Bifurcation of Data

[Diagram comparing data flows in the Old World and the New World. Labels recovered: Fast Data, Big Data, Data, Transactions, Transactions/Sensors/Logs, Streaming Analytics, Batch Analytics, Active Archive, Deep Archive, Insights.]

Page 6: San disk axel rosenberg

Principles for choosing media by workload

§ Capacity HDDs increasingly being directed at workloads that are byte-rich and access-poor (cold tier)
§ Expanding role of flash beyond caching to primary storage
§ Data temperature rising every year

Source of workload characteristics and 2008, 2013 lines: Aref M. (Google). Source of 2018, 2020 break-even line: calculation by Pankaj M. (CTO) and Fadi A. (ES). Source of access rate by age of data: Kestutis P. (Facebook).

Page 7: San disk axel rosenberg

Driver: Software Defined Storage

Page 8: San disk axel rosenberg

Fabrics (aka Networks)

Page 9: San disk axel rosenberg

Network, Storage, & DRAM trends (log scale)

• Use DRAM bandwidth as a proxy for CPU throughput
• Reasonable approximation for DMA and poor-cache-performance workloads (e.g. storage)

Big difference in slope!

Page 10: San disk axel rosenberg

Network, Storage, & DRAM trends (linear scale)

Infinite Storage Bandwidth

• Same data as last slide, but for the log-impaired
• Storage bandwidth is not literally infinite
• But the ratio of network and storage to CPU throughput is widening very quickly

Page 11: San disk axel rosenberg

SSD BW ∝ Network BW (~10 SSDs per port)
BW/TB ∝ Constant (0.25 GB/s per TB)

[Chart: log-scale vertical axis from 1 to 1,000,000; horizontal axis Year, 2004 to 2022. Series: SSD speed (MB/s), Network speed (MB/s), SSD density (100s of MB), GB/s/TB, Network speed / SSD speed.]

Bits over Fabrics?

Page 12: San disk axel rosenberg

Concept: Disaggregation of Storage

Page 13: San disk axel rosenberg

InfiniFlash™

Winning platinum in Storage Insider IT Awards 2015

§ All flash
§ Only 3 RU
§ 64 TB to 512 TB raw
§ Disruptive cost

Page 14: San disk axel rosenberg

Software Defined Storage (SDS): What's new?

§ Storage software unbundling from hardware
§ Explosion of SDS offerings in recent years
§ Explosion of SDS deployments in recent years
§ SDS changes the responsibilities, not the technology

Page 15: San disk axel rosenberg

Software Defined Storage: what's NOT new

§ Storage performance is hugely affected by seemingly small details
§ All HW is not equal: switches, NICs, HBAs, SSDs all matter
   • Driver abstraction doesn't hide dynamic behavior
§ All SW is not equal: distro, patches, drivers, configuration matter
§ Typically large delta between "default" and "tuned" system perf
§ What's a user to do?

Page 16: San disk axel rosenberg

Controlling Storage System Costs

§ CPU: core counts, clock speeds, cache sizes
§ DRAM: cache sizes
§ Network: wire speed, RDMA capability
§ Storage: redundancy for durability and availability
   • Storage redundancy for availability is primarily a network problem
   • How much redundancy for durability is required?

Page 17: San disk axel rosenberg

What's different about Flash?

§ Annualized Failure Rate (AFR) of flash is superior to HDD
§ Measured HDD AFR ≈ 1.7% (1st year), ≈ 8% (3rd year) [1]
§ Measured flash AFR ≈ 0.61% [2] or even less [0.1%..0.5%] [3]
§ For 100 PB of raw storage in 8 TB drives (HDD & flash)
§ Weekly failure rates of HDD → up to 19 drives/week
§ Weekly failure rates for flash → 1.4 cards/week

[1] "Failure Trends in Large Disk Drive Population", Feb 2007, Google
[2] http://www.intel.com/content/dam/doc/technology-brief/intel-it-validating-reliability-of-intel-solid-state-drives-brief.pdf
[3] http://techreport.com/review/26269/behind-the-scenes-with-intel-ssd-division
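
The weekly failure figures on this slide can be rechecked with simple arithmetic. The sketch below is only an illustration under stated assumptions: 1 PB is taken as 1,000 TB, a year as 52 weeks, and failures are spread evenly over the year. The function name `weekly_failures` is made up for this example.

```python
# Assumptions (not from the slide): decimal units, 52 weeks per year,
# failures spread evenly across the year.
WEEKS_PER_YEAR = 52

def weekly_failures(raw_capacity_tb, drive_tb, afr):
    """Expected device failures per week for a fleet of identical drives."""
    drives = raw_capacity_tb / drive_tb          # 100 PB / 8 TB = 12,500 drives
    return drives * afr / WEEKS_PER_YEAR

RAW_TB = 100_000   # 100 PB of raw storage, as on the slide
DRIVE_TB = 8       # 8 TB per drive or flash card, as on the slide

for label, afr in [("HDD, 1st year", 0.017),
                   ("HDD, 3rd year", 0.08),
                   ("Flash", 0.0061)]:
    print(f"{label}: {weekly_failures(RAW_TB, DRIVE_TB, afr):.1f} failures/week")
```

With the 3rd-year HDD rate this gives a little over 19 drives per week, and with the 0.61% flash rate a little under 1.5 cards per week, in line with the slide.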

Page 18: San disk axel rosenberg

Failure Rate doesn't tell the whole story

§ Mean Time to Data Loss (MTTDL) is what you actually care about
§ To convert AFR into MTTDL you must include repair time (MTTR)
§ Repair operations degrade normal operations
   • HDD rebuilds are highly detrimental to operations (I/O blender)
   • Flash rebuilds only modestly affect operations
§ HDD rebuild times usually limited by operational degradation
§ Flash rebuild times usually limited by network and CPU

Page 19: San disk axel rosenberg

What is Erasure Coding?

§ New deployment architecture for an old idea: RAID
§ Tradeoff of multiple parameters
   • Storage efficiency
   • Parity computation cost
   • Rebuild cost
   • Performance when degraded
§ Typically the goal is a specific MTTDL for the best cost

Page 20: San disk axel rosenberg

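Erasure Coding

[Diagram: Data 0-7 feed a Parity Computation step, producing the chunks below.]

Data 0 | Data 1 | Data 2 | Data 3 | Data 4 | Data 5 | Data 6 | Data 7 | Parity 0 | Parity 1

K = 8, M = 2

K = Number of data chunks
M = Number of parity (syndrome) chunks
Here we have 8 + 2 = 10 chunks to be placed in different fault domains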
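
The K = 8, M = 2 layout on this slide needs a Reed-Solomon-style code to compute two independent parity chunks. As a simpler illustration of the same idea, the sketch below uses a single XOR parity chunk (K = 8, M = 1). This is a deliberate simplification, not the scheme on the slide, and the function names are hypothetical.

```python
from functools import reduce

def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))

def encode(data_chunks):
    """Return the data chunks followed by one XOR parity chunk (K + 1 total)."""
    return list(data_chunks) + [xor_chunks(data_chunks)]

def recover(stripe, lost_index):
    """Rebuild the single missing chunk by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(stripe) if i != lost_index]
    return xor_chunks(survivors)

K = 8
data = [bytes([i]) * 4 for i in range(K)]   # eight small 4-byte data chunks
stripe = encode(data)                        # nine chunks, one per fault domain

# Losing any one chunk (data or parity) is recoverable from the other eight.
assert recover(stripe, 3) == data[3]
assert recover(stripe, K) == stripe[K]
```

A single XOR parity survives one lost chunk only. Tolerating two losses, as with M = 2, requires a second, independent parity, which is where the extra parity computation cost comes from.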

Page 21: San disk axel rosenberg

Erasure Coding Performance

§ Traditional HDD RAID-5/6 performs poorly
   • Computations for parity generation
   • Increased seeks due to additional parity writes (2-3x for random write)
§ Modern flash EC performance sufficient
   • CPUs now optimized for storage (multi-core, better instruction sets)
   • Flash eliminates seek penalty
§ Single server can easily support > 1.5 GB/sec of write encoding BW*

* RGW 4 MB object writes (YCSB), K=4, M=2, Cauchy-good, using 2x E5-2680 2.8 GHz, 8x 16 GB RDIMM

Page 22: San disk axel rosenberg

Performance with Erasure Coding while degraded

§ With 1 node down, ~3 GB/sec of read BW available per server*
§ With 2 nodes down, ~2 GB/sec of read BW available per server*
§ Rebuild can easily attain full device speed
   • Rebuilds are read intensive, which is flash friendly
§ Ceph only rebuilds in-use data to further reduce rebuild times
§ Degraded operations apply during availability outages too!!

* RGW 4 MB object reads (YCSB), K=4, M=2, Cauchy-good, using 2x E5-2680 2.8 GHz, 8x 16 GB RDIMM

Page 23: San disk axel rosenberg

Mean Time To Repair (HDD)

§ Repair rate + degraded operational demands ≤ total BW & IOPS
§ Total BW & IOPS dependent on operational access patterns
   • BW & IOPS during degraded operations much less than normal operations
   • Repair operations typically degrade operations significantly
§ Must choose between:
   • Prioritizing operations over rebuild → ↓ MTTDL
   • Prioritizing rebuild over operations → ↓ app performance
   • Provisioning enough BW & IOPS to cover both → ↑ cost
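
The budget constraint on this slide (repair rate plus operational demand cannot exceed total bandwidth) can be turned into a rough repair-time estimate. The sketch below is a simplified model under assumptions that are not from the slide: bandwidth is the only limit (IOPS are ignored), the operational demand is constant, and units are decimal. The function name and the example figures are illustrative only.

```python
def repair_hours(capacity_tb, total_bw_mb_s, ops_bw_mb_s):
    """Hours to rebuild a full device using whatever bandwidth operations leave over."""
    repair_bw = total_bw_mb_s - ops_bw_mb_s      # repair rate <= total BW - ops demand
    if repair_bw <= 0:
        raise ValueError("no bandwidth left for repair")
    seconds = capacity_tb * 1_000_000 / repair_bw   # TB -> MB, decimal units
    return seconds / 3600

# Illustrative numbers: an 8 TB device behind a 250 MB/s bottleneck.
print(f"rebuild gets everything:   {repair_hours(8, 250, 0):.1f} h")
print(f"operations take 200 MB/s:  {repair_hours(8, 250, 200):.1f} h")
```

Giving operations most of the bandwidth stretches the rebuild several-fold, which lengthens MTTR and therefore lowers MTTDL. That is the first of the three choices listed above.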

Page 24: San disk axel rosenberg

Mean Time To Data Loss (HDD 3x Replication)

§ 8 TB HDD
   § Rebuild limited to 3 days (MTTR = 72 hours)*
   § With AFR (8 TB HDD) = 1.7%
      § 3x replication → MTTDL of 4 x 10^12 hours
   § With AFR (HDD) = 8%
      § 3x replication → MTTDL of 3.5 x 10^10 hours
§ 8 TB SSD
   § Rebuild limited by write BW (8.9 hours)
   § With AFR (8 TB SSD) = 0.61%
      § 2x replication → MTTDL of 1.1 x 10^11 hours

* Assumes 100% (8 TB) rebuild required

Page 25: San disk axel rosenberg

Mean Time To Data Loss (Flash + EC)

§ 8 TB SSD
§ Rebuild limited by write BW (250 MB/sec)
§ With AFR (8 TB SSD) = 0.61%
§ 8+1 erasure coding → MTTDL of 7.8 x 10^9 hours (MTTR 8.9 hours)*
§ 8+2 erasure coding → MTTDL of 1.2 x 10^13 hours (MTTR 17.8 hours)*
§ 16+2 erasure coding → MTTDL of 1.9 x 10^12 hours (MTTR 17.8 hours)*
§ 16+4 erasure coding → MTTDL of 2.0 x 10^18 hours (MTTR 35.6 hours)*

* 4 drive failures for 16+4 EC, 2 drive failures for x+2 EC, 1 drive failure for 8+1 EC. Assumes 100% rebuild required
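
The slides do not say which model produced these MTTDL figures. The sketch below uses the common textbook approximation for a group of n devices that tolerates m failures, with independent failures at a constant rate: MTTDL ≈ MTTF^(m+1) / (n·(n-1)·…·(n-m) · MTTR^m), where MTTF = 8760 / AFR hours. That choice of model is an assumption, and the function name is hypothetical.

```python
from math import prod

HOURS_PER_YEAR = 8760

def mttdl_hours(n, m, afr, mttr_hours):
    """Approximate MTTDL for n devices tolerating m concurrent failures.

    Assumes independent failures at a constant rate (textbook Markov
    approximation); data is lost on the (m+1)-th overlapping failure.
    """
    mttf = HOURS_PER_YEAR / afr
    orderings = prod(range(n - m, n + 1))   # n * (n-1) * ... * (n-m)
    return mttf ** (m + 1) / (orderings * mttr_hours ** m)

AFR_SSD = 0.0061   # 0.61% flash AFR from the slides
for label, n, m, mttr in [("8+1", 9, 1, 8.9),
                          ("8+2", 10, 2, 17.8),
                          ("16+2", 18, 2, 17.8),
                          ("16+4", 20, 4, 35.6)]:
    print(f"{label}: {mttdl_hours(n, m, AFR_SSD, mttr):.2e} hours")
```

Under this model the 16+2 and 16+4 results land close to the slide's 1.9 x 10^12 and 2.0 x 10^18 hours, while the 8+1 result comes out noticeably lower than the slide's 7.8 x 10^9. The author's exact method therefore evidently differs in places, and the sketch should be read as an illustration of how AFR and MTTR combine rather than a reconstruction of the original calculation. Replication fits the same function with n equal to the number of copies and m = n - 1.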

Page 26: San disk axel rosenberg

Erasure Coding Summary

§ Flash w/ EC has superior durability & availability vs. HDD replication
§ Flash w/ EC reduces storage overhead from 3x to 1.1x
§ For large-scale deployment, increased cost of CPU more than balanced by reduced network and storage costs
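
The overhead figures in this summary follow from the ratio of stored chunks to data chunks, (K + M) / K. The sketch below is a minimal illustration; treating N-way replication as K = 1, M = N - 1 is a modelling convenience, not something stated on the slides.

```python
def storage_overhead(k, m):
    """Raw bytes stored per byte of user data for K data + M redundancy chunks."""
    return (k + m) / k

print(storage_overhead(1, 2))    # 3x replication: one data copy plus two replicas
print(storage_overhead(8, 1))    # 8+1 erasure coding
print(storage_overhead(16, 2))   # 16+2 erasure coding
print(storage_overhead(8, 2))    # 8+2 erasure coding
```

Both 8+1 and 16+2 come to 1.125, which is consistent with the "1.1x" quoted above; 8+2 and 16+4 come to 1.25.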

Page 27: San disk axel rosenberg

What happens as we get closer to the limit?

Page 28: San disk axel rosenberg

Let's Get Small!

§ New denser server form factors
   • Blades
   • Sleds
§ Good short-term solutions

Page 29: San disk axel rosenberg

Effects Of The CPU/DRAM Bottleneck

§ Storage cost = Media + Access + Management
§ Shared-nothing architecture conflates access and management
§ Storage costs will become dominated by management cost
§ Storage costs become CPU/DRAM costs

Page 30: San disk axel rosenberg

Embracing The CPU/DRAM Bottleneck

§ Move management to upper layers where CPU can be right-sized by client
§ What kind of media access do I want?
   • Simple enough functionality to be done directly in drive hardware: NO CPU
   • Allow direct access throughout the compute cluster over a network
   • Just enough machinery to enable coarse-grained sharing
§ In short, you really want a SAN!
   – Or more technically, Fabric Connected Storage

Page 31: San disk axel rosenberg

Not Your Father's SAN

§ Three problems with current SAN
   • Fibre Channel transport
   • SCSI access protocol
   • Drive-oriented storage allocation
§ All of these want to be updated
   • Fibre Channel is brittle and costly
   • SCSI initiators have long code paths catering to seldom-used configurations
   • Robust sub-drive storage allocation

Page 32: San disk axel rosenberg

SAN 2.0

§ NVMe over Fabrics
§ 1.0 spec is out
§ Simple enough for direct hardware execution of data path ops
§ Minimal initiator code path lengths improve performance
§ Namespaces allow sub-drive allocations
§ Not mature enough for enterprise deployment, yet

Page 33: San disk axel rosenberg

Second Generation SAN 2.0

§ Soon, NICs will forward NVMe operations to local PCIe devices
§ CPU removed from the software part of the data path
§ CPU is still needed for the hardware part of the data path
§ IOPS improve, BW is unchanged
§ Significant CPU freed for application processing
§ Getting closer to the wall!

Page 34: San disk axel rosenberg

Third Generation SAN 2.0, Imagined

§ New generation of combined SSD controller and NIC
   • Rethink of interfaces eliminates DRAM buffering
§ Network goes right into the drive
§ No CPU to be found
§ Works well with rack-scale architecture

Page 35: San disk axel rosenberg

Let's Get Really Small

§ Disaggregated / Rack Scale Architecture
   • Fabric connected
   • Independently scale compute, networking and storage

Page 36: San disk axel rosenberg

What's It All Mean?

§ New form factors are in everybody's future
§ The coming avalanche of storage bandwidth wants to be free
   • Not imprisoned by a CPU
§ Rack Scale Architecture allows new storage/compute configs
§ Storage will be increasingly "Software Defined" as the HW evolves

Page 37: San disk axel rosenberg

Data Center Solutions

InfiniFlash IF150

8 TB Flash-Card Innovation
• Enterprise-grade power-fail safe
• Latching integrated & monitored
• Directly samples air temp
• Form factor enables lowest-cost SSD

Non-disruptive Scale-Up & Scale-Out
• Capacity on demand
   • Serve high-growth Big Data
   • 3U chassis starting at 64 TB up to 512 TB
   • 8 to 64 8 TB flash cards (SAS)
• Compute on demand
   • Serve dynamic apps without IOPS/TB bottlenecks
   • Add up to 8 servers

Page 38: San disk axel rosenberg

Disaggregation is the Key to Breakthrough Economics

Old Model
§ Monolithic
§ Proprietary storage OS
§ Costly: $$$$$

New Model
§ Disaggregated
§ Software Defined Stack
§ Green!
§ High performance
§ Cost effective
§ Flexible

[Diagram labels: Standard x86 Servers; InfiniFlash HW; Software Defined Storage]

Page 39: San disk axel rosenberg

Software Defined All-Flash Storage: the Disaggregated Model for Scale

SanDisk FlashStart™ and FlashAssure™
• Installation and Training Services
• 24/7, global onsite TSANET Collaborative Solutions Support
• 2 hr parts delivery, 750+ global locations

SanDisk Flash
• Shared Flash Storage → InfiniFlash
• Flash in Server

[Diagram labels: SW Choice (FlashSoft, ION Accelerator); Compute Choice]

Page 40: San disk axel rosenberg

IF100 + SuperMicro + Ceph: Scale-Out Solution

InfiniFlash IF500 All-Flash Storage System: Block and Object Storage Powered by Ceph

§ Ultra-dense high-capacity flash storage
   – 512 TB in 3U, scale-out software for PB-scale capacity
§ Highly scalable performance
   – Industry-leading IOPS/TB
§ Cinder, Glance and Swift storage
   – Add/remove server & capacity on demand
§ Enterprise-class storage features
   – Automatic rebalancing
   – Hot software upgrade
   – Snapshots, replication, thin provisioning
   – Fully hot swappable, redundant
§ Ceph optimized for SanDisk flash
   – Tuned & hardened for InfiniFlash

Page 41: San disk axel rosenberg

2016 InfiniFlash Customer Momentum (partial listing)

Customer | Vertical | Application/Platform | Solution Domain
US University | Education | Spectrum Scale (GPFS) | CLOUD
Intl. Credit Card | Financial Services | Oracle | DATABASES, ANALYTICS
US Bank | Financial Services | VMware, OpenStack | VIRTUALIZATION
Global Leading ISV | Tech | Enterprise Cloud | CLOUD
CompuGroup (Germany/US) | Healthcare/Life Sciences | DataCore | CLOUD
Global Online Shop | Online Commerce | NoSQL | BIG DATA ANALYTICS
Major League Baseball | Media & Entertainment | Tegile | BIG DATA MEDIA
Japanese Telco | Telco | MapR | BIG DATA ANALYTICS
US Telco | Media & Entertainment | OpenStack | BIG DATA MEDIA
GleSys (CSP Sweden) | Cloud Service Provider | Nexenta | CLOUD
Intl. Analyst | Financial | OpenStack | BIG DATA & CLOUD
Canadian Broadcasting Corp | Media & Entertainment | Nexenta | BIG DATA MEDIA

Page 42: San disk axel rosenberg

© 2015 SanDisk Corporation. All rights reserved. SanDisk is a trademark of SanDisk Corporation, registered in the United States and other countries. InfiniFlash and SanDisk ION Accelerator are trademarks of SanDisk Corporation. Other brand names mentioned herein are for identification purposes only and may be the trademarks of their respective holder(s).