BCW: Buffer-Controlled Writes to HDDs for SSD-HDD Hybrid Storage Server
Shucheng Wang1, Ziyi Lu1, Qiang Cao1, Hong Jiang3, Jie Yao2, Yuanyuan Dong4, and Puyuan Yang4
1,2 Huazhong University of Science and Technology, 3 University of Texas at Arlington, 4 Alibaba Group
Outline
• Background
Ø SSD-HDD hybrid storage
• Challenge and Motivation
Ø Unbalanced device utilization
Ø HDD buffered write behavior
• Design
Ø Buffer-Controlled Writes
Ø Mixed IO scheduler
• Evaluations
SSD-HDD hybrid storage
• SSD and HDD
Ø SSD (Solid State Drive) has high-speed performance
Ø HDD (Hard Disk Drive) has large capacity and low cost
• SSD-HDD hybrid storage
Ø Applications
Ø Properties
   Ø Cost-effectiveness
   Ø High speed and low latency
Unbalanced Device Utilization
• SSDs suffer from high write pressure
Ø Highly intensive writes in each SSD: peak write rate > 10K requests per second (RPS), data written per day > 3 TB
Ø Long tail latency: 99th-percentile write latency > 10 ms, 99.9th-percentile write latency > 50 ms
Ø Large queue length: more than 100 blocked write requests in the queue
Unbalanced Device Utilization
• HDDs are underutilized
[Figure: average device IO utilization (%) of SSD and HDD for workload types A, B, C, and D]
Ø Low HDD utilization: 1/5 to 1/2 of the SSD utilization
Ø HDDs are in the idle state 90% to 95% of the time
We want to exploit the underutilized HDDs to relieve the pressure on SSDs in hybrid storage nodes
Challenges
? Could the HDDs reach μs-level (< 1 ms) write latency?
• The method is feasible when: average latency of HDD < tail latency of SSD
Motivational Test
• Issue sequential (append-only) and continuous (closed-loop) writes to HDDs
• Tested HDD devices

Manufacturer     Capacity  Model         Recording Technology  Sequential Write Bandwidth (MB/s)
Western Digital  10 TB     WD100EFAX     PMR                   200
Western Digital  8 TB      WD8004FRYZ    PMR                   180
Western Digital  4 TB      WD40EZRZ      PMR                   180
Seagate          8 TB      ST8000DM0004  PMR                   180
Seagate          4 TB      ST4000DM004   SMR                   180
Results Overview
10 TB Western Digital HDD
HDD write behaviors
• HDDs can reach μs-level write latency, especially for small requests
Ø 35 μs for 4 KB writes
Ø 66 μs for 16 KB writes
Ø 180 μs for 64 KB writes
HDD write behaviors
• μs-level write latency, especially for small requests
• ms-level latency spikes
Ø Some spikes are higher than 10 ms
• Fixed write latency period
Ø The fastest write stage lasts for 16 MB of written data
Ø The interval between two highest latency spikes is 8 MB of written data
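The periodicity above can be checked on a latency trace by measuring how much data is written between consecutive spikes. The following is a minimal sketch under stated assumptions: `spike_intervals` and its 10 ms `spike_us` cutoff are hypothetical names and values chosen for illustration, and the trace is synthetic, not measured data.

```python
def spike_intervals(latencies_us, request_bytes, spike_us=10_000):
    """Return the bytes written between consecutive latency spikes.

    latencies_us: per-request latencies of fixed-size sequential writes.
    spike_us: assumed cutoff for a "spike" (10 ms here, an illustrative choice).
    """
    spike_idx = [i for i, lat in enumerate(latencies_us) if lat >= spike_us]
    return [(b - a) * request_bytes for a, b in zip(spike_idx, spike_idx[1:])]


# Synthetic trace (not measured data): 64 KB writes at 180 us,
# with a 12 ms spike on every 128th request.
trace = [12_000 if i % 128 == 127 else 180 for i in range(512)]
intervals = spike_intervals(trace, request_bytes=64 * 1024)
print([n // (1024 * 1024) for n in intervals])  # [8, 8, 8] (MB between spikes)
```

On a real trace the cutoff would need to be chosen per device, since the slides only state that some spikes exceed 10 ms.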
How does this happen?
[Figure: write IO passes through the built-in buffer inside the HDD device before reaching the disc]
Challenges
✓ HDDs can reach μs-level write latency
? How to control these μs-level latency writes in HDDs?
HDD Buffered-Write Model
[Figure: write latency over a buffered write sequence after a Sync, showing F, M, and S writes with the W_f and W_m spans marked]
• Three types of HDD buffered writes
Ø Fast write, F (low latency)
Ø Mid write, M (mid latency)
Ø Slow write, S (high-latency spike)
• Buffered Write Sequence (after a Sync): starts with a Fast stage, followed by one or more Slow-and-Mid stage pairs
Ø The Fast stage lasts for W_f of data written
Ø The Mid stage lasts for W_m of data written
Ø The Slow stage contains one Slow write
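The buffered write sequence can be illustrated by generating the expected state label for each request. This is a minimal sketch of the model as stated on the slide, not the drive's firmware logic; `buffered_write_states` is a hypothetical name, and treating the Slow write as not counting toward W_m is an assumption.

```python
def buffered_write_states(n_requests, request_bytes, wf, wm):
    """Label sequential writes issued after a Sync as F, M, or S.

    wf / wm: data written in the Fast / Mid stage before a Slow write occurs.
    """
    states = []
    stage, written = "F", 0  # written = data written in the current stage
    for _ in range(n_requests):
        limit = wf if stage == "F" else wm
        if written >= limit:
            # The stage is exhausted: this request is the single Slow write,
            # and a Mid stage starts after it.
            states.append("S")
            stage, written = "M", 0
        else:
            states.append(stage)
            written += request_bytes
    return states


# Toy units: W_f = 4 requests, W_m = 2 requests.
print("".join(buffered_write_states(10, 1, wf=4, wm=2)))  # FFFFSMMSMM
```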
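Write-state Predictor
• The next HDD write state can be predicted from:
① The write state of the current request
Ø Each write request can only be one of F, M, or S
② The current ADW, W_f, and W_m values
Ø A: ADW < W_f (in the F state) or W_m (in the M state)
Ø U: ADW ≥ W_f (in the F state) or W_m (in the M state)
③ The Sync operation
Ø Takes the next write state back to F
[Figure: state-transition diagram over the F, M, and S write states, with transitions labeled A, U, Write, and Sync]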
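The three inputs of the predictor map naturally onto a small transition function. The sketch below is one reading of the state diagram (F and M stay put on A, move to S on U; S moves to M on the next write; Sync returns to F); `predict_next_state` is a hypothetical name, not an interface from the talk.

```python
def predict_next_state(current, adw, wf, wm, sync=False):
    """Predict the write state of the next request.

    current: state of the current request, one of "F", "M", "S".
    adw: data written so far in the current F or M stage.
    sync: True if a Sync is issued before the next write.
    """
    if current not in ("F", "M", "S"):
        raise ValueError(f"unknown write state: {current!r}")
    if sync:
        return "F"            # a Sync takes the next write back to F
    if current == "S":
        return "M"            # a Slow write is followed by a Mid stage
    limit = wf if current == "F" else wm
    # "A" (adw < limit): stay in the current stage; "U": next write is Slow.
    return current if adw < limit else "S"
```

For example, with W_f = 16 MB and 16 MB already written in the F stage, the function returns "S", which is the signal BCW uses to stop accepting user requests.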
Buffer-Controlled Writes (BCW)
• Goal:
Ø Ensure that user writes land in the F or M write state, and avoid user writes in the S write state
• Steps:
Ø Profile all key parameters (if unknown)
Ø Invoke a Sync operation when starting BCW, to flush all data in the HDD built-in buffer
Buffer-Controlled Writes (BCW)
• Actively pad non-user data to the HDD when there are no user requests
Ø PF: padding for the F and M stages (i.e., 4 KB)
Ø PS: padding for the S stage (i.e., 64 KB)
• Write user data to the HDD when there are user requests in the queue
[Figure: timelines after a Sync showing USER and PF writes in the F state]
Buffer-Controlled Writes (BCW)
• When the current data written (ADW) in the F or M state is close to the W_f or W_m value:
① The next HDD write state is predicted as S
② Stop receiving user requests
③ Continuously pad PS until an S write state is detected
④ Start receiving user requests again
[Figure: timeline after a Sync with USER and PF writes in the F state, followed by a PS write in the S state]
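Steps ① to ④ can be sketched as a control loop against a toy device that follows the F / S-and-M model. Everything here is illustrative: `SimulatedHDD`, `bcw_run`, and the `margin` parameter (how close to W_f or W_m counts as "close") are hypothetical names, and only the 4 KB and 64 KB padding sizes come from the slides.

```python
from collections import deque

PF_BYTES = 4 * 1024    # padding size in the F and M stages (from the slides)
PS_BYTES = 64 * 1024   # padding size for the S stage (from the slides)


class SimulatedHDD:
    """Toy device following the buffered-write model; not a real disk API."""

    def __init__(self, wf, wm):
        self.wf, self.wm = wf, wm
        self.sync()

    def sync(self):
        self.stage, self.written = "F", 0

    def write(self, nbytes):
        """Accept one write and return the state it was served in."""
        limit = self.wf if self.stage == "F" else self.wm
        if self.written >= limit:
            self.stage, self.written = "M", 0
            return "S"
        self.written += nbytes
        return self.stage


def bcw_run(hdd, user_queue, ticks, wf, wm, margin=PS_BYTES):
    """Run `ticks` write slots and return a log of (kind, state) pairs.

    Assumes every user request is no larger than `margin`.
    """
    hdd.sync()                      # start BCW from a flushed buffer
    stage, adw = "F", 0             # controller-side view of the device
    log = []
    for _ in range(ticks):
        limit = wf if stage == "F" else wm
        if adw + margin >= limit:
            # Next write may be S: hold user requests and pad with PS
            # until the device actually reports an S write.
            state = hdd.write(PS_BYTES)
            log.append(("PS", state))
            if state == "S":
                stage, adw = "M", 0
            else:
                adw += PS_BYTES
        elif user_queue:
            nbytes = user_queue.popleft()
            log.append(("USER", hdd.write(nbytes)))
            adw += nbytes
        else:
            log.append(("PF", hdd.write(PF_BYTES)))
            adw += PF_BYTES
    return log


queue = deque([4 * 1024] * 10)      # ten 4 KB user requests
log = bcw_run(SimulatedHDD(256 * 1024, 128 * 1024), queue, ticks=200,
              wf=256 * 1024, wm=128 * 1024)
print(all(kind == "PS" for kind, state in log if state == "S"))  # True
```

In this toy run every S write is absorbed by PS padding, so no user request sees the slow state. A real implementation would detect S from measured latency rather than from a returned label.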
Buffer-Controlled Writes (BCW)
• The same PF and PS padding steps apply in the M stages
[Figure: timeline after a Sync with USER, PF, and PS writes across the F, S, and M states, with ADW and W_m marked]
Challenges
✓ HDDs can reach μs-level write latency
✓ BCW provides a proactive and controllable buffer writing approach
? How to leverage BCW effectively in the hybrid storage nodes?
Mixed IO scheduler (MIOS)
• A scheduler that schedules mixed IOs at runtime
• Architecture
Ø A request queue for each SSD and HDD; MIOS monitors the queue lengths l_SSD(t) and l_HDD(t)
Ø A log file in each HDD: a device file that stores BCW writes in an append-only manner
[Figure: MIOS architecture. User writes pass through the scheduling strategy, which uses l(t) and the HDD flag, into per-device request queues for the SSDs and HDDs; each HDD holds a log file]
Mixed IO scheduler (MIOS)
• Key parameters:
Ø Threshold L: the SSD queue length l_SSD(t) at which an SSD write is at least as slow as an M-state write in the HDD
Ø flag_HDD: is the HDD available (BCW controlled)?
Scheduling strategies in MIOS
• MIOS_D: redirect SSD writes to HDDs when:
Ø l_SSD is higher than the threshold L, AND
Ø the HDD is in the F or M write state with BCW
• MIOS_E:
Ø Same as MIOS_D when l_SSD > L
Ø Additionally redirects writes when l_SSD < L, but only while the HDD is in the F write state
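The two strategies reduce to a small routing decision per write. The sketch below is a hedged illustration: `mios_target` is a hypothetical name, `hdd_flag` stands in for flag_HDD, and treating l_SSD equal to L as "not above the threshold" is an assumption, since the slide only covers l_SSD > L and l_SSD < L.

```python
def mios_target(policy, l_ssd, threshold, hdd_state, hdd_flag=True):
    """Return "HDD" if the write should be redirected, else "SSD".

    policy: "MIOS_D" or "MIOS_E".
    l_ssd: current SSD request-queue length.
    threshold: the queue-length threshold L.
    hdd_state: predicted HDD write state under BCW ("F", "M", or "S").
    hdd_flag: whether the HDD is available (BCW controlled).
    """
    if policy not in ("MIOS_D", "MIOS_E"):
        raise ValueError(f"unknown policy: {policy!r}")
    if not hdd_flag:
        return "SSD"
    if l_ssd > threshold:
        # Both policies: redirect while the HDD is in the F or M state.
        return "HDD" if hdd_state in ("F", "M") else "SSD"
    # Below the threshold only MIOS_E redirects, and only in the F state.
    if policy == "MIOS_E" and hdd_state == "F":
        return "HDD"
    return "SSD"
```

For instance, with L = 5 and an SSD queue length of 3, MIOS_D keeps the write on the SSD, while MIOS_E sends it to the HDD if the HDD is in the F state.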
Evaluation Setup
• Evaluation environment

System  Linux version 4.15.0-52-generic
CPU     Intel Xeon E5-2696 v4 (2.20 GHz, 22 CPUs)
Memory  128 GB
HDDs    Western Digital 10 TB (default), Western Digital 4 TB, Seagate 4 TB
SSDs    Samsung 960 EVO 256 GB (NVMe, 2000 MB/s)

• Comparisons
Ø Baseline: Pangu workload replay (writing all data into SSDs)
Ø MIOS_D
Ø MIOS_E
Evaluation Setup
• Workload characteristics

Workload Types                 A                    B                  C                   D
Business                       Cloud Computing      Cloud Storage      Structured Storage  Structured Storage
SSD Writes (GB)                14.7                 61.2               7.2                 7.5
SSD Write Requests (millions)  0.43                 4.4                4.8                 4.7
Note                           Lowest IO intensity  Most written data
Write Performance
• Average, 99th-percentile, and 99.9th-percentile tail latency
Ø Reduced by 65%, 85%, and 95%, respectively, in workload B
Ø Reduced by 2%, 3.5%, and 30%, respectively, in workload A
Queue Length Reduction
• Workload A: minimum (15%) reduction in queue lengths
• Workload B: maximum (95%) reduction in queue lengths
SSD Written Data Reduction
• MIOS_D:
Ø Reduced by 5.5% compared with Baseline in workload A
Ø Reduced by 15.3% and 16% in workloads C and D
• MIOS_E:
Ø Reduced by 93.3% compared with Baseline in workload A
Ø Reduced by 71% and 72% compared with Baseline in workloads C and D
Write Performance: MIOS_D vs MIOS_E
• MIOS_E leads to worse latency performance (a tradeoff)
• In workload A, MIOS_E has 1.4x higher average latency, 1.7x higher 99th-percentile latency, and 6.6x higher 99.9th-percentile latency than MIOS_D
[Figure annotations: 6.6x, 87%]
Experiment with other HDDs
• Different types of HDDs do not have a significant impact
Ø The maximum difference in average and tail latency is less than 3%
Ø The difference in the amount of data written to the SSD, and in the number of requests processed by the SSD, across different HDDs is less than 5%

                                Baseline  WD 10 TB  WD 4 TB  SE 4 TB
SSD written data (GB)           61.2      4.1       4.2      4.4
SSD write requests (thousands)  4453      720       724      769