Transcript of ThyNVM slides: soft.cs.tsinghua.edu.cn/os2atc2015/ppt/rjl.pdf
ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems
Jinglei Ren∗† Jishen Zhao‡ Samira Khan†ʹ Jongmoo Choi+† Yongwei Wu∗ Onur Mutlu†
∗Tsinghua University †Carnegie Mellon University ‡University of California, Santa Cruz ʹUniversity of Virginia +Dankook University
Emerging byte-addressable non-volatile memory (NVM)
Persistent memory: a new tier in the memory and storage stack
NVM is coming…
Example: add a data item to a persistent linked list (Step 1, Step 2)
A crash between the steps leaves the list broken and data lost
Current solution: wrap the steps in one transaction, or use another specific software-based interface
New requirement for persistent memory data: crash consistency
Executive Summary
• Motivation: Limitations of software-based crash consistency support
  • Significant burden on programmers, e.g., adopting new interfaces
  • Limited use cases, e.g., legacy applications, non-transactional programs
• Idea: Software-transparent crash consistency support through a new dual-scheme checkpointing mechanism for persistent memory
• Observation: A tradeoff between application stall time (checkpointing latency) and metadata storage overhead
  • Small-granularity scheme: ✔ short checkpointing latency, ✘ large metadata
  • Large-granularity scheme: ✘ long checkpointing latency, ✔ small metadata
• Mechanism: Combination of two checkpointing schemes at two granularities
  • Realizing ✔ short checkpointing latency: cooperation of the two schemes
  • Realizing ✔ small metadata: sparse updates → small-granularity scheme; dense updates → large-granularity scheme
• Evaluation: Within 4.9% slowdown of an idealized DRAM-only system with crash consistency support at no cost
Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation
Motivation
Inefficiency of software-based crash consistency support
void TMhashtable_update(TM_ARGDECL hashtable_t* ht,
                        void* key, void* data) {
    list_t* chain = get_chain(ht, key);
    pair_t* pair;
    pair_t updatePair;
    updatePair.first = key;
    pair = (pair_t*)TMLIST_FIND(chain, &updatePair);
    pair->second = data;
}
Transactional interface for third-party libraries
Manually declaring transactional/persistent components
Prohibited operation, will cause a runtime error
(Potential) program bug for certain implementations
ThyNVM - Feature I: Software-transparent crash consistency support
void hashtable_update(hashtable_t* ht,
                      void* key, void* data) {
    list_t* chain = get_chain(ht, key);
    pair_t* pair;
    pair_t updatePair;
    updatePair.first = key;
    pair = (pair_t*)list_find(chain, &updatePair);
    pair->second = data;
}
Valid operation; persistent memory will ensure crash consistency
Unmodified syntax and semantics
Motivation
Inefficiency of logging and copy-on-write (CoW)
• Logging
  • Large space for recording every update
  • Slow recovery for replaying the log
• Copy-on-Write
  • Large space for redundant unmodified data
  • Slow operation for copying unmodified data

ThyNVM - Feature II: An efficient dual-scheme checkpointing mechanism
Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation
Observation
Two concerns in checkpointing, and their tradeoff:
• Latency of checkpointing the working copy of data
• Metadata overhead to track the working copy/checkpoint of data
Checkpointing granularity
• Small granularity leads to large metadata size
• Large granularity leads to small metadata size
Location of the working copy of data
• Caching the working copy in DRAM: write back both dirty data and metadata during checkpointing (long latency)
• Storing the working copy in NVM: persist only metadata during checkpointing (short latency); needs remapping of data locations
Observation
Tradeoff matrix: location of working copy × checkpointing granularity
❶ DRAM (based on writeback) + small granularity (cache block): inefficient (✘ large metadata overhead, ✘ long checkpointing latency)
❷ DRAM (based on writeback) + large granularity (page): partially efficient (✔ small metadata overhead, ✘ long checkpointing latency) → Checkpointing Scheme II
❸ NVM (based on remap) + small granularity (cache block): partially efficient (✘ large metadata overhead, ✔ short checkpointing latency, ✔ fast remapping) → Checkpointing Scheme I
❹ NVM (based on remap) + large granularity (page): inefficient (✔ small metadata overhead, ✔ short checkpointing latency, ✘ slow remapping, on the critical path)
Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation
Dual-Scheme Checkpointing
Definitions
• Execution model: epochs
• System model: the hybrid architecture
[Timeline: execution → checkpointing → execution → checkpointing; Epoch 0 (last epoch), Epoch 1 (active epoch)]

[Architecture diagram: CPU cores share an LLC; the memory controller contains the address translation tables (BTT, PTT) and the DRAM/NVM read and write queues; DRAM and NVM sit behind the controller]

The last checkpoint: C_last. The active working copy: W_active.
Block Translation Table (BTT): metadata for the small-granularity scheme
Page Translation Table (PTT): metadata for the large-granularity scheme
Hardware-based design: software uses regular load/store instructions
Recovery: roll back to the last checkpoint C_last using the BTT (in the memory controller)
Checkpointing Scheme I: Block Remapping(location in the tradeoff: small granularity + NVM in-place)
[Diagram: a cache-block-sized write to block P is remapped by the BTT (in the memory controller) to a new NVM block Q; the old block keeps C_last, the new block holds W_active]

During execution: remap the working copy to a new address in NVM, to protect the last checkpoint
During checkpointing: only need to persist the BTT; W_active becomes C_last without any data movement
[Diagram: the BTT backup is persisted in NVM alongside the remapped blocks P and Q]
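As a rough illustration of block remapping, here is a minimal C sketch of a BTT-like table. All names (`btt_t`, `btt_write`, the bump allocator, the toy sizes) are invented for illustration; the actual ThyNVM design is a hardware structure in the memory controller, not software.

```c
#include <string.h>

#define NBLOCKS 8            /* toy NVM with 8 cache-block slots   */

typedef struct {
    int map[NBLOCKS];        /* logical block -> physical NVM slot */
    int dirty[NBLOCKS];      /* remapped since the last checkpoint */
    int next_free;           /* bump allocator for fresh slots     */
} btt_t;

void btt_init(btt_t *b) {
    for (int i = 0; i < NBLOCKS; i++) { b->map[i] = i; b->dirty[i] = 0; }
    b->next_free = NBLOCKS;  /* spare slots live past the originals */
}

/* A store during execution: redirect the write to a fresh NVM slot so
 * the slot holding the last checkpoint C_last stays untouched. */
int btt_write(btt_t *b, int logical) {
    if (!b->dirty[logical]) {
        b->map[logical] = b->next_free++;  /* remap the working copy */
        b->dirty[logical] = 1;
    }
    return b->map[logical];  /* physical slot the data goes to */
}

/* Checkpointing: persist only the BTT itself; the working copies
 * become the new checkpoint with no data movement. */
void btt_checkpoint(btt_t *b) {
    memset(b->dirty, 0, sizeof b->dirty);  /* W_active is now C_last */
}
```

Note how checkpointing touches no data blocks at all, which is why this scheme has short checkpointing latency despite its large per-block metadata.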
Dual-Scheme Checkpointing
Checkpointing Scheme II: Page Writeback(location in the tradeoff: large granularity + DRAM cache)
During execution: update the cached hot pages in DRAM (W_active)
During checkpointing: write back W_active and the PTT to NVM
[Diagram: page P holds C_last in NVM; the PTT (in the memory controller) maps P to DRAM page P* holding W_active; a write to a block in P goes to P*; at checkpoint time, P* and a PTT backup are written back to NVM]
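The page writeback scheme can be sketched in the same toy style. Again, every name (`ptt_t`, `ptt_write`, the sizes) is illustrative, not the paper's hardware interface.

```c
#include <string.h>

#define PAGE   4             /* toy page: 4 words      */
#define NPAGES 2             /* toy memory: 2 pages    */

typedef struct {
    int nvm[NPAGES][PAGE];   /* checkpointed pages (C_last)      */
    int dram[NPAGES][PAGE];  /* cached working copies (W_active) */
    int cached[NPAGES];      /* PTT: is the page cached in DRAM? */
} ptt_t;

/* A store during execution goes to the DRAM copy of the page. */
void ptt_write(ptt_t *p, int page, int off, int val) {
    if (!p->cached[page]) {  /* first touch: pull the page into DRAM */
        memcpy(p->dram[page], p->nvm[page], sizeof p->nvm[page]);
        p->cached[page] = 1;
    }
    p->dram[page][off] = val;  /* the NVM copy (C_last) is untouched */
}

/* Checkpointing: write back every cached page, then clear the PTT.
 * This bulk copy is what makes the checkpointing latency long. */
void ptt_checkpoint(ptt_t *p) {
    for (int i = 0; i < NPAGES; i++)
        if (p->cached[i]) {
            memcpy(p->nvm[i], p->dram[i], sizeof p->dram[i]);
            p->cached[i] = 0;
        }
}
```

One PTT entry covers a whole page, so the metadata stays small; the cost is moved into the data writeback at checkpoint time.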
Dual-Scheme Checkpointing
Coordinating the Two Schemes
• Key Mechanism I: Realizing short application stall time by cooperation of dual schemes
ThyNVM: overlap program execution and checkpointing time.
[Timeline: Epoch 0's checkpointing phase overlaps Epoch 1's execution; the checkpointing time is mainly due to the page writeback scheme, while the block remapping scheme finishes checkpointing fast]
[Timeline: Epochs 0 to 2; each epoch's checkpointing overlaps the next epoch's execution]
• Two schemes operate separately for different memory regions
• Page writeback does checkpointing in the background
• Block remapping takes charge of all memory regions temporarily
Dual-Scheme Checkpointing
Coordinating the Two Schemes
• Key Mechanism I: Realizing short application stall time by cooperation of dual schemes

[Timeline: Epochs 0 to 2; while the active epoch executes, the system holds the penultimate checkpoint C_penult, the last checkpoint C_last, and the working copy W_active; recovery rolls back to C_last]
Dual-Scheme Checkpointing
Coordinating the Two Schemes
• Key Mechanism I: Summary of flow

Store received → Still checkpointing C_last?
• No, and the address misses in the PTT → block remapping: write W_active^block to NVM (protecting C_last); update BTT
• No, and the address hits in the PTT → page writeback: write W_active^page to DRAM (protecting C_last); update PTT
• Yes → cooperation: buffer W_active^block in DRAM (protecting C_penult); update BTT
→ Acknowledge
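The decision flow above can be condensed into a small routing function. This is a best-effort sketch of the flowchart; the enum and function names are invented here, not taken from the paper.

```c
/* Where does an incoming store go? (illustrative names) */
typedef enum {
    TO_DRAM_PAGE,    /* page writeback: write W_active^page to DRAM  */
    TO_NVM_BLOCK,    /* block remapping: write W_active^block to NVM */
    BUFFER_IN_DRAM   /* cooperation: buffer the block in DRAM        */
} route_t;

route_t route_store(int hits_ptt, int still_ckpt_last) {
    if (still_ckpt_last)
        /* C_last is still being checkpointed: block remapping takes
         * charge temporarily; buffering protects C_penult. */
        return BUFFER_IN_DRAM;
    if (hits_ptt)
        /* Page-writeback region: update the cached page, update PTT. */
        return TO_DRAM_PAGE;
    /* Block-remapping region: remap to a fresh NVM block, update BTT. */
    return TO_NVM_BLOCK;
}
```

The key property is that no path ever overwrites a live checkpoint: the NVM copies of C_last and C_penult stay intact whichever branch a store takes.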
Dual-Scheme Checkpointing
Coordinating the Two Schemes
• Key Mechanism II: Realizing small metadata overhead by matching write patterns with dual schemes
• Estimate spatial locality by the number of stores in the last epoch on individual blocks/pages (recorded in the BTT/PTT)
• Switch schemes by updating the PTT and migrating necessary data
Spatial locality | Write pattern | Page-level characteristics | Granularity for min. metadata | Matching scheme
Low | Random, sparse, of small sizes | Small portion of dirty data | Small (cache block size) | Block remapping
High | Sequential, dense, of large sizes | Large portion of dirty data | Large (page size) | Page writeback
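The pattern-matching policy in the table reduces to a per-page decision. A minimal sketch, assuming a per-page store counter from the last epoch; the threshold value is invented for illustration and is not from the paper.

```c
#define DENSE_THRESHOLD 8   /* illustrative: stores/epoch considered "dense" */

typedef enum { BLOCK_REMAP, PAGE_WRITEBACK } scheme_t;

/* Dense (high-locality) pages amortize one PTT entry over many dirty
 * blocks; sparse pages would waste a whole-page writeback, so they
 * stay under block remapping. */
scheme_t pick_scheme(int stores_last_epoch) {
    return stores_last_epoch >= DENSE_THRESHOLD ? PAGE_WRITEBACK
                                                : BLOCK_REMAP;
}
```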
Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation
Evaluation
• Experiment setup
  • Simulator based on gem5
  • DRAM and NVM, both with DDR3 interfaces
  • NVM: 40 ns row hit (128/368 ns clean/dirty miss)
• Systems in comparison
  • Ideal DRAM: full DRAM; no cost in supporting crash consistency
  • Ideal NVM: full NVM; no cost in supporting crash consistency
  • Journaling (one form of logging)
  • Shadow paging (one form of copy-on-write)
Evaluation
• Workload I: Micro-benchmarks with different write patterns
[Plots (a) Random and (b) Sequential: total amount of NVM write traffic (MB), broken into CPU, Migration, and Checkpoint components, and % of execution time spent on checkpointing, for Journal, Shadow, and ThyNVM]
• ThyNVM reduces the NVM write traffic by 10.8%/14.4% compared to Journaling and Shadow paging.
• Journaling/Shadow paging spend 18.9%/15.2% of execution time on checkpointing, while ThyNVM reduces this overhead to 2.5% on average.
Evaluation
• Workload II: In-memory storage (hashtable-based key-value store)
• ThyNVM provides 8.8% higher throughput than Journaling
• ThyNVM provides 29.9% higher throughput than Shadow paging
[Plot: transaction throughput (KTPS, 50-350) vs. request size (16-4096 B) for Ideal DRAM, Ideal NVM, Journal, Shadow, and ThyNVM]
Evaluation
• Workload III: Compute-intensive tasks (from SPEC CPU2006)
• ThyNVM slows down by only 3.4% compared to Ideal DRAM, and speeds up by 2.7% compared to Ideal NVM.
[Plot: normalized IPC (0.4-1.2) on gcc, bwaves, milc, leslie., soplex, Gems., lbm, and omnet. for Ideal DRAM, Ideal NVM, and ThyNVM]
Conclusion
Contributions
• We propose a new hybrid persistent memory design with software-transparent crash consistency support.
• We identify a new tradeoff between application stall time and metadata storage overhead.
• We devise a new, efficient dual-scheme checkpointing mechanism.
Potentials
• ThyNVM can enable: (1) easier and more widespread adoption of persistent memory, and (2) a more efficient software stack for exploiting persistent memory.
• ThyNVM can encourage more research in providing programmer-friendly mechanisms for managing persistent and hybrid memories.
Open Source
• Web site: http://persper.com/thynvm (source code, documents, etc.)
Thank you!
Jinglei Ren <[email protected]>
http://persper.com/thynvm