ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems

Jinglei Ren∗†, Jishen Zhao‡, Samira Khan†ʹ, Jongmoo Choi+†, Yongwei Wu∗, Onur Mutlu†

∗Tsinghua University, †Carnegie Mellon University, ‡University of California, Santa Cruz, ʹUniversity of Virginia, +Dankook University


Page 2

NVM is coming…

Emerging byte-addressable non-volatile memory (NVM)

Persistent memory, a new tier in the memory and storage stack

Page 3

New requirement for persistent memory data: crash consistency

Add a data item to a persistent linked list:
• Step 1: [figure]
• Step 2: [figure]

Failure outcomes shown: list broken; data lost.

Current solution: wrap these steps in one transaction, or use another specific software-based interface.
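
The two-step hazard can be sketched in C. This is an illustration only: the slide's figures do not survive in the transcript, so the split into "link the node" (Step 1) and "update the count" (Step 2), along with the names `plist_t`, `plist_push`, and `plist_consistent`, are assumptions made for the example. The `crash_after_step1` flag simulates power loss between the two stores.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node { int value; struct node *next; } node_t;

/* Hypothetical persistent list: head pointer plus an element count. */
typedef struct { node_t *head; size_t count; } plist_t;

/* Returns 1 if the number of reachable nodes matches the stored count. */
static int plist_consistent(const plist_t *l) {
    size_t n = 0;
    for (const node_t *p = l->head; p != NULL; p = p->next) n++;
    return n == l->count;
}

/* Two-step insert. If crash_after_step1 is nonzero, the function returns
 * after Step 1, modelling a crash before Step 2 reaches persistent memory. */
static void plist_push(plist_t *l, node_t *n, int value, int crash_after_step1) {
    n->value = value;
    n->next = l->head;
    l->head = n;                    /* Step 1 (assumed): link the new node */
    if (crash_after_step1) return;  /* simulated crash */
    l->count++;                     /* Step 2 (assumed): update the count */
}
```

A push that completes leaves `plist_consistent` true; a push interrupted after Step 1 leaves the head and the count disagreeing, which is the kind of state that a transaction, or a transparent checkpoint, is meant to rule out.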

Page 4

Executive Summary

• Motivation: Limitations of software-based crash consistency support
  • Significant burden on programmers: e.g., adopting new interfaces.
  • Limited use cases: e.g., legacy applications, non-transactional programs.
• Idea: Software-transparent crash consistency support through a new dual-scheme checkpointing mechanism for persistent memory.
• Observation: A tradeoff between application stall time (checkpointing latency) and metadata storage overhead.
  • Small-granularity scheme: ✔ short checkpointing latency, ✘ large metadata.
  • Large-granularity scheme: ✘ long checkpointing latency, ✔ small metadata.
• Mechanism: Combination of two checkpointing schemes at two granularities.
  • Realizing ✔ short checkpointing latency: cooperation of the two schemes.
  • Realizing ✔ small metadata: sparse updates → small-granularity scheme; dense updates → large-granularity scheme.
• Evaluation: Within 4.9% slowdown of an idealized DRAM-only system with crash consistency support at no cost.

Page 5

Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation

Page 6

Motivation
Inefficiency of software-based crash consistency support

    void TMhashtable_update(TM_ARGDECL hashtable_t* ht,
                            void* key, void* data) {
        list_t* chain = get_chain(ht, key);
        pair_t* pair;
        pair_t updatePair;
        updatePair.first = key;
        pair = (pair_t*)TMLIST_FIND(chain, &updatePair);
        pair->second = data;
    }

Annotations on the code:
• Transactional interface for third-party libraries
• Manually declaring transactional/persistent components
• Prohibited operation, will cause a runtime error
• (Potential) program bug for certain implementations

Page 7

Motivation
Inefficiency of software-based crash consistency support

ThyNVM - Feature I: Software-transparent crash consistency support

    void hashtable_update(hashtable_t* ht,
                          void* key, void* data) {
        list_t* chain = get_chain(ht, key);
        pair_t* pair;
        pair_t updatePair;
        updatePair.first = key;
        pair = (pair_t*)list_find(chain, &updatePair);
        pair->second = data;
    }

Annotations on the code:
• Valid operation, persistent memory will ensure crash consistency
• Unmodified syntax and semantics

Page 8

Motivation
Inefficiency of logging and copy-on-write (CoW)

• Logging
  • Large space for recording every update
  • Slow recovery for replaying the log
• Copy-on-Write
  • Large space for redundant unmodified data
  • Slow operation for copying unmodified data

ThyNVM - Feature II: An efficient dual-scheme checkpointing mechanism
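
The two space costs can be made concrete with a little arithmetic. The sketch below is illustrative only: the 8-byte log header, the 4096-byte page, and the helper names `log_bytes` and `cow_bytes` are assumptions for the example, not figures from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Space used by a log that records every update: payload plus an
 * assumed fixed-size header per entry. */
static size_t log_bytes(size_t n_updates, size_t update_bytes, size_t header_bytes) {
    return n_updates * (update_bytes + header_bytes);
}

/* Space used by page-granularity copy-on-write: one full page copy per
 * touched page, regardless of how few bytes in it were modified. */
static size_t cow_bytes(size_t pages_touched, size_t page_bytes) {
    return pages_touched * page_bytes;
}
```

Under these assumptions, 1000 eight-byte updates scattered over 1000 distinct pages cost 16000 bytes of log but 4096000 bytes of CoW copies, most of it unmodified data. Conversely, a million updates to the same eight bytes cost a single 4096-byte page copy under CoW but 16000000 bytes of log, all of which must be replayed on recovery.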

Page 9

Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation

Page 10

Observation
Two concerns in checkpointing and their tradeoff

• Latency of checkpointing the working copy of data
• Metadata overhead to track the working copy/checkpoint of data

Checkpointing granularity
• Small granularity leads to large metadata size
• Large granularity leads to small metadata size

Location of the working copy of data
• Caching the working copy in DRAM: write back both dirty data and metadata during checkpointing (long latency)
• Storing the working copy in NVM: persist only metadata during checkpointing (short latency); requires remapping data locations
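
The granularity side of the tradeoff is simple counting: one translation entry per tracked unit. The sketch below assumes a 64-byte cache block and a 4096-byte page, which are typical values rather than numbers stated on the slide; `tracking_entries` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

enum { BLOCK_BYTES = 64, PAGE_BYTES = 4096 };  /* assumed sizes */

/* Number of translation entries needed to track region_bytes of memory
 * at the given granularity, rounded up to whole units. */
static size_t tracking_entries(size_t region_bytes, size_t granularity) {
    return (region_bytes + granularity - 1) / granularity;
}
```

For a 1 MiB region this gives 16384 entries at block granularity against 256 at page granularity, a 64x difference in metadata for the same amount of tracked data.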

Page 11

Observation

| Location of working copy | Small checkpointing granularity (cache block) | Large checkpointing granularity (page) |
| DRAM: based on writeback | ❶ Inefficient: ✘ large metadata overhead; ✘ long checkpointing latency | ❷ Partially efficient: ✔ small metadata overhead; ✘ long checkpointing latency |
| NVM: based on remap | ❸ Partially efficient: ✘ large metadata overhead; ✔ short checkpointing latency; ✔ fast remapping | ❹ Inefficient: ✔ small metadata overhead; ✔ short checkpointing latency; ✘ slow remapping (on the critical path) |

Checkpointing Scheme I corresponds to ❸; Checkpointing Scheme II corresponds to ❷.

Page 12

Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation

Page 13

Dual-Scheme Checkpointing

Definitions
• Execution model: epochs
  [Figure: timeline of alternating execution and checkpointing phases; Epoch 0 (last epoch), Epoch 1 (active epoch)]
• System model: the hybrid architecture
  [Figure: CPU cores and a shared LLC in front of a memory controller that holds the address translation tables (BTT, PTT) and the DRAM/NVM read and write queues, attached to DRAM and NVM; includes a "Recover" label]
• The last checkpoint: $C_{last}$. The active working copy: $W_{active}$.
• Block Translation Table (BTT): metadata for the small-granularity scheme
• Page Translation Table (PTT): metadata for the large-granularity scheme
• Hardware-based design: software uses regular load/store instructions

Page 14

Dual-Scheme Checkpointing

Checkpointing Scheme I: Block Remapping (location in the tradeoff: small granularity + NVM in-place)

• During execution: remap the working copy to a new address in NVM, to protect the last checkpoint
• During checkpointing: only need to persist BTT; $W_{active}$ becomes $C_{last}$ without any data movement

[Figure: a write to block P (cache block size) is directed to a new NVM block Q; the BTT in the memory controller records P → Q, and a BTT backup is kept in NVM. The tradeoff table from Page 11 is repeated as an inset.]
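
A minimal software model of block remapping, as a sketch under stated assumptions. The real mechanism lives in the memory controller; here one `int` stands in for a cache block, a bump allocator stands in for NVM block allocation, old blocks are never reclaimed, and all names (`remap_t`, `remap_store`, and so on) are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { NBLOCKS = 4, NPHYS = 16 };  /* hypothetical sizes */

typedef struct {
    int nvm[NPHYS];           /* physical NVM blocks (one int per block) */
    int btt[NBLOCKS];         /* working mapping: logical -> physical */
    int btt_backup[NBLOCKS];  /* persisted mapping, i.e. the last checkpoint */
    int next_free;            /* bump allocator; no reclamation in this sketch */
} remap_t;

static void remap_init(remap_t *m) {
    memset(m->nvm, 0, sizeof m->nvm);
    for (int i = 0; i < NBLOCKS; i++) m->btt[i] = m->btt_backup[i] = i;
    m->next_free = NBLOCKS;
}

/* During execution: the first store to a block in an epoch is redirected
 * to a fresh physical block, so the checkpointed copy stays intact. */
static void remap_store(remap_t *m, int blk, int val) {
    if (m->btt[blk] == m->btt_backup[blk]) {
        assert(m->next_free < NPHYS);
        m->btt[blk] = m->next_free++;
    }
    m->nvm[m->btt[blk]] = val;
}

static int remap_load(const remap_t *m, int blk) {
    return m->nvm[m->btt[blk]];
}

/* During checkpointing: persist only the table; no data is moved. */
static void remap_checkpoint(remap_t *m) {
    memcpy(m->btt_backup, m->btt, sizeof m->btt);
}

/* After a crash: fall back to the persisted table. */
static void remap_recover(remap_t *m) {
    memcpy(m->btt, m->btt_backup, sizeof m->btt);
}
```

Storing 7 to a block, checkpointing, storing 9, and then recovering yields 7 again: the checkpoint cost was one table copy, while the price is one table entry per written block.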

Page 15

Dual-Scheme Checkpointing

Checkpointing Scheme II: Page Writeback (location in the tradeoff: large granularity + DRAM cache)

• During execution: update the cached hot pages in DRAM ($W_{active}$)
• During checkpointing: write back $W_{active}$ and PTT to NVM

[Figure: the PTT in the memory controller maps P → P*; NVM (pages) holds P as $C_{last}$ and DRAM (pages) holds P* as $W_{active}$. After a write to a block in P and the subsequent checkpoint, the PTT backup in NVM maps P → Q. The tradeoff table from Page 11 is repeated as an inset.]
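
A matching software model of page writeback, again only a sketch. It assumes a single logical page with two NVM slots, so that the writeback never overwrites the slot holding the last checkpoint; that two-slot layout, the tiny page size, and the names `page_wb_t`, `pwb_store`, and so on are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { PAGE_WORDS = 4 };  /* hypothetical tiny page */

typedef struct {
    int nvm[2][PAGE_WORDS];  /* two NVM slots for one logical page */
    int dram[PAGE_WORDS];    /* W_active: working copy cached in DRAM */
    int last_slot;           /* stands in for the persisted PTT entry:
                                the NVM slot that holds C_last */
} page_wb_t;

static void pwb_init(page_wb_t *p) {
    memset(p, 0, sizeof *p);
}

/* During execution: stores only touch the DRAM copy. */
static void pwb_store(page_wb_t *p, int word, int val) {
    p->dram[word] = val;
}

/* During checkpointing: write the whole page to the spare NVM slot, then
 * switch the table entry. Returns the number of data bytes moved. */
static size_t pwb_checkpoint(page_wb_t *p) {
    int target = 1 - p->last_slot;
    memcpy(p->nvm[target], p->dram, sizeof p->dram);
    p->last_slot = target;
    return sizeof p->dram;
}

/* After a crash: reload the working copy from the last checkpoint. */
static void pwb_recover(page_wb_t *p) {
    memcpy(p->dram, p->nvm[p->last_slot], sizeof p->dram);
}
```

Unlike the block remapping sketch, `pwb_checkpoint` moves a full page of data, which is the long checkpointing latency in the tradeoff; in exchange, a single table entry covers the whole page.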

Page 16

Dual-Scheme Checkpointing
Coordinating the Two Schemes

• Key Mechanism I: Realizing short application stall time by cooperation of dual schemes

[Figure: timeline of execution and checkpointing phases for Epoch 0 and Epoch 1]

ThyNVM: overlap program execution and checkpointing time.

Checkpointing time is mainly due to the page writeback scheme, while the block remapping scheme finishes checkpointing fast.

Page 17

Dual-Scheme Checkpointing
Coordinating the Two Schemes

• Key Mechanism I: Realizing short application stall time by cooperation of dual schemes

[Figure: timeline of execution and checkpointing phases across Epoch 0, Epoch 1, and Epoch 2]

• Two schemes operate separately for different memory regions
• Page writeback does checkpointing in background
• Block remapping takes charge of all memory regions temporarily

Page 18

Dual-Scheme Checkpointing
Coordinating the Two Schemes

• Key Mechanism I: Realizing short application stall time by cooperation of dual schemes

[Figure: timeline of execution and checkpointing phases across Epoch 0, Epoch 1, and Epoch 2, labeling the penultimate checkpoint $C_{penult}$, the last checkpoint $C_{last}$, the working copy $W_{active}$, and a "Recover" arrow]

Page 19

Dual-Scheme Checkpointing
Coordinating the Two Schemes

• Key Mechanism I: Summary of flow

Flowchart (columns: Block remapping, Cooperation, Page writeback):
• Store received → Hit in PTT?
  • No → Still checkpointing $C_{last}$?
    • No → Write $W_{active}^{block}$ to NVM (protecting $C_{last}$); update BTT
    • Yes → Buffer $W_{active}^{block}$ in DRAM (protecting $C_{penult}$); update BTT
  • Yes → Still checkpointing $C_{last}$?
    • No → Write $W_{active}^{page}$ to DRAM (protecting $C_{last}$); update PTT
    • Yes → Buffer $W_{active}^{block}$ in DRAM (protecting $C_{penult}$); update BTT
• Acknowledge
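
The store-handling flow can be written as a small decision function. This is one reading of the flowchart, not a confirmed specification: the extracted slide lists three actions and three yes/no decisions, and the sketch assumes that both "still checkpointing $C_{last}$" branches lead to the DRAM buffering action. The names `store_action_t` and `route_store` are hypothetical.

```c
#include <assert.h>

typedef enum {
    ACT_WRITE_BLOCK_NVM,    /* write W_active block to NVM, update BTT  */
    ACT_WRITE_PAGE_DRAM,    /* write W_active page to DRAM, update PTT  */
    ACT_BUFFER_BLOCK_DRAM   /* buffer W_active block in DRAM, update BTT */
} store_action_t;

/* hit_in_ptt:      nonzero if the store's page is tracked by the PTT
 * still_ckpt_last: nonzero if C_last is still being checkpointed
 * Assumption: while checkpointing is in flight, every store is buffered
 * at block granularity in DRAM, whichever scheme owns the address. */
static store_action_t route_store(int hit_in_ptt, int still_ckpt_last) {
    if (still_ckpt_last) return ACT_BUFFER_BLOCK_DRAM;
    return hit_in_ptt ? ACT_WRITE_PAGE_DRAM : ACT_WRITE_BLOCK_NVM;
}
```

In this reading, the buffering branch is the cooperation path: it lets execution continue while the page writeback scheme is still checkpointing in the background.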

Page 20

Dual-Scheme Checkpointing
Coordinating the Two Schemes

• Key Mechanism II: Realizing small metadata overhead by matching write patterns with dual schemes
  • Estimate spatial locality by # stores in the last epoch on individual blocks/pages (recorded on BTT/PTT)
  • Switch scheme by updating PTT and migrating necessary data

| Spatial locality | Write pattern | Page-level characteristics | Granularity for min metadata | Matching scheme |
| Low | Random, sparse, of small sizes | Small portion of dirty data | Small (cache block size) | Block remapping |
| High | Sequential, dense, of large sizes | Large portion of dirty data | Large (page size) | Page writeback |
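
The matching rule in the table can be sketched as a threshold on the per-page store count from the last epoch. The slide gives no threshold, so `DENSE_THRESHOLD` below is a hypothetical value chosen only for illustration, as are the names `scheme_t` and `choose_scheme`.

```c
#include <assert.h>

typedef enum { SCHEME_BLOCK_REMAPPING, SCHEME_PAGE_WRITEBACK } scheme_t;

enum { DENSE_THRESHOLD = 32 };  /* hypothetical: stores per page per epoch */

/* Pick the scheme for a page from the number of stores it received in the
 * last epoch: many stores suggest dense, high-locality writes that suit
 * page writeback; few stores suggest sparse writes that suit block
 * remapping. */
static scheme_t choose_scheme(unsigned stores_last_epoch) {
    return stores_last_epoch >= DENSE_THRESHOLD
               ? SCHEME_PAGE_WRITEBACK
               : SCHEME_BLOCK_REMAPPING;
}
```

A page that saw 3 stores would stay under block remapping, while one that saw 60 would be switched to page writeback, which per the slide involves updating the PTT and migrating the necessary data.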

Page 21

Outline
• Motivation
• Observation: A New Tradeoff
• Dual-Scheme Checkpointing
• Evaluation

Page 22

Evaluation

• Experiment Setup
  • Simulator based on gem5
  • DRAM and NVM with DDR3
  • NVM: 40 (128/368) ns row hit (clean/dirty miss)
• Systems in comparison
  • Ideal DRAM: full DRAM; no cost in supporting crash consistency
  • Ideal NVM: full NVM; no cost in supporting crash consistency
  • Journaling (one form of logging)
  • Shadow paging (one form of copy-on-write)

Page 23

Evaluation

• Workload I: Micro-benchmarks with different write patterns

[Figure: two panels, (a) Random and (b) Sequential, comparing Journal, Shadow, and ThyNVM. Left axis: total amount of NVM write traffic (MB), broken down into CPU, Migration, and Checkpoint. Right axis: % exec. time spent on ckpt.]

• ThyNVM reduces the NVM write traffic by 10.8%/14.4% compared to Journaling/Shadow paging.
• Journaling/Shadow paging spend 18.9%/15.2% of execution time on checkpointing, while ThyNVM reduces this overhead to 2.5% on average.

Page 24

Evaluation

• Workload II: In-memory storage (hashtable-based key-value store)
  • ThyNVM provides 8.8% higher throughput than Journaling
  • ThyNVM provides 29.9% higher throughput than Shadow paging

[Figure: transaction throughput (KTPS) versus request size (B) at 16, 64, 256, 1024, and 4096, for Ideal DRAM, Ideal NVM, Journal, Shadow, and ThyNVM]

Page 25

Evaluation

• Workload III: Compute-intensive tasks (in SPEC CPU2006)
  • ThyNVM slows down by only 3.4% compared to Ideal DRAM, and speeds up by 2.7% compared to Ideal NVM.

[Figure: normalized IPC of Ideal DRAM, Ideal NVM, and ThyNVM on gcc, bwaves, milc, leslie., soplex, Gems., lbm, and omnet.]

Page 26

Conclusion

Contributions
• We propose a new hybrid persistent memory design with software-transparent crash consistency support.
• We identify a new tradeoff between application stall time and metadata storage overhead.
• We devise a new efficient dual-scheme checkpointing mechanism.

Potentials
• ThyNVM can enable: (1) easier and more widespread adoption of persistent memory, and (2) a more efficient software stack for exploiting persistent memory.
• ThyNVM can encourage more research in providing programmer-friendly mechanisms for managing persistent and hybrid memories.

Open Source
• Web site: http://persper.com/thynvm (source code, documents, etc.)

Page 27

Thank you!

Jinglei Ren <[email protected]>

http://persper.com/thynvm