Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation...
Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage
Andromachi Hatzieleftheriou, Stergios V. Anastasiadis
Department of Computer Science, University of Ioannina, Greece
A. Hatzieleftheriou, University of Ioannina
Outline
Motivation
Design
Implementation
Evaluation
Conclusions
Motivation
• Synchronous small writes are critical for system and application reliability
• Multistream concurrency leads to effectively random I/O
• In page-sized disk accesses:
  - async writes have good performance due to batching in memory
  - sync writes result in wasteful traffic due to excessive full-page I/Os
[Figure: Write Traffic (Linux ext3). Total journal volume (MB) versus request size (KB), page size = 4 KB. Curves: Data Journaling (data & metadata) and Ordered (metadata only).]
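The waste shown in the write-traffic plot can be illustrated with a little arithmetic. The following is a minimal sketch, assuming a 4 KB page and ignoring journal metadata such as descriptor and commit blocks; the function names are hypothetical and not taken from ext3.

```python
PAGE_SIZE = 4096  # bytes, matching the 4 KB page size of the plot


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // page_size
    last = (offset + length - 1) // page_size
    return last - first + 1


def full_page_journal_bytes(offset: int, length: int,
                            page_size: int = PAGE_SIZE) -> int:
    """Bytes journaled when every touched page is written in full."""
    return pages_touched(offset, length, page_size) * page_size


def waste_ratio(offset: int, length: int, page_size: int = PAGE_SIZE) -> float:
    """Journaled bytes divided by bytes the application actually modified."""
    return full_page_journal_bytes(offset, length, page_size) / length
```

Under these assumptions, a synchronous 1 KB write journals a full 4 KB page, four times the modified data, and a 200-byte write that straddles a page boundary journals two full pages.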
Design Goals
1. Reliable storage: keep data on disk
2. Inexpensive synchronous small writes: sequential disk throughput
3. Reduce disk bandwidth waste due to:
   - writes with high positioning overhead
   - unnecessary writes of unmodified data
• Proposed approach:
  - batch random small writes in memory
  - journal data updates at subpage granularity
Wasteless Journaling
• Idea:
  1. Synchronously transfer data deltas from memory to the journal
  2. Occasionally move data blocks from memory to their final location
• Still wasteful! Large writes cause duplicated disk traffic
[Diagram: MEMORY (Pages) and DISK (Journal, Filesystem); arrow labeled "data deltas".]
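The traffic duplication noted above can be seen in a toy model. This is a minimal sketch, assuming a 4 KB page, that each delta is journaled once at its exact length, and that every dirty page is later written once in full to its final location; tags and other journal metadata are ignored, and `wasteless_traffic` is a hypothetical name.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def wasteless_traffic(writes, page_size: int = PAGE_SIZE):
    """Return (journal_bytes, checkpoint_bytes) for (offset, length) writes.

    journal_bytes: modified bytes appended synchronously to the journal.
    checkpoint_bytes: full pages later moved to their final location.
    """
    journal_bytes = 0
    dirty_pages = set()
    for offset, length in writes:
        if length <= 0:
            continue
        journal_bytes += length  # only the delta goes to the journal
        first = offset // page_size
        last = (offset + length - 1) // page_size
        dirty_pages.update(range(first, last + 1))
    return journal_bytes, len(dirty_pages) * page_size
```

In this model, two 100-byte updates to the same page cost 200 journal bytes plus one 4 KB page at checkpoint time, whereas a single 8 KB write reaches the disk twice in full: 8 KB in the journal and 8 KB at the final location.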
Selective Journaling
• Definition: a write threshold differentiates requests by size
• Idea:
  1. Transfer large requests to their final location without journaling the data
  2. Treat small requests according to wasteless journaling
[Diagram: MEMORY (Pages) and DISK (Journal, Filesystem); arrow labeled "data deltas".]
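The size-based routing can be sketched as follows. The threshold value and all names here are hypothetical, since the slides do not state the threshold used; the sketch only assumes that requests below the threshold are journaled and all others go directly to their final location.

```python
WRITE_THRESHOLD = 4096  # hypothetical threshold in bytes, not from the slides


def route_write(length: int, threshold: int = WRITE_THRESHOLD) -> str:
    """Pick the path for a write request of `length` bytes."""
    # Small requests follow wasteless journaling; large ones skip the journal.
    return "journal" if length < threshold else "in_place"


def split_traffic(lengths, threshold: int = WRITE_THRESHOLD):
    """Return (journaled_bytes, in_place_bytes) for a list of request sizes."""
    journaled = sum(n for n in lengths if route_write(n, threshold) == "journal")
    in_place = sum(n for n in lengths if route_write(n, threshold) == "in_place")
    return journaled, in_place
```

For example, with the assumed threshold, requests of 512 bytes and 1 KB are journaled as data deltas, while a 64 KB request bypasses the journal for its data.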
Consistency
• Wasteless Journaling: atomic updates of both data and metadata
• Selective Journaling: data updates are either journaled or not, depending on request size
  - consistency at least as strict as the default ext3 journaling mode (ordered)
Prototype Implementation
[Diagram: a journal descriptor block (header plus tags) and a multiwrite journal block (data deltas). Data copies are taken from block buffers in the page cache, which hold modified and original data. Each tag records the block number of the final location, the offset in the page, and the length in bytes.]
• Multiwrite journal block: accumulates multiple subpage data updates
• During recovery: apply data deltas to the corresponding final disk blocks
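The tag fields and the recovery step can be sketched in user-space Python. This is an illustrative model under stated assumptions, not the kernel implementation: the class and function names are hypothetical, the tags are kept next to the payload for brevity rather than in a separate descriptor block, and the disk is modeled as a dictionary of 4 KB pages.

```python
from dataclasses import dataclass, field

PAGE_SIZE = 4096  # assumed block and page size in bytes


@dataclass(frozen=True)
class Tag:
    block_num: int  # block number of the final location
    offset: int     # offset of the delta within the page
    length: int     # length of the delta in bytes


@dataclass
class MultiwriteBlock:
    """Accumulates several subpage deltas into one page-sized journal block."""
    capacity: int = PAGE_SIZE
    tags: list = field(default_factory=list)
    payload: bytearray = field(default_factory=bytearray)

    def try_add(self, block_num: int, offset: int, data: bytes) -> bool:
        """Append a delta; return False if it does not fit in this block."""
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("delta must lie within a single page")
        if len(self.payload) + len(data) > self.capacity:
            return False
        self.tags.append(Tag(block_num, offset, len(data)))
        self.payload += data
        return True


def replay(block: MultiwriteBlock, disk: dict) -> None:
    """Recovery: apply each delta to its final block, in journal order."""
    pos = 0
    for tag in block.tags:
        page = disk.setdefault(tag.block_num, bytearray(PAGE_SIZE))
        page[tag.offset:tag.offset + tag.length] = block.payload[pos:pos + tag.length]
        pos += tag.length
```

A caller would add deltas until `try_add` returns `False`, then start a new block. Replaying in journal order means a later delta to the same bytes overwrites an earlier one.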
Experiments
• Implemented in Linux kernel 2.6.18 ext3
• Experimentation environment: x86-based servers
  - quad-core 2.66GHz processor
  - 3GB RAM
  - Seagate Cheetah SAS 300GB 15KRPM disks
• Workloads:
  - Microbenchmarks
  - Postmark
  - MPI-IO over PVFS2
Latency
- Data and wasteless journaling achieve substantially lower write latency, similar to that of NILFS (a stable Linux port of LFS)
- NILFS read latency is significantly higher due to poor storage locality!
[Figure: Write latency (ms) versus number of streams, 1 Mbps/stream. Curves: Selective, Ordered, Wasteless, Data, NILFS.]
[Figure: Read latency (us) versus number of streams, 1 Mbps/stream. Curves: NILFS, Selective, Ordered, Data, Wasteless.]
Disk Traffic
- Data journaling is expensive in terms of journal traffic
- Ordered journaling incurs increased filesystem traffic
- Wasteless and selective journaling substantially reduce journal and filesystem traffic
[Figure: Journal throughput (MB/s) versus number of streams, 1 Kbps/stream. Curves: Data, Wasteless, Selective, Ordered.]
[Figure: File system throughput (MB/s) versus number of streams, 1 Kbps/stream. Curves: Ordered, Data, Wasteless, Selective.]
Lower is better!
Application-Level Workloads
- Small files workload: wasteless increases transaction throughput
- Parallel I/O workload: 13 clients, 1 PVFS2 data server, 1 PVFS2 metadata server (15 machines)
  - wasteless doubles the throughput of parallel application checkpointing
[Figure: Postmark. Transactions/s versus request size (KB). Curves: Wasteless, Data, Selective, Ordered.]
[Figure: MPI-IO over PVFS2 (write size 1KB). Throughput (MB/s) versus threads per client. Curves: Wasteless, Data, Selective, Ordered.]
Conclusions & Future Work
• Key concept: apply subpage journaling of data updates to ensure reliability
• Wasteless Journaling merges subpage writes into page-sized journal blocks
• Selective Journaling journals only updates below a write threshold
• Performance benefits demonstrated over ext3:
  - reduced write latency
  - improved transaction throughput
  - avoided bandwidth waste
• Future work: extend to virtualization environments and flash memory systems