Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation...

13
Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage Andromachi Hatzieleftheriou, Stergios V. Anastasiadis Department of Computer Science University of Ioannina, Greece 1 A. Hatzieleftheriou University of Ioannina

Transcript of Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation...

Page 1: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Okeanos: Wasteless Journaling for Fast and Reliable Multistream Storage

Andromachi Hatzieleftheriou, Stergios V. Anastasiadis

Department of Computer ScienceUniversity of Ioannina, Greece

1A. HatzieleftheriouUniversity of Ioannina

Page 2: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Outline

Motivation

Design

Implementation

Evaluation

Conclusions

2A. HatzieleftheriouUniversity of Ioannina

Page 3: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Motivation

• Synchronous small writes critical for system and application

reliability

• Multistream concurrency effectively random I/O

• In page-sized disk accesses async writes have good performance due to batching in memory

sync writes result in wasteful traffic due to excessive full-page I/Os

1

10

100

1000

0 1 10 100Tota

l Jo

urn

al V

olu

me

(M

B)

Request Size (KB)

Write Traffic (Linux ext3)

Data Journaling

Ordered

Page Size=4KB

3A. Hatzieleftheriou

data & metadata

metadata only

University of Ioannina

Page 4: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Design Goals

1. Reliable storage keep data on disk

2. Inexpensive synchronous small writes

sequential disk throughput

3. Reduce disk bandwidth waste due to: writes with high positioning overhead

unnecessary writes of unmodified data

• Proposed approach: batch random small writes in memory

journal data updates at subpage granularity

4A. HatzieleftheriouUniversity of Ioannina

Page 5: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

DISK Filesystem

Wasteless Journaling

• Idea:1. Synchronously transfer data deltas from memory to journal

2. Occasionally move data blocks from memory to final location

• Still wasteful!

large writes disk traffic duplication

MEMORY

Journal

Pages

5A. HatzieleftheriouUniversity of Ioannina

data deltas

Page 6: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Selective Journaling

• Definition: write threshold differentiates requests by size

• Idea:1. Transfer large requests to final location without journaling of data

2. Treat small requests according to wasteless journaling

MEMORY

DISK

Journal

data deltas

Pages

Filesystem

6A. HatzieleftheriouUniversity of Ioannina

Page 7: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Consistency

• Wasteless Journaling: atomic updates of both data and metadata

• Selective Journaling: data updates either journaled or not depending on request size

consistency at least as strict as default ext3 journaling mode (ordered)

7A. HatzieleftheriouUniversity of Ioannina

Page 8: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Prototype Implementation

HeaderTagTag

Tag

Data Delta

Data Delta

Data Delta

Data Copies

Block Buffer

Page Cache

Journal Descriptor Block

Multiwrite Journal Block

Modified DataOriginal Data

• block num of final location• offset in page• length in bytes

• Multiwrite journal block

accumulates multiple subpage data updates

• During recovery

apply data deltas to corresponding final disk blocks

8A. HatzieleftheriouUniversity of Ioannina

Page 9: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Experiments

• Implemented in Linux kernel 2.6.18 ext3

• Experimentation Environment: x86-based servers

quad-core 2.66GHz processor

3GB RAM

Seagate Cheetah SAS 300GB 15KRPM disks

• Workloads: Microbenchmarks

Postmark

MPIO-IO over PVFS2

9A. HatzieleftheriouUniversity of Ioannina

Page 10: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Latency

⁻ Data & wasteless achieve substantially lower write latency similar to NILFS (stable Linux port of LFS )

⁻ NILFS read latency significantly higher due to poor storage locality!

1

10

100

1000

0 20 40 60 80 100

Wri

te L

aten

cy (

ms)

Number of Streams

1 Mbps/stream

SelectiveOrderedWastelessDataNILFS 1

10

100

1000

0 20 40 60 80 100

Re

ad L

ate

ncy

(u

s)

Number of Streams

1 Mbps/stream

NILFSSelectiveOrderedDataWasteless

10A. HatzieleftheriouUniversity of Ioannina

Page 11: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Disk Traffic

⁻ Data journaling expensive in terms of journal traffic

⁻ Ordered journaling incurs increased filesystem traffic

⁻ Wasteless & selective substantially reduce journal and filesystem traffic

0.001

0.01

0.1

1

10

0 2000 4000 6000 8000

Jou

rnal

Th

rou

ghp

ut

(MB

/s)

Number of Streams

1Kbps/stream

Data

Wasteless

Selective

Ordered

University of Ioannina 11A. Hatzieleftheriou

0

1

2

3

4

5

0 2000 4000 6000 8000File

Sys

tem

Th

rou

ghp

ut

(MB

/s)

Number of Streams

1Kbps/stream

Ordered

Data

Wasteless

Selective

Lower is better!

Page 12: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Application-Level Workloads

₋ Small files workload wasteless increases transaction throughput

₋ Parallel I/O workload 13 clients, 1 PVFS2 data server, 1 PVFS2 metadata server (15 machines)

wasteless doubles the throughput of parallel application checkpointing

0

200

400

600

800

0 1 10 100

Tran

sact

ion

s/s

Request Size (KB)

Postmark

Wasteless

Data

Selective

Ordered

0.0

0.2

0.4

0.6

0.8

1.0

0 10 20 30 40

Thro

ugh

pu

t M

B/s

Threads per Client

MPI-IO over PVFS2(Write Size 1KB)

Wasteless

Data

Selective

Ordered

12A. HatzieleftheriouUniversity of Ioannina

Page 13: Okeanos: Wasteless Journaling for Fast and Reliable Multistream … · 2019. 2. 25. · Motivation •Synchronous small writes critical for system and application reliability •Multistream

Conclusions & Future Work

• Key concept: apply subpage journaling of data updates to ensure reliability

• Wasteless Journaling merges subpage writes into page-sized journal blocks

• Selective Journaling journals only updates below a write threshold

• Performance benefits demonstrated over ext3: reduced write latency

improved transaction throughput

avoided bandwidth waste

• Future Work extent for virtualization environments and flash memory systems

13A. HatzieleftheriouUniversity of Ioannina