I/O and Storage · CS 571: Operating Systems (Spring 2020) Lecture 10a Yue Cheng Some material...

Post on 27-Jun-2020


I/O and Storage: Flash SSDs

CS 571: Operating Systems (Spring 2020), Lecture 10a

Yue Cheng

Some material taken/derived from:
• Wisconsin CS-537 materials created by Remzi Arpaci-Dusseau.

Licensed for use under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Disk vs. Flash

Y. Cheng GMU CS571 Spring 2020

Disk Overview

• I/O requires: seek, rotate, transfer

• Inherently:
  • Not parallel (only one head)
  • Slow (mechanical)
  • Poor random I/O (locality around disk head)

• Random requests each take ~10+ ms

Flash Overview

• Hold charge in cells. No moving (mechanical) parts!

• Inherently parallel!

• No seeks!


Storage Hierarchy Overview

• Flash:
  • Smaller capacity
  • Faster accesses

• HDD:
  • Larger capacity
  • Way slower accesses

Disks vs. Flash: Performance

• Throughput
  • Disk: ~130 MB/s (sequential)
  • Flash: ~400 MB/s

• Latency
  • Disk: ~10 ms (one op)
  • Flash:
    • Read: 10-50 µs
    • Program: 200-500 µs
    • Erase: 2 ms
    • Program and erase are the two types of write (more later…)
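To get a feel for the latency gap, here is a rough back-of-the-envelope sketch. It uses the per-operation figures from the slide; taking 30 µs as a representative flash read latency (the midpoint of the 10-50 µs range) and serving requests strictly one at a time are assumptions made only for illustration.

```python
# Latency figures come from the slide; the flash midpoint is an assumption.
DISK_OP_S = 10e-3      # ~10 ms per random disk operation
FLASH_READ_S = 30e-6   # assumed midpoint of the 10-50 us read range


def total_time(num_ops: int, per_op_seconds: float) -> float:
    """Time to serve num_ops requests one after another (no parallelism)."""
    return num_ops * per_op_seconds


disk = total_time(1000, DISK_OP_S)
flash = total_time(1000, FLASH_READ_S)
print(f"disk:  {disk:.3f} s")
print(f"flash: {flash:.3f} s")
print(f"ratio: {disk / flash:.0f}x")
```

Under these assumptions, 1000 random reads take about 10 s on disk versus about 0.03 s on flash.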

Disks vs. Flash: Internal

[Figure: internal structure of a disk vs. a flash SSD]

* https://www.enterprisestorageforum.com/storage-hardware/ssd-vs-hdd.html

Disks vs. Flash: Summary

[Figure: summary comparison of disks vs. flash]

* https://www.enterprisestorageforum.com/storage-hardware/ssd-vs-hdd.html

Flash Architecture


SLC: Single-Level Cell

• One NAND cell stores one bit

• The charge level in the cell encodes the value: 1 or 0

MLC: Multi-Level Cell

• One NAND cell stores multiple bits

• The charge level in the cell encodes one of several values: 00, 01, 10, or 11

Single- vs. Multi-Level Cell

• SLC
  • 1 cell: 1 bit
  • Expensive
  • Robust

• MLC
  • 1 cell: multi-bit
  • Cheap
  • Sensitive
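The density trade-off can be made concrete with a little arithmetic: storing n bits in one cell requires telling 2^n charge levels apart, which is why multi-bit cells are more sensitive. The sketch below is illustrative only; the helper names are hypothetical.

```python
def charge_levels(bits_per_cell: int) -> int:
    """Number of distinguishable charge levels needed for the given bits per cell."""
    return 2 ** bits_per_cell


def cells_needed(total_bits: int, bits_per_cell: int) -> int:
    """Cells required to hold total_bits, rounded up."""
    return -(-total_bits // bits_per_cell)


for name, bits in [("SLC", 1), ("MLC (2-bit)", 2)]:
    print(name, charge_levels(bits), "levels,", cells_needed(8, bits), "cells per byte")
```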

Wearout

• Problem: flash cells wear out after being erased too many times
  • MLC: ~10K times
  • SLC: ~100K times

• Usage strategy: wear leveling
  • Prevents some cells from being worn out while others are still fresh
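One simple way to picture wear leveling is an allocator that always erases and reuses the least-worn block. The sketch below is a toy model under that assumption; the class `WearLeveler` and its methods are hypothetical, and real devices use more elaborate policies.

```python
class WearLeveler:
    """Toy allocator that always hands out the least-erased block."""

    def __init__(self, num_blocks: int) -> None:
        self.erase_counts = [0] * num_blocks

    def pick_block(self) -> int:
        # Choose the block with the fewest erases so far (ties go to the lowest index).
        return min(range(len(self.erase_counts)), key=self.erase_counts.__getitem__)

    def erase(self, block: int) -> None:
        self.erase_counts[block] += 1


leveler = WearLeveler(num_blocks=4)
for _ in range(10):
    leveler.erase(leveler.pick_block())
print(leveler.erase_counts)
```

After ten erases the counts differ by at most one across the four blocks, so no block wears out far ahead of the others.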

Banks

• Flash devices are divided into banks (a.k.a. planes)

• Banks can be accessed in parallel

Bank 0 Bank 1 Bank 2 Bank 3

• Example: reads are issued to two banks at the same time, and both banks return their data in parallel
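A rough timing model shows why bank parallelism matters. The sketch assumes reads to different banks overlap completely, reads to the same bank are serialized, and bus transfer time is ignored; the 50 µs read latency is the upper end of the range quoted earlier, and `batch_time_us` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable

READ_US = 50  # assumed per-page read latency in microseconds


def batch_time_us(banks: Iterable[int], read_us: int = READ_US) -> int:
    """Time to finish a batch of page reads, given the bank each read targets.

    Reads to different banks overlap; reads to the same bank are serialized.
    """
    per_bank = Counter(banks)
    return max(per_bank.values(), default=0) * read_us


print(batch_time_us([0, 0, 0, 0]))  # all four reads hit bank 0
print(batch_time_us([0, 1, 2, 3]))  # one read per bank
```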

Flash Writes

• Writing 0’s
  • Fast, fine-grained [page-level]
  • called “program”

• Writing 1’s
  • Slow, coarse-grained [block-level]
  • called “erase”

• Flash can only “write” (program) into clean pages
  • “clean”: pages containing all 1’s (pages that have been erased)

• Flash does not support in-place overwrite!
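The asymmetry between program and erase can be sketched at the bit level. The model below treats program as a bitwise AND (it can only turn 1s into 0s) and erase as resetting every bit to 1. This is a simplified illustration with an assumed 8-bit page, not a description of any particular device.

```python
PAGE_BITS = 8
ALL_ONES = (1 << PAGE_BITS) - 1  # 0b11111111, the "clean" page pattern


def program(page: int, data: int) -> int:
    """Program can only clear bits (1 -> 0); modeled here as bitwise AND."""
    return page & data


def erase() -> int:
    """Erase resets every bit to 1 (on real flash this applies to a whole block)."""
    return ALL_ONES


clean = erase()
first = program(clean, 0b10111000)   # programming a clean page stores the data
second = program(first, 0b01101010)  # "overwriting" cannot turn 0s back into 1s
print(f"{first:08b}")
print(f"{second:08b}")
```

The second program leaves 00101000 rather than the intended 01101010, which is why a page must be erased before it is written again.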

Banks and Blocks

Bank 0 Bank 1 Bank 2 Bank 3

• Each bank contains many “blocks”

Block and Pages

One block (each row is one page):

11111111
11111111
11111111
11111111
11111111
11111111
11111111
11111111

• All pages are clean (“programmable”)

Block

• program: the first page now holds 10111000

10111000
11111111
11111111
11111111
11111111
11111111
11111111
11111111

• program: the sixth page now holds 01101010

10111000
11111111
11111111
11111111
11111111
01101010
11111111
11111111

• Two pages hold data (cannot be overwritten)

• Still want to write data into one of these pages???

• erase (the whole block)

11111111
11111111
11111111
11111111
11111111
11111111
11111111
11111111

• After erase, the block is in the free state again (can write new data in any page)

• program: the first page now holds data (10110001)

10110001
11111111
11111111
11111111
11111111
11111111
11111111
11111111

Flash vs. Disks: APIs

         disk            flash
read     read sector     read page
write    write sector    program page (0’s)
                         erase block (1’s)

Flash Architecture

• Bank/plane: 1024 to 4096 blocks
  • Banks accessed in parallel

• Block: 64 to 256 pages
  • Unit of erase

• Page: 2 to 8 KB
  • Unit of read and program
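The ranges above imply a wide spread of block and bank sizes. The short calculation below simply multiplies the endpoints from the slide; the helper names are hypothetical.

```python
KB = 1024


def block_bytes(pages_per_block: int, page_kb: int) -> int:
    """Size of one block (the unit of erase) in bytes."""
    return pages_per_block * page_kb * KB


def bank_bytes(blocks_per_bank: int, pages_per_block: int, page_kb: int) -> int:
    """Size of one bank in bytes."""
    return blocks_per_bank * block_bytes(pages_per_block, page_kb)


smallest_block = block_bytes(64, 2)
largest_block = block_bytes(256, 8)
smallest_bank = bank_bytes(1024, 64, 2)
largest_bank = bank_bytes(4096, 256, 8)
print(smallest_block // KB, "KB to", largest_block // KB, "KB per block")
print(smallest_bank // KB**2, "MB to", largest_bank // KB**3, "GB per bank")
```

So a single erase touches somewhere between 128 KB and 2 MB, while a read or program touches only 2 to 8 KB.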

Disks vs. Flash: Performance

• Throughput
  • Disk: ~130 MB/s (sequential)
  • Flash: ~400 MB/s

• Latency
  • Disk: ~10 ms (one op)
  • Flash:
    • Read: 10-50 µs
    • Program: 200-500 µs
    • Erase: 2 ms

Working with File System


Traditional File Systems

• The file system sits on top of the storage device

• Traditional API:
  • read sector
  • write sector

• Mismatch with flash!

Traditional APIs wrapping around Flash APIs

read(addr):
    return flash_read(addr)

write(addr, data):
    block_copy = flash_read(all pages of block)
    modify block_copy with data
    flash_erase(block of addr)
    flash_program(all pages of block_copy)

Awkward Flash Write

Flash (initial state):

block 0: 0000 0011 0110 1111
block 1: 1101 1111 1110 1111
block 2: 0001 1111 1111 1011

• File system wants to write 0001 over the page in block 0 that holds 0000

• Read all pages in block into memory
  • Memory: 0000 0011 0110 1111

• Modify target page in memory
  • Memory: 0001 0011 0110 1111

• Erase whole block
  • Flash block 0: 1111 1111 1111 1111

• Program all pages in block
  • Flash block 0: 0001 0011 0110 1111

Blocks 1 and 2 are unchanged throughout.
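The read-modify-erase-program sequence can be replayed in a few lines of Python. This is a minimal sketch of the wrapper approach, not a real driver: the flash contents mirror the example above, while the address scheme (`addr` as a global page index), the operation counter `ops`, and the split of `flash_read`/`flash_program` into per-page calls are assumptions added for illustration.

```python
PAGES_PER_BLOCK = 4
CLEAN = "1111"

# Flash contents from the walkthrough: three blocks of four 4-bit pages.
flash = [
    ["0000", "0011", "0110", "1111"],
    ["1101", "1111", "1110", "1111"],
    ["0001", "1111", "1111", "1011"],
]
ops = {"read": 0, "erase": 0, "program": 0}  # counts low-level flash operations


def flash_read(block: int, page: int) -> str:
    ops["read"] += 1
    return flash[block][page]


def flash_erase(block: int) -> None:
    ops["erase"] += 1
    flash[block] = [CLEAN] * PAGES_PER_BLOCK


def flash_program(block: int, page: int, data: str) -> None:
    assert flash[block][page] == CLEAN, "can only program a clean page"
    ops["program"] += 1
    flash[block][page] = data


def write(addr: int, data: str) -> None:
    """Sector-style write built from read, erase, and program."""
    block, page = divmod(addr, PAGES_PER_BLOCK)
    block_copy = [flash_read(block, p) for p in range(PAGES_PER_BLOCK)]
    block_copy[page] = data                # modify the copy in memory
    flash_erase(block)                     # erase the whole block
    for p, contents in enumerate(block_copy):
        flash_program(block, p, contents)  # program every page back


write(0, "0001")
print(flash[0])
print(ops)
```

A single one-page write costs four page reads, one block erase, and four page programs in this model.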

Issue: Write Amplification

• Random writes are expensive for flash!

• Writing one 4KB page may cause:
  • read, erase, and program of the whole 256KB block
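For the numbers on this slide, the amplification factor is a one-line calculation. The sketch below assumes the common definition of write amplification as bytes physically programmed divided by bytes the host asked to write; the function name is hypothetical.

```python
KB = 1024


def write_amplification(bytes_requested: int, bytes_programmed: int) -> float:
    """Ratio of data physically written to data the host asked to write."""
    return bytes_programmed / bytes_requested


print(write_amplification(4 * KB, 256 * KB))
```

Rewriting a 256KB block to update one 4KB page gives a factor of 64.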

Flash Translation Layer


Flash Translation Layer (FTL)

• Add an address translation layer between upper-level file system and lower-level flash
  • Translate logical device addresses to physical addresses

• Convert in-place write into append-write

• Essentially, a virtualization & optimization layer

Flash Translation Layer (FTL)

Logical addresses are mapped onto physical pages:

Physical page:  0    1    2    3    4    5    6    7
Contents:       0011 0110 1111 0010 1101 1111 1111 1111
                block 0             block 1

• Write 0011
  • The data is programmed into a clean physical page instead of overwriting in place

Physical page:  0    1    2    3    4    5    6    7
Contents:       0011 0110 1111 0010 1101 0011 1111 1111
                block 0             block 1

• Change address mapping
  • The logical address now points to the newly programmed physical page

• The old physical page is marked as invalid and will be garbage collected
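The append-and-remap idea can be sketched as a tiny page-mapped FTL. Everything here is a simplified assumption for illustration: the class `SimpleFTL`, the single append pointer, and the logical addresses used in the example are hypothetical, and garbage collection is not modeled.

```python
from enum import Enum
from typing import Dict


class State(Enum):
    FREE = "free"
    VALID = "valid"
    INVALID = "invalid"


class SimpleFTL:
    """Page-mapped FTL sketch: every write is appended to the next free page."""

    def __init__(self, num_pages: int) -> None:
        self.pages = ["1111"] * num_pages      # physical page contents
        self.state = [State.FREE] * num_pages  # per-page state
        self.mapping: Dict[int, int] = {}      # logical -> physical
        self.next_free = 0                     # append point

    def write(self, logical: int, data: str) -> int:
        if self.next_free >= len(self.pages):
            raise RuntimeError("no free pages; garbage collection is not modeled")
        physical = self.next_free
        self.next_free += 1
        self.pages[physical] = data            # program a clean page
        self.state[physical] = State.VALID
        old = self.mapping.get(logical)
        if old is not None:
            self.state[old] = State.INVALID    # stale copy awaits garbage collection
        self.mapping[logical] = physical       # change address mapping
        return physical

    def read(self, logical: int) -> str:
        return self.pages[self.mapping[logical]]


ftl = SimpleFTL(num_pages=8)
ftl.write(0, "0110")
ftl.write(1, "0010")
new_home = ftl.write(0, "0011")  # overwrite logical page 0
print(new_home, ftl.read(0), ftl.state[0].value)
```

The overwrite never touches the old physical page; it lands in a fresh page, the mapping is updated, and the old copy is only marked invalid.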

SSD Architecture with FTL

[Figure: SSD architecture. Labels: FTL; SRAM: mapping table]

• SSD provides disk-like interface

Flash Translation Layer (FTL)

• Usually implemented in flash device’s firmware

• Where to store mapping?
  • SRAM

• Physical pages can be in three states
  • valid, invalid, free

State Transition of Physical Pages

• Free → Valid: Program

• Valid → Invalid: Relocate (overwrite)

• Invalid → Free: Erase
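The page life cycle can be written down as a small transition table. This sketch covers only the three transitions named on the slide and rejects everything else; the names `PageState`, `TRANSITIONS`, and `step` are hypothetical, and treating erase as a per-page event is a simplification (a real erase applies to a whole block).

```python
from enum import Enum


class PageState(Enum):
    FREE = "free"
    VALID = "valid"
    INVALID = "invalid"


# (current state, operation) -> next state
TRANSITIONS = {
    (PageState.FREE, "program"): PageState.VALID,
    (PageState.VALID, "relocate"): PageState.INVALID,
    (PageState.INVALID, "erase"): PageState.FREE,
}


def step(state: PageState, op: str) -> PageState:
    """Apply one operation, rejecting transitions the table does not list."""
    try:
        return TRANSITIONS[(state, op)]
    except KeyError:
        raise ValueError(f"cannot {op} a {state.value} page") from None


state = PageState.FREE
for op in ("program", "relocate", "erase"):
    state = step(state, op)
    print(op, "->", state.value)
```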