1 The Five-Minute Rule 20 Years Later (And How Flash Memory Changes The Rules) Goetz Graefe...

Post on 29-Mar-2015


1

The Five-Minute Rule 20 Years Later (And How Flash Memory Changes the Rules)
Goetz Graefe

Presented by Abhinav Parate

2

Storage Hierarchy

FLASH

3

Comparing Flash with Disks

4

When should we increase main memory?

• Metrics to decide:
  – Cost of infrastructure
  – Cost of maintenance
  – Mean time to failure
  – Performance improvement

• Simplest answer: increase RAM size if it is too small to hold the frequently accessed data items

• What time period counts as frequent?

5

Cost of accessing a data item

• A disc provides N accesses per second and costs $D.
• DA = D/N : Cost of one disc access per second
• M : Cost of 1 byte of main memory

• I : Expected interval when the same data is accessed again (in seconds)

• B : Size of data in bytes

6

Cost of accessing a data item

• Number of accesses per second for data item = 1/I

• Cost if item is accessed from disc = DA/I

• Cost if item is available in memory = M * B
• Keep data item in memory if main memory cost is less than disc access cost

• M * B < DA/ I

• I < DA/ (M * B)

• For a 1 KB data item, I < 400 s ~ 5 minutes at 1987 costs
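The break-even condition I < DA / (M * B) can be checked numerically. The sketch below is only illustrative: the slides do not list the 1987 prices, so the values of $2,000 per disc access per second and $5 per KB of main memory are assumptions (they are the figures usually quoted for the original 1987 rule), and 1 KB is treated as 1000 bytes.

```python
def break_even_interval(da, m, b):
    """Return the access interval I (seconds) at which holding an item of
    b bytes in memory costs the same as re-reading it from disc.

    da: cost of one disc access per second (dollars)
    m:  cost of one byte of main memory (dollars)
    b:  size of the data item in bytes
    """
    return da / (m * b)


# Assumed 1987 prices; these numbers are NOT given on the slides.
DA_1987 = 2000.0      # dollars per disc access per second (assumption)
M_1987 = 5.0 / 1000   # dollars per byte, i.e. $5 per KB (assumption)

interval = break_even_interval(DA_1987, M_1987, 1000)
print(f"Keep a 1 KB item in memory if it is re-accessed within {interval:.0f} s")
```

Under these assumed prices the interval comes out at about 400 seconds, which matches the figure on the slide. Larger items have a shorter break-even interval because B appears in the denominator.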

7

The Five-Minute Rule
• In 1987: keep a 1 KB data item in main memory if it is accessed again within 5 minutes.
• In 1967, the frequent period was 0.5 s.
• For 2007, the authors predicted a 5-hour rule.
• At actual 2007 prices, the period turned out to be a little under 6 hours.

8

Sample Case
• A database consists of 500,000 records of 1000 bytes each.
• Peak load is 600 transactions per second.
• Only 6% of the data receives 96% of the accesses and is re-accessed in < 5 min.
• This 6% of the data resides in main memory.
• The remaining data is accessed via two hard disks to support a 1-second access time.
• The design saved $3.5m at 1987 costs compared with an entirely main-memory design.
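The sizes implied by the sample case follow from simple arithmetic. The sketch below works them out; it assumes, for illustration only, that each transaction touches exactly one record, which the slide does not state.

```python
# Figures taken from the sample case
records = 500_000
record_bytes = 1000
peak_tps = 600            # transactions per second at peak load
hot_data_pct = 6          # percent of the data that is accessed frequently
hot_access_pct = 96       # percent of accesses that go to that hot data

# Data kept in main memory
hot_records = records * hot_data_pct // 100
hot_bytes = hot_records * record_bytes

# Accesses that miss main memory and go to the disks
# (assumes one record access per transaction, which is an assumption)
disk_accesses_per_sec = peak_tps * (100 - hot_access_pct) // 100

print(f"records in memory: {hot_records}")
print(f"memory needed:     {hot_bytes / 1_000_000:.0f} MB")
print(f"disk accesses/s:   {disk_accesses_per_sec}")
```

Under that assumption only 24 accesses per second reach the disks, which is why a small number of drives is enough for the cold 94% of the data.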

9

Back to Present
• Technology changed
• Multiple cores
• Virtualization
• Size of data increased tremendously
• Gap between RAM and disk performance increased
• FLASH memory comes into the picture!

10

Flash memory characteristics
• Purchase cost
• Access latency
• Bandwidth
• Density
• Power consumption
• Cooling costs
• Everything lies in between RAM and rotating hard disks!

11

Comparison: Flash and Disks

12

Desirability of Flash Memory
• Disk I/O is increasingly the bottleneck: the number of CPU instructions that can execute during one disk I/O is steadily increasing
• A faster intermediate memory in the storage hierarchy is highly desirable

13

Limitations of Flash Memory
• Write bandwidth is lower than read bandwidth.
• Rewriting a block requires erasing the entire block.
• Reliability: 100,000-1M erase and write cycles
• Requires a wear-levelling mechanism
• Requires an agent to erase blocks as soon as they are written to hard disk.

14

The presentation ahead ...
• Key challenges in using flash memory
• Addressing challenges
• Lots of open questions
• Implications in greening the computing infrastructure.

15

#1: Which hardware interface to use?

• Use DIMM?
• Use Serial-ATA?
• Use a new hardware interface?
• Defining and developing a new hardware interface is a time-consuming exercise
• Use one of the existing interfaces

16

#2: Use as Buffer or Persistent Storage?

• Database systems are concerned with providing consistency.

• Databases have a large number of small updates and must maintain recovery logs.

• Logs must be written to persistent storage quickly.
• Use flash as persistent storage!

17

#2: Use as Buffer or Persistent Storage?

• File systems manipulate file contents in memory and write each file to disk in its entirety

• Consistency is achieved via careful write ordering, quick write-back and expensive file-system checks.

• Page movement between flash and disks is expensive if flash is treated as persistent storage.

• Use flash memory as a buffer pool!

18

#3: How to track Frequent Pages?
• The estimation and administration of frequent pages in current systems is done through LRU
• Maintain two LRU chains in RAM

19

Least Recently Used Chain
• LRU for RAM
• LRU for flash memory

[Figure: LRU chain of pages ordered by access time, T(N), T(N-1), ..., T(1)]
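The two-chain idea can be sketched in a few lines. This is a minimal illustration, not the mechanism from the paper: the class name `TwoLevelLRU`, the demotion policy (a page evicted from RAM moves to the head of the flash chain, a page evicted from flash remains only on disk), and the capacities are all assumptions made for the example.

```python
from collections import OrderedDict


class TwoLevelLRU:
    """Hypothetical two-chain LRU: one chain for RAM, one for flash.

    Both chains are kept as ordered maps with the least recently used
    page first. Only page ids are tracked, not page contents.
    """

    def __init__(self, ram_pages, flash_pages):
        self.ram_capacity = ram_pages
        self.flash_capacity = flash_pages
        self.ram = OrderedDict()
        self.flash = OrderedDict()

    def access(self, page_id):
        """Touch a page and return where it was found: 'ram', 'flash' or 'disk'."""
        if page_id in self.ram:
            self.ram.move_to_end(page_id)   # now the most recently used page
            return "ram"
        if page_id in self.flash:
            del self.flash[page_id]         # promote from flash to RAM
            source = "flash"
        else:
            source = "disk"
        self._admit_to_ram(page_id)
        return source

    def _admit_to_ram(self, page_id):
        self.ram[page_id] = True
        if len(self.ram) > self.ram_capacity:
            victim, _ = self.ram.popitem(last=False)   # LRU page leaves RAM
            self.flash[victim] = True                  # and is demoted to flash
            if len(self.flash) > self.flash_capacity:
                self.flash.popitem(last=False)         # now resident on disk only


cache = TwoLevelLRU(ram_pages=2, flash_pages=2)
for page in (1, 2, 3, 1):
    print(page, cache.access(page))
```

In this run page 1 is first read from disk, pushed out of RAM into flash by page 3, and then found in flash on its second access. Both chains live in RAM, as the slide suggests, since only the bookkeeping is kept there.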

20

#4: How to decide size of RAM and Flash?

• Use Five-Minute Rule!

21

#5: How to move pages among layers in hierarchy?

• RAM and flash
  – DMA transfer

• Flash and disk
  – DMA (hardware)
  – Transfer buffer in RAM (software)

22

#6: How to track Page Locations?
• File systems
  – Maintain pointer pages
  – A pointer points to a data page or a run of contiguous data pages
  – Individual page movement may require breaking up a run and updating pointer pages

23

#6: How to track Page Locations?
• Database systems
  – Use B-tree indexes
  – Other kinds of indexes have been implemented efficiently on B-trees
  – Page movement requires updating pointers in the parent node and neighbors

24

Benefits to Database Systems
• Checkpoint processing
  – Provides consistency in databases
  – Writes dirty pages to persistent storage
  – Persistent flash storage is faster
  – Need to write to disk only if the page-replacement policy requires it

• Recovery logs
  – Quick writes

25

Benefits to Database Systems
• Query processing
  – Index-based selection is faster
  – Need to consider index-based query plans
  – Index joins and intersections

• Example:
  – Table scan: 100M rows in 100 s
  – Disk index fetches 10K rows in 100 s
  – Table scan is more efficient if the result has more than 10K rows
  – Flash index scan fetches 500K rows in the same time!
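The example boils down to a break-even row count between a full table scan and index fetches. The sketch below derives the fetch rates from the slide's numbers, assuming that the 500K-row flash figure refers to the same 100 s window as the other two figures. The function names are hypothetical.

```python
TABLE_SCAN_SECONDS = 100.0               # full scan of 100M rows

# Fetch rates derived from the slide (rows per second)
DISK_FETCHES_PER_SEC = 10_000 / 100      # 10K rows in 100 s
FLASH_FETCHES_PER_SEC = 500_000 / 100    # 500K rows in 100 s (assumed window)


def break_even_rows(fetches_per_sec, scan_seconds=TABLE_SCAN_SECONDS):
    """Result size at which index fetches take as long as the table scan."""
    return fetches_per_sec * scan_seconds


def cheaper_plan(result_rows, fetches_per_sec, scan_seconds=TABLE_SCAN_SECONDS):
    """Pick the faster plan for a query returning result_rows rows."""
    index_seconds = result_rows / fetches_per_sec
    return "index" if index_seconds < scan_seconds else "table scan"


for rows in (5_000, 50_000, 1_000_000):
    print(rows,
          "disk:", cheaper_plan(rows, DISK_FETCHES_PER_SEC),
          "flash:", cheaper_plan(rows, FLASH_FETCHES_PER_SEC))
```

A query returning 50,000 rows is a table scan on disk but an index plan on flash, which is why the optimizer has to reconsider index-based plans once the index lives on flash.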

26

Problem of Optimal B-tree Page Size

• Two different optimal page sizes

27

Implications for Green Computing
• This work's focus is infrastructure cost.
• Energy optimization may lead to different optimal page sizes for B-trees.
• Infrastructure cost optimization can lead to a significant reduction in RAM size and hence lower energy consumption.
• It introduces a large flash memory into the system.

28

Implications for Green Computing
• Let P_flash be the power consumption with flash memory
• Let P_noflash be the power consumption without flash
• Let T_flash, T_noflash denote system throughput with/without flash
• System is green if
  – P_flash / P_noflash < 1
  – T_flash / T_noflash > 1

29

Implications for Green Computing

• What if P_flash / P_noflash > 1?
• In this case, the system is green if
  – T_flash / T_noflash > P_flash / P_noflash
  – Gain in throughput is higher than the extra power spent
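Both cases of the "green" test can be folded into one small predicate. This is a sketch of the criterion as stated on the slides; the function name `is_green` and the sample figures are hypothetical, and the boundary case where the power ratio is exactly 1 is handled by the second rule as an assumption.

```python
def is_green(p_flash, p_noflash, t_flash, t_noflash):
    """Apply the two-case criterion from the slides.

    p_*: system power consumption with / without flash
    t_*: system throughput with / without flash
    """
    power_ratio = p_flash / p_noflash
    throughput_ratio = t_flash / t_noflash
    if power_ratio < 1:
        # Less power is used: throughput only has to improve.
        return throughput_ratio > 1
    # More (or equal) power is used: throughput must grow faster than power.
    return throughput_ratio > power_ratio


# Hypothetical measurements, for illustration only
print(is_green(p_flash=80, p_noflash=100, t_flash=120, t_noflash=100))
print(is_green(p_flash=120, p_noflash=100, t_flash=150, t_noflash=100))
print(is_green(p_flash=120, p_noflash=100, t_flash=110, t_noflash=100))
```

The third configuration is rejected because a 10% throughput gain does not pay for 20% more power.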

30

Some calculations
• Assume a linear relation between the number of frequently accessed pages and the frequent period
• If M is the RAM used in the no-flash system
  – M/15 is the RAM in the flash-based system
  – 4M is the flash memory

• Let p_ram and p_flash be the power consumed per unit of RAM and flash

• P_flash = M/15 x p_ram + 4M x p_flash

• P_noflash = M x p_ram

• P_flash < P_noflash if p_flash < 14/60 p_ram

• The relationship holds true.
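The 14/60 threshold follows from dividing the inequality by M and solving for the per-unit flash power. The sketch below redoes that step with exact fractions; the helper names are hypothetical, and the per-unit power values passed in at the end are made up purely to exercise both sides of the threshold.

```python
from fractions import Fraction

RAM_FRACTION = Fraction(1, 15)   # flash-based system keeps M/15 of the RAM
FLASH_MULTIPLE = 4               # and adds 4M of flash


def flash_power_threshold():
    """Largest ratio p_flash / p_ram for which the flash system uses less power.

    M/15 * p_ram + 4M * p_flash < M * p_ram
    =>  4 * p_flash < (1 - 1/15) * p_ram
    =>  p_flash < (14/15) / 4 * p_ram = 14/60 * p_ram
    """
    return (1 - RAM_FRACTION) / FLASH_MULTIPLE


def power_with_flash(m, p_ram, p_flash):
    return m * RAM_FRACTION * p_ram + FLASH_MULTIPLE * m * p_flash


def power_without_flash(m, p_ram):
    return m * p_ram


print(flash_power_threshold())   # reduced form of 14/60

# Made-up per-unit power values on either side of the threshold
m, p_ram = 15, 1
print(power_with_flash(m, p_ram, Fraction(1, 5)) < power_without_flash(m, p_ram))
print(power_with_flash(m, p_ram, Fraction(1, 4)) < power_without_flash(m, p_ram))
```

Exact fractions are used so that the comparison with 14/60 is not blurred by floating-point rounding.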

31

Conclusions
• A faster intermediate memory in the storage hierarchy is desirable.
• Database systems are likely to benefit a lot.
• The benefits for file systems are less clear.
• Flash can improve system throughput and reduce power consumption.
• Reduction in RAM usage can lead to significant power savings.

32

Thank You!