RAMCloud: Scalable High-Performance Storage Entirely in DRAM
![Page 1: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/1.jpg)
RAMCloud: Scalable High-Performance Storage Entirely in DRAM
John Ousterhout
Stanford University
(with Nandu Jayakumar, Diego Ongaro, Mendel Rosenblum, Stephen Rumble, and Ryan Stutsman)
![Page 2: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/2.jpg)
DRAM in Storage Systems
March 28, 2011 RAMCloud Slide 2
[Timeline figure, 1970 to 2010: UNIX buffer cache; main-memory databases; large file caches; Web indexes entirely in DRAM; memcached; Facebook: 200 TB total data, 150 TB cache!; main-memory DBs, again]
![Page 3: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/3.jpg)
DRAM in Storage Systems
● DRAM usage limited/specialized
● Clumsy (consistency with backing store)
● Lost performance (cache misses, backing store)
![Page 4: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/4.jpg)
RAMCloud
Harness full performance potential of large-scale DRAM storage:
● General-purpose storage system
● All data always in DRAM (no cache misses)
● Durable and available (no backing store)
● Scale: 1000+ servers, 100+ TB
● Low latency: 5-10µs remote access
Potential impact: enable new class of applications
![Page 5: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/5.jpg)
RAMCloud Overview
● Storage for datacenters
● 1000-10000 commodity servers
● 32-64 GB DRAM/server
● All data always in RAM
● Durable and available
● Performance goals:
  • High throughput: 1M ops/sec/server
  • Low-latency access: 5-10µs RPC
[Figure: application servers and storage servers inside a datacenter]
![Page 6: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/6.jpg)
Example Configurations
| | Today | 5-10 years |
|---|---|---|
| # servers | 2000 | 4000 |
| GB/server | 24GB | 256GB |
| Total capacity | 48TB | 1PB |
| Total server cost | $3.1M | $6M |
| $/GB | $65 | $6 |

For $100-200K today:
• One year of Amazon customer orders
• One year of United flight reservations
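The derived rows of the configuration table (total capacity and $/GB) follow from the server count, the DRAM per server, and the total server cost. The sketch below recomputes them from the slide's figures; the dictionary layout and the function name `summarize` are illustrative choices, and decimal units (1 TB = 1000 GB) are assumed.

```python
# Figures copied from the "Example Configurations" table on the slide.
configs = {
    "today": {"servers": 2000, "gb_per_server": 24, "cost_usd": 3.1e6},
    "5-10 years": {"servers": 4000, "gb_per_server": 256, "cost_usd": 6.0e6},
}


def summarize(cfg):
    """Return (total capacity in GB, cost per GB in dollars)."""
    total_gb = cfg["servers"] * cfg["gb_per_server"]
    return total_gb, cfg["cost_usd"] / total_gb


for name, cfg in configs.items():
    total_gb, usd_per_gb = summarize(cfg)
    # Assumes decimal units: 1 TB = 1000 GB.
    print(f"{name}: {total_gb / 1000:.0f} TB, ${usd_per_gb:.2f}/GB")
```

The results land close to the slide's rounded values: about 48 TB at roughly $65/GB today, and about 1 PB at roughly $6/GB in the projected configuration.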
![Page 7: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/7.jpg)
Why Does Latency Matter?
[Figure: a traditional application (UI, application logic, and data structures on a single machine; << 1µs latency) next to a web application (UI and application logic on application servers, data on storage servers in a datacenter; 0.5-10ms latency)]
● Large-scale apps struggle with high latency
  • Facebook: can only make 100-150 internal requests per page
  • Random access data rate has not scaled!
![Page 8: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/8.jpg)
MapReduce
[Figure: computation applied to data]
• Sequential data access → high data access rate
• Not all applications fit this model
• Offline
![Page 9: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/9.jpg)
Goal: Scale and Latency
[Figure: traditional application (single machine, << 1µs latency) next to a web application (application servers and storage servers in a datacenter), where the 0.5-10ms latency is replaced by 5-10µs]
● Enable new class of applications:
  • Crowd-level collaboration
  • Large-scale graph algorithms
  • Real-time information-intensive applications
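To see why the latency gap matters, it helps to count how many dependent (one-after-another) storage requests fit into a fixed time budget. The sketch below does that arithmetic with the latencies quoted on the slides. The 100 ms budget and the assumption that requests are issued strictly sequentially are illustrative choices, not figures from the talk.

```python
def sequential_requests(budget_us, latency_us):
    """How many back-to-back requests fit in the budget (integer microseconds)."""
    return budget_us // latency_us


BUDGET_US = 100_000  # assumed time budget of 100 ms per page (hypothetical)

for label, latency_us in [
    ("0.5 ms", 500),     # best case for today's datacenter storage
    ("10 ms", 10_000),   # worst case for today's datacenter storage
    ("5 us", 5),         # RAMCloud goal, low end
    ("10 us", 10),       # RAMCloud goal, high end
]:
    print(f"{label:>6}: {sequential_requests(BUDGET_US, latency_us)} requests")
```

Under these assumptions the budget covers between 10 and 200 requests at 0.5-10 ms, the same order of magnitude as the 100-150 internal requests per page cited for Facebook, and 10,000 to 20,000 requests at 5-10 µs.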
![Page 10: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/10.jpg)
RAMCloud Architecture
[Figure: 1000 to 100,000 application servers, each running an application linked with the library, connected through the datacenter network to 1000 to 10,000 storage servers, each running a master and a backup; a coordinator is attached to the same network]
![Page 11: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/11.jpg)
Data Model
create(tableId, blob) => objectId, version
read(tableId, objectId) => blob, version
write(tableId, objectId, blob) => version
cwrite(tableId, objectId, blob, version) => version    (only overwrite if version matches)
delete(tableId, objectId)
[Figure: tables contain objects; each object has an identifier (64b), a version (64b), and a blob (≤1MB)]
Richer model in the future:
• Indexes?
• Transactions?
• Graphs?
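The five operations above can be sketched as a toy, single-process table. This is not RAMCloud's implementation: the class name `Table`, the per-table version counter, the choice of exception types, and the decision to let `write` create a missing object are all assumptions made for illustration. The `tableId` argument is dropped because one `Table` instance stands for one table.

```python
class Table:
    """Toy in-memory table: object id -> (blob, version)."""

    MAX_BLOB = 1 << 20  # the slide limits blobs to 1 MB

    def __init__(self):
        self._objects = {}
        self._next_id = 0
        self._next_version = 1  # assumption: one version counter per table

    def _store(self, object_id, blob):
        if len(blob) > self.MAX_BLOB:
            raise ValueError("blob larger than 1 MB")
        version = self._next_version
        self._next_version += 1
        self._objects[object_id] = (blob, version)
        return version

    def create(self, blob):
        # Pick an identifier that is not in use yet.
        while self._next_id in self._objects:
            self._next_id += 1
        object_id = self._next_id
        self._next_id += 1
        return object_id, self._store(object_id, blob)

    def read(self, object_id):
        return self._objects[object_id]  # (blob, version)

    def write(self, object_id, blob):
        return self._store(object_id, blob)

    def cwrite(self, object_id, blob, version):
        # Conditional write: only overwrite if the version matches.
        if self._objects[object_id][1] != version:
            raise RuntimeError("version mismatch")
        return self._store(object_id, blob)

    def delete(self, object_id):
        del self._objects[object_id]


table = Table()
oid, v1 = table.create(b"hello")
v2 = table.cwrite(oid, b"world", v1)  # succeeds because v1 is still current
```

A second `cwrite` that still passes `v1` would now fail, which is how a client can detect that another writer got there first.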
![Page 12: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/12.jpg)
Durability and Availability
● Goals:
  • No impact on performance
  • Minimum cost, energy
● Keep replicas in DRAM of other servers?
  • 3x system cost, energy
  • Still have to handle power failures
  • Replicas unnecessary for performance
● RAMCloud approach:
  • 1 copy in DRAM
  • Backup copies on disk/flash: durability ~ free!
● Issues to resolve:
  • Synchronous disk I/O’s during writes??
  • Data unavailable after crashes??
![Page 13: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/13.jpg)
Buffered Logging
[Figure: a write request reaches the master, which holds a hash table and an in-memory log; the master sends the data to three backups, each with a buffered segment and a disk]
● No disk I/O during write requests
● Master’s memory also log-structured
● Log cleaning ~ generational garbage collection
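The buffered-logging idea can be sketched in a few lines: a write appends to the master's in-memory log, updates the hash table, and is copied into each backup's in-memory buffered segment, so nothing on the write path touches disk. The class and method names below are hypothetical, the "disk" is just a Python list, and when a backup flushes (here, only when `flush` is called explicitly) is an assumption.

```python
class Backup:
    def __init__(self):
        self.buffered_segment = []  # held in the backup's memory
        self.disk = []              # stand-in for the on-disk copy

    def append(self, entry):
        # Called during a write request: memory only, no disk I/O.
        self.buffered_segment.append(entry)

    def flush(self):
        # Runs later, off the write path (trigger is an assumption).
        self.disk.extend(self.buffered_segment)
        self.buffered_segment.clear()


class Master:
    def __init__(self, backups):
        self.log = []         # in-memory log (append-only)
        self.hash_table = {}  # key -> position of the newest entry in the log
        self.backups = backups

    def write(self, key, blob):
        self.log.append((key, blob))
        self.hash_table[key] = len(self.log) - 1
        for backup in self.backups:
            backup.append((key, blob))

    def read(self, key):
        return self.log[self.hash_table[key]][1]


backups = [Backup() for _ in range(3)]
master = Master(backups)
master.write("k", b"v1")
master.write("k", b"v2")  # the old entry stays in the log until cleaned
```

Because overwritten entries remain in the log, some form of log cleaning is needed to reclaim space, which is the role the slide compares to generational garbage collection.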
![Page 14: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/14.jpg)
Crash Recovery
● Power failures: backups must guarantee durability of buffered data:
  • DIMMs with built-in flash backup
  • Per-server battery backups
  • Caches on enterprise disk controllers
● Server crashes:
  • Must replay log to reconstruct data
  • Meanwhile, data is unavailable
  • Solution: fast crash recovery (1-2 seconds)
  • If fast enough, failures will not be noticed
● Key to fast recovery: use system scale
![Page 15: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/15.jpg)
Recovery, First Try
● Master chooses backups statically
  • Each backup stores entire log for master
● Crash recovery:
  • Choose recovery master
  • Backups read log info from disk
  • Transfer logs to recovery master
  • Recovery master replays log
● First bottleneck: disk bandwidth:
  • 64 GB / 3 backups / 100 MB/sec/disk ≈ 210 seconds
● Solution: more disks (and backups)
[Figure: backups sending log data to a single recovery master]
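The disk-bandwidth estimate on this slide is a single division, shown here as a small helper so the inputs can be varied. The function name `recovery_seconds` is illustrative, and decimal units are assumed (1 GB = 10^9 bytes, 1 MB = 10^6 bytes).

```python
GB = 10**9  # decimal units assumed
MB = 10**6


def recovery_seconds(data_bytes, num_disks, disk_bytes_per_sec):
    """Time to read the whole log when it is spread evenly over the disks."""
    return data_bytes / (num_disks * disk_bytes_per_sec)


first_try = recovery_seconds(64 * GB, 3, 100 * MB)
print(f"{first_try:.0f} seconds")
```

With three disks the read alone takes a little over 210 seconds, in line with the slide's rounded figure. Since the time is inversely proportional to the number of disks, the natural fix is more disks and more backups.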
![Page 16: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/16.jpg)
Recovery, Second Try
● Scatter logs:
  • Each log divided into 8MB segments
  • Master chooses different backups for each segment (randomly)
  • Segments scattered across all servers in the cluster
● Crash recovery:
  • All backups read from disk in parallel
  • Transmit data over network to recovery master
[Figure: ~1000 backups sending data to one recovery master]
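Random scattering of segments can be sketched as follows. The replication factor of 3 is an assumption carried over from the "3 backups" figure in the first attempt, and the function name `scatter_segments` is hypothetical. Decimal units are assumed, so a 64 GB log yields 8000 segments of 8 MB.

```python
import random
from collections import Counter


def scatter_segments(num_segments, backups, replicas=3, rng=random):
    """Pick `replicas` distinct backups at random for every segment."""
    return [rng.sample(backups, replicas) for _ in range(num_segments)]


SEGMENT_MB = 8
num_segments = 64 * 1000 // SEGMENT_MB  # 64 GB log cut into 8 MB segments
backup_ids = list(range(1000))          # ~1000 backups, as on the slide

placement = scatter_segments(num_segments, backup_ids)

# How many segment replicas ended up on each backup.
load = Counter(b for replica_set in placement for b in replica_set)
print("busiest backup holds", max(load.values()), "segment replicas")
```

Random choice spreads the replicas across the cluster without any central bookkeeping, at the price of some imbalance between the busiest and least busy backup.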
![Page 17: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/17.jpg)
Scattered Logs, cont’d
● Disk no longer a bottleneck:
  • 64 GB / 8 MB/segment / 1000 backups ≈ 8 segments/backup
  • 100ms/segment to read from disk
  • 0.8 second to read all segments in parallel
● Second bottleneck: NIC on recovery master
  • 64 GB / 10 Gbits/second ≈ 60 seconds
  • Recovery master CPU is also a bottleneck
● Solution: more recovery masters
  • Spread work over 100 recovery masters
  • 64 GB / 10 Gbits/second / 100 masters ≈ 0.6 second
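The estimates on this slide can be reproduced step by step. The sketch assumes decimal units and the raw 10 Gbit/s line rate with no protocol overhead, so the network figures come out somewhat below the slide's rounded values; all variable names are illustrative.

```python
GB, MB = 10**9, 10**6  # decimal units assumed
data_bytes = 64 * GB

# Disk side: segments are spread over 1000 backups and read in parallel.
segments_per_backup = data_bytes / (8 * MB) / 1000
disk_read_s = segments_per_backup * 0.100  # 100 ms per segment

# Network side: everything funnels through the recovery master's NIC.
nic_bytes_per_s = 10e9 / 8                 # 10 Gbit/s at raw line rate
one_master_s = data_bytes / nic_bytes_per_s
hundred_masters_s = one_master_s / 100

print(f"segments per backup: {segments_per_backup:.0f}")
print(f"parallel disk read:  {disk_read_s:.1f} s")
print(f"one recovery master: {one_master_s:.1f} s")
print(f"100 masters:         {hundred_masters_s:.2f} s")
```

The raw arithmetic gives about 51 seconds for a single recovery master and about 0.5 second for 100 masters, which the slide rounds to 60 seconds and 0.6 second.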
![Page 18: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/18.jpg)
Recovery, Third Try
● Divide each master’s data into partitions
  • Recover each partition on a separate recovery master
  • Partitions based on tables & key ranges, not log segment
  • Each backup divides its log data among recovery masters
[Figure: backups holding the dead master’s log send data to multiple recovery masters]
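The partitioning step can be illustrated with a small routing function: each backup looks at every log entry it holds and assigns it to the recovery master responsible for that entry's table and key range. The partition map, the recovery master names, and the entry layout `(table_id, object_id, blob)` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical partition map: (table id, first key, last key, recovery master).
partitions = [
    (1, 0, 499, "rm-A"),
    (1, 500, 999, "rm-B"),
    (2, 0, 999, "rm-C"),
]


def recovery_master_for(table_id, object_id):
    """Find the recovery master whose table and key range cover this object."""
    for tid, first, last, recovery_master in partitions:
        if tid == table_id and first <= object_id <= last:
            return recovery_master
    raise KeyError((table_id, object_id))


def divide_log_data(entries):
    """Group one backup's log entries by the recovery master that needs them."""
    buckets = defaultdict(list)
    for table_id, object_id, blob in entries:
        target = recovery_master_for(table_id, object_id)
        buckets[target].append((table_id, object_id, blob))
    return dict(buckets)


segment = [(1, 7, b"a"), (2, 42, b"b"), (1, 900, b"c"), (1, 8, b"d")]
routed = divide_log_data(segment)
```

Because partitions follow tables and key ranges rather than log segments, a single segment usually feeds several recovery masters, as the example segment does here.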
![Page 19: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/19.jpg)
Other Research Issues
● Fast communication (RPC)
  • New datacenter network protocol?
● Data model
● Concurrency, consistency, transactions
● Data distribution, scaling
● Multi-tenancy
● Client-server functional distribution
● Node architecture
![Page 20: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/20.jpg)
Project Status
● Goal: build production-quality implementation
● Started coding Spring 2010
● Major pieces coming together:
  • RPC subsystem
    • Supports many different transport layers
    • Using Mellanox Infiniband for high performance
  • Basic data model
  • Simple cluster coordinator
  • Fast recovery
● Performance (40-node cluster):
  • Read small object: 5µs
  • Throughput: > 1M small reads/second/server
![Page 21: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/21.jpg)
Single Recovery Master
[Figure: performance plot for a single recovery master; annotations: 1000, 400-800 MB/sec]
![Page 22: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/22.jpg)
Recovery Scalability
[Figure: recovery scalability plot, ranging from 1 master, 6 backups, 6 disks, 600 MB up to 11 masters, 66 backups, 66 disks, 6.6 GB]
![Page 23: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/23.jpg)
Conclusion
● Achieved low latency (at small scale)
● Not yet at large scale (but scalability encouraging)
● Fast recovery:
  • 1 second for memory sizes < 10GB
  • Scalability looks good
  • Durable and available DRAM storage for the cost of volatile cache
● Many interesting problems left
● Goals:
  • Harness full performance potential of DRAM-based storage
  • Enable new applications: intensive manipulation of large-scale data
![Page 24: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/24.jpg)
Why not a Caching Approach?
● Lost performance:
  • 1% misses → 10x performance degradation
● Won’t save much money:
  • Already have to keep information in memory
  • Example: Facebook caches ~75% of data size
● Availability gaps after crashes:
  • System performance intolerable until cache refills
  • Facebook example: 2.5 hours to refill caches!
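The "1% misses → 10x" claim follows from a weighted average of hit and miss latencies. The sketch below uses an assumed 10 µs DRAM access (the upper end of the RAMCloud latency goal) and an assumed 10 ms disk access (the 2009 disk latency quoted elsewhere in the talk); neither number is stated on this slide.

```python
def average_access_us(miss_rate, hit_us, miss_us):
    """Expected access time when a fraction `miss_rate` of requests go to disk."""
    return (1 - miss_rate) * hit_us + miss_rate * miss_us


HIT_US = 10       # assumed DRAM access time over the network
MISS_US = 10_000  # assumed disk access time (10 ms)

slowdown = average_access_us(0.01, HIT_US, MISS_US) / HIT_US
print(f"1% misses -> {slowdown:.1f}x slower than all-DRAM")
```

Under these assumptions a 1% miss rate makes the average access roughly 11 times slower, consistent with the order of magnitude on the slide.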
![Page 25: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/25.jpg)
Data Model Rationale
How to get best application-level performance?
[Figure: a spectrum from lower-level APIs (less server functionality) to higher-level APIs (more server functionality), with distributed shared memory, key-value store, and relational database placed along it]
Distributed shared memory:
• Server implementation easy
• Low-level performance good
• APIs not convenient for applications
• Lose performance in application-level synchronization
Relational database:
• Powerful facilities for apps
• Best RDBMS performance
• Simple cases pay RDBMS performance
• More complexity in servers
![Page 26: RAMCloud: Scalable High-Performance Storage Entirely in DRAM](https://reader036.fdocuments.net/reader036/viewer/2022062520/56815b7d550346895dc97b47/html5/thumbnails/26.jpg)
RAMCloud Motivation: Technology
Disk access rate not keeping up with capacity:

| | Mid-1980’s | 2009 | Change |
|---|---|---|---|
| Disk capacity | 30 MB | 500 GB | 16667x |
| Max. transfer rate | 2 MB/s | 100 MB/s | 50x |
| Latency (seek & rotate) | 20 ms | 10 ms | 2x |
| Capacity/bandwidth (large blocks) | 15 s | 5000 s | 333x |
| Capacity/bandwidth (1KB blocks) | 600 s | 58 days | 8333x |

● Disks must become more archival
● More information must move to memory
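The two capacity/bandwidth rows can be derived from the first three rows of the table. The sketch assumes decimal units and, for the 1KB case, counts only seek and rotate latency per block (transfer time is ignored), which is what reproduces the slide's figures; the function names are illustrative.

```python
KB, MB, GB = 10**3, 10**6, 10**9  # decimal units assumed

disks = {
    "mid-1980s": {"capacity": 30 * MB, "rate": 2 * MB, "latency_s": 0.020},
    "2009": {"capacity": 500 * GB, "rate": 100 * MB, "latency_s": 0.010},
}


def scan_large_blocks_s(disk):
    """Time to read the whole disk sequentially at the max transfer rate."""
    return disk["capacity"] / disk["rate"]


def scan_1kb_blocks_s(disk):
    """Time to read the whole disk in random 1 KB blocks, one seek each."""
    return disk["capacity"] / KB * disk["latency_s"]


for era, disk in disks.items():
    large = scan_large_blocks_s(disk)
    small = scan_1kb_blocks_s(disk)
    print(f"{era}: large blocks {large:.0f} s, "
          f"1KB blocks {small:.0f} s ({small / 86400:.1f} days)")
```

Reading a 2009 disk in random 1KB blocks takes about 58 days versus 600 seconds for the mid-1980s disk, which is the roughly 8333x gap in the last row.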