Post on 10-Dec-2015
Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt
The Rebirth of Database Machines
Dina Bitton
Jim Gray
Outline
• Active Disks are coming
• Disk Tutorial (not presented, but slides in deck)
• Disk Arms are important (optimize them)
• The Rebirth of Database Machines
Disks of 30 Years Ago
• 10 MB
• Failed every few weeks
• Cost more than 400$
Disk Arrays
• 24 cpus
• 384 disks
• More mips in the disks than in the cpus
Year 2003 Disks
• Big disk (10 $/GB)
– 3”
– 200 GB
– 150 kaps (k accesses per second)
– 30 MBps sequential
• Small disk (20 $/GB)
– 2”
– 40 GB
– 100 kaps
– 20 MBps sequential
• Both running DBMS, Mail, Web, and OS
• From CMU Active Disk web site: http://www.pdl.cs.cmu.edu/Active/
Research Problem: When every disk is a super-computer…
And there are thousands of them...
• Who manages data placement?
• Query plans among 1,000 servers?
• How does
– mirroring work?
– backup work?
• Where does my program run?
Relevant University Research on Active Disks
• Kim Keeton & Dave Patterson @ UC Berkeley: http://www.cs.berkeley.edu/~pattrsn/talks/sigmod98-keynote.ppt
• Erik Riedel & Garth Gibson @ CMU: http://www.pdl.cs.cmu.edu/Active/
• Mike Franklin @ U Maryland: http://www.cs.umd.edu/projects/bdisk
• Anurag Acharya, Mustafa Uysal @ UC SB: http://www.cs.ucsb.edu/TRs/TRCS98-06.html
Outline
• Active Disks are coming
• Disk Tutorial (not presented, but slides in deck)
• Disk Arms are important
• The Rebirth of Database Machines
Disk Access Time
• Access time = SeekTime (6 ms) + RotateTime (3 ms) + ReadTime (1 ms)
• Rotate time:
– 5,000 to 10,000 rpm
– ~ 12 to 6 milliseconds per rotation
– ~ 6 to 3 ms rotational latency
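The arithmetic on this slide can be checked with a short sketch. It assumes that average rotational latency is half a rotation, which matches the slide's 12 ms / 6 ms and 6 ms / 3 ms pairs; the function names are illustrative and not from the talk.

```python
def rotation_ms(rpm: float) -> float:
    """Time for one full platter rotation, in milliseconds."""
    return 60_000.0 / rpm


def rotational_latency_ms(rpm: float) -> float:
    """Average wait for the sector to arrive: assumed to be half a rotation."""
    return rotation_ms(rpm) / 2.0


def access_time_ms(seek_ms: float, rpm: float, read_ms: float) -> float:
    """Access time = seek + rotational latency + read (transfer)."""
    return seek_ms + rotational_latency_ms(rpm) + read_ms


for rpm in (5_000, 10_000):
    print(rpm, "rpm:", rotation_ms(rpm), "ms/rotation,",
          rotational_latency_ms(rpm), "ms latency")

# Slide values: 6 ms seek, 10,000 rpm (3 ms latency), 1 ms read.
print("access time:", access_time_ms(seek_ms=6.0, rpm=10_000, read_ms=1.0), "ms")
```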
Disk Access Time Improves Slowly
• Access time = SeekTime (6 ms, 8%/y) + RotateTime (3 ms, 8%/y) + ReadTime (1 ms, 40%/y)
• Other useful facts:
– Power rises more than size^3 (small is indeed beautiful)
– Small devices are more rugged
– Small devices can use plastics (forces are much smaller), e.g. bugs fall without breaking anything
Disk Seek Time
• Seek time is ~ Sqrt(distance) (distance = 1/2 acceleration x time^2)
• Specs assume seek is 1/3 of disk
• Short seeks are common. (over 50% are zero length)
• Typical 1/3 seek time: 6 ms
• 4x improvement in 20 years.
[Figure: arm speed vs. time; Full Accelerate, then Full Stop]
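A minimal sketch of the square-root seek model follows. It assumes the arm accelerates for half the distance and decelerates for the other half, ignores settle time, and calibrates the curve so that a 1/3-stroke seek takes the slide's 6 ms. The constant and function name are illustrative.

```python
import math

THIRD_STROKE_MS = 6.0   # from the slide: typical 1/3-stroke seek time
THIRD_STROKE = 1.0 / 3.0


def seek_ms(distance: float) -> float:
    """Seek time for a distance given as a fraction of the full stroke (0..1).

    With constant acceleration, distance grows with time squared,
    so time grows with the square root of distance.
    """
    if distance <= 0.0:
        return 0.0  # zero-length seek: the arm does not move
    return THIRD_STROKE_MS * math.sqrt(distance / THIRD_STROKE)


for d in (0.0, 1.0 / 12.0, 1.0 / 3.0, 1.0):
    print(f"distance {d:.3f} of stroke -> {seek_ms(d):.2f} ms")
```

Under this model, quartering the distance only halves the seek time, which is one reason short seeks are relatively expensive per track moved.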
Disk Access Ratios Have Changed
• Key metrics:
– $/GB
– Kaps/GB (KB accesses per second per GB)
– SCAN: time to scan the disk
• Scan going from minutes to days
• Disk arms are precious resource (disk capacity is no longer the precious resource)
• Kaps/GB went from 500 to 7 and going to 1

Year  Capacity (GB)  $/GB    kaps  kaps/GB  Sequential scan  Random scan
1988  0.25           20,000  30    1200     2 minutes        20 minutes
1998  18             50      120   7        20 minutes       5 hrs
2003  200            5       200   1        2 hrs            1.2 days
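The ratios in the table can be recomputed from its capacity and kaps columns with a rough sketch. The 8 KB transferred per random access is an assumption, chosen only because it roughly reproduces the random-scan column; the slide does not state a page size. The computed kaps/GB values come straight from the two input columns and need not match the printed kaps/GB column exactly.

```python
from dataclasses import dataclass

PAGE_KB = 8  # assumption: KB moved per random access (not stated on the slide)


@dataclass
class Disk:
    year: int
    capacity_gb: float
    kaps: float  # accesses per second


def kaps_per_gb(disk: Disk) -> float:
    """Accesses per second available per GB stored."""
    return disk.kaps / disk.capacity_gb


def random_scan_hours(disk: Disk, page_kb: int = PAGE_KB) -> float:
    """Hours to read the whole disk one random page at a time."""
    pages = disk.capacity_gb * 1_000_000 / page_kb
    return pages / disk.kaps / 3600.0


disks = [Disk(1988, 0.25, 30), Disk(1998, 18, 120), Disk(2003, 200, 200)]
for d in disks:
    print(d.year, f"kaps/GB={kaps_per_gb(d):.1f}",
          f"random scan={random_scan_hours(d):.2f} h")
```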
Stripe For More Bandwidth
• N-stores have N-times the bandwidth
• Works great!
• Supported by most file systems
Mirrors: Replicate Stores for Availability
• Read one, write all
• If one fails, rebuild from survivor
• Run scrubber in background to fix faults
• N-replicas can give N-times the bandwidth
• UnAvailability ~ MTTF^N / MTTF
  (50 years)^2 / (1 day) ≈ 1,000,000 years
  A Million Years!!!
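The million-year figure can be reproduced with a small sketch. It reads the slide's arithmetic as (disk MTTF)^2 divided by the repair time, with a 50-year disk MTTF and a 1-day repair. That reading is an assumption; some derivations add a constant factor for the number of disks, which is omitted here to match the slide's numbers.

```python
DAYS_PER_YEAR = 365.0


def mirrored_pair_mttf_years(disk_mttf_years: float, repair_days: float) -> float:
    """Approximate mean time until both copies are lost.

    Data is lost only if the second disk fails while the first
    is still being repaired, so the repair window is what matters.
    """
    repair_years = repair_days / DAYS_PER_YEAR
    return disk_mttf_years ** 2 / repair_years


print(mirrored_pair_mttf_years(disk_mttf_years=50.0, repair_days=1.0))
```

The result is a little under one million years, which the slide rounds up.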
RAID5: Parity Saves Storage Space
• Mirrors: 50% storage overhead
– read one, write both
• RAID5: 12% storage overhead
– read one, write one plus parity
[Figure: PARITY]
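A short sketch of the parity idea follows. It uses byte-wise XOR as the parity function, which is the usual choice for RAID5, and assumes a group of 7 data disks plus 1 parity disk to arrive near the slide's 12% overhead. The group size and helper names are assumptions, not taken from the talk.

```python
from functools import reduce


def parity(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving: list[bytes], parity_block: bytes) -> bytes:
    """Recover one lost block: XOR the survivors with the parity block."""
    return parity(surviving + [parity_block])


def overhead(data_disks: int, redundant_disks: int) -> float:
    """Fraction of raw capacity spent on redundancy."""
    return redundant_disks / (data_disks + redundant_disks)


data = [b"abcd", b"efgh", b"ijkl"]
p = parity(data)
# Pretend the middle disk failed and rebuild it from the other two plus parity.
print(rebuild([data[0], data[2]], p))

print("mirror overhead:", overhead(1, 1))
print("RAID5 overhead (7 data + 1 parity):", overhead(7, 1))
```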
Interesting Fact: Mirrored Disks Optimize Disk Arms
• Doubles read bandwidth
– Sequential read: stagger reads from each drive (stripe)
– Random read: closest arm, seek is min seek.
• Doubles write cost (write both)
– Write time increases because seek is max seek.
[Figure: Seek Distance vs Disks. x-axis: Number of disks (1 to 8); y-axis: Fraction of disk surface for seek (0 to 0.8); series: Write Seek, Read Seek]
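The min-seek and max-seek effect can be illustrated with a small Monte Carlo sketch. It assumes every arm sits at an independent, uniformly random position and that requests are uniform over the disk; both are simplifying assumptions, and the function name is illustrative. A read goes to the closest arm, while a write must wait for the farthest one.

```python
import random


def mean_seek(n_disks: int, trials: int = 20_000, seed: int = 1) -> tuple[float, float]:
    """Return (mean read seek, mean write seek) as fractions of the disk surface."""
    rng = random.Random(seed)
    read_total = 0.0
    write_total = 0.0
    for _ in range(trials):
        target = rng.random()
        # Distance each mirrored arm must travel to reach the target.
        distances = [abs(target - rng.random()) for _ in range(n_disks)]
        read_total += min(distances)    # read: use the closest arm
        write_total += max(distances)   # write: wait for the farthest arm
    return read_total / trials, write_total / trials


for n in range(1, 9):
    read_seek, write_seek = mean_seek(n)
    print(f"{n} disks: read {read_seek:.3f}, write {write_seek:.3f}")
```

With one disk both values come out near 1/3 of the surface, the figure that disk specs quote; adding mirrors pulls the read seek down and pushes the write seek up.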
If Mix Reads & Writes, Mirror is Better Than Partition
• 2 servers are better than one
• Benefit is better than 2x write cost if reads exceed writes
[Figure, left panel: Seek Distance vs Disks. x-axis: Number of disks (1 to 8); y-axis: Fraction of disk surface for seek (0.0 to 1.0); series: Write Seek, Read Seek]
[Figure, right panel: Normalized Seek Time. x-axis: Number of disks (1 to 8); y-axis: 0.0 to 2.0; series: Write, 25% Read, 50% read, 75% read, Read]
What if you have LOTS of Disks
• When you have BIG disks (200 GB), arms are precious, space is cheap.
• If you replicate 1000x
– write seek time asymptotically approaches 1.7x
– read seek time asymptotically approaches zero.
[Figure: Distance to Seek. x-axis: disks (1 to 1000, log scale); y-axis: 0 to 1000; series: Write, Read]
[Figure: Time to Seek. x-axis: disks (1 to 10); y-axis: time (0 to 2); series: Read, Write]
[Figure: Time to Seek. x-axis: disks (0 to 1000); y-axis: time (0 to 2); series: Read, Write]
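The limiting write cost can be estimated numerically. This sketch assumes seek time is proportional to the square root of seek distance, that arm positions and requests are uniform, and that with very many replicas the farthest arm sits at the far edge of the disk, so its distance tends to max(x, 1 - x) for a request at position x. All of these are modeling assumptions; under them the ratio lands near, though not exactly on, the slide's 1.7x.

```python
import math


def integrate(f, a: float, b: float, steps: int = 100_000) -> float:
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / steps
    return sum(f(a + (i + 0.5) * h) for i in range(steps)) * h


# One disk: the seek distance d between two uniform points has density 2 * (1 - d).
single_disk_time = integrate(lambda d: math.sqrt(d) * 2.0 * (1.0 - d), 0.0, 1.0)

# Many replicas: a write waits for the farthest arm, at distance max(x, 1 - x).
write_limit_time = integrate(lambda x: math.sqrt(max(x, 1.0 - x)), 0.0, 1.0)

print("single-disk mean seek time (model units):", round(single_disk_time, 4))
print("write seek time limit (model units):", round(write_limit_time, 4))
print("ratio:", round(write_limit_time / single_disk_time, 3))
```

The read side needs no computation: with enough replicas some arm is always arbitrarily close to the request, so the minimum seek tends to zero.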
Outline
• Active Disks are coming
• Disk Tutorial (not presented, but slides in deck)
• Disk Arms are important
• The Rebirth of Database Machines
The Rebirth of Database Machines
Dina Bitton (IDS)
Jim Gray (Microsoft)
Outline
• Performance hungry databases
• History: life and death of database machines
• What has changed that can make database machines work today
• Shared-Nothing Database Machine
• Where is the required bandwidth
• DMP : Shared-Nothing & Shared-Everything
Demand for Database Performance
• Larger Databases:
– marketing data warehouses: TB of historical data
– daily news broadcasts: 1 TB of searchable video/audio data
• Large Scans: Searches require access to large fraction of database
• Repeated Scans: DSS queries, Data mining algorithms
Life, Death & Reincarnation
• Database Machines are coming, Database Machines are coming ... (Hsiao 1979)
• Then there was Britton-Lee, Direct, ICL …
– Teradata builds highly-parallel shared-nothing SQL server
– many university “paper” designs
• “Database Machines, An Idea whose time has Passed?” (Boral-DeWitt 1983)
• Then there was MMDBs, Grace, Gamma and more Teradata
• Then there was Software (Parallel Database Query)
• Next: PDQ + lots of disks with power controllers
And All Along
Stonebraker’s Opinion:
“The history of DBMS research is littered with innumerable proposals to construct hardware database machines to provide high performance operations. In general these have been proposed by hardware types with a clever solution in search of a problem on which it might work.”
Readings in Database Systems, Morgan-Kaufmann
Why Not then, but Yes now
• Too early: small databases on 1 disk
– Now: TB databases span thousands of disks, need partitioning
• Disk filter designs: addressed only small part of DBMS requirements
– Now: disk controllers are fast computers
• Exotic technologies (bubbles, CCD…) went away
• Special purpose hardware increased design time and cost
– Now: higher level of integration, VLSI design tools better
• Parallel query processing was not well-understood
– Now: large body of research, successful commercial implementations
Parallel Query Processing [DeWitt-Gray CACM91]
[Figure: five parallel pipelines, each Source Data, Scan, Sort, all feeding a single Merge]
• Pipelining: data streams flow from one operator to the next
• Partitioning: tables are partitioned to allow concurrent processing on partitions
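The scan, sort and merge picture can be sketched in a few lines. This is a toy illustration under stated assumptions, not the DeWitt-Gray implementation: partitions are plain in-memory lists, a thread pool stands in for separate processors, and the function names are made up for the example. Each partition is scanned and sorted concurrently, and the sorted runs stream into one merge.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def scan(partition, predicate):
    """Pipelining: yield qualifying rows one at a time to the next operator."""
    return (row for row in partition if predicate(row))


def scan_and_sort(partition, predicate):
    """Per-partition pipeline: the scan feeds the sort directly."""
    return sorted(scan(partition, predicate))


def parallel_sort(partitions, predicate):
    """Partitioning: run one scan+sort per partition, then merge the sorted runs."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        runs = list(pool.map(lambda p: scan_and_sort(p, predicate), partitions))
    return list(heapq.merge(*runs))


partitions = [[5, 1, 9], [4, 8, 2], [7, 3, 6]]
print(parallel_sort(partitions, lambda row: row % 2 == 1))
```

`heapq.merge` consumes its inputs lazily, so the merge step itself behaves like a pipelined operator over the sorted runs.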
Data Pathway Contention [Patterson Sigmod 1998]
• Disk: external I/O bus bottleneck to transfer rate, cost
• Network: internal I/O bus interface is bottleneck to delivered bandwidth
• Memory-Processor: processor-memory interface (cache + memory bus) is bottleneck to delivered bandwidth
A Shared-Nothing Database Machine
[Figure: several Processor & Memory nodes (. . .) attached to a Scalable Interconnect]
• No contention in memory access or parallel disk access => “Embarrassingly Parallel” Scan [Patterson]
• But: how fast need Interconnect be?
• Each processor has own OS, communication protocols, DB instance
• Exchange data streams for pipelining ops, for sort, merge
• Can’t support M:N mapping between disks & threads
Share-Everything?
• Need more bandwidth for shipping data streams than network can provide
• Need M:N mapping from disks to processors for sort/merge
• Control & synchronization: Data-flow best to synchronize processors
Where to Get the Bandwidth?
(Level of Integration: Components, Links, Throughput, Latency)
• Chip
– Components: Transistor / Gate
– Links: Connection lines
– Throughput: 30 GB/Sec (16 64-bit registers at 200 MHz)
– Latency: 1-8 internal clocks
• Board
– Components: Chips / Discrete components
– Links: 1. Point-to-point connections 2. Buses
– Throughput: 1. 800 MB/Sec 2. 150 MB/Sec
– Latency: 1. Half of transaction (10 clocks of the slowest device) 2. 10-50 bus clocks
• System
– Components: Board / Interface
– Links: 1. Crossbars 2. Buses
– Throughput: 1. 200-500 MB/Sec 2. 80 MB/Sec
– Latency: 1. 10 crossbar clocks 2. 10-50 bus clocks
• Network
– Components: Node (Systems) / Bridges
– Links: Fibre Channel, Ethernet / VIA
– Throughput: Fibre Channel: 100–200 MB/Sec
– Latency: Sender overhead + Receiver overhead + transmission latency + link availability
The Data Manipulation Platform
[Figure: DMP BOARD. Labels: P 1 . . . P 4; NP 1, NP 2 . . . NP 16; BAM RAM; Bus adapter; I/O interface adapter; Direct connection; 1 ... 80; To Host Computer; To other DMP Boards via high-speed switch]
• Massive Parallel Operation data-flow control
• M:N thread-to-disk RFM
• Direct processor to disk access
• Direct disk to memory connect
A DSS Query Execution Plan
Select sum(tabX.amount*.08), tabY.region
from tabX, tabY
where tabX.key=tabY.region
group by tabY.region order by tabY.region;
[Figure: execution plan. Operators: Scan tabX 1, 2, . . . 32; Scan tabY 1, . . . 3; Exchange 1 to Exchange 5; HJoin (three instances); Group 1, Group 2; Sort 1, Sort 2. Storage: Database Disks, Temp Disks. Annotations: 1/3 selected, 1/10 joined, 1/10 grouped]
Bandwidth Requirements
[Figure: the DSS query execution plan (Scan tabX 1 . . . 32, Scan tabY 1 . . . 3, Exchange 1 to 5, HJoin, Group 1 and 2, Sort 1 and 2) annotated with data rates: 32*20 MB/s = 640 MB/s from the Database Disks, then 210 MB/s, 21 MB/s and 2.1 MB/s; Temp Disk Contention]
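The data rates on this slide can be reproduced with a small sketch. It assumes the three rates after the scan correspond to the 1/3 selected, 1/10 joined and 1/10 grouped fractions annotated on the execution plan, applied one after another; that mapping is a reading of the figure, and the function name is illustrative.

```python
def plan_bandwidth(n_disks: int = 32,
                   mb_per_disk: float = 20.0,
                   fractions: tuple[float, ...] = (1 / 3, 1 / 10, 1 / 10)) -> list[float]:
    """Return the MB/s flowing out of the scan and out of each later stage."""
    rates = [n_disks * mb_per_disk]  # aggregate scan rate off the database disks
    for fraction in fractions:
        # Each stage passes on only a fraction of what it receives.
        rates.append(rates[-1] * fraction)
    return rates


stages = ["scan", "selected", "joined", "grouped"]
for stage, rate in zip(stages, plan_bandwidth()):
    print(f"{stage:9s} {rate:8.1f} MB/s")
```

The slide rounds the computed values to 210, 21 and 2.1 MB/s.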
Conclusion
DMP: shared-nothing and shared-everything
IT ISN’T THAT YOU CAN’T SHARE
IT IS WHERE YOU SHARE
• ON A CHIP
• ON A BOARD
• ON A NETWORK