1/35
Stonehenge: Multi-Dimensional Storage Virtualization
Lan Huang, IBM Almaden Research Center
Joint work with Gang Peng and Tzi-cker Chiueh, SUNY Stony Brook
June 2004
2/35
Introduction
Storage growth is phenomenal: new hardware
Isolated storage: resource waste
Management
[Figure: clients reach a database server and a file server over an IP LAN/MAN/WAN]
[Figure: areal density vs. year, 1970 to 2000, log scale from 1 to 10000 [Patterson’98]]
Huge amounts of data on heterogeneous devices, spread out everywhere.
3/35
Storage Virtualization
Examples: LVM, xFS, StorageTank
Hide physical details from high-level applications
[Figure: layered stack. Labels: application, storage management, O.S., storage virtualization, abstract interface, hardware resources (disks, controllers); clients, virtual disks, physical disks]
4/35
Storage Virtualization
Storage consolidation
VD (virtual disk) as tangible as PD (physical disk):
  Capacity
  Throughput
  Latency
Resource efficiency Ei
5/35
Stonehenge Overview
Input: VD (B, C, D, E)
Output: VDs with performance guarantee
High-level goals:
  Storage consolidation
  Performance isolation
  Efficiency
  Performance
[Figure: clients reach a database server, a file server, and Stonehenge (LAN) over an IP LAN/MAN/WAN]
6/35
Hardware Organization
[Figure: hardware organization. Labels: storage manager; storage servers, each with a disk array; clients, each with a client application, a storage clerk, and a kernel; Gigabit network; control messages; data/commands; object interface; file interface]
7/35
Key Issues in Stonehenge
How to ease the task of storage management?
  Centralization
  Virtualization
  Consolidation
How to achieve performance isolation among virtual disks?
  Run-time QoS guarantee
How to do it efficiently?
  Efficiency-aware algorithms
  Dynamic adaptive feedback
8/35
Key components
Mapper
CVC scheduler
Feedback path between them
9/35
Virtual to Physical Disk Mapping
Multi-dimensional disk mapping: NP-complete
Goal: maximize resource utilization
Heuristics: maximize a goal function [toyota75]
  Input: VDs, PDs
  Goal function G: max(G)
  Output: VD-to-PD mapping
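
The mapping step can be illustrated with a small greedy sketch. This is not the heuristic from the talk: the slide does not define G, so the goal function used here (mean utilization of a PD after placement, which favors the tightest fit) is an assumption, as are the names `Disk`, `goal`, and `map_vds` and the choice of only two dimensions (capacity and IOPS).

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """A virtual or physical disk described by two illustrative dimensions."""
    name: str
    capacity_gb: float
    iops: float


def goal(pd: Disk, used: tuple[float, float], vd: Disk) -> float:
    """Hypothetical goal function G: mean utilization of pd after placing vd."""
    cap_util = (used[0] + vd.capacity_gb) / pd.capacity_gb
    iops_util = (used[1] + vd.iops) / pd.iops
    return (cap_util + iops_util) / 2.0


def map_vds(vds: list[Disk], pds: list[Disk]) -> dict[str, str]:
    """Greedily map each VD to the feasible PD that maximizes G."""
    used = {pd.name: (0.0, 0.0) for pd in pds}
    mapping: dict[str, str] = {}
    # Place the most IOPS-hungry VDs first, a common bin-packing ordering.
    for vd in sorted(vds, key=lambda d: d.iops, reverse=True):
        feasible = [
            pd for pd in pds
            if used[pd.name][0] + vd.capacity_gb <= pd.capacity_gb
            and used[pd.name][1] + vd.iops <= pd.iops
        ]
        if not feasible:
            continue  # no PD can host this VD; it stays unmapped
        best = max(feasible, key=lambda pd: goal(pd, used[pd.name], vd))
        cap, iops = used[best.name]
        used[best.name] = (cap + vd.capacity_gb, iops + vd.iops)
        mapping[vd.name] = best.name
    return mapping


# Made-up disks, for illustration only.
pds = [Disk("pd1", 100, 200), Disk("pd2", 100, 200)]
vds = [Disk("vd1", 40, 120), Disk("vd2", 40, 60), Disk("vd3", 30, 100)]
mapping = map_vds(vds, pds)
print(mapping)
```

A greedy pass like this gives no optimality guarantee, which is expected for an NP-complete problem; it only shows how a goal function can steer each placement decision.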
10/35
Islands Effect
[Figure: VDs laid out across PDs; labels 1, 2, 3, 4]
11/35
Key Components
Mapper
CVC scheduler
Feedback path between them
12/35
Requirements of Real-time Disk Scheduling
Disk specific
  Improve disk bandwidth utilization: SATF, CSCAN, etc.
Non disk specific
  Meet real-time requests' deadlines
  Fair disk bandwidth allocation among virtual disks (Virtual Clock scheduling)
Key: bandwidth guarantee
[Figure: disk service time breakdown: seek, rotation, transfer, other]
13/35
CVC Algorithm
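Two queues: FT and LBA
FT queue: $FT(i) = \max(FT(i-1),\ \text{real time}) + 1/IOPS_m$
The LBA queue is used only if the FT queue's slack time allows it:
$\text{real time} + \text{service time}(R) < \text{starting deadline of the next request}$
[Figure: requests from VD(m) enter the CVC scheduler's FT and LBA queues]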
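
The two-queue rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the Stonehenge implementation: the class and method names are hypothetical, the service time is a single fixed estimate, the "starting deadline" of the most urgent request is assumed to be its FT minus that estimate, and the LBA order is a simple CSCAN-like scan.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Request:
    vd: int
    lba: int
    finish_time: float = 0.0  # FT tag, assigned on arrival


@dataclass
class CVCScheduler:
    iops: dict[int, float]   # reserved IOPS per virtual disk
    service_time: float      # assumed per-request service time estimate (s)
    last_ft: dict[int, float] = field(default_factory=dict)
    pending: list[Request] = field(default_factory=list)

    def submit(self, req: Request, now: float) -> None:
        """Tag the request: FT(i) = max(FT(i-1), now) + 1/IOPS_m."""
        prev = self.last_ft.get(req.vd, 0.0)
        req.finish_time = max(prev, now) + 1.0 / self.iops[req.vd]
        self.last_ft[req.vd] = req.finish_time
        self.pending.append(req)

    def dispatch(self, now: float, head_lba: int) -> Optional[Request]:
        """Serve in LBA order while slack allows, else in FT order."""
        if not self.pending:
            return None
        urgent = min(self.pending, key=lambda r: r.finish_time)
        # CSCAN-like pick: lowest LBA at or ahead of the head, else wrap.
        ahead = [r for r in self.pending if r.lba >= head_lba]
        choice = min(ahead or self.pending, key=lambda r: r.lba)
        # Assumed starting deadline of the most urgent request.
        start_deadline = urgent.finish_time - self.service_time
        if choice is not urgent and now + self.service_time >= start_deadline:
            choice = urgent  # no slack left: fall back to deadline order
        self.pending.remove(choice)
        return choice


sched = CVCScheduler(iops={0: 100.0, 1: 50.0}, service_time=0.002)
sched.submit(Request(vd=0, lba=9000), now=0.0)  # FT = 0.01
sched.submit(Request(vd=1, lba=100), now=0.0)   # FT = 0.02
sched.submit(Request(vd=0, lba=200), now=0.0)   # FT = 0.02
first = sched.dispatch(now=0.0, head_lba=0)      # slack available: LBA order
second = sched.dispatch(now=0.007, head_lba=100) # slack exhausted: FT order
print(first.lba, second.lba)
```

In the example the first dispatch follows LBA order because the most urgent request can still wait, while the second falls back to the FT queue once serving another request would push the urgent one past its starting deadline.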
14/35
Real-life Deployment
Dispatch the next N requests from LBA queue
The next batch will not be issued until the previous batch is done.
[Figure: requests from VD(m) pass through the CVC scheduler's FT and LBA queues, then the storage controller and the on-disk scheduler]
16/35
CVC Performance
3 VDs with real-life traces: video stream, web, financial, TPC-C
Touch 40% of the storage space
[Figure panels: Video Streams; Mixed Traces]
17/35
Impact of Disk I/O Time Estimate
Model disk I/O time?
  ATA disk: impossible [ECSL TR-81]
  SCSI disk: possible?
Run-time measurement: P(I/O time)
18/35
CVC Latency Bound
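If the traffic generated within the period $[0, t]$ satisfies $V(t) \le T + r \cdot t$, then
  $D \le (T + L_{max})/B_i + L_{max}/C$   (1)
Storage system:
  $D \le ((N+1) \cdot k \cdot C + T + L_{max})/B_i + (k \cdot C + L_{max})/C$   (2)
Stonehenge:
  $D \le (N+1)/IOPS_i + 1/IOPS_{max}$   (3)
[Figure: FT queue of VD(m) holding N requests (T bytes), served at IOPS(m) out of IOPS(max); service time breakdown: seek, rotation, transfer, other]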
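
Bound (3) is easy to evaluate numerically. The sketch below assumes that N is the number of requests queued for the VD, IOPS_i its reserved rate, and IOPS_max the rate of the underlying disk; the function name and the sample numbers are made up for illustration.

```python
def stonehenge_latency_bound(n_queued: int, iops_i: float, iops_max: float) -> float:
    """Worst-case delay D <= (N+1)/IOPS_i + 1/IOPS_max, in seconds."""
    if iops_i <= 0 or iops_max <= 0:
        raise ValueError("IOPS values must be positive")
    return (n_queued + 1) / iops_i + 1.0 / iops_max


# Illustrative values: 4 queued requests, 100 IOPS reserved, 500 IOPS disk.
bound = stonehenge_latency_bound(n_queued=4, iops_i=100.0, iops_max=500.0)
print(f"{bound * 1000:.1f} ms")
```

With these numbers the first term (queueing at the reserved rate) dominates: 5/100 s against 1/500 s for the one request that may already be in service.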
19/35
Key Components
Mapper
CVC scheduler
Feedback path between them
  Relaxing worst-case service time estimate
  VD multiplex effect
20/35
Empirical Latency vs Worst Case
Approximate P(service time, N) with P(service time, N-1)
Q is P's inverse function
$D \le (Q(0.95) + s) \cdot [(N+1)/IOPS_i + 1/IOPS_{max}]$
[Figure: plot, axes labeled x and y]
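
A small sketch of how the relaxed bound might be computed from measurements. It assumes that P is the empirical distribution of observed latency expressed as a fraction of the worst-case bound, that Q(0.95) is its 95th percentile, and that s is an additive safety margin; the helper names and the sample data are hypothetical.

```python
import math


def empirical_quantile(samples: list[float], p: float) -> float:
    """Inverse empirical CDF: smallest sample x with P(X <= x) >= p."""
    if not samples or not 0.0 < p <= 1.0:
        raise ValueError("need samples and 0 < p <= 1")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def relaxed_bound(q: float, s: float, n: int, iops_i: float, iops_max: float) -> float:
    """D <= (Q(0.95) + s) * [(N+1)/IOPS_i + 1/IOPS_max], in seconds."""
    return (q + s) * ((n + 1) / iops_i + 1.0 / iops_max)


# Made-up measurements: observed latency divided by the worst-case bound.
ratios = [0.05, 0.08, 0.10, 0.07, 0.30, 0.06, 0.09, 0.12, 0.11, 0.04]
q95 = empirical_quantile(ratios, 0.95)
d = relaxed_bound(q95, s=0.05, n=4, iops_i=100.0, iops_max=500.0)
print(q95, d)
```

Because Q(0.95) + s is below 1 in this example, the relaxed bound is a fraction of the worst-case one, which is what lets the admission controller reserve less bandwidth for the same latency target.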
21/35
Bursty I/O Traffic and Pspare
Self-similar
Multiplexing effect: Pspare(x)
[Figure: plot, axes labeled x and y]
22/35
Latency or Throughput Bound
$(B_{throughput}, C, D, E)$: convert $D \to B_{latency}$, giving $(B_{throughput}, C, B_{latency}, E)$
$B_{throughput} \ge B_{latency}$: throughput bound
$B_{throughput} < B_{latency}$: latency bound
[Figure labels: $B_{throughput}$, $B_{latency}$] Or even less?
23/35
MBAC for Latency Bound VDs
When the jth VD with requirements $(D_j, IOPS''_j, C_j, E)$ comes:
1. For $0 < i \le j$:
  Convert $D_i$ to $IOPS'_i$: $D_i \le (Q_{service}(0.95) + s) \cdot [(N+1)/IOPS'_i + 1/IOPS_{max}]$
  Let $IOPS_i = \max(IOPS'_i, IOPS''_i)$
2. If $\sum_i IOPS_i < IOPS_{max}$, accept the new VD; otherwise, reject.
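
The two steps above can be sketched in a few lines. The conversion solves the latency inequality for the smallest IOPS' that satisfies it, which gives IOPS' = (N+1) / (D/(Q+s) - 1/IOPS_max). The function names, the use of one shared N for all VDs, and the sample numbers are assumptions made for this illustration.

```python
def latency_to_iops(d: float, q: float, s: float, n: int, iops_max: float) -> float:
    """Smallest IOPS' with d >= (q + s) * ((n + 1)/IOPS' + 1/iops_max)."""
    slack = d / (q + s) - 1.0 / iops_max
    if slack <= 0:
        return float("inf")  # latency target unreachable on this disk
    return (n + 1) / slack


def admit_latency_bound(vds: list[tuple[float, float]], q: float, s: float,
                        n: int, iops_max: float) -> bool:
    """vds holds (D_i, IOPS''_i) for all VDs, including the newcomer."""
    total = sum(max(latency_to_iops(d, q, s, n, iops_max), iops2)
                for d, iops2 in vds)
    return total < iops_max


# Illustrative: each VD wants 20 ms latency and at least 50 IOPS.
params = dict(q=0.2, s=0.05, n=4, iops_max=500.0)
seven_ok = admit_latency_bound([(0.02, 50.0)] * 7, **params)
eight_ok = admit_latency_bound([(0.02, 50.0)] * 8, **params)
print(seven_ok, eight_ok)
```

Each such VD needs roughly 64 IOPS once the latency target is converted, so seven fit under the 500 IOPS ceiling in this example and the eighth is rejected.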
24/35
MBAC Performance Pservice
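Table 1. Maximum number of VDs accepted.

| Run | VD type | Probability | Deterministic | MBAC | Oracle |
|---|---|---|---|---|---|
| Run 1 | Financial | 95% | 7 | 20 | 22 |
| Run 2 | Mixed | 95% | 7 | 14 | 14 |
| Run 3 | Mixed | 85% | 7 | 17 | 17 |

Table 2. Resource reservation.

| Number of VDs | 7 | 9 | 10 | 11 | 13 | 14 | 15 |
|---|---|---|---|---|---|---|---|
| $Q_{service}(0.95)$ | 11% | 15% | 19% | 24% | 37% | 49% | - |
| MBAC | N/A | 38% | 43% | 47% | 55% | 67% | 95% |
| Deterministic | 90% | - | - | - | - | - | - |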
25/35
MBAC for Throughput Bound VDs
When the jth VD $(D_j, IOPS''_j, C_j, E)$ comes:
Convert $D_j$ to $IOPS'_j$: $D_j \le (Q_{service}(0.95) + s) \cdot [(N+1)/IOPS'_j + 1/IOPS_{max}]$
Let $IOPS_j = \max(IOPS'_j, IOPS''_j)$
If $IOPS_j < Q_{spare}(E)$, admit the new VD; otherwise, reject it.
26/35
MBAC Performance Pspare
VD 0: TPC-C
VD 1: financial
VD 2: web search
27/35
Measurement-based Admission Control (MBAC)
When the jth VD with requirements $(D_j, IOPS''_j, C_j, E)$ comes:
1. For $0 < i \le j$:
  Convert $D_i$ to $IOPS'_i$: $D_i \le (Q_{service}(0.95) + s) \cdot [(N+1)/IOPS'_i + 1/IOPS_{max}]$
  Let $IOPS_i = \max(IOPS'_i, IOPS''_i)$
2. Group VDs into two sets: throughput-bound set T and latency-bound set L.
3. For the throughput-bound VDs, calculate the combined $Q_{I/O\_rate}$. Let $Q_{spare}(x) = IOPS_{max} - Q_{I/O\_rate}(x)$.
4. If $\sum_{i \in L} IOPS_i < Q_{spare}(E)$, accept the new VD; otherwise, reject.
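
Steps 3 and 4 can be sketched as below. The sketch assumes that the combined I/O rate of the throughput-bound VDs is sampled per measurement window, that $Q_{I/O\_rate}(x)$ is the x-quantile of those samples, and that E is used directly as the probability level x; all names and numbers are hypothetical.

```python
import math


def quantile(samples: list[float], p: float) -> float:
    """Inverse empirical CDF: smallest sample x with P(X <= x) >= p."""
    ordered = sorted(samples)
    rank = max(math.ceil(p * len(ordered)), 1)
    return ordered[rank - 1]


def q_spare(rate_samples: list[float], x: float, iops_max: float) -> float:
    """Qspare(x) = IOPSmax - Q_I/O_rate(x)."""
    return iops_max - quantile(rate_samples, x)


def admit(latency_iops: list[float], rate_samples: list[float],
          e: float, iops_max: float) -> bool:
    """Accept if the latency-bound reservations fit in the spare bandwidth."""
    return sum(latency_iops) < q_spare(rate_samples, e, iops_max)


# Made-up per-window combined I/O rates of the throughput-bound set T.
rates = [120, 150, 130, 200, 160, 140, 180, 170, 155, 145]
spare = q_spare(rates, 0.95, iops_max=500.0)
accepted = admit([64.0, 90.0, 100.0], rates, 0.95, 500.0)
rejected = admit([64.0, 90.0, 100.0, 60.0], rates, 0.95, 500.0)
print(spare, accepted, rejected)
```

The point of measuring the throughput-bound set as a whole is that bursts of different VDs rarely coincide, so the measured quantile is typically well below the sum of their individual reservations.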
28/35
Issues with Measurement
Stability
  I/O rate pattern is stable
  Boundary case for Pservice
Overhead of monitoring: trivial
Window size
29/35
Put them all together: Stonehenge
Functionality: a general-purpose IP storage cluster
Performance: scheduling
Efficiency: measurement
30/35
Software Architecture
[Figure: software architecture. Labels: user and kernel levels; iSCSI initiator; FETD (front end target driver); Stonehenge; V. Table; P. Table; IDE mid layer driver; disk mapper; admission controller; traffic shaping; scheduler; queues]
31/35
Effectiveness of QoS Guarantees in Stonehenge
[Figure panels: (a) CVC; (b) CSCAN; (c) Deadline violation percentage]
32/35
Impact of Leeway Factor
[Figure labels: overload probability; violation percentage]
33/35
Overall System Performance and Latency Breakdown
1 GHz CPU
IBM 7200 ATA disk array
Promise IDE controllers
64-bit 66 MHz PCI bus
Intel Gigabit NICs

| Software module | Average latency (usec) |
|---|---|
| iSCSI client | 57 |
| iSCSI server | 507 |
| Disk access | 1360 |
| Central | 50 |
| Network delay 1 | 574 |
| Network delay 2 | 2 |

A maximum of 55 MB/sec per server.
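
As a quick arithmetic check on the breakdown, the sketch below sums the components and prints each one's share. It assumes the components are incurred serially, so that they simply add up, and that the value pairing is as read from the flattened slide table (Network delay 1 = 574, Network delay 2 = 2).

```python
# Average latency per software module, in microseconds, as listed on the slide.
latency_usec = {
    "iSCSI client": 57,
    "iSCSI server": 507,
    "Disk access": 1360,
    "Central": 50,
    "Network delay 1": 574,
    "Network delay 2": 2,
}

# Assumption: the stages run one after another, so the total is their sum.
total = sum(latency_usec.values())
for name, usec in latency_usec.items():
    print(f"{name:16s} {usec:5d} usec  {100 * usec / total:5.1f}%")
print(f"{'total':16s} {total:5d} usec")
```

Under that assumption the end-to-end average is 2550 usec, with disk access accounting for a little over half of it.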
34/35
Related Work
Storage management: Minerva etc. at HPL
Efficiency-aware disk schedulers: Cello, Prism, YFQ
Run-time QoS guarantee: web server, video server, network QoS
IP storage
35/35
Conclusion
The IP storage cluster consolidates storage and reduces fragmentation by 20-30%.
The efficiency-aware CVC real-time disk scheduler, with dynamic I/O time estimates, provides performance guarantees and good disk head utilization.
Measurement feedback effectively remedies over-provisioning:
  Latency: Pservice, 2-3 fold
  Throughput: Pspare, 20%
  I/O time estimate: PI/O time
  Load imbalance: Pleeway