IBM Systems Group
Nick Jones © 2004 IBM Corporation
What could happen to your data?
What can you do about it?
IBM Systems & Technology Group
The Plan
Introduction
Types of failure
The probability of failure
The true cost of failure
Addressing the problem
Putting it into perspective
Questions & summary
Types of failure
Consider a pessimist’s view of a hard disk
Two ways in which a drive can fail
– It reports the failure
– It lies
The probability of failure
Mean Time Between Failures (MTBF) ≈ 1,200,000 hours
Drive failure doesn’t sound like too big a problem…
…but then consider the number of drives
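A rough calculation shows why the number of drives matters. The sketch below assumes a constant failure rate, so expected failures per year are roughly drives × hours per year ÷ MTBF. The fleet sizes are hypothetical and do not come from the slides.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years
MTBF_HOURS = 1_200_000     # figure quoted on the slide


def expected_failures_per_year(num_drives: int, mtbf_hours: float = MTBF_HOURS) -> float:
    """Expected drive failures per year, assuming a constant failure rate."""
    return num_drives * HOURS_PER_YEAR / mtbf_hours


for drives in (1, 100, 1000):  # hypothetical fleet sizes
    print(f"{drives:>5} drives: {expected_failures_per_year(drives):.2f} failures/year")
```

Under these assumptions a single drive fails very rarely, while a fleet of 1,000 drives should expect several failures every year.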
The true cost of failure
“It will never happen to me”
Increased disk size means increased data loss
A few statistics
Addressing the problem
Make backups
Add extra information to the disk
Add extra disks
Addressing the problem: Extra information
Error Correcting Code (ECC) on the disk drive
Client data | Drive ECC
Data seen by the drive
Addressing the problem: Extra information
Longitudinal Redundancy Check (LRC) in addition to ECC
Block LRC | Client data | Drive ECC
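The slides do not say how the LRC is computed or exactly where it is stored. As a minimal sketch, the code below assumes one common form: a single check byte that is the XOR of every byte in the block, appended to the client data before the block reaches the drive. The function names are made up for this example.

```python
from functools import reduce


def lrc(block: bytes) -> int:
    """XOR every byte of the block together into a single check byte."""
    return reduce(lambda acc, b: acc ^ b, block, 0)


def store(client_data: bytes) -> bytes:
    """Append the check byte (placement is an assumption of this sketch)."""
    return client_data + bytes([lrc(client_data)])


def verify(stored: bytes) -> bool:
    """A block passes if XOR-ing data and check byte together gives zero."""
    return lrc(stored) == 0


stored = store(b"client data")
corrupted = bytes([stored[0] ^ 0x04]) + stored[1:]  # flip one bit of the first byte
print(verify(stored), verify(corrupted))
```

A check like this lets the layer above the drive notice bad data even when the drive itself reports no error, which is the "it lies" failure mode from earlier in the talk.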
Addressing the problem: Extra disks
The idea was published in 1988
A Case for Redundant Arrays of Inexpensive Disks by Patterson, Gibson & Katz
RAID 0: Striping
Data stream: ABCDEFG
RAID array: data striped across member disks
Disk 1: A E I M
Disk 2: B F J N
Disk 3: C G K O
Disk 4: D H L P
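The striping pattern in the diagram can be expressed as a simple mapping from block number to disk. This is a sketch that assumes one block per stripe unit and four disks, as drawn. The name `raid0_locate` is invented here.

```python
def raid0_locate(block: int, num_disks: int = 4) -> tuple[int, int]:
    """Map a logical block number to (disk index, stripe index), both 0-based."""
    return block % num_disks, block // num_disks


blocks = "ABCDEFGHIJKLMNOP"
disks = [[] for _ in range(4)]
for number, name in enumerate(blocks):
    disk, _stripe = raid0_locate(number)
    disks[disk].append(name)

for index, contents in enumerate(disks, start=1):
    print(f"Disk {index}: {' '.join(contents)}")
```

Note that nothing here is stored twice: striping alone adds no redundancy, so losing any one disk loses part of every large file.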
RAID 1: Mirroring
Data stream: ABCDEFG
RAID array: data mirrored across member disks
Disk 1: A B C D
Disk 2: A B C D
RAID 10: Striping & Mirroring
Data stream: ABCDE
Data striped across mirrored pairs of disks
Mirrored pair 1: A D G J (on both disks)
Mirrored pair 2: B E H K (on both disks)
Mirrored pair 3: C F I L (on both disks)
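Combining the two ideas, each block is striped to one mirrored pair and then written to both disks in that pair. The sketch below assumes three pairs, as in the diagram. The helper name `raid10_locate` is hypothetical.

```python
def raid10_locate(block: int, num_pairs: int = 3) -> tuple[int, int]:
    """Map a logical block number to (mirrored pair, stripe), both 0-based."""
    return block % num_pairs, block // num_pairs


blocks = "ABCDEFGHIJKL"
# Each pair holds two identical disks: pairs[pair][copy] is a list of blocks.
pairs = [([], []) for _ in range(3)]
for number, name in enumerate(blocks):
    pair, _stripe = raid10_locate(number)
    for copy in pairs[pair]:
        copy.append(name)  # every write goes to both disks of the pair

for index, (first, second) in enumerate(pairs, start=1):
    print(f"Pair {index}: {' '.join(first)} | {' '.join(second)}")
```

The array survives the loss of one disk in each pair, but not the loss of both disks in the same pair.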
XOR based parity
Bitwise operator
If the two inputs are the same, the output is 0
If the two inputs are different, the output is 1
Bit 1   Bit 2   XOR result
  0       0         0
  0       1         1
  1       0         1
  1       1         0
XOR based parity: An example
Data:    0 1 1 0 0 1 0 1
Data:    0 0 1 1 0 0 1 1
Parity:  0 1 0 1 0 1 1 0
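The same example can be checked in a few lines. The two data bytes are the ones on the slide. The variable names are chosen for this sketch.

```python
data_1 = 0b01100101  # first data row from the slide
data_2 = 0b00110011  # second data row from the slide

parity = data_1 ^ data_2
print(f"Parity:    {parity:08b}")

# If data_1 is lost, XOR-ing the parity with the surviving data recovers it.
recovered = parity ^ data_2
print(f"Recovered: {recovered:08b}")
```

XOR is its own inverse, so any one missing row can be recovered from the remaining rows and the parity.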
RAID 4: Parity
Data stream: ABCDEFG
RAID array: data striped across disks, with one parity disk
Disk 1: A D G J
Disk 2: B E H K
Disk 3: C F I L
Parity disk: P1 P2 P3 P4
Coping with failure
Data stream: ABCDEFG
Disk 1: A D G J
Disk 2: B E H K
Disk 3: C F I L
Parity disk: P1 P2 P3 P4
Error reading E
– Read D, F & P2
– XOR them to reconstruct E
– Write reconstructed E
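The recovery steps for block E can be sketched as follows. The block contents are made up for illustration, and `xor_blocks` is a helper written for this example rather than anything from the slides.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


# Made-up contents for the stripe holding D, E and F.
d, e, f = b"DDDD", b"EEEE", b"FFFF"
p2 = xor_blocks(d, e, f)  # parity written when the stripe was stored

# E cannot be read: rebuild it from the rest of the stripe.
reconstructed_e = xor_blocks(d, f, p2)
print(reconstructed_e == e)
```

Writing the reconstructed block back gives the drive a chance to remap the bad sector, so the next read of E succeeds without a rebuild.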
Coping with failure
Data stream: ABCDEFG
Disk 1: A D G J
Disk 2: B E H K
Disk 3: C F I L
Parity disk: P1 P2 P3 P4
Drive loss
– Replace the drive
– Rebuild the data
– Redundancy restored
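Rebuilding a replaced drive applies the same XOR to every stripe in turn. In this sketch the one-byte blocks and the choice of disk 2 as the lost drive are assumptions made for illustration. `rebuild` and `xor_pair` are hypothetical names.

```python
from functools import reduce


def xor_pair(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild(surviving_disks: list[list[bytes]]) -> list[bytes]:
    """Recompute each block of the missing disk from the matching blocks
    on all the disks passed in (data and parity alike)."""
    return [reduce(xor_pair, stripe) for stripe in zip(*surviving_disks)]


disk_1 = [b"A", b"D", b"G", b"J"]
disk_2 = [b"B", b"E", b"H", b"K"]
disk_3 = [b"C", b"F", b"I", b"L"]
parity = rebuild([disk_1, disk_2, disk_3])  # parity is the XOR of the data disks

replacement = rebuild([disk_1, disk_3, parity])  # pretend disk 2 was lost
print(replacement == disk_2)
```

Until the rebuild finishes, the array has no redundancy, and every surviving disk must be read in full.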
RAID 5: Rotate parity
Data stream: ABCDEFG
RAID array: data striped across disks, with parity rotating
Disk 1: P1 F H J
Disk 2: A P2 I K
Disk 3: B D P3 L
Disk 4: C E G P4
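The rotation in the diagram can be reproduced with a small layout function. The rule used here (parity on disk `stripe % num_disks`, data starting on the next disk and wrapping around) is inferred from the four stripes shown on the slide. Real implementations use a variety of rotation schemes, so treat this as one possible layout. The name `raid5_stripe` is invented.

```python
def raid5_stripe(stripe: int, data: list[str], num_disks: int = 4) -> list[str]:
    """Lay out one stripe: parity on disk `stripe % num_disks`, data on the
    disks after it, wrapping around to the first disk."""
    parity_disk = stripe % num_disks
    row = [""] * num_disks
    row[parity_disk] = f"P{stripe + 1}"
    for offset, name in enumerate(data, start=1):
        row[(parity_disk + offset) % num_disks] = name
    return row


blocks = "ABCDEFGHIJKL"
layout = [raid5_stripe(s, list(blocks[3 * s:3 * s + 3])) for s in range(4)]
for disk in range(4):
    print(f"Disk {disk + 1}: {' '.join(row[disk] for row in layout)}")
```

Spreading the parity blocks over all disks means parity updates are not all directed at a single dedicated disk, as they are in RAID 4.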
RAID 6: More parity
Data stream: ABCDEFG
Data striped across disks, with 2 rotating parities
Disk 1: A P2 K M
Disk 2: B Q2 L N
Disk 3: C E P3 O
Disk 4: D F Q3 P
Disk 5: P1 G I P4
Disk 6: Q1 H J Q4
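The six-disk layout in the diagram follows a similar pattern with two parity blocks per stripe. The placement rule below is inferred from the four stripes on the slide and is only one possible rotation. The slides do not say how Q is computed, so this sketch places P and Q without calculating them. `raid6_stripe` is a hypothetical name.

```python
def raid6_stripe(stripe: int, data: list[str], num_disks: int = 6) -> list[str]:
    """Lay out one stripe with P and Q on adjacent disks, moving two disks
    along for each new stripe; data follows Q and wraps around."""
    p_disk = (num_disks - 2 + 2 * stripe) % num_disks
    q_disk = (p_disk + 1) % num_disks
    row = [""] * num_disks
    row[p_disk] = f"P{stripe + 1}"
    row[q_disk] = f"Q{stripe + 1}"
    for offset, name in enumerate(data, start=1):
        row[(q_disk + offset) % num_disks] = name
    return row


blocks = "ABCDEFGHIJKLMNOP"
layout = [raid6_stripe(s, list(blocks[4 * s:4 * s + 4])) for s in range(4)]
for disk in range(6):
    print(f"Disk {disk + 1}: {' '.join(row[disk] for row in layout)}")
```

With two independent parities per stripe, the array can lose any two disks and still reconstruct the data.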
Putting it into perspective
Cannot survive on RAID alone
Avoid a single point of failure
– Fire, flood, power loss
Split your array across two sites
Human error
Backups still have a place
Summary
Want to avoid any single point of failure
Disk drives do fail
RAID protects against drive failure
Mirroring & parity
RAID isn’t the ultimate solution