CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.
![Page 1: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/1.jpg)
CPS216: Data-Intensive Computing Systems
Data Access from Disks
Shivnath Babu
![Page 2: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/2.jpg)
Outline
• Disks
• Data access from disks
• Software-based optimizations
  – Prefetching blocks
  – Choosing the right block size
![Page 3: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/3.jpg)
Focus on: “Typical Disk”
[Figure: top view of a disk, with the head assembly, a sector, and a gap labeled]
Terms: Platter, Head, Cylinder, Track, Sector (physical), Block (logical), Gap
![Page 4: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/4.jpg)
Block Address:
• Physical Device
• Cylinder #
• Surface #
• Start sector #
![Page 5: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/5.jpg)
Disk Access Time (Latency)
[Figure: a request “I want block X” is issued; how long until block X is in memory?]
![Page 6: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/6.jpg)
Access Time = Seek Time + Rotational Delay + Transfer Time + Other
![Page 7: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/7.jpg)
Seek Time
[Figure: plot of time against cylinders traveled (1 to N), rising from x to 3x or 5x]
Average value: 10 ms to 40 ms
![Page 8: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/8.jpg)
Rotational Delay
[Figure: a track, with the current head position (“Head Here”) and the target (“Block I Want”) marked]
![Page 9: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/9.jpg)
Average Rotational Delay
R = 1/2 revolution
Example: R = 8.33 ms (3600 RPM)
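The half-revolution rule can be checked numerically. This is a minimal Python sketch; the function name `avg_rotational_delay_ms` is an illustrative assumption and does not come from the slides.

```python
def avg_rotational_delay_ms(rpm: float) -> float:
    """Average rotational delay: half of one revolution, in milliseconds."""
    ms_per_revolution = 60_000.0 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2.0


print(round(avg_rotational_delay_ms(3600), 2))  # 8.33, matching the example above
```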
![Page 10: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/10.jpg)
Transfer Rate: t
• t: 1 to 100 MB/second
• transfer time = block size / t
![Page 11: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/11.jpg)
Other Delays
• CPU time to issue I/O
• Contention for controller
• Contention for bus, memory
“Typical” Value: 0
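Putting the four components together, the access-time formula can be sketched as a small Python function. The function name, the parameter names, and the sample inputs (10 ms seek, 10 MB/second transfer rate) are assumptions chosen for illustration from within the ranges quoted on the slides; 1 MB is taken as 1024 KB.

```python
def access_time_ms(seek_ms, rpm, block_kb, transfer_rate_mb_per_s, other_ms=0.0):
    """Access time = seek time + rotational delay + transfer time + other."""
    rotational_ms = (60_000.0 / rpm) / 2.0  # average: half a revolution
    transfer_ms = block_kb / (transfer_rate_mb_per_s * 1024.0) * 1000.0
    return seek_ms + rotational_ms + transfer_ms + other_ms


# Hypothetical disk: 10 ms seek, 3600 RPM, 1 KB block, 10 MB/second.
print(round(access_time_ms(10.0, 3600, 1.0, 10.0), 2))  # 18.43
```

Note how little the transfer term contributes for a small block: seek and rotation dominate.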
![Page 12: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/12.jpg)
• So far: Random Block Access
• What about: Reading “Next” block?
![Page 13: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/13.jpg)
If we do things right …
Time to get next block = Block Size / t + Negligible
Negligible:
- skip gap
- switch track
- once in a while, next cylinder
![Page 14: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/14.jpg)
Rule of Thumb
• Random I/O: Expensive
• Sequential I/O: Much less
• Ex: 1 KB Block
  » Random I/O: 20 ms.
  » Sequential I/O: 1 ms.
![Page 15: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/15.jpg)
Cost for Writing similar to Reading
…. unless we want to verify!
![Page 16: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/16.jpg)
To Modify Block:
(a) Read Block
(b) Modify in Memory
(c) Write Block
[(d) Verify?]
![Page 17: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/17.jpg)
A Synthetic Example
• 3.5 in diameter disk
• 3600 RPM
• 1 surface
• 16 MB usable capacity ($16 \times 2^{20}$ bytes)
• 128 cylinders
• seek time: average = 25 ms; adjacent cylinders = 5 ms
![Page 18: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/18.jpg)
• 1 KB blocks = sectors
• 10% overhead between sectors
• capacity = 16 MB = $2^{20} \times 16 = 2^{24}$ bytes
• # cylinders = 128 = $2^{7}$
• bytes/cyl = $2^{24} / 2^{7} = 2^{17}$ = 128 KB
• blocks/cyl = 128 KB / 1 KB = 128
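The layout arithmetic for the synthetic disk can be reproduced in a few lines of Python. The variable names are illustrative assumptions; the numbers are the ones given for the example disk.

```python
capacity_bytes = 16 * 2**20  # 16 MB usable capacity
num_cylinders = 128          # one surface, so one track per cylinder
block_bytes = 1024           # 1 KB blocks, one block per sector

bytes_per_cylinder = capacity_bytes // num_cylinders
blocks_per_cylinder = bytes_per_cylinder // block_bytes

print(bytes_per_cylinder // 1024, "KB per cylinder")  # 128 KB per cylinder
print(blocks_per_cylinder, "blocks per cylinder")     # 128 blocks per cylinder
```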
![Page 19: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/19.jpg)
3600 RPM = 60 revolutions / sec, so 1 rev. = 16.66 msec.
One track:...
Time over useful data: (16.66)(0.9) = 14.99 ms.
Time over gaps: (16.66)(0.1) = 1.66 ms.
Transfer time 1 block = 14.99/128 = 0.117 ms.
Trans. time 1 block + gap = 16.66/128 = 0.13 ms.
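The per-track timing can be recomputed without rounding the intermediate values. This Python sketch uses illustrative variable names (an assumption, not from the slides). Because it carries 16.666... ms rather than the truncated 16.66 ms, the data and gap times come out marginally higher than the figures above, while the per-block times agree to three decimals.

```python
rpm = 3600
blocks_per_track = 128
gap_fraction = 0.10  # 10% overhead between sectors

rev_ms = 60_000.0 / rpm                  # one full revolution
data_ms = rev_ms * (1.0 - gap_fraction)  # head is over useful data
gap_ms = rev_ms * gap_fraction           # head is over gaps
block_ms = data_ms / blocks_per_track    # one block, no gap
block_plus_gap_ms = rev_ms / blocks_per_track

print(f"{block_ms:.3f} ms per block")                 # 0.117 ms per block
print(f"{block_plus_gap_ms:.3f} ms per block + gap")  # 0.130 ms per block + gap
```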
![Page 20: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/20.jpg)
Burst Bandwidth: 1 KB in 0.117 ms.
BB = 1/0.117 = 8.54 KB/ms.
or
BB = 8.54 KB/ms x 1000 ms/1 sec x 1 MB/1024 KB = 8540/1024 = 8.33 MB/sec
![Page 21: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/21.jpg)
Sustained bandwidth (over track): 128 KB in 16.66 ms.
SB = 128/16.66 = 7.68 KB/ms
or
SB = 7.68 x 1000/1024 = 7.50 MB/sec.
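Burst and sustained bandwidth differ only in whether the gap time is counted. The Python sketch below recomputes both from unrounded values; the helper name `kb_per_ms_to_mb_per_s` and the variable names are illustrative assumptions.

```python
rev_ms = 60_000.0 / 3600  # one revolution at 3600 RPM
track_kb = 128            # 128 blocks of 1 KB per track
gap_fraction = 0.10       # 10% of each revolution is spent over gaps


def kb_per_ms_to_mb_per_s(kb_per_ms):
    """Convert KB/ms to MB/sec, taking 1 MB = 1024 KB."""
    return kb_per_ms * 1000.0 / 1024.0


burst_kb_per_ms = track_kb / (rev_ms * (1.0 - gap_fraction))  # data time only
sustained_kb_per_ms = track_kb / rev_ms                       # data plus gaps

print(f"burst:     {kb_per_ms_to_mb_per_s(burst_kb_per_ms):.2f} MB/sec")      # 8.33
print(f"sustained: {kb_per_ms_to_mb_per_s(sustained_kb_per_ms):.2f} MB/sec")  # 7.50
```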
![Page 22: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/22.jpg)
T1 = Time to read one random block
T1 = seek + rotational delay + TT
= 25 + (16.66/2) + .117 = 33.45 ms.
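The same sum in Python, as a quick check. Variable names are illustrative assumptions; the inputs are the synthetic disk's parameters.

```python
seek_ms = 25.0                 # average seek time
rev_ms = 60_000.0 / 3600       # one revolution at 3600 RPM
block_ms = rev_ms * 0.9 / 128  # transfer time for one 1 KB block

t1_ms = seek_ms + rev_ms / 2.0 + block_ms
print(f"T1 = {t1_ms:.2f} ms")  # T1 = 33.45 ms
```

The transfer term is well under 1% of the total, which is why random I/O on small blocks is dominated by seek and rotation.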
![Page 23: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/23.jpg)
A Back of Envelope Calculation
• Suppose it takes 25 ms to read one 1 KB block
• 10 tuples of size 100 bytes each fit in 1 block
• How much time will it take to read a table containing 1 million records (say, Amazon’s customer database)?
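One way to work the estimate is sketched below in Python. It assumes every block costs the full 25 ms, that is, each block read is treated as an independent I/O with no benefit from sequential access; the variable names are illustrative.

```python
records = 1_000_000
records_per_block = 10
ms_per_block = 25.0  # assumed cost of every block read

blocks = records // records_per_block
total_s = blocks * ms_per_block / 1000.0

print(blocks, "blocks,", total_s, "seconds,", round(total_s / 60, 1), "minutes")
# 100000 blocks, 2500.0 seconds, 41.7 minutes
```

Under this assumption the scan takes roughly 40 minutes, which motivates the sequential-access and block-size discussion that follows.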
![Page 24: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/24.jpg)
Suppose DBMS deals with 4 KB blocks
[Figure: four consecutive 1 KB sectors, numbered 1 2 3 4, make up 1 block]
T4 = 25 + (16.66/2) + (.117) x 1 + (.130) x 3 = 33.83 ms
[Compare to T1 = 33.45 ms]
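The T1 and T4 calculations generalize to a block made of any number of consecutive sectors. The Python sketch below is one way to write that; the function name `block_read_ms` and its parameters are illustrative assumptions, with defaults taken from the synthetic disk. It keeps intermediate values unrounded, so T4 prints as 33.84 rather than the 33.83 obtained from the truncated figures.

```python
def block_read_ms(sectors_per_block, seek_ms=25.0, rpm=3600,
                  sectors_per_track=128, gap_fraction=0.10):
    """Time to read one random block made of consecutive 1 KB sectors."""
    rev_ms = 60_000.0 / rpm
    sector_ms = rev_ms * (1.0 - gap_fraction) / sectors_per_track
    sector_plus_gap_ms = rev_ms / sectors_per_track
    # One sector is counted without a gap; every other sector carries one gap.
    transfer_ms = sector_ms + (sectors_per_block - 1) * sector_plus_gap_ms
    return seek_ms + rev_ms / 2.0 + transfer_ms


print(f"T1 = {block_read_ms(1):.2f} ms")  # T1 = 33.45 ms
print(f"T4 = {block_read_ms(4):.2f} ms")  # T4 = 33.84 ms
```

Quadrupling the block size adds less than half a millisecond to a read that already costs over 33 ms.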
![Page 25: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/25.jpg)
TT = Time to read a full track (start at any block)
TT = 25 + (0.130/2) + 16.66* = 41.73 ms
(the 0.130/2 term is the time to get to the first block)
* Actually, a bit less; do not have to read last gap.
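The full-track figure can be checked the same way, and dividing it by the 128 blocks on the track shows the per-block cost of reading in bulk. Variable names in this Python sketch are illustrative assumptions.

```python
seek_ms = 25.0
rev_ms = 60_000.0 / 3600           # one revolution at 3600 RPM
sector_plus_gap_ms = rev_ms / 128  # one block plus its gap

# Seek, wait on average half a block-plus-gap to reach a block boundary,
# then read for one full revolution.
full_track_ms = seek_ms + sector_plus_gap_ms / 2.0 + rev_ms

print(f"full track: {full_track_ms:.2f} ms")        # full track: 41.73 ms
print(f"per block:  {full_track_ms / 128:.3f} ms")  # per block:  0.326 ms
```

Compared with about 33.45 ms for a single random block, reading the whole track costs only about a third of a millisecond per block.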
![Page 26: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/26.jpg)
Outline
• Disks
• Data access from disks
• Software-based optimizations
  – Prefetching blocks
  – Choosing the right block size
![Page 27: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/27.jpg)
Software-based Optimizations (in Disk controller, OS, or DBMS Buffer Manager)
• Prefetching blocks
• Choosing the right block size
• Some others covered in Garcia-Molina et al. book
![Page 28: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/28.jpg)
Prefetching Blocks
• Exploits locality of access
  – Ex: relation scan
• Improves performance by hiding access latency
• Needs extra buffer space
  – Double buffering
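A simple timing model shows how double buffering hides access latency during a scan. This Python sketch is an idealized model, not a buffer manager implementation: it assumes constant per-block read and processing times, exactly two buffers, and perfect overlap of I/O with processing. The function name and the sample numbers are hypothetical.

```python
def scan_time_ms(num_blocks, read_ms, process_ms, prefetch):
    """Total time to scan num_blocks blocks, with or without double buffering."""
    if num_blocks == 0:
        return 0.0
    if not prefetch:
        # Single buffer: each block is read, then processed, strictly in turn.
        return num_blocks * (read_ms + process_ms)
    # Double buffering: while block i is processed from one buffer, block i+1
    # is read into the other, so each step costs the slower of the two.
    return read_ms + (num_blocks - 1) * max(read_ms, process_ms) + process_ms


# Hypothetical scan: 100 blocks, 1 ms to read and 1 ms to process each.
print(scan_time_ms(100, 1.0, 1.0, prefetch=False))  # 200.0
print(scan_time_ms(100, 1.0, 1.0, prefetch=True))   # 101.0
```

The gain is largest when read and processing times are similar; if one dominates, the total approaches the cost of the slower stage alone.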
![Page 29: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/29.jpg)
Block Size Selection?
• Big Block → Amortize I/O Cost
Unfortunately...
• Big Block → Read in more useless stuff!
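Both sides of the tradeoff show up in a small cost model. The Python sketch below assumes a fixed positioning cost of 33.33 ms (the synthetic disk's 25 ms average seek plus 8.33 ms average rotational delay) and 0.13 ms per KB transferred (one 1 KB sector plus its gap); the function name `read_cost_ms` and the chosen block sizes are illustrative.

```python
def read_cost_ms(block_kb, positioning_ms=33.33, ms_per_kb=0.13):
    """Time to read one block: fixed positioning cost plus per-KB transfer."""
    return positioning_ms + block_kb * ms_per_kb


for block_kb in (1, 4, 16, 64):
    total = read_cost_ms(block_kb)
    # Cost per KB falls as the block grows (amortization), but the total time
    # to fetch a block rises, which is wasted if only one tuple is needed.
    print(f"{block_kb:>2} KB block: {total:6.2f} ms total, "
          f"{total / block_kb:6.2f} ms per KB")
```

A scan benefits from the falling per-KB cost, while a lookup that needs a single 100-byte tuple pays the rising total for data it never uses.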
![Page 30: CPS216: Data-Intensive Computing Systems Data Access from Disks Shivnath Babu.](https://reader036.fdocuments.net/reader036/viewer/2022062318/55176e5255034645368b4b5a/html5/thumbnails/30.jpg)
Tradeoffs in Choosing Block Size
• Small relations?
• Update-heavy workload?
• Difficult to use blocks larger than track
• Multiple block sizes