Improving Disk Throughput in Data-Intensive Servers
Enrique V. Carrera and Ricardo Bianchini
Department of Computer Science, Rutgers University
Introduction
Disk drives are often bottlenecks
Several optimizations have been proposed
• Disk arrays
• Fewer disk reads using fancy buffer cache mgmt
• Optimized disk writes using logs
• Optimized disk scheduling
Disk throughput is still a problem for data-intensive servers
Modern Disk Drives
Substantial processing and memory capacity
Disk controller cache
• Independent segments = sequential streams
• If #streams > #segments, LRU segment is replaced
• On access, blocks are read ahead to fill segment
Disk arrays
• Array controller may also cache data
• Striping affects read-ahead
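As a rough illustration of the segment behavior described above, the following Python sketch models a controller cache in which each segment holds one run of consecutive blocks. The names (`SegmentCache`, `num_segments`, `segment_blocks`) are hypothetical, and real controllers differ in many details; this is a toy model under those assumptions, not any drive's firmware.

```python
from collections import OrderedDict


class SegmentCache:
    """Toy model of a segment-based disk controller cache (hypothetical)."""

    def __init__(self, num_segments, segment_blocks):
        self.num_segments = num_segments
        self.segment_blocks = segment_blocks
        # Maps each segment's first block to the block range it holds.
        # Entries are kept in LRU order (least recently used first).
        self.segments = OrderedDict()

    def read(self, block):
        """Return True on a hit; on a miss, fill a segment by read-ahead."""
        hit = next(
            (start for start, blocks in self.segments.items() if block in blocks),
            None,
        )
        if hit is not None:
            self.segments.move_to_end(hit)  # now the most recently used
            return True
        if len(self.segments) >= self.num_segments:
            self.segments.popitem(last=False)  # replace the LRU segment
        # Read ahead consecutive blocks until the segment is full.
        self.segments[block] = range(block, block + self.segment_blocks)
        return False
```

With more concurrent streams than segments, each new stream evicts a segment that another stream has not finished using, so the read-ahead work is wasted.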
Key Problem
Controller caches not designed for servers
• Sequential access to a small # of large files
• Read-ahead of consecutive blocks
• Segment is unit of allocation and replacement
Data-intensive servers
• Small files
• Large # concurrent accesses
• Large # blocks often miss in the controller cache
This Work
Goal
• Management techniques for disk controller caches that are efficient for servers
Techniques
• File-Oriented Read-ahead (FOR)
• Host-guided Device Caching (HDC)
Exploit processing and memory of drives
Architecture
File-Oriented Read-ahead
Disk controller has no notion of file layout
Read-ahead can be useless for small files
• Disk utilization is not amortized
• Useless blocks pollute the controller cache
FOR only reads ahead blocks of same file
File-Oriented Read-ahead
FOR needs to know layout of files on disk
• Bitmap of disk blocks kept by controller
• Bit = 1 if block is the logical continuation of the previous block
• Initialized at boot, updated on metadata writes
# blocks to read ahead = # consecutive 1s, up to the max read-ahead size
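The read-ahead rule above can be sketched in a few lines of Python. This is one reading of the slide, assuming the bit for a block is 1 when that block continues the same file as the block before it; the function and parameter names (`for_read_ahead`, `bitmap`, `max_read_ahead`) are hypothetical.

```python
def for_read_ahead(bitmap, block, max_read_ahead):
    """Return how many blocks to read ahead after `block`.

    bitmap[i] == 1 is assumed to mean that block i is the logical
    continuation of block i - 1 (same file); 0 means it is not.
    """
    count = 0
    nxt = block + 1
    # Count consecutive 1s following the requested block, capped at the maximum.
    while nxt < len(bitmap) and bitmap[nxt] == 1 and count < max_read_ahead:
        count += 1
        nxt += 1
    return count


# Layout with three files: blocks 0-2, blocks 3-4, and blocks 6-9 (block 5 stands alone).
bitmap = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
print(for_read_ahead(bitmap, 0, max_read_ahead=8))  # 2
print(for_read_ahead(bitmap, 5, max_read_ahead=8))  # 0
print(for_read_ahead(bitmap, 6, max_read_ahead=2))  # 2
```

A request for the first block of a three-block file reads ahead only the two remaining blocks of that file, rather than filling a whole segment with blocks of unrelated files.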
File-Oriented Read-ahead
FOR could underutilize segments, so allocation and replacement based on blocks
Replacement policy: MRU
FOR benefits
• Lower disk utilization
• Higher controller cache hit rates
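A minimal sketch of block-granularity allocation with MRU replacement, as named on this slide, might look as follows. The class name `BlockCacheMRU` is hypothetical. The slides do not give the rationale for MRU; one plausible reading is that a block just delivered to the host now sits in the host's buffer cache, so it is the block least likely to be requested from the controller again soon.

```python
class BlockCacheMRU:
    """Toy block-based controller cache with MRU replacement (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = []  # cached block numbers, most recently used last

    def access(self, block):
        """Return True on a hit, False on a miss (the block is then cached)."""
        if block in self.blocks:
            self.blocks.remove(block)
            self.blocks.append(block)  # becomes the most recently used
            return True
        if len(self.blocks) >= self.capacity:
            self.blocks.pop()  # MRU: evict the most recently used block
        self.blocks.append(block)
        return False
```

Because space is allocated per block rather than per segment, a short read-ahead for a small file occupies only the blocks it actually brought in.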
Host-guided Device Caching
Data-intensive servers rely on disk arrays, so a non-trivial amount of controller cache space is available
Current disk controller caches act only as speed-matching and read-ahead buffers
They would be more useful if each cache could be managed directly by the host processor
Host-guided Device Caching
Our evaluation:
• Disk controllers permanently cache the data with the most misses in the buffer cache
• Each controller caches data stored on its disk
• Assumes block-based organization
Support for three simple commands
• pin_blk()
• unpin_blk()
• flush_hdc()
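The slides name the three commands but give neither their arguments nor their exact semantics, so the Python sketch below is only an assumption about how such an interface could behave. The class `HDCCache`, the `capacity_blocks` parameter, the `is_hit` helper, and in particular the meaning given to `flush_hdc()` (releasing all pinned blocks) are hypothetical.

```python
class HDCCache:
    """Toy host-guided region of one disk controller's cache (hypothetical)."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        self.pinned = set()  # block numbers held permanently in the cache

    def pin_blk(self, block):
        """Pin a block; return False if the HDC region is already full."""
        if block in self.pinned:
            return True
        if len(self.pinned) >= self.capacity_blocks:
            return False
        self.pinned.add(block)
        return True

    def unpin_blk(self, block):
        """Release a pinned block so that its space can be reused."""
        self.pinned.discard(block)

    def flush_hdc(self):
        """Assumed semantics: release every pinned block at once."""
        self.pinned.clear()

    def is_hit(self, block):
        """A read of a pinned block is served without accessing the disk."""
        return block in self.pinned
```

In this model each controller pins only blocks stored on its own disk, matching the evaluation setup listed above.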
Host-guided Device Caching
Execution divided into periods to determine:
• How many blocks to cache; which blocks those are; when to cache them
HDC benefits
• Higher cache hit rate
• Lower disk utilization
Tradeoff: controller cache space is shared between HDC and read-ahead
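One way the host could decide, at the end of each period, which blocks to cache is sketched below in Python. It assumes the host counts buffer-cache misses per block during the period and pins the blocks with the most misses; the function `plan_hdc` and its parameters are hypothetical, and the slides do not specify the actual selection procedure.

```python
from collections import Counter


def plan_hdc(miss_counts, pinned, hdc_blocks):
    """Return (to_unpin, to_pin) block sets for the next period.

    miss_counts: mapping of block number to buffer-cache misses this period.
    pinned:      set of blocks currently pinned in the controller cache.
    hdc_blocks:  number of blocks the HDC region can hold.
    """
    # Target set: the blocks that missed most often in the host buffer cache.
    target = {blk for blk, _ in Counter(miss_counts).most_common(hdc_blocks)}
    return pinned - target, target - pinned


to_unpin, to_pin = plan_hdc({10: 5, 11: 1, 12: 9, 13: 3}, pinned={11, 12}, hdc_blocks=2)
print(to_unpin, to_pin)  # {11} {10}
```

Only the differences between the old and new sets are sent to the controller, which keeps the number of pin and unpin commands per period small.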
Methodology
Simulation of 8 IBM Ultrastar 36Z15 drives attached to a non-caching Ultra160 SCSI card
Logical disk blocks striped across array
Contention for buses, memories, and other components is simulated in detail
Synthetic + real traces (Web, proxy, file)
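The striping of logical blocks across the array can be illustrated with a short Python sketch. It assumes plain round-robin striping without parity; the function name `locate` and its parameters are hypothetical, and the 4 KB block size used in the example is an assumption, not a figure from the slides.

```python
def locate(logical_block, num_disks, unit_blocks):
    """Map a logical block to (disk index, block offset on that disk).

    unit_blocks is the striping unit size expressed in blocks.
    """
    stripe_unit = logical_block // unit_blocks  # which striping unit overall
    disk = stripe_unit % num_disks              # units rotate across disks
    offset = (stripe_unit // num_disks) * unit_blocks + logical_block % unit_blocks
    return disk, offset


# 8 disks and a 16 KB striping unit; with assumed 4 KB blocks that is 4 blocks per unit.
print(locate(0, num_disks=8, unit_blocks=4))   # (0, 0)
print(locate(4, num_disks=8, unit_blocks=4))   # (1, 0)
print(locate(35, num_disks=8, unit_blocks=4))  # (0, 7)
```

A file larger than one striping unit spans several disks, so each controller sees only part of the file; this is one reason the striping unit size interacts with read-ahead.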
Real Workloads
Web: I/O time as function of striping unit size
HDC: 2MB
Real Workloads
Web: I/O time as function of HDC memory size
Stripes: 16KB
Real Workloads
Summary
• Consistent and significant performance gains
• Combination achieves best overall performance
Related Work
Techniques external to disk controllers
Controller cache different than other caches
• Lack of temporal locality
• Orders of magnitude smaller than main memory
• Read-ahead restricted to sequential blocks
Explicit grouping
• Grouping needs to be found and maintained
• Segment replacements may eliminate benefits
Related Work
Controller read-ahead & caching techniques
• None considered file system info, host-guided caching, or block-based organizations
Other disk controller optimizations
• Scheduling of requests
• Utilizing free bandwidth
• Data replication
FOR and HDC are orthogonal to these optimizations
Conclusions
Current controller cache management is inappropriate for servers
FOR and HDC can achieve significant and consistent increases in server throughput
Real workloads show improvements of 47%, 33%, and 21% for the Web, proxy, and file server workloads, respectively
Extensions
Strategies for servers that use raw I/O
Better approach than the bitmap for conveying file layout to the controller
Array controllers that cache data and hide individual disks
Impact of other replacement policies and sizes for the buffer cache
More Information
http://www.darklab.rutgers.edu
Synthetic Workloads
I/O time as function of file size
Synthetic Workloads
I/O time as function of simultaneous streams
Synthetic Workloads
I/O time as function of access frequency
Synthetic Workloads
Summary
• No read-ahead hurts performance for files > 16KB
• Simply replacing segments with blocks has no effect
• FOR gains increase as file size decreases and # simultaneous streams increases
• HDC gains increase as requests are shifted toward a small # blocks
• FOR gains decrease as % writes increases
Synthetic Workloads
I/O time as function of percentage of writes