Chapter 8: Disks (transcript)
-
CT 320: Network and System Administration, Fall 2014*
Dr. Indrajit Ray Email: [email protected]
Department of Computer Science
Colorado State University Fort Collins, CO 80528, USA
Dr. Indrajit Ray, Computer Science Department, CT 320 Network and Systems Administration, Fall 2014
* Thanks to Dr. James Walden, NKU and Russ Wakefield, CSU for the contents of these slides
-
Disks
-
Topics
1. Disk components
2. Disk interfaces
3. Lifecycle of a disk
4. Performance
5. Reliability
6. RAID
7. Adding a disk
8. Logical volumes
9. Filesystems
-
Hard Drive Components
-
Physical Disk Geometry
One head for each surface.
All tracks at the same radius form a cylinder.
Each sector holds 512+ bytes of information.
One surface is dedicated to positioning and synchronization.
Not all portions of the disk are addressable by the OS.
-
Hard Drive Components
Actuator: moves the arm across the disk to read/write data. The arm has multiple read/write heads (often 2 per platter).
Platters: rigid substrate material; a thin coating of magnetic material stores the data. The coating type determines areal density (Gbits/in^2).
Spindle motor: spins the platters at 3600-15,000 rpm. Speed determines disk latency.
Cache: 2-16 MB of cache memory, often more. Reliability concern: write-back vs. write-through.
-
Disk Information: hdparm
# hdparm -i /dev/hde

/dev/hde:
 Model=WDC WD1200JB-00CRA1, FwRev=17.07W17, SerialNo=WD-WMA8C4533667
 Config={ HardSect NotMFM HdSw>15uSec SpinMotCtl Fixed DTR>5Mbs FmtGapReq }
 RawCHS=16383/16/63, TrkSize=57600, SectSize=600, ECCbytes=40
 BuffType=DualPortCache, BuffSize=8192kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4
 DMA modes:  mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: device does not report version:

 * signifies the current active mode
-
Disk Performance
Seek Time: time to move the head to the desired track (3-8 ms).
Rotational Delay: time until the head is over the desired block (up to ~8 ms per revolution at 7200 rpm).
Latency: seek time + rotational delay.
Throughput: data transfer rate (20-80 MB/s).
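The rotational-delay figure follows directly from spindle speed: on average the head waits half a revolution for the target sector. A quick sketch of the arithmetic (plain awk, nothing beyond a POSIX shell assumed):

```shell
# Average rotational delay = half a revolution:
#   delay_ms = (60,000 ms per minute / rpm) / 2
# The ~8 ms figure quoted for 7200 rpm is a full revolution (worst case);
# the average wait is about half that.
for rpm in 5400 7200 15000; do
  awk -v rpm="$rpm" 'BEGIN { printf "%5d rpm: avg delay %.1f ms\n", rpm, 60000 / rpm / 2 }'
done
```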
-
Latency vs. Throughput
Which is more important? Depends on the type of load.
Sequential access favors throughput: e.g., multimedia on a single-user PC.
Random access favors latency: e.g., most servers.
How to improve performance: faster disks, caching, more spindles (disks), more disk controllers.
-
Disk Performance: hdparm
# hdparm -tT /dev/hde

/dev/hde:
 Timing cached reads:        876 MB in 2.00 seconds = 437.41 MB/sec
 Timing buffered disk reads:  88 MB in 3.08 seconds =  28.60 MB/sec
-
Reliability
MTBF: mean (average) time between failures (often quoted as >1,000,000 hours).
Real failure curves (the bathtub curve): Early phase: high failure rate from defects. Constant-failure-rate phase: MTBF is valid. Wearout phase: high failure rate from wear.
Failures are more likely on traumatic events such as power on/off cycles.
Systems often wear out well before the quoted MTBF; the average life span of a disk is about 5 years.
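One way to make an MTBF figure concrete is to convert it to an annualized failure rate (AFR): the expected fraction of a large drive population failing per year. A back-of-envelope sketch, valid only in the constant-failure-rate phase:

```shell
# AFR ~= (hours in a year) / MTBF. Even an advertised MTBF of
# 1,000,000 hours implies roughly 0.9% of drives failing per year,
# and field failure rates are typically higher than the datasheet value.
awk 'BEGIN {
  mtbf = 1000000                # advertised MTBF in hours (slide figure)
  afr  = 8766 / mtbf * 100      # 8766 = average hours per year
  printf "AFR at MTBF %d h: %.2f%% per year\n", mtbf, afr
}'
```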
-
Solid State Drives
Flash-memory-based solid state drives: no moving parts. Much higher I/O performance than hard disks; random reads in particular are very fast. Less prone to failure (more reliable).
Higher cost. Uses NAND flash memory.
-
NAND Flash Constraints (1)
A flash module is divided into blocks, pages, and sectors. E.g., a 1 GB module = 8K blocks of 64 pages of 4 sectors of 512 bytes.
Read/write happens at page granularity (as with disks). Writes are more time- and energy-consuming than reads (by a factor of 3 to 10).
Pages must be written sequentially within a block; erase happens at block granularity.
Erase-before-rewrite constraint: an erase is about 10 times more costly than a page write.
A block wears out after about 10^6 write/erase cycles.
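Multiplying the slide's example geometry by the per-block endurance gives the module's total write budget under ideal wear leveling, which is one motivation for the wear-leveling layer described on the next slide. A sketch:

```shell
# 8K blocks x 64 pages x 4 sectors x 512 B = 1 GiB, 10^6 cycles per block.
awk 'BEGIN {
  blocks = 8192; pages = 64; sectors = 4; bytes = 512
  cycles = 1000000                          # write/erase cycles per block
  cap = blocks * pages * sectors * bytes    # module capacity in bytes
  printf "capacity: %.0f MiB\n", cap / 1048576
  # With perfect wear leveling, total bytes written before wearout:
  printf "write budget: %.0f TiB\n", cap * cycles / 1099511627776
}'
```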
-
NAND Flash Constraints (2)
Hardware constraints usually lead to making updates out-of-place.
A Flash Translation Layer (FTL) is required for address translation, wear leveling, and garbage collection.
The FTL is a main source of unpredictability: it is very badly adapted to random writes and provides no guarantee against read/write failures.
-
Disk Interfaces
SCSI: standard interface for servers.
IDE: standard interface for PCs.
Fibre Channel: high bandwidth; can run SCSI or IP.
USB: fast enough for slow devices on PCs.
-
SCSI
Small Computer Systems Interface: fast, reliable, expensive.
A bus, not a simple PC-to-device interface. Each device has a target # ranging 0-7 or 0-15. Devices can communicate directly without the CPU.
Many versions. Original: SCSI-1 (1979), 5 MB/s. Current: SCSI-3 (2001), 320 MB/s.
Serial Attached SCSI (SAS): up to 128 devices; up to 2 GB/s full duplex.
-
IDE
Integrated Drive Electronics / AT Attachment: slower, less reliable, cheap. Only allows 2 devices per interface. The ATAPI standard added removable devices.
Many versions. Original: IDE/ATA (1984). Current: Ultra-ATA/133, 133 MB/s.
Serial ATA: up to 128 devices; 1.5 Gbit/s, with a newer standard up to 6 Gbit/s.
-
IDE vs. SCSI
SCSI offers better performance/scale: a faster bus, faster hard drives (up to 15,000 rpm), lower CPU usage, better handling of multiple requests.
IDE is cheaper, and often best for workstations.
Convergence: SATA2 and SAS are converging on a single standard.
-
Other Host Interfaces
PCI Express: speeds up to 2.0 GB/s.
Fibre Channel: very high speeds achievable; can support a variety of communication protocols such as SCSI / IP. Almost exclusively used for servers.
USB, FireWire: generally much slower and hence not used for internal disks. USB 3.0 offers speeds of up to 5 Gbit/s.
-
RAID
Redundant Array of Independent Disks. Can be implemented in hardware or software.
Hardware RAID controllers: provide caching and automate rebuilding of arrays.
Advantages: capacity, reliability, fault tolerance, throughput.
-
RAID Levels
RAID 0: Striped evenly for performance. MTBF = (avg MTBF)/# disks
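The formula can be tabulated to show how quickly striping erodes reliability; a sketch assuming a per-disk MTBF of 1,000,000 hours:

```shell
# RAID 0 has no redundancy: losing any one disk loses the array,
# so array MTBF = per-disk MTBF / number of disks.
awk 'BEGIN {
  mtbf = 1000000
  for (n = 1; n <= 8; n *= 2)
    printf "%d disk(s): array MTBF = %d hours\n", n, mtbf / n
}'
```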
-
RAID Levels (contd)
RAID 1: mirrored for reliability. Every write goes to each disk of the set.
Effective seek time is roughly halved, as reads are split between the disks.
RAID 0+1: striped + mirrored.
-
RAID Levels
RAID 5: striped with distributed parity. Block striping, not disk striping. Can lose one disk of the set without losing data.
-
RAID Levels
JBOD: concatenated for capacity. Only the data on the bad disk is lost; no performance penalty.
RAID 3 and RAID 4 exist but are not popular. RAID 3 uses byte-level striping with a dedicated parity disk; RAID 4 uses block-level striping with a dedicated parity disk.
RAID 6 extends RAID 5 by using two parity blocks, so it can survive two simultaneous disk failures.
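The parity used by RAID 3/4/5 is a plain XOR across the data blocks of a stripe, which is exactly why the array survives one lost disk: XOR-ing the parity with the surviving blocks regenerates the missing one. A single-byte sketch in shell arithmetic (byte values chosen arbitrarily):

```shell
d1=170; d2=204; d3=85            # bytes on three "data disks" (0xAA 0xCC 0x55)
p=$(( d1 ^ d2 ^ d3 ))            # parity byte stored on the parity stripe
lost=$(( p ^ d1 ^ d3 ))          # reconstruct d2 after "losing" disk 2
echo "parity=$p reconstructed=$lost expected=$d2"
```

RAID 6's second parity block is computed differently (e.g., a Reed-Solomon code) so that any two missing blocks in a stripe can be regenerated.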
-
Lifecycle of a HDD
Blank media
Low-level format (performed at the factory)
Partition
High-level format
Operating system install
Systems operation
-
Blank Magnetic Media
For simplicity we will use a linear model of the magnetic media.
Unless we are performing electron microscopy, the exact media geometry is not significant.
The blank media has only geometric structure and raw magnetic storage.
-
Read / Write Process (simplified)
Write process: digital signals are encoded (for timing recovery) and transformed into analog signals that drive the magnetic field on the write head.
Read process: the analog magnetic field is sensed, timing is recovered, and the sampled signal is converted into digital data.
[Figure: read/write head positioned over the linear media model, beginning to end]
-
Low Level Format
Low-level formatting adds indivisible units of storage called sectors. Most modern HDDs use 512+ byte sectors.
The "+" accounts for sector overhead bytes (these differ by manufacturer); overhead bytes provide error-correction and timing-recovery functions.
Bad sectors are automatically remapped to redundant sectors by the HDD controller.
[Figure: a run of sectors (512 data bytes plus overhead each), followed by redundant sectors visible only to the HDD controller]
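The OS-addressable capacity is simply the number of LBA-visible sectors times 512 bytes. Using the LBAsects value from the hdparm output shown earlier in the chapter:

```shell
awk 'BEGIN {
  sects = 234441648        # LBAsects from the hdparm -i output
  bytes = sects * 512
  printf "%.0f bytes = %.1f GB (decimal) = %.1f GiB (binary)\n", bytes, bytes / 1e9, bytes / 1073741824
}'
```

The decimal/binary gap is the usual reason a "120 GB" drive shows up as roughly 112 GiB in the operating system.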
-
Partitioning
The Master Boot Record is created; it includes the Master Boot Code (MBC) and the Master Partition Table (MPT), and occupies the first sector of any bootable media.
The MBC is executed at boot if the HDD is designated as the boot device.
The MPT contains information about logical volumes, including the active partition: the partition whose Volume Boot Code (VBC) will be executed.
Each partition has a Disk Parameter Block (DPB) that stores information about the partition: file system type, date and time last mounted, etc.
Inter-partition gaps are a collection of unused sectors. Some sectors are unused due to addressing issues.
[Figure: MBR (MBC + MPT), inter-partition gap, Partition #1 Volume Boot Record (VBC + DPB), unused sectors, Partition #2 (VBC + DPB)]
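The layout above can be poked at directly. A minimal sketch that fabricates one zeroed 512-byte "sector" in an ordinary file and sets the 0x55AA boot signature that the BIOS checks in the MBR's last two bytes (the file path is arbitrary; a real MBR occupies the first sector of the disk):

```shell
dd if=/dev/zero of=/tmp/mbr.img bs=512 count=1 2>/dev/null
# Octal 125 252 = hex 55 AA, written at offsets 510-511:
printf '\125\252' | dd of=/tmp/mbr.img bs=1 seek=510 conv=notrunc 2>/dev/null
# BIOS verifies these two bytes before executing the Master Boot Code:
od -An -t x1 -j 510 -N 2 /tmp/mbr.img
```

The dump should show the two signature bytes, 55 aa.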
-
High Level Format (File System)
The MPT now contains the file system type and cluster size. Cluster sizes are increments of 512 bytes (one sector); the cluster becomes the indivisible allocation unit for the operating system.
A file system structure is created: FAT creates a file allocation table (a simple table); NTFS creates a master file table (a database); Linux ext2/ext3/ext4 create superblocks and inode tables (presented through the kernel's virtual file system layer).
[Figure: MBR (MBC + MPT), file system structures, cluster blocks, free space]
-
Operating System Install
Operating system code, application code, configuration data and application data are installed.
A swap file is created for NTFS and for UNIX variants (Linux, Unix, FreeBSD, etc.).
Boot code is written to the MBC (or to the VBC if a boot loader is used).
[Figure: MBR (MBC + MPT), file system structures, operating system code/data, swap space, free space]
-
Adding a Disk
Install new hardware. Verify the disk is recognized by the BIOS.
Boot. Verify the device exists in /dev.
Partition: fdisk /dev/sdb
Create a filesystem: mkfs -v -t ext3 /dev/sdb1
Add an entry to /etc/fstab: /dev/sdb1 /proj ext3 defaults 0 2
Mount: mount -a
-
When don't you need a filesystem?
Swap space: mkswap /dev/sdb1
Server applications that can use raw partitions directly: Oracle, VMware Server.
-
Logical Volumes
What are logical volumes? They appear to the user as a physical volume, but can span multiple partitions and/or disks.
Why logical volumes? Aggregate disks for performance/reliability. Grow and shrink logical volumes on the fly. Move logical volumes between physical devices. Replace volumes without interrupting service.
-
LVM
-
LVM Components
Logical Volume Group (LVG): a set of physical volumes (partitions or disks). May be divided into logical volumes (LVs).
LVs are made up of fixed-size logical extents (LEs). Each LE is 4 MB; physical extents (PEs) are the same size.
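Since extents are fixed-size, a logical volume is just a table mapping logical extents to physical extents, and its length is simple arithmetic. A sketch using the 4 MB extent size above and the 100 GB LV from the lvcreate example later in this chapter:

```shell
awk 'BEGIN {
  le_mb = 4                       # logical extent size, MB
  lv_gb = 100                     # logical volume size, GB
  printf "a %d GB LV = %d logical extents\n", lv_gb, lv_gb * 1024 / le_mb
}'
```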
-
Mapping Modes
Linear mapping: LVs are assigned to contiguous areas of PV space.
Striped mapping: LEs are interleaved across PVs to improve performance.
-
Setting up an LVG and LV
1. Initialize physical volumes:
   pvcreate /dev/hda1
   pvcreate /dev/hdb1
2. Initialize a volume group (use vgextend to add more PVs later):
   vgcreate nku_proj /dev/hda1 /dev/hdb1
3. Create logical volumes:
   lvcreate -n nku1 --size 100G nku_proj
4. Create a filesystem:
   mkfs -v -t ext3 /dev/nku_proj/nku1
-
Extending a LV
Set an absolute size: lvextend -L120G /dev/nku_proj/nku1
Or grow by a relative amount: lvextend -L+20G /dev/nku_proj/nku1
Expand the filesystem without unmounting: ext2online -v /dev/nku_proj/nku1
Check the size: df -k
-
Swap
Can use a swap file instead of a swap partition:
dd if=/dev/zero of=/swapfile bs=1024k count=512
mkswap /swapfile
Enable swap:
swapon /swapfile
swapon /dev/sda2
Disable swap:
swapoff /swapfile
swapoff /dev/sda2
Check swap resource usage:
cat /proc/swaps
-
Filesystems
ext4: Gaining popularity. Can support volumes up to 1 exbibyte (2^60 bytes) and files up to 16 tebibytes (16 x 2^40 bytes).
ext3: Currently the most common Linux filesystem. Journaling eliminates the need for a full fsck after a crash.
ext2: The older non-fragmenting fast Linux filesystem. Can be converted to ext3 by adding a journal: tune2fs -j /dev/sda1
-
Mounting
To use a filesystem:
mount /dev/sda1 /mnt
df /mnt
Automatic mounting: add an entry in /etc/fstab.
Unmount: umount /dev/sda1. A volume that is in use cannot be unmounted.
-
fstab
# /etc/fstab: static file system information.
#
proc       /proc           proc     defaults  0 0
/dev/hdc1  /               ext3     defaults  0 1
/dev/hdc5  /win            vfat     user,rw   0 0
/dev/hdc7  none            swap     sw        0 0
/dev/hdc8  /var            ext3     defaults  0 2
/dev/hdc9  /home           ext3     defaults  0 2
/dev/hda   /media/cdrom0   iso9660  ro,user   0 0
/dev/fd0   /media/floppy0  auto     rw,user   0 0
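Each fstab entry has six whitespace-separated fields: device, mount point, filesystem type, mount options, dump flag, and fsck pass number (0 = never check, 1 = root filesystem, 2 = everything else). A quick sketch that sanity-checks field counts (the temp-file path is arbitrary):

```shell
cat > /tmp/fstab.demo <<'EOF'
proc       /proc  proc  defaults  0 0
/dev/hdc1  /      ext3  defaults  0 1
/dev/hdc8  /var   ext3  defaults  0 2
EOF
# Count lines that do not have exactly six fields:
awk 'NF != 6 { bad++ } END { print bad + 0, "malformed entries" }' /tmp/fstab.demo
```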
-
fsck: check + repair fs
Filesystem corruption sources: power failure, system crash.
Types of corruption: unreferenced inodes; bad superblocks; used data blocks not recorded in block maps; data blocks listed as free that are in use by files.
fsck can fix these and more. It asks the user to make the more complex decisions, and stores unfixable files in lost+found.