Role of OS for I/O CS 537 Lecture 13 File...

12
1 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 1 CS 537 Lecture 13 File Systems Michael Swift 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 2 Role of OS for I/O Standard library Provide abstractions, consistent interface Simplify access to hardware devices Resource coordination Provide protection across users/processes Provide fair and efficient performance Requires understanding of underlying device characteristics User processes do not have direct access to devices Could crash entire system Could read/write data without appropriate permissions Could hog device unfairly OS exports higher-level functions File system: Provides file and directory abstractions File system operations: mkdir, create, read, write 11/08/07 © 2005 Steve Gribble 3 File systems The concept of a file system is simple the implementation of the abstraction for secondary storage abstraction = files logical organization of files into directories the directory hierarchy sharing of data between processes, people and machines access control, consistency, … 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 4 Abstraction: File User view Named collection of bytes Untyped or typed Examples: text, source, object, executables, application-specific Permanently and conveniently available

Transcript of Role of OS for I/O CS 537 Lecture 13 File...

Page 1: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

1

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 1

CS 537 Lecture 13

File Systems

Michael Swift

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 2

Role of OS for I/O •  Standard library

–  Provide abstractions, consistent interface –  Simplify access to hardware devices

•  Resource coordination –  Provide protection across users/processes –  Provide fair and efficient performance

•  Requires understanding of underlying device characteristics •  User processes do not have direct access to devices

–  Could crash entire system –  Could read/write data without appropriate permissions –  Could hog device unfairly

•  OS exports higher-level functions –  File system: Provides file and directory abstractions –  File system operations: mkdir, create, read, write

11/08/07 © 2005 Steve Gribble 3

File systems

•  The concept of a file system is simple –  the implementation of the abstraction for secondary storage

•  abstraction = files –  logical organization of files into directories

•  the directory hierarchy –  sharing of data between processes, people and machines

•  access control, consistency, …

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 4

Abstraction: File •  User view

–  Named collection of bytes •  Untyped or typed •  Examples: text, source, object, executables, application-specific

–  Permanently and conveniently available

Page 2: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

2

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 5

Files •  A file is a collection of data with some properties

–  contents, size, owner, last read/write time, protection … •  Files may also have types

–  understood by file system •  device, directory, symbolic link

–  understood by other parts of OS or by runtime libraries •  executable, dll, source code, object code, text file, …

•  Type can be encoded in the file’s name or contents –  file extension: .com, .exe, .bat, .dll, .jpg, .mov, .mp3, … –  content: #! for scripts

•  Operating system view –  Map bytes as collection of blocks on physical non-volatile storage device

•  Magnetic disks, tapes, NVRAM, battery-backed RAM •  Persistent across reboots and power failures

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 6

File Meta-Data •  Meta-data: Additional system information associated with each

file –  Name of file –  Type of file –  Pointer to data blocks on disk –  File size –  Times: Creation, access, modification –  Owner and group id –  Protection bits (read or write) –  Special file? (directory? symbolic link?)

•  Meta-data is stored on disk –  Conceptually: meta-data can be stored as array on disk

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 7

File access methods •  Some file systems provide different access methods that specify

ways the application will access data –  sequential access

•  read bytes one at a time, in order –  Random access

•  access given a part of a file by block/byte # –  record access

•  file is array of fixed- or variable-sized records –  Search access

•  FS contains an index of file contents •  apps query the FS for files containing a word

•  Why do we care about distinguishing sequential from random access? –  what might the FS do differently in these cases?

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 8

File Operations •  Create file with given pathname /a/b/file

–  Traverse pathname, allocate meta-data and directory entry •  Read from (or write to) offset in file

–  Find (or allocate) blocks of file on disk; update meta-data •  Delete

–  Remove directory entry, free disk space allocated to file •  Truncate file (set size to 0, keep other attributes)

–  Free disk space allocated to file •  Rename file

–  Change directory entry •  Copy file

–  Allocate new directory entry, find space on disk and copy •  Change access permissions

–  Change permissions in meta-data

Page 3: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

3

11/08/07 © 2005 Steve Gribble

Opening Files Expensive to access files with full pathnames

–  On every read/write operation: •  Traverse directory structure •  Check access permissions

Open() file before first access –  User specifies mode: read and/or write –  Search directories for filename and check permissions –  Copy relevant meta-data to open file table in memory –  Return index in open file table to process (file descriptor)‏ –  Process uses file descriptor to read/write to file

Per-process open file table –  Current position in file (offset for reads and writes)‏ –  Open mode

Enables redirection from stdout to particular file 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and

Remzi Arpaci-Dussea, Michael Swift 10

Directories •  Directories provide:

–  a way for users to organize their files –  a convenient file name space for both users and FS’s –  a map from file name to blocks of file data on disk

•  Actually, map file name to file meta-data (which enables one to find data on disk)

•  Most file systems support multi-level directories –  naming hierarchies (/, /usr, /usr/local, /usr/local/bin, …)

•  Most file systems support the notion of current directory –  absolute names: fully-qualified starting from root of FS

bash$ cd /usr/local

–  relative names: specified with respect to current directory bash$ cd /usr/local (absolute) bash$ cd bin (relative, equivalent to cd /usr/local/bin)

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 11

Directories: Tree-Structured •  Directory listing contains <name, index>, but name can be directory

–  Directory is stored and treated like a file –  Special bit set in meta-data for directories

•  User programs can read directories •  Only system programs can write directories

–  Specify full pathname by separating directories and files with special characters (e.g., \ or /)

•  Special directories –  Root: Fixed index for meta-data (e.g., 2) –  This directory: . –  Parent directory: ..

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 12

Path name translation •  Let’s say you want to open “/one/two/three”

fd = open(“/one/two/three”, O_RDWR);

•  What goes on inside the file system? –  open directory “/” (well known, can always find) –  search the directory for “one”, get location of “one” –  open directory “one”, search for “two”, get location of “two” –  open directory “two”, search for “three”, get loc. of “three” –  open file “three” –  (of course, permissions are checked at each step)

•  FS spends lots of time walking down directory paths –  this is why open is separate from read/write (session state) –  OS will cache prefix lookups to enhance performance

•  /a/b, /a/bb, /a/bbb all share the “/a” prefix

Page 4: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

4

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 13

Acyclic-Graph Directories •  More general than tree structure

–  Add connections across the tree (no cycles) –  Create links from one file (or directory) to another

•  Hard link: “ln a b” (“a” must exist already) –  Idea: Can use name “a” or “b” to get to same file data –  Implementation: Multiple directory entries point to same meta-data –  What happens when you remove a? Does b still exist?

•  How is this feature implemented??? –  Unix: Does not create hard links to directories. Why?

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 14

File System Workloads drive designs •  Motivation: Workloads influence design of file system •  File characteristics (measurements of UNIX and NT)

–  Most files are small (about 8KB) –  Most of the disk is allocated to large files

•  (90% of data is in 10% of files) •  Access patterns

–  Sequential: Data in file is read/written in order •  Most common access pattern

–  Random (direct): Access block without referencing predecessors •  Difficult to optimize

–  Access files in same directory together •  Spatial locality

–  Access meta-data when access file •  Need meta-data to find data

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 15

Goals •  OS allocates LBNs (logical block numbers) to meta-data, file

data, and directory data –  Workload items accessed together should be close in LBN space

•  Implications –  Large files should be allocated sequentially –  Files in same directory should be allocated near each other –  Data should be allocated near its meta-data

•  Meta-Data: Where is it stored on disk? –  Embedded within each directory entry –  In data structure separate from directory entry

•  Directory entry points to meta-data

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 16

Allocation Strategies •  Progression of different approaches

–  Contiguous –  Extent-based –  Linked –  File-allocation Tables –  Indexed –  Multi-level Indexed

•  Questions –  Amount of fragmentation (internal and external)? –  Ability to grow file over time? –  Seek cost for sequential accesses? –  Speed to find data blocks for random accesses? –  Wasted space for pointers to data blocks?

Page 5: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

5

Per-file Metadata

•  In unix, the data representing a file is called an inode (for indirect node) –  Inodes contain file size, access times, owner, permissions –  Inodes contain information on how to find the file data

(locations on disk)

•  Every inode has a location on disk.

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 17 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and

Remzi Arpaci-Dussea, Michael Swift 18

Contiguous Allocation •  Allocate each file to contiguous blocks on disk

–  Meta-data: Starting block and size of file –  OS allocates by finding sufficient free space

•  Must predict future size of file; Should space be reserved? –  Example: IBM OS/360

•  Advantages –  Little overhead for meta-data –  Excellent performance for sequential accesses –  Simple to calculate random addresses

•  Drawbacks –  Horrible external fragmentation (Requires periodic compaction) –  May not be able to grow file without moving it

Contiguous Allocation of Disk Space

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 20

Extent-Based Allocation •  Allocate multiple contiguous regions (extents) per file

–  Meta-data: Small array (2-6) designating each extent •  Each entry: starting block and size

•  Improves contiguous allocation –  File can grow over time (until run out of extents) –  Helps with external fragmentation

•  Advantages –  Limited overhead for meta-data –  Very good performance for sequential accesses –  Simple to calculate random addresses

•  Disadvantages (Small number of extents): –  External fragmentation can still be a problem –  Not able to grow file when run out of extents

D � A � A � A � B � B � B � B � C� C� C� B � B �D � D �

Page 6: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

6

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 21

Linked Allocation •  Allocate linked-list of fixed-sized blocks

–  Meta-data: Location of first block of file •  Each block also contains pointer to next block

–  Examples: TOPS-10, Alto

•  Advantages –  No external fragmentation –  Files can be easily grown, with no limit

•  Disadvantages –  Cannot calculate random addresses w/o reading previous blocks –  Sequential bandwidth may not be good

•  Try to allocate blocks of file contiguously for best performance –  Sensitivity to corruption

•  Trade-off: Block size (does not need to equal sector size) –  Larger --> ?? –  Smaller --> ??

D � A � A � A � B � B � B � B � C� C� C� B � B �D � D � D � D �B �

Linked Allocation

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 23

File-Allocation Table (FAT) •  Variation of Linked allocation

–  Keep linked-list information for all files in on-disk FAT table –  Meta-data: Location of first block of file

•  And, FAT table itself

•  Comparison to Linked Allocation –  Same basic advantages and disadvantages –  Disadvantage: Read from two disk locations for every data read –  Optimization: Cache FAT in main memory

•  Advantage: Greatly improves random accesses

File-Allocation Table

Page 7: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

7

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 25

Indexed Allocation

•  Allocate fixed-sized blocks for each file –  Meta-data: Fixed-sized array of block pointers

•  Allocate space for ptrs at file creation time

•  Advantages –  No external fragmentation –  Files can be easily grown, with no limit –  Supports random access

•  Disadvantages –  Large overhead for meta-data:

•  Wastes space for unneeded pointers (most files are small!)

Indexed Allocation •  Brings all pointers together into the index block •  Logical view

index table"

Example of Indexed Allocation

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 28

Multi-Level Indexed Files •  Variation of Indexed Allocation

–  Dynamically allocate hierarchy of pointers to blocks as needed –  Meta-data: Small number of pointers allocated statically

•  Additional pointers to blocks of pointers –  Examples: original UNIX-based file systems

•  Comparison to Indexed Allocation –  Advantage: Does not waste space for unneeded pointers

•  Still fast access for small files •  Can grow to what size??

–  Disadvantage: Need to read indirect blocks of pointers to calculate addresses (extra disk read)

•  Keep indirect blocks cached in main memory

indirect �double�indirect � indirect � triple�

indirect �

Page 8: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

8

Multi-level Indexes

•  Inode = data on disk containing metadata for a file •  Algorithm for looking up a block

–  If block number < # direct •  Read block number from inode

–  If block number < (# direct) + (block size/index size) •  Read indirect block address from inode •  Read indirect block from disk •  Find index in indirect block (block no. - # direct)

–  If block number < …

•  Compare to multi-level page tables in the file system –  What are the differences? –  Why? –  As with page tables, block size affects how big files can be

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 29 11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and

Remzi Arpaci-Dussea, Michael Swift 30

Free space management

•  How do you remember which blocks are free? –  What operations are needed?

•  Free a block •  Get a free block(s) -- in some particular location

•  Free list: linked list of free blocks –  Advantages: simple, constant-time operation –  Disadvantage: rapidly loses locality –  Used in Unix UFS and FAT

•  Bitmap: bitmap of all blocks indicating which are free –  Advantages: can find strings of consecutive free blocks

•  X86 provides instructions to find 1 bits –  Disadvantages: space overhead –  Used in Unix FFS

Linked Free Space List on Disk

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 32

Directory internals

•  A directory is typically just a file that happens to contain special metadata –  directory = list of (name of file, file attributes) –  attributes include such things as:

•  size, protection, location on disk, creation time, access time, … –  the directory list is usually unordered (effectively random)

•  when you type “ls”, the “ls” command sorts the results for you

Page 9: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

9

Directory Implementation

•  A directory is a file containing –  name –  metadata about file (Windows)

•  size •  owner •  data locations

–  Pointer to file metadata (Unix) •  Organization

–  Linear list of file names with pointer to the data blocks •  simple to program •  time-consuming to execute

–  BTree – balanced tree sorted by name •  Faster searching for large directories

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 34

The tree (directory, hierarchical) file system

•  A directory is a flat file of fixed-size entries •  Each entry consists of an i-node number and a file

name i-node number File name

152 . 18 ..

216 my_file 4 another_file

93 oh_my_god 144 a_directory

•  It’s as simple as that!

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 35

Using directories •  How do you find files?

–  Read the directory, search for the name you want (checking for wildcards)

•  How do you list files (ls) –  Read directory contents, print name field

•  How do you list file attributes (ls -l) –  Read directory contents, open inodes, print name +

attributes

•  How do you sort the output (ls -S, ls -t) –  The FS doesn’t do it -- ls does it!

Mounting Directories

•  When a system has multiple disks or partitions, how are they named? –  Dos/Windows: drive letters:

•  c:\programs files\... •  D:\data

–  Unix: mount points •  /mnt/

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 36

Page 10: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

10

File System Mounting •  Mounting stores in memory a pointer from a

directory on one partition to the root of another partition

•  During access, the OS checks every directory to see if it is a real directory or a mount point –  If mount point, follows pointer in memory to find the

root directory of the other file system.

Mounting Replaces Directories

What happens if you mount (b) under /users?

Resulting File System Steps to Create a File

1.  Check if name is in use a.  Find directory inode b.  Read directory contents for existing files

2.  Allocate inode a.  Update from free inode bitmap/list b.  Fill in inode contents

3.  Add inode to directory a.  Write back directory contents

What happens if you do this in the wrong order?

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 40

Page 11: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

11

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 41

File system consistency

•  Metadata, directory and file blocks are cached in memory

•  The “sync” command forces memory-resident disk information to be written to disk –  system does a sync every few seconds

•  A crash or power failure between sync’s can leave an inconsistent disk

•  You could reduce the frequency of problems by reducing caching, but performance would suffer big-time

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 42

i-check: consistency of the flat file system

•  Is each block on exactly one list? –  create a bit vector with as many entries as there are blocks –  follow the free list and each i-node block list –  when a block is encountered, examine its bit

•  If the bit was 0, set it to 1 •  if the bit was already 1

–  if the block is both in a file and on the free list, remove it from the free list and cross your fingers

–  if the block is in two files, call support!

–  if there are any 0’s left at the end, put those blocks on the free list

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 43

d-check: consistency of the directory file system

•  Do the directories form a tree? •  Does the link count of each file equal the number of

directories links to it? –  I will spare you the details

•  uses a zero-initialized vector of counters, one per i-node •  walk the tree, then visit every i-node

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 44

Protection systems

•  FS must implement some kind of protection system –  to control who can access a file (user) –  to control how they can access it (e.g., read, write, or exec)

•  More generally: –  generalize files to objects (the “what”) –  generalize users to principals (the “who”, user or program) –  generalize read/write to actions (the “how”, or operations)

•  A protection system dictates whether a given action performed by a given principal on a given object should be allowed –  e.g., you can read or write your files, but others cannot –  e.g., your can read /etc/motd but you cannot write to it

Page 12: Role of OS for I/O CS 537 Lecture 13 File Systemspages.cs.wisc.edu/~swift/classes/cs537-fa11/lectures/13...5 Per-file Metadata • In unix, the data representing a file is called an

12

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 45

Model for representing protection •  Two different ways of thinking about it:

–  access control lists (ACLs) •  for each object, keep list of principals and principals’ allowed actions •  Like a guest list (check identity of caller on each access)

–  capabilities •  for each principal, keep list of objects and principal’s allowed actions •  Like a key (something you present to open a door)

•  Both can be represented with the following matrix:

/etc/passwd /home/swift /home/guest

root rw rw rw

swift r rw r

guest r principals

objects

ACL

capability

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 46

ACLs vs. Capabilities •  Capabilities are easy to transfer

–  they are like keys: can hand them off –  they make sharing easy

•  ACLs are easier to manage –  object-centric, easy to grant and revoke

•  to revoke capability, need to keep track of principals that have it •  hard to do, given that principals can hand off capabilities

•  ACLs grow large when object is heavily shared –  can simplify by using “groups”

•  put users in groups, put groups in ACLs •  you are could be in the “cs537-students” group

–  additional benefit •  change group membership, affects ALL objects that have this group in

its ACL

11/7/11 © 2004-2007 Ed Lazowska, Hank Levy, Andrea and Remzi Arpaci-Dussea, Michael Swift 47

Protection in the Unix FS

•  Objects: individual files •  Principals: owner/group/world •  Actions: read/write/execute

•  This is pretty simple and rigid, but it has proven to be about what we can handle!