AOS Lab 9: File system -- Of buffers, logs, and blocks
Lab 9: File system – Of buffers, logs, and blocks
Advanced Operating Systems
Zubair Nabi
April 3, 2013
Introduction
The purpose of a file system is to:
1 Organize and store data
2 Support sharing of data among users and applications
3 Ensure persistence of data after a reboot
Challenges
• Need on-disk data structures to:
  • Represent the tree of named directories and files
  • Record the identities of the blocks that hold each file’s content
  • Keep track of the areas of the disk which are free
• The file system needs to support crash recovery
  • A restart must not corrupt the file system or leave it in an inconsistent state
• The file system can be accessed by multiple processes at the same time and this access needs to be synchronized
• Disk access is orders of magnitude slower than memory access, so the file system must maintain an in-memory cache of popular blocks
xv6 FS layers
Abstraction     Layer
System calls    File descriptors
Pathnames       Recursive lookup
Directories     Directory inodes
Files           Inodes and block allocator
Transactions    Logging
Blocks          Buffer cache
xv6 FS layers (2)
1 Buffer cache: Reads and writes blocks on the IDE disk and synchronizes access to disk blocks
  • Ensures that only one kernel process can edit any particular block at a time
2 Logging: Ensures atomicity by enabling higher layers to wrap updates to several blocks in a transaction
3 Inodes and block allocator: Provides unnamed files; each unnamed file is represented by an inode and a sequence of blocks holding the file content
xv6 FS layers (3)
4 Directory inodes: Implements directories as a special kind of inode
  • The content of this inode is a sequence of directory entries, each of which contains a name and a reference to the named file’s inode
5 Recursive lookup: Provides hierarchical path names such as /foo/bar/baz.txt, via recursive lookup
6 File descriptors: Abstracts many Unix resources, such as pipes, devices, files, etc., using the file system interface
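A path such as /foo/bar/baz.txt is resolved one element at a time, with one directory lookup per element. The sketch below shows only the element-splitting step. The names next_elem and count_elems and the DIRSIZ value are assumptions made for illustration, not the xv6 source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DIRSIZ 14   /* longest name kept per element (assumed value) */

/*
 * Copy the next path element into name (NUL-terminated, truncated to
 * DIRSIZ characters) and return a pointer to the rest of the path with
 * leading slashes skipped. Returns NULL when no element is left.
 */
static const char *next_elem(const char *path, char *name)
{
  while (*path == '/')
    path++;
  if (*path == '\0')
    return NULL;

  const char *start = path;
  while (*path != '/' && *path != '\0')
    path++;

  size_t len = (size_t)(path - start);
  if (len > DIRSIZ)
    len = DIRSIZ;
  memcpy(name, start, len);
  name[len] = '\0';

  while (*path == '/')
    path++;
  return path;
}

/* Count the elements in a path: one directory lookup would be needed per element. */
static int count_elems(const char *path)
{
  char name[DIRSIZ + 1];
  int n = 0;

  while ((path = next_elem(path, name)) != NULL)
    n++;
  return n;
}
```

A lookup built on this would start at the root inode and, for each element returned by next_elem, search the current directory inode for that name and move to the inode it refers to.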
File system layout
• xv6 lays out inodes and content blocks on the disk by dividing the disk into several sections
  boot | super | inodes... | bitmap... | data... | log...
  0      1       2 ...
• Block 0 holds the boot sector
• Block 1 (called the superblock) contains metadata about the file system
  • File system size in blocks, the number of data blocks, the number of inodes, and the number of blocks in the log
• Blocks starting at 2 hold inodes, with multiple inodes per block
File system layout (2)
• Inode blocks are followed by bitmap blocks, which keep track of data blocks in use
• Bitmap blocks are followed by data blocks, which hold file and directory contents
• Finally, the blocks at the end of the disk hold a log, which is required by the transaction layer
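The layout can be turned into block-number arithmetic driven by the superblock fields. The following is a minimal sketch under stated assumptions: a 512-byte block, a 64-byte on-disk inode, a bitmap with one bit per disk block, and helper names invented for illustration. The exact macros in the xv6 source may differ.

```c
#include <assert.h>
#include <stdint.h>

#define BSIZE      512                   /* block size in bytes (assumed) */
#define INODE_SIZE 64                    /* on-disk inode size in bytes (assumed) */
#define IPB        (BSIZE / INODE_SIZE)  /* inodes per block */
#define BPB        (BSIZE * 8)           /* bitmap bits per block */

/* The superblock fields listed on the slide. */
struct superblock {
  uint32_t size;     /* file system size in blocks */
  uint32_t nblocks;  /* number of data blocks */
  uint32_t ninodes;  /* number of inodes */
  uint32_t nlog;     /* number of blocks in the log */
};

/* Block that holds inode number inum: inode blocks start at block 2. */
static uint32_t inode_block(uint32_t inum)
{
  return 2 + inum / IPB;
}

/* First bitmap block: right after the inode blocks (rounded up). */
static uint32_t first_bitmap_block(const struct superblock *sb)
{
  return 2 + (sb->ninodes + IPB - 1) / IPB;
}

/* First data block: right after the bitmap (one bit per disk block, assumed). */
static uint32_t first_data_block(const struct superblock *sb)
{
  return first_bitmap_block(sb) + (sb->size + BPB - 1) / BPB;
}

/* First log block: the log occupies the last nlog blocks of the disk. */
static uint32_t first_log_block(const struct superblock *sb)
{
  assert(sb->nlog <= sb->size);
  return sb->size - sb->nlog;
}
```

For a hypothetical 1024-block disk with 200 inodes and a 10-block log, this gives inode blocks 2 to 26, one bitmap block at 27, data blocks from 28, and the log from block 1014.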
Buffer cache layer
• Has two main jobs:
  1 Synchronize access to disk blocks
  2 Cache popular blocks
• Main interface:
  1 bread: Obtains a buffer containing a copy of a block
  2 bwrite: Writes a modified buffer
  3 brelse: Releases a buffer (after a read or write)
Buffer cache layer (2)
• Synchronizes access to each block by allowing only a single kernel thread to have a reference to the block’s buffer
• If one thread is holding a reference to a buffer, other threads will sleep on it
• The buffer cache has a fixed number of buffers to host disk blocks
• If higher layers ask for a block that is not cached, the buffer cache recycles the least recently used buffer for this block
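Buffer cache
• The buffer cache is a doubly-linked list of struct buf, with NBUF buffers, accessed via bcache.head
• A buffer has three state bits
  1 B_VALID: the buffer contains a valid copy of the block
  2 B_DIRTY: the buffer has been modified and must be written to the disk
  3 B_BUSY: some kernel thread holds a reference to the buffer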
bread
• Makes a call to bget() to get a buffer for the given sector
• If the buffer is not B_VALID, it makes a call to iderw to read the sector from the disk into the buffer
Code: bread
struct buf*
bread(uint dev, uint sector)
{
  struct buf *b;

  b = bget(dev, sector);      // returns the buffer with B_BUSY set
  if(!(b->flags & B_VALID))
    iderw(b);                 // not cached yet: read the sector from disk
  return b;
}
bget
• Scans the buffer list for a buffer with the given uint dev and uint sector
  1 If such a buffer is present and B_BUSY is not set, it sets B_BUSY and returns the buffer
  2 If B_BUSY is set, it goes to sleep on the buffer
    • Important: After bget wakes up, it cannot assume that the buffer is available now, because it might have been reused for a different sector, so it starts all over
  3 If the buffer is not present, it reuses an existing buffer: it edits the buffer’s metadata to record the new uint dev and uint sector, sets B_BUSY, and clears B_VALID and B_DIRTY
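The three cases can be sketched in user space. This is a single-threaded simplification, not the kernel code: there is no lock and no sleeping, so the "busy" case returns NULL where the real bget would sleep and rescan. The function names bget_sketch and brelse_sketch, the NBUF value, and the flag values are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NBUF    3      /* tiny cache so that recycling is easy to observe (assumed) */
#define B_BUSY  0x1    /* buffer is held by some caller */
#define B_VALID 0x2    /* buffer holds the block's data */
#define B_DIRTY 0x4    /* buffer must be written to disk */

struct buf {
  int flags;
  unsigned dev, sector;
  struct buf *prev, *next;   /* list ordered from most to least recently used */
};

static struct buf bufs[NBUF];
static struct buf head;      /* dummy head of the doubly-linked list */

/* Link all buffers into the list; dev = -1 marks "holds no block yet". */
static void binit(void)
{
  head.prev = head.next = &head;
  for (int i = 0; i < NBUF; i++) {
    bufs[i].flags = 0;
    bufs[i].dev = (unsigned)-1;
    bufs[i].sector = 0;
    bufs[i].next = head.next;
    bufs[i].prev = &head;
    head.next->prev = &bufs[i];
    head.next = &bufs[i];
  }
}

static struct buf *bget_sketch(unsigned dev, unsigned sector)
{
  struct buf *b;

  /* Cases 1 and 2: the block is already cached. */
  for (b = head.next; b != &head; b = b->next) {
    if (b->dev == dev && b->sector == sector) {
      if (b->flags & B_BUSY)
        return NULL;          /* the kernel would sleep here, then rescan */
      b->flags |= B_BUSY;
      return b;
    }
  }

  /* Case 3: not cached, so recycle starting from the least recently used end. */
  for (b = head.prev; b != &head; b = b->prev) {
    if ((b->flags & B_BUSY) == 0) {
      b->dev = dev;
      b->sector = sector;
      b->flags = B_BUSY;      /* sets B_BUSY, clears B_VALID and B_DIRTY */
      return b;
    }
  }
  return NULL;                /* every buffer is busy */
}

/* Release a buffer: move it to the front of the list and clear B_BUSY. */
static void brelse_sketch(struct buf *b)
{
  b->next->prev = b->prev;
  b->prev->next = b->next;
  b->next = head.next;
  b->prev = &head;
  head.next->prev = b;
  head.next = b;
  b->flags &= ~B_BUSY;
}
```

Because released buffers move to the front, the buffer found first when scanning backwards from the tail is the one released longest ago.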
bwrite
• Once bread returns a buffer, the caller has exclusive use of it
• If the caller writes to the buffer, it must call bwrite
• bwrite sets B_DIRTY and makes a call to iderw
Code: bwrite
void
bwrite(struct buf *b)
{
  if((b->flags & B_BUSY) == 0)
    panic("bwrite");          // the caller must hold the buffer
  b->flags |= B_DIRTY;        // mark the buffer as modified
  iderw(b);                   // write it to the disk
}
brelse
• Moves the buffer from its current position to the front of the buffer cache linked list, clears the B_BUSY bit, and wakes up any processes sleeping on that particular buffer
• This moving orders the buffers by how recently they were used
• Why do we need to do this?
  • It makes the scans in bget efficient; remember, it’s a doubly linked list
  • The lookup scan starts at the front (most recently used first) and the search for a buffer to recycle starts at the back (least recently used first)
Code: brelse
void
brelse(struct buf *b)
{
  if((b->flags & B_BUSY) == 0)
    panic("brelse");

  acquire(&bcache.lock);

  // Unlink b and reinsert it at the front of the list (most recently used).
  b->next->prev = b->prev;
  b->prev->next = b->next;
  b->next = bcache.head.next;
  b->prev = &bcache.head;
  bcache.head.next->prev = b;
  bcache.head.next = b;

  b->flags &= ~B_BUSY;
  wakeup(b);                  // wake any thread sleeping on this buffer

  release(&bcache.lock);
}
Logging layer
• xv6 implements file system fault tolerance through a simple logging mechanism
• System calls do not directly write file system data structures
• Instead:
  1 A system call first writes a description of all the disk writes that it wishes to perform to a log on the disk
  2 It then writes a special commit record to the log to specify that it contains a complete operation
  3 Next it copies the required writes to the on-disk file system data structures
  4 Finally, it deletes the log
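The four steps can be sketched against a simulated disk held in memory. Everything here is an assumption made for illustration: the tiny block size, the disk and log sizes, the header layout, and the function names log_write, log_commit, log_install, and log_clear. The sketch models the commit record as a header whose count is non-zero.

```c
#include <assert.h>
#include <string.h>

#define BSIZE    32                         /* tiny block size for the sketch (assumed) */
#define NBLOCKS  32                         /* simulated disk size in blocks (assumed) */
#define LOGSIZE  4                          /* logged blocks per transaction (assumed) */
#define LOGSTART (NBLOCKS - LOGSIZE - 1)    /* header block; log data blocks follow */

static unsigned char disk[NBLOCKS][BSIZE];  /* the simulated disk, initially all zero */

/* Log header stored in block LOGSTART; n > 0 means "committed". */
struct logheader {
  int n;
  int sector[LOGSIZE];
};

static struct logheader pending;            /* in-memory header of the open transaction */

static void write_head(const struct logheader *h)
{
  memset(disk[LOGSTART], 0, BSIZE);
  memcpy(disk[LOGSTART], h, sizeof *h);
}

static void read_head(struct logheader *h)
{
  memcpy(h, disk[LOGSTART], sizeof *h);
}

/* Step 1: record an intended write by storing the new contents in the log. */
static int log_write(int sector, const unsigned char *data)
{
  if (pending.n >= LOGSIZE)
    return -1;                              /* transaction too large for the log */
  memcpy(disk[LOGSTART + 1 + pending.n], data, BSIZE);
  pending.sector[pending.n++] = sector;
  return 0;
}

/* Step 2: the commit record is the on-disk header with a non-zero count. */
static void log_commit(void)
{
  write_head(&pending);
}

/* Step 3: copy every logged block to its home location. */
static void log_install(void)
{
  struct logheader h;

  read_head(&h);
  for (int i = 0; i < h.n; i++)
    memcpy(disk[h.sector[i]], disk[LOGSTART + 1 + i], BSIZE);
}

/* Step 4: delete the log by writing a zero count. */
static void log_clear(void)
{
  pending.n = 0;
  write_head(&pending);
}
```

A system call in this model would call log_write once per modified block, then log_commit, log_install, and log_clear in that order. The home blocks stay untouched until after the commit record is on the disk.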
Recovery
• In case of a reboot, the file system performs recovery by looking at the log
• If the log contains the commit record, the recovery code copies the required writes to the on-disk data structures
• If the log does not contain a complete operation, it is ignored and deleted
Correctness of recovery mechanism
• If the crash occurs before the commit record, the log will be ignored, and the state of the disk will stay unmodified
• If the crash occurs after the commit record, then the recovery will replay all of the operation’s writes, even repeating them if the crash occurred during the write to the on-disk data structure
• In both cases, the correctness of the file system is preserved: Either all writes are reflected on the disk or none
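Both crash cases can be exercised against a simulated disk. The sketch below is an illustration under assumptions, not the xv6 recovery code: the disk is an in-memory array, the commit record is modeled as a header with a non-zero count, and the names recover, install_one, stage, and commit are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define BSIZE    32                         /* tiny block size for the sketch (assumed) */
#define NBLOCKS  16                         /* simulated disk size in blocks (assumed) */
#define LOGSIZE  3                          /* logged blocks per transaction (assumed) */
#define LOGSTART (NBLOCKS - LOGSIZE - 1)    /* header block; log data blocks follow */

struct logheader {
  int n;                                    /* 0 = nothing committed */
  int sector[LOGSIZE];                      /* home sector of each logged block */
};

static unsigned char disk[NBLOCKS][BSIZE];  /* the simulated disk */

/* Copy logged block i to its home sector. */
static void install_one(const struct logheader *h, int i)
{
  memcpy(disk[h->sector[i]], disk[LOGSTART + 1 + i], BSIZE);
}

/* Helper: fill log slot i with a byte pattern (a write made before commit). */
static void stage(int slot, unsigned char fill)
{
  memset(disk[LOGSTART + 1 + slot], fill, BSIZE);
}

/* Helper: write the commit record (header with a non-zero count). */
static void commit(const struct logheader *h)
{
  memcpy(disk[LOGSTART], h, sizeof *h);
}

/*
 * Run at boot. A committed log is replayed in full, which is safe even if
 * some blocks were already installed, because copying the same contents
 * again changes nothing. An uncommitted log has count 0 and is skipped.
 * Returns the number of blocks replayed.
 */
static int recover(void)
{
  struct logheader h;

  memcpy(&h, disk[LOGSTART], sizeof h);
  for (int i = 0; i < h.n; i++)
    install_one(&h, i);

  int replayed = h.n;
  h.n = 0;                                  /* delete the log */
  memcpy(disk[LOGSTART], &h, sizeof h);
  return replayed;
}
```

Replay is idempotent in this model, which is why repeating writes after a crash in the middle of installation still leaves the disk in the all-writes-applied state.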
Log design
• The log resides at a fixed location at the end of the disk
• It consists of a header block and a set of data blocks
• The header block contains
  1 An array of sector numbers, one for each of the logged data blocks
  2 Count of logged blocks
• The header block is written when a transaction commits, not before
• The count is set to zero once all logged blocks have been reflected in the file system
  • The count will be zero in case of a crash before a commit
  • The count will be non-zero in case of a crash after a commit
Log design (2)
• A system call marks the start and end of the sequence of writes that must be atomic; this sequence forms a transaction
• Only one system call can be in a transaction at any given time to ensure correctness
  • The log holds at most one transaction at a time
• Only read system calls can execute concurrently with a transaction
• A fixed amount of space on the disk is dedicated to hold the log
  • No system call can write more distinct blocks than the size of the log
  • Large writes are broken into multiple smaller writes so that each write can fit in the log
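Splitting a large write so that each piece fits in the log can be sketched as a simple loop. The budget used here is hypothetical: a 10-block log, 4 blocks reserved for metadata, and half of the remainder available for data so that unaligned writes still fit. The function write_in_transaction is a stand-in for "begin transaction, write, commit" and is not an xv6 function.

```c
#include <assert.h>

#define BSIZE        512    /* block size in bytes (assumed) */
#define LOGSIZE      10     /* log capacity in blocks (assumed) */
#define META_RESERVE 4      /* blocks kept for inode, bitmap, etc. (assumed) */

/* Hypothetical per-transaction data budget in bytes. */
#define MAX_BYTES (((LOGSIZE - META_RESERVE) / 2) * BSIZE)

static int ntrans;          /* number of transactions issued so far */

/* Number of distinct blocks covered by len bytes starting at offset off. */
static int blocks_touched(int off, int len)
{
  return (off + len - 1) / BSIZE - off / BSIZE + 1;
}

/* Stand-in for one transaction; returns the number of bytes written. */
static int write_in_transaction(int off, int len)
{
  assert(len > 0);
  assert(blocks_touched(off, len) <= LOGSIZE - META_RESERVE);
  ntrans++;
  return len;
}

/* Write n bytes at offset off as a series of transactions that each fit in the log. */
static int chunked_write(int off, int n)
{
  int done = 0;

  while (done < n) {
    int len = n - done;
    if (len > MAX_BYTES)
      len = MAX_BYTES;
    int r = write_in_transaction(off + done, len);
    if (r != len)
      return -1;
    done += r;
  }
  return done;
}
```

The trade-off is that a large write is atomic only per chunk: a crash between two transactions leaves the earlier chunks on the disk and the later ones missing.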
Code: Typical system call usage of log
begin_trans();
...
bp = bread(...);
bp->data[...] = ...;
log_write(bp);
...
commit_trans();
Log functions
• begin_trans: Waits until it obtains exclusive use of the log
• log_write:
  • Appends the block’s new content to the log on the disk
  • Leaves the modified block in the buffer cache so that subsequent reads of the block during the transaction will yield the updated state
  • Records the block’s sector number in memory to find out when a block is written multiple times during a transaction and overwrite the block’s previous copy in the log
• commit_trans:
  1 Writes the log’s header block to disk, updating the count
  2 Calls install_trans to copy each block from the log to the relevant location on the disk
  3 Sets the count in the log header to zero
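The bookkeeping that log_write does with sector numbers can be sketched as follows. This is a simplified in-memory model, not the xv6 code: log_slot is a hypothetical helper, LOGSIZE is an assumed value, and returning -1 on a full log is a choice made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define LOGSIZE 10  /* assumed maximum number of logged blocks */

struct logheader {
    int n;                     /* blocks logged so far in this transaction */
    uint32_t sector[LOGSIZE];  /* home sector of each logged block */
};

/*
 * Returns the log slot to use for `sector`. A block that was already
 * written in this transaction reuses its slot, so the new copy
 * overwrites the old one. Returns -1 if the log is full.
 */
int log_slot(struct logheader *lh, uint32_t sector)
{
    int i;

    assert(lh->n >= 0 && lh->n <= LOGSIZE);
    for (i = 0; i < lh->n; i++)
        if (lh->sector[i] == sector)
            return i;          /* already logged: overwrite in place */
    if (lh->n == LOGSIZE)
        return -1;             /* transaction too large for the log */
    lh->sector[lh->n] = sector;
    return lh->n++;            /* first write of this block: append */
}
```

Reusing the slot keeps the number of log entries equal to the number of distinct blocks written, which is why the size limit is stated in terms of distinct blocks.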
Code snippet: filewrite
begin_trans();
ilock(f->ip);
if ((r = writei(f->ip, addr + i, f->off, n1)) > 0)
  f->off += r;
iunlock(f->ip);
commit_trans();
Recovery
• In case of a reboot, the file system performs recovery by looking at the on-disk log
• If the log contains the commit record, the recovery code copies the required writes to the on-disk data structures
• If the log does not contain a complete operation, it is ignored and deleted
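The recovery rule can be tried out on a toy in-memory disk. Everything in this sketch is an assumption made for illustration: the tiny block size, the struct disk layout, and the choice to keep the header fields outside the block array.

```c
#include <assert.h>
#include <string.h>

#define BSIZE    8   /* toy block size */
#define NBLOCKS  16  /* toy disk size in blocks */
#define LOGSIZE  4   /* toy log capacity */
#define LOGSTART 1   /* block 1 is the header; logged data starts at block 2 */

/* Hypothetical toy disk; the log header is kept as plain fields. */
struct disk {
    unsigned char blocks[NBLOCKS][BSIZE];
    int log_n;                /* count from the log header */
    int log_sector[LOGSIZE];  /* home sector of each logged block */
};

/*
 * Replays a committed transaction, if any, and clears the log.
 * Returns the number of blocks installed (0 if nothing was committed).
 */
int recover(struct disk *d)
{
    int i, installed;

    assert(d->log_n >= 0 && d->log_n <= LOGSIZE);
    installed = d->log_n;
    for (i = 0; i < d->log_n; i++)
        memcpy(d->blocks[d->log_sector[i]],
               d->blocks[LOGSTART + 1 + i], BSIZE);
    d->log_n = 0;  /* clear the log */
    return installed;
}
```

With a zero count the loop does nothing, so leftover data blocks from an uncommitted transaction are never copied. Running recover a second time is harmless because the count has already been cleared.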
Code snippet: recover_from_log
static void
recover_from_log(void)
{
  read_head();
  install_trans(); // if committed, copy from log to disk
  log.lh.n = 0;
  write_head(); // clear the log
}
Code snippet: install_trans
static void
install_trans(void)
{
  int tail;

  for (tail = 0; tail < log.lh.n; tail++) {
    struct buf *lbuf = bread(log.dev, log.start+tail+1); // read log block
    struct buf *dbuf = bread(log.dev, log.lh.sector[tail]); // read dst
    memmove(dbuf->data, lbuf->data, BSIZE); // copy block to dst
    bwrite(dbuf); // write dst to disk
    brelse(lbuf);
    brelse(dbuf);
  }
}
Block allocator
• Maintains a free bitmap on disk; one bit per block
• A zero bit means that the block is free while a one indicates that the block is in use
• The bits for the boot sector, superblock, inode blocks, and bitmap blocks are always set
• Provides two functions to allocate (balloc()) and de-allocate (bfree()) a block
balloc
• Calls readsb to read the superblock to get metadata
• Uses this metadata to traverse the entire bitmap and look for a block whose bitmap bit is zero
• If it finds a free block it updates the bitmap and returns the block
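The scan can be sketched over an in-memory bitmap. This is a simplified model, not the xv6 balloc: it skips the superblock and the buffer cache, balloc_sketch is a hypothetical name, and returning -1 when no block is free is a choice made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scans a bitmap of `nblocks` bits for the first zero bit, sets it,
 * and returns the block number. Returns -1 if every block is in use.
 */
int balloc_sketch(uint8_t *bm, int nblocks)
{
    int b;

    assert(nblocks > 0);
    for (b = 0; b < nblocks; b++) {
        uint8_t mask = (uint8_t)(1u << (b % 8));

        if ((bm[b / 8] & mask) == 0) {  /* is the block free? */
            bm[b / 8] |= mask;          /* mark the block in use */
            return b;
        }
    }
    return -1;
}
```

Since bit b lives in byte b / 8 at position b % 8, one byte of bitmap covers eight blocks. In a real file system the modified bitmap block would also have to be written back to disk.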
bfree
• Finds the corresponding bitmap block
• Clears its bitmap bit
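Clearing the bit can be sketched over an in-memory bitmap in the same spirit. This is a simplified model under stated assumptions: bfree_sketch is a hypothetical name, and returning -1 for a block that is already free is a choice made for the example (a real kernel would likely treat that case as a fatal error).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clears the bitmap bit for block `b`. Returns 0 on success and -1 if
 * the block was already free.
 */
int bfree_sketch(uint8_t *bm, int nblocks, int b)
{
    uint8_t mask;

    assert(b >= 0 && b < nblocks);
    mask = (uint8_t)(1u << (b % 8));
    if ((bm[b / 8] & mask) == 0)
        return -1;                 /* freeing a free block */
    bm[b / 8] &= (uint8_t)~mask;   /* mark the block free */
    return 0;
}
```

Checking the bit before clearing it catches double frees, which would otherwise let the same block be handed out to two files later on.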
Today’s task
• xv6 does not allow concurrent transactions to the log, which means that if a system call performs a long write operation, all other write system calls will block
• Come up with a strategy to implement concurrent transactions to the log in terms of pseudo-code
Reading(s)
• Chapter 6, “File system”, up to section “Code: directory layer” from “xv6: a simple, Unix-like teaching operating system”