Basic filesystem concepts · 2019-12-09 · Our naïve models of storage have been based upon...


Basic filesystem concepts · Tuesday, November 22, 2011 2:04 PM

Review of basic filesystem concepts

A filesystem is a structure on disk that:
- allows one to store and retrieve files;
- keeps track of the names of the files;
- keeps track of file attributes such as protection and owner.

A Linux filesystem consists of three kinds of entities:
- (data) blocks: contain the actual file contents.
- inodes: contain information on where the file's blocks are.
- directories: a special kind of file that associates file names with file numbers.

and, additionally:
- super-blocks: describe where inodes and blocks are located on disk.


This time · Monday, November 27, 2017 4:27 PM

The exact same lecture as last time, at a different level of abstraction:
- Last time: the logical organization of a filesystem.
- This time: the physical organization.

Some themes:
- Knowledge boundaries between distinct subsystems.
- Contractual obligations about performance that do not require communication between subsystems. E.g., the paging system guarantees that repeated accesses to the same file block take little time, because that block's contents are cached.


What's important to filesystems · Tuesday, November 20, 2012 4:30 PM

What's important:
- Filesystems are based upon the existence of the paging subsystem.
- Heavily use the principle of locality: if we're messing around with one file or region of disk, we're likely to continue.
- Utilize very efficient binary flag structures.

This is all about representing sets and sequences. A set of items has no order; a sequence of items has an order. "A file is a sequence of blocks."

(An array is any function f that locates an object f(n) in memory (n = integer) in O(1).)


What is an inode? · Tuesday, November 22, 2011 2:06 PM

"Identity node".

Contains all attributes of a file or directory: owner, group, protection, and where its blocks are located.

Does not contain the name. Found by number, not name.


What is a block? · Tuesday, November 22, 2011 2:08 PM

Just a big bag of bits. Typically 8192 bytes, but this can vary.


Inode/block striping · Tuesday, November 22, 2011 2:12 PM

Typically, inodes and blocks are striped into alternating stripes on the disk.


What is a file? · Tuesday, November 22, 2011 2:10 PM

A pair: <inode, sequence of blocks>.
- The inode is identified by a number. That number is an offset into a descriptor table; the descriptor tells where the blocks are and how to find them. We require O(1) random access by inode number. Better to say: an "array" of inodes.
- We require O(1) random access by block number. Better to say: an "array" of blocks.
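Because inodes and blocks are both treated as arrays, finding either one from its number is a single multiply-and-add, which is where the O(1) comes from. A minimal C sketch, assuming a hypothetical layout with one contiguous inode table and one contiguous data area (the struct `layout` and its fields are illustrative only; ext2 actually splits these arrays across block groups and numbers inodes from 1, not 0):

```c
#include <stdint.h>

/* Hypothetical on-disk layout parameters (illustrative, not ext2's). */
struct layout {
    uint64_t inode_table_start; /* byte offset of inode 0 on disk */
    uint32_t inode_size;        /* bytes per inode */
    uint64_t data_start;        /* byte offset of data block 0 on disk */
    uint32_t block_size;        /* bytes per block */
};

/* O(1): an inode number is just an index into the inode array. */
uint64_t inode_offset(const struct layout *l, uint32_t ino) {
    return l->inode_table_start + (uint64_t)ino * l->inode_size;
}

/* O(1): a block number is just an index into the block array. */
uint64_t block_offset(const struct layout *l, uint32_t blk) {
    return l->data_start + (uint64_t)blk * l->block_size;
}
```

With `inode_table_start = 4096` and 64-byte inodes, inode 10 sits at byte 4096 + 640 = 4736; no search is involved.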


What is a directory? · Tuesday, November 22, 2011 2:10 PM

A special kind of file. Contains a mapping from names to inode numbers. Covers one part of a file path.
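A directory lookup resolves one path component to an inode number. A minimal sketch, assuming a hypothetical fixed-size entry `struct dirent_sketch` stored as a plain array and searched linearly (real ext2 directory entries are variable-length records; the linear scan is just one possible strategy):

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size directory entry: a name and an inode number. */
struct dirent_sketch {
    uint32_t inode;   /* 0 marks an unused slot in this sketch */
    char name[28];
};

/* Resolve one path component within a directory's entries.
   Returns the inode number, or 0 if the name is not present. */
uint32_t dir_lookup(const struct dirent_sketch *entries, size_t n,
                    const char *name) {
    for (size_t i = 0; i < n; i++) {
        if (entries[i].inode != 0 && strcmp(entries[i].name, name) == 0)
            return entries[i].inode;
    }
    return 0;
}
```

Resolving a full path such as `/home/alice` repeats this once per component, starting from the root directory's inode.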


What is a super-block? · Tuesday, November 22, 2011 2:09 PM

A descriptor with pointers to:
- all groups of inodes
- all block groups
- what directory is root ("/")

Duplicated all over the disk. If it is lost, disk data becomes meaningless!


Logical structure of a Linux filesystem · Thursday, December 03, 2009 2:11 PM


Some really difficult things to understand · Tuesday, November 22, 2011 1:49 PM

The identity of a file is a number. Its name is constructed as a sequence of directories. One file can have multiple names, but only one number.


Constructing a name from a number · Tuesday, November 22, 2011 1:50 PM

Constructing a filename from a number:
1. Start at the file. Find the directory it's in.
2. Go up to its parent using '..'.
3. Find the entry of the parent directory that points to this one. That name is a component of the filename.
4. Repeat until the number for . == the number for .. (this means you are at the root).
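The procedure above can be sketched in C over an in-memory model of a directory tree. Everything here (`struct entry`, `struct dir`, `find_dir`, `lookup`, `path_of`) is hypothetical and illustrative; a real implementation would read directory blocks from disk. The sketch starts from a directory (for a plain file, first find the directory it's in) and stops when the inode number of "." equals that of "..":

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory directory tree: each directory is identified by
   its inode number and holds (name, inode) entries, including "." and "..". */
struct entry { const char *name; uint32_t ino; };
struct dir   { uint32_t ino; const struct entry *entries; size_t n; };

static const struct dir *find_dir(const struct dir *dirs, size_t ndirs,
                                  uint32_t ino) {
    for (size_t i = 0; i < ndirs; i++)
        if (dirs[i].ino == ino) return &dirs[i];
    return NULL;
}

static uint32_t lookup(const struct dir *d, const char *name) {
    for (size_t i = 0; i < d->n; i++)
        if (strcmp(d->entries[i].name, name) == 0) return d->entries[i].ino;
    return 0;
}

/* Write the absolute path of directory `ino` into buf (capacity len).
   Returns 0 on success, -1 on error. */
int path_of(const struct dir *dirs, size_t ndirs, uint32_t ino,
            char *buf, size_t len) {
    char tmp[256];
    if (len == 0) return -1;
    buf[0] = '\0';
    for (;;) {
        const struct dir *d = find_dir(dirs, ndirs, ino);
        if (!d) return -1;
        uint32_t parent = lookup(d, "..");
        if (parent == lookup(d, ".")) break;          /* . == .. : root */
        const struct dir *p = find_dir(dirs, ndirs, parent);
        if (!p) return -1;
        const char *name = NULL;    /* the parent's entry that points to us */
        for (size_t i = 0; i < p->n; i++) {
            const char *nm = p->entries[i].name;
            if (p->entries[i].ino == ino && strcmp(nm, ".") && strcmp(nm, "..")) {
                name = nm;
                break;
            }
        }
        if (!name) return -1;
        /* Prepend "/name" to the components collected so far. */
        if (strlen(name) + strlen(buf) + 2 > sizeof tmp) return -1;
        strcpy(tmp, "/");
        strcat(tmp, name);
        strcat(tmp, buf);
        if (strlen(tmp) + 1 > len) return -1;
        strcpy(buf, tmp);
        ino = parent;                                 /* go up one level */
    }
    if (buf[0] == '\0') {                             /* ino was the root */
        if (len < 2) return -1;
        strcpy(buf, "/");
    }
    return 0;
}
```

For a tree where `/` is inode 2, `/home` is inode 12 and `/home/alice` is inode 40, `path_of` called on inode 40 fills `buf` with `/home/alice`. Nothing in the loop guards against a directory cycle; if the tree contained one, the walk would never terminate.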


Example: constructing a name from a number · Tuesday, November 22, 2011 1:55 PM


Some very counter-intuitive facts · Tuesday, November 22, 2011 1:55 PM

One of the most expensive system calls is getcwd(), which returns the current working directory for a process. Why? The process control block (PCB) stores that directory as a number, so getcwd() must reconstruct its name from the number every time it is called.

If a filesystem has a loop, then doing a getcwd() inside the loop will crash the OS. Why? Preventing an infinite loop would make getcwd() even slower than it already is!

For this reason, hard links to directories may only be created by root. (Linux goes further: link(2) refuses to create hard links to directories at all.)


So far · Thursday, December 02, 2010 4:33 PM

So far, our naïve models of storage have been based upon "free" and "used" lists. Logically, they're sets, in the sense that they're unordered. In a realistic filesystem, there might be "free" or "used" arrays instead. The reason this is reasonable is the principle of locality: on average, processes reference stuff in a local region. So the used array can be cached, and tends to stay in cache. A realistic filesystem takes advantage of both locality and the fact that there is a block cache!


Logical superstructure · Thursday, December 03, 2009 2:21 PM


From logical to physical · Thursday, December 03, 2009 2:29 PM

So far, we are using "sequence" in a loose way. A file is a "sequence of blocks". The physical way that sequence is defined depends upon the recording medium and a number of design tradeoffs.

Between "rock" and "hard place":
- Rock: need to be space efficient.
- Hard place: need to be able to get to any file block in O(1).


Magnetic disks · Monday, November 23, 2015 4:51 PM

- Unlimited writes, no degradation: mean time between failures (MTBF) does not depend upon the number of writes.
- Can write or erase one block at a time.


What is a sequence? · Thursday, December 03, 2009 2:34 PM

Design question 1: what is a sequence of blocks?


Design question 2: how to keep track of unused blocks? · Thursday, December 03, 2009 2:39 PM


Case study: ext2fs · Thursday, December 03, 2009 2:44 PM

Case study: ext2
- Primitive Linux filesystem.
- No fancy features (until ext3!).
- Desires: robustness, simplicity.
- Medium properties: unlimited writes; a write does not degrade the medium (it's degraded by time spent running).


Ext2 design decisions · Thursday, December 03, 2009 2:46 PM

- Free and used blocks and inodes are tracked via arrays of T/F bits.
- Sequences of blocks are represented via array indirection that uses whole blocks as arrays of pointers to blocks.


Ext2 inode · Thursday, December 03, 2009 2:48 PM

Ext2 inode contains:
- owner, group, protections
- access count (number of references from directories)
- 12 block pointers for the first twelve blocks of the file
- 1 pointer to a block used for single-level indirection
- 1 pointer for double-level indirection
- 1 pointer for triple-level indirection

These are not memory pointers like you're used to. They are offsets into the block descriptor arrays: each one is an integer defining a block on disk.


Picture of an ext2 inode · Thursday, December 03, 2009 2:50 PM


Why indirection? · Thursday, December 03, 2009 3:08 PM

Statistically, most files are small! The inode contains enough information for files up to 1024*12 bytes. After this, we allocate a block that can represent indirection to 256*1024 more bytes, and so on.

Reason it's O(1): you remember the boundaries:
- first 1024*12 bytes: in the inode
- next 256*1024 bytes: in single indirection
- next 256*256*1024 bytes: in double indirection

Algorithm for indirection: determine which indirection kind, subtract off the sizes of the previous regions, divide by the block size.

Suppose you want byte number 50,000:
- It is in the first (single) indirection block, since 12*1024 ≤ 50,000 < 12*1024 + 256*1024.
- Offset from the base of the indirection region: 50,000 - 12*1024 = 37,712.
- Offset of block = (50,000 - 12*1024)/1024 = 36.
- Offset of byte in block = (50,000 - 12*1024) % 1024 = 848.
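The indirection algorithm above (pick the level, subtract the earlier regions, divide by the block size) fits in a few lines. A minimal sketch, assuming the early-ext2 parameters used in these notes (1024-byte blocks, 12 direct pointers, 256 four-byte pointers per indirection block); `struct where` and `locate` are illustrative names, not part of any real API:

```c
#include <stdint.h>

#define BLOCK_SIZE 1024u              /* early-ext2 block size */
#define NDIRECT    12u                /* direct block pointers in the inode */
#define PTRS       (BLOCK_SIZE / 4u)  /* 4-byte pointers per block = 256 */

struct where {
    int level;        /* 0 = direct, 1 = single, 2 = double, 3 = triple */
    uint64_t index;   /* block index counted from the start of that level */
    uint32_t offset;  /* byte offset inside that block */
};

/* Locate byte `pos` of a file: find the level whose region contains it,
   subtracting the sizes of earlier regions, then divide by the block size.
   Returns 0 on success, -1 if pos is beyond the largest possible file. */
int locate(uint64_t pos, struct where *w) {
    const uint64_t span[4] = {
        (uint64_t)NDIRECT * BLOCK_SIZE,               /* in the inode */
        (uint64_t)PTRS * BLOCK_SIZE,                  /* single indirection */
        (uint64_t)PTRS * PTRS * BLOCK_SIZE,           /* double indirection */
        (uint64_t)PTRS * PTRS * PTRS * BLOCK_SIZE,    /* triple indirection */
    };
    for (int level = 0; level < 4; level++) {
        if (pos < span[level]) {
            w->level  = level;
            w->index  = pos / BLOCK_SIZE;
            w->offset = (uint32_t)(pos % BLOCK_SIZE);
            return 0;
        }
        pos -= span[level];    /* skip past this level's region */
    }
    return -1;
}
```

`locate(50000, &w)` yields level 1, index 36, offset 848, matching the worked example. The loop runs at most four times, so the cost does not grow with the file size.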


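Ext2 block handling · Thursday, December 03, 2009 2:59 PM

Typical ext2 block sizes (early version):
- 1024-byte blocks
- 16 inodes per inode block: 64-byte inodes, $1024/64 = 2^{10}/2^{6} = 2^{4} = 16$ (for comparison, the original ext2 on-disk inode is 128 bytes, which gives 8 inodes per 1024-byte block)
- 256 block pointers per indirection block: 4-byte block pointers, $1024/4 = 256$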
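Putting these numbers together gives the largest file the pointer scheme can reach with 1024-byte blocks: 12 direct blocks plus one single-, one double- and one triple-indirection tree of 256-way blocks. This counts only what the block pointers can address; any other limits of a real implementation are ignored here.

$$
S_{\max} = (12 + 256 + 256^{2} + 256^{3}) \times 1024\ \text{bytes} = 16{,}843{,}020 \times 1024\ \text{bytes} = 17{,}247{,}252{,}480\ \text{bytes} \approx 16\ \text{GiB}
$$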


Big point · Tuesday, November 20, 2012 5:15 PM

The pager is completely unaware of the nature of pages. But the pager is absolutely essential to the timely function of this algorithm, because all disk objects become, for a time, memory objects!


The super-block · Thursday, December 03, 2009 3:14 PM

The super-block contains:
- a "magic" code for the kind of filesystem
- the size of a block
- the number of free and total data blocks
- the number of free and total inodes
- the location of the inode for the root of the filesystem
- the locations of all "block groups"

It's on the disk. It's accessed all the time. It's always cached!


Block groups · Thursday, December 03, 2009 3:10 PM

In ext2, the disk is partitioned into block groups, each with:
- its own inode table
- its own data blocks
- a copy of the super-block
- bitmaps determining used inodes and used data blocks


Three things in concert · Thursday, December 02, 2010 4:50 PM

a) Reuse freed blocks as fast as possible.
   => The same block of the used bit array will be used for the previous free and the next allocation.
b) => The bit allocation array will stay in the page cache!
c)


A really slimy trick · Monday, November 23, 2015 5:09 PM

Keep track of the last block allocated. To allocate another, look forward in the bit array for a 0. On average, this is in the same cache page! ⟶ It's a memory operation!
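The trick is a next-fit scan over the used-block bit array. A minimal sketch, assuming 32-bit words and a hypothetical `struct allocator` that remembers the last allocation. It starts the scan at the remembered position, so a block freed right there is reused immediately, and wraps around once before giving up:

```c
#include <stdint.h>

/* Used-block bit array: bit i set <=> block i is in use (32-bit words). */
struct allocator {
    uint32_t *words;  /* the bit array itself */
    uint32_t nbits;   /* number of blocks it tracks */
    uint32_t last;    /* the last block allocated (remembered position) */
};

/* Next-fit allocation: scan forward from the last allocated block for a
   0 bit, wrapping around once. Returns the block number, or -1 if every
   block is in use. Consecutive calls usually touch the same word, and
   hence the same cached page of the bit array. */
int64_t alloc_block(struct allocator *a) {
    for (uint32_t k = 0; k < a->nbits; k++) {
        uint32_t i = (a->last + k) % a->nbits;
        uint32_t mask = UINT32_C(1) << (i % 32);
        if ((a->words[i / 32] & mask) == 0) {
            a->words[i / 32] |= mask;   /* mark block i as used */
            a->last = i;
            return (int64_t)i;
        }
    }
    return -1;
}
```

With 1024-byte bitmap blocks, one bitmap block covers 8 × 1024 = 8192 data blocks, so a long run of allocations keeps hitting the same cached bitmap page.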


Array access · Thursday, December 03, 2009 3:18 PM

Reason that the filesystem is organized this way: quick array indirection.

Determining whether block i is used: check bit i%32 of bit-array element i/32, assuming 32-bit words in the bit array.

Formula: (word[i/32] & (1u << (i%32))) != 0

Inode checking works likewise.
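Here is the test formula as a small C helper, together with its set and clear counterparts. The unsigned constant (`UINT32_C(1)`) matters: shifting a signed `1` left by 31 is undefined behavior in C. The function names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit i of the array lives in word i/32, at position i%32 within it. */
bool bit_is_set(const uint32_t *word, uint32_t i) {
    return (word[i / 32] & (UINT32_C(1) << (i % 32))) != 0;
}

/* Mark item i (block or inode) as used. */
void bit_set(uint32_t *word, uint32_t i) {
    word[i / 32] |= UINT32_C(1) << (i % 32);
}

/* Mark item i as free. */
void bit_clear(uint32_t *word, uint32_t i) {
    word[i / 32] &= ~(UINT32_C(1) << (i % 32));
}
```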


Array access continued · Thursday, December 03, 2009 3:21 PM

How to find block b of a file: suppose the inode's block pointers start at p.
- If b < 12, the block number is p[b].
- If b < 268 (= 12 + 256), the block number is p[12][b-12] (treating the block as an array).

Similar pattern for multiple indirection. For two levels (b < 268 + 256*256): p[13][(b-268)/256][(b-268)%256].

Same issue for block groups. If we want block n on a disk and there are m blocks in a group, n/m is the group number (starting at 0) and n%m is the block within the group.
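These lookup rules and the block-group arithmetic can be sketched in C. The sketch assumes the early-ext2 numbers from these notes (12 direct pointers, 256 pointers per indirection block). `block_path` only computes the chain of array indices and does not read any disk blocks. All names are hypothetical:

```c
#include <stdint.h>

#define NDIRECT 12u
#define PTRS    256u   /* pointers per 1024-byte indirection block */

/* Which block group holds disk block n, with m blocks per group. */
uint32_t group_of(uint32_t n, uint32_t m)       { return n / m; }
uint32_t index_in_group(uint32_t n, uint32_t m) { return n % m; }

/* Express file block b as a chain of array indices starting at the
   inode's block-pointer array p:
     b < 12                 -> p[b]
     b < 12+256             -> p[12][b-12]
     b < 12+256+256^2       -> p[13][(b-268)/256][(b-268)%256]
     b < 12+256+256^2+256^3 -> p[14][..][..][..]
   Fills idx[] and returns how many indices were used (0 if b is too big). */
int block_path(uint64_t b, uint32_t idx[4]) {
    if (b < NDIRECT) { idx[0] = (uint32_t)b; return 1; }
    b -= NDIRECT;
    if (b < PTRS) { idx[0] = 12; idx[1] = (uint32_t)b; return 2; }
    b -= PTRS;
    if (b < (uint64_t)PTRS * PTRS) {
        idx[0] = 13;
        idx[1] = (uint32_t)(b / PTRS);
        idx[2] = (uint32_t)(b % PTRS);
        return 3;
    }
    b -= (uint64_t)PTRS * PTRS;
    if (b < (uint64_t)PTRS * PTRS * PTRS) {
        idx[0] = 14;
        idx[1] = (uint32_t)(b / (PTRS * PTRS));
        idx[2] = (uint32_t)((b / PTRS) % PTRS);
        idx[3] = (uint32_t)(b % PTRS);
        return 4;
    }
    return 0;
}
```

For example, file block 568 (= 268 + 300) maps to `p[13][1][44]`, and disk block 20000 with 8192 blocks per group lies in group 2 at index 3616.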


Crazy like a fox! · Thursday, December 03, 2009 3:29 PM

Q: Why do I have no worries about using a bit array for determining used blocks and inodes?

A: Because block paging is still in effect!

Disk version of the principle of locality: if the data we need is localized in a small number of blocks, then on average, that data will remain paged in, and we'll be interacting with memory rather than disk!

Examples:
- The superblock is always paged in, always written to.
- If I sequentially access a file, its block descriptors are read in one at a time with a minimum number of reads.


Basic principles of filesystem design · Thursday, December 03, 2009 3:27 PM

Why were these choices made?
- Use bit arrays for determining used blocks and inodes because, on average, these bit arrays will remain in memory!
- Use indirection for finding blocks in large files because, on average, files are small!


The cosmic fact · Monday, November 18, 2013 7:00 PM

This all works because two subsystems obey contractual obligations to one another with no detailed knowledge of the function of the other subsystem. There is no explicit interface that does this; the contractual obligations suffice!
