Basic filesystem concepts · 2019-12-09 · Our naïve models of storage have been based upon...

Post on 06-Jul-2020


Basic filesystem concepts (Tuesday, November 22, 2011 2:04 PM)

Review of basic filesystem concepts

A filesystem is a structure on disk that
- allows one to store and retrieve files.
- keeps track of the names of the files.
- keeps track of file attributes such as protection and owner.

A Linux filesystem consists of three kinds of entities:
- (data) blocks: contain the actual file contents.
- inodes: contain information on where the file is.
- super-blocks: describe where inodes and blocks are located on disk.

and, additionally:
- directory: a special kind of file that associates file names with file numbers.

This time

The exact same lecture as last time ... at a different level of abstraction.
- Last time: the logical organization of a filesystem.
- This time: the physical organization.

Some themes:
- Knowledge boundaries between distinct subsystems.
- Contractual obligations about performance that do not require communication between subsystems. E.g., the paging system guarantees that repeated accesses to the same file block take little time, because that block's contents are cached.

What's important to filesystems

What's important:
- Filesystems are based upon the existence of the paging subsystem.
- Heavily use the principle of locality: if we're messing around with one file or region of disk, we're likely to continue.
- Utilize very efficient binary flag structures.

This is all about representing sets and sequences.
- A set of items has no order. A sequence of items has an order.
- "A file is a sequence of blocks."
- (An array is any function f that locates an object f(n) in memory (n = integer) in O(1).)

What is an inode?

- "Identity node".
- Contains all attributes of a file or directory: owner, group, protection, where its blocks are located.
- Does not contain the name.
- Found by number, not name.

What is a block?

Just a big bag of bits. Typically, 8192 bytes, but this can vary.

Inode/block striping

Typically, inodes and blocks are striped into alternating stripes on the disk.

[Figure: inode/block striping]

What is a file?

A pair: <inode, sequence of blocks>.
- The inode is identified by a number. That number is an offset into a descriptor table; the descriptor tells where the blocks are and how to find them.
- Better to say "array" of inodes: we require O(1) random access by inode number.
- Better to say "array" of blocks: we require O(1) random access by block number.

What is a directory?

- A special kind of file.
- Contains a mapping from names to inodes.
- Covers one part of a file path.
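A minimal sketch of how a path could be resolved when each directory covers one part of the path. It assumes a hypothetical in-memory table `directories` keyed by inode number; the names, the inode numbers, and the choice of 2 for the root are invented for illustration and are not taken from the lecture.

```python
# Hypothetical model: inode number -> directory contents (name -> inode number).
ROOT = 2  # assumed root inode number for this sketch

directories = {
    2: {".": 2, "..": 2, "home": 11},
    11: {".": 11, "..": 2, "alice": 12},
    12: {".": 12, "..": 11, "notes.txt": 13},
}

def resolve(path):
    """Return the inode number named by an absolute path."""
    inode = ROOT
    for component in path.split("/"):
        if component == "":
            continue  # skip empty pieces from leading or doubled slashes
        # Each directory translates exactly one path component.
        inode = directories[inode][component]
    return inode
```

Here `resolve("/home/alice/notes.txt")` performs three lookups, one per directory on the path.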

What is a super-block?

A descriptor with pointers to
- all groups of inodes
- all block groups
- what directory is root ("/")

Duplicated all over the disk. If it is lost, disk data becomes meaningless!

Logical structure of a Linux filesystem

[Figure: logical structure of a Linux filesystem]

Some really difficult things to understand

- The identity of a file is a number.
- Its name is constructed as a sequence of directories.
- One file can have multiple names, but only one number.

Constructing a name from a number

Constructing a filename from a number:
1. Start at the file. Find the directory it's in.
2. Go up to its parent using '..'.
3. Find the entry of the parent directory that points to this one. That name is a component of the filename.
4. Repeat until the number for . == the number for .. (this means you are at the root).
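The walk from a number back to a name can be sketched as follows. This is an illustration under assumptions, not how any particular kernel does it: `directories` is a hypothetical in-memory table keyed by inode number, and the names and numbers in it are invented.

```python
# Hypothetical model: inode number -> directory contents (name -> inode number).
directories = {
    2: {".": 2, "..": 2, "home": 11},
    11: {".": 11, "..": 2, "alice": 12},
    12: {".": 12, "..": 11},
}

def path_of_directory(inode):
    """Build the absolute path of a directory from its inode number."""
    components = []
    while True:
        parent = directories[inode][".."]
        if parent == inode:
            break  # "." and ".." agree, so this is the root
        # Scan the parent for the entry that points back at this directory.
        for name, number in directories[parent].items():
            if number == inode and name not in (".", ".."):
                components.append(name)
                break
        inode = parent
    return "/" + "/".join(reversed(components))
```

Each step up costs a scan of the parent directory, because a directory maps names to numbers and not the other way around.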

Example: constructing a name from a number

[Figure: example of constructing a name from a number]

Some very counter-intuitive facts

- One of the most expensive system calls is getcwd(), which returns the current working directory for a process. Why? The OS stores that directory in the PCB as a number, so the name has to be reconstructed.
- If a filesystem has a loop, then doing a getcwd inside the loop will crash the OS. Why? Preventing an infinite loop would make getcwd even slower than it already is!
- For this reason, hard links to directories may only be created by root (and Linux does not allow them at all).

So far

So far, our naïve models of storage have been based upon "free" and "used" lists. Logically, they're sets, in the sense that they're unordered. In a realistic filesystem, there might be "free" or "used" arrays. The reason that this is reasonable is the principle of locality: on average, processes reference stuff in a local region. So, the used array can be cached, and tends to stay in cache. A realistic filesystem takes advantage of both locality and the fact that there is a block cache!

Logical superstructure

[Figure: logical superstructure]

From logical to physical

So far, we are using "sequence" in a loose way. A file is a "sequence of blocks". The physical way that sequence is defined depends upon the recording medium and a number of design tradeoffs.

Between "rock" and "hard place":
- Rock: need to be space efficient.
- Hard place: need to be able to get to any file block in O(1).

Magnetic disks

- Unlimited writes: no degradation. Mean time between failures (MTBF) does not depend upon the number of writes.
- Can write or erase a block at a time.

What is a sequence?

Design question 1: what is a sequence of blocks?

Design question 2: how to keep track of unused blocks?

Case study: ext2

- Primitive Linux filesystem.
- No fancy features (until ext3!).
- Desires: robustness, simplicity.
- Medium properties: unlimited writes. A write does not degrade the medium (it's degraded by time spent running).

Ext2 design decisions

- Free and used blocks and inodes tracked via arrays of T/F bits.
- Sequences of blocks represented via array indirection that uses whole blocks as arrays of pointers to blocks.

Ext2 inode

Ext2 inode contains:
- owner, group, protections
- access count of references from directories
- 12 block pointers for first twelve blocks of file.
- 1 pointer to a block used for single-level indirection
- 1 pointer for double-level indirection
- 1 pointer for triple-level indirection

These are not memory pointers like you're used to. They are offsets into the block descriptor arrays. Each one is an integer defining a block on disk.

[Figure: picture of an ext2 inode]

Why indirection?

- Statistically, most files are small! The inode contains enough information for files up to 1024*12 bytes.
- After this, we allocate a block that can represent indirection to 256*1024 more bytes. ...

Reason it's O(1): you remember the boundaries.
- first 1024*12 bytes: in inode
- next 256*1024 bytes: in single indirection
- next 256*256*1024 bytes: in double indirection.

Algorithm for indirection:
- determine which indirection kind,
- subtract off the size of the previous regions,
- divide by block size.

Suppose you want byte number 50,000.
- It is in the first indirection block, because 12*1024 <= 50,000 < 12*1024 + 256*1024.
- Offset from the base of the indirection region is 50,000 - 12*1024 = 37,712.
- Offset of block = (50,000 - 12*1024)/1024 = 36.
- Offset of byte in block = (50,000 - 12*1024)%1024 = 848.
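The byte-50,000 calculation can be sketched in a few lines, using the lecture's figures of 1024-byte blocks, 12 direct pointers, and 256 pointers per indirection block. The function name `locate` and the string labels for the kinds of indirection are made up for this illustration.

```python
BLOCK_SIZE = 1024
DIRECT_BLOCKS = 12
POINTERS_PER_BLOCK = 256

def locate(byte_offset):
    """Map a byte offset in a file to (kind, block index within that kind, byte within block)."""
    # The boundaries are fixed, so finding the kind is a constant number of comparisons.
    direct_limit = DIRECT_BLOCKS * BLOCK_SIZE
    single_limit = direct_limit + POINTERS_PER_BLOCK * BLOCK_SIZE
    double_limit = single_limit + POINTERS_PER_BLOCK ** 2 * BLOCK_SIZE
    if byte_offset < direct_limit:
        kind, base = "direct", 0
    elif byte_offset < single_limit:
        kind, base = "single", direct_limit
    elif byte_offset < double_limit:
        kind, base = "double", single_limit
    else:
        kind, base = "triple", double_limit
    relative = byte_offset - base  # subtract off the size of the previous regions
    return kind, relative // BLOCK_SIZE, relative % BLOCK_SIZE

print(locate(50_000))  # ('single', 36, 848)
```

No loop depends on the size of the file, which is the sense in which the lookup is O(1).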

Ext2 block handling

Typical ext2 block sizes (early version):
- 1024-byte blocks
- 64-byte inodes, 1024/64 = 2^10/2^6 = 2^4 = 16 inodes per inode block
- 4-byte block pointers, 1024/4 = 256 block pointers per indirection block
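As a quick check of this arithmetic, the sketch below recomputes the per-block counts and the number of bytes reachable through each level of indirection. It uses only the sizes quoted above; the resulting total is what this simplified model can address, not a statement about the limits of a real ext2 implementation.

```python
BLOCK_SIZE = 1024
INODE_SIZE = 64
POINTER_SIZE = 4

inodes_per_block = BLOCK_SIZE // INODE_SIZE      # 16
pointers_per_block = BLOCK_SIZE // POINTER_SIZE  # 256

# Bytes reachable through each kind of pointer in the inode.
direct_bytes = 12 * BLOCK_SIZE
single_bytes = pointers_per_block * BLOCK_SIZE
double_bytes = pointers_per_block ** 2 * BLOCK_SIZE
triple_bytes = pointers_per_block ** 3 * BLOCK_SIZE
max_file_bytes = direct_bytes + single_bytes + double_bytes + triple_bytes
```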

Big point:

Pager is completely unaware of the nature of pages. But the pager is absolutely essential to the timely function of this algorithm, because all disk objects become, for a time, memory objects!

The super-block

The super-block contains
- "magic" code for kind of filesystem
- size of a block
- number of free and total data blocks
- number of free and total inodes
- location of inode for root of filesystem
- locations of all "block groups"

It's on the disk. It's accessed all the time. It's always cached!

Block groups

In ext2, the disk is partitioned into block groups, each with
- its own inode table
- its own data blocks
- a copy of the super-block
- bitmaps determining used inodes and used data blocks.

Three things in concert

a) Reuse freed blocks as fast as possible.
b) => The same block of the used bitarray will be used for the previous free and the next allocation.
c) => The bit allocation array will stay in the page cache!

A really slimy trick

Keep track of the last block allocated. To allocate another, look forward in the bit array for a 0. On average, this is in the same cache page! So it's a memory operation!
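A minimal sketch of the "look forward from the last allocation" idea. The class name `BlockBitmap`, the use of a Python list of booleans in place of packed bits, and the wrap-around at the end of the array are all assumptions made for this illustration.

```python
class BlockBitmap:
    """Tracks used blocks and remembers where the last allocation happened."""

    def __init__(self, nblocks):
        self.used = [False] * nblocks
        self.last = 0  # index of the most recently allocated block

    def allocate(self):
        n = len(self.used)
        # Scan forward from the last allocation; wrapping around is an assumption.
        for step in range(n):
            i = (self.last + step) % n
            if not self.used[i]:
                self.used[i] = True
                self.last = i
                return i
        raise OSError("no free blocks")

    def free(self, i):
        self.used[i] = False
```

Because the scan starts next to the previous allocation, the entries it touches tend to sit in the same cached page of the bit array.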

Array access

Reason that filesystem is organized this way: quick array indirection.

Determining whether block i is used:
- Check bit i%32 of block bitarray element i/32, assuming 32-bit words in the bitarray.
- Formula: (word[i/32] & (1<<(i%32))) != 0
- Inode checking works likewise.
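The bit test can be written out as below, with the bit array held as a list of 32-bit words. The helper names `is_used`, `mark_used`, and `mark_free` are invented for this sketch.

```python
WORD_BITS = 32  # assumed word size of the bit array

def is_used(words, i):
    """True if bit i is set, i.e. block (or inode) i is in use."""
    return (words[i // WORD_BITS] & (1 << (i % WORD_BITS))) != 0

def mark_used(words, i):
    words[i // WORD_BITS] |= 1 << (i % WORD_BITS)

def mark_free(words, i):
    words[i // WORD_BITS] &= ~(1 << (i % WORD_BITS))
```

For example, block 33 lives in word 1 at bit position 1, so marking it used in an all-zero array of two words leaves `[0, 2]`.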

Array access continued

Same issue for block groups. If we want block n on a disk and there are m blocks in a group,
- n/m is the group number (starting at 0)
- and n%m is the block in the group.

How to find block b of a file: suppose inode block pointers start at p.
- if b<12, block number is p[b]
- else if b<268, block number is p[12][b-12] (treating block as array)
- Similar pattern for multiple-indirection. For two levels: p[13][(b-268)/256][(b-268)%256], since each indirection block holds 256 pointers.
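A sketch of the lookup for block b of a file, assuming 12 direct pointers and 256 pointers per indirection block, so that the second-level index at two levels is (b-268)/256. The dictionary `disk`, which stands in for reading an indirection block from disk, and all block numbers in the example are hypothetical.

```python
DIRECT = 12
PER_BLOCK = 256

def file_block(p, disk, b):
    """Return the disk block number that holds block b of a file.

    p    -- the inode's 15 block pointers (12 direct, then single, double, triple)
    disk -- hypothetical mapping: block number -> list of 256 block pointers
    """
    if b < DIRECT:
        return p[b]
    b -= DIRECT
    if b < PER_BLOCK:
        return disk[p[12]][b]
    b -= PER_BLOCK
    if b < PER_BLOCK ** 2:
        return disk[disk[p[13]][b // PER_BLOCK]][b % PER_BLOCK]
    b -= PER_BLOCK ** 2
    first, rest = divmod(b, PER_BLOCK ** 2)
    return disk[disk[disk[p[14]][first]][rest // PER_BLOCK]][rest % PER_BLOCK]

def group_of(n, m):
    """Split disk block n into (group number, block within group) for groups of m blocks."""
    return n // m, n % m

# Invented example: direct blocks 100..111, single-indirect block 500, double-indirect block 600.
p = list(range(100, 112)) + [500, 600, 0]
disk = {
    500: [1000 + k for k in range(256)],
    600: [700] + [0] * 255,
    700: [2000 + k for k in range(256)],
}
```

With this data, file block 12 is the first entry of the single-indirect block and file block 268 is the first entry reached through the double-indirect block.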

Crazy like a fox!

Q: Why do I have no worries about using a bit array for determining used blocks and inodes?

A: Because block paging is still in effect!

Disk version of principle of locality: if data we need is localized in a small number of blocks, then on average, that data will remain paged in, and we'll be interacting with memory rather than disk!

Examples:
- Superblock is always paged in, always written to.
- If I sequentially access a file, its block descriptors are read in one at a time with a minimum number of reads.

Basic principles of filesystem design

Why were these choices made?
- Use bit-arrays for determining used blocks and inodes because on average, these bit arrays will remain in memory!
- Use indirection for finding blocks in large files because on average, files are small!

The cosmic fact

This all works because two subsystems obey contractual obligations to one another with no detailed knowledge of the function of the other subsystem.

There is no explicit interface that does this; the contractual obligations suffice!