Putting some "logic" in LVM.

LVM It's only logical. Steven Lembark Workhorse Computing [email protected]

Transcript of Putting some "logic" in LVM.

Page 1: Putting some "logic" in LVM.

LVM: It's only logical.

Steven Lembark, Workhorse Computing [email protected]

Page 4: Putting some "logic" in LVM.

What's so logical about LVM?

Simple: It isn't physical.

Think of it as "virtual memory" on a disk.

Page 5: Putting some "logic" in LVM.

Simple???

PV

VG

LV

thin

provisioned

mirrored

snapshots

Page 6: Putting some "logic" in LVM.

Begin at the beginning...

Disk drives were 5, maybe 10MB.

Booting from tape takes too long.

Can't afford a whole disk for swap.

What to do?

Tar overlays?

RA-70 packs?

Page 7: Putting some "logic" in LVM.

Partitions save the day!

Divide the drive for swap, boot, O/S.

Allows separate mount points.

New partitions == New mount points.

Page 8: Putting some "logic" in LVM.

What you used to do

Partition the drive.

Remembering to keep a separate partition for /boot.

Using parted once the original layout was outgrown.

Figuring out how to split space with new disks...

Page 9: Putting some "logic" in LVM.

Size matters.

Say you have something big: 20MB of data.

Tape overlays take too long.

RA-70's require remounts.

How can we manage it?

Page 10: Putting some "logic" in LVM.

Making it bigger

We need an abstraction:

Vnodes.

Instead of "hardware".

Page 11: Putting some "logic" in LVM.

Veritas & HP

Developed different ways to do this.

Physical drives.

Grouped into "blocks".

Allocated into "volumes".

Fortunately Linux uses HP's scheme.

Page 12: Putting some "logic" in LVM.

First few steps

pvcreate: initialize physical storage (whole disk or partition).

vgcreate: combine multiple drives into a pool of blocks.

lvcreate: re-partition blocks into mountable units.

Page 13: Putting some "logic" in LVM.

Example: single-disk desktop

grub2 speaks LVM: goodbye, boot partitions!

Two partitions: primary swap + everything else.

Call them /dev/sda{1,2}.

swap == 2 * RAM

rest == lvm
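The "2 * RAM" rule of thumb can be worked out from /proc/meminfo. This is only a sketch: the name `swap_mib` is made up for illustration, and it assumes the usual `MemTotal: <n> kB` line.

```shell
#!/bin/sh
# Sketch: suggest a swap size of 2 * RAM, in MiB.
# swap_mib is a hypothetical helper, not part of any LVM or fdisk tool.
# Optional argument: a meminfo-format file (defaults to /proc/meminfo).
swap_mib() {
    # MemTotal is reported in kB; double it and convert to MiB.
    awk '/^MemTotal:/ { print int($2 * 2 / 1024) }' "${1:-/proc/meminfo}"
}
```

Calling `swap_mib` with no argument reads the running system; the result is the size to give the swap partition in fdisk.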

Page 14: Putting some "logic" in LVM.

Example: single-disk desktop

grub2 speaks LVM: goodbye, boot partitions!

Two partitions: primary swap + everything else.

Call them /dev/sda{1,2}.

swap is for hibernate and recovery.

otherwise use LVM.

Page 20: Putting some "logic" in LVM.

Example: single-disk desktop

# fdisk /dev/sda; # sda1 => 82, sda2 => 8e

# pvcreate /dev/sda2;

# vgcreate vg00 /dev/sda2;

# lvcreate -L 8Gi -n root vg00;

# mkfs.xfs -blog=12 -L root /dev/vg00/root;

# mount /dev/vg00/root /mnt/gentoo;

Page 22: Putting some "logic" in LVM.

Finding yourself

Ever get sick of UUID?

Labels?

Device paths?

LVM assigns UUIDs to PV, VG, LV.

Page 23: Putting some "logic" in LVM.

Finding yourself

Ever get sick of UUID?

Labels?

Device paths?

Let LVM do the walking: vgscan -v

Page 24: Putting some "logic" in LVM.

Give Linux the boot

mount -t proc none /proc;

mount -t sysfs none /sys;

/sbin/mdadm --verbose --assemble --scan;

/sbin/vgscan --verbose;

/sbin/vgchange -a y;

/sbin/mount /dev/vg00/root /mnt/root;

exec /sbin/switch_root /mnt/root /sbin/init;

Page 25: Putting some "logic" in LVM.

Then root fills up...

Say goodbye to parted.

lvextend -L12Gi /dev/vg00/root;

xfs_growfs /dev/vg00/root;

Notice the lack of any umount.

Page 26: Putting some "logic" in LVM.

Add a new disk

/sbin/fdisk /dev/sdb; # sdb1 => 8e

pvcreate /dev/sdb1;

vgextend vg00 /dev/sdb1;

lvextend -L24Gi /dev/vg00/root;

xfs_growfs /dev/vg00/root;
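The same steps can be wrapped in a script that prints what it would do before doing it. A minimal sketch, assuming the new disk is already partitioned; `run`, `grow_lv` and `DRY_RUN` are illustrative names, not LVM features.

```shell
#!/bin/sh
# Sketch: add a partition to a VG, then grow an LV and its XFS filesystem.
# run, grow_lv and DRY_RUN are hypothetical names used for illustration.
# By default the commands are only printed; set DRY_RUN=0 to execute them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: grow_lv <partition> <vg> <lv> <new-size>
grow_lv() {
    part=$1 vg=$2 lv=$3 size=$4
    # Label the partition as a PV, add it to the VG, grow the LV,
    # then grow the mounted XFS filesystem to match. Stop at the
    # first step that fails.
    run pvcreate "$part" &&
    run vgextend "$vg" "$part" &&
    run lvextend -L "$size" "/dev/$vg/$lv" &&
    run xfs_growfs "/dev/$vg/$lv"
}

grow_lv /dev/sdb1 vg00 root 24G
```

Read the printed plan first, then rerun as root with `DRY_RUN=0`.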

Page 29: Putting some "logic" in LVM.

And another disk, and another...

Let's say you've scrounged ten disks.

One large VG.

Then one disk fails.

And the entire VG with it.

Page 30: Putting some "logic" in LVM.

Adding volume groups

Lesson: Over-large VG's become fragile.

Fix: Multiple VG's partition the vulnerability.

One disk won't bring down everything.

Page 31: Putting some "logic" in LVM.

Growing a new VG

Plug in and fdisk your new device.

# pvcreate /dev/sdc1;

# vgcreate vg01 /dev/sdc1;

# lvcreate -L <size> -n home vg01;

# mkfs.xfs -blog=12 -L home /dev/vg01/home;

Copy files, add /dev/vg01/home to /etc/fstab.

Page 32: Putting some "logic" in LVM.

Your backups just got easier

Separate mount points for "scratch".

"find . -xdev"

Mount /var/spool, /var/tmp.

Back up persistent portion of /var with rest of root.

Page 33: Putting some "logic" in LVM.

More smaller volumes

More /etc/fstab entries.

Isolate disk failures to non-essential data.

Back up by mount point.

Use different filesystems (xfs vs. ext4).

Page 34: Putting some "logic" in LVM.

RAID + LVM

- LV with copies using LVM.

- Or make PV's out of mdadm volumes.

LV's simplify handling huge RAID volumes.

Page 35: Putting some "logic" in LVM.

LVM RAID

# lvcreate -m <#copies> …

Automatically duplicates LV data.

"-m 2" == three-volume RAID (1 data + 2 copy).

Blocks don't need to be contiguous.

Can lvextend later on.

Page 36: Putting some "logic" in LVM.

LVM on RAID

Division of labor:

- mdadm for RAID.

- LVM for mount points.

Use mdadm to create space.

Use LVM to manage it.

Page 37: Putting some "logic" in LVM.

"Stride" for RAID > 1

LV blocks == RAID page size.

Good for RAID 5, 6, 10.

Meaningless for mirror (LVM or hardware).

Page 38: Putting some "logic" in LVM.

Monitored LVM

# lvcreate --monitor y ...

Use dmeventd for monitoring.

Know about I/O errors.

What else would you want to do at 3am?

Page 40: Putting some "logic" in LVM.

Real SysAdmins don't need sleep!

Archiving many GB takes time.

Need stable filesystems.

Easy: Kick everyone off, run the backups at 0300.

If you enjoy that sort of thing.

Page 41: Putting some "logic" in LVM.

Snapshots: mounted backups

Not a hot copy.

Snapshot pool == stable version of COW blocks.

Stable version of LV being backed up.

Size == max pages that might change during lifetime.

Page 42: Putting some "logic" in LVM.

Lifecycle of a snapshot

Find mountpoint.

Snapshot mount point.

Work with static contents.

Drop snapshot.
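The lifecycle above might be scripted roughly like this. It is a sketch under stated assumptions: XFS on the origin LV, and `snapshot_backup`, `run`, `DRY_RUN`, the `-snap` suffix and the example paths are all invented for illustration.

```shell
#!/bin/sh
# Sketch of the snapshot lifecycle: create, mount read-only, archive, drop.
# snapshot_backup, run and DRY_RUN are hypothetical names; the arguments
# at the bottom are assumed paths. Commands are printed unless DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# usage: snapshot_backup <vg> <lv> <snapshot-size> <mount-dir> <archive>
snapshot_backup() {
    vg=$1 lv=$2 size=$3 mnt=$4 archive=$5
    snap="$lv-snap"

    # 1. Snapshot the LV; writes to the origin continue.
    run lvcreate -s -L "$size" -n "$snap" "/dev/$vg/$lv" || return 1

    # 2. Mount the frozen view read-only (XFS assumed) and archive it.
    status=0
    if run mount -t xfs -o ro,norecovery,nouuid "/dev/$vg/$snap" "$mnt"; then
        run tar -C "$mnt" -cf "$archive" . || status=1
        run umount "$mnt"
    else
        status=1
    fi

    # 3. Always drop the snapshot so it cannot fill up later.
    run lvremove -f "/dev/$vg/$snap"
    return "$status"
}

snapshot_backup vg01 home 1G /mnt/backup /var/tmp/home.tar
```

Dropping the snapshot even when the archive step fails is deliberate: a forgotten snapshot keeps collecting changed blocks until it overflows.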

Page 43: Putting some "logic" in LVM.

Most common: backup

Live database

Spool directory

Disk cache

Home dirs

Page 44: Putting some "logic" in LVM.

Backup a database

Data and config under /var/postgres.

Data on single LV /dev/vg01/postgres.

At 0300 it writes up to 1GB per hour.

Daily backup takes about 30 minutes.

VG keeps 8GB free for snapshots.

Page 45: Putting some "logic" in LVM.

Backup a database

# lvcreate -L1G -s -n pg-tmp /dev/vg01/postgres;

1G == twice the usual amount.

Updates to /dev/vg01/postgres continue.

Original pages stored in /dev/vg01/pg-tmp.

I/O error in pg-tmp if > 1GB written.
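The 1G figure follows from the numbers on the previous slide: up to 1GB per hour for a 30-minute backup is about half a gigabyte, doubled for safety. A small sketch of that arithmetic; `snapshot_mb` is a hypothetical helper and the default factor of 2 is just the rule of thumb above.

```shell
#!/bin/sh
# Sketch: size a snapshot from the write rate and the backup duration.
# snapshot_mb is a hypothetical helper; the default safety factor of 2
# mirrors the "twice the usual amount" rule of thumb.
# usage: snapshot_mb <MB written per hour> <backup minutes> [safety factor]
snapshot_mb() {
    rate=$1 minutes=$2 factor=${3:-2}
    # Integer arithmetic, rounded up so a partial megabyte still gets space.
    echo $(( (rate * minutes * factor + 59) / 60 ))
}

snapshot_mb 1024 30    # prints 1024: 1GB/hour for 30 minutes, doubled
```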

Page 46: Putting some "logic" in LVM.

Backup a database

# lvcreate -L1G -s -n pg-tmp /dev/vg01/postgres;

# mount --type xfs -o'ro,norecovery,nouuid' /dev/vg01/pg-tmp /mnt/backup;

# find /mnt/backup -xdev …

/mnt/backup is stable for duration of backup.

/var/postgres keeps writing.

Page 47: Putting some "logic" in LVM.

Backup a database

One downside: Duplicate running database.

Takes extra steps to restart.

Not difficult.

Be Prepared.

Page 48: Putting some "logic" in LVM.

Giving away what you ain't got

"Thin volumes".

Like sparse files.

Pre-allocate pool of blocks.

LV's grow as needed.

Allows overcommit.

Page 49: Putting some "logic" in LVM.

"Thin"?

"Thick" == allocate LV blocks at creation time.

"--thin" assigns virtual size.

Physical size varies with use.

Page 50: Putting some "logic" in LVM.

Why bother?

Filesystems that change over time.

No need to pre-allocate all of the space.

Add physical storage as needed.

ETL intake.

Archival.

User scratch.

Page 51: Putting some "logic" in LVM.

Example: Scratch space for users.

Say you have ~30GB of free space.

And three users.

Each "needs" 40GB of space.

No problem.

Page 52: Putting some "logic" in LVM.

The pool is an LV.

"--thinpool" labels the LV as a pool.

Allocate 30GB into /dev/vg00/scratch.

# lvcreate -L 30Gi --thinpool scratch vg00;

Page 53: Putting some "logic" in LVM.

Virtually allocate LV

"lvcreate -V" allocates space out of the pool.

"-V" == "virtual"

# lvcreate -V 40Gi --thin -n thin_one /dev/vg00/scratch;

# lvcreate -V 40Gi ...

Page 54: Putting some "logic" in LVM.

Virtually yours

Allocated 120GiB using 30GB of disk.

lvdisplay shows 0 used for each volume??

Page 55: Putting some "logic" in LVM.

Virtually yours

Allocated 120GiB using 30GB of disk.

lvdisplay shows 0 used for each volume??

Right: None used. Yet.

40GiB is a limit.
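How far the pool is overcommitted is simple arithmetic on the sizes from the example. A sketch, with `overcommit_percent` as a made-up helper and all sizes in whole gigabytes:

```shell
#!/bin/sh
# Sketch: how far a thin pool is overcommitted.
# overcommit_percent is a hypothetical helper; sizes are whole gigabytes.
# usage: overcommit_percent <pool size> <virtual size>...
overcommit_percent() {
    pool=$1
    shift
    total=0
    for v in "$@"; do
        total=$(( total + v ))
    done
    # 100 means fully committed; anything above that is overcommit.
    echo $(( total * 100 / pool ))
}

overcommit_percent 30 40 40 40    # prints 400: three 40G volumes, 30G pool
```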

Page 56: Putting some "logic" in LVM.

Pure magic!

Make a filesystem.

Mount the lvols.

df shows them as 40GiB.

Everyone is happy!

Page 57: Putting some "logic" in LVM.

Pure magic!

Make a filesystem.

Mount the lvols.

df shows them as 40GiB.

Everyone is happy...

Until 30GB is used up.

Page 58: Putting some "logic" in LVM.

Size does matter.

No blocks left to allocate.

Now what?

Writing procs are "killable blocked".

Hold queue until "kill -KILL" or space available.
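Better to notice before the writers block. A sketch of a threshold check: `pool_nearly_full` is a hypothetical helper, and feeding it from the `data_percent` field of `lvs` is an assumption about how you would wire it up.

```shell
#!/bin/sh
# Sketch: warn before a thin pool runs out of blocks.
# pool_nearly_full is a hypothetical helper. It takes the pool's data usage
# as a percentage (such as the data_percent field reported by lvs) and a
# threshold, and succeeds when usage has reached the threshold.
pool_nearly_full() {
    used=$(printf '%s' "$1" | tr -d ' ')   # strip padding around the number
    limit=$2
    [ "${used%%.*}" -ge "$limit" ]         # compare the integer part only
}

# Assumed usage on a host with LVM installed (not run here):
#   used=$(lvs --noheadings -o data_percent vg00/scratch)
#   pool_nearly_full "$used" 80 && echo "vg00/scratch needs more space"
if pool_nearly_full " 93.27" 80; then
    echo "pool is at or above 80%"
fi
```

Run from cron, a check like this buys time to scrounge a disk before anything hangs.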

Page 59: Putting some "logic" in LVM.

One fix: scrounge a disk

vgextend vg00 <new-pv>;

lvextend -L<whatever> /dev/vg00/scratch;

Bingo: free blocks.

Page 60: Putting some "logic" in LVM.

Reduce, reuse, recycle

fstrim(8) removes unused blocks from a filesystem.

Reduces virtual allocations.

Allows virtual volumes to re-grow:

fstrim --all --verbose;

cron is your friend.

Page 61: Putting some "logic" in LVM.

Highly dynamic environment

Weekly doesn't cut it:

download directory.

uncompress space

compile large projects.

Or a notebook with small SSD.

Page 62: Putting some "logic" in LVM.

Automatic real-time trimming

3.0+ kernel w/ "FITRIM".

Device needs "TRIM".

http://xfs.org/index.php/FITRIM/discard

Page 63: Putting some "logic" in LVM.

Sanity check: discard available?

$ cat /sys/block/sda/queue/discard_max_bytes;

2147450880

$ cat /sys/block/dm-8/queue/discard_max_bytes;

2147450880

So far so good...
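The same check can be looped over every block device. A minimal sketch; `discard_supported` is a hypothetical helper, and it treats a zero or missing `discard_max_bytes` as "no discard".

```shell
#!/bin/sh
# Sketch: report whether a block device advertises discard support.
# discard_supported is a hypothetical helper. A discard_max_bytes of 0
# (or an unreadable file) is treated as "no discard".
# usage: discard_supported <path to a discard_max_bytes file>
discard_supported() {
    [ -r "$1" ] || return 1
    max=$(cat "$1")
    [ "$max" -gt 0 ]
}

# Check every device, including the dm-* devices behind LVM volumes.
for f in /sys/block/*/queue/discard_max_bytes; do
    if discard_supported "$f"; then
        echo "discard ok: $f"
    fi
done
```

Every layer has to pass: a thin volume only benefits if both its dm device and the disk underneath report a non-zero value.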

Page 64: Putting some "logic" in LVM.

Do the deed

/dev/vg00/scratch_one /scratch/jowbloe xfs defaults,discard 0 1

or

mount -o defaults,discard /foo /mnt/foo;

Page 65: Putting some "logic" in LVM.

What you see is all you get

Dynamic discard == overhead.

Often not worse than network mounts.

YMMV...

Avoids "just in case" provisioning.

Good with SSD, battery-backed RAID.

Page 67: Putting some "logic" in LVM.

LVM benefits

- Saner mount points.

- Hot backups.

- Dynamic space manglement.

Seems logical?