The problem of virtual disk fragmentation and ways to...

Fragmentation problem in vdisk environment

Dmitry Monakhov

2015-09-19

Dmitry Monakhov Fragmentation problem in vdisk environment 2015-09-19 1 / 29

Outline

1 Introduction

2 FS fragmentation

3 An Era of Thin-Provisioned Environments

4 Future work

Basic terminology

A filesystem divides its space into blocks (usually 4k)

Files consist of blocks

A file is fragmented if its blocks are not contiguous
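
The definition above can be illustrated with a minimal sketch. It assumes a file is described by the ordered list of physical block numbers it occupies; the function name and the sample block lists are made up for illustration.

```python
def count_fragments(blocks):
    """Return the number of contiguous runs in a file's ordered block list."""
    if not blocks:
        return 0
    fragments = 1
    for prev, cur in zip(blocks, blocks[1:]):
        if cur != prev + 1:  # a gap or a backward jump starts a new fragment
            fragments += 1
    return fragments


contiguous = [100, 101, 102, 103]
scattered = [100, 101, 500, 501, 90]
print(count_fragments(contiguous))  # 1: not fragmented
print(count_fragments(scattered))   # 3: fragmented
```

A file with more than one run is fragmented in the sense used on this slide.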

FS aging problem

Zillions of block-alloc/block-free iterations result in FS fragmentation

Most filesystems have effective and reliable techniques which prevent FS aging

The block allocator tries to spread data across the whole disk

The block allocator tries to pack small files together

The block allocator delays allocation until close(2)/fsync(2)

Online/offline defragmentation tools [still required]

When defragmentation is required

There are situations when block allocator tricks are not sufficient

Filesystem is almost full (>90%)

Weird falloc/unlink/fsync scenario

Special read pattern (boot speedup)

Fragmentation: More formal terminology

IntrA-file-Fragmentation (IAF): fragmentation of a single file.

IntEr-file-Fragmentation (IEF): fragmentation of a group of files.

Note: terminology from the DFS paper

Existing tools

EXT4

ioctl EXT4_IOC_MOVE_EXT (atomic)

Swap blocks between donor and target file

Util: e4defrag(8): defrag large files (*IAF*)

XFS

ioctl XFS_IOC_SWAPEXT (non-atomic)

Swap blocks between donor and target file

Util: xfs_fsr(8): defrag large files (*IAF*)
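
To make the donor/target idea concrete, here is a hedged sketch of how a request for the ext4 move-extent ioctl could be prepared from Python. The field layout follows struct move_extent in the kernel's fs/ext4/ext4.h. The request number is derived with the generic Linux _IOWR() encoding, which is an assumption that holds on x86 and most other architectures but not all. The helper names are illustrative, and the ioctl itself is not issued here.

```python
import struct

# struct move_extent (fs/ext4/ext4.h):
#   __u32 reserved; __u32 donor_fd;
#   __u64 orig_start; __u64 donor_start; __u64 len; __u64 moved_len;
MOVE_EXTENT_FMT = "=IIQQQQ"


def ioc_iowr(type_char, nr, size):
    """Generic _IOWR() encoding (assumption: asm-generic ioctl layout)."""
    ioc_read_write = 3
    return (ioc_read_write << 30) | (size << 16) | (ord(type_char) << 8) | nr


# Assumed definition: _IOWR('f', 15, struct move_extent)
EXT4_IOC_MOVE_EXT = ioc_iowr("f", 15, struct.calcsize(MOVE_EXTENT_FMT))


def build_move_extent(donor_fd, orig_start, donor_start, length):
    """Pack a request; offsets and length are in filesystem blocks."""
    return struct.pack(MOVE_EXTENT_FMT, 0, donor_fd,
                       orig_start, donor_start, length, 0)


request = build_move_extent(donor_fd=5, orig_start=0, donor_start=0, length=256)
print(hex(EXT4_IOC_MOVE_EXT), len(request))
# A real caller would pass bytearray(request) to
# fcntl.ioctl(orig_fd, EXT4_IOC_MOVE_EXT, buf) and then read moved_len
# back from the last 8 bytes of the buffer.
```

The donor file is typically a temporary file whose blocks were preallocated contiguously; the ioctl then exchanges them with the blocks of the original file.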

Basic disk layout

Virtual Disk: Things got complicated*

New indirection layer

The thin-provisioning driver adds a second space-management layer; it divides its space into allocation blocks, aka TPABs or buckets.

Bucket size != FS block size

A TPAB is larger than an FS block but smaller than an FS group

1M-4M: Ploop, LVM-linear, QCOW2, Ceph (RBD)

64k-256k: dm-thin, dm-snap
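
A small sketch of the extra indirection, assuming 4k FS blocks, a 1M bucket, and a filesystem that starts on a bucket boundary. The constants and function names are illustrative only. It shows that the same amount of data can occupy one bucket or hundreds, depending on where the FS places it.

```python
FS_BLOCK_SIZE = 4096       # bytes, typical FS block
TPAB_SIZE = 1024 * 1024    # bytes, assumed 1M bucket

BLOCKS_PER_TPAB = TPAB_SIZE // FS_BLOCK_SIZE  # 256 FS blocks per bucket


def tpab_index(fs_block):
    """Bucket that backs a given FS block (assumes bucket-aligned FS start)."""
    return fs_block // BLOCKS_PER_TPAB


def tpabs_touched(blocks):
    """Set of buckets the thin layer must allocate to back these FS blocks."""
    return {tpab_index(b) for b in blocks}


packed = range(0, 256)               # 1M of data laid out contiguously
spread = range(0, 256 * 256, 256)    # 1M of data, one block per bucket
print(len(tpabs_touched(packed)))    # 1 bucket allocated
print(len(tpabs_touched(spread)))    # 256 buckets allocated
```

In the second layout 1M of live data keeps 256M of backing storage allocated.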

Virtual disk mapping

Customer's feedback

I have a container with a mail server inside which uses 10 GB of data.

Your virtual disk uses 40 GB of my super-fast SSD

WHY?

Virtual disk fragmentation example

root@dmlp:~# e2freefrag -c 4096 /dev/dm-1

Device: /dev/dm-1

Blocksize: 4096 bytes

Total blocks: 34126848

Free blocks: 12293324 (36.0%)

Chunksize: 4194304 bytes (1024 blocks)

Total chunks: 33328

Free chunks: 8379 (25.1%)
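
The numbers in this output can be read as follows. The sketch below only redoes the arithmetic on the figures printed above; the interpretation that free blocks outside fully free chunks cannot be released assumes the thin layer uses 4M buckets aligned with the e2freefrag chunks.

```python
BLOCK_SIZE = 4096
TOTAL_BLOCKS = 34126848
FREE_BLOCKS = 12293324
BLOCKS_PER_CHUNK = 1024
TOTAL_CHUNKS = 33328
FREE_CHUNKS = 8379

free_block_share = FREE_BLOCKS / TOTAL_BLOCKS   # share of free 4k blocks
free_chunk_share = FREE_CHUNKS / TOTAL_CHUNKS   # share of fully free 4M chunks

# Free blocks that sit inside partially used chunks: the FS sees them as free,
# but the chunk that contains them still holds live data.
stranded_blocks = FREE_BLOCKS - FREE_CHUNKS * BLOCKS_PER_CHUNK
stranded_gib = stranded_blocks * BLOCK_SIZE / 2**30

print(f"free blocks: {free_block_share:.1%}")
print(f"free chunks: {free_chunk_share:.1%}")
print(f"stranded free space: {stranded_blocks} blocks, about {stranded_gib:.1f} GiB")
```

Roughly 14 GiB of free space is scattered across chunks that are still partly in use, which is the gap between 36.0% free blocks and 25.1% free chunks.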

ThinProvision fragmentation problem

Visible effect

Inefficient free-space usage (utilization as low as 0.4%, e.g. one live 4k block pinning a 1M TPAB)

Bad IO performance

Why?

TRIM/Discard is useless

Existing FS defragmentation tools/techniques are useless

Who is affected?

Worst use-case

Many small files

A lot of create(2)/unlink(2)

Unpredictable lifetime

Massive write(2); sync(2)/fsync(2)

Bad pattern examples

Mail server

News server

Photo server

Image bloating example

New TP defragmentation API wanted

New TP-aware block allocator for FS

New TP-aware defragment tool

TP-aware defragmentation tool principles

Take the TP layout into account

Relocate a group of files so that it fits into one TPAB

The only questions left

What to relocate?

Where to relocate?

TP-aware defragmentation overview

1 Sequential scan of the block bitmap tables. Collect used blocks (build spextent tree)

2 Scan the filesystem hierarchy and collect extent ownership statistics.

3 Rescan the filesystem tree and prepare a list of candidates for IEF defragmentation. Fix IntrA-file-Fragmentation (IAF) issues if discovered

4 Process the IEF list and perform the actual defragmentation

Pass1

Sequential scan of the block bitmap tables. Build free-space tree.
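
A minimal sketch of this pass, assuming the bitmap is available as a sequence of 0/1 flags (1 = block in use). The real tool builds a tree; a sorted list of (start, length) runs stands in for it here, and the function name is illustrative.

```python
def free_extents(bitmap):
    """Scan a block bitmap once and return free runs as (start, length) pairs."""
    extents = []
    start = None
    for block, used in enumerate(bitmap):
        if not used and start is None:
            start = block                            # a free run begins
        elif used and start is not None:
            extents.append((start, block - start))   # the free run ends
            start = None
    if start is not None:                            # run reaches end of bitmap
        extents.append((start, len(bitmap) - start))
    return extents


bitmap = [1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
print(free_extents(bitmap))  # [(2, 3), (6, 1), (9, 2)]
```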

Pass2

Scan the filesystem hierarchy and collect extent ownership statistics.
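
One possible shape for these statistics, as a sketch: for every TPAB, record how many blocks each inode owns inside it. The input format (inode number mapped to a list of (start_block, length) extents) and the 256-blocks-per-TPAB constant are assumptions for illustration, not the tool's actual data structures.

```python
from collections import defaultdict

BLOCKS_PER_TPAB = 256  # assumed: 1M bucket / 4k FS block


def ownership_stats(file_extents):
    """Map {inode: [(start, length), ...]} to {tpab: {inode: used_blocks}}."""
    stats = defaultdict(lambda: defaultdict(int))
    for inode, extents in file_extents.items():
        for start, length in extents:
            block, end = start, start + length
            while block < end:
                tpab = block // BLOCKS_PER_TPAB
                # part of this extent that falls inside the current TPAB
                chunk_end = min(end, (tpab + 1) * BLOCKS_PER_TPAB)
                stats[tpab][inode] += chunk_end - block
                block = chunk_end
    return stats


files = {12: [(0, 300)], 13: [(300, 10)], 14: [(1024, 4)]}
stats = ownership_stats(files)
for tpab in sorted(stats):
    print(tpab, dict(stats[tpab]))
# 0 {12: 256}
# 1 {12: 44, 13: 10}
# 4 {14: 4}
```

TPAB 4 holds only 4 live blocks out of 256, so the file that owns them is an obvious relocation candidate.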

Pass3

Rescan the filesystem tree and prepare a list of candidates for IEF defragmentation.

Which candidates are good?

Files which belong to a partly populated cluster

Read-only files (old mtime or executable files)

Small files
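
The criteria above could be combined into a filter like the sketch below. All thresholds, the FileInfo fields, and the decision to require all three criteria at once are assumptions made for illustration; the slide does not say how the tool weighs them.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    inode: int
    blocks: int            # file size in FS blocks
    mtime_age_days: float  # time since last modification
    executable: bool
    tpab_fill: float       # fill ratio (0..1) of the TPAB the file lives in


def is_ief_candidate(f, max_blocks=64, min_age_days=30, max_fill=0.5):
    """Hypothetical filter: sparse cluster AND looks read-only AND small."""
    in_sparse_cluster = f.tpab_fill <= max_fill
    looks_read_only = f.executable or f.mtime_age_days >= min_age_days
    small = f.blocks <= max_blocks
    return in_sparse_cluster and looks_read_only and small


files = [
    FileInfo(1, 8, 200, False, 0.10),     # small, old, sparse TPAB
    FileInfo(2, 8, 0.5, False, 0.10),     # modified recently
    FileInfo(3, 4096, 400, False, 0.10),  # too large
    FileInfo(4, 8, 200, True, 0.95),      # its TPAB is already well populated
]
print([f.inode for f in files if is_ief_candidate(f)])  # [1]
```

Preferring files that rarely change reduces the chance that relocated data is rewritten or deleted soon after the move.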

Pass3 image

Pass4

Process the IEF list and perform the actual defragmentation
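
A sketch of the planning half of this pass, under stated assumptions: candidates are (inode, size in blocks) pairs, a list of completely free TPABs is known, and files are packed with a simple first-fit-decreasing heuristic so that no file straddles two TPABs. The heuristic and all names are illustrative and not taken from the tool; the actual block moves are out of scope here.

```python
BLOCKS_PER_TPAB = 256  # assumed: 1M bucket / 4k FS block


def plan_relocation(candidates, free_tpabs):
    """Return {inode: (tpab, offset_in_tpab)} packing files into whole TPABs."""
    plan = {}
    remaining = {}              # opened TPAB -> free blocks left
    unopened = list(free_tpabs)
    for inode, blocks in sorted(candidates, key=lambda c: -c[1]):
        if blocks > BLOCKS_PER_TPAB:
            continue            # bigger than one bucket: not an IEF candidate
        target = next((t for t, free in remaining.items() if free >= blocks), None)
        if target is None:
            if not unopened:
                continue        # nowhere to put it; leave the file in place
            target = unopened.pop(0)
            remaining[target] = BLOCKS_PER_TPAB
        plan[inode] = (target, BLOCKS_PER_TPAB - remaining[target])
        remaining[target] -= blocks
    return plan


plan = plan_relocation([(1, 100), (2, 200), (3, 56), (4, 150)], free_tpabs=[7, 9])
print(plan)  # {2: (7, 0), 4: (9, 0), 1: (9, 150), 3: (7, 200)}
```

In this example TPAB 7 ends up completely full and TPAB 9 nearly full, so the buckets the files came from can be released once they are empty.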

Pass4 before

Pass4, after

Integration

OVZ case

Call pcompact(8) nightly from cron

pcompact invokes e4defrag2 and ploop compact for each ploop

Customer's feedback

OK, the ploop image size is fine now, but...

Sometimes pcompact runs all the time.

Source

GITHUB https://github.com/dmonakhov/e2fsprogs/blob/e4defrag2/misc/e4defrag2.c

OVZ.GIT TODO add pcompact to git.openvz.org

[Future work] Standard bitmap scan API required

Currently, used-block info is obtained via e2fsprogs/xfsprogs

XFS: FS-wide analog of FIEMAP

XFS_IOC_FIEMAPFS

Implement ioctl for EXT4

Move userspace to this new IOCTL

Massive testing and fine tuning.

[Future work 2] Smart block allocator

Dave Chinner suggests a smart block allocator which encapsulates all smart-disk internals

Hide SMR internals

Hide TP internals

Garbage collection

Smart block allocator API proposal

Place my data somewhere, and tell me location
