The problem of virtual disk fragmentation and ways to...
Fragmentation problem in vdisk environment
Dmitry Monakhov
2015-09-19
Dmitry Monakhov Fragmentation problem in vdisk environment 2015-09-19 1 / 29
Outline
1 Introduction
2 FS fragmentation
3 An Era of Thin-Provisioned Environments
4 Future work
Basic terminology
A filesystem divides its space into blocks (usually 4k)
Files consist of blocks
A file is fragmented if its blocks are not contiguous
FS aging problem
Zillions of block-alloc/block-free iterations result in fs fragmentation
Most filesystems have effective and reliable techniques which prevent fs aging
Block allocator tries to spread data across the whole disk
Block allocator tries to pack small files together
Block allocator delays allocation until close(2)/fsync(2)
Online/offline defragmentation tools [still required]
When defragmentation is required
There are situations when block allocator tricks are not sufficient
Filesystem is almost full (>90%)
Weird falloc/unlink/fsync scenario
Special read pattern (boot speedup)
Fragmentation: More formal terminology
IntrA-file fragmentation (IAF): fragmentation of a single file.
IntEr-file fragmentation (IEF): fragmentation of a group of files.
(Terminology from the DFS paper.)
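As a rough illustration of the two terms, the sketch below counts the physically discontiguous runs of one file (an IAF measure) and the number of allocation units a group of files touches (one possible IEF measure). The extent tuples, the function names and the 1024-block unit size are assumptions made for this example; they are not taken from the DFS paper or from the talk.

```python
from typing import List, Tuple

# (logical_block, physical_block, length_in_blocks): a hypothetical extent record
Extent = Tuple[int, int, int]

def count_fragments(extents: List[Extent]) -> int:
    """Number of physically discontiguous runs in one file (IAF measure)."""
    fragments = 0
    prev_end = None
    for _logical, physical, length in sorted(extents):
        if prev_end is None or physical != prev_end:
            fragments += 1          # this extent does not continue the previous one
        prev_end = physical + length
    return fragments

def units_touched(files: List[List[Extent]], unit_blocks: int = 1024) -> int:
    """Number of distinct allocation units used by a group of files (IEF measure)."""
    units = set()
    for extents in files:
        for _logical, physical, length in extents:
            first = physical // unit_blocks
            last = (physical + length - 1) // unit_blocks
            units.update(range(first, last + 1))
    return len(units)

contiguous = [(0, 100, 8), (8, 108, 8)]   # second extent continues the first
scattered = [(0, 100, 8), (8, 5000, 8)]   # second extent lives elsewhere
print(count_fragments(contiguous), count_fragments(scattered))
print(units_touched([contiguous, scattered]))
```

A group of files is well packed when `units_touched` is close to the total size of the group divided by the unit size.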
Existing tools
EXT4
Ioctl: EXT4_IOC_MOVE_EXT (atomic)
Swap blocks between donor and target file
Util: e4defrag(8): defragments large files (*IAF*)
XFS
Ioctl: XFS_IOC_SWAPEXT (non-atomic)
Swap blocks between donor and target file
Util: xfs_fsr(8): defragments large files (*IAF*)
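To make the ext4 interface more concrete, here is a minimal sketch that only builds the request for the ext4 move-extent ioctl and does not issue it. It assumes the `struct move_extent` layout from the Linux ext4 headers and the generic Linux ioctl number encoding used on x86 and ARM (some architectures encode ioctl numbers differently); the helper names `ioc_rw` and `build_move_request` are made up for this example.

```python
import struct

# Assumed field layout of struct move_extent (Linux ext4 headers):
# __u32 reserved, __u32 donor_fd, __u64 orig_start, __u64 donor_start,
# __u64 len, __u64 moved_len.
MOVE_EXTENT = struct.Struct("=IIQQQQ")

def ioc_rw(type_char: str, nr: int, size: int) -> int:
    """_IOWR() with the generic Linux ioctl encoding (an assumption, see above)."""
    ioc_read_write = 3
    return (ioc_read_write << 30) | (size << 16) | (ord(type_char) << 8) | nr

# The kernel headers define the request as _IOWR('f', 15, struct move_extent).
EXT4_IOC_MOVE_EXT = ioc_rw("f", 15, MOVE_EXTENT.size)

def build_move_request(donor_fd: int, orig_start: int,
                       donor_start: int, length: int) -> bytearray:
    """Pack a request; offsets and length are in filesystem blocks."""
    return bytearray(MOVE_EXTENT.pack(0, donor_fd, orig_start, donor_start, length, 0))

request = build_move_request(donor_fd=4, orig_start=0, donor_start=0, length=256)
print(hex(EXT4_IOC_MOVE_EXT), len(request))
# A real call would be fcntl.ioctl(orig_fd, EXT4_IOC_MOVE_EXT, request) on an
# ext4 file opened for writing; the kernel then fills in moved_len.
```

The donor file is typically a freshly preallocated, contiguous file; after the exchange the original file owns the contiguous blocks and the donor can be removed.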
Virtual Disk: Things got complicated*
New indirection layer
The thin-provisioning driver adds a second space-management layer: it divides its space into allocation blocks, aka TPABs or buckets.
Bucket size != FS block size
A TPAB is larger than an fs block, but smaller than an fs block group
1M-4M: Ploop, LVM-linear, QCOW2, Ceph (RBD)
64k-256k: dm-thin, dm-snap
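The relation between the two layers is plain integer arithmetic. The sketch below assumes 4 KiB filesystem blocks and a filesystem that starts at a bucket boundary (no partition offset); the function names are illustrative only.

```python
FS_BLOCK = 4096  # bytes, the usual filesystem block size

def bucket_of(fs_block: int, bucket_bytes: int) -> int:
    """Index of the allocation block (TPAB) that backs a filesystem block."""
    return (fs_block * FS_BLOCK) // bucket_bytes

def blocks_per_bucket(bucket_bytes: int) -> int:
    """How many filesystem blocks share one TPAB."""
    return bucket_bytes // FS_BLOCK

for name, size in [("64k", 64 << 10), ("256k", 256 << 10),
                   ("1M", 1 << 20), ("4M", 4 << 20)]:
    print(name, blocks_per_bucket(size), "fs blocks per bucket")
```

Any single live filesystem block keeps its whole bucket allocated, so the larger the bucket, the more neighbours must be free before the space can be returned.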
Customer's feedback
I have a container with a mail server inside that uses 10 GB of data.
Your virtual disk uses 40 GB of my super-fast SSD.
WHY?
Virtual disk fragmentation example
root@dmlp:~# e2freefrag -c 4096 /dev/dm-1
Device: /dev/dm-1
Blocksize: 4096 bytes
Total blocks: 34126848
Free blocks: 12293324 (36.0%)
Chunksize: 4194304 bytes (1024 blocks)
Total chunks: 33328
Free chunks: 8379 (25.1%)
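The gap between 36.0% free blocks and 25.1% free chunks can be turned into an amount of space. The sketch below uses only the numbers from the output above and assumes that a chunk corresponds to one allocation block of the virtual disk, so free blocks inside partly used chunks cannot be handed back.

```python
BLOCK_SIZE = 4096          # "Blocksize" from the output
FREE_BLOCKS = 12293324     # "Free blocks"
BLOCKS_PER_CHUNK = 1024    # "Chunksize" expressed in blocks
FREE_CHUNKS = 8379         # "Free chunks"

blocks_in_free_chunks = FREE_CHUNKS * BLOCKS_PER_CHUNK
stranded_blocks = FREE_BLOCKS - blocks_in_free_chunks
stranded_gib = stranded_blocks * BLOCK_SIZE / 2**30

print(f"free blocks inside fully free chunks: {blocks_in_free_chunks}")
print(f"free blocks stranded in partly used chunks: {stranded_blocks}")
print(f"stranded free space: {stranded_gib:.1f} GiB")
```

Under that assumption roughly 14 GiB of the free space on this device sits in chunks that still hold some live data.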
ThinProvision fragmentation problem
Visible effect
Inefficient free-space usage (up to 0.4%)
Bad IO performance
Why?
TRIM/Discard is useless
Existing FS defragmentation tools/techniques are useless
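One possible reading of the 0.4% figure (an assumption, the slide does not spell it out) is the worst case in which a single live 4 KiB filesystem block keeps a whole 1 MiB bucket allocated. The sketch computes that ratio for a few bucket sizes.

```python
FS_BLOCK = 4096  # bytes

def worst_case_utilization(bucket_bytes: int) -> float:
    """Share of a bucket holding live data when one fs block pins it."""
    return FS_BLOCK / bucket_bytes

for label, size in [("256k", 256 << 10), ("1M", 1 << 20), ("4M", 4 << 20)]:
    print(f"{label}: {worst_case_utilization(size):.2%}")
```

In that state discarding the freed blocks does not help, because the bucket is not empty, and per-file defragmentation does not help either, because each small file is already contiguous.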
Who is affected?
Worst use case
Many small files
A lot of create(2)/unlink(2)
Unpredictable lifetime
Massive write(2); sync(2)/fsync(2)
Bad pattern examples
Mail server
News server
Photo server
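A toy model shows why this pattern hurts. It assumes one-block files written back to back, independent random deletion, and a backend that can release only completely empty buckets; real allocators and workloads are more complicated, so treat it as an illustration rather than a measurement.

```python
import random

BLOCKS_PER_BUCKET = 256   # assumed: 1 MiB bucket, 4 KiB blocks

def simulate(n_files: int, survive_ratio: float, seed: int = 0):
    """Write n one-block files sequentially, then delete a random subset."""
    rng = random.Random(seed)
    live_blocks = [b for b in range(n_files) if rng.random() < survive_ratio]
    pinned_buckets = {b // BLOCKS_PER_BUCKET for b in live_blocks}
    return len(live_blocks), len(pinned_buckets)

live, pinned = simulate(n_files=100_000, survive_ratio=0.05)
print(f"live data: {live} blocks")
print(f"still allocated: {pinned * BLOCKS_PER_BUCKET} blocks in {pinned} buckets")
```

Even when only a small share of the files survives, the survivors are spread over almost every bucket, so the allocated size of the virtual disk barely shrinks.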
New TP defragmentation API wanted
New TP-aware block allocator for FS
New TP-aware defragmentation tool
TP-aware defragmentation tool principles
Take the TP layout into account
Relocate a group of files so that it fits into one TPAB
The only questions left
What to relocate?
Where to relocate?
TP-aware defragmentation overview
1 Sequential scan of the block bitmap tables. Collect used blocks (build spextent tree).
2 Scan the filesystem hierarchy and collect extent ownership statistics.
3 Rescan the filesystem tree and prepare a list of candidates for IEF defragmentation. Fix IntrA-file fragmentation (IAF) issues if discovered.
4 Process the IEF list and perform the actual defragmentation.
Pass1
Sequential scan of the block bitmap tables. Build free-space tree.
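A minimal sketch of this pass, assuming the bitmap is available as a sequence of booleans (True means used). The tool keeps the result in a tree; a sorted list of `(start, length)` runs stands in for it here, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

def free_extents(bitmap: Iterable[bool]) -> List[Tuple[int, int]]:
    """Turn a used/free block bitmap into (start, length) runs of free blocks."""
    runs = []
    start = None
    index = -1
    for index, used in enumerate(bitmap):
        if not used and start is None:
            start = index                        # a free run begins
        elif used and start is not None:
            runs.append((start, index - start))  # a free run ends
            start = None
    if start is not None:
        runs.append((start, index + 1 - start))  # run reaches the end of the bitmap
    return runs

bitmap = [True, False, False, True, True, False]
print(free_extents(bitmap))
```

Because the scan is sequential, the runs come out sorted by start block, which is what a later lookup of free space near a given bucket needs.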
Pass2
Scan the filesystem hierarchy and collect extent ownership statistics.
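One way to picture the ownership statistics is a per-bucket summary: how many blocks are in use and which files own them. The sketch assumes each file's extents are already known as `(physical_start, length)` pairs in blocks and that a bucket is 256 blocks; both the input format and the function name are assumptions of this example.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

BLOCKS_PER_BUCKET = 256  # assumed: 1 MiB bucket, 4 KiB blocks

def bucket_stats(files: Dict[str, List[Tuple[int, int]]]):
    """Map each bucket to its used block count and the set of owning files."""
    used: Dict[int, int] = defaultdict(int)
    owners: Dict[int, Set[str]] = defaultdict(set)
    for path, extents in files.items():
        for start, length in extents:
            block, end = start, start + length
            while block < end:
                bucket = block // BLOCKS_PER_BUCKET
                bucket_end = (bucket + 1) * BLOCKS_PER_BUCKET
                chunk = min(end, bucket_end) - block  # part of the extent in this bucket
                used[bucket] += chunk
                owners[bucket].add(path)
                block += chunk
    return dict(used), dict(owners)

used, owners = bucket_stats({"a": [(250, 10)], "b": [(260, 2)]})
print(used, owners)
```

Buckets with a low used count and few owners are the cheapest ones to empty.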
Pass3
Rescan the filesystem tree and prepare a list of candidates for IEF defragmentation.
Which candidates are good?
Files which belong to a partly populated cluster
Read-only files (old mtime or executable files)
Small files
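The three criteria can be expressed as a filter. In the sketch below the record type, every threshold, and the decision to require all three properties at once are assumptions of this example; the slide only names the properties.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class FileInfo:               # hypothetical record built from the earlier scans
    path: str
    size_blocks: int
    mtime: float              # seconds since the epoch
    executable: bool
    min_bucket_fill: float    # fill ratio (0..1) of the emptiest bucket it touches

def is_candidate(f: FileInfo, now: float, *, max_blocks: int = 64,
                 cold_after: float = 30 * 86400, sparse_below: float = 0.5) -> bool:
    """Apply the three criteria; every threshold is an illustrative guess."""
    in_sparse_bucket = f.min_bucket_fill < sparse_below
    read_mostly = f.executable or (now - f.mtime) > cold_after
    small = f.size_blocks <= max_blocks
    return in_sparse_bucket and read_mostly and small

def pick_candidates(files: List[FileInfo], now: float) -> List[FileInfo]:
    return [f for f in files if is_candidate(f, now)]

now = 1_442_620_800.0  # an arbitrary fixed timestamp for the example
files = [
    FileInfo("/bin/tool", 40, now - 86400, True, 0.2),
    FileInfo("/var/mail/inbox", 40, now - 60, False, 0.2),
    FileInfo("/srv/big.img", 5000, now - 90 * 86400, False, 0.2),
]
print([f.path for f in pick_candidates(files, now)])
```

Files that are likely to stay unchanged are preferred because moving a file that is deleted or rewritten shortly afterwards wastes the relocation I/O.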
Pass4
Process IEF list and perform actual defragmentation
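The slide does not say how files are grouped into target buckets. As a stand-in, the sketch below uses a generic first-fit-decreasing bin-packing heuristic to plan which candidate files could share a bucket; it is not the algorithm of the actual tool, and it only plans, it does not move any data.

```python
from typing import Dict, List, Tuple

BLOCKS_PER_BUCKET = 256  # assumed: 1 MiB bucket, 4 KiB blocks

def plan_relocation(sizes: Dict[str, int]) -> List[List[str]]:
    """Group files (path -> size in blocks) into buckets, first-fit decreasing."""
    buckets: List[Tuple[int, List[str]]] = []   # (free blocks left, paths)
    for path, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        if size > BLOCKS_PER_BUCKET:
            continue                            # larger than a bucket: not an IEF case
        for i, (free, paths) in enumerate(buckets):
            if size <= free:
                buckets[i] = (free - size, paths + [path])
                break
        else:
            buckets.append((BLOCKS_PER_BUCKET - size, [path]))
    return [paths for _free, paths in buckets]

plan = plan_relocation({"a": 200, "b": 100, "c": 56, "d": 150})
print(plan)
```

Each inner list is a group of files meant to land in one free bucket; the actual block exchange would then go through the filesystem's move-extent interface.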
Integration
OVZ case
Call pcompact(8) nightly from cron
pcompact invokes e4defrag2 and ploop compact for each ploop
Customer's feedback
OK, the ploop image size is fine now, but...
Sometimes pcompact runs all the time.
Source
GITHUB https://github.com/dmonakhov/e2fsprogs/blob/e4defrag2/misc/e4defrag2.c
OVZ.GIT TODO add pcompact to git.openvz.org
[Future work] Standard bitmap scan API required
Currently, used-block info is obtained via e2fsprogs/xfsprogs
XFS: FS-wide analog of FIEMAP
XFS_IOC_FIEMAPFS
Implement ioctl for EXT4
Move userspace to this new IOCTL
Massive testing and fine tuning.
[Future work 2] Smart block allocator
Dave Chinner suggests a smart block allocator which encapsulates all smart-disk internals
Hide SMR internals
Hide TP internals
Garbage collection
Smart block allocator API proposal
Place my data somewhere, and tell me location