bup: Git for backups - CCC


Page 1: bup: Git for backups - CCC

bup: Git for backups

#bup #28c3

1 / 26

Page 2: bup: Git for backups - CCC

Zoran Zaric

- @zoranzaric

- Computer Science student at TU Darmstadt

- Involved with bup since April 2010

2 / 26

Page 3: bup: Git for backups - CCC

toc

1. Motivation

2. Git background

3. bup

3.1 Features

3.2 Algorithms & data structures

3 / 26

Page 4: bup: Git for backups - CCC

Motivation

- Space efficiency of backups

- Convenient access to backups

- Safety against bitrot, filesystem errors, and media errors

- Safety against changes to the backup history

4 / 26

Page 5: bup: Git for backups - CCC

Git

- Distributed version control system

- Content-addressed

- Immutable objects

- Snapshot-based instead of diff-based

5 / 26
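
To make "content-addressed" concrete, here is a minimal Python sketch of how Git derives a blob's ID: the SHA-1 of a short header (`blob <size>` plus a NUL byte) followed by the file contents. The function name `git_blob_id` is made up for this illustration.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the object ID Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)   # object type, size, NUL separator
    return hashlib.sha1(header + content).hexdigest()

if __name__ == "__main__":
    # The same bytes always map to the same ID, so identical files are stored once.
    print(git_blob_id(b""))                  # ID of the empty blob
    print(git_blob_id(b"Hello World\n"))
```

The empty blob comes out as e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is the full form of the abbreviated ID e69de29 used on the repository slides. Because the ID depends only on the content, an object cannot be changed without changing its name.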

Page 10: bup: Git for backups - CCC

Git: A Repository

- BLOBs: file contents (ID e.g. e69de29)

Example content: Hello World

- Trees: directory listings (ID e.g. 82e3a75)

100644 blob 5e1c309dae7f45e0f39b1bf3ac3cd9db12e7d689 README
100644 blob 39c8418e04721b9a30232ce754cac8d9ee78340a DESIGN
040000 tree 482fa65ae85c1e5bca8c091b479de60b714a4b6a src

- Commits: a tree plus author, committer, and message (ID e.g. 3dfe461f)

tree a3d703e579dc9baae20456eb63fa49f5e4e7c9b4
author Zoran Zaric <[email protected]> 1314498536 +0200
committer Zoran Zaric <[email protected]> 1314498536 +0200

Example commit

- Tags & Branches: names for a commit ID (e.g. v0.1, master)

63866463d511a245a55a57ca48efe8e67b955dec

Diagram: object graph with the refs master and v1.0 over two rows of object IDs

3dfe461f 82e3a75 e69de29

25b2be3 78af04f 41c28e8

6 / 26

Page 16: bup: Git for backups - CCC

Git: A Repository

- Packfiles: many objects stored in one file

e69de29, 82e3a75, 3dfe461f, 41c28e8, 78af04f, 25b2be3

7 / 26

Page 17: bup: Git for backups - CCC

Git: Problems

- Slow and memory-hungry with large files

- No metadata (permissions, owners, ACLs)

8 / 26

Page 20: bup: Git for backups - CCC

bup

- Author: Avery Pennarun (git subtree, sshuttle, redo)

- Code: https://github.com/apenwarr/bup

- Mailing list: http://groups.google.com/group/bup-list

9 / 26

Page 21: bup: Git for backups - CCC

bup: Installation

$ sudo apt-get install python2.6-dev python-fuse

$ sudo apt-get install python-pyxattr python-pylibacl

$ mkdir -p ~/src && cd ~/src

$ git clone https://github.com/apenwarr/bup.git

$ cd bup

$ make

$ make test

$ sudo make install

10 / 26

Page 22: bup: Git for backups - CCC

bup: Examples

$ bup index -ux /home/zz

$ bup save -n laptop /home/zz

$ bup save -r myserver -n laptop /home/zz

$ bup on myserver index -ux /home/zz

$ bup on myserver save -n server /home/zz

$ bup ls laptop/latest/home/zz

11 / 26

Page 23: bup: Git for backups - CCC

bup: Features

Deduplication (http://goo.gl/aBpny)

- Benchmark with two servers and a pseudo VM image on them, with few changes

rsnapshot: 4.97G
bup: 2.18G

- Import of rsnapshot backups into bup

rsnapshot: 12.6G
bup: 4.6G

12 / 26

Page 26: bup: Git for backups - CCC

bup: Features

Metadata (almost done)

- Owner

- Exact times

- Permissions

- Extended ACLs

- SELinux

13 / 26

Page 27: bup: Git for backups - CCC

bup: Features

FUSE module

You can mount your backups and browse them with your favorite file manager

14 / 26

Page 28: bup: Git for backups - CCC

bup: Features

Web interface

15 / 26

Page 29: bup: Git for backups - CCC

bup: Features

Runs on dd-wrt

16 / 26

Page 30: bup: Git for backups - CCC

bup: Features

Import script for rsnapshot backups

More will follow (Duplicity)

17 / 26

Page 31: bup: Git for backups - CCC

bup: Features

Full compatibility with Git

Git tools like gitk or tig can be used with bup repositories

18 / 26

Page 32: bup: Git for backups - CCC

bup: Features

Uses par2 to be safe against bitrot, filesystem errors, and media errors

19 / 26

Page 33: bup: Git for backups - CCC

bup: Algorithms & Data Structures

- Hashsplitting

- Midx

- Bloom filters

20 / 26

Page 34: bup: Git for backups - CCC

Hashsplitting

- Rolling checksum

- Based on rsync's algorithm

- Big files are split into chunks of 8 kB on average

- A new chunk starts when the 13 least significant bits of the checksum are all 1 (2^13 = 8192, hence the 8 kB average)

21 / 26
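
The splitting rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not bup's actual code: the 64-byte window, the names `hashsplit`, `WINDOW`, and `SPLIT_BITS`, and the plain rsync-style pair of running sums are all chosen for this example. It uses a 13-bit mask, since 2^13 = 8192 matches the 8 kB average chunk size.

```python
import random

WINDOW = 64        # bytes covered by the rolling checksum (assumed value)
SPLIT_BITS = 13    # 2**13 = 8192, i.e. about 8 kB per chunk on average

def hashsplit(data: bytes, bits: int = SPLIT_BITS, window: int = WINDOW):
    """Split data into content-defined chunks; returns a list of bytes."""
    mask = (1 << bits) - 1
    a = b = 0                      # rsync-style running sums, kept to 16 bits
    chunks, start = [], 0
    for i, new in enumerate(data):
        old = data[i - window] if i >= window else 0   # byte leaving the window
        a = (a - old + new) & 0xFFFF
        b = (b - window * old + a) & 0xFFFF
        digest = (a << 16) | b
        if (digest & mask) == mask:            # low `bits` bits are all 1
            chunks.append(data[start:i + 1])   # chunk ends at this byte
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])            # trailing remainder
    return chunks

if __name__ == "__main__":
    rng = random.Random(0)
    data = bytes(rng.randrange(256) for _ in range(200_000))
    chunks = hashsplit(data)
    shifted = hashsplit(b"X" + data)           # insert one byte at the front
    print(len(chunks), "chunks")
    print(len(set(chunks) & set(shifted)), "chunks unchanged after the insert")
```

The checksum depends only on the last `WINDOW` bytes, so chunk boundaries follow the content rather than fixed offsets. Inserting a byte near the start of a file changes the first chunk or two; the later chunks keep their content and therefore their object IDs, which is what makes deduplication of large, slightly changed files work.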

Page 35: bup: Git for backups - CCC

Midx

- idx: index for a packfile

- 1 idx per packfile

- An object is found with 3-4 lookups per packfile

- midx: one index covering several packfiles

- An object is found with 2 lookups

- Problem: a midx has to be recreated for every change

22 / 26
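
A rough Python sketch of the difference between per-pack idx files and a midx. It models each index as a sorted in-memory list of object IDs and ignores the real on-disk formats, so it does not reproduce the lookup counts from the slide; all function names are made up for this example.

```python
import bisect
import hashlib

def make_idx(object_ids):
    """One idx: the sorted object IDs of a single packfile."""
    return sorted(object_ids)

def idx_contains(idx, oid):
    """Binary search in one sorted index."""
    pos = bisect.bisect_left(idx, oid)
    return pos < len(idx) and idx[pos] == oid

def find_in_idxs(idxs, oid):
    """Without a midx: probe the idx of every packfile in turn."""
    return any(idx_contains(idx, oid) for idx in idxs)

def make_midx(idxs):
    """A midx: one sorted index over the IDs of several packfiles."""
    return sorted(set().union(*idxs))

if __name__ == "__main__":
    ids = [hashlib.sha1(str(n).encode()).digest() for n in range(300)]
    idxs = [make_idx(ids[i:i + 100]) for i in range(0, 300, 100)]
    midx = make_midx(idxs)
    print(find_in_idxs(idxs, ids[250]), idx_contains(midx, ids[250]))
```

With separate idx files the search cost grows with the number of packfiles; with a midx it is a single search. The sketch also shows the drawback named above: `make_midx` rebuilds the whole merged list whenever a packfile is added.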

Page 36: bup: Git for backups - CCC

Bloom Filters

- Probabilistic data structure

- Checks whether a datum is known

- Appending is possible

- False positives are possible

- The false-positive rate grows with added data

- When the rate exceeds 1%, the Bloom filter is expanded and rewritten

- Hash function optimized for few 1s in the result

- The Bloom filter is a bit array; the result is added with bitwise OR

- When a hit is found, a midx lookup is done

23 / 26
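
A minimal Bloom filter in Python, to show the bit array and the bitwise OR. The filter size, the number of bit positions per item (`k`), and the use of SHA-1 to pick positions are assumptions made for this sketch; bup's real filter and hash function differ.

```python
import hashlib

class BloomFilter:
    def __init__(self, nbits: int = 1 << 16, k: int = 4):
        # nbits must be a multiple of 8; k may be at most 5 here, because
        # each position uses 4 of the 20 bytes of a SHA-1 digest.
        self.nbits = nbits
        self.k = k
        self.bits = bytearray(nbits // 8)

    def _positions(self, item: bytes):
        digest = hashlib.sha1(item).digest()
        for n in range(self.k):
            word = int.from_bytes(digest[4 * n:4 * n + 4], "big")
            yield word % self.nbits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)    # set the bit with OR

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely unknown"; True means "maybe known".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

if __name__ == "__main__":
    known = BloomFilter()
    known.add(b"object-1")
    print(known.might_contain(b"object-1"))   # True
    print(BloomFilter().might_contain(b"x"))  # False: an empty filter knows nothing
```

A negative answer is always correct, so most lookups for new objects stop here. A positive answer can be a false positive, which is why a hit is followed by the midx lookup. Adding items only ever sets bits, so the filter can be appended to, and the false-positive rate rises as it fills up.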

Page 37: bup: Git for backups - CCC

Recent

- Metadata support is about to be finished (patch set available, testing needed)

- Repack patches are pending (deleting old backups)

- An inotify-based daemon is being discussed

24 / 26

Page 38: bup: Git for backups - CCC

You & bup?

- Python & a bit of C

- Native Windows support?

- OS X / Windows metadata support?

- An "inotify"-like port for OS X?

- GUI?

- Diff

25 / 26

Page 39: bup: Git for backups - CCC

Thank You

- @zoranzaric

- zorzar on freenode & hackint

- [email protected] (Email & Jabber)

- zoranzaric.de

- github.com/zoranzaric

- gplus.zoranzaric.de

- Slides: zoranzaric.de/bup-28c3.pdf

26 / 26