Scaling Git - Stefan Saasen

With the widespread adoption of Git, and the rollout of Git-based development workflows, large organizations must be able to scale their source code management system with their needs. In this talk we will provide practical advice to overcome the challenges in scaling Git.

Transcript of Scaling Git - Stefan Saasen

Page 1: Scaling Git - Stefan Saasen

#atlassian

Page 2: Scaling Git - Stefan Saasen

STEFAN SAASEN • DEVELOPMENT MANAGER • ATLASSIAN • @STEFANSAASEN

Scaling Git

Page 8: Scaling Git - Stefan Saasen

Source View - git cat-file

Page 13: Scaling Git - Stefan Saasen

“Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.”

Pro Git Book, Section: Git Internals

Page 14: Scaling Git - Stefan Saasen

GIT UNDER THE HOOD

$> tree .git/objects
.git/objects
├── info
└── pack

2 directories

Page 15: Scaling Git - Stefan Saasen

git add some-file.txt

Page 16: Scaling Git - Stefan Saasen

$> tree .git/objects
.git/objects
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d
├── info
└── pack

3 directories, 1 file

The object file is zlib compressed and named by its SHA1 hash.
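
As a hedged illustration of the slide above: the name of a blob object is the SHA1 of a short header (`blob <size>` plus a NUL byte) followed by the file content, and the first two hex characters of that hash become the directory name. The sketch below assumes `git` and the coreutils `sha1sum` tool are installed; the throwaway repository and the file content are made up for illustration and are not from the talk.

```shell
set -eu

# Assumption: a scratch repository in a temp directory, not the one on the slides.
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf 'hello\n' > some-file.txt

# Hash computed by hand: SHA1 over "blob <size>\0" followed by the raw content.
size=$(wc -c < some-file.txt | tr -d ' ')
manual=$( { printf 'blob %s\0' "$size"; cat some-file.txt; } | sha1sum | cut -d' ' -f1 )

# Hash computed by Git for the same file.
from_git=$(git hash-object some-file.txt)

# Staging the file writes the zlib-compressed loose object to disk.
git add some-file.txt
dir=$(printf '%s' "$from_git" | cut -c1-2)
file=$(printf '%s' "$from_git" | cut -c3-)
object_path=".git/objects/$dir/$file"

echo "manual:   $manual"
echo "from git: $from_git"
ls -l "$object_path"
```

Both hashes should agree, and the object file should appear under a two-character directory just like `e4/3a6ac5...` on the slide.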

Page 17: Scaling Git - Stefan Saasen

git commit -m "First commit"

Page 18: Scaling Git - Stefan Saasen

$> tree .git/objects
.git/objects
├── 13
│   └── 1e360ae1a0c08acd18182c6160af6a83e0d22f
├── 31
│   └── 995f2d03aa31ee97ee2e814c9f0b0ffd814316
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d
├── info
└── pack

5 directories, 3 files

Page 21: Scaling Git - Stefan Saasen

The three files under .git/objects after the first commit are a blob, a tree, and a commit.
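
A hedged sketch of how one might check this: `git cat-file` (named on the earlier "Source View" slide) can report the type of every object in a repository. The scratch repository, file content, and committer identity below are assumptions made for illustration.

```shell
set -eu

# Assumption: a scratch repository with a single file and a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'hello\n' > some-file.txt
git add some-file.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "First commit"

# Print the type of every object in the object database, sorted by type name.
types=$(git cat-file --batch-all-objects --batch-check='%(objecttype)' | sort | paste -sd ' ' -)
echo "$types"

# The tree object maps the file name to the blob that holds its content.
git cat-file -p 'HEAD^{tree}'
```

One commit of one file yields exactly one object of each type.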

Page 22: Scaling Git - Stefan Saasen

echo "// Comment" >> some-file.txt

Page 23: Scaling Git - Stefan Saasen

git add some-file.txt

Page 24: Scaling Git - Stefan Saasen

$> tree .git/objects
.git/objects
├── 13
│   └── 1e360ae1a0c08acd18182c6160af6a83e0d22f
├── 31
│   └── 995f2d03aa31ee97ee2e814c9f0b0ffd814316
├── c1
│   └── 9e6823e34980033917b6427f3e245ce2102e6e
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d

6 directories, 4 files

Entirely new BLOB
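
To make the "entirely new blob" point concrete, here is a minimal sketch, assuming a scratch repository with made-up content: appending one line and staging the file again stores a second, complete blob instead of a diff against the first.

```shell
set -eu

# Assumption: scratch repository; file content is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .

printf 'line 1\n' > some-file.txt
git add some-file.txt
before=$(git hash-object some-file.txt)

# A one-line change, as on the slides.
echo "// Comment" >> some-file.txt
git add some-file.txt
after=$(git hash-object some-file.txt)

# Number of loose objects now in .git/objects (first field of the summary line).
loose=$(git count-objects | cut -d' ' -f1)

echo "blob before: $before ($(git cat-file -s "$before") bytes)"
echo "blob after:  $after ($(git cat-file -s "$after") bytes)"
echo "loose objects: $loose"
```

The second blob contains the whole file again, which is why loose objects alone are a wasteful way to store history.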

Page 25: Scaling Git - Stefan Saasen

wat?

Page 26: Scaling Git - Stefan Saasen

git gc

Page 27: Scaling Git - Stefan Saasen

$> tree .git/objects
.git/objects
├── info
│   └── packs
└── pack
    ├── pack-7475314b451a882d77b1535d215def8bad0f4306.idx
    └── pack-7475314b451a882d77b1535d215def8bad0f4306.pack

2 directories, 3 files

Page 28: Scaling Git - Stefan Saasen

Loose Objects

Page 29: Scaling Git - Stefan Saasen

Loose Objects, Packfile

Packfile:
1. zlib compressed
2. Delta encoded
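
The move from loose objects to a packfile can be observed with `git count-objects -v`. This is a sketch under assumptions (scratch repository, two made-up commits, placeholder committer identity), not output from the talk.

```shell
set -eu

# Assumption: scratch repository with two commits touching one file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
commit() { git -c user.name=Demo -c user.email=demo@example.com commit -q -m "$1"; }

printf 'line 1\n' > some-file.txt
git add some-file.txt
commit "First commit"
echo "// Comment" >> some-file.txt
git add some-file.txt
commit "Second commit"

# "count" is the number of loose objects, "in-pack" the number of packed ones.
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')
git gc -q
loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
in_pack=$(git count-objects -v | awk '/^in-pack:/ {print $2}')

echo "loose before gc: $loose_before"
echo "loose after gc:  $loose_after"
echo "objects in pack: $in_pack"
```

Two commits of one file give two blobs, two trees, and two commits; after `git gc` they all live in a single pack.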

Page 30: Scaling Git - Stefan Saasen

but...

Page 31: Scaling Git - Stefan Saasen

CPU

[Graph: CPU time (user, syst) of git pack-objects processes over time]

Page 32: Scaling Git - Stefan Saasen

Memory

[Graph: memory (RSS, in MiB) of git pack-objects processes over time]

Page 33: Scaling Git - Stefan Saasen

Disk I/O

[Graph: disk I/O (disk_octets, read and write) of git pack-objects processes over time]

Page 34: Scaling Git - Stefan Saasen

What's your point?

Page 36: Scaling Git - Stefan Saasen

git clone/fetch

Page 37: Scaling Git - Stefan Saasen

git clone/fetch generates a packfile

Page 38: Scaling Git - Stefan Saasen

git clone/fetch generates a packfile every time
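
The graphs above track `git pack-objects` processes. As a rough sketch of the work a server repeats for each clone, the command below walks everything reachable from a tip and streams it as a fresh pack. The scratch repository and committer identity are assumptions, and a real server drives this step through `git upload-pack` rather than by hand.

```shell
set -eu

# Assumption: scratch repository standing in for a hosted repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'hello\n' > some-file.txt
git add some-file.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "First commit"

# Build a pack containing every object reachable from HEAD and write it to a file.
pack_file=$(mktemp)
echo HEAD | git pack-objects --revs --stdout -q > "$pack_file"

# Every pack stream starts with the 4-byte signature "PACK".
magic=$(head -c 4 "$pack_file")
pack_size=$(wc -c < "$pack_file" | tr -d ' ')
echo "signature: $magic, size: $pack_size bytes"
```

On a large repository this step is where the CPU, memory, and disk I/O go, and without a cache it is repeated for every request.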

Page 39: Scaling Git - Stefan Saasen

stash.atlassian.com

Clone: 800

Fetch: 1,200

Page 40: Scaling Git - Stefan Saasen

This is what we learned so far.

Page 41: Scaling Git - Stefan Saasen

1. SCM Cache

Page 42: Scaling Git - Stefan Saasen

CACHE

git clone 1

Page 43: Scaling Git - Stefan Saasen

git clone 2

CACHE
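
The idea behind the two cache slides can be sketched as follows. This is a deliberately simplified, hypothetical illustration and not how Stash implements its SCM cache: `serve_pack`, the cache directory layout, and the choice of the tip commit as cache key are all assumptions.

```shell
set -eu

# Assumption: scratch repository standing in for a hosted repository.
repo=$(mktemp -d)
cd "$repo"
git init -q .
printf 'hello\n' > some-file.txt
git add some-file.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "First commit"

cache_dir=$(mktemp -d)
generated=0

# Hypothetical helper: serve a full-clone pack for HEAD, reusing a cached copy
# as long as the tip commit has not changed since the last request.
serve_pack() {
  key=$(git rev-parse HEAD)
  if [ ! -f "$cache_dir/$key.pack" ]; then
    echo "$key" | git pack-objects --revs --stdout -q > "$cache_dir/$key.pack"
    generated=$((generated + 1))
  fi
  cat "$cache_dir/$key.pack"
}

serve_pack > /dev/null   # "git clone 1": cache miss, the pack is generated
serve_pack > /dev/null   # "git clone 2": cache hit, the pack is read from disk
echo "packs generated: $generated"
```

Keying on the tip commit means a push invalidates the cache naturally, because the next request computes a different key.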

Page 44: Scaling Git - Stefan Saasen

2. Sizing is important

Page 45: Scaling Git - Stefan Saasen

You need sufficient hardware

Page 46: Scaling Git - Stefan Saasen

768 MiB

Memory budget

Page 47: Scaling Git - Stefan Saasen

5 GiB

Memory budget

Page 48: Scaling Git - Stefan Saasen

768 MiB

5 GiB

Memory budget

Page 49: Scaling Git - Stefan Saasen

3. Limits

Page 50: Scaling Git - Stefan Saasen

Limits

Page 51: Scaling Git - Stefan Saasen

4. Continuous Integration

Page 52: Scaling Git - Stefan Saasen

What do you have?

Page 53: Scaling Git - Stefan Saasen

This is what I've got.

Page 54: Scaling Git - Stefan Saasen

Ok, here is what I've got. Give me everything that's new.

Page 55: Scaling Git - Stefan Saasen

Here you go!

Page 56: Scaling Git - Stefan Saasen

Don't worry. I'm up to date.
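
The conversation on the last few slides is the ref advertisement exchange. A hedged sketch of what a polling build agent effectively does: ask the remote for its refs with `git ls-remote`, compare with the local tip, and fetch only when they differ. The repository names `upstream` and `build-agent` and the committer identity are made up for illustration.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Assumption: "upstream" stands in for the hosted repository.
git init -q upstream
cd upstream
printf 'hello\n' > some-file.txt
git add some-file.txt
git -c user.name=Demo -c user.email=demo@example.com commit -q -m "First commit"
cd "$work"

# Assumption: "build-agent" stands in for a CI checkout.
git clone -q upstream build-agent
cd build-agent

# "What do you have?": the remote advertises its refs.
remote_tip=$(git ls-remote origin HEAD | cut -f1)
# "This is what I've got.": the local tip.
local_tip=$(git rev-parse HEAD)

if [ "$remote_tip" = "$local_tip" ]; then
  status="up to date"
else
  git fetch -q origin
  status="fetched"
fi
echo "$status"
```

Even the "up to date" answer costs the server a ref advertisement, which is why frequent polling from many build agents adds up.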

Page 58: Scaling Git - Stefan Saasen

Avoid Polling.

Page 59: Scaling Git - Stefan Saasen

SCM Cache can also cache ref advertisements

Page 60: Scaling Git - Stefan Saasen

Consider shallow clones.
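
A minimal sketch of a shallow clone, assuming a scratch repository with three made-up commits. Note that `--depth` is ignored for plain local paths, so the example uses a `file://` URL to go through the normal transport.

```shell
set -eu

work=$(mktemp -d)
cd "$work"

# Assumption: "upstream" stands in for the hosted repository.
git init -q upstream
cd upstream
for i in 1 2 3; do
  echo "change $i" >> some-file.txt
  git add some-file.txt
  git -c user.name=Demo -c user.email=demo@example.com commit -q -m "Commit $i"
done
cd "$work"

# Ask only for the most recent commit instead of the full history.
git clone -q --depth 1 "file://$work/upstream" shallow

full_count=$(git -C upstream rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "commits upstream: $full_count, commits in shallow clone: $shallow_count"
```

A build that only needs the current tree gets it without asking the server to pack the whole history.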

Page 61: Scaling Git - Stefan Saasen

5. Update

Page 62: Scaling Git - Stefan Saasen

Use recent versions of

Page 64: Scaling Git - Stefan Saasen

Stash Data Center

Page 65: Scaling Git - Stefan Saasen

RDBMS

FS

Page 66: Scaling Git - Stefan Saasen

RDBMS

NFS

Page 67: Scaling Git - Stefan Saasen

Performance at scale

Page 68: Scaling Git - Stefan Saasen

RDBMS

NFS

Page 75: Scaling Git - Stefan Saasen

Scaling Git

• Git hosting operations are expensive.
• Properly size the hardware for the workload that you expect.
• Avoid polling when you do a lot of builds; enable caching of ref advertisements when you can't.
• Prefer shallow clones.
• Limits are in place to keep your Stash server running.
• Stash Data Center allows you to scale out and have high availability.

Page 76: Scaling Git - Stefan Saasen

Stash Data Center Beta Program

Sign up today!

Talk to me after if you’re interested in learning more.

Page 77: Scaling Git - Stefan Saasen

Thank you!

STEFAN SAASEN • DEVELOPMENT MANAGER • ATLASSIAN • @STEFANSAASEN