Scaling Git - Stefan Saasen

Post on 13-Jun-2015

With the widespread adoption of Git, and the rollout of Git-based development workflows, large organizations must be able to scale their source code management system with their needs. In this talk we will provide practical advice to overcome the challenges in scaling Git.

Transcript of Scaling Git - Stefan Saasen

#atlassian

STEFAN SAASEN • DEVELOPMENT MANAGER • ATLASSIAN • @STEFANSAASEN

Scaling Git

Source View - git cat-file

“Git is fundamentally a content-addressable filesystem with a VCS user interface written on top of it.”
Pro Git Book, Section: Git Internals

GIT UNDER THE HOOD

$> tree .git/objects
.git/objects
├── info
└── pack

2 directories

git add some-file.txt

$> tree .git/objects
.git/objects
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d
├── info
└── pack

3 directories, 1 file

File contents: zlib compressed. Directory and file name: SHA1 of the object.
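
How the object name and the file on disk come about can be sketched in a few lines of Python. This is a minimal illustration, not Git's own code; the function names are made up for this example. The bytes produced by `loose_object_bytes` decompress to the same data Git stores, but they need not be byte-identical to what Git writes, since the compression settings may differ.

```python
import hashlib
import zlib


def _with_header(content: bytes, obj_type: str) -> bytes:
    # Git prefixes every object with "<type> <size>" and a NUL byte.
    return f"{obj_type} {len(content)}".encode("ascii") + b"\x00" + content


def git_object_id(content: bytes, obj_type: str = "blob") -> str:
    """SHA1 over header plus content; this is the name Git gives the object."""
    return hashlib.sha1(_with_header(content, obj_type)).hexdigest()


def loose_object_path(object_id: str) -> str:
    """The first two hex digits name the directory, the other 38 the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


def loose_object_bytes(content: bytes, obj_type: str = "blob") -> bytes:
    """zlib-compressed header plus content, as stored in a loose object file."""
    return zlib.compress(_with_header(content, obj_type))


if __name__ == "__main__":
    oid = git_object_id(b"hello\n")
    print(oid)
    print(loose_object_path(oid))
```

Identical content always yields the same name, which is what makes the object store content-addressable.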

git commit -m "First commit"

$> tree .git/objects
.git/objects
├── 13
│   └── 1e360ae1a0c08acd18182c6160af6a83e0d22f
├── 31
│   └── 995f2d03aa31ee97ee2e814c9f0b0ffd814316
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d
├── info
└── pack

5 directories, 3 files

Blob

Tree

Commit
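
To see which of the three files is the blob, the tree and the commit, one can read the type from the object header, roughly what `git cat-file -t` reports. The sketch below assumes the loose object layout described above; `parse_loose_object` is a hypothetical helper name, and the transcript itself does not say which of the `13…` and `31…` objects is the tree and which is the commit.

```python
import zlib
from typing import Tuple


def parse_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Decompress a loose object file and return (type, body)."""
    data = zlib.decompress(raw)
    # The header ends at the first NUL byte: b"<type> <size>\x00<body>".
    header, _, body = data.partition(b"\x00")
    obj_type, size_text = header.decode("ascii").split(" ")
    if int(size_text) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


if __name__ == "__main__":
    # Build an in-memory example instead of reading from .git/objects.
    raw = zlib.compress(b"blob 6\x00hello\n")
    print(parse_loose_object(raw))
```

Against a real repository the input would be the bytes of a file such as `.git/objects/e4/3a6a…`, read in binary mode.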

echo "// Comment" >> some-file.txt

git add some-file.txt

$> tree .git/objects
.git/objects
├── 13
│   └── 1e360ae1a0c08acd18182c6160af6a83e0d22f
├── 31
│   └── 995f2d03aa31ee97ee2e814c9f0b0ffd814316
├── c1
│   └── 9e6823e34980033917b6427f3e245ce2102e6e
├── e4
│   └── 3a6ac59164adadac854d591001bbb10086f37d

6 directories, 4 files

Entirely new BLOB

wat?

git gc

$> tree .git/objects
.git/objects
├── info
│   └── packs
└── pack
    ├── pack-7475314b451a882d77b1535d215def8bad0f4306.idx
    └── pack-7475314b451a882d77b1535d215def8bad0f4306.pack

2 directories, 3 files

Loose Objects
1. zlib compressed

Packfile
1. zlib compressed
2. Delta encoded
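
The idea behind delta encoding can be shown with a toy example: store one version in full and the other as copy and insert instructions against it. This is only a sketch of the principle. It is not Git's packfile delta format, and `make_delta` and `apply_delta` are names invented for this illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# ("copy", offset, length) reuses bytes from the base object;
# ("insert", data) carries bytes that the base does not contain.
Op = Union[Tuple[str, int, int], Tuple[str, bytes]]


def make_delta(base: bytes, target: bytes) -> List[Op]:
    """Describe `target` as copies from `base` plus inserted bytes."""
    ops: List[Op] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))
        elif j2 > j1:
            # "replace" and "insert" both contribute new bytes from target.
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are just not copied.
    return ops


def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    """Rebuild the target from the base object and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


if __name__ == "__main__":
    base = b"line one\nline two\n"
    target = base + b"// Comment\n"
    print(make_delta(base, target))
```

For the appended comment from the earlier slides, the second version shrinks to one copy instruction plus the new line, instead of an entirely new blob.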

but...

[Graphs: resource usage of git pack-objects processes over time]
CPU: processes-git-pack-objects cputime (user, syst)
Memory: processes-git-pack-objects rss (MiB)
Disk I/O: processes-git-pack-objects disk_octets (read, write)

What's your point?

git clone/fetch generates a packfile every time

stash.atlassian.com
Clone: 800
Fetch: 1,200

This is what we learned so far.

1. SCM Cache

[Diagram: git clone 1, git clone 2, CACHE]
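
The caching idea can be sketched as follows: if two requests ask for the same objects, the packfile only has to be generated once. This is a simplified illustration and not how Stash implements its SCM Cache; `PackCache`, `upload_pack` and the `generate_pack` callback are hypothetical names, and a real cache would also need eviction and disk space limits.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Key = Tuple[str, FrozenSet[str], FrozenSet[str]]


class PackCache:
    """Serve identical clone/fetch requests from a previously built packfile."""

    def __init__(self, generate_pack: Callable[[str, FrozenSet[str], FrozenSet[str]], bytes]):
        # generate_pack stands in for the expensive pack generation step.
        self._generate_pack = generate_pack
        self._packs: Dict[Key, bytes] = {}
        self.hits = 0
        self.misses = 0

    def upload_pack(self, repo: str, wants: Iterable[str], haves: Iterable[str] = ()) -> bytes:
        # Same repository, same wanted commits, same commits already present
        # on the client: the resulting packfile can be reused.
        key: Key = (repo, frozenset(wants), frozenset(haves))
        if key in self._packs:
            self.hits += 1
        else:
            self.misses += 1
            self._packs[key] = self._generate_pack(key[0], key[1], key[2])
        return self._packs[key]
```

A second `git clone` of the same repository state (same wants, no haves) would then be a cache hit, which is the case the diagram shows.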

2. Sizing is important

You need sufficient hardware

Memory budget
768 MiB
5 GiB

3. Limits
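
One way to picture such a limit is a fixed number of slots for concurrent hosting operations, with requests rejected once all slots are taken. The sketch below is an assumption about the general technique, not Stash's actual throttling code; `HostingLimiter`, `ServerBusyError` and the parameter names are invented for this example.

```python
import threading
from contextlib import contextmanager


class ServerBusyError(Exception):
    """Raised when no slot becomes free within the waiting time."""


class HostingLimiter:
    """Allow at most `max_concurrent` clone/fetch operations at a time."""

    def __init__(self, max_concurrent: int, wait_seconds: float = 0.0):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._wait_seconds = wait_seconds

    @contextmanager
    def ticket(self):
        # Wait up to wait_seconds for a free slot, then give up so that the
        # server sheds load instead of starting yet another pack generation.
        if not self._slots.acquire(timeout=self._wait_seconds):
            raise ServerBusyError("too many concurrent hosting operations")
        try:
            yield
        finally:
            self._slots.release()
```

Wrapping each hosting operation in `with limiter.ticket():` keeps CPU and memory use bounded, at the cost of rejecting some requests under load.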

4. Continuous Integration

Client: What do you have?
Server: This is what I've got.
Client: OK, here is what I've got. Give me everything that's new.
Server: Here you go!
Client (nothing new): Don't worry. I'm up to date.
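
The exchange above can be sketched from the client's side. The server's answer to "What do you have?" is its ref advertisement; the text format assumed here is the one printed by `git ls-remote` (object id, tab, ref name). The function names are hypothetical.

```python
from typing import Dict


def parse_ref_advertisement(text: str) -> Dict[str, str]:
    """Turn lines of "<object id>\t<ref name>" into {ref name: object id}."""
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        object_id, ref_name = line.split(None, 1)
        refs[ref_name.strip()] = object_id
    return refs


def refs_to_fetch(local: Dict[str, str], advertised: Dict[str, str]) -> Dict[str, str]:
    """Refs that are new or point to a different object than the local copy."""
    return {
        name: object_id
        for name, object_id in advertised.items()
        if local.get(name) != object_id
    }
```

An empty result corresponds to "I'm up to date", but the server still had to produce the full advertisement to get there, which is why frequent polling by build servers adds up.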

Avoid Polling.

SCM Cache can also cache ref advertisements

Consider shallow clones.
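
For a build that only needs the latest state, a shallow clone could be scripted like this. `--depth` and `--branch` are standard `git clone` options; the helper names and the URL in the comment are placeholders for this example.

```python
import subprocess
from typing import List, Optional


def shallow_clone_command(url: str, dest: str, depth: int = 1,
                          branch: Optional[str] = None) -> List[str]:
    """Build a `git clone` command that fetches only the last `depth` commits."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    cmd = ["git", "clone", "--depth", str(depth)]
    if branch:
        cmd += ["--branch", branch]
    cmd += [url, dest]
    return cmd


def shallow_clone(url: str, dest: str, depth: int = 1,
                  branch: Optional[str] = None) -> None:
    """Run the clone; raises CalledProcessError if git exits with an error."""
    subprocess.run(shallow_clone_command(url, dest, depth, branch), check=True)


# Hypothetical usage on a build agent:
# shallow_clone("https://stash.example.com/scm/proj/repo.git", "workdir", branch="master")
```

With less history requested, the server has a smaller packfile to generate and transfer for each build.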

5. Update

Use recent versions of

Stash Data Center

[Diagram: RDBMS, FS]

[Diagram: RDBMS, NFS]

Performance at scale

Scaling Git

• Git hosting operations are expensive.
• Properly size the hardware for the workload that you expect.
• Avoid polling when you do a lot of builds; enable caching of ref advertisements when you can't.
• Prefer shallow clones.
• Limits are in place to keep your Stash server running.
• Stash Data Center allows you to scale out and have high availability.

Sign up today!

Talk to me after if you’re interested in learning more

Stash Data Center Beta Program

Thank you!
