Dark Side of Git - We use this on my big data team

How GIT Works Internally SeongJae Park <[email protected]>

Transcript of Dark Side of Git - We use this on my big data team

Page 1: Dark Side of Git - We use this on my big data team

How GIT Works Internally

SeongJae Park <[email protected]>

Page 2: Dark Side of Git - We use this on my big data team

Nice To Meet You

SeongJae Park

[email protected]

Page 3: Dark Side of Git - We use this on my big data team

Git

DVCS (Distributed Version Control System)

http://git-scm.com/images/logos/downloads/Git-Logo-2Color.png

Page 4: Dark Side of Git - We use this on my big data team

Git

DVCS(Distributed Version Control System)

Made By Linus Torvalds For Linux

http://git-scm.com/images/logos/downloads/Git-Logo-2Color.png

http://cdn.memegenerator.net/instances/400x/37078331.jpg

Page 5: Dark Side of Git - We use this on my big data team

Git

Many Projects Use Git Because It’s Awesome

http://blog.appliedis.com/wp-content/uploads/2013/11/android1.png

http://upload.wikimedia.org/wikipedia/en/4/40/Octocat,_a_Mascot_of_Github.jpg

http://upload.wikimedia.org/wikipedia/commons/thumb/3/35/Tux.svg/512px-Tux.svg.png

http://git-scm.com/images/logos/downloads/Git-Logo-2Color.png

Page 6: Dark Side of Git - We use this on my big data team

Git

Hard To Learn

Confusing For CVCS Users

Push? Pull? Fetch? Rebase? HEAD???

http://www.quickmeme.com/img/fd/fd09e17b3393b2ea1cd7e52af1ad7c77f3c2d7a83e9f47d4b90ba3af52dde329.jpg

http://git-scm.com/images/logos/downloads/Git-Logo-2Color.png

Page 7: Dark Side of Git - We use this on my big data team

Git: The Information Manager From Hell

http://www.youblob.com/sites/default/files/styles/large/public/field/image/frontlego1.png?itok=XA5CXt84

Page 8: Dark Side of Git - We use this on my big data team

Git: The Information Manager From Hell

$ git log e83c516

commit e83c5163316f89bfbde7d9ab23ca2e25604af290

Author: Linus Torvalds <[email protected]>

Date: Thu Apr 7 15:13:13 2005 -0700

Initial revision of "git", the information manager from hell

http://www.youblob.com/sites/default/files/styles/large/public/field/image/frontlego1.png?itok=XA5CXt84

Page 9: Dark Side of Git - We use this on my big data team

Git: The Information Manager From Hell

That’s Why It Is So Confusing And Hard To Learn

$ git log e83c516

commit e83c5163316f89bfbde7d9ab23ca2e25604af290

Author: Linus Torvalds <[email protected]>

Date: Thu Apr 7 15:13:13 2005 -0700

Initial revision of "git", the information manager from hell

http://www.youblob.com/sites/default/files/styles/large/public/field/image/frontlego1.png?itok=XA5CXt84

Page 10: Dark Side of Git - We use this on my big data team

This Time, We Will...

See How Git Works From Scratch

https://lh4.googleusercontent.com/gBpfuABUjSNi2RagtJrGi8TW-pmtgak_0qtGOGubihvKH-5-umreO9CwJgjX2kaA9E7RkLwtEwiDnoMtOgm4iMJ0IWhvXlzlKL1kNVUYWuNa-gLRtRoyNjkVYg

Page 11: Dark Side of Git - We use this on my big data team

This Time, We Will...

See How Git Works From Scratch

Just For Fun... Or To Make Friends With Git

https://lh4.googleusercontent.com/gBpfuABUjSNi2RagtJrGi8TW-pmtgak_0qtGOGubihvKH-5-umreO9CwJgjX2kaA9E7RkLwtEwiDnoMtOgm4iMJ0IWhvXlzlKL1kNVUYWuNa-gLRtRoyNjkVYg

Page 12: Dark Side of Git - We use this on my big data team

This Time, We Will...

See How Git Works From Scratch

Just For Fun... Or To Make Friends With Git

Forget About The Complicated Commands This Time

https://lh4.googleusercontent.com/gBpfuABUjSNi2RagtJrGi8TW-pmtgak_0qtGOGubihvKH-5-umreO9CwJgjX2kaA9E7RkLwtEwiDnoMtOgm4iMJ0IWhvXlzlKL1kNVUYWuNa-gLRtRoyNjkVYg

Page 13: Dark Side of Git - We use this on my big data team

In Short,

Git Is A Content-Addressable Storage System

http://www.juliagiff.com/wp-content/uploads/2014/03/tldr_trollcat.jpg

Page 14: Dark Side of Git - We use this on my big data team

In Short,

Git Is A Content-Addressable Storage System

Blob, Tree, Commit, Reference. That’s It =3

http://www.juliagiff.com/wp-content/uploads/2014/03/tldr_trollcat.jpg

Page 15: Dark Side of Git - We use this on my big data team

Plumbers: The Unsung Heroes Behind The Scenes

● Git Looks Graceful Thanks To The Plumbing Commands It Is Built From

http://cfile4.uf.tistory.com/image/182FF7244CFDDFB33CC999

http://cfile29.uf.tistory.com/image/18574F224CFDD89B163073

Page 16: Dark Side of Git - We use this on my big data team

Plumbers: The Unsung Heroes Behind The Scenes

● Git Looks Graceful Thanks To The Plumbing Commands It Is Built From
  ○ The Wounded Feet Are What We Are Interested In

http://cfile4.uf.tistory.com/image/182FF7244CFDDFB33CC999

http://cfile29.uf.tistory.com/image/18574F224CFDD89B163073

Page 17: Dark Side of Git - We use this on my big data team

Again, From Scratch

VCS? Why? How?

Page 18: Dark Side of Git - We use this on my big data team

Why VCS?

Usual Life Of File

FileA ver 0 FileB ver 0

Page 19: Dark Side of Git - We use this on my big data team

Why VCS?

Usual Life Of File

FileA ver 0 FileB ver 1 (replacing FileB ver 0)

Page 20: Dark Side of Git - We use this on my big data team

Why VCS?

Usual Life Of File

FileA ver 0 FileB ver 1

Page 21: Dark Side of Git - We use this on my big data team

Why VCS?

Usual Life Of File

FileB ver 1 FileA ver 1 (replacing FileA ver 0)

Page 22: Dark Side of Git - We use this on my big data team

Why VCS?

Usual Life Of File

FileB ver 1 FileA ver 1

Page 23: Dark Side of Git - We use this on my big data team

Why VCS?

Usual Life Of File

FileB ver 2 (replacing FileB ver 1) FileA ver 1

Page 24: Dark Side of Git - We use this on my big data team

Why VCS?

Usual Life Of File

FileB ver 2 FileA ver 1

Page 26: Dark Side of Git - We use this on my big data team

We Need Version Control System

VCS Would...

Record Every Change Safely, Efficiently

Page 27: Dark Side of Git - We use this on my big data team

We Need Version Control System

VCS Would...

Record Every Change Safely, Efficiently

Be Able To Check Out Any Version

Page 28: Dark Side of Git - We use this on my big data team

We Need Version Control System

VCS Would...

Record Every Change Safely, Efficiently

Be Able To Check Out Any Version

Make History Easy To Read

Page 29: Dark Side of Git - We use this on my big data team

Brute-force Idea

Version Control Using File System

Page 30: Dark Side of Git - We use this on my big data team

Brute-force Idea

Rename / Back Up Every File Whenever A Change Is Made

Page 31: Dark Side of Git - We use this on my big data team

Brute-force Idea

Rename / Back Up Every File Whenever A Change Is Made

$ ls

foo.c

Page 32: Dark Side of Git - We use this on my big data team

Brute-force Idea

Rename / Back Up Every File Whenever A Change Is Made

$ ls

foo.c

foo_20140111.c

Page 33: Dark Side of Git - We use this on my big data team

Brute-force Idea

Rename / Back Up Every File Whenever A Change Is Made

$ ls

foo.c

foo_20140111.c

foo_final.c

Page 34: Dark Side of Git - We use this on my big data team

Brute-force Idea

Rename / Back Up Every File Whenever A Change Is Made

$ ls

foo.c

foo_20140111.c

foo_final.c

foo_realfinal.c

foo_planb.c

foo_finalfinal.c

Page 36: Dark Side of Git - We use this on my big data team

Brute-force Idea + History Isolation

Keep Working And History Directories Separate.

Page 37: Dark Side of Git - We use this on my big data team

Brute-force Idea + History Isolation

Keep Working And History Directories Separate.

Better, But...

$ find . -type f

./working/foo.c

./history/foo_20140111.c

./history/foo_final.c

./history/foo_realfinal.c

./history/foo_planb.c

./history/foo_finalfinal.c

Page 38: Dark Side of Git - We use this on my big data team

TODOs From Version Control Using FS

Use Storage Space-Efficiently

Page 39: Dark Side of Git - We use this on my big data team

TODOs From Version Control Using FS

Use Storage Space-Efficiently

Easy History Searching

Page 40: Dark Side of Git - We use this on my big data team

Mission #1: Store History Space-Efficiently

Page 41: Dark Side of Git - We use this on my big data team

Basic Idea: Avoid Duplicated Objects

Page 42: Dark Side of Git - We use this on my big data team

Basic Idea: Avoid Duplicated Objects

Content-Addressable Storage System

Page 43: Dark Side of Git - We use this on my big data team

Basic Idea: Avoid Duplicated Objects

Content-Addressable Storage System

Key: SHA-1 Hash Of Object’s Content

Value: Compressed Content

Page 44: Dark Side of Git - We use this on my big data team

Basic Idea: Avoid Duplicated Objects

Content-Addressable Storage System

Key: SHA-1 Hash Of Object’s Content

Value: Compressed Content

The Same Content Is Never Saved Twice
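The key / value idea above can be sketched in a few lines of Python. This is only a toy, in-memory illustration: the class name ContentStore and its methods are made up for this example, and it hashes the raw content only, whereas Git also hashes a small header in front of the content.

```python
import hashlib
import zlib


class ContentStore:
    """Toy content-addressable store (hypothetical, not Git's code).

    Key: SHA-1 of the content. Value: zlib-compressed content.
    """

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha1(content).hexdigest()
        # Identical content always yields the same key, so it is stored once.
        self._objects.setdefault(key, zlib.compress(content))
        return key

    def get(self, key: str) -> bytes:
        return zlib.decompress(self._objects[key])

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
k1 = store.put(b"bart\n")
k2 = store.put(b"bart\n")  # same content: same key, no second copy
k3 = store.put(b"hugo\n")
print(k1 == k2, len(store))
```

Saving b"bart\n" twice leaves two objects in the store, not three, because the second put() lands on the key that already exists.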

Page 45: Dark Side of Git - We use this on my big data team

Save / Load ‘homer’

$ mkdir simpsons; cd simpsons; git init

Initialized empty Git repository in simpsons/.git/

$ echo 'homer' | git hash-object -w --stdin

de7e45490c9a4a3b5d5fae106faa4235ec669e02

$

Page 46: Dark Side of Git - We use this on my big data team

Save / Load ‘homer’

$ mkdir simpsons; cd simpsons; git init

Initialized empty Git repository in simpsons/.git/

$ echo 'homer' | git hash-object -w --stdin

de7e45490c9a4a3b5d5fae106faa4235ec669e02

$ find .git/objects/ -type f

.git/objects/de/7e45490c9a4a3b5d5fae106faa4235ec669e02

$

Page 47: Dark Side of Git - We use this on my big data team

Save / Load ‘homer’

$ mkdir simpsons; cd simpsons; git init

Initialized empty Git repository in simpsons/.git/

$ echo 'homer' | git hash-object -w --stdin

de7e45490c9a4a3b5d5fae106faa4235ec669e02

$ find .git/objects/ -type f

.git/objects/de/7e45490c9a4a3b5d5fae106faa4235ec669e02

$ git cat-file -p de7e4

homer

$ git cat-file -t de7e4

blob

Page 48: Dark Side of Git - We use this on my big data team

What `hash-object -w` did

hash_object_w(b'homer\n')

Page 49: Dark Side of Git - We use this on my big data team

What `hash-object -w` did

import hashlib

# Save compressed header + content at sha1 path
def hash_object_w(content):
    header = b'blob %d\0' % len(content)
    store = header + content
    sha1 = hashlib.sha1(store).hexdigest()

hash_object_w(b'homer\n')

Page 50: Dark Side of Git - We use this on my big data team

What `hash-object -w` did

import hashlib

# Save compressed header + content at sha1 path
def hash_object_w(content):
    header = b'blob %d\0' % len(content)
    store = header + content
    sha1 = hashlib.sha1(store).hexdigest()
    # First two hex digits name the directory, the rest names the file
    obj_dir = '.git/objects/' + sha1[0:2] + '/'
    filename = sha1[2:]

hash_object_w(b'homer\n')

Page 51: Dark Side of Git - We use this on my big data team

What `hash-object -w` did

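import hashlib
import os
import zlib

# Save compressed header + content at sha1 path
def hash_object_w(content):
    header = b'blob %d\0' % len(content)
    store = header + content
    sha1 = hashlib.sha1(store).hexdigest()
    # First two hex digits name the directory, the rest names the file
    obj_dir = '.git/objects/' + sha1[0:2] + '/'
    filename = sha1[2:]
    os.makedirs(obj_dir, exist_ok=True)
    with open(obj_dir + filename, 'wb') as f:
        f.write(zlib.compress(store))
    return sha1

hash_object_w(b'homer\n')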
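The reverse direction, roughly what `cat-file -t` and `cat-file -p` do for a blob, can be sketched the same way. The function names write_blob and read_object are made up for this example, a temporary directory stands in for .git/objects, and the full 40-character hash is required (real Git also accepts unique prefixes such as de7e4).

```python
import hashlib
import os
import tempfile
import zlib


def write_blob(objects_dir: str, content: bytes) -> str:
    """Store 'blob <size>\\0<content>' zlib-compressed under its SHA-1."""
    store = b"blob %d\x00" % len(content) + content
    sha1 = hashlib.sha1(store).hexdigest()
    path = os.path.join(objects_dir, sha1[:2], sha1[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return sha1


def read_object(objects_dir: str, sha1: str):
    """Return (type, content) for an object written by write_blob."""
    path = os.path.join(objects_dir, sha1[:2], sha1[2:])
    with open(path, "rb") as f:
        store = zlib.decompress(f.read())
    header, _, content = store.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return obj_type.decode(), content


objects_dir = tempfile.mkdtemp()  # stand-in for .git/objects
homer = write_blob(objects_dir, b"homer\n")
print(read_object(objects_dir, homer))
```

Because the key is derived from the header plus the content, an empty blob always gets the well-known ID e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, in any repository.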

Page 52: Dark Side of Git - We use this on my big data team

Version Control Using Hash Value

$ echo "bart" > son

$ git hash-object -w son

e00ddae83bdab443f4267426623aa34636c935f2

$

Page 53: Dark Side of Git - We use this on my big data team

Version Control Using Hash Value

$ echo "bart" > son

$ git hash-object -w son

e00ddae83bdab443f4267426623aa34636c935f2

$ echo "hugo" > son

$ git hash-object -w son

8e1e2f09585e021c9727585af72e10871d7be7ce

$

Page 54: Dark Side of Git - We use this on my big data team

Version Control Using Hash Value

$ echo "bart" > son

$ git hash-object -w son

e00ddae83bdab443f4267426623aa34636c935f2

$ echo "hugo" > son

$ git hash-object -w son

8e1e2f09585e021c9727585af72e10871d7be7ce

$

# Need the former version, "bart"

$ git cat-file -p e00dd > son

$ cat son

bart

Page 55: Dark Side of Git - We use this on my big data team

TODOs From Version Control Using FS

Use Storage Space-Efficiently

Easy History Searching

Page 56: Dark Side of Git - We use this on my big data team

Version Control Using Hash Value

● DONE
  ○ Efficient Space Usage
  ○ Safe Record / Checkout Of History

https://www.sciencenews.org/sites/default/files/main/articles/sad_opener.jpg

Page 57: Dark Side of Git - We use this on my big data team

Version Control Using Hash Value

● DONE
  ○ Efficient Space Usage
  ○ Safe Record / Checkout Of History

● TODO
  ○ Support Directory Structure
  ○ History Management
  ○ Better Reference Than Hash Value

https://www.sciencenews.org/sites/default/files/main/articles/sad_opener.jpg

Page 58: Dark Side of Git - We use this on my big data team

WAIT!

Q: What If There Is A Small Change Inside A Big File?

Page 59: Dark Side of Git - We use this on my big data team

WAIT!

Q: What If There Is A Small Change Inside A Big File?

$ du -h bigfile.c

188K bigfile.c

$ du -sh

408K .

$ echo '/* small change */' >> bigfile.c

$ git commit -as -m "small change, big difference"

$ du -sh

496K .

$

Page 60: Dark Side of Git - We use this on my big data team

WAIT!

Q: What If There Is A Small Change Inside A Big File?

A: Git Stores Only The Diff When Necessary

But, Don’t Forget To Keep It Small, Simple

$ du -sh

496K .

$ git gc

Counting objects: 6, done.

Delta compression using up to 4 threads.

Compressing objects: 100% (4/4), done.

Writing objects: 100% (6/6), done.

Total 6 (delta 1), reused 0 (delta 0)

$ du -sh

388K .
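Why storing a diff saves space can be illustrated with a deliberately naive delta. This is not Git's pack format (Git's deltas are built from copy and insert instructions against a base object); make_delta and apply_delta are hypothetical helpers that only handle a shared prefix, which happens to fit the "append one line" case from the slide.

```python
def make_delta(old: bytes, new: bytes):
    """Return (n, suffix) such that new == old[:n] + suffix."""
    n = 0
    limit = min(len(old), len(new))
    while n < limit and old[n] == new[n]:
        n += 1
    return n, new[n:]


def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new version from the old one plus the delta."""
    n, suffix = delta
    return old[:n] + suffix


old = b"int x;\n" * 20000            # stand-in for a big file
new = old + b"/* small change */\n"  # the small change from the slide

delta = make_delta(old, new)
print(len(new), len(delta[1]))
```

The second version can be rebuilt from the first plus a delta of a few bytes, instead of keeping another full copy of the big file.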

Page 61: Dark Side of Git - We use this on my big data team

Mission #2: Store History Of Directories

Page 62: Dark Side of Git - We use this on my big data team

tree Object

Points To Other Objects (By Hash) With Names

Page 63: Dark Side of Git - We use this on my big data team

tree Object

Points To Other Objects (By Hash) With Names

tree (root)
  blob a113f2 mommy
  blob b8934 son
  tree c9240 pets
    blob d9b13 cat

Page 64: Dark Side of Git - We use this on my big data team

tree Object

Points To Other Objects (By Hash) With Names

“A Root tree Object Is A Snapshot”

tree (root) <- I’m a snapshot
  blob a113f2 mommy
  blob b8934 son
  tree c9240 pets
    blob d9b13 cat

Page 65: Dark Side of Git - We use this on my big data team

tree object

$ mkdir pets; echo 'snowball' > pets/cat

$ git update-index --add son pets/cat

$ git write-tree

15ee76ed3e744b6796950d07f26283d033ea3ea7

$

Page 66: Dark Side of Git - We use this on my big data team

tree object

$ mkdir pets; echo 'snowball' > pets/cat

$ git update-index --add son pets/cat

$ git write-tree

15ee76ed3e744b6796950d07f26283d033ea3ea7

$ git cat-file -p 15ee7

040000 tree 85ab72cf1946dc56392718a1aafb3c6f66c02072 pets

100644 blob 8e1e2f09585e021c9727585af72e10871d7be7ce son

$

Page 67: Dark Side of Git - We use this on my big data team

tree object

$ mkdir pets; echo 'snowball' > pets/cat

$ git update-index --add son pets/cat

$ git write-tree

15ee76ed3e744b6796950d07f26283d033ea3ea7

$ git cat-file -p 15ee7

040000 tree 85ab72cf1946dc56392718a1aafb3c6f66c02072 pets

100644 blob 8e1e2f09585e021c9727585af72e10871d7be7ce son

$ git cat-file -p 85ab7

100644 blob 6a1f952e1baedcb3db93a3ea5e3389e5a87941e9 cat

$ git cat-file -p 6a1f9

snowball

$
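How a tree object gets its own hash can be sketched along the same lines as a blob. The layout assumed here is one "<mode> <name>\0<20 raw SHA-1 bytes>" record per entry, sorted by name with directories compared as if their name ended in "/", and with the directory mode written as 40000 in the raw object (cat-file -p pads it to 040000). The helper names object_id and tree_body are made up for this example.

```python
import hashlib


def object_id(obj_type: bytes, body: bytes) -> str:
    """SHA-1 over '<type> <size>\\0<body>', as for blobs."""
    header = obj_type + b" %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """entries: iterable of (mode, name, sha1_hex).

    mode is '100644' for a regular file and '40000' for a sub-tree.
    """
    def sort_key(entry):
        mode, name, _ = entry
        # Directories sort as if their name had a trailing slash.
        return (name + "/").encode() if mode == "40000" else name.encode()

    body = b""
    for mode, name, sha1_hex in sorted(entries, key=sort_key):
        body += mode.encode() + b" " + name.encode() + b"\x00"
        body += bytes.fromhex(sha1_hex)
    return body


pets = object_id(b"tree", tree_body([
    ("100644", "cat", "6a1f952e1baedcb3db93a3ea5e3389e5a87941e9"),
]))
root = object_id(b"tree", tree_body([
    ("40000", "pets", pets),
    ("100644", "son", "8e1e2f09585e021c9727585af72e10871d7be7ce"),
]))
print(pets)
print(root)
```

If the layout assumptions hold, the two printed IDs should correspond to the pets tree and the root tree from the slide. An empty tree, in any repository, hashes to 4b825dc642cb6eb9a060e54bf8d69288fbee4904.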

Page 68: Dark Side of Git - We use this on my big data team

Internal Data Structure

tree (root)
  blob 8e1e2 son
  tree 85ab7 pets

Page 69: Dark Side of Git - We use this on my big data team

Internal Data Structure

tree (root)
  blob 8e1e2 son
  tree 85ab7 pets
    blob 6a1f9 cat

Page 70: Dark Side of Git - We use this on my big data team

Version Control Using tree Object

$ echo “bart” > son

$ git update-index --add son

$ git write-tree

661e6ad514a7f05c46c2931280cb78a339d34ee2

$

Page 71: Dark Side of Git - We use this on my big data team

Version Control Using tree Object

$ echo “bart” > son

$ git update-index --add son

$ git write-tree

661e6ad514a7f05c46c2931280cb78a339d34ee2

$ git cat-file -p 661e6

040000 tree 85ab72cf1946dc56392718a1aafb3c6f66c02072 pets

100644 blob e00ddae83bdab443f4267426623aa34636c935f2 son

$

Page 72: Dark Side of Git - We use this on my big data team

Version Control Using tree Object

$ echo “bart” > son

$ git update-index --add son

$ git write-tree

661e6ad514a7f05c46c2931280cb78a339d34ee2

$ git cat-file -p 661e6

040000 tree 85ab72cf1946dc56392718a1aafb3c6f66c02072 pets

100644 blob e00ddae83bdab443f4267426623aa34636c935f2 son

$ git cat-file -p e00dd

bart

$

Page 73: Dark Side of Git - We use this on my big data team

Internal Data Structure

tree (root)
  blob 8e1e2 son
  tree 85ab7 pets
    blob 6a1f9 cat

Page 74: Dark Side of Git - We use this on my big data team

Internal Data Structure

tree (root)
  blob 8e1e2 son
  tree 85ab7 pets
    blob 6a1f9 cat

tree (new root)
  blob e00dd son
  tree 85ab7 pets (the same object as above, shared)

Page 75: Dark Side of Git - We use this on my big data team

Version Control Using Hash Value

● DONE
  ○ Efficient Space Usage
  ○ Safe Record / Checkout Of History

● TODO
  ○ Support Directory Structure
  ○ History Management
  ○ Better Reference Than Hash Value

https://www.sciencenews.org/sites/default/files/main/articles/sad_opener.jpg

Page 76: Dark Side of Git - We use this on my big data team

Version Control Using tree Object

● DONE
  ○ Efficient Space Usage
  ○ Safe Record / Checkout Of History
  ○ Support Directory Structure

● TODO
  ○ History Management
  ○ Better Reference Than Hash Value

https://www.sciencenews.org/sites/default/files/main/articles/sad_opener.jpg

Page 77: Dark Side of Git - We use this on my big data team

Mission #3: Commit Message

Page 78: Dark Side of Git - We use this on my big data team

commit Object

Describes Who / When / Why The Change Was Made

http://modthink.com/wp-content/uploads/2013/05/WhoWhatWhenWhereWHY.jpg

Page 79: Dark Side of Git - We use this on my big data team

commit Object

Describes Who / When / Why The Change Was Made

Points To A tree Object, Along With The Information Above

http://modthink.com/wp-content/uploads/2013/05/WhoWhatWhenWhereWHY.jpg

Page 80: Dark Side of Git - We use this on my big data team

commit Object

$ echo '1st commit' | git commit-tree 661e6

0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0

$

Page 81: Dark Side of Git - We use this on my big data team

commit Object

$ echo '1st commit' | git commit-tree 661e6

0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0

$

$ git cat-file -p 0ca73

tree 661e6ad514a7f05c46c2931280cb78a339d34ee2

author SeongJae Park <s**@gmail.com> 1410527921 +0900

committer SeongJae Park <s**@gmail.com> 1410527921 +0900

1st commit

$

Page 82: Dark Side of Git - We use this on my big data team

commit Object

$ echo '1st commit' | git commit-tree 661e6

0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0

$

$ git cat-file -p 0ca73

tree 661e6ad514a7f05c46c2931280cb78a339d34ee2

author SeongJae Park <s**@gmail.com> 1410527921 +0900

committer SeongJae Park <s**@gmail.com> 1410527921 +0900

1st commit

$

Who: author / committer name. When: timestamp. Why: commit message.

Page 83: Dark Side of Git - We use this on my big data team

Version Control Using commit Object

$ echo '2nd commit' | git commit-tree 15ee7 -p 0ca73

003b5e66caa89a6228c7b4d91e0475e56bf1bdf6

$

$ git cat-file -p 003b5

tree 15ee76ed3e744b6796950d07f26283d033ea3ea7

parent 0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0

author SeongJae Park <s**@gmail.com> 1410528231 +0900

committer SeongJae Park <s**@gmail.com> 1410528231 +0900

2nd commit

$
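The text that cat-file prints above is essentially the whole commit object, so hashing one can be sketched like this. The helper names commit_body and commit_id and the identity "Homer Simpson <homer@example.com>" are made up for this example (the real address is masked on the slide), so the resulting ID is not expected to match 003b5.

```python
import hashlib


def commit_body(tree, parents, identity, timestamp, tz, message) -> bytes:
    """Build the text of a commit object: tree, parents, who, when, why."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    lines.append("author %s %d %s" % (identity, timestamp, tz))
    lines.append("committer %s %d %s" % (identity, timestamp, tz))
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


def commit_id(body: bytes) -> str:
    """SHA-1 over 'commit <size>\\0<body>'."""
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


body = commit_body(
    tree="15ee76ed3e744b6796950d07f26283d033ea3ea7",
    parents=["0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0"],
    identity="Homer Simpson <homer@example.com>",  # hypothetical identity
    timestamp=1410528231,
    tz="+0900",
    message="2nd commit",
)
print(body.decode())
print(commit_id(body))
```

Since the parent's hash is part of the hashed text, changing anything in an earlier commit changes the ID of every commit after it.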

Page 84: Dark Side of Git - We use this on my big data team

Internal Data Structure

That’s Why People Say, “A Commit Is A Snapshot”

commit (1st)
  tree
    blob e00dd son
    tree 85ab7 pets
      blob 6a1f9 cat

commit (2nd), parent: 1st commit
  tree
    blob 8e1e2 son
    tree 85ab7 pets (the same object, shared)

Page 85: Dark Side of Git - We use this on my big data team

Version Control Using tree Object

● DONE
  ○ Efficient Space Usage
  ○ Safe Record / Checkout Of History
  ○ Support Directory Structure

● TODO
  ○ History Management
  ○ Better Reference Than Hash Value

https://www.sciencenews.org/sites/default/files/main/articles/sad_opener.jpg

Page 86: Dark Side of Git - We use this on my big data team

Version Control Using commit Object

● DONE
  ○ Efficient Space Usage
  ○ Safe Record / Checkout Of History
  ○ Support Directory Structure
  ○ Manage History Well

● TODO
  ○ Better Reference Than Hash Value

https://www.sciencenews.org/sites/default/files/main/articles/sad_opener.jpg

Page 87: Dark Side of Git - We use this on my big data team

Mission #4: Human-Readable Name

Page 88: Dark Side of Git - We use this on my big data team

Git References

File With Human-Readable Name

Page 89: Dark Side of Git - We use this on my big data team

Git References

File With Human-Readable Name

Storing SHA-1 Value Of commit Object

Page 90: Dark Side of Git - We use this on my big data team

Git References

File With Human-Readable Name

Storing SHA-1 Value Of commit Object

Resides In .git/refs/

Page 91: Dark Side of Git - We use this on my big data team

Git References Using echo

$ echo "0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0" > .git/refs/heads/first

$

Page 92: Dark Side of Git - We use this on my big data team

Git References Using echo

$ echo "0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0" > .git/refs/heads/first

$

$ git log --pretty=oneline first

0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0 1st commit

$

Page 93: Dark Side of Git - We use this on my big data team

Git References Using echo

$ echo "0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0" > .git/refs/heads/first

$

$ git log --pretty=oneline first

0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0 1st commit

$

$ find .git/refs/heads -type f

.git/refs/heads/first

.git/refs/heads/master

$

Page 94: Dark Side of Git - We use this on my big data team

Git References Using update-ref

$ git update-ref refs/heads/master 003b5

$ git log --pretty=oneline master

003b5e66caa89a6228c7b4d91e0475e56bf1bdf6 2nd commit

0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0 1st commit

$

Page 95: Dark Side of Git - We use this on my big data team

Git References Using update-ref

$ git update-ref refs/heads/master 003b5

$ git log --pretty=oneline master

003b5e66caa89a6228c7b4d91e0475e56bf1bdf6 2nd commit

0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0 1st commit

$

$ find .git/refs/heads -type f

.git/refs/heads/first

.git/refs/heads/master

$

Page 96: Dark Side of Git - We use this on my big data team

Git References Using update-ref

$ git update-ref refs/heads/master 003b5

$ git log --pretty=oneline master

003b5e66caa89a6228c7b4d91e0475e56bf1bdf6 2nd commit

0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0 1st commit

$

$ find .git/refs/heads -type f

.git/refs/heads/first

.git/refs/heads/master

$

$ cat .git/refs/heads/master

003b5e66caa89a6228c7b4d91e0475e56bf1bdf6
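Since a reference here is just a small file holding a hash, the echo trick can be sketched in Python too. The helpers update_ref and read_ref are made up for this example and a temporary directory stands in for .git; the real `git update-ref` does more than this sketch (for instance, it resolves abbreviated hashes and can also find refs stored in the packed-refs file).

```python
import os
import tempfile


def update_ref(git_dir: str, ref: str, sha1: str) -> None:
    """Write the commit hash into the file named by the ref."""
    path = os.path.join(git_dir, *ref.split("/"))
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(sha1 + "\n")


def read_ref(git_dir: str, ref: str) -> str:
    """Return the commit hash stored in the ref file."""
    with open(os.path.join(git_dir, *ref.split("/"))) as f:
        return f.read().strip()


git_dir = tempfile.mkdtemp()  # stand-in for .git
update_ref(git_dir, "refs/heads/first",
           "0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0")
update_ref(git_dir, "refs/heads/master",
           "003b5e66caa89a6228c7b4d91e0475e56bf1bdf6")
print(read_ref(git_dir, "refs/heads/master"))
```

Moving a branch is then nothing more than overwriting 41 bytes in one file.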

Page 97: Dark Side of Git - We use this on my big data team

Internal Data Structure

commit (1st)
  tree
    blob e00dd son
    tree 85ab7 pets
      blob 6a1f9 cat

commit (2nd), parent: 1st commit
  tree
    blob 8e1e2 son
    tree 85ab7 pets (the same object, shared)

Page 98: Dark Side of Git - We use this on my big data team

Internal Data Structure

refs/heads/first -> commit (1st)
  tree
    blob e00dd son
    tree 85ab7 pets
      blob 6a1f9 cat

refs/heads/master -> commit (2nd), parent: 1st commit
  tree
    blob 8e1e2 son
    tree 85ab7 pets (the same object, shared)

Page 99: Dark Side of Git - We use this on my big data team

Version Control Using commit Object

● DONE
  ○ Efficient Space Usage
  ○ Safe Record / Checkout Of History
  ○ Support Directory Structure
  ○ Manage History Well

● TODO
  ○ Better Reference Than Hash Value

https://www.sciencenews.org/sites/default/files/main/articles/sad_opener.jpg

Page 100: Dark Side of Git - We use this on my big data team

Version Control Using Reference

● DONE
  ○ Efficient Space Usage
  ○ Safe Record / Checkout Of History
  ○ Support Directory Structure
  ○ Manage History Well
  ○ Easy To Remember Specific Snapshot

● TODO
  ○ ...cooperation?

https://www.sciencenews.org/sites/default/files/main/articles/sad_opener.jpg

Page 101: Dark Side of Git - We use this on my big data team

FAQ #1

How Does Git Set Up The Working Directory?

Page 102: Dark Side of Git - We use this on my big data team

How Does Git Know The Current Commit?

Answer: HEAD

Page 103: Dark Side of Git - We use this on my big data team

How Does Git Know The Current Commit?

Answer: HEAD

HEAD Points To A Reference Using The ref Format (Not A SHA-1)

Page 104: Dark Side of Git - We use this on my big data team

How Does Git Know The Current Commit?

Answer: HEAD

HEAD Points To A Reference Using The ref Format (Not A SHA-1)

$ cat .git/HEAD

ref: refs/heads/master

Page 105: Dark Side of Git - We use this on my big data team

HEAD

$ cat .git/HEAD

ref: refs/heads/master

$

Page 106: Dark Side of Git - We use this on my big data team

HEAD

$ cat .git/HEAD

ref: refs/heads/master

$ git branch

first

* master

$

Page 107: Dark Side of Git - We use this on my big data team

HEAD

$ cat .git/HEAD

ref: refs/heads/master

$ git branch

first

* master

$

$ git symbolic-ref HEAD refs/heads/first

$ cat .git/HEAD

ref: refs/heads/first

$ git branch

* first

master
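Finding the current commit is therefore a two-step lookup: read HEAD, then read the reference it names. A minimal sketch follows; resolve_head is a made-up helper, a temporary directory stands in for .git, and the fallback branch covers the case where HEAD holds a bare hash instead of a "ref: " line (a detached HEAD, not shown on the slides).

```python
import os
import tempfile


def resolve_head(git_dir: str):
    """Return (ref_name, commit_sha1); ref_name is None if HEAD holds a hash."""
    with open(os.path.join(git_dir, "HEAD")) as f:
        value = f.read().strip()
    if value.startswith("ref: "):
        ref = value[len("ref: "):]
        with open(os.path.join(git_dir, *ref.split("/"))) as f:
            return ref, f.read().strip()
    return None, value


# Build a tiny stand-in for .git with two branches, HEAD on 'first'.
git_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(git_dir, "refs", "heads"))
branches = {
    "first": "0ca7304ad6f5a40f8a26ba05b10b514ff2d8d8a0",
    "master": "003b5e66caa89a6228c7b4d91e0475e56bf1bdf6",
}
for name, sha1 in branches.items():
    with open(os.path.join(git_dir, "refs", "heads", name), "w") as f:
        f.write(sha1 + "\n")
with open(os.path.join(git_dir, "HEAD"), "w") as f:
    f.write("ref: refs/heads/first\n")

print(resolve_head(git_dir))
```

Rewriting the HEAD file, as `git symbolic-ref` did above, changes the answer without touching any object or branch file.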

Page 108: Dark Side of Git - We use this on my big data team

Internal Data Structure

refs/heads/first -> commit (1st)
  tree
    blob e00dd son
    tree 85ab7 pets
      blob 6a1f9 cat

refs/heads/master -> commit (2nd), parent: 1st commit
  tree
    blob 8e1e2 son
    tree 85ab7 pets (the same object, shared)

Page 109: Dark Side of Git - We use this on my big data team

Internal Data Structure

.git/HEAD -> refs/heads/first -> commit (1st)
  tree
    blob e00dd son
    tree 85ab7 pets
      blob 6a1f9 cat

refs/heads/master -> commit (2nd), parent: 1st commit
  tree
    blob 8e1e2 son
    tree 85ab7 pets (the same object, shared)

Page 110: Dark Side of Git - We use this on my big data team

FAQ #2

Cloned. Now Fetch Or Pull?

Page 111: Dark Side of Git - We use this on my big data team

Fetch / Pull

Fetch Or Pull To Get The Latest Code?

Page 112: Dark Side of Git - We use this on my big data team

Fetch

● Just Fetches The Remote Repository’s Objects And References Into Local Git Internal Storage

Page 113: Dark Side of Git - We use this on my big data team

Fetch

● Just Fetches The Remote Repository’s Objects And References Into Local Git Internal Storage

● If You Need The Changes In Your Working Directory,

Page 114: Dark Side of Git - We use this on my big data team

Fetch

● Just Fetches The Remote Repository’s Objects And References Into Local Git Internal Storage

● If You Need The Changes In Your Working Directory,
  ○ Manually Merge Them Using git-merge Or,
  ○ Check Them Out

Page 115: Dark Side of Git - We use this on my big data team

Fetch

Refspec Describes Source / Destination

$ cat .git/config | grep remote -A3

[remote "origin"]

url = git://10.0.0.1/git/simpsons.git

fetch = +refs/heads/*:refs/remotes/origin/*

Source: refs/heads/* Destination: refs/remotes/origin/*
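How a refspec maps a remote reference name to a local one can be sketched as plain string matching. parse_refspec and map_ref are made-up helpers that handle only a single "*" wildcard, which is all the refspec above needs; the leading "+" asks Git to update the destination even when the update is not a fast-forward.

```python
def parse_refspec(refspec: str):
    """Split '+<src>:<dst>' into (force, src, dst)."""
    force = refspec.startswith("+")
    if force:
        refspec = refspec[1:]
    src, dst = refspec.split(":", 1)
    return force, src, dst


def map_ref(refspec: str, remote_ref: str):
    """Return the local ref name for remote_ref, or None if it does not match."""
    _, src, dst = parse_refspec(refspec)
    if "*" not in src:
        return dst if remote_ref == src else None
    prefix, suffix = src.split("*", 1)
    if len(remote_ref) < len(prefix) + len(suffix):
        return None
    if not (remote_ref.startswith(prefix) and remote_ref.endswith(suffix)):
        return None
    matched = remote_ref[len(prefix):len(remote_ref) - len(suffix)]
    return dst.replace("*", matched, 1)


spec = "+refs/heads/*:refs/remotes/origin/*"
print(map_ref(spec, "refs/heads/master"))
```

With this refspec the remote's refs/heads/master lands in refs/remotes/origin/master, so a fetch never overwrites the local refs/heads/master.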

Page 116: Dark Side of Git - We use this on my big data team

Fetch: Before

url = git://10.0.0.1/git/simpsons.git

fetch = +refs/heads/*:refs/remotes/origin/*

Remote: git://10.0.0.1/git/simpsons.git (refs/heads/master, .git/HEAD)
Two commits, linked by a parent pointer:
  commit
    tree
      blob a134f son
      tree 799cf pets
        blob 7cc07 cat
  commit
    tree
      blob 65464 son
      tree 799cf pets (shared)

Local: file:///home/sjpark/simpsons (refs/heads/master, .git/HEAD)
One commit:
  commit
    tree
      blob a134f son
      tree 799cf pets
        blob 7cc07 cat

Page 117: Dark Side of Git - We use this on my big data team

Fetch: After

url = git://10.0.0.1/git/simpsons.git

fetch = +refs/heads/*:refs/remotes/origin/*

Remote: git://10.0.0.1/git/simpsons.git (refs/heads/master, .git/HEAD)
Two commits, linked by a parent pointer:
  commit
    tree
      blob a134f son
      tree 799cf pets
        blob 7cc07 cat
  commit
    tree
      blob 65464 son
      tree 799cf pets (shared)

Local: file:///home/sjpark/simpsons (refs/remotes/origin/master, refs/heads/master, .git/HEAD)
Now the same two commits, linked by a parent pointer:
  commit
    tree
      blob a134f son
      tree 799cf pets
        blob 7cc07 cat
  commit
    tree
      blob 65464 son
      tree 799cf pets (shared)

Page 118: Dark Side of Git - We use this on my big data team

git merge origin/master

[Diagram: the local repository before and after git merge origin/master. Both panels show the same two commits, linked by a parent pointer, together with refs/remotes/origin/master, refs/heads/first and .git/HEAD.]

  commit
    tree
      blob a134f son
      tree 799cf pets
        blob 7cc07 cat
  commit
    tree
      blob 65464 son
      tree 799cf pets (shared)

Page 119: Dark Side of Git - We use this on my big data team

Pull

Pull Is Just Shorthand For Fetch && Merge

Merge Conflicts May Occur…

Pull Is Sufficient For Simple Projects

Page 120: Dark Side of Git - We use this on my big data team

Wrap-up

Page 121: Dark Side of Git - We use this on my big data team

In Short,

Git Is A Content-Addressable File System

Blob, Tree, Commit, Reference. That’s It =3

http://www.juliagiff.com/wp-content/uploads/2014/03/tldr_trollcat.jpg

Page 122: Dark Side of Git - We use this on my big data team

Thank you :)

http://jeancharpentier.files.wordpress.com/2012/02/capture-plein-c3a9cran-01022012-230955.jpg

Page 125: Dark Side of Git - We use this on my big data team

This work by SeongJae Park is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported

License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/.