Blob sync. Optimized updating of blobs on Azure

SAVE BANDWIDTH (AND LEARN TO LOVE BLOBS)

description

Optimize the way you update blobs on Azure Blob Storage. Only upload/download the deltas instead of wasting your bandwidth.

Transcript of Blob sync. Optimized updating of blobs on Azure

Page 1: Blob sync. Optimized updating of blobs on Azure

SAVE BANDWIDTH (AND LEARN TO LOVE BLOBS)

Page 2: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

• DURABLE

• HIGHLY AVAILABLE

• ACCESS ANYWHERE (WITH CREDENTIALS)

• SCALABLE

Page 3: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

CHEAP!!

Page 4: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101: BLOB BASICS

• USE BLOCKS OF DATA TO CONSTRUCT BLOB

• REPLACE BLOCKS IN EXISTING BLOBS
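
The two bullets above can be pictured with a toy model. This is a minimal in-memory sketch, not the Azure SDK: the class `ToyBlockBlob` and its methods are hypothetical names that only loosely mirror the idea of staging blocks and then committing an ordered block list.

```python
class ToyBlockBlob:
    """In-memory stand-in for a block blob (hypothetical, for illustration only)."""

    def __init__(self):
        self.blocks = {}      # block ID -> bytes that were staged
        self.block_list = []  # ordered block IDs that currently make up the blob

    def put_block(self, block_id, data):
        # Stage a block; the blob content does not change until commit().
        self.blocks[block_id] = data

    def commit(self, block_list):
        missing = [b for b in block_list if b not in self.blocks]
        if missing:
            raise KeyError(f"unknown block IDs: {missing}")
        self.block_list = list(block_list)

    def read(self):
        return b"".join(self.blocks[b] for b in self.block_list)


blob = ToyBlockBlob()
for block_id, data in [("a", b"AAAA"), ("b", b"BBBB"), ("c", b"CCCC")]:
    blob.put_block(block_id, data)
blob.commit(["a", "b", "c"])

# Replace only the middle block: one small upload, then a new block list.
blob.put_block("b2", b"bbbb")
blob.commit(["a", "b2", "c"])
print(blob.read())  # b'AAAAbbbbCCCC'
```

Only the replaced block crosses the wire in this model; blocks "a" and "c" are reused by ID.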

Page 5: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

Page 6: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

Page 7: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

Page 8: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

UPLOAD ENTIRE BLOB AGAIN

Page 9: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

UPLOAD ENTIRE BLOB AGAIN

Page 10: Blob sync. Optimized updating of blobs on Azure

WHY?

Page 11: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

TRY AGAIN

Page 12: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

Page 13: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

Page 14: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

Page 15: Blob sync. Optimized updating of blobs on Azure

CLOUD STORAGE 101

UPLOAD SINGLE BLOCK
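
Uploading a single block depends on knowing which block changed. A minimal sketch, assuming fixed-size blocks that are compared by MD5 at the same offsets in the old and new data; the block size of 4 bytes and the function names are illustrative assumptions, not taken from BlobSync.

```python
import hashlib

BLOCK_SIZE = 4  # tiny block size, for illustration only


def split_blocks(data, size=BLOCK_SIZE):
    return [data[i:i + size] for i in range(0, len(data), size)]


def changed_block_indexes(old, new, size=BLOCK_SIZE):
    """Indexes of blocks in `new` that differ from the block at the same offset in `old`."""
    old_hashes = [hashlib.md5(b).digest() for b in split_blocks(old, size)]
    changed = []
    for i, block in enumerate(split_blocks(new, size)):
        if i >= len(old_hashes) or hashlib.md5(block).digest() != old_hashes[i]:
            changed.append(i)
    return changed


old = b"AAAABBBBCCCCDDDD"
new = b"AAAAbbbbCCCCDDDD"  # the second block was modified in place
print(changed_block_indexes(old, new))  # [1]
```

This comparison only works while every unchanged block stays at its original offset.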

Page 16: Blob sync. Optimized updating of blobs on Azure

BLOBSYNC AWESOMESAUCE

• DETECTS CHANGES

• DOES NOT NEED ORIGINAL FILE TO DETECT CHANGES

• UPLOADS/DOWNLOADS CHANGES ONLY

• A TRANSPARENT BLACK BOX: OPEN SOURCE, BUT CAN BE TREATED AS A BLACK BOX

Page 17: Blob sync. Optimized updating of blobs on Azure

Page 18: Blob sync. Optimized updating of blobs on Azure

THEORY VS REALITY

• THEORY

Azure Blob Storage

Local machine

Page 19: Blob sync. Optimized updating of blobs on Azure

THEORY VS REALITY

• THEORY

[Diagram: Azure Blob Storage and Local machine; offsets 0, 100, 200, 300, 400]

Page 20: Blob sync. Optimized updating of blobs on Azure

THEORY VS REALITY

• THEORY

[Diagram: Azure Blob Storage and Local machine; offsets 0, 100, 200, 300, 400]

Page 21: Blob sync. Optimized updating of blobs on Azure

THEORY VS REALITY

• THEORY

[Diagram: Azure Blob Storage and Local machine; offsets 0, 100, 200, 300, 400]

Page 22: Blob sync. Optimized updating of blobs on Azure

THEORY….

• IS ALL GOOD IN THEORY

Page 23: Blob sync. Optimized updating of blobs on Azure

THEORY VS REALITY

• REALITY

[Diagram: Azure Blob Storage and Local machine; offsets 0, 100, 200, 300, 400]

Page 24: Blob sync. Optimized updating of blobs on Azure

THEORY VS REALITY

• REALITY

[Diagram: Azure Blob Storage holds blocks A B C D; Local machine holds blocks A B’ C D; offsets 0, 100, 200, 300, 400]

Page 25: Blob sync. Optimized updating of blobs on Azure

FINDING COMMON GROUND

• HOW DO WE FIND MOVED BLOCKS?

Page 26: Blob sync. Optimized updating of blobs on Azure

FINDING COMMON GROUND

• HOW DO WE FIND MOVED BLOCKS?

• USE A HASH/SIGNATURE FOR EACH BLOCK

• SEARCH FOR EACH SIGNATURE THROUGHOUT THE FILE
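
The search described above can be sketched naively: hash every block-sized window of the local file and compare it with the signature of a known block. This is an illustration under assumptions (MD5 as the signature, 4-byte blocks, the hypothetical helper `find_block`), not BlobSync's actual code.

```python
import hashlib


def find_block(signature, data, block_size):
    """Return (offset, hashes_computed) for the first window whose MD5 equals
    `signature`, or (None, hashes_computed) if no window matches."""
    computed = 0
    for offset in range(len(data) - block_size + 1):
        computed += 1
        if hashlib.md5(data[offset:offset + block_size]).digest() == signature:
            return offset, computed
    return None, computed


remote = b"AAAABBBBCCCCDDDD"    # blocks A, B, C, D of 4 bytes each
local = b"AAAAbbbbbbCCCCDDDD"   # B was replaced by a longer B', so C and D moved
sig_c = hashlib.md5(remote[8:12]).digest()
print(find_block(sig_c, local, 4))  # (10, 11)
```

Block C is found at offset 10 instead of 8, at the cost of one full hash per offset tried.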

Page 27: Blob sync. Optimized updating of blobs on Azure

THEORY VS REALITY

• SEARCH LOCAL

[Diagram: Azure Blob Storage holds blocks A B C D; Local machine holds blocks A B’ C D; offsets 0, 100, 200, 300, 400]

Page 28: Blob sync. Optimized updating of blobs on Azure

THEORY VS REALITY

• SEARCH LOCAL

• E.G. SEARCH FOR ‘C’

[Diagram: Local machine holds blocks A B’ C D; offsets 0, 100, 200, 300, 400]

Page 29: Blob sync. Optimized updating of blobs on Azure

SUCCESS!

• CAN NOW FIND BLOCKS EVEN WHEN MOVED

Page 30: Blob sync. Optimized updating of blobs on Azure

SUCCESS!

• CAN NOW FIND BLOCKS EVEN WHEN MOVED

• IF WE CAN FIND A BLOCK WE CAN DETERMINE IF WE CAN REUSE IT

Page 31: Blob sync. Optimized updating of blobs on Azure

SUCCESS!

• CAN NOW FIND BLOCKS EVEN WHEN MOVED

• IF WE CAN FIND A BLOCK WE CAN DETERMINE IF WE CAN REUSE IT

• BUT…….

Page 32: Blob sync. Optimized updating of blobs on Azure

SUCCESS!

• CAN NOW FIND BLOCKS EVEN WHEN MOVED

• IF WE CAN FIND A BLOCK WE CAN DETERMINE IF WE CAN REUSE IT

• BUT…….

• MD5/SHA ETC ARE TOO SLOW TO DO THIS

Page 33: Blob sync. Optimized updating of blobs on Azure

• TOO SLOW? NO WAY!

• EG

• 100MB FILE/BLOB

• BLOCK OF 100K

• > 104M HASH CALCULATIONS (ONE PER BYTE OFFSET), JUST TO FIND THAT ONE BLOCK
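
The figure above can be checked with a few lines of arithmetic. The sketch assumes binary units (1 MB = 1024 × 1024 bytes, 1 K = 1024 bytes) and a worst case in which the block is not found until the last possible offset.

```python
FILE_SIZE = 100 * 1024 * 1024  # 100 MB, assuming binary units
BLOCK_SIZE = 100 * 1024        # 100 K, assuming binary units

# One hash per possible starting offset of a block-sized window.
windows = FILE_SIZE - BLOCK_SIZE + 1
print(windows)  # 104755201

# Each of those hashes reads a full block, so the bytes hashed add up fast.
bytes_hashed = windows * BLOCK_SIZE
print(f"{bytes_hashed:.2e} bytes hashed in the worst case")
```

That is the cost of locating a single block; a 100 MB blob split into 100 K blocks has about a thousand of them.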

Page 34: Blob sync. Optimized updating of blobs on Azure

YOU HAVE TO ROLL WITH IT.

• ROLLING SIGNATURE

• EXTREMELY QUICK.

Page 35: Blob sync. Optimized updating of blobs on Azure

YOU HAVE TO ROLL WITH IT.

• ROLLING SIGNATURE

• EXTREMELY QUICK.

• DUE TO FALSE POSITIVES USE MD5/SHA AS CONFIRMATION STEP
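
A minimal sketch of the two-step check, assuming a plain byte sum as the cheap signature and MD5 as the confirmation hash. The byte sum is chosen here only because it is easy to follow; the function names are hypothetical.

```python
import hashlib


def weak_sig(block):
    # Cheap signature: a plain byte sum (an assumption made for this sketch).
    return sum(block)


def blocks_match(candidate, weak, strong):
    """Compare the cheap signature first; compute MD5 only when it agrees."""
    if weak_sig(candidate) != weak:
        return False
    return hashlib.md5(candidate).digest() == strong


target = b"abcd"
weak, strong = weak_sig(target), hashlib.md5(target).digest()
print(blocks_match(b"dcba", weak, strong))  # False: same byte sum, different MD5
print(blocks_match(b"abcd", weak, strong))  # True
```

The reversed block is a false positive for the byte sum, which is exactly what the MD5 step filters out.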

Page 36: Blob sync. Optimized updating of blobs on Azure

YOU HAVE TO ROLL WITH IT.

• SIG = FUNC( 0 .. 4 )

Page 37: Blob sync. Optimized updating of blobs on Azure

YOU HAVE TO ROLL WITH IT.

• SIG = FUNC( 0 .. 4 )

• CALCULATE SIG OF 1..5 BASED OFF OLD SIG

• NEW SIG = OLDSIG – ARRAY[0] + ARRAY[5]
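
The update rule above can be tried directly. The slides do not say what FUNC is, so this sketch assumes a plain sum over the window, which is the simplest function that satisfies the formula.

```python
def sig(window):
    # Assumed signature function: a plain sum over the window.
    return sum(window)


def roll(old_sig, outgoing, incoming):
    # Slide the window by one element without re-reading the whole window.
    return old_sig - outgoing + incoming


array = [3, 1, 4, 1, 5, 9]
old_sig = sig(array[0:5])                    # covers indexes 0..4
new_sig = roll(old_sig, array[0], array[5])  # covers indexes 1..5
print(old_sig, new_sig, sig(array[1:6]))     # 14 20 20
```

The rolled value equals the signature recomputed from scratch, but costs one subtraction and one addition regardless of window size.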

Page 38: Blob sync. Optimized updating of blobs on Azure

YOU HAVE TO ROLL WITH IT.

• CAN SEARCH THE ENTIRE FILE WITH MINIMAL CALCULATIONS, I.E. FAST!

Page 39: Blob sync. Optimized updating of blobs on Azure

SO WHAT NOW?

• CAN NOW SEARCH FILES QUICKLY FOR SIGNATURE MATCHES

• MEANS WE CAN FIGURE OUT WHAT IS COMMON BETWEEN CLOUD AND LOCAL

• CAN DOWNLOAD/UPLOAD ONLY THE DIFFERENCES.
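
Putting the pieces together, the steps above can be sketched as a small planner that scans the local file with a rolling byte sum, confirms candidates with MD5, and emits either "reuse this remote block" or "upload these bytes". All names (`make_signatures`, `plan_sync`) and the 4-byte block size are assumptions for illustration; this is not BlobSync's implementation.

```python
import hashlib


def make_signatures(data, block_size):
    """Map weak byte sum -> list of (MD5 digest, block index) for each full block."""
    sigs = {}
    for index in range(len(data) // block_size):
        block = data[index * block_size:(index + 1) * block_size]
        sigs.setdefault(sum(block), []).append((hashlib.md5(block).digest(), index))
    return sigs


def plan_sync(sigs, local, block_size):
    """Return ("reuse", remote_block_index) and ("upload", bytes) steps that rebuild `local`."""
    plan, pending, pos = [], bytearray(), 0
    weak = sum(local[:block_size])
    while pos + block_size <= len(local):
        window = local[pos:pos + block_size]
        # Only pay for MD5 when the cheap signature has a candidate.
        strong = hashlib.md5(window).digest() if weak in sigs else None
        match = next((i for d, i in sigs.get(weak, []) if d == strong), None)
        if match is not None:
            if pending:
                plan.append(("upload", bytes(pending)))
                pending = bytearray()
            plan.append(("reuse", match))
            pos += block_size
            weak = sum(local[pos:pos + block_size])  # restart the window after a match
        else:
            pending.append(local[pos])
            if pos + block_size < len(local):
                weak = weak - local[pos] + local[pos + block_size]  # roll by one byte
            pos += 1
    pending.extend(local[pos:])  # trailing bytes shorter than one block
    if pending:
        plan.append(("upload", bytes(pending)))
    return plan


remote = b"AAAABBBBCCCCDDDD"
local = b"AAAAbbbbbbCCCCDDDD"
plan = plan_sync(make_signatures(remote, 4), local, 4)
print(plan)  # [('reuse', 0), ('upload', b'bbbbbb'), ('reuse', 2), ('reuse', 3)]
```

Only the 6 changed bytes need to be uploaded in this example; the other 12 bytes are covered by blocks already in the cloud.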

Page 40: Blob sync. Optimized updating of blobs on Azure

PROVE IT!

Page 41: Blob sync. Optimized updating of blobs on Azure

FILE INTERNALS

Page 42: Blob sync. Optimized updating of blobs on Azure

FILE INTERNALS

ADD

DELETE

REPLACE

Page 43: Blob sync. Optimized updating of blobs on Azure

LIES, MORE LIES AND STATISTICS

• SMALL DB (14M).

• CLEARED A SMALL TABLE.

• UPDATE 340K

• LARGE DB (555M).

• CLEARED A SMALL TABLE

• UPDATE 720K

• VM (8G).

• DELETED SOME FILES

• UPDATE 800M
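
To put the numbers above in proportion, this sketch computes each update as a percentage of the full file size. It assumes that K, M and G are binary units and that "UPDATE" is the amount transferred; neither is stated on the slide.

```python
K, M, G = 1024, 1024 ** 2, 1024 ** 3  # assumed binary units


def percent_of_full(update_size, full_size):
    return 100 * update_size / full_size


cases = [
    ("small DB", 14 * M, 340 * K),
    ("large DB", 555 * M, 720 * K),
    ("VM", 8 * G, 800 * M),
]
for name, full_size, update_size in cases:
    print(f"{name}: {percent_of_full(update_size, full_size):.2f}% of a full upload")
# small DB: 2.37% of a full upload
# large DB: 0.13% of a full upload
# VM: 9.77% of a full upload
```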

Page 44: Blob sync. Optimized updating of blobs on Azure

UPCOMING CHANGES

• DEFRAG

• DYNAMICALLY DETERMINE BLOCK SIZE

• BETTER PARALLEL UPLOAD/DOWNLOAD

• 32 BIT VERSION

Page 45: Blob sync. Optimized updating of blobs on Azure

LINKS

• BLOG ON BLOBSYNC:

• https://kpfaulkner.wordpress.com/category/blobsync/

• NUGET PACKAGE:

• https://www.nuget.org/packages/blobsync/

• GITHUB WITH SOURCE:

• https://github.com/kpfaulkner/blobsync/