
Transcript of CS 744: GOOGLE FILE SYSTEM

Page 1: CS 744: GOOGLE FILE SYSTEM

CS 744: GOOGLE FILE SYSTEM

Shivaram Venkataraman, Fall 2021

Hello!

Page 2: CS 744: GOOGLE FILE SYSTEM

ANNOUNCEMENTS

- Assignment 1 out later today
- Group submission form
- Anybody on the waitlist?

Page 3: CS 744: GOOGLE FILE SYSTEM

OUTLINE

1. Brief history
2. GFS
3. Discussion
4. What happened next?

Page 4: CS 744: GOOGLE FILE SYSTEM

HISTORY OF DISTRIBUTED FILE SYSTEMS

Page 5: CS 744: GOOGLE FILE SYSTEM

SUN NFS

[Diagram, ~1980s: several clients issue RPCs such as read(a.txt, 0, 4096) to a single file server, which serves the requested data from its local FS on disk.]

Page 6: CS 744: GOOGLE FILE SYSTEM

/dev/sda1 on /    /dev/sdb1 on /backups    NFS on /home

[Diagram: one directory tree rooted at /, with backups (bak1, bak2, bak3), etc, bin, and home/tyler (537/p1, 537/p2, .bashrc); /home is served over NFS.]

Transparent to the user: NFS provided POSIX semantics.

Page 7: CS 744: GOOGLE FILE SYSTEM

CACHING

The client cache records the time when a data block was fetched (t1). Before using the data block, the client makes a STAT request to the server:

- gets the last-modified timestamp for this file (t2) (not the block…)
- compares it to the cache timestamp
- refetches the data block if the file changed since it was cached (t2 > t1; see the sketch below)

[Diagram: client-side caching; the client sends STAT requests to the server (local FS) to validate its cache entries, comparing fetch time t1 against last-modified time t2.]
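A minimal sketch of this validation protocol, assuming hypothetical server.stat and server.fetch calls (the real NFS protocol uses RPCs like GETATTR and READ):

    import time

    class NFSClientCache:
        """Sketch of NFS-style timestamp-based cache validation."""

        def __init__(self, server):
            self.server = server
            self.cache = {}  # filename -> (data block, fetch time t1)

        def read_block(self, filename):
            if filename in self.cache:
                block, t1 = self.cache[filename]
                t2 = self.server.stat(filename)  # last-modified time for the whole file
                if t2 <= t1:
                    return block                 # cache entry still valid
            # Refetch if the block is missing or stale (t2 > t1).
            block = self.server.fetch(filename)
            self.cache[filename] = (block, time.time())
            return block

Because the timestamp is per file rather than per block, a change anywhere in the file invalidates every cached block of that file.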

Page 8: CS 744: GOOGLE FILE SYSTEM

ANDREW FILE SYSTEM

- Design for scale

- Whole-file caching

- Callbacks from server

When you open a file in AFS, the entire file is fetched and cached at the client.
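A sketch of whole-file caching with callbacks, under assumed RPC names (fetch_whole_file, break_callback):

    class AFSClient:
        """Sketch of AFS-style whole-file caching with server callbacks."""

        def __init__(self, server):
            self.server = server
            self.cached = {}   # path -> entire file contents
            self.valid = {}    # path -> do we still hold a callback promise?

        def open(self, path):
            if not self.valid.get(path):
                # Fetch the ENTIRE file; the server records a callback for us.
                self.cached[path] = self.server.fetch_whole_file(path, client=self)
                self.valid[path] = True
            return self.cached[path]

        def break_callback(self, path):
            # Invoked by the server when another client updates the file,
            # so ordinary reads need no per-open validation traffic.
            self.valid[path] = False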

Page 9: CS 744: GOOGLE FILE SYSTEM

WORKLOAD PATTERNS (1991)

"

" "

f.les

2=44

Page 10: CS 744: GOOGLE FILE SYSTEM

OCEANSTORE / PAST

Wide area storage systems

Fully decentralized

Built on distributed hash tables (DHT)

Late 90s to early 2000s

Page 11: CS 744: GOOGLE FILE SYSTEM

GFS: WHY?

Lots of data ↳ very large files

Fault tolerance

Workload patterns: primarily append; latency was not a concern → maximize bandwidth use

Page 12: CS 744: GOOGLE FILE SYSTEM

GFS: WHY?

Components with failures. Files are huge!

Applications are different

Page 13: CS 744: GOOGLE FILE SYSTEM

GFS: WORKLOAD ASSUMPTIONS

“Modest” number of large files

Two kinds of reads: large streaming and small random

Writes: many large, sequential writes; few random writes

High bandwidth more important than low latency


Page 14: CS 744: GOOGLE FILE SYSTEM

GFS: DESIGN

- Single Master for metadata

- Chunkservers for storing data

- No POSIX API!
- No caches!

"

gfs.tn"

read part of a filea. txt 2

G -

( abcd ,@11m27)

abcd 5123

M1

☒ ☐ ☐ ☐ DD ☐

files are chunked into

fixed sire chunks
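The read path sketched in the diagram, with hypothetical master.lookup and chunkserver read calls:

    CHUNK_SIZE = 64 * 1024 * 1024  # fixed-size 64 MB chunks

    def gfs_read(master, filename, offset, length):
        """Sketch of a GFS client read: metadata from the master, data from a chunkserver."""
        # 1. Translate the byte offset into a chunk index within the file.
        chunk_index = offset // CHUNK_SIZE
        # 2. Ask the master only for metadata: the chunk handle and replica locations.
        handle, replicas = master.lookup(filename, chunk_index)
        # 3. Fetch the bytes directly from a chunkserver; the master is off the data path.
        return replicas[0].read(handle, offset % CHUNK_SIZE, length)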

Page 15: CS 744: GOOGLE FILE SYSTEM

CHUNK SIZE TRADE-OFFS

Client → Master: smaller chunks ⇒ more calls to the master.

Client → Chunkserver: too small ⇒ need to open connections to many chunkservers; too large ⇒ hotspots at chunkservers.

Metadata: smaller chunks ⇒ more metadata at the master.

GFS chose 64 MB; HDFS uses ~128 MB. Could it go even larger?
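Back-of-the-envelope arithmetic for the metadata trade-off, using the paper's bound of under 64 bytes of master metadata per chunk (1 PB of data is an assumed example):

    META_PER_CHUNK = 64      # bytes; the paper reports < 64 bytes per chunk
    DATA = 10**15            # 1 PB of file data (illustrative)

    for chunk_mb in (1, 64, 128):
        chunks = DATA // (chunk_mb * 2**20)
        meta_gib = chunks * META_PER_CHUNK / 2**30
        print(f"{chunk_mb:>3} MB chunks -> {chunks:>12,} chunks, ~{meta_gib:5.1f} GiB metadata")

    # 1 MB chunks   -> ~954M chunks, ~57 GiB of master metadata
    # 64 MB chunks  -> ~15M chunks,  ~0.9 GiB
    # 128 MB chunks -> ~7.5M chunks, ~0.4 GiB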

Page 16: CS 744: GOOGLE FILE SYSTEM

GFS: REPLICATION

- 3-way replication to handle faults
- Primary replica for each chunk
- Chain replication (consistency)
- Decouple data, control flow
- Data flow: pipelining, network-aware (see the sketch below)

[Diagram: among a chunk's replicas, a lease is given to the primary replica, which orders each mutation using the data already pushed to all replicas.]
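A sketch of the decoupled flow, with hypothetical push_data and apply_mutation calls; real GFS forwards data replica-to-replica along a network-aware chain, which the simple loop below only approximates:

    def gfs_write(data, primary, secondaries):
        """Sketch: data flow first, then control flow through the lease-holding primary."""
        # Data flow: push the bytes to every replica, which buffers them
        # (in GFS this is pipelined along a chain of nearby replicas).
        for replica in [primary] + secondaries:
            replica.push_data(data)

        # Control flow: the primary holds the chunk lease, assigns a serial
        # number to the mutation, applies it, and forwards the order to secondaries.
        serial = primary.apply_mutation()
        for secondary in secondaries:
            secondary.apply_mutation(serial)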

Page 17: CS 744: GOOGLE FILE SYSTEM

RECORD APPENDS

Write: the client specifies the offset.
Record append: GFS chooses the offset.
Consistency: a record append is applied at least once, atomically.

Example: a write(a.txt, offset, "abcd") goes to a client-chosen offset; a record append of "Wisconsin" returns [offset, OK], with the offset chosen by GFS.

→ The appended record will appear at least once in the chunk.

↳ The entire record will appear (never a fragment).
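At-least-once appends mean retries can duplicate a record. A common client-side pattern, sketched here with hypothetical record_append and scan APIs, is to tag records with unique ids so readers can de-duplicate:

    import uuid

    def append_record(gfs, path, payload, retries=3):
        """Retry until GFS acknowledges; a retry after a lost ack duplicates the record."""
        record = (uuid.uuid4().hex, payload)          # unique id for de-duplication
        for _ in range(retries):
            offset = gfs.record_append(path, record)  # GFS picks the offset
            if offset is not None:
                return offset
        raise IOError("record append failed")

    def read_records(gfs, path):
        """Readers skip duplicate record ids introduced by retries."""
        seen = set()
        for record_id, payload in gfs.scan(path):
            if record_id not in seen:
                seen.add(record_id)
                yield payload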

Page 18: CS 744: GOOGLE FILE SYSTEM

MASTER OPERATIONS

- No “directory” inode! Simplifies locking
- Replica placement considerations → load and disk utilization, failure probability
- Implementing deletes lazily → rename the file, then garbage collection (sketched below)
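A sketch of lazy deletion over an in-memory namespace dict; the three-day grace period matches the paper's default, while the hidden-name scheme here is invented:

    import time

    HIDDEN = ".deleted."
    GRACE = 3 * 24 * 3600   # GFS reclaims deleted files after ~3 days by default

    def delete_file(namespace, path):
        """Delete = rename to a hidden, timestamped name; no chunk data is touched."""
        namespace[f"{HIDDEN}{int(time.time())}.{path}"] = namespace.pop(path)

    def garbage_collect(namespace):
        """Background scan reclaims hidden files older than the grace period."""
        now = int(time.time())
        for name in list(namespace):
            if name.startswith(HIDDEN):
                deleted_at = int(name[len(HIDDEN):].split(".", 1)[0])
                if now - deleted_at > GRACE:
                    del namespace[name]  # its chunks become orphans, reclaimed later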

Page 19: CS 744: GOOGLE FILE SYSTEM

FAULT TOLERANCE

- Chunk replication with 3 replicas
- Master
  - Replication of log, checkpoint
  - Shadow master
- Data integrity using checksum blocks (sketched below)

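GFS keeps a 32-bit checksum for each 64 KB block of a chunk; a sketch using CRC32:

    import zlib

    BLOCK = 64 * 1024  # GFS keeps a 32-bit checksum per 64 KB block

    def checksum_blocks(data: bytes):
        """One CRC32 per 64 KB block of a chunk."""
        return [zlib.crc32(data[i:i + BLOCK]) for i in range(0, len(data), BLOCK)]

    def verified_read(data: bytes, checksums):
        """Chunkservers verify blocks on every read before returning data."""
        for i, expected in enumerate(checksums):
            if zlib.crc32(data[i * BLOCK:(i + 1) * BLOCK]) != expected:
                # In GFS the chunkserver reports the corruption to the master,
                # which re-replicates the chunk from a good replica.
                raise IOError(f"checksum mismatch in block {i}")
        return data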

Page 20: CS 744: GOOGLE FILE SYSTEM

DISCUSSION

https://forms.gle/YpDcxPncdqnZ7JXG6

Page 21: CS 744: GOOGLE FILE SYSTEM

GFS SOCIAL NETWORK

You are building a new social networking application. The operations you will need to perform are

(a) add a new friend id for a given user (b) generate a histogram of number of friends per user.

How will you do this using GFS as your storage system?

(a) Append a (user id, friend id) record for each new friend: appends avoid random writes and many small files.

(b) Histogram: stream through the file (see the sketch below).

Page 22: CS 744: GOOGLE FILE SYSTEM

[Board: an append-only log of (user, friend) pairs, e.g. (u1, u3), (u1, u5), (u2, u1), …; computing the histogram requires grouping the records by user.]
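One way to code up the board's answer, with hypothetical record_append and stream APIs and an assumed log path:

    from collections import Counter

    FRIENDS_LOG = "/social/friends.log"   # assumed path

    def add_friend(gfs, user_id, friend_id):
        """(a) One record append per new friendship: no random writes, no small files."""
        gfs.record_append(FRIENDS_LOG, f"{user_id},{friend_id}\n")

    def friends_histogram(gfs):
        """(b) One large streaming read: group by user, then bucket the counts."""
        per_user = Counter()
        for line in gfs.stream(FRIENDS_LOG):
            user_id, _friend = line.strip().split(",")
            per_user[user_id] += 1
        return Counter(per_user.values())  # maps #friends -> #users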

Page 23: CS 744: GOOGLE FILE SYSTEM

GFS EVAL

List your takeaways from “Table 3: Performance metrics”

- Replication lowers write throughput.
- All appends to a file go to the same chunkservers.

Page 24: CS 744: GOOGLE FILE SYSTEM

WHAT HAPPENED NEXT

Page 25: CS 744: GOOGLE FILE SYSTEM

Keynote at PDSW-DISCS 2017: 2nd Joint International Workshop On Parallel Data Storage & Data Intensive Scalable Computing Systems

Page 26: CS 744: GOOGLE FILE SYSTEM

GFS EVOLUTION

Motivation:

- GFS Master
  - One machine not large enough for a large FS
  - Single bottleneck for metadata operations (data path offloaded)
  - Fault tolerant, but not HA

- Lack of predictable performance
  - No guarantees of latency
  - (GFS problem: one slow chunkserver → slow writes)

Page 27: CS 744: GOOGLE FILE SYSTEM

GFS EVOLUTION

GFS master replaced by Colossus
Metadata stored in BigTable

Recursive structure? If metadata is ~1/10000 the size of the data:
100 PB data → 10 TB metadata
10 TB metadata → 1 GB metametadata
1 GB metametadata → 100 KB meta...
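The recursion on the slide, spelled out (the 1:10000 ratio and the ~100 KB stopping point are the slide's numbers):

    RATIO = 10_000  # metadata assumed ~1/10000 the size of the data it describes

    def metadata_levels(data_bytes):
        """Apply the ratio recursively until the 'metadata' is tiny."""
        levels = []
        while data_bytes > 100_000:   # stop near the slide's 100 KB point
            data_bytes //= RATIO
            levels.append(data_bytes)
        return levels

    # 100 PB -> 10 TB metadata -> 1 GB metametadata -> 100 KB
    print(metadata_levels(100 * 10**15))  # [10000000000000, 1000000000, 100000]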

Page 28: CS 744: GOOGLE FILE SYSTEM

GFS EVOLUTION

Need for Efficient Storage

Rebalance old, cold data

Distribute newly written data evenly across disks

Manage both SSDs and hard disks

Page 29: CS 744: GOOGLE FILE SYSTEM

Heterogeneous storage

f4 (Facebook)

Blob stores, key-value stores

Page 30: CS 744: GOOGLE FILE SYSTEM

NEXT STEPS

- Assignment 1 out tonight!
- Next up: MapReduce, Spark