Topic 11: Google Filesystem


Cloud Computing Workshop 2013, ITU



Zubair Nabi

zubair.nabi@itu.edu.pk

April 20, 2013


Outline

1 Introduction

2 Google Filesystem

3 Hadoop Distributed Filesystem


Filesystem

The purpose of a filesystem is to:

1 Organize and store data

2 Support sharing of data among users and applications

3 Ensure persistence of data after a reboot

Examples include FAT, NTFS, ext3, ext4, etc.


Distributed filesystem

Self-explanatory: the filesystem is distributed across many machines

The DFS provides a common abstraction over the dispersed files

Each DFS has an associated API that offers clients the usual file operations, such as create, read, write, etc.

Maintains a namespace which maps logical names to physical names
- Simplifies replication and migration

Examples include the Network Filesystem (NFS), the Andrew Filesystem (AFS), etc.

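To make the logical-to-physical mapping concrete, here is a minimal sketch in Java, assuming a toy namespace table (the class and method names are hypothetical, not part of any real DFS): clients resolve logical paths, while replication and migration only touch the mapping.

```java
import java.util.*;

// Toy namespace: logical file names -> physical replica locations (hypothetical example).
public class Namespace {
    private final Map<String, List<String>> table = new HashMap<>();

    // Register a logical name with the servers that hold its data.
    public void add(String logicalPath, List<String> servers) {
        table.put(logicalPath, new ArrayList<>(servers));
    }

    // Clients resolve logical names; they never need to know the physical layout.
    public List<String> resolve(String logicalPath) {
        return table.getOrDefault(logicalPath, Collections.emptyList());
    }

    // Migration or re-replication only updates the mapping, not the client-visible name.
    public void moveReplica(String logicalPath, String from, String to) {
        List<String> servers = table.get(logicalPath);
        if (servers != null && servers.remove(from)) {
            servers.add(to);
        }
    }

    public static void main(String[] args) {
        Namespace ns = new Namespace();
        ns.add("/logs/web-2013-04-20", Arrays.asList("server-a", "server-b", "server-c"));
        ns.moveReplica("/logs/web-2013-04-20", "server-a", "server-d");
        System.out.println(ns.resolve("/logs/web-2013-04-20"));  // [server-b, server-c, server-d]
    }
}
```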

Outline

1 Introduction

2 Google Filesystem

3 Hadoop Distributed Filesystem


Introduction

Designed by Google to meet its massive storage needs

Shares many goals with previous distributed filesystems, such as performance, scalability, reliability, and availability

At the same time, the design is driven by key observations of Google's workload and infrastructure, both current and future


Design Goals

1 Failure is the norm rather than the exception: GFS must constantly introspect and automatically recover from failure

2 The system stores a fair number of large files: Optimize for large files, on the order of GBs, but still support small files

3 Applications prefer to do large streaming reads of contiguous regions: Optimize for this case


Design Goals (2)

4 Most applications perform large, sequential writes that are mostly append operations: Support small writes but do not optimize for them

5 Most operations are producer-consumer queues or many-way merging: Support concurrent reads or writes by hundreds of clients simultaneously

6 Applications process data in bulk at a high rate: Favour throughput over latency


Interface

The interface is similar to that of traditional filesystems, but there is no support for a standard POSIX-like API

Files are organized hierarchically into directories and identified by pathnames

Support for create, delete, open, close, read, and write operations

In addition, GFS provides snapshot and record append operations (record append is covered later)

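As a rough illustration of such a non-POSIX interface, the Java sketch below lists the kind of operations a GFS-like client library might expose; the names and signatures are hypothetical, not the actual GFS client API.

```java
// Hypothetical sketch of a GFS-like client interface; names and signatures are illustrative only.
public interface GfsClient {
    void create(String path);                        // create a new file in the namespace
    void delete(String path);                        // remove a file from the namespace
    long open(String path);                          // returns an opaque file descriptor
    void close(long fd);
    int read(long fd, long offset, byte[] buffer);   // read into buffer starting at offset
    void write(long fd, long offset, byte[] data);   // write at an application-chosen offset
    long recordAppend(long fd, byte[] record);       // append atomically; returns the offset chosen by the system
    void snapshot(String sourcePath, String targetPath);  // low-cost copy of a file or directory tree
}
```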

Architecture

Consists of a single master and multiple chunkservers

The system can be accessed by multiple clients

Both the master and chunkservers run as user-space server processes on commodity Linux machines


Files

Files are sliced into fixed-size chunks (64 MB by default)

Each chunk is identified by an immutable and globally unique 64-bit handle

Chunks are stored by chunkservers as local Linux files

Reads and writes to a chunk are specified by a handle and a byte range

Each chunk is replicated on multiple chunkservers
- 3 replicas by default

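A minimal sketch of this chunk-oriented layout, with hypothetical types (not the actual GFS data structures): a file is an ordered list of 64-bit handles, and every read or write names a handle plus a byte range.

```java
import java.util.*;

// Illustrative model of GFS-style chunked files; all names here are hypothetical.
public class ChunkedFileModel {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;   // 64 MB chunks (GFS default)

    // A file is just an ordered list of 64-bit chunk handles.
    static final class FileEntry {
        final List<Long> chunkHandles = new ArrayList<>();
    }

    // A read or write is addressed by (chunk handle, byte range within the chunk).
    static final class ChunkRange {
        final long handle;
        final long offset;   // offset inside the chunk
        final long length;
        ChunkRange(long handle, long offset, long length) {
            this.handle = handle; this.offset = offset; this.length = length;
        }
        public String toString() {
            return String.format("chunk=%d offset=%d length=%d", handle, offset, length);
        }
    }

    // Translate a byte offset within the file into a chunk handle plus an in-chunk range.
    static ChunkRange locate(FileEntry file, long fileOffset, long length) {
        int index = (int) (fileOffset / CHUNK_SIZE);
        long inChunk = fileOffset % CHUNK_SIZE;
        long usable = Math.min(length, CHUNK_SIZE - inChunk);  // a single request never spans chunks here
        return new ChunkRange(file.chunkHandles.get(index), inChunk, usable);
    }

    public static void main(String[] args) {
        FileEntry f = new FileEntry();
        f.chunkHandles.addAll(Arrays.asList(1001L, 1002L, 1003L));
        // Read 1 KB starting 70 MB into the file: lands in the second chunk.
        System.out.println(locate(f, 70L * 1024 * 1024, 1024));
    }
}
```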

Master

In charge of all filesystem metadata
- Namespace, access control information, mapping between files and chunks, and current locations of chunks
- Holds this information in memory and regularly syncs it with a log file

Also in charge of chunk leasing, garbage collection, and chunk migration

Periodically sends each chunkserver a heartbeat signal to check its state and send it instructions

Clients interact with it to access metadata, but all data-bearing communication goes directly to the relevant chunkservers
- As a result, the master does not become a performance bottleneck

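To make the division of labour concrete, the hedged sketch below models the kind of in-memory state the master keeps and how a heartbeat refreshes chunk locations; all names are hypothetical and this is not the actual GFS implementation.

```java
import java.util.*;

// Hypothetical sketch of master-side metadata; names and structures are illustrative only.
public class MasterMetadata {
    // Namespace: full pathname -> ordered list of chunk handles for that file.
    private final Map<String, List<Long>> fileToChunks = new HashMap<>();
    // Chunk handle -> chunkservers currently believed to hold a replica.
    private final Map<Long, Set<String>> chunkLocations = new HashMap<>();
    private long nextHandle = 1;

    // Namespace mutation: create an empty file entry (handled entirely by the master).
    public synchronized void createFile(String path) {
        fileToChunks.putIfAbsent(path, new ArrayList<>());
    }

    // Allocate a brand-new, globally unique chunk handle for a file.
    public synchronized long allocateChunk(String path) {
        long handle = nextHandle++;
        fileToChunks.get(path).add(handle);
        chunkLocations.put(handle, new HashSet<>());
        return handle;
    }

    // Heartbeat: a chunkserver reports the chunks it holds; the master refreshes its location map.
    public synchronized void onHeartbeat(String chunkserver, Set<Long> reportedChunks) {
        for (long handle : reportedChunks) {
            chunkLocations.computeIfAbsent(handle, h -> new HashSet<>()).add(chunkserver);
        }
    }

    // Clients only ask "where is this chunk?"; actual data flows client <-> chunkserver.
    public synchronized Set<String> locate(long handle) {
        return chunkLocations.getOrDefault(handle, Collections.emptySet());
    }

    public static void main(String[] args) {
        MasterMetadata m = new MasterMetadata();
        m.createFile("/data/crawl-00");
        long h = m.allocateChunk("/data/crawl-00");
        m.onHeartbeat("chunkserver-7", Collections.singleton(h));
        System.out.println("replicas of chunk " + h + ": " + m.locate(h));
    }
}
```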


Consistency Model: Master

All namespace mutations (such as file creation) are atomic, as they are handled exclusively by the master

Namespace locking guarantees atomicity and correctness

The operation log maintained by the master defines a global total order of these operations

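A hedged sketch of per-path namespace locking, loosely following the scheme described in the GFS paper (read locks on the ancestor directories, a write lock on the leaf); the class and helper names are hypothetical.

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of GFS-style namespace locking; illustrative only.
public class NamespaceLocks {
    private final Map<String, ReentrantReadWriteLock> locks = new ConcurrentHashMap<>();

    private ReentrantReadWriteLock lockFor(String path) {
        return locks.computeIfAbsent(path, p -> new ReentrantReadWriteLock());
    }

    // For /a/b/c: read-lock /a and /a/b, write-lock /a/b/c, then run the mutation.
    public void withPathLocked(String path, Runnable mutation) {
        List<String> ancestors = ancestorsOf(path);
        for (String dir : ancestors) lockFor(dir).readLock().lock();
        lockFor(path).writeLock().lock();
        try {
            mutation.run();   // e.g. create or delete the file at `path`
        } finally {
            lockFor(path).writeLock().unlock();
            for (int i = ancestors.size() - 1; i >= 0; i--) lockFor(ancestors.get(i)).readLock().unlock();
        }
    }

    private static List<String> ancestorsOf(String path) {
        List<String> result = new ArrayList<>();
        int idx = path.indexOf('/', 1);
        while (idx != -1) {
            result.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        NamespaceLocks ns = new NamespaceLocks();
        ns.withPathLocked("/home/user/file", () -> System.out.println("creating /home/user/file"));
    }
}
```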

Consistency Model: Data

The state after a mutation depends on:
- Mutation type: write or append
- Whether it succeeds or fails
- Whether there are other concurrent mutations

A file region is consistent if all clients see the same data, regardless of the replica

A region is defined after a mutation if it is still consistent and clients see the mutation in its entirety


Consistency Model: Data (2)

If there are no other concurrent writers, the region is defined and consistent

Concurrent and successful mutations leave the region undefined but consistent
- Mingled fragments from multiple mutations

A failed mutation makes the region both inconsistent and undefined

A failed mutation makes the region both inconsistent and undefined


Mutation Operations

Each chunk has many replicas

The primary replica holds a lease from the master

It decides the order of all mutations for all replicas


Write Operation

The client obtains the location of the replicas and the identity of the primary replica from the master

It then pushes the data to all replica nodes

The client issues an update request to the primary

Primary forwards the write request to all replicas

It waits for a reply from all replicas before returning to the client

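The control flow can be summarized in the hedged sketch below (hypothetical interfaces; in the real protocol the primary also assigns serial numbers and the data is pipelined between chunkservers rather than pushed individually).

```java
import java.util.*;

// Hypothetical sketch of the GFS write path; roles and names are illustrative only.
public class WriteFlow {

    interface Master {
        List<String> replicasOf(long chunkHandle);  // chunkservers holding the chunk
        String primaryOf(long chunkHandle);         // replica currently holding the lease
    }

    interface Chunkserver {
        void pushData(long chunkHandle, byte[] data);                    // stage the data
        boolean applyWrite(long chunkHandle, long offset, byte[] data);  // apply staged data
    }

    // The primary orders the mutation, applies it, forwards it, and waits for every secondary.
    static boolean primaryCommit(String primary, List<String> replicas,
                                 Map<String, Chunkserver> servers,
                                 long handle, long offset, byte[] data) {
        boolean ok = servers.get(primary).applyWrite(handle, offset, data);
        for (String r : replicas) {
            if (!r.equals(primary)) {
                ok &= servers.get(r).applyWrite(handle, offset, data);
            }
        }
        return ok;  // reply to the client only after all replicas have answered
    }

    // Client side: locate replicas, push data everywhere, then ask the primary to commit.
    static boolean clientWrite(Master master, Map<String, Chunkserver> servers,
                               long handle, long offset, byte[] data) {
        List<String> replicas = master.replicasOf(handle);
        String primary = master.primaryOf(handle);
        for (String r : replicas) {
            servers.get(r).pushData(handle, data);
        }
        return primaryCommit(primary, replicas, servers, handle, offset, data);
    }

    // Tiny in-memory stand-in so the sketch can be exercised end to end.
    public static void main(String[] args) {
        Chunkserver dummy = new Chunkserver() {
            public void pushData(long h, byte[] d) { /* buffer the data */ }
            public boolean applyWrite(long h, long o, byte[] d) { return true; }
        };
        Map<String, Chunkserver> servers = new HashMap<>();
        servers.put("cs1", dummy);
        servers.put("cs2", dummy);
        Master master = new Master() {
            public List<String> replicasOf(long h) { return Arrays.asList("cs1", "cs2"); }
            public String primaryOf(long h) { return "cs1"; }
        };
        System.out.println("write succeeded: " + clientWrite(master, servers, 42L, 0L, new byte[16]));
    }
}
```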

Record Append Operation

Performed atomically

Append location chosen by the GFS and communicated to the client

Primary forwards the write request to all replicas

It waits for a reply from all replicas before returning to the client

1 If the record fits in the current chunk, it is written and the chosen offset is communicated to the client

2 If it does not, the chunk is padded and the client is told to try the next chunk

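A hedged sketch of the primary's append decision (hypothetical types; the real operation is also replicated to the secondaries and must cope with partial failures).

```java
// Hypothetical sketch of the record-append decision made at the primary replica; illustrative only.
public class RecordAppend {
    static final long CHUNK_SIZE = 64L * 1024 * 1024;   // 64 MB chunks (GFS default)

    // Result of an append attempt: either the chosen offset, or a hint to retry on the next chunk.
    static final class AppendResult {
        final boolean retryOnNextChunk;
        final long offset;   // offset chosen by the system, only meaningful when retry is false
        AppendResult(boolean retry, long offset) { this.retryOnNextChunk = retry; this.offset = offset; }
    }

    // currentSize: bytes already used in the last chunk of the file.
    static AppendResult append(long currentSize, int recordLength) {
        if (currentSize + recordLength <= CHUNK_SIZE) {
            // Record fits: write it at the end and tell the client which offset was chosen.
            return new AppendResult(false, currentSize);
        }
        // Record does not fit: pad the remainder of the chunk and ask the client to retry
        // against a fresh chunk (which is why records never straddle chunk boundaries).
        return new AppendResult(true, -1);
    }

    public static void main(String[] args) {
        AppendResult r1 = append(10 * 1024 * 1024, 1024);
        System.out.println("fits: offset=" + r1.offset);
        AppendResult r2 = append(CHUNK_SIZE - 100, 1024);
        System.out.println("retry on next chunk: " + r2.retryOnNextChunk);
    }
}
```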


Application Safeguards

Use record append rather than write

Insert checksums in record headers to detect fragments

Insert sequence numbers to detect duplicates

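These safeguards live in the application's own record format; the sketch below shows one possible layout, a hypothetical [sequence number, length, checksum, payload] record, which GFS itself does not mandate.

```java
import java.io.*;
import java.util.zip.CRC32;

// Hypothetical self-validating record format: [sequence number][length][CRC32][payload].
// GFS does not define this layout; it is one way applications detect fragments and duplicates.
public class SafeRecord {

    static byte[] encode(long sequence, byte[] payload) throws IOException {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(sequence);         // sequence number: lets readers drop duplicates
        out.writeInt(payload.length);
        out.writeLong(crc.getValue());   // checksum in the header: lets readers skip fragments
        out.write(payload);
        return bytes.toByteArray();
    }

    // Returns the payload if the record is intact and not a duplicate, otherwise null.
    static byte[] decode(byte[] record, long lastSeenSequence) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        long sequence = in.readLong();
        if (sequence <= lastSeenSequence) return null;   // duplicate from a retried append
        int length = in.readInt();
        long expectedCrc = in.readLong();
        byte[] payload = new byte[length];
        in.readFully(payload);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, length);
        if (crc.getValue() != expectedCrc) return null;  // corrupted or partial fragment
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] rec = encode(42, "hello".getBytes("UTF-8"));
        System.out.println(new String(decode(rec, 41), "UTF-8"));  // prints "hello"
        System.out.println(decode(rec, 42));                        // null: already seen
    }
}
```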

Chunk Placement

Put on chunkservers with below average disk space usage

Limit number of “recent” creations on a chunkserver, to ensure that it does not experience any traffic spike due to its fresh data

For reliability, replicas spread across racks

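A hedged sketch of such a placement policy (hypothetical data model; the real policy also weighs replication priorities and network bandwidth between racks).

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch of GFS-style replica placement heuristics; illustrative only.
public class ChunkPlacement {
    static final class Chunkserver {
        final String name, rack;
        final double diskUsage;     // fraction of disk in use
        final int recentCreations;  // chunks created here recently
        Chunkserver(String name, String rack, double diskUsage, int recentCreations) {
            this.name = name; this.rack = rack; this.diskUsage = diskUsage; this.recentCreations = recentCreations;
        }
    }

    // Pick `replicas` servers: below-average disk usage, few recent creations, distinct racks.
    static List<Chunkserver> place(List<Chunkserver> all, int replicas, int maxRecentCreations) {
        double avgUsage = all.stream().mapToDouble(s -> s.diskUsage).average().orElse(1.0);
        List<Chunkserver> candidates = all.stream()
                .filter(s -> s.diskUsage <= avgUsage)                  // prefer emptier servers
                .filter(s -> s.recentCreations < maxRecentCreations)   // avoid fresh-data hotspots
                .sorted(Comparator.comparingDouble(s -> s.diskUsage))
                .collect(Collectors.toList());
        List<Chunkserver> chosen = new ArrayList<>();
        Set<String> racksUsed = new HashSet<>();
        for (Chunkserver s : candidates) {
            if (chosen.size() == replicas) break;
            if (racksUsed.add(s.rack)) chosen.add(s);                  // spread replicas across racks
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Chunkserver> cluster = Arrays.asList(
                new Chunkserver("cs1", "rack-a", 0.20, 0),
                new Chunkserver("cs2", "rack-a", 0.60, 1),
                new Chunkserver("cs3", "rack-b", 0.25, 0),
                new Chunkserver("cs4", "rack-b", 0.70, 2),
                new Chunkserver("cs5", "rack-c", 0.30, 0),
                new Chunkserver("cs6", "rack-c", 0.80, 1));
        place(cluster, 3, 5).forEach(s -> System.out.println(s.name + " on " + s.rack));
    }
}
```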

Garbage Collection

Chunks become garbage when they are orphaned

A lazy reclamation strategy is used: chunks are not reclaimed at delete time

Each chunkserver communicates the subset of its current chunks tothe master in the heartbeat signal

Master pinpoints chunks which have been orphaned

The chunkserver finally reclaims that space

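A hedged sketch of the heartbeat-driven exchange (hypothetical names; real GFS also keeps deleted files hidden under a renamed path for a grace period before their chunks become orphans).

```java
import java.util.*;

// Hypothetical sketch of lazy garbage collection via heartbeats; illustrative only.
public class GarbageCollection {

    // The master's view: every chunk handle that is still reachable from some file.
    private final Set<Long> liveChunks = new HashSet<>();

    public void recordLiveChunk(long handle) {
        liveChunks.add(handle);
    }

    public void deleteFileChunks(Collection<Long> handles) {
        // Deletion only updates metadata; no chunkserver is contacted at delete time.
        liveChunks.removeAll(handles);
    }

    // Heartbeat handler: the chunkserver reports what it stores, the master replies with orphans.
    public Set<Long> onHeartbeat(Set<Long> reportedChunks) {
        Set<Long> orphans = new HashSet<>(reportedChunks);
        orphans.removeAll(liveChunks);   // anything the master no longer knows about is garbage
        return orphans;                  // the chunkserver is free to reclaim these at its leisure
    }

    public static void main(String[] args) {
        GarbageCollection master = new GarbageCollection();
        master.recordLiveChunk(1L);
        master.recordLiveChunk(2L);
        master.deleteFileChunks(Collections.singleton(2L));   // file containing chunk 2 was deleted
        Set<Long> reported = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        System.out.println("reclaim: " + master.onHeartbeat(reported));  // chunks 2 and 3
    }
}
```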

Stale Replica Detection

Each chunk is assigned a version number

Each time a new lease is granted, the version number is incremented

Stale replicas will have outdated version numbers

They are simply garbage collected

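A hedged sketch of the version check (hypothetical structures): the master bumps a chunk's version whenever it grants a new lease, and any replica that reports an older version is flagged as stale.

```java
import java.util.*;

// Hypothetical sketch of stale-replica detection via chunk version numbers; illustrative only.
public class StaleReplicaDetection {

    // Master's record of the current version number for each chunk.
    private final Map<Long, Integer> currentVersion = new HashMap<>();

    // Granting a new lease increments the version; replicas that are up learn the new number.
    public int grantLease(long chunkHandle) {
        return currentVersion.merge(chunkHandle, 1, Integer::sum);
    }

    // A replica that reports an outdated version missed some mutations and is stale.
    public boolean isStale(long chunkHandle, int reportedVersion) {
        return reportedVersion < currentVersion.getOrDefault(chunkHandle, 0);
    }

    public static void main(String[] args) {
        StaleReplicaDetection master = new StaleReplicaDetection();
        int v1 = master.grantLease(7L);              // version becomes 1
        int v2 = master.grantLease(7L);              // a replica was down: version becomes 2
        System.out.println(master.isStale(7L, v1));  // true  -> garbage collect this replica
        System.out.println(master.isStale(7L, v2));  // false -> up to date
    }
}
```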

Outline

1 Introduction

2 Google Filesystem

3 Hadoop Distributed Filesystem


Introduction

Open-source clone of GFS

Comes packaged with Hadoop

Master is called the NameNode and chunkservers are called DataNodes

Chunks are known as blocks

Exposes a Java API and a command-line interface

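The Java API is centred on the org.apache.hadoop.fs.FileSystem class; the minimal example below writes a file, reads it back, and deletes it (the path and a configuration that points at a running NameNode are assumptions of this example).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS client example; assumes the Hadoop configuration on the classpath
// points at a running NameNode, and that /tmp/gfs-demo.txt is a path we may create.
public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);        // talks to the NameNode for metadata

        Path path = new Path("/tmp/gfs-demo.txt");

        // Writing creates blocks on DataNodes; the NameNode only records the metadata.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeUTF("hello, distributed filesystem");
        }

        // Reading streams block data directly from the DataNodes.
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }

        fs.delete(path, false);                      // non-recursive delete
    }
}
```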

Command-line API

Accessible through: bin/hdfs dfs -command args

Useful commands: cat, copyFromLocal, copyToLocal, cp, ls, mkdir, moveFromLocal, moveToLocal, mv, rm, etc.¹

¹ http://hadoop.apache.org/docs/r1.0.4/file_system_shell.html


References

1 Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. 2003. The Google File System. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP ’03). ACM, New York, NY, USA, 29–43.
