Hadoop architecture meetup


Transcript of Hadoop architecture meetup

Page 1: Hadoop architecture meetup

Hadoop Architecture

Page 2: Hadoop architecture meetup

Agenda

• Different Hadoop daemons and their roles

• What does a Hadoop cluster look like

• Under the hood: how does it write a file

• Under the hood: how does it read a file

• Under the hood: how does it replicate a file

• Under the hood: how does it run a job

• How to balance an unbalanced Hadoop cluster

Page 3: Hadoop architecture meetup

Hadoop – A bit of background

• It’s an open source project

• Based on two technical papers published by Google (on the Google File System and MapReduce)

• A well-known platform for distributed applications

• Easy to scale out

• Works well with commodity hardware (not entirely true)

• Very good for background applications

Page 4: Hadoop architecture meetup

Hadoop Architecture

• Two primary components:

  • Distributed File System (HDFS): deals with file operations such as read, write, and delete

  • Map Reduce Engine: deals with parallel computation

Page 5: Hadoop architecture meetup

Hadoop Distributed File System

• Runs on top of the existing file system

• A file is broken into blocks of a predefined, equal size, and each block is stored individually

• Designed to handle very large files

• Not good for a huge number of small files
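
The block idea above can be sketched in a few lines of Python. This is only an illustration, not HDFS code: the function name `split_into_blocks` is hypothetical, and the 64 MB block size is an assumption (the real value is configurable per cluster). The sketch also shows that the last block of a file can be shorter than the others, and why many small files are costly: every file needs at least one block entry of its own, however small it is.

```python
MB = 1024 * 1024


def split_into_blocks(file_size, block_size=64 * MB):
    """Return (offset, length) pairs covering a file of file_size bytes.

    block_size is an assumed value for illustration only.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        # Every block is block_size long, except possibly the last one.
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


# One 200 MB file: three full 64 MB blocks plus one 8 MB block.
blocks = split_into_blocks(200 * MB)
print(len(blocks))          # 4
print(blocks[-1][1] // MB)  # 8

# 1000 files of 1 KB each: 1000 separate block entries to keep track of.
small_file_blocks = sum(len(split_into_blocks(1024)) for _ in range(1000))
print(small_file_blocks)    # 1000
```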

Page 6: Hadoop architecture meetup

Map Reduce Engine

• A Map Reduce Program consists of map and reduce functions

• A Map Reduce job is broken into tasks that run in parallel

• Prefers local processing where possible: tasks are scheduled on the nodes that already hold the data
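
As a rough illustration of the map and reduce functions mentioned above, here is a word count written in plain Python. It is a single-process sketch and does not use the Hadoop API: the names `map_fn`, `reduce_fn`, and `run_job` are hypothetical, and the grouping step stands in for the shuffle that a real cluster performs between the two phases.

```python
from collections import defaultdict


def map_fn(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_fn(word, counts):
    """Reduce: combine all counts emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Run map, group by key, then reduce, all in one process."""
    grouped = defaultdict(list)
    # Map phase: on a cluster, each chunk of input would be a separate task.
    for line in lines:
        for key, value in map_fn(line):
            # Stand-in for the shuffle: collect values by key.
            grouped[key].append(value)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(key, values) for key, values in grouped.items())


result = run_job(["hadoop stores data", "hadoop runs jobs"])
print(result["hadoop"])  # 2
```

Because each call to `map_fn` looks at one line only, and each call to `reduce_fn` looks at one key only, the calls are independent of each other, which is what allows a job to be broken into tasks that run in parallel.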

Page 8: Hadoop architecture meetup

Hadoop Cluster

Page 9: Hadoop architecture meetup

Typical Workflow

Page 26: Hadoop architecture meetup

Cluster Balancing

Page 27: Hadoop architecture meetup

Quiz

• If you write a 1 TB file into HDFS with a replication factor of 2, how much storage does HDFS actually need to store this file?

• True/False? Even if the Name node goes down, I will still be able to read files from HDFS.

Page 28: Hadoop architecture meetup

Quiz

• True/False? In a Hadoop cluster, we can have a secondary Job Tracker to enhance fault tolerance.

• True/False? If the Job Tracker goes down, you will not be able to write any file into HDFS.

Page 29: Hadoop architecture meetup

Quiz

• True/False? The Name node stores the actual data itself.

• True/False? The Name node can be rebuilt using the Secondary Name node.

• True/False? If a data node goes down, Hadoop takes care of re-replicating the affected data blocks.

Page 30: Hadoop architecture meetup

Quiz

• In which scenario does one data node try to read data from another data node?

• What are the benefits of the Name node’s rack awareness?

• True/False? HDFS is well suited for applications that write a huge number of small files.

Page 31: Hadoop architecture meetup

Quiz

• True/False? Hadoop takes care of balancing the cluster automatically.

• True/False? The output of Map tasks is written to HDFS files.

• True/False? The output of Reduce tasks is written to HDFS files.

Page 32: Hadoop architecture meetup

Quiz

• True/False? In a production cluster, commodity hardware can be used to set up the Name node.

Thank You