Hadoop architecture meetup
Uploaded by vmoorthy, category: Technology
Transcript of Hadoop architecture meetup
Hadoop Architecture
Agenda
• The different Hadoop daemons and their roles
• What does a Hadoop cluster look like
• Under the Hood: How does it write a file
• Under the Hood: How does it read a file
• Under the Hood: How does it replicate a file
• Under the Hood: How does it run a job
• How to balance an unbalanced Hadoop cluster
Hadoop – A bit of background
• It’s an open source project
• Based on two technical papers published by Google (the Google File System and MapReduce papers)
• A well known platform for distributed applications
• Easy to scale-out
• Works well with commodity hardware (not entirely true)
• Very good for background applications
Hadoop Architecture
• Two primary components
• Distributed File System (HDFS): It deals with file operations such as read, write, and delete
• Map Reduce Engine: It deals with parallel computation
Hadoop Distributed File System
• Runs on top of the existing (native) file system
• A file is broken into fixed-size blocks (the last block may be smaller), and each block is stored individually
• Designed to handle very large files
• Not well suited to a huge number of small files
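The block-splitting idea can be sketched in a few lines of Python. This is an illustration only, not HDFS code: the function name split_into_blocks is hypothetical, and the 64 MB block size is an assumed value (in a real cluster the block size is a configuration setting). It shows that one large file maps to a handful of blocks, while many small files each occupy their own block entry, which is one way to see why a huge number of small files is a poor fit.

```python
def split_into_blocks(file_size: int, block_size: int) -> list[int]:
    """Return the sizes (in bytes) of the blocks a file would be split into.

    Hypothetical helper for illustration; it only does the arithmetic
    and does not talk to HDFS.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if file_size < 0:
        raise ValueError("file_size must not be negative")
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        # The last block only holds whatever data is left over.
        blocks.append(remainder)
    return blocks


MB = 1024 * 1024
BLOCK_SIZE = 64 * MB  # assumed block size, not taken from the slides

# One large file of 1000 MB: 15 full blocks plus one partial block.
large_file_blocks = split_into_blocks(1000 * MB, BLOCK_SIZE)

# 1000 small files of 1 MB each: every file still needs its own block.
small_file_blocks = [split_into_blocks(1 * MB, BLOCK_SIZE) for _ in range(1000)]
small_block_count = sum(len(blocks) for blocks in small_file_blocks)

print(len(large_file_blocks), "blocks for one 1000 MB file")
print(small_block_count, "blocks for 1000 files of 1 MB each")
```

Both cases hold the same 1000 MB of data, but the small-file case produces far more block entries to keep track of.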
Map Reduce Engine
• A Map Reduce Program consists of map and reduce functions
• A Map Reduce job is broken into tasks that run in parallel
• Prefers local processing where possible: a task is run on a node that already holds its data
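A word count is a common way to picture the map and reduce functions. The sketch below is plain Python, not the Hadoop API: map_fn, reduce_fn, and run_job are hypothetical names, the input splits are made up, and the tasks run one after another in a single process, whereas a real job would run them in parallel across the cluster.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> tuple[str, int]:
    """Reduce: add up all the counts emitted for one word."""
    return word, sum(counts)


def run_job(splits: list[list[str]]) -> dict[str, int]:
    """Run the job over the input splits (sequentially, for illustration)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    # Map phase: each split stands in for one map task.
    for split in splits:
        for line in split:
            for word, count in map_fn(line):
                # Grouping by key stands in for the shuffle step.
                grouped[word].append(count)
    # Reduce phase: each key stands in for one unit of reduce work.
    return dict(reduce_fn(word, counts) for word, counts in sorted(grouped.items()))


# Made-up input: two splits, as if the file were stored in two blocks.
splits = [
    ["hadoop stores files", "hadoop runs jobs"],
    ["jobs run in parallel"],
]
result = run_job(splits)
print(result)
```

Because each map call only looks at its own line and each reduce call only looks at one key, the work can be divided into independent tasks, which is what lets a job be broken up and run in parallel.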
Hadoop Cluster
Typical Workflow
Cluster Balancing
Quiz
• If you write a 1 TB file into HDFS with a replication factor of 2, how much storage does HDFS actually need to store this file?
• True/False? Even if the Name node goes down, I will still be able to read files from HDFS.
Quiz
• True/False? In a Hadoop cluster, we can have a secondary Job Tracker to enhance fault tolerance.
• True/False? If the Job Tracker goes down, you will not be able to write any file into HDFS.
Quiz
• True/False? The Name node stores the actual data itself.
• True/False? The Name node can be rebuilt using the secondary Name node.
• True/False? If a data node goes down, Hadoop takes care of re-replicating the affected data blocks.
Quiz
• In which scenario does one data node try to read data from another data node?
• What are the benefits of the Name node’s rack awareness?
• True/False? HDFS is well suited for applications that write a huge number of small files.
Quiz
• True/False? Hadoop takes care of balancing the cluster automatically.
• True/False? The output of Map tasks is written to an HDFS file.
• True/False? The output of Reduce tasks is written to an HDFS file.
Quiz
• True/False? In a production cluster, commodity hardware can be used to set up the Name node.
Thank You