Understanding Hadoop framework
Uploaded by prashant-sharma
7/27/2019 Understanding Hadoop framework
http://slidepdf.com/reader/full/understanding-hadoop-framework 1/31
Course Topics
Week 1 – Understanding Big Data
– Introduction to HDFS
Week 2 – Playing around with a Cluster
– Data Loading Techniques
Week 3 – MapReduce Basics, Types and Formats
– Use Cases for MapReduce
Week 4 – Analytics using Pig
– Understanding Pig Latin
Week 5 – Analytics using Hive
– Understanding HiveQL
Week 6 – NoSQL Databases
– Understanding HBase
Week 7 – Real World Datasets and …
– Hadoop Project Environment
Week 8 – Project Reviews
– Planning a Career in Big Data
How it works
Live classes
Class recordings
Module-wise quizzes, coding assignments
24x7 on-demand technical support
Project work on large datasets
Online certification exam
Lifetime access to the Learning Management System
Complementary Java Classes
What is Big Data?
Facebook Example
Facebook users spend 10.5 billion minutes (almost 20,000 years) online on the social network
Facebook has an average of …
… comments are posted every …
Twitter Example
Twitter has over 500 million registered users.
The USA, whose 141.8 million represent 27.4 percent of all Twitter users, is good enough to finish well ahead of … Japan, the UK and Indonesia.
79% of US Twitter users are more likely to recommend brands they follow
67% of US Twitter users are more likely to buy from brands they follow
57% of all companies that use … for business use Twitter
Other Industrial Use Cases
• Insurance
• Healthcare
• Retail
– Recommendations
– Groupings
• Genome Sequencing
• Utilities
Hadoop Users
http://wiki.apache.org/hadoop/PoweredBy
Data volume is growing exponentially
• Estimated Global Data Volume:
– 2011: 1.8 ZB
– 2015: 7.9 ZB
• The world's information doubles …
• Over the next 10 years:
– The number of servers worldwide …
– Amount of information managed by … data centers will grow by 50x
– Number of “files” enterprise … will grow by 75x
Source: http://www.emc.com/leaders…universe.htm, which was based on the … Universe Study
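This worked example uses only the two estimates above (1.8 ZB in 2011 and 7.9 ZB in 2015). It assumes smooth exponential growth between those two points and derives the implied annual growth rate and doubling time. The function names are illustrative.

```python
import math

def compound_annual_growth(start: float, end: float, years: float) -> float:
    """Rate r such that start * (1 + r) ** years == end."""
    return (end / start) ** (1 / years) - 1

def doubling_time(rate: float) -> float:
    """Years needed to double at a constant annual growth rate."""
    return math.log(2) / math.log(1 + rate)

# Figures taken from the slide: 1.8 ZB (2011) -> 7.9 ZB (2015).
rate = compound_annual_growth(1.8, 7.9, 2015 - 2011)
years_to_double = doubling_time(rate)
print(f"growth ~{rate:.1%} per year, doubling every ~{years_to_double:.1f} years")
# prints: growth ~44.7% per year, doubling every ~1.9 years
```

Under this assumption, the two data points imply that the data volume roughly doubles every two years.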
Unstructured data is exploding
Why DFS?
Read 1 TB of data:
• 1 machine with 4 I/O channels, each channel 100 MB/s: 45 minutes
• 10 machines, each with 4 I/O channels, each channel 100 MB/s: 4.5 minutes
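The arithmetic behind these numbers: when the data is spread over machines that all read in parallel, the scan time is the data size divided by the aggregate bandwidth. This is an idealized sketch. It assumes a binary terabyte (1024 × 1024 MB) and ignores disk seeks, network transfer and coordination overhead. The slide's 45 and 4.5 minutes are rounded versions of these figures.

```python
def read_time_minutes(total_mb: float, machines: int,
                      channels_per_machine: int = 4,
                      mb_per_s_per_channel: float = 100.0) -> float:
    """Ideal time to scan total_mb when every channel on every machine reads in parallel."""
    aggregate_mb_per_s = machines * channels_per_machine * mb_per_s_per_channel
    return total_mb / aggregate_mb_per_s / 60

ONE_TB_IN_MB = 1024 * 1024  # assumption: binary terabyte

one_machine = read_time_minutes(ONE_TB_IN_MB, machines=1)    # 400 MB/s aggregate
ten_machines = read_time_minutes(ONE_TB_IN_MB, machines=10)  # 4,000 MB/s aggregate
print(f"1 machine: {one_machine:.1f} min, 10 machines: {ten_machines:.1f} min")
# prints: 1 machine: 43.7 min, 10 machines: 4.4 min
```

Under this model the speed-up grows linearly with the number of machines, which is the case the slide makes for distributing the data.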
What Is a Distributed File System (DFS)?
What is Hadoop?
Apache Hadoop is a framework that allows for the distributed processing of large data sets across clusters of commodity computers using a simple programming model.
Companies using Hadoop:
- Yahoo
- Amazon
- AOL
- IBM
- And many more at http://wiki.apache.org/hadoop/PoweredBy
Hadoop Eco-System
Hadoop Core Components:
HDFS – Hadoop Distributed File System (storage)
MapReduce (processing)
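To make the processing side concrete, here is an in-memory simulation of the MapReduce model using the classic word-count example. It is not Hadoop's Java API (the `Mapper`/`Reducer` classes). The function names are hypothetical, and the "shuffle" is a plain dictionary grouping that stands in for the framework's sort-and-shuffle phase between map and reduce.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Group all intermediate values by key, as the framework does between map and reduce.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Combine every value for one key into a single result.
    return key, sum(values)

def word_count(lines: list[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

print(word_count(["Hadoop stores data", "Hadoop processes data"]))
# prints: {'data': 2, 'hadoop': 2, 'processes': 1, 'stores': 1}
```

In a real cluster the map calls run in parallel on the nodes that hold the input blocks, and each reduce call receives all values for its key over the network. The programmer writes only the two small functions.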
What is HDFS?
HDFS – Hadoop Distributed File System:
Highly fault-tolerant
High throughput
Suitable for applications with large data sets
Streaming access to file system data
Can be built out of commodity hardware
Main Components of HDFS:
NameNode:
master of the system
maintains and manages the blocks which are present on the DataNodes
DataNodes:
slaves which are deployed on each machine and provide the actual storage
responsible for serving read and write requests for the clients
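This toy model shows how the work is split between the two roles. The NameNode holds only metadata: which blocks make up each file, and which DataNodes store each block. The bytes themselves live on the DataNodes. Several details are assumptions for illustration: the block size (64 MB, the Hadoop 1.x default; 128 MB in later releases), the replication factor of 3, the class name, and the round-robin placement. Real HDFS placement is rack-aware.

```python
import math
from dataclasses import dataclass, field

BLOCK_SIZE_MB = 64  # assumption: Hadoop 1.x default block size
REPLICATION = 3     # assumption: default replication factor

@dataclass
class ToyNameNode:
    """Hypothetical model of NameNode metadata; no file data is stored here."""
    datanodes: list[str]
    namespace: dict[str, list[int]] = field(default_factory=dict)  # path -> block IDs
    block_map: dict[int, list[str]] = field(default_factory=dict)  # block ID -> DataNodes
    next_block_id: int = 0

    def add_file(self, path: str, size_mb: float) -> list[int]:
        n_blocks = math.ceil(size_mb / BLOCK_SIZE_MB)
        copies = min(REPLICATION, len(self.datanodes))
        block_ids = []
        for _ in range(n_blocks):
            block_id = self.next_block_id
            self.next_block_id += 1
            # Naive round-robin placement; real HDFS placement is rack-aware.
            start = block_id % len(self.datanodes)
            self.block_map[block_id] = [
                self.datanodes[(start + i) % len(self.datanodes)] for i in range(copies)
            ]
            block_ids.append(block_id)
        self.namespace[path] = block_ids
        return block_ids

    def locate(self, path: str) -> list[tuple[int, list[str]]]:
        """The block-to-DataNode mapping a client asks for before reading."""
        return [(b, self.block_map[b]) for b in self.namespace[path]]

nn = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
print(nn.add_file("/logs/day1.log", 200))  # 200 MB -> 4 blocks: [0, 1, 2, 3]
print(nn.locate("/logs/day1.log")[3])      # (3, ['dn4', 'dn1', 'dn2'])
```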
Secondary NameNode:
Not a hot standby for the NameNode
Connects to the NameNode every hour*
Housekeeping: backs up the NameNode metadata
The saved metadata can be used to rebuild a failed NameNode
[Figure: the NameNode hands its metadata to the Secondary NameNode every hour]
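The Secondary NameNode's housekeeping job is checkpointing. The NameNode keeps a snapshot of the namespace (the fsimage) and records each later change in an edit log. Periodically, the Secondary NameNode fetches both, replays the edits onto the image, and sends the merged fsimage back, which keeps the edit log from growing without bound. The one-hour interval matches the default of the configurable checkpoint period (`fs.checkpoint.period` in Hadoop 1.x, `dfs.namenode.checkpoint.period` in later releases). The sketch below models only the merge step; the record format and all names are hypothetical.

```python
from dataclasses import dataclass, field

CHECKPOINT_PERIOD_S = 3600  # default checkpoint period: one hour

Edit = tuple[str, str, int]  # hypothetical edit-log record: (operation, path, size)

@dataclass
class NameNodeMetadata:
    fsimage: dict[str, int]                          # snapshot: path -> size
    edits: list[Edit] = field(default_factory=list)  # changes since the snapshot

def apply_edit(image: dict[str, int], edit: Edit) -> None:
    op, path, size = edit
    if op == "create":
        image[path] = size
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown operation: {op}")

def checkpoint(meta: NameNodeMetadata) -> NameNodeMetadata:
    """Replay the edit log onto a copy of the fsimage and start an empty log."""
    merged = dict(meta.fsimage)
    for edit in meta.edits:
        apply_edit(merged, edit)
    return NameNodeMetadata(fsimage=merged, edits=[])

current = NameNodeMetadata({"/a.txt": 10}, [("create", "/b.txt", 5), ("delete", "/a.txt", 0)])
print(checkpoint(current).fsimage)  # {'/b.txt': 5}
```

The merged image reflects only the edits up to the last checkpoint. It is therefore a recovery aid rather than a live replacement, which is why the Secondary NameNode is not a hot standby.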
JobTracker and TaskTracker:
HDFS Architecture
Job Tracker
Job Tracker Contd.
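In Hadoop 1.x MapReduce, the JobTracker splits a submitted job into tasks and assigns them to TaskTrackers, which report their free task slots in periodic heartbeats. When assigning a map task, it prefers a TaskTracker on a node that holds a replica of that task's input split (node-local), then one on the same rack (rack-local), then any other. The sketch below models only that preference order. The `TaskTracker` record, the function name, and the greedy first-match choice are hypothetical simplifications, not Hadoop's actual scheduler code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TaskTracker:
    host: str
    rack: str
    free_slots: int  # free map slots reported in the latest heartbeat

def assign_map_task(split_hosts: set[str], split_racks: set[str],
                    trackers: list[TaskTracker]) -> Optional[tuple[TaskTracker, str]]:
    """Pick a TaskTracker for one map task: node-local, then rack-local, then any."""
    preferences: list[tuple[str, Callable[[TaskTracker], bool]]] = [
        ("node-local", lambda t: t.host in split_hosts),
        ("rack-local", lambda t: t.rack in split_racks),
        ("off-rack", lambda t: True),
    ]
    for level, matches in preferences:
        for tracker in trackers:
            if tracker.free_slots > 0 and matches(tracker):
                tracker.free_slots -= 1
                return tracker, level
    return None  # no free slot anywhere: the task waits for a later heartbeat

trackers = [
    TaskTracker("node1", "rackA", free_slots=0),
    TaskTracker("node2", "rackA", free_slots=1),
    TaskTracker("node3", "rackB", free_slots=2),
]
# The split's replica is on node1 (rackA), but node1 has no free slot.
tracker, level = assign_map_task({"node1"}, {"rackA"}, trackers)
print(tracker.host, level)  # node2 rack-local
```

Moving the computation to the data, rather than the data to the computation, is what lets the cluster scan large inputs without saturating the network.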
HDFS Client Creates a New File
Rack Awareness
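With a replication factor of 3, HDFS's default placement policy works as follows:

- The first replica goes on the writer's node if the writer is a DataNode, otherwise on a random DataNode.
- The second replica goes on a node in a different (remote) rack.
- The third replica goes on a different node in that same remote rack.

The sketch below follows those three rules. The topology format, the function name, and raising an error (where real HDFS would fall back to a relaxed choice) are assumptions for illustration.

```python
import random
from typing import Optional

def place_replicas(topology: dict[str, str], writer: Optional[str],
                   rng: random.Random) -> list[str]:
    """Choose DataNodes for 3 replicas using the default rack-aware rules.

    topology maps each DataNode host to its rack ID (hypothetical input format).
    """
    nodes = sorted(topology)
    # 1st replica: the writer's own node if it is a DataNode, else a random node.
    first = writer if writer in topology else rng.choice(nodes)
    # 2nd replica: a node on a different (remote) rack.
    remote = [n for n in nodes if topology[n] != topology[first]]
    if not remote:
        raise ValueError("sketch requires at least two racks")
    second = rng.choice(remote)
    # 3rd replica: a different node on the same remote rack as the 2nd.
    peers = [n for n in nodes if topology[n] == topology[second] and n != second]
    if not peers:
        raise ValueError("sketch requires two DataNodes on the remote rack")
    third = rng.choice(peers)
    return [first, second, third]

topology = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
print(place_replicas(topology, writer="dn1", rng=random.Random(7)))
# ['dn1', 'dn3', 'dn4'] or ['dn1', 'dn4', 'dn3'], depending on the seed
```

The design balances two goals. The off-rack copies survive the loss of an entire rack, and keeping two of the three replicas on one remote rack limits cross-rack traffic during the write.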
Anatomy of a File Write:
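A write in HDFS proceeds in four steps:

1. The client asks the NameNode to create the file.
2. For each block, the NameNode returns a list of target DataNodes chosen by the placement policy.
3. The client streams the block to the first DataNode in small packets. Each DataNode stores a packet and forwards it to the next node in the pipeline.
4. Acknowledgements flow back along the pipeline to the client.

The sketch below simulates one block going through such a pipeline. The `DataNode` class, the tiny 4-byte packet size, and the in-memory storage are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataNode:
    name: str
    stored: bytearray = field(default_factory=bytearray)

def write_block(data: bytes, pipeline: list[DataNode], packet_size: int = 4) -> int:
    """Stream one block through a replication pipeline; return the packets acknowledged.

    `pipeline` stands for the DataNode list the NameNode returned for this block.
    """
    acked = 0
    for offset in range(0, len(data), packet_size):
        packet = data[offset:offset + packet_size]
        # Forwarded downstream: client -> first DataNode -> second -> third.
        for node in pipeline:
            node.stored.extend(packet)
        # The ack travels back upstream; the client counts the packet as written
        # only once every DataNode in the pipeline holds it.
        if all(node.stored[offset:offset + len(packet)] == packet
               for node in reversed(pipeline)):
            acked += 1
    return acked

pipeline = [DataNode("dn1"), DataNode("dn3"), DataNode("dn4")]
acked = write_block(b"hello hdfs!", pipeline)
print(acked, bytes(pipeline[-1].stored))  # 3 b'hello hdfs!'
```

Pipelining means the client uploads each byte only once, while the replicas are created by DataNode-to-DataNode transfers.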
Anatomy of a File Read:
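A read works the other way round. The client asks the NameNode for the locations of the file's blocks, then reads each block directly from the nearest DataNode that holds a replica. If that DataNode fails, the client tries the next replica. Distances follow Hadoop's network-topology convention: 0 for the same node, 2 for the same rack, 4 for a different rack. In HDFS the NameNode returns the locations already sorted by distance; this sketch sorts inside `read_file` for brevity. The `(rack, host)` representation, `fetch`, and the cluster state are hypothetical.

```python
from typing import Callable

Node = tuple[str, str]  # (rack, host); a hypothetical representation

def distance(a: Node, b: Node) -> int:
    """Hadoop-style network distance: 0 same node, 2 same rack, 4 different racks."""
    if a == b:
        return 0
    if a[0] == b[0]:
        return 2
    return 4

def read_file(block_locations: list[tuple[int, list[Node]]], client: Node,
              fetch: Callable[[int, Node], bytes]) -> bytes:
    """Read blocks in order, each from the closest reachable replica."""
    out = bytearray()
    for block_id, replicas in block_locations:
        # Try replicas nearest-first; fall back to the next one if a DataNode fails.
        for node in sorted(replicas, key=lambda n: distance(client, n)):
            try:
                out.extend(fetch(block_id, node))
                break
            except OSError:
                continue
        else:
            raise OSError(f"no reachable replica for block {block_id}")
    return bytes(out)

# Hypothetical cluster state: which DataNode holds which block, plus one dead node.
storage = {
    (0, ("r1", "dn2")): b"Hello, ", (0, ("r2", "dn3")): b"Hello, ",
    (1, ("r2", "dn3")): b"HDFS", (1, ("r2", "dn4")): b"HDFS",
}
dead = {("r2", "dn3")}
reads: list[str] = []

def fetch(block_id: int, node: Node) -> bytes:
    if node in dead:
        raise OSError(f"{node[1]} is down")
    reads.append(node[1])
    return storage[(block_id, node)]

locations = [(0, [("r2", "dn3"), ("r1", "dn2")]), (1, [("r2", "dn3"), ("r2", "dn4")])]
data = read_file(locations, client=("r1", "dn1"), fetch=fetch)
print(data, reads)  # b'Hello, HDFS' ['dn2', 'dn4']
```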
Thank You
See You in Class Next Week