Hadoop..

Presented by NIKHIL P L

Apache Hadoop Seminar

Apache Hadoop

• Developer(s): Apache Software Foundation
• Type: Distributed File System
• License: Apache License 2.0
• Written in: Java
• OS: Cross-platform
• Created by: Doug Cutting (2005)
• Inspired by: Google's MapReduce and GFS

Sub-projects

• HDFS (Hadoop Distributed File System)
  – Distributed, scalable, and portable file system
  – Stores large data sets
  – Copes with hardware failure
  – Runs on top of each node's existing (native) file system
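
HDFS stores a large file as a sequence of fixed-size blocks spread across the cluster. The sketch below shows how a file's byte range maps onto blocks, assuming a 128 MB block size (the default in Hadoop 2.x; Hadoop 1.x defaulted to 64 MB, and the value is configurable). `split_into_blocks` is a hypothetical helper for illustration, not part of any Hadoop API.

```python
# Sketch: how HDFS-style fixed-size blocks cover a file's byte range.
# Assumptions: 128 MB block size; split_into_blocks is a hypothetical helper.
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[tuple[int, int]]:
    """Return (offset, length) pairs covering file_size bytes.

    Every block is block_size bytes except possibly the last one,
    which holds only the remaining bytes.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


if __name__ == "__main__":
    size = 1024**3 + 10 * 1024**2  # a 1 GB + 10 MB file
    blocks = split_into_blocks(size)
    print(len(blocks))  # 9: eight full 128 MB blocks plus one 10 MB block
    print(blocks[-1])   # (1073741824, 10485760)
```

The last block is not padded: in HDFS a block smaller than the block size occupies only as much underlying storage as its data.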

HDFS - Replication

• Blocks with data are replicated to multiple nodes

• Allows nodes to fail without data loss
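
To see why replication tolerates node failures, the sketch below places each block on three distinct DataNodes (three is HDFS's default replication factor) and then checks which blocks still have a live replica after some nodes fail. The round-robin placement and the helper names are simplifying assumptions; real HDFS uses a rack-aware placement policy.

```python
# Sketch: replicate blocks across DataNodes and check survival after failures.
# Assumptions: simple round-robin placement (not HDFS's rack-aware policy);
# place_replicas and surviving_blocks are hypothetical helpers.


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Map each block id to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block in range(num_blocks):
        start = block % len(nodes)
        placement[block] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement


def surviving_blocks(placement: dict[int, list[str]], failed: set[str]) -> set[int]:
    """Blocks that still have at least one replica on a non-failed node."""
    return {b for b, replicas in placement.items() if any(n not in failed for n in replicas)}


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
    placement = place_replicas(10, nodes)
    # One failed node: every block still has two live replicas.
    print(len(surviving_blocks(placement, {"dn2"})))  # 10
    # Three failed nodes can destroy all replicas of some blocks.
    lost = set(placement) - surviving_blocks(placement, {"dn1", "dn2", "dn3"})
    print(sorted(lost))  # [0, 5]
```

With a replication factor of r, any r - 1 simultaneous node failures leave every block readable. HDFS also re-replicates under-replicated blocks once it detects a dead DataNode, which this sketch does not model.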

Sub-projects (continued)

• MapReduce
  – Technique originally developed at Google
  – Hadoop's core engine for processing data in parallel
  – Jobs are written as Map and Reduce functions
  – Useful in a wide range of applications:
    • Distributed pattern-based searching
    • Distributed sorting
    • Web link-graph reversal
    • Machine learning
    • Statistical machine translation
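
The classic illustration of the Map and Reduce functions is word count. The sketch below simulates a single MapReduce job in plain Python, in memory on one machine: `map_fn` emits a `(word, 1)` pair per word, the pairs are grouped by key (the shuffle step Hadoop performs between the two phases), and `reduce_fn` sums each group. The function names are hypothetical; a real Hadoop job would instead implement the framework's Java `Mapper` and `Reducer` classes.

```python
# Sketch: word count as Map and Reduce functions, run in memory.
# Assumption: a single-process simulation, not the Hadoop API.
from collections import defaultdict
from typing import Iterable, Iterator


def map_fn(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: list[int]) -> tuple[str, int]:
    """Reduce: sum all counts emitted for one word."""
    return word, sum(counts)


def run_word_count(lines: Iterable[str]) -> dict[str, int]:
    # Shuffle: group every mapped value by its key.
    groups: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for word, one in map_fn(line):
            groups[word].append(one)
    # Reduce each group, visiting keys in sorted order.
    return dict(reduce_fn(word, counts) for word, counts in sorted(groups.items()))


if __name__ == "__main__":
    print(run_word_count(["the quick fox", "the lazy dog", "The end"]))
    # {'dog': 1, 'end': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 3}
```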

MapReduce - Workflow
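
The figure for this slide is not reproduced. As a rough stand-in, the sketch below walks one job through the usual stages: each input split is mapped independently, every intermediate pair is routed to one of several reducers by a hash of its key, and each reducer sorts its keys and reduces them. The max-temperature-per-year job, the helper names, and the use of `zlib.crc32` as the hash are all assumptions; Hadoop's default partitioner hashes the key's Java `hashCode()` instead.

```python
# Sketch of the MapReduce workflow: splits -> map -> partition/shuffle -> sort -> reduce.
# Assumptions: in-memory simulation; zlib.crc32 stands in for Hadoop's hash partitioner.
import zlib
from collections import defaultdict
from typing import Iterator


def map_fn(record: str) -> Iterator[tuple[str, int]]:
    """Map: parse a 'year,temperature' record into a (year, temperature) pair."""
    year, temp = record.split(",")
    yield year, int(temp)


def reduce_fn(year: str, temps: list[int]) -> tuple[str, int]:
    """Reduce: keep the maximum temperature seen for one year."""
    return year, max(temps)


def partition(key: str, num_reducers: int) -> int:
    """Choose the reducer for a key; equal keys always go to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits: list[list[str]], num_reducers: int = 2) -> list[list[tuple[str, int]]]:
    # One dict per reducer: key -> values shuffled to that reducer.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # in Hadoop, each split would get its own map task
        for record in split:
            for key, value in map_fn(record):
                shuffled[partition(key, num_reducers)][key].append(value)
    # Each reducer processes its keys in sorted order and produces its own output.
    return [[reduce_fn(k, part[k]) for k in sorted(part)] for part in shuffled]


if __name__ == "__main__":
    splits = [["1950,0", "1950,22", "1949,111"], ["1950,-11", "1949,78"]]
    for i, out in enumerate(run_job(splits)):
        print(f"reducer {i}: {out}")
```

Which reducer receives which year depends on the hash, but the union of the per-reducer outputs is always `1949 -> 111` and `1950 -> 22`.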

Hadoop cluster (Terminology)

Types of Nodes

• HDFS nodes
  – NameNode (master)
  – DataNodes (slaves)

• MapReduce nodes
  – JobTracker (master)
  – TaskTrackers (slaves)
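
In this master/slave layout the JobTracker hands map tasks to TaskTrackers and, where it can, prefers a TaskTracker on a node that already stores the task's input block, so the data need not cross the network. The sketch below is a greatly simplified, hypothetical version of that idea. It assumes each slave host runs both a DataNode and a TaskTracker (so one host name identifies both) and that each TaskTracker has a fixed number of free map slots; it ignores rack locality, heartbeats, and retries, which the real scheduler handles.

```python
# Sketch: data-local assignment of map tasks to TaskTrackers.
# Assumptions: one map task per block; a host name identifies both the DataNode
# and the TaskTracker on a slave; assign_map_tasks is a hypothetical helper.


def assign_map_tasks(block_hosts: dict[str, set[str]], free_slots: dict[str, int]) -> dict[str, str]:
    """Return {block id: TaskTracker host}, preferring hosts that store the block."""
    slots = dict(free_slots)  # copy so the caller's dict is not modified
    assignment = {}
    for block in sorted(block_hosts):
        # Data-local candidates first: hosts holding a replica and a free slot.
        local = [h for h in sorted(block_hosts[block]) if slots.get(h, 0) > 0]
        # Otherwise fall back to any host with a free slot (a non-local read).
        remote = [h for h in sorted(slots) if slots[h] > 0]
        candidates = local or remote
        if not candidates:
            # A real JobTracker would leave the task pending until a slot frees up.
            raise RuntimeError(f"no free map slot for {block}")
        host = candidates[0]
        slots[host] -= 1
        assignment[block] = host
    return assignment


if __name__ == "__main__":
    block_hosts = {
        "blk_1": {"node1", "node2"},
        "blk_2": {"node2", "node3"},
        "blk_3": {"node1", "node3"},
    }
    print(assign_map_tasks(block_hosts, {"node1": 1, "node2": 1, "node3": 1}))
    # {'blk_1': 'node1', 'blk_2': 'node2', 'blk_3': 'node3'}
```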

Types of Nodes (continued)

Sub-projects (continued)

• Hive
  – Provides data summarization, query, and analysis
  – Initially developed by Facebook

• HBase
  – Open-source, non-relational, distributed database
  – Provides capabilities similar to Google's Bigtable

Sub-projects (continued)

• ZooKeeper
  – Distributed configuration service, synchronization service, notification system, and naming registry for large distributed systems

• Pig
  – A language and compiler for generating Hadoop programs
  – Originally developed at Yahoo!

How does Hadoop work?

• How HDFS works
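
The figure for this slide is not reproduced. In outline, an HDFS read works like this: the client asks the NameNode, which holds only metadata, for a file's blocks and the DataNodes storing each one, then reads each block directly from a DataNode, moving on to another replica if one is unavailable. The in-memory classes below are a hypothetical sketch of that flow, not the HDFS client API.

```python
# Sketch of the HDFS read path: metadata from the NameNode, data from DataNodes.
# Assumption: hypothetical in-memory classes, not the real HDFS client API.


class NameNode:
    """Keeps only metadata: file -> ordered block ids, block id -> DataNode names."""

    def __init__(self) -> None:
        self.files: dict[str, list[str]] = {}
        self.locations: dict[str, list[str]] = {}

    def get_block_locations(self, path: str) -> list[tuple[str, list[str]]]:
        return [(block, self.locations[block]) for block in self.files[path]]


class DataNode:
    """Stores block contents; `alive` simulates whether the node is reachable."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[str, bytes] = {}
        self.alive = True


def read_file(path: str, namenode: NameNode, datanodes: dict[str, DataNode]) -> bytes:
    data = bytearray()
    for block, hosts in namenode.get_block_locations(path):
        for host in hosts:  # try replicas in the order the NameNode lists them
            node = datanodes[host]
            if node.alive and block in node.blocks:
                data += node.blocks[block]  # block data never passes through the NameNode
                break
        else:  # no replica of this block could be read
            raise OSError(f"no live replica of {block}")
    return bytes(data)


def build_demo() -> tuple[NameNode, dict[str, DataNode]]:
    nn = NameNode()
    dns = {name: DataNode(name) for name in ("dn1", "dn2")}
    nn.files["/logs/a.txt"] = ["blk_1", "blk_2"]
    nn.locations = {"blk_1": ["dn1", "dn2"], "blk_2": ["dn2"]}
    dns["dn1"].blocks["blk_1"] = b"hello "
    dns["dn2"].blocks.update({"blk_1": b"hello ", "blk_2": b"world"})
    return nn, dns


if __name__ == "__main__":
    nn, dns = build_demo()
    dns["dn1"].alive = False  # dn1 fails; blk_1 is read from its replica on dn2
    print(read_file("/logs/a.txt", nn, dns))  # b'hello world'
```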

How does Hadoop work? (continued)

• How MapReduce works

How does Hadoop work? (continued)

• How MapReduce works

How does Hadoop work? (continued)

• Managing Hadoop jobs

Applications

• Marketing analytics
• Machine learning (e.g., spam filters)
• Image processing
• Processing of XML messages

• World's largest Hadoop production application
• ~20,000 machines running Hadoop

• The largest Hadoop cluster in the world, with 100 PB of storage
• 1,200 machines with 8 cores each + 800 machines with 16 cores each
• 32 GB of RAM per machine
• 65 million files in HDFS
• 12 TB of compressed data added per day
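
The hardware figures above can be combined into cluster-wide totals. The short calculation below uses only the numbers on this slide and assumes decimal units (1 TB = 10^12 bytes) and a steady ingest rate for the per-second figure.

```python
# Cluster-wide totals derived from the slide's figures.
# Assumptions: decimal units (1 TB = 10**12 bytes), constant daily ingest rate.
machines = {8: 1200, 16: 800}  # cores per machine -> number of machines
ram_per_machine_gb = 32
daily_ingest_tb = 12

total_machines = sum(machines.values())                         # 1,200 + 800 = 2,000
total_cores = sum(cores * n for cores, n in machines.items())  # 9,600 + 12,800 = 22,400
total_ram_gb = total_machines * ram_per_machine_gb              # 64,000 GB
ingest_mb_per_s = daily_ingest_tb * 10**12 / 86_400 / 10**6     # about 138.9 MB/s

print(total_machines, total_cores, total_ram_gb, round(ingest_mb_per_s, 1))
# 2000 22400 64000 138.9
```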

Other Users

Thanks