Big Data PPT
![Page 1: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/1.jpg)
Presented By- SHAILJA DALMIA 13IT252
BIG DATA ANALYTICS USING HADOOP
![Page 2: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/2.jpg)
INTRODUCTION
• Era of the digitized world
• Challenges to cutting-edge businesses
• GFS and MapReduce
• In 2006, Mike Cafarella and Doug Cutting, working on the Nutch project, implemented Hadoop.
• Open-source framework for writing and running distributed applications.
![Page 3: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/3.jpg)
WHAT IS BIG DATA?
![Page 4: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/4.jpg)
WHY DFS?
![Page 5: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/5.jpg)
What is Distributed File System?
![Page 6: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/6.jpg)
What is Hadoop?
![Page 7: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/7.jpg)
Hadoop Core Components
![Page 8: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/8.jpg)
What is HDFS?
![Page 9: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/9.jpg)
Design of HDFS
Areas where HDFS is not a good fit
![Page 10: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/10.jpg)
HDFS COMPONENTS
• NameNode
• DataNodes
![Page 11: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/11.jpg)
JobTracker and TaskTracker
![Page 12: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/12.jpg)
HDFS Architecture
![Page 13: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/13.jpg)
MapReduce
• Framework that assigns tasks to the worker nodes (DataNodes).
• Map step: the master node takes the input, partitions it into smaller sub-problems, and distributes them to worker nodes. A worker may split its share again, leading to a multi-level tree structure.
• Reduce step: combines the results and generates the output.
• Each mapping operation is independent of the others. Key-value pairs are generated, then sorted and shuffled.
• Parallelism offers fault tolerance: if one node fails, its work can still be rescheduled.
• Similar to the divide-and-conquer technique: tasks run in parallel to accomplish the work in less time.
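The map, shuffle/sort, and reduce steps above can be sketched in a few lines of plain Python. This is a minimal single-process illustration of the idea using word counting, not Hadoop's actual API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle/sort step: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Reduce step: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # Each chunk is mapped independently of the others, so on a cluster
    # these calls could run on different nodes. Here they run in sequence.
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    # Hypothetical input, already split into three chunks.
    chunks = ["big data needs hadoop", "hadoop stores big data", "data data"]
    print(word_count(chunks))
```

Because no map call depends on another, a failed chunk could simply be mapped again without touching the rest, which is the property the fault-tolerance point relies on.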
![Page 14: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/14.jpg)
Hadoop Key Features:
• Accessible
• Robust
• Simple
• Scalable
• Cost effective
• Flexible
• Fault tolerant
![Page 15: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/15.jpg)
Differences Between Hadoop and RDBMS
Hadoop
• Designed for a scale-out architecture.
• Key-value pairs.
• Functional programming (scripts and code); can build complex models.
• Offline processing (WORA).

RDBMS
• Scaling is expensive.
• Tables with a relational structure.
• Declarative queries.
• Online processing (works well for random reads and writes of a few records).
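The contrast between declarative queries and functional programming over key-value pairs can be illustrated with a small Python sketch. It computes the same per-customer totals twice: once with SQL through the standard-library `sqlite3` module, and once by folding over key-value pairs with `functools.reduce`. The sample records and function names are hypothetical, and neither half is meant to represent how Hadoop or a production RDBMS is actually driven.

```python
import sqlite3
from functools import reduce

# Hypothetical sample data: (customer, amount) records.
ORDERS = [("asha", 120), ("ravi", 80), ("asha", 30), ("ravi", 20), ("meena", 50)]


def totals_declarative(orders):
    """RDBMS style: describe the result in SQL and let the engine plan it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    ).fetchall()
    conn.close()
    return dict(rows)


def totals_functional(orders):
    """Hadoop style: spell out the processing steps over key-value pairs."""
    def combine(acc, pair):
        key, value = pair
        acc[key] = acc.get(key, 0) + value
        return acc

    return reduce(combine, orders, {})


if __name__ == "__main__":
    print(totals_declarative(ORDERS))
    print(totals_functional(ORDERS))
```

The SQL version states what is wanted and relies on a fixed table schema. The functional version states how to process each record, which is more verbose but makes no assumption about a relational structure.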
![Page 16: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/16.jpg)
Hadoop Related Technologies
• Avro: data serialization system with rich data structures, a container file, and a compact, fast binary data format.
• Chukwa: data collection system for monitoring large distributed systems, with a powerful toolkit for analyzing the collected data.
• HBase: distributed database that provides Bigtable-like capabilities.
• Hive: data warehouse useful for data summarization; uses the HiveQL query language.
![Page 17: big dat ppt](https://reader036.fdocuments.net/reader036/viewer/2022062401/58ec105a1a28ab28338b4587/html5/thumbnails/17.jpg)
Conclusion
• Hadoop has gained huge momentum.
• The technologies around it are evolving really fast.
• There is no “one size fits all”.
• Valuable, powerful tool.
• More targeted businesses.