Big data analysis using map/reduce

Big Data Analysis for Page Ranking using Map/Reduce

R.Renuka, R.Vidhya Priya, III B.Sc., IT, The S.F.R.College for Women, Sivakasi.

Overview

– Introduction
– What is Big Data?
– Why Big Data?
– 4 V’s of Big Data
– Big Data Analytics Technologies
– Map/Reduce
– Applications
– Case Study
– Conclusion

Introduction

Data have outgrown the storage and processing capabilities of a single host.

Three fundamental challenges:
– how to store voluminous data,
– how to work with voluminous data sizes, and
– how to understand data and turn it into a competitive advantage.

What is Big Data?

‘Big data’ is similar to ‘small data’, but bigger.

Bigger data, however, requires different approaches: techniques, tools and architectures.

The aim: to solve new problems, and to solve old problems in a better way.

The Blind men and the Elephant

Why Big Data?

Key enablers for the growth of “Big Data” are:

Increase of Processing Power

Increase of Storage Capacities

Availability of Data

4 V’s of Big Data

Big Data Analytics Technologies

Hadoop

PLATFORA

WibiData

PIG

Hive

MapReduce

NoSQL databases

Column-oriented databases

Hadoop

Hadoop is a distributed file system and data processing engine.

Hadoop has two components:
– The Hadoop Distributed File System (HDFS)
– The MapReduce programming model

Map / Reduce

A high-level abstracted framework for distributed processing of large datasets

Fault tolerance and parallelization

Computation consists of two phases: Map and Reduce

A Master-Slave architecture

Computation occurs on multiple slave nodes

The framework tries to preserve data locality as much as possible.

MR model

Map
– Process a key/value pair to generate intermediate key/value pairs

Reduce
– Merge all intermediate values associated with the same key

Users implement an interface of two primary methods:
1. Map: (key1, val1) → (key2, val2)
2. Reduce: (key2, [val2]) → [val3]
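The two-method interface above can be illustrated with a small in-memory sketch. This is not Hadoop code: the driver `run_map_reduce`, the functions `map_links` and `reduce_counts`, and the three-page `web` graph are all hypothetical names and data chosen for illustration. Counting inbound links per page is used only as a simple stand-in related to page ranking, not as a full ranking algorithm.

```python
from collections import defaultdict

def run_map_reduce(records, map_fn, reduce_fn):
    """Toy single-process driver (an assumption, not the Hadoop API)."""
    # Map phase: each (key1, val1) record yields intermediate (key2, val2) pairs.
    groups = defaultdict(list)
    for key1, val1 in records:
        for key2, val2 in map_fn(key1, val1):
            # Grouping by key2 stands in for the framework's shuffle step.
            groups[key2].append(val2)
    # Reduce phase: merge all intermediate values that share the same key.
    return {key2: reduce_fn(key2, vals) for key2, vals in groups.items()}

def map_links(page, outlinks):
    # Map: (page, [outlinks]) -> (target, 1) for every outgoing link.
    for target in outlinks:
        yield target, 1

def reduce_counts(page, counts):
    # Reduce: (page, [1, 1, ...]) -> [number of inbound links].
    return [sum(counts)]

# Hypothetical three-page link graph: page -> pages it links to.
web = [("a", ["b", "c"]), ("b", ["c"]), ("c", ["a"])]
inbound = run_map_reduce(web, map_links, reduce_counts)
print(inbound)
```

In a real cluster the map calls would run in parallel on the nodes holding the data, and the grouping step would move intermediate pairs across the network; the sketch keeps only the shape of the two user-supplied functions.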

Applications

Homeland Security

Finance

Smarter Healthcare

Multi-channel sales

Telecom

Manufacturing

Traffic Control

Trading Analytics

Fraud and Risk

Log Analysis

Search Quality

Retail

Case Study

Conclusion

Real-time big data isn’t just a process for storing petabytes or exabytes of data in a data warehouse. It’s about the ability to make better decisions and take meaningful actions at the right time.

Queries?