A quick review of Hadoop and MR

Deep Learning A Quick Review of Hadoop & MapReduce Dr Xuewen Chen’s Group I.I. Itauma Wayne State University Department of Computer Science November 22, 2013 Itauma Introduction to Hadoop & MapReduce

description

Just a basic guide which will be updated based on the Hadoop and MapReduce course from Udacity

Transcript of A quick review of Hadoop and MR

Page 1: A quick review of Hadoop and MR

Deep Learning

A Quick Review of Hadoop & MapReduce

Dr Xuewen Chen’s Group

I.I. Itauma

Wayne State University
Department of Computer Science

November 22, 2013

Itauma Introduction to Hadoop & MapReduce

Page 2: A quick review of Hadoop and MR

Data

Telecommunication.
Internet.
Phone data.
Online stores.
Medicine - X-rays.
Research - Similarity in tumours.

Need to store & process data.

Page 3: A quick review of Hadoop and MR

What is Big Data

Anything that cannot be stored in a traditional database.
Any data too big to be processed on a single machine.

Page 4: A quick review of Hadoop and MR

Challenges in Big Data

Data is created fast.
Data comes from different sources in various formats.
Data is not worthless; it has a lot of value.

Page 5: A quick review of Hadoop and MR

3V’s in Big Data

Volume - Size of data.
Variety - Different sources and formats of data.
Velocity - Speed at which data is generated and made available for processing.

Volume: Cost is based on the size of storage (SAN, AWS). We need cheaper ways to store data reliably, and to read and process it efficiently.
Streaming data and processing can be slow.
Hadoop helps to scale and store data.
Variety: structured, unstructured, or semi-structured data.
Hadoop: Data can be stored in its raw format, so no information is thrown away. [S]

Page 6: A quick review of Hadoop and MR

What Data interests you?

Science.
E-commerce.
Medical.
Social.
Financial.
Sports.
Utilities...

Page 7: A quick review of Hadoop and MR

Cloudera - Doug

Hadoop was named after Doug's son’s toy elephant, which he called Hadoop.
Hadoop stores data in HDFS and processes it with MapReduce.
It offers an efficient way of storing data via HDFS.
Hadoop Ecosystem. [S]
CDH: Cloudera's distribution of Hadoop, with easy installation.
https://docs.google.com/document/d/1v0zGBZ6EHap-Smsr3x3sGGpDW-54m82kDpPKC2M6uiY/edit
Hadoop was originally part of the open source project called Nutch.

S1

Page 8: A quick review of Hadoop and MR

MapReduce

Processing chunks of data in parallel.

S2

Used in recommendation systems, fraud detection, and item classification.
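To make the idea of processing chunks in parallel concrete, here is a minimal local sketch of the map, shuffle, and reduce phases for a word count. It is an illustration only, not Hadoop itself: the function names (`map_chunk`, `shuffle`, `reduce_counts`, `word_count`) and the `chunk_size` parameter are assumptions made for this example, and threads stand in for the nodes of a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_chunk(chunk):
    """Map phase: emit a (word, 1) pair for every word in one chunk of lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_counts(groups):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines, chunk_size=2):
    # Split the input into fixed-size chunks (an assumed stand-in for data blocks).
    chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]
    # Threads stand in for cluster nodes; each one maps a chunk independently.
    with ThreadPoolExecutor() as pool:
        mapped = list(pool.map(map_chunk, chunks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = ["big data big value", "hadoop stores data", "mapreduce processes data"]
    print(word_count(sample))
```

Because each chunk is mapped without looking at any other chunk, the map step can be spread over as many workers as there are chunks; only the shuffle needs to bring values for the same key together.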

Page 9: A quick review of Hadoop and MR

Running Jobs on the Cluster

Hadoop Streaming enables us to write our code in any language, e.g. Python or Octave.
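As a rough sketch of what such code can look like in Python: Hadoop Streaming feeds input lines to the mapper on standard input, expects tab-separated key/value lines on standard output, and hands the reducer those lines sorted by key. The word-count task, the function names, and the `map`/`reduce` command-line argument below are assumptions made for this example, not part of the original slides.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word; the input must already be sorted by key."""
    records = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(records, key=lambda record: record[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention: pass "map" or "reduce" as the first argument.
if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

Saved under a hypothetical name such as `wordcount.py`, the script can be checked locally before going to a cluster, with `sort` playing the role of the shuffle: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`.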

Page 10: A quick review of Hadoop and MR

Appendix

Thank you for your attention

Page 11: A quick review of Hadoop and MR

Thanks!
