A quick review of Hadoop and MR
Deep Learning
A Quick Review of Hadoop & MapReduce
Dr Xuewen Chen's Group
I.I. Itauma
Wayne State University
Department of Computer Science
November 22, 2013
Itauma Introduction to Hadoop & MapReduce
Data
Telecommunication.
Internet.
Phone data.
Online stores.
Medicine - X rays.
Research - Similarity in tumours.
Need to store & process data.
What is Big Data?
Anything that cannot be stored in a traditional database.
Any data too big to be processed on a single machine.
Challenges in Big Data
Data is created fast.
Data comes from different sources in various formats.
Data is not worthless; it has a lot of value.
3V’s in Big Data
Volume - Size of data.
Variety - Different sources and formats of data.
Velocity - Speed at which data is generated and made available for processing.
Volume: Cost is based on the size of storage (SAN, AWS). We need cheaper ways to store data reliably, and to read and process it efficiently.
Streaming data and processing it can be slow.
Hadoop helps to scale and store data.
Variety: Structured, unstructured, or semi-structured data.
Hadoop: Data can be stored in its raw format, so no information is thrown away. [S]
What Data interests you?
Science.
E-commerce.
Medical.
Social.
Financial.
Sports.
Utilities...
Cloudera - Doug
Hadoop was named after Doug's son's toy elephant, which he called Hadoop.
Hadoop stores data in HDFS and processes it with MapReduce.
It offers an efficient way of storing data via HDFS.
Hadoop Ecosystem. [S]
CDH. A distribution of Hadoop with easy installation:
https://docs.google.com/document/d/1v0zGBZ6EHap-Smsr3x3sGGpDW-54m82kDpPKC2M6uiY/edit
Hadoop was originally part of the open source project called Nutch.
S1
MapReduce
Processing chunks of data in parallel.
S2
Used in recommendation systems, fraud detection, and item classification.
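The idea of processing chunks in parallel can be illustrated with a word count, the usual first MapReduce example. The sketch below is a single-machine imitation in plain Python, not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the chunks are processed one after another rather than on separate nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(chunks):
    # On a cluster, each chunk would be mapped on a different node in parallel.
    # Here the chunks are handled sequentially, purely for illustration.
    pairs = []
    for chunk in chunks:
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    chunks = ["hadoop stores data", "hadoop processes data"]
    print(word_count(chunks))
```

Each map call only sees its own chunk, which is what makes the map step easy to spread across machines; the shuffle step is the only place where results from different chunks meet.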
Running Jobs on the Cluster
Hadoop Streaming enables us to write our code in any language, e.g. Python or Octave.
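With Hadoop Streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output. A minimal Python sketch of a word-count pair is shown below. The single-file layout and the "map" / "reduce" command-line argument are assumptions made here for brevity; in practice the two are often separate scripts passed to the streaming jar through its -mapper and -reducer options.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts per word.

    The input must already be sorted by key, which is how Streaming
    delivers mapper output to the reducer.
    """
    parsed = (
        line.rstrip("\n").split("\t", 1)
        for line in sorted_lines
        if line.strip()  # skip blank lines
    )
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Hypothetical convention: run as `python wordcount.py map` or `... reduce`.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

The pair can be tried without a cluster by piping a text file through the map step, then sort, then the reduce step, which mimics the sort Hadoop performs between the two phases.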
Appendix Thank you for your attention
Thanks!