Building analytical apps on Hadoop
Building Analytical Applications on Hadoop
Josh Wills | Director of Data Science
November 2012
About Me
What are ‘Analytical Applications’?
The Humble Dashboard
Crossfilter with Flight Information
New York Times Electoral Vote Map
New York Times Electoral Vote Map (Detail)
Analytical Applications vs. Frameworks
A Case Study
Developing Analytical Applications
2012: The Predicting of the President
RealClearPolitics
• Simple Average of Polls
• Transparent
• Simple Interactions
FiveThirtyEight
• “Foxy” Model
• Opaque
• Simple Interactions with a richer UI
Princeton Election Consortium
• Medians and Polynomials
• Transparent
• Rich Interactions
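The two simplest aggregation approaches named on these slides, a simple average of polls (RealClearPolitics) and a median (Princeton Election Consortium), can be contrasted with a small sketch. The poll margins below are invented purely for illustration, and the function names are hypothetical. This is not either site's actual method, only the basic statistic each slide mentions.

```python
from statistics import mean, median

# Hypothetical poll margins (candidate A minus candidate B, in points).
# These numbers are made up for illustration; they are not real polls.
poll_margins = [2.0, 3.0, 1.0, 4.0, -10.0]


def simple_average(margins):
    """Aggregate polls by taking their arithmetic mean."""
    return mean(margins)


def median_margin(margins):
    """Aggregate polls by taking their median, which resists outliers."""
    return median(margins)


avg = simple_average(poll_margins)
med = median_margin(poll_margins)
print(f"simple average: {avg:+.1f}")
print(f"median:         {med:+.1f}")
```

With one outlying poll in the list, the average is pulled to zero while the median stays close to the bulk of the polls, which is one reason a median-based aggregate can be more stable.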
How Did They Do?
A Few of These, Because They’re Fun
Here’s the Rub: One Expert Beat Nate
Index Funds, Hedge Funds, and Warren Buffett
A Brief Introduction to Hadoop
Data Storage in 2001: Databases
• Structured schemas
• Intensive processing done where data is stored
• Somewhat reliable
• Expensive at scale
Data Storage in 2001: Filers
• No schemas, stores any kind of file
• No data processing capability
• Reliable
• Expensive at scale
And Then, This Happened
Data Economics: Return on Byte
Big Data Economics
• No individual record is particularly valuable
• Having every record is incredibly valuable
• Web index
• Recommendation systems
• Sensor data
• Market basket analysis
• Online advertising
Introduction to Hadoop
The Hadoop Distributed File System
• Based on the Google File System
• Data stored in large files
• Large block size: 64MB to 256MB per block
• Blocks are replicated to multiple nodes in the cluster
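To make the block-size and replication bullets concrete, here is a rough back-of-the-envelope sketch. It assumes a 128MB block size (a value inside the range given on the slide) and a replication factor of 3 (the usual HDFS default). The function name `hdfs_footprint` is hypothetical and not part of any Hadoop API.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size_bytes=128 * MB, replication=3):
    """Return (number of blocks, raw bytes stored across the cluster).

    Assumptions: the block size and replication factor are illustrative
    defaults, and the final block only occupies as much space as the data
    it actually holds.
    """
    # Ceiling division: a partially filled last block still counts as a block.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    # Every byte of the file is stored once per replica.
    raw_bytes = file_size_bytes * replication
    return num_blocks, raw_bytes


blocks, raw = hdfs_footprint(1024 * MB)  # a 1GB file
print(f"{blocks} blocks, {raw // MB} MB of raw storage")
```

Under these assumptions a 1GB file splits into 8 blocks and consumes 3GB of raw disk across the cluster, which is the price paid for surviving node failures.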
Simple, Reliable Processing: MapReduce
• Map Stage
• Embarrassingly parallel
• Shuffle Stage: Large-scale distributed sort
• Reduce Stage
• Process all of the values that have the same key in a single step
• Process the data where it is stored
• Write once and you’re done.
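The three stages listed above can be sketched in a single process with plain Python. This is only a toy word count to show the data flow, not Hadoop's API: the function names are hypothetical, and a real job would run the map and reduce stages in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_stage(record):
    """Map: emit a (key, value) pair per word; each record is independent."""
    for word in record.lower().split():
        yield word, 1


def shuffle_stage(pairs):
    """Shuffle: group values by key and sort by key (a local stand-in
    for the large-scale distributed sort)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_stage(key, values):
    """Reduce: process all values that share a key in a single step."""
    return key, sum(values)


def run_job(records):
    mapped = (pair for record in records for pair in map_stage(record))
    return [reduce_stage(key, values) for key, values in shuffle_stage(mapped)]


# Made-up input records for illustration.
records = ["big data big value", "data where it is stored"]
result = run_job(records)
print(result)
```

Because each call to `map_stage` depends only on its own record, the map work can be spread across machines freely. Only the shuffle requires moving data between them.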
Developing Analytical Applications with Hadoop
Novelty is the Enemy of Adoption
The Best Way to Get Started: Apache Hive
• Apache Hive
• Data Warehouse System on top of Hadoop
• SQL-based query language
• SELECT, INSERT, CREATE TABLE
• Includes some MapReduce-specific extensions
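To illustrate why a SQL-based interface lowers the barrier to entry, the sketch below runs the three statement types named on the slide (CREATE TABLE, INSERT, SELECT) against Python's built-in `sqlite3` module. SQLite is used here only as a locally runnable stand-in: it is not Hive, HiveQL differs in its details, and the `page_views` table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a Hive warehouse.
conn = sqlite3.connect(":memory:")

# CREATE TABLE: declare a schema over the data (hypothetical table).
conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT)")

# INSERT: load a few made-up rows.
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("u1", "/home"), ("u2", "/home"), ("u1", "/pricing")],
)

# SELECT with GROUP BY: the kind of aggregate query that a system like
# Hive would translate into map, shuffle, and reduce work on the cluster.
rows = conn.execute(
    "SELECT url, COUNT(*) AS views "
    "FROM page_views GROUP BY url ORDER BY views DESC, url"
).fetchall()
print(rows)
conn.close()
```

The point of the example is the familiarity of the query, not the engine: an analyst who can write the `GROUP BY` above does not need to learn a new programming model to start asking questions of data stored in Hadoop.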
Borrowing Abstractions
Improving the UX (http://github.com/cloudera/impala)
Moving Beyond the Abstractions
Making the Abstract Concrete
Cloudera’s Data Science Course
Analytical Applications I Love
The Experiments Dashboard
Adverse Drug Events
Gene Sequencing and Analytics
The Doctor’s Perspective
A Couple of Themes
1. Structure the data in the way that makes sense for the problem.
2. Interactive inputs, not just interactive outputs.
3. Simpler interfaces that yield more sophisticated answers.
Working Towards The Dream
Moving Beyond MapReduce
Developing Analytical Applications
The Cambrian Explosion…of Frameworks
It’s Frameworks All The Way Down: Spark
• Developed at Berkeley’s AMP Lab
• Defines operations on distributed in-memory collections
• Written in Scala
• Supports reading from and writing to HDFS
IFATWD: Graphlab
• Developed at CMU
• Lower-level primitives
• (but higher than MPI)
• Map/Reduce => Update/Sort
• Flexible, allows for asynchronous computations
• Reads from HDFS
Playing with YARN
BranchReduce (http://github.com/cloudera/branchreduce)