HDInsight Hadoop on Windows Azure
![Page 1: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/1.jpg)
Hadoop on Azure
@LynnLangit
![Page 2: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/2.jpg)
Data Expertise / Lynn Langit
Practicing Architect
• Cloud Deployments (Azure, AWS, Google)
Technical author / trainer
• Google Cloud Developer Series
• SQL Server 2012 Developer Series
• Cloudera Certified Developer
• 2 books on SQL Server BI
Industry awards
• Microsoft – MVP for SQL Server
• Google – GDE for Cloud Platform
• 10Gen – Master for MongoDB
Former MSFT FTE
• 4 years
![Page 3: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/3.jpg)
What is Hadoop?
HUGE Hype factor in 2011 / 2012
Apache Hadoop is a software framework that supports data-intensive distributed applications under a free license
• Uses HDFS storage to enable applications to work with thousands of nodes and petabytes of data
• Uses MapReduce to process the data
• Inspired by Google
  • MapReduce
  • Google File System
![Page 4: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/4.jpg)
What is HDInsight?
• Hadoop on Windows Azure
• On-premise
Microsoft worked with Hortonworks to port Hadoop to Windows (from Linux)
![Page 5: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/5.jpg)
Working with HDInsight
![Page 6: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/6.jpg)
RDBMS vs. Hadoop
| | RDBMS | Hadoop |
|---|---|---|
| Data Size | Gigabytes (Terabytes) | Petabytes (Exabytes) |
| Access | Interactive and Batch | Batch – NOT Interactive |
| Updates | Read / Write many times | Write once, Read many times |
| Structure | Static Schema | Dynamic Schema |
| Integrity | High (ACID) | Low |
| Scaling | Nonlinear | Linear |
| Query Response Time | Can be near immediate | Has latency (due to batch processing) |
![Page 7: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/7.jpg)
Setting Up Your Cluster
![Page 8: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/8.jpg)
Configuration 1
![Page 9: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/9.jpg)
Configuration 2
![Page 10: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/10.jpg)
Pricing (during Preview)
![Page 11: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/11.jpg)
Demo
![Page 12: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/12.jpg)
Basic Administration
Connect via RDP
![Page 13: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/13.jpg)
NameNode Utility – Top Level
![Page 14: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/14.jpg)
NameNode Utility – Drill Down
![Page 15: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/15.jpg)
Understanding Storage
![Page 16: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/16.jpg)
Using the Azure Storage Viewer
![Page 17: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/17.jpg)
What is MapReduce?
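
As a rough illustration of the idea behind this slide, the sketch below walks through the three conceptual stages of a MapReduce word count in plain Python, on a single machine. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example, and a real cluster would run the map and reduce steps in parallel across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: collapse the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

Calling `word_count(["the quick fox", "The lazy dog the end"])` counts `the` three times, because the map step lowercases each word before emitting it.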
![Page 18: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/18.jpg)
MapReduce using Java
WordCount example
![Page 19: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/19.jpg)
MapReduce using C# Streaming
WordCount example
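
The slide's C# code is not in this transcript, so the following is only a sketch of the Hadoop Streaming contract that any streaming language follows, written in Python instead of C#. With streaming, the mapper and reducer are ordinary programs that read lines from standard input and write tab-separated key/value lines to standard output, and the reducer receives its input sorted by key. The function names `mapper` and `reducer` are hypothetical; they take and return lines so the logic can be exercised without a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive lines sharing a key.

    Assumes the input is already sorted by key, which is what the
    framework guarantees between the map and reduce stages.
    """
    records = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(records, key=lambda rec: rec[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"
```

To try it locally, sort the mapper output before handing it to the reducer, for example `list(reducer(sorted(mapper(["b a", "A c"]))))`. In a real job each function would be wrapped in a small script that loops over `sys.stdin` and prints each result line.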
![Page 20: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/20.jpg)
MapReduce using JavaScript
WordCount example
![Page 21: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/21.jpg)
Simple Output Graphing
WordCount example
![Page 22: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/22.jpg)
Using HIVE
![Page 23: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/23.jpg)
Understanding Pig
Load > Transform > Dump or Store
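
The load, transform, then dump-or-store flow can be mimicked in a few lines of Python. This is an analogy only, not Pig Latin: the sample columns (`page`, `status`) and the function names `load`, `transform` and `store` are invented for illustration, and the data is an in-memory CSV string rather than files in HDFS.

```python
import csv
import io
from collections import Counter


def load(text: str) -> list[dict[str, str]]:
    """Load: parse CSV text with a header row into a list of records."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> Counter:
    """Transform: keep error responses (status >= 400), then count them per page."""
    kept = (row for row in rows if int(row["status"]) >= 400)
    return Counter(row["page"] for row in kept)


def store(counts: Counter) -> str:
    """Store: serialise the result as CSV text, one 'page,count' line per page."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for page, n in sorted(counts.items()):
        writer.writerow([page, n])
    return out.getvalue()
```

Printing the result of `transform(load(text))` plays the role of a dump, while `store(...)` plays the role of writing the relation out. One difference worth keeping in mind is that Pig builds up a plan and only executes it when a dump or store is requested, whereas this sketch runs each step eagerly.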
![Page 24: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/24.jpg)
Monitoring Job Results
In the portal
• Main Console: Job icon (button) status summary
• Job History
• Interactive Console: JS quick feedback, JS detailed feedback (log)
Using RDP
• Map/Reduce tool
• Hadoop command prompt
![Page 25: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/25.jpg)
![Page 26: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/26.jpg)
Monitoring Job Status
![Page 27: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/27.jpg)
Download – ODBC for HIVE
Includes add-in for Excel
![Page 28: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/28.jpg)
Hadoop Connector to Excel
![Page 29: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/29.jpg)
Connecting to PowerPivot
Create an ODBC connection to HIVE
Connect to ‘other data source’ in PowerPivot
![Page 30: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/30.jpg)
Connecting with PowerQuery
![Page 31: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/31.jpg)
Pulling it Together - Klout
![Page 32: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/32.jpg)
Hadoop To-Do List
BigData = Hadoop
• Use Hadoop when business needs dictate
• Use other NoSQL if a better fit
Hadoop on the cloud
• Quick and cheap
• Specialized use cases
• Behavioral data
• Dev, test, training environments
Hadoop access technologies
• Learn Map/Reduce
• Use HIVE via Excel
• Pay attention to Impala
![Page 33: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/33.jpg)
www.TeachingKidsProgramming.org
![Page 34: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/34.jpg)
VOTE / CONFIRM / SHARE
![Page 35: HDInsight Hadoop on Windows Azure](https://reader036.fdocuments.net/reader036/viewer/2022081518/554a130eb4c905825d8b4c29/html5/thumbnails/35.jpg)
Keep Learning
@LynnLangit
YouTube – SoCalDevGal
Hire Me
• Architecture
• Best Practices
• Performance Tuning