Jumpstarting Big Data Projects: Stories from the Field
DBI-B336
Alexei Khalyako, Olivia Klose
EM OFC WIN DBI
CDP TWC DEV AZR
Following this session at 18:30 in Hall 5
Meet with Microsoft Product Experts
Snacks and Beverages Served
Ask The Experts Key and floorplan
Cloud and Datacenter Platform
Data Platform and Business Intelligence
Developer Platform and Tools
Enterprise Mobility
Office 365
Windows
Microsoft Azure
Trustworthy Computing
Session Objectives & Key Takeaways
  Focus on Azure & HDInsight
  Go through typical Big Data questions
  Customer use cases
  It is NOT a Hadoop tutorial
Key Takeaways
  Understand the variety of options in Big Data projects
I have Big Data!
Jerome K. Jerome, Three Men in a Boat
… I again turned over the pages. I came to typhoid fever — read the symptoms — discovered that I had typhoid fever, must have had it for months without knowing it — wondered what else I had got; turned up St. Vitus’s Dance — found, as I expected, that I had that too, — began to get interested in my case, and determined to sift it to the bottom, and so started alphabetically — read up ague, and learnt that I was sickening for it, and that the acute stage would commence in about another fortnight. Bright’s disease, I was relieved to find, I had only in a modified form, and, so far as that was concerned, I might live for years. Cholera I had, with severe complications; and diphtheria I seemed to have been born with. I plodded conscientiously through the twenty-six letters, and the only malady I could conclude I had not got was housemaid’s knee.
Overview
Demand, Architecture, Data Loading, Data Preparation, Analytics, Validation
  Do I have Big Data?
  Which platform? (The Agony of Choice)
  How do I get my data?
  How do I pre-process my data?
  How do I analyze my data?
  How do I validate my architecture?
Do I really have Big Data?
Up to 75 control units in 1 vehicle
About 1,000 possible individual equipment options
1 GB car software, 15 GB data on board (incl. navigation)
2,000 user functions implemented
12,000 types of error stored on board for diagnosis
Up to 60,000 car diagnoses worldwide per day
“We have structured data”</meldungText><antwort>False</antwort><wert>na</wert></meldung><steuergeraet sgbdVariante="SMG_60"><steuergeraeteFunktion zeitstempel="2013-04-30T09:00:37.9926171-04:00" endDate="2013-04-30T09:00:38.1158609-04:00" jobName="STATUS_FAHRZEUGTESTER"><datensatz satzNr="1"><result name="JOB_STATUS">OKAY</result><result name="_TEL_ANTWORT">80 F1 18 70 70 02 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 82 6B 00 6D 6B 39 CD 14 00 14 00 00 0E 00 15 00 0A 00 19 00 0C 00 12 00 15 85 57 71 88 81 C0 7D 73 C2 08 01 05 02 F7 00 FF FF 01 73 00 00 02 A8 00 C2 00 01 E0 00 00 00 00 00 00 3D 01 00 00 00 01 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 FD 01 E1 02 05 01 F8 03 4F FF AD 04</result><result name="_TEL_AUFTRAG">83 18 F1 30 02 01</result><result name="STAT_KL15_ROH">0</result><result name="STAT_KLR_EIN_ROH">0</result><result name="STAT_WAKE_UP_ROH">1</result><result name="STAT_ISTGANG_TEXT">Neutral</result><sgFunktion zeitstempel=“2013-04-30T10:33:37.0834084+02:00" endDate="2013-04-30T10:33:37.9310504+02:00" jobName="_FLM_LESEN_BOSCH"><datensatz satzNr="1"><result name="FLM_DATEN_1">00 00 00 03 02 08 C6 56 46 4C 4D 39 00 16 4B B2 00 00 00 32 00 00 06 99 00 00 00 65 00 00 18 6E 00 00 00 73 00 00 00 20 00 00 00 73 00 00 00 00 00 00 10 69 00 00 0F 53 00 00 00 2C 00 00 00 0A 00 00 79 6D 00 00 B7 34 00 00 D3 9E 4A 4C 41 52 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2C 00 00 00 00 00 00 1A 5C 00 15 4B CA 00 00 44 08 00 00 2D 39 00 00 1E 45 00 00 26 89 00 00 1E EB 00 00 0C 65 00 00 04 47 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 27 00 00 01 1E 00 00 02 AB 00 00 07 71 00 00 13 D7 00 00 36 48 00 15 91 AD 00 00 3F 97 00 00 19 C1 00 00 07 F9 00 00 02 D4 00 00 00 BD 00 00 00 20 00 16 1C 42 00 00 18 B1 00 00 09 40 00 00 08 9F 00 00 04 3A 00 00 01 3E 00 01 8C D7 00 00 61 A3 00 00 37 9D 00 00 1E 78 00 00 14 96 00 00 0A 71 00 00 05 49 00 00 02 B1 00 00 00 A7 00 00 00 1D 00 00 00 09 00 00 00 05 00 00 00 00 00 00 00 00 00 00 23 BB 00 00 2F 
84 00 00 14 EF 00 00 09 40 00 00 04 71 00 00 03 34 00 00 02 12 00 00 01 AC 00 00 01 59 00 00 0B C4 00 00 00 06 00 00 00 38 00 00 00 19 00 00 00 01 00 00 00 00 00 00 00 04 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 03 00 00 00 00 00 00 00 00 52 4F 54 48 00 00 00 00 00 00 00 07 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 00 56 30 00 00 00 03 00 11 00 01 01 06 00 00 00 00 00 00 00 00 00 01 00 00 00 0E 00 05 00 1A 00 12 00 00 00 26 00 00 00 00 00 0B 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 44 44 00 43 00 16 00 08 00 0D 00 04 00 02 00 00 00 02 00 11 00 20 00 1A 00 0A 00 15 00 0F 00 1B 00 13 00 08 00 08 00 00 00 00 00 07 00 0E 00 08 00 04 00 02 00 01 00 00 00 6D 00 03 00 02 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0A 00 21 00 15 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0B 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 18 05 1F 00 00 00 00 00 00 00 00 00 1F 00 03 00 02 00 00 00 00 00 00 00 20 00 05 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 62 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2E 00 00 1B 00 19 00 18 00 0D 00 00 00 00 00 00 00 01 00 01 00 02 00 00 06 00 01 E6 00 00 12 00 03 00 02 00 07 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 02 01 BA 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 24 00</result><result name="FLM_DATEN_2">08 00 00 00 00 00 00 00 00 00 00 0C 00 80 1B 00 45 10 00 A6 0D 00 51 16 00 59 44 00 00 EB 00 00 CA 00 00 49 00 00 17 00 10 00 0C 00 05 00 04 00 06 00 02 00 01 00 00 00 00 12 00 00 3A…
Here!
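The dump above is "structured" only in the sense that it is XML: the interesting values are hex-encoded payloads wrapped in `<result>` elements. A minimal Python sketch of how such a record might be unpacked, assuming a small well-formed fragment modelled on the slide (the fragment layout and the helper names `is_hex_payload` and `decode_results` are illustrative, not part of the original tooling):

```python
import string
import xml.etree.ElementTree as ET

# Well-formed fragment modelled on the dump above; values copied from the slide.
SAMPLE = """
<datensatz satzNr="1">
  <result name="JOB_STATUS">OKAY</result>
  <result name="_TEL_AUFTRAG">83 18 F1 30 02 01</result>
  <result name="STAT_ISTGANG_TEXT">Neutral</result>
</datensatz>
"""


def is_hex_payload(text):
    """True for space-separated two-digit hex tokens such as '83 18 F1'."""
    tokens = text.split()
    return len(tokens) > 1 and all(
        len(t) == 2 and all(c in string.hexdigits for c in t) for t in tokens
    )


def decode_results(xml_text):
    """Return {result name: value}; hex payloads are converted to bytes."""
    root = ET.fromstring(xml_text)
    decoded = {}
    for result in root.iter("result"):
        text = (result.text or "").strip()
        decoded[result.get("name")] = (
            bytes.fromhex(text) if is_hex_payload(text) else text
        )
    return decoded


if __name__ == "__main__":
    for name, value in decode_results(SAMPLE).items():
        print(name, value)
```

Even after this step the byte payloads still need a device-specific decoder, which is the point of the slide: the data is poly-structured rather than ready for a relational schema.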
Recommendation Engine
  IIS Logs
  Table Storage
  Blob
  Online Recommender
Which Platform? The Agony of Choice
Big Data State of the Art
Do I really need Hadoop?
[Chart: technologies positioned by Velocity (Batch to Realtime) and Variety (Highly Structured to Poly-Structured): Generalized NoSQL, Hadoop, Standard SQL or MPP Appliances, Specialized NoSQL, Streaming, In-Memory Analytics]
Agony of Choice: Architecture
  On-Premise ↔ Cloud
  Azure ↔ AWS
  HDInsight (PaaS) ↔ Hadoop on Azure (IaaS)
  Windows ↔ Linux
  C# / .NET ↔ Java
  Microsoft ↔ Open Source
Hadoop Deployment Options in Azure
  HDInsight
  Hadoop on Azure
Automated Deployment in Azure
HDInsight
  Need to KNOW the configuration BEFORE deploying the cluster
  PowerShell: http://aka.ms/HDIpowershell
  Azure Data Factory
  Azure Automation
Hadoop on Azure
  Hortonworks or Cloudera
  GitHub / CodePlex: https://github.com/lararubbelke/Azure-DDP/
PowerShell Deployment
HDInsight Configuration
Supported configuration files (hadoop dist):
  core-site.xml
  hdfs-site.xml
  mapred-site.xml
  capacity-scheduler.xml
HDInsight Configuration – Hive
Supported configuration files (hive dist):
  hive-site.xml
PowerShell Deployment – Configuration

    # Custom core-site settings
    $coreConfig = @{
        "io.compression.codec" = "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec";
        "io.sort.mb" = "1024";
    }

    # Custom mapred-site settings
    $mapredConfig = New-Object 'Microsoft.WindowsAzure.Management.HDInsight.Cmdlet.DataObjects.AzureHDInsightMapReduceConfiguration'
    $mapredConfig.Configuration = @{
        "mapred.tasktracker.map.tasks.maximum" = "2";
    }

    # Cluster size and default storage account
    $clusterConfig = New-AzureHDInsightClusterConfig -ClusterSizeInNodes $numberNodes |
        Set-AzureHDInsightDefaultStorage -StorageAccountName $fqStorageAccountName `
            -StorageAccountKey $storageAccountKey `
            -StorageContainerName ($storageContainer.Name)

    # Optionally create and attach five additional storage accounts
    $continueCheck = Read-Host "Attach additional storage accounts? (yes to continue)"
    if ($continueCheck -eq "yes") {
        foreach ($asa in 1..5) {
            $newStorageAccountName = ($clusterPrefix + [DateTime]::Now.ToString("yyyyMMddHHmmss") + "a" + $asa)
            New-AzureStorageAccount -StorageAccountName $newStorageAccountName -Location "North Europe"
            $clusterConfig = $clusterConfig | Add-AzureHDInsightStorage `
                -StorageAccountName ($newStorageAccountName + ".blob.core.windows.net") `
                -StorageAccountKey (Get-AzureStorageKey $newStorageAccountName).Primary
        }
    }

    # Apply the custom configuration values
    $clusterConfig = $clusterConfig | Add-AzureHDInsightConfigValues -Core $coreConfig -MapReduce $mapredConfig
    # At this point we are able to create an HDInsight cluster with a customised configuration.

Changing cluster configuration settings when deploying:
http://aka.ms/HDIconfiguration
HDInsight Configuration – Oozie
Supported configuration files (oozie dist):
  oozie-site.xml
Configuration best practices
  HDInsight is on-demand compute power
  Store important scripts in Blob storage for re-use
  Do not rely on HDFS, as it is NOT the default file system

Example: Oozie job configuration
  nameNode=wasb://container_name@storage_name.blob.core.windows.net
  jobTracker=jobtrackerhost:9010
  queueName=default
  oozie.wf.application.path=wasb:///user/admin/examples/apps/ooziejobs
  outputDir=ooziejobs-out
  oozie.use.system.libpath=true
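Because the cluster is disposable and the storage account is not, it can help to generate the job properties rather than hand-edit them for each cluster. A minimal Python sketch, assuming the property keys and values from the example above; the helper names `wasb_uri` and `oozie_job_properties` are hypothetical:

```python
def wasb_uri(container, account, path=""):
    """Build a wasb:// URI for a container in a blob storage account."""
    return f"wasb://{container}@{account}.blob.core.windows.net{path}"


def oozie_job_properties(container, account, app_path, output_dir,
                         job_tracker="jobtrackerhost:9010", queue="default"):
    """Render a job.properties text with the keys used in the example above."""
    props = {
        "nameNode": wasb_uri(container, account),
        "jobTracker": job_tracker,
        "queueName": queue,
        # wasb:/// (empty authority) resolves against the default container.
        "oozie.wf.application.path": f"wasb://{app_path}",
        "outputDir": output_dir,
        "oozie.use.system.libpath": "true",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    print(oozie_job_properties(
        "container_name", "storage_name",
        "/user/admin/examples/apps/ooziejobs", "ooziejobs-out"))
```

Keeping every path on `wasb://` follows the best practice above: nothing in the job definition depends on HDFS contents that disappear with the cluster.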
Automation: Self-Made
Diagram labels: Company Infrastructure, HDInsight Cluster
Steps
  Provision Cluster
  Run Hive/Pig Script
  Shut down Cluster
Challenges
  Troubleshooting cluster provisioning failures
  Serialized workflow execution
  Troubleshooting job failures: Oozie or Hive/Pig
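The three self-made steps amount to a serialized provision, run, shut down sequence. A minimal Python sketch of that control flow, assuming the three steps are supplied as callables (the names `run_on_demand`, `provision`, `run_script` and `shut_down` are placeholders, not an Azure API):

```python
def run_on_demand(provision, run_script, shut_down):
    """Provision a cluster, run one script on it, and always shut it down."""
    cluster = provision()  # provisioning failures surface here, before any job runs
    try:
        return run_script(cluster)  # job failures propagate to the caller
    finally:
        shut_down(cluster)  # the on-demand cluster is released in every case


if __name__ == "__main__":
    # Stub steps that only record what happened.
    events = []
    result = run_on_demand(
        provision=lambda: events.append("provision") or "cluster-1",
        run_script=lambda cluster: events.append(f"run on {cluster}") or "done",
        shut_down=lambda cluster: events.append(f"shut down {cluster}"),
    )
    print(result, events)
```

The `try`/`finally` keeps a failed Hive or Pig job from leaving a billed cluster running, while separating the provisioning call from the job call keeps the two failure classes listed under Challenges distinguishable.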
Automation: Self-Made (with Azure Automation)
Diagram labels: Azure Automation, HDInsight Cluster
Same steps and challenges as above, with Azure Automation in place of the company infrastructure.
Automation via Azure Data Factory
Incoming Data: IIS Logs
Automation via Azure Data Factory: pipelines
[Diagram: three ADF pipelines built from ADF Tables and ADF Activities, reading input data from Azure Blob, with concurrent execution]
Product Data Augmenter ADF Pipeline (Workflow Step B, daily)
  Input data (Azure Blob): Raw JSON Event Files, ProductFeed
  Hive activities: ProductDataAugmented1, ProductDataAugmented2, ProductDataAugmented3, ProductPrepSQL
  Tables: ProductsParent, Attributes, Variants, ProductsToSql
  ProductAugmenter: resolves JSON event files, augments the latest product data, stores variants/attributes and prepares the data to be loaded into SQL.
Product Polarizer ADF Pipeline (Workflow Step K, daily)
  Pig activity: ProductPolarizer
  ProductPolarizer: retrieves polarizing product information and stores it in a format suitable for SQL.
Product Segmenter ADF Pipeline (Workflow Step M)
  Hive activity: ProductSegmenter
How do I get my Data?
Where the Data was parked
Databases
  SQL Azure
  SQL IaaS
Storage Account
  Table Storage: http://aka.ms/HDItablestorage
  Blob Storage
What Data do I have?
  .txt, .csv, .xml
Note: Hadoop does not do well with lots of small files: http://aka.ms/HDI_smallfiles
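One common way around the small-files problem is to concatenate inputs into fewer, larger files before uploading them. A minimal Python sketch, assuming headerless, line-oriented files in a local staging directory; the function name `merge_small_files`, the `part-NNNNN` naming and the 128 MB default are illustrative choices, not taken from the slides:

```python
from pathlib import Path


def merge_small_files(src_dir, dest_dir, pattern="*.csv",
                      target_bytes=128 * 1024 * 1024):
    """Concatenate matching files into parts of at most roughly target_bytes."""
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    outputs = []
    out, size = None, 0
    try:
        for path in sorted(src.glob(pattern)):
            data = path.read_bytes()
            if data and not data.endswith(b"\n"):
                data += b"\n"  # keep records of adjacent files on separate lines
            # Start a new part when the current one would exceed the target.
            if out is None or (size > 0 and size + len(data) > target_bytes):
                if out is not None:
                    out.close()
                target = dest / f"part-{len(outputs):05d}{path.suffix}"
                out, size = target.open("wb"), 0
                outputs.append(target)
            out.write(data)
            size += len(data)
    finally:
        if out is not None:
            out.close()
    return outputs
```

Files with header rows or XML documents would need format-aware merging instead of plain concatenation.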
How do I pre-process Data?
Data Querying Optimizations
Analytical type of workload
  Large, incrementally growing fact tables
  Data warehouse type of workload
Applied to the Recommendation Engine
  Store the customer-related data
  Use an appropriate partition strategy; a poor one can hurt performance significantly
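The slides do not say which partition strategy was used. As an illustration only, the following Python sketch assumes date-based partitioning of an incrementally growing fact table, with Hive-style `key=value` directory names; the table name and record layout are made up:

```python
from collections import defaultdict
from datetime import datetime


def partition_path(table, event_time):
    """Hive-style partition directory for one event timestamp."""
    return f"{table}/year={event_time:%Y}/month={event_time:%m}/day={event_time:%d}"


def group_by_partition(table, records):
    """Group records (dicts with a 'timestamp' key) by their partition directory."""
    partitions = defaultdict(list)
    for record in records:
        partitions[partition_path(table, record["timestamp"])].append(record)
    return dict(partitions)


if __name__ == "__main__":
    events = [
        {"customer": "a", "timestamp": datetime(2014, 3, 26, 22, 28)},
        {"customer": "b", "timestamp": datetime(2014, 3, 26, 23, 5)},
        {"customer": "a", "timestamp": datetime(2014, 3, 27, 0, 10)},
    ]
    for path, rows in group_by_partition("clicks", events).items():
        print(path, len(rows))
```

With daily partitions, a query restricted to a date range only reads the matching directories. Partitioning on a high-cardinality key instead (for example one directory per customer) tends to produce many tiny files, which is one way a partition strategy can hurt performance.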
Running the Workflow
  Data stored in Blob: accessible from multiple services inside and outside HDI
  Schedule the jobs using Oozie (now moving to ADF)
  Mahout for running clustering algorithms
  Pig is used for preparing the data sets
  Diagram label: Table Storage
How do I analyze my Data?
Analytics: What was wanted?
  Decision Trees
  Recommendation Engine
Analytics Options
Mahout
  Open source
  Write your own code
  Available by default on HDInsight
Azure ML
  Visual composition: UI, drag & drop
  Modules
  Extensible / support for R
  Support for collaboration
  Support for the data science process
Mahout Demo: Run Random Forests!
What are Random Forests?
Hang on...
One Decision Tree
One Random Tree
A Random Forest
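The progression from one decision tree to a random tree to a random forest can be sketched in a few lines of Python. This is a deliberately simplified illustration, not Mahout's implementation: each "tree" is only a one-split stump, each stump is trained on a bootstrap sample using one randomly chosen feature, and the forest predicts by majority vote. All function names are made up for this sketch.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, feature):
    """One decision 'tree' of depth 1: the best threshold split on one feature."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [l for r, l in zip(rows, labels) if r[feature] <= threshold]
        right = [l for r, l in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        left_label, right_label = majority(left), majority(right)
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:  # no usable split: always predict the majority label
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature, best[1], best[2], best[3])


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Random forest: each tree sees a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # draw with replacement
        feature = rng.randrange(len(rows[0]))              # random feature choice
        forest.append(train_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample], feature))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_stump(stump, row) for stump in forest])
```

Individual random trees can be wrong on some inputs because each sees only part of the data and one feature; the vote across many of them is what makes the forest more robust than a single tree.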
Mahout Demo
1. Where does the data need to be?
2. Generate descriptor file
3. Build forest
4. Classify test data
Mahout: Run Random Forests
1. Get Data
  hdfs dfs -cp wasb://<container>@<storage_account>.blob.core.windows.net/user/<remote_user>/testdata/KDDTrain+.arff wasb://<container>@<storage_account>.blob.core.windows.net/user/hdp/testdata/KDDTrain+.arff
  hdfs dfs -cp wasb://<container>@<storage_account>.blob.core.windows.net/user/<remote_user>/testdata/KDDTest+.arff wasb://<container>@<storage_account>.blob.core.windows.net/user/hdp/testdata/KDDTest+.arff
2. Generate Descriptor File
  hadoop jar C:\apps\dist\mahout-0.9\mahout-core-0.9-job.jar org.apache.mahout.classifier.df.tools.Describe -p wasb:///user/hdp/testdata/KDDTrain+.arff -f testdata/KDDTrain+.info -d N 3 C 2 N C 4 N C 8 N 2 C 19 N L
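The `-d N 3 C 2 N C 4 N C 8 N 2 C 19 N L` argument is a compact description of the attribute types. The Python sketch below expands it under the usual reading of Mahout's Describe tool, stated here as an assumption: `N` is a numerical attribute, `C` a categorical one, `L` the label, and a number repeats the token that follows it. The function name `expand_descriptor` is made up.

```python
def expand_descriptor(descriptor):
    """Expand 'N 3 C 2 N ... L' into one type code per attribute."""
    kinds = []
    repeat = 1
    for token in descriptor.split():
        if token.isdigit():
            repeat = int(token)  # applies to the next type token only
        else:
            kinds.extend([token] * repeat)
            repeat = 1
    return kinds


if __name__ == "__main__":
    kinds = expand_descriptor("N 3 C 2 N C 4 N C 8 N 2 C 19 N L")
    print(len(kinds), "columns:", " ".join(kinds))
```

Under that reading the descriptor covers 41 attributes plus the label column, so a quick count like this is a cheap check that the descriptor matches the data file before building the forest.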
3. Build Forest
  hadoop jar C:\apps\dist\mahout-0.9\mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.BuildForest -Dmapred.max.split.size=1874231 -d wasb:///user/hdp/testdata/KDDTrain+.arff -ds wasb:///user/hdp/testdata/KDDTrain+.info -sl 5 -p -t 100 -o nsl-forest
Option callouts on the slide: Data (-d), Dataset (-ds), Selection (-sl), Partial (-p), #Trees (-t), Output (-o), Leaf size
3. Build Forest – Copy Data
  hdfs dfs -cp wasb://<container>@<storageaccount>.blob.core.windows.net/user/<remoteuser>/nsl-forest wasb://<container>@<storageaccount>.blob.core.windows.net/user/hdp/nsl-forest
4. Classify Test Data
  hadoop jar C:\apps\dist\mahout-0.9\mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i wasb:///user/hdp/testdata/KDDTest+.arff -ds wasb:///user/hdp/testdata/KDDTrain+.info -m wasb:///user/hdp/nsl-forest -a -mr -o predictions
4. Classify Test Data
Confusion matrix (rows: actual, columns: predicted)
                   Predicted normal   Predicted anomaly
  Actual normal    9,458              253
  Actual anomaly   4,508              8,325
4. Classify Test Data
  accuracy = (# correctly classified instances) / (# classified instances)
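Applied to the confusion matrix counts from the slide, the accuracy formula can be checked with a few lines of Python. The flattened slide does not preserve the cell positions of the second row, so this sketch assumes that 8,325 is the correctly classified anomaly count and 4,508 the anomalies predicted as normal; with the opposite placement the result would differ.

```python
# Counts from the slide; placement of 4,508 and 8,325 is an assumption (see lead-in).
CONFUSION = {
    "normal": {"normal": 9458, "anomaly": 253},
    "anomaly": {"normal": 4508, "anomaly": 8325},
}


def accuracy(confusion):
    """Correctly classified instances divided by all classified instances."""
    correct = sum(confusion[label][label] for label in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


if __name__ == "__main__":
    print(f"accuracy = {accuracy(CONFUSION):.4f}")
```

The diagonal holds the correct predictions (9,458 + 8,325 = 17,783) out of 22,544 classified instances.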
4. Classify Test Data – Output?
http://aka.ms/mahout
Validation & Troubleshooting
Managing Solution
Performance
  How many cores does each workload use?
  How much data is ingested?
Scalability
  How do I get more compute/storage?
  Does the workload utilize the capacities?
Manageability
  PaaS almost takes care of itself.
  It still needs managing, e.g. the storage account
Monitoring and Troubleshooting
Compute
  Mahout / Pig calculations
I/O
  HDFS: IaaS VM max 16 TB of space
  Blob: different latency and throughput characteristics
Troubleshooting Pig in Recommender
Job fails after running for 4 hours
Workflow
  Load data from the WASB files
  Get product and session data
  Join customer and product data
  Clean up (duplicates, filters, etc.)
  Store
Troubleshooting Pig in Recommender
Intelligent parallelism
  Logical Plan
  Physical Plan
  Reduce Plan may limit execution to a single node
Specific to HDInsight
HDInsight is PaaS
  No admin rights
  No access to the data nodes
Logs know all about the system
  Use an RDP session!
  Get all you may need: Oozie, Pig
Storage Account: Advanced Analytics
  Explains application-to-storage interactions
  Very useful counters:
  AuthorizationError, Availability, AverageE2ELatency, AverageServerLatency, ClientTimeoutError, NetworkError, PercentAuthorizationError, PercentNetworkError, PercentSuccess, ServerTimeoutError, Success, ThrottlingError, Timestamp, TotalBillableRequests, TotalEgress, TotalIngress, TotalRequests
Application: storage throttling. When?
[Chart: data exchange for Jobs and Storage, sum of TotalIngress and sum of TotalEgress, y-axis from 0 to 14,000,000,000]
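Answering "when was the application throttled?" comes down to scanning the storage metrics for intervals with a non-zero `ThrottlingError` count. A minimal Python sketch, assuming the metrics have been exported to CSV with the counter names above as column headers; the export format and all sample values are made up for illustration:

```python
import csv
import io

# Made-up sample rows; only the column names come from the counter list above.
SAMPLE_METRICS = """Timestamp,TotalRequests,ThrottlingError,AverageE2ELatency
2014-03-26T20:00:00Z,1200,0,35.0
2014-03-26T21:00:00Z,9800,42,410.5
2014-03-26T22:00:00Z,11000,130,980.2
"""


def throttled_intervals(csv_text):
    """Return (timestamp, throttling error count) for every throttled interval."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["Timestamp"], int(row["ThrottlingError"]))
        for row in reader
        if int(row["ThrottlingError"]) > 0
    ]


if __name__ == "__main__":
    for timestamp, count in throttled_intervals(SAMPLE_METRICS):
        print(timestamp, count)
```

The resulting timestamps can then be lined up against job logs to see which workflow step was running when the storage account started throttling.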
Mapping Application and Storage Logs
[Chart: Jobs and Storage Total, y-axis from 0 to 1200]
2014-03-26 22:28:37,321 INFO CallbackServlet:539 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000000-140326181153083-oozie-hdp-W] ACTION[0000000-140326181153083-oozie-hdp-W@pig-node-01] callback for action [0000000-140326181153083-oozie-hdp-W@pig-node-01]
2014-03-26 22:28:37,472 INFO PigActionExecutor:539 - USER[Admin] GROUP[-] TOKEN[] APP[receipts-products-mahout] JOB[0000000-140326181153083-oozie-hdp-W] ACTION[0000000-140326181153083-oozie-hdp-W@pig-node-01] action completed, external ID [job_201403261811_0001]
2014-03-26 22:28:37,562 WARN PigActionExecutor:542 - USER[Admin] GROUP[-] TOKEN[] APP[receipts-products-mahout] JOB[0000000-140326181153083-oozie-hdp-W] ACTION[0000000-140326181153083-oozie-hdp-W@pig-node-01] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2]
2014-03-26 22:28:38,101 INFO ActionEndXCommand:539 - USER[Admin] GROUP[-] TOKEN[] APP[receipts-products-mahout] JOB[0000000-140326181153083-oozie-hdp-W] ACTION[0000000-140326181153083-oozie-hdp-W@pig-node-01] ERROR is considered as FAILED for SLA
2014-03-26 22:28:38,228 WARN JPAService:542 - USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] JPAExecutor [WorkflowActionGetJPAExecutor] ended with an active transaction, rolling back
2014-03-26 22:28:38,343 INFO ActionStartXCommand:539 - USER[Admin] GROUP[-] TOKEN[] APP[receipts-products-mahout] JOB[0000000-140326181153083-oozie-hdp-W] ACTION[0
High Latency Timeout
Wrap Up
  Have Big Data? Recognizing the big data need
  Which platform? HDP (IaaS) vs. HDInsight (PaaS)
  Get my data? BLOB preferred (multiple storage accounts)
  Pre-process my data? Pig, Hive and others: do they perform and scale?
  Analyze my data? Mahout, Azure ML and others
  Validate? Performance: interactions between components?
Related Content
Breakout Sessions
  DBI-B219 Introduction to Hadoop through Azure HDInsight
  DBI-B221 TWC | Using Big Data and Machine Learning to Protect Your Online Service
  DBI-B335 Hadoop for Windows Deep Dive
  DBI-B411 Extending Your Hadoop Distributions in the Cloud
Labs
  DBI-H335 Working with Hive in HDInsight
  DBI-IL202 Getting Started Using HBase in Microsoft Azure HDInsight
  DBI-IL203 Processing WebLogs with HDInsight
Find us later at MSE – Data Platform and Business Intelligence
Track Resources
  Olivia Klose http://blogs.technet.com/b/oliviaklose/
  Alexei Khalyako http://alexeikh.wordpress.com/
  Big Data Support http://blogs.msdn.com/b/bigdatasupport/
DBI Track resources
  27 Hands-on Labs + 8 Instructor-Led Labs in Hall 7
  Free SQL Server 2014 Technical Overview e-book: microsoft.com/sqlserver and Amazon Kindle Store
  Free online training at Microsoft Virtual Academy: microsoftvirtualacademy.com
  Try new Azure data services previews! Azure Machine Learning, DocumentDB, and Stream Analytics
Resources
Learning
Microsoft Certification & Training Resources
www.microsoft.com/learning
TechNet
Resources for IT Professionals
http://microsoft.com/technet
Sessions on Demand
http://channel9.msdn.com/Events/TechEd
Developer Network
http://developer.microsoft.com
TechEd Mobile app for session evaluations is currently offline
SUBMIT YOUR TECHED EVALUATIONS
Fill out an evaluation via
CommNet Station/PC: Schedule Builder
LogIn: europe.msteched.com/catalog
We value your feedback!
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.