Real-Time Alarm Verification with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm...
![Page 1: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/1.jpg)
Real-Time Alarm Verification with Spark Streaming and Machine Learning
Ana Sima, Jan Stampfli, Kurt Stockinger
Zurich University of Applied Sciences
Swiss Big Data User Group Meeting
January 23, 2017
![Page 2: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/2.jpg)
About Me
• Prof. Dr. Kurt Stockinger
• Since summer 2013 at ZHAW
  • Databases, Data Warehousing, Big Data
• 2007-2013:
  • Data Warehouse & Business Intelligence Architect
• 2004-2007:
  • Computer Scientist
• 1999-2003:
  • Computer Scientist
![Page 3: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/3.jpg)
What is the Problem with Alarm Systems?
![Page 4: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/4.jpg)
A Typical Example
Serious universities…
![Page 5: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/5.jpg)
…with Serious Students
![Page 6: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/6.jpg)
A Serious Party in a Student Home
![Page 7: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/7.jpg)
…Serious Fire Fighters
![Page 8: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/8.jpg)
Why Machine Learning?
• Around 90% of reported incidents are false alarms
• Human operators should prioritize true alarms
• Machine learning is able to
  • discover the patterns buried in the alarm data
  • separate true from false alarms with a high accuracy
![Page 9: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/9.jpg)
DIGITAL FOOTPRINT OF A SECURED OBJECT
Alarm panel
Transmitter
User behavior
Alarm types
User behavior will improve the overall process. User-based business models are possible with machine learning tools, e.g. service payments only when an alarm panel is activated.
![Page 10: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/10.jpg)
Research Project with Sitasys
![Page 11: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/11.jpg)
Talk Outline
• Machine learning for alarm verification
• End-to-end performance analysis:
  • Stream processing (live analysis)
  • Batch processing (historic analysis)
  • Online machine learning (live verification)
![Page 12: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/12.jpg)
Machine Learning with Spark
![Page 13: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/13.jpg)
About Me
Jan Stampfli
• ZHAW Bachelor in Computer Science
  • Graduated in 2014
• Research assistant at ZHAW
  • Since September 2014
• Working in research projects on the topics
  • Big Data
  • Machine Learning
  • Information Retrieval
![Page 14: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/14.jpg)
A Typical Machine Learning Process
![Page 15: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/15.jpg)
Spark ML Pipeline
http://spark.apache.org/docs/latest/ml-pipeline.html

| location | label | hash | prediction | probabilities |
| --- | --- | --- | --- | --- |
| Bern | 1 | 2066911 | 0 | [0.51, 0.49] |
| Chur | 0 | 2099682 | 0 | [0.99, 0.01] |
| Sion | 1 | 2577109 | 1 | [0.13, 0.87] |

[Diagram: DataFrame → Pipeline (Transformer and Estimator stages) → DataFrame]
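
The hash values shown for Bern, Chur and Sion coincide with what Java's `String.hashCode()` returns for those strings. Whether the Transformer on the slide used exactly this function is an assumption; the sketch below only illustrates how a categorical column can be mapped to a numeric one, and the class name `LocationHash` is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class LocationHash {
    // Maps each location to its String.hashCode(), preserving input order.
    static Map<String, Integer> hashLocations(List<String> locations) {
        Map<String, Integer> hashed = new LinkedHashMap<>();
        for (String location : locations) {
            hashed.put(location, location.hashCode());
        }
        return hashed;
    }

    public static void main(String[] args) {
        // Locations taken from the slide's example DataFrame.
        hashLocations(Arrays.asList("Bern", "Chur", "Sion"))
            .forEach((location, hash) -> System.out.println(location + " -> " + hash));
    }
}
```

Running `main` prints the three pairs `Bern -> 2066911`, `Chur -> 2099682` and `Sion -> 2577109`, matching the hash column above.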
![Page 16: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/16.jpg)
Spark ML Tuning
• Random Forest
  • Number of trees
  • Max depth of trees
• Parameter Map
  • [10, 20, 30]
  • [5, 25, 50]
[Diagram: DataFrame → CrossValidator (Pipeline + Evaluator)]
http://spark.apache.org/docs/latest/ml-tuning.html
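
To make the size of this search concrete, here is a minimal plain-Java sketch (no Spark involved) that enumerates the grid spanned by the two parameter lists. The class and method names are hypothetical, and the fold count of 10 is taken from the appendix code of this talk; a cross-validator trains one model per combination and fold.

```java
import java.util.ArrayList;
import java.util.List;

class ParamGridSketch {
    // Builds every (numTrees, maxDepth) pair from the two candidate lists.
    static List<int[]> combinations(int[] numTrees, int[] maxDepths) {
        List<int[]> grid = new ArrayList<>();
        for (int trees : numTrees) {
            for (int depth : maxDepths) {
                grid.add(new int[] {trees, depth});
            }
        }
        return grid;
    }

    // With k-fold cross-validation, each combination is trained once per fold.
    static int modelFits(int combinations, int folds) {
        return combinations * folds;
    }

    public static void main(String[] args) {
        List<int[]> grid = combinations(new int[] {10, 20, 30}, new int[] {5, 25, 50});
        for (int[] params : grid) {
            System.out.println("numTrees=" + params[0] + ", maxDepth=" + params[1]);
        }
        System.out.println(grid.size() + " combinations, "
            + modelFits(grid.size(), 10) + " model fits with 10 folds");
    }
}
```

The two lists of three values give 9 combinations, so a 10-fold search trains 90 models before the best setting is chosen, which is worth keeping in mind when widening the grid.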
![Page 17: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/17.jpg)
Spark ML Evaluation
http://spark.apache.org/docs/latest/mllib-evaluation-metrics.html

Prediction DataFrame

| label | probabilities |
| --- | --- |
| 1 | [0.51, 0.49] |
| 0 | [0.99, 0.01] |
| 1 | [0.13, 0.87] |

[Diagram: Prediction DataFrame → Evaluator → Accuracy, Precision, Recall, …]
![Page 18: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/18.jpg)
Applied Machine Learning
![Page 19: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/19.jpg)
Data about Real Alarms
• Features
  • Location, sensor type, day of week, …
• Labels
  • 0 = false alarm
  • 1 = true alarm

| | Total | False alarms | True alarms |
| --- | --- | --- | --- |
| Total | 339’841 | 165’997 | 173’844 |
| Training | 169’920 | 84’960 | 84’960 |
| Test | 169’921 | 81’037 | 88’884 |
![Page 20: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/20.jpg)
Applied Algorithms
• Random Forest
• Support Vector Machine
• Logistic Regression
• Deep Neural Network (DNN)
![Page 21: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/21.jpg)
Results
[Chart: accuracy per algorithm; data labels 92.33% and 92.05%]
Jan Stampfli, Kurt Stockinger (2016). Applied Data Science: Using Machine Learning for Alarm Verification, ERCIM News, 107, 10.
![Page 22: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/22.jpg)
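Evaluation - Random Forest
• Confusion matrix
• Precision (correctly predicted alarms)
• Recall (correctly detected alarms)

Confusion matrix

| | Prediction True | Prediction False |
| --- | --- | --- |
| Label True | 82’999 | 5’885 |
| Label False | 7’148 | 73’889 |

Precision

| | Prediction True | Prediction False |
| --- | --- | --- |
| Label True | 92.07% | |
| Label False | | 92.62% |

Recall

| | Prediction True | Prediction False |
| --- | --- | --- |
| Label True | 93.38% | |
| Label False | | 91.18% |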
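
The precision and recall figures follow directly from the four confusion-matrix counts. The sketch below recomputes them in plain Java; the class and method names are hypothetical, and only the four counts come from the slide.

```java
class ConfusionMatrixMetrics {
    // Share of all alarms that were classified correctly.
    static double accuracy(long tp, long fn, long fp, long tn) {
        return (double) (tp + tn) / (tp + fn + fp + tn);
    }

    // Of everything predicted as this class, how much really belongs to it.
    static double precision(long hits, long wronglyPredictedAsClass) {
        return (double) hits / (hits + wronglyPredictedAsClass);
    }

    // Of everything that really belongs to this class, how much was found.
    static double recall(long hits, long missedOfClass) {
        return (double) hits / (hits + missedOfClass);
    }

    public static void main(String[] args) {
        // Counts from the Random Forest confusion matrix on the test set.
        long tp = 82_999, fn = 5_885, fp = 7_148, tn = 73_889;
        System.out.printf("accuracy          = %.4f%n", accuracy(tp, fn, fp, tn));
        System.out.printf("precision (true)  = %.4f%n", precision(tp, fp));
        System.out.printf("recall (true)     = %.4f%n", recall(tp, fn));
        System.out.printf("precision (false) = %.4f%n", precision(tn, fn));
        System.out.printf("recall (false)    = %.4f%n", recall(tn, fp));
    }
}
```

Rounded to two decimals as percentages, the five values reproduce the 92.07%, 93.38%, 92.62% and 91.18% entries of the tables, and the accuracy comes out at 92.33%.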
![Page 23: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/23.jpg)
Conclusion about ML
• Human operators are now able to prioritize alarms predicted as true
• False alarms must still be evaluated
![Page 24: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/24.jpg)
Performance
![Page 25: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/25.jpg)
About Me
Ana Sima
• EPFL Master in SW Systems, 2015
• Previously worked on a framework for unified stream & batch processing (Cyclone)
• Joined ZHAW in November 2016
![Page 26: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/26.jpg)
Welcome back from the holidays!
![Page 27: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/27.jpg)
Why is alarm processing a time-critical mission?
![Page 28: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/28.jpg)
When things go wrong…
It’s all about response time!
![Page 29: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/29.jpg)
Performance testing
[Diagram: Data Generator → Alarm stream → Data Consumer (Streaming, Historic, Machine Learning)]
![Page 30: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/30.jpg)
Designing the consumer
• Spark - “A unified engine for Big Data Processing”
[Diagram labels: SQL, Streaming, Machine Learning, Graph Processing; DataFrames, DataSets, RDDs; Kryo, Gson, Jackson]
![Page 31: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/31.jpg)
No one size fits all, so how do I choose???
![Page 32: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/32.jpg)
The base case
[Diagram: Producer (write N alarms/s) → Alarms → Consumer (Project & Filter, Distinct, Compute Histogram, Predict True/False), with a Historic query]
![Page 33: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/33.jpg)
Consumer Workflow
• Every window of 10 s
  • Streaming
    • mAddrsInWindow = alarms.map(Alarm::getMacAddress).distinct();
  • Historic
    • Select MacAddress, count(*) from mongoDB Where MacAddress in mAddrsInWindow Group By MacAddress
  • Machine Learning
    • (Covered in first half of talk)
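
The streaming and historic steps of this workflow can be sketched in plain Java on in-memory lists. This is only an illustration under stated assumptions: the talk runs the first step on a Spark stream and the second as a MongoDB query, and the class `WindowHistogram` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WindowHistogram {
    // Streaming step: distinct MAC addresses seen in the current window.
    static Set<String> distinctMacs(List<String> windowMacs) {
        return new HashSet<>(windowMacs);
    }

    // Historic step: count past alarms per MAC address, restricted to the
    // addresses of the current window (the "where ... in ... group by" query).
    static Map<String, Long> histogram(List<String> historicMacs, Set<String> inWindow) {
        return historicMacs.stream()
            .filter(inWindow::contains)
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up MAC addresses, for illustration only.
        List<String> window = Arrays.asList("aa:01", "aa:01", "cc:03");
        List<String> history = Arrays.asList("aa:01", "bb:02", "aa:01", "cc:03");
        System.out.println(histogram(history, distinctMacs(window)));
    }
}
```

Restricting the historic count to the addresses of the current window keeps the query small, which is presumably why the workflow computes the distinct set first.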
![Page 34: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/34.jpg)
SAVE Benchmark
Identifying Bottlenecks
1
ZHAW Datalab
![Page 35: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/35.jpg)
Identifying bottlenecks (1)
1. Started experiment with
   • 8-core Kafka Producer
   • 8-core Spark Consumer
2. Consumer slow even for a basic streaming task
   • Count number of elements in window => a few sec
3. Producer not outputting expected throughput
   • Stumbled @ around 12K/s (record size ≃ 600 B)
![Page 36: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/36.jpg)
Common root cause?
[Diagram: Producer (serialize with FasterXML Jackson, write N alarms/s) → Serialized Alarms → Consumer (Deserialize, Project & Filter, Distinct, Compute Histogram, Predict True/False), with a Historic query]
![Page 37: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/37.jpg)
Does it really matter?
• Per-object serialization only takes a few tens of ns
• Multiply that by 12K..
• And btw, have it all sent in 1 s!
![Page 38: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/38.jpg)
Are we using the right tool?
• Answer: benchmarks*!
• “If your environment primarily deals with lots of small JSON requests […] then GSON is your library of interest. Jackson struggles the most with small files.”
• Jackson: up to 3x slower for small objects
*The ultimate JSON library: http://blog.takipi.com/the-ultimate-json-library-json-simple-vs-gson-vs-jackson-vs-json/
![Page 39: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/39.jpg)
SAVE Benchmark
Identifying Bottlenecks
2
ZHAW Datalab
![Page 40: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/40.jpg)
Results (1)
[Figure: two bar charts comparing Jackson and Gson, Producer max throughput (alarms/s) and Consumer max throughput (alarms/s); axes from 0 to 30,000]
* < 30 LOC change => ~2x speedup in both Producer & Consumer
![Page 41: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/41.jpg)
How much will fit in 10 s?
[Figure: stacked bar chart, breakdown of time per subtask (Streaming, Historic, Machine Learning) for Kafka + Jackson (3.5K alarms), Kafka + GSON (3.5K alarms) and Kafka + GSON (5.7K alarms); axis from 0 to 12; data labels as extracted: 2.91, 2.3, 0.02, 0.02, 0.02, 7, 5, 7.5]
Note: the alarms RDD is used for both the streaming & the ML part => serializer change benefits both
![Page 42: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/42.jpg)
Identifying bottlenecks (2)
1. Caching?
   • Brought some improvement, but small (~5%)
   • The amount of data reused is too little (20 MB)
2. Simplifying the design?
   • The KISS principle (Keep It Simple, Stupid!)
![Page 43: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/43.jpg)
Simplification
[Diagram: Producer (write N alarms/s) → Alarms → Capped collection → Alarms → Consumer (Project & Filter, Distinct, Compute Histogram, Predict True/False)]
![Page 44: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/44.jpg)
SAVE Benchmark
Identifying Bottlenecks
3
ZHAW Datalab
![Page 45: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/45.jpg)
Results (2)
[Figure: two bar charts of max throughput (alarms/s), one labeled Producer; axes from 0 to 30,000]
Note: Producer slow, but still @ previous max consumer throughput! Consumer: ~50% speedup.
![Page 46: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/46.jpg)
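How much will fit in 10 s?
[Figure: stacked bar chart, breakdown of time per subtask (Streaming, Historic, Machine Learning) for Kafka + Jackson (3.5K alarms), Kafka + GSON (3.5K alarms) and Mongo + GSON (3.5K alarms); axis from 0 to 12; data labels as extracted: 2.91, 1, 0.02, 0.02, 0.02, 7, 5, 2]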
![Page 47: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/47.jpg)
Root cause?
• Using Kafka direct stream with a single partition => poor parallelism
  • Most operations run on a single executor
  • Even though Spark is configured to run on 8 cores
• Using MongoDB => finer grained control over parallelism level for the RDD
  • Read an array of documents from tailable cursor
  • Parallelize array into RDD & run ML on each part
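
The effect of the partition count can be illustrated with a plain-Java analogy that uses a thread pool instead of Spark executors. Everything here is an assumption for illustration: `PartitionedBatch` is a hypothetical class, and counting the elements of a slice stands in for the per-partition ML scoring. With a single partition, all work lands on one worker no matter how many threads the pool has.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PartitionedBatch {
    // Splits a batch into at most numPartitions contiguous slices of near-equal size.
    static <T> List<List<T>> partition(List<T> batch, int numPartitions) {
        List<List<T>> parts = new ArrayList<>();
        int size = (batch.size() + numPartitions - 1) / numPartitions; // ceiling division
        for (int start = 0; start < batch.size(); start += size) {
            parts.add(batch.subList(start, Math.min(start + size, batch.size())));
        }
        return parts;
    }

    // Submits one task per partition and sums the per-partition results.
    static int countInParallel(List<String> alarms, int numPartitions) {
        ExecutorService pool = Executors.newFixedThreadPool(numPartitions);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<String> part : partition(alarms, numPartitions)) {
                Callable<Integer> task = part::size; // stand-in for scoring one partition
                results.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("partition task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> alarms = new ArrayList<>();
        for (int i = 0; i < 3500; i++) {
            alarms.add("alarm-" + i);
        }
        // 8 partitions to match the 8 cores mentioned on the slide.
        System.out.println(countInParallel(alarms, 8));
    }
}
```

Choosing as many partitions as there are cores gives every core a slice of the batch, which is the same idea as splitting the array read from MongoDB into several parts before running the model.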
![Page 48: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/48.jpg)
FIX IT!
Configuring the Kafka Direct Stream in Spark with proper settings….
(num partitions = num cores)
![Page 49: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/49.jpg)
Results (3)
[Figure: two bar charts of max throughput (alarms/s), one labeled Producer; axes from 0 to 30,000]
![Page 50: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/50.jpg)
Designing the consumer
• Spark - “A unified engine for Big Data Processing”
[Diagram labels: SQL, Streaming, Machine Learning, Graph Processing; DataFrames, DataSets, RDDs; Kryo, Gson, Jackson]
Easy to use tools… can sometimes be tricky to get right
![Page 51: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/51.jpg)
Conclusions & Lessons Learned
• Machine learning algorithms of Spark can be directly applied
  • Works well for small and big data
  • No re-writing necessary to make algorithms scale across a computing cluster
• Spark Streaming can’t be used out of the box for stream and batch processing:
  • Need to use a persistence layer
  • Requires significant performance tuning
• Working with our industry partner Sitasys:
  • Great collaboration
  • Working with a real-world problem is very rewarding
  • Can make a real contribution and enhance existing algorithms and technology
![Page 52: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/52.jpg)
Appendix
![Page 53: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/53.jpg)
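Pipeline Example Code (Java)

    import org.apache.spark.ml.Pipeline;
    import org.apache.spark.ml.PipelineStage;
    import org.apache.spark.ml.classification.RandomForestClassifier;
    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator;
    import org.apache.spark.ml.feature.StringIndexer;
    import org.apache.spark.ml.param.ParamMap;
    import org.apache.spark.ml.tuning.CrossValidator;
    import org.apache.spark.ml.tuning.ParamGridBuilder;

    // Create string indexer Transformer.
    StringIndexer stringIndexer = new StringIndexer()
        .setInputCol("location")
        .setOutputCol("indexLocation");
    // Create random forest Estimator.
    RandomForestClassifier randomForestClf = new RandomForestClassifier()
        .setFeaturesCol("indexLocation")
        .setLabelCol("label");
    // Create pipeline Estimator (specifying the ML workflow).
    Pipeline pipeline = new Pipeline()
        .setStages(new PipelineStage[] {stringIndexer, randomForestClf});
    // Create cross validation Estimator (wrapping the pipeline Estimator).
    ParamMap[] paramGrid = new ParamGridBuilder()
        .addGrid(randomForestClf.numTrees(), new int[] {10, 20, 30})
        .addGrid(randomForestClf.maxDepth(), new int[] {5, 25, 50})
        .build();
    CrossValidator crossValidator = new CrossValidator()
        .setEstimator(pipeline)
        .setEstimatorParamMaps(paramGrid)
        .setEvaluator(new BinaryClassificationEvaluator())
        .setNumFolds(10);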
![Page 54: Real-Time Alarm Verifica/on with Spark Streaming and ... · 1/23/2017 · Real-Time Alarm Verifica/on with Spark Streaming and Machine Learning Ana Sima, Jan Stampfli, Kurt Stockinger](https://reader036.fdocuments.net/reader036/viewer/2022081522/5ee1c02bad6a402d666c8496/html5/thumbnails/54.jpg)
Training, Testing and Evaluation Example Code (Java)

    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
    import org.apache.spark.ml.tuning.CrossValidatorModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Given (creation of dataset not included in example code)
    Dataset<Row> trainSet = …;
    Dataset<Row> testSet = …;
    // Fit the cross validator to training documents.
    CrossValidatorModel model = crossValidator.fit(trainSet);
    // Make predictions on test documents.
    Dataset<Row> predictions = model.transform(testSet);
    // Create evaluator.
    MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()
        .setLabelCol("label")
        .setPredictionCol("prediction");
    // Evaluate predictions.
    evaluator.setMetricName("accuracy");
    double accuracy = evaluator.evaluate(predictions);