Intro Data Streaming SLIDESNOTES... · 2018. 9. 24. · data streaming operators Two main types:...
Transcript of Intro Data Streaming SLIDESNOTES... · 2018. 9. 24. · data streaming operators Two main types:...
IntroductiontoDataStreaming
VincenzoGulisano,Ph.D.
Agenda
• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 2
Agenda
• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 3
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 4
IoTenablesforincreasedawareness,security,power-efficiency,...
largeIoTsystemsarecomplex
traditionaldataanalysistechniquesalonearenotadequate!
AMIs[1,2,3,4] VNs[5,6]
• demand-response• scheduling[7]• micro-grids• detectionofmediumsizeblackouts[8]• detectionofnontechnicallosses• ...
5
• autonomousdriving• platooning• accidentdetection[9]• variabletolls[9]• congestionmonitoring[10]• ...
IoTenablesforincreasedawareness,security,power-efficiency,...
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
AMIs VNs
6
largeIoTsystemsarecomplex
ThedatastreamingparadigmanditsuseinFogarchitectures
Characteristics[15]:1. edgelocation2. locationawareness3. lowlatency4. geographicaldistribution5. large-scale
6. supportformobility7. real-timeinteractions8. predominanceofwireless9. heterogeneous10. interoperability/federation11. interactionwiththecloud
VincenzoGulisano
7
traditionaldataanalysistechniquesalonearenotadequate![13,14]
ThedatastreamingparadigmanditsuseinFogarchitectures
1. doestheinfrastructureallowforbillionsofreadingsperdaytobetransferredcontinuously?
2. thelatencyincurredwhiletransferringdata,doesthatunderminetheutilityoftheanalysis?
3. isitsecuretoconcentrateallthedatainasingleplace?[11]
4. isitsmarttogiveawayfine-graineddata?[12]
VincenzoGulisano
Agenda
• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 8
MainMemory
Motivation
DBMSvs.DSMS
Disk
1 Data
QueryProcessing
3 Queryresults
2 Query
MainMemory
QueryProcessing
ContinuousQueryData Query
results
9ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
Beforewestart...aboutdatastreamingandStreamProcessingEngines(SPEs)
10
AnincompletelistofSPEs(cf.relatedworkin[16]):
time
BorealisThe Aurora Project
STanfordstREamdatAManager
NiagaraCQ
COUGAR
StreamCloud
Coveringallofthem/discussingwhichusecasesarebestforeachoneoutofscope...thefollowingshowconnectionbetweenwhatisbeingpresentedandacertainSPE
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
datastream:unboundedsequenceoftuplessharingthesameschema
11
Example:vehicles’speedreports
time
Field Field
vehicleid text
time(secs) text
speed(Km/h) double
Xcoordinate double
Ycoordinate double
A 8:00 55.5 X1 Y1
Let’sassumeeachsource(e.g.,vehicle)producesanddeliversatimestampsortedstream
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
continuousquery(orsimplyquery):DirectedAcyclicGraph(DAG)ofstreamsandoperators
12
OP
OP
OP
OP OP
OP OP
sourceop(1+outstreams)
sinkop(1+instreams)
stream
op(1+in,1+outstreams)
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
datastreamingoperators
Twomaintypes:• Statelessoperators• donotmaintainanystate• one-by-oneprocessing• iftheymaintainsomestate,suchstatedoesnotevolvedependingonthetuplesbeingprocessed
• Statefuloperators• maintainastatethatevolvesdependingonthetuplesbeingprocessed• produceoutputtuplesthatdependonmultipleinputtuples
13
OP
OP
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
statelessoperators
14
Filter ...
Map
Union...
Filter/routetuplesbasedonone(ormore)conditions
Transformeachtuple
Mergemultiplestreams(withthesameschema)intoone
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
statelessoperators
15
Filter ...
Map
Union...
source:http://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
statelessoperators
16
Filter ...
Map
Union...
source:https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/index.html
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
statelessoperators
17
Filter ...
Map
Union...
source:http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
statefuloperators
18
Aggregateinformationfrommultipletuples(e.g.,max,min,sum,...)
Jointuplescomingfrom2streamsgivenacertainpredicate
Aggregate
Join
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
statefuloperators
19
source:http://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html
source:http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
source:http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
Waitamoment!
ifstreamsareunbounded,howcanweaggregateorjoin?
20ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
windows andstatefulanalysis[16]
Statefuloperationsaredoneoverwindows:• Time-based(e.g.,tuplesinthelast10minutes)• Tuple-based(e.g.,giventhelast50tuples)
21
time[8:00,9:00)
[8:20,9:20)
[8:40,9:40)
Usuallyapplicationsrelyontime-basedslidingwindows
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
time-basedslidingwindowaggregation(count)
22
Counter:4
time[8:00,9:00)
8:05 8:15 8:22 8:45 9:05
Output:4
Counter:1Counter:2
Counter:3
Counter:3
time
8:05 8:15 8:22 8:45 9:05
[8:20,9:20)
weassumedeachsourceproducesanddeliversatimestampsortedstream!Whathappensifthisisnotthecase?
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
23
windows andstatefulanalysis
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
24
basicoperatorsanduser-definedoperators
Besidesasetofbasicoperators,SPEsusuallyallowtheusertodefinead-hocoperators(e.g.,whenexistingaggregationarenotenough)
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
samplequery
Foreachvehicle,raiseanalertifthespeedofthelatestreportismorethan2timeshigherthanitsaveragespeedinthelast30days.
25
time
A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
26
Field
vehicleid
time(secs)
speed(Km/h)
Xcoordinate
Ycoordinate
Computeaveragespeedforeach
vehicleduringthelast30days
Aggregate
Fieldvehicleid
time(secs)
avgspeed(Km/h)
Join
Checkcondition
Filter
Field
vehicleid
time(secs)
speed(Km/h)
Joinonvehicleid
Fieldvehicleid
time(secs)
avgspeed(Km/h)
speed(Km/h)
samplequery
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
27
A J F
samplequery
Notice:• thesamesemanticscanbedefinedinseveralways(usingdifferentoperatorsandcomposingthemindifferentways)• Usingmanybasicbuildingblockscaneasethetaskofdistributingandparallelizingtheanalysis(moreinthefollowing...)
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
Whydatastreaming,then?
Expressive
Online
Parallel&Distributed
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 28
Expressive
Online
Parallel&Distributed
A F M
A F M
A F M
A F M
FM
A
A
A
F
F
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 29
Samplequerythate.g.validatesdata/raisesalarms...
Agenda
• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 30
1. Distributeddeployment2. Paralleldeployment3. Orderinganddeterminism4. Shared-nothingvsshared-memoryparallelism5. Loadbalancing6. Elasticity7. Faulttolerance8. Datasharing
31
Challengesandresearchquestions
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
Beforewestart...
32
Followingexamplesarefromvehicularnetworks
Road-sideunitRSU Vehicle
Server
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
1- Distributeddeployment– wheretoplaceagivenoperator?[17,4,18,19]
33
M?
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
2- Paralleldeployment– howdoweparallelizetheanalysis? [20,21]
34
M
M A
A
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
35
M
M A
A
A 8:00 55.5 X1 Y1
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
A 8:00 55.5 X1 Y1
A 8:07 34.3 X3 Y3
A 8:03 70.3 X2 Y2
Whatiftuplewithtimestamp8:00arrivesaftertuplewithtimestamp8:07?
3– Orderinganddeterminism[22,23,24]
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
4– shared-nothingvs.shared-memoryparallelism[25]
36
M
M
...
A
AJ
...
Howtotakeadvantageofmulti-corearchitectures?
Howtoboostinter-nodeparallelismandintra-nodeparallelism?
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
5– loadbalancing&statetransfer[20,26]
37
IfweshifttheprocessingofacertainsubsetoftuplesfromnodeAtonodeB,howdotransferitspreviousstate?
MA
MA
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
6– elasticity[20,27]
38
How/whentoprovisionordecommissionnewresourcesdependingontheanalysis’costsfluctuations?
J
ThedatastreamingparadigmanditsuseinFogarchitectures
J J
J
VincenzoGulisano
7– faulttolerance[16,28,29]
39
Howtoreplaceafailednodeminimizingrecoverytime(makingittransparenttotheenduser)?
ThedatastreamingparadigmanditsuseinFogarchitectures
J
J
J
J
J
J
VincenzoGulisano
8– datasharing(differentialprivacy)[2,30,31,32]
40
Howtopreventprivacyleaks?
Supposeweareinterestedinpublishingvehicles’averagespeedoverawindowofonehour...
WecouldaggregatebyRSU!
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
8– datasharing(differentialprivacy)
41
Waitamoment!what ifasinglevehicleisconnectedtoacertainRSU?
Whetheracertainmechanismpreservesornottheprivacyoftheunderlyingdatadependsontheknowledgeoftheadversary
Differentialprivacyassumestheworstcasescenario!
ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano
Agenda
• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 42
Millionsofyearsofevolution
Millionsofsensors
• Storeinformation• Iteratemultipletimesoverdata• Think,donotrushthroughdecisions
• ”Hard-wired”routines• Real-timedecisions• High-throughput/low-latency
ShouldI(really)haveanextrapieceofcake?
Danger!!!Run!!!
Humans
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 43
Years/Decadesofevolution
Millionsofsensors
WhattrafficcongestionpatternscanIobservefrequently?
Don’ttakeover,carinoppositelane!
• Storeinformation• Iteratemultipletimesoverdata• Think,donotrushthroughdecisions
Databases,dataminingtechniques...
Datastreaming,distributedandparallelanalysis
• Continuousanalysis• Real-timedecisions• High-throughput/low-latency
Computers(cyber-physical/IoTsystems)
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 44
Agenda
• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography
VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 45
Bibliography1. Zhou,Jiazhen,RoseQingyang Hu,andYiQian."Scalabledistributedcommunicationarchitecturestosupportadvanced
meteringinfrastructureinsmartgrid."IEEETransactionsonParallelandDistributedSystems23.9(2012):1632-1642.2. Gulisano,Vincenzo,etal."BES:DifferentiallyPrivateandDistributedEventAggregationinAdvancedMeteringInfrastructures."
Proceedingsofthe2ndACMInternationalWorkshoponCyber-PhysicalSystemSecurity.ACM,2016.3. Gulisano,Vincenzo,MagnusAlmgren,andMarinaPapatriantafilou."Onlineandscalabledatavalidationinadvancedmetering
infrastructures."IEEEPESInnovativeSmartGridTechnologies,Europe.IEEE,2014.4. Gulisano,Vincenzo,MagnusAlmgren,andMarinaPapatriantafilou."METIS:atwo-tierintrusiondetectionsystemforadvanced
meteringinfrastructures."InternationalConferenceonSecurityandPrivacyinCommunicationSystems.SpringerInternationalPublishing,2014.
5. Yousefi,Saleh,MahmoudSiadat Mousavi,andMahmoodFathy."Vehicularadhocnetworks(VANETs):challengesandperspectives."20066thInternationalConferenceonITSTelecommunications.IEEE,2006.
6. ElZarki,Magda,etal."Securityissuesinafuturevehicularnetwork."EuropeanWireless.Vol.2.2002.7. Georgiadis,Giorgos,andMarinaPapatriantafilou."Dealingwithstoragewithoutforecastsinsmartgrids:Problem
transformationandonlineschedulingalgorithm."Proceedingsofthe29thAnnualACMSymposiumonAppliedComputing.ACM,2014.
8. Fu,Zhang,etal."Onlinetemporal-spatialanalysisfordetectionofcriticaleventsinCyber-PhysicalSystems."BigData(BigData),2014IEEEInternationalConferenceon.IEEE,2014.
ThedatastreamingparadigmanditsuseinFogarchitectures 46VincenzoGulisano
Bibliography9. Arasu,Arvind,etal."Linearroad:astreamdatamanagementbenchmark."Proceedingsofthe
ThirtiethinternationalconferenceonVerylargedatabases-Volume30.VLDBEndowment,2004.
10. Lv,Yisheng,etal."Trafficflowpredictionwithbigdata:adeeplearningapproach."IEEETransactionsonIntelligentTransportationSystems16.2(2015):865-873.
11. Grochocki,David,etal."AMIthreats,intrusiondetectionrequirementsanddeploymentrecommendations."SmartGridCommunications(SmartGridComm),2012IEEEThirdInternationalConferenceon.IEEE,2012.
12. Molina-Markham,Andrés,etal."Privatememoirsofasmartmeter."Proceedingsofthe2ndACMworkshoponembeddedsensingsystemsforenergy-efficiencyinbuilding.ACM,2010.
13. Gulisano,Vincenzo,etal."Streamcloud:Alargescaledatastreamingsystem."DistributedComputingSystems(ICDCS),2010IEEE30thInternationalConferenceon.IEEE,2010.
14. Stonebraker,Michael,Uǧur Çetintemel,andStanZdonik."The8requirementsofreal-timestreamprocessing."ACMSIGMODRecord34.4(2005):42-47.
15. Bonomi,Flavio,etal."Fogcomputinganditsroleintheinternetofthings."ProceedingsofthefirsteditionoftheMCCworkshoponMobilecloudcomputing.ACM,2012.
ThedatastreamingparadigmanditsuseinFogarchitectures 47VincenzoGulisano
Bibliography16. Gulisano,VincenzoMassimiliano. StreamCloud:AnElasticParallel-DistributedStream
ProcessingEngine.Diss.Informatica,2012.17. Cardellini,Valeria,etal."Optimaloperatorplacementfordistributedstreamprocessing
applications."Proceedingsofthe10thACMInternationalConferenceonDistributedandEvent-basedSystems.ACM,2016.
18. Costache,Stefania,etal."UnderstandingtheData-ProcessingChallengesinIntelligentVehicularSystems."Proceedingsofthe2016IEEEIntelligentVehiclesSymposium(IV16).
19. Giatrakos,Nikos,Antonios Deligiannakis,andMinosGarofalakis."ScalableApproximateQueryTrackingoverHighlyDistributedDataStreams."Proceedingsofthe2016InternationalConferenceonManagementofData.ACM,2016.
20. Gulisano,Vincenzo,etal."Streamcloud:Anelasticandscalabledatastreamingsystem."IEEETransactionsonParallelandDistributedSystems23.12(2012):2351-2365.
21. Shah,Mehul A.,etal."Flux:Anadaptivepartitioningoperatorforcontinuousquerysystems."DataEngineering,2003.Proceedings.19thInternationalConferenceon.IEEE,2003.
ThedatastreamingparadigmanditsuseinFogarchitectures 48VincenzoGulisano
Bibliography22. Cederman,Daniel,etal."Briefannouncement:concurrentdatastructuresforefficientstreamingaggregation."Proceedingsof
the26thACMsymposiumonParallelisminalgorithmsandarchitectures.ACM,2014.23. Ji,Yuanzhen,etal."Quality-drivenprocessingofslidingwindowaggregatesoverout-of-orderdatastreams."Proceedingsofthe
9thACMInternationalConferenceonDistributedEvent-BasedSystems.ACM,2015.24. Ji,Yuanzhen,etal."Quality-drivendisorderhandlingforconcurrentwindowedstreamquerieswithsharedoperators."
Proceedingsofthe10thACMInternationalConferenceonDistributedandEvent-basedSystems.ACM,2016.25. Gulisano,Vincenzo,etal."Scalejoin:Adeterministic,disjoint-parallelandskew-resilientstreamjoin."BigData(BigData),2015
IEEEInternationalConferenceon.IEEE,2015.26. Ottenwälder,Beate,etal."MigCEP:operatormigrationformobilitydrivendistributedcomplexeventprocessing."Proceedings
ofthe7thACMinternationalconferenceonDistributedevent-basedsystems.ACM,2013.27. DeMatteis,Tiziano,andGabrieleMencagli."Keepcalmandreactwithforesight:strategiesforlow-latencyandenergy-efficient
elasticdatastreamprocessing."Proceedingsofthe21stACMSIGPLANSymposiumonPrinciplesandPracticeofParallelProgramming.ACM,2016.
28. Balazinska,Magdalena,etal."Fault-toleranceintheBorealisdistributedstreamprocessingsystem."ACMTransactionsonDatabaseSystems(TODS)33.1(2008):3.
29. CastroFernandez,Raul,etal."Integratingscaleoutandfaulttoleranceinstreamprocessingusingoperatorstatemanagement."Proceedingsofthe2013ACMSIGMODinternationalconferenceonManagementofdata.ACM,2013.
ThedatastreamingparadigmanditsuseinFogarchitectures 49VincenzoGulisano
Bibliography
30. Dwork,Cynthia."Differentialprivacy:Asurveyofresults."InternationalConferenceonTheoryandApplicationsofModelsofComputation.SpringerBerlinHeidelberg,2008.
31. Dwork,Cynthia,etal."Differentialprivacyundercontinualobservation."Proceedingsoftheforty-secondACMsymposiumonTheoryofcomputing.ACM,2010.
32. Kargl,Frank,ArikFriedman,andRoksana Boreli."Differentialprivacyinintelligenttransportationsystems."ProceedingsofthesixthACMconferenceonSecurityandprivacyinwirelessandmobilenetworks.ACM,2013.
ThedatastreamingparadigmanditsuseinFogarchitectures 50VincenzoGulisano