Amidst demo (BNAIC 2015)

BNAIC 2015, November 5-6, 2015. AMIDST Toolbox: a Java library for Analysis of MassIve Data Streams using Probabilistic Graphical Models. FP7 European research project, http://amidst.eu. Anders L. Madsen, Andres R. Masegosa, Ana M. Martinez, Hanen Borchani, Thomas D. Nielsen, Helge Langseth, Antonio Salmeron, Dario Ramos-Lopez.

Transcript of Amidst demo (BNAIC 2015)

Page 1: Amidst demo (BNAIC 2015)

BNAIC 2015, November 5-6, 2015

AMIDST Toolbox: A Java library for Analysis of MassIve Data Streams using Probabilistic Graphical Models

FP7 European research project
http://amidst.eu

Anders L. Madsen, Andres R. Masegosa, Ana M. Martinez, Hanen Borchani, Thomas D. Nielsen, Helge Langseth, Antonio Salmeron, Dario Ramos-Lopez.

Page 2: Amidst demo (BNAIC 2015)

Outline

1. Overview of AMIDST Toolbox

o Why are data streams important?
o Why PGMs for analyzing data streams?
o Scalable inference (and learning)
o Roadmap for coming releases

2. Live Demo: Modeling concept drift in financial data.

o Handling data streams.
o Defining Bayesian networks with hidden variables.
o Inference and learning with Bayesian networks.

Page 3: Amidst demo (BNAIC 2015)

Part I: Scope

Page 4: Amidst demo (BNAIC 2015)

Data Streams everywhere

• Unbounded flows of data are generated daily:
  • Social networks
  • Network monitoring
  • Financial/banking industry
  • …

Page 5: Amidst demo (BNAIC 2015)

Data Stream Processing

• Processing data streams is challenging:
  – The data do not fit in main memory
  – Continuous model updating
  – Continuous model inference
  – Concept drift

Page 6: Amidst demo (BNAIC 2015)

Processing Massive Data Streams

• Everything has to scale:
  • Scalable computing infrastructure
  • Scalable models/inference/learning

Page 7: Amidst demo (BNAIC 2015)

AMIDST Toolbox

• Scalable framework for data stream processing.
• Based on probabilistic graphical models.
• Unique project for data stream mining using PGMs.
• Open source project (Apache Software License 2.0).

Page 8: Amidst demo (BNAIC 2015)

AMIDST EU Project

§ This toolbox aims to deal with real, complex and massive data streams.
§ Applied to real use cases of AMIDST's industrial partners.

Page 9: Amidst demo (BNAIC 2015)

Toolbox Web Page

http://amidst.github.io/toolbox/


Page 10: Amidst demo (BNAIC 2015)

Part II: Why PGMs for data stream processing?

Page 11: Amidst demo (BNAIC 2015)

Why Graphical Models?

§ Let's look at the following simple example:
  § Stream of sensor measurements about temperature and smoke presence in a given geographical area.
  § Monitor the stream to detect the presence of a fire (event detection problem).

Page 12: Amidst demo (BNAIC 2015)

Why Graphical Models?

§ Cast the problem as an anomaly detection problem (outliers).
§ Streaming K-Means (widely used in industry).

[Figure label: Anomaly]
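The streaming k-means baseline mentioned on this slide can be sketched roughly as follows. This is a minimal illustration in plain Java, not AMIDST code: the class name, the distance threshold and all sensor values are assumptions made up for the example. Each incoming (temperature, smoke) reading is assigned to its nearest centroid, which then moves toward it; a reading far from every centroid is flagged as an anomaly.

```java
import java.util.Arrays;

/** Sequential (online) k-means that flags points far from every centroid. */
final class StreamingKMeansSketch {
    private final double[][] centroids; // current cluster centres
    private final long[] counts;        // observations absorbed by each cluster
    private final double threshold;     // hypothetical anomaly distance

    StreamingKMeansSketch(double[][] initialCentroids, double threshold) {
        this.centroids = new double[initialCentroids.length][];
        this.counts = new long[initialCentroids.length];
        for (int k = 0; k < initialCentroids.length; k++) {
            this.centroids[k] = initialCentroids[k].clone();
            this.counts[k] = 1; // the initial centroid counts as one observation
        }
        this.threshold = threshold;
    }

    /** Processes one point; returns true if it is flagged as an anomaly. */
    boolean observe(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double d = distance(centroids[k], x);
            if (d < bestDist) {
                bestDist = d;
                best = k;
            }
        }
        if (bestDist > threshold) {
            return true; // anomalies are reported and not absorbed into a cluster
        }
        counts[best]++;
        double rate = 1.0 / counts[best]; // running-mean step size
        for (int j = 0; j < x.length; j++) {
            centroids[best][j] += rate * (x[j] - centroids[best][j]);
        }
        return false;
    }

    double[] centroid(int k) {
        return centroids[k].clone();
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int j = 0; j < a.length; j++) {
            sum += (a[j] - b[j]) * (a[j] - b[j]);
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // (temperature, smoke level) readings; all numbers are made up.
        StreamingKMeansSketch km =
            new StreamingKMeansSketch(new double[][] {{20.0, 0.0}, {30.0, 0.1}}, 5.0);
        double[][] stream = {{21.0, 0.0}, {29.0, 0.1}, {22.0, 0.05}, {80.0, 0.9}};
        for (double[] x : stream) {
            System.out.println(Arrays.toString(x) + " anomaly=" + km.observe(x));
        }
    }
}
```

Note the hyper-parameters this baseline already needs (number of clusters, initial centroids, threshold), which is the kind of tuning the following slides contrast with an open-box model.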

Page 13: Amidst demo (BNAIC 2015)

Why Graphical Models for analyzing Data Streams?

§ Many data stream models are black-box models:
  § Pros:
    § No need to understand the problem.
  § Cons:
    § Many hyper-parameters to tune.
    § Black-box models can rarely explain what they learned.

[Diagram: Stream → Blackbox Model → Predictions]

Page 14: Amidst demo (BNAIC 2015)

Why Graphical Models?

§ Bayesian networks:
  § Open-box models.
  § Encode prior knowledge.
  § Continuous and discrete variables (CLG networks).
  § Example:

[Diagram: Bayesian network with nodes Fire, Temp, Smoke, T1, T2, T3, S1; query p(Fire=true | t1, t2, t3, s1)]
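To make the query p(Fire=true | t1, t2, t3, s1) concrete, here is a minimal sketch in plain Java. It is a simplification and not the network on the slide: it assumes the sensor readings depend directly on Fire (dropping the intermediate Temp and Smoke nodes), with Gaussian temperature readings and a binary smoke reading. Every parameter value and name below is hypothetical.

```java
/** Posterior of a binary Fire variable given Gaussian temperature readings and a smoke flag. */
final class FirePosteriorSketch {
    // Hypothetical parameters, chosen only for illustration.
    static final double PRIOR_FIRE = 0.01;
    static final double MEAN_TEMP_FIRE = 60.0;
    static final double MEAN_TEMP_NO_FIRE = 20.0;
    static final double SD_TEMP = 10.0;
    static final double P_SMOKE_GIVEN_FIRE = 0.9;
    static final double P_SMOKE_GIVEN_NO_FIRE = 0.05;

    /** Log density of a Gaussian; working in log space avoids underflow. */
    static double logGaussian(double x, double mean, double sd) {
        double z = (x - mean) / sd;
        return -0.5 * z * z - Math.log(sd) - 0.5 * Math.log(2.0 * Math.PI);
    }

    /** Returns p(Fire=true | temps, smoke) by Bayes' rule under the assumed model. */
    static double posteriorFire(double[] temps, boolean smoke) {
        double logFire = Math.log(PRIOR_FIRE)
            + Math.log(smoke ? P_SMOKE_GIVEN_FIRE : 1.0 - P_SMOKE_GIVEN_FIRE);
        double logNoFire = Math.log(1.0 - PRIOR_FIRE)
            + Math.log(smoke ? P_SMOKE_GIVEN_NO_FIRE : 1.0 - P_SMOKE_GIVEN_NO_FIRE);
        for (double t : temps) {
            logFire += logGaussian(t, MEAN_TEMP_FIRE, SD_TEMP);
            logNoFire += logGaussian(t, MEAN_TEMP_NO_FIRE, SD_TEMP);
        }
        // Normalise the two unnormalised log posteriors.
        double max = Math.max(logFire, logNoFire);
        double wFire = Math.exp(logFire - max);
        double wNoFire = Math.exp(logNoFire - max);
        return wFire / (wFire + wNoFire);
    }

    public static void main(String[] args) {
        System.out.println(posteriorFire(new double[] {22.0, 19.0, 21.0}, false));
        System.out.println(posteriorFire(new double[] {58.0, 63.0, 61.0}, true));
    }
}
```

Every number in this model has a readable meaning (a prior fire rate, a sensor's false-alarm rate), which is the sense in which such models are "open box" and can encode prior knowledge.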

Page 15: Amidst demo (BNAIC 2015)

Why Graphical Models?

[Diagram: Stream → Openbox Models → Predictions]

Page 16: Amidst demo (BNAIC 2015)

Why Graphical Models?

[Diagram: Stream → Openbox Models → Predictions, with a Blackbox Inference Engine (multi-core parallelization)]

Page 17: Amidst demo (BNAIC 2015)

Part III: Inference Engine

Page 18: Amidst demo (BNAIC 2015)

Inference Engine

§ Querying the model:
  § p(Fire=true | t1, t2, t3, s1, season)
  § E(Temperature | smoke=true)

Page 19: Amidst demo (BNAIC 2015)

Inference Engine

§ Querying the model:
  § p(Fire=true | t1, t2, t3, s1, season)
  § E(Temperature | smoke=true)

§ Learning from data (using a Bayesian approach):
  § The Bayesian framework naturally deals with data streams.
  § The prior is updated in the light of new data.

$p(\theta \mid d_1, \ldots, d_n, d_{n+1}) \propto p(d_{n+1} \mid \theta)\, p(\theta \mid d_1, \ldots, d_n)$
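The recursive update above is easiest to see in a conjugate model. The following sketch (plain Java, not AMIDST code) assumes θ is the success probability of a Bernoulli variable with a Beta prior; the class and method names are made up for the example. After each observation the current posterior becomes the prior for the next one, so no past data needs to be stored.

```java
/** Streaming Bayesian update of a Bernoulli parameter with a Beta(alpha, beta) prior. */
final class BetaBernoulliStream {
    private double alpha; // pseudo-count of "true" observations
    private double beta;  // pseudo-count of "false" observations

    BetaBernoulliStream(double priorAlpha, double priorBeta) {
        this.alpha = priorAlpha;
        this.beta = priorBeta;
    }

    /** Folds one new observation d_{n+1} into the posterior p(theta | d_1..d_n). */
    void update(boolean observation) {
        if (observation) {
            alpha += 1.0;
        } else {
            beta += 1.0;
        }
    }

    /** Posterior mean E[theta | data seen so far]. */
    double posteriorMean() {
        return alpha / (alpha + beta);
    }

    public static void main(String[] args) {
        BetaBernoulliStream theta = new BetaBernoulliStream(1.0, 1.0); // uniform prior
        boolean[] stream = {true, true, false};
        for (boolean d : stream) {
            theta.update(d);
            System.out.println("posterior mean = " + theta.posteriorMean());
        }
    }
}
```

For models that are not conjugate the update has no closed form, which is where approximate schemes such as the variational message passing mentioned on later slides come in.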

Page 20: Amidst demo (BNAIC 2015)

Querying the model

§ Parallel Monte Carlo inference [Salmeron et al. CAEPIA 2015]
  § Exploits multi-core CPUs (powered by Java 8)

Page 21: Amidst demo (BNAIC 2015)

Querying the model

§ Parallel Monte Carlo inference [Salmeron et al. CAEPIA 2015]
  § Exploits multi-core CPUs (powered by Java 8)

§ Variational Message Passing [Winn et al. JMLR 2005]
  § Deterministic approximation
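As a rough illustration of Monte Carlo inference on multiple cores with Java 8 streams, the sketch below estimates E(Temperature | smoke=true) by likelihood weighting, a simple form of importance sampling. It is not the algorithm of the cited paper and does not use the AMIDST API; the model (Fire influences both Temperature and Smoke) and all parameter values are assumptions made up for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.stream.IntStream;

/** Likelihood-weighting estimate of E(Temperature | smoke=true) on a parallel stream. */
final class ParallelImportanceSamplingSketch {
    // Hypothetical model parameters, chosen only for illustration.
    static final double PRIOR_FIRE = 0.01;
    static final double MEAN_TEMP_FIRE = 60.0;
    static final double MEAN_TEMP_NO_FIRE = 20.0;
    static final double SD_TEMP = 10.0;
    static final double P_SMOKE_GIVEN_FIRE = 0.9;
    static final double P_SMOKE_GIVEN_NO_FIRE = 0.05;

    static double expectedTemperatureGivenSmoke(int samples) {
        // Each worker thread accumulates {sum of weight * temp, sum of weight};
        // the partial sums are merged by the combiner.
        double[] sums = IntStream.range(0, samples).parallel().collect(
            () -> new double[2],
            (acc, i) -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                // Sample the unobserved variables from the prior.
                boolean fire = rnd.nextDouble() < PRIOR_FIRE;
                double temp = (fire ? MEAN_TEMP_FIRE : MEAN_TEMP_NO_FIRE)
                    + SD_TEMP * rnd.nextGaussian();
                // Weight the sample by the likelihood of the evidence smoke=true.
                double weight = fire ? P_SMOKE_GIVEN_FIRE : P_SMOKE_GIVEN_NO_FIRE;
                acc[0] += weight * temp;
                acc[1] += weight;
            },
            (left, right) -> {
                left[0] += right[0];
                left[1] += right[1];
            });
        return sums[0] / sums[1];
    }

    public static void main(String[] args) {
        System.out.println(expectedTemperatureGivenSmoke(200_000));
    }
}
```

Under these assumed parameters the exact answer is about 26.15, since p(Fire=true | smoke=true) = 0.009 / 0.0585 and the two conditional means are 60 and 20; the estimate fluctuates around that value from run to run.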

Page 22: Amidst demo (BNAIC 2015)

Learning from data streams

§ Bayesian approach:
  § Learning as an inference problem.
  § Powered by VMP.

[Diagram: plate model with nodes Z and x, plate i = 1 … N]

Page 23: Amidst demo (BNAIC 2015)

Learning from data streams

§ Bayesian approach:
  § Learning as an inference problem.
  § Powered by VMP.
  § Plate notation!

Page 24: Amidst demo (BNAIC 2015)

Learning from data streams

§ Parallel Streaming Variational Bayes [Broderick et al. NIPS 2013]
  § Powered by Variational Message Passing.
  § Multi-core processing (using Java 8).
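A sketch of the general form of streaming variational Bayes, following the cited paper as far as the notation here can be trusted. None of these symbols appear on the slide: C_b denotes the b-th mini-batch of the stream, A(C, q) an approximate-posterior routine (for instance VMP) run on batch C with prior q, and ξ the natural parameter of an exponential-family approximation.

```latex
% Notation introduced here for illustration (not on the slide):
%   C_b       b-th mini-batch of the stream
%   A(C, q)   approximate posterior computed from batch C with prior q
%   \xi_0     natural parameter of the prior p(\theta)
%   \xi_b     natural parameter of A(C_b, p(\theta))
\begin{align*}
  \text{streaming:}\quad
    & q_0(\theta) = p(\theta), \qquad
      q_b(\theta) = A\bigl(C_b,\, q_{b-1}(\theta)\bigr), \quad b = 1, \dots, B \\
  \text{parallel:}\quad
    & q(\theta) \propto p(\theta) \prod_{b=1}^{B}
      \frac{A\bigl(C_b,\, p(\theta)\bigr)}{p(\theta)} \\
  \text{exponential family:}\quad
    & \xi_{\mathrm{post}} = \xi_0 + \sum_{b=1}^{B} \bigl(\xi_b - \xi_0\bigr)
\end{align*}
```

The last line is what makes multi-core processing natural: each batch contributes an independent increment ξ_b − ξ_0 that can be computed on a separate core and then summed.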

Page 25: Amidst demo (BNAIC 2015)

Links to other open software

§ MoaLink
  § MOA is a state-of-the-art tool for data stream mining.
  § Using AMIDST models within the MOA GUI!
  § Great for evaluation & comparison.

Page 26: Amidst demo (BNAIC 2015)

Links to other open software

§ HuginLink
  § Hugin is commercial software for PGMs and influence diagrams.
  § Model conversion.
  § The Hugin inference engine can be used within AMIDST.

Page 27: Amidst demo (BNAIC 2015)

Part IV: Roadmap

Page 28: Amidst demo (BNAIC 2015)

Dynamic Bayesian Networks (release 1.1)

§ Encode temporal knowledge.
§ Naturally fit with data streams.

[Diagram: dynamic Bayesian network with nodes Fire(t), Temp(t), Smoke(t), T1(t), T2(t), T3(t), S1(t), Fire(t-1), Temp(t-1)]

Page 29: Amidst demo (BNAIC 2015)

Distributed Stream Processing (release 1.1)

§ RLink
  § Invoke the AMIDST inference engine within R.
  § Preliminary functionality recently presented.

Page 30: Amidst demo (BNAIC 2015)

Distributed Stream Processing (release 2.0)

§ FlinkLink
  § Apache Flink: open source platform for distributed stream processing.
  § Handling massive data streams.

Page 31: Amidst demo (BNAIC 2015)

Open Source project

§ We're open to your contributions!! ;)

Page 32: Amidst demo (BNAIC 2015)

Hosted on GitHub

§ Download:
  :> git clone https://github.com/amidst/toolbox.git

§ Compile:
  :> ./compile.sh

§ Run:
  :> ./run.sh <class-name>

Page 33: Amidst demo (BNAIC 2015)

Please "star" our project! (if you like it)

Page 34: Amidst demo (BNAIC 2015)

Any questions before the live demo?

Page 35: Amidst demo (BNAIC 2015)

Live Demo: Tracking concept drift in financial data with AMIDST

Borchani et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach. IDA 2015.

Page 36: Amidst demo (BNAIC 2015)

Demo Code Available on GitHub

eu.amidst.bnaic2015.examples.BCC

Page 37: Amidst demo (BNAIC 2015)

Financial Data

§ Provided by BCC (Spanish regional bank).

§ Consists of monthly aggregated information:
  § Active clients between 18 and 65 years old.
  § Data between April 2007 and March 2014.
  § 11 variables:
    § Income, total credit, expenses, etc.

§ Each client is classified as:
  § defaulter/non-defaulter in the following 12 months.

Page 38: Amidst demo (BNAIC 2015)

Financial Data

§ Hypothesis:
  § Does the Spanish financial crisis have an impact on bank customers?
  § Look at the evolution of the regional unemployment rate.

Page 39: Amidst demo (BNAIC 2015)

Data Preprocessing/Visualization

§ Visualize the evolution of the monthly aggregated data:
  § Data does not fit in main memory!
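When the data does not fit in main memory, monthly aggregates can still be computed in a single pass that keeps only a count and a running mean per month. The sketch below is plain Java and not the demo code: the "month,income" record layout, the class name and the values are made up for illustration. In practice the lines would come lazily from a file, for example via java.nio.file.Files.lines.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

/** Single-pass running mean per month; memory use grows with the number of months only. */
final class MonthlyRunningMean {
    // month -> {count, running mean}; TreeMap keeps months in chronological order
    private final Map<String, double[]> stats = new TreeMap<>();

    /** Folds one record into the aggregate for its month. */
    void accept(String month, double value) {
        double[] s = stats.computeIfAbsent(month, m -> new double[2]);
        s[0] += 1.0;
        s[1] += (value - s[1]) / s[0]; // incremental mean update
    }

    /** Running mean for a month that has already been seen. */
    double mean(String month) {
        return stats.get(month)[1];
    }

    public static void main(String[] args) {
        MonthlyRunningMean incomeByMonth = new MonthlyRunningMean();
        // Hypothetical "month,income" records standing in for the real stream.
        Stream.of("2007-04,1200.0", "2007-04,800.0", "2007-05,950.0")
              .map(line -> line.split(","))
              .forEach(f -> incomeByMonth.accept(f[0], Double.parseDouble(f[1])));
        incomeByMonth.stats.forEach(
            (month, s) -> System.out.println(month + " mean=" + s[1]));
    }
}
```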

Page 40: Amidst demo (BNAIC 2015)

Model Building

§ We use a simple Naïve Bayes model:
  § With a global hidden variable to track concept drift.

[Diagram: nodes D, A1, A2, …, A11 and H]

Page 41: Amidst demo (BNAIC 2015)

Model Building

§ We also use plate notation.
§ "H" is designed to capture concept drift.

[Diagram: nodes D, A1, A2, …, A11, Ht and Ht-1, with plate i = 1 … M]

Page 42: Amidst demo (BNAIC 2015)

Tracking concept drift


Page 43: Amidst demo (BNAIC 2015)

Tracking concept drift


Page 44: Amidst demo (BNAIC 2015)

References

§ Masegosa et al. AMIDST: Analysis of Massive Data Streams using Probabilistic Graphical Models. Submitted to JMLR. 2015.

§ Borchani et al. Modeling Concept Drift: A Probabilistic Graphical Model Based Approach. IDA 2015.

§ Masegosa et al. Probabilistic graphical models on multi-core CPUs using Java 8. Submitted to IEEE Computational Intelligence Magazine, Special Issue on Computational Intelligence Software. 2015.

§ Salmeron et al. Parallel importance sampling in conditional linear Gaussian networks. In Proceedings of the Conferencia de la Asociacion Española para la Inteligencia Artificial, volume in press, 2015.

§ Winn et al. Variational message passing. Journal of Machine Learning Research, 6:661–694, 2005.

§ Broderick et al. Streaming variational Bayes. In Advances in Neural Information Processing Systems, pages 1727–1735, 2013.

Page 45: Amidst demo (BNAIC 2015)

Any questions?

http://amidst.github.io/toolbox/

Page 46: Amidst demo (BNAIC 2015)

Open Source project

§ We're open to your contributions!! ;)