Model-based Validation of Streaming Data


Page 1: Model-based Validation of Streaming Data

Model-based Validation of Streaming Data

Cheng Xu, Tore Risch

Dept. Information Technology

Uppsala University, Sweden

Daniel Wedlund, Martin Helgoson

AB Sandvik Coromant, Sweden

Page 2: Model-based Validation of Streaming Data

Institutionen för informationsteknologi | www.it.uu.se

Talk Overview

  Motivation
  Approach and System Architecture
  Demonstrators
  Performance experiments
  Conclusion
  Related work
  Future work

Page 3: Model-based Validation of Streaming Data

Motivation

Functional products: the integrated provision of hardware, software, and services, not just the traditional hardware. As a result, the manufacturer is responsible for the product functioning.

In modern manufacturing industry, sensors installed on equipment in use generate many high-rate data streams.

Providing productivity, reliability, and quality for functional products requires monitoring many streams for unexpected behavior.

When the number of machines increases and data flows are high, validation with low latency may be challenging.

SVALI (Stream VALIdator): a general system that validates correct equipment behavior by analyzing streams on the fly.

Page 4: Model-based Validation of Streaming Data

SVALI, Stream VALIdator

Two validation approaches:

Model-and-validate
  The user defines an analytical (mathematical) model of expected behavior based on streams from equipment sensors.
  The user also defines a validation model that identifies abnormal equipment sensor readings by comparing the result of the analytical model with the measured sensor streams.
  A simple case is detecting when the difference between expected and measured power consumption exceeds some threshold.

Learn-and-validate
  The user provides a (statistical) learning model based on a sampled sub-stream from correctly behaving equipment.
  As for model-and-validate, the user also provides a validation model.
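The simple case mentioned above (expected versus measured power consumption) can be sketched in Python as follows. This is an illustration only, not SVALI code: the Reading fields, the linear expected_power model, and the sample values are hypothetical, and the threshold value 1.3 is borrowed from the architecture slide.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Reading(NamedTuple):
    ts: float      # timestamp of the sensor event
    load: float    # hypothetical input to the analytical model
    power: float   # measured power consumption


def expected_power(r: Reading) -> float:
    """Hypothetical analytical model: power grows linearly with load."""
    return 2.0 * r.load + 1.0


def validate_power(
    readings: Iterable[Reading], threshold: float
) -> Iterator[Tuple[float, float, float]]:
    """Yield (ts, measured, expected) when |expected - measured| > threshold."""
    for r in readings:
        ex = expected_power(r)
        if abs(ex - r.power) > threshold:
            yield (r.ts, r.power, ex)


readings = [
    Reading(0.0, 1.0, 3.1),   # expected 3.0, within the threshold
    Reading(0.1, 2.0, 7.0),   # expected 5.0, deviates by 2.0
    Reading(0.2, 3.0, 6.9),   # expected 7.0, within the threshold
]
anomalies = list(validate_power(readings, threshold=1.3))
print(anomalies)
```

Only the second reading is reported, because it is the only one whose deviation from the model exceeds the threshold.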

Page 5: Model-based Validation of Streaming Data

SVALI Architecture

[Architecture diagram. The labels on the slide are:]
  CLIENT: visualizers and alerters; updates (e.g. set threshold = 1.3)
  SVALI VALIDATION FUNCTIONS: model-n-validate, learn-n-validate
  STREAM MODELS: analytical model, statistical model
  STREAM WRAPPERS: stream wrapper A, stream wrapper B
  Equipment: equipment A, equipment B
  Connections: TCP (shown twice)
  Continuous queries: CQ 1, CQ 2
  EPIC DSMS
  DB

Page 6: Model-based Validation of Streaming Data

SVALI Validation functions

Model-and-validate

  model_n_validate(Bag of Stream s, Function modelfn, Function validatefn)
    -> Stream of (Number ts, Object me, Object ex)

  modelfn(Object se) -> Object ex
  validatefn(Object se, Object ex) -> (Number ts, Object me)

Learn-and-validate

  learn_n_validate(Bag of Stream s, Function learnfn, Integer n, Function validatefn)
    -> Stream of (Number ts, Object me, Object ex)

  learnfn(Vector of Object sa) -> Object ex
  validatefn(Object se, Object ex) -> (Number ts, Object me)

The difference is how the model is defined.
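The two signatures can be mirrored by a pair of Python generators. This is a minimal sketch under stated assumptions, not the SVALI implementation: validatefn is assumed to return None for valid elements, streams are consumed one after another rather than concurrently, and the first n elements of each stream are assumed to serve as the learning sample without being validated themselves.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Optional, Tuple

Hit = Optional[Tuple[float, Any]]    # (ts, measured), or None if the element is valid
Violation = Tuple[float, Any, Any]   # (ts, measured, expected)


def model_n_validate(
    streams: Iterable[Iterable[Any]],
    modelfn: Callable[[Any], Any],
    validatefn: Callable[[Any, Any], Hit],
) -> Iterator[Violation]:
    """Compute the expected value per element and report invalid elements."""
    for stream in streams:           # assumption: sequential, not concurrent
        for se in stream:
            ex = modelfn(se)
            hit = validatefn(se, ex)
            if hit is not None:
                ts, me = hit
                yield (ts, me, ex)


def learn_n_validate(
    streams: Iterable[Iterable[Any]],
    learnfn: Callable[[List[Any]], Any],
    n: int,
    validatefn: Callable[[Any, Any], Hit],
) -> Iterator[Violation]:
    """Learn the expected value from the first n elements, then validate the rest."""
    for stream in streams:
        it = iter(stream)
        sample = list(islice(it, n))  # assumption: the sample is not validated
        ex = learnfn(sample)
        for se in it:
            hit = validatefn(se, ex)
            if hit is not None:
                ts, me = hit
                yield (ts, me, ex)


# Hypothetical (ts, value) events and helper functions for illustration.
events = [(0, 10.0), (1, 10.5), (2, 9.5), (3, 14.0), (4, 10.25)]


def mean_value(sample: List[Any]) -> float:
    return sum(v for _, v in sample) / len(sample)


def check(se: Any, ex: float) -> Hit:
    ts, me = se
    return (ts, me) if abs(me - ex) > 1.3 else None


modelled = list(model_n_validate([events], lambda se: 10.0, check))
learnt = list(learn_n_validate([events], mean_value, 3, check))
print(modelled, learnt)
```

Both calls use the same validation function; only the origin of the expected value differs, which matches the remark that the difference lies in how the model is defined.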

Page 7: Model-based Validation of Streaming Data

Model-n-validate demonstrator

The side milling process

The analytical and validation models are entered into the SVALI system:

  create function validatePower(Record r, Number ex) -> (Number ts, Number me)
    as select ts(r), me
       where me = measuredPower(r)
         and abs(ex - me) > th("mill1");

  select model_n_validate(bagof(input), #'expectedPower', #'validatePower')
    from Stream input
   where input = corenetJsonWrapper("h1", 1337);

  ae [mm]   fz [mm/tooth]   hex [mm]   ap [mm]   vc [m/min]   zc
  2         0.0756          0.05       20        200          4
  3         0.0641          0.05       20        200          4

Page 8: Model-based Validation of Streaming Data

Learn-n-validate demonstrator

Cyclic behavior

Cyclic behavior is defined as predicate (dynamic) windows.

A vector of expected power consumptions is computed from the first n sampled predicate windows.

The learning model is the normalized average vector over the sampled windows.

Validation is done by comparing the normalized Euclidean distance between the learnt power consumptions and the current window's power consumptions against a threshold.

The window starts when the trigger is 1:

  create function cycleStart(Record s) -> Boolean
    as s["trigger"] = 1;

The window ends when the trigger is 0 and the window was started:

  create function cycleStop(Record s, Record r) -> Boolean
    as r["trigger"] = 0 and s["trigger"] = 1;

Learning and validation functions:

  create function extractPowerW(Window w) -> Vector of Number
    as vselect extractPower(r) from Record r where r in w;

  create function learnCycle(Vector of Window f) -> Vector of Number
    as navg(select extractPowerW(w) from Window w where w in f);

  create function validateCycle(Window w, Vector e) -> (Number ts, Vector of Number m)
    as select timestamp(w), m
       where neuclid(e, m) > th("machine2")
         and m = extractPowerW(w);

  select learn_n_validate(bagof(sw), #'learnCycle', 2, #'validateCycle')
    from Stream s, Stream sw
   where s = corenetJsonWrapper("h2", 1338)
     and sw = pwindowize(s, #'cycleStart', #'cycleStop');
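The predicate windows and the distance check can be sketched in Python. The slides do not define navg, neuclid, or the exact window boundaries, so the following are assumptions made only for illustration: the record that satisfies the stop predicate is excluded from the window, vectors are scaled to unit length before averaging and before the distance is taken, and all windows have the same length. The record layout with a "power" field is hypothetical.

```python
import math
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, float]


def pwindowize(
    stream: Iterable[Record],
    start: Callable[[Record], bool],
    stop: Callable[[Record, Record], bool],
) -> Iterator[List[Record]]:
    """Cut a stream into windows delimited by start and stop predicates."""
    first: Optional[Record] = None
    window: List[Record] = []
    for r in stream:
        if first is None:
            if start(r):
                first, window = r, [r]
        elif stop(first, r):
            yield window              # assumption: the stop record is excluded
            first = None
        else:
            window.append(r)


def unit(v: List[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (assumed meaning of 'normalized')."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def navg(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of unit-scaled, equal-length vectors."""
    units = [unit(v) for v in vectors]
    return [sum(col) / len(units) for col in zip(*units)]


def neuclid(a: List[float], b: List[float]) -> float:
    """Euclidean distance between the unit-scaled vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(unit(a), unit(b))))


def cycle_start(s: Record) -> bool:
    return s["trigger"] == 1


def cycle_stop(s: Record, r: Record) -> bool:
    return r["trigger"] == 0 and s["trigger"] == 1


def rec(trigger: int, power: float) -> Record:
    return {"trigger": trigger, "power": power}


stream = [rec(0, 0), rec(1, 3), rec(1, 4), rec(0, 0),
          rec(1, 6), rec(1, 8), rec(0, 0),
          rec(1, 4), rec(1, 3), rec(0, 0)]
windows = [[r["power"] for r in w]
           for w in pwindowize(stream, cycle_start, cycle_stop)]
learnt = navg(windows[:2])       # learn from the first n = 2 cycles
d = neuclid(learnt, windows[2])  # distance of the third cycle to the model
print(windows, learnt, d)
```

The first two cycles have the same shape at different scales, so after normalization they match the learnt vector exactly, while the third cycle has a different shape and a clearly nonzero distance.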

Page 9: Model-based Validation of Streaming Data

Performance Experiments

Experiment setup

Hardware: Dell PowerEdge R815 NUMA computer with 4 CPUs, each with 16 cores at 2.3 GHz. OS: Scientific Linux release 6.2.

The performance of SVALI is measured by the average response time of two queries:
  Q1: model-and-validate over single stream events
  Q2: model-and-validate of a moving average over 0.1 second stream windows

To scale up the number of machines, streams are generated from real data streams provided by the industrial partner, with different arrival rates (1 ms to 10 ms). Each stream is tagged with a machine id.
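A query in the style of Q2 can be sketched in Python as a per-machine average over 0.1 second windows followed by a threshold check. This is a sketch under assumptions, not the measured query: the windows are taken to be tumbling (the slides do not say whether they slide), timestamps are integer milliseconds to avoid floating-point window boundaries, and the event layout, expected values, and threshold are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[int, str, float]     # (ts in ms, machine id, measured value)
Average = Tuple[int, str, float]   # (window start in ms, machine id, mean)


def _flush(idx: int, acc: Dict[str, Tuple[float, int]], width_ms: int) -> Iterator[Average]:
    """Emit one average per machine for the finished window."""
    for mid in sorted(acc):
        total, count = acc[mid]
        yield (idx * width_ms, mid, total / count)


def window_averages(events: Iterable[Event], width_ms: int = 100) -> Iterator[Average]:
    """Group events (ordered by ts) into tumbling windows, keyed by machine id."""
    current = None
    acc: Dict[str, Tuple[float, int]] = {}
    for ts, mid, value in events:
        idx = ts // width_ms
        if current is not None and idx != current:
            yield from _flush(current, acc, width_ms)
            acc = {}
        current = idx
        total, count = acc.get(mid, (0.0, 0))
        acc[mid] = (total + value, count + 1)
    if current is not None:
        yield from _flush(current, acc, width_ms)


def validate_averages(
    averages: Iterable[Average], expected: Dict[str, float], threshold: float
) -> Iterator[Average]:
    """Report window averages that deviate from the expected value per machine."""
    for start, mid, avg in averages:
        if abs(avg - expected[mid]) > threshold:
            yield (start, mid, avg)


events = [(0, "m0", 10.0), (5, "m1", 20.0), (50, "m0", 12.0), (60, "m1", 20.0),
          (100, "m0", 16.0), (110, "m1", 20.0)]
averages = list(window_averages(events))
violations = list(validate_averages(averages, {"m0": 11.0, "m1": 20.0}, 1.3))
print(averages, violations)
```

Because the input interleaves events from several machines, the averaging step has to group by machine id inside each window.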

Page 10: Model-based Validation of Streaming Data

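Performance Experiments

Central vs Parallel

[Diagram: two configurations.]
  Central validation: the streams machine0 ... machinei are merged on ts and then validated on one SVALI node.
  Parallel validation: each stream machine0 ... machinei has its own validation0 ... validationi, and the results are then merged on ts.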
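The two dataflows can be contrasted in a short Python sketch. It only illustrates the order of merging and validation; it is not how SVALI distributes work. The event layout, the constant expected value, and the threshold are hypothetical, and a thread pool stands in for separate validation nodes (Python threads give no real speedup for CPU-bound work).

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from operator import itemgetter
from typing import Iterable, List, Tuple

Event = Tuple[int, str, float]   # (ts, machine id, measured value)

EXPECTED = 10.0    # hypothetical expected value
THRESHOLD = 1.3    # hypothetical threshold


def validate(events: Iterable[Event]) -> List[Event]:
    """Keep the events that deviate from the expected value."""
    return [e for e in events if abs(e[2] - EXPECTED) > THRESHOLD]


def central(streams: List[List[Event]]) -> List[Event]:
    """Merge all streams on ts first, then validate the merged stream."""
    merged = heapq.merge(*streams, key=itemgetter(0))
    return validate(merged)


def parallel(streams: List[List[Event]]) -> List[Event]:
    """Validate every stream separately, then merge the results on ts."""
    with ThreadPoolExecutor() as pool:
        per_machine = list(pool.map(validate, streams))
    return list(heapq.merge(*per_machine, key=itemgetter(0)))


machine0 = [(1, "m0", 10.0), (3, "m0", 14.0), (5, "m0", 10.2)]
machine1 = [(2, "m1", 7.0), (4, "m1", 10.1)]

central_result = central([machine0, machine1])
parallel_result = parallel([machine0, machine1])
print(central_result, parallel_result)
```

Both configurations report the same violations in timestamp order; they differ in where the validation work is done and in how much data has to pass through the merge.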

Page 11: Model-based Validation of Streaming Data

Experiment Measurement Q1

Fig. 1 Average response time Q1

[Plot not reproduced. The slide repeats the two diagrams: central validation (merge on ts, then validation on one SVALI node) and parallel validation (validation0 ... validationi, then merge on ts).]

Page 12: Model-based Validation of Streaming Data

Experiment Measurement Q2

Fig. 2 Average response time Q2

[Plot not reproduced. The slide repeats the central and parallel validation diagrams with the following annotations:]
  validation includes a groupby on machine id
  It is already grouped
  around 2 ms

Page 13: Model-based Validation of Streaming Data

Conclusion

Two general validation approaches were presented to validate stream behaviors, called model-and-validate and learn-and-validate

Two demonstrators show how they are used in real industrial application streams

Parallel execution enables computation of stream validation with limited delays over many machines

Page 14: Model-based Validation of Streaming Data

Related work

Jakubek, S. and Strasser, T.: Fault-diagnosis using neural networks with ellipsoidal basis functions. American Control Conference, Vol. 5, pp. 3846-3851, 2002.
  Uses a learning algorithm to reduce the number of measurements for fault detection, while we use parallel processing to enable low delays.

Tan, T., Gu, X., and Wang, H.: Adaptive system anomaly prediction for large-scale hosting infrastructures. PODC Conf., 2010.
  Prediction instead of detection.
  Low arrival rates, e.g. one sample every 2 seconds, so parallelization is not needed.

Wang, D., Rundensteiner, E., Ellison, R.: Active Complex Event Processing for Realtime Health Care. VLDB Conf., 3(2): pp. 1545-1548, 2010.
  Lower-level rule mechanism triggered by state changes during the continuous query process.

Zeitler, E. and Risch, T.: Massive scale-out of expensive continuous queries. Proceedings of the VLDB Endowment, ISSN 2150-8097, Vol. 4, No. 11, pp. 1181-1188, 2011.
  SVALI's underlying DSMS EPIC extends that work with e.g. sliding windows and incremental aggregation. SVALI provides validation functionality on top of EPIC.

Page 15: Model-based Validation of Streaming Data

Future work

Other strategies for automatic performance improvements

Adaptive learning model by re-sampling

Adaptive parallelization of expensive validation functions
