Model-based Validation of Streaming Data
Cheng Xu, Tore Risch
Dept. Information Technology
Uppsala University, Sweden
Daniel Wedlund, Martin Helgoson
AB Sandvik Coromant, Sweden
Informationsteknologi
Institutionen för informationsteknologi | www.it.uu.se
Talk Overview
Motivation
Approach and System Architecture
Demonstrators
Performance experiments
Conclusion
Related work
Future work
Motivation
Functional products: integrated provision of hardware, software, and services, not just the traditional hardware => the manufacturer is responsible for functioning.
In modern manufacturing industry, sensors installed on equipment in use generate many high-rate data streams.
Ensuring the productivity, reliability, and quality of functional products requires monitoring many streams for unexpected behavior.
When the number of machines increases and data flow rates are high, validation with low latency can be challenging.
SVALI (Stream VALIdator): a general system for validating correct equipment behavior by analyzing streams on the fly.
SVALI, Stream VALIdator
Two validation approaches:
Model-and-validate
  The user defines an analytical mathematical model of expected behavior based on streams from equipment sensors.
  The user also defines a validation model that identifies abnormal equipment sensor readings by comparing the result of the analytical model with the measured sensor streams.
  A simple case is detecting when the difference between expected and measured power consumption exceeds some threshold.
Learn-and-validate
  The user provides a (statistical) learning model based on a sampled sub-stream from correctly behaving equipment.
  As for model-and-validate, the user also provides a validation model.
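The threshold case above can be sketched in a few lines. This is an illustrative Python sketch, not SVALI code: the `Reading` record, its `load` field, and the linear `expected_power` model are hypothetical stand-ins for a real analytical model.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Reading(NamedTuple):
    ts: float     # timestamp of the sensor reading
    load: float   # hypothetical input to the analytical model
    power: float  # measured power consumption


def expected_power(r: Reading) -> float:
    """Hypothetical analytical model: power grows linearly with load."""
    return 2.0 * r.load


def validate(readings: Iterable[Reading],
             threshold: float) -> Iterator[Tuple[float, float, float]]:
    """Yield (ts, measured, expected) when the deviation exceeds the threshold."""
    for r in readings:
        ex = expected_power(r)
        if abs(ex - r.power) > threshold:
            yield (r.ts, r.power, ex)


stream = [Reading(0.0, 1.0, 2.1), Reading(1.0, 1.0, 3.5), Reading(2.0, 2.0, 4.0)]
alerts = list(validate(stream, threshold=1.0))
print(alerts)
```

Only the second reading is reported, since it is the only one whose measured power deviates from the expected value by more than the threshold.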
SVALI Architecture
[Architecture diagram, not preserved in this transcript. Labeled components:
  Client: visualizers and alerters; updates (e.g. set threshold = 1.3)
  SVALI validation functions: model-n-validate, learn-n-validate
  Stream models: analytical model, statistical model
  Stream wrappers: stream wrapper A, stream wrapper B
  Equipment A and equipment B, connected over TCP
  Continuous queries: CQ 1, CQ 2
  EPIC DSMS
  DB]
SVALI Validation functions
Model-and-validate
  model_n_validate(Bag of Stream s, Function modelfn, Function validatefn)
      -> Stream of (Number ts, Object me, Object ex)
  modelfn(Object se) -> Object ex
  validatefn(Object se, Object ex) -> (Number ts, Object me)
Learn-and-validate
  learn_n_validate(Bag of Stream s, Function learnfn, Integer n, Function validatefn)
      -> Stream of (Number ts, Object me, Object ex)
  learnfn(Vector of Object sa) -> Object ex
  validatefn(Object se, Object ex) -> (Number ts, Object me)
The difference is how the model is defined.
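To make the two signatures concrete, here is a minimal Python sketch of both higher-order functions. It is not the SVALI implementation and rests on assumptions the slides do not confirm: streams in the bag are processed one after another rather than interleaved, `validatefn` returns `None` for a valid element, and learn-and-validate validates only the elements that follow the n sampled ones. The helpers `mean_value` and `too_far` are hypothetical.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

Alert = Tuple[Any, Any, Any]  # (ts, measured value me, expected value ex)


def model_n_validate(streams: Iterable[Iterable[Any]],
                     modelfn: Callable[[Any], Any],
                     validatefn: Callable[[Any, Any], Optional[tuple]]
                     ) -> Iterator[Alert]:
    """The expected value is computed per element by the analytical model."""
    for s in streams:
        for se in s:
            ex = modelfn(se)
            hit = validatefn(se, ex)
            if hit is not None:
                ts, me = hit
                yield (ts, me, ex)


def learn_n_validate(streams: Iterable[Iterable[Any]],
                     learnfn: Callable[[list], Any],
                     n: int,
                     validatefn: Callable[[Any, Any], Optional[tuple]]
                     ) -> Iterator[Alert]:
    """The expected value is learnt once from the first n elements."""
    for s in streams:
        it = iter(s)
        ex = learnfn(list(islice(it, n)))  # sample the first n elements
        for se in it:                      # validate the remaining elements
            hit = validatefn(se, ex)
            if hit is not None:
                ts, me = hit
                yield (ts, me, ex)


def mean_value(sample):
    return sum(v for _, v in sample) / len(sample)


def too_far(se, ex):
    ts, v = se
    return (ts, v) if abs(v - ex) > 1.0 else None


data = [(0, 10.0), (1, 12.0), (2, 11.5), (3, 14.0)]
learned = list(learn_n_validate([data], mean_value, 2, too_far))
modelled = list(model_n_validate([data], lambda se: 11.0, too_far))
print(learned, modelled)
```

Both calls share the same validation function; only the source of the expected value differs, which mirrors the remark that the difference is how the model is defined.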
Model-n-validate demonstrator
The side milling process
The analytical and validation models are entered into the SVALI system:

  create function validatePower(Record r, Number ex)
      -> (Number ts, Number me)
    as select ts(r), me
       where me = measuredPower(r)
         and abs(ex - me) > th("mill1");

  select model_n_validate(bagof(input), #'expectedPower', #'validatePower')
    from Stream input
   where input = corenetJsonWrapper("h1", 1337);

Process parameters:

  ae [mm]   fz [mm/tooth]   hex [mm]   ap [mm]   vc [m/min]   zc
  2         0.0756          0.05       20        200          4
  3         0.0641          0.05       20        200          4
Learn-n-validate demonstrator
Cyclic behavior
Cyclic behavior is defined as predicate (dynamic) windows.
A vector of expected power consumptions is computed from the first n sampled predicate windows.
The learning model is the normalized average vector over the sampled windows.
Validation is done by comparing the normalized Euclidean distance between the learnt power consumptions and the current window's power consumptions.

The window starts when the trigger is 1:

  create function cycleStart(Record s) -> Boolean
    as s["trigger"] = 1;

The window ends when the trigger is 0 and the window was started:

  create function cycleStop(Record s, Record r) -> Boolean
    as r["trigger"] = 0 and s["trigger"] = 1;

  create function extractPowerW(Window w) -> Vector of Number
    as vselect extractPower(r)
       from Record r
       where r in w;

  create function learnCycle(Vector of Window f) -> Vector of Number
    as navg(select extractPowerW(w)
            from Window w
            where w in f);

  create function validateCycle(Window w, Vector e)
      -> (Number ts, Vector of Number m)
    as select timestamp(w), m
       where neuclid(e, m) > th("machine2")
         and m = extractPowerW(w);

  select learn_n_validate(bagof(sw), #'learnCycle', 2, #'validateCycle')
    from Stream s, Stream sw
   where s = corenetJsonWrapper("h2", 1338)
     and sw = pwindowize(s, #'cycleStart', #'cycleStop');
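The cycle-based pipeline can be illustrated with a small Python sketch. It only reuses the names `pwindowize`, `navg`, and `neuclid` for readability; the slides do not define them, so the definitions here are assumptions: each vector is scaled to unit Euclidean length before averaging or comparing, all windows have the same length and are non-zero, and the record that closes a window is not part of it.

```python
import math
from typing import Callable, Iterable, Iterator, List


def pwindowize(records: Iterable[dict],
               start: Callable[[dict], bool],
               stop: Callable[[dict, dict], bool]) -> Iterator[List[dict]]:
    """Cut a stream into predicate windows delimited by start/stop predicates."""
    window = None  # None means no window is currently open
    for r in records:
        if window is None:
            if start(r):
                window = [r]
        elif stop(window[0], r):  # stop sees the start record and the current one
            yield window
            window = None
        else:
            window.append(r)


def normalize(v: List[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))  # assumes a non-zero vector
    return [x / norm for x in v]


def navg(vectors: List[List[float]]) -> List[float]:
    """Element-wise average of unit-normalized vectors (assumed definition)."""
    unit = [normalize(v) for v in vectors]
    return [sum(col) / len(unit) for col in zip(*unit)]


def neuclid(a: List[float], b: List[float]) -> float:
    """Euclidean distance between unit-normalized vectors (assumed definition)."""
    return math.dist(normalize(a), normalize(b))


def cycle_start(s: dict) -> bool:
    return s["trigger"] == 1


def cycle_stop(s: dict, r: dict) -> bool:
    return r["trigger"] == 0 and s["trigger"] == 1


raw = [(0, 0.0), (1, 3.0), (1, 4.0), (0, 0.0), (1, 3.0), (1, 4.0),
       (0, 0.0), (1, 4.0), (1, 3.0), (0, 0.0)]
records = [{"trigger": t, "power": p} for t, p in raw]

windows = list(pwindowize(records, cycle_start, cycle_stop))
powers = [[r["power"] for r in w] for w in windows]
learned = navg(powers[:2])               # learn from the first n = 2 cycles
deviation = neuclid(learned, powers[2])  # compare the next cycle to the model
print(len(windows), learned, deviation)
```

The third cycle has the same power values in a different order, so its normalized distance to the learnt profile is clearly non-zero, and a threshold below that distance would flag it.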
Performance Experiments
Experiment setup
Dell PowerEdge R815 NUMA computer featuring 4 CPUs with 16 2.3 GHz cores each. OS: Scientific Linux release 6.2.
The performance of SVALI is measured by the average response time of two queries:
  Q1, model-and-validate over single stream events
  Q2, model-and-validate of a moving average over 0.1-second stream windows
To scale up the number of machines, streams are generated from real data streams provided by the industrial partner, with different arrival intervals (1 ms to 10 ms). Each stream is tagged with a machine id.
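As a rough illustration of the windowed average that Q2 validates, the Python sketch below averages events per 0.1-second window. The slides do not say whether the windows slide or tumble; this sketch assumes non-overlapping (tumbling) windows, integer millisecond timestamps, and input sorted by timestamp. The function name `window_averages` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple


def window_averages(events: Iterable[Tuple[int, float]],
                    width_ms: int) -> Iterator[Tuple[int, float]]:
    """Yield (window_start_ms, mean value) for each non-empty tumbling window."""
    current = None  # start time of the window being filled
    total = 0.0
    count = 0
    for ts, value in events:
        start = (ts // width_ms) * width_ms
        if current is not None and start != current:
            yield (current, total / count)  # close the previous window
            total, count = 0.0, 0
        current = start
        total += value
        count += 1
    if count:
        yield (current, total / count)      # flush the last window


events = [(0, 1.0), (50, 3.0), (100, 10.0), (199, 20.0), (300, 4.0)]
averages = list(window_averages(events, width_ms=100))
print(averages)
```

Each window average could then be compared with the model's expected value in the same way as single events are in Q1.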
Performance Experiments
Central vs Parallel
[Diagram, not preserved in this transcript. Labels:
  Central validation: machine0 ... machinei -> merge on ts -> validation, on one SVALI node
  Parallel validation: machine0 ... machinei -> validation0 ... validationi -> merge on ts]
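The two dataflows can be compared with a short Python sketch. It shows only the order of the merge and validation steps, not actual parallel execution or SVALI's implementation; the event layout `(ts, machine, value)` and the simple threshold check are assumptions made for illustration.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

Event = Tuple[int, str, float]  # (ts, machine id, measured value)


def validate(events: Iterable[Event], threshold: float) -> Iterator[Event]:
    """Hypothetical validation: report events whose value exceeds the threshold."""
    for ts, machine, value in events:
        if value > threshold:
            yield (ts, machine, value)


def central(streams: List[List[Event]], threshold: float) -> List[Event]:
    """Merge all machine streams on ts first, then validate in one place."""
    merged = heapq.merge(*streams, key=lambda e: e[0])
    return list(validate(merged, threshold))


def parallel(streams: List[List[Event]], threshold: float) -> List[Event]:
    """Validate each machine stream separately, then merge the results on ts."""
    validated = [validate(s, threshold) for s in streams]
    return list(heapq.merge(*validated, key=lambda e: e[0]))


streams = [
    [(1, "m0", 5.0), (3, "m0", 9.0)],
    [(2, "m1", 8.0), (4, "m1", 1.0)],
]
central_alerts = central(streams, 6.0)
parallel_alerts = parallel(streams, 6.0)
print(central_alerts, parallel_alerts)
```

Both variants report the same alerts in timestamp order. In the parallel layout each validation sees a single machine's stream, so the per-machine work can be spread over separate processes.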
Experiment Measurement Q1
Fig. 1 Average response time Q1
[Plot not preserved in this transcript. The slide repeats the central validation (one SVALI node) and parallel validation diagrams.]
Experiment Measurement Q2
Fig. 2 Average response time Q2
[Plot not preserved in this transcript. The slide repeats the central validation (one SVALI node) and parallel validation diagrams.]
Annotations on the slide:
  validation includes a group-by on machine id
  It is already grouped
  around 2 ms
Conclusion
Two general approaches to validating stream behavior were presented: model-and-validate and learn-and-validate.
Two demonstrators show how they are used on streams from real industrial applications.
Parallel execution enables stream validation with limited delays over many machines.
Related work
Jakubek, S. and Strasser, T.: Fault-diagnosis using neural networks with ellipsoidal basis functions. American Control Conference, Vol. 5, pp. 3846-3851, 2002
  A learning algorithm reduces the number of measurements needed for fault detection, while we use parallel processing to enable low delays.
Tan, T., Gu, X., and Wang, H.: Adaptive system anomaly prediction for large-scale hosting infrastructures. PODC Conf., 2010
  Prediction instead of detection.
  Low arrival rates, e.g. one sample every 2 seconds, do not need parallelization.
Wang, D., Rundensteiner, E., Ellison, R.: Active Complex Event Processing for Realtime Health Care. VLDB Conf., 3(2), pp. 1545-1548, 2010
  Lower-level rule mechanism triggered by state changes during the continuous query process.
Zeitler, E. and Risch, T.: Massive scale-out of expensive continuous queries. Proceedings of the VLDB Endowment, ISSN 2150-8097, Vol. 4, No. 11, pp. 1181-1188, 2011
  SVALI's underlying DSMS, EPIC, extends that work with e.g. sliding windows and incremental aggregation. SVALI provides validation functionality on top of EPIC.
Future work
Other strategies for automatic performance improvements
Adaptive learning model by re-sampling
Adaptive parallelization of expensive validation functions