A new streaming computation engine for real-time analytics by Michael Barton at Big Data Spain 2015
Michael Barton, @mrb_barton
ITRS Group, Malaga
What happens when you make analysis easy to re-use
Let's say I work in a bank
We have a big, complicated trading system
Can we calculate the latency of each order?
Simple to publish data
entry point, exit point
HTTP POST
{"MsgDirection": "I", "SendingTime": "2015-04-05T14:30Z", …}
{"MsgDirection": "I", "SendingTime": "2015-04-05T14:31Z", …}
{"MsgDirection": "O", "SendingTime": "2015-04-05T14:33Z", …}
{"MsgDirection": "O", "SendingTime": "2015-04-05T14:32Z", …}
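As a rough illustration of how little is needed to publish, the sketch below builds (but does not send) an HTTP POST carrying one message. The host name `valo.example`, the function name and the idea of posting straight to the stream path are assumptions; the slides only say that data is published with HTTP POST.

```python
import json
import urllib.request


def build_publish_request(base_url, stream_path, message):
    """Build an HTTP POST request carrying one JSON message (nothing is sent here)."""
    body = json.dumps(message).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + stream_path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host; the stream path is the one used in the queries in this talk.
request = build_publish_request(
    "http://valo.example",
    "/streams/demo/fix/exchange",
    {"MsgDirection": "I", "SendingTime": "2015-04-05T14:30Z"},
)
# urllib.request.urlopen(request) would actually send it.
```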
Tell Valo the schema?
{
  "schema": {
    "version": "1.0.0",
    "config": {},
    "topDef": {
      "type": "record",
      "properties": {
        ...
        "MsgDirection": {
          "type": "string",
          "comments": "I for input message, O for output"
        },
        "Account": {
          "type": "string",
          "optional": "true",
          "comments": "Account mnemonic as agreed between buy and sell sides"
        },
        ...
        "SendingTime": {
          "type": "datetime",
          "comments": "Time of message transmission (always expressed in UTC (Universal Time Coordinated, also known a…"
        },
        "Side": {
          "type": "string",
          "optional": "true",
          "comments": "Side"
        },
        "Symbol": {
          "type": "string",
          "optional": "true",
          "comments": "Ticker symbol. Common, human understood representation of the security."
        },
        ...
{"MsgDirection": "I", "SendingTime": "2015-04-05T14:30Z", …}
{"MsgDirection": "I", "SendingTime": "2015-04-05T14:31Z", …}
{"MsgDirection": "O", "SendingTime": "2015-04-05T14:33Z", …}
{"MsgDirection": "O", "SendingTime": "2015-04-05T14:32Z", …}
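The schema excerpt gives each field a type and marks some fields as optional. A minimal sketch of what checking a message against those definitions could look like follows. The `validate` function, the `CHECKS` table and the minute-resolution timestamp format are assumptions for illustration; how Valo itself validates messages is not shown in the slides.

```python
from datetime import datetime

# Field definitions copied from the schema excerpt (comments omitted).
PROPERTIES = {
    "MsgDirection": {"type": "string"},
    "Account": {"type": "string", "optional": "true"},
    "SendingTime": {"type": "datetime"},
    "Side": {"type": "string", "optional": "true"},
    "Symbol": {"type": "string", "optional": "true"},
}


def _is_datetime(value):
    # Assumes the minute-resolution UTC format used in the sample messages.
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%MZ")
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical mapping from schema type names to checks.
CHECKS = {
    "string": lambda value: isinstance(value, str),
    "datetime": _is_datetime,
}


def validate(message, properties=PROPERTIES):
    """Return a list of problems; an empty list means the message fits."""
    problems = []
    for name, spec in properties.items():
        if name not in message:
            if spec.get("optional") != "true":
                problems.append(f"missing required field {name}")
        elif not CHECKS[spec["type"]](message[name]):
            problems.append(f"{name} is not a valid {spec['type']}")
    return problems
```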
Let's use it
from historical /streams/demo/fix/exchange where MsgDirection == "I" into left
inner join from /streams/demo/fix/exchange where MsgDirection == "O" into right
  on left.ClOrdID == right.ClOrdID && left.MsgType == "New Order Single" && right.MsgType == "Execution Report"
select left.ClOrdID, duration(right.SendingTime, left.SendingTime) as resTime
From, Filter, Join, Output
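To make the four stages concrete, here is a small in-memory Python sketch of the same latency calculation: filter inbound and outbound messages, join them on `ClOrdID`, and output the time difference. It is not how Valo executes the query. The `ClOrdID` values are made up, and reading `duration(right, left)` as "right minus left, in seconds" is an assumption.

```python
from datetime import datetime


def parse_time(text):
    # Minute-resolution UTC timestamps, as in the sample messages.
    return datetime.strptime(text, "%Y-%m-%dT%H:%MZ")


def order_latencies(messages):
    # From + Filter: split the stream by direction and message type.
    left = [m for m in messages
            if m["MsgDirection"] == "I" and m["MsgType"] == "New Order Single"]
    right = [m for m in messages
             if m["MsgDirection"] == "O" and m["MsgType"] == "Execution Report"]

    # Join: index the outbound side by ClOrdID for an inner join.
    right_by_id = {}
    for m in right:
        right_by_id.setdefault(m["ClOrdID"], []).append(m)

    # Output: one row per matching pair.
    rows = []
    for l in left:
        for r in right_by_id.get(l["ClOrdID"], []):
            delta = parse_time(r["SendingTime"]) - parse_time(l["SendingTime"])
            rows.append({"ClOrdID": l["ClOrdID"], "resTime": delta.total_seconds()})
    return rows


# Hypothetical order identifiers; timestamps follow the sample messages.
sample = [
    {"ClOrdID": "A1", "MsgDirection": "I", "MsgType": "New Order Single", "SendingTime": "2015-04-05T14:30Z"},
    {"ClOrdID": "B2", "MsgDirection": "I", "MsgType": "New Order Single", "SendingTime": "2015-04-05T14:31Z"},
    {"ClOrdID": "A1", "MsgDirection": "O", "MsgType": "Execution Report", "SendingTime": "2015-04-05T14:33Z"},
    {"ClOrdID": "B2", "MsgDirection": "O", "MsgType": "Execution Report", "SendingTime": "2015-04-05T14:32Z"},
]
latencies = order_latencies(sample)
```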
Cluster of nodes: commodity hardware
Uniform architecture: no special leaders or roles
Streams of data: immutable, append-only, distributed
Eventual consistency in failure cases
It’s just VALO
nodeA
Semi-structured Repo
Time Series Repo
…
We know our storage
Semi-structured Repo
Hierarchical document data, flexible schemas
Lucene indexes, taxonomies and facets
Time Series Repo
Well-defined schema
Custom I/O layer, bitmap B+Tree indices
Push down the query
Execute directly against the data and indexes in storage
From, Filter, Join
nodeA
Semi-structured Repo
Time Series Repo
…
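The difference between filtering after a scan and pushing the filter down to storage can be shown with a toy example. The `IndexedStore` class below is purely hypothetical and uses a plain dictionary as its index; it only illustrates the idea that the store, which knows its own indexes, evaluates the filter itself.

```python
from collections import defaultdict


class IndexedStore:
    """Toy repository keeping an index on one field next to the rows."""

    def __init__(self, rows, indexed_field):
        self.rows = list(rows)
        self.indexed_field = indexed_field
        self.index = defaultdict(list)
        for position, row in enumerate(self.rows):
            self.index[row[indexed_field]].append(position)

    def scan(self):
        """No push-down: hand every row to the caller, who filters afterwards."""
        return list(self.rows)

    def query(self, field, value):
        """Push-down: the store evaluates the filter, using its index when it can."""
        if field == self.indexed_field:
            return [self.rows[p] for p in self.index.get(value, [])]
        return [row for row in self.rows if row.get(field) == value]


store = IndexedStore(
    [
        {"MsgDirection": "I", "SendingTime": "2015-04-05T14:30Z"},
        {"MsgDirection": "I", "SendingTime": "2015-04-05T14:31Z"},
        {"MsgDirection": "O", "SendingTime": "2015-04-05T14:33Z"},
        {"MsgDirection": "O", "SendingTime": "2015-04-05T14:32Z"},
    ],
    indexed_field="MsgDirection",
)
filtered_after_scan = [row for row in store.scan() if row["MsgDirection"] == "I"]
pushed_down = store.query("MsgDirection", "I")
```

Both paths return the same rows, but only the scan moves all four rows out of the store first.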
How can we re-use this?
from /streams/demo/fix/exchange where MsgDirection == "I" into left
inner join from /streams/demo/fix/exchange where MsgDirection == "O" into right
  on left.ClOrdID == right.ClOrdID && left.MsgType == "New Order Single" && right.MsgType == "Execution Report"
select left.ClOrdID, duration(right.SendingTime, left.SendingTime) as resTime
Real-time, Historical
Hybrid, Time Ranges
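One way to picture the same query running in real-time, historical and hybrid modes is a function that accepts any iterable of messages. This is only an analogy in Python, not a description of Valo internals; `inbound_only` and `live_feed` are hypothetical names.

```python
from itertools import chain


def inbound_only(messages):
    """The 'query': a lazy filter that works on any iterable of messages."""
    return (m for m in messages if m["MsgDirection"] == "I")


# Historical: data that is already stored.
historical = [
    {"MsgDirection": "I", "SendingTime": "2015-04-05T14:30Z"},
    {"MsgDirection": "O", "SendingTime": "2015-04-05T14:32Z"},
]


def live_feed():
    """Stand-in for messages arriving in real time."""
    yield {"MsgDirection": "I", "SendingTime": "2015-04-05T14:31Z"}
    yield {"MsgDirection": "O", "SendingTime": "2015-04-05T14:33Z"}


historical_result = list(inbound_only(historical))
realtime_result = list(inbound_only(live_feed()))
# Hybrid: stored data first, then whatever arrives afterwards.
hybrid_result = list(inbound_only(chain(historical, live_feed())))
```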
Let's say I work in a hospital
Can we look for unusual activity in the ECG monitors?
ward-G5
ward-G3
intensive-care
Here’s an interesting paper
Assumption-Free Anomaly Detection in Time Series
Li Wei, Nitin Kumar, Venkata Lolla, Eamonn Keogh, Stefano Lonardi, Chotirat Ann Ratanamahatana
University of California – Riverside, Department of Computer Science & Engineering
Riverside, CA 92521, USA
http://alumni.cs.ucr.edu/~ratana/SSDBM05.pdf
http://alumni.cs.ucr.edu/~wli/SSDBM05/
So let's implement it!
// Registers the function under the name "anomaly" for use in queries.
@ValoOnlineFunction("anomaly")
@ValoOnlineFunctionAnnotation(SchemaAnnotations.ANALYTICS.ANOMALY)
@ValoOnlineFunctionDescription("Unsupervised anomaly detection for time series")
object OnlineAnomalyDetectionFactory extends
    OnlineAlgorithmFactory[OnlineAnomalyDetectionParams, Double, OnlineAnomalyDetectionResult] {

  // Declared as not commutative, not associative and not mergeable.
  override val isCommutative: Boolean = false
  override val isAssociative: Boolean = false
  override val isMergeable: Boolean = false

  override def getDependency(windowType: WindowType): AlgoDependency = AlgoDependency.NoDependencies

  // Creates one algorithm instance from the supplied parameters.
  override def init(args: OnlineAnomalyDetectionParams): OnlineAlgorithm[Double, OnlineAnomalyDetectionResult] = {
    new OnlineAnomalyDetection(args.lagWindow, args.leadWindow, args.featureSize, 3, 5)
  }
}

final case class OnlineAnomalyDetectionParams(lagWindow: Int, leadWindow: Int, featureSize: Int)
final case class OnlineAnomalyDetectionResult(isTraining: Boolean, point: Double, signal: Double)

Full algorithm code omitted for brevity!
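The slides omit the algorithm itself. As a rough idea of what a lag/lead-window detector can look like, here is a much simplified Python sketch in the spirit of the cited paper: it turns values into symbols, counts symbol pairs in a lag window and in a lead window, and reports the squared distance between the two normalised count maps as the signal. It is not the Valo implementation. The class name, the equal-width binning of values assumed to lie in [0, 1], and the `alphabet` and `ngram` parameters are assumptions, and `featureSize` is not modelled.

```python
from collections import Counter, deque


class LagLeadAnomalySketch:
    """Simplified lag/lead window comparison (illustrative only)."""

    def __init__(self, lag_window, lead_window, alphabet=4, ngram=2):
        if min(lag_window, lead_window) < ngram:
            raise ValueError("windows must hold at least one n-gram")
        self.lag_window = lag_window
        self.alphabet = alphabet
        self.ngram = ngram
        # Holds the lag window followed by the lead window.
        self.buffer = deque(maxlen=lag_window + lead_window)

    def _symbol(self, value):
        # Assumption: values lie in [0, 1]; equal-width bins, clamped at the edges.
        return max(0, min(self.alphabet - 1, int(value * self.alphabet)))

    def _bitmap(self, symbols):
        # Count n-grams and scale by the largest count.
        counts = Counter(
            tuple(symbols[i:i + self.ngram])
            for i in range(len(symbols) - self.ngram + 1)
        )
        peak = max(counts.values())
        return {gram: count / peak for gram, count in counts.items()}

    def update(self, value):
        """Return (is_training, point, signal) for one new value."""
        self.buffer.append(self._symbol(value))
        if len(self.buffer) < self.buffer.maxlen:
            return (True, value, 0.0)  # still filling the windows
        symbols = list(self.buffer)
        lag = self._bitmap(symbols[:self.lag_window])
        lead = self._bitmap(symbols[self.lag_window:])
        signal = sum(
            (lag.get(gram, 0.0) - lead.get(gram, 0.0)) ** 2
            for gram in set(lag) | set(lead)
        )
        return (False, value, signal)


detector = LagLeadAnomalySketch(lag_window=20, lead_window=10)
normal = [detector.update(v) for v in [0.1, 0.9] * 30]   # regular pattern
unusual = [detector.update(0.5) for _ in range(10)]      # pattern breaks
```

On the regular pattern the signal stays close to zero; once the lead window fills with values unlike the lag window, the signal rises.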
Simple to publish data
HTTP POST
{"ts": "2015-04-05T14:30Z", "contributor": "ward-g3-monitor0", "value": 0.25443}
{"ts": "2015-04-05T14:30Z", "contributor": "ward-g3-monitor1", "value": 0.36432}
{"ts": "2015-04-05T14:30Z", "contributor": "intensive-care", "value": 0.46580}
{"ts": "2015-04-05T14:31Z", "contributor": "ward-g3-monitor0", "value": 0.26073}
ward-G3
intensive-care
Let's use it
from historical /streams/demo/infrastructure/ecg
group by contributor
select contributor, anomaly(200, 40, 20, value) as result
emit every value
final case class OnlineAnomalyDetectionParams(lagWindow: Int, leadWindow: Int, featureSize: Int)
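The shape of this query (group by contributor, apply an online function, emit a result for every value) can be sketched in plain Python by keeping one function state per contributor. The `grouped_online` helper is hypothetical, and `RunningMean` is only a stand-in for the anomaly function so that the example stays short.

```python
class RunningMean:
    """Stand-in online function (not the anomaly algorithm): mean of values so far."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value):
        self.count += 1
        self.total += value
        return self.total / self.count


def grouped_online(messages, make_function, key="contributor", field="value"):
    """Keep one online-function state per group and emit a result for every value."""
    states = {}
    for message in messages:
        group = message[key]
        if group not in states:
            states[group] = make_function()
        yield group, states[group].update(message[field])


ecg = [
    {"ts": "2015-04-05T14:30Z", "contributor": "ward-g3-monitor0", "value": 0.25443},
    {"ts": "2015-04-05T14:30Z", "contributor": "ward-g3-monitor1", "value": 0.36432},
    {"ts": "2015-04-05T14:30Z", "contributor": "intensive-care", "value": 0.46580},
    {"ts": "2015-04-05T14:31Z", "contributor": "ward-g3-monitor0", "value": 0.26073},
]
results = list(grouped_online(ecg, RunningMean))
```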
One type of monitor is consistently having issues and producing bad results.
Can we monitor which ones?
ward-G3
intensive-care
ward-G5
Can we re-use the analysis?
Live-updating sets of contributors to data
manufacturer == "ACME"
Re-use the same query across domains
ACME Monitors
Domains
ward-G3
intensive-care
ward-G5
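A domain as described here, a live-updating set of contributors selected by a condition such as `manufacturer == "ACME"`, might be modelled as a predicate plus a membership set. The `Domain` class, its `on_contributor` method and the manufacturer assigned to each monitor below are invented for illustration; the slides do not show how Valo defines domains.

```python
class Domain:
    """A named set of contributors defined by a predicate over their metadata."""

    def __init__(self, name, predicate):
        self.name = name
        self.predicate = predicate
        self.members = set()

    def on_contributor(self, contributor_id, metadata):
        """Call whenever a contributor appears or its metadata changes."""
        if self.predicate(metadata):
            self.members.add(contributor_id)
        else:
            self.members.discard(contributor_id)


acme = Domain("ACME Monitors", lambda meta: meta.get("manufacturer") == "ACME")

# Hypothetical metadata; the slides do not say which monitors are ACME ones.
acme.on_contributor("ward-g3-monitor0", {"manufacturer": "ACME"})
acme.on_contributor("ward-g3-monitor1", {"manufacturer": "Other"})
acme.on_contributor("intensive-care", {"manufacturer": "ACME"})
```

A query scoped to `acme.members` would then follow the set as monitors are added or replaced.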
Image: http://collections.rmg.co.uk/mediaLib/476/media-476182/large.jpg (CC BY-NC-SA)
Can we re-use the analysis?
Same algorithm
Similar queries
Real-time and historical