HTM & Apache Flink (2016-06-27)
Eron Wright (@eronwright)
HTM & Apache Flink
Extending Flink for Anomaly Detection with Hierarchical Temporal Memory (HTM)

What is HTM?
Hierarchical Temporal Memory (HTM) is a theory of computation for the neocortex.

History
• 2005 – 2009: HTM theory; first-generation algorithms; hierarchy and vision problems; Vision Toolkit
• 2009 – 2012: Cortical Learning Algorithms; SDRs, sequence memory, continuous learning; applications exploration
• 2013 – 2015: Continued HTM development; NuPIC open source project; Grok for anomaly detection
• 2014 – : Sensorimotor; goal-directed behavior; sequence classification
(Timeline markers: 2002, 2004, 2005)
Source: http://www.slideshare.net/numenta/why-neurons-have-thousands-of-synapses-a-model-of-sequence-memory-in-the-brain

Computational Properties
• Online, Unsupervised Learning
• High-order Representations
  • For example: sequences “ABCD” vs “XBCY”
• Multiple Simultaneous Predictions
  • For example: “BC” predicts both “D” and “Y”
• Anomaly Scores
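
The “ABCD” vs “XBCY” example can be made concrete with a toy model. The sketch below is a plain dictionary-based transition table, not the HTM algorithm; `ToySequenceMemory` and `raw_anomaly_score` are hypothetical names chosen for illustration. It only shows why a short context yields two simultaneous predictions while a longer (high-order) context disambiguates them, and how an unpredicted input maps to a high anomaly score. HTM implementations derive the score from how much of the current input was predicted; the toy version collapses that to 0 or 1.

```python
from collections import defaultdict


class ToySequenceMemory:
    """Dictionary-based stand-in for sequence memory (NOT the HTM algorithm)."""

    def __init__(self, order):
        self.order = order  # number of preceding symbols used as context
        self.transitions = defaultdict(set)  # context tuple -> symbols seen next

    def learn(self, sequence):
        # Record which symbol followed each context of length `order`.
        for i in range(self.order, len(sequence)):
            context = tuple(sequence[i - self.order:i])
            self.transitions[context].add(sequence[i])

    def predict(self, recent):
        # Return every symbol that has followed the most recent context.
        context = tuple(recent[-self.order:])
        return set(self.transitions.get(context, set()))


def raw_anomaly_score(predicted, actual):
    """Toy score: 0.0 if the actual symbol was predicted, 1.0 otherwise."""
    return 0.0 if actual in predicted else 1.0


low = ToySequenceMemory(order=2)   # sees only "BC"
high = ToySequenceMemory(order=3)  # sees "ABC" or "XBC"
for seq in ("ABCD", "XBCY"):
    low.learn(seq)
    high.learn(seq)

ambiguous = low.predict("BC")   # both continuations are predicted at once
after_a = high.predict("ABC")   # the longer context selects one continuation
after_x = high.predict("XBC")
```

With the longer context, seeing “Y” after “ABC” scores as anomalous, while “D” does not.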

Implementations of HTM
• Numerous Implementations
  • NuPIC – official reference library (Python/C++)
  • HTM.java – community-supported library (Java)
• Evolving Rapidly
  • Tracking the theory!

NuPIC learns the time-based patterns in data, predicts future values, and detects anomalies.

Introducing Flink-HTM
flink-htm provides HTM-based learning operators for the Flink DataStream API, based on HTM.java.

Benefits
• Good fit for Apache Flink
  • Automated model-building
  • Continuous learning
  • Temporal awareness
Contrast with: github.com/StephanEwen/flink-demos/tree/master/streaming-state-machine

Benefits (cont’d)
• Good fit for HTM
  • Integration w/ data pipeline
  • Data connectivity
    • e.g. Kafka, Twitter, HDFS, AWS Kinesis
  • DSL for stream pre- and post-processing
    • e.g. aggregation, transformation
  • Distributed, reliable processing
  • Event-Time Awareness

Features
• `Learn` Operator
  • Feeds input data to an HTM model
  • Emits predictions and anomaly scores
  • Supports keyed and non-keyed streams
• Checkpoint Integration
  • Models are serialized
  • Facilitates exactly-once processing
• Numenta RiverView Connector
  • Public-domain temporal datasets
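
The keyed-stream and checkpoint bullets can be illustrated with a small sketch. It assumes one model per key and a snapshot that serializes every model, so that a restored operator resumes with what it had learned. `ToyModel` and `ToyLearnOperator` are hypothetical stand-ins; they are not the flink-htm or HTM.java classes, and JSON is used here only to keep the example self-contained.

```python
import json


class ToyModel:
    """Stand-in for an HTM network (hypothetical): remembers value transitions."""

    def __init__(self, previous=None, seen=()):
        self.previous = previous
        self.seen = {tuple(pair) for pair in seen}  # (previous, current) pairs

    def compute(self, value):
        """Return 1.0 for a transition never seen before, else 0.0, then learn it."""
        pair = (self.previous, value)
        score = 0.0 if pair in self.seen else 1.0
        self.seen.add(pair)
        self.previous = value
        return score

    def to_state(self):
        return {"previous": self.previous, "seen": [list(p) for p in self.seen]}


class ToyLearnOperator:
    """Keeps one model per key, as an operator on a keyed stream would."""

    def __init__(self):
        self.models = {}

    def process(self, key, value):
        model = self.models.setdefault(key, ToyModel())
        return key, value, model.compute(value)

    def snapshot(self):
        # Checkpoint: serialize every per-key model.
        return json.dumps({k: m.to_state() for k, m in self.models.items()})

    def restore(self, blob):
        # Recovery: rebuild the per-key models from the serialized state.
        states = json.loads(blob)
        self.models = {k: ToyModel(s["previous"], s["seen"]) for k, s in states.items()}


op = ToyLearnOperator()
for v in [1, 2, 1, 2]:
    op.process("sensor-a", v)
checkpoint = op.snapshot()

restored = ToyLearnOperator()
restored.restore(checkpoint)
score_known = restored.process("sensor-a", 1)[2]    # transition 2 -> 1 was learned
score_new_key = restored.process("sensor-b", 1)[2]  # a fresh key starts untrained
```

Because the learned state travels with the checkpoint, a restart does not force the model to relearn patterns it has already seen.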

NYC Traffic Example
http://data.numenta.org/nyc-traffic/meta.html

General Approach
1. Define Input Type
2. Add Data Source
3. Apply Learn Operator
  • w/ HTM Network Definition
  • w/ Field Encoders
4. Define Select Function
  1. Process the inference data (predictions & anomaly scores)
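
The four steps can be sketched end to end with plain Python generators. This is an illustration of the flow only, under stated assumptions: the record fields, the sample speeds, the bucket-based `encode_speed` encoder, and the "unseen bucket" scoring rule are all invented here and stand in for a real HTM network and its field encoders.

```python
from typing import Iterator, NamedTuple, Tuple


class TrafficReport(NamedTuple):
    """Step 1: the input type (hypothetical fields)."""
    timestamp: int
    speed: float


def source() -> Iterator[TrafficReport]:
    """Step 2: the data source (made-up sample values)."""
    for t, s in enumerate([50, 52, 51, 50, 52, 51, 5]):
        yield TrafficReport(t, float(s))


def encode_speed(speed: float, bucket_width: float = 10.0) -> int:
    """Toy field encoder: maps a speed to a coarse bucket index."""
    return int(speed // bucket_width)


def learn(stream) -> Iterator[Tuple[TrafficReport, float]]:
    """Step 3: the learn step, pairing each record with an anomaly score."""
    seen = set()
    for record in stream:
        bucket = encode_speed(record.speed)
        score = 0.0 if bucket in seen else 1.0  # unseen bucket => anomalous
        seen.add(bucket)
        yield record, score


def select(inferences, threshold: float = 0.5, warmup: int = 1):
    """Step 4: the select function, keeping only anomalous records after warm-up."""
    for i, (record, score) in enumerate(inferences):
        if i >= warmup and score >= threshold:
            yield record.timestamp, score


anomalies = list(select(learn(source())))
```

The first record is skipped as warm-up, since an untrained model scores everything as new; only the sudden drop in speed is reported.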

Advanced Topics
• `Reset` Function
  • Indicates the start of a temporal sequence
  • For example: A,B,C,D,E, (reset), A,B,C,D,E
• Stateful Functions
  • Use `mapWithState` to store predictions for the future
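
Both ideas can be shown with a toy transition table. The sketch assumes that a reset simply clears the model's memory of the previous input, so that the end of one sequence is not linked to the start of the next. The `map_with_state` helper is a hypothetical Python analogue of a stateful map, written for this example: it threads a state value through a function that returns an output and the next state. Here the state is the prediction made one step earlier, which is compared against the value that actually arrives.

```python
class FirstOrderMemory:
    """Toy transition table standing in for an HTM model (hypothetical)."""

    def __init__(self):
        self.previous = None
        self.next_symbols = {}  # symbol -> set of symbols seen after it

    def reset(self):
        # Forget the previous input: the next symbol starts a new sequence.
        self.previous = None

    def step(self, symbol):
        # Learn the transition, then return the prediction for the next symbol.
        if self.previous is not None:
            self.next_symbols.setdefault(self.previous, set()).add(symbol)
        self.previous = symbol
        return set(self.next_symbols.get(symbol, set()))


def run(events, use_reset):
    model = FirstOrderMemory()
    for event in events:
        if event == "(reset)":
            if use_reset:
                model.reset()
            continue
        model.step(event)
    return model


def map_with_state(stream, fn, state=None):
    """Apply fn(item, state) -> (output, new_state) across the stream."""
    for item in stream:
        output, state = fn(item, state)
        yield output


def check_prediction(item, stored_prediction):
    """Compare the current value with the prediction stored one step earlier."""
    actual, predicted_next = item
    hit = None if stored_prediction is None else actual in stored_prediction
    return (actual, hit), predicted_next


events = list("ABCDE") + ["(reset)"] + list("ABCDE")
with_reset = run(events, use_reset=True)
without_reset = run(events, use_reset=False)

with_reset.reset()
inferences = [(s, with_reset.step(s)) for s in "ABXDE"]
results = list(map_with_state(inferences, check_prediction))
```

Without the reset, the model learns a spurious E to A transition across the sequence boundary. In `results`, the unexpected “X” and the symbol after it are flagged as not predicted.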
Extending Flink

Streaming API/DSL
• Java
  1. Static Entrypoint, then
  2. Intermediate Representation (e.g. HTMStream), then
  3. DataStream!
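
The three-stage shape of such a DSL can be sketched in a few lines. Everything below is a simplified stand-in written for illustration: the `DataStream` class is a list wrapper rather than Flink's, the `HTM.learn` and `select` signatures are assumptions and not the flink-htm API, and `model_fn` is a stateless placeholder for a real model.

```python
class DataStream:
    """Minimal stand-in for a stream: wraps a list of records."""

    def __init__(self, records):
        self.records = list(records)

    def map(self, fn):
        return DataStream(fn(r) for r in self.records)


class HTMStream:
    """Intermediate representation: input plus model, not yet a DataStream."""

    def __init__(self, input_stream, model_fn):
        self.input_stream = input_stream
        self.model_fn = model_fn

    def select(self, select_fn):
        # Attach an inference to each record, then hand both to the user's
        # select function; the result is an ordinary DataStream again.
        inferences = self.input_stream.map(lambda r: (r, self.model_fn(r)))
        return inferences.map(lambda pair: select_fn(*pair))


class HTM:
    """Static entry point of the DSL."""

    @staticmethod
    def learn(input_stream, model_fn):
        return HTMStream(input_stream, model_fn)


result = (
    HTM.learn(DataStream([1, 2, 30]), model_fn=lambda v: 1.0 if v > 10 else 0.0)
    .select(lambda record, score: (record, score))
)
```

The intermediate type exposes only the operations that make sense before inference results are selected, and returning a regular stream at the end lets the rest of the pipeline stay unchanged.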

Streaming API/DSL (cont’d)
• Scala
  1. `RichDataStream` extensions
  2. Scala Functions
  3. Scala-Specific TypeInformation
• Other
  • Serialization Hooks
  • Clean your closures!

Learn Operator
• Implement `AbstractStreamOperator`
• Respect Flink’s type system
  • Use the `TypeInformation` class
• Use the State Handle abstraction
  • Keyed streams only
• Instrument your code
  • Accumulators

RiverView Connector
• Extend `RichParallelSourceFunction`
  • Parallelism is user-defined
  • Must handle partition assignment
• Mix in `Checkpointed`
  • Synchronize on checkpoint lock
• Support cancel/stop
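
The three source responsibilities can be sketched outside Flink. The example assumes round-robin partition assignment by subtask index and a single lock guarding both record emission and state snapshots, so that a snapshot never observes a record that was emitted without its offset being advanced. `assign_partitions` and `ToySource` are hypothetical, and the stream names other than `nyc-traffic` are made up.

```python
import threading


def assign_partitions(stream_ids, parallelism, subtask_index):
    """Round-robin: each subtask takes every `parallelism`-th stream."""
    ordered = sorted(stream_ids)
    return [s for i, s in enumerate(ordered) if i % parallelism == subtask_index]


class ToySource:
    """Stand-in for a checkpointed source reading from a list of records."""

    def __init__(self, records):
        self.records = records
        self.offset = 0
        self.running = True
        self.checkpoint_lock = threading.Lock()

    def run(self, emit):
        while self.running and self.offset < len(self.records):
            # Emit and advance the offset atomically with respect to snapshots.
            with self.checkpoint_lock:
                emit(self.records[self.offset])
                self.offset += 1

    def snapshot_state(self):
        with self.checkpoint_lock:
            return self.offset

    def restore_state(self, offset):
        self.offset = offset

    def cancel(self):
        # Cooperative cancellation: the run loop checks this flag.
        self.running = False


streams = ["nyc-traffic", "stream-b", "stream-c", "stream-d", "stream-e"]
assignments = [assign_partitions(streams, 2, i) for i in range(2)]

out = []
records = ["r0", "r1", "r2", "r3"]
src = ToySource(records)


def emit_then_cancel(record):
    out.append(record)
    if len(out) == 2:
        src.cancel()  # simulate a stop after two records


src.run(emit_then_cancel)
saved = src.snapshot_state()

resumed = ToySource(records)
resumed.restore_state(saved)
resumed.run(out.append)
```

After the restore, the second source continues from the saved offset, so each record is emitted exactly once across the two runs.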
Closing

Help Wanted!
Issues: github.com/htm-community/flink-htm/issues
Follow: @ApacheFlink, @dataArtisans, @Numenta
Info: http://numenta.org/