IMC Summit 2016 Breakout - Matt Coventon - Test Driving Streaming and CEP on Apache Ignite
TEST DRIVING STREAMING AND CEP ON APACHE IGNITE
MATT COVENTON
See all the presentations from the In-Memory Computing Summit at http://imcsummit.org
ABOUT ME
Big Data Services Lead at Innovative Software Engineering http://www.iseinc.biz [email protected]
WHAT ARE WE GOING TO DO TODAY?
An overview of Apache Ignite streaming and CEP
Dive into some code!
A simple streaming/CEP use case
APACHE IGNITE STREAMING AND CEP: OVERVIEW
WHAT IS STREAMING?
Most commonly, streaming refers to processing unbounded data sets as they arrive to achieve lower latency and therefore more timely results.
If you haven’t already, read these helpful posts that clarify the terms, techniques, and design patterns: https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
WHAT IS CEP?
Complex event processing, or CEP, is event processing that combines data from multiple sources to infer events or patterns that suggest more complicated circumstances. The goal of complex event processing is to identify meaningful events (such as opportunities or threats) and respond to them as quickly as possible (https://en.wikipedia.org/wiki/Complex_event_processing)
IN THE APACHE IGNITE CONTEXT
Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies.
APACHE IGNITE STREAMING
Primarily a high-performance means of inserting unbounded data sets into the Ignite Data Grid (cache) using the IgniteDataStreamer API
The StreamReceiver API offers custom pre-processing
Other data processing is done through queries (including continuous queries) and cache policies
Backed by all kinds of Ignite goodness:
  Scalable
  Fault-tolerant
  High throughput
Streaming functionality atop a convergent data platform – the future is bright!
IGNITE DATA STREAMER API
[Diagram: IgniteDataStreamer with MQTT Streamer, Kafka Streamer, Camel Streamer, JMS Data Streamer, and other streamers; StreamReceiver with StreamTransformer and StreamVisitor]
IGNITE DATA STREAMER API
The IgniteDataStreamer API is the basic building block for writing unbounded data to Ignite
  Scalable
  Fault-tolerant
  At-least-once guarantee (watch out for duplicate data)
  Buffers data and writes in batches (may introduce unwanted latency; set perNodeBufferSize() and autoFlushFrequency() accordingly)
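
The buffering trade-off above can be illustrated without Ignite. The sketch below is plain Java, not the Ignite API: `BatchingBuffer` is a hypothetical class, and its `bufferSize` and `flushIntervalMillis` fields are only assumed analogues of what perNodeBufferSize() and autoFlushFrequency() control. Time is passed in explicitly so the behavior is deterministic.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a batching streamer; NOT part of the Ignite API. */
final class BatchingBuffer<K, V> {
    private final int bufferSize;           // assumed analogue of perNodeBufferSize()
    private final long flushIntervalMillis; // assumed analogue of autoFlushFrequency()
    private final Map<K, V> sink;           // stands in for the cache
    private final List<Map.Entry<K, V>> buffer = new ArrayList<>();
    private long oldestBufferedAt;
    private int flushCount;

    BatchingBuffer(int bufferSize, long flushIntervalMillis, Map<K, V> sink) {
        this.bufferSize = bufferSize;
        this.flushIntervalMillis = flushIntervalMillis;
        this.sink = sink;
    }

    /** Buffers one entry and flushes immediately once the buffer is full. */
    void add(K key, V value, long nowMillis) {
        if (buffer.isEmpty()) {
            oldestBufferedAt = nowMillis;
        }
        buffer.add(new SimpleImmutableEntry<>(key, value));
        if (buffer.size() >= bufferSize) {
            flush();
        }
    }

    /** Called periodically; flushes if the oldest buffered entry has waited long enough. */
    void tick(long nowMillis) {
        if (!buffer.isEmpty() && nowMillis - oldestBufferedAt >= flushIntervalMillis) {
            flush();
        }
    }

    /** Writes the batch. Keyed puts are idempotent, so a redelivered entry overwrites itself. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        for (Map.Entry<K, V> e : buffer) {
            sink.put(e.getKey(), e.getValue());
        }
        buffer.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }
}

final class BatchingBufferDemo {
    public static void main(String[] args) {
        Map<String, Integer> cache = new HashMap<>();
        BatchingBuffer<String, Integer> streamer = new BatchingBuffer<>(3, 1000L, cache);
        streamer.add("line-1@t0", 5, 0L);
        streamer.add("line-2@t0", 7, 10L);
        System.out.println(cache.size()); // entries are still buffered, not yet visible
        streamer.tick(1000L);             // interval elapsed, so the batch is written
        System.out.println(cache.size());
    }
}
```

A small buffer or a short flush interval lowers latency at the cost of more, smaller writes. Using a key that identifies the event (here line plus timestamp) is one way to make duplicates from at-least-once delivery harmless.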
STREAM RECEIVER API
The StreamReceiver API allows you to add custom, collocated pre-processing of the streaming data before it is put into the cache.
  It does not put data into the cache automatically; you need to handle that during processing
  Single receiver per IgniteDataStreamer
Two out-of-the-box implementations of StreamReceiver:
  StreamTransformer updates data in the stream cache based on its previous value
  StreamVisitor visits every key-value tuple in the stream
It might be possible to implement watermark, trigger, and accumulation patterns (depending on the use case, see https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102)
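
The difference between the two receiver styles can be sketched in plain Java. This is a model of the semantics described above, not the Ignite API: `BatchReceiver` and `ReceiverSketch` are hypothetical names, and a `Map` stands in for the cache.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;

/** Hypothetical receiver contract: the receiver alone decides what reaches the cache. */
interface BatchReceiver<K, V> {
    void receive(Map<K, V> cache, List<Map.Entry<K, V>> batch);
}

final class ReceiverSketch {
    private ReceiverSketch() {
    }

    /** Transformer style: combine each incoming value with the value already stored. */
    static <K, V> BatchReceiver<K, V> transformer(BinaryOperator<V> update) {
        return (cache, batch) -> {
            for (Map.Entry<K, V> e : batch) {
                // merge() stores the value if the key is new, else applies update(old, incoming)
                cache.merge(e.getKey(), e.getValue(), update);
            }
        };
    }

    /** Visitor style: see every tuple; nothing is stored unless the action stores it. */
    static <K, V> BatchReceiver<K, V> visitor(BiConsumer<Map<K, V>, Map.Entry<K, V>> action) {
        return (cache, batch) -> {
            for (Map.Entry<K, V> e : batch) {
                action.accept(cache, e);
            }
        };
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> batch = new ArrayList<>();
        batch.add(new SimpleImmutableEntry<>("line-1", 5));
        batch.add(new SimpleImmutableEntry<>("line-2", 3));
        batch.add(new SimpleImmutableEntry<>("line-1", 7));

        // Running total per line, built from the previous value.
        Map<String, Integer> totals = new HashMap<>();
        ReceiverSketch.<String, Integer>transformer(Integer::sum).receive(totals, batch);
        System.out.println(totals.get("line-1"));

        // Visitor that only observes; the cache stays empty.
        Map<String, Integer> untouched = new HashMap<>();
        ReceiverSketch.<String, Integer>visitor((cache, e) -> System.out.println(e.getKey()))
                .receive(untouched, batch);
        System.out.println(untouched.isEmpty());
    }
}
```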
WINDOWING
Achieved through cache eviction and expiry policies
Use eviction policies for size/batch-based windows
  Consider SortedEvictionPolicy with a custom comparator for “x most recent events”
Use expiry policies for time-based windows
  Consider the notions of event time, ingestion time, and processing time
  CreatedExpiryPolicy is ingestion-time based
  What if data is delayed? Consider a custom expiry policy based on event time
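
To see why the choice of timestamp matters for delayed data, here is a minimal plain-Java sketch. It does not use Ignite's expiry policy classes: `SensorEvent` and `ExpiringWindow` are hypothetical, and the window simply drops events whose chosen timestamp is older than the window length.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

/** Hypothetical event carrying both the time it happened and the time it arrived. */
final class SensorEvent {
    final String lineId;
    final long eventTimeMillis;
    final long ingestTimeMillis;

    SensorEvent(String lineId, long eventTimeMillis, long ingestTimeMillis) {
        this.lineId = lineId;
        this.eventTimeMillis = eventTimeMillis;
        this.ingestTimeMillis = ingestTimeMillis;
    }
}

/** Time-based window whose expiry is driven by a pluggable timestamp. */
final class ExpiringWindow {
    private final long windowMillis;
    private final ToLongFunction<SensorEvent> timestamp;
    private final List<SensorEvent> events = new ArrayList<>();

    ExpiringWindow(long windowMillis, ToLongFunction<SensorEvent> timestamp) {
        this.windowMillis = windowMillis;
        this.timestamp = timestamp;
    }

    void add(SensorEvent event) {
        events.add(event);
    }

    /** Removes every event whose chosen timestamp has fallen out of the window. */
    void expire(long nowMillis) {
        events.removeIf(e -> timestamp.applyAsLong(e) + windowMillis <= nowMillis);
    }

    int size() {
        return events.size();
    }
}

final class ExpiringWindowDemo {
    public static void main(String[] args) {
        // The event happened at t=0 but arrived 50 seconds late.
        SensorEvent delayed = new SensorEvent("line-1", 0L, 50_000L);

        ExpiringWindow byIngestion = new ExpiringWindow(60_000L, e -> e.ingestTimeMillis);
        ExpiringWindow byEventTime = new ExpiringWindow(60_000L, e -> e.eventTimeMillis);
        byIngestion.add(delayed);
        byEventTime.add(delayed);

        // At t=70s the event is 70 s old in event time but only 20 s old in ingestion time.
        byIngestion.expire(70_000L);
        byEventTime.expire(70_000L);
        System.out.println("ingestion-time window keeps " + byIngestion.size());
        System.out.println("event-time window keeps " + byEventTime.size());
    }
}
```

With ingestion-time expiry, the late event stays in the window for a full 60 seconds after it arrives, so it is counted alongside newer readings. Event-time expiry removes it once it is genuinely older than the window.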
QUERYING
All Ignite data indexing capabilities, as well as Ignite SQL, TEXT, and predicate-based cache queries, are available (it’s just another cache after all)
Leverage continuous queries to filter events on the node and receive real-time notifications for events that match your criteria
  Another option for implementing watermark, trigger, and accumulation patterns
This is where the complex event processing (CEP) magic happens, leveraging distributed joins and cross-cache joins
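
The filter-then-notify idea behind continuous queries can be modeled in a few lines of plain Java. This is a single-process sketch, not Ignite's continuous query API: `NotifyingCache` is a hypothetical class, and in Ignite the filter would run on the node that owns the data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.BiPredicate;

/** Hypothetical cache that notifies listeners about updates matching a filter. */
final class NotifyingCache<K, V> {
    private static final class Subscription<K, V> {
        final BiPredicate<K, V> filter;
        final BiConsumer<K, V> listener;

        Subscription(BiPredicate<K, V> filter, BiConsumer<K, V> listener) {
            this.filter = filter;
            this.listener = listener;
        }
    }

    private final Map<K, V> data = new HashMap<>();
    private final List<Subscription<K, V>> subscriptions = new ArrayList<>();

    /** Registers a standing query: the listener fires for every later update passing the filter. */
    void subscribe(BiPredicate<K, V> filter, BiConsumer<K, V> listener) {
        subscriptions.add(new Subscription<>(filter, listener));
    }

    /** Stores the value, then evaluates each filter before notifying its listener. */
    void put(K key, V value) {
        data.put(key, value);
        for (Subscription<K, V> s : subscriptions) {
            if (s.filter.test(key, value)) {
                s.listener.accept(key, value);
            }
        }
    }

    V get(K key) {
        return data.get(key);
    }
}

final class NotifyingCacheDemo {
    public static void main(String[] args) {
        NotifyingCache<String, Integer> cache = new NotifyingCache<>();
        // Alert when a line reports fewer than 10 items (threshold chosen for illustration).
        cache.subscribe((line, items) -> items < 10,
                (line, items) -> System.out.println("low output on " + line + ": " + items));
        cache.put("line-1", 42); // does not match the filter
        cache.put("line-2", 3);  // matches, listener fires
    }
}
```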
DIVE INTO SOME CODE: A SIMPLE IOT USE CASE
A SIMPLE IOT USE CASE
Monitor productivity on manufacturing lines
  Sensors stream the number of items per second through IgniteDataStreamer
  Data is retained in the cache for 60 seconds (windowing)
  Dashboard shows the number of items per minute for each active line and the total items per minute for the entire factory
[Diagram: production lines 1, 2, 3, ... n -> Ignite Data Streamer -> Ignite Cache -> Dashboard]
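
The dashboard numbers for this use case reduce to a grouped sum over the readings still inside the 60-second window. The sketch below is plain Java and is not the code from the talk: `LineReading` and `ProductivityDashboard` are hypothetical names, and a `List` stands in for the cache contents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sensor reading: items produced by one line during one second. */
final class LineReading {
    final String lineId;
    final int itemsPerSecond;
    final long timestampMillis;

    LineReading(String lineId, int itemsPerSecond, long timestampMillis) {
        this.lineId = lineId;
        this.itemsPerSecond = itemsPerSecond;
        this.timestampMillis = timestampMillis;
    }
}

final class ProductivityDashboard {
    static final long WINDOW_MILLIS = 60_000L;

    private ProductivityDashboard() {
    }

    /** Sums per-second readings from the last 60 seconds, giving items per minute per active line. */
    static Map<String, Integer> itemsPerMinuteByLine(List<LineReading> readings, long nowMillis) {
        Map<String, Integer> totals = new TreeMap<>();
        for (LineReading r : readings) {
            boolean inWindow = r.timestampMillis > nowMillis - WINDOW_MILLIS
                    && r.timestampMillis <= nowMillis;
            if (inWindow) {
                totals.merge(r.lineId, r.itemsPerSecond, Integer::sum);
            }
        }
        return totals;
    }

    /** Factory-wide items per minute: the sum over all active lines. */
    static int factoryTotal(Map<String, Integer> itemsPerMinuteByLine) {
        int total = 0;
        for (int items : itemsPerMinuteByLine.values()) {
            total += items;
        }
        return total;
    }

    public static void main(String[] args) {
        List<LineReading> readings = new ArrayList<>();
        readings.add(new LineReading("line-1", 2, 1_000L));
        readings.add(new LineReading("line-1", 3, 2_000L));
        readings.add(new LineReading("line-2", 4, 2_000L));

        Map<String, Integer> byLine = itemsPerMinuteByLine(readings, 60_000L);
        System.out.println(byLine);
        System.out.println(factoryTotal(byLine));
    }
}
```

A line with no readings in the window does not appear in the result, which matches the notion of an "active line" above.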
IGNITE POM DEPENDENCIES
PRODUCTION LINE EVENT
CACHE CONFIG
MONITORING APPLICATION MAIN CLASS
MONITORING APPLICATION REST CONTROLLER
STARTING THE IGNITE NODE
H2 DEBUG CONSOLE WITH ONE LINE REPORTING
DASHBOARD
THANK YOU!