Can a Divorced MOM and DAD take care of the CHILD ?
© 2003 IBM Corporation
MOM – Message Oriented Middleware.
DAD – Direct Access to Data (DBMSs).
CHILD – Correlating Historical or In-transit Large-scale Data-stream.
Business Unit or Product Name
In this talk …
Introduce CHILD - Correlating Historical or In-transit Large-scale Data-streams.
Compare CHILD and current Stream Processing Engines.
How can DAD and MOM help and work together?
Summary.
The Supply Chain Example
Some funny DAD characteristics:
• DADs are corporate custodians of truth.
• DADs generally maintain a single version of truth: the recent truth.
• DADs are optimized to answer questions for a single version of truth.
• The truths can be atomically evaluated to answer the questions.
• There is only one answer to the question.
• DADs do not remember the answers provided to previously asked questions.
Supply Chain Evolves to Accommodate Emerging Business Practices
Some of the Characteristics of MOM
Allows asynchronous communication between disconnected systems within and across organizations.
Provides Message Filtering and Message Correlation.
Persistence and Guaranteed Delivery Mechanism.
Message enrichment can be achieved by referencing static datasets during routing.
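The filtering and enrichment points above can be sketched in a few lines of Python. This is only an illustration, not how any particular MOM product works: the `SUPPLIERS` table, the `route` function and the message fields are all hypothetical names chosen for the example.

```python
# Hypothetical static reference data consulted while routing (an assumption).
SUPPLIERS = {"S1": {"region": "EU"}, "S2": {"region": "US"}}

def route(messages, predicate, reference):
    """Filter messages, then enrich the survivors from a static lookup table."""
    for msg in messages:
        if not predicate(msg):
            continue  # message filtering: drop what the subscriber did not ask for
        extra = reference.get(msg["supplier"], {})
        yield {**msg, **extra}  # enrichment: a new dict, the input is not modified

orders = [
    {"supplier": "S1", "qty": 500},
    {"supplier": "S2", "qty": 5},
    {"supplier": "S3", "qty": 900},
]
routed = list(route(orders, lambda m: m["qty"] >= 100, SUPPLIERS))
```

Note that the message from the unknown supplier S3 passes the filter but cannot be enriched, because the reference dataset is static and has no entry for it.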
Proactive Supply Chain Management
In a proactive case, each system creates its unique view of the state of interest and receives information about changes to that state.
There may not be a complete truth. Facts may arrive over a period of time.
The answers to the questions change as new facts become available.
The aim is to reduce the time to re-compute most recent answers.
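One way to read "reduce the time to re-compute" is to keep the current answer and adjust it per fact instead of rescanning all facts. The sketch below assumes a running average as the question being asked; the class name `RunningAverage` and the delivery-delay figures are hypothetical.

```python
from collections import defaultdict

class RunningAverage:
    """Keeps the current answer per key and updates it in O(1) per new fact."""

    def __init__(self):
        self._count = defaultdict(int)
        self._total = defaultdict(float)

    def observe(self, key, value):
        # A new fact arrives: adjust the stored answer instead of rescanning.
        self._count[key] += 1
        self._total[key] += value
        return self.answer(key)

    def answer(self, key):
        n = self._count[key]
        return self._total[key] / n if n else None  # None: no facts yet

avg_delay = RunningAverage()
# Hypothetical delivery delays (in days) for supplier "S1", arriving over time.
answers = [avg_delay.observe("S1", d) for d in (2.0, 4.0, 6.0)]
```

The answer to the same question changes with each fact (`answers` holds one value per arrival), which is the behaviour the slide describes.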
Scenario: London Congestion Charging (+ security)
[Diagram labels: Command & Control; Sensor Reading; Real-time processing; DB; Retrospective processing; Billing; Security/fraud alerts]
Charging (and security) rules
• vehicle license plate, owner, owner residency, fee paid?
• entry and exit times of vehicle, time of day, day of week, charging, residency
• re-entry within 3 hours is free
• fraud: enters zone and not seen; security: grouped tanker trucks
• 100,000s of vehicle observations per hour
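The free re-entry rule can be sketched over a stream of entry observations. Several things here are assumptions rather than anything stated on the slide: the flat fee `FEE`, the plates, the dates, and the choice to measure the 3 hours from the vehicle's last charged entry (the slide does not say whether it counts from entry or exit).

```python
from datetime import datetime, timedelta

FREE_REENTRY = timedelta(hours=3)  # from the slide
FEE = 5.0                          # hypothetical flat fee, not from the slide

def charge_entries(entries):
    """entries: (plate, entry_time) pairs in time order.
    Returns a dict mapping plate to total fee owed."""
    last_charged = {}  # plate -> time of the last entry that was charged
    totals = {}
    for plate, when in entries:
        prev = last_charged.get(plate)
        if prev is not None and when - prev < FREE_REENTRY:
            continue  # re-entry within 3 hours of a charged entry is free
        last_charged[plate] = when
        totals[plate] = totals.get(plate, 0.0) + FEE
    return totals

day = datetime(2003, 1, 6)  # arbitrary example date
entries = [
    ("ABC123", day + timedelta(hours=8)),
    ("XYZ789", day + timedelta(hours=9)),
    ("ABC123", day + timedelta(hours=10)),  # 2h after charged entry: free
    ("ABC123", day + timedelta(hours=12)),  # 4h after charged entry: charged
]
totals = charge_entries(entries)
```

Only one timestamp per vehicle is retained, so the state grows with the number of distinct plates rather than with the number of observations.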
Examples of CHILD Applications
RFID
Sensor Networks
Stock Quotes
Database Notification
Content Routing Networks
RSS Aggregators
CHILD – Correlating Historical or In-transit Large-Scale Data Streams
Characteristics:
1. Append Only Data.
2. Push Paradigm – Stream of Data (truths), static set of queries (questions).
3. Continuous processing requirements.
4. Correlation requirements.
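The push paradigm inverts the usual database arrangement: the queries stay put and the data flows past them. A minimal sketch, assuming queries are plain predicates; `ContinuousQueryEngine` and the stock-quote fields are hypothetical names for illustration.

```python
class ContinuousQueryEngine:
    """Standing queries are registered once; each appended item is pushed to all."""

    def __init__(self):
        self._log = []      # append-only data: items are never updated or deleted
        self._queries = {}  # query name -> (predicate, list of matching items)

    def register(self, name, predicate):
        self._queries[name] = (predicate, [])

    def append(self, item):
        """Append one item and return the names of the queries it satisfied."""
        self._log.append(item)
        fired = []
        for name, (predicate, results) in self._queries.items():
            if predicate(item):
                results.append(item)
                fired.append(name)
        return fired

    def results(self, name):
        return list(self._queries[name][1])

engine = ContinuousQueryEngine()
engine.register("big_trade", lambda q: q["volume"] > 1000)
engine.register("ibm", lambda q: q["symbol"] == "IBM")
fired = [engine.append(q) for q in (
    {"symbol": "IBM", "volume": 500},
    {"symbol": "XOM", "volume": 5000},
    {"symbol": "IBM", "volume": 2000},
)]
```

This covers characteristics 1 to 3 only. Correlation across items (characteristic 4) needs state beyond a per-item predicate.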
CHILD – Correlating Historical or In-transit Large-Scale Data Streams - 2
[Diagram: state sequence S’ ∆’ S’’ ∆’’ S’’’ ∆’’’ S* ∆*]
All queries have an associated time constraint specified in terms of windowing functions.
Query Type 1: Query when states S’, S’’, S’’’, S* are reached. (DB Notification)
Query Type 2: Query when S’’’ is reached after S’ and S’’. (Sensor Networks)
Query Type 3: Query when S* is reached within 2 transitions from S’. (BI)
Query Type 4: Get an aggregate of (∆). (Sensor Networks)
Query Type 5: Query when S’, S’’ were observed in the past N time windows. (Fraud Detection Networks)
Query Type 6: Query when ∆’, ∆’’, ∆’’’ resulted in exact changes from S’ to S’’ to S’’’. (ESB)
Query Type 7: Query when S’, S’’, S’’’, … ∆’, ∆’’, ∆’’’, … were not observed. (Fraud Detection)
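Query Type 2 with a time window can be sketched as ordered pattern detection over (time, state) events. This is one possible reading, with assumptions: other states may interleave between the pattern states, the window is measured from the first matched state, and `"S1"`, `"S2"`, `"S3"` stand in for S’, S’’, S’’’. The function name `detect_sequence` is hypothetical.

```python
def detect_sequence(events, pattern, window):
    """Yield (start, end) each time the states in `pattern` are seen in order
    (other states may interleave) within `window` time units of the start."""
    n = len(pattern)
    # latest[i] = most recent start time of a partial match of pattern[:i]
    latest = [None] * (n + 1)
    for t, state in events:
        # Walk backwards so a single event cannot advance a match twice.
        for i in range(n - 1, -1, -1):
            if pattern[i] != state:
                continue
            if i == 0:
                latest[1] = t  # a fresh match starts here
            elif latest[i] is not None and t - latest[i] <= window:
                latest[i + 1] = latest[i]  # extend, keeping the start time
        if latest[n] is not None:
            yield (latest[n], t)
            latest[n] = None  # report each completed match once

events = [(1, "S1"), (2, "S2"), (9, "S3"),
          (10, "S1"), (11, "X"), (12, "S2"), (13, "S3")]
matches = list(detect_sequence(events, ["S1", "S2", "S3"], window=5))
```

The first S3 arrives 8 time units after its S1 and is rejected by the window. The second occurrence of the pattern completes within 3 units and is reported. The state kept is one timestamp per pattern position, regardless of stream length.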
CHILD – Correlating Historical or In-transit Large-Scale Data Streams - 3
[Diagram: two state sequences, S’ ∆’ S’’ ∆’’ S’’’ ∆’’’ S* ∆* and P’ δ’ P’’ δ’’ P’’’ δ’’’ P* δ*]
All queries have an associated time constraint specified in terms of windowing functions.
Query Type 8: Evaluate a join on S, P states. (Almost all use cases)
Query Type 9: Co-evaluate a filter on S, P, … (Almost all use cases)
Query Type 10: Evaluate a join/filter on S(t), S(t-T). (Sensor Networks, BI)
Query Type 11: Evaluate P between states S’ and S’’’. (Sensor Networks, Stock Ticks)
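A windowed join across two streams (Query Type 8) might look like the sketch below. It assumes a single time-ordered feed in which each event is tagged with its source stream, an equality join on a key, and a window given as a maximum time difference; `window_join` and the truck/order data are hypothetical.

```python
from collections import deque

def window_join(events, window):
    """events: (time, stream, key, value) in time order; stream is "S" or "P".
    Returns (key, s_value, p_value) for S/P pairs with equal keys whose
    timestamps are at most `window` apart."""
    buffers = {"S": deque(), "P": deque()}
    out = []
    for t, stream, key, value in events:
        other = "P" if stream == "S" else "S"
        # Evict buffered tuples that can no longer join with anything new.
        for buf in buffers.values():
            while buf and t - buf[0][0] > window:
                buf.popleft()
        # Probe the opposite stream's buffer for matching keys.
        for _, k, v in buffers[other]:
            if k == key:
                out.append((key, value, v) if stream == "S" else (key, v, value))
        buffers[stream].append((t, key, value))
    return out

events = [
    (1, "S", "truck7", "loaded"),
    (2, "P", "truck7", "order-A"),
    (3, "P", "truck9", "order-B"),
    (9, "S", "truck9", "loaded"),
]
pairs = window_join(events, window=5)
```

The truck9 pair is missed because its two events are 6 time units apart, which shows the tradeoff between how much state is retained and which correlations can still be found.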
Stream Systems – Academic Projects
AURORA
BOREALIS
STREAMDB
TELEGRAPHCQ
NIAGARACQ
CHILD and Stream Processing – Some Observations.
Temporal dimension is not always the predominant one.
For business processing all facts are retained.
An event is in the eye of the beholder, so every tuple is a message until observed in a context. Queries need to have context.
Being “Turing complete”, SQL allows one to specify arbitrary data manipulations; the tradeoff is how much state we retain vs. resource usage vs. throughput.
A declarative stream manipulation language needs to be developed.
A conceptual data model for manipulating append only data should be the focus - not limited to the engineering aspect of the systems.
Additionally, smart summarization techniques are required for correlating and mining historic data.
CHILD and Stream Processing – Observations contd.
Real-time performance is critical ONLY in some cases.
Providing a common abstraction for sequence analysis on the data items appearing in the stream and across the streams remains critical.
Typical stream systems are restricted to 20-to-30 operators and require resource augmentation to handle higher workloads, which in turn requires capabilities similar to MOMs.
For handling queries over historic data and correlation with historic data CHILD requires capabilities equivalent to DADs.
[Diagram labels: DAD; SPE; optional (twice); STREAM; SPE-1, SPE-2, SPE-3, SPE-4, SPE-5]
Is this not MOM with Content Routing Operators?
[Diagram labels: SPE-1, SPE-2, SPE-3, SPE-4, SPE-5]
What is missing?
Ability to create an ad hoc network of content routers, given a list of streams and queries.
Ability to describe and support smart subscriptions.
Ability to scale simultaneous evaluation of multiple expressions.
Classification of MOMs and DADs
Moms/Dads | Divorced DAD | Relational DAD | Active DAD | Temporal DAD
Divorced MOM | Stream Processing Systems | Triggers and Database Notification Systems | Rule based subscription evaluation | Context Analysis
Independent MOM | Content Routing Systems | Message Enrichments with static data | ECA (Event Condition Action) with data dissemination | Temporal Event Correlation
Transactional MOM | Proprietary Queue based Systems | Secured Pub/Sub with transactional capability | Database as a rule-based Content Provider | Traceability Analysis + Event Correlation
Summary
Stream processing is just one aspect of the emerging paradigm of processing append-only data with support for continuous queries.
These systems need a new representational model. SQL or SQL extensions are not sufficient.
If we are not careful, we may redevelop parts of MOM and DAD in the process of creating support for CHILD.
DAD: There is one and only one truth that I know. For previous versions of truth see my log…
MOM: I do not need to know the truth, I just GOSSIP. I GOSSIP about facts !!!
CHILD: But MOM, DAD, I do not need to know the complete truth. I want to take decisions now, I will correct them when I know more.
DAD: Well, I can provide you with triggers if you want.
CHILD: Ahhh!!! As if they scale.
MOM: Well I can talk with other MOMs and enrich the contents on the fly.
CHILD: Oh, is it?! Can you also enrich it on the fly? Or tell me when three red marbles are followed by four green ones?
MOM: Only if I know what marbles are. Maybe with my content routing hat on I can do that.
CHILD: Yeah, right!!!
CHILD: Can uncle Active (Database) help?
MOM: Oh no, he suffers from the Rule Termination Problem.
DAD: Well, if you ask my temporally aware brother, he can help you relate things in the past.
CHILD: But DAD, time is just one axis; I also consider the value axis. I want to purchase a stock of MOBIL OIL only when the fuel price has risen after a REFINERY BLOWUP. It's not time but the context that matters.
MOM: You know my sister STREAM PROCESSING ENGINE can help.
CHILD: Oh sure, with an ability to provide 20-30 operators, in-memory operations only, optional recovery, undefined semantics and a NON DECLARATIVE interface, I will be in great hands!!! YUCK!!
MOM: Oh, we need to provide him with a mix or else he will replicate our behaviors.
DAD: DOH !!!