IBM Streams
28 August 2017
Roger Rea, IBM Streams Offering Manager
The Past, Present and Future of Real-time Analytics
Analyze more, store less, and act now
Eleventh International Workshop on
Real-Time Business Intelligence and Analytics
August 28, 2017 - Munich, Germany
Streaming in the past
1954 – The first supercomputer: IBM SAGE
Semi-Automatic Ground Environment
250 tons and 60,000 vacuum tubes
Designed to coordinate radar stations
and direct airplanes to intercept
incoming planes
Remained in continuous operation until
1983, over 20 years
Sources: Wikipedia – SAGE and AN/FSQ-7
Streaming analytics – A paradigm shift
Traditional Approach
• Historical Fact Finding
• Analyze Persisted Data
• Batch Philosophy
• Pull Approach
• On-Demand
Streaming Analytics
• Analyze the Current Moment “Now”
• Analyze Data Directly “In Motion”
• Analyze Data at the Speed it is Created
• Push Approach
• Continuous Insights
[Diagram. Traditional Approach: Data -> Store -> Analysis -> Insight. Streaming Analytics: Data -> Analysis -> Insight, with labels "Aggregate" and "Raw".]
Source: Father, Son & Co.: My Life at IBM and Beyond by Thomas J. Watson, Jr., page 231
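The pull versus push contrast on this slide can be illustrated with a minimal Python sketch. It is not IBM Streams code: the readings, the `THRESHOLD` value, and the names `pull_analysis` and `PushPipeline` are hypothetical and exist only for illustration. The pull version stores everything and queries it on demand; the push version analyzes each tuple as it arrives and keeps only running aggregates.

```python
from typing import Callable, List

# Hypothetical sensor readings and alert threshold, for illustration only.
READINGS = [21.0, 21.5, 35.2, 22.1]
THRESHOLD = 30.0


def pull_analysis(store: List[float]) -> List[float]:
    """Traditional approach: data is persisted first, then queried on demand."""
    return [r for r in store if r > THRESHOLD]


class PushPipeline:
    """Streaming approach: every tuple is analyzed the moment it arrives."""

    def __init__(self, on_insight: Callable[[float], None]) -> None:
        self.on_insight = on_insight
        self.count = 0      # only running aggregates are kept,
        self.total = 0.0    # the raw readings are not stored

    def submit(self, reading: float) -> None:
        self.count += 1
        self.total += reading
        if reading > THRESHOLD:
            self.on_insight(reading)  # insight is pushed out immediately

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0


# Pull: store everything, analyze later.
store = list(READINGS)
batch_alerts = pull_analysis(store)

# Push: analyze in motion, one tuple at a time.
stream_alerts: List[float] = []
pipeline = PushPipeline(stream_alerts.append)
for r in READINGS:
    pipeline.submit(r)
```

Both versions flag the same reading. The difference is when the insight becomes available and how much data has to be stored to obtain it.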
IBM Research papers and Patents related to IBM Streams
IBM Research page related to Streaming Analytics
Tab with list of over 120 publications
Earliest publication, 2004
• Interval query indexing for efficient stream processing
K. L. Wu, S. K. Chen, P. S. Yu
Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, pp. 88-97, 2004
– 2010 Patent review: Over 200 applications, over 40 approved
– 2017 Patent review: Over 60 new applications, over 140 approved
Source: IBM
Streaming in the Present: A very crowded market
CEP Vendors: Proprietary
1. 2000: Software AG Apama (acquisition 2013)
2. 2003: TIBCO StreamBase (acquisition 2013)
3. 2004: IBM ODM (merger of AptSoft and ILOG acquisition) – Decision Management, not CEP
4. 2004: TIBCO BusinessEvents
5. 2005: SAP Event Stream Processing (from Sybase EP, merger of Aleri & Coral8)
6. 2006: Oracle Event Processing
7. 2007: Informatica RulePoint
8. 2009: Microsoft StreamInsight
9. 2012: Fujitsu Big Data CEP Server
CEP Vendors: Open Source
1. 2006: Esper
2. 2008: Red Hat Drools Fusion
3. 2010: WSO2 CEP Server
Streaming Vendors: Proprietary
1. 2003: IBM Streams (commercial v1 2009)
2. 2006: Cisco Prime Analytics (Truviso, acquired 2012)
3. 2010: Hitachi uContinuous Stream Data Platform
4. 2010: Vitria Operational Intelligence
5. 2011: SQLstream
6. 2011: Evam Event and Action Manager
7. 2012: Striim (originally WebAction)
8. 2013: SAS Event Stream Processing
9. 2013: Amazon Kinesis Streams (in-memory store)
10. 2015: Microsoft Trill .NET
11. 2015: Microsoft Azure Stream Analytics
12. 2015: Unscrambl BRAIN
13. 2016: Amazon Kinesis Analytics (SQLstream OEM)
SQL Query Based
Inference Rule Based
Event Condition Action Rule Based
Programmatic Based
Neural Net Based
Streaming Vendors: Open Source
1. 2010: Yahoo S4
2. 2011: Apache Storm
3. 2011: Typesafe Reactive Platform (Akka, Scala)
4. 2013: Spring XD
5. 2013: Apache Samza
6. 2013: Apache Spark Streaming (microbatch)
7. 2014: DataTorrent Real-Time Streaming/Apache Apex
8. 2014: Apache Flink Streaming
9. 2014: Google MillWheel Framework
10. 2014: Cask Tigon
11. 2014: Apache NiFi
12. 2015: eBay Pulsar
13. 2015: Google Dataflow/Apache Beam
14. 2016: Apache Edgent
15. 2016: Twitter Heron
16. 2016: Apache Kafka Streams
17. 2017: Airbnb StreamAlert
SOURCES: Author Experience, Forrester, Bloor Research, Complex Events, Predictive Analytics Today
IBM Streams at a glance
Nearly 200 operators with 1300 functions
Hadoop
Data Warehouse
Communications Data Sources
TCP/IP
UDP/IP
HTTP
FTP
RSS
Messaging Toolkit (Kafka, XMS, IBM MQ, Apache ActiveMQ, RabbitMQ, MQTT, MQ Low Latency Messaging)
IBM DataStage
IBM Data Replication
Functions:
• Filter
• Enrich
• Normalize
• Windowed Aggregations
• Machine Learning
• Scoring (SPSS, R, SparkML, Python)
• CEP & Pattern Matching
• Geospatial
• Video/Image
• Text Analytics (AQL)
• Speech to Text
• Rules
IBM Streams Scale-out Runtime
Hadoop: HDFS, GPFS, Hive, HBase, BigSQL, Parquet, Thrift, Avro
RDBMS: IBM DB2, IBM DB2 Parallel Writer, IBM Informix, IBM BigInsights BigSQL, IBM Netezza, IBM Netezza NZLoad, solidDB, Oracle, Microsoft SQL Server, MySQL, Teradata, Aster, HP Vertica
NoSQL:
Key Value Stores (Memcached, Redis, Redis-Cluster, Aerospike)
Column Oriented Stores (Cassandra, HBase)
Document Oriented Stores (IBM Cloudant, MongoDB, Couchbase)
NoSQL
Application Development
Streams Processing Language
Visual or Text
Java
Scala
Python
Machine Learning
“The science of getting computers to act without being explicitly programmed”
“Systems that can learn from data”
Many categories of Machine Learning:
• Supervised, Unsupervised and Reinforcement Learning
• Decision Trees, Regressions, Classification, Clustering, Filtering, Associations
• Univariate, Multivariate
Streams Machine Learning
Unsupervised: Learn as you go in Streams
– Time Series toolkit has about 20 algorithms
• Continuous update of model and making of predictions
• Anomaly Detection, Classification, Regressions, Clustering, Filtering
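"Continuous update of model and making of predictions" can be sketched in plain Python as follows. This is not one of the Time Series toolkit algorithms. It is a generic running z-score detector based on Welford's online mean and variance update, and the class name, the threshold of 3 standard deviations, and the warm-up length are all assumptions made for illustration.

```python
import math


class OnlineAnomalyDetector:
    """Scores each value against the model so far, then updates the model."""

    def __init__(self, z_threshold: float = 3.0, warmup: int = 5) -> None:
        self.n = 0            # number of values seen
        self.mean = 0.0       # running mean
        self.m2 = 0.0         # running sum of squared deviations
        self.z_threshold = z_threshold  # assumed cut-off, in standard deviations
        self.warmup = warmup            # assumed minimum samples before scoring

    def process(self, x: float) -> bool:
        # 1. Predict: is x unusual given everything seen so far?
        is_anomaly = False
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.z_threshold:
                is_anomaly = True
        # 2. Learn: fold x into the model (Welford's update), no history stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return is_anomaly


detector = OnlineAnomalyDetector()
stream = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 25.0, 10.1]  # hypothetical data
flags = [detector.process(x) for x in stream]
```

Each call both scores the tuple and updates the model, so there is no separate training phase. Whether a flagged value should still be folded into the model is a design choice; this sketch always does so.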
Supervised: Learn offline and Score models in Streams
– PMML import: Classification, Clustering, Regression, Association
– SPSS import: all SPSS models, including data preparation
– Spark MLlib: Classification, Regression, Trees, Clustering, Filtering
– R scripts: Classification, Regression, Trees, Clustering, Filtering
– Python: Classification, Regression, Trees, Clustering, Filtering
Redeploy updated models without stopping Streams application
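The idea of redeploying an updated model without stopping the application can be sketched as a scoring step that holds a swappable reference to the current model. This is a simplified illustration in plain Python, not the mechanism IBM Streams uses. `ScoringOperator`, `redeploy`, and the two threshold "models" are hypothetical names invented for the example.

```python
import threading
from typing import Callable, List

Model = Callable[[float], str]  # a model maps an input value to a label


class ScoringOperator:
    """Scores tuples with a model that can be replaced while running."""

    def __init__(self, model: Model) -> None:
        self._model = model
        self._lock = threading.Lock()

    def redeploy(self, new_model: Model) -> None:
        # Swap the reference; tuples already being scored finish on the old model.
        with self._lock:
            self._model = new_model

    def score(self, x: float) -> str:
        with self._lock:
            model = self._model
        return model(x)


def model_v1(x: float) -> str:
    """Stand-in for a model trained offline (assumed threshold of 50)."""
    return "high" if x > 50 else "low"


def model_v2(x: float) -> str:
    """Stand-in for a retrained model (assumed threshold of 30)."""
    return "high" if x > 30 else "low"


op = ScoringOperator(model_v1)
results: List[str] = []
for i, x in enumerate([20.0, 40.0, 60.0, 40.0]):
    if i == 3:
        op.redeploy(model_v2)  # updated model arrives mid-stream
    results.append(op.score(x))
```

The value 40.0 is scored "low" before the swap and "high" after it, and the loop that feeds tuples never stops.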
adopts IBM Streams
Personal Weather Stations
World’s largest PWS network: 250k+ worldwide
Doubling annually since 2015
SOURCE: weather.com
University of Ontario Institute of Technology
Detecting illness up to 24 hours earlier
Source: YouTube
Verizon uses IBM Streams to deliver Cognitive Customer Care
Speech to Text: Listens side by side to agent-customer conversation
Intent Detection: Comprehends the discussion and classifies the intent
Scoring & Next Best Action: Identifies proactive and reactive relevant content
Contextual Assist: Delivers cognitive agent assist
Source: ibm.com case studies
Areas to consider for value
Functionality required
Reduced hardware footprint
Developer & Admin productivity
Agility to quickly react to new data
Savings sooner via faster development
High Availability/limited downtime
Comparable software prices
New releases of software
Smarter business insights
Vendor Sales Team
Vendor Tech Sales Team
Vendor Quality
Vendor Software Development processes
Vendor Research
Breadth of Vendor offerings
Worldwide or local support
Flexible software
Legal
Governance
Open Source
Patents
Security
Developer availability
Community
Tangible Benefits
Intangible Benefits
Reduced Risk
Streaming in the near future
One technology becomes winner-take-all (2% odds)
Half the current vendors/technologies drop out in 5 years (20% odds)
Half the current vendors/technologies drop out in 10 years (50% odds)
Apache Beam becomes a uniting development API (20% odds)
Sophisticated, cognitive apps with dozens of data sources become pervasive within 5 years (40% odds)
Data Volumes, Varieties and Velocities will continue to grow (100% odds)
Streaming Analytics outpaces traditional Hadoop/Spark market (70% odds)
These opinions are from the author, Roger Rea, and do not necessarily represent IBM
Streaming in the Far Future
Foundation by Isaac Asimov
Mathematician Hari Seldon
Mathematics known as psychohistory
Predict the future, at large scale
Source: Wikipedia
Thank you