Continuous Intelligence: Staying Ahead with Streaming Analytics
Transcript of Continuous Intelligence: Staying Ahead with Streaming Analytics
The Briefing Room
Twitter Tag: #briefr
Mission
• Reveal the essential characteristics of enterprise software, good and bad
• Provide a forum for detailed analysis of today’s innovative technologies
• Give vendors a chance to explain their product to savvy analysts
• Allow audience members to pose serious questions... and get answers!
MARCH: Operational Intelligence
April: INTELLIGENCE
May: INTEGRATION
June: DATABASE
Operational Intelligence
Processing Monitoring Alerts/triggers/actions
REAL-TIME…
Analyst: Mark Madsen
Mark Madsen is president of Third Nature, Inc.
SQLstream
• SQLstream is an enterprise software company focused on making businesses responsive to real-time big data assets
• Its platform provides a relational stream for analyzing large volumes of service, sensor, and machine and log file data
• SQL queries in SQLstream generate results continuously as data becomes available
Damian Black
Damian Black is the founder and CEO of SQLstream, a pioneer in Streaming Big Data. Damian has worked for almost two decades in Silicon Valley, with senior roles in a variety of companies including Hewlett-Packard, Neustar, Xacct Technologies and Followap. He has spoken at many conferences, and was on GigaOM’s first Big Data panel in 2008. Damian graduated from Manchester University and was one of the first research scientists to join HPLabs Europe. He was selected for the International Management Challenge in conjunction with the Financial Times and Ashridge business school while at Hewlett-Packard. Damian is the author of eleven granted patents with five more pending.
Copyright © SQLstream Inc.
BIG DATA ON TAP™
Continuous Intelligence: Staying Ahead with Streaming Log File Analytics
March 2013, Damian Black, CEO, SQLstream
| 10 Copyright © 2013 | +1 877 571 5775 | [email protected]
Machine-Generated Big Data Explosion
High volume, high velocity, structured and unstructured data from software platforms, applications and systems
GPS
Telematics
IP Networks, Video
Servers, Social Media, Security
Servers, Applications, Storage Networks
Machine-generated data will increase to 42% of all data by 2020, up from 11% in 2005.
“The Digital Universe in 2020” IDC
OPERATIONAL INTELLIGENCE: Bridging the Chasm Between Analytics and Operations
Business Applications
➔ Transactions
➔ Everyday business
Business Intelligence
➔ Post-hoc analysis
➔ Data warehousing
➔ Strategic direction
Operational Intelligence
➔ Predictive analytics
➔ Automated actions
➔ Ops optimization
➔ Tactical execution
Diagram labels: TRANSACTIONS, STRUCTURED DATA, UNSTRUCTURED DATA
VELOCITY, VOLUME, VARIETY, VISUAL, VALUE
Real-time, continuous | Real-time, continuous | Historical, periodic
MACHINE DATA TO OPERATIONAL INTELLIGENCE
PROACTIVE
REACTIVE
REAL-TIME WEB SERVER LOG MONITORING: Mozilla (Google: “YouTube Mozilla Glow”)
Real-time monitoring across all download web servers across the world simultaneously.
Collect
➔ Remote agents transform log files into real-time streams
Analyze
➔ Real-time analysis & aggregation by location
Share
➔ Continuous ETL into Hadoop HBase
➔ Internet ‘Glow’ app for real-time visualization
Web Server Log Files (Remote)
Hadoop HBase
Streaming collection, real-time analysis and continuous integration by location
REAL-TIME WEB SERVER LOG MONITORING: Mozilla (Google: “YouTube Mozilla Glow”)
Diagram labels: parallel parse stages, Filter off Bad recs, Merge
Parse Logs
Add Location
Filter out Bots
Analyze Errors
Streaming Analytics
HBase
Streaming Visualization
Historical Charts
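The stages named on this slide (parse logs, filter off bad records, add location, filter out bots, analyze errors) can be sketched as a chain of Python generators. This is only an illustration under stated assumptions, not SQLstream's implementation: the simplified log line format, the `LOCATIONS` prefix table, and every function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical IP-prefix-to-location table; a real system would use a GeoIP database.
LOCATIONS = {"128.56.": "San Francisco", "40.50.": "Seattle"}

# Assumed, simplified log line: <ip> <status> "<user agent>"
LINE = re.compile(r'^(?P<ip>\S+) (?P<status>\d{3}) "(?P<agent>[^"]*)"$')

def parse(lines):
    """Parse raw lines and filter off bad records that do not match."""
    for line in lines:
        m = LINE.match(line.strip())
        if m:
            yield {"ip": m["ip"], "status": int(m["status"]), "agent": m["agent"]}

def add_location(records):
    """Attach a location looked up from the IP prefix."""
    for rec in records:
        rec["location"] = next(
            (loc for prefix, loc in LOCATIONS.items() if rec["ip"].startswith(prefix)),
            "unknown",
        )
        yield rec

def filter_bots(records):
    """Drop records whose user agent looks like a crawler."""
    for rec in records:
        if "bot" not in rec["agent"].lower():
            yield rec

def errors_by_location(records):
    """Count server errors (HTTP 5xx) per location."""
    counts = Counter()
    for rec in records:
        if rec["status"] >= 500:
            counts[rec["location"]] += 1
    return counts

def run_pipeline(lines):
    return errors_by_location(filter_bots(add_location(parse(lines))))

SAMPLE = [
    '128.56.0.253 200 "Mozilla/5.0"',
    '128.56.0.9 503 "Mozilla/5.0"',
    '40.50.245.60 500 "Googlebot/2.1"',
    '40.50.245.61 502 "Mozilla/5.0"',
    'garbage line',
]
```

Because each stage is a generator, a record flows through all stages before the next line is read, which is the property the streaming diagram emphasizes.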
Mozilla Firefox 4 – Real-time Download Monitor
Continuous processing of download requests
Real-time integration with Hadoop and HBase
MACHINE DATA: Where is the intelligence?
Transaction Log Details
TRANS,2013-02-17-15:30:22,3458783,2347897953,128.56.0.253,STATUS:-15, DE69975, 4157588342
Twitter
{"created_at:Thu Feb 17 15:30:55 +0000 2013,id:304612775055998976,id_str:304612775055998976,text:@MyServiceProvider today sucks, keeps dropped!,source:u006ca href=http:www.url.com rel=nofollow,followers_count:147,friends_count:10142, location: San Francisco, time_zone: Pacific, geo_enabled:true, location:u00dcT: -6.1987552,106.8661953, screen_name:APerson
Smartphone GPS Updates
<id>1597831220</id><deviceid>0198873465</deviceid><lat>lat=47.643957</lat><lon>lon= -122.3269</lon><time>2013-02-17T15:37:26Z</time><bearing>223.4535</bearing>
<id>1597865781</id><deviceid>0198873465</deviceid><lat>lat=47.645982</lat><lon>lon=-122.327500</lon><time>2013-02-17T15:37:26Z</time><bearing>200.6138</bearing>
<id>1597940125</id><deviceid>0198873465</deviceid><lat>lat=47.647381</lat><lon>lon=-122.326501</lon><time>2013-02-17T15:37:26Z</time><bearing>87.4357</bearing>
Web Server Logs
[Sun Feb 17 15:30:49 2013] [notice] srv-sfo-08 caught SIGTERM, shutting down
[Sun Feb 17 15:30:49 2013] [notice] Apache/2.2.21 -- resuming normal operations
CDR Records
TERMINATE,ctl09gsx,01299796304,GMT-08:00,02-17-13,15:21:00,9,387,64ms,02-17-13,15:30:55,0005, IP-TO-IP,4157588342,8775715775,1,0,4157588342,RD_AXY_NN0_001,SFR01AAG34,40.50.245.60, 234.234.60.75,65678,411,399,SIP,SANFRANCISCO,0x4B1698,0x0005E,0x49768,4157588342,0198873465
Fields highlighted across the records: Timestamp (in all five sources), Mobile #, Customer, Server, Device ID, Term Reason, Location, Service Provider, Fail Code
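Every record type on this slide carries a timestamp, but each in a different layout. A minimal sketch of normalizing them to one representation follows; the regular expressions and format strings are assumptions inferred from the sample records above, the function and source names are hypothetical, and the CDR time zone field (GMT-08:00) is ignored for brevity.

```python
import re
from datetime import datetime

# (source name, regex locating the timestamp, strptime format).
# All patterns are assumptions inferred from the sample records on the slide.
FORMATS = [
    ("transaction", re.compile(r"^TRANS,([\d:-]+),"), "%Y-%m-%d-%H:%M:%S"),
    ("web_server", re.compile(r"^\[(\w{3} \w{3} \d+ [\d:]+ \d{4})\]"), "%a %b %d %H:%M:%S %Y"),
    ("gps", re.compile(r"<time>([^<]+)</time>"), "%Y-%m-%dT%H:%M:%SZ"),
    # Skips the three fields after TERMINATE and captures the first date,time pair.
    ("cdr", re.compile(r"^TERMINATE,(?:[^,]*,){3}(\d\d-\d\d-\d\d,\d\d:\d\d:\d\d),"), "%m-%d-%y,%H:%M:%S"),
]

def extract_timestamp(record):
    """Return (source, datetime) for the first matching format, or None."""
    for source, pattern, fmt in FORMATS:
        match = pattern.search(record)
        if match:
            return source, datetime.strptime(match.group(1), fmt)
    return None
```

With timestamps in a common form, records from different sources can be ordered and correlated on the same time axis, which is the precondition for joining them on shared fields such as Mobile # or Device ID.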
OPERATIONAL STREAMING BIG DATA – PAIN POINTS
DATA EXPLOSION
COMPLEXITY
BUSINESS AGILITY
• Too difficult to build & maintain real-time apps. SQLstream eliminates your development risk.
• Too costly to analyse voluminous real-time data. SQLstream slashes TCO for real-time analysis.
• Too slow to respond to new requirements. SQLstream allows you to add new apps easily.
CONTINUOUS OPERATIONAL INTELLIGENCE
Logs, Sensors, GPS, Networks, Social media, RFIDs, Servers
Telecom, Smart grid, Oil & Gas, Manufacturing, Logistics, M2M, Telematics, Retail, Internet, Banking, Data centers, Automotive
Enhance with historical information
Store detail and aggregate data
Real-time alerts, action and visualization
• Collect, transform and deliver: ETL++
• Analyze unstructured data & enhance
• Predictive analytics & actions
MOVING FROM HIGH LATENCY TO REAL-TIME RESPONSIVENESS
COLLECT, CLEANSE, ENRICH, ANALYZE, SHARE
HIGH LATENCY
Traditional approach leads to high latency
LOW LATENCY
SQLstream streaming approach:
» Continuous Parallel Dataflow Execution
» Generate real-time answers immediately
» Deliver and share the results immediately
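The difference between the two approaches can be illustrated with a generator pipeline in which each record passes through collect, cleanse, enrich, analyze and share before the next record is read. This is a conceptual sketch only: it is single-threaded and does not model the parallel execution the slide mentions, and the record layout (`device,value`), the `SITES` lookup table and all function names are assumptions.

```python
def collect(source):
    """Read raw records one at a time from any iterable source."""
    for raw in source:
        yield raw

def cleanse(records):
    """Strip whitespace and drop empty records."""
    for raw in records:
        raw = raw.strip()
        if raw:
            yield raw

def enrich(records, lookup):
    """Split 'device,value' and attach the site for the device."""
    for rec in records:
        device, value = rec.split(",")
        yield {"device": device, "value": float(value), "site": lookup.get(device, "unknown")}

def analyze(records):
    """Emit an updated running average per site after every record."""
    totals = {}
    for rec in records:
        count, total = totals.get(rec["site"], (0, 0.0))
        count, total = count + 1, total + rec["value"]
        totals[rec["site"]] = (count, total)
        yield rec["site"], total / count

def share(results, sink):
    """Deliver each result to the sink as soon as it is produced."""
    for result in results:
        sink.append(result)
        yield result

# Hypothetical device-to-site table and input records.
SITES = {"d1": "north", "d2": "south"}
sink = []
stream = share(analyze(enrich(cleanse(collect(["d1,10", "  ", "d2,4", "d1,20"])), SITES)), sink)

first = next(stream)                  # an answer is available after one record
delivered_after_first = list(sink)    # the sink already holds that answer
remaining = list(stream)              # drain the rest of the stream
```

A batch design would run each stage to completion over the whole data set before starting the next one, so the first answer would appear only after the last record had been collected.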
SQLSTREAM DATAFLOW TECHNOLOGY: Pipelining and Superscalar Parallel Processing
Fine-grained parallelism: simple, massively scalable, super fast.
Query Processor =
SHARE STREAMING BIG DATA
Streaming SQL Views
Use SQLstream and ISO/ANSI standard SQL
» Proven performance, optimization and scalability
» Rapid app development with familiar language
» Leverage existing SQL skills & investment
GENERATES THE STREAM OF NEW YORK ORDERS SHIPPING WITHIN A SERVICE LEVEL OF 1hr
CREATE VIEW compliant_orders AS
SELECT STREAM *
FROM orders OVER sla
JOIN shipments ON orders.id = shipments.orderid
WHERE city = 'New York'
WINDOW sla AS (RANGE INTERVAL '1' HOUR PRECEDING)
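To make the windowed join concrete, here is a rough Python approximation of what the view computes: each shipment is matched against New York orders seen during the preceding hour. It is a sketch under assumptions, not SQLstream's execution model; the event tuples, the `compliant_orders` generator and the inclusive window boundary are all choices made for illustration.

```python
from collections import deque
from datetime import datetime, timedelta

SLA = timedelta(hours=1)

def compliant_orders(events, city="New York", sla=SLA):
    """Join shipments to orders placed within the preceding `sla` interval.

    `events` is a time-ordered iterable of tuples (hypothetical layout):
      ("order", timestamp, order_id, city) or ("shipment", timestamp, order_id)
    """
    window = deque()  # (timestamp, order_id) of in-window orders for `city`
    for event in events:
        kind, ts = event[0], event[1]
        # Expire orders that have fallen out of the sliding window.
        while window and ts - window[0][0] > sla:
            window.popleft()
        if kind == "order":
            _, _, order_id, order_city = event
            if order_city == city:
                window.append((ts, order_id))
        else:
            order_id = event[2]
            for order_ts, windowed_id in window:
                if windowed_id == order_id:
                    yield order_id, order_ts, ts

t0 = datetime(2013, 2, 17, 15, 0)
EVENTS = [
    ("order", t0, 1, "New York"),
    ("order", t0 + timedelta(minutes=10), 2, "Boston"),
    ("order", t0 + timedelta(minutes=20), 3, "New York"),
    ("shipment", t0 + timedelta(minutes=30), 1),   # within the hour: compliant
    ("shipment", t0 + timedelta(minutes=40), 2),   # wrong city: no match
    ("shipment", t0 + timedelta(minutes=90), 3),   # 70 minutes late: no match
]
```

Only a bounded window of orders is held in memory, which is what allows this kind of join to run continuously over an unbounded stream.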
A STREAMING SQL QUERY: Cloud Infrastructure Monitoring with Bollinger Bands
BUSINESS NEED: Detect run-away applications before resource consumption becomes an issue.
SELECT STREAM ROWTIME, url, numErrorsLastMinute
FROM (
  SELECT STREAM ROWTIME, url, numErrorsLastMinute,
    AVG(numErrorsLastMinute) OVER lastMinute AS avgErrorsPerMinute,
    STDDEV(numErrorsLastMinute) OVER lastMinute AS stdDevErrorsPerMinute
  FROM ServiceRequestsPerMinute
  WINDOW lastMinute AS (PARTITION BY url RANGE INTERVAL '1' MINUTE PRECEDING)
) AS S
WHERE S.numErrorsLastMinute > S.avgErrorsPerMinute + 2 * S.stdDevErrorsPerMinute;
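The same upper-band test can be sketched in Python to show the arithmetic. Several assumptions differ from the query: the window here holds the last `window` samples per URL rather than a one-minute time range, the population standard deviation is used, and the function and parameter names are hypothetical.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

def bollinger_alerts(samples, window=10, k=2.0):
    """Yield (url, errors, upper_band) when errors exceed mean + k * stddev.

    `samples` is an iterable of (url, error_count) pairs in arrival order.
    The current sample is included in its own window, as in the query.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    for url, errors in samples:
        recent = history[url]          # per-URL window (PARTITION BY url)
        recent.append(errors)
        upper_band = mean(recent) + k * pstdev(recent)
        if errors > upper_band:
            yield url, errors, upper_band

# Hypothetical error counts: "/a" is steady and then spikes, "/b" stays flat.
SAMPLES = [("/a", v) for v in [2, 3, 2, 3, 2, 3, 2, 3, 2, 30]]
SAMPLES += [("/b", 5)] * 10
```

Because the current value is part of its own window, a spike also raises the band it is compared against. With fewer than six samples in the window, a single outlier can never exceed the mean by two population standard deviations, so the window size matters in practice.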
THE REAL-TIME DATA MANAGEMENT HEADACHE
TIME, MONEY, COMPLEXITY
Business Intelligence: Hadoop HBase & Data Warehouses
Supply Chain & ERP
Operations & Management
Finance & Accounting
CRM & Billing
THE REAL-TIME DATA MANAGEMENT SOLUTION
STREAMING ANALYTICS AND AGGREGATION
STREAMING EVENT CORRELATION
STREAMING ALERTS & ALARMS
CONTINUOUS ETL
Business Intelligence: Hadoop HBase & Data Warehouses
Supply Chain & ERP
Operations & Management
Finance & Accounting
CRM & Billing
SQLSTREAM STANDARD INTEGRATION ADAPTERS
DATABASES: Core Database Adapter (Table Reader, Table Update, Table Lookup (any JDBC))
BIG DATA: Hadoop (+ HDFS + HBase), BigQuery
MACHINE DATA: Log Files (+ Remote Agent + FileWriter + FileReader), Sockets (+ TCP + UDP), Web Feeds (+ Twitter + RSS + ATOM etc), GATE, Email
Middleware: JDBC + JMS + log4j
XML: XML Parse + XPath
Semantic Streaming
STORM
STREAMING VISUALIZATION
REAL-TIME OPERATIONAL INTELLIGENCE: MARKET COMPARISON
ENTERPRISE REQUIREMENT | OPERATIONAL INTELLIGENCE WITH OTHERS | OPERATIONAL INTELLIGENCE WITH SQLSTREAM
Time Series Analytics | Simplistic answers without time series. | Comprehensive time series support.
Complex Analysis | Simple pattern matching and statistics. | Elegantly solves hardest problems.
Join & Correlate | Does not combine or join streams. | Joins data streams in real-time.
Enrich & Integrate | Does not enrich or integrate data. | Gives rich answers in real-time.
Big Data Scalability | No parallel processing; limited scalability. | Massively parallel, auto-optimizing.
Painless TCO | Very expensive, proprietary, with only basic visualization. | Low TCO, ANSI/ISO standard queries, rich real-time visualization.
SQLSTREAM: BIG DATA ON TAP™, delivered
DATA EXPLOSION
COMPLEXITY
BUSINESS AGILITY
Eliminating the development risk
• Fine-grained parallel processing: simple, scalable and fast.
Slashing TCO for real-time analysis
• Scales easily without transaction bottlenecks.
Adding new apps easily
• Shares dynamic results and data across the organization.
OPERATIONAL INTELLIGENCE - BEYOND IT
ENVIRONMENTAL: Environmental Monitoring, Smart Grid
TRANSPORTATION: Location-based services, Cars as Sensors
NETWORKS: Machine-to-Machine, Logistics
QUESTIONS
About the Presenter
Mark Madsen is president of Third Nature, a technology research and consulting firm focused on business intelligence, data integration and data management. Mark is an award-winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institute. He is an international speaker, a contributor at Forbes Online and Information Management. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net
Continuous Intelligence: Staying Ahead with Streaming Analytics
March 12, 2013 Mark Madsen www.ThirdNature.net @markmadsen
The “E” in EDW was a lie…
Transactions vs. Events
Transactions:
▪ Each one is valuable
▪ The elements of a transaction can be aggregated easily
▪ A set of transactions does not usually have important ordering or dependency
Events:
▪ A single event often has no value, e.g. what is the value of one click or one temperature reading in a series?
▪ Some events are extremely valuable, but this is only detectable within the context of other events.
▪ Elements of events are often not easily aggregated
▪ A set of events usually has a natural order and dependencies
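A small illustration of the point about order and context: the two reading series below contain the same values and have the same sum, yet only one of them contains a sustained rise. The detector, its name and the sample readings are hypothetical and serve only to show that an order-insensitive aggregate cannot surface this kind of event.

```python
def rising_runs(readings, run_length=3):
    """Yield the index at which `run_length` consecutive increases complete."""
    streak = 0
    previous = None
    for index, value in enumerate(readings):
        if previous is not None and value > previous:
            streak += 1
        else:
            streak = 0
        if streak >= run_length:
            yield index
        previous = value

# Same temperature readings in two different arrival orders.
ORDERED = [20, 21, 22, 23, 19]
SHUFFLED = [23, 19, 21, 20, 22]
```

Summing either list gives the same total, while `rising_runs` reports a rise only for the first ordering. No single reading is interesting on its own; the signal lives in the sequence.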
General model for organizational use of data
Collect new data
Monitor Analyze Exceptions
Analyze Causes Decide Act
No problem No idea Do nothing
Act on the process Usually days/longer timeframe
Act within the process Usually real-time to daily
You need to be able to support both paths
Collect new data
Monitor Analyze Exceptions
Analyze Causes Decide Act
Act on the process
Act within the process
Streaming technologies
Analytics and BI
Different Usage Model Than Conventional BI
A) Monitoring and detection is not reporting and dashboards. Self-service BI doesn’t do it.
B) Lots of data, decreasing in value as the events recede in time.
C) Analytics often required to surface meaningful events, which requires collection and processing of (B) in real time to deliver (A).
D) Actuation: machine managed, human mediated.
The future is not data to eyeballs, it’s machines to machines.
Measurement started with the convenient data
The convenient data is transactional data.
▪ Goes in the DW and is used, even if it isn’t the right measurement.
The inconvenient data is observational data.
▪ It’s not neat, clean, or designed into most systems of operation.
We need to build infrastructure that manages and enables use of data at rest and data in motion.
Bridge the data warehouse to other uses: SOA, not SQL
New technologies are needed to extend current capability. http://flickr.com/photos/higaara/228673603/
Questions
1. Queues and streams process messages and objects. How is that made SQL compatible?
2. Why SQL when the standard is missing temporal constructs for this?
3. How do you use a single SQL statement across multiple streams (i.e., scale out the query)?
4. How much work is human-monitored, vs. human notified, vs. machine actuated? How big is this problem, really?
Questions
5. What about playback? How do you replay history to trace an event?
6. What tooling is required? Is it possible to add stream monitoring and use existing BI tools, or do we need new end user tools?
7. Linking the in-motion to the stationary, what are the mechanisms?
About Third Nature
Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure then you’re at the right place.
Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors.
We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.
Upcoming Topics
April: INTELLIGENCE
May: INTEGRATION
June: DATABASE
www.insideanalysis.com
Thank You for Your Attention
Certain images and/or photos in this presentation are the copyrighted property of 123RF Limited, their Contributors or Licensed Partners and are being used with permission under license. These images and/or photos may not be copied or downloaded without permission from 123RF Limited.