SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends
Uploaded by: dataversity
Category: Data & Analytics
Transcript of SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends
What is Bottlenose For?
Bottlenose discovers the threats and opportunities that impact your business
Bottlenose does this using patented stream intelligence technology
Key Stream Intelligence Use-Cases
Threats
• Risk detection
• Crisis mitigation
• Competitive threats
• Reputational threats
• Cyber threat detection
Opportunities
• Audience and customer insights
• Innovation and research
• New business and market opportunities
• Competitive intelligence
• Product and marketing intelligence
Vision
Stream Intelligence
Our mission is to build the leading business intelligence company for stream data.
Stream data is the fastest-growing segment of data. It includes all types of live or historical, unstructured or structured, time-stamped data, such as: email and messaging data, social media, mobile data, news, IT log data, CRM data, support data, sales data, Web and app analytics data, financial data, sensor and device data.
We have built the first unified platform and application for automating the discovery of actionable intelligence across any stream data source. We call this stream intelligence.
“... the future belongs to raw unstructured or semi-structured data from both internal and external sources - increasingly delivered in (near) real-time. This data has great value yet most organizations do not have the tech infrastructure to handle all this data.” - IDC
Problem: Massive growth of unstructured data cannot be managed effectively with existing tech infrastructure
Real-Time Discovery Against Streaming Data Is Required
● There are never going to be enough data scientists or analysts to cope with the rise of unstructured stream data in the enterprise
● Analysts need automated stream intelligence tools to help them deal with the volume, velocity and variety of stream data
Analysts Are Drowning in Streams
Solution: Bottlenose Automates Stream Intelligence
• Bottlenose provides the most advanced stream intelligence, automatically finding patterns such as trends, anomalies, threats, opportunities and correlations in stream data
• Bottlenose is easy to use and delivers value right away, without extensive engineering and IT involvement or lengthy professional services engagements
• The platform combines internal enterprise data with external data from social, broadcast, web and other sources
We Are in the Stream Intelligence Sweet Spot
The Bottlenose solution is a new generation of tools that automates the production of actionable intelligence from stream data
[Diagram labels: Variety, Velocity, Volume; "& ELK Stack"]
Stream Intelligence is “BI 3.0”
(Source: HBR)
Bottlenose Platform
Generate Actionable Intelligence from ANY Stream Data
• Social & traditional media (social networks, blogs, forums, newswires)
• 98% of all live TV & radio broadcasts
• Enterprise data (sales, financials, Web analytics, IT systems, email, internal databases, etc.)
• Web data, commercial data sources, financial market data sources, public data sources
• Machine and sensor data (Internet-of-things, machine data, weather data, etc.)
Stream Intelligence Pipeline
• Applications
• Rules & Agents: Alerts/Actions Based on Business Interests
• Stream Data Storage & API: Long-term storage, real-time access, search & APIs
• Trend Detection: Extrapolation, Correlation/Clustering
• Data Mining & Analytics: ~30 Entity Types and ~150 Metrics
• Ingestion & Enrichment: Push/Pull of Unstructured/Structured Data
[Diagram flow labels: Data in Motion, Entities & Metrics, New Patterns, Alerts & Actions]
Automatically Discover Threats and Opportunities
[Diagram labels:]
Focal Interest
Known Issues
Unknown/Emerging Issues
Breaking News
Customer Problems
Enterprise Risk Factor
Fraud Risk
Product Recall
Competitive Threat
Cyber Attack
Power Outage
Natural Disaster
Traffic Congestion
Device Failure
Financial Trading Anomaly
Reputation Risk
Intellectual Property Violation
Continuous High-Volume Stream Analytics
• 3 billion live + historical messages analyzed every hour
• 72 billion records analyzed per day + predictive analytics on 7.2 billion
• 67,000,000 new messages ingested every day
• Trend detection at a rate of 1 million events per second
• 30 entity types recognized × 150 metrics per entity × tens of millions of entities = ~50 to 100 billion time series monitored and analyzed continuously
• Growing to 200 terabytes of data stored & analyzed continuously in 2015
1000s of High-Level Detected Trends Per Hour
• Automated data science layer applies machine learning, statistics and predictive analytics to correlate, cluster, predict and analyze emergent trends
We See the Near Future Before Anyone Else
• 80% of the time, our system detects breaking news and emerging threats, opportunities and keywords tens to hundreds of minutes ahead of the media, Twitter, ad networks, etc. Similar advantages apply against non-text data sources
Key Metrics
Bottlenose analyzes 72 billion data records every day
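The headline figures on this slide can be cross-checked with plain arithmetic. The sketch below is only a back-of-the-envelope check, not anything from the Bottlenose platform: the function names are made up for illustration, and the entity counts of 10 and 20 million are assumed stand-ins for the slide's "tens of millions".

```python
# Back-of-the-envelope check of the slide's figures (illustrative only).

SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(records_per_hour: int) -> int:
    """Scale an hourly analysis rate to a full day."""
    return records_per_hour * 24


def time_series_count(entity_types: int, metrics: int, entities: int) -> int:
    """Multiply out the slide's formula for monitored time series."""
    return entity_types * metrics * entities


# "3 billion ... messages analyzed every hour"
daily = records_per_day(3_000_000_000)

# Average per-second rate implied by the daily figure.
avg_per_second = daily / SECONDS_PER_DAY

# Assumed entity counts standing in for "tens of millions of entities".
low = time_series_count(30, 150, 10_000_000)
high = time_series_count(30, 150, 20_000_000)

print(f"{daily:,} records/day, about {avg_per_second:,.0f} per second")
print(f"{low:,} to {high:,} time series")
```

Under these assumptions the hourly rate multiplies out to the quoted 72 billion per day, and the time-series formula gives 45 to 90 billion, which is in line with the slide's "~50 to 100 billion".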
Customer-Facing Products
● Analytics, intelligence, and discovery engine
  ○ Nerve Center
  ○ Full-stack offering
● Streaming data services to applications
  ○ Bottlenose API (Platform)
‣ Advanced filtering & aggregations using simple OLAP interface
‣ “Interactive Analytics” thanks to sub-second query response time
‣ Add new data sources using central mapping system
Analytics Engine
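To make "filtering & aggregations" concrete, here is a minimal sketch of an OLAP-style query: filter records, group by one dimension, and sum a metric. It is not the Bottlenose API. The `aggregate` function, the field names and the sample records are all hypothetical and exist only for illustration.

```python
from collections import defaultdict


def aggregate(records, group_by, metric, where=None):
    """Sum `metric` per distinct value of `group_by`, after optional filtering."""
    totals = defaultdict(float)
    for rec in records:
        if where is not None and not where(rec):
            continue  # record rejected by the filter
        totals[rec[group_by]] += rec[metric]
    return dict(totals)


# Made-up sample records; the fields are assumptions, not a real schema.
records = [
    {"source": "twitter", "entity": "pepsi", "mentions": 120},
    {"source": "news", "entity": "pepsi", "mentions": 15},
    {"source": "twitter", "entity": "coke", "mentions": 80},
    {"source": "twitter", "entity": "pepsi", "mentions": 30},
]

# Filter on one dimension (entity), aggregate along another (source).
by_source = aggregate(
    records, "source", "mentions", where=lambda r: r["entity"] == "pepsi"
)
print(by_source)
```

A production engine would push this work into an indexed store to reach sub-second response times; the loop above only shows the shape of the query.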
A sophisticated semantic approach is required to make sense of the raw data. The structure of the data can be derived from the entities and dimensions for which the system has a pattern. Machine learning techniques can then make inferences and match incoming data to known profiles as it flows in.
One of the most powerful capabilities is comparing different data sources. A system like ours automatically normalizes them. For example, when the sources have different time granularities, we automatically align the time periods in order to find overlaps.
Of course, the semantic engine can also be adjusted with a vertical industry's unique facts, relationships, and jargon.
Need for a Radical New Form of Information Retrieval: Semantic meaning, bottom-up from raw data
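The time-alignment step described on this slide can be sketched as follows. This is an assumed, simplified approach (summing into hourly buckets and intersecting the bucket keys), not the platform's actual normalization logic; `to_hourly`, `overlap` and the sample series are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def to_hourly(points):
    """Sum (timestamp, value) pairs into hour-sized buckets."""
    buckets = defaultdict(float)
    for ts, value in points:
        buckets[ts.replace(minute=0, second=0, microsecond=0)] += value
    return dict(buckets)


def overlap(series_a, series_b):
    """Return the buckets present in both series as {bucket: (a, b)}."""
    shared = sorted(series_a.keys() & series_b.keys())
    return {bucket: (series_a[bucket], series_b[bucket]) for bucket in shared}


# Made-up data: one source reports per minute, the other per hour.
minute_level = [
    (datetime(2015, 1, 1, 9, 5), 2.0),
    (datetime(2015, 1, 1, 9, 40), 3.0),
    (datetime(2015, 1, 1, 10, 15), 4.0),
]
hour_level = [
    (datetime(2015, 1, 1, 10, 0), 7.0),
    (datetime(2015, 1, 1, 11, 0), 1.0),
]

aligned = overlap(to_hourly(minute_level), to_hourly(hour_level))
print(aligned)
```

Once both sources share a bucket size, only the 10:00 hour appears in both, so that is the single period where the two streams can be compared directly.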
Application - Nerve Center
Our application provides a powerful suite of tools to find business insights in streaming data:
● Monitor: Real-time monitoring with powerful live visualizations.
● Analyze: Fast interactive analytics to dig deep into the data.
● Discover: Automated insight discovery. Get notified when new patterns are detected.
● Customize: Reports and live dashboards can be created for any vertical by mixing and matching insights & visualizations across any combination of data streams.
Processing Layers
• Augmentation Engine
• Analytics Engine
• Detection Engine
• Correlations Engine
• Rules & Agent Engine
(Diagram axis: Depth of Insight)
‣ Typical topic stream like “Beyonce” (Pepsi)
‣ 4M new events (data records) per month
‣ ~8M unique entities tracked per month
‣ ~8M unique entities x 150 metrics x many time buckets = A lot of data points
‣ And this is just 1 stream. We have thousands of these running at all times...
Data Points
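To put a number on "a lot of data points", the multiplication on this slide can be worked through once. The slide does not say how many time buckets are kept, so the hourly buckets over a 30-day month used below are purely an assumption, and `data_points` is a hypothetical helper.

```python
def data_points(entities: int, metrics: int, buckets: int) -> int:
    """Entities x metrics x time buckets, as on the slide."""
    return entities * metrics * buckets


ENTITIES = 8_000_000   # "~8M unique entities tracked per month"
METRICS = 150          # metrics per entity

# Assumption: hourly buckets over a 30-day month (not stated on the slide).
HOURLY_BUCKETS = 30 * 24

series = data_points(ENTITIES, METRICS, 1)
points = data_points(ENTITIES, METRICS, HOURLY_BUCKETS)

print(f"{series:,} time series, {points:,} data points per month")
```

Under that assumption a single stream carries 1.2 billion time series and 864 billion hourly data points per month, before multiplying by the thousands of streams mentioned above.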
‣ Systematically walk through all data points.
‣ Continuous stream of categorized signals. Searchable.
Detection Engine
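As an illustration of walking through data points and emitting categorized signals, here is a minimal sketch that uses a trailing-window z-score. The slides do not disclose the detection algorithms, so the z-score technique, the `detect` function, the window size and the threshold are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def detect(series, window=5, threshold=3.0):
    """Return (index, category, z) for points far from the trailing window mean."""
    signals = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), stdev(history)
        if sigma == 0:
            continue  # flat history: a z-score is undefined, so skip
        z = (series[i] - mu) / sigma
        if abs(z) >= threshold:
            category = "spike" if z > 0 else "drop"
            signals.append((i, category, round(z, 2)))
    return signals


# Made-up hourly mention counts with one obvious burst at index 6.
counts = [10, 12, 11, 13, 12, 11, 40, 12, 11]
signals = detect(counts)
print(signals)
```

Each signal carries a position and a category, which is the minimum needed to make a continuous stream of detections searchable later.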
Detection Engine
• Anticipate: Python servers for trend detection & extrapolation
• Detector: Workers that continuously aggregate entities and fetch corresponding metrics
• Context Gathering: Finding additional meta-data around detections
• Clustering: Rolling clustering of trends based on overlapping meta-data and a variety of distance functions
[Diagram flow labels: Analytics Requests, Entities & Time Series, Time Series Extrapolation, Related Entities, Find Related Trends, Related Trends, New and updated trends]
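The clustering step, grouping trends by overlapping meta-data under a distance function, can be sketched as below. This is a hypothetical single-pass scheme using Jaccard distance over tag sets. The slides mention "a variety of distance functions" without naming them, so the choice of Jaccard, the 0.5 cutoff, and the names `jaccard_distance` and `cluster_trends` are assumptions.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 minus (shared tags / all tags); 0.0 means identical tag sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_trends(trends, max_distance=0.5):
    """Add each arriving trend to the first cluster with close enough tags."""
    clusters = []  # each cluster: {"tags": set, "members": [trend names]}
    for name, tags in trends:
        for cluster in clusters:
            if jaccard_distance(tags, cluster["tags"]) <= max_distance:
                cluster["members"].append(name)
                cluster["tags"] |= tags  # pool meta-data as the cluster grows
                break
        else:
            clusters.append({"tags": set(tags), "members": [name]})
    return [c["members"] for c in clusters]


# Made-up trends with made-up meta-data tags.
trends = [
    ("outage-a", {"power", "outage", "nyc"}),
    ("outage-b", {"power", "outage", "nyc", "storm"}),
    ("recall-a", {"recall", "airbag"}),
]
groups = cluster_trends(trends)
print(groups)
```

Because trends are processed one at a time in arrival order, the same loop can keep running as new and updated trends come in, which is one simple reading of "rolling" clustering.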
‣ Python library, using SciPy
‣ Algorithms for detection & extrapolation in time series data
‣ Includes tooling for debugging, training and simulating
‣ ~500 detections/CPU-core/second
Anticipate
Our automated insight discovery on streaming data enables “intelligence as a service for every organization”
Intelligence as a Service
[email protected]
twitter.com/bottlenoseapp
Contact