SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

Finding the Signal in the Noise. June 15, 2015 Webinar Presentation. Nova Spivack, CEO

Transcript of SmartData Webinar Slides: How to analyze 72 billion messages a day to find trends

What is Bottlenose For?

Bottlenose discovers the threats and opportunities that impact your business

Bottlenose does this using patented stream intelligence technology

Key Stream Intelligence Use-Cases

Threats
• Risk detection
• Crisis mitigation
• Competitive threats
• Reputational threats
• Cyber threat detection

Opportunities
• Audience and customer insights
• Innovation and research
• New business and market opportunities
• Competitive intelligence
• Product and marketing intelligence

Vision

Stream Intelligence

Our mission is to build the leading business intelligence company for stream data

Stream data is the fastest growing segment of data. It includes all types of live or historical, unstructured or structured, time-stamped data, such as: email and messaging data, social media, mobile data, news, IT log data, CRM data, support data, sales data, Web and app analytics data, financial data, sensor and device data.

We have built the first unified platform and application for automating the discovery of actionable intelligence across any stream data sources. We call this stream intelligence.

Real-Time Discovery Against Streaming Data is Required

“... the future belongs to raw unstructured or semi-structured data from both internal and external sources - increasingly delivered in (near) real-time. This data has great value yet most organizations do not have the tech infrastructure to handle all this data.” - IDC

Problem: Massive growth of unstructured data cannot be managed effectively with existing tech infrastructure

Analysts Are Drowning in Streams

● There are never going to be enough data scientists or analysts to cope with the rise of unstructured stream data in the enterprise

● Analysts need automated stream intelligence tools to help them deal with the volume, velocity and variety of stream data

Solution: Bottlenose Automates Stream Intelligence

• Bottlenose provides the most advanced stream intelligence, automatically finding patterns such as trends, anomalies, threats, opportunities and correlations in stream data

• Bottlenose is extremely easy to use and delivers value right away, without extensive engineering and IT involvement or lengthy professional services engagements

• The platform combines both internal enterprise data and external data from social, broadcast, web and other areas.

We Are in the Stream Intelligence Sweet Spot

The Bottlenose solution is a new generation of tools that automates the production of actionable intelligence from stream data

[Figure labels: Variety, Velocity & Volume; ELK Stack]

Competitive Advantage from Coping with Stream Data

Generate Actionable Intelligence from ANY Stream Data

Bottlenose Platform
• Social & traditional media (social networks, blogs, forums, newswires)
• 98% of all live TV & radio broadcasts
• Enterprise data (sales, financials, Web analytics, IT systems, email, internal databases, etc.)
• Web data, commercial data sources, financial market data sources, public data sources
• Machine and sensor data (Internet of Things, machine data, weather data, etc.)

Stream Intelligence Pipeline

• Applications
• Rules & Agents: Alerts/actions based on business interests
• Stream Data Storage & API: Long-term storage, real-time access, search & APIs
• Trend Detection: Extrapolation, correlation/clustering
• Data Mining & Analytics: ~30 entity types and ~150 metrics
• Ingestion & Enrichment: Push/pull of unstructured/structured data

[Diagram labels: Data in Motion; Entities & Metrics; New Patterns; Alerts & Actions]

Automatically Discover Threats and Opportunities

[Diagram labels: Focal Interest; Known Issues; Unknown/Emerging Issues]

• Breaking News
• Customer Problems
• Enterprise Risk Factor
• Fraud Risk
• Product Recall
• Competitive Threat
• Cyber Attack
• Power Outage
• Natural Disaster
• Traffic Congestion
• Device Failure
• Financial Trading Anomaly
• Reputation Risk
• Intellectual Property Violation

Continuous High-Volume Stream Analytics

• 3 billion live + historical messages analyzed every hour

• 72 billion records analyzed per day + predictive analytics on 7.2 billion

• 67,000,000 new messages ingested every day

• Trend detection at a rate of 1 million events per second

• 30 entity types recognized × 150 metrics per entity × tens of millions of entities = ~50 to 100 billion time series monitored and analyzed continuously

• Growing to 200 terabytes of data stored & analyzed continuously in 2015
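
The figures in this list can be cross-checked with a few lines of arithmetic. The sketch below only multiplies out the numbers quoted on the slide; the two entity counts are assumptions picked to bracket "tens of millions", since the slide gives no exact value.

```python
# Figures quoted on the slide.
RECORDS_PER_HOUR = 3_000_000_000
ENTITY_TYPES = 30
METRICS_PER_ENTITY = 150

records_per_day = RECORDS_PER_HOUR * 24          # 72 billion
records_per_second = RECORDS_PER_HOUR / 3600     # roughly 833,000


def monitored_series(entity_count: int) -> int:
    """Multiply out the slide's formula for a given number of entities."""
    return ENTITY_TYPES * METRICS_PER_ENTITY * entity_count


# Assumed entity counts (not from the slide) that bracket the quoted range.
low = monitored_series(11_000_000)     # 49.5 billion
high = monitored_series(22_000_000)    # 99 billion

print(f"{records_per_day:,} records/day")
print(f"{records_per_second:,.0f} records/second")
print(f"{low:,} to {high:,} time series")
```

Under these assumptions, 3 billion records per hour works out to the 72 billion per day in the title and to a little over 800,000 records per second, which is the same order of magnitude as the quoted trend-detection rate of 1 million events per second.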

1000s of High-Level Detected Trends Per Hour

• Automated data science layer applies machine learning, statistics, and predictive analytics to correlate, cluster, predict and analyze emergent trends

We See the Near Future Before Anyone Else

• 80% of the time, our system detects breaking news and emerging threats, opportunities and keywords tens to hundreds of minutes ahead of the media, Twitter, ad networks, etc. Similar advantages apply against non-text data sources

Key Metrics

Bottlenose analyzes 72 billion data records every day

Demo: Data Agnostic Stream Intelligence

Customer Facing Products

● Analytics, intelligence, and discovery engine
  ○ Nerve Center
  ○ Full-stack offering

● Streaming data services to applications
  ○ Bottlenose API (Platform)

Analytics Engine

‣ Advanced filtering & aggregations using simple OLAP interface

‣ “Interactive Analytics” thanks to sub-second query response time

‣ Add new data sources using central mapping system
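
As a rough illustration of what OLAP-style filtering and aggregation means for the Analytics Engine, here is a minimal in-memory sketch. It is not the Bottlenose interface: the record fields, the `aggregate` helper and the sample data are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records; the field names are illustrative only.
records = [
    {"source": "twitter", "entity": "pepsi", "hour": 0, "mentions": 120},
    {"source": "twitter", "entity": "pepsi", "hour": 1, "mentions": 180},
    {"source": "news", "entity": "pepsi", "hour": 0, "mentions": 15},
    {"source": "twitter", "entity": "coke", "hour": 0, "mentions": 90},
]


def aggregate(rows, filters, group_by, measure):
    """Filter rows on exact dimension values, then sum a measure per group."""
    totals = defaultdict(int)
    for row in rows:
        if all(row[dim] == value for dim, value in filters.items()):
            key = tuple(row[dim] for dim in group_by)
            totals[key] += row[measure]
    return dict(totals)


# Slice on one entity, then roll up by source.
by_source = aggregate(records, {"entity": "pepsi"}, ["source"], "mentions")
print(by_source)  # {('twitter',): 300, ('news',): 15}
```

A production engine would answer the same kind of query from pre-aggregated or indexed storage to reach sub-second response times; the loop here only shows the shape of the operation.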

A sophisticated semantic approach is required to make sense of the raw data. The structure of data can be derived based on entities/dimensions the system has a pattern for. Machine learning techniques can begin to make inferences and match to known profiles as data flows in.

One of the most powerful capabilities is comparing different data sources. A system like ours automatically normalizes them. For example, when the data has different time granularity, we automatically align different time periods in order to find overlaps.
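
The time-alignment step can be sketched as follows. This is a simplified illustration, not the product's normalization logic: it assumes each series is a list of (Unix timestamp, value) pairs, that values can be summed within a bucket, and that one hour is an acceptable common granularity. The helper names and sample data are hypothetical.

```python
from collections import defaultdict


def rebucket(points, bucket_seconds=3600):
    """Sum (unix_timestamp, value) pairs into fixed-width time buckets."""
    buckets = defaultdict(float)
    for timestamp, value in points:
        bucket_start = timestamp - (timestamp % bucket_seconds)
        buckets[bucket_start] += value
    return dict(buckets)


def overlap(series_a, series_b):
    """Return {bucket: (a, b)} for buckets present in both series."""
    shared = sorted(series_a.keys() & series_b.keys())
    return {bucket: (series_a[bucket], series_b[bucket]) for bucket in shared}


# Hypothetical inputs: one source reports every 15 minutes, the other hourly.
mentions_15min = [(0, 4), (900, 6), (1800, 5), (2700, 5), (3600, 9), (4500, 11)]
sales_hourly = [(3600, 210.0), (7200, 180.0)]

aligned = overlap(rebucket(mentions_15min), rebucket(sales_hourly))
print(aligned)  # {3600: (20.0, 210.0)}
```

Summing suits count-like metrics; a rate or average metric would need a different reducer, such as a mean per bucket.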

Of course, the semantic engine can also be adjusted with a vertical industry's unique facts, relationships, and jargon.

Need for a Radical New Form of Information Retrieval: Semantic meaning built bottom-up from raw data

Application - Nerve Center

Our application provides a powerful suite of tools to find business insights in streaming data:

● Monitor: Real-time monitoring with powerful live visualizations.

● Analyze: Fast interactive analytics to dig deep into the data.

● Discover: Automated insight discovery. Get notified when new patterns are detected.

● Customize: Reports and live dashboards can be created for any vertical by mixing and matching insights & visualizations across any combination of data streams.

Bottlenose Platform

[Diagram components: Ingest, Augment, Store, Analytics Engine, Discovery Engine, Nerve Center®]

Processing Layers

• Augmentation Engine
• Analytics Engine
• Detection Engine
• Correlations Engine
• Rules & Agent Engine

[Axis label: Depth of Insight]

Data Points

‣ Typical topic stream like “Beyonce” (Pepsi)

‣ 4M new events (data records) per month

‣ ~8M unique entities tracked per month

‣ ~8M unique entities x 150 metrics x many time buckets = a lot of data points

‣ And this is just 1 stream. We have thousands of these running at all times...
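
To put a number on "a lot of data points", the sketch below multiplies the slide's figures by a few candidate bucket widths. The bucket widths and the 30-day month are assumptions, since the slide only says "many time buckets", and the results are upper bounds because not every entity has a value in every bucket.

```python
ENTITIES_PER_MONTH = 8_000_000   # from the slide
METRICS_PER_ENTITY = 150         # from the slide

series = ENTITIES_PER_MONTH * METRICS_PER_ENTITY   # 1.2 billion time series

# Assumed bucket widths over an assumed 30-day month.
DAYS = 30
buckets_per_month = {
    "daily": DAYS,
    "hourly": DAYS * 24,
    "minutely": DAYS * 24 * 60,
}

upper_bounds = {name: series * count for name, count in buckets_per_month.items()}
for name, count in upper_bounds.items():
    print(f"{name:>8}: up to {count:,} data points")
```

With hourly buckets, for example, a single stream of this size could hold up to 864 billion data points per month.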

Detection Engine

‣ Systematically walk through all data points.

‣ Continuous stream of categorized signals. Searchable.

Detection Engine

• Anticipate: Python servers for trend detection & extrapolation
• Detector: Workers that continuously aggregate entities and fetch corresponding metrics
• Context Gathering: Finding additional meta-data around detections
• Clustering: Rolling clustering of trends based on overlapping meta-data and a variety of distance functions

[Diagram labels: Analytics Requests; Entities & Time Series; Time Series Extrapolation; Related Entities; Find Related Trends; Related Trends; New and updated trends]
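
The clustering step can be illustrated with a small sketch. The slides do not say which distance functions are used, so this example assumes Jaccard distance over metadata sets and a greedy, incremental assignment; the function names, the threshold and the sample trends are all hypothetical.

```python
def jaccard_distance(a: set, b: set) -> float:
    """1 - |a & b| / |a | b|: 0.0 for identical sets, 1.0 for disjoint ones."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign_to_cluster(clusters, trend_id, metadata, max_distance=0.6):
    """Add a trend to the closest existing cluster, or start a new one.

    Each cluster is a dict with 'members' (trend ids) and 'metadata' (the
    union of its members' metadata). Returns the index of the cluster used.
    """
    best_index, best_distance = None, max_distance
    for index, cluster in enumerate(clusters):
        distance = jaccard_distance(metadata, cluster["metadata"])
        if distance <= best_distance:
            best_index, best_distance = index, distance
    if best_index is None:
        clusters.append({"members": [trend_id], "metadata": set(metadata)})
        return len(clusters) - 1
    clusters[best_index]["members"].append(trend_id)
    clusters[best_index]["metadata"] |= metadata
    return best_index


clusters = []
assign_to_cluster(clusters, "t1", {"beyonce", "pepsi", "superbowl"})
assign_to_cluster(clusters, "t2", {"beyonce", "pepsi", "halftime"})
assign_to_cluster(clusters, "t3", {"outage", "datacenter"})
print([c["members"] for c in clusters])  # [['t1', 't2'], ['t3']]
```

Because trends are assigned one at a time as they arrive, the clusters update continuously. A genuinely rolling version would also expire old members, which this sketch leaves out.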

Anticipate

‣ Python library, using SciPy

‣ Algorithms for detection & extrapolation in time series data

‣ Includes tooling for debugging, training and simulating

‣ ~500 detections/CPU-core/second
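
The slides do not describe Anticipate's algorithms, so the following is only a generic sketch of detection and extrapolation on a time series: a z-score spike test and a least-squares linear projection. It uses the Python standard library rather than SciPy, and the function names and thresholds are assumptions.

```python
from statistics import mean, stdev


def detect_spike(history, latest, threshold=3.0):
    """Flag `latest` if it lies more than `threshold` sample standard
    deviations above the mean of `history` (a simple z-score test)."""
    if len(history) < 2:
        return False
    spread = stdev(history)
    if spread == 0:
        return latest > history[0]
    return (latest - mean(history)) / spread > threshold


def extrapolate_linear(values, steps_ahead=1):
    """Fit y = a + b*x by ordinary least squares over x = 0..n-1 and
    evaluate the fitted line `steps_ahead` points past the last sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept + slope * (n - 1 + steps_ahead)


baseline = [10, 12, 11, 9, 10, 11, 12, 10]
print(detect_spike(baseline, 30))   # True
print(detect_spike(baseline, 12))   # False
print(extrapolate_linear([2, 4, 6, 8], steps_ahead=2))  # 12.0
```

A real detector would also have to handle seasonality, sparse series and noisy baselines, which is presumably where the training and simulation tooling mentioned on the slide comes in.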

Intelligence as a Service

Our automated insight discovery on streaming data enables “intelligence as a service for every organization”