Stream Processing · Next Generation –Data in Motion •Organizations need to react to changing...

Stream Processing Key Driver for Enabling Instant Insights on Big Data

Transcript of Stream Processing · Next Generation –Data in Motion •Organizations need to react to changing...

Page 1: Stream Processing · Next Generation –Data in Motion •Organizations need to react to changing business conditions in real time •Faster decision making across all industries

Stream Processing: Key Driver for Enabling Instant Insights on Big Data

Page 2:

Pritesh Maker

• Background
  • Presently leading Engineering at DataTorrent
  • Over a decade of experience in Data Management technologies including Data Integration, Data Virtualization and Data Quality & Profiling
  • Past roles include leading Engineering at Informatica for their Big Data Management products and core Data Engine
  • Interested in All Things Data!
• Education
  • BS in Computer Science from University of Texas at Austin
  • MBA from Haas School of Business, University of California at Berkeley
• Connect with me
  • LinkedIn: https://www.linkedin.com/in/priteshmaker
  • Twitter: @priteshmaker

Page 3:

Why is Stream Processing Vital?

Page 4:

Traditional Analytics – Data at Rest

[Diagram: source data (MS Queues, events, XML files, databases, sensor data, social, enterprise) arrives as Feed 1 through Feed n, passes through Extract, Transform, and Load with an optional staging area into repositories (RDBMS, EDW, NoSQL), and is then analyzed and visualized with business analytics, business intelligence, and visualization tools.]

Page 5:

Next Generation – Data in Motion

• Organizations need to react to changing business conditions in real time
  • Faster decision making across all industries
  • Few companies outside of financial markets, telecom & utilities have experience with streaming
• Newer data sources – like sensors, social media feeds
  • Higher Volume and Greater Velocity
  • More unstructured and semi-structured data
• Democratization of technologies
  • Open Source Projects
  • Large Scale Compute & Storage – Hadoop, NoSQL
  • Streaming Technologies – Apex, Spark, Storm etc.
  • Real-time dashboards and alert notification systems
• Beyond niche use cases
  • Broad applicability but needs more adoption

Page 6:

Stream vs. Batch Processing Pipelines

[Diagram: two pipelines compared. Batch Processing Data Pipeline: Extract, Transform, Load, Analyze, Action. Stream Processing Data Pipeline: Ingest, Transform, Analyze, Action, with Archive, Normalize, and Visualize/Persist also shown as stages.]
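
The contrast between the two pipelines can be illustrated with a minimal Python sketch. It is not taken from the slides: the function names and the sample readings are hypothetical, and a plain in-memory list stands in for a real data source. The batch function produces one result after all data has been loaded, while the stream function emits an updated result as each event arrives.

```python
from typing import Iterable, Iterator, List


def batch_total(readings: List[float]) -> float:
    """Batch style: wait for the full dataset, then compute once."""
    return sum(readings)


def stream_totals(readings: Iterable[float]) -> Iterator[float]:
    """Stream style: emit an updated result after every event."""
    running = 0.0
    for value in readings:
        running += value
        yield running


# Hypothetical sample data standing in for a feed of events.
events = [3.0, 5.0, 2.0]

print(batch_total(events))          # a single answer, available only at the end
print(list(stream_totals(events)))  # one answer per event, available immediately
```

In the streaming version a downstream step (an alert, a dashboard update) could act on each intermediate value rather than waiting for the load to finish.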

Page 7:

Stream Processing

• Continuous processing on data as it flows through a system
• Allows users to act on events instantaneously via alerts
• Processing related to time (event time vs. processing time)
  • Real-Time – difference between event time and processing time is negligible

Enables your Data In Motion Architecture
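
The distinction between event time and processing time can be sketched in a few lines of Python. This is an illustration only: the `Event` class, the `lag_seconds` and `is_real_time` helpers, and the one-second budget are assumptions made for the example, since the slides do not define what counts as "negligible".

```python
from dataclasses import dataclass


@dataclass
class Event:
    event_time: float       # when the event occurred at the source (epoch seconds)
    processing_time: float  # when the pipeline handled it (epoch seconds)


def lag_seconds(event: Event) -> float:
    """Difference between processing time and event time."""
    return event.processing_time - event.event_time


def is_real_time(event: Event, budget_seconds: float = 1.0) -> bool:
    """Treat an event as real-time if its lag fits the (assumed) budget."""
    return lag_seconds(event) <= budget_seconds


fresh = Event(event_time=100.0, processing_time=100.25)
stale = Event(event_time=100.0, processing_time=130.0)

print(lag_seconds(fresh), is_real_time(fresh))
print(lag_seconds(stale), is_real_time(stale))
```

What budget is acceptable depends on the use case; a trading platform and a reporting dashboard would likely choose very different values.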

Page 8:

Big Data Application Types

[Diagram: big data application types arranged by data velocity, spanning Batch Processing and Stream Processing. Labels: Operations, Reporting, SQL, Ad Hoc Query, Data Discovery, SQL on Streams, Streaming Discovery, IoT, Fraud, CDR, CDC.]

Page 9:

Sample Streaming Analytics Patterns

Preprocessing

• Filtering events

• Transforming attributes

Alerts & Thresholds

• Based on complex conditions

Computing within Windows

• Aggregations

Combining Event Streams

• Correlation

• Error detection

Enrichment

• Looking up database, reference data

Temporal Events

• Detecting events within time windows

Tracking

• Tracking events over space & time

Trend Detection

• Rise, Fall

• Outliers

Source: https://iwringer.wordpress.com/2015/08/03/patterns-for-streaming-realtime-analytics/
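
Three of the patterns above (preprocessing, computing within windows, and alerts on thresholds) can be combined in a short Python sketch. It is a simplified, in-memory illustration rather than how a streaming engine such as Apex, Spark, or Storm implements them. The event shape, the function names, the 60-second window, and the threshold of 80.0 are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical event shape: (event_time_seconds, sensor_id, value)
Event = Tuple[int, str, float]
WindowKey = Tuple[int, str]  # (window_start_seconds, sensor_id)


def preprocess(events: Iterable[Event]) -> Iterator[Event]:
    """Preprocessing: filter out negative readings and normalize sensor ids."""
    for ts, sensor, value in events:
        if value < 0:
            continue  # filtering events
        yield ts, sensor.strip().lower(), value  # transforming attributes


def tumbling_window_avg(
    events: Iterable[Event], window_seconds: int = 60
) -> Dict[WindowKey, float]:
    """Computing within windows: average per sensor per fixed-size window."""
    sums: Dict[WindowKey, float] = defaultdict(float)
    counts: Dict[WindowKey, int] = defaultdict(int)
    for ts, sensor, value in events:
        # Round the timestamp down to the start of its window.
        key = (ts // window_seconds * window_seconds, sensor)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def threshold_alerts(
    averages: Dict[WindowKey, float], threshold: float = 80.0
) -> List[WindowKey]:
    """Alerts & thresholds: windows whose average exceeds the threshold."""
    return sorted(key for key, avg in averages.items() if avg > threshold)


# Hypothetical sample events.
raw = [(0, "A", 70.0), (30, "a ", 90.0), (45, "B", -1.0),
       (61, "A", 95.0), (119, "A", 85.0)]

averages = tumbling_window_avg(preprocess(raw))
print(averages)
print(threshold_alerts(averages))
```

A production pipeline would emit each window's result as the window closes instead of collecting everything in a dictionary, and it would also need a policy for late-arriving events.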

Page 10:

Stream Processing Use Cases

Page 11:

Financial Services

• Detect fraudulent activity in real-time
• Risk Analysis
• Deliver personalized products and offerings
• Make decisions in real-time for trading and transactional platforms

Page 12:

Financial services big data fabric

Secure, fault tolerant, data ingestion, formatting & archiving. Data access layer for application processing.

[Diagram: Financial Data, SMTP Logs, and Historical sources feed a pipeline with Encrypt, Compliance, Alert on error, and Archive steps, backed by a Persistent layer and serving Application 1 through Application n.]

Page 13:

Telecom

• Real-time network monitoring and protection
• Quality of service and Customer Satisfaction
• Take action based on users’ location
• Automatic resource allocation and load balancing

Page 14:

Online Advertising

• Dynamic bidding
• Real-time targeting & personalization
• Maximize click-through and conversion rates
• Reporting that can be updated continuously

Page 15:

Online advertising dynamic inventory purchases

High volume auto-scaling fault tolerant event stream. Dimensional computing to identify performing ads.

[Diagram: Ad Server 1 through Ad Server 800 feed Fault-Tolerant Flume and an In-memory analytic cube, with Oracle DB, a Real-time Dashboard, Campaign Analysis, and Ad Placement Strategy.]

Page 16:

Internet of Things

• Environment monitoring

• Infrastructure management

• Manufacturing

• Energy management

• Public Building & Home automation

• Transportation

Page 17:

IoT secure ingestion and predictive analysis

High performance, multi-customer secure, data ingestion. Complex event processing with historical data for predictive maintenance.

[Diagram: Sensor 1 through Sensor N feed a pipeline with Data Governance, Complex Event Process, and Predictive maintenance steps, backed by a Persistent layer and serving Application 1 through Application n.]

Page 18:

Stream Processing: Conclusion

• Lots of untapped potential!
  • Gives your business a competitive edge!
• Open Source and Big Data technologies
  • Built to address the scale and latency demands
• Broad use cases
  • Across industries and verticals
