Stream Processing - Meetupfiles.meetup.com/18649828/Pune_Apex_Meetup_03_Feb_2016_Stream... ·...

Stream Processing: Key Driver for Enabling Instant Insights on Big Data


Page 1:

Stream Processing: Key Driver for Enabling Instant Insights on Big Data

Page 2:

Pritesh Maker

• Background
  • Presently leading Engineering at DataTorrent
  • Over a decade of experience in Data Management technologies including Data Integration, Data Virtualization and Data Quality & Profiling
  • Past roles include leading Engineering at Informatica for their Big Data Management products and core Data Engine
  • Interested in All Things Data!

• Education
  • BS in Computer Science from University of Texas at Austin
  • MBA from Haas School of Business, University of California at Berkeley

• Connect with me
  • LinkedIn: https://www.linkedin.com/in/priteshmaker
  • Twitter: @priteshmaker

Page 3:

Why is Stream Processing Vital?

Page 4:

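Traditional Analytics – Data at Rest

[Diagram: source data arrives as feeds, passes through Extract, Transform and Load with an optional staging area, lands in enterprise repositories, and is then analyzed and visualized.]

• Source data: MS Queues, Events, XML Files, Databases, Sensor data, Social

• Feed 1, Feed 2, ... Feed m: Extract, Transform, Load, with an (Optional) Staging Area

• Enterprise Repositories: RDBMS, EDW, NoSQL

• Feed 1, Feed 2, ... Feed n: Analyze

• Visualize: Business Analytics, Business Intelligence, Visualization Tools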

Page 5:

Next Generation – Data in Motion

• Organizations need to react to changing business conditions in real time
  • Faster decision making across all industries
  • Few companies outside of financial markets, telecom & utilities have experience with streaming

• Newer data sources – like sensors, social media feeds
  • Higher Volume and Greater Velocity
  • More unstructured and semi-structured data

• Democratization of technologies
  • Open Source Projects
  • Large Scale Compute & Storage – Hadoop, NoSQL
  • Streaming Technologies – Apex, Spark, Storm etc.
  • Real-time dashboards and alert notification systems

• Beyond niche use cases
  • Broad applicability but needs more adoption

Page 6:

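Stream vs. Batch Processing Pipelines

[Diagram: two pipelines shown side by side; stage labels as extracted from the slide.]

• Stream Processing Data Pipeline: Ingest, Transform, Analyze, Action, Visualize/Persist (the diagram also labels Archive and Normalize stages)

• Batch Processing Data Pipeline: Extract, Transform, Load, Analyze, Action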

Page 7:

Stream Processing

• Continuous processing on data as it flows through a system

• Allows users to act on events instantaneously via alerts

• Processing related to time (event time vs. processing time)
  • Real-Time – difference between event time and processing time is negligible

Enables your Data In Motion Architecture
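The distinction between event time and processing time can be made concrete with a small sketch. The `Event` class, the `lag` and `is_real_time` helpers, and the one-second tolerance are all assumptions made for illustration; the slides do not define what counts as "negligible".

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical event record carrying both timestamps."""
    key: str
    event_time: datetime       # when the event happened at the source
    processing_time: datetime  # when the pipeline handled it


def lag(event: Event) -> timedelta:
    """Difference between processing time and event time."""
    return event.processing_time - event.event_time


def is_real_time(event: Event, tolerance: timedelta = timedelta(seconds=1)) -> bool:
    """Treat an event as real-time when its lag is within the tolerance.

    The one-second default is an arbitrary assumption, not a value from the slides.
    """
    return lag(event) <= tolerance


t0 = datetime(2016, 2, 3, 18, 0, 0, tzinfo=timezone.utc)
fresh = Event("sensor-1", t0, t0 + timedelta(milliseconds=200))
late = Event("sensor-2", t0, t0 + timedelta(minutes=5))

print(lag(fresh).total_seconds(), is_real_time(fresh))  # 0.2 True
print(lag(late).total_seconds(), is_real_time(late))    # 300.0 False
```

What tolerance is acceptable depends on the use case: an alerting system may need sub-second lag, while a continuously updated report may tolerate minutes.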

Page 8:

Big Data Application Types

[Diagram: application types arranged by data velocity, spanning Batch Processing and Stream Processing. Labels: Operations, Data Discovery, IoT, Fraud, CDR, CDC, Reporting, SQL, SQL on Streams, Streaming Discovery, Ad Hoc Query.]

Page 9:

Sample Streaming Analytics Patterns

Preprocessing

• Filtering events

• Transforming attributes

Alerts & Thresholds

• Based on complex conditions

Computing within Windows

• Aggregations

Combining Event Streams

• Correlation

• Error detection

Enrichment

• Looking up database, reference data

Temporal Events

• Detecting events within time windows

Tracking

• Tracking events over space & time

Trend Detection

• Rise, Fall

• Outliers

Source: https://iwringer.wordpress.com/2015/08/03/patterns-for-streaming-realtime-analytics/
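Three of the patterns on this slide (Preprocessing, Computing within Windows, and Alerts & Thresholds) can be chained in a few lines. The following is a minimal sketch, not code from any streaming engine: the event shape, function names, window size and threshold are all hypothetical, and it processes a finite list for readability where a real engine would emit window results incrementally.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# (event_time_seconds, key, value): a hypothetical event shape for illustration
Reading = Tuple[int, str, float]
WindowKey = Tuple[int, str]  # (window start in seconds, key)


def preprocess(events: Iterable[Reading]) -> Iterator[Reading]:
    """Preprocessing: filter out invalid events and transform an attribute."""
    for ts, key, value in events:
        if value < 0:                 # filtering: discard negative readings
            continue
        yield ts, key.lower(), value  # transforming: normalize the key


def tumbling_sums(events: Iterable[Reading], window_s: int) -> Dict[WindowKey, float]:
    """Computing within windows: sum values per fixed, non-overlapping window."""
    sums: Dict[WindowKey, float] = defaultdict(float)
    for ts, key, value in events:
        window_start = ts - ts % window_s
        sums[(window_start, key)] += value
    return dict(sums)


def alerts(sums: Dict[WindowKey, float], threshold: float) -> List[WindowKey]:
    """Alerts & thresholds: report windows whose total exceeds the threshold."""
    return sorted(k for k, total in sums.items() if total > threshold)


events = [(0, "A", 5.0), (3, "a", 7.0), (9, "B", -1.0), (12, "A", 2.0), (14, "b", 20.0)]
sums = tumbling_sums(preprocess(events), window_s=10)
print(sums)                      # {(0, 'a'): 12.0, (10, 'a'): 2.0, (10, 'b'): 20.0}
print(alerts(sums, threshold=10.0))  # [(0, 'a'), (10, 'b')]
```

Windows here are keyed on event time, so a late-arriving event still lands in the window where it happened rather than the one in which it was processed.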

Page 10:

Stream Processing Use Cases

Page 11:

Financial Services

• Detect fraudulent activity in real-time

• Risk Analysis

• Deliver personalized products and offerings

• Make decisions in real-time for trading and transactional platforms

Page 12:

Financial services big data fabric

Secure, fault tolerant, data ingestion, formatting & archiving. Data access layer for application processing.

[Diagram labels: Financial Data, SMTP Logs, Historical; Encrypt, Compliance, Alert on error; Persistent, Archive; Application 1 to Application n.]

Page 13:

Telecom

• Real-time network monitoring and protection

• Quality of service and Customer Satisfaction

• Take action based on users’ location

• Automatic resource allocation and load balancing

Page 14:

Online Advertising

• Dynamic bidding

• Real-time targeting & personalization

• Maximize click-through and conversion rates.

• Reporting that can be updated continuously

Page 15:

Online advertising dynamic inventory purchases

High volume auto-scaling fault tolerant event stream. Dimensional computing to identify performing ads.

[Diagram labels: Ad Server 1 to Ad Server 800, Fault-Tolerant Flume, In-memory analytic cube, Real-time Dashboard, Ad Placement Strategy, Oracle DB, Campaign Analysis.]
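One way to read "dimensional computing" with an in-memory analytic cube is to pre-aggregate counters for every combination of dimensions, so that any slice can be looked up directly. The sketch below illustrates that idea only; the dimension names, event fields and helper functions are hypothetical and are not taken from the system shown on the slide.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

DIMENSIONS = ("advertiser", "publisher", "region")  # hypothetical dimension names
CubeKey = Tuple[Tuple[str, str], ...]


def build_cube(events: Iterable[dict]) -> Dict[CubeKey, List[int]]:
    """Aggregate [impressions, clicks] for every combination of dimensions."""
    cube: Dict[CubeKey, List[int]] = defaultdict(lambda: [0, 0])
    for event in events:
        for size in range(len(DIMENSIONS) + 1):
            for dims in combinations(DIMENSIONS, size):
                key = tuple((d, event[d]) for d in dims)
                cell = cube[key]
                cell[0] += 1                              # one impression per event
                cell[1] += 1 if event["clicked"] else 0   # count clicks
    return cube


def click_through_rate(cube: Dict[CubeKey, List[int]], **filters: str) -> float:
    """Look up clicks / impressions for the slice selected by the filters."""
    key = tuple((d, filters[d]) for d in DIMENSIONS if d in filters)
    impressions, clicks = cube.get(key, (0, 0))
    return clicks / impressions if impressions else 0.0


events = [
    {"advertiser": "acme", "publisher": "news", "region": "IN", "clicked": True},
    {"advertiser": "acme", "publisher": "news", "region": "US", "clicked": False},
    {"advertiser": "acme", "publisher": "blog", "region": "IN", "clicked": False},
    {"advertiser": "zen", "publisher": "news", "region": "IN", "clicked": True},
]
cube = build_cube(events)
print(click_through_rate(cube))                                # 0.5
print(click_through_rate(cube, publisher="news", region="IN"))  # 1.0
```

Each event updates 2^d cells for d dimensions, so this approach trades memory for constant-time slice lookups; with many dimensions a real system would likely restrict which combinations it materializes.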

Page 16:

Internet of Things

• Environment monitoring

• Infrastructure management

• Manufacturing

• Energy management

• Public Building & Home automation

• Transportation

Page 17:

IoT secure ingestion and predictive analysis

High performance, multi-customer secure, data ingestion. Complex event processing with historical data for predictive maintenance.

[Diagram labels: Sensor 1 to Sensor N, Data Governance, Complex Event Process, Predictive maintenance, Persistent, Application 1 to Application n.]
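A simple form of complex event processing with historical data is to derive a baseline from past readings and raise a maintenance alert when live readings stay above it for several events in a row. This is a sketch under stated assumptions: the three-sigma limit, the streak length of three and the function names are illustrative choices, not details from the slides.

```python
from statistics import mean, stdev
from typing import Iterable, Iterator, List, Tuple


def upper_limit(history: List[float], sigmas: float) -> float:
    """Derive an alert limit from historical readings (mean + sigmas * stdev)."""
    return mean(history) + sigmas * stdev(history)


def maintenance_alerts(
    readings: Iterable[Tuple[int, float]],
    history: List[float],
    sigmas: float = 3.0,    # assumed limit width, not from the slides
    consecutive: int = 3,   # assumed streak length, not from the slides
) -> Iterator[int]:
    """Yield the timestamp at which a streak of high readings reaches `consecutive`."""
    limit = upper_limit(history, sigmas)
    streak = 0
    for ts, value in readings:
        streak = streak + 1 if value > limit else 0
        if streak == consecutive:  # fire once per streak, not on every later reading
            yield ts


history = [70.0, 71.0, 69.0, 70.5, 69.5]  # hypothetical past temperatures
readings = [(1, 70.2), (2, 75.0), (3, 71.0), (4, 74.0), (5, 76.0), (6, 78.0), (7, 79.0)]
print(list(maintenance_alerts(readings, history)))  # [6]
```

Requiring a run of consecutive high readings filters out isolated spikes such as the one at timestamp 2, which a single-event threshold would have reported.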

Page 18:

Stream Processing: Conclusion

• Lots of untapped potential!
  • Gives your business a competitive edge!

• Open Source and Big Data technologies
  • Built to address the scale and latency demands

• Broad use cases
  • Across industries and verticals
