Overview of Microsoft StreamInsight

Torsten Grabs, Lead Program Manager, Microsoft StreamInsight


Transcript of Overview of Microsoft StreamInsight

Page 1: Overview of  Microsoft  StreamInsight

Overview of Microsoft StreamInsight

Torsten Grabs, Lead Program Manager, Microsoft StreamInsight

Page 2: Overview of  Microsoft  StreamInsight

The Need for an Event-Driven Platform

Analytical results need to reflect important changes in business reality immediately and enable responses to them with minimal latency.

                   Database applications              Event-driven applications
Query paradigm     Ad-hoc queries or requests         Continuous standing queries
Latency            Seconds, hours, days               Milliseconds or less
Data rate          Hundreds of events/sec             Tens of thousands of events/sec or more
Query semantics    Declarative relational analytics   Declarative relational and temporal analytics

[Figure: a request/response interaction contrasted with an input stream of events producing an output stream]

Page 3: Overview of  Microsoft  StreamInsight

Scenarios for Event-Driven Applications

[Figure: application scenarios plotted by aggregate data rate (events/sec, from 0 to ~1 million) against latency (from months down to < 1 ms). Labeled regions: relational database applications; data warehousing applications; web analytics applications; operational analytics applications (e.g., logistics); manufacturing applications; monitoring applications; financial trading applications. The CEP target scenarios are marked on the chart.]

Page 4: Overview of  Microsoft  StreamInsight

Example Scenarios

Manufacturing:
• Sensor on plant floor
• React through device controllers
• Aggregated data
• 10,000 events/sec

Web Analytics:
• Click-stream data
• Online customer behavior
• Page layout
• 100,000 events/sec

Financial Services:
• Stock & news feeds
• Algorithmic trading
• Patterns over time
• Super-low latency
• 100,000 events/sec

Power, Utilities:
• Energy consumption
• Outages
• Smart grids
• 100,000 events/sec

Query types:
• Threshold queries
• Event correlation from multiple sources
• Pattern queries

Uses:
• Visual trend-line and KPI monitoring
• Batch & product management
• Automated anomaly detection
• Real-time customer segmentation
• Algorithmic trading
• Proactive condition-based maintenance

[Figure: components shown are data streams, the event processing engine, a stream data store & archive, a lookup of asset specs & parameters, and asset instrumentation for data acquisition / subscriptions to data feeds]

Page 5: Overview of  Microsoft  StreamInsight

StreamInsight Platform

[Figure: event sources connect through input adapters to the StreamInsight engine, which runs standing queries (query logic); output adapters connect the engine to event targets. Two phases are labeled: StreamInsight application development and StreamInsight application at runtime.]

Event sources:
• Devices, sensors
• Web servers
• Event stores & databases
• Stock ticker, news feeds

Event targets:
• Event stores & databases
• Pagers & monitoring devices
• KPI dashboards, SharePoint UI
• Trading stations

Page 6: Overview of  Microsoft  StreamInsight

What is Project “Austin”?

Analytics
• Rich temporal (StreamInsight) and sequential (Reactive Framework) analytics models
• Dynamic, flexible query and data source management experience

Integrated
• Turn-key connectivity for platform data sources and sinks (SQL Azure, Windows Azure Table Storage)
• Integrated with Azure management portal and billing experiences

Connected
• Real-time data collection from a wide variety of connected devices (sensors, smart meters, servers, tablets, phones)
• Standards-compliant endpoints (REST, XML, JSON)
• Securable data ingress with data enrichment and transformation (geo-tagging, etc.)

Scalable
• Multi-tenant Azure service with flexible, elastic capacity for collection and analytics
• Federated scale-out collection and analytics
• Distributed service monitoring and tracing

Page 7: Overview of  Microsoft  StreamInsight

StreamInsight on Azure: “Austin”

[Figure: “Austin” architecture. Components shown: RESTful endpoint; authentication; scalable data ingress adapter; prebuilt input adapters; StreamInsight engine running standing queries (StreamInsight queries and a Reactive query); prebuilt output adapters; data egress adapters; Azure Tables; built-in archive; management service; monitoring service. Two phases are labeled: StreamInsight application development and StreamInsight application at runtime.]

Page 8: Overview of  Microsoft  StreamInsight

Events

Events expose different temporal characteristics:
• Point-in-time events
• Interval events with fixed duration
• Interval events with initially unknown duration

Rich payloads capture all properties of an event.

[Figure: events a through e plotted as payload/value against time (t1 to t5)]

Page 9: Overview of  Microsoft  StreamInsight

Event Types

• Events in Microsoft’s CEP platform use the .NET type system
• Events are structured and can have multiple fields
• Fields are typed using the .NET framework types
• Timestamp fields provisioned by the CEP engine capture all the different temporal event characteristics
• Event sources populate timestamp fields

Example event layout:

Field                  Type
Timestamps/Metadata    (provisioned by the engine)
pumpID                 Long
Type                   String
Location               String
flow                   Double
pressure               Double
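The event layout above can be modeled outside .NET to make the idea concrete. The following is a minimal Python sketch, not StreamInsight's API: the payload fields come from the slide, while the class names `PumpReading` and `Event`, the `shape` helper, and the use of `None` for a not-yet-known end time are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PumpReading:
    """Payload: fields taken from the slide's example layout."""
    pump_id: int
    type: str
    location: str
    flow: float
    pressure: float


@dataclass(frozen=True)
class Event:
    """Payload plus timestamps (hypothetical model, not the real API)."""
    payload: PumpReading
    start: int                 # application time at which the event begins
    end: Optional[int] = None  # None models "duration not yet known"

    def shape(self) -> str:
        if self.end is None:
            return "interval, duration initially unknown"
        if self.end == self.start:
            return "point in time"
        return "interval, fixed duration"


reading = PumpReading(7, "centrifugal", "plant A", 12.5, 3.2)
point = Event(reading, start=10, end=10)
interval = Event(reading, start=10, end=15)
open_ended = Event(reading, start=10)
```

Keeping the timestamps on a wrapper rather than in the payload mirrors the slide's split between engine-provisioned timestamp fields and user-defined payload fields.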

Page 10: Overview of  Microsoft  StreamInsight

Event Streams & Adapters

A stream is a possibly infinite sequence of events:
• Insertions of new events
• Changes to event durations

Stream characteristics:
• Event/data arrival patterns
   • Steady rate with end-of-stream indication
   • Intermittent, random, or in bursts
• Out-of-order events: the order of arrival of events does not match the order of their application timestamps

Adapters:
• Receive/get events from the data source
• Enqueue events for processing in the engine
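To illustrate what "out of order" means here, the sketch below flags events whose application timestamp is older than one that has already arrived. It is a plain Python illustration under assumed names (`Arrival`, `out_of_order`); it does not show how the StreamInsight engine itself handles such events.

```python
from typing import List, Tuple

# Each arrival is (application_timestamp, payload), listed in arrival order.
Arrival = Tuple[int, str]


def out_of_order(arrivals: List[Arrival]) -> List[Arrival]:
    """Return events whose application timestamp is earlier than one already seen."""
    late = []
    newest = None
    for ts, payload in arrivals:
        if newest is not None and ts < newest:
            late.append((ts, payload))
        else:
            newest = ts
    return late


arrivals = [(1, "a"), (2, "b"), (5, "c"), (3, "d"), (6, "e"), (4, "f")]
late = out_of_order(arrivals)
```

Here events "d" and "f" arrive after events with later application timestamps, so arrival order and application-time order disagree for them.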

Page 11: Overview of  Microsoft  StreamInsight

Typical CEP Queries

Typical CEP queries require a combination of functionality:
• Complex type describes event properties
• Calculations introduce additional event properties
• Grouping by one or more event properties
• Aggregation for each event group over a pre-defined period of time, typically a window
• Multiple event groups monitored by the same query
• Correlate event streams
• Check for absence of activity with a data source
• Enrich events with reference data
• Collection of assets may change over time

We want to make writing and maintaining those queries easy or even effortless.

Page 12: Overview of  Microsoft  StreamInsight

StreamInsight Query Features

Operators over streams:
• Calculations (PROJECT)
• Correlation of streams from different data sources (JOIN)
• Check for absence of activity with a data source (EXISTS)
• Selection of events from streams (FILTER)
• Stream partitioning (GROUP & APPLY)
• Aggregation (SUM, COUNT, …)
• Ranking and heavy hitters (TOP-K)
• Temporal operations: hopping window, sliding window

Extensibility – to add new domain-specific operators

Page 13: Overview of  Microsoft  StreamInsight

LINQ Query Examples

LINQ Example – GROUP&APPLY, WINDOW:

    from e3 in MyStream3
    group e3 by e3.i into SubStream
    from win in SubStream.HoppingWindow(FiveMinutes, ThreeSeconds)
    select new { i = SubStream.Key, a = win.Avg(e => e.f) };

Operators used: Grouping, Window, Project & Aggregate

LINQ Example – JOIN, PROJECT, FILTER:

    from e1 in MyStream1
    join e2 in MyStream2 on e1.ID equals e2.ID
    where e1.f2 == "foo"
    select new { e1.f1, e2.f4 };

Operators used: Join, Filter, Project
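The grouped hopping-window average in the first query can be modeled in a few lines of Python to show what it computes. This is a sketch under stated assumptions, not the engine's semantics: events are treated as point events, windows are half-open intervals `[start, start + size)` aligned to multiples of the hop starting at time 0, and the names `hopping_avg`, `size`, and `hop` are made up for the example. In the slide's query the size would be five minutes and the hop three seconds.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str, float]  # (application time in seconds, group key i, value f)


def hopping_avg(events: List[Event], size: int, hop: int) -> Dict[Tuple[str, int], float]:
    """Average of f per (group key, window start) over windows [start, start + size)."""
    sums: Dict[Tuple[str, int], List[float]] = defaultdict(lambda: [0.0, 0])
    for ts, key, value in events:
        # Earliest window start (a multiple of hop) whose window still covers ts.
        first = max(0, ((ts - size) // hop + 1) * hop)
        for start in range(first, ts + 1, hop):
            acc = sums[(key, start)]
            acc[0] += value
            acc[1] += 1
    return {k: total / count for k, (total, count) in sums.items()}


events = [(1, "x", 2.0), (6, "x", 4.0), (7, "y", 10.0), (12, "x", 6.0)]
result = hopping_avg(events, size=10, hop=5)
```

Because the window is longer than the hop, each event lands in several overlapping windows, which is why the event at time 6 contributes to both the window starting at 0 and the one starting at 5.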

Page 14: Overview of  Microsoft  StreamInsight

Extensibility SDK

Built-in operators do not cover all functionality:
• Need for domain-specific extensions
• Integrate with functionality from existing libraries

Support for extensions in the CEP platform:
• User-defined operators, functions, aggregates
• Code written in .NET, deployed as .NET assembly
• Query operators and LINQ can refer to functionality of the assembly

Temporal snapshot operator framework:
• Interface to implement user-defined operators
• Manages operator state and snapshot changes
• Framework does the heavy lifting to deal with intricate temporal behavior such as out-of-order events

Page 15: Overview of  Microsoft  StreamInsight

Resiliency

Outages happen in computing:
• Power outages
• “Patch Tuesday”
• Human mistakes

Planned and unplanned downtime.

Systems need to be “resilient” to outages:
• Minimize damage
• Become operational again quickly

The specific requirements depend on how mission-critical your application is.

Page 16: Overview of  Microsoft  StreamInsight

Resiliency: Timeliness

Timeliness: recover from outages quickly.
• Goal is simple: as fast as possible.

StreamInsight doesn’t store event data, but it does store query state.
• This may be significant.
• This may be slow to recreate.

Page 17: Overview of  Microsoft  StreamInsight

Resiliency: Correctness

Three levels:

a) Exact equivalence.
• The same stream of events, regardless of outage.

b) Equivalent events.
• No missed events, and no wrong events, but duplicates are allowed.

c) Rough aggregation: get the moving average price of a stock over the last day.
• Missing a few inputs will result in inaccurate, but close results.
• Still don’t want to lose a day’s worth of work.
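One way to make the three levels concrete is to compare the output a query would ideally produce with what it actually produced across an outage. The Python sketch below does this for outputs modeled as lists of event labels; the function names and the idea of identifying events by distinct labels are assumptions for illustration only.

```python
from typing import List


def exact_equivalence(ideal: List[str], real: List[str]) -> bool:
    """Level a: the very same stream of events."""
    return ideal == real


def equivalent_events(ideal: List[str], real: List[str]) -> bool:
    """Level b: nothing missed, nothing wrong, duplicates tolerated."""
    return set(ideal) == set(real)


def missed_events(ideal: List[str], real: List[str]) -> List[str]:
    """Events the ideal output has but the real output lacks (relevant to level c)."""
    produced = set(real)
    return [e for e in ideal if e not in produced]


ideal = list("ABCDEFGH")
with_duplicates = list("ABBCDEFGH")      # B was emitted twice around the outage
lossy = ["A", "B", "F'", "G'", "H'"]     # state lost: later results differ
```

With these inputs, `with_duplicates` satisfies level b but not level a, and `lossy` satisfies neither; whether `lossy` is acceptable depends on how much inaccuracy the application tolerates.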

Page 18: Overview of  Microsoft  StreamInsight

What is Checkpointing?

Checkpointing saves a query’s state to disk.
• You control when the checkpoint is initiated.
• StreamInsight (SI) takes care of saving out consistent state.

After an outage, StreamInsight can restore this state.
• This limits state loss during an outage, speeding recovery.
• Level of correctness depends on additional work we are able to perform.
• Recovery process is coordinated by SI.
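The idea of saving and restoring query state can be sketched independently of the engine. The Python example below models a single stateful operator that writes its state to disk on request and is rebuilt from that file after a simulated outage. The `RunningAverage` class, the JSON file format, and the write-then-rename step are all assumptions chosen for illustration; they say nothing about how StreamInsight stores its checkpoints.

```python
import json
import os
import tempfile


class RunningAverage:
    """A stateful query operator: average of all values seen so far."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def process(self, value: float) -> float:
        self.total += value
        self.count += 1
        return self.total / self.count

    def checkpoint(self, path: str) -> None:
        # Write to a temporary file first, then rename, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp = path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"total": self.total, "count": self.count}, f)
        os.replace(tmp, path)

    @classmethod
    def restore(cls, path: str) -> "RunningAverage":
        op = cls()
        with open(path) as f:
            state = json.load(f)
        op.total, op.count = state["total"], state["count"]
        return op


path = os.path.join(tempfile.mkdtemp(), "query.ckpt")
op = RunningAverage()
for v in (10.0, 20.0):
    op.process(v)
op.checkpoint(path)                        # the application decides when to checkpoint
op.process(30.0)                           # processed after the checkpoint, lost in the outage
recovered = RunningAverage.restore(path)   # state as of the checkpoint
```

The recovered operator knows about the first two values only. The value processed after the checkpoint has to be supplied again by some other means, which is the "additional work" that determines the level of correctness.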

Page 19: Overview of  Microsoft  StreamInsight

Checkpointing API

public IAsyncResult server.BeginCheckpoint(Query query, AsyncCallback asyncCallback, object asyncState);

public bool server.EndCheckpoint(IAsyncResult asyncResult);

public void server.CancelCheckpoint(IAsyncResult asyncResult);

Page 20: Overview of  Microsoft  StreamInsight

When is Checkpointing Useful?

Provides a mechanism to recover from an outage:
• To recover from unexpected system failure.
• To handle expected outages (e.g., Patch Tuesday).
• For machine migration.

Not a panacea:
• Does not provide uninterrupted service.
• Does not protect against broken query logic.

Page 21: Overview of  Microsoft  StreamInsight

Using Checkpoints

We’ll walk through the three progressively stricter checkpointing scenarios:
1. State retention.
2. Equivalent events.
3. Exact equivalence.

Page 22: Overview of  Microsoft  StreamInsight

Low Bar: State Retention

Ideal output: HGFEDCBA …

Real output: BA H’G’F’ …

Page 23: Overview of  Microsoft  StreamInsight

Checkpointing

jihgfedc …

jihgfedc …

Enqueue markers into input streams to instruct operators to save their state.

Page 24: Overview of  Microsoft  StreamInsight

Checkpointing

jihgfedc …

jihgfedc …

oops

Page 25: Overview of  Microsoft  StreamInsight

Recovery

nmlkjihg …

nmlkjihg …

Load saved operator state and then start consuming input.

Page 26: Overview of  Microsoft  StreamInsight

Medium Bar: Equivalent Events

Ideal output: HGFEDCBA …

Real output: BA DCB …

Page 27: Overview of  Microsoft  StreamInsight

Filling the Gaps

StreamInsight needs help:
• Missing state since last checkpoint.
• Missed events during outage.

Solution: replayable adapters.

The dance:
1. StreamInsight picks a place in the input stream.
2. StreamInsight communicates this to the input adapter.
3. The input adapter replays from the chosen spot.
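The three steps of the dance can be simulated with a toy source that is able to restart from any position. In this Python sketch the class `ReplayableSource`, the function `run`, and the use of a list index as "a place in the stream" are assumptions for illustration; the query simply upper-cases each input so outputs are easy to compare with inputs.

```python
from typing import Iterator, List


class ReplayableSource:
    """Hypothetical input adapter that can restart from any position."""

    def __init__(self, events: List[str]) -> None:
        self._events = events

    def read_from(self, position: int) -> Iterator[str]:
        return iter(self._events[position:])


def run(source: ReplayableSource, start: int, stop: int) -> List[str]:
    """Consume input positions [start, stop) and emit one output per input."""
    out = []
    for i, event in enumerate(source.read_from(start), start):
        if i >= stop:
            break
        out.append(event.upper())
    return out


inputs = list("abcdefgh")
source = ReplayableSource(inputs)
checkpoint_at = 1   # engine state was saved after consuming 'a'
crash_at = 2        # the outage hit after 'a' and 'b' had been emitted

before = run(source, 0, crash_at)
# Steps 1 to 3: the engine picks the checkpointed position, hands it to the
# adapter, and the adapter replays from there.
after = run(source, checkpoint_at, len(inputs))
output = before + after
```

The combined output contains every expected event and no wrong ones, but "B" appears twice because it was emitted both before the outage and during replay. That is the "equivalent events" level: complete, with duplicates.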

Page 28: Overview of  Microsoft  StreamInsight

Checkpointing…

…jihgfedc kjihgfed lkjihgfe

jihgfedc kjihgfed lkjihgfe

Page 29: Overview of  Microsoft  StreamInsight

Recovery

lkjihgfe …

lkjihgfe …

Page 30: Overview of  Microsoft  StreamInsight

A Place in the Stream

[Figure: the physical stream “hgfedcba …” shown against an application-time axis (0 to 8) in several panels, with the high-water mark indicated]

Page 31: Overview of  Microsoft  StreamInsight

Communicating the State

Input adapter factories can optionally implement one of:
• IHighWaterMarkInputAdapterFactory
• IHighWaterMarkTypedInputAdapterFactory

In a recovery situation, StreamInsight will then call Create with a high-water mark. The factory is then responsible for properly cueing the input.
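The factory's job of cueing the input can be pictured with a small Python model. This is not the .NET interface named above, whose exact signatures the slides do not show: the classes `LogInputAdapter` and `LogInputAdapterFactory`, the optional `high_water_mark` argument, and the choice to resume at events whose application time is at or after the mark are all assumptions for illustration.

```python
from typing import Iterator, List, Optional, Tuple

TimedEvent = Tuple[int, str]  # (application time, payload)


class LogInputAdapter:
    """Hypothetical adapter reading events from an in-memory, time-ordered log."""

    def __init__(self, log: List[TimedEvent], resume_from: Optional[int]) -> None:
        self._log = log
        self._resume_from = resume_from

    def events(self) -> Iterator[TimedEvent]:
        for ts, payload in self._log:
            # On recovery, skip everything before the high-water mark.
            if self._resume_from is not None and ts < self._resume_from:
                continue
            yield ts, payload


class LogInputAdapterFactory:
    """Hypothetical factory; `high_water_mark` is None on a normal start."""

    def __init__(self, log: List[TimedEvent]) -> None:
        self._log = log

    def create(self, high_water_mark: Optional[int] = None) -> LogInputAdapter:
        return LogInputAdapter(self._log, high_water_mark)


log = [(0, "a"), (1, "b"), (2, "c"), (3, "d"), (4, "e")]
factory = LogInputAdapterFactory(log)
fresh = list(factory.create().events())
resumed = list(factory.create(high_water_mark=3).events())
```

A normal start reads the whole log, while a recovery start with a mark of 3 begins at event "d". The source must retain enough history to serve such a request, which is what makes an adapter replayable.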

Page 32: Overview of  Microsoft  StreamInsight

StreamInsight in Action

Internet of Things Demo

Page 33: Overview of  Microsoft  StreamInsight

The Demo

Sensor Data

Control Data

Status Data

Alert Data

Historical Data

StreamInsight “Austin”

Page 34: Overview of  Microsoft  StreamInsight

StreamInsight Design Principles

Scalability: aggregate data rate keeps increasing.
• Minimum resources impact (co-located).
• Local computation
• Avoid flooding the network

Programmability
• Extensibility: UserDefinedAggregates, UserDefinedFunctions, UserDefinedOperators.
• Composability.
• Developer experience (language, IDE, debugging, supportability)

Adaptability
• Easy to integrate via adapters.
• Portability (servers, edge devices)

Page 35: Overview of  Microsoft  StreamInsight

StreamInsight Architecture

[Figure: StreamInsight architecture within a host process. Components shown: web service; adapters; engine; compiler; expression / type service; runtime; execution operators; stream manager; event manager; query scheduler; plan manager; synopsis; command dispatcher; management service; metadata; diagnostics / tracing; stream OS.]

Page 36: Overview of  Microsoft  StreamInsight

Management Service

[Figure: StreamInsight architecture diagram, repeated from page 35]

Highlights
• Manageability API for query management (i.e., create, start, stop, delete query) and supportability / monitoring of running queries
• Same manageability API for both embedded deployment and web service clients

Page 37: Overview of  Microsoft  StreamInsight

Compiler & Expressions

[Figure: StreamInsight architecture diagram, repeated from page 35]

Highlights
• Standardized IL allows us to implement a variety of syntactic surfaces over the algebra, e.g., LINQ, CQL, etc.
   • Allows for domain-specific front-end languages
   • Prepared for future extensions
• Compile-time type checking and type-safe code generation for minimal runtime impact.
• Support for UDFs, UDAggs, UDOs.
• JIT code generation for field references and expression evaluation, for low-latency processing of high event rates.
• Basing on CLR helps leverage:
   • Code generator, JIT support
   • Type system
   • Tools and libraries (LINQ Expressions, IDE, etc.)

Page 38: Overview of  Microsoft  StreamInsight

Events & Streams

[Figure: StreamInsight architecture diagram, repeated from page 35]

Highlights
• JIT code generation for field references and expression evaluation, because interpreting these references is sub-optimal for low-latency processing of high event rates.
   • Leverage JIT code generation support in CLR runtime for LINQ expressions.
• Bind the query to different deployment environments based on the metadata.
• Event manager is implemented as a combination of managed and native code in order to minimize overhead and ensure predictable performance.
• Events are read-only and reference-counted by streams (minimize data copying)

Page 39: Overview of  Microsoft  StreamInsight

Query Scheduler

[Figure: StreamInsight architecture diagram, repeated from page 35]

Highlights
• A query is executed by scheduling the individual operators as they become active.
• Operator state transition is managed by the Scheduler.
• When an operator becomes active, a thread is scheduled for execution.
• Scheduling decision based on priority of the query and other parameters.
• Data flow architecture: reduced coupling and pipeline parallelism
• Operators are affinitized to a thread/core (multi-core environments) to decrease lock contention and increase caching benefits. Periodic checks and migration for load balancing.

Page 40: Overview of  Microsoft  StreamInsight

Execution Operators

[Figure: StreamInsight architecture diagram, repeated from page 35]

Highlights
• Efficient implementation of operators that perform incremental evaluation as each event is processed.
• Clean, formal semantics. Leverage relational semantics whenever possible.
• GroupAndApply Operator
   • Enables parallelism for scale-up (multi-core).
   • Groups are dynamically instantiated and torn down based upon the data. Large numbers of groups can be simultaneously active. (~50M active groups for MSN.com)

[Figure: GroupAndApply. Incoming events are split into groups A, B, and C; an Apply branch runs per group; a Union merges the per-group results X, Y, and Z into one output stream.]
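Dynamic instantiation and teardown of groups can be illustrated with a small incremental operator. The Python sketch below keeps one running count per group key, creates a group the first time its key appears, and removes it when its last event expires. The class `CountPerGroup` and its `insert`/`expire`/`snapshot` methods are hypothetical; this illustrates the idea and is not how the engine is implemented.

```python
from typing import Dict, List, Tuple


class CountPerGroup:
    """Groups are created on first use and torn down when they empty out."""

    def __init__(self) -> None:
        self._active: Dict[str, int] = {}

    def insert(self, key: str) -> None:
        # Instantiate the group on demand, then update it incrementally.
        self._active[key] = self._active.get(key, 0) + 1

    def expire(self, key: str) -> None:
        remaining = self._active[key] - 1
        if remaining == 0:
            del self._active[key]          # tear the group down
        else:
            self._active[key] = remaining

    def snapshot(self) -> List[Tuple[str, int]]:
        """Union of the per-group results, ordered by key."""
        return sorted(self._active.items())


op = CountPerGroup()
for key in ("A", "B", "A", "C"):
    op.insert(key)
op.expire("B")
groups = op.snapshot()
```

Memory use follows the number of currently active groups rather than the number of keys ever seen, which is what makes very large numbers of simultaneous groups feasible.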

Page 41: Overview of  Microsoft  StreamInsight

The StreamInsight Team

• Founded in 2008 based on incubation between MSR and SQL teams
• Small team, by Microsoft standards
• Roles in Microsoft engineering teams:
   • Program Managers: customer scenarios, functional specs, APIs, project management, evangelism
   • Developers: architecture, technical design, product code, unit tests
   • Testers: test breakout, test code, lab runs, release signoff
• Using agile development methods

Page 42: Overview of  Microsoft  StreamInsight

StreamInsight Roadmap

StreamInsight 2.1 (on-premises):
• Development experience
• Major API overhaul

StreamInsight on Azure (cloud):
• StreamInsight service on Windows Azure
• Currently private CTP
• GA this summer

Development process:
• Using Scrum to organize and manage schedules
• Work organized in sprints/milestones
• CTP (Community Technology Preview) after each milestone, similar to a public beta
• TAP (Technology Adopter Program) as we get closer to the planned release

Page 43: Overview of  Microsoft  StreamInsight

For More Information

StreamInsight download location: http://go.microsoft.com/fwlink/?LinkId=160598
StreamInsight blog: http://blogs.msdn.com/streaminsight/
StreamInsight MSDN documentation: http://msdn.microsoft.com/en-us/library/ee362541(SQL.105).aspx
StreamInsight MSDN portal: http://msdn.microsoft.com/en-us/ee476990.aspx

Page 44: Overview of  Microsoft  StreamInsight