Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go,...

Extracting Real-Time Insights from Streaming Data Roger Barga, GM Robotics & Autonomous Systems Amazon Web Services

Page 1: Extracting Real-Time Insights from Streaming Data · Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of

Extracting Real-Time Insights from Streaming Data

Roger Barga, GM

Robotics & Autonomous Systems

Amazon Web Services

#Real-time

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Process Real-Time Streaming Data

Ingest, Transform, Analyze, React, Persist

Key Requirements?

• Durable
• Continuous
• Fast
• Correct
• Reactive
• Reliable

We All Use Kinesis

• Amazon CloudWatch Logs
• Amazon S3 Events
• AWS Metering
• Amazon.com’s Catalog

At Hearst, the Clickstream is the real-time transmission, collection, and processing of the data generated by Hearst customers across all their devices

Action Data

Scrolling or Mouse movement

Event Data

Highlighting Text, Listening to Audio, Playing Video

Geospatial Data

Lat/Lon, GPS, Movement, Proximity

Sensor Data

Pulse, Gait, Body Temperature

What business is Hearst in?

Magazines: 20 U.S. titles & nearly 300 international titles

Newspapers: 15 daily & 34 weekly titles

Broadcasting: 30 television & 2 radio stations

Business Media: Operates more than 20 business-to-business businesses with significant holdings in the auto, electronic, medical and financial industries

Hearst has over 300 websites world-wide, which results in 1TB of data per day and over 20 billion pageviews per year.

What business is Hearst in?

“Hearst is in the Data Creation Business”

Buzzing@Hearst

Business Value

Real-Time Reactions: Instant feedback on articles from their audiences

Promoting Popular Content Cross-Channel: Incremental re-syndication of popular articles across properties (e.g., trending newspaper articles can be adopted by magazines)

Authentic Influence: Inform Hearst editors to write articles that are more relevant to their audiences

Understanding Engagement: Inform editors which channels their audiences are leveraging to read Hearst articles

INCREMENTAL REVENUE

25% more page views

15% more visitors

Customer Examples

• 1 billion events per week from connected devices
• Near-real-time home valuation (Zestimates)
• Live clickstream dashboards refreshed under 10 ms
• 100 GB/day clickstreams from 250+ sites
• 50 billion daily ad impressions, sub-50 ms responses
• Online stylist processing 10 million events/day
• Migrated data bus from Kafka to Kinesis
• Facilitate communications between 100+ microservices
• Analyze billions of network flows in real-time
• Real-time home recommendations

Stream New Data in Seconds: Get actionable insights quickly

Streaming

Ingest video & data as it’s generated

Process data on the fly

Real-time analytics/ML, alerts, actions

Quick Demo: http://amzn.to/takeselfie

Dilip Kumar, VP of Technology for Amazon Go, cites as an example Amazon Go’s unique system of streaming data from hundreds of cameras to track the shopping activities of customers. The innovations his team concocted helped influence an AWS service called Kinesis, which allows customers to stream video from multiple devices to the Amazon cloud, where they can process it, analyze it, and use it to further advance their machine learning efforts.

Speed Matters

Kinesis Streaming Services

Robust Random Cut Forest: Summary of a dynamic data stream, highly efficient, wide range of use cases…

Random Cut Tree

Recurse: The cutting stops when each point is isolated.

Range-biased Cut

[Figure: a cut along one axis, with tick labels 0, 5, and 10]
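The recursion on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation behind the talk: the dictionary-based node layout, the names `build_random_cut_tree` and `leaf_depths`, and the sample points are all assumptions made for the example. It picks a dimension with probability proportional to its range (the range-biased cut), cuts uniformly within that range, and recurses until each point is isolated.

```python
import random

def build_random_cut_tree(points, rng):
    """Recursively cut a list of distinct points (tuples) until each is isolated."""
    points = list(points)
    if len(points) == 1:
        return {"point": points[0]}          # leaf: one isolated point
    dims = list(range(len(points[0])))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    spans = [hi - lo for lo, hi in zip(lows, highs)]
    while True:
        # Range-biased choice: longer sides of the bounding box are cut more often.
        dim = rng.choices(dims, weights=spans)[0]
        # Uniform cut position inside the chosen range.
        cut = rng.uniform(lows[dim], highs[dim])
        left = [p for p in points if p[dim] <= cut]
        right = [p for p in points if p[dim] > cut]
        if left and right:                   # retry in the rare case a side is empty
            break
    return {"dim": dim, "cut": cut,
            "left": build_random_cut_tree(left, rng),
            "right": build_random_cut_tree(right, rng)}

def leaf_depths(node, depth=0):
    """Map each point to the depth at which it was isolated."""
    if "point" in node:
        return {node["point"]: depth}
    depths = leaf_depths(node["left"], depth + 1)
    depths.update(leaf_depths(node["right"], depth + 1))
    return depths

# Hypothetical sample: four nearby points and one far-away point.
points = [(0.0, 0.0), (1.0, 0.5), (0.5, 1.0), (0.2, 0.8), (10.0, 10.0)]
tree = build_random_cut_tree(points, random.Random(7))
depths = leaf_depths(tree)
```

Far-away points tend to be separated by an early cut, so they usually sit at a shallow depth, which is the intuition the later anomaly-score slides build on.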

Random Cut Forest

Each tree built on a random sample.

Random Sample of a Stream

Reservoir Sampling [Vitter]

Maintain random sample of 5 points in a stream?

Keep heads with probability 5/6

Discard tails with probability 1/6
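The coin flip above is the step reservoir sampling takes when the 6th point arrives at a reservoir of size 5. A minimal sketch of the classic scheme follows; the function name `reservoir_sample` and the integer test stream are assumptions for illustration. The t-th item is kept with probability k/t and, if kept, replaces a uniformly chosen slot.

```python
import random

def reservoir_sample(stream, k, rng):
    """Maintain a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for t, item in enumerate(stream, start=1):
        if t <= k:
            reservoir.append(item)               # fill the reservoir first
        elif rng.random() < k / t:               # keep with probability k/t (5/6 when k=5, t=6)
            reservoir[rng.randrange(k)] = item   # evict a uniformly chosen resident
        # otherwise discard the new item (probability 1 - k/t)
    return reservoir

sample = reservoir_sample(range(100), k=5, rng=random.Random(42))
short = reservoir_sample(range(3), k=5, rng=random.Random(42))
```

Only the reservoir is held in memory, which is what makes the scheme suitable for unbounded streams.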

Insert – Case I

Start with the Root

If the point falls inside the bounding box

follow the path to the appropriate child

Insert – Case II

Theorem: Insert generates a tree T’ ~ T( )

What is an Outlier?

Anomaly Score: Displacement

A point is an anomaly if its insertion greatly increases the tree size (= sum of path lengths from root to leaves = description length).

Inlier:

Anomaly Score: Displacement

Outlier
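One way to make the displacement idea concrete is sketched below. It assumes a common formulation in which removing a leaf lifts its sibling subtree up by one level, so the change in total path length attributable to the other points equals the number of leaves in that sibling subtree. The tree layout, the function names, and the sample data are assumptions for illustration, and the score is averaged over independently built trees rather than over a maintained streaming forest.

```python
import random

def build_tree(points, rng):
    """Random cut tree over distinct points (tuples); each leaf holds one point."""
    if len(points) == 1:
        return {"point": points[0]}
    dims = list(range(len(points[0])))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    spans = [hi - lo for lo, hi in zip(lows, highs)]
    while True:
        dim = rng.choices(dims, weights=spans)[0]   # range-biased axis choice
        cut = rng.uniform(lows[dim], highs[dim])    # uniform cut in range
        left = [p for p in points if p[dim] <= cut]
        right = [p for p in points if p[dim] > cut]
        if left and right:
            break
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, rng), "right": build_tree(right, rng)}

def count_leaves(node):
    if "point" in node:
        return 1
    return count_leaves(node["left"]) + count_leaves(node["right"])

def displacement(node, query):
    """Leaves in the sibling subtree of the leaf holding `query` (must be in the tree)."""
    if "point" in node:
        return 0                                    # single-point tree: nothing to displace
    on_left = query[node["dim"]] <= node["cut"]
    child = node["left"] if on_left else node["right"]
    sibling = node["right"] if on_left else node["left"]
    if "point" in child:
        return count_leaves(sibling)                # these leaves move up if query is removed
    return displacement(child, query)

def average_displacement(points, query, num_trees=100, seed=0):
    rng = random.Random(seed)
    total = sum(displacement(build_tree(points, rng), query) for _ in range(num_trees))
    return total / num_trees

# Hypothetical data: a tight 5x4 grid of inliers plus one distant point.
cluster = [(x / 10, y / 10) for x in range(5) for y in range(4)]
outlier = (10.0, 10.0)
points = cluster + [outlier]
outlier_score = average_displacement(points, outlier)
inlier_score = average_displacement(points, (0.2, 0.1))
```

In this setup the distant point is usually split off near the root, so its sibling subtree contains most of the data and its average displacement is much larger than that of a point inside the cluster.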

NYC Taxi Ridership

[Figure: numPassengers (y-axis, 0 to 30,000) over one week, 2014-12-01 00:00 to 2014-12-07 20:00, with day labels Mon through Sat and time-of-day callouts at 8am, 11am, 4pm, 6pm, and 11pm]

Data aggregated every 30 minutes, Shingle Size: 48

1Public Data: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml
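With 30-minute aggregates, a shingle of size 48 covers 24 hours, so each point fed to the forest describes a full day of ridership rather than a single reading. A minimal sketch of shingling, assuming a plain list of values and a hypothetical helper name `shingle`:

```python
def shingle(values, size):
    """Turn a 1-D series into overlapping windows of `size` consecutive values."""
    values = list(values)
    return [tuple(values[i:i + size]) for i in range(len(values) - size + 1)]

# Tiny example with a shingle size of 3.
small = shingle([0, 1, 2, 3, 4], 3)

# 100 half-hour buckets with a shingle size of 48 give 53 overlapping day-long windows.
windows = shingle(range(100), 48)
```

Each window is then treated as one 48-dimensional point, which lets the model compare the shape of a day against the shapes of other days.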

NYC Taxi Data

[Figure: numPassengers (y-axis, 0 to 45,000) from 2014-09-16 to 2015-01-29]

NYC Taxi Data

[Figure: numPassengers and Anomaly Score from 2014-09-16 to 2015-01-29 (axis ranges 0 to 45,000 and 0 to 100), with labeled events: NYC Marathon, Christmas, New Years, Snowstorm]

Robust Random Cut Forests: Quick Summary

• Random forests (RF) define ensemble models;
• The cut in RCF corresponds to a specific choice of partitioning:
  • Take a set of points, compute the bounding box;
  • Choose an axis with probability proportional to its length (biased), then choose a uniform random cut in its range;
  • Recurse on both sides.
• Isolation Forests: Partition at random, dimensions unbiased;
• Why RCFs? Can maintain RCF tree distributions efficiently and do so online as data is streaming in. Distance preserving.
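The difference between the biased and unbiased axis choice can be shown with a small calculation. The helper names and the three sample points below are assumptions for illustration: the first function weights each axis by the side length of the bounding box, as in the summary above, and the second gives every axis the same weight, as in an isolation forest.

```python
def rcf_axis_probabilities(points):
    """Probability of cutting each axis when weighted by bounding-box side length."""
    dims = range(len(points[0]))
    spans = [max(p[d] for p in points) - min(p[d] for p in points) for d in dims]
    total = sum(spans)
    return [s / total for s in spans]

def isolation_forest_axis_probabilities(points):
    """Probability of cutting each axis when every dimension is equally likely."""
    n = len(points[0])
    return [1 / n] * n

# Hypothetical box that is 9 units wide in x and 1 unit tall in y.
points = [(0.0, 0.0), (9.0, 1.0), (3.0, 0.5)]
biased = rcf_axis_probabilities(points)
unbiased = isolation_forest_axis_probabilities(points)
```

For this box the range-biased rule cuts the x axis about nine times as often as the y axis, while the unbiased rule cuts both equally often regardless of scale.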

Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 2016.

What we are going to cover…

Random Cut Forests

Anomaly Detection

Density Estimation

Forecasting

Semi-Supervised Learning

Attribution & Directionality

Explainable/Transparent/Interpretable ML

“If my time-series data with 30 features yields an unusually high anomaly score. How do I explain why this particular point in the time-series is unusual? [..] Ideally I’m looking for some way to visualize “feature importance” for a specific data point.”

--- Robin Meehan, Inasight.com

What is Attribution?

It’s the ratio of the “distance” of the anomaly from normal.

(It’s a distance in space of repeated patterns in the data.)

What is Attribution?

[Figure: plot with y-axis labeled "Anomaly Score"]

NYC Taxi Ridership Data1

1Public Data: http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml

[Figure: plot with y-axis labeled "Anomaly Score"]

Explaining Anomalies

The Moving Example

A Fan/Turbine

1000 pts in each blade

Gaussian, for simplicity

Blades designed unequal

Rotate counterclockwise

3 special “query” points

100 trees, 256 points each
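A synthetic data set like the one described on this slide could be generated along the following lines. This is only a sketch: the slide gives the number of points per blade, the Gaussian shape, the unequal blades, and the counterclockwise rotation, while the blade lengths, the spreads, the 120-degree spacing, and the name `make_fan` are assumptions chosen for the example. The forest settings (100 trees, 256 points each) and the query points are not modeled here.

```python
import math
import random

def make_fan(angle_deg, rng, points_per_blade=1000, blade_lengths=(1.0, 0.8, 0.6)):
    """Three Gaussian 'blades', 120 degrees apart, rotated counterclockwise by angle_deg.

    blade_lengths and the spreads below are hypothetical values, not from the talk.
    """
    points = []
    for i, length in enumerate(blade_lengths):
        theta = math.radians(angle_deg + 120 * i)
        c, s = math.cos(theta), math.sin(theta)
        for _ in range(points_per_blade):
            u = rng.gauss(length, length / 4)   # position along the blade axis
            v = rng.gauss(0.0, 0.05)            # small spread across the blade
            # Rotate blade-local (u, v) into world coordinates.
            points.append((u * c - v * s, u * s + v * c))
    return points

# Same random draws at two rotation angles, 90 degrees apart.
fan_0 = make_fan(0, random.Random(1))
fan_90 = make_fan(90, random.Random(1))
```

Generating one snapshot per angle and feeding the snapshots to a forest in sequence gives a moving data set against which fixed query points can be scored.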

Anomaly Score at P1

What is going on at 90 degrees?

Blade overhead = Not an anomaly

All 3 Blades

Attribution

p1 is far away in x-coord most of the time

But what is happening to y?

x coordinate’s contribution for p1?

Directionality

Sharp transition when the blade moves from above to below at p1! Total score plummets.

Slowly rotating away: Total score remains high

Forecasting

Exhibit B: Forecasting using RCFs

Not just the next value!

Ability to “see past” anomalies

Auto-detect periodicity …

Forecasting implies missing value imputation!

Wait, this is just a sine wave … it’s easy ...

Maybe not…

Realistic Data?

ECG (one lead)

Periodicity unclear …

Shingle Size = 185

Test on hold out data

Paper in preparation! (Sudipto Guha)

Semi-Supervised Learning on Data Streams

One Hour

Amazon.com Orders per Minute

In a long stream…

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018.

What we saw today…

Random Cut Forests

Anomaly Detection

Density Estimation

Forecasting

Semi-Supervised Learning

Attribution & Directionality

Detecting Anomalies in Streaming Graphs

Contributors To This Project

Roger Barga, Dhivya Eswaran, Gaurav Ghare, Kapil Chhabra, Sudipto Guha, Shiva Kasiviswanathan, Nina Mishra, Gourav Roy, Joshua Tokle, and Tal Wagner