[email protected] @william markito · 3 - Results are pushed immediately to deployed applications...

1 Pivotal Confidential–Internal Use Only


Page 1: wmarkito@pivotal.io @william markito · 3 - Results are pushed immediately to deployed applications 4 - “Hot" data ages, becoming part of the historical dataset 5 - Re-training


Page 2:

Implementing a highly scalable stock prediction system with Apache Geode (incubating), Spring XD and Spark MLlib

William Markito (@william_markito)

Fred Melo (@fredmelo_br)

Page 3:

About us

Fred Melo

Technical Director for Data

[email protected]

@fredmelo_br

William Markito

Enterprise Architect for GemFire

wmarkito@pivotal.io

@william_markito

Page 4:

A Simple Example

Data Sources → Look for patterns → Forecast

Page 5:

Page 6:

"Smart System"

Applicability

Page 7:

What do we want to build?

Smart System

Trading Data

• Historical: learns from HISTORICAL TRENDS
• Real-Time: evaluates LIVE DATA
• Live data becomes historical over time

"How were the technical indicator readings when the latest price drops happened?"

"According to historical trends, there's an 80% chance this stock's price will go down within the next few minutes."

Page 8:

The Machine Learning Pipeline data flow

Live Data

Data Temperature: Hot, Cold

Components: Apache Geode / GemFire, Apache HAWQ, Spring XD, Machine Learning model

1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
4 - "Hot" data ages, becoming part of the historical dataset
5 - Re-training triggered, ML model updated
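The five numbered steps can be sketched as a small loop. This is only a toy model in plain Python, not Geode, Spring XD or HAWQ code: the `Pipeline` class, its `hot_capacity` parameter and the "model" (a mean of historical prices) are assumptions made for illustration.

```python
from collections import deque
from statistics import mean


class Pipeline:
    """Toy model of the five-step flow (hypothetical, for illustration only)."""

    def __init__(self, hot_capacity=3):
        self.hot = deque()          # stands in for the in-memory grid
        self.cold = []              # stands in for the historical dataset
        self.hot_capacity = hot_capacity
        self.baseline = None        # the "model": mean of historical prices
        self.pushed = []            # results delivered to applications
        self.retrain_count = 0

    def retrain(self):
        # Step 5: rebuild the model from the historical dataset.
        self.baseline = mean(self.cold)
        self.retrain_count += 1

    def ingest(self, price):
        # Step 1: live data enters the hot tier.
        self.hot.append(price)
        # Step 2: the trained model compares new data to historical patterns.
        if self.baseline is not None:
            signal = "above" if price > self.baseline else "at-or-below"
            # Step 3: the result is pushed immediately.
            self.pushed.append((price, signal))
        # Step 4: the oldest hot entries age into the historical dataset.
        aged = False
        while len(self.hot) > self.hot_capacity:
            self.cold.append(self.hot.popleft())
            aged = True
        if aged:
            self.retrain()


pipeline = Pipeline(hot_capacity=2)
for price in [10.0, 12.0, 11.0, 15.0]:
    pipeline.ingest(price)
```

In this sketch re-training runs every time data ages out of the hot tier; a real deployment would more likely trigger it on a schedule or a volume threshold.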

Page 9:

The Machine Learning Pipeline data flow: Simplified Model

Live Data

Data Temperature: Hot, Warm

Components: Apache Geode / GemFire, Spring XD, Machine Learning model

1 - Live data is ingested into the grid
2 - Trained ML model compares new data to historical patterns
3 - Results are pushed immediately to deployed applications
5 - Re-training triggered, ML model updated

Page 10:

[Diagram: Spring XD pipeline]

Spring XD: Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native

Inputs: Real data, Simulator

Processing: Split, Enrich, Filter, Transform, Machine Learning, Sink

Stage labels (markers 1, 2, 3): Indicators, Predict, Dashboard

Regions: /Stocks, /TechIndicators, /Predictions

Page 11:

Too complex? Let's eat it in small bites…

Page 12:

SpringXD GemFire

Page 13:

[Spring XD pipeline diagram repeated from Page 10]

Page 14:

Apache Geode Concepts

• Cache
  • Configurable through XML, Java
• Region
  • Distributed java.util.Map on steroids
  • Highly available, redundant
• Member
  • Locator, Server, Client
• Callbacks
  • Listener, Writer, AsyncEventListener, Parallel/Serial

Regions: /Stocks, /TechIndicators, /Predictions
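To make the cache, region and listener concepts concrete, here is a toy model in plain Python. It is not the Geode API: `Cache`, `Region`, `create_region`, `add_listener` and the ticker `XYZ` are hypothetical names, and the sketch ignores distribution, redundancy and members entirely. It only shows the idea of a named map that fires a callback after each write.

```python
class Region(dict):
    """Toy stand-in for a region: a map that notifies listeners after writes."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.listeners = []

    def add_listener(self, callback):
        # callback(region_name, key, old_value, new_value)
        self.listeners.append(callback)

    def put(self, key, value):
        old = self.get(key)
        self[key] = value
        for callback in self.listeners:
            callback(self.name, key, old, value)


class Cache:
    """Toy cache: owns the named regions."""

    def __init__(self):
        self.regions = {}

    def create_region(self, name):
        # Return the existing region if the name is already registered.
        if name not in self.regions:
            self.regions[name] = Region(name)
        return self.regions[name]


cache = Cache()
stocks = cache.create_region("/Stocks")
events = []
stocks.add_listener(lambda region, key, old, new: events.append((region, key, old, new)))
stocks.put("XYZ", 20.5)
stocks.put("XYZ", 21.0)
```

After the two writes, `events` holds one entry per update, each carrying the old and new value, which is the kind of hook a downstream step such as indicator calculation could react to.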

Page 15:

Apache Geode HA and Fault Tolerance

Page 16:

[Spring XD pipeline diagram repeated from Page 10]

Page 17:

Spring XD

Streams, Pipelines, Sources, Sinks, Filters, Taps

[Diagram: Split, Enrich, Filter, Transform, Predict, Sink; markers 1, 2, 3]
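The stream vocabulary above (source, filter, tap, sink) can be illustrated with Python generators. This is a sketch of the concepts only, not Spring XD's DSL or API; every function name and the sample tickers are assumptions made for the example.

```python
def source(ticks):
    """Source: emits raw messages one at a time."""
    yield from ticks


def keep(predicate, stream):
    """Filter: passes along only the messages that satisfy the predicate."""
    return (msg for msg in stream if predicate(msg))


def transform(fn, stream):
    """Transform: maps each message to a new message."""
    return (fn(msg) for msg in stream)


def tap(stream, observer):
    """Tap: copies each message to an observer without altering the stream."""
    for msg in stream:
        observer(msg)
        yield msg


def sink(stream, store):
    """Sink: drains the stream into a keyed store (latest value per symbol)."""
    for msg in stream:
        store[msg["symbol"]] = msg


ticks = [
    {"symbol": "AAA", "price": 10.0},
    {"symbol": "BBB", "price": -1.0},   # invalid tick, dropped by the filter
    {"symbol": "AAA", "price": 10.4},
]
tapped = []
stocks = {}

stream = source(ticks)
stream = keep(lambda m: m["price"] > 0, stream)
stream = tap(stream, tapped.append)
stream = transform(lambda m: {**m, "price_cents": round(m["price"] * 100)}, stream)
sink(stream, stocks)
```

Because the stages are lazy generators, nothing flows until the sink pulls messages through, and the tap sees exactly what passed the filter.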

Page 18:

[Spring XD pipeline diagram repeated from Page 10]

Page 19:

Machine Learning Model (e.g. Linear Regression)

Features: price(x), medium avg (x), relative strength (x)

Label: medium avg (x+1)
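The feature/label layout can be sketched in plain Python. Several assumptions apply: "medium avg" is read here as a simple moving average, "relative strength" as a simplified RSI-style indicator on a 0 to 100 scale, and an ordinary least-squares fit stands in for the Spark MLlib model used in the talk. All function names and the `window` parameter are hypothetical.

```python
def moving_average(prices, end, window):
    """Mean of the `window` prices ending at index `end` (inclusive)."""
    chunk = prices[end - window + 1 : end + 1]
    return sum(chunk) / window


def relative_strength(prices, end, window):
    """Simplified RSI over the last `window` price changes (assumed variant)."""
    gains = losses = 0.0
    for i in range(end - window + 1, end + 1):
        change = prices[i] - prices[i - 1]
        if change > 0:
            gains += change
        else:
            losses -= change
    total = gains + losses
    return 50.0 if total == 0 else 100.0 * gains / total


def build_rows(prices, window):
    """Features at time x: price, moving avg, RSI. Label: moving avg at x+1."""
    features, labels = [], []
    for x in range(window, len(prices) - 1):
        features.append([
            prices[x],
            moving_average(prices, x, window),
            relative_strength(prices, x, window),
        ])
        labels.append(moving_average(prices, x + 1, window))
    return features, labels


def fit_linear(features, labels):
    """Least squares with an intercept, solved through the normal equations.

    Fails with a division by zero if the features are exactly collinear.
    """
    rows = [[1.0] + list(f) for f in features]
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, labels))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, feature_row):
    """Apply the fitted weights (intercept first) to one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature_row))


features, labels = build_rows([10, 11, 12, 11, 13], window=2)
```

With enough rows from `build_rows`, `fit_linear` yields the weights and `predict` estimates the next moving average from the latest feature row.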

Page 20:

[Machine Learning Model slide repeated from Page 19]

Page 21:

Demo Time

Error

Page 22:

Source code and detailed instructions available at:
https://github.com/Pivotal-Open-Source-Hub/StockInference-Spark

Follow us on Twitter!

William Markito (@william_markito)

Fred Melo (@fredmelo_br)
