Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/ SPRINGONE2GX WASHINGTON, DC Implementing a highly scalable Stock prediction system with R, Apache Geode and Spring XD Fred Melo @fredmelo_br William Markito @william_markito

Transcript of Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Page 1: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

SPRINGONE2GX WASHINGTON, DC

Implementing a highly scalable Stock prediction system with R, Apache Geode and Spring XD

Fred Melo @fredmelo_br

William Markito @william_markito

Page 2: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

About us

Fred Melo

Technical Director for Data

[email protected]

@fredmelo_br

William Markito

Enterprise Architect for GemFire

[email protected]

@william_markito

Page 3: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Page 4: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

It's all about DATA

Data Sources

Look for patterns

Prediction

Page 5: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

What do we want to build?

"Smart System"

Page 6: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

… in our specific case

Trading Data

"Smart System"

Historical Data Repository

Learns with historical trends

"How were the medium average price and relative strength readings when the latest failures happened?"

Live data becomes historical over time

Real-Time Evaluates live data

"According to historical trends, there's an 80% chance this stock's price will go downhill within the next hour"

Historical
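
The two readings named in the question above can be computed from a plain price series. The following is a minimal sketch, not the presenters' code. It assumes that "medium average" means a simple moving average and that "relative strength" means a relative strength index built from plain averages of gains and losses. The function names and the window sizes are hypothetical.

```python
from collections import deque


def moving_average(prices, window):
    """Simple moving average; None until `window` prices have been seen."""
    out = []
    recent = deque(maxlen=window)
    total = 0.0
    for p in prices:
        if len(recent) == window:
            total -= recent[0]  # this value is evicted by the append below
        recent.append(p)
        total += p
        out.append(total / window if len(recent) == window else None)
    return out


def relative_strength_index(prices, window):
    """RSI over the last `window` price changes (assumed variant: plain averages)."""
    out = [None] * len(prices)
    for i in range(window, len(prices)):
        changes = [prices[j] - prices[j - 1] for j in range(i - window + 1, i + 1)]
        gains = sum(c for c in changes if c > 0)
        losses = sum(-c for c in changes if c < 0)
        if losses == 0:
            # Only rising prices report 100; a flat window is treated as neutral here.
            out[i] = 100.0 if gains > 0 else 50.0
        else:
            rs = gains / losses
            out[i] = 100.0 - 100.0 / (1.0 + rs)
    return out


if __name__ == "__main__":
    prices = [10, 11, 12, 11, 13]
    print(moving_average(prices, 3))
    print(relative_strength_index(prices, 2))
```

Both functions return one value per input price, so the results line up index by index with the original series.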

Page 8: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

The ML pipeline data flow

1 - Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

4 - "Hot" data ages, becoming part of the historical dataset

5 - Re-training is triggered, updating the model with the latest historical data

Components: Live Data, Spring XD, Apache Geode / GemFire, Machine Learning model, Greenplum DB

Data Temperature: Hot, Cold
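
The five steps of this data flow can be sketched in a single process. This is a toy illustration under stated assumptions, not the presenters' implementation: plain Python lists stand in for the in-memory grid and the historical store, and the class name, the `train_mean_model` trainer, and the `hot_capacity` and `retrain_every` parameters are all hypothetical.

```python
def train_mean_model(history):
    """Hypothetical trainer: predicts the mean of the historical prices."""
    if not history:
        return lambda tick: tick  # nothing learned yet: echo the input
    mean = sum(history) / len(history)
    return lambda tick: mean


class PipelineSketch:
    """Single-process stand-in for the five steps of the data flow."""

    def __init__(self, train, hot_capacity, retrain_every):
        self.train = train
        self.model = train([])
        self.hot = []            # stands in for the in-memory grid
        self.historical = []     # stands in for the historical store
        self.subscribers = []    # stands in for deployed applications
        self.hot_capacity = hot_capacity
        self.retrain_every = retrain_every
        self.retrain_count = 0
        self._aged_since_retrain = 0

    def ingest(self, tick):
        self.hot.append(tick)                        # 1 - ingest live data
        prediction = self.model(tick)                # 2 - evaluate with the model
        for notify in self.subscribers:              # 3 - push the result
            notify(tick, prediction)
        while len(self.hot) > self.hot_capacity:     # 4 - hot data ages
            self.historical.append(self.hot.pop(0))
            self._aged_since_retrain += 1
        if self._aged_since_retrain >= self.retrain_every:
            self.model = self.train(self.historical)  # 5 - re-train
            self._aged_since_retrain = 0
            self.retrain_count += 1
        return prediction


if __name__ == "__main__":
    pipeline = PipelineSketch(train_mean_model, hot_capacity=2, retrain_every=2)
    pipeline.subscribers.append(lambda tick, pred: print(tick, "->", pred))
    for price in [10.0, 12.0, 14.0, 16.0, 18.0]:
        pipeline.ingest(price)
```

Re-training is triggered here by a simple count of aged records. That trigger is a design choice of the sketch, since the slide does not say what triggers it.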

Page 9: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Simplified demo model

1 - Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

4 - Re-training is triggered, updating the model with the latest historical data

Components: Live Data, Spring XD, Apache Geode / GemFire, Machine Learning model

Data Temperature: Hot, Warm

Page 10: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

SpringXD

• Extensible
• Open-Source
• Fault-Tolerant
• Horizontally Scalable
• Cloud-Native

Diagram labels:

• Data: Real data, Simulator
• Stream steps: Transform, Enrich, Filter, Split, Sink
• Stages: Indicators, Predict, Machine Learning, Dashboard
• Regions: /Stocks, /TechIndicators, /Predictions

Page 11: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Eating it in small bites…

Page 12: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

SpringXD GemFire

Page 13: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Apache Geode & GemFire Concepts

• Cache
    • Configurable through XML, ,Java
• Region
    • Distributed j.u.Map on steroids
    • Highly available, redundant
• Member
    • Locator, Server, Client
• Callbacks
    • Listener, Writer, AsyncEventListener, Parallel/Serial
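
The region and callback concepts can be illustrated with a toy map: a writer runs before a change and may veto it, and listeners run after the change has been applied. This is a sketch in plain Python, not the Geode or GemFire API. The class name and callback signatures are hypothetical, and distribution, redundancy and asynchronous listeners are left out.

```python
class RegionSketch:
    """Toy key-value region with a before-write hook and after-write listeners."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self.writer = None     # called before a change; may veto it by raising
        self.listeners = []    # called after a change has been applied

    def put(self, key, value):
        old = self._data.get(key)
        if self.writer is not None:
            self.writer(key, old, value)   # an exception here leaves the data untouched
        self._data[key] = value
        for listener in self.listeners:
            listener(key, old, value)

    def get(self, key):
        return self._data.get(key)


def reject_negative(key, old, new):
    """Hypothetical writer: refuses negative prices."""
    if new < 0:
        raise ValueError("negative price for " + key)


if __name__ == "__main__":
    stocks = RegionSketch("Stocks")
    stocks.writer = reject_negative
    stocks.listeners.append(lambda k, old, new: print(k, old, "->", new))
    stocks.put("AAPL", 110.0)
    stocks.put("AAPL", 112.5)
```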

Page 14: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Apache Geode & GemFire, why?

• Performance

• Consistency

• Resiliency

Page 15: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Apache Geode & GemFire, why?

Pivotal GemFire High Availability and Fault Tolerance in 6 acts

Failing data copies are replaced transparently

Data is replicated to other clusters and sites (WAN)

Network segmentations ("split brain") are identified and fixed automatically

Client and cluster disconnections are handled gracefully

Data is persisted on local disk for ultimate durability

Failed function executions are restarted automatically

Page 16: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Some interesting cases…

China Railway Corporation

• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second

Indian Railways

• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute

* http://pivotal.io/big-data/pivotal-gemfire

Page 17: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Use cases and industries

Indian Railways: population 1,251,695,616

China Railway Corporation: population 1,401,586,609

World: ~7,349,000,000

~36% of the world population

Page 18: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Apache Geode & Pivotal GemFire

Pivotal GemFire

• Commercial product available since 2004
• Native clients in Java, C++, C#, REST
• Event Subscriptions and Continuous Queries
• Configurable WAN Gateway between clusters
• Enterprise Support, commercial features

Apache Geode

• Open sourced in April 2015
• Java Native Client, REST
• 98% of GemFire API
• Event subscriptions
• ~30 contributors
• Under Incubation

Page 19: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

SpringXD GemFire

Page 20: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

SpringXD Basic Concepts

• Streams

• Pipelines

• Sources

• Sinks

• Filters

• Taps

Page 21: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

SpringXD Basic Concepts

Page 22: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

A simple example

twittersearch --consumerKey=XXX --consumerSecret=XXX --query=SpringOne2GX --outputType=application/json | gemfire-json-server --useLocator=true --host=localhost --port=10334 --regionName=tweets --keyExpression=payload.getField('id_str')

twittersearch --query=SpringOne2GX | gemfire-json-server --host=localhost --regionName=tweets
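
The `--keyExpression=payload.getField('id_str')` option in the stream above selects each tweet's `id_str` field as the region key. A minimal Python sketch of that keying step follows. It is not Spring XD or GemFire code: a plain dict stands in for the `tweets` region, and the function name and sample payloads are hypothetical.

```python
import json


def store_by_key(region, json_payloads, key_field="id_str"):
    """Parse each JSON payload and store it under the value of `key_field`."""
    for raw in json_payloads:
        document = json.loads(raw)
        region[document[key_field]] = document  # a repeated key overwrites the entry
    return region


if __name__ == "__main__":
    tweets = {}
    store_by_key(tweets, [
        '{"id_str": "1", "text": "hello"}',
        '{"id_str": "2", "text": "world"}',
    ])
    print(sorted(tweets))
```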

Page 23: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

SpringXD GemFire

Page 24: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Apache Spark Concepts

• RDD

• DataFrame

• Driver

• Worker

"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."

Page 27: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Machine Learning Model (e.g. Linear Regression)

Features: price(x), medium avg (x), relative strength (x)

Label: medium avg (x+1)
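
One way to read this slide is that each training row pairs the three readings at step x with the medium average at step x+1, and that a linear regression is then fitted to those rows. The sketch below follows that reading in plain Python. It is an assumption-laden illustration rather than the presenters' model: the function names are hypothetical, and ordinary least squares is solved through the normal equations instead of a library call.

```python
def build_training_rows(price, medium_avg, relative_strength):
    """Pair the features at step x with the label medium_avg at step x+1."""
    rows = []
    for x in range(len(price) - 1):
        features = (price[x], medium_avg[x], relative_strength[x])
        label = medium_avg[x + 1]
        if None in features or label is None:
            continue  # an indicator is not defined yet at this step
        rows.append((features, label))
    return rows


def fit_linear_regression(rows):
    """Ordinary least squares; returns [intercept, w1, w2, ...]."""
    xs = [(1.0,) + tuple(f) for f, _ in rows]  # prepend the intercept column
    ys = [y for _, y in rows]
    n = len(xs[0])
    # Augmented matrix [X^T X | X^T y] of the normal equations.
    a = [[sum(r[i] * r[j] for r in xs) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(xs, ys))] for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def predict(weights, features):
    """Apply the fitted weights to one feature tuple."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], features))
```

The normal equations become unreliable when features are strongly correlated, which price and a moving average of price usually are, so a real system would more likely rely on a library solver.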

Page 29: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Page 30: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Learn more!

https://github.com/Pivotal-Open-Source-Hub/geode-security-samples

https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoT

https://github.com/Pivotal-Open-Source-Hub/geode-social-demo

http://pivotal-open-source-hub.github.io/StockInference-Spark/

Page 31: Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Thank you

@william_markito @fredmelo_br

Related: Building Highly-Scalable Spring Applications with In-Memory, Distributed Data Grids

by John Blum & Luke Shannon

September 15, 2015 - 10:30 - Salon M

http://pivotal-open-source-hub.github.io/StockInference-Spark/