Implementing a highly scalable stock prediction system with R, Geode, SpringXD and Spark

Post on 26-Jan-2017

Unless otherwise indicated, these slides are © 2013-2015 Pivotal Software, Inc. and licensed under a Creative Commons Attribution-NonCommercial license: http://creativecommons.org/licenses/by-nc/3.0/

SPRINGONE2GX, WASHINGTON, DC

Implementing a highly scalable Stock prediction system with R, Apache Geode and Spring XD

Fred Melo, @fredmelo_br

William Markito, @william_markito

About us

Fred Melo

Technical Director for Data

fmelo@pivotal.io

@fredmelo_br

William Markito

Enterprise Architect for GemFire

wmarkito@pivotal.io

@william_markito

It's all about DATA

Data Sources

Look for patterns

Prediction

What do we want to build?

"Smart System"

… in our specific case

Trading Data

"Smart System"

Historical Data Repository

Historical: learns from historical trends

"What were the medium average price and relative strength readings when the latest failures happened?"

Live data becomes historical over time

Real-Time: evaluates live data

"According to historical trends, there's an 80% chance this stock's price will go down within the next hour"

The ML pipeline data flow

Components: Live Data, Spring XD, Apache Geode / GemFire, Machine Learning model, Greenplum DB

Data Temperature: Hot, Cold

1 - Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

4 - "Hot" data ages, becoming part of the historical dataset

5 - Re-training is triggered, updating the model with the latest historical data
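The five numbered steps of this data flow can be sketched as a single loop. This is a minimal illustration in plain Python, not the implementation shown in the talk: the deque and the list are stand-ins for the in-memory grid and the historical store, and `MeanModel`, `run_pipeline`, and `hot_capacity` are hypothetical names introduced only for this sketch.

```python
from collections import deque
from statistics import mean


class MeanModel:
    """Stand-in model: compares a new price with the mean of the history it was trained on."""

    def __init__(self):
        self.value = None  # untrained until history exists

    def train(self, history):
        self.value = mean(history) if history else None

    def predict(self, price):
        # Deviation of the new price from the historical mean, or None if untrained.
        return None if self.value is None else price - self.value


def run_pipeline(ticks, hot_capacity=3):
    hot = deque()        # stand-in for the in-memory grid ("hot" data)
    historical = []      # stand-in for the historical dataset ("cold" data)
    model = MeanModel()
    pushed = []          # stand-in for results pushed to applications

    for price in ticks:
        hot.append(price)                          # 1 - ingest live data
        prediction = model.predict(price)          # 2 - compare with historical patterns
        pushed.append((price, prediction))         # 3 - push the result
        if len(hot) > hot_capacity:
            historical.append(hot.popleft())       # 4 - hot data ages
            model.train(historical)                # 5 - re-training is triggered
    return pushed, list(hot), historical


pushed, hot, historical = run_pipeline([10, 12, 14, 16, 18])
print(pushed[-1], hot, historical)
```

Re-training on every aged tick keeps the sketch short; a real deployment would more likely batch re-training, which the slides do not specify.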

Simplified demo model

Components: Live Data, Spring XD, Apache Geode / GemFire, Machine Learning model

Data Temperature: Hot, Warm

1 - Live data is ingested into the grid

2 - Trained ML model compares new data to historical patterns

3 - Results are pushed immediately to deployed applications

4 - Re-training is triggered, updating the model with the latest historical data

SpringXD

Extensible, Open-Source, Fault-Tolerant, Horizontally Scalable, Cloud-Native

Sources: Real data, Simulator

Stream steps: Enrich, Filter, Split, Transform, Sink

Regions: /Stocks, /TechIndicators, /Predictions

Other diagram labels: Machine Learning, Indicators, Predict, Dashboard (stages numbered 1 to 3)

Eating it in small bites…

SpringXD GemFire

Apache Geode & GemFire Concepts

• Cache
  • Configurable through XML, Java

• Region
  • Distributed java.util.Map on steroids
  • Highly available, redundant

• Member
  • Locator, Server, Client

• Callbacks
  • Listener, Writer, AsyncEventListener, Parallel/Serial

Apache Geode & GemFire, why?

• Performance

• Consistency

• Resiliency

Pivotal GemFire High Availability and Fault Tolerance in 6 acts

• Failing data copies are replaced transparently

• Data is replicated to other clusters and sites (WAN)

• Network segmentations ("split brain") are identified and fixed automatically

• Client and cluster disconnections are handled gracefully

• Data is persisted on local disk for ultimate durability

• Failed function executions are restarted automatically

Some interesting cases…

China Railway Corporation

• 5,700 train stations
• 4.5 million tickets per day
• 20 million daily users
• 1.4 billion page views per day
• 40,000 visits per second

Indian Railways

• 7,000 stations
• 72,000 miles of track
• 23 million passengers daily
• 120,000 concurrent users
• 10,000 transactions per minute

* http://pivotal.io/big-data/pivotal-gemfire

Use cases and industries

Indian Railways, population: 1,251,695,616

China Railway Corporation, population: 1,401,586,609

World: ~7,349,000,000

~36% of the world population
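The "~36%" figure follows directly from the three numbers on this slide. A quick check, assuming the two population figures pair with the two railways in the order the slide lists them (the variable names are illustrative only):

```python
# Figures copied from the slide; pairing them with each railway's
# country follows the slide's left-to-right order (an assumption).
indian_railways_population = 1_251_695_616
china_railway_population = 1_401_586_609
world_population = 7_349_000_000

combined = indian_railways_population + china_railway_population
share = combined / world_population

print(f"{combined:,}")   # prints 2,653,282,225
print(f"{share:.1%}")    # prints 36.1%
```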

Apache Geode & Pivotal GemFire

Pivotal GemFire

• Commercial product available since 2004

• Native clients in Java, C++, C#, REST

• Event Subscriptions and Continuous Queries

• Configurable WAN Gateway between clusters

• Enterprise Support, commercial features

Apache Geode

• Open sourced in April 2015

• Java Native Client, REST

• 98% of GemFire API

• Event subscriptions

• ~30 contributors

• Under Incubation

SpringXD Basic Concepts

• Streams

• Pipelines

• Sources

• Sinks

• Filters

• Taps

A simple example

twittersearch --consumerKey=XXX --consumerSecret=XXX --query=SpringOne2GX --outputType=application/json | gemfire-json-server --useLocator=true --host=localhost --port=10334 --regionName=tweets --keyExpression=payload.getField('id_str')

twittersearch --query=SpringOne2GX | gemfire-json-server --host=localhost --regionName=tweets

Apache Spark Concepts

• RDD

• DataFrame

• Driver

• Worker

"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."
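The two properties named in this definition, immutability and partitioning, can be illustrated without a Spark cluster. The following toy class is plain Python and deliberately not the Spark API: `ToyRDD` and its methods are hypothetical names that only mimic the shape of the idea, and everything runs in one process rather than on different nodes.

```python
class ToyRDD:
    """Toy illustration of an immutable, partitioned collection (not Spark)."""

    def __init__(self, partitions):
        # Tuples make the stored partitions immutable.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, items, num_partitions):
        items = list(items)
        size = -(-len(items) // num_partitions)  # ceiling division
        return cls(items[i * size:(i + 1) * size] for i in range(num_partitions))

    def map(self, fn):
        # Each partition is transformed independently; in Spark this is the
        # unit of work that may run on a different node. A new object is
        # returned and the original is left untouched.
        return ToyRDD([fn(x) for x in p] for p in self._partitions)

    def collect(self):
        return [x for p in self._partitions for x in p]

    def num_partitions(self):
        return len(self._partitions)


rdd = ToyRDD.parallelize(range(6), num_partitions=3)
squared = rdd.map(lambda x: x * x)
print(rdd.collect())      # prints [0, 1, 2, 3, 4, 5]
print(squared.collect())  # prints [0, 1, 4, 9, 16, 25]
```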

Machine Learning Model (e.g. Linear Regression)

Features: price(x), medium avg (x), relative strength (x)

Label: medium avg (x+1)
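To make the features/label split concrete, here is a minimal sketch that turns a price series into training rows of the form (price(x), medium avg(x), relative strength(x)) → medium avg(x+1). It rests on assumptions the slide does not state: "medium avg" is read as a simple moving average, "relative strength" as an RSI-style value between 0 and 100, and the window is 3 ticks. All function names are hypothetical.

```python
def simple_moving_average(prices, end, window):
    """Average of the `window` prices ending at index `end` (assumed reading of 'medium avg')."""
    return sum(prices[end - window + 1:end + 1]) / window


def relative_strength(prices, end, window):
    """RSI-style indicator over the last `window` price changes ending at `end`."""
    gains = losses = 0.0
    for i in range(end - window + 1, end + 1):
        change = prices[i] - prices[i - 1]
        if change > 0:
            gains += change
        else:
            losses -= change
    if losses == 0:
        return 100.0  # no losses in the window
    return 100.0 - 100.0 / (1.0 + gains / losses)


def build_training_rows(prices, window=3):
    """Return a list of ((price, avg, strength) at x, avg at x+1) pairs."""
    rows = []
    # x needs `window` earlier price changes and one later tick for the label.
    for x in range(window, len(prices) - 1):
        features = (
            prices[x],
            simple_moving_average(prices, x, window),
            relative_strength(prices, x, window),
        )
        label = simple_moving_average(prices, x + 1, window)
        rows.append((features, label))
    return rows


rows = build_training_rows([10.0, 11.0, 12.0, 11.0, 13.0, 14.0])
for features, label in rows:
    print(features, "->", label)
```

Rows in this shape can be handed to any regression trainer; the slide names linear regression only as an example.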

Learn more!

https://github.com/Pivotal-Open-Source-Hub/geode-security-samples

https://github.com/Pivotal-Open-Source-Hub/WifiAnalyticsIoT

https://github.com/Pivotal-Open-Source-Hub/geode-social-demo

http://pivotal-open-source-hub.github.io/StockInference-Spark/

Thank you

@william_markito @fredmelo_br

Related: Building Highly-Scalable Spring Applications with In-Memory, Distributed Data Grids

by John Blum & Luke Shannon, September 15, 2015 - 10:30 - Salon M

http://pivotal-open-source-hub.github.io/StockInference-Spark/