Building a Stock Prediction system with Machine Learning using Geode, SpringXD and Spark MLLib

© 2015 Pivotal Software, Inc. All rights reserved.

William Markito (@william_markito)
Fred Melo (@fredmelo_br)

It's all about DATA

Data Sources

Look for patterns

Prediction

Inputs: price(x), medium avg(x), relative strength(x)

Machine Learning Model (e.g. Linear Regression)

Output: medium avg(x+1)
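
The slide above feeds price(x), medium avg(x) and relative strength(x) into a model that outputs medium avg(x+1). A minimal sketch of that idea in plain Python, using ordinary least squares, might look like the following. The function names and the sample numbers are illustrative assumptions, not the presenters' code; the talk itself relies on Spark MLlib for the model.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry of this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(features, targets):
    """Least-squares fit; returns [intercept, w1, w2, ...] via the normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights, feature):
    """Apply the fitted weights to one feature tuple."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature))


# Hypothetical training rows: (price(x), medium avg(x), relative strength(x)).
features = [(10.0, 9.5, 55.0), (10.4, 9.8, 60.0), (10.1, 9.9, 48.0),
            (10.8, 10.2, 65.0), (10.6, 10.3, 52.0), (11.0, 10.5, 58.0)]
# Hypothetical targets: medium avg(x+1), observed one step later.
targets = [9.8, 9.9, 10.2, 10.3, 10.5, 10.7]

weights = fit_linear(features, targets)
print(predict(weights, (11.2, 10.7, 61.0)))
```

The normal equations are fine for a handful of features like these; a library implementation would normally be preferred for larger or ill-conditioned data.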

SpringXD

• Extensible
• Open-Source
• Fault-Tolerant
• Horizontally Scalable
• Cloud-Native

[Architecture diagram. Labels: Real data, Simulator; Transform, Enrich, Filter, Split, Sink; Indicators, Predict, Machine Learning, Dashboard; numbered steps 1, 2, 3; /Stocks, /TechIndicators, /Predictions.]

Apache Geode (incubating)

Introduction

Introduction

A distributed, memory-based data management platform for data-oriented apps that need:
• High performance, scalability, resiliency and continuous availability
• Fast access to critical data sets
• Location-aware distributed data processing
• Event-driven data architecture

Concepts

Cache
• In-memory storage and management for your data
• Configurable through XML, Spring, Java API or CLI
• Collection of Regions

Concepts

Region
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant on cache Member(s)
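
To make "a java.util.Map on steroids" that is also "observable" a little more concrete, here is a toy, single-process sketch in Python. It is not the Geode API: `ToyRegion`, its methods and the callback signature are hypothetical names, and the ticker symbol is made up. It only illustrates a key/value store that notifies registered listeners on every change.

```python
class ToyRegion:
    """Single-process stand-in for a key/value region that notifies listeners."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self._listeners = []

    def add_listener(self, callback):
        """Register callback(event, key, old_value, new_value)."""
        self._listeners.append(callback)

    def put(self, key, value):
        old = self._data.get(key)
        event = "update" if key in self._data else "create"
        self._data[key] = value
        self._notify(event, key, old, value)

    def get(self, key):
        return self._data.get(key)

    def remove(self, key):
        old = self._data.pop(key)
        self._notify("destroy", key, old, None)

    def _notify(self, event, key, old, new):
        for callback in self._listeners:
            callback(event, key, old, new)


events = []
stocks = ToyRegion("/Stocks")
stocks.add_listener(lambda event, key, old, new: events.append((event, key, old, new)))
stocks.put("AAPL", 110.0)   # hypothetical ticker and price
stocks.put("AAPL", 112.5)
stocks.remove("AAPL")
print(events)
```

Distribution, redundancy and persistence are deliberately left out; the point is only the map-like API plus change notifications.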

Concepts

Region
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU Overflow

LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
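
The difference between partitioned, redundant and replicated placement can be sketched with a small Python toy. This is an assumption-laden illustration, not how Geode assigns buckets: the CRC32 hash, the round-robin choice of backups, the bucket count and the member names are all made up for the example.

```python
import zlib


def bucket_for(key, num_buckets):
    """Map a key to a bucket with a stable hash (CRC32 of its text form)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def members_for(bucket, members, redundant_copies):
    """Return the primary member followed by up to `redundant_copies` backups."""
    count = min(len(members), redundant_copies + 1)
    return [members[(bucket + i) % len(members)] for i in range(count)]


def place_partitioned(key, members, num_buckets=113, redundant_copies=1):
    """Partitioned placement: each key lives on one primary plus its backups."""
    return members_for(bucket_for(key, num_buckets), members, redundant_copies)


def place_replicated(key, members):
    """Replicated placement: every member holds every key."""
    return list(members)


members = ["server1", "server2", "server3"]  # hypothetical member names
print(place_partitioned("AAPL", members))
print(place_replicated("AAPL", members))
```

With `redundant_copies=0` a key lives on exactly one member, which corresponds to a partitioned region without redundancy in this toy model.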

Concepts

Member
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application

Client

Locator

Server

Concepts

Client cache
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Can be notified about events on the servers

Concepts

Listeners
• CacheWriter / CacheListener
• AsyncEventListener (queue / batch)
• Parallel or Serial
• Conflation
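
The "queue / batch" and "Conflation" items can be illustrated with a toy Python queue. This is a sketch under stated assumptions, not Geode's AsyncEventListener API: `ToyAsyncQueue`, its parameters and the ticker data are hypothetical, delivery is synchronous, and the serial versus parallel distinction is not modeled.

```python
from collections import OrderedDict


class ToyAsyncQueue:
    """Buffers (key, value) events and hands them to a listener in batches."""

    def __init__(self, listener, batch_size=3, conflate=False):
        self.listener = listener      # called with a list of (key, value) pairs
        self.batch_size = batch_size
        self.conflate = conflate
        self._pending = OrderedDict() if conflate else []

    def enqueue(self, key, value):
        if self.conflate:
            # Conflation: a newer value for a queued key replaces the older one.
            self._pending[key] = value
        else:
            self._pending.append((key, value))
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Deliver whatever is pending as one batch."""
        batch = list(self._pending.items()) if self.conflate else list(self._pending)
        self._pending.clear()
        if batch:
            self.listener(batch)


batches = []
queue = ToyAsyncQueue(batches.append, batch_size=2, conflate=True)
queue.enqueue("AAPL", 110.0)   # hypothetical ticker and prices
queue.enqueue("AAPL", 112.5)   # conflated with the previous event
queue.enqueue("GOOG", 540.0)   # second distinct key fills the batch
print(batches)
```

With conflation on, the listener sees only the latest value per key in each batch, which reduces downstream work when intermediate values do not matter.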

Apache Geode (incubating)

• Currently under incubation in the Apache Software Foundation
• Welcome contributions and contributors
• Code and Patches
• Bugs, feature requests
• Documentation and content
• Any form of feedback

Apache Geode (incubating)

Code
• New features
• Bug fixes (patches)
• Writing tests

Documentation
• Wiki
• Web site
• User guides

Community
• Join our mailing lists (Ask or answer)
• Become a speaker
• Find and report bugs
• Testing a release candidate or beta

Apache Geode (incubating)

• JIRA - https://issues.apache.org/jira/browse/GEODE
• GitHub - https://github.com/apache/incubator-geode
• Mailing lists:
  Development - [email protected]
  Users - [email protected]
• Wiki - cwiki.apache.org/confluence/display/GEODE
• StackOverflow - http://stackoverflow.com/questions/tagged/geode+or+gemfire

SpringXD

Introduction

Concepts

A stream is composed from modules. Each module is deployed to a container and its channels are bound to the transport.
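
As a rough mental model of "a stream is composed from modules", the following Python sketch chains a source, two processors and a sink. It is not the SpringXD DSL or runtime: everything runs in one process, Python generators stand in for the channels, and the module names and tick data are hypothetical.

```python
def transform(fn):
    """Processor module: apply fn to every message."""
    def module(messages):
        for message in messages:
            yield fn(message)
    return module


def keep_if(predicate):
    """Processor module: pass on only messages that satisfy predicate."""
    def module(messages):
        for message in messages:
            if predicate(message):
                yield message
    return module


def stream(source, *processors, sink):
    """Wire source -> processors -> sink; generators play the role of channels."""
    messages = source
    for processor in processors:
        messages = processor(messages)
    for message in messages:
        sink(message)


received = []
# Hypothetical raw ticks; the second one carries an invalid price.
ticks = [{"symbol": "AAPL", "price": "110.0"}, {"symbol": "AAPL", "price": "-1"}]
stream(iter(ticks),
       transform(lambda t: {**t, "price": float(t["price"])}),
       keep_if(lambda t: t["price"] > 0),
       sink=received.append)
print(received)
```

In a real deployment each module could run in a different container, with a message transport between them instead of in-process generators.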

Apache Zeppelin (incubating)

Introduction

Concepts

• Web based REPL
• Iterative & Exploratory
• Support for Data Ingestion

Concepts

Multi interpreters
• Markdown
• Shell
• Spark
• Geode
• Python
• …

Concepts

Sharing through URLs without Reports

Apache Spark

Introduction

Concepts

• RDD
• Dataframe
• Driver
• Worker

"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."

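The RDD description quoted above (immutable, split into partitions, computed independently) can be mimicked in a few lines of plain Python. This is only a toy: `ToyRDD` is a hypothetical class that borrows the method names `parallelize`, `map` and `collect` from Spark, but it runs in a single process and has none of Spark's laziness, fault tolerance or distribution.

```python
class ToyRDD:
    """Immutable collection split into partitions; transformations return new objects."""

    def __init__(self, partitions):
        # Tuples make the stored partitions read-only.
        self._partitions = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, items, num_partitions):
        """Split items into at most num_partitions contiguous chunks."""
        items = list(items)
        size = max(1, -(-len(items) // num_partitions))  # ceiling division
        return cls(items[i:i + size] for i in range(0, len(items), size))

    def map(self, fn):
        # Each partition is processed independently, as it could be on separate nodes.
        return ToyRDD([fn(x) for x in p] for p in self._partitions)

    def collect(self):
        """Gather all partitions back into one list, preserving order."""
        return [x for p in self._partitions for x in p]


prices = ToyRDD.parallelize([1, 2, 3, 4, 5], num_partitions=2)
doubled = prices.map(lambda x: x * 2)
print(doubled.collect())
print(prices.collect())  # the original collection is unchanged
```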
Concepts

• RDD
• Dataframe
• Driver
• Worker

"A dataframe is a distributed collection of rows organized into named columns. An abstraction for selecting, filtering and plotting structured data (pandas), previously known as SchemaRDD."

Concepts

• RDD
• Dataframe
• Driver
• Worker

Summary

• Integration
  • Spark, JDBC, Geode
  • HDFS, Twitter, File, Mail…
• Data pipeline orchestration
• Intuitive DSL
• Streaming & Analytics
• Distributed and scalable

• Web based REPL
• Multiple Interpreters
  • Apache Spark
  • Markdown
  • Flink
  • Python
  • Geode…
• Iterative & Exploratory

Summary

• Fast data processing
• Columnar queries
• RDDs
• Machine Learning
• Analytics & Streaming

• Fast data store and processing
• In-memory & Persistent
• Highly Consistent
• Transaction processing
• Thousands of concurrent clients

Source Code

http://pivotal-open-source-hub.github.io/StockInference-Spark/
