Building a Stock Prediction system with Machine Learning using Geode, SpringXD and Spark MLLib

William Markito @william_markit

Fred Melo @fredmelo_br

It's all about DATA

Data Sources

Look for patterns

Prediction

medium avg (x+1)

relative strength

medium avg (x)

price(x)

Machine Learning Model (e.g. Linear Regression)
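The prediction slide above can be sketched as a small least-squares fit: price(x), medium avg (x) and relative strength are the inputs, and medium avg (x+1) is the target. This is a minimal plain-Python illustration, not the Spark MLlib code used in the talk. The function names, the sample rows and the coefficients in TRUE_WEIGHTS are all made up for the example.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    # Augmented matrix [a | b] so row operations apply to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(features: Sequence[Sequence[float]], targets: Sequence[float]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0, *f] for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Sequence[float], feature: Sequence[float]) -> float:
    """weights[0] is the intercept; the rest pair up with the feature values."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature))


# Made-up rows: (price(x), medium avg (x), relative strength).
features = [
    (10.0, 9.5, 55.0),
    (10.4, 9.8, 60.0),
    (10.1, 10.0, 48.0),
    (9.8, 9.9, 40.0),
    (10.6, 10.2, 65.0),
]
# Made-up coefficients used only to generate synthetic targets, medium avg (x+1).
TRUE_WEIGHTS = (0.5, 0.2, 0.8, 0.01)
targets = [predict(TRUE_WEIGHTS, f) for f in features]

weights = fit(features, targets)
next_avg = predict(weights, (10.3, 10.1, 58.0))
```

Because the targets here are generated from an exact linear rule, the fit recovers it closely. Real market data would be noisy, and a library such as MLlib would normally handle the training.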

Transform Sink

SpringXD

• Extensible
• Open-Source
• Fault-Tolerant
• Horizontally Scalable
• Cloud-Native

Machine Learning

Enrich Filter

Dashboard

Indicators

Predict

Real data

Simulator

/Stocks

/TechIndicators

/Predictions
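The indicators stored under /TechIndicators are derived from the raw prices in /Stocks. A minimal sketch of two such indicators follows, assuming "medium avg" means a simple moving average and "relative strength" means the common RSI formula, 100 - 100 / (1 + gains / losses). The slides do not define either one, so both definitions and the function names are assumptions.

```python
from typing import Sequence


def moving_average(prices: Sequence[float], window: int) -> float:
    """Simple average of the last `window` prices."""
    if len(prices) < window:
        raise ValueError("not enough prices for the window")
    return sum(prices[-window:]) / window


def relative_strength_index(prices: Sequence[float], window: int) -> float:
    """RSI over the last `window` price changes (simple-sum variant)."""
    if len(prices) < window + 1:
        raise ValueError("need window + 1 prices to get window changes")
    recent = prices[-(window + 1):]
    changes = [later - earlier for earlier, later in zip(recent, recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        # No down moves in the window: RSI is at its upper bound.
        return 100.0
    rs = gains / losses
    return 100.0 - 100.0 / (1.0 + rs)


# Made-up price series for illustration.
prices = [10.0, 11.0, 10.0, 12.0]
avg = moving_average(prices, window=2)
rsi = relative_strength_index(prices, window=3)
```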

Apache Geode (incubating)

Introduction

A distributed, memory-based data management platform for data oriented apps that need:
• High performance, scalability, resiliency and continuous availability
• Fast access to critical data set
• Location aware distributed data processing
• Event driven data architecture

Concepts

Cache
• In-memory storage and management for your data
• Configurable through XML, Spring, Java API or CLI
• Collection of Region

Concepts

Region
• Distributed java.util.Map on steroids (Key/Value)
• Consistent API regardless of where or how data is stored
• Observable (reactive)
• Highly available, redundant on cache Member(s)
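The "observable map" idea can be illustrated with a toy key/value store that notifies listeners on every change. This is a single-process sketch, not the Geode API: the class name ObservableRegion, the event tuple layout and the ticker "XYZ" are all hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical callback shape: (operation, key, old_value, new_value).
Listener = Callable[[str, Any, Any, Any], None]


class ObservableRegion:
    """Toy key/value region that tells its listeners about each change."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[Any, Any] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def put(self, key: Any, value: Any) -> None:
        operation = "update" if key in self._data else "create"
        old = self._data.get(key)
        self._data[key] = value
        self._notify(operation, key, old, value)

    def get(self, key: Any) -> Any:
        return self._data.get(key)

    def remove(self, key: Any) -> None:
        if key in self._data:
            old = self._data.pop(key)
            self._notify("destroy", key, old, None)

    def _notify(self, operation: str, key: Any, old: Any, new: Any) -> None:
        for listener in self._listeners:
            listener(operation, key, old, new)


events = []
stocks = ObservableRegion("/Stocks")
stocks.add_listener(lambda op, key, old, new: events.append((op, key, old, new)))
stocks.put("XYZ", 101.5)
stocks.put("XYZ", 102.0)
stocks.remove("XYZ")
```

The caller only sees put, get and remove, which mirrors the "consistent API regardless of where or how data is stored" point. Distribution, redundancy and persistence are left out entirely.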

Concepts

Region
• Local, Replicated or Partitioned
• In-memory or persistent
• Redundant
• LRU Overflow

LOCAL
LOCAL_HEAP_LRU
LOCAL_OVERFLOW
LOCAL_PERSISTENT
LOCAL_PERSISTENT_OVERFLOW
PARTITION
PARTITION_HEAP_LRU
PARTITION_OVERFLOW
PARTITION_PERSISTENT
PARTITION_PERSISTENT_OVERFLOW
PARTITION_PROXY
PARTITION_PROXY_REDUNDANT
PARTITION_REDUNDANT
PARTITION_REDUNDANT_HEAP_LRU
PARTITION_REDUNDANT_OVERFLOW
PARTITION_REDUNDANT_PERSISTENT
PARTITION_REDUNDANT_PERSISTENT_OVERFLOW
REPLICATE
REPLICATE_HEAP_LRU
REPLICATE_OVERFLOW
REPLICATE_PERSISTENT
REPLICATE_PERSISTENT_OVERFLOW
REPLICATE_PROXY
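The difference between partitioned, redundant and replicated regions comes down to which members hold a copy of a given key. The sketch below is a simplified model of that placement, assuming a fixed member list and a hash-to-bucket rule. Geode's real bucket assignment and rebalancing are more involved, and every name here (bucket_for, partition_owners, replicate_owners, the server names) is hypothetical.

```python
import zlib
from typing import List, Sequence


def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key to a bucket; crc32 is stable across processes, unlike hash()."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def partition_owners(
    key: str, members: Sequence[str], redundant_copies: int, num_buckets: int = 8
) -> List[str]:
    """Members holding the key: one primary plus `redundant_copies` backups."""
    if redundant_copies >= len(members):
        raise ValueError("need more members than redundant copies")
    start = bucket_for(key, num_buckets) % len(members)
    # Walk the member list from the primary so every copy lands on a distinct member.
    return [members[(start + i) % len(members)] for i in range(redundant_copies + 1)]


def replicate_owners(key: str, members: Sequence[str]) -> List[str]:
    """A replicated region keeps every key on every member."""
    return list(members)


members = ["server1", "server2", "server3"]
partitioned = partition_owners("XYZ", members, redundant_copies=0)
redundant = partition_owners("XYZ", members, redundant_copies=1)
replicated = replicate_owners("XYZ", members)
```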

Concepts

Member
• A process that has a connection to the system
• A process that has created a cache
• Embeddable within your application

Client

Locator

Server

Concepts

Client cache
• A process connected to the Geode server(s)
• Can have a local copy of the data
• Can be notified about events on the servers

Concepts

Listeners
• CacheWriter / CacheListener
• AsyncEventListener (queue / batch)
• Parallel or Serial
• Conflation
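Queueing, batching and conflation can be modelled with a small event queue that hands events to a listener in batches and, when conflation is on, keeps only the newest pending event per key. This is a synchronous toy, not Geode's AsyncEventListener interface: the class BatchingQueue, its parameters and the sample keys are assumptions for illustration.

```python
from typing import Any, Callable, List, Tuple

Event = Tuple[str, Any]  # (key, value)


class BatchingQueue:
    """Collects events and delivers them to a listener one batch at a time."""

    def __init__(
        self,
        listener: Callable[[List[Event]], None],
        batch_size: int,
        conflate: bool = False,
    ) -> None:
        self._listener = listener
        self._batch_size = batch_size
        self._conflate = conflate
        self._pending: List[Event] = []

    def enqueue(self, key: str, value: Any) -> None:
        if self._conflate:
            # Conflation: drop any older pending event for the same key.
            self._pending = [(k, v) for k, v in self._pending if k != key]
        self._pending.append((key, value))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Deliver whatever is pending, even if the batch is not full."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._listener(batch)


conflated_batches: List[List[Event]] = []
queue = BatchingQueue(conflated_batches.append, batch_size=3, conflate=True)
queue.enqueue("XYZ", 10.0)
queue.enqueue("XYZ", 10.5)  # replaces the pending 10.0 for the same key
queue.enqueue("ABC", 20.0)
queue.flush()

plain_batches: List[List[Event]] = []
plain = BatchingQueue(plain_batches.append, batch_size=2)
plain.enqueue("XYZ", 10.0)
plain.enqueue("XYZ", 10.5)  # batch is full, so it is delivered here
```

With conflation the listener sees one event per key, which suits consumers that only care about the latest value. Without it, every intermediate update is delivered.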

Currently under incubation at the Apache Software Foundation

Contributions and contributors are welcome:
• Code and patches
• Bugs, feature requests
• Documentation and content
• Any form of feedback

Code
• New features
• Bug fixes (patches)
• Writing tests

Documentation
• Wiki
• Web site
• User guides

Community
• Join our mailing lists (Ask or answer)
• Become a speaker
• Find and report bugs
• Testing a release candidate or beta

• JIRA - https://issues.apache.org/jira/browse/GEODE
• GitHub - https://github.com/apache/incubator-geode
• Mailing lists:
• Development - dev@geode.incubator.apache.org
• Users - user@geode.incubator.apache.org
• Wiki - cwiki.apache.org/confluence/display/GEODE
• StackOverflow - http://stackoverflow.com/questions/tagged/geode+or+gemfire

SpringXD

Introduction

Concepts

A stream is composed from modules. Each module is deployed to a container and its channels are bound to the transport.
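The idea of a stream built from chained modules can be sketched in-process with Python generators, where each module consumes the previous module's output. This is only an analogy: in Spring XD the modules run in containers and talk over a transport, and the module names below (filter_valid, enrich, make_sink, compose) and the sample events are hypothetical.

```python
from typing import Callable, Iterable, Iterator, List

Module = Callable[[Iterable[dict]], Iterator[dict]]


def filter_valid(events: Iterable[dict]) -> Iterator[dict]:
    """Filter module: drop events that carry no price."""
    for event in events:
        if event.get("price") is not None:
            yield event


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Enrich module: tag each event with where it came from."""
    for event in events:
        yield {**event, "source": "simulator"}


def make_sink(store: List[dict]) -> Module:
    """Sink module factory: append every event to the given store."""
    def sink(events: Iterable[dict]) -> Iterator[dict]:
        for event in events:
            store.append(event)
            yield event
    return sink


def compose(*modules: Module) -> Module:
    """Chain modules so each one reads the output of the one before it."""
    def run(events: Iterable[dict]) -> Iterable[dict]:
        for module in modules:
            events = module(events)
        return events
    return run


stored: List[dict] = []
stream = compose(filter_valid, enrich, make_sink(stored))
ticks = [{"symbol": "XYZ", "price": 10.0}, {"symbol": "XYZ", "price": None}]
# Generators are lazy, so the stream only runs once something consumes it.
delivered = list(stream(ticks))
```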

Apache Zeppelin (incubating)

Introduction

Concepts

• Web based REPL
• Iterative & Exploratory
• Support for Data Ingestion

Concepts

Multi interpreters
• Markdown
• Shell
• Spark
• Geode
• Python
• …

Concepts

Sharing through URLs without Reports

Apache Spark

Introduction

Concepts

• RDD
• Dataframe
• Driver
• Worker

"An RDD in Spark is simply an immutable distributed collection of objects. Each RDD is split into multiple partitions, which may be computed on different nodes of the cluster. RDDs can contain any type of Python, Java, or Scala objects, including user-defined classes."

Concepts

"A dataframe is a distributed collection of rows organized into named columns. An abstraction for selecting, filtering and plotting structured data (pandas), previously known as SchemaRDD."

Summary

• Integration
• Spark, JDBC, Geode
• HDFS, Twitter, File, Mail…

• Data pipeline orchestration
• Intuitive DSL
• Streaming & Analytics
• Distributed and scalable

• Web based REPL
• Multiple Interpreters

• Apache Spark
• Markdown
• Flink
• Python
• Geode…

• Iterative & Exploratory

Summary

• Fast data processing
• Columnar queries
• RDDs
• Machine Learning
• Analytics & Streaming

• Fast data store and processing
• In-memory & Persistent
• Highly Consistent
• Transaction processing
• Thousands of concurrent clients

Source Code: http://pivotal-open-source-hub.github.io/StockInference-Spark/
