Machine Learning by Example - Apache Spark

Service Symphony Ltd

Apache Spark Machine Learning by Example

Meeraj Kunnumpurath

25th of February 2017


Introduction

❖ Working as a technologist and software architect for a couple of decades, at a number of leading financial institutions in the UK

❖ Authored a number of books on Enterprise Java, Web Services and SOA

❖ Spoken at a number of technology conferences

❖ Founded Service Symphony Ltd in 2009, serving leading financial services customers building mission-critical middleware

❖ Engineer with a keen interest in ML, AI and Data Science

❖ Blog: http://www.servicesymphony.com/blog

❖ Email: [email protected]

❖ Presentation: https://www.slideshare.net/MeerajKunnumpurath/machine-learning-by-example-apache-spark

❖ GitHub: https://github.com/kunnum/sandbox/tree/master/notebooks


Agenda

❖ Introduction to using ML with Apache Spark

❖ Hands-on, example-driven approach

❖ Not a deep dive into Apache Spark architecture

❖ Nor a deep dive into ML algorithms

❖ Examples built using Apache Zeppelin

❖ Some of the examples are from the ASF Spark documentation


Apache Spark - Overview

❖ Open source, large-scale distributed data processing fabric

❖ Offers multiple components addressing different facets of data science for big and fast data processing, ML, analytics and data ingestion

❖ Able to process large amounts of data in memory, spanning multiple process spaces

❖ Started as a research project at UC Berkeley

❖ Originally released under a BSD licence; a top-level ASF project since 2014, licensed under ASL 2.0

❖ One of the most active open source projects, arguably the most active ASF project

❖ Adopted, extended and commercialised by multiple vendors playing in the data science realm


Apache Spark - Architecture


Scala - Spark Natural Transition

❖ Interest in Spark stemmed from deep interest in Scala and functional programming

❖ Data processing ecosystem built around Scala, with strong synergy with Scala’s design motivations

❖ Extends Scala’s idiomatic functional programming model beyond process boundaries

❖ Spark RDDs - Scala collections on steroids
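
The "collections on steroids" point can be illustrated with a minimal sketch that runs the same filter/map/reduce chain first on a plain Scala List and then on an RDD. The local master, application name and sample numbers are assumptions made for illustration; in Zeppelin or spark-shell a `spark` session already exists and `getOrCreate()` simply returns it.

```scala
import org.apache.spark.sql.SparkSession

// Assumed local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("collections-vs-rdd")
  .getOrCreate()

val numbers = (1 to 10).toList

// Plain Scala collection: evaluated eagerly inside a single JVM.
val localSum = numbers.filter(_ % 2 == 0).map(n => n * n).sum

// Spark RDD: the same chain of operations, but lazily evaluated and
// split into partitions that can live in different processes.
val rdd = spark.sparkContext.parallelize(numbers, numSlices = 4)
val distributedSum = rdd.filter(_ % 2 == 0).map(n => n * n).reduce(_ + _)

println(s"local = $localSum, distributed = $distributedSum")
```

Both expressions compute the sum of the squares of the even numbers; only the RDD version defers work until the `reduce` action is called.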


Spark - Scala Notebook



ML Components

❖ Data Structures

    ❖ Vectors and Matrices

    ❖ DataFrames

❖ Feature Extractors and Transformers

❖ Estimators

❖ Models

❖ Pipelines

❖ Evaluators

❖ Tuning Aids
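
As a rough illustration of the data structures listed above, the following sketch builds a dense and a sparse vector, a small matrix, and a DataFrame with the `label` and `features` columns that most estimators look for by default. The values and the session settings are made up for the example.

```scala
import org.apache.spark.ml.linalg.{Matrices, Vectors}
import org.apache.spark.sql.SparkSession

// Assumed local session; in Zeppelin, `spark` is already provided.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ml-data-structures")
  .getOrCreate()
import spark.implicits._

// The same vector (1.0, 0.0, 3.0) in dense and in sparse form.
val dense = Vectors.dense(1.0, 0.0, 3.0)
val sparse = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))

// A 2 x 2 dense matrix; values are supplied in column-major order.
val matrix = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))

// A DataFrame with a numeric label column and a vector-valued features column.
val df = Seq(
  (1.0, dense),
  (0.0, sparse)
).toDF("label", "features")

df.printSchema()
```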


ML Components - Notebook


Spark ML - Pipeline Architecture

❖ DataFrame

❖ Estimator

❖ Transformer

❖ Pipeline

❖ Parameter
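
The relationship between these five concepts can be sketched with a single stage. `StandardScaler` is used here only as a convenient example of an estimator; the data and column names are hypothetical.

```scala
import org.apache.spark.ml.feature.{StandardScaler, StandardScalerModel}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Assumed local session; in Zeppelin, `spark` is already provided.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("estimator-transformer")
  .getOrCreate()
import spark.implicits._

// DataFrame: the common currency passed between stages.
val df = Seq(
  Vectors.dense(1.0, 10.0),
  Vectors.dense(2.0, 20.0),
  Vectors.dense(3.0, 30.0)
).map(Tuple1.apply).toDF("features")

// Estimator: configured through parameters, learns from a DataFrame.
val scaler = new StandardScaler()
  .setInputCol("features")
  .setOutputCol("scaled")
  .setWithMean(true)
  .setWithStd(true)

// fit() returns a model, which is itself a Transformer.
val scalerModel: StandardScalerModel = scaler.fit(df)

// transform() returns a new DataFrame with the output column appended.
val scaled = scalerModel.transform(df)
scaled.show(false)

// Parameter: every stage can describe its parameters and current values.
println(scaler.explainParams())
```

A Pipeline chains several such stages and is itself an estimator.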


Training time flow: Pipeline in estimator mode

Pipeline.fit() creates a pipeline model


Test time flow: Pipeline in transformer mode

PipelineModel.transform() creates a DataFrame augmented with prediction columns
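
The two flows can be sketched as follows, closely following the text-classification pipeline in the Spark documentation. The toy documents, labels and session settings are assumptions for illustration, not the notebook used in the talk.

```scala
import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{HashingTF, Tokenizer}
import org.apache.spark.sql.SparkSession

// Assumed local session; in Zeppelin, `spark` is already provided.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("ml-pipeline")
  .getOrCreate()
import spark.implicits._

// Hypothetical toy training data: (id, text, label).
val training = Seq(
  (0L, "a b c d e spark", 1.0),
  (1L, "b d", 0.0),
  (2L, "spark f g h", 1.0),
  (3L, "hadoop mapreduce", 0.0)
).toDF("id", "text", "label")

val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF()
  .setNumFeatures(1000)
  .setInputCol("words")
  .setOutputCol("features")
val lr = new LogisticRegression().setMaxIter(10).setRegParam(0.001)

// Training time: the Pipeline is an Estimator; fit() returns a PipelineModel.
val pipeline = new Pipeline().setStages(Array(tokenizer, hashingTF, lr))
val model: PipelineModel = pipeline.fit(training)

// Test time: the PipelineModel is a Transformer; transform() appends
// the intermediate columns plus rawPrediction, probability and prediction.
val test = Seq((4L, "spark i j k"), (5L, "l m n")).toDF("id", "text")
val predictions = model.transform(test)
predictions.select("id", "text", "probability", "prediction").show(false)
```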


ML Pipeline Notebook


Regression

❖ Supervised learning algorithm for predicting continuous labels

❖ Multiple algorithms

    ❖ Linear Regression

    ❖ Generalised Linear Regression

    ❖ Decision Tree Regression

    ❖ Random Forest Regression

    ❖ Gradient Boosted Tree Regression

    ❖ Survival Regression

    ❖ Isotonic Regression

❖ Works with input feature vectors and labelled points
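
A minimal linear regression sketch, assuming noiseless toy data generated from y = 2x + 1 so that the fitted slope and intercept are easy to check by eye. The data and session settings are hypothetical.

```scala
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SparkSession

// Assumed local session; in Zeppelin, `spark` is already provided.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("linear-regression")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: label = 2 * x + 1, one feature per row.
val data = Seq(0.0, 1.0, 2.0, 3.0, 4.0)
  .map(x => (2.0 * x + 1.0, Vectors.dense(x)))
  .toDF("label", "features")

// No regularisation, so the fit should recover the generating line.
val lr = new LinearRegression().setMaxIter(100).setRegParam(0.0)
val model = lr.fit(data)

val slope = model.coefficients(0)
val intercept = model.intercept
println(s"slope = $slope, intercept = $intercept")
println(s"training RMSE = ${model.summary.rootMeanSquaredError}")
```

The other regressors in the list follow the same `fit` / `transform` pattern with their own parameters.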


Linear Regression - Notebook


Classification

❖ Supervised learning for predicting discrete labels

❖ Multiple algorithms

    ❖ Binomial and multinomial logistic regression

    ❖ Decision tree classifier

    ❖ Random forest classifier

    ❖ Gradient boosted tree classifier

    ❖ Multi-layer neural network classifier

    ❖ Naive Bayes classifier
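
A small sketch of binomial logistic regression with an accuracy evaluator. The two well-separated groups of points are invented for the example, and the model is scored on its own training data purely to keep the sketch short; a real evaluation would use a held-out set.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Assumed local session; in Zeppelin, `spark` is already provided.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("classification")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: class 0 near the origin, class 1 far from it.
val data = Seq(
  (0.0, Vectors.dense(0.0, 1.0)),
  (0.0, Vectors.dense(1.0, 0.5)),
  (0.0, Vectors.dense(0.5, 0.0)),
  (1.0, Vectors.dense(9.0, 8.0)),
  (1.0, Vectors.dense(8.5, 9.5)),
  (1.0, Vectors.dense(10.0, 9.0))
).toDF("label", "features")

val lr = new LogisticRegression().setMaxIter(50).setRegParam(0.01)
val model = lr.fit(data)

// transform() adds rawPrediction, probability and prediction columns.
val predictions = model.transform(data)

val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName("accuracy")
val accuracy = evaluator.evaluate(predictions)
println(s"training accuracy = $accuracy")
```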


Classification - Notebook


Clustering

❖ Unsupervised learning algorithm based on similarity vectors

❖ Multiple algorithms

    ❖ K-Means Clustering

    ❖ LDA - Latent Dirichlet Allocation

    ❖ Bisecting K-Means

    ❖ Gaussian Mixture Model
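
A K-Means sketch on two obvious groups of points. No labels are supplied; the algorithm only sees the feature vectors. The points, the choice of k = 2 and the seed are assumptions for illustration.

```scala
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

// Assumed local session; in Zeppelin, `spark` is already provided.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("clustering")
  .getOrCreate()
import spark.implicits._

// Hypothetical unlabelled data: one group near (0, 0), one near (9, 9).
val points = Seq(
  Vectors.dense(0.0, 0.0),
  Vectors.dense(0.1, 0.1),
  Vectors.dense(0.2, 0.0),
  Vectors.dense(9.0, 9.0),
  Vectors.dense(9.1, 9.1),
  Vectors.dense(9.2, 9.0)
).map(Tuple1.apply).toDF("features")

val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(points)

// One centre per cluster, in feature space.
model.clusterCenters.foreach(println)

// transform() adds a "prediction" column holding the cluster index.
val clustered = model.transform(points)
clustered.show(false)
```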


Clustering - Notebook


Collaborative Filtering

❖ Commonly used for recommender systems

❖ Uses ALS (Alternating Least Squares) to learn latent factors in user-to-item associations

❖ By default, matrix factorization assumes explicit feedback

❖ You can explicitly enable implicit preferences
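
A sketch of ALS in both modes. The ratings, column names, rank and regularisation values are hypothetical; the only difference between the two models is the `implicitPrefs` flag (with `alpha` controlling confidence in the implicit case).

```scala
import org.apache.spark.ml.recommendation.ALS
import org.apache.spark.sql.SparkSession

// Assumed local session; in Zeppelin, `spark` is already provided.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("collaborative-filtering")
  .getOrCreate()
import spark.implicits._

// Hypothetical (user, item, rating) triples.
val ratings = Seq(
  (0, 0, 4.0f), (0, 1, 2.0f),
  (1, 0, 5.0f), (1, 2, 1.0f),
  (2, 1, 3.0f), (2, 2, 4.0f)
).toDF("userId", "itemId", "rating")

// Helper that returns a freshly configured estimator each time it is called.
def newAls(): ALS = new ALS()
  .setUserCol("userId")
  .setItemCol("itemId")
  .setRatingCol("rating")
  .setRank(2)
  .setMaxIter(5)
  .setRegParam(0.1)
  .setSeed(42L)

// Default: ratings are treated as explicit feedback.
val explicitAls = newAls()
val explicitModel = explicitAls.fit(ratings)

// Opt in to implicit preferences (for example clicks or view counts).
val implicitModel = newAls().setImplicitPrefs(true).setAlpha(1.0).fit(ratings)

// Score the known pairs; the output gains a "prediction" column.
val predictions = explicitModel.transform(ratings)
predictions.show(false)
```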


Collaborative Filtering - Notebook


Model Tuning

❖ API to tune an individual estimator or the entire pipeline using a normalised parameter model

❖ API to support k-fold cross validation

❖ API to evaluate performance on linear regression, as well as binomial and multinomial classification

❖ API for performing train-validation split
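
The tuning API can be sketched with a parameter grid and 3-fold cross validation over a logistic regression estimator. The synthetic data, grid values, fold count and seed are all assumptions for illustration.

```scala
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}
import org.apache.spark.sql.SparkSession

// Assumed local session; in Zeppelin, `spark` is already provided.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("model-tuning")
  .getOrCreate()
import spark.implicits._

// Hypothetical data: 20 points per class, separable on the first feature.
val data = (0 until 40).map { i =>
  val label = if (i % 2 == 0) 0.0 else 1.0
  (label, Vectors.dense(label * 5.0 + (i % 5) * 0.1, (i % 7).toDouble))
}.toDF("label", "features")

val lr = new LogisticRegression().setMaxIter(20)

// 2 x 2 grid: every combination is trained and scored on each fold.
val paramGrid = new ParamGridBuilder()
  .addGrid(lr.regParam, Array(0.01, 0.1))
  .addGrid(lr.elasticNetParam, Array(0.0, 0.5))
  .build()

val cv = new CrossValidator()
  .setEstimator(lr) // a whole Pipeline can be passed here instead
  .setEvaluator(new BinaryClassificationEvaluator())
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(3)
  .setSeed(7L)

// fit() returns a model wrapping the best parameter combination,
// refitted on the full dataset.
val cvModel = cv.fit(data)
println(cvModel.avgMetrics.mkString(", "))
```

`TrainValidationSplit` takes the same estimator, evaluator and grid, but uses `setTrainRatio` in place of `setNumFolds` and evaluates each combination only once.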


Model Tuning - Notebook


Questions
