TensorFlow with Apache Ignite and Machine and Deep ... · Achieving Continuous Machine and Deep...
Transcript of TensorFlow with Apache Ignite and Machine and Deep ... · Achieving Continuous Machine and Deep...
Achieving Continuous Machine and Deep Learning with Apache Ignite and TensorFlow Yuriy Babak
2019 © GridGain Systems
2019 © GridGain Systems2019 © GridGain Systems
Speaker
2
Yuriy Babak● Head of ML/DL framework development at GridGain● Apache Ignite committer
2019 © GridGain Systems2019 © GridGain Systems
Agenda
• ML introduction• Apache Ignite ML overview• Integration with TensorFlow• Demo
2019 © GridGain Systems
2019 © GridGain Systems GridGain Company Confidential4
Apache Ignite ML overview
2019 © GridGain Systems2019 © GridGain Systems
ML Terms
5
● Raw Data - data of business (logs, transactions, …) ● Training/Test set - preprocessed data from raw data, numeric features
(set: age, user type id, gender, ...)● Training algorithm - produce model by training set (GDB, Gibbs
Sampling, Decision Tree building)● Model - formula for producing answers on new business data (decision
tree, SVM, linear regression)● Meta algorithm - combinator of several models (boosting, bagging,
stacking)● Evaluation/Validation - estimation of model given test/validation set
(Precision, MSE, ROC AUC)● Inference - using ML model for prediction
2019 © GridGain Systems2019 © GridGain Systems
Typical ML tasks
6
1. Regression2. Time-series regression3. Classification (binary, multiclass)4. Clusterization5. Ranking6. Entity extraction, structure recognition7. Generative models
2019 © GridGain Systems2019 © GridGain Systems
Distributed model training
7
Calculate local improvements
Aggregate local improvements
We have a model
Update model
Map
Reduce
2019 © GridGain Systems2019 © GridGain Systems
Ignite partition-based dataset
8
Partition Data Dataset Context Dataset Data
Upstream Cache Context Cache On-Heap
Learning Env
On-Heap
Durable Stateless Durable Recoverable
Dataset dataset = … // Partition based dataset, internal API
dataset.compute((env, ctx, data) -> map(...), (r1, r2) -> reduce(...))
double[][] x = …double[] y = ...
double[][] x = …double[] y = ...
Partition Based Dataset StructuresSource Data
2019 © GridGain Systems2019 © GridGain Systems
Apache Ignite ML API
9
2019 © GridGain Systems2019 © GridGain Systems
List pre-build models
10
Regression: Linear Regression (LSQR and SGD), Decision Tree, Random Forest, GBT, Nearest Neighbours (KNN), MLP.
Classification: SVM, Nearest Neighbours (ANN, KNN), Decision Tree, MLP, GBT, Logistic Regression.
Clustering: K-Means, Gaussian Mixture Model (GMM).
Preprocessing: Normalization, Binarization, Imputer, One-Hot Encoder, String Encoder, MinMax Scaler, MaxAbsScaler.
Ensambles: Boosting, Stacking, Bagging of any model.
2019 © GridGain Systems GridGain Company Confidential11
TensorFlow integration
2019 © GridGain Systems2019 © GridGain Systems
Apache Ignite as data source
12
2019 © GridGain Systems2019 © GridGain Systems
Distributed training
13
TensorFlow Cluster on top of Apache Ignite ClusterTensorFlow/Ignite Client
ignite-tf.sh
Tool that help to submit TensorFlow code into cluster and manage it.
2019 © GridGain Systems2019 © GridGain Systems
Distributed training
14
2019 © GridGain Systems GridGain Company Confidential15
Demo
2019 © GridGain Systems2019 © GridGain Systems
Demo
16
Training and inference for pre-build models
2019 © GridGain Systems2019 © GridGain Systems
Demo
17
TensorFlow model inference from Apache Ignite
2019 © GridGain Systems2019 © GridGain Systems
Links
18
● Apache Ignite docs - https://apacheignite.readme.io/docs● Apache Ignite sources - https://github.com/apache/ignite● TensorFlow IO module - https://github.com/tensorflow/io/tree/master/tensorflow_io/ignite
2019 © GridGain Systems2019 © GridGain Systems19
Q&A
2019 © GridGain Systems2019 © GridGain Systems
Attend the In-Memory Computing Summit
20