Machine Learning in Production
Transcript of Machine Learning in Production
![Page 1: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/1.jpg)
Machine Learning
Real-Life Data & ML in Production
@benfreu Ben Freundorfer
![Page 2: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/2.jpg)
![Page 3: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/3.jpg)
Costs
![Page 4: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/4.jpg)
What’s a model?
![Page 5: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/5.jpg)
Many algorithms are a bunch of matrix calculations.
• Costly to train models
• Cheap to apply models (predict)
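A minimal sketch of why applying a model is cheap, assuming a logistic regression model: once the weights exist, a prediction is one dot product and one sigmoid. The weights, bias, and feature values here are hypothetical.

```python
import math

def predict_proba(weights, bias, features):
    """Logistic regression prediction: one dot product plus a sigmoid."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, already-trained parameters for a two-feature model.
weights = [0.8, -0.4]
bias = 0.1

# Predicting for one example costs a handful of multiplications.
p = predict_proba(weights, bias, [1.0, 2.0])
print(round(p, 4))
```

Training has to repeat this kind of calculation over the whole data set many times, which is where the cost comes from.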
![Page 6: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/6.jpg)
Human work
![Page 7: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/7.jpg)
Real-Life Data
![Page 8: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/8.jpg)
Transformation
Transform relational data into vectors
All algos need: matrices of numbers
Some need 0.0 ≤ x ≤ 1.0, or mean = 0 and σ = 1
Look out for algos requiring "normalized" or "standardized" values → feature scaling
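The two scaling variants can be sketched in plain Python. The function names and sample values are illustrative only; in practice a numerical library would usually do this.

```python
import statistics

def min_max_scale(values):
    """Rescale values into the range 0.0 <= x <= 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no information; map it to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

days = [0, 5, 10, 20]  # hypothetical feature column
normalized = min_max_scale(days)
standardized = standardize(days)
print(normalized)
```

The min, max, mean, and σ computed on the training data should be stored and reused when scaling new examples at prediction time.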
![Page 9: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/9.jpg)
Categories
• Features with no numerical relation
• Category 5 doesn’t have 5x the y of category 1
• Fix: Dummy variables
• cat_1, cat_2, … cat_5 with values 0 or 1
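A small sketch of dummy variables, assuming five category IDs 1 to 5 as on the slide. The helper name `one_hot` is hypothetical.

```python
def one_hot(category, categories):
    """Turn one categorical value into 0/1 dummy variables."""
    return {f"cat_{c}": 1 if c == category else 0 for c in categories}

categories = [1, 2, 3, 4, 5]

# A user in category 5 gets cat_5 = 1 and every other dummy = 0,
# so the model no longer sees "5" as five times "1".
row = one_hot(5, categories)
print(row)
```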
![Page 10: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/10.jpg)
Missing Values
• days_since_last_purchase = null
How to deal with this? 0 or 999?
• Often intuitively clear from the data domain
One solution: max(days_since_last_purchase of other users)
• HAS to be addressed
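The "max of other users" solution could look like this. It is a sketch with made-up values, where `None` stands in for null.

```python
def impute_with_max(values):
    """Replace missing entries (None) with the largest known value."""
    known = [v for v in values if v is not None]
    if not known:
        raise ValueError("no known values to impute from")
    fill = max(known)
    return [fill if v is None else v for v in values]

# Hypothetical column: two users have never purchased anything.
days_since_last_purchase = [3, None, 40, 12, None]
filled = impute_with_max(days_since_last_purchase)
print(filled)
```

Whether the maximum is the right fill value depends on the data domain; here it encodes "never purchased" as "purchased as long ago as anyone".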
![Page 11: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/11.jpg)
Outliers
• days_since_last_purchase = 2837 for a legacy customer
• If it’s irrelevant, get rid of the whole example (legacy customer)
• Or cap at a max/min value
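Both options, dropping the example or capping the value, in a short sketch. The 365-day limit and the customer records are assumptions for illustration.

```python
def cap(values, lower, upper):
    """Clamp every value into the range [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]

def drop_outliers(rows, key, upper):
    """Remove whole examples whose value for `key` exceeds `upper`."""
    return [r for r in rows if r[key] <= upper]

days = [3, 40, 2837, 12]
capped = cap(days, lower=0, upper=365)

customers = [
    {"id": 1, "days_since_last_purchase": 3},
    {"id": 2, "days_since_last_purchase": 2837},  # legacy customer
]
kept = drop_outliers(customers, "days_since_last_purchase", upper=365)
print(capped, kept)
```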
![Page 12: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/12.jpg)
Reduce Features
• Check for correlation between features and get rid of redundant, highly correlated ones
• Get rid of intuitively useless features
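One way to check for correlated features is the Pearson correlation coefficient. The sketch below keeps a feature only if it is not strongly correlated with one already kept; the 0.95 threshold and the feature names are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long columns."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.95):
    """Keep each feature unless it correlates strongly with a kept one."""
    kept = {}
    for name, column in features.items():
        if all(abs(pearson(column, other)) < threshold for other in kept.values()):
            kept[name] = column
    return kept

features = {
    "price_eur": [10.0, 20.0, 30.0, 40.0],
    "price_cents": [1000.0, 2000.0, 3000.0, 4000.0],  # same information
    "visits": [5.0, 1.0, 4.0, 2.0],
}
reduced = drop_correlated(features)
print(list(reduced))
```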
![Page 13: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/13.jpg)
A Better Model
• Fewer features, i.e. simpler
• Trained on more training examples
![Page 14: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/14.jpg)
Moving to Production
![Page 15: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/15.jpg)
![Page 16: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/16.jpg)
Online vs Offline
OFFLINE: From time to time, retrain the whole model and upload it
ONLINE: The algorithm runs each time a new example is added and adapts the model a bit
Examples should be randomized
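A minimal sketch of the online variant, assuming logistic regression trained with one stochastic gradient descent step per example. The toy data, learning rate, and number of passes are made up for illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def online_update(weights, features, label, learning_rate=0.1):
    """Adapt the model a bit using one new example (one SGD step)."""
    prediction = sigmoid(sum(w * x for w, x in zip(weights, features)))
    error = prediction - label
    return [w - learning_rate * error * x for w, x in zip(weights, features)]

# Toy examples: features start with 1.0 as the bias term, label is 0 or 1.
examples = [([1.0, 0.0], 0), ([1.0, 1.0], 1), ([1.0, 0.2], 0), ([1.0, 0.9], 1)]

rng = random.Random(0)  # fixed seed so the run is repeatable
weights = [0.0, 0.0]
for _ in range(200):
    rng.shuffle(examples)  # randomize the order of examples
    for features, label in examples:
        weights = online_update(weights, features, label)

print(weights)
```

Shuffling matters because each step only sees one example: a long run of similar examples would pull the weights in one direction.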
![Page 17: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/17.jpg)
Example
Predict which category a user will buy from after newsletter signup
![Page 18: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/18.jpg)
Build Model
• Collect data
Traffic source, categories looked at prior to signup, etc., and y = category of purchase after signup
• Analyze
Try to make predictions using e.g. logistic regression
• Train final model
• Save weights to DB or JSON or file
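The last step, saving weights as JSON, might look like the sketch below. The model layout (feature names, classes, one weight row per class) and all numbers are hypothetical; the file goes to a temporary directory only to keep the example self-contained.

```python
import json
import os
import tempfile

# Hypothetical trained model: one row of weights per product category.
model = {
    "features": ["traffic_source_ads", "viewed_shoes", "viewed_books"],
    "classes": ["shoes", "books"],
    "weights": [[0.4, 1.2, -0.3], [-0.1, -0.5, 1.1]],
    "bias": [0.0, -0.2],
}

def save_model(model, path):
    """Write the weights to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)

def load_model(path):
    """Read the weights back, e.g. inside the production system."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(model, path)
restored = load_model(path)
print(restored["classes"])
```

Storing the feature names next to the weights helps keep the column order identical between training and prediction.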
![Page 19: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/19.jpg)
Predict
• User signs up
• Load weights and predict probabilities of categories
• If P(category X) > threshold, classify user as "interested in category X"
• Send out newsletters
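A sketch of the prediction step, assuming a multi-class logistic regression (softmax) model whose weights were stored as JSON. The JSON content, the threshold of 0.6, and the feature vector are all hypothetical.

```python
import json
import math

# Hypothetical stored model; in production this would come from a DB or file.
MODEL_JSON = """
{
  "classes": ["shoes", "books"],
  "weights": [[0.4, 1.2, -0.3], [-0.1, -0.5, 1.1]],
  "bias": [0.0, -0.2]
}
"""

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_interests(model, features, threshold=0.6):
    """Return class probabilities and the classes above the threshold."""
    scores = [
        b + sum(w * x for w, x in zip(row, features))
        for row, b in zip(model["weights"], model["bias"])
    ]
    by_class = dict(zip(model["classes"], softmax(scores)))
    interested = [c for c, p in by_class.items() if p > threshold]
    return by_class, interested

model = json.loads(MODEL_JSON)
# New signup: came from ads, looked at shoes, did not look at books.
probabilities, interests = predict_interests(model, [1.0, 1.0, 0.0])
print(interests)
```

With a threshold above 0.5 at most one category can qualify; a lower threshold would allow tagging a user with several interests.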
![Page 20: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/20.jpg)
Tips
• Use R or Python/Jupyter/Pandas to analyze data
• Test if you need a separate system for predictions or just for training
• Try not to implement algos yourself
If you do, use numerical computation libraries (probably wrappers for C or Fortran code)
• Be sure the past predicts the future
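One possible way to check the last tip, offered as an assumption rather than something the slides prescribe: train only on older examples and evaluate on newer ones, so the test mimics predicting the future. The rows and the cutoff date are made up.

```python
from datetime import date

def split_by_date(rows, cutoff):
    """Older rows are for training, rows from the cutoff on for testing."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test

# Hypothetical signups with the category bought afterwards.
rows = [
    {"date": date(2016, 1, 10), "y": "shoes"},
    {"date": date(2016, 2, 3), "y": "books"},
    {"date": date(2016, 3, 15), "y": "shoes"},
    {"date": date(2016, 4, 1), "y": "books"},
]

train, test = split_by_date(rows, cutoff=date(2016, 3, 1))
print(len(train), len(test))
```

If a model trained on `train` predicts `test` much worse than it predicts a random holdout, the past may not be a good guide to the future for this data.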
![Page 21: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/21.jpg)
Ethics
• Your model might turn into a racially profiling sexist.
• Be aware of what your input features mean & what you actually base your predictions on
• Relatively harmless when predicting product categories - questionable for credit ratings
![Page 22: Machine Learning in Production](https://reader031.fdocuments.net/reader031/viewer/2022022414/587355251a28ab56378b723f/html5/thumbnails/22.jpg)
Thank you
Ben Freundorfer
@benfreu