AMIR TABAKOVIC VP OF BUSINESS DEVELOPMENT IG...
December 1, 2015, Zürich
MACHINE LEARNING MADE BEAUTIFULLY SIMPLE
AMIR TABAKOVIC, VP OF BUSINESS DEVELOPMENT, BIGML, INC
BigML Inc
What’s Machine Learning?
“to give computers the ability to learn to perform a task without being
explicitly programmed”
“to automatically find patterns in data that can be reused in the future”
Age of ML
“The first half of the information age was programming computers to do what we want. In the second half of the information age computers will program themselves.”
(Pedro Domingos, Master Algorithm)
“When I was a programmer, I was very good at figuring out all the algorithms and writing them all down. Today, I think I would try to figure out how to program a computer to learn something.”
Eric Schmidt, Google
Market Evolution
Machine Learning users: Developers, Data Scientists, Everyone, Analysts, Academics & Researchers
Timeline: 1980s, 2000s, 2010, 2015, 2025, 2030
Tools and services named: Weka, R, Orange, Knime, Scikit; RapidMiner, H2O, SkyTree, Dato, Spark; BigML, Google, Azure ML, Amazon ML
DEVELOPERS 15
TASKS 7.8M+
DATASETS 720k+
PREDICTIVE MODELS 4.6M+
20k+ Team Members
24
• Founded in January 2011 to automate machine learning.
• Pioneered MLaaS (Machine Learning as a Service).
• API-first company with a beautiful UI.
• Cloud-based and on-premise private deployments for enterprises.
BigML features overview
REST API
Auto-scalable Infrastructure
Distributed Machine Learning Backend
Web Interface and Visualization
Bindings: Python, Node.js, Java, C#, R
BigMLer, BigML GAS, BigML X
Multi-tenant Private Deployments
On-premise Private Deployments
Predictive Applications
Types of algorithms
Supervised (Decision Trees & Random Forests)
• Regression: continuous values
• Classification: discrete values (classes/labels)
Unsupervised
• Clustering: to group data points by similarity
• Anomaly Detection: to find outliers that do not fit standard patterns
Benefits of implementing ML APIs
• ML APIs automate and transform Machine Learning from a highly manual and detached mix of processes and heterogeneous tools into a single cohesive and easy-to-use service
• ML APIs reduce the cost and complexity of building and deploying predictive models
• ML APIs increase business performance by rapidly incorporating Machine Learning into each department's operations and decisions, reducing the time-to-market of data-driven decisions
• ML APIs manage the heavy infrastructure needed to learn from data and make predictions at scale
• ML APIs add traceability and repeatability to Machine Learning tasks
Early Detection of Delinquency
Stages: Data Transformations, Algorithmic Modeling, Process, Application / Reports
Inputs: Users, Operations, Loans
Historical Data: data is transformed and tagged, feeding the major model
Current Data: transformed data is compared against a threshold
Result: customers with higher probability of falling into default
• Fully automated process
• System is capable of predicting in batch or in real time if a customer will stop paying a loan in a specified window of time (2 months, 3 months).
• Generation of reports that give future confidence of a customer staying current and/or defaulting.
• Can be directly integrated with other systems through exporting files, use of REST calls, or through libraries in multiple programming languages.
Predictions Delinquency
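The thresholding step in the delinquency flow can be sketched in a few lines of Python. This is only an illustration: the function name `flag_at_risk`, the customer ids, the probabilities, and the 0.5 threshold are all hypothetical and do not come from the slides or from any BigML API.

```python
def flag_at_risk(default_probabilities, threshold=0.5):
    """Return customer ids whose predicted default probability exceeds
    the threshold, ordered from highest to lowest probability.

    `default_probabilities` maps a customer id to a model output in [0, 1].
    Both the mapping and the threshold are made-up inputs for illustration.
    """
    flagged = [
        (customer_id, probability)
        for customer_id, probability in default_probabilities.items()
        if probability > threshold
    ]
    flagged.sort(key=lambda item: item[1], reverse=True)
    return [customer_id for customer_id, _ in flagged]


# Hypothetical batch of scored customers.
scores = {"c1": 0.12, "c2": 0.81, "c3": 0.55}
at_risk = flag_at_risk(scores, threshold=0.5)
```

In practice the threshold would be tuned against the cost of missing a default versus the cost of flagging a customer who stays current.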
http://www.lendingmemo.com/lending-club-strategy/
• 5 simple ways to increase returns at Lending Club
Diversify, Increase Risk, Reinvest…
• 2 more complicated ways to increase ROI
Predictive Models, Secondary Market
“creating a custom algorithm is beyond the ability of 99% of investors”
Disclaimer
• Playing with data from Lending Club
https://www.lendingclub.com/info/download-data.action
• The data is real but has been filtered
• This is not financial advice – you’re old enough to know that
Basic Idea
• Focus on Lending Club assigned grades B - G
• Build a predictive model to detect and filter out bad loans
• Automate everything, build predictive app
• Get shamelessly rich
Loan Life Cycle
“Open”: Current, In Grace Period, Late 16-30 Days, Late 31-120 Days
“Closed”: Fully Paid, Charged Off, Default
(if (= (field "loan_status") "Fully Paid") "good" "bad")
• Exclude Grade A
• Split Dataset into “Open” & “Closed” Loans
• Transform 3 Categories of “Closed” Loan Status Feature into new Label “Quality” (“good” / “bad”)
• Split “CLOSED” Dataset into Training and Test Datasets (80% / 20%)
• Exclude Anomalies with Anomaly Detector
• Train Dataset with Decision Tree
• Train Dataset with Ensemble
• Evaluate
• Score “OPEN” Dataset with “Quality” Label based on best Predictive Model
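The data preparation part of this workflow can be sketched in plain Python. The sketch assumes each loan is a dict with hypothetical keys `grade` and `loan_status`, assumes the 80% share is the training set, and uses a made-up function name `prepare_loans`. Anomaly removal, model training, and evaluation are left out; this is not the BigML implementation.

```python
import random

# Statuses treated as "Closed" in the loan life cycle slide.
CLOSED_STATUSES = {"Fully Paid", "Charged Off", "Default"}


def prepare_loans(loans, train_fraction=0.8, seed=42):
    """Return (training, test, open_loans) following the slide's steps."""
    # Step 1: exclude grade A loans.
    kept = [loan for loan in loans if loan["grade"] != "A"]

    # Step 2: split into "Open" and "Closed" loans.
    closed = [loan for loan in kept if loan["loan_status"] in CLOSED_STATUSES]
    open_loans = [loan for loan in kept if loan["loan_status"] not in CLOSED_STATUSES]

    # Step 3: collapse the three closed statuses into a "quality" label.
    labeled = [
        dict(loan, quality="good" if loan["loan_status"] == "Fully Paid" else "bad")
        for loan in closed
    ]

    # Step 4: shuffle reproducibly, then split into training and test sets.
    rng = random.Random(seed)
    rng.shuffle(labeled)
    cut = int(len(labeled) * train_fraction)
    return labeled[:cut], labeled[cut:], open_loans
```

The open loans are returned unlabeled, since they are the ones to be scored later with the best model.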
Isolation Forest:
Grow a random decision tree until each instance is in its own leaf
“easy” to isolate
“hard” to isolate
Depth
Now repeat the process several times and use average Depth to compute anomaly score: 0 (similar) -> 1 (dissimilar)
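A minimal sketch of the isolation idea, in plain Python. Instead of growing a whole tree, it follows a single point down random splits and records the depth at which it is isolated, then averages over many random trees. The normalisation by the expected depth and the `2 ** (-depth / expected)` score follow the commonly published Isolation Forest formulation; whether BigML uses exactly this formula is an assumption. All function names and the sample data are hypothetical.

```python
import math
import random


def isolation_depth(point, rows, rng, depth=0, max_depth=50):
    """Depth at which `point` ends up alone under random axis-aligned splits."""
    if len(rows) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary within this subset can be split.
    splittable = [
        d for d in range(len(point))
        if min(r[d] for r in rows) < max(r[d] for r in rows)
    ]
    if not splittable:
        return depth  # remaining rows are identical, cannot isolate further
    dim = rng.choice(splittable)
    low = min(r[dim] for r in rows)
    high = max(r[dim] for r in rows)
    split = rng.uniform(low, high)
    # Keep only the rows that fall on the same side of the split as the point.
    same_side = [r for r in rows if (r[dim] < split) == (point[dim] < split)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)


def expected_depth(n):
    """Approximate average isolation depth for n rows (used to normalise)."""
    if n <= 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # Euler-Mascheroni approximation
    return 2 * harmonic - 2 * (n - 1) / n


def anomaly_score(point, rows, n_trees=100, seed=0):
    """Score near 0 means similar to the rest, near 1 means dissimilar."""
    rng = random.Random(seed)
    average = sum(isolation_depth(point, rows, rng) for _ in range(n_trees)) / n_trees
    return 2 ** (-average / expected_depth(len(rows)))


# Hypothetical one-feature data: a tight cluster plus one far-away value.
rows = [(4.0,), (5.0,), (5.0,), (6.0,), (7.0,), (50.0,)]
outlier_score = anomaly_score((50.0,), rows)
inlier_score = anomaly_score((5.0,), rows)
```

The far-away value is usually cut off by the first split, so its average depth is small and its score is higher than that of a point inside the cluster.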
BigML Anomaly
Dia Color Shape Fruit
4 red round plum
5 red round apple
5 red round apple
6 red round plum
7 red round apple
Ensembles
Bagging! Random Decision Forest!
What is a round, red 6cm fruit?
All Data: “plum”
Sample 1: “plum”
Sample 2: “apple”
Sample 3: “apple”
Majority vote: “apple”
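The voting step can be sketched as follows, using the diameters and labels from the fruit table. The base learner here is a nearest-diameter lookup, a deliberately simple stand-in for a decision tree, and the names `nearest_label`, `majority_vote`, and `bagged_predict` are hypothetical. The bootstrap samples are drawn at random, so they will not necessarily match the three samples shown on the slide.

```python
import random
from collections import Counter

# (diameter in cm, fruit) pairs from the table; color and shape are constant.
ROWS = [(4, "plum"), (5, "apple"), (5, "apple"), (6, "plum"), (7, "apple")]


def nearest_label(train_rows, diameter):
    """Stand-in base learner: label of the row with the closest diameter."""
    return min(train_rows, key=lambda row: abs(row[0] - diameter))[1]


def majority_vote(labels):
    """Most frequent label among the individual model predictions."""
    return Counter(labels).most_common(1)[0][0]


def bagged_predict(rows, diameter, n_models=3, seed=0):
    """Train one base learner per bootstrap sample and combine by vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(rows) for _ in rows]
        votes.append(nearest_label(sample, diameter))
    return majority_vote(votes)


single_model = nearest_label(ROWS, 6)                    # model on all data
slide_vote = majority_vote(["plum", "apple", "apple"])   # votes from the slide
bagged = bagged_predict(ROWS, 6)
```

A single model trained on all data memorises the one 6cm plum, while the vote across resampled models can smooth out that single row.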
ML Opportunity in FinTech
Banks have traditionally
• Used statistical analysis of data for many years as the primary way to model the behaviour and needs of their customers
• Not leveraged all the information they have available
• Not incorporated additional information available to capture new dimensions for risk taken
ML Opportunity in FinTech
It is time to
• Take advantage of machine learning
  • deal with many variables to explore thousands of complex combinations (not just linear)
  • discover new patterns that otherwise would have been hidden
• Benefit from
  • better predictive accuracy
  • adaptability
  • publicly available data
  • ML APIs