Square's Machine Learning Infrastructure and Applications - Rong Yan
Transcript of Square's Machine Learning Infrastructure and Applications - Rong Yan
May 15, 2014
Rong Yan
Machine Learning @ Square
Birth of Square
[Diagram: payment devices (Stand, Reader), payment aggregation, and a risk model make up Payment; Commerce adds Cash and Market]
Our Mission: Make commerce easy.
Payment → Data → Commerce
The Next Big Thing
Scale: 3M+ Readers, $15B+ Annualized
Offline and Online
Amount
Location
Item Desc.
Card #
Credit Score
Friends
Activity History
Inventory
Sales Volume
Haircut Price
Turn Data into Business Value
‣ Fraud Detection
‣ Business Insight
‣ Customer Relation
‣ Information Discovery
Fraud Detection @ Square
Fraud Detection in the payment flow
[Diagram: Payments from 150,000 active sellers per day → Risk ML Fraud Detection (near-real-time) → ~2,000 suspect sellers → Risk Ops transaction review → Bank clears for settlement]
ML Architecture
[Diagram: Merchant, Devices, Bank Accounts → Machine Learning (300+ features) → Suspicions]
Example features: Card not present: Yes; Pan Diversity: 0.05; Use iPhone: No
Feature Generation
‣ Easy to interpret
‣ Dimension reduction
‣ Very powerful in ensemble
Decision Tree Model
[Tree diagram: root splits on Decline Rate >= 0.1; branches split on Amount <= $10000 and Business Type = Auto repair; leaf scores 0.9 and 0.6]
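The tree on this slide can be sketched as a plain scoring function. The split thresholds and the 0.9 / 0.6 leaf scores come from the slide; the exact tree shape and the low-risk leaf score (0.1) are assumptions for illustration, not Square's actual model:

```python
def tree_score(decline_rate, amount, business_type):
    """Toy fraud-risk decision tree mirroring the slide's splits.

    Tree shape and the 0.1 low-risk leaf are assumed; only the split
    features and the 0.9 / 0.6 leaf scores appear on the slide.
    """
    if decline_rate >= 0.1:                # risky sellers branch right
        if amount <= 10000:
            if business_type == "Auto repair":
                return 0.9                 # highest-risk leaf
            return 0.6
        return 0.6
    return 0.1                             # assumed low-risk leaf


print(tree_score(0.25, 500, "Auto repair"))   # 0.9
print(tree_score(0.05, 500, "Retail"))        # 0.1
```

Each payment walks the tree from the root to one leaf, and the leaf value is its risk score.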
Random Forests: Decision Tree Ensemble
[Diagram: Tree 1 splits on Decline Rate <= 0.1, Amount <= $10000, and Business Type = Auto repair (leaf scores 0.9, 0.6); Tree 2 splits on Decline Rate <= 0.3, Amount <= $20000, and Age <= 22 (leaf scores 0.8, 0.6); Tree N splits on Success Rate <= 0.2, Age >= 20, and Amount <= $1000 (leaf scores 0.4, 0.7)]
Individual tree outputs: Bad 0.9, Good 0.4, Bad 0.6
Mode for classification = Bad; Average for regression = 0.63
Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.
Random Forests - Build each Tree
‣ Draw a bootstrap sample from all data
‣ Features: Dollar Amount, Connected with bad user, Business Type, Decline Rate, Time of Day, Location; randomly select sqrt(n) of them
‣ Best split: feature and value (e.g., Decline Rate <= 0.1 → 0.4 / 0.6)
‣ Grow the tree on each branch; when sample size is small, STOP
‣ Repeat these steps multiple times to create a forest
Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.
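The two randomization moves in the build steps (bootstrap sampling of rows, sqrt(n) feature subsets per split) can be sketched with the standard library. The feature names are from the slide; the data is a stand-in:

```python
import math
import random


def bootstrap(rows):
    """Sample with replacement: each tree sees a different view of the data."""
    return [random.choice(rows) for _ in rows]


def candidate_features(features):
    """Randomly select sqrt(n) of the n features to consider at a split."""
    k = max(1, int(math.sqrt(len(features))))
    return random.sample(features, k)


features = ["dollar_amount", "connected_with_bad_user", "business_type",
            "decline_rate", "time_of_day", "location"]

random.seed(0)
rows = list(range(10))            # stand-in for 10 training examples
sample = bootstrap(rows)          # the data one tree is grown on
picked = candidate_features(features)
print(len(sample), len(picked))   # 10 2
```

Growing a tree then means: find the best split over `picked`, recurse on each branch with fresh feature subsets, and stop when the node's sample is small; repeating the whole procedure N times yields the forest.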
Boosting Trees
Tree 1 → Tree 2 (help Tree 1) → Tree 3 (help Tree 1, 2) → Tree 4 (help Tree 1, 2, 3)
Stop when no help needed
0 weights all samples
Final score is the sum over trees: 7.5 = 8.0 + (-2.0) + 1.0 + 0.5
Boosting Trees - Algorithm
Objective function: Loss
Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." 1999
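A minimal sketch of the "each new tree helps the previous ones" idea, using one-split regression stumps and squared loss (gradient boosting à la Friedman; this toy is not Square's implementation, and real boosted trees use deeper trees and a learning rate below 1):

```python
def fit_stump(xs, residuals):
    """Fit a one-split regressor (a stump) to the residuals by least squares."""
    best = None
    for t in xs:  # try each observed value as a threshold
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, rounds=4, lr=1.0):
    """Each stump fits the residual of the current ensemble:
    Tree 2 helps Tree 1, Tree 3 helps Trees 1 and 2, and so on."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + lr * s(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 5.0, 5.0]
model = boost(xs, ys)
print(round(model(1.5), 2))   # 1.0
```

The final prediction is the sum of all the trees' outputs, exactly as in the slide's 7.5 = 8.0 + (-2.0) + 1.0 + 0.5 example.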
Results - Precision (at a fixed recall level)

Model            April    May     June
Random Forest    76%      77%     80%
Boosting Trees   85%      82%     88%
Improvement      +11.8%   +6.5%   +10%
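"Precision at a fixed recall level" can be computed by sweeping the score threshold from high to low and reading off precision once recall reaches the target. A minimal sketch (not the talk's evaluation code), with made-up scores and labels:

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the highest threshold whose recall >= target_recall.

    scores: model outputs, higher = more suspicious.
    labels: 1 for fraud, 0 for legitimate.
    """
    pairs = sorted(zip(scores, labels), reverse=True)  # threshold high → low
    total_pos = sum(labels)
    tp = fp = 0
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        if tp / total_pos >= target_recall:
            return tp / (tp + fp)
    return 0.0


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
print(precision_at_recall(scores, labels, 2 / 3))   # 1.0
```

Fixing recall makes the month-over-month precision numbers comparable: each model is judged while catching the same share of fraud.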
Results - Fraud Detection Recall
[Chart: Fraud $ Prevented vs. # Payments to Reject, broken into Easy, Medium, and Hard regions]
Data Sampling
‣ Highly biased label distribution: fewer than 1 in 1000 samples are positive
‣ Weighted training
  - Higher weights on positive samples => oscillation
  - Lower weights on negative samples => no real gain
‣ Solution
  - Keep the negative:positive ratio between 3:1 and 10:1
  - Scale the final model if calibration is needed
‣ Less data requires fewer resources to train
‣ Observed +10% improvement from 20:1 to 3:1
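The downsampling step can be sketched in a few lines: keep every positive, sample the negatives down to the target ratio. This is an illustrative sketch (the 3:1 default comes from the slide; the data is synthetic):

```python
import random


def downsample(examples, ratio=3):
    """Keep all positives; sample negatives to a negative:positive
    ratio of ratio:1 (the talk keeps this between 3:1 and 10:1)."""
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    kept = random.sample(negatives, min(len(negatives), ratio * len(positives)))
    return positives + kept


random.seed(0)
# 10 fraud examples among 1000, mimicking the heavy class imbalance
examples = [(i, 1) for i in range(10)] + [(i, 0) for i in range(10, 1000)]
balanced = downsample(examples)
print(len(balanced))   # 40: 10 positives + 30 sampled negatives
```

Because the negatives are undersampled, raw model scores overstate the fraud rate; that is the "scale the final model" step, applied when calibrated probabilities are needed.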
Productionalize Machine Learning
Startup Architecture
‣ Ruby-on-Rails + MySQL
‣ MySQL replication
‣ Tied to production schema
‣ Hard to do complex analysis

Scale it up: SOA + Data Warehouse
‣ Java services
‣ APIs
‣ HDFS
Scale it up: Data Transport
‣ Append-only feeds
‣ Kafka
‣ Replication
‣ Protocol buffers
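The core idea behind the Kafka-style append-only feeds can be sketched as a log plus per-consumer offsets: producers only append, and replication or replay is just re-reading from an offset. This is a stand-in sketch, not Square's actual transport (which uses Kafka and Protocol Buffers):

```python
class Feed:
    """Minimal append-only feed: producers append, consumers keep
    their own read offset, so replays are simply re-reads."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)       # messages are never mutated or deleted
        return len(self.log) - 1       # offset of the new message

    def read(self, offset):
        return self.log[offset:]       # everything a consumer hasn't seen yet


feed = Feed()
feed.append({"payment_id": 1, "amount_cents": 500})
feed.append({"payment_id": 2, "amount_cents": 900})
print(len(feed.read(0)))   # 2
```

Because the log is immutable, a new downstream consumer (say, the data warehouse) can bootstrap by reading from offset 0 without coordinating with producers.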
Payments - Highly Available
[Diagram: Merchant, Devices, and Bank Accounts data feed the highly available payments pipeline, which outputs Suspicions]
Parallel Environments and Data Integrity
[Diagram: Blue and Green environments run in parallel behind a VIP; upstream traffic can switch between them]
Other ML @ Square
‣ Square Random Forest
‣ Learning Management
‣ Recommendation
Square Random Forest
RF Learner                                      Implementation                                     Time (Train / Test)
RiskML Random Forest (built on Scikit-Learn)    C / Cython / Python (open source + Square code)    72 minutes
WiseRF                                          C++ (proprietary)                                  23 minutes
Square Random Forest                            Java (Square code)                                 15 minutes
Note: time reported on 3M training and 15M testing data
Learning Management System
‣ Support non-sophisticated users
‣ Fast ad-hoc analytics
‣ Accessible to everyone for easy model generation and evaluation
‣ Tracks results to ensure different models can be compared
Square Market Recommendation
10x conversion rate vs. random baseline