Square's Machine Learning Infrastructure and Applications - Rong Yan
Transcript of Square's Machine Learning Infrastructure and Applications - Rong Yan
May 15, 2014
Rong Yan
Machine Learning @ Square
Birth of Square
[Diagram: payment devices (Stand, Reader), payment aggregation, and a risk model make up Payment; Commerce adds Cash and Market]
Our Mission: Make commerce easy.
Payment → Data → Commerce
The Next Big Thing
Scale: 3M+ Readers, $15B+ Annualized
Offline and Online
Amount
Location
Item Desc.
Card #
Credit Score
Friends
Activity History
Inventory
Sales Volume
Haircut Price
Turn Data into Business Value
‣ Fraud Detection
‣ Business Insight
‣ Customer Relation
‣ Information Discovery
Fraud Detection @ Square
Fraud Detection in the payment flow
[Diagram: Payments from 150,000 active sellers per day → Risk ML Fraud Detection (near-real-time) → ~2,000 suspect sellers → Risk Ops transaction review → Bank clears for settlement]
ML Architecture
[Diagram: Merchant, Devices, Bank Accounts → Machine Learning (300+ features) → Suspicions]
Example features: Card not present: Yes; Pan Diversity: 0.05; Use iPhone: No
Feature Generation
‣ Easy to interpret
‣ Dimension reduction
‣ Very powerful in ensemble
Decision Tree Model
[Tree diagram: root splits on Decline Rate >= 0.1; branches split on Amount <= $10000 and Business Type = Auto repair; leaf scores 0.9 and 0.6]
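The tree on this slide can be sketched as a plain scoring function. The split thresholds and the 0.9 / 0.6 leaf scores come from the slide; the exact tree shape and the low-risk leaf score (0.1) are assumptions for illustration, not Square's actual model:

```python
def tree_score(decline_rate, amount, business_type):
    """Toy fraud-risk decision tree mirroring the slide's splits.

    Tree shape and the 0.1 low-risk leaf are assumed; only the split
    features and the 0.9 / 0.6 leaf scores appear on the slide.
    """
    if decline_rate >= 0.1:                # risky sellers branch right
        if amount <= 10000:
            if business_type == "Auto repair":
                return 0.9                 # highest-risk leaf
            return 0.6
        return 0.6
    return 0.1                             # assumed low-risk leaf


print(tree_score(0.25, 500, "Auto repair"))   # 0.9
print(tree_score(0.05, 500, "Retail"))        # 0.1
```

Each payment walks the tree from the root to one leaf, and the leaf value is its risk score.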
Random Forests: Decision Tree Ensemble
[Diagram: Tree 1 splits on Decline Rate <= 0.1, Amount <= $10000, and Business Type = Auto repair (leaf scores 0.9, 0.6); Tree 2 splits on Decline Rate <= 0.3, Amount <= $20000, and Age <= 22 (leaf scores 0.8, 0.6); Tree N splits on Success Rate <= 0.2, Age >= 20, and Amount <= $1000 (leaf scores 0.4, 0.7)]
Individual tree outputs: Bad 0.9, Good 0.4, Bad 0.6
Mode for classification = Bad; Average for regression = 0.63
Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.
Random Forests - Build each Tree
‣ Draw a bootstrap sample from all data
‣ Features: Dollar Amount, Connected with bad user, Business Type, Decline Rate, Time of Day, Location; randomly select sqrt(n) of them
‣ Best split: feature and value (e.g., Decline Rate <= 0.1 → 0.4 / 0.6)
‣ Grow the tree on each branch; when sample size is small, STOP
‣ Repeat these steps multiple times to create a forest
Breiman, Leo (2001). "Random Forests". Machine Learning 45 (1): 5–32.
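The two randomization moves in the build steps (bootstrap sampling of rows, sqrt(n) feature subsets per split) can be sketched with the standard library. The feature names are from the slide; the data is a stand-in:

```python
import math
import random


def bootstrap(rows):
    """Sample with replacement: each tree sees a different view of the data."""
    return [random.choice(rows) for _ in rows]


def candidate_features(features):
    """Randomly select sqrt(n) of the n features to consider at a split."""
    k = max(1, int(math.sqrt(len(features))))
    return random.sample(features, k)


features = ["dollar_amount", "connected_with_bad_user", "business_type",
            "decline_rate", "time_of_day", "location"]

random.seed(0)
rows = list(range(10))            # stand-in for 10 training examples
sample = bootstrap(rows)          # the data one tree is grown on
picked = candidate_features(features)
print(len(sample), len(picked))   # 10 2
```

Growing a tree then means: find the best split over `picked`, recurse on each branch with fresh feature subsets, and stop when the node's sample is small; repeating the whole procedure N times yields the forest.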
Boosting Trees
Tree 1 → Tree 2 (help Tree 1) → Tree 3 (help Tree 1, 2) → Tree 4 (help Tree 1, 2, 3)
Stop when no help needed
0 weights all samples
Final score is the sum over trees: 7.5 = 8.0 + (-2.0) + 1.0 + 0.5
Boosting Trees - Algorithm
Objective function: Loss
Friedman, J. H. "Greedy Function Approximation: A Gradient Boosting Machine." 1999
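A minimal sketch of the "each new tree helps the previous ones" idea, using one-split regression stumps and squared loss (gradient boosting à la Friedman; this toy is not Square's implementation, and real boosted trees use deeper trees and a learning rate below 1):

```python
def fit_stump(xs, residuals):
    """Fit a one-split regressor (a stump) to the residuals by least squares."""
    best = None
    for t in xs:  # try each observed value as a threshold
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, rounds=4, lr=1.0):
    """Each stump fits the residual of the current ensemble:
    Tree 2 helps Tree 1, Tree 3 helps Trees 1 and 2, and so on."""
    stumps = []
    preds = [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        s = fit_stump(xs, residuals)
        stumps.append(s)
        preds = [p + lr * s(x) for p, x in zip(preds, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 5.0, 5.0]
model = boost(xs, ys)
print(round(model(1.5), 2))   # 1.0
```

The final prediction is the sum of all the trees' outputs, exactly as in the slide's 7.5 = 8.0 + (-2.0) + 1.0 + 0.5 example.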
Results - Precision (at a fixed recall level)

Model            April    May     June
Random Forest    76%      77%     80%
Boosting Trees   85%      82%     88%
Improvement      +11.8%   +6.5%   +10%
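"Precision at a fixed recall level" can be computed by sweeping the score threshold from high to low and reading off precision once recall reaches the target. A minimal sketch (not the talk's evaluation code), with made-up scores and labels:

```python
def precision_at_recall(scores, labels, target_recall):
    """Precision at the highest threshold whose recall >= target_recall.

    scores: model outputs, higher = more suspicious.
    labels: 1 for fraud, 0 for legitimate.
    """
    pairs = sorted(zip(scores, labels), reverse=True)  # threshold high → low
    total_pos = sum(labels)
    tp = fp = 0
    for _, label in pairs:
        if label:
            tp += 1
        else:
            fp += 1
        if tp / total_pos >= target_recall:
            return tp / (tp + fp)
    return 0.0


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
print(precision_at_recall(scores, labels, 2 / 3))   # 1.0
```

Fixing recall makes the month-over-month precision numbers comparable: each model is judged while catching the same share of fraud.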
Results - Fraud Detection Recall
[Chart: Fraud $ Prevented vs. # Payments to Reject, broken into Easy, Medium, and Hard regions]
Data Sampling
‣ Highly biased label distribution: fewer than 1 in 1000 samples are positive
‣ Weighted training
  - Higher weights on positive samples => oscillation
  - Lower weights on negative samples => no real gain
‣ Solution
  - Keep the negative:positive ratio between 3:1 and 10:1
  - Scale the final model if calibration is needed
‣ Less data requires fewer resources to train
‣ Observed +10% improvement from 20:1 to 3:1
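The downsampling step can be sketched in a few lines: keep every positive, sample the negatives down to the target ratio. This is an illustrative sketch (the 3:1 default comes from the slide; the data is synthetic):

```python
import random


def downsample(examples, ratio=3):
    """Keep all positives; sample negatives to a negative:positive
    ratio of ratio:1 (the talk keeps this between 3:1 and 10:1)."""
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    kept = random.sample(negatives, min(len(negatives), ratio * len(positives)))
    return positives + kept


random.seed(0)
# 10 fraud examples among 1000, mimicking the heavy class imbalance
examples = [(i, 1) for i in range(10)] + [(i, 0) for i in range(10, 1000)]
balanced = downsample(examples)
print(len(balanced))   # 40: 10 positives + 30 sampled negatives
```

Because the negatives are undersampled, raw model scores overstate the fraud rate; that is the "scale the final model" step, applied when calibrated probabilities are needed.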
Productionalize Machine Learning
Startup Architecture
‣ Ruby-on-Rails + MySQL
‣ MySQL replication
‣ Tied to production schema
‣ Hard to do complex analysis

Scale it up: SOA + Data Warehouse
‣ Java services
‣ APIs
‣ HDFS
Scale it up: Data Transport
‣ Append-only feeds
‣ Kafka
‣ Replication
‣ Protocol buffers
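The core idea behind the Kafka-style append-only feeds can be sketched as a log plus per-consumer offsets: producers only append, and replication or replay is just re-reading from an offset. This is a stand-in sketch, not Square's actual transport (which uses Kafka and Protocol Buffers):

```python
class Feed:
    """Minimal append-only feed: producers append, consumers keep
    their own read offset, so replays are simply re-reads."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)       # messages are never mutated or deleted
        return len(self.log) - 1       # offset of the new message

    def read(self, offset):
        return self.log[offset:]       # everything a consumer hasn't seen yet


feed = Feed()
feed.append({"payment_id": 1, "amount_cents": 500})
feed.append({"payment_id": 2, "amount_cents": 900})
print(len(feed.read(0)))   # 2
```

Because the log is immutable, a new downstream consumer (say, the data warehouse) can bootstrap by reading from offset 0 without coordinating with producers.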
Payments - Highly Available
[Diagram: Merchant, Devices, and Bank Accounts data feed the highly available payments pipeline, which outputs Suspicions]
Parallel Environments and Data Integrity
[Diagram: Blue and Green environments run in parallel behind a VIP; upstream traffic can switch between them]
Other ML @ Square
‣ Square Random Forest
‣ Learning Management
‣ Recommendation
Square Random Forest
RF Learner                                      Implementation                                     Time (Train / Test)
RiskML Random Forest (built on Scikit-Learn)    C / Cython / Python (open source + Square code)    72 minutes
WiseRF                                          C++ (proprietary)                                  23 minutes
Square Random Forest                            Java (Square code)                                 15 minutes
Note: time reported on 3M training and 15M testing data
Learning Management System
‣ Support non-sophisticated users
‣ Fast ad-hoc analytics
‣ Accessible to everyone for easy model generation and evaluation
‣ Tracks results to ensure different models can be compared
Square Market Recommendation
10x conversion rate vs. random baseline