Machine Learning: The High Interest Credit Card of Technical Debt [PWL]
The Market Intelligence Company of the Digital World
$65M funding · Founded 2007 · 6 offices · 300+ employees
Learned | Estimated
Machine learning: The high interest credit card of technical debt (2014)
Hidden technical debt in machine learning systems (2015)
D. Sculley, Gary Holt, Daniel Golovin, Eugene Davydov, Todd Phillips, Dietmar Ebner, Vinay Chaudhary, Michael Young, Jean-Francois Crespo, Dan Dennison
A Few Words About The Papers
Systems engineering papers
About Machine Learning systems
Give a lot of names to a lot of things (which we know is hard)
We found them in 2015 and liked them a lot
Today
A Few Words About The Papers
What is ML and what is Technical Debt?
Sources of Technical Debt in ML systems
Mitigation
Machine Learning
[Diagram: Train takes data, an algorithm, and hyperparameters and produces a model; Predict passes new data through the model to produce outputs]
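A minimal sketch of the train/predict split, using scikit-learn on synthetic data (the features, labels, and hyperparameter here are made up purely for illustration):

```python
# Minimal sketch of the train/predict flow above. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Train: data + algorithm + hyperparameters -> model
X_train = rng.normal(size=(1000, 5))          # data
y_train = (X_train[:, 0] > 0).astype(int)     # labels
model = LogisticRegression(C=1.0)             # algorithm + a hyperparameter
model.fit(X_train, y_train)

# Predict: new data -> outputs
X_new = rng.normal(size=(10, 5))
print(model.predict(X_new))
```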
Why Machine Learning?
Allows us to convert data to software
We often already have data
Some problems are hard or impossible to solve otherwise
http://xkcd.com/1425/
Technical Debt
A metaphor for the long-term costs of moving quickly
Lack of testing, bad modularity, non-redundant systems, etc.
Somewhat similar to fiscal debt: there are good reasons to take it on, but it needs to be serviced
Hidden technical debt is a special, evil variant
Boundary Erosion
Boundaries in Systems Engineering
Components, interfaces, all that jazz
Think MVC, microservices
Implicitly assumed in “good” systems
Makes components easy to: test, change, reason about, monitor
Entanglement
ML system “inputs”: learning settings, hyperparameters, data prep settings, real-world inputs, other systems’ outputs
Issues:
A change in the distribution of any input influences all outputs
Adding or removing a feature changes the model and the output distribution
Any configuration parameter is just as coupled
Retraining is not reproducible
Changing Anything Changes Everything (CACE)
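A toy demonstration of CACE, assuming a scikit-learn logistic regression on synthetic data: dropping a single feature and retraining shifts the score of essentially every example, not only the ones that “used” that feature:

```python
# Toy CACE demonstration: remove one feature, retrain, and the predicted
# probabilities change across the board. Synthetic data, illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = ((X[:, 0] + 0.5 * X[:, 1]) > 0).astype(int)

full = LogisticRegression().fit(X, y)
reduced = LogisticRegression().fit(X[:, 1:], y)  # feature 0 removed

p_full = full.predict_proba(X)[:, 1]
p_reduced = reduced.predict_proba(X[:, 1:])[:, 1]
print("examples with changed scores:", np.sum(~np.isclose(p_full, p_reduced)))
```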
Correction Cascades
[Diagram: models A → B → C, each feeding its output forward as an input to the next]
We sometimes use the output of an existing model as a feature to get a small correction
Easier than training a new model
Easier than teaching an existing model new tricks
[Diagram: an improvement to model A shows up as a degradation at B and C]
Model improvements can cause degradation down the line
Corrections might lead to an “improvement deadlock”
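A hypothetical correction cascade in code, with made-up models and data: model_b silently couples itself to model_a’s exact score distribution, so improving model_a later can degrade model_b:

```python
# Sketch of a correction cascade: model_b takes model_a's output as an
# extra input feature. If model_a is later improved, model_b's learned
# "correction" is miscalibrated until it is retrained. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
X = rng.normal(size=(1000, 3))
y = (X @ np.array([1.0, -1.0, 0.5]) > 0).astype(int)

model_a = LogisticRegression().fit(X[:, :2], y)     # existing model A
a_scores = model_a.predict_proba(X[:, :2])[:, [1]]  # its output...
X_b = np.hstack([X, a_scores])                      # ...used as a feature
model_b = LogisticRegression().fit(X_b, y)          # the "correction" B
```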
Undeclared Consumers
Outputs of ML systems include predictions, weights, and other state
That data is easy for other systems to silently consume
In turn this makes it hard to improve the model
May create hidden feedback loops
Data Dependencies
[Diagram, built up over four slides: in a regular system, a component maps Input → Output through code alone. In an ML system, the component splits in two: a Trainer consumes inputs and logs to produce Weights, and a Predict stage combines those weights with live inputs to produce the Output, so a data dependency joins the usual code dependency]
Unstable Dependencies
Features for training can be outputs of other models: IDF tables, Word2Vec embeddings…
Or logs, intermediate results, monitoring feeds…
But what if they change schema? Stop being updated? Disappear?
Underutilized Dependencies
Legacy features: nobody maintains them, or wants to
Bundled features: not sure which ones we actually need
Correlated features: may mask features with actual causality
Epsilon features: improve the result by very little
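A hedged sketch of how one might hunt for epsilon or underutilized features: retrain without each feature in turn and compare validation accuracy. Leave-one-feature-out analysis is a common practice rather than something the papers prescribe; the data here is synthetic:

```python
# Leave-one-feature-out check: if dropping a feature barely moves
# validation accuracy, its dependency cost may outweigh its value.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 6))
y = (X[:, 0] + 0.01 * X[:, 5] > 0).astype(int)  # feature 5 is "epsilon"

baseline = cross_val_score(LogisticRegression(), X, y, cv=5).mean()
for i in range(X.shape[1]):
    X_drop = np.delete(X, i, axis=1)
    score = cross_val_score(LogisticRegression(), X_drop, y, cv=5).mean()
    print(f"feature {i}: accuracy delta without it = {score - baseline:+.4f}")
```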
Software Issues
ML as Software
Actual machine learning is a lot more than modeling
[Figure, from the paper: the ML model is a small box dwarfed by the surrounding infrastructure: configuration, data collection, feature extraction, data verification, process management, resource management, analysis tools, serving infrastructure, monitoring]
Software Issues
Glue code
Pipeline jungles
Dead experimental paths
Abstraction debt
Multiple languages, systems, packages
Configuration Debt
Need to configure, test, and deploy: hyperparameters, schema (including semantics), data dependencies
Hard to understand or visualize what changed
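One possible antidote, sketched with hypothetical field names (they are not from the papers): treat configuration as a typed, validated, hashable object rather than a pile of loose flags:

```python
# Sketch of a versioned, testable configuration object.
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass(frozen=True)
class TrainConfig:
    learning_rate: float
    l2_penalty: float
    feature_schema_version: str   # schema, including semantics
    training_data_path: str       # explicit data dependency

    def validate(self) -> None:
        assert 0 < self.learning_rate < 1.0, "learning_rate out of range"
        assert self.l2_penalty >= 0, "l2_penalty must be non-negative"

    def version_hash(self) -> str:
        """Stable hash so two runs can be diffed and compared."""
        blob = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(blob).hexdigest()[:12]
```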
Interactions
Changes in the External World
Experience has shown that the external world is rarely stable:
The Word2Vec embedding for “Pokemon”, the population of Sudan, the Gregorian dates of holidays
Makes monitoring essential (sketch below).
Makes testing very hard.
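A minimal monitoring sketch, assuming scipy is available: compare today’s prediction scores against a stored baseline and alert on divergence. The Kolmogorov-Smirnov statistic and the threshold are illustrative choices, not something the papers mandate:

```python
# Compare today's prediction-score distribution to a historical baseline
# and alert when they diverge too much.
from scipy.stats import ks_2samp

def check_prediction_drift(baseline_scores, todays_scores, threshold=0.1):
    stat, p_value = ks_2samp(baseline_scores, todays_scores)
    if stat > threshold:
        print(f"ALERT: score distribution drifted (KS={stat:.3f}, p={p_value:.3g})")
    return stat
```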
Direct Feedback Loops
A model sometimes influences its own future training data
This is common in recommendation systems, ad placement, and systems that affect the physical world
Especially hard if the change is gradual and the model updates infrequently
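One common mitigation, noted here as an addition rather than something from the papers: hold a small random slice of traffic out of the model’s influence, so some logged training data stays unbiased. A sketch with hypothetical helpers:

```python
# Hold out a random fraction of decisions from the model so that part of
# the logged training data is not shaped by the model's own choices.
import random

HOLDOUT_FRACTION = 0.01  # illustrative value

def log_example(user, choice, influenced_by_model):
    """Stub: record the decision and whether the model influenced it."""
    print(user, choice, influenced_by_model)

def choose_item(user, candidates, score, rng=random):
    """`score` is any callable (user, candidate) -> float."""
    if rng.random() < HOLDOUT_FRACTION:
        choice = rng.choice(candidates)  # model-independent sample
        log_example(user, choice, influenced_by_model=False)
    else:
        choice = max(candidates, key=lambda c: score(user, c))
        log_example(user, choice, influenced_by_model=True)
    return choice
```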
Hidden Feedback Loops
Often happen when two different systems learn from each other’s outputs
The classic example is algorithmic trading
But two independent content-generation systems running on the same page also qualify
Undeclared consumers can be a cause
…But Wait, There’s More!
Data Testing
Reproducibility
Process Management
Cultural Debt
Mitigation
Be Aware of Debt
How easily can an entirely new algorithmic approach be tested at full scale?
What is the transitive closure of all data dependencies?
How precisely can the impact of a new change to the system be measured?
Does improving one model or signal degrade others?
How quickly can new members of the team be brought up to speed?
Paying per Model
Merge mature models into a single, well-defined, well-tested system
Prune experimental code paths
Make each feature count
Monitor
Map consumers
Test data (sketched below)
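A sketch of what “test data” can mean in practice: cheap schema and range assertions on each incoming batch, before it reaches training or serving. The column names and bounds here are hypothetical:

```python
# Cheap data tests: assert schema and plausible value ranges on a batch.
import pandas as pd

EXPECTED_COLUMNS = {"user_id", "session_length_sec", "country_code"}

def validate_batch(df: pd.DataFrame) -> None:
    missing = EXPECTED_COLUMNS - set(df.columns)
    assert not missing, f"schema change: missing columns {missing}"
    assert df["session_length_sec"].between(0, 86_400).all(), \
        "session_length_sec out of plausible range"
    assert df["user_id"].notna().all(), "null user_id values"
```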
Paying for Systems
Configuration system: versioned, comprehensive, testable
Data dependency system: versioned, comprehensive, testable
Consolidate mature systems
Reproducibility is awesome (sketched below)
Pay off cultural debt
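And a sketch of cheap reproducibility: fix the seeds and record exactly what went into a training run. The manifest fields are illustrative, not a prescribed format:

```python
# Make a training run reproducible: fix seeds and write a manifest that
# pins the configuration and the exact bytes of the training data.
import hashlib
import json
import random
import numpy as np

def training_manifest(config: dict, data_path: str, seed: int = 0) -> dict:
    random.seed(seed)
    np.random.seed(seed)
    with open(data_path, "rb") as f:
        data_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "config": config,
        "data_path": data_path,
        "data_sha256": data_hash,
        "seed": seed,
    }
```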
Questions?
We Are Hiring! similarweb.com/corp/jobs