Creating a contemporary risk management system using python (dc)

Creating a Contemporary Risk Management System Using Python
Piero Ferrante
C2FO, Director of Data Science
@the_real_pdf

Transcript of Creating a contemporary risk management system using python (dc)

Page 1: Creating a contemporary risk management system using python (dc)

Creating a Contemporary Risk Management System Using Python
Piero Ferrante
C2FO, Director of Data Science

@the_real_pdf

Page 2: Creating a contemporary risk management system using python (dc)

What is C2FO?
● Collaborative Cash Flow Optimization
● World’s first global market for working capital
  ○ C2FO receives daily invoices from a massive network of buyers and suppliers
  ○ Buyers with excess cash set a desired rate of return
  ○ Suppliers can name their rate in terms of discount or APR
  ○ Payment is accelerated through C2FO markets and both parties win
    ■ Buyers achieve their desired rate of return
    ■ Suppliers are awarded at their desired cost of capital
    ■ C2FO markets finance over $1B in invoices per month

Page 3: Creating a contemporary risk management system using python (dc)

What is WFC?
● Water For Commerce
● An investment fund and short-term lending platform for SMBs
● Fund supplier invoices from buyers outside the C2FO network
● Using C2FO’s unique data to prospect and underwrite
  ○ Over 5 years of daily invoices and adjustments
  ○ ≈ 200k suppliers from top tier buyers
  ○ C2FO market bidding data
● Offers favorable yields vs. investments of comparable risk
  ○ ≈ 40 day duration → 6.25%

Page 4: Creating a contemporary risk management system using python (dc)

How does WFC complement C2FO?
“We are a search engine for liquidity” - Sandy Kemper

[Figure: network diagram of buyers (B1, B2, B3, ..., Bi) and suppliers (S1, S2, S3, ..., Si), with shares labeled 20%, 30%, 25%, and 25%]

Page 5: Creating a contemporary risk management system using python (dc)

SMB lending is hard and crowded, so why bother?
● C2FO is the champion of the supplier
  ○ We want to help our suppliers succeed financially
● All risk is not created equal
● We believe we can do it better :)
  ○ Without rate gouging borrowers
  ○ Without misleading investors
● We have great data
● We have great tools

Page 6: Creating a contemporary risk management system using python (dc)

Great tools make great things possible

Page 7: Creating a contemporary risk management system using python (dc)

Risk Management Overview
[Figure: matrix with rows Stage, Risk, and Method]
Stage: Prospecting & Underwriting; Onboarding; Portfolio Mgmt
Risk: Concentration Risk; Default Risk; Fraud Risk; Unsystematic Risk; Exposure Risk
Method: Buyer Diversity Score; Buyer Junk Score; Backtest & Forecast Score(s); 45 / 90 / 180d AR Score(s); Bankruptcy Score; Congruency Score; NLP Red Flag Score; Rate & Limit Calculator; Portfolio Diversification; Adjustment Volatility Score

Page 8: Creating a contemporary risk management system using python (dc)

Concentration Risk - Measuring diversity
● With a little pandas fu...
● A worse; B better; C best
● More diversity of accounts receivable is better
● Less concentration with “junk” buyers (below BBB) is better

[Figure: three example charts labeled A, B, and C]
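A minimal sketch of how the two concentration measures might be computed. The slide mentions pandas; this version uses plain Python so it runs without dependencies (in pandas the aggregation would be a groupby and sum). The formulas are assumptions, not C2FO's definitions: diversity is taken as one minus a Herfindahl index of AR shares, and the junk score as the share of AR owed by buyers rated below BBB. The invoice data is made up.

```python
from collections import defaultdict

# Hypothetical open invoices: (supplier, buyer, buyer_rating, amount).
INVOICES = [
    ("A", "B1", "BB", 900.0),
    ("A", "B2", "A", 100.0),
    ("C", "B1", "AA", 250.0),
    ("C", "B2", "A", 250.0),
    ("C", "B3", "BBB", 250.0),
    ("C", "B4", "AAA", 250.0),
]

# Ratings treated as investment grade (BBB and above); anything else is "junk".
INVESTMENT_GRADE = {"AAA", "AA", "A", "BBB"}


def concentration_scores(invoices):
    """Return {supplier: (diversity, junk_share)} from open AR by buyer."""
    ar_by_buyer = defaultdict(lambda: defaultdict(float))
    junk_ar = defaultdict(float)
    for supplier, buyer, rating, amount in invoices:
        ar_by_buyer[supplier][buyer] += amount
        if rating not in INVESTMENT_GRADE:
            junk_ar[supplier] += amount

    scores = {}
    for supplier, by_buyer in ar_by_buyer.items():
        total = sum(by_buyer.values())
        # Herfindahl index: sum of squared AR shares (1.0 = a single buyer).
        hhi = sum((amt / total) ** 2 for amt in by_buyer.values())
        scores[supplier] = (1.0 - hhi, junk_ar[supplier] / total)
    return scores


scores = concentration_scores(INVOICES)
```

Under these assumptions, supplier A (90% of AR with one junk-rated buyer) scores low on diversity and high on junk share, while supplier C (AR spread evenly over four investment-grade buyers) scores the opposite.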

Page 9: Creating a contemporary risk management system using python (dc)

[Figure: Risk Management Overview (Stage / Risk / Method), repeated from Page 7]

Page 10: Creating a contemporary risk management system using python (dc)

Default Risk - Forecasting accounts receivable
Problem: I want to build a bunch of forecasts using R, but the rest of my pipeline is in Python

Solution: Use rpy2 and get the best of both worlds

● Model types
  ○ ARMA / ARIMA / SARIMA - forecast package
  ○ Exponential smoothing (e.g. Holt-Winters) - forecast package
  ○ Bayesian Structural Time Series - bsts package
  ○ Regression (e.g. OLS, polynomial) - lm function

*Currently evaluating too!

Page 11: Creating a contemporary risk management system using python (dc)

Default Risk - Forecasting accounts receivable

Best model strategy: At least 5 quarters’ worth of history is required to make a 90-day forecast; 90 days is the maximum loan duration.

Page 12: Creating a contemporary risk management system using python (dc)

Default Risk - Forecasting accounts receivable

Best model strategy: Dozens of models are fit using different time series transformations and model parameter combinations; the “best model” seeks to minimize the mean absolute percentage error (MAPE) and root mean squared error (RMSE) for the last 90 days.
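The selection step can be sketched as a holdout comparison. This is an illustration only: the three candidate "models" are simple stand-ins for the R models named earlier, the series is synthetic, and ranking by MAPE with RMSE as a tie-breaker is an assumed way of combining the two metrics.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no zero actuals."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical candidates: each maps a training series to a forecast.
def last_value(train, horizon):
    return [train[-1]] * horizon


def historical_mean(train, horizon):
    return [sum(train) / len(train)] * horizon


def seasonal_naive(train, horizon, period=7):
    # Repeat the last full period of observations.
    return [train[-period + (i % period)] for i in range(horizon)]


CANDIDATES = {
    "last_value": last_value,
    "historical_mean": historical_mean,
    "seasonal_naive": seasonal_naive,
}


def pick_best_model(series, holdout=90):
    """Fit on everything before the last `holdout` points, score on the rest."""
    train, test = series[:-holdout], series[-holdout:]
    results = {}
    for name, model in CANDIDATES.items():
        forecast = model(train, holdout)
        results[name] = (mape(test, forecast), rmse(test, forecast))
    # Rank by MAPE first, then RMSE as a tie-breaker (an assumed rule).
    best = min(results, key=lambda name: results[name])
    return best, results


# Synthetic daily AR with a weekly pattern, about five quarters long.
series = [1000.0 + 200.0 * ((day % 7) - 3) for day in range(455)]
best, results = pick_best_model(series)
```

Because the synthetic series is purely weekly, the seasonal-naive candidate reproduces the holdout exactly and is selected; real AR series would separate the candidates far less cleanly.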

Page 13: Creating a contemporary risk management system using python (dc)

Default Risk - Forecasting accounts receivable

Of course matplotlib and seaborn make even the most customized plots possible.

Page 14: Creating a contemporary risk management system using python (dc)

Default Risk - Understanding seasonal trends
Thanks to statsmodels... seasonal decomposition is a breeze!

Knowing where a supplier is in terms of season is critical. It’s helpful to visually decouple seasonality from trend to help put the residual in perspective.
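To make the idea concrete, here is a hand-rolled classical additive decomposition. It is not the statsmodels implementation the slide refers to (statsmodels provides `seasonal_decompose` for this); it is a dependency-free sketch of the same idea on a synthetic series, assuming an odd seasonal period.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Assumes an odd `period` so the centered moving average is symmetric.
    """
    half = period // 2
    n = len(series)
    # Trend: centered moving average; undefined (None) near the edges.
    trend = [None] * n
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        trend[i] = sum(window) / period

    # Seasonal: average detrended value at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for i in range(n):
        if trend[i] is not None:
            buckets[i % period].append(series[i] - trend[i])
    means = [sum(b) / len(b) for b in buckets]
    center = sum(means) / period  # re-center so seasonal effects sum to zero
    seasonal = [means[i % period] - center for i in range(n)]

    # Residual: whatever trend and seasonality do not explain.
    residual = [
        None if trend[i] is None else series[i] - trend[i] - seasonal[i]
        for i in range(n)
    ]
    return trend, seasonal, residual


# Synthetic AR: linear growth plus a weekly pattern (period 7).
weekly = [0.0, 50.0, 100.0, 50.0, 0.0, -100.0, -100.0]
series = [1000.0 + 2.0 * day + weekly[day % 7] for day in range(70)]
trend, seasonal, residual = decompose_additive(series, period=7)
```

On this series the trend recovers the linear growth, the seasonal component recovers the weekly pattern, and the residual is essentially zero, which is the baseline against which a real supplier's residual can be judged.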

Page 15: Creating a contemporary risk management system using python (dc)

Default Risk - Predicting AR discontinuation
Discontinuation is defined by supplier AR dropping to zero with all C2FO buyers.

Challenges:

● Data leakage
  ○ Do not observe that which would not have been observable at the time of prediction
    ■ Establish criteria for prediction labels (e.g. supplier’s AR goes to 0 and stays there)
    ■ Define prediction cutoff (e.g. 45 days in advance of going to 0)
    ■ Remove all history after the cutoff date
● Engineering features
  ○ Variables used to model the probability of discontinuation
    ■ All history (except after the prediction cutoff date)
    ■ Various historical windows (e.g. 13 weeks leading up to prediction cutoff date)
    ■ Values observed on the cutoff date
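The labeling and truncation steps above can be sketched as follows. The function names, the feature names, and the rule used for suppliers that never discontinue (cut off 45 days before their last observation) are hypothetical; the 45-day lead and 13-week window come from the slide's examples.

```python
from datetime import date, timedelta


def find_discontinuation(history):
    """Return the first date on which AR hits 0 and stays 0, else None.

    `history` is a date-sorted list of (day, ar_balance) pairs.
    """
    stop = None
    for day, balance in history:
        if balance == 0:
            if stop is None:
                stop = day
        else:
            stop = None  # AR recovered, so the earlier zero does not count
    return stop


def build_training_row(history, lead_days=45, window_days=91):
    """Truncate history at the prediction cutoff and derive simple features."""
    stop = find_discontinuation(history)
    # Hypothetical rule: suppliers that never stop are anchored on their last observation.
    anchor = stop if stop is not None else history[-1][0]
    cutoff = anchor - timedelta(days=lead_days)
    # Drop everything after the cutoff so no future information leaks in.
    visible = [(d, b) for d, b in history if d <= cutoff]
    if not visible:
        return None
    window_start = cutoff - timedelta(days=window_days)
    window = [b for d, b in visible if d > window_start]
    return {
        "label": int(stop is not None),
        "cutoff": cutoff,
        "ar_at_cutoff": visible[-1][1],
        "mean_ar_all_history": sum(b for _, b in visible) / len(visible),
        "mean_ar_13w": sum(window) / len(window) if window else 0.0,
    }


# Synthetic supplier: 300 days of AR, then zero for the final 30 days.
start = date(2016, 1, 1)
history = [(start + timedelta(days=i), 500.0 if i < 300 else 0.0) for i in range(330)]
row = build_training_row(history)
```

Note that every feature is computed from `visible` only; the zeros that define the label never reach the feature set.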

Page 16: Creating a contemporary risk management system using python (dc)

Default Risk - Predicting AR discontinuation
How is this model trained?

● Using scikit-learn for:
  ○ Feature engineering
    ■ Encoding categoricals
    ■ Creating polynomial features
    ■ Scaling features
    ■ Dimensionality reduction / feature selection
  ○ Model evaluation
● Using xgboost for:
  ○ Training gradient boosted trees (a very performant machine learning classifier)
    ■ Since GBT are iterative learners, speed is important
  ○ Used in conjunction with hyperopt for optimizing hyperparameters
    ■ Currently evaluating spearmint

Page 17: Creating a contemporary risk management system using python (dc)

Default Risk - Predicting AR discontinuation
How is this model evaluated?

● Primarily concerned with model recall
● And not overfitting!
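As a small illustration of both concerns, recall can be computed on the training split and on a held-out split and the two compared. The labels and predictions below are invented, and using the train/validation recall gap as an overfitting signal is one common heuristic, not necessarily the evaluation used in the talk.

```python
def recall(y_true, y_pred):
    """Share of actual positives (discontinuations) that the model caught."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


# Hypothetical labels and predictions for a training and a held-out split.
train_true = [1, 1, 1, 1, 0, 0, 0, 0]
train_pred = [1, 1, 1, 1, 0, 0, 0, 1]
valid_true = [1, 1, 1, 1, 0, 0, 0, 0]
valid_pred = [1, 1, 0, 0, 0, 1, 0, 0]

train_recall = recall(train_true, train_pred)
valid_recall = recall(valid_true, valid_pred)
# A large gap between the two suggests the model memorized the training set.
overfit_gap = train_recall - valid_recall
```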

Page 18: Creating a contemporary risk management system using python (dc)

Default Risk - Predicting bankruptcy
Predicting bankruptcy is very different than predicting AR discontinuation:

● Prediction labels are derived differently
● Bankruptcies may not exhibit the same AR signals/patterns

TODO:

● Receive and process daily feeds from the national bankruptcy database
● Undergo a rigorous matching process
● Perform data truncation and feature engineering
● Enrich with macroeconomic data from the right point in time
● Address severe class imbalances
● Train awesome models

Page 19: Creating a contemporary risk management system using python (dc)

Default Risk - Predicting bankruptcy
How to perform efficient company matching on a daily basis?

● Clean your data
  ○ Convert to lowercase, remove special characters, ...
● Match on *unique* values first
  ○ Tax IDs & phone numbers
● Use string matching on company names
  ○ Levenshtein distance, Jaro-Winkler distance, Jaccard distance, …
  ○ Use soundexes to reduce the search space
● Calculate geographical distance between known addresses
  ○ Haversine distance
● Tinker with a weighting strategy that delivers satisfactory results

Pro tip: Cython-ize code (your library might already be doing this for you) or use Numba for JIT compilation where applicable; it pays off in the long run.
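The matching steps above can be sketched in plain Python. Everything here is illustrative: the record layout, the weights, the 50 km distance cap, and the simplified Soundex (it skips the special h/w rule) are assumptions, and a production system would more likely use a compiled string-distance library, as the pro tip suggests.

```python
import math
import re


def clean(name):
    """Lowercase and strip everything except letters, digits and spaces."""
    return re.sub(r"[^a-z0-9 ]", "", name.lower()).strip()


def soundex(word):
    """Simplified Soundex code (no h/w rule), used here as a blocking key."""
    codes = {c: str(d) for d, letters in enumerate(
        ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1) for c in letters}
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return ""
    out, prev = word[0].upper(), codes.get(word[0], "")
    for ch in word[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            out += code
        prev = code
    return (out + "000")[:4]


def levenshtein(a, b):
    """Edit distance via the classic two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (Earth radius taken as 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi, dlmb = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def match_score(a, b, w_name=0.7, w_geo=0.3, max_km=50.0):
    """Weighted match score in [0, 1]; assumes non-empty names. Weights are placeholders."""
    if a["tax_id"] and a["tax_id"] == b["tax_id"]:
        return 1.0  # unique identifiers short-circuit the fuzzy logic
    na, nb = clean(a["name"]), clean(b["name"])
    if soundex(na.split()[0]) != soundex(nb.split()[0]):
        return 0.0  # different blocks: skip the expensive comparisons
    name_sim = 1 - levenshtein(na, nb) / max(len(na), len(nb))
    km = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    geo_sim = max(0.0, 1 - km / max_km)
    return w_name * name_sim + w_geo * geo_sim


# Hypothetical records: one of ours, one bankruptcy filing, one unrelated company.
ours = {"name": "ACME Corp.", "tax_id": "", "lat": 39.0997, "lon": -94.5786}
filing = {"name": "Acme Corporation", "tax_id": "", "lat": 39.1000, "lon": -94.5800}
other = {"name": "Zenith Ltd", "tax_id": "", "lat": 40.0, "lon": -95.0}

score = match_score(ours, filing)
unrelated = match_score(ours, other)
```

The Soundex check acts as a cheap gate: pairs whose first name token falls in a different block never reach the Levenshtein or haversine calculations, which is what keeps a daily match against a large feed tractable.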

Page 20: Creating a contemporary risk management system using python (dc)

Default Risk - Predicting bankruptcy

Page 21: Creating a contemporary risk management system using python (dc)

Default Risk - Predicting bankruptcy

Page 22: Creating a contemporary risk management system using python (dc)

Default Risk - Predicting bankruptcy

Page 23: Creating a contemporary risk management system using python (dc)

[Figure: Risk Management Overview (Stage / Risk / Method), repeated from Page 7]

Page 24: Creating a contemporary risk management system using python (dc)

Fraud Risk - Screening calls

● Use NLP to transcribe and mine calls
● Post transcription, spaCy makes tokenization, lemmatization, etc. fast
● Identify conversations with red flags like:
  ○ Debt, leverage, bankruptcy, lien, payroll, extend, broke, divorce, alcohol, rollover, audit, layoff, credit, Cayman Islands, ...
● This is needed for 10x growth
  ○ Average WFC audio/day ~90 minutes
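A simplified sketch of the red-flag screen, using only regular expressions rather than spaCy. It matches exact terms, so inflected forms such as "extended" are missed; lemmatization (which the slide attributes to spaCy) would be the way to catch those. The sample call text is invented.

```python
import re

# Red-flag terms taken from the slide; multi-word terms are matched as phrases.
RED_FLAGS = [
    "debt", "leverage", "bankruptcy", "lien", "payroll", "extend", "broke",
    "divorce", "alcohol", "rollover", "audit", "layoff", "credit",
    "cayman islands",
]


def red_flag_hits(transcript):
    """Return {term: count} for red-flag terms found in a call transcript."""
    # Crude tokenization: keep lowercase words, drop punctuation.
    text = " ".join(re.findall(r"[a-z']+", transcript.lower()))
    hits = {}
    for term in RED_FLAGS:
        # Word boundaries keep "lien" from matching inside "client".
        count = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        if count:
            hits[term] = count
    return hits


call = ("Our client asked to extend terms; payroll is tight and there is "
        "a lien on the Cayman Islands account.")
hits = red_flag_hits(call)
```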

Page 25: Creating a contemporary risk management system using python (dc)

Fraud Risk - Analyzing invoice congruency
For each Buyer-Supplier relationship, we calculate the following scores:

● Joined Invoice Amount Score: [equation not preserved in the transcript]
  ○ In this equation, W_i is the invoice amount in WFC, and C_i is the invoice amount in C2FO
● Unjoined Score: [equation not preserved in the transcript]
  ○ Here W_i and C_i reflect the dollar amounts at the invoice due date aggregation level. We also set (W_i - C_i) to 0 if it is negative. This emphasizes suppliers who have more AP in WFC than in C2FO.

Page 26: Creating a contemporary risk management system using python (dc)

Fraud Risk - Analyzing invoice congruency
Once we have the Buyer-Supplier Scores, we calculate a Supplier level score, which is a weighted average of their respective Buyer-Supplier Scores.

Finally, we weight each individual score by the amount of AP in WFC, to get to our final Congruency Score.
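The roll-up from Buyer-Supplier scores to a supplier-level score can be sketched as a dollar-weighted average. The scores and amounts are invented, and the field names are hypothetical; the per-relationship score formulas themselves are not reproduced here.

```python
def weighted_average(scores, weights):
    """Weighted mean of `scores`; returns None when the weights sum to zero."""
    total = sum(weights)
    if total == 0:
        return None
    return sum(s * w for s, w in zip(scores, weights)) / total


# Hypothetical buyer-supplier scores for one supplier, with the dollar
# amount each buyer relationship represents in WFC.
pairs = [
    {"buyer": "B1", "score": 0.90, "wfc_amount": 60_000.0},
    {"buyer": "B2", "score": 0.50, "wfc_amount": 30_000.0},
    {"buyer": "B3", "score": 0.10, "wfc_amount": 10_000.0},
]

supplier_score = weighted_average(
    [p["score"] for p in pairs],
    [p["wfc_amount"] for p in pairs],
)
```

Weighting by dollar amount means a poorly matching relationship only drags the supplier's score down in proportion to how much money it represents.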

Page 27: Creating a contemporary risk management system using python (dc)

[Figure: Risk Management Overview (Stage / Risk / Method), repeated from Page 7]

Page 28: Creating a contemporary risk management system using python (dc)

Who should we be lending to?

For suppliers that don’t meet some of the forecasting criteria, we can train models to predict their WFC scores so that we have total score coverage across the supplier pool.

Page 29: Creating a contemporary risk management system using python (dc)

Exposure Risk - Calculating limits and rates
Limits are calculated:

● Based on WFC score decile
● Using loan duration
● So, higher decile → greater % of n day forecast cumulative sum

Rates are calculated:

● By observing suppliers’ rates in C2FO markets
● Adjusting for additional risk when applicable
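A sketch of the limit rule described above: take the cumulative sum of the AR forecast over the loan duration and scale it by a percentage that rises with the WFC score decile. The decile-to-percentage mapping and the flat forecast are made-up placeholders, not WFC's actual parameters.

```python
# Hypothetical mapping from WFC score decile (1 = lowest, 10 = highest)
# to the share of forecast AR that may be advanced.
ADVANCE_RATE_BY_DECILE = {d: 0.05 * d for d in range(1, 11)}


def credit_limit(daily_forecast, decile, duration_days):
    """Limit = advance rate for the decile x cumulative forecast over the loan."""
    cumulative = sum(daily_forecast[:duration_days])
    return ADVANCE_RATE_BY_DECILE[decile] * cumulative


forecast = [1_000.0] * 90  # flat 90-day AR forecast, for illustration only
low = credit_limit(forecast, decile=3, duration_days=45)
high = credit_limit(forecast, decile=9, duration_days=45)
```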

Page 30: Creating a contemporary risk management system using python (dc)

Who should we continue lending to?
Triggers to monitor:

● Level shifts in AR patterns
  ○ Losing or gaining a buyer, rapid business growth, unprecedented invoices...
● C2FO bid changes
  ○ Significant jumps in supplier bidding strategies
● WFC Score changes
  ○ Seasonal fluctuations in WFC Scores
● Adjustments
  ○ Unprecedented adjustment counts or amounts relative to invoices
● Buyer reserves
  ○ Buyers may know something that the rest of us don’t (e.g. bad product or inventory concerns)
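As one example of such a trigger, a level shift in AR can be flagged by comparing the mean of the most recent window with the window before it. This is a deliberately simple heuristic with assumed window and threshold values, not the detection method used in the talk.

```python
import statistics


def level_shift(series, window=28, threshold=3.0):
    """Flag a shift when the recent mean moves more than `threshold`
    standard deviations (of the prior window) away from the prior mean."""
    if len(series) < 2 * window:
        return False
    prior = series[-2 * window:-window]
    recent = series[-window:]
    spread = statistics.pstdev(prior) or 1e-9  # avoid dividing by zero
    z = (statistics.mean(recent) - statistics.mean(prior)) / spread
    return abs(z) > threshold


# Synthetic daily AR: stable around 1,000, then a buyer is lost.
stable = [1000.0 + (10.0 if d % 2 else -10.0) for d in range(56)]
shifted = stable[:28] + [v - 400.0 for v in stable[28:]]

stable_flag = level_shift(stable)
shifted_flag = level_shift(shifted)
```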

Page 31: Creating a contemporary risk management system using python (dc)

Who should we continue lending to?

Monitoring scores over time is important from an active fund management standpoint.

Page 32: Creating a contemporary risk management system using python (dc)

[Figure: Risk Management Overview (Stage / Risk / Method), repeated from Page 7]

Page 33: Creating a contemporary risk management system using python (dc)

Behind the scenes all-stars
● anaconda for managing our Python and R environments
● luigi for pipeline task orchestration
● dask where doing math lends itself to out-of-core parallelization

[Figure: Luigi DAG]

Page 34: Creating a contemporary risk management system using python (dc)

Demo time.

Page 35: Creating a contemporary risk management system using python (dc)
Page 36: Creating a contemporary risk management system using python (dc)
Page 37: Creating a contemporary risk management system using python (dc)
Page 38: Creating a contemporary risk management system using python (dc)
Page 39: Creating a contemporary risk management system using python (dc)
Page 40: Creating a contemporary risk management system using python (dc)

So what?
● Objectivity gives way to innovation
● Better independent data beats more complex algorithms
● Tradeoffs must be evaluated with respect to constraints
● For many tasks, Python can perform nearly as fast as lower-level languages
● WFC is a win-win for borrowers and investors
● Creating great solutions with open source tools is part of OSS too

Page 41: Creating a contemporary risk management system using python (dc)

Questions?
@the_real_pdf