Boosted Tree-based Multinomial Logit Model for Aggregated Market Data



Jianqiang (Jay) Wang & Trevor Hastie

Hewlett-Packard Labs & Stanford University

Dec 2, 2012

Disclaimer: I, myself, take sole responsibility for any errors and omissions in this presentation.



Hewlett-Packard Labs

HPL Charter:

DELIVER; CREATE; ADVANCE; ENGAGE

Information Analytics Lab:



Statistical Demand Modeling



Pricing and Portfolio Management

Predictive analytics-based PPM decision support system.

2012 INFORMS Revenue Management & Pricing Practice Award.

Demand: How do consumers value products?

Product Selection and Pricing: What products should we offer? What is the right price?

Competitive Product Similarity: What products are we competing with on the market?

Leveraging Intelligence: Can we infer market intelligence from current prices, and learn?



Estimating Aggregated Market Demand

Aggregated mobile computer sales data on all brands.

Market sales data reveal customers' choices.

Aggregated mobile PC sales.

Brands, country, region, attributes, period, channel, price, volume.

Complexity of model estimation:

40+ different key features (memory, CPU, display, storage, OS, ...).

Price sensitivity varies with attributes, time, and region.

High-dimensional prediction problem.



Discrete Choice Model

Modeling Sales Volume vs Consumer choice (McFadden 1974):

Choice set: products to choose from.

Utility: overall attractiveness given attributes, brand and price.

Better attributes, higher utility; higher price, lower utility.

Challenges:

Sparse selection.

Nonlinearity.

Interactions among (attributes, price).

Semiparametric Multinomial Logit Model (MNL):

Linear MNLs: Train (2003); Semiparametric MNLs: p-splines (Tutz & Scholz 2004).

Flexibly model customers’ valuation without specifying a functional form.

Estimation: Functional gradient boosting with partitioned regression trees as base learners.



Aggregated Market Multinomial Logit Model

Single market with $K$ products, $i = 1, \dots, K$, with sales volumes $(n_1, \dots, n_K)$ and total volume $N = \sum_{i=1}^K n_i$; latent utilities

$$u_i = f_i + \varepsilon_i.$$

Assuming $\varepsilon_i \overset{iid}{\sim}$ standard Gumbel, utility maximization leads to

$$p_i = \frac{\exp(f_i)}{\sum_{j=1}^K \exp(f_j)}.$$
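As a quick numerical check of the Gumbel result, the sketch below computes $p_i$ directly and compares it with choice shares from simulated utility maximization. It is a minimal illustration assuming NumPy; the utilities `f` are made-up values and `choice_probs` / `simulate_choices` are hypothetical helper names.

```python
import numpy as np

def choice_probs(f):
    """MNL choice probabilities p_i = exp(f_i) / sum_j exp(f_j)."""
    f = np.asarray(f, dtype=float)
    w = np.exp(f - f.max())  # shifting by max(f) avoids overflow; p is unchanged
    return w / w.sum()

def simulate_choices(f, n_consumers, rng):
    """Each simulated consumer draws u_i = f_i + eps_i, eps_i iid standard Gumbel,
    and buys the product with the largest utility; returns the observed shares."""
    f = np.asarray(f, dtype=float)
    eps = rng.gumbel(loc=0.0, scale=1.0, size=(n_consumers, f.size))
    picks = np.argmax(f + eps, axis=1)
    return np.bincount(picks, minlength=f.size) / n_consumers

f = np.array([1.0, 0.5, -0.2])  # hypothetical mean utilities
p = choice_probs(f)
shares = simulate_choices(f, 200_000, np.random.default_rng(0))
print(np.round(p, 3))
print(np.round(shares, 3))  # should agree with p to about two decimals
```

With 200,000 simulated consumers the Monte Carlo error in each share is roughly 0.001, so the two printed vectors should match closely.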

Minimize −2 log(multinomial likelihood):

$$\phi(\mathbf f) = -2\sum_{i=1}^K n_i \log g(f_i) + 2N \log\Big\{\sum_{i=1}^K g(f_i)\Big\} + \text{const}.$$

$g(\cdot)$: link function, e.g., $g(u) = \exp(u)$.
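A minimal sketch of $\phi(\mathbf f)$ for the exponential link, assuming NumPy and dropping the constant. With $g = \exp$, differentiating gives $-\partial\phi/\partial f_i = 2(n_i - N p_i)$, which is computed as well. The function names and the example volumes are hypothetical.

```python
import numpy as np

def deviance(f, n):
    """phi(f) = -2 sum_i n_i f_i + 2 N log sum_i exp(f_i), i.e. g = exp, constant dropped."""
    f = np.asarray(f, dtype=float)
    n = np.asarray(n, dtype=float)
    m = f.max()
    log_sum = m + np.log(np.exp(f - m).sum())  # numerically stable log-sum-exp
    return -2.0 * (n @ f) + 2.0 * n.sum() * log_sum

def neg_gradient(f, n):
    """-d phi / d f_i = 2 (n_i - N p_i) for g = exp."""
    f = np.asarray(f, dtype=float)
    n = np.asarray(n, dtype=float)
    p = np.exp(f - f.max())
    p /= p.sum()
    return 2.0 * (n - n.sum() * p)

n = np.array([30.0, 50.0, 20.0])   # hypothetical sales volumes, N = 100
f = np.array([0.1, -0.3, 0.4])     # hypothetical utilities
print(neg_gradient(f, n))
print(neg_gradient(np.log(n), n))  # ~0: the saturated fit p_i = n_i / N
```

Adding the same constant to every $f_i$ leaves $\phi$ unchanged, so utilities are identified only up to a common shift; the gradient vanishes at $f_i = \log n_i$, i.e. where $p_i = n_i/N$.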



Model Variations

Notation: $s_i$: attributes, brand and channel; $x_i$: price; $\mathbf x_i = (1, x_i)'$.

Utility Specifications:

Varying coefficient-MNL (price × attribute interaction):

$$f_i = \mathbf x_i'\boldsymbol\beta(s_i) = \beta_0(s_i) + \beta_1(s_i)\,x_i.$$

Partially linear-MNL (price and attributes additive):

$$f_i = \beta_0(s_i) + x_i\beta_1.$$

Nonparametric-MNL: $f_i = \beta(s_i, x_i)$.
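To make the difference between the three specifications concrete, here is a toy sketch with a single categorical covariate $s_i$ (a made-up segment label) and hypothetical coefficient values, assuming NumPy. Under VC-MNL each segment has its own price slope; under the partially linear model all segments share one slope; the nonparametric model allows any function of $(s_i, x_i)$.

```python
import numpy as np

# Hypothetical coefficient functions of s (here a single segment label).
beta0 = {"budget": 0.2, "premium": 1.0}     # beta_0(s): baseline utility
beta1 = {"budget": -1.5, "premium": -0.4}   # beta_1(s): price slope, VC-MNL only

def f_varying_coef(s, x):
    # VC-MNL: f_i = beta_0(s_i) + beta_1(s_i) * x_i, price slope depends on s_i
    return np.array([beta0[si] + beta1[si] * xi for si, xi in zip(s, x)])

def f_partially_linear(s, x, b1=-1.0):
    # Partially linear MNL: f_i = beta_0(s_i) + x_i * b1, one common price slope
    return np.array([beta0[si] + b1 * xi for si, xi in zip(s, x)])

def f_nonparametric(s, x, beta):
    # Nonparametric MNL: f_i = beta(s_i, x_i) for an arbitrary function beta
    return np.array([beta(si, xi) for si, xi in zip(s, x)])

s = ["budget", "budget", "premium", "premium"]
x = np.array([0.0, 1.0, 0.0, 1.0])  # e.g. standardized price
vc = f_varying_coef(s, x)
pl = f_partially_linear(s, x)
npm = f_nonparametric(s, x, lambda si, xi: beta0[si] - 0.8 * xi**2)
print(vc, pl, npm)
```

Raising price by one unit lowers utility by 1.5 for budget products and 0.4 for premium ones under VC-MNL, but by 1.0 for both under the partially linear model.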

Boosted trees:

Partition the products into homogeneous groups in a way that respects the mean utility function.

Iteratively fit simple trees to explain the errors not captured by previous iterations.



Building Block: VC Trees

Underlying VCM model:

$$\xi_i = \mathbf x_i'\boldsymbol\beta(s_i) + \varepsilon_i,$$

Piecewise constant approximation:

$$\xi_i = \sum_{m=1}^M \mathbf x_i'\boldsymbol\beta_m I(s_i \in C_m) + \varepsilon_i,$$

$M$: number of partitions.

$\{C_m\}_{m=1}^M$: a partition of the space of $s_i$.

Piecewise constant approximation to the unknown high-dimensional function, combined with a data-driven partitioning method that yields homogeneous regression relationships.

Algorithm:

Heuristics: greedy algorithm based on binary splits of the space of si (similar to CART).

Splitting criterion: reduction in SSE.
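The greedy split search can be sketched as follows. This is a simplified stand-in, not the authors' PartReg implementation: it assumes NumPy, a single numeric partitioning variable $s$, and only one binary split; a full tree would search all attributes (including categorical ones such as brand and channel) and recurse until it has $M$ leaves. Names and simulated data are hypothetical.

```python
import numpy as np

def node_sse(X, y):
    """Residual sum of squares of the least-squares fit of y on X in one node."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ coef
    return float(r @ r)

def best_split(s, x, xi, min_node=5):
    """Greedy search over binary splits s <= c of one numeric covariate s.
    Each child regresses xi on (1, x); returns (c, SSE reduction) for the best split."""
    X = np.column_stack([np.ones_like(x), x])
    parent = node_sse(X, xi)
    best_c, best_gain = None, 0.0
    for c in np.unique(s)[:-1]:  # skip the largest value so the right child is never empty
        left = s <= c
        if left.sum() < min_node or (~left).sum() < min_node:
            continue
        gain = parent - node_sse(X[left], xi[left]) - node_sse(X[~left], xi[~left])
        if gain > best_gain:
            best_c, best_gain = float(c), gain
    return best_c, best_gain

# Simulated pseudo observations whose price slope changes at s = 0.5
rng = np.random.default_rng(1)
s = rng.uniform(0, 1, 400)
x = rng.normal(size=400)
xi = np.where(s <= 0.5, 1.0 + 2.0 * x, 1.0 - 1.0 * x) + rng.normal(scale=0.1, size=400)
c, gain = best_split(s, x, xi)
print(round(c, 2), round(gain, 1))
```

Because each child fits its own intercept and price slope, the criterion favors splits that separate groups with different price sensitivities, which is what a varying-coefficient tree is meant to find.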



Boosted VC-MNL

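Boosted VC-MNL:

$$\phi(\mathbf f) = -2\sum_{i=1}^K n_i \log g\big(\mathbf x_i'\boldsymbol\beta(s_i)\big) + 2N \log\Big\{\sum_{i=1}^K g\big(\mathbf x_i'\boldsymbol\beta(s_i)\big)\Big\} + \text{const}.$$

1 Start with a naive fit $\mathbf f^{(0)} = (\mathbf x_1'\boldsymbol\beta^{(0)}, \dots, \mathbf x_K'\boldsymbol\beta^{(0)})'$.

2 For $b = 1, \dots, B$, repeat:

Compute the “pseudo observations” $\xi_i = -\partial\phi/\partial f_i\,\big|_{\mathbf f = \mathbf f^{(b-1)}}$.

Fit $\xi_i$ on $s_i$ and $x_i$ using the “PartReg” algorithm to obtain partitions $(C_1^{(b)}, \dots, C_M^{(b)})$.

Let $\mathbf z_i = \big(I(s_i \in C_1^{(b)}), \dots, I(s_i \in C_M^{(b)}), x_i I(s_i \in C_1^{(b)}), \dots, x_i I(s_i \in C_M^{(b)})\big)'$, and use IRLS to estimate $\boldsymbol\beta^{(b)} = (\beta_{01}^{(b)}, \dots, \beta_{0M}^{(b)}, \beta_{11}^{(b)}, \dots, \beta_{1M}^{(b)})'$ by minimizing

$$J(\boldsymbol\beta^{(b)}) = -2\sum_{i=1}^K n_i \log g\big(f_i^{(b-1)} + \mathbf z_i'\boldsymbol\beta^{(b)}\big) + 2N \log\Big\{\sum_{i=1}^K g\big(f_i^{(b-1)} + \mathbf z_i'\boldsymbol\beta^{(b)}\big)\Big\}.$$

Update the fitted model by $f_i^{(b)} = f_i^{(b-1)} + \nu \sum_{m=1}^M \big\{\beta_{0m}^{(b)} + \beta_{1m}^{(b)} x_i\big\} I(s_i \in C_m^{(b)})$, where $\nu$ is the shrinkage factor.

3 Output the fitted model $\mathbf f = \mathbf f^{(B)}$.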
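The inner part of one iteration (building $\mathbf z_i$ from the leaves and minimizing $J(\boldsymbol\beta^{(b)})$) can be sketched for $g = \exp$ as below. This is a sketch under assumptions, not the authors' code: it uses NumPy, takes leaf labels as already produced by the partitioning step, and uses Newton-Raphson with the exact Hessian, which coincides with IRLS for this canonical link. The Hessian is singular (a common shift of all $\beta_{0m}$ leaves the choice probabilities unchanged), so the step is a truncated pseudo-inverse solve via `np.linalg.lstsq`, with step halving so $J$ does not increase. All names and numbers are hypothetical.

```python
import numpy as np

def design_matrix(labels, x, M):
    """z_i = (I(s_i in C_1), ..., I(s_i in C_M), x_i I(s_i in C_1), ..., x_i I(s_i in C_M))'."""
    ind = np.eye(M)[labels]                  # K x M leaf indicators
    return np.hstack([ind, ind * x[:, None]])

def J(beta, f_prev, Z, n):
    """J(beta) for g = exp (constant dropped)."""
    eta = f_prev + Z @ beta
    m = eta.max()
    return -2.0 * (n @ eta) + 2.0 * n.sum() * (m + np.log(np.exp(eta - m).sum()))

def fit_beta(f_prev, Z, n, n_iter=50, tol=1e-10):
    """Minimize J by Newton-Raphson (= IRLS for the canonical link g = exp)."""
    N = n.sum()
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        eta = f_prev + Z @ beta
        p = np.exp(eta - eta.max())
        p /= p.sum()
        grad = -2.0 * Z.T @ (n - N * p)
        zbar = Z.T @ p
        H = 2.0 * N * (Z.T @ (Z * p[:, None]) - np.outer(zbar, zbar))
        # Truncated pseudo-inverse step: directions that leave p unchanged are dropped
        step = np.linalg.lstsq(H, grad, rcond=1e-10)[0]
        t, current = 1.0, J(beta, f_prev, Z, n)
        while J(beta - t * step, f_prev, Z, n) > current and t > 1e-8:
            t *= 0.5                         # halve the step while J would increase
        beta = beta - t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

# One update with hypothetical leaves: 6 products in M = 3 leaves
n = np.array([10.0, 20.0, 5.0, 15.0, 30.0, 20.0])   # sales volumes
x = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])        # prices
labels = np.array([0, 0, 1, 1, 2, 2])               # leaf of each product (from the tree step)
f_prev = np.zeros(6)                                # f^(b-1)
Z = design_matrix(labels, x, M=3)
beta = fit_beta(f_prev, Z, n)
nu = 0.1                                            # shrinkage factor (illustrative value)
f_new = f_prev + nu * Z @ beta                      # f^(b)
```

`Z @ beta` equals $\sum_m \{\beta_{0m} + \beta_{1m} x_i\} I(s_i \in C_m)$, so the last line is the update step with shrinkage $\nu$. In this toy example the leaves are flexible enough to reproduce the observed shares exactly.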



Boosted VC-MNL

Start with naive fit: e.g., simple linear MNL.

Begin the iteration process:

Compute pseudo observations/residuals.

Fit an appropriate tree to predict the pseudo residuals.

Generate a design matrix based on the tree partitions, and fit a linear MNL model.

Additive model of trees, not of predictors.

Iteratively fit linear MNL models based on data-driven piecewise constant “bases”.
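One way to store and evaluate the resulting ensemble is sketched below, under assumptions: NumPy; each tree represented by a hypothetical leaf-assignment function plus per-leaf $\beta_{0m}, \beta_{1m}$; and an illustrative shrinkage value `nu`. Each tree may split on several attributes at once, so the fit is additive across trees rather than across individual predictors.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class VCTree:
    """One base learner: a leaf-assignment rule plus per-leaf (beta_0m, beta_1m)."""
    leaf_of: Callable[[np.ndarray], np.ndarray]  # maps K x d covariates s to leaf ids 0..M-1
    beta0: np.ndarray
    beta1: np.ndarray

    def __call__(self, s, x):
        m = self.leaf_of(s)
        # sum_m (beta_0m + beta_1m x_i) I(s_i in C_m): only the product's own leaf contributes
        return self.beta0[m] + self.beta1[m] * x

def predict_utility(f0, trees: List[VCTree], s, x, nu=0.1):
    """f = f^(0) + nu * sum_b tree_b(s, x)."""
    f = np.array(f0, dtype=float)
    for tree in trees:
        f += nu * tree(s, x)
    return f

# Two hypothetical trees for two products with a single covariate
s = np.array([[0.2], [0.8]])
x = np.array([1.0, 2.0])
t1 = VCTree(lambda s: (s[:, 0] > 0.5).astype(int), np.array([1.0, 2.0]), np.array([-1.0, 0.0]))
t2 = VCTree(lambda s: np.zeros(len(s), dtype=int), np.array([0.5]), np.array([0.0]))
f = predict_utility(np.zeros(2), [t1, t2], s, x)
print(f)
```

Storing the per-leaf coefficients rather than fitted values lets the same ensemble score new choice sets, for example the held-out test markets.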



Mobile Computer Sales in Australia

6 months, 5 states; 30 choice sets (25 training, 5 test); use price residuals instead of price.

Varying coefficient-MNL: $f_i = \mathbf x_i'\boldsymbol\beta(s_i)$.

Partially linear-MNL: $f_i = \beta_0(s_i) + x_i\beta_1$.

Nonparametric-MNL: $f_i = \beta(s_i, x_i)$.

[Figure: training and test R2 versus number of boosting iterations (0 to 1000), in three panels: "Varying coefficient-MNL, Boosted", "Partially linear, Boosted", and "Nonparametric, Boosted".]



Competitor Method – Elastic Net MNL

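Models: $f_i = \mathbf x_i'\boldsymbol\beta(s_i)$.

Linear-MNL: linear $\boldsymbol\beta(s_i)$.

Quadratic-MNL (first-order interaction).

Quadratic-MNL: initial features $s_i$.

⇒ Quadratic terms and first-order interactions among $s_i$ give the design matrix $\mathbf z_i$.

⇒ Linear specification: $\beta_0(s_i) = \mathbf z_i'\boldsymbol\gamma_0$ and $\beta_1(s_i) = \mathbf z_i'\boldsymbol\gamma_1$.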
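A sketch of the quadratic expansion, assuming NumPy and that $\mathbf z_i$ also carries an intercept and the main effects of $s_i$ (an assumption: the slide names only the quadratic and first-order interaction terms). With $d$ initial features, each row has $1 + d + d(d+1)/2$ entries. The function name is hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def quadratic_design(S):
    """Map K x d features S to rows z_i = (1, s_i1..s_id, all products s_ij * s_ik with j <= k).
    The intercept and main-effect columns are an assumption of this sketch."""
    S = np.asarray(S, dtype=float)
    K, d = S.shape
    cols = [np.ones(K)] + [S[:, j] for j in range(d)]
    cols += [S[:, j] * S[:, k] for j, k in combinations_with_replacement(range(d), 2)]
    return np.column_stack(cols)  # K x (1 + d + d(d+1)/2)

S = np.array([[1.0, 2.0, 3.0],
              [0.5, 0.0, 1.0]])   # two products, three hypothetical features
Z = quadratic_design(S)
print(Z.shape)
```

Given coefficient vectors $\boldsymbol\gamma_0, \boldsymbol\gamma_1$, the utilities follow as `Z @ gamma0 + x * (Z @ gamma1)`. The number of columns grows quadratically in $d$, which is one reason a sparsity-inducing penalty is attractive here.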

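Elastic net (Zou & Hastie 2005) MNL:

$$\arg\min_{\boldsymbol\gamma_0, \boldsymbol\gamma_1} \; -2\sum_{i=1}^K n_i \log g\big(\mathbf z_i'\boldsymbol\gamma_0 + x_i\mathbf z_i'\boldsymbol\gamma_1\big) + 2N \log\Big\{\sum_{i=1}^K g\big(\mathbf z_i'\boldsymbol\gamma_0 + x_i\mathbf z_i'\boldsymbol\gamma_1\big)\Big\} + \lambda\Big[\alpha\sum_{i,j}|\gamma_{ij}| + \frac{1-\alpha}{2}\sum_{i,j}\gamma_{ij}^2\Big]$$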

α = 0: Ridge regression; α = 1: LASSO.

g(·) : link function.

Sparse and stable coefficient estimates, computed by penalized IRLS.
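The penalized objective can be evaluated directly, as sketched below for $g = \exp$. Assumptions: NumPy, the constant is dropped, and both $\boldsymbol\gamma_0$ and $\boldsymbol\gamma_1$ are penalized; the design rows, prices and volumes are made up. Minimizing it (the slides use penalized IRLS) is not shown.

```python
import numpy as np

def enet_mnl_objective(gamma0, gamma1, Z, x, n, lam, alpha):
    """-2 log-likelihood (g = exp, constant dropped) plus the elastic-net penalty
    lam * [alpha * sum|gamma_ij| + (1 - alpha)/2 * sum gamma_ij^2] over gamma0 and gamma1."""
    f = Z @ gamma0 + x * (Z @ gamma1)        # f_i = z_i'gamma0 + x_i z_i'gamma1
    m = f.max()
    nll = -2.0 * (n @ f) + 2.0 * n.sum() * (m + np.log(np.exp(f - m).sum()))
    g = np.concatenate([gamma0, gamma1])
    penalty = lam * (alpha * np.abs(g).sum() + 0.5 * (1.0 - alpha) * (g @ g))
    return nll + penalty

rng = np.random.default_rng(2)
Z = rng.normal(size=(5, 2))           # hypothetical design rows z_i
x = rng.uniform(0.5, 1.5, size=5)     # hypothetical prices
n = np.array([12.0, 3.0, 8.0, 20.0, 7.0])
gamma0 = np.array([1.0, -2.0])
gamma1 = np.array([0.0, 0.5])
lasso = enet_mnl_objective(gamma0, gamma1, Z, x, n, lam=1.0, alpha=1.0)  # alpha = 1: LASSO
ridge = enet_mnl_objective(gamma0, gamma1, Z, x, n, lam=1.0, alpha=0.0)  # alpha = 0: ridge
print(lasso, ridge)
```

The two calls share the same likelihood term, so their difference is purely the penalty: $\lambda(3.5 - 2.625)$ for these coefficients.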



Summary of Results

Utility specification | Estimation | Optimal R2, training | Optimal R2, test | Time (min) | Interactions among attributes
--- | --- | --- | --- | --- | ---
Linear | penalized IRLS (α = 1) | .399 | .357 | .17 | none
Linear | penalized IRLS (α = 1/2) | .419 | .379 | .48 | none
Quadratic | penalized IRLS (α = 1) | .582 | .499 | 76.91 | 1st-order
Quadratic | penalized IRLS (α = 1/2) | .554 | .53 | 52.78 | 1st-order
Varying-coef. | boosted trees (B = 1000) | .734 | .697 | 186.47 | 2nd-order (M = 4)
Partially linear | boosted trees (B = 1000) | .493 | .455 | 24.63 | 2nd-order (M = 4)
Nonparametric | boosted trees (B = 1000) | .52 | .502 | 23.43 | 2nd-order (M = 4)

M: size of each base tree; B: number of boosting iterations.

Nonparametric MNL specifies a larger model space than VC-MNL, but its piecewise constant trees fail to find the particular interactions that VC-MNL captures.



Discussion

Semiparametric MNL models, estimated by boosted tree methods.

Learning from large-scale market data to a) make predictions and b) gain insights: econometrics & statistical learning.

Statistical questions:

Assessing errors in R2 and the coefficient surface.

Split selection in tree partitioning (variable importance).

Model validation & diagnostics (standardized pseudo residuals).

Choice of link functions.



Jianqiang (Jay) Wang

Information Analytics Lab

Hewlett-Packard Labs


Thank you very much!
