Logistic Regression/Markov Chain presentation


Description

Presentation on the Logistic Regression/Markov Chain (LRMC) algorithm for forecasting NCAA Basketball tournament outcomes, prepared for a Regression and ANOVA course.

Transcript of Logistic Regression/Markov Chain presentation

Page 1: Logistic Regression/Markov Chain presentation

Logistic Regression and Markov Chain approach to NCAA Basketball seeding

Michael Hankin

University of Southern California

[email protected]

April 22, 2013


Page 2: Logistic Regression/Markov Chain presentation

Overview

1. Background
   - Logistic Regression
   - Markov Chain


Page 3: Logistic Regression/Markov Chain presentation

Overview of Logistic Regression

Basic idea of Logistic Regression: Given explanatory variables $\vec{X}$ and binary response variable $Y$, we wish to determine $P(Y = 1 \mid \vec{X})$. Logistic regression allows us to estimate this by modeling

$$Y \sim \mathrm{Bernoulli}\left(\sigma(\vec{w}^T \vec{X})\right), \qquad \sigma(\vec{w}^T \vec{X}) = \frac{1}{1 + e^{\vec{w}^T \vec{X}}}$$

If we model $P(i \text{ beats } j \text{ on } j\text{'s home court} \mid i \text{ beat } j \text{ by } x \text{ on } i\text{'s home court})$ as $\sigma(\alpha + \beta x)$, we obtain the following likelihood:

$$L(\alpha, \beta) = \prod_{g:\,\text{games}} \left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right)^{w_g} \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right)^{1 - w_g}$$
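To make this concrete, here is a minimal Python sketch of the model; the data layout and the encoding of $w_g$ (1 if the earlier home winner also wins the return game, 0 otherwise) are my assumptions, and note the deck's sign convention $\sigma(z) = 1/(1 + e^z)$:

```python
import numpy as np

def sigma(z):
    # The deck's sign convention: sigma(z) = 1 / (1 + e^z).
    return 1.0 / (1.0 + np.exp(z))

def likelihood(alpha, beta, x, w):
    # x: margins of victory on the first home court, one entry per game pair;
    # w: assumed 1 if that home winner also won the return game, else 0.
    p = sigma(alpha + beta * x)  # modeled P(w_g = 1)
    return np.prod(p**w * (1 - p)**(1 - w))
```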


Page 4: Logistic Regression/Markov Chain presentation

We then find parameters that maximize the likelihood.

$$\ell = \log L(\alpha, \beta) = \sum_{g:\,\text{games}} w_g \log\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) + (1 - w_g) \log\left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right)$$

$$\ell = \sum_{g:\,\text{games}} -w_g \log\left(1 + e^{\alpha + \beta x_g}\right) + (1 - w_g)\left(\alpha + \beta x_g - \log\left(1 + e^{\alpha + \beta x_g}\right)\right)$$

$$\ell = \sum_{g:\,\text{games}} (1 - w_g)(\alpha + \beta x_g) - \log\left(1 + e^{\alpha + \beta x_g}\right)$$
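For computation, the simplified final form is the convenient one; a sketch, using np.logaddexp(0, z) to evaluate log(1 + e^z) without overflow:

```python
import numpy as np

def log_likelihood(alpha, beta, x, w):
    # l = sum_g (1 - w_g)(alpha + beta x_g) - log(1 + e^{alpha + beta x_g})
    z = alpha + beta * x
    return np.sum((1 - w) * z - np.logaddexp(0, z))
```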


Page 5: Logistic Regression/Markov Chain presentation

$$\frac{\partial \ell}{\partial \alpha} = \sum_{g:\,\text{games}} (1 - w_g) - \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}} \tag{1}$$

$$= \sum_{g:\,\text{games}} (1 - w_g) - \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) \tag{2}$$

$$= \sum_{g:\,\text{games}} \left(\frac{1}{1 + e^{\alpha + \beta x_g}} - w_g\right) \tag{3}$$

$$\frac{\partial \ell}{\partial \beta} = \sum_{g:\,\text{games}} (1 - w_g)x_g - \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\, x_g \tag{4}$$

$$= \sum_{g:\,\text{games}} (1 - w_g)x_g - \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) x_g \tag{5}$$

$$= \sum_{g:\,\text{games}} \left(\frac{1}{1 + e^{\alpha + \beta x_g}} - w_g\right) x_g \tag{6}$$
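Both components of the gradient reduce to the shared factor $\sigma(\alpha + \beta x_g) - w_g$ (equations (3) and (6)), so a sketch is short:

```python
import numpy as np

def gradient(alpha, beta, x, w):
    # Equations (3) and (6): shared factor sigma(alpha + beta x_g) - w_g.
    r = 1.0 / (1.0 + np.exp(alpha + beta * x)) - w
    return np.array([np.sum(r), np.sum(r * x)])
```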


Page 6: Logistic Regression/Markov Chain presentation

$$\frac{\partial^2 \ell}{\partial \alpha^2} = \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(\frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\right) \tag{7}$$

$$= \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) \tag{8}$$

$$\frac{\partial^2 \ell}{\partial \alpha \partial \beta} = \frac{\partial^2 \ell}{\partial \beta \partial \alpha} = \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(\frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\right) x_g \tag{9}$$

$$= \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) x_g \tag{10}$$

$$\frac{\partial^2 \ell}{\partial \beta^2} = \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(\frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\right) x_g^2 \tag{11}$$

$$= \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) x_g^2 \tag{12}$$
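Every Hessian entry shares the factor $-p(1 - p)$ with $p = \sigma(\alpha + \beta x_g)$, and none depends on $w_g$; a sketch:

```python
import numpy as np

def hessian(alpha, beta, x):
    # Equations (8), (10), (12): shared factor -p(1 - p), p = sigma(alpha + beta x_g).
    p = 1.0 / (1.0 + np.exp(alpha + beta * x))
    d = -p * (1.0 - p)
    return np.array([[np.sum(d),     np.sum(d * x)],
                     [np.sum(d * x), np.sum(d * x**2)]])
```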


Page 7: Logistic Regression/Markov Chain presentation

We want $\alpha, \beta$ such that $\nabla\ell(\alpha, \beta) = 0$. For a current iterate $\alpha^\star, \beta^\star$, let $\varepsilon_\alpha = \alpha - \alpha^\star$ and $\varepsilon_\beta = \beta - \beta^\star$. By Taylor expansion we have:

$$0 = \nabla\ell(\alpha, \beta) = \nabla\ell(\alpha^\star + \varepsilon_\alpha,\, \beta^\star + \varepsilon_\beta) \approx \nabla\ell(\alpha^\star, \beta^\star) + \nabla^2\ell(\alpha^\star, \beta^\star)\begin{pmatrix}\varepsilon_\alpha\\ \varepsilon_\beta\end{pmatrix}$$

$$0 = \nabla\ell(\alpha^\star, \beta^\star) + \nabla^2\ell(\alpha^\star, \beta^\star)\begin{pmatrix}\alpha\\ \beta\end{pmatrix} - \nabla^2\ell(\alpha^\star, \beta^\star)\begin{pmatrix}\alpha^\star\\ \beta^\star\end{pmatrix}$$

Newton to the rescue: successive updates of the following form should converge to the optimal values.

$$\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \begin{pmatrix}\alpha^\star\\ \beta^\star\end{pmatrix} - \left(\nabla^2\ell(\alpha^\star, \beta^\star)\right)^{-1}\nabla\ell(\alpha^\star, \beta^\star)$$
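Putting the pieces together, a minimal Newton solver that reuses the gradient and hessian sketches above (the starting point, tolerance, and iteration cap are arbitrary choices):

```python
import numpy as np

def newton_fit(x, w, alpha0=0.0, beta0=0.0, tol=1e-10, max_iter=50):
    # Repeatedly apply (alpha, beta) <- (alpha*, beta*) - H^{-1} grad.
    theta = np.array([alpha0, beta0], dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hessian(theta[0], theta[1], x),
                               gradient(theta[0], theta[1], x, w))
        theta -= step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```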


Page 8: Logistic Regression/Markov Chain presentation

Use of Logistic Regression in LRMC

Victory/Defeat margin: We have now found $r^H_x$, the probability that team i beats team j at j's home court, given that i beat j by x at i's home court. Assuming homecourt advantage is additive, the superiority probability $s^H_x$, the probability that team i would beat team j on a neutral court given that i beat j by x on i's home court, is $s^H_x = r^H_{x+h}$ for a homecourt advantage of h points.

This gives

$$h = -\frac{\alpha_r}{2\beta_r}, \qquad s^H_x = \sigma\left(\frac{\alpha_r}{2} + \beta_r x\right)$$

(Substituting confirms the pair is consistent: $r^H_{x+h} = \sigma(\alpha_r + \beta_r(x + h)) = \sigma(\alpha_r/2 + \beta_r x)$, so the neutral-court model keeps exactly half of the intercept, discounting the margin by one court's worth of advantage rather than two.)
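A small worked example, plugging in the 2012-2013 additive estimates reported on the parameter-estimates slide later in the deck:

```python
import numpy as np

# 2012-2013 additive estimates (see the "Parameter estimates" slide).
alpha_r, beta_r = 0.68503617299539032, -0.056212447269008876

h = -alpha_r / (2 * beta_r)  # estimated homecourt advantage: about 6.09 points

def s_H(x):
    # Neutral-court superiority probability s^H_x = sigma(alpha_r/2 + beta_r x).
    return 1.0 / (1.0 + np.exp(alpha_r / 2 + beta_r * x))

# Sanity check: a team that won by exactly h at home is a coin flip on a
# neutral court, since alpha_r/2 + beta_r * h = 0.
assert abs(s_H(h) - 0.5) < 1e-12
```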


Page 9: Logistic Regression/Markov Chain presentation

Alternative assumptions: Because each game has finite length (equal except for overtime), a reasonable estimator of a team's skill is the proportion of time it controls the ball, which in turn can be estimated by its score divided by the sum of both teams' scores. This motivates a multiplicative homecourt advantage model (regress on the score ratio) and a log-multiplicative one (regress on the log of the score ratio).

Reduce overfitting: We can reduce overfitting by penalizing large parameter values, shrinking toward $\alpha = \beta = 0$ (under which future games are independent of past results): choose nonnegative $\lambda_\alpha, \lambda_\beta$ and minimize $-\ell + \lambda_\alpha \alpha^2 + \lambda_\beta \beta^2$.

In my regularized examples I placed larger penalties on the α's, operating under the hypothesis that there is no homecourt advantage, as sketched below.
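A minimal sketch of the penalized fit using scipy; the penalty weights are illustrative placeholders, with the larger one on α per the hypothesis above:

```python
import numpy as np
from scipy.optimize import minimize

def regularized_fit(x, w, lam_alpha=1.0, lam_beta=0.1):
    # Minimize -l + lam_alpha * alpha^2 + lam_beta * beta^2.
    def objective(theta):
        alpha, beta = theta
        z = alpha + beta * x
        neg_ll = -np.sum((1 - w) * z - np.logaddexp(0, z))
        return neg_ll + lam_alpha * alpha**2 + lam_beta * beta**2
    return minimize(objective, x0=np.zeros(2)).x
```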


Page 10: Logistic Regression/Markov Chain presentation

Logistic Regression "Goodness of Fit"

Assumptions for test: Because the number of observations is much larger than the number of "buckets" (for classical LRMC, the mean and median number of observations per score differential were approximately 32.9 and 17, respectively), the CLT lets us standardize the residuals. Treating each observation as Bernoulli with fitted probability $\hat{y}_i$,

$$r_i = \frac{y_i - \hat{y}_i}{\sqrt{\hat{y}_i(1 - \hat{y}_i)}} \quad \text{and thus} \quad \sum_i r_i^2 \overset{H_0}{\sim} \chi^2_{n-2}$$
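A sketch of the test, assuming y holds the observed outcomes and y_hat the fitted probabilities:

```python
import numpy as np
from scipy.stats import chi2

def gof_pvalue(y, y_hat):
    # Pearson residuals r_i = (y_i - y_hat_i) / sqrt(y_hat_i (1 - y_hat_i));
    # under H0, sum r_i^2 ~ chi^2 with n - 2 degrees of freedom
    # (two fitted parameters: alpha and beta).
    r = (y - y_hat) / np.sqrt(y_hat * (1 - y_hat))
    return chi2.sf(np.sum(r**2), df=len(y) - 2)
```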


Page 11: Logistic Regression/Markov Chain presentation

Chi Squared p-values for logistic regressions

                        2011       2012       2013
additive              0.511777   0.552131   0.569139
additive (reg)        0.500654   0.534811   0.550568
multiplicative        0.495586   0.537728   0.522612
multiplicative (reg)  0.027208   0.001498   0.001819
log mult              0.499545   0.558072   0.593485
log mult (reg)        0.424898   0.440884   0.483908

Table: χ² p-values


Page 12: Logistic Regression/Markov Chain presentation

2010-2011 Logistic Regressions

Numbers in legends are estimated homecourt advantages.


Page 13: Logistic Regression/Markov Chain presentation

2011-2012 Logistic Regressions

Numbers in legends are estimated homecourt advantages.


Page 14: Logistic Regression/Markov Chain presentation

2012-2013 Logistic Regressions

Numbers in legends are estimated homecourt advantages.


Page 15: Logistic Regression/Markov Chain presentation

Parameter estimates for 2012-2013

Additive parameters: $\alpha_r = 0.68503617299539032$, $\beta_r = -0.056212447269008876$.

Estimated covariance matrix of $(\hat{\alpha}_r, \hat{\beta}_r)$:

         α                β
α    1.94257829e-03  -6.11459051e-05
β   -6.11459051e-05   1.20313009e-05


Page 16: Logistic Regression/Markov Chain presentation

Overview of Markov Chains

Stochastic process with finite states: A finite-state Markov chain is a stochastic process in which the probability of being in state X at time t depends only on the state at time t − 1.

Steady state: Under some basic conditions (irreducibility and aperiodicity), there exists a probability distribution over the states such that, if the Markov chain is run for a long time, the state at any given time is approximately "Multinoulli" (categorical) with the steady-state distribution.


Page 17: Logistic Regression/Markov Chain presentation

Use of Markov Chains in LRMC

LRMC states: In LRMC we create a state for each team; being in state i indicates that we currently think team i is the best team.

Transition probabilities: Given transition probabilities derived from each team's regular-season record, at each "step" we either jump to another team or stay put.

Expected time per state: Eventually a steady-state distribution emerges, representing the long-run proportion of time we expect to spend in each state; teams are ranked by this distribution. Because the transition matrix is sparse and small enough for my laptop to handle, we simply find its (left) eigenvector corresponding to eigenvalue 1 and normalize it in L1, as sketched below.
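A minimal sketch of that computation for a row-stochastic transition matrix T (dense numpy for clarity; a sparse eigensolver would follow the same idea):

```python
import numpy as np

def steady_state(T):
    # T[i, j] = P(jump from team i's state to team j's state); rows sum to 1.
    # The stationary distribution is the left eigenvector of T for eigenvalue 1,
    # i.e. an eigenvector of T.T, normalized in L1.
    vals, vecs = np.linalg.eig(T.T)
    v = vecs[:, np.argmin(np.abs(vals - 1.0))].real
    return v / v.sum()  # Perron vector has one sign, so this is an L1 normalization
```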


Page 18: Logistic Regression/Markov Chain presentation

Transition Probabilities

Naive approach: To motivate the more complex LRMC approach, we start simple. Take $p = P(\text{team } i \text{ is better than team } j \mid \text{team } i \text{ beat team } j)$, $w_{ij}$ = the number of times i beat j, $l_{ij}$ = the number of times j beat i, and $N_i$ = the total number of games played by i (required to normalize the transition probabilities). Then we define the transition probability

$$t_{ij} = \frac{1}{N_i}\left(w_{ij}(1 - p) + l_{ij}\, p\right)$$

Better approach: Obviously we can do better by considering the victory margin and game location:

$$t_{ij} = \frac{1}{N_i}\left(\sum_{g:\, i \text{ at } j} r^H_{x(g)} + \sum_{g:\, j \text{ at } i} \left(1 - r^H_{x(g)}\right)\right), \qquad t_{ii} = 1 - \sum_{j \neq i} t_{ij}$$

(here $x(g)$ denotes the home team's margin of victory in game g, so $r^H_{x(g)}$ is the probability the home team would also win on the road).


Page 19: Logistic Regression/Markov Chain presentation

2013 Top 10 projected teams

   Top teams    Top teamsL              TopProb    TopProbL
0  Miami (FL)   Nevada-Las Vegas        0.006619   0.003262
1  Michigan     Notre Dame              0.006619   0.003262
2  Wisconsin    Virginia Commonwealth   0.006670   0.003262
3  Ohio State   James Madison           0.006788   0.003262
4  Syracuse     Louisville              0.006991   0.003262
5  Kansas       North Carolina A&T      0.007234   0.003262
6  Gonzaga      North Carolina State    0.007625   0.003262
7  Indiana      New Mexico              0.008241   0.003361
8  Louisville   Syracuse                0.008352   0.003361
9  Florida      Memphis                 0.008582   0.003361


Page 20: Logistic Regression/Markov Chain presentation

Solitary and comparative accuracy

Proportion of Tournament matchups predicted correctly:

                2012-2013   2011-2012   2010-2011
Additive        0.6308      0.7164      0.6154
Multiplicative  0.5692      0.6418      0.6154
Log Mult        0.6308      0.6866      0.6308


Page 21: Logistic Regression/Markov Chain presentation

2012-2013 Linear Regression for Playoff probability difference vs victory margin


Page 22: Logistic Regression/Markov Chain presentation

References

Paul Kvam and Joel S. Sokol (2006). "A Logistic Regression/Markov Chain Model for NCAA Basketball." Naval Research Logistics.

RogueWave Logistic Regression Documentation.
http://www.roguewave.com/portals/0/products/legacy-hpp/docs/anaug/3-3.html


Page 23: Logistic Regression/Markov Chain presentation

The End
