Logistic Regression/Markov Chain presentation


Description

Presentation on the Logistic Regression/Markov Chain (LRMC) algorithm for forecasting NCAA Basketball tournament outcomes, prepared for a Regression and ANOVA course.

Transcript of Logistic Regression/Markov Chain presentation

Page 1: Logistic Regression/Markov Chain presentation

Logistic Regression and Markov Chain approach to NCAA Basketball seeding

Michael Hankin

University of Southern California

[email protected]

April 22, 2013


Page 2: Logistic Regression/Markov Chain presentation

Overview

1. Background
   - Logistic Regression
   - Markov Chain


Page 3: Logistic Regression/Markov Chain presentation

Overview of Logistic Regression

Basic idea of Logistic Regression: Given explanatory variables $\vec{X}$ and binary response variable $Y$, we wish to determine $P(Y = 1 \mid \vec{X})$. Logistic regression allows us to estimate this by modeling

$$Y \sim \mathrm{Bernoulli}\left(\sigma(\vec{w}^T \vec{X})\right), \qquad \sigma(\vec{w}^T \vec{X}) = \frac{1}{1 + e^{\vec{w}^T \vec{X}}}$$

If we model $P(i \text{ beats } j \text{ on } j\text{'s home court} \mid i \text{ beat } j \text{ by } x \text{ on } i\text{'s home court})$ as $\sigma(\alpha + \beta x)$, we obtain the following likelihood:

$$L(\alpha, \beta) = \prod_{g:\,\text{games}} \left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right)^{w_g} \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right)^{1 - w_g}$$
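To make this concrete, here is a minimal Python sketch of the model; the data layout and the encoding of $w_g$ (1 if the earlier home winner also wins the return game, 0 otherwise) are my assumptions, and note the deck's sign convention $\sigma(z) = 1/(1 + e^z)$:

```python
import numpy as np

def sigma(z):
    # The deck's sign convention: sigma(z) = 1 / (1 + e^z).
    return 1.0 / (1.0 + np.exp(z))

def likelihood(alpha, beta, x, w):
    # x: margins of victory on the first home court, one entry per game pair;
    # w: assumed 1 if that home winner also won the return game, else 0.
    p = sigma(alpha + beta * x)  # modeled P(w_g = 1)
    return np.prod(p**w * (1 - p)**(1 - w))
```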


Page 4: Logistic Regression/Markov Chain presentation

We then find parameters that maximize the likelihood.

$$\ell = \log L(\alpha, \beta) = \sum_{g:\,\text{games}} w_g \log\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) + (1 - w_g) \log\left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right)$$

$$\ell = \sum_{g:\,\text{games}} -w_g \log\left(1 + e^{\alpha + \beta x_g}\right) + (1 - w_g)\left(\alpha + \beta x_g - \log\left(1 + e^{\alpha + \beta x_g}\right)\right)$$

$$\ell = \sum_{g:\,\text{games}} (1 - w_g)(\alpha + \beta x_g) - \log\left(1 + e^{\alpha + \beta x_g}\right)$$
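For computation, the simplified final form is the convenient one; a sketch, using np.logaddexp(0, z) to evaluate log(1 + e^z) without overflow:

```python
import numpy as np

def log_likelihood(alpha, beta, x, w):
    # l = sum_g (1 - w_g)(alpha + beta x_g) - log(1 + e^{alpha + beta x_g})
    z = alpha + beta * x
    return np.sum((1 - w) * z - np.logaddexp(0, z))
```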


Page 5: Logistic Regression/Markov Chain presentation

$$\frac{\partial \ell}{\partial \alpha} = \sum_{g:\,\text{games}} (1 - w_g) - \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}} \tag{1}$$

$$= \sum_{g:\,\text{games}} (1 - w_g) - \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) \tag{2}$$

$$= \sum_{g:\,\text{games}} \left(\frac{1}{1 + e^{\alpha + \beta x_g}} - w_g\right) \tag{3}$$

$$\frac{\partial \ell}{\partial \beta} = \sum_{g:\,\text{games}} (1 - w_g)x_g - \frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\, x_g \tag{4}$$

$$= \sum_{g:\,\text{games}} (1 - w_g)x_g - \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) x_g \tag{5}$$

$$= \sum_{g:\,\text{games}} \left(\frac{1}{1 + e^{\alpha + \beta x_g}} - w_g\right) x_g \tag{6}$$
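Both components of the gradient reduce to the shared factor $\sigma(\alpha + \beta x_g) - w_g$ (equations (3) and (6)), so a sketch is short:

```python
import numpy as np

def gradient(alpha, beta, x, w):
    # Equations (3) and (6): shared factor sigma(alpha + beta x_g) - w_g.
    r = 1.0 / (1.0 + np.exp(alpha + beta * x)) - w
    return np.array([np.sum(r), np.sum(r * x)])
```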


Page 6: Logistic Regression/Markov Chain presentation

$$\frac{\partial^2 \ell}{\partial \alpha^2} = \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(\frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\right) \tag{7}$$

$$= \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) \tag{8}$$

$$\frac{\partial^2 \ell}{\partial \alpha \partial \beta} = \frac{\partial^2 \ell}{\partial \beta \partial \alpha} = \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(\frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\right) x_g \tag{9}$$

$$= \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) x_g \tag{10}$$

$$\frac{\partial^2 \ell}{\partial \beta^2} = \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(\frac{e^{\alpha + \beta x_g}}{1 + e^{\alpha + \beta x_g}}\right) x_g^2 \tag{11}$$

$$= \sum_{g:\,\text{games}} -\left(\frac{1}{1 + e^{\alpha + \beta x_g}}\right) \left(1 - \frac{1}{1 + e^{\alpha + \beta x_g}}\right) x_g^2 \tag{12}$$
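Every Hessian entry shares the factor $-p(1 - p)$ with $p = \sigma(\alpha + \beta x_g)$, and none depends on $w_g$; a sketch:

```python
import numpy as np

def hessian(alpha, beta, x):
    # Equations (8), (10), (12): shared factor -p(1 - p), p = sigma(alpha + beta x_g).
    p = 1.0 / (1.0 + np.exp(alpha + beta * x))
    d = -p * (1.0 - p)
    return np.array([[np.sum(d),     np.sum(d * x)],
                     [np.sum(d * x), np.sum(d * x**2)]])
```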


Page 7: Logistic Regression/Markov Chain presentation

We want $\alpha, \beta$ such that $\nabla\ell(\alpha, \beta) = 0$. For a current iterate $\alpha^\star, \beta^\star$, let $\varepsilon_\alpha = \alpha - \alpha^\star$ and $\varepsilon_\beta = \beta - \beta^\star$. By Taylor expansion we have:

$$0 = \nabla\ell(\alpha, \beta) = \nabla\ell(\alpha^\star + \varepsilon_\alpha,\, \beta^\star + \varepsilon_\beta) \approx \nabla\ell(\alpha^\star, \beta^\star) + \nabla^2\ell(\alpha^\star, \beta^\star)\begin{pmatrix}\varepsilon_\alpha\\ \varepsilon_\beta\end{pmatrix}$$

$$0 = \nabla\ell(\alpha^\star, \beta^\star) + \nabla^2\ell(\alpha^\star, \beta^\star)\begin{pmatrix}\alpha\\ \beta\end{pmatrix} - \nabla^2\ell(\alpha^\star, \beta^\star)\begin{pmatrix}\alpha^\star\\ \beta^\star\end{pmatrix}$$

Newton to the rescue: successive updates of the following form should converge to the optimal values.

$$\begin{pmatrix}\alpha\\ \beta\end{pmatrix} = \begin{pmatrix}\alpha^\star\\ \beta^\star\end{pmatrix} - \left(\nabla^2\ell(\alpha^\star, \beta^\star)\right)^{-1}\nabla\ell(\alpha^\star, \beta^\star)$$
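Putting the pieces together, a minimal Newton solver that reuses the gradient and hessian sketches above (the starting point, tolerance, and iteration cap are arbitrary choices):

```python
import numpy as np

def newton_fit(x, w, alpha0=0.0, beta0=0.0, tol=1e-10, max_iter=50):
    # Repeatedly apply (alpha, beta) <- (alpha*, beta*) - H^{-1} grad.
    theta = np.array([alpha0, beta0], dtype=float)
    for _ in range(max_iter):
        step = np.linalg.solve(hessian(theta[0], theta[1], x),
                               gradient(theta[0], theta[1], x, w))
        theta -= step
        if np.max(np.abs(step)) < tol:
            break
    return theta
```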


Page 8: Logistic Regression/Markov Chain presentation

Use of Logistic Regression in LRMC

Victory/Defeat margin: We have now found $r^H_x$, the probability that team i beats team j at j's home court, given that i beat j by x at i's home court. Assuming homecourt advantage is additive, the superiority probability $s^H_x$, the probability that team i would beat team j on a neutral court given that i beat j by x on i's home court, is $s^H_x = r^H_{x+h}$ for a homecourt advantage of h points.

This gives

$$h = -\frac{\alpha_r}{2\beta_r}, \qquad s^H_x = \sigma\left(\frac{\alpha_r}{2} + \beta_r x\right)$$

(Substituting confirms the pair is consistent: $r^H_{x+h} = \sigma(\alpha_r + \beta_r(x + h)) = \sigma(\alpha_r/2 + \beta_r x)$, so the neutral-court model keeps exactly half of the intercept, discounting the margin by one court's worth of advantage rather than two.)
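A small worked example, plugging in the 2012-2013 additive estimates reported on the parameter-estimates slide later in the deck:

```python
import numpy as np

# 2012-2013 additive estimates (see the "Parameter estimates" slide).
alpha_r, beta_r = 0.68503617299539032, -0.056212447269008876

h = -alpha_r / (2 * beta_r)  # estimated homecourt advantage: about 6.09 points

def s_H(x):
    # Neutral-court superiority probability s^H_x = sigma(alpha_r/2 + beta_r x).
    return 1.0 / (1.0 + np.exp(alpha_r / 2 + beta_r * x))

# Sanity check: a team that won by exactly h at home is a coin flip on a
# neutral court, since alpha_r/2 + beta_r * h = 0.
assert abs(s_H(h) - 0.5) < 1e-12
```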


Page 9: Logistic Regression/Markov Chain presentation

Alternative assumptions: Because each game has finite length (equal except for overtime), a reasonable estimator of a team's skill is the proportion of time it controls the ball, which in turn can be estimated by its score divided by the sum of both teams' scores. This motivates a multiplicative homecourt advantage model (regress on the score ratio) and a log-multiplicative one (regress on the log of the score ratio).

Reduce overfitting: We can reduce overfitting by penalizing large parameter values, shrinking toward $\alpha = \beta = 0$ (under which future games are independent of past results): choose nonnegative $\lambda_\alpha, \lambda_\beta$ and minimize $-\ell + \lambda_\alpha \alpha^2 + \lambda_\beta \beta^2$.

In my regularized examples I placed larger penalties on the α's, operating under the hypothesis that there is no homecourt advantage, as sketched below.
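A minimal sketch of the penalized fit using scipy; the penalty weights are illustrative placeholders, with the larger one on α per the hypothesis above:

```python
import numpy as np
from scipy.optimize import minimize

def regularized_fit(x, w, lam_alpha=1.0, lam_beta=0.1):
    # Minimize -l + lam_alpha * alpha^2 + lam_beta * beta^2.
    def objective(theta):
        alpha, beta = theta
        z = alpha + beta * x
        neg_ll = -np.sum((1 - w) * z - np.logaddexp(0, z))
        return neg_ll + lam_alpha * alpha**2 + lam_beta * beta**2
    return minimize(objective, x0=np.zeros(2)).x
```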


Page 10: Logistic Regression/Markov Chain presentation

Logistic Regression "Goodness of Fit"

Assumptions for test: Because the number of observations is much larger than the number of "buckets" (for classical LRMC, the mean and median number of observations per score differential were approximately 32.9 and 17, respectively), the CLT lets us standardize the residuals. Treating each observation as Bernoulli with fitted probability $\hat{y}_i$,

$$r_i = \frac{y_i - \hat{y}_i}{\sqrt{\hat{y}_i(1 - \hat{y}_i)}} \quad \text{and thus} \quad \sum_i r_i^2 \overset{H_0}{\sim} \chi^2_{n-2}$$
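A sketch of the test, assuming y holds the observed outcomes and y_hat the fitted probabilities:

```python
import numpy as np
from scipy.stats import chi2

def gof_pvalue(y, y_hat):
    # Pearson residuals r_i = (y_i - y_hat_i) / sqrt(y_hat_i (1 - y_hat_i));
    # under H0, sum r_i^2 ~ chi^2 with n - 2 degrees of freedom
    # (two fitted parameters: alpha and beta).
    r = (y - y_hat) / np.sqrt(y_hat * (1 - y_hat))
    return chi2.sf(np.sum(r**2), df=len(y) - 2)
```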


Page 11: Logistic Regression/Markov Chain presentation

Chi Squared p-values for logistic regressions

                        2011       2012       2013
additive              0.511777   0.552131   0.569139
additive (reg)        0.500654   0.534811   0.550568
multiplicative        0.495586   0.537728   0.522612
multiplicative (reg)  0.027208   0.001498   0.001819
log mult              0.499545   0.558072   0.593485
log mult (reg)        0.424898   0.440884   0.483908

Table: χ² p-values


Page 12: Logistic Regression/Markov Chain presentation

2010-2011 Logistic Regressions

Numbers in legends are estimated homecourt advantages.


Page 13: Logistic Regression/Markov Chain presentation

2011-2012 Logistic Regressions

Numbers in legends are estimated homecourt advantages.


Page 14: Logistic Regression/Markov Chain presentation

2012-2013 Logistic Regressions

Numbers in legends are estimated homecourt advantages.


Page 15: Logistic Regression/Markov Chain presentation

Parameter estimates for 2012-2013

Additive parameters: $\alpha_r = 0.68503617299539032$, $\beta_r = -0.056212447269008876$.

Estimated covariance matrix of $(\hat{\alpha}_r, \hat{\beta}_r)$:

         α                β
α    1.94257829e-03  -6.11459051e-05
β   -6.11459051e-05   1.20313009e-05


Page 16: Logistic Regression/Markov Chain presentation

Overview of Markov Chains

Stochastic process with finite states: A finite-state Markov chain is a stochastic process in which the probability of being in state X at time t depends only on the state at time t − 1.

Steady state: Under some basic conditions (irreducibility and aperiodicity), there exists a probability distribution over the states such that, if the Markov chain is run for a long time, the state at any given time is approximately "Multinoulli" (categorical) with the steady-state distribution.


Page 17: Logistic Regression/Markov Chain presentation

Use of Markov Chains in LRMC

LRMC states: In LRMC we create a state for each team; being in state i indicates that we currently think team i is the best team.

Transition probabilities: Given transition probabilities derived from each team's regular-season record, at each "step" we either jump to another team or stay put.

Expected time per state: Eventually a steady-state distribution emerges, representing the long-run proportion of time we expect to spend in each state; teams are ranked by this distribution. Because the transition matrix is sparse and small enough for my laptop to handle, we simply find its (left) eigenvector corresponding to eigenvalue 1 and normalize it in L1, as sketched below.
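A minimal sketch of that computation for a row-stochastic transition matrix T (dense numpy for clarity; a sparse eigensolver would follow the same idea):

```python
import numpy as np

def steady_state(T):
    # T[i, j] = P(jump from team i's state to team j's state); rows sum to 1.
    # The stationary distribution is the left eigenvector of T for eigenvalue 1,
    # i.e. an eigenvector of T.T, normalized in L1.
    vals, vecs = np.linalg.eig(T.T)
    v = vecs[:, np.argmin(np.abs(vals - 1.0))].real
    return v / v.sum()  # Perron vector has one sign, so this is an L1 normalization
```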


Page 18: Logistic Regression/Markov Chain presentation

Transition Probabilities

Naive approach: To motivate the more complex LRMC approach, we start simple. Take $p = P(\text{team } i \text{ is better than team } j \mid \text{team } i \text{ beat team } j)$, $w_{ij}$ = the number of times i beat j, $l_{ij}$ = the number of times j beat i, and $N_i$ = the total number of games played by i (required to normalize the transition probabilities). Then we define the transition probability

$$t_{ij} = \frac{1}{N_i}\left(w_{ij}(1 - p) + l_{ij}\, p\right)$$

Better approach: Obviously we can do better by considering the victory margin and game location:

$$t_{ij} = \frac{1}{N_i}\left(\sum_{g:\, i \text{ at } j} r^H_{x(g)} + \sum_{g:\, j \text{ at } i} \left(1 - r^H_{x(g)}\right)\right), \qquad t_{ii} = 1 - \sum_{j \neq i} t_{ij}$$

(here $x(g)$ denotes the home team's margin of victory in game g, so $r^H_{x(g)}$ is the probability the home team would also win on the road).


Page 19: Logistic Regression/Markov Chain presentation

2013 Top 10 projected teams

   Top teams    Top teamsL              TopProb    TopProbL
0  Miami (FL)   Nevada-Las Vegas        0.006619   0.003262
1  Michigan     Notre Dame              0.006619   0.003262
2  Wisconsin    Virginia Commonwealth   0.006670   0.003262
3  Ohio State   James Madison           0.006788   0.003262
4  Syracuse     Louisville              0.006991   0.003262
5  Kansas       North Carolina A&T      0.007234   0.003262
6  Gonzaga      North Carolina State    0.007625   0.003262
7  Indiana      New Mexico              0.008241   0.003361
8  Louisville   Syracuse                0.008352   0.003361
9  Florida      Memphis                 0.008582   0.003361


Page 20: Logistic Regression/Markov Chain presentation

Solitary and comparative accuracy

Proportion of Tournament matchups predicted correctly:

                2012-2013   2011-2012   2010-2011
Additive        0.6308      0.7164      0.6154
Multiplicative  0.5692      0.6418      0.6154
Log Mult        0.6308      0.6866      0.6308


Page 21: Logistic Regression/Markov Chain presentation

2012-2013 Linear Regression for Playoff probability difference vs victory margin


Page 22: Logistic Regression/Markov Chain presentation

References

Paul Kvam and Joel S. Sokol (2006). "A Logistic Regression/Markov Chain Model for NCAA Basketball." Naval Research Logistics.

RogueWave Logistic Regression Documentation.
http://www.roguewave.com/portals/0/products/legacy-hpp/docs/anaug/3-3.html


Page 23: Logistic Regression/Markov Chain presentation

The End
