Post on 14-Jun-2015
© Experian Limited 2007. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Limited. Other product and company names mentioned herein may be the trademarks of their respective owners. No part of this copyrighted work may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian Limited. Confidential and proprietary.
Stepwise Logistic Regression
Lecture for FMI Students 27.05.2010
Alexander Efremov
Agenda
Introduction
  Applications of the Logistic Regression
  System Identification & Stepwise Regression
Part I. Logistic Regression Model Development
  Logistic Model
  Maximum Likelihood Estimator
  Potential Problems
  Model Analysis and Validation
Part II. Stepwise Logistic Regression (SWR)
  Basic Idea
  SWR Algorithm
  Potential Problems
Summary
Introduction
Applications of the Logistic Regression
Medicine – diagnostics, modeling of disease growth, treatment effect
Psychology – learning process modeling, evaluation of psychological tests
Economics – risk analysis, investigation of countries' debt, occupational choices
Marketing – product consumption, effect of retailers' actions
Criminology – risk factors for committing a criminal act
Sociology – employment, graduation, vote analysis
Ecology – modeling population growth
Linguistics – language changes
Chemistry – reaction models
Media – news effects, copycat reaction
Finance – credit scoring, fraud detection
Physics, Biology, etc.
The Logistic Model
Introduction
System Under Investigation
Individuals /raw data/ => System => Model
Introduction
System Identification Stages
Part I. Logistic Regression Model Development
Logistic Model
[Figure: two panels, "Linear relation" and "Logistic relation".]
Part I. Logistic Regression Model Development
Logistic Model
Logistic Relation – General Form
$$ \hat{y}_k = \frac{e^{M_k}}{1 + e^{M_k}} = \frac{1}{1 + e^{-M_k}} $$
"Linear" Log. Regression Model
$$ M_k = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k} $$
$$ \hat{y}_k = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}} $$
$$ \ln\frac{\hat{y}_k}{1 - \hat{y}_k} = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k} $$
where
$k$ – index of current individual, $k = 1, \dots, N$
$N$ – number of observations
$y_k$ – dependent variable /prob. of good/
$\hat{y}_k$ – model output /predicted prob. of good/
$\theta_0$ – intercept
$\theta_i$ – the (i+1)-th model parameter
$x_{i,k}$ – the i-th independent variable, $i = 1, \dots, n$
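The logistic relation and its log-odds form can be checked numerically. The following is a minimal sketch, assuming NumPy; the function names `logistic_output` and `log_odds` and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def logistic_output(theta, X):
    """y_hat_k = 1 / (1 + exp(-M_k)), with M_k = theta_0 + sum_i theta_i * x_{i,k}."""
    M = theta[0] + X @ theta[1:]          # linear predictor M_k for every individual k
    return 1.0 / (1.0 + np.exp(-M))

def log_odds(y_hat):
    """ln(y_hat / (1 - y_hat)), which should recover M_k."""
    return np.log(y_hat / (1.0 - y_hat))

# Hypothetical parameters [theta_0, theta_1, theta_2] and N = 3 individuals, n = 2 variables.
theta = np.array([-1.0, 0.5, 2.0])
X = np.array([[0.0, 0.0],
              [2.0, 0.0],
              [1.0, 1.5]])

M = theta[0] + X @ theta[1:]
y_hat = logistic_output(theta, X)
print(y_hat)                               # predicted probabilities, all inside (0, 1)
print(np.allclose(log_odds(y_hat), M))     # the log-odds are linear in the parameters
```

The second individual has M_k = 0, so its predicted probability is exactly 0.5, the midpoint of the logistic curve.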
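Part I. Logistic Regression Model Development
Logistic Model
Notation
• Parameters vector: $\theta = [\theta_0 \; \theta_1 \; \dots \; \theta_n]^T \in \mathbb{R}^{n+1}$
• Regression vector: $\varphi_k = [1 \; x_{1,k} \; \dots \; x_{n,k}]^T \in \mathbb{R}^{n+1}$
• Logistic model:
$$ \hat{y}_k = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}} = \frac{1}{1 + e^{-\varphi_k^T \theta}} $$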
Part I. Logistic Regression Model Development
Residual
The Residual
$$ y_k = \frac{1}{1 + e^{-\varphi_k^T \theta}} + e_k = \hat{y}_k + e_k $$
$$ e_k = y_k - \hat{y}_k = \begin{cases} 1 - \hat{y}_k, & \text{for } y_k = 1 \\ -\hat{y}_k, & \text{for } y_k = 0 \end{cases} $$
Sources of Uncertainty
• Unavailable significant factors
• Simplified relations
• Time-varying performance
• Database errors
• Fraud
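Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Cost Function
• Model output: $\hat{y}_k = P(y_k = 1 \mid \varphi_k)$
• Likelihood contribution: $l_{\theta,k} = \hat{y}_k^{\,y_k} (1 - \hat{y}_k)^{1 - y_k}$
• Likelihood function: $L_\theta = \prod_{k=1}^{N} l_{\theta,k}$
• Log-likelihood function: $\ln L_\theta = \sum_{k=1}^{N} \left( y_k \ln \hat{y}_k + (1 - y_k) \ln(1 - \hat{y}_k) \right)$
Maximum Likelihood Criterion
$$ \max_\theta \ln L_\theta \;\Leftrightarrow\; \min_\theta \, -2 \ln L_\theta $$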
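The -2 ln L cost function above can be evaluated directly from the regression matrix. This is a minimal sketch, assuming NumPy; the function name `neg2_log_likelihood`, the clipping constant `eps`, and the toy data are assumptions made for illustration and do not come from the lecture.

```python
import numpy as np

def neg2_log_likelihood(theta, Phi, y):
    """Cost -2 ln L for the logistic model y_hat_k = 1 / (1 + exp(-phi_k^T theta))."""
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    eps = 1e-12                                   # assumption: guard against ln(0)
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    log_l = np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))
    return -2.0 * log_l

# Hypothetical data: rows of Phi are phi_k^T = [1, x_{1,k}].
Phi = np.array([[1.0, -2.0],
                [1.0, -1.0],
                [1.0,  1.0],
                [1.0,  2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

f_zero = neg2_log_likelihood(np.zeros(2), Phi, y)            # all y_hat_k = 0.5
f_slope = neg2_log_likelihood(np.array([0.0, 1.0]), Phi, y)  # slope follows the data
print(f_zero, f_slope)
```

With theta = 0 every observation contributes ln 0.5, so the cost equals 2 N ln 2; a parameter vector that follows the data gives a smaller value.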
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Cost Function /-2 Log L/ for a Real Life Case
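Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Estimates Update
$$ \hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} + \Delta\theta^{(i)}, \qquad \Delta\theta^{(i)} = \; ? $$
Taylor Series Expansion
$$ f_{\hat{\theta}^{(i)} + \Delta\theta^{(i)}} = f_{\hat{\theta}^{(i)}} + g^{(i)T} \Delta\theta^{(i)} + \tfrac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)} + O^3 $$
where
Cost function: $f_{\hat{\theta}^{(i)}} = -\ln L_{\hat{\theta}^{(i)}}$
Gradient: $g^{(i)} = \nabla f_{\hat{\theta}^{(i)}}$
Hessian: $H^{(i)} = \nabla^2 f_{\hat{\theta}^{(i)}}$
Cost Function Models
• Linear model: $M^{(i)} = f_{\hat{\theta}^{(i)}} + g^{(i)T} \Delta\theta^{(i)}$
• Quadratic model: $M^{(i)} = f_{\hat{\theta}^{(i)}} + g^{(i)T} \Delta\theta^{(i)} + \tfrac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)}$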
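Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
First-Order Methods /e.g. Steepest Descent/
$$ \Delta\theta = -\alpha g $$
Gradient
$$ g = \left[ \frac{\partial f}{\partial\theta_0} \;\; \frac{\partial f}{\partial\theta_1} \;\; \dots \;\; \frac{\partial f}{\partial\theta_n} \right]^T \in \mathbb{R}^{n+1} $$
Second-Order Methods /e.g. Newton-Raphson/
$$ \Delta\theta = -\alpha H^{-1} g $$
Hessian
$$ H = \begin{bmatrix}
\frac{\partial^2 f}{\partial\theta_0^2} & \frac{\partial^2 f}{\partial\theta_0 \partial\theta_1} & \cdots & \frac{\partial^2 f}{\partial\theta_0 \partial\theta_n} \\
\frac{\partial^2 f}{\partial\theta_1 \partial\theta_0} & \frac{\partial^2 f}{\partial\theta_1^2} & \cdots & \frac{\partial^2 f}{\partial\theta_1 \partial\theta_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial\theta_n \partial\theta_0} & \frac{\partial^2 f}{\partial\theta_n \partial\theta_1} & \cdots & \frac{\partial^2 f}{\partial\theta_n^2}
\end{bmatrix} \in \mathbb{R}^{(n+1) \times (n+1)} $$
[Figure: two panels, one per method, each showing iterations 1, 2 starting from θ(0) and moving towards θ*, θopt.]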
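The Newton-Raphson update can be sketched for the logistic model as follows. This is an illustrative sketch, assuming NumPy and the cost f = -ln L, for which the gradient is Phi^T (y_hat - y) and the Hessian is Phi^T diag(y_hat (1 - y_hat)) Phi. The function name `fit_newton_raphson`, the stopping rule and the toy data are assumptions, not part of the lecture.

```python
import numpy as np

def fit_newton_raphson(Phi, y, alpha=1.0, tol=1e-8, max_iter=50):
    """Iterate theta <- theta - alpha * H^{-1} g for the cost f = -ln L."""
    theta = np.zeros(Phi.shape[1])                    # initial estimate theta^(0)
    for _ in range(max_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
        g = Phi.T @ (y_hat - y)                       # gradient of -ln L
        w = y_hat * (1.0 - y_hat)
        H = Phi.T @ (Phi * w[:, None])                # Hessian of -ln L
        delta = -alpha * np.linalg.solve(H, g)        # solve H d = g, no explicit inverse
        theta = theta + delta
        if np.max(np.abs(delta)) < tol:               # assumption: stop on a small step
            break
    return theta

# Hypothetical, non-separable toy data; rows of Phi are [1, x_{1,k}].
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])
Phi = np.column_stack([np.ones_like(x), x])

theta_hat = fit_newton_raphson(Phi, y)
y_fit = 1.0 / (1.0 + np.exp(-(Phi @ theta_hat)))
print(theta_hat)
print(Phi.T @ (y_fit - y))                            # gradient at the estimate
```

Replacing the `delta` line with `-alpha * g` gives the steepest descent variant, which needs a much smaller `alpha` and typically many more iterations.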
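Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Steepest Descent: $\Delta\theta = -\alpha g$
Newton-Raphson (NR): $\Delta\theta = -\alpha H^{-1} g$
NR with Line Search: $\Delta\theta = -\alpha^{*} H^{-1} g$
NR with Quadratic Interpolation: $\Delta\theta = -\alpha^{*} H^{-1} g$
[Figure: four panels, one per method, each showing iterations 1, 2 starting from θ(0) and moving towards θ*, θopt.]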
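Part I. Logistic Regression Model Development
Potential problems
Numerical Problems
The update requires a matrix inversion, hence the use of SVD, EVD, QR, etc.
$$ \hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - \alpha H^{-1} g $$
Local Minima
Model Overfitting
[Figures: surviving labels are -2lnL, and k, y_k, y_{1,k}, y_{2,k}.]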
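One way to avoid the explicit inverse in the update is an SVD-based least-squares solve. The sketch below assumes NumPy and uses `np.linalg.lstsq`, which returns the minimum-norm solution when H is singular (for example, with duplicated regressors). The helper name `newton_step` and the matrices are hypothetical.

```python
import numpy as np

def newton_step(H, g, alpha=1.0):
    """Return -alpha * H^+ g via an SVD-based least-squares solve of H d = g."""
    d, *_ = np.linalg.lstsq(H, g, rcond=None)
    return -alpha * d

# Well-conditioned case: same result as solving H d = g directly.
H_ok = np.array([[2.0, 0.0],
                 [0.0, 4.0]])
g_ok = np.array([2.0, 4.0])
step_ok = newton_step(H_ok, g_ok)

# Singular case (two identical regressors): the inverse of H does not exist,
# but a minimum-norm step is still available.
H_sing = np.array([[2.0, 2.0],
                   [2.0, 2.0]])
g_sing = np.array([1.0, 1.0])
step_sing = newton_step(H_sing, g_sing)

print(step_ok, step_sing)
```

Whether a minimum-norm step is the right remedy depends on the cause of the ill-conditioning; removing the redundant variable is often the cleaner fix.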
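Part I. Logistic Regression Model Development
Frequently Used Statistics for Model Analysis
Individual Estimate Measures
• Standard error: $\sigma_{\hat{\theta}_i} = \sqrt{[\mathrm{diag}(H^{-1})]_i}$
• Wald statistic: $W_i = \hat{\theta}_i \left( \sigma^2_{\hat{\theta}_i} \right)^{-1} \hat{\theta}_i = \frac{\hat{\theta}_i^2}{\sigma^2_{\hat{\theta}_i}} \sim \chi^2_1$
• p-value: $\Pr(\chi^2_1 > W_i)$
[Figure: χ² density with W_i marked and the p-value as the tail area.]
Overall Model Measures
• Coefficient of determination (R²)
  • generalized R²: $R^2 = 1 - e^{\frac{2}{N} \left( \ln L_{\hat{\theta}_0} - \ln L_{\hat{\theta}} \right)}$
  • gen. max. resc. R²: $R_s^2 = 1 - e^{2 N^{-1} \ln L_{\hat{\theta}_0}}$, $R_m^2 = \frac{R^2}{R_s^2}$
• Cost function: $f_{\hat{\theta}^{(i)}} = -2 \ln L_{\hat{\theta}^{(i)}}$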
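The individual and overall measures can be computed in a few lines. This is a hedged sketch, assuming NumPy and that H is the Hessian of -ln L at the estimate, so that the diagonal of its inverse holds the parameter variances. The chi-square tail probability for one degree of freedom is written with `math.erfc`, using Pr(χ²₁ > W) = erfc(√(W/2)). The function names and numbers are hypothetical.

```python
import math
import numpy as np

def wald_table(theta_hat, H):
    """Standard errors, Wald statistics and p-values (chi-square, 1 d.o.f.)."""
    se = np.sqrt(np.diag(np.linalg.inv(H)))       # assumes H = Hessian of -ln L
    W = (theta_hat / se) ** 2
    p = np.array([math.erfc(math.sqrt(w / 2.0)) for w in W])
    return se, W, p

def generalized_r2(ln_l0, ln_l, N):
    """Generalized R^2 and its max-rescaled version from two log-likelihoods."""
    r2 = 1.0 - math.exp(2.0 * (ln_l0 - ln_l) / N)
    r2_s = 1.0 - math.exp(2.0 * ln_l0 / N)        # largest attainable R^2
    return r2, r2 / r2_s

# Hypothetical one-parameter example: estimate 1.96 with unit standard error.
se, W, p = wald_table(np.array([1.96]), np.array([[1.0]]))
print(se, W, p)

# Hypothetical log-likelihoods of the intercept-only and the fitted model.
r2_none, r2m_none = generalized_r2(-10.0, -10.0, 20)   # no improvement over intercept
r2_full, r2m_full = generalized_r2(-10.0, 0.0, 20)     # perfect fit, ln L = 0
print(r2_none, r2m_none, r2_full, r2m_full)
```

The rescaling matters because the generalized R² cannot reach 1 for a binary outcome; dividing by its maximum puts the measure on a 0 to 1 scale.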
Part I. Logistic Regression Model Development
Frequently Used Statistics for Model Analysis
Modified criteria
• Akaike Information Criterion (AIC): $AIC_{\hat{\theta}} = -2 \ln L_{\hat{\theta}} + 2n$
• Schwarz Criterion (SC): $SC_{\hat{\theta}} = -2 \ln L_{\hat{\theta}} + n \ln(N - 1)$
• Minimum Description Length (MDL), Final Prediction Error (FPE), etc.
[Figure: curves labelled AIC and -2lnL.]
Model Validation
• Data split into development and validation samples
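Both criteria add a complexity penalty to the -2 ln L cost. The sketch below follows the expressions as they appear on the slide, with the penalty term n ln(N - 1) for SC; the function names `aic` and `sc` and the numbers are hypothetical.

```python
import math

def aic(ln_l, n):
    """AIC = -2 ln L + 2 n, with n the number of model variables."""
    return -2.0 * ln_l + 2.0 * n

def sc(ln_l, n, N):
    """SC = -2 ln L + n ln(N - 1), as written on the slide.
    Note: the Schwarz criterion is often stated with ln(N) instead."""
    return -2.0 * ln_l + n * math.log(N - 1)

# Hypothetical comparison of two nested models fitted on N = 101 observations.
N = 101
small = {"ln_l": -50.0, "n": 3}
large = {"ln_l": -49.5, "n": 6}

for name, m in (("small", small), ("large", large)):
    print(name, aic(m["ln_l"], m["n"]), sc(m["ln_l"], m["n"], N))
```

In this example the larger model lowers -2 ln L by only 1 while adding three variables, so both criteria prefer the smaller model.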
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
xo, xe – sets of all variables out of / entered in the model
xoi, xei – the most significant variable in xo / the least significant variable in xe
SLE – Significance Level to Enter
SLS – Significance Level to Stay
[Figure: SWR diagram.]
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
[Figure sequence, one diagram per slide; surviving labels:]
Available information
Initialization: 1
Forward Selection: 1 2
Forward Selection: 1 2 3
Backward Elimination: 2 3
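Part II. Stepwise Logistic Regression
Step 0. Initialization
Logistic model
$$ \hat{y}_k = \frac{1}{1 + e^{-\varphi_k^T \theta}} $$
1. Intercept Model: $\theta \in \mathbb{R}$, $\varphi_k = 1$
2. Full model: $\theta \in \mathbb{R}^{n+1}$, $\varphi_k = [1 \; x_{1,k} \; \dots \; x_{n,k}]^T$
3. One Factor Model
  • Check for Enter
    • Score Chi-Sq for all potential models: $S_i = g_i^T H_i^{-1} g_i$
    • Maximum Score Chi-Square: $\ell_1 = \arg\max_i S_i$
    • p-value & threshold: $\text{p-value}_{\ell_1} < \text{SLE}$
  • Model Determination (Optimization): $\theta \in \mathbb{R}^2$, $\varphi_k = [1 \; x_{\ell_1,k}]^T$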
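The check for enter can be sketched as a score test evaluated at the current model. This is a hedged illustration, assuming NumPy: the gradient and Hessian of -ln L for the model extended by one candidate are computed at the current estimate (with the new coefficient at zero), and S = g^T H^{-1} g is compared with a chi-square distribution with one degree of freedom. The function name `score_chi_square`, the toy data and the SLE value are assumptions.

```python
import math
import numpy as np

def score_chi_square(Phi_current, theta_current, x_candidate, y):
    """Score chi-square S = g^T H^{-1} g and its p-value for one candidate variable."""
    Phi = np.column_stack([Phi_current, x_candidate])        # extended regression matrix
    y_hat = 1.0 / (1.0 + np.exp(-(Phi_current @ theta_current)))
    g = Phi.T @ (y_hat - y)                                  # gradient of -ln L
    w = y_hat * (1.0 - y_hat)
    H = Phi.T @ (Phi * w[:, None])                           # Hessian of -ln L
    S = float(g @ np.linalg.solve(H, g))
    p_value = math.erfc(math.sqrt(S / 2.0))                  # Pr(chi2_1 > S)
    return S, p_value

# Hypothetical data: two candidate variables, N = 8 individuals.
y = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0])
X = np.column_stack([
    [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0],            # candidate 0
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],            # candidate 1
])

# Intercept model: its ML estimate is the log-odds of the mean of y.
Phi0 = np.ones((len(y), 1))
theta0 = np.array([np.log(y.mean() / (1.0 - y.mean()))])

scores = [score_chi_square(Phi0, theta0, X[:, j], y) for j in range(X.shape[1])]
best = int(np.argmax([s for s, _ in scores]))
SLE = 0.05                                                   # assumed threshold
print(scores, best, scores[best][1] < SLE)
```

Only the current model has to be fitted; each candidate costs one gradient and Hessian evaluation rather than a full optimization.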
Part II. Stepwise Logistic Regression
Step 1. Forward Selection
1. Check for Enter
  • Score Chi-Square of all potential models: $S_i = g_i^T H_i^{-1} g_i$
  • Maximum Score Chi-Square: $\ell_i = \arg\max_i S_i$ (the maximum is taken over the variables not yet in the model)
  • p-value & threshold: $\text{p-value}_{\ell_i} < \text{SLE}$
2. Model Determination (Optimization): $\theta \in \mathbb{R}^{i+1}$, $\varphi_k = [1 \; x_{\ell_1,k} \; \dots \; x_{\ell_i,k}]^T$
3. Statistics for Model Analysis
  • Individual Estimate Measures
    • standard error
    • Wald statistic & p-value
Part II. Stepwise Logistic Regression
Step 1. Forward Selection
3. Statistics for Model Analysis (part 2)
  • Overall Model Measures
    • Coefficients of determination
    • Cost function
  • Modified criteria
    • Akaike Information Criterion (AIC)
    • Schwarz Criterion (SC)
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression
[Figure: SWR diagram.]
Part II. Stepwise Logistic Regression
Step 2. Backward Elimination
1. Check for Leave
  • Wald statistic & p-value of all potential models
  • p-value & threshold: $\max \text{p-value}_{\ell_i} > \text{SLS}$
2. Model Determination (Optimization): with $x_{\ell_j}$ the variable that leaves, $\theta \in \mathbb{R}^{i}$, $\varphi_k = [1 \; x_{\ell_1,k} \; \dots \; x_{\ell_{j-1},k} \; x_{\ell_{j+1},k} \; \dots \; x_{\ell_i,k}]^T$
3. Statistics for Model Analysis
  • Individual Estimate Measures
    • standard error
    • Wald statistic & p-value
Part II. Stepwise Logistic Regression
Step 2. Backward Elimination
3. Statistics for Model Analysis (part 2)
  • Overall Model Measures
    • Coefficients of determination
    • Cost function
  • Modified criteria
    • Akaike Information Criterion (AIC)
    • Schwarz Criterion (SC)
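The forward and backward steps can be combined into one loop. The following is a simplified sketch under stated assumptions, not the lecture's implementation: it assumes NumPy, a Newton-Raphson fit of -ln L, a score test for entry, Wald p-values for removal, and a stop when the variable that has just entered is removed again. All function names, thresholds and the simulated data are hypothetical.

```python
import math
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian(Phi, y_hat):
    return Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])

def fit(Phi, y, tol=1e-8, max_iter=50):
    """Newton-Raphson ML fit; returns the estimate and the Hessian of -ln L there."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        y_hat = sigmoid(Phi @ theta)
        delta = -np.linalg.solve(hessian(Phi, y_hat), Phi.T @ (y_hat - y))
        theta = theta + delta
        if np.max(np.abs(delta)) < tol:
            break
    return theta, hessian(Phi, sigmoid(Phi @ theta))

def chi2_1_pvalue(stat):
    return math.erfc(math.sqrt(stat / 2.0))          # Pr(chi2 with 1 d.o.f. > stat)

def design(X, cols):
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])

def stepwise(X, y, sle=0.05, sls=0.05, max_steps=100):
    entered = []                                     # indices of variables in the model
    for _ in range(max_steps):
        # Forward selection: score chi-square of every variable out of the model.
        Phi = design(X, entered)
        theta, _ = fit(Phi, y)
        y_hat = sigmoid(Phi @ theta)
        best_j, best_S = None, 0.0
        for j in range(X.shape[1]):
            if j in entered:
                continue
            Phi_j = np.column_stack([Phi, X[:, j]])
            g = Phi_j.T @ (y_hat - y)
            S = float(g @ np.linalg.solve(hessian(Phi_j, y_hat), g))
            if S > best_S:
                best_j, best_S = j, S
        if best_j is None or chi2_1_pvalue(best_S) >= sle:
            break                                    # nothing significant left to enter
        entered.append(best_j)
        # Backward elimination: drop the variable with the largest Wald p-value.
        while entered:
            theta, H = fit(design(X, entered), y)
            se = np.sqrt(np.diag(np.linalg.inv(H)))
            p = [chi2_1_pvalue((t / s) ** 2) for t, s in zip(theta[1:], se[1:])]
            worst = int(np.argmax(p))
            if p[worst] <= sls:
                break
            removed = entered.pop(worst)
            if removed == best_j:                    # assumption: stop to avoid cycling
                return entered
    return entered

# Simulated data (hypothetical): only variable 0 drives the outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (rng.random(400) < sigmoid(-0.5 + 2.0 * X[:, 0])).astype(float)

selected = stepwise(X, y)
print(selected)
```

In this sketch the intercept always stays in the model and only the entered variables are tested for removal.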
Part II. Stepwise Logistic Regression
Potential problems in the Stepwise Regression
Local Minima & Initial Conditions
Numerical Problems /SVD, EVD, QR, etc./
Model Overfitting
Summary
Introduction
  Applications of the Logistic Regression
  System Identification & Stepwise Regression
Part I. Logistic Regression Model Development
  Logistic Model
  Maximum Likelihood Estimator
  Potential Problems
  Model Analysis and Validation
Part II. Stepwise Logistic Regression (SWR)
  Basic Idea
  SWR Algorithm
  Potential Problems
Summary
Thank You!
http://anp.tu-sofia.bg/aefremov/index.htm