25: Logistic Regression
Lisa Yan
June 3, 2020
Lisa Yan, CS109, 2020

Quick slide reference
  3   Background                   25a_background
  9   Logistic Regression          25b_logistic_regression
  27  Training: The big picture    25c_lr_training
  56  Training: The details, Testing   LIVE
  59  Philosophy                   LIVE
  63  Gradient Derivation          25e_derivation
Background (25a_background)
1. Weighted sum
If $\mathbf{x} = x_1, x_2, \ldots, x_m$:

$z = \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_m x_m = \sum_{j=1}^{m} \theta_j x_j$   (weighted sum / dot product)
$\;\;\; = \theta^T \mathbf{x}$
1. Weighted sum
Recall the linear regression model, where $\mathbf{x} = x_1, x_2, \ldots, x_m$ and $Y \in \mathbb{R}$:

$f(\mathbf{x}) = \theta_0 + \sum_{j=1}^{m} \theta_j x_j$

How would you rewrite this expression as a single dot product?
(Dot product / weighted sum: $\theta^T \mathbf{x} = \sum_{j=1}^{m} \theta_j x_j$)
1. Weighted sum
Recall the linear regression model, where $\mathbf{x} = x_1, x_2, \ldots, x_m$ and $Y \in \mathbb{R}$:

$f(\mathbf{x}) = \theta_0 + \sum_{j=1}^{m} \theta_j x_j$

How would you rewrite this expression as a single dot product?
Define $x_0 = 1$, so the new $\mathbf{x} = 1, x_1, x_2, \ldots, x_m$. Then:

$f(\mathbf{x}) = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_m x_m = \theta^T \mathbf{x}$

Prepending $x_0 = 1$ to each feature vector $\mathbf{x}$ makes matrix operations more accessible.
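The bias trick above is easy to check numerically; a minimal sketch using NumPy, with made-up weights and features:

```python
import numpy as np

# Hypothetical weights and features (m = 3 features)
theta = np.array([0.5, -1.0, 2.0, 0.25])   # theta_0 ... theta_3
x = np.array([3.0, 1.0, -2.0])             # x_1 ... x_3

# Bias trick: prepend x_0 = 1 so theta_0 joins the dot product
x_aug = np.concatenate(([1.0], x))

f_explicit = theta[0] + np.sum(theta[1:] * x)  # theta_0 + sum_j theta_j x_j
f_dot = theta @ x_aug                          # single dot product theta^T x

assert np.isclose(f_explicit, f_dot)
```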
2. Sigmoid function $\sigma(z)$
- The sigmoid function: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$
- Sigmoid squashes $z$ to a number between 0 and 1.
- Recall the definition of probability: a number between 0 and 1. So $\sigma(z)$ can represent a probability.
[Figure: plot of $\sigma(z)$ versus $z$ for $-10 \le z \le 10$, rising smoothly from 0 to 1.]
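The squashing behavior is quick to verify in code; a minimal sketch:

```python
import math

def sigmoid(z):
    """Squash any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))    # exactly 0.5
print(sigmoid(10))   # close to 1
print(sigmoid(-10))  # close to 0
```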
3. Conditional likelihood function (Review)
Training data ($n$ datapoints):
- $(\mathbf{x}^{(i)}, y^{(i)})$ drawn i.i.d. from a distribution: $P(\mathbf{X} = \mathbf{x}^{(i)}, Y = y^{(i)} \mid \theta) = P(\mathbf{x}^{(i)}, y^{(i)} \mid \theta)$

$\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} P(y^{(i)} \mid \mathbf{x}^{(i)}, \theta)$   (conditional likelihood of training data)
$\quad = \arg\max_\theta \sum_{i=1}^{n} \log P(y^{(i)} \mid \mathbf{x}^{(i)}, \theta)$   (log conditional likelihood)
$\quad = \arg\max_\theta LL(\theta)$

- The MLE in this lecture is the estimator that maximizes the conditional likelihood.
- Confusingly, the log conditional likelihood is also written as $LL(\theta)$.
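The product-versus-sum-of-logs step above is worth seeing numerically; a minimal sketch with made-up per-datapoint conditional likelihoods:

```python
import math

# Made-up per-datapoint conditional likelihoods P(y_i | x_i, theta)
probs = [0.9, 0.8, 0.95, 0.7]

likelihood = math.prod(probs)                      # product form
log_likelihood = sum(math.log(p) for p in probs)   # sum-of-logs form

# log is monotonic, so maximizing one maximizes the other
assert math.isclose(math.log(likelihood), log_likelihood)
```

With many datapoints the product underflows toward 0, which is why the log form is used in practice.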
Logistic Regression (25b_logistic_regression)
Prediction models so far (Review)

Linear Regression (Regression):
$Y = \theta_0 + \sum_{j=1}^{m} \theta_j x_j$
- Pro: features in $\mathbf{x}$ can be dependent
- Con: regression model ($Y \in \mathbb{R}$, not discrete)

Naive Bayes (Classification):
$\hat{y} = \arg\max_{y = 0,1} P(Y \mid \mathbf{X}) = \arg\max_{y = 0,1} P(\mathbf{X} \mid Y) \, P(Y)$
- Pro: tractable with the NB assumption, but...
- Con: realistically, the $x_j$ features are not necessarily conditionally independent
- Con: actually models $P(\mathbf{X}, Y)$, not $P(Y \mid \mathbf{X})$
Introducing Logistic Regression!
Linear Regression ideas + Classification models + compute power
Logistic Regression
Logistic Regression Model: predict $Y$ as the most likely $y$ given our observation $\mathbf{X} = \mathbf{x}$:

$\hat{y} = \arg\max_{y = 0,1} P(Y = y \mid \mathbf{X})$

$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\right)$, where $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is the sigmoid function.

- Since $Y \in \{0, 1\}$: $P(Y = 0 \mid \mathbf{X} = \mathbf{x}) = 1 - \sigma\left(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\right)$
- The sigmoid is also known as the logistic function (its inverse is the logit).
Logistic Regression
$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\right)$
- $\mathbf{x}$: input features, e.g. $\mathbf{x} = [0, 1, 1]$
- $\theta$: parameters
- Output: the conditional likelihood $P(Y = 1 \mid \mathbf{X} = \mathbf{x})$, e.g. 0.81
(Slides courtesy of Chris Piech)
Components of Logistic Regression (cartoon)
$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\right)$

The cartoon pipeline: the input features $\mathbf{x}$ (e.g., $[0, 1, 1]$) are combined with the weights $\theta$ (aka parameters) into a weighted sum $z$; the squashing function $\sigma(z)$ maps $z$ to a number between 0 and 1; the result is the prediction $P(Y = 1 \mid \mathbf{x})$.
(Slides courtesy of Chris Piech)
Different predictions for different inputs
Changing the input features $\mathbf{x}$ (e.g., $[0, 1, 1]$ versus $[0, 0, 1]$) changes the prediction $P(Y = 1 \mid \mathbf{x})$.

Parameters affect prediction
For the same input $\mathbf{x}$, different parameter vectors $\theta$ produce different predictions.
(Slides courtesy of Chris Piech)
For simplicity
$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\theta_0 + \sum_{j=1}^{m} \theta_j x_j\right)$
becomes
$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\sum_{j=0}^{m} \theta_j x_j\right) = \sigma(\theta^T \mathbf{x})$, where $x_0 = 1$.
Logistic regression classifier
Training: estimate parameters $\theta = \theta_0, \theta_1, \theta_2, \ldots, \theta_m$ from training data.
Testing: given an observation $\mathbf{x} = x_1, x_2, \ldots, x_m$, predict
$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\sum_{j=0}^{m} \theta_j x_j\right) = \sigma(\theta^T \mathbf{x})$
$\hat{y} = \arg\max_{y = 0,1} P(Y = y \mid \mathbf{X})$
Training: The big picture (25c_lr_training)
Logistic regression classifier
Training: estimate parameters $\theta = \theta_0, \theta_1, \theta_2, \ldots, \theta_m$ from training data. Choose $\theta$ that optimizes some objective:
1. Determine the objective function
2. Find the gradient with respect to $\theta$
3. Solve analytically by setting the gradient to 0, or computationally with gradient ascent

We are modeling $P(Y \mid \mathbf{X})$ directly, so we maximize the conditional likelihood of the training data.

Testing: given an observation $\mathbf{x} = x_1, x_2, \ldots, x_m$, predict
$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\sum_{j=0}^{m} \theta_j x_j\right) = \sigma(\theta^T \mathbf{x})$
$\hat{y} = \arg\max_{y = 0,1} P(Y = y \mid \mathbf{X})$
Estimating $\theta$
1. Determine the objective function: $\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} P(y^{(i)} \mid \mathbf{x}^{(i)}, \theta)$
2. Find the gradient w.r.t. $\theta_j$, for $j = 0, 1, \ldots, m$
3. Solve
- No analytical derivation of $\theta_{MLE}$...
- ...but we can still compute $\theta_{MLE}$ with gradient ascent!

initialize x
repeat many times:
    compute gradient
    x += η * gradient
1. Determine objective function
$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\sum_{j=0}^{m} \theta_j x_j\right) = \sigma(\theta^T \mathbf{x})$

$\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} P(y^{(i)} \mid \mathbf{x}^{(i)}, \theta) = \arg\max_\theta LL(\theta)$

First: interpret the conditional likelihood with Logistic Regression.
Second: write a differentiable expression for the log conditional likelihood.
1. Determine objective function (interpret)
Suppose you have $n = 2$ training datapoints: $(\mathbf{x}^{(1)}, 1)$ and $(\mathbf{x}^{(2)}, 0)$.
Consider the following expressions for a given $\theta$:
A. $\sigma(\theta^T \mathbf{x}^{(1)}) \, \sigma(\theta^T \mathbf{x}^{(2)})$
B. $\left(1 - \sigma(\theta^T \mathbf{x}^{(1)})\right) \sigma(\theta^T \mathbf{x}^{(2)})$
C. $\sigma(\theta^T \mathbf{x}^{(1)}) \left(1 - \sigma(\theta^T \mathbf{x}^{(2)})\right)$
D. $\left(1 - \sigma(\theta^T \mathbf{x}^{(1)})\right) \left(1 - \sigma(\theta^T \mathbf{x}^{(2)})\right)$
1. Interpret the above expressions as probabilities.
2. If we let $\theta = \theta_{MLE}$, which probability should be highest?

Each expression is the probability of one particular pair of labels for the two datapoints. Expression C is the conditional likelihood of the labels actually observed, $(1, 0)$, so it should be highest when $\theta = \theta_{MLE}$.
1. Determine objective function (write)
1. What is a differentiable expression for $P(Y = y \mid \mathbf{X} = \mathbf{x})$?
   Piecewise: $P(Y = y \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x})$ if $y = 1$, and $1 - \sigma(\theta^T \mathbf{x})$ if $y = 0$.
   Recall the Bernoulli MLE! A single differentiable expression:
   $P(Y = y \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x})^{y} \left(1 - \sigma(\theta^T \mathbf{x})\right)^{1 - y}$
2. What is a differentiable expression for $LL(\theta)$, the log conditional likelihood?
   $LL(\theta) = \log \prod_{i=1}^{n} P(y^{(i)} \mid \mathbf{x}^{(i)}, \theta) = \sum_{i=1}^{n} \left[ y^{(i)} \log \sigma(\theta^T \mathbf{x}^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - \sigma(\theta^T \mathbf{x}^{(i)})\right) \right]$
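The $LL(\theta)$ expression translates directly into code; a minimal sketch with a tiny made-up dataset (feature vectors already include $x_0 = 1$):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(theta, xs, ys):
    """LL(theta) = sum_i [y_i log sigma(theta^T x_i) + (1 - y_i) log(1 - sigma(theta^T x_i))]."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))  # sigma(theta^T x)
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Made-up data: two datapoints, x_0 = 1 prepended
xs = [[1.0, 2.0], [1.0, -1.0]]
ys = [1, 0]

# A theta that fits the labels scores higher than one that does not
print(log_likelihood([0.0, 1.0], xs, ys))
print(log_likelihood([0.0, -1.0], xs, ys))
```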
2. Find gradient with respect to $\theta$
Optimization problem: $\theta_{MLE} = \arg\max_\theta \prod_{i=1}^{n} P(y^{(i)} \mid \mathbf{x}^{(i)}, \theta) = \arg\max_\theta LL(\theta)$, where
$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \sigma(\theta^T \mathbf{x}^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - \sigma(\theta^T \mathbf{x}^{(i)})\right)$
Gradient w.r.t. $\theta_j$, for $j = 0, 1, \ldots, m$ (derived later):
$\dfrac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \left( y^{(i)} - \sigma(\theta^T \mathbf{x}^{(i)}) \right) x_j^{(i)}$
How do we interpret the gradient contribution of the $i$-th training datapoint?
Interpreting the gradient contribution of the $i$-th datapoint in $\sum_{i=1}^{n} \left( y^{(i)} - \sigma(\theta^T \mathbf{x}^{(i)}) \right) x_j^{(i)}$:
- $y^{(i)}$ is the true label, 1 or 0.
- $\sigma(\theta^T \mathbf{x}^{(i)}) = P(Y = 1 \mid \mathbf{X} = \mathbf{x}^{(i)})$ is the model's current prediction.
- The difference is scaled by $x_j^{(i)}$, the $j$-th feature.
Suppose $y^{(i)} = 1$ (the true class label for the $i$-th datapoint):
- If $\sigma(\theta^T \mathbf{x}^{(i)}) \ge 0.5$, the prediction is correct, so the contribution is small.
- If $\sigma(\theta^T \mathbf{x}^{(i)}) < 0.5$, the prediction is incorrect, so change $\theta_j$ more.
3. Solve
1. Optimization problem: $\theta_{MLE} = \arg\max_\theta LL(\theta)$
2. Gradient w.r.t. $\theta_j$, for $j = 0, 1, \ldots, m$: $\dfrac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \left( y^{(i)} - \sigma(\theta^T \mathbf{x}^{(i)}) \right) x_j^{(i)}$
3. Solve: stay tuned!
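The closed-form gradient can be sanity-checked against a numerical finite difference; a minimal sketch with a made-up dataset ($x_0 = 1$ already inserted):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ll(theta, xs, ys):
    """Log conditional likelihood LL(theta)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def grad_j(theta, xs, ys, j):
    """dLL/dtheta_j = sum_i (y_i - sigma(theta^T x_i)) * x_ij."""
    return sum(
        (y - sigmoid(sum(t * xj for t, xj in zip(theta, x)))) * x[j]
        for x, y in zip(xs, ys)
    )

xs = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
ys = [1, 0, 1]
theta = [0.1, -0.2]

# Central finite difference on theta_1 should match the analytic formula
eps = 1e-6
numeric = (ll([0.1, -0.2 + eps], xs, ys) - ll([0.1, -0.2 - eps], xs, ys)) / (2 * eps)
assert math.isclose(grad_j(theta, xs, ys, 1), numeric, rel_tol=1e-5)
```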
(live) 25: Logistic Regression
Slides by Lisa Yan
August 12, 2020
Logistic Regression Model (Review)
$\hat{Y} = \arg\max_{y = 0,1} P(Y = y \mid \mathbf{X})$   ($\hat{Y}$ is the prediction of $Y$)
$P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma\left(\sum_{j=0}^{m} \theta_j x_j\right) = \sigma(\theta^T \mathbf{x})$, where $x_0 = 1$ and $\sigma(z) = \dfrac{1}{1 + e^{-z}}$ is the sigmoid function.
Another view of Logistic Regression
Logistic Regression Model: $P(Y = 1 \mid \mathbf{X} = \mathbf{x}) = \sigma(\theta^T \mathbf{x})$, where $\theta^T \mathbf{x} = \sum_{j=0}^{m} \theta_j x_j$.
For the "correct" parameters $\theta$:
- datapoints $(\mathbf{x}, 1)$ should have $\theta^T \mathbf{x} > 0$
- datapoints $(\mathbf{x}, 0)$ should have $\theta^T \mathbf{x} \le 0$
(This follows because $\sigma(z) > 0.5$ exactly when $z > 0$.)
Learning parameters (Review)
Training: learn parameters $\theta = \theta_0, \theta_1, \ldots, \theta_m$.
$LL(\theta) = \sum_{i=1}^{n} y^{(i)} \log \sigma(\theta^T \mathbf{x}^{(i)}) + \left(1 - y^{(i)}\right) \log\left(1 - \sigma(\theta^T \mathbf{x}^{(i)})\right)$
$\theta_{MLE} = \arg\max_\theta LL(\theta)$
- No analytical derivation of $\theta_{MLE}$...
- ...but we can still compute $\theta_{MLE}$ with gradient ascent!
$\dfrac{\partial LL(\theta)}{\partial \theta_j} = \sum_{i=1}^{n} \left( y^{(i)} - \sigma(\theta^T \mathbf{x}^{(i)}) \right) x_j^{(i)}$, for $j = 0, 1, \ldots, m$
Gradient Ascent (Review)
Walk uphill and you will find a local maximum (if your step is small enough).
Logistic regression's $LL(\theta)$ is concave, so a local maximum is the global maximum.
[Figure: surface of $LL(\theta)$ over $\theta_1, \theta_2$.]
Training: The details (LIVE)
Training: Gradient ascent step
3. Optimize. Repeat many times, for all thetas:
$\theta_j^{new} = \theta_j^{old} + \eta \cdot \dfrac{\partial LL(\theta^{old})}{\partial \theta_j} = \theta_j^{old} + \eta \cdot \sum_{i=1}^{n} \left( y^{(i)} - \sigma(\theta^{old\,T} \mathbf{x}^{(i)}) \right) x_j^{(i)}$
What does this look like in code?
Training: Gradient Ascent
Gradient Ascent Step: $\theta_j^{new} = \theta_j^{old} + \eta \cdot \sum_{i=1}^{n} \left( y^{(i)} - \sigma(\theta^{old\,T} \mathbf{x}^{(i)}) \right) x_j^{(i)}$

initialize θ[j] = 0 for all 0 ≤ j ≤ m
repeat many times:
    gradient[j] = 0 for all 0 ≤ j ≤ m
    // compute all gradient[j]'s
    // based on n training examples
    θ[j] += η * gradient[j] for all 0 ≤ j ≤ m
![Page 49: 25: Logistic Regression...Lisa Yan, CS109, 2020 Quick slide reference 2 3 Background 25a_background 9 Logistic Regression 25b_logistic_regression 27 Training: The big picture 25c_lr_training](https://reader034.fdocuments.net/reader034/viewer/2022051915/600661679c7b0225db79389a/html5/thumbnails/49.jpg)
Lisa Yan, CS109, 2020
Training: Gradient Ascent
49
// update gradient[j] for// current (x,y) example
for each 0 β€ j β€ m:
ππnew = ππ
old + π β
π=1
π
π¦(π) β π πoldππ(π) π₯π
(π)Gradient
Ascent Step
initialize ππ = 0 for 0 β€ j β€ m
repeat many times:
gradient[j] = 0 for 0 β€ j β€ m
for each training example (x, y):
ππ += Ξ· * gradient[j] for all 0 β€ j β€ m
![Page 50: 25: Logistic Regression...Lisa Yan, CS109, 2020 Quick slide reference 2 3 Background 25a_background 9 Logistic Regression 25b_logistic_regression 27 Training: The big picture 25c_lr_training](https://reader034.fdocuments.net/reader034/viewer/2022051915/600661679c7b0225db79389a/html5/thumbnails/50.jpg)
Lisa Yan, CS109, 2020
initialize ππ = 0 for 0 β€ j β€ m
repeat many times:
gradient[j] = 0 for 0 β€ j β€ m
for each training example (x, y):
ππ += Ξ· * gradient[j] for all 0 β€ j β€ m
Training: Gradient Ascent
50
for each 0 β€ j β€ m:
gradient[j] +=
ππnew = ππ
old + π β
π=1
π
π¦(π) β π πoldππ(π) π₯π
(π)Gradient
Ascent Step
π¦ β1
1 + πβπππ
π₯πWhat are the important details?
![Page 51: 25: Logistic Regression...Lisa Yan, CS109, 2020 Quick slide reference 2 3 Background 25a_background 9 Logistic Regression 25b_logistic_regression 27 Training: The big picture 25c_lr_training](https://reader034.fdocuments.net/reader034/viewer/2022051915/600661679c7b0225db79389a/html5/thumbnails/51.jpg)
Lisa Yan, CS109, 2020
initialize ππ = 0 for 0 β€ j β€ m
repeat many times:
gradient[j] = 0 for 0 β€ j β€ m
for each training example (x, y):
ππ += Ξ· * gradient[j] for all 0 β€ j β€ m
Training: Gradient Ascent
51
for each 0 β€ j β€ m:
gradient[j] +=
ππnew = ππ
old + π β
π=1
π
π¦(π) β π πoldππ(π) π₯π
(π)Gradient
Ascent Step
π¦ β1
1 + πβπππ
π₯π
β’ π₯π is π-th feature of
input π = π₯1, β¦ , π₯π
![Page 52: 25: Logistic Regression...Lisa Yan, CS109, 2020 Quick slide reference 2 3 Background 25a_background 9 Logistic Regression 25b_logistic_regression 27 Training: The big picture 25c_lr_training](https://reader034.fdocuments.net/reader034/viewer/2022051915/600661679c7b0225db79389a/html5/thumbnails/52.jpg)
Lisa Yan, CS109, 2020
initialize ππ = 0 for 0 β€ j β€ m
repeat many times:
gradient[j] = 0 for 0 β€ j β€ m
for each training example (x, y):
ππ += Ξ· * gradient[j] for all 0 β€ j β€ m
Training: Gradient Ascent
52
for each 0 β€ j β€ m:
gradient[j] +=
ππnew = ππ
old + π β
π=1
π
π¦(π) β π πoldππ(π) π₯π
(π)Gradient
Ascent Step
π¦ β1
1 + πβπππ
π₯π
β’ π₯π is π-th feature of
input π = π₯1, β¦ , π₯πβ’ Insert π₯0 = 1 before
training
![Page 53: 25: Logistic Regression...Lisa Yan, CS109, 2020 Quick slide reference 2 3 Background 25a_background 9 Logistic Regression 25b_logistic_regression 27 Training: The big picture 25c_lr_training](https://reader034.fdocuments.net/reader034/viewer/2022051915/600661679c7b0225db79389a/html5/thumbnails/53.jpg)
Lisa Yan, CS109, 2020
initialize ππ = 0 for 0 β€ j β€ m
repeat many times:
gradient[j] = 0 for 0 β€ j β€ m
for each training example (x, y):
ππ += Ξ· * gradient[j] for all 0 β€ j β€ m
Training: Gradient Ascent
53
for each 0 β€ j β€ m:
gradient[j] +=
ππnew = ππ
old + π β
π=1
π
π¦(π) β π πoldππ(π) π₯π
(π)Gradient
Ascent Step
π¦ β1
1 + πβπππ
π₯π
β’ π₯π is π-th feature of
input π = π₯1, β¦ , π₯πβ’ Insert π₯0 = 1 before
training
β’ Finish computing
gradient before
updating any part of π
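The pseudocode above can be sketched in plain Python. This is a minimal illustration, not the course's released code; the names (`train_logistic_regression`, `eta`, `n_iters`) and the default values are my own assumptions.

```python
import math

def sigmoid(z):
    # Numerically stable sigma(z) = 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logistic_regression(xs, ys, eta=0.05, n_iters=1000):
    """Batch gradient ascent on the log conditional likelihood.

    xs: list of feature vectors, each with x_0 = 1 already inserted.
    ys: list of 0/1 labels.
    """
    m = len(xs[0])
    theta = [0.0] * m                          # initialize theta_j = 0
    for _ in range(n_iters):                   # repeat many times
        gradient = [0.0] * m
        for x, y in zip(xs, ys):
            err = y - sigmoid(sum(t * xj for t, xj in zip(theta, x)))
            for j in range(m):                 # finish the whole gradient...
                gradient[j] += err * x[j]
        for j in range(m):                     # ...before updating any theta_j
            theta[j] += eta * gradient[j]
    return theta
```

Note that the update to θ happens only after the full pass over the training set, matching the "finish computing the gradient before updating any part of θ" bullet.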
Testing
56
LIVE
Introducing notation ŷ
57

P(Y = 1 | X = 𝒙) = σ(Σ_{j=0}^{m} θ_j x_j) = σ(θᵀ𝒙)

Ŷ = argmax_{y ∈ {0,1}} P(y | X)   (Ŷ is the prediction of Y)
ŷ = P(Y = 1 | X = 𝒙) = σ(θᵀ𝒙)   (lowercase ŷ is a conditional probability)

P(Y = y | X = 𝒙) = ŷ if y = 1, and 1 − ŷ if y = 0
Testing: Classification with Logistic Regression
58

Training: learn parameters θ = θ₀, θ₁, …, θ_m via gradient ascent:
θ_j^new = θ_j^old + η · Σ_{i=1}^{n} (y^(i) − σ(θ_oldᵀ x^(i))) x_j^(i)

Testing:
• Compute ŷ = P(Y = 1 | X = 𝒙) = σ(θᵀ𝒙) = 1 / (1 + e^(−θᵀ𝒙))
• Classify the instance as 1 if ŷ > 0.5 (equivalently, θᵀ𝒙 > 0), and 0 otherwise

⚠️ Parameters θ_j are not updated during the testing phase.
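The testing phase can be sketched as a single function (my own names; it assumes a θ already learned during training, with x₀ = 1 inserted into each instance):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """Classify one test instance; theta is fixed (never updated here)."""
    z = sum(t * xj for t, xj in zip(theta, x))  # theta^T x
    y_hat = sigmoid(z)                          # y_hat = P(Y = 1 | X = x)
    return (1 if y_hat > 0.5 else 0), y_hat     # y_hat > 0.5 iff theta^T x > 0
```

Because σ is monotonic, thresholding ŷ at 0.5 is the same as thresholding θᵀ𝒙 at 0, so the sigmoid is only needed if you want the probability itself.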
Interlude for jokes/announcements
59
Announcements
1. Pset 6 due tomorrow at 1pm. No late days or on-time bonus for this pset.
2. Look out for extra office hours + review session for the Final Quiz
3. Final Quiz begins Friday 5pm and ends Sunday 5pm.
4. Youβre so close, you got this!
60
Ethics and datasets
Sometimes machine learning feels universally unbiased.
We can even prove our estimators are βunbiasedβ (mathematically).
But Google, Nikon, and HP all had biased datasets.
61
Should your data be unbiased?
62
Dataset: Google News
Bolukbasi et al., "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings." NIPS 2016.
Should our unbiased data collection reflect societyβs systemic bias?
How can we explain decisions?
63
If your task is image classification, reasoning about high-level features is relatively easy.
Everything can be visualized.
What if you are trying to classify social outcomes?
β’ Criminal recidivism
β’ Job performance
β’ Policing
β’ Terrorist risk
β’ At-risk kids
Ethics in Machine Learning is a whole new field. π
64
Philosophy
65
LIVE
Intuition about Logistic Regression
66

Logistic Regression is trying to fit a line that separates data instances where y = 1 from those where y = 0: the decision boundary θᵀ𝒙 = 0.
• We call such data (or the functions generating the data) linearly separable.
• Naïve Bayes is linear too, because there is no interaction between different features.

Logistic Regression Model: P(Y = 1 | X = 𝒙) = σ(θᵀ𝒙), where θᵀ𝒙 = Σ_{j=0}^{m} θ_j x_j
Data is often not linearly separable
β’ Not possible to draw a line that successfully separates all the π¦ = 1 points (green) from the π¦ = 0 points (red)
β’ Despite this fact, Logistic Regression and Naive Bayes still often work well in practice
67
Many tradeoffs in choosing an algorithm
68

| | Naïve Bayes | Logistic Regression |
|---|---|---|
| Modeling goal | P(X, Y) | P(Y ∣ X) |
| Generative or discriminative? | Generative: could use the joint distribution to generate new points (⚠️ but you might not need this extra effort) | Discriminative: just tries to discriminate y = 0 vs. y = 1 (❌ cannot generate new points b/c no P(X, Y)) |
| Continuous input features | ⚠️ Needs a parametric form (e.g., Gaussian) or discretized buckets (for multinomial features) | ✅ Yes, easily |
| Discrete input features | ✅ Yes, multi-valued discrete data = multinomial P(X_j ∣ Y) | ⚠️ Multi-valued discrete data hard (e.g., if X_j ∈ {A, B, C}, not necessarily good to encode as 1, 2, 3) |
Gradient Derivation
69
25e_derivation
Background: Calculus
70

Calculus refresher #1: derivative of a sum = sum of derivatives:
d/dx Σ_{i=1}^{n} f_i(x) = Σ_{i=1}^{n} d f_i(x)/dx

Calculus refresher #2: Chain Rule (aka decomposition of composed functions): if f(x) = g(z(x)), then
df(x)/dx = (dg(z)/dz) · (dz/dx)
Are you ready?
71
Right now!!!
Compute gradient of log conditional likelihood
72

log conditional likelihood: LL(θ) = Σ_{i=1}^{n} [ y^(i) log σ(θᵀx^(i)) + (1 − y^(i)) log(1 − σ(θᵀx^(i))) ]

Find: ∂LL(θ)/∂θ_j
Aside: Sigmoid has a beautiful derivative
73

Sigmoid function: σ(z) = 1 / (1 + e^(−z))
Derivative: dσ(z)/dz = σ(z)(1 − σ(z))

What is ∂σ(θᵀ𝒙)/∂θ_j?
A. σ(x_j)(1 − σ(x_j)) x_j
B. σ(θᵀ𝒙)(1 − σ(θᵀ𝒙)) θ
C. σ(θᵀ𝒙)(1 − σ(θᵀ𝒙)) x_j
D. σ(θᵀ𝒙) x_j (1 − σ(θᵀ𝒙) x_j)
E. None/other
Aside: Sigmoid has a beautiful derivative
74

Sigmoid function: σ(z) = 1 / (1 + e^(−z))
Derivative: dσ(z)/dz = σ(z)(1 − σ(z))

Answer: C. Let z = θᵀ𝒙.
∂σ(θᵀ𝒙)/∂θ_j = (dσ(z)/dz) · (∂z/∂θ_j)   (Chain Rule)
∂z/∂θ_j = ∂/∂θ_j Σ_{j=0}^{m} θ_j x_j = x_j
so ∂σ(θᵀ𝒙)/∂θ_j = σ(θᵀ𝒙)(1 − σ(θᵀ𝒙)) x_j
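The identity dσ(z)/dz = σ(z)(1 − σ(z)) is easy to sanity-check numerically against a finite difference. A quick illustration (function names are mine):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # the slide's closed form: sigma'(z) = sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_derivative(f, z, h=1e-6):
    # centered finite difference, error O(h^2)
    return (f(z + h) - f(z - h)) / (2.0 * h)
```

At z = 0, σ(0) = 0.5, so the derivative is exactly 0.25, its maximum value.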
Re-introducing notation ŷ
75

P(Y = 1 | X = 𝒙) = σ(Σ_{j=0}^{m} θ_j x_j) = σ(θᵀ𝒙)

Ŷ = argmax_{y ∈ {0,1}} P(y | X)
ŷ = P(Y = 1 | X = 𝒙) = σ(θᵀ𝒙)

P(Y = y | X = 𝒙) = ŷ if y = 1, and 1 − ŷ if y = 0
Equivalently: P(Y = y | X = 𝒙) = ŷ^y (1 − ŷ)^(1−y)
Compute gradient of log conditional likelihood
76

log conditional likelihood: LL(θ) = Σ_{i=1}^{n} [ y^(i) log σ(θᵀx^(i)) + (1 − y^(i)) log(1 − σ(θᵀx^(i))) ]

With ŷ^(i) = σ(θᵀx^(i)), this becomes: LL(θ) = Σ_{i=1}^{n} [ y^(i) log ŷ^(i) + (1 − y^(i)) log(1 − ŷ^(i)) ]

Find: ∂LL(θ)/∂θ_j
Compute gradient of log conditional likelihood
77

Let ŷ^(i) = σ(θᵀx^(i)).

∂LL(θ)/∂θ_j = Σ_{i=1}^{n} ∂/∂θ_j [ y^(i) log ŷ^(i) + (1 − y^(i)) log(1 − ŷ^(i)) ]

= Σ_{i=1}^{n} ( ∂/∂ŷ^(i) [ y^(i) log ŷ^(i) + (1 − y^(i)) log(1 − ŷ^(i)) ] ) · ∂ŷ^(i)/∂θ_j   (Chain Rule)

= Σ_{i=1}^{n} [ y^(i) · (1/ŷ^(i)) − (1 − y^(i)) · (1/(1 − ŷ^(i))) ] · ŷ^(i)(1 − ŷ^(i)) x_j^(i)   (calculus)

= Σ_{i=1}^{n} (y^(i) − ŷ^(i)) x_j^(i)   (simplify)

= Σ_{i=1}^{n} (y^(i) − σ(θᵀx^(i))) x_j^(i)
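The derived gradient Σ_i (y^(i) − σ(θᵀx^(i))) x_j^(i) can be verified against a finite-difference derivative of LL itself. A sketch with made-up data (all names are mine):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(theta, xs, ys):
    # LL(theta) = sum_i [ y_i log(y_hat_i) + (1 - y_i) log(1 - y_hat_i) ]
    ll = 0.0
    for x, y in zip(xs, ys):
        y_hat = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        ll += y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat)
    return ll

def gradient_j(theta, xs, ys, j):
    # the derived formula: sum_i (y_i - sigma(theta^T x_i)) * x_ij
    return sum((y - sigmoid(sum(t * xj for t, xj in zip(theta, x)))) * x[j]
               for x, y in zip(xs, ys))
```

Nudging one θ_j by a small h and re-evaluating LL should reproduce `gradient_j` to within the finite-difference error.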
Wow. We did it!
79
CS109 Wrap-up
80
LIVE
What have we learned in CS109?
81
A wild journey
82
Computer science Probability
From combinatorics to probabilityβ¦
83
P(E) + P(E^C) = 1
β¦to random variables and the Central Limit Theoremβ¦
84
Bernoulli
Poisson
Gaussian
β¦to statistics, parameter estimation, and machine learning
85
[Images: a happy Bhutanese person; maximizing LL(θ); a Bayes net relating Flu, Undergrad, Tired, and Fever]
Lots and lots of analysis
86
Climate
sensitivity
Bloom filters
Peer
grading
Biometric keystroke
recognition
Web MD
inference
Coursera A/B testing
Lots and lots of analysis
87
Heart
Ancestry Netflix
After CS109
88
Statistics
Stats 200 β Statistical Inference
Stats 208 β Intro to the Bootstrap
Stats 209 β Group Methods/Causal Inference
Theory
CS161 β Algorithmic analysis
CS168 - ~Modern~ Algorithmic Analysis
Stats 217 β Stochastic Processes
CS238 β Decision Making Under Uncertainty
CS228 β Probabilistic Graphical Models
After CS109
89
AI
CS 221 β Intro to AI
CS 229 β Machine Learning
CS 230 β Deep Learning
CS 224N β Natural Language Processing
CS 231N β Conv Neural Nets for Visual Recognition
CS 234 β Reinforcement Learning
Applications
CS 279 β Bio Computation
Literally any class with numbers in it
What do you want to remember in 5 years?
90
Why study probability + CS?
91
Why study probability + CS?
92
Source: US Bureau of Labor Statistics
Why study probability + CS?
93
Interdisciplinary Closest thing to magic
Why study probability + CS?
94
Everyone is welcome!
Technology magnifies.
What do we want magnified?
95
You are all one step closer to improving the world.
(all of you!)
96
The end
97