UVA CS 6316: Machine Learning
Lecture 6: Linear Regression Model with Regularizations
Dr. Yanjun Qi
University of Virginia Department of Computer Science
Last: Multivariate Linear Regression with basis Expansion
• Task (y): Regression, y continuous
• Representation (x, f()): Y = Weighted linear sum of (X basis expansion)
• Score Function (L()): Sum of Squared Error (Least Squares)
• Search/Optimization (argmin()): Normal Equation / GD / SGD
• Models, Parameters: Regression coefficients
$$\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \varphi(x)^T \theta$$
9/18/19 Dr. Yanjun Qi / UVA CS 2
Today: Regularized multivariate linear regression
• Task: Regression
• Representation: Y = Weighted linear sum of X’s
• Score Function: Least-squares + Regularization
• Search/Optimization: Linear algebra for Ridge / sub-gradient descent (sub-GD) for Lasso & Elastic net
• Models, Parameters: Regression coefficients (regularized weights)
$$\min_\beta J(\beta) = \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 + \lambda\left(\sum_{j=1}^{p} |\beta_j|^q\right)^{1/q}$$
We aim to make the learned model:
• 1. Generalize well
• 2. Be computationally scalable and efficient
• 3. Be robust / trustworthy / interpretable
  – Especially for some domains, this is about trust!
Today
• Linear Regression Model with Regularizations
  – Review: (Ordinary) Least squares: squared loss (Normal Equation)
  – Ridge regression: squared loss with L2 regularization
  – Lasso regression: squared loss with L1 regularization
  – Elastic net regression: squared loss with L1 AND L2 regularization
  – How to choose the regularization parameter
SUPERVISED Regression
• When the target Y is a continuous variable
• The training dataset consists of input-output pairs (x_i, y_i)
[Figure: a model f learned from the training pairs predicts f(x?) for a new input x?]
Review: Normal equation for LR
• Write the cost function in matrix form, where each row of X is one sample:
$$X = \begin{bmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{bmatrix}, \qquad \vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
$$J(\beta) = \frac{1}{2}\sum_{i=1}^{n}\left(x_i^T\beta - y_i\right)^2 = \frac{1}{2}\left(X\beta - \vec{y}\right)^T\left(X\beta - \vec{y}\right) = \frac{1}{2}\left(\beta^T X^T X\beta - \beta^T X^T\vec{y} - \vec{y}^T X\beta + \vec{y}^T\vec{y}\right)$$
• To minimize J(β), take the derivative and set it to zero:
$$X^T X\beta = X^T\vec{y} \qquad \text{(the normal equations)}$$
$$\beta^* = \left(X^T X\right)^{-1} X^T\vec{y}$$
• Assume that X^T X is invertible.
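A minimal NumPy sketch of the normal equations on small synthetic data (the data, the seed, and the helper name `ols_normal_equation` are illustrative assumptions, not from the lecture). It solves X^T X β = X^T y as a linear system rather than forming the inverse, and cross-checks the result against NumPy's least-squares solver:

```python
import numpy as np

def ols_normal_equation(X, y):
    """Solve the normal equations X^T X beta = X^T y (assumes X^T X is invertible)."""
    XtX = X.T @ X
    Xty = X.T @ y
    # Solving the linear system is cheaper and more stable than computing (X^T X)^{-1}.
    return np.linalg.solve(XtX, Xty)

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))                  # rows are x_i^T
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=n)

beta_star = ols_normal_equation(X, y)
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_star)                             # close to the true coefficients
print(np.allclose(beta_star, beta_lstsq))    # both solvers agree
```

Using `np.linalg.solve` instead of `np.linalg.inv` is the usual design choice, but it still fails when X^T X is singular, which is exactly the case discussed next.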
Comments on the normal equation
• What if X has less than full column rank? → X^T X is not invertible, and the least-squares solution is not unique.
Page 11 of Handout L2
Review: Vector norms
A norm of a vector ||x|| is informally a measure of the “length” of the vector.
– Common norms: L1, L2 (Euclidean)
– L∞ (L-infinity)
$$\|x\|_p = \left(\sum_{i} |x_i|^p\right)^{1/p}, \qquad \|x\|_1 = \sum_i |x_i|, \qquad \|x\|_\infty = \max_i |x_i|$$
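A short NumPy illustration of the three common norms on one example vector (the vector and the helper `p_norm` are illustrative choices). `np.linalg.norm` computes each norm directly through its `ord` argument:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

l1 = np.linalg.norm(x, ord=1)          # sum of absolute values: 3 + 4 + 1 = 8
l2 = np.linalg.norm(x)                 # Euclidean length: sqrt(9 + 16 + 1) = sqrt(26)
linf = np.linalg.norm(x, ord=np.inf)   # largest absolute entry: 4

def p_norm(v, p):
    """General L_p norm (sum_i |v_i|^p)^(1/p); a true norm for p >= 1."""
    return np.sum(np.abs(v) ** p) ** (1.0 / p)

print(l1, l2, linf)
print(p_norm(x, 1), p_norm(x, 2))      # agree with l1 and l2
```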
Review: Vector Norm (L2, when p=2)
[Figure: Norms: Lasso (L1) vs. Quadratic (L2)]
Ridge Regression / L2 Regularization
• If X^T X is not invertible, a classical solution is to add a small positive element λ to its diagonal, turning
$$\beta^* = \left(X^T X\right)^{-1} X^T\vec{y}$$
into
$$\beta^* = \left(X^T X + \lambda I\right)^{-1} X^T\vec{y}$$
• By convention, the bias/intercept term is typically not regularized. Here we assume the data have been centered, so there is no bias term.
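A minimal sketch of the ridge closed form on a deliberately rank-deficient design (the data and the helper name `ridge_closed_form` are illustrative assumptions). The third column duplicates the first, so X^T X is singular, yet X^T X + λI can still be solved for any λ > 0:

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Ridge estimate beta* = (X^T X + lam * I)^{-1} X^T y.

    Assumes X and y are centered, so there is no (unpenalized) intercept.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
x1 = rng.normal(size=20)
# Column 3 duplicates column 1, so X has rank 2 < p = 3 and X^T X is singular.
X = np.column_stack([x1, rng.normal(size=20), x1])
X -= X.mean(axis=0)
y = rng.normal(size=20)
y -= y.mean()

print(np.linalg.matrix_rank(X))            # 2: less than full column rank
beta = ridge_closed_form(X, y, lam=0.1)    # still well defined for lam > 0
print(beta)
```

Because the two duplicated columns are interchangeable, ridge splits their weight evenly (beta[0] equals beta[2]) instead of failing.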
Extra: Positive Definite Matrix
One important property of positive definite matrices is that
→ they are always full rank, and hence invertible.
→ Extra: see the proof on pages 17-18 of the Linear-Algebra Handout.
Extra: Positive Definite Matrix
$$\beta^* = \left(X^T X + \lambda I\right)^{-1} X^T\vec{y}$$
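A short worked step connecting positive definiteness to the ridge solution (a standard argument, not taken from the handout): for any nonzero vector v and any λ > 0,

```latex
\begin{aligned}
v^T\left(X^T X + \lambda I\right)v &= (Xv)^T(Xv) + \lambda\, v^T v \\
&= \|Xv\|_2^2 + \lambda \|v\|_2^2 \;>\; 0 \qquad \text{for every } v \neq 0 \text{ when } \lambda > 0
\end{aligned}
```

So X^T X + λI is positive definite, hence full rank and invertible, even when X itself has less than full column rank.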
Ridge Regression / Squared Loss+L2
• $\beta^* = \left(X^T X + \lambda I\right)^{-1} X^T\vec{y}$ is the solution of
$$\hat{\beta}^{ridge} = \arg\min_\beta \; (\vec{y} - X\beta)^T(\vec{y} - X\beta) + \lambda\beta^T\beta$$
• To minimize, take the derivative and set it to zero (HW2).
• By convention, the bias/intercept term is typically not regularized. Here we assume the data have been centered, so there is no bias term.
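The derivative step can be sketched as follows (a standard calculation, written with the same symbols as the ridge objective):

```latex
\begin{aligned}
J(\beta) &= (\vec{y} - X\beta)^T(\vec{y} - X\beta) + \lambda\,\beta^T\beta \\
\nabla_\beta J(\beta) &= -2X^T(\vec{y} - X\beta) + 2\lambda\beta = 0 \\
&\Rightarrow \left(X^T X + \lambda I\right)\beta = X^T\vec{y} \\
&\Rightarrow \beta^* = \left(X^T X + \lambda I\right)^{-1} X^T\vec{y}
\end{aligned}
```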
• Equivalently,
$$\hat{\beta}^{ridge} = \arg\min_\beta \; (\vec{y} - X\beta)^T(\vec{y} - X\beta) \quad \text{subject to} \quad \sum_{j=1}^{p} \beta_j^2 \le s^2$$
Review
[Figure: surface map and contour map of an objective function]
Objective Function’s Contour lines from Ridge Regression
[Figure: contour lines of the least-squares objective around the OLS (least-squares) solution, the L2 constraint region of radius s, and the ridge regression solution]
Least Square+L2: Ridge solution
[Figure: axes β1 and β2, showing the least-squares solution, the constraint radius s, and the ridge regression solution]
Parameter Shrinkage
$$\beta^{OLS} = \left(X^T X\right)^{-1} X^T\vec{y}$$
$$\beta^{Rg} = \left(X^T X + \lambda I\right)^{-1} X^T\vec{y}$$
See page 65 of the ESL book @ http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
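The ESL discussion around that page explains the shrinkage through the singular value decomposition X = U D V^T: ridge scales the component of y along each left singular vector u_j by d_j^2/(d_j^2 + λ), while OLS uses a factor of 1. A NumPy sketch that checks this identity on random data (assumes n ≥ p and full column rank; data and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, lam = 30, 4, 5.0
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# Thin SVD: X = U @ diag(d) @ Vt with U (n x p), d (p,), Vt (p x p).
U, d, Vt = np.linalg.svd(X, full_matrices=False)

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)
beta_rg = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Ridge fitted values = sum_j u_j * d_j^2 / (d_j^2 + lam) * (u_j^T y)
shrink = d**2 / (d**2 + lam)               # every factor lies strictly in (0, 1)
fitted_svd = U @ (shrink * (U.T @ y))

print(shrink)
print(np.allclose(X @ beta_rg, fitted_svd))        # SVD form matches the closed form
print(np.allclose(X @ beta_ols, U @ (U.T @ y)))    # OLS: all factors equal 1
print(np.linalg.norm(beta_rg) < np.linalg.norm(beta_ols))  # ridge coefficients are shorter
```

Directions with small singular values d_j (low-variance directions of the inputs) are shrunk the most.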
Extra: two forms of Ridge Regression
• The penalized form and the constrained form are equivalent: each λ ≥ 0 corresponds to a constraint size s for which both give the same solution.
http://stats.stackexchange.com/questions/190993/how-to-find-regression-coefficients-beta-in-ridge-regression
Ridge Regression: Squared Loss+L2
• λ > 0 penalizes each β_j
• if λ = 0, we get the least squares estimator;
• if λ → ∞, then β_j → 0
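A small NumPy sketch of the three bullets above on synthetic centered data (the data, seed, and λ grid are illustrative assumptions): λ = 0 reproduces least squares, and the coefficient norm shrinks toward 0 as λ grows.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 40, 3
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                  # centered data, no intercept
y = X @ np.array([3.0, -2.0, 1.0]) + rng.normal(size=n)
y -= y.mean()

def ridge(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam * I)^{-1} X^T y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]
for lam in [0.0, 1.0, 10.0, 100.0, 1e6]:
    b = ridge(X, y, lam)
    print(f"lambda={lam:>9g}  beta={np.round(b, 3)}  ||beta||={np.linalg.norm(b):.3f}")
# lambda = 0 reproduces least squares; as lambda grows, ||beta|| shrinks toward 0.
```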
Influence of Regularization Parameter
[Figure: axes β1 and β2, showing the least-squares solution and the ridge regression solution]
[Figure: λ → ∞; the ridge solution is pulled toward the origin]
[Figure: λ → 0; the ridge solution approaches the least-squares solution]
(2) Lasso (least absolute shrinkage and selection operator) / Squared Loss+L1
• The lasso is a shrinkage method like ridge, but unlike ridge its estimate is a nonlinear function of the outcome y, with no closed-form solution in general.
• The lasso is defined by
$$\hat{\beta}^{lasso} = \arg\min_\beta \; (\vec{y} - X\beta)^T(\vec{y} - X\beta) \quad \text{subject to} \quad \sum_{j=1}^{p} |\beta_j| \le s, \qquad \text{where } (\vec{y} - X\beta)^T(\vec{y} - X\beta) = \sum_{i=1}^{n}\left(y_i - x_i^T\beta\right)^2$$
• By convention, the bias/intercept term is typically not regularized. Here we assume the data have been centered, so there is no bias term.
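Because the lasso has no closed form, it is solved iteratively. The slides list sub-gradient descent; the sketch below instead uses cyclic coordinate descent with soft-thresholding, a common alternative, applied to the penalized (Lagrangian) form Σ(y_i − x_i^Tβ)^2 + λΣ|β_j| rather than the constrained form above. The function names, data, and λ value are hypothetical choices for illustration:

```python
import numpy as np

def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iters=500, tol=1e-10):
    """Minimize sum_i (y_i - x_i^T beta)^2 + lam * sum_j |beta_j| by coordinate descent.

    Assumes X and y are centered (no intercept term).
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X**2, axis=0)          # ||x_j||^2 for each column
    r = np.array(y, dtype=float)           # residual y - X beta (beta starts at 0)
    for _ in range(n_iters):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                   # an all-zero column stays at 0
            # Correlation of x_j with the partial residual that excludes feature j.
            rho = X[:, j] @ r + col_sq[j] * beta[j]
            # Exact minimizer over beta_j; the threshold is lam / 2 because the
            # squared-error term has no 1/2 factor in this objective.
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            r -= X[:, j] * (new_bj - beta[j])   # keep the residual consistent
            max_step = max(max_step, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_step < tol:
            break
    return beta

rng = np.random.default_rng(6)
n, p = 100, 10
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)
true_beta = np.zeros(p)
true_beta[:3] = [4.0, -3.0, 2.0]           # only 3 of 10 features matter
y = X @ true_beta + 0.5 * rng.normal(size=n)
y -= y.mean()

beta_hat = lasso_cd(X, y, lam=50.0)
print(np.round(beta_hat, 2))               # irrelevant features are typically exactly 0
```

Each coordinate update is an exact one-dimensional minimization, which is why coefficients can land exactly on 0 rather than merely close to it.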
Lasso (least absolute shrinkage and selection operator)
• Suppose we are in 2 dimensions, β = (β1, β2). The constraint boundary |β1| + |β2| = s is a diamond made of four line segments, one per sign pattern:
  – β1 + β2 = s, β1 − β2 = s, −β1 + β2 = s, −β1 − β2 = s
[Figure: least-squares solution, lasso solution, and the constraint region of size s]
• In the figure, the lasso solution lies on a corner of the diamond where β2 = 0, eliminating the role of x2 and leading to sparsity.
[Figure: Ridge Regression vs. Lasso Estimator, each with constraint size s]
Lasso (least absolute shrinkage and selection operator)
• Notice that the ridge penalty $\sum_j \beta_j^2$ is replaced by $\sum_j |\beta_j|$.
• Due to the nature of the constraint, if the tuning parameter s is chosen small enough, the lasso will set some coefficients exactly to zero.
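The contrast is easiest to see for an orthonormal design (X^T X = I), where both estimators have closed forms in terms of the least-squares estimate β̂ = X^T y. With the penalized objectives Σ(y_i − x_i^Tβ)^2 + λΣβ_j^2 and Σ(y_i − x_i^Tβ)^2 + λΣ|β_j|, ridge gives β̂_j/(1 + λ) and the lasso gives sign(β̂_j)(|β̂_j| − λ/2)_+. A NumPy sketch under that orthonormality assumption (data and λ are illustrative):

```python
import numpy as np

def soft_threshold(z, t):
    """sign(z) * max(|z| - t, 0), applied elementwise."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

rng = np.random.default_rng(4)
n, p, lam = 50, 6, 1.0
X, _ = np.linalg.qr(rng.normal(size=(n, p)))    # orthonormal columns: X^T X = I
y = X @ np.array([3.0, -2.0, 0.3, 0.0, 0.0, 0.1]) + 0.05 * rng.normal(size=n)

beta_ls = X.T @ y                               # least squares when X^T X = I
beta_ridge = beta_ls / (1.0 + lam)              # ridge: proportional shrinkage
beta_lasso = soft_threshold(beta_ls, lam / 2)   # lasso: shift toward 0, clip at 0

print(np.round(beta_ls, 3))
print(np.round(beta_ridge, 3))    # every coefficient stays nonzero
print(np.round(beta_lasso, 3))    # small coefficients become exactly 0
```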
Lasso: Implicit Feature Selection
[Figure: of the p columns of the n × p data matrix X, the lasso keeps only a subset of p′ features]
e.g., Leukemia Diagnosis
Golub et al., Science, Vol. 286, 15 Oct. 1999
[Figure: n samples with labels {y_i} ∈ {−1, +1} and p′ selected features]
Lasso for when p > n
• Prediction accuracy and model interpretation are two important aspects of regression models.
• LASSO does shrinkage and variable selection simultaneously for better prediction and model interpretation.
Disadvantages:
– In the p > n case, the lasso selects at most n variables before it saturates.
– If there is a group of variables among which the pairwise correlations are very high, the lasso tends to select only one variable from the group.
(3) Hybrid of Ridge and Lasso: Elastic Net regularization
• The L1 part of the penalty generates a sparse model.
• The L2 part of the penalty (extra):
  – Removes the limitation on the number of selected variables
  – Encourages a grouping effect
  – Stabilizes the L1 regularization path
Naïve elastic net
• For any fixed non-negative λ1 and λ2, the naive elastic net criterion is
$$L(\lambda_1, \lambda_2, \beta) = \|\vec{y} - X\beta\|_2^2 + \lambda_2\|\beta\|_2^2 + \lambda_1\|\beta\|_1, \qquad \|\beta\|_2^2 = \sum_{j=1}^{p}\beta_j^2, \quad \|\beta\|_1 = \sum_{j=1}^{p}|\beta_j|$$
• The naive elastic net estimator is the minimizer of the above criterion:
$$\hat{\beta} = \arg\min_\beta L(\lambda_1, \lambda_2, \beta)$$
• Equivalently, with $\alpha = \lambda_2 / (\lambda_1 + \lambda_2)$:
$$\hat{\beta} = \arg\min_\beta \|\vec{y} - X\beta\|_2^2 \quad \text{subject to} \quad (1 - \alpha)\|\beta\|_1 + \alpha\|\beta\|_2^2 \le t \ \text{ for some } t$$
Geometry of elastic net
e.g., A Practical Application of a Regression Model
Proceedings of HLT ’10: Human Language Technologies
e.g., Movie Reviews and Revenues: An Experiment in Text Regression, Proceedings of HLT '10 (n ≈ 1.7k samples / > 3k features)
• Features: e.g., counts of an n-gram in the text
A REAL APPLICATION: Movie Reviews and Metadata to Revenues
• The feature weights can be directly interpreted as U.S. dollars contributed to the predicted value ŷ by each occurrence of the feature.
Movie Reviews and Revenues: An Experiment in Text Regression, Proceedings of HLT '10 Human Language Technologies
• Use linear regression to directly predict the opening weekend gross earnings, denoted y, based on features x extracted from the movie metadata and/or the text of the reviews.
An example of how real applications use the elastic net and its weights!
• Here, the features are from the text-only model annotated in Table 2.
• Sentiment-related text features are not as prominent as might be expected, and their overall proportion in the set of features with non-zero weights is quite small (estimated in preliminary trials at less than 15%). Phrases that refer to metadata are the more highly weighted and frequent ones.
• A combination of the meta and text features achieves the best performance in terms of both MAE (mean absolute error) and Pearson r.
More Ways for Measuring Regression Predictions: Correlation Coefficient
• Pearson correlation coefficient
  – Measures the linear correlation between two sequences, x and y,
  – giving a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no correlation, and −1 is total negative correlation.
$$r(x, y) = \frac{\sum_{i=1}^{m}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{m}(x_i - \bar{x})^2 \times \sum_{i=1}^{m}(y_i - \bar{y})^2}}, \qquad \text{where } \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i \text{ and } \bar{y} = \frac{1}{m}\sum_{i=1}^{m} y_i, \qquad |r(x, y)| \le 1$$
• For regression: $r(\vec{y}_{predicted}, \vec{y}_{known})$
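A direct NumPy translation of the formula above (the function name `pearson_r` and the example values are hypothetical), cross-checked against `np.corrcoef`:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()                      # x_i - x_bar
    yc = y - y.mean()                      # y_i - y_bar
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

y_known = np.array([3.0, 5.0, 2.5, 7.0, 4.0])
y_predicted = np.array([2.8, 5.3, 2.0, 6.5, 4.4])   # hypothetical predictions
r = pearson_r(y_predicted, y_known)
print(r)
print(np.isclose(r, np.corrcoef(y_predicted, y_known)[0, 1]))  # matches NumPy
```

Note that r only measures linear agreement: predictions that are consistently off by a constant or a scale factor can still reach r = 1, which is why it is reported alongside an error measure such as MAE.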
Advantage of Elastic net (Extra)
• The naive elastic net can be converted to a lasso problem on an augmented data set (X*, y*).
• In the augmented formulation:
  – the sample size is n + p and X* has rank p
  – → it can potentially select all p predictors.
• The naive elastic net can perform automatic variable selection like the lasso.
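A NumPy check of the augmentation, following the construction in Zou and Hastie (2005): X* = (1 + λ2)^(−1/2) [X; √λ2 I], y* = [y; 0], γ = λ1/√(1 + λ2), β* = √(1 + λ2) β. The sketch (helper names and data are hypothetical) confirms that the naive elastic net criterion equals the lasso criterion on the augmented data for an arbitrary β, and that X* has rank p even when p > n:

```python
import numpy as np

def naive_enet_objective(X, y, beta, lam1, lam2):
    """||y - X beta||^2 + lam2 * ||beta||_2^2 + lam1 * ||beta||_1."""
    r = y - X @ beta
    return r @ r + lam2 * beta @ beta + lam1 * np.abs(beta).sum()

def lasso_objective(X, y, beta, gamma):
    """||y - X beta||^2 + gamma * ||beta||_1."""
    r = y - X @ beta
    return r @ r + gamma * np.abs(beta).sum()

def augment(X, y, lam2):
    """Augmented data (X*, y*) with n + p rows (Zou & Hastie 2005 construction)."""
    n, p = X.shape
    X_star = np.vstack([X, np.sqrt(lam2) * np.eye(p)]) / np.sqrt(1.0 + lam2)
    y_star = np.concatenate([y, np.zeros(p)])
    return X_star, y_star

rng = np.random.default_rng(5)
n, p, lam1, lam2 = 5, 8, 0.7, 2.0          # p > n on purpose
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
X_star, y_star = augment(X, y, lam2)
gamma = lam1 / np.sqrt(1.0 + lam2)

print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(X_star))   # rank n vs. rank p

beta = rng.normal(size=p)                  # any coefficient vector
beta_star = np.sqrt(1.0 + lam2) * beta     # rescaled coefficients
print(np.isclose(naive_enet_objective(X, y, beta, lam1, lam2),
                 lasso_objective(X_star, y_star, beta_star, gamma)))
```

Since the two criteria agree for every β, a lasso solver run on (X*, y*) with penalty γ yields the naive elastic net solution after dividing by √(1 + λ2).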
Summary: Regularized multivariate linear regression
• Model: $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \dots + \hat{\beta}_p x_p$
• LR estimation: $\arg\min \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2$
• Ridge regression estimation: $\arg\min \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2$
• LASSO estimation: $\arg\min \sum_{i=1}^{n}\left(Y_i - \hat{Y}_i\right)^2 + \lambda\sum_{j=1}^{p}|\beta_j|$
More: A family of shrinkage estimators
$$\hat{\beta} = \arg\min_\beta \sum_{i=1}^{N}\left(y_i - x_i^T\beta\right)^2 \quad \text{subject to} \quad \sum_j |\beta_j|^q \le s$$
• For q ≥ 0, contours of constant value of $\sum_j |\beta_j|^q$ are shown for the case of two inputs.
Norms visualized
• All of these penalties penalize larger weights.
• q < 2 tends to favor sparse solutions; for q ≤ 1 (e.g., the lasso) many weights become exactly 0.
• q > 2 tends to push toward similar weights.
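A tiny worked comparison of the penalty Σ_j |β_j|^q for two weight vectors with the same Euclidean length, one sparse and one evenly spread (the vectors and the helper `lq_penalty` are illustrative choices):

```python
import numpy as np

def lq_penalty(beta, q):
    """sum_j |beta_j|^q, the quantity bounded by s in the constraint above."""
    return np.sum(np.abs(beta) ** q)

sparse = np.array([1.0, 0.0])                  # all weight on one feature
spread = np.array([1.0, 1.0]) / np.sqrt(2.0)   # same L2 length, split evenly

for q in [0.5, 1.0, 2.0, 4.0]:
    print(f"q={q}: sparse={lq_penalty(sparse, q):.3f}  spread={lq_penalty(spread, q):.3f}")
# q < 2: the sparse vector is cheaper; q = 2: tie; q > 2: the spread vector is cheaper.
```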
![Page 61: UVA CS 6316: Machine Learning Lecture 6: Linear Regression ......•Prediction accuracy and model interpretation are two important aspects of regression models. •LASSO does shrinkage](https://reader034.fdocuments.net/reader034/viewer/2022042414/5f2e0fdca71bd81cb4370262/html5/thumbnails/61.jpg)
We aim for the learned model to:
• 1. Generalize well
• 2. Be computationally scalable and efficient
• 3. Be robust / trustworthy / interpretable
  • Especially in some domains, this is about trust!
Today
• Linear Regression Model with Regularizations
  • Review: (Ordinary) least squares: squared loss (Normal Equation)
  • Ridge regression: squared loss with L2 regularization
  • Lasso regression: squared loss with L1 regularization
  • Elastic Net regression: squared loss with L1 AND L2 regularization
  • How to pick the regularization parameter
Common regularizers
L2: $\sum_j \beta_j^2$. Squaring the weights penalizes large values more
L1: $\sum_j |\beta_j|$. Summing absolute weights penalizes small values more (relative to L2)
Generally, we don't want huge weights
If weights are large, a small change in a feature can result in a large change in the prediction
We might also prefer weights of exactly 0 for features that aren't useful
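"Penalizes small values more" is relative to L2: for $|\beta_j| < 1$ we have $|\beta_j| > \beta_j^2$, and for $|\beta_j| > 1$ the order flips. The derivatives tell the same story. The L2 gradient $2\beta_j$ fades as a weight approaches 0, while the L1 (sub)gradient keeps magnitude 1. That constant push is the usual intuition for why L1 drives unhelpful weights exactly to 0. A small sketch with illustrative helper names:

```python
def l1_penalty(beta):
    """Sum of absolute weights."""
    return sum(abs(b) for b in beta)

def l2_penalty(beta):
    """Sum of squared weights."""
    return sum(b * b for b in beta)

for w in (0.1, 1.0, 10.0):
    # Penalty of a single weight w > 0 and the magnitude of its (sub)gradient
    print(f"w={w:5.1f}  L1={l1_penalty([w]):7.2f}  L2={l2_penalty([w]):7.2f}  "
          f"|dL1/dw|=1.00  |dL2/dw|={2 * w:6.2f}")
```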
Model Selection & Generalization
• Generalisation: learn a function / hypothesis from past data in order to "explain", "predict", "model" or "control" new data examples
• Underfitting: the model is too simple, so both training and test errors are large
• Overfitting: the model is too complex, so test errors are large although training errors are small
• Once it has captured the real pattern, an overly complex model goes on to learn the "noise"
Issue: Overfitting and underfitting
(Figure: three fits of the same data. Under fit: $y=\theta_0+\theta_1x$; Looks good: $y=\theta_0+\theta_1x+\theta_2x^2$; Over fit: $y=\sum_{j=0}^{5}\theta_jx^j$.)
Use K-fold cross-validation to choose the model complexity!
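The under/over-fitting picture can be reproduced with a toy experiment. The sketch assumes a quadratic ground truth $y = 1 + 2x - 3x^2$ plus Gaussian noise (σ = 0.3), with 8 evenly spaced training points and 50 test points. It fits degrees 1, 2 and 5 by ordinary least squares via the normal equation. All data and helper names are made up for illustration.

```python
import random

def solve(A, b):
    """Gaussian elimination with partial pivoting for A x = b."""
    n = len(A)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def features(x, degree):
    return [x ** j for j in range(degree + 1)]  # 1, x, x^2, ..., x^degree

def fit_poly(xs, ys, degree):
    """Ordinary least squares on polynomial features via the normal equation."""
    X = [features(x, degree) for x in xs]
    p = degree + 1
    XtX = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    Xty = [sum(r[j] * t for r, t in zip(X, ys)) for j in range(p)]
    return solve(XtX, Xty)

def mse(theta, xs, ys):
    deg = len(theta) - 1
    preds = [sum(t * f for t, f in zip(theta, features(x, deg))) for x in xs]
    return sum((y - yhat) ** 2 for y, yhat in zip(ys, preds)) / len(xs)

def true_f(x):
    return 1.0 + 2.0 * x - 3.0 * x ** 2  # assumed quadratic ground truth

rng = random.Random(0)
xs_train = [-1.0 + 2.0 * i / 7 for i in range(8)]
ys_train = [true_f(x) + rng.gauss(0.0, 0.3) for x in xs_train]
xs_test = [-0.95 + 1.9 * i / 49 for i in range(50)]
ys_test = [true_f(x) + rng.gauss(0.0, 0.3) for x in xs_test]

results = {}
for degree in (1, 2, 5):
    theta = fit_poly(xs_train, ys_train, degree)
    results[degree] = (mse(theta, xs_train, ys_train), mse(theta, xs_test, ys_test))
    print(f"degree {degree}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

Training error can only go down as the degree grows, because each model contains the previous one. The test error is what reveals underfitting (degree 1) and, typically, overfitting (degree 5). With this little data the exact test numbers depend on the random seed, which is one reason K-fold CV averages over several splits.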
Overfitting: Handled by Regularization
A regularizer is an additional term in the loss function that helps ensure we don't overfit
It's called a regularizer since it tries to keep the parameters more normal/regular
It biases the model, forcing the learning to prefer certain types of weights over others, e.g.,
$$\hat{\beta}^{ridge}=\arg\min_{\beta}\sum_{i=1}^{n}\left(y_i-x_i^T\beta\right)^2+\lambda\beta^T\beta$$
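For gradient-based training, the penalty's gradient $2\lambda\beta$ becomes a multiplicative shrink of the weights at every step: $\beta \leftarrow (1-2\eta\lambda)\beta + 2\eta X^T(y - X\beta)$. This is why L2 regularization is also called weight decay. Below is a minimal batch-GD sketch. The step size η (`lr`) and step count are assumptions chosen to converge on this toy problem, and `ridge_gd` is a hypothetical name.

```python
def ridge_gd(X, y, lam, lr=0.05, steps=2000):
    """Batch gradient descent on sum_i (y_i - x_i^T beta)^2 + lam * beta^T beta."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(steps):
        resid = [y_i - sum(x * b for x, b in zip(row, beta)) for row, y_i in zip(X, y)]
        # Gradient of the squared-error term: -2 X^T (y - X beta)
        grad = [-2.0 * sum(row[j] * r for row, r in zip(X, resid)) for j in range(p)]
        # The penalty's gradient 2*lam*beta turns into a multiplicative shrink:
        # beta <- (1 - 2*lr*lam) * beta - lr * grad   ("weight decay")
        beta = [(1.0 - 2.0 * lr * lam) * b - lr * g for b, g in zip(beta, grad)]
    return beta

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
print(ridge_gd(X, y, lam=1.0))  # close to the closed-form ridge answer [0.875, 1.375]
```

For convergence the step size must stay below $1/\lambda_{\max}(X^TX+\lambda I)$; here that bound is 0.25.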
WHY and How to Select λ?
• 1. Generalization ability → use k-fold CV to decide
• 2. Control the bias and variance of the model (details in future lectures)
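Point 1 can be sketched as follows: score each candidate λ by its K-fold cross-validation error and keep the best. To stay short, the sketch assumes a single feature and no intercept, so ridge has the closed form $\beta = \sum_i x_iy_i / (\sum_i x_i^2 + \lambda)$. The toy data (slope 2 plus noise), the λ grid, K = 5 and the interleaved, unshuffled folds are all assumptions. The helper names are illustrative.

```python
import random

def ridge_1d(xs, ys, lam):
    """Closed-form ridge for a single feature without intercept:
    beta = sum(x*y) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def kfold_cv_error(xs, ys, lam, k=5):
    """Average held-out squared error of ridge_1d over k interleaved folds."""
    n = len(xs)
    folds = [list(range(i, n, k)) for i in range(k)]  # fold i = indices i, i+k, i+2k, ...
    total = 0.0
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(n) if i not in held]
        beta = ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        total += sum((ys[i] - beta * xs[i]) ** 2 for i in held_out)
    return total / n  # every point is held out exactly once

def choose_lambda(xs, ys, grid, k=5):
    """Pick the lambda in `grid` with the smallest k-fold CV error."""
    return min(grid, key=lambda lam: kfold_cv_error(xs, ys, lam, k))

rng = random.Random(1)
xs = [rng.uniform(-1.0, 1.0) for _ in range(40)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]  # assumed toy data: slope 2 plus noise
grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best = choose_lambda(xs, ys, grid)
print("chosen lambda:", best)
```

Interleaved folds are only safe because the toy samples are i.i.d.; real data with any ordering should be shuffled before splitting.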
Regularization path of a Ridge Regression
(Figure: ridge coefficient paths as λ goes from ∞ to 0.)
Regularization path of a Ridge Regression
(Figure: ridge coefficient paths for an example with 8 features, as λ goes from ∞ to 0; annotated "Weight Decay".)
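The path itself is easy to compute: solve the ridge normal equation for a grid of λ values and watch the coefficients shrink. This is a two-feature toy sketch, not the 8-feature example in the figure. It uses Cramer's rule for the 2×2 system, and `ridge_2d` and the data are illustrative. The norm $\lVert\hat\beta\rVert_2$ decreases as λ grows, yet no coefficient becomes exactly 0 at any finite λ.

```python
import math

def ridge_2d(X, y, lam):
    """Closed-form ridge for two features (no intercept), solving
    (X^T X + lam*I) beta = X^T y with Cramer's rule."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    u = sum(r[0] * t for r, t in zip(X, y))
    v = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return [(u * d - b * v) / det, (a * v - b * u) / det]

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]
path = [ridge_2d(X, y, lam) for lam in lambdas]
for lam, beta in zip(lambdas, path):
    norm = math.hypot(*beta)
    print(f"lambda={lam:7.1f}  beta=({beta[0]:.4f}, {beta[1]:.4f})  ||beta||={norm:.4f}")
```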
Regularization path of a Lasso Regression
(Figure: lasso coefficient paths for an example with 8 features, showing how each βj varies as λ goes from ∞ to 0.)
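Unlike ridge, the lasso path hits exact zeros. The lecture optimizes the lasso with sub-gradient descent. The sketch below instead swaps in cyclic coordinate descent with soft-thresholding, a different and widely used solver whose updates produce exact zeros. It assumes the objective $\sum_i (y_i - x_i^T\beta)^2 + \lambda\sum_j|\beta_j|$, no intercept and no all-zero feature columns; helper names and data are illustrative. For coordinate j, let $\rho_j = \sum_i x_{ij}(y_i - \sum_{k\ne j} x_{ik}\beta_k)$ and $z_j = \sum_i x_{ij}^2$. The exact one-coordinate minimizer is then $\beta_j = S(\rho_j, \lambda/2)/z_j$, where $S(\rho,t) = \mathrm{sign}(\rho)\max(|\rho|-t, 0)$.

```python
def soft_threshold(rho, t):
    """S(rho, t) = sign(rho) * max(|rho| - t, 0)."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0

def lasso_cd(X, y, lam, sweeps=500):
    """Cyclic coordinate descent for sum_i (y_i - x_i^T beta)^2 + lam * sum_j |beta_j|."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    z = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]  # assumed non-zero
    for _ in range(sweeps):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            beta[j] = soft_threshold(rho, lam / 2.0) / z[j]
    return beta

# Orthogonal toy design, so a single sweep already gives the exact solution
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.5, 3.0, 0.5]
for lam in (0.0, 1.0, 2.0, 12.0):
    print(lam, lasso_cd(X, y, lam))
# lam=0 -> [3.0, 0.5]; lam=1 -> [2.75, 0.25]; lam=2 -> [2.5, 0.0]; lam=12 -> [0.0, 0.0]
```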
An example of Ridge Regression
(Figure: ridge coefficients for an example with 8 features, showing how each βj varies as λ increases from 0 toward ∞.)
Choose λ that generalizes well!
(Figure: coefficient paths for an example with 8 features, showing how each βj varies as λ goes from ∞ to 0.)
Choose λ that generalizes well!
Today Recap
• Linear Regression Model with Regularizations
  • Review: (Ordinary) least squares: squared loss (Normal Equation)
  • Ridge regression: squared loss with L2 regularization
  • Lasso regression: squared loss with L1 regularization
  • Elastic Net regression: squared loss with L1 AND L2 regularization
  • Influence of the regularization parameter
Regression (supervised)
• Four ways to train / perform optimization for linear regression models
  • Normal Equation
  • Gradient Descent (GD)
  • Stochastic GD
  • Newton's method
• Supervised regression models
  • Linear regression (LR)
  • LR with non-linear basis functions
  • Locally weighted LR
  • LR with Regularizations
Extra More
• Optimization of regularized regressions: see the L6-extra slides
• Relation between λ and s: see the L6-extra slides
• Why Elastic Net has a few nice properties: see the L6-extra slides
References
• Big thanks to Prof. Eric Xing @ CMU for allowing me to reuse some of his slides
• Prof. Nando de Freitas's tutorial slides
• Regularization and variable selection via the elastic net, Hui Zou and Trevor Hastie, Stanford University, USA
• ESL book: Elements of Statistical Learning