Post on 26-Oct-2021
Day 6 - Linear Regression and Logistic Regression

Agenda:
- Linear Regression
  - Examples
  - Issues to pay attention to with linear regression
- Classification and Logistic Regression
  - Training classifiers
  - Evaluating classifiers
- More thoughts on the Square Capital example and whether to approach the problem as regression or classification
CS 6140: Machine Learning — Fall 2021 — Paul Hand
HW 2 — Due: Wednesday September 29, 2021 at 2:30 PM Eastern time via Gradescope.
Names: [Put Your Name(s) Here]
You can submit this homework either by yourself or in a group of 2. You may consult any and all resources. You may submit your answers to this homework by directly editing this tex file (available on the course website) or by submitting a PDF of a Jupyter or Colab notebook. When you upload your solutions to Gradescope, make sure to tag each problem with the correct page.
Question 1. In this problem, you will fit polynomials to one-dimensional data using linear regression.
(a) Generate training data (x_i, y_i) for i = 1, …, 8 by x_i ~ Uniform([0, 1]) and y_i = f(x_i) + ε_i, where f(x) = 1 + 2x − 2x^2, ε_i ~ N(0, σ^2), and σ = 0.1. Plot the training data and the function f.
Response:
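As a sketch of how the data generation in part (a) could be set up (using numpy; the seed is an arbitrary choice of mine, and plotting is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility (arbitrary choice)

def f(x):
    # target function f(x) = 1 + 2x - 2x^2
    return 1 + 2 * x - 2 * x ** 2

n, sigma = 8, 0.1
x = rng.uniform(0, 1, size=n)              # x_i ~ Uniform([0, 1])
y = f(x) + rng.normal(0, sigma, size=n)    # y_i = f(x_i) + eps_i, eps_i ~ N(0, sigma^2)
```

The training data (x, y) and the curve f can then be plotted with matplotlib.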
(b) In this problem, you will find the best fit degree-d polynomial for the above data for each d between 0 and 7. Find it with least squares linear regression by minimizing the training mean squared error (MSE)

    min_θ (1/2) Σ_{i=1}^{8} ( y_i − Σ_{k=0}^{d} θ_k x_i^k )^2    (1)

using the Normal Equations. Use numpy.linalg.solve to solve the Normal Equations instead of computing a matrix inverse. On 8 separate plots, plot the data and the best fit degree-d polynomial.
Response:
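One possible shape of the normal-equations fit for a single degree d (a sketch; np.vander builds the design matrix with columns 1, x, …, x^d, and the helper name is my own):

```python
import numpy as np

def fit_poly(x, y, d):
    """Least-squares degree-d polynomial fit via the Normal Equations.

    Solves (A^T A) theta = A^T y with numpy.linalg.solve instead of
    forming a matrix inverse.
    """
    A = np.vander(x, d + 1, increasing=True)  # rows (1, x_i, ..., x_i^d)
    return np.linalg.solve(A.T @ A, A.T @ y)

# Sanity check: an exact quadratic is recovered.
x = np.linspace(0, 1, 8)
y = 1 + 2 * x - 2 * x ** 2
theta = fit_poly(x, y, d=2)  # close to [1, 2, -2]
```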
(c) Plot the MSE with respect to the training data (training MSE) as a function of d. Which value of d provided the lowest training MSE?
Response:
(d) Generate a test set of 1000 data points sampled according to the same process as in part (a). Plot the MSE with respect to the test data (test MSE) as a function of d. Which value of d provided the lowest test MSE?
Response:
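Parts (c) and (d) can be sketched as follows (a self-contained numpy sketch; seed and names are my own choices). Training MSE is non-increasing in d because the models are nested, while test MSE typically rises again once the polynomial starts fitting noise:

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: 1 + 2 * x - 2 * x ** 2
sigma = 0.1

def sample(n):
    x = rng.uniform(0, 1, n)
    return x, f(x) + rng.normal(0, sigma, n)

def mse(x, y, theta):
    A = np.vander(x, len(theta), increasing=True)
    return np.mean((y - A @ theta) ** 2)

x_tr, y_tr = sample(8)       # training set, as in part (a)
x_te, y_te = sample(1000)    # test set, as in part (d)

train_mse, test_mse = [], []
for d in range(8):
    A = np.vander(x_tr, d + 1, increasing=True)
    theta = np.linalg.solve(A.T @ A, A.T @ y_tr)  # Normal Equations
    train_mse.append(mse(x_tr, y_tr, theta))
    test_mse.append(mse(x_te, y_te, theta))
```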
Question 2. Linear regression using gradient descent and TensorFlow
[Diagram: machine learning workflow: train (learn) a model on data, evaluate it, then use it.]
Least Squares Formulation for Linear Regression (for a general model)
Given data D = {(x_i, y_i)}, i = 1, …, n, with x_i ∈ R^k and y_i ∈ R.

Model: y ≈ Xθ, i.e., y = Xθ + noise.

Find
    min_θ ||y − Xθ||^2    (minimizes the sum of squares),
where y = (y_1, …, y_n) ∈ R^n and X ∈ R^{n×k} has rows x_i^T.

The solution is given by the Normal Equations:
    X^T X θ = X^T y,  so  θ̂ = (X^T X)^{−1} X^T y.
Examples of setting up and solving linear regression
Find best fit cubic through 1d data

Data: {(x_i, y_i)}, i = 1, …, n, with x_i, y_i ∈ R.
Model: y = θ_0 + θ_1 x + θ_2 x^2 + θ_3 x^3 + noise.
Find min_θ ||y − Xθ||^2, where X has rows (1, x_i, x_i^2, x_i^3).
Solution given by the Normal Equations: X^T X θ = X^T y, so θ̂ = (X^T X)^{−1} X^T y.
Find best fit plane
Data: {(x_i, y_i)}, i = 1, …, n, with x_i ∈ R^2 and y_i ∈ R.
Model: y = θ_0 + θ_1 x_1 + θ_2 x_2 + noise.
Find min_θ ||y − Xθ||^2, where X has rows (1, x_{i,1}, x_{i,2}) and y = (y_1, …, y_n).
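A minimal numpy sketch of the best-fit-plane setup (the synthetic data and all names are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
X2 = rng.uniform(-1, 1, size=(n, 2))              # x_i in R^2
theta_true = np.array([0.5, 2.0, -1.0])           # [theta0, theta1, theta2], made up
y = theta_true[0] + X2 @ theta_true[1:] + rng.normal(0, 0.05, n)

A = np.column_stack([np.ones(n), X2])             # rows (1, x_i1, x_i2)
theta_hat = np.linalg.solve(A.T @ A, A.T @ y)     # Normal Equations
```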
Solving an Optimization Problem using Gradient Descent

    min_x F(x)

Gradient descent: take successive steps downhill,

    x_{t+1} = x_t − η ∇F(x_t),

where η > 0 is the step size (learning rate) and −∇F(x_t) points in the direction of steepest descent.

[Figure: depiction of gradient descent: iterates stepping across the level sets of F toward a minimizer.]
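The update rule can be sketched in plain numpy (not TensorFlow; the least-squares example, step size, and iteration count are my own choices):

```python
import numpy as np

def gradient_descent(grad_F, x0, eta=0.01, steps=5000):
    """Minimize F by iterating x_{t+1} = x_t - eta * grad_F(x_t)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad_F(x)
    return x

# Example: F(theta) = 0.5 * ||y - X theta||^2 has gradient -X^T (y - X theta).
rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
theta_true = np.array([1.0, -2.0])
y = X @ theta_true
grad = lambda th: -X.T @ (y - X @ th)
theta_hat = gradient_descent(grad, x0=np.zeros(2))
```

For this convex problem, gradient descent converges to the same minimizer the Normal Equations give.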
Things that can go wrong: underfitting, overfitting, and numerical instability

Underfitting: the model (e.g., the constant model y = θ_0) is not expressive enough.

Overfitting: the model is too expressive; it can fit the noise in the training data.

Numerical instability: solving the Normal Equations X^T X θ = X^T y can be unstable when X^T X is ill-conditioned.
Other topics: What happens when there are fewer data points than features? What happens if there are outliers in the data? How do you deal with categorical features?
If n ≥ k, usually y = Xθ has no exact solution, X^T X is invertible, and there is a unique minimizer θ̂.

If n < k, there are typically many solutions: X has a nontrivial null space.

Outliers: with min_θ ||y − Xθ||^2, one outlier can skew the estimate θ̂ arbitrarily much; this is an issue with the MSE (square) loss. A more robust, median-like alternative: min_θ ||y − Xθ||_1.

Categorical features: encode them with indicator variables, e.g.
    y = θ_0 + θ_1 x_1 + θ_2 x_2,  where x_2 = 1 if person i is a consultant and 0 otherwise (an indicator variable).
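The indicator-variable idea can be sketched as a one-hot encoding (the job categories here are hypothetical):

```python
import numpy as np

# Hypothetical categorical feature: each person's job.
jobs = np.array(["consultant", "engineer", "consultant", "teacher"])
categories = ["consultant", "engineer", "teacher"]

# One indicator (0/1) column per category, i.e. a one-hot encoding:
# X_cat[i, j] = 1 iff person i has job categories[j].
X_cat = np.stack([(jobs == c).astype(float) for c in categories], axis=1)
```

These columns can then be appended to the design matrix X like any other feature.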
Be careful about whether you want to view your problem as a regression task or a classification task.
Classification and Logistic Regression

Viewing regression and classification as function estimation problems

Regression (predict a continuous value): model f : R^d → R, y = f(x) + noise. Given (x_i, y_i), i = 1, …, n, find f̂ ≈ f.

Classification (predict membership in a category): y takes values in a discrete set of categories; given (x_i, y_i), i = 1, …, n, find f̂, which induces a decision boundary.

Terminology:
- X: input variables, predictors, independent variables, features
- Y: response, dependent variable, output variable
- f̂: model, predictor, hypothesis
Parametric Approach: choose a model f_θ for f with unknown parameters θ, and estimate the parameters.

Parametric regression (predict a continuous value): model f_θ : R^d → R, y = f_θ(x) + noise. Given (x_i, y_i), i = 1, …, n, find θ̂.

Parametric classification (predict membership in a category): y ∈ {1, …, m} categories; model y = f_θ(x) + noise. Given (x_i, y_i), i = 1, …, n, find θ̂, which induces a decision boundary.
Approach for estimating θ:
- Select a model f_θ for f with parameters θ.
- Minimize the loss between the training labels and the predictions on the training data.
Binary Classification in 2D with Logistic Regression

    min_θ Σ_{i=1}^{n} L(y_i, f_θ(x_i)),

where L is a prediction loss function. Example: linear regression uses L(y, ŷ) = (y − ŷ)^2, the square loss (ℓ_2 loss).

Training data: (x_i, y_i), i = 1, …, n, with x_i ∈ R^2 and y_i ∈ {0, 1}.

[Figure: scatter plot in the (x_1, x_2) plane showing class 1 and class 0 points.]
Given this data, draw a decision boundary (a curve where you would say class 1 is on one side and class 0 is on the other side).

[Figure: two candidate decision boundaries in the (x_1, x_2) plane: a smooth one separating class 1 from class 0, and a wiggly one that is fitting the noise.]
Model: ŷ = σ(θ_0 + θ_1 x_1 + θ_2 x_2), where σ(z) = 1 / (1 + e^{−z}) is the sigmoid (logistic) function.

Solve min_θ Σ_{i=1}^{n} L(y_i, ŷ_i) for θ̂.

Predict: for a new sample x, predict class 1 if ŷ ≥ 1/2 and class 0 if ŷ < 1/2.
What loss function should you use? One choice is the log loss:

    L(y, ŷ) = −log(ŷ) if y = 1,
              −log(1 − ŷ) if y = 0,

or equivalently L(y, ŷ) = −[ y log(ŷ) + (1 − y) log(1 − ŷ) ], where y is binary and ŷ is continuous in (0, 1).
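A sketch of logistic regression trained by gradient descent on the mean log loss (plain numpy; the data, step size, and iteration count are my own choices). For ŷ_i = σ(θ^T a_i), the gradient of the mean log loss is (1/n) Σ_i (ŷ_i − y_i) a_i:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
n = 400
X = rng.normal(size=(n, 2))                 # features (x1, x2)
A = np.column_stack([np.ones(n), X])        # prepend an intercept column
theta_true = np.array([-0.5, 3.0, -2.0])
y = (sigmoid(A @ theta_true) > rng.uniform(size=n)).astype(float)

theta = np.zeros(3)
eta = 0.5
for _ in range(2000):
    y_hat = sigmoid(A @ theta)
    theta -= eta * A.T @ (y_hat - y) / n    # gradient step on the mean log loss

pred = (sigmoid(A @ theta) >= 0.5).astype(float)  # class 1 iff y_hat >= 1/2
train_acc = np.mean(pred == y)
```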
Decision Boundary for Logistic Regression

Training data: (x_i, y_i), i = 1, …, n.
Model: ŷ = σ(θ_0 + θ_1 x_1 + θ_2 x_2).
Predict: for a new sample x, predict class 1 if ŷ ≥ 1/2 and class 0 if ŷ < 1/2.

The decision boundary is linear: ŷ = 1/2 exactly when θ_0 + θ_1 x_1 + θ_2 x_2 = 0.

[Figure: a line in the (x_1, x_2) plane separating the class 1 region from the class 0 region.]
Activity: Could you use logistic regression to build a reasonable classifier for the following data?

[Figure: 2D data for which the two classes are not linearly separable.]

a) With ŷ = σ(θ_0 + θ_1 x_1 + θ_2 x_2), the decision boundary is linear: a bad fit for this data.
b) A model using nonlinear features of x can give a nonlinear decision boundary and a better fit.
Evaluating Classifiers

Activity: Someone invents a test for a rare disease that affects 0.1% of the population. The test has accuracy 99.9%. Are you convinced this is a good test?

Confusion matrix (prediction vs. truth): True Positive (TP), False Positive (FP), False Negative (FN), True Negative (TN).

    Accuracy = (TP + TN) / (TP + FP + FN + TN)
Activity: You are building a binary classifier that detects whether a pedestrian is crossing the sidewalk within 30 feet of a self-driving car. If the detection is positive, the car puts on the brakes. Would you rather have good precision and great recall, or good recall and great precision?

    Precision = TP / (TP + FP): what fraction of predicted positives are real.
    Recall = TP / (TP + FN): what fraction of positives are correctly predicted.

We want high precision and high recall, but there is a trade-off between True Positives and False Positives, and between True Negatives and False Negatives.
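The rare-disease activity can be made concrete with a short calculation (the population size and the trivial always-negative "test" are my own illustration): a test that declares everyone healthy already achieves 99.9% accuracy, yet it has zero recall.

```python
# Illustrative numbers: 1,000,000 people, 0.1% of whom have the disease.
population = 1_000_000
sick = population // 1000            # 1,000 people with the disease

# A "test" that always predicts negative:
tp, fp = 0, 0
fn, tn = sick, population - sick

accuracy = (tp + tn) / (tp + fp + fn + tn)   # (0 + 999000) / 1000000 = 0.999
recall = tp / (tp + fn)                      # 0 / 1000 = 0.0
# Precision is undefined here: there are no predicted positives.
```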
Training data: (x_i, y_i), i = 1, …, n.
Model: ŷ = σ(θ_0 + θ_1 x_1 + θ_2 x_2).
Predict: for a new sample x, predict class 1 if ŷ ≥ τ and class 0 if ŷ < τ. The default threshold is τ = 1/2, but you could choose any value.
Receiver Operating Characteristic (ROC) Curves

Plot the true positive rate (vertical axis) against the false positive rate (horizontal axis). Each point on the curve is the same classifier with a different threshold: at one extreme you declare everything positive (top right corner), at the other you declare everything negative (bottom left corner). The diagonal corresponds to a random classifier.

Comparing classifiers and Area Under the Curve (AUC): a perfect classifier has AUC = 1, so we want AUC close to 1. It is also common to plot precision-recall curves.
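Tracing an ROC curve by sweeping the threshold can be sketched as follows (numpy only; the toy scores are my own, and AUC is computed with the trapezoid rule):

```python
import numpy as np

def roc_points(y_true, scores):
    """True/false positive rates as the threshold sweeps over the scores."""
    thresholds = np.sort(np.unique(scores))[::-1]   # highest threshold first
    P = np.sum(y_true == 1)
    N = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]                         # declare everything negative
    for t in thresholds:
        pred = (scores >= t).astype(int)
        tpr.append(np.sum((pred == 1) & (y_true == 1)) / P)
        fpr.append(np.sum((pred == 1) & (y_true == 0)) / N)
    return np.array(fpr), np.array(tpr)             # ends at "everything positive"

# Toy example with perfectly separated scores, so the curve passes through (0, 1).
y = np.array([0, 0, 1, 1])
s = np.array([0.1, 0.2, 0.8, 0.9])
fpr, tpr = roc_points(y, s)
auc = np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2)  # trapezoid rule
```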