ML-02 Linear Regression - IITKGP
Transcript of ML-02 Linear Regression - IITKGP
CS 60050 Machine Learning
Linear Regression
Some slides taken from course materials of Andrew Ng
Dataset of living area and price of houses in a city
This is a training set. How can we learn to predict the prices of houses of other sizes in the city, as a function of their living area?
Dataset of living area and price of houses in a city
This is an example of a supervised learning problem. When the target variable we are trying to predict is continuous, it is called a regression problem.
Dataset of living area and price of houses in a city
m = number of training examples
x's = input variables / features
y's = output variables / "target" variables
(x, y) – a single training example
(x^(i), y^(i)) – a specific (the i-th) training example, where i is an index into the training set
How to use the training set?
Learn a function h(x) so that h(x) is a good predictor for the corresponding value of y. h is called the hypothesis function.
How to represent hypothesis h?
h_θ(x) = θ0 + θ1 x
The θi are the parameters – θ0 is the zero condition (intercept), θ1 is the gradient (slope). θ is the vector of all the parameters. We assume y is a linear function of x; this is univariate linear regression.
Digression: Multivariate linear regression
How to represent hypothesis h?
h_θ(x) = θ0 + θ1 x, where θ0 is the zero condition and θ1 is the gradient.
We assume y is a linear function of x (univariate linear regression). How do we learn the values of the parameters θi?
Intuition of hypothesis function
• We are attempting to fit a straight line to the data in the training set
• Values of the parameters decide the equation of the straight line
• Which is the best straight line to fit the data?
Intuition of hypothesis function
• Which is the best straight line to fit the data?
• How do we learn the values of the parameters θi?
• Choose the parameters such that the prediction is close to the actual y-value for the training examples
Cost function
• Measure of how close the predictions are to the actual y-values
• Average over all the m training instances
• Squared-error cost function: J(θ0, θ1) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))^2
• Choose the parameters θ so that J(θ) is minimized
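As an illustration (not from the slides), the squared-error cost can be sketched in Python with NumPy; the function and variable names are my own:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = (1/2m) * sum_i (h(x_i) - y_i)^2."""
    m = len(x)
    predictions = theta0 + theta1 * x          # h_theta(x^(i)) for every example
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy training set lying exactly on y = 2x, so (theta0, theta1) = (0, 2) has zero cost
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
print(cost(0.0, 2.0, x, y))   # → 0.0
print(cost(0.0, 0.0, x, y))   # → approx 9.33  ( = 56 / 6 )
```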
Hypothesis: h_θ(x) = θ0 + θ1 x
Parameters: θ0, θ1
Cost Function: J(θ0, θ1) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))^2
Goal: minimize J(θ0, θ1) over θ0, θ1
(Left: for fixed θ0, θ1, h_θ(x) is a function of x – Price ($) in 1000's vs. Size in feet². Right: J(θ0, θ1) as a function of the parameters.)
Contour plot or Contour figure
Minimizing a function
• For now, let us consider minimizing some arbitrary function (not necessarily an error function)
• The algorithm is called gradient descent
• It is used in many applications that require minimizing functions
Have some function J(θ0, θ1)
Want min over θ0, θ1 of J(θ0, θ1)
Outline:
• Start with some θ0, θ1
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum
(Surface plot of J(θ0, θ1) over θ0 and θ1.)
(Surface plot of J(θ0, θ1) over θ0 and θ1, with multiple local minima.)
If the function has multiple local minima, where one starts can decide which minimum is reached
Gradient descent algorithm
Repeat until convergence: θj := θj − α ∂J(θ0, θ1)/∂θj (simultaneously for j = 0 and j = 1)
α is the learning rate – more on this later
Gradient descent algorithm
Correct (simultaneous update):
temp0 := θ0 − α ∂J(θ0, θ1)/∂θ0
temp1 := θ1 − α ∂J(θ0, θ1)/∂θ1
θ0 := temp0; θ1 := temp1
Incorrect (sequential update): assigning θ0 before computing the update for θ1, so θ1's derivative is evaluated at the new θ0.
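The difference between the correct simultaneous update and the incorrect sequential one can be sketched as follows (illustrative Python; the two-parameter function here is made up for the example):

```python
# Correct: both partial derivatives are evaluated at the *old* parameter values
# before either parameter is overwritten.
def step_simultaneous(t0, t1, grad0, grad1, alpha):
    temp0 = t0 - alpha * grad0(t0, t1)
    temp1 = t1 - alpha * grad1(t0, t1)
    return temp0, temp1

# Incorrect: t0 is overwritten first, so grad1 sees the already-updated t0.
def step_sequential(t0, t1, grad0, grad1, alpha):
    t0 = t0 - alpha * grad0(t0, t1)
    t1 = t1 - alpha * grad1(t0, t1)   # uses the new t0 – not true gradient descent
    return t0, t1

# Example: J(t0, t1) = t0^2 + t0*t1, so dJ/dt0 = 2*t0 + t1 and dJ/dt1 = t0
g0 = lambda t0, t1: 2 * t0 + t1
g1 = lambda t0, t1: t0
print(step_simultaneous(1.0, 1.0, g0, g1, 0.1))  # → approx (0.7, 0.9)
print(step_sequential(1.0, 1.0, g0, g1, 0.1))    # → approx (0.7, 0.93)
```

The two variants visibly disagree on θ1 after a single step, which is why the slides stress simultaneity.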
For simplicity, let us first consider a function of a single variable
The learning rate
• Gradient descent can converge to a local minimum, even with the learning rate α fixed
• But the value needs to be chosen judiciously
• If α is too small, gradient descent can be slow to converge
• If α is too large, gradient descent can overshoot the minimum; it may fail to converge, or even diverge
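A tiny 1-D experiment (my own illustration, not from the slides) makes these failure modes concrete, using f(θ) = θ², whose minimum is at 0:

```python
def gd_1d(grad, theta, alpha, steps):
    """Run fixed-step 1-D gradient descent for a given number of steps."""
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta

grad = lambda t: 2 * t                 # derivative of f(t) = t^2

print(gd_1d(grad, 1.0, 0.1, 50))      # reasonable alpha: converges toward 0
print(gd_1d(grad, 1.0, 0.001, 50))    # too small: still far from 0 (slow)
print(gd_1d(grad, 1.0, 1.1, 10))      # too large: |theta| grows – divergence
```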
Applying the gradient descent algorithm to the linear regression model
Gradient descent for univariate linear regression
Gradient descent for univariate linear regression:
Repeat until convergence {
θ0 := θ0 − α (1/m) Σ_i (h_θ(x^(i)) − y^(i))
θ1 := θ1 − α (1/m) Σ_i (h_θ(x^(i)) − y^(i)) x^(i)
}
update θ0 and θ1 simultaneously
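These two update rules can be sketched directly in Python (NumPy assumed; names are illustrative):

```python
import numpy as np

def gradient_descent_univariate(x, y, alpha=0.01, iters=10000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x."""
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iters):
        err = theta0 + theta1 * x - y                 # h(x^(i)) - y^(i) for all i
        grad0 = err.sum() / m                         # dJ/dtheta0
        grad1 = (err * x).sum() / m                   # dJ/dtheta1
        # simultaneous update of both parameters
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 3.0 + 2.0 * x                                     # data on the line theta0=3, theta1=2
t0, t1 = gradient_descent_univariate(x, y)
print(round(t0, 2), round(t1, 2))                     # → 3.0 2.0
```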
"Batch" Gradient Descent
"Batch": each step of gradient descent uses all the training examples. There are other variations, like "stochastic gradient descent" (used for learning over huge datasets).
What about multiple local minima?
• The cost function in linear regression is always a convex function – it always has a single global minimum
• So, gradient descent (with a suitable learning rate) will always converge
Convex cost function
Gradient descent in action
(Slides 29–37 show gradient descent in action: for fixed θ0, θ1, the fitted line h_θ(x) as a function of x, alongside the corresponding point on the plot of J as a function of the parameters, over successive iterations.)
Linear Regression for multiple variables
Multiple features (variables):

| Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000) |
|---|---|---|---|---|
| 2104 | 5 | 1 | 45 | 460 |
| 1416 | 3 | 2 | 40 | 232 |
| 1534 | 3 | 2 | 30 | 315 |
| 852 | 2 | 1 | 36 | 178 |
| … | … | … | … | … |
Notation:
n = number of features
x^(i) = input (features) of the i-th training example
x_j^(i) = value of feature j in the i-th training example
Hypothesis:
Previously: h_θ(x) = θ0 + θ1 x
For multivariate linear regression: h_θ(x) = θ0 + θ1 x_1 + θ2 x_2 + … + θn x_n
For convenience of notation, define x_0 = 1, so that h_θ(x) = θ^T x = Σ_j θj x_j.
Hypothesis: h_θ(x) = θ^T x = θ0 x_0 + θ1 x_1 + … + θn x_n
Parameters: θ = (θ0, θ1, …, θn)
Cost function: J(θ) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))^2
Gradient descent: Repeat { θj := θj − α ∂J(θ)/∂θj } (simultaneously update for every j = 0, …, n)
Gradient Descent
Previously (n = 1):
Repeat {
θ0 := θ0 − α (1/m) Σ_i (h_θ(x^(i)) − y^(i))
θ1 := θ1 − α (1/m) Σ_i (h_θ(x^(i)) − y^(i)) x^(i)
} (simultaneously update θ0, θ1)
New algorithm (n ≥ 1):
Repeat {
θj := θj − α (1/m) Σ_i (h_θ(x^(i)) − y^(i)) x_j^(i)
} (simultaneously update θj for j = 0, …, n)
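One way to compute the general update for all θj at once is the vectorized sketch below (my own illustration, not from the slides; NumPy assumed):

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=2000):
    """Batch gradient descent for h(x) = theta^T x, with x0 = 1 prepended."""
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])       # add the x0 = 1 column
    theta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        grad = Xb.T @ (Xb @ theta - y) / m     # (1/m) * sum_i err_i * x_j^(i), all j at once
        theta = theta - alpha * grad           # simultaneous update of every theta_j
    return theta

X = np.array([[0.0], [1.0], [2.0], [3.0]])     # one feature
y = np.array([1.0, 3.0, 5.0, 7.0])             # data on the line y = 1 + 2x
print(gradient_descent(X, y))                   # → approx [1. 2.]
```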
Practical aspects of applying gradient descent
Feature Scaling. Idea: make sure features are on a similar scale.
E.g. x_1 = size (0–2000 feet²), x_2 = number of bedrooms (1–5)
(Contour plot of J: elongated ellipses when features have very different scales; more circular after scaling, so gradient descent takes a more direct path to the minimum.)
Mean normalization: replace x_j with x_j − μ_j to make features have approximately zero mean (do not apply to x_0 = 1).
Other types of normalization: e.g. x_j := (x_j − μ_j) / s_j, where s_j is the range (max − min) or the standard deviation of feature j.
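Mean normalization as described above might be sketched like this (illustrative Python; the range max − min is used for s_j):

```python
import numpy as np

def mean_normalize(X):
    """Replace each feature x_j by (x_j - mu_j) / s_j, where mu_j is the column
    mean and s_j is the column range (max - min). Not applied to x0 = 1."""
    mu = X.mean(axis=0)
    s = X.max(axis=0) - X.min(axis=0)
    return (X - mu) / s

# Size and number of bedrooms live on very different scales
X = np.array([[2104.0, 5.0],
              [1416.0, 3.0],
              [1534.0, 3.0],
              [ 852.0, 2.0]])
Xn = mean_normalize(X)
print(Xn.mean(axis=0))   # → approx [0. 0.]  (zero mean per feature)
```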
Is gradient descent working properly?
• Plot how J(θ) changes with every iteration of gradient descent
• For sufficiently small learning rate, J(θ) should decrease with every iteration
• If not, the learning rate needs to be reduced
• However, too small a learning rate means slow convergence
When to end gradient descent?
• Example convergence test:
• Declare convergence if J(θ) decreases by less than 0.001 in an iteration (assuming J(θ) is decreasing in every iteration)
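The convergence test above can be turned into a stopping rule (illustrative Python on a 1-D cost with minimum at θ = 3; names are my own):

```python
def run_until_converged(grad, cost, theta, alpha, tol=1e-3, max_iters=10000):
    """Iterate gradient descent until J decreases by less than tol in one step."""
    prev = cost(theta)
    for i in range(max_iters):
        theta = theta - alpha * grad(theta)
        cur = cost(theta)
        if prev - cur < tol:          # declare convergence on a tiny decrease
            return theta, i + 1
        prev = cur
    return theta, max_iters

cost = lambda t: (t - 3.0) ** 2       # minimum at t = 3
grad = lambda t: 2 * (t - 3.0)
theta, iters = run_until_converged(grad, cost, 10.0, 0.1)
print(round(theta, 2), iters)         # theta close to 3 after a few dozen iterations
```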
Polynomial Regression for multiple variables
Choice of features
(Plot of Price (y) against Size (x): different choices of features give different fitted curves.)