Linear Regression
Foundations of Data Analysis
February 20, 2020
Algebra · Statistics · Geometry
Is there a relationship between the heights of mothers and their daughters?
If you know a mother's height, can you predict her daughter's height with any accuracy?
Linear regression is a tool for answering these types of questions.
It models the relationship as a straight line.
Regression Setup
We are given real-valued data in pairs:

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \in \mathbb{R}^2$$

Example: $x_i$ is the height of the $i$th mother, and $y_i$ is the height of the $i$th mother's daughter.
Linear Regression
Model the data as a line:
[Figure: scatter plot of the data with a fitted line; a data point $(x_i, y_i)$ and its vertical error $\epsilon_i$ are marked.]

$$y_i = \alpha + \beta x_i + \epsilon_i$$

$\alpha$: intercept, $\beta$: slope, $\epsilon_i$: error
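To make the model concrete, here is a minimal sketch in Python/NumPy that simulates mother/daughter pairs from this line model. The particular values of $\alpha$, $\beta$, $\sigma$, and the height range are illustrative assumptions, not values from these notes.

```python
import numpy as np

# Minimal sketch: simulate (x_i, y_i) pairs from y_i = alpha + beta*x_i + eps_i.
# alpha, beta, sigma, and the height range are illustrative assumptions.
rng = np.random.default_rng(seed=0)

n = 100
alpha, beta, sigma = 30.0, 0.8, 5.0

x = rng.uniform(150.0, 180.0, size=n)   # mothers' heights (cm)
eps = rng.normal(0.0, sigma, size=n)    # errors eps_i
y = alpha + beta * x + eps              # daughters' heights (cm)
```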
Geometry: Least Squares
We want to fit a line as close to the data as possible, which means we want to minimize the errors, $\epsilon_i$.
Taking the line equation: $y_i = \alpha + \beta x_i + \epsilon_i$

Rearrange to get: $\epsilon_i = y_i - \alpha - \beta x_i$

We want to minimize the sum-of-squared errors (SSE):

$$\mathrm{SSE}(\alpha, \beta) = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} (y_i - \alpha - \beta x_i)^2$$
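The SSE translates directly into a few lines of NumPy. This is a sketch; the function name `sse` is our own, and it reuses the `x`, `y` arrays from the simulation above.

```python
import numpy as np

def sse(alpha: float, beta: float, x: np.ndarray, y: np.ndarray) -> float:
    """Sum-of-squared errors of the line y = alpha + beta*x on data (x, y)."""
    eps = y - alpha - beta * x          # eps_i = y_i - alpha - beta*x_i
    return float(np.sum(eps ** 2))
```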
Least Squares: Step 1
Center the data by removing the mean:
$$\tilde{y}_i = y_i - \bar{y}, \qquad \tilde{x}_i = x_i - \bar{x}$$

Note: $\sum_{i=1}^{n} \tilde{y}_i = 0$ and $\sum_{i=1}^{n} \tilde{x}_i = 0$.

We'll first get a solution $\tilde{y} = \alpha + \beta\tilde{x}$, then shift it back to the original (uncentered) data at the end.
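Continuing the sketch, centering is one line per variable, and the zero-sum property can be checked numerically:

```python
import numpy as np

# Center the data (x, y from the simulation above) by subtracting the means.
x_tilde = x - x.mean()
y_tilde = y - y.mean()

# Both centered variables sum to zero, up to floating-point error.
assert np.isclose(x_tilde.sum(), 0.0, atol=1e-8)
assert np.isclose(y_tilde.sum(), 0.0, atol=1e-8)
```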
Least Squares: Step 2
Take the derivative of $\mathrm{SSE}(\alpha, \beta)$ with respect to $\alpha$ and set it to zero:

$$\begin{aligned}
0 = \frac{\partial}{\partial \alpha}\,\mathrm{SSE}(\alpha, \beta) &= \frac{\partial}{\partial \alpha} \sum_{i=1}^{n} (\tilde{y}_i - \alpha - \beta\tilde{x}_i)^2 \\
&= -2 \sum_{i=1}^{n} (\tilde{y}_i - \alpha - \beta\tilde{x}_i) \\
&= -2 \sum_{i=1}^{n} \tilde{y}_i + 2n\alpha + 2\beta \sum_{i=1}^{n} \tilde{x}_i
\end{aligned}$$

Using $\sum \tilde{y}_i = \sum \tilde{x}_i = 0$, we get

$$\hat{\alpha} = 0$$
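As a quick numerical check of $\hat{\alpha} = 0$ (a sketch using the centered arrays from above): fitting a degree-1 polynomial to the centered data gives an intercept that is zero up to floating-point error.

```python
import numpy as np

# np.polyfit returns coefficients highest degree first: (slope, intercept).
slope, intercept = np.polyfit(x_tilde, y_tilde, deg=1)
assert np.isclose(intercept, 0.0, atol=1e-8)   # alpha-hat = 0 on centered data
```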
Least Squares: Step 3
With $\alpha = 0$, we are left with

$$\tilde{y}_i = \beta\tilde{x}_i + \epsilon_i$$

Or, in vector notation:

$$\begin{pmatrix} \tilde{y}_1 \\ \tilde{y}_2 \\ \vdots \\ \tilde{y}_n \end{pmatrix} = \beta \begin{pmatrix} \tilde{x}_1 \\ \tilde{x}_2 \\ \vdots \\ \tilde{x}_n \end{pmatrix} + \begin{pmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{pmatrix}$$
[Figure: the vector $\tilde{y}$ and its projection onto $\tilde{x}$; the error vector $\epsilon$ connects the projection to $\tilde{y}$.]
Minimizing $\mathrm{SSE}(\alpha, \beta) = \sum \epsilon_i^2 = \|\epsilon\|^2$ is projection!

The solution is

$$\hat{\beta} = \frac{\langle \tilde{x}, \tilde{y} \rangle}{\|\tilde{x}\|^2}$$
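In NumPy, the projection coefficient is a single expression (a sketch continuing from the centered arrays above), and it agrees with the slope returned by a standard polynomial fit.

```python
import numpy as np

# beta-hat = <x_tilde, y_tilde> / ||x_tilde||^2: a projection coefficient.
beta_hat = np.dot(x_tilde, y_tilde) / np.dot(x_tilde, x_tilde)

# Matches the slope from np.polyfit on the centered data.
assert np.isclose(beta_hat, np.polyfit(x_tilde, y_tilde, deg=1)[0])
```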
Shifting Back to Uncentered Data
So far, we have:

$$\tilde{y}_i = \hat{\beta}\tilde{x}_i + \epsilon_i$$

Expanding out $\tilde{x}_i$ and $\tilde{y}_i$ gives

$$(y_i - \bar{y}) = \hat{\beta}(x_i - \bar{x}) + \epsilon_i$$

Rearranging gives

$$y_i = (\bar{y} - \hat{\beta}\bar{x}) + \hat{\beta} x_i + \epsilon_i$$

So, for the uncentered data, $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$.
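Putting the steps together gives a complete closed-form fit. This is a sketch; `fit_line` is our own name, not a standard API, and the result can be cross-checked against `np.polyfit`.

```python
import numpy as np

def fit_line(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Least-squares fit of y = alpha + beta*x via the centering derivation."""
    x_bar, y_bar = x.mean(), y.mean()
    x_t, y_t = x - x_bar, y - y_bar                   # Step 1: center
    beta_hat = np.dot(x_t, y_t) / np.dot(x_t, x_t)    # Step 3: project
    alpha_hat = y_bar - beta_hat * x_bar              # shift back
    return alpha_hat, beta_hat

alpha_hat, beta_hat = fit_line(x, y)
assert np.allclose(np.polyfit(x, y, deg=1), [beta_hat, alpha_hat])
```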
Probability: Maximum Likelihood
So far, we have only used geometry, but if our data is random, shouldn't we be talking about probability?

To make linear regression probabilistic, we model the errors as Gaussian:

$$\epsilon_i \sim N(0, \sigma^2)$$

The likelihood is

$$L(\alpha, \beta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{\epsilon_i^2}{2\sigma^2}\right)$$
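A sketch of evaluating this likelihood on the simulated data (`likelihood` is our own name):

```python
import numpy as np

def likelihood(alpha: float, beta: float, sigma: float,
               x: np.ndarray, y: np.ndarray) -> float:
    """Gaussian likelihood L(alpha, beta) of the line fit, sigma held fixed."""
    eps = y - alpha - beta * x
    dens = np.exp(-eps**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return float(np.prod(dens))   # product over i = 1..n

L = likelihood(30.0, 0.8, 5.0, x, y)   # tiny but nonzero for n = 100
```

Note that the product of many small densities underflows in floating point once $n$ grows, which is one practical reason to work with the log-likelihood instead.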
The log-likelihood is then
$$\log L(\alpha, \beta) = -\frac{1}{2\sigma^2} \sum_{i=1}^{n} \epsilon_i^2 + \text{const.}$$

Maximizing this is equivalent to minimizing SSE!

$$\max \log L = \min \sum \epsilon_i^2 = \min \mathrm{SSE}$$
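A numerical sketch of the equivalence, reusing `sse` and the simulated data from above (`log_likelihood` is our own helper; $\sigma$ is held fixed since it only shifts and rescales the log-likelihood): over a shared grid of $(\alpha, \beta)$ values, the maximizer of $\log L$ is exactly the minimizer of SSE.

```python
import numpy as np

def log_likelihood(alpha: float, beta: float, sigma: float,
                   x: np.ndarray, y: np.ndarray) -> float:
    """log L = -n*log(sqrt(2*pi)*sigma) - SSE/(2*sigma^2)."""
    eps = y - alpha - beta * x
    return float(-len(x) * np.log(np.sqrt(2 * np.pi) * sigma)
                 - np.sum(eps**2) / (2 * sigma**2))

grid = [(a, b) for a in np.linspace(0, 60, 61) for b in np.linspace(0, 1.6, 81)]
best_ml = max(grid, key=lambda ab: log_likelihood(*ab, 5.0, x, y))
best_ls = min(grid, key=lambda ab: sse(*ab, x, y))
assert best_ml == best_ls   # same (alpha, beta): max log L <=> min SSE
```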