02-03-17
1
Linear models
F. Farnir, E. Moyse
Biostatistics & Bioinformatics
Faculty of Vet. Medicine
University of Liege
Outline of the course
• Intro: why use linear models ?
• R as a tool to learn and do statistics
• Matrix language
• Linear model
◦ Basic formulation
◦ Estimating parameters
◦ Testing hypotheses about the parameters
• Examples of linear models
Outline of the course
◦ Simple linear regression
◦ Simple ANOVA
◦ A more complicated example
◦ A more complex situation: repeated measures
• Summary & exercises
Matrix language
• Matrix definition
◦ A matrix M is an n*m table of scalar elements.
◦ The elements of the matrix are noted M(i,j),
where i corresponds to the row in the matrix
(i = 1, 2, …, n) and j to the column in the
matrix (j = 1, 2, …, m)
◦ Example:
if M = [ 10  8  5 ]
       [  6  6  4 ], then M(1,3) = 5.
Matrix language
• Special cases
◦ When m = n, we have a square matrix.
◦ When M is square and M(i,j) = M(j,i) for all j > i, we have a symmetric matrix.
◦ When m (or n) = 1, we have a row (column) vector.
◦ When m = n = 1, the matrix reduces to a scalar (i.e. the usual math objects we are used to playing with…)
◦ When M is square, and M(i,j) = 0 for all i ≠ j, the matrix is diagonal.
Matrix language
• Vector definition using R
> v1<-c(2,4,6,8,10) # Simplest form
> v2<-seq(2,10,2)   # Args: min,max,step
> v3<-1:10          # Successive numbers
• Matrix definition using R
> m1<-matrix(v3,byrow=T,nr=2)
> m1
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
>
Matrix language
• Matrix operations
◦ Equality: 2 matrices M and N are equal if:
• they have the same dimensions n and m.
• M(i,j) = N(i,j) for all possible values of (i,j).
◦ Example:
[ a  b  c ]   [ 10  8  5 ]
[ d  e  f ] = [  6  6  4 ]
means that a = 10, b = 8, …, f = 4.
Matrix language
• Matrix operations
◦ Transposition: N is the transpose of M if:
• M dimensions are m & n, and N dims. are n & m
• N(i,j) = M(j,i) for all possible values of (i,j).
• The usual notation is M' = N
◦ Example:
M = [ 10  8  5 ]        [ 10  6 ]
    [  6  6  4 ]  ⇒ M' = [  8  6 ]
                        [  5  4 ]
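Transposition is available directly in R as t(); a quick sketch checking the example above:

```r
# Build the example matrix and transpose it
m <- matrix(c(10, 8, 5, 6, 6, 4), byrow=TRUE, nrow=2)
mt <- t(m)        # the 3x2 transpose
mt2 <- t(mt)      # transposing twice recovers the original matrix
```

Note that t() also works on the vectors created above: t(v1) turns a column vector into a 1-row matrix.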
Matrix language
• Matrix operations
◦ Addition: C = M + N if:
• C, M & N have the same dimensions n and m.
• C(i,j) = M(i,j) + N(i,j) for all possible values of (i,j).
◦ Example:
[ 10  8  5 ]   [ -3  -5  -5 ]   [  7  3  0 ]
[  6  6  4 ] + [  9   0   1 ] = [ 15  6  5 ]
◦ Subtraction is defined similarly
Matrix language
• Matrix operations using R
◦ Addition
> m1<-matrix(1:12,byrow=T,nr=3)
> m2<-matrix(1:12,byrow=F,nr=3)
> m<-m1+m2
> m
     [,1] [,2] [,3] [,4]
[1,]    2    6   10   14
[2,]    7   11   15   19
[3,]   12   16   20   24
>
Matrix language
• Matrix operations
◦ Scalar multiplication: N = a*M
• M & N have the same dimensions n and m.
• a is a scalar
• N(i,j) = a*M(i,j) for all possible values of (i,j).
◦ Example:
2 * [ 10  8  5 ]   [ 20  16  10 ]
    [  6  6  4 ] = [ 12  12   8 ]
Matrix language
• Matrix operations using R
◦ Scalar multiplication
> m1<-matrix(1:12,byrow=T,nr=3)
> m2<-3*m1
> m2
     [,1] [,2] [,3] [,4]
[1,]    3    6    9   12
[2,]   15   18   21   24
[3,]   27   30   33   36
>
Matrix language
• Matrix operations
◦ Matrix multiplication: C = M*N
• M & N have compatible dimensions n,m and m,q.
• C dimension is n,q
• C(i,j) = Σk M(i,k)*N(k,j) for all possible values of (i,j).
◦ Example:
[ 10  8  5 ]   [  1 ]   [ 21 ]
[  6  6  4 ] * [  2 ] = [ 14 ]
               [ -1 ]
Matrix language
• Matrix operations using R
◦ Matrix multiplication
> m1<-matrix(1:12,byrow=T,nr=3)
> m2<-matrix(1:12,byrow=T,nr=4)
> m<-m1%*%m2
> m
     [,1] [,2] [,3]
[1,]   70   80   90
[2,]  158  184  210
[3,]  246  288  330
>
Matrix language
• Matrix operations
◦ Matrix multiplication (cont'd)
◦ Another example:
[ 10  8  5 ]   [ 1  0  0 ]   [ 10  8  5 ]
[  6  6  4 ] * [ 0  1  0 ] = [  6  6  4 ]
               [ 0  0  1 ]
• The matrix I with I(i,i) = 1 for all i and I(i,j) = 0 for all i,j, i ≠ j is called the identity matrix (for obvious reasons…)
Matrix language
• Matrix operations using R
◦ Matrix multiplication with identity
> m1<-matrix(1:12,byrow=T,nr=3)
> m2<-diag(4) # Diagonal matrix of dim 4,4
> m1%*%m2
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
>
Matrix language
• Matrix operations
◦ Matrix inversion: N = M⁻¹
• M & N are square and of same dimensions n,n
• M*N = I
• N(i,j) are not easy to compute by hand in general.
• N is unique when it exists
◦ Example:
M = [ 1  2 ]  ⇒  N = M⁻¹ = [ -2.0   1.0 ]        M*N = [ 1  0 ]
    [ 3  4 ]               [  1.5  -0.5 ]  and         [ 0  1 ]
Matrix language
• Matrix operations using R
◦ Matrix inversion
> m<-matrix(1:4,byrow=T,nr=2)
> n<-solve(m) # Computes the inverse of m
> n
     [,1] [,2]
[1,] -2.0  1.0
[2,]  1.5 -0.5
> m%*%n
     [,1] [,2]
[1,]    1    0
[2,]    0    1
>
Matrix language
• Matrix operations
◦ Matrix inversion: questions ?
• Are all matrices invertible ? No
• When rows (or columns) are linear combinations of other rows (or columns) of the matrix, the matrix is not invertible.
• The max # of independent rows and columns is called the rank of the matrix, noted r(M)
• Example:
M = [ 1  2  3 ]
    [ 6  5  4 ]  => M[3,] = 2*M[1,] + M[2,]
    [ 8  9 10 ]
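In R, the rank can be obtained from a QR decomposition (the same qr(...)$rank trick is used later in the test_H function); a quick check on the example above:

```r
# The third row is a linear combination of the first two
m <- matrix(c(1,2,3, 6,5,4, 8,9,10), byrow=TRUE, nrow=3)
r <- qr(m)$rank   # only 2 independent rows, so r(M) = 2
```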
Matrix language
• Matrix operations
◦ Matrix inversion: questions ?
• But such dependencies are not always obvious to see…
• Fortunately, a measure, called the determinant, can be
systematically computed (although not easily…),
with the property that a null determinant indicates
a non-invertible (« singular ») matrix.
Matrix language
• Matrix operations using R
◦ Matrix inversion: singular matrix
> m<-matrix(c(1,2,3,6,5,4,8,9,10),byrow=T,nr=3)
> det(m) # Computes the determinant of m
[1] 0
> solve(m)
Error in solve.default(m) :
  Lapack routine dgesv: system is exactly singular
>
Matrix language
• Matrix operations
◦ Matrix inversion: questions ?
• So, what to do with singular and rectangular matrices ? Use a generalized inverse
◦ G is a generalized inverse of M if M*G*M = M
◦ Example:
M = [ 5  2  3 ]         [ 0   0    0  ]
    [ 2  2  0 ]  ⇒ G1 = [ 0  1/2   0  ]
    [ 3  0  3 ]         [ 0   0   1/3 ]
Matrix language
• Matrix operations using R
◦ Generalized inverse
> m<-matrix(c(5,2,3,2,2,0,3,0,3),byrow=T,nr=3)
> det(m) # Computes the determinant of m
[1] 0
> g1<-matrix(c(0,0,0,0,1/2,0,0,0,1/3),byrow=T,nr=3)
> m%*%g1%*%m
     [,1] [,2] [,3]
[1,]    5    2    3
[2,]    2    2    0
[3,]    3    0    3
>
Matrix language
• Matrix operations using R
◦ Generalized inverse
> m<-matrix(c(5,2,3,2,2,0,3,0,3),byrow=T,nr=3)
> library(MASS)
> g2<-ginv(m) # Penrose generalized inverse
> g2
           [,1]        [,2]        [,3]
[1,] 0.09259259  0.07407407  0.01851852
[2,] 0.07407407  0.25925926 -0.18518519
[3,] 0.01851852 -0.18518519  0.20370370
> m%*%g2%*%m
     [,1] [,2] [,3]
[1,]    5    2    3
[2,]    2    2    0
[3,]    3    0    3
>
Matrix language
• Matrix operations
◦ Matrix inversion: summary
• For square matrices M with det(M) ≠ 0, a unique inverse matrix, noted M⁻¹, exists: solve(M)
• Note that: M*M⁻¹*M = M*I = M
• For square matrices M with det(M) = 0 and for rectangular matrices, an infinite number of generalized inverses, noted G, exist: ginv(M)
Matrix language: an application
• Solving a system of linear equations
◦ Consider the following system:
x + 2*y + z = 8
2*x + y - z = 1
x + 2*z = 7
◦ Using our definitions, we can rewrite it as A*x = b where
    [ 1  2  1 ]       [ x ]       [ 8 ]
A = [ 2  1 -1 ],  x = [ y ],  b = [ 1 ]
    [ 1  0  2 ]       [ z ]       [ 7 ]
Matrix language: an application
• Solving a system of linear equations (cont'd)
◦ Now we can solve using matrix operations:
A⁻¹*A*x = A⁻¹*b
=> I*x = A⁻¹*b
=> x = A⁻¹*b
=> x = x[1] = (A⁻¹*b)[1],
   y = x[2] = (A⁻¹*b)[2],
   z = x[3] = (A⁻¹*b)[3].
Matrix language
• Solving a system of equations using R
> A<-matrix(c(1,2,1,2,1,-1,1,0,2),byrow=T,nr=3)
> b<-c(8,1,7)
> x<-solve(A)%*%b
> x
     [,1]
[1,]    1
[2,]    2
[3,]    3
>
Matrix language: an application
• Solving a system of linear equations
◦ Now, consider the following system:
x + 2*y + z = 8
2*x + y - z = 1
-x + y + 2*z = 7
◦ Again, we can rewrite it as A*x = b where
    [  1  2  1 ]       [ x ]       [ 8 ]
A = [  2  1 -1 ],  x = [ y ],  b = [ 1 ]
    [ -1  1  2 ]       [ z ]       [ 7 ]
Matrix language: an application
• Solving a system of linear equations
◦ But this time, the 3rd eqn is the 1st - the 2nd:
x + 2*y + z = 8
2*x + y - z = 1
-x + y + 2*z = 7    (= eqn 1 - eqn 2)
◦ So, we actually have 2 equations with 3
unknowns, which leads to an infinity of
solutions…
Matrix language: an application
• Solving a system of linear equations
◦ In our matrix writing, this leads to
A[3,] = A[1,] – A[2,]
◦ Accordingly, the inverse of A does not exist
and generalized inverses need to be used !
◦ Since an infinity of generalized inverses can be
obtained, this leads to an infinity of solutions,
as expected !
Matrix language
• Solving a system of equations using R
> A<-matrix(c(1,2,1,2,1,-1,-1,1,2),byrow=T,nr=3)
> b<-c(8,1,7)
> library(MASS)
> x<-ginv(A)%*%b
> x
          [,1]
[1,] 0.3333333
[2,] 2.6666667
[3,] 2.3333333
> y<-c(1,2,3) # Let's show this is another solution
> A%*%y
     [,1]
[1,]    8
[2,]    1
[3,]    7
>
Linear models
• A basic formulation
◦ A model is a linear model if the relationship
between the parameters and the modelled
variable is linear.
◦ Examples:
• Linear regression: y(i) = β0 + β1*x(i) + e(i)
• Quadratic regression: y(i) = β0 + β1*x(i) + β2*x(i)² + e(i)
• One-way ANOVA: y(ij) = µ + a(i) + e(ij)
• Multiple-way ANOVA, other regressions, mixtures
of ANOVA and regressions, …
Linear models
• Matrix formulation: y = X*β + e
◦ y' = (y(1),…,y(n)) = vector of observations
• known (observed, or measured)
◦ β' = (b(1),…,b(m)) = vector of parameters
• unknown, and to be estimated
◦ e' = (e(1),…,e(n)) = vector of residuals
• unknown, but supposed N(0,σ²*I)
◦ X = design (or incidence) matrix, linking the parameters to the observations.
• known
Linear models
• Two main problems:
◦ How do we obtain estimators b for the
parameters β of the model ?
◦ How do we test hypotheses about the
parameters of the model ?
Linear models
• Estimation problem:
◦ The rationale we use to infer estimators is to
choose parameter values that make the errors « as
small as possible »
• To simultaneously reduce all components of e, we
choose to minimize e'*e = Σ e(i)²
• This leads to a « least-squares (of the error) estimate »
of the following form:
b = (X'*X)⁻¹*X'*y
Linear models
• Estimation problem: example 1
◦ Assume we have the following simple problem
10 = 3*β + e1
12 = 2*β + e2
◦ 2 equations – 1 unknown => no solution
◦ We choose an estimator of β that minimizes the
sum of the squared errors:
min (e1² + e2²) = min [(10 – 3*β)² + (12 – 2*β)²]
=> 2*(10-3*b)*(-3) + 2*(12-2*b)*(-2) = 0
=> b = 54/13
Linear models
• Estimation problem: example 1
◦ Using the matrix notation:
y' = (10 12), X' = (3 2),
b = (b), e' = (e1 e2)
◦ X'X = 13 => (X'X)⁻¹ = 1/13
X'y = 3*10 + 2*12 = 54
◦ => b = (X'X)⁻¹ * X'y = 54/13
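The same computation can be checked numerically in R, applying the least-squares formula directly (a small sketch):

```r
# The two "equations": 10 = 3*beta + e1, 12 = 2*beta + e2
y <- c(10, 12)
X <- matrix(c(3, 2), nrow=2)            # one column: the coefficients of beta
b <- solve(t(X) %*% X) %*% t(X) %*% y   # least-squares estimate, equals 54/13
```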
Linear models
• Estimation problem: example 2
◦ The simplest linear model:
yi = µ + ei
◦ In matrix form: y = X*β + e
• y' = (y1 y2 … yn)
• X' = (1 1 … 1)
• β = (µ)
• e' = (e1 e2 … en)
◦ The estimator b of β is then: b = (X'*X)⁻¹*X'*y
Linear models
• Estimation problem: example 2
◦ Let's compute b:
• X'*X = n => (X'*X)⁻¹ = 1/n
• X'*y = y1 + y2 + … + yn = Σ yi
• b = (X'*X)⁻¹ * X'*y = 1/n * Σ yi = m
◦ Let's compute (y-X*b)'*(y-X*b)/(n-r(X))
• (X*b)' = (m m … m)
• (y-X*b)' = (y1-m y2-m … yn-m)
• (y-X*b)'*(y-X*b) = (y1-m)² + (y2-m)² + … + (yn-m)²
= Σ (yi-m)²
• r(X) = # of independent rows of X = 1
Linear models
• Estimation problem: example 2
◦ Consequently:
• (y-X*b)'*(y-X*b)/(n-r(X)) = Σ (yi-m)²/(n-1) = s²
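This can be verified numerically in R: with X a column of ones, the least-squares machinery reproduces the sample mean and the sample variance (the data vector here is just for illustration):

```r
y <- c(4.1, 4.6, 4.9, 5.2, 5.4)
n <- length(y)
X <- matrix(1, nrow=n)                             # X' = (1 1 ... 1)
b <- solve(t(X) %*% X) %*% t(X) %*% y              # = mean(y)
s2 <- t(y - X %*% b) %*% (y - X %*% b) / (n - 1)   # = var(y), since r(X) = 1
```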
Linear models
• Testing problem:
◦ Most of the (null) hypotheses we might want to
test can be written as:
H0: L*β = c
where L and c are a known matrix and vector,
respectively. This is known as the « general linear
hypothesis ».
Linear models
• Testing problem (cont'd):
◦ A general test can be devised, based on an F statistic, for such hypotheses:
• G is a generalized inverse of X'*X. If the hypothesis is « testable », all G provide the same F value
• q = # of independent lines of L
• The hypothesis is of course embedded in the statistic
• The denominator is the estimator of σ².
• Normality assumptions are necessary to obtain the F distribution
Fq,n-r(X) = [(L*b-c)’*(L*G*L’)-1*(L*b-c)/q] / [(y-X*b)’*(y-X*b)/(n-r(X))]
Examples of linear models
• A simple linear regression:
◦ Let's consider the following dataset:
◦ A question of interest: is there a significant
relation between weights and weeks ?
Week Weight
1 4.1
2 4.6
3 4.9
4 5.2
5 5.4
Examples of linear models
• A simple linear regression:
◦ A first answer is to look at a plot of weights
versus weeks. This can be achieved using R:
> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
> plot(weeks,weights)
Examples of linear models
• A simple linear regression: (figure: scatter plot of weights versus weeks)
Examples of linear models
• A simple linear regression: there is a clear,
almost linear, increasing trend
◦ This could be modeled using a classical linear
regression:
Y(i) = β0 + β1*X(i) + e(i)
or, using the matrix notation:
Y = X*β + e where β = [ β0 ]
                      [ β1 ]
Examples of linear models
• A simple linear regression:
◦ Computing the estimators b of β.
• X and y are easy to obtain:
    [ b0 ]
b = [ b1 ] = (X'*X)⁻¹ * X'*y

    [ 1  X(1) ]   [ 1  1 ]        [ y(1) ]   [ 4.1 ]
X = [ 1  X(2) ] = [ 1  2 ] ,  y = [ y(2) ] = [ 4.6 ]
    [ ⋮    ⋮  ]   [ ⋮  ⋮ ]        [   ⋮  ]   [  ⋮  ]
    [ 1  X(5) ]   [ 1  5 ]        [ y(5) ]   [ 5.4 ]
Examples of linear models
• A simple linear regression:
◦ Computing the estimators b of β.
> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
> Y<-weights
> X<-matrix(c(rep(1,5),weeks),byrow=F,nr=5)
> b<-solve(t(X)%*%X)%*%t(X)%*%Y
> b
     [,1]
[1,] 3.88
[2,] 0.32
> abline(b[1],b[2],col="red")
Examples of linear models
• A simple linear regression:
◦ Computing the estimators b of β. (figure: data with fitted regression line)
Examples of linear models
• A simple linear regression:
◦ Testing a hypothesis: β1 = 0.
• The hypothesis can be put in the form L*β = c
as follows:
• L = (0,1), c = 0
• Next, we can use these elements in the formula
• Note that: q = 1 and r(X) = 2
Fq,n-r(X) = [(L*b-c)’*(L*G*L’)-1*(L*b-c)/q] / [(y-X*b)’*(y-X*b)/(n-r(X))]
Examples of linear models
• A simple linear regression:
◦ Testing a hypothesis: β1 = 0.
> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
> Y<-weights
> n<-length(Y)
> X<-matrix(c(rep(1,5),weeks),byrow=F,nr=5)
> G<-solve(t(X)%*%X)
> b<-G%*%t(X)%*%Y
> L<-matrix(c(0,1),nr=1)
> c<-c(0)
> hypo<-L%*%b-c
> numer<-t(hypo)%*%solve(L%*%G%*%t(L))%*%hypo
> denom<-t(Y-X%*%b)%*%(Y-X%*%b)
> F<-(numer/1)/(denom/(n-2))
> pf(F,1,n-2,lower.tail=FALSE)
            [,1]
[1,] 0.001857831
Examples of linear models
• A simpler solution to linear regression:
◦ Testing a hypothesis: β1 = 0.
> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
# 'lm' stands for 'linear models'
> lr<-lm(weights~weeks)
> summary(lr)

Call:
lm(formula = weights ~ weeks)

Residuals:
    1     2     3     4     5
-0.10  0.08  0.06  0.04 -0.08
Examples of linear models
• A simpler solution to linear regression (cont'd):
◦ Testing a hypothesis: β1 = 0.
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  3.88000    0.10132   38.29 3.92e-05 ***
weeks        0.32000    0.03055   10.47  0.00186 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09661 on 3 degrees of freedom
Multiple R-squared: 0.9734, Adjusted R-squared: 0.9645
F-statistic: 109.7 on 1 and 3 DF, p-value: 0.001858
Examples of linear models
• The classical solution for linear regression:
◦ Testing a hypothesis: β1 = 0.
> weeks<-1:5
> weights<-c(4.1,4.6,4.9,5.2,5.4)
# b = Σ(X-Xm)*(Y-Ym) / Σ(X-Xm)²
> xm<-mean(weeks)
> x<-weeks-xm
> ym<-mean(weights)
> y<-weights-ym
> b<-sum(x*y)/sum(x**2)
> b
[1] 0.32
> SCR<-b*sum(x*y)
> SCT<-sum(y*y)
> SCE<-SCT-SCR
> dfR<-1
> dfE<-length(weights)-2
> F<-(SCR/dfR)/(SCE/dfE)
> pf(F,dfR,dfE,lower.tail=FALSE)
[1] 0.001857831
Examples of linear models
• A simple analysis of variance:
◦ As a second example, consider these data, with
horses' heart rates:
Ardennes Warm Half
106.6 115.4 100.2
100.8 97.8 102.1
110.9 120.3 99.6
114.5 98.2 103.8
115.9 113.2 100.7
91.9 107.6
95.0
Examples of linear models
• A simple analysis of variance:
◦ The question of interest here is: is there a link between heart rates and breed ?
◦ This question can be addressed using an ANOVA, i.e. the following model:
y(ij) = µ + α(i) + e(ij), i = 1, …, 3
or, using the matrix notation:
                      [ µ  ]
y = X*β + e where β = [ αA ]
                      [ αW ]
                      [ αH ]
Examples of linear models
• A simple analysis of variance
◦ Elements of the model
    [ 1  1  0  0 ]       [ 106.6 ]
X = [ 1  1  0  0 ] , y = [ 100.8 ]
    [ ⋮  ⋮  ⋮  ⋮ ]       [   ⋮   ]
    [ 1  0  0  1 ]       [  95.0 ]

         [ 18  6  5  7 ]          [ 1894.5 ]
⇒ X'*X = [  6  6  0  0 ] , X'*y = [  640.6 ]
         [  5  0  5  0 ]          [  544.9 ]
         [  7  0  0  7 ]          [  709.0 ]
Examples of linear models
• A simple analysis of variance
◦ Elements of the model (using R)
# Data
> X<-matrix(c(rep(c(1,1,0,0),6),rep(c(1,0,1,0),5),
+ rep(c(1,0,0,1),7)),byrow=TRUE,nr=18)
> Y<-c(106.6,100.8,110.9,114.5,115.9,91.9,115.4,
+ 97.8,120.3,98.2,113.2,100.2,102.1,99.6,103.8,
+ 100.7,107.6,95.0)
# Parameters estimators
> XX<-t(X)%*%X
> XY<-t(X)%*%Y
Examples of linear models
• A simple analysis of variance
◦ Computing estimators of β
It is easy to see that the X'X matrix is singular (the
first row is equal to the sum of the 3 following
ones) => use a generalized inverse
> library(MASS)
> G1<-ginv(XX)
> b1<-G1%*%XY
> b1
         [,1]
[1,] 79.25810
[2,] 27.50857
[3,] 29.72190
[4,] 22.02762
Examples of linear models
• A simple analysis of variance
◦ Computing estimators of β
Note that another generalized inverse could be
obtained « by hand », by setting the estimator
of µ = 0 (and inverting the remaining diagonal):
       [ 18  6  5  7 ]       [ 0   0    0    0  ]
X'*X = [  6  6  0  0 ]  ⇒ G = [ 0  1/6   0    0  ]
       [  5  0  5  0 ]       [ 0   0   1/5   0  ]
       [  7  0  0  7 ]       [ 0   0    0   1/7 ]
Examples of linear models
• A simple analysis of variance
◦ Computing estimators of β (using R)
# Another G
> G2<-matrix(rep(0,16),nr=4)
> G2[2,2]<-1/6
> G2[3,3]<-1/5
> G2[4,4]<-1/7
# Check generalized inverse
> XX%*%G2%*%XX
     [,1] [,2] [,3] [,4]
[1,]   18    6    5    7
[2,]    6    6    0    0
[3,]    5    0    5    0
[4,]    7    0    0    7
Examples of linear models
• A simple analysis of variance
◦ Computing estimators of β (using R) (cont'd)
# Solutions
> b2<-G2%*%XY
> b2
         [,1]
[1,]   0.0000
[2,] 106.7667
[3,] 108.9800
[4,] 101.2857
# i.e. the 3 breeds' averages
# Observe that:
> b1[1,1]+b1[2,1]
[1] 106.7667
> b2[1,1]+b2[2,1]
[1] 106.7667
Examples of linear models
• A simple analysis of variance
◦ Testing a first hypothesis:
• H0: µA = µW = µH
which can be rewritten as:
H0: (µA = µW & µA = µH) or (µA - µW = 0 & µA - µH = 0)
• In terms of the general linear hypothesis, this can be
written as:
                   [ µ  ]
[ 0  1  -1   0 ] * [ αA ]   [ 0 ]
[ 0  1   0  -1 ]   [ αW ] = [ 0 ]
                   [ αH ]
Examples of linear models
• A simple analysis of variance
◦ Testing a first hypothesis using R:
# Hypothesis 1
> L<-matrix(c(0,1,-1,0,0,1,0,-1),byrow=TRUE,nr=2)
> q<-2
> n<-18
> rX<-3
> num<-(t(L%*%b1)%*%solve(L%*%G1%*%t(L))%*%L%*%b1)/q
> den<-(t(Y-X%*%b1)%*%(Y-X%*%b1))/(n-rX)
# Test the hypothesis
> F<-num/den
> F
         [,1]
[1,] 1.549397
> pf(F,q,n-rX,lower.tail=FALSE)
          [,1]
[1,] 0.2445187
Examples of linear models
• A simple analysis of variance
◦ Easily testing the first hypothesis using R:
# Hypothesis 1
> breed<-factor(c(rep("A",6),rep("W",5),rep("H",7)))
> model<-lm(Y~breed)
> summary(model)

Call:
lm(formula = Y ~ breed)

Residuals:
     Min       1Q   Median       3Q      Max
-14.8667  -4.8964   0.3238   5.7907  11.3200
Examples of linear models
• A simple analysis of variance
◦ Easily testing the first hypothesis using R:
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  106.767      3.225  33.106 1.94e-15 ***
breedW         2.213      4.783   0.463    0.650
breedH        -5.481      4.395  -1.247    0.231
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.9 on 15 degrees of freedom
Multiple R-squared: 0.1712, Adjusted R-squared: 0.06071
F-statistic: 1.549 on 2 and 15 DF, p-value: 0.2445
Examples of linear models
• A simple analysis of variance
◦ Testing a second hypothesis using R:
# Hypothesis 2: H0: µ(H)=0.5*(µ(A)+µ(W))
> L<-matrix(c(0,-0.5,-0.5,1),byrow=TRUE,nr=1)
> q<-1
> n<-18
> rX<-3
> num<-(t(L%*%b1)%*%solve(L%*%G1%*%t(L))%*%L%*%b1)/q
> den<-(t(Y-X%*%b1)%*%(Y-X%*%b1))/(n-rX)
# Test the hypothesis
> F<-num/den
> F
         [,1]
[1,] 2.965257
> pf(F,q,n-rX,lower.tail=FALSE)
          [,1]
[1,] 0.1056209
Examples of linear models
• A more complex example:
◦ As a third example, consider this dataset
H St Ge Age HR H St Ge Age HR
1 T M 88 68 8 NT M 64 76
2 T M 96 64 9 NT M 77 75
3 T F 90 76 10 NT F 100 71
4 T F 73 71 11 NT F 75 85
5 T M 85 63 12 NT M 63 81
6 T F 99 63 13 NT M 73 80
7 T F 60 67 14 NT F 67 81
15 NT F 76 83
Examples of linear models
• A more complex example:
◦ Possible questions are:
• Is there any effect of training on the heart rate (HR) ?
• Is there an age effect and/or a gender effect on HR ?
• Is the (potential) training effect similar in males and in
females ?
◦ Answers:
• All these questions can be « easily » addressed using
a linear model…
Examples of linear models
• A more complex example:
◦ The model:
y(ijk) = µ + β*a(ijk) + τi + γj + (τγ)ij + e(ijk)
◦ Using the matrix notation:
y = X*β + e
where β' = (µ, β, τT, τNT, γF, γM, τγTF, τγTM, τγNTF, τγNTM)
Examples of linear models
• A more complex example:
◦ Obtaining X and y
    [ 68 ]       [ 1   88  1  0  0  1  0  1  0  0 ]
    [ 64 ]       [ 1   96  1  0  0  1  0  1  0  0 ]
    [ 76 ]       [ 1   90  1  0  1  0  1  0  0  0 ]
    [ 71 ]       [ 1   73  1  0  1  0  1  0  0  0 ]
    [ 63 ]       [ 1   85  1  0  0  1  0  1  0  0 ]
    [ 63 ]       [ 1   99  1  0  1  0  1  0  0  0 ]
    [ 67 ]       [ 1   60  1  0  1  0  1  0  0  0 ]
y = [ 76 ] , X = [ 1   64  0  1  0  1  0  0  0  1 ]
    [ 75 ]       [ 1   77  0  1  0  1  0  0  0  1 ]
    [ 71 ]       [ 1  100  0  1  1  0  0  0  1  0 ]
    [ 85 ]       [ 1   75  0  1  1  0  0  0  1  0 ]
    [ 81 ]       [ 1   63  0  1  0  1  0  0  0  1 ]
    [ 80 ]       [ 1   73  0  1  0  1  0  0  0  1 ]
    [ 81 ]       [ 1   67  0  1  1  0  0  0  1  0 ]
    [ 83 ]       [ 1   76  0  1  1  0  0  0  1  0 ]
Examples of linear models
• A more complex example:
◦ Performing computations (using R)
# Building matrices
> y<-c(68,64,76,71,63,63,67,76,75,71,85,81,80,81,83)
> X<-matrix(rep(0,150),nr=15)
> X[,1]<-rep(1,15)
> X[,2]<-c(88,96,90,73,85,99,60,64,77,
+ 100,75,63,73,67,76)
> X[,3]<-c(rep(1,7),rep(0,8))
> X[,4]<-1-X[,3]
> X[,5]<-c(0,0,1,1,0,1,1,0,0,1,1,0,0,1,1)
> X[,6]<-1-X[,5]
> X[,7]<-c(X[,5][1:7],rep(0,8))
> X[,8]<-c(X[,6][1:7],rep(0,8))
> X[,9]<-c(rep(0,7),X[,5][8:15])
> X[,10]<-c(rep(0,7),X[,6][8:15])
Examples of linear models
• A more complex example:
◦ Performing computations (using R) (cont'd)
# Computing estimators
> XX<-t(X)%*%X
> Xy<-t(X)%*%y
> G<-ginv(XX)
> b<-G%*%Xy
> b
            [,1]
 [1,] 38.1162191
 [2,] -0.1592766
 [3,] 15.6683053
 [4,] 22.4479138
 [5,] 20.1285345
 [6,] 17.9876846
 [7,]  8.1587098
 [8,]  7.5095955
 [9,] 11.9698247
[10,] 10.4780891
Examples of linear models
• A more complex example:
◦ Remarks:
• Other solutions could be obtained, using other
generalized inverses.
• Since regression parameters are « estimable », the
other solutions would give the same solution for β
(see notes for an example).
Examples of linear models
• A more complex example:
◦ Testing hypotheses (using R)
# test_H: a generic function to test hypotheses
test_H<-function(X,y,L,c) {
  library(MASS)
  XX<-t(X)%*%X
  G<-ginv(t(X)%*%X)
  b<-G%*%t(X)%*%y
  num<-t(L%*%b-c)%*%solve(L%*%G%*%t(L))%*%(L%*%b-c)
  den<-t(y-X%*%b)%*%(y-X%*%b)
  q<-dim(L)[1]
  n<-length(y)
  rX<-qr(XX)$rank
  F<-(num/q)/(den/(n-rX))
  pF<-pf(F,q,n-rX,lower.tail=FALSE)
  c(F,pF)
}
Examples of linear models
• A more complex example:
◦ a) Testing the regression coefficient
# Test: beta=0
> L<-matrix(c(0,1,rep(0,8)),nr=1)
> c<-matrix(c(0))
> test_H(X,y,L,c)
[1] 2.1324558 0.1749022
# No significant regression
◦ b) Testing the training effect
# Test: tau(T)-tau(NT)=0
> L<-matrix(c(0,0,1,-1,rep(0,6)),nr=1)
> c<-matrix(c(0))
> test_H(X,y,L,c)
[1] 14.951074581 0.003126119
# Significant training effect
Examples of linear models
• A word of caution:
◦ When testing the training effect, we actually
compare the means of the 2 groups (Trained –
Not Trained)
◦ The raw means embed information on other
effects of the model, which might not be
desirable…
• This can be shown by replacing the observations by the
assumed model and averaging over each group (see
next slide)
Examples of linear models
• A word of caution:
ȳ(T) = µ + β*ā(T) + τT + (4*γF + 3*γM)/7 + (4*τγTF + 3*τγTM)/7 + ē(T)

ȳ(NT) = µ + β*ā(NT) + τNT + (γF + γM)/2 + (τγNTF + τγNTM)/2 + ē(NT)

ȳ(T) - ȳ(NT) = β*(ā(T) - ā(NT)) + (τT - τNT) + (γF - γM)/14
             + (8*τγTF + 6*τγTM - 7*τγNTF - 7*τγNTM)/14 + (ē(T) - ē(NT))
Examples of linear models
• A word of caution:
◦ This complicated expression shows that:
� Due to the non-balanced nature of the dataset,
comparing training statuses involves the gender
effect
� The presence of a covariate might induce differences
if both groups are not balanced wrt age
� The potential interactions between training status
and gender might render comparison of status
meaningless.
Examples of linear models
• A possible solution:
◦ Use « Least Square Means » (LSM)
• We first obtain averages (LSM) on subgroups
• We average these means to obtain marginal LSM
• Example: LSM(T,F) = ?, LSM(T) = ?
Heart rates   Trained        Not Trained
Females       76,71,63,67    71,85,81,83
Males         68,64,63       76,75,81,80

ȳ(TF) = µ + β*ā + τT + γF + τγTF + ē(TF)
(ā: the age, conventionally averaged over the whole dataset)
Examples of linear models
• A possible solution:
◦ Use « Least Square Means » (LSM)
• We first obtain averages on subgroups
# Compute L for the 4 subgroups
> L_TF<-matrix(c(1,mean(X[,2]),1,0,1,0,1,0,0,0),nr=1)
> L_TM<-matrix(c(1,mean(X[,2]),1,0,0,1,0,1,0,0),nr=1)
> L_NTF<-matrix(c(1,mean(X[,2]),0,1,1,0,0,0,1,0),nr=1)
> L_NTM<-matrix(c(1,mean(X[,2]),0,1,0,1,0,0,0,1),nr=1)
# Compute LSM for the 4 subgroups
> LSM_TF<-L_TF%*%b
> LSM_TM<-L_TM%*%b
> LSM_NTF<-L_NTF%*%b
> LSM_NTM<-L_NTM%*%b
Examples of linear models
• A possible solution:
◦ Use « Least Square Means » (LSM)
• We then average to obtain marginal LSM
# Compute LSM for the main effects
> LSM_T<-0.5*(LSM_TF+LSM_TM)
> LSM_NT<-0.5*(LSM_NTF+LSM_NTM)
> LSM_F<-0.5*(LSM_TF+LSM_NTF)
> LSM_M<-0.5*(LSM_TM+LSM_NTM)
Examples of linear models
• A possible solution:
◦ Use « Least Square Means » (LSM)
• Finally, since they are linear combinations of the
parameters, LSM (or differences of LSM) can be
tested using the general linear hypothesis test given
above !
• Example: let's compare the T & NT groups
LSM_T - LSM_NT
= 0.5*(LSM_TM + LSM_TF - LSM_NTF - LSM_NTM)
= 0.5*(L_TM + L_TF - L_NTF - L_NTM)*b
= (0,0,1,-1,0,0,0.5,0.5,-0.5,-0.5)*b
Examples of linear models
• A possible solution:
◦ Use « Least Square Means » (LSM)
• (0,0,1,-1,0,0,0.5,0.5,-0.5,-0.5)*b
# Compute difference of LSM between T and NT
> L<-matrix(c(0,0,1,-1,0,0,0.5,0.5,-0.5,-0.5),nr=1)
> c<-0
> test_H(X,y,L,c)
[1] 14.951074581 0.003126119
# Same result as before, showing that the obtained
# solution is corrected for the other effects of the
# model !
A (even) more complex situation
• Imagine the following situation
◦ 2 groups of 2 individuals are followed
longitudinally and 3 measures are taken on
each individual at 3 specific times (see figure)
A more complex situation
• Some questions are:
◦ Is there a significant difference in the
measures between the groups ?
◦ Is there a significant difference in the
measures between the times ?
• If yes, for which times ?
◦ [Are the dynamic behaviours of the 2 groups
different ?]
A more complex situation
• These questions can easily be addressed
using linear models, as done above.
◦ Omitting the interaction for simplicity:
    [  89.4 ]       [ 1  1  0  1  0  0 ]       [ µ  ]
    [ 106.4 ]       [ 1  1  0  0  1  0 ]       [ γ1 ]
    [ 116.3 ]       [ 1  1  0  0  0  1 ]       [ γ2 ]
    [ 103.7 ]       [ 1  1  0  1  0  0 ]   β = [ τ1 ]
    [ 113.7 ]       [ 1  1  0  0  1  0 ]       [ τ2 ]
y = [ 118.0 ] , X = [ 1  1  0  0  0  1 ] ,     [ τ3 ]
    [  91.5 ]       [ 1  0  1  1  0  0 ]
    [  89.8 ]       [ 1  0  1  0  1  0 ]
    [ 110.6 ]       [ 1  0  1  0  0  1 ]
    [  85.0 ]       [ 1  0  1  1  0  0 ]
    [  88.5 ]       [ 1  0  1  0  1  0 ]
    [  97.2 ]       [ 1  0  1  0  0  1 ]
A more complex situation
• LM analysis, using R (1):
#
# Observations
#
> y<-c(89.4,106.4,116.3,103.7,113.7,118.0,91.5,89.8,
+ 110.6,85.0,88.5,97.2)
#
# Design matrix
#
> X<-matrix(rep(0,72),nr=12)
> X[,1]<-1
> X[1:6,2]<-1
> X[7:12,3]<-1
> X[c(1,4,7,10),4]<-1
> X[c(2,5,8,11),5]<-1
> X[c(3,6,9,12),6]<-1
A more complex situation
• LM analysis, using R (2):
#
# Compute solutions
#
> XX<-t(X)%*%X
> Xy<-t(X)%*%y
> library(MASS)
> b<-ginv(XX)%*%Xy
#
# Test of group effect
#
> L<-matrix(c(0,1,-1,0,0,0),nr=1)
> c<-matrix(c(0),nr=1)
> test_H(X,y,L,c)
[1] 14.89196727 0.00481566
#
# Significant group effect (p = 0.0048)
A more complex situation
• LM analysis, using R (3):
#
# Test of time effect
#
> L<-matrix(c(0,0,0,1,-1,0,0,0,0,0,1,-1),nr=2,byrow=T)
> c<-matrix(c(0,0),nr=2)
> test_H(X,y,L,c)
[1] 8.25934879 0.01133366
# Significant time effect (p = 0.0113)

# Or, equivalently:
> L<-matrix(c(0,0,0,1,-1,0,0,0,0,1,0,-1),nr=2,byrow=T)
> c<-matrix(c(0,0),nr=2)
> test_H(X,y,L,c)
[1] 8.25934879 0.01133366
# Significant time effect (p = 0.0113)
A more complex situation
• LM analysis, using R (4):
# Obtain LSM for groups
> L_G1T1<-matrix(c(1,1,0,1,0,0),nr=1)
> L_G1T2<-matrix(c(1,1,0,0,1,0),nr=1)
> L_G1T3<-matrix(c(1,1,0,0,0,1),nr=1)
> L_G2T1<-matrix(c(1,0,1,1,0,0),nr=1)
> L_G2T2<-matrix(c(1,0,1,0,1,0),nr=1)
> L_G2T3<-matrix(c(1,0,1,0,0,1),nr=1)
> LSM_G1T1<-L_G1T1%*%b
> LSM_G1T2<-L_G1T2%*%b
> LSM_G1T3<-L_G1T3%*%b
> LSM_G2T1<-L_G2T1%*%b
> LSM_G2T2<-L_G2T2%*%b
> LSM_G2T3<-L_G2T3%*%b
> LSM_G1<-(LSM_G1T1+LSM_G1T2+LSM_G1T3)/3
> LSM_G2<-(LSM_G2T1+LSM_G2T2+LSM_G2T3)/3
A more complex situation
• LM analysis, using R (5):
#
# Show LSM for groups, and difference
#
> c(LSM_G1,LSM_G2,LSM_G1-LSM_G2)
[1] 107.91667  93.76667  14.15000
#
# Test whether true difference is 0
#
> L_delta<-(L_G1T1+L_G1T2+L_G1T3-L_G2T1-L_G2T2-L_G2T3)/3
> c_delta<-matrix(c(0),nr=1)
> LSM_delta<-L_delta%*%b
#
# Test the difference
#
> test_H(X,y,L_delta,c_delta)
[1] 14.89196727 0.00481566
# Of course identical to previous groups test
A more complex situation
• LM analysis, summary:
◦ Everything seems fine, but...
• Independence assumptions have clearly been violated (measures taken on the same individual are likely to be correlated)
• Erroneously assuming independence might:
  - Underestimate the random residual variation (σ²e)
  - Consequently, overestimate effects...
  - And thus, increase false positive rates
◦ So, a question of interest is: how can we take these correlations into account ?
A more complex situation
• Idea: use a more general family of linear
models, named « mixed models »,
allowing for correlations
◦ « Mixed » refers to the simultaneous use of
« fixed » and « random » effects
• Fixed: this effect would be the same if we repeated
the experiment
  - Example: groups, times
• Random: this effect is randomly sampled in a
population of possible levels
  - Example: animals
• Matrix formulation:
y = X*β + Z*u + e
◦ Z = design (or incidence) matrix, linking the
random effects to the observations.
• known
◦ u = vector of random effects
• unknown, values to be predicted
• assumed to be random samples from N(0,I*σ²u)
  - so, var(ui) = σ²u for all i
  - and cov(ui,uj) = 0 for all combinations of i and j ≠ i (i.e.
    individuals are assumed to be un(co)related)
• σ²u is an unknown parameter, to be estimated
A more complex situation
• Matrix formulation (cont'd):
y = X*β + Z*u + e
◦ e = vector of random residuals
• unknown
• assumed to be random samples from N(0,I*σ²e)
  - so, var(ei) = σ²e for all i
  - and cov(ei,ej) = 0 for all combinations of i and j ≠ i (i.e.
    residuals are assumed to be un(co)related)
• σ²e is an unknown parameter, to be estimated
◦ Furthermore, we will assume:
• Cov(ui,ej) = 0 for all i,j
A more complex situation
• Variances and covariances
◦ u ~ N(0;G) with G = I*σ²u
◦ e ~ N(0;R) with R = I*σ²e
◦ V = V(y)
= V(X*β + Z*u + e)
= V(X*β) + V(Z*u) + V(e)
+ 2*Cov(X*β,Z*u) + 2*Cov(X*β,e)
+ 2*Cov(Z*u,e)
= 0 + Z*V(u)*Z' + R + 0 + 0 + 2*Z*Cov(u,e)
= Z*G*Z' + R + 0
= Z*G*Z' + R
A more complex situation
• Variances and covariances: example
◦ Back to our problem...
#
# Random effect design matrix
#
> Z<-matrix(rep(0,48),nr=12)
> Z[c(1:3),1]<-1
> Z[c(4:6),2]<-1
> Z[c(7:9),3]<-1
> Z[c(10:12),4]<-1
#
# Known correlation matrices
# Arbitrary values are used to start with
#
> sigma_2_a<-10.0
> sigma_2_e<-20.0
> G<-diag(4)*sigma_2_a  # No correlation between animals
> R<-diag(12)*sigma_2_e # No correlation between residuals
• Variances and covariances: example
◦ Back to our problem...
#
# Observations variance-covariance matrix
#
> V<-Z%*%G%*%t(Z)+R
> V
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
 [1,]   30   10   10    0    0    0    0    0    0     0     0     0
 [2,]   10   30   10    0    0    0    0    0    0     0     0     0
 [3,]   10   10   30    0    0    0    0    0    0     0     0     0
 [4,]    0    0    0   30   10   10    0    0    0     0     0     0
 [5,]    0    0    0   10   30   10    0    0    0     0     0     0
 [6,]    0    0    0   10   10   30    0    0    0     0     0     0
 [7,]    0    0    0    0    0    0   30   10   10     0     0     0
 [8,]    0    0    0    0    0    0   10   30   10     0     0     0
 [9,]    0    0    0    0    0    0   10   10   30     0     0     0
[10,]    0    0    0    0    0    0    0    0    0    30    10    10
[11,]    0    0    0    0    0    0    0    0    0    10    30    10
[12,]    0    0    0    0    0    0    0    0    0    10    10    30
• Variances and covariances: example
◦ More generally, in our problem, V is block-diagonal,
with one 3x3 block B per individual:
    [ B 0 0 0 ]             [ σu²+σe²   σu²       σu²    ]
V = [ 0 B 0 0 ]  where  B = [  σu²     σu²+σe²    σu²    ]
    [ 0 0 B 0 ]             [  σu²      σu²      σu²+σe² ]
    [ 0 0 0 B ]
and all entries linking observations on different individuals are 0.
• We can see that:
◦ Introducing a random individual effect in the
model has introduced a correlation
between observations on the same
individual
◦ The price to be paid:
• More parameters (u, σ²u)
• Much more complicated resolution => use of
specialized software (SAS, AIREML, ...)
• Some details given in the appendix for the R solution
A more complex situation
• An alternative solution:
◦ Instead of introducing an individual effect in
order to correlate the observations,
correlations can be introduced directly in the
R matrix
• No random effect anymore (Z = 0)
• V = Z*G*Z' + R = R
• Two parameters (σ1² & σ2²) need to be estimated, with
R(i,i) = σ1² + σ2² for all i
R(i,j) = σ1² for all i ≠ j (same animal)
• An alternative solution (cont'd):
• Other equivalent coding: R = K*σ², with
σ² = σu² + σe², ρ = σu²/(σu² + σe²)
K(i,i) = 1 for all i
K(i,j) = ρ for all i ≠ j (same animal)
• This correlation structure is referred to as « compound
symmetry » (CS). It involves only 2 parameters (σ² and ρ).
A more complex situation
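As a concrete illustration (a sketch, not part of the original slides), the CS matrix R = K*σ² can be built in R for the 4-animals × 3-measures example, reusing the values σu² = 10 and σe² = 20 from earlier:

```r
# Sketch: CS covariance matrix for 4 animals x 3 repeated measures,
# using the illustrative values sigma_u^2 = 10 and sigma_e^2 = 20
sigma_2_u <- 10
sigma_2_e <- 20
sigma2 <- sigma_2_u + sigma_2_e            # sigma^2 = sigma_u^2 + sigma_e^2
rho <- sigma_2_u / sigma2                  # rho = sigma_u^2 / sigma^2
Kblock <- matrix(rho, nrow = 3, ncol = 3)  # rho for all pairs within an animal
diag(Kblock) <- 1                          # K(i,i) = 1
K <- kronecker(diag(4), Kblock)            # block-diagonal: one block per animal
R <- K * sigma2
R[1:4, 1:4]  # upper-left corner reproduces the 30/10/0 pattern of V above
```

This reproduces the same 12×12 matrix V obtained earlier with Z*G*Z' + R, which is why the two codings are equivalent.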
� An alternative solution (cont’d):
◦ The latter approach is more flexible, because R structures other than CS can be introduced
� For example, it could be expected that measures taken at times 1 and 3 should be less correlated than measures taken at times 1 and 2
� A possible structure to model this is:
σ² = σu² + σe², ρ = σu²/(σu² + σe²)
K(i,i) = 1 for all i
K(i,j) = ρ^|i-j| for all i,j (same animal)
� This type of structure is named "first-order autoregressive" (AR(1)); it also involves only 2 parameters (σ² and ρ)
A more complex situation
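A small sketch of the AR(1) correlation block, with an arbitrary illustrative value ρ = 0.5 (not an estimate from the data):

```r
# Sketch: AR(1) correlation block for 3 time points; rho is illustrative
rho <- 0.5
times <- 1:3
Kblock <- rho^abs(outer(times, times, "-"))  # K(i,j) = rho^|i-j|
Kblock
# corr(times 1, 3) = rho^2 = 0.25 < corr(times 1, 2) = 0.5, as required
K <- kronecker(diag(4), Kblock)              # full 12 x 12 matrix, 4 animals
```

The decay ρ^|i-j| is what distinguishes AR(1) from CS, where all within-animal correlations are equal.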
� Model selection refers to the procedures used to select the "best" model for a given dataset
� Various approaches have been proposed, and we’ll show one as an example
� Although almost fully automated procedures exist and are implemented, nothing will replace the experimenter’s knowledge of the problem and sound reasoning...
A word on model selection
� We can re-use a previous example to
show the rationale:
A word on model selection
H   St  Ge  Age  HR
1   T   M    88  68
2   T   M    96  64
3   T   F    90  76
4   T   F    73  71
5   T   M    85  63
6   T   F    99  63
7   T   F    60  67
8   NT  M    64  76
9   NT  M    77  75
10  NT  F   100  71
11  NT  F    75  85
12  NT  M    63  81
13  NT  M    73  80
14  NT  F    67  81
15  NT  F    76  83
� Several candidate models can be compared to select the "best" one:
(1) yi = m + ei
(2) yi = m + b*Ai + ei
(3) yi = m + b*Ai + Gi + ei
(4) yi = m + b*Ai + Gi + Si + ei
(5) yi = m + b*Ai + Gi + Si + Gi*Si + ei
(6) yi = m + Gi + Si + Gi*Si + ei
(7) yi = m + Gi + Si + ei
...
Possible models
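The candidate models can be fitted directly with lm(); the sketch below re-enters the data from the table (the variable names hr, age, gender, status are assumptions for illustration):

```r
# Sketch: fitting candidate models (1)-(5) with lm(), data from the table
hr     <- c(68,64,76,71,63,63,67,76,75,71,85,81,80,81,83)
age    <- c(88,96,90,73,85,99,60,64,77,100,75,63,73,67,76)
gender <- factor(c("M","M","F","F","M","F","F","M","M","F","F","M","M","F","F"))
status <- factor(c(rep("T",7), rep("NT",8)))
m1 <- lm(hr ~ 1)                      # (1) mean only
m2 <- lm(hr ~ age)                    # (2) + age regression
m3 <- lm(hr ~ age + gender)           # (3) + gender
m4 <- lm(hr ~ age + gender + status)  # (4) + status
m5 <- lm(hr ~ age + gender * status)  # (5) + gender x status interaction
```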
� A model is nested within another if it is built with a subset of the factors of the other model:
(1) => (2) => (3) => (4) => (5)
(7) => (6) => (5)
but, for example:
(6) ≠> (4)
� Comparing nested (linear) models can be done using an F test (see next slide)
Nested models
� Idea: if a new factor does not contribute significantly to the model, the extra fit provided by this factor is an estimator of the error variance.
� Remark: when models are not nested, other criteria (such as the Akaike Information Criterion, AIC) must be used.
F test for nested models selection
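For non-nested models, a hedged sketch of an AIC comparison, e.g. models (2) and (6), using the same data as in the table (variable names are illustrative):

```r
# Sketch: comparing non-nested models (2) and (6) with AIC; lower is better
hr     <- c(68,64,76,71,63,63,67,76,75,71,85,81,80,81,83)
age    <- c(88,96,90,73,85,99,60,64,77,100,75,63,73,67,76)
gender <- factor(c("M","M","F","F","M","F","F","M","M","F","F","M","M","F","F"))
status <- factor(c(rep("T",7), rep("NT",8)))
m2 <- lm(hr ~ age)              # model (2)
m6 <- lm(hr ~ gender * status)  # model (6)
AIC(m2, m6)                     # not nested: no F test, use AIC instead
```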
Fq,n-r(Xc) = [(Xc*bc − Xr*br)'*y/q] / [(y − Xc*bc)'*(y − Xc*bc)/(n − r(Xc))], with q = r(Xc) − r(Xr)
� Example: comparing (3) to (2)
F test for nested models selection
# Reduced model
> Xr<-matrix(rep(0,30),nr=15)
> Xr[,1]<-rep(1,15)
> Xr[,2]<-age
# Complete model
> Xc<-matrix(rep(0,45),nr=15)
> Xc[,1]<-rep(1,15)
> Xc[,2]<-age
> Xc[,3]<-gender
# Solutions
> library(MASS)
> br<-ginv(t(Xr)%*%Xr)%*%t(Xr)%*%hr
> bc<-ginv(t(Xc)%*%Xc)%*%t(Xc)%*%hr
# Test
> q<-1    # r(Xc) - r(Xr)
> rxc<-3  # r(Xc)
> num<-t(Xc%*%bc-Xr%*%br)%*%hr/q
> den<-t(hr-Xc%*%bc)%*%(hr-Xc%*%bc)/(15-rxc)
> pf(num/den,q,15-rxc,lower.tail=FALSE)
[,1][1,] 0.4151881
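As a cross-check (a sketch, not from the slides), the same nested comparison of models (2) and (3) can be done with the built-in anova() function on two lm() fits, which computes the same F test:

```r
# Sketch: nested-model F test with anova(); data re-entered from the table
hr     <- c(68,64,76,71,63,63,67,76,75,71,85,81,80,81,83)
age    <- c(88,96,90,73,85,99,60,64,77,100,75,63,73,67,76)
gender <- factor(c("M","M","F","F","M","F","F","M","M","F","F","M","M","F","F"))
mr <- lm(hr ~ age)           # reduced model (2)
mc <- lm(hr ~ age + gender)  # complete model (3)
anova(mr, mc)                # F test on 1 and 12 degrees of freedom
```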
Appendix
� Procedure:
1. Obtain variance component (σ², ρ, ...) estimates using specific methods
� The preferred method today is called REML (REstricted Maximum Likelihood)
2. Obtain solutions using these estimates
� A practical method is to use the so-called Henderson's mixed model equations (MME)
3. Perform (approximate) testing
� Use a modified version of the general linear hypothesis described above
Appendix: solving mixed models
� Procedure – 1) REML: AI algorithm
◦ θ(k+1) = θ(k) + AI⁻¹*SC
� k = iteration #
� θ = (σu², σe²)' and θ(0) ~ arbitrary
� P = projection matrix = V⁻¹ − V⁻¹*X*(X'*V⁻¹*X)⁻*X'*V⁻¹
� SC = "score vector":
SC(1) = 0.5*[y'*P*Z*G*Z'*P*y − trace(Z*G*Z'*P)]
SC(2) = 0.5*[y'*P*P*y − trace(P)]
� AI = "average information matrix"
Appendix: solving mixed models
AI(1,1) = 0.5*y'*P*Z*G*Z'*P*Z*G*Z'*P*y
AI(1,2) = AI(2,1) = 0.5*y'*P*Z*G*Z'*P*P*y
AI(2,2) = 0.5*y'*P*P*P*y
� Procedure – 1) REML: R implementation
Appendix: solving mixed models
#
# REML estimators computation
#
library(MASS)
# Init computations
diff<-1000.0
AI<-matrix(rep(0,4),nr=2)
SC<-matrix(rep(0,2),nr=2)
sigma_2_u<-10.0
sigma_2_e<-20.0
sigma_2<-c(sigma_2_u,sigma_2_e)
# Loop while estimates differ
while (diff>0.01) {
  # Loop body => see next slide
}
� Procedure – 1) REML: R implementation
Appendix: solving mixed models
# Loop body (1)
# Variance of the observations
ZGZ<-Z%*%G%*%t(Z)
V<-(ZGZ)*sigma_2_u+R*sigma_2_e
# P matrix
Vi<-solve(V)
XVi<-t(X)%*%Vi
P<-Vi-t(XVi)%*%(ginv(XVi%*%X)%*%XVi)
# Partial computations
Py<-P%*%y
ZGZP<-ZGZ%*%P
ZGZPy<-ZGZ%*%Py
# Continued on next slide...
� Procedure – 1) REML: R implementation
Appendix: solving mixed models
# Loop body (2)
# Traces
trP<-0
trPZGZ<-0
for (i in 1:dim(P)[1]) {
  trP<-trP+P[i,i]
  trPZGZ<-trPZGZ+ZGZP[i,i]
}
# AI matrix
AI[1,1]<-0.5*t(ZGZPy)%*%(P%*%ZGZPy)
AI[1,2]<-0.5*t(Py)%*%(P%*%ZGZPy)
AI[2,2]<-0.5*t(Py)%*%(P%*%Py)
AI[2,1]<-AI[1,2]
# Score vector
SC[1]<-0.5*(t(Py)%*%ZGZPy-trPZGZ)
SC[2]<-0.5*(t(Py)%*%Py-trP)
� Procedure – 1) REML: R implementation
Appendix: solving mixed models
# Loop body (3)
# New estimators
new_sigma_2<-sigma_2+solve(AI)%*%SC
new_sigma_2_u<-new_sigma_2[1]
new_sigma_2_e<-new_sigma_2[2]
# Difference
diff<-(sigma_2[1]-new_sigma_2[1])**2
diff<-diff+(sigma_2[2]-new_sigma_2[2])**2
sigma_2<-new_sigma_2
}
sigma_2
         [,1]
[1,] 18.82629
[2,] 26.21528
� Procedure – 2) MME: method
◦ BLUE (β̂) and BLUP (û) can be obtained using:

    | X'*R⁻¹*X   X'*R⁻¹*Z       |   | β̂ |   | X'*R⁻¹*y |
    | Z'*R⁻¹*X   Z'*R⁻¹*Z + G⁻¹ | * | û | = | Z'*R⁻¹*y |

◦ In our case, this can be written:

    | X'*X   X'*Z                |   | β̂ |   | X'*y |
    | Z'*X   Z'*Z + I*(σe²/σu²)  | * | û | = | Z'*y |

Appendix: solving mixed models
� Procedure – 2) MME: R implementation
Appendix: solving mixed models
MMEl<-matrix(rep(0,100),nr=10)
MMEr<-matrix(rep(0,10),nr=10)
MMEl[1:6,1:6]<-t(X)%*%X
MMEl[1:6,7:10]<-t(X)%*%Z
MMEl[7:10,1:6]<-t(Z)%*%X
MMEl[7:10,7:10]<-t(Z)%*%Z+solve(G)*(sigma_2[2]/sigma_2[1])
MMEr[1:6]<-t(X)%*%y
MMEr[7:10]<-t(Z)%*%y
sol<-ginv(MMEl)%*%MMEr
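The MME can also be illustrated on a small self-contained toy example (a sketch: all numbers are invented for illustration, and λ = σe²/σu² is taken as known rather than estimated):

```r
# Toy sketch of Henderson's MME: 2 fixed-effect levels, 2 random animal
# effects, 4 observations; all values are illustrative
y <- c(10, 12, 11, 15)
X <- matrix(c(1,0, 1,0, 0,1, 0,1), nrow = 4, byrow = TRUE)  # fixed effects
Z <- matrix(c(1,0, 0,1, 1,0, 0,1), nrow = 4, byrow = TRUE)  # animal effects
lambda <- 20/10  # sigma_e^2 / sigma_u^2, assumed known here
LHS <- rbind(cbind(t(X) %*% X, t(X) %*% Z),
             cbind(t(Z) %*% X, t(Z) %*% Z + diag(2) * lambda))
RHS <- rbind(t(X) %*% y, t(Z) %*% y)
sol <- solve(LHS, RHS)  # first 2 entries: BLUE; last 2: BLUP
sol                     # BLUEs 11 and 13; BLUPs -0.75 and 0.75
```

Note how the BLUPs are "shrunken" towards 0 by the λ term added to Z'Z, which is the practical effect of treating the animal effect as random.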
� Procedure – 3) Approximate testing
◦ The approximation comes from the fact that
only estimators of β and u are available
◦ The denominator degrees of freedom are estimated using various methods (see details in the literature) well beyond the scope of this text...!
◦ The estimator is a simple extension of the method for fixed-effects-only models
Appendix: solving mixed models
Fq,v = [(L*b̂ − c)'*(L*Ĉ*L')⁻¹*(L*b̂ − c)]/q
� Procedure – 3) Approximate testing
Appendix: solving mixed models
# Testing group effects
> L<-matrix(c(0,1,-1,rep(0,7)),nr=1)
> Lsol<-L%*%sol
> C_hat<-ginv(MMEl/sigma_2[2])
> LCL<-L%*%(C_hat%*%t(L))
> LCLi<-ginv(LCL)
> Fg<-t(Lsol)%*%(LCLi%*%Lsol)/1
> dfg1<-1
> dfg2<-2 # Cfr Kenward-Roger...
> 1-pf(Fg,dfg1,dfg2)
[,1][1,] 0.1145035>
� Procedure – 3) Approximate testing
Appendix: solving mixed models
# Testing time effects
> L<-matrix(c(0,0,0,1,-1,0,0,0,0,0,
+             0,0,0,1,0,-1,0,0,0,0),nr=2,byrow=T)
> Lsol<-L%*%sol
> C_hat<-ginv(MMEl/sigma_2[2])
> LCL<-L%*%(C_hat%*%t(L))
> LCLi<-ginv(LCL)
> Ft<-t(Lsol)%*%(LCLi%*%Lsol)/2  # q = 2 (two contrast rows in L)
> dft1<-2
> dft2<-6 # Cfr Kenward-Roger...
> 1-pf(Ft,dft1,dft2)
[,1][1,] 0.006966431 >
� The same analyses, with SAS: program
Appendix: solving mixed models
options ls=80;
data phd;
input groupe temps animal pheno @@;
cards;
1 1 1  89.4  1 1 2 103.7  1 2 1 106.4  1 2 2 113.7
1 3 1 116.3  1 3 2 118.0  2 1 3  91.5  2 1 4  85.0
2 2 3  89.8  2 2 4  88.5  2 3 3 110.6  2 3 4  97.2
;
proc glm;
class groupe temps;
model pheno=groupe temps;
lsmeans groupe /pdiff stderr;
proc mixed;
class groupe temps animal;
model pheno=groupe temps / solution;
repeated /sub=animal type=cs;
� The same analyses, with SAS: result (1)
Appendix: solving mixed models
(SAS listing: procedure output shown as an image in the original slides)