Introduction to linear mixed models for repeated measurements...

Introduction to linear mixed modelsfor repeated measurements data:Analysis of single group studies

with SAS PROC MIXED.

Julie Forman, Section of Biostatistics, University of Copenhagen

November 16, 2019

Contents

1 Introduction 3

1.1 How to use this tutorial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

1.2 Where to learn more . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

2 About single group studies 4

2.1 What is a balanced single group study? . . . . . . . . . . . . . . . . . . . . . . 4

2.2 Note the difference: Balanced vs unbalanced design . . . . . . . . . . . . . . . 5

2.3 Note the difference: Complete vs incomplete data . . . . . . . . . . . . . . . . 5

2.4 Which analysis: Linear mixed model or paired t-tests? . . . . . . . . . . . . . 5

2.5 Case: The gastric bypass study. . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Preparing data for analysis 7

3.1 For a quick start: Create the case study data within SAS . . . . . . . . . . . . . 7

3.2 Inspecting the data in SAS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

3.3 Numerical and categorical variables in SAS . . . . . . . . . . . . . . . . . . . 8

3.4 The long and the wide data format . . . . . . . . . . . . . . . . . . . . . . . . 8

3.5 Why have a categorical and a numerical version of the time variable? . . . . . 9

3.6 SAS-program: Transforming data from the wide to the long format . . . . . . . 10

1

4 Descriptive statistics 10

4.1 Spaghettiplots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

4.2 Scatterplot matrices and correlations . . . . . . . . . . . . . . . . . . . . . . . 12

4.3 Is data normally distributed - and does it matter? . . . . . . . . . . . . . . . . . 13

4.4 Trends in summary statistics over time . . . . . . . . . . . . . . . . . . . . . . 13

5 About linear mixed models 15

5.1 What is a linear mixed model? . . . . . . . . . . . . . . . . . . . . . . . . . . 15

5.2 Linear mixed model terminology . . . . . . . . . . . . . . . . . . . . . . . . . 16

5.3 Describing linear mixed models in research papers . . . . . . . . . . . . . . . 17

5.4 Case: Oneway ANOVA-like linear mixed model . . . . . . . . . . . . . . . . . 17

5.5 The unstructured covariance pattern . . . . . . . . . . . . . . . . . . . . . . . 18

5.6 Maximum likelihood estimation? . . . . . . . . . . . . . . . . . . . . . . . . . 18

5.7 Numerical optimisation and convergence? . . . . . . . . . . . . . . . . . . . . 20

5.8 Inference: Confidence intervals and hypothesis testing? . . . . . . . . . . . . . 21

5.9 Important: Sample size matters! . . . . . . . . . . . . . . . . . . . . . . . . . 21

5.10 Predicted values and residuals . . . . . . . . . . . . . . . . . . . . . . . . . . 22

6 Analysis and interpretation of the linear mixed model 25

6.1 SAS program: Specifying the linear mixed model . . . . . . . . . . . . . . . . 26

6.2 Checking convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

6.3 Interpretation of estimates, confidence intervals, and p-values . . . . . . . . . . 27

6.4 SAS Program: Alternative syntax (all means and differences) . . . . . . . . . . 29

6.5 Using predicted values and residuals for validation . . . . . . . . . . . . . . . 30

A Appendix: Importing data from an external file. 33

B Appendix: What is in the output from PROC MIXED? 33

C Appendix: Analysis of log-transformed data. 37

? Section contains a high amount of mathematical/technical details.

2

1 Introduction

The tutorial was developed as a supplement to the ph.d. course statistical analysis of correlatedand repeated measurements for health researchers which I teach every fall at the Faculty ofHealth Science of the University of Copenhagen. However, my intention was that the tutorialshould also function as a self-learning text.

The tutorial offers a crash course on linear mixed models for anyone who plan to analyze re-peated measurements data. For a gentle start, I exemplify single group studies, since theseprovide the most simple examples of mixed model analyses.

To be specific: I will teach you how to analyze quantitative data from a balanced single groupfollow-up study using a linear mixed model as implemented in PROC MIXED in SAS statis-tical software. Please note that similar statistical models can be used to analyze studies whererepeated measurements on the same subjects have been made over:

1. Pre-specified, follow-up times, i.e. as in the examplified case study.

2. Pre-specified treatments, i.e. in a fixed sequence cross-over study.

3. Pre-specified locations of the body, i.e. in a study with physiological correlation.

Or any other single group study where the replications factor (i.e. time/treatment/location/. . . )is the only explanatory variable of interest in the analysis. Note that this tutorial makes nocomparison of subgroups, no adjustment for potential confounders, and so forth. I will merelycompare the outcomes between the follow-up times in the case study. To learn how to analysemore complex repeated measurements, check my lecture notes and the other tutorials at mywebpage (see link below).

1.1 How to use this tutorial

You are encouraged to track the analyses of the case study data while reading about them.

To run the tutorial you will need the following files:

1. The data file gastricbypass_data.sas which creates the data within SAS.

2. The program file gastricbypass.sas containing the SAS-code to analyze the data

3. This pdf file which explains study aims, modeling, code, and interpretation of analyses.

Sections 3, 4, and 6 of the tutorial explains the data management, descriptive statistics, andstatistical modeling needed to analyze data in SAS. Section 2 contains terminology for singlegroup study designs and section 5 contains terminology and general statistical theory for linearmixed models. Appendices contain supplementary SAS-code and explanations of output.

3

1.2 Where to learn more

Additional lecture notes, exercises, and other tutorials can be found at my webpage:

http://publicifsv.sund.ku.dk/~jufo/RepeatedMeasures2019.html

If you want to learn more about longitudinal data analysis in general, a thorough introductionis given in the book by Fitzmaurice, Laird & Ware (2012). Students at the University of Copen-hagen have free access to the e-version at the Royal Library. Supplementary SAS-programs canbe found at the book webpage:

https://content.sph.harvard.edu/fitzmaur/ala2e/

I you want to learn about mixed model analysis with the nlme-package in R statistical soft-ware instead, a detailed account is given in the book by Pinheiro & Bates (2006), who developedthe nlme-package, Pinheiro et al. (2019).

Before engaging in analysis of repeeated measurements data, I would strongly recommend thatyou acquire basic SAS programming skills. SAS Instistute offers several e-learning coursesfree of charge at:

https://support.sas.com/ -> Training and Books -> Training -> e-Learning

2 About single group studies

In this section I will characterize the single group study design, discus its statistical properties,and introduce the gastric bypass study which will be used as case study throughout the tutorial.

2.1 What is a balanced single group study?

From a repeated measurements point of view, a balanced single group study is when we have:

1. A sample of subjects from a homogeneous population,

2. from which outcomes are collected repeatedly, at pre-specified -

• e.g. time points (follow-up study),• e.g. treatments (fixed sequence cross-over study)• e.g. locations of the body (study with physiological correlation)

3. - which are intentionally the same for all subjects (i.e. the study design is balanced, ex-planation follows below).

Furthermore, the models considered in this tutorial assume that the sole purpose of the analysisis to evaluate the effect of the replication factor. In other words that the aim of the study was tocompare the outcomes between the follow-up times (or the treatments, or the locations, . . . ).

4

2.2 Note the difference: Balanced vs unbalanced design

A single group study which satisfies the last condition on the list is said to be balanced. Pleasenote that this is a property of the study design, not of the data. In real life collected data froma follow-up study is rarely perfectly balanced, since subjects are seldom assessed at the exactsame times. As long as deviations from the schedule are small and random this has little con-sequence for the statistical results. Data may still be analyzed using the linear mixed modeldescribed in this tutorial.

In contrary unbalanced study designs require substantially different modeling not describedin this tutorial. An example of an unbalanced study design is a retrospective registry studywhere follow-up times vary between subjects, e.g. because patients are seen according to needand ressources rather than according to a strict schedule.

2.3 Note the difference: Complete vs incomplete data

Data may also deviate from the study plan by being incomplete. This is when some of theplanned measurements are missing.

From a technical point of view missing data do not prevent analysis. Linear mixed modelanalyses may still be performed and are even optimal (in terms of statistical power) as long asthe missing data mechanism is missing at random (see lecture notes on missing data).

However, statistical results may be biased if data is missing due to reasons that are related to themissing outcomes. E.g. if the most ill patients drop out of a follow-up study, those who remainno longer make up a representative sample from the study population (their outcomes will leavean overly optimistic impression of how the whole population is faring). Therefore potentialreasons of missingness and their impact on the outcomes of the study should be consideredprior to analysis. If you have data that are missing for all but harmless reasons, I recommendyou consult with a statistician.

2.4 Which analysis: Linear mixed model or paired t-tests?

If your interest is to compare time points two by two in a single group follow-up study, then youcould also make the comparisons with the plain old paired t-test. The linear mixed model andthe paired t-tests will give you the exact same results if the data is complete. However, there isa computational advantage in using SAS PROC MIXED; you only need one extra line of codeto make all of the pairwise comparisons (see section 6.4).

If you have missing outcomes, the linear mixed model is optimal under a missing at randomassumption (see lecture on missing data), whereas the paired t-tests have less statistical powerand may be biased.

5

2.5 Case: The gastric bypass study.

Throughout the tutorial I will use data from a small single group follow-up study, Jorsal et al.(2019), in which n = 20 obese subjects were followed prior to and after gastric bypass surgery.

Follow-up times: -3 months before surgery (baseline), -1 week before surgery, +1 week aftersurgery, and +3 months after surgery (end point).

Outcomes: Gene-expression data from mucosal biopsies and gut hormones from blood samplesas well as ordinary clinical data.

Study aims: To describe changes in the gut profile in response to the treatment. For completerecording clinical outcomes were also analyzed. Note that the subjects underwent a diet inpreparation for the surgery. Hence, genuine physiological changes occurred already prior tothis.

For simplicity, I will focus on two particular secondary outcomes from the study: bodyweight(kg) and serum glucagon (a gut hormone). The latter was assessed by a liquid mixed mealtolerance test and summarized by total area under the curve, see Jorsal et al. (2019) for details.

The motivation for picking these two particular outcomes was to highlight their statistical fea-tures and their implications for analysis. The original study contained many more outcomes,some of which were of greater scientific interest than the ones presented here.

Figure 1: Bodyweight (kg) and serum glucagon (total auc following a liquid mixed meal toler-ance test) in n = 20 obese subjects, recorded -3 months before, -1 week before, +1 week after,and +3 months after gastric bypass surgery.

The case study data is shown in figure 1. We note a decreasing trend in bodyweight throughoutthe study, both in the individual subjects and in the population overall. Concerning glucagon,surgery tend to increase levels in the population, but the long term effect is less pronounced thanthe short term effect. On the individual level, the majority of the subjects seem to experience anincrease following surgery, but there are also subjects who experience a drop. Overall individualglucagon levels are much less predictable than individual bodyweights. In statistical terms wewould say that the serial correlation in bodyweight is much stronger than in glucagon.

The tutorial contains a step-by-step analysis of the bodyweights. The corresponding analysis of

6

glucagon data is left as an exercise (see the course webpage for questions and solutions).

3 Preparing data for analysis

A trimmed version of the original records from the gastric bypass study are contained in thedatafile gastricbypass.txt. It contains the following variables:

Variable Contentsid Pseudo-ID (anonymized).weight1-4 Bodyweight in kg at 1st, 2nd, 3rd, and 4th follow-up.glucagonAUC1-4 Total AUC for glucagon at 1st, 2nd, 3rd, and 4th follow-up.

3.1 For a quick start: Create the case study data within SAS

To give you a quick start on learning linear mixed model analyses, I have made the program filegastricbypass_data.sas which creates the case study data within SAS. If you wouldrather read in the data from the txt file, see appendix A.

To create the data, submit the file gastricbypass_data.SAS to SAS. Open the programfile and hit the Run buttom.

The code ought to look as follows:

/******** Data from gastric bypass study *********************//* Single group study on n=20 obese subject with follow-up at -3months, -1 week, +1 week, and +3 months before/after surgery */

DATA wide;INPUT id weight1-weight4 glucagonAUC1-glucagonAUC4;DATALINES;1 127.2 120.7 115.5 108.1 5032.50 4942.50 20421.00 9249.452 165.2 153.4 149.2 132.0 12142.50 14083.50 10945.50 7612.50

(20 records in total)

20 100.9 95.7 89.9 78.8 14299.50 8739.00 26638.50 11410.50;RUN;

3.2 Inspecting the data in SAS

Once you have managed to create the case study data in SAS, you should be able to locate thedataset wide in the workspace. If you are runnning SAS Enterprise Guide, the data wideshould appear in the Output data window and you can look at its contents.

For a quick overview of the data, use:

7

PROC CONTENTS DATA=wide;RUN;

which results in:

The CONTENTS Procedure

Data Set Name WORK.WIDE Observations 20Member Type DATA Variables 9

(less interesting output omitted)

Alphabetic List of Variables and Attributes

# Variable Type Len6 glucagonAUC1 Num 87 glucagonAUC2 Num 88 glucagonAUC3 Num 89 glucagonAUC4 Num 81 id Num 82 weight1 Num 83 weight2 Num 84 weight3 Num 85 weight4 Num 8

This tells us that the dataset wide contains nine variables and 20 records, what the names ofthe variables are, and what their data types in SAS are.

3.3 Numerical and categorical variables in SAS

It is important to notice that all variables in the wide data initially appear as numerical in SAS.This means that all variables are considered as numbers unless we specifically tell SAS to treatthem differently.

However, id-number is truly is a nominal variable which is labeled 1, 2, 3,. . . for convenience.It is not so that subject number four is twice as much of a person as subject number two, andwe won’t get subject six by adding the two! We should keep this in mind when making theanalyses.

3.4 The long and the wide data format

In order to perform linear mixed model analyses it is important to distinguish two differentformats for the same data.

1. The wide format with one record (dataline) per subject.

8

2. The long format with several records (datalines) per subject, one for each occasion.

The dataset wide is in the wide format. This is the natural format when collecting data ina database because it is compact and makes it easy to compare measurements from the samesubject between occasions. E.g the records from id=2 and id=3 in the wide data reads:

id weight1 weight2 weight3 weight4 glucagonAUC1 . . . glucagonAUC42 165.2 153.4 149.2 132.0 12142.50 . . . 7612.503 109.7 101.6 97.7 87.1 10321.35 . . . 17704.50

From this we see that id=2 had an initial weight of 165.2 kg which had dropped to 132.0 kg atend of study. Moreover, total auc for glucagon decreased from baseline to end of study in thissubject. Similarly id=3 had an initial bodyweight of 109.7 kg which decreased to 87.1 kg, whileat the same time total auc for glucagon increased in this subject.

To perform a mixed model analysis data must be structured in the long format where eachmeasurement appear in separate lines in the dataset and an additional variable identifies theoccasion. E.g. in the gastric bypass study the patient with id=2 contributes four records:

id visit time weight glucagonAUC2 1 -3 month 165.2 12142.52 2 -1 week 153.4 14083.52 3 +1 week 149.2 10945.52 4 +3 month 132.0 7612.5

Here the same id-number is repeated four-fold because it does not change between occasions. Incontrast, the outcome variables weight and glucagonAUC vary between occasions. More-over, note that only two variables are needed to contain the outcomes in the long data, whileeight were needed in the wide data.

3.5 Why have a categorical and a numerical version of the time variable?

You might have noticed that my long data contains two different variables describing thefollow-up times:

• visit is an integer-valued variable which numbers follow-up times as 1,2,....

• time is a categorical variable which labels follow-up times appropriately, in this case as"-3 month", "-1 week", "+1 week", and "+3 month".

The reason I have made these two variables is mostly out of convenience. The time variablelabels the occasions, whereas the visit variable allows me to sort the occasions in the orderthat they took place.

9

3.6 SAS-program: Transforming data from the wide to the long format

Fortunately we do not have to rearrange data manually to make a long version that can be usedin the analyses. The following SAS code transforms the wide data and stores it in a new datasetcalled long:

/* Transform data from wide to long format (all outcomes must be listed) */

DATA long (DROP = weight1-weight4 glucagonAUC1-glucagonAUC4);SET wide;visit=1; time="-3 month"; weight=weight1; glucagonAUC=glucagonAUC1; OUTPUT;visit=2; time="-1 week"; weight=weight2; glucagonAUC=glucagonAUC2; OUTPUT;visit=3; time="+1 week"; weight=weight3; glucagonAUC=glucagonAUC3; OUTPUT;visit=4; time="+3 month"; weight=weight4; glucagonAUC=glucagonAUC4; OUTPUT;RUN;

Note that I create the variables visit and time as part of the transformation.

Please check the contents of the long dataset, e.g. by looking at the Output data window in SASEnterprise Guide. Make sure to notice the differences compared to the wide data format.

4 Descriptive statistics

Before we proceed to making statistical analyses, I will show you how to make descriptivestatistics suitable for a single group study. The graphical displays will help us to:

• Spot erroneous measurements that could otherwise spoil the analyses.

• Make a rough assessment of the trends in the data which will help us interpret the outputfrom the linear mixed model.

• Judge whether data is approximately normally distributed or whether it should be trans-formed to better fulfill modeling assumptions.

Some of the descriptive statistics are absolutely essential for the mixed model analyses. Theywill help you correct errors in your SAS code as well as in your data. Others I have includedfor pedagogical reasons to help you become familiar with repeated measurements data and themultivariate normal distribution in general.

4.1 Spaghettiplots

Personally, I would never analyze repeated measurements without first looking at the spaghetti-plot. The spaghettiplot is highly informative since it shows the data as is. From this we can getan impression of both individual outcomes and the distribution across the sample. If you lookcarefully, you will be able to read of:

10

• Trend.

• Variability.

• Skewness.

• Outlying observations.

• Strength of correlation over time.

All in the same plot.

To make a spaghettiplot, I use PROC SGPLOT.

/* Make a spaghettiplot */

PROC SGPLOT DATA=long NOAUTOLEGEND;SERIES X=visit Y=weight / GROUP=id;RUN;

Note that this applies to the long data. The arguments GROUP=id in the SERIES statementconnects the outcomes belonging to the same id with lines. Additional graphical parameterscan be supplied to change the appearance of the plot, but usually the default options are suffi-cient to get an overview of the data.

Figure 2: Spaghettiplot of bodyweights from the gastric bypass study.

From the spaghettiplot (figure 2) we immediately notice a decreasing trend in bodyweight, bothin the individuals and in the sample as a whole. Variability seems to be roughly the sameat each occasion. Moreover, we see that the distribution of the bodyweights is likely not a

11

normal distribution. This is because the sample is skewed with a few extremely heavy subjects,a majority of moderately heavy subjects, and no normal weight subjects. Finally, we note astrong tracking effect in the plot, meaning that subjects rarely change positions. The heaviestsubject at baseline is the heaviest throughout the study and similarly with the lightest subject.This tells me that that the serial correlation in the bodyweights is likely very high, which makessense also from a biological point of view. Bodyweights are highly predictable over short timedurations, they do not change dramatically from day to day.

4.2 Scatterplot matrices and correlations

If you want to get an impression of how strong the correlation is between your repeated mea-surements and whether they are multivariate normal distributed, you can make scatterplot ma-trices with PROC CORR:

PROC CORR DATA=wide PLOTS=MATRIX(HISTOGRAM) NOPROB;VAR weight1-weight4;RUN;

Please note that this applies to data in the wide format. I use the option NOPROB to omit tests ofH0 : correlation = 0 which are otherwise reported. When checking the scatterplots, you shouldlook for the elliptical shape that characterizes the multivariate normal distribution. This maybe easier to eyeball if you use the option PLOTS=SCATTER(ELLLIPSE=PREDICTION).However, you get a better overview of the data from PLOTS=MATRIX(HISTOGRAM) .

Besides the scatterplot matrix, PROC CORR outputs correlation coefficients plus some addi-tional summary statistics. Only the correlations are shown here.

Pearson Correlation Coefficients, N = 20

weight1 weight2 weight3 weight4weight1 1.00000 0.98972 0.98635 0.94551

weight2 0.98972 1.00000 0.99721 0.95922

weight3 0.98635 0.99721 1.00000 0.96621

weight4 0.94551 0.95922 0.96621 1.00000

The scatterplot matrix (figure 3) confirm the strong correlation we already noted in the spaghet-tiplot. Also we confirm that the distribution of the weights at each separate follow-up time isskewed, albeit not dramatically so. The skew trend seems to be mainly driven by one or twooutlying subjects. However, we know that bodyweight has a skew distribution in the generalpopulation, thus we could consider a logarithmic transformation. But still, the deviation fromthe normal distribution is not all that pronounced. For ease of interpretation, I choose not totransform the data in the examplified analyses. A supplementary analysis of log2-transformeddata is described in appendix C.

12

Figure 3: Scatterplot matrices of weights from the gastric bypass study made with PROC CORR

4.3 Is data normally distributed - and does it matter?

Beware that the usual problem with normality reoccurs: If sample size (number of subjects) istiny, the assumption that data is normally distributed is crucial but impossible to verify (there-fore small sample size should always be recognized as a limitation in research papers).

On the other hand, if sample size is large, the linear mixed model is robust to all but substantialskewness and major outliers. This means that estimates, confidence intervals, and p-values areapproximately valid regardless of deviations from the normal distribution (but be careful withinterpretation; Does the mean actually describe the most typical outcome?).

If the data deviates substantially from the normal distribution, you can try to improve the fit byapplying a transformation, e.g. a logarithm. Further considerations on transformation are givenin appendix C. See in addition sections 5.8–5.9 for considerations on sample size.

4.4 Trends in summary statistics over time

Summary statistics for the various follow-up times can be computed with PROC MEANS. Here Icompute number of non-missing observations, means, standard deviations, medians, and ranges

13

(minimun and maximum):

PROC MEANS NWAY DATA=long NMISS MEAN STDDEV MIN MEDIAN MAX;CLASS visit;VAR weight;OUTPUT OUT=gbmeans MEAN=sample_mean STDDEV=sample_sd;RUN;

The CLASS statement tells SAS that summary statistics should be computed for each occasionin turn. It is important to remember the option NWAY, otherwise SAS will also compute thesample mean etc across all occasions which is not meaningful. I use an OUTPUT statement tosave the sample means and sample standard deviations in a dataset which I name gbmeans.This I use to make plots of means and standard deviations over time.

The summary statistics look as follows:

The MEANS Procedure

Analysis Variable : weight

visit NObs NMiss Mean Std Dev Minimum Median Maximum1 20 0 128.97 20.27 100.90 123.10 173.002 20 0 121.24 18.91 95.70 114.50 162.203 20 0 115.70 18.28 89.90 110.60 155.004 20 0 102.37 17.05 78.80 98.50 148.00

The sample means confirm the decreasing trend we saw in the spaghettiplot. What was lessobvious from the plot, is that the sample standard deviations also decrease over time. Usuallydecreasing variablity with decreasing level suggest a logarithmic transformation, but note thatthe linear mixed model used in the analysis do not assume that the variance is homogeneous.Hence, this is not a model deviation. The medians are, however, consistently smaller than themeans, which confirms the skewness we saw in the plots, and this is a deviation from the normaldistribution.

Use PROC SGPLOT to plot the sample means and standard deviations (figure 4 on next page):

/* Plot the sample means and standard deviations over time */PROC SGPLOT DATA=gbmeans;SERIES X=visit Y=sample_mean / MARKERS;RUN;

PROC SGPLOT DATA=gbmeans;SERIES X=visit Y=sample_sd / MARKERS;RUN;

14

Figure 4: Trends in sample means and standard deviations from the gastric bypass study.

5 About linear mixed models

In this section I introduce linear mixed models and discus their statistical properties. I haveattempted to make the descriptions as non-technical as possible, explaining the theory in wordsas well as in formulae. However, statistics rely on mathematical foundations, so technicalitiesand formulae cannot be avoided completely. If you are non-technically minded: read this sec-tion fast and comfort yourself that full technical understandig of the linear mixed models is notneeded to apply the models in practice. On the other hand, if you are technically minded, youmight find that the model descriptions are too vague to your liking. In this case, please referto Fitzmaurice, Laird & Ware (2012) or Pinheiro & Bates (2006) for a mathematically rigorousintroduction.

5.1 What is a linear mixed model?

A linear mixed model assumes that quantitative repeated measurements follow a multivariatenormal distribution. Hence, to specify a linear mixed model we need to specify:

1. A linear model for the mean parameters (i.e. regression coefficients for covariates)

2. A model for the covariance parameters (i.e. variance and correlation parameters).

The first part is easy if you are familiar with ordinary linear models for independent data. Thenyou already know how to model the mean as a function of the covariates. In this regards mod-eling and interpretation is similar to between linear mixed model analyses and ordinary linearmodel analyses.

The extra challenge that comes with a linear mixed model analysis is to specify the modelfor the covariance. Although the covariance parameters are rarely of an interest of their own,modeling the covariance correctly is important for inference. If the covariance is misspecified,the standard errors for the mean parameters may be biased, meaning that we get confidence

15

intervals that are too narrow or too wide and p-values that are smaller or larger than they outghtto be. In case there are missing data, a misspecified covariance pattern may also inflict bias inthe estimates for the mean parameters, even when the missing at random assumption holds true.

Overall we have three different ways of modeling covariance:

• Direct specification of a covariance pattern (REPEATED statement in PROC MIXED).

• Indirect specification via random effects (RANDOM statement in PROC MIXED).

• A combination of the two (both statements in conjecture).

I will introduce the different covariance models on a case to case basis in my lectures andtutorials, starting with the unstructured covariance pattern detailed in section 5.5 below.

5.2 Linear mixed model terminology

Phrased in mathematical terms, we model repeated measurements over occasions j = 1, . . . ,kon subjects i = 1, . . . ,n as:

Yi j = β1Xi j1 + . . .+βimXi jm + εi j

where Xi j1, . . . ,Xi jm are the covariates for subject i at occasion j, β1, . . . ,βm are the regressionparameters, and εi1, . . . ,εik are the error terms assumed to follow a multivariate normal distri-bution with zero mean. The interdependency between the error terms is comprised in the k× kdimensional covariance matrix

Σ =

σ21 . . . σ1k...

...σk1 σ2

k

which is a mathematically convenient way of reporting variances and correlations in a compactmanner. For interpretation we prefer looking at standard deviations and correlations. These canbe recovered from the covariance matrix by use of the following formulae:

σ j =√

σ2j and ρ jl =

σ jl

σ jσl

It is important to notice that SAS PROC MIXED has estimated covariances as defaullt output.Additional options can be supplied to get the corresponding correlations. However, you willhave to convert variances to standard deviations yourself.

The covariates in a linear mixed model are called fixed effects and the covariance matrix is calledthe residual covariance matrix or simply the R matrix (not to be confused with the software)

16

5.3 Describing linear mixed models in research papers

When describing a particular linear mixed model in the statistical methods section of a researchpaper it is important that we specify not only which covariates (fixed effects) entered the model,but also what model for the covariance was assumed.

A description of the model for the case study would be something like this:

Case: To analyze changes in bodyweight over time we applied a linear mixed model includingtime (categorical) as a fixed effect. To account for the correlation in the repeated measure-ments as well as possible changes in variances/standard deviations over time, we assumed anunstructured covariance pattern.

The model is described in further detail in sections 5.4 and 5.5 below. For reference I wouldcite Fitzmaurice, Laird & Ware (2012) or more specifically chapter 7 which explains why anunstructured covariance pattern is preferable to a compound symmetry pattern / random effectof subject1 in longitudinal data analysis.

5.4 Case: Oneway ANOVA-like linear mixed model

To model changes in mean bodyweight over time, we specify a linear mixed model with timeas a categorical covariate (fixed effect). This is similar to a oneway ANOVA model for inde-pendent data where we have different mean parameters for each follow-up time.

When we analyse data in SAS, the default model specification (section 6.1) will give us esti-mates of the difference in means with respect to a reference time point (usually baseline). I.e.the default model parameters are the regression parameters β1, β2, β3, and β4, which describepopulation means over time as:

occasion Population mean-3 months µ1 = β1-1 week µ2 = β1 +β2+1 week µ3 = β1 +β3+3 months µ4 = β1 +β4

Table 1: Population means (µ) over time in the gastric bypass study. Changes in mean sincebaseline are described by the regression parameters β2, β3, and β4. The baseline mean is theintercept parameter (β1).

Estimates from the mixed model analysis are described in section 6.3.

Note that it is possible to choose a different reference point than baseline. This way we canestimate differences in means between the other follow-up times. Also it is possible to estimatethe mean parameters µ1, µ2, µ3 and µ4 instead of the regression parameters β1, β2, β3, and β4.Check the SAS syntax in section 6.4 to learn how to do this.

1The compound symmetry pattern is equivalent to having a random effect of subject in a mixed model andassumes that both variances and correlations are constant over time, which is most often not a realistic assumption(see lectures on longitudinal data analysis and random effect models).

17

5.5 The unstructured covariance pattern

Specifying a model for the covariance is easy when it comes to single group studies. We canassume an unstructured covariance pattern, which models variances and correlations as is andwhich can never be misspecified.

Note that, as long as the number of occasions is small compared to the number of subjects in thestudy we do not have to make any restrictive assumptions about the covariance parameters atall. Such assumptions could otherwise be that the variances were constant over time or that thecorrelation decreased exponentially with increasing time distance (see lecture on longitudinaldata analysis).

We say that we assume an unstructured covariance pattern when we leave the covariance pa-rameters unrestricted.

With repeated measurements over k occasions, the unstructured covariance pattern has:

• k variance parameters, denoted σ21 , . . . , σ2

k .

• k(k−1)/2 correlation parameters denoted ρ12, . . . ,ρk−1,k.

In the case study, where k = 4, we have 4 district variances and 6 distinct correlations. Estimatesof the covariance parameters are described in section 6.3

I would recommend specifying any linear mixed model with as unstructured covariance patternas far as possible since this guarantees that we are not making any wrong model assumptions(with the possible exception of assuming a normal distribution in the first place). However, alimitation of the model is that it only works with a limited number of follow-up times. E.g. ifwe had forty follow-up times instead of four, SAS would not be able to do the analysis. Alsothe unstructured covariance pattern can only be used to model balanced study designs.

5.6 Maximum likelihood estimation?

This section contains a short and mostly intuitive description of what is actually a delicate matterfor mathematical statisticians. Read it fast to learn about estimation in mixed models, all thewhile comforting yourself that you do not need to understand the technical details it to use themodels in practice.

If you want to learn more, detailed accounts are given in e.g. Fitzmaurice, Laird & Ware (2012)and Pinheiro & Bates (2006).

To assess what parameters fit the data well, we use the likelihood function. In a linear mixedmodel this is the multivariate normal probability density. To state this accurately we need aformula. However, to read the formula you need to be familiar with matrix algebra. If you arenot, then think of it merely as a multivariate extension of the univariate normal density.

L(µ,Σ) =n

∏i=1

(1√

(2π)k|Σ|e−

12 (yi−µ)T Σ−1(yi−µ)

)(1)

18

The product is taken over subjects i = 1, . . . ,n (assumed independent so that their probabilitydensities can be multiplied). Here yi is the k-dimensional vector observation for subject i, µ isthe mean vector, Σ is the residual covariance matrix, Σ−1 its inverse, and |Σ| its determinant.

The key idea in maximum likelihood estimation is as follows:

• We assume that the k-dimensional outcomes y1, . . . ,yn have been sampled from a k-dimensional multivariate normal distribution.

• We want to estimate the population mean µ and the population covariance matrix Σ.

• If we knew the true population parameters µ and Σ, then the likelihood function, i.e. theprobability density (1) would tell us what the likely outcomes are. In a normal distributionthe most likely outcomes are close to the population mean where the density attains itsmaximum value. With increasing distance from the mean, the likelihood values decreaseand outcome values become correspondingly rare.

• In practice, we do not know what the true population parameters are, but we do know abunch of likely outcomes, namely our data, y1, . . . ,yn. Hence we turn the computationsaround and use the likelihood function (1) to determine which model parameters makethe observed outcomes most likely.

This is the likelihood principle which says the parameters that fit the data the best are the oneswhich have the highest likelihood values i.e. the maximum likelihood estimates.

To make matters even more complicated, the parameters in linear mixed models are usuallyestimated by use of a different variant of the likelihood function called the restricted likelihoodfunction, see e.g. Fitzmaurice, Laird & Ware (2012). This is because the restricted likelihoodestimates of the covariance parameters are more accurate in small samples. Therefore restrictedmaximum likelihood (abbreviated REML) is the default estimation method in most softwareincluding PROC MIXED in SAS.

Case: The restricted maximum likelihood estimates from a balanced single group study withcomplete data are identical to the sample means,

µ1 = 129.0, µ2 = 121.2, µ3 = 115.7, µ4 = 102.4,

sample standard deviations,

σ1 = 20.3, σ2 = 18.9, σ3 = 18.3, σ4 = 17.1,

and the sample correlations,

ρ12 = 0.990, ρ13 = 0.986, ρ14 = 0.946, ρ23 = 0.997, ρ24 = 0.959, ρ34 = 0.966,

i.e. identical to the summary statistics from section 4.

19

5.7 Numerical optimisation and convergence?

An important difference between linear mixed models and ordinary linear models is that inlinear mixed models the estimates and standard errors cannot in general be computed usingtextbook formula. Balanced single group studies with complete data are an exception to thisrule, but software packages will ignore this fact when analysing a particular data set. Insteadthey use numerical optimisation to approximate the maximum likelihood estimates.

To give you a rough impression of how numerical optimisation works, I will briefly describewhat PROC MIXED and other software for linear mixed models does. Again, this is a simplifieddescription of a rather complex technical matter.

For ease of computation the numerical optimisation operates on the deviance function which is-2 times the natural logarithm of the likelihood function (1), i.e.

deviance(µ,Σ) = k ·n · log(2π)+n · log |Σ|+n

∑i=1

(yi−µ)TΣ−1(yi−µ) (2)

Higher values of likelihood correspond to lower values of deviance. Hence the maximum of thelikelihood function is the same as the minimum of the deviance function.

In practice, the deviance (2) is usually replaced by the deviance of the restricted likelihoodfunction, see e.g. Fitzmaurice, Laird & Ware (2012) for details.

This is how the numerical optimisation works:

1. PROC MIXED initially makes a qualified guess of what the estimates could be.

• Mean parameters are initially estimated as if data was independent (i.e. using modelformulae for the ordinary linear model, like with PROC GLM).

• Covariance parameters are estimated using more complicated computations that arespecific to the particular covariance pattern, see the SAS documentation for details.

2. Goodness of fit between the candidate estimates and the data is evaluated by the deviancefunction. Lower values of deviance are better (they correspond to higher likelihood).

3. SAS computes the gradient (i.e. the multivariate first order derivative) and the Hessian(i.e. the multivariate second order derivative) of the deviance function to move the can-didate estimates in a direction where the deviance will be smaller (and the likelihoodhigher).

4. Steps 2-3 are iterated until the candidate is within a certain tolerance of the maximumlikelihood estimates (in the sense that the gradient is close to zero), or until the numberof iterations reach an upper limit set to avoid that the program runs forever.

We say that the numerical optimisation has converged, if it stops when it has located the maxi-mum likelihood estimates. Section 6.2 describes how to assess convergence in practice.

20

5.8 Inference: Confidence intervals and hypothesis testing?

Recall that 95% confidence intervals in linear models are computed from the formula:

β ± t0.975(ddf)× s.e.(β )estimate ± roughly 2× standard error

where t0.975(ddf) is formally the 97.5% quantile in a t-distribution with model specific denomi-nator degrees of freedom (ddf). Likewise the hypothesis H0 : β = 0 can be tested using a t-typetest statistic:

t =β

s.e.(β )

The trouble with linear mixed models is that there are no simple general formula to compute thedegrees of freedom. As sample size becomes large t0.975(ddf) ≈ 1.96. Otherwise the residualdegrees of freedom that are used as denominator degrees of freedom for t-tests and correspond-ing confidence intervals can only be computed exactly for some very specific balanced studydesign with complete data. In any other case degrees of freedom must be approximated usingmore of less complex computations.

The most recent and overall the best method for approximating denominator degrees of freedomin linear mixed models is due to Kenward & Roger (2009). This has been implemented inPROC MIXED in SAS and in the pbkrtest-package in R. However, PROC MIXED can dothe correction for all kinds of linear mixed models, whereas pbkrtest only applies to randomeffect models (lecture 3).

5.9 Important: Sample size matters!

Confidence intervals and p-values in linear mixed models are all based on statistical large sam-ple theory. Hence, you need to consider sample size (number of subjects) to assess whetheryour statistical results are valid.

When is sample size large enough? As usual, there is no strict answer to this question. Indeedif deviations from the normal distribution are severe (e.g. if extreme outliers are present), if thenumber of replicates is large, or if correlations are extremely close to -1 or 1, then statisticalresults may be compromised even in larger samples. However, years of working with linearmixed models as a statistical consultant have left me with the following rules of thumb:

Sample size Impact of normal distribution Impact of degrees of freedomTiny n < 10 Linear mixed model analysis disrecommended??

Borderline 10≤ n < 15 Normal distribution matters Large impactSmall 15≤ n < 30 Moderate deviations tolerable Moderate impactModerate 30≤ n < 60 Larger deviations tolerable Small impactLarge n≥ 60 Most deviations tolerable Hardly any impact.?? Suggested alternative: t-tests or non-parametric statistics.

21

Please note that these recommendations are for balanced designs with complete data. In casesome outcomes are missing, the effective sample size will be smaller than the number of subjectsin the sample. We can use the number of complete cases (i.e. subjects with complete data) as alower bound for the effective sample size.

Also note that when several groups are considered, n is the number of subjects per group.

5.10 Predicted values and residuals

Predicted values and residuals are used for model validation. We make residual plots and QQ-plots similar to those for ordinary linear models to evaluate a linear mixed model. However,linear mixed models are more complex and for this reason there are several types of predictedvalues and residuals available for checking different aspects of the model specification.

Three different kinds of predicted values can be considered:

1. Ordinary predicted values (based on covariates).

2. Dynamical predictions (based on covariates and previous observations).

3. Subject specific predictions (for random effects models, lecture 3).

The predicted values (fitted values in R) are what we get from the estimated linear model:

Predicted value for subject i at occasion j : Yi j = β1Xi j1 + . . .+ βmXi jm

These are similar to predicted values from an ordinary linear model. I recommend using thepredicted values for an initial check of your model specification. If your predicted values deviatesubstantially from the trends in the descriptive plots and summary statistics, there is very likelyan error in your R code (otherwise you have made a poor choice of model).

We obtain the (ordinary) residuals by subtracting the predicted values from the observed values:

Residual for subject i at occasion j : ei j = Yi j− Yi j

These are similar the residuals from an ordinary linear models. Only residuals belonging tothe same subject will be correlated and they may have different standard deviations even ifthe model is correctly specified. E.g. the unstructured covariance pattern (section 5.5) assumespotentially different standard deviations for the different occasions.

The residuals allows you to make diagnostics plots that are better for checking normality thanthe simple scatterplots in section 4. However, since standard deviations from different occasionsmay differ in a mixed model, we cannot pool the ordinary residuals in a QQ-plot to checkthat they are normally distributed without first standardizing them. Similar to ordinary linearmodels, residuals may be standardized in two different ways:

• Pearson residuals have been standardized by dividing with the estimated standard devia-tions from each occasion in turn. The resulting residuals have similar standard deviationsbut are still correlated across occasions.

22

• Studentized residuals are similar to the Pearson residuals save from an additional cor-rection to account for estimation uncertainty in small samples.

Dynamical predictions are what we get from predicting outcomes based on covariates and pre-vious outcomes in conjecture. A mathematical property of the multivariate normal distributionis that it implies a linear model for either outcome with any other as predictor, i.e.

Yi j = β1Xi j1 + . . .+βmXi jm + γ1Yi1 + . . .+ γ j−1Yi j−1 + error

I will explain this in terms of only two variables highlighting the connection between correlationand prediction:

Assuming that the two outcomes Y1 and Y2 follow a joint normal distribution with mean µ1and µ2, standard deviations σ1 and σ2 and correlation ρ , then we have the following linearregression for predicting Y2 based on Y1 (see figure 5 for illustration):

Yi2 = µ2−ρσ2

σ1µ1 +ρ

σ2

σ1Yi1 + ε

Y2|Y1i (3)

where the εY2|Y1s are prediction errors following a normal distribution with zero mean and vari-ance (1−ρ2)σ2

2 , called the prediction error variance. Note that the intercept, the slope, and theprediction error variance are determined by the parameters in the multivariate normal distribu-tion. We note that the slope ρ

σ2σ1

has the interpretation that the predicted values of Y2 increaseswith ρ standard deviations every time the predictor Y1 increases with one standard deviation.Moreover, we see that the prediction error variance decreases as the correlation gets stronger.In other words, Y2 is more accurately predicted from Y1 if the correlation is strong. This is ingood accordance with our understanding of the correlation.

Figure 5: Dynamical prediction of the second bodyweight from the first bodyweight in thegastric bypass study. The strong correlation implies that the predictive accuracy is high.

We have similar formulae for predicting Y3 from Y1 and Y2, and so forth. Only they are morelengthy so I will omit them here.

23

From the prediction models we obtain residuals that are estimates of the dynamical predictionerrors. These can be used for further validation of the linear mixed model. A noteworthyresult from mathematical statistical theory is that the prediction errors are independent of eachother. Hence, if the linear mixed model is correctly specified the standardized residuals fromthe dynamical prediction models should be approximately independent and normally distributedwith zero mean and a standard deviation of one. The stadardized dynamical residuals are calledscaled residuals in PROC MIXED.

Case: Figure 6 below shows spaghettiplots of the different residuals obtained when fittingthe oneway ANOVA-like linear mixed model with the unstructured covariance pattern to thebodyweights from the gastric bypass study.

Figure 6: Residuals from fitting the linear mixed model to the bodyweights from the gastricbypass study. From upper left to lower right panel the plots are of: Original data, ordinaryresiduals, studentized residuals, and scaled residuals. Pearson residuals are omitted since theyare almost indistinguishable from the studentized residuals. Assuming that data is truly multi-variate normal distributed all residuals should be symmetrically distributed around zero. Furtherthe standardized residuals (Pearson, studentized and scaled) should have a homogeneous vari-ance across occasions. The tracking effect from the original data persist in all but the scaledresiduals, which ought to be uncorrelated if all assumptions for the linear mixed model holdtrue.

For a worked example, I will take a closer look at the predicted values and residuals for id=2:

24

id visit weight predicted Residual PearsonRes StudentizedRes ScaledRes2 1 165.2 128.97 36.23 1.79 1.83 1.792 2 153.4 121.24 32.16 1.70 1.74 -0.472 3 149.2 115.70 33.50 1.83 1.88 1.882 4 132.0 102.36 29.64 1.74 1.78 -0.46

As for all subjects, the (ordinary) predicted values are equal to the estimated means from theseparate occasions, which again are equal to the sample means because the data is complete.The (ordinary) residuals are the differences between the observed weights and the predictedvalues, e.g.

ei1 = Yi1− µ1 = 165.2−128.97 = 36.23.

This shows that id=2 had an initial weight which is 36.2 kg above the estimated populationmean. We get the Pearson residual by dividing with the estimated standard deviation, that is

ei1

σ1=

36.2320.27

= 1.79

showing that id=2 had an initial weight which is 1.79 standard deviations above the estimatedpopulation mean. The subsequent Pearson residuals show that id=2 remains about 1.7 to 1.8standard deviations above the estimated population mean throughout the study, which makesgood sense considering the strong serial correlation in the bodyweights.

The Studentized residuals are highly similar to the Pearson residuals, only slightly larger.

As to the scaled residuals, the first is identical to the Pearson residual (since we have no previousweights to predict from at baseline). At the second visit, we predict the second weight from thefirst using the linear regression model in figure 5 and we compute the prediction error:

Y |Y1i2 = 154.7 e|Y1

i2 = Yi2− Y |Y1i2 ' 153.40−154.69 =−1.29

We next standardize this by dividing with the estimated standard deviation of the predictionerror distribution, −1.29

2.78 =−0.47. We note that the scaled residual is negative since the secondobserved weight of id=2 is smaller than what the model predicts from his initial weight. How-ever, the deviation from the prediction is not unusually large, it is only about one half standarddeviation of the prediction error distribution. The third scaled residual is 1.88. Thus a visit 3,id=2 had a higher weight than what the model predicts from his two previous weights. Thedeviation is fairly large but still within the normal range for the prediction error distribution. Atthe final visit, id=2 had a weight which is smaller than what the model predicts from the threeprevious weights, but well within the normal range of the estimated prediction error distribution.

The different kinds of residuals and their use for making model diagnostics are further describedin chapter 10 of Fitzmaurice, Laird & Ware (2012). Predicted values and residuals from the casestudy are further investigated in section 6.5.

6 Analysis and interpretation of the linear mixed model

In the following section I will describe how to analyze the gastric bypass data using PROCMIXED in SAS. I will describe the syntax and the interpretation of the output.

25

6.1 SAS program: Specifying the linear mixed model

To do the linear mixed model analysis we use the folowing PROC MIXED syntax:

/*** Linear mixed model analysis of a single group study ***/

PROC MIXED DATA=long PLOTS=ALL;CLASS id visit (REF="1");MODEL weight=visit / CL DDFM=KR2 VCIRY OUTPM=fitmain RESIDUAL;REPEATED visit / SUBJECT=id TYPE=UN R RCORR;RUN;

As to the essential parts of the code:

• DATA specifies which dataset the analysis is based on. It must be the long format.

• PLOTS=ALL makes SAS output a number of residualplots.

• Categorical variables must be declared in the CLASS statement. We use (REF=...) tospecify if a particular group should be used as reference category. In this case I have madebaseline the reference point for visit. Thus the baseline mean will be the intercept inthe linear mixed model.

• In the MODEL statement, weight=visit, specifes which variable is the outcome (leftof =) and which is the covariate (right of =). It is possible to have more than one covariatein a mixed model, but in this case we only have only one, namely visit which is acategorical variable since we have made it so in the CLASS statement. Note that in casethe visit-variable was numerical, SAS would perform something like a linear regressiononly with repeated measurements data (like in a longitudinal exposure-response study).

• The option CL implies that estimates and confidence intervals for the fixed effects areoutputtet.

• The option VCIRY makes diagnostic plots of the scaled residuals.

• The option OUTPM=fitmain saves the predicted values in a dataset which I have namedfitmain. RESIDUAL adds the residuals.

• The REPEATED statement specifies a covariance pattern model for the repeated mea-surements, TYPE=UN means that we have chosen the unstructured covariance pattern. Itis important to specify SUBJECT=id so that SAS knows which measurements belong tothe same subject and to specify visit after REPEATED so that SAS knows the temporalordering of the replicates.

• The options R and RCORR outputs the estimated residual covariance matrix and the cor-responding correlation matrix.

26

6.2 Checking convergence

You don’t need to understand how the numerical optimisation within PROC MIXED works toanalyze your data. However, you do need to know whether or not your attempt to fit the linearmixed model has succeeded. That is, whether or not the numerical optimisation has converged,see section 5.7.

Fortunately, checking convergence is easy. If your attempt to fit the linear mixed model issuccessful, then SAS will output a message stating that convergence criteria has been met. Itought to look like this:

At the same time, you will get a load of other output from PROC MIXED. On the other hand,if the numerical optimisation fails to converge, SAS will return a more or less comprehensibleerror message.

So what if the model doesn’t converge? Most often non-convergence is caused by simpleerrors such as misspelling the name of a variable, applying the model to the wrong dataset, orsuch. If you fix the error, the analysis will proceed smoothly. Otherwise convergence issuessometimes occur from trying to fit a too complex model to a too small dataset. In case you can-not figure out why your mixed models doesn’t converge, you should get help from a statistician.The same applies if your attempt to fit the model results in a warning message.

Beware of false convergence: In rare cases it will happen that SAS mistakes some implausibleparameter values for the true maximum likelihood estimates. This is called false convergence.Therefore you should always use fitted values and residuals for model validation as describedin section 6.5 below. If you suspect that your attempt to fit the linear mixed model has resultedin false convergence, you should get help from a statistician.

6.3 Interpretation of estimates, confidence intervals, and p-values

Mean parameters: The most interesting part of the output is the estimated time effect, i.e. theestimated mean changes in bodyweight (kg) since baseline. These are reported for visit = 2(-1 week follow-up), visit = 3 (+1 week follow-up), and visit = 4 (+3 months follow-up). The intercept in this model corresponds to the estimated population mean at baseline (-3months):

Solution for Fixed Effects

27

Effect visit Estimate StdError DF tValue Pr>|t| Lower UpperIntercept 128.97 4.5324 19 28.46 <.0001 119.48 138.46visit 2 -7.7300 0.6974 19 -11.08 <.0001 -9.1898 -6.2702visit 3 -13.2700 0.8392 19 -15.81 <.0001 -15.0265 -11.5135visit 4 -26.6050 1.5494 19 -17.17 <.0001 -29.8479 -23.3621visit 1 0 . . . . . .

Estimated changes are -7.7 kg, -13.3 kg, and -26.6 kg at -1 week, +1 week, and -3 monthsfollow-up. The confidence intervals indicate substantial decreases in mean, and the p-valuesfor the time effects are all < 0.0001. Hence, we can conclude that the mean bodyweight issubstantially lower at all follow-up times compared with baseline. This is no surprise, sinceboth the pre-surgery diet and the gastric bypass has a well documented effect.

Covariance parameters: The covariance parameters are usually of secondary interest to themean parameters in linear mixed models analyses. The reasons why you might want to checkon them anyway are:

• To better understand linear mixed models and the multivariate normal distributions.

• To use them in power calculations for future studies.

• Implausible values indicate errors in the data or the program.

For the gastric bypass study, the estimated R covariance matrix is:

Estimated R Matrix

Row Col1 Col2 Col3 Col41 410.85 379.36 365.37 326.842 379.36 357.60 344.62 309.343 365.37 344.62 333.99 301.134 326.84 309.34 301.13 290.84

But neither variances nor covariances are easily interpretable. To make this more comprehensi-ble we could take square roots of the variances in the diagonal of the R matrix. This would tellus that the standard deviations in the bodyweights has been estimated by

√410.85 = 20.3kg

at baseline,√

357.60 = 18.9kg at first follow-up,√

333.99 = 18.3kg at third follow-up, and√290.84 = 17.1kg at end of study. The correlations are found in the output:

Estimated R Correlation Matrix


Please note that the estimated standard deviations and correlations are identical to the summarystatistics in section 4. This will always be the case if we analyse complete data from a balanced

28

single group study with a model that is fully flexible (i.e. time as a categorical covariate + anunstructured covariance pattern).

Type 3 tests: If you would rather test the hypothesis H0 : µ1 = µ2 = µ3 = µ4, i.e. that nochanges in means whatsoever occur over time. This is what you get from the Type 3 test in theoutput:

Type 3 Tests of Fixed Effects

Effect NumDF DenDF F Value Pr > Fvisit 3 17 108.85 <.0001

Note the usual limitation of the F-test: If the hypothesis is rejected, further comparisons areneeded to investigate between which particular follow-up times significant differences occur.On the other hand, if the hypothesis is not rejected, you still want to estimate the differencesbetween the follow-up times with 95% confidence intervals to assess whether this is likely dueto lack of effect or due lack of power. Hence, you might as well skip the F-test and do thepairwise comparisons to begin with (with proper adjustment for multiple testing).

6.4 SAS Program: Alternative syntax (all means and differences)

Often researchers want to report estimated means for the various occasions along with the esti-mated changes in means which we obtained from the program in section 6.1. Also you mightwant to estimate changes between all of the various time points instead of just changes sincebaseline.

We can make PROC MIXED estimate means for all occasion and differences between all oc-casion with a small change to the syntax. Adding NOINT as option to the MODEL statement,makes SAS fit the model without an intercept, meaning that each occasion gets a separate meanparameter. In addition the LSMEANS statement estimates differences between means for allpairs of occasions if you supply the DIFFS option:

/**** Alternative syntax: Means over time and all against all ****/

PROC MIXED DATA=long PLOTS=ALL;CLASS id visit;ODS OUTPUT lsmeans=weightmeans diffs=weightdiffs;MODEL weight=visit / NOINT DDFM=KR2 VCIRY OUTPM=fitmain;LSMEANS visit / DIFF CL;REPEATED visit / SUBJECT=id TYPE=UN R RCORR;RUN;

Note that, besides the changes to the MODEL statement, the remaining syntax is similar to thatof section 6.1. To save the estimated means and differences for reporting, I have added thefollowing ODS OUTPUT statement:

ODS OUTPUT lsmeans=weightmeans diffs=weightdiffs;

29

which saves the estimates to two different datasets called weightmeans and weightdiffs.From this we get the estimated means with approximate 95% confidence intervals:

Obs visit Estimate Lower Upper1 1 128.97 119.48 138.462 2 121.24 112.39 130.093 3 115.70 107.15 124.254 4 102.37 94.3835 110.35

When interpreting the confidence intervals it is important to keep the difference between inde-pendent and paired samples in mind. Estimated means from baseline and 1 week before surgeryare contained in each others confidence intervals. Nevertheless, we conclude a significant dif-ference in mean bodyweight with P < 0.0001 as can be seen from the estimated differences:

Obs visit _visit Estimate Probt Lower Upper1 1 2 7.7300 <.0001 6.2702 9.18982 1 3 13.2700 <.0001 11.5135 15.02653 1 4 26.6050 <.0001 23.3621 29.84794 2 3 5.5400 <.0001 4.8251 6.25495 2 4 18.8750 <.0001 16.3225 21.42756 3 4 13.3350 <.0001 11.1122 15.5578

This is a consequence of the strong correlation in the data: The paired t-test has much higherstatistical power than the two-sample t-test. Recall that, we can only infer a significant differ-ence from non-overlapping confidence intervals when samples are independent. This is a factyou might have to remind the reviewer of your paper (otherwise you risk getting rejected due topostulated statistical errors).

6.5 Using predicted values and residuals for validation

Predicted population means: As an initial model check I plot the predicted population meansover time. These should be compared to the similar plots of the summary statistics (section 4.4).If the predictions deviate substantially from the summary statistics, this would indicate eitheran error in my program or a poor choice of model.

As we have already noted, estimated means for the bodyweights are identical to the summarystatistics from section 4. This will always be the case when you have complete data from abalanced single group study and fit a fully flexible model for the mean (i.e. time as a categoricalcovariate, so that each occasion gets its own particular mean parameter).

Residualplots are useful for checking the modeling assumption that data follows a multivariatenormal distribution and for detecting outliers in the data.

Here I look at the studentized residuals which are directly comparable to the standard normaldistribution. If data is truly normal, the points ought to be on the diagonal line in QQ-plotsave from small random deviations. In this case the points in the QQ-plot seems to be smilingindicating that the distribution of the residuals is skewed compared to the normal distribution.

30

Figure 7: Predicted population means from the mixed model analysis of the gastric bypassstudy.

Figure 8: Studentized residuals from the mixed model analysis of the gastric bypass study.

Further note that a large residual (either positive or negative) indicates that the outcome is farfrom the expected mean outcome in the population (measured in number of standard devia-tions). To identify which subjects the outlying residuals belong to, we could inspect the outputdata. However, the most extreme outliers should be obvious already in the spaghettiplots whichare easier to interpret since data is shown on its original scale. In this case all the residual values> 2 belong to the one subject who is the heaviest in the sample.

Finally we use the scaled residuals (section 5.10) to check whether the dynamical predictionsfrom the multivariate normal distribution are in line with the data. Figure 9 below indicate agood fit for the gastric bypass data.

31

Figure 9: Scaled residuals from the mixed model analysis of the gastric bypass study indicate agood fit of the unstructured covariance pattern model (obtained from SAS PROC MIXED.

References

Fitzmaurice, G. M.; Laird, N. M. & Ware, J. H. (2012). Applied longitudinal analysis, volume998. John Wiley & Sons.

Jorsal, T.; Christensen, M.; Mortensen, B.; Nygaard, E.; Zhang, C. & more (2019). “Theeffect of Roux-en-Y gastric bypass surgery on the plasma hormone and gut gene expressionprofile”. submitted for publication.

Kenward, M. G. & Roger, J. H. (2009). “An improved approximation to the precision of fixedeffects from restricted maximum likelihood”. Computational Statistics & Data Analysis,53(7):2583–2595.

Pinheiro, J. & Bates, D. (2006). Mixed-effects models in S and S-PLUS. Springer Science &Business Media.

Pinheiro, J.; Bates, D.; DebRoy, S.; Sarkar, D. & R Core Team (2019). nlme: Linear andNonlinear Mixed Effects Models. R package version 3.1-140.

32

A Appendix: Importing data from an external file.

Rather than creating data from within SAS, we usually import data from an external file. If youprefer this option, a text file version of the gastric bypass data can be found at my webpagetogether with the program files.

To read the data from gastricbypass.txt, first download it to your computer. Next runthe code below. Please note that you have to change the path in the program so that it matchesthe working directory where you have stored the file.

DATA wide;INFILE "C:\repeated\casestudies\gastricbypass.txt" FIRSTOBS=2;INPUT id weight1-weight4 glucagonAUC1-glucagonAUC4;RUN;

The argument FIRSTOBS=2 tells SAS that the first record in the data is found in the secondline of the datafile, while the first line contains the names of the variables.The names of thevariables must be supplied to SAS separately in the INPUT statement.

B Appendix: What is in the output from PROC MIXED?

PROC MIXED produces a lot of output. Here I will comment on it, bit by bit.

The first part of the output from PROC MIXED is a technical summary:

Model Information

Data Set WORK.LONGDependent Variable weightCovariance Structure UnstructuredSubject Effect idEstimation Method REMLResidual Variance Method NoneFixed Effects SE Method Kenward-Roger2Degrees of Freedom Method Kenward-Roger2

I.e. what dataset has been analyzed, which variable was the outcome, what model was assumedfor the covariance, which variable identifies the subjects in the data, and what estimation pro-cedure was applied. REML is an abbreviation of restricted maximum likelihood which is theoptimal statistical method for estimating the parameters in a linear mixed model. So this is allgood. The last part of the summary refers to small sample correction for the degrees of freedom.

Next, we are recalled which variables we have specified as categorical, and what their particularcategories are. Note that if we have specified a reference category this is listed as the last.

33

Class Level Information

Class Levels Valuesid 20 1 2 3 4 5 6 7 8 9 10 11 12 13

14 15 16 17 18 19 20visit 4 2 3 4 1

Next we get information about the number of model parameters PROC MIXED has used in itsinternal representation of the model:

Dimensions

Covariance Parameters 10Columns in X 5Columns in Z 0Subjects 20Max Obs per Subject 4

Covariance parameters is the number of distinct parameters in the model for the covariance (inthis case an unstructured covariance pattern with four variances and six correlations). Columsin X is the number of parameters in PROC MIXEDs list of fixed effect estimates. This is usu-ally somewhat larger than the number of mean parameters in the model as PROC MIXED outputredundant model parameters that has been set to zero (see fixed effects below). Columns inZ refers to the number of random effects in the random effects part of the model (see lectures3 and 4). Subjects is the number of subjects in the REPEATED statement. In this case wehave 20 subjects with four observations each.

Next PROC MIXED summarizes the number of observations in the dataset:

Number of Observations

Number of Observations Read 80Number of Observations Used 80Number of Observations Not Used 0

The case study data contains four records from each of twenty subjects, which amounts to eightyobservations in total. None of the weights are missing, hence all records have been used to fitthe model.

Next an important part of the output: the summary of the numerical optimisation:

Iteration History

Iteration Evaluations -2 Res Log Like Criterion0 1 672.498010771 1 446.75857837 0.00000000

Convergence criteria met.

34

We confirm that the optimisation has converged.

By supplying R and RCORR options to the REPEATED statement in the program, we haverequested estimates of the R covariance matrix and the corresponding correlations, so this iswhat we get here:

Estimated R Matrix for id 1


Estimated variances are on the diagonal of the R matrix. For interpretation we can compute thecorresponding standard deviations as I did in section 6.3. The corresponding correlations are:

Estimated R Correlation Matrix for id 1


If you compare these to the correlation coefficients we computed as descriptive statistics, youwill see that they are just the same. This will happen any time you have complete data from abalanced single group study.

After this, we get default output which lists the estimated covariance parameters from the un-structured covariance pattern. These are the same estimates that appear in the R covariancematrix, only harder to read.

Covariance Parameter Estimates

Cov Parm Subject EstimateUN(1,1) id 357.60UN(2,1) id 344.62UN(2,2) id 333.99UN(3,1) id 309.34UN(3,2) id 301.13UN(3,3) id 290.84UN(4,1) id 379.36UN(4,2) id 365.37UN(4,3) id 326.84UN(4,4) id 410.85

Note that for other choices of covariance patterns the default parameters may be more compre-hensible.

35

Next we get some overall goodness of fit measures; The deviance (-2 log likelihood value), twovariants of the Akaike information criterion (AIC and AICC), and the Bayesian informationcriterion (BIC):

Fit Statistics

-2 Res Log Likelihood 446.8AIC (Smaller is Better) 466.8AICC (Smaller is Better) 470.1BIC (Smaller is Better) 476.7

The two former can be used to compare models that are not necessarily nested. I rarely use thesemeasures and I disrecommend using them for model selection. If you are in doubt whether to fitone or the other model to your data, I would recommend to fit both and compare the statisticalresults in a sensitivity analysis. This way you sidestep the risk of choosing a wrong model.After all having a somewhat higher AIC than another model is no guarantee that the model youhave selected is the correct one. Furthermore, if your candidate model doesn’t fit the data at all,this should be obvious from the diagnostic plots (section 6.5).

Then we get a test of the global null hypothesis i.e. the hypothesis that none of the covariatesin the model has any effect.

Null Model Likelihood Ratio Test

DF Chi-Square Pr > ChiSq9 225.74 <.0001

In this case we only have one covariate, namely visit, so this is a test of the hypothesisH0 : µ1 = µ2 = µ3 = µ4 stating that mean weight do not change with time. Please note that theP-value is based on a likelihood ratio Chi-square test, without any small sample correction.

The most interesting part of the summary are the estimates for the effect of the covariates; Thisis the part of the summary output which I commented on in section 6.3:

Solution for Fixed Effects

Effect visit Estimate StdError DF tValue Pr>|t| Lower UpperIntercept 128.97 4.5324 19 28.46 <.0001 119.48 138.46visit 2 -7.7300 0.6974 19 -11.08 <.0001 -9.1898 -6.2702visit 3 -13.2700 0.8392 19 -15.81 <.0001 -15.0265 -11.5135visit 4 -26.6050 1.5494 19 -17.17 <.0001 -29.8479 -23.3621visit 1 0 . . . . . .

Along with the estimates we get standard error, t-test statistics, and a p-values for testing thehypothesis that the parameter in consideration is zero.

Finally PROC MIXED outputs type 3 tests of fixed effects. With only one covariate in the modelthis is yet another test of the global null hypothesis H0 : µ1 = µ2 = µ3 = µ4, only this time thetest is based on an F-test with apropriate small sample correction to the degrees of freedom:

36

Type 3 Tests of Fixed Effects

Num DenEffect DF DF F Value Pr > Fvisit 3 17 108.85 <.0001

C Appendix: Analysis of log-transformed data.

We previously noted that the distribution of the bodyweights is skewed with few very heavysubjects, a majority of moderately heavy subjects, and no normal weight subjects. Also theweights of the most heavy subjects appeared more variable over time than those of the lightersubjects in the sample. Finally there was a trend that the standard deviations decreased alongwith the means. All of this speaks in favour of a logarithmic-transformation.

Overall recommendations for log-transformation is to use it when:

• It is a substantial improvement on the original model fit.

• There is precedence to transform outcomes of this particular type in the literature.

• It is desirable that changes over time or differences between treatments should be esti-mated in relative terms, i.e. as percentwise changes or differences.

If you are in doubt whether your data should be transformed it is usually better to stick to theoriginal scale. Otherwise consult with a statistician about it.

Figure 10 below compares the spaghettiplot of log2-transformed data with that of the originaldata and figure 11 (on next page) shows the scatteplot matrix for the log2-transformed data. Thedistribution of the log2-transformed data is more symmetric and therefore closer to a normaldistribution than the original data.

Figure 10: Bodyweights from the gastric bypass study on original scale (left) and after log2-transformation (right).

37

Figure 11: Scatterplot matrices of log2-transformed weights from the gastric bypass study. Thisshould be compared with figure 3 showing the data on the original scale.

.

Making an analysis of log-transformed data is similar to making an analysis of untrans-formed data save from the first and the last step in the analysis. The first step would be to addthe log-transformed outcome to the long data and review the spaghettiplot:

/* Add log-transformed weights to the long data */DATA long;SET long;log2w=log2(weight);RUN;

/* Spaghettiplot */PROC SGPLOT DATA=long NOAUTOLEGEND;SERIES X=visit Y=log2w / GROUP=id;RUN;

Then we redo the mixed model analysis. Please note that the SAS code is identical to that insection 6.4 save from the name of the outcome and ODS OUTPUT statement which saves theestimates to a dataset:

38

PROC MIXED DATA=long PLOTS=ALL;ODS OUTPUT SOLUTIONF=estimates;CLASS id visit (REF="1");MODEL log2w=visit / CL DDFM=KR2 VCIRY OUTPM=fitlog RESIDUAL;REPEATED visit / SUBJECT=id TYPE=UN R RCORR;RUN;

From here on the analysis proceeds as in section 6.4. After we extract the estimates and confi-dence intervals for reporting, the final step is to back-transform for interpretation:

/* Back-transform estimates for interpretation */DATA estimates;SET estimates;estimate=2**estimate;lowerCL=2**lower;upperCL=2**upper;KEEP estimate lowerCL upperCL Probt;RUN;

PROC PRINT DATA=estimates;RUN;

Obs Estimate Probt lowerCL upperCL1 127.55 <.0001 118.851 136.8782 0.9402 <.0001 0.930 0.9503 0.8969 <.0001 0.886 0.9084 0.7929 <.0001 0.775 0.8115 1.0000 . . .

Hence, median bodyweight has decreased by -6.0% (95% CI: -6.9% to -5.1%) at first follow-up,by -10.3% (95% CI: -11.3% to -9.3%) at second follow-up, and by -20.7% (95% CI: -22.4% to-19.0%) at end of study.

Similarly, we could obtain estimates of means over time on log2-scale and back-transform theminto estimated medians on the original scale.2 Note that the estimated median bodyweight atbaseline is 127.5 kg (95% CI: 119.3 to 136.4 kg). This is smaller than the estimated mean of129.0 kg we found when analyzing the data on the original scale, albeit not much smaller as thedistribution of the bodyweights is only moderately skew.

To assess whether the linear mixed model fits the log2-transformed data well we inspect theresidualplots. Figures 12 and 13 below shows the studentized and scaled residuals (remainingresidual plots omitted). After the transformation, the studentized residuals show a somewhatbetter fit to the normal distribution whereas the scaled residuals show a similarly good fit as theuntransformed data.

2Since transformation preserves quantiles (including the median) and since in a symmetric distribution (includ-ing the normal distribution) the mean is the same as the median. In case data on log-scale is only approximatelynormal (symmetric), back-transformed means will be geometric means on the original scale.

39

Figure 12: Studentized residuals from the mixed model analysis of the log2-transformed data.

Figure 13: Scaled residuals from the mixed model analysis of the log2-transformed data.

40

Introduction to linear mixed models for repeated measurements...

Documents

Transcript of Introduction to linear mixed models for repeated measurements...