HW3

BST762/STA632 Homework Assignment #3

Due Thursday, October 8, 2015, at the beginning of class

1. For this problem, you will be working with the dataset (smoking.sas7bdat) from the Vlagtwedde-Vlaardingen Study, which is used as an example in Section 6.5 of your book. In short, your outcome is FEV1, and we are interested in comparing the mean trend of FEV1 over time for former (smoke=0) and current smokers (smoke=1). This dataset is an example for which the temporal spacing is the same for everyone (planned measurements at 0, 3, 6, 9, 12, 15, and 19 years), but subjects did not have to contribute data at each of these 7 time points. In fact, some of the 133 subjects only contributed FEV1 at one time point.

The book fit a model that included the main effects and interaction of time in years and smoking category. They further found that an interaction does not belong in the model. Therefore, you will fit the following model for this problem:

$$\mathrm{FEV1}_{ij} = \beta_0 + \beta_1\,\mathrm{smoke}_i + \beta_2\,\mathrm{Time}_{ij} + \varepsilon_{ij}$$

a) Just by looking at Figure 6.4 on page 155, do you think the book’s conclusion that an interaction does not belong in the model is correct? Why or why not? (5 points)

b) Your book fit this model using an unstructured covariance matrix. You are to fit it with multiple covariance structures, using the full dataset, and then again using the first 35 subjects. Fill in the results for the tables in the Word document. (Do this twice, once for the full dataset, and once for the reduced dataset. Therefore, you will have a total of 4 tables.) (15 points)

Below is example code that you can use to obtain the reduced dataset:

data sample; set smoking; if id>35 then delete; run;

data smoking; set smoking; t=time; run;

When utilizing empirical SE estimates incorporating the Mancl and DeRouen (2001) correction, the following code may be helpful. The t is needed in the random statement because not everyone contributes an observation at each time point.

proc glimmix data=smoking empirical=FIRORES;
  class id t;
  model fev1 = smoker time / solution;
  random t / subject=id type=un vcorr=1 residual;
run;

c) Look at your results from the analyses of the full dataset.

i) Are there notable differences between the use of the typical model-based standard errors (SEs) and df relative to the use of the Kenward and Roger adjustment? Why or why not? (5 points)

ii) Are there notable differences between use of the empirical SEs and df relative to the use of the bias-corrected (using the Mancl and DeRouen correction) SEs and df? Why or why not? (5 points)

iii) Comparing SE estimates, which structure seems to be correct, or at least reasonably close to being correct? Hint: Remember when these different SE estimators are and are not appropriate; i.e., when they are and are not consistent estimators for the true SEs. (5 points)

d) Look at your results from the analyses of the reduced dataset.

i) For which structure are there notable differences between the use of the typical model-based standard errors (SEs) and df relative to the use of the Kenward and Roger adjustment? Why does this occur only for this one structure? Hint: How many nuisance covariance parameters are you estimating? (5 points)

ii) Are there notable differences between use of the empirical SEs and df relative to the use of the bias-corrected (using the Mancl and DeRouen correction) SEs and df? Why or why not? (5 points)
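
The hint in d) i) turns on how many covariance parameters each structure asks the reduced dataset (35 subjects) to estimate. A minimal Python sketch of that count, assuming the 7 planned measurement occasions from the problem statement. The three structures shown (unstructured, compound symmetry, AR-1) are illustrative assumptions and may not match the structures requested in the Word document.

```python
def n_cov_params(structure, n_times):
    """Number of covariance parameters for a few common structures.

    The structure labels are illustrative; they are not checked against
    the tables in the assignment's Word document.
    """
    if structure == "un":    # unstructured: every variance and covariance
        return n_times * (n_times + 1) // 2
    if structure == "cs":    # compound symmetry: one variance, one correlation
        return 2
    if structure == "ar1":   # first-order autoregressive: one variance, one correlation
        return 2
    raise ValueError(f"unknown structure: {structure}")

# Seven planned measurement occasions, as in the smoking dataset.
for s in ("un", "cs", "ar1"):
    print(s, n_cov_params(s, 7))
```

The count for the unstructured matrix grows quadratically with the number of occasions, while the other two stay fixed.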

2. Suppose we carry out a general study of 75 subjects, and we are simply interested in the association between X and Y (see the dataset association.sas7bdat). Fit the following simple linear regression model in proc reg:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

a) Look at the diagnostic plots that proc reg automatically outputs. Do you see any model violations? If so, what violation(s) do you see, and how can you tell? (5 points)

b) Fit the model again, this time using the robust empirical SEs (use the Kauermann & Carroll correction). Fit the model in proc glimmix. Here is the appropriate code:

proc glimmix data=hw.Association empirical=root;
  class id;
  model y=x / solution;
  random _residual_ / subject=id type=vc;
run;

i) How have the SE estimates changed? (You should report the model-based SE estimates from part a) and the empirical SEs.) (5 points)

ii) Which SE estimates are appropriate to use: the model-based estimates from proc reg or the empirical estimates? (5 points)
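
To make the comparison in b) concrete, here is a minimal Python sketch of how model-based and empirical (sandwich) SEs for the slope are built in simple linear regression. It assumes independent observations, in which case the Kauermann-Carroll-type and Mancl-DeRouen-type corrections amount to inflating each squared residual by 1/(1 - h_i) and 1/(1 - h_i)^2, where h_i is the leverage. The data are hypothetical, the dictionary keys merely echo the SAS option names, and the sketch is not meant to reproduce proc glimmix output.

```python
import math

def slope_standard_errors(x, y):
    """Slope estimate and four SE estimates for simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Leverage of each observation (diagonal of the hat matrix).
    h = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]

    # Model-based SE: assumes one common error variance.
    s2 = sum(e ** 2 for e in resid) / (n - 2)
    se_model = math.sqrt(s2 / sxx)

    def sandwich(power):
        # Each squared residual is inflated by 1 / (1 - h)^power.
        meat = sum((xi - xbar) ** 2 * e ** 2 / (1.0 - hi) ** power
                   for xi, e, hi in zip(x, resid, h))
        return math.sqrt(meat) / sxx

    return {
        "slope": b1,
        "model": se_model,
        "classical": sandwich(0),  # uncorrected empirical SE
        "root": sandwich(1),       # Kauermann-Carroll-type correction
        "firores": sandwich(2),    # Mancl-DeRouen-type correction
    }

# Hypothetical data whose spread grows with x (not the association dataset).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.1, 1.9, 3.2, 3.7, 5.6, 5.1, 8.3, 6.4]
for name, value in slope_standard_errors(x, y).items():
    print(f"{name:10s} {value:.4f}")
```

Because every leverage lies between 0 and 1, the corrected empirical SEs are never smaller than the uncorrected one.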

3. Suppose we carry out a general study of 100 subjects, and we fit the model below using the quest3data.sas7bdat dataset. This study is meant to represent a study in which subjects come in for four equally spaced visits, and we are interested in the association between two variables (x1 and x2) and an outcome (Y). All variables are time-dependent; i.e., their values are not fixed throughout the study. Such variables could be blood pressure, body weight, etc. We think that there is no time effect, but we want to first test and make sure there is no time effect. Therefore, this model has a main effect for time, and interactions between time and x1 and x2. Suppose we know the true covariance structure has common variances at each time point and the correlation structure is AR-1. (Note: I generated/simulated this fake dataset, so I know that the true model from which I generated data has this structure and has no time effect.)

All tests should be at the 5% significance level. For a)-c), fit the true covariance structure, and assume that you are confident that it is the true covariance structure. Note that time is continuous, so do not include it in the class statement in SAS.

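$$Y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \beta_3\,\mathrm{time}_j + \beta_4\,\mathrm{time}_j\,x_{1ij} + \beta_5\,\mathrm{time}_j\,x_{2ij} + \varepsilon_{ij},$$

$$i = 1, \ldots, 100; \quad j = 1, 2, 3, 4; \quad \mathrm{time}_j = j - 1$$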

a) Carry out a likelihood ratio test for the following (testing to see if time belongs in the model): (10 points)

$$H_0: \beta_3 = \beta_4 = \beta_5 = 0$$

$$H_A: \beta_3 \neq 0 \text{ and/or } \beta_4 \neq 0 \text{ and/or } \beta_5 \neq 0$$
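
One way to organize this test: fit the full and reduced mean models by maximum likelihood (not REML, since the two models have different fixed effects) under the same AR-1 structure, then take the difference in -2 log-likelihood, which is approximately chi-square with 3 df under H0. A minimal Python sketch of the p-value calculation follows. The two -2 log-likelihood values are made-up placeholders, not results from quest3data, and the closed-form tail probability used here holds only for 3 df.

```python
import math

def chi2_sf_3df(stat):
    """Upper-tail probability of a chi-square(3) variable (closed form)."""
    if stat <= 0:
        return 1.0
    return (math.erfc(math.sqrt(stat / 2.0))
            + math.sqrt(2.0 * stat / math.pi) * math.exp(-stat / 2.0))

def lrt_3df(neg2ll_reduced, neg2ll_full):
    """LRT statistic and p-value for dropping 3 fixed effects (ML fits)."""
    stat = neg2ll_reduced - neg2ll_full
    return stat, chi2_sf_3df(stat)

# Placeholder -2 log-likelihoods, for illustration only.
stat, p = lrt_3df(neg2ll_reduced=1234.5, neg2ll_full=1230.0)
print(f"LRT statistic = {stat:.2f}, p = {p:.4f}")
```

In practice the two inputs would be the "-2 Log Likelihood" values reported for the reduced and full fits.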

b) Take time completely out of the model, and refit as below.

$$Y_{ij} = \beta_0 + \beta_1 x_{1ij} + \beta_2 x_{2ij} + \varepsilon_{ij}$$

Use model-based standard error estimates and the Kenward and Roger (1997) adjustment.

i) Use Wald tests (you will do 2 separate tests) to test whether or not each of these two covariates is associated with the outcome. Specifically, test the following for j = 1, 2:

$$H_0: \beta_j = 0$$

$$H_A: \beta_j \neq 0$$

For each test, give the value for the test statistic, state what distribution the statistic approximately follows under H0, give the p-value, and state your conclusion. (10 points)

ii) What are the 95% confidence intervals (CIs) for β1 and β2? Show your work. Hint: The critical values for the above two tests (if based on a t-distribution) are approximately ±1.97 (exact critical values can be obtained using the following R code: qt(.975,df)). (5 points)
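
For reference, a t-based interval of the kind the hint describes has the usual Wald form. Here $\nu_j$ denotes the denominator df reported for coefficient j under the Kenward and Roger adjustment; the symbol is introduced only for this display.

```latex
\hat{\beta}_j \;\pm\; t_{0.975,\,\nu_j}\,\widehat{\mathrm{SE}}(\hat{\beta}_j), \qquad j = 1, 2
```

The multiplier $t_{0.975,\,\nu_j}$ is the value returned by qt(.975,df) in the hint.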

c) You will now compare SE estimates. For model-based SE estimates, use the Kenward and Roger (1997) adjustment. For empirical SE estimates, use the Kauermann and Carroll (2001) bias-correction.

i) What are the model-based SE estimates for β̂1 and β̂2? What are the empirical SE estimates for β̂1 and β̂2? Are the empirical and model-based SE estimates similar? Why or why not? (5 points)

ii) Now fit the model using a covariance structure that assumes common variances and a CS correlation. What are the model-based SE estimates for β̂1 and β̂2? What are the empirical SE estimates for β̂1 and β̂2? Are the empirical and model-based SE estimates similar? Why or why not? (5 points)

iii) Which working covariance structure resulted in the smaller empirical SE estimates, and why? (5 points)
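
The two working structures compared in c) share a common variance and differ only in how the correlation changes with the gap between visits. A minimal Python sketch that builds both 4 x 4 covariance matrices; the variance 1.0 and correlation 0.6 are arbitrary illustrative values, not estimates from quest3data.

```python
def ar1_cov(variance, rho, n_times):
    """Common variance; correlation rho**|j - k| between visits j and k."""
    return [[variance * rho ** abs(j - k) for k in range(n_times)]
            for j in range(n_times)]

def cs_cov(variance, rho, n_times):
    """Common variance; the same correlation rho for every pair of visits."""
    return [[variance if j == k else variance * rho for k in range(n_times)]
            for j in range(n_times)]

# Arbitrary illustrative values for the variance and correlation.
for name, build in (("AR-1", ar1_cov), ("CS", cs_cov)):
    print(name)
    for row in build(1.0, 0.6, 4):
        print("  " + "  ".join(f"{v:.3f}" for v in row))
```

Under AR-1 the correlation decays as visits get further apart, whereas under CS it stays constant for every pair.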