McGill University
Part A Examination in Statistics
Methodology Paper
Department of Mathematics & Statistics
Date: Thursday, August 23rd 2007 Time: 13:00–17:00
Instructions
• Answer two questions out of Section 1. Only two questions will be marked.
• Answer two questions out of Section 2. Only two questions will be marked.
• If you do not indicate which questions you wish to have marked, the questions will be marked in the order in which they appear in the answer book until the quota has been reached.
• All questions are weighted equally (20 marks each).
• Good luck!
This exam comprises the cover, questions on pages 1 to 8, and tables on pages 41 to 45.
v 5-20070427
Part A Examination August 2007 Methodology Paper
Section 1: Answer two of questions SM1 to SM3.
SM1. This dataset is courtesy of Dr Waldon Garris, University of Virginia School of Medicine. Dr Garriss collected the data in a pilot study during his work in the Dominican Republic in 1997. The subjects are persons who came to medical clinics in several villages; the variables age, gender, village name, systolic blood pressure, and diastolic blood pressure were collected.
The primary research question of interest is to determine the extent to which we can use the first three covariates to predict the systolic blood pressure.
The output for question SM1 begins on page 9.
(a) [3 marks] Test for significance of each of the three covariates individually. Refer clearly to the part(s) of the output that you are using for your tests. Is there evidence to include only one (if any) of the covariates in the model? Or should one include both covariates? Explain.
(b) [3 marks] Test for the significance of each of the three covariates in the presence of both of the others. What covariates should be included in the model for systolic blood pressure? Explain.
(c) [4 marks] State and comment on the appropriateness of the assumptions of the linear regression model that you've selected.
(d) [10 marks] Refer to the code and figures for Question #1, Part (d) for the following questions. We will only examine models that include all three covariates for conciseness.
i. Interpret the Box-Cox transformation plot. Do the plot and the Box-Cox procedure suggest that a transformation of the data is necessary?
ii. Assume now that one would transform the data (regardless of your answer in part (i)). What transformation would you propose for this dataset? Briefly explain why you chose your transformation. Briefly discuss the advantages and disadvantages of transforming the data the way you've proposed.
iii. A researcher wants to use the AIC and BIC to select a "best" regression model for the data. Can the researcher use the AIC and BIC (not shown) to choose amongst transformations? Why or why not? If not, be sure to suggest a possible adjustment to the AIC/BIC values that would make such a comparison more reasonable.
SM2. This dataset gives sugar cane yields for each paddock in the Mulgrave area of North Queensland for the 1997 sugar cane season. It was obtained by David Gregory and Nick Denman for their MS305 data project at The University of Queensland in 1998. There are 3775 observations in the dataset.
Mulgrave is a region in North Queensland around the Mulgrave river and the city of Cairns. Sugar cane is the primary industry in Mulgrave, and all sugar cane from the area is processed through the Mulgrave Central Mill. The data were provided by the Bureau of Sugar Experimental Stations (BSES) on behalf of the Mulgrave Central Mill and were obtained from the OzDASL repository.
The response variable of interest is the commercial sugar content per rake produced (Sugar). The goal of the analysis is to discover predictors of sugar content and the best regression model using the following predictor variables:
• DistrictPosition: The Mulgrave area has been divided by the BSES into fifteen districts, but the statistical authors grouped them further by location into 5 groups (Central, North, South, East, and West).
• Age: Cane planted the year before may be regarded as having age zero. Cane left to grow for one year after being cut (that is, the cane is first ratoon) can be considered to have an age of one.
• HarvestMonth: The sugar cane cutting season usually begins in June and concludes in mid-November, the finishing date depending on how the season has gone with respect to rainfall and mill breakdowns. Months are labelled by their numerical equivalent (June = 6, July = 7, etc.).
The output for question SM2 begins on page 22.
(a) [6 marks] Refer only to the code and plots for part (a) for the following two parts.
i. Give an interpretation for the coefficient DistrictPositionN with respect to the mean of the response variable.
ii. Test for a significant effect of District Position Group on the sugar content yield. Comment on the model fit and validity of the model assumptions.
(b) [8 marks] Refer only to the code and plots for parts (a) and (b) for the following three parts.
i. Test for a significant effect of Age by itself.
ii. Test for significant effects of both DistrictPosition and Age in the presence of the other.
iii. Comment on the model fit and validity of the model assumptions for both models in b(i) and b(ii).
iv. Should both covariates be included in the model? Explain why or why not.
(c) [6 marks] Refer only to the code for part (c) for the following part.
i. Assume that we will include Age and DistrictPosition in the model.There are three suggested modelling choices for HarvestMonth:
• Modeling HarvestMonth as a single quantitative variable
• Modeling HarvestMonth with a linear and quadratic term
• Modeling HarvestMonth as a factor (or categorical) variable
Discuss the relative merits of each of the three models from a statistical perspective and choose what you would consider the "best" model. Be sure your discussion makes clear your reasons for selecting one model over the other two.
SM3. The mean shift outlier model is given by:
$$y = X\beta + \phi z_n + \varepsilon$$
where $z_n$ is a given $(n \times 1)$ vector with zeroes in all positions except the $n$-th position, which contains a 1. $\phi$ is an unknown scalar parameter and $\varepsilon$ is an $(n \times 1)$ vector of independent Normal$(0, \sigma^2)$ random variables.
(a) [2 marks] Write out the expected value for the $n$-th observation. What is the intercept for this model (i.e. what is the expected value for an observation with all covariate values equal to 0)?
(b) [4 marks] If
$$d = [z_n'(I - H)z_n]^{-1} z_n'(I - H)y$$
then show that $d = (y_n - \hat{y}_n)/(1 - h_{nn})$, where $h_{nn}$ is the $n$-th diagonal element of the hat matrix for the covariates in $X$.
(c) [4 marks] Show that the increase in regression sum of squares after fitting $\phi$ is given by
$$SSR_2 = d^2 z_n'(I - H)z_n = e_n^2/(1 - h_{nn}).$$
(d) [4 marks] From part (c), deduce the corresponding reduction in SSE.
(e) [6 marks] What statistical test would you do to test the hypothesis $H_0: \phi = 0$? What influence diagnostic is this test statistic equivalent to?
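The identities quoted in parts (b) and (c) can be checked numerically before attempting the algebra. The following is a minimal R sketch on simulated data and is not part of the original paper; the simulated design and the names x, z, fit0 and fit1 are assumptions made purely for illustration. It is a sanity check, not a proof.

```r
# Numerical check of the mean shift outlier identities (illustrative only).
set.seed(1)
n <- 20
x <- rnorm(n)                       # hypothetical single covariate
y <- 1 + 2 * x + rnorm(n)           # simulated response
X <- cbind(1, x)                    # design matrix without the shift term
H <- X %*% solve(crossprod(X)) %*% t(X)   # hat matrix for X
z <- c(rep(0, n - 1), 1)            # indicator of the n-th observation

fit0 <- lm(y ~ x)                   # model without the mean shift
fit1 <- lm(y ~ x + z)               # model with the mean shift term

e_n  <- unname(resid(fit0)[n])      # ordinary residual of observation n
h_nn <- H[n, n]                     # its leverage

# d from the formula given in part (b)
M <- diag(n) - H
d_formula <- drop(solve(t(z) %*% M %*% z) %*% t(z) %*% M %*% y)
# d as the fitted coefficient of z
d_fit <- unname(coef(fit1)["z"])
# change in residual sum of squares when z is added
ssr_gain <- sum(resid(fit0)^2) - sum(resid(fit1)^2)

stopifnot(isTRUE(all.equal(d_formula, e_n / (1 - h_nn))))
stopifnot(isTRUE(all.equal(d_fit, e_n / (1 - h_nn))))
stopifnot(isTRUE(all.equal(ssr_gain, e_n^2 / (1 - h_nn))))
```

If the assertions pass silently, the simulated data agree with the stated expressions for d and for SSR2 up to floating-point tolerance.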
Section 2: Answer two of questions SM4 to SM6.
SM4. Dr P. J. Solomon of the Australian National Centre in HIV Epidemiology and Clinical Research collected data on 2843 patients diagnosed with AIDS in Australia before 1 July 1991:
state: Grouped state of origin: NSW, Other, QLD or VIC
sex: Sex of patient
diag: (Julian) date of diagnosis (days)
death: (Julian) date of death or end of observation (days)
status: "A" (alive) or "D" (dead) at end of observation
T.categ: Reported transmission category (8 categories)
age: Age (years) at diagnosis.
The survival time (time) was assumed to have an exponential distribution with a log link to a linear model in the regressors.
The output for question SM4 begins on page 36.
Choose suitable models to decide if the survival time is related to
(a) [6 marks]
i. state
ii. sex
iii. transmission category
(b) [6 marks]
i. age
ii. date of diagnosis.
Does the survival time increase or decrease with age? With date of diagnosis?
(c) [8 marks] Choose a suitable model to estimate the mean survival time of a 25 year old male patient diagnosed with AIDS in NSW on July 1 2004 (diag=16253) who reported transmission by heterosexual contact (T.categhet), and the probability that such a patient would survive more than three years (365×3=1095 days). How reliable do you think this estimator is?
SM5. A breast cancer database was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg. He assessed biopsies of breast tumors for 699 patients up to 15 July 1992; each of nine attributes has been scored on a scale of 1 to 10, and the outcome is also known: benign (Y=0) or malignant (Y=1). This data frame contains the following columns:
V1 Clump thickness
V2 Uniformity of cell size
V3 Uniformity of cell shape
V4 Marginal adhesion
V5 Single epithelial cell size
V6 Bare nuclei (16 values are missing)
V7 Bland chromatin
V8 Normal nucleoli
V9 Mitoses
class "benign" or "malignant"
The output for question SM5 begins on page 38.
(a) [6 marks] To relate the probability that a tumor is malignant to the first variable, clump thickness, two sets of models were fitted, the first assuming a normal family, the second assuming a binomial family. Choose suitable models to test for a linear effect of V1.
(b) [3 marks] A factor fV1 was created taking the values of V1 as levels. Use this to test if the effect of clump thickness is linear in V1 (as opposed to non-linear).
(c) [3 marks] Look at Figure 17 (a) and (b) on page 41. Why is it that the plots of the fitted values using the model with V1 (triangles) are different in Figures (a) and (b), yet the plots of the fitted values using the model with fV1 (circles) are the same in Figures (a) and (b)? Explain.
(d) [6 marks] Which attributes are related to the malignancy of breast tumors?
(e) [2 marks] Do you think a goodness of fit test for the last model is valid? If so, do it; if not, say why not.
SM6. Carl Morris (see next page) showed that there are only six families of distributions in the exponential family with quadratic variance functions: normal, Poisson, gamma, binomial, negative binomial, and a sixth distribution which he called the hyperbolic secant distribution. Its variance function is $V(\mu) = \mu^2 + 1$, it is continuous on $(-\infty, \infty)$ (like the normal distribution), but it is not symmetric. The deviance parameter is $\phi > 0$. (If $m = 1/\phi$ is an integer and $\mu = 0$, then the hyperbolic secant random variable is $Y = (2/\pi)\sum_{i=1}^{m} \log|C_i|$, where $C_1, \ldots, C_m$ are independent Cauchy random variables.)
(a) [4 marks] Find the canonical link. [Hint: make the substitution $\mu = \tan\theta$]. Is this a good choice for a generalized linear model?
(b) [4 marks] What is the variance function of the inverse hyperbolic secant distribution?
(c) [4 marks] Find an expression for the deviance as a function of the observations $Y_1, Y_2, \ldots, Y_n$ and their fitted values $\mu_1, \mu_2, \ldots, \mu_n$.
(d) [4 marks] Suppose we have 4 observations from this distribution with values 0.2, 0.5, 0.4, 0.9. If the mean $\mu$ and the deviance parameter $\phi$ are the same for each observation, find the maximum likelihood estimate of $\mu$, and any good estimate of $\phi$.
(e) [4 marks] We suspect that the data in (d) have a hyperbolic secant distribution with $\phi = 0.05$. Do you think a goodness of fit test for this model with $\phi = 0.05$ is valid? If so, do it (approximately); if not, say why not.
Code and output for Question SM1
###Code and output for Question SM1 (a)
> age.mod<-lm(sbp ~ age)
> summary(age.mod)
Call:
lm(formula = sbp ~ age)
Residuals:
Min 1Q Median 3Q Max
-63.080 -16.688 -2.787 11.961 96.815
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 108.30786 4.19013 25.848 < 2e-16 ***
age 0.51462 0.08332 6.176 1.69e-09 ***
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 24.54 on 379 degrees of freedom
Multiple R-Squared: 0.09145, Adjusted R-squared: 0.08905
F-statistic: 38.15 on 1 and 379 DF, p-value: 1.692e-09
############
> gen.mod<-lm(sbp~gender)
> summary(gen.mod)
Call:
lm(formula = sbp ~ gender)
Residuals:
Min 1Q Median 3Q Max
-53.198 -15.198 -3.198 16.802 103.431
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 133.1977 1.6030 83.093 <2e-16 ***
genderMale -0.6286 2.8213 -0.223 0.824
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 25.75 on 379 degrees of freedom
Multiple R-Squared: 0.000131, Adjusted R-squared: -0.002507
F-statistic: 0.04964 on 1 and 379 DF, p-value: 0.8238
##########
> vill.mod<-lm(sbp~village)
> summary(vill.mod)
Call:
lm(formula = sbp ~ village)
Residuals:
Min 1Q Median 3Q Max
-51.625 -18.456 -4.714 14.390 104.375
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 129.625 4.558 28.440 <2e-16 ***
villageBatey Verde 5.985 6.082 0.984 0.326
villageCarmona 3.375 5.582 0.605 0.546
villageCojobal 3.172 5.660 0.560 0.576
villageJuan Sanchez 5.733 5.772 0.993 0.321
villageLa Altagracia 2.000 6.115 0.327 0.744
villageLos Gueneos -1.169 5.695 -0.205 0.838
villageSan Antonio 9.089 6.306 1.441 0.150
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 25.78 on 373 degrees of freedom
Multiple R-Squared: 0.01328, Adjusted R-squared: -0.005241
F-statistic: 0.717 on 7 and 373 DF, p-value: 0.6577
############
########### Code and output for Question SM1 (b)
> all.mod<-lm(sbp~age+village+gender)
> summary(all.mod)
Call:
lm(formula = sbp ~ age + village + gender)
Residuals:
Min 1Q Median 3Q Max
-63.142 -15.789 -3.318 12.842 101.729
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 105.63285 5.89386 17.923 < 2e-16 ***
age 0.55251 0.08717 6.339 6.73e-10 ***
villageBatey Verde 4.94172 5.80636 0.851 0.3953
villageCarmona 2.47623 5.31836 0.466 0.6418
villageCojobal 2.24775 5.42089 0.415 0.6786
villageJuan Sanchez 4.87772 5.51608 0.884 0.3771
villageLa Altagracia 1.07375 5.82694 0.184 0.8539
villageLos Gueneos -0.55883 5.43606 -0.103 0.9182
villageSan Antonio 7.15539 6.01397 1.190 0.2349
genderMale -5.58559 2.84672 -1.962 0.0505 .
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
Residual standard error: 24.56 on 371 degrees of freedom
Multiple R-Squared: 0.1098, Adjusted R-squared: 0.08823
F-statistic: 5.086 on 9 and 371 DF, p-value: 1.699e-06
######
> anova(glm(sbp~age+gender+village))
Analysis of Deviance Table
Model: gaussian, link: identity
Response: sbp
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 380 251294
age 1 22980 379 228314
gender 1 2518 378 225796
village 7 2100 371 223696
#####
> anova(glm(sbp~age+village+gender))
Analysis of Deviance Table
Model: gaussian, link: identity
Response: sbp
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 380 251294
age 1 22980 379 228314
village 7 2297 372 226017
gender 1 2321 371 223696
######
> anova(glm(sbp~village+gender+age))
Analysis of Deviance Table
Model: gaussian, link: identity
Response: sbp
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev
NULL 380 251294
village 7 3336 373 247958
gender 1 37 372 247921
age 1 24225 371 223696
Figure 1: Regression diagnostics for sbp ~ age (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 2: Regression diagnostics for sbp ~ gender (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 3: Regression diagnostics for sbp ~ village (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 4: Regression diagnostics for sbp ~ village + age (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 5: Regression diagnostics for sbp ~ village + gender (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 6: Regression diagnostics for sbp ~ age + gender (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 7: Regression diagnostics for sbp ~ age + gender + village (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
####
#### Output and code for Question SM1 (d)
> mybox<-boxcox(lm(sbp~age+gender+village))
> mybox$x[order(mybox$y,decreasing=T)[1]]
[1] -0.5454545
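The indexing expression above picks out the value of λ at which the profile log-likelihood is largest. The sketch below illustrates the same lookup on a made-up curve; `mybox_demo` is a hypothetical stand-in with the same `x`/`y` layout as the object used in the exam code (it is not produced by `boxcox`), and `which.max` is shown as an equivalent, more direct way to write the lookup.

```r
# Hypothetical stand-in for the fitted object: x is the lambda grid,
# y is a made-up log-likelihood curve (not the exam data).
lambda_grid <- seq(-2, 2, length.out = 100)
mybox_demo <- list(x = lambda_grid, y = -(lambda_grid + 0.5)^2)

# Lookup as written in the exam code: sort by y, take the first index
lambda_hat_order <- mybox_demo$x[order(mybox_demo$y, decreasing = TRUE)[1]]

# Equivalent lookup: index of the maximum of y
lambda_hat_max <- mybox_demo$x[which.max(mybox_demo$y)]

stopifnot(identical(lambda_hat_order, lambda_hat_max))
```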
Figure 8: Box-Cox transformation diagnostic plot for systolic blood pressure dataset (log-likelihood against λ, with the 95% level marked)
Code and output for Question SM2
### Code for Question SM2, part (a)
> districtonly.mod<-lm(Sugar~DistrictPosition)
> summary(districtonly.mod)
Call:
lm(formula = Sugar ~ DistrictPosition)
Residuals:
Min 1Q Median 3Q Max
-5.5326 -0.8103 0.0716 0.8912 4.2147
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.42840 0.06007 190.248 < 2e-16 ***
DistrictPositionE 0.35294 0.08566 4.120 3.86e-05 ***
DistrictPositionN 1.72268 0.08709 19.781 < 2e-16 ***
DistrictPositionS 0.18416 0.08281 2.224 0.026222 *
DistrictPositionW 0.23187 0.06792 3.414 0.000646 ***
----
Residual standard error: 1.341 on 3770 degrees of freedom
Multiple R-Squared: 0.1227,Adjusted R-squared: 0.1217
F-statistic: 131.8 on 4 and 3770 DF, p-value: < 2.2e-16
Figure 9: Boxplots of Sugar by DistrictPosition (groups C, E, N, S, W)
Figure 10: Regression diagnostics for model for Sugar including only DistrictPosition (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
### Code for Question SM2, part (b)
###### Age Only
> age.mod <- lm(Sugar~Age)
> summary(age.mod)
Call:
lm(formula = Sugar ~ Age)
Residuals:
Min 1Q Median 3Q Max
-5.49281 -0.87684 0.03579 0.89816 5.21414
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.14586 0.03776 321.66 <2e-16 ***
Age -0.15305 0.01395 -10.97 <2e-16 ***
---
Residual standard error: 1.408 on 3773 degrees of freedom
Multiple R-Squared: 0.03092,Adjusted R-squared: 0.03066
F-statistic: 120.4 on 1 and 3773 DF, p-value: < 2.2e-16
########### District Position + Age
> district.age.mod <- lm(Sugar~DistrictPosition+Age)
> summary(district.age.mod)
Call:
lm(formula = Sugar ~ DistrictPosition + Age)
Residuals:
Min 1Q Median 3Q Max
-5.44238 -0.80695 0.08379 0.90526 4.44009
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.66672 0.06424 181.606 < 2e-16 ***
DistrictPositionE 0.40397 0.08478 4.765 1.96e-06 ***
DistrictPositionN 1.70123 0.08606 19.767 < 2e-16 ***
DistrictPositionS 0.25396 0.08213 3.092 0.002 **
DistrictPositionW 0.28142 0.06729 4.182 2.95e-05 ***
Age -0.12831 0.01324 -9.687 < 2e-16 ***
---
Residual standard error: 1.324 on 3769 degrees of freedom
Multiple R-Squared: 0.144,Adjusted R-squared: 0.1429
F-statistic: 126.8 on 5 and 3769 DF, p-value: < 2.2e-16
#############
> anova(age.mod,district.age.mod)
Analysis of Variance Table
Model 1: Sugar ~ Age
Model 2: Sugar ~ DistrictPosition + Age
Res.Df RSS Df Sum of Sq F Pr(>F)
1 3773 7483.5
2 3769 6610.3 4 873.2 124.47 < 2.2e-16 ***
> anova(districtonly.mod,district.age.mod)
Analysis of Variance Table
Model 1: Sugar ~ DistrictPosition
Model 2: Sugar ~ DistrictPosition + Age
Res.Df RSS Df Sum of Sq F Pr(>F)
1 3770 6774.9
2 3769 6610.3 1 164.6 93.846 < 2.2e-16 ***
---
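The F statistics in the two analysis of variance tables can be reproduced by hand from the printed sums of squares: each is the extra sum of squares per degree of freedom divided by the residual mean square of the larger model. The following R sketch uses only numbers shown in the output above; the variable names are illustrative, and small discrepancies are expected because the printed values are rounded.

```r
# Residual sum of squares and degrees of freedom of the larger model
# (Sugar ~ DistrictPosition + Age), as printed in the tables above
rss_full <- 6610.3
df_full  <- 3769
mse_full <- rss_full / df_full

# Adding DistrictPosition (4 df) to the Age-only model
F_district <- (873.2 / 4) / mse_full

# Adding Age (1 df) to the DistrictPosition-only model
F_age <- (164.6 / 1) / mse_full

print(round(c(F_district = F_district, F_age = F_age), 2))
```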
Figure 11: Plot of Sugar by Age
Figure 12: Regression diagnostics for model for Sugar including only Age (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 13: Regression diagnostics for model for Sugar including only Age and DistrictPosition (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
### Code for Question SM2, part (c)
###### HarvestMonth as linear
> harvest.mod<-lm(Sugar~DistrictPosition+Age+HarvestMonth)
> summary(harvest.mod)
Call:
lm(formula = Sugar ~ DistrictPosition + Age + HarvestMonth)
Residuals:
Min 1Q Median 3Q Max
-5.70976 -0.76860 0.07636 0.88184 4.04782
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.16050 0.13535 75.068 < 2e-16 ***
DistrictPositionE 0.42889 0.08309 5.161 2.58e-07 ***
DistrictPositionN 1.74824 0.08441 20.712 < 2e-16 ***
DistrictPositionS 0.28550 0.08051 3.546 0.000396 ***
DistrictPositionW 0.28567 0.06593 4.333 1.51e-05 ***
Age -0.13841 0.01300 -10.645 < 2e-16 ***
HarvestMonth 0.17588 0.01399 12.570 < 2e-16 ***
-----
Residual standard error: 1.298 on 3768 degrees of freedom
Multiple R-Squared: 0.1784,Adjusted R-squared: 0.1771
F-statistic: 136.4 on 6 and 3768 DF, p-value: < 2.2e-16
###### HarvestMonth with linear and quadratic terms
> harvest.quad.mod<-lm(Sugar~DistrictPosition+Age+HarvestMonth +
I(HarvestMonth^2))
> summary(harvest.quad.mod)
Call:
lm(formula = Sugar ~ DistrictPosition + Age + HarvestMonth +
I(HarvestMonth^2))
Residuals:
Min 1Q Median 3Q Max
-5.69731 -0.76383 0.07553 0.85751 4.38852
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.77596 0.73698 2.410 0.016010 *
DistrictPositionE 0.47098 0.08175 5.761 9.01e-09 ***
DistrictPositionN 1.80860 0.08312 21.758 < 2e-16 ***
DistrictPositionS 0.30759 0.07915 3.886 0.000104 ***
DistrictPositionW 0.29750 0.06481 4.591 4.56e-06 ***
Age -0.08725 0.01352 -6.452 1.24e-10 ***
HarvestMonth 2.16681 0.17267 12.549 < 2e-16 ***
I(HarvestMonth^2) -0.11630 0.01006 -11.567 < 2e-16 ***
---
Residual standard error: 1.275 on 3767 degrees of freedom
Multiple R-Squared: 0.2066,Adjusted R-squared: 0.2051
F-statistic: 140.2 on 7 and 3767 DF, p-value: < 2.2e-16
###### HarvestMonth as factor
> harvest.fact.mod<-lm(Sugar~DistrictPosition+Age+factor(HarvestMonth))
> summary(harvest.fact.mod)
Call:
lm(formula = Sugar ~ DistrictPosition + Age + factor(HarvestMonth))
Residuals:
Min 1Q Median 3Q Max
-5.53426 -0.75585 0.06586 0.84534 4.18221
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.43768 0.09790 106.618 < 2e-16 ***
DistrictPositionE 0.47509 0.08133 5.841 5.62e-09 ***
DistrictPositionN 1.82553 0.08277 22.055 < 2e-16 ***
DistrictPositionS 0.32759 0.07891 4.151 3.38e-05 ***
DistrictPositionW 0.31851 0.06463 4.928 8.65e-07 ***
Age -0.08761 0.01346 -6.510 8.50e-11 ***
factor(HarvestMonth)7 0.82974 0.08483 9.781 < 2e-16 ***
factor(HarvestMonth)8 1.34458 0.08659 15.528 < 2e-16 ***
factor(HarvestMonth)9 1.40121 0.08659 16.183 < 2e-16 ***
factor(HarvestMonth)10 1.15057 0.08423 13.659 < 2e-16 ***
factor(HarvestMonth)11 1.28707 0.09230 13.944 < 2e-16 ***
----
Residual standard error: 1.268 on 3764 degrees of freedom
Multiple R-Squared: 0.216,Adjusted R-squared: 0.214
F-statistic: 103.7 on 10 and 3764 DF, p-value: < 2.2e-16
#### Model selection criteria
> AIC(harvest.mod)
[1] 12688.75
> AIC(harvest.quad.mod)
[1] 12559.00
> AIC(harvest.fact.mod)
[1] 12519.90
> AIC(harvest.mod,k=log(length(Sugar)))
[1] 12738.64
> AIC(harvest.quad.mod,k=log(length(Sugar)))
[1] 12615.13
> AIC(harvest.fact.mod,k=log(length(Sugar)))
[1] 12594.74
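The last three calls replace the default AIC penalty of 2 per parameter with log(n), where n = length(Sugar); this is the BIC. A minimal sketch of that equivalence follows, using R's built-in `cars` data as a stand-in (an assumption for illustration only, since the sugar cane data are not reproduced here).

```r
# Stand-in model on a built-in dataset (not the exam data)
fit <- lm(dist ~ speed, data = cars)
n <- nrow(cars)

# AIC with penalty k = log(n) per parameter ...
bic_via_aic <- AIC(fit, k = log(n))
# ... agrees with the value returned by BIC()
bic_direct <- BIC(fit)

stopifnot(isTRUE(all.equal(bic_via_aic, bic_direct)))
```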
Figure 14: Regression diagnostics for model using Harvest (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 15: Regression diagnostics for model using Harvest^2 (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Figure 16: Regression diagnostics for model using factor(Harvest) (panels: Residuals vs Fitted, Normal Q-Q, Scale-Location, Cook's distance)
Code and output for Question SM4
> data(Aids2)
> attach(Aids2)
> time<-death-diag+1
> c<-codes(status)-1
> rate<-c/time
> summary(glm(rate~state+sex+diag+T.categ+age, family=poisson, weight=time))
Call:
glm(formula = rate ~ state + sex + diag + T.categ + age, family = poisson,
weights = time)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.37597  -0.77433   0.04455   0.91472   3.42263
Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.6728465  0.4716294  -7.788 6.83e-15 ***
stateOther    -0.0944785  0.0895655  -1.055  0.29149
stateQLD       0.1860238  0.0878128   2.118  0.03414 *
stateVIC      -0.0018092  0.0613208  -0.030  0.97646
sexM          -0.0369529  0.1757609  -0.210  0.83348
diag          -0.0003179  0.0000421  -7.552 4.29e-14 ***
T.categhsid   -0.1211765  0.1520374  -0.797  0.42544
T.categid     -0.3799289  0.2459986  -1.544  0.12248
T.categhet    -0.7307894  0.2652388  -2.755  0.00587 **
T.categhaem    0.3462834  0.1881367   1.841  0.06568 .
T.categblood   0.1393095  0.1374007   1.014  0.31063
T.categmother  0.4603228  0.5893405   0.781  0.43475
T.categother   0.1200160  0.1636915   0.733  0.46345
age            0.0139496  0.0024987   5.583 2.37e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 4407.2 on 2842 degrees of freedom
Residual deviance: 4283.0 on 2829 degrees of freedom
AIC: Inf
Number of Fisher Scoring iterations: 8
> glm(rate~state+sex+diag+age, family=poisson, weight=time)
Degrees of Freedom: 2842 Total (i.e. Null); 2836 Residual
Null Deviance: 4407
Residual Deviance: 4302 AIC: Inf
> glm(rate~sex+diag+T.categ+age, family=poisson, weight=time)
Degrees of Freedom: 2842 Total (i.e. Null); 2832 Residual
Null Deviance: 4407
Residual Deviance: 4289 AIC: Inf
> glm(rate~state+diag+T.categ+age, family=poisson, weight=time)
Degrees of Freedom: 2842 Total (i.e. Null); 2830 Residual
Null Deviance: 4407
Residual Deviance: 4283 AIC: Inf
> glm(rate~diag+T.categ+age, family=poisson, weight=time)
Degrees of Freedom: 2842 Total (i.e. Null); 2833 Residual
Null Deviance: 4407
Residual Deviance: 4289 AIC: Inf
> glm(rate~sex+diag+age, family=poisson, weight=time)
Degrees of Freedom: 2842 Total (i.e. Null); 2839 Residual
Null Deviance: 4407
Residual Deviance: 4308 AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)
> glm(rate~state+diag+age, family=poisson, weight=time)
Degrees of Freedom: 2842 Total (i.e. Null); 2837 Residual
Null Deviance: 4407
Residual Deviance: 4303 AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)
> summary(glm(rate~diag+age, family=poisson, weight=time))
Call:
glm(formula = rate ~ diag + age, family = poisson, weights = time)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.19449  -0.77350   0.04316   0.91967   3.40510
Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.681e+00  4.352e-01  -8.457  < 2e-16 ***
diag        -3.251e-04  4.122e-05  -7.888 3.07e-15 ***
age          1.521e-02  2.411e-03   6.308 2.83e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 4407.2 on 2842 degrees of freedom
Residual deviance: 4309.4 on 2840 degrees of freedom
AIC: Inf
Number of Fisher Scoring iterations: 8
There were 50 or more warnings (use warnings() to see the first 50)
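The nested Poisson fits listed above can be compared through differences in residual deviance. The following is a minimal sketch of that arithmetic, not part of the original output. It assumes the usual large-sample chi-squared approximation applies (the `AIC: Inf` values and the warnings suggest treating it with some caution). The helper name `dev_test` is hypothetical, and the numbers are copied from the printed output.

```r
# Hypothetical helper: deviance (likelihood-ratio) test for two nested GLMs.
# dev_small, df_small: residual deviance and residual df of the smaller model
# dev_big,   df_big:   the same quantities for the larger model
dev_test <- function(dev_small, df_small, dev_big, df_big) {
  stat <- dev_small - dev_big   # drop in deviance
  df   <- df_small - df_big     # number of extra parameters in the larger model
  p    <- pchisq(stat, df = df, lower.tail = FALSE)
  c(statistic = stat, df = df, p.value = p)
}

# rate ~ diag + age (4309.4 on 2840 df) against the largest model shown,
# whose summary reports a residual deviance of 4283.0 on 2829 df.
res <- dev_test(4309.4, 2840, 4283.0, 2829)
print(res)
```

The same statistic can be checked by hand against the χ2 quantiles tabulated at the end of the paper, using the row for the degrees of freedom returned here.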
Code and output for Question SM5
> data(biopsy)
> attach(biopsy)
> Y<-codes(class)-1
> fV1<-factor(V1)
> par(mfrow=c(2,2))
> glm0<-glm(Y~V1)
> summary(glm0)

Call:
glm(formula = Y ~ V1)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.77804  -0.17331  -0.01994   0.06859   1.06859

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.189535   0.023395  -8.102 2.43e-15 ***
V1           0.120947   0.004467  27.078  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1104095)

    Null deviance: 157.908  on 698  degrees of freedom
Residual deviance:  76.955  on 697  degrees of freedom
AIC: 447.39

Number of Fisher Scoring iterations: 2

> glm1<-glm(Y~fV1)
> summary(glm1)

Call:
glm(formula = Y ~ fV1)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-9.565e-01  -1.111e-01  -2.069e-02   3.331e-15   9.793e-01

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02069    0.02647   0.782  0.43466
fV12         0.05931    0.05227   1.135  0.25689
fV13         0.09042    0.04051   2.232  0.02593 *
fV14         0.12931    0.04439   2.913  0.00369 **
fV15         0.32546    0.03850   8.455  < 2e-16 ***
fV16         0.50872    0.06073   8.377 3.04e-16 ***
fV17         0.93583    0.07153  13.083  < 2e-16 ***
fV18         0.89235    0.05393  16.546  < 2e-16 ***
fV19         0.97931    0.08920  10.979  < 2e-16 ***
fV110        0.97931    0.04661  21.010  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1015776)

    Null deviance: 157.908  on 698  degrees of freedom
Residual deviance:  69.987  on 689  degrees of freedom
AIC: 397.04

Number of Fisher Scoring iterations: 2
> plot(V1,fitted(glm1))
> points(V1,fitted(glm0),pch=2)
> title('Figure 2.1: Normal family')
> glm0<-glm(Y~V1,family=binomial)
> summary(glm0)

Call:
glm(formula = Y ~ V1, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1986  -0.4261  -0.1704   0.1730   2.9118

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.16017    0.37772  -13.66   <2e-16 ***
V1           0.93546    0.07372   12.69   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 900.53  on 698  degrees of freedom
Residual deviance: 464.05  on 697  degrees of freedom
AIC: 468.05

Number of Fisher Scoring iterations: 5
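The coefficients of the binomial fit above are on the logit scale, so a fitted probability is obtained by applying the inverse logit to the linear predictor. The following is a minimal sketch, not part of the original output. It assumes the default logit link of `family=binomial`; the coefficients are copied from the summary, and the name `fitted_prob` is illustrative only.

```r
# Coefficients copied from the summary of glm(Y ~ V1, family = binomial)
b0 <- -5.16017
b1 <-  0.93546

# Illustrative helper: fitted probability P(Y = 1) at a given value of V1,
# assuming the logit link (plogis is the inverse logit in base R).
fitted_prob <- function(v1) plogis(b0 + b1 * v1)

# V1 takes the integer values 1 to 10 in the factor fit fV1
probs <- fitted_prob(1:10)
print(round(probs, 3))
```

Unlike the fitted values from the Normal-family fit of Y on V1, these values stay strictly between 0 and 1 for every value of V1.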
> glm1<-glm(Y~fV1, family=binomial)
> summary(glm1)

Call:
glm(formula = Y ~ fV1, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.50419  -0.48535  -0.20448   0.01184   2.78500

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.8572     0.5834  -6.611 3.81e-11 ***
fV12          1.4149     0.7824   1.808  0.07054 .
fV13          1.7778     0.6589   2.698  0.00697 **
fV14          2.1226     0.6621   3.206  0.00135 **
fV15          3.2212     0.6119   5.265 1.40e-07 ***
fV16          3.9750     0.6771   5.871 4.34e-09 ***
fV17          6.9483     1.1772   5.902 3.58e-09 ***
fV18          6.2086     0.7837   7.922 2.33e-15 ***
fV19         13.4232    19.3753   0.693  0.48844
fV110        13.4232     8.7430   1.535  0.12471
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 900.53  on 698  degrees of freedom
Residual deviance: 450.21  on 689  degrees of freedom
AIC: 470.21

Number of Fisher Scoring iterations: 8
> plot(V1,fitted(glm1))
> points(V1,fitted(glm0),pch=2)
> title('Figure 2.2: Binomial family')
> summary(glm(Y~V1+V2+V3+V4+V5+V6+V7+V8+V9, family=binomial))

Call:
glm(formula = Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9,
    family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.48404  -0.11529  -0.06192   0.02221   2.46983

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.103859   1.170793  -8.630  < 2e-16 ***
V1            0.535008   0.141838   3.772 0.000162 ***
V2           -0.006278   0.208786  -0.030 0.976011
V3            0.322705   0.230224   1.402 0.161005
V4            0.330634   0.123318   2.681 0.007337 **
V5            0.096634   0.156467   0.618 0.536836
V6            0.383024   0.093741   4.086 4.39e-05 ***
V7            0.447184   0.171156   2.613 0.008982 **
V8            0.213030   0.112757   1.889 0.058855 .
V9            0.534817   0.328105   1.630 0.103098
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 884.35  on 682  degrees of freedom
Residual deviance: 102.89  on 673  degrees of freedom
AIC: 122.89

Number of Fisher Scoring iterations: 7

> summary(glm(Y~V1+V4+V6+V7, family=binomial))

Call:
glm(formula = Y ~ V1 + V4 + V6 + V7, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.69637  -0.14510  -0.06093   0.02317   2.44758

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.11370    1.03190  -9.801  < 2e-16 ***
V1            0.81166    0.12579   6.453 1.10e-10 ***
V4            0.43412    0.11399   3.808  0.00014 ***
V6            0.48136    0.08813   5.462 4.72e-08 ***
V7            0.70154    0.15190   4.619 3.87e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 884.35  on 682  degrees of freedom
Residual deviance: 125.77  on 678  degrees of freedom
AIC: 135.77

Number of Fisher Scoring iterations: 7
[Figure: two panels, (a) Normal family and (b) Binomial family, each plotting fitted(glm1) (vertical axis, 0.0 to 1.0) against V1 (horizontal axis).]
Figure 17: Figure for question SM4
Upper tail probabilities of the standard Normal distribution
  z    0.00   0.01   0.02   0.03   0.04   0.05   0.06   0.07   0.08   0.09
0.0  0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1  0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2  0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3  0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4  0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5  0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6  0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7  0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
0.8  0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9  0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0  0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1  0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2  0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3  0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4  0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5  0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6  0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7  0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8  0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9  0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0  0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1  0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2  0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3  0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4  0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5  0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6  0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7  0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8  0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9  0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0  0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
3.1  0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
3.2  0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
3.3  0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
3.4  0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
3.5  0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002
3.6  0.0002 0.0002 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
3.7  0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
3.8  0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001
3.9  0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
4.0  0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
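Each entry in the table above is the upper tail probability P(Z > z) for z equal to the row label plus the column label. The following is a minimal sketch of how such entries can be reproduced in R, not part of the original paper. The helper name `upper_tail` is hypothetical; `pnorm` is the standard Normal distribution function in base R.

```r
# Hypothetical helper: upper tail probability P(Z > z) of the standard Normal.
upper_tail <- function(z) pnorm(z, lower.tail = FALSE)

# Row 1.9, column 0.06 corresponds to z = 1.96; row 1.6, column 0.04 to z = 1.64.
vals <- round(upper_tail(c(0, 1.64, 1.96)), 4)
print(vals)
```

For a two-sided test, the tabulated value is doubled.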
Quantiles of the t and χ2 distributions
            Quantiles of the t distribution     Quantiles of the χ2 distribution
Degrees of  Upper tail probability              Upper tail probability
freedom      0.1   0.05  0.025   0.01  0.005       0.1    0.05   0.025    0.01   0.005
  1         3.08   6.31  12.71  31.82  63.66      2.71    3.84    5.02    6.63    7.88
  2         1.89   2.92   4.30   6.96   9.92      4.61    5.99    7.38    9.21   10.60
  3         1.64   2.35   3.18   4.54   5.84      6.25    7.81    9.35   11.34   12.84
  4         1.53   2.13   2.78   3.75   4.60      7.78    9.49   11.14   13.28   14.86
  5         1.48   2.02   2.57   3.36   4.03      9.24   11.07   12.83   15.09   16.75
  6         1.44   1.94   2.45   3.14   3.71     10.64   12.59   14.45   16.81   18.55
  7         1.41   1.89   2.36   3.00   3.50     12.02   14.07   16.01   18.48   20.28
  8         1.40   1.86   2.31   2.90   3.36     13.36   15.51   17.53   20.09   21.95
  9         1.38   1.83   2.26   2.82   3.25     14.68   16.92   19.02   21.67   23.59
 10         1.37   1.81   2.23   2.76   3.17     15.99   18.31   20.48   23.21   25.19
 11         1.36   1.80   2.20   2.72   3.11     17.28   19.68   21.92   24.72   26.76
 12         1.36   1.78   2.18   2.68   3.05     18.55   21.03   23.34   26.22   28.30
 13         1.35   1.77   2.16   2.65   3.01     19.81   22.36   24.74   27.69   29.82
 14         1.35   1.76   2.14   2.62   2.98     21.06   23.68   26.12   29.14   31.32
 15         1.34   1.75   2.13   2.60   2.95     22.31   25.00   27.49   30.58   32.80
 16         1.34   1.75   2.12   2.58   2.92     23.54   26.30   28.85   32.00   34.27
 17         1.33   1.74   2.11   2.57   2.90     24.77   27.59   30.19   33.41   35.72
 18         1.33   1.73   2.10   2.55   2.88     25.99   28.87   31.53   34.81   37.16
 19         1.33   1.73   2.09   2.54   2.86     27.20   30.14   32.85   36.19   38.58
 20         1.33   1.72   2.09   2.53   2.85     28.41   31.41   34.17   37.57   40.00
 21         1.32   1.72   2.08   2.52   2.83     29.62   32.67   35.48   38.93   41.40
 22         1.32   1.72   2.07   2.51   2.82     30.81   33.92   36.78   40.29   42.80
 23         1.32   1.71   2.07   2.50   2.81     32.01   35.17   38.08   41.64   44.18
 24         1.32   1.71   2.06   2.49   2.80     33.20   36.42   39.36   42.98   45.56
 25         1.32   1.71   2.06   2.49   2.79     34.38   37.65   40.65   44.31   46.93
 26         1.31   1.71   2.06   2.48   2.78     35.56   38.89   41.92   45.64   48.29
 27         1.31   1.70   2.05   2.47   2.77     36.74   40.11   43.19   46.96   49.64
 28         1.31   1.70   2.05   2.47   2.76     37.92   41.34   44.46   48.28   50.99
 29         1.31   1.70   2.05   2.46   2.76     39.09   42.56   45.72   49.59   52.34
 30         1.31   1.70   2.04   2.46   2.75     40.26   43.77   46.98   50.89   53.67
 35         1.31   1.69   2.03   2.44   2.72     46.06   49.80   53.20   57.34   60.27
 40         1.30   1.68   2.02   2.42   2.70     51.81   55.76   59.34   63.69   66.77
 45         1.30   1.68   2.01   2.41   2.69     57.51   61.66   65.41   69.96   73.17
 50         1.30   1.68   2.01   2.40   2.68     63.17   67.50   71.42   76.15   79.49
 55         1.30   1.67   2.00   2.40   2.67     68.80   73.31   77.38   82.29   85.75
 60         1.30   1.67   2.00   2.39   2.66     74.40   79.08   83.30   88.38   91.95
 80         1.29   1.66   1.99   2.37   2.64     96.58  101.88  106.63  112.33  116.32
100         1.29   1.66   1.98   2.36   2.63    118.50  124.34  129.56  135.81  140.17
120         1.29   1.66   1.98   2.36   2.62    140.23  146.57  152.21  158.95  163.65
  ∞         1.28   1.64   1.96   2.33   2.58
Quantiles of the F distribution, P = 0.05
Denominator
degrees of   Numerator degrees of freedom
freedom          1      2      3      4      5      6      7      8      9     10
  1          161.4  199.5  215.7  224.6  230.2  233.0  236.8  238.9  240.5  241.9
  2          18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40
  3          10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79
  4           7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96
  5           6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74
  6           5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06
  7           5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64
  8           5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35
  9           5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14
 10           4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98
 11           4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.85
 12           4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80   2.75
 13           4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71   2.67
 14           4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65   2.60
 15           4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54
 16           4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49
 17           4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49   2.45
 18           4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46   2.41
 19           4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42   2.38
 20           4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35
 21           4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37   2.32
 22           4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34   2.30
 23           4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32   2.27
 24           4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30   2.25
 25           4.24   3.39   2.99   2.76   2.60   2.49   2.40   2.34   2.28   2.24
 26           4.23   3.37   2.98   2.74   2.59   2.47   2.39   2.32   2.27   2.22
 27           4.21   3.35   2.96   2.73   2.57   2.46   2.37   2.31   2.25   2.20
 28           4.20   3.34   2.95   2.71   2.56   2.45   2.36   2.29   2.24   2.19
 29           4.18   3.33   2.93   2.70   2.55   2.43   2.35   2.28   2.22   2.18
 30           4.17   3.32   2.92   2.69   2.53   2.42   2.33   2.27   2.21   2.16
 35           4.12   3.27   2.87   2.64   2.49   2.37   2.29   2.22   2.16   2.11
 40           4.08   3.23   2.84   2.61   2.45   2.34   2.25   2.18   2.12   2.08
 45           4.06   3.20   2.81   2.58   2.42   2.31   2.22   2.15   2.10   2.05
 50           4.03   3.18   2.79   2.56   2.40   2.29   2.20   2.13   2.07   2.03
 55           4.02   3.16   2.77   2.54   2.38   2.27   2.18   2.11   2.06   2.01
 60           4.00   3.15   2.76   2.53   2.37   2.25   2.17   2.10   2.04   1.99
 80           3.96   3.11   2.72   2.49   2.33   2.21   2.13   2.06   2.00   1.95
100           3.94   3.09   2.70   2.46   2.31   2.19   2.10   2.03   1.97   1.93
120           3.92   3.07   2.68   2.45   2.29   2.18   2.09   2.02   1.96   1.91
  ∞           3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88   1.83
Quantiles of the Bonferroni t distribution, P (T > t) = 0.025/n
       Degrees of freedom
  n      5    10    15    20    25    30    40    50    60    80   100   120     ∞
  5   4.03  3.17  2.95  2.85  2.79  2.75  2.70  2.68  2.66  2.64  2.63  2.62  2.58
 10   4.77  3.58  3.29  3.15  3.08  3.03  2.97  2.94  2.91  2.89  2.87  2.86  2.81
 15   5.25  3.83  3.48  3.33  3.24  3.19  3.12  3.08  3.06  3.03  3.01  3.00  2.94
 20   5.60  4.00  3.62  3.46  3.36  3.30  3.23  3.18  3.16  3.12  3.10  3.09  3.02
 25   5.89  4.14  3.73  3.55  3.45  3.39  3.31  3.26  3.23  3.20  3.17  3.16  3.09
 30   6.14  4.26  3.82  3.63  3.52  3.45  3.37  3.32  3.29  3.25  3.23  3.22  3.14
 35   6.35  4.36  3.90  3.70  3.58  3.51  3.43  3.38  3.34  3.30  3.28  3.26  3.19
 40   6.54  4.44  3.96  3.75  3.64  3.56  3.47  3.42  3.39  3.35  3.32  3.31  3.23
 45   6.71  4.52  4.02  3.80  3.68  3.61  3.51  3.46  3.43  3.38  3.36  3.34  3.26
 50   6.87  4.59  4.07  3.85  3.73  3.65  3.55  3.50  3.46  3.42  3.39  3.37  3.29
 55   7.01  4.65  4.12  3.89  3.76  3.68  3.58  3.53  3.49  3.45  3.42  3.40  3.32
 60   7.15  4.71  4.16  3.93  3.80  3.71  3.61  3.56  3.52  3.47  3.45  3.43  3.34
 70   7.39  4.81  4.24  3.99  3.86  3.77  3.67  3.61  3.57  3.52  3.49  3.47  3.38
 80   7.60  4.90  4.31  4.05  3.91  3.82  3.71  3.65  3.61  3.56  3.53  3.51  3.42
 90   7.80  4.98  4.36  4.10  3.96  3.86  3.75  3.69  3.65  3.60  3.57  3.55  3.45
100   7.98  5.05  4.42  4.15  4.00  3.90  3.79  3.72  3.68  3.63  3.60  3.58  3.48
Quantiles of the Bonferroni t distribution, P (T > t) = 0.005/n
       Degrees of freedom
  n      5    10    15    20    25    30    40    50    60    80   100   120     ∞
  5   5.89  4.14  3.73  3.55  3.45  3.39  3.31  3.26  3.23  3.20  3.17  3.16  3.09
 10   6.87  4.59  4.07  3.85  3.73  3.65  3.55  3.50  3.46  3.42  3.39  3.37  3.29
 15   7.50  4.85  4.27  4.02  3.88  3.80  3.69  3.63  3.59  3.54  3.51  3.49  3.40
 20   7.98  5.05  4.42  4.15  4.00  3.90  3.79  3.72  3.68  3.63  3.60  3.58  3.48
 25   8.36  5.20  4.53  4.24  4.08  3.98  3.86  3.79  3.75  3.70  3.66  3.64  3.54
 30   8.69  5.33  4.62  4.32  4.15  4.05  3.92  3.85  3.81  3.75  3.72  3.69  3.59
 35   8.98  5.44  4.70  4.39  4.21  4.11  3.98  3.90  3.85  3.80  3.76  3.74  3.63
 40   9.24  5.53  4.77  4.44  4.27  4.15  4.02  3.94  3.89  3.83  3.80  3.78  3.66
 45   9.47  5.62  4.83  4.49  4.31  4.20  4.06  3.98  3.93  3.87  3.83  3.81  3.69
 50   9.68  5.69  4.88  4.54  4.35  4.23  4.09  4.01  3.96  3.90  3.86  3.84  3.72
 55   9.87  5.76  4.93  4.58  4.39  4.27  4.13  4.04  3.99  3.93  3.89  3.86  3.74
 60  10.05  5.83  4.97  4.62  4.42  4.30  4.15  4.07  4.02  3.95  3.91  3.89  3.76
 70  10.38  5.94  5.05  4.68  4.48  4.35  4.20  4.12  4.06  4.00  3.96  3.93  3.80
 80  10.67  6.04  5.12  4.74  4.53  4.40  4.25  4.16  4.10  4.03  3.99  3.97  3.84
 90  10.94  6.13  5.18  4.79  4.58  4.44  4.29  4.20  4.14  4.07  4.02  4.00  3.86
100  11.18  6.21  5.24  4.84  4.62  4.48  4.32  4.23  4.17  4.10  4.05  4.03  3.89
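The two Bonferroni tables above give the t quantile with upper tail probability 0.025/n or 0.005/n, where n is the number of simultaneous comparisons. The following is a minimal sketch of that calculation in R, not part of the original paper. The helper name `bonf_t` and its argument names are hypothetical; `qt` is the t quantile function in base R.

```r
# Hypothetical helper: Bonferroni-adjusted t quantile.
# n     : number of simultaneous comparisons (row label in the tables)
# df    : degrees of freedom (column label; Inf gives the Normal limit)
# alpha : one-sided tail probability before adjustment (0.025 or 0.005)
bonf_t <- function(n, df, alpha = 0.025) {
  qt(alpha / n, df = df, lower.tail = FALSE)
}

# First table, row n = 5, columns df = 5, 10 and infinity
row5 <- bonf_t(5, c(5, 10, Inf))
print(round(row5, 2))

# Second table, row n = 5, column df = 5
print(round(bonf_t(5, 5, alpha = 0.005), 2))
```

Because 0.025/5 = 0.005, the first row of the first table repeats the 0.005 column of the t quantiles table for the matching degrees of freedom.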