
McGill University

Part A Examination in Statistics

Methodology Paper

Department of Mathematics & Statistics

Date: Thursday, August 23rd 2007 Time: 13:00–17:00

Instructions

• Answer two questions out of Section 1. Only two questions will be marked.

• Answer two questions out of Section 2. Only two questions will be marked.

• If you do not indicate which questions you wish to have marked, the questions will be marked in the order in which they appear in the answer book until the quota has been reached.

• All questions are weighted equally (20 marks each).

• Good luck!

This exam comprises the cover, questions on pages 1 to 8, and tables on pages 41 to 45.

v 5-20070427

Part A Examination August 2007 Methodology Paper

Section 1: Answer two of questions SM1 to SM3.

SM1. This dataset is courtesy of Dr Waldon Garriss, University of Virginia School of Medicine. Dr Garriss collected the data in a pilot study during his work in the Dominican Republic in 1997. The subjects are persons who came to medical clinics in several villages; the variables age, gender, village name, systolic blood pressure, and diastolic blood pressure were collected.

The primary research question of interest is to determine the extent to which we can use the first three covariates to predict the systolic blood pressure.

The output for question SM1 begins on page 9.

(a) [3 marks] Test for significance of each of the three covariates individually. Refer clearly to the part(s) of the output that you are using for your tests. Is there evidence to include only one (if any) of the covariates in the model? Or should one include both covariates? Explain.

(b) [3 marks] Test for the significance of each of the three covariates in the presence of both of the others. What covariates should be included in the model for systolic blood pressure? Explain.

(c) [4 marks] State and comment on the appropriateness of the assumptions of the linear regression model that you’ve selected.

(d) [10 marks] Refer to the code and figures for Question SM1, Part (d) for the following questions. We will only examine models that include all three covariates for conciseness.

i. Interpret the Box-Cox transformation plot. Do the plot and the Box-Cox procedure suggest that a transformation of the data is necessary?

ii. Assume now that one would transform the data (regardless of your answer in part (i)). What transformation would you propose for this dataset? Briefly explain why you chose your transformation. Briefly discuss the advantages and disadvantages of transforming the data the way you’ve proposed.

iii. A researcher wants to use the AIC and BIC to select a “best” regression model for the data. Can the researcher use the AIC and BIC (not shown) to choose amongst transformations? Why or why not? If not, be sure to suggest a possible adjustment to the AIC/BIC values that would make such a comparison more reasonable.

1


SM2. This dataset gives sugar cane yields for each paddock in the Mulgrave area of North Queensland for the 1997 sugar cane season. It was obtained by David Gregory and Nick Denman for their MS305 data project at The University of Queensland in 1998. There are 3775 observations in the dataset.

Mulgrave is a region in North Queensland around the Mulgrave river and the city of Cairns. Sugar cane is the primary industry in Mulgrave, and all sugar cane from the area is processed through the Mulgrave Central Mill. The data were provided by the Bureau of Sugar Experimental Stations (BSES) on behalf of the Mulgrave Central Mill and were obtained from the OzDASL repository.

The response variable of interest is the commercial sugar content per rake produced (Sugar). The goal of the analysis is to discover predictors of sugar content and the best regression model using the following predictor variables:

• DistrictPosition: The Mulgrave area has been divided by the BSES into fifteen districts, but the statistical authors grouped them further into 5 groups by location (Central, North, South, East, and West).

• Age: Cane planted the year before may be regarded as having age zero. Cane left to grow for one year after being cut (that is, the cane is first ratoon) can be considered to have an age of one.

• HarvestMonth: The sugar cane cutting season usually begins in June and concludes in mid-November, the finishing date depending on how the season has gone with respect to rainfall and mill breakdowns. Months are labelled by their numerical equivalent (June = 6, July = 7, etc.).


(a) [6 marks] Refer only to the code and plots for part (a) for the following two parts.

i. Give an interpretation for the coefficient DistrictPositionN with respect to the mean of the response variable.

ii. Test for a significant effect of District Position Group on the sugar content yield. Comment on the model fit and validity of the model assumptions.

(b) [8 marks] Refer only to the code and plots for parts (a) and (b) for the following four parts.

i. Test for a significant effect of Age by itself.

ii. Test for significant effects of both DistrictPosition and Age in the presence of the other.

2


iii. Comment on the model fit and validity of the model assumptions for both models in b(i) and b(ii).

iv. Should both covariates be included in the model? Explain why or why not.

(c) [6 marks] Refer only to the code for part (c) for the following part.

i. Assume that we will include Age and DistrictPosition in the model. There are three suggested modelling choices for HarvestMonth:

• Modeling HarvestMonth as a single quantitative variable

• Modeling HarvestMonth with a linear and quadratic term

• Modeling HarvestMonth as a factor (or categorical) variable

Discuss the relative merits of each of the three models from a statistical perspective and choose what you would consider the “best” model. Be sure your discussion makes clear your reasons for selecting one model over the other two.

3


SM3. The mean shift outlier model is given by:

$$y = X\beta + \phi z_n + \varepsilon$$

where $z_n$ is a given $(n \times 1)$ vector with zeroes in all positions except the $n$-th position, which contains a 1, $\phi$ is an unknown scalar parameter, and $\varepsilon$ is an $(n \times 1)$ vector of independent Normal$(0, \sigma^2)$ random variables.

(a) [2 marks] Write out the expected value for the $n$-th observation. What is the intercept for this model (i.e. what is the expected value for an observation with all covariate values equal to 0)?

(b) [4 marks] If

$$d = [z_n'(I - H)z_n]^{-1} z_n'(I - H)y$$

then show that

$$d = \frac{y_n - \hat{y}_n}{1 - h_{nn}},$$

where $h_{nn}$ is the $n$-th diagonal element of the hat matrix $H$ for the covariates in $X$.

(c) [4 marks] Show that the increase in regression sum of squares after fitting $\phi$ is given by

$$\mathrm{SSR}_2 = d^2 z_n'(I - H)z_n = \frac{e_n^2}{1 - h_{nn}},$$

where $e_n = y_n - \hat{y}_n$.

(d) [4 marks] From part (c), deduce the corresponding reduction in SSE.

(e) [6 marks] What statistical test would you do to test the hypothesis $H_0: \phi = 0$? What influence diagnostic is this test statistic equivalent to?

4


Section 2: Answer two of questions SM4 to SM6.

SM4. Dr P. J. Solomon of the Australian National Centre in HIV Epidemiology and Clinical Research collected data on 2843 patients diagnosed with AIDS in Australia before 1 July 1991:

state: Grouped state of origin: NSW, Other, QLD or VIC

sex: Sex of patient

diag: (Julian) date of diagnosis (days)

death: (Julian) date of death or end of observation (days)

status: "A" (alive) or "D" (dead) at end of observation

T.categ: Reported transmission category (8 categories)

age: Age (years) at diagnosis.

The survival time (time) was assumed to have an exponential distribution with a log link to a linear model in the regressors.
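As background for reading the output for this question, the following sketch shows why an exponential survival model with censoring can be fitted with a Poisson family, as the call `glm(rate ~ ..., family=poisson, weight=time)` in the output does. The symbols $t_i$, $c_i$, $\lambda_i$ and $x_i$ are notation assumed here, and the sketch takes the linear predictor to be on the log hazard scale, which is what that call implies.

```latex
% t_i: observed time; c_i = 1 if death was observed, 0 if censored
% \lambda_i: hazard rate for patient i; x_i: covariate vector (assumed notation)
\begin{align*}
\ell(\beta)
  &= \sum_i \bigl( c_i \log \lambda_i - \lambda_i t_i \bigr),
     \qquad \log \lambda_i = x_i^{\top} \beta, \\
  &= \sum_i t_i \Bigl( \frac{c_i}{t_i} \log \lambda_i - \lambda_i \Bigr).
\end{align*}
% The second line is the kernel of a Poisson log-likelihood for the
% response c_i / t_i with mean \lambda_i and prior weight t_i.
% For an exponential survival time with rate \lambda_i:
\[
E(T_i) = \frac{1}{\lambda_i}, \qquad P(T_i > t) = e^{-\lambda_i t}.
\]
```

Under this reading, a positive coefficient raises the hazard and therefore shortens the expected survival time.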


Choose suitable models to decide if the survival time is related to

(a) [6 marks]

i. state

ii. sex

iii. transmission category

(b) [6 marks]

i. age

ii. date of diagnosis.

Does the survival time increase or decrease with age? With date of diagnosis?

(c) [8 marks] Choose a suitable model to estimate the mean survival time of a 25 year old male patient diagnosed with AIDS in NSW on July 1 2004 (diag=16253) who reported transmission by heterosexual contact (T.categhet), and the probability that such a patient would survive more than three years (365×3=1095 days). How reliable do you think this estimator is?

5


SM5. A breast cancer database was obtained from the University of Wisconsin Hospitals, Madison from Dr. William H. Wolberg. He assessed biopsies of breast tumors for 699 patients up to 15 July 1992; each of nine attributes has been scored on a scale of 1 to 10, and the outcome is also known: benign (Y=0) or malignant (Y=1). This data frame contains the following columns:

V1 Clump thickness

V2 Uniformity of cell size

V3 Uniformity of cell shape

V4 Marginal adhesion

V5 Single epithelial cell size

V6 Bare nuclei (16 values are missing)

V7 Bland chromatin

V8 Normal nucleoli

V9 Mitoses

class "benign" or "malignant"


(a) [6 marks] To relate the probability that a tumor is malignant to the first variable, clump thickness, two sets of models were fitted, the first assuming a normal family, the second assuming a binomial family. Choose suitable models to test for a linear effect of V1.

(b) [3 marks] A factor fV1 was created taking the values of V1 as levels. Use this to test if the effect of clump thickness is linear in V1 (as opposed to non-linear).

(c) [3 marks] Look at Figure 17 (a) and (b) on page 41. Why is it that the plots of the fitted values using the model with V1 (triangles) are different in Figures (a) and (b), yet the plots of the fitted values using the model with fV1 (circles) are the same in Figures (a) and (b)? Explain.

(d) [6 marks] Which attributes are related to the malignancy of breast tumors?

(e) [2 marks] Do you think a goodness of fit test for the last model is valid? If so, do it; if not, say why not.

6


SM6. Carl Morris (see next page) showed that there are only six families of distributions in the exponential family with quadratic variance functions: normal, Poisson, gamma, binomial, negative binomial, and a sixth distribution which he called the hyperbolic secant distribution. Its variance function is $V(\mu) = \mu^2 + 1$, it is continuous on $(-\infty, \infty)$ (like the normal distribution), but it is not symmetric. The deviance parameter is $\phi > 0$. (If $m = 1/\phi$ is an integer and $\mu = 0$, then the hyperbolic secant random variable is $Y = (2/\pi) \sum_{i=1}^{m} \log |C_i|$, where $C_1, \ldots, C_m$ are independent Cauchy random variables.)

(a) [4 marks] Find the canonical link. [Hint: make the substitution $\mu = \tan\theta$.] Is this a good choice for a generalized linear model?

(b) [4 marks] What is the variance function of the inverse hyperbolic secant distribution?

(c) [4 marks] Find an expression for the deviance as a function of the observations $Y_1, Y_2, \ldots, Y_n$ and their fitted values $\mu_1, \mu_2, \ldots, \mu_n$.

(d) [4 marks] Suppose we have 4 observations from this distribution with values 0.2, 0.5, 0.4, 0.9. If the mean $\mu$ and the deviance parameter $\phi$ are the same for each observation, find the maximum likelihood estimate of $\mu$, and any good estimate of $\phi$.

(e) [4 marks] We suspect that the data in (d) have a hyperbolic secant distribution with $\phi = 0.05$. Do you think a goodness of fit test for this model with $\phi = 0.05$ is valid? If so, do it (approximately); if not, say why not.

7


8


Code and output for Question SM1

###Code and output for Question SM1 (a)

> age.mod<-lm(sbp ~ age)

> summary(age.mod)

Call:

lm(formula = sbp ~ age)

Residuals:

Min 1Q Median 3Q Max

-63.080 -16.688 -2.787 11.961 96.815

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 108.30786 4.19013 25.848 < 2e-16 ***

age 0.51462 0.08332 6.176 1.69e-09 ***

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 24.54 on 379 degrees of freedom

Multiple R-Squared: 0.09145, Adjusted R-squared: 0.08905

F-statistic: 38.15 on 1 and 379 DF, p-value: 1.692e-09

############

> gen.mod<-lm(sbp~gender)

> summary(gen.mod)

Call:

lm(formula = sbp ~ gender)

Residuals:

Min 1Q Median 3Q Max

-53.198 -15.198 -3.198 16.802 103.431

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 133.1977 1.6030 83.093 <2e-16 ***

genderMale -0.6286 2.8213 -0.223 0.824

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1


Multiple R-Squared: 0.000131, Adjusted R-squared: -0.002507

F-statistic: 0.04964 on 1 and 379 DF, p-value: 0.8238

9


##########

> vill.mod<-lm(sbp~village)

> summary(vill.mod)

Call:

lm(formula = sbp ~ village)

Residuals:

Min 1Q Median 3Q Max

-51.625 -18.456 -4.714 14.390 104.375

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 129.625 4.558 28.440 <2e-16 ***

villageBatey Verde 5.985 6.082 0.984 0.326

villageCarmona 3.375 5.582 0.605 0.546

villageCojobal 3.172 5.660 0.560 0.576

villageJuan Sanchez 5.733 5.772 0.993 0.321

villageLa Altagracia 2.000 6.115 0.327 0.744

villageLos Gueneos -1.169 5.695 -0.205 0.838

villageSan Antonio 9.089 6.306 1.441 0.150

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1


Multiple R-Squared: 0.01328, Adjusted R-squared: -0.005241

F-statistic: 0.717 on 7 and 373 DF, p-value: 0.6577

############

########### Code and output for Question SM1 (b)

> all.mod<-lm(sbp~age+village+gender)

> summary(all.mod)

Call:

lm(formula = sbp ~ age + village + gender)

Residuals:

Min 1Q Median 3Q Max

-63.142 -15.789 -3.318 12.842 101.729

10


Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 105.63285 5.89386 17.923 < 2e-16 ***

age 0.55251 0.08717 6.339 6.73e-10 ***

villageBatey Verde 4.94172 5.80636 0.851 0.3953

villageCarmona 2.47623 5.31836 0.466 0.6418

villageCojobal 2.24775 5.42089 0.415 0.6786

villageJuan Sanchez 4.87772 5.51608 0.884 0.3771

villageLa Altagracia 1.07375 5.82694 0.184 0.8539

villageLos Gueneos -0.55883 5.43606 -0.103 0.9182

villageSan Antonio 7.15539 6.01397 1.190 0.2349

genderMale -5.58559 2.84672 -1.962 0.0505 .

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1


Multiple R-Squared: 0.1098, Adjusted R-squared: 0.08823

F-statistic: 5.086 on 9 and 371 DF, p-value: 1.699e-06

######

> anova(glm(sbp~age+gender+village))

Analysis of Deviance Table

Model: gaussian, link: identity

Response: sbp

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev

NULL 380 251294

age 1 22980 379 228314

gender 1 2518 378 225796

village 7 2100 371 223696

#####

> anova(glm(sbp~age+village+gender))

Analysis of Deviance Table

Model: gaussian, link: identity

Response: sbp

11




Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev

NULL 380 251294

age 1 22980 379 228314

village 7 2297 372 226017

gender 1 2321 371 223696

######

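> anova(glm(sbp~village+gender+age))

Analysis of Deviance Table

Model: gaussian, link: identity

Response: sbp

Terms added sequentially (first to last)

Df Deviance Resid. Df Resid. Dev

NULL 380 251294

village 7 3336 373 247958

gender 1 37 372 247921

age 1 24225 371 223696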

12


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 1: Regression diagnostics for sbp ~ age

13


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 2: Regression diagnostics for sbp ~ gender

14


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 3: Regression diagnostics for sbp ~ village

15


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 4: Regression diagnostics for sbp ~ village + age

16


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 5: Regression diagnostics for sbp ~ village + gender

17


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 6: Regression diagnostics for sbp ~ age + gender

18


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 7: Regression diagnostics for sbp ~ age + gender + village

19


####

#### Output and code for Question SM1 (d)

> mybox<-boxcox(lm(sbp~age+gender+village))

> mybox$x[order(mybox$y,decreasing=T)[1]]

[1] -0.5454545
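For reference, the family of transformations that a Box-Cox procedure profiles over can be written as below. This is the standard textbook definition, given here as background and not taken from the exam output. The symbols $\mathrm{RSS}_\lambda$ (residual sum of squares after transforming the response) and $\hat\lambda$ are notation assumed for this sketch. The second command above returns the grid value of λ at which the profile log-likelihood is largest.

```latex
% Box-Cox transformation of a positive response y
\[
y^{(\lambda)} =
\begin{cases}
  \dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[1.5ex]
  \log y, & \lambda = 0.
\end{cases}
\]
% Profile log-likelihood for \lambda, up to an additive constant
\[
\ell(\lambda) = -\frac{n}{2} \log\!\left( \frac{\mathrm{RSS}_{\lambda}}{n} \right)
                + (\lambda - 1) \sum_{i=1}^{n} \log y_i .
\]
% Approximate 95% confidence set: values of \lambda whose profile
% log-likelihood lies within half the chi-square(1) 0.95 quantile of the maximum
\[
\left\{ \lambda : \ell(\lambda) \ge \ell(\hat{\lambda}) - \tfrac{1}{2}(3.84) \right\}.
\]
```

The value 3.84 is the upper 0.05 quantile of the χ² distribution with 1 degree of freedom, as listed in the tables at the end of the paper.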

20


[Figure: profile log-likelihood (vertical axis, roughly −2380 to −2330) plotted against λ (horizontal axis, −2 to 2), with the 95% level marked]

Figure 8: Box-Cox transformation diagnostic plot for systolic blood pressure dataset

21



### Code for Question SM2, part (a)

> districtonly.mod<-lm(Sugar~DistrictPosition)

> summary(districtonly.mod)

Call:

lm(formula = Sugar ~ DistrictPosition)

Residuals:

Min 1Q Median 3Q Max

-5.5326 -0.8103 0.0716 0.8912 4.2147

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 11.42840 0.06007 190.248 < 2e-16 ***

DistrictPositionE 0.35294 0.08566 4.120 3.86e-05 ***

DistrictPositionN 1.72268 0.08709 19.781 < 2e-16 ***

DistrictPositionS 0.18416 0.08281 2.224 0.026222 *

DistrictPositionW 0.23187 0.06792 3.414 0.000646 ***

----


Multiple R-Squared: 0.1227, Adjusted R-squared: 0.1217

F-statistic: 131.8 on 4 and 3770 DF, p-value: < 2.2e-16

22


[Figure: boxplots of Sugar (vertical axis, 6 to 16) for the district groups C, E, N, S, W]

Figure 9: Boxplots of Sugar by DistrictPosition

23


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 10: Regression diagnostics for model for Sugar including only DistrictPosition

24


### Code for Question SM2, part (b)

###### Age Only

> age.mod <- lm(Sugar~Age)

> summary(age.mod)

Call:

lm(formula = Sugar ~ Age)

Residuals:

Min 1Q Median 3Q Max

-5.49281 -0.87684 0.03579 0.89816 5.21414

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 12.14586 0.03776 321.66 <2e-16 ***

Age -0.15305 0.01395 -10.97 <2e-16 ***

---




########### District Position + Age

> summary(district.age.mod)

Call:

lm(formula = Sugar ~ DistrictPosition + Age)

Residuals:


-5.44238 -0.80695 0.08379 0.90526 4.44009

Coefficients:


(Intercept) 11.66672 0.06424 181.606 < 2e-16 ***



DistrictPositionS 0.25396 0.08213 3.092 0.002 **

DistrictPositionW 0.28142 0.06729 4.182 2.95e-05 ***

Age -0.12831 0.01324 -9.687 < 2e-16 ***

---

25





#############

> anova(age.mod,district.age.mod)

Analysis of Variance Table

Model 1: Sugar ~ Age

Model 2: Sugar ~ DistrictPosition + Age

Res.Df RSS Df Sum of Sq F Pr(>F)

1 3773 7483.5

2 3769 6610.3 4 873.2 124.47 < 2.2e-16 ***

> anova(districtonly.mod,district.age.mod)

Analysis of Variance Table

Model 1: Sugar ~ DistrictPosition

Model 2: Sugar ~ DistrictPosition + Age

Res.Df RSS Df Sum of Sq F Pr(>F)

1 3770 6774.9

2 3769 6610.3 1 164.6 93.846 < 2.2e-16 ***

---

26


[Figure: scatter plot of Sugar (vertical axis, 6 to 16) against Age (horizontal axis, 0 to 8)]

Figure 11: Plot of Sugar by Age

27


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 12: Regression diagnostics for model for Sugar including only Age

28


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 13: Regression diagnostics for model for Sugar including only Age and DistrictPosition

29


### Code for Question SM2, part (c)

###### HarvestMonth as linear

> harvest.mod<-lm(Sugar~DistrictPosition+Age+HarvestMonth)

> summary(harvest.mod)

Call:

lm(formula = Sugar ~ DistrictPosition + Age + HarvestMonth)

Residuals:


-5.70976 -0.76860 0.07636 0.88184 4.04782

Coefficients:


(Intercept) 10.16050 0.13535 75.068 < 2e-16 ***



DistrictPositionS 0.28550 0.08051 3.546 0.000396 ***


Age -0.13841 0.01300 -10.645 < 2e-16 ***

HarvestMonth 0.17588 0.01399 12.570 < 2e-16 ***

-----




###### HarvestMonth as factor

> harvest.fact.mod<-lm(Sugar~DistrictPosition+Age+factor(HarvestMonth))

> summary(harvest.quad.mod)

Call:

lm(formula = Sugar ~ DistrictPosition + Age + HarvestMonth +

I(HarvestMonth^2))

Residuals:


-5.69731 -0.76383 0.07553 0.85751 4.38852

Coefficients:


(Intercept) 1.77596 0.73698 2.410 0.016010 *

30




DistrictPositionS 0.30759 0.07915 3.886 0.000104 ***


Age -0.08725 0.01352 -6.452 1.24e-10 ***

HarvestMonth 2.16681 0.17267 12.549 < 2e-16 ***

I(HarvestMonth^2) -0.11630 0.01006 -11.567 < 2e-16 ***

---




###### HarvestMonth with linear and quadratic terms

> harvest.quad.mod<-lm(Sugar~DistrictPosition+Age+HarvestMonth +

I(HarvestMonth^2))

> summary(harvest.fact.mod)

Call:

lm(formula = Sugar ~ DistrictPosition + Age + factor(HarvestMonth))

Residuals:


-5.53426 -0.75585 0.06586 0.84534 4.18221

Coefficients:


(Intercept) 10.43768 0.09790 106.618 < 2e-16 ***



DistrictPositionS 0.32759 0.07891 4.151 3.38e-05 ***


Age -0.08761 0.01346 -6.510 8.50e-11 ***

factor(HarvestMonth)7 0.82974 0.08483 9.781 < 2e-16 ***





----



31



#### Model selection criteria

> AIC(harvest.mod)

[1] 12688.75

> AIC(harvest.quad.mod)

[1] 12559.00

> AIC(harvest.fact.mod)

[1] 12519.90

> AIC(harvest.mod,k=log(length(Sugar)))

[1] 12738.64

> AIC(harvest.quad.mod,k=log(length(Sugar)))

[1] 12615.13

> AIC(harvest.fact.mod,k=log(length(Sugar)))

[1] 12594.74
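As a reading aid for the six values above: in R, `AIC(object, k)` returns minus twice the maximised log-likelihood plus `k` times the number of estimated parameters, so the default `k = 2` gives the AIC and `k = log(n)` gives the BIC. The sketch below writes this out and checks it against the printed numbers. The symbols $\hat\ell$ and $p$ are notation assumed here, with $p$ counting the regression coefficients plus the error variance.

```latex
\[
\mathrm{AIC} = -2\hat{\ell} + 2p, \qquad
\mathrm{BIC} = -2\hat{\ell} + p \log n, \qquad
\mathrm{BIC} - \mathrm{AIC} = p\,(\log n - 2).
\]
% With n = 3775 observations, \log n \approx 8.236, so \log n - 2 \approx 6.236.
\begin{align*}
\text{linear:}    \quad & 12738.64 - 12688.75 = 49.89 \approx 8  \times 6.236, & p &= 8,  \\
\text{quadratic:} \quad & 12615.13 - 12559.00 = 56.13 \approx 9  \times 6.236, & p &= 9,  \\
\text{factor:}    \quad & 12594.74 - 12519.90 = 74.84 \approx 12 \times 6.236, & p &= 12.
\end{align*}
```

The three models therefore differ in how many parameters they spend on HarvestMonth (one, two, and five respectively), and the BIC charges more per parameter than the AIC at this sample size.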

32


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 14: Regression diagnostics for model using Harvest

33


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 15: Regression diagnostics for model using Harvest^2

34


[Figure: four-panel diagnostic plot with panels Residuals vs Fitted, Normal Q-Q, Scale-Location, and Cook's distance]

Figure 16: Regression diagnostics for model using factor(Harvest)

35



> data(Aids2)
> attach(Aids2)
> time<-death-diag+1
> c<-codes(status)-1
> rate<-c/time
> summary(glm(rate~state+sex+diag+T.categ+age, family=poisson, weight=time))

Call:
glm(formula = rate ~ state + sex + diag + T.categ + age, family = poisson,
    weights = time)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.37597  -0.77433   0.04455   0.91472   3.42263

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.6728465  0.4716294  -7.788 6.83e-15 ***
stateOther    -0.0944785  0.0895655  -1.055  0.29149
stateQLD       0.1860238  0.0878128   2.118  0.03414 *
stateVIC      -0.0018092  0.0613208  -0.030  0.97646
sexM          -0.0369529  0.1757609  -0.210  0.83348
diag          -0.0003179  0.0000421  -7.552 4.29e-14 ***
T.categhsid   -0.1211765  0.1520374  -0.797  0.42544
T.categid     -0.3799289  0.2459986  -1.544  0.12248
T.categhet    -0.7307894  0.2652388  -2.755  0.00587 **
T.categhaem    0.3462834  0.1881367   1.841  0.06568 .
T.categblood   0.1393095  0.1374007   1.014  0.31063
T.categmother  0.4603228  0.5893405   0.781  0.43475
T.categother   0.1200160  0.1636915   0.733  0.46345
age            0.0139496  0.0024987   5.583 2.37e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 4407.2 on 2842 degrees of freedom
Residual deviance: 4283.0 on 2829 degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 8

> glm(rate~state+sex+diag+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2836 Residual
Null Deviance: 4407
Residual Deviance: 4302 AIC: Inf

> glm(rate~sex+diag+T.categ+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2832 Residual
Null Deviance: 4407
Residual Deviance: 4289 AIC: Inf

36


> glm(rate~state+diag+T.categ+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2830 Residual
Null Deviance: 4407
Residual Deviance: 4283 AIC: Inf

> glm(rate~diag+T.categ+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2833 Residual
Null Deviance: 4407
Residual Deviance: 4289 AIC: Inf

> glm(rate~sex+diag+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2839 Residual
Null Deviance: 4407
Residual Deviance: 4308 AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)

> glm(rate~state+diag+age, family=poisson, weight=time)

Degrees of Freedom: 2842 Total (i.e. Null); 2837 Residual
Null Deviance: 4407
Residual Deviance: 4303 AIC: Inf
There were 50 or more warnings (use warnings() to see the first 50)

> summary(glm(rate~diag+age, family=poisson, weight=time))

Call:
glm(formula = rate ~ diag + age, family = poisson, weights = time)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-4.19449  -0.77350   0.04316   0.91967   3.40510

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.681e+00  4.352e-01  -8.457  < 2e-16 ***
diag        -3.251e-04  4.122e-05  -7.888 3.07e-15 ***
age          1.521e-02  2.411e-03   6.308 2.83e-10 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 4407.2 on 2842 degrees of freedom
Residual deviance: 4309.4 on 2840 degrees of freedom
AIC: Inf

Number of Fisher Scoring iterations: 8

There were 50 or more warnings (use warnings() to see the first 50)

37



> data(biopsy)
> attach(biopsy)
> Y<-codes(class)-1
> fV1<-factor(V1)
> par(mfrow=c(2,2))
> glm0<-glm(Y~V1)
> summary(glm0)

Call:
glm(formula = Y ~ V1)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-0.77804  -0.17331  -0.01994   0.06859   1.06859

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.189535   0.023395  -8.102 2.43e-15 ***
V1           0.120947   0.004467  27.078  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1104095)

    Null deviance: 157.908 on 698 degrees of freedom
Residual deviance:  76.955 on 697 degrees of freedom
AIC: 447.39

Number of Fisher Scoring iterations: 2

> glm1<-glm(Y~fV1)
> summary(glm1)

Call:
glm(formula = Y ~ fV1)

Deviance Residuals:
       Min         1Q     Median         3Q        Max
-9.565e-01 -1.111e-01 -2.069e-02  3.331e-15  9.793e-01

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.02069    0.02647   0.782  0.43466
fV12         0.05931    0.05227   1.135  0.25689
fV13         0.09042    0.04051   2.232  0.02593 *
fV14         0.12931    0.04439   2.913  0.00369 **
fV15         0.32546    0.03850   8.455  < 2e-16 ***
fV16         0.50872    0.06073   8.377 3.04e-16 ***
fV17         0.93583    0.07153  13.083  < 2e-16 ***
fV18         0.89235    0.05393  16.546  < 2e-16 ***
fV19         0.97931    0.08920  10.979  < 2e-16 ***
fV110        0.97931    0.04661  21.010  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 0.1015776)


38


> plot(V1,fitted(glm1))
> points(V1,fitted(glm0),pch=2)
> title('Figure 2.1: Normal family')
> glm0<-glm(Y~V1,family=binomial)
> summary(glm0)

Call:
glm(formula = Y ~ V1, family = binomial)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1986  -0.4261  -0.1704   0.1730   2.9118

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.16017    0.37772  -13.66   <2e-16 ***
V1           0.93546    0.07372   12.69   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

> glm1<-glm(Y~fV1, family=binomial)
> summary(glm1)

Call:
glm(formula = Y ~ fV1, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.50419  -0.48535  -0.20448   0.01184   2.78500

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.8572     0.5834  -6.611 3.81e-11 ***
fV12          1.4149     0.7824   1.808  0.07054 .
fV13          1.7778     0.6589   2.698  0.00697 **
fV14          2.1226     0.6621   3.206  0.00135 **
fV15          3.2212     0.6119   5.265 1.40e-07 ***
fV16          3.9750     0.6771   5.871 4.34e-09 ***
fV17          6.9483     1.1772   5.902 3.58e-09 ***
fV18          6.2086     0.7837   7.922 2.33e-15 ***
fV19         13.4232    19.3753   0.693  0.48844
fV110        13.4232     8.7430   1.535  0.12471
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



39


> plot(V1,fitted(glm1))
> points(V1,fitted(glm0),pch=2)
> title('Figure 2.2: Binomial family')
> summary(glm(Y~V1+V2+V3+V4+V5+V6+V7+V8+V9, family=binomial))

Call:
glm(formula = Y ~ V1 + V2 + V3 + V4 + V5 + V6 + V7 + V8 + V9,
    family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.48404  -0.11529  -0.06192   0.02221   2.46983

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.103859   1.170793  -8.630  < 2e-16 ***
V1            0.535008   0.141838   3.772 0.000162 ***
V2           -0.006278   0.208786  -0.030 0.976011
V3            0.322705   0.230224   1.402 0.161005
V4            0.330634   0.123318   2.681 0.007337 **
V5            0.096634   0.156467   0.618 0.536836
V6            0.383024   0.093741   4.086 4.39e-05 ***
V7            0.447184   0.171156   2.613 0.008982 **
V8            0.213030   0.112757   1.889 0.058855 .
V9            0.534817   0.328105   1.630 0.103098
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> summary(glm(Y~V1+V4+V6+V7, family=binomial))

Call:
glm(formula = Y ~ V1 + V4 + V6 + V7, family = binomial)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-3.69637  -0.14510  -0.06093   0.02317   2.44758

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.11370    1.03190  -9.801  < 2e-16 ***
V1            0.81166    0.12579   6.453 1.10e-10 ***
V4            0.43412    0.11399   3.808  0.00014 ***
V6            0.48136    0.08813   5.462 4.72e-08 ***
V7            0.70154    0.15190   4.619 3.87e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



40


[Figure: two panels, (a) Normal family and (b) Binomial family, each plotting fitted(glm1) (vertical axis, 0 to 1) against V1 (horizontal axis, 2 to 10)]

Figure 17: Figure for question SM5

41


Upper tail probabilities of the standard Normal distribution

z        0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
0.0    0.5000  0.4960  0.4920  0.4880  0.4840  0.4801  0.4761  0.4721  0.4681  0.4641
0.1    0.4602  0.4562  0.4522  0.4483  0.4443  0.4404  0.4364  0.4325  0.4286  0.4247
0.2    0.4207  0.4168  0.4129  0.4090  0.4052  0.4013  0.3974  0.3936  0.3897  0.3859
0.3    0.3821  0.3783  0.3745  0.3707  0.3669  0.3632  0.3594  0.3557  0.3520  0.3483
0.4    0.3446  0.3409  0.3372  0.3336  0.3300  0.3264  0.3228  0.3192  0.3156  0.3121
0.5    0.3085  0.3050  0.3015  0.2981  0.2946  0.2912  0.2877  0.2843  0.2810  0.2776
0.6    0.2743  0.2709  0.2676  0.2643  0.2611  0.2578  0.2546  0.2514  0.2483  0.2451
0.7    0.2420  0.2389  0.2358  0.2327  0.2296  0.2266  0.2236  0.2206  0.2177  0.2148
0.8    0.2119  0.2090  0.2061  0.2033  0.2005  0.1977  0.1949  0.1922  0.1894  0.1867
0.9    0.1841  0.1814  0.1788  0.1762  0.1736  0.1711  0.1685  0.1660  0.1635  0.1611
1.0    0.1587  0.1562  0.1539  0.1515  0.1492  0.1469  0.1446  0.1423  0.1401  0.1379
1.1    0.1357  0.1335  0.1314  0.1292  0.1271  0.1251  0.1230  0.1210  0.1190  0.1170
1.2    0.1151  0.1131  0.1112  0.1093  0.1075  0.1056  0.1038  0.1020  0.1003  0.0985
1.3    0.0968  0.0951  0.0934  0.0918  0.0901  0.0885  0.0869  0.0853  0.0838  0.0823
1.4    0.0808  0.0793  0.0778  0.0764  0.0749  0.0735  0.0721  0.0708  0.0694  0.0681
1.5    0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559
1.6    0.0548  0.0537  0.0526  0.0516  0.0505  0.0495  0.0485  0.0475  0.0465  0.0455
1.7    0.0446  0.0436  0.0427  0.0418  0.0409  0.0401  0.0392  0.0384  0.0375  0.0367
1.8    0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294
1.9    0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
2.0    0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
2.1    0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
2.2    0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
2.3    0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
2.4    0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
2.5    0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
2.6    0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
2.7    0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
2.8    0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
2.9    0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
3.0    0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
3.1    0.0010  0.0009  0.0009  0.0009  0.0008  0.0008  0.0008  0.0008  0.0007  0.0007
3.2    0.0007  0.0007  0.0006  0.0006  0.0006  0.0006  0.0006  0.0005  0.0005  0.0005
3.3    0.0005  0.0005  0.0005  0.0004  0.0004  0.0004  0.0004  0.0004  0.0004  0.0003
3.4    0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0003  0.0002
3.5    0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002
3.6    0.0002  0.0002  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
3.7    0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
3.8    0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001
3.9    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
4.0    0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000
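
The entries of this table can be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the function name upper_tail is illustrative and not taken from the examination. It relies on the identity P(Z > z) = erfc(z / sqrt(2)) / 2 for a standard Normal Z.

```python
import math


def upper_tail(z: float) -> float:
    """Upper tail probability P(Z > z) for a standard Normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Row 1.9, column 0.06 of the table corresponds to z = 1.96.
print(f"{upper_tail(1.96):.4f}")  # prints 0.0250
```

To read the table the same way, add the row label and the column label to obtain z, then take the tabulated value as P(Z > z).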



Quantiles of the t and χ2 distributions

            Quantiles of the t distribution    Quantiles of the χ2 distribution
Degrees of  Upper tail probability             Upper tail probability
freedom       0.1   0.05  0.025   0.01  0.005     0.1    0.05   0.025    0.01   0.005

1            3.08   6.31  12.71  31.82  63.66    2.71    3.84    5.02    6.63    7.88
2            1.89   2.92   4.30   6.96   9.92    4.61    5.99    7.38    9.21   10.60
3            1.64   2.35   3.18   4.54   5.84    6.25    7.81    9.35   11.34   12.84
4            1.53   2.13   2.78   3.75   4.60    7.78    9.49   11.14   13.28   14.86
5            1.48   2.02   2.57   3.36   4.03    9.24   11.07   12.83   15.09   16.75
6            1.44   1.94   2.45   3.14   3.71   10.64   12.59   14.45   16.81   18.55
7            1.41   1.89   2.36   3.00   3.50   12.02   14.07   16.01   18.48   20.28
8            1.40   1.86   2.31   2.90   3.36   13.36   15.51   17.53   20.09   21.95
9            1.38   1.83   2.26   2.82   3.25   14.68   16.92   19.02   21.67   23.59
10           1.37   1.81   2.23   2.76   3.17   15.99   18.31   20.48   23.21   25.19
11           1.36   1.80   2.20   2.72   3.11   17.28   19.68   21.92   24.72   26.76
12           1.36   1.78   2.18   2.68   3.05   18.55   21.03   23.34   26.22   28.30
13           1.35   1.77   2.16   2.65   3.01   19.81   22.36   24.74   27.69   29.82
14           1.35   1.76   2.14   2.62   2.98   21.06   23.68   26.12   29.14   31.32
15           1.34   1.75   2.13   2.60   2.95   22.31   25.00   27.49   30.58   32.80
16           1.34   1.75   2.12   2.58   2.92   23.54   26.30   28.85   32.00   34.27
17           1.33   1.74   2.11   2.57   2.90   24.77   27.59   30.19   33.41   35.72
18           1.33   1.73   2.10   2.55   2.88   25.99   28.87   31.53   34.81   37.16
19           1.33   1.73   2.09   2.54   2.86   27.20   30.14   32.85   36.19   38.58
20           1.33   1.72   2.09   2.53   2.85   28.41   31.41   34.17   37.57   40.00
21           1.32   1.72   2.08   2.52   2.83   29.62   32.67   35.48   38.93   41.40
22           1.32   1.72   2.07   2.51   2.82   30.81   33.92   36.78   40.29   42.80
23           1.32   1.71   2.07   2.50   2.81   32.01   35.17   38.08   41.64   44.18
24           1.32   1.71   2.06   2.49   2.80   33.20   36.42   39.36   42.98   45.56
25           1.32   1.71   2.06   2.49   2.79   34.38   37.65   40.65   44.31   46.93
26           1.31   1.71   2.06   2.48   2.78   35.56   38.89   41.92   45.64   48.29
27           1.31   1.70   2.05   2.47   2.77   36.74   40.11   43.19   46.96   49.64
28           1.31   1.70   2.05   2.47   2.76   37.92   41.34   44.46   48.28   50.99
29           1.31   1.70   2.05   2.46   2.76   39.09   42.56   45.72   49.59   52.34
30           1.31   1.70   2.04   2.46   2.75   40.26   43.77   46.98   50.89   53.67
35           1.31   1.69   2.03   2.44   2.72   46.06   49.80   53.20   57.34   60.27
40           1.30   1.68   2.02   2.42   2.70   51.81   55.76   59.34   63.69   66.77
45           1.30   1.68   2.01   2.41   2.69   57.51   61.66   65.41   69.96   73.17
50           1.30   1.68   2.01   2.40   2.68   63.17   67.50   71.42   76.15   79.49
55           1.30   1.67   2.00   2.40   2.67   68.80   73.31   77.38   82.29   85.75
60           1.30   1.67   2.00   2.39   2.66   74.40   79.08   83.30   88.38   91.95
80           1.29   1.66   1.99   2.37   2.64   96.58  101.88  106.63  112.33  116.32
100          1.29   1.66   1.98   2.36   2.63  118.50  124.34  129.56  135.81  140.17
120          1.29   1.66   1.98   2.36   2.62  140.23  146.57  152.21  158.95  163.65
∞            1.28   1.64   1.96   2.33   2.58


Quantiles of the F distribution, P = 0.05

Denominator
degrees of  Numerator degrees of freedom
freedom         1      2      3      4      5      6      7      8      9     10

1           161.4  199.5  215.7  224.6  230.2  233.0  236.8  238.9  240.5  241.9
2           18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38  19.40
3           10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81   8.79
4            7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00   5.96
5            6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77   4.74
6            5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10   4.06
7            5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68   3.64
8            5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39   3.35
9            5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18   3.14
10           4.96   4.10   3.71   3.48   3.33   3.22   3.14   3.07   3.02   2.98
11           4.84   3.98   3.59   3.36   3.20   3.09   3.01   2.95   2.90   2.85
12           4.75   3.89   3.49   3.26   3.11   3.00   2.91   2.85   2.80   2.75
13           4.67   3.81   3.41   3.18   3.03   2.92   2.83   2.77   2.71   2.67
14           4.60   3.74   3.34   3.11   2.96   2.85   2.76   2.70   2.65   2.60
15           4.54   3.68   3.29   3.06   2.90   2.79   2.71   2.64   2.59   2.54
16           4.49   3.63   3.24   3.01   2.85   2.74   2.66   2.59   2.54   2.49
17           4.45   3.59   3.20   2.96   2.81   2.70   2.61   2.55   2.49   2.45
18           4.41   3.55   3.16   2.93   2.77   2.66   2.58   2.51   2.46   2.41
19           4.38   3.52   3.13   2.90   2.74   2.63   2.54   2.48   2.42   2.38
20           4.35   3.49   3.10   2.87   2.71   2.60   2.51   2.45   2.39   2.35
21           4.32   3.47   3.07   2.84   2.68   2.57   2.49   2.42   2.37   2.32
22           4.30   3.44   3.05   2.82   2.66   2.55   2.46   2.40   2.34   2.30
23           4.28   3.42   3.03   2.80   2.64   2.53   2.44   2.37   2.32   2.27
24           4.26   3.40   3.01   2.78   2.62   2.51   2.42   2.36   2.30   2.25
25           4.24   3.39   2.99   2.76   2.60   2.49   2.40   2.34   2.28   2.24
26           4.23   3.37   2.98   2.74   2.59   2.47   2.39   2.32   2.27   2.22
27           4.21   3.35   2.96   2.73   2.57   2.46   2.37   2.31   2.25   2.20
28           4.20   3.34   2.95   2.71   2.56   2.45   2.36   2.29   2.24   2.19
29           4.18   3.33   2.93   2.70   2.55   2.43   2.35   2.28   2.22   2.18
30           4.17   3.32   2.92   2.69   2.53   2.42   2.33   2.27   2.21   2.16
35           4.12   3.27   2.87   2.64   2.49   2.37   2.29   2.22   2.16   2.11
40           4.08   3.23   2.84   2.61   2.45   2.34   2.25   2.18   2.12   2.08
45           4.06   3.20   2.81   2.58   2.42   2.31   2.22   2.15   2.10   2.05
50           4.03   3.18   2.79   2.56   2.40   2.29   2.20   2.13   2.07   2.03
55           4.02   3.16   2.77   2.54   2.38   2.27   2.18   2.11   2.06   2.01
60           4.00   3.15   2.76   2.53   2.37   2.25   2.17   2.10   2.04   1.99
80           3.96   3.11   2.72   2.49   2.33   2.21   2.13   2.06   2.00   1.95
100          3.94   3.09   2.70   2.46   2.31   2.19   2.10   2.03   1.97   1.93
120          3.92   3.07   2.68   2.45   2.29   2.18   2.09   2.02   1.96   1.91
∞            3.84   3.00   2.60   2.37   2.21   2.10   2.01   1.94   1.88   1.83


Quantiles of the Bonferroni t distribution, P (T > t) = 0.025/n

      Degrees of freedom
n         5     10     15     20     25     30     40     50     60     80    100    120      ∞
5      4.03   3.17   2.95   2.85   2.79   2.75   2.70   2.68   2.66   2.64   2.63   2.62   2.58
10     4.77   3.58   3.29   3.15   3.08   3.03   2.97   2.94   2.91   2.89   2.87   2.86   2.81
15     5.25   3.83   3.48   3.33   3.24   3.19   3.12   3.08   3.06   3.03   3.01   3.00   2.94
20     5.60   4.00   3.62   3.46   3.36   3.30   3.23   3.18   3.16   3.12   3.10   3.09   3.02
25     5.89   4.14   3.73   3.55   3.45   3.39   3.31   3.26   3.23   3.20   3.17   3.16   3.09
30     6.14   4.26   3.82   3.63   3.52   3.45   3.37   3.32   3.29   3.25   3.23   3.22   3.14
35     6.35   4.36   3.90   3.70   3.58   3.51   3.43   3.38   3.34   3.30   3.28   3.26   3.19
40     6.54   4.44   3.96   3.75   3.64   3.56   3.47   3.42   3.39   3.35   3.32   3.31   3.23
45     6.71   4.52   4.02   3.80   3.68   3.61   3.51   3.46   3.43   3.38   3.36   3.34   3.26
50     6.87   4.59   4.07   3.85   3.73   3.65   3.55   3.50   3.46   3.42   3.39   3.37   3.29
55     7.01   4.65   4.12   3.89   3.76   3.68   3.58   3.53   3.49   3.45   3.42   3.40   3.32
60     7.15   4.71   4.16   3.93   3.80   3.71   3.61   3.56   3.52   3.47   3.45   3.43   3.34
70     7.39   4.81   4.24   3.99   3.86   3.77   3.67   3.61   3.57   3.52   3.49   3.47   3.38
80     7.60   4.90   4.31   4.05   3.91   3.82   3.71   3.65   3.61   3.56   3.53   3.51   3.42
90     7.80   4.98   4.36   4.10   3.96   3.86   3.75   3.69   3.65   3.60   3.57   3.55   3.45
100    7.98   5.05   4.42   4.15   4.00   3.90   3.79   3.72   3.68   3.63   3.60   3.58   3.48

Quantiles of the Bonferroni t distribution, P (T > t) = 0.005/n

      Degrees of freedom
n         5     10     15     20     25     30     40     50     60     80    100    120      ∞
5      5.89   4.14   3.73   3.55   3.45   3.39   3.31   3.26   3.23   3.20   3.17   3.16   3.09
10     6.87   4.59   4.07   3.85   3.73   3.65   3.55   3.50   3.46   3.42   3.39   3.37   3.29
15     7.50   4.85   4.27   4.02   3.88   3.80   3.69   3.63   3.59   3.54   3.51   3.49   3.40
20     7.98   5.05   4.42   4.15   4.00   3.90   3.79   3.72   3.68   3.63   3.60   3.58   3.48
25     8.36   5.20   4.53   4.24   4.08   3.98   3.86   3.79   3.75   3.70   3.66   3.64   3.54
30     8.69   5.33   4.62   4.32   4.15   4.05   3.92   3.85   3.81   3.75   3.72   3.69   3.59
35     8.98   5.44   4.70   4.39   4.21   4.11   3.98   3.90   3.85   3.80   3.76   3.74   3.63
40     9.24   5.53   4.77   4.44   4.27   4.15   4.02   3.94   3.89   3.83   3.80   3.78   3.66
45     9.47   5.62   4.83   4.49   4.31   4.20   4.06   3.98   3.93   3.87   3.83   3.81   3.69
50     9.68   5.69   4.88   4.54   4.35   4.23   4.09   4.01   3.96   3.90   3.86   3.84   3.72
55     9.87   5.76   4.93   4.58   4.39   4.27   4.13   4.04   3.99   3.93   3.89   3.86   3.74
60    10.05   5.83   4.97   4.62   4.42   4.30   4.15   4.07   4.02   3.95   3.91   3.89   3.76
70    10.38   5.94   5.05   4.68   4.48   4.35   4.20   4.12   4.06   4.00   3.96   3.93   3.80
80    10.67   6.04   5.12   4.74   4.53   4.40   4.25   4.16   4.10   4.03   3.99   3.97   3.84
90    10.94   6.13   5.18   4.79   4.58   4.44   4.29   4.20   4.14   4.07   4.02   4.00   3.86
100   11.18   6.21   5.24   4.84   4.62   4.48   4.32   4.23   4.17   4.10   4.05   4.03   3.89
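
The ∞ column of both Bonferroni tables is the standard Normal quantile with upper tail probability 0.025/n or 0.005/n, so it can be reproduced without a t quantile routine. The following is a minimal sketch, assuming Python 3.8 or later; the function name bonferroni_z and its parameter names are illustrative and not taken from the examination.

```python
from statistics import NormalDist


def bonferroni_z(alpha_one_sided: float, n: int) -> float:
    """Standard Normal quantile z satisfying P(Z > z) = alpha_one_sided / n.

    This corresponds to the infinite-degrees-of-freedom column only;
    finite degrees of freedom require a t quantile function instead.
    """
    return NormalDist().inv_cdf(1.0 - alpha_one_sided / n)


# Compare with the last column of each table for n = 5, 10 and 100.
for n in (5, 10, 100):
    print(n, round(bonferroni_z(0.025, n), 2), round(bonferroni_z(0.005, n), 2))
```

For the finite degrees of freedom columns, a t quantile function such as SciPy's scipy.stats.t.ppf could be substituted for the Normal quantile, with the same tail probability.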

