Applications of Linear Models in Animal Breeding Henderson-1984


Transcript of Applications of Linear Models in Animal Breeding Henderson-1984

Chapter 1 - Models
Chapter 2 - Linear Unbiased Estimation
Chapter 3 - Best Linear Unbiased Estimation
Chapter 4 - Test of Hypotheses
Chapter 5 - Prediction of Random Variables
Chapter 6 - G and R Known to Proportionality
Chapter 7 - Known Functions of Fixed Effects
Chapter 8 - Unbiased Methods for G and R Unknown
Chapter 9 - Biased Estimation and Prediction
Chapter 10 - Quadratic Estimation of Variances
Chapter 11 - MIVQUE of Variances and Covariances
Chapter 12 - REML and ML Estimation
Chapter 13 - Effects of Selection
Chapter 14 - Restricted Best Linear Prediction
Chapter 15 - Sampling from Finite Populations
Chapter 16 - The One-Way Classification
Chapter 17 - The Two-Way Classification
Chapter 18 - The Three-Way Classification
Chapter 19 - Nested Classifications
Chapter 20 - Analysis of Regression Models
Chapter 21 - Analysis of Covariance Model
Chapter 22 - Animal Model, Single Records
Chapter 23 - Sire Model, Single Records
Chapter 24 - Animal Model, Repeated Records
Chapter 25 - Sire Model, Repeated Records
Chapter 26 - Animal Model, Multiple Traits
Chapter 27 - Sire Model, Multiple Traits
Chapter 28 - Joint Cow and Sire Evaluation
Chapter 29 - Non-Additive Genetic Merit
Chapter 30 - Line Cross and Breed Cross Analyses
Chapter 31 - Maternal Effects
Chapter 32 - Three-Way Mixed Model
Chapter 33 - Selection When Variances are Unequal

Chapter 1 - Models
C. R. Henderson, 1984, Guelph

This book is concerned exclusively with the analysis of data arising from an experiment or sampling scheme for which a linear model is assumed to be a suitable approximation. We should not, however, be so naive as to believe that a linear model is always correct. The important consideration is whether its use permits predictions to be accomplished accurately enough for our purposes. This chapter will deal with a general formulation that encompasses all linear models that have been used in animal breeding and related fields. Some suggestions for choosing a model will also be discussed. All linear models can, I believe, be written as follows with proper definition of the various elements of the model.
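The general model about to be defined, y = Xβ + Zu + e with Var(y) = ZGZ' + R, can be previewed numerically. The following is only an illustrative sketch, assuming NumPy is available; the layout (3 sires with 3, 2, 1 progeny) mirrors the one-way example given later in this chapter.

```python
import numpy as np

# Sketch of the general linear model y = Xb + Zu + e.
# 3 unrelated sires with 3, 2, 1 progeny; one record per progeny.
X = np.ones((6, 1))                          # fixed part: a common mean
Z = np.zeros((6, 3))
for record, sire in enumerate([0, 0, 0, 1, 1, 2]):
    Z[record, sire] = 1.0                    # incidence of random sire effects
G = 1.0 * np.eye(3)                          # Var(u) = I * sigma_s^2 (here 1)
R = 4.0 * np.eye(6)                          # Var(e) = I * sigma_e^2 (here 4)
V = Z @ G @ Z.T + R                          # Var(y) = ZGZ' + R
assert V[0, 1] == 1.0 and V[0, 3] == 0.0     # paternal half sibs covary; others do not
assert np.allclose(np.diag(V), 5.0)          # each record has variance sigma_s^2 + sigma_e^2
```

The particular values of G and R here are illustrative only; in practice they follow from the assumed genetic model and sampling scheme, as discussed below.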
Define the observable data vector with n elements as y. In order for the problem to be amenable to a statistical analysis from which we can draw inferences concerning the parameters of the model, or can predict future observations, it is necessary that the data vector be regarded legitimately as a random sample from some real or conceptual population with some known or assumed distribution. Because we seldom know what the true distribution really is, a commonly used method is to assume, as an approximation to the truth, that the distribution is multivariate normal. Analyses based on this approximation often have remarkable power. See, for example, Cochran (1937). The multivariate normal distribution is defined completely by its mean and by its central second moments. Consequently we write a linear model for y with elements in the model that determine these moments. This is

y = Xβ + Zu + e.

X is a known, fixed, n × p matrix with rank = r ≤ minimum of (n, p). β is a fixed, p × 1 vector, generally unknown, although in selection index methodology it is assumed, probably always incorrectly, that it is known. Z is a known, fixed, n × q matrix. u is a random, q × 1 vector with null means. e is a random, n × 1 vector with null means.

The variance-covariance matrix of u is G, a q × q symmetric matrix that is usually non-singular. Hereafter for convenience we shall use the notation Var(u) to mean a variance-covariance matrix of a random vector. Var(e) = R is an n × n, symmetric, usually non-singular matrix. Cov(u, e') = 0; that is, all elements of the covariance matrix for u with e are zero in most but not all applications.

It must be understood that we have hypothesized a population of u vectors from which a random sample of one has been drawn into the sample associated with the data vector, y, and similarly a population of e vectors is assumed, and a sample vector has been drawn with the first element of the sample vector being associated with the first element of y, etc.

Generally we do not know the values of the individual elements of G and R. We usually are willing, however, to make assumptions about the pattern of these values. For example, it is often assumed that all the diagonal elements of R are equal and that all off-diagonal elements are zero. That is, the elements of e have equal variances and are mutually uncorrelated. Given some assumed pattern of values of G and R, it is then possible to estimate these matrices given a suitable design (values of X and Z) and a suitable sampling scheme, that is, a guarantee that the data vector arose in accordance with u and e being random vectors from their respective populations. With the model just described,

E(y) = mean of y = Xβ,
Var(y) = ZGZ' + R.

We shall now present a few examples of well known models and show how these can be formulated by the general model described above. The important advantage to having one model that includes all cases is that we can thereby present in a condensed manner the basic methods for estimation, computing sampling variances, testing hypotheses, and prediction.

1 Simple Regression Model

The simple regression model can be written as

yi = α + βxi + ei.

This is a scalar model, yi being the ith of n observations. The fixed elements of the model are α and β, the latter representing the regression coefficient. The concomitant variable associated with the ith observation is xi, regarded as fixed and measured without error. Note that in conceptual repeated sampling the values of xi remain constant from one sample to another, but in each sample a new set of ei is taken, and consequently the values of yi change. Now relative to our general model,

y' = (y1 y2 ... yn),
β' = (α β),
X' = [ 1  1  ... 1  ]
     [ x1 x2 ... xn ],
e' = (e1 e2 ... en).

Zu does not exist in the model. Usually R is assumed to be Iσe² in regression models.

2 One Way Random Model

Suppose we have a random sample of unrelated sires from some population of sires and that these are mated to a sample of unrelated dams with one progeny per dam. The resulting progeny are reared in a common environment, and one record is observed on each. An appropriate model would seem to be

yij = μ + si + eij,

yij being the observation on the jth progeny of the ith sire. Suppose that there are 3 sires with progeny numbers 3, 2, 1 respectively. Then y is a vector with 6 elements:

y' = (y11 y12 y13 y21 y22 y31),
X' = (1 1 1 1 1 1),
u' = (s1 s2 s3),
e' = (e11 e12 e13 e21 e22 e31),
Var(u) = Iσs²,
Var(e) = Iσe²,

where these two identity matrices are of order 3 and 6, respectively, and Cov(u, e') = 0.

Suppose next that the sires in the sample are related; for example, sires 2 and 3 are half-sib progeny of sire 1, and all 3 are non-inbred. Then under an additive genetic model

Var(u) = [ 1    1/2  1/2 ]
         [ 1/2  1    1/4 ] σs².
         [ 1/2  1/4  1   ]

What if the mates are related? Suppose that the numerator relationship matrix, Am, for the 6 mates is

[ 1    0    1/2  1/2  0    0   ]
[ 0    1    0    0    1/2  1/2 ]
[ 1/2  0    1    1/4  0    0   ]
[ 1/2  0    1/4  1    0    0   ]
[ 0    1/2  0    0    1    1/4 ]
[ 0    1/2  0    0    1/4  1   ].

Suppose further that we invoke an additive genetic model with h² = 1/4. Then

Var(e) = [ 1     0     1/30  1/30  0     0    ]
         [ 0     1     0     0     1/30  1/30 ]
         [ 1/30  0     1     1/60  0     0    ]
         [ 1/30  0     1/60  1     0     0    ]
         [ 0     1/30  0     0     1     1/60 ]
         [ 0     1/30  0     0     1/60  1    ] σe².

This result is based on σs² = σy²/16, σe² = 15σy²/16, and leads to

Var(y) = (.25 Ap + .75 I) σy²,

where Ap is the relationship matrix for the 6 progeny.

3 Two Trait Additive Genetic Model

Suppose that we have a random sample of 5 related animals with measurements on 2 correlated traits. We assume an additive genetic model. Let A be the numerator relationship matrix of the 5 animals. Let

[ g11 g12 ]
[ g12 g22 ]

be the genetic variance-covariance matrix and

[ r11 r12 ]
[ r12 r22 ]

be the environmental variance-covariance matrix. Then h² for trait 1 is g11/(g11 + r11), and the genetic correlation between the two traits is g12/(g11 g22)^(1/2). Order the 10 observations animals within traits; that is, the first 5 elements of y are the observations on trait 1. Suppose that traits 1 and 2 have common means μ1, μ2 respectively. Then

X' = [ 1 1 1 1 1 0 0 0 0 0 ]
     [ 0 0 0 0 0 1 1 1 1 1 ],

and β' = (μ1 μ2). The first 5 elements of u are breeding values for trait 1 and the last 5 are breeding values for trait 2. Similarly the errors are partitioned into subvectors with 5 elements each. Then Z = I, and

G = Var(u) = [ A g11  A g12 ]
             [ A g12  A g22 ],

R = Var(e) = [ I r11  I r12 ]
             [ I r12  I r22 ],

where each I has order 5.

4 Two Way Mixed Model

Suppose that we have a random sample of 3 unrelated sires and that they are mated to unrelated dams. One progeny of each mating is obtained, and the resulting progeny are assigned at random to two different treatments. The table of subclass numbers is

         Treatments
Sires    1    2
  1      2    1
  2      0    2
  3      3    0

Ordering the data by treatments within sires,

y' = (y111 y112 y121 y221 y222 y311 y312 y313).

Treatments are regarded as fixed, and variances of sires and errors are considered to be unaffected by treatments. Then

u' = (s1 s2 s3 st11 st12 st22 st31),

Z = [ 1 0 0 1 0 0 0 ]
    [ 1 0 0 1 0 0 0 ]
    [ 1 0 0 0 1 0 0 ]
    [ 0 1 0 0 0 1 0 ]
    [ 0 1 0 0 0 1 0 ]
    [ 0 0 1 0 0 0 1 ]
    [ 0 0 1 0 0 0 1 ]
    [ 0 0 1 0 0 0 1 ],

Var(s) = I3 σs², Var(st) = I4 σst², Var(e) = I8 σe², and Cov(s, (st)') = 0.

This is certainly not the only linear model that could be invoked for this design. For example, one might want to assume that sire and error variances are related to treatments.

5 Equivalent Models

It was stated above that a linear model must describe the mean and the variance-covariance matrix of y. Given these two, an infinity of models can be written, all of which yield the same first and second moments. These models are called linear equivalent models. Let one model be y = Xβ + Zu + e with Var(u) = G, Var(e) = R. Let a second model be y = X*β* + Z*u* + e*, with Var(u*) = G*, Var(e*) = R*. Then the means of y under these 2 models are Xβ and X*β* respectively. Var(y) under the 2 models is ZGZ' + R and Z*G*Z*' + R*. Consequently we state that these 2 models are linearly equivalent if and only if Xβ = X*β* and ZGZ' + R = Z*G*Z*' + R*.

To illustrate Xβ = X*β*, suppose we have a treatment design with 3 treatments and 2 observations on each. Suppose we write a model yij = μ + ti + eij; then

X = [ 1 1 0 0 ]
    [ 1 1 0 0 ]
    [ 1 0 1 0 ]
    [ 1 0 1 0 ]
    [ 1 0 0 1 ]
    [ 1 0 0 1 ],  β = (μ t1 t2 t3)'.

An alternative model is yij = αi + eij; then

X* = [ 1 0 0 ]
     [ 1 0 0 ]
     [ 0 1 0 ]
     [ 0 1 0 ]
     [ 0 0 1 ]
     [ 0 0 1 ],  β* = (α1 α2 α3)'.

Then if we define αi = μ + ti, it is seen that E(y) is the same in the two models.

To illustrate with two models that give the same Var(y), consider a repeated lactation model. Suppose we have 3 unrelated, random sample cows with 3, 2, 1 lactation records, respectively. Invoking a simple repeatability model, that is, the correlation between any pair of records on the same animal is r, one model ignoring the fixed effects is

yij = ci + eij,

Var(c) = [ r 0 0 ]
         [ 0 r 0 ] σy²,
         [ 0 0 r ]

Var(e) = I6 (1 − r) σy².

An alternative for the random part of the model is yij = εij, where Zu does not exist and

Var(ε) = R = [ 1 r r 0 0 0 ]
             [ r 1 r 0 0 0 ]
             [ r r 1 0 0 0 ]
             [ 0 0 0 1 r 0 ] σy².
             [ 0 0 0 r 1 0 ]
             [ 0 0 0 0 0 1 ]

Relating the 2 models, σε² = σc² + σe², and Cov(εij, εij') = σc² for j ≠ j'.

We shall see that some models are much easier computationally than others. Also the parameters of one model can always be written as linear functions of the parameters of any equivalent model. Consequently linear and quadratic estimates under one model can be converted by these same linear functions to estimates for an equivalent model.

6 Subclass Means Model

With some models it is convenient to write them as models for the smallest subclass mean. By smallest we imply a subclass identified by all of the subscripts in the model except for the individual observations. For this model to apply, the variance-covariance matrix of elements of e pertaining to observations in the same smallest subclass must have the form

[ v c ... c ]
[ c v ... c ]
[ ... ... . ]
[ c c ... v ],

no covariates exist, and the covariances between elements of e in different subclasses must be zero. Then the model can be written

ȳ = X̄β + Z̄u + ε̄,

where ȳ is the vector of smallest subclass means, and X̄ and Z̄ relate these means to elements of β and u. The error vector, ε̄, is the mean of elements of e in the same subclass. Its variance-covariance matrix is diagonal with the ith diagonal element being

[v + (ni − 1)c]/ni,

where ni is the number of observations in the ith subclass.

7 Determining Possible Elements In The Model

Henderson (1959) described in detail an algorithm for determining the potential lines of an ANOVA table and correspondingly the elements of a linear model. First, the experiment is described in terms of two types of factors, namely main factors and nested factors. By a main factor is meant a classification, the levels of which are identified by a single subscript. By a nesting factor is meant one whose levels are not completely identified except by specifying a main factor or a combination of main factors within which the nesting factor is nested. Identify each of the main factors by a single unique letter, for example, B for breeds and T for treatments. Identify nesting factors by a letter followed by a colon and then the letter or letters describing the main factor or factors within which it is nested. For example, if sires are nested within breeds, this would be described as S:B. On the other hand, if a different set of sires is used for each breed by treatment combination, sires would be identified as S:BT.

To determine potential 2 factor interactions, combine the letters to the left of the colon (for a main factor a colon is implied with no letters following). Then combine the letters without repetition to the right of the colon. If no letter appears on both the right and left of the colon, this is a valid 2 factor interaction. For example, suppose the factors are A, B, C:B. Two way combinations are AB, AC:B, BC:B. The third does not qualify since B appears to the left and right of the colon. AC:B means A by C interaction nested within B. Three factor and higher interactions are determined by taking all possible trios and carrying out the above procedure. For example, suppose the factors are (A, D, B:D, C:D). Two factor possibilities are (AD, AB:D, AC:D, DB:D, DC:D, BC:D); the 4th and 5th are not valid. Three factor possibilities are (ADB:D, ADC:D, ABC:D, DBC:D); none of these is valid except ABC:D. The four factor possibility is ADBC:D, and this is not valid.

Having written the main factors and interactions one uses each of these as a subvector of either β or u. The next question is how to determine which.
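The interaction screening rule just described is mechanical enough to sketch in code. This is a sketch only; the function name `combine` and the set-based representation are mine, not Henderson's.

```python
# Henderson's (1959) rule for screening interactions: combine the letters
# left of the colon, pool the letters right of the colon, and reject the
# combination if any letter appears on both sides of the colon.
def combine(factors):
    """factors: strings like 'A' (main factor) or 'S:B' (nested factor)."""
    left, right = set(), set()
    for f in factors:
        l, _, r = f.partition(":")   # a main factor has an implied empty colon
        left |= set(l)
        right |= set(r)
    if left & right:
        return None                  # invalid: a letter appears on both sides
    name = "".join(sorted(left))
    return name + (":" + "".join(sorted(right)) if right else "")

# Examples from the text: factors A, B, C:B.
assert combine(["A", "B"]) == "AB"
assert combine(["A", "C:B"]) == "AC:B"
assert combine(["B", "C:B"]) is None       # B is left and right of the colon
# Factors (A, D, B:D, C:D): only ABC:D is a valid three-factor interaction.
assert combine(["A", "B:D", "C:D"]) == "ABC:D"
assert combine(["A", "D", "B:D"]) is None
```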
First consider main factors and nesting factors. If the levels of the factor in the experiment can be regarded as a random sample from some population of levels, the levels would be a subvector of u. With respect to interactions, if one or more letters to the left of the colon represent a factor in u, the interaction levels are subvectors of u. Thus interaction of fixed by random factors is regarded as random, as is the nesting of random within fixed. As a final step we decide the variance-covariance matrix of each subvector of u, the covariance between subvectors of u, and the variance-covariance matrix of (u, e). These last decisions are based on knowledge of the biology and the sampling scheme that produced the data vector.

It seems to me that modelling is the most important and most difficult aspect of linear models applications. Given the model, everything else is essentially computational.

Chapter 2 - Linear Unbiased Estimation
C. R. Henderson, 1984, Guelph

We are interested in linear unbiased estimators of β or of linear functions of β, say k'β. That is, the estimator has the form a'y, and E(a'y) = k'β, if possible. It is not necessarily the case that k'β can be estimated unbiasedly. If k'β can be estimated unbiasedly, it is called estimable. How do we determine estimability?

1 Verifying Estimability

E(a'y) = a'Xβ. Does this equal k'β? It will for any value of β if and only if a'X = k'. Consequently, if we can find any a such that a'X = k', then k'β is estimable. Let us illustrate with

X = [ 1 1 2 ]
    [ 1 2 4 ]
    [ 1 1 2 ]
    [ 1 3 6 ].

Is β1 estimable, that is, is (1 0 0)β estimable? Let a' = (2 −1 0 0); then a'X = (1 0 0) = k'. Therefore k'β is estimable.

Is (0 1 2)β estimable? Let a' = (−1 1 0 0); then a'X = (0 1 2) = k'. Therefore it is estimable.

Is β2 estimable? No, because no a exists such that a'X = (0 1 0).

Generally it is easier to prove by the above method that an estimable function is indeed estimable than to prove that a non-estimable function is non-estimable. Accordingly, we consider other methods for determining estimability.

1.1 Second Method

Partition X as follows, with possible re-ordering of columns:

X = (X1  X1L),

where X1 has r linearly independent columns. Remember that X is n × p with rank = r. The dimensions of L are r × (p − r). Then k'β is estimable if and only if k' = (k'1  k'1L), where k'1 has r elements and k'1L has p − r elements. Consider the previous example:

X1 = [ 1 1 ]
     [ 1 2 ]           L = [ 0 ]
     [ 1 1 ]               [ 2 ].
     [ 1 3 ],

Is (1 0 0)β estimable? k'1 = (1 0), and k'1L = (1 0)(0 2)' = 0. Thus k' = (1 0 0), and the function is estimable.

Is (0 1 2)β estimable? k'1 = (0 1), and k'1L = (0 1)(0 2)' = 2. Thus k' = (0 1 2), and the function is estimable.

Is (0 1 0)β estimable? k'1 = (0 1), and k'1L = 2. Thus (k'1  k'1L) = (0 1 2) ≠ (0 1 0). The function is not estimable.

1.2 Third Method

A third method is to find a matrix, C, of order p × (p − r) and rank p − r, such that XC = 0. Then k'β is estimable if and only if k'C = 0. In the example,

[ 1 1 2 ]   [  0 ]   [ 0 ]
[ 1 2 4 ]   [  2 ] = [ 0 ]
[ 1 1 2 ]   [ −1 ]   [ 0 ]
[ 1 3 6 ]            [ 0 ].

Therefore (1 0 0)β is estimable because (1 0 0)(0 2 −1)' = 0. So is (0 1 2)β because (0 1 2)(0 2 −1)' = 0. But (0 1 0)β is not, because (0 1 0)(0 2 −1)' = 2 ≠ 0.

1.3 Fourth Method

A fourth method is to find some g-inverse of X'X, denoted by (X'X)⁻. Then k'β is estimable if and only if k'(X'X)⁻X'X = k'. A definition of, and methods for computing, a g-inverse are presented in Chapter 3. In the example,

X'X = [ 4  7  14 ]
      [ 7  15 30 ]
      [ 14 30 60 ],

and a g-inverse is

(1/11) [ 15 −7 0 ]
       [ −7  4 0 ]
       [  0  0 0 ].

(1 0 0)(X'X)⁻X'X = (1 0 0); therefore (1 0 0)β is estimable.
(0 1 2)(X'X)⁻X'X = (0 1 2); therefore (0 1 2)β is estimable.
(0 1 0)(X'X)⁻X'X = (0 1 2); therefore (0 1 0)β is not estimable.

Related to this fourth method, any linear function of (X'X)⁻X'Xβ is estimable. If rank(X) = p = the number of columns in X, any linear function of β is estimable. In that case the only g-inverse of X'X is (X'X)⁻¹, a regular inverse. Then by the fourth method

k'(X'X)⁻X'X = k'(X'X)⁻¹X'X = k'I = k'.

Therefore any k'β is estimable.

There is an extensive literature on generalized inverses. See, for example, Searle (1971b, 1982), Rao and Mitra (1971) and Harville (1999??).

Chapter 3 - Best Linear Unbiased Estimation
C. R. Henderson, 1984, Guelph

In Chapter 2 we discussed linear unbiased estimation of k'β,
having determined that it is estimable. Let the estimate be a'y; if k'β is estimable, some a exists such that E(a'y) = k'β. Assuming that more than one a gives an unbiased estimator, which one should be chosen? The most common criterion for choice is minimum sampling variance. Such an estimator is called the best linear unbiased estimator (BLUE). Thus we find a' such that E(a'y) = k'β and, in the class of such estimators, has minimum sampling variance. Now

Var(a'y) = a'(Var(y))a = a'Va,

where Var(y) = V, assumed known for the moment. For unbiasedness we require a'X = k'. Consequently we find a that minimizes a'Va subject to a'X = k'. Using a Lagrange multiplier, θ, and applying differential calculus, we need to solve for a in the equations

[ V  X ] [ a ]   [ 0 ]
[ X' 0 ] [ θ ] = [ k ].

This is a consistent set of equations if and only if k'β is estimable. In that case the unique solution to a is V⁻¹X(X'V⁻¹X)⁻k. A solution to θ is −(X'V⁻¹X)⁻k, and this is not unique when X, and consequently X'V⁻¹X, is not full rank. Nevertheless the solution to a is invariant to the choice of a g-inverse of X'V⁻¹X. Thus, BLUE of k'β is k'(X'V⁻¹X)⁻X'V⁻¹y. But let

β° = (X'V⁻¹X)⁻X'V⁻¹y,

where β° is any solution to

(X'V⁻¹X)β° = X'V⁻¹y,

known as generalized least squares (GLS) equations, Aitken (1935). The superscript ° is used to denote some solution, not a unique solution. Therefore BLUE of k'β is k'β°.

Let us illustrate with

X = [ 1 1 2 ]
    [ 1 2 4 ]
    [ 1 1 2 ]
    [ 1 3 6 ],

and y' = (5 2 4 3). Suppose Var(y) = Iσe². Then the GLS equations are

σe⁻² [ 4  7  14 ]        [ 14 ]
     [ 7  15 30 ] β° =   [ 22 ] σe⁻².
     [ 14 30 60 ]        [ 44 ]

A solution is (β°)' = (56 −10 0)/11. Then BLUE of (0 1 2)β, which has been shown to be estimable, is (0 1 2)(56 −10 0)'/11 = −10/11. Another solution to β° is (56 0 −5)'/11. Then BLUE of (0 1 2)β is −10/11, the same as for the other solution to β°.

1 Mixed Model Method For BLUE

One frequent difficulty with GLS equations, particularly in the mixed model, is that V = ZGZ' + R is large and non-diagonal. Consequently V⁻¹ is difficult or impossible to compute by usual methods. It was proved by Henderson et al. (1959) that

V⁻¹ = R⁻¹ − R⁻¹Z(Z'R⁻¹Z + G⁻¹)⁻¹Z'R⁻¹.

Now if R⁻¹ is easier to compute than V⁻¹, as is often true, if G⁻¹ is easy to compute, and if (Z'R⁻¹Z + G⁻¹)⁻¹ is easy to compute, this way of computing V⁻¹ may have important advantages. Note that this result can be obtained by writing equations, known as Henderson's mixed model equations (1950), as follows:

[ X'R⁻¹X  X'R⁻¹Z       ] [ β° ]   [ X'R⁻¹y ]
[ Z'R⁻¹X  Z'R⁻¹Z + G⁻¹ ] [ û  ] = [ Z'R⁻¹y ].

Note that if we solve for û in the second equation and substitute this in the first we get

X'[R⁻¹ − R⁻¹Z(Z'R⁻¹Z + G⁻¹)⁻¹Z'R⁻¹]Xβ° = X'[R⁻¹ − R⁻¹Z(Z'R⁻¹Z + G⁻¹)⁻¹Z'R⁻¹]y,

or, from the result for V⁻¹,

X'V⁻¹Xβ° = X'V⁻¹y.

Thus a solution to β° in the mixed model equations is a GLS solution. An interpretation of û is given in Chapter 5. The mixed model equations are often well suited to an iterative solution. Let us illustrate the mixed model method for BLUE with

X = [ 1 1 ]     Z = [ 1 0 ]
    [ 1 2 ]         [ 1 0 ]     G = [ .1 0  ]
    [ 1 1 ]         [ 1 0 ]         [ 0  .1 ],
    [ 1 3 ],        [ 0 1 ],

R = I, and y' = (5 4 3 2). Then the mixed model equations are

[ 4 7  3  1  ]          [ 14 ]
[ 7 15 4  3  ] [ β° ]   [ 22 ]
[ 3 4  13 0  ] [ û  ] = [ 12 ]
[ 1 3  0  11 ]          [ 2  ].

The solution is (286 −50 2 −2)/57. In this case the solution is unique because X has full column rank.

Now consider a GLS solution.

V = ZGZ' + R = [ 1.1 .1  .1  0   ]
               [ .1  1.1 .1  0   ]
               [ .1  .1  1.1 0   ]
               [ 0   0   0   1.1 ].

V⁻¹ = (1/143) [ 132 −11 −11 0   ]
              [ −11 132 −11 0   ]
              [ −11 −11 132 0   ]
              [ 0   0   0   130 ].

Then X'V⁻¹Xβ° = X'V⁻¹y becomes

(1/143) [ 460 830  ] β° = (1/143) [ 1580 ]
        [ 830 1852 ]              [ 2540 ].

The solution is (286 −50)/57, as in the mixed model equations.

2 Variance of BLUE

Once having an estimate of k'β we should like to know its sampling variance. Consider a set of estimators, K'β°.

Var(K'β°) = Var[K'(X'V⁻¹X)⁻X'V⁻¹y]
          = K'(X'V⁻¹X)⁻X'V⁻¹VV⁻¹X(X'V⁻¹X)⁻K
          = K'(X'V⁻¹X)⁻K,

provided K'β is estimable. The variance is invariant to the choice of a g-inverse provided K'β is estimable. We can also obtain this result from a g-inverse of the coefficient matrix of the mixed model equations. Let a g-inverse of this matrix be

[ C11 C12 ]
[ C21 C22 ].

Then Var(K'β°) = K'C11K. This result can be proved by noting that

C11 = (X'[R⁻¹ − R⁻¹Z(Z'R⁻¹Z + G⁻¹)⁻¹Z'R⁻¹]X)⁻ = (X'V⁻¹X)⁻.

Using the mixed model example, let

K' = [ 1 0 ]
     [ 0 1 ].

A g-inverse (regular inverse) of the coefficient matrix is

(1/570) [ 926  −415 −86 29  ]
        [ −415 230  25  −25 ]
        [ −86  25   56  1   ]
        [ 29   −25  1   56  ].

Then

Var(K'β°) = (1/570) [ 926  −415 ]
                    [ −415 230  ].

The same result can be obtained from the inverse of the GLS coefficient matrix because

[ (1/143) [ 460 830; 830 1852 ] ]⁻¹ = (1/570) [ 926 −415; −415 230 ].

3 Generalized Inverses and Mixed Model Equations

Earlier in this chapter we found that BLUE of K'β, estimable, is K'β°, where β° is any solution to either the GLS or the mixed model equations. Also the sampling variance requires a g-inverse of the coefficient matrix of either of these sets of equations. We define (X'V⁻¹X)⁻ as a g-inverse of X'V⁻¹X. There are various types of generalized inverses, but the one we shall use is defined as follows. A⁻ is a g-inverse of A provided that

AA⁻A = A.

Then if we have a set of consistent equations, Ap = z, a solution to p is A⁻z. We shall be concerned, in this chapter, only with g-inverses of the singular, symmetric matrices characteristic of GLS and mixed model equations.

3.1 First type of g-inverse

Let W be a symmetric matrix with order s and rank t < s. Partition W, with possible re-ordering of rows (and the same re-ordering of columns), as

W = [ W11  W12 ]
    [ W'12 W22 ],

where W11 is a non-singular matrix with order t. Then

W⁻ = [ W11⁻¹ 0 ]
     [ 0     0 ].

It is of interest that for this type of W⁻ it is true that W⁻WW⁻ = W⁻ as well as WW⁻W = W. This is called a reflexive g-inverse. To illustrate, suppose W is a GLS coefficient matrix,

W = [ 4  7  8  15 ]
    [ 7  15 17 32 ]
    [ 8  17 22 39 ]
    [ 15 32 39 71 ].

This matrix has rank 3, and the upper 3 × 3 is non-singular with inverse

(1/30) [ 41  −18 −1  ]
       [ −18 24  −12 ]
       [ −1  −12 11  ].

Therefore a g-inverse is

(1/30) [ 41  −18 −1  0 ]
       [ −18 24  −12 0 ]
       [ −1  −12 11  0 ]
       [ 0   0   0   0 ].

Another g-inverse of this type is

(1/30) [ 41  −17 0 −1  ]
       [ −17 59  0 −23 ]
       [ 0   0   0 0   ]
       [ −1  −23 0 11  ].

This was obtained by inverting the full rank submatrix composed of rows (and columns) 1, 2, 4 of W. This type of g-inverse is described in Searle (1971b).

In the mixed model equations a comparable g-inverse is obtained as follows. Partition X'R⁻¹X, with possible re-ordering of rows (and columns), as

[ X'1R⁻¹X1 X'1R⁻¹X2 ]
[ X'2R⁻¹X1 X'2R⁻¹X2 ]

so that X'1R⁻¹X1 has order r and is full rank. Compute

[ X'1R⁻¹X1 X'1R⁻¹Z      ]⁻¹   [ C00  C02 ]
[ Z'R⁻¹X1  Z'R⁻¹Z + G⁻¹ ]   = [ C'02 C22 ].

Then a g-inverse of the coefficient matrix is

[ C00  0 C02 ]
[ 0    0 0   ]
[ C'02 0 C22 ].

We illustrate with a mixed model coefficient matrix as follows:

[ 5 8  8  3 2 ]
[ 8 16 16 4 4 ]
[ 8 16 16 4 4 ]
[ 3 4  4  8 0 ]
[ 2 4  4  0 7 ],

where X has 3 columns and Z has 2. Therefore X'R⁻¹X is the upper 3 × 3 submatrix. It has rank 2 because the 3rd column is identical to the second. Consequently we find a g-inverse by inverting the matrix with the 3rd row and column deleted. This gives

(1/560) [ 656  −300 0 −96 −16 ]
        [ −300 185  0 20  −20 ]
        [ 0    0    0 0   0   ]
        [ −96  20   0 96  16  ]
        [ −16  −20  0 16  96  ].
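The defining property AA⁻A = A and the reflexive property of this first-type g-inverse can be checked numerically for the worked W above. A minimal sketch, assuming NumPy:

```python
import numpy as np

# First-type g-inverse: invert the full-rank leading submatrix of W and pad
# with zeros. W is the 4x4 GLS coefficient matrix from the example; it has
# rank 3 because row 4 = row 2 + row 3.
W = np.array([[ 4.,  7.,  8., 15.],
              [ 7., 15., 17., 32.],
              [ 8., 17., 22., 39.],
              [15., 32., 39., 71.]])
Wm = np.zeros_like(W)
Wm[:3, :3] = np.linalg.inv(W[:3, :3])   # W11 inverse, order t = rank = 3
assert np.allclose(W @ Wm @ W, W)       # g-inverse property: W W- W = W
assert np.allclose(Wm @ W @ Wm, Wm)     # reflexive property: W- W W- = W-
# The upper 3x3 inverse agrees with the (1/30)[...] matrix in the text:
assert np.allclose(30 * np.linalg.inv(W[:3, :3]),
                   [[41, -18, -1], [-18, 24, -12], [-1, -12, 11]])
```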

,whereo1hasrelements. Onlytherstprowsof themixedmodel equationscontributetolackof rankof themixedmodel matrix. Thematrixhasorderp + qandrankr + q, wherer=rankof X, p=columnsinX,andq=columnsinZ.3.2 Secondtypeofg-inverseAsecondtypeof g-inverseisonewhichimposesrestrictionsonthesolutiontoo. LetM

be a set of pr linearly independent, non-estimable functions of . Then a g-inversefortheGLSmatrixisobtainedasfollows

X

V1X MM

O

1=

C11C12C

12C22

.C11 is a reexive g-inverse of X

V1X. This type of solution is described in Kempthorne(1952). LetusillustrateGLSequationsasfollows.11 5 6 3 85 5 0 2 36 0 6 1 53 2 1 3 08 3 5 0 8o=127584.Thismatrixhasorder5butrankonly3. Twoindependentnon-estimablefunctionsareneeded. Amongothersthefollowingqualify

0 1 1 0 00 0 0 1 1

.Thereforeweinvert11 5 6 3 8 0 05 5 0 2 3 1 06 0 6 1 5 1 03 2 1 3 0 0 18 3 5 0 8 0 10 1 1 0 0 0 00 0 0 1 1 0 0,whichis244128 1 1 13 13 122 1221 24 24 7 7 122 01 24 24 7 7 122 013 7 7 30 30 0 12213 7 7 30 30 0 122122 122 122 0 0 0 0122 0 0 122 122 0 0.7Theupper5x5submatrixisag-inverse. Thisgivesasolutiono=(386 8 8 262 262)

/244.Acorrespondingg-inverseforthemixedmodelisasfollowsX

R1X X

R1Z MZ

R1X Z

R1Z +G10M

0 01=C11C12C13C

12C22C23C

13C23C33.Then

C11C12C

12C22

is a g-inverse of the mixed model coecient matrix. The property of ocoming from thistypeofg-inverseisM

o= 0.3.3 Thirdtypeofg-inverseAthirdtypeof g-inverseuses Mof theprevious sectionas follows. (XV1X +MM

)1=C.ThenCisag-inverseofX

V1X.InthiscaseC(X

V1X)C =C.ThisisdescribedinRaoandMitra(1971).WeillustratewiththesameGLSmatrixasbeforeandM

=

0 1 1 0 00 0 0 1 1

asbefore.(X

V1X+MM

) =11 5 6 3 85 6 1 2 36 1 7 1 53 2 1 4 18 3 5 1 9withinverse2441150 62 60 48 7462 85 37 7 760 37 85 7 748 7 7 91 3174 7 7 31 91,whichisag-inverseoftheGLSmatrix. Theresultingsolutiontooisthesameastheprevioussection.8The corresponding methodfor nding a g-inverse of the mixedmodel matrixis

X

R1X+MM

X

R1ZZ

R1X Z

R1Z +G1

1= C. ThenCisag-inverse. ThepropertyofthesolutiontooisM

o= 0.4 ReparameterizationAn entirely dierent method for dealing with the not full rank X problem is reparameter-ization. LetK

beasetofrlinearlyindependent,estimablefunctionsof. Let beBLUE of K

. To nd solve (KK)1K

X

V1XK(K

K)1 =(K

K)1K

X

V1y. has a unique solution, and the regular inverse of the coecient matrix is V ar( ). ThiscorrespondstoamodelE(y) =XK(K

K)1.ThismethodwassuggestedtomebyGianola(1980).Fromtheimmediatelyprecedingexampleweneed3estimablefunctions. Aninde-pendentsetis1 1/2 1/2 1/2 1/20 1 1 0 00 0 0 1 1.ThecorrespondingGLSequationsare11 .50 2.50.5 2.75 .752.5 .75 2.75 =1212.Thesolutionis

=(193 8 262)/122.Thisisidenticalto1 1/2 1/2 1/2 1/20 1 1 0 00 0 0 1 1ofromtheprevioussolutioninwhich

0 1 1 0 00 0 0 1 1

owasforcedtoequal0.9Thecorrespondingsetofequationsformixedmodelsis

(K

K)1K

X

R1XK(K

K)1(K

K)1K

X

R1ZZ

R1XK(K

K)1Z

R1Z +G1

u

=

(K

K)1K

X

R1yZ

R1y

.5 PrecautionsinSolvingEquationsPrecautions must beobservedinthesolutiontoequations, especiallyif thereis somedoubt about therankof thematrix. If asupposedg-inverseis calculated, it maybeadvisable to check that AAA = A. Another check is to regenerate the right hand sidesasfollows. LettheequationsbeC = r.Havingcomputed ,computeC andcheckthatitisequal,exceptforroundingerror,tor.10Chapter4TestofHypothesesC.R.Henderson1984-GuelphMuch of the statistical literature for many years dealt primarily with tests of hypothe-ses ( or tests of signicance). More recently increased emphasis has been placed, properlyI think, on estimation and prediction. Nevertheless, many research workers and certainlymost editors of scientic journals insist on tests of signicance. Most tests involving linearmodelscanbestatedasfollows. Wewishtotestthenullhypothesis,H

0 = c0,against some alternative hypothesis, most commonly the alternative that can have anyvalueintheparameterspace. Anotherpossibilityisthegeneralalternativehypothesis,H

a = ca.Inbothof thesehypothesestheremaybeelementsof thatarenotdeterminedbyH. Theseelementsareassumedtohaveanyvaluesintheparameterspace. H

0andH

aareassumedtohavefullrowrankwithmandarowsrespectively. Alsor m > a.Undertheunrestrictedhypothesisa = 0.TwoimportantrestrictionsarerequiredlogicallyforH0andHa. First, bothH

0and H

amust be estimable. It hardly seems logical that we could test hypotheses aboutfunctionsof unlesswecanestimatethesefunctions. Second,thenullhypothesismustbecontainedinthealternativehypothesis. Thatis, if thenull istrue, thealternativemust be true. For this to be so we require that H

acan be written as MH

0and caas Mc0forsomeM.1 EquivalentHypothesesItshouldberecognizedthatthereareaninnityof hypothesesthatareequivalenttoH

0= c. LetPbeanm m,non-singularmatrix. ThenPH

0= Pcisequivalentto1H

0 = c. Forexample,consideraxedmodelyij= + ti + eij, i =1, 2, 3.Anullhypothesisoftentestedis

1 0 10 1 1

t = 0.Anequivalenthypothesisis

2/3 1/3 1/31/3 2/3 1/3

t = 0.Toconvertthersttothesecondpre-multiply

1 0 10 1 1

by

2/3 1/31/3 2/3

.AsanexampleofuseofH

aconsideratypeofanalysissometimesrecommendedforatwowayxedmodelwithoutinteraction. Letthemodelbeyijk= + ai + bj+ eijk,wherei = 1, 2, 3andj = 1, 2, 3, 4. Thelinesof theANOVAtablecouldbeasfollows.SumofSquaresRowsignoringcolumns(columndierencesregardedasnon-existent),Columnswithrowsaccountedfor,Residual.Thesumof these3sumsof squaresisequal to(y

ycorrectionfactor). Therstsumofsquaresisrepresentedastestingthenullhypothesis:0 1 0 1 0 0 0 00 0 1 1 0 0 0 00 0 0 0 1 0 0 10 0 0 0 0 1 0 10 0 0 0 0 0 1 1 = 0.andthealternativehypothesis:0 0 0 0 1 0 0 10 0 0 0 0 1 0 10 0 0 0 0 0 1 1 = 0.Thesecondsumofsquaresrepresentstestingthenullhypothesis:0 0 0 0 1 0 0 10 0 0 0 0 1 0 10 0 0 0 0 0 1 1 = 0.andthealternativehypothesis: entireparameterspace.22 TestCriteria2.1 DierencesbetweenresidualsNowitisassumedforpurposesof testinghypothesesthatyhasamultivariatenormaldistribution. Then it can be proved by the likelihood ratio method of testing hypotheses,NeymanandPearson(1933), that under thenull hypothesis thefollowingquantityisdistributedas2.(y X0)

V1(y X0) (y Xa)

V1(y Xa). (1)0isasolutiontoGLSequationssubjecttotherestrictionH

00=c0. 0canbefoundbysolving

X

V1X H0H

00

00

=

X

V1yc0

orbysolvingthecomparablemixedmodelequationsX

R1X X

R1Z H0Z

R1X Z

R1Z +G10H

00 00u00=X

R1yZ

R1yc0.aisasolutiontoGLSormixedmodel equationswithrestrictions, H

aa= caratherthanH

00=c0.Incase the alternative hypothesis is unrestricted( canhave anyvalues), thatis, aisasolutiontotheunrestrictedGLSormixedmodel equations. Underthenullhypothesis(1)isdistributedas2with(ma)degreesoffreedom,mbeingthenumberof rows (independent) in H

0, and a being the number of rows (independent) in H

a. If thealternative hypothesis is unrestricted, a = 0. Having computed (1) this value is comparedwithvaluesof2maforthechosenlevelofsignicance.Letusillustratewithamodely = + ti + eij, tixed, i = 1, 2, 3R = V ar(e) = 5I.Suppose that the number of observations onthe levels of tiare 4, 3, 2, andthetreatmenttotalsare25,15,9withindividualobservations,(6,7,8,4,4,5,6,5,4). Wewishtotestthatthelevelsoftiareequal,whichcanbeexpressedas

0 1 0 10 0 1 1

( t1t2t3)

=(0 0)

.3Weuseasthealternativehypothesistheunrestrictedhypothesis. TheGLSequationsundertherestrictionare.29 4 3 2 0 04 4 0 0 1 03 0 3 0 0 12 0 0 2 1 10 1 0 1 0 00 0 1 1 0 0

00

= .2492515900.Asolutionis

o=(49 0 0 0)/9,

o=(29 12)/9.TheGLSequationswithnorestrictionsare.29 4 3 24 4 0 03 0 3 02 0 0 2

a

= .24925159.Asolutionisa=(0 25 20 18)/4.(y Xo)

= (5 14 23 13 13 4 5 4 13)/9.(y Xo)

V1(y Xo) = 146/45.(y Xa)

= [1, 3, 7, 9, 4, 0, 4, 2, 2]/4.(y Xa)

V1(y Xa) = 9/4.Thedierenceis1464594=179180.2.2 DierencesbetweenreductionsTwoeasiermethodsofcomputationthatleadtothesameresultwillnowbepresented.Therst,describedinSearle(1971b),is

aX

V1y +

aca

oX

V1y

oco. (2)The rst 2 terms are called reduction in sums of squares under the alternative hypothesis.Thelasttwotermsarethenegativeof thereductioninsumof squaresunderthenullhypothesis. Inourexample

aX

V1y +

aca= 1087/20.

oX

V1y +

oco= 2401/45.108720240145=179180asbefore.4Ifthemixedmodelequationsareused,(2)canbecomputedas

aX

R1y +u

aZ

R1y +

aca

oX

R1y u

oZ

R1y

oco. (3)2.3 MethodbasedonvariancesoflinearfunctionsAsecondeasiermethodis(H

ooco)

[H

o(X

V1X)Ho]1(H

ooco)(H

aoca)

[H

a(X

V1X)Ha]1(H

aoca). (4)If H

aisunrestrictedthesecondtermof (4)issetto0. Rememberthatoisasolu-tionintheunrestrictedGLSequations. Inplaceof (X

V1X)onecansubstitutethecorrespondingsubmatrixofag-inverseofthemixedmodelcoecientmatrix.Thisisaconvenientpointtoprovethatanequivalenthypothesis, P(H

c)=0gives thesameresult as H

c, rememberingthat Pis non-singular. Thequantitycorrespondingto(4)forP(H

c)is(H

oc)

P

[PH

(X

V1X)HP

]1P(H

c)= (H

oc)

P

(P

)1[H

(X

V1X)H]1P1P(H

oc)= (H

oc)

[H

(X

V1X)H]1(H

oc),whichprovestheequalityofthetwoequivalenthypotheses.Letusillustrate(3)withourexample

    H_0'β° = | 0  1  0  -1 | (0, 25, 20, 18)'/4 = | 7 |/4.
             | 0  0  1  -1 |                      | 2 |

A g-inverse of X'V^{-1}X is

    | 0   0   0   0 |
    | 0  15   0   0 |
    | 0   0  20   0 | /12.
    | 0   0   0  30 |

    H_0'(X'V^{-1}X)^- H_0 = | 45  30 |/12.
                            | 30  50 |

The inverse of this is

    |  20  -12 |/45.
    | -12   18 |

Then

    (1/4)(7  2) |  20  -12 | (1/45) | 7 | (1/4) = 179/180
                | -12   18 |        | 2 |

as before. The d.f. for the χ² are 2 because H_0' has 2 rows and the alternative hypothesis is unrestricted.

2.4 Comparison of reductions under reduced models

Another commonly used method is to compare reductions in sums of squares resulting from deletions of different subvectors of β from the reduction. The difficulty with this method is the determination of what hypothesis is tested by the difference between a pair of reductions. It is not true in general, as sometimes thought, that Red(β) - Red(β₁) tests the hypothesis that β₂ = 0, where

β' = (β₁'  β₂'). In most designs, β₂ is not estimable. We need to determine what H'β = 0 imposed on a solution will give the same reduction in sum of squares as does Red(β₁). In the latter case we solve

    (X₁'V^{-1}X₁) β₁° = X₁'V^{-1}y

and then

    Reduction = (β₁°)'X₁'V^{-1}y.     (5)

Consider a hypothesis, H'β₂ = 0. We could solve

    | X₁'V^{-1}X₁  X₁'V^{-1}X₂  0 |  | β₁° |     | X₁'V^{-1}y |
    | X₂'V^{-1}X₁  X₂'V^{-1}X₂  H |  | β₂° |  =  | X₂'V^{-1}y |     (6)
    | 0            H'           0 |  | θ   |     | 0          |

Then

    Reduction = (β₁°)'X₁'V^{-1}y + (β₂°)'X₂'V^{-1}y.     (7)

Clearly (7) is equal to (5) if a solution to (6) is β₂° = 0, for then β₁° = (X₁'V^{-1}X₁)^- X₁'V^{-1}y. Consequently, in order to determine what hypothesis is implied when β₂ is deleted from the model, we need to find some H'β₂ = 0 such that a solution to (6) is β₂° = 0.

We illustrate with a two way fixed model with interaction. The numbers of observations per subclass are

    | 3  2  1 |
    | 1  2  5 |

The subclass totals are

    | 6  2  2 |
    | 3  5  9 |

An analysis sometimes suggested is

    Red(μ, r, c) - Red(μ, c) to test rows.
    Red(full model) - Red(μ, r, c) to test interaction.

The least squares equations (upper triangle of the symmetric coefficient matrix shown) are

    | 14  6  8  4  4  6  3  2  1  1  2  5 |        | 27 |
    |     6  0  3  2  1  3  2  1  0  0  0 |        | 10 |
    |        8  1  2  5  0  0  0  1  2  5 |        | 17 |
    |           4  0  0  3  0  0  1  0  0 |        |  9 |
    |              4  0  0  2  0  0  2  0 |        |  7 |
    |                 6  0  0  1  0  0  5 |  β° =  | 11 |
    |                    3  0  0  0  0  0 |        |  6 |
    |                       2  0  0  0  0 |        |  2 |
    |                          1  0  0  0 |        |  2 |
    |                             1  0  0 |        |  3 |
    |                                2  0 |        |  5 |
    |                                   5 |        |  9 |

A solution to these equations is

    [0, 0, 0, 0, 0, 0, 2, 1, 2, 3, 2.5, 1.8],

which gives a reduction of 55.7, the full model reduction. A solution when interaction terms are deleted is

    [1.9677, .8065, 0, .8871, .1855, 0]

giving a reduction of 54.3468. This corresponds to an hypothesis,

    | 1  0  -1  -1   0  1 | rc = 0.
    | 0  1  -1   0  -1  1 |

When this is included as a Lagrange multiplier as in (6), a solution is

    [1.9677, .8065, 0, .8871, .1855, 0, 0, 0, 0, 0, 0, 0, .1452, .6935].

Note that (rc)° = 0, proving that dropping rc corresponds to the hypothesis stated above. The reduction again is 54.3468.

When r and rc are dropped from the equations, a solution is

    [0, 2.25, 1.75, 1.8333]

giving a reduction of 52.6667. This corresponds to an hypothesis

    | 3  -3  1  1   1  -1  -1  -1 |  | r  |
    | 0   0  1  0  -1  -1   0   1 |  |    |  =  0.
    | 0   0  0  1  -1   0  -1   1 |  | rc |

When this is added as a Lagrange multiplier, a solution is

    [2.25, 0, 0, 0, -.5, -.4167, 0, 0, 0, 0, 0, 0, .6944, .05556, .8056].

Note that r° and rc° are null, verifying the hypothesis. The reduction again is 52.6667. Then the tests are as follows:

    Rows assuming rc non-existent = 54.3468 - 52.6667.
    Interaction = 55.7 - 54.3468.

Chapter 5
Prediction of Random Variables
C. R. Henderson
1984 - Guelph

We have discussed estimation of β, regarded as fixed. Now we shall consider a rather different problem, prediction of random variables, and especially prediction of u. We can also formulate this problem as estimation of the realized values of random variables. These realized values are fixed, but they are the realization of values from some known population. This knowledge enables better estimates (smaller mean squared errors) to be obtained than if we ignore this information and estimate u by GLS. In genetics the predictors of u are used as selection criteria. Some basic results concerning selection are now presented.

Which is the more logical concept, prediction of a random variable or estimation of the realized value of a random variable? If we have an animal already born, it seems reasonable to describe the evaluation of its breeding value as an estimation problem. On the other hand, if we are interested in evaluating the potential breeding value of a mating between two potential parents, this would be a problem in prediction. If we are interested in future records, the problem is clearly one of prediction.

1 Best Prediction

Let ŵ = f(y) be a predictor of the random variable w. Find f(y) such that E(ŵ - w)² is minimum. Cochran (1951) proved that

    f(y) = E(w | y).     (1)

This requires knowing the joint distribution of w and y, being able to derive the conditional mean, and knowing the values of parameters appearing in the conditional mean. All of these requirements are seldom possible in practice.

Cochran also proved in his 1951 paper the following important result concerning selection. Let p individuals, regarded as a random sample from some population, be candidates for selection. The realized values of these individuals are w₁, ..., w_p, not observable. We can observe y_i, a vector of records on each.
(w_i, y_i) are jointly distributed as f(w, y) independent of (w_j, y_j). Some function, say f(y_i), is to be used as a selection criterion, and the fraction, α, with highest f(y_i) is to be selected. What f will maximize the expectation of the mean of the associated w_i? Cochran proved that E(w | y) accomplishes this goal. This is a very important result, but note that seldom if ever do the requirements of this theorem hold in animal breeding. Two obvious deficiencies suggest themselves. First, the candidates for selection have differing amounts of information (numbers of elements in y differ). Second, candidates are related and consequently the y_i are not independent and neither are the w_i.

Properties of the best predictor:

    1. E(ŵ_i) = E(w_i).     (2)
    2. Var(ŵ_i - w_i) = Var(w | y) averaged over the distribution of y.     (3)
    3. Maximizes r_ŵw for all functions of y.     (4)

2 Best Linear Prediction

Because we seldom know the form of distribution of (y, w), consider a linear predictor that minimizes the squared prediction error. Find ŵ = a'y + b, where a is a vector and b a scalar such that E(ŵ - w)² is minimum. Note that in contrast to BP the form of distribution of (y, w) is not required. We shall see that the first and second moments are needed. Let

    E(w) = γ,  E(y) = α,  Cov(y, w) = c,  and  Var(y) = V.

Then

    E(a'y + b - w)² = a'Va - 2a'c + a'αα'a + b² + 2a'αb - 2a'αγ - 2bγ + Var(w) + γ².

Differentiating this with respect to a and b and equating to 0,

    | V + αα'  α |  | a |     | c + αγ |
    | α'       1 |  | b |  =  |   γ    |

The solution is

    a = V^{-1}c,  b = γ - α'V^{-1}c.     (5)

Thus

    ŵ = γ + c'V^{-1}(y - α).

Note that this is E(w | y) when y, w are jointly normally distributed. Note also that BLP is the selection index of genetics. Sewall Wright (1931) and J. L. Lush (1931) were using this selection criterion prior to the invention of selection index by Fairfield Smith (1936). I think they were invoking the conditional mean under normality, but they were not too clear in this regard.

Other properties of BLP: it is unbiased, that is,

    E(ŵ) = E(w).     (6)

    E(ŵ) = E[γ + c'V^{-1}(y - α)]
         = γ + c'V^{-1}(α - α)
         = γ = E(w).

    Var(ŵ) = Var(c'V^{-1}y) = c'V^{-1}VV^{-1}c = c'V^{-1}c.     (7)

    Cov(ŵ, w) = c'V^{-1}Cov(y, w) = c'V^{-1}c = Var(ŵ).     (8)

    Var(ŵ - w) = Var(w) - Var(ŵ).     (9)

In the class of linear functions of y, BLP maximizes the correlation,

    r_ŵw = a'c / [a'Va Var(w)]^{.5}.     (10)

Maximize log r:

    log r = log a'c - .5 log[a'Va] - .5 log Var(w).

Differentiating with respect to a and equating to 0,

    Va/(a'Va) = c/(a'c),  or  Va = c Var(ŵ)/Cov(ŵ, w).

The ratio on the right does not affect r. Consequently let it be one. Then a = V^{-1}c. Also the constant, b, does not affect the correlation. Consequently, BLP maximizes r.

BLP of m'w is m'ŵ, where ŵ is BLP of w. Now w is a vector with E(w) = γ and Cov(y, w') = C. Substitute the scalar, m'w, for w in the statement for BLP. Then

    BLP of m'w = m'γ + m'C'V^{-1}(y - α)
               = m'[γ + C'V^{-1}(y - α)]
               = m'ŵ     (11)

because ŵ = γ + C'V^{-1}(y - α).

In the multivariate normal case, BLP maximizes the probability of selecting the better of two candidates for selection, Henderson (1963). For fixed number selected, it maximizes the expectation of the mean of the selected u_i, Bulmer (1980).

It should be noted that when the distribution of (y, w) is multivariate normal, BLP is the mean of w given y, that is, the conditional mean, and consequently is BP with its desirable properties as a selection criterion. Unfortunately, however, we probably never know the mean of y, which is Xβ in our mixed model. We may, however, know V accurately enough to assume that our estimate is the parameter value. This leads to the derivation of best linear unbiased prediction (BLUP).
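Computing BLP is mechanical once the moments are given. A minimal sketch with invented moments (alpha, gamma, c, and V below are assumptions, not values from the text):

```python
import numpy as np

# Assumed (invented) moments: E(y) = alpha, E(w) = gamma, Cov(y, w) = c, Var(y) = V.
alpha = np.array([10.0, 12.0, 11.0])
gamma = 5.0
c = np.array([1.0, 0.5, 0.8])
V = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 0.7],
              [0.5, 0.7, 5.0]])

def blp(y):
    # w_hat = gamma + c'V^{-1}(y - alpha), equation (5)
    return gamma + c @ np.linalg.solve(V, y - alpha)

w_hat = blp(np.array([11.0, 11.5, 12.0]))
```

At y = alpha the predictor returns gamma, which is the unbiasedness property (6); Var(ŵ) = c'V^{-1}c is the quadratic form in property (7).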

3 Best Linear Unbiased Prediction

Suppose the predictand is the random variable, w, and all we know about it is that it has mean k'β, variance v, and its covariance with y' is c'. How should we predict w? One possibility is to find some linear function of y that has expectation k'β (is unbiased), and in the class of such predictors has minimum variance of prediction errors. This method is called best linear unbiased prediction (BLUP).

Let the predictor be a'y. The expectation of a'y is a'Xβ, and we want to choose a so that the expectation of a'y is k'β. In order for this to be true for any value of β, it is seen that a must be chosen so that

    a'X = k'.     (12)

Now the variance of the prediction error is

    Var(a'y - w) = a'Va - 2a'c + v.     (13)

Consequently, we minimize (13) subject to the condition of (12). The equations to be solved to accomplish this are

    | V   X |  | a |     | c |
    | X'  0 |  | θ |  =  | k |.     (14)

Note the similarity to (1) in Chapter 3, the equations for finding BLUE of k'β. Solving for a in the first equation of (14),

    a = -V^{-1}Xθ + V^{-1}c.     (15)

Substituting this value of a in the second equation of (14),

    -X'V^{-1}Xθ = k - X'V^{-1}c.

Then, if the equations are consistent, and this will be true if and only if k'β is estimable, a solution to θ is

    θ = -(X'V^{-1}X)^- k + (X'V^{-1}X)^- X'V^{-1}c.

Substituting the solution to θ in (15) we find

    a = V^{-1}X(X'V^{-1}X)^- k - V^{-1}X(X'V^{-1}X)^- X'V^{-1}c + V^{-1}c.     (16)

Then the predictor is

    a'y = k'(X'V^{-1}X)^- X'V^{-1}y + c'V^{-1}[y - X(X'V^{-1}X)^- X'V^{-1}y].     (17)

But because (X'V^{-1}X)^- X'V^{-1}y = β°, a solution to the GLS equations, the predictor can be written as

    k'β° + c'V^{-1}(y - Xβ°).     (18)

This result was described by Henderson (1963) and a similar result by Goldberger (1962). Note that if k' = 0 and if β is known, the predictor would be c'V^{-1}(y - Xβ). This is the usual selection index method for predicting w. Thus BLUP is BLP with β° substituted for β.
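Equation (18) is easy to apply directly. The sketch below uses invented X, V, c, k, and y, and checks the translation-invariance property discussed in the next section: adding Xκ to y changes the predictor by exactly k'κ, so the predicted random part does not depend on β:

```python
import numpy as np

# Invented example for BLUP of w = k'beta + (random part with Cov(y, w) = c).
X = np.array([[1.0, 1], [1, 2], [1, 3], [1, 4], [1, 6]])
V = np.diag([2.0, 3, 2, 4, 3]) + 0.5       # an assumed positive definite Var(y)
c = np.array([0.3, 0.1, 0.4, 0.2, 0.1])    # assumed Cov(y, w)
k = np.array([1.0, 2.0])
Vinv = np.linalg.inv(V)

def blup(y):
    beta0 = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)  # GLS solution
    return k @ beta0 + c @ Vinv @ (y - X @ beta0)            # equation (18)

y = np.array([3.0, 4, 6, 5, 9])
kappa = np.array([5.0, -2.0])
shift = blup(y + X @ kappa) - blup(y)      # equals k'kappa exactly
```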

4 Alternative Derivations Of BLUP

4.1 Translation invariance

We want to predict m'w in the situation with unknown β. But BLP, the minimum MSE predictor in the class of linear functions of y, involves β. Is there a comparable predictor that is invariant to β? Let the predictor be a'y + b, invariant to the value of β. For translation invariance we require

    a'y + b = a'(y + Xk) + b

for any value of k. This will be true if and only if a'X = 0. We minimize

    E(a'y + b - m'w)² = a'Va - 2a'Cm + b² + m'Gm

when a'X = 0 and where G = Var(w). Clearly b must equal 0 because b² is positive. Minimization of a'Va - 2a'Cm subject to a'X = 0 leads immediately to the predictor

    m'C'V^{-1}(y - Xβ°),

the BLUP predictor. Under normality BLUP has, in the class of invariant predictors, the same properties as those stated for BLP.

4.2 Selection index using functions of y with zero means

An interesting way to compute BLUP of w is the following. Compute β̃ = L'y such that E(Xβ̃) = Xβ. Then compute

    y* = y - Xβ̃ = (I - XL')y ≡ T'y.

Now

    Var(y*) = T'VT ≡ V*,     (19)

and

    Cov(y*, w') = T'C ≡ C*,     (20)

where C = Cov(y, w'). Then the selection index is

    ŵ = C*'(V*)^- y*.     (21)

    Var(ŵ) = Cov(ŵ, w') = C*'(V*)^- C*.     (22)

    Var(ŵ - w) = Var(w) - Var(ŵ).     (23)

Now ŵ is invariant to the choice of T and to the g-inverse of V* that is computed. V* has rank n - r. One choice of β̃ is OLS, that is, β̃ = (X'X)^- X'y. In that case T' = I - X(X'X)^- X'. β̃ could also be computed by OLS from an appropriate subset of y, with no fewer than r elements of y. Under normality,

    ŵ = E(w | y*),  and     (24)
    Var(ŵ - w) = Var(w | y*).     (25)
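The zero-means construction can be checked against the direct BLUP formula (18). All matrices below are invented; T is built from OLS, and a Moore-Penrose inverse serves as the g-inverse of the singular V*:

```python
import numpy as np

# Invented mixed-model setup; predict the random effects u themselves.
X = np.array([[1.0, 1], [1, 2], [1, 3], [1, 5], [1, 4]])
Z = np.array([[1.0, 0], [1, 0], [0, 1], [0, 1], [1, 0]])
G = np.array([[2.0, 0.5], [0.5, 1.5]])
R = np.eye(5) * 3.0
V = Z @ G @ Z.T + R
C = Z @ G                                   # Cov(y, u')
y = np.array([4.0, 5, 7, 8, 6])
Vinv = np.linalg.inv(V)

# Direct BLUP (18) with k' = 0:
beta0 = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_blup = C.T @ Vinv @ (y - X @ beta0)

# Zero-means route (19)-(21): OLS residuals have zero expectation.
T = np.eye(5) - X @ np.linalg.pinv(X.T @ X) @ X.T    # T' = T for this choice
y_star = T @ y
u_star = (T @ C).T @ np.linalg.pinv(T @ V @ T) @ y_star
```

The agreement of `u_star` and `u_blup` illustrates the invariance claim: the index built from error contrasts reproduces BLUP.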

5 Variance Of Prediction Errors

We now state some useful variances and covariances. Let a vector of predictands be w. Let the variance-covariance matrix of the vector be G and its covariance with y be C'. Then the predictor of w is

    ŵ = K'β° + C'V^{-1}(y - Xβ°).     (26)

    Cov(ŵ, w') = K'(X'V^{-1}X)^- X'V^{-1}C + C'V^{-1}C
                 - C'V^{-1}X(X'V^{-1}X)^- X'V^{-1}C.     (27)

    Var(ŵ) = K'(X'V^{-1}X)^- K + C'V^{-1}C
             - C'V^{-1}X(X'V^{-1}X)^- X'V^{-1}C.     (28)

    Var(ŵ - w) = Var(w) - Cov(ŵ, w') - Cov(w, ŵ') + Var(ŵ)
               = K'(X'V^{-1}X)^- K - K'(X'V^{-1}X)^- X'V^{-1}C
                 - C'V^{-1}X(X'V^{-1}X)^- K + G - C'V^{-1}C
                 + C'V^{-1}X(X'V^{-1}X)^- X'V^{-1}C.     (29)

6 Mixed Model Methods

The mixed model equations, (4) of Chapter 3, often provide an easy method to compute BLUP. Suppose the predictand, w, can be written as

    w = K'β + u,     (30)

where u are the variables of the mixed model. Then it can be proved that

    BLUP of w = BLUP of (K'β + u) = K'β° + û,     (31)

where β° and û are solutions to the mixed model equations. From the second equation of the mixed model equations,

    û = (Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1}(y - Xβ°).

But it can be proved that

    (Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1} = C'V^{-1},

where C = ZG and V = ZGZ' + R. Also β° is a GLS solution. Consequently,

    K'β° + C'V^{-1}(y - Xβ°) = K'β° + û.

From (24) it can be seen that

    BLUP of u = û.     (32)

Proof that (Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1} = C'V^{-1} follows:

    C'V^{-1} = GZ'V^{-1}
             = GZ'[R^{-1} - R^{-1}Z(Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1}]
             = G[Z'R^{-1} - Z'R^{-1}Z(Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1}]
             = G[Z'R^{-1} - (Z'R^{-1}Z + G^{-1})(Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1}
                   + G^{-1}(Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1}]
             = G[Z'R^{-1} - Z'R^{-1} + G^{-1}(Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1}]
             = (Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1}.

This result was presented by Henderson (1963). The mixed model method of estimation and prediction can be formulated as Bayesian estimation, Dempfle (1977). This is discussed in Chapter 9.
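The identity proved above, and the agreement of the mixed model equations with GLS and BLUP, can be verified numerically. The values below are invented:

```python
import numpy as np

# Invented small example: 6 records, 2 fixed effects, 3 random levels.
X = np.array([[1.0, 1], [1, 2], [1, 3], [1, 1], [1, 2], [1, 3]])
Z = np.kron(np.eye(3), np.ones((2, 1)))      # two records per random level
G = np.diag([2.0, 1.5, 1.0])
R = np.eye(6) * 4.0
V = Z @ G @ Z.T + R
y = np.array([8.0, 10, 12, 9, 11, 13])

Rinv, Ginv, Vinv = np.linalg.inv(R), np.linalg.inv(G), np.linalg.inv(V)

# The identity (Z'R^{-1}Z + G^{-1})^{-1} Z'R^{-1} = GZ'V^{-1} = C'V^{-1}:
lhs = np.linalg.inv(Z.T @ Rinv @ Z + Ginv) @ Z.T @ Rinv
rhs = G @ Z.T @ Vinv

# Mixed model equations: the solutions are GLS beta0 and BLUP u_hat.
MME = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
                [Z.T @ Rinv @ X, Z.T @ Rinv @ Z + Ginv]])
sol = np.linalg.solve(MME, np.concatenate([X.T @ Rinv @ y, Z.T @ Rinv @ y]))
beta0 = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_hat = G @ Z.T @ Vinv @ (y - X @ beta0)
```

The practical point of the identity is that the left-hand side needs only R^{-1} and G^{-1} (often diagonal or sparse), never the n x n inverse of V.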

7 Variances From Mixed Model Equations

A g-inverse of the coefficient matrix of the mixed model equations can be used to find needed variances and covariances. Let a g-inverse of the matrix of the mixed model equations be

    | C11   C12 |
    | C12'  C22 |     (33)

Then

    Var(K'β°) = K'C11 K.     (34)
    Cov(K'β°, û') = 0.     (35)
    Cov(K'β°, u') = -K'C12.     (36)
    Cov(K'β°, (û - u)') = K'C12.     (37)
    Var(û) = G - C22.     (38)
    Cov(û, u') = G - C22.     (39)
    Var(û - u) = C22.     (40)
    Var(ŵ - w) = K'C11 K + K'C12 + C12'K + C22.     (41)

These results were derived by Henderson (1975a).

8 Prediction Of Errors

The prediction of errors (estimation of the realized values) is simple. First, consider the model y = Xβ + ε and the prediction of the entire error vector, ε. From (18),

    ε̂ = C'V^{-1}(y - Xβ°),

but since C' = Cov(ε, y') = V, the predictor is simply

    ε̂ = VV^{-1}(y - Xβ°) = y - Xβ°.     (42)

To predict ε_{n+1}, not in the model for y, we need to know its covariance with y. Suppose this is c'. Then

    ε̂_{n+1} = c'V^{-1}(y - Xβ°) = c'V^{-1}ε̂.     (43)

Next consider prediction of e from the mixed model. Now Cov(e, y') = R. Then

    ê = RV^{-1}(y - Xβ°)
      = R[R^{-1} - R^{-1}Z(Z'R^{-1}Z + G^{-1})^{-1}Z'R^{-1}](y - Xβ°),

from the result on V^{-1},

      = [I - Z(Z'R^{-1}Z + G^{-1})^{-1}Z'R^{-1}](y - Xβ°)
      = y - Xβ° - Z(Z'R^{-1}Z + G^{-1})^{-1}Z'R^{-1}(y - Xβ°)
      = y - Xβ° - Zû.     (44)

To predict e_{n+1}, not in the model for y, we need the covariance between it and e, say c'. Then the predictor is

    ê_{n+1} = c'R^{-1}ê.     (45)

We now define e' = [e_p'  e_m'], where e_p refers to errors attached to y and e_m to future errors. Let

    Var | e_p |  =  | R_pp   R_pm |
        | e_m |     | R_pm'  R_mm |     (46)

Then

    ê_p = y - Xβ° - Zû,  and  ê_m = R_pm'R_pp^{-1}ê_p.

Some prediction error variances and covariances follow.

    Var(ê_p - e_p) = WCW',

where W = [X  Z] and C is the g-inverse of the mixed model coefficient matrix, partitioned as

    C = | C1 |
        | C2 |

where C1, C2 have p, q rows respectively. Additionally,

    Cov[(ê_p - e_p), (β° - β)'K] = -WC1'K,
    Cov[(ê_p - e_p), (û - u)'] = -WC2',
    Cov[(ê_p - e_p), (ê_m - e_m)'] = WCW'R_pp^{-1}R_pm,
    Var(ê_m - e_m) = R_mm - R_pm'R_pp^{-1}(R_pp - WCW')R_pp^{-1}R_pm,
    Cov[(ê_m - e_m), (β° - β)'K] = -R_pm'R_pp^{-1}WC1'K,  and
    Cov[(ê_m - e_m), (û - u)'] = -R_pm'R_pp^{-1}WC2'.
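Equation (44) gives ê without forming V^{-1}; the sketch below (invented values) checks it against the RV^{-1}(y - Xβ°) route:

```python
import numpy as np

# Invented values; two equivalent routes to e_hat.
X = np.array([[1.0, 1], [1, 2], [1, 3], [1, 1], [1, 2]])
Z = np.array([[1.0, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
G = np.array([[1.5, 0.3], [0.3, 2.0]])
R = np.eye(5) * 2.5
V = Z @ G @ Z.T + R
y = np.array([5.0, 6, 9, 8, 7])
Vinv = np.linalg.inv(V)

beta0 = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_hat = G @ Z.T @ Vinv @ (y - X @ beta0)
e_direct = R @ Vinv @ (y - X @ beta0)      # since Cov(e, y') = R
e_mme = y - X @ beta0 - Z @ u_hat          # equation (44)
```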

9 Prediction Of Missing u

Three simple methods exist for prediction of a u vector not in the model, say u_n:

    û_n = B'V^{-1}(y - Xβ°),     (47)

where B' is the covariance between u_n and y'. Or

    û_n = C'G^{-1}û,     (48)

where C' = Cov(u_n, u'), G = Var(u), and û is BLUP of u. Or write expanded mixed model equations as follows:

    | X'R^{-1}X  X'R^{-1}Z          0   |  | β°  |     | X'R^{-1}y |
    | Z'R^{-1}X  Z'R^{-1}Z + W11   W12  |  | û   |  =  | Z'R^{-1}y |     (49)
    | 0          W12'              W22  |  | û_n |     | 0         |

where

    | W11   W12 |     | G   C   |^{-1}
    | W12'  W22 |  =  | C'  G_n |

and G = Var(u), C = Cov(u, u_n'), G_n = Var(u_n). The solution to (49) gives the same results as before when u_n is ignored. The proofs of these results are in Henderson (1977a).

10 Prediction When G Is Singular

The possibility exists that G is singular. This could be true in an additive genetic model with one or more pairs of identical twins. This poses no problem if one uses the method û = GZ'V^{-1}(y - Xβ°), but the mixed model method previously described cannot be used since G^{-1} is required. A modification of the mixed model equations does permit a solution to β° and û.
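One such modification, equations (50) below, can be sketched numerically. The matrices here follow the singular-G worked example later in this chapter (third row of G equals the sum of the first two); that those are the intended values is an assumption:

```python
import numpy as np

# Structure of equations (50) with a singular G (rank 2).
X = np.array([[1.0, 1], [1, 2], [1, 1], [1, 3], [1, 4]])
Z = np.array([[1.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]])
G = np.array([[2.0, 1, 3], [1, 3, 4], [3, 4, 7]])    # row 3 = row 1 + row 2
R = np.eye(5) * 9.0
y = np.array([5.0, 3, 6, 7, 5])
Rinv = np.linalg.inv(R)

# Non-symmetric system (50): nonsingular even though G is not.
A = np.block([[X.T @ Rinv @ X, X.T @ Rinv @ Z],
              [G @ Z.T @ Rinv @ X, G @ Z.T @ Rinv @ Z + np.eye(3)]])
b = np.concatenate([X.T @ Rinv @ y, G @ Z.T @ Rinv @ y])
sol = np.linalg.solve(A, b)
beta0, u_hat = sol[:2], sol[2:]

# Cross-check by the V route, which never needs G^{-1}.
Vinv = np.linalg.inv(Z @ G @ Z.T + R)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_direct = G @ Z.T @ Vinv @ (y - X @ beta_gls)
```

Because row 3 of G is the sum of the first two rows, the solution satisfies û₃ = û₁ + û₂, the linear dependency noted in the worked example.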

One possibility is to solve the following:

    | X'R^{-1}X    X'R^{-1}Z      |  | β° |     | X'R^{-1}y  |
    | GZ'R^{-1}X   GZ'R^{-1}Z + I |  | û  |  =  | GZ'R^{-1}y |     (50)

The coefficient matrix has rank r + q. Then β° is a GLS solution, and û is BLUP of u. Note that the coefficient matrix above is not symmetric. Further, a g-inverse of it does not yield sampling variances. For this we proceed as follows. Compute C, some g-inverse of the matrix. Then

    C | I  0 |
      | 0  G |

has the same properties as the g-inverse in (33).

If we want a symmetric coefficient matrix we can modify the equations of (50) as follows:

    | X'R^{-1}X    X'R^{-1}ZG      |  | β° |     | X'R^{-1}y  |
    | GZ'R^{-1}X   GZ'R^{-1}ZG + G |  | α  |  =  | GZ'R^{-1}y |     (51)

This coefficient matrix has rank r + rank(G). Solve for β°, α. Then û = Gα. Let C be a g-inverse of the matrix of (51). Then

    | I  0 | C | I  0 |
    | 0  G |   | 0  G |

has the properties of (33).

These results on singular G are due to Harville (1976). These two methods for singular G can also be used for nonsingular G if one wishes to avoid inverting G, Henderson (1973).

11 Examples Of Prediction Methods

Let us illustrate some of these prediction methods. Suppose

    X' = | 1  1  1  1  1 |
         | 1  2  1  3  4 |

    Z'

 =
    | 1  1  0  0  0 |
    | 0  0  1  0  0 |
    | 0  0  0  1  1 |

    G = | 3  2  1 |
        | 2  4  1 |
        | 1  1  5 |

    R = 9I,  y'

= (5, 3, 6, 7, 5).BythebasicGLSandBLUPmethodsV = ZGZ

 + R =
    | 12   3   2   1   1 |
    |  3  12   2   1   1 |
    |  2   2  13   1   1 |
    |  1   1   1  14   5 |
    |  1   1   1   5  14 |

Then the GLS equations, X'

V1Xo= X

V1yare

.249211 .523659.523659 1.583100

o=

1.2807572.627792

.11Theinverseofthecoecientmatrixis

13.1578 4.35224.3522 2.0712

,andthesolutiontoois[5.4153 .1314]

.Topredictu,y Xo=________.28392.1525.71611.9788.1102________,GZ

V1=___.1838 .1838 .0929 .0284 .0284.0929 .0929 .2747 .0284 .0284.0284 .0284 .0284 .2587 .2587___, u = GZ

V1(y Xo) =___.3220.0297.4915___,Cov(o, u

u

) = (X

V1X)X

V1ZG=

3.1377 3.5333 .4470.5053 .6936 1.3633

, andV ar( u u) = GGZ

V1ZG+GZ

V1X(X

V1X)X

'V^{-1}ZG

    = | 3  2  1 |   | 1.3456  1.1638   .7445 |   | 1.1973  1.2432   .9182 |
      | 2  4  1 | - | 1.1638  1.5274   .7445 | + | 1.2432  1.3063   .7943 |
      | 1  1  5 |   |  .7445   .7445  2.6719 |   |  .9182   .7943  2.3541 |

    = | 2.8517  2.0794  1.1737 |
      | 2.0794  3.7789  1.0498 |
      | 1.1737  1.0498  4.6822 |

The mixed model method is considerably easier. X'

R1X =

.5556 1.22221.2222 3.4444

,X

R1Z =

.2222 .1111 .2222.3333 .1111 .7778

,Z

R1Z =___.2222 0 0.1111 0.2222___,12X

R1y =

2.88896.4444

, Z

R1y =___.8889.66671.3333___,G1=___.5135 .2432 .0541.3784 .0270.2162___.Thenthemixedmodelequationsare________.5556 1.2222 .2222 .1111 .22223.4444 .3333 .1111 .7778.7357 .2432 .0541.4895 .0270.4384________

o u

=________2.88896.4444.8889.66671.3333________.Ag-inverse(regularinverse)is________13.1578 4.3522 3.1377 3.5333 .44702.0712 .5053 .6936 1.36332.8517 2.0794 1.17373.7789 1.04984.6822________.Theupper2x2represents(X

V1X), theupper2x3representsCov(o, u

u

),andthelower3x3V ar( u u). Thesearethesameresultsasbefore. Thesolutionis(5.4153, .1314, .3220, .0297, .4915)asbefore.NowletusillustratewithsingularG. LetthedatabethesameasbeforeexceptG =___2 1 33 47___.Notethatthe3rdrowofGisthesumoftherst2rows. NowV=________11 2 1 3 311 1 3 312 4 416 716________,andV1=________.0993 .0118 .0004 .0115 .0115.0993 .0004 .0115 .0115.0943 .0165 .0165.0832 .0280.0832________.13TheGLSequationsare

.2233 .36701.2409

o=

1.08031.7749

.(X

V1X)1=

8.7155 2.57791.5684

.o=

4.8397.0011

. u =___.1065 .1065 .0032 .1032 .1032.0032 .0032 .1516 .1484 .1484.1033 .1033 .1484 .2516 .2516___________.16141.83751.16142.1636.1648________=___0582.5270.5852___.Notethat u3= u1 + u2asaconsequenceofthelineardependenciesinG.Cov(o, u

u

) =

.9491 .8081 1.7572.5564 .7124 1.2688

.V ar( u u) =___1.9309 1.0473 2.97822.5628 3.61006.5883___.BythemodiedmixedmodelmethodsGZ

R1X=___1.2222 3.11111.4444 3.77782.6667 6.8889___,GZ

R1Z=___.4444 .1111 .6667.2222 .3333 .8889.6667 .4444 1.5556___,GZ

R1y =___6.44448.222214.6667___, X

R1y =

2.88896.4444

.Thenthenon-symmetricmixedmodelequations(50)are________.5556 1.2222 .2222 .1111 .22221.2222 3.4444 .3333 .1111 .77781.2222 3.1111 1.4444 .1111 .66671.4444 3.7778 .2222 1.3333 .88892.6667 6.8889 .6667 .4444 2.5556________

o u

=________2.88896.44446.44448.222214.6667________.14The solutionis (4.8397, .0011, .0582, .5270, .5852) as before. The inverse of thecoecientmatrixis________8.7155 2.5779 .8666 .5922 .45872.5779 1.5684 .1737 .1913 .3650.9491 .5563 .9673 .0509 .0182.8081 .7124 .1843 .8842 .06851.7572 1.2688 .1516 .0649 .9133________.Postmultiplyingthismatrixby

I 00 G

gives________8.7155 2.5779 .9491 .8081 1.75721.5684 .5563 .7124 1.26881.9309 1.0473 2.97822.5628 3.61016.5883________.These yieldthe same variances andcovariances as before. The analogous symmetricequations(51)are________.5556 1.2222 1.2222 1.4444 2.66673.4444 3.1111 3.7778 6.88895.0 4.4444 9.44447.7778 12.222221.6667________

o

=________2.88896.44446.44448.222214.6667________.Asolutionis [4.8397, .0011, .2697, 0, .1992]. Premultiplying byGweobtain u

= (.0582, .5270, .5852)asbefore.Ag-inverseofthematrixis________8.7155 2.5779 .2744 0 .13341.5684 .0176 0 .17371.1530 0 .46320 0.3197________.Pre-andpost-multiplyingthis matrixby

I 00 G

, yields thesamematrixas post-multiplyingthe non-symmetric inverse by

I 00 G

andconsequentlywe have therequiredmatrixforvariancesandcovariances.1512 IllustrationOfPredictionOfMissinguWeillustratepredictionof randomvariablesnotinthemodel forybyamultipletraitexample. Suppose we have 2 traits and 2 animals, the rst 2 with measurements on traits1 and 2, but the third with a record only on trait 1. We assume an additive genetic modelandwishtopredictbreedingvalueofbothtraitsonall3animalsandalsotopredictthesecondtraitofanimal3. Thenumeratorrelationshipmatrixforthe3animalsis___1 1/2 1/21/2 1 1/41/2 1/4 1___.Theadditivegeneticvariance-covarianceanderrorcovariancematricesareassumedtobeG0andR0=

    | 2  2 |   and   | 4  1 |,
    | 2  3 |         | 1  5 |

respectively. The records are ordered animals in traits and are [6, 8, 7, 9, 5]. Assume

    X' = | 1  1  1  0  0 |
         | 0  0  0  1  1 |

.Ifall6elementsofu areincludedZ =________1 0 0 0 0 00 1 0 0 0 00 0 1 0 0 00 0 0 1 0 00 0 0 0 1 0________.If thelast(missingu6)isnotincludeddeletethelastcolumnfromZ. Whenall uareincludedG =

Ag11Ag12Ag12Ag22

,wheregijistheijthelementofG0,thegeneticvariance-covariancematrix. Numericallythisis__________2 1 1 2 1 12 .5 1 2 .52 1 .5 23 1.5 1.53 .753__________.Ifu6isnotincluded,deletethe6throwandcolumnfromG.R =________4 0 0 1 04 0 0 14 0 05 05________.16R1=________.2632 0 0 .0526 0.2632 0 0 .0526.25 0 0.2105 0.2105________.G1fortherst5elementsofuis________2.1667 1. .3333 1.3333 .66672. 0 .6667 1.3333.6667 0 01.3333 .66671.3333________.Thenthemixedmodelequationsforoand u1, . . . , u5are_____________.7763 .1053 .2632 .2632 .25 .0526 .0526.4211 .0526 .0526 0 .2105 .21052.4298 1. .3333 1.3860 .66672.2632 0 .6667 1.38609167 0 01.5439 .66671.5439__________________________12 u1 u2 u3 u4 u5_____________= (4.70, 2.21, 1.11, 1.84, 1.75, 1.58, .63)

.Thesolutionis(6.9909,6.9959,.0545,-.0495,.0223,.2651,-.2601).Topredictu6wecanuse u1, . . . , u5. Thesolutionis u6= [1 .5 2 1.5 .75]________2 1 1 2 12 .5 1 22 1 .53 1.53________1________ u1 u2 u3 u4 u5________= .1276.Wecouldhavesolveddirectlyfor u6inmixedmodelequationsasfollows._______________.7763 .1053 .2632 .2632 .25 .0526 .0526 0.4211 .0526 .0526 0 .2105 .2105 02.7632 1. 1. 1.7193 .6667 .66672.2632 0 .6667 1.3860 02.25 .6667 0 1.33331.8772 .6667 .66671.5439 01.3333_______________17

u

=[4.70, 2.21, 1.11, 1.84, 1.75, 1.58, .63, 0]

.Thesolutionis(6.9909,6.9959,.0545,-.0495,.0223,.2651,-.2601,.1276),andequalstheprevioussolution.The predictor of the record on the second trait on animal 3 is some new2 + u6 + e6.Wealreadyhave u6. Wecanpredict e6from e1 . . . e5.________ e1 e2 e3 e4 e5________=________y1y2y3y4y5________________1 0 1 0 0 0 01 0 0 1 0 0 01 0 0 0 1 0 00 1 0 0 0 1 00 1 0 0 0 0 1_____________________12 u1 u2 u3 u4 u5_____________=________1.04541.0586.01321.73911.7358________.Then e6= (0 0 1 0 0) R1( e1 . . . e5)

= .0033. The column vector above is Cov [e6, (e1e2e3e4e4e5)].RaboveisV ar[(e1 . . . e5)

].Supposewehadthesamemodelasbeforebutwehavenodataonthesecondtrait.Wewanttopredictbreedingvaluesforbothtraitsinthe3animals, thatis, u1, . . . , u6.We also want to predict records on the second trait,that is,u4 +e4, u5 +e5, u6 +e6. Themixedmodelequationsare_____________.75 .25 .25 .25 0 0 02.75 1. 1. 1.6667 .6667 .66672.25 0 .6667 1.3333 02.25 .6667 0 1.33331.6667 .6667 .66671.3333 01.3333__________________________ u1 u2 u3 u4 u5 u6_____________=_____________5.251.502.001.75000_____________.Thesolutionis[7.0345, .2069, .1881, .0846, .2069, .1881, .0846].Thelast6valuesrepresentpredictionofbreedingvalues.___ e1 e2 e3___ =___y1y2y3___ (X Z)______ u1 u2 u3______=___.8276.7774.0502___.Then___ e4 e5 e6___ =___1 0 00 1 00 0 1______4 0 00 4 00 0 4___1___ e1 e2 e3___ =___.2069.1944.0125___.18Thenpredictionsofsecondtraitrecordsare2+___.2069.1881.0846___+___.2069.1944.0125___,but2isunknown.13 ASingularSubmatrixInGSupposethatGcanbepartitionedasG=

G1100 G22

suchthat G11is non-singular andG22is singular. Acorrespondingpartitionof u

is(u

1u

2). Thentwoadditionalmethodscanbeused. First,solve(52)___X

R1X X

R1Z1X

R1Z2Z

1R1X Z

1R1Z1 +G111Z

1R1Z2G22Z

2R1X G22Z

2R1Z1G22Z

2R1Z2 +I______o u1 u2___ =___X

R1yZ

1R1yG22Z

2R1y___. (52)Letag-inverseofthismatrixbeC. ThenthepredictionerrorscomefromC___I 0 00 I 00 0 G22___. (53)Thesymmetriccounterpartoftheseequationsis___X

R1X X

R1Z1X

R1Z2G22Z

1R1X Z

1R1Z1 +G111Z

1R1Z2G22G22Z

2R1X G22Z

2R1Z1G22Z

2R1Z2G22 +G22______o u1 2___ =___X

R1yZ

1R1yG22Z

2R1y___, (54)and u2= G22 2.Let C be a g-inverse of the coecient matrix of (54). Then the variances and covari-ancescomefrom___I 0 00 I 00 0 G22___C___I 0 00 I 00 0 G22___. (55)1914 PredictionOfFutureRecordsMostapplicationsof geneticevaluationareessentiallyproblemsinpredictionof futurerecords, or more precisely, prediction of the relative values of future records, the relativityarising from the fact that we may have no data available for estimation of future X, forexample,a year eect for some record in a future year. Let the model for a future recordbeyi= x

i +z

iu + ei. (56)ThenifwehaveavailableBLUEofx

i= x

ioandBLUPofu andei, uand ei,BLUPofthisfuturerecordisx

io+z

i u + ei.Supposehoweverthatwehaveinformationononlyasubvectorofsay2. Writethemodelforafuturerecordasx

1i1 +x

2i2 +z

iu + ei.ThenwecanassertBLUPforonlyx

2i2 +z

2u + ei.Butif wehavesomeotherrecordwewishtocomparewiththisone, sayyj, withmodel,yj= x

1j1 +x

2j2 +z

ju + ej,wecancomputeBLUPofyiyjprovidedthatx1i= x1j.It should be remembered that the variance of the error of prediction of a future record(orlinearfunctionofasetofrecords)shouldtakeintoaccountthevarianceoftheerrorof predictionof theerror(orlinearcombinationof errors)andalsoitscovariancewithoand u. SeeSection8forthesevariancesandcovariances. AnextensivediscussionofpredictionoffuturerecordsisinHenderson(1977b).15 WhenRankofMMEIsGreaterThannInsomegeneticproblems,andinparticularindividualanimalmultipletraitmodels,theorder of themixedmodel coecient matrixcanbemuchgreater thann, thenumberof observations. Inthesecasesonemightwishtoconsideramethoddescribedinthis20section, especially if one can thereby store and invert the coecient matrix in cases whenthemixedmodelequationsaretoolargeforthistobedone. Solveequations(57)foroands.

V XX

0

so

=

y0

. (57)ThenoisaGLSsolutionand u = GZ

s (58)is BLUP of u. It is easy to see why these are true. Eliminate s from equations (57). Thisgives(X

V1X)o= X

V1y,whicharetheGLSequations. Solvingforsin(57)weobtains = V1(y Xo).ThenGZ

s = GZ

V1(y Xo),whichweknowtobeBLUPofu.Somevariances andcovariances fromag-inverseof thematrixof (57) areshownbelow. Letag-inversebe

C11C12C

12C22

.ThenV ar(K

o) = K

C22K. (59)V ar( u) = GZ

C11VC11ZG. (60)Cov(K

o, u

) = K

C

12VC11ZG = 0. (61)Cov(K

o, u

) = K

C

12ZG (62)Cov(K

o, u

u

) = K

C

12ZG. (63)V ar( u u) = GV ar( u). (64)Thematrixof(57)willoftenbetoolargetoinvertforpurposesofsolvingsando.Withmixedmodel equationsthataretoolargewecansolvebyGauss-Seidel iteration.Becausethismethodrequiresdiagonalsthatarenon-zero, wecannotsolve(57)bythismethod. Butifweareinterestedin u,butnotino,aniterativemethodcanbeused.Subsection4.2presentedamethodforBLUPthatis u = C

V y.NowsolveiterativelyVs = y, (65)21then u = C

s. (66)RememberthatVhasrank= n r. Neverthelessconvergencewilloccur,butnottoauniquesolution. V(andy)couldbereducedtodimension,n r,sothatthereducedVwouldbenon-singular.SupposethatX

=

1 1 1 1 11 2 3 2 4

,C

=

1 1 2 0 32 0 1 1 2

,V =________9 3 2 1 18 1 2 29 2 17 28________,y

= [6 3 5 2 8].FirstletuscomputeobyGLSand ubyGZ

V1(y Xo).TheGLSequationsare

.335816 .828030.828030 2.821936

o=

1.6228844.987475

.(o)

=[1.717054 1.263566].Fromthis u

=[.817829 1.027132].Bythemethodof(57)wehaveequations_____________9 3 2 1 1 1 18 1 2 2 1 29 2 1 1 37 2 1 28 1 40 00_____________

so

=_____________6352800_____________.Thesolutionis(o)

=sameasforGLS,s

= (.461240 .296996 .076550 .356589.268895).22Then u = C

s = same as before. Next let us compute u from dierent y. First let bethesolutiontoOLSusingthersttwoelementsofy. Thisgives=

2 1 0 0 01 1 0 0 0

y,andy=________0 0 0 0 00 0 0 0 01 2 1 0 00 1 0 1 02 3 0 0 1________y =T

y,ory

=[0 0 5 1 11].Usingthelast3elementsofygivesV

=___38 11 4411 1472___, C

=

1 1 23 1 6

.Then u = C

V1y= sameasbefore.AnotherpossibilityistocomputebyOLSusingelements1,3ofy. Thisgives=

1.5 0 .5 0 0.5 0 .5 0 0

y,andy

= [02.503.53.5].Droppingtherstandthirdelementsofy,V=___9.5 4.0 6.59.5 4.025.5___, C

=

.5 1.5 .51.5 .5 1.5

.Thisgivesthesamevaluefor u.FinallyweillustratebyGLS.=

.780362 .254522 .142119 .645995 .538760.242894 .036176 .136951 .167959 .310078

y.y

=

3.019380, 1.244186, .507752, 2.244186, 1.228682

.23V=________3.268734 .852713 .025840 2.85713 .9043934.744186 1.658915 1.255814 .0620165.656331 .658915 3.0284243.744186 .0620162.005168________.C

=

.940568 .015504 .090439 .984496 .165375.909561 1.193798 .297158 .193798 .599483

.Then u = C

V y.Vhasrank=3,andoneg-inverseis________0 0 0 0 0.271363 .092077 .107220 0.211736 .068145 0.315035 00________.Thisgives uthesameasbefore.Anotherg-inverseis________1.372401 0 0 1.035917 .5869570 0 0 00 0 01.049149 .434783.75000________.Thisgivesthesame uasbefore.Itcanbeseenthatwhen= o,aGLSsolution,C

V1y= C

V y. ThusifVcanbeinvertedtoobtaino, thisistheeasiermethod. Of coursethissectionisreallyconcernedwiththesituationinwhichV1istoodiculttocompute, andthemixedmodelequationsarealsointractable.16 PredictionWhenRIsSingularIf Rissingular, theusual mixedmodel equations, whichrequireR1, cannotbeused.Harville (1976) does describe a method using a particular g-inverse of R that can be used.Findingthisg-inverseisnottrivial. Consequently, weshall describemethodsdierentfrom his that lead to the same results. Dierent situations exist depending upon whetherXand/orZarelinearlyindependentofR.2416.1 XandZlinearlydependentonRIfRhasrankt < n,wecanwriteRwithpossiblere-orderingofrowsandcolumnsasR =

R1R1LL

R1L

R1L

,whereR1is t t, andLis t (n t) withrank(n t). Thenif X, ZarelinearlydependentuponR,X =

X1L

X1

, Z =

Z1L

Z1

.ThenitcanbeseenthatVissingular,andXislinearlydependentuponV. Onecouldndoand ubysolvingtheseequations

V XX

0

so

=

y0

, (67)and u=GZ

s. Seesection14. Itshouldbenotedthat(67)isnotaconsistentsetofequationsunlessy =

y1L

y1

.IfXhasfullcolumnrank, thesolutiontooisunique. IfXisnotfullrank, K

oisunique, givenK

isestimable. Thereisnotauniquesolutiontosbut u=GZ

sisunique.LetusillustratewithX

= (1 2 3), Z

=

1 2 32 1 3

, y

= (5 3 8),R =___3 1 24 35___, G = I.ThenV = R+ZGZ =___8 3 119 1223___,whichissingular. Thenwendsomesolutionto_____8 3 11 13 9 12 211 12 23 31 2 3 0__________s1s2s3o_____ =_____5380_____.25Threedierentsolutionvectorsare(14 7 0 54)/29,(21 0 7 54)/29,(0 21 14 54)/29.Eachofthesegives u

=(021)/29ando=54/29.WecanalsoobtainauniquesolutiontoK

oand ubysettingupmixedmodelequations using y1only or any other linearly independent subset of y. In our example letususetherst2elementsofy. Themixedmodelequationsare______1 21 22 1___

3 11 4

1

1 1 22 2 1

+___0 0 00 1 00 0 1_________o u1 u2___ =___1 21 22 1___

3 11 4

53

.Theseare111___20 20 1920 31 1919 19 34______o u1 u2___ =___515160___/11.Thesolutionis(54, 0, 21)/29asbefore.Ifweusey1,y3wegetthesameequationsasabove,andalsothesameifweusey2,y316.2 XlinearlyindependentofV,andZlinearlydependentonRInthiscaseVissingularbutwithXindependentof Vequations(67)haveauniquesolutionifXhasfullcolumnrank. OtherwiseK

16.2 X linearly independent of V, and Z linearly dependent on R

In this case V is singular, but with X independent of V equations (67) have a unique solution if X has full column rank. Otherwise K′β° is unique provided K′β is estimable. In contrast to section 15.1, y need not be linearly dependent upon V and R. Let us use the example of section 14.1 except now X′ = (1 2 3) and y′ = (5 3 4). Then the unique solution is

    (β°, û′)′ = (1104, 588, 24, 4536)/2268.

16.3 Z linearly independent of R

In this case V is non-singular, and X is usually linearly independent of V even though it may be linearly dependent on R. Consequently β° and K′β° are unique as in section 15.2.

17 Another Example of Prediction Error Variances

We demonstrate variances of prediction errors and predictors by the following example.

    n_ij            Animals
    Treatment      1     2
        1          2     1
        2          1     3

Let R = 5I and G = [2 1; 1 3]. The mixed model coefficient matrix, with equations ordered (μ, t1, t2, u1, u2) and only the upper triangle of the symmetric matrix shown, is

    1.4   .6    .8    .6    .8
          .6     0    .4    .2
                .8    .2    .6
                     1.2   -.2
                           1.2     (68)

and a g-inverse of this matrix is

    0     0          0          0          0
          3.33333    1.66667   -1.66667   -1.66667
                     3.19820   -1.44144   -2.11712
                                1.84685    1.30631
                                           2.38739     (69)

Let K′ = [1 1 0; 1 0 1]. Then

    Var[(K′β° - K′β)′, (û - u)′]′ = diag(K′, I2) [matrix (69)] diag(K, I2)

       =  3.33333   1.66667   -1.66667   -1.66667
                    3.19820   -1.44144   -2.11712
                               1.84685    1.30631
                                          2.38739     (70)

    Var[(K′β°)′, û′]′ =  3.33333   1.66667    0         0
                                   3.19820    0         0
                                             .15315   -.30631
                                                       .61261     (71)

The upper 2 x 2 is the same as in (70), Cov(K′β°, û′) = 0, and Var(û) = G - Var(û - u). Let us derive these results from first principles.

    (K′β°; û) =   .33333   .33333   .33333    0         0        0        0
                  .04504   .04504  -.09009   .35135    .21622   .21622   .21622   y     (72)
                  .03604   .03604  -.07207   .08108   -.02703  -.02703  -.02703
                 -.07207  -.07207   .14414  -.16216    .05405   .05405   .05405

computed by diag(K′, I2) [matrix (69)] [X′R⁻¹; Z′R⁻¹]. Contribution of R to Var[(K′β°)′, û′]′ = [matrix (72)] R [matrix (72)]′

    =  1.66667    0         0         0
                 1.37935    .10348   -.20696
                            .08278   -.16557
                                      .33114     (73)

For u in (K′β°; û) we need [matrix (72)] Z

    =   .66667    .33333
        .44144    .55856
        .15315   -.15315
       -.30631    .30631     (74)

Contribution of G to Var[(K′β°)′, û′]′ = [matrix (74)] G [matrix (74)]′

    =  1.66667   1.66662    0         0
                 1.81885   -.10348    .20696
                            .07037   -.14074
                                      .28143     (75)

Then the sum of matrix (73) and matrix (75) = matrix (71). For variance of prediction errors we need

    matrix (74) - [0 0; 0 0; 1 0; 0 1] =   .66667    .33333
                                           .44144    .55856
                                          -.84685   -.15315
                                          -.30631   -.69369     (76)

Then the contribution of G to prediction error variance is [matrix (76)] G [matrix (76)]′

    =  1.66667   1.66667   -1.66667   -1.66667
                 1.81885   -1.54492   -1.91015
                            1.76406    1.47188
                                       2.05624     (77)

Then prediction error variance is matrix (73) + matrix (77) = matrix (70).
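These matrix identities can be reproduced mechanically. Below is a sketch (NumPy assumed; the incidence matrices are written out from the n_ij table above) that rebuilds the coefficient matrix (68), takes the g-inverse with the μ row and column zeroed as in (69), and checks Var(û) = G - Var(û - u):

```python
import numpy as np

# 7 records: treatment 1 on animals (1, 1, 2), treatment 2 on animals (1, 2, 2, 2)
X = np.array([[1, 1, 0]] * 3 + [[1, 0, 1]] * 4, dtype=float)   # columns: mu, t1, t2
Z = np.array([[1, 0], [1, 0], [0, 1], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
R = 5 * np.eye(7)
G = np.array([[2., 1.], [1., 3.]])

W = np.hstack([X, Z])
C = W.T @ np.linalg.inv(R) @ W
C[3:, 3:] += np.linalg.inv(G)        # coefficient matrix (68)

Cg = np.zeros((5, 5))                 # g-inverse (69): zero the mu row/column,
Cg[1:, 1:] = np.linalg.inv(C[1:, 1:])  # invert the remaining full-rank block

var_pred_err = Cg[3:, 3:]             # Var(u_hat - u), lower 2x2 of (70)
var_u_hat = G - var_pred_err          # Var(u_hat), lower 2x2 of (71)
print(np.round(var_pred_err, 5))      # approx [[1.84685 1.30631] [1.30631 2.38739]]
```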

,=_____1.66667 1.66667 1.66667 1.666671.81885 1.54492 1.910151.76406 1.471882.05624_____. (77)Thenpredictionerrorvarianceismatrix(73)+matrix(77)=matrix(70).18 PredictionWhenuAndeAreCorrelatedIn most applications of BLUE and BLUP it is assumed that Cov(u, e

) = 0. If this is notthe case, the mixed model equations can be modied to account for such covariances. SeeSchaeerandHenderson(1983).LetV ar

eu

=

R SS

G

. (78)ThenV ar(y) = ZGZ

+R+ZS

+SZ

. (79)Letanequivalentmodelbey = X +Tu +, (80)whereT = Z +SG1,V ar

u

=

G 00 B

, (81)andB = RSG1S

. ThenV ar(y) = V ar(Tu +)= ZGZ

+ZS

+SZ

+SG1S

+RSG1S

= ZGZ

+R+ZS

+SZ

29asintheoriginalmodel,thusprovingequivalence. Nowthemixedmodelequationsare

X

B1X X

B1TT

B1X T

B1T+G1

o u

=

X

B1yT

B1y

, (82)Ag-inverse of this matrixyields the requiredvariances andcovariances for estimablefunctionsofo, u,and u u.BcanbeinvertedbyamethodanalogoustoV1= R1R1Z(Z

R1Z +G1)1Z

R1whereV = ZGZ

+R,B1= R1+R1S(GS

R1S)1S

R1. (83)Infact,itisunnecessarytocomputeB1ifweinsteadsolve(84).___X

R1X X

R1T X

R1ST

R1X T

R1T+G1T

R1SS

R1X S

R1T S

R1S G______o u___ =___X

R1yT

R1yS

R1y___. (84)This may not be a good set of equations to solve iteratively since (S

R1SG) is negativedenite. Consequently Gauss- Seidel iteration is not guaranteed to converge, Van Norton(1959).Weillustratethemethodofthissectionbyanadditivegeneticmodel.X =_____1 11 21 11 4_____, Z = I4, G =_____1. .5 .25 .251. .25 .251. .51._____, R = 4I4,S = S

= .9I4, y

= (5, 6, 7, 9).FromtheseparametersB =_____2.88625 .50625 .10125 .101252.88625 .10125 .101252.88625 .506252.88625_____,andT = T

=_____2.2375 .5625 .1125 .11252.2375 .1125 .11252.2375 .56252.2375_____.30Thenthemixedmodelequationsof(84)are__________1.112656 2.225313 .403338 .403338 .403338 .4033386.864946 .079365 1.097106 .660224 2.8691873.451184 1.842933 .261705 .2617053.451184 .261705 .2617053.451184 1.8429333.451184____________________o1o2 u1 u2 u3 u4__________=__________7.51043117.2751501.3897822.5662522.2905754.643516__________.Thesolutionis(4.78722, .98139, .21423, .21009, .31707, .10725).Wecouldsolvethisproblembythebasicmethodo= (X

V1X)X

V1y,and u = Cov(u, y

)V1(y Xo).Weillustratethatthesegivethesameanswersasthemixedmodelmethod.V ar(y) = V =_____6.8 .5 .25 .256.8 .25 .256.8 .56.8_____.ThentheGLSequationsare

.512821 1.0256411.025641 2.991992

=

3.4615387.846280

,and = (4.78722, .98139)

asbefore.Cov(u, y

) =_____1.90 .50 .25 .251.90 .25 .251.90 .501.90_____ = GZ

+S

.(y X) = (.768610, .750000, 1.231390, .287221). u = (.21423, .21009, .31707, .10725)

= (GZ

+S

)V1(y Xo)asbefore.3119 DirectSolutionTo Andu+TInsomeproblemswewishtopredictw=u + T. Themixedmodel equationscanbemodiedtodothis. Writethemixedmodel equationsas(85). ThiscanbedonesinceE(wT) = 0.
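The equivalence just illustrated is easy to verify numerically. The sketch below (NumPy assumed; names are mine) builds B and T for the additive genetic example, solves the mixed model equations (82), and checks the answer against the GLS route β° = (X′V⁻¹X)⁻¹X′V⁻¹y, û = (GZ′ + S′)V⁻¹(y - Xβ°):

```python
import numpy as np

X = np.array([[1., 1.], [1., 2.], [1., 1.], [1., 4.]])
Z = np.eye(4)
G = np.array([[1., .5, .25, .25], [.5, 1., .25, .25],
              [.25, .25, 1., .5], [.25, .25, .5, 1.]])
R, S = 4 * np.eye(4), 0.9 * np.eye(4)
y = np.array([5., 6., 7., 9.])

Ginv = np.linalg.inv(G)
T = Z + S @ Ginv                    # design for u in the equivalent model (80)
B = R - S @ Ginv @ S.T              # Var(eps) in the equivalent model
Binv = np.linalg.inv(B)

# Mixed model equations (82)
lhs = np.block([[X.T @ Binv @ X, X.T @ Binv @ T],
                [T.T @ Binv @ X, T.T @ Binv @ T + Ginv]])
rhs = np.concatenate([X.T @ Binv @ y, T.T @ Binv @ y])
sol = np.linalg.solve(lhs, rhs)     # (beta, u_hat)

# Basic (GLS) method on V = ZGZ' + R + ZS' + SZ'
V = Z @ G @ Z.T + R + Z @ S.T + S @ Z.T
Vinv = np.linalg.inv(V)
beta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
u_hat = (G @ Z.T + S.T) @ Vinv @ (y - X @ beta)
print(np.round(sol, 5))   # approx (4.78722, .98139, -.21423, -.21009, .31707, .10725)
```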

19 Direct Solution To β And u + Tβ

In some problems we wish to predict w = u + Tβ. The mixed model equations can be modified to do this. Write the mixed model equations as (85). This can be done since E(w - Tβ) = 0.

    [X′R⁻¹X  X′R⁻¹Z; Z′R⁻¹X  Z′R⁻¹Z + G⁻¹] (β°; ŵ - Tβ°) = (X′R⁻¹y; Z′R⁻¹y).     (85)

Re-write (85) as

    [X′R⁻¹X - X′R⁻¹ZT   X′R⁻¹Z;
     Z′R⁻¹X - M          Z′R⁻¹Z + G⁻¹] (β°; ŵ) = (X′R⁻¹y; Z′R⁻¹y),     (86)

where M = (Z′R⁻¹Z + G⁻¹)T. To obtain symmetry premultiply the second equation by T′ and subtract this product from the first equation. This gives

    [X′R⁻¹X - X′R⁻¹ZT - T′Z′R⁻¹X + T′M   X′R⁻¹Z - M′;
     Z′R⁻¹X - M                           Z′R⁻¹Z + G⁻¹] (β°; ŵ)
        = (X′R⁻¹y - T′Z′R⁻¹y; Z′R⁻¹y).     (87)

Let a g-inverse of the matrix of (87) be [C11 C12; C′12 C22]. Then

    Var(K′β°) = K′C11K,  Var(ŵ - w) = C22.

Henderson's mixed model equations for a selection model, equation (31), in Biometrics (1975a) can be derived from (86) by making the following substitutions: (X B) for X, (0 B) for T, and noting that B = ZBu + Be.

We illustrate (87) with the following example.

    X = [1 2; 2 1; 1 1; 3 4],  Z = [1 1 2; 2 3 2; 1 2 1; 4 1 3],
    R = [5 1 1 2; 1 6 2 1; 1 2 7 1; 2 1 1 8],
    G = [3 1 1; 1 4 2; 1 2 5],  T = [3 1; 2 3; 2 4],  y′ = (5, 2, 3, 6).

The regular mixed model equations are (upper triangle shown)

    1.576535  1.651127  1.913753  1.188811  1.584305
              2.250194  2.088578   .860140  1.859363
                        2.763701  1.154009  1.822952
                                  2.024882  1.142462
                                            2.077104

    (β°′, û′)′ = (2.651904, 3.871018, 3.184149, 1.867133, 3.383061)′.     (88)

The solution is (-2.114786, 2.422179, .086576, .757782, .580739). The equations for solution to β° and to ŵ = û + Tβ° are

    65.146040  69.396108  -12.331273   -8.607904  -10.323684
               81.185360  -11.428959  -10.938364  -11.699391
                            2.763701    1.154009    1.822952
                                        2.024882    1.142462
                                                    2.077104

    (β°′, ŵ′)′ = (-17.400932, -18.446775, 3.184149, 1.867133, 3.383061)′.     (89)

The solution is (-2.115, 2.422, -3.836, 3.795, 6.040). This is the same solution to β° as in (88), and û + Tβ° of the previous solution gives ŵ of this solution. Further,

    [I 0; T I] [inverse of (88)] [I 0; T I]′ = [inverse of (89)].
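The transformation between (88) and (89) can be checked directly. A sketch (NumPy assumed): build the regular mixed model matrix, apply the congruence with L = [I 0; T I], and confirm that the two systems share β°, that ŵ = û + Tβ°, and that [I 0; T I] inv(88) [I 0; T I]′ = inv(89):

```python
import numpy as np

X = np.array([[1., 2.], [2., 1.], [1., 1.], [3., 4.]])
Z = np.array([[1., 1., 2.], [2., 3., 2.], [1., 2., 1.], [4., 1., 3.]])
R = np.array([[5., 1., 1., 2.], [1., 6., 2., 1.], [1., 2., 7., 1.], [2., 1., 1., 8.]])
G = np.array([[3., 1., 1.], [1., 4., 2.], [1., 2., 5.]])
T = np.array([[3., 1.], [2., 3.], [2., 4.]])
y = np.array([5., 2., 3., 6.])

Rinv, Ginv = np.linalg.inv(R), np.linalg.inv(G)
W = np.hstack([X, Z])
C88 = W.T @ Rinv @ W
C88[2:, 2:] += Ginv                        # matrix of (88)
r88 = W.T @ Rinv @ y

L = np.block([[np.eye(2), np.zeros((2, 3))], [T, np.eye(3)]])
Linv = np.block([[np.eye(2), np.zeros((2, 3))], [-T, np.eye(3)]])
C89 = Linv.T @ C88 @ Linv                  # matrix of (87)/(89)
r89 = Linv.T @ r88

s88 = np.linalg.solve(C88, r88)            # (beta, u_hat)
s89 = np.linalg.solve(C89, r89)            # (beta, w_hat)
print(np.round(s89, 3))                    # approx (-2.115, 2.422, -3.836, 3.795, 6.040)
```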

20 Derivation Of MME By Maximizing f(y, w)

This section describes first the method used by Henderson (1950) to derive his mixed model equations. Then a more general result is described. For the regular mixed model

    E[(u′, e′)′] = 0,  Var[(u′, e′)′] = [G 0; 0 R].

The density function is f(y, u) = g(y | u)h(u), and under normality the log of this is

    k - ½[(y - Xβ - Zu)′R⁻¹(y - Xβ - Zu) + u′G⁻¹u],

where k is a constant. Differentiating with respect to β, u and equating to 0, we obtain the regular mixed model equations.

Now consider a more general mixed linear model in which

    E[(y′, w′)′] = (Xβ; Tβ)

with Tβ estimable, and

    Var[(y′, w′)′] = [V C; C′ G]  with  [V C; C′ G]⁻¹ = [C11 C12; C′12 C22].

The log of f(y, w) is

    k - ½[(y - Xβ)′C11(y - Xβ) + (y - Xβ)′C12(w - Tβ)
        + (w - Tβ)′C′12(y - Xβ) + (w - Tβ)′C22(w - Tβ)].

Differentiating with respect to β and to w and equating to 0, we obtain

    [X′C11X + X′C12T + T′C′12X + T′C22T   -(X′C12 + T′C22);
     -(X′C12 + T′C22)′                      C22] (β°; ŵ)
        = (X′C11y + T′C′12y; -C′12y).     (90)

Eliminating ŵ we obtain

    X′(C11 - C12C22⁻¹C′12)Xβ° = X′(C11 - C12C22⁻¹C′12)y.     (91)

But from partitioned matrix inverse results we know that C11 - C12C22⁻¹C′12 = V⁻¹. Therefore (91) are GLS equations, and K′β° is BLUE of K′β if estimable. Now solve for ŵ from the second equation of (90).

    ŵ = -C22⁻¹C′12(y - Xβ°) + Tβ°
       = C′V⁻¹(y - Xβ°) + Tβ°
       = BLUP of w, because -C22⁻¹C′12 = C′V⁻¹.

To prove that -C22⁻¹C′12 = C′V⁻¹, note that by the definition of an inverse C′12V + C22C′ = 0. Pre-multiply this by C22⁻¹ and post-multiply by V⁻¹ to obtain

    C22⁻¹C′12 + C′V⁻¹ = 0,  or  -C22⁻¹C′12 = C′V⁻¹.

We illustrate the method with the same example as that of section 18.

    V = ZGZ′ + R = [46 66 38 74; 66 118 67 117; 38 67 45 66; 74 117 66 149],
    C = ZG = [6 9 13; 11 18 18; 6 11 10; 16 14 21].

Then from the inverse of [V ZG; GZ′ G] we obtain

    C11 =  .229215  -.023310  -.018648  -.052059
          -.023310   .188811  -.048951  -.011655
          -.018648  -.048951   .160839  -.009324
          -.052059  -.011655  -.009324   .140637

    C12 =  .044289  -.069930  -.236985
          -.258741  -.433566  -.247086
          -.006993  -.146853   .002331
          -.477855  -.034965  -.285159

and

    C22 = [2.763701 1.154009 1.822952; 1.154009 2.024882 1.142462; 1.822952 1.142462 2.077104].

Then applying (90) to these results we obtain the same equations as in (89).

The method of this section could have been used to derive the equations of (82) for Cov(u, e′) ≠ 0. f(y, u) = g(y | u)h(u), with

    E(y | u) = Xβ + Tu,  Var(y | u) = B.

See section 17 for definition of T and B. Then

    log g(y | u)h(u) = k - ½[(y - Xβ - Tu)′B⁻¹(y - Xβ - Tu) + u′G⁻¹u].

This is maximized by solving (82).

This method also could be used to derive the result of section 18. Again we make use of f(y, w) = g(y | w)h(w).

    E(y | w) = Xβ + Z(w - Tβ),  Var(y | w) = R.

Then

    log g(y | w)h(w) = k - ½[(y - Xβ - Zw + ZTβ)′R⁻¹(y - Xβ - Zw + ZTβ) + (w - Tβ)′

G⁻¹(w - Tβ)].

This is maximized by solving equations (87).

Chapter 6
G and R Known to Proportionality
C. R. Henderson

1984 - Guelph

In the preceding chapters it has been assumed that Var(u) = G and Var(e) = R are known. This is, of course, an unrealistic assumption, but was made in order to present estimation, prediction, and hypothesis testing methods that are exact and which may suggest approximations for the situation with unknown G and R. One case does exist, however, in which BLUE and BLUP exist, and exact tests can be made even when these variances are unknown. This case is G and R known to proportionality.

Suppose that we know G and R to proportionality, that is

    G = G*σ_e²,  R = R*σ_e².     (1)

G* and R* are known, but σ_e² is not. For example, suppose that we have a one way mixed model

    y_ij = x′_ij β + a_i + e_ij,
    Var(a1, a2, ...)′ = Iσ_a²,  Var(e11, e12, ...)′ = Iσ_e².

Suppose we know that σ_a²/σ_e² = α. Then G = Iσ_a² = αIσ_e² and R = Iσ_e². Then by the notation of (1), G* = αI and R* = I.

1 BLUE and BLUP

Let us write the GLS equations with the notation of (1).

    V = ZGZ′ + R = (ZG*Z′ + R*)σ_e² = V*σ_e².

Then X′V⁻¹Xβ° = X′V⁻¹y can be written as

    σ_e⁻² X′V*⁻¹Xβ° = σ_e⁻² X′V*⁻¹y.     (2)

Multiplying both sides by σ_e² we obtain a set of equations that can be written as

    X′V*⁻¹Xβ° = X′V*⁻¹y.     (3)

Then BLUE of K′β is K′β°, where β° is any solution to (3). Similarly, the mixed model equations with each side multiplied by σ_e² are

    [X′R*⁻¹X  X′R*⁻¹Z; Z′R*⁻¹X  Z′R*⁻¹Z + G*⁻¹] (β°; û) = (X′R*⁻¹y; Z′R*⁻¹y).     (4)

û is BLUP of u when G and R are known. To find the sampling variance of K′β° we need a g-inverse of the matrix of (2). This is (X′V*⁻¹X)⁻σ_e². Consequently,

    Var(K′β°) = K′(X′V*⁻¹X)⁻K σ_e².     (5)

Also Var(K′β°) = K′C11K σ_e², where C11 is the upper p x p submatrix of a g-inverse of the matrix of (4). Similarly, all of the results of (34) to (41) in Chapter 5 are correct if we multiply them by σ_e².

Of course σ_e² is unknown, so we can only estimate the variance by substituting some estimate of σ_e², say σ̂_e², in (5). There are several methods for estimating σ_e², but the most frequently used one is the minimum variance, translation invariant, quadratic, unbiased estimator computed by

    [y′V*⁻¹y - (β°)′X′V*⁻¹y]/[n - rank(X)]     (6)

or by

    [y′R*⁻¹y - (β°)′X′R*⁻¹y - û′Z′R*⁻¹y]/[n - rank(X)].     (7)

A more detailed account of estimation of variances is presented in Chapters 10, 11, and 12.

Next looking at BLUP of u under model (1), it is readily seen that û of (4) is BLUP. Similarly, variances and covariances involving û and û - u are easily derived from the results for known G and R. Let [C11 C12; C′12 C22] be a g-inverse of the matrix of (4). Then

    Cov(K′β°, û′ - u′) = K′C12 σ_e²,     (8)
    Var(û) = (G* - C22)σ_e²,     (9)
    Var(û - u) = C22 σ_e².     (10)

2 Tests of Hypotheses

In the same way in which G and R known to proportionality pose no problems in BLUE and BLUP, exact tests of hypotheses regarding β can be performed, assuming as before a multivariate normal distribution. Chapter 4 describes computation of a quadratic, s, that is distributed as χ² with m - a degrees of freedom when the null hypothesis is true, where m and a are the numbers of rows in H′0 and H′a respectively. Now we compute these quadratics exactly as in those methods except that V*, G*, R* are substituted for V, G, R. Then when the null hypothesis is true, s/[σ̂_e²(m - a)] is distributed as F with m - a and n - rank(X) degrees of freedom, where σ̂_e² is computed by (6) or (7).

3 Power Of The Test Of Null Hypotheses

Two different types of errors can be made in tests of hypotheses. First, the null hypothesis may be rejected when in fact it is true. This is commonly called a Type 1 error. Second, the null hypothesis may be accepted when it is really not true. This is called a Type 2 error, and the power of the test is defined as 1 minus the probability of a Type 2 error. The results that follow regarding power assume that G* and R* are known.

The power of the test can be computed only if

1. The true value of β for which the power is to be determined is specified. Different values of β give different powers. Let this value be β_t. Of course we do not know the true value, but we may be interested in the power of the test, usually for some minimum differences among elements of β. Logically β_t must be true if the null and the alternative hypotheses are true. Accordingly a β_t must be chosen that violates neither H′0 β = c0 nor H′a β = ca.

2. The probability of the type 1 error must be specified. This is often called the chosen significance level of the test.

3. The value of σ_e² must be specified. Because the power should normally be computed prior to the experiment, this would come from prior research. Define this value as d.

4. X and Z must be specified.

Then let

    A = significance level,
    F1 = m - a = numerator d.f.,
    F2 = n - rank(X) = denominator d.f.

Compute λ = the quadratic, s, but with Xβ_t substituted for y in the computations. Compute

    P = [λ/(m - a + 1)d]^(1/2)     (11)

and enter Tiku's table (1967) with A, F1, F2, P to find the power of the test.

Let us illustrate computation of power by a simple one-way fixed model,

    y_ij = μ + t_i + e_ij,  i = 1, 2, 3,  Var(e) = Iσ_e².

Suppose there are 3, 2, 4 observations respectively on the 3 treatments. We wish to test H′0 β = 0, where

    H′0 = [0 1 0 -1; 0 0 1 -1],

against the unrestricted hypothesis. Suppose we want the power of the test for

    β′_t = [10, 2, 1, -3]  and  σ_e² = 12.

That is, d = 12. Then

    (Xβ_t)′ = [12, 12, 12, 11, 11, 7, 7, 7, 7].

As we have shown, the reduction under the null hypothesis in this case can be found from the reduced model E(y) = μ1. The OLS equations are

    [9 3 2 4; 3 3 0 0; 2 0 2 0; 4 0 0 4] (μ°; t°) = (86, 36, 22, 28)′.

A solution is (0, 12, 11, 7), and reduction = 870. The restricted equations are 9μ° = 86, and the reduction is 821.78. Then s = 48.22 = λ. Let us choose A = .05 as the significance level.

    F1 = 2 - 0 = 2,
    F2 = 9 - 3 = 6,
    P = [48.22/3(12)]^(1/2) = 1.157.

Entering Tiku's table we obtain the power of the test.

oto

=86362228.Asolutionis(0,12,11,7),andreduction=870. Therestrictedequationsare9o=86,4andthereductionis821.78. Thens = 48.22 = . LetuschooseA=.05asthesignicancelevelF1= 2 0 = 2.F2= 9 3 = 6.P =48.223(12)= 1.157.EnteringTikustableweobtainthepowerofthetest.5Chapter7KnownFunctionsofFixedEectsC.R.Henderson1984-GuelphInprevious chapters we have dealt withlinear relationships amongof thefollowingtypes.1. M

isasetofp-rnon-estimablefunctionsof, andasolutiontoGLSormixedmodelequationsisobtainedsuchthatM

o= c.2. K

isasetofrestimablefunctions. Thenwewriteasetofequations,thesolutiontowhichyieldsdirectlyBLUEofK

.3. H

isasetofestimablefunctionsthatareusedinhypothesistesting.Inthischapterweshallbeconcernedwithdenedlinearrelationshipsoftheform,T

= c.All of these are linearlyindependent. The consequence of these relationships is thatfunctions of maybecome estimable that are not estimable under amodel withnosuchdenitions concerning. Infact, if T

represents p r linearlyindependentnon-estimablefunctions,alllinearfunctionsofbecomeestimable.1 TestsofEstimabilityIf T

representst 0,1 1 0 01 0 1 01 0 0 1t1t2t3.Further with 1being just , and K

1being 1, and X

1= (1 1 1), K

11is estimable underamodelE(yij) = .Supposeincontrastthatwewanttoimposeaprioronjustt3. Then

1=(t1t2)and2= t3. NowK

1

1=1 1 01 0 11 0 0t1t2.Butthethirdrowrepresentsanon-estimablefunction. Thatis,isnotestimableunderthemodelwith

1= (t1t2). Consequently + t3shouldnotbeestimatedinthisway.7Asanotherexamplesupposewehavea2 3xedmodelwithn23= 0andallothernij> 0. Wewanttoestimateallsixij= + ai + bj + ij. Withnomissingsubclassestheseareestimable, sotheyarecandidatesforestimation. Supposeweusepriorson.Then(K

1K

2)

12

=1 1 0 1 0 0 1 0 0 0 0 01 1 0 0 1 0 0 1 0 0 0 01 1 0 0 0 1 0 0 1 0 0 01 0 1 1 0 0 0 0 0 1 0 01 0 1 0 1 0 0 0 0 0 1 01 0 1 0 0 1 0 0 0 0 0 1a1a2b1b2b3.NowK

11isestimableunderamodel, E(yijk)= + ai + bj. Consequentlywecanbyourrulesestimateallsixij. Thesewillhaveexpectationsasfollows.E( ij) = + ai + bj +somefunctionof = + ai + bj +ij.Nowsupposewewishtoestimatebyusingaprioronlyon23. ThenthelastrowofK

1is + a2 + b3butthisisnotestimableunderamodelEy11y12y13y21y22y23= + a1 + b1 +11 + a1 + b2 +12 + a1 + b3 +13 + a2 + b1 +21 + a2 + b2 +22 + a2 + b3.Consequentlyweshouldnotuseaprioronjust23.7 TestsOfHypothesesExact tests of hypotheses do not exist when biased estimation is used, but one mightwishtousethefollowingapproximateteststhatarebasedonusingmeansquarederrorofK

oratherthanV ar(K

o).7.1 V ar(e) = I2eWhenV ar(e) = I2ewrite(7)as(32)or(8)as(33). UsingthenotationofChapter6,G = G2eandP = P2e.X

1X1X

1X2X

1ZPX

2X1PX

2X2 +IPX

2ZZ

X1Z

X2Z

Z +G112u =X

1yPX

2yZ

y. (32)8Thecorrespondingequationswithsymmetriccoecientmatrixarein(33).X

1X1X

1X2PX

1ZPX

2X1PX

2X2P +PPX

2ZZ

X1Z

X2PZ

Z +G11u =X

1yPX

2yZ

y (33)Then2=P2.Letag-inverseofthematrixof(32)post-multipliedbyI 0 00 P 00 0 I Qorag-inverseofthematrix(33)pre-multipledandpost-multipliedbyQbedenotedby

C11C12C21C22

,where C11has order p p and C22has order q q. Then ifP= P, mean squared errorofK

isK

C11K2e. Then(K

c)

[K

C11K]1(K

c)/s 2eis distributed under the null hypothesis approximately as F with s, t degrees of freedom,wheres=numberofrows(linearlyindependent)inK

, and 2eisestimatedunbiasedlywithtdegreesoffreedom.7.2 V ar(e) = RLetg-inverseof(7)post-multipliedbyI 0 00 P 00 0 I Qorag-inverseof(8)pre-multipliedandpost-multipliedbyQbedenotedby

C11C12C21C22

.ThenifR=R,G=G, andP=P, K

C11Kisthemeansquarederrorof K

, and(K

c)

(K

C11K)1(K

c)isdistributedapproximatelyas2withsdegreesoffreedomunderthenullhypothesis,K

= c.98 EstimationofPIf one is to use biased estimation and prediction, one would usually have to estimateP, ordinarilyasingularmatrix. Iftheelementsof2arethoughttohavenoparticularpattern,permutationtheorymight beusedto deriveaveragevaluesofsquaresandprod-ucts of elements of 2, that is the value of P. We might then formulate this as estimationof a variance covariance matrix, usually with fewer parameters than t(t +1)/2, where t istheorderofP. IthinkIwouldestimatetheseparametersbytheMIVQUEmethodforsingularGdescribedinSection9ofChapter11orbyREMLofChapter12.9 IllustrationWeillustratebiasedestimationbya3-waymixedmodel. Themodelisyhijk= rh + ci +hi + uj + eijk,r, c, arexed,V ar(u) = I/10, V ar(e) = 2I.Thedataareasfollows:Levelsofjhisubclasses 1 2 3 yhi..11 2 1 0 1812 0 1 1 1313 1 0 0 721 1 2 1 2622 0 0 1 9y..j. 25 27 21Wewanttoestimateusingpriorvaluesofthesquaresandproductsofhi. Supposethisisasfollows,orderingiwithinh,andincluding23..1 .05 .05 .1 .05 .05.1 .05 .05 .1 .05.1 .05 .05 .1.1 .05 .05.1 .05.1.TheequationsoftheformX

1R1X1X

1R1X2X

1R1ZX

2R1X1X

2R1X2X

2R1ZZ

R1X1Z

R1X2Z

R1Z12u =X

1R1yX

2R1yZ

R1y10arepresentedin(34).126 0 3 2 1 3 2 1 0 0 0 3 2 15 4 1 0 0 0 0 4 1 0 1 2 27 0 0 3 0 0 4 0 0 3 3 13 0 0 2 0 0 1 0 0 1 21 0 0 1 0 0 0 1 0 03 0 0 0 0 0 2 1 02 0 0 0 0 0 1 11 0 0 0 1 0 04 0 0 1 2 11 0 0 0 10 0 0 04 0 04 03rcu=38354422718137269025272112(34)Notethat23isincludedeventhoughnoobservationonitexists.Pre-multiplyingtheseequationsbyI 0 00 P 00 0 I Tand adding I to the diagonals of equations (6)-(11) and 10I to the diagonals of equations(12)-(14) we obtain the coecient matrix to solve for the biased estimators and predictors.Therighthandsidevectoris(19, 17.5, 22, 11, 3.5, .675, .225, .45, .675, .225, .45, 12.5, 13.5, 10.5)

.Thisgivesasolutionofr= (3.6899, 4.8607),c= (1.9328, 3.3010, 3.3168),= (.11406, .11406, 0, .11406, .11406, 0),u= (.00664, .04282, .03618).Notethat

iij= 0fori =1, 2, and

jij= 0forj =1, 2, 3.11Thesearethesamerelationshipsthatweredenedfor.Post-multiplying the g-inverse of the coecient matrix by T we get (35) . . . (38) andthematrixforcomputingmeansquarederrorsforM

(r, c, , u). Thelower9 9submatrixissymmetricandinvariantreectingthefactthat,anduareinvarianttotheg-inversetaken.Upperleft7 7.26181 .10042 .02331 0 .15599 .02368 .00368.05313 .58747 .22911 0 .54493 .07756 .00244.05783 .26296 .41930 0 .35232 .02259 .00741.56640 .61368 .64753 0 1.02228 .00080 .03080.29989 .13633 .02243 0 2.07553 .07567 .04433.02288 .07836 .02339 0 .07488 .08341 .03341.02712 .02836 .02339 0 .07512 .03341 .08341(35)Upperright7 7.02 .02368 .00368 .02 .03780 .01750 .00469.08 .07756 .00244 .08 .01180 .01276 .03544.03 .02259 .00741 .03 .01986 .02631 .00617.03 .00080 .03080 .03 .02588 .01608 .04980.12 .07567 .04433 .12 .05563 .01213 .00350.05 .08341 .03341 .05 .00199 .00317 .00118.05 .03341 .08341 .05 .00199 .00317 .00118(36)Lowerleft7 7.05 .05 0 0 .15 .05 .05.02288 .07836 .02339 0 .07488 .08341 .03341.02712 .02836 .02339 0 .07512 .03341 .08341.05 .05 0 0 .15 .05 .05.01192 .01408 .04574 0 .08l51 .00199 .00199.03359 .02884 .01023 0 .02821 .00317 .00317.05450 .08524 .05597 0 .05330 .00118 .00118(37)Lowerright7 7.10 .05 .05 .10 0 0 0.08341 .03341 .05 .00199 .00317 .00118.08341 .05 .00199 .00317 .00118.10 0 0 0.09343 .00537 .00120.09008 .00455.09425(38)12Ag-inverseofthecoecientmatrixofequationslike(8)isin(39) . . . (41).This gives a solution (1.17081, 0, 6.79345, 8.16174, 8.17745, 0, 0, .76038, 1.52076, 0, 0,.00664, .04282, .03