Bio Metrical Model

8/4/2019 Bio Metrical Model


TAMIL NADU VETERINARY AND ANIMAL SCIENCES UNIVERSITY

VETERINARY COLLEGE AND RESEARCH INSTITUTE

NAMAKKAL 637 002

AGB 605

BIOMETRICAL TECHNIQUES IN ANIMAL BREEDING 3 + 1

Term paper

On

BIOMETRICAL MODELS AND MAXIMUM LIKELIHOOD METHOD IN ANIMAL BREEDING

SUBMITTED TO

DR. A.K. THIRUVENKADAN Ph.D.,

ASSOCIATE PROFESSOR

DEPARTMENT OF ANIMAL GENETICS AND BREEDING

NAMAKKAL.

SUBMITTED BY

A. RAMACHANDRAN

MVN 10001 (AGB)

DEPARTMENT OF ANIMAL GENETICS AND BREEDING

VETERINARY COLLEGE AND RESEARCH INSTITUTE

NAMAKKAL 637 002

2011



BIOMETRICAL MODELS AND MAXIMUM LIKELIHOOD METHOD IN ANIMAL BREEDING

The results of progeny testing are expressed as an index of the genetic worth of the sire, known as the sire index. In other words, the sire index is an attempt to express what a sire would have produced had he been a cow. It is the operational part of progeny testing, also called the sire proof. The sire index gives a numerical value that indicates the production ability of the sire and allows bulls to be ranked in order of merit so that the best can be chosen.

The indices that have been developed serve two purposes:

1. Indices that simply rank the sires.

2. Indices that provide estimates of the breeding value of sires.

The breeding value (BV) is estimated for indexing within a single herd as well as across many herds.

The main objective of modelling in animal breeding is to estimate the breeding value of an animal. The breeding value of an individual is the sum of the average effects of the genes that the individual receives from both parents. Each parent contributes a sample half of its genes to its progeny, and the average effect of that sample half is the transmitting ability of the parent, which equals one-half of its breeding value.

A model can be defined as a physical, mathematical or otherwise logical representation of a system, entity, phenomenon or process. For any model, the information available in the form of records is the phenotype of the individual. The basic model partitions the phenotype into genetic and environmental components:

Phenotype = identifiable environmental effects + genetic effects + residual effects

$y_{ij} = \mu_i + g_i + e_{ij}$

where

$y_{ij}$ is the jth record of the ith animal;

$\mu_i$ refers to the identifiable non-random (fixed) environmental effects such as herd management, year of birth or sex of the animal;

$g_i$ is the sum of the additive, dominance and epistatic genetic values of the genotype of animal i; and

$e_{ij}$ is the sum of the random environmental effects affecting animal i.

The additive genetic value in the $g$ term above represents the average additive effects of the genes an individual receives from both parents and is termed the breeding value. Since the additive genetic value is a function of the genes transmitted from parents to progeny, it is the only component that can be selected for and is therefore the main component of interest. In most cases, dominance and epistasis, which represent intralocus and interlocus interactions respectively, are assumed to be of little significance and are included in the $e_{ij}$ term of the model. The assumptions of the linear model are as follows.

$y$ follows a multivariate normal distribution, implying that traits are determined by infinitely many additive genes of infinitesimal effect at unlinked loci, the so-called infinitesimal model (Fisher, 1918; Bulmer, 1980).

The additive genetic and environmental variances ($V_a$ and $V_e$, that is $\sigma^2_a$ and $\sigma^2_e$) are known, or at least their ratio is known. There is no correlation between $g$ and $e$ ($\mathrm{cov}(g_i, e_{ij}) = 0$) and no correlation among mates ($\mathrm{cov}(e_{ij}, e_{ik}) = 0$).

Also $\mu$, the mean performance of the animals in the same management group, is assumed to be known.

Accurate prediction of breeding value is an important component of any breeding programme, since genetic improvement through selection depends on correctly identifying the individuals with the highest true breeding value. The method used to predict breeding value depends on the type and amount of information available on the candidates for selection.

Single record per individual

$\mathrm{EBV} = b(y_i - \mu)$

where $b$ is the regression of true breeding value on phenotypic performance and $\mu$, the mean performance of animals in the same management group, is assumed to be known.

$b = \dfrac{\mathrm{cov}(a, y)}{\mathrm{var}(y)} = \dfrac{\mathrm{cov}(a, a + e)}{\mathrm{var}(y)} = \dfrac{\sigma^2_a}{\sigma^2_y} = h^2$



The prediction is simply the adjusted record multiplied by the heritability. The correlation between the selection criterion (in this case the phenotypic value) and the true breeding value is known as the accuracy of selection. It provides a means of evaluating different selection criteria: the higher the correlation, the better the criterion as a predictor of breeding value. The squared correlation is referred to as the reliability or repeatability of the prediction. For selection based on a single measurement per individual, the accuracy $r_{a,y}$ is the square root of $h^2$:

$r_{a,y} = h$
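As a minimal sketch of the single-record prediction above, the Python snippet below multiplies the adjusted record by the heritability and reports the accuracy as the square root of the heritability. The function names and the numbers (a record of 320 against a group mean of 250, with h² = 0.45) are illustrative assumptions, not values from the text.

```python
import math

def ebv_single_record(y, mu, h2):
    """Predict breeding value from one record: EBV = h2 * (y - mu)."""
    return h2 * (y - mu)

def accuracy_single_record(h2):
    """Accuracy r_{a,y} = h, the square root of the heritability."""
    return math.sqrt(h2)

# Illustrative values (assumed, not from the text)
ebv = ebv_single_record(y=320.0, mu=250.0, h2=0.45)
acc = accuracy_single_record(0.45)
print(f"EBV = {ebv:.1f}, accuracy = {acc:.2f}")
```

With these inputs the animal is predicted to be about 31.5 units above the group mean, with an accuracy of about 0.67.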

Repeated records

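When $n$ measurements on a single individual are available, the breeding value is predicted from their mean $\bar{y}$. With $t$ denoting the repeatability (the correlation between records on the same individual), the regression coefficient is

$b = \dfrac{\sigma^2_a}{[t + (1 - t)/n]\,\sigma^2_y} = \dfrac{n h^2}{1 + (n - 1)t}$

and the accuracy is

$r_{a,\bar{y}} = \sqrt{b}$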
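The effect of adding records can be sketched in a few lines of Python. The snippet assumes the regression nh²/[1 + (n-1)t] given above and takes the accuracy as the square root of that regression; the function name and the values h² = 0.3 and t = 0.5 are hypothetical.

```python
import math

def regression_repeated_records(n, h2, t):
    """Regression of breeding value on the mean of n records."""
    return n * h2 / (1.0 + (n - 1) * t)

h2, t = 0.3, 0.5  # assumed heritability and repeatability
for n in (1, 2, 5, 10):
    b = regression_repeated_records(n, h2, t)
    print(f"n = {n:2d}: b = {b:.3f}, accuracy = {math.sqrt(b):.3f}")

b1 = regression_repeated_records(1, h2, t)  # reduces to h2 for one record
b5 = regression_repeated_records(5, h2, t)
```

With a single record the regression reduces to the heritability, and the gain from each further record shrinks as the repeatability increases.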

Breeding value prediction from progeny

$b = \dfrac{2n}{n + k}$, where $k = \dfrac{4 - h^2}{h^2}$ and $n$ is the number of progeny

$r_{a,\bar{y}} = \sqrt{\dfrac{n}{n + k}}$
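A short Python sketch of the progeny formulas follows. It assumes b = 2n/(n + k) with k = (4 - h²)/h² and takes the accuracy as the square root of n/(n + k); the function name and the heritability of 0.25 are illustrative choices.

```python
import math

def progeny_regression_and_accuracy(n, h2):
    """Return (b, accuracy) for a sire evaluated on the mean of n progeny."""
    k = (4.0 - h2) / h2
    b = 2.0 * n / (n + k)
    accuracy = math.sqrt(n / (n + k))
    return b, accuracy

h2 = 0.25  # assumed heritability, giving k = 15
for n in (5, 15, 50, 100):
    b, acc = progeny_regression_and_accuracy(n, h2)
    print(f"n = {n:3d}: b = {b:.3f}, accuracy = {acc:.3f}")

b15, acc15 = progeny_regression_and_accuracy(15, h2)
```

The accuracy approaches 1 as the number of progeny grows, which is why progeny testing is valuable for traits of low heritability.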

Breeding value prediction from pedigree

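$\hat{a}_o = \dfrac{\hat{a}_s + \hat{a}_d}{2}$

$r_{a,\hat{a}_o} = \tfrac{1}{2}\sqrt{r^2_s + r^2_d}$

where $\hat{a}_s$ and $\hat{a}_d$ are the predicted breeding values of the sire and dam, and $r_s$ and $r_d$ are their accuracies.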
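The pedigree prediction can be sketched as follows, reading the accuracy formula as one-half of the square root of the summed squared parental accuracies. The function names and the parental values are hypothetical.

```python
import math

def ebv_from_parents(ebv_sire, ebv_dam):
    """Parent average: half the sum of the parental breeding values."""
    return 0.5 * (ebv_sire + ebv_dam)

def accuracy_from_parents(r_sire, r_dam):
    """Accuracy of the parent average from the parents' accuracies."""
    return 0.5 * math.sqrt(r_sire ** 2 + r_dam ** 2)

# Illustrative parental evaluations (assumed)
ebv_offspring = ebv_from_parents(180.0, 70.0)
acc_offspring = accuracy_from_parents(0.97, 0.77)
acc_upper_bound = accuracy_from_parents(1.0, 1.0)
print(ebv_offspring, round(acc_offspring, 3), round(acc_upper_bound, 3))
```

Even when both parents are evaluated perfectly, the accuracy cannot exceed about 0.71, because the Mendelian sampling of genes in the offspring is not predicted by pedigree information.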

Breeding value prediction for one trait from another

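$b = r_{a_{xy}} h_x h_y \dfrac{\sigma_x}{\sigma_y}$

$r_{a_x,y} = r_{a_{xy}} h_y$

where $r_{a_{xy}}$ is the additive genetic correlation between traits x and y, and $\sigma_x$ and $\sigma_y$ are their phenotypic standard deviations.

The correlated response in trait x as a result of direct selection on y, with selection intensity $i$ and hence a selection differential of $i\sigma_y$, is

$CR_x = b \cdot i\sigma_y = i\, h_x h_y r_{a_{xy}} \sigma_x$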
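The relation between the regression coefficient and the correlated response can be checked numerically. This sketch assumes that the response in x equals the regression b multiplied by the selection differential in y (i times the phenotypic standard deviation of y), so the standard deviation of y cancels; all names and parameter values are hypothetical.

```python
def regression_x_on_y(r_a, h_x, h_y, sd_x, sd_y):
    """Regression of breeding value for x on the phenotype for y."""
    return r_a * h_x * h_y * sd_x / sd_y

def correlated_response(i, r_a, h_x, h_y, sd_x):
    """Expected change in x from selection on y with intensity i."""
    return i * h_x * h_y * r_a * sd_x

# Illustrative parameters (assumed)
r_a, h_x, h_y = 0.6, 0.5, 0.7   # genetic correlation and square roots of heritability
sd_x, sd_y = 20.0, 8.0          # phenotypic standard deviations
i = 1.4                         # selection intensity

b = regression_x_on_y(r_a, h_x, h_y, sd_x, sd_y)
cr_x = correlated_response(i, r_a, h_x, h_y, sd_x)
print(round(b, 3), round(cr_x, 2), round(b * i * sd_y, 2))
```

The last two printed values agree, which confirms that the two expressions describe the same quantity under the stated assumption.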

Selection Index (best linear prediction)



The selection index is a method of estimating the breeding value of an animal by combining all the information available on the animal and its relatives. It is the best linear prediction of an individual's breeding value. The numerical value obtained for each animal is referred to as the index (I), and it is the basis on which animals are ranked for selection. Suppose $y_1$, $y_2$ and $y_3$ are phenotypic values for animal i, its sire and its dam in the same herd. The index for this animal using this information would be

$I_i = \mathrm{EBV}_i = b_1(y_1 - \mu_1) + b_2(y_2 - \mu_2) + b_3(y_3 - \mu_3)$

where $b_1$, $b_2$ and $b_3$ are the factors by which each measurement is weighted and $\mu_1$, $\mu_2$ and $\mu_3$ are the corresponding management group means.

The accuracy of selection is given by $r_{a,I} = \sigma_I/\sigma_a$, where $\sigma^2_I = b'Pb$ and $P$ is the variance-covariance matrix of the phenotypic records.
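The following NumPy sketch works through the three-record index (own, sire and dam records). The text does not state how the weights are obtained; the sketch assumes the usual index equations Pb = g, where g holds the covariances of each record with the animal's true breeding value, and further assumes unrelated parents, no common environment and equal phenotypic variances. All values are illustrative.

```python
import numpy as np

h2 = 0.25               # assumed heritability
sigma2_y = 1.0          # phenotypic variance, set to 1 for convenience
sigma2_a = h2 * sigma2_y

# Phenotypic (co)variances among the animal's, sire's and dam's records.
# Parent-offspring records share half the additive variance; the parents
# are assumed unrelated, so their records are uncorrelated.
P = sigma2_y * np.array([[1.0, 0.5 * h2, 0.5 * h2],
                         [0.5 * h2, 1.0, 0.0],
                         [0.5 * h2, 0.0, 1.0]])

# Covariance of each record with the animal's true breeding value
g = sigma2_a * np.array([1.0, 0.5, 0.5])

b = np.linalg.solve(P, g)                   # index weights b1, b2, b3
accuracy = np.sqrt(b @ P @ b / sigma2_a)    # sigma_I / sigma_a
print("weights:", b.round(4), "accuracy:", round(float(accuracy), 3))
```

Under these assumptions the accuracy is about 0.57, compared with 0.5 from the animal's own record alone, which illustrates the gain from adding information on relatives.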

Best Linear Unbiased Prediction

The use of a selection index for genetic evaluation has certain disadvantages. In particular, records may have to be pre-adjusted for fixed or environmental factors, and these are assumed to be known, whereas in practice they usually are not. Henderson (1949) developed a methodology called best linear unbiased prediction (BLUP), by which fixed effects and breeding values can be estimated simultaneously. The name summarizes its properties:

Best: maximizes the correlation between the true ($a$) and predicted ($\hat{a}$) breeding value, or equivalently minimizes the prediction error variance (PEV).

Linear: predictors are linear functions of the observations.

Unbiased: the estimates of realized values of a random variable, such as animal breeding values, and of estimable functions of fixed effects are unbiased ($E(a \mid \hat{a}) = \hat{a}$).

Prediction: involves prediction of the true breeding value.

BLUP has found widespread usage in genetic evaluation of domestic animals because

of its desirable properties. This has evolved from simple models such as the sire model in its

early years to more complex models such as the animal, maternal and multivariate models in

recent years.

The mixed model is given by

$y = Xb + Za + e$

where

$y$ = n × 1 vector of observations; n = number of records

$b$ = p × 1 vector of fixed effects; p = number of levels of the fixed effects

$a$ = q × 1 vector of random animal effects; q = number of levels of the random effects

$e$ = n × 1 vector of random residual effects

$X$ = design matrix of order n × p, which relates records to fixed effects

$Z$ = design matrix of order n × q, which relates records to random animal effects

$\mathrm{Var}(a) = A\sigma^2_a = G$, where $A$ is the numerator relationship matrix among the animals, and $\mathrm{Var}(e) = I\sigma^2_e = R$

The solutions to the mixed model equations (MME) give the best linear unbiased estimate (BLUE) of estimable functions of the fixed effects ($k'b$) and the BLUP of the breeding values ($a$) under the following assumptions:

i. The distributions of $y$, $a$ and $e$ are assumed to be multivariate normal, implying that traits are determined by many additive genes of infinitesimal effect at many unlinked loci.

ii. The variances and covariances for the base population ($R$ and $G$, the variance-covariance matrices of the residuals and of the random animal effects) are assumed to be known, or at least known to proportionality. In practice, the variances and covariances of the base population are never known exactly but, assuming the infinitesimal model, they can be estimated by restricted maximum likelihood (REML) if the data include the information on which selection was based.

iii. The MME can take selection into account if it is based on a linear function of $y$ and there is no selection on information not included in the data.

Nicholas (1982) and Mrode (1996) have described the steps involved in using the MME of Henderson (1975) for the prediction of breeding values.
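The text refers to the MME without writing them out. As a hedged sketch, the NumPy code below assumes the usual single-trait form, in which the coefficient matrix has blocks X'X, X'Z, Z'X and Z'Z + αA⁻¹ with α = σ²e/σ²a, and builds the relationship matrix A by the tabular method. The pedigree, records, variances and all function and variable names are hypothetical.

```python
import numpy as np

def relationship_matrix(pedigree):
    """Numerator relationship matrix A by the tabular method.

    pedigree: list of (sire, dam) row indices, parents listed before
    their offspring, with None for an unknown parent.
    """
    n = len(pedigree)
    A = np.zeros((n, n))
    for i, (s, d) in enumerate(pedigree):
        # Diagonal: 1 + inbreeding (half the relationship between the parents)
        both_known = s is not None and d is not None
        A[i, i] = 1.0 + (0.5 * A[s, d] if both_known else 0.0)
        for j in range(i):
            a_js = A[j, s] if s is not None else 0.0
            a_jd = A[j, d] if d is not None else 0.0
            A[i, j] = A[j, i] = 0.5 * (a_js + a_jd)
    return A

# Hypothetical data: five animals, one record each on animals 1 to 4
pedigree = [(None, None), (None, None), (0, 1), (0, None), (2, 3)]
y = np.array([4.5, 2.9, 3.9, 3.5])
X = np.array([[1, 0], [0, 1], [0, 1], [1, 0]], dtype=float)  # one fixed effect, two levels
Z = np.zeros((4, 5))
Z[np.arange(4), [1, 2, 3, 4]] = 1.0    # record k belongs to animal k + 1
sigma2_a, sigma2_e = 20.0, 40.0        # variances assumed known
alpha = sigma2_e / sigma2_a

A = relationship_matrix(pedigree)
A_inv = np.linalg.inv(A)

# Mixed model equations: coefficient matrix and right-hand side
lhs = np.block([[X.T @ X, X.T @ Z],
                [Z.T @ X, Z.T @ Z + alpha * A_inv]])
rhs = np.concatenate([X.T @ y, Z.T @ y])

solution = np.linalg.solve(lhs, rhs)
b_hat, a_hat = solution[:2], solution[2:]
print("fixed effect solutions:", b_hat.round(3))
print("predicted breeding values:", a_hat.round(3))
```

Animal 0 has no record of its own but still receives a prediction through its relationships with the recorded animals, which is the practical advantage of including A in the equations. A direct solve is used here only because the system is tiny.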

The different models under a BLUP estimation are

Sire model: The application of a sire model implies that only sires are evaluated,

using progeny records. The main advantage with this model is that the number of equations is

reduced compared with an animal model since only sires are evaluated. However, with a sire

model, the genetic merit of the mate (dam of progeny) is not accounted for and can result in

bias in the predicted breeding value if there is preferential mating.

Animal model: In this model the individual animal is taken as the source of genetic variation, so breeding values are predicted for every animal and the genetic merit of mates is accounted for, which avoids the bias from preferential mating described for the sire model. Since it also takes the dams into consideration, the animal model can be extended to estimate variance components due to maternal, common environmental and permanent environmental effects. However, the number of equations to be solved is larger, and the model requires more computing power.

Reduced animal model: To reduce the total number of equations to be solved, the equations are set up for parents only, and the breeding values of progeny are then obtained from the breeding values of their parents. This model was developed by Quaas and Pollak (1980).

Animal models with groups: In the usual animal model, the breeding values of animals in subsequent generations are expressed relative to those of the base animals. If the base animals differ in mean genetic merit, for example because they come from different countries, this must be accounted for in the model. The sires are grouped on the basis of time period and country of origin. Within a country, the four selection paths (sire of sires, sire of dams, dam of sires and dam of dams) are usually assumed to be of different genetic merit, and this is accounted for in the grouping strategy.

In some circumstances, environmental factors constitute an important component of

the covariance between individuals such as members of a family reared together (common

environment), or between the records of an individual (permanent environmental effects).

Such effects are included in the model to ensure accurate prediction of breeding value.

Repeatability model

The repeatability model is appropriate when multiple measurements on the same trait

are recorded on an individual, such as litter size in successive pregnancies or milk yield in

successive lactations. For an animal, the model always assumes a genetic correlation of unity

between all pairs of records, equal variance for all records and equal environmental

correlation between all pairs of records. The repeatability model is given by

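$y = Xb + Za + Wpe + e$

where $pe$ is the vector of permanent environmental effects, $W$ is the design matrix relating records to those effects, and $\mathrm{Var}(pe) = I\sigma^2_{pe}$ is the additional permanent environmental variance to be estimated.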
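The assumptions of the repeatability model can be made explicit with a short worked derivation. It assumes that the animal, permanent environmental and residual effects are mutually uncorrelated, and it uses $t$ for the repeatability; the scalar notation is an illustrative choice rather than the author's.

```latex
% Two records j and k (j \neq k) on the same animal i
\begin{align*}
\mathrm{var}(y_{ij}) &= \sigma^2_a + \sigma^2_{pe} + \sigma^2_e = \sigma^2_y \\
\mathrm{cov}(y_{ij}, y_{ik}) &= \sigma^2_a + \sigma^2_{pe} \\
t = \frac{\mathrm{cov}(y_{ij}, y_{ik})}{\sigma^2_y}
  &= \frac{\sigma^2_a + \sigma^2_{pe}}{\sigma^2_y} \;\geq\; h^2
\end{align*}
```

Because the covariance is the same for every pair of records, the model implies an equal correlation between all pairs, and the repeatability sets an upper limit on the heritability.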

Apart from the resemblance between records of an individual due to permanent environmental conditions, a common environment contributes to the similarity between individuals of a family reared together. This increases the variance between families. Sources of common environmental variance between families include nutrition and climatic conditions. This component must be accounted for when dealing with full-sibs or maternal half-sibs, where the influence of the dam also adds to the environmental component of variance.

Maternal trait models

The phenotypic expression of some traits in the progeny, such as weaning weight in beef cattle, is influenced by the ability of the dam to provide a suitable environment in the form of better nourishment. The dam contributes to the progeny in two ways: firstly through her direct genetic effects passed to the progeny, and secondly through her ability to provide a suitable environment, for instance in producing milk. Hence the phenotype may be partitioned into the following:

1. Additive genetic effects from the sire and the dam, usually termed direct genetic effects.

2. Additive genetic ability of the dam to provide a suitable environment, usually termed indirect or maternal genetic effects.

3. Permanent environmental effects, which include permanent environmental influences on the dam's mothering ability and maternal non-additive genetic effects of the dam.

4. Other random environmental effects, termed residual effects.

The model can be represented as

$y = Xb + Za + Sm + Wpe + e$

where $m$ is the vector of maternal genetic effects and $S$ is the design matrix relating records to those effects.

Methods of estimation in linear models

Variance components are commonly used in formulating appropriate designs, in establishing quality control procedures and, in statistical genetics, in estimating heritabilities and genetic correlations. Traditionally, the estimators used most often have been the analysis of variance (ANOVA) estimators, which are obtained by equating observed and expected mean squares from an analysis of variance and solving the resulting equations. If the data are balanced, the ANOVA estimators have many appealing properties. With unbalanced data these properties rarely hold, which makes it difficult to arrive at correct decisions, and in practice variance components are mostly estimated from unbalanced data. For unbalanced data, two general classes of estimators have attracted considerable interest: maximum likelihood and restricted maximum likelihood (ML and REML), and minimum norm and minimum variance quadratic unbiased estimation (MINQUE and MIVQUE). The links between these classes are also important. In addition to the estimation problems of the unbalanced case, robust estimation, which limits the influence of outliers and of departures from the underlying statistical assumptions, is also of interest.

The classical least squares model contains only one random element, the random error; all other effects are assumed to be fixed constants. For this class of models, the assumption of independence of the $\varepsilon_i$ implies independence of the $y_i$. That is, if $\mathrm{Var}(\varepsilon) = I\sigma^2$, then $\mathrm{Var}(y) = I\sigma^2$ also. Such models are called fixed effects models or, more simply, fixed models. There are situations in which there is more than one random term. The classical variance components problem, in which the purpose is to estimate components of variance rather than specific treatment effects, is one example. In these cases, the treatment effects are assumed to be a random sample from a population of such effects, and the goal of the study is to estimate the variance among these effects in the population. The individual effects that happen to be observed in the study are not of any particular interest except for the information they provide on the variance component. Models in which all effects are assumed to be random are called random models. Models that contain both fixed and random effects are called mixed models.

The method of least squares chooses the estimate that minimizes the sum of squared differences between the observed values of $y$ and their expected values. It requires assumptions only about the expected value of the response variable and possibly its variance-covariance structure, not about its full distribution (Dobson and Barnett, 2008).
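A small least-squares fit illustrates the idea. The sketch uses NumPy's `numpy.linalg.lstsq` on a hypothetical fixed model with two herd levels; the data and the variable names are invented for illustration.

```python
import numpy as np

# Hypothetical records from two herds (columns of X are herd indicators)
X = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float)
y = np.array([10.0, 12.0, 7.0, 8.0, 9.0])

# Least squares: minimize the sum of squared deviations of y from X b
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ b_hat
rss = float(residuals @ residuals)

print("herd estimates:", b_hat)
print("residual sum of squares:", rss)
```

For this design the estimates are simply the herd means (11 and 8), and the residuals are orthogonal to every column of X, which is the defining property of the least-squares solution.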

Maximum likelihood (ML) estimation is a powerful logic that can be applied to any form of statistical inference. For a given set of parameters defining a statistical model, their likelihood is defined as the probability of observing the actual data in hand if those parameter values were true. Parameter values with low likelihoods are therefore those under which observing the actual data would be a rare event, and so forth. The probability is calculated from assumptions about the statistical distribution of the data, usually that it is multivariate normal. An ML analysis then simply identifies the set of parameters that maximizes the likelihood of observing the actual data. To evaluate the likelihood of the mixed model, it is assumed that both the additive genetic effects and the residual errors are normally distributed, and hence that the trait $y$ is also normally distributed (in practice, REML estimators are fairly robust to this assumption).

ML estimates of variance components have the undesirable property of being statistically biased, because they fail to account for the degrees of freedom lost in estimating the fixed effects. This generates bias even when the only fixed effect considered is the mean, and the bias can be considerable for larger numbers of fixed effects (Meyer, 1989). As a result, an ML approach will underestimate the residual variance. However, the bias can be avoided by considering a restricted maximum likelihood (REML), in which only the likelihood of that part of the data that does not depend on the fixed effects is considered (Patterson & Thompson, 1971). To obtain REML rather than ML estimators for the mixed model, the likelihood is maximized for a transformed vector $y^*$, where $y^*$ contains the data transformed by a particular matrix $K$ (so $y^* = Ky$), and $K$ depends on the design matrix $X$ such that $KX = 0$. The REML estimates are essentially the ML estimates for these transformed variables.
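The bias of ML and the role of the transformation K can be illustrated in the simplest setting, a model with fixed effects and a single residual variance. This NumPy sketch is an assumption-laden toy example, not the animal-model REML algorithm: the data are simulated, the design is hypothetical, and K is taken as an orthonormal basis of the space orthogonal to the columns of X, which is one valid choice satisfying KX = 0.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 12
# Hypothetical design: overall mean plus two contrasts of a three-level effect
level = np.arange(n) % 3
X = np.column_stack([np.ones(n), level == 1, level == 2]).astype(float)
y = X @ np.array([10.0, 2.0, -1.0]) + rng.normal(scale=2.0, size=n)

p = np.linalg.matrix_rank(X)
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
rss = np.sum((y - X @ b_hat) ** 2)

sigma2_ml = rss / n            # ignores the p degrees of freedom used by b_hat
sigma2_reml = rss / (n - p)    # accounts for them

# One valid K: rows form an orthonormal basis orthogonal to the columns of X
U, _, _ = np.linalg.svd(X, full_matrices=True)
K = U[:, p:].T                 # (n - p) x n matrix with K X = 0
y_star = K @ y                 # transformed data, free of the fixed effects
sigma2_from_y_star = y_star @ y_star / (n - p)   # ML variance of y_star

print(round(sigma2_ml, 3), round(sigma2_reml, 3), round(sigma2_from_y_star, 3))
```

The ML estimate is smaller than the REML estimate by the factor (n - p)/n, and the ML estimate computed from the transformed data reproduces the REML value, in line with the statement that REML is essentially ML applied to Ky.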

Predicting breeding values

An individual's breeding value for a given phenotypic trait is the total additive effect of its genes on that trait (Falconer & Mackay, 1996). Armed with estimates of the variance components that define $V$, the variance-covariance matrix of the observations, we can return to the mixed model to make predictions of individual additive genetic effects, or breeding values, and estimates of fixed effects. These are known as BLUPs and BLUEs, respectively: best (because they minimize error variance), linear (they are linear functions of the data), unbiased (their expected value equals that of the quantity they are estimating), predictors (for random effects) or estimates (for fixed effects). The BLUE of the fixed effects is simply the generalized least-squares estimator.
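For reference, the estimators can be written explicitly. The expressions below are the standard forms for the mixed model $y = Xb + Za + e$, under the assumption that $\mathrm{Var}(a) = G = A\sigma^2_a$ and $\mathrm{Var}(e) = R = I\sigma^2_e$; they are supplied here as a sketch because the equation referred to in the text is not reproduced.

```latex
\begin{align*}
V &= \mathrm{Var}(y) = ZGZ' + R \\
\hat{b} &= (X'V^{-1}X)^{-}X'V^{-1}y
  && \text{(BLUE: generalized least squares)} \\
\hat{a} &= GZ'V^{-1}(y - X\hat{b})
  && \text{(BLUP of the breeding values)}
\end{align*}
```

Solving the mixed model equations gives the same $\hat{b}$ and $\hat{a}$ without inverting $V$, which is what makes them practical for large data sets.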

Solution to linear models

The methods used to solve the linear models can be broadly divided into:

1. Direct inversion of the coefficient matrix.

2. Iteration on the MME, carried out by Jacobi or Gauss-Seidel iteration.

3. Iteration on the data, in which the equations for each level of an effect are set up as the data are read and solved by either of the above iterative schemes.

Mrode (1996) has given a detailed description of the different models and of solving the linear equations, with appropriate examples.
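A minimal sketch of Gauss-Seidel iteration is given below. The coefficient matrix and right-hand side are small hypothetical stand-ins for a set of mixed model equations, and the function name, tolerance and iteration limit are illustrative choices.

```python
import numpy as np

def gauss_seidel(lhs, rhs, tol=1e-10, max_iter=10_000):
    """Solve lhs @ x = rhs by Gauss-Seidel iteration.

    Each unknown is updated in turn using the most recent values of the
    others; Jacobi iteration would instead use only the values from the
    previous round.
    """
    x = np.zeros_like(rhs, dtype=float)
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(len(rhs)):
            off_diagonal = lhs[i] @ x - lhs[i, i] * x[i]
            new_value = (rhs[i] - off_diagonal) / lhs[i, i]
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, iteration
    raise RuntimeError("Gauss-Seidel did not converge")

# Hypothetical symmetric, diagonally dominant system
lhs = np.array([[4.0, 1.0, 0.0],
                [1.0, 3.0, 1.0],
                [0.0, 1.0, 5.0]])
rhs = np.array([1.0, 2.0, 3.0])

x_iter, n_rounds = gauss_seidel(lhs, rhs)
x_direct = np.linalg.solve(lhs, rhs)   # direct solution for comparison
print(x_iter.round(6), "after", n_rounds, "rounds")
```

The iterative solution matches the direct one to within the tolerance. The appeal of iteration is that it never needs the inverse of the coefficient matrix, which matters when there are very many equations.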

Bayesian method of estimation

The Bayesian method is based on treating the parameter to be estimated ($\theta$) as a random variable and the data ($y$) as fixed. It is expressed by Bayes' equation

$P(\theta \mid y) \propto P(y \mid \theta)\,P(\theta)$

where $P(\theta)$ is the prior distribution, $P(y \mid \theta)$ is the likelihood and $P(\theta \mid y)$ is the posterior distribution on which estimates are based. This approach is more intuitive in that data, once observed, cannot be generated again, and the Bayesian principle takes this fact into account. The methodology is particularly useful when the distributional assumptions required by maximum likelihood, such as normality, are not fulfilled.
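The way a prior and data combine into a posterior can be shown with the simplest conjugate case: a normal prior for an unknown mean and normally distributed records with known variance. This is an illustrative assumption chosen because the posterior has a closed form; it is not a method described in the text, and the function name and numbers are hypothetical.

```python
import numpy as np

def normal_posterior(y, prior_mean, prior_var, data_var):
    """Posterior mean and variance of a normal mean with known data variance."""
    n = len(y)
    posterior_precision = 1.0 / prior_var + n / data_var
    posterior_mean = (prior_mean / prior_var + np.sum(y) / data_var) / posterior_precision
    return posterior_mean, 1.0 / posterior_precision

# Hypothetical records and prior
y = np.array([12.0, 15.0, 11.0, 14.0])
post_mean, post_var = normal_posterior(y, prior_mean=10.0, prior_var=4.0, data_var=9.0)
print(round(float(post_mean), 2), round(float(post_var), 2))
```

The posterior mean lies between the prior mean (10) and the sample mean (13), and the posterior variance is smaller than the prior variance, showing how the data update the prior.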

Simulation of data

As defined earlier, a model can be described as a physical, mathematical or otherwise logical representation of a system, entity, phenomenon or process. A simulation is the implementation or exercise of a model over time; hence the simulation, utilising models, becomes the dynamic representation of a real-world activity or entity. Simulation is done in order to obtain numerous sets of data under different circumstances so as to enable prediction and forecasting. It helps in obtaining data in greater volume, detail and accuracy. Real data can have disadvantages such as false positive significance, lack of power or absence of a true signal, which can be overcome by simulation. Simulation has been used in the development of new methods and in genetic models for disease. However, simulated data are much cleaner than real data and can never replace them.
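As a minimal sketch of data simulation, the code below generates breeding values and phenotypes under the simple model y = μ + a + e and checks that the simulated data recover the parameters used to create them. The variances, the mean, the sample size and the seed are arbitrary assumptions.

```python
import numpy as np

def simulate_phenotypes(n, mu, sigma2_a, sigma2_e, seed=0):
    """Simulate true breeding values and phenotypes for n unrelated animals."""
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, np.sqrt(sigma2_a), n)   # additive genetic values
    e = rng.normal(0.0, np.sqrt(sigma2_e), n)   # residual environmental effects
    return a, mu + a + e

# Assumed parameters: heritability = 25 / (25 + 75) = 0.25
a, y = simulate_phenotypes(n=100_000, mu=250.0, sigma2_a=25.0, sigma2_e=75.0)

realized_h2 = np.var(a) / np.var(y)
realized_accuracy = np.corrcoef(a, y)[0, 1]     # expected to be near h = 0.5
print(round(float(realized_h2), 3), round(float(realized_accuracy), 3))
```

Because the true breeding values are known in simulated data, the accuracy of any prediction method can be measured directly, which is not possible with real records.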

References

Dobson, A. and Barnett, G. (2008). An Introduction to Generalized Linear Models. CRC Press, London.

Harvey, W. R. (1990). Mixed Model Least-squares and Maximum Likelihood Computer Programme. PC-2 version. Ohio State University, Columbus.

Henderson, C. R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31: 423-447.

Kruuk, L. E. B. (2004). Estimating genetic parameters in natural populations using the Animal Model. Phil. Trans. R. Soc. Lond., 359: 873-890.

Meyer, K. (1989). Restricted maximum-likelihood to estimate variance components for animal models with several random effects using a derivative-free algorithm. Genet. Sel. Evol., 21: 317-340.

Meyer, K. (1998). DFREML User Notes. University of New England, Armidale, Australia.

Mrode, R. A. (1996). Linear Models for the Prediction of Animal Breeding Values. CAB International, UK.

Nicholas, F. W. (1982). Veterinary Genetics.

Patterson, H. D. and Thompson, R. (1971). Recovery of interblock information when block sizes are unequal. Biometrika, 58: 545-554.

