Bio Metrical Model


  • 8/4/2019 Bio Metrical Model

    1/13

    TAMIL NADU VETERINARY AND ANIMAL SCIENCES UNIVERSITY

    VETERINARY COLLEGE AND RESEARCH INSTITUTE

    NAMAKKAL 637 002

    AGB 605

    BIOMETRICAL TECHNIQUES IN ANIMAL BREEDING 3 + 1

    Term paper

    On

    BIOMETRICAL MODELS AND MAXIMUM LIKELIHOOD METHOD IN ANIMAL BREEDING

    SUBMITTED TO

    DR. A.K. THIRUVENKADAN Ph.D.,

    ASSOCIATE PROFESSOR

    DEPARTMENT OF ANIMAL GENETICS AND BREEDING

    NAMAKKAL.

    SUBMITTED BY

    A. RAMACHANDRAN

    MVN 10001 (AGB)

    DEPARTMENT OF ANIMAL GENETICS AND BREEDING

    VETERINARY COLLEGE AND RESEARCH INSTITUTE

    NAMAKKAL 637 002

    2011


    BIOMETRICAL MODELS AND MAXIMUM LIKELIHOOD METHOD IN ANIMAL BREEDING

    The results of progeny testing are expressed as an index of the genetic worth of the sire, known as the sire index. In other words, the sire index is an attempt to express what a sire would have produced had he been a cow. It is the operational part of progeny testing, also called the sire proof. The sire index yields a numerical value that indicates the production ability of the sire, and it helps in ranking bulls in order of merit so that the best can be chosen.

    The different indices have been developed for two purposes, viz.

    1. Indices that simply rank the sires.

    2. Indices that provide estimates of the breeding value of sires.

    The breeding value (B.V.) is estimated for indexing within a single herd as well as for indexing across many herds.

    The main objective of modelling in animal breeding is to estimate the breeding value of an animal. The breeding value of an individual is represented by the average effect of the genes the individual receives from both parents. Each parent contributes a sample half of its genes to its progeny, and this sample half is the transmitting ability of the parent.

    A model can be defined as a physical, mathematical or otherwise logical

    representation of a system, entity, phenomenon, or process. For any model the information

    that is available in the form of records is the phenotype of the individual. The basic animal

    model partitions the phenotype into genotype and environment.

    Phenotype = genetic effects + environmental effects + residual effects

    $y_{ij} = \mu_i + g_i + e_{ij}$

    where

    $y_{ij}$ is the jth record of the ith animal,

    $\mu_i$ refers to the identifiable non-random environmental effects such as herd management, year of birth or sex of the animal,

    $g_i$ is the sum of the additive, dominance and epistatic genetic values of the genotype of animal i, and

    $e_{ij}$ is the sum of random environmental effects affecting animal i.

    The additive genetic value in the term $g_i$ above represents the average additive effects of the genes an individual receives from both parents and is termed the breeding value. Since the additive genetic value is a function of the genes transmitted from parents to progeny, it is the only component that can be selected for and is therefore the main component of interest. In most cases, dominance and epistasis, which represent intralocus and interlocus interactions respectively, are assumed to be of little significance and are included in the $e_{ij}$ term of the model. The assumptions for the linear model are as follows.

    y follows a multivariate normal distribution, implying that traits are determined by infinitely many additive genes of infinitesimal effect at unlinked loci, the so-called infinitesimal model (Fisher, 1918; Bulmer, 1980).

    The variances $V_a$ and $V_e$ (that is, $\sigma_a^2$ and $\sigma_e^2$) are known, or at least their proportionality is known. There is no correlation between g and e ($\mathrm{cov}(g_i, e_{ij}) = 0$) and no correlation among mates ($\mathrm{cov}(e_{ij}, e_{ik}) = 0$).

    $\mu$, the mean performance of the animals in the same management group, is assumed to be known.

    The accurate prediction of breeding value constitutes an important component of any breeding programme, since genetic improvement through selection depends on correctly identifying individuals with the highest true breeding value. The method employed for the prediction of breeding value depends on the type and amount of information available on the candidates for selection.

    Single record per individual

    EBV = $\hat{a}_i = b(y_i - \mu)$

    where b is the regression of true breeding value on phenotypic performance and $\mu$, the mean performance of animals in the same management group, is assumed to be known.

    $b = \mathrm{cov}(a, y)/\mathrm{var}(y) = \mathrm{cov}(a, a + e)/\mathrm{var}(y) = \sigma_a^2/\sigma_y^2 = h^2$


    The prediction is simply the adjusted record multiplied by the heritability. The correlation between the selection criterion, in this case the phenotypic value, and the true breeding value is known as the accuracy of selection. It provides a means of evaluating different selection criteria because the higher the correlation, the better the criterion as a predictor of breeding value. For selection based on a single measurement per individual, the accuracy $r_{a,y}$ is the square root of $h^2$; its square, $r_{a,y}^2$, is the reliability of the prediction.

    $r_{a,y} = h$
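
    The single-record prediction above can be sketched in a few lines of Python. The record, the group mean and the heritability are invented for illustration, and the function names are hypothetical.

```python
import math

def ebv_single_record(y, mu, h2):
    """Predicted breeding value from one record: h2 times the adjusted record."""
    return h2 * (y - mu)

def accuracy_single_record(h2):
    """Accuracy of selection on a single record: the square root of h2."""
    return math.sqrt(h2)

# Hypothetical values: one record, its management-group mean and heritability
y, mu, h2 = 320.0, 250.0, 0.45

ebv = ebv_single_record(y, mu, h2)
accuracy = accuracy_single_record(h2)
print(f"EBV = {ebv:.2f}, accuracy = {accuracy:.3f}")
```

    The higher the heritability, the less the adjusted record is shrunk towards zero and the closer the accuracy is to one.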

    Repeated records

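    When multiple measurements on a single individual are available, the breeding value is predicted from the mean of the n records, $\bar{y}$, with

    $b = \sigma_a^2 / \{[t + (1 - t)/n]\,\sigma_y^2\} = nh^2/[1 + (n - 1)t]$

    $r_{a,\bar{y}} = \sqrt{b}$

    where n is the number of records and t is the repeatability, that is, the correlation between repeated records on the same individual.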
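
    As a rough sketch of the repeated-records case, the code below computes the regression coefficient nh2/[1 + (n - 1)t] and takes the accuracy as its square root, with t read as the repeatability. All numbers and function names are assumptions made for illustration.

```python
import math

def repeated_records(n, h2, t):
    """Regression coefficient and accuracy for the mean of n records."""
    b = n * h2 / (1.0 + (n - 1) * t)
    return b, math.sqrt(b)

def ebv_from_mean(records, mu, h2, t):
    """Predicted breeding value from the mean of an animal's records."""
    n = len(records)
    b, _ = repeated_records(n, h2, t)
    return b * (sum(records) / n - mu)

# Hypothetical animal with five records; mu, h2 and t are assumed known
records = [300.0, 340.0, 320.0, 330.0, 310.0]
mu, h2, t = 250.0, 0.45, 0.50

b, accuracy = repeated_records(len(records), h2, t)
ebv = ebv_from_mean(records, mu, h2, t)
print(f"b = {b:.3f}, accuracy = {accuracy:.3f}, EBV = {ebv:.2f}")
```

    With a single record (n = 1) the coefficient reduces to h2, which matches the single-record case.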

    Breeding value prediction from progeny

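    $b = 2n/(n + k)$, where $k = (4 - h^2)/h^2$

    $r_{a,\bar{y}} = \sqrt{n/(n + k)}$

    Here n is the number of progeny with records, the progeny being half-sibs, and $\bar{y}$ is their mean performance.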
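
    The effect of progeny number on a sire proof can be illustrated as follows. The sketch reads the formulas above as b = 2n/(n + k) with the accuracy taken as the square root of n/(n + k); the heritability and the progeny numbers are hypothetical.

```python
import math

def progeny_test(n, h2):
    """Regression coefficient and accuracy from the mean of n half-sib progeny."""
    k = (4.0 - h2) / h2
    b = 2.0 * n / (n + k)
    accuracy = math.sqrt(n / (n + k))
    return b, accuracy

h2 = 0.25  # assumed heritability
for n in (5, 20, 100):
    b, accuracy = progeny_test(n, h2)
    print(f"n = {n:3d}: b = {b:.3f}, accuracy = {accuracy:.3f}")
```

    As n grows, b approaches 2 and the accuracy approaches 1, which is why progeny testing is valuable for traits of low heritability.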

    Breeding value prediction from pedigree

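    $\hat{a}_o = (\hat{a}_s + \hat{a}_d)/2$

    $r_{a_o,\hat{a}_o} = \frac{1}{2}\sqrt{r_s^2 + r_d^2}$

    where $\hat{a}_s$ and $\hat{a}_d$ are the predicted breeding values of the sire and the dam, and $r_s$ and $r_d$ are their accuracies.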

    Breeding value prediction for one trait from another

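    $b = r_{a_{xy}} h_x h_y \sigma_x/\sigma_y$

    $r_{a_x,y} = r_{a_{xy}} h_y$

    where $r_{a_{xy}}$ is the additive genetic correlation between traits x and y, and $\sigma_x$ and $\sigma_y$ are their phenotypic standard deviations.

    Correlated response in trait x as a result of direct selection on y, with selection intensity i, is

    $CR_x = i\,b\,\sigma_y = i\,h_x h_y r_{a_{xy}} \sigma_x$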
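
    A small numerical sketch of prediction from a correlated trait follows. It assumes that the scaling term of the correlated response is the phenotypic standard deviation of x, so that the response equals i times b times the standard deviation of y. All parameter values and function names are invented for illustration.

```python
import math

def regression_x_on_y(r_a, h2_x, h2_y, sd_x, sd_y):
    """Regression of the breeding value for x on the phenotype for y."""
    return r_a * math.sqrt(h2_x) * math.sqrt(h2_y) * sd_x / sd_y

def correlated_response(i, r_a, h2_x, h2_y, sd_x):
    """Expected change in x per generation when selection is on y."""
    return i * math.sqrt(h2_x) * math.sqrt(h2_y) * r_a * sd_x

# Hypothetical parameters
i, r_a = 1.4, 0.6            # selection intensity, genetic correlation
h2_x, h2_y = 0.30, 0.40      # heritabilities of x and y
sd_x, sd_y = 10.0, 20.0      # phenotypic standard deviations

b = regression_x_on_y(r_a, h2_x, h2_y, sd_x, sd_y)
accuracy = r_a * math.sqrt(h2_y)
cr_x = correlated_response(i, r_a, h2_x, h2_y, sd_x)
print(f"b = {b:.4f}, accuracy = {accuracy:.4f}, CR_x = {cr_x:.4f}")
```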

    Selection Index (best linear prediction)


    The selection index is a method of estimating the breeding value of an animal by combining all information available on the animal and its relatives. It is the best linear prediction of an individual's breeding value. The numerical value obtained for each animal is referred to as the index (I), and it is the basis on which animals are ranked for selection. Suppose $y_1$, $y_2$ and $y_3$ are phenotypic values for animal i and its sire and dam in the same herd; the index for this animal using this information would be

    $I_i = \hat{a}_i = b_1(y_1 - \mu_1) + b_2(y_2 - \mu_2) + b_3(y_3 - \mu_3)$

    where $b_1$, $b_2$ and $b_3$ are the factors by which each measurement is weighted.

    The accuracy of selection is given by $r_{a,I} = \sigma_I/\sigma_a$, where $\sigma_I^2 = b'Pb$ and P is the variance-covariance matrix of the measurements.
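
    The index for an animal with its own, sire and dam records can be sketched as below. The weights are obtained from the standard selection index equations Pb = g, where g holds the covariances of the records with the animal's breeding value; this step is not spelled out in the text and is included here as an assumption. The heritability, phenotypic variance and helper names are hypothetical, and the sire and dam are assumed unrelated.

```python
import math

def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

h2 = 0.25        # assumed heritability
var_y = 100.0    # assumed phenotypic variance
var_a = h2 * var_y

# P: (co)variances among own, sire and dam records
P = [[var_y, 0.5 * var_a, 0.5 * var_a],
     [0.5 * var_a, var_y, 0.0],
     [0.5 * var_a, 0.0, var_y]]
# g: covariance of each record with the animal's own breeding value
g = [var_a, 0.5 * var_a, 0.5 * var_a]

b = solve(P, g)                                # index weights b1, b2, b3
var_I = sum(b[i] * g[i] for i in range(3))     # b'Pb equals b'g when Pb = g
accuracy = math.sqrt(var_I / var_a)
print("weights:", [round(w, 4) for w in b], "accuracy:", round(accuracy, 4))
```

    Adding the parental records raises the accuracy above h, the value obtained from the animal's own record alone.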

    Best Linear Unbiased Prediction

    The use of a selection index for genetic evaluation has certain disadvantages. In particular, records may have to be pre-adjusted for fixed or environmental factors, and these are assumed to be known, although in practice they usually are not. Henderson (1949) developed a methodology called best linear unbiased prediction (BLUP), by which fixed effects and breeding values can be simultaneously estimated.

    Best: maximizes the correlation between the true (a) and predicted ($\hat{a}$) breeding value, or minimizes the prediction error variance (PEV).

    Linear: predictors are linear functions of the observations.

    Unbiased: estimates of realized values for a random variable, such as animal breeding values, and of estimable functions of fixed effects are unbiased ($E(a \mid \hat{a}) = \hat{a}$).

    Prediction: involves prediction of the true breeding value.

    BLUP has found widespread usage in genetic evaluation of domestic animals because

    of its desirable properties. This has evolved from simple models such as the sire model in its

    early years to more complex models such as the animal, maternal and multivariate models in

    recent years.

    The mixed model is given by

    $y = Xb + Za + e$

    where

    y = n x 1 vector of observations; n = number of records

    b = p x 1 vector of fixed effects; p = number of levels for fixed effects

    a = q x 1 vector of random animal effects; q = number of levels for random effects

    e = n x 1 vector of random residual effects

    X = design matrix of order n x p, which relates records to fixed effects

    Z = design matrix of order n x q, which relates records to random animal effects

    $\mathrm{Var}(a) = A\sigma_a^2$, where A is the numerator relationship matrix

    The solutions to the mixed model equations (MME) give the best linear unbiased estimate (BLUE) of $k'b$ and the BLUP of the breeding values (a) under the following assumptions:

    i. The distributions of y, a and e are assumed to be multivariate normal, implying that traits are determined by many additive genes of infinitesimal effect at many unlinked loci.

    ii. The variances and covariances (R = Var(e) and G = Var(a)) for the base population are assumed to be known, or at least known to proportionality. In practice, the variances and covariances of the base population are never known exactly but, assuming the infinitesimal model, they can be estimated by restricted maximum likelihood (REML) if the data include the information on which selection was based.

    iii. The MME can take selection into account if it is based on a linear function of y and there is no selection on information not included in the data.

    Nicholas (1982) and Mrode (1996) have described the steps involved in using

    these MME of Henderson (1975) for prediction of breeding values.
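
    As a hedged illustration of how the MME are set up and solved, the sketch below uses a small sire model with unrelated sires (so that A = I), one fixed effect (herd) and invented records. It builds Henderson's equations in the standard form [X'X, X'Z; Z'X, Z'Z + I*alpha][b; s] = [X'y; Z'y], with alpha equal to the ratio of residual to sire variance. The data, the variance ratio and the helper names are assumptions for illustration, not the worked examples of the cited texts.

```python
def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Hypothetical progeny records: (herd, sire, observation)
records = [(0, 0, 4.5), (0, 1, 2.9), (0, 1, 3.9),
           (1, 0, 3.5), (1, 2, 5.0), (1, 2, 4.0)]
n_herds, n_sires = 2, 3
alpha = 11.0   # assumed sigma_e^2 / sigma_s^2, roughly h2 = 0.33

size = n_herds + n_sires
lhs = [[0.0] * size for _ in range(size)]
rhs = [0.0] * size
for herd, sire, y in records:
    i, j = herd, n_herds + sire
    lhs[i][i] += 1.0      # X'X
    lhs[j][j] += 1.0      # Z'Z
    lhs[i][j] += 1.0      # X'Z
    lhs[j][i] += 1.0      # Z'X
    rhs[i] += y           # X'y
    rhs[j] += y           # Z'y
for s in range(n_sires):
    lhs[n_herds + s][n_herds + s] += alpha   # add I * alpha to the sire block

solution = solve(lhs, rhs)
herd_effects = solution[:n_herds]    # BLUE of herd effects
sire_values = solution[n_herds:]     # BLUP of sire effects
print("herd effects:", [round(v, 3) for v in herd_effects])
print("sire solutions:", [round(v, 3) for v in sire_values])
```

    With related animals, the identity matrix in the sire block would be replaced by the inverse of the relationship matrix, which is the main change needed to move towards an animal model.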

    The different models used under BLUP estimation are as follows.

    Sire model: The application of a sire model implies that only sires are evaluated,

    using progeny records. The main advantage with this model is that the number of equations is

    reduced compared with an animal model since only sires are evaluated. However, with a sire

    model, the genetic merit of the mate (dam of progeny) is not accounted for and can result in

    bias in the predicted breeding value if there is preferential mating.

    Animal model: In this model the individual animal is taken as the source of variation, so every animal, not only the sires, receives a predicted breeding value. Because the genetic merit of the dams is also taken into account, the bias caused by preferential mating is avoided, and the animal model can be extended to estimate variance components due to maternal, common environmental and permanent environmental effects. However, the number of equations to be solved is larger, and this model requires more computing power.

    Reduced animal model: To reduce the total number of equations to be solved, equations are set up for parents only, and the breeding values of progeny are obtained from the breeding values of their parents. This model was developed by Quaas and Pollak (1980).

    Animal models with groups: In the usual animal model, the breeding values of animals in subsequent generations are expressed relative to those of the base animals. If the base populations differ in mean, e.g. when the base animals come from different countries, this must be accounted for in the model. The sires are grouped based on time period and country of origin. Within a country, the four selection paths (sire of sires, sire of dams, dam of sires and dam of dams) are usually assumed to be of different genetic merit, and this is accounted for in the grouping strategy.

    In some circumstances, environmental factors constitute an important component of

    the covariance between individuals such as members of a family reared together (common

    environment), or between the records of an individual (permanent environmental effects).

    Such effects are included in the model to ensure accurate prediction of breeding value.

    Repeatability model

    The repeatability model is appropriate when multiple measurements of the same trait are recorded on an individual, such as litter size in successive pregnancies or milk yield in successive lactations. For an animal, the model assumes a genetic correlation of unity between all pairs of records, equal variance for all records and equal environmental correlation between all pairs of records. The repeatability model is given by

    $y = Xb + Za + Wpe + e$

    where pe is the vector of permanent environmental effects, W is the design matrix relating records to these effects, and $\mathrm{Var}(pe) = I\sigma_{pe}^2$ is the additional permanent environmental variance estimated.

    Apart from the resemblance between records of an individual due to permanent environmental conditions, a common environment contributes to the similarity between individuals of a family reared together. This increases the variance between families. Sources of common environmental variance between families include factors such as nutrition and/or climatic conditions. This component must be accounted for in the case of full-sibs or maternal half-sibs, where the influence of the dam also adds to the environmental component of variance.

    Maternal trait models

    The phenotypic expression of some traits in the progeny, such as weaning weight in beef cattle, is influenced by the ability of the dam to provide a suitable environment in the form of better nourishment. The dam contributes to the progeny in two ways: firstly through her direct genetic effects passed to the progeny, and secondly through her ability to provide a suitable environment, for instance in producing milk. Hence the phenotype may be partitioned into the following.

    1. Additive genetic effects from the sire and the dam, usually termed the direct genetic effect.

    2. Additive genetic ability of the dam to provide a suitable environment, usually termed the indirect or maternal genetic effect.

    3. Permanent environmental effects, which include permanent environmental influences on the dam's mothering ability and maternal non-additive genetic effects of the dam.

    4. Other random environmental effects, termed residual effects.

    The model can be represented as

    $y = Xb + Za + Sm + Wpe + e$

    where m is the vector of maternal genetic effects and S is the design matrix relating records to them.

    Methods of estimation in linear models

    Variance components are commonly used in formulating appropriate designs, establishing quality control procedures or, in statistical genetics, estimating heritabilities and genetic correlations. Traditionally, the estimators used most often have been the analysis of variance (ANOVA) estimators, which are obtained by equating observed and expected mean squares from an analysis of variance and solving the resulting equations. If the data are balanced, the ANOVA estimators have many appealing properties. With unbalanced data these properties rarely hold, which makes it difficult to arrive at correct decisions, and in practice variance components are mostly estimated from unbalanced data. For unbalanced data, two general classes of estimators have attracted considerable interest: maximum likelihood and restricted maximum likelihood (ML and REML), and minimum norm and minimum variance quadratic unbiased estimation (MINQUE and MIVQUE). The links between these classes are also important. In addition to the estimation problems of the unbalanced case, robust estimation, which deals with the influence of outliers and with departures from the underlying statistical assumptions, is also of interest.
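
    The ANOVA approach of equating observed and expected mean squares can be sketched for a balanced one-way sire design. The records are invented, and the heritability formula h2 = 4 * var_s / (var_s + var_e) assumes that the progeny are paternal half-sibs; both are illustrative assumptions rather than part of the text above.

```python
from statistics import mean

# Hypothetical balanced data: 3 sires with 4 progeny records each
progeny = {
    "sire_A": [10.0, 12.0, 11.0, 13.0],
    "sire_B": [12.0, 13.0, 11.0, 14.0],
    "sire_C": [10.5, 9.5, 11.5, 12.5],
}

def anova_sire_components(groups):
    """Sire and residual variance components from a balanced one-way ANOVA."""
    s = len(groups)          # number of sires
    n = len(groups[0])       # progeny per sire (balanced)
    grand = mean([v for g in groups for v in g])
    ms_between = n * sum((mean(g) - grand) ** 2 for g in groups) / (s - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (s * (n - 1))
    # Expected mean squares: E(MSW) = var_e and E(MSB) = var_e + n * var_s
    var_e = ms_within
    var_s = (ms_between - ms_within) / n
    return var_s, var_e

var_s, var_e = anova_sire_components(list(progeny.values()))
h2 = 4.0 * var_s / (var_s + var_e)
print(f"var_s = {var_s:.4f}, var_e = {var_e:.4f}, h2 = {h2:.4f}")
```

    With unbalanced data the expected mean squares no longer take this simple form, and the estimate of var_s can even be negative, which is part of the motivation for ML and REML.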

    The classical least squares model contains only one random element, the random error; all other effects are assumed to be fixed constants. For this class of models, the assumption of independence of the $\varepsilon_i$ implies independence of the $y_i$. That is, if $\mathrm{Var}(\varepsilon) = I\sigma^2$, then $\mathrm{Var}(y) = I\sigma^2$ also. Such models are called fixed effects models or, more simply, fixed models. There are situations in which there is more than one random term. The classical variance components problem, in which the purpose is to estimate components of variance rather than specific treatment effects, is one example. In these cases, the treatment effects are assumed to be a random sample from a population of such effects, and the goal of the study is to estimate the variance among these effects in the population. The individual effects that happen to be observed in the study are not of any particular interest except for the information they provide on the variance component. Models in which all effects are assumed to be random are called random models. Models that contain both fixed and random effects are called mixed models.

    The method of least squares finds the estimator that minimizes the sum of squared differences between the observed values of y and their expected values. This method requires assumptions about the distribution of the response variable only with respect to its expected values and possibly its variance-covariance structure (Dobson and Barnett, 2008).

    Maximum likelihood estimation is a powerful logic that can be applied to any form of statistical inference. For a given set of parameters defining a statistical model, their likelihood is defined as the probability of observing the actual data in hand if those parameter estimates were true: parameter estimates with low likelihoods are therefore those under which observing the actual data would be a rare event, and so forth. Probability is calculated based on assumptions about the statistical probability distribution of the data, usually that it is multivariate normal. An ML analysis then simply identifies the set of parameters that maximizes the likelihood of observing the actual data. To evaluate the likelihood of the mixed model, assume that both the additive genetic effects and the residual errors are normally distributed, and hence that the trait y is also normally distributed (in practice, REML estimators are fairly robust to this assumption).

    ML estimates of variance components have the undesirable property of being statistically biased, because they fail to account for the degrees of freedom lost in estimating fixed effects. This generates bias even when the only fixed effect being considered is the mean, and the bias can be considerable for larger numbers of fixed effects (Meyer 1989). As a result, an ML approach will underestimate the residual variance. However, the bias can be avoided by considering a restricted maximum likelihood (REML), in which only the likelihood of the part of the data that does not depend on the fixed effects is considered (Patterson & Thompson, 1971). To obtain REML rather than ML estimators for the mixed model, the likelihood is maximized for a transformed vector $y^*$, where $y^*$ contains the data corrected by a particular transformation matrix K (so $y^* = Ky$), and K depends on the design matrix X such that $KX = 0$. The REML estimates are essentially the ML estimates for these transformed variables.
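
    The difference between ML and REML is easiest to see in the simplest case mentioned above, where the mean is the only fixed effect. In that case, for independent normal records, the ML estimate of the variance divides the residual sum of squares by n, whereas the REML estimate divides by n - 1. The records below are invented for illustration.

```python
import math

records = [4.5, 2.9, 3.9, 3.5, 5.0, 4.0]   # hypothetical records
n = len(records)
mean_hat = sum(records) / n                 # estimate of the single fixed effect
ss = sum((y - mean_hat) ** 2 for y in records)

var_ml = ss / n          # ML: ignores the degree of freedom used by the mean
var_reml = ss / (n - 1)  # REML: divides by the residual degrees of freedom

def log_likelihood(var):
    """Normal log-likelihood of the records, evaluated at the estimated mean."""
    return -0.5 * n * math.log(2.0 * math.pi * var) - ss / (2.0 * var)

print(f"ML variance   = {var_ml:.4f}")
print(f"REML variance = {var_reml:.4f}")
```

    The ML value maximizes the full likelihood but is smaller than the REML value by the factor (n - 1)/n, which is the downward bias described in the text.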

    Predicting breeding values

    An individual's breeding value for a given phenotypic trait is the total additive effect of its genes on that trait (Falconer & Mackay 1996). Armed with estimates of the variance components that define V, the variance-covariance matrix of the observations, we can return to the mixed model to make predictions of individual additive genetic effects, or breeding values, and estimates of fixed effects. These are known as BLUPs and BLUEs, respectively: best (because they minimize error variance), linear (they are linear functions of the data), unbiased (their expected mean is equal to what they are estimating), predictors (for random effects) or estimates (for fixed effects). The BLUE of fixed effects is simply the generalized least-squares estimator.

    Solution to linear models

    The various methods used to solve the linear models can be broadly divided into

    1. Direct inversion.

    2. Iteration on the MME, by Jacobi or Gauss-Seidel iteration.

    3. Iteration on the data, in which the equations for each level of an effect are set up from the data and solved through either of these iterative methods.

    Mrode (1996) has given a detailed description of the different models and of the solution of linear equations, with appropriate examples.
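
    Gauss-Seidel iteration, one of the iterative methods listed above, can be sketched as follows. The coefficient matrix is a small invented system that is symmetric and diagonally dominant, as MME coefficient matrices typically are; the function name, tolerance and iteration limit are assumptions for illustration.

```python
def gauss_seidel(A, rhs, tol=1e-10, max_iter=1000):
    """Solve A x = rhs iteratively, using each updated solution immediately."""
    n = len(A)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            new_value = (rhs[i] - off_diag) / A[i][i]
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, iteration
    raise RuntimeError("Gauss-Seidel did not converge")

# Hypothetical system whose exact solution is [2, 1, 3]
A = [[4.0, 1.0, 0.0],
     [1.0, 5.0, 2.0],
     [0.0, 2.0, 6.0]]
rhs = [9.0, 13.0, 20.0]

solution, n_iter = gauss_seidel(A, rhs)
print("solution:", [round(v, 6) for v in solution], "iterations:", n_iter)
```

    Jacobi iteration differs only in that all solutions are updated from the values of the previous round, so it usually needs more rounds to converge.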

    Bayesian method of estimation

    It is based on the premise that the parameter to be estimated is a random variable and the data are fixed, and it is expressed by Bayes' equation

    $P(\theta \mid y) \propto P(y \mid \theta)\,P(\theta)$

    where $P(\theta \mid y)$ is the posterior distribution of the parameter, obtained by updating the prior $P(\theta)$ with the likelihood $P(y \mid \theta)$. This method is more intuitive because data, once collected, are fixed and cannot be generated again, and the Bayesian principle takes this fact into consideration. This methodology is particularly useful when the assumption of normality, or of another distribution, required by maximum likelihood estimation is not fulfilled.
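
    Bayes' equation can be illustrated with a simple grid approximation: the posterior of an unknown mean is obtained by multiplying the likelihood of the data by a prior at each candidate value and normalizing. The records, the known residual standard deviation and the normal prior are all assumptions chosen for illustration.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

records = [4.5, 2.9, 3.9, 3.5, 5.0, 4.0]   # hypothetical data, treated as fixed
sigma = 1.0                                 # residual SD, assumed known
prior_mean, prior_sd = 3.0, 2.0             # assumed prior for the mean theta

grid = [i * 0.01 for i in range(0, 801)]    # candidate values of theta, 0 to 8
unnormalized = []
for theta in grid:
    likelihood = 1.0
    for y in records:
        likelihood *= normal_pdf(y, theta, sigma)
    unnormalized.append(likelihood * normal_pdf(theta, prior_mean, prior_sd))

total = sum(unnormalized)
posterior = [u / total for u in unnormalized]          # P(theta | y) on the grid
posterior_mean = sum(t * p for t, p in zip(grid, posterior))
print(f"posterior mean = {posterior_mean:.3f}")
```

    The posterior mean lies between the prior mean and the sample mean, and it moves towards the sample mean as more records are added.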

    Simulation of data

    As defined earlier, a model can be described as a physical, mathematical or otherwise logical representation of a system, entity, phenomenon or process. A simulation is the implementation or exercise of a model over time; hence the simulation, utilising models, becomes the dynamic representation of a real-world activity or entity. Simulation is done in order to obtain numerous subsets of data under different circumstances so as to enable prediction and forecasting. Simulation helps in obtaining data with more volume, greater detail and accuracy. Real data can have disadvantages such as false positive significance, lack of power or absence of a true signal, which can be overcome by simulation. Simulation has been used in the development of new methods and of genetic models for disease. However, simulated data are much cleaner than real data and can never replace them.
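
    A minimal simulation under the basic model (phenotype = mean + breeding value + residual) might look like the sketch below. The heritability, phenotypic variance, mean, population size and random seed are arbitrary choices made for illustration.

```python
import random
from statistics import variance

random.seed(2011)                    # fixed seed so the run is reproducible

h2, var_y, mu = 0.30, 100.0, 250.0   # assumed parameters
sd_a = (h2 * var_y) ** 0.5           # additive genetic standard deviation
sd_e = ((1.0 - h2) * var_y) ** 0.5   # residual standard deviation
n_animals = 5000

breeding_values = [random.gauss(0.0, sd_a) for _ in range(n_animals)]
phenotypes = [mu + a + random.gauss(0.0, sd_e) for a in breeding_values]

# In simulated data the true breeding values are known, unlike in real data
realized_h2 = variance(breeding_values) / variance(phenotypes)
print(f"realized heritability = {realized_h2:.3f}")
```

    Because the true breeding values are known, data of this kind can be used to check how well a prediction method recovers them.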

    References

    Dobson, A. and Barnett, G. (2008). An Introduction to Generalized Linear Models. CRC Press, London.

    Harvey, W. R. (1990). Mixed Model Least-squares and Maximum Likelihood Computer Programme. PC-2 version. Ohio State University, Columbus.

    Henderson, C. R. (1975). Best linear unbiased estimation and prediction under a selection model. Biometrics, 31: 423-447.

    Kruuk, L. E. B. (2004). Estimating genetic parameters in natural populations using the Animal Model. Phil. Trans. R. Soc. Lond., 359: 873-890.

    Meyer, K. (1989). Restricted maximum-likelihood to estimate variance components for animal models with several random effects using a derivative-free algorithm. Genet. Selection. Evol., 21: 317-340.

    Meyer, K. (1998). DFREML User Notes. University of New England, Armidale, Australia.

    Mrode, R. A. (1996). Linear Models for the Prediction of Animal Breeding Values. CAB International, UK.

    Nicholas, F. W. (1982). Veterinary Genetics.

    Patterson, H. D. and Thompson, R. (1971). Recovery of interblock information when block sizes are unequal. Biometrika, 58: 545-554.
