How to solve biological problems with math 2012
23 March 2012
Phenotypic variation:
[Figure: trait distribution; axis ticks span 0 to 1.2 and -6 to 6]
What is association?
[Figure labels: chromosome, SNPs, trait variant]
Genetic variation yields phenotypic variation
Population with ‘ ’ allele Population with ‘ ’ allele
Distributions of “trait”
Quantifying Significance
T-test
t-value (significance) can be translated into p-value (probability)
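As a rough illustration of how a t-value is turned into a p-value, here is a minimal Python sketch. The trait values are hypothetical, the statistic is Welch's variant of the t-test (the slides do not say which variant is meant), and the p-value uses a normal approximation rather than the exact t distribution, so it is only indicative for small samples.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference in means of two samples."""
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

def approx_two_sided_p(t):
    """Two-sided p-value from a normal approximation to the t distribution.

    Assumption: the samples are large enough for the approximation to hold;
    an exact p-value would use the t distribution instead.
    """
    return math.erfc(abs(t) / math.sqrt(2.0))

# Hypothetical trait values for the two allele groups.
trait_allele_1 = [4.8, 5.1, 5.5, 4.9, 5.3, 5.0, 5.2, 4.7]
trait_allele_2 = [5.6, 5.9, 6.1, 5.4, 5.8, 6.0, 5.7, 5.5]

t_value = welch_t(trait_allele_1, trait_allele_2)
p_value = approx_two_sided_p(t_value)
print(f"t = {t_value:.2f}, approximate p = {p_value:.2g}")
```

A large absolute t-value corresponds to a small p-value, meaning a difference in means this large would be unlikely if the allele had no effect on the trait.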
Association using regression
[Figure: phenotype plotted against genotype and coded genotype]
Regression analysis
[Figure: regression of Y (“response”) on X (“feature(s)”), with “intercept”, “coefficients” and “residuals” labeled]
Regression formalism
(monotonic) transformation
phenotype (response variable) of individual i
effect size (regression coefficient)
coded genotype (feature) of individual i
p(β=0)
error (residual)
Goal: Find the effect size that best explains all (potentially transformed) phenotypes as a linear function of the genotypes, and estimate the probability (p-value) of the data being consistent with the null hypothesis (i.e. no effect)
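The labels on this slide annotate an equation that did not survive in the transcript. A plausible reconstruction, assuming a single coded genotype and no covariates, is sketched below. The symbols f, y_i, β, x_i and ε_i are chosen here for illustration and may differ from the original slide.

```latex
f(y_i) = \beta \, x_i + \varepsilon_i
```

Here f is the (monotonic) transformation, y_i the phenotype of individual i, β the effect size, x_i the coded genotype of individual i, and ε_i the error (residual). p(β=0) is the p-value for the null hypothesis of no effect. In practice an intercept and covariates would enter as additional terms on the right-hand side.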
Matlab function for Linear regression
• [x p tmp se] = regress_p(pheno, [ones(length(pheno),1) COV1 COV2 Genotype])
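The internals of regress_p are not shown in the slides. As a minimal sketch of the same idea in Python, the example below fits a least-squares line with a single predictor (the coded genotype), omitting the covariates. The data are hypothetical, and the p-value again relies on a normal approximation, which is an assumption of this sketch rather than something the slides specify.

```python
import math
from statistics import mean

def simple_regression(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Returns (intercept, slope, standard error of the slope).
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance with n - 2 degrees of freedom (two fitted parameters).
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    se_slope = math.sqrt(sigma2 / sxx)
    return intercept, slope, se_slope

# Hypothetical data: genotype coded as the number of copies of one allele.
coded_genotype = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
phenotype = [4.9, 5.2, 5.0, 5.6, 5.4, 5.7, 5.5, 6.1, 5.9, 6.2]

intercept, beta, se = simple_regression(coded_genotype, phenotype)
t_value = beta / se
# Two-sided p-value for beta = 0 under a normal approximation (assumption).
p_value = math.erfc(abs(t_value) / math.sqrt(2.0))
print(f"beta = {beta:.3f}, se = {se:.3f}, t = {t_value:.1f}, approximate p = {p_value:.2g}")
```

The slope plays the role of the effect size, and its ratio to the standard error gives the t-value that is translated into a p-value.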
Logistic regression
• Widely used in epidemiology
• Response variable: dichotomous
• The disease is characterized by a risk
• Expresses as a risk (or probability) the relationship between a dichotomous variable Y and several variables X (risk factors), which may be qualitative or quantitative
• Method for estimating the association between the risk factors and the disease (the betas): maximum likelihood
• Odds ratio: strength of the association between one factor and the disease (approximates the relative risk when the disease is rare)
Logistic regression
The logistic model: probability of heart disease as a function of age
[Figure: Prob(Y=1 | X) from 0.0 to 1.0 plotted against AGE from 10 to 70]
Probability of the outcome: $f(z) = \frac{1}{1 + e^{-z}}$
where $z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$ is a measure of the total contribution of all the independent variables used in the model and is known as the logit
The application of a logistic regression may be illustrated using a fictitious example of death from heart disease. This simplified model uses only three risk factors (age, sex, and blood cholesterol level) to predict the 10-year risk of death from heart disease. These are the parameters that the data fit:
The model can hence be expressed as
In this model, increasing age is associated with an increasing risk of death from heart disease (z goes up by 2.0 for every year over the age of 50), female sex is associated with a decreased risk of death from heart disease (z goes down by 1.0 if the patient is female), and increasing cholesterol is associated with an increasing risk of death (z goes up by 1.2 for each 1 mmol/L increase in cholesterol above 5 mmol/L).
We wish to use this model to predict a particular subject's risk of death from heart disease: he is 50 years old and his cholesterol level is 7.0 mmol/L. The subject's risk of death is therefore
This means that by this model, the subject's risk of dying from heart disease in the next 10 years is 0.07 (or 7%).
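The calculation behind this figure can be sketched in Python. The three coefficients are the ones stated in the text. The intercept is not given in the transcript, so the value -5.0 used here is an assumption, chosen because it reproduces the quoted 7% risk. The function and variable names are illustrative.

```python
import math

def logistic(z):
    """Logistic function: maps the logit z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Coefficients stated in the text.
BETA_AGE = 2.0          # per year of age above 50
BETA_FEMALE = -1.0      # applied when the patient is female
BETA_CHOLESTEROL = 1.2  # per mmol/L of cholesterol above 5.0
# Assumption: the intercept is missing from the transcript; -5.0 is the value
# consistent with the 7% risk quoted for the example subject.
BETA_INTERCEPT = -5.0

def heart_disease_risk(age_years, is_female, cholesterol_mmol_l):
    """10-year risk of death from heart disease under the fictitious model."""
    z = (BETA_INTERCEPT
         + BETA_AGE * (age_years - 50)
         + BETA_FEMALE * (1 if is_female else 0)
         + BETA_CHOLESTEROL * (cholesterol_mmol_l - 5.0))
    return logistic(z)

# The subject from the text: male, 50 years old, cholesterol 7.0 mmol/L.
risk = heart_disease_risk(age_years=50, is_female=False, cholesterol_mmol_l=7.0)
print(f"{risk:.2f}")  # prints 0.07
```

For this subject z = -5.0 + 0 + 0 + 1.2 × 2.0 = -2.6, and 1 / (1 + e^2.6) is approximately 0.07.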
Odds ratio
• The odds ratio (sometimes called the approximate relative risk) is a statistical measure of the degree of dependence between qualitative random variables.
• It measures the effect of a factor.
• It is the ratio of the odds of an event, for example a disease, occurring in a group of people A to the odds of it occurring in another group B.
• If the probability of the event is p in group A and q in group B, the odds ratio is:
Odds ratio (OR) = $\frac{p / (1 - p)}{q / (1 - q)}$
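A short Python sketch of the definition, using hypothetical probabilities for the two groups. The last lines use the standard relation that, in a logistic model, the exponential of a coefficient is the odds ratio for a one-unit increase in that variable. They apply it to the coefficient of -1.0 for female sex from the fictitious heart disease example.

```python
import math

def odds(probability):
    """Convert a probability strictly between 0 and 1 into odds."""
    return probability / (1.0 - probability)

def odds_ratio(p, q):
    """Odds ratio of an event with probability p in group A and q in group B."""
    return odds(p) / odds(q)

# Hypothetical probabilities of disease in groups A and B.
p_group_a = 0.20
p_group_b = 0.10
print(f"OR = {odds_ratio(p_group_a, p_group_b):.2f}")  # prints OR = 2.25

# Odds ratio implied by a logistic regression coefficient of -1.0.
or_female = math.exp(-1.0)
print(f"OR (female vs male) = {or_female:.2f}")  # prints OR (female vs male) = 0.37
```

With these hypothetical numbers the relative risk is 0.20 / 0.10 = 2.0, so the odds ratio of 2.25 is close to it but not identical. The two converge as the event becomes rarer.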
Matlab function for logistic regression
• [p0 x0 se0] = log_reg(Pheno, [COV1 COV2], Geno)