BASICS OF GENETICS: WHAT YOU NEED TO KNOW
Ahmed Rebai [email protected]
DNA.. THE CODE OF LIFE
DNA is a molecule made of four building blocks (the nucleotides A, C, G and T)
All living cells and organisms contain DNA
DNA contains the ‘text’ of life
DNA
FROM DNA TO PROTEIN
DNA
Parts of DNA are CODING (give proteins): only about 3% of the human genome, but most of the yeast genome (about 70%)
Parts of DNA are NON-CODING:
Introns
Regulatory regions of genes
Other (junk DNA!)
GENE
Gene: a section of DNA that codes for a protein; the protein contributes to a trait
A chromosome is a ‘chunk’ of DNA and genes are parts of chromosomes
GENES … ALLELES
Because we have a pair of each chromosome, we have two copies of each gene
These two copies can be identical in sequence or different: the different forms are called ALLELES
Alleles can yield different phenotypes
ALLELE
Allele: the different ‘options’ for a gene
Example: attached or unattached earlobes are the alleles for the gene for earlobe shape
DOMINANT/RECESSIVE
Dominant: an allele that blocks or hides a recessive allele
Recessive: an allele that is blocked by or hidden by a dominant allele
GENOTYPE
Genotype: A person’s set of alleles (gene options)
Genotypes can be noted by:
Two letters denoting the alleles: AA, AB, BB or, for single-nucleotide variations, for example AA, AG, GG
A digit: 1, 2, 3, or 0, 1, 2 after choosing a reference allele (number of copies of that allele)
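The 0/1/2 coding can be sketched in Python as a count of copies of a chosen reference allele. The function name `allele_count` and the choice of G as reference allele are illustrative assumptions, not part of the slides.

```python
def allele_count(genotype: str, reference: str) -> int:
    """Code a two-letter genotype as the number of copies (0, 1 or 2) of `reference`."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter genotype such as 'AG'")
    return genotype.count(reference)


# Hypothetical example: G is chosen as the reference allele.
codes = {g: allele_count(g, "G") for g in ("AA", "AG", "GG")}
print(codes)  # {'AA': 0, 'AG': 1, 'GG': 2}
```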
HOMOZYGOUS/HETEROZYGOUS
Homozygous: When a person’s two alleles for a gene are the same
Heterozygous: When a person’s two alleles for a gene are different
You get one allele from your mom and one from your dad.
If you get the same alleles from your mom and dad, you are homozygous for that gene.
If your mom gave you a different allele than your dad, you are heterozygous for that gene
PHENOTYPE
Phenotype: A person’s physical features because of their genotype
What you look like (your phenotype) is based on what your genotype is (your genes)
SEGREGATION: LESSONS FROM PEAS
Mendel (1822-1884) worked in the monastery of St. Thomas in the town of Brno (Brünn), in what is now the Czech Republic. Through a series of experiments on garden peas in 1856-1863, he discovered the laws of inheritance
SEXUAL REPRODUCTION
MENDELIAN GENETICS: THE LAWS
SEGREGATION
SEGREGATION RULES
1. Genes come in pairs, which means that a cell or individual has two copies (alleles) of each gene.
2. For each pair of genes, the alleles may be identical (homozygous WW or homozygous ww), or they may be different (heterozygous Ww).
3. Each reproductive cell (gamete) produced by an individual contains only one allele of each gene (that is, either W or w).
4. In the formation of gametes, any particular gamete is equally likely to include either allele (hence, from a heterozygous Ww genotype, half the gametes contain W and the other half contain w).
5. The union of male and female reproductive cells is a random process that reunites the alleles in pairs.
MENDEL’S FIRST LAW
The Principle of Segregation: In the formation of gametes, the paired hereditary determinants separate (segregate) in such a way that each gamete is equally likely to contain either member of the pair.
RECOMBINATION
Mendel studied the co-segregation of two genes by crossing: Wrinkled and Green x Round and Yellow
MENDEL’S SECOND LAW
The Principle of Independent Assortment: Segregation of the members of any pair of alleles is independent of the segregation of other pairs in the formation of reproductive cells.
This holds only for unlinked genes
RECOMBINATION
When two genes are linked (close on the same chromosome) they do not segregate independently; frequencies of genotypes in progeny depend on the distance between genes
MULTIPLE GENES FOR A PHENOTYPE: POLYGENIC TRAITS
CONTINUOUS SCALE FOR A PHENOTYPE
LET US EXERCISE
What are the genotypes produced by the following matings and their frequencies:
AA x AA
AA x Aa
AA x aa
Aa x Aa
Aa x aa
aa x aa
What are the frequencies of two-gene genotypes from this mating: AABb x AaBB?
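One way to check the answers is to enumerate gametes and combine them at random. The sketch below assumes equal segregation and independent assortment (unlinked genes), and assumes genotypes are written gene by gene as in the exercise (for example 'AABb'); the helper names `gametes` and `cross` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Gamete frequencies for a genotype written gene by gene, e.g. 'Aa' or 'AABb'.

    Assumes equal segregation and independent assortment (unlinked genes).
    """
    pairs = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    freqs = Counter()
    for combo in product(*pairs):  # pick one allele from each gene
        freqs["".join(combo)] += Fraction(1, 2 ** len(pairs))
    return freqs


def cross(parent1, parent2):
    """Offspring genotype frequencies for the mating parent1 x parent2."""
    offspring = Counter()
    for g1, f1 in gametes(parent1).items():
        for g2, f2 in gametes(parent2).items():
            # Sort the two alleles of each gene so that 'aA' is recorded as 'Aa'.
            genotype = "".join("".join(sorted(a + b)) for a, b in zip(g1, g2))
            offspring[genotype] += f1 * f2
    return dict(offspring)


for mating in [("AA", "Aa"), ("Aa", "Aa"), ("Aa", "aa"), ("AABb", "AaBB")]:
    result = cross(*mating)
    print(" x ".join(mating), {g: str(f) for g, f in result.items()})
```

For Aa x Aa this gives AA 1/4, Aa 1/2, aa 1/4; for AABb x AaBB it gives AABB, AABb, AaBB and AaBb with frequency 1/4 each.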
POPULATION GENETICS
Basic concepts and theories
PROBABILITY IN POPULATION GENETICS
Consider the offspring of the mating Aa x Aa
The addition rule: Pr(an offspring has at least one A allele) = Pr(A-) = Pr(AA or Aa) = Pr(AA) + Pr(Aa) = 1/4 + 1/2 = 3/4
For any two mutually exclusive events A and B: Pr(A or B) = Pr(A) + Pr(B)
The multiplication rule: Pr(two offspring each having at least one A allele) = Pr(A- and A-) = Pr(A-) x Pr(A-) = 3/4 x 3/4 = 9/16
For any two independent events A and B: Pr(A and B) = Pr(A) x Pr(B)
EXERCISE
Two individuals with genotypes Aa and Aa married and had three children; what is the probability that exactly one of their children has the genotype aa?
For one given birth order: Pr(aa and (AA or Aa) and (AA or Aa)) = Pr(aa) x Pr(A-) x Pr(A-) = 1/4 x 3/4 x 3/4 = 9/64
But since the aa child can occupy any of three possible birth orders, we should multiply by 3, giving 27/64
Compute the same probability for the case of two children (answer: 6/16; for four children it is also 27/64)
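The birth-order argument is the binomial formula with success probability Pr(aa) = 1/4. A minimal sketch, using exact fractions so the answers can be compared with the ones quoted in the exercise (the function name `prob_exactly` is illustrative):

```python
from fractions import Fraction
from math import comb


def prob_exactly(k: int, n: int, p: Fraction) -> Fraction:
    """Binomial probability that exactly k of n children have a genotype of probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p_aa = Fraction(1, 4)  # Pr(aa) for each child of an Aa x Aa mating
for n_children in (2, 3, 4):
    print(n_children, prob_exactly(1, n_children, p_aa))
```

This prints 3/8 (that is, 6/16) for two children and 27/64 for both three and four children.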
ORGANIZATION OF GENETIC VARIATION
A population is a group of organisms of the same species living within a sufficiently restricted geographical area that any member can potentially mate with any other member (of the opposite sex)
Population subdivision can be due to geographic constraints as well as to social behaviour
Local populations (by country, town, ...): groups of individuals that can interbreed, also called subpopulations or Mendelian populations
GENETIC VARIATION
Phenotypic diversity in natural populations is impressive and is due to genetic variation: multiple alleles for many genes affecting the phenotype
Population genetics is concerned with describing how alleles are organized into genotypes and with determining whether alleles of the same or different genes are associated at random
ALLELE FREQUENCIES IN POPULATIONS
Allele frequency is the proportion in the population of all alleles of the gene that are of the specified type
Since populations are large, allele frequencies are estimated from a population sample
Consider a gene with genotypes AA, Aa and aa, and a sample of N individuals
We count the number of individuals that have the AA, Aa and aa genotypes (denoted NAA, NAa and Naa, respectively) and we estimate the frequency of allele A by the proportion of A alleles among all alleles in the sample, that is:
pA = (2NAA + NAa)/(2N)
and then pa = 1 - pA
EXAMPLE
In a sample of 1000 individuals, 298 were of genotype MM, 489 MN and 213 NN, so the frequency of allele M is pM = (2*298 + 489)/(2*1000) = 0.54
We can compute a 95% confidence interval for the frequency based on the binomial law and its normal approximation: pM ± 1.96 x sqrt(pM(1 - pM)/(2N))
This approximation is only valid for frequencies that are neither small (below 0.1) nor large (above 0.9)
In the example we get [0.52 ; 0.56]
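The estimate and its interval can be reproduced with a few lines of Python. This is a sketch assuming the usual normal approximation with standard error sqrt(p(1-p)/(2N)), which is the form that reproduces the interval quoted in the example; the function name is illustrative.

```python
from math import sqrt


def allele_frequency_ci(n_AA: int, n_Aa: int, n_aa: int, z: float = 1.96):
    """Allele frequency of A and its normal-approximation confidence interval."""
    n = n_AA + n_Aa + n_aa                 # number of individuals
    p = (2 * n_AA + n_Aa) / (2 * n)        # proportion of A among the 2N alleles
    se = sqrt(p * (1 - p) / (2 * n))       # binomial standard error over 2N alleles
    return p, (p - z * se, p + z * se)


p, (low, high) = allele_frequency_ci(298, 489, 213)
print(f"pM = {p:.4f}, 95% CI = [{low:.2f} ; {high:.2f}]")
# pM = 0.5425, 95% CI = [0.52 ; 0.56]
```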
FOR RARE ALLELES
For rare alleles (frequency below 1%) there is a chance that a sample does not contain any carrier of the allele, so the estimated frequency will be 0
An alternative is to use a Bayesian (shrinkage) estimate such as p = (k+2)/(n+4), which is the posterior mean under a Beta(2,2) prior (a uniform prior would give (k+1)/(n+2)), where k is the observed number of copies of the allele in the sample and n the total number of alleles
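A minimal sketch of the estimator p = (k+2)/(n+4) next to the naive estimate k/n. The sample of 100 individuals (200 alleles) with no observed copy is a made-up illustration, and the function name is an assumption.

```python
def shrunk_frequency(k: int, n: int) -> float:
    """Shrinkage estimate (k + 2) / (n + 4) of an allele frequency.

    k: observed copies of the allele, n: total number of alleles sampled.
    """
    return (k + 2) / (n + 4)


# Hypothetical sample: 100 individuals (200 alleles), no copy of the rare allele seen.
k, n = 0, 200
print("naive estimate: ", k / n)
print("shrunk estimate:", shrunk_frequency(k, n))
```

The naive estimate is exactly 0, while the shrunk estimate stays strictly positive (a little under 1% here).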
RANDOM MATING
Random mating means that any two individuals (of opposite sex) have the same probability of mating
This means that genotypes pair with the same probability as if matings were formed by random collision of genotypes
Random mating can apply to some genes, like those controlling blood groups or neutral polymorphisms, but not to others, like those controlling skin color or height
NON-OVERLAPPING GENERATIONS
Formally this means that the cycle of birth, maturation and death includes the death of all individuals present in each generation before the next generation matures
This is only an approximation (simplistic in humans) but works well as far as genotype frequencies are concerned
THE HARDY-WEINBERG PRINCIPLE
If we assume that:
The organism is diploid
Reproduction is sexual
Generations are non-overlapping
Allele frequencies are identical in males and females
The population is of large size
Mating is random
Migration and mutation are negligible
Natural selection does not affect the alleles
THEN..
Genotype frequencies can be deduced from allele frequencies (p is the frequency of allele A, q = 1 - p that of allele a):
AA: p²   Aa: 2pq   aa: q²
These frequencies (allelic and genotypic) remain the same over generations: we say that the population is in Hardy-Weinberg Equilibrium (HWE)
WHY?
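One way to see why: under random mating, gametes unite at random, so the next generation's genotype frequencies depend only on the current allele frequency. The sketch below starts from arbitrary genotype frequencies (made-up values, not in HWE) and applies one and then two rounds of random mating; the function names are illustrative.

```python
def allele_freq(g_AA: float, g_Aa: float, g_aa: float) -> float:
    """Frequency of allele A computed from genotype frequencies."""
    return g_AA + g_Aa / 2


def next_generation(g_AA: float, g_Aa: float, g_aa: float):
    """Genotype frequencies after one round of random union of gametes."""
    p = allele_freq(g_AA, g_Aa, g_aa)
    q = 1 - p
    return p * p, 2 * p * q, q * q


start = (0.5, 0.2, 0.3)          # arbitrary starting frequencies (AA, Aa, aa), not in HWE
gen1 = next_generation(*start)   # HWE proportions are reached after one generation
gen2 = next_generation(*gen1)    # and stay the same afterwards
print("allele A:", allele_freq(*start), allele_freq(*gen1), allele_freq(*gen2))
print("generation 1:", gen1)
print("generation 2:", gen2)
```

The allele frequency stays at 0.6 throughout, and the genotype frequencies settle at approximately (0.36, 0.48, 0.16) from the first generation on.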
IMPLICATION OF HWE
Despite its very restrictive and unrealistic assumptions, HWE offers a reference model in which there are no evolutionary forces at work other than those imposed by the process of reproduction itself (like a mechanical model of a falling object with no force acting other than gravity)
The HW model separates the life cycle into two phases: gametes -> zygote and zygote -> adult
Even if the assumption of non-overlapping generations does not hold, HWE will be attained gradually
Applies also to multiallelic genes
IMPLICATION OF HWE
APPLICATION OF HWE
We can calculate the number of carriers of a rare mutation in the population
Example: the incidence of cystic fibrosis in European populations is known to be about 1 in 1700 (q² = 1/1700, so q = 0.024), so under HWE the frequency of heterozygous carriers is 2pq, about 5%
So when an allele is very rare, most of the genotypes containing this allele are heterozygous
Show that for a rare allele of frequency 1/1000 there are about 2000 times more heterozygotes than recessive homozygotes
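Both calculations follow directly from the HWE proportions: carriers have frequency 2pq, and the ratio of heterozygotes to recessive homozygotes is 2pq/q² = 2p/q. A short sketch (function names are illustrative):

```python
from math import sqrt


def carrier_frequency(incidence: float) -> float:
    """Heterozygote frequency 2pq under HWE for a recessive disease of incidence q**2."""
    q = sqrt(incidence)
    return 2 * (1 - q) * q


def het_to_hom_ratio(q: float) -> float:
    """Ratio of heterozygotes (2pq) to recessive homozygotes (q**2), i.e. 2p/q."""
    return 2 * (1 - q) / q


print(round(carrier_frequency(1 / 1700), 3))  # 0.047
print(round(het_to_hom_ratio(0.001)))         # 1998
```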
HWE DEVIATION
Deviation from HWE can be due to inbreeding, population stratification, selection, gender-dependent allele frequencies, non-random (assortative) mating
The principle does not apply directly to X-linked or Y-linked genes
TESTS OF HWE
Compare observed to expected genotype counts using Pearson chi-square test of goodness of fit: with 3 genotypes and 1 parameter estimated (p) we have a test with 1 df
Inappropriate for rare variants (low genotype counts): use Fisher Exact Test (FET)
Other Exact tests are available in the R language (e.g. Genetics package,…)
PEARSON CHI-SQUARE THROUGH D
Let $D_A = P_{AA} - p^2$
Testing HWE is testing $D_A = 0$
Test statistic: $\chi^2 = \frac{N D_A^2}{(p(1-p))^2}$
Compute p-value $= \Pr(\chi^2_{1\,df} > \chi^2_{obs})$
If p-value < 0.05 (or 0.0001) then there is deviation from HWE
Example: In a sample of 1000 individuals, 298 were of genotype MM, 489 MN and 213 NN, so the frequency of allele M is pM = 0.54
Genotypes:        MM     MN     NN
Observed counts:  298    489    213
Expected counts:  294.3  496.4  209.3
Observed PMM = 0.298 and expected pM² = 0.294, so D = 0.298 - 0.294 = 0.004
$\chi^2 = N D^2/(p(1-p))^2 = 1000 \times (0.004/(0.54 \times 0.46))^2 \approx 0.26$
$\chi^2 \approx 0.26 < 3.84$; p-value = 0.61, so there is no evidence of deviation from HWE
TESTS OF HWE: LET’S DO IT!
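A minimal sketch of the Pearson goodness-of-fit test for the MN example, using only the Python standard library. The p-value uses the identity Pr(chi-square with 1 df > x) = erfc(sqrt(x/2)); the function name is illustrative. Computed without intermediate rounding, the statistic comes out near 0.22, slightly below the figure obtained on the slide from rounded D and p.

```python
from math import erfc, sqrt


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int):
    """Pearson chi-square test (1 df) of Hardy-Weinberg proportions."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of a chi-square with 1 df
    return expected, chi2, p_value


expected, chi2, p_value = hwe_chi_square(298, 489, 213)
print([round(e, 1) for e in expected])  # [294.3, 496.4, 209.3]
print(round(chi2, 2), round(p_value, 2))
```

For low genotype counts an exact test remains the better choice, as noted above.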
HAPLOTYPES FROM GENOTYPES
If we study many genes they can be linked and one can use haplotypes
A haplotype (haploid genotype) is a set of alleles carried by one chromosome at several genes
Consider two genes (A,a) and (B,b) with allele frequencies (pA, pa) and (pB, pb)
If gametic frequencies are the products of allele frequencies:
AB: pA x pB, Ab: pA x pb, aB: pa x pB, ab: pa x pb
we say that the genes are in random association or in linkage equilibrium
LINKAGE DISEQUILIBRIUM
If the observed frequency of a gamete (e.g. PAB) differs from that expected under linkage equilibrium (pA x pB), we say that the genes are in Linkage Disequilibrium (LD)
To measure and test LD we need to know the haplotype frequencies
LINKAGE DISEQUILIBRIUM
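[Figure: two SNPs with allele frequencies a = 40%, A = 60% at SNP1 and B = 30%, b = 70% at SNP2. No LD: the four haplotype frequencies are the products of the allele frequencies (12%, 28%, 18%, 42%). LD: only three haplotypes are observed, with frequencies 60%, 30% and 10%.]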
LD MEASURES: D
The difference between observed and expected haplotype frequency: $D = P_{AB} - p_A p_B$
It is also equal to $D = P_{AB} P_{ab} - P_{Ab} P_{aB}$
D is bounded between Dmin and Dmax
D’: STANDARDIZED D
In practice, label the alleles A and B such that D > 0 and $p_A > p_B$, so that $D_{max} = (1 - p_A) p_B$
A standardized measure of LD is thus: $D' = \frac{D}{D_{max}} = \frac{D}{(1 - p_A) p_B}$
D' = 1 denotes complete LD
THE R² MEASURE : MORE PRACTICAL
This is the squared correlation from the 2x2 contingency table of haplotype counts: $r^2 = \frac{D^2}{p_A p_a p_B p_b}$
Or, with the alleles labelled as for D': $r^2 = (D')^2 \frac{p_a p_B}{p_A p_b}$
TESTING LD
We can show that $N r^2$ is a chi-square test of LD (1 df), where N is the number of haplotypes
Exercise: two blood group systems, M/N and S/s, gave the following haplotype counts (1000 individuals, 2000 haplotypes):
MS: 474   Ms: 611   NS: 142   Ns: 773
Allele frequencies are M: 0.54, S: 0.31
Compute D, D' and r², and test LD
Solution: D = 0.07, D' = 0.50, r² = 0.09, X² = 2000 x 0.092 = 185, p < 10^-40
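The exercise can be worked through with the sketch below, which computes D, D', r² and the chi-square statistic from four haplotype counts. It assumes the Ns count is 773, so that the counts sum to the 2000 haplotypes of 1000 individuals and match the stated allele frequencies; the function name and returned keys are illustrative. With these counts r² comes out near 0.09 and the chi-square statistic near 185.

```python
from math import erfc, sqrt


def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> dict:
    """D, D', r^2 and the N*r^2 chi-square test (1 df) from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab          # number of haplotypes
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    D = n_AB / n - p_A * p_B               # observed minus expected frequency of AB
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(D) / d_max if d_max > 0 else 0.0
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    chi2 = n * r2
    p_value = erfc(sqrt(chi2 / 2))         # upper tail of a chi-square with 1 df
    return {"D": D, "D_prime": d_prime, "r2": r2, "chi2": chi2, "p_value": p_value}


# Haplotype counts MS, Ms, NS, Ns (Ns = 773 is an assumption, see above).
res = ld_measures(474, 611, 142, 773)
print({name: round(value, 3) for name, value in res.items() if name != "p_value"})
print("p-value below 1e-40:", res["p_value"] < 1e-40)
```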
CAUSES OF LD
LD is ‘created by linkage’
If r is the recombination rate between two genes, then we can show that LD at generation t is given by
$D_t = (1 - r)^t D_0$
If r is small (genes very close on the chromosome) the decay is very slow and LD can persist for hundreds of generations
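The decay formula can be explored numerically. The sketch below evaluates D after t generations and the number of generations needed for D to halve, which is ln(0.5)/ln(1-r); the recombination rates used are illustrative values, and the function names are assumptions.

```python
from math import log


def ld_after(d0: float, r: float, t: int) -> float:
    """LD after t generations of random mating: D_t = (1 - r)**t * D_0."""
    return (1 - r) ** t * d0


def generations_to_halve(r: float) -> float:
    """Number of generations needed for D to drop to half its initial value."""
    return log(0.5) / log(1 - r)


for r in (0.5, 0.01, 0.001):  # unlinked genes, then increasingly tight linkage
    print(f"r = {r}: D halves in about {generations_to_halve(r):.0f} generations")
```

With free recombination (r = 0.5) D halves every generation, whereas for r = 0.001 it takes several hundred generations.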
RECOMBINATION AND LD
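[Figure: gametes produced by a double heterozygote: each non-recombinant type with frequency (1-r)/2, each recombinant type with frequency r/2]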
DECAY OF LD OVER GENERATIONS
ADMIXTURE OF POPULATIONS
LD can be created by the merge of populations having different gametic frequencies
Consider two populations and two genes in linkage equilibrium in both, where alleles A and B each have frequency 0.05 in the first population and 0.95 in the second population
A new population is formed by an equal mixture of the two populations; show that LD is high in that population (D = 0.2 and D' = 0.81)
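The exercise can be checked by mixing the haplotype frequencies of the two source populations. This sketch assumes each source population is in linkage equilibrium (so its AB frequency is pA x pB) and that the mixture happens in one step; the function name `mix_ld` is illustrative.

```python
def mix_ld(pops, weights):
    """D and D' in a mixture of populations that are each in linkage equilibrium.

    pops: list of (p_A, p_B) pairs, weights: mixing proportions summing to 1.
    """
    p_A = sum(w * a for w, (a, b) in zip(weights, pops))
    p_B = sum(w * b for w, (a, b) in zip(weights, pops))
    P_AB = sum(w * a * b for w, (a, b) in zip(weights, pops))  # AB haplotype frequency
    D = P_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return D, abs(D) / d_max


D, d_prime = mix_ld([(0.05, 0.05), (0.95, 0.95)], [0.5, 0.5])
print(round(D, 4), round(d_prime, 2))  # 0.2025 0.81
```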
ADMIXTURE
NATURAL (DARWINIAN) SELECTION
Individuals differ in their ability to survive and reproduce owing in part to their genotype
The selective advantage/disadvantage is measured by fitness
Selection results in a change of allele frequencies over generations and deviation from HWE
EFFECT OF SELECTION
RANDOM GENETIC DRIFT
In each generation there is an element of chance in the drawing of the gametes that will unite to form the next generation
This chance can result in a random change in allele frequency and may ultimately lead to the fixation or elimination of some alleles
SIMPLY SAYING
MATHEMATICAL MODELS OF DRIFT
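Wright-Fisher model (1930): in a diploid population of size N, the probability of obtaining k copies of an allele that had frequency p in the last generation is binomial: $\Pr(k) = \binom{2N}{k} p^k (1-p)^{2N-k}$
The expected time (in generations) before a neutral allele of initial frequency p becomes fixed through genetic drift is given by the diffusion approximation: $\bar{T}_{fixed} = -\frac{4N(1-p)\ln(1-p)}{p}$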
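The Wright-Fisher model can be sketched as binomial sampling of 2N gametes each generation, here for a diploid population of N individuals. The population size, starting frequency, number of runs and random seed below are arbitrary illustrative choices, and the function names are assumptions.

```python
import random
from math import comb


def wf_transition(k: int, n_copies: int, p: float) -> float:
    """Pr(k copies next generation) when n_copies = 2N gametes are drawn at frequency p."""
    return comb(n_copies, k) * p**k * (1 - p) ** (n_copies - k)


def simulate_drift(p0: float, pop_size: int, rng: random.Random, max_generations: int = 100_000):
    """Follow one neutral allele until it is lost (0.0) or fixed (1.0).

    Returns (final frequency, number of generations simulated).
    """
    n_copies = 2 * pop_size
    count = round(p0 * n_copies)
    generations = 0
    while 0 < count < n_copies and generations < max_generations:
        p = count / n_copies
        # Binomial sampling: each of the 2N gametes carries the allele with probability p.
        count = sum(rng.random() < p for _ in range(n_copies))
        generations += 1
    return count / n_copies, generations


rng = random.Random(1)  # fixed seed, chosen arbitrarily, for reproducibility
runs = [simulate_drift(0.5, 20, rng) for _ in range(200)]
fixed = sum(1 for freq, _ in runs if freq == 1.0)
print("fraction of runs ending in fixation:", fixed / len(runs))
print("mean generations to absorption:", sum(g for _, g in runs) / len(runs))
```

For a neutral allele the fraction of runs ending in fixation should be close to the starting frequency (0.5 here), since every run ends in either loss or fixation.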
POPULATION BOTTLENECK
FOUNDER EFFECT
POPULATION SUBSTRUCTURE
Occurs when a population is organized into several subpopulations that differ in genetic composition (allele frequencies)
Substructure generally results in a reduction of heterozygote frequency relative to that expected under random mating (Wahlund principle)
Several measures are used to assess population substructure: the F-statistics
F-STATISTICS
Defined by Wright (1921)
(1-FIT)=(1-FIS)(1-FST)
ANOTHER FORMULATION
The most useful for testing substructure is FST, an index that measures the level of genetic divergence among subpopulations
FST = (HT - HS)/HT
HS: average heterozygosity among individuals within subpopulations
HT: average heterozygosity among individuals within the total population
In terms of the variance of allele frequencies among subpopulations: $F_{ST} = \sigma_p^2 / (\bar{p}(1 - \bar{p}))$
Can be calculated with the R package hierfstat
FST is not a genetic distance
HOW TO USE IT?
FST = 1 means total divergence by fixation of alternative alleles in subpopulations
FST < 0.05: little differentiation
0.05 < FST < 0.15: moderate
0.15 < FST < 0.25: high
FST > 0.25: very high
Chi-square test with 1 df: X² = (k-1) N FST
Examples: between European and sub-Saharan African: 0.15; Japanese-African: 0.19; Europeans: 0.11
EXAMPLE
Two populations where the allele frequency is 0.5 and 0.3
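For this example, FST = (HT - HS)/HT can be computed as below. The sketch assumes one biallelic locus, two subpopulations of equal size, and expected (Hardy-Weinberg) heterozygosity 2p(1-p) within each; the function name `fst` is illustrative.

```python
def fst(freqs) -> float:
    """F_ST = (H_T - H_S) / H_T for one biallelic locus, equal-sized subpopulations assumed."""
    h_s = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # mean within-subpopulation heterozygosity
    p_bar = sum(freqs) / len(freqs)                          # allele frequency in the pooled population
    h_t = 2 * p_bar * (1 - p_bar)                            # heterozygosity expected in the total population
    return (h_t - h_s) / h_t


print(round(fst([0.5, 0.3]), 3))  # 0.042
```

Here HS = 0.46 and HT = 0.48, so FST is about 0.04, which falls in the range of little differentiation.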
ADMIXTURE
Genetic admixture occurs when individuals from two or more previously separated populations begin interbreeding.
Admixture results in the introduction of new genetic lineages into a population.
Most human populations are a product of mixture of genetically distinct groups that intermixed within the last 4,000 years.
ADMIXTURE DETECTION
By testing HWE
Standard statistical methods applied to data on genotypes and allele/haplotype frequencies: Principal Component Analysis (PCA), clustering (K-means, hierarchical, ...)
Advanced methods: maximum likelihood (psmix R package), Bayesian methods, wavelet analysis (adwave R package)
STRUCTURE
PRINCIPAL COMPONENT ANALYSIS
CLUSTERING
STRUCTURE
Software for inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.
http://pritchardlab.stanford.edu/structure.html
ADMIXTURE
https://www.genetics.ucla.edu/software/admixture/
R PACKAGES
Genetics: classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, ...
Adegenet: classes and functions for genetic data analysis within the multivariate framework
Hierfstat: estimation of hierarchical F-statistics from haploid or diploid genetic data with any number of levels in the hierarchy, following the algorithm. Functions are also given to test via randomisation the significance of each F and variance component
RECOMMENDED READINGS