BASICS OF GENETICS: WHAT YOU NEED TO KNOW

Ahmed Rebai

DNA.. THE CODE OF LIFE

DNA is a molecule made of four building blocks (the nucleotides A, C, G and T)

All living cells and organisms carry DNA

DNA contains the ‘text’ of life

FROM DNA TO PROTEIN

DNA

Parts of DNA are CODING (they give proteins): only about 3% of the human genome, but roughly 70% of the yeast genome

Parts of DNA are NON-CODING: introns, regulatory regions of genes, and other sequences (‘junk DNA’)

GENE

Gene: a section of DNA that codes for a protein; the protein contributes to a trait

A chromosome is a ‘chunk’ of DNA and genes are parts of chromosomes

GENES … ALLELES

Because we have a pair of each chromosome, we have two copies of each gene

These two copies can be identical in sequence or different; the alternative forms of a gene are called ALLELES

Alleles can yield different phenotypes

ALLELE

Allele: one of the different ‘options’ for a gene

Example: attached and unattached earlobes are the alleles of the gene for earlobe shape

DOMINANT/RECESSIVE

Dominant: an allele that blocks or hides a recessive allele

Recessive: an allele that is blocked by or hidden by a dominant allele

GENOTYPE

Genotype: A person’s set of alleles (gene options)

Genotypes can be denoted by:

Two letters denoting the alleles: AA, AB, BB or, for single-nucleotide variations, for example AA, AG, GG

A digit: 1, 2, 3 or 0, 1, 2 (counting the copies of a chosen reference allele)
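The 0/1/2 coding can be illustrated with a short Python sketch. The function name `encode_genotype` and the choice of `A` as the counted allele are assumptions made for this illustration only.

```python
def encode_genotype(genotype: str, counted_allele: str) -> int:
    """Return how many copies (0, 1 or 2) of counted_allele the genotype carries."""
    if len(genotype) != 2:
        raise ValueError("expected a two-letter diploid genotype such as 'AG'")
    return sum(1 for allele in genotype if allele == counted_allele)


# Hypothetical sample of genotypes at one single-nucleotide variation.
genotypes = ["AA", "AG", "GG", "GA"]
codes = [encode_genotype(g, counted_allele="A") for g in genotypes]
print(codes)  # [2, 1, 0, 1]
```

Counting the other allele instead simply swaps the codes 0 and 2, which is why the counted allele has to be stated.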

HOMOZYGOUS/HETEROZYGOUS

Homozygous: When a person’s two alleles for a gene are the same

Heterozygous: When a person’s two alleles for a gene are different

You get one allele from your mom and one from your dad.

If you get the same alleles from your mom and dad, you are homozygous for that gene.

If your mom gave you a different allele than your dad, you are heterozygous for that gene

PHENOTYPE

Phenotype: A person’s physical features because of their genotype

What you look like (your phenotype) is based on what your genotype is (your genes)

SEGREGATION: LESSONS FROM PEAS

Gregor Mendel (1822-1884) worked in the monastery of St. Thomas in the town of Brno (Brünn), in what is now the Czech Republic. Through a series of experiments on garden peas in 1856-1863, he discovered the laws of inheritance

SEXUAL REPRODUCTION

MENDELIAN GENETICS: THE LAWS

SEGREGATION

SEGREGATION RULES

1. Genes come in pairs, which means that a cell or individual has two copies (alleles) of each gene.

2. For each pair of genes, the alleles may be identical (homozygous WW or homozygous ww), or they may be different (heterozygous Ww).

3. Each reproductive cell (gamete) produced by an individual contains only one allele of each gene (that is, either W or w).

4. In the formation of gametes, any particular gamete is equally likely to include either allele (hence, from a heterozygous Ww genotype, half the gametes contain W and the other half contain w).

5. The union of male and female reproductive cells is a random process that reunites the alleles in pairs.

MENDEL’S FIRST LAW

The Principle of Segregation: In the formation of gametes, the paired hereditary determinants separate (segregate) in such a way that each gamete is equally likely to contain either member of the pair.

RECOMBINATION

Mendel studied the co-segregation of two genes by crossing: Wrinkled and Green x Round and Yellow

MENDEL’S SECOND LAW

The Principle of Independent Assortment: Segregation of the members of any pair of alleles is independent of the segregation of other pairs in the formation of reproductive cells.

This is, of course, valid only for unlinked genes

RECOMBINATION

When two genes are linked (close on the same chromosome) they do not segregate independently; frequencies of genotypes in progeny depend on the distance between genes

MULTIPLE GENES FOR A PHENOTYPE: POLYGENIC TRAITS

CONTINUOUS SCALE FOR A PHENOTYPE

LET US EXERCISE

What are the genotypes produced by the following matings, and what are their frequencies?

AA x AA, AA x Aa, AA x aa, Aa x Aa, Aa x aa, aa x aa

What are the frequencies of the two-gene genotypes from this mating: AABb x AaBB?
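One way to check the answers to this exercise is to enumerate the gametes of each parent and combine them, as in a Punnett square. The sketch below assumes unlinked genes (independent assortment) and genotypes written gene by gene, two letters per gene; the function names `gametes` and `cross` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype: str) -> Counter:
    """Gamete frequencies for a genotype written gene by gene, e.g. 'AaBb'."""
    genes = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    freqs = Counter()
    # Independent assortment: one allele per gene, each chosen with probability 1/2.
    for combo in product(*genes):
        freqs["".join(combo)] += Fraction(1, 2 ** len(genes))
    return freqs


def cross(parent1: str, parent2: str) -> dict:
    """Offspring genotype frequencies for the mating parent1 x parent2."""
    offspring = Counter()
    for (g1, f1), (g2, f2) in product(gametes(parent1).items(),
                                      gametes(parent2).items()):
        # Sort the two alleles of each gene so that 'aA' and 'Aa' are pooled.
        genotype = "".join("".join(sorted(a + b)) for a, b in zip(g1, g2))
        offspring[genotype] += f1 * f2
    return dict(offspring)


for mating in [("Aa", "Aa"), ("Aa", "aa"), ("AABb", "AaBB")]:
    print(mating, cross(*mating))
```

Exact fractions are used so that the results can be compared directly with the hand-computed values.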

POPULATION GENETICS

Basic concepts and theories

PROBABILITY IN POPULATION GENETICS

Consider the offspring of the mating Aa x Aa

The addition rule:

Pr(an offspring has at least one A allele) = Pr(A-) = Pr(AA or Aa) = Pr(AA) + Pr(Aa) = 1/4 + 1/2 = 3/4

For any two mutually exclusive events A and B: Pr(A or B) = Pr(A) + Pr(B)

The multiplication rule:

Pr(two offspring each having at least one A allele) = Pr(A- and A-) = Pr(A-) x Pr(A-) = 3/4 x 3/4 = 9/16

For any two independent events A and B: Pr(A and B) = Pr(A) x Pr(B)

EXERCISE

Two individuals with genotypes Aa and Aa married and had three children; what is the probability that exactly one of their children has the genotype aa?

Pr(aa and (AA or Aa) and (AA or Aa)) = Pr(aa) x Pr(A-) x Pr(A-) = 1/4 x 3/4 x 3/4 = 9/64

But since the aa child can occupy any of the three birth orders, we must multiply by 3, which gives 27/64.

Compute the same probability for two children. (Answer: 6/16; for four children it is again 27/64.)
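The birth-order argument is the binomial formula: the number of possible positions of the aa children times the probability of one particular order. A minimal sketch, with the illustrative function name `prob_exactly_k`:

```python
from fractions import Fraction
from math import comb


def prob_exactly_k(n_children: int, k: int, p: Fraction = Fraction(1, 4)) -> Fraction:
    """Probability that exactly k of n_children have a genotype of probability p."""
    # comb(n, k) counts the birth orders; p**k * (1-p)**(n-k) is one such order.
    return comb(n_children, k) * p ** k * (1 - p) ** (n_children - k)


for n in (2, 3, 4):
    print(n, prob_exactly_k(n, 1))
```

The default `p = 1/4` is the probability of an aa child from an Aa x Aa mating.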

ORGANIZATION OF GENETIC VARIATION

A population is a group of organisms of the same species living within a sufficiently restricted geographical area that any member can potentially mate with any other member (of the opposite sex)

Population subdivision can be due to geographic constraints as well as to social behaviour

Local populations (by country, town, ...): groups of individuals that can interbreed, also called subpopulations or Mendelian populations

GENETIC VARIATION

Phenotypic diversity in natural populations is impressive and is due to genetic variation: multiple alleles for many genes affecting the phenotype

Population genetics is concerned with describing how alleles are organized into genotypes and with determining whether alleles of the same or of different genes are associated at random

ALLELE FREQUENCIES IN POPULATIONS

Allele frequency is the proportion, among all alleles of the gene in the population, that are of the specified type

Since populations are large, allele frequencies are estimated from a population sample

Consider a gene with genotypes AA, Aa and aa, and a sample of N individuals

We count the number of individuals that have the AA, Aa and aa genotypes (denoted NAA, NAa and Naa, respectively) and we estimate the frequency of allele A by the proportion of A alleles among all alleles in the sample, that is:

pA = (2NAA + NAa) / (2N)

and then pa = 1 - pA

EXAMPLE

In a sample of 1000 individuals, 298 were of genotype MM, 489 MN and 213 NN, so the frequency of allele M is

pM = (2*298 + 489) / (2*1000) = 0.54

We can compute a 95% confidence interval for the frequency based on the binomial law and its normal approximation:

$$p \pm 1.96\sqrt{\frac{p(1-p)}{2N}}$$

This approximation is only valid for frequencies that are neither too small (>0.1) nor too large (<0.9)

In the example we get [0.52 ; 0.56]
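The estimate and its interval can be reproduced with a few lines of Python. This is a sketch that assumes the usual normal-approximation (Wald) interval with 2N sampled alleles; the function names are illustrative.

```python
from math import sqrt


def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency of allele A estimated from genotype counts."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)


def normal_ci(p: float, n_individuals: int, z: float = 1.96) -> tuple:
    """Normal-approximation confidence interval based on 2N sampled alleles."""
    se = sqrt(p * (1 - p) / (2 * n_individuals))
    return p - z * se, p + z * se


p_M = allele_frequency(298, 489, 213)
low, high = normal_ci(p_M, 1000)
print(round(p_M, 4), round(low, 2), round(high, 2))
```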

FOR RARE ALLELES

For rare alleles (frequency below 1%) there is a chance that the sample does not contain any carrier of the allele, so the estimated frequency will be 0

An alternative is to use Bayesian estimation: with a Beta(2,2) prior the posterior mean is p = (k+2)/(n+4), where k is the observed number of copies of the allele in the sample and n the total number of alleles (a uniform prior would give (k+1)/(n+2))

RANDOM MATING

Means that any two individuals (of opposite sex) have the same probability to mate

This means that genotypes meet each other with the same probability as if they were formed by random collision of genotypes

Random mating can hold for some genes, like those controlling blood groups or neutral polymorphisms, but not for others, like those controlling skin color or height

NON OVERLAPPING GENERATION

Formally this means that the cycle of birth, maturation and death includes the death of all individuals present in each generation before the next generation matures

This is only an approximation (simplistic in humans) but it works well as far as genotype frequencies are concerned

THE HARDY-WEINBERG PRINCIPLE

If we assume that:

The organism is diploid

Reproduction is sexual

Generations are non-overlapping

Allele frequencies are identical in males and females

The population is of large size

Mating is random

Migration and mutation are negligible

Natural selection does not affect the alleles

THEN..

Genotype frequencies can be deduced from allele frequencies (p is the frequency of allele A, q=1-p that of allele a):

AA: p²    Aa: 2pq    aa: q²

These frequencies (allelic and genotypic) remain the same over generations: we say that the population is in Hardy-Weinberg Equilibrium (HWE)
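A short numerical check of this statement: starting from an allele frequency p, form the genotypes under HWE, then recount the alleles they pass on. The value p = 0.3 and the function names are assumptions chosen for illustration.

```python
def hardy_weinberg(p: float) -> dict:
    """Genotype frequencies expected under HWE when allele A has frequency p."""
    q = 1.0 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


def allele_frequency(genotype_freqs: dict) -> float:
    """Frequency of A: all alleles of AA plus half the alleles of Aa."""
    return genotype_freqs["AA"] + genotype_freqs["Aa"] / 2


p = 0.3  # illustrative starting frequency of allele A
for generation in range(3):
    freqs = hardy_weinberg(p)
    p = allele_frequency(freqs)  # allele frequency passed on to the next generation
    print(generation, {g: round(f, 4) for g, f in freqs.items()}, round(p, 4))
```

The printed frequencies are the same in every generation, which is the equilibrium property.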

IMPLICATION OF HWE

Despite its very restrictive (and rarely fully satisfied) assumptions, HWE offers a reference model in which there are no evolutionary forces at work other than those imposed by the process of reproduction itself (like a mechanical model of a falling object with no force in action other than gravity)

The HW model separates the life cycle into two phases: gametes->zygote and zygote->adult

Even if the assumption of non-overlapping generations does not hold, HWE will be attained gradually

Applies also to multiallelic genes

APPLICATION OF HWE

We can calculate the number of carriers of a rare mutation in the population

Ex: the frequency of cystic fibrosis patients in the European population is known to be about 1 in 1700 (so q² = 1/1700 and q = 0.024), so the frequency of heterozygotes is (under HWE) 2pq, about 5%

So when an allele is very rare, most of the genotypes containing this allele are heterozygous:

Show that for a rare allele of frequency 1/1000 there are about 2000 times more heterozygotes than recessive homozygotes.
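Both calculations follow directly from the HWE proportions q² and 2pq. A minimal sketch, assuming a recessive disease in a population at HWE; the function names are illustrative.

```python
from math import sqrt


def carrier_frequency(disease_frequency: float) -> float:
    """Heterozygote frequency 2pq when affected individuals have frequency q**2."""
    q = sqrt(disease_frequency)
    return 2 * (1 - q) * q


def het_to_hom_ratio(q: float) -> float:
    """Ratio of heterozygotes (2pq) to recessive homozygotes (q**2) under HWE."""
    return 2 * (1 - q) / q


print(round(carrier_frequency(1 / 1700), 3))  # cystic fibrosis example
print(round(het_to_hom_ratio(1 / 1000)))      # rare allele exercise
```

The ratio simplifies to 2p/q, which is close to 2/q when the allele is rare.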

HWE DEVIATION

Deviation from HWE can be due to inbreeding, population stratification, selection, sex-dependent allele frequencies, or non-random (assortative) mating

The principle does not apply directly to X-linked or Y-linked genes

TESTS OF HWE

Compare observed to expected genotype counts using Pearson chi-square test of goodness of fit: with 3 genotypes and 1 parameter estimated (p) we have a test with 1 df

Inappropriate for rare variants (low genotype counts): use Fisher Exact Test (FET)

Other Exact tests are available in the R language (e.g. Genetics package,…)

PEARSON CHI-SQUARE THROUGH D

Let DA = PAA - p²

Testing HWE is testing DA = 0

$$\chi^2 = \frac{N D_A^2}{\left(p(1-p)\right)^2}$$

Compute p-value = Pr(χ²(1 df) > χ²obs)

If p-value < 0.05 (or 0.0001) then there is deviation from HWE

Example: in a sample of 1000 individuals, 298 were of genotype MM, 489 MN and 213 NN, so the frequency of allele M is pM = 0.54

Genotypes:        MM      MN      NN
Observed counts:  298     489     213
Expected counts:  294.3   496.4   209.3

pM = 0.54, observed PMM = 0.298, expected p² = 0.294, so D = 0.298 - 0.294 = 0.004

χ² = N D²/(p(1-p))² = 1000*(0.004/(0.54*0.46))²

χ² ≈ 0.25 < 3.84; p-value = 0.61, so there is no evidence of deviation from HWE

TESTS OF HWE: LET’S DO IT!
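As a sketch of how the test can be carried out, the following Python function computes the Pearson goodness-of-fit statistic from the three genotype counts and its p-value for 1 df. It uses only the standard library; the identity Pr(χ²(1 df) > x) = erfc(sqrt(x/2)) replaces a statistics package. Because it does not round D, the statistic it returns can differ slightly in the second decimal from a hand calculation made with rounded values.

```python
from math import erfc, sqrt


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> tuple:
    """Pearson chi-square test of HWE (1 df) from genotype counts."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


chi2, p_value = hwe_chi_square(298, 489, 213)  # MN blood group example
print(round(chi2, 2), round(p_value, 2))
```

For rare variants with low expected counts an exact test remains preferable, as noted above.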

HAPLOTYPES FROM GENOTYPES

If we study many genes, they can be linked and one can use haplotypes

A haplotype (haploid genotype) is the set of alleles carried by one chromosome at several genes

Consider two genes (A,a) and (B,b) with allele frequencies (pA, pa) and (pB, pb)

If gametic frequencies are the products of allele frequencies:

AB: pA x pB, Ab: pA x pb, aB: pa x pB, ab: pa x pb

we say that the genes are in random association, or in linkage equilibrium

LINKAGE DISEQUILIBRIUM

If the observed frequency of a gamete (e.g. PAB) differs from that expected under linkage equilibrium (pA x pB), we say that the genes are in Linkage Disequilibrium (LD)

To measure and test LD we need to know the haplotype frequencies

LINKAGE DISEQUILIBRIUM

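[Figure: two SNPs, SNP1 with alleles A/a (allele frequencies 40% and 60%) and SNP2 with alleles B/b (allele frequencies 30% and 70%). With no LD, the four haplotype frequencies are the products of the allele frequencies: 12%, 28%, 18% and 42%. With LD, only three haplotypes are observed, with frequencies 60%, 30% and 10%.]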

LD MEASURES: D

The difference between observed and expected haplotype frequency:

$$D = P_{AB} - p_A p_B$$

It is also equal to:

$$D = P_{AB}P_{ab} - P_{Ab}P_{aB}$$

D is bounded between Dmin and Dmax

D’: STANDARDIZED D

In practice, choose alleles A and B such that D>0 and pA>pB

A standardized measure of LD is thus:

$$D' = \frac{D}{D_{\max}} = \frac{D}{(1-p_A)\,p_B}$$

D’=1 denotes complete LD

THE R² MEASURE : MORE PRACTICAL

This is the squared correlation from the 2x2 contingency table of haplotype counts:

$$r^2 = \frac{D^2}{p_A\,p_a\,p_B\,p_b}$$

Or, equivalently:

$$r^2 = (D')^2\,\frac{p_a\,p_B}{p_A\,p_b}$$

TESTING LD

We can show that N r² (N being the number of haplotypes) is a chi-square test of LD (1 df)

Exercise: two blood group systems, M/N and S/s, gave the following haplotype counts (1000 individuals):

MS: 474   Ms: 611   NS: 142   Ns: 773

Allele frequencies are M: 0.54, S: 0.31

Compute D, D’ and r²

Test LD

Solution: D=0.07, D’=0.50, r²=0.09, X²=185, p<10^-40
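The three measures can be computed from the four haplotype counts as sketched below. Two assumptions are made here: the Ns count is taken as 773 so that the four counts sum to the 2000 haplotypes of 1000 individuals, and D’ uses the general bound on D for either sign, which reduces to (1-pA)pB when D>0 and pA>pB. The function name `ld_measures` is illustrative.

```python
def ld_measures(n_AB: int, n_Ab: int, n_aB: int, n_ab: int) -> dict:
    """D, D', r2 and the chi-square statistic from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab          # number of haplotypes
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    D = n_AB / n - p_A * p_B               # observed minus expected frequency of AB
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    r2 = D * D / (p_A * (1 - p_A) * p_B * (1 - p_B))
    return {"D": D, "D_prime": abs(D) / d_max, "r2": r2, "chi2": n * r2}


# M/N and S/s exercise; Ns = 773 is an assumption (counts then sum to 2000).
result = ld_measures(n_AB=474, n_Ab=611, n_aB=142, n_ab=773)
print({name: round(value, 3) for name, value in result.items()})
```

The function assumes both genes are polymorphic in the sample; otherwise the denominators are zero.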

CAUSES OF LD

LD is ‘created by linkage’

If r is the recombination rate between two genes, then we can show that LD at generation t is given by

$$D_t = (1-r)^t D_0$$

If r is small (genes very close on the chromosome) the decay is very slow and LD can persist for hundreds of generations
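The decay formula is easy to explore numerically. The sketch below evaluates D after t generations and the number of generations needed for D to fall to a given fraction of its initial value; the recombination rates used are illustrative values, not data from the slides.

```python
from math import log


def ld_after(d0: float, r: float, t: int) -> float:
    """LD remaining after t generations: D_t = (1 - r)**t * D_0."""
    return (1 - r) ** t * d0


def generations_to_fraction(r: float, fraction: float = 0.5) -> float:
    """Generations needed for D to fall to the given fraction of D_0."""
    return log(fraction) / log(1 - r)


for r in (0.5, 0.01, 0.001):  # unlinked, close and very close genes
    print(r, round(generations_to_fraction(r), 1), round(ld_after(0.25, r, 100), 4))
```

Even for unlinked genes (r = 0.5) LD is only halved each generation, not removed at once.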

RECOMBINATION AND LD

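[Figure: a double heterozygote produces each non-recombinant gamete with frequency (1-r)/2 and each recombinant gamete with frequency r/2]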

DECAY OF LD OVER GENERATIONS

ADMIXTURE OF POPULATIONS

LD can be created by the merging of populations having different gametic frequencies

Consider two populations and two genes in linkage equilibrium in both, where alleles A and B both have frequency 0.05 in the first population and 0.95 in the second population

A new population is formed by an equal mixture of the two populations; show that LD is high in that population (D=0.2 and D’=0.81).
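The exercise can be verified numerically: within each source population the AB haplotype frequency is the product of the allele frequencies, and the mixture averages these haplotype frequencies. The function name `admixture_ld` and the use of the general bound on D for D’ are assumptions of this sketch.

```python
def admixture_ld(populations: list, weights: list) -> tuple:
    """D and D' in a mixture of populations that are each in linkage equilibrium.

    populations: list of (p_A, p_B) pairs; weights: mixing proportions summing to 1.
    """
    p_A = sum(w * a for w, (a, _) in zip(weights, populations))
    p_B = sum(w * b for w, (_, b) in zip(weights, populations))
    # Linkage equilibrium inside each population: P_AB = p_A * p_B there.
    P_AB = sum(w * a * b for w, (a, b) in zip(weights, populations))
    D = P_AB - p_A * p_B
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return D, abs(D) / d_max


D, D_prime = admixture_ld([(0.05, 0.05), (0.95, 0.95)], [0.5, 0.5])
print(round(D, 4), round(D_prime, 2))
```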

ADMIXTURE

NATURAL (DARWINIAN) SELECTION

Individuals differ in their ability to survive and reproduce owing in part to their genotype

The selective advantage/disadvantage is measured by fitness

Selection results in a change of allele frequencies over generations and deviation from HWE

EFFECT OF SELECTION

RANDOM GENETIC DRIFT

In each generation, chance plays a role in the drawing of the gametes that will unite to form the next generation

This chance can result in a random change in allele frequency and may ultimately lead to the fixation or elimination of some alleles

SIMPLY SAYING

MATHEMATICAL MODELS OF DRIFT

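Wright-Fisher model (1930): in a diploid population of size N, the probability of obtaining k copies of an allele that had frequency p in the last generation (with q = 1-p) is:

$$\binom{2N}{k}\,p^k\,q^{2N-k}$$

The expected time (in generations) before a neutral allele becomes fixed through genetic drift is given by:

$$\bar{T}_{\text{fixed}} = \frac{-4N_e\,(1-p)\ln(1-p)}{p}$$

where N_e is the effective population size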
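The sampling step of the Wright-Fisher model can be sketched in Python as binomial sampling of 2N gametes in each generation. The population size, starting frequency and seed below are illustrative assumptions, and the simulation draws each gamete individually to stay within the standard library.

```python
import random
from math import comb


def wf_transition_probability(k: int, n_diploid: int, p: float) -> float:
    """Probability of k copies next generation among 2N, given current frequency p."""
    two_n = 2 * n_diploid
    return comb(two_n, k) * p ** k * (1 - p) ** (two_n - k)


def simulate_drift(p0: float, n_diploid: int, generations: int, seed: int = 0) -> list:
    """One random trajectory of the allele frequency under pure drift."""
    rng = random.Random(seed)
    two_n = 2 * n_diploid
    trajectory = [p0]
    p = p0
    for _ in range(generations):
        # Draw 2N gametes, each carrying the allele with probability p.
        copies = sum(rng.random() < p for _ in range(two_n))
        p = copies / two_n
        trajectory.append(p)
    return trajectory


path = simulate_drift(p0=0.5, n_diploid=20, generations=50)
print([round(x, 2) for x in path[:10]])
```

Once the frequency reaches 0 or 1 it stays there, which corresponds to loss or fixation of the allele.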

POPULATION BOTTLENECK

FOUNDER EFFECT

POPULATION SUBSTRUCTURE

When a population is organized in several subpopulations having different genetic composition (allele frequencies)

Substructure generally results in a reduction of the heterozygote frequency relative to that expected under random mating (Wahlund principle)

Several measures to assess population substructure : F-statistics

F-STATISTICS

Defined by Wright (1921)

(1-FIT)=(1-FIS)(1-FST)

ANOTHER FORMULATION

The most useful for testing substructure is FST, an index that measures the level of genetic divergence among subpopulations

FST = (HT - HS)/HT

HS: average heterozygosity among individuals within subpopulations

HT: average heterozygosity among individuals within the total population

In terms of the variance of allele frequencies among subpopulations:

$$F_{ST} = \frac{\sigma_p^2}{\bar{p}\,(1-\bar{p})}$$

It can be calculated with the R package hierfstat

FST is not a genetic distance

HOW TO USE IT?

FST=1 means total divergence by fixation of alternative alleles in subpopulations

FST<0.05: little differentiation

0.05<FST<0.15: moderate

0.15<FST<0.25: high

FST>0.25: very high

Test: chi-square with 1 df: X² = (k-1) N FST

Examples: between Europeans and sub-Saharan Africans: 0.15; Japanese-Africans: 0.19; Europeans: 0.11

EXAMPLE

Two populations in which the allele frequency is 0.5 and 0.3
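A worked version of this example, as a sketch that assumes one biallelic gene, two subpopulations of equal size, and expected heterozygosity 2p(1-p) within each; the function names are illustrative.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1-p) for a biallelic gene."""
    return 2 * p * (1 - p)


def fst(allele_freqs: list) -> float:
    """FST = (HT - HS) / HT for equally sized subpopulations."""
    p_bar = sum(allele_freqs) / len(allele_freqs)
    h_t = heterozygosity(p_bar)                                            # total population
    h_s = sum(heterozygosity(p) for p in allele_freqs) / len(allele_freqs)  # within subpopulations
    return (h_t - h_s) / h_t


value = fst([0.5, 0.3])
print(round(value, 3))  # 0.042
```

Here HT = 0.48 and HS = 0.46, so FST is about 0.04, which falls in the "little differentiation" range.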

ADMIXTURE

Genetic admixture occurs when individuals from two or more previously separated populations begin interbreeding.

Admixture results in the introduction of new genetic lineages into a population.

Most human populations are a product of mixture of genetically distinct groups that intermixed within the last 4,000 years.

ADMIXTURE DETECTION

By testing HWE

Standard statistical methods applied to data on genotypes and allele/haplotype frequencies: Principal Component Analysis (PCA); clustering: K-means, hierarchical, ...

Advanced methods:

Maximum likelihood (psmix R package)

Bayesian methods

Wavelet analysis (adwave R package)

STRUCTURE

PRINCIPAL COMPONENT ANALYSIS

CLUSTERING

STRUCTURE

Used for inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed.

http://pritchardlab.stanford.edu/structure.html



ADMIXTURE

https://www.genetics.ucla.edu/software/admixture/



R PACKAGES

Genetics: Classes and methods for handling genetic data. Includes classes to represent genotypes and haplotypes at single markers up to multiple markers on multiple chromosomes. Functions include allele frequencies, flagging homo/heterozygotes, flagging carriers of certain alleles, estimating and testing for Hardy-Weinberg disequilibrium, estimating and testing for linkage disequilibrium, ...

Adegenet: Classes and functions for genetic data analysis within the multivariate framework

Hierfstat: estimation of hierarchical F-statistics from haploid or diploid genetic data with any numbers of levels in the hierarchy, following the algorithm Functions are also given to test via randomisation the significance of each F and variance components
