Human Genome Project

Transcript of Human Genome Project

Page 1: Human Genome Project

HUMAN GENOME PROJECT

Page 2: Human Genome Project

BASIC STRATEGY

How to determine the sequence of the roughly 3 billion base pairs of the human genome. Started in 1990.

Various side projects: genetic diseases, variations between individuals, ethnic variation, comparison to other species.

Strategy:

• 1. Physical map relating specific DNA markers to the proper chromosomal position.

• 2. Overlapping set of cloned DNAs (contigs).

• 3. Sequencing and assembly.

• 4. Finding the genes in the sequence.

• 5. Annotation of gene function.

Page 3: Human Genome Project

GENETIC MAPPING

Genetic mapping asks where genes lie on the chromosomes: the goal is to locate each gene within the total genome.

A genetic map uses recombination (crossing over during meiosis) to determine how frequently two genes (or markers) are inherited together.

Genes genotypes phenotypes

Page 4: Human Genome Project

Gene map

Two types: linkage map and physical map.

Linkage map: tells you whether 2 genes are close together or far apart on a chromosome. It gives no physical location.

Physical map: tells you where genes are located on the chromosome.

Page 5: Human Genome Project

LINKAGE MAP

Genetic linkage is the tendency of genes that are located proximal to each other on a chromosome to be inherited together during meiosis.

Genes whose loci are nearer to each other are less likely to be separated onto different chromatids during chromosomal crossover, and are therefore said to be genetically linked.

In other words, the nearer two genes are on a chromosome, the lower is the chance of a swap occurring between them, and the more likely they are to be inherited together.

Page 6: Human Genome Project

CHROMOSOME THEORY OF LINKAGE

Morgan, along with Castle, formulated the chromosome theory of linkage. It has the following postulates:

1. Genes are found arranged in a linear manner in the chromosomes.

2. Genes which exhibit linkage are located on the same chromosome.

3. Genes generally tend to stay in parental combination, except in cases of crossing over.

4. The distance between linked genes in a chromosome determines the strength of linkage. Genes located close to each other show stronger linkage than genes located far from each other, since the former are less likely to be separated by crossing over.

Page 7: Human Genome Project

However, crossing over does not occur between linked genes in every meiotic event, especially when the positions of the genes on the chromosome are very near one another.

The frequency with which crossing over occurs between any two linked genes is roughly proportional to the distance between the loci along the chromosome, at least over short distances.

Page 8: Human Genome Project

1. At very small distances, crossover is very rare, and most gametes are parental.

2. As the distance between two genes increases, crossover frequency increases. More recombinant gametes, fewer parental gametes.

3. When genetic loci are very far apart on the same chromosome, crossing over nearly always occurs, and the frequency of recombinant gametes approaches 50 percent.
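The three cases above can be illustrated numerically. The sketch below is not from the slides: it assumes Haldane's mapping function (crossovers placed independently, no interference) to relate map distance to the expected recombinant fraction, and the function names are made up for illustration.

```python
import math


def recombination_fraction(parental: int, recombinant: int) -> float:
    """Observed fraction of gametes that are recombinant."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no gametes counted")
    return recombinant / total


def haldane_recombination(distance_morgans: float) -> float:
    """Expected recombinant fraction at a given map distance.

    Assumption: Haldane's model, r = 0.5 * (1 - exp(-2d)), with d in Morgans.
    """
    return 0.5 * (1.0 - math.exp(-2.0 * distance_morgans))


if __name__ == "__main__":
    # Short distances give r close to d; long distances level off below 0.5.
    for centimorgans in (1, 10, 50, 200, 500):
        r = haldane_recombination(centimorgans / 100.0)
        print(f"{centimorgans:>4} cM -> recombinant fraction {r:.4f}")
```

Under this model the recombinant fraction tracks the distance closely when loci are near each other and flattens toward, but never exceeds, 50 percent when they are far apart.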

Page 9: Human Genome Project

WHAT IS A MOLECULAR MARKER?

A DNA sequence used to mark a particular location on a particular chromosome.

Page 10: Human Genome Project

GENETIC MARKERS

Modern genetic markers: SNPs

A genetic marker is a gene or DNA sequence with a known location on a chromosome that can be used to identify individuals or species.

It can be described as a variation (which may arise due to mutation or alteration in the genomic loci) that can be observed.

A genetic marker may be a short DNA sequence, such as a sequence surrounding a single base-pair change (single nucleotide polymorphism, SNP), or a long one, like

Page 11: Human Genome Project

What are they?

• Variable sites in the genome

What are their uses?

• Finding disease genes

• Testing/estimating relationships

• Studying population differences

Phenotype     Genotype
Brown eyes    BB or Bb
Blue eyes     bb

Page 12: Human Genome Project

PHYSICAL MAPPING

Cytogenetic mapping

A cytogenetic map is the visual appearance of a chromosome when stained and examined under a microscope.

Particularly important are visually distinct regions, called light and dark bands, which give each of the chromosomes a unique appearance.

This feature allows a person's chromosomes to be studied in a clinical test known as a karyotype, which allows scientists to look for chromosomal alterations.

Page 13: Human Genome Project

PHYSICAL MAPS

A physical map determines where a given DNA marker is located on the DNA of the chromosome.

Genetic and physical maps are (supposed to be) colinear: all the genes appear in the same order in both maps. But the distances are quite different: there is very little recombination near the centromeres, so large DNA distances there correspond to very short recombination distances.

Genetic maps using microsatellite (SSR) markers were used to develop physical maps: the appropriate SSR sites were expected to be found on the corresponding cloned DNA.

Page 14: Human Genome Project

SEQUENCE TAGGED SITES

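Expressed sequence tags (ESTs), one source of sequence tagged sites, are produced by sequencing cDNA copies of the RNA transcribed from genes.

The RNA present in a tissue comes from the genes that are turned on in that tissue.

They are called “tags” because they are not complete gene sequences: each gene is only partially sequenced.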

Page 15: Human Genome Project

SEQUENCE TAGGED SITES

A sequence tagged site (STS) is a short sequence that is unique in the genome.

You obtain the sequence information from cloned DNA, and then locate it in the genome.

Using PCR it is then possible to determine whether your STS is present in any other clone or cell line.

Obtaining STS: sequencing the ends of large cloned DNAs (BACs or YACs, for example).

Uniqueness: use the cloned DNA from the STS as a probe on a Southern blot of genomic DNA: if the STS is unique, only 1 band will hybridize.

Repetitive DNA is very common in the human genome, and many DNA sequences are not unique.

A good source of unique DNA is EST clones: cDNA made from messenger RNA.
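The uniqueness test has a simple computational analogue once sequence is available. The sketch below is an illustration only: it swaps the Southern blot for an exact string search on a made-up toy sequence, checks both strands, and all names in it are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_hits(genome: str, candidate: str) -> int:
    """Count exact occurrences of candidate on either strand (overlaps allowed)."""
    hits = 0
    # A set avoids double counting when the candidate is its own reverse complement.
    for query in {candidate, reverse_complement(candidate)}:
        start = genome.find(query)
        while start != -1:
            hits += 1
            start = genome.find(query, start + 1)
    return hits


def is_unique(genome: str, candidate: str) -> bool:
    """A candidate qualifies as an STS only if it occurs exactly once."""
    return count_hits(genome, candidate) == 1


if __name__ == "__main__":
    # Toy sequence: the block GGCCATTCA is deliberately repeated twice.
    toy_genome = "AAGCTTGGCCATTCAGGCCATTCAGTACGGA"
    print(is_unique(toy_genome, "GTACGGA"))    # single-copy candidate
    print(is_unique(toy_genome, "GGCCATTCA"))  # repeated candidate
```

Exact matching is stricter than hybridization, which also detects closely related sequences, so a real screen would tolerate mismatches.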

Page 16: Human Genome Project

SOMATIC CELL HYBRIDS

Human and mouse (or hamster) cultured cells can be fused together using polyethylene glycol.

• The resulting fused cell is a heterokaryon: it has 2 nuclei from different species.

• If the heterokaryon undergoes mitosis, the nuclei fuse.

• Human chromosomes are unstable in a mixed nucleus, and most of them are randomly lost. The mouse chromosomes all stay.

• Different cell lines can be established that contain different combinations of human chromosomes

• You can identify which human chromosomes remain using chromosome banding techniques.

A good way to determine which chromosome a DNA sequence is on. Sometimes also for gene products or phenotypes.
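Assigning a sequence to a chromosome with a hybrid panel is a concordance test: the sequence should be detected in exactly those cell lines that kept its chromosome. A minimal sketch, with an invented four-line panel and hypothetical names:

```python
def concordant_chromosomes(panel, marker_present):
    """Return human chromosomes whose retention pattern matches the marker.

    panel: dict mapping cell line name -> set of human chromosomes retained.
    marker_present: dict mapping cell line name -> True if the marker
    was detected in that line (for example by PCR).
    """
    all_chromosomes = set().union(*panel.values())
    matches = []
    for chromosome in sorted(all_chromosomes):
        # Concordant means present-with-marker and absent-without-marker in every line.
        if all((chromosome in panel[line]) == marker_present[line] for line in panel):
            matches.append(chromosome)
    return matches


if __name__ == "__main__":
    # Invented panel; real panels use many more lines.
    panel = {
        "lineA": {"1", "7", "X"},
        "lineB": {"7", "12"},
        "lineC": {"1", "12"},
        "lineD": {"7", "X"},
    }
    marker_present = {"lineA": True, "lineB": True, "lineC": False, "lineD": True}
    print(concordant_chromosomes(panel, marker_present))
```

With this toy data only chromosome 7 is concordant in every line. Real data contain scoring errors, so a practical version would rank chromosomes by how many lines agree instead of demanding a perfect match.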

Page 17: Human Genome Project

RADIATION HYBRIDS

Standard somatic cell fusions contain entire human chromosomes. To locate a gene more closely, you need to use chromosome fragments.

Start by irradiating human cells with a controlled dose of X-rays: chromosomes break up. Then, fuse the cells to mouse cells. The human chromosome fragments get integrated into the mouse chromosomes.

Create a panel of mouse/human hybrid cell lines.

• The current standard panels contain about 100 cell lines.

• Each line contains about 32% of the human genome.

• Average size of human genome fragment = 25 Mbp.

• More radiation = smaller fragments.

Mapping: the hybrid cell lines contain random human chromosome fragments, but closely linked sites are usually in the same cell line (same basic principle as recombination mapping).

• Until you have located some of the markers on the chromosomes, radiation hybrid mapping only gives you information about whether any two sequences are close together on the chromosome.
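The co-retention idea can be sketched as a pairwise comparison of retention patterns. This is a simplified illustration, not a real radiation hybrid mapping package: each marker is scored 1 or 0 in each cell line, and the fraction of lines where exactly one of two markers is present stands in for the distance between them. The marker names and patterns are made up.

```python
from itertools import combinations


def discordance(pattern_a: str, pattern_b: str) -> float:
    """Fraction of cell lines in which exactly one of two markers is retained."""
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same cell lines")
    differing = sum(a != b for a, b in zip(pattern_a, pattern_b))
    return differing / len(pattern_a)


def rank_pairs(retention):
    """Return marker pairs ordered from most to least tightly co-retained."""
    return sorted(
        combinations(sorted(retention), 2),
        key=lambda pair: discordance(retention[pair[0]], retention[pair[1]]),
    )


if __name__ == "__main__":
    # One character per cell line: 1 = marker detected, 0 = not detected.
    retention = {
        "M1": "1100110010",
        "M2": "1100110110",
        "M3": "0011010101",
    }
    for first, second in rank_pairs(retention):
        score = discordance(retention[first], retention[second])
        print(first, second, score)
```

In the toy data M1 and M2 differ in only one of ten lines, so they rank as the closest pair. As noted above, this gives relative closeness only until some markers are anchored to known chromosomal positions.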

Page 18: Human Genome Project

CONTIGS

A contig is a set of partially overlapping clones: a contiguous set of clones with no gaps between them.

Contigs allow you to build up the sequence of the chromosome over much larger regions than any single clone.

The first reasonably complete physical map of the human genome involved contigs generated by YACs (yeast artificial chromosomes).

Initially, you have a collection of clones with no information about how they are ordered on the chromosome.

Contigs are built up by using PCR to identify unique sequences (STS or EST) on each clone, and then looking for overlaps between the clones.
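Grouping clones by shared markers can be sketched as finding connected components. The example below is a simplified illustration with invented clone and STS names: two clones are assumed to overlap if they share at least one STS, and each connected group is reported as one contig. It does not order the clones within a contig.

```python
def build_contigs(clone_sts):
    """Group clones into contigs.

    clone_sts: dict mapping clone name -> set of STS names detected on it.
    Returns a list of contigs, each a sorted list of clone names.
    """
    names = sorted(clone_sts)
    unvisited = set(names)
    contigs = []
    for name in names:
        if name not in unvisited:
            continue
        unvisited.discard(name)
        stack, members = [name], []
        while stack:
            current = stack.pop()
            members.append(current)
            # sorted() makes a copy, so removing from unvisited here is safe.
            for other in sorted(unvisited):
                if clone_sts[current] & clone_sts[other]:
                    unvisited.discard(other)
                    stack.append(other)
        contigs.append(sorted(members))
    return contigs


if __name__ == "__main__":
    clones = {
        "yacA": {"sts1", "sts2"},
        "yacB": {"sts2", "sts3"},
        "yacC": {"sts3", "sts4"},
        "yacD": {"sts7"},
        "yacE": {"sts7", "sts8"},
    }
    print(build_contigs(clones))
```

Here yacA, yacB and yacC chain together through sts2 and sts3, while yacD and yacE form a second contig, leaving a gap between the two groups.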

Page 19: Human Genome Project

SEQUENCING STRATEGY

Once a contig map of the genome was obtained, it was necessary to sequence each individual clone.

Most of the actual human genome sequencing was done on BAC clones, which are less prone to rearrangement than YAC clones. BACs are about 100-200 kbp long.

Large clones are generally sequenced by shotgun sequencing: the large cloned DNA is randomly broken up into a series of small fragments (less than 1 kb). These fragments are cloned and sequenced. A computer program then assembles them based on overlaps between the sequences of each clone.
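The assembly step can be illustrated with a toy greedy assembler that repeatedly merges the two reads with the longest suffix-to-prefix overlap. This is a teaching sketch under strong assumptions (error-free reads, one strand, no repeats, no read contained inside another), not the software used by the project. The function names and the read set are made up.

```python
def overlap(left: str, right: str, min_len: int) -> int:
    """Length of the longest suffix of left that equals a prefix of right."""
    for size in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Merge reads greedily by best overlap; return the remaining sequences."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, None, None
        for i, left in enumerate(reads):
            for j, right in enumerate(reads):
                if i == j:
                    continue
                size = overlap(left, right, min_len)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return reads  # nothing left to merge
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


if __name__ == "__main__":
    reads = ["GCAATCCGTA", "ATGGCGTG", "GCGTGCAAT"]
    print(greedy_assemble(reads))
```

The three toy reads collapse into the single sequence ATGGCGTGCAATCCGTA. Repeated sequence defeats this simple approach, because a repeat produces overlaps between reads that are not actually adjacent.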

To ensure that every bit has been covered, you need to sequence random clones until you have covered each spot 5-10 times on average.
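The need for 5-10 fold coverage can be checked with a back-of-the-envelope calculation. The sketch below assumes the standard Poisson approximation (often attributed to Lander and Waterman), in which reads start at random positions and the chance that a given base is missed at coverage c is about exp(-c). The clone and read sizes are illustrative values, not figures from the slides.

```python
import math


def coverage(num_reads: int, read_len: int, target_len: int) -> float:
    """Average number of times each base is sequenced."""
    return num_reads * read_len / target_len


def fraction_uncovered(cov: float) -> float:
    """Expected fraction of bases never sequenced (Poisson assumption)."""
    return math.exp(-cov)


def reads_needed(target_len: int, read_len: int, cov: float) -> int:
    """Number of random reads required to reach the requested coverage."""
    return math.ceil(cov * target_len / read_len)


if __name__ == "__main__":
    bac_len, read_len = 150_000, 500  # illustrative sizes
    for cov in (1, 5, 8, 10):
        n = reads_needed(bac_len, read_len, cov)
        missing = fraction_uncovered(cov) * bac_len
        print(f"{cov:>2}x: {n} reads, about {missing:.0f} bp expected uncovered")
```

For a 150 kbp clone and 500 bp reads, 8-fold coverage takes 2400 reads and leaves roughly 50 bp uncovered on average, whereas 1-fold coverage would miss more than a third of the clone.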

Page 20: Human Genome Project

WHOLE GENOME SHOTGUN SEQUENCING

Why bother with creating a large scale physical map: all that YAC and BAC cloning, radiation hybrids, STS comparisons, etc? Why not just fragment the whole genome into 1 kb pieces, sequence them all, and let the computer assemble the whole genome?

In practice, the genome is cloned into large fragments first, and then each large fragment is broken up for shotgun sequencing. But, the large fragments are not ordered: no physical map or set of contigs is created.

Requires a lot of overlapping coverage

Also requires good software.

Very successful for prokaryotic genomes (10 Mbp or less).

• But the human genome is 300 times larger.

Big problem: repeat sequence DNA, which is everywhere, and especially near the centromere. To find overlaps between clones, you need unique regions.

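When the project began, it was unclear whether whole genome shotgun sequencing would work if there were no other information available to provide order. It has since been used for eukaryotic genomes, beginning with Drosophila in 2000, typically with paired reads from both ends of each clone to help order the assembly.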

Page 21: Human Genome Project

EST (EXPRESSED SEQUENCE TAG): A unique stretch of DNA within a coding region of a gene that is useful for identifying full-length genes and serves as a landmark for mapping.

An EST is a sequence tagged site (STS) derived from cDNA.

An STS is a short segment of DNA which occurs but once in the genome and whose location and base sequence are known. STSs are detectable by the polymerase chain reaction (PCR), are helpful in localizing and orienting mapping and sequence data, and serve as landmarks in the physical map of the genome.

Page 22: Human Genome Project

EXPRESSED-SEQUENCE TAGS (ESTs)

ESTs are cDNA sequences that have been sequenced from either the 5’ or 3’ ends. They may contain all or part of a particular cDNA coding sequence, and are useful for identifying unknown genes, mapping their positions within a genome, and as a potential source for genetic material when a full-length cDNA is not available for a specific gene of interest.

Page 23: Human Genome Project

GENE DETECTION

The best evidence that a given DNA sequence is expressed is to find an EST (cDNA copy of mRNA) that matches it.

Large numbers of EST libraries have been constructed and sequenced.

• The primary result of this was to determine that many genes have several different intron splicing patterns: sequences are exons in some tissues but introns in others.

Page 24: Human Genome Project

GENE DETECTION

Homology searches, using BLAST, are a good way to find genes. If a DNA sequence closely matches a sequence from another organism, it has been evolutionarily conserved, and that usually means that it is an expressed gene.

Exon prediction: exons need to be open reading frames (no stop codons), and they display patterns of nucleotide usage different from random DNA. Several different programs exist, and they give somewhat varying results. “Hypothetical genes” are genes whose existence has been predicted by computer but which lack any experimental or cross-species data to confirm them.

• A “conserved hypothetical gene” is a sequence that matches other species even though there is no EST or other experimental evidence for its expression.
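The open reading frame requirement is the easiest part of exon prediction to sketch. The example below scans the three forward reading frames for stretches without stop codons. It is a deliberately minimal illustration: it ignores the reverse strand, splice sites and nucleotide-usage statistics that real prediction programs rely on, and the function name and threshold are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def open_reading_frames(seq: str, min_codons: int = 3):
    """Return (frame, start, end) for stop-free stretches on the forward strand.

    start is inclusive and end is exclusive; stop codons are not included.
    """
    seq = seq.upper()
    found = []
    for frame in range(3):
        start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    found.append((frame, start, pos))
                start = pos + 3  # next stretch begins after the stop codon
        # Close the final stretch at the last complete codon of this frame.
        end = frame + ((len(seq) - frame) // 3) * 3
        if (end - start) // 3 >= min_codons:
            found.append((frame, start, end))
    return found


if __name__ == "__main__":
    for frame, start, end in open_reading_frames("ATGGCCAAATAGCCC"):
        print(frame, start, end)
```

On short sequences almost every frame is stop-free by chance, which is why practical predictors require long open frames and combine them with the other evidence described above.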

Page 25: Human Genome Project
Page 26: Human Genome Project

GENOME ANNOTATION

The process of identifying the locations of genes and all of the coding regions in a genome and determining what those genes do.

Once a genome is sequenced, it needs to be annotated to make sense of it.

Page 27: Human Genome Project

GENE ANNOTATION

There is a big problem of too much information that is not uniformly coded or maintained. The scientific literature contains numerous examples of the same gene or protein with several different names, and getting common definitions of functions is even harder.

To counter this, the Gene Ontology Consortium (GO) has created a controlled vocabulary of about 11,000 terms.

Every gene product (protein) can be annotated into three general categories:

• molecular function: what the protein actually does, such as “kinase activity”

• biological process: what cellular process the protein participates in, such as “signal transduction”

• cellular component: where the protein is found in the cell, such as “integral to the plasma membrane”

Each gene product can have multiple descriptive terms.

The terms are hierarchical: more specific terms are contained within less specific terms.

But, a given term can have more than one parent and more than one child term.
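Because a term can have more than one parent, the vocabulary forms a directed acyclic graph, not a simple tree. The sketch below shows how annotating a gene product with a specific term implies all of its less specific ancestors. The small hierarchy is invented for illustration and is not copied from the real Gene Ontology.

```python
def ancestors(term: str, parents) -> set:
    """Collect every less specific term reachable from term.

    parents: dict mapping a term -> list of its direct parent terms.
    """
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    # Hypothetical mini-hierarchy: the first term has two parents.
    parents = {
        "protein kinase activity": ["kinase activity", "protein-modifying activity"],
        "kinase activity": ["catalytic activity"],
        "protein-modifying activity": ["catalytic activity"],
        "catalytic activity": ["molecular function"],
    }
    for name in sorted(ancestors("protein kinase activity", parents)):
        print(name)
```

The shared ancestor "catalytic activity" is reached by two different paths but reported once, which is the behaviour a search over annotated genes would need.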

Page 28: Human Genome Project

GO EXAMPLE