BITS - Comparative genomics on the genome level

Post on 05-Dec-2014


This is the third presentation of the BITS training on 'Comparative genomics'. It reviews the basic concepts of sequence homology on the gene Thanks to Klaas Vandepoele of the PSB department.

Transcript of BITS - Comparative genomics on the genome level

Comparative genomics in eukaryotes

Genome analysis

Klaas Vandepoele, PhD

Professor, Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium

I. Genome conservation & genomic homology

Alignment of homologous regions
• Inter-genomic: aligning genomic sequences from different species
• Intra-genomic: aligning genomic sequences from the same species

Different levels of resolution
• Comparative mapping (markers)
• Synteny (~ gene content)
• Colinearity (gene content + order conservation)
• DNA-based alignments (base-to-base mapping)

Human – Mouse – Rat

Human – Mouse orthologous regions

Mouse chr IV

Comparative mapping

Genome translocations associated with human-mouse speciation

Human

www.ensembl.org

Human genome browser

Human chr I

Mouse chr IV

Conserved gene content & order

Human gene model

EST/cDNA similarities

Genome similarities

Gene loss and insertions in orthologous segments since human-mouse speciation


Human – Mouse base-to-base mapping

Blue: coding exons; GT: splice donor; AG: splice acceptor

Functional sequences (e.g. exons) evolve more slowly than non-functional ones (e.g. introns) due to natural selection against mutations in these regions

Consequently, functional elements, both coding and non-coding, are unusually well conserved in orthologous regions
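To make the exon/intron contrast concrete, here is a minimal sketch that measures divergence per region in a pairwise alignment. The sequences are invented toy data, not real human or mouse loci. The proportion of differing sites (p-distance) and the Jukes-Cantor correction are standard textbook quantities, used here only as an illustration.

```python
import math

def p_distance(a, b):
    """Fraction of aligned, ungapped columns at which the two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)

def jukes_cantor(p):
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)

# Toy, already aligned regions (hypothetical data): (human, mouse)
regions = {
    "exon1": ("ATGGCCAAG", "ATGGCCAAA"),
    "intron": ("GTAAGTCTTTCAG", "GTGAGCCTATCAG"),
    "exon2": ("GTGCTG", "GTGCTG"),
}

rates = {name: p_distance(h, m) for name, (h, m) in regions.items()}
for name, p in rates.items():
    print(f"{name}: p = {p:.3f}, JC distance = {jukes_cantor(p):.3f}")
```

In this toy case the intron shows the largest p-distance, which is the pattern expected when selection removes mutations in exons.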


DNA substitution rates for different gene/genome regions

Molecular Evolution, Li WH


Multiple species comparisons (gene-based)

PhIGs

Hedges, 2002

Genome size variation in the grasses: the use of model systems

[Figure: grass phylogeny with clades BEP and PACC and divergence times of 55, 46 and 28 MYA]

Species    Genome size
Rice       450 Mb
Barley     ~5000 Mb
Maize      ~2400 Mb
Sorghum    ~750 Mb

Gaut 2002

Grass genomes: a single genetic system?

Gale and Devos, 1998


Micro-colinearity within the grasses

Bennetzen lab

Yeast Gene Order Browser (YGOB)


II. Computational detection of genomic homology

Synteny ~ conservation of gene content

Colinearity ~ conservation of (gene) content & order

Macro-colinearity: marker-based

Micro-colinearity: DNA-based or gene-based

How to find evidence for gene colinearity?

Time

A: 1 2 3 4 5 6 7 8 9 10 11

speciation

S1: 1 2 3 4 5 6 7 8 9 10 11
S2: 1 2 3 4 5 6 7 8 9 10 11

Gene loss, insertions, rearrangements, translocation, etc. …

S1: 1 3 4 6 7 10 11
S2: 1 2 4 6 7 8 9 11

Retained orthologs (anchor points)
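The anchor points for the two segments on this slide can be computed directly. This is a minimal sketch that assumes each gene occurs at most once per segment (no tandem duplicates); the function names are illustrative, not from any tool.

```python
def anchor_points(s1, s2):
    """Genes retained in both segments, as (index in s1, index in s2, gene)."""
    pos2 = {gene: j for j, gene in enumerate(s2)}
    return [(i, pos2[gene], gene) for i, gene in enumerate(s1) if gene in pos2]

def is_colinear(anchors):
    """True if the anchors keep the same relative order in both segments."""
    js = [j for _, j, _ in anchors]
    return all(a < b for a, b in zip(js, js[1:]))

# Segments S1 and S2 as listed on the slide
S1 = [1, 3, 4, 6, 7, 10, 11]
S2 = [1, 2, 4, 6, 7, 8, 9, 11]

anchors = anchor_points(S1, S2)
retained = [gene for _, _, gene in anchors]
print(retained)              # genes kept in both lineages
print(is_colinear(anchors))  # order conserved despite gene loss and insertions
```

Genes 1, 4, 6, 7 and 11 survive in both lineages and still appear in the same order, which is the evidence for colinearity.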

Matrix representation

[Figure: matrix of segment S1 (1 3 4 6 7 10 11, horizontal axis) against segment S2 (1 2 4 6 7 8 9 11, vertical axis); an X marks each gene present in both segments]
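A small sketch of the matrix representation for the S1 and S2 segments listed above: columns follow S1, rows follow S2, and an X marks a gene found in both. The layout (rows versus columns) is an assumption about how the slide's figure was drawn.

```python
def dot_matrix(s1, s2):
    """Rows follow s2, columns follow s1; 'X' marks a gene present in both."""
    return [["X" if g2 == g1 else "." for g1 in s1] for g2 in s2]

S1 = [1, 3, 4, 6, 7, 10, 11]
S2 = [1, 2, 4, 6, 7, 8, 9, 11]

lines = ["".join(row) for row in dot_matrix(S1, S2)]
for gene, line in zip(S2, lines):
    print(f"{gene:>2} {line}")
```

The retained orthologs fall roughly along one diagonal; gaps in the diagonal correspond to genes lost or inserted in one of the two segments.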

Map-based approach

[Figure: gene homology matrix with Chromosome 1 and Chromosome 2 as axes]

• Represent chromosomes as sorted gene lists
• Identify all homologous gene pairs between chromosomes (all-against-all BLASTP*).
• Score pairs of homologues in matrix

Identifying homologous regions = identifying diagonal series of elements in the gene homology matrix (GHM).

Vandepoele et al., Genome Research 2002
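The idea of finding diagonal series in a GHM can be sketched as follows. This is a simple greedy illustration, not the published method of Vandepoele et al.: gene names, the homologous pairs (which would normally come from the BLASTP step), and the `max_gap` and `min_size` parameters are all hypothetical.

```python
def homology_cells(chr1, chr2, homologous_pairs):
    """Matrix cells (i, j) where gene i of chr1 is homologous to gene j of chr2."""
    return [
        (i, j)
        for i, g1 in enumerate(chr1)
        for j, g2 in enumerate(chr2)
        if (g1, g2) in homologous_pairs
    ]

def find_diagonals(cells, max_gap=3, min_size=3):
    """Greedily chain cells along diagonals (orient=1) and anti-diagonals (orient=-1)."""
    cells = sorted(set(cells))
    segments = []
    for orient in (1, -1):
        used = set()
        for start in cells:
            if start in used:
                continue
            chain = [start]
            used.add(start)
            while True:
                i, j = chain[-1]
                candidates = [
                    (i2, j2) for (i2, j2) in cells
                    if (i2, j2) not in used
                    and 0 < i2 - i <= max_gap
                    and 0 < (j2 - j) * orient <= max_gap
                ]
                if not candidates:
                    break
                # extend with the closest cell in the allowed direction
                nxt = min(candidates, key=lambda c: (c[0] - i) + abs(c[1] - j))
                chain.append(nxt)
                used.add(nxt)
            if len(chain) >= min_size:
                label = "colinear" if orient == 1 else "inverted"
                segments.append((label, chain))
    return segments

# Hypothetical chromosomes as sorted gene lists, plus homologous pairs
chr1 = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
chr2 = ["h1", "h2", "h3", "h4", "h5", "h6", "h7", "h8"]
pairs = {("g1", "h1"), ("g2", "h2"), ("g3", "h4"),
         ("g6", "h8"), ("g7", "h7"), ("g8", "h6")}

segments = find_diagonals(homology_cells(chr1, chr2, pairs))
for label, chain in segments:
    print(label, chain)
```

The toy input yields one colinear segment (tolerating a gap on chromosome 2) and one inverted segment. A greedy extension like this can miss the best chain in dense matrices, so it should be read as a demonstration of the principle only.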

The map-based approach: terminology

[Figure: Gene Homology Matrix (GHM) of Chromosome 1 against Chromosome 2, with labels: homologous gene, tandem duplication, colinear segment, inverted colinear segment]

Detection of colinear homologous regions

[Figure: gene homology matrices. Human-mouse: HsaC1 against MmuC4. Chicken-human: HsaC1 against GgaC23.]

Detection of colinear homologous regions

[Figure: gene homology matrices. Human-mouse: HsaC1 against MmuC4. Human-tetraodon: HsaC1 against TviC1.]

MUMmer

NUCmer, PROmer

And what about synteny?

[Figure: gene homology matrix of HsaC1 against HsaC9, labeled "ancient duplication"]

Identifying syntenic regions = identifying high homolog-density regions in the gene homology matrix (GHM).

• Application of 2-dimensional sliding-window approach to score regions with a high density of homologous genes between 2 chromosomes

DeSyRe, Vandepoele et al. unpublished
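A 2-dimensional sliding window over the GHM can be sketched as below. DeSyRe is unpublished, so this is not its algorithm; it only illustrates the density idea. The window size, step, and cell coordinates are hypothetical.

```python
def window_density(cells, n_rows, n_cols, window=4, step=4):
    """Return [((row_start, col_start), density)] for square windows over the matrix."""
    cell_set = set(cells)
    results = []
    for r in range(0, max(n_rows - window, 0) + 1, step):
        for c in range(0, max(n_cols - window, 0) + 1, step):
            hits = sum(
                (i, j) in cell_set
                for i in range(r, r + window)
                for j in range(c, c + window)
            )
            results.append(((r, c), hits / (window * window)))
    return results

# Hypothetical 8 x 8 GHM: a cluster of homologs with scrambled order, plus one stray hit
cells = [(0, 2), (1, 0), (1, 3), (2, 3), (3, 1), (7, 6)]
densities = window_density(cells, n_rows=8, n_cols=8)
best_window, best_density = max(densities, key=lambda item: item[1])
print(best_window, best_density)
```

The top-left window scores highest even though its homologs form no diagonal, which is the difference between synteny (content) and colinearity (content plus order). In practice a density would need to be compared against a background expectation before calling a region syntenic.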

Detection of recent and ancient large-scale duplications

[Figure: two gene homology matrices. Ancient duplication (HsaC1 against HsaC9): synteny. Recent duplication (C2 against C4): colinearity.]

III. Whole-genome alignments

Evolutionarily constrained sequences are a good indicator of functional genome regions

Basic protocol
1. Sequence generation
2. Reconstructing homologous colinearity across related genomes
3. Multi-sequence alignment
4. Detection of sequences under purifying selection

Margulies & Birney, NRG 2008

Reconstructing homologous colinearity


• Segmental duplication and other species-specific rearrangements (e.g. inversions, insertions, deletions) interfere with the accurate detection of orthologous genomic regions

Tools

Mercator (Ensembl)
• coding exons as anchor points
• graph of colinearity information
• travel through graph to generate homologous regions

chains-and-nets (UCSC)
• reference-based local alignments of different genomes (BLASTZ)
• filtering highest-scoring chains
• net together chains from same locus
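The chaining step can be illustrated with a small dynamic program: given local alignment blocks, find the highest-scoring set that is colinear in both the reference and the other genome. This is a simplified sketch, not the UCSC implementation; it ignores gap penalties and strand, and the `Block` type and coordinates are hypothetical.

```python
from typing import NamedTuple

class Block(NamedTuple):
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    score: float

def best_chain(blocks):
    """Highest-scoring chain of blocks that are ordered and non-overlapping in both genomes."""
    blocks = sorted(blocks, key=lambda b: (b.ref_start, b.qry_start))
    n = len(blocks)
    if n == 0:
        return 0.0, []
    best = [b.score for b in blocks]  # best chain score ending at each block
    prev = [None] * n                 # back-pointers for traceback
    for k in range(n):
        for p in range(k):
            a, b = blocks[p], blocks[k]
            if a.ref_end <= b.ref_start and a.qry_end <= b.qry_start:
                if best[p] + b.score > best[k]:
                    best[k] = best[p] + b.score
                    prev[k] = p
    k = max(range(n), key=best.__getitem__)
    total = best[k]
    chain = []
    while k is not None:
        chain.append(blocks[k])
        k = prev[k]
    chain.reverse()
    return total, chain

# Hypothetical local alignments; the third one maps far off the main diagonal
blocks = [
    Block(0, 100, 0, 100, 50),
    Block(120, 200, 110, 190, 40),
    Block(150, 260, 400, 510, 60),
    Block(300, 380, 220, 300, 30),
]
total, chain = best_chain(blocks)
print(total, [b.ref_start for b in chain])
```

The off-diagonal block has the highest individual score but is excluded, because the three colinear blocks together score more.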


Sequence alignment & constraint detection

PhastCons, BinCons, GERP, SiPhy

Whole-genome base-pair alignment

Challenges
• multi-species alignment
• long DNA sequences (reflecting homologous colinear regions)
• one-to-one mapping (with reference genome)
• various levels of sequence divergence

Whole-genome base-pair alignment toolbox

MLAGAN
• CHAOS seeding algorithm (k-mer anchors)
• Dynamic programming (pairwise)
• Multiple alignment using progressive strategy
• Shuffle-LAGAN (incl. rearrangement map); VISTA

TBA / MultiZ; UCSC
• Pairwise BLASTZ alignments (local blocks)
• Merging/joining blocks using MultiZ
• Complex ordering of blocks using Threaded Blockset Aligner

PECAN (Ensembl)
• Consistency alignment based on pairwise alignments (incl. outgroup information)

MAVID
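The notion of k-mer anchors can be shown with a minimal seeding sketch. It finds exact shared k-mers only, which is a simplification and not the CHAOS algorithm itself; the sequences and the value of k are toy choices.

```python
from collections import defaultdict

def kmer_seeds(seq_a, seq_b, k=5):
    """All (i, j) such that seq_a[i:i+k] == seq_b[j:j+k]."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)
    seeds = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)

seq_a = "ACGTACGGTTCA"
seq_b = "TTACGGTTGG"
seeds = kmer_seeds(seq_a, seq_b, k=5)
print(seeds)
print({i - j for i, j in seeds})  # seeds sharing one diagonal suggest one anchor region
```

Seeds that lie on the same diagonal (equal `i - j`) can be merged into a longer anchor, and dynamic programming is then restricted to the regions between anchors.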


From gene to DNA-based colinearity…

Pairwise approach: Human segment as reference

VISTA http://genome.lbl.gov/vista
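A conservation curve of the kind shown in such pairwise plots can be approximated by percent identity in a sliding window. This sketch is not the VISTA code: it slides over alignment columns rather than reference coordinates, and the sequences, window size, and threshold are toy values chosen for illustration.

```python
def window_identity(ref_aln, other_aln, window=10, step=10):
    """Percent identity per window over two aligned (gapped) sequences."""
    if len(ref_aln) != len(other_aln):
        raise ValueError("aligned sequences must have the same length")
    profile = []
    for start in range(0, len(ref_aln) - window + 1, step):
        cols = zip(ref_aln[start:start + window], other_aln[start:start + window])
        matches = sum(a == b and a != "-" for a, b in cols)
        profile.append((start, 100.0 * matches / window))
    return profile

def conserved_windows(profile, threshold=70.0):
    """Start positions of windows at or above the identity threshold."""
    return [start for start, pct in profile if pct >= threshold]

# Toy alignment (hypothetical): first half identical, second half diverged
ref_aln = "ACGTACGTAC" + "GGGTTTAAAC"
other_aln = "ACGTACGTAC" + "GCATTAACGC"

profile = window_identity(ref_aln, other_aln)
print(profile)
print(conserved_windows(profile))
```

Windows above the threshold that do not overlap annotated exons are the candidates for conserved non-coding sequences.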


Input and output files

Frazer et al., 2003

PipMaker

Conserved Non-coding Sequences or Elements (CNS/CNE)

[Figure: VISTA plot of the pairwise comparisons Human/dog, Human/mouse and Mouse/dog]

Blue: exons; Turquoise: UTR

Exercise

Explore the genome organization and conservation of your favorite locus in a set of related species.

Plants: http://bioinformatics.psb.ugent.be/plaza/

Vertebrates: http://teleost.cs.uoregon.edu/synteny_db/

Yeast: http://wolfe.gen.tcd.ie/ygob/