BITS - Comparative genomics on the genome level

Comparative genomics in eukaryotes: Genome analysis. Klaas Vandepoele, PhD, Professor, Ghent University, Comparative & Integrative Genomics, VIB – Ghent University, Belgium

description

This is the third presentation of the BITS training on 'Comparative genomics'. It reviews the basic concepts of sequence homology on the gene Thanks to Klaas Vandepoele of the PSB department.

Transcript of BITS - Comparative genomics on the genome level

Page 1: BITS - Comparative genomics on the genome level

Comparative genomics in eukaryotes

Genome analysis

Klaas Vandepoele, PhD

Professor, Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium

Page 2: BITS - Comparative genomics on the genome level

I. Genome conservation & genomic homology

Alignment of homologous regions
• Inter-genomic: aligning genomic sequences from different species
• Intra-genomic: aligning genomic sequences from the same species

Different levels of resolution
• Comparative mapping (markers)
• Synteny (~ gene content)
• Colinearity (gene content + order conservation)
• DNA-based alignments (base-to-base mapping)

Page 3: BITS - Comparative genomics on the genome level

Human – Mouse – Rat

resolution

Page 4: BITS - Comparative genomics on the genome level

Human – Mouse orthologous regions

Mouse chr IV

Comparative mapping

Genome translocations associated with human-mouse speciation

resolution

Human

www.ensembl.org

Page 5: BITS - Comparative genomics on the genome level

Human genome browser

Human chr I

Mouse chr IV

Conserved gene content & order

Human gene model

EST/cDNA similarities

Genome similarities

Gene loss and insertions in orthologous segments since human-mouse speciation

resolution

Page 6: BITS - Comparative genomics on the genome level

Human – Mouse base-to-base mapping

Blue: coding exons; GT: splice donor; AG: splice acceptor

Functional sequences (e.g. exons) evolve more slowly than non-functional ones (e.g. introns) because natural selection acts against mutations in functional regions

Consequently, functional elements, both coding and non-coding, are unusually well conserved in orthologous regions
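The conservation argument above can be illustrated with a minimal sketch: percent identity in a sliding window along a pairwise alignment. The function name, window size and toy sequences are assumptions made for illustration; they are not taken from the slides or from any specific tool.

```python
def window_identity(aln_a, aln_b, window=10, step=5):
    """Percent identity per window of a gapped pairwise alignment.

    aln_a and aln_b are equal-length aligned strings; '-' is a gap.
    A column counts as identical only when both bases match and
    neither is a gap.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(aln_a) - window + 1, step):
        cols = zip(aln_a[start:start + window], aln_b[start:start + window])
        matches = sum(1 for a, b in cols if a == b and a != "-")
        scores.append((start, 100.0 * matches / window))
    return scores


# Hypothetical toy alignment: a conserved "exon" followed by a diverged "intron".
human = "ATGGCCAAGGTTCTGAACGT" + "GTAAGTCCTTAGGCATTCAG"
mouse = "ATGGCCAAGGTGCTGAACGT" + "GTGCGTAC--AGCTTTACAG"

scores = window_identity(human, mouse)
for start, pct in scores:
    print(f"columns {start:2d}-{start + 9:2d}: {pct:5.1f}% identity")
```

Windows over the exon-like part score higher than windows over the intron-like part, which is the signal that base-to-base comparisons exploit.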

resolution

Page 7: BITS - Comparative genomics on the genome level

DNA substitution rates for different gene/genome regions

Molecular Evolution, Li WH

Page 8: BITS - Comparative genomics on the genome level

Multiple species comparisons (gene-based)

PhIGs

Hedges, 2002

Page 9: BITS - Comparative genomics on the genome level

Genome size variation in the grasses: the use of model systems

Figure (Gaut 2002): grass clades BEP and PACC

Divergence times shown: 55 MYA, 46 MYA, 28 MYA

Genome sizes: Rice 450 Mb; Barley ~5000 Mb; Maize ~2400 Mb; Sorghum ~750 Mb

Page 10: BITS - Comparative genomics on the genome level

Grass genomes: a single genetic system?

Gale and Devos, 1998

Page 11: BITS - Comparative genomics on the genome level

Micro-colinearity within the grasses

Bennetzen lab

Page 12: BITS - Comparative genomics on the genome level

Yeast Gene Order Browser (YGOB)

Page 13: BITS - Comparative genomics on the genome level

II. Computational detection of genomic homology

Synteny ~ conservation of gene content

Colinearity ~ conservation of (gene) content & order
• Macro-colinearity: marker-based
• Micro-colinearity: DNA-based or gene-based

Page 14: BITS - Comparative genomics on the genome level

How to find evidence for gene colinearity?

Time

A: 1 2 3 4 5 6 7 8 9 10 11

speciation

S1: 1 2 3 4 5 6 7 8 9 10 11
S2: 1 2 3 4 5 6 7 8 9 10 11

Gene loss, insertions, rearrangements, translocation, etc.

S1: 1 3 4 6 7 10 11
S2: 1 2 4 6 7 8 9 11

retained orthologs (anchor points)
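As a minimal sketch of the idea on this slide, the retained orthologs can be listed by intersecting the two gene orders. It assumes that equal gene numbers denote orthologs and that each gene occurs once per segment; the function name is hypothetical.

```python
def anchor_points(seg1, seg2):
    """Return (gene, index_in_seg1, index_in_seg2) for genes kept in both segments."""
    pos2 = {gene: j for j, gene in enumerate(seg2)}
    return [(gene, i, pos2[gene]) for i, gene in enumerate(seg1) if gene in pos2]


# Gene orders of S1 and S2 after gene loss, as given on the slide.
s1 = [1, 3, 4, 6, 7, 10, 11]
s2 = [1, 2, 4, 6, 7, 8, 9, 11]

anchors = anchor_points(s1, s2)
for gene, i, j in anchors:
    print(f"gene {gene}: position {i} in S1, position {j} in S2")
```

The positions, not the gene labels, are what later steps use to test whether the anchors still lie in the same order in both segments.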

Page 15: BITS - Comparative genomics on the genome level

Matrix representation

Figure: matrix with segment S1 on one axis and segment S2 on the other; cells are marked X or -.

S1: 1 3 4 6 7 10 11
S2: 1 2 4 6 7 8 9 11
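A matrix of this kind can be sketched in a few lines. The version below is an illustration only: homology is simplified to equality of gene numbers, S1 is placed on the columns and S2 on the rows by assumption, and the function names are hypothetical.

```python
def homology_matrix(seg_cols, seg_rows):
    """Boolean matrix: cell [r][c] is True when the row gene and column gene match.

    Homology is simplified here to equality of gene labels.
    """
    return [[row_gene == col_gene for col_gene in seg_cols] for row_gene in seg_rows]


def render(matrix, seg_cols, seg_rows):
    """Plain-text view of the matrix: X for a homologous pair, '.' otherwise."""
    lines = ["     " + "".join(f"{g:>4}" for g in seg_cols)]
    for gene, row in zip(seg_rows, matrix):
        cells = "".join(f"{'X' if hit else '.':>4}" for hit in row)
        lines.append(f"{gene:>4} " + cells)
    return "\n".join(lines)


s1 = [1, 3, 4, 6, 7, 10, 11]       # columns
s2 = [1, 2, 4, 6, 7, 8, 9, 11]     # rows
ghm = homology_matrix(s1, s2)
print(render(ghm, s1, s2))
```

The marked cells fall roughly along a diagonal, which is the visual signature of conserved gene order.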

Page 16: BITS - Comparative genomics on the genome level

Chromosome 1

Chromosome 2

• Represent chromosomes as sorted gene lists

• Identify all homologous gene pairs between chromosomes (all-against-all BLASTP*).

• Score pairs of homologues in matrix

Identifying homologous regions = identifying diagonal series of elements in the gene homology matrix (GHM).

Map-based approach

Vandepoele et al., Genome Research 2002
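The search for diagonal series can be sketched as a greedy clustering of homologous gene pairs. This is a simplified illustration, not the published method: the function name, the `max_gap` and `min_anchors` parameters, and the toy coordinates are all assumptions.

```python
def find_colinear_segments(pairs, max_gap=3, min_anchors=3):
    """Greedy detection of diagonal runs among homologous gene pairs.

    pairs: iterable of (i, j) = gene positions on chromosome 1 and 2.
    A pair extends a run when it lies at most max_gap positions further
    on both axes; direction +1 is a colinear run, -1 an inverted one.
    """
    runs = []  # each run: {"dir": 0 (undecided), +1 or -1, "pairs": [(i, j), ...]}
    for i, j in sorted(set(pairs)):
        for run in runs:
            pi, pj = run["pairs"][-1]
            di, dj = i - pi, j - pj
            if not (0 < di <= max_gap and 0 < abs(dj) <= max_gap):
                continue
            direction = 1 if dj > 0 else -1
            if run["dir"] in (0, direction):
                run["dir"] = direction
                run["pairs"].append((i, j))
                break
        else:
            # No existing run could be extended: start a new one.
            runs.append({"dir": 0, "pairs": [(i, j)]})
    return [run for run in runs if len(run["pairs"]) >= min_anchors]


# Hypothetical homologous pairs: one colinear run, one inverted run, two isolated hits.
toy_pairs = [(0, 0), (1, 1), (3, 2), (4, 4),
             (10, 20), (11, 19), (12, 17), (13, 16),
             (6, 30), (20, 5)]
segments = find_colinear_segments(toy_pairs)
for seg in segments:
    kind = "colinear" if seg["dir"] == 1 else "inverted"
    print(kind, seg["pairs"])
```

The gap tolerance is what lets a run survive gene loss and insertions; isolated hits are discarded because they never reach `min_anchors`.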

Page 17: BITS - Comparative genomics on the genome level

The map-based approach: terminology

Chromosome 1

Chromosome 2

Tandem duplication

Colinear segment

Inverted colinear segment

1

2

Homologous gene

Gene Homology Matrix (GHM)

Page 18: BITS - Comparative genomics on the genome level

Detection of colinear homologous regions

Human-mouse: HsaC1, MmuC4

Chicken-human: HsaC1, GgaC23

Page 19: BITS - Comparative genomics on the genome level

Detection of colinear homologous regions

Human-mouse: HsaC1, MmuC4

Human-tetraodon: HsaC1, TviC1

Page 20: BITS - Comparative genomics on the genome level

MUMmer

NUCmer, PROmer

Page 21: BITS - Comparative genomics on the genome level

And what about synteny?

ancient duplication

HsaC1

HsaC9

Identifying syntenic regions = identifying high homolog-density regions in the gene homology matrix (GHM).

• Application of 2-dimensional sliding-window approach to score regions with a high density of homologous genes between 2 chromosomes

DeSyRe, Vandepoele et al. unpublished
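A 2-dimensional sliding window over homologous gene pairs can be sketched as follows. This is a generic illustration and not the tool cited on the slide; the window size, step, hit threshold and toy data are assumptions.

```python
def dense_windows(pairs, n_genes1, n_genes2, window=10, step=5, min_hits=4):
    """Score square windows of a gene homology matrix by homolog count.

    pairs: (i, j) positions of homologous genes on chromosome 1 and 2.
    Returns (start1, start2, hits) for windows with at least min_hits pairs,
    regardless of the order of the genes inside the window.
    """
    pairs = set(pairs)
    results = []
    for s1 in range(0, max(1, n_genes1 - window + 1), step):
        for s2 in range(0, max(1, n_genes2 - window + 1), step):
            hits = sum(1 for i, j in pairs
                       if s1 <= i < s1 + window and s2 <= j < s2 + window)
            if hits >= min_hits:
                results.append((s1, s2, hits))
    return results


# Hypothetical data: a block of shared genes in scrambled order plus two isolated hits.
toy_pairs = [(10, 14), (11, 10), (12, 13), (13, 11), (14, 12), (2, 25), (27, 3)]
results = dense_windows(toy_pairs, n_genes1=30, n_genes2=30)
for start1, start2, hits in results:
    print(f"window at ({start1}, {start2}): {hits} homologous pairs")
```

Because only density is scored, the scrambled block is reported even though it forms no diagonal. Overlapping windows that cover the same block would normally be merged in a later step.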

Page 22: BITS - Comparative genomics on the genome level

Detection of recent and ancient large-scale duplications

synteny

ancient duplication

HsaC1

HsaC9

recent duplication

C2

C4

colinearity

Page 23: BITS - Comparative genomics on the genome level

III. Whole-genome alignments

Evolutionarily constrained sequences are a good indicator of functional genome regions

Basic protocol
1. Sequence generation
2. Reconstructing homologous colinearity across related genomes
3. Multi-sequence alignment
4. Detection of sequences under purifying selection

Margulies & Birney, NRG 2008

Page 24: BITS - Comparative genomics on the genome level

Reconstructing homologous colinearity

• Segmental duplication and other species-specific rearrangements (e.g. inversions, insertions, deletions) interfere with the accurate detection of orthologous genomic regions

Page 25: BITS - Comparative genomics on the genome level

Tools

Mercator (Ensembl)
• coding exons as anchor points
• graph of colinearity information
• travel through graph to generate homologous regions

chains-and-nets (UCSC)
• reference-based local alignments between different genomes (BLASTZ)
• filtering highest-scoring chains
• net together chains from same locus
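The chaining step can be sketched as a dynamic program that picks the highest-scoring set of local alignments lying in the same order in both genomes. This is a minimal illustration with no gap penalties and is not the UCSC or Mercator implementation; the function name and the toy anchors are assumptions.

```python
def best_chain(anchors):
    """Highest-scoring colinear chain of anchors.

    anchors: list of (ref_pos, query_pos, score). A chain must increase
    strictly in both coordinates. Quadratic-time dynamic programming.
    """
    anchors = sorted(anchors)
    n = len(anchors)
    if n == 0:
        return 0, []
    best = [a[2] for a in anchors]   # best chain score ending at each anchor
    prev = [-1] * n                  # back-pointers for traceback
    for k in range(n):
        for m in range(k):
            if (anchors[m][0] < anchors[k][0] and anchors[m][1] < anchors[k][1]
                    and best[m] + anchors[k][2] > best[k]):
                best[k] = best[m] + anchors[k][2]
                prev[k] = m
    end = max(range(n), key=best.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(anchors[k])
        k = prev[k]
    return best[end], chain[::-1]


# Hypothetical local alignments: (reference position, query position, score).
toy_anchors = [(100, 120, 50), (200, 230, 40), (250, 90, 80),
               (300, 340, 60), (400, 450, 30)]
score, chain = best_chain(toy_anchors)
print(score, chain)
```

The high-scoring anchor at query position 90 is left out because it breaks the order of its neighbours; hits of that kind are what a rearrangement or duplication produces.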

Page 26: BITS - Comparative genomics on the genome level

Sequence alignment & constraint detection

PhastCons, BinCons, GERP, SiPhy

Page 27: BITS - Comparative genomics on the genome level

Whole-genome base-pair alignment

Challenges
• multi-species alignment
• long DNA sequences (reflecting homologous colinear regions)
• one-to-one mapping (with reference genome)
• various levels of sequence divergence

Page 28: BITS - Comparative genomics on the genome level

Whole-genome base-pair alignment toolbox

MLAGAN
• CHAOS seeding algorithm (k-mer anchors)
• Dynamic programming (pairwise)
• Multiple alignment using progressive strategy
• Shuffle-LAGAN (incl. rearrangement map); VISTA

TBA / MultiZ; UCSC
• Pairwise BLASTZ alignments (local blocks)
• Merging/joining blocks using MultiZ
• Complex ordering of blocks using Threaded Blockset Aligner

PECAN (Ensembl)
• Consistency alignment based on pairwise alignments (incl. outgroup information)

MAVID
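The k-mer anchoring idea behind seeded aligners can be sketched with exact shared k-mers. This is an illustration of the general principle only, not CHAOS itself; the function name, the value of `k` and the toy sequences are assumptions.

```python
from collections import defaultdict


def kmer_seeds(seq_a, seq_b, k=8):
    """Exact shared k-mers between two sequences as (pos_a, pos_b) seeds."""
    index = defaultdict(list)
    for i in range(len(seq_a) - k + 1):
        index[seq_a[i:i + k]].append(i)
    seeds = []
    for j in range(len(seq_b) - k + 1):
        for i in index.get(seq_b[j:j + k], ()):
            seeds.append((i, j))
    return sorted(seeds)


# Hypothetical sequences sharing one conserved stretch (ACGTACGGATC).
seq_a = "TTTTACGTACGGATCCAAAA"
seq_b = "GGGGGACGTACGGATCTTTT"

seeds = kmer_seeds(seq_a, seq_b)
for i, j in seeds:
    print(f"seed at a[{i}] / b[{j}] on diagonal {j - i}")
```

Seeds that share a diagonal are candidates for joining into one anchor, so that the more expensive dynamic programming step only has to align the regions between anchors.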

Page 29: BITS - Comparative genomics on the genome level

From gene to DNA-based colinearity…

Pairwise approach: Human segment as reference

VISTA http://genome.lbl.gov/vista

Page 30: BITS - Comparative genomics on the genome level

From gene to DNA-based colinearity…

Page 31: BITS - Comparative genomics on the genome level

Input and output files

Frazer et al., 2003

PipMaker

Page 32: BITS - Comparative genomics on the genome level

Conserved Non-coding Sequences or Elements (CNS/CNE)

Human/dog

Human/mouse

Mouse/dog

Blue: exons

Turquoise: UTR

VISTA plot

Page 33: BITS - Comparative genomics on the genome level

Exercise

Explore the genome organization and conservation of your favorite locus in a set of related species.

Plants http://bioinformatics.psb.ugent.be/plaza/

Vertebrates http://teleost.cs.uoregon.edu/synteny_db/

Yeast http://wolfe.gen.tcd.ie/ygob/
