Posted on 05-Dec-2014
Comparative genomics in eukaryotes
Genome analysis
Klaas Vandepoele, PhD
Professor, Ghent University
Comparative & Integrative Genomics
VIB – Ghent University, Belgium
I. Genome conservation & genomic homology
Alignment of homologous regions
• Inter-genomic: aligning genomic sequences from different species
• Intra-genomic: aligning genomic sequences from the same species
Different levels of resolution
• Comparative mapping (markers)
• Synteny (~ gene content)
• Colinearity (gene content + order conservation)
• DNA-based alignments (base-to-base mapping)
Human – Mouse – Rat
Human – Mouse orthologous regions
Figure: comparative map of Human vs Mouse chr IV (resolution level: comparative mapping), showing genome translocations associated with human-mouse speciation. Source: www.ensembl.org
Human genome browser
Figure: Human chr I vs Mouse chr IV, showing conserved gene content & order. Tracks: human gene model, EST/cDNA similarities, genome similarities. Highlighted: gene loss and insertions in orthologous segments since human-mouse speciation.
Human – Mouse base-to-base mapping
Figure legend: blue = coding exons; GT = splice donor; AG = splice acceptor
Functional sequences (e.g. exons) evolve more slowly than non-functional ones (e.g. introns) because natural selection acts against mutations in these regions
Consequently, functional elements, both coding and non-coding, are unusually well conserved in orthologous regions
DNA substitution rates for different gene/genome regions
Molecular Evolution, Li WH
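Comparing substitution rates between regions takes two steps: correct the observed proportion of differing sites p between orthologous sequences for multiple hits, then divide by twice the divergence time T, because both lineages accumulate substitutions independently (r = K / 2T). The following is a minimal sketch, assuming ungapped, pre-aligned sequences and the Jukes-Cantor correction K = -(3/4) ln(1 - 4p/3), which is one standard choice; the correction used for the figure is not stated. The function names, toy sequences and the 80-million-year divergence time are illustrative assumptions, not data from the figure.

```python
import math

# Illustrative helper names, not taken from a published tool.

def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("sequences must be aligned and non-empty")
    diffs = sum(a != b for a, b in zip(seq1.upper(), seq2.upper()))
    return diffs / len(seq1)

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site: K = -3/4 ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated")
    return -0.75 * math.log(1 - 4 * p / 3)

def substitution_rate(k: float, divergence_time: float) -> float:
    """Rate per site per unit time, r = K / (2T): both lineages evolve for T."""
    return k / (2 * divergence_time)

# Toy aligned fragments (hypothetical): an exon-like and an intron-like pair.
exon_h, exon_m = "ATGGCTAAGCTGGTACGT", "ATGGCTAAACTGGTACGT"
intron_h, intron_m = "GTAAGTCTTACCGATTAG", "GTGAGTATTGCCGTTTAG"

for label, a, b in [("exon", exon_h, exon_m), ("intron", intron_h, intron_m)]:
    k = jukes_cantor(p_distance(a, b))
    # T = 80 million years is an assumed, illustrative divergence time.
    print(label, round(k, 3), substitution_rate(k, 80e6))
```

With these toy pairs the exon fragment differs at 1 of 18 sites and the intron fragment at 4 of 18, so the corrected distance, and hence the rate, is lower for the exon, matching the direction described on the previous slide.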
Multiple species comparisons (gene-based)
PhIGs; Hedges, 2002
Genome size variation in the grasses: the use of model systems
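Figure (Gaut 2002): grass phylogeny with the BEP and PACC clades; divergence times of 55, 46 and 28 MYA are marked on the tree
Genome sizes: rice 450 Mb, barley ~5000 Mb, maize ~2400 Mb, sorghum ~750 Mb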
Grass genomes: a single genetic system?
Gale and Devos, 1998
Micro-colinearity within the grasses
Bennetzen lab
Yeast Gene Order Browser (YGOB)
II. Computational detection of genomic homology
Synteny: ~ conservation of gene content
Colinearity: ~ conservation of (gene) content & order
• Macro-colinearity: marker-based
• Micro-colinearity: DNA-based or gene-based
How to find evidence for gene colinearity?
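Figure: an ancestral segment with genes 1 to 11 is inherited by two species at speciation (segments S1 and S2). Over time, gene loss, insertions, rearrangements, translocations, etc. change gene content and order in each lineage:
S1: 1 3 4 6 7 10 11
S2: 1 2 4 6 7 8 9 11
Genes kept in both segments (1, 4, 6, 7, 11) are the retained orthologs (anchor points).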
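Finding the retained orthologs amounts to locating every gene that occurs in both segments and recording its position in each. A minimal sketch, assuming genes are identified by shared labels (gene 4 in S1 is the ortholog of gene 4 in S2) and that each label occurs once per segment; real pipelines infer orthology from sequence similarity instead. The function name is illustrative.

```python
def anchor_points(seg1: list[int], seg2: list[int]) -> list[tuple[int, int]]:
    """Return (index in seg1, index in seg2) for every gene retained in both.

    Simplification: shared labels stand in for orthology, and each gene is
    assumed to appear at most once per segment.
    """
    pos2 = {gene: j for j, gene in enumerate(seg2)}
    return [(i, pos2[gene]) for i, gene in enumerate(seg1) if gene in pos2]

# Segments S1 and S2 from the slide.
s1 = [1, 3, 4, 6, 7, 10, 11]
s2 = [1, 2, 4, 6, 7, 8, 9, 11]
print(anchor_points(s1, s2))  # [(0, 0), (2, 2), (3, 3), (4, 4), (6, 7)]
```

Both coordinates increase together across the five anchor points, which is exactly the colinearity signal the next slides look for.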
Matrix representation
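Figure: matrix with the genes of segment S1 (1 3 4 6 7 10 11) along one axis and those of segment S2 (1 2 4 6 7 8 9 11) along the other; an X marks each homologous gene pair, and the anchor points (genes 1, 4, 6, 7, 11) line up along a diagonal.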
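The matrix representation can be reproduced in a few lines: each row is a gene of one segment, each column a gene of the other, and a cell is marked when the two genes are homologous. A minimal sketch; the `homologous` callback is a hypothetical placeholder for a real similarity test (e.g. a BLASTP hit above some cut-off), and label equality is used here only for the toy example.

```python
def homology_matrix(seg1, seg2, homologous):
    """Gene homology matrix: rows follow seg2, columns follow seg1.

    homologous(a, b) is a hypothetical callback deciding whether genes
    a and b are homologous.
    """
    return [[homologous(a, b) for a in seg1] for b in seg2]

def render(matrix, seg1, seg2):
    """Text rendering: 'X' for a homologous pair, '.' otherwise."""
    header = "    " + " ".join(f"{g:>2}" for g in seg1)
    rows = [f"{b:>2}  " + " ".join(" X" if cell else " ." for cell in row)
            for b, row in zip(seg2, matrix)]
    return "\n".join([header] + rows)

s1 = [1, 3, 4, 6, 7, 10, 11]
s2 = [1, 2, 4, 6, 7, 8, 9, 11]
m = homology_matrix(s1, s2, lambda a, b: a == b)  # toy: same label = homolog
print(render(m, s1, s2))
```

The printed grid shows the five X marks running from the top-left toward the bottom-right corner.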
Figure: gene homology matrix, Chromosome 1 vs Chromosome 2
• Represent chromosomes as sorted gene lists
• Identify all homologous gene pairs between chromosomes (all-against-all BLASTP*).
• Score pairs of homologues in matrix
Identifying homologous regions = identifying diagonal series of elements in the gene homology matrix (GHM).
Map-based approach
Vandepoele et al., Genome Research 2002
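The idea of finding diagonal series can be sketched as a greedy, gap-limited chaining of matrix elements. This is a simplified stand-in, assuming the homologous pairs are already known and only same-orientation diagonals are wanted; it is not the statistical procedure of Vandepoele et al. (2002). The function name and the `max_gap` and `min_anchors` values are illustrative assumptions.

```python
def colinear_segments(anchors, max_gap=3, min_anchors=3):
    """Group homologous pairs into ascending diagonal series (simplified).

    anchors: (i, j) gene positions on chromosome 1 and chromosome 2.
    A pair extends an existing chain when both coordinates increase by at
    most max_gap genes. Chains shorter than min_anchors are discarded.
    Illustrative parameters only; no significance test is performed.
    """
    chains = []
    for i, j in sorted(anchors):
        for chain in chains:
            last_i, last_j = chain[-1]
            if 0 < i - last_i <= max_gap and 0 < j - last_j <= max_gap:
                chain.append((i, j))
                break
        else:  # no chain could be extended: start a new one
            chains.append([(i, j)])
    return [c for c in chains if len(c) >= min_anchors]

# Anchor points of the toy S1/S2 example plus two scattered homologs.
anchors = [(0, 0), (1, 15), (2, 2), (3, 3), (4, 4), (5, 20), (6, 7)]
print(colinear_segments(anchors))  # [[(0, 0), (2, 2), (3, 3), (4, 4), (6, 7)]]
```

The two off-diagonal pairs start their own one-element chains and are dropped by the `min_anchors` filter; a real method would additionally ask whether a chain of that length could arise by chance.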
The map-based approach: terminology
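Figure: Gene Homology Matrix (GHM) of Chromosome 1 vs Chromosome 2, with labelled examples of:
• Homologous gene: a single filled matrix element (one homologous gene pair)
• Tandem duplication: a run of adjacent elements in one row or column, caused by neighbouring homologous genes on one chromosome
• Colinear segment: a diagonal series running in the same direction on both chromosomes
• Inverted colinear segment: a diagonal series running in opposite directions (an inversion)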
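Two of these features are commonly handled explicitly: tandem duplicates are collapsed to one representative so that a tandem array does not inflate or break a diagonal, and a detected chain is labelled colinear or inverted from the sign of its slope. A minimal sketch; the `same_family` callback, the "keep the first copy" rule and the function names are assumptions for illustration.

```python
def remap_tandems(genes, same_family):
    """Collapse runs of adjacent homologous genes (tandem duplicates).

    genes: ordered gene IDs along one chromosome.
    same_family(a, b): hypothetical callback, True if a and b are homologous.
    The first copy of each tandem run is kept as its representative.
    """
    reduced = []
    for gene in genes:
        if reduced and same_family(reduced[-1], gene):
            continue  # still inside a tandem run: skip this copy
        reduced.append(gene)
    return reduced

def orientation(chain):
    """'colinear' if both coordinates increase together, else 'inverted'."""
    (i0, j0), (i1, j1) = chain[0], chain[-1]
    return "colinear" if (i1 - i0) * (j1 - j0) > 0 else "inverted"

# Toy chromosome: g1 and g2 form a tandem pair; g4 is homologous but not adjacent.
family = {"g1": "A", "g2": "A", "g3": "B", "g4": "A"}
print(remap_tandems(["g1", "g2", "g3", "g4"], lambda a, b: family[a] == family[b]))
# ['g1', 'g3', 'g4']
print(orientation([(0, 10), (1, 9), (2, 8)]))  # inverted
```

Only adjacent copies are collapsed, so g4 survives: a dispersed duplicate is a genuine separate position on the chromosome.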
Detection of colinear homologous regions
Figure panels: Human-mouse (HsaC1 vs MmuC4) and Chicken-human (GgaC23 vs HsaC1)
Detection of colinear homologous regions
Figure panels (MUMmer): Human-mouse (HsaC1 vs MmuC4) and Human-tetraodon (HsaC1 vs TviC1)
NUCmer / PROmer
And what about synteny?
Figure: ancient duplication (HsaC1 vs HsaC9)
Identifying syntenic regions = identifying high homolog-density regions in the gene homology matrix (GHM).
• Application of 2-dimensional sliding-window approach to score regions with a high density of homologous genes between 2 chromosomes
DeSyRe, Vandepoele et al. unpublished
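The 2-dimensional sliding-window idea can be sketched by moving a square box over the matrix and counting the homologous pairs inside it, without requiring them to lie on a diagonal. A minimal sketch, not the DeSyRe implementation (which is unpublished); the function name, `window`, `step` and `min_homologs` values are illustrative assumptions.

```python
def dense_windows(pairs, n1, n2, window=20, step=10, min_homologs=5):
    """Slide a window x window box over an n1 x n2 gene homology matrix.

    pairs: (i, j) positions of homologous genes on chromosomes 1 and 2.
    Returns (i_start, j_start, count) for each box holding at least
    min_homologs pairs. No order constraint is applied, so scattered
    (syntenic but non-colinear) homologs still count.
    """
    hits = []
    for i0 in range(0, max(n1 - window, 0) + 1, step):
        for j0 in range(0, max(n2 - window, 0) + 1, step):
            count = sum(1 for i, j in pairs
                        if i0 <= i < i0 + window and j0 <= j < j0 + window)
            if count >= min_homologs:
                hits.append((i0, j0, count))
    return hits

# Five homologs shuffled within genes 0-9 on both chromosomes, plus one outlier.
pairs = [(2, 7), (5, 1), (9, 4), (1, 9), (7, 3), (30, 35)]
print(dense_windows(pairs, 40, 40, window=10, step=10, min_homologs=5))
# [(0, 0, 5)]
```

The five pairs in the first box share no common order, so a diagonal search would miss them, yet the density count flags the region as syntenic.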
Detection of recent and ancient large-scale duplications
Figure: an ancient duplication (HsaC1 vs HsaC9) is recognisable only as synteny, whereas a recent duplication (C2 vs C4) still shows colinearity
III. Whole-genome alignments
Evolutionary constrained sequences are a good indicator of functional genome regions
Basic protocol
1. Sequence generation
2. Reconstructing homologous colinearity across related genomes
3. Multi-sequence alignment
4. Detection of sequences under purifying selection
Margulies & Birney, NRG 2008
Reconstructing homologous colinearity
• Segmental duplication and other species-specific rearrangements (e.g. inversions, insertions, deletions) interfere with the accurate detection of orthologous genomic regions
Tools
• Mercator (Ensembl): uses coding exons as anchor points, builds a graph of colinearity information and travels through the graph to generate homologous regions
• Chains-and-nets (UCSC): reference-based local alignments between different genomes (BLASTZ), filtering of the highest-scoring chains, and netting together of chains from the same locus
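The chaining step can be illustrated with a small dynamic program that picks the highest-scoring set of local alignment blocks that are ordered and non-overlapping on both genomes. This is a simplified sketch, assuming blocks are given as reference/query intervals with a score and that gaps between blocks cost nothing; the UCSC implementation also scores gaps between blocks, and netting is not shown. `Block` and `best_chain` are illustrative names.

```python
from typing import NamedTuple

class Block(NamedTuple):
    """A local alignment block (hypothetical representation)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    score: float

def best_chain(blocks):
    """Highest-scoring chain of blocks ordered and non-overlapping on both
    the reference and the query (O(n^2) dynamic programming, free gaps)."""
    blocks = sorted(blocks, key=lambda b: (b.ref_start, b.qry_start))
    if not blocks:
        return []
    best = [b.score for b in blocks]  # best chain score ending at block k
    prev = [-1] * len(blocks)         # back-pointer for the traceback
    for k, cur in enumerate(blocks):
        for m in range(k):
            p = blocks[m]
            if p.ref_end <= cur.ref_start and p.qry_end <= cur.qry_start:
                if best[m] + cur.score > best[k]:
                    best[k] = best[m] + cur.score
                    prev[k] = m
    k = max(range(len(blocks)), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(blocks[k])
        k = prev[k]
    return chain[::-1]

blk_a = Block(0, 100, 0, 100, 90)
blk_b = Block(150, 300, 160, 310, 140)
blk_c = Block(120, 200, 5000, 5080, 60)  # e.g. a paralogous hit far away in the query
blk_d = Block(350, 400, 330, 380, 40)
print(best_chain([blk_a, blk_b, blk_c, blk_d]))  # blocks a, b, d (total 270)
```

Block c lies between a and b on the reference but thousands of bases away on the query, so including it would rule out b and d; the chain a-b-d scores higher and is kept.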
Sequence alignment & constraint detection
PhastCons, BinCons, GERP, SiPhy
Whole-genome base-pair alignment
Challenges
• multi-species alignment
• long DNA sequences (reflecting homologous colinear regions)
• one-to-one mapping (with reference genome)
• various levels of sequence divergence
Whole-genome base-pair alignment toolbox
• MLAGAN: CHAOS seeding algorithm (k-mer anchors); dynamic programming (pairwise); multiple alignment using a progressive strategy
• Shuffle-LAGAN (incl. rearrangement map); VISTA
• TBA / MultiZ (UCSC): pairwise BLASTZ alignments (local blocks); merging and joining of blocks using MultiZ; complex ordering of blocks using the Threaded Blockset Aligner
• PECAN (Ensembl): consistency alignment based on pairwise alignments (incl. outgroup information)
• MAVID
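The seeding idea behind k-mer anchors can be sketched with exact k-mer matching: index every k-mer of one sequence, look up every k-mer of the other, and group the hits by diagonal (offset pos1 - pos2), since a run of seeds on one diagonal suggests a homologous anchor. This is not CHAOS itself, which tolerates mismatches in its seeds and chains nearby seeds; the function names and `k` are illustrative assumptions.

```python
from collections import defaultdict

def kmer_seeds(seq1: str, seq2: str, k: int = 8):
    """Exact k-mer matches between two sequences, as (pos1, pos2) pairs."""
    index = defaultdict(list)
    for i in range(len(seq1) - k + 1):
        index[seq1[i:i + k]].append(i)
    # .get avoids inserting empty entries into the defaultdict.
    return [(i, j)
            for j in range(len(seq2) - k + 1)
            for i in index.get(seq2[j:j + k], [])]

def seeds_by_diagonal(seeds):
    """Group seeds by diagonal offset (pos1 - pos2)."""
    groups = defaultdict(list)
    for i, j in seeds:
        groups[i - j].append((i, j))
    return dict(groups)

seeds = kmer_seeds("AAACCCGGGTTT", "CCCGGG", k=4)
print(seeds)                     # [(3, 0), (4, 1), (5, 2)]
print(seeds_by_diagonal(seeds))  # {3: [(3, 0), (4, 1), (5, 2)]}
```

In an anchored aligner, such diagonal runs would then be chained and the regions between them filled in by dynamic programming.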
From gene to DNA-based colinearity…
Pairwise approach: human segment as reference
VISTA http://genome.lbl.gov/vista
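A VISTA-style conservation curve is, at its core, percent identity computed in a sliding window along the reference sequence of a pairwise alignment. A minimal sketch, assuming the alignment is given as two gapped strings with '-' for gaps; alignment columns where the reference has a gap are skipped so that positions follow reference coordinates. The function name, window size and step are illustrative assumptions, not VISTA's own code or defaults.

```python
def window_identity(ref_aln: str, other_aln: str, window: int = 100, step: int = 1):
    """Percent identity in windows measured along the reference sequence.

    ref_aln / other_aln: the two rows of a pairwise alignment ('-' = gap).
    Returns (ref_start, percent_identity) for each window of `window`
    reference bases; gaps in the other sequence count as mismatches.
    """
    # Keep only alignment columns where the reference has a base.
    cols = [(r, o) for r, o in zip(ref_aln, other_aln) if r != "-"]
    profile = []
    for start in range(0, len(cols) - window + 1, step):
        chunk = cols[start:start + window]
        same = sum(r.upper() == o.upper() for r, o in chunk)
        profile.append((start, 100.0 * same / window))
    return profile

print(window_identity("ACGTACGTAC", "ACGTACG-AC", window=5, step=5))
# [(0, 100.0), (5, 80.0)]
```

Plotting this profile against reference position, with exon annotation underneath, gives the kind of plot shown on the following slides.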
From gene to DNA-based colinearity…
Input and output files
Frazer et al., 2003
PipMaker
Conserved Non-coding Sequences or Elements (CNS/CNE)
Figure: VISTA plot of pairwise comparisons (human/dog, human/mouse, mouse/dog). Blue: exons; turquoise: UTR
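Conserved non-coding sequences can then be called as runs of windows above an identity threshold that do not overlap annotated exons or UTRs. A minimal sketch, assuming a (start, percent identity) profile sorted by start, such as one produced by a sliding-window identity scan, and exon/UTR coordinates as half-open intervals; the function names and the 70% threshold are illustrative assumptions.

```python
def conserved_intervals(profile, window, min_identity=70.0):
    """Merge windows with identity >= min_identity into [start, end) intervals.

    profile: (window_start, percent_identity) pairs sorted by window_start.
    """
    intervals = []
    for start, ident in profile:
        if ident < min_identity:
            continue
        end = start + window
        if intervals and start <= intervals[-1][1]:
            intervals[-1][1] = max(intervals[-1][1], end)  # overlaps: extend
        else:
            intervals.append([start, end])
    return [tuple(iv) for iv in intervals]

def noncoding_only(intervals, annotated):
    """Keep conserved intervals that overlap no annotated exon/UTR interval."""
    return [(s, e) for s, e in intervals
            if all(e <= a_s or s >= a_e for a_s, a_e in annotated)]

profile = [(0, 95.0), (50, 90.0), (100, 40.0), (150, 40.0),
           (200, 80.0), (250, 75.0), (300, 30.0)]
conserved = conserved_intervals(profile, window=100)
print(conserved)                              # [(0, 150), (200, 350)]
print(noncoding_only(conserved, [(20, 120)])) # [(200, 350)]
```

The first conserved interval overlaps the annotated exon and is discarded; the second lies entirely outside it and is reported as a candidate CNS.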
Exercise
Explore the genome organization and conservation of your favorite locus in a set of related species.
Plants http://bioinformatics.psb.ugent.be/plaza/
Vertebrates http://teleost.cs.uoregon.edu/synteny_db/
Yeast http://wolfe.gen.tcd.ie/ygob/