Using the T-Coffee Multiple Sequence Alignment Package I - Overview
![Page 1: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/1.jpg)
Using the T-Coffee Multiple Sequence Alignment Package
I - Overview
Cédric Notredame
Comparative Bioinformatics Group
Bioinformatics and Genomics Program
![Page 2: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/2.jpg)
What is T-Coffee?
Tree Based Consistency based Objective Function for Alignment Evaluation
- Progressive Alignment
- Consistency
![Page 3: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/3.jpg)
Progressive Alignment
Feng and Doolittle, 1988; Taylor 1989
Clustering
![Page 4: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/4.jpg)
Dynamic Programming Using A Substitution Matrix
Progressive Alignment
![Page 5: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/5.jpg)
Progressive Alignment
-Depends on the ORDER of the sequences (Tree).
-Depends on the CHOICE of the sequences.
-Depends on the PARAMETERS:
•Substitution Matrix.
•Gap penalties (Gop: gap opening, Gep: gap extension).
•Sequence Weight.
•Tree making Algorithm.
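The pairwise step that a progressive aligner repeats along the tree can be sketched as a global dynamic-programming score. This is a minimal illustration, not T-Coffee's implementation: the function name `nw_score`, the match/mismatch scores standing in for a substitution matrix, and the single linear gap penalty (instead of separate Gop and Gep) are all assumptions made for brevity.

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b (Needleman-Wunsch style).

    Assumptions: identity scoring replaces a real substitution matrix,
    and one linear gap penalty replaces separate opening/extension costs.
    """
    # prev[j] holds the best score of aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = cur[j - 1] + gap   # gap in a
            cur.append(max(diag, up, left))
        prev = cur
    return prev[-1]


print(nw_score("GATTACA", "GATTACA"))  # identical sequences: 7 matches
```

Changing `gap`, `match` or `mismatch` changes which alignment scores best, which is one way to see the parameter dependence listed above.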
![Page 6: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/6.jpg)
Consistency?
Consistency is an attempt to use alignment information at very early stages
![Page 7: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/7.jpg)
T-Coffee and Consistency…
SeqA GARFIELD THE LAST FAT CAT    Prim. Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT   Prim. Weight = 77
SeqC GARFIELD THE VERY FAST CAT

SeqA GARFIELD THE LAST FAT CAT    Prim. Weight = 100
SeqD -------- THE ---- FAT CAT

SeqB GARFIELD THE ---- FAST CAT   Prim. Weight = 100
SeqC GARFIELD THE VERY FAST CAT

SeqC GARFIELD THE VERY FAST CAT   Prim. Weight = 100
SeqD -------- THE ---- FA-T CAT
![Page 8: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/8.jpg)
T-Coffee and Consistency…
Each pair is re-examined through every third sequence; a triplet's weight is the smaller of its two primary weights.
SeqA GARFIELD THE LAST FAT CAT    Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT   Weight = 77
SeqC GARFIELD THE VERY FAST CAT
SeqB GARFIELD THE ---- FAST CAT

SeqA GARFIELD THE LAST FA-T CAT   Weight = 100
SeqD -------- THE ---- FA-T CAT
SeqB GARFIELD THE ---- FAST CAT
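The triplet weights on this slide can be reproduced with a few lines of Python. This is a simplified sketch: it works with one weight per sequence pair, whereas T-Coffee stores weights per pair of residues, and the function name `triplet_support` is hypothetical. The SeqB/SeqD primary weight is not shown on the slides and is assumed to be 100 here.

```python
def triplet_support(primary, a, b):
    """Weight contributed to the pair (a, b) by each intermediate sequence.

    primary maps frozenset({x, y}) to the primary weight of that pair.
    Each intermediate c contributes min(W(a, c), W(c, b)).
    """
    def w(x, y):
        return primary[frozenset((x, y))]

    names = set().union(*primary)  # all sequence names in the library
    return {c: min(w(a, c), w(c, b)) for c in sorted(names - {a, b})}


# Primary weights as read from the slides.
primary = {
    frozenset(("SeqA", "SeqB")): 88,
    frozenset(("SeqA", "SeqC")): 77,
    frozenset(("SeqA", "SeqD")): 100,
    frozenset(("SeqB", "SeqC")): 100,
    frozenset(("SeqC", "SeqD")): 100,
    frozenset(("SeqB", "SeqD")): 100,  # assumption: not shown on the slide
}

support = triplet_support(primary, "SeqA", "SeqB")
print(support)  # {'SeqC': 77, 'SeqD': 100}
```

In the full method these contributions are accumulated per residue pair, so a pairing supported by several intermediate sequences can outweigh the one suggested by the direct alignment alone.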
![Page 10: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/10.jpg)
T-Coffee and Consistency…
![Page 11: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/11.jpg)
Where Do The Primary Alignments Come From?
Primary Alignments
- Primary Library
Source
- Any valid Third Party Method
![Page 12: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/12.jpg)
T-Coffee and Consistency…
![Page 13: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/13.jpg)
T-Coffee and Consistency…
![Page 14: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/14.jpg)
Using the T-Coffee Multiple Sequence Alignment Package
II – M-Coffee
![Page 15: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/15.jpg)
What is the Best MSA Method?
More than 50 MSA methods
Some methods are fast and inaccurate
- Mafft, muscle, kalign
Some methods are slow and accurate
- T-Coffee, ProbCons
Some methods are slow and inaccurate…
- ClustalW
![Page 16: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/16.jpg)
Why Not Combine Them?
All methods give different alignments
Their agreement is an indication of accuracy
t_coffee -method mafft_msa,muscle_msa
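The idea that agreement between methods signals accuracy can be made concrete by comparing the residue pairs that two alignments of the same sequences have in common. The sketch below is an illustration only, not how M-Coffee weights its library; the function names and the toy alignments are hypothetical.

```python
from itertools import combinations


def aligned_pairs(msa):
    """Set of aligned residue pairs in an alignment.

    msa maps sequence name -> gapped string; all rows have equal length.
    A residue is identified by (sequence name, index in ungapped sequence).
    """
    names = sorted(msa)
    counters = {name: 0 for name in names}
    pairs = set()
    for chars in zip(*(msa[name] for name in names)):
        present = []
        for name, ch in zip(names, chars):
            if ch != "-":
                present.append((name, counters[name]))
                counters[name] += 1
        pairs.update(combinations(present, 2))
    return pairs


def agreement(msa_a, msa_b):
    """Fraction of aligned residue pairs shared by two alignments."""
    pa, pb = aligned_pairs(msa_a), aligned_pairs(msa_b)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 1.0


method_1 = {"s1": "AC-G", "s2": "ACTG"}  # toy output of one aligner
method_2 = {"s1": "ACG-", "s2": "ACTG"}  # toy output of another aligner
print(agreement(method_1, method_2))  # 0.5
```

Regions where this kind of agreement is high are the ones where the combined alignment is most likely to be trustworthy.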
![Page 17: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/17.jpg)
Combining Many MSAs into ONE
MUSCLE
MAFFT
ClustalW
???????
T-Coffee
![Page 18: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/18.jpg)
![Page 19: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/19.jpg)
Where to Trust Your Alignments
Most Methods Agree
Most Methods Disagree
![Page 20: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/20.jpg)
What To Do Without Structures
![Page 21: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/21.jpg)
Using the T-Coffee Multiple Sequence Alignment Package
III – Template Based Alignments
![Page 22: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/22.jpg)
Sometimes Sequences are Not Enough
Sequence-based alignments are limited in accuracy below a certain level of identity
- 30% for proteins
- 70% for DNA
It is hard to correctly align sequences whose similarity is below these values
- Twilight zone
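A quick way to check whether a pair of aligned sequences falls below these thresholds is to compute their percent identity. This is a sketch under stated assumptions: identity is measured over columns where neither sequence has a gap (other definitions exist), and the names `percent_identity`, `in_twilight_zone` and `TWILIGHT` are hypothetical.

```python
# Thresholds quoted on the slide, in percent identity.
TWILIGHT = {"protein": 30.0, "dna": 70.0}


def percent_identity(row_a, row_b):
    """Percent identity over columns where both rows have a residue."""
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def in_twilight_zone(row_a, row_b, kind):
    """True when the pair is below the threshold for its sequence type."""
    return percent_identity(row_a, row_b) < TWILIGHT[kind]


print(percent_identity("ACGT", "ACGA"))  # 75.0
```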
![Page 23: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/23.jpg)
One Solution: Template Based Alignment
Replace the sequence with something more informative
- PDB Structure: Expresso
- Profile: PSI-Coffee
- RNA Structure: R-Coffee
![Page 24: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/24.jpg)
Template Based Multiple Sequence Alignments
[Diagram labels: Sources; Templates (Structure, Profile, …); Template Aligner; Template Alignment; Source Template Alignment; Remove Templates; Library]
![Page 25: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/25.jpg)
Expresso: Finding the Right Structure
[Diagram labels: Sources; BLAST; Templates; SAP; Template Alignment; Source Template Alignment; Remove Templates; Library]
![Page 26: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/26.jpg)
PSI-Coffee: Homology Extension
[Diagram labels: Sources; BLAST; Templates; Profile Aligner; Template Alignment; Source Template Alignment; Remove Templates; Library]
![Page 27: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/27.jpg)
What is Homology Extension?
[Figure: an L residue that could align with either of two L residues (?)]
-Simple scoring schemes result in alignment ambiguities
![Page 28: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/28.jpg)
What is Homology Extension?
[Figure: the same L residues shown with their profile columns LLLLLL, LLIVIL and LLLLLL; labels: Profile 1, Profile 2]
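How profiles resolve the ambiguity can be illustrated by scoring columns instead of single residues. The sketch below uses the columns LLLLLL and LLIVIL from the slide and a deliberately simple score: the probability that a residue drawn from one column matches a residue drawn from the other. That scoring rule and the function names are assumptions for illustration, not PSI-Coffee's actual profile aligner.

```python
from collections import Counter


def column_profile(column):
    """Residue frequencies of one alignment column, e.g. 'LLIVIL'."""
    counts = Counter(column)
    total = len(column)
    return {residue: n / total for residue, n in counts.items()}


def profile_match(col_a, col_b):
    """Probability that residues drawn from each column are identical."""
    pa, pb = column_profile(col_a), column_profile(col_b)
    return sum(freq * pb.get(residue, 0.0) for residue, freq in pa.items())


# A single L matches either candidate L equally well (score 1.0 both times),
# but the profiles behind those residues tell the two candidates apart.
print(profile_match("LLLLLL", "LLLLLL"))  # 1.0
print(profile_match("LLLLLL", "LLIVIL"))  # 0.5
```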
![Page 30: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/30.jpg)
| Method | Type | Template | Score | Comment |
|---|---|---|---|---|
| ClustalW-2 | Progressive | NO | 22.74 | |
| PRANK | Gap | NO | 26.18 | Science 2008 |
| MAFFT | Iterative | NO | 26.18 | |
| Muscle | Iterative | NO | 31.37 | |
| ProbCons | Consistency | NO | 40.80 | |
| ProbCons | MonoPhasic | NO | 37.53 | |
| T-Coffee | Consistency | NO | 42.30 | |
| M-Coffee4 | Consistency | NO | 43.60 | |
| PSI-Coffee | Consistency | Profile | 53.71 | |
| PROMALS | Consistency | Profile | 55.08 | |
| PROMALS-3D | Consistency | PDB | 57.60 | |
| 3D-Coffee | Consistency | PDB | 61.00 | Expresso |

Score: fraction (in %) of correct columns when compared with a structure-based reference (BB11 of BAliBASE).
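A column-based score of this kind can be sketched as follows. This is a simplified illustration, not the official BAliBASE scoring program (which, for example, can restrict scoring to core blocks); the function names and toy alignments are hypothetical, and both alignments are assumed to contain the same sequences under the same names.

```python
def columns(msa):
    """Set of columns, each a tuple of residue indices (None for a gap).

    msa maps sequence name -> gapped string; rows are read in sorted
    name order so that two alignments of the same sequences are comparable.
    """
    names = sorted(msa)
    counters = dict.fromkeys(names, 0)
    cols = set()
    for chars in zip(*(msa[name] for name in names)):
        col = []
        for name, ch in zip(names, chars):
            if ch == "-":
                col.append(None)
            else:
                col.append(counters[name])
                counters[name] += 1
        cols.add(tuple(col))
    return cols


def column_score(test, reference):
    """Percentage of reference columns reproduced exactly in the test."""
    ref_cols = columns(reference)
    return 100.0 * len(ref_cols & columns(test)) / len(ref_cols)


reference = {"s1": "AC-G", "s2": "ACTG"}  # toy structure-based reference
candidate = {"s1": "ACG-", "s2": "ACTG"}  # toy alignment being evaluated
print(column_score(candidate, reference))  # 50.0
```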
![Page 31: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/31.jpg)
[Diagram labels: Experimental Data…; TARGET; Templates; Template Aligner; Template Alignment; Template-Sequence Alignment; Template based Alignment of the Sequences; Primary Library]
![Page 32: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/32.jpg)
Using the T-Coffee Multiple Sequence Alignment Package
IV – RNA Alignments
![Page 33: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/33.jpg)
ncRNAs Comparison
And ENCODE said… “nearly the entire genome may be represented in primary transcripts that extensively overlap and include many non-protein-coding regions”
Who Are They?
- tRNA, rRNA, snoRNAs
- microRNAs, siRNAs
- piRNAs
- long ncRNAs (Xist, Evf, Air, CTN, PINK…)
How Many of Them?
- Open question
- 30,000 is a common guess
- Harder to detect than proteins
![Page 34: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/34.jpg)
ncRNAs Can Evolve Rapidly
CCAGGCAAGACGGGACGAGAGTTGCCTGG
CCTCCGTTCAGAGGTGCATAGAACGGAGG
**-------*--**---*-**------**
[Figure: secondary structure drawings of the two sequences]
![Page 35: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/35.jpg)
The Holy Grail of RNA Comparison: Sankoff's Algorithm
![Page 36: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/36.jpg)
The Holy Grail of RNA Comparison: Sankoff's Algorithm
Simultaneous Folding and Alignment
(L: sequence length, n: number of sequences)
- Time Complexity: $O(L^{2n})$
- Space Complexity: $O(L^{3n})$
In Practice, for Two Sequences:
- 50 nucleotides: 1 min, 6 M
- 100 nucleotides: 16 min, 256 M
- 200 nucleotides: 4 hours, 4 G
- 400 nucleotides: 3 days, 3 T
Forget about
- Multiple sequence alignments
- Database searches
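The running times quoted for two sequences follow from the time complexity on the slide, read as L to the power 2n: with n = 2, doubling the length multiplies the time by 2^4 = 16. The short sketch below reproduces that arithmetic starting from the 1 minute quoted for 50 nucleotides; the function name is hypothetical, and only the time figures are modelled.

```python
def scaled_minutes(base_minutes, base_length, length, n_sequences=2):
    """Extrapolate running time assuming it grows as length ** (2 * n)."""
    exponent = 2 * n_sequences
    return base_minutes * (length / base_length) ** exponent


for length in (50, 100, 200, 400):
    minutes = scaled_minutes(1, 50, length)
    print(f"{length} nucleotides: {minutes:.0f} min")
```

The values for 200 and 400 nucleotides come out at 256 and 4096 minutes, roughly the 4 hours and 3 days given on the slide.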
![Page 37: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/37.jpg)
[Diagram labels: RNA Sequences; Secondary Structures (RNAplfold); Primary Library (Consan or Mafft / Muscle / ProbCons); R-Coffee Extension; R-Coffee Extended Primary Library; R-Score; Progressive Alignment Using The R-Score]
![Page 38: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/38.jpg)
R-Coffee Extension
TC Library
- G G Score X
- C C Score Y
[Figure: paired G and C residues in two sequences]
Goal: Embedding RNA Structures Within The T-Coffee Libraries
The R-extension can be added on top of any existing method.
![Page 39: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/39.jpg)
R-Coffee + Regular Aligners
| Method | Avg Braliscore (direct) | Avg Braliscore (+T) | Avg Braliscore (+R) | Net Improv. (+T) | Net Improv. (+R) |
|---|---|---|---|---|---|
| Poa | 0.62 | 0.65 | 0.70 | 48 | 154 |
| Pcma | 0.62 | 0.64 | 0.67 | 34 | 120 |
| Prrn | 0.64 | 0.61 | 0.66 | -63 | 45 |
| ClustalW | 0.65 | 0.65 | 0.69 | -7 | 83 |
| Mafft_fftnts | 0.68 | 0.68 | 0.72 | 17 | 68 |
| ProbConsRNA | 0.69 | 0.67 | 0.71 | -49 | 39 |
| Muscle | 0.69 | 0.69 | 0.73 | -17 | 42 |
| Mafft_ginsi | 0.70 | 0.68 | 0.72 | -49 | 39 |

Improvement = # R-Coffee wins - # R-Coffee losses
![Page 40: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/40.jpg)
RM-Coffee + Regular Aligners
| Method | Avg Braliscore (direct) | Avg Braliscore (+T) | Avg Braliscore (+R) | Net Improv. (+T) | Net Improv. (+R) |
|---|---|---|---|---|---|
| Poa | 0.62 | 0.65 | 0.70 | 48 | 154 |
| Pcma | 0.62 | 0.64 | 0.67 | 34 | 120 |
| Prrn | 0.64 | 0.61 | 0.66 | -63 | 45 |
| ClustalW | 0.65 | 0.65 | 0.69 | -7 | 83 |
| Mafft_fftnts | 0.68 | 0.68 | 0.72 | 17 | 68 |
| ProbConsRNA | 0.69 | 0.67 | 0.71 | -49 | 39 |
| Muscle | 0.69 | 0.69 | 0.73 | -17 | 42 |
| Mafft_ginsi | 0.70 | 0.68 | 0.72 | -49 | 39 |
| RM-Coffee4 | 0.71 | / | 0.74 | / | 84 |
![Page 41: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/41.jpg)
R-Coffee + Structural Aligners
| Method | Avg Braliscore (direct) | Avg Braliscore (+T) | Avg Braliscore (+R) | Net Improv. (+T) | Net Improv. (+R) |
|---|---|---|---|---|---|
| Stemloc | 0.62 | 0.75 | 0.76 | 104 | 113 |
| Mlocarna | 0.66 | 0.69 | 0.71 | 101 | 133 |
| Murlet | 0.73 | 0.70 | 0.72 | -132 | -73 |
| Pmcomp | 0.73 | 0.73 | 0.73 | 142 | 145 |
| T-Lara | 0.74 | 0.74 | 0.69 | -36 | -8 |
| Foldalign | 0.75 | 0.77 | 0.77 | 72 | 73 |
| Dynalign | --- | 0.63 | 0.62 | --- | --- |
| Consan | --- | 0.79 | 0.79 | --- | --- |
| RM-Coffee4 | 0.71 | / | 0.74 | / | 84 |
![Page 42: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/42.jpg)
Using the T-Coffee Multiple Sequence Alignment Package
V – DNA Alignments
![Page 43: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/43.jpg)
Aligning Genomic DNA
Main problem
- Tell a good alignment from a bad one
Strategy:
- Tuning on Orthologous Promoter Detection
- Evaluation on ChIP-Seq Data
![Page 45: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/45.jpg)
Aligning Genomic DNA
Tuning of Gap Penalties
Design of a di-nucleotide substitution matrix
![Page 46: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/46.jpg)
Aligning Genomic DNA
![Page 47: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/47.jpg)
Aligning Genomic DNA
gDNA is very heterogeneous
Each genomic feature requires its own aligner
Aligning non-orthologous regions with a global aligner is impossible
Pro-Coffee is designed to align orthologous promoter regions
![Page 48: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/48.jpg)
Using the T-Coffee Multiple Sequence Alignment Package
VI – Wrap Up
![Page 49: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/49.jpg)
Which Flavor?
Fast Alignments
- M-Coffee with Fast Aligners: mafft, muscle, kalign
Difficult Protein Alignments
- Expresso
- PSI-Coffee
RNA Alignments
- R-Coffee
Promoter Alignments
- Pro-Coffee
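The choice of flavor can be sketched as a small lookup that builds a `t_coffee` command line. This is an illustration under assumptions: the task labels and the `build_command` helper are hypothetical, and the `-mode` names are given as recalled from the T-Coffee documentation, so they should be checked against the installed version before use. The sketch only builds the argument list and does not run anything.

```python
# Task label (hypothetical) -> T-Coffee mode name (to be verified locally).
MODES = {
    "fast": "fmcoffee",                   # M-Coffee restricted to fast aligners
    "protein_with_structures": "expresso",
    "protein_with_profiles": "psicoffee",
    "rna": "rcoffee",
    "promoter": "procoffee",
}


def build_command(task, fasta_path):
    """Return the argument list for a t_coffee run suited to the task."""
    return ["t_coffee", fasta_path, "-mode", MODES[task]]


print(" ".join(build_command("rna", "sequences.fasta")))
# t_coffee sequences.fasta -mode rcoffee
```

The resulting list can be handed to `subprocess.run` once the mode names have been confirmed for the local installation.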
![Page 50: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/50.jpg)
www.tcoffee.org
![Page 51: Using the T-Coffee Multiple Sequence Alignment Package I - Overview](https://reader035.fdocuments.net/reader035/viewer/2022062518/5681469f550346895db3b848/html5/thumbnails/51.jpg)