Improving FFAS alignments using T-Coffee
Why improve alignments?
• Determining protein function is a nontrivial task. The best way to do it is to relate the sequence of an unknown protein to proteins with known properties.
• To explore the evolution of related sequences.
• Because experimental determination of protein structure is still time-consuming, alignments are also used to build homology models, which can provide additional functional information.
What is a sequence profile?
A protein profile is a matrix that describes a particular domain or family. Each row of the matrix represents a position in a multiple alignment. Each row has 20 scores, one for each amino acid, reflecting the probabilities of the various amino acids occurring at that row's position in the profile. Thus, scores are position dependent.
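The definition above can be made concrete with a small Python sketch that turns a multiple alignment into a position-specific matrix of 20 scores per column. It is a simplified illustration, not the FFAS or PSI-BLAST procedure: `build_profile` is a hypothetical helper, and the uniform background, flat pseudocount and log-odds scoring are assumptions (PSI-BLAST, for example, also applies sequence weighting and more elaborate pseudocounts).

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_profile(msa: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Return one row of 20 log-odds scores per column of a multiple alignment.

    Simplifying assumptions (for illustration only): uniform background
    frequencies (1/20 each), one flat pseudocount per amino acid, no sequence
    weighting, and gap characters ignored when counting.
    """
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in msa:
            residue = seq[col].upper()
            if residue in counts:  # skips '-' and non-standard letters
                counts[residue] += 1
        total = sum(counts.values())
        # log2(observed frequency / background frequency) for each amino acid
        profile.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return profile


if __name__ == "__main__":
    rows = build_profile(["ACD", "ACE", "GCD"])
    # The fully conserved C in column 2 scores above zero; unseen residues score below zero.
    print(round(rows[1]["C"], 2), round(rows[1]["W"], 2))
```

Each returned dictionary is one row of the profile, so the same amino acid gets different scores at different positions, which is exactly the position dependence described above.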
Why use profiles?
To better understand the evolutionary, structural and functional relationships between related sequences.
A typical biological analysis involves the following steps:
• Finding protein sequences related to the query, with reasonable confidence, using database search algorithms such as BLAST or FASTA (FFAS stops at this stage).
• Creating a multiple sequence alignment of the related sequences (ClustalW, T-Coffee, POA, DIALIGN).
• Using additional information, e.g., predicted secondary structure (Orfeus) or prior knowledge (the biological importance of particular amino acids).
How does T-Coffee work?
It computes all possible pairwise alignments within the set of sequences in two ways: global alignments with ClustalW and local alignments with the "lalign" program from the FASTA package. The results of both methods are combined into a primary library. A library extension step then determines how each residue pair aligns with respect to residues in the other sequences. The extended library is used to assess how well residues are aligned given all the other sequences in the dataset, rather than looking at two sequences in isolation. Finally, the alignment is built progressively using the information in the library.
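The two library steps can be sketched in Python. This is a simplified illustration, not T-Coffee's own code: the residue-pair library is a plain dictionary, `combine_libraries` and `extend_library` are hypothetical names, and the weights are left abstract (in the original T-Coffee description they come from the percent identity of the pairwise alignment that produced each pair). The extension follows the triplet idea: a pair (x, y) gains support from every residue z of a third sequence that is aligned to both x and y.

```python
from collections import defaultdict
from typing import Dict, Tuple

Residue = Tuple[str, int]                      # (sequence name, residue position)
Library = Dict[Tuple[Residue, Residue], float]  # residue pair -> weight


def combine_libraries(*libraries: Library) -> Library:
    """Merge pairwise libraries; weights of pairs seen more than once are summed."""
    merged: Dict[Tuple[Residue, Residue], float] = defaultdict(float)
    for lib in libraries:
        for (x, y), weight in lib.items():
            key = (x, y) if x <= y else (y, x)  # store each pair in one orientation
            merged[key] += weight
    return dict(merged)


def extend_library(primary: Library) -> Library:
    """Consistency extension: pair (x, y) gains min(W(x, z), W(z, y)) for every
    residue z of a third sequence that is aligned to both x and y."""
    # Make the library symmetric so W(x, z) can be looked up in either order.
    sym: Library = {}
    for (x, y), w in primary.items():
        sym[(x, y)] = w
        sym[(y, x)] = w
    partners = defaultdict(list)  # residue -> [(aligned residue, weight)]
    for (x, y), w in sym.items():
        partners[x].append((y, w))
    extended: Dict[Tuple[Residue, Residue], float] = defaultdict(float, sym)
    for z, links in partners.items():
        for x, w_xz in links:
            for y, w_zy in links:
                # x, y and z must come from three different sequences
                if len({x[0], y[0], z[0]}) == 3:
                    extended[(x, y)] += min(w_xz, w_zy)
    return dict(extended)


if __name__ == "__main__":
    # Toy example: A0 is aligned to B0 and B1 directly, but only B0 is also supported via C0.
    global_lib = {(("A", 0), ("B", 0)): 30.0, (("A", 0), ("C", 0)): 80.0, (("C", 0), ("B", 0)): 60.0}
    local_lib = {(("A", 0), ("B", 0)): 20.0, (("B", 1), ("A", 0)): 40.0}
    ext = extend_library(combine_libraries(global_lib, local_lib))
    print(ext[(("A", 0), ("B", 0))], ext[(("A", 0), ("B", 1))])  # 110.0 40.0
```

In the toy example, A0/B1 has a similar direct weight to A0/B0 but no support from the third sequence, so after extension A0/B0 is strongly preferred. This is the "given the other sequences in the dataset" effect described above.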
What was done?
Four approaches were tested in an attempt to obtain better alignments:
• Simple elongation of the sequences in the BLAST profile
• Aligning sequences to the BLAST profile using T-Coffee
• Creating profiles using T-Coffee in multiple sequence alignment mode
• Mixed method: T-Coffee + elongation
Benchmark
• 1024 pairs of protein domains from SCOP
• Low sequence identity
• High structural similarity
• No redundant pairs
• Only single-domain structures
Benchmark
• In each method, profiles were created for both the target and the template.
• Although some algorithms made the benchmark computationally expensive (e.g., 14 days on 15 CPUs for T-Coffee in multiple sequence alignment mode), only the original PSI-BLAST profile-creation procedure was tested.
• For T-Coffee in multiple sequence alignment mode, no results were obtained for some pairs.
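For scale, the quoted cost of the T-Coffee multiple-alignment run can be converted as follows. This is a back-of-the-envelope estimate that assumes the 14 days on 15 CPUs covered all 1024 benchmark pairs (some of which produced no result):

```latex
14\ \text{days} \times 15\ \text{CPUs} = 210\ \text{CPU-days} = 5040\ \text{CPU-hours},
\qquad
\frac{5040\ \text{CPU-hours}}{1024\ \text{pairs}} \approx 4.9\ \text{CPU-hours per pair}
```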
Alignment quality measure
Often only a fraction of a model is correct. After structural superposition, the most significant subset of the model is identified using the LG score measure.
This allows only the meaningful parts of models to be compared.
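Assuming the LG score used here is built on the Levitt-Gerstein structural similarity score (the LGscore method evaluates this score over subsets of a superposition to find the most significant one), the core per-superposition score can be sketched as below. The subset search and the P-value statistics are omitted, `levitt_gerstein_score` is a hypothetical helper, and the constants M = 20 and d0 = 5 Å are the usual published values, taken here as an assumption.

```python
from typing import Sequence


def levitt_gerstein_score(distances: Sequence[float], n_gaps: int = 0,
                          m: float = 20.0, d0: float = 5.0) -> float:
    """Levitt-Gerstein structural similarity score of one superposition.

    distances: distances (angstroms) between equivalent C-alpha atoms after superposition
    n_gaps:    number of gaps in the structural alignment
    m, d0:     assumed constants M = 20 and d0 = 5 angstroms
    """
    # Each equivalent pair contributes between 0 (far apart) and 1 (perfectly superposed).
    similarity = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return m * (similarity - n_gaps / 2.0)


if __name__ == "__main__":
    # Two equivalent residues: one perfectly superposed (0 A), one 5 A apart; no gaps.
    print(levitt_gerstein_score([0.0, 5.0]))  # 20 * (1.0 + 0.5) = 30.0
```

Because badly superposed residues contribute almost nothing, a score of this kind naturally emphasises the correct part of a model, which is why it suits the "compare only reasonable parts" goal above.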
T-Coffee to profile algorithm
[Figure: x-axis LG score FFAS (0 to 15000), y-axis LG score T-Coffee (0 to 14000)]
T-Coffee to profile algorithm
[Figure: x-axis LG score FFAS (0 to 2000), y-axis LG score T-Coffee (0 to 2000)]
Elongation
[Figure: x-axis LG score FFAS (0 to 14000), y-axis LG score T-Coffee (0 to 14000)]
T-Coffee
[Figure: x-axis LG score FFAS (0 to 2000), y-axis LG score T-Coffee (0 to 2000)]
T-Coffee + elongation
[Figure: x-axis LG score FFAS (0 to 12000), y-axis LG score T-Coffee (0 to 12000)]
• The best results are obtained using T-Coffee alone. For pairs with an FFAS LG score below 600, alignments are improved in 72% of cases.
• Because the LG score is unknown in real-life applications, it is necessary to find a correlation between alignment improvement and factors that are known in advance.
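The 72% figure is a simple ratio over benchmark pairs. A sketch of how such a ratio can be computed from per-pair LG scores follows; the data layout (a list of `(ffas_lg, tcoffee_lg)` tuples, with `None` where T-Coffee produced no result), the strict "greater than" test for improvement and the helper name `fraction_improved` are assumptions for illustration, and the numbers in the usage line are toy values, not benchmark results.

```python
from typing import Iterable, Optional, Tuple


def fraction_improved(pairs: Iterable[Tuple[float, Optional[float]]],
                      ffas_cutoff: float = 600.0) -> float:
    """Fraction of pairs whose LG score is higher with T-Coffee than with FFAS.

    Only pairs with an FFAS LG score below `ffas_cutoff` are considered, and
    pairs with no T-Coffee result (None) are skipped. Returns NaN if no pair qualifies.
    """
    considered = improved = 0
    for ffas_lg, tcoffee_lg in pairs:
        if tcoffee_lg is None or ffas_lg >= ffas_cutoff:
            continue
        considered += 1
        if tcoffee_lg > ffas_lg:
            improved += 1
    return improved / considered if considered else float("nan")


if __name__ == "__main__":
    # Toy values only, not benchmark data.
    toy = [(100.0, 150.0), (200.0, 180.0), (500.0, 700.0), (800.0, 900.0), (50.0, None)]
    print(f"{fraction_improved(toy):.0%}")  # 2 of the 3 qualifying pairs improve
```

Skipping missing results keeps them out of both the numerator and the denominator; counting them as failures instead would lower the reported percentage.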
Note:
• Some of the results are missing.
• The benchmark cannot be trusted in all cases.
d1ca1_1 a.124.1.1 d1ah7__ a.124.1.1
• Sequence identity 33%
• LG score FFAS = 2457.6
• LG score T-coffee = 2545.6
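The slides report sequence identity without saying how it was normalised. A minimal sketch of one common convention (identical residues divided by the number of gap-free aligned columns) follows; `percent_identity` is a hypothetical helper, and because other conventions divide by the shorter sequence or the full alignment length instead, it will not necessarily reproduce the percentages quoted here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Identity is counted over columns where both sequences have a residue;
    other tools divide by the shorter sequence length or the alignment length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not aligned:
        return 0.0
    identical = sum(1 for a, b in aligned if a.upper() == b.upper())
    return 100.0 * identical / len(aligned)


if __name__ == "__main__":
    print(percent_identity("AC-GT", "ACCGA"))  # 3 identical of 4 gap-free columns
```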
FFAS ALIGNMENT:
d1ah7__   1 EDKHKEGVNSHLWIVNRAIDIMSRNTTL----VKQDRVAQLNEWRTELENGIYAADYENP 56
model     1 WDGKIDGTGTHAMIVTQGVSILENDLSKNEPESVRKNLEILKENMHELQLGSTYPDYDKN 60

d1ah7__  57 YYDNSTFASHFYDPDNGKTYI---------PFAKQAKETGAKYFKLAGESYKNKDMKQAF 107
model    61 AYD--LYQDHFWDPDTDNNFSKDNSWYLAYSIPDTGESQIRKFSALARYEWQRGNYKQAT 118

d1ah7__ 108 FYLGLSLHYLGDVNQPMHAANFTNLSYPQGFHSKYENFVDTIKDNYKVTDGNGYWNWKGT 167
model   119 FYLGEAMHYFGDIDTPYHPANVTAVD--SAGHVKFETFAEERKEQYKI-------NTVGC 169

d1ah7__ 168 NPEEWIHGAAVVAKQDYSGIVNDN--------TKDWFVKAAVSQEYAD-KWRAEVTPMTG 218
model   170 KTNEDFYAD-ILKNKDFNAWSKEYARGFAKTGKSIYYSHASMSHSWDDW------DYAAK 222

d1ah7__ 219 KRLMDAQRVTAGYIQLWFDTYGD 241
model   223 VTLANSQKGTAGYIYRFLHDVSE 245
[Figure labels: d1ca1_1, d1ah7__]
T-COFFEE ALIGNMENT:
d1ah7__   1 EDKHKEGVNSHLWIVNRAIDIMSRNTT----LVKQDRVAQLNEWRTELENGIYAADYENP 56
model     1 WDGKIDGTGTHAMIVTQGVSILENDLSKNEPESVRKNLEILKENMHELQLGSTYPDYDKN 60

d1ah7__  57 YYDNSTFASHFYDPDNGKTY---------IPFAKQAKETGAKYFKLAGESYKNKDMKQAF 107
model    61 AY--DLYQDHFWDPDTDNNFSKDNSWYLAYSIPDTGESQIRKFSALARYEWQRGNYKQAT 118

d1ah7__ 108 FYLGLSLHYLGDVNQPMHAANFTNLSYPQGFHSKYENFVDTIKDNYKVTDGNGYWNWKGT 167
model   119 FYLGEAMHYFGDIDTPYHPANVTAVDSAG--HVKFETFAEERKEQYKINTVGCK-----T 171

d1ah7__ 168 NPEEWIHGAAVVAKQDYSGIVNDNTKDWFVKAAVSQEYADKWRAEVTPMTGKRLMDAQRV 227
model   172 NEDFYADILKNKDFNAWSKEYARGFAKTGKSIYYSHASMSHSWDDWDYAAKVTLANSQKG 231

d1ah7__ 228 TAGYI-QLWFDTYGDR 242
model   232 TAGYIYRFLHDVSEGN 247
[Figure labels: d1ca1_1, d1ah7__]
STRUCTURAL ALIGNMENT:
d1ah7__   1 WSAEDKHKEGVNSHLWIVNRAIDIMSRNTTLVK----QDRVAQLNEWRTELENGIYAADY 56
model     1 WDGKIDG---TGTHAMIVTQGVSILENDLSKNEPESVRKNLEILKENMHELQLGSTYPDY 57

d1ah7__  57 ENPYYDNSTFASHFYDPDNGKTYIP---------FAKQAKETGAKYFKLAGESYKNKDMK 107
model    58 DK-NAYD-LYQDHFWDPDTDNNFSKDNSWYLAYSIPDTGESQIRKFSALARYEWQRGNYK 115

d1ah7__ 108 QAFFYLGLSLHYLGDVNQPMHAANFTNLSYPQGFHSKYENFVDTIKDNYKVTDGNGYWNW 167
model   116 QATFYLGEAMHYFGDIDTPYHPANVTAVDS--AGHVKFETFAEERKEQYKINTVGCKTNE 173

d1ah7__ 167 -----------KGTNPEEWIHGAAVVAKQDYSG-IVNDNTKDWFVKAAVSQEYADKWRAE 215
model   174 DFYADILKNKDFNAWSKEYARGFAKTGKSIYYSHASMSH-----------------SWDD 216

d1ah7__ 216 VTPMTGKRLMDAQRVTAGYIQLWFDTYGDR--- 245
model   217 WDYAAKVTLANSQKGTAGYIYRFLHDVSEGNDP 249
[Figure labels: d1ca1_1, d1ah7__]
[Figure: two panels, each labeled d1ca1_1 and d1ah7__]
d1mgta2 c.55.7.1 d1sfe_2 c.55.7.1
• Sequence identity 40%
• LG score FFAS = 207.7
• LG score T-coffee = 251.1
[Figure labels: d1fb1a_, d1a8ra_]
d1mgta2 c.55.7.1 d1sfe_2 c.55.7.1
• Sequence identity 39%
• LG score FFAS = 47.6
• LG score T-coffee = 113.8
[Figure labels: d1mgta2, d1sfe_2]
[Figure labels: d1mgta2, d1sfe_2]
Sequence Identity
[Figure: x-axis Identity % (0 to 120), y-axis LG score (-6000 to 4000)]
Profile identity vs LG score
[Figure: x-axis Identity % (0 to 100), y-axis LG score (-6000 to 4000)]
FFAS score vs LG score
[Figure: x-axis FFAS score (-200 to 0), y-axis LG score (-1000 to 1000)]
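To turn plots like these into a single number, one can compute a correlation coefficient between a quantity that is known in practice (sequence identity, profile identity or FFAS score) and the LG score, collected per benchmark pair. A minimal Pearson sketch follows; choosing Pearson rather than a rank correlation such as Spearman is an assumption, and `pearson` is a hypothetical helper (Python 3.10+ also provides `statistics.correlation`).

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values only: perfectly linear data gives a coefficient of 1.
    print(pearson([10.0, 20.0, 30.0], [100.0, 200.0, 300.0]))
```

Pearson only captures linear trends and is sensitive to outliers, which the wide LG score ranges in these plots suggest may be present; a rank correlation is a reasonable alternative.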
Conclusions:
• FFAS alignments can still be improved.
• Using T-Coffee to create FFAS profiles can improve alignment quality.
• It is not yet known how to decide automatically whether to use T-Coffee to create FFAS profiles.
To do:
• Check the correlation between alignment diversity and alignment improvement.
• Try a different method of comparing sequence alignments (overlap score).
• Compare other multiple alignment methods with T-Coffee.
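The slides do not define the overlap score. One simple way to compare an alignment against a reference, such as the structural alignment, is to count how many reference residue pairs it reproduces; the sketch below assumes that definition, and `aligned_pairs` and `overlap_score` are hypothetical helpers. The start arguments correspond to the residue numbers printed at the left of each alignment row.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]  # (residue number in sequence A, residue number in sequence B)


def aligned_pairs(seq_a: str, seq_b: str, start_a: int = 1, start_b: int = 1,
                  gap: str = "-") -> Set[Pair]:
    """Residue-number pairs placed in the same column of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs: Set[Pair] = set()
    i, j = start_a, start_b
    for a, b in zip(seq_a, seq_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        # Advance each residue counter only when that sequence has a residue here.
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def overlap_score(test: Set[Pair], reference: Set[Pair]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    return len(test & reference) / len(reference) if reference else 0.0


if __name__ == "__main__":
    reference = aligned_pairs("AC-GT", "A-CGT")
    shifted = aligned_pairs("-ACGT", "ACGT-")  # every pair off by one residue
    print(overlap_score(aligned_pairs("ACGT", "ACGT"), reference), overlap_score(shifted, reference))
```

Unlike an LG score, a measure of this kind needs only the two alignments, not 3D coordinates, although it still needs a trusted reference alignment.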