daTAA server
Server daTAA: http://toolkit.tuebingen.mpg.de/dataa
Paweł Szczęsny
MPI for Developmental Biology, Tuebingen, Germany
Institute of Biochemistry and Biophysics PAS, Warsaw, Poland
Internal complexity of TAAs
MFWMCFVIFFIGEFIMKKLSVTSKRQYNLYASPISRRLSLLMKLSLETVTVMFLLGASPVLA/SNLALTGAKNLSQNSPGVNYSKGSHGSIVLSGDDDFCGADYVLGRGGNSTVRNGIPISVEEEYERFVKQKLMNNATSPYSQSSEQQVWTGDGLTSKGSGYMGGKSTDGDKNILPEAYGIY-------------------------SFATGCGSSAQGNY-------------------------SVAFGANATALTGG-------------------------SQAFGVAALASGRV-------------------------SVAIGVGSEATGEA-------------------------GVSLGGLSKAAGAR-------------------------SVAIGTRANAYGEE-------------------------SIAIGGGLKQGSDNKIGSAVAQGLK-------------------------AISIGSDSVGFQHY-------------------------AVAIGAKSRALLLK-------------------------SVALGSYSVADVDAGVRGYDPVEDEPSKNVSFVWKSSVGAVSVGNRKEGLTRQIIGVAAG---TEDTDAVNVAQLKALR:GMISEK|GGWNLTVNNDNNTVVSSGGALDLSSGSKNLKIAKDGKKNNVTFDVARDLTLKSIKLDGVTLNETGLFIANGPQITASGINAGSQKITGVAEG---TDANDAVNFGQL-----------------------------------------------------------------------------------KKI|ETEVKE-----QVAASGFVKQDSDTK:YLTIGKDTDGDTINIANNKSDKRTLMGIKEGDISKDSSEAITGSQLFTTNQNVKTVSDNLQTAATNIAKTFGGDAKYE-DGEWTAPTFKVKTVTGEGKE-EEKTYQNVADALAGVGSSITNVQ-------NKVTEQVNNAIT--KVEGDALLWSDEANAFVARHEKSKLEKGASKATQENSKITYLLDGDVSKDSTDAITGKQLYSLGD--------------KIASYLGGNAKYE-NGEWTAPTFKVKTVKEDGKE-EEQTYHNVAAAFEGVGTSFTNVK-------NEITKQINHL----QSDDSAVVHYDKDDK-NGSINYASVTLGKGKDSAAVTLHNVAAGNIAKDSHDAINGSQIYSLNE--------------QLATYFGGGAGYNKEGKWTAPTFTVKTVKEDGEE-EEKTYQNVAEALTGVGTSFTNIK-------SEITKQIANEIS--NVTGDSLVKKDLDTNLITIGKEVAGTEINIASVSKADRTLSGVKEA---VKDNEAVNKGQL---------------------------------------------------------------------------------------DKGLKHLSDSLQSEDSAVVHYDKKTDETGGINYTSVTLG-GKDKTPVALHNVADGSISKDSHDAINGGQIHTIGE--------------DVAKFLGGAASFN-NGAFTGPTYKLSNIDAKGDV-QQSEFKDIGSAFAGLDTNIKNVNNNVTNKFNELTQNITNVTQ--QVKGDALLWSDEANAFVARHEKSKLGKGASKATQENSKITYLLDGDVSKDSTDAITGKQLYSLGD--------------KIASYLGGNAKYE-DGEWTAPTFKVKTVKEDGKE-EEKTYQNVAEALTGVGTSFTNVK-------NEITKQINHL----QSDDSAVVHYDKNKDETGGINYASVTLGKGKDSAAVTLHNVADGSISKDSRDAINGSQIYSLNE--------------QLATYFGGGAKYE-NGQWTAPIFKVKTVKEDGEE-EEKTYQNVAEALTGVGTSFTNIK-------SEITKQIANEIS--SVTGDSLVKKDLATNLITIGKEVAGTEINIAS
VSKADRTLSGVKEA---VKDNEAVNKGQL---------------------------------------------------------------------DTNIKKVE-------DKLTEAVGKVTQ--QVKGDALLWSNEDNAFVADHGKDSAKTKSKITHLLDGNIASGSTDAVTGGQLYSLNE--------------QLATYFGGGAKYE-NGQWTAPTFKVKTVNGEGKE-EEQTYQNVAEALTGVGASFMNVQNKIT---NEITNQVNNAIT--KVEGDSLVKQDNLG-IITLGKERGGLKVDFANRDGLDRTLSGVKEA---VNDNEAVNKGQL---------------------------------------------------------------------DADISKVNNNVTNKFNELTQNITNVTQ--QVKGDALLWSDEANAFVARHEKSKLEKGVSKATQENSKITYLLDGDISKGSTDAVTGGQLYSLNE--------------QLATYFGGDAKYE-NGQWTAPTFKVKTVNGEGKE-EEQTYHNVAAAFEGVGTSFTNIK-------SEITKQINNEIS--NVKGDSLVKKDLATNLITIGKEVAGTEINIASVSKADRTLSGVKEA---VKDNEAVNKGQL---------------------------------------------------------------------DTNIKKVE-------DKLTEAVGKVTQ--QVKGDALLWSNEDNAFVADHGKDSAKTKSKITHLLDGNIASGSTDAVTGGQLYSLNE--------------QLATYFGGGAKYE-NGQWTAPTFKVKTVNGDGKE-EEQTYQNVAEALTGVGTSFTNVQNKIT---NEITNQVNNAIT--KVEGDSLVKQDNLG-IITLGKERGGLKVDFANRDGLDRTLSGVKEA---VNDNEAVNKGQL---------------------------------------------------------------------DANISKVNNNVTNKFNELTQNITNVTQ--QVQGDTLLWSDEANAFVARHEKSKLEKGVSKATQENSKITYLLDGDISKGSTDAVTGGQLYSLNE--------------QLATYFGGGAKYE-NGEWTAPTFKVKTVNGEGKE-EEQTYHNVAAAFEGVGTSFTNIK-------SEITKQIDNEII--NVKGDSLVKRDLATNLITIGKEIEGSAINIANKSGEARTISGVKEA---VNNNEAVNKGQL---------------------------------------------------------------------DTNIKKVE-------DKLTEAVGKVTQ--QVKGDALLWSNEDNAFVADHGKDSAKTKSKITHLLDGNIASGSTDAVTGGQLYSLNE--------------QLATYFGGGAKYE-NGQWTAPSFKVKTVKEDGKE-EEQTYQNVAEALTGVGTSFTNVK-------NEITKQINHL----QSDDSAVVHYDKNKDETGTINYASVTLGKGKDSAAVTLHNVADGSISKDSRDAINGGQIHTIGE--------------DVAKFLGGDAAFK-DGAFTGPTYKLSNIDAKGDV-QQSEFKDIGSAFAGLDTNIKNVNNNVTNKFNELTQSITNVTQ--QVKGDSLLWSDEANAFVARHEKSKLEKGASKAIQENSKITYLLDGNVSKGSTDAVTGGQLYSMSN--------------MLATYLGGNAKYE-NGEWTAPTFKVKTVNGEGKE-EEQTYQNVAEALTGVGTSFTNIK-------SEIAKQINHL----QSDDSAVIHYDKNKDETGTINYASVTLGKGEDSAAVALHNVAAGNIAKDSRDAINGSQLYSLNE--------------QLLTYFGGNAGYK-DGQWIAPKFQVSQFKSDGSSGEKESYDNVAAAFEGVNKSLAGM--------NERINNVVTAGQ-
-NVSSNSLNWNETEGGYDARHNGVDSKLTHVENGDVSEKSKEAVNGSQLWNTNEKVEAVEKDVKNIEKKVQDIATVADSAVKYEKDSTGKKTNVIKLVGGSESDPVLIDNVADGDIKEGSKQAVNGGQLRDYTEKQMKIVLEDAKKYTDERFNDVVNNGVNEAKAYTDMKFEALSYAVEDVRKEARQAQLLVWRYLTYVTMIYRDL AAIGLAVSNLRYYDIPGSLSLSFGTGIWRSQSAFAVGAGYTSEDGNIRSNLSITNAGGHWGVGAGITLRLK
Automated vs manual annotation

Domain type            PFAM   Manual
Present in PFAM        28%    35%
Not present in PFAM    -      18%
Coiled coils           -      3%
Total                  28%    56%

Domain type            PFAM   Manual
Present in PFAM        26%    31%
Not present in PFAM    -      36%
Coiled coils           -      25%
Total                  26%    92%

Coverage of annotation
Automated vs manual annotation

Domain type            PFAM   daTAA   Manual
Present in PFAM        28%    32%     35%
Not present in PFAM    -      13%     18%
Coiled coils           -      5%      3%
Total                  28%    50%     56%

Domain type            PFAM   daTAA   Manual
Present in PFAM        26%    28%     31%
Not present in PFAM    -      27%     36%
Coiled coils           -      11%     25%
Total                  26%    66%     92%

Coverage of annotation
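
The coverage figures above can be checked with a few lines of arithmetic. The sketch below re-adds each column and expresses the daTAA total as a fraction of the manual total. It rests on two assumptions that are not in the slides: a "-" entry is treated as 0%, and the two unlabeled blocks of rows are simply called "first table" and "second table".

```python
# Coverage values (percent) copied from the slides.
# Assumption: "-" in the PFAM column is treated as 0.
COLUMNS = ("PFAM", "daTAA", "manual")

TABLES = {
    # "first table" / "second table" are placeholder names; the slides do not label them.
    "first table": {
        "Present in PFAM": (28, 32, 35),
        "Not present in PFAM": (0, 13, 18),
        "Coiled coils": (0, 5, 3),
    },
    "second table": {
        "Present in PFAM": (26, 28, 31),
        "Not present in PFAM": (0, 27, 36),
        "Coiled coils": (0, 11, 25),
    },
}


def totals(table):
    """Sum each column (PFAM, daTAA, manual) over all domain types."""
    return tuple(sum(column) for column in zip(*table.values()))


def recovered_fraction(table):
    """daTAA total coverage as a fraction of the manual total coverage."""
    _pfam, dataa, manual = totals(table)
    return dataa / manual


if __name__ == "__main__":
    for name, table in TABLES.items():
        summary = dict(zip(COLUMNS, totals(table)))
        print(name, summary, f"daTAA/manual = {recovered_fraction(table):.2f}")
```

The recomputed totals agree with the "Total" rows on the slides, so the per-domain figures are internally consistent.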
Prediction of individual repeats in YadA
|----------Hep_Hag---------|---------Hep_Hag- |---Ylhead---|---Ylhead----|-----ASAKGIHSIAIGATAEAAKGAAVAVGAGSIATGVNSVAIGPLSKALG ----------| |---------Hep_Hag-------Ylhead--|---Ylhead---|---Ylhead---|----Ylhead--DSAVTYGAASTAQKDGVAIGARASTSDTGVAVGFNSKADAKNSVAIG ---| |----------Hep_Hag---------|-|----Ylhead-----|----Ylhead---|HSSHVAANHGYSIAIGDRSKTDRENSVSIGHESL
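
As a rough illustration of how short repeats can be spotted in the YadA segments above, the sketch below scans them with a regular expression. The pattern `S[IV]AIG` is a hypothetical motif picked by eye from those segments. It is not the rule set daTAA uses, and it does not reproduce the Hep_Hag or Ylhead boundaries shown on the slide.

```python
import re

# The three YadA sequence segments copied from the slide.
SEGMENTS = [
    "ASAKGIHSIAIGATAEAAKGAAVAVGAGSIATGVNSVAIGPLSKALG",
    "DSAVTYGAASTAQKDGVAIGARASTSDTGVAVGFNSKADAKNSVAIG",
    "HSSHVAANHGYSIAIGDRSKTDRENSVSIGHESL",
]

# Hypothetical motif chosen for illustration only; NOT a daTAA rule.
MOTIF = re.compile(r"S[IV]AIG")


def find_motifs(seq):
    """Return (start, matched_text) for every non-overlapping motif hit."""
    return [(m.start(), m.group()) for m in MOTIF.finditer(seq)]


if __name__ == "__main__":
    for seq in SEGMENTS:
        print(seq)
        for start, text in find_motifs(seq):
            # Zero-based position of the hit within the segment.
            print(f"  {text} at {start}")
```

A fixed pattern like this misses diverged copies such as GVAIG or SVSIG in the same segments, which is one reason a strict motif match falls short of what a human annotator would call.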
Key points
Approach of human annotator implemented in a computer system
Improvement in coverage and accuracy over general annotation servers (total coverage of 50% and 66% for daTAA vs 28% and 26% for PFAM)
Unique workflow with knowledge-based rules
Visual helpers for interpretation of the results
Acknowledgements
MPI for Developmental Biology: Andrei Lupas, Dirk Linke, Toolkit development team
Institute of Biochemistry and Biophysics PAS: Piotr Zielenkiewicz, Marcin Grynberg