daTAA server


Description

A talk on domain annotation in trimeric autotransporter adhesins (TAAs).

Transcript of daTAA server

Page 1: daTAA server

Server daTAA: http://toolkit.tuebingen.mpg.de/dataa

Paweł Szczęsny
MPI for Developmental Biology, Tuebingen, Germany
Institute of Biochemistry and Biophysics PAS, Warsaw, Poland

Page 2: daTAA server

Internal complexity of TAAs

MFWMCFVIFFIGEFIMKKLSVTSKRQYNLYASPISRRLSLLMKLSLETVTVMFLLGASPVLA/SNLALTGAKNLSQNSPGVNYSKGSHGSIVLSGDDDFCGADYVLGRGGNSTVRNGIPISVEEEYERFVKQKLMNNATSPYSQSSEQQVWTGDGLTSKGSGYMGGKSTDGDKNILPEAYGIY-------------------------SFATGCGSSAQGNY-------------------------SVAFGANATALTGG-------------------------SQAFGVAALASGRV-------------------------SVAIGVGSEATGEA-------------------------GVSLGGLSKAAGAR-------------------------SVAIGTRANAYGEE-------------------------SIAIGGGLKQGSDNKIGSAVAQGLK-------------------------AISIGSDSVGFQHY-------------------------AVAIGAKSRALLLK-------------------------SVALGSYSVADVDAGVRGYDPVEDEPSKNVSFVWKSSVGAVSVGNRKEGLTRQIIGVAAG---TEDTDAVNVAQLKALR:GMISEK|GGWNLTVNNDNNTVVSSGGALDLSSGSKNLKIAKDGKKNNVTFDVARDLTLKSIKLDGVTLNETGLFIANGPQITASGINAGSQKITGVAEG---TDANDAVNFGQL-----------------------------------------------------------------------------------KKI|ETEVKE-----QVAASGFVKQDSDTK:YLTIGKDTDGDTINIANNKSDKRTLMGIKEGDISKDSSEAITGSQLFTTNQNVKTVSDNLQTAATNIAKTFGGDAKYE-DGEWTAPTFKVKTVTGEGKE-EEKTYQNVADALAGVGSSITNVQ-------NKVTEQVNNAIT--KVEGDALLWSDEANAFVARHEKSKLEKGASKATQENSKITYLLDGDVSKDSTDAITGKQLYSLGD--------------KIASYLGGNAKYE-NGEWTAPTFKVKTVKEDGKE-EEQTYHNVAAAFEGVGTSFTNVK-------NEITKQINHL----QSDDSAVVHYDKDDK-NGSINYASVTLGKGKDSAAVTLHNVAAGNIAKDSHDAINGSQIYSLNE--------------QLATYFGGGAGYNKEGKWTAPTFTVKTVKEDGEE-EEKTYQNVAEALTGVGTSFTNIK-------SEITKQIANEIS--NVTGDSLVKKDLDTNLITIGKEVAGTEINIASVSKADRTLSGVKEA---VKDNEAVNKGQL---------------------------------------------------------------------------------------DKGLKHLSDSLQSEDSAVVHYDKKTDETGGINYTSVTLG-GKDKTPVALHNVADGSISKDSHDAINGGQIHTIGE--------------DVAKFLGGAASFN-NGAFTGPTYKLSNIDAKGDV-QQSEFKDIGSAFAGLDTNIKNVNNNVTNKFNELTQNITNVTQ--QVKGDALLWSDEANAFVARHEKSKLGKGASKATQENSKITYLLDGDVSKDSTDAITGKQLYSLGD--------------KIASYLGGNAKYE-DGEWTAPTFKVKTVKEDGKE-EEKTYQNVAEALTGVGTSFTNVK-------NEITKQINHL----QSDDSAVVHYDKNKDETGGINYASVTLGKGKDSAAVTLHNVADGSISKDSRDAINGSQIYSLNE--------------QLATYFGGGAKYE-NGQWTAPIFKVKTVKEDGEE-EEKTYQNVAEALTGVGTSFTNIK-------SEITKQIANEIS--SVTGDSLVKKDLATNLITIGKEVAGTEINIAS
VSKADRTLSGVKEA---VKDNEAVNKGQL---------------------------------------------------------------------DTNIKKVE-------DKLTEAVGKVTQ--QVKGDALLWSNEDNAFVADHGKDSAKTKSKITHLLDGNIASGSTDAVTGGQLYSLNE--------------QLATYFGGGAKYE-NGQWTAPTFKVKTVNGEGKE-EEQTYQNVAEALTGVGASFMNVQNKIT---NEITNQVNNAIT--KVEGDSLVKQDNLG-IITLGKERGGLKVDFANRDGLDRTLSGVKEA---VNDNEAVNKGQL---------------------------------------------------------------------DADISKVNNNVTNKFNELTQNITNVTQ--QVKGDALLWSDEANAFVARHEKSKLEKGVSKATQENSKITYLLDGDISKGSTDAVTGGQLYSLNE--------------QLATYFGGDAKYE-NGQWTAPTFKVKTVNGEGKE-EEQTYHNVAAAFEGVGTSFTNIK-------SEITKQINNEIS--NVKGDSLVKKDLATNLITIGKEVAGTEINIASVSKADRTLSGVKEA---VKDNEAVNKGQL---------------------------------------------------------------------DTNIKKVE-------DKLTEAVGKVTQ--QVKGDALLWSNEDNAFVADHGKDSAKTKSKITHLLDGNIASGSTDAVTGGQLYSLNE--------------QLATYFGGGAKYE-NGQWTAPTFKVKTVNGDGKE-EEQTYQNVAEALTGVGTSFTNVQNKIT---NEITNQVNNAIT--KVEGDSLVKQDNLG-IITLGKERGGLKVDFANRDGLDRTLSGVKEA---VNDNEAVNKGQL---------------------------------------------------------------------DANISKVNNNVTNKFNELTQNITNVTQ--QVQGDTLLWSDEANAFVARHEKSKLEKGVSKATQENSKITYLLDGDISKGSTDAVTGGQLYSLNE--------------QLATYFGGGAKYE-NGEWTAPTFKVKTVNGEGKE-EEQTYHNVAAAFEGVGTSFTNIK-------SEITKQIDNEII--NVKGDSLVKRDLATNLITIGKEIEGSAINIANKSGEARTISGVKEA---VNNNEAVNKGQL---------------------------------------------------------------------DTNIKKVE-------DKLTEAVGKVTQ--QVKGDALLWSNEDNAFVADHGKDSAKTKSKITHLLDGNIASGSTDAVTGGQLYSLNE--------------QLATYFGGGAKYE-NGQWTAPSFKVKTVKEDGKE-EEQTYQNVAEALTGVGTSFTNVK-------NEITKQINHL----QSDDSAVVHYDKNKDETGTINYASVTLGKGKDSAAVTLHNVADGSISKDSRDAINGGQIHTIGE--------------DVAKFLGGDAAFK-DGAFTGPTYKLSNIDAKGDV-QQSEFKDIGSAFAGLDTNIKNVNNNVTNKFNELTQSITNVTQ--QVKGDSLLWSDEANAFVARHEKSKLEKGASKAIQENSKITYLLDGNVSKGSTDAVTGGQLYSMSN--------------MLATYLGGNAKYE-NGEWTAPTFKVKTVNGEGKE-EEQTYQNVAEALTGVGTSFTNIK-------SEIAKQINHL----QSDDSAVIHYDKNKDETGTINYASVTLGKGEDSAAVALHNVAAGNIAKDSRDAINGSQLYSLNE--------------QLLTYFGGNAGYK-DGQWIAPKFQVSQFKSDGSSGEKESYDNVAAAFEGVNKSLAGM--------NERINNVVTAGQ-
-NVSSNSLNWNETEGGYDARHNGVDSKLTHVENGDVSEKSKEAVNGSQLWNTNEKVEAVEKDVKNIEKKVQDIATVADSAVKYEKDSTGKKTNVIKLVGGSESDPVLIDNVADGDIKEGSKQAVNGGQLRDYTEKQMKIVLEDAKKYTDERFNDVVNNGVNEAKAYTDMKFEALSYAVEDVRKEARQAQLLVWRYLTYVTMIYRDL AAIGLAVSNLRYYDIPGSLSLSFGTGIWRSQSAFAVGAGYTSEDGNIRSNLSITNAGGHWGVGAGITLRLK

Page 3: daTAA server

Automated vs manual annotation

Domain type            PFAM   Manual
Present in PFAM        28%    35%
Not present in PFAM    -      18%
Coiled coils           -      3%
Total                  28%    56%

Present in PFAM        26%    31%
Not present in PFAM    -      36%
Coiled coils           -      25%
Total                  26%    92%

Coverage of annotation
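The slide does not define "coverage of annotation". A minimal sketch of one plausible reading follows, assuming coverage is the fraction of residues that fall inside at least one annotated domain. The function name, the 1-based inclusive coordinates, and the example hits are all hypothetical and are not taken from the talk.

```python
def coverage(seq_len, domains):
    """Fraction of residues covered by at least one domain.

    `domains` is a list of (start, end) pairs, 1-based and inclusive.
    Overlapping domains are counted once.
    """
    covered = [False] * seq_len
    for start, end in domains:
        # Clip each domain to the sequence boundaries before marking it.
        for pos in range(max(start, 1), min(end, seq_len) + 1):
            covered[pos - 1] = True
    return sum(covered) / seq_len


# Hypothetical 100-residue protein: two overlapping hits and one separate hit.
hits = [(1, 20), (15, 30), (61, 70)]
print(f"{coverage(100, hits):.0%}")  # 40%
```

Counting each residue once keeps overlapping hits from inflating the figure, which matters when several predictors annotate the same region.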

Page 4: daTAA server

Automated vs manual annotation

Domain type            PFAM   daTAA   Manual
Present in PFAM        28%    32%     35%
Not present in PFAM    -      13%     18%
Coiled coils           -      5%      3%
Total                  28%    50%     56%

Present in PFAM        26%    28%     31%
Not present in PFAM    -      27%     36%
Coiled coils           -      11%     25%
Total                  26%    66%     92%

Coverage of annotation
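As a quick consistency check, the sketch below recomputes each "Total" row from the three domain-type rows and reports how much of the manual coverage daTAA reaches. The percentages are copied from the slide. Treating "-" as 0 and the labels "first table" and "second table" are assumptions, since the slide does not name the two data sets.

```python
METHODS = ("PFAM", "daTAA", "manual")

# Percentages copied from the slide; "-" entries are treated as 0 (assumption).
DATASETS = {
    "first table": {
        "Present in PFAM": (28, 32, 35),
        "Not present in PFAM": (0, 13, 18),
        "Coiled coils": (0, 5, 3),
    },
    "second table": {
        "Present in PFAM": (26, 28, 31),
        "Not present in PFAM": (0, 27, 36),
        "Coiled coils": (0, 11, 25),
    },
}


def totals(rows):
    """Sum each method's column over the domain-type rows."""
    return tuple(sum(column) for column in zip(*rows.values()))


for name, rows in DATASETS.items():
    pfam, dataa, manual = totals(rows)
    print(
        f"{name}: PFAM {pfam}%, daTAA {dataa}%, manual {manual}%, "
        f"daTAA reaches {dataa / manual:.0%} of the manual coverage"
    )
```

The recomputed sums match the "Total" rows on the slide in both tables.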

Page 5: daTAA server

Prediction of individual repeats in YadA

[Figure: a fragment of the YadA sequence in three rows, each overlaid with two annotation tracks: Hep_Hag repeats and the shorter Ylhead repeats.]

ASAKGIHSIAIGATAEAAKGAAVAVGAGSIATGVNSVAIGPLSKALG
DSAVTYGAASTAQKDGVAIGARASTSDTGVAVGFNSKADAKNSVAIG
HSSHVAANHGYSIAIGDRSKTDRENSVSIGHESL

Page 6: daTAA server
Page 7: daTAA server
Page 8: daTAA server
Page 9: daTAA server
Page 10: daTAA server
Page 11: daTAA server

Key points

The approach of a human annotator, implemented in a computer system

Improvement in coverage and accuracy over general annotation servers

Unique workflow with knowledge-based rules

Visual helpers for interpretation of the results

Page 12: daTAA server

Acknowledgements

MPI for Developmental Biology: Andrei Lupas, Dirk Linke, Toolkit development team

Institute of Biochemistry and Biophysics PAS: Piotr Zielenkiewicz, Marcin Grynberg