RAP – a putative RNA-binding domain



Update TRENDS in Biochemical Sciences Vol.29 No.11 November 2004 567

vinculin. A crucial feature suggested by these structures is a two-step model of vinculin activation whereby binding of one ligand promotes structural changes that enable recruitment of additional ligands. However, this model needs to be established by direct experimentation. These crystal structures will be invaluable for the design of mutants or fluorescent probes that recognize vinculin in various conformational states. Such probes should enable conformational dynamics to be visualized in living cells as adhesions that assemble or disassemble in response to specific stimuli.

Acknowledgements

I thank Ernesto Fuentes, Mike Schaller and Keith Burridge for their comments.

Corresponding author: Ian Lee ([email protected]). Available online 15 September 2004

0968-0004/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

doi:10.1016/j.tibs.2004.09.001

Protein sequence motif

RAP – a putative RNA-binding domain

Ian Lee¹ and Wanjin Hong¹,²

¹Computational Molecular Biology Programme, Institute of Molecular and Cell Biology, 61 Biopolis Drive, Proteos, Singapore (138673)

²Membrane Biology Lab, Institute of Molecular and Cell Biology, 61 Biopolis Drive, Proteos, Singapore (138673)

A novel ~60-residue domain has been identified in Homo sapiens MGC5297 and various other proteins in eukaryotes. Sequence searches reveal that the domain is particularly abundant in apicomplexans and is predicted to be involved in diverse RNA-binding activities.

Protein and expressed sequence tag (EST) data archives from large-scale sequencing efforts continue to aid the identification of novel protein modules of organism-specific and widespread functional significance [1]. While analyzing Homo sapiens MGC5297, we discovered a domain that might mediate a range of cellular functions through its potential interactions with RNA.

Identification of the RAP domain

A PSI-BLAST [2] search of the non-redundant database at the NCBI seeded with the C-terminal 100 residues of H. sapiens MGC5297 resulted in statistically significant matches to multiple proteins in eukaryotes (E values ≤ 1E-3 within three iterations). Their relationships were confirmed by performing a Gibbs sampling-based iterative search (PROBE [3]; inclusion threshold of p < 0.001 to a maximum of four iterations) and profile hidden Markov model (HMM) searches, derived from aligned members identified in the first pass of the PROBE search (HMMER2 [4]; E values < 0.01). Searches were conducted on protein datasets for completely sequenced genomes in mammals, nematodes, flies, yeasts and plasmodia retrieved from the NCBI taxonomy browser. The distribution of proteins, after removal of duplicates, for each proteome is shown in Figure 1b.
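The filtering step described above (keep matches at or below an E-value cutoff within a fixed number of iterations, then remove duplicates) can be illustrated with a minimal Python sketch. It is not the pipeline used for the searches: the `Hit` record, the function name `significant_hits` and the example identifiers are hypothetical, and only the thresholds (1E-3, three iterations) are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One database match; field names are assumptions for this sketch."""
    subject_id: str
    evalue: float
    iteration: int


def significant_hits(hits, max_evalue=1e-3, max_iteration=3):
    """Keep the best-scoring hit per subject that meets both thresholds."""
    best = {}
    for hit in hits:
        # Discard matches that are too weak or found too late.
        if hit.evalue > max_evalue or hit.iteration > max_iteration:
            continue
        current = best.get(hit.subject_id)
        # Duplicate removal: retain only the lowest E value per subject.
        if current is None or hit.evalue < current.evalue:
            best[hit.subject_id] = hit
    return sorted(best.values(), key=lambda h: h.evalue)


# Hypothetical hits, not real search results.
example_hits = [
    Hit("protA", 2e-10, 1),
    Hit("protA", 5e-12, 2),  # duplicate subject with a better E value
    Hit("protB", 0.5, 1),    # fails the E-value cutoff
    Hit("protC", 8e-4, 3),
    Hit("protD", 1e-6, 4),   # found after the iteration limit
]

for hit in significant_hits(example_hits):
    print(hit.subject_id, hit.evalue, hit.iteration)
```

Keeping one record per subject mirrors the "after removal of duplicates" step; a real analysis would parse the search program's own output rather than hand-built records.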

Identified proteins in plasmodia were at least double that in equivalent or larger proteomes. An HMM search


(a)

KIAA1792             Hs gi14017801 741-801   LAVQFTNRNQY-(8)-GLHNMKRRQLARLGYRVVELSYWEWLPLRTRLEKLAFLHE
Tbrg4                Hs gi21739306 561-619   LAFLRWEFPNF-(8)-GRFVLARRHIVAAGFLIVDVPFYEWLELKSEWQKGAYLKD
At2g31890            At gi30685105 607-665   VALEIDGPTHF-(8)-GHTMLKRRYVAAAGWKVVSLSLQEWEEHEGSHEQLEYLRE
OSJNBa0032H19.2      Os gi30089729 575-632   LAFEIDGPSHF-(8)-GHTAFKRRYIAAAGWNLVSLSHQEWENLEGEFEQLEYLRR
MGC5297              Hs gi40068497 591-649   IALCIDGPKRF-(8)-GKEAIKQRHLQLLGYQVVQIPYHEIGMLKSRRELVEYLQR
FAST                 Hs gi5729822  477-535   VVLVLRERWHF-(8)-GSRALRERHLGLMGYQLLPLPFEELESQRGLPQLKSYLRQ
GH07286p             Dm gi16197813 496-555   VALMVIDFHDI-(9)-GVTNLTFDLLEKSGYHVIPVPYNEFSTSDKLLKRVQYLES
RIK5330408N05        Hs gi16550732 734-794   IALEFLDSKAL-(8)-GKSAMKKRHLEILGYRVIQISQFEWNSMSTKDARMDYLRE
CG31643PA            Dm gi24582106 832-893   VAILLLKLDSF-(9)-GPESLKMRHLEMMGYKVMQINEHDWNSKASSTAKANYLKC
RIK2810421I24        Hs gi7578791  634-691   VAVLCVSRSAY-(8)-GFLAMKMRHLNAMGFHVILVNNWEMDKLEME-DAVTFLKT
PF14_0509            Pf gi23509731 1658-1716 VLNEMQKADPQ-(9)-GTTIFKHWLLQKSGWSIINVTSFEWNKINKD-EKKKHIIK
PFE0800w             Pf gi23613182 1116-1175 IFGKEYLMRTL-(10)GIVTLQMRILHAHGWKIIPINAGEWLQLNFD-QKKNLLSE
PF10_0064            Pf gi23507868 805-862   NDVLYNPYKTF-(8)-SNVLVKINFLLIKGYNIIAIPFYTWRNMSYE-EKEKHVEL
PFL0605c             Pf gi23508818 644-704   IFLQFENQWKL-(10)NFEMNKMNHLKKEGFKPIFICHDTFLKCKEEQEKLEYIHS
PFE0905w             Pf gi23613203 1308-1368 IVIEIDGPNHF-(8)-SNTLFKKRLLRALGYTVISVPISDYTFMFSALDTMHFTKR
PFL1280w             Pf gi23508950 1281-1339 IVIEVDGISHF-(7)-TNSVIKDYILKKLGWNVIHIPYQEWNQCFTFKKKVLYAIE
PF14_0673            Pf gi23509895 433-491   IAIEVDGPSHF-(8)-TYTKLKHRILTKLGYNVIHISYIDWRKLRNKSEREEFILK
PF13_0292            Pf gi23619474 892-950   LIIEVNGEHHF-(7)-SLSKFKHKLLSDLGYVVINIPYFDWAILNTDFDKKAYIKK
PF11_0247            Pf gi23508438 635-692   YIIELNAHFQY-(8)-TLSKWRHKFLSQMGYKVIHISYRIWNNLHNDTQKMEYIYS
PFE1295c             Pf gi23613281 736-791   LINKTSSNQHT-(7)--NTKYKKWLLSNYPFHIIYIPYYKWNML-THKQKK-YLME
MAL6P1.211           Pf gi23612295 1542-1600 HLIDLIYEDVL-(8)-SSE-LRQKHLALKGWTVHSINFRDIYNSIKDKNIISYIFN
Raa3                 Cr gi20532239 1713-1772 LAVGAAAGGAV-(9)-GAGALRRRLLTHAGWLVVPVRERQWKDLRSAEQQRRVVRE
CG2124PA             Dm gi18858071 563-623   VVIVIAGWNNV-(8)-GQMDMKLRQLRQLGHQPLVIYWHEWRELENSADRQDFLKR

Consensus/80%                                hh..h.s...b.....s...b+b+bL..bGapll.l.bbpW.ph....pp..albp
2° Structure/PHD EEEEE HHHHHHHHHHH EEEE HHHHHHH

995368_1998761.053.1 Tg                      VALDLLSEGNY-(8)-GVARLKRRHMKILGWKYVDVRRKTWLKLRTREERCNALRD
995363.178.1         Tg                      VQF-LDDETKI-(10)TPHIIKARHLKQLGYHYLLVDCWQWRRLRSEAEQTVFLKQ
995283.053.1         Tg                      VW-QCNTADRF-(8)-TAVKLQERITQAMGLKVGNCEYWQWMKMKRKRTRLEYIRM
994720.098.1         Tg                      TVDMFHASQTV-(16)TALKRRHQLLLSMGFKLVLVPHQRWGSLQTEEKKLLFLLP
995364_0.074.1       Tg                      FRDPEGAEIEL-----TAWTLRHELLHAQGWCVVAIPHFEWTALPDRLTRLRYLQR
995368_3464634.010.1 Tg                      EEKLLSSTGGV-(14)PSDLLKHRHLRLAGYTLLVIRLSQWQALDTVTEKREYIVS
995366_2595624.041.1 Tg                      QCWFVDGPSDF-(7)-TANKLQHRILSELGWNIRRVRWNDWVQLGTDMDKVEYLS-
995373.113.1         Tg                      IAIEVDGPTHF-(7)-TATKLKHRLLTRMGYKVLHVPYFEWRRLRGQKEREEYMRR
995363.185.1         Tg                      LAVEIDGPSHF-(7)-VASRLKQRLLREMGWTVLPVSFFEWRQLVTPERKVAY---
995368_0.144.1       Tg                      LVVEVDGEAHF-(7)-TATRMKRELLAAMGWRVVVVPQELWRN-KRKGKIKEFVAR
995362_1355687.059.1 Tg                      GEISLDNAVPQ-(6)-DSTRLKHRILTGAGWHVLHIAWFDWPR--RPHNQQAALMQ
995362_0.025.1       Tg                      VLGEKDGLKTL-(10)GVNLLRLRALKKRGWQIVTVNVHEWSEAQTPDNRLSLLLD
995361_1370831.051.1 Tg                      ILLPE--KKTV-(16)TASRLQQRLLELQGYSVCVLPYYEWSELQNPRFLWTFGRR
995279_0.075.1       Tg                      TSQHH-RWRGL-(5)-TSSLFKQRLLERHGFRVVRIAYSDWMRLRDRPWLLKRLSP
995367.221.1         Tg                      RIAREEATKNL-(5)-PETKARLTHLKELGWRVLVVHYRDWTAINSPLSKAKFVRS
Tap787h09.q1kb.349   Ta                      MCIDILSKGSV-(8)-GSVNMTERHMNILGYKYIKIRKEEW---------------
Tap579b07.q1c.217c   Ta                      LVSFVDDERKL-(10)SNLSLKLRNLNALGYKASVVRYWEWRRLKTEKNELKYAFR
Tap579b07.q1c.64     Ta                      IAIEVDGPSHF-(7)-TATKLKHRLLTRMGYKVLHVPFFEWRRLRGAREREEYMRA
TapCf4.q1c.31c       Ta                      FFIEIDGPYHY-(8)-SSSKAKHYIIESYGTKLVHIPYFEWSKCVGDEEKIKYIES
Tap821d03.p1c.86c    Ta                      VAIEVNGYTHF-(8)-ALTQLKYKILKDMGWNVVGINYYNWKN-RNKQSRLDYIVK
Tap404f10.p1c.195    Ta                      LVWLCNSYHRF-(8)-ANSKLLTKLIKAFGYKVSVINYYQWGRMKCKRTRFAFLRM
TapCf4.q1c.54        Ta                      NRKLCFEYNYL-(11)GLVSFRRRLFNKFGYNTAVIHRHQWENLTI-LEKEEQLLQ
Tap404f10.p1c.206c   Ta                      LLDENCDGEGI-(11)GKTIYRNNILSKLGYKLVSIPWFVIHQYNND-NVYNYVKD
Secretory_protein    Ta gi46195393 201-270   NCWFIDGPSCF-(7)-TKVKLQHRILNNLGWNIRRVQWYKWVDYMNDEDKINYIRK

(b)

Species                                             Proteome size   RAP proteins
Mammals (H. sapiens, M. musculus, R. norvegicus)    ~40000          6
Flies (D. melanogaster, A. gambiae)                 ~14000          3
Plants (A. thaliana, O. sativa)                     ~27000          1
Nematodes (C. elegans, C. briggsae)                 ~19000          0
Yeasts (S. cerevisiae, S. pombe)                    ~7000           0
Plasmodia (P. falciparum, P. yoelii yoelii)         ~5800           ~11

Figure 1. Multiple alignment and species distribution of the RAP (for RNA-binding domain abundant in Apicomplexans) domain. (a) Multiple sequence alignment of RAP domains generated by CLUSTAL W [5] and annotated with CHROMA [17]. Domains were identified by PSI-BLAST [2], and the secondary structure predicted using PHD [6] is shown after the alignment (E, β strand; H, α helix). Sequences are denoted by protein, gene, complementary DNA or contig identifiers followed by species abbreviation, accession codes and residue limits. Residues are colored according to the 80% consensus sequence: orange against yellow, hydrophobic (ACLIVMHYFW); black against yellow, aliphatic (ILV); blue against yellow, aromatic (FYHW); white against green, large (EFHIKLMQRWY); charged, cyan (KRED); blue, polar (KRHEDNST); red, tiny or small (AGS). Invariant residues are colored against a gray background. Numbers in parentheses represent amino acids not shown between alignment blocks. Species abbreviations: At, Arabidopsis thaliana; Cr, Chlamydomonas reinhardtii; Dm, Drosophila melanogaster; Hs, Homo sapiens; Mm, Mus musculus; Os, Oryza sativa; Pf, Plasmodium falciparum 3D7; Pyy, Plasmodium yoelii yoelii; Ta, Theileria annulata; Tg, Toxoplasma gondii. The alignment includes six-frame translated T. annulata expressed sequence tags and T. gondii peptide data detected by HMMsearch. (b) Distribution of RAP proteins detected in completely sequenced genomes. Proteome sizes are approximate.
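The over-representation in plasmodia becomes clearer when the counts in Figure 1b are normalized by proteome size. The sketch below does this arithmetic in Python, as an illustration only: it assumes each listed proteome size applies to the whole group in that row, treats the approximate values ("~") as exact, and uses a hypothetical helper name, `rap_per_10k`.

```python
# (group, approximate proteome size, RAP proteins) as listed in Figure 1b.
# The "~" values in the figure are treated as exact for this illustration.
FIGURE_1B = [
    ("Mammals", 40000, 6),
    ("Flies", 14000, 3),
    ("Plants", 27000, 1),
    ("Nematodes", 19000, 0),
    ("Yeasts", 7000, 0),
    ("Plasmodia", 5800, 11),
]


def rap_per_10k(proteome_size, rap_count):
    """RAP proteins per 10,000 proteins in the proteome."""
    return 10000 * rap_count / proteome_size


densities = {name: rap_per_10k(size, count) for name, size, count in FIGURE_1B}

# Print groups from the highest to the lowest RAP density.
for name, value in sorted(densities.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {value:6.2f} RAP proteins per 10,000 proteins")
```

Under these assumptions plasmodia carry roughly 19 RAP proteins per 10,000 proteins, against about 2 for flies and 1.5 for mammals.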


[Figure 2 schematic; scale bar, 200 AA; panels (a) to (e). Proteins shown: KIAA1792, FAST, RIK5330408N05, MGC5297, Tbrg4, GH07286p, CG31643, RIK2810421I24, PF14_0509, PF13_0292, MAL6P1.211, PFE0905w, PFL1280w, PF14_0673, PFE0800w, PF11_0247, PFE1295c, PF10_0064, PFL0605c, Raa3, At2g31890, OSJNBa0032H19.2. Key: GIDA, ActA, Coiled Coil, PDX, C34, PT, CWC15, Transmembrane, POX G5, DNATop, FAST, FTZ, RAP.]

Figure 2. Domain architecture of RAP (for RNA-binding domain abundant in Apicomplexans) domain-containing proteins. Schematic representation of the RAP-domain (dark blue square) proteins determined through a local RPS-BLAST (http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi) search against position-specific scoring matrices downloaded from the NCBI conserved domain database resource [18] (version 1.66; ftp://ftp.ncbi.nih.gov/pub/mmdb/cdd/cdd.tar.gz) as well as online searches against the PFAM [19] and SMART [20] domain databases. Searches were repeated on BLASTP-identified orthologs of RAP proteins (E value < 10⁻¹², data not shown). For Homo sapiens MGC5297, an otherwise undetected RNA polymerase III complex component C34 (PFAM05158) was assigned based on a significant hit in a corresponding region of its putative mouse ortholog (gi26328905; BLASTP E value = 0.0). PT (PFAM03247) refers to a sequence in prothymosin α, a nuclear protein that is co-localized with cleavage bodies involved in 3′ cleavage and polyadenylation of RNA. Cwc15 (PFAM04889) refers to a component of a multi-protein complex involved in pre-mRNA splicing. FTZ (PFAM03867) refers to the LxxLL-motif-containing nuclear co-activator. POX G5 (PFAM04599) is a sequence in poxviruses. DNATop (SMART00435) refers to a DNA topoisomerase-like sequence present in poxvirus proteins. ActA (PFAM05058) refers to a domain in Listeria monocytogenes ActA involved in cell motility. GIDA (PFAM01134) refers to a domain resembling that of the bacterial glucose-inhibited division protein, GidA. PDX refers to a pyridoxamine 5′-phosphate oxidase-like sequence assigned by Rivier et al. [7]. (a) The Fas-activated serine/threonine kinase (FAST) family of cell-cycle regulators. Functional sites flanking the RAP domain were found to be essential for its interactions with BCL-XL [13], giving confidence about its boundaries and making it unlikely to be an extension of the FAST domain. (b–e) Plasmodium proteins grouped by expression profiles generated from high-throughput proteomic techniques [15,16]. (b) Putative maintenance proteins that are expressed at all developmental stages. (c) Post-infection proteins that are salient at trophozoite and gametocyte stages. (d) PFE1295c is part of a cluster of proteins expressed primarily in the trophozoite stage. (e) Plasmodium proteins that have not been categorized due to unavailability of DNA expression profile data and other RAP-domain proteins.


against six-frame translated ESTs from Theileria annulata and predicted peptide data from Toxoplasma gondii (from http://www.sanger.ac.uk/Projects/T_annulata and http://toxodb.org/restricted/data/Genome/pep, respectively) recovered at least 15 unique proteins in T. gondii and nine proteins in T. annulata, confirming over-representation of the domain in apicomplexans. The domain is, therefore, named RAP (an acronym for RNA-binding domain abundant in Apicomplexans) based on its inferred RNA-binding function.
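Searching ESTs with a protein profile requires translating each nucleotide sequence in all six reading frames first. The following is a minimal Python sketch of that step, assuming the standard genetic code and unambiguous DNA; the function names are hypothetical, and it is not the tool used to prepare the T. annulata data.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become X."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return translations for frames +1..+3 and -1..-3 of a DNA sequence."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# Toy sequence, not a real EST.
for frame, peptide in six_frame_translations("ATGGCCTGA").items():
    print(frame, peptide)
```

Stop codons are kept as `*` so that a downstream search can decide how to split or mask the translated frames.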

A multiple alignment of the RAP domain with CLUSTAL [5] used the PAM substitution matrix and gap opening and extension penalties of 0.0 and 5.0, respectively. The alignment produced was manually edited to remove inserts. The final alignment reveals a domain of ~60 residues, consisting of multiple blocks of charged and aromatic residues. PHD [6] predicts that the domain is composed of α-helical and β-strand structures (Figure 1). Two predicted loop regions that are dominated by glycine and tryptophan residues are found before and after the central β sheet.
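To make the idea of an 80% consensus concrete, the sketch below derives a consensus string from a toy alignment in Python. It is a simplified illustration and not the CHROMA algorithm: the residue classes are those listed in the Figure 1 legend, but the one-letter class symbols, the smallest-class-first rule and the function names are assumptions made for this example.

```python
from collections import Counter

# Residue classes from the Figure 1 legend. The class symbols are
# assumptions for this sketch and do not reproduce CHROMA's notation.
CLASSES = {
    "l": set("ILV"),         # aliphatic
    "s": set("AGS"),         # tiny or small
    "a": set("FYHW"),        # aromatic
    "c": set("KRED"),        # charged
    "p": set("KRHEDNST"),    # polar
    "h": set("ACLIVMHYFW"),  # hydrophobic
}


def column_consensus(column, threshold=0.8):
    """Return a residue, a class symbol, or '.' for one alignment column."""
    n = len(column)
    residue, count = Counter(column).most_common(1)[0]
    if residue != "-" and count / n >= threshold:
        return residue
    # Otherwise report the smallest class covering enough of the column.
    for symbol, members in sorted(CLASSES.items(), key=lambda kv: len(kv[1])):
        if sum(r in members for r in column) / n >= threshold:
            return symbol
    return "."


def consensus(alignment, threshold=0.8):
    """Column-wise consensus of equal-length aligned sequences."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(column_consensus(col, threshold) for col in zip(*alignment))


# Toy alignment, not taken from Figure 1.
toy = ["GWKV", "GWRI", "GYKV", "GWEL", "GFKA"]
print(consensus(toy))
```

In the toy alignment the first column is invariant, the second is fully aromatic, the third fully charged and the fourth 80% aliphatic, so each column resolves to either a residue or a class symbol.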

Biological significance of the RAP domain

The chloroplast gene psaA encodes a peptide of the photosystem I reaction center in the green alga Chlamydomonas reinhardtii. The group II introns of psaA require tscA, a small chloroplast RNA, for their splicing. C. reinhardtii Raa3 binds specifically to tscA RNA as part of a ribonucleoprotein (RNP) complex involved in trans-splicing of group II introns from disparate psaA transcripts [7]. No evidence was found for a role by other factors in the activities of the RNP complex or its stability.

The trans-splicing reaction is dependent on a C-terminal segment of Raa3 that includes a pyridoxamine 5′-phosphate oxidase (PDX) sequence and a RAP domain.


The PDX-like sequence does not correspond to the catalytic site of PDX and is not essential for trans-splicing [7]. In addition, PDX does not bind RNA, which suggests that the active site is the RAP domain. Comparison of Raa3 with other eukaryotic trans-splicing factors supports a putative RNA-binding function for the RAP domain. In Caenorhabditis elegans serine and arginine dipeptide-containing (SR) proteins, only the RNA-recognition motif, which binds splicing enhancers within the pre-mRNA substrate, is indispensable to the trans-splicing reaction [8]. Similarly, inclusion of exon N1 of the mouse c-src gene depends on binding of a complex of splicing regulators to the intronic splicing enhancer through the KH (hnRNP K-homology) RNA-binding domains of KH-type splicing regulatory protein (KSRP) [9].

In addition, Raa3 possesses sequence features in common with nematode and trypanosome SR proteins. These are stretches of SR dipeptides that are distributed throughout the length of the protein. Experimental characterization of these peptides in C. elegans SR [8], its protein homolog in Trypanosoma brucei, TSR1 [10], and its interacting partner, TSR1IP [11], demonstrates their roles in recognition of other splicing factors within the RNP complex. However, the SR peptides are dispensable to trans-splicing, as their truncation in SF2/ASF neither affects its capacity to bind its splicing enhancers nor its activation of splicing [12].

Another RAP protein, Fas-activated serine/threonine kinase (FAST), has been experimentally characterized as an interacting partner of TIA-1, a downstream effector of the double-stranded RNA-dependent protein kinase (PKR)-eukaryotic initiation factor 2 (eIF2) pathway [13]. The presence of viral RNAs triggers a stereotypical response leading to inhibition of protein synthesis at translation initiation, thus restricting viral protein translation and, ultimately, viral replication. Stalled translation initiation complexes and associated transcripts are routed to stress granules (SG), discrete cytoplasmic foci where untranslated mRNAs are subsequently targeted for translation or degradation. TIA-1 is instrumental in the assembly of SGs and the escort of untranslated mRNAs to them [14]. Both TIA-1 and PKR-mediated reactions are dependent on their RNA-recognition motifs. Therefore, the regulation of TIA-1 by FAST might be initiated by recognition of RNA by the RAP domain.

Thus, based on the correspondence between Raa3 and eukaryotic trans-splicing factors as well as the relationship between FAST and the PKR/TIA-1 pathway, an RNA-binding function is predicted for the RAP domain.

An interesting feature of apicomplexans is that they appear to possess an unparalleled number of RAP proteins. Functions for these proteins in plasmodia and their potential as drug targets are suggested by their stage-specific expression. Their expression profiles are clustered into at least three groups, namely proteins expressed at all developmental stages (PF14_0509, PF13_0292, MAL6P1.211, PFE0905w), proteins expressed at trophozoite and gametocyte stages (PFL1280w, PF14_0673, PFE0800w, PF11_0247) and proteins expressed only at the trophozoite stage (PFE1295c) [15,16]. The first group is likely to be important to the maintenance of parasite functions and be crucial for its survival, the second in proliferation within host cells and the last in proliferation specifically in erythrocytes (Figure 2).

Thus, RAP is a putative RNA-binding domain of particular importance to apicomplexans in its mediation of various parasite-host cell interactions. Its inclusion in domain databases should aid the annotation of apicomplexan genomes currently being sequenced.

Acknowledgements

We acknowledge equipment support from the Bioinformatics Institute in the form of a powerful Linux workstation on which various bioinformatics tools were developed and run.

References

1 Yeats, C. and Bateman, A. (2003) The Bon domain: a putative membrane-binding domain. Trends Biochem. Sci. 28, 352–355

2 Altschul, S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402

3 Neuwald, A.F. et al. (1997) Extracting protein alignment models from the sequence database. Nucleic Acids Res. 25, 1665–1673

4 Eddy, S.R. (1998) Profile hidden Markov models. Bioinformatics 14, 755–763

5 Thompson, J.D. et al. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680

6 Rost, B. (1996) PHD: predicting one-dimensional protein structure by profile based neural networks. Methods Enzymol. 266, 525–539

7 Rivier, C. et al. (2001) Identification of an RNA–protein complex in chloroplast group II intron trans-splicing in Chlamydomonas reinhardtii. EMBO J. 20, 1765–1773

8 Furuyama, S. and Bruzik, J.P. (2002) Multiple roles for SR proteins in trans-splicing. Mol. Cell. Biol. 22, 5337–5346

9 Min, H. et al. (1997) A new regulatory protein, KSRP, mediates exon inclusion through an intronic splicing enhancer. Genes Dev. 11, 1023–1036

10 Ismaili, N. et al. (1999) Characterization of a SR protein from Trypanosoma brucei with homology to RNA-binding cis-splicing proteins. Mol. Biochem. Parasitol. 102, 103–115

11 Ismaili, N. et al. (2000) Characterization of a Trypanosoma brucei SR-domain containing protein bearing homology to cis-spliceosomal U1 70kDa proteins. Mol. Biochem. Parasitol. 106, 109–120

12 Tange, T.O. and Kjems, J. (2001) SF2/ASF binds to a splicing enhancer in the third HIV-1 Tat exon and stimulates U2AF binding independently of the RS domain. J. Mol. Biol. 312, 649–662

13 Li, W. et al. (2004) FAST is a BCL-XL-associated mitochondrial protein. Biochem. Biophys. Res. Commun. 318, 95–102

14 Anderson, P. and Kedersha, N. (2002) Stressful initiations. J. Cell Sci. 115, 3227–3234

15 Le Roch, K.G. et al. (2003) Discovery of gene function by expression profiling of the malaria parasite life cycle. Science 301, 1503–1508

16 Florens, L. et al. (2002) A proteomic view of the Plasmodium falciparum life cycle. Nature 419, 520–526

17 Goodstadt, L. and Ponting, C.P. (2001) CHROMA: consensus-based coloring of multiple alignments for publication. Bioinformatics 17, 845–846

18 Marchler-Bauer, A. et al. (2003) CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Res. 31, 383–387

19 Bateman, A. et al. (2000) The PFAM protein families database. Nucleic Acids Res. 28, 263–266

20 Schultz, J. et al. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl. Acad. Sci. U. S. A. 95, 5857–5864

0968-0004/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.

doi:10.1016/j.tibs.2004.09.005