1,2 , Courtney W. Stairs , Daniel J. G. Lahr , Robert E. · 29.04.2020 · 58 Opisthokonta,...
Transcript of 1,2 , Courtney W. Stairs , Daniel J. G. Lahr , Robert E. · 29.04.2020 · 58 Opisthokonta,...
1
The integrin-mediated adhesome complex, essential to multicellularity, is present in the most 1 recent common ancestor of animals, fungi, and amoebae. 2 3 Seungho Kang1,2, Alexander K. Tice1,2, Courtney W. Stairs3, Daniel J. G. Lahr4, Robert E. 4 Jones1,2, Matthew W. Brown1,2* 5 6 1. Department of Biological Sciences, Mississippi State University USA 7 2. Institute for Genomics, Biocomputing & Biotechnology, Mississippi State University, USA 8 3. Department of Cell and Molecular Biology, Uppsala University, Sweden. 9 4. Department of Zoology, University of São Paulo, Brazil 10 *Correspondence: Matthew W. Brown, [email protected] 11 12 Keywords: Integrin, Amoebozoa, Protein Domain architecture, amoeba, reductive evolution 13
14
Abstract: Integrins are transmembrane receptor proteins that activate signal transduction 15
pathways upon extracellular matrix binding. The Integrin Mediated Adhesion Complex (IMAC), 16
mediates various cell physiological process. The IMAC was thought to be an animal specific 17
machinery until over the last decade these complexes were discovered in Obazoa, the group 18
containing animals, fungi, and several microbial eukaryote lineages. Amoebozoa is the eukaryotic 19
supergroup sister to Obazoa. Even though Amoebozoa represents the closest outgroup to Obazoa, 20
little genomic-level data and attention to gene inventories has been given to the supergroup. To 21
examine the evolutionary history of the IMAC, we examine gene inventories of deeply sampled 22
set of 100+ Amoebozoa taxa, including new data from several taxa. From these robust data 23
sampled from the entire breadth of known amoebozoan clades, we show the presence of an 24
ancestral complex of integrin adhesion proteins that predate the evolution of the Amoebozoa. Our 25
results highlight that many of these proteins appear to have evolved earlier in eukaryote evolution 26
than previously thought. Co-option of an ancient protein complex was key to the emergence of 27
animal type multicellularity. The role of the IMAC in a unicellular context is unknown but must 28
also play a critical role for at least some unicellular organisms. 29
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
2
30
INTRODUCTION 31
Integrins are transmembrane signaling and adhesion heterodimers, made up of a-integrin (ITGA) 32
and b-integrin (ITGB) protein subunits [1]. They are important signal transduction molecules 33
across the cell membrane and associate with a set of intracellular proteins that act on the actin 34
cytoskeleton, together known as the integrin mediated adhesion complex (IMAC) (figure 1A) [2]. 35
The intracellular component of the IMAC includes adhesome proteins that are 1) integrin bound 36
proteins that can bind to actin directly (talin, α-actinin, and filamin); 2) integrin bound proteins 37
that can bind indirectly to the cytoskeleton (integrin linked kinase (ILK), particularly interesting 38
new cysteine-histidine rich protein (PINCH), and Paxillin); and 3) non-integrin binding proteins 39
such as vinculins. 40
41
The IMAC is best known for its critical role in cellular aspects of behavior including development, 42
migration, proliferation, survival and metabolism [1,3,4]. They function by relaying messages 43
through outside/in and inside/out between the extracellular matrix and the internal cytoskeleton[5]. 44
45
Since integrin proteins have been demonstrated to be absent in other complex multicellular 46
lineages of eukaryotes (i.e., plants and fungi) as well as in the choanoflagellates, which are the 47
closest protistan relatives of Metazoa, integrins were initially believed to be an animal specific 48
evolutionary innovation. However, over the last decade, investigations into the genomic content 49
of other close protistan relatives of animals have led to the discovery of integrins and other IMAC 50
components outside of Metazoa. Integrins and complete components of the IMAC have been 51
reported outside Metazoa in unicellular opisthokonts, such as Capsaspora owczarzaki[6], 52
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
3
Pigoraptor spp.[7], Syssomonas multiformis[7], Corallochytrium limacisporum[8], Creolimax 53
fragrantissima[9], Sphaeroforma arctica[10], as well in the apusomonad Thecamonas trahens [6], 54
and breviate Pygsuia biforma[11]. Thus, the IMAC complex and other associated proteins have a 55
much more ancient evolutionary origin than previously expected, well outside of the animals, but 56
IMAC remains to be only reported the in lineage called Obazoa, which is composed of 57
Opisthokonta, Breviatea, and Apusomonadida [11]. However, are these complexes exclusive to 58
Obazoa, or have they appeared even deeper in evolutionary history? 59
60
Amoebozoa is the eukaryotic supergroup that is sister to Obazoa, altogether grouped in the 61
Amorphea[12]. Despite the relatively close evolutionary proximity of Amoebozoa to the animals 62
containing lineage, genomic-level data and relevant research into the supergroup remains sparse. 63
The predicted or inferred genomic complement present has yet to be undetermined in the last 64
common ancestor of Amoebozoa. Recent works revealed that in Amoebozoa (Dictyostelium 65
discoideum and Acanthamoeba castellanii) crucial components of the IMAC were missing [13], 66
namely ITGA and ITGB. Was this result simply due to a lack of data or is it an evolutionary trend? 67
Here, we examine the gene repertories across the Amoebozoa supergroup to investigate the 68
evolutionary history of the adhesome associated with the integrin proteins. The absence of the 69
integrin proteins reported previously may simply be due to the great paucity of taxonomically 70
broad genomic data from this clade, which is a limiting factor in our knowledge of IMAC evolution. 71
72
To test if the lack of IMAC is truly absence or rather unobserved due to a lack of data from a broad 73
taxonomic sampling, we took a comparative genomic/transcriptomic approach utilizing data 74
recently used to resolve the tree of Amoebozoa [14]. Here, we have found a number of predicted 75
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
4
IMAC proteins across a wide breadth of Amoebozoa and in some unexamined obazoans 76
(Ministeria vibrans (Opisthokonta [12]) and Lenisia limosa (Breviatea[15])), including integrins, 77
both ITGA and ITGB (figure 1B). Across the breadth of these taxa, we have identified a nearly 78
complete IMAC. Here we demonstrate that IMAC is not exclusive to Obazoa. Rather it was in the 79
last common ancestor of Amorphea (i.e., Obazoa + Amoebozoa), which from our interpretations 80
of the gene distributions probably had a full set of these components (figure 1C). 81
82
BACKGROUND 83
Molecular details on Integrin proteins 84
The overall architecture of both integrin proteins includes a large extracellular N-terminal ligand-85
binding head domain and C-terminal stalk (also termed “legs”). Integrin activation is dependent 86
on the binding of a ligand (such as extracellular matrix proteins (collagen, fibronectin, and laminin) 87
[5]and the binding of divalent cations (primarily Ca2+, Mg2+, or Mn2+) on unique but well-88
conserved cation binding motifs[16–19]. In animals, there are universally five cation binding sites 89
on the ITGA and three on the ITGB paralogs[16,17,19,20]. Intracellular integrin engagement with 90
cytoskeletal proteins in IMAC can regulate cell-signaling pathways that control differentiation, 91
survival, and motility making the integrins one of the most functionally diverse family of cell 92
adhesion molecules [5,21]. 93
94
Integrin Alpha (ITGA) 95
Metazoa ITGA 96
In metazoan ITGA, which we refer herein to as “canonical ITGA proteins”, there is a β-propeller 97
made of seven blades (IPR013519)[18], an integrin leg domain (IPR013649)[18], a 98
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
5
transmembrane domain, and a C-terminal cytoplasmic tail amino acid motif (KXGFFXR - 99
IPR018184) [22](numbers are INTERPRO ids derived from INTERPROSCAN version 100
5.27.66 (IPRSCAN)[23]). These proteins are typically 700-800 amino acids in animals. The β-101
propeller blade motif is composed of an FG-GAP domain (IPR013517), and in the final 3-4 blades 102
there is a cation binding site motif between an “FG” and the “GAP” motifs. Of note, the consensus 103
pattern of the FG-GAP domain is highly variable[20], and not necessarily those amino acids (i.e., 104
FG and GAP)[20]. The cation binding blades of the β-propeller influences extracellular ligand 105
affinity and ligand specificity[24,25]. The cation motifs follow the consensus pattern of D/E-h-106
D/N-x-D/N-G-h-x-D/E (h=hydrophobic residue; x=any residue)[16]. In metazoan ITGA, there are 107
three ITGA leg domains (IPR013649) located next to the ITGA head. These ITGA leg 108
(IPR013649) domains are structurally and functionally similar to immunoglobulin (Ig) fold-like 109
beta sandwich[18]. 110
111
The classification of an authentic ITGA protein outside Metazoa would be a membrane docked 112
protein with an ITGA head composed of a β-propeller made of several β-propeller blades 113
(IPR013519) (figure 2), some with FG-GAP domains (IPR013517) only, and some with a cation 114
motif flanked by FG and GAP motifs (figure 4). However, we do not discard the possibility of 115
truncated ITGA because they occur in metazoans [26,27] and in the opisthokont Creolimax 116
fragrantissima [9]. 117
118
119
Integrin Beta (ITGB) 120
Metazoan ITGB 121
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
6
In metazoan ITGB, which we refer herein to as “canonical ITGB proteins”, there are seven 122
domains; a plexin-semaphorin-integrin (PSI) (IPR002165), a hybrid domain (described below), 123
three to four EGF modules (IPR013111), an integrin b tail subunit (IPR012896), transmembrane, 124
and signal peptide domains[18]. The metazoan hybrid domain is composed of a unique von 125
Willebrand factor A (known as ITGB “inserted” domain (or b-I) (IPR002369) and a Ig-like domain 126
(IPR013783) upstream of the b-I domain [18] (figure 3). EGF domains make up a cysteine rich 127
stalk motif (CRSM) that corresponds to a b-subunit stalk region[18]. The consensus pattern of the 128
CRSM in Opisthokonta is CXCXXCXC[6,28], in Metazoa there are typically 3-4 CRSMs. The 129
metazoan transmembrane domain consists of GXXXG membrane motif, important for 130
oligomerization of integrin proteins[29]. 131
132
In Metazoa, the b-I-domain is made up of three metal ion binding sites (comprising a ca. 200 133
amino-acid region in total)[18]. These are, a metal ion dependent adhesion site (MIDAS), site 134
adjacent to MIDAS (called ADMIDAS), and a synergistic metal ion binding site (SyMBS)[16]. 135
These metal ion binding sites are key regulators for all integrin-ligand interactions and require the 136
divalent cation ions such as manganese, calcium, and magnesium ions[16]. Three metal binding 137
sites have a consensus motif of DXSYSX95EX30D (at site 150 in Human ITGB1 (NCBI:P05556.2) 138
(MIDAS), SXXDDX121DX83A (at site 154 in Human ITGB1 (NCBI:P05556.2) (ADMIDAS), and 139
D/EX55NDXPXE (at site 189 in Human ITGB1 (NCBI:P05556.2) (SyMBS)[17]. The MIDAS and 140
SyMBS (the positive regulatory site for integrin activation[16]) motifs are essential for integrin-141
ligand binding[16,18], but ADMIDAS (the negative regulatory [30] site for integrin inhibition) is 142
not absolutely necessary for integrin activation[30]. The ITGB tail subunit (IPR012896) consists 143
of two conserved motifs responsible for activating the integrin complex: the cytosolic tail motif 144
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
7
(NPXY/F - to scaffold cytoskeletal proteins) located after the transmembrane region, and the 145
ITGA interacting motif (LLXXXHDRKE – a highly electrically charged amino acid motif) 146
downstream (C-terminus) of the NPXY/F motif, which interacts with scaffolding and signaling 147
proteins[31]. 148
149
The classification of an authentic ITGB protein outside Metazoa would be a membrane docked 150
protein with an ITGB head composed of a von Willebrand factor A (IPR002369) with three metal 151
ion binding sites, and cysteine rich regions at downstream (N-terminal) (figure 3). Again, we do 152
not discard the possibility of truncated ITGB since these are present in both metazoans [26,27] and 153
Creolimax fragrantissima[9]. 154
155
RESULTS 156
Integrin Alpha 157
Amoebozoan ITGA: General observations on ITGA 158
Our results show ITGA homologs are found in 23 out of 113 amoebozoan transcriptomes 159
examined (figure 2). Of those 23 amoebozoans, eight have more than one ITGA paralog (figure 160
2). Amoebozoan ITGAs vary greatly in size ranging from 523-1200 amino acids in length 161
(supplemental table 1). Many of the proteins we identified are predicted to have a diverse number 162
of β-propellers from a few (3-5 b-propeller blades), to canonical (6-7 b-propeller blades), to many 163
(more than 8 b-propeller blades) and cation binding sites (1 to 5 sites). However, all amoebozoan 164
ITGA proteins have at least three β-propeller blade (IPR013517) domains (figure 2), with an FG-165
GAP domain and FG-GAP/cation binding motif (figure 3A). The positions of the cation binding 166
motif/FG-GAP domains are seemingly distributed randomly (supplemental table 1), and not 167
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
8
necessarily in the last 3-4 blades as in canonical animal ITGA. We noticed the presence of a C-168
terminal transmembrane anchor (IPR021157) domain in three amoebozoan ITGAs. Animal-like 169
KXGFFXKR cytoplasmic tail motif is absent in all but one amoebozoan (figure 3E; supplemental 170
table 1). 171
172
We find amoebozoan ITGAs are present in all three major clade, but interestingly these proteins 173
are not in eight amoebozoan taxa that have genome sequences, (Dictyostelium spp., Acytostelium 174
spp., Heterostelium spp., Speleostelium caveatum, Physarum polycephalum, Entamoeba spp., 175
Acanthamoeba spp., and Protostelium fungiovrum[32]). However very importantly, we were able 176
to confidently find ITGA in Mastigamoeba balamuthi (figure 2), which has a complete genome 177
available [33]. To confirm the presence of these genes in the genome, we used a BLAST-based 178
homology search to identify 2 ITGA genes of M. balamuthi to the genome data (CBKX00000000) 179
on NCBI (see Methods) [86]. For ITGA, there are no introns in either paralog 1 180
(CBKX010025823.1) or paralog 2 (CBKX010005624.1). 181
182
We classified ITGA into two representative types based on their predicted IPRSCAN domains, 183
cation motifs, their similarity to metazoans ITGA, and novel domains that are previously 184
unobserved in ITGA proteins. 185
186
Amoebozoa ITGA: Type I ITGA: similar architecture to Metazoa 187
For communication purposes we have classified the amoebozoan ITGA homologs as ‘Type I’ and 188
‘Type II’ based on their similar or divergent domain complements compared to metazoan ITGA 189
sequences, respectively (figure 2). From the 23 amoebozoan taxa in which we identified an ITGA, 190
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
9
19 taxa have Type I (figure 2). One amoebozoan (Amphizonella sp. 9 paralog 2) has a Type I ITGA 191
with a domain architecture and cation motifs that are virtually identical to head region of metazoan 192
ITGA (figure 2, supplemental table 1). The rest of the amoebozoan Type I ITGAs have canonical 193
beta propellers, but they have varying numbers of cation motifs that differs from the canonical 194
ITGA proteins (supplemental table 1). Most of the cation motifs of all but four amoebozoan Type 195
I ITGAs follow the consensus metazoan amino acid pattern (figure 3A, supplemental table 1). 196
thirteen of the amoebozoan Type I ITGAs have both a predicted signal peptide and transmembrane 197
region (figure 2). The rest of the amoebozoans Type I ITGAs have only a predicted signal peptide 198
region or transmembrane region. Amoebozoa ITGA without transmembrane region may be the 199
result of endogenous soluble integrin. In a few cases we did not observe the signal peptide or the 200
transmembrane region. This may be due to truncation as we are predicting proteins from a 201
transcriptome or the product of shedding [34]. However, in the genome of Mastigamoeba 202
balamuthi all ITGA paralogs lack signal peptides and one lacks a transmembrane region (paralog 203
1). 204
205
Out of 19 Type I amoebozoan ITGA homologs, four ITGAs have an extracellular Ig-fold-like 206
(IPR013783) or cadherin-like (IPR015919) domain next to the ITGA head (figure 2). A cadherin 207
like domain (IPR015919) and an Ig-fold-like domain (IPR013783) are part of an immunoglobulin 208
fold like beta sandwich protein class, which fold in a similar structure[35]. The three amoebozoan 209
ITGAs that contain either an Ig-fold-like or a cadherin-like domain also have canonical b-210
propellers (6-7 blades) (figure 2), although two of the four Type I ITGA have an extra cation motif 211
(supplemental table 1). 212
213
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
10
Amoebozoan ITGA is a Type II ITGA with novel domains 214
We designate any ITGA homolog with a domain flanking the ITGA head that has not previously 215
been reported in an ITGA protein as a “Type II” ITGA (figure 2). From the 23 amoebozoan taxa 216
in which we identified an ITGA, 12 taxa have Type II (figure 2). Of the 23 Type II amoebozoan 217
ITGAs we discovered, two have a protein kinase domain (IPR000719) (figure 2), five 218
amoebozoans have a proline rich extension motif called “extensins” (PR01217) (numbers are 219
SPRINTS ids derived from PRINTS version 42.0 [36] and INTERPROSCAN version 5.27.66 220
[23](supplemental methods)) (figure 3F; supplemental table 1). 221
222
In one Type II ITGA, one amoebozoan sequence (Flabellula citata paralog 1) uniquely has an 223
epidermal growth factor (EGF)-like domain (IPR000742) as its only novel domain located 224
downstream (C-terminal) to the ITGA head (figure 2). Additionally, three amoebozoans uniquely 225
have two epidermal growth factor (EGF) domains (figure 3B), a Thrombospondin repeat-1 (TSP1) 226
(IPR000884) (figure 3C), and a carbohydrate binding domain called a wall stress component 227
(WSC) (IPR002889) (figure 3D) all with cysteine rich motifs located upstream (N-terminal) to the 228
ITGA head (figure 2). We were unable to detect any other protein in the nr database with 229
significant similarity to the N-terminal domain composition of these sequences (EGF-TSP1-WSC) 230
using an e-value cut-off < 1e-10. This suggests that this EGF-TSP1-WSC architecture may be 231
unique to these three amoebozoans. 232
233
Seven of the amoebozoan ITGAs we designated as Type II possess canonical (6-7 blades) b-234
propellers (figure 2) and a different number of cation motifs (supplemental table 1). The other five 235
amoebozoans Type II ITGAs have short (3-5 blades) b-propellers, and a wide range of number of 236
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
11
cation motifs (supplemental table 1). Four amoebozoan ITGA homologs of this type contained 237
both a signal peptide and a transmembrane domain (figure 2). Eight others had either a signal 238
peptide or a transmembrane region. 239
240
Obazoan ITGA: General observations on ITGA 241
We also found ITGA homologs in several underexplored obazoan transcriptomes: the opisthokont 242
Ministeria vibrans and the breviate Lenisia limosa (figure 2). Obazoan ITGA proteins have at least 243
three β-propeller blade (IPR013517) domains, with an FG-GAP domain (figure 2, supplemental 244
figure 5A) and FG-GAP/cation binding motif domains are seemingly distributed randomly 245
(supplemental table 1). Animal-like KXGFFXKR cytoplasmic tail motifs were observed in most 246
obazoan transcriptomes that were examined in this study (figure 4E, supplemental figure 5C). 247
Among the non-animal obazoans, the seven novel obazoans ITGA have an Ig-fold-like domain leg 248
next to ITGA head (figure 2; supplemental figure 5D). 249
250
Ministeria vibrans has both Type I and Type II paralogs (figure 2). In Type II, paralog 1 and 2 251
have a predicted long (10) ITGA b-propeller, but they do not contain transmembrane domains. 252
The other paralog has a few (3) b-propeller blades (one with FG-GAP/cation binding motif), a 253
cytosolic tail motif (figure 3E), and a transmembrane domain (figure 2). In both paralog 1 and 3, 254
a signal peptide region was not predicted. Ministeria vibrans Type II ITGA uniquely has an 255
epidermal growth factor (EGF) like domain (IPR000742) attached upstream (N-terminal) to the 256
ITGA head (figure 2). All obazoans in our study have a signal peptide or a transmembrane region 257
except for Pigoraptor spp., Sphaeroforma arctica, and Syssomonas multiformis (figure 2), which 258
lack both. 259
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
12
260
Phylogenetic analysis of ITGA 261
The unrooted tree of ITGA recovered all of Amoebozoa ITGA as a clade (figure 2), which is 262
generally sister to most of Obazoa. However, a few opisthokont sequences are nested in this clade. 263
These include a paralog of Capsaspora owczarzaki (ITGA4: XP_004347912), both M. vibrans 264
paralogs, Breviates and the lone Syssomonas multiformis ITGA. The major lineages or sublineages 265
of Amoebozoa were not recovered. Instead amoebozoan ITGA are divided into two clades. One 266
clade consists primarily of tubulinid amoebozoans (+ Ministeria) and the other contains the rest of 267
Amoebozoa. 268
It should be noted that in addition to these taxa, the genome sequence of the very 269
evolutionary distant Stramenopile brown alga (Ectocarpus siliculosus) revealed the presence of 270
three ITGA homologs[37] (supplemental figure 8). However, no other available Stramenopile 271
genome data or transcriptomic data searched thus far (for a list of taxa searched, please see 272
supplemental text) contains integrin homologs in our examinations. Therefore, we inferred an 273
additional ITGA tree with this species (supplemental figure 3). Through BlastP to NCBI-NR we 274
identified several predicted beta propeller proteins in Archaea that have a similar architecture to 275
the canonical ITGA. An unrooted tree of beta propeller homologs recovered Archaea as a 276
paraphyletic across our phylogeny (supplemental figure 8), this is further discussed below. 277
We observed more FG-GAP proteins in other taxa such as cyanobacteria (e.g. Gloeobacter 278
violaceus; WP_011143221.1, Nostoc punctiforme; ACC84844.1 and Crocosphaera watsonii; 279
WP_007307935.1), Haptophyta (Emiliania huxleyi; XP_005778208.1), Crytophyta (Guillardia 280
theta; XP_005842481.1), and Stramenopiles (e.g. Nannochloropsis spp. CCMP1776: TFJ84058.1, 281
EWM21519 and Cafeteria roenbergensis; KAA0148838.1) on NCBI. These FG-GAP proteins 282
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
13
either do not possess transmembrane and signal peptide regions or possess 7 transmembrane or 283
more. Thus, in our estimation these FG-GAP proteins fail to pass our minimum criteria of being 284
an ITGA. 285
286
Integrin Beta 287
Amoebozoan ITGB: General observations on non-metazoan ITGB 288
We find that ITGB homologs are found in 19 out of 113 amoebozoan transcriptomes examined 289
with three ITGB paralogs in Rhizamoeba saxonica and two ITGB paralogs in Idionectes vortex 290
(figure 4). We find amoebozoan ITGBs are present in all three major clades of Amoebozoa. 291
However like in ITGA, these proteins are not in four amoebozoans groups that have genome 292
sequences, (the dictyostelids, entamoebids, acanthamoebids, and Protostelium fungiovrum[32]). 293
We were able to find ITGB gene in the genome of the amoebozoan Mastigamoeba balamuthi 294
ATCC 30984 (figure 6). For ITGB, there were five introns in the genomic scaffold 295
CBKX010020319.1 and eight introns in CBKX010020318.1. These introns are canonical M. 296
balamuthi introns and the other genes on the scaffold are mastigamoebid/amoebozoan in nature. 297
We detected a tenascin ORF on ITGB genomic scaffold of CBKX010020318.1 and PA14 ORF on 298
CBKX010020318.1. We inferred a phylogenetic tree for tenascin and PA14 by searching for 299
homologs in our extensive datasets, both of which clearly show amoebozoan signal (shown as a 300
subset in figure 6). The presence of introns within the genomic scaffold for ITGB clearly 301
demonstrates that this protein is indeed part of the genome. Additionally, the phylogenetic signal 302
of other ORFs on these genomic scaffolds show amoebozoan signal and are not from 303
contamination in the genome project. 304
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
14
305
The size range of ITGB paralogs in Amoebozoa vary widely from 595 to 3296 amino acids 306
(supplemental table 2). In addition to the novel ITGB proteins in Amoebozoa, we also found a 307
previously unreported ITGB homologs in two obazoan transcriptomes, as well in CRuMs, which 308
is a recently described clade closely related to Amorphea[38]. Based on domain and motif 309
architecture from IPRSCAN outputs, we classify non-metazoan ITGBs into two representatives; 310
ITGB that are similar to a canonical ITGB, and unique domain architecture proteins that are ITGB-311
like. 312
313
Type I ITGB: similar architecture to Metazoa 314
Our first representative type of ITGB has a similar domain architecture to metazoan ITGB (figure 315
4). From the 19 amoebozoan taxa in which we identified an ITGB, 14 taxa have Type I (figure 4). 316
Most amoebozoans in this class appear to have the vWA domain, b-subunit stalk region and the 317
PSI domains except two amoebozoans that lacks PSI domains and one amoebozoans which possess 318
the duplicated vWA domain (figure 4). Amoebozoa have the CRSM pattern in the b-subunit stalk 319
region of DXCGXCGGX(3-6)CXGC (figure 5F) ranging from three to seven times (Supplemental 320
table 2). In of b-tail motifs (IPR012896), the cytosolic tail motif (NPXY/F) is present across 321
Amoebozoa (figure 5G) except one amoebozoan (supplemental table 2) whereas an ITGA 322
interacting motif (LLXXXHDRKE) is absent in Amoebozoa. Twelve amoebozoans in this class 323
possess amino acids with a predicted electrostatic charge where the ITGA interacting motif 324
(LLXXXHDRKE) would be present in metazoan ITGB (figure 4, figure 5H). MIDAS are well 325
conserved especially in amoebozoan b-I domains (figure 5A, B) and SyMBS are well conserved 326
across the amoebozoans (figure 5C, D). However, ADMIDAS motifs are quite variable in 327
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
15
Amoebozoa compared to the metazoan consensus pattern (figure 5B, Supplemental table 2). A 328
signal peptide and a transmembrane domain were predicted in most amoebozoan ITGB in this 329
class and three amoebozoans lacking signal peptides (figure 4). 330
331
The ITGB proteins of non-metazoan obazoans have domain/motif architectures very similar to a 332
canonical metazoan ITGB (figure 4, supplemental 6). At the B-tail motifs (IPR012896) of Obazoa, 333
the cytosolic tail motif (NPXY/F) is present in all obazoans ITGB homologs (supplemental figure 334
6G), but we did not observe the ITGA interacting motif (LLXXXHDRKE) in ITGB of Breviatea. 335
We found a metazoan GXXXG motif in three amoebozoan ITGB (supplemental table 2), 336
suggesting that this motif is not animal specific. There are multiple tandem repeats of the 337
opisthokont CRSM pattern in Obazoa ITGB and both breviate taxa have a clear expansion of b-338
subunit stalk region as previously reported [6–11] (figure 3). All of the obazoan ITGB MIDAS, 339
ADMIDAS, SyMBS motifs were well-conserved (supplemental figure 6A-D). We also 340
investigated a possible “choanoflagellate ITGB gene” that was recently reported to be present in 341
the transcriptome of Didymoeca costata[39]. The protein domain architecture has a serine protease 342
flanking vWA, PSI domain. However, it is missing the CRSM, NPXY/F (IPR021157), 343
LLXXXHDRKE (IPR014836) and GXXXG motifs. To our surprise Rigifila ramosa (in CRuMs), 344
has a protein that contains all the canonical domains of ITGB such as vWA, amoebozoan CRSM 345
pattern and c-terminal motifs (figure 3) and all typical cation motifs (supplemental table 2). We 346
were unable to find an ITGA homolog in either Rigifila ramosa or Didymoeca costata. 347
348
Type two ITGB: unique architecture compared to Metazoa and solely found in the amoebozoan 349
clade Variosea 350
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
16
Our second representative class of ITGB, which we refer to as Type II, are large proteins (1405 aa 351
to 3641 aa) that have novel domain architectures unobserved before in ITGB, which we refer to as 352
Type II. Of the total 19 amoebozoan taxa that have ITGB five amoebozoans have an ITGB 353
assigned to our Type II class (figure 4). Type II amoebozoan ITGB domain architectures include 354
Laminin G3 (LamG3) (IPR013320), Von Willebrand Type D (vWD) (IPR001846), Ig-like fold 355
(IPR013783), EGF-like (IPR013032) and a cyclic nucleotide binding (IPR000595) domain (figure 356
4. A canonical human ITGB has four EGF domains [16,18] that coincide with the location of Ig 357
regions in amoebozoan ITGB. Three amoebozoan ITGB in this class have an extracellular EGF–358
like domain at the N-terminus, at the location of the PSI domain (figure 3). LamG3 (CSM) and 359
EGF-like domain are recovered in two amoebozoan ITGB (figure 3). The cation binding motifs 360
MIDAS, ADMIDAS motifs are hypervariable in Type II representatives compared to the metazoan 361
consensus pattern, but SyMBS is well conserved except in Filamoeba nolandi (supplemental table 362
2). 363
364
365
Phylogenetic analysis of ITGB 366
We selected Homo sapiens, Gallus gallus, and Nematostella vectensis as a selective reference 367
group because integrin proteins are highly conserved among Metazoa. The tree of ITGB reveals a 368
three major clade when we rooted on Opisthokonta as an outgroup (figure 3). We observed three 369
clades of type II amoebozoan ITGB, breviates with apusomonads ITGB, and type I amoebozoan 370
with the CRuMs taxon Rigifila ramosa ITGB. Therefore, we do not recover Amoebozoa as a 371
monophyletic group. 372
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
17
Additionally, we investigated the evolutionary history of Sib (Similar to Integrin Beta) proteins. 373
Sib proteins is a cell adhesion molecule that are similar to ITGB, originally discovered in 374
Dictyostelium discoideum [40]. However, they do not appear to have a canonical ITGB 375
(Supplemental result). Additionally, an ITGB homolog was shown in cyanobacteria, but it 376
contained only one of the ITGB domains[6]. Therefore, we inferred an additional ITGB tree with 377
these SIBA proteins from these species (Didymoeca costata, Dictyostelium discoideum and 378
cyanobacteria) (supplemental figure 2). In this tree, a clade of Type II ITGB is placed between 379
the cyanobacteria Trichodesmium and the choanoflagellate Didymoeca. However, these apparent 380
ITGB homologs do not have obvious ITGB-stalk region or NPXY motifs. While they are 381
recovered well with other ITGB, given the important missing motifs and domains the placement 382
of these in a phylogenetic context alone does not clearly define these as ITGB. 383
384
Scaffold Cytoplasmic proteins 385
In our comparative transcriptome and genome analysis, we observed the following intracellular 386
scaffold cytoskeletal adhesomes proteins: talin, a-actinin, vinculin, PINCH, paxillin, and ILK 387
throughout Amoebozoa. These six adhesome proteins were have been reported to be present in 388
Amoebozoa[6]. Since then the number of the core consensus adhesomes were expanded[21]. 389
Filamin, kindlin, and tensin are three consensus adhesome proteins and cross-actin linkers that 390
interact with ITGB[21]. We have observed two novel adhesome proteins (filamin, tensin) that were 391
not known in Amoebozoa (Supplemental figure 1). We suspect these cytoskeleton proteins may 392
be critical, and the absence of these genes in some amoebozoan taxa is probably a result low gene 393
coverage in some transcriptomes. Talin, PINCH, filamin, and Paxillin key players in the integrin 394
adhesome, were not present in seven amoebozoan transcriptomes (figure 1B), however their 395
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
18
absence is also suspected to be a artefact of low gene coverage in these data. Across the 396
amoebozoan clade, we also noticed a large number of what appear to be truncated talin compared 397
to the rest of adhesome proteins (figure 1B). 398
399
ILK is scattered across the Amoebozoa unlike in Apusomonadida and Opisthokonta[6–10]. a-400
actinin is the only signaling protein that is present in all of the amoebozoan taxa. Consistent with 401
previous reports, we did not detect FAK-cSRC or Parvin in any of Amoebozoa or Breviatea[11]. 402
Since the gene coverage for Pygsuia (Breviatea) is high and genomes are available for Lenisia 403
limosa (Breviatea) and Entamoeba spp. (Amoebozoa), we conclude that breviates and Entamoeba 404
spp. likely do not possess vinculin proteins (supplemental figure 1). For Ectocarpus siliculosus, 405
its genome has talin, and a-actinin homologs[37], but it does not contain any other integrin-406
signaling adhesome protein homologs. 407
408
409
410
DISCUSSION 411
Our findings show that the IMAC is not exclusive to obazoans as previously proposed[6,8–11]. 412
We report a nearly full set of IMAC proteins is present throughout Amorphea including four 413
amoebozoan species (Flabellula citata, Rhizamoeba saxonica, Cryptodifflugia operculatum, 414
Mastigamoeba balamuthi, and Nebela sp.), and two previously unsurveyed obazoan taxa Lenisia 415
limosa (Breviatea) and in Ministeria vibrans (Opisthokonta). Our increased taxon sampling that 416
spans the breadth of the known diversity within Amoebozoa permitted discovery of these proteins. 417
While the integrin proteins undoubtedly played a role in the origins of animal 418
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
19
multicellularity[6,41,42], we do not find any evidence of integrins in the few multicellular lineages 419
of Amoebozoa, those which socially aggregate to form emergent fruiting body structures 420
(Copromyxa protea and the model dictyostelids). Interestingly, secondary loss of these proteins 421
seems to be rampant, as the majority of amoebozoan taxa appear to lack integrins or some of the 422
IMAC components, but the caveat is that some of these data are based on incomplete 423
transcriptomic data. Therefore, their absence may be due to the available data. None-the-less, the 424
most parsimonious explanation of the IMAC evolution is that the ancestor of Amoebozoa must 425
have contained them. An analogous scenario of secondary loss of the IMAC plays out in obazoan 426
evolution, in which we see that the closest relative of animals, the Choanoflagellata[39,43,44], as 427
well as the Nucletmycea (also referred to as the Holomycota, the fungal lineage) are devoid of key 428
IMAC components (namely the integrins)[6,11]. 429
430
The functional domains of ITGA in amoebozoans are similar to the canonical ITGA found in 431
animals and their close relatives like the opisthokont filastereans Capsaspora owczarzaki[6], 432
Ministeria vibrans, and Pigoraptor spp.[7]. Given the homologous nature of the functional domain 433
architecture in amoebozoan ITGA, it is likely that they function at the cellular-level in a similar 434
manner to those in animals. Therefore, we suspect the last common ancestor of Amorphea (LCAA) 435
probably already had ITGA within its genome. Of note, there are potential ITGA homologs in 436
Archaea, therefore it is likely ITGA homologs predate the eukaryotic last common ancestor, but 437
have been retained mostly in Amorphea. We also suspect that the ITGA Ig-like domains possessed 438
by some amoebozoans and obazoans are equivalent to metazoan ITGA legs originally thought to 439
be specific to metazoans[6,11]. An ITGA leg domain was probably in the ITGA of the LCAA, and 440
secondarily lost in some taxa. 441
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
20
442
Some novel functional domains in Type II representatives are integrin binding domains that are 443
present in ECM protein such as LamG[45], TSP-1[46], and EGF-like domains[47]. Could the last 444
common ancestor of Type II representatives or LCAA contain these integrin binding domains? 445
Other functional domains in amoebozoan ITGA such as protein kinase, TSP-1 together with WSC 446
and intramolecular chaperone are not found in Obazoa. Domain shuffling is an important 447
mechanism for creating a novel genes in evolution[48]. It has been suggested that many domains 448
in the ECM are found in unrelated proteins[49]. Whether these proteins were parts of the domain 449
shuffling in ITGA of the last common ancestor of Amoebozoa, which resulted in ECM evolution 450
remains to be seen and their function remains elusive. However, based on our results, it is possible 451
that we have predicted novel integrin proteins with catalytic as well as adhesive properties. 452
453
The domains of amoebozoan ITGB are very similar to the canonical metazoan ITGB and we 454
suspect that the last common ancestor of Amoebozoa retained these canonical ITGB structures 455
including the CRSM motif and three metal binding sites. The cytosolic tail motif “GFFKR” is 456
probably specific to Obazoa and there may be no interaction in the c-terminal intracellular region 457
of integrins in Amoebozoa. The presence of transmembrane and the signal peptide regions are 458
sporadic in Amorphea including metazoan, a previous study showed shedding (evolution from 459
membrane docked to a secretory function) of integrins occurs[27]. The amoebozoan ITGB 460
cysteine pattern of CGXCGGXXXXCXGC possesses more amino acids, including glycine, 461
compared to Obazoa (figure 5F). Glycine is a small non-polar amino acid[50] and due to its small 462
size, glycine positions are often highly conserved within proteins[51,52]; because, glycine residues 463
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
21
maintain the overall architecture[53]. It may be interpreted that increased glycine residues could 464
maintain the overall protein structure in amoebozoan ITGB. 465
466
The type II ITGB form an independent clade, and it is a sister to the other amoebozoans and R. 467
ramosa in figure 4. LamG3 and vWD in ITGB have never been observed in Obazoa and the 468
functions of these type II ITGB proteins are unknown. Some unicellular protists such as breviates, 469
Capsaspora owczarzaki, and apusomonads have a long expansion of the CRSM for ITGB [6,11]. 470
We suspect that Breviatea with a long integrin legs form a long integrin heterodimerization 471
complex at the extracellular area of the cell, as both ITGA and ITGB has long legs of nearly equal 472
proportions in Pygsuia biforma (figures 2 and 3)[11]. 473
474
Our prediction is that most of Discosea, which includes Acanthamoeba (again with several 475
genomes available) and many other sampled transcriptomes in our study, have lost these proteins. 476
However, one representative discosean (Mycamoeba gemmipara) has canonical ITGB homolog 477
(figure 4). Additionally, four other amoebozoan taxa with genome sequences available also appear 478
to have lost ITGB in their evolution. There are amoebozoans in which we have observed only 479
ITGA or ITGB receptors, without their counterpart integrin protein. A study by Li et al. and 480
Schneider et al. showed that homodimerization of ITGA or ITGB might occur by clustering of 481
integrin in a lipid raft[54,55]. Our results show that many of the amoebozoan species singly possess 482
either ITGA or ITGB and yet they possess all the components of signal components of IMAC. 483
These amoebozoan with singular integrins may be capable of initiating the integrin signaling 484
pathway by a similar process. 485
486
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
22
Also, the question remains that the complexity of extra domains of integrin are not species related 487
and their compositions are a single multidomain protein that is present only in Amoebozoa. 488
Phylogenetically these orthologs (i.e., amoebozoan ITGA and ITGB) form novel clades 489
independent to known organisms and most likely are ancestral to the Amoebozoa as a whole. In 490
order to further examine the overall distributions and protein architectures, genomic sequencing 491
will be necessary, and the combination of localization and gene knockout strategies are necessary 492
to understand the function of these ‘multicellularity’ proteins in unicellular taxa. 493
494
Conclusion 495
Our most comprehensive comparative genomic and transcriptomic of Amoebozoa has revealed the 496
presence of a full set of IMAC across the Amoebozoa. To date, this complex was believed to be 497
Obazoa specific, our data shows that these IMAC proteins predates Obazoa. Therefore, the last 498
common ancestor of Amorphea contains the necessary machinery of IMAC, yet it remains 499
unknown if and what functional role they play in these unicellular protists. In future work, 500
localization and functionality studies of these IMAC proteins will be investigated. 501
502
Acknowledgements 503
This project was supported in part by the United States National Science Foundation (NSF) 504
Division of Environmental Biology (DEB) grant 1456054 (http://www.nsf.gov), awarded to MWB. 505
We thank Dr. Andrew J. Roger (Dalhousie University) for advanced access to Mastigamoeba 506
balamuthi transcriptome, which was supported by grant MOP-142349 from the Canadian Institutes 507
of Health Research awarded to A.J. Roger. 508
509 FIGURE LEGENDS 510
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
23
511 Figure 1: A) Repertoire of IMAC in amoebozoan species that have either integrin alpha or beta. 512 Names of IMAC proteins are listed at the bottom and names of amoebozoan taxa are listed on side 513 of the presence/absence heat map. Arrows indicates amoebozoan species that have nearly complete 514 set of IMAC proteins. Dark blue indicates the presence of a canonical IMAC protein, white 515 indicates the absence of an IMAC protein and light blue indicates a truncated form of IMAC 516 protein. B) Phylogenetic distribution of IMAC. Circle, innovation/gain; bar, loss. Abbreviations: 517 vin, vinculin; pax, paxillin; tal, talin; par, parvin; pin, particularly interesting new cysteine-518 histidine-rich protein; FAK, focal adhesion kinase; ILK, integrin-linked kinase; αA, alpha-actinin; 519 α, α-integrin; β, β-integrin. C) Cartoon schematic of the membrane docked IMAC, which 520 associates with intracellular actin (modified from Brown et al., 2013), color corresponds with the 521 species tree. 522 523 524 525 526 527 528
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
24
529 Figure 2: Domain architecture of ITGA from Amoebozoa. The domain architecture of ITGA for 530 amoebozoan species are shown at the right side and phylogenetic tree of Amoebozoa is shown at 531 the left side of figure. Keys to each domain’s color and shape are represented at the bottom of the 532 figure. Phylogenetic tree of amoebozoan ITGA, 160 sites phylogeny of amoebozoan ITGA rooted 533 with Obazoa ITGA. The tree was built using IQTREE v1.5.5 under the LG+C60+F+G model of 534 protein evolution. Numbers at nodes are ML bootstrap (MLBS) values derived from 1000 ultrafast 535 MLBS replicates. Any values that are below 50% are not shown. The color of branches 536 amoebozoan sequences are illustrated according to Kang et al figure 3[14]. When domains are 537 overlapped because of different software algorithms, these are represented in transparent colors. 538 539
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
25
540 Figure 3: Most significantly enriched motifs in 23 amoebozoans ITGA homologs, as obtained by 541 MEME-suite except for (E). (A) Consensus cation motif and FG-GAP motifs found in 23 542 amoebozoans ITGA homologs. (B-D) For figure 3B to 3D, these consensus motifs are exclusively 543 enriched in three amoebozoan ITGA homologs out of 23 amoebozoan ITGA homologs: (B) EGF-544 like motif (C) Thrombospodin repeat 1 motif (D) Carbohydrate WSC motif. (E) GFFKR motif 545 found in three ITGA homologs from Filamoeba nolandi and two obazoans species, Ministeria 546 vibrans and Pygsuia biforma. (F) Proline-rich extensin motif found in 8 ITGA amoebozoan 547 homologs out of 23 amoebozoan ITGA homologs. 548
549
ITGA Motif
MEME (no SSC) 07.06.2018 21:11
0
1
2
3
4
bits
1
C2E
LQ
3FY
4GST
5W 6GST
7PTV
8W 9DST
10
QA
11
C
12
S
13
YV
14
DE
15
CMEME (no SSC) 18.06.2018 20:29
0
1
2
3
4
bits
1ACILNRYTS
2AFLQSIV
3
DNTIVAS
4
F
P
V
CNTYGLAS
5
EIGLSA
6ARPG
7D 8AMLFVI
9DN
10
HNVAG
11
ND
12
KRNG
13
MRYFVKLI
14
KQLAGSNPD
15
ED
16
AVLI
17
MIAVL
18
FLVI
19
ASG
20
CEFRVITSA
21
GQTVKSYP
MEME (no SSC) 18.06.2018 21:14
0
1
2
3
4
bits
1LTP
2ST
3INPVL
4ALTVNS
5VAP
6IST
7FLNS
8NQTAS
9VTP
10
T
11
NP
12
NRTS
13
LTP
14
GIST
15
ARSVNP
16
NQRPS
17
TP
18
ST
19
NQSVL
20
ILT
21
P
22
ST
23
EINQPT
24
NPRS
25
ILP
26
ST
27
DFILTP
28
PQTS
29
VTP
30
ST
31
DNP
32
LNVS
33
LNP
34
AST
35
TLNI
36
AIMRS
37
AQTP38
PST
39
PTINS
40
NSVT
41
RP
42
RST
43
LPSTIN
44
ATRS
45
LSIP
46
ST
47FNRVP
48NTAS
49NRP
50
ST
MEME (no SSC) 18.06.2018 22:08
0
1
2
3
4
bits
1
C
2ALQV
3
C4D
EKL
5DLNP
6
G7F 8F
HST
9AG
10
D11
G12
KLNS
13
QN
14
C15
FT
16RD
MEME (no SSC) 18.06.2018 22:11
0
1
2
3
4
bits
1N 2TS
3N 4HRV
5ALN
6
C
7MQT
8LQ
9AY
10
C11
EFQ
12
D13
L14
GHQ
15
FMY
16
PS
17
YF
18
A19
A20
T
A.
B. C.
D.
F. MEME (no SSC) 18.06.2018 20:36
0
1
2
3
4
bits
1HLK
2HKQE
3ENSRK
4AFEKR
5PQE
6MPQVE
7ADNPK
8MPQE
9INRYDK
10
IQRE
11
ELM
E.
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
26
550
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
27
551 552 Figure 4: Phylogenetic tree of ITGB homologs with a focus on Amoebozoa. The domain 553 architecture of ITGB homologs are shown to the right and the phylogenetic tree of ITGB. Keys to 554 each of domains color and shape are represented in the box at the bottom of the figure. Tree of 555 amoebozoan ITGB 295amino acids sites phylogeny of amoebozoan ITGB rooted with Obazoa. 556 The tree was built using IQTREE v1.5.5 under the LG+C60+F+G model of protein evolution. 557 Numbers at nodes on the phylogenetic tree are maximum likelihood bootstrap support values 558 derived from 1000 ultrafast bootstrap replicates. Any values that are below 50% are not shown. 559 The color of branches amoebozoan sequences are illustrated according to Kang et al. figure 3[14]. 560 When domains are overlapped because of different software algorithms, these are represented in 561 transparent colors. 562 563
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
28
564 Figure 5: Most significantly enriched motifs in 18 amoebozoans ITGB homologs, as obtained by 565 MEME-suite. (A)-(F) Three divalent cation binding motifs: MIDAS, ADMIDAS and SyMBS are 566 shown. (A) MIDAS (black arrow) and ADMIDAS (red arrow) motifs are shown. (B) A shared 567 amino acid (indicted by arrow) in MIDAS and ADMIDAS motifs are illustrated. (C) SyMBS motif 568 represented by green arrow and a shared amino acid of MIDAS motif indicated by black arrow are 569 shown. These consensus motifs are obtained from type I amoebozoan ITGB homologs sequences 570 only. (D) SyMBS motif indicted by a green arrow is shown. (E-F) Cysteine rich motifs are shown. 571 (E) PSI domain and (F) Cysteine-rich motif stalk is shown. These are recovered from type I 572 amoebozoan ITGB sequences. (G-H) Motifs found at the C-terminal are shown. (G) NPXY motif 573 shown in blue arrow. found in most amoebozoans ITGB except three amoebozoan ITGB homolog. 574 (H) An electrostatic charge motif is shown, this motif is found exclusively in type I ITGB 575 homologs. 576 577 578 579
ITGB MotifA.
C.
B.
E.
MEME (no SSC) 11.01.2018 20:59
0
1
2
3
4
bits
1YFILTD
2LSTVDEI
3ILQVP
4ASP
5LI
6ED
7LMVF
8ILVYF
9IVF
10
IVL
11
HMLQ
12
D13
VSA
14
TS
15
EFYAVGI
16
S17
IM
18
FGHLQVIKRS
19
QRDP
20
EFLNDHY
21
VMI
22
DELNQASTK
23
ADMTNL
24
MVFI
25
ILSGAQR
26
ALNDRST
27
ISVML
28
ALTMV
29
GP
30
DNRKQ
31
VLI
0
1
2
3
4
bits
1
C
2NVG
3VYFIW
4
C
5DEKSAVG
6EHGLQDS
7DHKSVLNT
8DENRSKG
9LMQVKARSET
10
C11
DHQWEIRVML
12
LVGPSTA
13
KVCNAG
14
GQATSDN
15
DLMQTVIAEGP
0
1
2
3
4
bits
1AD
L
N
EKSTQRG
2AGMSYTINVL
3HL
Q
V
Y
GIRTSA
4DE
F
L
Y
ANRSIG
5VGN
6AFKSQINGY
7ED
8FLNP
Q
VERTWGH
9KQYCP
10
DQNE
11
GTADNS
12
C
D
VAFGLMSPQ
13
HSWIMFL
14
NSVTDE
15GA
16VIMAL
17
AGIVYMFL
18
MRFHILQY
19
TGLVSA
20
GILTSA
21
A
C
D
IKQWYRVL
0
1
2
3
4
bits
1IRVFLY
2D 3ELAIV
4
C5N
G6I
V7
C8D
G9
G10
KLND
11
SGN
12
DLNRTS
13
LST
14
C15
ADQRIL
16
DG
17
C18
D19
HNG
20
FKRIV
21
AFSP
0
1
2
3
4
bits
1SARG
2HVIMYLF
3NTVSAG
4GMKQATS
5HYF
6RLSIV
7NED
8IRK
9AKNRSVTP
10
AKNTCQLIV
11
MYVALS
12
GNVYP
13
C
HI
LS
W
QMYF
14
STIMAVG
15
C
QRDNYGKS
16
AMSIYWPT
17
AQRNSVPT
18
QSTDEP
MEME (no SSC) 24.04.2018 21:59
0
1
2
3
4
bits
1LPS
KRH
2FTMVIL
3AF
CTMIVL
4AY
IMFLV
5
TIYMLVF
6
LCSIATV
7GST
8EKD
9ENGSDA
10
ACNKGSDT
11
TAYSF
D.
0
1
2
3
4
bits
1NQSTAW
2DGHIVTNSQK
3AGMSVTQL
4SKNEQGHD
5GH
V
AENTSD
6SN
7P 8HTIL
9FY
10
IRQEK
11
DQPANGES
12
NQSA
13
FQGVIKT
14
DNRSTK
15
AQVNKET
16
KRYHVTF
17
F
SNEIQKVD
18
SN
19
AVP
20
QFIRTL
21
LFY
MEME (no SSC) 06.04.2018 16:56
0
1
2
3
4
bits
1LTA
2IV
3VIA
4GLVA
5GL
6SA
7TVAS
8ILV
9LTA
10
AGILV
11
V
12
AG
13
LIV
14
IV
15
IFA
16
A
17
GVAI
18
LAIV
19
GA
20
AFIVCL
21
CLVI
22
ALVMF
23
CFL
24
AGILSVF
25
VSG
26
CGA
27
ASTG
28
VA
29
IKA
30
IM
31
VAM
32
RAKV
33
A
34
D
35
ADEHKNQT
36
IMVE
37
ML
38
FHLVI
39
TDE
40
QK
41
LE
F.
G.
H.
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
29
580 Figure 6: The presence of ITGB gene in a Mastigamoeba balamuthi genome sequence. 581 Introns/exons boundaries sequences are shown along with corresponding intron size. The genome 582 contig that possess ITGB gene are colored blue and green. The protein domain architecture of 583 ITGB gene are shown along with protein size. The adjacent tenascin gene on the ITGB genome 584 contig are shown left and right along with the inferred phylogenetic tree. 585 586
Supplemental Materials 587
Supplemental Results: Details on the Sib (similar to integrin beta) proteins found in dictyostelids. 588
Supplemental Figure 1: Repertoire of IMAC including filamin and tensin in Amoebozoa. 589
Supplemental Figure 2: Repertoire of IMAC in Amoebozoa species. Supplemental Figure 3: 590
Maximum likelihood tree of ITA homolog and other cell adhesion molecules that contain beta-591
propellers. Supplemental Figure 4: Maximum likelihood tree of ITB homolog and Sib proteins. 592
Supplemental Figure 6: Most significantly enriched motifs in Obazoan ITA, obtained by MEME-593
suite. Supplemental Figure 7: Most significantly enriched motifs in Obazoan ITB, obtained by 594
MEME-suite. Supplemental Figure 8: Most significantly enriched motifs in Sib protein, obtained 595
by MEME. 596
597
598
Supplemental Table 1: The table shows the list of Amorphea ITGA proteins characteristic 599
including: length, beta propeller motifs, subcellular location, any extra domains attached to ITGA, 600
c-terminal motif signal peptide, transmembrane regions and type of ITGA. The order of taxons are 601
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
30
listed in accordance to the Figure 2. The subcellular localization is displayed with likelihood value 602
in parentheses. The location of extra domains attached to ITGA are shown as the C-terminal end 603
(CTE) or at the start of N-terminal (NTB). The presence of ITGA interacting motif with ITGB is 604
shown as GFFKR and GXXXG. The location of beta propeller motifs performed by MEME-suite 605
is displayed. The presence of FG-GAP motifs are shown in asterixis. The location of beta motifs 606
that corresponds to IPRSCAN output are highlighted by parentheses. The presence of consensus 607
cation motifs are displayed in number 2 superscripts. 608
609
Supplemental Table 2: The table shows ITGB motifs, subcellular localizations, signal peptide, 610
transmembrane and extra domains attached to ITGB protein. The order of taxons are listed in 611
accordance to the Figure 3. For ITGB motifs of MIDAS, AMIDAS and SyMBS, any non-canonical 612
amino acids were highlighted in red color, cysteine rich motifs are either shown as EGF or PSI 613
domains. The presence of C-terminal motifs are shown in KLXXXD or NPXY/F and GXXXG. 614
The parentheses indicate number of motifs present in ITGB. 615
616
Supplemental Table 3: This table shows Rsem value of protists that have either a complete set of 617
IMAC, integrins with extra domains or unexpected discovery of an integrin protein. The values 618
are shown in transcripts per million (TPM). 619
620
Supplemental Table 4: A). Summary of amoebozoan ITA type I table listing: number of paralogs, 621
Beta propeller, FG-GAP and canonical motifs. The table also shows presence of signal peptide 622
and transmembrane region, c-terminal interacting motif, GXXXG motif and Ig-fold like/Cadherin 623
domain. B). Summary of amoebozoan ITA type II table listing: number of paralogs, Beta 624
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
31
propeller, FG-GAP and canonical motifs. The table also shows presence of signal peptide and 625
transmembrane region, c-terminal interacting motif, GXXXG motif and extra domain attached to 626
ITA. 627
628
REFERENCES 629
1. Hynes, R.O. (2002). Integrins: bidirectional, allosteric signaling machines. Cell 110, 673–630 687. 631 2. LaFlamme, S.E., and Auer, K.L. (1996). Integrin signaling. Semin. Cancer Biol. 7, 111–632 118. 633 3. Harburger, D.S., and Calderwood, D.A. (2009). Integrin signalling at a glance. J. Cell Sci. 634 122, 159–163. 635 4. Horton, E.R., Byron, A., Askari, J.A., Ng, D.H.J., Millon-Frémillon, A., Robertson, J., 636 Koper, E.J., Paul, N.R., Warwood, S., Knight, D., et al. (2015). Definition of a consensus integrin 637 adhesome and its dynamics during adhesion complex assembly and disassembly. Nat. Cell Biol. 638 17, 1577. 639 5. LaFlamme, S.E., Mathew-Steiner, S., Singh, N., Colello-Borges, D., and Nieves, B. (2018). 640 Integrin and microtubule crosstalk in the regulation of cellular processes. Cell. Mol. Life Sci. 75, 641 4177–4185. 642 6. Sebé-Pedrós, A., Roger, A.J., Lang, F.B., King, N., and Ruiz-Trillo, I. (2010). Ancient 643 origin of the integrin-mediated adhesion and signaling machinery. Proc. Natl. Acad. Sci. 107, 644 10142. 645 7. Hehenberger, E., Tikhonenkov, D.V., Kolisko, M., del Campo, J., Esaulov, A.S., Mylnikov, 646 A.P., and Keeling, P.J. (2017). Novel Predators Reshape Holozoan Phylogeny and Reveal the 647 Presence of a Two-Component Signaling System in the Ancestor of Animals. Curr. Biol. 27, 2043-648 2050.e6. 649 8. Grau-Bové, X., Torruella, G., Donachie, S., Suga, H., Leonard, G., Richards, T.A., and 650 Ruiz-Trillo, I. (2017). Dynamics of genomic innovation in the unicellular ancestry of animals. 651 eLife 6, e26036. 652 9. de Mendoza, A., Suga, H., Permanyer, J., Irimia, M., and Ruiz-Trillo, I. (2015). Complex 653 transcriptional regulation and independent evolution of fungal-like traits in a relative of animals. 654 Elife 4, e08904. 655 10. Dudin, O., Ondracka, A., Grau-Bové, X., A.B. Haraldsen, A., Toyoda, A., Suga, H., Bråte, 656 J., and Ruiz-Trillo, I. (2019). A unicellular relative of animals generates an epithelium-like cell 657 layer by actomyosin-dependent cellularization. bioRxiv, 563726. 658 11. Brown, M.W., Sharpe, S.C., Silberman, J.D., Heiss, A.A., Lang, B.F., Simpson, A.G., and 659 Roger, A.J. (2013). Phylogenomics demonstrates that breviate flagellates are related to 660 opisthokonts and apusomonads. Proc. R. Soc. Lond. B Biol. Sci. 280, 20131755. 661 12. Adl, S.M., Bass, D., Lane, C.E., Lukeš, J., Schoch, C.L., Smirnov, A., Agatha, S., Berney, 662 C., Brown, M.W., Burki, F., et al. (2018). Revisions to the Classification, Nomenclature, and 663 Diversity of Eukaryotes. J. Eukaryot. Microbiol. 0. Available at: https://doi.org/10.1111/jeu.12691 664 [Accessed October 18, 2018]. 665
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
32
13. Cavalier-Smith, T. (2017). Origin of animal multicellularity: precursors, causes, 666 consequences—the choanoflagellate/sponge transition, neurogenesis and the Cambrian explosion. 667 Philos. Trans. R. Soc. B Biol. Sci. 372. Available at: 668 http://rstb.royalsocietypublishing.org/content/372/1713/20150476.abstract. 669 14. Kang, S., Tice, A.K., Spiegel, F.W., Silberman, J.D., Pánek, T., Čepička, I., Kostka, M., 670 Kosakyan, A., Alcântara, D., and Roger, A.J. (2017). Between a pod and a hard test: the deep 671 evolution of amoebae. Mol. Biol. Evol. 672 15. Hamann, E., Gruber-Vodicka, H., Kleiner, M., Tegetmeyer, H.E., Riedel, D., Littmann, S., 673 Chen, J., Milucka, J., Viehweger, B., Becker, K.W., et al. (2016). Environmental Breviatea 674 harbour mutualistic Arcobacter epibionts. Nature 534, 254. 675 16. Zhang, K., and Chen, J. (2012). The regulation of integrin function by divalent cations. 676 Cell Adhes. Migr. 6, 20–29. 677 17. Xia, W., and Springer, T.A. (2014). Metal ion and ligand binding of integrin α5β1. Proc. 678 Natl. Acad. Sci. 111, 17863–17868. 679 18. Campbell, I.D., and Humphries, M.J. (2011). Integrin structure, activation, and interactions. 680 Cold Spring Harb. Perspect. Biol. 3, a004994. 681 19. Lee, J.-O., Rieu, P., Arnaout, M.A., and Liddington, R. Crystal structure of the A domain 682 from the a subunit of integrin CR3 (CD11 b/CD18). Cell 80, 631–638. 683 20. Xiong, J.-P., Stehle, T., Diefenbach, B., Zhang, R., Dunker, R., Scott, D.L., Joachimiak, 684 A., Goodman, S.L., and Arnaout, M.A. (2001). Crystal structure of the extracellular segment of 685 integrin αVβ3. Science 294, 339–345. 686 21. Humphries, J.D., Chastney, M.R., Askari, J.A., and Humphries, M.J. (2019). Signal 687 transduction via integrin adhesion complexes. Cell Archit. 56, 14–21. 688 22. Srichai, M.B., and Zent, R. (2010). Integrin structure and function. In Cell-Extracellular 689 Matrix Interactions in Cancer (Springer), pp. 19–41. 690 23. Finn, R.D., Attwood, T.K., Babbitt, P.C., Bateman, A., Bork, P., Bridge, A.J., Chang, H.-691 Y., Dosztányi, Z., El-Gebali, S., Fraser, M., et al. (2017). InterPro in 2017—beyond protein family 692 and domain annotations. Nucleic Acids Res. 45, D190–D199. 693 24. Oxvig, C., and Springer, T.A. (1998). Experimental support for a beta-propeller domain in 694 integrin alpha-subunits and a calcium binding site on its lower surface. Proc. Natl. Acad. Sci. U. 695 S. A. 95, 4870–4875. 696 25. Humphries, M.J., Symonds, E.J.H., and Mould, A.P. (2003). Mapping functional residues 697 onto integrin crystal structures. Curr. Opin. Struct. Biol. 13, 236–243. 698 26. Jin, R., Trikha, M., Cai, Y., Grignon, D., and Honn, K.V. (2007). A naturally occurring 699 truncated β3 integrin in tumor cells: Native anti-integrin involved in tumor cell motility. Cancer 700 Biol. Ther. 6, 1559–1568. 701 27. Gomez, I.G., Tang, J., Wilson, C.L., Yan, W., Heinecke, J.W., Harlan, J.M., and Raines, 702 E.W. (2012). Metalloproteinase-mediated Shedding of Integrin β2 promotes macrophage efflux 703 from inflammatory sites. J. Biol. Chem. 287, 4581–4589. 704 28. Tan, S.-M., Walters, S.E., Mathew, E.C., Robinson, M.K., Drbal, K., Shaw, J.M., and Law, 705 S.K.A. (2001). Defining the repeating elements in the cysteine-rich region (CRR) of the CD18 706 integrin β2 subunit. FEBS Lett. 505, 27–30. 707 29. Russ, W.P., and Engelman, D.M. (2000). The GxxxG motif: A framework for 708 transmembrane helix-helix association11Edited by G. von Heijne. J. Mol. Biol. 296, 911–919. 709
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
33
30. Mould, A.P., Barton, S.J., Askari, J.A., Craig, S.E., and Humphries, M.J. (2003). Role of 710 ADMIDAS cation-binding site in ligand recognition by integrin α5β1. J. Biol. Chem. 278, 51622–711 51629. 712 31. Calderwood, D.A., Fujioka, Y., de Pereda, J.M., García-Alvarez, B., Nakamoto, T., 713 Margolis, B., McGlade, C.J., Liddington, R.C., and Ginsberg, M.H. (2003). Integrin β cytoplasmic 714 domain interactions with phosphotyrosine-binding domains: A structural prototype for diversity 715 in integrin signaling. Proc. Natl. Acad. Sci. 100, 2272. 716 32. Hillmann, F., Forbes, G., Novohradská, S., Ferling, I., Riege, K., Groth, M., Westermann, 717 M., Marz, M., Spaller, T., Winckler, T., et al. (2018). Multiple Roots of Fruiting Body Formation 718 in Amoebozoa. Genome Biol. Evol. 10, 591–606. 719 33. Nývltová, E., Šuták, R., Harant, K., Šedinová, M., Hrdý, I., Pačes, J., Vlček, Č., and 720 Tachezy, J. (2013). NIF-type iron-sulfur cluster assembly system is duplicated and distributed in 721 the mitochondria and cytosol of Mastigamoeba balamuthi. Proc Natl Acad Sci U A 110, 7371–722 7376. 723 34. Metcalf, J.A., Funkhouser-Jones, L.J., Brileya, K., Reysenbach, A.-L., and Bordenstein, 724 S.R. (2014). Antibacterial gene transfer across the tree of life. Elife 3, e04266. 725 35. Clarke, J., Cota, E., Fowler, S.B., and Hamill, S.J. (1999). Folding studies of 726 immunoglobulin-like β-sandwich proteins suggest that they share a common folding pathway. 727 Structure 7, 1145–1153. 728 36. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., 729 Moulton, G., Nordle, A., Paine, K., Taylor, P., et al. (2003). PRINTS and its automatic supplement, 730 prePRINTS. Nucleic Acids Res. 31, 400–402. 731 37. Cock, J.M., Sterck, L., Rouzé, P., Scornet, D., Allen, A.E., Amoutzias, G., Anthouard, V., 732 Artiguenave, F., Aury, J.-M., Badger, J.H., et al. (2010). The Ectocarpus genome and the 733 independent evolution of multicellularity in brown algae. Nature 465, 617. 734 38. Brown, M.W., Heiss, A.A., Kamikawa, R., Inagaki, Y., Yabuki, A., Tice, A.K., Shiratori, 735 T., Ishida, K.-I., Hashimoto, T., Simpson, A.G.B., et al. (2018). Phylogenomics Places Orphan 736 Protistan Lineages in a Novel Eukaryotic Super-Group. Genome Biol. Evol. 10, 427–433. 737 39. Richter, D., Fozouni, P., Eisen, M., and King, N. (2017). The ancestral animal genetic 738 toolkit revealed by diverse choanoflagellate transcriptomes. bioRxiv. Available at: 739 http://biorxiv.org/content/early/2017/12/26/211789.abstract. 740 40. Cornillon, S., Gebbie, L., Benghezal, M., Nair, P., Keller, S., Wehrle-Haller, B., Charette, 741 S.J., Brückert, F., Letourneur, F., and Cosson, P. (2006). An adhesion molecule in free-living 742 Dictyostelium amoebae with integrin β features. EMBO Rep. 7, 617–621. 743 41. Suga, H., Chen, Z., de Mendoza, A., Sebé-Pedrós, A., Brown, M.W., Kramer, E., Carr, M., 744 Kerner, P., Vervoort, M., Sánchez-Pons, N., et al. (2013). The Capsaspora genome reveals a 745 complex unicellular prehistory of animals. 4, 2325. 746 42. Brunet, T., and King, N. (2017). The Origin of Animal Multicellularity and Cell 747 Differentiation. Dev. Cell 43, 124–140. 748 43. King, N., Westbrook, M.J., Young, S.L., Kuo, A., Abedin, M., Chapman, J., Fairclough, 749 S., Hellsten, U., Isogai, Y., and Letunic, I. (2008). The genome of the choanoflagellate Monosiga 750 brevicollis and the origin of metazoans. Nature 451, 783–788. 751 44. Fairclough, S.R., Chen, Z., Kramer, E., Zeng, Q., Young, S., Robertson, H.M., Begovic, 752 E., Richter, D.J., Russ, C., and Westbrook, M.J. (2013). Premetazoan genome evolution and the 753 regulation of cell differentiation in the choanoflagellate Salpingoeca rosetta. Genome Biol. 14, 754 R15. 755
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
34
45. Pulido, D., Hussain, S.-A., and Hohenester, E. (2017). Crystal Structure of the 756 Heterotrimeric Integrin-Binding Region of Laminin-111. Struct. Lond. Engl. 1993 25, 530–535. 757 46. Calzada, M.J., Annis, D.S., Zeng, B., Marcinkiewicz, C., Banas, B., Lawler, J., Mosher, 758 D.F., and Roberts, D.D. (2004). Identification of novel β1 integrin binding sites in the type 1 and 759 type 2 repeats of thrombospondin-1. J. Biol. Chem. 279, 41734–41743. 760 47. Hynes, R.O. (2012). The evolution of metazoan extracellular matrix. J. Cell Biol. 196, 671. 761 48. Babushok, D.V., Ohshima, K., Ostertag, E.M., Chen, X., Wang, Y., Mandal, P.K., Okada, 762 N., Abrams, C.S., and Kazazian, H.H., Jr (2007). A novel testis ubiquitin-binding protein gene 763 arose by exon shuffling in hominoids. Genome Res. 17, 1129–1138. 764 49. Adams, J.C. (2013). Extracellular Matrix Evolution: An Overview. In Evolution of 765 Extracellular Matrix, F. W. Keeley and R. P. Mecham, eds. (Berlin, Heidelberg: Springer Berlin 766 Heidelberg), pp. 1–25. Available at: https://doi.org/10.1007/978-3-642-36002-2_1. 767 50. Lodish, H., Darnell, J.E., Berk, A., Kaiser, C.A., Krieger, M., Scott, M.P., Bretscher, A., 768 Ploegh, H., and Matsudaira, P. (2008). Molecular cell biology (Macmillan). 769 51. Creighton, T.E. (1993). Proteins: structures and molecular properties (Macmillan). 770 52. Branden, C.I., and Tooze, J. (2012). Introduction to protein structure (Garland Science). 771 53. Jacob, J., Duclohier, H., and Cafiso, D.S. (1999). The Role of Proline and Glycine in 772 Determining the Backbone Flexibility of a Channel-Forming Peptide. Biophys. J. 76, 1367–1376. 773 54. Li, R., Mitra, N., Gratkowski, H., Vilaire, G., Litvinov, R., Nagasami, C., Weisel, J.W., 774 Lear, J.D., DeGrado, W.F., and Bennett, J.S. (2003). Activation of integrin αIIbß3 by modulation 775 of transmembrane helix associations. Science 300, 795–798. 776 55. Schneider, D., and Engelman, D.M. (2004). Involvement of transmembrane domain 777 interactions in signal transduction by α/β integrins. J. Biol. Chem. 279, 9840–9846. 778 56. Clarke, M., Lohan, A.J., Liu, B., Lagkouvardos, I., Roy, S., Zafar, N., Bertelli, C., Schilde, 779 C., Kianianmomeni, A., and Burglin, T.R. (2013). Genome of Acanthamoeba castellanii highlights 780 extensive lateral gene transfer and early evolution of tyrosine kinase signaling. Genome Biol 14. 781 57. Tice, A.K., Shadwick, L.L., Fiore-Donno, A.M., Geisen, S., Kang, S., Schuler, G.A., 782 Spiegel, F.W., Wilkinson, K.A., Bonkowski, M., Dumack, K., et al. (2016). Expansion of the 783 molecular and morphological diversity of Acanthamoebidae (Centramoebida, Amoebozoa) and 784 identification of a novel life cycle type within the group. Biol. Direct 11, 69. 785 58. Urushihara, H., Kuwayama, H., Fukuhara, K., Itoh, T., Kagoshima, H., Shin-I, T., Toyoda, 786 A., Ohishi, K., Taniguchi, T., Noguchi, H., et al. (2015). Comparative genome and transcriptome 787 analyses of the social amoeba Acytostelium subglobosum that accomplishes multicellular 788 development without germ-soma differentiation. BMC Genomics 16, 80–80. 789 59. Lahr, D.J.G., Kosakyan, A., Lara, E., Mitchell, E.A.D., Morais, L., Porfirio-Sousa, A.L., 790 Ribeiro, G.M., Tice, A.K., Pánek, T., Kang, S., et al. (2019). Phylogenomics and Morphological 791 Reconstruction of Arcellinida Testate Amoebae Highlight Diversity of Microbial Eukaryotes in 792 the Neoproterozoic. Curr. Biol. 29, 991-1001.e3. 793 60. Tekle, Y.I., Anderson, O.R., Katz, L.A., Maurer-Alcalá, X.X., Romero, M.A.C., and 794 Molestina, R. (2016). Phylogenomics of ‘Discosea’: A new molecular phylogenetic perspective 795 on Amoebozoa with flat body forms. Mol Phylogenet Evol 99, 144–154. 796 61. Heidel, A.J., and Glöckner, G. (2008). Mitochondrial Genome Evolution in the Social 797 Amoebae. Mol. Biol. Evol. 25, 1440–1450. 798 62. Eichinger, L., Pachebat, J.A., Glockner, G., Rajandream, M.A., Sucgang, R., Berriman, M., 799 Song, J., Olsen, R., Szafranski, K., Xu, Q., et al. (2005). The genome of the social amoeba 800 Dictyostelium discoideum. Nature 435, 43–57. 801
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
35
63. Gonzales, C.M., Spencer, T.D., Pendley, S.S., and Welker, D.L. (1999). Dgp1 and Dfp1 802 Are Closely Related Plasmids in theDictyosteliumDdp2 Plasmid Family. Plasmid 41, 89–96. 803 64. Sucgang, R., Kuo, A., Tian, X., Salerno, W., Parikh, A., Feasley, C.L., Dalin, E., Tu, H., 804 Huang, E., Barry, K., et al. (2011). Comparative genomics of the social amoebae Dictyostelium 805 discoideum and Dictyostelium purpureum. Genome Biol. 12, R20–R20. 806 65. Loftus, B., Anderson, I., Davies, R., Alsmark, U.C.M., Samuelson, J., Amedeo, P., 807 Roncaglia, P., Berriman, M., Hirt, R.P., Mann, B.J., et al. (2005). The genome of the protist 808 parasite Entamoeba histolytica. Nature 433, 865. 809 66. Ehrenkaufer, G.M., Weedall, G.D., Williams, D., Lorenzi, H.A., Caler, E., Hall, N., and 810 Singh, U. (2013). The genome and transcriptome of the enteric parasite Entamoeba invadens, a 811 model for encystation. Genome Biol. 14, R77. 812 67. Tachibana, H., Yanagi, T., Pandey, K., Cheng, X.-J., Kobayashi, S., Sherchand, J.B., and 813 Kanbara, H. (2007). An Entamoeba sp. strain isolated from rhesus monkey is virulent but 814 genetically different from Entamoeba histolytica. Mol. Biochem. Parasitol. 153, 107–114. 815 68. Schaap, P., Barrantes, I., Minx, P., Sasaki, N., Anderson, R.W., Bénard, M., Biggar, K.K., 816 Buchler, N.E., Bundschuh, R., Chen, X., et al. (2015). The Physarum polycephalum Genome 817 Reveals Extensive Use of Prokaryotic Two-Component and Metazoan-Type Tyrosine Kinase 818 Signaling. Genome Biol. Evol. 8, 109–125. 819 69. Heidel, A.J., Lawal, H.M., Felder, M., Schilde, C., Helps, N.R., Tunggal, B., Rivero, F., 820 John, U., Schleicher, M., and Eichinger, L. (2011). Phylogeny-wide analysis of social amoeba 821 genomes highlights ancient origins for complex intercellular communication. Genome Res. 822 70. Cavalier-Smith, T., Fiore-Donno, A.M., Chao, E., Kudryavtsev, A., Berney, C., Snell, E.A., 823 and Lewis, R. (2015). Multigene phylogeny resolves deep branching of Amoebozoa. Mol 824 Phylogenet Evol 83, 293–304. 825 71. Li, W., and Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large 826 sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659. 827 72. Grabherr, M.G., Haas, B.J., Yassour, M., Levin, J.Z., Thompson, D.A., and Amit, I. (2011). 828 Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat 829 Biotechnol 29. 830 73. Haas, B.J., Papanicolaou, A., Yassour, M., Grabherr, M., Blood, P.D., Bowden, J., Couger, 831 M.B., Eccles, D., Li, B., Lieber, M., et al. (2013). De novo transcript sequence reconstruction from 832 RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 8, 1494–833 1512. 834 74. Croning, M.D.R., Apweiler, R., and Möller, S. (2001). Evaluation of methods for the 835 prediction of membrane spanning regions. Bioinformatics 17, 646–653. 836 75. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat 837 Methods 9. 838 76. Buchfink, B., Xie, C., and Huson, D.H. (2014). Fast and sensitive protein alignment using 839 DIAMOND. Nat. Methods 12, 59. 840 77. Fischer, S., Brunk, B.P., Chen, F., Gao, X., Harb, O.S., Iodice, J.B., Shanmugam, D., Roos, 841 D.S., and Stoeckert, C.J. (2002). Using OrthoMCL to Assign Proteins to OrthoMCL-DB Groups 842 or to Cluster Proteomes Into New Ortholog Groups. In Current Protocols in Bioinformatics (John 843 Wiley & Sons, Inc.). 844 78. Bolger, A.M., Lohse, M., and Usadel, B. (2014). Trimmomatic: a flexible trimmer for 845 Illumina sequence data. Bioinformatics 30. 846
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
36
79. Katoh, K., and Standley, D.M. (2013). MAFFT Multiple Sequence Alignment Software 847 Version 7: Improvements in Performance and Usability. Mol Biol Evol 30. 848 80. Criscuolo, A., and Gribaldo, S. (2010). BMGE (Block Mapping and Gathering with 849 Entropy): a new software for selection of phylogenetic informative regions from multiple sequence 850 alignments. BMC Evol Biol 10. 851 81. Nguyen, L.T., Schmidt, H.A., Haeseler, A., and Minh, B.Q. (2014). IQ-TREE: A Fast and 852 Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Mol Biol Evol 853 32. 854 82. Li, B., and Dewey, C.N. (2011). RSEM: accurate transcript quantification from RNA-Seq 855 data with or without a reference genome. BMC Bioinformatics 12. 856 83. Almagro Armenteros, J.J., Tsirigos, K.D., Sønderby, C.K., Petersen, T.N., Winther, O., 857 Brunak, S., von Heijne, G., and Nielsen, H. (2019). SignalP 5.0 improves signal peptide 858 predictions using deep neural networks. Nat. Biotechnol. 37, 420–423. 859 84. MacManes, M.D. (2017). The Oyster River Protocol: A Multi Assembler and Kmer 860 Approach For de novo Transcriptome Assembly. bioRxiv. Available at: 861 http://biorxiv.org/content/early/2017/08/16/177253.abstract. 862 85. Bailey, T.L., Boden, M., Buske, F.A., Frith, M., Grant, C.E., Clementi, L., Ren, J., Li, 863 W.W., and Noble, W.S. (2009). MEME Suite: tools for motif discovery and searching. Nucleic 864 Acids Res. 37, W202–W208. 865 86. Nielsen, H., Almagro Armenteros, J.J., Sønderby, C.K., Sønderby, S.K., and Winther, O. 866 (2017). DeepLoc: prediction of protein subcellular localization using deep learning. 867 Bioinformatics 33, 3387–3395. 868 87. Huerta-Cepas, J., Serra, F., and Bork, P. (2016). ETE 3: Reconstruction, Analysis, and 869 Visualization of Phylogenomic Data. Mol. Biol. Evol. 33, 1635–1638. 870 88. El-Gebali, S., Mistry, J., Bateman, A., Eddy, S.R., Luciani, A., Potter, S.C., Qureshi, M., 871 Richardson, L.J., Salazar, G.A., Smart, A., et al. (2018). The Pfam protein families database in 872 2019. Nucleic Acids Res. 47, D427–D432. 873 89. Van Dongen, S.M. (2000). Graph clustering by flow simulation. 874 90. Blandenier, Q., Seppey, C.V.W., Singer, D., Vlimant, M., Simon, A., Duckert, C., and Lara, 875 E. (2017). Mycamoeba gemmipara nov. gen., nov. sp., the First Cultured Member of the 876 Environmental Dermamoebidae Clade LKM74 and its Unusual Life Cycle. J. Eukaryot. Microbiol. 877 64, 257–265. 878 91. Picelli, S., Faridani, O.R., Bjorklund, A.K., Winberg, G., Sagasser, S., and Sandberg, R. 879 (2014). Full-length RNA-seq from single cells using Smart-seq2. Nat Protoc 9. 880 92. Onsbring, H., Tice, A.K., Barton, B.T., Brown, M.W., and Ettema, T.J.G. (2019). An 881 efficient single-cell transcriptomics workflow to assess protist diversity and lifestyle. bioRxiv, 882 782235. 883 93. Boratyn, G.M., Schäffer, A.A., Agarwala, R., Altschul, S.F., Lipman, D.J., and Madden, 884 T.L. (2012). Domain enhanced lookup time accelerated BLAST. Biol. Direct 7, 12–12. 885 94. Krogh, A., Larsson, B., von Heijne, G., and Sonnhammer, E.L.L. (2001). Predicting 886 transmembrane protein topology with a hidden markov model: application to complete 887 genomes11Edited by F. Cohen. J. Mol. Biol. 305, 567–580. 888 95. Fischer, S., Brunk, B.P., Chen, F., Gao, X., Harb, O.S., Iodice, J.B., Shanmugam, D., Roos, 889 D.S., and Stoeckert, C.J., Jr (2011). Using OrthoMCL to assign proteins to OrthoMCL-DB groups 890 or to cluster proteomes into new ortholog groups. Curr. Protoc. Bioinforma. Chapter 6, Unit-891 6.12.19. 892
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
37
96. Xie, Y., Wu, G., Tang, J., Luo, R., Patterson, J., and Liu, S. (2014). SOAPdenovo-Trans: 893 de novo transcriptome assembly with short RNA-Seq reads. Bioinformatics 30. 894 895 896 897 898 METHODS 899
900
Method Details 901
We have strategically sampled 118 Amorphea species and their transcriptomes and genomes to 902
cover all known Amorphea subgroups. Additionally, we also searched for IMAC proteins in any 903
of the eukaryotic data in NCBI. 904
905
Culturing and Isolation of Mycamoeba gemmipara 906
Mycamoeba gemmipara was obtained from Q. Blandenier and cultured as in Blandenier et al. 907
2017[90]. 908
909
Ultra-low input/ Single-Cell cDNA library construction based on Smart-Seq2 910
The cultured amoeba from Mycamoeba gemmipara were subjected to Smart-Seq2 [91] for cDNA 911
library construction as in Onsbring et al. 2019[92]. About 50 cells scraped off of a agar plate using 912
a 30-gauge platinum wire placed directly into a 200 µL thin-walled PCR tube containing the cell 913
lysis mixture [91]. Each reaction was then subjected to six freeze-thaw cycles in -80 °C 914
isopropanol and ~25 °C DI H2O respectively, before following the remainder of the protocol. A 915
modified version of Smart-Seq2 [91] was used for RNA isolation and dscDNA preparation. cDNA 916
libraries were prepared using a Nextera XT DNA Library Prep Kit (Illumina, CA) following the 917
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
38
manufacturer's protocol with dual index primers. The Mycamoeba gemmipara library was 918
sequenced using an Illumina HiSeq 4000 at Genome Quebec. 919
920
Transcriptomic Sequencing Assembly 921
Raw sequence data from the Illumina sequencing platform or from Bioproject accession number 922
PRJNA380424 [14] were subjected to TRIMMOMATIC V 0.35 [78], for cleaning and trimming of 923
poorly called reads or specific to the adaptor will be used in sequencing with the paired-end data. 924
Quality filtered and cleaned high quality reads were assembled through TRINITY [73] for de novo 925
assembly. Nucleotide sequences were translated with TRANSDECODER V 5.5.0 926
(https://github.com/TransDecoder/TransDecoder/) which also predicts open reading frames, and 927
PFAM domains [88] as well. Contigs of transcriptomes were blasted against the non-redundant 928
database of NCBI using the BLASTX program of BLAST+ suite [93]. Each predicted ORFs were 929
assigned an ortholog number through the OrthoMCL v5 [77] database. For taxa with clearly 930
truncated predicted proteins, the computationally laborious transcriptome assembly pipeline, 931
Oyster River Protocol[84], was performed in an attempt to retrieve predicted a full-length integrin 932
proteins. 933
934
Integrin Protein analysis 935
We selected a model organism IMAC proteins from NCBI: ITGA5:P08648, ITGB1:NP_002202, 936
Talin:AAF27330, Parvin:AAH16713, PINCH:NP_060450.2, vinculin:AAH39174, 937
FAK:AAA35819 Paxillin:AAC50104 ILK:NP_001014794, a-actinin:AAC17470, 938
Filamin:AAF72339 and Tensin:AAG33700. INTERPROSCAN 5.27-66.0[23] was used to determine 939
these IMAC protein domain architecture along with SIGNALIP v 5.0 [83] and TMHMM v 2.0 [94]. 940
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
39
MEME-SUITE 5.0.4 [85]was used to examine integrin motifs. DEEPLOC v 1.0[86] was used to 941
determine the subcellular localization of integrin proteins. We set the minimum criteria of 942
canonical IMAC proteins based on their architecture and motifs. We used OrthoMCL to assign 943
IMAC proteins their own ortholog numbers. 944
945
Since OrthoMCL database is heavily metazoan biased, we created an ortholog database with 946
eukaryotes listed in resources table. A modified version of OrthoMCL-DB [95] was used to create 947
a novel ortholog database using the above listed transcriptomes and genomes as well as the whole 948
OrthoMCL DB v5. To do this, an all-against-all BLAST using DIAMOND-BLASTP[76] was 949
conducted using each protein from the above data as queries. The DIAMOND-BLASTP collected up 950
to 1,000 hits with e-value of 1e-5 as a cutoff for putative homology. Blast results were clustered 951
using the OrthoMCL pipeline methodology [95]. 952
953
BLASTP was used to search for novel amoebozoan IMAC proteins through the new ortholog 954
database and using the previous amoebozoan ortholog protein as a query. Using a modified custom 955
pipeline was used to collect a novel IMAC orthologs. These methods generated various orthologs 956
with a similar protein architecture. We created a FASTA file of all novel eukaryote integrin 957
proteins based on their protein architecture. Clustering of these integrin proteins was performed 958
by CD- HIT [71], which used the condition of 0.95 global sequence identity. From CD-HIT output, 959
we used DIAMOND [76] to perform All-vs-ALL blast with database size of 50 million sequences. 960
Consequently, a custom script of Markov cluster (MCL) algorithm[89] was used to cluster the 961
output of DIAMOND. A custom PYTHON script was used to eliminate any MCL clustered integrin 962
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
40
proteins that were less than 500 AA. We used a custom PYTHON pipeline to collects ortholog 963
sequences that clustered to model metazoan IMAC orthologs listed above. 964
965
Sometimes, we fail to capture integrin proteins even after all these strenuous processes. Therefore, 966
PFAM ID of Integrins was used to grab any putative integrins from TRANSDECODER output. For 967
ITGA, we used FG-GAP (PF01839), and for ITGB we used PSI_integrin (PF17205), integrin_beta 968
(PF00362), integrin_b_cyt (PF08725) and integrin B tail (PF07965). We disregarded any protein 969
sequences with PFam ID lower than e-value of 1e-10. The identity of the putative integrins were 970
confirmed by INTERPROSCAN, SIGNALIP, TMHMM, DEEPLOC, and MEME-SUITE. 971
972
For each putative integrin protein, we examined proteins motifs with MEME-SUITE, and always 973
blasted our proteins of interest using BLASTP against NCBI’s GenBank NR database to check for 974
possible contaminations. For integrin proteins with a novel domains, the read depth of these 975
transcripts were examined with RSEM. The final amoebozoan integrin architectures were compared 976
with metazoan integrin proteins. A custom PYTHON script was used to create a presence/absence 977
binary (1,0) matrix of IMAC proteins across Amoebozoa. A custom R script were used to create a 978
heatmap. 979
980
In our current study these data are transcriptomic in nature and by this virtue some of the predicted 981
transcript may be truncated or missing. However, identifying full-length transcripts from short 982
reads from Illumina technology can be difficult because assembled transcript can be fragmented 983
or incomplete due to alternative splice sites and untranslated regions [96]. Therefore, it is possible 984
that some transcripts are not fully assembled, and therefore their corresponding predicted protein 985
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
41
is truncated, thus the transmembrane region and signal peptide region may be missing. The same 986
can be said for the absence of these proteins within our data, because transcriptomes contain only 987
genes being expressed at the time mRNA was harvested some genes within an organism’s genome 988
may be missed. None-the-less, our methodology permitted the unparalleled discovery of these 989
proteins within a whole supergroup, highlighting the utility of the method. 990
991
Manually assembled contigs in SEQUENCHER 992
Where necessary due to fragmentation, contigs of our automated de novo transcriptomic 993
assemblies as above from each gene of interest were blasted (BLASTN) back to the raw nucleotide 994
data and to the assembly for each transcriptome. Hits were collected and assembled using 995
SEQUENCHER v 5.4.6 (GeneCodes, Madison, WI, USA). The taxa which we had to take this 996
approach were Amphizonella sp. 2, Centropyxis aerophyla, Goncevia foncevia and Hyalosphenia 997
papilio, Pellita catalonica for ITGA, and Nebela sp. for both integrin proteins. 998
999
Presence of integrin in the Mastigamoeba genome 1000
To confirm the presence of these genes in the genome, we blasted the integrin transcripts of M. 1001
balamuthi to the whole shotgun genome data (CBKX00000000; 1002
https://www.ncbi.nlm.nih.gov/nuccore/CBKX000000000.1) on NCBI [33]. The common splicing 1003
sites were for searched in integrins intron and exon boundaries by assembling the integrin genomic 1004
contig and integrin transcript in SEQUENCHER v 5.4.6. The genome contigs of which contained 1005
integrin gene were searched for additional ORFs to examine the phylogenetic affinity of other 1006
ORFs on the genomic contigs. We inferred maximum likelihood phylogenetic tree of tenascin 1007
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
42
(adjacent to the ITGB gene on CBKX010020318.1) and PA14 (adjacent to the ITGB gene on 1008
CBKX010020319.1), we used same conditions as the integrin phylogenetic trees (see below). 1009
1010 Phylogenetic trees 1011
To look for evolutionary relationships of ITGA and ITGB within Amorphea, protein sequences 1012
were aligned by MAFFT-LINSI with the parameters “--maxiterate 1000” and “--local pair”. 1013
Ambiguous sites were trimmed from the alignments using BMGE [80] by gap penalty of 0.8 settings. 1014
Maximum likelihood (ML) trees were inferred from these trimmed alignments in IQTREE v 1.5.5 1015
[81]under the LG model with the C60 series model of site heterogeneity. Each tree is ML 1016
bootstrapped (MLBS) by 1,000 pseudoreplicates. From the resultant tree we used a custom 1017
PYTHON script implementing ETE-TOOLKIT (www.etetoolkit.org) to map protein domain 1018
architecture (IPRSCAN domain IDs) onto the tree. 1019
1020
Data and software availability 1021
The accession number for the Mycamoeba gemmipara and Mastigamoeba balamuthi 1022
transcriptome raw reads is NCBI-SRA: XXXXX and XXXXX also shown in Key resources table. 1023
All protein and nucleotide sequences of each gene, all protein alignments (trimmed and untrimmed) 1024
for phylogenetic trees, and phylogenetic trees are available on the DRYAD repository 1025
(https://datadryad.org/ | Accession: XXXXX). 1026
1027
1028
Supplemental Text 1029
SUPPLEMENTAL METHODS, NOTES, AND RESULTS 1030 Key resources Table 1031
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
43
Resources Sources Identifier Data Examined 1032 Acanthamoeba castellanii [56] https://doi.org/10.1186/gb-2013-14-2-r11 Acanthamoeba pyroformis [57] https://doi.org/10.1186/s13062-016-0171-
0 Acanthamoeba astronyxis PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/
PRJEB7687/ Acanthamoeba comandoni PRJNA354221 https://www.ncbi.nlm.nih.gov/bioproject/
PRJNA354221 Acanthamoeba culbertsoni PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/
PRJEB7687 Acanthamoeba divionensis PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/
PRJEB7687 Acanthamoeba healyi PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/
PRJEB7687 Acanthamoeba lenticulate PRJNA374454 https://www.ncbi.nlm.nih.gov/bioproject/
PRJNA374454 Acanthamoeba lugdunensis PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/
PRJEB7687 Acanthamoeba mauritaniensis PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/
PRJEB7687 Acanthamoeba pearcei PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/
PRJEB7687
Acanthamoeba polyphaga PRJNA307312 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA307312
Acanthamoeba quina PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba rhysodes PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acanthamoeba royreba PRJEB7687 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB7687
Acytostelium ellipticum PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Acytostelium leptosomum PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Acytostelium subglobosum [58] https://doi.org/10.1186/s12864-015-1278-x
Amoeba proteus [14] https://doi.org/10.1093/molbev/msx162
Amphizonella sp. 2 [59] https://doi.org/10.1016/j.cub.2019.01.078 Amphizonella sp. 7 [59] https://doi.org/10.1016/j.cub.2019.01.078 Amphizonella sp. 8 [59] https://doi.org/10.1016/j.cub.2019.01.078 Amphizonella sp. 9 [59] https://doi.org/10.1016/j.cub.2019.01.078 Amphizonella sp.10 [59] https://doi.org/10.1093/molbev/msx162
Arcella gibbosa [14] https://doi.org/10.1093/molbev/msx162
Arcella vulgaris [59] https://doi.org/10.1016/j.cub.2019.01.078
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
44
Armaparvus languidus MMETSP0420 https://www.imicrobe.us/project/view/304
Cavenderia deminutiva
PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Cavostelium apophysatum [14] https://doi.org/10.1093/molbev/msx162
Centropyxis aerophyla [59] https://doi.org/10.1016/j.cub.2019.01.078 Centropyxis sp. [59] https://doi.org/10.1016/j.cub.2019.01.078
Ceratiomyxa fruticulosa [14] https://doi.org/10.1093/molbev/msx162
Ceratiomyxella tahitiensis [14] https://doi.org/10.1093/molbev/msx162
Clastostelium recurvatum [14] https://doi.org/10.1093/molbev/msx162
Clydonella sp. [60] https://doi.org/10.1016/j.ympev.2016.03.029
Cochliopodium minus [14] https://doi.org/10.1093/molbev/msx162
Copromyxa protea [14] https://doi.org/10.1093/molbev/msx162
Coremiostelium polycephalum PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Cryptodifflugia operculata [14] https://doi.org/10.1093/molbev/msx162
Cunea sp. [JDS-Ruffled] [14] https://doi.org/10.1093/molbev/msx162
Cunea sp. [MMETSP0417] MMETSP0417 https://data.iplantcollaborative.org/dav/iplant/commons/community_released/imicrobe/projects/104/samples/1830/
Cyclopyxis sp. [59] https://doi.org/10.1016/j.cub.2019.01.078 Dermamoeba algensis [14] https://doi.org/10.1093/molbev/msx162
Dictyostelium citrinum [61] https://doi.org/10.1093/molbev/msn088 Dictyostelium discoideum [62] https://doi.org/10.1038/nature03481 Dictyostelium firmibasis [63] https://doi.org/10.1006/plas.1998.1385 Dictyostelium intermedium PRJNA45879 https://www.ncbi.nlm.nih.gov/bioproject/
PRJNA45879 Dictyostelium purpureum [64] https://doi.org/10.1006/10.1186/gb-2011-
12-2-r20 Difflugia bryophila [59] https://doi.org/10.1016/j.cub.2019.01.078
Difflugia sp. [14] https://doi.org/10.1093/molbev/msx162
Difflugia sp. D2 [59] https://doi.org/10.1016/j.cub.2019.01.078
Diplochlamys sp. [14] https://doi.org/10.1093/molbev/msx162
Dracoamoeba jomungandrii [57] https://doi.org/10.1186/s13062-016-0171-0
Echinamoeba exudans [14] https://doi.org/10.1093/molbev/msx162
Echinosteliopsis oligospora [14] https://doi.org/10.1093/molbev/msx162
Echinostelium bisporum [14] https://doi.org/10.1093/molbev/msx162
Echinostelium minutum [14] https://doi.org/10.1093/molbev/msx162
Endostelium zonatum [PRA-191] [57] https://doi.org/10.1186/s13062-016-0171-0
Endostelium zonatum LINKS [14] https://doi.org/10.1093/molbev/msx162
Entamoeba dispar PRJNA12916 https://www.ncbi.nlm.nih.gov/bioproject/12916
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
45
Entamoeba histolytica [65] https://doi.org/10.1038/nature03291 Entamoeba invadens [66] https://doi.org/10.1186/gb-2013-14-7-r77 Entamoeba moshkovskii PRJNA12921 https://www.ncbi.nlm.nih.gov/bioproject/
12921 Entamoeba nuttalli [67] https://doi.org/10.1016/j.molbiopara.2007.
02.006
Filamoeba nolandi MMETSP0413 https://www.imicrobe.us/assembly/view/293
Flabellula citata [14] https://doi.org/10.1093/molbev/msx162
Flamella aegyptia [14] https://doi.org/10.1093/molbev/msx162
Gocevia fonbrunei [14] https://doi.org/10.1093/molbev/msx162
Gocevia fonbrunei_Tekle_2016 [60] https://doi.org/10.1016/j.ympev.2016.03.029
Grellamoeba robusta [14] https://doi.org/10.1093/molbev/msx162
Heleopera sphagni [59] https://doi.org/10.1016/j.cub.2019.01.078 Heleopera sylvatica [59] https://doi.org/10.1016/j.cub.2019.01.078
Heterostelium multicystogenum PRJNA495730 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA495730
Heterostelium pallidum [61] http://dictybase.org/ Hyalosphenia elegance [59] https://doi.org/10.1016/j.cub.2019.01.078
Hyalosphenia papilio [59] https://doi.org/10.1016/j.cub.2019.01.078 Idionectes vortex PRJNA531640 https://www.ncbi.nlm.nih.gov/bioproject/?
term=Idionectes%20vortex Lesquereusia sp. [59] https://doi.org/10.1016/j.cub.2019.01.078
Lingulamoeba sp. [14] https://doi.org/10.1093/molbev/msx162
Luapelamoeba arachisporum [57] https://doi.org/10.1186/s13062-016-0171-0
Luapelamoeba hula [57] https://doi.org/10.1186/s13062-016-0171-0
Mastigamoeba abducta [14] https://doi.org/10.1093/molbev/msx162
Mastigamoeba balamuthi (genome) [33] http://doi.org/10.1073/pnas.1219590110
Mastigamoeba balamuthi (transcriptome)
SRA TBD To be provided
Mastigella eilhardi [14] https://doi.org/10.1093/molbev/msx162
Mayorella cantabrigiensis [14] https://doi.org/10.1093/molbev/msx162
Micriamoeba sp. [14] https://doi.org/10.1093/molbev/msx162
Microchlamys sp. [59] https://doi.org/10.1016/j.cub.2019.01.078 Mycamoeba gemmipara SRA TBD https://doi.org/10.1111/jeu.1235 Nebela sp. [59] https://doi.org/10.1016/j.cub.2019.01.078 Nematostelium gracile [14] https://doi.org/10.1093/molbev/msx162
Netzelia oviformis [59] https://doi.org/10.1016/j.cub.2019.01.078 Nolandella sp. [14] https://doi.org/10.1093/molbev/msx162
Ovalopodium desertum [14] https://doi.org/10.1093/molbev/msx162
Paradermamoeba levis [14] https://doi.org/10.1093/molbev/msx162
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
46
Paramoeba aestuarina MMETSP0161_2 https://www.imicrobe.us/assembly/view/205
Paramoeba atlantica MMETSP0151_2 https://www.imicrobe.us/assembly/view/210
Parvamoeba rugata [14] https://doi.org/10.1093/molbev/msx162
Parvamoeba monoura [32] https://doi.org/10.1016/j.ympev.2016.03.029
Pellita catalonica [14] https://doi.org/10.1093/molbev/msx162
Pelomyxa sp. [14] https://doi.org/10.1093/molbev/msx162
Phalansterium solitarium [14] https://doi.org/10.1093/molbev/msx162
Physarum polycephalum [68] https://dx.doi.org/10.1093/gbe/evv237 Pygsuia biforma [11] http://doi.org/10.1098/rspb.2013.1755
Planocarina carinata [59] https://doi.org/10.1016/j.cub.2019.01.078 Polysphondylium pallidum [69] http://dictybase.org/ Protacanthamoeba bohemica [14] https://doi.org/10.1093/molbev/msx162
Protosporangium articulatum [14] https://doi.org/10.1093/molbev/msx162
Protostelium nocturnum [14] https://doi.org/10.1093/molbev/msx162
Pyxidicula operculatum [59] https://doi.org/10.1016/j.cub.2019.01.078
Rhizamoeba saonxica [14] https://doi.org/10.1093/molbev/msx162
Rhizomastix elongate [14] https://doi.org/10.1093/molbev/msx162
Rhizomastix libera [14] https://doi.org/10.1093/molbev/msx162
Ripella sp. [14] https://doi.org/10.1093/molbev/msx162
Sapocribrum chincoteaguense MMETSP0437 https://www.imicrobe.us/assembly/view/314
Sappinia pedata [14] https://doi.org/10.1093/molbev/msx162
Schizoplasmodiopsis pseudoendospora
[14] https://doi.org/10.1093/molbev/msx162
Schizoplasmodiopsis vulgaris [14] https://doi.org/10.1093/molbev/msx162
Speleostelium caveatum PRJNA495862 https://www.ncbi.nlm.nih.gov/bioproject/PRJNA495862
Soliformovum irregularis [14] https://doi.org/10.1093/molbev/msx162
Squamamoeba japonica [14] https://doi.org/10.1093/molbev/msx162
Stenamoeba limacine [14] https://doi.org/10.1093/molbev/msx162
Stenamoeba stenopodia [14] https://doi.org/10.1093/molbev/msx162
Stygamoeba regulata MMETSP0447 https://www.imicrobe.us/assembly/view/316
Synstelium polycarpum PRJEB14640 https://www.ncbi.nlm.nih.gov/bioproject/PRJEB14640
Thecamoeba quadrilineata [60] https://doi.org/10.1016/j.ympev.2016.03.029
Thecamoeba sp. [14] https://doi.org/10.1093/molbev/msx162
Thecamoebidae isolate [RHP1-1] [14] https://doi.org/10.1093/molbev/msx162
Trichosphaerium sp. (ATCC 40318) MMETSP0405 https://www.imicrobe.us/assembly/view/300
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
47
Tychosporium acutostipes [14] https://doi.org/10.1093/molbev/msx162
Vannella fimicola [14] https://doi.org/10.1093/molbev/msx162
Vannella robusta MMETSP0166 https://www.imicrobe.us/?#/samples/2469
Vannella schaefferi MMETSP0417 https://www.imicrobe.us/assembly/view/295
Vannella sp. MMETSP0168 https://www.imicrobe.us/assembly/view/34
Vermamoeba sp. [14] https://doi.org/10.1093/molbev/msx162
Vermamoeba vermiformis [14] https://doi.org/10.1093/molbev/msx162
Vermistella antarctica [60] https://doi.org/10.1016/j.ympev.2016.03.029
Vexillifera bacillipedes [70] http://doi.org/10.1016/j.ympev.2014.08.011
Vexillifera sp. MMETSP0173 https://www.imicrobe.us/#/samples/1757
List of Stramenophile TAXA from MMETSP 1033 Alexandrium minutum MOORE-via-CAMERA-MMETSP0328 1034 Aureoumbra_lagunensis CCMP1509 MMETSP0890 1035 Bolidomonas_sp RCC1657 MMETSP1321 1036 Bolidomonas_sp RCC2347 MMETSP1320 1037 Bolidomonas pacifica MMETSP1319-RCC208 1038 Cafeteria_sp CaronLabIsolate MMETSP1104 1039 Chattonella_subsalsa CCMP2191 MMETSP0947 1040 Chrysocystis_fragilis CCMP3189 MMETSP1165 **** 1041 Dictyocha_speculum CCMP1381 MMETSP1174 1042 Dinobryon_sp UTEXLB2267 MMETSP0019 1043 Fibrocapsa japonica-CCMP1661 MOORE-via-CAMERA-MMETSP1339 1044 Florenciella_sp RCC1587 MMETSP1324 Stramenopiles 1045 Florenciella parvula MMETSP1323-RCC1693 1046 Heterosigma_akashiwo CCMP2393 MMETSP0292 1047 Love2098 non_described LOVEJOY CMP2098 MMETSP0990 1048 Mallomonas_sp CCMP3275 MMETSP1167 1049 Ochromonas_sp BG1 MMETSP1105 1050 Paraphysomonas_bandaiensis CaronLabIsolate MMETSP1103 1051 Paraphysomonas_vestita GFlagA MMETSP1107 1052 Pelagomonas_calceolata CCMP1756 MMETSP0886 1053 Pelagococcus_subviridis CCMP1429 MMETSP0882 1054 Phaeomonas_parva CCMP2877 MMETSP1163 1055 Pinguiococcus_pyrenoidosus CCMP2078 MMETSP1160 1056 Pseudopedinella_elastica CCMP716 MMETSP1068 1057 Rhizochromulina_marina cf CCMP1243 MMETSP1173 1058 Spumella_elongata CCAP955 1 MMETSP1098 1059 Thraustochytrium_sp LLF1b MMETSP0198 1060 Vaucheria_litorea CCMP2940 MMETSP0945 1061
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
48
1062 1063 Reagents
2-propanol(certified ACS) Ethanol (pure) 200 proof
Sigma-aldrich I9516-25ML
Critical Commercial Assay Kits
Nextera XT DNA Library Prep Kit Illumina FC-131-1096 Nextera XT index kit v2 set A Illumina FC-131-2001 Nextera XT index kit v2 set B Illumina FC-131-2002 Nextera XT index kit v2 set C Illumina FC-131-2003 Nextera XT index kit v2 set C Illumina FC-131-2003
1064 Software and Algorithms
CD-HIT [71] https://github.com/weizhongli/cdhit TRINITY V2.4.0 [72] https://github.com/trinityrnaseq/trinityrnase
q/ TRANSDECODER V 5.5.0 [73] https://transdecoder.github.io/ TMHMM V 2.0 [74] http://www.cbs.dtu.dk/services/TMHMM/ BOWTIE2 V 2.3.4.3 [75] https://sourceforge.net/projects/bowtie-
bio/files/bowtie2/2.3.3.1 DIAMOND V 0.9.25 [76] https://github.com/bbuchfink/diamond ORTHOMCL V 5.0 [77] http://orthomcl.org/orthomcl/ TRIMMOMATIC V 0.35 [78] https://github.com/timflutre/trimmomatic MAFFT-LINSI V 7 [79] https://mafft.cbrc.jp/alignment/server/ BMGE V 1.12 [80] ftp://ftp.pasteur.fr/pub/GenSoft/projects/BM
GE SEQUENCER v 5.4.6. http://www.genecodes.com/ BLAST 2.2.30+ https://blast.ncbi.nlm.nih.gov/ IQTREE v 1.5.5 [81] http://www.iqtree.org RSEM [82] https://github.com/deweylab/RSEM SIGNALIP 5.0 [83] http://www.cbs.dtu.dk/services/SignalP/ OYSTER RIVER PROTOCOL V 2.1.1 [84] https://oyster-river-
protocol.readthedocs.io/en/latest/ MEME-SUITE V 5.0.4 [85] http://meme-suite.org/index.html DEEPLOC -1.0 [86] http://www.cbs.dtu.dk/services/DeepLoc/ INTERPROSCAN 5.27-66.0 [23] https://www.ebi.ac.uk/interpro/search/seque
nce-search ETE3 [87] http://etetoolkit.org/documentation/ete-
view/ PFAM [88] https://pfam.xfam.org/ MCL [89] https://micans.org/mcl/
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
49
1065 1066 1067
Contact for Reagent and resources sharing 1068
Further information and request for resources should be directed to, and will be fulfilled by, the 1069
corresponding author Matthew W. Brown ([email protected]). 1070
1071
Sib (similar to integrin beta) proteins found in dictyostelids | Sib Proteins comparison with 1072
ITB-like 1073
SibA, a cell adhesion molecule that binds to phagocytic particles similar to ITB, was discovered 1074
in Dictyostelium discoideum [45]. Sib proteins (A to E) are capable of binding to talin and SibC 1075
regulates cell adhesion [46]. However their protein structure is different from the canonical ITB 1076
in a lack of three cation binding motifs as well as a cysteine rich region at the C-terminus [26]. 1077
Although the varioseans, described above, are phylogenetically close to dictyostelids their type 1078
two ITB-like proteins with three cation motifs are distinct from Sib proteins, though with a great 1079
variation in amino acid sequences. Interestingly, we did not observe Sib proteins outside of the 1080
Dictyostelia clade, suggesting that they evolved recently in the dictyostelids. The protein 1081
architecture of SibA has a similar composition to metazoan ITB solely with a GXXXG motif in 1082
the transmembrane region, but it is observed in type II ITB (see supplemental table 2). In 1083
addition, LamG3, vWD domains in type II ITB have never been observed in Sib. 1084
1085
1086 1087 1088 1089 1090 1091 1092
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
50
1093 1094 1095 1096
1097 Supplemental Figure 1: Repertoire of IMAC including filamin and tensin in Amoebozoa: 1098 names of IMAC proteins are listed at the top and names of amoebozoan taxa are listed on side of 1099 the map. Arrows indicates amoebozoan species that have nearly complete set of IMAC proteins. 1100
Sappinia pedataThecamoeba sp.Thecamoebidae isolate [RHP1-1]
Stenamoeba limacina
Dermamoeba algensisMayorella cantabrigiensis
Squamamoeba japonicaSapocribrum chincoteaguenseArmaparvus languidus
Entamoeba nuttalliEntamoeba moshkovskiiEntamoeba invadensEntamoeba disparEntamoeba histolytica
Rhizomastix liberaRhizomastix elongataPelomyxa sp.
Mastigamoeba abductaMastigamoeba balamuthiMastigella eilhardi
Dictyostelium discoideumPolysphondylium pallidum
Clastostelium recurvatumProtosproangium articulatumCeratiomyxa fruticulosa
Physarum polycephalum
Echinostelium bisporumEchinosteliopsis oligospora
Phalansterium solitariumFlamella aegyptia
Filamoeba nolandiProtostelium nocturnumSoliformovum irregularisGrellamoeba robusta
Ceratiomyxella tahitiensisAcramoeba dendroidea
Cavostelium apophysatum
Schizoplasmodiopsis vulgarisSchizoplasmodiopsis pseudoendosporaTychosporium acutostipes
Trichosphaerium sp. Diplochlamys sp.Amphizonella sp. 2 Amphizonella sp.10Amphizonella sp. 8Amphizonella sp. 9Amphizonella sp. 7
Rhizamoeba_saxonicaFlabellula citata
Nolandella sp.Amoeba proteusCopromyxa protea
Microchlamys sp.Pyxidicula operculatumCryptodifflugia operculata
Heleopera sylvatica
Hyalosphenia papilioNebela sp.
Lesquereusia sp.Difflugia sp. D2Difflugia bryophila
Centropyxis aerophyla
Arcella vulgaris
α−Int
egrin
β−Int
egrin
Vincu
linPa
xillin
TalinPINCH
FAK
ILK α−Ac
tinin
ILKILK
Taxon
Vermamoeba vermiformisEchinamoeba exudansVermamoeba sp.
Heleopera sphagni
Echinostelium minutum
Nematostelium gracile
Stenamoeba stenopodiaThecamoeba quadrilineata
Paradermamoeba levis
Micriamoeba sp.
Paramoeba aestuarinaParamoeba atlanticaVexillifera sp. [MMETSP0173]Stygamoeba regulata
Ripella sp.Lingulamoeba sp.Vannella robustaVannella sp.
Ovalopodium desertumParvamoeba rugataGocevia fonbruneiCochliopodium minus
Acanthamoeba castellanii
Dracoamoeba jomungandriiProtacanthamoeba bohemica Luapelamoeba hula Luapelamoeba arachisporum
Pellita catalonicaEndostelium zonatum [PRA-191] Endostelium zonatum LINKS
Vermistella antarcticaVexillifera minutissima
Cunea sp. [JDS-Ruffled]
Clydonella sp.
Cunea sp.
Acanthamoeba pyroformis
ILK
TubulineaEvosea
Discosea
Mycamoeba Gemmipara
Filam
inTe
nsin
Amoebozoa
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
51
Dark blue indicates the presence of a canonical IMAC protein, white indicates the absence of an 1101 IMAC protein and light blue indicates a truncated form of IMAC protein. 1102 1103 1104
1105 Supplemental Figure 2: Repertoire of IMAC in only amoebozoan species that have either 1106 integrin alpha or beta. Names of IMAC proteins are listed at the top and names of amoebozoan 1107 taxa are listed on side of the map. Arrows indicates amoebozoan species that have nearly 1108 complete set of IMAC proteins. Dark blue indicates the presence of a canonical IMAC protein, 1109 white indicates the absence of an IMAC protein and light blue indicates a truncated form of 1110 IMAC protein 1111 1112 1113
Sappinia pedataThecamoeba sp.
Thecamoebidae isolate [RHP1-1]
Stenamoeba limacina
Dermamoeba algensis
Mayorella cantabrigiensis
Squamamoeba japonicaSapocribrum chincoteaguense
Armaparvus languidus
Entamoeba nuttalli
Entamoeba invadensEntamoeba disparEntamoeba histolyticaRhizomastix liberaRhizomastix elongataPelomyxa sp.
Mastigamoeba abductaMastigamoeba balamuthiMastigella eilhardi
Dictyostelium discoideum
Clastostelium recurvatumProtosproangium articulatumCeratiomyxa fruticulosa
Physarum polycephalumEchinostelium bisporumEchinosteliopsis oligospora
Phalansterium solitariumFlamella aegyptia
Filamoeba nolandiProtostelium nocturnum
Soliformovum irregularisGrellamoeba robusta
Ceratiomyxella tahitiensisAcramoeba dendroidea
Cavostelium apophysatum
Schizoplasmodiopsis vulgaris
Schizoplasmodiopsis pseudoendosporaTychosporium acutostipes
Trichosphaerium sp. Diplochlamys sp.
Amphizonella sp. 2
Amphizonella sp.10Amphizonella sp. 8Amphizonella sp. 9Amphizonella sp. 7
Rhizamoeba_saxonicaFlabellula citata
Nolandella sp.
Amoeba proteus
Microchlamys sp.Pyxidicula operculatum
Cryptodifflugia operculata
Heleopera sylvatica
Hyalosphenia papilioNebela sp.
Lesquereusia sp.
Difflugia sp. D2Difflugia bryophila
Centropyxis aerophyla
Arcella vulgaris
α−Int
egrin
β−Int
egrin
α−Pa
rvin
β−Pa
rvin
Vincu
linPa
xillin
TalinPINCH
FAK
ILK α−Ac
tinin
Amoebozoa
Choanoflagellates
integrins
integrins
paxvintalpinILK
pin
FAK
αβ
c-Src FAKLore
Tal
PinILKPar
Pax
Vin
αA
Extracellular
Actin
Cytoplasmic
OpisthokontaHolozoa
Fungi
Innovation/GainLoss
par
par
Obazoa
ILK
ILK
Other EukaryotesOther Eukaryotes
Canonical Protein PresentTruncated Protein PresentProtein Absent in DataNearly Complete IMAC
Taxon
Amorphea
Animals
Vermamoeba vermiformisEchinamoeba exudans
Vermamoeba sp.
Heleopera sphagni
Echinostelium minutum
Nematostelium gracile
Stenamoeba stenopodia
Thecamoeba quadrilineata
Paradermamoeba levis
Micriamoeba sp.
Paramoeba aestuarinaParamoeba atlanticaVexillifera sp. [MMETSP0173]
Stygamoeba regulata
Ripella sp.Lingulamoeba sp.
Vannella robustaVannella sp.
Ovalopodium desertumParvamoeba rugata
Gocevia fonbrunei
Cochliopodium minus
Acanthamoeba castellanii
Dracoamoeba jomungandriiProtacanthamoeba bohemica
Luapelamoeba hula Luapelamoeba arachisporum
Pellita catalonica
Endostelium zonatum [PRA-191] Endostelium zonatum LINKS
Vermistella antarctica
Vexillifera minutissima
Cunea sp. [JDS-Ruffled]
Clydonella sp.
Cunea sp. [MMETSP0417]
Acanthamoeba pyroformis
TubulineaEvosea
Discosea
AB
Capsaspora
ThecamonasPygsuia
Mycamoeba Gemmipara
Acanthamoeba quina
Acanthamoeba healyiAcanthamoeba astronyxisAcanthamoeba commandoniAcanthamoeba culbertsoniAcanthamoeba divionensisAcanthamoeba healyiAcanthamoeba lenticulateAcanthamoeba lugdunensisAcanthamoeba mauritaniensisAcanthamoeba pearceiAcanthamoeba polyphaga
Cyclopyxis sp
Difflugia sp. USP
Centropyxis sp.Planocarina carinata
Hyalosphenia elegance
Arcella GibbosaNetzelia oviformis
Copromyxa protea
Ptoleamoeba bullinesis
Idionectes vortex
Entamoeba moshkovskii
Dictyostelium citrinum
Tieghemostelium
lacteumDictyostelium firmibasis
Dictyostelium intermedium
Dictyostelium purpureumSpeleostelium caveatumCavenderia deminutivaAcytostelium ellipticumAcytostelium leptosomumAcytostelium subglobosumSynstelium polycarpum
Coremiostelium polycephalum
Heterostelium pallidum
Heterostelium multicystogenum
Vannella schaefferi
Parvamoeba monoura
Vannella firmcola
Gocevia fonbrunei_Tekle_2016
Acanthamoeba rhysodesAcanthamoeba royreba
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
52
1114 Supplemental Figure 3: Maximum likelihood tree of ITA homolog and other cell adhesion 1115 molecules that contain beta-propellers. Bootstrap values ≥50 are shown on the branch points. 1116 Phylogenetic tree was built with IQtree v1.5.5 under under the LG+C60+F+G model of protein 1117 evolution ML bootstrap (MLBS) (1000 ultrafast BS reps) values respectively. The protein 1118 sequences were aligned by MAFFT with the parameter linsi, maxiterate 1000 and local pair. 1119 BMGE was used to mask the alignment with the parameter of gap penalty of 0.8. ITA homolog 1120 phylogenetic tree are rooted with Linkin protein. Amoebozoan ITA genes are mostly 1121 concentrated on a single clade (colored) except for the Flabellula citata paralog. 1122 1123 1124 1125 1126 1127 1128 1129 1130 1131 1132 1133 1134 1135 1136 1137 1138
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
53
1139 Supplemental Figure 4: Maximum likelihood tree of ITB homolog and Sib protein. Bootstrap 1140 values ≥50 are shown on the branch points. Phylogenetic tree was built with IQtree v1.5.0 under 1141 the LG+C60+F+G model of protein evolution ML bootstrap (MLBS) (1000 ultrafast BS reps) 1142 values respectively. The protein sequences were aligned by MAFFT with the parameter linsi, 1143 maxiterate 1000 and local pair. BMGE was used to mask the alignment with the parameter of 1144 gap penalty of 0.6. ITB homolog phylogenetic tree are rooted with Sib protein. Amoebozoan ITB 1145 genes are mostly concentrated on a single clade (colored). 1146 1147 1148 1149 1150 1151 1152 1153 1154 1155 1156 1157
100
0.5
Metazoan_ITGB
Opistho-2_34916_Pigoraptor
8 7
97
93
100
100
91
92
78
100
100
90
93
96
99
92
100
78
7594
97
98
92
99
98
90
75
86
90
100
91
100
96
91
99
90
100
100
Schizoplasmodiopsis_vulgaris_TR1491
Cavostelium_apophysatum_c13787
Filamoeba_nolandi_DN12164
Soliformovum_irregularis_c14427
Clastostelium_recurvatum_c10578
Trichodesmium erythraeum_ABG51146.1_ITGB-like
Rhodobacteraceae bacterium_KPP84658.1_ITGB_like
Didymoeca costata_m.142452_ITGB-like
Pygsuia_biforma_ITGB
Lenisia_limosa_LenisMan46
Thecamonas_trahens_AMSG_06622T0
Capsaspora_owczarzaki_CAOG04086T0
Capsaspora_owczarzaki_CAOG02005T0
Capsaspora_owczarzaki_CAOG05058T0
Capsaspora_owczarzaki_CAOG01283T0
Rigifila_ramosa_a2731185122
Rhizamoeba_saxonica_TR7441
Rhizamoeba_saxonica_TR11158
Rhizamoeba_saxonica_TR7856
Cryptodifflugia_operculata_c2
Pyxidicula_operculatum_TR11783
Heleopera_sphagni_DN3395
Difflugia_sp._D2_TR12857
Difflugia_sp._USP_TR583
Nebela_sp._0_TR6609
Hyalosphenia_papilio_DN7101
chinamoeba_exudans_TR10338E
Flabellula_citata_135911
Mastigamoeba_balamuthi_MMETSP0035-20110809
Sphaeroforma arctica_XP_014159356.1_ITGB
Ministeria_vibrans_c46
Trichoplax sp. H2_3000888_ITGB6
Trichoplax sp. H2_3000889_ITGB1
Sib_Dictyostelium discoideum
Pigoraptor_vietnamica_104197_Paralog2
Pigoraptor_chileana_93177_Paralog1
Syssomonas_multiformis_24197
Creolimax_fragrantissima_CFRG6586T1
Pigoraptor_vietnamica_12581_Paralog1
Pigoraptor_chileana_34909_Paralog2
Idionectes_vortex_TRINITY_DN4916_c4_g12_paralog1
Idionectes_vortex_TRINITY_DN4117_C0_G1_I1_paralog2
Mastigella_eilhardi_A1302_1610_2327
Ministeria_vibrans_c334
Ministeria_vibrans_c191
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
54
1158 Supplemental Figure 6: Most significantly enriched motifs in Obazoan ITA, obtained by 1159 MEME-suite. (A) Consensus cation motif and FG-GAP obtained by RNA-seq for an Obazoan 1160 ITA, (B) EGF motif found exclusively in Ministeria vibrans (C) GFFKR motif found in Obazoa, 1161 (D) immunoglobulin like motif found in Obazoa. 1162 1163
ITGA Obazoan Motif
MEME (no SSC) 16.04.2019 22:29
0
1
2
3
4
bits
1ANTDYG
2ADHGSE
3KNSRLQE
4ACLSF
5DG
6GESFY
7RATS
8SLVMI
9IVSA
10
ADLRSTVEGNP
11
PVLGIA
12
ARKG
13
D14
SIVL
15
DN16
ADQGKNR
17
D18
G19
RCYIVF
20
DGLNTQP
21
D22
FILMSTV
23
ILAV
24
MVI
25
SG
26
CMA
27
P
28
AKMRTGNYF
29
ADFKTY
A.
B.
MEME (no SSC) 26.06.2019 15:18
0
1
2
3
4
bits
1
C2M
WL
3LSY
4GQA
5HG
6DLP
7
G8Q
RE9H
Y10
CDFS
11
C12
AGSV
13
VYC
14EP
C.
MEME (no SSC) 16.04.2019 22:34
0
1
2
3
4
bits
1ACLFG
2FVAIM
3GAL
4VWAKY
5ARSHK
6GMRAKL
7PHG
8AMSF
9F 10
LRDAK
11
AER
12
HNK
13
KLMTAP
14
TKP
15
T
A
S
P16
TKEP
17
EHMNQAT
18
DQSYFL
19
HDVE
20
GLMNTE
D.
MEME (no SSC) 26.06.2019 16:24
0
1
2
3
4
bits
1IYV
2LTI
3AFSPR
4LYKN
5FISV
6LRTVYG
7AFV
8GIKRSTV
9N 10
RLNT
11
G
12
KTP
13
GHQSA
14
QT
15
A16
GKF
17
APDG
18
CILNTV
19
HQST
20
ILV
21
KLNYT
22
FLI
23
QST
24
FIMSTL
25
LP26
RSTVYA
27
ASG
28
MV29
FSTVYE
30
VFL
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
55
1164 Supplemental Figure 7: Most significantly enriched motifs in Obazoan ITB, obtained by MEME. 1165 (A-D) Cation binding motifs (A) MIDAS binding motif DXSXS (B) Possible fifth position of 1166 MIDAS and ADMIDAS, (C) ITB SyMBS (green) and MIDAS (Yellow), (D) Possible E of 1167 SyMBS (first amino acid of SyMBS) (E-F) Cysteine rich motif, (E) PSI domain, (F) Cysteine-1168 rich motif stalk, (G-H) C-terminal motifs (G) NPXY motif, (H) Possible KLXXXD motif 1169 1170 1171 1172 1173 1174 1175 1176 1177 1178 1179 1180 1181 1182 1183 1184 1185 1186 1187 1188 1189 1190
ITGB Obazoan Motif
MEME (no SSC) 16.04.2019 22:32
0
1
2
3
4
bits
1STDC
2FGHTWYLVA
3QSKNYR
4WFMVIL
5QRG
6IMLF
7ESAG
8NVGTS
9HYF
10
ALIV
11
NED
12
RK
13
HSP
14
KLMNIQRAV
15
ADEGINQVTSY
16
VP
17
LMSAIF
MEME (no SSC) 16.04.2019 20:55
0
1
2
3
4
bits
1AWLVFY
2RK
3CVGIL
4AGFVYI
5AHITVGNQ
6HKWGRTMV
7HKLQRYIMA
8QSLAMR
9AEND
10
NRK
11
NQKR
12
ADLQE
13
W14
DGNRSAKQE
15
AEQRK
16
HWF
17
HQYNE
18
ESQRAK
19
GIKADQE
20
LMQKR
21
DGRSAKQE
MEME (no SSC) 27.06.2019 17:57
0
1
2
3
4
bits
1IRTV
2PQVKT
3ILRSTVP
4RVANSPQ
5RSTVEPAGQ
6DGLPQSYRC
7IQYPK
8DEKSGNP
9IVL
10
ED11
FL12
ILVFY
13
IVFYL
14
IVL
15
VFELM
16
D17
IAVFL
18
TS
19
DQYSAG
20
GTS
21
FM
22
DEFLARSG
23
GKSND
24
QD
25
IMVKL
26
HKTVNRDA
27
EFSVNKT
28
IMVL
29
VAEQRK
MEME (no SSC) 16.04.2019 22:28
0
1
2
3
4
bits
1KR
2KQHR
3LMI
4MVIL
5FL 6FIMYVL
7GIVSLA
8ST
9ND
10
NSDA
11
GLNPQTD
12
GTAPSFY
13
DEQH
14
QYFMIL
15
GNVERA
16
FKMTG
17
ED
18
YASG
19
FGNPSTYELRA
20
LMSIKRA
21
QRVYILA
22
MRSYG
A.
MEME (no SSC) 16.04.2019 22:33
0
1
2
3
4
bits
1
C
2ASVYT
3ATD
4ILYFK
5GLRT
6INDT
7
C8D
EHKNR
9ESD
10
C11
NVLI
12
TN13
GDN
14
ADGKPL
15
ENRSTA
16
LMI
17
AYTP
18
EMNQTG
19
C20
SG
21
W22
C23
AEIVG
24ANQSTD
25
FHMTD
26
PQG
27
QSGT
28
IC
29
GQLR
MEME (no SSC) 16.04.2019 22:28
0
1
2
3
4
bits
1W 2AGNSK
3LNQTGAS
4AGNQTD
5EGANDQ
6N 7P 8HL
9YF 10
EVKQ
11
DAPS
12
ALNS
13
FTV
14
DILRTK
15
QE16
HYFT
17
INSVDE
18
N19
P20
VTIL
21
FY
MEME (no SSC) 16.04.2019 22:31
0
1
2
3
4
bits
1LNVTI
2AHLQRST
3DSEG
4DN
5KLRNI
6HD
7AFHQTYW
8SAP
9NE
10
NQDAS
11
ASVGMP
12
ISTFL
13
DE14
GA15
AML
16AFSVLM
17
HQ
18
ISVA
19
SVILA
20
KRLVA
21
C22
PQDKRT
23
ASDNTK
24
MRSEIL
25
LSTIV
26
KDNG
27
W28
GPAR
29
AENS
MEME (no SSC) 16.04.2019 22:29
0
1
2
3
4
bits
1NC
2ADGNS
3NPYG
4FKQRH
5
G6A
ENVST
7EFC
8ADLQESTNV
9
C
10
ASG
11
ALQREVS
12
C
13
NQTEKV
14
C
15
AFNED
16
DETGSA
17
ADLNG
18
FYW
19
ADFGIVTS
20
LQSG
21
IQAEPSDV
22
FGKTVSD
23
TC
24
GKEDTS
25
FC
B.
C. D.
E. F.
G. H.
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
56
1191 1192 1193 1194
1195 Supplemental Figure 8: Most significantly enriched motifs in Sib protein, obtained by MEME. 1196 (A)-(D) Cation binding motifs (A) Possible binding motif of MIDAS and ADMIDAS, (B) 1197 Possible fifth position of MIDAS and ADMIDAS (C) Possible binding motif of SyMBS (D) 1198 NPXY motif 1199 1200 1201 1202 1203 1204 1205 1206 1207
MEME (no SSC) 27.06.2019 20:22
0
1
2
3
4
bits
1IL 2VL
3MQ
4D 5LV 6ST
7KQA
8ANS
9NF
10
GSTA
11
YS12
ED13
FIL
14
PAS
15
ILTY
16
LVI
17ARK
18STA
19FSVQ
20
AIL
21
LNSTV
22
QGS
23
ILTV
24
LVYF
25
FNSWY
26
MNQRT
27
CILV
28
STVK
29
ADTQ
30
ITV
31
FY
32
AP
33
KSYN
34
VA
35
QRK
36
ILF
37
AG
38
FILV
39
GTA
40
TS
41
F
42
S
43
D
44
K
45
P
46
LI
47
SA
48
NP
49
FY
50
GA.
B.
C.
D.
Sib Motif
MEME (no SSC) 17.09.2019 18:05
0
1
2
3
4
bits
1DG
2GD
3AG
4V 5NS 6ATS
7N 8P 9LM 10
Y
11
QE
12
EQ
13
S
14
GKA
15
NRS
16
GSA
17
GA
18
IE
19
N
20
P
21
L
22
Y23
ELQ
24
AS
25
AS
26
NS
27
DE
MEME (no SSC) 17.09.2019 18:02
0
1
2
3
4
bits
1DC
2EGIPV
3GKTD
4GKN
5FW
6TD
7STP
8AITE
9KYS
10
DKQN
11
C12
LYF
13
DE
14
C15
LKS
16
TDK
17
G18
FHY
19
Y20
G21
DQE
22
NK
23
C24
ITVD
25
VAP
26
LTV
27
DP28
LP29
C30
DKV
31
KN32
G33
EIV
34
SP35
N36
GES
37
G38
SVI
39QLN
40G
41ND
42
G43
K44
C45
MTYL
46
C47
ISYN
48
YN49
G50
YWMEME (no SSC) 17.09.2019 17:57
0
1
2
3
4
bits
1FIV
2N 3HRW
4
S
5P 6V 7TS
8NG
9M 10
ILMV
11
P
12
I
13
IYV
14
PRTQ
15
LV
16
NTVI
17
RYA
18
DGN
19
RK
20
N21
N22
QN23
F24
LNR
25
VI26
API
27
A28
NS29
D30
PQ31
HN32
GKLV
33
RDQ
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
57
Supplemental Table 1: The table shows the list of Amorphea ITGA proteins characteristic 1208 including: length, beta propeller motifs, subcellular location, any extra domains attached to 1209 ITGA, c-terminal motif signal peptide, transmembrane regions and type of ITGA. The order of 1210 taxons are listed in accordance to the Figure 2. The subcellular localization is displayed with 1211 likelihood value in parentheses. The location of extra domains attached to ITGA are shown as 1212 the C-terminal end (CTE) or at the start of N-terminal (NTB). The presence of ITGA interacting 1213 motif with ITGB is shown as GFFKR and GXXXG. The location of beta propeller motifs 1214 performed by MEME-suite is displayed. The presence of FG-GAP motifs are shown in asterixis. 1215 The location of beta motifs that corresponds to IPRSCAN output are highlighted by parentheses. 1216 The presence of consensus cation motifs are displayed in number 2 superscripts. 1217
Species Size of AA
Subcellular localization Predictability based on likelihood
Extra Domain C-terminal end = CTE N-terminal Beginning =NTB
GFFKR motif GXXXG motif
Location of Beta propellor *= predict to possess FG-GAP 2 = Cation motif (D/E-h-D/N-x-D/N-G-h-x-D/E)
Signal peptide(SP) or transmembrane (TM)
Type I and II
Filamoeba nolandi (CAMPEP_0168571262)
631
Cell membrane protein Likelihood (0.971)
PRICHEXTENSN(CTE) protein kinase (CTE)
GFFKR GIAVG
16, 47, [109], 171
TM II
Ministeria vibrans Paralog 2 (2753598)
917 Extracellular (0.9862) Soluble (1) Likelihood (0.997)
EGF 2 repeat (NT)
1952, 2552, [319]2, [436]2*, [581]2, 772, 8122
II
Rhizamoeba saxonica (TR11094_c0_g1_i1)
449 Cell membrane protein Likelihood (0.437)
None [55], 1222, 184, 2542*
SP, TM I
Flabellula citata paralog3 (c2461_g1_i2)
673 Cell membrane protein Likelihood (0.301)
None 57*, 121, 184*, 251*, 313
SP, TM I
Cryptodifflugia operculata Paralog 2 (c8521_g1_i1)
1200
Cell membrane Protein Likelihood (0.257)
(NTB) Thrombospondin type -1 repeat, EGF 4 repeat Intramolecular chaperone
8372, 8892*, [1033]2, [1107]2
SP II
Nolandella sp. AFSM (TR12097c0_g1_i1)
975 Cell membrane Protein Likelihood (0.672)
(NTB) Thrombospondin type -1 repeat, EGF 4 repeat
613, [670]2*, [734]2*, [805]2*, 934
SP II
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
58
Arcella intermedia (c23786_g1_i1)
1074
Nucleus membrane Protein Likelihood (0.58)
(NTB) Thrombospondin type -1 repeat, EGF 4 repeat Intramolecular chaperone
[899]2*, [965]2* SP II
Flabellula citata paralog1 (c9004_g2_i1)
626 Cell membrane Protein Likelihood (0.994)
EGF (CTE) 123, [259], [323]2*
SP II
Flabellula citata paralog2 (c8227_g1_i1)
713 Cell membrane Protein Likelihood (0.669)
EGF (CTE) 88*, 150*, 255, 310
TM II
Ministeria vibrans Paralog 1 (2757223)
860 Peroxisome (0.3325), Soluble (0.9929) Likelihood (0.386)
None 582, [125]2*, 185, 243, 310, 3872, 448, [578]2*, [647], 748
I
Flabellula citata paralog5 (c8994_g2_i16)
604 Extracellular (0.5213) Soluble (0.9842) Likelihood (0.622)
None 113, 172, [302]*
I
Flabellula citata paralog4 (c8386_g1_i3)
646 Cell membrane Protein Likelihood (0.833)
None 53, [116]*, [178]*
SP, TM I
Cryptodifflugia operculata Paralog 1 (c43306_g1_i1)
581 Cell membrane Protein Likelihood (0.445)
None 582, [120]*2, [188]*, 2612, 3252, 383*, [449]2
SP, TM I
Amphizonella sp 9 paralog2 (TR2587c1_g1_i1)
565 Extracellular (0.9902) Soluble (0.9998) Likelihood (0.997)
None [47], 121, [198]1, 262, [344]2, [414]2, 4822
SP I
Mastigamoeba balamuthi paralog1 (MMETSP0035-
201108093729)
813 Cytoplasm (0.3215) Soluble (0.8872) Likelihood (0.493)
Cadherin/Ig-like fold (CTE)
66, [200]2, 268, [345], 416, 468, [540]
I
Amphizonella sp 8 Paralog 2 (TR9459c0_g1_i1)
569 Vacuole (0.4562) Soluble (0.8119) Likelihood (0.362)
None [71],138, 207, 274 [353], 419
SP I
Heleopera sylvatica (Gene.370_NODE_157)
743 Cell membrane Protein Likelihood (0.493)
ADP-ribose (NTB)
GIVAG [56], 139, 2152, [280]2
TM II
Nebela sp. (Gene.16036_NODE_111)
839 Peroxisome (0.3381), Soluble (0.6007)
Cadherin (CTE)
[36], [109], [179], [244],
TM I
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
59
Likelihood (0.125) [320], [388], [455]
Difflugia sp. D2 (DN18414_c0_g1_i5)
266 Organelle (0.3485) soluble (0.8094) Likelihood (0.287)
None [55], [122], [190]
I
Protosproangium articulatum Paralog3 (TR6387|m.15047)
671 Cell membrane Protein Likelihood (0.937)
None GAVVG
[48], [124], [194]
TM I
Ceratiomyxa fruticulosa (c6850_g1_i1)
565 Vacuole (0.4506) membrane (0.7419) Likelihood (0.215)
None 742, [127], 198, 259, 321, 393, 471
SP, TM I
Diplochlamys sp. Paralog2 (m.4379)
666 Cell membrane Protein Likelihood (0.989)
None GAIAG [120]2, 249, [311]2,
SP, TM
Paramoeba atlantica (2184673)
623 Cell membrane Protein Likelihood (0.498)
none 69, [125], 198, [337]2, 4252
SP, TM I
Mastigamoeba balamuthi paralog2 (MMETSP0035-
20110809854)
1042
Cell membrane Protein Likelihood (0.674)
Cadherin/Ig-like fold (CTE)
[133], 202, 265, 343,522 613, [685], 754
TM I
Amphizonella sp 9 Paralog3 (Gene.962_NODE_597)
685 Cell membrane Protein Likelihood (0.87)
PRICHEXTENSN(CTE)
68, [133]2, [195], 255, [287]2, [380]2*, 455, 5202, 586
SP, TM II
Amphizonella sp 9 paralog1 (TR17939c0_g1_i1)
677 extracellular (0.8519) soluble (0.8141) Likelihood (0.936)
None [62], [124], [199]2, [268]*, [337], [409]2, [488]2
SP I
Diplochlamys sp. Paralog1 (2112184)
540 Vacuole (0.559) Soluble (0.5526) Likelihood (0.639)
None [68], 143, 397, [468]
SP I
Vermistella antarctica (156452)
570 Extracellular (0.7931) Soluble (0.9906) Likelihood (0.905)
None 83, 224, 2992, 368, 4422, 516
SP I
Protosproangium articulatum Paralog4 (TR6387_c1_g4_i7)
631 Vacuole (0.4093) Soluble (0.8173) Likelihood (0.459)
None [108], [180], 248, 329, [390]*, [487], [567]*
SP I
Amphizonella sp 8 Paralog 1
656 Extracellular (0.8352) Soluble (0.992)
PRICHEXTENSN(CTE)
[53], [125], [200]2, 2662,
SP II
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
60
(TR6668_c0_g1_i4)
Likelihood (0.964) 3322, [403]2, 473
Amphizonella sp. 10 Paralog1 (TR4717c0_g1_i1)
563 Cytoplasm (0.625) Soluble (0.8464) Likelihood (0.758)
PRICHEXTENSN(CTE)
16, 87, [161], 237, 2992, 370, [438]
II
Amphizonella sp 2 (TR2129_c1_g3_i9)
544 Endoplasmic reticulum (0.2477) Soluble (0.5541) Likelihood (0.224)
none 111, [189]2, [257], [334]2, [403]2, 474, [539]2
SP I
Endostelium zonatum LINKS (789702)
683 Cell membrane Protein Likelihood (0.451)
PRICHEXTENSN(CTE)
2322, [295], 432, [497]2, [572]2, [647]
SP II
Endostelium zonatum (m.5907)
681 Cell membrane Protein Likelihood (0.479)
PRICHEXTENSN(CTE)
2312, [293], 430, [495]2, [570]2, [645]
SP II
Protosproangium articulatum Paralog5 (TR2944c0_g1_i1)
735 Extracellular (0.4086) Soluble (0.4867) Likelihood (0.572)
none [70], [137], [200], [268], [344], 407, 480
SP I
Protosproangium articulatum Paralog1 (TR2074c0_g1_i2)
851 Cell membrane Protein Likelihood (0.448)
protein kinase GIVFG 572, [131]2, [201], [275]2, [344], 418, 485
SP, 2TM II
Protosproangium articulatum Paralog2 (TR2817c1_g1_i1_6465)
685 Cell membrane Protein Likelihood (0.997)
None GVGLG [105], 176, [240], 303, [378], 450, 512
TM I
Stenamoeba stenopodia (2782329)
662 Cell membrane Protein Likelihood (0.93)
None [168]2, 3102, 4442, 5142
SP, TM I
Gocevia fonbrunei Paralog1 (DN14950_c7_g1_i2)
869 Cell membrane Protein Likelihood (0.999)
Cadherin/Immunoglobulin-like fold (CTE)
GAIIG [66]2, [207], 281, [348], 417, [488]2,
TM II
Gocevia fonbrunei Paralog2 (DN13664_c1_g1_i1)
665 Extracellular (0.5821) Soluble (0.9361) Likelihood (0.656)
None 52, [121], 194, 257, [318]2, [386]2*, 463
SP I
Amphizonella sp. 10 Paralog2 (TR11506c6_g2_i3)
523 Extracellular (0.9162) Soluble (0.9984) Likelihood (0.929)
None [56], [129]2, [201], [273]2, 3402, 3972, [468]
SP I
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
61
Ectocarpus siliculosus
1105
Cell membrane Protein Likelihood (0.758)
PRICHEXTENSN(CTE)
[56], [127], 195, 260, 326, [396], 464*
SP, TM II
Ministeria vibrans Paralog 3 (m.23418)
634 Cell membrane Protein Likelihood (0.214)
None GFFKR
[18]2*, 78 TM I
Pygsuia biforma (ITGA)
3312
Extracellular (0.9162) Soluble (0.9984) Likelihood (0.93)
DUF11 4 repeats (CTE)
GFFKR GLAIG
267, [685], [807]
TM I
Lenisia limosa (LenisMan47)
3346
Cytoplasm (0.5704) Soluble (0.9615) Likelihood (0.632)
Ig fold like 11 repeats (CTE)
GFFKR
3102, [701], [826]2
I
Thecamonas trahens
2114
Cell membrane Protein Likelihood (0.975)
Ig fold like 2 repeats (CTE) Laminin_g3 (CTE)
GFFKR
349*, [894]2*, [956]2*
SP, TM I
1218 Supplemental Table 2: The table shows ITGB motifs, subcellular localizations, signal peptide, 1219 transmembrane and extra domains attached to ITGB protein. The order of taxons are listed in 1220 accordance to the Figure 3. For ITGB motifs of MIDAS, AMIDAS and SyMBS, any non-1221 canonical amino acids were highlighted in red colour, cysteine rich motifs are either shown as 1222 EGF or PSI domains. The presence of C-terminal motifs are shown in KLXXXD or NPXY/F and 1223 GXXXG. The parentheses indicate number of motifs present in ITGB. 1224
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
62
Species MIDAS
AMIDAS SyMBS CRM Cysteine rich motifs
Size
aa
Signal
Peptide
and
Transme
mbrane
B-tail
motifs
Ty
pe
Subcellular
localization
(Deeploc)
Clastostelium recurvatum
25D 27T 29S 123E 139R
29S 32S 33D 202A 160D
81E 120R 121P 122D 123E
None 1405aa
TM GAVVG II Extracellular Likelihood (0.3436) soluble likelihood (0.731)
Filamoeba nolandi
893D 895Y 897S 956E 991D
897S 899K 900A 1066A 991D
906D 949D 950E 951P 952N
EGF, PSI (NTB)
2714aa
SP, TM GAFIG II Cell membrane Protein likelihood (0.5463)
Soliformovum irregularis
872D 874S 876N 958S 975N
874N 879Y 880N 1071A 975N
924D 1012D 1013P 1014N 1015E
PSI 2218aa SP NPXY Present
II Extracellular likelihood (0.8972) soluble likelihood (0.9172)
Schizoplasmodiopsis vulgaris
839D, 841T 845S 928E 944D
845S 846Y 857E 1045A 944D
889D 923N 925Q 927L 928E
PSI 3641aa SP, TM NPXY present GAAG
II Extracellular likelihood (0.5465) soluble likelihood (0.6244)
Cavostelium apophysatum
507D 509T 511S 592N 642D
511S 514S 515D 708A 642D
546D 656N 658D 659E 660D
PSI 3296aa
TM NPXY present GAASG
II Cell membrane protein likelihood (0.7274)
Mycamoeba gemmipara
277D 279T 281S 375E 416E
281S 285N 286D 530A 390D
328E 403N 411D 414P 416E
Unique cysteine rich region pattern downstream from 500AA-860AA
927aa
TM NPXY present GVIAG
I Cell membrane protein likelihood (0.998)
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
63
Mastigamoeba balamuthi
159D, 161T, 163S 242E 271D
163S, 170P, 171N 371S, 271D
198E, 238S, 239D, 241P, 242E
PSI, EGF-like present 5 times
1315 aa
SP, TM NPXA present GAASG
I Cell membrane Protein likelihood (0.561)
Mastigella eilhardi
127D, 129S, 131S 212E 230D
131S 134D 135D 339A 230D
198E, 207N, 209D, 211P, 212E
PSI, EGF-like present 5 times
1292 aa
SP, TM NPXA present GAASG
I Cell membrane Protein likelihood (0.7939)
Rigifila limosa
124D 126S 128S 206E 237D
128S 131D 133D 391A 237D
164D 201N 203D 205P 206E
PSI, EGF-like present 13 times
932 aa SP, TM NPXY present GTATG
I Extracellular soluble likelihood (0.681)
Echinamoeba exudans
123D, 125S, 127S, 209E, 239D
127S, 140G 141E 338A 239 D
174E 204N, 206E, 208P, 209E
PSI, EGF-like present 3 times
593 aa SP, TM
NPXY present (2) KMAAAD GAVIG
I Cell membrane protein likelihood (0.5596)
Difflugia sp. D2
115D, 117S, 119S, 213E, 249D
119S 136D 137E 295I 249D
177E, 208N, 210E, 212P, 213E
PSI, EGF present 3 times
605 aa
TM NPXY present (2)
I Cell membrane protein likelihood (0.8598)
Nebula sp. 131D, 133S, 135S, 225E, 264D
135S 138P 139Y 361A, 264D
192E, 223N, 225E, 227P, 228E
PSI, EGF present 3 times
630 aa SP, TM NPXY present (2)
I Cell membrane protein likelihood (0.7413)
Heleopera sphagni
127D 129S 131S 221E 266D
131S 134P 135H 352A 266D
174E 216N 218E 220P 221E
PSI, EGF present 3 times
622 aa SP, TM
NPXY present (2) KADX(4)E
I Cell membrane protein likelihood (0.9332)
Difflugia sp. USP
146D, 148S, 150S, 245E 280D
150S 153P 154Y 370A 280D
197E 239N 241E 243P 245E
PSI, EGF present 3 times
630 aa SP, TM
NPXY present (2)
I Cell membrane protein likelihood (0.9847)
Hyalosphenia papilio
35D 37S 39S 132E
39S 42P 43Y 270D
85D 127N 129E 131P
EGF present 3 times
531 aa TM NPXY present (2) KADX(4)D
I Cell membrane protein likelihood (0.8186)
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
64
172D 172D 132E Rhizamoeba saxonica (paralog 1) TR11158c0_g1_i1
143D 145S 147S 228E 259D
147S 150Q 151E 383A 259D
182E 223N, 225D, 227P, 228E
PSI, EGF present 7 times
951 aa SP, TM NPXY present (2)
I Cell membrane protein likelihood (0.7754)
Cryptodifflugia operculum
144D 146S 148S 241E 278D
148S 151P 152L 370G 278D
194E 236N 238E 240P 241E
PSI, EGF present 3 times
633aa SP, TM NPXY present (2)
I Cell membrane protein likelihood (0.9426)
Rhizamoeba Saonxica (Paralog2) TR7856c0_g1_i1
142D, 144S 146S 227E, 258D
145S 148D 149D 329A 258D
181E, 222N, 224D, 226C, 227E
PSI, EGF present 3 times
932aa TM NPXY present (2)
I Cell membrane protein likelihood (0.8204)
Flabellula citata
163D, 165S 167S 234E, 282D
167S 172M 173E 364A 282D
185D, 229G 231D, 233P 234E
PSI, EGF present 5 times
1141 aa
SP GAAVG I Cell membrane protein likelihood (0.7827)
Rhizamoeba saxonica (Paralog 3) TR7441_c0_g1_i1
832D 834S 836S 918E 949D
836S 839P 840F 1063A 949D
871E, 913N 915D 917C 918E
EGF-like present 2 times
1456 aa SP, TM
NPXY present (2)
I Extracellular (05263) soluble likelihood (0.731)
Pyxidicula operculata
133D, 135S, 137S, 229E, 264D
137S, 140P 141Y, 360A, 264D
192E, 224N, 226E, 228P, 229E
PSI, EGF present 3 times
621 aa
SP, TM NPXY present KADX(4)D GECGG
I Extracellular (0.9867) soluble likelihood (0.9995)
Pygsuia biforma
110D 112S 114S 204E 231D
114S 117D 118D 331A 231D
161D 200Q 201D 203P 204E
PSI, EGF present 34 times
1793 aa TM NPXY present KAGX(4)E present
I Cell membrane Protein likelihood (0.9792)
Lenisia limosa
130D, 132S 134S 218S 246D
134S 137N 138D 357N 246D
182N 215N 216E 217P 218S
PSI, EGF present 12 times, EGF like 4 times
SP, TM NPXY present RFNAAMKE present
I Cell membrane Protein likelihood (0.9934)
Ministeria vibrans (paralog 1) c191|m.702
85D, 87S 89S 179E
89S 92D 93D 310S
124E 174N 176D 178P
EGF-like present 5 times
SP, TM NPXY present KAWQQWEE
I Cell membrane protein likelihood (0.9827)
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
65
1225 1226 1227 1228 1229 1230 1231 1232 1233 1234 1235 1236 Supplemental Table 3: This table shows Rsem value of protists that have either a complete set 1237 of IMAC, integrins with extra domains or unexpected discovery of an integrin protein. The 1238 values are shown in transcripts per million (TPM). 1239
Species ITA ITB Talin
PINCH
ILK Paxillin Actinin Actin eF1α Vinculin
Filamoeba nolandi
10.23 91.07(?) 45.54 30.62 11.71 0.00 138.81 2417.45
5932.12 36.07
Difflugia sp. D2 0.00 0.00 0.00 6.01 N/A 0.00 63.16 12.24 0.00 0.00
Flabellula citata
16.31(Paralog1) 9.81(Paralog2) 14.56(paralog3) 4.90 (Paralog4)
421.32 18.56 34.78 5.54 25.09 109.90 2708.65
6180.65 7.28
210D 210D 179E Ministeria vibrans (paralog 2) c46_198
73D, 75S 77S 194E 225D
77S 80N 81D 339A 225D
122D 189N 191D 193P 194E
EGF-like present 14 times
TM NPXF present (2) KLAQVAAD
I Cell membrane Protein likelihood (0.9941)
Ministeria vibrans (paralog 3) c334|m.1235
594D, 596S 598S 682E 714D
598S 601N 602D 824A 714D
633E 677N 679H 681S 682E
EGF-like present at N-terminal and C-terminal
SP, TM NPXY present (2) KLITGAAD
I Cell membrane Protein likelihood (0.9985)
Didymoeca costata
444D 446S, 448S 530E, 562D
446S 449D 451D 652A 562D
524S 527D 529P 530E
None 1673aa SP, TM None I Cell membrane protein likelihood (0.7458)
Sphaeroforma arctica
119D 121S 123S 231E 260D
123S 125D 127D 296A 260D
195E 227N 229D 230P 231E
EGF-like present 7 times
1141aa
TM KGIQMVMD
I Cell membrane protein likelihood (0.638)
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
66
0.67(paralog 5) Nebela sp. 3.73 11.34 3.95 85.57 N/A 25.24 444.60 653.00 2388.17 6.42
Rhizamoeba saxonica
174.00 52.55(Paralog3) 82.61(Paralog2) 121.63(paralog1)
25.63 29.96 3.23 13.13 106.07 12079.17
5322.57 20.45
Cryptodifflugia operculata
4.93 (paralog1) 1.80 (paralog2)
7.95 50.58 2.40 2.47 15.50 5.55 272.10 1857.80 (c7512_g1_i1)
9.65
Nolandella sp. AFSM
2.77 N/A 9.84 41.65 N/A 11.66 3.44 2929.71
3621.35 12.38
Arcella intermedia
2.98 N/A 4.91 1.19 N/A 11.19 43.20 110.29 (c17090_g2_i1)
2001.59 4.71
Rigifila ramosa N/A 810.35 348.93 59.65 N/A 141.87 127.94 18519.18
2284.60 7.37
Pygsuia biforma 2.14 1.45 41.95 111.50 N/A 13.47 103.13 17397.16
7071.38 N/A
Ministeria vibrans
1.71 (Paralog 1) 2.90 (Paralog 2) 20.02 (Paralog 3)
0.00(Paralog 1) 5.17(Paralog 2) 10.56 (Paralog3)
3.88
13.43 3.38 13.43 8.72 277.70 310.59 1.15
1240 1241 1242 1243 1244 1245 1246 1247 1248 1249 1250 1251 1252 1253 1254 1255 1256 1257
Supplemental Table 4: A). Summary of amoebozoan ITA type I table listing: number of 1258 paralogs, Beta propeller, FG-GAP and canonical motifs. The table also shows presence of signal 1259 peptide and transmembrane region, c-terminal interacting motif, GXXXG motif and Ig-fold 1260 like/Cadherin domain. B). Summary of amoebozoan ITA type II table listing: number of 1261 paralogs, Beta propeller, FG-GAP and canonical motifs. The table also shows presence of signal 1262 peptide and transmembrane region, c-terminal interacting motif, GXXXG motif and extra 1263 domain attached to ITA. 1264 1265 1266 1267
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint
67
1268 1269
1270
.CC-BY-NC-ND 4.0 International licenseavailable under awas not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprint (whichthis version posted April 30, 2020. ; https://doi.org/10.1101/2020.04.29.069435doi: bioRxiv preprint