The architecture of metabolism maximizes biosynthetic ... · 1/31/2020 · 2 The architecture of...
Research Article
The architecture of metabolism maximizes biosynthetic diversity in the largest class of fungi
Authors:
Emile Gluck-Thaler, Department of Plant Pathology, The Ohio State University Columbus, OH, USA, and Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA
Sajeet Haridas, US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Manfred Binder, TechBase, R-Tech GmbH, Regensburg, Germany
Igor V. Grigoriev, US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, and Department of Plant and Microbial Biology, University of California, Berkeley, CA
Pedro W. Crous, Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands
Joseph W. Spatafora, Department of Botany and Plant Pathology, Oregon State University, OR, USA
Kathryn Bushley, Department of Plant and Microbial Biology, University of Minnesota, MN, USA
Jason C. Slot, Department of Plant Pathology, The Ohio State University Columbus, OH, USA
corresponding author: [email protected]
bioRxiv preprint doi: https://doi.org/10.1101/2020.01.31.928846; this version posted February 1, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.
Abstract:
Background - Ecological diversity in fungi is largely defined by metabolic traits, including the ability to produce secondary or "specialized" metabolites (SMs) that mediate interactions with other organisms. Fungal SM pathways are frequently encoded in biosynthetic gene clusters (BGCs), which facilitate the identification and characterization of metabolic pathways. Variation in BGC composition reflects the diversity of their SM products. Recent studies have documented surprising diversity of BGC repertoires among isolates of the same fungal species, yet little is known about how this population-level variation is inherited across macroevolutionary timescales.
Results - Here, we applied a novel linkage-based algorithm to reveal previously unexplored dimensions of diversity in BGC composition, distribution, and repertoire across 101 species of Dothideomycetes, which are considered to be the most phylogenetically diverse class of fungi and are known to produce many SMs. We predicted both complementary and overlapping sets of clustered genes compared with existing methods and identified novel gene pairs that associate with known secondary metabolite genes. We found that variation in BGC repertoires is due to non-overlapping BGC combinations and that several BGCs have biased ecological distributions, consistent with niche-specific selection. We observed that total BGC diversity scales linearly with increasing repertoire size, suggesting that secondary metabolites have little structural redundancy in individual fungi.
Conclusions - We project that there is substantial unsampled BGC diversity across specific families of Dothideomycetes, which will provide a roadmap for future sampling efforts. Our approach and findings lend new insight into how BGC diversity is generated and maintained across an entire fungal taxonomic class.
Keywords:
chemical ecology; Fungi; metabolism; gene cluster

Background:
Plants, bacteria and fungi produce the majority of the earth's biochemical diversity. These organisms produce a remarkable variety of secondary/specialized metabolites (SMs) that can mediate ecological functions, including defense, resource acquisition, and mutualism. Standing SM diversity is often high at the population level, which may affect the rates of adaptation over microevolutionary timescales. For example, high intraspecific quantitative and qualitative chemotype diversity in plants can enable rapid adaptation to local biotic factors (Agrawal, Hastings et al. 2012; Züst, Heichinger et al. 2012; Glassmire, Jeffrey et al. 2016). However, the fate of population-level chemodiversity across longer timescales is not well explored in plants or other lineages. We therefore sought to identify how metabolic variation is distributed across macroevolutionary timescales by profiling chemodiversity across a well-sampled taxonomic class.
The Dothideomycetes, which originated between 247 and 459 million years ago (Beimforde, Feldberg et al. 2014), comprise the largest and arguably most phylogenetically diverse class of fungi. Currently, 19,000 species are recognized in 32 orders containing more than 1,300 genera (Zhang, Crous et al. 2011). Dothideomycetes are divided into two major subclasses, the Pleosporomycetidae (order Pleosporales) and Dothideomycetidae (orders Dothideales, Capnodiales, and Myriangiales), which correspond to the presence or absence, respectively, of pseudoparaphyses during development of the asci (Schoch, Crous et al. 2009). Several other orders await definitive placement.
Dothideomycetes also display a large diversity of fungal lifestyles and ecologies. The majority of Dothideomycetes are terrestrial and associate with phototrophic hosts as pathogens, saprobes, endophytes (Schoch, Crous et al. 2009), lichens (Nelsen, Lucking et al. 2011), or ectomycorrhizal symbionts (Spatafora, Owensby et al. 2012). Six orders contain plant pathogens capable of infecting nearly every known crop species. The Pleosporales and Capnodiales, in particular, are dominated by asexual plant pathogens that cause significant economic losses and have been well sampled in previous genome sequencing efforts (Goodwin, Ben M'Barek et al. 2011; Ohm, Feau et al. 2012; Oliver, Friesen et al. 2012; Condon, Leng et al. 2013; Manning, Pandelova et al. 2013). A single order (Jahnulales) contains aquatic, primarily freshwater, species (Suetrong, Boonyuen et al. 2011). Other ecologies include human and animal pathogens, including some taxa that can elicit allergies and asthma (Crameri, Garbani et al. 2014), and rock-inhabiting fungi (Ruibal, Gueidan et al. 2009).
This broad range of lifestyles is accompanied by extensive diversity of SMs, of which very few have known ecological roles. The Dothideomycetes, and several other ascomycete classes (Eurotiomycetes, Sordariomycetes, and Leotiomycetes), produce the greatest number and diversity of SMs across the fungal kingdom (Spatafora and Bushley 2015; Akimitsu et al. 2014). Economically important plant pathogens in the Pleosporales (Alternaria, Bipolaris, Exserohilum, Leptosphaeria, Pyrenophora, and Stagonospora), in particular, are known to produce host-selective toxins that confer the ability to cause disease in specific plant hosts (Walton and Panaccione 1993; Walton 1996; Wolpert, Dunkle et al. 2002; Ciuffetti, Manning et al. 2010; Pandelova, Figueroa et al. 2012; Akimitsu, Tsuge et al. 2014). Other toxins first identified in Pleosporales have roles in virulence but are not pathogenicity determinants, including the PKS-derived compounds depudecin (Wight, Kim et al. 2009) and solanapyrone (Kaur 1995).
Dothideomycetes are also known to produce bioactive metabolites shared with more distantly related fungal classes. Sirodesmin, a virulence factor produced by Leptosphaeria maculans, for example, belongs to the same class of epipolythiodioxopiperazine (ETP) toxins as gliotoxin, an immunosuppressant produced by the eurotiomycete human pathogen Aspergillus fumigatus (Gardiner, Waring et al. 2005; Patron, Waller et al. 2007). Dothistromin, a polyketide metabolite produced by the pine pathogen Dothistroma septosporum, shares ancestry with aflatoxin (Bradshaw, Slot et al. 2013), a mycotoxin produced by Aspergillus species that poses serious human health and environmental risks worldwide (Horn 2003; Wang and Tang 2004).
A majority of bioactive metabolites in Dothideomycetes are small-molecule SMs that, like those of other fungi, are frequently the products of biosynthetic gene clusters (BGCs) composed of enzymes, transporters, and regulators that contribute to a common SM pathway. Most of these BGCs are defined by four main classes of SM core signature enzymes: 1) nonribosomal peptide synthetases (NRPS), 2) polyketide synthases (PKS), 3) terpene synthases (TS), and 4) dimethylallyl tryptophan synthases (DMAT) (Hoffmeister & Keller 2007). Fungal gene clusters are hotspots for genome evolution through gene duplication, loss, and horizontal transfer, which recombine pathways and generate diversity (Wisecaver, Slot et al. 2014). Additionally, recent studies have shown that gene clusters may evolve through recombination or shuffling of modular subunits of syntenic genes (Lind, Wisecaver et al. 2017; Gluck-Thaler et al. 2018). Changes in BGC gene content often result in structural changes to the SM product(s), and therefore BGCs can be used to monitor the evolution of chemodiversity (Lind, Wisecaver et al. 2017; Proctor, McCormick et al. 2018). The most widely used methods for detecting BGCs rely on models of gene cluster composition based on putative functions in SM biosynthesis informed
by a phylogenetically limited set of taxa, but gene function agnostic methods are being developed (Slot and Gluck-Thaler 2019).
Here, we systematically assessed BGC richness and compositional diversity in the genomes of 101 Dothideomycetes species, most recently sequenced (Haridas et al. in press). Using a newly benchmarked algorithm that identifies clustered genes of interest through the frequency of their co-occurrence with and around signature biosynthetic genes, we identified 3399 putative BGCs, grouped into 719 unique cluster types, including 5 varieties of candidate DHN melanin clusters. The conservation of specific gene pairs across BGC types suggests that precise functional interactions contribute to the modular evolution of these loci. Numerous BGCs have either over- or under-dispersed phylogenetic distributions, suggesting pathways have been differentially impacted by selection. In comparisons across species, BGC repertoire diversity increases linearly with repertoire size, reflecting a mode of metabolic evolution in these fungi that is likely distinct from that of plants. We found little overlap in cluster repertoires among genomes from different genera, and project that a wealth of unique BGCs remain to be discovered within this fungal lineage.
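The idea of flagging homolog group pairs that co-occur more often than expected can be sketched with an empirical null distribution. This is a minimal illustration only: the function name, the toy counts, and the add-one correction are assumptions, and the actual CO-OCCUR procedure is the one described in Methods.

```python
def empirical_p_value(observed, null_counts):
    """Share of null co-occurrence counts at least as large as the observed count.

    The add-one correction (an assumption of this sketch) keeps the estimate
    above zero when no null replicate reaches the observed value.
    """
    at_least = sum(1 for count in null_counts if count >= observed)
    return (at_least + 1) / (len(null_counts) + 1)


# Hypothetical null: co-occurrence counts for randomly sampled homolog group pairs.
null_counts = [1, 1, 2, 2, 3, 3, 4, 5, 2, 1]

# Hypothetical observed counts for two homolog group pairs near signature genes.
p_frequent_pair = empirical_p_value(6, null_counts)  # exceeds every null count
p_typical_pair = empirical_p_value(2, null_counts)   # well within the null range
```

With a much larger null sample, a pair whose observed count is rarely matched by random pairs receives a small empirical p-value and would be retained as an unexpected co-occurrence.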
Results:
Dothideomycetes contain hundreds of distinct types of BGCs, a small fraction of which are characterized.
Using a novel cluster detection approach based on shared syntenic relationships among genes (CO-OCCUR, see Methods, Figure 1, Figure SA), we identified 332 gene homolog groups (homolog groups) of interest (Table SA, Table SB) whose members were organized into 3399 candidate BGCs of at least two genes (Table SC) in 101 Dothideomycete genomes (Table SD), representing an average of 33.7 BGCs per genome (SD = 15.4, Figure SG). We grouped BGCs
Figure 1. [Figure residue condensed: CO-OCCUR pipeline for identifying biosynthetic gene clusters (BGCs) using unexpected homolog group (HG) co-occurrences. Input: 101 Dothideomycetes genomes with genes clustered into HGs. A pipeline for sampling random pairs of co-occurring HGs (500,000 replicates) yields random distributions of HG co-occurrences. Output: 3,399 BGCs; BGCs that have ~90% of their genes in common are grouped together (719 total), giving 422 cluster groups (CGs) and 297 orphan clusters (OCs).]
into 719 unique cluster types based on a minimum gene content similarity of 90%; 422 cluster types are part of homologous cluster groups (cluster groups) found in 2 or more genomes, and 297 are orphan clusters found in only one genome (Table SE). Of these, 345 cluster types (166 cluster groups and 179 orphan clusters) had 5 or more genes per BGC (Figure 2), and 459 cluster types (239 cluster groups and 220 orphan clusters) had 4 or more genes per BGC (Figure SC). Only 9 of the 459 cluster types with 4 or more genes were ever found more than once in any given genome (Table SE). According to standard practice, we classified cluster types based on the presence of biosynthetic signature genes: dimethylallyl tryptophan synthase (DMAT), polyketide synthase (PKS), PKS-like, nonribosomal peptide-synthetase (NRPS), NRPS-like, hybrid (HYBRID), and terpene cyclase (TC). We found that among all cluster types with 4 or more genes, 186 contained only PKS and 29 contained only NRPS signature genes. Similarly, we detected 4 DMAT, 38 PKS-like, 16 NRPS-like, 3 HYBRID, and 3 TC-only cluster types. A further 127 cluster types contained more than 1 type of signature gene, and 53 cluster types contained no signature gene at all but still consisted of genes found in significant co-occurrences. By searching the MIBiG database for highly similar hits (≥70% amino acid identity) to the signature biosynthetic genes in CO-OCCUR BGCs, we were able to confidently annotate 158 of the BGCs recovered by CO-OCCUR with 32 unique MIBiG entries, corresponding to 22 unique metabolites (Table SF). BGC annotations based instead on content overlap with characterized MIBiG clusters can be found in Table SG (minimum cluster size = 3 genes; minimum percentage of genes with similarity = 70%).
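The grouping of BGCs into cluster types, cluster groups, and orphan clusters can be sketched as follows. This is a minimal illustration under stated assumptions, not the CO-OCCUR implementation: the similarity measure (shared homolog groups divided by the size of the smaller BGC), the single-linkage merging rule, and all function and variable names are hypothetical.

```python
from itertools import combinations


def content_similarity(a, b):
    """Fraction of the smaller BGC's homolog groups shared with the other (assumed measure)."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))


def group_cluster_types(bgcs, threshold=0.9):
    """Single-linkage grouping of BGCs whose content similarity meets the threshold.

    bgcs maps a BGC id to (genome id, list of homolog group ids).
    Returns a list of sets of BGC ids, one set per cluster type.
    """
    parent = {bgc_id: bgc_id for bgc_id in bgcs}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in combinations(bgcs, 2):
        if content_similarity(bgcs[x][1], bgcs[y][1]) >= threshold:
            parent[find(x)] = find(y)

    groups = {}
    for bgc_id in bgcs:
        groups.setdefault(find(bgc_id), set()).add(bgc_id)
    return list(groups.values())


def split_groups_and_orphans(bgcs, cluster_types):
    """Cluster groups occur in 2 or more genomes; orphan clusters occur in one genome."""
    cluster_groups, orphans = [], []
    for cluster_type in cluster_types:
        genomes = {bgcs[bgc_id][0] for bgc_id in cluster_type}
        (cluster_groups if len(genomes) >= 2 else orphans).append(cluster_type)
    return cluster_groups, orphans


# Toy data (hypothetical ids).
toy = {
    "bgc1": ("genomeA", ["hg1", "hg2", "hg3", "hg4"]),
    "bgc2": ("genomeB", ["hg1", "hg2", "hg3", "hg4"]),
    "bgc3": ("genomeC", ["hg7", "hg8", "hg9"]),
}
cluster_types = group_cluster_types(toy)
cluster_groups, orphans = split_groups_and_orphans(toy, cluster_types)
```

In the toy data, the two identical BGCs from different genomes form one cluster group, and the third BGC is an orphan cluster.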
Of the 158 BGCs with hits to the MIBiG database, some encoded non-host-selective phytotoxins or other compounds with known roles in virulence to plants. Two PKS BGCs encoding the non-host-specific phytotoxin and DNA polymerase inhibitor, solanapyrone
Figure 2. [Figure residue condensed: Dothideomycete species tree, with tips labeled by species and taxonomic order and with outgroups from Eurotiomycetes, Sordariomycetes, Leotiomycetes, and Pezizomycetes, shown against the homologous cluster group linkage tree. Legend: number of clusters; signature gene type (PKS, NRPS, PKS-like, NRPS-like, Hybrid, > 1 signature gene); order (Dothideales, Capnodiales, Venturiales, Botryosphaeriales, Mytilinidiales, Hysteriales, Pleosporales, Myriangiales). Labeled clusters, whose signature gene is ≥70% identical to a known locus: depudecin, alternapyrone, DHN melanin, cercosporin, dimethylcoprogen, ferrichrome, betaenone A/B/C, chaetoviridin/chaetomugilin, curvupallides, swainsonine, sirodesmin, aflatoxin. Scale bar: 0.10.]
(Mizushina, Kamisuki et al. 2002; Kasahara, Miyamoto et al. 2010), and the related alternapyrone (Fujii, Yoshida et al. 2005), first identified in Alternaria solani (Kasahara et al. 2010; Mizushina et al. 2002; Fujii et al. 2005), were found across taxa primarily in the order Pleosporales, especially in the closely related Pleosporaceae, Leptosphaeriaceae, and Phaeosphaeriaceae families (Figure 2, Table SF, SG). Several BGCs mapping to the MIBiG cluster for the extracellular siderophore dimethylcoprogen, which plays a role in virulence in the corn pathogen Cochliobolus heterostrophus (Dothideomycetes) and Fusarium graminearum (Sordariomycetes), were also found in most taxa in Pleosporales (Oide, Moeder et al. 2006). In contrast, BGCs mapping to the NRPS phytotoxin sirodesmin, produced by Leptosphaeria maculans (Gardiner, Cozijnsen et al. 2004), and to depudecin, a histone deacetylase (HDAC) inhibitor synthesized by a PKS BGC first identified in A. brassicicola (Wight et al. 2009), were found discontinuously distributed in only a few unrelated species within Pleosporales (Figure 2, Table SF, SG). Aside from sirodesmin, only one other BGC had hits in MIBiG to a BGC producing a host-selective toxin. A BGC mapping to T-toxin, a polyketide toxin produced by race T (C4) of C. heterostrophus (only race O (C5) was included in this study) that was responsible for the devastating Southern Corn Leaf Blight (Daly 1982; Turgeon and Baker 2007), was detected by CO-OCCUR in only two additional taxa, Ampelomyces quisqualis and L. maculans (Table SG).
Other BGCs matched MIBiG clusters from other ascomycete classes (Eurotiomycetes, Sordariomycetes), some of which have been previously detected in Dothideomycetes while others were unexpected. The aflatoxin-like dothistromin clusters, which are fragmented into six mini-clusters in Dothistroma septosporum (Bradshaw, Slot et al. 2013), predictably mapped to clusters detected by CO-OCCUR in D. septosporum and the closely related species Passalora fulva (Capnodiales). Some unexpected findings included a cluster in Macrophomina phaseolina that
matched the PKS BGC for chaetoglobosins, a class of mycotoxins with both antifungal and anti-cancer activities (Ali, Caggia et al. 2015; Jiang, Song et al. 2017) found in the distantly related Chaetomium globosum (Sordariomycetes) and some Eurotiomycetes (Schumann and Hertweck 2007) (Figure 2, Table S1). Another unexpected finding was the similarity between a CO-OCCUR cluster in M. phaseolina and the BGC for leucinostatin, a peptaibol compound with putative antimicrobial and antifungal activity, that was previously only known from taxa in the Sordariomycetes (Wang, Liu et al. 2016).
Cluster co-occurrence networks reveal contrasting trends in diversification
We visualized all significant homolog group co-occurrences predicted by CO-OCCUR as networks where nodes represent homolog groups and edges connect homolog groups that co-occur with unexpected frequency in genomic regions containing core biosynthetic genes (Figure 3a). A total of 33 discrete networks were recovered, with 71% of homolog groups located in the largest two networks. Signature genes tended to be highly connected to other homolog groups in two qualitatively different types of subnetworks. In one type of subnetwork, signature genes are centrally connected to diverse accessory homolog groups (e.g., PKS subnetworks), while in the other type one or more signature genes are non-centrally linked with fewer accessory homolog groups (e.g., the NRPS and DMAT subnetwork in network 1). By quantifying the betweenness centrality of each node (a function of the number of shortest network paths that pass through that node) within each network, we identified signature genes and several other biosynthetic enzymes, transporters, and DNA binding proteins that bridge alternate subnetworks (Figure 3a,b).
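Betweenness centrality on a co-occurrence network can be sketched with Brandes' algorithm. The sketch below is illustrative only: the toy network and node names are hypothetical, the graph is treated as unweighted and undirected (an assumption), and the software actually used for the analysis is not stated in this section.

```python
from collections import deque


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj maps each node to the set of its neighbours. Returns raw
    (unnormalized) betweenness: for each node, the number of shortest
    paths between other node pairs that pass through it, with ties
    between equally short paths shared fractionally.
    """
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search counting shortest paths from the source.
        order = []
        predecessors = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        n_paths[source] = 1
        distance = {v: -1 for v in adj}
        distance[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    n_paths[w] += n_paths[v]
                    predecessors[w].append(v)
        # Back-propagate path dependencies, farthest nodes first.
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in predecessors[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each undirected pair was visited from both endpoints.
    return {v: c / 2 for v, c in centrality.items()}


# Toy network (hypothetical): a PKS hub and an NRPS joined by a bridging node.
toy_network = {
    "pks": {"acc1", "acc2", "bridge"},
    "acc1": {"pks"},
    "acc2": {"pks"},
    "bridge": {"pks", "nrps"},
    "nrps": {"bridge", "acc3"},
    "acc3": {"nrps"},
}
scores = betweenness_centrality(toy_network)
```

Dividing the raw scores by (n-1)(n-2)/2 for a network of n nodes rescales them to the 0 to 1 range; the values reported in Figure 3 appear to be on such a scale, although that is an inference rather than something stated here.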
PKS BGCs are more compositionally diverse than NRPS BGCs. BGCs containing PKS signature genes tended to have fewer significant co-occurrences among their constituent genes across various BGC sizes, compared to BGCs containing NRPS signature genes (Figure 3c). This is consistent with a trend in which clusters containing PKS signature genes have more unique types of BGCs for a given cluster size (corrected by the total number of BGCs of that size), compared with BGCs containing NRPS signature genes (Figure 3d).
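The size-corrected diversity measure described above can be sketched as the number of unique cluster types divided by the total number of BGCs of a given size. The function name, input layout, and toy values are assumptions made for illustration.

```python
from collections import defaultdict


def diversity_by_size(bgcs):
    """Unique cluster types per BGC, computed separately for each cluster size.

    bgcs is an iterable of (cluster size, cluster type id) pairs.
    Returns {size: unique cluster types / total BGCs of that size}.
    """
    types_seen = defaultdict(set)
    totals = defaultdict(int)
    for size, cluster_type in bgcs:
        types_seen[size].add(cluster_type)
        totals[size] += 1
    return {size: len(types_seen[size]) / totals[size] for size in totals}


# Toy data (hypothetical): four 4-gene BGCs per signature gene class.
pks_bgcs = [(4, "t1"), (4, "t2"), (4, "t3"), (4, "t3")]
nrps_bgcs = [(4, "u1"), (4, "u1"), (4, "u1"), (4, "u2")]

pks_diversity = diversity_by_size(pks_bgcs)
nrps_diversity = diversity_by_size(nrps_bgcs)
```

In this toy case the PKS set scores higher than the NRPS set at the same cluster size, which is the direction of the trend reported in Figure 3d.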
Different algorithms annotate overlapping and complementary sets of clustered genes.
CO-OCCUR and the pHMM-based SMURF (Khaldi, Seifuddin et al. 2010) and antiSMASH (Blin, Wolf et al. 2017) programs all predicted similar numbers but different types of BGCs. antiSMASH identified a total of 1710 clusters that were part of 252 cluster groups and 887 orphan clusters with 4 or more homolog groups (Table SH, Table SI). SMURF identified a total of 686 clusters that were part of 194 cluster groups and 495 orphan clusters with 4 or more homolog groups (Table SJ, Table SK). CO-OCCUR predicted 1469 clusters that are part of 239 cluster groups and 220 orphan clusters with 4 or more homolog groups (Table SC, Table SE). We found that no single algorithm was able to annotate all predicted genes of interest in a BGC, even those predicted to be involved in SM biosynthesis (Figure 4a, Table SL). CO-OCCUR identified 51.2% and 37.7% of the clustered genes detected by SMURF and antiSMASH, respectively. Conversely, SMURF and antiSMASH identified 40.7% and 42.0% of the clustered genes detected by CO-OCCUR, respectively. When examining only genes predicted to participate in SM biosynthesis, transport and catabolism, we found that CO-OCCUR identified 51.2% and 43.3% of genes detected by SMURF and antiSMASH, respectively, while SMURF and antiSMASH each identified 62.6% of those detected by CO-OCCUR (Figure 4a).
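The pairwise percentages above are directional: the share of one method's clustered genes that another method also detects depends on which method supplies the denominator. A minimal sketch with hypothetical gene identifiers:

```python
def percent_identified(reference_genes, other_genes):
    """Percentage of the reference method's clustered genes also found by the other method."""
    reference, other = set(reference_genes), set(other_genes)
    return 100.0 * len(reference & other) / len(reference)


# Toy gene sets (hypothetical ids), standing in for two methods' predictions.
method_a = {"g1", "g2", "g3", "g4", "g5"}
method_b = {"g3", "g4", "g5", "g6", "g7", "g8"}

a_recovers_b = percent_identified(method_b, method_a)  # share of B's genes that A also finds
b_recovers_a = percent_identified(method_a, method_b)  # share of A's genes that B also finds
```

The same three shared genes give 50% in one direction and 60% in the other, which is why each pair of methods is reported with two percentages.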
Figure 3. [Figure residue condensed. a) Co-occurrence networks (Network 1 and Network 2); node types: signature genes (DMAT, Hybrid, NRPS-like, TC, PKS, PKS-like, NRPS), transport-related homolog groups, and other homolog groups. Labeled signature gene values include NRPS-like (0.44), PKS (0.42, 0.75, 0.78), and NRPS (0.35). b) Number of nodes versus betweenness centrality in Network 1 and Network 2; labeled nodes: n1, serine hydrolase (0.32); n2, multi-drug resistance protein (0.31); n3, trichothecene efflux pump (0.31); n4, amino acid aminotransferase (0.25); n5, FAD-binding (0.22); n6, o-methyltransferase (0.22); n7, DNA-binding protein (0.21); n8, mono-carboxylate transporter (0.21). c) Number of significant co-occurrences per cluster versus gene cluster size, for co-occurrences involving and not involving a signature gene. d) Gene cluster diversity versus gene cluster size.]
Figure 4. [Figure residue condensed. a) Overlap among clustered genes annotated by CO-OCCUR, SMURF, and antiSMASH; region counts as extracted, without recoverable region assignment: 6051 (796), 1553 (290), 5416 (887), 1254 (200), 3326 (1301), 3873 (1468), 1408 (200). b) Annotations at the cercosporin BGC locus (CTB1 to CTB12, CFP, and flanking ORFs); asterisks mark genes required for cercosporin biosynthesis (Chen et al., 2007; Newman et al., 2016; de Jonge et al., 2018). c) antiSMASH versus CO-OCCUR percent recovery (axes 0 to 100) and percent discovery (axes 0 to 500).]
The complementary nature of the CO-OCCUR and antiSMASH algorithms is illustrated by their annotations of a characterized BGC that encodes the biosynthesis of cercosporin, a non-host-specific polyketide produced by Cercospora spp. (Dothideomycetes) and Colletotrichum (Sordariomycetes) (de Jonge, Ebert et al. 2018). All 10 genes involved in cercosporin biosynthesis are encoded in a BGC and are known and characterized (CTB1-3, CTB5-7, CTB9), in addition to a regulator (CTB8) and two transporters (CTB4 and CFP) (Chen, Lee et al. 2007; de Jonge, Ebert et al. 2018; Newman & Townsend 2016). At this BGC’s locus in Cercospora zeae-maydis, both antiSMASH and CO-OCCUR annotated CTB1, CTB2, and CTB3 as genes of interest; only antiSMASH annotated CTB4, CTB5 and CTB6; only CO-OCCUR annotated CTB10, CTB11 and CTB12; and neither algorithm annotated CTB7, CTB8, CTB9 or CFP (Figure 4b).
CO-OCCUR and antiSMASH recovered similar proportions of loci homologous to known BGCs and predicted additional genes of interest in the vicinity of these candidates. Using BLASTP, we identified 364 BGCs with ≥ 3 genes across all Dothideomycete genomes that are homologous to 58 characterized BGCs from the MIBiG database (i.e., where ≥75% of genes show similarity) (Table SM). We then compared how many genes within and around these BGCs were predicted to be of interest by either antiSMASH or CO-OCCUR by cross-referencing all BGCs detected by each method, and found that both algorithms recovered similar percentages of BGC content (antiSMASH mean percent recovery = 48.3%, SD = 37.6%; CO-OCCUR mean percent recovery = 51.0%, SD = 42.6%), although for any given BGC, percent recovery often differed between the two algorithms (Figure 4c, Table SG). We also found that both antiSMASH and CO-OCCUR identified similar numbers of new genes of interest around BGC loci (antiSMASH mean percent discovery = 65.4%, SD = 85.4%; CO-OCCUR mean percent discovery = 56.6%, SD = 89.4%), and that the number of additional genes of interest often exceeded the size of the
recovered candidate cluster. High rates of novel gene discovery are perhaps expected given that 245
many of the clusters in MIBiG are only partially annotated. 246
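As an illustration of the two summary statistics above, the following minimal Python sketch computes percent recovery and percent discovery for a single reference BGC. The function name, the use of plain gene-name sets, and the choice to normalize newly predicted genes by the size of the reference cluster are assumptions made for illustration; the exact normalization used in the study is not specified in this section. The example reuses the 13 cercosporin gene names mentioned above purely as a convenient gene list and considers only those genes.

```python
def recovery_and_discovery(reference_genes, predicted_genes):
    """Return (percent recovery, percent discovery) for one reference BGC.

    Assumptions (not taken from the study): both values are expressed
    relative to the number of genes in the reference cluster, so percent
    discovery can exceed 100 when many new genes are predicted.
    """
    reference = set(reference_genes)
    predicted = set(predicted_genes)
    if not reference:
        raise ValueError("reference BGC must contain at least one gene")
    recovered = reference & predicted    # known genes that were predicted
    discovered = predicted - reference   # predicted genes not in the reference
    pct_recovery = 100.0 * len(recovered) / len(reference)
    pct_discovery = 100.0 * len(discovered) / len(reference)
    return pct_recovery, pct_discovery


# Illustrative gene lists based on the cercosporin locus described above.
cercosporin = {f"CTB{i}" for i in range(1, 13)} | {"CFP"}
antismash_hits = {"CTB1", "CTB2", "CTB3", "CTB4", "CTB5", "CTB6"}
cooccur_hits = {"CTB1", "CTB2", "CTB3", "CTB10", "CTB11", "CTB12"}

for name, hits in (("antiSMASH", antismash_hits), ("CO-OCCUR", cooccur_hits)):
    recovery, discovery = recovery_and_discovery(cercosporin, hits)
    print(f"{name}: recovery = {recovery:.1f}%, discovery = {discovery:.1f}%")
```

In this toy comparison the two methods recover the same fraction of the locus but different genes, which is the situation summarized in Figure 4c.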
Some over-dispersed clusters have ecologically biased distributions 247
We found that nearly one-fifth (18%) of cluster groups are phylogenetically over-dispersed
according to Fritz and Purvis' D statistic, which compares the observed distribution of a
cluster group to the distribution expected under strict Brownian evolution, where more closely
related species are predicted to be more similar to each other than more distantly related
species (Figure 5, Figure SD). Six over-dispersed cluster groups were over-represented (present
at least twice as often) in either plant pathotrophs or plant saprotrophs (Figure 5). By
comparison, 22.5% of cluster groups had distributions that were more conserved than expected.
The remaining cluster group distributions either fell on a continuum between phylogenetically
conserved and over-dispersed (35.1%) or were present in sets of taxa too small to be analyzed
(23.8%). Figure 6 presents three sets of closely related cluster groups that vary in their
phylogenetic conservation (see Methods). Cluster groups in the first set, which partially encode
the 1,8-dihydroxynaphthalene (DHN) melanin pathway, were found in nearly all Dothideomycetes;
cluster groups in the second set were restricted to the Pleosporales; and cluster groups in the
third set were found among Bipolaris and Dothidotthia, two closely related genera within the
Pleosporales.
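For reference, Fritz and Purvis' D for a binary trait (here, presence or absence of a cluster group) is commonly written as below. In this expression, the first term of the numerator is the observed sum of sister-clade differences across the tree, and the overlined terms are the mean sums obtained from simulations under Brownian motion (subscript b) and under random shuffling of tip values (subscript r). The notation is given as a reminder and is not a description of the exact implementation used here.

```latex
D = \frac{\sum d_{\mathrm{obs}} - \overline{\sum d_{\mathrm{b}}}}
         {\overline{\sum d_{\mathrm{r}}} - \overline{\sum d_{\mathrm{b}}}}
```

Under this scaling, D near 0 matches the Brownian expectation, D near 1 matches a phylogenetically random distribution, D < 0 indicates a distribution more conserved than the Brownian expectation, and D > 1 indicates over-dispersion.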
Dothideomycetes have five distinct types of DHN melanin clusters 262
We detected five cluster groups with distinct but overlapping compositions that appear to
encode partial pathways for 1,8-dihydroxynaphthalene (DHN) melanin biosynthesis in 87 of the
101 genomes (Figure 6). No genome had more than one predicted DHN melanin cluster. The two
most prevalent types, cluster group 131 and cluster group 113, are found in 48 fungi from 10 of
the 13 taxonomic orders and in 29 fungi from 6 orders, respectively. Cluster groups 131, 113,
223 and 134 encode 2 of the 5 biosynthetic genes (pks1 and 1,3,8-trihydroxynaphthalene [T3HN]
reductase) and the transcription factor (cmr1) involved in DHN melanin biosynthesis, while
cluster group 137 encodes 1 biosynthetic gene (T3HN reductase) and cmr1 (Figure 6b). In
addition to the homolog groups known to participate in DHN melanin biosynthesis, we detected
3 additional homolog groups (Prefoldin subunit, Heat-shock protein 40 co-chaperone JID1, and a
protein of unknown function) that are broadly conserved within DHN melanin clusters but that
have no known role in melanin biosynthesis. These additional homolog groups were not detected
by either antiSMASH or SMURF despite their prevalent linkage to the known biosynthetic genes,
which illustrates how CO-OCCUR is not constrained by the a priori assumptions of pHMMs.

[Figure 5. a) Dothideomycete species tree showing the distribution of six homologous cluster
groups (CGs, labeled a-f) and their lifestyle ratios (plant pathotroph : plant saprotroph and
plant saprotroph : plant pathotroph). b) Table of CG id, signature gene(s), size and frequency
for (a) group 221 (NRPS), (b) group 6 (PKS), (c) group 182 (PKS, TC), (d) group 163 (PKS, DMAT),
(e) group 49 (PKS) and (f) group 202 (PKS). c) Fritz and Purvis' D for each CG, marked by
P(Brownian motion) > 0.05 or ≤ 0.05, with best known signature(s): Asperphenamate (35%);
Emodin, Emericellin, Pestheic acid (60%); Fusarubin (46%); ACT-toxin (47%); Emodin (66%);
Azanigerone (46%).]

[Figure 6. a) Dothideomycete species tree showing the distribution of ten homologous cluster
groups (HCGs, labeled a-j) and the total number of fungi with each HCG. b) Gene content of each
HCG. Set 1, DHN melanin: (a) group 131, (b) group 113, (c) group 223, (d) group 134, (e) group
137; homolog groups shown are polyketide synthase, 1,3,8-trihydroxynaphthalene reductase,
transcription factor cmr1, prefoldin subunit, hsp40 co-chaperone JID1 and unknown. Set 2,
unknown NRPS-like: (f) group 185, (g) group 168, (h) group 100; homolog groups shown include
NRPS-like, glycerol-3-phosphate dehydrogenase, efflux pump, MFS multidrug transporter, ABC
transporter, gliotoxin exporter and unknown. Set 3, alternapyrone-like PKS: (i) group 16,
(j) group 22; homolog groups shown include PKS, lanosterol synthase, FAD binding, reductase,
prenyltransferase, salicylate hydroxylase, berberine and unknown. Parentheses mark homolog
groups present in < 50% of clusters in an HCG.]
278
SM cluster diversity is under-sampled and increases proportionally with total number of 279
genomes in Pleosporales 280
Cluster repertoires (combinations of cluster groups found within a given genome) differ
markedly between fungi from different genera (mean pairwise Sørensen dissimilarity = 0.79, SD
= 0.12) and to a lesser extent within a genus (mean = 0.37, SD = 0.13), with dissimilarity
increasing linearly with phylogenetic distance across all pairwise species combinations among
49 Pleosporales (y = 0.84x + 0.51; r² = 0.50), the most well-sampled Dothideomycetes order
(Figure 7a, Figure SF). However, given the same level of within-repertoire diversity (i.e., alpha
diversity) and total diversity across all repertoires (i.e., gamma diversity), dissimilarity between
repertoires (i.e., beta diversity) can result from either nestedness (where some repertoires are
subsets of others) or turnover (where no repertoire is a subset of the other), or a combination of
the two. When we partitioned total Sørensen dissimilarity between all cluster repertoires (βSOR =
0.969) into its nestedness and turnover components, we found that nearly all of the differences
between the cluster repertoires of different genomes were due to turnover (βSIM = 0.96) and not
repertoire nestedness (βSNE = 0.008), such that any given cluster repertoire contains a unique
combination of clusters (Figure SH). Furthermore, the compositional diversity of gene clusters
within a given repertoire (measured as the total branch length on a Raup-Crick linkage tree,
Figure 7a) scales linearly with repertoire size (y = 0.49x + 3.92; adj. R² = 0.86), indicating that
clusters added to a given repertoire are generally dissimilar to the clusters already present in that
repertoire (Figure 7b). Finally, rarefaction analysis of the total number of unique cluster groups
and orphan clusters (i.e., cluster richness) detected over increasing numbers of sampled genomes
suggests that genomes within Pleosporales are under-sampled with respect to BGC diversity, and
projects substantially more unique cluster types arising from future genome sampling within this
order (Figure 7c).

[Figure 7. a) Linkage tree of Sørensen dissimilarity between Pleosporalean cluster repertoires
and linkage tree of Raup-Crick dissimilarity between unique cluster types, one row per sampled
Pleosporalean genome. b) Cluster repertoire diversity (total Raup-Crick branch length) plotted
against cluster repertoire size. c) Cluster richness plotted against the number of sampled
Pleosporalean genomes (observed, interpolation, extrapolation).]
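The distinction between turnover and nestedness can be made concrete with a small Python sketch of the additive partition of Sørensen dissimilarity, in the spirit of the framework cited in this study (Baselga 2012). The function name and the toy repertoires are assumptions for illustration. The values reported above summarize all repertoires jointly, whereas this sketch uses the simpler pairwise forms: total dissimilarity (b + c) / (2a + b + c), turnover min(b, c) / (a + min(b, c)), and nestedness as their difference, where a is the number of shared cluster groups and b and c are the numbers unique to each repertoire.

```python
def partition_sorensen(repertoire_a, repertoire_b):
    """Partition pairwise Sorensen dissimilarity into turnover and nestedness.

    Each repertoire is an iterable of cluster-group identifiers.
    Returns (beta_sor, beta_sim, beta_sne) with beta_sor = beta_sim + beta_sne.
    """
    set_a, set_b = set(repertoire_a), set(repertoire_b)
    if not set_a or not set_b:
        raise ValueError("both repertoires must contain at least one cluster group")
    shared = len(set_a & set_b)   # a: cluster groups present in both genomes
    only_a = len(set_a - set_b)   # b: cluster groups unique to the first genome
    only_b = len(set_b - set_a)   # c: cluster groups unique to the second genome
    beta_sor = (only_a + only_b) / (2 * shared + only_a + only_b)
    smaller = min(only_a, only_b)
    beta_sim = smaller / (shared + smaller)   # turnover (Simpson) component
    beta_sne = beta_sor - beta_sim            # nestedness-resultant component
    return beta_sor, beta_sim, beta_sne


# Toy repertoires (hypothetical cluster-group ids).
nested_pair = partition_sorensen({1, 2, 3, 4}, {1, 2})      # one is a subset
turnover_pair = partition_sorensen({1, 2, 3}, {1, 4, 5})    # equal-sized replacement
print("nested:", nested_pair)
print("turnover:", turnover_pair)
```

When one repertoire is a strict subset of the other, the turnover component is zero and all dissimilarity is attributed to nestedness; when repertoires of equal size differ by replacement, the reverse holds.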
Discussion: 303
BGC diversity has been investigated primarily in bacteria (Cimermancic, Medema et al. 304
2014) and within individual genera in the fungal classes Eurotiomycetes and Sordariomycetes 305
(Lind, Wisecaver et al. 2017; Theobald, Vesth et al. 2017; Villani, Proctor et al. 2019). Although 306
Dothideomycetes are producers of a number of secondary metabolites important to fungal-plant 307
interactions and toxin production, to date there has not been a systematic evaluation of BGC 308
diversity in the Dothideomycetes nor in any other fungal class. Fungal genomes experience 309
frequent reorganization and changes in gene composition that underlie large-scale differences in
chromosomal macro- and micro-synteny among species (Grandaubert, Lowe et al. 2014; Hane, 311
Rouxel et al. 2011; Shi‐Kunne, Faino et al. 2018). Yet despite the overall dynamic nature of 312
fungal chromatin, tight linkage is often maintained between loci with related metabolic 313
functions, manifesting as gene clusters (Del Carratore, Zych et al. 2014). Here, we developed an 314
alternative, function-agnostic approach to annotating SM genes of interest that exploits these 315
patterns of microsynteny in order to identify previously unexplored dimensions of fungal BGC 316
diversity. 317
Complementary methodologies enhance understanding of BGC composition and diversity 318
in Dothideomycetes 319
There are two main approaches to predicting genes that are functionally associated in 320
BGCs. The first uses targeted methods based on precomputed pHMMs derived from a set of 321
genes known to participate in SM metabolism to identify sequences of interest (Khaldi, 322
Seifuddin et al., 2010; Blin, Wolf et al., 2017). The second uses untargeted methods based on 323
some function-agnostic criteria, such as synteny conservation or shared evolutionary history, to 324
implicate genes as part of a gene cluster (Gluck-Thaler & Slot 2018). Due to common metabolic 325
functions employed across distantly related taxa, targeted approaches, such as those employed by 326
SMURF and antiSMASH, have proven enormously successful. However, our objective in this 327
study was to develop a complementary untargeted approach in order to capture undescribed BGC 328
diversity within a single fungal lineage. 329
Our CO-OCCUR algorithm leverages a database of 101 Dothideomycete genomes in 330
order to annotate genes of interest using unexpectedly conserved genetic linkage as an indicator 331
of selection for co-inheritance with SM signature genes. CO-OCCUR failed to recover many of 332
the genes annotated using the pHMM approaches employed by SMURF and antiSMASH, 333
indicating that it has limitations in its prediction of secondary metabolite BGC content. These 334
results suggest that it is not optimal for the de novo BGC annotation of individual genomes, and 335
its ability to annotate genes of interest is proportional to their co-occurrence frequency in a given 336
database, meaning that it is not well suited for recovering associated SM genes that are not 337
evolutionarily conserved. This may explain in part why 10,295 genes (including 2,478 genes 338
predicted to be involved in secondary metabolism) identified by antiSMASH and SMURF 339
combined were not detected with CO-OCCUR (Figure 4), and why CO-OCCUR detected only a 340
few of the host-selective toxins found in Dothideomycetes. 341
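One simple way to formalize "unexpectedly conserved" linkage between two homolog groups is a hypergeometric tail probability: given how often each homolog group appears across a set of sampled loci, how likely is it that they would share at least the observed number of loci by chance? The sketch below is a generic illustration under that assumption. It is not necessarily the null model implemented in CO-OCCUR, and the function name, the argument names and the notion of a "locus" as the sampling unit are hypothetical.

```python
from math import comb


def cooccurrence_p_value(n_loci, n_with_a, n_with_b, n_with_both):
    """Upper-tail hypergeometric probability for a pair of homolog groups.

    n_loci      -- total number of sampled loci (hypothetical sampling unit)
    n_with_a    -- loci containing homolog group A
    n_with_b    -- loci containing homolog group B
    n_with_both -- loci containing both A and B

    Returns P(X >= n_with_both) if B's loci were placed at random.
    """
    if not 0 <= n_with_both <= min(n_with_a, n_with_b) <= n_loci:
        raise ValueError("counts are inconsistent")
    total = comb(n_loci, n_with_b)   # all ways to place B among the loci
    tail = sum(
        comb(n_with_a, k) * comb(n_loci - n_with_a, n_with_b - k)
        for k in range(n_with_both, min(n_with_a, n_with_b) + 1)
    )
    return tail / total


# Toy example: two homolog groups, each in 3 of 10 loci, always found together.
print(cooccurrence_p_value(n_loci=10, n_with_a=3, n_with_b=3, n_with_both=3))
```

A pair flagged by a test of this kind would still need to be conserved across multiple genomes before being treated as a gene of interest.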
Nevertheless, our method avoids some of the limitations intrinsic to algorithms that 342
employ pHMMs to delineate cluster content. While pHMM-based approaches gain predictive 343
power by leveraging similarities in SM biosynthesis across disparate organisms, they may fail to 344
identify gene families involved in secondary metabolism that are unique to a particular lineage of 345
organisms. For example, SMURF detects accessory SM genes using pHMMs derived from 346
mostly Aspergillus (Eurotiomycetes) BGCs (Khaldi, Seifuddin et al., 2010), while antiSMASH 347
v4 and v5 use 301 pHMMs of smCOGs (secondary metabolism gene families) derived from 348
aligning SM-related proteins, of which few are currently from fungi, in order to identify genes of 349
interest in the regions surrounding signature biosynthetic genes (Blin, Wolf et al., 2017). 350
Taxonomic bias introduced by sampling a limited number of BGCs may account for the 6,051 351
proteins found in BGCs that were identified by CO-OCCUR but not by any other algorithm, of
which 796 are predicted to participate in secondary metabolism and 617 could not be assigned to 353
a COG category but nevertheless have domains commonly observed in secondary metabolite 354
biosynthetic proteins (e.g., methyltransferase, hydrolase). 355
A linkage-based approach can also identify non-canonical accessory genes involved in 356
SM biosynthesis. For example, we detected 3 genes among the 5 variants of the DHN melanin 357
cluster that were not previously considered to be part of this BGC and not detected by either 358
antiSMASH or SMURF. One of these genes, a predicted HSP40 chaperone, is a homolog of the 359
yeast gene JID1, whose knock-out mutants display a range of phenotypes 360
(https://www.yeastgenome.org/locus/S000006265/phenotype) related to melanin production, 361
including increased sensitivity to heat and chemical stress. We propose that natural selection (not 362
genetic hitchhiking) is responsible for conservation of synteny in these loci, because SM cluster 363
locus composition and microsynteny in general are typically highly dynamic in fungi (Lind, 364
Wisecaver et al., 2018; Proctor, McCormick et al. 2018), and therefore conserved linkage in 365
these clusters over speciation events is a strong indicator of related function (de Jonge, Ebert et 366
al. 2018; Del Carratore, Zych et al. 2019). The identification of genes with non-canonical 367
functions, including those not participating directly in SM biosynthesis, may reveal SM 368
supportive functions, including mechanisms to protect endogenous targets of the metabolic 369
product (Keller 2015), in addition to novel biosynthetic genes (de Jonge, Ebert et al. 2018). 370
Ultimately, targeted and untargeted approaches to BGC annotation reinforce and enrich 371
our understanding of BGC diversity, as no single method identifies all accessory genes of 372
interest in the regions surrounding signature biosynthetic genes (Figure 4). It is notable that the 373
cercosporin BGC was long thought to consist only of CTB1-8, based on functional analyses and 374
structural prediction. However, de Jonge et al. recently predicted CTB9-12 to be of interest after 375
observing that these genes have conserved synteny among all fungi that possessed CTB1-8, and 376
subsequently demonstrated they are essential for cercosporin biosynthesis (de Jonge, Ebert et al. 377
2018). Only CO-OCCUR detected these additional four genes, and both pHMM-based models
and CO-OCCUR were required to detect the complete cercosporin BGC in our study. Given the 379
complementary nature of the advantages and disadvantages of different algorithms, we suggest 380
future studies incorporate multiple lines of evidence from both targeted and untargeted 381
approaches to more fully capture BGC compositional diversity. The 332 homolog groups of 382
interest that we identified using CO-OCCUR could further be used to build pHMMs and be 383
incorporated into existing BGC annotation pipelines in order to facilitate more complete analyses 384
of single genomes. 385
Signature genes differ in mode of BGC diversification 386
Although BGCs in fungi typically display characteristics of diversification ‘hotspots’, 387
showing elevated rates of gene duplication and gene gain and loss (Wisecaver, Slot et al. 2014; 388
Lind, Wisecaver et al. 2017), modular parts of clusters and even entire clusters are often shared 389
between divergent species. BGC diversification through gain and loss of individual genes and
sub-clusters of genes has been demonstrated in bacteria (Cimermancic, Medema
et al. 2014). Although the extent of sub-clustering in fungal genomes has never been directly 392
addressed to our knowledge, the algorithm we designed here essentially functions by identifying 393
the smallest possible type of sub-cluster: a pair of genes found more often than expected by 394
chance. The unexpected co-occurrence of gene pairs revealed that the two largest types of 395
signature gene families, PKS and NRPS, have contrasting co-occurrence network structures. 396
NRPS homolog groups are embedded in highly reticulate cliques (i.e., form unexpected 397
associations with genes that co-occur amongst themselves). This could suggest NRPS cluster 398
diversification is constrained by interdependencies among accessory genes. By contrast, PKS 399
homolog groups are network hubs (i.e., form unexpected associations with many non-co-400
occurring genes), which may underlie the higher compositional diversity and decreased 401
frequency of unexpected co-occurrences found within PKS clusters (Figure 3c, d). The apparent 402
contrast in how these different signature cluster types are assembled may reflect the range of 403
accessory modifications typically applied to the structures of polyketides and nonribosomal 404
peptides produced by PKSs and NRPSs. Alternatively, PKS clusters may be subject to more 405
diversifying selection, due to the ability of cognate metabolism in other organisms to utilize, 406
degrade, or neutralize the metabolites. These hypotheses remain to be tested. 407
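The contrast between reticulate cliques and hubs described above can be illustrated with the local clustering coefficient of a node, i.e., the fraction of its neighbours that are themselves linked. The sketch below uses two toy co-occurrence networks with hypothetical node labels. It is offered only as one way to quantify the distinction and does not reproduce the network statistics used in this study.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)} from edge pairs."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def local_clustering(adjacency, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = adjacency.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0   # undefined for fewer than two neighbours; reported as 0
    linked = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return linked / (k * (k - 1) / 2)


# Clique-like pattern: accessory groups co-occur with the signature gene
# and with each other (hypothetical labels A, B, C).
clique_like = build_adjacency(
    [("NRPS", "A"), ("NRPS", "B"), ("NRPS", "C"), ("A", "B"), ("A", "C"), ("B", "C")]
)
# Hub-like pattern: accessory groups co-occur with the signature gene only.
hub_like = build_adjacency([("PKS", "X"), ("PKS", "Y"), ("PKS", "Z")])

print(local_clustering(clique_like, "NRPS"))
print(local_clustering(hub_like, "PKS"))
```

A signature gene embedded in a clique has a coefficient near 1, whereas a hub whose partners do not co-occur with one another has a coefficient near 0.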
Persistent gene co-occurrences reveal layers of combinatorial evolution 408
Previous large-scale analyses of BGCs suggest there is an upper limit to the number of 409
gene families that associate with signature biosynthetic genes, and that diversity is in large part 410
dependent on combinatorial re-shuffling of existing loci (Cimermancic, Medema et al. 2014). 411
Our analysis expands the number of gene families implicated in BGC diversity and identifies 412
patterns of modular combinatorial evolution among accessory homolog groups with metabolic, 413
transport and regulatory-related functions. While some of these accessory homolog groups are 414
restricted to BGCs with a particular type of signature SM gene, others are present in multiple 415
BGC types, suggesting they encode evolvable or promiscuous functions that can be readily 416
incorporated into different metabolic processes (Table SB). For example, 34 homolog groups 417
with predicted transporter functions are common features of the clusters we detected, present in 418
just under half (43%) of all predicted clusters. Among these homolog groups, 5 have been 419
recruited to compositionally diverse gene clusters and are primarily annotated as toxin efflux 420
transporters or multidrug resistance proteins. Transporters are a key component of fungal 421
chemical defense systems, well known for facilitating resistance to fungicides and host-produced 422
toxins (Coleman & Mylonakis 2009). Transporters are also increasingly recognized as integral 423
components of self-defense mechanisms against toxicity of endogenously produced SMs 424
(Menke, Dong et al., 2012). 425
Heterogeneous dispersal patterns of BGCs underpin fungal ecological diversity 426
The distribution of fungal chemodiversity remains difficult to observe and interpret 427
directly, making BGCs useful tools for elucidating underlying trends in fungal chemical ecology. 428
Although the vast majority of BGCs remain uncharacterized, their phylogenetic distributions 429
occasionally provide clues to the selective environments that promote their retention (Slot 2017). 430
For example, spotty distributions resulting from horizontal transfer of BGCs between distantly 431
related but ecologically similar species suggest the encoded metabolites contribute to fitness in
the shared environment (Dhillon, Feau et al. 2015; Reynolds, Vijayakumar et al. 2018). Shared 433
ecological lifestyle may also help explain why certain clusters, such as those involved in putative 434
degradative pathways, are retained among phylogenetically distant species (Gluck-Thaler and 435
Slot 2018). Our simple eco-evolutionary screen identified 43 BGCs that are more widely 436
dispersed than expected under neutral evolutionary models, and further revealed that a subset of 437
these BGCs are present more often in fungi with specific nutritional strategies (e.g. plant 438
saprotrophs and plant pathotrophs), suggesting the molecules they encode contribute to specific 439
plant-associated lifestyles (Figure 5). For example, we found an over-dispersed NRPS BGC 440
(group 221) that is present in three plant pathogens and one plant saprotroph. In contrast, the
phylogenetically under-dispersed distribution of 54 BGCs among mostly closely related
genomes is consistent with lineage-favored traits, which may or may not be due to shared
ecology. For example, a monophyletic clade of 26 pleosporalean fungi all have a 6-gene
NRPS-like cluster (group 100) of unknown function, fully maintained among these allied taxa (and a
single distant relative), suggesting it encodes a trait that contributes to the success of this lineage.
Phylogenetic screens, especially when coupled with more robust phylogenetic analyses, will be 447
useful for prioritizing the characterization of BGCs most likely to contribute to the success of 448
particular guilds or clades. 449
Among those BGCs with hits to the MIBiG database, we identified clusters that displayed 450
both lineage specific and spotty or sporadic distributions. The Pleosporales, for example, 451
contains many plant pathogens, and the conservation of BGCs involved in production of general
virulence factors towards plants such as solanopyrone, alternapyrone, and the extracellular
siderophore dimethylcoprogen across many taxa in this order suggests a shared lineage-specific
trait with roles in plant pathogenesis. In contrast, the aflatoxin-like dothistromin cluster,
which was proposed to be horizontally transferred from Aspergillus (Eurotiomycetes), had a very
spotty distribution, found only in several closely related taxa in Capnodiales, supporting a
hypothesis of HGT. Similarly, the BGC producing the epipolythiodioxopiperazine (ETP) toxin
sirodesmin shares 6 genes with the BGC producing the ETP toxin gliotoxin, which plays a role in
virulence towards animals in the human pathogen Aspergillus fumigatus (Eurotiomycetes)
(Gardiner, Cozijnsen et al. 2004; Bok, Chung et al. 2006). Related ETP-like BGCs have since been
identified in a number of other taxa of Eurotiomycetes and Sordariomycetes, but among
Dothideomycetes were previously known only from L. maculans and a partial cluster in
Sirodesmium diversum lacking the core NRPS (Patron, Waller et al. 2007). We detected homologs of
this cluster sporadically distributed in several other taxa within Pleosporales (Figure 2,
Table SG). The CO-OCCUR algorithm detected only a few BGCs with hits to host-selective toxins
(sirodesmin and T-toxin) but failed to detect several well-known host-selective toxins such as
HC-toxin and other host-selective toxins in Alternaria alternata. Either these host-selective
toxins are not represented in MIBiG or, as discussed above, the uniqueness of these clusters and
the rarity of the linkages between genes in these clusters in the overall dataset may make them
difficult to detect through CO-OCCUR.
472
Variation among BGC repertoires is due to high BGC turnover, not nestedness 473
Recent comparative studies have documented high intraspecific diversity of SM 474
pathways within and between different species of plants, bacteria and fungi (Penn et al., 2009; 475
Choudoir, Pepe-Ranney & Buckley 2018; Holeski, Hillstrom et al. 2012; Holeski, Keefover-476
Ring et al. 2013; Vesth, Nybo et al. 2018). However, identical estimates of diversity can result 477
from two distinct processes: nestedness, where one set of features is entirely subsumed within 478
another, or turnover, where differences are instead due to a lack of overlap among the features of 479
different sets (Baselga 2012). When we partitioned diversity among BGC repertoires in 480
Pleosporales (i.e., β diversity), we found that the vast majority of variation is due to a high 481
degree of genome-specific cluster combinations, and not nestedness (Figure 7, Figure SH). Much 482
of the turnover in BGC repertoire content between genomes appears to occur over relatively 483
short evolutionary timescales (Figure SF), and then diversifies more gradually, suggesting that 484
divergence in repertoires may be closely linked to speciation processes, such as niche 485
differentiation or geographic isolation. Directional selection, especially for multi-genic traits 486
encoded at a single locus (e.g., BGCs), leads to rapid gain/loss dynamics exemplary of many SM 487
phenotypes and genotypes (Choudoir, Pepe-Ranney & Buckley 2018; Lind, Wisecaver et al. 488
2017). Niche differentiation further reinforces divergence between closely related repertoires, 489
which might lead to rapid accumulation of variation over short evolutionary timescales. Indeed, 490
evidence from within populations suggests that BGCs are occasionally located in genomic 491
regions experiencing selective sweeps in geographically isolated pathogen populations 492
(Hartmann, McDonald et al. 2018). The retention/loss of certain SM clusters is coincident with 493
speciation in bacteria (Kurmayer, Blom et al., 2015) and much of the variation in cluster 494
repertoires in Metarhizium insect pathogens is species specific (Xu, Luo et al., 2016). Within 495
Dothideomycetes, the evolution of host-selective toxins even within a single species of pathogen, 496
for example, may allow for niche differentiation, host specialization, and potentially speciation. 497
Rare chemical phenotypes, especially with regards to defense chemistry, may also increase 498
fitness in complex communities (Kursar, Dexter et al. 2009). 499
BGCs interact with dimensions of chemical diversity 500
Biological activity of a SM can increase organismal fitness, but any given molecule is not 501
likely to be biologically active. The screening hypothesis posits that mechanisms to generate and 502
retain biochemical diversity would therefore be selected, despite the energetic costs, because 503
increasing structural diversity increases the probability of “finding” those that are adaptive (Firn 504
and Jones 2003). This phenomenon is analogous to the mammalian immune system’s latent 505
capacity to generate novel antibodies, resulting in a remarkable ability to respond to diverse 506
antagonists (Firn and Jones 2003). However, while the screening hypothesis may equally apply 507
to plants and microorganisms, patterns of diversity we observe here suggest each lineage 508
generates and maintains biochemical diversity in fundamentally distinct ways. Specifically, 509
fungal individuals appear to maximize total chemical beta-diversity while simultaneously 510
minimizing alpha-diversity of similar chemical classes (Nielsen, Grijseels et al. 2017; Vesth, 511
Nybo et al. 2018). In contrast, individual plants are more likely to produce diverse suites of 512
structurally similar molecules (Li, Baldwin & Gaquerel 2015; Song, Qiao et al. 2017; Weinhold,
Ullah et al. 2017). We show that total cluster diversity increases linearly with repertoire size 514
across a broad sample of fungi, extending previous observations that individual fungal genomes 515
are streamlined to produce molecules that share little structural similarity. Rather than 516
maintaining sets of homologous BGCs and pathways within the same genome, evidence from 517
ours and other studies suggests that fungi instead maintain high genetic variation in homologous 518
BGCs across individuals at the level of the pan-genome (Ziemert, Lechner et al. 2014; Lind, 519
Wisecaver et al. 2017; Olarte et al. 2019). Although not a selectable evolvability mechanism per
se, greater access to the diversity of BGCs harbored in pan-genomes through recombination, 521
hybridization and horizontal transfer effectively outsources the incremental screening for 522
bioactive metabolites across many individuals, thereby decreasing the costs for generating 523
diversity for any given individual and likely accelerating the rate at which effective bioactive 524
metabolite repertoires are assembled within a given lineage (Slot and Gluck-Thaler 2019). Our 525
characterization of BGC diversity across the largest fungal taxonomic class represents a step 526
towards elucidating the broader consequences of these contrasting strategies for generating and 527
maintaining biodiversity of metabolism writ large. 528
Conclusions: 529
Fungi produce a range of secondary metabolites that are linked to different ecological functions 530
or defense mechanisms, playing a role in adaptation over time. Although this phenomenon has been
studied at the intra- and interspecific levels, it has not been examined at macroevolutionary scales. The
Dothideomycetes represent the largest and phylogenetically most diverse class of fungi, 533
displaying a range of fungal lifestyles and ecologies. Here we assessed the patterns of diversity 534
of biosynthetic gene clusters across the genomes of 101 Dothideomycetes to dissect patterns of 535
evolution of chemodiversity. Our results suggest that different classes of BGCs (e.g. PKS versus 536
NRPS) have differing diversity of cluster content and connectedness among networks of co-537
occurring genes and implicate high rates of BGC turnover, rather than nestedness, as the main 538
contributor to the high diversity of BGCs observed among fungi. Consequently, little overlap 539
was found in biosynthetic gene clusters from different genera, consistent with diverse ecologies 540
and lifestyles among the Dothideomycetes, and suggesting that most of the metabolic capacity of 541
this fungal class remains to be discovered. 542
Methods: 544
Dothideomycetes genome database and species phylogeny 545
A database of 101 annotated Dothideomycetes genomes, gene homolog groups, and the
corresponding phylogenomic species tree were obtained from Grigoriev et al. (2014) and Haridas et
al. (in press).
Gene cluster annotation with the SMURF algorithm 549
We used a command-line Python script based on the SMURF algorithm (Vesth, Nybo et al.
2018). Using genomic coordinate data and annotated PFAM domains of predicted genes as input,
the algorithm predicts seven types of SM clusters based on the multi-PFAM domain composition
of known 'backbone' genes. The cluster types are 1) polyketide synthases (PKSs), 2) PKS-like, 3)
nonribosomal peptide synthetases (NRPSs), 4) NRPS-like, 5) hybrid PKS-NRPS, 6)
prenyltransferases (DMATS), and 7) terpene cyclases (TCs). The borders of clusters are
determined using PFAM domains that are enriched in characterized SM clusters, allowing up to
3 kb of intergenic space between genes, and no more than 6 intervening genes that lack
SM-associated domains. SM-associated PFAM domains were taken from Khaldi et al. (2010).
Gene cluster annotation with antiSMASH 559
All genomes were annotated using antiSMASH v4.2.0 by submitting genome assemblies 560
and GFF files to the public web server with options “use ClusterFinder algorithm for BGC 561
border prediction” and “smCOG analysis” (Blin, Wolf et al., 2017). antiSMASH reports all 562
genes within the borders of a predicted cluster as part of the cluster. For our analysis, we only 563
considered genes belonging to annotated smCOGs or signature biosynthetic gene families as part 564
of a given cluster and excluded all others, in order to obtain conservative, high confidence 565
estimations of cluster content based on genes of interest. 566
Sampling null homolog group-pair distributions 567
We created null distributions from which we could empirically estimate co-occurrence 568
probabilities by randomly sampling homolog group pairs without replacement from all 569
Dothideomycete genomes (Figure SA). Before beginning, we defined null distributions based on 570
two parameters: a range of sizes for the smallest homolog group in the pair, and a range of sizes 571
for the largest homolog group in the pair, where each range progressively incremented by 25 572
from 1-800 and all combinations of ranges were considered. For example, there existed a null 573
distribution for homolog group pairs where the smallest homolog group had between 26-50 574
members, and the largest homolog group had between 151-175 members. To begin, we randomly 575
sampled a genome and then randomly selected two genes within 6 genes of each other from that
genome. We retrieved the homolog groups to which those genes belonged, and then counted the 577
number of times members of each homolog group were found within 6 genes of each other 578
across all Dothideomycete genomes. We counted the number of members belonging to each 579
homolog group, excluding those that were found within 6 genes of the end of a contig, in order to 580
obtain a corrected size for each homolog group that accounted for variation in assembly quality. 581
The co-occurrence observation was then stored in the appropriate null distribution based on the 582
corrected sizes of each homolog group. For example, the number of co-occurrences of a sampled 583
homolog group pair where the smallest homolog group had a corrected size of 89, and the largest 584
homolog group had a corrected size of 732 would be placed in the null distribution where the 585
smallest size bin was 76-100 members, and the largest size bin was 726-750. All homolog 586
groups with greater than 800 members were assigned to the 776-800 size bin. This sampling 587
procedure was repeated 500,000 times. After evaluating various bin sizes, we ultimately decided 588
to use a range of 25 because this resulted in the most even distribution of samples across all null 589
distributions. Due to variation in the number of homolog groups with any given size across our 590
dataset, it was not possible for all null distributions to contain the same number of samples. 591
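The size-binning rule described above can be illustrated with a short Python sketch. It is not the published CO-OCCUR code: the names size_bin and null_key are hypothetical, and only the bin width of 25, the 1-800 range, and the pooling of larger groups into the 776-800 bin are taken from the text.

```python
BIN_WIDTH = 25   # width of each size range, as stated in the text
MAX_SIZE = 800   # groups larger than this are pooled into the 776-800 bin


def size_bin(corrected_size):
    """Return the inclusive (low, high) size bin for a homolog group."""
    if corrected_size < 1:
        raise ValueError("corrected size must be at least 1")
    capped = min(corrected_size, MAX_SIZE)
    low = ((capped - 1) // BIN_WIDTH) * BIN_WIDTH + 1
    return (low, low + BIN_WIDTH - 1)


def null_key(size_a, size_b):
    """Key of the null distribution for a pair: (smallest bin, largest bin)."""
    small, large = sorted((size_a, size_b))
    return (size_bin(small), size_bin(large))


# Worked example from the text: corrected sizes of 89 and 732.
print(null_key(732, 89))  # ((76, 100), (726, 750))
```

Sorting the two sizes before binning means the key does not depend on the order in which the two homolog groups were sampled.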
The CO-OCCUR pipeline 592
Current BGC detection algorithms first identify signature biosynthetic genes using profile 593
Hidden Markov Models (pHMMs) of genes known to participate in SM biosynthesis, and then 594
search predefined regions surrounding signature genes for co-located "accessory" biosynthetic, 595
regulatory, and transport genes. The approach of CO-OCCUR, in contrast, is to define genes of 596
interest based on whether they are ever found to have unexpectedly conserved syntenic 597
relationships with other genes in the vicinity of signature biosynthetic genes, agnostic of gene 598
function. Here, we used CO-OCCUR in conjunction with a preliminary SMURF analysis to 599
arrive at our final BGC annotations (Figure SA). We first took all SMURF BGC predictions and 600
extended their boundaries to genes within a 6 gene distance that belonged to homolog groups 601
found in another SMURF BGC, effectively “bootstrapping” the BGC annotations in order to 602
ensure consistent identification of BGC content across the various genomes. SMURF BGCs at 603
this point in the analysis were considered to consist of all genes found within the cluster’s 604
boundaries. For each pair of genes in each BGC (including signature biosynthetic genes), we 605
retrieved their homolog groups, and kept track of how many times that homolog group pair was 606
observed across all BGCs. Then, for each observed homolog group pair, we divided the number 607
of randomly sampled homolog group pairs in the appropriate null distribution (based on the 608
corrected sizes of the smallest and largest homolog groups within the observed pair, see above) 609
that had a number of co-occurrences greater than or equal to the observed number of co-610
occurrences by the total number of samples in the null distribution. In doing so, we empirically 611
estimated the probability of observing a homolog group pair with at least that many co-612
occurrences by chance, given the sizes of the homolog groups. In this way, we were able to take 613
into account the relative frequencies of each homolog group within a pair across all genomes 614
when assessing the probability of observing that pair’s co-occurrence. For example, if we 615
observed that homolog group 1 and homolog group 2 co-occurred 19 times within SMURF-616
predicted BGCs, and that homolog group 1 had 57 members while homolog group 2 had 391 617
members, we would count the number of randomly sampled homolog group pairs that co-618
occurred 19 or more times within the null distribution where the smallest homolog group size bin 619
was 51-75 and the largest homolog group size bin was 376-400, and then divide this by the total
number of samples in that same null distribution to obtain the probability of observing homolog 621
group 1 and homolog group 2’s co-occurrences by chance. All co-occurrences with an empirical 622
probability estimate of ≤0.05 were considered significant and retained for further analysis. In 623
order to decrease the risk of false positive error, we did not evaluate the probability of observing 624
any homolog group pairs with fewer than 5 co-occurrences, and also did not evaluate any homolog
group pairs whose corresponding null distribution had fewer than 10 samples. 626
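The empirical probability described above is a one-sided tail fraction of the matching null distribution. A minimal Python sketch follows, assuming the null distribution is available as a plain list of co-occurrence counts (a hypothetical input format). The function names are illustrative and this is not the published CO-OCCUR code; the three thresholds are those stated in the text.

```python
MIN_COOCCURRENCES = 5   # pairs observed fewer times are not evaluated
MIN_NULL_SAMPLES = 10   # null distributions smaller than this are skipped
ALPHA = 0.05            # significance cutoff stated in the text


def empirical_probability(observed, null_counts):
    """Fraction of null samples with at least `observed` co-occurrences.

    Returns None when the pair is not evaluated under the stated filters.
    """
    if observed < MIN_COOCCURRENCES or len(null_counts) < MIN_NULL_SAMPLES:
        return None
    as_extreme = sum(1 for count in null_counts if count >= observed)
    return as_extreme / len(null_counts)


def is_significant(observed, null_counts):
    """True if the pair was evaluated and its probability is <= ALPHA."""
    p = empirical_probability(observed, null_counts)
    return p is not None and p <= ALPHA


# Toy null distribution (made-up values): 40 samples, one with >= 19 co-occurrences.
toy_null = [0] * 30 + [1] * 9 + [25]
print(empirical_probability(19, toy_null))  # 0.025
```

Returning None for unevaluated pairs keeps them distinct from pairs that were tested and found non-significant.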
Next, in order to obtain our final set of predicted BGCs, we took all homolog groups 627
found in significant co-occurrences, and conducted a de novo search in each genome for all 628
clusters containing genes belonging to those homolog groups within a 6 gene distance of each 629
other. In this way, all BGCs in our final set consisted of genes that belonged to these
homolog groups of interest, while all other intervening genes were not considered to be part of 631
the cluster. We treated homolog groups containing signature biosynthetic genes as we would any 632
other homolog group: if a signature gene predicted by SMURF was not a member of a homolog
group that was part of an unexpected co-occurrence, we did not consider it part of any clusters. We stress
that co-occurrences were only used to determine homolog groups of interest, but that once those 635
homolog groups were identified, they did not need to be part of an unexpected co-occurrence 636
within a predicted cluster in order to be considered part of that cluster. By focusing only on 637
genes that form unexpected co-occurrences, it is likely that we have underestimated the 638
compositional diversity of Dothideomycetes BGCs (but this may be the case for all cluster 639
detection algorithms; see Results). 640
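The de novo search described above can be sketched in Python as a single pass along each contig. This is an illustration under stated assumptions, not the published implementation: a contig is represented as a list of homolog group IDs in gene order (a hypothetical format), "within a 6 gene distance" is assumed to mean a difference in gene order of at most 6, and a cluster is assumed to need at least two genes of interest.

```python
MAX_GENE_DISTANCE = 6  # assumed: difference in gene order of at most 6


def find_clusters(contig, groups_of_interest, max_distance=MAX_GENE_DISTANCE):
    """Chain genes of interest that lie within `max_distance` positions.

    Returns a list of clusters, each a list of gene positions on the contig.
    Intervening genes outside `groups_of_interest` are never reported.
    """
    clusters, current = [], []
    for position, group in enumerate(contig):
        if group not in groups_of_interest:
            continue
        # Close the running cluster when the gap to the last gene is too large.
        if current and position - current[-1] > max_distance:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(position)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Toy contig (made-up IDs): "x" marks genes outside the groups of interest.
contig = ["hg1", "x", "x", "hg2"] + ["x"] * 8 + ["hg3", "hg1"]
print(find_clusters(contig, {"hg1", "hg2", "hg3"}))  # [[0, 3], [12, 13]]
```

Note that membership depends only on belonging to a homolog group of interest, so the two genes in a reported cluster need not themselves form a significant co-occurrence.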
We then grouped all predicted BGCs into homologous cluster groups (cluster groups) 641
based on a minimum of 90% similarity in their gene content, rounded down, in order to obtain a 642
strict definition of BGC homology that increases the likelihood that homologous clusters encode 643
similar metabolic phenotypes. This meant that clusters with sizes ranging from 2-10 were 644
allowed to differ in at most 1 gene; clusters with sizes ranging from 11-20 were allowed to differ 645
in at most 2 genes, etc. Clusters that were not at least 90% similar to any other cluster in the 646
dataset were designated orphan clusters. Note that because there is no perfect way to determine
homology when using similarity-based metrics (e.g., a 10-gene cluster could be 90% similar to a
9-gene cluster, which in turn could be 90% similar to an 8-gene cluster, but that 8-gene cluster
cannot be 90% similar to the 10-gene cluster), we developed a heuristic approach for sorting
clusters into groups. First, we conducted an all-vs-all comparison of content similarity to sort all 651
clusters into preliminary groups by iterating through the clusters from largest to smallest, where 652
size equaled the number of unique homolog groups, and clusters could only be assigned to a 653
single group. Then, within each preliminary group, we identified clusters most similar to all 654
other clusters within the group and used them as references to which all other clusters were 655
compared during a new round of group assignment. In this final round, clusters were grouped 656
together with a given reference into a cluster group if they were at least 90% similar to it and 657
were classified as orphan clusters if they were not 90% similar to any references. The often-unique
compositions of clusters mean that in most cases, there is no ambiguity in how the
clusters are classified; however, for a small number of clusters, especially those with fewer 660
genes, there may be some ambiguity as to which group they belong. 661
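The 90% cutoff and its lack of transitivity can be made concrete with a small Python sketch. The function names are hypothetical, and one assumption goes beyond the text: when two clusters differ in size, similarity is judged against the larger cluster. The rounding-down rule and the 10-, 9- and 8-gene example are from the text.

```python
def min_shared(size):
    """Smallest number of shared homolog groups meeting 90% similarity,
    rounded down (integer arithmetic avoids floating-point rounding)."""
    return (9 * size) // 10


def similar(cluster_a, cluster_b):
    """True if two clusters (sets of homolog group IDs) meet the cutoff.

    Assumption: the cutoff is computed from the larger of the two clusters.
    """
    larger = max(len(cluster_a), len(cluster_b))
    return len(cluster_a & cluster_b) >= min_shared(larger)


# Allowed differences by size: 1 gene for sizes 2-10, 2 genes for sizes 11-20.
print([size - min_shared(size) for size in (2, 10, 11, 20)])  # [1, 1, 2, 2]

# The non-transitive chain from the text, using nested toy clusters.
ten, nine, eight = set(range(10)), set(range(9)), set(range(8))
print(similar(ten, nine), similar(nine, eight), similar(ten, eight))  # True True False
```

Because pairwise similarity does not chain, any grouping needs a tie-breaking procedure such as the reference-based heuristic described above.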
Annotation of BGCs and gene functions 662
In order to detect loci homologous to known BGCs in Dothideomycete genomes, amino 663
acid sequences of each annotated BGC within the MIBiG database (v1.4) were downloaded and 664
used as queries in a BLASTp search of all Dothideomycete proteomes (last accessed 665
04/01/2019). All hits with a bitscore ≥50 and an E-value ≤1×10⁻⁴ were retained, and clusters composed
of these hits were retrieved using a maximum of 6 intervening genes. In order to retain only 667
credible homologs of the annotated MIBiG queries and to account for error in BLAST searches 668
due to overlapping hits, we retained clusters with at least 3 genes that recovered at least 75% of 669
the genes in the initial query. This set of high confidence MIBiG BGCs was then compared to 670
the set of BGCs predicted by CO-OCCUR and antiSMASH to assess the ability of each 671
algorithm to recover homologs to known clusters. For each algorithm and each BGC recovered 672
using BLASTp to search the MIBiG database, we calculated percent recovery, defined as the 673
number of genes identified by the BLASTp search that were also identified as clustered by the 674
algorithm, divided by the size of the BGC identified by the BLASTp search, multiplied by 100. 675
We also calculated percent discovery, defined as the number of clustered genes identified by the 676
algorithm but not identified in the BLASTp search, divided by the size of the BGC identified by 677
the BLASTp search, multiplied by 100. 678
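Both metrics reduce to set operations on gene identifiers. The following Python sketch follows the two definitions above; the function names and the toy gene IDs are hypothetical.

```python
def percent_recovery(blast_genes, algorithm_genes):
    """Genes found by both BLASTp and the algorithm, as a percentage
    of the size of the BLASTp-identified BGC."""
    return 100 * len(blast_genes & algorithm_genes) / len(blast_genes)


def percent_discovery(blast_genes, algorithm_genes):
    """Clustered genes found only by the algorithm, as a percentage
    of the size of the BLASTp-identified BGC."""
    return 100 * len(algorithm_genes - blast_genes) / len(blast_genes)


# Toy locus (made-up gene IDs): 4 BLASTp genes, 5 genes clustered by an algorithm.
blast = {"g1", "g2", "g3", "g4"}
algorithm = {"g2", "g3", "g4", "g5", "g6"}
print(percent_recovery(blast, algorithm))   # 75.0
print(percent_discovery(blast, algorithm))  # 50.0
```

Because both values share the BLASTp-identified BGC size as denominator, percent discovery can exceed 100 when an algorithm adds more genes than the BLASTp search found.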
In order to annotate BGCs recovered by CO-OCCUR with characterized clusters, we 679
used amino acid sequences of all signature biosynthetic genes in CO-OCCUR clusters as 680
BLASTp queries in a search of the MIBiG database (min. percent similarity=70%; max 681
E-value=1×10⁻⁴; min. high-scoring pairs coverage=50%). Basing our annotations on percent amino
acid similarity to characterized signature biosynthetic genes rather than on the number of genes
with similarity to BGC genes enabled a more conservative and comprehensive approach, as
many BGC entries within the MIBiG database are incomplete.
Proteins within predicted BGCs were annotated using eggNOG-mapper (Huerta-Cepas, 686
Forslund et al., 2017) based on fungal-specific fuNOG orthology data (Huerta-Cepas, 687
Szklarczyk, et al., 2015). Consensus annotations for all homolog groups were derived by 688
selecting the most frequent annotation among all members of the group. 689
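The consensus rule can be sketched in a few lines of Python. The function name and the example annotations are hypothetical. The text does not say how ties or unannotated members were handled, so this sketch ignores missing annotations and, on a tie, keeps whichever annotation was seen first.

```python
from collections import Counter


def consensus_annotation(member_annotations):
    """Most frequent annotation among the members of a homolog group.

    Missing annotations (None or empty strings) are ignored; returns None
    if no member is annotated. Tie handling here is an assumption.
    """
    counts = Counter(a for a in member_annotations if a)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(consensus_annotation(
    ["MFS transporter", "MFS transporter", "cytochrome P450", None]
))  # MFS transporter
```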
Comparing BGC detection algorithms 690
In order to assess the relative performances of SMURF, antiSMASH and CO-OCCUR, 691
we compared all BGCs predicted by each method, and kept track of the genes within those BGCs 692
that were identified by either one or multiple methods. We summarized these findings in a Venn
diagram using the “eulerr” package in R (Larsson 2019). Note that for the purposes of this 694
analysis, BGCs predicted by SMURF and antiSMASH were considered to be composed only of 695
genes that matched a precomputed pHMM, and BGCs predicted by CO-OCCUR were composed 696
only of genes belonging to homolog groups that were part of unexpected co-occurrences, while 697
all other intervening genes within the BGC’s boundaries were not considered to be part of the 698
cluster. In doing so, we effectively ignored intervening genes that were situated between or were
immediately adjacent to these clustered genes of interest for the purposes of defining a cluster’s 700
content. While this approach likely does not capture the full diversity of cluster composition, it is 701
expected to decrease false positive error in BGC content prediction and represents a conservative 702
approach to identifying what genes make up a given cluster. 703
Construction of a co-occurrence network 704
We visualized relationships between homolog group pairs with unexpectedly large 705
numbers of co-occurrences in a network using Cytoscape v.3.4.0 (Shannon, Markiel et al. 2003). 706
The network layout was determined using the AllegroLayout plugin with the Allegro Spring-Electric
algorithm. In order to identify hub nodes within the network, we calculated betweenness
centrality, a measure of how many of the shortest paths within a network pass through a given node,
for each node using Cytoscape. 710
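For readers who want to see what betweenness centrality computes, the following Python sketch uses Brandes' algorithm on an undirected, unweighted graph. It illustrates the metric only: the analysis above used Cytoscape, whose implementation and normalization may differ. The function name and the adjacency format are hypothetical, and scores are scaled here so that a node lying on every shortest path between all other node pairs scores 1.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Normalized betweenness for an undirected, unweighted graph.

    `adjacency` maps each node to the set of its neighbours.
    """
    scores = {node: 0.0 for node in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        predecessors = {node: [] for node in adjacency}
        n_paths = {node: 0 for node in adjacency}
        n_paths[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in distance:
                    distance[neighbour] = distance[node] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[node] + 1:
                    n_paths[neighbour] += n_paths[node]
                    predecessors[neighbour].append(node)
        # Accumulate dependencies, starting from the farthest nodes.
        dependency = {node: 0.0 for node in adjacency}
        for node in reversed(order):
            for pred in predecessors[node]:
                share = n_paths[pred] / n_paths[node]
                dependency[pred] += share * (1 + dependency[node])
            if node != source:
                scores[node] += dependency[node]
    n = len(adjacency)
    # Every unordered pair was counted twice, so divide by (n-1)(n-2).
    scale = (n - 1) * (n - 2)
    return {node: (s / scale if scale else 0.0) for node, s in scores.items()}


# Toy star network: the hub lies on every shortest path between leaves.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}
print(betweenness_centrality(star)["hub"])  # 1.0
```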
Assessment of cluster group phylogenetic signal 711
In order to quantify the dispersion of phylogenetic distributions of cluster groups 712
predicted by CO-OCCUR, we created a binary genome x cluster group matrix for all 239 cluster 713
groups with ≥4 genes that indicated the presence or absence of these cluster groups across all 101 714
genomes. We used this matrix in conjunction with the “phylo.d” function from the “caper” 715
package v1.0.1 in R (Orme, Freckleton et al. 2012) to calculate Fritz and Purvis’ D statistic for 716
each cluster group’s distribution, where D is a measurement of phylogenetic signal for a binary 717
trait obtained by calibrating the observed number of changes in a binary trait’s evolution across a 718
phylogeny by the mean sum of changes expected under two null models of binary trait evolution. 719
The first null model simulates the phylogenetic distribution expected under a model of random 720
trait inheritance, and the second simulates the phylogenetic distribution expected under a 721
threshold model of Brownian evolution that evolves a trait along the phylogeny under a 722
Brownian process where variation in that trait’s distribution accumulates at a rate proportional to 723
branch length (Fritz & Purvis 2010). D≈1 if the trait has phylogenetically random distribution; 724
D≈0 if the trait has a phylogenetic distribution that follows the Brownian model; D>1 if the trait 725
has a phylogenetic distribution that is less conserved, or over-dispersed, compared to the 726
Brownian model; D < 0 if the trait has a phylogenetic distribution that is more conserved, or 727
under-dispersed, compared to the Brownian model. 728
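The calibration described above can be written compactly, following Fritz & Purvis (2010). The symbols are introduced here for illustration: the first term of the numerator is the observed sum of sister-clade differences in the binary trait, and the overbarred terms are the mean sums expected under the random (r) and Brownian threshold (b) null models.

```latex
D = \frac{\sum d_{\mathrm{obs}} - \overline{\sum d_{\mathrm{b}}}}
         {\overline{\sum d_{\mathrm{r}}} - \overline{\sum d_{\mathrm{b}}}}
```

Substituting either null expectation for the observed sum gives D = 1 (random) or D = 0 (Brownian), which matches the interpretation of D given above.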
Dissimilarity and Diversity Analyses 729
We created cluster group and orphan cluster x homolog group matrices in order to 730
determine the dissimilarity between predicted cluster groups. In these matrices, for each cluster 731
group or orphan cluster, we indicated the presence or absence of homolog groups in at least one 732
cluster within the cluster group or orphan cluster, effectively summarizing each cluster group and 733
orphan cluster by integrating over the content of all clusters assigned to that group. We next used 734
the matrix in conjunction with the “vegdist” function from the “vegan” package in R (Oksanen, 735
Blanchet et al. 2016) to create a Raup-Crick dissimilarity matrix that was visualized as a 736
dendrogram using complete linkage clustering as implemented in the “hclust” function from the 737
core “stats” package in R. These dendrograms were then used to assess the functional diversity 738
of BGC repertoires (e.g., in the Pleosporales) by measuring the total branch distance connecting 739
all cluster groups and orphan clusters within a given repertoire using the “treedive” function 740
from the “vegan” package in R. 741
We used the same procedure as above to calculate Sørensen dissimilarity between
Pleosporalean genomes based on their BGC repertoires, only this time using a genome x cluster 743
group and orphan cluster matrix that depicted the presence or absence of cluster groups and 744
orphan clusters across all 49 Pleosporalean genomes. We also used this matrix to calculate and 745
partition β diversity in Pleosporalean cluster repertoires using the “beta.sample” function 746
(index.family = "sorensen", sites = 10, samples = 999) from the “betapart” v1.4 package in R 747
(Baselga & Orme 2012) in order to determine how much of the observed diversity among 748
repertoires was due to gain/loss of cluster groups and orphan clusters, and how much was due to 749
nestedness. We also used the genome x cluster group and orphan cluster matrix to conduct a 750
rarefaction of cluster richness across Pleosporalean genomes using the “iNEXT” function (q = 0, 751
datatype="incidence_raw", endpoint=98) from the “iNEXT” package in R (Hsieh, Ma & Chao
2016). 753
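The idea of partitioning Sørensen dissimilarity into a gain/loss (turnover) component and a nestedness component can be illustrated for a single pair of repertoires. The Python sketch below uses the standard pairwise Sørensen and Simpson formulas. It is a simplification of the analysis above, which used the resampled multiple-site version via “beta.sample”; the function name and the toy repertoires are hypothetical.

```python
def partition_sorensen(repertoire_a, repertoire_b):
    """Split pairwise Sorensen dissimilarity into turnover and nestedness.

    Repertoires are sets of cluster group / orphan cluster identifiers.
    """
    if not repertoire_a or not repertoire_b:
        raise ValueError("both repertoires must be non-empty")
    shared = len(repertoire_a & repertoire_b)
    only_a = len(repertoire_a - repertoire_b)
    only_b = len(repertoire_b - repertoire_a)
    total = (only_a + only_b) / (2 * shared + only_a + only_b)
    # Simpson dissimilarity: replacement of items, independent of richness.
    turnover = min(only_a, only_b) / (shared + min(only_a, only_b))
    return {"total": total, "turnover": turnover, "nestedness": total - turnover}


# One repertoire is a strict subset of the other: all dissimilarity is nestedness.
print(partition_sorensen({"cg1", "cg2", "cg3", "cg4"}, {"cg1", "cg2"}))

# Equal-sized repertoires with replaced clusters: all dissimilarity is turnover.
print(partition_sorensen({"cg1", "cg2", "cg3"}, {"cg1", "cg4", "cg5"}))
```

In the first toy case the turnover component is 0, and in the second the nestedness component is 0. These are the two extremes between which observed repertoires fall.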
Abbreviations: 754
BGC - Biosynthetic Gene Cluster; pHMM - profile Hidden Markov Model; SM - Secondary
Metabolite; PKS - Polyketide Synthase; NRPS - Nonribosomal Peptide Synthetase; TC -
Terpene Cyclase; DMATS - Dimethylallyl Tryptophan Synthase
Declarations: 758
Ethics approval and consent to participate 759
No human subjects were involved in the research 760
Consent for publication 761
No human data was used in the research 762
Availability of data and materials 763
All genome data is available at https://mycocosm.jgi.doe.gov/mycocosm/home 764
and described in Haridas et al. in press. 765
All scripts used in the analyses are available at https://github.com/egluckthaler/co-occur 766
Additional data generated in this study are included in the supplemental data file.
Funding 768
This work was supported by the National Science Foundation (DEB-1638999, JCS), the Fonds 769
de Recherche du Québec-Nature et Technologies (EG-T), and the Ohio State University 770
Graduate School (EG-T). The work conducted by the U.S. Department of Energy Joint Genome 771
Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the 772
U.S. Department of Energy under Contract No. DE-AC02-05CH11231. 773
Authors' contributions 774
Formulated the study: EG-T, KEB, JCS
Designed the methodology: EG-T, JCS
Generated resources: EG-T, PWC, IG, MB
Collected the data: EG-T, SH
Analyzed the data: EG-T
Provided leadership and/or mentorship in the study: JWS, JCS
Prepared the manuscript: EG-T, JCS
Contributed to the writing/editing of the manuscript: KEB, PWC, JWS
Acknowledgements 783
Computational work by EG-T was conducted using the resources of the Ohio Supercomputer 784
Center. 785
Competing interests 786
The authors declare that they have no competing interests. 787
Figure legends: 788
Figure 1. CO-OCCUR pipeline. The pipeline used genome annotations from 101 789
Dothideomycetes, and previously computed homolog groups (HGs) consisting of both orthologs 790
and paralogs (Haridas et al. in press). Biosynthetic gene clusters (BGCs) were inferred by 791
determining unexpectedly distributed shared HG pairs, determined according to a null-792
distribution of randomly sampled gene pairs in the same genomes, and then a search for all 793
clusters containing the HG pairs. The resulting BGCs were then either consolidated into cluster 794
groups (CGs) that share ~90% of gene content or labeled orphan clusters (OCs) if found only in a
single taxon. A detailed pipeline is presented in Figure SA. 796
Figure 2. Diversity of the largest detected secondary metabolite gene clusters across 101 797
Dothideomycetes. A maximum likelihood phylogenomic tree of 101 Dothideomycete species 798
(Haridas et al. in press) corresponds to rows in a heatmap (right) that depicts the number of 799
secondary metabolite clusters found in each genome, delimited by order (dotted line). Each 800
cluster is assigned to a homologous cluster group (cluster group; column) defined by at least 801
90% similarity at the composition level. Only cluster groups with ≥ 5 unique homolog groups per 802
cluster are shown. A complete linkage tree (top) depicts relationships among cluster groups, 803
where distance is proportional to the Raup-Crick dissimilarity in cluster group composition. 804
Cluster groups are colored according to their core signature biosynthetic genes, and cluster 805
groups with greater than 1 signature gene are left uncolored. Cluster groups with signature genes 806
≥70% identical to characterized BGC signature genes in MIBiG are indicated by a labeled red 807
box. 808
Figure 3. Gene co-occurrence networks among biosynthetic signature gene clusters. a) Co-809
occurrence network of gene homolog groups (homolog groups). Nodes in the co-occurrence 810
network represent all homolog groups found in homologous cluster groups (cluster groups). 811
Edges represent significant co-occurrences between homolog groups. Node size is proportional 812
to the number of significant co-occurrences involving that homolog group, and edge width is 813
proportional to the number of unique cluster types (either cluster groups or orphan clusters) with 814
≥ 4 homolog groups that contain the co-occurrence. Distance between nodes is proportional to 815
the number of co-occurrences they have in common, adjusted by edge width. Signature genes 816
(colored circles) and transport-related function (squares) are indicated. Betweenness centrality 817
scores ≥0.2 are indicated in brackets for signature genes and eight other nodes (n1-8). Networks 818
1 and 2 are the two largest networks. b) Histogram of betweenness centrality scores for all nodes 819
in Networks 1 and 2 (bin width = 0.1). c) Significant co-occurrences within PKS and NRPS 820
clusters. Boxplots of homolog group co-occurrences involving signature genes (top) and non-821
signature genes (bottom) across all polyketide synthase (PKS; green) and nonribosomal 822
peptide synthetase (NRPS; purple) clusters with ≥4 unique homolog groups. Boxplots
display the 75th percentile (top hinge), median (middle hinge), the 25th percentile (lower hinge),
and outliers (dots) determined by Tukey’s method. d) Diversity of PKS and NRPS clusters. A 825
line chart tracks the diversity of PKS and NRPS clusters across all cluster sizes for both PKS 826
(green) and NRPS (purple) clusters, where diversity is defined as the total number of unique 827
cluster types (either cluster group or orphan clusters) divided by the total number of clusters. 828
Figure 4. Benchmarking three different algorithms for biosynthetic gene cluster (BGC) 829
detection. a) Proportional Venn diagram of distinct and overlapping BGC genes of interest 830
detected by SMURF, antiSMASH and CO-OCCUR. SMURF and antiSMASH use profile Hidden 831
Markov Models (pHMMs) to identify clustered genes of interest, while CO-OCCUR uses 832
linkage-based criteria (see methods). Clustered genes (unbracketed) and secondary metabolism 833
biosynthesis, transport and catabolism clustered genes (fuNOG) detected are indicated for each 834
algorithm/combination. b) Complementary recovery of the cercosporin BGC using antiSMASH 835
and CO-OCCUR. Shading of genes in the Cercospora zeae-maydis cercosporin BGC (MIBiG
ID BGC0001541; recovered clusterID Cerzm1_BGC0001541_h92 in Table SG) indicates genes 837
identified by antiSMASH (blue), CO-OCCUR (yellow), or both algorithms (green). Gene names 838
are as in (de Jonge et al., 2018) and those required for cercosporin biosynthesis (Chen et al., 839
2007; Newman et al., 2016; de Jonge et al., 2018) are indicated with an asterisk. c) Gene 840
recovery and discovery in clusters homologous to known BGCs. Scatterplots show the percent of 841
genes recovered (top) or discovered (bottom) by antiSMASH vs. CO-OCCUR at each locus 842
homologous to a MIBiG BGC (search criteria: minimum 3 gene cutoff; minimum of 75% genes 843
similar to MIBiG BGC genes in locus). Percent recovery is defined as the number of genes 844
identified by BLASTp in an algorithm-identified cluster divided by the size of the BLASTp 845
identified BGC, multiplied by 100. Percent discovery is defined as the number of genes 846
identified by the cluster algorithm but not identified in the BLASTp search, divided by the size 847
of the BLASTp identified BGC, multiplied by 100. y = x at the dotted reference line. 848
Figure 5. Phylogenetic and ecological signal in the distributions of homologous cluster
groups (cluster groups). a) Scatterplot of phylogenetic and ecological signal of cluster groups.
Values along the X-axis correspond to Fritz and Purvis’ D statistic, representing phylogenetic
signal in a cluster group’s distribution. Distributions of cluster groups with D<0 are more
conserved compared to a Brownian model of trait evolution, and distributions of cluster groups
with D>1 are considered over-dispersed. Cluster groups with more pathotrophs than saprotrophs
have Y>0 while cluster groups with more saprotrophs than pathotrophs have Y<0. Points
representing cluster group distributions with probability ≤0.05 of Brownian trait evolution are in
black, while those >0.05 are in gray. Cluster groups with P(Brownian) ≤0.05 and a lifestyle ratio
≥2 are labeled and described in b) and c). Only cluster groups with ≥4 unique gene homolog
groups (homolog groups) per cluster are shown. b) Summary descriptions of labeled cluster
groups. Sig. genes = signature genes present in the cluster group; Size = number of unique
homolog groups in the cluster group reference cluster; Freq. = number of fungi with a cluster that
belongs to the cluster group; Best known signature(s) = signature gene(s) from the MIBiG
database with the highest similarity to signature genes from the cluster group, with average
percentage similarity shown in parentheses. c) Phylogenetic distributions of labeled cluster
groups. Presence (black cells) and absence (gray cells) matrix of clusters assigned to each
labeled cluster group across Dothideomycetes genomes (tree as in Figure 2).
Figure 6. Three examples of homologous cluster groups (cluster groups) with conserved
phylogenetic distributions. a) Cluster group distributions. Presence (black cells) and absence
(gray cells) matrix of clusters assigned to various cluster groups (columns a-j, described in part
b), across Dothideomycetes genomes (tree as in Figure 2). Each matrix contains distinct sets of
cluster groups that are separated by ≤0.05 distance units on the complete linkage tree in Figure 2.
The number of fungi with each cluster group is indicated at the bottom of each column. b)
Cluster group composition. Cluster groups in set 1 are predicted to encode DHN melanin
biosynthesis; set 2 contains unknown cluster groups with NRPS-like signature genes; set 3
contains unknown cluster groups with PKS signature genes, where the PKSs from group 16 are
on average 84% similar to the PKS in the characterized alternapyrone cluster (MIBiG ID:
BGC0000012). Homolog group presence in a given cluster group is indicated by a gray box
below the description. Brackets surround homolog groups present in <50% of clusters assigned
to a given cluster group.
Figure 7. Diversity of secondary metabolite gene cluster repertoires in Pleosporalean fungi.
a) Grouping of fungi based on the combinations of gene clusters (i.e., cluster repertoires) found
in their genomes. Shown to the left is a complete linkage tree where distance between different
fungal species is proportional to the Sorensen dissimilarity between their cluster repertoires. To
the right is a presence (black) and absence (white) matrix where each column represents a unique
cluster type (either a homologous cluster group or cluster orphan) and each row corresponds to
the adjacent fungal genome. On top of the heatmap is a complete linkage tree displaying
relationships between unique cluster types, where distance is proportional to the Raup-Crick
dissimilarity in cluster composition. b) Relationship between cluster repertoire size and cluster
repertoire diversity. Cluster repertoire diversity was calculated for each genome by finding the
total branch length on the Raup-Crick dissimilarity tree in a) associated with the set of clusters
found in that genome. Cluster repertoire diversity is thus a measurement of a given genome’s
repertoire diversity, in terms of the gene content of its clusters. A solid line models the linear
relationship between repertoire size and diversity (adj. R² = 0.855). The shaded area around the
line represents the 95% confidence interval associated with the model. c) Sampled and projected
secondary metabolite cluster richness within the Pleosporales. Rarefied (solid lines) and
extrapolated (dotted lines) estimates of secondary metabolite gene cluster richness (i.e., the
number of unique cluster types) with respect to the number of sampled genomes are shown for
the Pleosporales. Shaded areas represent the 95% confidence intervals for both estimate types,
derived from 100 bootstrap replicates. All three graphs were generated using data from the 318
unique cluster types with ≥4 unique gene homolog groups that are associated with 47
Pleosporalean fungi and 2 as yet unclassified fungi found within the Pleosporalean clade on the
phylogenomic species tree in Figure 2.
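For readers unfamiliar with the Sorensen dissimilarity named in panel a), the standard presence/absence form is 1 - 2|A ∩ B| / (|A| + |B|), where A and B are the sets of cluster types found in two genomes. The following is a minimal sketch of that formula only; the function name, the toy cluster-type labels, and the handling of two empty repertoires are assumptions, and this is not the authors' code.

```python
def sorensen_dissimilarity(repertoire_a, repertoire_b):
    """Sorensen dissimilarity between two presence/absence cluster repertoires."""
    a, b = set(repertoire_a), set(repertoire_b)
    if not a and not b:
        return 0.0  # assumption: two empty repertoires are treated as identical
    shared = len(a & b)  # cluster types present in both genomes
    return 1.0 - (2.0 * shared) / (len(a) + len(b))


# Toy repertoires: two genomes share 2 of their 3 cluster types.
genome_x = {"CG1", "CG2", "CG3"}
genome_y = {"CG2", "CG3", "OC7"}
print(round(sorensen_dissimilarity(genome_x, genome_y), 3))  # 0.333
```

A value of 0 means the two genomes carry exactly the same cluster types, and a value of 1 means they share none.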

Supporting information:
Table SA. Gene homolog groups (homolog groups) that are part of unexpected co-occurrences.
Table SB. Unexpected co-occurrences between gene homolog groups (homolog groups)
occurring in the vicinity of signature biosynthetic genes, and their frequency across different SM
classes.
Table SC. Positional information of all recovered CO-OCCUR clusters.
Table SD. Genomes used in this study.
Table SE. Cluster types (groups and orphans) detected by CO-OCCUR.
Table SF. BLAST-based annotation of CO-OCCUR clusters with known signature biosynthetic
genes from the MIBiG database.
Table SG. Cross-referencing clusters retrieved by CO-OCCUR, antiSMASH, and BLAST
searches of MIBiG database to determine percent recovery and discovery.
Table SH. Positional information of all recovered antiSMASH clusters.
Table SI. Cluster types (groups and orphans) detected by antiSMASH.
Table SJ. Positional information of all recovered SMURF clusters.
Table SK. Cluster types (groups and orphans) detected by SMURF.
Table SL. Overlapping and complementary recovery of clustered genes of interest using
antiSMASH, SMURF, and CO-OCCUR.
Table SM. Positional information of all clusters recovered with a BLASTp search of the MIBiG
database.

References:
Agrawal AA, Hastings AP, Johnson MT, Maron JL, Salminen JP. Insect herbivores drive real-time ecological and evolutionary change in plant populations. Science. 2012;338:6103.
Akimitsu K, Tsuge T, Kodama M, Yamamoto M, Otani H. Alternaria host-selective toxins: determinant factors of plant disease. Journal of General Plant Pathology. 2014;80:2.
Ali A, Caggia S, Matesic DF, Khan SA. Chaetoglobosin K, an Akt pathway inhibitor, prevents proliferation and migration of prostate carcinoma cells. 2015.
Baselga A, Orme CDL. betapart: an R package for the study of beta diversity. Methods in ecology and evolution. 2012;3:5.
Baselga A. The relationship between species replacement, dissimilarity derived from nestedness, and nestedness. Global Ecology and Biogeography. 2012;21:12.
Beimforde C, Feldberg K, Nylinder S, Rikkinen J, Tuovila H, Dörfelt H, Gube M, Jackson DJ, Reitner J, Seyfullah LJ. Estimating the Phanerozoic history of the Ascomycota lineages: combining fossil and molecular data. Molecular phylogenetics and evolution. 2014;78.
Blin K, Wolf T, Chevrette MG, Lu X, Schwalen CJ, Kautsar SA, ... Medema MH. antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification. Nucleic acids research. 2017;45:W1.
Bok JW, Chung D, Balajee SA, Marr KA, Andes D, Nielsen KF, ... Keller NP. GliZ, a transcriptional regulator of gliotoxin biosynthesis, contributes to Aspergillus fumigatus virulence. Infection and immunity. 2006;74:12.
Bradshaw RE, Slot JC, Moore GG, Chettri P, de Wit PJ, Ehrlich KC, ... Cox MP. Fragmentation of an aflatoxin-like gene cluster in a forest pathogen. New Phytologist. 2013;198:2.
Chen H, Lee MH, Daub ME, Chung KR. Molecular analysis of the cercosporin biosynthetic gene cluster in Cercospora nicotianae. Molecular microbiology. 2007;64:3.
Choudoir MJ, Pepe-Ranney C, Buckley DH. Diversification of secondary metabolite biosynthetic gene clusters coincides with lineage divergence in Streptomyces. Antibiotics. 2018;7:1.
Cimermancic P, Medema MH, Claesen J, Kurita K, Brown LCW, Mavrommatis K, ... Birren BW. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell. 2014;158:2.
Ciuffetti LM, Manning VA, Pandelova I, Betts MF, Martinez JP. Host-selective toxins, Ptr ToxA and Ptr ToxB, as necrotrophic effectors in the Pyrenophora tritici-repentis–wheat interaction. New Phytologist. 2010 Sep;187(4):911-9.
Coleman JJ, Mylonakis E. Efflux in fungi: la piece de resistance. PLoS Pathogens. 2009;5:6.
Condon BJ, Elliott C, González JB, Yun SH, Akagi Y, Wiesner-Hanks T, Kodama M, Turgeon BG. Clues to an evolutionary mystery: the genes for T-Toxin, enabler of the devastating 1970 Southern corn leaf blight epidemic, are present in ancestral species, suggesting an ancient origin. Molecular plant-microbe interactions. 2018 Nov 12;31(11):1154-65.
Crameri R, Garbani M, Rhyner C, Huitema C. Fungi: the neglected allergenic sources. Allergy. 2014;69:2.
Daly J. The host-specific toxins of Helminthosporia. Plant Infection: The Physiological and Biochemical Basis. Asada Y, Bushnell WR, Ouchi S, Vance CP, editors. Berlin: Springer-Verlag. 1982.
De Jonge R, Ebert MK, Huitt-Roehl CR, Pal P, Suttle JC, Spanner RE, ... Thomma BP. Gene cluster conservation provides insight into cercosporin biosynthesis and extends production to the genus Colletotrichum. Proceedings of the National Academy of Sciences. 2018;115:24.
Del Carratore F, Zych K, Cummings M, Takano E, Medema MH, Breitling R. Computational identification of co-evolving multi-gene modules in microbial biosynthetic gene clusters. Communications Biology. 2019;2:1.
Dhillon B, Feau N, Aerts AL, Beauseigle S, Bernier L, Copeland A, ... LaButti KM. Horizontal gene transfer and gene dosage drives adaptation to wood colonization in a tree pathogen. Proceedings of the National Academy of Sciences. 2015;112:11.
Firn RD, Jones CG. Natural products–a simple model to explain chemical diversity. Natural product reports. 2003;20:4.
Fritz SA, Purvis A. Selectivity in mammalian extinction risk and threat types: a new measure of phylogenetic signal strength in binary traits. Conservation Biology. 2010;24:4.
Fujii I, Yoshida N, Shimomaki S, Oikawa H, Ebizuka Y. An iterative type I polyketide synthase PKSN catalyzes synthesis of the decaketide alternapyrone with regio-specific octa-methylation. Chemistry & biology. 2005;12:12.
Gardiner DM, Cozijnsen AJ, Wilson LM, Pedras MS, Howlett BJ. The sirodesmin biosynthetic gene cluster of the plant pathogenic fungus Leptosphaeria maculans. Molecular microbiology. 2004 Sep;53(5):1307-18.
Gardiner DM, Waring P, Howlett BJ. The epipolythiodioxopiperazine (ETP) class of fungal toxins: distribution, mode of action, functions and biosynthesis. Microbiology. 2005;151:4.
Glassmire AE, Jeffrey CS, Forister ML, Parchman TL, Nice CC, Jahner JP, ... Leonard MD. Intraspecific phytochemical variation shapes community and population structure for specialist caterpillars. New Phytologist. 2016;212:1.
Gluck-Thaler E, Slot JC. Specialized plant biochemistry drives gene clustering in fungi. The ISME journal. 2018;12:7.
Gluck-Thaler E, Vijayakumar V, Slot JC. Fungal adaptation to plant defences through convergent assembly of metabolic modules. Molecular ecology. 2018;27:24.
Goodwin SB, M'Barek SB, Dhillon B, Wittenberg AH, Crane CF, Hane JK, ... Antoniw J. Finished genome of the fungal wheat pathogen Mycosphaerella graminicola reveals dispensome structure, chromosome plasticity, and stealth pathogenesis. PLoS genetics. 2011;7:6.
Grandaubert J, Lowe RG, Soyer JL, Schoch CL, Van de Wouw AP, Fudal I, ... Linglin J. Transposable element-assisted evolution and adaptation to host plant within the Leptosphaeria maculans-Leptosphaeria biglobosa species complex of fungal pathogens. BMC genomics. 2014;15:1.
Grigoriev IV, Nikitin R, Haridas S, Kuo A, Ohm R, Otillar R, Riley R, Salamov A, Zhao X, Korzeniewski F, Smirnova T. MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic acids research. 2014 Jan 1;42(D1):D699-704.
Hane JK, Rouxel T, Howlett BJ, Kema GH, Goodwin SB, Oliver RP. A novel mode of chromosomal evolution peculiar to filamentous Ascomycete fungi. Genome biology. 2011;12:5.
Haridas S, Albert R, Binder M, Bloem J, LaButti K, Salamov A, Andreopoulos B, Baker SE, Barry K, Bills G, Bluhm BH, Cannon C, Castanera R, Culley DE, Daum C, Ezra D, González JB, Henrissat B, Kuo A, Liang C, Lipzen A, Lutzoni F, Magnuson J, Mondo S, Nolan M, Ohm RA, Pangilinan J, Park H-J, Ramírez L, Alfaro M, Sun H, Tritt A, Yoshinaga Y, Zwiers L-H, Turgeon BG, Goodwin SB, Spatafora JW, Crous PW, Grigoriev IV. 101 Dothideomycetes genomes: a test case for predicting lifestyles and emergence of pathogens. Studies in Mycology, in press.
Hartmann FE, McDonald BA, Croll D. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen. Molecular ecology. 2018;27:12.
Hoffmeister D, Keller NP. Natural products of filamentous fungi: enzymes, genes, and their regulation. Natural product reports. 2007;24:2.
Holeski LM, Hillstrom ML, Whitham TG, Lindroth RL. Relative importance of genetic, ontogenetic, induction, and seasonal variation in producing a multivariate defense phenotype in a foundation tree species. Oecologia. 2012;170:3.
Holeski LM, Keefover-Ring K, Bowers MD, Harnenz ZT, Lindroth RL. Patterns of phytochemical variation in Mimulus guttatus (yellow monkeyflower). Journal of chemical ecology. 2013;39:4.
Horn BW. Ecology and population biology of aflatoxigenic fungi in soil. Journal of Toxicology-Toxin Reviews. 2003;22:2-3.
Hsieh TC, Ma KH, Chao A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods in Ecology and Evolution. 2016;7:12.
Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, Von Mering C, Bork P. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular biology and evolution. 2017;34:8.
Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, ... Jensen LJ. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic acids research. 2016;44:D1.
Jiang C, Song J, Zhang J, Yang Q. New production process of the antifungal chaetoglobosin A using cornstalks. Brazilian journal of microbiology. 2017;48:3.
Kasahara K, Miyamoto T, Fujimoto T, Oguri H, Tokiwano T, Oikawa H, ... Fujii I. Solanapyrone synthase, a possible Diels–Alderase and iterative type I polyketide synthase encoded in a biosynthetic gene cluster from Alternaria solani. ChemBioChem. 2010;11:9.
Kaur S. Phytotoxicity of solanapyrones produced by the fungus Ascochyta rabiei and their possible role in blight of chickpea (Cicer arietinum). Plant Science. 1995;109:1.
Keller NP. Translating biosynthetic gene clusters into fungal armor and weaponry. Nature chemical biology. 2015;11:9.
Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, Wolfe KH, Fedorova ND. SMURF: genomic mapping of fungal secondary metabolite clusters. Fungal Genetics and Biology. 2010;47:9.
Kurmayer R, Blom JF, Deng L, Pernthaler J. Integrating phylogeny, geographic niche partitioning and secondary metabolite synthesis in bloom-forming Planktothrix. The ISME journal. 2015;9:4.
Kursar TA, Dexter KG, Lokvam J, Pennington RT, Richardson JE, Weber MG, ... Coley PD. The evolution of antiherbivore defenses and their contribution to species coexistence in the tropical tree genus Inga. Proceedings of the National Academy of Sciences. 2009;106:43.
Larsson J. Eulerr: Area-Proportional Euler and Venn Diagrams with Ellipses. 2018. R package version 3.
Li D, Baldwin IT, Gaquerel E. Navigating natural variation in herbivory-induced secondary metabolism in coyote tobacco populations using MS/MS structural analysis. Proceedings of the National Academy of Sciences. 2015;112:30.
Lind AL, Wisecaver JH, Lameiras C, Wiemann P, Palmer JM, Keller NP, ... Rokas A. Drivers of genetic diversity in secondary metabolic gene clusters within a fungal species. PLoS biology. 2017;15:11.
Manning VA, Pandelova I, Dhillon B, Wilhelm LJ, Goodwin SB, Berlin AM, ... Holman WH. Comparative genomics of a plant-pathogenic fungus, Pyrenophora tritici-repentis, reveals transduplication and the impact of repeat elements on pathogenicity and population divergence. G3: Genes, Genomes, Genetics. 2013;3:1.
Menke J, Dong Y, Kistler HC. Fusarium graminearum Tri12p influences virulence to wheat and trichothecene accumulation. Molecular plant-microbe interactions. 2012;25:11.
Mizushina Y, Kamisuki S, Kasai N, Shimazaki N, Takemura M, Asahara H, ... Sugawara F. A plant phytotoxin, solanapyrone A, is an inhibitor of DNA polymerase β and λ. Journal of Biological Chemistry. 2002;277:1.
Newman AG, Townsend CA. Molecular characterization of the cercosporin biosynthetic pathway in the fungal plant pathogen Cercospora nicotianae. Journal of the American Chemical Society. 2016;138:12.
Nielsen JC, Grijseels S, Prigent S, Ji B, Dainat J, Nielsen KF, ... Nielsen J. Global analysis of biosynthetic gene clusters reveals vast potential of secondary metabolite production in Penicillium species. Nature microbiology. 2017;2:6.
Ohm RA, Feau N, Henrissat B, Schoch CL, Horwitz BA, Barry KW, Condon BJ, Copeland AC, Dhillon B, Glaser F, Hesse CN. Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi. PLoS Pathogens. 2012 Dec;8(12).
Oksanen J, Kindt R, Legendre P, O’Hara B, Stevens MH, Oksanen MJ, Solymos P, Wagner H. The vegan package. Community ecology package. 2007;10.
Olarte RA, Menke J, Zhang Y, Sullivan S, Slot JC, Huang Y, ... Bushley KE. Chromosome rearrangements shape the diversification of secondary metabolism in the cyclosporin producing fungus Tolypocladium inflatum. BMC genomics. 2019;20:1.
Oliver RP, Friesen TL, Faris JD, Solomon PS. Stagonospora nodorum: from pathology to genomics and host resistance. Annual review of phytopathology. 2012;50.
Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N, ... Pearse W. Caper: comparative analyses of phylogenetics and evolution in R. R package version 0.5. 2012.
Pandelova I, Figueroa M, Wilhelm LJ, Manning VA, Mankaney AN, Mockler TC, Ciuffetti LM. Host-selective toxins of Pyrenophora tritici-repentis induce common responses associated with host susceptibility. PLoS One. 2012;7:7.
Patron NJ, Waller RF, Cozijnsen AJ, Straney DC, Gardiner DM, Nierman WC, Howlett BJ. Origin and distribution of epipolythiodioxopiperazine (ETP) gene clusters in filamentous ascomycetes. BMC Evolutionary Biology. 2007;7:1.
Proctor RH, McCormick SP, Kim HS, Cardoza RE, Stanley AM, Lindo L, ... Alexander NJ. Evolution of structural diversity of trichothecenes, a family of toxins produced by plant pathogenic and entomopathogenic fungi. PLoS pathogens. 2018;14:4.
Reynolds HT, Vijayakumar V, Gluck-Thaler E, Korotkin HB, Matheny PB, Slot JC. Horizontal gene cluster transfer increased hallucinogenic mushroom diversity. Evolution letters. 2018;2:2.
Ruibal C, Gueidan C, Selbmann L, Gorbushina AA, Crous PW, Groenewald JZ, ... Staley JT. Phylogeny of rock-inhabiting fungi related to Dothideomycetes. Studies in Mycology. 2009;64.
Schoch CL, Crous PW, Groenewald JZ, Boehm EWA, Burgess TI, De Gruyter J, ... Harada Y. A class-wide phylogenetic assessment of Dothideomycetes. Studies in mycology. 2009;64.
Schümann J, Hertweck C. Molecular basis of cytochalasan biosynthesis in fungi: gene cluster analysis and evidence for the involvement of a PKS-NRPS hybrid synthase by RNA silencing. Journal of the American Chemical Society. 2007;129:31.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, ... Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research. 2003;13:11.
Shi-Kunne X, Faino L, van den Berg GC, Thomma BP, Seidl MF. Evolution within the fungal genus Verticillium is characterized by chromosomal rearrangement and gene loss. Environmental microbiology. 2018;20:4.
Slot JC. Fungal gene cluster diversity and evolution. In Advances in genetics (Vol. 100, pp. 141-178). Academic Press. 2017.
Slot JC, Gluck-Thaler E. Metabolic gene clusters, fungal diversity, and the generation of accessory functions. Current opinion in genetics & development. 2019;58.
Song W, Qiao X, Chen K, Wang Y, Ji S, Feng J, ... Ye M. Biosynthesis-based quantitative analysis of 151 secondary metabolites of licorice to differentiate medicinal Glycyrrhiza species and their hybrids. Analytical chemistry. 2017;89:5.
Spatafora JW, Bushley KE. Phylogenomics and evolution of secondary metabolism in plant-associated fungi. Current opinion in plant biology. 2015;26.
Spatafora JW, Owensby CA, Douhan GW, Boehm EW, Schoch CL. Phylogenetic placement of the ectomycorrhizal genus Cenococcum in Gloniaceae (Dothideomycetes). Mycologia. 2012;104:3.
Suetrong S, Boonyuen N, Pang KL, Ueapattanakit J, Klaysuban A, Sri-indrasutdhi V, ... Jones EG. A taxonomic revision and phylogenetic reconstruction of the Jahnulales (Dothideomycetes), and the new family Manglicolaceae. Fungal Diversity. 2011;51:1.
Theobald S, Vesth TC, Rendsvig JK, Nielsen KF, Riley R, de Abreu LM, ... Hoof JB. Uncovering secondary metabolite evolution and biosynthesis using gene cluster networks and genetic dereplication. Scientific reports. 2018;8:1.
Turgeon BG, Baker SE. Genetic and genomic dissection of the Cochliobolus heterostrophus Tox1 locus controlling biosynthesis of the polyketide virulence factor T-toxin. Advances in genetics. 2007;57.
Vesth TC, Nybo JL, Theobald S, Frisvad JC, Larsen TO, Nielsen KF, ... Gladden JM. Investigation of inter- and intraspecies variation through genome sequencing of Aspergillus section Nigri. Nature Genetics. 2018;50:12.
Villani A, Proctor RH, Kim HS, Brown DW, Logrieco AF, Amatulli MT, ... Susca A. Variation in secondary metabolite production potential in the Fusarium incarnatum-equiseti species complex revealed by comparative analysis of 13 genomes. BMC genomics. 2019;20:1.
Walton JD. Host-selective toxins: Agents of compatibility. Plant Cell. 1996;8:10.
Walton JD, Panaccione DG. Host-selective toxins and disease specificity: perspectives and progress. Annual review of phytopathology. 1993;31:1.
Wang G, Liu Z, Lin R, Li E, Mao Z, Ling J, ... Xie B. Biosynthesis of antibiotic leucinostatins in bio-control fungus Purpureocillium lilacinum and their inhibition on Phytophthora revealed by genome mining. PLoS pathogens. 2016;12:7.
Wang JS, Tang L. Epidemiology of aflatoxin exposure and human liver cancer. Journal of Toxicology: Toxin Reviews. 2004;23:2-3.
Weinhold A, Ullah C, Dressel S, Schoettner M, Gase K, Gaquerel E, ... Baldwin IT. O-acyl sugars protect a wild tobacco from both native fungal pathogens and a specialist herbivore. Plant physiology. 2017;174:1.
Wight WD, Kim KH, Lawrence CB, Walton JD. Biosynthesis and role in virulence of the histone deacetylase inhibitor depudecin from Alternaria brassicicola. Molecular plant-microbe interactions. 2009;22:10.
Wijayawardene NN, Hyde KD, Lumbsch HT, Liu JK, Maharachchikumbura SS, Ekanayaka AH, ... Phookamsak R. Outline of ascomycota: 2017. Fungal Diversity. 2018;88:1.
Wisecaver JH, Slot JC, Rokas A. The evolution of fungal metabolic pathways. PLoS Genetics. 2014;10:12.
Wolpert TJ, Dunkle LD, Ciuffetti LM. Host-selective toxins and avirulence determinants: what's in a name? Annual review of phytopathology. 2002;40:1.
Xu YJ, Luo F, Li B, Shang Y, Wang C. Metabolic conservation and diversification of Metarhizium species correlate with fungal host-specificity. Frontiers in microbiology. 2016;7.
Ziemert N, Lechner A, Wietz M, Millán-Aguiñaga N, Chavarria KL, Jensen PR. Diversity and evolution of secondary metabolism in the marine actinomycete genus Salinispora. Proceedings of the National Academy of Sciences. 2014;111:12.
Züst T, Heichinger C, Grossniklaus U, Harrington R, Kliebenstein DJ, Turnbull LA. Natural enemies drive geographic variation in plant defenses. Science. 2012;338:6103.
Figure SA. Secondary metabolite (SM) clustering pipeline. [Flowchart; boxes are listed below by sub-pipeline, with grouping and order inferred from the extracted text. Arrows are not reproduced.]

Input: annotated genomes with genes clustered into homolog groups (HGs).

Null model pipeline:
  Randomly sample 2 genes occurring within 6 genes of each other.
  Retrieve the HG to which each gene belongs and the number of genes in each HG.
  Count the number of times members of each HG are found within 6 genes of each other across the complete dataset (101 Dothideomycete genomes).
  Bin sampled HG pairs into different categories based on the total number of genes in each HG.
  Repeat 500,000 times (without replacement).
  Output: random distributions of HG co-occurrences.

CO-OCCUR pipeline:
  SM cluster prediction using SMURF. Output: initial SM clusters.
  Extend SM clusters to contain neighboring genes (within a 6 gene distance) belonging to a homolog group (HG) found in another SM cluster. Output: extended SM clusters.
  Extract HGs present in SMURF clusters that co-occur an unexpected number of times across all Dothideomycete genomes compared with the random distribution. Output: unexpected HG co-occurrences.
  Conduct de novo search for all clusters consisting of genes belonging to HGs that are part of unexpected co-occurrences. Output: new SM clusters (3399 total).
  Group together new SM clusters that have ~90% of their genes in common (719 total). Similar clusters found? Yes: cluster groups (CGs; 422 total). No: orphan clusters (OCs; 297 total).

Downstream analyses:
  Count the frequency of all unexpected HG co-occurrences across all CGs. Output: co-occurrence network of HGs.
  Group together genomes based on Raup-Crick dissimilarity in their CG repertoire. Output: multi-cluster profiles.
  BLAST CGs and OCs against known cluster database (MIBIG, local database). Output: MIBIG annotated CGs and OCs.
  Count the number of CGs per genome (average 33.7). Output: CG distribution across all genomes.
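The flowchart step "Count the number of times members of each HG are found within 6 genes of each other" can be illustrated with a short sketch. Everything here is an assumption made for illustration: the function name, the representation of each contig as an ordered list of HG labels, and the reading of "within 6 genes" as a difference in gene position of at most 6. It is not the authors' implementation.

```python
from collections import Counter


def count_hg_cooccurrences(contigs, max_distance=6):
    """Count how often each unordered pair of HGs lies within max_distance genes.

    contigs: list of contigs, each a list of HG labels in gene order.
    """
    counts = Counter()
    for hg_order in contigs:
        for i, hg_a in enumerate(hg_order):
            # Compare gene i only with genes up to max_distance positions downstream,
            # so every gene pair is visited exactly once.
            last = min(i + max_distance, len(hg_order) - 1)
            for j in range(i + 1, last + 1):
                pair = tuple(sorted((hg_a, hg_order[j])))
                counts[pair] += 1
    return counts


# Toy contigs (hypothetical HG labels).
contigs = [
    ["HG1", "HG2", "HG3"],
    ["HG1", "HG4", "HG5", "HG6", "HG7", "HG8", "HG2"],         # HG1 and HG2 are 6 apart
    ["HG1", "HG4", "HG5", "HG6", "HG7", "HG8", "HG9", "HG3"],  # HG1 and HG3 are 7 apart
]
counts = count_hg_cooccurrences(contigs)
print(counts[("HG1", "HG2")], counts[("HG1", "HG3")])  # 2 1
```

Observed counts of this kind are what the pipeline compares against the random distributions produced by the null model to decide which co-occurrences are unexpected.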
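Figure SB. [Diagram comparing proteins in clusters predicted by three methods; set labels: co-occur, smurf, antiSMASH. Values are given as: number of proteins (proteins participating in SM biosynthesis, transport and catabolism); * = p(enriched) < 0.01, Bonferroni corrected. Extracted values, whose assignment to regions of the diagram is not recoverable: 6051 (796); 1553 (290); 5416 (887); 1254 (200); 1408 (200*); 3326 (1301*); 3873 (1468*).]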
[Figure residue: rotated column labels naming cluster groups, each of the form <class prefix>_group<number>_s<number>, e.g., DMAT+NRPS_group30_s9, n.d._group178_s4, n.d.+NRPS-Like_group191_s4, PKS_group67_s7, TC_group184_s4. The remaining labels follow the same pattern.]
p28_
s10
PKS+
TC_g
roup
142_
s5PK
S+TC
_gro
up18
2_s4
PKS_
grou
p80_
s6PK
S_gr
oup2
00_s
4PK
S_gr
oup1
87_s
4PK
S_gr
oup9
4_s6
PKS_
grou
p132
_s5
PKS_
grou
p64_
s7PK
S_gr
oup1
06_s
6PK
S_gr
oup4
3_s8
PKS_
grou
p209
_s4
PKS+
PKS-
Like
_gro
up36
_s8
PKS+
PKS-
Like
_gro
up10
9_s6
PKS_
grou
p170
_s4
PKS_
grou
p205
_s4
PKS_
grou
p70_
s7PK
S_gr
oup1
1 1_s
6PK
S_gr
oup1
05_s
6PK
S_gr
oup1
12_s
6PK
S_gr
oup1
96_s
4PK
S_gr
oup2
34_s
4PK
S_gr
oup5
4_s7
PKS_
grou
p238
_s4
NR
PS-L
ike_
grou
p226
_s4
NR
PS-L
ike_
grou
p203
_s4
PKS_
grou
p138
_s5
PKS_
grou
p159
_s5
PKS_
grou
p72_
s7H
YBR
ID+P
KS_g
roup
15_s
12PK
S_gr
oup2
2_s1
1PK
S_gr
oup1
6_s1
2PK
S_gr
oup2
12_s
4PK
S_gr
oup1
76_s
4PK
S+TC
_gro
up45
_s8
PKS_
grou
p102
_s6
PKS_
grou
p139
_s5
PKS_
grou
p83_
s6PK
S+TC
_gro
up58
_s7
PKS_
grou
p96_
s6D
MAT
+PKS
_gro
up18
3_s4
PKS_
grou
p53_
s8PK
S_gr
oup1
24_s
5PK
S_gr
oup1
04_s
6N
RPS
+PKS
_gro
up22
7_s4
NR
PS+P
KS_g
roup
143_
s5PK
S_gr
oup2
37_s
4PK
S_gr
oup7
1_s7
PKS_
grou
p56_
s7PK
S_gr
oup2
07_s
4PK
S_gr
oup8
2_s6
PKS_
grou
p47_
s8PK
S_gr
oup8
4_s6
DM
A T+P
KS_g
roup
48_s
8PK
S_gr
oup7
6_s6
NR
PS+P
KS_g
roup
161_
s5D
MAT
+PKS
_gro
up16
3_s5
n.d.
+PKS
_gro
up61
_s7
PKS_
grou
p153
_s5
PKS_
grou
p114
_s6
PKS_
grou
p135
_s5
PKS_
grou
p171
_s4
PKS_
grou
p145
_s5
NR
PS+P
KS_g
roup
224_
s4D
MAT
+PKS
_gro
up98
_s6
PKS+
TC_g
roup
179_
s4PK
S_gr
oup1
40_s
5PK
S_gr
oup9
5_s6
PKS_
grou
p197
_s4
PKS_
grou
p167
_s4
PKS_
grou
p186
_s4
PKS_
grou
p85_
s6PK
S_gr
oup2
14_s
4PK
S_gr
oup2
32_s
4n.
d.+P
KS_g
roup
201_
s4PK
S_gr
oup2
36_s
4N
RPS
-Lik
e+PK
S_gr
oup2
22_s
4PK
S_gr
oup1
69_s
4PK
S_gr
oup1
20_s
5PK
S_gr
oup1
29_s
5PK
S_gr
oup1
95_s
4PK
S_gr
oup1
51_s
5PK
S_gr
oup7
4_s6
PKS_
grou
p215
_s4
NR
PS-L
ike+
PKS_
grou
p194
_s4
PKS_
grou
p119
_s5
PKS_
grou
p204
_s4
n.d.
+PKS
_gro
up23
5_s4
PKS_
grou
p79_
s6PK
S+TC
_gro
up14
7_s5
PKS+
TC_g
roup
148_
s5PK
S_gr
oup5
5_s7
PKS_
grou
p219
_s4
PKS_
grou
p77_
s6PK
S_gr
oup1
62_s
5
Viridothelium virens (Trypetheliales)
Myriangium duriaei CBS 260.36 (Myriangiales)
Elsinoe ampelina (Myriangiales)
Aureobasidium pullulans EXF-150 (Dothideales)
Aureobasidium namibiae CBS 147.97 (Dothideales)
Aureobasidium melanogenum CBS 110374 (Dothideales)
Aureobasidium subglaciale EXF-2481 (Dothideales)
Delphinella strobiligena (Dothideales)
Polychaeton citri CBS 116435 (Capnodiales)
Dissoconium aciculare CBS 342.82 (Capnodiales)
Zymoseptoria tritici IPO323 (Capnodiales)
Zymoseptoria pseudotritici STIR04_2.2.1 (Capnodiales)
Zymoseptoria ardabiliae STIR04_1.1.1 (Capnodiales)
Dothistroma septosporum NZE10 (Capnodiales)
Passalora fulva (Capnodiales)
Zasmidium cellare ATCC 36951 (Capnodiales)
Pseudocercospora fijiensis (Capnodiales)
Sphaerulina musiva SO2202 (Capnodiales)
Sphaerulina populicola (Capnodiales)
Cercospora zeae-maydis SCOH1-5 (Capnodiales)
Hortaea acidophila (Dothideales)
Baudoinia panamericana UAMH 10762 (Capnodiales)
Teratosphaeria nubilosa (Capnodiales)
Piedraia hortae CBS 480.64 (Capnodiales)
Acidomyces richmondensis BFW (Unknown)
Eremomyces bilateralis CBS 781.70 (Unknown)
Microthyrium microscopicum (Microthyriales)
Trichodelitschia bisporula (Phaeotrichales)
Tothia fuscella (Venturiales)
Verruconis gallopava (Venturiales)
Venturia inaequalis (Venturiales)
Venturia pyrina (Venturiales)
Aulographum hederae (Unknown)
Rhizodiscina lignyota (Patellariales)
Lineolata rhizophorae (Dothideales)
Patellaria atrata CBS 101060 (Patellariales)
Coniosporium apollinis CBS 100218 (Unknown)
Aplosporella prunicola CBS 121167 (Botryosphaeriales)
Phyllosticta citriasiana (Botryosphaeriales)
Neofusicoccum parvum UCRNP2 (Botryosphaeriales)
Macrophomina phaseolina MS6 (Botryosphaeriales)
Botryosphaeria dothidea (Botryosphaeriales)
Diplodia seriata (Botryosphaeriales)
Saccharata proteae CBS 121410 (Botryosphaeriales)
Pseudovirgaria hyperparasitica (Capnodiales)
Lepidopterella palustris CBS 459.81 (Mytilinidiales)
Glonium stellatum (Unknown)
Cenococcum geophilum 1.58 (Unknown)
Lophium mytilinum (Mytilinidiales)
Mytilinidion resinicola (Mytilinidiales)
Rhytidhysteron rufulum (Hysteriales)
Hysterium pulicare (Hysteriales)
Delitschia confertaspora (Pleosporales)
Zopfia rhizophila CBS 207.26 (Unknown)
Lindgomyces ingoldianus (Pleosporales)
Clohesyomyces aquaticus (Pleosporales)
Amniculicola lignicola CBS 123094 (Pleosporales)
Lophiotrema nucula (Pleosporales)
Polyplosphaeria fusca (Pleosporales)
Didymosphaeria enalia (Pleosporales)
Aaosphaeria arxii (Pleosporales)
Lophiostoma macrostomum CBS 122681 (Pleosporales)
Westerdykella ornata (Pleosporales)
Sporormia fimetaria CBS 119925 (Pleosporales)
Massariosphaeria phaeospora (Pleosporales)
Trematosphaeria pertusa (Pleosporales)
Bimuria novae-zelandiae (Pleosporales)
Paraphaeosphaeria sporulosa (Pleosporales)
Karstenula rhodostoma CBS 690.94 (Pleosporales)
Massarina eburnea CBS 473.64 (Pleosporales)
Byssothecium circinans (Pleosporales)
Periconia macrospinosa (Pleosporales)
Lentithecium fluviatile CBS 122367 (Pleosporales)
Stagonospora sp. SRC1lsM3a (Pleosporales)
Ampelomyces quisqualis (Pleosporales)
Parastagonospora nodorum SN15 (Pleosporales)
Ophiobolus disseminans (Pleosporales)
Setomelanomma holmii (Pleosporales)
Leptosphaeria maculans JN3 (Pleosporales)
Plenodomus tracheiphilus IPT5 (Pleosporales)
Clathrospora elynae (Pleosporales)
Decorospora gaudefroyi (Pleosporales)
Alternaria brassicicola (Pleosporales)
Alternaria alternata (Pleosporales)
Pyrenophora tritici-repentis (Pleosporales)
Pyrenophora teres f. teres 0-1 (Pleosporales)
Setosphaeria turcica Et28A (Pleosporales)
Curvularia lunata m118 (Pleosporales)
Bipolaris maydis C5 (Pleosporales)
Bipolaris sorokiniana ND90Pr (Pleosporales)
Bipolaris oryzae ATCC 44560 (Pleosporales)
Bipolaris zeicola 26-R-13 (Pleosporales)
Bipolaris victoriae FI3 (Pleosporales)
Cucurbitaria berberidis CBS 394.84 (Pleosporales)
Pyrenochaeta sp. DS3sAY3a (Pleosporales)
Lizonia empirigonia (Unknown)
Didymella exigua CBS 183.55 (Pleosporales)
Macroventuria anomochaeta (Pleosporales)
Dothidotthia symphoricarpi CBS 119687 (Pleosporales)
Pleomassaria siparia CBS 279.74 (Pleosporales)
Melanomma pulvis-pyrius CBS 109.77 (Pleosporales)
Aspergillus nidulans (Eurotiomycetes)
Coccidioides immitis (Eurotiomycetes)
Fusarium graminearum (Sordariomycetes)
Neurospora crassa (Sordariomycetes)
Botrytis cinerea (Leotiomycetes)
Sclerotinia sclerotiorum (Leotiomycetes)
Pyronema omphalodes (Pezizomycetes)
Tuber melanosporum (Pezizomycetes)
Scale bar: 0.10
Figure SC
Column labels: cluster groups named by signature gene class, group number and an "_s" suffix, with "." as the separator inside class names, for example NRPS.Like.PKS_group60_s7, DMAT.NRPS_group12_s13 and HYBRID_group116_s5.
Row labels: the same taxa, in the same order, as listed under Figure SB.
Legend: Fritz and Purvis' D, with marked values 0, 1 and 2.5.
Scale bar: 0.10
Figure SD
Recovery of 87 melanin cluster loci
x-axis: cluster detection algorithm (co-occur, antiSMASH, SMURF)
y-axis: % total proteins recovered at locus (ticks 0.00 to 1.00)
Figure SE
Phylogenetic distance vs. dissimilarity in cluster repertoire in the Pleosporales
x-axis: phylogenetic distance (ticks 0.0 to 0.4)
y-axis: Sørensen dissimilarity (ticks 0.25 to 1.00)
Figure SF
Row labels: the taxa listed under Figure SB, each with its order (or class, for taxa outside the Dothideomycetes).
Order labels marked on the figure: Dothideales, Capnodiales, Venturiales, Botryosphaeriales, Mytilinidiales, Hysteriales, Pleosporales, Myriangiales.
x-axis: number of refined clusters with a signature gene (ticks 0 to 60)
Scale bar: 0.10
Figure SG
Partitioning dissimilarity between cluster repertoires (Pleosporales)
x-axis: dissimilarity estimates across all cluster repertoires (ticks 0.0 to 0.9)
y-axis: density (ticks 0 to 60)
Legend: Sørensen dissimilarity (total dissimilarity across all repertoires); turnover component of Sørensen dissimilarity; nestedness component of Sørensen dissimilarity
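The partition named in Figure SG can be illustrated with a minimal sketch. It assumes the widely used pairwise formulation, in which Sørensen dissimilarity between two repertoires is split into a turnover part (Simpson dissimilarity) and a nestedness part (the remainder). The figure itself reports dissimilarity across all repertoires, which may rely on a multiple-site formulation that is not reproduced here. The function name and the cluster group identifiers are hypothetical.

```python
def sorensen_partition(repertoire_a, repertoire_b):
    """Return (total, turnover, nestedness) for two cluster repertoires.

    Assumed pairwise formulation, with
      a = groups present in both repertoires,
      b = groups present only in the first,
      c = groups present only in the second:
        total      = (b + c) / (2a + b + c)        # Sorensen dissimilarity
        turnover   = min(b, c) / (a + min(b, c))   # Simpson dissimilarity
        nestedness = total - turnover
    """
    set_a, set_b = set(repertoire_a), set(repertoire_b)
    if not set_a or not set_b:
        # With an empty repertoire the turnover term is 0/0, so refuse it.
        raise ValueError("both repertoires must contain at least one group")

    shared = len(set_a & set_b)   # a
    only_a = len(set_a - set_b)   # b
    only_b = len(set_b - set_a)   # c

    total = (only_a + only_b) / (2 * shared + only_a + only_b)
    smaller = min(only_a, only_b)
    turnover = smaller / (shared + smaller)
    nestedness = total - turnover
    return total, turnover, nestedness


# Hypothetical repertoires: the second is a strict subset of the first,
# so all of the dissimilarity is attributed to nestedness.
nested = sorensen_partition({"g1", "g2", "g3", "g4"}, {"g1", "g2"})

# Hypothetical repertoires of equal size that swap one group for another,
# so all of the dissimilarity is attributed to turnover.
replaced = sorensen_partition({"g1", "g2"}, {"g1", "g3"})
```

Under these assumptions, a repertoire that is a subset of another contributes only to the nestedness component, while repertoires of equal size that differ in membership contribute only to the turnover component.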
Figure SH
Relationship between gene homolog group size and cluster composition diversity (Raup-Crick)
x-axis: gene homolog group size (ticks 0 to 900)
y-axis: Raup-Crick dissimilarity (ticks 0 to 30)
Labeled homolog groups: MCL000001, MCL000002, MCL000003, MCL000005, MCL000006, MCL000016, MCL000017, MCL000033, MCL000109, MCL000176, MCL000193, MCL000253, MCL000357, MCL006196, MCL007003, MCL007271, MCL007288
Figure SI