
Research Article

The architecture of metabolism maximizes biosynthetic diversity in the largest class of fungi

Authors:

Emile Gluck-Thaler, Department of Plant Pathology, The Ohio State University, Columbus, OH, USA, and Biological Sciences, University of Pittsburgh, Pittsburgh, PA, USA

Sajeet Haridas, US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA

Manfred Binder, TechBase, R-Tech GmbH, Regensburg, Germany

Igor V. Grigoriev, US Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, and Department of Plant and Microbial Biology, University of California, Berkeley, CA

Pedro W. Crous, Westerdijk Fungal Biodiversity Institute, Uppsalalaan 8, 3584 CT Utrecht, The Netherlands

Joseph W. Spatafora, Department of Botany and Plant Pathology, Oregon State University, OR, USA

Kathryn Bushley, Department of Plant and Microbial Biology, University of Minnesota, MN, USA

Jason C. Slot, Department of Plant Pathology, The Ohio State University, Columbus, OH, USA

Corresponding author: [email protected]

bioRxiv preprint doi: https://doi.org/10.1101/2020.01.31.928846; this version posted February 1, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Abstract:

Background - Ecological diversity in fungi is largely defined by metabolic traits, including the ability to produce secondary or "specialized" metabolites (SMs) that mediate interactions with other organisms. Fungal SM pathways are frequently encoded in biosynthetic gene clusters (BGCs), which facilitate the identification and characterization of metabolic pathways. Variation in BGC composition reflects the diversity of their SM products. Recent studies have documented surprising diversity of BGC repertoires among isolates of the same fungal species, yet little is known about how this population-level variation is inherited across macroevolutionary timescales.

Results - Here, we applied a novel linkage-based algorithm to reveal previously unexplored dimensions of diversity in BGC composition, distribution, and repertoire across 101 species of Dothideomycetes, which are considered to be the most phylogenetically diverse class of fungi and are known to produce many SMs. We predicted both complementary and overlapping sets of clustered genes compared with existing methods and identified novel gene pairs that associate with known secondary metabolite genes. We found that variation in BGC repertoires is due to non-overlapping BGC combinations and that several BGCs have biased ecological distributions, consistent with niche-specific selection. We observed that total BGC diversity scales linearly with increasing repertoire size, suggesting that secondary metabolites have little structural redundancy in individual fungi.

Conclusions - We project that there is substantial unsampled BGC diversity across specific families of Dothideomycetes, which will provide a roadmap for future sampling efforts. Our approach and findings lend new insight into how BGC diversity is generated and maintained across an entire fungal taxonomic class.



Keywords:

chemical ecology; Fungi; metabolism; gene cluster

Background:

Plants, bacteria and fungi produce the majority of the earth's biochemical diversity. These organisms produce a remarkable variety of secondary/specialized metabolites (SMs) that can mediate ecological functions, including defense, resource acquisition, and mutualism. Standing SM diversity is often high at the population level, which may affect the rates of adaptation over microevolutionary timescales. For example, high intraspecific quantitative and qualitative chemotype diversity in plants can enable rapid adaptation to local biotic factors (Agrawal, Hastings et al. 2012; Züst, Heichinger et al. 2012; Glassmire, Jeffrey et al. 2016). However, the fate of population-level chemodiversity across longer timescales is not well explored in plants or other lineages. We therefore sought to identify how metabolic variation is distributed across macroevolutionary timescales by profiling chemodiversity across a well-sampled taxonomic class.

The Dothideomycetes, which originated between 247 and 459 million years ago (Beimforde, Feldberg et al. 2014), comprise the largest and arguably most phylogenetically diverse class of fungi. Currently, 19,000 species are recognized in 32 orders containing more than 1,300 genera (Zhang, Crous et al. 2011). Dothideomycetes are divided into two major subclasses, the Pleosporomycetidae (order Pleosporales) and Dothideomycetidae (orders Dothideales, Capnodiales, and Myriangiales), which correspond to the presence or absence, respectively, of pseudoparaphyses during development of the asci (Schoch, Crous et al. 2009). Several other orders await definitive placement.



Dothideomycetes also display a large diversity of fungal lifestyles and ecologies. The majority of Dothideomycetes are terrestrial and associate with phototrophic hosts as pathogens, saprobes, endophytes (Schoch, Crous et al. 2009), lichens (Nelsen, Lucking et al. 2011), or ectomycorrhizal symbionts (Spatafora, Owensby et al. 2012). Six orders contain plant pathogens capable of infecting nearly every known crop species. The Pleosporales and Capnodiales, in particular, are dominated by asexual plant pathogens that cause significant economic losses and have been well sampled in previous genome sequencing efforts (Goodwin, Ben M'Barek et al. 2011; Ohm, Feau et al. 2012; Oliver, Friesen et al. 2012; Condon, Leng et al. 2013; Manning, Pandelova et al. 2013). A single order (Jahnulales) contains aquatic, primarily freshwater, species (Suetrong, Boonyuen et al. 2011). Other ecologies include human and animal pathogens, including some taxa that can elicit allergies and asthma (Crameri, Garbani et al. 2014), and rock-inhabiting fungi (Ruibal, Gueidan et al. 2009).

This broad range of lifestyles is accompanied by extensive diversity of SMs, of which very few have known ecological roles. The Dothideomycetes, and several other ascomycete classes (Eurotiomycetes, Sordariomycetes, and Leotiomycetes), produce the greatest number and diversity of SMs across the fungal kingdom (Spatafora and Bushley 2015; Akimitsu et al. 2014). Economically important plant pathogens in the Pleosporales (Alternaria, Bipolaris, Exserohilum, Leptosphaeria, Pyrenophora, and Stagonospora), in particular, are known to produce host-selective toxins that confer the ability to cause disease in specific plant hosts (Walton and Panaccione 1993; Walton 1996; Wolpert, Dunkle et al. 2002; Ciuffetti, Manning et al. 2010; Pandelova, Figueroa et al. 2012; Akimitsu, Tsuge et al. 2014). Other toxins first identified in Pleosporales have roles in virulence but are not pathogenicity determinants, including the PKS-derived compounds depudecin (Wight, Kim et al. 2009) and solanapyrone (Kaur 1995).



Dothideomycetes are also known to produce bioactive metabolites shared with more distantly related fungal classes. Sirodesmin, a virulence factor produced by Leptosphaeria maculans, for example, belongs to the same class of epipolythiodioxopiperazine (ETP) toxins as gliotoxin, an immunosuppressant produced by the eurotiomycete human pathogen Aspergillus fumigatus (Gardiner, Waring et al. 2005; Patron, Waller et al. 2007). Dothistromin, a polyketide metabolite produced by the pine pathogen Dothistroma septosporum, shares ancestry with aflatoxin (Bradshaw, Slot et al. 2013), a mycotoxin produced by Aspergillus species that poses serious human health and environmental risks worldwide (Horn 2003; Wang and Tang 2004).

A majority of bioactive metabolites in Dothideomycetes are small-molecule SMs that, like those of other fungi, are frequently the products of biosynthetic gene clusters (BGCs) composed of enzymes, transporters, and regulators that contribute to a common SM pathway. Most of these BGCs are defined by four main classes of SM core signature enzymes: 1) nonribosomal peptide synthetases (NRPS), 2) polyketide synthases (PKS), 3) terpene synthases (TS), and 4) dimethylallyl tryptophan synthases (DMAT) (Hoffmeister & Keller 2007). Fungal gene clusters are hotspots for genome evolution through gene duplication, loss, and horizontal transfer, which recombine pathways and generate diversity (Wisecaver, Slot et al. 2014). Additionally, recent studies have shown that gene clusters may evolve through recombination or shuffling of modular subunits of syntenic genes (Lind, Wisecaver et al. 2017; Gluck-Thaler et al. 2018). Changes in BGC gene content often result in structural changes to the SM product(s), and therefore BGCs can be used to monitor the evolution of chemodiversity (Lind, Wisecaver et al. 2017; Proctor, McCormick et al. 2018). The most widely used methods for detecting BGCs rely on models of gene cluster composition based on putative functions in SM biosynthesis informed by a phylogenetically limited set of taxa, but gene function-agnostic methods are being developed (Slot and Gluck-Thaler 2019).

Here, we systematically assessed BGC richness and compositional diversity in the genomes of 101 Dothideomycetes species, most of which were recently sequenced (Haridas et al. in press). Using a newly benchmarked algorithm that identifies clustered genes of interest through the frequency of their co-occurrence with and around signature biosynthetic genes, we identified 3399 putative BGCs, grouped into 719 unique cluster types, including 5 varieties of candidate DHN melanin clusters. The conservation of specific gene pairs across BGC types suggests that precise functional interactions contribute to the modular evolution of these loci. Numerous BGCs have either over- or under-dispersed phylogenetic distributions, suggesting pathways have been differentially impacted by selection. In comparisons across species, BGC repertoire diversity increases linearly with repertoire size, reflecting a mode of metabolic evolution in these fungi that is likely distinct from that of plants. We found little overlap in cluster repertoires among genomes from different genera, and project that a wealth of unique BGCs remain to be discovered within this fungal lineage.

Results:

Dothideomycetes contain hundreds of distinct types of BGCs, a small fraction of which are characterized.

Using a novel cluster detection approach based on shared syntenic relationships among genes (CO-OCCUR, see Methods, Figure 1, Figure SA), we identified 332 gene homolog groups (homolog groups) of interest (Table SA, Table SB) whose members were organized into 3399 candidate BGCs of at least two genes (Table SC) in 101 Dothideomycetes genomes (Table SD), representing an average of 33.7 BGCs per genome (SD = 15.4, Figure SG). We grouped BGCs into 719 unique cluster types based on a minimum gene content similarity of 90%; 422 cluster types are part of homologous cluster groups (cluster groups) found in 2 or more genomes, and 297 are orphan clusters found in only one genome (Table SE). Of these, 345 cluster types (166 cluster groups and 179 orphan clusters) had 5 or more genes per BGC (Figure 2), and 459 cluster types (239 cluster groups and 220 orphan clusters) had 4 or more genes per BGC (Figure SC). Only 9 of the 459 cluster types with 4 or more genes were ever found more than once in any given genome (Table SE). According to standard practice, we classified cluster types based on the presence of biosynthetic signature genes: dimethylallyl tryptophan synthase (DMAT), polyketide synthase (PKS), PKS-like, nonribosomal peptide-synthetase (NRPS), NRPS-like, hybrid (HYBRID), and terpene cyclase (TC). We found that among all cluster types with 4 or more genes, 186 contained only PKS and 29 contained only NRPS signature genes. Similarly, we detected 4 DMAT, 38 PKS-like, 16 NRPS-like, 3 HYBRID, and 3 TC-only cluster types. A further 127 cluster types contained more than 1 type of signature gene, and 53 cluster types contained no signature gene at all but still consisted of genes found in significant co-occurrences. By searching the MIBiG database for highly similar hits (≥70% amino acid identity) to the signature biosynthetic genes in CO-OCCUR BGCs, we were able to confidently annotate 158 of the BGCs recovered by CO-OCCUR with 32 unique MIBiG entries, corresponding to 22 unique metabolites (Table SF). BGC annotations based instead on content overlap with characterized MIBiG clusters can be found in Table SG (minimum cluster size = 3 genes; minimum percentage of genes with similarity = 70%).

[Figure 1: flowchart of the CO-OCCUR pipeline for identifying biosynthetic gene clusters (BGCs) using unexpected homolog group (HG) co-occurrences. Labels recovered from the figure: 101 Dothideomycetes genomes with genes clustered into HGs; pipeline for sampling random pairs of co-occurring HGs (500,000 replicates), yielding random distributions of HG co-occurrences; BGCs (3,399 total); group together BGCs that have ~90% of their genes in common (719 total); cluster groups (CGs) (422 total); orphan clusters (OCs) (297 total).]
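The grouping of BGCs into cluster types, cluster groups, and orphan clusters described above can be illustrated with a minimal sketch. It is not the CO-OCCUR implementation: the similarity metric (a Jaccard index over homolog groups), the single-linkage grouping, and all function and cluster names are assumptions made for illustration. Only the 90% threshold and the "2 or more genomes" rule come from the text.

```python
from itertools import combinations


def gene_content_similarity(a, b):
    """Shared homolog groups divided by all homolog groups in either cluster.

    The Jaccard index is an assumed stand-in for the paper's similarity metric.
    """
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def group_clusters(clusters, min_similarity=0.9):
    """Single-linkage grouping (an assumption) of BGCs into cluster types.

    `clusters` maps a BGC identifier to its set of homolog groups.
    """
    parent = {name: name for name in clusters}

    def find(x):
        # Follow parent pointers to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(clusters, 2):
        if gene_content_similarity(clusters[a], clusters[b]) >= min_similarity:
            parent[find(a)] = find(b)

    groups = {}
    for name in clusters:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def split_groups_and_orphans(groups, genome_of):
    """Cluster groups occur in 2 or more genomes; orphan clusters in only one."""
    cluster_groups, orphans = [], []
    for members in groups:
        genomes = {genome_of[m] for m in members}
        (cluster_groups if len(genomes) >= 2 else orphans).append(members)
    return cluster_groups, orphans


# Hypothetical toy data: three BGCs from three genomes.
toy_clusters = {
    "genomeA_bgc1": {"HG1", "HG2", "HG3", "HG4"},
    "genomeB_bgc1": {"HG1", "HG2", "HG3", "HG4"},
    "genomeC_bgc1": {"HG1", "HG9"},
}
toy_genomes = {
    "genomeA_bgc1": "A",
    "genomeB_bgc1": "B",
    "genomeC_bgc1": "C",
}

toy_types = group_clusters(toy_clusters)
toy_cluster_groups, toy_orphans = split_groups_and_orphans(toy_types, toy_genomes)
```

In this toy input the two identical BGCs form one cluster group and the third BGC is an orphan cluster. A different linkage rule or similarity definition would change the counts, which is why both are flagged as assumptions.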

Of the 158 BGCs with hits to the MIBiG database, some encoded non-host-selective phytotoxins or other compounds with known roles in virulence to plants. Two PKS BGCs encoding the non-host-specific phytotoxin and DNA polymerase inhibitor solanapyrone (Mizushina, Kamisuki et al. 2002; Kasahara, Miyamoto et al. 2010) and the related alternapyrone (Fujii, Yoshida et al. 2005), first identified in Alternaria solani (Kasahara et al. 2010; Mizushina et al. 2002; Fujii et al. 2005), were found across taxa primarily in the order Pleosporales, especially in the closely related Pleosporaceae, Leptosphaeriaceae, and Phaeosphaeriaceae families (Figure 2, Table SF, SG). Several BGCs mapping to the MIBiG cluster for the extracellular siderophore dimethylcoprogen, which plays a role in virulence in the corn pathogen Cochliobolus heterostrophus (Dothideomycetes) and Fusarium graminearum (Sordariomycetes), were also found in most taxa in Pleosporales (Oide, Moeder et al. 2006). In contrast, BGCs mapping to the NRPS phytotoxin sirodesmin, produced by Leptosphaeria maculans (Gardiner, Cozijnsen et al. 2004), and to depudecin, a histone deacetylase (HDAC) inhibitor synthesized by a PKS BGC first identified in A. brassicicola (Wight et al. 2009), were found discontinuously distributed in only a few unrelated species within Pleosporales (Figure 2, Table SF, SG). Aside from sirodesmin, only one other BGC had hits in MIBiG to a BGC producing a host-selective toxin. A BGC mapping to T-toxin, a polyketide toxin produced by race T (C4) of C. heterostrophus (only race O (C5) was included in this study) that was responsible for the devastating Southern Corn Leaf Blight (Daly 1982; Turgeon and Baker 2007), was detected by CO-OCCUR in only two additional taxa, Ampelomyces quisqualis and L. maculans (Table SG).

[Figure 2: Dothideomycetes species tree (tips labeled by species and order, with outgroup taxa from Eurotiomycetes, Sordariomycetes, Leotiomycetes, and Pezizomycetes) plotted against the homologous cluster group linkage tree. Legend entries recovered from the figure: number of clusters; signature gene type (PKS, NRPS, PKS-like, NRPS-like, Hybrid, > 1 signature gene); order; signature gene ≥70% identical to known locus. Labeled loci: Depudecin, Alternapyrone, DHN melanin, Cercosporin, Dimethylcoprogen, Ferrichrome, Betaenone A/B/C, Chaetoviridin/Chaetomugilin, Curvupallides, Swainsonine, Sirodesmin, Aflatoxin.]

Other BGCs matched MIBiG clusters from other ascomycete classes (Eurotiomycetes, Sordariomycetes), some of which have been previously detected in Dothideomycetes while others were unexpected. The aflatoxin-like dothistromin clusters, which are fragmented into six mini-clusters in Dothistroma septosporum (Bradshaw, Slot et al. 2013), predictably mapped to clusters detected by CO-OCCUR in D. septosporum and the closely related species Passalora fulva (Capnodiales). Some unexpected findings included a cluster in Macrophomina phaseolina that matched the PKS BGC for chaetoglobosins, a class of mycotoxins with both antifungal and anti-cancer activities (Ali, Caggia et al. 2015; Jiang, Song et al. 2017) found in the distantly related Chaetomium globosum (Sordariomycetes) and some Eurotiomycetes (Schumann and Hertweck 2007) (Figure 2, Table S1). Another unexpected finding was the similarity between a CO-OCCUR cluster in M. phaseolina and the BGC for leucinostatin, a peptaibol compound with putative antimicrobial and antifungal activity that was previously only known from taxa in the Sordariomycetes (Wang, Liu et al. 2016).

Cluster co-occurrence networks reveal contrasting trends in diversification

We visualized all significant homolog group co-occurrences predicted by CO-OCCUR as networks where nodes represent homolog groups and edges connect homolog groups that co-occur with unexpected frequency in genomic regions containing core biosynthetic genes (Figure 3a). A total of 33 discrete networks were recovered, with 71% of homolog groups located in the largest two networks. Signature genes tended to be highly connected to other homolog groups in two qualitatively different types of subnetworks. In one type of subnetwork, signature genes are centrally connected to diverse accessory homolog groups (e.g., PKS subnetworks), while in the other type one or more signature genes are non-centrally linked with fewer accessory homolog groups (e.g., the NRPS and DMAT subnetwork in network 1). By quantifying the betweenness centrality of each node (a function of the number of shortest network paths that pass through that node) within each network, we identified signature genes and several other biosynthetic enzymes, transporters, and DNA binding proteins that bridge alternate subnetworks (Figure 3a,b).
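Betweenness centrality, as used above to find homolog groups that bridge subnetworks, can be computed for an unweighted, undirected network with Brandes' algorithm. The following is a minimal sketch, not the authors' code: the node names are hypothetical, and normalizing by the number of node pairs that exclude the focal node is an assumption (it is the convention that yields values between 0 and 1).

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def betweenness_centrality(adj):
    """Normalized betweenness for an unweighted, undirected graph (Brandes)."""
    centrality = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search that counts shortest paths from `source`.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to the source.
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # Each unordered pair was counted from both endpoints, so dividing by
    # (n - 1)(n - 2) both halves the count and rescales it to [0, 1].
    n = len(adj)
    if n > 2:
        for v in centrality:
            centrality[v] /= (n - 1) * (n - 2)
    return centrality


# Hypothetical toy network: a "bridge" homolog group links two subnetworks.
toy_edges = [
    ("PKS", "methyltransferase"),
    ("PKS", "bridge_transporter"),
    ("bridge_transporter", "NRPS"),
    ("NRPS", "exporter"),
]
toy_centrality = betweenness_centrality(build_adjacency(toy_edges))
```

Nodes that lie on many shortest paths between otherwise separate parts of the network receive high values, which is the property used in the text to flag bridging homolog groups.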



PKS BGCs are more compositionally diverse than NRPS BGCs. BGCs containing PKS signature genes tended to have fewer significant co-occurrences among their constituent genes across various BGC sizes, compared to BGCs containing NRPS signature genes (Figure 3c). This is consistent with a trend in which clusters containing PKS signature genes have more unique types of BGCs for a given cluster size (corrected by the total number of BGCs of that size), compared with BGCs containing NRPS signature genes (Figure 3d).

Different algorithms annotate overlapping and complementary sets of clustered genes.

CO-OCCUR predictions and the pHMM-based SMURF (Khaldi, Seifuddin et al. 2010) and antiSMASH (Blin, Wolf et al. 2017) programs all predicted similar numbers, but different types of BGCs. antiSMASH identified a total of 1710 clusters that were part of 252 cluster groups and 887 orphan clusters with 4 or more homolog groups (Table SH, Table SI). SMURF identified a total of 686 clusters that were part of 194 cluster groups and 495 orphan clusters with 4 or more homolog groups (Table SJ, Table SK). CO-OCCUR predicted 1469 clusters that are part of 239 cluster groups and 220 orphan clusters with 4 or more homolog groups (Table SC, Table SE). We found that no single algorithm was able to annotate all predicted genes of interest in a BGC, even those predicted to be involved in SM biosynthesis (Figure 4a, Table SL). CO-OCCUR identified 51.2% and 37.7% of the clustered genes detected by SMURF and antiSMASH, respectively. Conversely, SMURF and antiSMASH identified 40.7% and 42.0% of the clustered genes detected by CO-OCCUR, respectively. When examining only genes predicted to participate in SM biosynthesis, transport and catabolism, we found that CO-OCCUR identified 51.2% and 43.3% of genes detected by SMURF and antiSMASH, respectively, while SMURF and antiSMASH each identified 62.6% of those detected by CO-OCCUR (Figure 4a).
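The pairwise percentages above are asymmetric because each one is taken relative to a different tool's gene set. A minimal sketch of that calculation follows; the gene identifiers and tool outputs are hypothetical toy data, and the function names are illustrative rather than part of any of the three programs.

```python
def percent_detected(reference_genes, other_genes):
    """Percentage of `reference_genes` that also appear in `other_genes`."""
    reference = set(reference_genes)
    if not reference:
        return 0.0
    return 100.0 * len(reference & set(other_genes)) / len(reference)


def pairwise_overlap(predictions):
    """Map (reference tool, other tool) to the percentage of the reference
    tool's clustered genes that the other tool also annotated."""
    return {
        (ref, other): percent_detected(predictions[ref], predictions[other])
        for ref in predictions
        for other in predictions
        if ref != other
    }


# Hypothetical clustered-gene sets for three tools.
toy_predictions = {
    "CO-OCCUR": {"g1", "g2", "g3", "g4"},
    "SMURF": {"g3", "g4", "g5", "g6", "g7"},
    "antiSMASH": {"g1", "g4", "g8"},
}
toy_overlap = pairwise_overlap(toy_predictions)
```

In the toy data the two tools share two genes, yet CO-OCCUR recovers 40% of the SMURF genes while SMURF recovers 50% of the CO-OCCUR genes, because the denominators differ.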


[Figure 3: a) homolog group co-occurrence networks 1 and 2, with node types for signature genes (DMAT, Hybrid, NRPS-like, TC, PKS, PKS-like, NRPS), transport-related homolog groups, and other homolog groups; b) number of nodes by betweenness centrality in networks 1 and 2, with labeled high-centrality nodes n1 (0.32): serine hydrolase; n2 (0.31): multi-drug resistance protein; n3 (0.31): trichothecene efflux pump; n4 (0.25): amino acid aminotransferase; n5 (0.22): FAD-binding; n6 (0.22): o-methyltransferase; n7 (0.21): DNA-binding protein; n8 (0.21): mono-carboxylate transporter; c) number of significant co-occurrences per cluster versus gene cluster size, for co-occurrences involving and not involving a signature gene; d) gene cluster diversity versus gene cluster size.]

[Figure 4: a) counts of clustered genes annotated by CO-OCCUR, SMURF, and antiSMASH; b) annotations at the cercosporin locus (CTB1 to CTB12, CFP, and flanking ORFs), with asterisks marking genes required for cercosporin biosynthesis (Chen et al., 2007; Newman et al., 2016; de Jonge et al., 2018); c) CO-OCCUR versus antiSMASH % recovery and % discovery.]

The complementary nature of the CO-OCCUR and antiSMASH algorithms is illustrated by their annotations of a characterized BGC that encodes the biosynthesis of cercosporin, a non-host-specific polyketide produced by Cercospora spp. (Dothideomycetes) and Colletotrichum (Sordariomycetes) (de Jonge, Ebert et al. 2018). All 10 genes involved in cercosporin biosynthesis are encoded in a BGC and have been characterized (CTB1-3, CTB5-7, CTB9), in addition to a regulator (CTB8) and two transporters (CTB4 and CFP) (Chen, Lee et al. 2007; de Jonge, Ebert et al. 2018; Newman & Townsend 2016). At this BGC's locus in Cercospora zeae-maydis, both antiSMASH and CO-OCCUR annotated CTB1, CTB2, and CTB3 as genes of interest; only antiSMASH annotated CTB4, CTB5 and CTB6; only CO-OCCUR annotated CTB10, CTB11 and CTB12; and no algorithm annotated CTB7, CTB8, CTB9 or CFP (Figure 4b).

CO-OCCUR and antiSMASH recovered similar proportions of loci homologous to known BGCs and predicted additional genes of interest in the vicinity of these candidates. Using BLASTP, we identified 364 BGCs with ≥ 3 genes across all Dothideomycetes genomes that are homologous to 58 characterized BGCs from the MIBiG database (i.e., where ≥75% of genes show similarity) (Table SM). We then compared how many genes within and around these BGCs were predicted to be of interest by either antiSMASH or CO-OCCUR by cross-referencing all BGCs detected by each method, and found that both algorithms recovered similar percentages of BGC content (antiSMASH mean percent recovery = 48.3%, SD = 37.6%; CO-OCCUR mean percent recovery = 51.0%, SD = 42.6%), although for any given BGC, percent recovery often differed between each algorithm (Figure 4c, Table SG). We also found that both antiSMASH and CO-OCCUR identified similar numbers of new genes of interest around BGC loci (antiSMASH mean percent discovery = 65.4%, SD = 85.4%; CO-OCCUR mean percent discovery = 56.6%, SD = 89.4%), and that the number of additional genes of interest often exceeded the size of the recovered candidate cluster. High rates of novel gene discovery are perhaps expected given that many of the clusters in MIBiG are only partially annotated.
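Percent recovery and percent discovery for a single locus can be sketched as follows. The exact denominators are not stated in this section, so this sketch assumes both are expressed relative to the number of genes in the BLASTP-defined candidate BGC; that assumption is at least consistent with discovery values above 100% when additional genes outnumber the candidate cluster. The function name and gene identifiers are hypothetical.

```python
def recovery_and_discovery(candidate_bgc, predicted_genes):
    """Return (percent recovery, percent discovery) for one locus.

    Assumed definitions, both relative to the candidate BGC size:
      recovery  = candidate genes that the algorithm also annotated
      discovery = annotated genes at the locus that are not in the candidate
    """
    candidate = set(candidate_bgc)
    predicted = set(predicted_genes)
    if not candidate:
        raise ValueError("candidate BGC must contain at least one gene")
    recovered = candidate & predicted
    additional = predicted - candidate
    recovery = 100.0 * len(recovered) / len(candidate)
    discovery = 100.0 * len(additional) / len(candidate)
    return recovery, discovery


# Hypothetical locus: a 4-gene candidate BGC and one algorithm's annotations.
toy_candidate = {"geneA", "geneB", "geneC", "geneD"}
toy_predicted = {"geneA", "geneB", "geneX", "geneY", "geneZ"}
toy_recovery, toy_discovery = recovery_and_discovery(toy_candidate, toy_predicted)
```

Under these assumed definitions the two quantities are independent: an algorithm can recover few of the known genes at a locus while still flagging many neighboring genes, which matches the large standard deviations reported above.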

Some over-dispersed clusters have ecologically biased distributions

Using Fritz and Purvis' D statistic, we found that nearly one-fifth (18%) of cluster groups are phylogenetically over-dispersed compared with the distributions expected under strict Brownian evolution, in which more closely related species are predicted to be more similar to each other than more distantly related species (Figure 5, Figure SD). Six over-dispersed cluster groups were over-represented (present at least twice as often) in either plant pathotrophs or plant saprotrophs (Figure 5). By comparison, 22.5% of cluster groups had distributions that were more conserved than expected. The remaining cluster group distributions either fell on a continuum between phylogenetically conserved and over-dispersed (35.1%) or were present in sets of taxa too small to be analyzed (23.8%). Figure 6 presents three examples of closely related cluster groups that vary in their phylogenetic conservation (see Methods). Cluster groups in the first group, which partially encode the 1,8-dihydroxynaphthalene (DHN) melanin pathway, were found in nearly all Dothideomycetes; cluster groups in the second group were restricted to the Pleosporales; and cluster groups in the third group were found among Bipolaris and Dothidotthia, two closely related genera within the Pleosporales.
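For reference, the D statistic used above is commonly written as below. This is the general definition from Fritz and Purvis, not a restatement of this study's Methods, and the symbol names are chosen here for illustration: $\sum d_{\mathrm{obs}}$ is the observed sum of sister-clade differences in the binary trait (here, presence or absence of a cluster group), while $\overline{\sum d_{r}}$ and $\overline{\sum d_{b}}$ are the mean sums from simulations under phylogenetic randomness and under a Brownian threshold model, respectively.

```latex
D = \frac{\sum d_{\mathrm{obs}} - \overline{\sum d_{b}}}
         {\overline{\sum d_{r}} - \overline{\sum d_{b}}}
```

Under this definition, $D \approx 0$ corresponds to a distribution expected under Brownian evolution and $D \approx 1$ to a phylogenetically random distribution; values above 1 indicate over-dispersion and values below 0 indicate stronger conservation than the Brownian expectation.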

Dothideomycetes have five distinct types of DHN melanin clusters

We detected five cluster groups with distinct but overlapping compositions that appear to encode partial pathways for 1,8-dihydroxynaphthalene (DHN) melanin biosynthesis in 87 of the 101 genomes (Figure 6). No genome had more than one predicted DHN melanin cluster. The two most prevalent types, cluster group 131 and cluster group 113, are found in 48 fungi from 10 of the 13 taxonomic orders and in 29 fungi from 6 orders, respectively. Cluster groups 131, 113, 223 and 134 encode 2 of the 5 biosynthetic genes (pks1, 1,3,8-trihydroxynaphthalene [T3HN] reductase) and the transcription factor (cmr1) involved in DHN melanin biosynthesis, while cluster group 137 encodes 1 biosynthetic gene (T3HN reductase) and cmr1 (Figure 6b). In addition to the homolog groups known to participate in DHN melanin biosynthesis, we detected 3 additional homolog groups (Prefoldin subunit, Heat-shock protein 40 co-chaperone JID1, and a protein of unknown function) that are broadly conserved within DHN melanin clusters but that have no known role in melanin biosynthesis. As an example of how CO-OCCUR is not constrained by a priori assumptions of pHMMs, these additional homolog groups were not detected by either antiSMASH or SMURF despite their prevalent linkage to the known biosynthetic genes.

[Figure 5. Panels: a) Dothideomycete species tree with lifestyle ratios (plant pathotroph : plant saprotroph) for homologous cluster groups (HCGs) a-f; b) table of cluster group (CG) id, signature gene(s), size and frequency for (a) group 221, NRPS; (b) group 6, PKS; (c) group 182, PKS,TC; (d) group 163, PKS,DMAT; (e) group 49, PKS; (f) group 202, PKS; c) Fritz and Purvis' D for each homologous cluster group, marked by P(Brownian motion) > 0.05 or ≤ 0.05. Best known signatures listed: asperphenamate (35%); emodin, emericellin, pestheic acid (60%); fusarubin (46%); ACT-toxin (47%); emodin (66%); azanigerone (46%).]

[Figure 6. Panels a) and b): Dothideomycete species tree and gene content of homologous cluster groups (HCGs) a-j in three sets. Set 1, DHN melanin: (a) group 131, (b) group 113, (c) group 223, (d) group 134, (e) group 137; genes: polyketide synthase, 1,3,8-trihydroxynaphthalene reductase, transcription factor cmr1, prefoldin subunit, hsp40 co-chaperone JID1, unknown. Set 2, unknown NRPS-like: (f) group 185, (g) group 168, (h) group 100; genes: NRPS-like, glycerol-3-phosphate dehydrogenase, efflux pump, MFS multidrug transporter, ABC transporter, gliotoxin exporter, unknown. Set 3, alternapyrone-like PKS: (i) group 16, (j) group 22; genes: PKS, lanosterol synthase, FAD binding, reductase, prenyltransferase, salicylate hydroxylase, berberine, unknown. Parentheses mark genes present in < 50% of clusters in an HCG.]

SM cluster diversity is under-sampled and increases proportionally with total number of genomes in Pleosporales

Cluster repertoires (combinations of cluster groups found within a given genome) differ markedly between fungi from different genera (mean pairwise Sørensen dissimilarity = 0.79, SD = 0.12) and to a lesser extent within a genus (mean = 0.37, SD = 0.13), with dissimilarity increasing linearly with phylogenetic distance across all pairwise species combinations among 49 Pleosporales (y = 0.84x + 0.51; r² = 0.50), the most well-sampled Dothideomycetes order (Figure 7a, Figure SF). However, given the same level of within-repertoire diversity (i.e., alpha diversity) and total diversity across all repertoires (i.e., gamma diversity), dissimilarity between repertoires (i.e., beta diversity) can result from either nestedness (where some repertoires are subsets of others) or turnover (where no repertoire is a subset of the other), or a combination of the two. When we partitioned total Sørensen dissimilarity between all cluster repertoires (βSOR = 0.969) into its nestedness and turnover components, we found that nearly all of the differences between the cluster repertoires of different genomes were due to turnover (βSIM = 0.96) and not repertoire nestedness (βSNE = 0.008), such that any given cluster repertoire contains a unique combination of clusters (Figure SH). Furthermore, the compositional diversity of gene clusters within a given repertoire (measured as the total branch length on a Raup-Crick linkage tree, Figure 7a) scales linearly with repertoire size (y = 0.49x + 3.92; adj. R² = 0.86), indicating that clusters added to a given repertoire are generally dissimilar to the clusters already present in that repertoire (Figure 7b). Finally, rarefaction analysis of the total number of unique cluster groups and orphan clusters (i.e., cluster richness) detected over increasing numbers of sampled genomes suggests genomes within Pleosporales are under-sampled with respect to BGC diversity, and projects substantially more unique cluster types arising from future genome sampling within this order (Figure 7c).

[Figure 7. Panels: a) linkage tree of Sørensen dissimilarity between Pleosporalean cluster repertoires, shown with a linkage tree of Raup-Crick dissimilarity between unique cluster types; b) cluster repertoire diversity (total Raup-Crick branch length) against cluster repertoire size; c) cluster richness against number of sampled Pleosporalean genomes (observed, interpolation, extrapolation).]
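The partition of Sørensen dissimilarity into turnover and nestedness can be illustrated for a single pair of repertoires. The following is a minimal sketch, assuming the standard pairwise form of this partition, in which a is the number of cluster groups shared by two repertoires and b and c are the numbers unique to each, so that βSOR = βSIM + βSNE. The function name and the toy repertoires are hypothetical and illustrative only; the values reported above summarize all repertoires jointly rather than any single pair.

```python
def partition_sorensen(rep_x, rep_y):
    """Split pairwise Sørensen dissimilarity into turnover and nestedness.

    rep_x, rep_y: non-empty sets of cluster group identifiers.
    Returns (beta_sor, beta_sim, beta_sne).
    """
    if not rep_x or not rep_y:
        raise ValueError("both repertoires must be non-empty")
    a = len(rep_x & rep_y)   # cluster groups shared by both repertoires
    b = len(rep_x - rep_y)   # cluster groups unique to the first repertoire
    c = len(rep_y - rep_x)   # cluster groups unique to the second repertoire
    beta_sor = (b + c) / (2 * a + b + c)      # total dissimilarity
    beta_sim = min(b, c) / (a + min(b, c))    # turnover component
    beta_sne = beta_sor - beta_sim            # nestedness component
    return beta_sor, beta_sim, beta_sne


# Toy example 1: one repertoire is a strict subset of the other (pure nestedness).
nested = partition_sorensen({1, 2, 3, 4}, {1, 2})

# Toy example 2: equally sized repertoires that share one group (pure turnover).
turnover = partition_sorensen({1, 2, 3}, {1, 4, 5})
```

In the first toy pair all dissimilarity is assigned to nestedness, and in the second all of it is assigned to turnover, which is the situation that dominates among the Pleosporalean repertoires.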

Discussion:

BGC diversity has been investigated primarily in bacteria (Cimermancic, Medema et al. 2014) and within individual genera in the fungal classes Eurotiomycetes and Sordariomycetes (Lind, Wisecaver et al. 2017; Theobald, Vesth et al. 2017; Villani, Proctor et al. 2019). Although Dothideomycetes are producers of a number of secondary metabolites important to fungal-plant interactions and toxin production, to date there has not been a systematic evaluation of BGC diversity in the Dothideomycetes nor in any other fungal class. Fungal genomes experience frequent reorganization and changes in gene composition that underlie large-scale differences in chromosomal macro- and micro-synteny among species (Grandaubert, Lowe et al. 2014; Hane, Rouxel et al. 2011; Shi-Kunne, Faino et al. 2018). Yet despite the overall dynamic nature of fungal chromatin, tight linkage is often maintained between loci with related metabolic functions, manifesting as gene clusters (Del Carratore, Zych et al. 2014). Here, we developed an alternative, function-agnostic approach to annotating SM genes of interest that exploits these patterns of microsynteny in order to identify previously unexplored dimensions of fungal BGC diversity.

Complementary methodologies enhance understanding of BGC composition and diversity in Dothideomycetes

There are two main approaches to predicting genes that are functionally associated in BGCs. The first uses targeted methods based on precomputed pHMMs derived from a set of genes known to participate in SM metabolism to identify sequences of interest (Khaldi, Seifuddin et al., 2010; Blin, Wolf et al., 2017). The second uses untargeted methods based on function-agnostic criteria, such as synteny conservation or shared evolutionary history, to implicate genes as part of a gene cluster (Gluck-Thaler & Slot 2018). Because many metabolic functions are shared across distantly related taxa, targeted approaches, such as those employed by SMURF and antiSMASH, have proven enormously successful. However, our objective in this study was to develop a complementary untargeted approach in order to capture undescribed BGC diversity within a single fungal lineage.

Our CO-OCCUR algorithm leverages a database of 101 Dothideomycete genomes in order to annotate genes of interest using unexpectedly conserved genetic linkage as an indicator of selection for co-inheritance with SM signature genes. CO-OCCUR failed to recover many of the genes annotated using the pHMM approaches employed by SMURF and antiSMASH, indicating that it has limitations in its prediction of secondary metabolite BGC content. These results suggest that it is not optimal for the de novo BGC annotation of individual genomes, and its ability to annotate genes of interest is proportional to their co-occurrence frequency in a given database, meaning that it is not well suited for recovering associated SM genes that are not evolutionarily conserved. This may explain in part why 10,295 genes (including 2,478 genes predicted to be involved in secondary metabolism) identified by antiSMASH and SMURF combined were not detected with CO-OCCUR (Figure 4), and why CO-OCCUR detected only a few of the host-selective toxins found in Dothideomycetes.

Nevertheless, our method avoids some of the limitations intrinsic to algorithms that employ pHMMs to delineate cluster content. While pHMM-based approaches gain predictive power by leveraging similarities in SM biosynthesis across disparate organisms, they may fail to identify gene families involved in secondary metabolism that are unique to a particular lineage of organisms. For example, SMURF detects accessory SM genes using pHMMs derived mostly from Aspergillus (Eurotiomycetes) BGCs (Khaldi, Seifuddin et al., 2010), while antiSMASH v4 and v5 use 301 pHMMs of smCOGs (secondary metabolism gene families) derived from aligning SM-related proteins, of which few are currently from fungi, in order to identify genes of interest in the regions surrounding signature biosynthetic genes (Blin, Wolf et al., 2017). Taxonomic bias introduced by sampling a limited number of BGCs may account for the 6,051 proteins found in BGCs that were identified by CO-OCCUR but not by any other algorithm, of which 796 are predicted to participate in secondary metabolism and 617 could not be assigned to a COG category but nevertheless have domains commonly observed in secondary metabolite biosynthetic proteins (e.g., methyltransferase, hydrolase).

A linkage-based approach can also identify non-canonical accessory genes involved in SM biosynthesis. For example, we detected 3 genes among the 5 variants of the DHN melanin cluster that were not previously considered to be part of this BGC and were not detected by either antiSMASH or SMURF. One of these genes, a predicted HSP40 chaperone, is a homolog of the yeast gene JID1, whose knock-out mutants display a range of phenotypes (https://www.yeastgenome.org/locus/S000006265/phenotype) related to melanin production, including increased sensitivity to heat and chemical stress. We propose that natural selection (not genetic hitchhiking) is responsible for conservation of synteny in these loci, because SM cluster locus composition and microsynteny in general are typically highly dynamic in fungi (Lind, Wisecaver et al., 2018; Proctor, McCormick et al. 2018), and therefore conserved linkage in these clusters over speciation events is a strong indicator of related function (de Jonge, Ebert et al. 2018; Del Carratore, Zych et al. 2019). The identification of genes with non-canonical functions, including those not participating directly in SM biosynthesis, may reveal SM-supportive functions, including mechanisms to protect endogenous targets of the metabolic product (Keller 2015), in addition to novel biosynthetic genes (de Jonge, Ebert et al. 2018).

Ultimately, targeted and untargeted approaches to BGC annotation reinforce and enrich our understanding of BGC diversity, as no single method identifies all accessory genes of interest in the regions surrounding signature biosynthetic genes (Figure 4). It is notable that the cercosporin BGC was long thought to consist only of CTB1-8, based on functional analyses and structural prediction. However, de Jonge et al. recently predicted CTB9-12 to be of interest after observing that these genes have conserved synteny among all fungi that possessed CTB1-8, and subsequently demonstrated they are essential for cercosporin biosynthesis (de Jonge, Ebert et al. 2018). Only CO-OCCUR detected these four additional genes, and both pHMM-based models and CO-OCCUR were required to detect the complete cercosporin BGC in our study. Given the complementary nature of the advantages and disadvantages of different algorithms, we suggest future studies incorporate multiple lines of evidence from both targeted and untargeted approaches to more fully capture BGC compositional diversity. The 332 homolog groups of interest that we identified using CO-OCCUR could further be used to build pHMMs and be incorporated into existing BGC annotation pipelines in order to facilitate more complete analyses of single genomes.

Signature genes differ in mode of BGC diversification

Although BGCs in fungi typically display characteristics of diversification ‘hotspots’, showing elevated rates of gene duplication and gene gain and loss (Wisecaver, Slot et al. 2014; Lind, Wisecaver et al. 2017), modular parts of clusters and even entire clusters are often shared between divergent species. BGC diversification through gain and loss of individual genes and sub-clusters of genes has been demonstrated in bacteria (Cimermancic, Medema et al. 2014). Although the extent of sub-clustering in fungal genomes has never been directly addressed to our knowledge, the algorithm we designed here essentially functions by identifying the smallest possible type of sub-cluster: a pair of genes found together more often than expected by chance. The unexpected co-occurrence of gene pairs revealed that the two largest types of signature gene families, PKS and NRPS, have contrasting co-occurrence network structures. NRPS homolog groups are embedded in highly reticulate cliques (i.e., form unexpected associations with genes that co-occur amongst themselves). This could suggest NRPS cluster diversification is constrained by interdependencies among accessory genes. By contrast, PKS homolog groups are network hubs (i.e., form unexpected associations with many non-co-occurring genes), which may underlie the higher compositional diversity and decreased frequency of unexpected co-occurrences found within PKS clusters (Figure 3c, d). The apparent contrast in how these different signature cluster types are assembled may reflect the range of accessory modifications typically applied to the structures of polyketides and nonribosomal peptides produced by PKSs and NRPSs. Alternatively, PKS clusters may be subject to more diversifying selection, due to the ability of cognate metabolism in other organisms to utilize, degrade, or neutralize the metabolites. These hypotheses remain to be tested.

Persistent gene co-occurrences reveal layers of combinatorial evolution

Previous large-scale analyses of BGCs suggest there is an upper limit to the number of gene families that associate with signature biosynthetic genes, and that diversity is in large part dependent on combinatorial re-shuffling of existing loci (Cimermancic, Medema et al. 2014). Our analysis expands the number of gene families implicated in BGC diversity and identifies patterns of modular combinatorial evolution among accessory homolog groups with metabolic, transport and regulatory-related functions. While some of these accessory homolog groups are restricted to BGCs with a particular type of signature SM gene, others are present in multiple BGC types, suggesting they encode evolvable or promiscuous functions that can be readily incorporated into different metabolic processes (Table SB). For example, 34 homolog groups with predicted transporter functions are common features of the clusters we detected, present in just under half (43%) of all predicted clusters. Among these homolog groups, 5 have been recruited to compositionally diverse gene clusters and are primarily annotated as toxin efflux transporters or multidrug resistance proteins. Transporters are a key component of fungal chemical defense systems, well known for facilitating resistance to fungicides and host-produced toxins (Coleman & Mylonakis 2009). Transporters are also increasingly recognized as integral components of self-defense mechanisms against toxicity of endogenously produced SMs (Menke, Dong et al., 2012).

Heterogeneous dispersal patterns of BGCs underpin fungal ecological diversity

The distribution of fungal chemodiversity remains difficult to observe and interpret directly, making BGCs useful tools for elucidating underlying trends in fungal chemical ecology. Although the vast majority of BGCs remain uncharacterized, their phylogenetic distributions occasionally provide clues to the selective environments that promote their retention (Slot 2017). For example, spotty distributions resulting from horizontal transfer of BGCs between distantly related but ecologically similar species suggest the encoded metabolites contribute to fitness in the shared environment (Dhillon, Feau et al. 2015; Reynolds, Vijayakumar et al. 2018). Shared ecological lifestyle may also help explain why certain clusters, such as those involved in putative degradative pathways, are retained among phylogenetically distant species (Gluck-Thaler and Slot 2018). Our simple eco-evolutionary screen identified 43 BGCs that are more widely dispersed than expected under neutral evolutionary models, and further revealed that a subset of these BGCs are present more often in fungi with specific nutritional strategies (e.g. plant saprotrophs and plant pathotrophs), suggesting the molecules they encode contribute to specific plant-associated lifestyles (Figure 5). For example, we found an over-dispersed NRPS BGC (group 221) that is present in three plant pathogens and one plant saprotroph. In contrast, the phylogenetically under-dispersed distribution of the 54 BGCs found among mostly closely related genomes is consistent with lineage-favored traits, which may or may not be due to shared ecology. For example, a monophyletic clade of 26 pleosporalean fungi all have a 6-gene NRPS-like cluster (group 100) of unknown function, fully maintained among these allied taxa (and a single distant relative), suggesting it encodes a trait that contributes to the success of this lineage. Phylogenetic screens, especially when coupled with more robust phylogenetic analyses, will be useful for prioritizing the characterization of BGCs most likely to contribute to the success of particular guilds or clades.

Among those BGCs with hits to the MIBiG database, we identified clusters that displayed both lineage-specific and spotty or sporadic distributions. The Pleosporales, for example, contains many plant pathogens, and the conservation of BGCs involved in production of general virulence factors towards plants such as solanopyrone, alternapyrone, and the extracellular siderophore dimethylcoprogen across many taxa in this order suggests a shared lineage-specific trait with roles in plant pathogenesis. In contrast, the aflatoxin-like dothistromin cluster, which was proposed to be horizontally transferred from Aspergillus (Eurotiomycetes), had a very spotty distribution, found only in several closely related taxa in Capnodiales, supporting a hypothesis of horizontal gene transfer (HGT). Similarly, the BGC for the epipolythiodioxopiperazine (ETP) toxin sirodesmin shares 6 genes with the BGC producing the ETP toxin gliotoxin, which plays a role in virulence towards animals in the human pathogen Aspergillus fumigatus (Eurotiomycetes) (Gardiner, Cozijnsen et al. 2004; Bok, Chung et al. 2006). Related ETP-like BGCs have since been identified in a number of other taxa of Eurotiomycetes and Sordariomycetes, but among Dothideomycetes were previously known only from L. maculans and a partial cluster in Sirodesmium diversum lacking the core NRPS (Patron, Waller et al. 2007). We detected homologs of this cluster sporadically distributed in several other taxa within Pleosporales (Figure 2, Table SG). The CO-OCCUR algorithm detected only a few BGCs with hits to host-selective toxins (sirodesmin and T-toxin) but failed to detect several well-known host-selective toxins such as HC-toxin, and other host-selective toxins in Alternaria alternata. Either these host-selective toxins are not represented in MIBiG or, as discussed above, the uniqueness of these clusters and rarity of the linkages between genes in these clusters in the overall dataset may make them difficult to detect through CO-OCCUR.

Variation among BGC repertoires is due to high BGC turnover, not nestedness

Recent comparative studies have documented high diversity of SM pathways within and between different species of plants, bacteria and fungi (Penn et al., 2009; Choudoir, Pepe-Ranney & Buckley 2018; Holeski, Hillstrom et al. 2012; Holeski, Keefover-Ring et al. 2013; Vesth, Nybo et al. 2018). However, identical estimates of diversity can result from two distinct processes: nestedness, where one set of features is entirely subsumed within another, or turnover, where differences are instead due to a lack of overlap among the features of different sets (Baselga 2012). When we partitioned diversity among BGC repertoires in Pleosporales (i.e., β diversity), we found that the vast majority of variation is due to a high degree of genome-specific cluster combinations, and not nestedness (Figure 7, Figure SH). Much of the turnover in BGC repertoire content between genomes appears to occur over relatively short evolutionary timescales (Figure SF), with repertoires then diversifying more gradually, suggesting that divergence in repertoires may be closely linked to speciation processes, such as niche differentiation or geographic isolation. Directional selection, especially for multi-genic traits encoded at a single locus (e.g., BGCs), leads to rapid gain/loss dynamics characteristic of many SM phenotypes and genotypes (Choudoir, Pepe-Ranney & Buckley 2018; Lind, Wisecaver et al. 2017). Niche differentiation further reinforces divergence between closely related repertoires, which might lead to rapid accumulation of variation over short evolutionary timescales. Indeed, evidence from within populations suggests that BGCs are occasionally located in genomic regions experiencing selective sweeps in geographically isolated pathogen populations (Hartmann, McDonald et al. 2018). The retention/loss of certain SM clusters is coincident with speciation in bacteria (Kurmayer, Blom et al., 2015) and much of the variation in cluster repertoires in Metarhizium insect pathogens is species specific (Xu, Luo et al., 2016). Within Dothideomycetes, the evolution of host-selective toxins even within a single species of pathogen, for example, may allow for niche differentiation, host specialization, and potentially speciation. Rare chemical phenotypes, especially with regard to defense chemistry, may also increase fitness in complex communities (Kursar, Dexter et al. 2009).

BGCs interact with dimensions of chemical diversity

Biological activity of an SM can increase organismal fitness, but any given molecule is not likely to be biologically active. The screening hypothesis posits that mechanisms to generate and retain biochemical diversity would therefore be selected, despite the energetic costs, because increasing structural diversity increases the probability of “finding” molecules that are adaptive (Firn and Jones 2003). This phenomenon is analogous to the mammalian immune system’s latent capacity to generate novel antibodies, resulting in a remarkable ability to respond to diverse antagonists (Firn and Jones 2003). However, while the screening hypothesis may equally apply to plants and microorganisms, patterns of diversity we observe here suggest each lineage generates and maintains biochemical diversity in fundamentally distinct ways. Specifically, fungal individuals appear to maximize total chemical beta-diversity while simultaneously minimizing alpha-diversity of similar chemical classes (Nielsen, Grijseels et al. 2017; Vesth, Nybo et al. 2018). In contrast, individual plants are more likely to produce diverse suites of structurally similar molecules (Li, Baldwin & Gaquerel 2015; Song, Qiao et al. 2017; Weinhold, Ullah et al. 2017). We show that total cluster diversity increases linearly with repertoire size across a broad sample of fungi, extending previous observations that individual fungal genomes are streamlined to produce molecules that share little structural similarity. Evidence from our study and others suggests that, rather than maintaining sets of homologous BGCs and pathways within the same genome, fungi instead maintain high genetic variation in homologous BGCs across individuals at the level of the pan-genome (Ziemert, Lechner et al. 2014; Lind, Wisecaver et al. 2017; Olarte et al. 2019). Although not a selectable evolvability mechanism per se, greater access to the diversity of BGCs harbored in pan-genomes through recombination, hybridization and horizontal transfer effectively outsources the incremental screening for bioactive metabolites across many individuals, thereby decreasing the costs for generating diversity for any given individual and likely accelerating the rate at which effective bioactive metabolite repertoires are assembled within a given lineage (Slot and Gluck-Thaler 2019). Our characterization of BGC diversity across the largest fungal taxonomic class represents a step towards elucidating the broader consequences of these contrasting strategies for generating and maintaining biodiversity of metabolism writ large.

Conclusions:

Fungi produce a range of secondary metabolites that are linked to different ecological functions or defense mechanisms, playing a role in adaptation over time. Although studied at intra- and interspecific levels, this phenomenon has not been studied at macroevolutionary scales. The Dothideomycetes represent the largest and phylogenetically most diverse class of fungi, displaying a range of fungal lifestyles and ecologies. Here we assessed the patterns of diversity of biosynthetic gene clusters across the genomes of 101 Dothideomycetes to dissect patterns of evolution of chemodiversity. Our results suggest that different classes of BGCs (e.g. PKS versus NRPS) have differing diversity of cluster content and connectedness among networks of co-occurring genes and implicate high rates of BGC turnover, rather than nestedness, as the main contributor to the high diversity of BGCs observed among fungi. Consequently, little overlap was found in biosynthetic gene clusters from different genera, consistent with diverse ecologies and lifestyles among the Dothideomycetes, and suggesting that most of the metabolic capacity of this fungal class remains to be discovered.

Methods:

Dothideomycetes genome database and species phylogeny

A database of 101 annotated Dothideomycetes genomes, gene homolog groups, and the corresponding phylogenomic species tree were obtained from Grigoriev et al. (2014) and Haridas et al. (in press).

Gene cluster annotation with the SMURF algorithm

We used a command-line Python script based on the SMURF algorithm (Vesth, Nybo et al. 2018). Using genomic coordinate data and annotated PFAM domains of predicted genes as input, the algorithm predicts seven types of SM clusters based on the multi-PFAM domain composition of known 'backbone' genes. The cluster types are 1) polyketide synthases (PKSs), 2) PKS-like, 3) nonribosomal peptide-synthetases (NRPSs), 4) NRPS-like, 5) hybrid PKS-NRPS, 6) prenyltransferases (DMATS), and 7) terpene cyclases (TCs). The borders of clusters are determined using PFAM domains that are enriched in characterized SM clusters, allowing up to 3 kb of intergenic space between genes, and no more than 6 intervening genes that lack SM-associated domains. SM-associated PFAM domains were taken from Khaldi et al. (2010).
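One way to read the border rule above is sketched below. This is a hypothetical illustration and not the SMURF implementation: it assumes genes are sorted by coordinate on a single contig, that the 3 kb limit applies to the gap between adjacent genes, and that the limit of 6 applies to a run of consecutive genes lacking SM-associated domains. The `Gene` record, the function names and the toy coordinates are invented for the example.

```python
from dataclasses import dataclass

MAX_GAP_BP = 3000        # assumed reading of "up to 3 kb of intergenic space"
MAX_INTERVENING = 6      # assumed reading of "no more than 6 intervening genes"


@dataclass
class Gene:
    start: int
    end: int
    sm_domain: bool      # True if the gene carries an SM-associated PFAM domain


def extend(genes, backbone_idx, step):
    """Walk outward from the backbone gene (step = +1 or -1) and return the
    index of the outermost SM-domain gene reached under the two limits."""
    boundary = backbone_idx
    skipped = 0          # consecutive genes without SM-associated domains
    i = backbone_idx
    while 0 <= i + step < len(genes):
        cur, nxt = genes[i], genes[i + step]
        gap = nxt.start - cur.end if step == 1 else cur.start - nxt.end
        if gap > MAX_GAP_BP:
            break
        i += step
        if genes[i].sm_domain:
            boundary = i
            skipped = 0
        else:
            skipped += 1
            if skipped > MAX_INTERVENING:
                break
    return boundary


def cluster_bounds(genes, backbone_idx):
    """Return (first_index, last_index) of the predicted cluster."""
    return extend(genes, backbone_idx, -1), extend(genes, backbone_idx, +1)


# Toy contig: the gene at index 2 is the backbone; the last gene lies beyond a 4 kb gap.
genes = [
    Gene(0, 1000, False),
    Gene(1500, 2500, True),
    Gene(3000, 4000, True),
    Gene(4500, 5500, False),
    Gene(6000, 7000, True),
    Gene(11000, 12000, True),
]
bounds = cluster_bounds(genes, 2)
```

Under these assumptions the toy cluster spans indices 1 through 4: the flanking gene without an SM-associated domain at index 0 is trimmed, and the gene at index 5 is excluded by the intergenic distance limit.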

Gene cluster annotation with antiSMASH

All genomes were annotated using antiSMASH v4.2.0 by submitting genome assemblies and GFF files to the public web server with options “use ClusterFinder algorithm for BGC border prediction” and “smCOG analysis” (Blin, Wolf et al., 2017). antiSMASH reports all genes within the borders of a predicted cluster as part of the cluster. For our analysis, we only considered genes belonging to annotated smCOGs or signature biosynthetic gene families as part of a given cluster and excluded all others, in order to obtain conservative, high-confidence estimations of cluster content based on genes of interest.

Sampling null homolog group-pair distributions

We created null distributions from which we could empirically estimate co-occurrence probabilities by randomly sampling homolog group pairs without replacement from all Dothideomycete genomes (Figure SA). Before beginning, we defined null distributions based on two parameters: a range of sizes for the smallest homolog group in the pair, and a range of sizes for the largest homolog group in the pair, where each range progressively incremented by 25 from 1-800 and all combinations of ranges were considered. For example, there existed a null distribution for homolog group pairs where the smallest homolog group had 26-50 members, and the largest homolog group had 151-175 members. To begin, we randomly sampled a genome and then randomly selected two genes within 6 genes of one another from that genome. We retrieved the homolog groups to which those genes belonged, and then counted the number of times members of each homolog group were found within 6 genes of each other across all Dothideomycete genomes. We counted the number of members belonging to each homolog group, excluding those that were found within 6 genes of the end of a contig, in order to obtain a corrected size for each homolog group that accounted for variation in assembly quality. The co-occurrence observation was then stored in the appropriate null distribution based on the corrected sizes of each homolog group. For example, the number of co-occurrences of a sampled homolog group pair where the smallest homolog group had a corrected size of 89, and the largest homolog group had a corrected size of 732, would be placed in the null distribution where the smallest size bin was 76-100 members, and the largest size bin was 726-750. All homolog groups with greater than 800 members were assigned to the 776-800 size bin. This sampling procedure was repeated 500,000 times. After evaluating various bin sizes, we ultimately decided to use a range of 25 because this resulted in the most even distribution of samples across all null distributions. Due to variation in the number of homolog groups with any given size across our dataset, it was not possible for all null distributions to contain the same number of samples.
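The size binning described above can be illustrated with a minimal Python sketch. The function names `size_bin` and `null_distribution_key` are hypothetical and are not taken from the CO-OCCUR scripts; the sketch only assumes 25-wide inclusive bins starting at 1, with all sizes above 800 placed in the 776-800 bin.

```python
from typing import Tuple

BIN_WIDTH = 25   # width of each size range, as stated in the text
MAX_SIZE = 800   # sizes above this are assigned to the last bin (776-800)


def size_bin(corrected_size: int) -> Tuple[int, int]:
    """Return the inclusive (low, high) bin for a corrected homolog group size."""
    if corrected_size < 1:
        raise ValueError("corrected size must be at least 1")
    capped = min(corrected_size, MAX_SIZE)
    low = ((capped - 1) // BIN_WIDTH) * BIN_WIDTH + 1
    return (low, low + BIN_WIDTH - 1)


def null_distribution_key(size_a: int, size_b: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Hypothetical lookup key: (bin of the smallest group, bin of the largest group)."""
    small, large = sorted((size_a, size_b))
    return (size_bin(small), size_bin(large))


# The worked example from the text: corrected sizes of 89 and 732.
print(null_distribution_key(732, 89))
```

Keying each null distribution by the pair of bins means that an observed pair is only ever compared against sampled pairs of similar corrected sizes.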

The CO-OCCUR pipeline

Current BGC detection algorithms first identify signature biosynthetic genes using profile Hidden Markov Models (pHMMs) of genes known to participate in SM biosynthesis, and then search predefined regions surrounding signature genes for co-located "accessory" biosynthetic, regulatory, and transport genes. The approach of CO-OCCUR, in contrast, is to define genes of interest based on whether they are ever found to have unexpectedly conserved syntenic relationships with other genes in the vicinity of signature biosynthetic genes, agnostic of gene function. Here, we used CO-OCCUR in conjunction with a preliminary SMURF analysis to arrive at our final BGC annotations (Figure SA). We first took all SMURF BGC predictions and extended their boundaries to genes within a 6 gene distance that belonged to homolog groups found in another SMURF BGC, effectively “bootstrapping” the BGC annotations in order to ensure consistent identification of BGC content across the various genomes. SMURF BGCs at this point in the analysis were considered to consist of all genes found within the cluster’s boundaries. For each pair of genes in each BGC (including signature biosynthetic genes), we retrieved their homolog groups, and kept track of how many times that homolog group pair was observed across all BGCs. Then, for each observed homolog group pair, we counted the randomly sampled homolog group pairs in the appropriate null distribution (based on the corrected sizes of the smallest and largest homolog groups within the observed pair, see above) that had at least as many co-occurrences as the observed pair, and divided this count by the total number of samples in that null distribution. In doing so, we empirically estimated the probability of observing a homolog group pair with at least that many co-occurrences by chance, given the sizes of the homolog groups. In this way, we were able to take into account the relative frequencies of each homolog group within a pair across all genomes when assessing the probability of observing that pair’s co-occurrence. For example, if we observed that homolog group 1 and homolog group 2 co-occurred 19 times within SMURF-predicted BGCs, and that homolog group 1 had 57 members while homolog group 2 had 391 members, we would count the number of randomly sampled homolog group pairs that co-occurred 19 or more times within the null distribution where the smallest homolog group size bin was 51-75 and the largest homolog group size bin was 376-400, and then divide this by the total number of samples in that same null distribution to obtain the probability of observing homolog group 1 and homolog group 2’s co-occurrences by chance. All co-occurrences with an empirical probability estimate of ≤0.05 were considered significant and retained for further analysis. In order to decrease the risk of false positive error, we did not evaluate the probability of observing any homolog group pairs with fewer than 5 co-occurrences, and also did not evaluate any homolog group pairs whose corresponding null distribution had fewer than 10 samples.
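The empirical probability estimate described above amounts to a simple proportion. The following Python sketch is illustrative only: the function names and the representation of a null distribution as a list of co-occurrence counts are assumptions, while the thresholds (at least 5 co-occurrences, at least 10 null samples, probability ≤0.05) are those stated in the text.

```python
from typing import Optional, Sequence

MIN_CO_OCCURRENCES = 5   # pairs with fewer observed co-occurrences are not evaluated
MIN_NULL_SAMPLES = 10    # null distributions with fewer samples are not used
ALPHA = 0.05             # significance cutoff for the empirical probability


def empirical_probability(observed: int, null_counts: Sequence[int]) -> Optional[float]:
    """Fraction of null samples with at least `observed` co-occurrences.

    Returns None when the pair is not evaluated (too few co-occurrences or
    too few samples in the matching null distribution).
    """
    if observed < MIN_CO_OCCURRENCES or len(null_counts) < MIN_NULL_SAMPLES:
        return None
    at_least_as_many = sum(1 for count in null_counts if count >= observed)
    return at_least_as_many / len(null_counts)


def is_significant(observed: int, null_counts: Sequence[int]) -> bool:
    """True if the pair was evaluated and its empirical probability is <= ALPHA."""
    probability = empirical_probability(observed, null_counts)
    return probability is not None and probability <= ALPHA


# Toy null distribution (invented numbers): 100 samples, 3 of which reach 19 or more.
toy_null = [0] * 95 + [19, 20, 25, 3, 4]
print(empirical_probability(19, toy_null), is_significant(19, toy_null))
```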

Next, in order to obtain our final set of predicted BGCs, we took all homolog groups found in significant co-occurrences, and conducted a de novo search in each genome for all clusters containing genes belonging to those homolog groups within a 6 gene distance of each other. In this way, all BGCs in our final set consisted of genes that belonged to these homolog groups of interest, while all other intervening genes were not considered to be part of the cluster. We treated homolog groups containing signature biosynthetic genes as we would any other homolog group: if a signature gene predicted by SMURF was not a member of a homolog group that was part of an unexpected co-occurrence, we did not consider it part of any cluster. We stress that co-occurrences were only used to determine homolog groups of interest, but that once those homolog groups were identified, they did not need to be part of an unexpected co-occurrence within a predicted cluster in order to be considered part of that cluster. By focusing only on genes that form unexpected co-occurrences, it is likely that we have underestimated the compositional diversity of Dothideomycetes BGCs (but this may be the case for all cluster detection algorithms; see Results).
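The de novo search described above can be sketched in Python as a single pass along a contig. This is a simplified illustration, not the CO-OCCUR implementation. It assumes that a contig is represented as an ordered list of homolog group identifiers, that "within a 6 gene distance" means a difference in gene position of at most 6, and that a cluster needs at least two genes of interest. The function name `find_clusters` is hypothetical.

```python
from typing import List, Set

MAX_DISTANCE = 6  # assumed meaning: positions differ by at most 6


def find_clusters(contig_groups: List[str], groups_of_interest: Set[str],
                  max_distance: int = MAX_DISTANCE) -> List[List[int]]:
    """Return clusters as lists of gene positions along one contig.

    Only genes whose homolog group is in `groups_of_interest` are reported;
    intervening genes are skipped, mirroring the cluster definition in the text.
    """
    positions = [i for i, group in enumerate(contig_groups) if group in groups_of_interest]
    clusters: List[List[int]] = []
    current: List[int] = []
    for position in positions:
        # Start a new cluster when the gap to the previous gene of interest is too large.
        if current and position - current[-1] > max_distance:
            if len(current) >= 2:
                clusters.append(current)
            current = []
        current.append(position)
    if len(current) >= 2:
        clusters.append(current)
    return clusters


# Toy contig: "x" marks genes whose homolog group is not of interest.
contig = ["hg1", "x", "x", "hg2", "x", "x", "x", "x", "x", "x", "x", "hg3", "hg1"]
print(find_clusters(contig, {"hg1", "hg2", "hg3"}))
```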

We then grouped all predicted BGCs into homologous cluster groups (cluster groups) based on a minimum of 90% similarity in their gene content, rounded down, in order to obtain a strict definition of BGC homology that increases the likelihood that homologous clusters encode similar metabolic phenotypes. This meant that clusters with sizes ranging from 2-10 were allowed to differ in at most 1 gene; clusters with sizes ranging from 11-20 were allowed to differ in at most 2 genes, etc. Clusters that were not at least 90% similar to any other cluster in the dataset were designated orphan clusters. Because there is no perfect way to determine homology when using similarity-based metrics (e.g., a 10-gene cluster could be 90% similar to a 9-gene cluster, which in turn could be 90% similar to an 8-gene cluster, but that 8-gene cluster cannot be 90% similar to the 10-gene cluster), we developed a heuristic approach for sorting clusters into groups. First, we conducted an all-vs-all comparison of content similarity to sort all clusters into preliminary groups by iterating through the clusters from largest to smallest, where size equaled the number of unique homolog groups, and clusters could only be assigned to a single group. Then, within each preliminary group, we identified clusters most similar to all other clusters within the group and used them as references to which all other clusters were compared during a new round of group assignment. In this final round, clusters were grouped together with a given reference into a cluster group if they were at least 90% similar to it and were classified as orphan clusters if they were not 90% similar to any references. The often-unique compositions of clusters mean that in most cases, there is no ambiguity as to how the clusters are classified; however, for a small number of clusters, especially those with fewer genes, there may be some ambiguity as to which group they belong.
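The rounded-down 90% threshold, and the reason it is not transitive, can be made concrete with a short Python sketch. The function names are hypothetical, and the choice to measure similarity against the larger of the two clusters is an assumption made so that the sketch reproduces the 10-, 9- and 8-gene example in the text.

```python
from typing import Set


def min_shared_groups(size: int) -> int:
    """90% of the cluster size, rounded down (integer arithmetic avoids float error)."""
    return (size * 9) // 10


def max_differences(size: int) -> int:
    """Number of homolog groups by which a cluster of this size may differ."""
    return size - min_shared_groups(size)


def is_similar(cluster_a: Set[str], cluster_b: Set[str]) -> bool:
    """Assumed rule: shared content must reach the threshold of the larger cluster."""
    larger = max(len(cluster_a), len(cluster_b))
    return len(cluster_a & cluster_b) >= min_shared_groups(larger)


ten = {f"hg{i}" for i in range(10)}
nine = {f"hg{i}" for i in range(9)}
eight = {f"hg{i}" for i in range(8)}

# 10 vs 9 and 9 vs 8 pass, but 10 vs 8 does not: similarity is not transitive.
print(is_similar(ten, nine), is_similar(nine, eight), is_similar(ten, eight))
```

This lack of transitivity is why a reference-based second round of assignment is needed rather than simple chaining of pairwise matches.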

Annotation of BGCs and gene functions

In order to detect loci homologous to known BGCs in Dothideomycete genomes, amino acid sequences of each annotated BGC within the MIBiG database (v1.4) were downloaded and used as queries in a BLASTp search of all Dothideomycete proteomes (last accessed 04/01/2019). All hits with a bitscore ≥50 and an e-value ≤1×10⁻⁴ were retained, and clusters composed of these hits were retrieved using a maximum of 6 intervening genes. In order to retain only credible homologs of the annotated MIBiG queries and to account for error in BLAST searches due to overlapping hits, we retained clusters with at least 3 genes that recovered at least 75% of the genes in the initial query. This set of high confidence MIBiG BGCs was then compared to the set of BGCs predicted by CO-OCCUR and antiSMASH to assess the ability of each algorithm to recover homologs to known clusters. For each algorithm and each BGC recovered using BLASTp to search the MIBiG database, we calculated percent recovery, defined as the number of genes identified by the BLASTp search that were also identified as clustered by the algorithm, divided by the size of the BGC identified by the BLASTp search, multiplied by 100. We also calculated percent discovery, defined as the number of clustered genes identified by the algorithm but not identified in the BLASTp search, divided by the size of the BGC identified by the BLASTp search, multiplied by 100.
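Percent recovery and percent discovery, as defined above, reduce to two set operations. The Python sketch below is illustrative; the function names and the gene identifiers are invented, and it assumes that the algorithm's clustered genes have already been restricted to the locus in question.

```python
from typing import Set


def percent_recovery(blast_genes: Set[str], algorithm_genes: Set[str]) -> float:
    """Share of the BLASTp-identified BGC that the algorithm also reports as clustered."""
    return 100 * len(blast_genes & algorithm_genes) / len(blast_genes)


def percent_discovery(blast_genes: Set[str], algorithm_genes: Set[str]) -> float:
    """Clustered genes reported only by the algorithm, relative to the BLASTp BGC size."""
    return 100 * len(algorithm_genes - blast_genes) / len(blast_genes)


# Invented example: a 4-gene BLASTp cluster and a 5-gene algorithm prediction.
blast = {"geneA", "geneB", "geneC", "geneD"}
predicted = {"geneB", "geneC", "geneD", "geneE", "geneF"}
print(percent_recovery(blast, predicted), percent_discovery(blast, predicted))
```

Because both measures share the same denominator (the size of the BLASTp-identified BGC), percent discovery can exceed 100 when an algorithm reports many additional genes.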

In order to annotate BGCs recovered by CO-OCCUR with characterized clusters, we used amino acid sequences of all signature biosynthetic genes in CO-OCCUR clusters as BLASTp queries in a search of the MIBiG database (min. percent similarity=70%; max. e-value=1×10⁻⁴; min. high scoring pairs coverage=50%). Basing our annotations on percent amino acid similarity to characterized signature biosynthetic genes rather than on the number of genes with similarity to BGC genes enabled a more conservative and comprehensive approach, as many BGC entries within the MIBiG database are not complete.

Proteins within predicted BGCs were annotated using eggNOG-mapper (Huerta-Cepas, Forslund et al., 2017) based on fungal-specific fuNOG orthology data (Huerta-Cepas, Szklarczyk, et al., 2015). Consensus annotations for all homolog groups were derived by selecting the most frequent annotation among all members of the group.
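Selecting the most frequent annotation can be sketched with a counter. The function name is hypothetical. Skipping unannotated members and breaking ties by first occurrence are assumptions of this sketch, since the text does not specify either.

```python
from collections import Counter
from typing import Iterable, Optional


def consensus_annotation(annotations: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most frequent annotation among the members of a homolog group.

    Members without an annotation (None or empty string) are ignored; ties go to
    the annotation seen first, which is an arbitrary choice made for this sketch.
    """
    counts = Counter(annotation for annotation in annotations if annotation)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented annotations for the members of one homolog group.
print(consensus_annotation(["transporter", "oxidase", "transporter", None]))
```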

Comparing BGC detection algorithms

In order to assess the relative performances of SMURF, antiSMASH and CO-OCCUR, we compared all BGCs predicted by each method, and kept track of the genes within those BGCs that were identified by either one or multiple methods. We summarized these findings in a Venn diagram using the “eulerr” package in R (Larsson 2019). Note that for the purposes of this analysis, BGCs predicted by SMURF and antiSMASH were considered to be composed only of genes that matched a precomputed pHMM, and BGCs predicted by CO-OCCUR were composed only of genes belonging to homolog groups that were part of unexpected co-occurrences, while all other intervening genes within the BGC’s boundaries were not considered to be part of the cluster. In doing so, we effectively ignored intervening genes that were situated between or were immediately adjacent to these clustered genes of interest for the purposes of defining a cluster’s content. While this approach likely does not capture the full diversity of cluster composition, it is expected to decrease false positive error in BGC content prediction and represents a conservative approach to identifying what genes make up a given cluster.

Construction of a co-occurrence network

We visualized relationships between homolog group pairs with unexpectedly large numbers of co-occurrences in a network using Cytoscape v.3.4.0 (Shannon, Markiel et al. 2003). The network layout was determined using the AllegroLayout plugin with the Allegro Spring-Electric algorithm. In order to identify hub nodes within the network, we calculated betweenness centrality, a measure of how many of the shortest paths within a network pass through a given node, for each node using Cytoscape.

Assessment of cluster group phylogenetic signal

In order to quantify the dispersion of phylogenetic distributions of cluster groups predicted by CO-OCCUR, we created a binary genome x cluster group matrix for all 239 cluster groups with ≥4 genes that indicated the presence or absence of these cluster groups across all 101 genomes. We used this matrix in conjunction with the “phylo.d” function from the “caper” package v1.0.1 in R (Orme, Freckleton et al. 2012) to calculate Fritz and Purvis’ D statistic for each cluster group’s distribution, where D is a measurement of phylogenetic signal for a binary trait obtained by calibrating the observed number of changes in a binary trait’s evolution across a phylogeny by the mean sum of changes expected under two null models of binary trait evolution. The first null model simulates the phylogenetic distribution expected under a model of random trait inheritance, and the second simulates the phylogenetic distribution expected under a threshold model of Brownian evolution that evolves a trait along the phylogeny under a Brownian process where variation in that trait’s distribution accumulates at a rate proportional to branch length (Fritz & Purvis 2010). D≈1 if the trait has a phylogenetically random distribution; D≈0 if the trait has a phylogenetic distribution that follows the Brownian model; D>1 if the trait has a phylogenetic distribution that is less conserved, or over-dispersed, compared to the Brownian model; D<0 if the trait has a phylogenetic distribution that is more conserved, or under-dispersed, compared to the Brownian model.
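The scaling that produces the reference values D≈1 and D≈0 can be written out explicitly. The Python sketch below follows our reading of the description above and of Fritz & Purvis (2010): the observed sum of changes is rescaled so that the Brownian expectation maps to 0 and the random expectation maps to 1. The function and argument names are hypothetical, and the sketch takes the three sums as given rather than simulating them as the “phylo.d” function does.

```python
def d_statistic(observed_changes: float, mean_random: float, mean_brownian: float) -> float:
    """Rescale the observed sum of changes between the two null expectations.

    D = (observed - mean_brownian) / (mean_random - mean_brownian)
    """
    if mean_random == mean_brownian:
        raise ValueError("the two null expectations must differ")
    return (observed_changes - mean_brownian) / (mean_random - mean_brownian)


# Invented sums of changes, with null expectations of 10 (random) and 4 (Brownian).
for observed in (10, 4, 12, 2):
    print(observed, round(d_statistic(observed, mean_random=10, mean_brownian=4), 3))
```

With these invented numbers, an observed sum equal to the random expectation gives D = 1, a sum equal to the Brownian expectation gives D = 0, and sums outside that range give D > 1 or D < 0.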

Dissimilarity and Diversity Analyses

We created cluster group and orphan cluster x homolog group matrices in order to determine the dissimilarity between predicted cluster groups. In these matrices, for each cluster group or orphan cluster, we indicated the presence or absence of homolog groups in at least one cluster within the cluster group or orphan cluster, effectively summarizing each cluster group and orphan cluster by integrating over the content of all clusters assigned to that group. We next used the matrix in conjunction with the “vegdist” function from the “vegan” package in R (Oksanen, Blanchet et al. 2016) to create a Raup-Crick dissimilarity matrix that was visualized as a dendrogram using complete linkage clustering as implemented in the “hclust” function from the core “stats” package in R. These dendrograms were then used to assess the functional diversity of BGC repertoires (e.g., in the Pleosporales) by measuring the total branch distance connecting all cluster groups and orphan clusters within a given repertoire using the “treedive” function from the “vegan” package in R.

We used the same procedure as above to calculate Sørensen dissimilarity between Pleosporalean genomes based on their BGC repertoires, only this time using a genome x cluster group and orphan cluster matrix that depicted the presence or absence of cluster groups and orphan clusters across all 49 Pleosporalean genomes. We also used this matrix to calculate and partition β diversity in Pleosporalean cluster repertoires using the “beta.sample” function (index.family = "sorensen", sites = 10, samples = 999) from the “betapart” v1.4 package in R (Baselga & Orme 2012) in order to determine how much of the observed diversity among repertoires was due to gain/loss of cluster groups and orphan clusters, and how much was due to nestedness. We also used the genome x cluster group and orphan cluster matrix to conduct a rarefaction of cluster richness across Pleosporalean genomes using the “iNEXT” function (q = 0, datatype="incidence_raw", endpoint=98) from the “iNEXT” package in R (Hsieh, Ma & Chao 2016).
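For readers unfamiliar with the Sørensen index used above to compare BGC repertoires, the following Python sketch computes it for two presence/absence sets. It is an illustration of the standard formula, not a reproduction of the R workflow, and the cluster type labels are invented.

```python
from typing import Set


def sorensen_dissimilarity(repertoire_a: Set[str], repertoire_b: Set[str]) -> float:
    """Sørensen dissimilarity between two sets of cluster types.

    With a = shared types and b, c = types unique to each genome:
    dissimilarity = (b + c) / (2a + b + c)
    """
    shared = len(repertoire_a & repertoire_b)
    unique = len(repertoire_a - repertoire_b) + len(repertoire_b - repertoire_a)
    if shared == 0 and unique == 0:
        return 0.0  # two empty repertoires are treated as identical in this sketch
    return unique / (2 * shared + unique)


# Invented repertoires: two shared cluster types, one unique to each genome.
genome_1 = {"group1", "group2", "group3"}
genome_2 = {"group2", "group3", "orphan7"}
print(sorensen_dissimilarity(genome_1, genome_2))
```

The index ranges from 0 for identical repertoires to 1 for repertoires with no cluster types in common.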

Abbreviations:

BGC - Biosynthetic Gene Cluster; pHMM - profile Hidden Markov Model; SM - Secondary Metabolite; PKS - Polyketide Synthase; NRPS - Nonribosomal Peptide Synthetase; TC - Terpene Cyclase; DMAT - Dimethylallyl Tryptophan Synthase

Declarations:

Ethics approval and consent to participate

No human subjects were involved in the research.

Consent for publication

No human data was used in the research.

Availability of data and materials

All genome data is available at https://mycocosm.jgi.doe.gov/mycocosm/home and described in Haridas et al. in press.

All scripts used in the analyses are available at https://github.com/egluckthaler/co-occur

Additional data generated in this study is included in the supplemental datafile.

Funding

This work was supported by the National Science Foundation (DEB-1638999, JCS), the Fonds de Recherche du Québec-Nature et Technologies (EG-T), and the Ohio State University Graduate School (EG-T). The work conducted by the U.S. Department of Energy Joint Genome Institute, a DOE Office of Science User Facility, is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

Authors' contributions

Formulated the study: EG-T, KEB, JCS

Designed the methodology: EG-T, JCS

Generated resources: EG-T, PWC, IG, MB

Collected the data: EG-T, SH

Analyzed the data: EG-T

Provided leadership and/or mentorship in the study: JWS, JCS

Prepared the manuscript: EG-T, JCS

Contributed to the writing/editing of the manuscript: KEB, PWC, JWS

Acknowledgements

Computational work by EG-T was conducted using the resources of the Ohio Supercomputer Center.

Competing interests

The authors declare that they have no competing interests.

Figure legends:

Figure 1. CO-OCCUR pipeline. The pipeline used genome annotations from 101 Dothideomycetes, and previously computed homolog groups (HGs) consisting of both orthologs and paralogs (Haridas et al. in press). Biosynthetic gene clusters (BGCs) were inferred by identifying unexpectedly distributed shared HG pairs, as judged against a null distribution of randomly sampled gene pairs in the same genomes, and then searching for all clusters containing the HG pairs. The resulting BGCs were then either consolidated into cluster groups (CGs) that share ~90% of gene content or labeled orphan clusters (OCs) if found only in a single taxon. A detailed pipeline is presented in Figure SA.

Figure 2. Diversity of the largest detected secondary metabolite gene clusters across 101 Dothideomycetes. A maximum likelihood phylogenomic tree of 101 Dothideomycete species (Haridas et al. in press) corresponds to rows in a heatmap (right) that depicts the number of secondary metabolite clusters found in each genome, delimited by order (dotted line). Each cluster is assigned to a homologous cluster group (cluster group; column) defined by at least 90% similarity at the composition level. Only cluster groups with ≥5 unique homolog groups per cluster are shown. A complete linkage tree (top) depicts relationships among cluster groups, where distance is proportional to the Raup-Crick dissimilarity in cluster group composition. Cluster groups are colored according to their core signature biosynthetic genes, and cluster groups with greater than 1 signature gene are left uncolored. Cluster groups with signature genes ≥70% identical to characterized BGC signature genes in MIBiG are indicated by a labeled red box.

Figure 3. Gene co-occurrence networks among biosynthetic signature gene clusters. a) Co-occurrence network of gene homolog groups (homolog groups). Nodes in the co-occurrence network represent all homolog groups found in homologous cluster groups (cluster groups). Edges represent significant co-occurrences between homolog groups. Node size is proportional to the number of significant co-occurrences involving that homolog group, and edge width is proportional to the number of unique cluster types (either cluster groups or orphan clusters) with ≥4 homolog groups that contain the co-occurrence. Distance between nodes is proportional to the number of co-occurrences they have in common, adjusted by edge width. Signature genes (colored circles) and transport-related function (squares) are indicated. Betweenness centrality scores ≥0.2 are indicated in brackets for signature genes and eight other nodes (n1-8). Networks 1 and 2 are the two largest networks. b) Histogram of betweenness centrality scores for all nodes in Networks 1 and 2 (bin width = 0.1). c) Significant co-occurrences within PKS and NRPS clusters. Boxplots of homolog group co-occurrences involving signature genes (top) and non-signature genes (bottom) across all polyketide synthase (PKS; green) and nonribosomal peptide synthetase (NRPS; purple) clusters with ≥4 unique homolog groups. Boxplots display the 75th percentile (top hinge), median (middle hinge), the 25th percentile (lower hinge), and outliers (dots) determined by Tukey’s method. d) Diversity of PKS and NRPS clusters. A line chart tracks the diversity of PKS (green) and NRPS (purple) clusters across all cluster sizes, where diversity is defined as the total number of unique cluster types (either cluster groups or orphan clusters) divided by the total number of clusters.

Figure 4. Benchmarking three different algorithms for biosynthetic gene cluster (BGC) detection. a) Proportional Venn diagram of distinct and overlapping BGC genes of interest detected by SMURF, antiSMASH and CO-OCCUR. SMURF and antiSMASH use profile Hidden Markov Models (pHMMs) to identify clustered genes of interest, while CO-OCCUR uses linkage-based criteria (see Methods). Clustered genes (unbracketed) and secondary metabolism biosynthesis, transport and catabolism clustered genes (fuNOG) detected are indicated for each algorithm/combination. b) Complementary recovery of the cercosporin BGC using antiSMASH and CO-OCCUR. Shading of genes in the Cercospora zeae-maydis cercosporin BGC (MIBiG ID BGC0001541; recovered clusterID Cerzm1_BGC0001541_h92 in Table SG) indicates genes identified by antiSMASH (blue), CO-OCCUR (yellow), or both algorithms (green). Gene names are as in (de Jonge et al., 2018) and those required for cercosporin biosynthesis (Chen et al., 2007; Newman et al., 2016; de Jonge et al., 2018) are indicated with an asterisk. c) Gene recovery and discovery in clusters homologous to known BGCs. Scatterplots show the percent of genes recovered (top) or discovered (bottom) by antiSMASH vs. CO-OCCUR at each locus homologous to a MIBiG BGC (search criteria: minimum 3 gene cutoff; minimum of 75% genes similar to MIBiG BGC genes in locus). Percent recovery is defined as the number of genes identified by BLASTp in an algorithm-identified cluster divided by the size of the BLASTp identified BGC, multiplied by 100. Percent discovery is defined as the number of genes identified by the cluster algorithm but not identified in the BLASTp search, divided by the size of the BLASTp identified BGC, multiplied by 100. y = x at the dotted reference line.

Figure 5. Phylogenetic and ecological signal in the distributions of homologous cluster groups (cluster groups). a) Scatterplot of phylogenetic and ecological signal of cluster groups. Values along the X-axis correspond to Fritz and Purvis’ D statistic, representing phylogenetic signal in a cluster group’s distribution. Distributions of cluster groups with D<0 are more conserved compared to a Brownian model of trait evolution, and distributions of cluster groups with D>1 are considered over-dispersed. Cluster groups with more pathotrophs than saprotrophs have Y>0 while cluster groups with more saprotrophs than pathotrophs have Y<0. Points representing cluster group distributions with probability ≤0.05 of Brownian trait evolution are in black, while those >0.05 are in gray. Cluster groups with P(Brownian) ≤0.05 and a lifestyle ratio ≥2 are labeled and described in b) and c). Only cluster groups with ≥4 unique gene homolog groups (homolog groups) per cluster are shown. b) Summary descriptions of labeled cluster groups. Sig. genes = signature genes present in the cluster group; Size = number of unique homolog groups in the cluster group reference cluster; Freq. = number of fungi with a cluster that belongs to the cluster group; Best known signature(s) = signature gene(s) from the MIBiG database with the highest similarity to signature genes from the cluster group, with average percentage similarity shown in parentheses. c) Phylogenetic distributions of labeled cluster groups. Presence (black cells) and absence (gray cells) matrix of clusters assigned to each labeled cluster group across Dothideomycetes genomes, with the tree as in Figure 2.

Figure 6. Three examples of homologous cluster groups (cluster groups) with conserved phylogenetic distributions. a) Cluster group distributions. Presence (black cells) and absence (gray cells) matrix of clusters assigned to various cluster groups (columns a-j, described in part b), across Dothideomycetes genomes, with the tree as in Figure 2. Each matrix contains distinct sets of cluster groups that are separated by ≤0.05 distance units on the complete linkage tree in Figure 2. The number of fungi with each cluster group is indicated at the bottom of each column. b) Cluster group composition. Cluster groups in set 1 are predicted to encode DHN melanin biosynthesis; set 2 contains unknown cluster groups with NRPS-like signature genes; set 3 contains unknown cluster groups with PKS signature genes, where the PKSs from group 16 are on average 84% similar to the PKS in the characterized alternapyrone cluster (MIBiG ID: BGC0000012). Homolog group presence in a given cluster group is indicated by a gray box below the description. Brackets surround homolog groups present in <50% of clusters assigned to a given cluster group.

Figure 7. Diversity of secondary metabolite gene cluster repertoires in Pleosporalean fungi. a) Grouping of fungi based on the combinations of gene clusters (i.e., cluster repertoires) found in their genomes. Shown to the left is a complete linkage tree where distance between different fungal species is proportional to the Sørensen dissimilarity between their cluster repertoires. To the right is a presence (black) and absence (white) matrix where each column represents a unique cluster type (either a homologous cluster group or orphan cluster) and each row corresponds to the adjacent fungal genome. On top of the heatmap is a complete linkage tree displaying relationships between unique cluster types, where distance is proportional to the Raup-Crick dissimilarity in cluster composition. b) Relationship between cluster repertoire size and cluster repertoire diversity. Cluster repertoire diversity was calculated for each genome by finding the total branch length on the Raup-Crick dissimilarity tree in a) associated with the set of clusters found in that genome. Cluster repertoire diversity is thus a measurement of a given genome’s repertoire diversity, in terms of the gene content of its clusters. A solid line models the linear relationship between repertoire size and diversity (adj. R² = 0.855). The shaded area around the line represents the 95% confidence interval associated with the model. c) Sampled and projected secondary metabolite cluster richness within the Pleosporales. Rarefied (solid lines) and extrapolated (dotted lines) estimates of secondary metabolite gene cluster richness (i.e., the number of unique cluster types) with respect to the number of sampled genomes are shown for the Pleosporales. Shaded areas represent the 95% confidence intervals for both estimate types, derived from 100 bootstrap replicates. All three graphs were generated using data from the 318 unique cluster types with ≥4 unique gene homolog groups that are associated with 47 Pleosporalean fungi and 2 as yet unclassified fungi found within the Pleosporalean clade on the phylogenomic species tree in Figure 2.

Supporting information:

Table SA. Gene homolog groups (homolog groups) part of unexpected co-occurrences.

Table SB. Unexpected co-occurrences between gene homolog groups (homolog groups) occurring in the vicinity of signature biosynthetic genes, and their frequency across different SM classes.

Table SC. Positional information of all recovered CO-OCCUR clusters.

Table SD. Genomes used in this study.

Table SE. Cluster types (groups and orphans) detected by CO-OCCUR.

Table SF. BLAST-based annotation of CO-OCCUR clusters with known signature biosynthetic genes from the MIBiG database.

Table SG. Cross-referencing clusters retrieved by CO-OCCUR, antiSMASH, and BLAST searches of MIBiG database to determine percent recovery and discovery.

Table SH. Positional information of all recovered antiSMASH clusters.

Table SI. Cluster types (groups and orphans) detected by antiSMASH.

Table SJ. Positional information of all recovered SMURF clusters.

Table SK. Cluster types (groups and orphans) detected by SMURF.

Table SL. Overlapping and complementary recovery of clustered genes of interest using antiSMASH, SMURF, and CO-OCCUR.

Table SM. Positional information of all clusters recovered with a BLASTp search of the MIBiG database.

References:

Agrawal AA, Hastings AP, Johnson MT, Maron JL, Salminen JP. Insect herbivores drive real-time ecological and evolutionary change in plant populations. Science. 2012;338:6103.

Akimitsu K, Tsuge T, Kodama M, Yamamoto M, Otani H. Alternaria host-selective toxins: determinant factors of plant disease. Journal of General Plant Pathology. 2014;80:2.

Ali A, Caggia S, Matesic DF, Khan SA. Chaetoglobosin K, an Akt pathway inhibitor, prevents proliferation and migration of prostate carcinoma cells. 2015.

Baselga A, Orme CDL. betapart: an R package for the study of beta diversity. Methods in ecology and evolution. 2012;3:5.

Baselga A. The relationship between species replacement, dissimilarity derived from nestedness, and nestedness. Global Ecology and Biogeography. 2012;21:12.

Beimforde C, Feldberg K, Nylinder S, Rikkinen J, Tuovila H, Dörfelt H, Gube M, Jackson DJ, Reitner J, Seyfullah LJ. Estimating the Phanerozoic history of the Ascomycota lineages: combining fossil and molecular data. Molecular phylogenetics and evolution. 2014;78.

Blin K, Wolf T, Chevrette MG, Lu X, Schwalen CJ, Kautsar SA, ... Medema MH. antiSMASH 4.0—improvements in chemistry prediction and gene cluster boundary identification. Nucleic acids research. 2017;45:W1.

Bok JW, Chung D, Balajee SA, Marr KA, Andes D, Nielsen KF, ... Keller NP. GliZ, a transcriptional regulator of gliotoxin biosynthesis, contributes to Aspergillus fumigatus virulence. Infection and immunity. 2006;74:12.

Bradshaw RE, Slot JC, Moore GG, Chettri P, de Wit PJ, Ehrlich KC, ... Cox MP. Fragmentation of an aflatoxin‐like gene cluster in a forest pathogen. New Phytologist. 2013;198:2.

Chen H, Lee MH, Daub ME, Chung KR. Molecular analysis of the cercosporin biosynthetic gene cluster in Cercospora nicotianae. Molecular microbiology. 2007;64:3.

Choudoir MJ, Pepe-Ranney C, Buckley DH. Diversification of secondary metabolite biosynthetic gene clusters coincides with lineage divergence in Streptomyces. Antibiotics. 2018;7:1.

Cimermancic P, Medema MH, Claesen J, Kurita K, Brown LCW, Mavrommatis K, ... Birren BW. Insights into secondary metabolism from a global analysis of prokaryotic biosynthetic gene clusters. Cell. 2014;158:2.

Ciuffetti LM, Manning VA, Pandelova I, Betts MF, Martinez JP. Host‐selective toxins, Ptr ToxA and Ptr ToxB, as necrotrophic effectors in the Pyrenophora tritici‐repentis–wheat interaction. New Phytologist. 2010 Sep;187(4):911-9.

Coleman JJ, Mylonakis E. Efflux in fungi: la piece de resistance. PLoS Pathogens. 2009;5:6.

Condon BJ, Elliott C, González JB, Yun SH, Akagi Y, Wiesner-Hanks T, Kodama M, Turgeon BG. Clues to an evolutionary mystery: the genes for T-Toxin, enabler of the devastating 1970 Southern corn leaf blight epidemic, are present in ancestral species, suggesting an ancient origin. Molecular plant-microbe interactions. 2018 Nov 12;31(11):1154-65.

Crameri R, Garbani M, Rhyner C, Huitema C. Fungi: the neglected allergenic sources. Allergy. 2014;69:2.



Daly J. The host-specific toxins of Helminthosporia. Plant Infection: The Physiological and Biochemical Basis. Asada Y, Bushnell WR, Ouchi S, Vance CP, editors. Berlin: Springer-Verlag. 1982.

De Jonge R, Ebert MK, Huitt-Roehl CR, Pal P, Suttle JC, Spanner RE, ... Thomma BP. Gene cluster conservation provides insight into cercosporin biosynthesis and extends production to the genus Colletotrichum. Proceedings of the National Academy of Sciences. 2018;115:24.

Del Carratore F, Zych K, Cummings M, Takano E, Medema MH, Breitling R. Computational identification of co-evolving multi-gene modules in microbial biosynthetic gene clusters. Communications Biology. 2019;2:1.

Dhillon B, Feau N, Aerts AL, Beauseigle S, Bernier L, Copeland A, ... LaButti KM. Horizontal gene transfer and gene dosage drives adaptation to wood colonization in a tree pathogen. Proceedings of the National Academy of Sciences. 2015;112:11.

Firn RD, Jones CG. Natural products–a simple model to explain chemical diversity. Natural product reports. 2003;20:4.

Fritz SA, Purvis A. Selectivity in mammalian extinction risk and threat types: a new measure of phylogenetic signal strength in binary traits. Conservation Biology. 2010;24:4.

Fujii I, Yoshida N, Shimomaki S, Oikawa H, Ebizuka Y. An iterative type I polyketide synthase PKSN catalyzes synthesis of the decaketide alternapyrone with regio-specific octa-methylation. Chemistry & biology. 2005;12:12.

Gardiner DM, Cozijnsen AJ, Wilson LM, Pedras MS, Howlett BJ. The sirodesmin biosynthetic gene cluster of the plant pathogenic fungus Leptosphaeria maculans. Molecular microbiology. 2004 Sep;53(5):1307-18.

Gardiner DM, Waring P, Howlett BJ. The epipolythiodioxopiperazine (ETP) class of fungal toxins: distribution, mode of action, functions and biosynthesis. Microbiology. 2005;151:4.

Glassmire AE, Jeffrey CS, Forister ML, Parchman TL, Nice CC, Jahner JP, ... Leonard MD. Intraspecific phytochemical variation shapes community and population structure for specialist caterpillars. New Phytologist. 2016;212:1.

Gluck-Thaler E, Slot JC. Specialized plant biochemistry drives gene clustering in fungi. The ISME journal. 2018;12:7.

Gluck‐Thaler E, Vijayakumar V, Slot JC. Fungal adaptation to plant defences through convergent assembly of metabolic modules. Molecular ecology. 2018;27:24.

Goodwin SB, M'Barek SB, Dhillon B, Wittenberg AH, Crane CF, Hane JK, ... Antoniw J. Finished genome of the fungal wheat pathogen Mycosphaerella graminicola reveals dispensome structure, chromosome plasticity, and stealth pathogenesis. PLoS genetics. 2011;7:6.

Grandaubert J, Lowe RG, Soyer JL, Schoch CL, Van de Wouw AP, Fudal I, ... Linglin J. Transposable element-assisted evolution and adaptation to host plant within the Leptosphaeria maculans-Leptosphaeria biglobosa species complex of fungal pathogens. BMC genomics. 2014;15:1.

Grigoriev IV, Nikitin R, Haridas S, Kuo A, Ohm R, Otillar R, Riley R, Salamov A, Zhao X, Korzeniewski F, Smirnova T. MycoCosm portal: gearing up for 1000 fungal genomes. Nucleic acids research. 2014 Jan 1;42(D1):D699-704.

Hane JK, Rouxel T, Howlett BJ, Kema GH, Goodwin SB, Oliver RP. A novel mode of chromosomal evolution peculiar to filamentous Ascomycete fungi. Genome biology. 2011;12:5.

Haridas S, Albert R, Binder M, Bloem J, LaButti K, Salamov A, Andreopoulos B, Baker SE, Barry K, Bills G, Bluhm BH, Cannon C, Castanera R, Culley DE, Daum C, Ezra D, González JB, Henrissat B, Kuo A, Liang C, Lipzen A, Lutzoni F, Magnuson J, Mondo S, Nolan M, Ohm RA, Pangilinan J, Park H-J, Ramírez L, Alfaro M, Sun H, Tritt A, Yoshinaga Y, Zwiers L-H, Turgeon BG, Goodwin SB, Spatafora JW, Crous PW, Grigoriev IV. 101 Dothideomycetes genomes: a test case for predicting lifestyles and emergence of pathogens. Studies in Mycology, in press.

Hartmann FE, McDonald BA, Croll D. Genome‐wide evidence for divergent selection between populations of a major agricultural pathogen. Molecular ecology. 2018;27:12.

Hoffmeister D, Keller NP. Natural products of filamentous fungi: enzymes, genes, and their regulation. Natural product reports. 2007;24:2.

Holeski LM, Hillstrom ML, Whitham TG, Lindroth RL. Relative importance of genetic, ontogenetic, induction, and seasonal variation in producing a multivariate defense phenotype in a foundation tree species. Oecologia. 2012;170:3.

Holeski LM, Keefover-Ring K, Bowers MD, Harnenz ZT, Lindroth RL. Patterns of phytochemical variation in Mimulus guttatus (yellow monkeyflower). Journal of chemical ecology. 2013;39:4.

Horn BW. Ecology and population biology of aflatoxigenic fungi in soil. Journal of Toxicology-Toxin Reviews. 2003;22:2-3.

Hsieh TC, Ma KH, Chao A. iNEXT: an R package for rarefaction and extrapolation of species diversity (Hill numbers). Methods in Ecology and Evolution. 2016;7:12.

Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, Von Mering C, Bork P. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Molecular biology and evolution. 2017;34:8.

Huerta-Cepas J, Szklarczyk D, Forslund K, Cook H, Heller D, Walter MC, ... Jensen LJ. eggNOG 4.5: a hierarchical orthology framework with improved functional annotations for eukaryotic, prokaryotic and viral sequences. Nucleic acids research. 2016;44:D1.

Jiang C, Song J, Zhang J, Yang Q. New production process of the antifungal chaetoglobosin A using cornstalks. Brazilian journal of microbiology. 2017;48:3.

Kasahara K, Miyamoto T, Fujimoto T, Oguri H, Tokiwano T, Oikawa H, ... Fujii I. Solanapyrone synthase, a possible Diels–Alderase and iterative type I polyketide synthase encoded in a biosynthetic gene cluster from Alternaria solani. ChemBioChem. 2010;11:9.

Kaur S. Phytotoxicity of solanapyrones produced by the fungus Ascochyta rabiei and their possible role in blight of chickpea (Cicer arietinum). Plant Science. 1995;109:1.

Keller NP. Translating biosynthetic gene clusters into fungal armor and weaponry. Nature chemical biology. 2015;11:9.

Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, Wolfe KH, Fedorova ND. SMURF: genomic mapping of fungal secondary metabolite clusters. Fungal Genetics and Biology. 2010;47:9.

Kurmayer R, Blom JF, Deng L, Pernthaler J. Integrating phylogeny, geographic niche partitioning and secondary metabolite synthesis in bloom-forming Planktothrix. The ISME journal. 2015;9:4.

Kursar TA, Dexter KG, Lokvam J, Pennington RT, Richardson JE, Weber MG, ... Coley PD. The evolution of antiherbivore defenses and their contribution to species coexistence in the tropical tree genus Inga. Proceedings of the National Academy of Sciences. 2009;106:43.

Larsson J. Eulerr: Area-Proportional Euler and Venn Diagrams with Ellipses. 2018. R package version 3.

Li D, Baldwin IT, Gaquerel E. Navigating natural variation in herbivory-induced secondary metabolism in coyote tobacco populations using MS/MS structural analysis. Proceedings of the National Academy of Sciences. 2015;112:30.

Lind AL, Wisecaver JH, Lameiras C, Wiemann P, Palmer JM, Keller NP, ... Rokas A. Drivers of genetic diversity in secondary metabolic gene clusters within a fungal species. PLoS biology. 2017;15:11.

Manning VA, Pandelova I, Dhillon B, Wilhelm LJ, Goodwin SB, Berlin AM, ... Holman WH. Comparative genomics of a plant-pathogenic fungus, Pyrenophora tritici-repentis, reveals transduplication and the impact of repeat elements on pathogenicity and population divergence. G3: Genes, Genomes, Genetics. 2013;3:1.

Menke J, Dong Y, Kistler HC. Fusarium graminearum Tri12p influences virulence to wheat and trichothecene accumulation. Molecular plant-microbe interactions. 2012;25:11.

Mizushina Y, Kamisuki S, Kasai N, Shimazaki N, Takemura M, Asahara H, ... Sugawara F. A plant phytotoxin, solanapyrone A, is an inhibitor of DNA polymerase β and λ. Journal of Biological Chemistry. 2002;277:1.

Newman AG, Townsend CA. Molecular characterization of the cercosporin biosynthetic pathway in the fungal plant pathogen Cercospora nicotianae. Journal of the American Chemical Society. 2016;138:12.

Nielsen JC, Grijseels S, Prigent S, Ji B, Dainat J, Nielsen KF, ... Nielsen J. Global analysis of biosynthetic gene clusters reveals vast potential of secondary metabolite production in Penicillium species. Nature microbiology. 2017;2:6.



Ohm RA, Feau N, Henrissat B, Schoch CL, Horwitz BA, Barry KW, Condon BJ, Copeland AC, Dhillon B, Glaser F, Hesse CN. Diverse lifestyles and strategies of plant pathogenesis encoded in the genomes of eighteen Dothideomycetes fungi. PLoS Pathogens. 2012 Dec;8(12).

Oksanen J, Kindt R, Legendre P, O’Hara B, Stevens MH, Oksanen MJ, Solymos P, Wagner H. The vegan package. Community ecology package. 2007;10.

Olarte RA, Menke J, Zhang Y, Sullivan S, Slot JC, Huang Y, ... Bushley KE. Chromosome rearrangements shape the diversification of secondary metabolism in the cyclosporin producing fungus Tolypocladium inflatum. BMC genomics. 2019;20:1.

Oliver RP, Friesen TL, Faris JD, Solomon PS. Stagonospora nodorum: from pathology to genomics and host resistance. Annual review of phytopathology. 2012;50.

Orme D, Freckleton R, Thomas G, Petzoldt T, Fritz S, Isaac N, ... Pearse W. Caper: comparative analyses of phylogenetics and evolution in R. R package version 0.5. 2012.

Pandelova I, Figueroa M, Wilhelm LJ, Manning VA, Mankaney AN, Mockler TC, Ciuffetti LM. Host-selective toxins of Pyrenophora tritici-repentis induce common responses associated with host susceptibility. PLoS One. 2012;7:7.

Patron NJ, Waller RF, Cozijnsen AJ, Straney DC, Gardiner DM, Nierman WC, Howlett BJ. Origin and distribution of epipolythiodioxopiperazine (ETP) gene clusters in filamentous ascomycetes. BMC Evolutionary Biology. 2007;7:1.

Proctor RH, McCormick SP, Kim HS, Cardoza RE, Stanley AM, Lindo L, ... Alexander NJ. Evolution of structural diversity of trichothecenes, a family of toxins produced by plant pathogenic and entomopathogenic fungi. PLoS pathogens. 2018;14:4.

Reynolds HT, Vijayakumar V, Gluck‐Thaler E, Korotkin HB, Matheny PB, Slot JC. Horizontal gene cluster transfer increased hallucinogenic mushroom diversity. Evolution letters. 2018;2:2.

Ruibal C, Gueidan C, Selbmann L, Gorbushina AA, Crous PW, Groenewald JZ, ... Staley JT. Phylogeny of rock-inhabiting fungi related to Dothideomycetes. Studies in Mycology. 2009;64.

Schoch CL, Crous PW, Groenewald JZ, Boehm EWA, Burgess TI, De Gruyter J, ... Harada Y. A class-wide phylogenetic assessment of Dothideomycetes. Studies in mycology. 2009;64.

Schümann J, Hertweck C. Molecular basis of cytochalasan biosynthesis in fungi: gene cluster analysis and evidence for the involvement of a PKS-NRPS hybrid synthase by RNA silencing. Journal of the American Chemical Society. 2007;129:31.

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, ... Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome research. 2003;13:11.

Shi‐Kunne X, Faino L, van den Berg GC, Thomma BP, Seidl MF. Evolution within the fungal genus Verticillium is characterized by chromosomal rearrangement and gene loss. Environmental microbiology. 2018;20:4.



Slot JC. Fungal gene cluster diversity and evolution. In Advances in genetics (Vol. 100, pp. 141-178). Academic Press. 2017.

Slot JC, Gluck-Thaler E. Metabolic gene clusters, fungal diversity, and the generation of accessory functions. Current opinion in genetics & development. 2019;58.

Song W, Qiao X, Chen K, Wang Y, Ji S, Feng J, ... Ye M. Biosynthesis-based quantitative analysis of 151 secondary metabolites of licorice to differentiate medicinal Glycyrrhiza species and their hybrids. Analytical chemistry. 2017;89:5.

Spatafora JW, Bushley KE. Phylogenomics and evolution of secondary metabolism in plant-associated fungi. Current opinion in plant biology. 2015;26.

Spatafora JW, Owensby CA, Douhan GW, Boehm EW, Schoch CL. Phylogenetic placement of the ectomycorrhizal genus Cenococcum in Gloniaceae (Dothideomycetes). Mycologia. 2012;104:3.

Suetrong S, Boonyuen N, Pang KL, Ueapattanakit J, Klaysuban A, Sri-indrasutdhi V, ... Jones EG. A taxonomic revision and phylogenetic reconstruction of the Jahnulales (Dothideomycetes), and the new family Manglicolaceae. Fungal Diversity. 2011;51:1.

Theobald S, Vesth TC, Rendsvig JK, Nielsen KF, Riley R, de Abreu LM, ... Hoof JB. Uncovering secondary metabolite evolution and biosynthesis using gene cluster networks and genetic dereplication. Scientific reports. 2018;8:1.

Turgeon BG, Baker SE. Genetic and genomic dissection of the Cochliobolus heterostrophus Tox1 locus controlling biosynthesis of the polyketide virulence factor T‐toxin. Advances in genetics. 2007;57.

Vesth TC, Nybo JL, Theobald S, Frisvad JC, Larsen TO, Nielsen KF, ... Gladden JM. Investigation of inter- and intraspecies variation through genome sequencing of Aspergillus section Nigri. Nature Genetics. 2018;50:12.

Villani A, Proctor RH, Kim HS, Brown DW, Logrieco AF, Amatulli MT, ... Susca A. Variation in secondary metabolite production potential in the Fusarium incarnatum-equiseti species complex revealed by comparative analysis of 13 genomes. BMC genomics. 2019;20:1.

Walton JD. Host-selective toxins: Agents of compatibility. Plant Cell. 1996;8:10.

Walton JD, Panaccione DG. Host-selective toxins and disease specificity: perspectives and progress. Annual review of phytopathology. 1993;31:1.

Wang G, Liu Z, Lin R, Li E, Mao Z, Ling J, ... Xie B. Biosynthesis of antibiotic leucinostatins in bio-control fungus Purpureocillium lilacinum and their inhibition on Phytophthora revealed by genome mining. PLoS pathogens. 2016;12:7.

Wang JS, Tang L. Epidemiology of aflatoxin exposure and human liver cancer. Journal of Toxicology: Toxin Reviews. 2004;23:2-3.

Weinhold A, Ullah C, Dressel S, Schoettner M, Gase K, Gaquerel E, ... Baldwin IT. O-acyl sugars protect a wild tobacco from both native fungal pathogens and a specialist herbivore. Plant physiology. 2017;174:1.



Wight WD, Kim KH, Lawrence CB, Walton JD. Biosynthesis and role in virulence of the histone deacetylase inhibitor depudecin from Alternaria brassicicola. Molecular plant-microbe interactions. 2009;22:10.

Wijayawardene NN, Hyde KD, Lumbsch HT, Liu JK, Maharachchikumbura SS, Ekanayaka AH, ... Phookamsak R. Outline of ascomycota: 2017. Fungal Diversity. 2018;88:1.

Wisecaver JH, Slot JC, Rokas A. The evolution of fungal metabolic pathways. PLoS Genetics. 2014;10:12.

Wolpert TJ, Dunkle LD, Ciuffetti LM. Host-selective toxins and avirulence determinants: what's in a name? Annual review of phytopathology. 2002;40:1.

Xu YJ, Luo F, Li B, Shang Y, Wang C. Metabolic conservation and diversification of Metarhizium species correlate with fungal host-specificity. Frontiers in microbiology. 2016;7.

Ziemert N, Lechner A, Wietz M, Millán-Aguiñaga N, Chavarria KL, Jensen PR. Diversity and evolution of secondary metabolism in the marine actinomycete genus Salinispora. Proceedings of the National Academy of Sciences. 2014;111:12.

Züst T, Heichinger C, Grossniklaus U, Harrington R, Kliebenstein DJ, Turnbull LA. Natural enemies drive geographic variation in plant defenses. Science. 2012;338:6103.



Figure SA. Flowchart of the secondary metabolite (SM) clustering pipeline, the null model pipeline, and the CO-OCCUR pipeline.

Secondary metabolite (SM) clustering pipeline: Annotated genomes with genes clustered into homolog groups (HGs) undergo SM cluster prediction using SMURF, giving the initial SM clusters. SM clusters are then extended to contain neighboring genes (within a 6 gene distance) belonging to an HG found in another SM cluster, giving the extended SM clusters.

Null model pipeline: Randomly sample 2 genes occurring within 6 genes of each other. Retrieve the HG to which each gene belongs and the number of genes in each HG. Repeat 500,000 times (without replacement). Bin sampled HG pairs into different categories based on the total number of genes in each HG, giving random distributions of HG co-occurrences.

CO-OCCUR pipeline: Count the number of times members of each HG are found within 6 genes of each other across the complete dataset (101 Dothideomycete genomes). Extract HGs present in SMURF clusters that co-occur an unexpected number of times across all Dothideomycete genomes compared with the random distribution, giving the unexpected HG co-occurrences. Conduct a de novo search for all clusters consisting of genes belonging to HGs that are part of unexpected co-occurrences, giving the new SM clusters (3399 total). Group together new SM clusters that have ~90% of their genes in common (719 total). Similar clusters found? Yes: cluster groups (CGs) (422 total). No: orphan clusters (OCs) (297 total).

Downstream analyses: Group together genomes based on Raup-Crick dissimilarity in their CG repertoire (multi-cluster profiles). Count the frequency of all unexpected HG co-occurrences across all CGs (co-occurrence network of HGs). BLAST CGs and OCs against known cluster databases (MIBiG, local database) (MIBiG-annotated CGs and OCs). Count the number of CGs per genome (average 33.7) (CG distribution across all genomes).
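The counting step named in Figure SA (tallying how often members of two homolog groups lie within 6 genes of each other) and the comparison against a random distribution can be sketched in a few lines. This is only an illustrative sketch under stated assumptions, not the authors' CO-OCCUR code: the genome representation (a list of contigs, each an ordered list of HG labels), the function names, the toy data, and the use of a simple empirical p-value are all hypothetical choices made for the example.

```python
from collections import Counter

MAX_GENE_DISTANCE = 6  # window named in the figure: "within 6 genes of each other"


def count_hg_cooccurrences(contigs, max_distance=MAX_GENE_DISTANCE):
    """Count how often two different homolog groups (HGs) have member genes
    within `max_distance` positions of each other on the same contig.

    `contigs` is a hypothetical representation: a list of contigs, each an
    ordered list of HG labels (one label per gene, in genomic order).
    """
    counts = Counter()
    for genes in contigs:
        for i, hg_a in enumerate(genes):
            # Look only downstream so that each gene pair is visited once.
            for hg_b in genes[i + 1 : i + 1 + max_distance]:
                if hg_a != hg_b:
                    counts[tuple(sorted((hg_a, hg_b)))] += 1
    return counts


def empirical_p_value(observed, null_counts):
    """Fraction of null-model counts at least as large as `observed`.

    Assumption: an empirical tail probability stands in for whatever
    criterion defines an "unexpected" co-occurrence in the pipeline.
    """
    if not null_counts:
        raise ValueError("null_counts must not be empty")
    return sum(1 for c in null_counts if c >= observed) / len(null_counts)


# Toy data (invented for illustration only).
toy_contigs = [
    ["HG1", "HG2", "HG3"],
    ["HG2", "HG1", "HG9", "HG9"],
]
toy_counts = count_hg_cooccurrences(toy_contigs)
print(toy_counts.most_common())
```

In a fuller version the null counts for a given HG pair would presumably be drawn from the size-matched bin described in the null model pipeline, so that large and small HGs are compared with appropriately sized random pairs.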


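Figure SB. Diagram comparing the proteins assigned to clusters by CO-OCCUR, SMURF, and antiSMASH. Each region gives the number of proteins, followed in parentheses by the number of proteins participating in SM biosynthesis, transport and catabolism; * = p(enriched) < 0.01, Bonferroni corrected. Region values: 6051 (796), 1553 (290), 5416 (887), 1254 (200), 1408 (200*), 3326 (1301*), 3873 (1468*).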



Figure SC. Columns are labeled with cluster group (CG) identifiers that combine the signature biosynthetic gene class, group number, and size (for example DMAT+NRPS_group30_s9, PKS-Like_group37_s8, and NRPS_group69_s7; n.d. where no class is given). Rows are labeled with genome names and their orders, arranged on a phylogeny (scale bar 0.10): Myriangiales, Dothideales, Capnodiales, Venturiales, Botryosphaeriales, Mytilinidiales, Hysteriales, Pleosporales, and taxa of unknown order, together with the Leotiomycetes Botrytis cinerea and Sclerotinia sclerotiorum.



[Figure column labels: cluster group (CG) identifiers, for example NRPS.Like.PKS_group60_s7, PKS_group238_s4, and DMAT.NRPS_group12_s13.]

_s8

NR

PS_g

roup

69_s

7N

RPS

_gro

up99

_s6

NR

PS_g

roup

133_

s5N

RPS

_gro

up15

6_s5

NR

PS.L

ike.

PKS.

TC_g

roup

101_

s6N

RPS

.NR

PS.L

ike_

grou

p216

_s4

NR

PS.P

KS_g

roup

224_

s4N

RPS

.PKS

.Lik

e_gr

oup8

7_s6

PKS_

grou

p10_

s13

PKS_

grou

p11_

s13

PKS_

grou

p16_

s12

PKS_

grou

p19_

s11

PKS_

grou

p20_

s11

PKS_

grou

p70_

s7PK

S_gr

oup7

1_s7

PKS_

grou

p83_

s6PK

S_gr

oup9

6_s6

PKS_

grou

p104

_s6

PKS_

grou

p112

_s6

PKS_

grou

p118

_s5

PKS_

grou

p135

_s5

PKS_

grou

p138

_s5

PKS_

grou

p140

_s5

PKS_

grou

p151

_s5

PKS_

grou

p157

_s5

PKS_

grou

p165

_s5

PKS_

grou

p170

_s4

PKS_

grou

p215

_s4

PKS.

Like

_gro

up8_

s13

PKS.

Like

_gro

up9_

s13

PKS.

Like

_gro

up18

_s11

PKS.

Like

_gro

up29

_s10

PKS.

Like

_gro

up38

_s8

PKS.

Like

_gro

up62

_s7

PKS.

Like

_gro

up65

_s7

PKS.

Like

_gro

up86

_s6

PKS.

Like

_gro

up93

_s6

PKS.

Like

_gro

up13

0_s5

PKS.

Like

.TC

_gro

up12

6_s5

PKS.

TC_g

roup

45_s

8PK

S.TC

_gro

up14

8_s5


[Figure SD, labeled tree tips (species, with order or outgroup class in parentheses):]
Myriangium duriaei CBS 260.36 (Myriangiales)
Elsinoe ampelina (Myriangiales)
Aureobasidium namibiae CBS 147.97 (Dothideales)
Aureobasidium melanogenum CBS 110374 (Dothideales)
Zymoseptoria tritici IPO323 (Capnodiales)
Zymoseptoria pseudotritici STIR04_2.2.1 (Capnodiales)
Zymoseptoria ardabiliae STIR04_1.1.1 (Capnodiales)
Dothistroma septosporum NZE10 (Capnodiales)
Passalora fulva (Capnodiales)
Sphaerulina musiva SO2202 (Capnodiales)
Sphaerulina populicola (Capnodiales)
Piedraia hortae CBS 480.64 (Capnodiales)
Venturia inaequalis (Venturiales)
Venturia pyrina (Venturiales)
Macrophomina phaseolina MS6 (Botryosphaeriales)
Botryosphaeria dothidea (Botryosphaeriales)
Glonium stellatum (Unknown)
Cenococcum geophilum 1.58 (Unknown)
Lophium mytilinum (Mytilinidiales)
Mytilinidion resinicola (Mytilinidiales)
Rhytidhysteron rufulum (Hysteriales)
Amniculicola lignicola CBS 123094 (Pleosporales)
Lophiotrema nucula (Pleosporales)
Aaosphaeria arxii (Pleosporales)
Lophiostoma macrostomum CBS 122681 (Pleosporales)
Massariosphaeria phaeospora (Pleosporales)
Trematosphaeria pertusa (Pleosporales)
Bimuria novae-zelandiae (Pleosporales)
Paraphaeosphaeria sporulosa (Pleosporales)
Karstenula rhodostoma CBS 690.94 (Pleosporales)
Massarina eburnea CBS 473.64 (Pleosporales)
Byssothecium circinans (Pleosporales)
Periconia macrospinosa (Pleosporales)
Stagonospora sp. SRC1lsM3a (Pleosporales)
Ampelomyces quisqualis (Pleosporales)
Plenodomus tracheiphilus IPT5 (Pleosporales)
Clathrospora elynae (Pleosporales)
Alternaria brassicicola (Pleosporales)
Alternaria alternata (Pleosporales)
Pyrenophora tritici-repentis (Pleosporales)
Pyrenophora teres f. teres 0-1 (Pleosporales)
Setosphaeria turcica Et28A (Pleosporales)
Curvularia lunata m118 (Pleosporales)
Bipolaris maydis C5 (Pleosporales)
Bipolaris sorokiniana ND90Pr (Pleosporales)
Bipolaris oryzae ATCC 44560 (Pleosporales)
Bipolaris zeicola 26-R-13 (Pleosporales)
Bipolaris victoriae FI3 (Pleosporales)
Pyrenochaeta sp. DS3sAY3a (Pleosporales)
Lizonia empirigonia (Unknown)
Botrytis cinerea (Leotiomycetes)
Sclerotinia sclerotiorum (Leotiomycetes)

[Tree scale bar: 0.10. Color scale: Fritz and Purvis' D, tick labels 0, 1, 2.5.]

Figure SD

[Plot: Recovery of 87 melanin cluster loci. x-axis: Cluster detection algorithm (co-occur, antiSMASH, SMURF); y-axis: % total proteins recovered at locus (0.00 to 1.00).]

Figure SE

[Plot: Phylogenetic distance vs. dissimilarity in cluster repertoire in the Pleosporales. x-axis: Phylogenetic distance (0.0 to 0.4); y-axis: Sørensen dissimilarity (0.25 to 1.00).]

Figure SF




[Plot: species phylogeny with the same labeled tips as in Figure SD (tree scale bar: 0.10), grouped by order (Myriangiales, Dothideales, Capnodiales, Venturiales, Botryosphaeriales, Mytilinidiales, Hysteriales, Pleosporales), shown next to a bar chart. x-axis: Number of refined clusters with a signature gene (0 to 60).]

Figure SG



[Plot: Partitioning dissimilarity between cluster repertoires (Pleosporales). x-axis: Dissimilarity estimates across all cluster repertoires (0.0 to 0.9); y-axis: Density (0 to 60). Legend: Sørensen dissimilarity (total dissimilarity across all repertoires); Turnover component of Sørensen dissimilarity; Nestedness component of Sørensen dissimilarity.]

Figure SH
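
The figure names three quantities (total Sørensen dissimilarity, its turnover component, and its nestedness component) without showing how they relate. The sketch below illustrates one common way to compute such a partition for a single pair of repertoires, assuming the widely used pairwise decomposition in which turnover is the Simpson dissimilarity and nestedness is the remainder. This is an assumption: the figure text here does not state which implementation the authors used. The function name and the example repertoires are hypothetical.

```python
def sorensen_partition(rep_a, rep_b):
    """Partition pairwise Sorensen dissimilarity into turnover and nestedness.

    rep_a, rep_b: sets of cluster-group identifiers present in two genomes.
    Returns (total, turnover, nestedness) with total == turnover + nestedness.
    Assumes the standard decomposition: turnover = Simpson dissimilarity.
    """
    rep_a, rep_b = set(rep_a), set(rep_b)
    shared = len(rep_a & rep_b)   # a: groups present in both repertoires
    only_a = len(rep_a - rep_b)   # b: groups unique to the first repertoire
    only_b = len(rep_b - rep_a)   # c: groups unique to the second repertoire

    if shared + only_a + only_b == 0:
        raise ValueError("both repertoires are empty")

    total = (only_a + only_b) / (2 * shared + only_a + only_b)

    smaller_unique = min(only_a, only_b)
    denom = shared + smaller_unique
    # Convention chosen for this sketch: if one repertoire is empty,
    # turnover is undefined (0/0) and is reported as 0.0.
    turnover = smaller_unique / denom if denom else 0.0

    nestedness = total - turnover
    return total, turnover, nestedness


if __name__ == "__main__":
    # Made-up repertoires for illustration only; labels follow the naming
    # pattern seen in the Figure SD column labels.
    genome_1 = {"PKS_group197_s4", "PKS_group119_s5", "NRPS_group152_s5"}
    genome_2 = {"PKS_group197_s4", "PKS_group119_s5", "PKS_group149_s5"}
    print(sorensen_partition(genome_1, genome_2))
```

Under this decomposition, a pair in which one repertoire is a strict subset of the other has zero turnover, so all of its dissimilarity is nestedness. A pair with equal numbers of unique groups on each side has zero nestedness.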



[Plot: Relationship between gene homolog group size and cluster composition diversity (Raup-Crick). x-axis: gene homolog group size (0 to 900); y-axis: Raup-Crick dissimilarity (0 to 30). Labeled points: MCL000001, MCL000002, MCL000003, MCL000005, MCL000006, MCL000016, MCL000017, MCL000033, MCL000109, MCL000176, MCL000193, MCL000253, MCL000357, MCL006196, MCL007003, MCL007271, MCL007288.]

Figure SI


