
Supplementary Information

Polyketide and Nonribosomal Peptide Retrobiosynthesis and Comparison to Gene Clusters

Chris A. Dejong¹,², Gregory M. Chen¹,², Haoxin Li¹,², Chad W. Johnston¹, Mclean R. Edwards¹, Philip N. Rees¹, Michael A. Skinnider¹, Andrew L. H. Webster¹ & Nathan A. Magarvey¹*

¹ Department of Biochemistry & Biomedical Sciences; Department of Chemistry & Chemical Biology, M. G. DeGroote Institute for Infectious Disease Research; McMaster University, Hamilton, Canada L8S 4K1

² These authors contributed equally to this work

* Corresponding author, email address: [email protected]

Nature Chemical Biology: doi:10.1038/nchembio.2188


Supplementary Results

Supplementary Figure 1. Visualization of GRAPE process pipeline. GRAPE takes in small

molecule structures in the form of SMILES and breaks them down in the above order, while

capturing details of the chemistries during retro-synthesis. The final output is monomer

information for both amino acids (AA) and polyketides (PK), as well as additional tailoring and

scaffold information. For full names of the abbreviations, see Supplementary Data Set.


Supplementary Figure 2. Visualization of polyketide carbon walking. GRAPE finds the biosynthetic end carbon (the carboxylic acid carbon) and then locates the biosynthetic start carbon: the furthest carbon that lies in a carbon-only chain (no other intermediate atoms) and is not terminal. If the furthest carbon cannot be a β carbon (incorrect number of carbons in the chain), the second-furthest carbon is used instead. Each α carbon is analyzed for the substrate and each β carbon for the oxidation state. For full names of the abbreviations, see Supplementary Data Set.
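
The carbon-walking rule in Supplementary Figure 2 can be sketched as a breadth-first search restricted to carbon atoms. The Python snippet below is a minimal illustration and not GRAPE's code: the function name `find_start_carbon`, the dictionary-plus-bond-list molecule representation, and the reading of "can be a β carbon" as "an even number of bonds away from the end carbon" are all assumptions made for this example. Rings and branched chains are not handled.

```python
from collections import deque


def find_start_carbon(elements, bonds, end_carbon):
    """Return the index of the putative biosynthetic start carbon, or None.

    elements:   dict mapping atom index -> element symbol (hypothetical format)
    bonds:      iterable of (i, j) atom index pairs
    end_carbon: index of the carboxylic acid carbon
    """
    neighbours = {i: set() for i in elements}
    for i, j in bonds:
        neighbours[i].add(j)
        neighbours[j].add(i)

    # Breadth-first walk that only steps from carbon to carbon.
    distance = {end_carbon: 0}
    queue = deque([end_carbon])
    while queue:
        atom = queue.popleft()
        for nxt in neighbours[atom]:
            if elements[nxt] == "C" and nxt not in distance:
                distance[nxt] = distance[atom] + 1
                queue.append(nxt)

    def is_terminal(atom):
        # A terminal carbon has at most one carbon neighbour (e.g. a methyl).
        return sum(elements[n] == "C" for n in neighbours[atom]) <= 1

    candidates = sorted(
        (a for a in distance if a != end_carbon and not is_terminal(a)),
        key=lambda a: distance[a],
        reverse=True,
    )
    # Try the furthest carbon first, then the second furthest.
    for atom in candidates[:2]:
        if distance[atom] % 2 == 0:  # assumed test for a beta position
            return atom
    return None


# Hexanoic acid skeleton: atom 0 is the carboxylic acid carbon, atom 5 the methyl.
hexanoic = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "O", 7: "O"}
hexanoic_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (0, 6), (0, 7)]
start = find_start_carbon(hexanoic, hexanoic_bonds, end_carbon=0)
```

In this toy chain the furthest non-terminal carbon (atom 4) already satisfies the assumed β-position test, so the fallback to the second-furthest carbon is not needed.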


Supplementary Figure 3. Visualization of amino acids as acyl keto-extension units in hybrid polyketides and non-ribosomal peptides. GRAPE identifies the longest carbon-only backbone from the α-carbon of the amine to the carbonyl carbon of the furthest carboxylic acid. If the carbon chain has an odd number of carbons, the keto-extended amino acid is identified as a β-amino acid: the bond between the β-carbon and γ-carbon is broken, and a carboxylic acid is added to the β-carbon to create the β-amino acid. If the carbon chain has an even number of carbons, the keto-extended amino acid is identified as an α-amino acid: the bond between the α-carbon and β-carbon is broken, and a carboxylic acid is added to the α-carbon to create the α-amino acid.
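
The parity rule in Supplementary Figure 3 can be written as a small decision function. This is a sketch under stated assumptions rather than GRAPE's implementation: the function name `classify_keto_extended`, the returned dictionary keys, and the convention that the chain length counts every backbone carbon inclusively (from the amine-bearing carbon through the carboxylic acid carbonyl carbon) are illustrative choices.

```python
def classify_keto_extended(chain_carbons):
    """Classify a keto-extended amino acid from its carbon-only backbone length.

    chain_carbons is assumed to count every backbone carbon, from the
    amine-bearing carbon through the carbonyl carbon of the furthest
    carboxylic acid.
    """
    if chain_carbons < 2:
        raise ValueError("backbone needs at least two carbons")
    if chain_carbons % 2 == 1:
        # Odd chain: beta-amino acid, cleave the beta-gamma bond.
        return {
            "type": "beta-amino acid",
            "cleave": ("beta", "gamma"),
            "add_cooh_to": "beta",
        }
    # Even chain: alpha-amino acid, cleave the alpha-beta bond.
    return {
        "type": "alpha-amino acid",
        "cleave": ("alpha", "beta"),
        "add_cooh_to": "alpha",
    }


result = classify_keto_extended(5)
```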


Supplementary Table 1. a) Accuracy of assembly line predictive units. b) Assembly line

biosynthetic tailoring features detected by PRISM and GRAPE. c) Non-assembly line

biosynthetic features detected by PRISM and GRAPE. For each assembly line or non-assembly

line biosynthetic feature, PRISM detects the corresponding genes in a gene cluster, while GRAPE

detects the corresponding structural features in small molecules.

a
Assembly line predictive types                      PRISM prediction accuracy
Proteinogenic amino acids                           94%
Nonproteinogenic amino acids                        93%
Substrates of AT domains in polyketide clusters     74%
Oxidation states of AT domains                      75%
Deoxy sugars                                        64%

b
Assembly line biosynthetic tailoring features     Detection by PRISM                Detection by GRAPE
Fatty acyl addition                               Fatty acyl adenylating enzyme     Presence of a fatty chain
O-methyltransferase                               O-methyltransferase               See Supplementary Table 4
N-methyltransferase                               N-methyltransferase               See Supplementary Table 4
C-methyltransferase                               C-methyltransferase               See Supplementary Table 4
Thiazole                                          Cyclase                           See Supplementary Table 4
Oxazole                                           Cyclase                           See Supplementary Table 4
Tryptophan dioxygenase                            Tryptophan dioxygenase            Kynurenine substructure

c
Non-assembly line biosynthetic features     Detection by PRISM                                Detection by GRAPE
Chlorination                                Chlorination enzyme                               Presence of chlorine atoms
Presence of a sugar                         Sugar synthesis enzymes                           Identification of a specific sugar
Sulphate group                              Presence of sulfotransferase enzyme               Presence of sulphate group
Chemical scaffolds¹                         Enzymes responsible for scaffold biosynthesis     Identification of a specific scaffold
Acyl adenylating substrates                 Acyl adenylating enzyme                           Identification of a specific acyl adenylating substrate

¹ Chemical scaffolds are only applied to type 2 polyketides and enediyne scaffolds.


Supplementary Table 2. Glycosylated tailoring detected by PRISM and GRAPE. Each sugar

molecule detected by GRAPE is linked to one or multiple biosynthetic enzymes for the

identification.

Sugar: Sugar enzymes
L-aculose: 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, oxidoreductase, deoxygenase glycosyltransferase
L-cinerulose A: 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
L-rhodinose: 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-
Rednose: 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferasegenase glycosyltransferase
L-cinerulose B: 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-
O-methyl-L-amicetose: 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
4-O-methyl-L-rhodinose: 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
L-daunosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3-aminotransferase, deoxygenase glycosyltransferase
L-ristosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 4-aminotransferase, deoxygenase glycosyltransferase
D-digitoxose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, deoxygenase glycosyltransferase
L-digitoxose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
2-deoxy-L-fucose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
D-olivose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
D-oliose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase,
4-oxo-L-vancosamine: 4,6-dehydratase, 2,3-dehydratase, epimerase, 4-aminotransferase, C-methyltransferase, deoxygenase glycosyltransferase
D-forosamine: 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
L-actinosamine: 4,6-dehydratase, 2,3-dehydratase, deoxygenase glycosyltransferase
L-vancosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3-aminotransferase, C-methyltransferase, deoxygenase glycosyltransferase
L-vicenisamine: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-aminotransferase, N-methyltransferase, deoxygenase glycosyltransferase
D-chalcose: 4,6-dehydratase, 3-ketoreductase, 4-aminotransferase, O-methyltransferase, oxidative deaminase, deoxygenase glycosyltransferase
D-mycarose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, C-methyltransferase, deoxygenase glycosyltransferase


Supplementary Table 2 (continued)

Sugar: Sugar enzymes
D-mycosamine: 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, O-methyltransferase,
4-deoxy-4-thio-D-digitoxose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, epimerase, thiosugar synthase, deoxygenase glycosyltransferase
D-fucofuranose: 4,6-dehydratase, 4-ketoreductase, deoxygenase glycosyltransferase
D-fucose: 4,6-dehydratase, 4-ketoreductase, deoxygenase glycosyltransferase
L-rhamnose: 4,6-dehydratase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
4-N-ethyl-4-amino-3-O-methoxy-2,4,5-trideoxypentose: UDP-sugar decarboxylase, UDP-sugar dehydrogenase, 2,3-dehydratase, 3-ketoreductase, 4-aminotransferase, N-ethyltransferase, deoxygenase glycosyltransferase
D-3-N-methyl-4-O-methyl-L-ristosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 4-aminotransferase, N-methyltransferase, O-methyltransferase,
N,N-dimethyl-L-pyrrolosamine: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, epimerase, 4- [...] glycosyltransferase
D-desosamine: 4,6-dehydratase, 3,4-dehydratase, 3-aminotransferase, N,N-dimethyltransferase, oxidative deaminase, deoxygenase glycosyltransferase
L-megosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3- [...] glycosyltransferase
Nogalamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3- [...] glycosyltransferase
L-rhodosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3- [...] glycosyltransferase
D-angolosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
Kedarosamine: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
L-noviose: 4,6-dehydratase, 4-ketoreductase, epimerase, C-methyltransferase,
L-cladinose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, C-methyltransferase, O-methyltransferase, deoxygenase glycosyltransferase
2'-N-methyl-D-fucosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, N-methyltransferase,
D-digitalose: 4,6-dehydratase, 4-ketoreductase, O-methyltransferase, deoxygenase glycosyltransferase
3-O-methyl-rhamnose: 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase,
2-O-methyl-rhamnose: 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase,



Sugar: Sugar enzymes
4-O-carbamoyl-D-olivose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, carbamoyltransferase, deoxygenase glycosyltransferase
D-ravidosamine: 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
3-N,N-dimethyl-D-mycosamine: 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
2,3-O-dimethyl-L-rhamnose: 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase,
2,4-O-dimethyl-L-rhamnose:
3,4-O-dimethyl-L-rhamnose:
2-thioglucose: thiosugar synthase, deoxygenase glycosyltransferase
Olivomycose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, C-methyltransferase, acetyltransferase, deoxygenase glycosyltransferase
4-N,N-dimethylamino-4-deoxy-5-C-methyl-l-rhamnose: 4,6-dehydratase, epimerase, 4-aminotransferase, C-methyltransferase, N,N-dimethyltransferase, acetyltransferase, deoxygenase glycosyltransferase
2,3,4-tri-O-methylrhamnose:
4-O-acetyl-L-arcanose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, C-methyltransferase, O-methyltransferase, acetyltransferase,
3-N-acetyl-D-ravidosamine: [...] dimethyltransferase, acetyltransferase, deoxygenase glycosyltransferase
3-O-carbamoyl-L-noviose: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, C-methyltransferase, carbamoyltransferase, deoxygenase glycosyltransferase
L-nogalose: 4,6-dehydratase, 4-ketoreductase, epimerase, C-methyltransferase, O-
4-O-acetyl-D-ravidosamine:
3-O-carbamoyl-4-O-methyl-L-noviose: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, C-methyltransferase, O-methyltransferase, carbamoyltransferase, deoxygenase glycosyltransferase
3-N-acetyl-4-O-acetyl-D-ravidosamine:
3-(5'-methyl-2'-pyrrolylcarbonyl-)4-O-methyl-L-noviose: 4,6-dehydratase, 4-ketoreductase, epimerase, C-methyltransferase, O-methyltransferase, pyrrolyltransferase, deoxygenase glycosyltransferase
Madurose: UDP-sugar decarboxylase, UDP-sugar dehydrogenase, 4-aminotransferase, C-methyltransferase, deoxygenase glycosyltransferase
4-N-methyl-4-amino-3-O-methoxy-2,4,5-trideoxypentose: UDP-sugar decarboxylase, UDP-sugar dehydrogenase, 2,3-dehydratase, 3-ketoreductase, 4-aminotransferase, N-methyltransferase, deoxygenase glycosyltransferase
Glucose: hexose
N-acetylglucosamine: hexose



Sugar: Sugar enzymes
Mannose: hexose
Gulose: hexose
L-oleandrose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
Olivomose: 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, O-
4,6-dideoxy-4-hydroxylamino-D-glucose: 4,6-dehydratase, 4-aminotransferase, deoxygenase glycosyltransferase
3-N,N-dimethyl-L-eremosamine: 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, C-methyltransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
Chromose (4-O-acetyl-β-D-oliose): 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, acetyltransferase, deoxygenase glycosyltransferase


Supplementary Table 3. Macrocycle & Heterocycle Cleavage Reactions. Each type of bond breakage is shown with the associated retro-synthesis reaction and an example compound that contains that type of bond. The R groups in the reactions are used to simplify structures and are independent of each other. The GRAPE breakdowns of all examples are listed at the end of the table. For full names of the abbreviations, see Supplementary Data Set.

Breakage type                                Bond breakage reactions     Examples¹
Macrolide                                    [reaction scheme]           Erythromycin
Thioesters                                   [reaction scheme]           Thiocoraline
Macrolactam                                  [reaction scheme]           ML-449
β-lactam-like structures                     [reaction schemes]          Salinosporamide; Cephalosporin; penicillin; Nocardicin
Oxazoles                                     [reaction scheme]           Chivosazole A
Thiazoles                                    [reaction scheme]           Curacin
Multi-thiazoles                              [reaction scheme]           Bleomycin
Kendomycin substructure                      [reaction scheme]           Kendomycin
Avermectin terminal substructure             [reaction scheme]           Avermectin
Avermectin starter substructure              [reaction scheme]           Avermectin
Piericidin type substructure                 [reaction scheme]           Piericidin
Anthramycin type substructure                [reaction scheme]           Anthramycin
Epoxyketone type substructure                [reaction scheme]           Eponemycin
Cyclic ether process 1 in PK recognition     [reaction scheme]           Monensin
Cyclic ether process 2 in PK recognition     [reaction scheme]           Monensin
PK epoxide restoration                       [reaction scheme]           Mupirocin

¹ GRAPE breakdowns of compounds are shown below.


Supplementary Table 4. Cleaving Chemical Bonds between monomers and tailorings. Each type of bond breakage is shown with the associated retro-synthesis reaction and an example compound that contains that type of bond. The R groups in the reactions are used to simplify structures and are independent of each other. The GRAPE breakdowns of all examples are listed at the end of the table. For full names of the abbreviations, see Supplementary Data Set.

Breakage type                               Bond breakage reactions     Examples¹
Imide bonds                                 [reaction scheme]           Arthrofactin
Amide bonds                                 [reaction scheme]           Arthrofactin
Ureido bonds                                [reaction scheme]           Mycoplanecin D
Disulfide bridges                           [reaction scheme]           SW163C
Ether bridged aromatics                     [reaction scheme]           Vancomycin
Bi-aryl C-C linkages between aromatics      [reaction scheme]           Vancomycin
Sulphate groups                             [reaction scheme]           A-47934
Glycans                                     [reaction scheme]           Apoptolidin
N-methylated amino acids/sugars             [reaction scheme]           SW163C
O-methylated amino acids/sugars             [reaction scheme]           Apoptolidin
C-methylated amino acids                    [reaction scheme]           Yersiniabactin
Halogens                                    [reaction scheme]           Vancomycin
Finding epoxide rings                       [reaction scheme]           Eponemycin

¹ GRAPE breakdowns of compounds are shown below.


Supplementary Table 5. GARLIC algorithm configurations. For each scoring scenario, two scoring metrics, basic scoring and refined scoring, are listed. For details of the non-assembly line biosynthetic features match, sugar gene match and scaffold bonus, see Supplementary Tables 4, 2 and 6, respectively.

Scoring property                                                  Basic scoring   Empirical scoring   Empirical optimized scoring
Gaps between PRISM predicted ORFs                                 -1              -2                  -2.31
Gaps within PRISM predicted ORFs                                  -1              -5                  -5.37
Gaps caused by repeated GRAPE monomers¹                           -1              -2.5                -2.5
Gaps between GRAPE monomer blocks²                                -1              -2                  -2.31
Gaps within GRAPE monomer blocks³                                 -1              -5                  -5.37
Fatty acid or polyketide gap penalty                              -0.001          -0.001              -0.64
Proteinogenic amino acid match⁴                                   1               5                   5.31
Amino acid partial match⁵                                         -1              1                   1.18
Amino acid substitution penalty                                   -1              -2                  -2.25
Non-proteinogenic amino acid bonus⁶                               0               3                   3.43
Aromatic amino acid bonus                                         0               1                   1.17
β-lactam match                                                    0.25            1.5                 0
Amino acid - polyketide substitution⁷                             0               -10                 -10
Polyketide substrate match⁸                                       0.5             1                   1.27
Polyketide complete match bonus                                   0               1                   0.99
Polyketide rare substrate match bonus⁹                            0               3                   2.01
Polyketide oxidation state match                                  0.5             3                   3.25
Polyketide maximum score when multiple oxidation states possible  No limit        5                   5
Polyketide substrate substitution penalty                         -0.5            -1                  -1.93
Polyketide oxidation substitution penalty                         -0.5            -1                  -1
Site specific tailoring match                                     0.25            2                   5.44
Sugar gene match¹⁰,¹¹                                             0.25            0.05                0.05
Nonspecific tailoring match¹⁰                                     0.25            0.5                 7.95
Acyl adenylating match¹⁰,¹²                                       0.25            1                   11.17
Scaffold bonus¹⁰,¹³                                               0.25            2                   0
Chemical type match¹⁰,¹³                                          0.25            2                   0

¹ When GRAPE has repeated monomer units but PRISM has only one of the same monomer unit, a penalty of “-1” or “-2.5” is given in basic or refined scoring, respectively.

² When GRAPE outputs have fewer monomer units than PRISM outputs, gaps are created in the alignment. If the gaps are between PRISM predicted ORFs, a penalty of “-1” or “-2” is given in basic or refined scoring, respectively.

³ When GRAPE outputs have fewer monomer units than PRISM outputs, gaps are created in the alignment. If the gaps are within PRISM predicted ORFs, a penalty of “-1” or “-5” is given in basic or refined scoring, respectively.

⁴ A list of proteinogenic amino acids: alanine, arginine, asparagine, aspartic acid, cysteine, glutamic acid, glutamine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline, serine, threonine, tyrosine, tryptophan and valine. A list of identical substrates for scoring (all substrates in the same brackets are considered a full match for scoring purposes): (valine - 3 hydroxy anthranilic acid); (dehydro aminobutyric acid - threonine); (hydroxy ornithine - ornithine).

⁵ A list of similar substrates for scoring (all substrates in the same brackets are considered a partial match for scoring purposes): (valine - isoleucine - alanine - leucine - β-hydroxyvaline - β-lysine - hydroxyleucine); (alanine - β-alanine); (glutamic acid - glutamine - piperazic acid); (phenylalanine - tryptophan - tyrosine - hydroxyl-3-methylpentanoic acid - quinoxaline carboxylic acid - β hydroxy phenylalanine - hydroxy tyrosine - β-methyl phenylalanine); (asparagine - aspartic acid - hydroxyasparagine - β-methyl aspartic acid - hydroxyaspartic acid); (methyl proline - proline).

⁶ A list of non-proteinogenic amino acids: amino epoxi oxodecanoic acid, β alanine, β-methylphenylalanine, β-phenylalanine, β hydroxyphenylalanine, hydroxyleucine, hydroxyphenylglycine, dihydroxyphenylglycine, hydroxyasparagine, hydroxyaspartic acid, β-methyl aspartic acid, capreomycidine, citrulline, norvaline, isovaline, β-hydroxylvaline, hydroxyornithine, hydroxyacetylornithine, ornithine, butenylmethyl threonine, methylproline, hydroxytyrosine, β-lysine, adipic acid, kynurenine, aminobutyric acid, dehydro aminobutyric acid, aminoisobutyric, coronamic, diaminopropionate, enduracididine, diaminobutyric acid, pipecolic acid, methylglutamate, epoxy oxodecanoic acid, hydroxyl quinaldic acid, piperazic acid, quinoxaline carboxylic acid, 3-hydroxy anthranilic acid.

⁷ When an amino acid is aligned with a polyketide monomer, a penalty of “-10” is given in refined scoring.

⁸ A list of polyketide substrates for scoring: malonate, methyl malonate, methoxy malonate, benzoate, ethyl malonate, isobutyrate, methyl butyrate.

⁹ Rare polyketide substrates are all substrates except malonate and methyl malonate.

¹⁰ Scoring properties added to the match score between the compounds after the alignment and not normalized based on the size of the alignment. These values therefore inherently carry more weight than the scores that are normalized, which is why they are often lower than 1 and why the ‘Basic’ scoring scheme defaults them to 0.25.

¹¹ For details of sugars and sugar genes, see Supplementary Table 2.

¹² A list of identical acyl adenylating substrates for scoring (all substrates in the same brackets are considered a partial match for scoring purposes): (pyruvate - lactate - valeric acid - 3 hydroxyl pantanolic acid); (salicylic acid - 3-formamido-5-hydroxy benzoic acid - benzoic acid); (alpha-ketoisocaproate - hydroxyl-3-methyl pentanoic acid).

¹³ Scaffold and chemical type matches are only applied to type 2 polyketides and enediynes. For details of scaffolds and chemical types of type 2 polyketides and enediynes, see Supplementary Table 6.
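
To make the amino acid rows of Supplementary Table 5 concrete, the following Python sketch looks up a score for one aligned pair of amino acid monomers. It is an illustration only and not GARLIC's implementation: the names `AMINO_ACID_SCORES`, `PARTIAL_MATCH_GROUPS` and `score_amino_acid_pair` are hypothetical, only three rows of the table are encoded (match, partial match, substitution penalty), only a subset of the footnote 5 groups is included, and the bonuses, gap penalties and normalization are ignored.

```python
# Values copied from three amino acid rows of Supplementary Table 5.
AMINO_ACID_SCORES = {
    "basic": {"match": 1.0, "partial": -1.0, "substitution": -1.0},
    "empirical": {"match": 5.0, "partial": 1.0, "substitution": -2.0},
    "optimized": {"match": 5.31, "partial": 1.18, "substitution": -2.25},
}

# A subset of the similar-substrate groups listed in footnote 5.
PARTIAL_MATCH_GROUPS = [
    {"valine", "isoleucine", "alanine", "leucine"},
    {"glutamic acid", "glutamine", "piperazic acid"},
    {"methyl proline", "proline"},
]


def score_amino_acid_pair(grape_monomer, prism_monomer, scheme="empirical"):
    """Score one aligned pair of amino acid monomers under a named scheme."""
    scores = AMINO_ACID_SCORES[scheme]
    if grape_monomer == prism_monomer:
        return scores["match"]
    for group in PARTIAL_MATCH_GROUPS:
        if grape_monomer in group and prism_monomer in group:
            return scores["partial"]
    return scores["substitution"]


exact = score_amino_acid_pair("valine", "valine", scheme="basic")
partial = score_amino_acid_pair("valine", "leucine", scheme="empirical")
mismatch = score_amino_acid_pair("valine", "proline", scheme="optimized")
```

Keeping the values in a per-scheme dictionary makes it easy to compare how the same alignment would score under the basic, empirical and empirical optimized columns.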


Supplementary Table 6. Scaffold types of type 2 polyketides and enediynes detected by PRISM and GRAPE. Compounds are matched to each scaffold type in GRAPE. a) PRISM identifies one or multiple cyclase enzymes in gene clusters and matches them to each scaffold for type 2 polyketides. b) The ketosynthase and phosphopantetheinyl transferase of enediyne PKSs are detected by PRISM and matched to the corresponding enediyne core structures.

Supplementary Table 6a

Scaffold types First cyclase Second cyclase Third cyclase Fourth

cyclase

Enterocin Favorskiiase Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 5a

AZ154 Type II polyketide

cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 6a

Type II

polyketide

cyclase, clade

3

Fredericamycin1 Type II polyketide

cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 6a

Type II

polyketide

cyclase, clade

3


cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 6a

Type II

polyketide

cyclase, clade

3


cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 6a

Type II

polyketide

cyclase, clade

3


cyclase, clade 7

Type II polyketide

cyclase, clade 2

Type II polyketide

cyclase, clade 6b

Aureolic_acid1 Type II polyketide

cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 5a

Lysolipin Type II polyketide

cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 5a

Rubromycin4 Type II polyketide

cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 5a


cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 5a

Tetracyclines Type II polyketide

cyclase, clade 10;

Type II polyketide

cyclase, clade 8a;

Type II polyketide

cyclase, clades 8/9

Type II polyketide

cyclase, clade 2

Type II polyketide

cyclase, clade 6b

subtype 1


cyclase, clade 7

Type II polyketide

cyclase, clade 1

Type II polyketide

cyclase, clade 5a


cyclase, clade 7

Type II polyketide

cyclase, clade 2

Type II polyketide

cyclase, clade 6a


Supplementary Table 6a (continued)

Scaffold types First cyclase Second cyclase Third

cyclase

Fourth

cyclase

Chartarin Type II polyketide cyclase,

clade 8a; Type II

polyketide cyclase, clades

8/9, Type II polyketide

cyclase, clade 10

Type II polyketide

cyclase, clade 4, Type II

polyketide cyclase, clade

5b

Urdamycin Type II polyketide cyclase,

clade 10; Type II


8a; Type II polyketide

cyclase, clades 8/9

Type II polyketide

cyclase, clade 1

Type II

polyketide

cyclase,

clade 5a

Pradimicin2 Type II polyketide cyclase,

clade 7

Type II polyketide

cyclase, clade 1

Type II

polyketide

cyclase,

clade 5a

Pradimicin1 Type II polyketide cyclase,

clade 7

Type II polyketide

cyclase, clade 1

Wailupemycin

Type II polyketide cyclase,

clade 3

Type II polyketide

cyclase, clade 4; Type II


5b

Angucyclines4 Type II polyketide cyclase,

clade 10; Type II



cyclase, clades 8/9

Type II polyketide



5b


clade 10; Type II



cyclase, clades 8/9

Type II polyketide



5b


clade 10; Type II



cyclase, clades 8/9

Type II polyketide



5b


clade 10; Type II



cyclase, clades 8/9

Type II polyketide

cyclase, clade 2

Type II

polyketide

cyclase,

clade 6b

subtype 2

Nogalonate Type II polyketide cyclase,

clade 10; Type II



cyclase, clades 8/9

Type II polyketide

cyclase, clade 3


Supplementary Table 6a (continued)

Scaffold types First cyclase Second cyclase Third

cyclase

Fourth

cyclase

Benzoisochromanequinone Type II polyketide

cyclase, clade 10; Type

II polyketide cyclase,

clade 8a; Type II

polyketide cyclase,

clades 8/9

Type II

polyketide

cyclase, clade 2

Type II

polyketide

cyclase,

clade 6b

UT_X26 Type II polyketide

cyclase, clade 7

Type II

polyketide

cyclase, clade 4,

Type II

polyketide

cyclase, clade 5b

AB649 Type II polyketide



clade 8a; Type II

polyketide cyclase,

clades 8/9

Juglomycin Type II polyketide



clade 8a; Type II

polyketide cyclase,

clades 8/9

Supplementary Table 6b

Scaffold types                             Enediyne ketosynthase                          Enediyne PPTase
Nine-membered enediyne core structure      Nine-membered enediyne type I ketosynthase     Nine-membered enediyne PPTase
Ten-membered enediyne core structure       Ten-membered enediyne type I ketosynthase      Ten-membered enediyne PPTase


Supplementary Figure 4. A comparison of different scoring matrices of GARLIC. The Smith-Waterman algorithm is implemented for local alignment, whereas the Needleman-Wunsch algorithm is used for global alignment. GF and LOO denote scoring schemes optimized from global refined scoring and from leave-one-out analysis, respectively. Basic and refined scoring metrics are in Supplementary Table 5.
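
Supplementary Figure 4 contrasts local (Smith-Waterman) and global (Needleman-Wunsch) alignment. The difference between the two recurrences can be sketched on short monomer lists as follows. This is a generic textbook dynamic program with a single linear gap penalty, not GARLIC's scoring: the function name `alignment_score` and the default match, mismatch and gap values are illustrative assumptions.

```python
def alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-1.0, local=False):
    """Best alignment score of two monomer sequences.

    local=False: Needleman-Wunsch (global), every monomer must be aligned.
    local=True:  Smith-Waterman (local), cells are floored at zero and the
                 best-scoring sub-alignment anywhere in the matrix is returned.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0.0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap

    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            value = max(
                H[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,       # gap in b
                H[i][j - 1] + gap,       # gap in a
            )
            if local:
                value = max(value, 0.0)
            H[i][j] = value
            best = max(best, value)
    return best if local else H[-1][-1]


compound = ["Ser", "Gly", "Val", "Leu"]
cluster = ["Gly", "Val"]
local_score = alignment_score(compound, cluster, local=True)
global_score = alignment_score(compound, cluster, local=False)
```

With these toy inputs the local score rewards only the shared Gly-Val block, while the global score is reduced by the two unmatched flanking monomers.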


Supplementary Figure 5. Monomodular enediyne and glycosylated type 2 polyketide matches in GARLIC. GRAPE results of 17 enediynes and glycosylated type 2 polyketides are clustered, and GARLIC matches are shown in dark purple. Structures of calicheamicin, cosmomycin and dactylocycline are shown as examples.


Supplementary Figure 6. Screenshot of GARLIC front end web application. It can be accessed

through www.magarveylab.ca/garlic.


Supplementary Figure 7. GARLIC alignment for the telomycin gene cluster block output

versus the telomycin breakdown output. (a) Gene cluster of telomycin from PRISM; (b)

GARLIC match of telomycin. For full names of the abbreviations, see Supplementary Data Set.


Supplementary Figure 8. GARLIC process pipeline of acidobactin. (a) Gene cluster of

acidobactin from PRISM; (b) GARLIC match of acidobactin. For full names of the abbreviations,

see Supplementary Data Set.


Supplementary Figure 9. GARLIC process pipeline of thanamycin. (a) Gene cluster of

thanamycin from PRISM; (b) GARLIC match of thanamycin. For full names of the abbreviations,

see Supplementary Data Set.


Supplementary Figure 10. GARLIC process pipeline of potensimicin. (a) Gene cluster of

potensimicin from PRISM; (b) GARLIC match of potensimicin. For full names of the

abbreviations, see Supplementary Data Set.


Supplementary Figure 11. GARLIC process pipeline of lucensomycin. (a) Gene cluster of

lucensomycin from PRISM; (b) GARLIC match of lucensomycin; (c) and (d) LC/MS output for

lucensomycin. For full names of the abbreviations, see Supplementary Data Set.


Supplementary Figure 12. GARLIC process pipeline of octacosamicin. (a) Gene cluster of

octacosamicin from PRISM; (b) GARLIC match of octacosamicin; (c) and (d) LC/MS output for

octacosamicin. For full names of the abbreviations, see Supplementary Data Set.


Supplementary Figure 13. GARLIC process pipeline of tauramamide. (a) Gene cluster of

tauramamide from PRISM; (b) GARLIC match of tauramamide; (c) and (d) LC/MS output for

tauramamide. For full names of the abbreviations, see Supplementary Data Set.


Supplementary Figure 14. GARLIC process pipeline of bogorol. (a) Gene cluster of bogorol

from PRISM; (b) GARLIC match of bogorol; (c) and (d) LC/MS output for bogorol. For full names

of the abbreviations, see Supplementary Data Set.


Supplementary Figure 15. Pairwise alignments of biosynthetic gene clusters identified by

GARLIC with putative gene clusters from established producers. a. Homology searches

using the lucensomycin gene cluster from Streptomyces achromogenes NRRL 3125 against the

highly fragmented (3,223 contigs) and low coverage (largest contig: 21,706 bp) genome

sequence data of the known lucensomycin producer Streptomyces lucensis JCM 4490 revealed

likely lucensomycin biosynthesis genes on 11 contigs, including 19 sequences associated with

open reading frames. The average open reading frame percent identity for nucleotides was

90.2%. b. Homology searches using the bogorol gene cluster from Brevibacillus laterosporus

DSM 25 against the complete genome of the known bogorol producer Brevibacillus laterosporus

LMG 15441 revealed an identical biosynthetic gene cluster. The pairwise alignment revealed a

percent identity for nucleotides of 97.1%. c. Homology searches using the octacosamicin gene

cluster from Amycolatopsis spp. NAM50 against the partial genome of the known octacosamicin

producer Amycolatopsis azurea DSM 43854 revealed an identical biosynthetic gene cluster split

between two contigs. The pairwise alignment revealed a percent identity for nucleotides of

90.6%.


Supplementary Figure 16. GARLIC process pipeline of potensibactin. (a) Gene cluster of

potensibactin from PRISM; (b) GARLIC match of potensibactin; (c) LC/MS output for

potensibactin. For full names of the abbreviations, see Supplementary Data Set.


Supplementary Table 7. Comparison of the GRAPE results of known compounds with PRISM results of their producers’ genomes based on GARLIC analysis. The full genomes of the producers of 171 compounds were run through PRISM, and the PRISM outputs were aligned with the GRAPE results of the corresponding compounds using GARLIC. The score difference between the top-matched cluster in each genome and the second-best match is listed. The score differences for octacosamicin, lucensomycin and bogorol, based on our in-house genomes, are also included.

Compounds         Organism                              Score difference between top match and the second best clusters
A47934            Streptomyces toyocaensis NRRL15009    0.97
surfactin         Bacillus subtilis                     1.22
lichenysin        Bacillus licheniformis                1.46
polymyxin         Bacillus polymyxa                     0.82
pimaricin         Streptomyces natalensis               1.18
ristocetin        Amycolatopsis lurida                  0.86
fengycin          Bacillus subtilis                     1.05
salinomycin       Streptomyces albus XM211              0.61
erythromycin      Aeromicrobium erythreum               1.12
amphotericin      Streptomyces nodosus ATCC 14899       1.11
crocacin          Chondromyces crocatus                 0.66
telomycin         Streptomyces canus                    0.4
daptomycin        Streptomyces roseosporus              0.67
nodularin         Nodularia spumigena                   0.79
Octacosamicin¹    NAM50                                 0.85
FK506             Streptomyces tsukubaensis             0.49
Cephamycin C      Streptomyces clavuligerus             0.55
bacillaene        Bacillus subtilis                     0.57
Gephyronic acid   Cystobacter violaceus Cb vi76         0.49
avermectin        Streptomyces avermitilis              0.37
myxothiazol       Myxococcus fulvus                     0.37
kutznerides       Kutzneria sp. 744                     0.36
Lucensomycin¹     Streptomyces achromogenes             0.64
Acidobactin       Acidovorax citrulli DSM 17060         0.49
Bogorol¹          Brevibacillus laterosporus DSM 15441  0.9

¹ The score differences were derived from in-house genomes.
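
The quantity reported in Supplementary Table 7 is the margin between the best and second-best cluster scores within one genome. A minimal Python sketch of that calculation is given below. The function name `top_match_margin` and the example scores are hypothetical and are not taken from the study.

```python
def top_match_margin(cluster_scores):
    """Difference between the best and second-best GARLIC scores in a genome."""
    if len(cluster_scores) < 2:
        raise ValueError("need at least two cluster scores to compute a margin")
    top, second = sorted(cluster_scores, reverse=True)[:2]
    return top - second


# Hypothetical scores for the clusters of one genome against one compound.
margin = top_match_margin([2.5, 1.0, 0.5])
```

A larger margin indicates that the top-ranked cluster stands out more clearly from the other clusters in the same genome.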


Supplementary Data Set: Names and abbreviations of substrates used in the GARLIC

pipeline. Amino acids, hydroxy acids, sugars, fatty acids and site-specific tailoring are listed

with their full names and abbreviations.


Supplementary Note

High resolution mass data for in-house compounds.

Compound                          Molecular Formula   Calculated m/z   Observed m/z   Δppm
Bogorol [M+H]+                    C80H143N16O16       1584.087         1584.087       0.1
Lucensomycin [M+H]+               C36H54NO13          708.3595         708.3572       3.269
Octacosamicin B [M+H]+            C32H55N4O9          639.3969         639.3947       3.463
Tauramamide methyl ester [M+H]+   C45H68N9O9          878.5140         878.5133       0.854
Acidobactin B [M+H]+              C28H48N7O15         722.3228         722.3177       2.609
Telomycin A [M+H]+                C59H78N13O19        1272.553         1272.553       0.662
Thanamycin A [M+H]+               C54H88ClN12O22      1291.582         1291.582       0.205
Potensimicin [M+H]+               C28H48NO8           526.3374         526.3356       4.565
Potensibactin [M+H]+              C29H43N8O14         727.28987        727.28958      0.403
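
The Δppm column follows the usual mass accuracy definition, the difference between observed and calculated m/z relative to the calculated m/z, expressed in parts per million. The sketch below shows the arithmetic. The function name `ppm_error` and the use of an absolute value are assumptions, and because the m/z values printed in the table are rounded, values recomputed from them may differ slightly from the listed Δppm.

```python
def ppm_error(calculated_mz, observed_mz):
    """Absolute mass error in parts per million."""
    return abs(observed_mz - calculated_mz) / calculated_mz * 1e6


# Lucensomycin [M+H]+ values as printed in the table above.
lucensomycin_ppm = ppm_error(708.3595, 708.3572)
```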


NMR spectroscopic data for lucensomycin (700 MHz, in DMSO-d6)¹. The structure and key COSY and HMBC correlations of lucensomycin are shown below the table.

Position  δH (mult.)                  δC      Position  δH (mult.)           δC
1         0.86 (t, 5.9)               13.84   19        -                    174.87
2         1.29, 1.26 (m)              21.96   20        4.01 (m)             65.25
3         1.25, 1.25 (m)              27.02   21        1.82, 1.11 (d, 8.2)  44.23
4         1.61, 1.57 (m)              33.74   22        -                    97.11
5         4.7 (m)                     72.65   23        1.58, 1.50 (d, 14)   46.26
6         2.39, 2.15 (q, 10.8, 12.2)  37.39   24        4.12 (m)             66.05
7         5.56 (m)                    128.62  25        1.92, 1.18 (m)       40.98
8         6.01 (dd, 10.4, 4.4)        135.46  26        2.73 (m)             58.16
9         6.15 (m)                    131.44  27        3.22 (m)             53.79
10        6.13 (m)                    131.98  28        6.33 (dd, 6.9, 8.9)  144.27
11        6.2 (dd, 10.4, 3.5)         131.38  29        6.08 (m)             123.88
12        6.49 (dd, 11.5, 2.7)        133.44  30        -                    164.75
13        6.08 (m)                    128.67  1'        4.32 (m)             96.22
14        5.85 (dd, 8.8, 6.1)         136.02  2'        3.63 (m)             69.56
15        4.36 (m)                    74.31   3'        3.37 (m)             71.54
16        2.05, 1.54 (m)              36.66   4'        2.59 (m)             54.39
17        4.19 (m)                    65.2    5'        3.25 (m)             70.46
18        1.91 (m)                    57.44   6'        1.18 (m)             18.04

¹ Chemical shift and (multiplicity, J in Hz).


1H NMR spectrum of lucensomycin in d6-DMSO.

13C NMR spectrum of lucensomycin in d6-DMSO.


1H-13C HSQC NMR spectrum of lucensomycin in d6-DMSO.

1H-13C HSQC-TOCSY NMR spectrum of lucensomycin in d6-DMSO.


1H-13C HMBC NMR spectrum of lucensomycin in d6-DMSO.

1H-1H COSY NMR spectrum of lucensomycin in d6-DMSO.


1H-1H TOCSY NMR spectrum of lucensomycin in d6-DMSO.

1H-1H NOESY NMR spectrum of lucensomycin in d6-DMSO.


NMR spectroscopic data for potensibactin (700 MHz, in DMSO-d6)¹. The structure and key COSY and HMBC correlations of potensibactin are shown below the table.

Position  δH (mult.)           δC      Position  δH (mult.)        δC
1         -                    149.37  19        4.28 (ov.)        55.38
2         -                    146.2   20        3.60, 3.55 (ov.)  61.66
3         6.92 (d, 7.4)        118.84  21        -                 169.67
4         6.69 (t, 7.9)        118     22        7.92 (d, 8.2)     -
5         7.3 (d, 7.8)         117.8   23        4.25 (ov.)        52.3
6         -                    115.3   24        1.68, 1.50 (ov.)  29.38
7         -                    169.74  25        1.57, 1.51 (ov.)  22.83
8         9.09 (m)             -       26        3.50, 3.41 (ov.)  46.69
9         3.95, 3.95 (m)       42.3    27        -                 170.42
10        -                    168.97  28        1.97 (s)          20.39
11        8.29 (t, 5.2)        -       29        -                 171.16
12        3.81, 3.81 (d, 4.8)  42.12   30        8.05 (ov.)        -
13        -                    168.92  31        4.3 (ov.)         49.53
14        8.03 (ov.)           -       32        1.87, 1.65 (ov.)  27.53
15        4.33 (ov.)           55.24   33        1.90, 1.85 (ov.)  20.31
16        3.60, 3.55 (ov.)     61.7    34        3.46, 3.45 (ov.)  51.23
17        -                    170.11  35        -                 164.7
18        7.94 (d, 7.4)        -

¹ Chemical shift and (multiplicity, J in Hz).


1H NMR spectrum of potensibactin in d6-DMSO.

13C NMR spectrum of potensibactin in d6-DMSO.


1H-13C HSQC NMR spectrum of potensibactin in d6-DMSO.

1H-13C HSQC-TOCSY NMR spectrum of potensibactin in d6-DMSO.


1H-13C HMBC NMR spectrum of potensibactin in d6-DMSO.

1H-1H COSY NMR spectrum of potensibactin in d6-DMSO.

