Supplementary Information
Polyketide and Nonribosomal Peptide Retrobiosynthesis and Comparison to Gene Clusters
Chris A. Dejong¹,², Gregory M. Chen¹,², Haoxin Li¹,², Chad W. Johnston¹, Mclean R. Edwards¹,
Philip N. Rees¹, Michael A. Skinnider¹, Andrew L. H. Webster¹ & Nathan A. Magarvey¹*
¹ Department of Biochemistry & Biomedical Sciences; Department of Chemistry & Chemical
Biology, M. G. DeGroote Institute for Infectious Disease Research; McMaster University,
Hamilton, Canada L8S 4K1
² These authors contributed equally to this work
* Corresponding author, email address: [email protected]
Nature Chemical Biology: doi:10.1038/nchembio.2188
Supplementary Results
Supplementary Figure 1. Visualization of GRAPE process pipeline. GRAPE takes in small
molecule structures in the form of SMILES and breaks them down in the above order, while
capturing details of the chemistries during retro-synthesis. The final output is monomer
information for both amino acids (AA) and polyketides (PK), as well as additional tailoring and
scaffold information. For full names of the abbreviations, see Supplementary Data Set.
Supplementary Figure 2. Visualization of polyketide carbon walking. GRAPE finds the
biosynthetic end carbon (the carboxylic acid carbon) and then finds the biosynthetic start
carbon by locating the furthest carbon that lies in a carbon-only chain (no other intermediate
atoms) and is not terminal. If the furthest carbon cannot be a β carbon (incorrect number of
carbons in the chain), the second-furthest carbon is used instead. Each α carbon is analyzed
for the substrate and each β carbon for the oxidation state. For full names of the
abbreviations, see Supplementary Data Set.
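The carbon walk described in this caption can be sketched on a toy molecular graph. This is only an illustration under stated assumptions, not GRAPE's implementation: the graph encoding (`elements`, `bonds`), the function names, and the parity rule (a start carbon must sit an even number of bonds from the end carbon) are hypothetical choices made here to model the "incorrect number of carbons" check.

```python
from collections import deque

def carbon_distances(elements, bonds, end_carbon):
    """Breadth-first walk from the end carbon through carbon atoms only."""
    dist = {end_carbon: 0}
    queue = deque([end_carbon])
    while queue:
        atom = queue.popleft()
        for nbr in bonds[atom]:
            if elements[nbr] == "C" and nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist

def find_start_carbon(elements, bonds, end_carbon):
    """Pick the furthest non-terminal carbon that can be a beta carbon.

    Assumption: two-carbon ketide units put beta carbons at an even bond
    distance from the carboxylic acid carbon.
    """
    dist = carbon_distances(elements, bonds, end_carbon)
    for atom in sorted(dist, key=lambda a: (-dist[a], a)):
        if atom == end_carbon:
            continue
        is_terminal = len(bonds[atom]) <= 1
        if not is_terminal and dist[atom] % 2 == 0:
            return atom
    return None

# Toy chain: C0 is the carboxylic acid carbon, C5 is a terminal methyl,
# atoms 6 and 7 are the acid oxygens and atom 8 is a hydroxyl on C2.
elements = {0: "C", 1: "C", 2: "C", 3: "C", 4: "C", 5: "C", 6: "O", 7: "O", 8: "O"}
bonds = {0: [1, 6, 7], 1: [0, 2], 2: [1, 3, 8], 3: [2, 4], 4: [3, 5],
         5: [4], 6: [0], 7: [0], 8: [2]}
start = find_start_carbon(elements, bonds, end_carbon=0)
```

In this toy chain the terminal methyl (atom 5) is skipped and atom 4 is returned, so the six backbone carbons split into three two-carbon units.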
Supplementary Figure 3. Visualization of amino acids as acyl keto-extension units in hybrid
polyketides and non-ribosomal peptides. GRAPE identifies the longest carbon-only backbone
from the α-carbon of the amine to the carbonyl carbon of the furthest carboxylic acid. If the
carbon chain has an odd number of carbons, the keto-extended amino acid is identified as a
β-amino acid. The bond between the β-carbon and the γ-carbon is then broken, and a carboxylic
acid is added to the β-carbon to create the β-amino acid. If the carbon chain has an even number
of carbons, the keto-extended amino acid is identified as an α-amino acid. The bond between the
α-carbon and the β-carbon is then broken, and a carboxylic acid is added to the α-carbon to
create the α-amino acid.
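The odd/even rule in this caption can be written as a small decision function. This is a minimal sketch, assuming the chain length has already been counted; the function name and the returned dictionary keys are hypothetical, and the caption itself does not say whether the count includes both end carbons.

```python
def classify_keto_extended(chain_carbons):
    """Apply the parity rule from the caption to a carbon-only backbone.

    chain_carbons: number of carbons in the longest carbon-only backbone
    running from the amine's alpha carbon to the furthest carboxylic acid
    carbonyl carbon (how the ends are counted is an assumption here).
    """
    if chain_carbons < 2:
        raise ValueError("backbone needs at least two carbons")
    if chain_carbons % 2 == 1:
        # Odd chain: beta-amino acid, cut beta-gamma, cap the beta carbon.
        return {"amino_acid": "beta", "bond_to_break": ("beta", "gamma"),
                "carboxylate_on": "beta"}
    # Even chain: alpha-amino acid, cut alpha-beta, cap the alpha carbon.
    return {"amino_acid": "alpha", "bond_to_break": ("alpha", "beta"),
            "carboxylate_on": "alpha"}

odd_case = classify_keto_extended(5)
even_case = classify_keto_extended(4)
```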
Supplementary Table 1. a) Accuracy of assembly line predictive units. b) Assembly line
biosynthetic tailoring features detected by PRISM and GRAPE. c) Non-assembly line
biosynthetic features detected by PRISM and GRAPE. For each assembly line or non-assembly
line biosynthetic feature, PRISM detects the corresponding genes in a gene cluster, while GRAPE
detects the corresponding structural features in small molecules.
a
Assembly line predictive types | PRISM prediction accuracy
Proteinogenic amino acids | 94%
Nonproteinogenic amino acids | 93%
Substrates of AT domains in polyketide clusters | 74%
Oxidation states of AT domains | 75%
Deoxy sugars | 64%
b
Assembly line biosynthetic tailoring features | Detection by PRISM | Detection by GRAPE
Fatty acyl addition | Fatty acyl adenylating enzyme | Presence of a fatty chain
O-methyltransferase | O-methyltransferase | See Supplementary Table 4
N-methyltransferase | N-methyltransferase | See Supplementary Table 4
C-methyltransferase | C-methyltransferase | See Supplementary Table 4
Thiazole | Cyclase | See Supplementary Table 4
Oxazole | Cyclase | See Supplementary Table 4
Tryptophan dioxygenase | Tryptophan dioxygenase | Kynurenine substructure
c
Non-assembly line biosynthetic features | Detection by PRISM | Detection by GRAPE
Chlorination | Chlorination enzyme | Presence of chlorine atoms
Presence of a sugar | Sugar synthesis enzymes | Identification of a specific sugar
Sulphate group | Presence of sulfotransferase enzyme | Presence of sulphate group
Chemical scaffolds¹ | Enzymes responsible for scaffold biosynthesis | Identification of a specific scaffold
Acyl adenylating substrates | Acyl adenylating enzyme | Identification of a specific acyl adenylating substrate
¹ Chemical scaffolds are only applied to type 2 polyketides and enediyne scaffolds.
Supplementary Table 2. Glycosylated tailoring detected by PRISM and GRAPE. Each sugar
molecule detected by GRAPE is linked to one or more biosynthetic enzymes used for its
identification.
Sugar | Sugar enzymes
L-aculose | 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, oxidoreductase, deoxygenase glycosyltransferase
L-cinerulose A | 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
L-rhodinose | 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
Rednose | 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
L-cinerulose B | 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
O-methyl-L-amicetose | 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
4-O-methyl-L-rhodinose | 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
L-daunosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3-aminotransferase, deoxygenase glycosyltransferase
L-ristosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 4-aminotransferase, deoxygenase glycosyltransferase
D-digitoxose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, deoxygenase glycosyltransferase
L-digitoxose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
2-deoxy-L-fucose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
D-olivose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
D-oliose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, deoxygenase glycosyltransferase
4-oxo-L-vancosamine | 4,6-dehydratase, 2,3-dehydratase, epimerase, 4-aminotransferase, C-methyltransferase, deoxygenase glycosyltransferase
D-forosamine | 4,6-dehydratase, 2,3-dehydratase, 3,4-dehydratase, 3-ketoreductase, 4-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
L-actinosamine | 4,6-dehydratase, 2,3-dehydratase, deoxygenase glycosyltransferase
L-vancosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3-aminotransferase, C-methyltransferase, deoxygenase glycosyltransferase
L-vicenisamine | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-aminotransferase, N-methyltransferase, deoxygenase glycosyltransferase
D-chalcose | 4,6-dehydratase, 3-ketoreductase, 4-aminotransferase, O-methyltransferase, oxidative deaminase, deoxygenase glycosyltransferase
D-mycarose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, C-methyltransferase, deoxygenase glycosyltransferase
D-mycosamine | 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, O-methyltransferase, deoxygenase glycosyltransferase
4-deoxy-4-thio-D-digitoxose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, epimerase, thiosugar synthase, deoxygenase glycosyltransferase
D-fucofuranose | 4,6-dehydratase, 4-ketoreductase, deoxygenase glycosyltransferase
D-fucose | 4,6-dehydratase, 4-ketoreductase, deoxygenase glycosyltransferase
L-rhamnose | 4,6-dehydratase, 4-ketoreductase, epimerase, deoxygenase glycosyltransferase
4-N-ethyl-4-amino-3-O-methoxy-2,4,5-trideoxypentose | UDP-sugar decarboxylase, UDP-sugar dehydrogenase, 2,3-dehydratase, 3-ketoreductase, 4-aminotransferase, N-ethyltransferase, deoxygenase glycosyltransferase
D-3-N-methyl-4-O-methyl-L-ristosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 4-aminotransferase, N-methyltransferase, O-methyltransferase, deoxygenase glycosyltransferase
N,N-dimethyl-L-pyrrolosamine | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, epimerase, 4-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
D-desosamine | 4,6-dehydratase, 3,4-dehydratase, 3-aminotransferase, N,N-dimethyltransferase, oxidative deaminase, deoxygenase glycosyltransferase
L-megosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
Nogalamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
L-rhodosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
D-angolosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
Kedarosamine | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
L-noviose | 4,6-dehydratase, 4-ketoreductase, epimerase, C-methyltransferase, deoxygenase glycosyltransferase
L-cladinose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, C-methyltransferase, O-methyltransferase, deoxygenase glycosyltransferase
2'-N-methyl-D-fucosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, N-methyltransferase, deoxygenase glycosyltransferase
D-digitalose | 4,6-dehydratase, 4-ketoreductase, O-methyltransferase, deoxygenase glycosyltransferase
3-O-methyl-rhamnose | 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
2-O-methyl-rhamnose | 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
4-O-carbamoyl-D-olivose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, carbamoyltransferase, deoxygenase glycosyltransferase
D-ravidosamine | 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
3-N,N-dimethyl-D-mycosamine | 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
2,3-O-dimethyl-L-rhamnose | 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
2,4-O-dimethyl-L-rhamnose | 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
3,4-O-dimethyl-L-rhamnose | 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
2-thioglucose | thiosugar synthase, deoxygenase glycosyltransferase
Olivomycose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, C-methyltransferase, acetyltransferase, deoxygenase glycosyltransferase
4-N,N-dimethylamino-4-deoxy-5-C-methyl-l-rhamnose | 4,6-dehydratase, epimerase, 4-aminotransferase, C-methyltransferase, N,N-dimethyltransferase, acetyltransferase, deoxygenase glycosyltransferase
2,3,4-tri-O-methylrhamnose | 4,6-dehydratase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
4-O-acetyl-L-arcanose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, C-methyltransferase, O-methyltransferase, acetyltransferase, deoxygenase glycosyltransferase
3-N-acetyl-D-ravidosamine | 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, N,N-dimethyltransferase, acetyltransferase, deoxygenase glycosyltransferase
3-O-carbamoyl-L-noviose | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, C-methyltransferase, carbamoyltransferase, deoxygenase glycosyltransferase
L-nogalose | 4,6-dehydratase, 4-ketoreductase, epimerase, C-methyltransferase, O-methyltransferase, deoxygenase glycosyltransferase
4-O-acetyl-D-ravidosamine | 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, N,N-dimethyltransferase, acetyltransferase, deoxygenase glycosyltransferase
3-O-carbamoyl-4-O-methyl-L-noviose | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, C-methyltransferase, O-methyltransferase, carbamoyltransferase, deoxygenase glycosyltransferase
3-N-acetyl-4-O-acetyl-D-ravidosamine | 4,6-dehydratase, 3,4-isomerase, 3-aminotransferase, N,N-dimethyltransferase, acetyltransferase, deoxygenase glycosyltransferase
3-(5'-methyl-2'-pyrrolylcarbonyl-)4-O-methyl-L-noviose | 4,6-dehydratase, 4-ketoreductase, epimerase, C-methyltransferase, O-methyltransferase, pyrrolyltransferase, deoxygenase glycosyltransferase
Madurose | UDP-sugar decarboxylase, UDP-sugar dehydrogenase, 4-aminotransferase, C-methyltransferase, deoxygenase glycosyltransferase
4-N-methyl-4-amino-3-O-methoxy-2,4,5-trideoxypentose | UDP-sugar decarboxylase, UDP-sugar dehydrogenase, 2,3-dehydratase, 3-ketoreductase, 4-aminotransferase, N-methyltransferase, deoxygenase glycosyltransferase
Glucose | hexose
N-acetylglucosamine | hexose
Mannose | hexose
Gulose | hexose
L-oleandrose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, epimerase, O-methyltransferase, deoxygenase glycosyltransferase
Olivomose | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, O-methyltransferase, deoxygenase glycosyltransferase
4,6-dideoxy-4-hydroxylamino-D-glucose | 4,6-dehydratase, 4-aminotransferase, deoxygenase glycosyltransferase
3-N,N-dimethyl-L-eremosamine | 4,6-dehydratase, 2,3-dehydratase, 4-ketoreductase, epimerase, C-methyltransferase, N,N-dimethyltransferase, deoxygenase glycosyltransferase
Chromose (4-O-acetyl-β-D-oliose) | 4,6-dehydratase, 2,3-dehydratase, 3-ketoreductase, 4-ketoreductase, acetyltransferase, deoxygenase glycosyltransferase
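One way to read Supplementary Table 2 is as a lookup from each sugar to the enzyme set that supports it. The sketch below encodes three rows of the table and lists the sugars whose enzymes are all present in a gene cluster. The subset rule, the ranking, and all names (`SUGAR_ENZYMES`, `supported_sugars`) are illustrative assumptions; the text does not state how PRISM or GARLIC actually use these sets.

```python
# Three rows copied from Supplementary Table 2 (enzyme names kept verbatim).
SUGAR_ENZYMES = {
    "D-fucose": {"4,6-dehydratase", "4-ketoreductase",
                 "deoxygenase glycosyltransferase"},
    "L-rhamnose": {"4,6-dehydratase", "4-ketoreductase", "epimerase",
                   "deoxygenase glycosyltransferase"},
    "D-digitalose": {"4,6-dehydratase", "4-ketoreductase",
                     "O-methyltransferase", "deoxygenase glycosyltransferase"},
}

def supported_sugars(cluster_enzymes):
    """Return sugars whose full enzyme set is present in the cluster.

    Hypothetical rule: a sugar counts as supported only when every enzyme
    listed for it is found; larger enzyme sets are ranked first.
    """
    found = set(cluster_enzymes)
    hits = [s for s, needed in SUGAR_ENZYMES.items() if needed <= found]
    return sorted(hits, key=lambda s: (-len(SUGAR_ENZYMES[s]), s))

cluster = ["4,6-dehydratase", "4-ketoreductase", "epimerase",
           "deoxygenase glycosyltransferase"]
candidates = supported_sugars(cluster)
```

With this cluster, L-rhamnose ranks ahead of D-fucose because it accounts for the epimerase as well, while D-digitalose is excluded for lack of an O-methyltransferase.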
Supplementary Table 3: Macrocycle & Heterocycle Cleavage Reactions. Each type of bond
breakage is shown with the associated retro-synthesis reaction and a compound reflecting the type
of bond breakage as an example. The R groups in reactions are used to simplify structures and are
independent of each other. The GRAPE breakdowns of all examples are listed at the end of the
table. For full names of the abbreviations, see Supplementary Data Set.
Breakage type | Examples¹
[Bond breakage reaction schemes: images not recoverable from the extracted text]
Macrolide | Erythromycin
Thioesters | Thiocoraline
Macrolactam | ML-449
β-lactam-like structures | Salinosporamide; cephalosporin; penicillin; nocardicin
Oxazoles | Chivosazole A
Thiazoles | Curacin
Multi-thiazoles | Bleomycin
Kendomycin substructure | Kendomycin
Avermectin terminal substructure | Avermectin
Avermectin starter substructure | Avermectin
Piericidin type substructure | Piericidin
Anthramycin type substructure | Anthramycin
Epoxyketone type substructure | Eponemycin
Cyclic ether process 1 in PK recognition | Monensin
Cyclic ether process 2 in PK recognition | Monensin
PK epoxide restoration | Mupirocin
¹ GRAPE breakdowns of compounds are shown below.
Supplementary Table 4. Cleaving Chemical Bonds between monomers and tailorings. Each
type of bond breakage is shown with the associated retro-synthesis reaction and a compound
reflecting the type of bond breakage as an example. The R groups in reactions are used to simplify
structures and are independent of each other. The GRAPE breakdowns of all examples are listed
at the end of the table. For full names of the abbreviations, see Supplementary Data Set.
Breakage type | Examples¹
[Bond breakage reaction schemes: images not recoverable from the extracted text]
Imide bonds | Arthrofactin
Amide bonds | Arthrofactin
Ureido bonds | Mycoplanecin D
Disulfide bridges | SW163C
Ether bridged aromatics | Vancomycin
Bi-aryl C-C linkages between aromatics | Vancomycin
Sulphate groups | A-47934
Glycans | Apoptolidin
N-methylated amino acids/sugars | SW163C
O-methylated amino acids/sugars | Apoptolidin
C-methylated amino acids | Yersiniabactin
Halogens | Vancomycin
Finding epoxide rings | Eponemycin
¹ GRAPE breakdowns of compounds are listed below.
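As an illustration of the retro-synthetic bond breakage listed in this table, the sketch below cleaves amide bonds in a toy molecular graph and restores a hydroxyl on the carbonyl carbon. It is a simplified stand-in, not GRAPE's code: the graph encoding and all function names are assumptions, and the pattern used here (a carbonyl carbon singly bonded to nitrogen) would also hit ureido bonds, which the table treats as a separate breakage type.

```python
def other(pair, atom):
    """Return the partner of `atom` in a two-atom bond."""
    return next(iter(pair - {atom}))

def is_carbonyl_carbon(elements, bonds, atom):
    return elements[atom] == "C" and any(
        order == 2 and atom in pair and elements[other(pair, atom)] == "O"
        for pair, order in bonds.items())

def find_amide_bonds(elements, bonds):
    """List (carbonyl carbon, nitrogen) pairs joined by a single bond."""
    found = []
    for pair, order in bonds.items():
        if order != 1:
            continue
        a, b = sorted(pair)
        for c, n in ((a, b), (b, a)):
            if elements[n] == "N" and is_carbonyl_carbon(elements, bonds, c):
                found.append((c, n))
    return sorted(found)

def cleave_amides(elements, bonds):
    """Break each amide bond and add an OH oxygen to the carbonyl carbon."""
    elements, bonds = dict(elements), dict(bonds)
    for c, n in find_amide_bonds(elements, bonds):
        del bonds[frozenset((c, n))]
        new_o = max(elements) + 1
        elements[new_o] = "O"
        bonds[frozenset((c, new_o))] = 1
    return elements, bonds

def fragments(elements, bonds):
    """Connected components, each reported as a sorted list of elements."""
    seen, out = set(), []
    for start in sorted(elements):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            atom = stack.pop()
            if atom in comp:
                continue
            comp.add(atom)
            stack.extend(other(p, atom) for p in bonds if atom in p)
        seen |= comp
        out.append(sorted(elements[i] for i in comp))
    return out

# Toy dipeptide backbone (heavy atoms only): N0-C1-C2(=O3)-N4-C5-C6(=O7)-O8
elements = {0: "N", 1: "C", 2: "C", 3: "O", 4: "N", 5: "C", 6: "C", 7: "O", 8: "O"}
bonds = {frozenset(p): o for p, o in [((0, 1), 1), ((1, 2), 1), ((2, 3), 2),
         ((2, 4), 1), ((4, 5), 1), ((5, 6), 1), ((6, 7), 2), ((6, 8), 1)]}
monomers = fragments(*cleave_amides(elements, bonds))
```

The single amide bond between atoms 2 and 4 is cut, leaving two fragments with identical heavy-atom composition (two carbons, one nitrogen, two oxygens), that is, two glycine-like monomers.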
Supplementary Table 5. GARLIC algorithm configurations. For each scoring scenario, two
scoring metrics, basic scoring and refined scoring, are listed. For details of non-assembly line
biosynthetic features match, sugar gene match and scaffold bonus, see Supplementary Tables 4, 5
and 6, respectively.
Scoring property | Basic scoring | Empirical scoring | Empirical optimized scoring
Gaps between PRISM predicted ORFs | -1 | -2 | -2.31
Gaps within PRISM predicted ORFs | -1 | -5 | -5.37
Gaps caused by repeated GRAPE monomers¹ | -1 | -2.5 | -2.5
Gaps between GRAPE monomer blocks² | -1 | -2 | -2.31
Gaps within GRAPE monomer blocks³ | -1 | -5 | -5.37
Fatty acid or polyketide gap penalty | -0.001 | -0.001 | -0.64
Proteinogenic amino acid match⁴ | 1 | 5 | 5.31
Amino acid partial match⁵ | -1 | 1 | 1.18
Amino acid substitution penalty | -1 | -2 | -2.25
Non-proteinogenic amino acid bonus⁶ | 0 | 3 | 3.43
Aromatic amino acid bonus | 0 | 1 | 1.17
β-lactam match | 0.25 | 1.5 | 0
Amino acid - polyketide substitution⁷ | 0 | -10 | -10
Polyketide substrate match⁸ | 0.5 | 1 | 1.27
Polyketide complete match bonus | 0 | 1 | 0.99
Polyketide rare substrate match bonus⁹ | 0 | 3 | 2.01
Polyketide oxidation state match | 0.5 | 3 | 3.25
Polyketide maximum score when multiple oxidation states possible | No limit | 5 | 5
Polyketide substrate substitution penalty | -0.5 | -1 | -1.93
Polyketide oxidation substitution penalty | -0.5 | -1 | -1
Site specific tailoring match | 0.25 | 2 | 5.44
Sugar gene match¹⁰,¹¹ | 0.25 | 0.05 | 0.05
Nonspecific tailoring match¹⁰ | 0.25 | 0.5 | 7.95
Acyl adenylating match¹⁰,¹² | 0.25 | 1 | 11.17
Scaffold bonus¹⁰,¹³ | 0.25 | 2 | 0
Chemical type match¹⁰,¹³ | 0.25 | 2 | 0
¹ When GRAPE has repeated monomer units but PRISM only has one of the same monomer unit,
a penalty of “-1” or “-2.5” is given in basic or refined scoring, respectively.
² When GRAPE outputs have fewer monomer units than PRISM outputs, gaps are created in the
alignment. If the gaps are between PRISM predicted ORFs, a penalty of “-1” or “-2” is given in
basic or refined scoring, respectively.
³ When GRAPE outputs have fewer monomer units than PRISM outputs, gaps are created in the
alignment. If the gaps are within PRISM predicted ORFs, a penalty of “-1” or “-5” is given in basic
or refined scoring, respectively.
⁴ A list of proteinogenic amino acids: alanine, arginine, asparagine, aspartic acid, cysteine, glutamic
acid, glutamine, glycine, histidine, isoleucine, leucine, lysine, methionine, phenylalanine, proline,
serine, threonine, tyrosine, tryptophan and valine. A list of identical substrates for scoring (all in
the same brackets are considered a full match for scoring purposes): (valine - 3 hydroxy
anthranilic acid); (dehydro aminobutyric acid - threonine); (hydroxy ornithine - ornithine).
⁵ A list of similar substrates for scoring (all in the same brackets are considered a partial match for
scoring purposes): (valine - isoleucine - alanine - leucine - β-hydroxyvaline - β-lysine -
hydroxyleucine); (alanine - β-alanine); (glutamic acid - glutamine - piperazic acid); (phenylalanine
- tryptophan - tyrosine - hydroxyl-3-methylpentanoic acid - quinoxaline carboxylic acid -
β-hydroxy phenylalanine - hydroxy tyrosine - β-methyl phenylalanine); (asparagine - aspartic acid
- hydroxyasparagine - β-methyl aspartic acid - hydroxyaspartic acid); (methyl proline - proline).
⁶ A list of non-proteinogenic amino acids: amino epoxy oxodecanoic acid, β-alanine,
β-methylphenylalanine, β-phenylalanine, β-hydroxyphenylalanine, hydroxyleucine,
hydroxyphenylglycine, dihydroxyphenylglycine, hydroxyasparagine, hydroxyaspartic acid,
β-methyl aspartic acid, capreomycidine, citrulline, norvaline, isovaline, β-hydroxylvaline,
hydroxyornithine, hydroxyacetylornithine, ornithine, butenylmethyl threonine, methylproline,
hydroxytyrosine, β-lysine, adipic acid, kynurenine, aminobutyric acid, dehydro aminobutyric acid,
aminoisobutyric acid, coronamic acid, diaminopropionate, enduracididine, diaminobutyric acid,
pipecolic acid, methylglutamate, epoxy oxodecanoic acid, hydroxyl quinaldic acid, piperazic acid,
quinoxaline carboxylic acid, 3-hydroxy anthranilic acid.
⁷ When an amino acid is aligned with a polyketide monomer, a penalty of “-10” is given in refined
scoring.
⁸ A list of polyketide substrates for scoring: malonate, methyl malonate, methoxy malonate;
benzoate; ethyl malonate; isobutyrate; methyl butyrate.
⁹ Rare polyketide substrates are all substrates but malonate and methyl malonate.
¹⁰ These scoring properties are added to the match score between the compounds after the
alignment and are not normalized by the size of the alignment. The values given to these
properties therefore carry more weight than scores that are normalized, which is why they are
often lower than 1 and why the ‘Basic’ scoring scheme defaults them to 0.25.
¹¹ For details of sugars and sugar genes, see Supplementary Table 5.
¹² A list of identical acyl adenylating substrates for scoring (all in the same brackets are considered
a partial match for scoring purposes): (pyruvate - lactate - valeric acid - 3 hydroxyl pantanolic
acid); (salicylic acid - 3-formamido-5-hydroxy benzoic acid - benzoic acid);
(alpha-ketoisocaproate - hydroxyl-3-methyl pentanoic acid).
¹³ Scaffold and chemical type matches are only applied to type 2 polyketides and enediynes. For
details of scaffolds and chemical types of type 2 polyketides and enediynes, see Supplementary
Table 6.
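To make the interplay between per-position scores and the post-alignment bonuses of footnote 10 concrete, the sketch below encodes a few values from Supplementary Table 5 and scores a short set of aligned monomer pairs. It is a simplified illustration, not GARLIC's code: the dictionary layout, the function names, the small subset of partial-match groups, and in particular the normalization by alignment length are assumptions made for this example.

```python
# A few rows of Supplementary Table 5 (basic and empirical columns only).
SCORING = {
    "basic": {"aa_match": 1.0, "aa_partial": -1.0,
              "aa_substitution": -1.0, "sugar_gene_match": 0.25},
    "empirical": {"aa_match": 5.0, "aa_partial": 1.0,
                  "aa_substitution": -2.0, "sugar_gene_match": 0.05},
}

# Three of the partial-match groups from footnote 5.
PARTIAL_GROUPS = [
    {"alanine", "β-alanine"},
    {"glutamic acid", "glutamine", "piperazic acid"},
    {"methyl proline", "proline"},
]

def score_pair(gene_monomer, compound_monomer, scheme="empirical"):
    """Score one aligned position: full match, partial match or substitution."""
    values = SCORING[scheme]
    if gene_monomer == compound_monomer:
        return values["aa_match"]
    if any(gene_monomer in g and compound_monomer in g for g in PARTIAL_GROUPS):
        return values["aa_partial"]
    return values["aa_substitution"]

def total_score(aligned_pairs, sugar_gene_matches=0, scheme="empirical"):
    """Normalized alignment score plus un-normalized post-alignment bonuses.

    Assumption: normalization divides by the number of aligned positions.
    """
    if not aligned_pairs:
        raise ValueError("alignment is empty")
    per_position = sum(score_pair(a, b, scheme) for a, b in aligned_pairs)
    normalized = per_position / len(aligned_pairs)
    bonus = sugar_gene_matches * SCORING[scheme]["sugar_gene_match"]
    return normalized + bonus

pairs = [("proline", "proline"), ("proline", "methyl proline"),
         ("glutamine", "alanine")]
score = total_score(pairs, sugar_gene_matches=1)
```

Because the bonus is added after the division, a single sugar gene match contributes its full value regardless of alignment length, which is consistent with the low weights the table assigns to such properties.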
Supplementary Table 6. Scaffold types of type 2 polyketides and enediynes detected by
PRISM and GRAPE. Compounds are matched to each scaffold type in GRAPE. a) PRISM
identifies one or more cyclase enzymes in gene clusters and matches them to each scaffold for
type 2 polyketides. b) The ketosynthase and phosphopantetheinyl transferase of enediyne PKSs
are detected by PRISM and matched to the corresponding enediyne core structures.
Supplementary Table 6a
Scaffold types | First cyclase | Second cyclase | Third cyclase | Fourth cyclase
Enterocin | Favorskiiase | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 5a | -
AZ154 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 6a | Type II polyketide cyclase, clade 3
Fredericamycin1 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 6a | Type II polyketide cyclase, clade 3
Fredericamycin2 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 6a | Type II polyketide cyclase, clade 3
Fredericamycin3 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 6a | Type II polyketide cyclase, clade 3
Fredericamycin4 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 2 | Type II polyketide cyclase, clade 6b | -
Aureolic_acid1 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 5a | -
Lysolipin | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 5a | -
Rubromycin4 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 5a | -
Rubromycin3 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 5a | -
Tetracyclines | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | Type II polyketide cyclase, clade 2 | Type II polyketide cyclase, clade 6b subtype 1 | -
Rubromycin2 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 5a | -
Rubromycin1 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 2 | Type II polyketide cyclase, clade 6a | -
Chartarin | Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9; Type II polyketide cyclase, clade 10 | Type II polyketide cyclase, clade 4; Type II polyketide cyclase, clade 5b | - | -
Urdamycin | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 5a | -
Pradimicin2 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | Type II polyketide cyclase, clade 5a | -
Pradimicin1 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 1 | - | -
Wailupemycin | Type II polyketide cyclase, clade 3 | Type II polyketide cyclase, clade 4; Type II polyketide cyclase, clade 5b | - | -
Angucyclines4 | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | Type II polyketide cyclase, clade 4; Type II polyketide cyclase, clade 5b | - | -
Angucyclines3 | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | Type II polyketide cyclase, clade 4; Type II polyketide cyclase, clade 5b | - | -
Angucyclines2 | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | Type II polyketide cyclase, clade 4; Type II polyketide cyclase, clade 5b | - | -
Angucyclines1 | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | Type II polyketide cyclase, clade 2 | Type II polyketide cyclase, clade 6b subtype 2 | -
Nogalonate | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | Type II polyketide cyclase, clade 3 | - | -
Benzoisochromanequinone | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | Type II polyketide cyclase, clade 2 | Type II polyketide cyclase, clade 6b | -
UT_X26 | Type II polyketide cyclase, clade 7 | Type II polyketide cyclase, clade 4; Type II polyketide cyclase, clade 5b | - | -
AB649 | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | - | - | -
Juglomycin | Type II polyketide cyclase, clade 10; Type II polyketide cyclase, clade 8a; Type II polyketide cyclase, clades 8/9 | - | - | -
Supplementary Table 6b
Scaffold types | Enediyne ketosynthase | Enediyne PPTase
Nine-membered enediyne core structure | Nine-membered enediyne type I ketosynthase | Nine-membered enediyne PPTase
Ten-membered enediyne core structure | Ten-membered enediyne type I ketosynthase | Ten-membered enediyne PPTase
Supplementary Figure 4. A comparison of different scoring matrices of GARLIC. The
Smith-Waterman algorithm is implemented for local alignment, whereas the Needleman-Wunsch
algorithm is used for global alignment. GF and LOO denote scoring optimized from the global
refined scoring and from leave-one-out analysis, respectively. Basic and refined scoring metrics
are in Supplementary Table 3.
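The difference between the two alignment modes named in this caption can be shown with a single dynamic-programming routine over monomer sequences. This is a generic textbook sketch, not GARLIC's implementation: the function name is hypothetical, the match, substitution and gap values (1, -1, -1) are borrowed from the basic scoring column purely for illustration, and the monomer lists are invented.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-1, local=False):
    """Needleman-Wunsch (global) or Smith-Waterman (local) alignment score."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment pays for leading gaps; local alignment does not.
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if local:
                cell = max(cell, 0)  # a local alignment may restart anywhere
            H[i][j] = cell
            best = max(best, cell)
    return best if local else H[-1][-1]

# Invented monomer sequences: a four-module cluster and a two-monomer fragment.
cluster = ["Val", "Orn", "Leu", "Phe"]
compound = ["Orn", "Leu"]
global_score = align_score(cluster, compound)
local_score = align_score(cluster, compound, local=True)
```

Here the global score is 0, because the two matches are offset by two end gaps, while the local score is 2, because only the matching Orn-Leu stretch is counted.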
Supplementary Figure 5. Matches of monomodular enediynes and glycosylated type 2
polyketides in GARLIC. GRAPE results of 17 enediynes and glycosylated type 2 polyketides are
clustered, and GARLIC matches are shown in dark purple. Structures of calicheamicin,
cosmomycin and dactylocycline are shown as examples.
Supplementary Figure 6. Screenshot of GARLIC front end web application. It can be accessed
through www.magarveylab.ca/garlic.
Supplementary Figure 7. GARLIC alignment for the telomycin gene cluster block output
versus the telomycin breakdown output. (a) Gene cluster of telomycin from PRISM; (b)
GARLIC match of telomycin. For full names of the abbreviations, see Supplementary Data Set.
Supplementary Figure 8. GARLIC process pipeline of acidobactin. (a) Gene cluster of
acidobactin from PRISM; (b) GARLIC match of acidobactin. For full names of the abbreviations,
see Supplementary Data Set.
Supplementary Figure 9. GARLIC process pipeline of thanamycin. (a) Gene cluster of
thanamycin from PRISM; (b) GARLIC match of thanamycin. For full names of the abbreviations,
see Supplementary Data Set.
Supplementary Figure 10. GARLIC process pipeline of potensimicin. (a) Gene cluster of
potensimicin from PRISM; (b) GARLIC match of potensimicin. For full names of the
abbreviations, see Supplementary Data Set.
Supplementary Figure 11. GARLIC process pipeline of lucensomycin. (a) Gene cluster of
lucensomycin from PRISM; (b) GARLIC match of lucensomycin; (c) and (d) LC/MS output for
lucensomycin. For full names of the abbreviations, see Supplementary Data Set.
Supplementary Figure 12. GARLIC process pipeline of octacosamicin. (a) Gene cluster of
octacosamicin from PRISM; (b) GARLIC match of octacosamicin; (c) and (d) LC/MS output for
octacosamicin. For full names of the abbreviations, see Supplementary Data Set.
Supplementary Figure 13. GARLIC process pipeline of tauramamide. (a) Gene cluster of
tauramamide from PRISM; (b) GARLIC match of tauramamide; (c) and (d) LC/MS output for
tauramamide. For full names of the abbreviations, see Supplementary Data Set.
Supplementary Figure 14. GARLIC process pipeline of bogorol. (a) Gene cluster of bogorol
from PRISM; (b) GARLIC match of bogorol; (c) and (d) LC/MS output for bogorol. For full names
of the abbreviations, see Supplementary Data Set.
Supplementary Figure 15. Pairwise alignments of biosynthetic gene clusters identified by
GARLIC with putative gene clusters from established producers. a. Homology searches
using the lucensomycin gene cluster from Streptomyces achromogenes NRRL 3125 against the
highly fragmented (3,223 contigs) and low coverage (largest contig: 21,706 bp) genome
sequence data of the known lucensomycin producer Streptomyces lucensis JCM 4490 revealed
likely lucensomycin biosynthesis genes on 11 contigs, including 19 sequences associated with
open reading frames. The average open reading frame percent identity for nucleotides was
90.2%. b. Homology searches using the bogorol gene cluster from Brevibacillus laterosporus
DSM 25 against the complete genome of the known bogorol producer Brevibacillus laterosporus
LMG 15441 revealed an identical biosynthetic gene cluster. The pairwise alignment revealed a
percent identity for nucleotides of 97.1%. c. Homology searches using the octacosamicin gene
cluster from Amycolatopsis spp. NAM50 against the partial genome of the known octacosamicin
producer Amycolatopsis azurea DSM 43854 revealed an identical biosynthetic gene cluster split
between two contigs. The pairwise alignment revealed a percent identity for nucleotides of
90.6%.
Supplementary Figure 16. GARLIC process pipeline of potensibactin. (a) Gene cluster of
potensibactin from PRISM; (b) GARLIC match of potensibactin; (c) LC/MS output for
potensibactin. For full names of the abbreviations, see Supplementary Data Set.
Supplementary Table 7. Comparison of the GRAPE results of known compounds with
PRISM results of their producers’ genomes based on GARLIC analysis. The full genomes of
the producers of 171 compounds were run through PRISM, and the PRISM outputs were aligned
with the GRAPE results of their produced compounds using the GARLIC program. The score
differences between the top-matched cluster in each genome and the second-best match are
listed. The score differences for octacosamicin, lucensomycin and bogorol, which are based on
our in-house genomes, are also included.
Compounds        Organism                              Score difference between top match and second-best cluster
A47934           Streptomyces toyocaensis NRRL15009    0.97
surfactin        Bacillus subtilis                     1.22
lichenysin       Bacillus licheniformis                1.46
polymyxin        Bacillus polymyxa                     0.82
pimaricin        Streptomyces natalensis               1.18
ristocetin       Amycolatopsis lurida                  0.86
fengycin         Bacillus subtilis                     1.05
salinomycin      Streptomyces albus XM211              0.61
erythromycin     Aeromicrobium erythreum               1.12
amphotericin     Streptomyces nodosus ATCC 14899       1.11
crocacin         Chondromyces crocatus                 0.66
telomycin        Streptomyces canus                    0.4
daptomycin       Streptomyces roseosporus              0.67
nodularin        Nodularia spumigena                   0.79
Octacosamicin¹   NAM50                                 0.85
FK506            Streptomyces tsukubaensis             0.49
Cephamycin C     Streptomyces clavuligerus             0.55
bacillaene       Bacillus subtilis                     0.57
Gephyronic acid  Cystobacter violaceus Cb vi76         0.49
avermectin       Streptomyces avermitilis              0.37
myxothiazol      Myxococcus fulvus                     0.37
kutznerides      Kutzneria sp. 744                     0.36
Lucensomycin¹    Streptomyces achromogenes             0.64
Acidobactin      Acidovorax citrulli DSM 17060         0.49
Bogorol¹         Brevibacillus laterosporus DSM 15441  0.9
¹ The score differences were derived from in-house genomes.
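The score difference reported in this table can be illustrated with a minimal sketch. It assumes that each genome yields one GARLIC score per PRISM-predicted cluster and that the reported value is simply the top score minus the second-best score. The function name `top_match_margin` and the example scores are hypothetical and are not taken from the GARLIC code or from the data above.

```python
from typing import Sequence


def top_match_margin(scores: Sequence[float]) -> float:
    """Return the gap between the best and second-best cluster scores.

    Assumption: a higher score means a better match between the
    compound's monomer breakdown and the predicted gene cluster.
    """
    if len(scores) < 2:
        raise ValueError("need at least two cluster scores to compute a margin")
    # Sort in descending order and keep the two highest scores.
    best, second = sorted(scores, reverse=True)[:2]
    return best - second


# Hypothetical scores for the clusters of a single genome (illustrative only).
example_scores = [0.25, 1.5, 0.75]
margin = top_match_margin(example_scores)
```

Under these assumptions, a larger margin means the top-matched cluster is more clearly separated from the other clusters in the same genome.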
Supplementary Data Set: Names and abbreviations of substrates used in the GARLIC
pipeline. Amino acids, hydroxy acids, sugars, fatty acids and site-specific tailoring are listed
with their full names and abbreviations.
Supplementary Note
High resolution mass data for in-house compounds.
Compound                         Molecular formula  Calculated m/z  Observed m/z  Δppm
Bogorol [M+H]+                   C80H143N16O16      1584.087        1584.087      0.1
Lucensomycin [M+H]+              C36H54NO13         708.3595        708.3572      3.269
Octacosamicin B [M+H]+           C32H55N4O9         639.3969        639.3947      3.463
Tauramamide methyl ester [M+H]+  C45H68N9O9         878.5140        878.5133      0.854
Acidobactin B [M+H]+             C28H48N7O15        722.3228        722.3177      2.609
Telomycin A [M+H]+               C59H78N13O19       1272.553        1272.553      0.662
Thanamycin A [M+H]+              C54H88ClN12O22     1291.582        1291.582      0.205
Potensimicin [M+H]+              C28H48NO8          526.3374        526.3356      4.565
Potensibactin [M+H]+             C29H43N8O14        727.28987       727.28958     0.403
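For readers unfamiliar with the Δppm column, the sketch below uses the conventional definition of mass error in parts per million, |observed − calculated| / calculated × 10^6. Whether the authors used exactly this formula is an assumption, and the function name `ppm_error` is illustrative. The tabulated values were presumably computed from unrounded masses, so recomputing from the rounded m/z values in the table does not reproduce them exactly (for lucensomycin, roughly 3.25 here versus 3.269 in the table).

```python
def ppm_error(calculated_mz: float, observed_mz: float) -> float:
    """Absolute mass error in parts per million, relative to the calculated m/z."""
    return abs(observed_mz - calculated_mz) / calculated_mz * 1e6


# Lucensomycin [M+H]+ values as printed (rounded) in the table above.
lucensomycin_ppm = ppm_error(708.3595, 708.3572)
```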
NMR spectroscopic data for lucensomycin (700 MHz, in DMSO-d6)¹. The structure and key COSY
and HMBC correlations of lucensomycin are shown below the table.
Position  δH (mult.)                     δC      Position  δH (mult.)                δC
1         0.86 (t, 5.9)                  13.84   19        -                         174.87
2         1.29, 1.26 (m)                 21.96   20        4.01 (m)                  65.25
3         1.25, 1.25 (m)                 27.02   21        1.82, 1.11 (d, 8.2)       44.23
4         1.61, 1.57 (m)                 33.74   22        -                         97.11
5         4.7 (m)                        72.65   23        1.58, 1.50 (d, 14)        46.26
6         2.39, 2.15 (q, 10.8, 12.2)     37.39   24        4.12 (m)                  66.05
7         5.56 (m)                       128.62  25        1.92, 1.18 (m)            40.98
8         6.01 (dd, 10.4, 4.4)           135.46  26        2.73 (m)                  58.16
9         6.15 (m)                       131.44  27        3.22 (m)                  53.79
10        6.13 (m)                       131.98  28        6.33 (dd, 6.9, 8.9)       144.27
11        6.2 (dd, 10.4, 3.5)            131.38  29        6.08 (m)                  123.88
12        6.49 (dd, 11.5, 2.7)           133.44  30        -                         164.75
13        6.08 (m)                       128.67  1'        4.32 (m)                  96.22
14        5.85 (dd, 8.8, 6.1)            136.02  2'        3.63 (m)                  69.56
15        4.36 (m)                       74.31   3'        3.37 (m)                  71.54
16        2.05, 1.54 (m)                 36.66   4'        2.59 (m)                  54.39
17        4.19 (m)                       65.2    5'        3.25 (m)                  70.46
18        1.91 (m)                       57.44   6'        1.18 (m)                  18.04
¹ Chemical shift and (multiplicity, J in Hz).
1H NMR spectrum of lucensomycin in d6-DMSO.
13C NMR spectrum of lucensomycin in d6-DMSO.
1H-13C HSQC NMR spectrum of lucensomycin in d6-DMSO.
1H-13C HSQC-TOCSY NMR spectrum of lucensomycin in d6-DMSO.
1H-13C HMBC NMR spectrum of lucensomycin in d6-DMSO.
1H-1H COSY NMR spectrum of lucensomycin in d6-DMSO.
1H-1H TOCSY NMR spectrum of lucensomycin in d6-DMSO.
1H-1H NOESY NMR spectrum of lucensomycin in d6-DMSO.
NMR spectroscopic data for potensibactin (700 MHz, in DMSO-d6)¹. The structure and key COSY
and HMBC correlations of potensibactin are shown below the table.
Position  δH (mult.)             δC      Position  δH (mult.)          δC
1         -                      149.37  19        4.28 (ov.)          55.38
2         -                      146.2   20        3.60, 3.55 (ov.)    61.66
3         6.92 (d, 7.4)          118.84  21        -                   169.67
4         6.69 (t, 7.9)          118     22        7.92 (d, 8.2)       -
5         7.3 (d, 7.8)           117.8   23        4.25 (ov.)          52.3
6         -                      115.3   24        1.68, 1.50 (ov.)    29.38
7         -                      169.74  25        1.57, 1.51 (ov.)    22.83
8         9.09 (m)               -       26        3.50, 3.41 (ov.)    46.69
9         3.95, 3.95 (m)         42.3    27        -                   170.42
10        -                      168.97  28        1.97 (s)            20.39
11        8.29 (t, 5.2)          -       29        -                   171.16
12        3.81, 3.81 (d, 4.8)    42.12   30        8.05 (ov.)          -
13        -                      168.92  31        4.3 (ov.)           49.53
14        8.03 (ov.)             -       32        1.87, 1.65 (ov.)    27.53
15        4.33 (ov.)             55.24   33        1.90, 1.85 (ov.)    20.31
16        3.60, 3.55 (ov.)       61.7    34        3.46, 3.45 (ov.)    51.23
17        -                      170.11  35        -                   164.7
18        7.94 (d, 7.4)          -
¹ Chemical shift and (multiplicity, J in Hz).
1H NMR spectrum of potensibactin in d6-DMSO.
13C NMR spectrum of potensibactin in d6-DMSO.
1H-13C HSQC NMR spectrum of potensibactin in d6-DMSO.
1H-13C HSQC-TOCSY NMR spectrum of potensibactin in d6-DMSO.
1H-13C HMBC NMR spectrum of potensibactin in d6-DMSO.
1H-1H COSY NMR spectrum of potensibactin in d6-DMSO.