    Gene Networks Underlying Cannabinoid and Terpenoid Accumulation in Cannabis¹ [OPEN]

    Jordan J. Zager,ᵃ Iris Lange,ᵃ Narayanan Srividya,ᵃ Anthony Smith,ᵇ and B. Markus Langeᵃ,²,³

    ᵃ Institute of Biological Chemistry and M.J. Murdock Metabolomics Laboratory, Washington State University, Pullman, Washington 99164-6340
    ᵇ Evio Labs, Central Point, Oregon 97502

    ORCID IDs: 0000-0001-6970-5832 (J.J.Z.); 0000-0001-7934-7987 (N.S.); 0000-0001-6565-9584 (B.M.L.).

    Glandular trichomes are specialized anatomical structures that accumulate secretions with important biological roles in plant-environment interactions. These secretions also have commercial uses in the flavor, fragrance, and pharmaceutical industries. The capitate-stalked glandular trichomes of Cannabis sativa (cannabis), situated on the surfaces of the bracts of the female flowers, are the primary site for the biosynthesis and storage of resins rich in cannabinoids and terpenoids. In this study, we profiled nine commercial cannabis strains with purportedly different attributes, such as taste, color, smell, and genetic origin. Glandular trichomes were isolated from each of these strains, and cell type-specific transcriptome data sets were acquired. Cannabinoids and terpenoids were quantified in flower buds. Statistical analyses indicated that these data sets enable the high-resolution differentiation of strains by providing complementary information. Integrative analyses revealed a coexpression network of genes involved in the biosynthesis of both cannabinoids and terpenoids from imported precursors. Terpene synthase genes involved in the biosynthesis of the major monoterpenes and sesquiterpenes routinely assayed by cannabis testing laboratories were identified and functionally evaluated. In addition to cloning variants of previously characterized genes, specifically CsTPS14CT [(−)-limonene synthase] and CsTPS15CT (β-myrcene synthase), we functionally evaluated genes that encode enzymes with activities not previously described in cannabis, namely CsTPS18VF and CsTPS19BL (nerolidol/linalool synthases), CsTPS16CC (germacrene B synthase), and CsTPS20CT (hedycaryol synthase). This study lays the groundwork for developing a better understanding of the complex chemistry and biochemistry underlying resin accumulation across commercial cannabis strains.

    ¹ This work was supported by gifts from private individuals, with no association with the cannabis industry. All work with raw materials was conducted by A.S. at a facility accredited to National Environmental Laboratory Accreditation Program standards and licensed by the Oregon Liquor Control Commission. Work of employees of Washington State University (J.J.Z., I.L., and B.M.L.) was performed in accordance with the OR/ORSO Guideline of July 2017.

    ² Author for contact: [email protected].

    ³ Senior author.

    The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: B. Markus Lange ([email protected]).

    J.J.Z., A.S., and B.M.L. designed the experiments; A.S. harvested and extracted plant materials; A.S. performed metabolite analyses; J.J.Z., I.L., and N.S. cloned terpene synthase genes and performed functional assays; J.J.Z., A.S., and B.M.L. analyzed the data; J.J.Z. and B.M.L. wrote the article, with input from all authors.

    [OPEN] Articles can be viewed without a subscription.
    www.plantphysiol.org/cgi/doi/10.1104/pp.18.01506

    Plant Physiology®, August 2019, Vol. 180, pp. 1877–1897, www.plantphysiol.org © 2019 American Society of Plant Biologists. All Rights Reserved.

    https://plantphysiol.org Downloaded on March 29, 2021. - Published by Copyright (c) 2020 American Society of Plant Biologists. All rights reserved.

    Cannabis sativa (cannabis) was originally discovered in Central Asia and has likely been cultivated for tens of thousands of years by human civilizations, with the first mention about 5,000 years ago in Chinese texts (Unschuld, 1986). Whereas the initial utility was primarily as a source of grain and fiber, strains with medicinal properties were already in use in northwest China some 2,700 years ago, as evidenced by the detection of the psychoactive cannabinoid, (−)-trans-Δ9-tetrahydrocannabinol (THC), in plant residues recovered from an ancient grave (Russo et al., 2008). Cannabis strains containing less THC but more of the nonpsychoactive cannabidiol (CBD), commonly referred to as hemp, were grown in Roman Britain for grain and fiber but later found additional uses as a medicine during the Anglo-Saxon period (Grattan and Singer, 1952). The 1925 Geneva International Opium Convention required signatories to control the trade of certain drugs (including cannabis), which was followed by increasingly restrictive resolutions by the League of Nations and later the United Nations (United Nations, 1966). Until very recently, cannabis was considered an illicit substance of abuse by many governments and could only be researched by selected, authorized scientists in tightly supervised laboratories. Despite these restrictions, evidence for the medicinal potential was sufficiently convincing that, by the mid-1980s, the synthetic cannabinoids nabilone and dronabinol had been granted approval by the U.S. Food and Drug Administration to suppress nausea during chemotherapy (Abuhasira et al., 2018). The discovery of the existence of a high-affinity cannabinoid receptor in the rat brain during the late 1980s (Devane et al., 1988) prompted further research to identify the endogenous ligands. This resulted in the characterization, beginning in the early 1990s, of several lipid-based retrograde neurotransmitters (endocannabinoids) and multiple enzymes involved in their biosynthesis, trafficking, and perception (the endocannabinoid system), which were subsequently demonstrated to regulate a multitude of physiological and cognitive processes in humans and other animals (Devane et al., 1992). With receptor targets in hand, follow-up research and clinical trials brought several additional cannabis-related products to the pharmaceutical marketplace, including nabiximols (marketed as Sativex in Canada since 2005), a cannabis extract used to treat symptoms of multiple sclerosis, and a formulation of highly purified, plant-sourced CBD (marketed as Epidiolex in the United States since early 2018) to treat certain forms of epilepsy. In the meantime, several jurisdictions and even entire countries changed their policies on cannabis, endorsing laws that allow its therapeutic use and decriminalizing or even legalizing it for recreational purposes (Abuhasira et al., 2018). Legislation has not been able to keep up with these recent developments, and specific labeling regulations with regard to the composition of active ingredients, serving sizes, and recommended doses are woefully lacking (Subritzky et al., 2016). This situation is exacerbated by an inadequate understanding of how the chemistry (cannabinoids and other specialized metabolites) of cannabis extracts and formulations relates to their biological effects.

    Since the original structural elucidation, during the early 1960s, of THC as a psychoactive principle in cannabis (Gaoni and Mechoulam, 1964), the structures of more than 90 biogenic cannabinoids have been reported to occur in members of the genus Cannabis (Andre et al., 2016), with a handful of constituents being the most prominent across strains (Fig. 1). These cannabinoids accumulate primarily in capitate-stalked glandular trichomes of female plants at the flowering stage. A second class of metabolites with high abundance and even greater chemical diversity in cannabis glandular trichomes are monoterpenes and sesquiterpenes (Fig. 1; Brenneisen, 2007). These volatile terpenoids are responsible for the distinctive aromas of different cannabis strains. The popular press and trade magazines liberally use the term "entourage effect" to suggest that synergism among cannabinoids or between cannabinoids and other constituents (in particular terpenoids) may contribute to different psychological perceptions of cannabis varieties by users. In support of this view, β-caryophyllene, a sesquiterpene with almost ubiquitous occurrence in plant oils and resins, was demonstrated to bind with high affinity to the CB2 cannabinoid receptor and has therefore been referred to as a dietary cannabinoid (Gertsch et al., 2008). However, there is only limited clinical evidence for entourage effects of terpenoids in cannabis formulations (Gertsch et al., 2010; Russo, 2011). Irrespective of these considerations, the chemical composition of each cannabis strain is unique, and acquiring a metabolic fingerprint is an excellent first step in building a more robust scientific foundation for assessing the correlation between the composition of plant material and the perception by users (Fischedick et al., 2010).

    Figure 1. Shared origin of the cannabinoid and terpenoid biosynthetic pathways. A circled P denotes phosphate moieties.

    Most of the cannabis products traded licitly or illicitly today are sourced from strains for which minimal documentation is available in the public domain and for which the primary goal was clearly to breed high-THC strains (Cascini et al., 2012). In other words, the genetics underlying chemical diversity in commercial cannabis strains is currently poorly understood (Welling et al., 2016). In this context, it is interesting that cannabinoids and terpenoids share a common biosynthetic origin. The biosynthesis of the prominent cannabinoids involves two direct precursor pathways. The polyketide pathway gives rise to olivetolic acid from a short-chain fatty acid intermediate (hexanoyl-CoA), whereas the methylerythritol 4-phosphate (MEP) pathway provides geranyl diphosphate (GPP; Fig. 1; Fellermeier et al., 2001; Taura et al., 2009; Gagne et al., 2012; Stout et al., 2012). An aromatic prenyltransferase catalyzes the formation of cannabigerolic acid from olivetolic acid and GPP (Fellermeier and Zenk, 1998; Page and Boubakir, 2012). The pathway then branches again toward different cyclized products, such as tetrahydrocannabinolic acid (THCA), cannabidiolic acid (CBDA), and cannabichromenic acid (Fig. 1; Sirikantaramas et al., 2005; Taura et al., 2007). Decarboxylated products of these acids are formed nonenzymatically by exposure to heat. Plant monoterpenes are mostly derived from the plastid-localized MEP pathway, whereas the cytosolic/peroxisomal mevalonate pathway is a common source of precursors for sesquiterpenes, although cross talk between both pathways has also been reported (Fig. 1; Hemmerlin et al., 2012). Terpene synthases catalyze the first committed step in the biosynthesis of a specific terpenoid from a prenyl diphosphate precursor of the appropriate chain length. To date, monoterpene synthases (accepting a C10 precursor) and sesquiterpene synthases (acting on a C15 precursor) that are responsible for the production of about half a dozen terpenoids in cannabis have been reported (Fig. 1; Günnewich et al., 2007; Booth et al., 2017), with many more awaiting functional characterization. In this article, we report the chemical profiles and corresponding gene networks across several cannabis strains, thereby building the foundation for a better understanding of their chemical and biochemical diversity.

    RESULTS

    Strategic Considerations for Logistics, Strain Selection, and Experimental Design

    One of the goals of this pilot study was to test the utility of combining metabolic and transcriptomic data to differentiate cannabis strains with regard to the most relevant traits. To ensure the consistency of data sets, all plant materials were sourced from the same facility, where they had been maintained under comparable growth conditions (Shadowbox Farms in Williams, Oregon). Plant harvest was performed when the appearance of glandular trichome content had changed from a turbid white to clear and before another change to an amber-like color occurred. For most strains, the pistils had changed color from white to yellow or orange. These are the visual cues used by experienced growers to indicate optimal harvest time. All further processing was performed with fresh (uncured) material to avoid the previously reported loss of terpenoid volatiles during drying (Ross and ElSohly, 1996). Cannabinoids and terpenoids were extracted and quantified at a testing facility licensed according to the National Environmental Laboratory Accreditation Program's TNI 2009 Standard (Evio Labs). At this facility, fractions highly enriched in glandular trichomes were obtained and RNA was isolated, with minor modifications, using previously established protocols (Lange et al., 2000). Glandular trichome-specific RNA sequencing (RNA-seq) data were then acquired by a commercial service provider (Quick Biology). Metabolite and transcriptome data were acquired for three biological replicates per strain.

    This study involved a selection of strains, some of C. sativa ancestry, whereas Cannabis indica (formally classified as C. sativa forma indica) was dominant in others (Fig. 2). Strains of C. sativa provenance are generally characterized by fairly thin and narrow leaves, comparatively longer flowering cycles, and a relatively tall stature. A typical example in this study is Mama Thai, which is generally considered a landrace of C. sativa. In contrast, C. indica strains ordinarily have large and thick leaves, a rather short flowering cycle (6–8 weeks), and a proportionately short habitus (Fig. 2A). Our pilot study featured Blackberry Kush as a C. indica dominant strain. The remaining strains were hybrids of mixed C. sativa and C. indica lineage, plus one strain (Terple) with poorly documented origin (Fig. 2B).


    To address our goal of assessing the utility of our data for classifying strains, RNA-seq and chemical data (cannabinoid and terpenoid profiles) were subjected to multivariate statistical analyses. We then tested the hypothesis that cannabinoid and terpenoid pathways are coregulated by performing gene coexpression network analyses. A combination of gene network and phylogenetic analyses was subsequently used to identify candidate genes for hitherto uncharacterized terpene synthases that contribute significantly to the cannabis volatile bouquet.
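    The idea behind a coexpression test can be illustrated with a minimal sketch. It assumes Pearson correlation across samples as the similarity measure and a hypothetical cutoff of 0.9; this section does not state which measure or threshold was used in the study, and the gene names and expression values below are invented purely for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expression, cutoff=0.9):
    """Return gene pairs whose profiles correlate at or above the cutoff.

    The cutoff is an assumption of this sketch, not a value from the article.
    """
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= cutoff:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression values across five samples (not data from the study).
expression = {
    "cannabinoid_gene": [10.0, 20.0, 30.0, 40.0, 50.0],
    "terpenoid_gene": [12.0, 19.0, 33.0, 41.0, 48.0],
    "unrelated_gene": [50.0, 10.0, 40.0, 20.0, 30.0],
}

edges = coexpression_edges(expression)
```

    In this toy input only the first two genes rise and fall together, so they form the single edge of the resulting network; real analyses work on thousands of transcripts and usually apply additional filtering.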

    Strain Differentiation Based on RNA-Seq Data

    High-quality libraries reflecting transcripts expressed in isolated glandular trichomes were subjected to RNA-seq analysis (nine strains, three biological replicates each, 27 samples total) on the Illumina HiSeq 4000 platform. A de novo consensus transcriptome assembly was generated using the Trinity suite (Haas et al., 2013; assembly statistics are given in Supplemental Table S1). The reads were assembled into contigs covering a total of 305 Mb of sequence with a GC content of 40.4%. The resulting assembly comprised 514,208 contigs of at least 201 bp in length and had an N50 value of 833 bp (contigs of this length or longer contain at least 50% of the total transcriptome sequence). The assembled transcriptome data set was searched against the National Center for Biotechnology Information nonredundant protein database, which resulted in the annotation of 82,523 sequences at e-values < 1e-5. Read counts for each transcript in each sample were then processed with the RSEM software package (Li and Dewey, 2011) to calculate normalized expression levels as transcripts per kilobase million (TPM). Transcripts with TPM values lower than 5 across all varieties were removed from subsequent analysis, resulting in 46,559 predicted genes with significant expression (Supplemental Table S2).

    Figure 2. Characteristics of cannabis strains. A, Floral phenotypes. B, Origins and aroma descriptions (according to https://www.leafly.com).

    As a first step to investigate the utility of RNA-seq for strain categorization, transcriptome data sets were subjected to principal component analysis (PCA), a statistical procedure that reduces attribute space from a larger number of variables to a smaller number of so-called principal components, thereby decreasing the dimensionality of the original data. The first three principal components accounted for 83% of the variability in the data set (Fig. 3A). The replicates for each strain clustered together in a three-dimensional PCA plot, whereas the component scores for each strain were separated from those of all other strains, indicating that the overall transcriptome of each strain is unique (Fig. 3A). Processing of RNA-seq data by hierarchical clustering analysis (HCA), which builds a cluster hierarchy that is commonly displayed as a dendrogram, grouped strains into two major clades (Fig. 3B). The first clade contained Blackberry Kush, Cherry Chem, and Terple, whereas the second consisted of Mama Thai, White Cookies, Valley Fire, Black Lime, Canna Tsu, and Sour Diesel, indicating a clear separation of strains by heritage (C. indica for clade 1 and C. sativa for clade 2).
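    Two of the assembly and quantification steps described in this section, the N50 statistic and the removal of transcripts below 5 TPM, can be sketched in a few lines. This is an illustration under stated assumptions rather than the pipeline used in the study: the function names are hypothetical, the contig lengths and TPM values are invented, and the filter is assumed to operate on per-sample TPM values (the text does not say whether per-sample or per-strain averages were used).

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled sequence."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must not be empty")


def filter_low_expression(tpm_by_transcript, min_tpm=5.0):
    """Drop transcripts whose TPM is below min_tpm in every sample."""
    return {
        name: values
        for name, values in tpm_by_transcript.items()
        if any(v >= min_tpm for v in values)
    }


# Hypothetical contig lengths in bp (not the study's assembly).
lengths = [1200, 900, 833, 500, 300, 250, 201]
assembly_n50 = n50(lengths)

# Hypothetical TPM values for three samples per transcript.
tpm = {
    "contig_1": [0.5, 1.2, 4.9],      # below 5 TPM everywhere: removed
    "contig_2": [3.0, 7.5, 2.1],      # reaches 5 TPM in one sample: kept
    "contig_3": [120.0, 98.4, 110.3],
}
kept = filter_low_expression(tpm)
```

    Sorting contigs from longest to shortest and stopping once half of the total length is covered is the usual way to obtain N50, which is why a few long contigs can raise the value even when most contigs are short.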

    Strain Differentiation Based on Metabolite Profiling Data

    The highly robust analytical platforms that served asthe basis for the analysis of six cannabinoids and 24terpenoids were described in a previous report(Fischedick et al., 2010) and used here with minormodifications. Cannabinoid concentration was highestin White Cookies (28.4% of flower bud dry weight),with relatively high contents also occurring in CherryChem (17.7%), Black Lime (17.5%), Backberry Kush(15.8%), Valley Fire (15.7%), Terple (15.6%), Sour Diesel(12.4%), and Canna Tsu (12.2%; Table 1). Significantlylower concentrations were detected in Mama Thai(6.4%). In eight of the nine strains investigated, THCAwas the major cannabinoid, ranging from 26.3% of theflower bud dry weight in White Cookies to 5.9% inMama Thai (Table 1). The only exceptionwas the Canna

    Figure 3. Cannabis strain differentiation based on glandular trichome-specific RNA-seq data. A, Three-dimensional plot rep-resenting outcomes of a PCA. B, Heat map of a two-way HCA. The numerical values and red-white-blue color code indicate thelog2 fold change comparedwith the average gene expression value across all strains. Strain abbreviations at the bottom of B are asfollows: BB, Blackberry Kush; BL, Black Lime; CC, Cherry Chem; CT, Canna Tsu; MT, Mama Thai; SD, Sour Diesel; T, Terple; VF,Valley Fire; WC, White Cookies.

    Plant Physiol. Vol. 180, 2019 1881

    Coregulation of Cannabinoid and Terpenoid Pathways

    https://plantphysiol.orgDownloaded on March 29, 2021. - Published by Copyright (c) 2020 American Society of Plant Biologists. All rights reserved.

    http://www.plantphysiol.org/cgi/content/full/pp.18.01506/DC1https://plantphysiol.org

  • Tab

    le1.Constituen

    tsofca

    nnab

    isfemaleflower

    buds(m

    etab

    olite

    contentin

    ninestrainsex

    pressed

    aspercentage

    ofdry

    weigh

    t)

    n.d.,Notdetec

    table.

    Metab

    olite

    Blackberry

    Kush

    Black

    Lime

    Can

    naTsu

    Cherry

    Chem

    ValleyFire

    Mam

    maThai

    SourDiesel

    Terple

    WhiteCookies

    Can

    nab

    inoids

    THCA

    13.566

    0.90

    15.026

    1.10

    3.196

    0.20

    16.556

    0.81

    13.896

    1.33

    5.916

    0.60

    11.316

    1.04

    13.726

    1.36

    26.336

    0.54

    Tetrah

    ydroca

    nnab

    inol

    0.316

    0.02

    1.626

    0.19

    0.556

    0.055

    0.156

    0.008

    0.416

    0.049

    0.146

    0.02

    0.226

    0.027

    1.156

    0.12

    0.866

    0.091

    CBDA

    0.456

    0.02

    0.126

    0.012

    7.766

    0.63

    0.0796

    0.007

    0.0376

    0.001

    0.0166

    0.003

    0.0326

    0.002

    0.0676

    0.002

    0.0886

    0.004

    CBD

    0.956

    0.07

    0.1396

    0.016

    0.0856

    0.013

    0.0796

    0.008

    0.126

    0.004

    0.0476

    0.005

    0.0866

    0.006

    0.116

    0.008

    0.0986

    0.013

    Can

    nab

    igerol

    0.126

    0.015

    0.0866

    0.008

    0.0936

    0.008

    0.0516

    0.005

    0.156

    0.027

    0.0166

    0.001

    0.0526

    0.004

    0.0936

    0.002

    0.256

    0.005

    Can

    nab

    inol

    1.746

    0.20

    0.556

    0.019

    0.536

    0.051

    0.836

    0.019

    1.126

    0.14

    0.296

    0.028

    0.686

    0.033

    0.5026

    0.007

    0.786

    0.025

    Can

    nab

    ichromen

    en.d.

    n.d.

    n.d.

    n.d.

    n.d.

    n.d.

    n.d.

    n.d.

    n.d.

    Totalca

    nnab

    inoids

    15.876

    1.13

    17.536

    1.25

    12.206

    0.85

    17.746

    0.82

    15.706

    1.53

    6.416

    0.65

    12.3876

    1.05

    15.646

    1.47

    28.406

    0.54

    Monoterpen

    es?-Myrce

    ne

    2.356

    0.2

    4.346

    0.36

    1.706

    0.15

    1.616

    0.049

    2.246

    0.28

    0.116

    0.009

    0.706

    0.046

    2.966

    0.25

    1.146

    0.17

    (2)-Limonen

    e0.296

    0.02

    0.896

    0.08

    0.166

    0.021

    0.236

    0.015

    0.656

    0.098

    0.036

    0.003

    0.176

    0.011

    0.236

    0.019

    1.536

    0.24

    ?-Pinen

    e0.0156

    0.001

    1.996

    0.12

    0.386

    0.039

    0.0166

    0.001

    0.0446

    0.008

    0.0076

    0.001

    0.0046

    00.826

    0.051

    0.206

    0.032

    ?-Pinen

    e0.0866

    0.005

    0.506

    0.034

    0.186

    0.025

    0.0566

    0.003

    0.116

    0.013

    0.0266

    0.002

    0.0396

    0.002

    0.316

    0.022

    0.046

    0.007

    1,8-Cineo

    le0.266

    0.02

    0.386

    0.038

    0.526

    0.075

    0.4646

    0.012

    0.226

    0.028

    0.0576

    0.007

    0.116

    0.011

    0.006

    00.316

    0.037

    Linalool

    0.0826

    0.005

    0.0796

    0.004

    0.0526

    0.005

    0.136

    0.003

    0.166

    0.027

    0.0236

    0.002

    0.0746

    0.005

    0.0676

    0.006

    0.576

    0.072

    Terpinolene

    0.0196

    0.001

    0.0346

    0.003

    0.0196

    0.002

    0.0196

    0.001

    0.026

    0.003

    0.136

    0.016

    0.0176

    0.001

    0.026

    0.002

    0.0416

    0.006

    Borneo

    l0.0396

    0.002

    0.0416

    0.003

    n.d.

    0.0326

    0.002

    0.0336

    0.005

    0.0216

    0.002

    0.0266

    0.002

    0.0366

    0.002

    0.0486

    0.008

    ?-Ocimen

    en.d.

    0.0396

    0.003

    n.d.

    n.d.

    0.0066

    0.001

    0.136

    0.014

    n.d.

    0.0866

    0.007

    0.0156

    0.002

    | Compound | Blackberry Kush | Black Lime | Canna Tsu | Cherry Chem | Valley Fire | Mama Thai | Sour Diesel | Terple | White Cookies |
    |---|---|---|---|---|---|---|---|---|---|
    | Camphene | n.d. | 0.089 ± 0.008 | 0.055 ± 0.007 | n.d. | 0.004 ± 0.001 | n.d. | n.d. | 0.019 ± 0.002 | 0.07 ± 0.012 |
    | δ-3-Carene | 0.029 ± 0.002 | 0.052 ± 0.006 | 0.003 ± 0 | 0.008 ± 0.001 | 0.022 ± 0.003 | 0.003 ± 0.001 | n.d. | 0.027 ± 0.002 | 0.016 ± 0.002 |
    | Camphor | 0.044 ± 0.003 | 0.006 ± 0.001 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | 0.101 ± 0.013 |
    | (+)-Terpinene | 0.001 ± 0.001 | n.d. | n.d. | n.d. | n.d. | 0.005 ± 0.001 | n.d. | n.d. | 0.002 ± 0 |
    | Total monoterpenes | 3.23 ± 0.26 | 8.43 ± 0.66 | 3.07 ± 0.32 | 2.56 ± 0.085 | 3.52 ± 0.47 | 0.54 ± 0.057 | 1.14 ± 0.078 | 4.57 ± 0.36 | 4.09 ± 0.60 |
    | Sesquiterpenes | | | | | | | | | |
    | β-Caryophyllene | 0.13 ± 0.01 | 0.24 ± 0.023 | 0.21 ± 0.022 | 0.74 ± 0.012 | 0.23 ± 0.034 | 0.12 ± 0.013 | 0.45 ± 0.026 | 0.15 ± 0.009 | 0.60 ± 0.068 |
    | α-Humulene | 0.03 ± 0.002 | 0.06 ± 0.005 | 0.051 ± 0.005 | 0.20 ± 0.011 | 0.087 ± 0.014 | 0.068 ± 0.008 | 0.19 ± 0.009 | 0.058 ± 0.003 | 0.15 ± 0.018 |
    | Nerolidol | n.d. | 0.06 ± 0.004 | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. |
    | Total sesquiterpenes | 0.16 ± 0.015 | 0.361 ± 0.032 | 0.26 ± 0.027 | 0.93 ± 0.019 | 0.32 ± 0.048 | 0.19 ± 0.021 | 0.64 ± 0.035 | 0.21 ± 0.012 | 0.75 ± 0.086 |
    | Total terpenoids | 3.39 ± 0.27 | 8.79 ± 0.69 | 3.33 ± 0.35 | 3.49 ± 0.10 | 3.84 ± 0.51 | 0.73 ± 0.078 | 1.78 ± 0.11 | 4.78 ± 0.38 | 4.83 ± 0.69 |

    1882 Plant Physiol. Vol. 180, 2019

    Zager et al.

    https://plantphysiol.orgDownloaded on March 29, 2021. - Published by Copyright (c) 2020 American Society of Plant Biologists. All rights reserved.

    Tsu strain, in which CBDA (7.8% of flower bud dry weight) dominated over THCA (3.2%), whereas CBDA in all other strains remained at 1% or less. Two additional cannabinoids of fairly high abundance were cannabinol, which accumulated to 0.2% to 1.7% of flower bud dry weight, and tetrahydrocannabinol, which amounted to 0.2% to 1.6% (Table 1; for structures, see Fig. 1). Cannabichromene was not detected in any of the sampled varieties.

    Terpenoid content was highest in Black Lime (8.8% of flower bud dry weight), with fairly high contents also occurring in White Cookies (4.8%), Terple (4.8%), Valley Fire (3.8%), Cherry Chem (3.5%), Blackberry Kush (3.4%), and Canna Tsu (3.3%; Table 1). Significantly lower concentrations were detected in Sour Diesel (1.8%) and Mama Thai (0.7%). The monoterpene (C10)-to-sesquiterpene (C15) ratio was generally very high (greater than 10), with only three strains in which the ratio was below 3 (Cherry Chem, Mama Thai, and Sour Diesel; Table 1). It should be noted that this ratio only applies to the terpenoids we were able to quantify based on the availability of authentic standards. β-Myrcene was the most abundant monoterpene in most strains (up to 4.3% of flower bud dry weight in Black Lime). The only exceptions were Mama Thai (generally low terpenoid contents, with terpinolene as the most abundant monoterpene at 0.1%) and White Cookies (with limonene at 1.5%; Table 1). Limonene content was also high in Black Lime (0.9%) and Valley Fire (0.7%). α-Pinene and β-pinene amounts were quite high in Black Lime (2% and 0.5%, respectively). 1,8-Cineole was particularly abundant in Canna Tsu and Cherry Chem (0.5% in both; Table 1). All other monoterpenes had concentrations below 0.2%. All strains contained sesquiterpenes, of which β-caryophyllene was consistently the most abundant (0.1%–0.7% of flower bud dry weight). α-Humulene was also detectable in all strains (less than 0.2%), whereas Black Lime was the only strain in which the nerolidol concentration rose above the limit of quantitation (less than 0.1%; Table 1).

    Processing of the metabolite data (cannabinoid and terpenoid profiles) by PCA resulted in a clear separation of the strains, with individual biological replicates clustering closely together (Fig. 4A). Remarkably, 99% of the data variation across genotypes was captured by the first three principal components. Application of orthogonal projections to latent structures discriminant analysis (OPLS-DA), a statistical modeling tool used commonly in metabolomics research (Worley and Powers, 2013), indicated a separation of strains into two groups based on our metabolite profiling data, one representing the C. indica-dominant strains, whereas the other constituted the C. sativa-dominant strains (Fig. 4B). Biological replicates for each strain once again clustered together, whereas significant separation was observed across strains. In summary, glandular trichome-specific gene expression and metabolite data were consistent in differentiating cannabis strains.
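    The monoterpene-to-sesquiterpene ratios quoted above can be recomputed from the mean totals listed in Table 1. The following Python sketch is an illustration only and is not part of the original analysis. It assumes that the Table 1 columns follow the strain order used in Table 3, and the names TOTALS and mono_to_sesqui_ratio are hypothetical.

```python
# Mean totals (% of flower bud dry weight) transcribed from Table 1 as
# (total monoterpenes, total sesquiterpenes). The strain-to-column
# assignment is an assumption inferred from the column order of Table 3
# and from the total terpenoid contents quoted in the text.
TOTALS = {
    "Blackberry Kush": (3.23, 0.16),
    "Black Lime": (8.43, 0.361),
    "Canna Tsu": (3.07, 0.26),
    "Cherry Chem": (2.56, 0.93),
    "Valley Fire": (3.52, 0.32),
    "Mama Thai": (0.54, 0.19),
    "Sour Diesel": (1.14, 0.64),
    "Terple": (4.57, 0.21),
    "White Cookies": (4.09, 0.75),
}


def mono_to_sesqui_ratio(totals):
    """Return {strain: total monoterpenes / total sesquiterpenes}."""
    return {strain: mono / sesqui for strain, (mono, sesqui) in totals.items()}


ratios = mono_to_sesqui_ratio(TOTALS)
# Strains whose ratio falls below the threshold of 3 mentioned in the text.
low_ratio_strains = sorted(s for s, r in ratios.items() if r < 3)

if __name__ == "__main__":
    for strain, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{strain:16s} {ratio:5.1f}")
    print("Ratio below 3:", ", ".join(low_ratio_strains))
```

    With these means, the strains below a ratio of 3 are Cherry Chem, Mama Thai, and Sour Diesel, in agreement with the text. White Cookies comes out near 5.5, between the two thresholds, which fits the statement that the ratio was "generally" rather than always above 10.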

    Evidence for Coexpression of Cannabinoid and Terpenoid Pathways

    Our glandular trichome RNA-seq data sets were filtered to eliminate genes with consistently low expression levels (below 50 TPM), thereby retaining roughly 16,000

    Figure 4. Cannabis strain differentiation based on cannabinoid and terpenoid profiles. A, Three-dimensional plot representing outcomes of a PCA. B, Two-dimensional plot of the outcomes of OPLS-DA.

    expressed genes with significant expression levels in at least one strain. Gene abundance across strains was then evaluated using the weighted gene correlation network analysis (WGCNA) package in R (Langfelder and Horvath, 2008), which resulted in the binning of genes (only those with Spearman correlation coefficients [SCCs] ≥ 0.8 were considered) into seven coexpression modules (Supplemental Table S3). Further analysis using the moduleEigengenes function indicated that the accumulation of CBDA, the signature cannabinoid of the Canna Tsu strain, was highly correlated (SCC of 0.97, P value of 2e-17) with one of the coexpression modules (indicated by brown color in Fig. 5A). Interestingly, this module contained the gene coding for CBDA synthase, the enzyme responsible for the conversion of cannabigerolic acid to CBDA (Table 2). An analogous analysis for THCA or THC (which correlated with a module indicated by yellow color in Fig. 5A) and THCA synthase was not possible, because single-nucleotide polymorphisms in this gene (and not lack of expression) result in an inactive enzyme in strains that accumulate primarily CBDA (Kojoma

    Figure 5. Coexpression of genes involved in cannabinoid and terpenoid biosynthesis. A, WGCNA of glandular trichome-specific RNA-seq data categorizes transcripts into eight color-coded modules (for gene lists, see Supplemental Table S3). B, Correlation of WGCNA modules with metabolites. A color code is used to visualize the SCCs for each module-metabolite pair, with red color representing positive and blue color indicating negative SCCs. C, Genes involved in cannabinoid and terpenoid biosynthesis are enriched in the yellow coexpression module obtained by WGCNA. Color code for pathways: light blue, hexanoate formation; dark green, precursors for monoterpenes; light green, monoterpene synthases; orange, sesquiterpenes; dark blue, cannabinoids; cyan, remaining genes. D, Functional context of genes highlighted in C in a simplified metabolic pathway scheme. AAE1, Acyl-activating enzyme for short-chain fatty acids; Ac-CoA, acetyl-CoA; ACC1, acetyl-CoA carboxylase; CsTPS1FN/CsTPS14CT, (−)-limonene synthase; CsTPS2SK, (+)-α-pinene synthase; CsTPS3FN/CsTPS15CT, β-myrcene synthase; CsTPS16CC, germacrene B synthase; DHAP, dihydroxyacetone phosphate; DXS, 1-deoxy-D-xylulose-5-phosphate synthase; ENO, enolase; FNR-Root, ferredoxin-NADP+ reductase (isoform of roots and glandular trichomes); FPPS, farnesyl diphosphate synthase; GAP, glyceraldehyde-3-phosphate; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; GPP, geranyl diphosphate; GPPS, geranyl diphosphate synthase; KR, β-ketoacyl reductase (fatty acid synthase complex); OA, olivetolic acid; PDH, pyruvate dehydrogenase; PFK, phosphofructokinase; PGI, phosphoglucoisomerase; PGM, phosphoglucomutase; PK, pyruvate kinase; PT1, cannabigerolic acid synthase; Pyr, pyruvate; THCAS, tetrahydrocannabinolic acid synthase; TPI, triose phosphate isomerase.

    Table 2. Transcript abundance (in TPM) for genes involved in the biosynthesis of cannabinoids and terpenoids in cannabis strains

    n.d., Not detectable.

    | Gene Annotation | UniProt Identifier | Blackberry Kush | Black Lime | Canna Tsu | Cherry Chem | Mama Thai | Sour Diesel | Terple | Valley Fire | White Cookies |
    |---|---|---|---|---|---|---|---|---|---|---|
    | Cannabinoid pathway | | | | | | | | | | |
    | Acyl activating enzyme 1 | H9A1V3_CANSA | 80.63 | 160.44 | 316.06 | 840.92 | 377.29 | 397.99 | 93.59 | 188.84 | 229.65 |
    | Olivetol synthase | OLIS_CANSA | 3,946.85 | 9,454.00 | 10,400.03 | 14,619.66 | 17,955.05 | 4,984.60 | 9,706.06 | 11,374.75 | 12,373.11 |
    | Geranyl diphosphate:olivetolate geranyltransferase | CsPT1 | 422.42 | 222.42 | 189.43 | 407.76 | 649.37 | 263.62 | 246.13 | 175.87 | 115.21 |
    | CBDA synthase | CBDAS_CANSA | n.d. | n.d. | 1282.46 | n.d. | n.d. | 18.39 | n.d. | n.d. | n.d. |
    | THCA synthase | THCAS_CANSA | 885.17 | 423.29 | 1203.31 | 2321.64 | 2317.68 | 1557.54 | 619.22 | 309.23 | 524.08 |
    | MEP pathway | | | | | | | | | | |
    | 1-Deoxy-D-xylulose-5-phosphate synthase | A0A1V0QSH6_CANSA | 221.85 | 284.41 | 412.74 | 1627.02 | 319.76 | 1751.70 | 533.57 | 288.69 | 16.32 |
    | 1-Deoxy-D-xylulose 5-phosphate reductoisomerase | A0A1V0QSG8_CANSA | 172.63 | 228.15 | 185.07 | 667.96 | 304.62 | 117.92 | 176.79 | 256.25 | 16.01 |
    | 2-C-Methyl-D-erythritol 4-phosphate cytidylyltransferase | A0A1V0QSI6_CANSA | 36.77 | 95.99 | 96.25 | 168.24 | 160.40 | 146.38 | 46.96 | 75.40 | 64.73 |
    | 4-(Cytidine 5′-diphospho)-2-C-methyl-D-erythritol kinase | A0A1V0QSI2_CANSA | 35.20 | 3.70 | 67.94 | 211.85 | 212.43 | 109.88 | 57.60 | 104.05 | 80.23 |
    | 2-C-Methyl-D-erythritol 2,4-cyclodiphosphate synthase | G9C075_HUMLU | 67.75 | 118.23 | 315.86 | 338.21 | 184.98 | 419.84 | 69.75 | 171.17 | 207.15 |
    | (E)-4-Hydroxy-3-methylbut-2-enyl-diphosphate synthase | A0A1V0QSG3_CANSA | 107.65 | 287.57 | 794.25 | 744.09 | 444.09 | 596.36 | 349.56 | 297.07 | 317.55 |
    | (E)-4-Hydroxy-3-methylbut-2-enyl-diphosphate reductase | A0A1V0QSH9_CANSA | 1,485.98 | 561.96 | 3,447.50 | 3,468.57 | 3,090.49 | 3,024.22 | 1,889.37 | 1,031.90 | 4,544.35 |
    | Isopentenyl diphosphate isomerase | A0A1V0QSG5_CANSA | 165.10 | 272.72 | 433.46 | 1,836.07 | 306.03 | 347.85 | 476.86 | 509.70 | 9.96 |
    | Mevalonate pathway | | | | | | | | | | |
    | Acetoacetyl-CoA thiolase | A0A1V0QSH3_CANSA | 38.35 | 11.90 | 253.38 | 302.58 | 313.99 | 134.71 | 252.40 | 54.35 | 248.13 |
    | 3-Hydroxy-3-methylglutaryl-CoA synthase | A0A1V0QSH3_CANSA | 13.44 | 22.98 | 20.81 | 21.60 | 27.81 | 34.33 | 9.24 | 19.32 | 91.24 |
    | 3-Hydroxy-3-methylglutaryl-CoA reductase | A0A1V0QSF5_CANSA | 26.69 | 56.93 | 21.92 | 43.41 | 29.05 | 107.71 | 19.75 | 69.30 | 48.26 |
    | Mevalonate kinase | A0A1V0QSI0_CANSA | 1.63 | 1,449.32 | 3.63 | 3.41 | 5.81 | 4.75 | 2.45 | 5.93 | 5.05 |
    | Phosphomevalonate kinase | A0A1V0QSH8_CANSA | 3.68 | 7.58 | 7.99 | 6.63 | 8.09 | 6.03 | 3.81 | 7.40 | 305.27 |
    | Mevalonate diphosphate decarboxylase | A0A1V0QSG4_CANSA | 5.00 | 11.89 | 10.21 | 14.89 | 21.24 | 19.39 | 9.67 | 9.64 | 9.96 |

    et al., 2006; Laverty et al., 2019; Table 2). Interestingly, the THCA synthase sequences were essentially identical, with the exception of that of the Canna Tsu strain, the only CBDA accumulator in our pilot study (Supplemental Fig. S1). Consequently, a full-length CBDA synthase gene was expressed only in the Canna Tsu strain (Supplemental Fig. S2), which is novel information that furthers our understanding of the mechanisms underlying CBDA accumulation. Finally, the yellow-colored module (which as mentioned above contained THCA synthase) also comprised cannabigerolic acid synthase (Table 2), the gene preceding THCA synthase in the cannabinoid pathway (Fig. 1), thereby providing additional evidence for gene-to-metabolite correlation in the cannabinoid pathway.
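    The expression filter described at the start of this subsection (removal of genes that stay below 50 TPM) can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the authors' pipeline: the rows are transcribed from Table 2, "n.d." is entered as 0.0, a gene is kept when it reaches the threshold in at least one strain, and the names TPM and filter_low_expression are hypothetical.

```python
# Illustrative rows transcribed from Table 2 (TPM per strain, in the table's
# column order). Representing "n.d." as 0.0 is an assumption of this sketch.
TPM = {
    "CBDA synthase": [0.0, 0.0, 1282.46, 0.0, 0.0, 18.39, 0.0, 0.0, 0.0],
    "3-Hydroxy-3-methylglutaryl-CoA synthase": [
        13.44, 22.98, 20.81, 21.60, 27.81, 34.33, 9.24, 19.32, 91.24,
    ],
    "Mevalonate kinase": [1.63, 1449.32, 3.63, 3.41, 5.81, 4.75, 2.45, 5.93, 5.05],
    "Phosphomevalonate kinase": [3.68, 7.58, 7.99, 6.63, 8.09, 6.03, 3.81, 7.40, 305.27],
    "Mevalonate diphosphate decarboxylase": [
        5.00, 11.89, 10.21, 14.89, 21.24, 19.39, 9.67, 9.64, 9.96,
    ],
}


def filter_low_expression(tpm_by_gene, threshold=50.0):
    """Keep genes whose TPM reaches `threshold` in at least one strain."""
    return {
        gene: values
        for gene, values in tpm_by_gene.items()
        if max(values) >= threshold
    }


retained = filter_low_expression(TPM)
dropped = sorted(set(TPM) - set(retained))

if __name__ == "__main__":
    print("Retained:", ", ".join(retained))
    print("Dropped:", ", ".join(dropped))
```

    Under this reading of the filter, a gene such as CBDA synthase is retained because of its expression in a single strain, whereas mevalonate diphosphate decarboxylase, which stays below 50 TPM in all nine strains, would be removed.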

    We then asked if similar gene-to-metabolite correlations occurred in the terpenoid pathway. Interestingly, two coexpression modules (indicated by black and yellow color in Fig. 5A) correlated with β-myrcene accumulation (Fig. 5B). This metabolite is formed by a monoterpene synthase encoded by the CsTPS3FN gene (Booth et al., 2017), which was contained in one of these modules (yellow color in Fig. 5A; Table 3). Analogous gene-to-metabolite correlations were observed for limonene and CsTPS1FN, α-pinene and CsTPS2FN, β-ocimene and CsTPS6FN, and β-caryophyllene/α-humulene and CsTPS9FN (color of modules in Fig. 5A: black, yellow, and yellow, turquoise, respectively; terpene synthase annotation based on Günnewich et al. [2007] and Booth et al. [2017]; Fig. 5B). Transcripts corresponding to CsTPS5FN (β-myrcene/α-pinene synthase), CsTPS4FN (alloaromadendrene synthase), CsTPS8FN (γ-eudesmol/valencene synthase), and CsTPS13PK (a second β-ocimene synthase; Booth et al., 2017) remained below the threshold expression level in our data sets. The corresponding terpenoids were not detected in the strains investigated, indicating that the expressed gene complement was generally sufficient to account for the presence of the major terpenoids (Table 3). Linalool and nerolidol were exceptions for which the corresponding terpene synthases had hitherto not been identified from cannabis. Notably, genes involved in the formation of these terpenoids (and others) were cloned and functionally characterized as part of this study, which contributes significantly to a better understanding of the genetic underpinnings of terpenoid diversity.
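    The statistic behind these comparisons, the Spearman correlation coefficient, is the Pearson correlation of rank-transformed values. The Python sketch below illustrates the statistic only; the study itself ran WGCNA in R on the full transcriptome data set. The two input rows are strain means transcribed from Table 3 with "n.d." entered as 0.0 (an assumption), and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share one rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Two transcript rows transcribed from Table 3 (TPM; "n.d." entered as 0.0).
CSTPS15CT = [183.29, 597.88, 325.85, 272.65, 157.78, 254.10, 183.29, 436.63, 0.0]
CSTPS5FN = [217.59, 640.97, 483.09, 547.24, 157.78, 445.85, 125.94, 472.33, 50.51]

rho = spearman(CSTPS15CT, CSTPS5FN)
passes_threshold = rho >= 0.8  # SCC cutoff quoted in the text

if __name__ == "__main__":
    print(f"Spearman correlation: {rho:.2f}; passes 0.8 cutoff: {passes_threshold}")
```

    For these two rows the coefficient is about 0.90, so the pair would pass the 0.8 cutoff quoted in the text. Tied values, such as the two identical entries in the first row, receive averaged ranks.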

    The yellow module featured prominently in our gene-to-metabolite correlation analysis for the cannabinoid and terpenoid pathways. A Gene Ontology (GO) analysis implied a substantial enrichment of genes involved in terpenoid biosynthesis in the yellow module (P value of 1.4e-05; Supplemental Table S3; note that GO terms for cannabinoid biosynthesis as a biological process have not yet been released). In total, 22 genes involved in the conversion of precursor metabolites into cannabinoid and terpenoid end products were coexpressed with THCA synthase (Fig. 5C). Specifically, these genes code for enzymes involved in glycolysis (conversion of an imported carbon source into triose phosphates and pyruvic acid), the MEP pathway toward GPP and ultimately monoterpenes, the production of sesquiterpenes, the formation of olivetolic acid from fatty acid precursors, and the incorporation of olivetolic acid and GPP into cannabinoids (Fig. 5D).

    Target Gene Identification and Characterization

    Building on our terpenoid profiling and glandular trichome-specific transcriptome data sets, we embarked on gene discovery efforts aimed at characterizing terpene synthases associated with the biosynthesis of major monoterpenes and sesquiterpenes routinely quantified in commercial cannabis testing as well as other terpenoids that are not assayed routinely. The analytical chemistry data were employed to assess which genes would be expected to be expressed to support the observed terpenoid profiles. We then performed BLASTX searches with previously characterized terpene synthases to identify contigs with high sequence identity in our transcriptome data sets. We then asked which of the putative cannabis terpene synthases were expressed at appreciable levels in particular cannabis strains. Sequences of selected contigs were then chosen to perform a sequence relatedness analysis with previously characterized terpene synthases, thereby enabling their categorization by class. cDNAs of putative terpene synthases were cloned into appropriate vectors and expressed heterologously in Escherichia coli, the corresponding recombinant proteins were purified, and assays were performed with appropriate prenyl diphosphate substrates. Expression for genes putatively encoding geranyl diphosphate synthase and trans,trans-farnesyl diphosphate synthase was readily detectable in transcriptome data sets of all strains; in contrast, no putative orthologs of neryl diphosphate (NPP) synthase and cis,cis-farnesyl diphosphate synthase were recognizable based on sequence identity (Supplemental Tables S1 and S2). Nevertheless, terpene synthase assays were performed with GPP, NPP, 2-trans,6-trans-farnesyl diphosphate (tFPP), and 2-cis,6-cis-farnesyl diphosphate (cFPP).

    β-Myrcene and (−)-limonene were principal monoterpenes in all strains (Table 1), and expectedly, contigs with high sequence identity to the previously characterized β-myrcene and (−)-limonene synthases of cannabis (Günnewich et al., 2007; Booth et al., 2017), which belong to the TPS-b clade of terpene synthases (Fig. 6; Supplemental Table S4), were expressed at high levels across most strains investigated in this study (Table 3). Cloning was successful for the corresponding cDNAs from the Canna Tsu strain (CsTPS14CT and CsTPS15CT), and a functional evaluation confirmed the annotation [(−)-limonene synthase and β-myrcene synthase, respectively; Fig. 7, A and B]. The translated peptide sequences of β-myrcene synthases (CsTPS3FN and CsTPS15CT; excluding plastidial targeting sequence) had 13 mismatches (Supplemental Fig. S3) but identical specificity (100% β-myrcene as product with GPP as substrate).

    Table 3. Transcript abundance (in TPM) for terpene synthases across cannabis strains

    n.d., Not detectable.

    | Gene | GenBank Accession No. | CsTPS Identifier | Blackberry Kush | Black Lime | Canna Tsu | Cherry Chem | Valley Fire | Mama Thai | Sour Diesel | Terple | White Cookies |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | Monoterpene synthases (TPS-b clade) | | | | | | | | | | | |
    | (−)-Limonene synthase (a) | MK801766 | CsTPS14CT | 646.24 | 898.94 | 612.37 | 651.86 | 2272.48 | 751.48 | 201.94 | 2.46 | 895.86 |
    | (+)-α-Pinene synthase (b) | KY014565 | CsTPS2FN | 217.36 | 2,041.33 | 1,554.77 | 101.32 | 96.90 | n.d. | n.d. | 1,298.95 | 49.52 |
    | β-Myrcene synthase (a) | MK801765 | CsTPS15CT | 183.29 | 597.88 | 325.85 | 272.65 | 157.78 | 254.10 | 183.29 | 436.63 | n.d. |
    | β-Myrcene/(−)-α-pinene synthase (b) | KY014560 | CsTPS5FN | 217.59 | 640.97 | 483.09 | 547.24 | 157.78 | 445.85 | 125.94 | 472.33 | 50.51 |
    | (E)-β-Ocimene synthase (b) | KY014563 | CsTPS6FN | n.d. | n.d. | n.d. | n.d. | n.d. | 103.41 | n.d. | 191.65 | n.d. |
    | (Z)-β-Ocimene synthase (b) | KY014558 | CsTPS13PK | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. |
    | Acyclic terpene synthases (TPS-g clade) | | | | | | | | | | | |
    | (E)-Nerolidol/(+)-linalool synthase (a) | MK801764 | CsTPS18VF | 2.82 | 9.41 | 2.62 | 16.21 | 16.39 | 2.51 | 4.80 | 16.77 | 8.76 |
    | (E)-Nerolidol/linalool synthase (a) | MK801763 | CsTPS19BL | 56.78 | 81.13 | 27.22 | 80.23 | 249.23 | 62.53 | 47.73 | 90.86 | 66.47 |
    | Sesquiterpene synthases (TPS-a clade) | | | | | | | | | | | |
    | Alloaromadendrene synthase (b) | KY014564 | CsTPS4FN | n.d. | 108.92 | n.d. | 639.56 | n.d. | 329.87 | 148.17 | n.d. | 323.36 |
    | γ-Eudesmol/valencene synthase (putative) (b) | KY014556 | CsTPS8FN | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. | n.d. |
    | δ-Selinene synthase (putative) (b) | KY014554 | CsTPS7FN | 356.34 | n.d. | 367.47 | n.d. | 316.74 | 210.58 | n.d. | n.d. | 268.50 |
    | β-Caryophyllene/α-humulene synthase (b) | KY014555 | CsTPS9FN | 764.18 | 794.46 | 435.11 | 3,241.85 | 1,090.94 | 738.74 | 555.25 | 495.72 | 591.86 |
    | Germacrene B synthase (a) | MK131289 | CsTPS16CC | 16.14 | 19.44 | 9.13 | 156.08 | 20.60 | 40.36 | 20.22 | 7.19 | 22.72 |
    | Hedycaryol synthase (a) | MK801762 | CsTPS20CT | 310.43 | 27.00 | 498.70 | 98.21 | 19.35 | 11.98 | 17.67 | 0.00 | 17.02 |

    (a) Functionally characterized as part of this study. (b) From Booth et al. (2017).

    The sequence of the (−)-limonene synthase characterized as part of this study (CsTPS14CT; excluding plastidial targeting sequence) had two mismatches when compared with CsTPS1SK and nine mismatches when compared with CsTPS1FN (Supplemental Fig. S3). As described for CsTPS1SK, CsTPS14CT generated several other products, and we report the stereochemistry of those (Fig. 7A).
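    Mismatch counts of the kind reported above are obtained by comparing two aligned peptide sequences position by position. The following Python sketch shows one way to do this. The sequence fragments are hypothetical and are not taken from the study, and skipping positions that contain a gap is an assumption rather than the authors' documented convention.

```python
def count_mismatches(seq_a, seq_b, gap="-"):
    """Count aligned positions at which two equal-length sequences differ.

    Positions where either sequence carries a gap character are skipped,
    which is an assumption of this sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != gap and b != gap and a != b
    )


# Hypothetical aligned peptide fragments, for illustration only.
FRAGMENT_1 = "MKTLVSAEGR"
FRAGMENT_2 = "MKSLVTAEGR"

if __name__ == "__main__":
    print(count_mismatches(FRAGMENT_1, FRAGMENT_2))
```

    Applied to full-length alignments with the plastidial targeting sequences removed, the same comparison would yield counts comparable to those quoted in the text, provided that the alignment and the treatment of gaps match those used for Supplemental Fig. S3.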

    The monoterpene linalool was accumulated to fairly high amounts in the Valley Fire and White Cookies strains, whereas the sesquiterpene nerolidol was quantifiable only in the Black Lime strain (Table 1). Contigs with moderate sequence identity (slightly above 50%) to bifunctional nerolidol/linalool synthases (strawberry [Fragaria spp.; Aharoni et al., 2004] and snapdragon [Antirrhinum majus; Nagegowda et al., 2008]) and considerable expression in glandular trichomes were identified in our transcriptome data sets (Table 3), and corresponding cDNAs were cloned from the Valley Fire (CsTPS18VF) and Black Lime (CsTPS19BL) strains. These sequences belong to the TPS-g clade of terpene synthases (Fig. 6; Supplemental Table S4). Heterologous expression and functional characterization confirmed that the corresponding recombinant proteins were able to catalyze the formation of (E)-nerolidol from tFPP and linalool from GPP, but no activity was detected with NPP or cFPP (Fig. 8). Interestingly, follow-up chiral separation of products from assays performed with GPP as substrate indicated that CsTPS18VF generated almost exclusively (+)-linalool, whereas CsTPS19BL produced a mixture of (−)-linalool and (+)-linalool (Fig. 7, C and D). Sequence differences across sesquiterpene synthases with different product profiles included residues with potential roles in catalysis (Fig. 9), and the implications are evaluated in “Discussion.”

    To further investigate the genetic potential for generating terpenoid chemical diversity, two representatives of the TPS-a clade of terpene synthases (CsTPS16CC and CsTPS20CT) were selected for functional characterization. CsTPS16CC had very high expression levels in the ‘Cherry Chem’ strain (Table 3). The sequence was most similar to that of the previously characterized alloaromadendrene synthase (Booth et al., 2017; Fig. 6; Supplemental Table S4). In our assays, the recombinant protein generated germacrene B from tFPP (Fig. 8C), with γ-elemene being detected as a thermal breakdown product (de Kraker et al., 1998). Other prenyl diphosphates were not accepted as substrates with appreciable conversion rates (Fig. 8). The ‘Canna Tsu’ strain had a particularly high expression level of CsTPS20CT

    Figure 6. Maximum likelihood phylogenetic tree of selected, functionally characterized terpene synthases. The tree is rooted with the ancestral ent-kaurene synthase of Physcomitrella patens (PpCPS/KS). A color code is used to indicate different clades (yellow, TPS-a; green, TPS-b; and purple, TPS-g). Abbreviations are as follows: BL, Black Lime; CC, Cherry Chem; Cs, Cannabis sativa; CT, Canna Tsu; FN, Finola; FRAAN, Fragaria × ananassa; FRAVE, Fragaria vesca; HUMLU, Humulus lupulus; OCIBA, Ocimum basilicum; ROSRU, Rosa rugosa; SALOF, Salvia officinalis; VF, Valley Fire; VITVI, Vitis vinifera. The accession numbers and sequences of the terpene synthases are provided in Supplemental Table S4.

    Figure 7. Functional characterization of cannabis terpene synthases that act on GPP as substrate. Left, Chiral gas chromatography (GC) scans; center, mass spectra of primary products; right, product distribution. A, (−)-Limonene synthase (CsTPS14CT). B, β-Myrcene synthase (CsTPS15CT). C, (E)-Nerolidol/(+)-linalool synthase (CsTPS18VF). D, (E)-Nerolidol/(+)-linalool synthase (CsTPS19BL).

    Figure 8. Functional characterization of cannabis terpene synthases that act on tFPP as substrate. Left, GC-mass spectrometry scans; center, mass spectra of primary products; right, product distribution. A, (E)-Nerolidol/(+)-linalool synthase (CsTPS18VF). B, (E)-Nerolidol/(+)-linalool synthase (CsTPS19BL). C, Germacrene B synthase (CsTPS16CC). D, Hedycaryol synthase (CsTPS20CT).

    (Table 3). Its closest neighbor in the sequence relatedness tree was a putative δ-selinene synthase from cannabis (Booth et al., 2017; Fig. 6; Supplemental Table S4). Functional assays with the purified, recombinant protein indicated a conversion of tFPP to elemol, a thermal breakdown product of the sesquiterpene hedycaryol (Koo and Gang, 2012; Hattan et al., 2016), but there was little or no activity with other prenyl diphosphate substrates (Fig. 8D). In summary, we demonstrate that the resources and approaches described here can be employed to identify candidates and subsequently characterize functions of terpene synthase genes that belong to three different clades, thereby contributing to a better understanding of the genetic determinants of terpenoid chemical diversity in cannabis.

    DISCUSSION

    Utility of Transcript Profiling for Strain Differentiation

    Competition in decriminalized retail markets for cannabis has put pressure on breeders to differentiate their product from that of their competitors. This has led to branding with a plethora of distinct and memorable names, which has caused both confusion and controversy (Small, 2015). Chemical profiling can be employed as a powerful tool in strain differentiation, but adding genotyping information further increases the resolution of the analysis. The differentiation of drug-type and fiber-type cannabis strains can be achieved with standard genotyping analyses (Piluzza et al., 2013). However, a differentiation of genetically related strains has been much more challenging (Sawler et al., 2015; Punja et al., 2017). Traditional genotyping approaches benefit significantly from high-quality reference genome sequences (Scheben et al., 2017), but, unfortunately, only fairly low-quality genome sequences have been published for two cannabis strains (van Bakel et al., 2011). We employed RNA-seq as an alternative approach for genotyping (Haseneyer et al., 2011), which does not depend on prior sequence data (Wang et al., 2009). We used RNA-seq to obtain the transcriptome of glandular trichome cells of nine selected cannabis strains (with three biological replicates each). Importantly, statistical analyses of these data sets allowed the differentiation of strains into broader clades (descendants of landraces of C. sativa or C. indica)

    Figure 9. Variation of the residue putatively stabilizing carbocation intermediates correlates with outcome of catalysis in cannabis sesquiterpene synthases. A, Sequence alignment of sesquiterpene synthases (with carbocation-stabilizing residues highlighted). B, Proposed cyclization reactions catalyzed by sesquiterpene synthases. Identifiers for sequences from the literature (Aharoni et al., 2004; Nagegowda et al., 2008) are as follows: AmNES/LIS1, EF433761; AmNES/LIS2, EF433762; FvNES1, AX529002; FaNES2, AX529067; FaNES1, KX450224 (species abbreviations are as follows: Am, Antirrhinum majus; Fa, Fragaria × ananassa; Fv, Fragaria vesca).

    but also resulted in the full separation of all individual strains (with biological replicates clustering closely together; Fig. 3). We fully recognize that RNA-seq is not a viable option for routine genotyping, but it can be used to develop single-nucleotide polymorphism-based genotyping platforms. This approach has been employed successfully for a number of crops, including alfalfa (Medicago sativa; Yang et al., 2011), maize (Zea mays; Hansey et al., 2012), and wheat (Triticum aestivum; Ramirez-Gonzalez et al., 2015). Our data sets are therefore highly valuable for building resources for follow-up research with cannabis. As an added benefit, RNA-seq data can be used for gene expression analysis, thereby providing a functional context, which is discussed in more detail below.

    Utility of Metabolite Profiling for Strain Differentiation

    We assessed the utility of cannabinoid and terpenoid profiling, in addition to strain differentiation by genotyping as discussed above, to demarcate nine commercial cannabis strains. Two independent statistical approaches, PCA and OPLS-DA, grouped biological replicates closely together while still separating individual strains and classes of strains (those of C. sativa or C. indica heritage; Fig. 4). Several authors have advocated the profiling of both cannabinoids and terpenoids in recent publications (Fischedick et al., 2010; Elzinga et al., 2015; Aizpurua-Olaizola et al., 2016; Hazekamp et al., 2016; Fischedick, 2017; Lewis et al., 2018; Orser et al., 2018; Richins et al., 2018; Sexton et al., 2018). The key advantage of this approach over merely profiling cannabinoids lies in the enormous diversity of terpenoids accumulated in cannabis (and in other plants as well), which significantly increases the power of statistical analyses. It also reflects the fact that many users select cannabis strains based on both the reported THC content and aroma (which is largely imparted by terpenoids; Gilbert and DiVerdi, 2018). A comprehensive analysis of cannabis strains recently indicated the presence of close to 200 detectable volatiles, which were tentatively identified based on searches against various spectral databases (Rice and Koziel, 2015). A notable challenge with terpenoid profiling pertains to the limitation that authentic standards are often very costly or unavailable from commercial sources, which is particularly true for sesquiterpenes (dozens detected by Rice and Koziel [2015]). Commercial cannabis testing laboratories therefore rarely offer services that comprise more than 20 terpenoids. While such analyses may detect the most abundant terpenoids for popular strains, it is not unlikely that important aroma volatiles with a low odor detection threshold could be missed (Chin and Marriott, 2015). Another reason why a comprehensive profiling of terpenoids would be desirable relates to testing the validity of the entourage effect, the proposed synergism between cannabinoids and other constituents (in particular terpenoids) that might affect the experience of the user (Gertsch et al., 2008; Russo, 2011). Should such effects be substantiated by empirical evidence, it would be advisable to reconsider the current laws and rules for formulations containing cannabis extracts, which are based solely on THC. An improved understanding of terpenoid phytochemistry in cannabis would be an important first step in this direction (Booth and Bohlmann, 2019).

    Coregulation of Metabolic Pathways in Cannabis Is Consistent with Gene Expression Patterns Commonly Observed in Glandular Trichomes

    Our statistical analyses using the WGCNA package indicated a tight correlation of biosynthetic genes with cannabinoid and terpenoid end products (Fig. 5). We recently performed a meta-analysis of gene expression patterns in glandular trichomes across various species (Zager and Lange, 2018). One of the conclusions, consistent with the data presented here, was that gene expression patterns correlate well with the metabolic specialization in these anatomical structures. Coregulation has been observed for genes across multiple pathways of specialized metabolism, such as cannabinoids and terpenoids (this study), monoterpenes and diterpenes (Salvia pomifera; Trikka et al., 2015), flavonoids and acyl sugars (Salpiglossis sinuata and Solanum quitoense; Moghe et al., 2017), and bitter acids and prenylflavonoids (Humulus lupulus; Kavalier et al., 2011; Clark et al., 2013). These tight gene-to-metabolite correlations were also reflective of predicted fluxes through the relevant pathways (Zager and Lange, 2018). In contrast, gene expression patterns appear to be less predictive of fluxes through central carbon metabolism, where regulation at the protein level plays a more significant role (Paul and Pellny, 2003; Koch, 2004; Gibon et al., 2006; Schwender et al., 2014; Rocca et al., 2015). This does not mean that feedback regulation of specialized metabolism is negligible in glandular trichomes; there is just a particularly strong overall gene-to-metabolite correlation, and unraveling the details will be an exciting topic for future research.

    Functional Characterization of Terpene Synthases Contributes to an Improved Understanding of the Genetic Determinants of Terpenoid Diversity

    The observed gene-to-metabolite correlations in cannabis glandular trichomes provided opportunities for gene discovery efforts. Booth et al. (2017) analyzed transcriptome data sets obtained with the Finola and Purple Kush strains to obtain candidate genes for terpene synthases that were subsequently characterized to encode enzymes for the production of 14 monoterpenes and sesquiterpenes. Those that contribute to the formation of some of the common monoterpenes and sesquiterpenes [e.g. β-myrcene, (−)-limonene, α-pinene, β-caryophyllene, and α-humulene] were found to be expressed at fairly high levels across the strains included in this analysis, whereas those that generate less common products [e.g. (Z)-β-ocimene, γ-eudesmol, alloaromadendrene, δ-selinene, and valencene] were found to be expressed only in a limited number of strains or not at all (Table 3). To assess sequence variation among these genes, we cloned genes with high sequence identity to the previously characterized β-myrcene and (−)-limonene synthases.

    1892 Plant Physiol. Vol. 180, 2019

    Zager et al.

    https://plantphysiol.org Downloaded on March 29, 2021. - Published by Copyright (c) 2020 American Society of Plant Biologists. All rights reserved.

    Prior to this study, a notable gap existed with regard to the terpene synthases underlying the formation of the monoterpene linalool and the sesquiterpene nerolidol, which are both common constituents in cannabis resin. We have now identified a gene coding for an enzyme (CsTPS19BL) that generates a mixture of (+)-linalool and (−)-linalool from GPP and (E)-nerolidol from tFPP in the Black Lime strain. We also cloned a putative ortholog from the Valley Fire strain to evaluate the effects of sequence variation. Interestingly, the encoded enzyme (CsTPS18VF) had the same specificity as CsTPS19BL with regard to the tFPP substrate [(E)-nerolidol as product]; however, with GPP as substrate, (+)-linalool was detected as the essentially exclusive product. This difference in specificity is surprising given that the peptide sequences have only three mismatches (Supplemental Fig. S3).

    Finally, we cloned genes that, based on sequence relatedness, were expected to code for enzymes that generate sesquiterpene products not previously detected in assays with cannabis terpene synthases. Indeed, CsTPS16CC was demonstrated to produce germacrene B, and CsTPS20CT formed hedycaryol as its primary product. In assays with CsTPS16CC, γ-elemene was also detected, but this is a well-known product of thermal degradation in the GC inlet (de Kraker et al., 1998). Elemol, the sole product detected in assays with CsTPS20CT, is likewise a thermal degradation product, in this case of hedycaryol (Koo and Gang, 2012; Hattan et al., 2016). Consequently, the enzyme activities are referred to as germacrene B synthase and hedycaryol synthase, respectively. To the best of our knowledge, the sesquiterpenes generated by these terpene synthases (germacrene B and hedycaryol) have not been identified in cannabis samples yet, indicating the need for a more comprehensive coverage of terpenoids to better understand strain-specific aroma profiles. It should also be noted that several recent studies reporting on comprehensive chemical and sensory analyses of volatiles emitted from cannabis found that nonterpenoid alcohols and aldehydes have potent odor impacts (Rice and Koziel, 2015; Wiebelhaus et al., 2016; Calvi et al., 2018). These considerations indicate that more emphasis needs to be placed on comprehensive metabolite profiling, including cannabinoids and terpenoids but also extending to other volatiles, for future efforts focused on strain characterization.

    With a larger number of functionally characterized genes in cannabis, sequence comparisons are now allowing us to ask questions about some of the determinants of specificity. The overall sequence identity of the sesquiterpene synthases characterized here is fairly low (less than 70% at the amino acid level), but there are striking differences in the nature of a conserved aromatic residue (Tyr-527) that had previously been hypothesized to stabilize the positive charge of the carbocation occurring during the formation of a germacrene intermediate in the epi-aristolochene synthase catalytic sequence (Starks et al., 1997). The equivalent residues in sesquiterpene synthases that catalyze the formation of cyclic products (CsTPS16CC and CsTPS20CT) are also Tyr residues (Fig. 9). In contrast, Gln residues occupy this position in CsTPS18VF, CsTPS19BL, and other characterized enzymes of the TPS-g clade (Fig. 9A; Aharoni et al., 2004; Nagegowda et al., 2008), which, possibly because of insufficient carbocation stabilization, generate (E)-nerolidol as a noncyclic product (Fig. 9). Testing this hypothesis will be an important future goal for follow-up research.

    MATERIALS AND METHODS

    Plant Materials and Chemicals

    Clonal plant cuttings of nine Cannabis sativa (cannabis) strains (Sour Diesel, Canna Tsu, Black Lime, Valley Fire, White Cookies, Mama Thai, Terple, Cherry Chem, and Blackberry Kush) were placed in 250-L pots and grown in hoop-style, light-deprivation greenhouses at Shadowbox Farms in Williams, Oregon, under an 18-h-light/6-h-dark regime (natural light) to stimulate vegetative growth, before shifting to a 12-h-light/12-h-dark cycle to induce flowering. The length of these time periods varied from strain to strain and was adjusted based on phenotypic evaluations. All aspects of plant growth, harvest, and transport were performed in accordance with the laws and rules under Chapter 475B, as released by the Oregon Liquor Control Commission. Plant harvest was performed when the consistency of glandular trichome content had changed from a turbid white to clear and before another change to an amber-like color occurred. For most strains, the pistils had changed color from white to yellow or orange. Buds were harvested, parts with low glandular trichome content were removed using scissors, and the remainder were placed on ice until further processing (always within 3 h). Monoterpene and sesquiterpene reference standards were purchased from Restek. Cannabinoid reference standards were obtained from Sigma-Aldrich. Solvents for extraction were procured from Sigma-Aldrich. Solvents and chemicals for chromatography were sourced from Burdick & Jackson. Substrates for enzyme assays (GPP, NPP, and E,E-FPP) were prepared synthetically (Davisson et al., 1986) or obtained from a commercial source (Z,Z-FPP; Echelon Biosciences). The sources of standards for enzyme assays were as follows: germacrene B, isolated as a side product from assays with germacrene C synthase (Colby et al., 1998); γ-elemene, obtained by heating germacrene B under argon (de Kraker et al., 1998); elemol, institutional chemical repository (originally purchased from Parchem); hedycaryol, institutional chemical repository (source unknown); (S)-(+)-linalool, isolated from coriander (Coriandrum sativum) oil; (−)-limonene, (+)-limonene, (R)-(−)-linalool, β-myrcene, (E)-nerolidol, (−)-α-pinene, (−)-β-pinene, and α-terpinolene, all purchased from Sigma-Aldrich.

    Metabolite Extraction and Analysis

    Cannabinoids and terpenoids were extracted and quantified according to Fischedick et al. (2010), with modifications, at a testing facility with accreditation by ISO/IEC 17025 and licensed through the National Environmental Laboratory Accreditation Program (Evio Labs). Briefly, roughly 2 g of fresh bud tissue was crushed in a Falcon tube and suspended in 10 mL of methyl tert-butyl ether (containing 1-octanol as internal standard) with gentle shaking for 15 min, followed by centrifugation at 2,000g for 5 min. The supernatant was transferred to a new vial, and the plant material was extracted two more times as above (no addition of internal standard to solvent). The combined supernatants were filtered through a polytetrafluoroethylene syringe filter (0.45 µm pore size, 25 mm diameter), and an aliquot was transferred to a screw-cap glass vial, which was stored at −20°C until further analysis. Following extraction, the remaining plant material was dried in an oven (50°C) and weighed to determine dry weights for each sample.

    Cannabinoids were separated via HPLC (model LC-2030C; Shimadzu) using a Kinetex C18 reverse-phase column (50 × 4.6 mm, 2.6 µm particle size; Phenomenex) and a binary gradient of solvent A (water containing 0.1% [v/v] formic acid and 10 mM ammonium formate) and solvent B (methanol containing 0.05% [v/v] formic acid) with the following settings: 0 to 9 min, 68% to 78% B; 9 to 11.9 min, 78% to 100% B; 11.9 to 13.5 min, hold at 100% B. Analytes were monitored at 228 nm in a diode array detector. Peak identification was achieved based on comparisons of retention times and spectral characteristics with those of authentic cannabinoid reference standards. Analytes were quantified based on calibration curves acquired with authentic standards. The validation of the analytical method was performed according to Fischedick et al. (2010).
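    The gradient settings listed above can be read as a piecewise program. The following minimal Python sketch returns the percentage of solvent B at a given time; it assumes linear ramps between the stated endpoints (the text gives only the endpoints), and the function name percent_b is hypothetical.

```python
def percent_b(t_min):
    """Return % solvent B at time t_min (minutes) for the binary gradient.

    Assumption: each segment is a linear ramp between its stated endpoints.
    """
    # (start time, end time, %B at start, %B at end), as given in the text
    segments = [
        (0.0, 9.0, 68.0, 78.0),
        (9.0, 11.9, 78.0, 100.0),
        (11.9, 13.5, 100.0, 100.0),  # hold at 100% B
    ]
    for t0, t1, b0, b1 in segments:
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("time outside the 0 to 13.5 min program")
```

    For example, percent_b(4.5) falls halfway through the first ramp, and any time between 11.9 and 13.5 min falls in the hold segment.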

    Terpenoids were separated via GC (model 6890; Agilent Technologies) using a DB5 column (30 m × 25 µm, 25 µm film thickness; Agilent Technologies) and detected with a flame ionization detector. The conditions for separation were as follows: injector at 250°C, 20:1 split injection mode (1 µL injected); detector at 250°C (H2 flow at 30 mL min−1, airflow at 400 mL min−1, makeup flow [He] at 25 mL min−1); oven heating from 40°C to 120°C at 2°C min−1, then ramped to 200°C at 50°C min−1, with a final hold at 200°C for 2 min. GC peaks were identified based on comparisons of retention times of authentic standards (purchased from Sigma-Aldrich). Analytes were quantified based on calibration curves acquired with authentic standards. The validation of the analytical method was performed according to Fischedick et al. (2010).
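    As a quick check on the oven program described above, the total run time can be computed from the ramp rates. This is a small illustrative sketch that assumes no initial hold at 40°C (none is mentioned); the function name oven_program_minutes is hypothetical.

```python
def oven_program_minutes(start_c, ramps, final_hold_min):
    """Total run time (min) of a GC oven program.

    ramps: list of (target temperature in degrees C, rate in degrees C per min).
    Assumption: no initial hold before the first ramp.
    """
    total = 0.0
    temp = start_c
    for target, rate in ramps:
        total += abs(target - temp) / rate  # time spent on this ramp
        temp = target
    return total + final_hold_min


# 40 to 120 at 2 per min, then to 200 at 50 per min, then a 2-min hold
run_time = oven_program_minutes(40.0, [(120.0, 2.0), (200.0, 50.0)], 2.0)
```

    Under these assumptions, the first ramp takes 40 min, the second 1.6 min, and the hold 2 min.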

    RNA Isolation from Glandular Trichomes and cDNA Library Preparation

    Secretory cells of glandular trichomes were removed from 10 to 15 g of bud tissue by surface abrasion and then collected by filtering through a series of nylon meshes (Lange et al., 2000). Total RNA was isolated from secretory cells using the RNeasy Plant kit (Qiagen) according to the manufacturer's instructions. RNA integrity was determined using a BioAnalyzer 2100 (Agilent Technologies). cDNA libraries from 1 to 2 µg of total RNA were generated using the SuperScript III Reverse Transcriptase kit (Invitrogen) according to the manufacturer's instructions.

    RNA-Seq and Transcriptome Assembly

    RNA-seq libraries were prepared from 250 ng of total glandular trichome RNA with the Stranded mRNA-Seq Poly(A) Selection kit (KAPA Biosystems). The quality and quantity of the sequencing library were assessed using a Bioanalyzer 2100 and a Qubit 3.0 Fluorometer (Agilent Technologies and Life Technologies). Sequencing of 150-bp paired-end reads was performed on a HiSeq 4000 instrument (Illumina). Sequenced reads were trimmed of adapter sequences with Trimmomatic (Bolger et al., 2014), and sequence quality was checked with FastQC (Andrews, 2010). Trimmed sequences were merged and assembled using the Trinity de novo assembler, and downstream functional annotation of the assembly was performed with Trinotate (Haas et al., 2013). The resulting transcriptome assembly contained 514,208 contigs, with a mean contig length of 875 bp and an N50 value of 1,529 bp. Transcript abundance in each RNA-seq data set (three biological replicates per strain) was determined with RSEM (Li and Dewey, 2011).
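    For readers unfamiliar with the assembly statistics quoted above, the sketch below shows how an N50 value is conventionally derived from a list of contig lengths. It is a generic illustration, not the code used by Trinity, and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contigs supplied")


toy_lengths = [100, 200, 300, 400, 500]  # hypothetical contig lengths (bp)
toy_n50 = n50(toy_lengths)
toy_mean = sum(toy_lengths) / len(toy_lengths)
```

    Because long contigs are weighted by the bases they contain, the N50 typically exceeds the mean contig length, as is also the case for the assembly reported here (1,529 bp versus 875 bp).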

    Analysis of Global Gene Expression Patterns and GO Enrichment

    Testing for differential gene expression across strains was performed using the Bioconductor package DESeq2 (version 1.18.1; Love et al., 2014). P values were adjusted using the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995). An adjusted P value (false discovery rate) ≤ 1e-10 and log2 ratio ≥ 3 were set as thresholds. A cluster analysis of gene expression patterns between strains was performed within the Trinity suite (Haas et al., 2013) by partitioning genes into clusters by cutting the hierarchically clustered gene tree at 60% height of the tree. A GO enrichment analysis of differentially expressed genes was performed using the GOseq package in R (Young et al., 2010). GO terms with an adjusted P < 0.01 were considered significantly enriched.
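    The thresholding step can be illustrated with a short Python sketch of the Benjamini-Hochberg adjustment followed by the two cutoffs named above. This is not the DESeq2 implementation (which performs the adjustment internally); the function names are hypothetical, and applying the log2 cutoff to the absolute value of the ratio is an assumption.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values (false discovery rates)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # raw adjustment: p * n / rank, with ranks starting at 1
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest P value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def select_de(pvalues, log2fc, fdr_max=1e-10, lfc_min=3.0):
    """Boolean mask of genes passing both thresholds.

    Assumption: the log2 ratio cutoff is applied to the absolute value.
    """
    padj = benjamini_hochberg(pvalues)
    return (padj <= fdr_max) & (np.abs(np.asarray(log2fc, dtype=float)) >= lfc_min)
```

    A call such as select_de(p, lfc) returns one Boolean per gene; only genes that satisfy both the false discovery rate and the fold-change criterion are retained.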

    Gene Coexpression Network Analysis

    A gene coexpression network was built using the WGCNA package in R (Langfelder and Horvath, 2008). Transcriptome data sets were filtered to remove genes with an average expression value of 50 TPM or smaller. Coexpression modules were identified using the function blockwiseModules with the following settings: power at 7, mergeCutHeight at 0.55, and minModuleSize at 30. Eigengene values were determined for each coexpression module to test for association significance. Modules with similar eigengene values were merged to obtain the final coexpression modules.
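    To make the role of the expression filter and the soft-thresholding power more concrete, the sketch below reproduces the two operations in plain NumPy. It is only an illustration of the underlying arithmetic, not a substitute for blockwiseModules; it assumes an unsigned network (adjacency based on the absolute correlation), and the function names and toy matrix are hypothetical.

```python
import numpy as np


def filter_by_mean_tpm(tpm, min_mean=50.0):
    """Keep genes (rows) whose mean TPM across samples is above min_mean."""
    tpm = np.asarray(tpm, dtype=float)
    keep = tpm.mean(axis=1) > min_mean  # "50 TPM or smaller" is removed
    return tpm[keep], keep


def soft_threshold_adjacency(expr, power=7):
    """Unsigned adjacency a_ij = |cor(gene_i, gene_j)| ** power.

    Assumption: an unsigned network; rows of expr are genes.
    """
    cor = np.corrcoef(expr)
    return np.abs(cor) ** power


# hypothetical TPM matrix: 3 genes x 3 samples
toy_tpm = [[100.0, 200.0, 300.0],
           [10.0, 20.0, 30.0],
           [300.0, 200.0, 100.0]]
filtered, kept = filter_by_mean_tpm(toy_tpm)
adjacency = soft_threshold_adjacency(filtered)
```

    Raising correlations to a power of 7 strongly suppresses weak associations: a correlation of 0.5, for instance, maps to an adjacency below 0.01, whereas a correlation of 0.9 retains a value close to 0.48.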

    Phylogenetic Analysis of TPS Candidates

    The identification of TPS candidate genes was accomplished by searching the translated transcriptome consensus assembly against a manually curated protein database specific to characterized plant TPSs using the BLASTx algorithm. A reciprocal search (tBLASTn) was performed with sequences of 114 characterized angiosperm TPSs against the assembly for each individual strain. Predicted TPS sequences were then analyzed for gene expression values across strains. Translated amino acid sequences of these and reference TPSs (from C. sativa and Humulus lupulus) were aligned using the MUSCLE algorithm. Alignments were analyzed with maximum likelihood analysis using a Jones-Taylor-Thornton model with gamma distribution for rates among amino acid sites. One thousand bootstrap replicates were then used to construct a phylogeny using MEGA7 (Jones et al., 1992; Kumar et al., 2016).
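    The Statistical Analyses section states that candidates with e-values above 0.001 and bit scores below 250 were removed from consideration. A minimal sketch of such a filter is given here, assuming that the search results were written in the standard 12-column BLAST tabular format (an assumption; the output format is not stated) and that a hit is retained only if it meets both criteria (one reading of the text). The function names and example lines are hypothetical.

```python
def parse_tabular_hit(line):
    """Parse one line of 12-column BLAST tabular output.

    Columns 11 and 12 (1-based) hold the e-value and the bit score.
    """
    fields = line.rstrip("\n").split("\t")
    return {
        "query": fields[0],
        "subject": fields[1],
        "evalue": float(fields[10]),
        "bitscore": float(fields[11]),
    }


def filter_hits(hits, max_evalue=1e-3, min_bitscore=250.0):
    """Keep hits with e-value <= max_evalue and bit score >= min_bitscore."""
    return [h for h in hits
            if h["evalue"] <= max_evalue and h["bitscore"] >= min_bitscore]


# hypothetical example lines, not taken from the study
example_lines = [
    "TPS_ref1\tcontig_1\t85.0\t500\t75\t0\t1\t500\t10\t1509\t1e-150\t520",
    "TPS_ref1\tcontig_2\t40.0\t120\t72\t3\t5\t124\t30\t389\t0.5\t35.0",
    "TPS_ref2\tcontig_3\t60.0\t200\t80\t1\t1\t200\t1\t600\t1e-40\t160",
]
retained = filter_hits([parse_tabular_hit(line) for line in example_lines])
```

    In this toy example only the first hit is retained: the second fails both criteria, and the third has a sufficiently low e-value but a bit score below 250.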

    Cloning of TPS cDNAs

    First-strand cDNA was prepared from RNA with the SuperScript III First Strand Synthesis kit (Invitrogen) with random hexamer oligonucleotides. Open reading frames for TPSs were amplified using gene-specific primers (Supplemental Table S5; amplicons for full-length cDNAs were generated for putative sesquiterpene synthases, whereas cDNAs devoid of the plastidial targeting sequence were amplified for putative monoterpene synthases). Amplicons were ligated into the pGEM-T Easy vector (Promega) and sequence verified. For expression in Escherichia coli, full-length or truncated genes were subcloned into the pSBET expression vector (predigested with NdeI and BamHI). Several terpene synthase cDNAs (CsTPS18VF, CsTPS19BL, and CsTPS20CT) were purchased as synthetic products (in the pET28B expression vector) from GenScript.

    In Vitro Functional Assays for Recombinant TPSs

    Plasmids were transformed into chemically competent cells of several E. coli strains [BL21 (DE3), C41 (DE3), C43 (DE3), C43 (DE3) pLysS, and ArcticExpress (DE3)], which were then grown in 25 mL of liquid Luria-Bertani medium at 37°C with shaking to an OD600 of 0.8. Expression of TPS genes was induced with 0.1 or 0.5 mM isopropyl β-D-1-thiogalactopyranoside (Goldbio), and cells were grown for another 24 h at three different temperatures (16°C, 10°C, and 4°C). Bacterial cells were harvested by centrifugation at 5,000g and resuspended in 300 µL of MOPSO buffer, pH 7, supplemented with 1 mM DTT (Goldbio). Cells were lysed using a model 475 sonicator (VirTis), with three 15-s bursts and cooling on ice between bursts. The resulting homogenate was centrifuged at 15,000g for 30 min at 4°C, and the clear supernatant was mixed with ceramic hydroxyapatite (Bio-Rad). The purification of recombinant protein was performed as described by Srividya et al. (2016) for constructs in the pSBET expression vector, whereas those in the pET28B expression vector were purified over Ni2+ affinity columns according to the manufacturer's instructions (Novagen-EMD Millipore). In vitro assays were performed in 2-mL glass vials containing 200 µg of purified enzyme in MOPSO buffer containing DTT and MgCl2 (total volume of 100 µL). A prenyl diphosphate substrate (GPP, NPP, tFPP, or cFPP) was added to a final concentration of 0.5 mM. The assay mixtures were overlaid with 100 µL of n-hexane (Avantor) and incubated at 30°C for 16 h on a multitube rotator (Labquake; Barnstead Thermolyne). The enzymatic reaction was stopped by vigorous mixing of the contents of the tubes, followed by 30 min at −80°C for phase separation. The organic phase was removed and transferred to glass vial inserts and stored in GC vials at −20°C until further analysis.

    Enzymatically formed products were analyzed on a 6890N gas chromatograph coupled to a 5973 mass selective detector (Agilent). Analyte separation was achieved under the conditions developed by Adams (2007), which includes a comprehensive resource for spectral comparisons of volatiles. The chiral separation of monoterpenes was achieved as described by Turner et al. (2019). Enzymatically generated products were identified based on retention times and mass spectral properties when compared with those of authentic standards.

    Statistical Analyses

    For metabolite analyses, statistical analyses were performed in R using the MetaboAnalystR package (Chong and Xia, 2018). Quantitative terpenoid and cannabinoid data were scaled by dividing mean-centered values by the SD of each variable to generate principal component loadings. Principal components were then plotted in three dimensions within the R environment. OPLS-DA analysis was also performed in the same way using the MetaboAnalystR package. Differential gene expression patterns were assessed using the Bioconductor package DESeq2 (version 1.18.1; Love et al., 2014), with the threshold for the Benjamini-Hochberg-adjusted P value (false discovery rate) set to 1e-10 or less and the log2 fold-change ratio to 3 or greater. Cluster analysis of differential gene expression was performed within the Trinity suite (Haas et al., 2013) by cutting the clustered gene tree at 60% tree height, and differentially expressed genes were subjected to further analysis within GOseq as described above (Young et al., 2010). TPS candidates were identified based on sequence identity with functionally characterized TPSs in tBLASTn searches. Candidates with e-values > 0.001 and bitscores < 250 were removed from further consideration.
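    The scaling described above (mean centering followed by division by the SD of each variable, often called autoscaling) and the subsequent principal component analysis can be sketched in a few lines of NumPy. This is an illustration of the calculation rather than the MetaboAnalystR code; the use of the sample SD (ddof = 1) and the function names are assumptions.

```python
import numpy as np


def autoscale(x):
    """Mean-center each column (variable) and divide by its sample SD."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)


def pca(x, n_components=3):
    """PCA of autoscaled data via singular value decomposition.

    Rows of x are samples; columns are metabolites.
    Returns scores, loadings, and the fraction of variance explained.
    """
    z = autoscale(x)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, explained[:n_components]
```

    The three columns of the returned score matrix correspond to the three axes of a three-dimensional principal component plot, and autoscaling ensures that abundant and trace metabolites contribute on an equal footing.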

    Accession Numbers

    The raw transcriptome sequence data for cannabis strains are available at the National Center for Biotechnology Information Sequence Read Archive, project number PRJNA498707. Nucleotide sequences for genes characterized as part of this study were deposited in GenBank and received the accession numbers MK131289 (CsTPS16CC), MK801762 (CsTPS20CT), MK801763 (CsTPS19BL), MK801764 (CsTPS18VF), MK801765 (CsTPS15CT), and MK801766 (CsTPS14CT).

    Supplemental Data

    The following supplemental materials are available.

    Supplemental Figure S1. Alignment of translated peptide sequences, based on RNA-seq data, of THCA synthase across cannabis strains.

    Supplemental Figure S2. Nucleotide and translated peptide sequence, based on RNA-seq data, of CBDA synthase from the cannabis strain Canna Tsu.

    Supplemental Figure S3. Alignment of terpene synthase sequences.

    Supplemental Table S1. Statistics of de novo assemblies performed based on cannabis glandular trichome-specific RNA-seq data sets.

    Supplemental Table S2. Annotation of transcripts represented in cannabis glandular trichome-specific RNA-seq data sets.

    Supplemental Table S3. Clustering of genes into coexpression modules obtained by WGCNA of cannabis glandular trichome-specific RNA-seq data sets.

    Supplemental Table S4. Accession numbers and sequences of terpene synthases considered for phylogenetic analysis.

    Supplemental Table S5. Primers used to clone cannabis cDNAs for functional characterization.

    ACKNOWLEDGMENTS

    This study was supported by gifts from private individuals, and we are grateful for their generosity. We also thank Shadowbox Farms for allowing A.S. to harvest plant materials.

    Received December 5, 2018; accepted May 15, 2019; published May 28, 2019.

    LITERATURE CITED

    Abuhasira R, Shbiro L, Landschaft Y (2018) Medical use of cannabis and cannabinoids containing products: Regulations in Europe and North America. Eur J Intern Med 49: 2–6

    Adams RP (2007) Identification of Essential Oil Components by Gas Chromatography/Mass Spectrometry, Ed 4. Allured Publishing Corporation, Carol Stream, IL

    Aharoni A, Giri AP, Verstappen FW, Bertea CM, Sevenier R, Sun Z, Jongsma MA, Schwab W, Bouwmeester HJ (2004) Gain and loss of fruit flavor compounds produced by wild and cultivated strawberry species. Plant Cell 16: 3110–3131

    Aizpurua-Olaizola O, Soydaner U, Öztürk E, Schibano D, Simsir Y, Navarro P, Etxebarria N, Usobiaga A (2016) Evolution of the cannabinoid and terpene content during the growth of Cannabis sativa plants from different chemotypes. J Nat Prod 79: 324–331

    Andre CM, Hausman JF, Guerriero G (2016) Cannabis sativa: The plant of the thousand and one molecules. Front Plant Sci 7: 19

    Andrews S (2010) FastQC: A quality control tool for high throughput sequence data. http://www.bioinformatics.babraham.ac.uk/projects/fastqc

    Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B 57: 289–300

    Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114–2120

    Booth JK, Bohlmann J (2019) Terpenes in Cannabis sativa: From plant genome to humans. Plant Sci 284: 67–72

    Booth JK, Page JE, Bohlmann J (2017) Terpene synthases from Cannabis sativa. PLoS ONE 12: e0173911

    Brenneisen R (2007) Chemistry and analysis of phytocannabinoids and other cannabis constituents. In M.A. ElSohly, ed, Forensic Science and Medicine. Marijuana and the Cannabinoids. Humana Press, New York, pp 17–49

    Calvi L, Pentimalli D, Panseri S, Giupponi L, Gelmini F, Beretta G, Vitali D, Bruno M, Zilio E, Pavlovic R, et al (2018) Comprehensive quality evaluation of medical Cannabis sativa L. inflorescence and macerated oils based on HS-SPME coupled to GC-MS and LC-HRMS (q-exactive orbitrap) approach. J Pharm Biomed Anal 150: 208–219

    Cascini F, Aiello C, Di Tanna G (2012) Increasing delta-9-tetrahydrocannabinol (Δ-9-THC) content in herbal cannabis over time: Systematic review and meta-analysis. Curr Drug Abuse Rev 5: 32–40

    Chin ST, Marriott PJ (2015) Review of the role and methodology of high resolution approaches in aroma analysis. Anal Chim Acta 854: 1–12

    Chong J, Xia J (2018) MetaboAnalystR: An R package for flexible and reproducible analysis of metabolomics data. Bioinformatics 34: 4313–4314

    Clark SM, Vaitheeswaran V, Ambrose SJ, Purves RW, Page JE (2013) Transcriptome analysis of bitter acid biosynthesis and precursor pathways in hop (Humulus lupulus). BMC Plant Biol 13: 12

    Colby SM, Crock J, Dowdle-Rizzo B, Lemaux PG, Croteau R (1998) Germacrene C synthase from Lycopersicon esculentum cv. VFNT cherry tomato: cDNA isolation, characterization, and bacterial expression of the multiple product sesquiterpene cyclase. Proc Natl Acad Sci USA 95: 2216–2221

    Davisson VJ, Woodside AB, Neal TR, Stremler KE, Muehlbacher M, Poulter CD (1986) Phosphorylation of isoprenoid alcohols. J Org Chem 51: 4768–4779

    de Kraker JW, de Groot A, Franssen MC, Konig WA, Bouwmeester HJ (1998) (+)-Germacrene A biosynthesis: The committed step in the biosynthesis of bitter sesquiterpene lactones in chicory. Plant Physiol 117: 1381–1392

    Devane WA, Dysarz FA III, Johnson MR, Melvin LS, Howlett AC (1988) Determination and characterization of a cannabinoid receptor in rat brain. Mol Pharmacol 34: 605–613

    Devane WA, Hanus L, Breuer A, Pertwee RG, Stevenson LA, Griffin G, Gibson D, Mandelbaum A, Etinger A, Mechoulam R (1992) Isolation and structure of a brain constituent that binds to the cannabinoid receptor. Science 258: 1946–1949

    Elzinga S, Fischedick J, Podkolinski R, Raber JC (2015) Cannabinoids and terpenes as chemotaxonomic markers in cannabis. Nat Prod Chem Res 3: 2

    Fellermeier M, Zenk MH (1998) Prenylation of olivetolate by a hemp transferase yields cannabigerolic acid, the precursor of tetrahydrocannabinol. FEBS Lett 427: 283–285

    Fellermeier M, Eisenreich W, Bacher A, Zenk MH (2001) Biosynthesis of cannabinoids. Incorporation experiments with (13)C-labeled glucoses. Eur J Biochem 268: 1596–1604

    Fischedick JT (2017) Identification of terpenoid chemotypes among high (−)-trans-Δ9-tetrahydrocannabinol-producing Cannabis sativa L. cultivars. Cannabis Cannabinoid Res 2: 34–47

    Fischedick JT, Hazekamp A, Erkelens T, Cho