CHAPTER I – Literature review
Abstract
BARRANGOU, RODOLPHE. Functional genomic analyses of carbohydrate utilization
by Lactobacillus acidophilus. (Under the direction of Professor Todd R. Klaenhammer).
Carbohydrates are a primary source of energy for microbes. Specifically, lactic
acid bacteria have the ability to utilize a variety of nutrients available in their respective
habitats. For probiotic microbes inhabiting the human gastrointestinal tract, the ability to
utilize sugars non-digested by the host plays an important role in their survival.
Lactobacillus acidophilus is a probiotic organism which can utilize a variety of mono-,
di- and poly-saccharides, including prebiotic compounds such as fructooligosaccharides
and raffinose. However, little information is available about the mechanisms and genes
involved in carbohydrate utilization by lactobacilli. The transport and catabolic
machinery involved in utilization of glucose, fructose, sucrose, FOS, raffinose, lactose,
galactose and trehalose was characterized using global transcriptional profiling.
Microarray hybridizations were carried out using a round-robin design and data analyzed
using a two-stage mixed model ANOVA. Genes differentially expressed between
treatments were visualized by hierarchical clustering, volcano plots, and 3-way contour
plots. Globally, a small number of genes were highly induced, including a variety of
carbohydrate transporters and sugar hydrolases. Members of the phosphoenolpyruvate
sugar phosphotransferase system (PTS) family of transporters were identified for uptake
of glucose, fructose, sucrose and trehalose. In contrast, transporters of the ATP binding
cassette (ABC) family were identified for uptake of FOS and raffinose. A member of the
LacS family of galactoside-pentose-hexuronide (GPH) translocators was identified for
uptake of galactose and lactose. Saccharolytic enzymes likely involved in the metabolism
of mono-, di- and poly- saccharides were also identified, including the enzymatic
machinery of the Leloir pathway. Insertional inactivation of genes encoding sugar
transporters and hydrolases confirmed microarray results. Quantitative RT-PCR was also
used to confirm differential gene expression. Additional transcription experiments
showed specific induction of genes encoding sugar transporters and hydrolases, and
transcriptional repression by glucose. Collectively, microarray data revealed coordinated
and regulated transcription of genes involved in sugar utilization based on carbohydrate
availability, likely via carbon catabolite repression.
The relationships between gene expression level, codon usage, chromosomal
location and intrinsic gene parameters were investigated globally. Gene expression levels
correlated most highly with GC content, codon adaptation index and gene size. In
contrast, gene expression levels did not correlate with GC content at the third codon
position. Perhaps the high correlation between GC content and gene expression is due to
the low genomic GC composition of L. acidophilus. Analysis of variance was used to
investigate the impact of chromosomal location on gene expression after data was
segregated into four groups, by strand and orientation relative to the origin and terminus
of replication. Results showed genes on the leading strand were more highly expressed.
Also, genes pointing toward the terminus of replication showed higher expression levels.
This preference allows for co-directional replication and transcription. Collectively,
results showed a strong influence of chromosomal architecture, GC content and codon
usage on gene transcription.
Globally, analysis of gene expression in Lactobacillus acidophilus revealed
orchestrated transcription, and adaptation to environmental conditions. Specifically,
dynamic adaptation to carbohydrate sources available in the environment might
contribute to competition with other commensal microbes for the limited nutrient sources
available in the human gastrointestinal tract.
FUNCTIONAL GENOMIC ANALYSES OF CARBOHYDRATE
UTILIZATION BY LACTOBACILLUS ACIDOPHILUS
by
RODOLPHE BARRANGOU
A dissertation submitted to the Graduate Faculty of North Carolina State University
in partial fulfillment of the requirements for the Degree of
Doctor of Philosophy
FUNCTIONAL GENOMICS
Raleigh
2004
APPROVED BY:
Dr. Todd R. Klaenhammer, Chairman of Advisory Committee
Dr. Greg Gibson
Dr. Robert M. Kelly
Dr. Dahlia M. Nielsen
Biography
Rodolphe Barrangou, the son of Charles Barrangou-Poueys and Roseline Helie, was born
on July 20, 1975 in Caen, France and raised in Paris, France. He attended the University
of Rene Descartes, Paris V (France) between 1994 and 1996 where he obtained a degree
in Life Sciences. He also attended the University of Technology of Compiegne (France)
between 1996 and 1998 where he obtained a M. S. degree in Biological Engineering. In
January 1999, he began working towards a Master of Science in Food Science at North
Carolina State University (USA) in the Vegetable Fermentation Laboratory (USDA-
ARS) under the direction of Dr. Henry P. Fleming and Dr. Todd R. Klaenhammer. In
January 2001, he began working towards a Ph. D. in Functional Genomics at North
Carolina State University (USA) in the Southeast Dairy Foods Research Center under the
direction of Dr. Todd R. Klaenhammer.
Acknowledgements
First and foremost, I would like to thank my advisor, Dr. Todd R. Klaenhammer for
giving me the opportunity to pursue another graduate degree at NC State, for his time,
supervision, guidance, availability and support throughout my graduate education. I also
wish to acknowledge Dr. Greg Gibson, Dr. Robert M. Kelly, and Dr. Dahlia Nielsen, for
serving on my advisory committee, giving me time outside of committee meetings, and
insightful discussions. Also, I would like to acknowledge all my co-workers and
collaborators within the “Klaenhammer lab”, especially Evelyn Durmaz, Dr. Andrea
Azcarate Peril, Dr. Eric Altermann, and Tri Duong, for technical help, sharing their
expertise and suggestions. I would also like to thank my other collaborators on campus, at
the GRL (Dr. Bryon Sosinski and Regina Brierley), for providing help with microarray
printing and scanning; in the Microbiology Department (Dr. Jose Bruno-Barcena and Dr.
Hosni Hassan), for providing help with Q-PCR; and my collaborators in the Bioinformatics
Program, namely Shannon Conners and Joshua Starmer for collaborating with me. I
would also like to acknowledge Dr. Barbara Sherry and Dr. Stephanie Curtis for their
leadership in the Functional Genomics program. I would like to dedicate my work to my
whole family for teaching me everything that I need to know, and for understanding my
need to go overseas. I would also like to acknowledge my friends Tri and Mike for
making my experience in the lab (and beyond) particularly enjoyable. Finally, I would
like to give a very special and personal thank you to my wife Lisa, for her patience,
understanding, and permanent support throughout my graduate career, for helping me
make the right decisions, understand what is important, and sharing everything in my life.
Table of contents
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER I – LITERATURE REVIEW: TRANSPORT SYSTEMS IN LACTIC ACID BACTERIA
1.1 INTRODUCTION
1.2 THE LACTIC ACID BACTERIA
1.3 GENOMICS OF LACTIC ACID BACTERIA
1.4 FERMENTATION CAPABILITIES OF LACTIC ACID BACTERIA
1.5 ABC TRANSPORTERS
1.6 PTS TRANSPORTERS
1.7 OTHER TRANSPORTERS
1.8 REGULATION AND CARBON CATABOLITE REPRESSION
1.9 CONCLUSIONS AND PERSPECTIVES
1.10 REFERENCES
CHAPTER II – FUNCTIONAL AND COMPARATIVE GENOMIC ANALYSES OF AN OPERON INVOLVED IN FRUCTOOLIGOSACCHARIDE UTILIZATION BY LACTOBACILLUS ACIDOPHILUS
2.1 ABSTRACT
2.2 INTRODUCTION
2.3 MATERIALS AND METHODS
2.3.1 Bacterial strain and media used in this study
2.3.2 Computational analysis of the putative msm operon
2.3.3 RNA isolation and analysis
2.3.4 Comparative genomic analyses
2.3.5 Phylogenetic trees
2.3.6 Gene inactivation
2.4 RESULTS
2.4.1 Computational analysis of the msm operon
2.4.2 Sugar induction and co-expression of contiguous genes
2.4.3 Mutant phenotype analysis
2.4.4 Comparative genomic analyses and locus alignments
2.4.5 Phylogenetic trees
2.4.6 Catabolite response elements (cre) analysis
2.5 DISCUSSION
2.6 REFERENCES
CHAPTER III – GLOBAL ANALYSIS OF CARBOHYDRATE UTILIZATION AND TRANSCRIPTIONAL REGULATION IN LACTOBACILLUS ACIDOPHILUS USING WHOLE-GENOME cDNA MICROARRAYS
3.1 ABSTRACT
3.2 INTRODUCTION
3.3 MATERIALS AND METHODS
3.3.1 Bacterial strain and media used in this study
3.3.2 RNA isolation
3.3.3 Microarray fabrication
3.3.4 cDNA target preparation and microarray hybridization
3.3.5 Microarray data collection and analysis
3.3.6 Real-Time Quantitative RT-PCR
3.4 RESULTS
3.4.1 Differentially expressed genes
3.4.2 Real-Time Quantitative RT-PCR
3.5 DISCUSSION
3.6 REFERENCES
CHAPTER IV – GLOBAL CHARACTERIZATION OF THE LACTOBACILLUS ACIDOPHILUS TRANSCRIPTOME AND ANALYSIS OF RELATIONSHIPS BETWEEN GENE EXPRESSION LEVEL, CODON USAGE, CHROMOSOMAL LOCATION AND INTRINSIC GENE CHARACTERISTICS
4.1 ABSTRACT
4.2 INTRODUCTION
4.3 MATERIALS AND METHODS
4.3.1 Genome and microarray data
4.3.2 Gene intrinsic parameters
4.3.3 Codon adaptation index
4.3.4 Ribosome binding site identification
4.3.5 Statistical analyses
4.4 RESULTS
4.4.1 Distribution patterns
4.4.2 Correlation analyses
4.4.3 Chromosomal location
4.5 DISCUSSION
4.6 REFERENCES
APPENDIX I – FUNCTIONAL AND COMPARATIVE GENOMIC ANALYSES OF AN OPERON INVOLVED IN FRUCTOOLIGOSACCHARIDE UTILIZATION BY LACTOBACILLUS ACIDOPHILUS
List of tables
Chapter I
1. Genomes of lactic acid bacteria and other probiotic species.
2. Carbohydrate utilization profiles for select lactic acid bacteria.
3. Transmembrane domains in L. acidophilus transporters.
Chapter II
1. Catabolite responsive elements sequences.
2. Primers used in this study.
3. Genes and proteins used for comparative genomic analyses.
Chapter IV
1. Codon usage table.
2. Correlation analyses.
3. Analysis of variance between chromosomal locations.
4. Correlation analyses, by chromosomal location.
List of figures
Chapter I
1. Phylogenetic tree of lactic acid bacteria and select microbial species.
2. Transporters commonly found in lactic acid bacteria.
3. Transmembrane domains in ABC, PTS and GPH transporters in L. acidophilus.
Chapter II
1. Operon layout.
2. Sugar induction and repression.
3. Growth curves.
4. Operon architecture analysis.
5. Neighbor-joining phylogenetic tree.
6. Co-expression of contiguous genes.
7. Mutant growth on select carbohydrates.
8. Motifs highly conserved amongst repressors and fructosidases.
9. Biochemical pathways.
Chapter III
1. Round-robin microarray hybridization design.
2. Hierarchical clustering analyses of gene expression patterns.
3. Hierarchical clustering analyses of gene expression patterns for select genes and operons.
4. Volcano plot comparison of gene expression between FOS and raffinose.
5. Contour plot comparison of gene expression between FOS, raffinose and trehalose.
6. Global differential gene expression.
7. Gene fold induction.
8. RT-Q-PCR analysis of differentially expressed genes.
9. Genetic loci of interest.
10. Lactose locus in select lactobacilli.
11. Catabolite responsive elements sequences.
12. Carbohydrate utilization in L. acidophilus.
13. Expression of glycolysis genes.
Chapter IV
1. Gene distribution over select parameters.
2. Chromosomal locations.
3. Correlations between gene expression level and intrinsic genes parameters.
4. Analysis of variance, by chromosomal location.
5. Correlations between gene expression level and intrinsic genes parameters, by chromosomal location.
6. Gene distribution over select parameters, by chromosomal location.
List of abbreviations
ABC     ATP Binding Cassette
ANOVA   ANalysis Of Variance
CAI     Codon Adaptation Index
CCR     Carbon Catabolite Repression
CH      CHaperone proteins
CRE     Catabolite Responsive Element
DNA     Deoxyribo Nucleic Acid
EC      Enzyme Commission
FOS     Fructo Oligo Saccharides
GIT     Gastro Intestinal Tract
GPH     Galactoside Pentose Hexuronide
LaOT    Lagging strand, between the Origin and Terminus
LaTO    Lagging strand, between the Terminus and Origin
LeOT    Leading strand, between the Origin and Terminus
LeTO    Leading strand, between the Terminus and Origin
LGT     Lateral Gene Transfer
LSM     Least Squares Means
MSM     Multiple Sugar Metabolism
NCFM    North Carolina Food Microbiology
NDO     Non Digestible Oligosaccharides
ORF     Open Reading Frame
PCR     Polymerase Chain Reaction
PEP     Phospho Enol Pyruvate
PHX     Predicted Highly eXpressed
PTS     Phosphoenolpyruvate Transferase System
RBS     Ribosome Binding Site
RNA     Ribo Nucleic Acid
RP      Ribosomal Proteins
RSCU    Relative Synonymous Codon Usage
SD      Shine Dalgarno
TF      Transcription and Translation Factors
CHAPTER I - Literature review: Transport systems in Lactic Acid Bacteria
1.1 Introduction
Bacteria are a dominant and diverse life form on earth. Molecular comparisons
between life forms divide organisms into three groups, namely eubacteria, archaebacteria
and eukaryotes (Woese et al., 1990). At the molecular level, those three groups are based
on differences within the ribosomal RNA (rRNA) structure and sequence (Woese et al.,
1990). This triad-nomenclature includes the eukaryote-prokaryote dichotomy, which is
based on presence / absence of a nucleus. Specifically, life on earth is divided into three
“domains”, namely Bacteria (replacing eubacteria), Archaea (replacing archaebacteria)
and Eucarya (replacing eukaryotes) (Woese et al., 1990; Embley et al., 1994), wherein
there are six “kingdoms”, bacteria, fungi, plantae, animalia, protoctista (protozoa) and
chromista (Embley et al., 1994; Margulis, 1996; Cavalier-Smith, 2004). Both archaea and
bacteria are monohomogenomic, with no nucleus, whereas eucarya are
polyheterogenomic and contain a nucleus (Margulis, 1996).
The importance of microbes for all life-forms has been illustrated recently. Recent
phylogenetic analyses suggest the eukaryotic genome actually resulted from the fusion of
an archaeal genome with a bacterial genome (Margulis, 1996; Rivera and Lake, 2004),
consequently changing the tree of life into the ring of life (Rivera and Lake, 2004). This
recent theory emphasizes the historical and evolutionary importance of the bacterial
kingdom.
Prokaryotic diversity and predominance illustrate the physiological flexibility of
microbes, as well as their adaptability to many environments. A recent metagenomic
oceanic study investigating microbial genome diversity within a water community
illustrates our limited knowledge and comprehension of microbial diversity and
physiological properties (Venter et al., 2004), a limitation that earlier measures of
microbial diversity had already made apparent (Curtis et al., 2002; Curtis and Sloan,
2004). The limited extent of our knowledge of microbial diversity is well
documented, and most environmental studies end up uncovering novel species and
lineages (Embley et al., 1996; Cavalier-Smith, 2004). Recent “conservative” assumptions
estimate microbial diversity at over 10^30 individuals representing over 10^7 species
(Embley et al., 1994; Curtis et al., 2002; Curtis and Sloan, 2004), although estimates of
microbial diversity may be inaccurate. Nevertheless, recent advances in microbial
genomics have shown microbial diversity at many levels, especially for microbes that can
be cultured. Specifically, microbial diversity is visible both within and between species,
including differences in genome size, genome content, GC content, codon usage, mobile
genetic elements, cell shape, occurrence in the environment, growth conditions
(temperature, oxygen level, energy sources), and many others.
From a genomic standpoint, microbial diversity is visible through genome size,
GC content (ranging between 25% and 75%; Muto and Osawa, 1987), codon usage
(Grantham et al., 1980), genome content, and occurrence of bacteriophage and plasmids.
Although the differences can be overwhelming and represent a large proportion of the
genome, even within a given species, those differences illustrate physiological adaptation
to various environmental conditions. Specifically, microbes tend to adapt to their
environment via evolutionary pressures, in order to optimize their survival and
competitiveness.
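As a minimal illustration of two of the genomic measures mentioned above, GC content and codon usage, the following Python sketch computes both for a short coding sequence. The function names and the example sequence are hypothetical and serve only as an illustration; they are not taken from the analyses reported in this work.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (0.0 for empty input)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def codon_counts(cds: str) -> Counter:
    """Count in-frame codons; a trailing partial codon is ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return Counter(cds[i:i + 3] for i in range(0, usable, 3))


def gc3_content(cds: str) -> float:
    """GC fraction at the third position of each in-frame codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return gc_content(cds[2:usable:3])


# Hypothetical five-codon sequence, used only to exercise the functions.
example = "ATGGCTAAAGCTTAA"
print(f"GC content: {gc_content(example):.3f}")
print(f"GC3 content: {gc3_content(example):.3f}")
print(f"Codon counts: {dict(codon_counts(example))}")
```

Applied to every open reading frame of a genome, counts of this kind are the raw input for codon usage tables; any real analysis would also need to handle ambiguous bases and alternative start codons, which this sketch ignores.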
Interestingly, genes encoding sugar transporters and carbohydrate hydrolases can
represent a large proportion of strain-specific genes, with ABC transporters reported to
have the highest horizontal gene transfer frequency in Thermotoga maritima (Nesbo et al.,
2002). Similarly, it has been suggested that genes involved in catabolic properties of B.
longum (Schell et al., 2002) and sugar uptake genes in L. plantarum (Kleerebezem et al.,
2003) have been acquired via horizontal gene transfer, as part of the adaptation process of
these bacteria to their respective environments.
Understanding how microbes modulate their genomes to acquire physiological
properties and phenotypic traits that further their ability to withstand environmental
conditions and utilize resources available in their various habitats is important.
Specifically, for lactic acid bacteria, this review illustrates how various transport systems
contribute to their ability to utilize a diversity of energy sources available in a number of
habitats.
1.2 The lactic acid bacteria
Lactic acid bacteria (LAB) are a heterogeneous family of microbes which can
ferment a variety of nutrients (Poolman, 2002) primarily into lactic acid. LAB are mainly
Gram-positive, non-sporulating, acid tolerant, anaerobic bacteria divided into two subsets,
the low GC taxa, and the high GC taxa. Biochemically, lactic acid bacteria include both
homofermenters and heterofermenters. The former produce primarily lactic acid, while
the latter also yield a variety of fermentation by-products, including mostly acetic acid,
ethanol, carbon dioxide and formic acid (Hugenholtz et al., 2002; Kleerebezem and
Hugenholtz, 2003). Although their primary contribution consists of the rapid formation of
lactic acid, which results in acidification of food products, they also contribute to flavor,
texture and nutrition in a variety of food products (Kleerebezem and Hugenholtz, 2003).
Environmentally, LAB reside in a variety of habitats, including human cavities
such as the gastrointestinal tract (Lactobacillus plantarum, Lactobacillus acidophilus,
Lactobacillus johnsonii, Bifidobacterium longum, Streptococcus agalactiae,
Enterococcus faecalis), the oral cavity (S. mutans, B. longum), the respiratory tract (S.
pneumoniae) and the vaginal cavity (B. longum, S. agalactiae) (Tannock, 1999;
Ouwehand et al., 2002; Vaughan et al., 2002). Additionally, lactic acid bacteria are
naturally found in a variety of environmental niches including dairy, meat, vegetable and
plant environments (Kleerebezem et al., 2003).
The two driving forces behind the tremendous amount of work performed in lactic
acid bacteria are their use in fermentation processes and as probiotics. Specifically, a
diversity of microbial strains is used as starter cultures in the food industry, primarily in
dairy applications, although Lactococcus lactis is by far the best characterized lactic acid
bacterium (Bolotin et al., 1999). Additionally, select strains are used as health-promoting
probiotics in food products and dietary supplements (Gibson and Roberfroid, 1995; Reid,
1999).
In fermentation processes, lactic acid bacteria are used as starter cultures.
Although they are used in fermentation of meats, vegetables and wine, they are primarily
used in dairy processes. Specifically, they are widely used in cheese and yogurt
manufacturing. As a result, Lactococcus lactis is perhaps the most extensively studied
species among LAB, and a variety of genetic tools have been developed therein (Bolotin
et al., 1999; Hugenholtz et al., 2002; Kleerebezem and Hugenholtz, 2003).
Probiotics are generally defined as “live microorganisms which, when
administered in adequate amounts, confer a health benefit on the host” (Reid et al., 2003).
Probiotic microbes promote health via their presence and sometimes residence in the
human gastrointestinal tract, and interaction with the intestinal flora and host tissue. As a
result, phenomena such as adherence to human epithelial cells, survival at low pH,
resistance to acids, survival in the presence of bile salts, and competition with other
commensals all contribute to their ability to survive and promote human health (Sanders
and Klaenhammer, 2001). However, those functionalities rely on the survival and
competitiveness of the strain, which is dependent upon its ability to efficiently use
nutrient sources available in the intestinal environment. As a result, transporters are a key
factor involved in probiotic functionality. Lactic acid bacteria generally harbor a
significant number of transporters for acquisition of a diverse set of carbohydrates and
amino acids.
Similarly, organisms used in fermentation applications need to use energy sources
available in their environment in order to carry out the desired metabolic processes. As a
result, uptake of nutrients, particularly carbohydrates, is essential for fermentative LAB.
Therefore, identification and characterization of their transport systems is essential to
develop our understanding of the physiological processes involved in their
functionalities.
Although a large diversity of microbes produce lactic acid, only select members
of the lactic acid bacteria are widely used in fermentation processes and probiotic
applications. The primary genera employed are: Lactococcus, Lactobacillus,
Streptococcus, Bifidobacterium, and to a lesser extent Leuconostoc, and Oenococcus.
Additionally, within those genera, most of the work has focused on only a few select
species, as shown in Table 1.
A large and diverse microbial community resides in the human gastrointestinal
tract (Tannock, 1999). In particular, the complex microbial population in the intestine includes
beneficial bacteria such as bifidobacteria and lactobacilli (Tannock, 1999; Ouwehand et
al., 2002; Vaughan et al., 2002). Although they are not dominant microbes, probiotics are
important organisms that can promote health in a variety of mucosal locations, including
the human intestine. In humans, lactobacilli and bifidobacteria in particular are perceived
as exerting health-promoting properties (Gibson and Roberfroid, 1995; Ouwehand et al.,
2002). A variety of health-promoting functionalities have been widely documented in
humans, specifically for Lactobacillus species (Reid, 1999; Sanders and Klaenhammer,
2001). The large intestine in particular is
the most heavily colonized region of the human digestive tract (Gibson and Roberfroid,
1995). The colonic microbiota feeds on the unabsorbed remains of the diet, which
primarily consist of non-digestible sugars (Alles et al., 1996). Even though microbes have
a limited capacity to utilize substrates present in the environment, some bacteria have a
diverse genomic makeup shaped by evolution and adaptation that is selectively fashioned
to utilize and catabolize a wide range of nutrients present in their environmental niche.
Consequently, a wide carbohydrate catabolic potential likely allows microbes to compete
and survive in environmental niches where sugar molecules are scarce, as previously
suggested for Lactobacillus plantarum (Kleerebezem et al., 2003), Lactobacillus
acidophilus (Barrangou et al., 2003; Altermann et al., 2004), Lactobacillus johnsonii
(Pridmore et al., 2004) and Bifidobacterium longum (Schell et al., 2002). The ability of
select intestinal microbes to utilize intestinal nutrients, including substrates not digested
by the host, plays an important role in their ability to successfully survive and colonize the
mammalian intestinal tract. Whether they are fermentative organisms, or health-
promoting probiotics, microbial growth is primarily dependent upon energy sources such
as carbohydrates.
1.3 Genomics of lactic acid bacteria
In the recent past, substantial progress has been achieved in microbial genomics,
particularly in genome sequencing. To date, over 193 complete microbial genomes have
been published (NCBI website, www.ncbi.nlm.nih.gov/genomes/MICROBES/
complete.html), including 174 bacteria and 19 archaea, covering a wide diversity of
taxonomic groups. Early microbial genome analyses suggest that genome content reflects
adaptation to environmental conditions, specifically genes involved in transport and
catabolism of nutrients, since microbes shape their genomes to efficiently utilize
available resources and adapt to their habitats, according to temperature, levels of
oxygen, toxic compounds, and other factors.
The genome sequences of several lactic acid bacteria have been published,
including Lactococcus lactis (Bolotin et al., 1999), S. mutans (Ajdic et al., 2002), S.
pneumoniae (Tettelin et al., 2001), S. agalactiae (Tettelin et al., 2002), S. pyogenes
(Ferretti et al., 2001), Bifidobacterium longum (Schell et al., 2002), Lactobacillus
plantarum (Kleerebezem et al., 2003), L. johnsonii (Pridmore et al., 2004) and L.
acidophilus (Altermann et al., 2004). Several more are underway (Klaenhammer et al.,
2002; Siezen et al., 2004). For these LAB, probiotic organisms and other intestinal
microbes, genome features are presented in Table 1.
Lactic acid bacteria include both low GC organisms (lactobacilli, streptococci, lactococci)
and high GC organisms (bifidobacteria, brevibacteria) (Table 1). LAB genomes vary
widely in size (between 1.8 and 4.4 Mbp), although most genomes are between 1.8 and
2.5 Mbp (Table 1). Genetically, LAB are diverse, as illustrated in Figure 1. They include
high GC genera such as Bifidobacterium and Brevibacterium, and among the low GC genera,
Leuconostoc and Oenococcus seem distant from other LAB (Figure 1). In contrast,
streptococci and lactococci appear closely related, as do lactobacilli and pediococci
(Figure 1).
Recent genome analyses have shown that bifidobacteria, streptococci and
lactobacilli possess specialized saccharolytic potentials which reflect the nutrient
availability in their respective environments (Tettelin et al., 2001; Ajdic et al., 2002;
Schell et al., 2002; Kleerebezem et al., 2003; Altermann et al., 2004; Pridmore et al.,
2004). Analysis of the L. plantarum genome revealed a variety of transporters, suggesting
a broad capacity to adapt to varying environmental conditions (Kleerebezem et al., 2003).
In particular, a “lifestyle adaptation island” bearing genes involved in sugar transport and
metabolism was defined on the chromosome (Kleerebezem et al., 2003). Similarly, the
diversity of transporters in S. mutans and S. pneumoniae has been associated with an
increased ability to utilize nutrient sources present in their environments, namely the oral
cavity and respiratory tract (Tettelin et al., 2001; Ajdic et al., 2002). The L. acidophilus
NCFM genome was also recently determined, and further substantiates these
observations (Altermann et al., 2004). Early analyses indicate that the genome contents of
bifidobacteria and lactobacilli reflect their habitats, particularly with regards to transport
systems able to utilize a variety of carbohydrates. In silico analyses of the genes encoded
in these genomes provide insight as to their fermentative and uptake capabilities. In
particular, a variety of putative carbohydrate transporters have been identified, suggesting
a wide saccharolytic potential for most of these microbes, especially with regard to
mono- and di-saccharides. However, most of the substrates for ABC transporters, and
some of the substrates for PTS transporters remain unknown (Altermann et al., 2004).
This is not uncommon, since a large portion of the content of microbial genomes remains
obscure, even for model organisms, consisting of unknown ORFs and conserved genes
encoding hypothetical proteins.
Within LAB, a diverse saccharolytic potential has previously been associated with
microbial ability to establish residency in specific environmental niches, in particular
adaptation of Bifidobacterium longum to the human gastro-intestinal tract (GIT) (Schell
et al., 2002), cariogenic activity of Streptococcus mutans in the oral cavity
(Vadeboncoeur and Pelletier, 1997; Ajdic et al., 2002), and the incidence of Lactobacillus
plantarum in a variety of environmental niches (Kleerebezem et al., 2003). Perhaps a
diverse catabolic potential is derived from environmental pressures, in response to
competition for scarce nutrients in the intestinal ecosystem (Schell et al., 2002;
Barrangou et al., 2003) and in the mouth cavity (Vadeboncoeur and Pelletier, 1997; Ajdic
et al., 2002). Although energy sources in the environment are vital for survival, the
capacity to take them up efficiently can result in a competitive advantage. Therefore,
understanding the transportomes of microbes is expected to provide insight into their
respective abilities to survive and compete within their natural habitats. The classification
of transporter families encoded within a genome, and the identification of the uptake
systems provide a platform for understanding which resources are used by a specific
microorganism. Although only a few families of transporters are well characterized
in prokaryotes, each family contains diverse uptake systems with
varying substrate specificities.
This overview will describe the main families of transporters identified in lactic
acid bacteria, categorize within each family the uptake systems that are well
characterized, and investigate the diversity of transport capabilities within and between
organisms. Specifically, the capability of LAB to utilize a variety of carbohydrates via
the PTS and ABC superfamilies of transporters will be reviewed.
1.4 Fermentation capabilities of lactic acid bacteria
There are different means by which carbohydrates are utilized by bacteria: either
they are hydrolyzed outside of the cell into readily fermentable sugars and transported
into the cell thereafter, or they are transported into the cell and then catabolized. Either
way, carbohydrates have to be transported into the cell in order to be catabolized and
used as an energy source.
Although early analyses of LAB genomes have specifically looked at
utilization of carbohydrates, the actual substrates for the majority of the transporters
identified remain unknown. Additionally, the classification of transporters into specific
families, and attribution of a specific substrate, derived from in silico analyses, remains
largely putative. Nevertheless, the comparison of both fermentation patterns and genomic
content provides substantial insight into the transport abilities of LAB.
The fermentation profiles for L. acidophilus, L. johnsonii, L. gasseri and L.
plantarum are shown in Table 2. Additionally, detailed transporter annotations are
available for S. mutans, S. pneumoniae, L. lactis and L. acidophilus (Table 2). It appears
that most LAB have the ability to utilize a variety of mono- and di-saccharides,
specifically, hexoses such as fructose, glucose, galactose and mannose, and disaccharides
such as cellobiose, lactose, maltose, sucrose and trehalose (Table 1). In contrast,
utilization discrepancies are observed between LAB for pentoses, oligosaccharides, sugar
alcohols, deoxysugars and modified sugars (Table 2). Globally, it appears that LAB are
specialized for utilization of hexoses and disaccharides, and select species have gained
the ability to utilize more complex carbohydrates individually. This is consistent with
previous findings in LAB suggesting that L. plantarum, L. johnsonii and L. lactis appear
to ferment mainly mono-, di- and tri-saccharides (Siezen et al., 2004).
For the intestinal LAB, a limited number of species have the ability to transport
undigested complex carbohydrates, including prebiotics. Prebiotics are defined as “non-
digestible substances that provide a beneficial physiological effect on the host by
selectively stimulating the favorable growth or activity of a limited number of indigenous
bacteria” (Reid et al., 2003). These compounds include non-digestible plant
oligosaccharides such as FOS and raffinose (Van Laere et al., 2000; Rycroft et al., 2001).
Among LAB, L. acidophilus, L. plantarum, L. casei, S. thermophilus and a variety of
bifidobacteria have the ability to utilize FOS (Kaplan and Hutkins, 2000).
There are three primary families of transporters in LAB that have been identified
for sugar transport: (i) secondary active transport via the major facilitator superfamily
(MFS); (ii) the phosphoenolpyruvate:sugar phosphotransferase system (PTS); and (iii) the ATP binding
cassette (ABC) transport system (Paulsen et al., 1998; Paulsen et al., 2000; Saier, 2000;
Kaplan and Hutkins, 2003).
1.5 ABC transporters
The ABC superfamily (TC #3.1) is a diverse family of transporters which includes
both importers and exporters (Saier, 2000; Davidson and Chen,
2004), whereby substrate translocation is coupled with adenosine triphosphate (ATP)
hydrolysis (Locher et al., 2002). TC numbers represent categories of the Transport
Commission classification (Saier, 2000). ABC transporters are a dominant transporter
superfamily in bacteria (Paulsen et al., 2000), and they are the most abundant class of
primary transport systems in lactic acid bacteria (Poolman, 2002). ABC transporters are
the dominant transporter family in L. plantarum, wherein 57 complete systems were
annotated (Kleerebezem et al., 2003), in S. mutans where over 60 ABC transporters are
hypothetically present (Ajdic et al., 2002) and in S. pneumoniae where over 30% of
transporters are predicted to be sugar transporters. Although ABC transporters recognize
a variety of substrates, in LAB, ABC uptake transporters primarily recognize
carbohydrates. In contrast, in B. longum, most of the 25 ABC transporters seem to have
specificity for oligopeptides and amino acids (Schell et al., 2003). For most LAB,
members of the ABC family of transporters include uptake
proteins identified primarily for the transport of mono-, di-, tri- and poly-saccharides.
Specifically, ABC transporters have been characterized for the transport of maltose,
trehalose, lactose, arabinose, ribose, glucose, fucose, raffinose, and a variety of peptides.
ABC transporters usually consist of several subunits, namely the nucleotide
binding domains (NBDs), the membrane spanning domains (MSDs), and substrate
binding proteins (SBPs) (Quentin et al., 1999; Braibant et al., 2000; Poolman, 2002). The
minimum “complete” ABC transporter must include both a nucleotide binding domain
and a membrane spanning domain. Importers are usually pentameric, including two
NBDs, two MSDs and one SBP, whereas exporters are tetrameric, including two NBDs
and two MSDs (Braibant et al., 2000) (Figure 2).
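The subunit stoichiometries described above can be summarized as a simple rule. The following Python sketch is illustrative only: the function name and the category labels are hypothetical, and the rule encodes nothing beyond the typical compositions stated in the text (two NBDs and two MSDs, plus one SBP for importers). Subunit labels refer to proteins in the assembled complex, not to the number of genes in the operon.

```python
from collections import Counter

def classify_abc_system(subunits):
    """Classify an assembled ABC system from its subunit labels.

    `subunits` is a list of strings drawn from 'NBD', 'MSD' and 'SBP'.
    The returned labels are hypothetical and used for illustration only.
    """
    counts = Counter(subunits)
    nbd, msd, sbp = counts["NBD"], counts["MSD"], counts["SBP"]
    # A minimum "complete" system needs at least one NBD and one MSD.
    if nbd == 0 or msd == 0:
        return "incomplete"
    # Typical importer: two NBDs, two MSDs and one SBP (pentameric).
    if (nbd, msd, sbp) == (2, 2, 1):
        return "importer-like (pentameric)"
    # Typical exporter: two NBDs and two MSDs, no SBP (tetrameric).
    if (nbd, msd, sbp) == (2, 2, 0):
        return "exporter-like (tetrameric)"
    return "complete, atypical composition"

# A MalEFGK-like complex: one SBP (MalE), two MSDs (MalF, MalG), two NBDs (MalK dimer).
print(classify_abc_system(["SBP", "MSD", "MSD", "NBD", "NBD"]))
```

Real annotation pipelines rely on domain searches rather than on subunit counts alone, so this rule is only a compact restatement of the composition given above.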
There are many sub-families of ABC transporters, which are classified by the
nature of the substrate being translocated, including peptides, amino-acids, drugs,
antibiotics, iron, ions, and carbohydrates (Braibant et al., 2000). For importers, ABC
transporters involved in the uptake of carbohydrates are a key sub-family. Specifically,
most carbohydrate ABC transporters are similar to MalEFGK (Paulsen et al., 2000),
whereby MalE is a periplasmic substrate/solute binding protein (pfam 01547), MalFG are
two membrane-spanning permeases (pfam 00528), and MalK is a cytoplasmic
nucleotide-binding protein (pfam 00005), characteristic of the four subunits of a typical
ABC transport system (Quentin et al., 1999). In prokaryotes, the various elements of
ABC transporters are usually encoded by genes in the same operon, or locus, as
illustrated by the malEFGK and msmEFGK operons (Russell et al., 1992; Quentin et al.,
1999; Braibant et al., 2000; Barrangou et al., 2003). An anchoring motif similar to LPxTG
is usually present at the N-terminus of the substrate binding protein, allowing attachment
of this protein to the cell wall via a hydrophobic lipid extension (Quentin et al., 1999;
Braibant et al., 2000). However, this anchoring motif can vary between organisms, as
shown in L. plantarum, where the anchoring consensus sequence is LPQTxE
(Kleerebezem et al., 2003). Each permease usually contains four to eight transmembrane
α-helices, with most MSDs containing six trans-membrane regions (Table 3, Figure 3),
which form a trans-membrane channel allowing transport of the substrate across the
membrane, into the cell cytoplasm.
For the nucleotide binding protein, which is responsible for the hydrolysis of ATP
associated with transport of each molecule into the cell, there are several well conserved
motifs typical of ABC transporters. Specifically, genome-wide analyses of ABC
transporters in prokaryotes have shown that four motifs within the NBDs are highly
conserved between and within species, namely: Walker A (P loop), Walker B, Linton and
Higgins, and the ABC signature sequence (Linton and Higgins, 1998; Quentin et al.,
1999; Braibant et al., 2000; Locher et al., 2001; Davidson and Chen, 2004). The Walker
A motif has a GxxGxGKST / [AG]xxxxGK[ST] consensus, Walker B has a hhhhDEPT /
DExxxxxD consensus, the Linton and Higgins motif has a hhhhH+/- consensus, and the ABC
signature sequence has a LSGG / LSGGQ consensus, whereby h and +/- represent
hydrophobic and charged residues, respectively (Linton and Higgins, 1998; Quentin et
al., 1999; Braibant et al., 2000; Davidson and Chen, 2004).
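As an illustration of how such consensus strings can be used to screen a candidate nucleotide binding protein, the Python sketch below transcribes three of the motifs quoted above into regular expressions. Several elements are assumptions rather than part of the cited work: the set of residues treated as hydrophobic ("h"), the choice of the [AG]xxxxGK[ST], hhhhDEPT and LSGGQ variants, the function name, and the toy sequence, which is invented and does not correspond to any real protein.

```python
import re

# Assumed set of hydrophobic residues standing in for "h" (not defined in the text).
HYDROPHOBIC = "AILMFVWY"

# Regular expressions transcribed from the consensus strings quoted above.
NBD_MOTIFS = {
    "Walker A": re.compile(r"[AG].{4}GK[ST]"),
    "Walker B": re.compile(rf"[{HYDROPHOBIC}]{{4}}DEPT"),
    "ABC signature": re.compile(r"LSGGQ"),
}

def find_nbd_motifs(protein):
    """Return {motif name: [0-based start positions]} for one protein sequence."""
    seq = protein.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in NBD_MOTIFS.items()
    }

# Invented toy sequence containing one copy of each motif.
TOY_NBD = "MAAAGPSGSGKSTLLRAAALSGGQKQRVAAAILLLDEPTSAL"
print(find_nbd_motifs(TOY_NBD))
```

A hit for all three motifs is only suggestive of an NBD. Profile-based searches, such as those against pfam 00005 mentioned earlier, are more sensitive than fixed patterns.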
Perhaps the best characterized sugar ABC transporter in LAB is the MsmEFGK
transport system (Russell et al., 1992). It was originally described in S. mutans (Russell et
al., 1992; McLaughlin et al., 1996), and homologs were found in S. pneumoniae
(Rosenow et al., 1999) and L. acidophilus (Barrangou et al., 2003; Altermann et al.,
2004). MsmEFGK is involved in uptake of multiple sugars, including melibiose,
raffinose, isomaltotriose and FOS (Russell et al., 1992; Barrangou et al., 2003; Kaplan
and Hutkins, 2003).
Also, in B. longum, MalEFGK-like ABC transporters seem to be involved in the
transport of plant oligosaccharides such as arabinoglycans and arabinoxylans, which is
consistent with the presence of endoarabinosidases and endoxylanases in the genome
(Schell et al., 2003). A similar combination has also been found in E. faecalis (Paulsen et
al., 2003).
Most exporters are involved in transport of components toxic to the cell, such as
drugs and antibiotics (Poolman, 2002), whereas most importers are involved in transport
of energy sources and building blocks. Multidrug ABC transporters are commonly found
in LAB genomes. In particular, the LmrA multidrug ABC transporter has been well
characterized in Lactococcus lactis (van Veen et al., 1999). LmrA has the ability to
export anthracyclines, vinca-alkaloids, antibiotics and cytotoxic agents such as ethidium
bromide (van Veen et al., 1999). Multidrug ABC transporters are part of the mechanisms
developed by microbes in response to the occurrence of toxic compounds in their natural
habitats.
Overall, ABC transporters involved in carbohydrate uptake seem to have affinity
primarily for tri- and poly-saccharides. The substrate specificity is determined by the
substrate binding protein, although one specific SBP can recognize more than one
substrate, as illustrated by the msm operon in S. mutans (Russell et al., 1992). In
environments where tri- and poly-saccharides are present, such as the lower gastro-
intestinal tract, ABC transport systems are expected to provide a competitive advantage
by expanding the organism’s access to the pool of available substrates.
1.6 PTS transporters
Members of the phosphoenolpyruvate:sugar phosphotransferase system family of
transporters include uptake proteins identified primarily for the transport of mono- and
di-saccharides. The PTS is characterized by a phosphate transfer cascade involving
phosphoenolpyruvate (PEP), enzyme I (EI), HPr, and various EIIABCs, whereby a
phosphate originating from PEP is ultimately transferred to the carbohydrate substrate
(Vadeboncoeur and Pelletier, 1997; Siebold et al., 2001; Poolman, 2002; Warner and
Lolkema, 2003). Specifically, PTS transporters (TC # 4.1 - 4.6) have been characterized
for the transport of glucose, mannose, fructose, cellobiose, sucrose and trehalose (Table 2). It
was previously suggested that the PTS is the primary sugar transport system of Gram-positive
bacteria (Ajdic et al., 2002; Warner and Lolkema, 2003). Although PTS
transporters are not found in archaea or eukarya, they are present in most bacteria
(Paulsen et al., 2000; Saier, 2000). The PTS consists of three (EIIA, B and C) or four
domains (EIIA, B, C and D) (Saier and Reizer, 1992). The hydrophilic chains bearing the
first and second phosphorylation sites are EIIA and EIIB, respectively, while the
transmembrane channel and sugar binding site consist of EIIC (Saier and Reizer, 1998).
The number of predicted transmembrane spanning domains is usually 10 in PTS
transporters (Table 3, Figure 3), which is different from ABC transporters. When
applicable, EIID is the hydrophobic protein of the splinter group (Saier and Reizer,
1992). The range and specificity of substrates transported by each PTS transporter is
determined by the range of the EII complex.
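The phosphate transfer cascade described above can be written out as an ordered relay. The short Python sketch below is a schematic restatement only: the names are hypothetical, and EIIC is omitted from the carrier list on the assumption that it acts as the channel and sugar binding site rather than as a phosphoryl carrier, as stated in the text.

```python
# Order of phosphoryl carriers as described in the text. EIIC forms the
# transmembrane channel and sugar binding site, so it is not listed here.
PTS_CARRIERS = ["PEP", "EI", "HPr", "EIIA", "EIIB"]

def pts_relay(sugar):
    """List the (donor, acceptor) phosphotransfer steps ending on the sugar."""
    chain = PTS_CARRIERS + [sugar]
    # Pair each carrier with the next one in the chain.
    return list(zip(chain, chain[1:]))

for donor, acceptor in pts_relay("glucose"):
    print(f"{donor} -> {acceptor}")
```

Each step passes the phosphoryl group one position along the chain, so that the sugar is phosphorylated as it is translocated through EIIC.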
In streptococci, PTS transporters are important in carbohydrate uptake and
regulation (Vadeboncoeur and Pelletier, 1997). Specifically, in Streptococcus salivarius,
Streptococcus mutans and Streptococcus sobrinus, PTS transporters involved in uptake of
a variety of mono- and di-saccharides have been identified (Vadeboncoeur and Pelletier,
1997). In contrast, only one PTS transporter is present in B. longum (Schell et al., 2003).
In lactobacilli and lactococci, a variety of PTS transporters have been identified, including 13, 16
and 25 complete PTS transporters in L. lactis, L. johnsonii and L. plantarum, respectively
(Bolotin et al., 1999; Schell et al., 2003; Kleerebezem et al., 2003). In streptococci, a
variety of PTS transporters have also been identified, including 21 complete PTS
transporters in S. pneumoniae (Tettelin et al., 2001).
Since there is a correlation between the genomic association of genes and
functional interaction of the proteins they encode (Snel et al., 2002), catabolic enzymes
are expected to be encoded in the vicinity of the genes encoding transporters of their
substrates. Similarly, transcriptional regulators are also commonly found in the vicinity
of the genes they control. As a result, for carbohydrate loci, transcriptional regulators,
transporters and sugar hydrolases are usually found together in the same operon or locus.
Perhaps the best characterized PTS transporters in LAB are the sucrose and
glucose/mannose transport systems (Luesink et al., 1999b; Cochu et al., 2003). The
sucrose PTS locus has been described in L. lactis (Luesink et al., 1999b), L. plantarum
(Naumoff and Livshits, 2001) and P. pentosaceus (Naumoff and Livshits, 2001). The
glucose/mannose PTS EIIABCDMan transporter was recently characterized in S.
thermophilus (Cochu et al., 2003). Both PTS transporters have also been found in
recently sequenced LAB genomes (see Table 2).
A number of lactic acid bacteria take up glucose and mannose via a PTS
transporter. Specifically, the EIIMan PTS transporter has the ability to take up both
mannose and glucose (Cochu et al., 2003). The glucose-mannose PTS transporter has
been well characterized in S. thermophilus (Cochu et al., 2003). The glucose PTS has
been identified in a variety of streptococci, namely S. mutans, S. sobrinus and S.
thermophilus (Vadeboncoeur and Pelletier, 1997).
Several PTS loci have been well characterized in LAB, especially the glucose,
fructose and sucrose loci, which contain the mannose / glucose PTS transporter
EIIABCDMan, the fructose PTS transporter EIIABCFru, and the sucrose PTS transporter
EIIBCASuc. Additionally, the trehalose locus, including a trehalose PTS transporter
EIIABCTre has been well characterized in L. acidophilus (Duong et al., 2004). Putative
PTS transporters have also been identified in a variety of LAB (Table 2), but the
annotation is based on similarity to other non-LAB transporters, and most have not been
substantiated by functional analyses. In streptococci, PTS activity has been shown for
glucose, fructose, mannose, lactose, mannitol, sorbitol, maltose, sucrose, trehalose, and
xylitol (Vadeboncoeur and Pelletier, 1997).
Overall, PTS transporters involved in carbohydrate uptake appear to have affinity
primarily for mono- and di-saccharides. The substrate specificity is determined by the
EIIA, EIID or EIIC substrate binding protein, although one specific transporter can recognize
more than one substrate, as illustrated by the mannose / glucose EIIABCDMan. In
environments where mono- and di-saccharides are present, such as the upper
gastrointestinal tract, PTS transport systems might provide efficient carbohydrate
utilization and potentially a competitive advantage.
1.7 Other transporters
Lactic acid bacteria possess a variety of transport systems (Saier, 2000; Konings,
2002). In addition to ABC and PTS transporters, the main transporter families include the
F0F1 ATPase, the uniport / symport / antiport systems, and the protein secretion / export
system (Konings, 2002; Figure 2).
The secondary transport system, including uniport / symport / antiport complexes
is involved primarily in transport of amino acids, ions, and acids (Konings, 2002).
Specifically, amino-acid transporters have been well characterized in Lactococcus lactis
(Bolotin et al., 1999; Konings, 2002).
The F0F1-ATPase (TC 3.1, Paulsen et al., 1998) has been well characterized in
several LAB, including lactobacilli (Kullen and Klaenhammer, 1999; Sievers et al.,
2003), bifidobacteria (Ventura et al., 2004), oenococci (Sievers et al., 2003), pediococci
(Sievers et al., 2003) and Lactococcus lactis (Bolotin et al., 1999; Konings, 2002). The
operon has been well characterized in L. acidophilus (Kullen and Klaenhammer, 1999)
and B. lactis (Ventura et al., 2004), whereby atpBEFHAGDC encode the a, c, b, δ, α, γ,
β, and ε subunits of the F0F1-ATPase, respectively. This transport system is an important
element in the response and tolerance to low pH, which is instrumental for resistance to
acid stress in the human gastrointestinal tract. This is another typical example of how
genomes of intestinal microbes include specific transporters which allow them to exist in
various environments. Similarly, members of the Oenococcus and Leuconostoc genera
used in wine fermentation also have an F0F1-ATPase which confers resistance to low pH
(Sievers et al., 2003).
The major facilitator superfamily (MFS) includes a variety of transporters (TC
#2.1-2.2). Specifically, the glycoside-pentoside-hexuronide (GPH):cation symporter
family is associated with transport of carbohydrates, including galactosides (Saier, 2000).
This family includes 12 transmembrane domains (Saier, 2000), which is different from
PTS transporters, and similar to the two combined MSDs in each ABC transporter (Table
3, Figure 3).
With regard to drug resistance, in addition to multidrug ABC transporters, LAB
have also developed secondary transporters which export drugs and toxic compounds
(van Veen et al., 1999). Specifically, in L. lactis, LmrP mediates the extrusion of drugs,
such as antibiotics, in antiport with protons (van Veen et al., 1999). This system is also
part of the major facilitator superfamily (Saier, 2000).
Members of the LacS subfamily of galactoside-pentose-hexuronide (GPH)
translocators have been identified for the uptake of lactose and galactose in Lactobacillus
bulgaricus (Leong-Morgenthaler et al., 1991), Leuconostoc lactis (Vaughan et al., 1996),
S. thermophilus (van den Bogaard et al., 2000; Vaillancourt et al., 2002), S. salivarius
(Vaillancourt et al., 2002; Lessard et al., 2003), L. delbrueckii (Lapierre et al., 2002) and
L. helveticus (Fortina et al., 2003). A similar GPH transporter, LacY, is present in L.
lactis (Bolotin et al., 1999).
Although LacS contains a PTS EIIA at the carboxy-terminus, towards the
cytoplasmic side of the protein (Vaughan et al., 1996; Lessard et al., 2003), it is not a
member of the PTS family of transporters. Also, LacS contains 12 transmembrane
domains, which differs from PTS transporters (Table 3, Figure 3). LacS has been reported
to have the ability to take up both galactose and lactose in select organisms (Vaughan et
al., 1996; van den Bogaard et al., 2000), although the specificity varies between
organisms, and depends on the presence of alternative galactoside transporters in the
organism.
A LacS-LacY homolog was also identified in L. brevis (Djordjevic et al., 2001).
Although it is a member of the GPH family (TC # 2.2), it did not include a PTS EIIA
domain, indicating dependence upon a different regulatory network than that of the PTS
and other GPH transporters (Djordjevic et al., 2001).
The gene encoding the GPH transporter is usually associated with ORFs encoding
enzymes involved in the metabolism of galactosides. Specifically, saccharolytic enzymes
likely involved in the metabolism of galactosides include the enzymatic machinery of the
Leloir pathway, although operon organization is variable and unstable among LAB
(Lapierre et al., 2002; Vaillancourt et al., 2002; Boucher et al., 2003; Fortina et al., 2003;
Grossiord et al., 2003; Pridmore et al., 2004). The Leloir pathway allows catabolism of
both lactose and galactose into substrates of glycolysis (Grossiord et al., 2003).
Alternatively, the tagatose pathway may also metabolize galactosides (de Vos, 1996;
Boels et al., 2003).
1.8 Regulation and carbon catabolite repression
To understand how microbes utilize carbohydrates, we must determine the genetic
and biochemical bases for sugar transport into the cell, and identify the regulatory
networks involved in transcription of genes encoding transporters. Carbohydrate transport
and catabolism are well orchestrated in LAB, so as to utilize carbohydrate sources
optimally. The regulatory mechanism for global carbohydrate utilization is carbon
catabolite repression (CCR).
Carbon catabolite repression (CCR) is a mechanism widely distributed amongst
Gram-positive bacteria, usually mediated in cis by catabolite response elements (cre)
(Weickert and Chambliss, 1990; Miwa et al., 2000), and in trans by repressors of the
LacI family, responsible for transcriptional repression of genes encoding unnecessary
saccharolytic components (Weickert and Chambliss, 1990; Viana et al., 2000;
Muscariello et al., 2001; Titgemeyer and Hillen, 2002; Warner and Lolkema, 2003). Cre
sequences (Weickert and Chambliss, 1990) are well conserved amongst Gram-positive
bacteria and found in most LAB in the promoter-operator of many genes involved in
carbohydrate utilization (Barrangou et al., 2003), including L. plantarum (Muscariello et
al., 2001) and L. pentosus (Mahr et al., 2000). CCR controls transcription of genes encoding
proteins involved in transport and catabolism of carbohydrates (Miwa et al., 2000), so as to
transcribe genes encoding the transport and enzymatic machinery of a particular
substrate exclusively when it is present in the environment. This regulatory system
allows cells to coordinate the utilization of diverse carbohydrates, so as to focus primarily
on preferred energy sources (Poolman, 2002). Understanding carbon catabolite repression
is critical to describing how microbes adapt their uptake machinery to changing nutrients
in their environment.
CCR is able to control PTS, ABC and GPH transporters. Specifically, ABC
transporters of the MsmEFGK family have been shown to be repressed by glucose in a
manner consistent with CCR, in S. pneumoniae (Rosenow et al., 1999; Barrangou et al.,
2003). Similarly, genes of the galactose operon seem to be regulated via CCR in S.
salivarius (Vaillancourt et al., 2002).
The L. acidophilus genome encodes a large variety of genes related to
carbohydrate utilization. In particular, many members of the ABC and PTS families of
transporters were found. Additionally, the members of the general carbohydrate
utilization regulatory network were identified, namely HPr (ptsH), EI (ptsI), CcpA
(ccpA) and HPrK/P (ptsK). Similarly, all those genes were identified in S. pneumoniae
(Tettelin et al., 2001). Those genes are involved in an active regulatory network based on
sugar availability. The regulatory networks involved in sugar utilization are not well
documented in lactobacilli and bifidobacteria, whereas they have been characterized in
streptococci (Vadeboncoeur and Pelletier, 1997). Nevertheless, previous work has
indicated involvement of CcpA in repression of specific operons in L. casei and L.
plantarum (Viana et al., 2000; Muscariello et al., 2001), and in L. pentosus (Mahr et al.,
2000). Specifically, the pepQ-ccpA locus has been identified in L. pentosus, L.
delbrueckii, L. casei, S. mutans and L. lactis (Mahr et al., 2000), and in most cases, a cre
sequence is found in the promoter-operator region of ccpA. The PTS is characterized by a
phosphate transfer cascade involving PEP, EI, HPr, and various EIIABCs, whereby a
phosphate is ultimately transferred to the carbohydrate substrate (Saier, 2000; Titgemeyer
and Hillen, 2002; Warner and Lolkema, 2003). HPr is a key component of CCR, which is
regulated via phosphorylation by enzyme I (EI) and HPr kinase/phosphatase (HPrK/P).
While HPr is the primary regulator of CCR, HPrK/P is the sensor enzyme of CCR in
Gram-positive bacteria (Nessler et al., 2003). HPrK/P has been found in a variety of
LAB, including L. casei, L. brevis, L. delbrueckii, L. gasseri, L. acidophilus, L. lactis,
Streptococcus bovis, S. mutans, S. salivarius, S. pneumoniae, S. pyogenes, S. agalactiae
and Leuconostoc mesenteroides (Warner and Lolkema, 2003; Altermann et al., 2004).
Similarly, HPr has also been found in a variety of LAB, including L. casei, L. sakei, L.
acidophilus, L. gasseri, L. brevis, L. mesenteroides, L. lactis, E. faecalis, S. mutans, S.
salivarius, S. bovis, S. pyogenes, S. pneumoniae, S. thermophilus, S. agalactiae and
Oenococcus oeni (Warner and Lolkema, 2003; Altermann et al., 2004). The HPr-HPrK/P
complex has been characterized structurally (Fieulaine et al., 2002). When HPr is
phosphorylated at His15, the PTS is on (Poolman, 2002), and carbohydrates transported
via the PTS are phosphorylated via EIIABCs. In contrast, when HPr is phosphorylated at
Ser46, the PTS machinery is not functional (Vadeboncoeur and Pelletier, 1997;
Mijakovic et al., 2002; Nessler et al., 2003). HPr-Ser46 acts as a co-repressor by binding
to CcpA (Fieulaine et al., 2002; Nessler et al., 2003). Ultimately, CcpA binds to cre
sequences in the promoter-operator region of operons encoding carbohydrate transporters
and hydrolases, and prevents their transcription (Hueck and Hillen, 1995; Poolman,
2002).
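The regulatory logic outlined above can be condensed into a small truth table. The Python sketch below is a deliberately simplified model and not a description of any published implementation: the function and key names are hypothetical, and the treatment of HPr phosphorylated at both sites (Ser46 taken to override His15) is an assumption that the text does not address.

```python
def ccr_outcome(his15_p, ser46_p, cre_in_promoter):
    """Summarize PTS activity and cre-dependent repression for one operon.

    his15_p / ser46_p: whether HPr is phosphorylated at His15 / Ser46.
    cre_in_promoter: whether the operon carries a cre site.
    """
    # Assumption: Ser46 phosphorylation overrides His15 phosphorylation.
    pts_on = his15_p and not ser46_p
    # HPr phosphorylated at Ser46 acts as a co-repressor by binding CcpA.
    ccpa_complex = ser46_p
    # CcpA bound at a cre site prevents transcription of the operon.
    repressed = ccpa_complex and cre_in_promoter
    return {"pts_on": pts_on, "ccpa_complex": ccpa_complex, "repressed": repressed}

# HPr phosphorylated at Ser46, operon with a cre site.
print(ccr_outcome(his15_p=False, ser46_p=True, cre_in_promoter=True))
```

The model ignores quantitative effects such as partial phosphorylation and the activity of HPrK/P, and is meant only to make the sequence of dependencies explicit.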
HPr has been identified in E. faecalis (Vadeboncoeur and Pelletier, 1997), S.
pyogenes (Deutscher and Saier, 1983; Vadeboncoeur and Pelletier, 1997), and L. lactis
(Luesink et al., 1999a).
CcpA-dependent repression and activation is well documented in a variety of
LAB, including enterococci, lactobacilli, lactococci and streptococci, especially with
regard to repression of the genes involved in utilization of galactosides (Titgemeyer and
Hillen, 2002).
The interaction between HPr and LacS has been shown in S. salivarius (Lessard et
al., 2003). It occurs between HPr-His and EIIALacS, although LacS is not a member of
the PTS system. Since HPr is the primary regulator of CCR, the interaction between HPr
and LacS illustrates the likely regulation of the GPH system by CCR. In S. thermophilus,
the control of LacS by CCR has been illustrated, likely via interaction between CcpA and
two cre sequences found in the promoter-operator region of the lacSZ operon (van den
Bogaard et al., 2000).
Although the phosphorylation cascade suggests regulation at the protein level,
studies in LAB report both transcriptional modulation and constitutive expression of
ccpA and ptsHI. Specifically, in S. thermophilus, CcpA production is induced by glucose
(van den Bogaard et al., 2000). Similarly, in other bacteria, the carbohydrate source modulates
ptsHI transcriptional levels (Luesink et al., 1999a). In contrast, expression levels of ccpA
in L. pentosus (Mahr et al., 2000) and of ptsHI in S. thermophilus (Cochu et al., 2003) did
not vary in the presence of different carbohydrates.
Carbon catabolite repression is likely present in L. acidophilus, since all the
necessary regulatory proteins are encoded within its genome, cre-like sequences are
present in the promoter-operator regions of several carbohydrate loci (Barrangou et al.,
2003), and transcription of operons involved in utilization of non-preferred carbohydrates
is repressed by glucose (Barrangou et al., 2003).
Carbon catabolite repression illustrates how lactic acid bacteria adapt dynamically
to the diverse carbohydrate sources available in their various habitats.
1.9 Conclusions and perspectives
Although a variety of putative carbohydrate transporters have been identified in
LAB genomes recently published, little information is available regarding their biological
functions and expression profiles. Specifically, the substrate specificity of most PTS and
ABC transporters remains unclear, as illustrated in the incomplete annotation of most
PTS transporters in L. plantarum, L. acidophilus, L. johnsonii and S. pneumoniae
(Kleerebezem et al., 2003; Altermann et al., 2004; Schell et al., 2003; Tettelin et al., 2001). As a
result, in silico analyses must be confirmed and complemented by transcriptional and
biological analyses.
Surveys of carbohydrate uptake systems revealed greater diversity in prokaryotes
than eukaryotes. Specifically, eukaryotic carbohydrate transport is dominated by the
MFS, whereas that of prokaryotes involves the MFS, PTS and ABC superfamilies
of transporters (Saier, 2000).
Recent advances in high throughput technologies, primarily genome sequencing
and microarrays, have yielded global data that provide insight into the physiology of
microbes. Particularly, LAB genome analyses have illustrated the breadth and importance
of carbohydrate transporters in lactobacilli and bifidobacteria. Similarly, global
transcriptome analyses, similar to those carried out in Escherichia coli (Beloin et al.,
2004), Bacillus subtilis (Blencke et al., 2003), Vibrio cholerae (Meibom et al., 2003),
Thermotoga maritima (Chhabra et al., 2003; Pysz et al., 2004a; Pysz et al., 2004b) and
Pyrococcus furiosus (Shockley et al., 2003), applied to carbohydrate utilization
investigation in LAB will provide further insight into the transporters and metabolic
pathways involved in adaptation of LAB to their various environmental conditions.
Ultimately, genetic engineering of LAB could allow development of better starter
cultures and probiotic strains, optimized for utilization of specific carbohydrate sources,
and competition with other commensals. Genetic engineering in LAB is now possible,
following the development of molecular biology tools, including food-grade systems (de
Vos, 1996; Russell and Klaenhammer, 1998; Boucher et al., 2002; Kleerebezem and
Hugenholtz, 2003).
Overall, the combination of a diverse saccharolytic enzymatic machinery with a
polyvalent transport system, consisting primarily of ABC and PTS transporters, allows
lactic acid bacteria to utilize a variety of nutrient resources efficiently and to dynamically
adapt their transcriptome to environmental conditions, ultimately rendering these microbes
more competitive in their respective environments.
1.10 References
Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A., & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439
Alles, M. S., Hautvast, J. G. A. J., Nagengast, F. M., Hartemink, R., Van Laere, K. M. J., & Jansen, J. B. M. (1996) Brit. J. Nutr. 76, 211-221
Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B., McAuliffe, O., Souther, N., Dobson, A., Duong, T., Callanan, M., Lick, S., Hamrick, A., Cano, R., & Klaenhammer, T. R. (2004) J. Bacteriol. In review
Barrangou, R., Altermann, E., Hutkins, R., Cano, R., & Klaenhammer, T. R. (2003) Proc. Natl. Acad. Sci. USA 100, 8957-8962
Beloin, C., Valle, J., Latour-Lambert, P., Faure, P., Kzreminski, M., Balestrino, D., Haagensen, J. A. J., Molin, S., Prensier, G., Arbeile, B., & Ghigo, J. M. (2004) Mol. Microbiol. 51, 659-674
Blencke, H. M., Homuth, G., Ludwig, H., Mader, U., Hecker, M., & Stulke, J. (2003) Metab. Eng. 5, 133-149
Boels, I. C., Kleerebezem, M., & de Vos, W. M. (2003) Appl. Environ. Microbiol. 69, 1129-1135
Bolotin, A., Mauger, S., Malarme, K., Ehrlich, S. D., & Sorokin, A. (1999) Antonie van Leeuwenhoek 76, 27-76
Boucher, I., Parrot, M., Gaudreau, H., Champagne, C. P., Vadeboncoeur, C., & Moineau, S. (2002) Appl. Environ. Microbiol. 68, 6152-6161
Boucher, I., Vadeboncoeur, C., & Moineau, S. (2003) Appl. Environ. Microbiol. 69, 4149-4156
Braibant, M., Gilot, P., & Content, J. (2000) FEMS Microbiol. Rev. 24, 449-467
Cavalier-Smith, T. (2004) Proc. R. Soc. Lond. 271, 1251-1262
Chhabra, S. R., Shockley, K. R., Conners, S. B., Scott, K. L., Wolfinger, R. D., & Kelly, R. M. (2003) J. Biol. Chem. 278, 7540-7552
Cochu, A., Vadeboncoeur, C., Moineau, S., & Frenette, M. (2003) Appl. Environ. Microbiol. 69, 5423-32
Curtis, T. P., & Sloan, W. T. (2004) Curr. Opin. Microbiol. 7, 221-226
Curtis, T. P., Sloan, W. T., & Scannell, J. W. (2002) Proc. Natl. Acad. Sci. USA 99, 10494-10499
Davidson, A. L., & Chen, J. (2004) Annu. Rev. Biochem. 73, 241-268
Deutscher, J., & Saier, M. H. (1983) Proc. Natl. Acad. Sci. USA 80, 6790-6794
De Vos, W. M. (1996) Antonie van Leeuwenhoek 70, 223-242
Djordjevic, G. M., Tchieu, J. H., & Saier, M. H. (2001) J. Bacteriol. 183, 3224-3236
Duong, T., Barrangou, R., Russell, M. W., & Klaenhammer, T. R. (2004) In review
Embley, T. M., Hirt, R. P., & Williams, D. M. (1994) Phil. Trans. R. Soc. Lond. 345, 21-33
Ferretti, J. J., McShan, W. M., Ajdic, D., Savic, D. J., Savic, G., Lyon, K., Primeaux, C., Sezate, S., Suvorov, A., Kenton, S., Lai, H. S., Lin, S. P., Qian, Y., Jia, H. G., Najar, F. Z., Ren, Q., Zhu, H., Song, L., White, J., Yuan, X., Clifton, S. W., Roe, B. A., & McLaughlin, R. (2001) Proc. Natl. Acad. Sci. USA 98, 4658-4663
Fieulaine, S., Morera, S., Poncet, S., Mijakovic, I., Galinier, A., Janin, J., Deutscher, J., & Nessler, S. (2002) Proc. Natl. Acad. Sci. USA 99, 13437-13441
Fortina, M. G., Ricci, G., Mora, D., Guglielmetti, S., & Manachini, P. L. (2003) Appl. Environ. Microbiol. 69, 3238-43
Gibson, G. R., & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412
Grantham, R., Gautier, C., Gouy, M., Mercier, R., & Pave, A. (1980) Nucleic Acids Res. 8, r49-r62
Grossiord, B. P., Luesink, E. J., Vaughan, E. E., Arnaud, A., & De Vos, W. M. (2003) J. Bacteriol. 185, 870-8
Hueck, C. J., & Hillen, W. (1995) Mol. Microbiol. 15, 395-401
Hugenholtz, J., Sybesma, W., Groot, M. N., Wisselink, W., Ladero, V., Burgess, K., van Sinderen, D., Piard, J. C., Eggink, G., Smid, E. J., Savoy, G., Sesma, F., Jansen, T., Hols, P., & Kleerebezem, M. (2002) Antonie van Leeuwenhoek 82, 217-235
Kaplan, H., & Hutkins, R. W. (2000) Appl. Environ. Microbiol. 66, 2682-2684
Kaplan, H., & Hutkins, R. W. (2003) Appl. Environ. Microbiol. 69, 2217-2222
Klaenhammer, T. R., Altermann, E., Arigoni, F., Bolotin, A., Breidt, F., Broadbent, J., Cano, R., Chaillou, S., Deutscher, J., Gasson, M., van de Guchte, M., Guzzo, J., Hartke, A., Hawkins, T., Hols, P., Hutkins, R., Kleerebezem, M., Kok, J., Kuipers, O., Lubbers, M., Maguin, E., McKay, L., Mills, D., Nauta, A., Overbeek, R., Pel, H., Pridmore, D., Saier, M., van Sinderen, D., Sorokin, A., Steele, J., O'Sullivan, D., de Vos, W., Weimer, B., Zagorec, M., & Siezen, R. (2002) Antonie van Leeuwenhoek 82, 29-58
Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M., & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-5
Kleerebezem, M., & Hugenholtz, J. (2003) Curr. Opin. Biotechnol. 14, 232-237
Konings, W. N. (2002) Antonie van Leeuwenhoek 82, 3-27
Krogh, A., Larsson, B., von Heijne, G., & Sonnhammer, E. L. L. (2001) J. Mol. Biol. 305, 567-580
Kullen, M. J., & Klaenhammer, T. R. (1999) Mol. Microbiol. 33, 1152-1161
Kumar, S., Tamura, K., Jakobsen, I. B., & Nei, M. (2001) Bioinformatics 17, 1244-1245
Lapierre, L., Mollet, B., & Germond, J. E. (2002) J. Bacteriol. 184, 928-35
Leong-Morgenthaler, P., Zwahlen, M. C., & Hottinger, H. (1991) J. Bacteriol. 173, 1951-1957
Lessard, C., Cochu, A., Lemay, J. D., Roy, D., Vaillancourt, K., Frenette, M., Moineau, S., & Vadeboncoeur, C. (2003) J. Bacteriol. 185, 6764-72
Linton, K. J., & Higgins, C. F. (1998) Mol. Microbiol. 28, 5-13
Locher, K. P., Lee, A. T., & Rees, D. C. (2002) Science 296, 1091-1098
Luesink, E. J., Beumer, C. M. A., Kuipers, O. P., & de Vos, W. M. (1999a) J. Bacteriol. 181, 764-771
Luesink, E. J., Marugg, J. D., Kuipers, O. P., & de Vos, W. M. (1999b) J. Bacteriol. 181, 1924-1926
Mahr, K., Hillen, W., & Titgemeyer, F. (2000) Appl. Environ. Microbiol. 66, 277-83
Margulis, L. (1996) Proc. Natl. Acad. Sci. USA 93, 1071-1076
McLaughlin, R. E., & Ferretti, J. J. (1996) FEMS Microbiol. Lett. 140, 261-264
Meibom, K. L., Li, X. B., Wu, C. Y., Roseman, S., & Schoolnik, G. K. (2004) Proc. Natl. Acad. Sci. USA 101, 2524-2529
Mijakovic, I., Poncet, S., Galinier, A., Monedero, V., Fieulaine, S., Janin, J., Nessler, S., Marquez, J. A., Scheffzek, K., Hasenbein, S., Hengstenberg, W., & Deutscher, J. (2002) Proc. Natl. Acad. Sci. USA 99, 13442-7
Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M., & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-10
Muscariello, L., Marasco, R., De Felice, M., & Sacco, M. (2001) Appl. Environ. Microbiol. 67, 2903-2907
Muto, A., & Osawa, S. (1987) Proc. Natl. Acad. Sci. USA 84, 166-169
Naumoff, D. G., & Livshits, V. A. (2001) Mol. Biol. 35, 19-27
Nesbo, C. L., Nelson, K. E., & Doolittle, W. F. (2002) J. Bacteriol. 184, 4475-4488
Nessler, S., Fieulaine, S., Poncet, S., Galinier, A., Deutscher, J., & Janin, J. (2003) J. Bacteriol. 185, 4003-4010
Ouwehand, A. C., Salminen, S., & Isolauri, E. (2002) Antonie van Leeuwenhoek 82, 279-289
Paulsen, I. T., Sliwinski, M. K., & Saier, M. H. (1998) J. Mol. Biol. 277, 573-592
Paulsen, I. T., Nguyen, L., Sliwinski, M. K., Rabus, R., & Saier, M. H. (2000) J. Mol. Biol. 301, 75-100
Paulsen, I. T., Banerjei, L., Myers, G. S. A., Nelson, K. E., Seshadri, R., Read, T. D., Fouts, D. E., Eisen, J. A., Gill, S. R., Heidelberg, J. F., Tettelin, H., Dodson, R. J., Umayam, L., Brinkac, L., Beanan, M., Daugherty, S., DeBoy, R. T., Durkin, S., Kolonay, J., Madupu, R., Nelson, W., Vamathevan, J., Tran, B., Upton, J., Hansen, T., Shetty, J., Khouri, H., Utterback, T., Radune, D., Ketchum, K. A., Dougherty, B. A., & Fraser, C. M. (2003) Science 299, 2071-2074
Poolman, B. (2002) Antonie van Leeuwenhoek 82, 147-164
Pridmore, R. D., Berger, B., Desiere, F., Vilanova, D., Barretto, C., Pittet, A. C., Zwahlen, M. C., Rouvet, M., Altermann, E., Barrangou, R., Mollet, B., Mercenier, A., Klaenhammer, T. R., Arigoni, F., & Schell, M. A. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517
Pysz, M. A., Conners, S. B., Montero, C. I., Shockley, K. R., Johnson, M. R., Ward, D. E., & Kelly, R. M. (2004a) Appl. Environ. Microbiol. 70, 6098-6112
Pysz, M. A., Ward, D. E., Shockley, K. R., Montero, C. I., Conners, S. B., Johnson, M. R., & Kelly, R. M. (2004b) Extremophiles 8, 209-17
Quentin, Y., Fichant, G., & Denizot, F. (1999) J. Mol. Biol. 287, 467-484
Reid, G. (1999) Appl. Environ. Microbiol. 65, 3763-6
Reid, G., Sanders, M. E., Gaskins, H. R., Gibson, G. R., Mercenier, A., Rastall, R., Roberfroid, M., Rowland, I., Cherbut, C., & Klaenhammer, T. R. (2003) J. Clin. Gastroenterol. 37, 105-118
Rivera, M. C., & Lake, J. A. (2004) Nature 431, 152-155
Rosenow, C., Maniar, M., & Trias, J. (1999) Genome Res. 9, 1189-97
Russell, R. R. B., Aduse-Opoku, J., Sutcliffe, I. C., Tao, L., & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637
Russell, W. M., & Klaenhammer, T. R. (2001) Appl. Environ. Microbiol. 67, 4361-4364
Rycroft, C. E., Jones, M. R., Gibson, G. R., & Rastall, R. A. (2001) J. Appl. Microbiol. 91, 878-87
Saier, M. H., & Reizer, J. (1992) J. Bacteriol. 174, 1433-1438
Saier, M. H. (2000) Mol. Microbiol. 35, 699-710
Sanders, M. E., & Klaenhammer, T. R. (2001) J. Dairy Sci. 84, 319-331
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D., & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427
Shockley, K. R., Ward, D. E., Chhabra, S. R., Conners, S. B., Montero, C. I., & Kelly, R. M. (2003) Appl. Environ. Microbiol. 69, 2365-2371
Siebold, C., Flukiger, K., Beutler, R., & Erni, B. (2001) FEBS Lett. 504, 104-111
Sievers, M., Uermosi, C., Fehlmann, M., & Krieger, S. (2003) System. Appl. Microbiol. 26, 350-356
Siezen, R. J., van Enckevort, F. H. J., Kleerebezem, M., & Teusink, B. (2004) Curr. Opin. Biotechnol. 15, 105-115
Snel, B., Bork, P., & Huynen, M. A. (2002) Proc. Natl. Acad. Sci. USA 99, 5890-5895
Tannock, G. W. (1999) Antonie van Leeuwenhoek 76, 265-278
Tettelin, H., Nelson, K. E., Paulsen, I. T., Eisen, J. A., Read, T. D., Peterson, S., Heidelberg, J., Deboy, R. T., Haft, D. H., Dodson, R. J., Durkin, A. S., Gwinn, M., Kolonay, J. F., Nelson, W. C., Peterson, J. D., Umayam, L. A., White, O., Salzberg, S. L., Lewis, M. R., Radune, D., Holtzapple, E., Khouri, H., Wolf, A. M., Utterback, T. R., Hansen, C. L., McDonald, L. A., Feldblyum, T. V., Angiuoli, S., Dickinson, T., Hickey, E. K., Holt, I. E., Loftus, B. J., Yang, F., Smith, H. O., Venter, J. C., Dougherty, B. A., Morrison, D. A., Hollingshead, S. K., & Fraser, C. M. (2001) Science 293, 498-506
Tettelin, H., Masignani, V., Cieslewicz, M. J., Eisen, J. A., Peterson, S., Wessels, M. R., Paulsen, I. T., Nelson, K. E., Margarit, I., Read, T. D., Madoff, L. C., Wolf, A. M., Beanan, M. J., Brinkac, L. M., Daugherty, S. C., DeBoy, R. T., Durkin, A. S., Kolonay, J. F., Madupu, R., Lewis, M. R., Radune, D., Fedorova, N. B., Scanlan, D., Khouri, H., Mulligan, S., Carty, H. A., Cline, R. T., Van Aken, S. E., Gill, J., Scarselli, M., Mora, M., Iacobini, E. T., Brettoni, C., Galli, G., Mariani, M., Vegni, F., Maione, D., Rinaudo, D., Rappuoli, R., Telford, J. L., Kasper, D. L., Grandi, G., & Fraser, C. M. (2002) Proc. Natl. Acad. Sci. USA 99, 12391-12396
Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680
Titgemeyer, F., & Hillen, W. (2002) Antonie van Leeuwenhoek 82, 59-71
Vadeboncoeur, C., & Pelletier, M. (1997) FEMS Microbiol. Rev. 19, 187-207
Vaillancourt, K., Moineau, S., Frenette, M., Lessard, C., & Vadeboncoeur, C. (2002) J. Bacteriol. 184, 785-793
Van den Bogaard, P. T. C., Kleerebezem, M., Kuipers, O. P., & De Vos, W. M. (2000) J. Bacteriol. 182, 5982-5989
Van Laere, K. M., Hartemink, R., Bosveld, M., Schols, H. A., & Voragen, A. G. (2000) J. Agric. Food Chem. 48, 1644-1652
Van Veen, H. W., Margolles, A., Putman, M., Sakamoto, K., & Konings, W. N. Antonie van Leeuwenhoek 76, 347-352
Vaughan, E. E., David, S., & De Vos, W. M. (1996) Appl. Environ. Microbiol. 62, 1574-82
Vaughan, E. E., de Vries, M. C., Zoetendal, E. G., Ben-Amor, K., Akkermans, A. D. L., & de Vos, W. M. (2002) Antonie van Leeuwenhoek 82, 341-352
Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., Eisen, J. A., Wu, D., Paulsen, I., Nelson, K. E., Nelson, W., Fouts, D. E., Levy, S., Knap, A. H., Lomas, M. W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons, R., Baden-Tillson, H., Pfannkoch, C., Rogers, Y. H., & Smith, H. O. (2004) Science 304, 66-74
Ventura, M., Canchaya, C., van Sinderen, D., Fitzgerald, G. F., & Zink, R. (2004) Appl. Environ. Microbiol. 70, 3110-3121
Viana, R., Monedero, V., Dossonet, V., Vadeboncoeur, C., Perez-Martinez, G., & Deutscher, J. (2000) Mol. Microbiol. 36, 570-584
Warner, J. B., & Lolkema, J. S. (2003) Microbiol. Mol. Biol. Rev. 67, 475-490
Weickert, M. J., & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-6242
Woese, C. R., Kandler, O., & Wheelis, M. L. (1990) Proc. Natl. Acad. Sci. USA 87, 4576-4579
Table 1. Genomes of lactic acid bacteria and other probiotic species
Genus            Species               Strain       Size (Mbp)  %GC   Status  Reference
Bifidobacterium  longum                NCC2705      2.3         60.1  C       Schell et al.
                 breve                 NCIMB8807    2.4         58.8  C       Siezen et al.
Enterococcus     faecalis              V583         3.2         37.5  C       Paulsen et al.
Lactobacillus    acidophilus           NCFM         2.0         34.7  C       Altermann et al.
                 gasseri               ATCC33323    1.8         35.1  IP      JGI
                 johnsonii             NCC533       2.0         34.6  C       Pridmore et al.
                 plantarum             WCFS1        3.3         44.5  C       Kleerebezem et al.
                 casei                 ATCC334      2.5         41.1  IP      JGI
                 rhamnosus             HN001        2.4         46.4  IP      Klaenhammer et al.
                 helveticus            CNRZ32       2.4         37.1  IP      Klaenhammer et al.
                 brevis                ATCC367      2.0         43.1  IP      JGI
                 sakei                 23K          1.9         41.2  C       Klaenhammer et al.
                 delbrueckii           ATCCBAA365   2.3         45.7  IP      JGI
Lactococcus      lactis ssp. lactis    IL1403       2.3         35.4  C       Bolotin et al.
                 lactis ssp. cremoris  SK11         2.3         30.9  IP      JGI
Leuconostoc      mesenteroides         ATCC8293     2.0         37.4  IP      JGI
Oenococcus       oeni                  ATCCBAA331   1.8         37.5  IP      JGI
Pediococcus      pentosaceus           ATCC25745    2.0         37.0  IP      JGI
Streptococcus    agalactiae            2603V/R      2.2         35.7  C       Tettelin et al.
                 mutans                UA159        2.0         36.8  C       Ajdic et al.
                 pneumoniae            TIGR4        2.2         39.7  C       Tettelin et al.
                 pyogenes              M1           1.9         38.5  C       Ferretti et al.
                 thermophilus          LMD9         1.8         36.8  IP      JGI
C, complete; IP, in progress; JGI, Joint Genome Institute
Adapted from Klaenhammer et al., 2002 and Siezen et al., 2004
Table 2. Carbohydrate utilization profiles for select lactic acid bacteria
Column groups: Fermentation (1), Annotation (2)
Columns: Type, Sugar, Lac, Lpl, Ljo, Lga, Smu, Spn, Lla
Pentoses
  Arabinose Yes
  Ribose Yes Yes
  Ribulose PTS
  Xylose Yes
  Xylulose
Hexoses
  Fructose PTS PTS Yes Yes PTS PTS PTS
  Galactose GPH Yes Yes Yes ABC ABC GPH
  Glucose PTS Yes Yes Yes PTS PTS Yes
  Mannose PTS PTS Yes Yes PTS PTS PTS
Disaccharides
  Cellobiose PTS PTS Yes Yes PTS PTS PTS
  Gentiobiose PTS PTS Yes
  Lactose GPH Yes Yes Yes PTS PTS GPH
  Maltose ABC Yes Yes Yes ABC ABC Yes
  Melibiose PTS Yes Yes Yes
  Sucrose PTS PTS Yes PTS PTS PTS
  Trehalose PTS Yes Yes Yes PTS PTS PTS
  Turanose Yes
Oligosaccharides
  FOS ABC Yes
  Melezitose Yes
  Raffinose ABC Yes Yes Yes ABC
Sugar alcohols
  Galactitol PTS PTS
  Glycerol ABC ABC
  Mannitol PTS PTS PTS PTS
  Sorbitol PTS PTS
Deoxysugars
  Fucose
  Rhamnose Yes
Modified sugars
  Amygdalin Yes PTS Yes Yes
  Arbutin PTS PTS Yes Yes
  Esculin PTS Yes Yes Yes
  Gluconate PTS
  Malate
  N-acetylglucosamine PTS PTS Yes Yes PTS Yes
  Salicin PTS PTS Yes Yes Yes
1 determined by fermentation patterns obtained from API50CHO (BioMerieux, Durham, NC)
2 determined by ORF functional assignment from the genome annotation
Lac, Lactobacillus acidophilus; Lpl, Lactobacillus plantarum; Ljo, Lactobacillus johnsonii; Lga, Lactobacillus gasseri; Smu, Streptococcus mutans; Spn, Streptococcus pneumoniae; Lla, Lactococcus lactis
Table 3. Transmembrane domains in L. acidophilus transporters
Family  Gene    ORF#     Substrate          TMD
ABC     msmF    La503    FOS                6
        msmG    La504    FOS                6
        msmF2   La1441   Raffinose          6
        msmG2   La1440   Raffinose          6
PTS     scrA    401      Sucrose            8-10
        treB    1012     Trehalose          10
        fruA    1777     Fructose           10
        manLMN  La452-5  Mannose/glucose    10
GPH     lacS    1463     Lactose/galactose  12
TMD, number of transmembrane domains in a protein, as predicted by the algorithm developed by Krogh et al., 2001.
[Figure 1 image: phylogenetic tree. Taxa labeled: L. lactis, S. mutans, S. pneumoniae, S. pyogenes, S. agalactiae, L. gasseri, L. johnsonii, L. acidophilus, L. plantarum, P. pentosaceus, E. faecalis, B. halodurans, B. subtilis, S. aureus, L. mesenteroides, O. oeni, B. longum, B. linens, T. maritima, V. alginolyticus, V. cholerae, E. coli, K. pneumoniae, S. typhimurium]
Figure 1. Phylogenetic tree of lactic acid bacteria and select microbial species. This phylogenetic tree is a neighbor-joining tree obtained from the multiple sequence alignment of 16S rRNA genes in ClustalW (Thompson et al., 1994), visualized in MEGA2 (Kumar et al., 2001). Black, lactic acid bacteria; red, bacillales; yellow, thermotogae; red, proteobacteria. Within LAB, branches for different subgroups have different colors: blue, streptococci; pink, lactobacilli; purple, high GC brevibacteria and bifidobacteria.
[Figure 2 image: transporter schematic. Labels: PTS (IIC, IIB, IIA); ABC importer and ABC exporter (SBP, MSD, NBD); IIA; GPH; ATPase; secondary transporters (uniport, antiport, symport)]
Figure 2. Transporters commonly found in lactic acid bacteria. Green, ABC transporters; Red, PTS transporters; yellow, GPH transporters; Gray, ATPase; blue, secondary transporters.
Figure 3. Transmembrane domains in ABC, PTS and GPH transporters in L. acidophilus. A, TMDs in FOS ABC transporter MsmE; B, TMDs in FOS ABC transporter MsmF; C, TMDs in sucrose PTS transporter ScrB; D, TMDs in lactose/galactose GPH transporter LacS
CHAPTER II – Functional and comparative genomic analyses of an operon involved in fructooligosaccharide utilization by Lactobacillus acidophilus
Published in Proc. Natl. Acad. Sci. USA 100, 8957-8962 – see appendix 1
2.1 Abstract
Lactobacillus acidophilus NCFM is a probiotic organism that displays the ability to
utilize prebiotic compounds, such as fructo-oligosaccharides (FOS), which stimulate the
growth of beneficial commensals in the gastrointestinal tract. However, little is known
about the mechanisms and genes involved in FOS utilization by Lactobacillus species.
Analysis of the L. acidophilus NCFM genome revealed an msm locus composed of a
transcriptional regulator of the LacI family, a four-component ABC transport system, a
fructosidase and a sucrose phosphorylase. Transcriptional analysis of this operon
demonstrated that gene expression was induced by sucrose and FOS, but not by glucose
or fructose, suggesting some specificity for non-readily fermentable sugars. Additionally,
expression was repressed by glucose, but not by fructose, suggesting catabolite
repression, via two cre-like sequences identified in the promoter-operator region.
Insertional inactivation of the genes encoding the ABC transporter substrate binding
protein and the fructosidase reduced the ability of the mutants to grow on FOS.
Comparative analysis of gene architecture within this cluster revealed a high degree of
synteny with operons in Streptococcus mutans and Streptococcus pneumoniae. However,
the association between a fructosidase and an ABC transporter is unusual, and may be
specific to L. acidophilus. This is the first description of a gene locus involved in
transport and catabolism of FOS compounds, which can promote the competitiveness of
beneficial microorganisms in the human gastrointestinal tract.
2.2 Introduction
The ability of select intestinal microbes to utilize substrates not digested by the
host may play an important role in their ability to successfully colonize the mammalian
gastrointestinal (GI) tract. A diverse carbohydrate catabolic potential is associated with
cariogenic activity of S. mutans in the oral cavity (Ajdic et al., 2002), adaptation of L.
plantarum to a variety of environmental niches (Kleerebezem et al., 2003), and residence
of B. longum in the colon (Schell et al., 2002), illustrating the competitive benefits of
complex sugar utilization. Prebiotics are non-digestible food ingredients that selectively
stimulate the growth and/or activity of beneficial microbial strains residing in the host
intestine (Gibson and Roberfroid, 1995). Among sugars that qualify as prebiotics,
fructo-oligosaccharides (FOS) are a diverse family of fructose polymers used commercially
in food products and nutritional supplements. They vary in length and can be either
derivatives of simple fructose polymers or fructose moieties attached to a sucrose
molecule. The linkage and degree of polymerization can vary widely (usually between 2
and 60 moieties), and several names such as inulin, levan, oligofructose and neosugars
are used accordingly. The average daily intake of such compounds, originating mainly
from wheat, onion, artichoke, banana, and asparagus (Gibson and Roberfroid, 1995;
Moshfegh et al., 1999), is substantial, with nearly 2.6 g of inulin and 2.5 g of
oligofructose consumed in the average American diet (Moshfegh et al., 1999). FOS are
not digested in the upper gastrointestinal tract and can be degraded by a variety of lactic
acid bacteria (Hartemink et al., 1995; Hartemink et al., 1997; Kaplan and Hutkins, 2000;
Van Laere et al., 2000), residing in the human lower gastrointestinal tract (Gibson and
Roberfroid, 1995; Orrhage et al., 2000). FOS and other oligosaccharides have been
shown in vivo to beneficially modulate the composition of the intestinal microbiota, and
specifically to increase bifidobacteria and lactobacilli (Gibson and Roberfroid, 1995;
Orrhage et al., 2000). A variety of L. acidophilus strains in particular have been shown to
utilize several polysaccharides and oligosaccharides such as arabinogalactan,
arabinoxylan and FOS (Kaplan and Hutkins, 2000; Van Laere et al., 2000). Despite the
recent interest in FOS utilization, little information is available about the metabolic
pathways and enzymes responsible for transport and catabolism of such complex sugars
in lactobacilli.
In silico analysis of a particular locus within the L. acidophilus NCFM genome
revealed the presence of a gene cluster encoding proteins potentially involved in prebiotic
transport and hydrolysis. This specific cluster was analyzed computationally and
functionally to reveal the genetic basis for FOS transport and catabolism by L.
acidophilus NCFM.
2.3 Materials and Methods
2.3.1 Bacterial strain and media used in this study
The strain used in this study is L. acidophilus NCFM (Barefoot and
Klaenhammer, 1983). Cultures were propagated at 37°C, aerobically in MRS broth
(Difco). A semi-synthetic medium consisted of: 1% bactopeptone (w/v) (Difco), 0.5%
yeast extract (w/v) (Difco), 0.2% dipotassium phosphate (w/v) (Fisher), 0.5% sodium
acetate (w/v) (Fisher), 0.2% ammonium citrate (w/v) (Sigma), 0.02% magnesium sulfate
(w/v) (Fisher), 0.005% manganese sulfate (w/v) (Fisher), 0.1% Tween 80 (v/v) (Sigma),
0.003% bromocresol purple (v/v) (Fisher), and 1% sugar (w/v). The carbohydrates added
were either glucose (dextrose) (Sigma), fructose (Sigma), sucrose (Sigma), or FOS. Two
types of complex sugars were used as FOS: a GFn mix (manufactured by R. Hutkins),
consisting of glucose monomers linked α-1,2 to two, three or four fructosyl moieties
linked β-2,1, to form kestose (GF2), nystose (GF3) and fructofuranosyl-nystose (GF4),
respectively; and an Fn mix, raftilose, derived from inulin hydrolysis (Orafti). Without
carbohydrate supplementation, the semi-synthetic medium was unable to sustain bacterial
growth above OD600nm~0.2.
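The percentages listed for the semi-synthetic medium translate directly into amounts for a given batch volume, since 1% (w/v) corresponds to 1 g per 100 mL (and 1% (v/v) to 1 mL per 100 mL). The following is a minimal sketch of that arithmetic; the dictionary and function names are illustrative assumptions and not part of the original study, and the units follow the text as written.

```python
# Percent concentrations as listed in the text for the semi-synthetic medium.
# Units follow the text: w/v entries give grams, v/v entries give millilitres.
MEDIUM_PERCENT = {
    "bactopeptone": 1.0,            # w/v
    "yeast extract": 0.5,           # w/v
    "dipotassium phosphate": 0.2,   # w/v
    "sodium acetate": 0.5,          # w/v
    "ammonium citrate": 0.2,        # w/v
    "magnesium sulfate": 0.02,      # w/v
    "manganese sulfate": 0.005,     # w/v
    "Tween 80": 0.1,                # v/v
    "bromocresol purple": 0.003,    # v/v as stated in the text
    "sugar": 1.0,                   # w/v
}


def amounts_for_batch(volume_ml, recipe=MEDIUM_PERCENT):
    """Return the amount of each component (g or mL) for a batch of volume_ml.

    1% means 1 unit per 100 mL, so amount = percent / 100 * volume.
    """
    return {name: round(pct / 100.0 * volume_ml, 4) for name, pct in recipe.items()}


if __name__ == "__main__":
    for name, amount in amounts_for_batch(1000).items():
        print(f"{name}: {amount}")
```

For a 1 L batch this gives, for example, 10 g of bactopeptone and 0.05 g of manganese sulfate.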
2.3.2 Computational analysis of the putative msm operon
A 10 kbp DNA locus containing a putative msm (multiple sugar metabolism)
operon was identified from the L. acidophilus NCFM genome sequence. ORF predictions
were carried out by four computational programs: Glimmer (Salzberg et al., 1998;
Delcher et al., 1999), Clone Manager (Scientific and Educational Software), the NCBI
ORF caller (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), and GenoMax (InforMax Inc.,
MD). Glimmer was previously trained with a set of L. acidophilus genes available in
public databases. The predicted ORFs were translated into putative proteins that were
submitted to BlastP analysis (Altschul et al., 1990).
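To illustrate what an ORF call involves at its simplest, the sketch below scans all six reading frames for ATG-to-stop stretches. It is a naive illustration only and does not reproduce Glimmer, Clone Manager, the NCBI ORF caller or GenoMax, which use their own models and parameters; the function names and the minimum-length threshold are assumptions made for the example.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[str, int, int]]:
    """Naive six-frame ORF scan (ATG to the first in-frame stop codon).

    Returns (strand, start, end) tuples. Coordinates are 0-based and
    end-exclusive, relative to the strand that was scanned. The length
    filter counts codons including the start and stop codons.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

Dedicated gene finders additionally consider alternative start codons, ribosome binding sites and codon usage, which is one reason the calls of several programs were compared in this study.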
2.3.3 RNA isolation and analysis
Total RNA was isolated using TRIzol (GibcoBRL) by following the supplier’s
instructions. Cells in the mid-log phase were harvested by centrifugation (2 minutes,
14,000 rpm) and cooled on ice. Pellets were resuspended in TRIzol by vortexing and
subjected to five cycles of 1 min bead beating and 1 min on ice. Nucleic acids were
subsequently purified using three chloroform extractions, and precipitated using
isopropanol and centrifugation for 10 min at 12,000 rpm. The RNA pellet was washed
with 70% ethanol, and resuspended in DEPC-treated water. RNA samples were treated
with DNase I according to the supplier’s instructions (Boehringer Mannheim). First
strand cDNA was synthesized using the Invitrogen RT-PCR kit according to the
supplier’s instructions. cDNA products were subsequently amplified using PCR with
primers internal to genes of interest. For RNA slot blots, RNA samples were transferred
to nitrocellulose membranes (BioRad) using a slot blot apparatus (Bio-Dot SF, BioRad),
and the RNAs were UV crosslinked to the membranes. Blots were probed with DNA
fragments generated by PCR that had been purified from agarose gels (GeneClean III kit,
Midwest Scientific). Probes were labeled with α-32P using the Amersham Multiprime
Kit, and consisted of 700 bp and 750 bp fragments internal to the msmE and bfrA genes,
respectively. Hybridization and washes were carried out according to the supplier’s
instructions (Bio-Dot Microfiltration Apparatus, BioRad) and radioactive signals were
detected using a Kodak Biomax film. Primers are listed in Supporting Table 1.
2.3.4 Comparative genomic analysis
A gene cluster bearing a fructosidase gene was selected after computational data-
mining of the L. acidophilus NCFM genome. Additionally, microbial clusters containing
fructosidase EC 3.2.1.26 orthologs, or bearing an ABC transport system associated with
an alpha-galactosidase EC 3.2.1.22 were selected from public databases (NCBI, TIGR).
The sucrose operon is a widely distributed cluster, consisting of either three or four
elements, namely: a regulator, a sucrose PTS transporter, a sucrose hydrolase and
occasionally a fructokinase. Two gene cluster alignments were generated: (i) a PTS
alignment, representing similarities over the sucrose operon, bearing a PTS transport
system associated with a sucrose hydrolase; (ii) an ABC alignment, representing
similarities over the multiple sugar metabolism cluster, bearing an ABC transport system
usually associated with a galactosidase. Sequence information is available in Table 2.
2.3.5 Phylogenetic trees
Nucleotide and protein sequences were aligned computationally using the
CLUSTALW algorithm (Thompson et al., 1994). The multiple alignment outputs were
used for generating unrooted neighbor-joining phylogenetic trees using MEGA2 (Kumar
et al., 2001). In addition to a phylogenetic tree derived from 16S rRNA genes, trees were
generated for ABC transporters, PTS transporters, transcription regulators, fructosidases,
and fructokinases.
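Neighbor-joining operates on a matrix of pairwise distances derived from the multiple alignment. As a rough illustration of that intermediate step, the sketch below computes uncorrected p-distances (the fraction of differing sites) between aligned sequences. This is an assumption made for the example: the distance model actually applied in MEGA2 is not stated here, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) columns")
    return diffs / compared


def distance_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Pairwise p-distances for a dict mapping sequence name to aligned sequence."""
    return {
        (n1, n2): p_distance(aligned[n1], aligned[n2])
        for n1, n2 in combinations(sorted(aligned), 2)
    }
```

A tree-building routine would then repeatedly join the pair of taxa that minimizes the neighbor-joining criterion computed from this matrix.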
2.3.6 Gene inactivation
Gene inactivation was conducted by site-specific plasmid integration into the L.
acidophilus chromosome via homologous recombination (Russell and Klaenhammer,
2001). Internal fragments of the msmE and bfrA genes were cloned into pORI28 using E.
coli as a host (Law et al., 1995), and the constructs were subsequently purified and
transformed into L. acidophilus NCFM. The ability of the mutant strains to grow on a
variety of carbohydrate substrates was investigated using growth curves. Strains were
grown on semi-synthetic medium supplemented with 0.5% w/v carbohydrate.
2.4 Results
2.4.1 Computational analysis of the msm operon
Analysis of the msm locus using four ORF calling programs revealed the presence
of seven putative ORFs. Because most of the encoded proteins were homologous to
those of the msm operon present in S. mutans (Russell et al., 1992), a similar gene
nomenclature was used. The analysis of the predicted ORFs suggested the presence of a
transcriptional regulator of the LacI repressor family, MsmR; a four-component transport
system of the ATP-binding cassette (ABC) family, MsmEFGK; and two enzymes
involved in carbohydrate metabolism, namely a fructosidase EC 3.2.1.26, BfrA; and a
sucrose phosphorylase EC 2.4.1.7, GtfA. A putative Shine-Dalgarno sequence
5'-AGGAGG-3' was found within 10 bp upstream of the msmE start codon. A dyad
symmetry analysis revealed the presence of two stem-loop structures that could act as
putative Rho-independent transcriptional terminators: one between msmK and gtfA
(between bp 6,986 and 7,014), free energy -13.6 kcal/mol, and one 20 bp downstream of
the last gene of the putative operon (between bp 8,500 and 8,538), free energy -16.5
kcal/mol. The operon structure is shown in Figure 1.
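A dyad symmetry search looks for a sequence arm followed, after a short loop, by its reverse complement. The sketch below shows one simple way to locate such candidate stem-loops with perfect stems. It is illustrative only: the software and parameters used for the analysis above are not specified, the stem and loop lengths are arbitrary assumptions, and no free energy is computed.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem=6, min_loop=3, max_loop=8):
    """Find perfect inverted repeats that could fold into a stem-loop.

    Returns (start, end, left_arm, loop) tuples, where seq[start:end] spans
    left arm + loop + right arm, and the right arm is the reverse
    complement of the left arm. Coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2 * stem - min_loop + 1):
        left = seq[i:i + stem]
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop          # start of the candidate right arm
            if j + stem > len(seq):
                break
            if seq[j:j + stem] == target:
                hits.append((i, j + stem, left, seq[i + stem:j]))
    return hits
```

Candidate terminators found this way would still need a thermodynamic evaluation, and typically a downstream T-rich tract, before being interpreted as Rho-independent terminators.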
The regulator contained two distinct domains: a DNA binding domain at the
amino-terminus with a predicted helix-turn-helix motif (pfam00354), and a sugar-binding
domain at the carboxy-terminus (pfam00532). The transport elements consisted of a
periplasmic solute binding protein (pfam01547), two membrane spanning permeases
(pfam00528), and a cytoplasmic nucleotide binding protein (pfam00005), characteristic
of the different subunits of a typical ABC transport system (Quentin et al., 1999). A
putative anchoring motif LSLTG was present at the amino-terminus of the substrate-binding
protein. Each permease contained five trans-membrane regions predicted
computationally (Krogh et al., 2001). Analyses of ABC transporters in recently
sequenced microbial genomes have defined four characteristic sequence motifs (Linton
and Higgins, 1998; Braibant et al., 2000). The predicted MsmK protein included all four
ABC conserved motifs, namely: Walker A: GPSGCGKST (consensus GxxGxGKST or
[AG]xxxxGK[ST]); Walker B: IFLMDEPLSNLD (consensus hhhhDEPT or
DExxxxxD); ABC signature sequence: LSGG; and Linton and Higgins motif: IAKLHQ
(consensus hhhhH+/-, with h, hydrophobic and +/- charged residues). The putative
fructosidase showed high similarity to glycosyl hydrolases (pfam00251). The putative
sucrose phosphorylase shared 63% residue identity with that of S. mutans.
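The consensus patterns quoted above map naturally onto regular expressions, with each "x" becoming a wildcard position. The sketch below checks the motif strings reported for MsmK against those patterns. The test fragment is a hypothetical concatenation of the quoted motifs, not the actual MsmK sequence, and the dictionary and function names are assumptions made for the example.

```python
import re

# Consensus patterns from the text, written as regular expressions
# ('x' in the consensus becomes '.', i.e. any residue).
MOTIFS = {
    "Walker A (GxxGxGKST)": r"G..G.GKST",
    "Walker A ([AG]xxxxGK[ST])": r"[AG].{4}GK[ST]",
    "Walker B (DExxxxxD)": r"DE.{5}D",
    "ABC signature (LSGG)": r"LSGG",
}


def scan_motifs(protein):
    """Return {motif name: [(position, matched text), ...]} for a protein string."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group()) for m in re.finditer(pattern, protein)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Hypothetical fragment assembled from the motif strings quoted in the text.
    fragment = "MK" + "GPSGCGKST" + "AAA" + "LSGG" + "QQ" + "IFLMDEPLSNLD"
    for name, hits in scan_motifs(fragment).items():
        print(name, hits)
```

Note that the reported Walker B string IFLMDEPLSNLD satisfies the DExxxxxD form of the consensus (DEPLSNLD) rather than the hhhhDEPT form, since a leucine follows DEP.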
2.4.2 Sugar induction and co-expression of contiguous genes
Transcriptional analysis of the msm operon using RT-PCR and RNA slot blots
showed that sucrose and both types of oligofructose (GFn and Fn) were able to induce
expression of msmE and bfrA (Figure 2A). In contrast, glucose and fructose did not
induce transcription of those genes, suggesting specificity for non-readily fermentable
sugars and the presence of a regulation system based on carbohydrate availability. In the
presence of both FOS and readily fermentable sugars, glucose repressed expression of
msmE, even if present at a lower concentration, whereas fructose did not (Figure 2B).
Analysis of the transcripts induced by oligofructose indicated that all genes within the
operon are co-expressed (Figure 6) in a manner consistent with the S. mutans msm
operon (McLaughlin and Ferretti, 1996).
2.4.3 Mutant phenotype analysis
The ability of the bfrA (fructosidase) and msmE (ABC transporter) mutant strains
to grow on a variety of carbohydrates was monitored by both optical density at 600nm
and colony forming units (cfu). The mutants retained the ability to grow on glucose,
fructose, sucrose, galactose, lactose and FOS-GFn, in a manner similar to that of the
control strain (Figure 7), a lacZ mutant of the L. acidophilus parental strain also
generated by plasmid integration (Russell and Klaenhammer, 2001). This strain was
chosen because it also bears a copy of the plasmid used for gene inactivation integrated in
the genome. In contrast, both the bfrA and msmE mutants halted growth on FOS-Fn
prematurely (Figure 3), likely upon exhaustion of simple carbohydrate from the semi-
synthetic medium. After one passage, the msmE mutant displayed slower growth on FOS-
Fn, while the bfrA mutant could not grow (Figure 3). Additionally, terminal cell counts
from overnight cultures grown on FOS-Fn were significantly lower for the mutants,
especially after one passage (Figure 7).
2.4.4 Comparative genomic analyses and locus alignments
Comparative genomic analysis of gene architecture between L. acidophilus, S.
mutans, S. pneumoniae, B. subtilis and B. halodurans revealed a high degree of synteny
within the msm cluster, except for the core sugar hydrolase (Figure 4A). In contrast, gene
content was consistent, whereas gene order was not well conserved for the sucrose
operon (Figure 4B). The lactic acid bacteria exhibit a divergent sucrose operon, where the
regulator and the hydrolase are transcribed opposite to the transporter and the
fructokinase. In contrast, gene architecture was variable amongst the proteobacteria.
2.4.5 Phylogenetic trees
Phylogenetic trees were generated to investigate whether there was a correlation
between protein similarity, gene architecture and the phylogenetic relationships of the
selected microorganisms. The phylogenetic relationships were obtained from 16S
ribosomal DNA alignment. All proteobacteria appeared distant from the LAB, and the
Clostridium species formed a well-defined cluster between T. maritima and the bacillales
(Figure 5A).
For the fructosidases, all enzymes obtained from the LAB sucrose operons
clustered extremely well together, at the left end of the tree, whereas there was apparent
shuffling of the other three groups (Figure 5B). The paralogs of those fructosidases in S.
mutans, S. pneumoniae, and L. acidophilus clustered at the opposite end of the tree.
Interestingly, the L. acidophilus fructosidase was distant from the LAB sucrose
hydrolases cluster, and showed strong homology to enzymes experimentally associated
with oligosaccharide hydrolysis, in organisms such as T. maritima, M. laevaniformans,
and B. subtilis.
Each component of the ABC transport system clustered together (Figure 5C),
namely MsmE, MsmF, MsmG and MsmK for substrate binding, membrane spanning
proteins and nucleotide binding unit, respectively. For MsmE, MsmF and MsmG, three
consistent sub-clusters were obtained: (i) the two Bacillus species; (ii) L. acidophilus, S.
mutans and S. pneumoniae from the operons bearing a galactosidase; (iii) L. acidophilus
and S. pneumoniae from the operons bearing a fructosidase.
For the phosphotransferase system (PTS) transporters, the clustering did not
proceed according to phylogeny, especially for lactic acid bacteria, which formed two
separate clusters (Figure 5D). The two distant transporters at the bottom of the tree are
non-PTS sucrose transporters of the major facilitator family of transporters, as suggested
by their initial annotation.
All regulators were repressors, with the exception of those regulators of L.
acidophilus, S. pneumoniae and S. mutans clustering at the bottom of the tree (Figure
5E), which activate transcription of operons bearing an ABC transport system associated
with a galactosidase (Russell et al., 1992). In contrast, the msm regulators for both S.
pneumoniae and L. acidophilus seemed to be repressors similar to that of the sucrose
operon (Figure 5E). The helix-turn-helix DNA binding motif of the regulator was very well
conserved amongst selected regulators of the LacI family (Supporting Figure 3A), as
shown previously (Nguyen and Saier, 1995). In contrast, the seven regulators at the
bottom of the tree did not contain this conserved motif.
The fructokinase clustering was the most similar to that of the 16S phylogenetic
tree, with distinct clustering of lactobacillales, bacillales, clostridia, and proteobacteria
(Figure 5F). The lack of correlation between phylogeny, gene architecture and protein
similarity may be due to extensive gene transfer amongst bacteria and independent
sequence divergence.
2.4.6 Catabolite response elements (cre) analysis
Analysis of the promoter-operator region upstream of the msmE gene revealed the
presence of two 17-bp palindromes separated by 30 nucleotides, showing high similarity
to a consensus sequence for the cis-acting sites controlling catabolite repression in Gram
positive bacteria, notably Bacillus subtilis (Burne et al., 1999; Weickert and Chambliss,
1990; Miwa et al., 2000; Yamamoto et al., 2001). Several cre-like sequences highly
similar to those found in B. subtilis and S. mutans (Weickert and Chambliss, 1990; Miwa
et al., 2000; Yamamoto et al., 2001) were also retrieved from the promoter-operator
region of the L. acidophilus NCFM sucrose operon as well as that of the other msm locus
(Table 1). Interestingly, sequences nearly identical to the cre-like elements found in the
L. acidophilus msm operon were found in the promoter-operator region of the msm locus
in S. pneumoniae (Table 1).
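The palindromic character of the cre-like elements can be checked by comparing each sequence with its reverse complement. The sketch below is illustrative only. It uses the 14-nt upper-case cores of cre1 and cre2 as they appear in the Figure 1 sequence, which is an assumption about where each element begins and ends, and the function names are hypothetical.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def palindrome_mismatches(seq: str) -> int:
    """Count positions where a sequence differs from its reverse complement.

    Zero means a perfect inverted repeat (palindrome in the DNA sense).
    """
    rc = reverse_complement(seq)
    return sum(1 for a, b in zip(seq.upper(), rc) if a != b)

# Assumed element boundaries: upper-case cores from the Figure 1 sequence.
CRE1_CORE = "TTGAAACGTTTCAA"
CRE2_CORE = "TAGAAACGTTTCAA"

for name, core in (("cre1", CRE1_CORE), ("cre2", CRE2_CORE)):
    print(name, core, reverse_complement(core), palindrome_mismatches(core))
```

With these boundaries, the cre1 core is identical to its reverse complement, whereas the cre2 core differs from its reverse complement at two symmetric positions.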
2.5 Discussion
The L. acidophilus NCFM msm operon encodes an ABC transporter and an associated
fructosidase, both of which are induced in the presence of FOS. Sucrose and both types
of oligofructose induced expression of the operon, whereas glucose and fructose did not.
Additionally, glucose repressed expression of the operon, suggesting the presence of a
regulation mechanism of preferred carbohydrate utilization based on availability. Specific
induction by FOS and sucrose, and repression by glucose indicated transcriptional
regulation, likely through cre present in the operator-promoter region, similar to those
found in B. subtilis (Miwa et al., 2000) and S. mutans (Burne et al., 1999). Catabolite
repression is a mechanism widely distributed amongst Gram-positive bacteria, usually
mediated in cis by catabolite response elements, and in trans by repressors of the LacI
family, responsible for transcriptional repression of genes encoding catabolic enzymes in
the presence of readily fermentable sugars (Weickert and Chambliss, 1990; Hueck et al.,
1994; Wen and Burne, 2002).
A variety of enzymes have been associated with microbial utilization of fructo-
oligosaccharides, namely: fructosidase EC 3.2.1.26 (Burne et al., 1987; Liebl et al.,
1998), inulinase EC 3.2.1.7 (Onodera and Shiomi, 1988; McKellar and Modler, 1989;
Xiao et al., 1989), levanase EC 3.2.1.65 (Menendez et al., 2002), fructofuranosidase EC
3.2.1.26 (Muramatsu et al., 1992; Oda and Ito, 2000; Perrin et al., 2000), fructanase EC
3.2.1.80 (Hartemink et al., 1995), and levan biohydrolase EC 3.2.1.64 (Saito et al., 2000;
Song et al., 2002). Despite the semantic diversity, these enzymes are functionally related,
and should be considered as members of the same β-fructosidase super-family that
incorporates members of both glycosyl hydrolase families 32 and 68 (Naumoff, 2001). All those
enzymes share the conserved motif H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G, and are all
involved in the hydrolysis of β-D-fructosidic linkages to release fructose. Generally,
fructosidases across genera share approximately 25-30% identity and 35-50% similarity
(Burne et al., 1999), with several regions widely conserved across the glycosyl hydrolase
32 family (Naumoff, 2001). The two residues shown to be involved in the enzymatic
activity of fructan-hydrolases, namely Asp 47 and Cys 230 (Reddy and Maley, 1990;
Liebl et al., 1998), as well as motifs highly conserved in the beta-fructosidase
superfamily, such as the NDPNG, FRDP, and ECP motifs (Liebl et al., 1998; Naumoff,
2001), were extremely well conserved amongst all fructosidase sequences (Supporting
Figure 3B).
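The conserved motif H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G is written in a PROSITE-like notation, which can be converted into a regular expression to scan protein sequences. The following is a minimal sketch covering only the elements used in that motif (single residues, "x", bracketed sets, and fixed repeat counts). The converter name and the two test peptides are hypothetical and are not sequences from this study.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regex string.

    Supported elements: a single residue (A-Z), 'x' for any residue,
    a bracketed set such as [LIVM], and an optional fixed count '(n)'.
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, count = m.groups()
        residue = "." if residue == "x" else residue
        parts.append(residue + (f"{{{count}}}" if count else ""))
    return "".join(parts)

MOTIF = "H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G"
motif_regex = prosite_to_regex(MOTIF)

# Hypothetical peptides used only to exercise the pattern.
peptide_with_motif = "MKHAAPWWWWLNDPNGTT"
peptide_without_motif = "MKHAAPWWWWKNDPNGTT"  # K is not in [LIVM]

print(motif_regex)
print(bool(re.search(motif_regex, peptide_with_motif)))
print(bool(re.search(motif_regex, peptide_without_motif)))
```

The same regular expression could be applied to each fructosidase sequence of interest to locate the motif, with the caveat that full PROSITE syntax includes elements (ranges, exclusions, anchors) that this sketch does not handle.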
Since the L. acidophilus fructosidase was similar to that of T. maritima and S.
mutans’ FruA (see Figure 5B), two enzymes that have experimentally been associated
with oligofructose hydrolysis (Burne et al., 1987; Liebl et al., 1998), we initially
hypothesized that BfrA is responsible for FOS hydrolysis. Induction and gene
inactivation data confirmed the correlation between the msm locus and FOS utilization.
The L. acidophilus BfrA fructosidase was most similar to that of T. maritima, which has
the ability to release fructose from sucrose, raffinose, levan (β2,6) and inulin (β2,1) in an
exo-type manner (Liebl et al., 1998). It was also very similar to other enzymes which
have been characterized experimentally, and associated with hydrolysis of FOS
compounds by S. mutans (Burne et al., 1999) and M. laevaniformans (Song et al., 2002).
Analysis of FOS degradation by S. mutans showed that FruA is involved in hydrolysis of
levan, inulin, sucrose and raffinose (Burne et al., 1987; Russell et al., 1992; Hartemink et
al., 1995; Burne et al., 1999). Additionally, it was shown that expression of this gene was
regulated by catabolite response elements (Burne et al., 1999; Wen and Burne, 2002) and that
fruA transcription was induced by levan, inulin and sucrose, but repressed by readily
metabolizable hexoses (Burne et al., 1987; Burne et al., 1999).
In S. mutans, FruA was shown to be an extracellular enzyme, which is anchored
to the cell wall by an LPxTG motif (Burne and Penders, 1992), that catalyses the
degradation of available complex carbohydrates outside of the cell. Additionally,
microbial fructosidases associated with FOS hydrolysis such as M. laevaniformans LevM
(Song et al., 2002) and S. exfoliatus levanbiohydrolase (Saito et al., 2000) have been
reported as extracellular enzymes as well. In contrast, the L. acidophilus NCFM
fructosidase does not contain an anchoring signal, and is thus likely a cytoplasmic enzyme
requiring transport of its substrate(s) through the cell membrane. No additional secreted
levanase or inulinase was found in the L. acidophilus genome sequence. Since transporter
genes are often co-expressed with genes involved in the metabolism of the transported
compounds (Lambert et al., 2001), in silico analysis of the msm operon indicates that the
substrate of the fructosidase is transported by an ABC transport system. This is rather
unusual because, when the fructosidase is not extracellular, the fructosidase gene is
commonly associated with a sucrose PTS transporter (Figure 4), notably in lactococci,
streptococci and bacilli (Hiratsuka et al., 1998; Luesink et al., 1999), or a sucrose
permease of the major facilitator family, as in B. longum. Those fructosidases usually
associated with PTS transporters are generally sucrose-6-phosphate hydrolases that do
not have FOS as cognate substrate. Therefore, L. acidophilus NCFM may have combined
the ABC transport system usually associated with an alpha-galactosidase, with a
fructosidase, in the msm locus. The genetic makeup of NCFM is seemingly distinct, and
exclusively similar to that of S. pneumoniae. Additionally, recent evidence in L.
paracasei suggested that an ABC transport system might be involved in FOS utilization
(Kaplan and Hutkins, 2003), which further supports the hypothesis that FOS is
transported by an ABC transporter in L. acidophilus.
Lateral gene transfer (LGT) has increasingly been shown to account for a
significant number of genes in bacterial genomes (Koonin et al., 2001), and may account
for a large proportion of the strain-specific genes found in microbes, as shown in H.
pylori (Salama et al., 2000), C. jejuni (Dorrell et al., 2001), S. pneumoniae (Hakenbeck
et al., 2001), and T. maritima (Nesbo et al., 2002). Notably, in T. maritima, genes
involved in sugar transport and polysaccharide degradation represent a large proportion
of variable genes, with ABC transporters having the highest horizontal gene transfer
frequency (Nesbo et al., 2002). In addition, it was recently suggested that oligosaccharide
catabolic capabilities of B. longum have been expanded through horizontal transfer, as
part of its adaptation to the human GI tract (Schell et al., 2002), and that the large set of
sugar uptake and utilization genes in L. plantarum was acquired through LGT
(Kleerebezem et al., 2003).
Intestinal microbes would benefit greatly from acquisition of gene clusters
involved in transport and catabolism of undigested sugars, especially if they conferred a
competitive edge towards successful colonization of the host GI tract. It is possible that L.
acidophilus acquired the ability to utilize FOS through genetic exchange, since ABC
transporters and polysaccharide degradation enzymes have a high horizontal gene transfer
frequency (Nesbo et al., 2002). The two fructosidase paralogs seemed fairly distant from
one another, sharing 28% identity and 44% similarity, suggesting those genes might have
arisen from LGT rather than gene duplication. Also, since no neighboring genes or
sequences are common to those two genes, a duplication event seems unlikely. Given the
lack of consistency between phylogeny, gene architecture, and protein similarity, it is
possible both the msm and sucrose operons underwent gene rearrangements. However,
there was no evidence the msm cluster was obtained through LGT, since the GC content
was very similar to that of the genome, and there was no discrepancy in the genetic code
usage.
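The GC-content comparison mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the analysis performed here: the function names, the toy sequences and the genome-wide GC value passed in are all hypothetical, and no threshold for a meaningful deviation is implied.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in counted) / len(counted)

def gc_deviation(locus_seq: str, genome_gc: float) -> float:
    """Difference between the GC fraction of a locus and a genome-wide value."""
    return gc_content(locus_seq) - genome_gc

# Toy sequences for demonstration only (not taken from any genome).
toy_locus = "GGCCAATT"
toy_at_rich = "ATATNNAT"  # ambiguous bases (N) are ignored

print(gc_content(toy_locus))
print(gc_content(toy_at_rich))
print(gc_deviation(toy_locus, genome_gc=0.5))
```

A locus whose GC fraction is close to the genome-wide value, as reported for the msm cluster, gives a deviation near zero. A similar GC content does not by itself exclude an ancient transfer from a donor of comparable base composition.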
Based on these observations, we conclude that L. acidophilus has combined the
ABC transport system derived from the raffinose operon with a β-fructosidase to form a
distinct gene cluster involved in transport and catabolism of prebiotic compounds
including FOS, suggesting a possible adaptation of the sugar catabolism system towards
different complex sugars. The catabolic properties of this operon might differ from those
of the raffinose and sucrose operons (Figure 9). In light of the theory that environmental
factors and ecology might be dominant over phylogeny for variable genes (Nesbo et al.,
2002), we may hypothesize that L. acidophilus has acquired FOS utilization capabilities
through LGT, or rearranged its genetic make-up to build a competitive edge towards
colonization of the human GI tract by using prebiotic compounds, ultimately contributing
to a more beneficial microbiota.
2.6 References
Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A. & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410
Barefoot, S. F. & Klaenhammer, T. R. (1983) Appl. Environ. Microbiol. 45, 1808-1815
Braibant, M., Gilot, P. & Content, J. (2000) FEMS Microbiol. Rev. 24, 449-467
Burne, R. A., Schilling, K., Bowen, W. H. & Yasbin, R. E. (1987) J. Bacteriol. 169, 4507-4517
Burne, R. A. & Penders, J. E. (1992) Infect. Immun. 60, 4621-4632
Burne, R. A., Wen, Z. T., Chen, Y. Y. M. & Penders, J. E. C. (1999) J. Bacteriol. 181, 2863-2871
Delcher, A. L., Harmon, D., Kasif, S., White, O. & Salzberg, S. L. (1999) Nucleic Acids Res. 27, 4636-4641
Dorrell, N., Mangan, J. A., Laing, K. G., Hinds, J., Linton, D., Al-Ghusein, H., Barrell, B. G., Parkhill, J., Stoker, N. G., Karlyshev, A. V., Butcher, P. D. & Wren, B. W. (2001) Genome Res. 11, 1706-1715
Gibson, G. R. & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412
Hakenbeck, R., Balmelle, N., Weber, B., Gardes, C., Keck, W. & de Saizieu, A. (2001) Infect. Immun. 69, 2477-2486
Hartemink, R., Quataert, M. C. J., Vanlaere, K. M. J., Nout, M. J. R. & Rombouts, F. M. (1995) J. Appl. Bacteriol. 79, 551-557
Hartemink, R., VanLaere, K. M. J. & Rombouts, F. M. (1997) J. Appl. Microbiol. 83, 367-374
Hiratsuka, K., Wang, B., Sato, Y. & Kuramitsu, H. (1998) Infect. Immun. 66, 3736-3743
Hueck, C. J., Hillen, W. & Saier, M. H., Jr. (1994) Res. Microbiol. 145, 503-518
Kaplan, H. & Hutkins, R. W. (2000) Appl. Environ. Microbiol. 66, 2682-2684
Kaplan, H. & Hutkins, R. W. (2003) Appl. Environ. Microbiol. 69, 2217-2222
Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M. & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-1995
Koonin, E. V., Makarova, K. S. & Aravind, L. (2001) Annu. Rev. Microbiol. 55, 709-742
Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. (2001) J. Mol. Biol. 305, 567-580
Kumar, S., Tamura, K., Jakobsen, I. B. & Nei, M. (2001) Bioinformatics 17, 1244-1245
Lambert, A., Osteras, M., Mandon, K., Poggi, M. C. & Le Rudulier, D. (2001) J. Bacteriol. 183, 4709-4717
Law, J., Buist, G., Haandrikman, A., Kok, J., Venema, G. & Leenhouts, K. (1995) J. Bacteriol. 177, 7011-7018
Liebl, W., Brem, D. & Gotschlich, A. (1998) Appl. Microbiol. Biotechnol. 50, 55-64
Linton, K. J. & Higgins, C. F. (1998) Mol. Microbiol. 28, 5-13
Luesink, E. J., Marugg, J. D., Kuipers, O. P. & de Vos, W. M. (1999) J. Bacteriol. 181, 1924-1926
McKellar, R. C. & Modler, H. W. (1989) Appl. Microbiol. Biotechnol. 31, 537-541
McLaughlin, R. E. & Ferretti, J. J. (1996) FEMS Microbiol. Lett. 140, 261-264
Menendez, C., Hernandez, L., Selman, G., Mendoza, M. F., Hevia, P., Sotolongo, M. & Arrieta, J. G. (2002) Curr. Microbiol. 45, 5-12
Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M. & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-1210
Moshfegh, A. J., Friday, J. E., Goldman, J. P. & Ahuja, J. K. C. (1999) J. Nutr. 129, 1407s-1411s
Muramatsu, K., Onodera, S., Kikuchi, M. & Shiomi, N. (1992) Biosci. Biotech. Biochem. 56, 1451-1454
Naumoff, D. G. (2001) Proteins 42, 66-76
Nesbo, C. L., Nelson, K. E. & Doolittle, W. F. (2002) J. Bacteriol. 184, 4475-4488
Nguyen, C. C. & Saier, M. H., Jr. (1995) FEBS Lett. 377, 98-102
Oda, Y. & Ito, M. (2000) Curr. Microbiol. 41, 392-395
Onodera, S. & Shiomi, N. (1988) Agric. Biol. Chem. 52, 2569-2576
Orrhage, K., Sjostedt, S. & Nord, C. E. (2000) J. Antimicrob. Chemother. 46, 603-612
Perrin, S., Grill, J. P. & Schneider, F. (2000) J. Appl. Microbiol. 88, 968-974
Quentin, Y., Fichant, G. & Denizot, F. (1999) J. Mol. Biol. 287, 467-484
Reddy, V. A. & Maley, F. (1990) J. Biol. Chem. 265, 10817-10820
Russell, R. R. B., Aduseopoku, J., Sutcliffe, I. C., Tao, L. & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637
Russell, W. M. & Klaenhammer, T. R. (2001) Appl. Environ. Microbiol. 67, 4361-4364
Rycroft, C. E., Jones, M. R., Gibson, G. R. & Rastall, R. A. (2001) J. Appl. Microbiol. 91, 878-887
Saito, K., Kondo, K., Kojima, I., Yokota, A. & Tomita, F. (2000) Appl. Environ. Microbiol. 66, 252-256
Salama, N., Guillemin, K., McDaniel, T. K., Sherlock, G., Tompkins, L. & Falkow, S. (2000) Proc. Natl. Acad. Sci. USA 97, 14668-14673
Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. (1998) Nucleic Acids Res. 26, 544-548
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D. & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427
Song, E. K., Kim, H., Sung, H. K. & Cha, J. (2002) Gene 291, 45-55
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680
Van Laere, K. M., Hartemink, R., Bosveld, M., Schols, H. A. & Voragen, A. G. (2000) J. Agric. Food Chem. 48, 1644-1652
Weickert, M. J. & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-6242
Wen, Z. T. & Burne, R. A. (2002) J. Bacteriol. 184, 126-133
Xiao, R., Tanida, M. & Takao, S. (1989) J. Ferment. Bioeng. 67, 331-334
Yamamoto, H., Serizawa, M., Thompson, J. & Sekiguchi, J. (2001) J. Bacteriol. 183, 5110-5121
Table 1. Catabolite responsive elements sequences
Bacterium | Sequence* | Origin
B. subtilis | WTGNAANCGNWNNCW | search sequence, Miwa et al., 2000
B. subtilis | WWTGNAARCGNWWWCAWW | new consensus, Miwa et al., 2000
B. subtilis | TGWAANCGNTNWCA | consensus, Weickert and Chambliss, 1990
B. subtilis | TGTAAGCGCTTACA | optimal operator, Weickert and Chambliss, 1990
B. subtilis | TGTAAACGTTATCA | Yamamoto et al., 2001
L. acidophilus cre1 | ATTG-AAACGTTT-CAA | upstream of msmE
L. acidophilus cre2 | ATAG-AAACGTTT-CAA | upstream of msmE
S. pneumoniae cre1 | AATG-AAACGTTT-CAA | upstream of msmE2
S. pneumoniae cre2 | AATG-AAACGTTT-CAA | upstream of msmE2
L. acidophilus scr | AATAAAAGCGTTTACAT | upstream of scrB
L. acidophilus cre3 | TATGAAAGCGCTTAAAA | upstream of msmE2
S. mutans creW | AGATAGCGATTTGG | Burne et al., 1999
S. mutans creS | AGATAGCGCTTACA | Burne et al., 1999
* N, any; W, A or T; R, G or A; shaded nucleotides were specifically conserved and consistent with the consensus sequences
Table 2. Primers used in this study
Primer | Sequence* | Gene† | Position‡
A | GTAATAATAGTCAAAGTGGC | msmEf | 1,518
B | GATCGGATCCAAGATCAATGCTGCTTTAAA | msmEf2 | 1,706
C | GGAAGGCTGAAGTAGTTTGC | msmEr | 2,192
D | GATCGAATTCGATACAGGATATGGCATTACG | msmEr2 | 2,355
E | AGGATCCATCCATATGCTCCACACT | bfrAf | 4,655
F | AGAATTCAACATGATCAGCACTTCT | bfrAr | 5,370
G | GGAATATCTTCGGCTAATTG | bfrAr2 | 5,540
H | CCACTTCAAGTAGCTGTTACTAATA | msmGf | 4,337
I | CTTGAGTAAGATACTTTTGG | msmGr | 4,469
J | GACCAGAAGATATTCACGCC | msmKf | 6,661
K | ACCTGGCTTGTGATAATCAC | msmKr | 6,833
L | GGTCTTTGAACTTGTTCCGC | gtfAr | 8,269
* underlined sequence indicates restriction site used for cloning
† f, indicates forward strand; r, indicates reverse strand.
‡ position of the 5’ end of the primer, relative to the 10,000 bp DNA locus.
Table 3. Genes and proteins used for comparative genomic analyses
Bacterium | Genome or locus | Sequence information
B. anthracis | NC_003995 | bfrA NP_654697
B. halodurans | NC_002570 | BH1855 NP_242721, SacP NP242722, BH1857 NP_242723, SacA NP_242724, 16S (nt22,819-24,370), MsmR NP_243093, MsmE NP_243092, AmyD NP_243091, AmyC NP_243090, bh2223 NP_243089
B. longum | AE014295 | cscA BL0105 (fructosidase) AE014625_3, cscB (major facilitator family permease) AE014625_4, BL0107 (lacI) AE014625_5, 16S nt AE014785 nt 2,881-4,400
B. subtilis | NC_000964 | SacT NP_391686, SacP NP_391684, SacA NP_391683, 16S nt 9,809-11,361, MsmR NP_390904, MsmE NP_390905, AmyD NP_390906, AmyC NP_390907, MelA NP_390908, SacC NP_390581, YdhR O05510, YdjE O34768
C. acetobutylicum | NC_003030 | LicT NP_347062, 0423 NP_347063, 0424 NP_347064, SacA NP_347066, 16S nt 9,710-11,219
C. beijerinckii | AF059741 | ScrA AAC99320, ScrR AAC999321, ScrB AAC99322, ScrK AAC99323, 16S X_68179
C. perfringens | NC_003366 | 1531 NP_562447, SacA NP_562448, 1533 NP_562449, 1534 NP_562450, 16S 10,173-11,680
E. coli | NC_002655 | 3623 NP_288931, 3624 NP_288932, 3625 NP_288933, 3626 NP_288934, 16S nt 227,103-228,644
E. faecalis | TIGR shotgun, NC 002938 | EF1601, EF1603, EF1604, 16S AF515223, EFA0067, EFA0069, EFA0070, available at http://www.tigr.org
G. stearothermophilus | TIGR shotgun, NC_002926 | 16S contig221 nt 1,001-2,440, SurT AAB38977, SurP AAB72022, SurA AAB38976, PfK KIBSFF
K. pneumoniae | WashU shotgun, NC_002941 | ScrR P37076, ScrA CAA40658, ScrB CAA40659, 16S AJ233420, locus X57401
L. acidophilus | AY172019 (msm), AY172020 (msm2), AY177419 (scr) | ScrR, ScrB, ScrA, 16S nt 59,261-60,816, MsmR, MsmE, MsmF, MsmG, BfrA, MsmK, GtfA, MsmR2, MsmE2, MsmF2, MsmG2, MsmK2, Aga, GtfA2
L. fermentum | - | ScrK CAD24410
L. gasseri | NZ_AAAB01000011, In progress, JGI | ScrR ZP_00046868, ScrB58 (contig 58) ZP_00046078, ScrB38 (contig 38) ZP_00046869, ScrA21 (contig 21), ScrA 58 (contig 58) ZP_00046080, ScrK ZP_00046753, 16S AF519171
L. lactis | M96669 | SacB CAB09690, SacA CAB09689, SacR CAB09692, SacK CAB09691, Luesink et al., 1999, 16S X54260
L. plantarum | AL935263 | 16S AF515222, sacK1 CAD62854, pts1bca CAD62855, sacA CAD62856, sacR CAD62857
L. sakei | - | ScrA AAK92528
M. laevaniformans | - | LevM BAB59060
P. multocida | NC_002663 | PtsB NP_246785, ScrR NP_246786, ScrB NP_246787, PM1849 NP_246788, 16S AY078999
P. pentosaceus | Z32771 | ScrK CAA83667, ScrA CAA83668, ScrB CAA83669, ScrR CAA83670, 16S AF515227
R. solanacearum | NC_003296 | ScrR NP_522845, ScrA NP_522844, ScrB NP_522843, 16S nt 1,532,714-1,534,226
S. agalactiae | NC_004116 | ScrR NP_688683, ScrB NP_688682, Sag1690 NP_688681, ScrK NP_688680, 16S nt 16411-17916
S. aureus | NC_002758 | ScrR NP_372566, ScrB NP_372565, 2040 NP_372564, 16S P83357
S. mutans | M77351 | ScrK NP_722157, ScrA NP_722158, ScrB NP_722159, ScrR NP_722160, msmR AAA26932, Aga AAA26933, MsmE AAA26934, MsmF AAA26935, MsmG AAA26936, GtfA AAA26937, MsmK AAA26938, FruB AAD28639, FruA Q03174, 16S AF139603
S. pneumoniae | NC_003098 | ScrK NP_359158, ScrA NP_359159, ScrB NP_359160, ScrR NP_359161, 16S nt15,161-16,674, MsmR NP_359306, Aga NP_359305, MsmE NP_359304, MsmF NP_359303, MsmG NP_359302, GtfA NP_359301, ScrR2 NP_359213, Sbp NP_359212, MspA NP_359211, MspB NP_359210, SacA NP_359209
S. pyogenes | NC_002737 | ScrK NP_269817, ScrA NP_269819, ScrB NP_269820, ScrR NP_269821, 16S nt 17,170-18,504
S. sobrinus | - | ScrB S68598, ScrA S68599
S. typhimurium | - | ScrK P26984, ScrA P08470, ScrR CAA47975, ScrB P37075, 16S Z49264
S. xylosus | - | ScrA S39978, ScrB Q05936, ScrR P74892
T. maritima | NC_000853 | bfrA NP_229215, 1416 NP_229217, 1417 NP_229218, 16S AJ401021, 0296 NP_228108
V. alginolyticus | - | ScrR P24508, ScrB P13394, ScrK P22824, ScrA P22825, 16S AF513447
V. cholerae | NC_002506 | 0653 NP_233042, ScrR NP_233043, 0655 NP_233044, 0656 NP_233045, 16S X74694
msmR msmE msmF msmG bfrA msmK gtfA
1231 aaatggcaataccacaaaaTAActgttgacaagttgtgaaagcgatattatcatttaatt
1291 gtaaattgaaaacgtttccaaagtgttcaaatagttttttgctaaataattatttttttg
1351 tagcgaaaTAGAAACGTTTCAAttaatttaaaacaattagatcttagtaggaaacctttt (cre2)
1411 aatttttgtgcaaaaTTGAAACGTTTCAAaAGGAGGaaaaATGaaaaaatggaaattagg (cre1)
Figure 1. Operon layout. The start and stop codons are in bold, the putative ribosome binding site is boxed, and the cre-like elements are underlined. Terminators are indicated by hairpin structures.
[Figure 2 panels. A: msmE and bfrA; lanes Glc, Fru, Suc, GFn, Fn, ctrl. B: msmE and bfrA; 1.0% Fn with 0.0, 0.1, 0.5 and 1.0% Glc or Fru; DNA ctrl.]
Figure 2. Sugar induction and repression. A. Transcriptional induction of the msmE, and bfrA genes, monitored by RT-PCR (top) and RNA slot blots (bottom). Cells were grown on glucose (Glc), fructose (Fru), sucrose (Suc), FOS GFn, and FOS Fn. Chromosomal DNA was used as a positive control for the probe. B. Transcriptional repression analysis of msmE and bfrA by variable levels of glucose (Glc) and fructose (Fru): 0.1% (5.5 mM), 0.5% (28 mM) and 1.0% (55 mM), in the presence of 1% Fn. Cells were grown in the presence of Fn until OD600nm approximated 0.5-0.6, glucose was added and cells were propagated for an additional 30 minutes.
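The molar concentrations quoted in the Figure 2 legend follow from the percent (w/v) values and the molar mass of a hexose. The sketch below reproduces that conversion, assuming that percent (w/v) means grams per 100 mL and using 180.16 g/mol for C6H12O6 (glucose or fructose). The function name is hypothetical.

```python
# Molar mass of a hexose (C6H12O6), applicable to both glucose and fructose.
HEXOSE_MOLAR_MASS = 180.16  # g/mol

def percent_wv_to_millimolar(percent_wv: float, molar_mass: float) -> float:
    """Convert a percent (w/v) concentration to millimolar.

    Assumption: 1% (w/v) corresponds to 1 g per 100 mL, i.e. 10 g/L.
    """
    grams_per_litre = percent_wv * 10.0
    return grams_per_litre / molar_mass * 1000.0

for percent in (0.1, 0.5, 1.0):
    mm = percent_wv_to_millimolar(percent, HEXOSE_MOLAR_MASS)
    print(f"{percent}% (w/v) = {mm:.2f} mM")
```

The computed values (about 5.6, 27.8 and 55.5 mM) agree with the legend's 5.5, 28 and 55 mM to within rounding.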
[Figure 3 axes: Time (hrs), 0 to 18, versus Ln OD600nm, e-3 to e1. Series: fructose, GFn, Fn, lacZ, Fn passage.]
Figure 3. Growth curves. The two mutants, bfrA (top) and msmE (bottom) were grown on semi-synthetic medium supplemented with 0.5% w/v carbohydrate: fructose (●), GFn (○), Fn (▼), Fn for one passage ( ). The lacZ mutant grown on Fn was used as control (∇).
A
S. pneumoniae: R A E F G S
S. mutans: R A E F G S K
B. subtilis: R E F G A
B. halodurans: R E F G A
L. acidophilus: R2 E2 F2 G2 K2 A S2
L. acidophilus: R E F G B K S
S. pneumoniae: R2 E2 F2 G2 B
B
E. faecalis: 1604 1603 1601
E. faecalis plasmid: 0070 0069 0067
S. pyogenes: scrR scrB scrA scrK
P. pentosaceus: scrR scrB scrA scrK
S. mutans: scrR scrB scrA scrK
S. agalactiae: scrR scrB sag1 scrK
S. pneumoniae: scrR scrB scrA scrK
L. acidophilus: scrR scrB scrA
L. plantarum: sacR sacA PTS sacK
L. lactis: sacR sacB sacB sacK
E. coli O157:H7: 3626 3625 3624 3623
S. aureus: scrR scrB 2040
C. beijerinckii: scrA scrR scrB scrK
C. perfringens: 1534 1533 sacA 1531
P. multocida: ptsB scrR scrB 1849
V. cholerae: 0653 scrR 0655 0656
B. subtilis: sacT sacP sacA
B. halodurans: 1855 sacP 1857 sacA
C. acetobutylicum: licT 0423 0424 sacA
G. stearothermophilus: surT surP surA
R. solanacearum: scrR scrA scrB
Figure 4. Operon architecture analysis. A. Alignment of the msm locus from selected bacteria. Regulators, white; α-galactosidases, blue; ABC transporters, gray; fructosidases, yellow; sucrose phosphorylase, red. B. Alignment of the sucrose locus from selected microbes. Regulators, white; fructosidases, yellow; PTS transporters, green; fructokinase, purple; putative proteins, black.
[Figure 5 panels and scale bars: A, 16S rDNA, 0.05; B, fructosidases, 0.2; C, ABC transporter components, 0.2; D, PTS transporters, 0.2; E, regulators, 0.5; F, fructokinases, 0.2.]
ckiiC. perfringens
C. acetobutylicum
B. subtilis
T. maritim
aS. aureus
B. h
alod
uran
s
.
s
r
P. pentosaceus
L. fermentum
L. acidophilus
L. gasseri
L. lactisS. pneumoniae
S. mutans
S. pyogenes S. agalactiae
L. plantarum
F 0.2
C. beijerin
ckiiC. perfringens
C. acetobutylicum
B. subtilis
T. maritim
aS. aureus
B. h
alod
uran
s
.
s
r
P. pentosaceus
L. fermentum
L. acidophilus
L. gasseri
L. lactisS. pneumoniae
S. mutans
S. pyogenes S. agalactiae
L. plantarum
S. muta
nsmsm
F
L. acid
ophilus msm
F2
S. pneumoniae msmFS. pneumoniae mspA
L. acidophilus msmF
B. halodurans amyD
B. subtilis amyD
Spneumoniaemsp
B
L. a
cidop
hilu
s m
smG
S. p
neum
onia
em
smG
S. m
utan
sm
smG
L. a
cidop
hilus
msm
G2
B. h
alodu
rans
amyC
B. subtilis
amyC
S. pne
umon
iaemsm
E
S. mutans
msmE
L. acidophilus msmE2
S. pneunoniae sbpL. acidophilus msmE
B. halodurans msmEB. subtilis
msm
E
T. maritim
a1416
T. maritim
a1417
S. mutans
msm
K
L. acidophilus msm
K
L. acidophilus msmK2
C 0.2
S. muta
nsmsm
F
L. acid
ophilus msm
F2
S. pneumoniae msmFS. pneumoniae mspA
L. acidophilus msmF
B. halodurans amyD
B. subtilis amyD
Spneumoniaemsp
B
L. a
cidop
hilu
s m
smG
S. p
neum
onia
em
smG
S. m
utan
sm
smG
L. a
cidop
hilus
msm
G2
B. h
alodu
rans
amyC
B. subtilis
amyC
S. pne
umon
iaemsm
E
S. mutans
msmE
L. acidophilus msmE2
S. pneunoniae sbpL. acidophilus msmE
B. halodurans msmEB. subtilis
msm
E
T. maritim
a1416
T. maritim
a1417
S. mutans
msm
K
L. acidophilus msm
K
L. acidophilus msmK2
C 0.2
S. muta
nsmsm
F
L. acid
ophilus msm
F2
S. pneumoniae msmFS. pneumoniae mspA
L. acidophilus msmF
B. halodurans amyD
B. subtilis amyD
Spneumoniaemsp
B
L. a
cidop
hilu
s m
smG
S. p
neum
onia
em
smG
S. m
utan
sm
smG
L. a
cidop
hilus
msm
G2
B. h
alodu
rans
amyC
B. subtilis
amyC
S. pne
umon
iaemsm
E
S. mutans
msmE
L. acidophilus msmE2
S. pneunoniae sbpL. acidophilus msmE
B. halodurans msmEB. subtilis
msm
E
T. maritim
a1416
T. maritim
a1417
S. mutans
msm
K
L. acidophilus msm
K
L. acidophilus msmK2
C 0.2
S. muta
nsmsm
F
L. acid
ophilus msm
F2
S. pneumoniae msmFS. pneumoniae mspA
L. acidophilus msmF
B. halodurans amyD
B. subtilis amyD
Spneumoniaemsp
B
L. a
cidop
hilu
s m
smG
S. p
neum
onia
em
smG
S. m
utan
sm
smG
L. a
cidop
hilus
msm
G2
B. h
alodu
rans
amyC
B. subtilis
amyC
S. pne
umon
iaemsm
E
S. mutans
msmE
L. acidophilus msmE2
S. pneunoniae sbpL. acidophilus msmE
B. halodurans msmEB. subtilis
msm
E
T. maritim
a1416
T. maritim
a1417
S. mutans
msm
K
L. acidophilus msm
K
L. acidophilus msmK2
C 0.2
L. ac
idoph
ilus
L. gasseriL. plantarumP. pentosaceus
L. lactis
S. mutans
S. pneumoniae
S. pyogenes
E. faecalis
B. h
alod
uran
s
B. sub
tilis
G. stearothermophilus
C. acetobutylicum
C. beijerinckiiC. perfringensT. maritima
S. aureus
A 0.05
B. longum
S. agalactiae
L. ac
idoph
ilus
L. gasseriL. plantarumP. pentosaceus
L. lactis
S. mutans
S. pneumoniae
S. pyogenes
E. faecalis
B. h
alod
uran
s
B. sub
tilis
G. stearothermophilus
C. acetobutylicum
C. beijerinckiiC. perfringensT. maritima
S. aureus
A 0.05
B. longum
S. agalactiae
L. ac
idoph
ilus
L. gasseriL. plantarumP. pentosaceus
L. lactis
S. mutans
S. pneumoniae
S. pyogenes
E. faecalis
B. h
alod
uran
s
B. sub
tilis
G. stearothermophilus
C. acetobutylicum
C. beijerinckiiC. perfringensT. maritima
S. aureus
A 0.05
B. longum
S. agalactiae
L. ac
idoph
ilus
L. gasseriL. plantarumP. pentosaceus
L. lactis
S. mutans
S. pneumoniae
S. pyogenes
E. faecalis
B. h
alod
uran
s
B. sub
tilis
G. stearothermophilus
C. acetobutylicum
C. beijerinckiiC. perfringensT. maritima
S. aureus
A 0.05
B. longum
S. agalactiae
B 0.2
L. lactisL. acidophilusL. gasseri 38
faecalisplantarum
pentosaceus
L. gasseri 58
S. aureus
S. xylosus C. b
eijin
rinck
iiC
. per
fring
ens
B. a
nthr
acis
C. acetobutylicumB. su
btilis
sacA
B. halodurans
T. maritim
aL. acidophilus bfrA
S. mutans
fruAB. subtilis
sacC
M. laevaniform
ansS. m
utansfruB
Sneumoniae
sacA
G. stearothermophilus
S. m
utan
sS
. sob
rinus
E. fa
ecal
isp
S. p
neum
onia
epyogenes
o
agala
ctiae B. longum
B 0.2
L. lactisL. acidophilusL. gasseri 38
faecalisplantarum
pentosaceus
L. gasseri 58
S. aureus
S. xylosus C. b
eijin
rinck
iiC
. per
fring
ens
B. a
nthr
acis
C. acetobutylicumB. su
btilis
sacA
B. halodurans
T. maritim
aL. acidophilus bfrA
S. mutans
fruAB. subtilis
sacC
M. laevaniform
ansS. m
utansfruB
Sneumoniae
sacA
G. stearothermophilus
S. m
utan
sS
. sob
rinus
E. fa
ecal
isp
S. p
neum
onia
epyogenes
o
E. L.
P.
S.
S.
E. L.
P.
S.
S. aga
lactia
e B. longum
B 0.2
L. lactisL. acidophilusL. gasseri 38
faecalisplantarum
pentosaceus
L. gasseri 58
S. aureus
S. xylosus C. b
eijin
rinck
iiC
. per
fring
ens
B. a
nthr
acis
C. acetobutylicumB. su
btilis
sacA
B. halodurans
T. maritim
aL. acidophilus bfrA
S. mutans
fruAB. subtilis
sacC
M. laevaniform
ansS. m
utansfruB
Sneumoniae
sacA
G. stearothermophilus
S. m
utan
sS
. sob
rinus
E. fa
ecal
isp
S. p
neum
onia
epyogenes
o
agala
ctiae B. longum
B 0.2
L. lactisL. acidophilusL. gasseri 38
faecalisplantarum
pentosaceus
L. gasseri 58
S. aureus
S. xylosus C. b
eijin
rinck
iiC
. per
fring
ens
B. a
nthr
acis
C. acetobutylicumB. su
btilis
sacA
B. halodurans
T. maritim
aL. acidophilus bfrA
S. mutans
fruAB. subtilis
sacC
M. laevaniform
ansS. m
utansfruB
Sneumoniae
sacA
G. stearothermophilus
S. m
utan
sS
. sob
rinus
E. fa
ecal
isp
S. p
neum
onia
epyogenes
o
E. L.
P.
S.
S.
E. L.
P.
S.
S. aga
lactia
e B. longum
E 0.5
L. acidophilus scrR
L. gasseri
P. pentosaceus
E. faecalisL. lactis
E. fa
ecali
s p
S. mutans
S. p
yoge
nes
S. p
neum
onia
e
L. acidophilus R
S. pneumoniae
R2us
E. coli
C. b
eije
rinck
iiC
. per
fring
ens
B. h
alodu
rans
S. aureus
S. xylosus
B. halodurans msmR
B. sub ilis msmR
B. subtilissacT
G. stearotherm
ophilus
C. acetobutylicum
S. pneumoniae
R
S. mutans
R
L. acidophilus R2
B. longum
S. a
gala
ctia
e
L. plantarum
E 0.5
L. acidophilus scrR
L. gasseri
P. pentosaceus
E. faecalisL. lactis
E. fa
ecali
s p
S. mutans
S. p
yoge
nes
S. p
neum
onia
e
L. acidophilus R
S. pneumoniae
R2us
E. coli
C. b
eije
rinck
iiC
. per
fring
ens
B. h
alodu
rans
S. aureus
S. xylosus
B. halodurans msmR
B. sub ilis msmR
B. subtilissacT
G. stearotherm
ophilus
C. acetobutylicum
S. pneumoniae
R
S. mutans
R
L. acidophilus R2
B. longum
S. a
gala
ctia
e
L. plantarum
E 0.5
L. acidophilus scrR
L. gasseri
P. pentosaceus
E. faecalisL. lactis
E. fa
ecali
s p
S. mutans
S. p
yoge
nes
S. p
neum
onia
e
L. acidophilus R
S. pneumoniae
R2us
E. coli
C. b
eije
rinck
iiC
. per
fring
ens
B. h
alodu
rans
S. aureus
S. xylosus
B. halodurans msmR
B. sub ilis msmR
B. subtilissacT
G. stearotherm
ophilus
C. acetobutylicum
S. pneumoniae
R
S. mutans
R
L. acidophilus R2
B. longum
S. a
gala
ctia
e
L. plantarum
E 0.5
L. acidophilus scrR
L. gasseri
P. pentosaceus
E. faecalisL. lactis
E. fa
ecali
s p
S. mutans
S. p
yoge
nes
S. p
neum
onia
e
L. acidophilus R
S. pneumoniae
R2us
E. coli
C. b
eije
rinck
iiC
. per
fring
ens
B. h
alodu
rans
S. aureus
S. xylosus
B. halodurans msmR
B. sub ilis msmR
B. subtilissacT
G. stearotherm
ophilus
C. acetobutylicum
S. pneumoniae
R
S. mutans
R
L. acidophilus R2
B. longum
S. a
gala
ctia
e
L. plantarum
D 0.2
L. a
cido
philu
s
L. g
asse
ri21
E. fa
ecal
isp
S. muta
ns
L. sakei
P. pentosaceus
.
S. xylosus
V. alginolyticus C. a
ceto
buty
licum
Bu
ilis
B. halodura
ns
G. stearothermophilus
calis
C. beijerinckiiL. gasseri 58
S. sobrinus
S. pyogenes
S. pneumoniaeC. perfringens
L. lactisS. agalactiae
B. longum
L. plantarum
D 0.2
L. a
cido
philu
s
L. g
asse
ri21
E. fa
ecal
isp
S. muta
ns
L. sakei
P. pentosaceus
.
S. xylosus
V. alginolyticus C. a
ceto
buty
licum
Bu
ilis
B. halodura
ns
G. stearothermophilus
calis
C. beijerinckiiL. gasseri 58
S. sobrinus
S. pyogenes
S. pneumoniaeC. perfringens
L. lactisS. agalactiae
B. longum
L. plantarum
D 0.2
L. a
cido
philu
s
L. g
asse
ri21
E. fa
ecal
isp
S. muta
ns
L. sakei
P. pentosaceus
.
S. xylosus
V. alginolyticus C. a
ceto
buty
licum
Bu
ilis
B. halodura
ns
G. stearothermophilus
calis
C. beijerinckiiL. gasseri 58
S. sobrinus
S. pyogenes
S. pneumoniaeC. perfringens
L. lactisS. agalactiae
B. longum
L. plantarum
D 0.2
L. a
cido
philu
s
L. g
asse
ri21
E. fa
ecal
isp
S. muta
ns
L. sakei
P. pentosaceus
.
S. xylosus
V. alginolyticus C. a
ceto
buty
licum
Bu
ilis
B. halodura
ns
G. stearothermophilus
calis
C. beijerinckiiL. gasseri 58
S. sobrinus
S. pyogenes
S. pneumoniaeC. perfringens
L. lactisS. agalactiae
B. longum
L. plantarum
F 0.2
C. beijerin
ckiiC. perfringens
C. acetobutylicum
B. subtilis
T. maritim
aS. aureus
B. h
alod
uran
s
.
s
r
P. pentosaceus
L. fermentum
L. acidophilus
L. gasseri
L. lactisS. pneumoniae
S. mutans
S. pyogenes S. agalactiae
L. plantarum
F 0.2
C. beijerin
ckiiC. perfringens
C. acetobutylicum
B. subtilis
T. maritim
aS. aureus
B. h
alod
uran
s
.
s
r
P. pentosaceus
L. fermentum
L. acidophilus
L. gasseri
L. lactisS. pneumoniae
S. mutans
S. pyogenes S. agalactiae
L. plantarum
R. solanacearum
P. multocida
V. alginolyticus
V. choleraeK. pneum
oniaeS. typhim
uyrium
E. coli R. solanacearum
P. multocida
V. alginolyticus
V. choleraeK. pneum
oniaeS. typhim
uyrium
E. coli
P. multocida
V. alginolyticus
V. c
holer
ae
R. solanacearum
K. pneumoniaeS. typhimurium
. p
E. c li
P. multocida
V. alginolyticus
V. c
holer
ae
R. solanacearum
K. pneumoniaeS. typhimurium
. p
E. c li
V. alginolytic
R. s
olan
acea
rum
P. multocidaK. pneumoniaeS. typhimurium
tV. cholerae
V. alginolytic
R. s
olan
acea
rum
P. multocidaK. pneumoniaeS. typhimurium
tV. cholerae
P multocida
V. cholerae
K. p
neum
onia
e
S. t
yphi
mur
ium
. sbt
E. fae
E. coli
R. s
olan
acea
rum
P multocida
V. cholerae
K. p
neum
onia
e
S. t
yphi
mur
ium
. sbt
E. fae
E. coli
R. s
olan
acea
rum
E. c
oli
S. typ
himuri
um
P. multocida
V. choleraeV alginolyticuR. solanacea um
E. c
oli
S. typ
himuri
um
P. multocida
V. choleraeV alginolyticuR. solanacea um
R. solanacearum
P. multocida
V. alginolyticus
V. choleraeK. pneum
oniaeS. typhim
uyrium
E. coli R. solanacearum
P. multocida
V. alginolyticus
V. choleraeK. pneum
oniaeS. typhim
uyrium
E. coli R. solanacearum
P. multocida
V. alginolyticus
V. choleraeK. pneum
oniaeS. typhim
uyrium
E. coli R. solanacearum
P. multocida
V. alginolyticus
V. choleraeK. pneum
oniaeS. typhim
uyrium
E. coli
P. multocida
V. alginolyticus
V. c
holer
ae
R. solanacearum
K. pneumoniaeS. typhimurium
. p
E. c li
P. multocida
V. alginolyticus
V. c
holer
ae
R. solanacearum
K. pneumoniaeS. typhimurium
. p
E. c li
P. multocida
V. alginolyticus
V. c
holer
ae
R. solanacearum
K. pneumoniaeS. typhimurium
. p
E. c li
P. multocida
V. alginolyticus
V. c
holer
ae
R. solanacearum
K. pneumoniaeS. typhimurium
. p
E. c li
V. alginolytic
R. s
olan
acea
rum
P. multocidaK. pneumoniaeS. typhimurium
tV. cholerae
V. alginolytic
R. s
olan
acea
rum
P. multocidaK. pneumoniaeS. typhimurium
tV. cholerae
V. alginolytic
R. s
olan
acea
rum
P. multocidaK. pneumoniaeS. typhimurium
tV. cholerae
V. alginolytic
R. s
olan
acea
rum
P. multocidaK. pneumoniaeS. typhimurium
tV. cholerae
P multocida
V. cholerae
K. p
neum
onia
e
S. t
yphi
mur
ium
. sbt
E. fae
E. coli
R. s
olan
acea
rum
P multocida
V. cholerae
K. p
neum
onia
e
S. t
yphi
mur
ium
. sbt
E. fae
E. coli
R. s
olan
acea
rum
P multocida
V. cholerae
K. p
neum
onia
e
S. t
yphi
mur
ium
. sbt
E. fae
E. coli
R. s
olan
acea
rum
P multocida
V. cholerae
K. p
neum
onia
e
S. t
yphi
mur
ium
. sbt
E. fae
E. coli
R. s
olan
acea
rum
E. c
oli
S. typ
himuri
um
P. multocida
V. choleraeV alginolyticuR. solanacea um
E. c
oli
S. typ
himuri
um
P. multocida
V. choleraeV alginolyticuR. solanacea um
Figure 5. Neighbor-joining phylogenetic trees. Lactobacillales, black; Bacillales, green; Clostridia, blue; Thermotogae, yellow; Proteobacteria, red. A, 16S; B, fructosidase; C, ABC; D, PTS; E, regulators; F, fructokinase. L. acidophilus proteins are boxed, and shaded when encoded by the msm locus. Bars indicate scales for computed pairwise distances.
[Figure 6 image. Gel panels for primer sets A to D spanning msmE, msmF, msmG, bfrA, msmK and gtfA; the lanes in each set are labeled no RT, RT and DNA.]
Figure 6. Co-expression of contiguous genes. Co-transcription of contiguous genes was monitored by RT-PCR using primers as shown on the lower panel. In each set of three bands, a negative control did not undergo reverse transcription (left), and a positive control was obtained from chromosomal DNA used as a template for PCR (right)
[Figure 7 image. Upper graph: log10 cfu/ml (axis 1e+6 to 1e+10) for NCFM and the lacZ, msmE and bfrA mutants on glc, fru, suc, GFn, Fn, FnRP, lac, Gal and Raff. Lower graph: log10 cfu/ml (axis 1e+3 to 1e+10) on fructose and Fn.]
Figure 7. Mutant growth on select carbohydrates. Strains were grown overnight (18 hours) on semi-synthetic medium supplemented with 0.5% w/v carbohydrates, either glucose (Glc), fructose (Fru), sucrose (Suc), FOS-GFn (GFn), FOS-Fn from Orafti (Fn), FOS-Fn from Rhone-Poulenc (FnRP), lactose (Lac), or galactose (Gal). Cell counts obtained after one passage of the bfrA mutant on FOS-Fn are shown in the lower graph.
A. Helix Turn Helix
LacI consensus *          TIKDVARLAGVSKSTVSRVLN
B. halodurans_msmR      MATIKDIAKLANVSNATVSRVLNR  24
B. subtilis_msmR        MVRIKDIALKAKVSSATVSRILNE  24
K. pneumoniae_scrR      RVTIKDIAELAGVSKATASLVLNG  28
S. typhimurium_scrR     RVTIKDIAEQAGVSKATASLVLNG  28
P. multocida_scrR       RITLSDIAKCCGLSTTTVSMILNN  31
C. beijerinckii_scrR    KVTIQDIANMVNVSKSSVSRYLNN  27
C. perfringens_1533     KVTIQDIANMVGVSKSTVSRYLNG  26
B. halodurans_1855      MTTILDIAKLAGVAKSTVSRYLNG  24
S. aureus_scrR          MKNISDIAKLAGVSKSTVSRFLNN  24
S. xylosus_scrR         MKNIADIAKIAGVSKSTVSRYLNN  24
E. faecalis_0070        VAKLTDVAELAGVSPTTVSRVINN  35
S. pyogenes_scrR        VAKLTDVAALAGVSPTTVSRVINK  25
S. agalactiae_scrR      VAKLTDVAALAGVSPTTVSRVINK  25
S. mutans_scrR          VAKLTDVAKLAGVSPTTVSRVINR  25
S. pneumoniae_scrR      VAKLTDVAKLAGVSPTTVSRVINK  25
E. faecalis_1604        VVKLTDVAKLAGVSPTTVSRVINN  28
L. lactis_sacR          MIKLEDVANKAGVSVTTVSRVINR  24
L. acidophilus_scrR     PAKLSDVAREAGCSVTTVSRVINN  25
L. gasseri_scrR         MVKLTDVAAKAGCSVTTVSRVINN  26
L. plantarum_sacR       KPKLNDVAKLAGVSATTVSRVINN  25
P. pentosaceus_scrR     KPKLNDVAKLAGVSATTVSRVINN  25
L. acidophilus_msmR     MATMKDVAQRAGVGVGTVSRVINH  23
S. pneumoniae_scrR2     SITMKDVALEAGVSVGTVSRVINK  32
V. alginolyticus_scrR   --SLHDVARLAGVSKSTVSRVIND  24
E. coli_3626            MASLKDVARLAGVSMMTVSRVMHN  24
B. longum_BL0107        MVTMKEIANKAGVSVSTVSLVLNG  25
R. solanacearum_scrR    RPTIRDVATLAGVSTSTVSRVLNN  34
B.
Protein                      NDPNG          FRDP           ECP
L. acidophilus_bfrA          WINDPNGL  38   SNFRDPKV 161   MTECPDYF 214
S. pneumoniae_sacA           WINDPNGF  45   ADFRDPKL 170   MWECPDYF 224
E. coli_3625                 WMNDPNGL  43   MHFRDPKV 165   MWECPDFF 220
B. longum                    WINDPNGL  45   HHYRDPKV 167   MLECPDFF 222
P. multocida_scrB            LLNDPNGL  71   EHVRDPKP 190   MWECPDLL 249
V. alginolyticus_scrB        LLNDPNGL  55   EHFRDPKV 172   MWECPDFF 228
L. acidophilus_scrB          LINDPNGF  51   EHFRDPQI 169   MIECPNLV 226
L. gasseri_scrB38            LLNDPNGF  51   EHFRDPQL 170   MIECPNLV 227
S. mutans_scrB               LLNDPNGF  51   EHFRDPQI 170   MIECPNLV 228
S. sobrinus_scrB             LLNDPNGF  51   EHFRDPQI 170   MIECPNLV 228
E. faecalisp_0069            LLNDPNGF  51   DHFRDPQI 170   MIECPNLL 228
S. pneumoniae_scrB           LLNDPNGF  51   DHFRDPQI 170   MMECPNLV 228
S. agalactiae                LLNDPNGH  51   EHFRDPQI 170   MIECPNII 228
S. pyogenes_scrB             LLNDPNGF  51   EHFRDPQL 170   MIECPNLV 228
L. lactis_sacA               LLNDPNGF  51   DHFRDPQI 171   MIECPNLI 229
L. plantarum_sacA            LLNDPNGF  51   SSFRDPDL 120   MIECPNLV 176
P. pentosaceus_scrB          LLNDPNGF  51   SSFRDPDL 171   MIECPKSG 227
E. faecalis_1603             LLNDPNGF  56   SHFRDPMV 175   MVECPNLV 233
S. aureus_scrB               LLNDPNGL  52   SHFRDPKV 172   MWECPDYF 228
S. xylosus_scrB              LLNDPNGL  52   QHFRDPKV 172   MWECPDYF 228
L. gasseri_scrB58            MLGDPNGF  77   GHFRDPKI 205   MWECSDYF 261
C. beijerinckii_scrB         LINDPNGL  45   AHFRDPYV 164   MWECPNII 220
C. perfringens_sacA          LINDPNGL  43   AHFRDPYI 162   MWECPSFF 218
B. subtilis_sacA             LLNDPNGV  47   AHFSRSEV 167   MWECPDLF 226
G. stearothermophilus_surA   LMNDPNGL  47   AHFRDPK  167   MWECPDLF 226
B. halodurans_sacA           LLNDPNGF  47   AHFRDPKV 165   MWECPDLF 225
C. acetobutylicum_sacA       FMNDPNGL  45   RHFRDPKV 166   MWECPNLF 226
K. pneumoniae_scrB           LLNDPNGF  45   GHVRDPKV 163   MWECPDLF 223
S. typhimurium_scrB          LMNDPNGF  45   GHVRDPKV 163   MWECPDLF 223
V. cholerae_0655             LLNDPNGF 112   EHIRDPKV 232   MWECPDWF 288
R. solanacearum_scrB         LLNDPNGL  33   GHFRDPKA 152   MYECPDLF 207
T. maritima_bfrA             WMNDPNGL  21   HAFRDPKV 141   EIECPDLV 195
B. subtilis_sacC             WMNDPNGM  53   KDFRDPKV 175   VWECPDLF 228
S. mutans_fruA               WANDPNGL 499   QDFRDPKV 605   HTECPDMY 649
M. laevaniformans_levM       WMNDPQRP  90   RDFRDPKV 221   TIECPDLF 275
S. mutans_fruB               FMNDIQTI  52   QNARDPYI 173   LVECPNLK 234
Figure 8. Motifs highly conserved amongst repressors and fructosidases. A, conserved helix-turn-helix motif of the regulators, the consensus sequence was obtained from Nguyen et al., 1995; B, conserved motifs of the β-fructosidases
[Figure 9 image. Loci shown: scrR, scrA, scrB (sucrose); msmR, msmE, msmF, msmG, bfrA, msmK, gtfA (FOS); msmR2, msmE2, msmF2, msmG2, msmK2, melA, gtfA2 (raffinose). EC numbers shown: ScrB and BfrA, 3.2.1.26; MelA, 3.2.1.22; GtfA and GtfA2, 2.4.1.7.]
Figure 9. Biochemical pathways. Biochemical pathways describing the likely reactions carried out by the enzymes and transporters encoded in the sucrose, FOS and raffinose loci. For the scr operon, sucrose is transported across the membrane and phosphorylated by a PTS transporter; the sucrose phosphate hydrolase hydrolyses the phosphorylated sucrose molecule into glucose-6-phosphate and fructose. For the msm operon, FOS is transported across the membrane by an ABC transporter; the fructosidase hydrolyses fructose moieties, and the sucrose phosphorylase hydrolyses sucrose into glucose-1-phosphate and fructose. For the msm2 operon, raffinose is transported across the membrane by an ABC transporter, the alpha-galactosidase hydrolyses the galactose moiety, and the sucrose phosphorylase hydrolyses sucrose into glucose-1-phosphate and fructose.
Chapter III – Global analysis of carbohydrate utilization and transcriptional regulation in Lactobacillus acidophilus using whole-genome cDNA microarrays
3.1 Abstract
The transport and catabolic machinery involved in carbohydrate utilization by the
probiotic lactic acid bacterium Lactobacillus acidophilus was characterized using whole-
genome cDNA microarrays. Global transcriptional profiles were determined for growth
on glucose, fructose, galactose, sucrose, lactose, trehalose, raffinose and
fructooligosaccharides. Hybridizations were carried out using a round-robin design, and
microarray data were analyzed using a two-stage mixed model ANOVA. Genes
differentially expressed were visualized by hierarchical clustering, volcano plots and
novel 3-way contour plots. Quantitative PCR confirmed the fold induction determined by
microarrays. Although 379 genes (20% of the genome) were significantly differentially
expressed, only 63 genes showed induction above 4 fold, indicating that there was a small
number of highly induced genes, which included a variety of carbohydrate transporters
and sugar hydrolases. Specifically, members of the phosphoenolpyruvate: sugar
phosphotransferase system family of transporters were identified for uptake of glucose,
fructose, sucrose and trehalose. Transporters of the ATP binding cassette family were
identified for uptake of raffinose and fructooligosaccharides. A member of the LacS
subfamily of galactoside-pentose-hexuronide translocators was identified for uptake of
galactose and lactose. Saccharolytic enzymes likely involved in the metabolism of mono-,
di- and poly-saccharides into substrates of glycolysis were also identified, including the
enzymatic machinery of the Leloir pathway, involved in the catabolism of galactosides.
Results suggested the transcriptome is regulated by carbon catabolite repression.
Although substrate-specific carbohydrate transporters and hydrolases were regulated at
the transcriptional level, genes encoding regulatory proteins CcpA, Hpr, HprK/P and EI
were consistently highly expressed. Collectively, microarray data revealed coordinated
and regulated transcription of genes involved in sugar uptake and metabolism based on
carbohydrate availability in the environment. This dynamic adaptation to environmental
conditions likely contributes to competition with commensals for limited carbohydrate
sources available in the human gastrointestinal tract. This model study provides a global
view of carbohydrate metabolism in L. acidophilus, and illustrates how recently
implemented genomic tools can be used to investigate microbial physiology on a global
scale.
3.2 Introduction
A large, diverse and dynamic microbial community resides in the human
gastrointestinal tract (Tannock, 1999). In particular, the complex intestinal microbial
population includes beneficial bacteria such as bifidobacteria and lactobacilli (Gibson and
Roberfroid, 1995). Among species considered important for human health, a number of
documented lactobacilli have been characterized as probiotics (Reid, 1999). Probiotics
are generally defined as “live microorganisms which, when administered in adequate
amounts confer a health benefit on the host” (Reid et al., 2003). For such microbes,
survival and residence in the intestine relies on their ability to survive gastric passage,
adhere to epithelial cells and utilize nutrients available in the intestine.
Lactobacillus acidophilus NCFM is a gram-positive probiotic lactic acid
bacterium which has the ability to survive in the gastrointestinal tract (Sanders and
Klaenhammer, 2001; Sui et al., 2002), adhere to human epithelial cells in vitro (Greene
and Klaenhammer, 1994; Sanders and Klaenhammer, 2001), modify fecal flora (Sui et
al., 2002), modulate the host immune response (Varcoe et al., 2003), and prevent
microbial gastroenteritis (Varcoe et al., 2003). Additionally, L. acidophilus NCFM has
the ability to utilize prebiotic compounds, which may contribute to the organism’s ability
to compete in the human GIT (Barrangou et al., 2003).
Undigested carbohydrates are a primary source of energy for intestinal microbes
residing in the large intestine. Non-digestible oligosaccharides (NDO) consist primarily
of plant carbohydrates that are resistant to enzymatic degradation and are not absorbed in
the upper intestinal tract. Such dietary compounds eventually reach the large intestine,
where they are hydrolyzed by a limited range of organisms. As a result, NDO have the
ability to selectively modulate the composition of the intestinal microflora (Sui et al.,
2002). NDO such as raffinose and fructooligosaccharides have been shown to selectively
promote the growth of probiotic species, thus are considered prebiotic compounds
(Benno et al., 1987; Gibson et al., 1995). Prebiotics are defined as non-digestible
substances that provide a beneficial physiological effect on the host by selectively
stimulating the favorable growth or activity of a limited number of indigenous bacteria
(Reid et al., 2003). Although considerable attention has been devoted to studying
modulation of the intestinal flora by prebiotics, the molecular mechanisms involved in
uptake and metabolism of those compounds by desirable intestinal microbes remain
mostly uncharacterized.
Lactic acid bacteria are a heterogeneous family of microbes which can use a
variety of nutrients. Specifically, bifidobacteria, streptococci and lactobacilli possess
specialized saccharolytic potentials which reflect the nutrient availability in their
respective environments (Ajdic et al., 2002; Schell et al., 2002; Kleerebezem et al., 2003;
Pridmore et al., 2004). In particular, the versatile saccharolytic potential of L. acidophilus
likely reflects its ability to efficiently utilize energy sources available in the intestinal
environment. Although the Lactobacillus acidophilus NCFM genome encodes numerous
putative genes potentially involved in uptake and metabolism of a variety of
carbohydrates (Altermann et al., 2004), little information is available regarding their
biological functions and expression profiles.
The objective of this study was to use cDNA microarrays to characterize and compare
global gene expression in Lactobacillus acidophilus. Global gene transcription profiles
were used to identify uptake systems, catabolic machinery and regulatory networks
involved in utilization of eight carbohydrates. This is the first comparative global
transcriptional analysis of the fermentation pathways of a lactic acid bacterium over a
range of carbohydrates.
3.3 Materials and Methods
3.3.1 Bacterial strains and media used in this study
The strain used in this study is L. acidophilus NCFM (NCK56) (Altermann et al.,
2004). Cultures were propagated at 37°C, aerobically in MRS broth (Difco). A semi-
synthetic medium consisted of: 1% bactopeptone (w/v) (Difco), 0.5% yeast extract (w/v)
(Difco), 0.2% dipotassium phosphate (w/v) (Fisher), 0.5% sodium acetate (w/v) (Fisher),
0.2% ammonium citrate (w/v) (Sigma), 0.02% magnesium sulfate (w/v) (Fisher), 0.005%
manganese sulfate (w/v) (Fisher), 0.1% Tween 80 (v/v) (Sigma), 0.003% bromocresol
purple (v/v) (Fisher), and 1% sugar (w/v). The carbohydrates added were either: glucose
(dextrose) (Sigma), fructose (Sigma), sucrose (Sigma), FOS (raftilose P95) (Orafti),
raffinose (Sigma), lactose (Fisher), galactose (Sigma), or trehalose (Sigma). Without
carbohydrate supplementation, the semi-synthetic medium was unable to sustain bacterial
growth. Cells underwent at least five passages on each sugar prior to RNA isolation, to
minimize carryover between substrates (Chhabra et al., 2003).
3.3.2 RNA isolation
Total RNA was isolated using TRIzol (GibcoBRL) by following the
manufacturer’s instructions. L. acidophilus cells were inoculated into semi-synthetic
medium supplemented with 1% (w/v) select sugars and propagated to mid-log phase
(OD600nm~0.6). Cells were harvested by centrifugation (2 minutes, 14,000 rpm) and
immediately cooled on ice. Pellets were resuspended in TRIzol by vortexing and
underwent five cycles of 1 min bead beating and 1 min on ice. Nucleic acids were
purified using three chloroform (Fisher) extractions, and precipitated using isopropanol
(Fisher) and centrifugation for 10 min at 12,000 rpm. The RNA pellet was washed with
70% ethanol (AAPER Alcohol and Chemical Co.), and resuspended into DEPC- (Sigma)
treated water. RNA samples were treated with DNase I according to the manufacturer’s
recommendations (Boehringer Mannheim).
3.3.3 Microarray fabrication
A whole-genome cDNA microarray was used for global gene expression analysis.
The microarray contained triplicate spots of 1,889 cDNA PCR products amplified from
genomic DNA, as described previously (Azcarate-Peril et al., 2004). Purified PCR
amplicons were spotted on GAPPS II aminosilane-coated glass slides (Corning, Acton,
MA), using an Affymetrix 417 Arrayer (Affymetrix, CA), and slides were processed as
described previously (Hedge et al., 2000; Azcarate-Peril et al., 2004).
3.3.4 cDNA target preparation and microarray hybridization
For each hybridization, two total RNA samples (25 µg each) were amino-allyl
labeled by reverse transcription using random hexamers (Invitrogen Life Technologies,
Carlsbad, CA) as primers, in the presence of amino-allyl dUTP (Sigma), by
a SuperScript II reverse transcriptase (Invitrogen Life Technologies, Carlsbad, CA), as
described previously (Hedge et al., 2000; Azcarate-Peril et al., 2004;
http://pga.tigr.org/sop/M004_1a.pdf). Labeled cDNA samples were subsequently coupled
with either Cy3 or Cy5 N-hydroxysuccinimidyl-dyes (Amersham Biosciences Corp.,
Piscataway, NJ), and purified using a PCR purification kit (Qiagen). The resulting
samples were hybridized onto microarray slides and further processed as described
previously (Azcarate-Peril et al., 2004), according to the TIGR protocol (Hedge et al.,
2000; http://pga.tigr.org/sop/M005_1a.pdf). Hybridizations were performed according to
a single round-robin design, so that all possible direct pair-wise comparisons were
conducted. With 8 different sugars, a total of 28 hybridizations were performed (Figure
1). Each treatment was labeled 7 times; alternating treatments were labeled either 4 times
with Cy3 and 3 times with Cy5, or 3 times with Cy3 and 4 times with Cy5.
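The round-robin layout can be enumerated programmatically. The following is a minimal sketch, not the procedure used in this study: it lists every direct pairwise comparison among the eight sugars and assigns dyes with a simple circular rule. That rule is an assumption made here for illustration (the actual dye assignment is the one given in Figure 1); it is chosen only because it labels each treatment 7 times, 4 times with one dye and 3 times with the other. The function name is hypothetical.

```python
from collections import Counter
from itertools import combinations

# The eight carbohydrates compared in this study.
SUGARS = ["glucose", "fructose", "sucrose", "FOS",
          "raffinose", "lactose", "galactose", "trehalose"]


def round_robin(treatments):
    """Return (cy3_sample, cy5_sample) for every direct pairwise comparison.

    Dye assignment is a hypothetical circular rule, not the published layout:
    treatment i takes Cy3 against the treatments that follow it within half a
    turn around the circle, which keeps the two dyes balanced per treatment.
    """
    n = len(treatments)
    design = []
    for i, j in combinations(range(n), 2):  # i < j, every pair exactly once
        step = j - i
        # 2 * step == n is the tie (opposite side of the circle): i takes Cy3.
        i_gets_cy3 = 2 * step <= n
        cy3, cy5 = (i, j) if i_gets_cy3 else (j, i)
        design.append((treatments[cy3], treatments[cy5]))
    return design


design = round_robin(SUGARS)
cy3_counts = Counter(cy3 for cy3, _ in design)
cy5_counts = Counter(cy5 for _, cy5 in design)
print(len(design))  # 28 hybridizations for 8 treatments
```

With 8 treatments the number of hybridizations is 8 × 7 / 2 = 28, and because each slide carries one Cy3 and one Cy5 sample, the 56 labelings split evenly into 28 per dye.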
3.3.5 Microarray data collection and analysis
Microarray images were acquired using a ScanArray 4000 Microarray Scanner
(Packard Biochip Bioscience, MA). Signal fluorescence, including spot and background
intensities, was subsequently quantified and assigned to genomic ORFs using QuantArray
3.0 (Packard BioChip Technologies LLC, Billerica, MA).
Raw data were imported into SAS (SAS Institute Inc., Cary, NC), compiled,
background corrected, log2 transformed, and subjected to a mixed model of analysis of
variance (SAS proc mixed) with two sequential linear models (Wolfinger et al., 2001).
ANOVA mixed models have proven successful at analyzing microarray data (Wolfinger
et al., 2001; Jin et al., 2001; Kerr and Churchill, 2001; Chhabra et al., 2003; Madsen et
al., 2004; Hsieh et al., 2004; Pysz et al., 2004). The first model accomplished
normalization of data with respect to the global effects of array, dye, treatment, and spot.
The normalization model was: $\log_2(y_{ijkl}) = \mu + A_i + D_j + T_k + A_i(S_l) + \varepsilon_{ijkl}$, where $\mu$ is the sample
mean, $A_i$ is the effect of the $i$th array, $D_j$ is the effect of the $j$th dye, $T_k$ is the effect of the
$k$th treatment, $S_l$ is the effect of the $l$th spot (nested within array), and $\varepsilon_{ijkl}$ is the stochastic error. The estimated
effects resulting from this model were used to predict expected intensities for each value,
and residuals were subsequently calculated, as the difference between the observed and
expected intensities. The normalization residuals were subsequently used as input for the
second model, a series of 1,889 gene-specific models, which removed gene-specific
biases and calculated least square mean estimates of treatment effects for each gene under
each treatment. The gene-specific models were: rijkl=µ+Ai+Dj+Tij+Ai(Sij)+εij. The array
and spot effects were treated as random effects, and as such their variance was removed
without performing parameter estimates. Least squares mean estimates were calculated
for fixed effects dye and treatment condition. The resulting difference between least
squares mean estimates for two different treatments is analogous to a log2-transformed ratio of
gene expression between those two treatments.
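As a small worked illustration of the last point, the sketch below converts a difference between two log2-scale least squares means into a fold change. The function name and the example values are hypothetical and are not taken from the data set.

```python
def fold_change(lsm_a, lsm_b):
    """Fold change of treatment A relative to treatment B.

    Both inputs are least squares means on a log2 scale, so their
    difference is a log2 ratio and 2 ** difference is the ratio itself.
    """
    return 2.0 ** (lsm_a - lsm_b)


# Hypothetical least squares means for one gene under two treatments.
print(fold_change(3.0, 1.0))  # a log2 difference of 2 is a 4-fold induction
```

A difference of zero corresponds to a ratio of 1 (no change), and a negative difference corresponds to a ratio below 1 (repression).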
Differences were calculated between all pairs of treatments for each gene and a
measure of statistical significance was obtained from a t-test using these differences and
their associated standard errors. A Bonferroni correction was applied to account for
multiple testing by dividing the desired level of significance (α=0.05) by the total
number of comparisons performed (54,781). Thus, the corrected significance threshold was
$\alpha = 9.12 \times 10^{-7}$, corresponding to -log10(p-value) = 6.04. All p-values that fell
below $\alpha = 9.12 \times 10^{-7}$ were considered statistically significant. Volcano plots of log2-
transformed fold changes (indicating induction ratios) versus -log10-transformed p-values
(indicating statistical significance), and three-way plots (contour plots) of individual
treatment effects were used to visualize contrasts between treatments, and statistical
significance of the results. Global patterns in treatment effects were visualized using
Ward’s method of hierarchical clustering in JMP 5.0 (SAS), using least squares mean
estimates and their standardized counterparts as input.
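The Bonferroni threshold used above can be checked with a few lines of Python. This is only an arithmetic sketch; the constant and function names are illustrative, and the example p-values passed to the helper are hypothetical.

```python
import math

ALPHA = 0.05            # desired family-wise significance level (from the text)
N_COMPARISONS = 54781   # total number of comparisons (from the text)


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


def is_significant(p_value, alpha=ALPHA, n_tests=N_COMPARISONS):
    """True when a p-value falls below the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(ALPHA, N_COMPARISONS)
neg_log10 = -math.log10(threshold)
print(f"threshold = {threshold:.3g}, -log10 = {neg_log10:.2f}")
```

On a volcano plot, the same cutoff appears as a horizontal line at -log10(p-value) = 6.04; points above that line are declared differentially expressed.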
3.3.6 Real-Time Quantitative RT-PCR
Experiments were conducted using a Q-PCR thermal cycler (I-cycler, BIORAD),
in combination with the QuantiTect SYBR Green PCR kit (Qiagen). PCR primers were
designed according to specifications recommended by the manufacturer (Tm, length,
base content). Six carbohydrate samples were included, namely glucose, fructose,
sucrose, FOS, lactose and galactose. Each set of samples was analyzed in triplicate. The
RNA samples used in Q-PCR experiments were identical to those used in microarray
experiments.
3.4 Results
3.4.1 Differentially expressed genes
Global gene expression patterns obtained from growth on eight different
carbohydrates were visualized by cluster analysis (Eisen et al., 1998) using Ward’s
hierarchical clustering method (Figures 2 and 3), volcano plots (Figure 4) and contour
plots (Figure 5). Overall, between 23 and 379 genes were differentially expressed
between paired treatment conditions (with p-values below the Bonferroni correction),
representing between 1% and 20% of the genome, respectively (Figure 6). Although 342
genes (18% of the genome) showed induction levels above two fold, only 63 genes (3%
of the genome) showed induction above 4 fold (Figure 7), indicating a relatively small
number of genes were highly induced. Although overall expression levels of the majority
of the genes remained consistent regardless of the growth substrate (80% of the genome),
select clusters showed differential transcription of genes and operons (Figures 2 and 3).
Nevertheless, for each sugar, a limited number of genes showed specific induction.
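The genome percentages quoted above follow from simple division. The sketch below assumes the denominator is the 1,889 genes modeled in the microarray analysis; the function name is illustrative.

```python
GENOME_SIZE = 1889  # number of genes modeled (assumed denominator)


def percent_of_genome(n_genes, genome_size=GENOME_SIZE):
    """Share of the genome represented by n_genes, in percent."""
    return 100.0 * n_genes / genome_size


# Gene counts reported in the text.
for n in (23, 379, 342, 63):
    print(n, "genes =", round(percent_of_genome(n), 1), "% of the genome")
```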
In the presence of glucose, ORFs La1679 and La1680 (Figure 3) were highly
induced when compared to other monosaccharides (fructose, galactose) and di-
saccharides (sucrose, lactose, trehalose). The induction levels compared to other sugars
varied between 3.5 and 6.3 for La1679 and between 3.7 and 4.7 for La1680. La1679
encodes an ABC nucleotide binding protein, including commonly found nucleotide
binding domain motifs, namely WalkerA, WalkerB, ABC signature sequence and Linton
and Higgins motif. La1680 encodes an ABC permease, with 10 predicted membrane
spanning domains. No solute binding protein is encoded in their vicinity, suggesting a
possible role as an exporter rather than an importer. Several genes and operons were
specifically repressed by glucose (see Figures 2 and 3), including ORFs La680-686,
which are involved in glycogen metabolism. Since glycogen is metabolized by the cell in
order to store energy, energy storage is not necessary in the presence of a preferred
carbon source such as glucose. Other genes repressed in the presence of glucose
included proteins involved in uptake of alternative carbohydrate sources, and enzymes
involved in hydrolysis of such carbohydrates.
The three genes of the putative fructose locus, La1777 (FruA, fructose PTS
transporter EIIABCFru), La1778 (FruK, phosphofructokinase EC 2.7.1.56) and La1779
(FruR, transcription regulator) were differentially expressed (Figure 3). Induction levels
were up to 3.9, 4.3 and 4.6 for fruA, fruK and fruR, respectively. These results suggest
fructose is transported into the cell via a PTS transporter and phosphorylated to fructose-6-phosphate,
which the phosphofructokinase FruK phosphorylates into fructose-1,6-bisphosphate, a
glycolysis intermediate.
In the presence of sucrose, the three genes of the sucrose locus were differentially
expressed (Figure 3), namely La399 (ScrR, transcription regulator), La400 (ScrB,
sucrose-6-phosphate hydrolase EC 3.2.1.26), and La401 (ScrA, sucrose PTS transporter
EIIBCASuc). When compared to glucose, induction levels were up to 3.1, 2.8 and 17.2 for
scrR, scrB and scrA, respectively. La401 in particular showed high induction levels,
between 8.0 and 17.2 when compared to mono- and di-saccharides. These results indicate
that sucrose is transported into the cell via a PTS transporter and phosphorylated to sucrose-6-phosphate,
which is subsequently hydrolyzed into glucose-6-phosphate and fructose by ScrB.
The six genes of the FOS operon were differentially expressed (Figures 3, 4, 5),
namely La502, La503, La504, La506 (MsmEFGK ABC transporter), La505 (BfrA, β-
fructosidase EC 3.2.1.26) and La507 (GtfA, sucrose phosphorylase EC 2.4.1.7).
Induction levels varied between 15.1 and 40.6 when compared to mono- and di-
saccharides, and between 5.5 and 8.9 when compared to raffinose. These results suggest
FOS is transported into the cell via an ABC transporter and subsequently hydrolyzed into
fructose and sucrose by the fructosidase. Sucrose is likely subsequently hydrolyzed into
fructose and glucose-1-P by the sucrose phosphorylase. In addition to the FOS operon,
FOS also induced the fructose operon, the sucrose PTS transporter, the trehalose operon
and an ABC transporter (La1679-La1680).
In the presence of raffinose, the six genes of the raffinose operon were
specifically induced (Figures 3, 4, 5). The raffinose locus consists of La1442, La1441,
La1440, La1439 (MsmEFGK2 ABC transporter), La1438 (MelA α-galactosidase EC
3.2.1.22), and La1437 (GtfA2, sucrose phosphorylase EC 2.4.1.7). Induction levels varied
between 15.1 and 45.6, when compared to all other conditions. Additionally, La1433-4
(di-hydroxyacetone kinase EC 2.7.1.29), and La1436 (glycerol uptake facilitator) were
induced between 1.9 and 24.7 fold when compared to other conditions.
In the presence of lactose and galactose, ten genes distributed in two loci were
differentially expressed, namely La1463 (LacS permease of the GPH translocator
family), La1462 (LacZ, β-galactosidase EC 3.2.1.23), La1461 (conserved hypothetical
protein), La1460 (surface protein), La1459 (GalK, galactokinase EC 2.7.1.6), La1458
(GalT, galactose-1 phosphate uridylyl transferase EC 2.7.7.10), La1457 (GalM, galactose
epimerase EC 5.1.3.3), La1467-8 (LacLM, β-galactosidase EC 3.2.1.23 large and small
subunits), and La1469 (GalE, UDP-glucose epimerase EC 5.1.3.2). LacS is similar to
GPH permeases previously identified in lactic acid bacteria. Although LacS contains an
EIIA domain at the carboxy-terminus, it is not a PTS transporter. Also, LacS includes a His at
position 553, which might be involved in interaction with HPr, as shown in S. salivarius
(Lessard et al., 2003). In the presence of lactose and galactose, galKTM were induced
between 3.7 and 17.6 fold; lacSZ were induced between 2.8 and 17.6 fold; lacL and galE
were induced between 2.7 and 29.5, when compared to other carbohydrates not
containing galactose, i.e. glucose, fructose, sucrose, trehalose and FOS. These results
suggest lactose is transported into the cell via the LacS permease of the galactoside-
pentose-hexuronide translocator family. Inside the cell, lactose is hydrolyzed into glucose
and galactose by LacZ. Galactose is then phosphorylated by GalK into galactose-1
phosphate, further transformed into UDP-galactose by GalT. UDP-galactose is
subsequently epimerized to UDP-glucose by GalE. UDP-glucose is likely turned into
glucose-1P by La1719, which encodes a UDP-glucose pyrophosphorylase EC 2.7.7.9 that is
consistently highly expressed. Finally, the phosphoglucomutase EC 5.4.2.2 likely acts on
glucose-1P to yield glucose-6P, a glycolysis substrate.
The three genes of the putative trehalose locus were also differentially expressed
(Figures 3 and 5). The trehalose locus consists of La1012 (encoding the TreB trehalose
PTS transporter EIIABCTre EC 2.7.1.69), La1013 (TreR, trehalose regulator) and La1014
(TreC, trehalose-6 phosphate hydrolase EC 3.2.1.93). Induction levels were between 4.3
and 18.6 for treB, between 2.3 and 7.3 for treR, and between 2.7 and 18.5 for treC, when
compared to glucose, sucrose, raffinose and galactose. These results suggest trehalose is
transported into the cell via a PTS transporter, phosphorylated to trehalose-6 phosphate
and hydrolyzed into glucose and glucose-6 phosphate by TreC.
In addition, genes showing differential expression included hypothetical genes
La457, La466, La1006, La1008, La1010, La1011, La1206; sugar- and energy-related genes
La874 (beta galactosidase EC 3.2.1.86), La910 (L-LDH EC 1.1.1.27), La1007 (pyridoxal
kinase EC 2.7.1.35), La1812 (alpha glucosidase EC 3.2.1.3), La1632 (aldehyde
dehydrogenase EC 1.2.1.16), La1401 (NADH peroxidase EC 1.11.1.1), La1974
(pyruvate oxidase EC 1.2.3.3); adherence genes La555, La649, La1019; aminopeptidase
La911, La1086; amino-acid permease, La1102 (membrane protein), La1783 (ABC
transporter), La1879 (pyrimidine kinase EC 2.7.4.7).
3.4.2 Real-Time Quantitative RT-PCR
Five genes that were differentially expressed in microarray experiments were
selected for real-time quantitative RT-PCR experiments, in order to validate induction
levels measured by microarrays. These genes were selected for both their broad
expression range (LSM between -1.52 and +3.87), and induction levels between sugars
(fold induction up to 34). All selected genes showed an induction level above 6 fold in at
least one instance. Also, the annotations of the selected genes were correlated
functionally with carbohydrate utilization. The five selected genes were: beta-
fructosidase (La505), trehalose PTS (La1012), glycerol uptake facilitator (La1436), beta-
galactosidase (La1467), and ABC transporter (La1679).
The induction levels measured by microarrays were plotted against induction
levels measured by Q-PCR, for the five selected genes, in order to validate microarray
data (Figure 8). Individual R-square values ranged between 0.642 and 0.883 for each of
the tested genes (between 0.652 and 0.978 using data in a log2 scale). When the data were
combined, the global R-square value was 0.78 (0.88 using data in a log2 scale). A
correlation analysis was run in SAS (Cary, NC), and showed a correlation between the
two methods with P-values less than 0.001, for Spearman, Hoeffding and Kendall tests.
Additionally, a regression analysis was run in Excel (Microsoft, Redmond, WA), and showed a
statistically highly significant ($p < 1.02 \times 10^{-25}$) correlation between microarray data and
Q-PCR results. Nevertheless, Q-PCR measurements revealed larger induction levels,
which is likely due to the smaller dynamic range of the microarray scanner, compared to
that of the Q-PCR cycler. Similar results have been reported previously (Wagner et al.,
2003).
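The comparison between the two methods can be sketched as a squared Pearson correlation computed on both the linear and the log2 scale. The fold-induction values below are hypothetical and chosen only to mimic the pattern described above, where Q-PCR reports larger inductions than the microarray; they are not measurements from this study, and the function names are illustrative.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


def log2_all(values):
    """Log2-transform every value in a sequence."""
    return [math.log2(v) for v in values]


# Hypothetical fold inductions for the same gene/sugar pairs.
array_fold = [1.0, 2.0, 4.0, 8.0, 16.0]
qpcr_fold = [1.0, 3.0, 9.0, 27.0, 81.0]

print("linear scale R2:", r_squared(array_fold, qpcr_fold))
print("log2 scale R2:  ", r_squared(log2_all(array_fold), log2_all(qpcr_fold)))
```

When one method compresses the dynamic range relative to the other, the relationship is closer to linear after log transformation, which is one plausible reason the R-square values reported on the log2 scale are higher.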
3.5 Discussion
Comparative analyses of global transcription profiles determined for growth on
eight carbohydrates identified the basis for carbohydrate transport and catabolism in L.
acidophilus. Specifically, three different types of carbohydrate transporters were
differentially expressed, namely phosphoenolpyruvate: sugar phosphotransferase system
(PTS), ATP binding cassette (ABC) and galactoside-pentose-hexuronide (GPH)
translocator, illustrating the diversity of carbohydrate transporters used by L. acidophilus.
Transcription profiles suggested that galactosides were transported by a GPH
translocator, while mono- and di- saccharides were transported by members of the PTS,
and polysaccharides were transported by members of the ABC family.
Microarray results indicated fructose, sucrose and trehalose are transported by
PTS transporters EIIABCFru (La1777), EIIBCASuc (La401) and EIIABCTre (La1012),
respectively. Those genes are encoded on typical PTS loci (Figure 9), along with
regulators and enzymes that have been well characterized in other organisms. In contrast,
FOS and raffinose are transported by ABC transporters of the MsmEFGK family, La502-
505 and La1437-1442, respectively. In the case of trehalose and FOS, microarray results
correlate well with functional studies in which targeted knock out of carbohydrate
transporters and hydrolases modified the saccharolytic potential of L. acidophilus NCFM
(Barrangou et al., 2003; Duong et al., 2004). Differential expression of the EIIABCTre is
consistent with recent work in L. acidophilus indicating La1012 is involved in trehalose
uptake (Duong et al., 2004). Similarly, differential expression of the fos operon is
consistent with previous work in L. acidophilus indicating those genes are involved in
uptake and catabolism of FOS, induced in the presence of FOS, and repressed in the
presence of glucose (Barrangou et al., 2003). Additionally, induction of the raffinose msm
locus is consistent with previous work in Streptococcus mutans (Russell et al., 1992) and
Streptococcus pneumoniae (Rosenow et al., 1999).
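The transporter assignments discussed above can be summarized as a small lookup table. The sketch below is only a convenience structure built from the loci named in this chapter; the dictionary layout and function name are illustrative, and glucose is omitted because its uptake is discussed separately.

```python
# Sugar -> (transporter family, transporter locus or loci named in the text).
TRANSPORTERS = {
    "fructose": ("PTS", "La1777"),
    "sucrose": ("PTS", "La401"),
    "trehalose": ("PTS", "La1012"),
    "FOS": ("ABC", "La502, La503, La504, La506"),
    "raffinose": ("ABC", "La1439, La1440, La1441, La1442"),
    "lactose": ("GPH", "La1463"),
    "galactose": ("GPH", "La1463"),
}


def sugars_by_family(family):
    """List the sugars whose uptake is attributed to a transporter family."""
    return [sugar for sugar, (fam, _) in TRANSPORTERS.items() if fam == family]


for family in ("PTS", "ABC", "GPH"):
    print(family, "->", ", ".join(sugars_by_family(family)))
```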
A number of lactic acid bacteria take up glucose via a PTS transporter. The EIIMan
PTS transporter has the ability to import both mannose and glucose (Cochu et al., 2003).
The L. acidophilus mannose PTS system is similar to that of Streptococcus thermophilus,
with proteins sharing 53-65% identity and 72-79% similarity. Specifically, the EIIMan is
composed of three proteins IIABMan IICMan IIDMan encoded by La452 (manL), La455
(manM) and La456 (manN), respectively (Figure 9). Most of the carbohydrates examined
here specifically induced genes involved in their own transport and hydrolysis, but
glucose did not. Analysis of the mannose PTS revealed that the genes encoding the
EIIABCDMan were consistently highly expressed, regardless of the carbohydrate source
(Figure 3A). This expression profile suggests glucose is a preferred carbohydrate, and L.
acidophilus is also designed for efficient utilization of different carbohydrate sources, as
was suggested previously for L. plantarum (Kleerebezem et al., 2003).
The genes differentially expressed in the presence of galactose and lactose
included a permease (LacS), and the enzymatic machinery of the Leloir pathway.
Members of the LacS subfamily of galactoside-pentose-hexuronide (GPH) translocators
have been described in a variety of lactic acid bacteria, including Leuconostoc lactis
(Vaughan et al., 1996), S. thermophilus (van den Bogaard et al., 2000), Streptococcus
salivarius (Lessard et al., 2003) and Lactobacillus delbrueckii (Lapierre et al., 2002).
Although LacS contains a PTS EIIA at the carboxy terminus, it is not a member of the
PTS family of transporters. LacS has been reported to have the ability to import both
galactose and lactose in select organisms (Vaughan et al., 1996; van den Bogaard et al.,
2000). Although the combination of a LacS lactose permease with two β-galactosidase
subunits LacL and LacM has been described in L. plantarum (Kleerebezem et al., 2003)
and Leuconostoc lactis (Vaughan et al., 1996), it has never been reported in L.
acidophilus. Even though constitutive expression of lacS and lacLM has been reported
previously (Vaughan et al., 1996), our current results indicate specific induction of the
genes involved in uptake and catabolism of both galactose and lactose. Operon
organization for galactoside utilization is variable and unstable among Gram-positive
bacteria (Lapierre et al., 2002; Vaillancourt et al., 2002; Boucher et al., 2003; Fortina et
al., 2003; Grossiord et al., 2003). Interestingly, even amongst closely related
Lactobacillus species, namely L. johnsonii, L. gasseri and L. acidophilus, the lactose-
galactose locus is not well conserved (Pridmore et al., 2004) (Figure 10). Perhaps the
presence of mobile elements in the vicinity of those genes is responsible for the
instability of this locus (Altermann et al., 2004).
Although it was previously suggested that the phosphoenolpyruvate:
phosphotransferase system is the primary sugar transport system of Gram-positive
bacteria (Ajdic et al., 2002; Warner and Lolkema, 2003), current microarray data indicate
that ABC transport systems are also important. While PTS transporters are involved in
uptake of mono- and di-saccharides, those carbohydrates are digested in the upper GIT.
In contrast, oligosaccharides reach the lower intestine whereby commensals are likely to
compete for more complex and scarce nutrients. Perhaps under such conditions ABC
transporters are even more crucial than the PTS, given their apparent roles in transport of
oligosaccharides like FOS and raffinose. In this regard, the ability to utilize nutrients that
are non-digestible by the host has been associated with competitiveness and
persistence of beneficial intestinal flora in the colon (Schell et al., 2002).
Transcription profiles of genes differentially expressed in conditions tested
indicated that all carbohydrate uptake systems and their respective sugar hydrolases were
specifically induced by their substrate, except for glucose. Moreover, genes within those
inducible loci were repressed in the presence of glucose, and cre sequences were
identified in their promoter-operator regions (Figure 11). Together, these results indicate
regulation of carbohydrate uptake and metabolism at the transcription level, and implicate
the involvement of a global regulatory system compatible with carbon catabolite
repression. Carbon catabolite repression (CCR) controls transcription of proteins
involved in transport and catabolism of carbohydrates (Miwa et al., 2000). Catabolite
repression is a mechanism widely distributed amongst Gram-positive bacteria, mediated
in cis by catabolite responsive elements (Miwa et al., 2000; Weickert and Chambliss,
1990), and in trans by repressors of the LacI family, which are responsible for
transcriptional repression of genes encoding unnecessary saccharolytic components in the
presence of preferred substrates (Weickert and Chambliss, 1990; Viana et al., 2000;
Muscariello et al., 2001; Warner and Lolkema, 2003). This regulatory mechanism allows
cells to coordinate the utilization of diverse carbohydrates, to focus primarily on
preferred energy sources. CCR is based upon several key enzymes, namely HPr (La639,
ptsH), EI (La640, ptsI), CcpA (La431, ccpA), and HPrK/P (La676, ptsK), all of which are
encoded within the L. acidophilus chromosome.
Carbon catabolite repression has already been described in lactobacilli (Mahr et
al., 2000). The PTS is characterized by a phosphate transfer cascade involving PEP, EI,
HPr, EIIABC, whereby a phosphate is ultimately transferred to the carbohydrate substrate
(Saier, 2000; Warner and Lolkema, 2003). HPr is an important component of CCR,
which is regulated via phosphorylation by enzyme I and HPrK/P. When HPr is
phosphorylated at His15, the PTS is active, and carbohydrates transported via the PTS are
phosphorylated via EIIABCs. In contrast, when HPr is phosphorylated at Ser46, the PTS
machinery is not functional (Mijakovic et al., 2002).
Although the phosphorylation cascade suggests regulation at the protein level,
several studies report transcriptional modulation of ccpA and ptsHI. In S. thermophilus,
CcpA production is induced by glucose (van den Bogaard et al., 2000). In several
bacteria, the carbohydrate source modulates ptsHI transcription levels (Luesink et al.,
1999). In contrast, expression levels of ccpA, ptsH, ptsI and ptsK did not vary in the
presence of different carbohydrates in L. acidophilus. These results are consistent with
regulation via phosphorylation at the protein level. Similar results have been reported for
ccpA expression levels in Lactobacillus pentosus (Mahr et al., 2000), and ptsHI
transcription in S. thermophilus (Cochu et al., 2003).
Globally, microarray results allowed reconstruction of carbohydrate transport and
catabolism pathways (Figure 12). Although transcription of carbohydrate transporters and
hydrolases was specifically induced by their respective substrates, glycolysis genes were
consistently highly expressed (Figure 13). Orchestrated carbohydrate uptake likely
withdraws energy sources from the intestinal environment and deprives other bacteria of
access to such resources. Consequently, L. acidophilus may compete well against other
commensals for nutrients.
In summary, a variety of carbohydrate uptake systems were identified and
characterized, with respect to expression profiles in the presence of different
carbohydrates, including PTS, ABC and GPH transporters. The uptake and catabolic
machinery is highly regulated at the transcription level, suggesting the L. acidophilus
transcriptome is flexible, dynamic and designed for efficient carbohydrate utilization.
Differential gene expression indicated the presence of a global carbon catabolite
repression regulatory network. Regulatory proteins were consistently highly expressed,
suggesting regulation at the protein level, rather than the transcriptional level.
Collectively, L. acidophilus appears to be able to efficiently adapt its metabolic
machinery to fluctuating carbohydrate sources available in the nutritionally complex
environment of the small intestine. In particular, ABC transporters of the MsmEFGK
family involved in uptake of FOS and raffinose likely play an important role in the ability
of L. acidophilus to compete with intestinal commensals for complex sugars that are not
digested by the human host. Ultimately, this information provides new insights into how
undigested dietary compounds influence the intestinal microbial balance. This study is a
model for comparative transcriptional analysis of a bacterium exposed to varying growth
substrates.
3.6 References
Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A. & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439
Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B., McAuliffe, O., Souther, N., Dobson, A., Duong, T., Callanan, M., Lick, S., Hamrick, A., Cano, R., & Klaenhammer, T. R. (2004) J. Bacteriol. In review
Azcarate-Peril et al., 2004 In review
Barrangou R, Altermann E, Hutkins R, Cano, & Klaenhammer, TR. (2003) Proc. Natl. Acad. Sci. USA 100, 8957-8962
Benno, Y., Endo, K., Shiragami, N., Sayama, K., and Mitsuoka, T. (1987) Bifido. Micro. 6, 59-63
Bogaard, van den P. T. C., Kleerebezem, M., Kuipers, O. P., & De Vos, W. M. (2000) J. Bacteriol. 182, 5982-9
Boucher, I., Vadeboncoeur, C., & Moineau, S. (2003) Appl. Environ. Microbiol. 69, 4149-56
Chhabra, S. R., Shockley, K. R., Conners, S. B., Scott, K. L., Wolfinger, R. D., & Kelly, R. M. (2003) J. Biol. Chem. 278, 7540-7552
Cochu, A., Vadeboncoeur, C., Moineau, S, & Frenette, M. (2003) Appl. Environ. Microbiol. 69, 5423-32
Duong, T., Barrangou, R., Russell, M. W., & Klaenhammer, T. R. (2004) In review
Eisen, M. B., Spellman, P. T., Brown, P. O., & Botstein, D. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-8
Fortina, M. G., Ricci, G., Mora, D., Guglielmetti, S., & Manachini, P. L. (2003) Appl. Environ. Microbiol. 69, 3238-43
Gibson, G. R., Beatty, E. R., Wan, X., & Cummings, J. H. (1995) Gastroent. 108, 975-82
Gibson, G. R. & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412
Greene, J. D., & Klaenhammer, T. R. (1994) Appl. Environ. Microbiol. 60, 4487-4494
Grossiord, B. P., Luesink, E. J., Vaughan, E. E., Arnaud, A., & De Vos, W. M. (2003) J. Bacteriol. 185, 870-8
Hedge, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Hughes, J. E., Snesrud, E., Lee, N., & Quackenbush J. (2000) Biotechniques 29, 548-562
Helden, van J., Andre, B., & Collado-Vides, J. (2000) Yeast 16, 177-87
Hsieh, W. P., Chu, T. M., Wolfinger, R. D., & Gibson, G. (2003) Genetics 165, 747-57
Jin, W., Riley, R. M., Wolfinger, R. D., White, K. P., Passador-Gurgel, G., & Gibson, G. (2001) Nature Genet. 29, 389-395
Kerr, M. K., and Churchill G. A. (2001) Genet. Res. Camb. 77, 123-8
Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M. & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-5
Lapierre, L., Mollet, B., & Germond, J. E. (2002) J. Bacteriol. 184, 928-35
Lessard, C., Cochu, A., Lemay, J. D., Roy, D., Vaillancourt, K., Frenette, M., Moineau, S., & Vadeboncoeur, C. (2003) J. Bacteriol. 185, 6764-72
Luesink, E. J., Marugg, J. D., Kuipers, O. P. & de Vos, W. M. (1999) J. Bacteriol. 181, 764-71
Madsen, S. A., Chang, L. C., Hickey, M. C., Rosa, G. J. M., Coussens, P. M., & Burton, J. L. (2004) Physiol. Genomics 16, 212-21
Mahr, K., Hillen, W., & Titgemeyer, F. (2000) Appl. Environ. Microbiol. 66, 277-83
Mijakovic, I., Poncet, S., Galinier, A., Monedero, V., Fieulaine, S., Janin, J., Nessler, S., Marquez, J. A., Scheffzek, K., Hasenbein, S., Hengstenberg, W., & Deutscher, J. (2002) Proc. Natl. Acad. Sci. USA 99, 13442-7
Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M. & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-10
Muscariello, L., Marasco, R., De Felice M., & Sacco, M. (2001) Appl. Environ. Microbiol. 67, 2903-7
Pridmore RD, Berger B, Desiere F, Vilanova D, Barretto C, Pittet AC, Zwahlen MC, Rouvet M, Altermann E, Barrangou R, Mollet B, Mercenier A, Klaenhammer TR, Arigoni F, & Schell MA. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517
Pysz, M. A., Ward, D. E., Shockley, K. R., Montero, C. I., Conners, S. B., Johnson, M. R., & Kelly, R. M. (2004) Extremophiles 8, 209-17
Reid, G. (1999) Appl. Environ. Microbiol. 65, 3763-6
Reid, G., Sanders, M. E., Gaskins, H. R., Gibson, G. R., Mercenier, A., Rastall, R., Roberfroid, M., Rowland, I., Cherbut, C., & Klaenhammer T. R. (2003) J. Clin. Gastroenterol. 37, 105-118
Rosenow, C., Maniar, M., & Trias, J. (1999) Genome Res. 9, 1189-97
Russell, R. R. B., Aduseopoku, J., Sutcliffe, I. C., Tao, L. & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637
Saier, M. H. Jr. (2000) Mol. Microbiol. 35, 699-710
Sanders, M. E., & Klaenhammer, T. R. (2001) J. Dairy. Sci. 84, 319-331
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D. & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427
Sui, J., Leighton, S., Busta, F., & Brady, L. (2002) J. Appl. Microbiol. 92, 907-12
Tannock, G. W. (1999) Antonie van Leeuwenhoek 76, 265-78
Vaillancourt, K., Moineau, S., Frenette, M., Lessard, C., & Vadeboncoeur, C. (2002) J. Bacteriol. 184, 785-93
Varcoe, J. J., Krejcarek, G., Busta, F., & Brady, L. (2003) J. Food Prot. 66, 457-465
Vaughan, E. E., David, S., & De Vos W. M. (1996) Appl. Environ. Microbiol. 62, 1574-82
Viana, R., Monedero, V., Dossonet, V., Vadeboncoeur, C., Perez-Martinez, G., & Deutscher, J. (2000) Mol. Microbiol. 36, 570-584
Wagner, V. E., Bushnell, D., Passador, L., Brooks, A. I., & Iglewski, H. I. (2003) J. Bac. 185, 2080-95
Warner, J. B., & Lolkema, J. S. (2003) Microbiol. Mol. Rev. 67, 475-90
Weickert, M. J. & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-42
Wolfinger, R. D., Gibson, G., Wolfinger, E. D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., & Paules, R. S. (2001) J. Comput. Biol. 8, 625-637
[Figure 1 image: octagonal diagram with vertices Glc, Fru, Tre, Suc, Gal, FOS, Lac and Raff; the connecting arrows are numbered 1 to 28, one per hybridization.]
Figure 1. Round-robin microarray hybridization design. Each carbohydrate is at a vertex of an octagon. Glc, glucose; Fru, fructose; Suc, sucrose; FOS, fructooligosaccharides; Raff, raffinose; Lac, lactose; Gal, galactose; Tre, trehalose. Each arrow represents a hybridization whereby the plain end of the arrow indicates labeling with Cy3, and the tip of the arrow indicates labeling with Cy5. This design allows direct comparison of all possible pairs of treatments.
Figure 2. Hierarchical clustering analyses of gene expression patterns. The expression of 1,889 genes (vertically) after growth on eight carbohydrates (horizontally) is shown colorimetrically. (A) Least squares means, representing overall gene expression level corrected for systematic and random errors (see Methods): low=blue, high=red. Hierarchical clustering of least squares means allows visualization of the relative expression levels of all genes within each treatment (Figure 2A). (B) Standardized least squares means, representing gene expression level standardized across all 8 treatments, with color indicating expression level relative to the mean expression level across all treatments: low=green, high=red. Clustering of standardized least squares means allows comparison of the standardized expression profile of every gene, across all treatments (Figure 2B). FOS, fructooligosaccharides; FRU, fructose; GAL, galactose; GLC, glucose; LAC, lactose; RAF, raffinose; SUC, sucrose; TRE, trehalose.
Figure 3. Hierarchical clustering analysis of expression patterns for select genes and operons. (A) Least squares means of selected genes and operons of interest, representing overall gene expression within treatments: low=blue, high=red; (B) Standardized least squares means of genes of interest, indicating relative expression level across all treatments: low=green, high=red. Carbohydrate sources are displayed at the bottom: FOS, fructooligosaccharides; FRU, fructose; GAL, galactose; GLC, glucose; LAC, lactose; RAF, raffinose; SUC, sucrose; TRE, trehalose.
[Figure 4 image: volcano plot. X axis: fold change FOS/RAFF (-64 to 64, log2 scale); Y axis: significance (-log10 P-value, 0 to 50). Labeled genes: 1437, 1438, 1439, 1440, 1441, 1442, 502, 503, 504, 505, 506, 507, 1012, 1014.]
Figure 4. Volcano plot comparison of gene expression between FOS and raffinose. Visualization of the global differential gene expression profiles in the presence of raffinose and FOS. The X axis indicates the differential expression profiles, plotting the fold-induction ratios in a logarithmic-2 scale. The Y axis indicates the statistical significance, plotting the statistical significance of the difference in expression (P-value from a t-test) in a logarithmic-10 scale. Genes within the raffinose msm locus are shown in green, genes within the FOS msm locus are shown in blue, and two genes within the trehalose tre locus are shown in red.
[Figure 5 plot. X axis: Lsm FOS (-3.0 to 5.0). Y axis: Lsm RAFFINOSE (-3.0 to 4.0). Color scale: Lsm TREHALOSE, in bins <= -2, <= -1, <= 0, <= 1, <= 2, <= 3, > 3.]
Figure 5. Contour plot comparison of gene expression between FOS, raffinose and trehalose. Three-way plot of the least squares means of all the genes in the presence of FOS (X axis), raffinose (Y axis), trehalose (Z axis, color coded). In the third dimension (Z axis) the gene expression level is coded colorimetrically: blue=low gene expression, red=high gene expression. Each color in-between is representative of a value range. Differentially expressed operons are annotated: 1437-1442 raffinose msm operon, 502-507 FOS msm operon, 1012, 1014 trehalose tre locus.
[Figure 6 plot. X axis: Treatment Comparison, in plotted order: Lac-Raf, Raf-Suc, Raf-Fru, Fos-Raf, Raf-Glu, Tre-Raf, Lac-Gal, Gal-Fos, Gal-Raf, Gal-Suc, Gal-Tre, Gal-Fru, Fos-Glu, Tre-Fru, Lac-Fos, Lac-Fru, Lac-Glu, Glu-Fru, Gal-Glu, Tre-Fos, Tre-Suc, Tre-Glu, Tre-Lac, Suc-Glu, Lac-Suc, Fru-Fos, Suc-Fos, Fru-Suc. Y axis: Number of genes differentially expressed (0 to 400).]
Figure 6. Global differential gene expression. Quantification of the number of genes declared differentially expressed by statistical criteria. For all 28 possible treatment comparisons, genes with p-values from a t-test below the Bonferroni correction (-log10(p-value) > 6.04) were considered differentially expressed. For each comparison, the number of genes statistically differentially expressed is plotted, in decreasing order.
[Figure 7 plot. X axis: Minimum Fold Induction (1 to 11). Y axis: Number of genes (0 to 400).]
Figure 7. Gene fold induction. Quantification of the number of genes differentially expressed above various fold induction cut offs. All possible treatment comparisons were considered, and a gene was considered induced above a particular level if it showed induction in at least one treatment comparison. For genes that showed induction in more than one instance, the highest induction level was selected.
[Figure 8 plot. X axis: fold induction microarrays (1 to 64). Y axis: fold induction Q-PCR (1 to 256). Genes: La1467, La505, La1436, La1012, La1679.]
Figure 8. RT-Q-PCR analysis of differentially expressed genes. For five selected genes, induction levels were compared between six different treatments, resulting in 15 induction levels for each gene. The comparison between the fold induction determined by microarrays (X axis) and real-time quantitative RT-PCR (Y axis) is plotted, on a logarithmic-2 scale. Induction levels for each gene are color-coded.
[Figure 9 diagram. Locus labels: Man, Fru, Suc, Fos, Raff, Lac, Lac, Tre, CCR. Gene labels: manL, manM, manN; fruR, fruK, fruA; scrR, scrA, scrB; msmE, msmF, msmG, bfrA, msmK, gtfA; msmE2, msmF2, msmG2, msmK2, melA, gtfA2; galK, galT, galM, lacS, lacZ, hypo, muB; lacL, lacM, galE; treC, treB, treR; ptsH, ptsI, ptsK; pepQ, ccpA; msmR; msmR2.]
Figure 9. Genetic loci of interest. The layouts of the loci discussed in the text are shown: man, glucose-mannose locus; fru, fructose locus; suc, sucrose locus; fos, FOS locus; raff, raffinose locus; Lac, lactose-galactose loci; tre, trehalose locus; CCR, carbon catabolite loci.
[Figure 10 diagram. Gene labels as printed for each organism:
L. johnsonii: reg, lacS, bgaB, galK, galT, galM, galE, lacM, lacL
L. gasseri: reg, galK, galT, galM, galE
L. acidophilus: galK, galT, galM, lacS, lacZ, hypo, muB, reg, Tn, galE, lacM, lacL
Legend: lacS, lactose-proton symporter; lacLM, beta-galactosidase; galE, galactose epimerase; galK, galactokinase; galT, galactose-1P uridyl transferase.]
Figure 10. Lactose locus in select lactobacilli. Layout of the lactose loci in Lactobacillus gasseri, Lactobacillus johnsonii and Lactobacillus acidophilus.
La400   cre1  TGataaaCGtttgaCA   -72 bp
        cre2  AGataaCGcttaCA     -17 bp
La401   cre1  TGaataCGttatCA     -48 bp
        cre2  TAaaagCGtttaCA     -17 bp
La452   cre1  TAaaagCGgattCA     -27 bp
La502   cre1  TGaaagCGatatTA    -172 bp
        cre2  TGaaaaCGtttcCA    -140 bp
        cre3  TAgaaaCGtttcAA     -78 bp
        cre4  TTcaaaCGtttcAA     -14 bp
La680   cre1  AGtaagCGctttCC     -40 bp
La1012  cre1  TGtgatCGctttCA     -82 bp
        cre2  TGaaaaCGctttAT     -15 bp
La1013  cre1  ATaaagCGttttCA    -155 bp
        cre2  TGaaagCGatcaCA     -88 bp
La1442  cre1  AGaataCGcaatAA     -69 bp
        cre2  TGaaagCGcttaAA     -38 bp
La1459  cre1  TGaaaaCGattaCA     -27 bp
La1460  cre1  GAtggaCGaataTA     -22 bp
La1461  cre1  AGgtatCGtcatCT    -103 bp
La1463  cre1  AAaattCGtcttCT     -36 bp
La1465  cre1  AAtaaaCGtaagTA     -27 bp
La1467  cre1  TAaaagCGttttCA     -32 bp
La1469  cre1  TGtaatCGatttCA     -21 bp
La1684  cre1  AGttttCGgacaAC     -61 bp
        cre2  AGaaatCGcttaCA     -25 bp
Figure 11. Catabolite responsive element sequences. Putative catabolite responsive elements are highlighted in the promoter regions of select differentially expressed genes. Numbers indicate the position of the last cre nucleotide relative to the translational start of the ORF mentioned. The promoter-operator regions of differentially expressed genes and operons were searched for putative catabolite response elements according to the consensus sequences TGNNWNCGNNWNCA (Miwa et al., 2000) and TGWAANCGNTNWCA (Weickert and Chambliss, 1990).
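A search against the two consensus sequences named in this caption can be sketched as follows. This is a minimal illustration, not the procedure used for the figure: the function names are hypothetical, only the IUPAC codes W (A or T) and N (any base) are handled, and only exact matches are reported. Several of the listed sites differ from the consensus at one or more positions, so the original search presumably tolerated mismatches, which this sketch does not attempt.

```python
import re

# IUPAC codes needed for the two consensus sequences (assumption: no other
# ambiguity codes occur in the patterns being searched).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that also finds overlapping sites."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # A lookahead with a capture group lets finditer report overlapping matches.
    return re.compile("(?=(" + body + "))")


def find_cre(sequence, consensus):
    """Return (0-based start, matched site) for every exact consensus match."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # The La1459 cre1 site from the table, flanked by two arbitrary bases.
    promoter = "ccTGaaaaCGattaCAgg"
    print(find_cre(promoter, "TGNNWNCGNNWNCA"))
    print(find_cre(promoter, "TGWAANCGNTNWCA"))
```

Converting the reported start into the position of the last cre nucleotide relative to the translational start, as used in the figure, would additionally require the coordinate of the start codon within the searched region.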
[Figure 12 diagram. Substrates: GLUCOSE, FRUCTOSE, SUCROSE, TREHALOSE, FOS, RAFFINOSE, LACTOSE, GALACTOSE. Intermediates and products: GLUCOSE-6P, FRUCTOSE-1P, SUCROSE-6P, TREHALOSE-6P, FRUCTOSE-1-6P2, GLUCOSE-6P + GLUCOSE, GLUCOSE-6P + FRUCTOSE, FRUCTOSE, SUCROSE + GALACTOSE, GLUCOSE + FRUCTOSE, GLUCOSE + GALACTOSE, GALACTOSE-1P, UDP-GALACTOSE, UDP-GLUCOSE, GLUCOSE-1P, GLUCOSE-6P, GLYCOLYSIS. Protein and EC labels, as printed: FruA; 2.7.1.56 FruK; ScrA; 3.2.1.26 ScrB; MsmEFGK; BfrA 3.2.1.26; MsmEFGK2; MelA 3.2.1.22; GtfA; 3.2.1.26; LacS; LacS; 3.2.1.23 LacZ; 2.7.1.6 GalK; 2.7.1.10 GalT; GalE; 5.1.3.2; 2.7.7.9 GalU; 5.4.2.2 Pgm; ManLMN; TreB; 3.2.1.93 TreC.]
Figure 12. Carbohydrate utilization in L. acidophilus. This diagram shows carbohydrate transporters and hydrolases as predicted by transcriptional profiles. Protein names and EC numbers are specified for each element. PTS transporters are shown in red. GPH transporters are shown in yellow. ABC transporters are shown in green.
Figure 13. Expression of glycolysis genes. D-lactate dehydrogenase (D-LDH, La55), phosphoglycerate mutase (PGM, La185), L-lactate dehydrogenase (L-LDH, La271), glyceraldehyde 3-phosphate dehydrogenase (GPDH, La698), phosphoglycerate kinase (PGK, La699), glucose 6-phosphate isomerase (GPI, La752), 2-phosphoglycerate dehydratase (PGDH, La889), phosphofructokinase (PFK, La956), pyruvate kinase (PK, La957), fructose-bisphosphate aldolase (FBPA, La1599).
Chapter IV – Global characterization of the Lactobacillus acidophilus transcriptome and analysis of relationships between gene expression level, codon usage, chromosomal location and intrinsic gene characteristics
4.1 Abstract
The relationships between gene expression level, codon usage, chromosomal
location and intrinsic gene parameters were investigated globally in Lactobacillus
acidophilus. The codon usage profile revealed a general bias towards AT-rich codons, as
expected for a low GC content organism. In contrast, genes showing high codon usage
bias had higher GC-content at the third codon position. Correlation analyses showed that
gene expression levels were most highly correlated with GC content, codon adaptation
index, size and then RBS. Gene expression levels did not correlate with GC content at the
third codon position. The high correlation between GC content and gene expression level
may reflect that genes with GC contents much higher than that of the genome signature
are biologically important and highly expressed. Data were segregated into four
chromosomal locations, by strand, location and orientation, relative to the origin and
terminus of replication. Analysis of variance was used to investigate whether there were
differences in gene expression between the four chromosomal locations. The results
showed that genes on the leading strand were more highly expressed, and showed higher
codon usage bias. Genes located between the origin and terminus of replication,
relative to the forward strand, were also more highly expressed. Overall, genes on either
strand pointing towards the terminus of replication were more highly expressed. Analysis
of the correlation between gene expression level and intrinsic gene parameters, by
location, revealed a strong influence of chromosomal architecture on gene transcription.
Codon usage showed a strong strand bias. Specifically, genes on the leading strand
located between the origin and terminus of replication, pointing towards the terminus,
showed both the highest codon usage bias and gene expression levels. For this particular
location, gene expression levels were most highly correlated with codon adaptation
index. Additionally, genes on the lagging strand located between the terminus and the
origin of replication, oriented towards the terminus, showed high expression levels, but
low codon usage bias. The correlations between gene expression level, CAI and GC
content indicate very highly expressed genes have a higher GC content, and display
codon bias. Globally, chromosomal architecture strongly influences gene expression
levels, with a bias towards locating the majority of the highly expressed genes on each
strand pointing towards the terminus of replication. This preferred combination of strand
location and orientation allows for more efficient co-directional replication and
transcription, ultimately providing a selective advantage. Although chromosomal location
and intrinsic gene parameters strongly influence gene transcription, additional factors
including environmental conditions and evolutionary forces also affect gene expression.
This study illustrates the importance of chromosomal architecture for gene expression
and shows that, for L. acidophilus, chromosomal location, codon usage and GC content
are correlated with gene expression level.
4.2 Introduction
The universal translation process relies on the genetic code, which describes how
64 codons specify 20 amino acids. The degeneracy of the genetic code allows all amino
acids except Met and Trp to be encoded by more than one codon. As a result, the
“genome hypothesis” proposed that each species developed its own preferred codon
usage pattern (Grantham et al., 1980). The extent to which alternative synonymous
codons are used is not random between and within organisms, which is illustrated by
differences in codon usage between and within species (Sharp et al., 1986; Sharp et al.,
1988; Aota et al., 1988; Lloyd and Sharp, 1992; Coghlan and Wolfe, 2000; Ohno et al.,
2001). Although base composition of the first two codon positions is species-
independent, the base distribution pattern at the third position allows variation between
species (Zhang and Chou, 1994).
The initial premise underlying codon usage studies is based upon the assumption
that selection molds the pattern of codon usage differently in various organisms. In
particular, genes encoding proteins necessary at most stages of the bacterial life cycle
seem to have evolved a specific codon usage that allows such genes to be consistently
highly expressed (Karlin and Mrazek, 2000; Karlin et al., 2004). Specifically, alternative
synonymous codons are not used randomly, and highly expressed genes are
representative of an organism’s codon bias. Codon usage variability is largely determined
by natural selection and mutation, on a genome-wide scale (Sharp et al., 1986; Sharp and
Li, 1987; Chen et al., 2004). Mutational forces are the primary factor responsible for
genome-wide codon bias (Chen et al., 2004), resulting in species genomic differentiation.
Specifically, selection seems to act through translation efficiency to differentiate highly
expressed genes. Indeed, in model organisms such as Escherichia coli and
Saccharomyces cerevisiae, very highly expressed genes appear to display a relatively
high degree of codon bias (Coghlan and Wolfe, 2000; dosReis et al., 2003).
Among the several measures of codon bias that have been established, most
studies rely upon the codon adaptation index (CAI) (Sharp and Li, 1987) as a measure of
codon bias (Coghlan and Wolfe, 2000). The CAI is a measure
of synonymous codon usage bias in a particular gene, relative to that of a set of highly
expressed genes (Sharp and Li, 1987). The CAI uses a reference set of highly expressed
genes to assess the relative frequencies of each codon. The CAI has been used to predict
the level of expression of a gene, assess adaptation of genes to a host, and make
comparisons of codon usage in different organisms (Sharp and Li, 1987).
Codon usage measures have been used in a variety of organisms to forecast gene
transcription levels. Specifically, predicted highly expressed (PHX) genes studies have
been carried out in prokaryotes (Karlin and Mrazek, 2000), particularly in low GC Gram-
positive organisms (Karlin et al., 2004). Although CAI has been shown to be strongly
correlated with mRNA concentration in Saccharomyces cerevisiae (Coghlan and Wolfe,
2000), it is unclear whether this correlation is found throughout the prokaryotic kingdom.
Additionally, recent genome-wide studies have questioned the relationship between gene
expression level and codon bias (Coghlan and Wolfe, 2000). Although considerable
attention has been devoted to studying codon usage in a variety of prokaryotes, the
relationship between gene expression levels and codon usage in low GC Gram-positive
bacteria remains mostly uncharacterized.
Lactic acid bacteria are a heterogeneous family of microbes which reside in a
variety of environments. The genome sequences of several lactic acid bacteria have been
published, including bifidobacteria (Schell et al., 2002; Siezen et al., 2004), lactobacilli
(Kleerebezem et al., 2003; Pridmore et al., 2004; Altermann et al., 2004), lactococci
(Bolotin et al., 1999), streptococci (Ferretti et al., 2001; Ajdic et al., 2002; Tettelin et al.,
2001), and several others are underway (Klaenhammer et al., 2003; Siezen et al., 2004).
Although it was recently suggested that lactic acid bacteria are prime candidates for
codon optimization, little information is available regarding codon usage in these
microbes. The Lactobacillus acidophilus NCFM genome and transcriptome are well
characterized (Altermann et al., 2004; Azcarate-Peril et al., 2004; Barrangou et al., 2004),
but little information is available regarding codon usage in this organism.
The objective of this study was to investigate the relationships between gene
expression levels determined by microarray expression profiles and codon usage,
chromosomal location and intrinsic gene properties such as size and GC content, in L.
acidophilus. This is the first global analysis of codon usage in lactobacilli, providing a
better understanding of the parameters which underlie gene expression for low GC Gram-
positive bacteria.
4.3 Materials and Methods
4.3.1 Genome and microarray data
The complete genome sequence of the probiotic lactic acid bacterium
Lactobacillus acidophilus NCFM was used, as described by Altermann et al. (2004). All
annotated ORFs were used, including those annotated as hypothetical, and predicted by
computational methods only.
A whole-genome cDNA microarray platform of L. acidophilus has been
implemented recently (Azcarate-Peril et al., 2004). For the current study, we used a
carbohydrate microarray dataset published previously (Barrangou et al., 2004), and for all
ORFs spotted on the array (n=1889), the median LSM over the 8 treatments, as obtained
from the mixed model ANOVA data analysis (Barrangou et al., 2004), was calculated
and used as the gene expression level for microarray experiments.
4.3.2 Gene intrinsic parameters
For each ORF annotated in the genome, characteristics were parsed out of the
GenBank file, namely: ORF number, start position, strand location (either leading or
lagging strand), gene size (in nucleotides), and G+C content for the first (GC1), second (GC2)
and third (GC3) codon positions, and for the whole gene (GCall).
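The position-specific G+C values described above can be illustrated with a short sketch. It assumes an in-frame coding sequence supplied as a plain string; the function name and the example sequence are hypothetical and are not taken from the original parsing scripts.

```python
def gc_by_codon_position(orf):
    """Return (GC1, GC2, GC3, GCall) as percentages for an in-frame ORF."""
    seq = orf.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")

    def gc_percent(bases):
        # Percentage of G or C among the given bases.
        return 100.0 * sum(base in "GC" for base in bases) / len(bases)

    gc1 = gc_percent(seq[0::3])  # first codon positions
    gc2 = gc_percent(seq[1::3])  # second codon positions
    gc3 = gc_percent(seq[2::3])  # third codon positions
    return gc1, gc2, gc3, gc_percent(seq)


if __name__ == "__main__":
    # Toy ORF: ATG GCC AAA TAA
    print(gc_by_codon_position("ATGGCCAAATAA"))
```

For the toy sequence, the third position is the most G+C rich of the three, whereas the genome-wide pattern reported in this chapter is the opposite (GC3 well below GCall).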
Additionally, expression levels were added for each ORF (array LSM), as the
median expression level over the carbohydrate array experiments, as well as the gene
position, relative to the chromosome terminus, either between the origin and terminus of
replication (O>T), or between the terminus and the origin of replication (T>O). The
terminus was defined as the intergenic region between ORFs 1128 and 1129, which is
located in the middle of the proposed terminus region defined in the L. acidophilus
NCFM genome (Altermann et al., 2004). The complete data set was also split into four
chromosomal locations, by segregating data by strand (leading or lagging), and relative to
the terminus (between the origin and the terminus, or between the terminus and the
origin, relative to the leading strand). The resulting four subgroups were designated: LeOT,
genes on the Leading strand located from the Origin to the Terminus; LeTO, genes on the
Leading strand located from the Terminus to the Origin; LaOT, genes on the Lagging
strand located from the Origin to the Terminus (relative to the leading strand); LaTO,
genes on the Lagging strand located from the Terminus to the Origin (relative to the
leading strand).
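The assignment of genes to the four groups can be sketched as below. The sketch assumes that ORF numbers increase with distance from the origin of replication, so that the terminus, defined above as the intergenic region between ORFs 1128 and 1129, separates O>T genes from T>O genes by ORF number alone. The function and constant names are hypothetical.

```python
from collections import Counter

# Terminus placed after ORF 1128, as defined in the text.
TERMINUS_AFTER_ORF = 1128


def chromosomal_group(orf_number, strand):
    """Return LeOT, LeTO, LaOT or LaTO for one gene.

    strand is "leading" or "lagging", as parsed from the annotation.
    Assumes ORF numbers increase from the origin towards the terminus.
    """
    if strand not in ("leading", "lagging"):
        raise ValueError("strand must be 'leading' or 'lagging'")
    prefix = "Le" if strand == "leading" else "La"
    region = "OT" if orf_number <= TERMINUS_AFTER_ORF else "TO"
    return prefix + region


def count_groups(genes):
    """Count genes per group from an iterable of (orf_number, strand) pairs."""
    return Counter(chromosomal_group(number, strand) for number, strand in genes)


if __name__ == "__main__":
    example = [(5, "leading"), (900, "lagging"), (1129, "leading"), (1700, "lagging")]
    print(count_groups(example))
```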
4.3.3 Codon adaptation index
For codon bias analyses, parameters were calculated according to the method of
Sharp and Li (1987). Specifically, the relative synonymous codon usage (RSCU) table,
the relative adaptedness of each codon (w) and the codon adaptation index (CAI) were
calculated (Sharp and Li, 1987).
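The three quantities named above can be sketched in a few lines. This is an illustration of the definitions of Sharp and Li (1987), not the EMBOSS implementation used in this study: RSCU is the observed count of a codon divided by the mean count of its synonymous family, w is the count of a codon divided by that of the most used synonym, and CAI is the geometric mean of w over the codons of a gene. Function names are hypothetical, the standard genetic code is assumed, and the pseudocount of 0.5 for codons absent from the reference set is stated here as an assumption.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(orf):
    """Split an in-frame ORF into codons, ignoring a trailing partial codon."""
    seq = orf.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def rscu_and_w(reference_orfs, pseudocount=0.5):
    """Return (RSCU, w) dictionaries computed from a reference (training) set."""
    counts = Counter(c for orf in reference_orfs for c in codons(orf)
                     if c in CODON_TABLE)
    rscu, w = {}, {}
    for aa in sorted(set(AMINO) - {"*"}):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        if len(family) == 1:
            continue  # Met and Trp have no synonymous codons
        # Assumed pseudocount so that an unused codon does not give w = 0.
        x = {c: counts[c] or pseudocount for c in family}
        mean_count = sum(x.values()) / len(family)
        max_count = max(x.values())
        for c in family:
            rscu[c] = x[c] / mean_count
            w[c] = x[c] / max_count
    return rscu, w


def cai(orf, w):
    """Geometric mean of w over the codons of one gene."""
    logs = [log(w[c]) for c in codons(orf) if c in w]
    if not logs:
        raise ValueError("no scorable codons in this ORF")
    return exp(sum(logs) / len(logs))


if __name__ == "__main__":
    rscu, w = rscu_and_w(["AAAAAAAAAAAG"])  # Lys: AAA x3, AAG x1
    print(rscu["AAA"], w["AAG"], cai("AAAAAG", w))
```

Changing the reference set passed to `rscu_and_w` (for example the 10 or 50 most highly expressed genes, or all genes) corresponds to the different training sets described in the next paragraph.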
The CAI is a measure of the degree of deviation of codon usage in a specific
gene, as compared to that of a selected training set. Three different training sets were
used to calculate CAI. The first training set consisted of the 10 most highly expressed
genes, and resulted in CAI10. The second training set consisted of the 50 most highly
expressed genes, and resulted in CAI50. The third training set included the whole
genome, in order to use the complete genome-wide codon usage bias as the reference.
CAI calculations were carried out using the EMBOSS suite (Rice et al., 2000), using the
cusp and cai tools to compute the codon usage table and CAI, respectively. For the
training set, a desirable sample size of 1% of the predicted coding sequences was
previously proposed (Carbone et al., 2003), and a set of the 20 most highly expressed
genes has also been used for CAI training in S. cerevisiae (Fraser et al., 2004). For the L.
acidophilus genome, 1% represents approximately 20 ORFs, which falls in-between the
first two training sets we used, namely 10 and 50 genes.
4.3.4 Ribosome binding site identification
Ribosome binding site (RBS) analyses were carried out using a custom-made
script available at http://sourceforge.net/projects/free2bind. This program is designed to
identify putative RBS based on the lowest calculated energy of the interaction between
the 16S 3’ tail and a given DNA sequence, an approach similar to that of Osada et al.
(1999). The resulting calculated free energy level of the pairing between the 16S 3’ tail
and the putative RBS sequence is used subsequently as an indicator of the RBS quality,
with lower energy levels representing higher quality Shine-Dalgarno (SD) sequences.
Two different RBS types were calculated. Since putative RBS sites have been annotated
in the L. acidophilus NCFM genome (Altermann et al., 2004), the region encompassing
10 bp before and after the first annotated base of the RBS was used as a template to search
for RBS sites, which included 1250 ORFs. Additionally, the region upstream of each
annotated ORF was also used as a template to search for RBS sites, which included 1874
ORFs.
4.3.5 Statistical analyses
Correlation analyses were carried out in SAS (SAS Institute, Cary, NC), using the
correlation procedure (proc corr), invoking Pearson, Spearman (non-parametric) and
Kendall (non-parametric) correlation tests. Additionally, regression analysis was carried
out in Excel (Microsoft, Redmond, WA). Although there is a priori no reason to make assumptions
regarding the normal distribution of parameters, or the linearity of the relationships being
studied, we attempted linear regression analysis. ANOVA statistical analyses were
carried out in SAS as well, using the GLM procedure (proc GLM). Although
correspondence analysis is usually used in studies of codon usage (Lloyd and Sharp,
1992; Perriere and Thioulouse, 2002), our objective was to investigate correlations rather
than describe the distributions of codon usage statistics.
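The three correlation measures named above can be illustrated with a small, dependency-free sketch. It is not the SAS procedure used in this study: the function names are hypothetical, Spearman is computed as the Pearson correlation of average ranks, and Kendall is computed as tau-b with a simple pairwise loop that is only practical for small examples.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b, computed over all pairs of observations."""
    concordant = discordant = untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            untied_x += dx != 0
            untied_y += dy != 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    return (concordant - discordant) / sqrt(untied_x * untied_y)


if __name__ == "__main__":
    # A monotonic but non-linear relationship (made-up values).
    x = [1, 2, 3, 4]
    y = [1, 4, 9, 16]
    print(pearson(x, y), spearman(x, y), kendall_tau_b(x, y))
```

With a monotonic but non-linear relationship such as the one in the example, the two rank-based measures reach 1 while the Pearson coefficient does not, which is one reason to report parametric and non-parametric measures side by side when linearity cannot be assumed.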
4.4 Results
4.4.1 Distribution patterns
A total of 1813 ORFs had entries for all the parameters, and were subsequently
used in the study. These 1813 ORFs represent over 97% of the genomic ORFs in the L.
acidophilus NCFM genome.
The codon usage profiles for the three training sets are shown in Table 1. For 18
amino acids, more than one codon was used. The codon usage profile revealed a general
bias towards A-T rich codons, as expected in low GC Gram positive genomes. For
CAI10, 14 dominant codons were AT rich at the third position, and four codons were GC
rich at the third position. The amino acids with preferred GC-rich third position were:
Asn (AAC), His (CAC), Lys (AAG) and Tyr (TAC). In contrast, for those four residues,
CAIall had an AT-rich dominant third codon, namely Asn (AAT), His (CAT), Lys
(AAA) and Tyr (TAT). Those dominant codon preferences indicate that throughout the
genome, low GC third positions are preferred, whereas for genes with high bias, high
GC third positions are preferred. This suggests that genes with high codon bias differ
from genes typical of a low GC organism.
For all measured parameters, the distribution of the data over the entire genome
was analyzed, and distributions were visualized for gene expression level, gene size, GC
contents and CAI (Figure 1). The mean expression level was 0.03, ranging between -2.00
and 4.46. The median was -0.36, indicating that most genes have an expression level
below the mean. The transcriptome distribution pattern indicates many genes are lowly
expressed, and few genes are highly expressed. The distribution of gene expression levels
across the genome is similar to that of S. pneumoniae (Martin-Galiano et al., 2004).
For chromosomal location (Figure 2), 996 genes (55%) were on the leading strand
(Le), and 817 (45%) were on the lagging strand (La). Overall, 1042 genes (57%) were
located between the origin and the terminus (OT), and 771 (43%) were located between
the terminus and the origin (TO). Most of the genes on the leading strand (n=813, 82%)
were located between the origin and the terminus (LeOT), and most of the genes on the
lagging strand (n=588, 72%) were located between the terminus and the origin (LaTO).
Gene size varied between approximately 0.1 kb and 13.0 kb, with a mean value
of 952 bp. Overall, 26 genes (1.5%) were larger than 3.0 kb, and 183 (10.1%) were smaller
than 0.3 kb. Most genes (n=1208, 67%) had sizes between 500 bp and 1,500 bp.
Overall GC content (GCall) varied between 23.2% and 46.3%, with a 35.0%
mean, which is within 0.3% of the genome-wide GC content (Altermann et al., 2004).
There was a strong position specific bias, however, with a GC1 mean (46.2%) over 10%
above and a GC3 mean (25.2%) about 10% below the GCall mean. In contrast, the GC2
mean (33.4%) was close to that of GCall.
For CAI10 and CAI50, distribution profiles were very similar, with values
ranging between 0.24 and 0.87, with a small number of genes (n=232, 13%) showing a
high bias (above 0.65). In contrast, for CAIall, a dual distribution was observed, with
many genes showing high bias. The distribution of CAI10 values across the genome is
similar to that of S. pneumoniae (Martin-Galiano et al., 2004), although the average is higher in
L. acidophilus.
Putative ribosome binding sites with sequence closest to the RBS consensus
AGGAGG had the lowest free energy of pairing with the 16S 3’ tail. These results are
consistent with previous findings in prokaryotes (Osada et al., 1999; Ma et al., 2002).
4.4.2 Correlation analyses
Correlations between gene expression level and all measured parameters are
summarized in Table 2. Analyses included one parametric (Pearson) and two non-
parametric (Spearman and Kendall) measures of correlation, as well as two indicators of
linear regression analysis fit, namely significance and sum of squares residual.
Globally, parameters showing the highest correlation with gene expression were,
in decreasing order: GCall, GC1, GC2, CAI10, CAI50, size and RBS. GC3, start and
CAIall did not show any significant correlation in any of the three tests. Gene size showed a
higher correlation in the Spearman and Kendall tests than in Pearson’s. The Pearson test
correlated perfectly with both regression measures.
Although no particular parameter showed a very high correlation with gene
expression level (all correlation coefficients were below 0.5), GCall displayed correlation
coefficients of 0.427, 0.411 and 0.289, for Pearson, Spearman and Kendall, respectively.
Additionally, the linear regression analysis between GCall and LSM gave a statistically
very significant fit, although the relationship may not be fully linear (Figure 3).
Visualization of the correlations between gene expression levels and all other
parameters is shown in Figure 3. The best regression curve (lowest residual sum of squares)
corresponded with the most statistically significant P-value, and the highest correlation
coefficient (see Table 2). As previously suggested, the parameter showing the best
regression curve is the best predictor of gene expression level (Coghlan and Wolfe,
2000). However, a prior report suggests that the Pearson correlation is inappropriate for
analyzing CAI correlation with gene expression level, and that Spearman should be preferred
(Coghlan and Wolfe, 2000). Our results indicate both tests gave similar results, with the
exception of correlation to gene size.
A previous study reported a strong correlation between PHX genes and strong
Shine-Dalgarno sequences (Karlin and Mrazek, 2000), and a second study found a
correlation between Shine-Dalgarno sequence conservation and codon usage (Sakai et al.,
2001). In contrast, we found only a weak correlation between gene expression level and RBS
strength. A similarly weak correlation has been reported previously for three archaebacteria
(Sakai et al., 2001).
4.4.3 Chromosomal location
Although it is tempting to arbitrarily select subsets of genes, based on expression
level, CAI range, or GC content, we segregated the data according to genome location,
within strand, and relative to the origin and terminus of replication (Figure 2). The
correlations between gene expression level and all other parameters were then investigated
again, using the methodologies presented above. ANOVA was used to investigate whether
there were differences in the distributions of the parameters between the four
chromosomal locations. Results are summarized in Table 3. Analyses revealed genes on
the leading strand were more highly expressed than those on the lagging strand. Similarly,
genes located from the origin to the terminus were more highly expressed than those
located from the terminus to the origin. Genes were segregated by strand (Leading or
Lagging) and relative to the terminus (from the Origin to the Terminus OT, or from the
Terminus to the Origin TO) into four groups, namely LeTO, LeOT, LaTO, LaOT. A
comparison of gene expression across these four groups revealed that LeOT genes were
the most highly expressed, followed by LaTO, and then by LeTO and LaOT, which were
the most lowly expressed (Figure 4).
In contrast, GC3 showed opposite distributions between the four locations, with
LeOT and LaTO showing a lower GC3 content than LeTO and LaOT (Table 3, Figure 4).
CAI10 and CAI50 both showed differences between strands, with higher values
on the leading strand. These results correlate well with distributions observed in Figure
1, showing differences between CAI10-CAI50 and CAIall. Also, the strand differences
explain the dual distribution observed for CAIall in Figure 1.
Additionally, correlation analyses were carried out, after data were split into the
four chromosomal locations (Table 4). Differences between locations were observed,
consistent with ANOVA results (Table 3). Since LeOT and LaTO genes showed the
highest expression levels (Table 3, Figure 4), particular attention was given to their
correlations with other parameters. For LeOT, CAI showed the highest correlation,
followed by GCall, GC1, GC2, Size and RBS. Interestingly, CAI10/CAI50 showed the
highest correlation coefficients, namely 0.52 and 0.51. For LaTO, GCall, GC1, CAIall,
GC2, CAI10, size, CAI50 and size showed the highest correlations. Neither GC3 nor start
showed any significant correlation, which is similar to global results (Table 2).
Visualization of the correlation analyses for CAI10, GCall and size can be seen in
Figure 5. For select parameters, gene distribution for each location can be seen in Figure
6. CAI10 showed the strongest correlation with gene expression level, for LeOT (Table
4). Most of those high CAI10 values correspond to genes with high expression levels
(Figure 5). Most of the highly expressed genes are located on LeOT, and some on LaTO
(Figures 5 and 6), while only a few genes located on LeTO and LaOT show expression
above LSM=1.0. In contrast, for the lagging strand, only a few genes show CAI10 above
0.55 (Figures 5 and 6), none of which have high gene expression.
When comparing the relationship between CAI10 and LSM globally (Figure 3)
vs. by location (Figure 5), there is a location-specific difference. In contrast, the
relationship between LSM and gene size, or LSM and GCall does not change when data
is segregated by location (Figures 3 and 5). This is consistent with a strand discrepancy in
codon adaptation (CAI10), as seen in Figure 6.
Both GCall and GC3 contents are consistent regardless of location (Figure 6),
with a value close to that of the genome in the case of GCall. For GC3, the value has to
be lower than that of the genome, because constraints on the first two codon positions result
in a higher GC content at those positions. As a result, since L. acidophilus is a low GC
organism, the GC content at the third codon position has to be lower than that of the
genome. Gene size distribution is seemingly equal throughout the chromosome, although
many more genes are present on LeOT and LaTO. There is a strong difference
between the two strands as to codon adaptation (Table 3, Figures 4 and 6), which was
observed regardless of the training set used. The codon adaptation index is always
higher for the leading strand, regardless of direction relative to the origin or terminus of
replication (Figure 6).
4.5 Discussion
The analysis of the relationships between gene expression levels, codon usage,
chromosomal location and intrinsic gene properties in L. acidophilus revealed strong
correlations between GC content, codon usage, chromosomal location and gene
expression levels. However, there was no correlation between GC3 and gene expression
level. Globally, chromosomal architecture seemed to influence gene expression strongly,
with both a strand bias, and a gene location and orientation effect, relative to the origin
and terminus of replication.
Globally, a relatively small number of genes showed high expression levels.
Predicted highly expressed genes usually encompass ribosomal proteins (RP),
transcription and translation processing factors (TF), chaperone proteins (CH),
recombination and repair proteins, outer membrane proteins and energy metabolism
enzymes (Karlin and Mrazek, 2000; Karlin et al., 2004). Throughout a variety of
prokaryotes, those genes display a high codon bias (Karlin and Mrazek, 2000). Our
results indicate that the 20 most highly expressed genes included genes involved in
glycolysis, transcription, ATP synthesis and membrane construction, as well as genes
encoding ribosomal proteins, regulators, and a peptidase. Genes encoding glycolytic
enzymes and translation factors have also been shown to be highly expressed in
S. pneumoniae (Martin-Galiano et al., 2004). Although this is consistent with the RP and
TF families of genes, the genes most highly
expressed in L. acidophilus did not include CH genes.
Although most studies analyzing codon bias have relied on multivariate statistical
analyses such as correspondence analysis (Perriere and Thioulouse, 2002), the major
trends identified in codon usage account for a low proportion of the variation (Grocock
and Sharp, 2002). In a thorough study of Pseudomonas aeruginosa, the first axis
accounted for 17% of the variation, and the first four axes combined accounted for a total
of 30% of the variation (Grocock and Sharp, 2002). In another study, the combination of
the first three axes accounted for less than 23% of the variation in codon usage (McInerney,
1994).
Since our objective was to investigate correlation between gene features and
expression levels, rather than describe the variation within CAI distributions, we used
correlation analysis rather than correspondence analysis. Although no assumption can be
made as to the linearity of the relationships between parameters being tested, a linear
regression was attempted nonetheless. Several correlation analyses were carried out,
including both parametric and non-parametric analyses, namely Pearson, Spearman and
Kendall, since no assumptions were made a priori regarding data distribution and
linearity of the relationships. Spearman correlation analysis has previously been used in
codon analysis studies and Spearman ranking was considered a more appropriate statistic
than the Pearson correlation coefficient (Coghlan and Wolfe, 2000). Similarly, Spearman
correlation has also been used previously to investigate correlation between effective
number of codons in a gene (Nc) and CAI (Fuglsang, 2003). Additionally, Kendall
correlation has also been used to analyze the correlation between gene expression level
and codon usage (dosReis et al., 2003). A combination of both Pearson and Spearman
correlation analyses has also been used to investigate correlations between CAI and other
parameters (Jansen et al., 2003). Pearson correlation coefficients have also been used to
analyze the correlation between codon bias and microarray expression data (Fraser et al.,
2004). Our strategy allows comparison of results obtained from both parametric
(Pearson) and non-parametric (Kendall, Spearman) correlation tests. It was previously
suggested that non-parametric tests are more appropriate for such analyses, since they are
robust against non-linearity and non-normality (dosReis et al., 2003).
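The contrast between the parametric and non-parametric coefficients can be illustrated with a minimal sketch. The code below is not the software used in this study; it is a plain Python implementation written only for illustration, the data are hypothetical toy values, and the Kendall function computes tau-a, which ignores ties (an assumption that statistical packages usually relax by reporting tau-b).

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical toy data: a strictly increasing but non-linear relationship.
feature = [1, 2, 3, 4, 5, 6]
expression = [math.exp(v) for v in feature]

print("Pearson :", round(pearson(feature, expression), 3))
print("Spearman:", round(spearman(feature, expression), 3))
print("Kendall :", round(kendall(feature, expression), 3))
```

For this strictly increasing toy series both rank-based coefficients equal 1, whereas the Pearson coefficient is lower because the relationship is not linear, which is the robustness property referred to above.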
Prior studies carried out using correspondence analysis to investigate CAI statistic
distribution (Lloyd and Sharp, 1992) have identified a major and a secondary trend, with
the first axis appearing to differentiate genes according to their expression level (Lloyd
and Sharp, 1992; Kliman et al., 2003). Although our results indicate a correlation
between CAI and gene expression level, globally, our strongest correlation was
established between gene expression level and GC content. Additionally, our
investigation of the correlation between CAI and other statistics indicated it is not
correlated with GC3.
CAI has previously been shown to be the best codon usage bias indicator
(Coghlan and Wolfe, 2000). CAI was also shown to be highly correlated with mRNA
expression levels in S. cerevisiae (Coghlan and Wolfe, 2000). CAI and mRNA levels
have been shown to be correlated previously (dosReis et al., 2003). In our study, we
found a strong correlation between gene expression level and CAI10/CAI50. Although it
was not as strong as that between gene expression level and GCall on a global scale, it
was the strongest correlation for genes positioned on LeOT. Our results show that gene GC
content is the parameter most highly correlated with gene expression, which is different
from results shown previously (dosReis et al., 2003), but similar to findings from rodents
(Konu and Li, 2002).
Our results, indicating a positive correlation between gene expression level and
gene size, differ from previous studies reporting a negative correlation between mRNA
concentration and protein length (Coghlan and Wolfe, 2000) and gene length and codon
usage (Kliman et al., 2003). Perhaps this discrepancy reflects the differences between the
organisms used in the studies, eukaryotic S. cerevisiae and prokaryotic L. acidophilus. The
relationship between CAI and mRNA levels has been shown previously in S. cerevisiae
(Coghlan and Wolfe, 2000), and in E. coli (dosReis et al., 2003). Also, a non-parametric
regression on mRNA expression levels in E. coli has shown that gene size followed by
GC and then CAI are the best predictors of mRNA concentration (dosReis et al., 2003).
Although several studies have used the CAI as an indicator of gene expression, a
variable positive correlation is found between codon bias and level of gene expression.
Historically, initial CAI studies claimed that the strong correlation between CAI and levels of
gene expression allows utilization of CAI as a predictor of gene expression (Sharp and Li,
1987). In contrast, we believe the correlation between CAI and gene expression level is
indicative, rather than predictive, of the level at which a gene is expressed.
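For reference, the CAI statistic discussed here can be sketched as follows, using the formulation generally attributed to Sharp and Li (1987): each codon receives a relative adaptiveness w (its frequency divided by that of the most used synonymous codon in the training set), and the CAI of a gene is the geometric mean of w over its codons. This is only an illustrative sketch. The fractions are copied from the CAI10 columns of Table 1 for four amino acids, the two test sequences are hypothetical, and the exact software settings used in this study are not reproduced.

```python
import math

# "Fraction per AA" values for the CAI10 training set, copied from Table 1.
# Only four amino acids are included to keep the sketch short.
CAI10_FRACTIONS = {
    "Ala": {"GCA": 0.291, "GCC": 0.106, "GCG": 0.029, "GCT": 0.574},
    "Glu": {"GAA": 0.930, "GAG": 0.070},
    "Gly": {"GGA": 0.090, "GGC": 0.119, "GGG": 0.026, "GGT": 0.765},
    "Lys": {"AAA": 0.311, "AAG": 0.689},
}


def relative_adaptiveness(fractions):
    """Map each codon to w = fraction / highest fraction in its synonymous family."""
    weights = {}
    for family in fractions.values():
        best = max(family.values())
        for codon, fraction in family.items():
            weights[codon] = fraction / best
    return weights


def cai(cds, weights):
    """Geometric mean of w over the codons of an in-frame coding sequence.

    Codons missing from `weights` are skipped; this sketch assumes every
    weight is greater than zero, which holds for the subset used here.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no codon of the sequence is present in the weight table")
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(CAI10_FRACTIONS)

# Hypothetical sequences encoding Ala-Glu-Gly-Lys with preferred or rare codons.
preferred = "GCTGAAGGTAAG"
rare = "GCGGAGGGGAAA"

print("CAI, preferred codons:", round(cai(preferred, weights), 3))
print("CAI, rare codons     :", round(cai(rare, weights), 3))
```

A sequence built only from the preferred codon of each family reaches the maximum CAI of 1, while the same peptide encoded with rare codons scores far lower. The statistic therefore describes how closely codon choice resembles the training set, not the transcript level itself.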
The a priori assumption that genes with codon bias close to that of highly
expressed genes should be highly expressed is not consistent with the fact that some
genes with very high CAI values are not highly expressed (Figure 5). Analysis of CAI
and microarray gene expression levels in Streptococcus pneumoniae showed that CAI is
not always predictive of gene expression (Martin-Galiano et al., 2004). Specifically,
genes with high CAI are not always highly expressed, and genes with low CAI can be
highly expressed (Martin-Galiano et al., 2004), which is also shown in our current
findings (Figure 5). Interestingly, S. pneumoniae and L. acidophilus are both low GC
Gram-positive lactic acid bacteria.
A small correlation (r² = 0.09) has been shown between CAI and microarray
fluorescence in S. pneumoniae (Martin-Galiano et al., 2004). A similar correlation level
(r² = 0.07) was shown in L. acidophilus. In contrast, a higher correlation (r² = 0.18) was
found between GC and gene expression level in L. acidophilus.
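The two L. acidophilus values quoted above can be checked against Table 2, on the assumption that they are the squares of the global Pearson coefficients reported there for CAI10 and GCall. The short sketch below only performs that arithmetic.

```python
# Global Pearson coefficients with gene expression level, from Table 2.
r_gcall = 0.427
r_cai10 = 0.268

# Coefficient of determination: the square of the Pearson coefficient.
r2_gcall = r_gcall ** 2
r2_cai10 = r_cai10 ** 2

print(f"GCall: r squared = {r2_gcall:.2f}")  # prints 0.18
print(f"CAI10: r squared = {r2_cai10:.2f}")  # prints 0.07
```

Under that assumption, GC content accounts for roughly 18% of the variance in expression level and CAI10 for roughly 7%, so most of the variation remains unexplained by either parameter.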
The genomic DNA GC-content varies widely between species, as a result of
mutation pressure (Muto and Osawa, 1987). GC variation has been shown to be the most
important parameter differentiating codon usage bias between organisms, in archaea and
eubacteria (Chen et al., 2004). The relationship between codon usage bias and GC
composition has been characterized across unicellular genomes (Wan et al., 2004).
Specifically, GC3 was shown to be the primary factor within GC content to correlate
highly with codon usage bias (Wan et al., 2004). Further, GC3 was hypothesized as the
key factor driving synonymous codon usage, independently of species (Zhang and Chou,
1994; Wan et al., 2003). Although those results were inferred across 70 bacterial species
and 16 archaeal genomes, our results show this is not the case for L. acidophilus. We
found no correlation between GC3 and CAI. The non-linearity of the relationship
between codon usage bias measures and GC3 has been shown previously in a variety of
bacteria and archaea (Wan et al., 2004).
The L. acidophilus NCFM genome is 34.7% GC, so it is not surprising that codon
usage is related to base composition bias. The observed differences in GC content at the
three codon positions illustrate the overall GC content. Codon degeneracy is located
primarily at the third position of the codon, since there are strict constraints on the first
and second position of each codon (Zhang and Chou, 1994). As a result, the third codon
position is representative of the GC content of an organism, and reflects differences
between species (Muto and Osawa, 1987; Carbone et al., 2003). GC3 has previously been
shown to vary between species (Zhang and Chou, 1994), explaining the species impact
on the correlation between GC3 and CAI (Lloyd and Sharp, 1992). Also, it was previously
reported that CAI can most highly correlate with GC skew (Carbone et al., 2003), and
that gene expression levels are correlated with GC3 (Kliman et al., 2003). The position-
specific GC content within codons has been investigated previously (Muto and Osawa,
1987; Chen and Zhang, 2003), across species with varying GC content, indicating that
low GC content bacteria have higher GC content at the first codon position and lower GC
content at the third codon position, than that of their overall genome content, while that
of the second codon position is close to their genomic content (Chen and Zhang, 2003).
This is consistent with our findings in L. acidophilus (Figure 1). Early work showed that
there is a codon position bias in GC content, which is correlated with genome GC content
(Muto and Osawa, 1987). Specifically, the correlation between GC3 and genome GC
content explains the discrepancies observed at the third codon position between species
with varying GC content (Muto and Osawa, 1987).
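The position-specific statistics referred to throughout (GCall, GC1, GC2 and GC3) can be computed as in the minimal sketch below. The function name and the example sequence are hypothetical, and whether start and stop codons were included in the counts for this study is an assumption not settled by the text.

```python
def gc_by_codon_position(cds):
    """Return (GCall, GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty sequence whose length is a multiple of 3")

    def gc_fraction(bases):
        # Fraction of G or C among the given bases.
        return sum(base in "GC" for base in bases) / len(bases)

    return (
        gc_fraction(cds),        # all positions
        gc_fraction(cds[0::3]),  # first codon position
        gc_fraction(cds[1::3]),  # second codon position
        gc_fraction(cds[2::3]),  # third codon position
    )


# Hypothetical five-codon sequence: ATG GCT GAA GGT TAA.
example = "ATGGCTGAAGGTTAA"
gc_all, gc1, gc2, gc3 = gc_by_codon_position(example)
print(gc_all, gc1, gc2, gc3)  # prints 0.4 0.6 0.4 0.2
```

The toy sequence was chosen so that GC1 exceeds the overall value, GC2 matches it and GC3 falls below it, mirroring the pattern described for low GC organisms.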
A previous study investigating codon bias in P. aeruginosa (Grocock and Sharp, 2002)
reported that for species with highly biased GC base composition, the CAI methodology
may not be appropriate. While the study in P. aeruginosa (67% GC) illustrated this point
for high GC organisms, our analyses in L. acidophilus (35% GC) might validate this
theory for low GC organisms. It was recently suggested that lactic acid bacteria are a
desirable group of organisms for analysis of codon usage (Fuglsang, 2003), but our results
suggest that caution should be applied when using the CAI methodology.
Perhaps the high correlation between GC content and gene expression level is due
to the genomic composition of L. acidophilus. The genomic GC content in prokaryotes
ranges between approximately 25% and 75% (Muto and Osawa, 1987), which allows
great codon usage flexibility and variability. Since L. acidophilus is a low GC organism
(Altermann et al., 2004), perhaps the strong correlation between GC content and gene
expression level is due to the importance of high GC content genes. Indeed, for a low GC
organism such as L. acidophilus, genes with a high GC content differ widely from its
genomic “fingerprint”, since GC content is a main component of genomic signature
(Sandberg et al., 2003). Therefore, retaining genes that vary from its overall genomic
signature may indicate that they are biologically important, and consequently highly
expressed.
A correlation between RBS and gene expression level was found, although it was
minor compared to that of GCall. Nonetheless, a positive correlation between a strong
RBS and gene expression level is intuitive, and consistent with previous findings (Ma et
al., 2002).
We observed a discrepancy between the genome signature (low GC) and highly
expressed genes (high GC), perhaps indicating the codon usage for highly expressed
genes is different from that of the genome. Specifically, the genome-wide codon usage is
characterized by a high AT content at the third codon position, which is consistent with a
low GC organism. In contrast, genes with high codon bias showed a specific preference
for high GC content at the third codon position for select amino acids (Table 1).
However, GC3 was not a good indicator of gene expression (Tables 2 and 4). Perhaps
this is an indicator that for low GC organisms, overall gene GC content is more
representative of bias than codon usage.
Differences in the base composition between strands have been shown previously
(Grocock and Sharp, 2002; Lobry and Sueoka, 2002). The leading-lagging strand bias in
codon usage has been shown in Borrelia burgdorferi (McInerney, 1998; Carbone et al.,
2003). Additionally, replication selection is seemingly responsible for the presence of the
majority of the genes on the leading strand, whereas transcription selection results in
higher expression of genes present on the leading strand (McInerney, 1998).
Interestingly, location per se did not correlate with gene expression level globally
(Table 2). This means that the position of the start of any gene on the chromosome does
not correlate with gene expression level. However, it was shown previously that location
is indeed an important factor in gene expression. We therefore further investigated the
effect of both strand location, and orientation relative to the terminus on gene expression
level.
The importance of chromosomal location has been illustrated before in P.
aeruginosa (Grocock and Sharp, 2002), Borrelia burgdorferi (McInerney, 1998), and
Treponema pallidum (Lafay et al., 1999). Specifically, differences between strands have
been illustrated for codon usage, and strand location was shown to be a
major cause of variation in codon usage (McInerney, 1998). However, the correlation
between gene location and expression level was estimated to be weak in P. aeruginosa,
where gene location was only the tertiary trend in correspondence analysis, accounting
for only 4.4% of variation (Grocock and Sharp, 2002). Strand location accounted for
8.6% of the variation in codon usage, as the secondary source of variation (Lafay et al.,
1999). In contrast, in B. burgdorferi, strand location is the primary parameter involved in
codon usage, accounting for 13.7% of the variation (McInerney, 1998). Within species,
inter-strand differences appear on the primary axis of correspondence analysis (Lafay et
al., 1999). Nevertheless, Grocock and Sharp (2002) showed that the position of a gene
relative to the strand has an influence on codon usage. In addition to strand location,
the orientation of a gene relative to the direction of DNA replication is also important in
codon usage pattern (McInerney, 1998). Nevertheless, the impact of both strand and
orientation on gene expression had not yet been illustrated simultaneously, prior to our
study.
Chromosomal architecture has a major effect on gene expression, both relative to
strand bias and gene position and orientation relative to the terminus of replication. The
impact of chromosomal architecture is important for many of the parameters measured in
our study, showing a significant bias for genes converging towards the terminus.
Although it was previously shown that the leading strand in low GC Gram-positives
pervasively carries more than 75% of the genes (Karlin et al., 2004), this is not the case in L.
acidophilus, where only 55% of the genes are on the leading strand. Nevertheless, very
significant differences in codon usage, GC content and other parameters were observed
between the two strands.
Interestingly, while codon usage, GC content and gene size all showed a global
correlation with gene expression levels, CAI was the parameter which showed the most
variability between chromosomal locations, relative to the strand bias, and the position
and orientation relative to the terminus. Specifically, the correlation between CAI10 and
gene expression level is higher for LeOT genes (Table 4). In contrast, the correlation
between gene expression level and GCall or GC3 was consistent regardless of location.
For CAI particularly, genes on the leading strand located between the origin and the
terminus of replication show the most codon usage bias. Specifically, genes that show the
most codon bias are located in this region, and are likely to be highly expressed (Figure
5).
Globally, it seems chromosomal architecture is a primary factor controlling gene
expression in L. acidophilus. Perhaps the combination of replication efficiency and
transcription efficiency underlies the impact of chromosomal location on gene
expressivity. Indeed, replication is thought to be more efficient while co-directional with
transcription (French, 1992), since collisions between the RNA and DNA polymerases
are likely to slow down both processes (French, 1992). Hence, there is a selective
advantage towards locating the majority of the genes on each strand pointing towards the
terminus. As suggested previously, more efficient replication may be a selective
advantage (McInerney, 1998) and the most desirable gene location would combine genes
on the leading strand and on the lagging strand pointing toward the terminus. This is
consistent with our observations, namely genes on LeOT and LaTO showing higher
expression levels (Table 3, Figures 4 and 5).
The causal link established between codon usage and gene expression level is still
as controversial as when the concept was initially presented (Sharp et al., 1986). Early
work aimed at predicting expression level of a gene given only the nucleotide sequence
of the coding region (Sharp et al., 1986). Although relative tRNA abundances also
impact gene expression level, we did not include them in our study, since our primary
objective was to investigate the correlation between intrinsic gene features and
expression. Nevertheless, a correlation between usage of preferred codons and level of
their respective major isoacceptor tRNA has been shown in E. coli. This correlation
explains an adaptation of highly expressed genes towards translational efficiency
(dosReis et al., 2003).
Although CAI has previously been reported as a predictor of mRNA
concentration, it is an imperfect and unreliable measure of gene expression (Coghlan and
Wolfe, 2000). From a biological standpoint, intrinsic gene parameters are set, regardless
of environmental conditions. Since environmental conditions have been shown to impact
gene expression on a large scale, as shown by microarray studies, intrinsic gene
parameters are unable to predict changes in mRNA levels with changing biological
conditions, as mentioned previously (Coghlan and Wolfe, 2000). Indeed, extrinsic
parameters such as intergenic regions comprising promoter sequences are also involved
in gene expression control.
Globally, gene expression is controlled at several levels, including initiation of
transcription, transcription termination and codon usage. Additionally, the minor codon
modulator hypothesis stipulates that minor codons near the initiation site may play a role
in regulating gene expression (Ohno et al., 2001). As a result, although codon bias
measures may be correlated with intrinsic parameters, they are not good predictors of
mRNA levels. Perhaps a mixed model similar to that presented by dosReis et al. (2003),
including several parameters, is more representative of the heteroscedastic nature of gene
expression. Overall, many factors are involved in gene expression, including codon
usage, gene length, transcription initiation, amino-acid composition, protein function,
tRNA abundance, environmental conditions, mutation and evolutionary forces, GC
compositions, and others, which underlies the complexity in modeling and predicting
gene expression based on a defined number of parameters. It would be utopian to consider
that intrinsic gene parameters alone can be used to predict gene expression. An effective
predictor of gene expression has to include all of the parameters involved in translation,
transcription, environmental conditions and physiological state of the organism.
Nevertheless, this study illustrates the importance of chromosomal architecture for gene
transcription, and shows that codon usage and GC content are best correlated with
expression levels in L. acidophilus.
4.6 References
Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A., & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439
Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B., McAuliffe, O., Souther, N., Dobson, A., Duong, T., Callanan, M., Lick, S., Hamrick, A., Cano, R., & Klaenhammer, T. R. (2004) J. Bacteriol. In review
Aota, S. I., Gojobori, T., Ishibashi, F., Maruyama, T., & Ikemura, T. (1988) Nucleic Acids Res. 16, r315-r402
Azcarate-Peril et al. (2004) In review
Barrangou, R., Azcarate-Peril, M. A., Duong, T., Conners, S. B., Kelly, R. M., & Klaenhammer, T. R. (2004) In review
Bolotin, A., Mauger, S., Malarme, K., Ehrlich, S. D., & Sorokin, A. (1999) Antonie van Leeuwenhoek 76, 27-76
Carbone, A., Zinovyev, A., & Kepes, F. (2003) Bioinformatics 19, 2005-2015
Chen, L. L., & Zhang, C. T. (2003) Biochem. Biophys. Res. Comm. 306, 310-317
Chen, S. L., Lee, W., Hottes, A. K., Shapiro, L., & McAdams, H. H. (2004) Proc. Natl. Acad. Sci. USA 101, 3480-3485
Coghlan, A., & Wolfe, K. H. (2000) Yeast 16, 1131-1145
dosReis, M., Wernisch, L., & Savva, R. (2003) Nucleic Acids Res. 31, 6976-6985
Ferretti, J. J., McShan, W. M., Ajdic, D., Savic, D. J., Savic, G., Lyon, K., Primeaux, C., Sezate, S., Suvorov, A., Kenton, S., Lai, H. S., Lin, S. P., Qian, Y., Jia, H. G., Najar, F. Z., Ren, Q., Zhu, H., Song, L., White, J., Yuan, X., Clifton, S. W., Roe, B. A., & McLaughlin, R. (2001) Proc. Natl. Acad. Sci. USA 98, 4658-4663
Fraser, H. B., Hirsch, A. E., Wall, D. P., & Eisen, M. B. (2004) Proc. Natl. Acad. Sci. USA 101, 9033-9038
French, S. (1992) Science 258, 1362-1365
Fuglsang, A. (2003) Biochem. Biophys. Res. Comm. 312, 285-291
Fuglsang, A. (2004) Antonie van Leeuwenhoek 86, 135-147
Grantham, R., Gautier, C., Gouy, M., Mercier, R., & Pave, A. (1980) Nucleic Acids Res. 8, r49-r62
Grocock, R. J., & Sharp, P. M. (2002) Gene 289, 131-139
Jansen, R., Bussemaker, H. J., & Gerstein, M. (2003) Nucleic Acids Res. 31, 2242-2251
Karlin, S., & Mrazek, J. (2000) J. Bacteriol. 182, 5238-5250
Karlin, S., Theriot, J., & Mrazek, J. (2004) Proc. Natl. Acad. Sci. USA 101, 6182-6187
Klaenhammer, T. R., Altermann, E., Arigoni, F., Bolotin, A., Breidt, F., Broadbent, J., Cano, R., Chaillou, S., Deutscher, J., Gasson, M., van de Guchte, M., Guzzo, J., Hartke, A., Hawkins, T., Hols, P., Hutkins, R., Kleerebezem, M., Kok, J., Kuipers, O., Lubbers, M., Maguin, E., McKay, L., Mills, D., Nauta, A., Overbeek, R., Pel, H., Pridmore, D., Saier, M., van Sinderen, D., Sorokin, A., Steele, J., O'Sullivan, D., de Vos, W., Weimer, B., Zagorec, M., & Siezen, R. (2002) Antonie van Leeuwenhoek 82, 29-58
Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M., & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-1995
Kliman, R. M., Irving, N., & Santiago, M. (2003) J. Mol. Evol. 57, 98-109
Lafay, B., Lloyd, A. T., McLean, M. J., Devine, K. M., Sharp, P. M., & Wolfe, K. H. (1999) Nucleic Acids Res. 27, 1642-1649
Lloyd, A. T., & Sharp, P. M. (1992) Nucleic Acids Res. 20, 5289-5295
Lobry, J. R., & Sueoka, N. (2002) Genome Biol. 3, 1-14
Ma, J., Campbell, A., & Karlin, S. (2002) J. Bacteriol. 184, 5733-5745
Martin-Galiano, A. J., Wells, J. M., & de la Campa, A. G. (2004) Microbiol. 150, 2313-2325
McInerney, J. O. (1998) Proc. Natl. Acad. Sci. USA 95, 10698-10703
Muto, A., & Osawa, S. (1987) Proc. Natl. Acad. Sci. USA 84, 166-169
Ohno, H., Sakai, H., Washio, T., & Tomita, M. (2001) Gene 276, 107-115
Osada, Y., Saito, R., & Tomita, M. (1999) Bioinformatics 15, 578-581
Perriere, G., & Thioulouse, J. (2002) Nucleic Acids Res. 30, 4548-4555
Pridmore, R. D., Berger, B., Desiere, F., Vilanova, D., Barretto, C., Pittet, A. C., Zwahlen, M. C., Rouvet, M., Altermann, E., Barrangou, R., Mollet, B., Mercenier, A., Klaenhammer, T. R., Arigoni, F., & Schell, M. A. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517
Rice, P., Longden, I., & Bleasby, A. (2000) Trends Gen. 16, 276-277
Sakai, H., Imamura, C., Osada, Y., Saito, R., Washio, T., & Tomita, M. (2001) J. Mol. Evol. 52, 164-170
Sandberg, R., Branden, C. I., Ernberg, I., & Coster, J. (2003) Gene 311, 35-42
Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D., & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427
Sharp, P. M., Tuohy, T. M. F., & Mosurski, K. R. (1986) Nucleic Acids Res. 14, 5125-5143
Sharp, P. M., & Li, W. H. (1987) Nucleic Acids Res. 15, 1281-1295
Sharp, P. M., Cowe, E., Higgins, D. G., Shields, D. C., Wolfe, K. H., & Wright, F. (1988) Nucleic Acids Res. 16, 8207-8211
Siezen, R. J., van Enckevort, F. H. J., Kleerebezem, M., & Teusink, B. (2004) Curr. Opin. Biotechnol. 15, 105-115
Tettelin, H., Nelson, K. E., Paulsen, I. T., Eisen, J. A., Read, T. D., Peterson, S., Heidelberg, J., Deboy, R. T., Haft, D. H., Dodson, R. J., Durkin, A. S., Gwinn, M., Kolonay, J. F., Nelson, W. C., Peterson, J. D., Umayam, L. A., White, O., Salzberg, S. L., Lewis, M. R., Radune, D., Holtzapple, E., Khouri, H., Wolf, A. M., Utterback, T. R., Hansen, C. L., McDonald, L. A., Feldblyum, T. V., Angiuoli, S., Dickinson, T., Hickey, E. K., Holt, I. E., Loftus, B. J., Yang, F., Smith, H. O., Venter, J. C., Dougherty, B. A., Morrison, D. A., Hollingshead, S. K., & Fraser, C. M. (2001) Science 293, 498-506
Wan, X. F., Xu, D., Kleinhofs, A., & Zhou, J. (2004) BMC Evol. Biol. 28, 19-30
Zhang, C. T., & Chou, K. C. (1994) J. Mol. Biol. 238, 1-8
Table 1. Codon usage table
             Fraction per AA        Fraction per 1000         Total number
Codon  AA    CAI10  CAI50  CAIall   CAI10   CAI50   CAIall    CAI10  CAI50  CAIall
GCA    Ala   0.291  0.312  0.352    25.581  25.144  18.303    110    510    10551
GCC    Ala   0.106  0.078  0.108    9.302   6.311   5.598     40     128    3227
GCG    Ala   0.029  0.024  0.101    2.558   1.972   5.232     11     40     3016
GCT    Ala   0.574  0.585  0.440    50.465  47.133  22.846    217    956    13170
AGA    Arg   0.214  0.229  0.341    9.302   10.403  13.987    40     211    8063
AGG    Arg   0.011  0.007  0.128    0.465   0.296   5.249     2      6      3026
CGA    Arg   0.053  0.029  0.103    2.326   1.331   4.243     10     27     2446
CGC    Arg   0.043  0.071  0.089    1.86    3.205   3.671     8      65     2116
CGG    Arg   0.011  0.008  0.071    0.465   0.345   2.92      2      7      1683
CGT    Arg   0.668  0.657  0.268    29.07   29.779  10.982    125    604    6331
AAC    Asn   0.615  0.539  0.311    29.302  25.588  15.404    126    519    8880
AAT    Asn   0.385  0.461  0.689    18.372  21.89   34.05     79     444    19629
GAC    Asp   0.401  0.333  0.229    28.837  24.405  9.317     124    495    5371
GAT    Asp   0.599  0.667  0.771    43.023  48.809  31.33     185    990    18061
TGC    Cys   0.200  0.179  0.496    0.465   0.592   5.947     2      12     3428
TGT    Cys   0.800  0.821  0.504    1.86    2.712   6.032     8      55     3477
CAA    Gln   0.901  0.913  0.691    31.628  29.434  29.627    136    597    17079
CAG    Gln   0.099  0.087  0.309    3.488   2.81    13.279    15     57     7655
GAA    Glu   0.930  0.943  0.834    65.349  65.03   35.541    281    1319   20488
GAG    Glu   0.070  0.057  0.166    4.884   3.895   7.079     21     79     4081
GGA    Gly   0.090  0.085  0.229    7.209   6.508   10.615    31     132    6119
GGC    Gly   0.119  0.143  0.173    9.535   10.994  8.009     41     223    4617
GGG    Gly   0.026  0.028  0.102    2.093   2.169   4.722     9      44     2722
GGT    Gly   0.765  0.743  0.496    61.163  56.994  22.976    263    1156   13245
CAC    His   0.667  0.610  0.378    12.093  10.649  8.36      52     216    4819
CAT    His   0.333  0.390  0.622    6.047   6.804   13.73     26     138    7915
ATA    Ile   0.032  0.025  0.265    2.093   1.726   22.263    9      35     12834
ATC    Ile   0.389  0.315  0.220    25.349  21.644  18.466    109    439    10645
ATT    Ile   0.579  0.660  0.516    37.674  45.407  43.39     162    921    25013
CTA    Leu   0.020  0.027  0.129    1.628   2.169   16.056    7      44     9256
CTC    Leu   0.037  0.023  0.043    3.023   1.873   5.383     13     38     3103
CTG    Leu   0.014  0.011  0.107    1.163   0.937   13.26     5      19     7644
CTT    Leu   0.301  0.315  0.146    24.884  25.736  18.169    107    522    10474
TTA    Leu   0.361  0.362  0.332    29.767  29.532  41.241    128    599    23774
TTG    Leu   0.268  0.262  0.242    22.093  21.348  30.106    95     433    17355
AAA    Lys   0.311  0.318  0.559    21.163  23.468  46.568    91     476    26845
AAG    Lys   0.689  0.682  0.441    46.977  50.288  36.722    202    1020   21169
ATG    Met   1.000  1.000  1.000    20.93   26.426  34.24     90     536    19738
TTC    Phe   0.400  0.392  0.311    13.023  13.657  13.777    56     277    7942
TTT    Phe   0.600  0.608  0.689    19.535  21.151  30.538    84     429    17604
CCA    Pro   0.679  0.634  0.493    25.116  24.06   14.219    108    488    8197
CCC    Pro   0.038  0.014  0.065    1.395   0.542   1.863     6      11     1074
CCG    Pro   0.038  0.036  0.144    1.395   1.38    4.163     6      28     2400
CCT    Pro   0.245  0.316  0.298    9.07    11.98   8.597     39     243    4956
AGC    Ser   0.093  0.071  0.133    4.884   3.796   7.423     21     77     4279
AGT    Ser   0.182  0.189  0.221    9.535   10.058  12.318    41     204    7101
TCA    Ser   0.529  0.544  0.299    27.674  28.94   16.625    119    587    9584
TCC    Ser   0.036  0.021  0.061    1.86    1.134   3.379     8      23     1948
TCG    Ser   0.009  0.011  0.087    0.465   0.592   4.843     2      12     2792
TCT    Ser   0.151  0.163  0.198    7.907   8.677   11.038    34     176    6363
ACA    Thr   0.098  0.111  0.271    6.512   6.409   13.354    28     130    7698
ACC    Thr   0.112  0.096  0.125    7.442   5.571   6.136     32     113    3537
ACG    Thr   0.010  0.023  0.129    0.698   1.331   6.375     3      27     3675
ACT    Thr   0.780  0.770  0.475    51.86   44.569  23.412    223    904    13496
TGG    Trp   1.000  1.000  1.000    3.721   7.247   12.547    16     147    7233
TAC    Tyr   0.566  0.578  0.340    20      18.981  11.279    86     385    6502
TAT    Tyr   0.434  0.422  0.660    15.349  13.854  21.901    66     281    12625
GTA    Val   0.250  0.275  0.313    19.767  22.087  20.775    85     448    11976
GTC    Val   0.035  0.041  0.101    2.791   3.303   6.722     12     67     3875
GTG    Val   0.029  0.037  0.193    2.326   2.958   12.797    10     60     7377
GTT    Val   0.685  0.647  0.393    54.186  51.965  26.062    233    1054   15024
TAA    Stop  0.000  0.000  0.422    0.000   0.000   14.733    0      0      8493
TAG    Stop  0.000  0.000  0.322    0.000   0.000   11.239    0      0      6479
TGA    Stop  0.000  0.000  0.257    0.000   0.000   8.972     0      0      5172
Table 2. Correlation analyses
        GCall     GC1       GC2       GC3       CAI10     CAI50     CAIall    RBSall    Start     Size
rP*     0.427     0.419     0.280     0.090     0.268     0.262     0.054     -0.171    -0.080    0.240
        <.0001    <.0001    <0.0001   0.0001    <0.0001   <0.0001   0.0220    <0.0001   0.0006    <0.0001
rS*     0.411     0.410     0.283     0.070     0.144     0.152     0.042     -0.144    -0.062    0.401
        <0.0001   <0.0001   <0.0001   0.0030    <0.0001   <0.0001   0.077     <0.0001   0.008     <0.0001
rK*     0.289     0.285     0.194     0.047     0.097     0.103     0.026     -0.098    -0.043    0.276
        <0.0001   <0.0001   <0.0001   0.0025    <0.0001   <0.0001   0.1042    <0.0001   0.0064    <0.0001
Pval    4.0E-81   6.8E-78   5.5E-34   1.14E-4   4.3E-31   6.1E-30   2.2E-2    2.1E-13   6.0E-4    3.6E-25
SSR     3,124     3,149     3,520     3,788     3,546     3,556     3,808     3,707     3,795     3,599
* The first number indicates the correlation coefficient, and the second number indicates
the statistical significance.
rP    Pearson correlation coefficient
rS    Spearman correlation coefficient
rK    Kendall correlation coefficient
Pval  P-value, significance of the linear regression statistic
SSR   Sum of Square Residuals from the linear regression
Table 3. Analysis of variance between chromosomal locations
        genes  LSM        GCall     GC1        GC2        GC3       CAI10     CAI50     CAIall    RBSall     Start      Size
LeOT*   813    0.2665 A   0.3495 B  0.4688 A   0.3356 AB  0.2439 B  0.5762 A  0.5850 A  0.7728 A  -6.3774 A  556132 B   941.28 AB
LeTO    183    -0.3239 C  0.3559 A  0.4548 BC  0.3417 A   0.2708 A  0.5662 A  0.5718 B  0.7425 B  -6.4574 A  1572284 A  929.21 AB
LaOT    229    -0.4418 C  0.3487 B  0.4455 C   0.3295 B   0.2696 A  0.4383 B  0.4292 C  0.6444 C  -6.2983 A  538891 B   837.73 B
LaTO    588    0.0075 B   0.3488 B  0.4636 AB  0.3300 B   0.2513 B  0.4434 B  0.4304 C  0.6305 D  -6.2580 A  1552695 A  1021.17 A
* The first number indicates the mean, and the second number indicates the statistical
significance. Means with the same letter are not significantly different.
LeOT  leading strand, from the origin to the terminus
LeTO  leading strand, from the terminus to the origin
LaOT  lagging strand, from the origin to the terminus
LaTO  lagging strand, from the terminus to the origin
Table 4. Correlation analyses, by chromosomal location
        genes  GCall     GC1       GC2       GC3       CAI10     CAI50     CAIall    RBSall    Start     Size
LeOT*   813    0.4921    0.4658    0.3167    0.0876    0.5176    0.5118    0.1500    -0.2631   -0.0490   0.2505
               <.0001    <.0001    <.0001    0.0124    <.0001    <.0001    <.0001    <.0001    0.1629    <.0001
LeTO    183    0.3691    0.2917    0.2600    0.1601    0.1509    0.1255    -0.1053   0.0479    -0.01831  0.2604
               <.0001    <.0001    <.0001    0.0304    0.0415    0.0905    0.1560    0.5198    0.8056    0.0004
LaOT    229    0.4729    0.3920    0.3374    0.1934    -0.1523   -0.1131   -0.3368   -0.1322   -0.0872   0.44666
               <.0001    <.0001    <.0001    0.0033    0.0211    0.0876    <.0001    0.0457    0.1886    <.0001
LaTO    588    0.3710    0.3412    0.2240    0.1551    -0.2205   -0.1793   -0.3122   -0.0972   -0.0681   0.1989
               <.0001    <.0001    <.0001    0.0002    <.0001    <.0001    <.0001    0.0184    0.0992    <.0001
* The first number indicates the Pearson correlation coefficient, the second number
indicates the statistical significance.
LeOT  leading strand, from the origin to the terminus
LeTO  leading strand, from the terminus to the origin
LaOT  lagging strand, from the origin to the terminus
LaTO  lagging strand, from the terminus to the origin
[Figure 1 image not reproduced. Four histograms of number of genes (Y axis) against: %GC (series GC1, GC2, GC3, GCall), gene expression level (array LSM), CAI (series CAI10, CAI50, CAIall), and gene size (nt).]
Figure 1. Gene distribution over select parameters. The gene distribution is shown over gene expression level (top left), gene size (top right), %GC (bottom left) and codon adaptation index (bottom right). For gene expression levels, the distribution is plotted as a function of the transcription level determined by microarray experiments, namely the LSM (least squares mean), representing the median gene expression level. For gene size, the distribution is plotted as a function of the size of the gene, in nucleotides. For %GC, the distribution is plotted for each gene globally (GCall) and for each codon position, namely GC1, GC2 and GC3 for the first, second and third positions, respectively. For codon adaptation index, the distribution is plotted for all three training sets, namely CAI10 (using the 10 most highly expressed genes as training set), CAI50 (using the 50 most highly expressed genes as training set), and CAIall (using all the genes as training set).
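The GC measures named in the Figure 1 caption can be computed from a coding sequence as in the following minimal sketch. The function name and the example sequence are hypothetical, and the sketch assumes an in-frame sequence made only of A, C, G and T; values are returned as fractions, matching the 0.0 to 0.8 axis of the %GC panel.

```python
def gc_by_codon_position(cds):
    """Return (GCall, GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty coding sequence of whole codons")

    def gc_fraction(bases):
        # Share of positions that are G or C.
        return sum(base in "GC" for base in bases) / len(bases)

    # seq[0::3], seq[1::3] and seq[2::3] select the first, second and third
    # nucleotide of every codon, respectively.
    return (
        gc_fraction(seq),
        gc_fraction(seq[0::3]),
        gc_fraction(seq[1::3]),
        gc_fraction(seq[2::3]),
    )


# Hypothetical four-codon example: ATG GCG AAA TAA
gc_all, gc1, gc2, gc3 = gc_by_codon_position("ATGGCGAAATAA")
print(gc_all, gc1, gc2, gc3)
```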
[Figure 2 image not reproduced. Schematic of the chromosome showing the origin, the terminus, the leading strand (LeOT, LeTO) and the lagging strand (LaOT, LaTO).]
Figure 2. Chromosomal locations. Each strand is represented individually. The leading strand is colored in blue, while the lagging strand is colored in red. For genes on the leading strand (Le), genes from the origin to the terminus (OT) are solid blue (LeOT), whereas genes from the terminus to the origin (TO) are dashed blue (LeTO). For genes on the lagging strand (La), genes from the origin to the terminus (OT) (relative to the leading strand) are dashed red (LaOT), whereas genes from the terminus to the origin (TO) are solid red (LaTO).
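The four location labels can be assigned programmatically, as in the sketch below. It rests on assumptions that are not stated in the caption: that "leading" and "lagging" refer to the two fixed strands of the annotated sequence (encoded here as "+" and "-"), that coordinates are counted from the origin, and that OT and TO are separated at a single terminus coordinate. The function name and the terminus value in the example are hypothetical.

```python
def chromosomal_location(start, strand, terminus):
    """Label a gene LeOT, LeTO, LaOT or LaTO.

    start    -- coordinate of the first translated nucleotide, counted from the origin
    strand   -- "+" for the strand called leading here, "-" for the lagging one (assumed encoding)
    terminus -- coordinate of the replication terminus (hypothetical input)
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    strand_label = "Le" if strand == "+" else "La"
    # Genes located before the terminus are "origin to terminus" (OT),
    # genes located after it are "terminus to origin" (TO).
    region_label = "OT" if start < terminus else "TO"
    return strand_label + region_label


# Hypothetical genes on a chromosome whose terminus is assumed to lie at 1,000,000.
for start, strand in [(120_000, "+"), (1_450_000, "+"), (300_000, "-"), (1_700_000, "-")]:
    print(start, strand, chromosomal_location(start, strand, terminus=1_000_000))
```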
Figure 3. Correlations between gene expression level and intrinsic gene parameters.
[Figure 3 image not reproduced. Scatter plots of LSM (Y axis, -2 to 5) against start, CAI10, CAI50, CAIall, size and RBSall (X axes).]
Figure 3 (continued). Correlations between gene expression level and intrinsic gene parameters. All plots investigate the relationship between an intrinsic parameter (X axis) and gene expression level (Y axis). The microarray median LSM represents the gene expression level. For intrinsic parameters, the position of the first translated nucleotide is used as the “start”; the gene length in nucleotides is used as the “size”; CAI10, CAI50 and CAIall are the values of the codon adaptation index calculated for each training set; GC1, GC2 and GC3 are the GC contents of the first, second and third codon positions, respectively, while GCall represents the global GC content of a gene; RBSall represents the free energy level of the putative Shine-Dalgarno sequence found upstream of the translational start.
[Figure 3 (continued) image not reproduced. Scatter plots of LSM (Y axis, -2 to 5) against GC1, GC2, GC3 and GCall (X axes).]
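The Figure 3 caption refers to codon adaptation index values calculated for each training set. The sketch below shows one common way to compute such an index: each codon gets a weight equal to its count in the training set divided by the count of the most frequent synonymous codon, and the index of a gene is the geometric mean of the weights of its codons. This is an illustration under stated assumptions, not necessarily the procedure used for CAI10, CAI50 and CAIall: the standard genetic code, a pseudocount of 0.5 for codons absent from the training set, and the exclusion of Met, Trp and stop codons are all choices made here, and the function names are hypothetical.

```python
import math
from collections import Counter

# Standard genetic code, built in TCAG order (assumed to apply here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds):
    """Split a coding sequence into whole codons (a trailing partial codon is dropped)."""
    seq = cds.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(training_genes, pseudocount=0.5):
    """Weight of each codon relative to the most used synonymous codon in the training set."""
    counts = Counter(
        codon for gene in training_genes for codon in codons(gene) if codon in CODON_TABLE
    )
    weights = {}
    for amino_acid in set(CODON_TABLE.values()):
        if amino_acid in "*MW":
            continue  # stop codons and single-codon amino acids carry no information
        family = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        # Assumption: unseen codons receive a pseudocount instead of a zero weight.
        family_counts = {c: counts[c] if counts[c] > 0 else pseudocount for c in family}
        top = max(family_counts.values())
        for codon in family:
            weights[codon] = family_counts[codon] / top
    return weights


def cai(cds, weights):
    """Geometric mean of the codon weights over the scored codons of a gene."""
    logs = [math.log(weights[c]) for c in codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical training set: one short "highly expressed" sequence.
weights = relative_adaptiveness(["GCTGCTGCTGCC"])
print(cai("GCTGCT", weights), cai("GCTGCC", weights))
```

Changing the training set changes the weights, which is why the same gene can receive different values under CAI10, CAI50 and CAIall.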
Figure 4. Analysis of variance by chromosomal location. For each chromosomal location, namely LeOT (Leading strand, between the origin and the terminus), LeTO (Leading strand, between the terminus and the origin), LaOT (Lagging strand, between the origin and the terminus, relative to the leading strand) and LaTO (Lagging strand, between the terminus and the origin, relative to the leading strand), the mean values for gene expression level, GC-content at the third codon position (%GC3), and codon adaptation index, as determined by the first training set (CAI10), are plotted. Within each plot, data points with the same letter are not significantly different.
[Figure 4 image not reproduced. Three panels plotting mean expression level, %GC3 and CAI10 (Y axes) for the leading strand and the lagging strand, before and after the terminus. Significance letters, in the order expression level, %GC3, CAI10: LeOT A, B, A; LeTO C, A, A; LaOT C, A, B; LaTO B, B, B.]
[Figure 5 image not reproduced. Scatter plots of array LSM (Y axis, -3 to 5) against gene size and against GCall (X axes), one panel per chromosomal location (LeOT, LeTO, LaOT, LaTO).]
Figure 5. Correlations between gene expression level and intrinsic gene parameters, by chromosomal location.
[Figure 5 (continued) image not reproduced. Scatter plots of array LSM (Y axis, -3 to 5) against CAI10 (X axis), one panel per chromosomal location (LeOT, LeTO, LaOT, LaTO).]
Figure 5 (continued). Correlations between gene expression level and intrinsic gene parameters, by chromosomal location. The relationships between gene expression level (array LSM) and three intrinsic parameters (gene size, GCall and CAI10) are plotted for each location, namely LeOT, LeTO, LaOT and LaTO, as specified previously.
[Figure 6 image not reproduced. Histograms of number of genes (Y axis) against GC3, gene size (bp), CAIall, CAI10, %GCall and gene expression level (LSM), each with one series per chromosomal location (LaTO, LeTO, LeOT, LaOT).]
Figure 6. Gene distribution over select parameters, by chromosomal location. The gene distribution over select parameters, namely gene expression level, %GCall, gene size, CAI10, CAIall and GC3 is plotted, for each chromosomal location, namely LeOT, LeTO, LaOT, LaTO, as specified previously.
APPENDIX I – Functional and comparative genomic analyses of an operon involved in fructooligosaccharides utilization by Lactobacillus acidophilus