CHAPTER I – Literature review

Abstract

BARRANGOU, RODOLPHE. Functional genomic analyses of carbohydrate utilization

by Lactobacillus acidophilus. (Under the direction of Professor Todd R. Klaenhammer).

Carbohydrates are a primary source of energy for microbes. Specifically, lactic

acid bacteria have the ability to utilize a variety of nutrients available in their respective

habitats. For probiotic microbes inhabiting the human gastrointestinal tract, the ability to

utilize sugars not digested by the host plays an important role in their survival.

Lactobacillus acidophilus is a probiotic organism which can utilize a variety of mono-,

di- and poly-saccharides, including prebiotic compounds such as fructooligosaccharides

and raffinose. However, little information is available about the mechanisms and genes

involved in carbohydrate utilization by lactobacilli. The transport and catabolic

machinery involved in utilization of glucose, fructose, sucrose, FOS, raffinose, lactose,

galactose and trehalose was characterized using global transcriptional profiling.

Microarray hybridizations were carried out using a round-robin design, and data were analyzed

using a two-stage mixed model ANOVA. Genes differentially expressed between

treatments were visualized by hierarchical clustering, volcano plots, and 3-way contour

plots. Globally, a small number of genes were highly induced, including a variety of

carbohydrate transporters and sugar hydrolases. Members of the phosphoenolpyruvate

sugar phosphotransferase system (PTS) family of transporters were identified for uptake

of glucose, fructose, sucrose and trehalose. In contrast, transporters of the ATP binding

cassette (ABC) family were identified for uptake of FOS and raffinose. A member of the

LacS family of galactoside-pentose-hexuronide (GPH) translocators was identified for

uptake of galactose and lactose. Saccharolytic enzymes likely involved in the metabolism

of mono-, di- and poly- saccharides were also identified, including the enzymatic

machinery of the Leloir pathway. Insertional inactivation of genes encoding sugar

transporters and hydrolases confirmed microarray results. Quantitative RT-PCR was also

used to confirm differential gene expression. Additional transcription experiments

showed specific induction of genes encoding sugar transporters and hydrolases, and

transcriptional repression by glucose. Collectively, microarray data revealed coordinated

and regulated transcription of genes involved in sugar utilization based on carbohydrate

availability, likely via carbon catabolite repression.

The relationships between gene expression level, codon usage, chromosomal

location and intrinsic gene parameters were investigated globally. Gene expression levels

correlated most highly with GC content, codon adaptation index and gene size. In

contrast, gene expression levels did not correlate with GC content at the third codon

position. Perhaps the high correlation between GC content and gene expression is due to

the low genomic GC composition of L. acidophilus. Analysis of variance was used to

investigate the impact of chromosomal location on gene expression after data were

segregated into four groups, by strand and orientation relative to the origin and terminus

of replication. Results showed genes on the leading strand were more highly expressed.

Also, genes pointing toward the terminus of replication showed higher expression levels.

This preference allows for co-directional replication and transcription. Collectively,

results showed a strong influence of chromosomal architecture, GC content and codon

usage on gene transcription.

Globally, analysis of gene expression in Lactobacillus acidophilus revealed

orchestrated transcription, and adaptation to environmental conditions. Specifically,

dynamic adaptation to carbohydrate sources available in the environment might

contribute to competition with other commensal microbes for the limited nutrient sources

available in the human gastrointestinal tract.

FUNCTIONAL GENOMIC ANALYSES OF CARBOHYDRATE

UTILIZATION BY LACTOBACILLUS ACIDOPHILUS

by

RODOLPHE BARRANGOU

A dissertation submitted to the Graduate Faculty of North Carolina State University

in partial fulfillment of the requirements for the Degree of

Doctor of Philosophy

FUNCTIONAL GENOMICS

Raleigh

2004

APPROVED BY:

Dr. Todd R. Klaenhammer, Chairman of Advisory Committee

Dr. Greg Gibson

Dr. Robert M. Kelly

Dr. Dahlia M. Nielsen

Biography

Rodolphe Barrangou, the son of Charles Barrangou-Poueys and Roseline Helie, was born

on July 20, 1975 in Caen, France and raised in Paris, France. He attended the University

of Rene Descartes, Paris V (France) between 1994 and 1996 where he obtained a degree

in Life Sciences. He also attended the University of Technology of Compiegne (France)

between 1996 and 1998 where he obtained an M.S. degree in Biological Engineering. In

January 1999, he began working towards a Master of Science in Food Science at North

Carolina State University (USA) in the Vegetable Fermentation Laboratory (USDA-

ARS) under the direction of Dr. Henry P. Fleming and Dr. Todd R. Klaenhammer. In

January 2001, he began working towards a Ph.D. in Functional Genomics at North

Carolina State University (USA) in the Southeast Dairy Foods Research Center under the

direction of Dr. Todd R. Klaenhammer.

Acknowledgements

First and foremost, I would like to thank my advisor, Dr. Todd R. Klaenhammer for

giving me the opportunity to pursue another graduate degree at NC State, for his time,

supervision, guidance, availability and support throughout my graduate education. I also

wish to acknowledge Dr. Greg Gibson, Dr. Robert M. Kelly, and Dr. Dahlia Nielsen, for

serving on my advisory committee, giving me time outside of committee meetings, and

insightful discussions. Also, I would like to acknowledge all my co-workers and

collaborators within the “Klaenhammer lab”, especially Evelyn Durmaz, Dr. Andrea

Azcarate Peril, Dr. Eric Altermann, and Tri Duong, for technical help, sharing their

expertise and suggestions. I would also like to thank my other collaborators on campus, at

the GRL (Dr. Bryon Sosinski and Regina Brierley), for providing help with microarray

printing and scanning; in the Microbiology Department (Dr. Jose Bruno-Barcena and Dr.

Hosni Hassan), for providing help with Q-PCR; and my collaborators in the Bioinformatics

Program, namely Shannon Conners and Joshua Starmer for collaborating with me. I

would also like to acknowledge Dr. Barbara Sherry and Dr. Stephanie Curtis for their

leadership in the Functional Genomics program. I would like to dedicate my work to my

whole family for teaching me everything that I need to know, and for understanding my

need to go overseas. I would also like to acknowledge my friends Tri and Mike for

making my experience in the lab (and beyond) particularly enjoyable. Finally, I would

like to give a very special and personal thank you to my wife Lisa, for her patience,

understanding, and permanent support throughout my graduate career, for helping me

make the right decisions, understand what is important, and sharing everything in my life.

Table of contents

LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER I – LITERATURE REVIEW: TRANSPORT SYSTEMS IN LACTIC ACID BACTERIA
1.1 INTRODUCTION
1.2 THE LACTIC ACID BACTERIA
1.3 GENOMICS OF LACTIC ACID BACTERIA
1.4 FERMENTATION CAPABILITIES OF LACTIC ACID BACTERIA
1.5 ABC TRANSPORTERS
1.6 PTS TRANSPORTERS
1.7 OTHER TRANSPORTERS
1.8 REGULATION AND CARBON CATABOLITE REPRESSION
1.9 CONCLUSIONS AND PERSPECTIVES
1.10 REFERENCES

CHAPTER II – FUNCTIONAL AND COMPARATIVE GENOMIC ANALYSES OF AN OPERON INVOLVED IN FRUCTOOLIGOSACCHARIDE UTILIZATION BY LACTOBACILLUS ACIDOPHILUS
2.1 ABSTRACT
2.2 INTRODUCTION
2.3 MATERIALS AND METHODS
  2.3.1 Bacterial strain and media used in this study
  2.3.2 Computational analysis of the putative msm operon
  2.3.3 RNA isolation and analysis
  2.3.4 Comparative genomic analyses
  2.3.5 Phylogenetic trees
  2.3.6 Gene inactivation
2.4 RESULTS
  2.4.1 Computational analysis of the msm operon
  2.4.2 Sugar induction and co-expression of contiguous genes
  2.4.3 Mutant phenotype analysis
  2.4.4 Comparative genomic analyses and locus alignments
  2.4.5 Phylogenetic trees
  2.4.6 Catabolite response elements (cre) analysis
2.5 DISCUSSION
2.6 REFERENCES

CHAPTER III – GLOBAL ANALYSIS OF CARBOHYDRATE UTILIZATION AND TRANSCRIPTIONAL REGULATION IN LACTOBACILLUS ACIDOPHILUS USING WHOLE-GENOME cDNA MICROARRAYS
3.1 ABSTRACT
3.2 INTRODUCTION
3.3 MATERIALS AND METHODS
  3.3.1 Bacterial strain and media used in this study
  3.3.2 RNA isolation
  3.3.3 Microarray fabrication
  3.3.4 cDNA target preparation and microarray hybridization
  3.3.5 Microarray data collection and analysis
  3.3.6 Real-Time Quantitative RT-PCR
3.4 RESULTS
  3.4.1 Differentially expressed genes
  3.4.2 Real-Time Quantitative RT-PCR
3.5 DISCUSSION
3.6 REFERENCES

CHAPTER IV – GLOBAL CHARACTERIZATION OF THE LACTOBACILLUS ACIDOPHILUS TRANSCRIPTOME AND ANALYSIS OF RELATIONSHIPS BETWEEN GENE EXPRESSION LEVEL, CODON USAGE, CHROMOSOMAL LOCATION AND INTRINSIC GENE CHARACTERISTICS
4.1 ABSTRACT
4.2 INTRODUCTION
4.3 MATERIALS AND METHODS
  4.3.1 Genome and microarray data
  4.3.2 Gene intrinsic parameters
  4.3.3 Codon adaptation index
  4.3.4 Ribosome binding site identification
  4.3.5 Statistical analyses
4.4 RESULTS
  4.4.1 Distribution patterns
  4.4.2 Correlation analyses
  4.4.3 Chromosomal location
4.5 DISCUSSION
4.6 REFERENCES

APPENDIX I – FUNCTIONAL AND COMPARATIVE GENOMIC ANALYSES OF AN OPERON INVOLVED IN FRUCTOOLIGOSACCHARIDE UTILIZATION BY LACTOBACILLUS ACIDOPHILUS

List of tables

Chapter I

1. Genomes of lactic acid bacteria and other probiotic species
2. Carbohydrate utilization profiles for select lactic acid bacteria
3. Transmembrane domains in L. acidophilus transporters

Chapter II

1. Catabolite responsive elements sequences
2. Primers used in this study
3. Genes and proteins used for comparative genomic analyses

Chapter IV

1. Codon usage table
2. Correlation analyses
3. Analysis of variance between chromosomal locations
4. Correlation analyses, by chromosomal location

List of figures

Chapter I

1. Phylogenetic tree of lactic acid bacteria and select microbial species
2. Transporters commonly found in lactic acid bacteria
3. Transmembrane domains in ABC, PTS and GPH transporters in L. acidophilus

Chapter II

1. Operon layout
2. Sugar induction and repression
3. Growth curves
4. Operon architecture analysis
5. Neighbor-joining phylogenetic tree
6. Co-expression of contiguous genes
7. Mutant growth on select carbohydrates
8. Motifs highly conserved amongst repressors and fructosidases
9. Biochemical pathways

Chapter III

1. Round-robin microarray hybridization design
2. Hierarchical clustering analyses of gene expression patterns
3. Hierarchical clustering analyses of gene expression patterns for select genes and operons
4. Volcano plot comparison of gene expression between FOS and raffinose
5. Contour plot comparison of gene expression between FOS, raffinose and trehalose
6. Global differential gene expression
7. Gene fold induction
8. RT-Q-PCR analysis of differentially expressed genes
9. Genetic loci of interest
10. Lactose locus in select lactobacilli
11. Catabolite responsive elements sequences
12. Carbohydrate utilization in L. acidophilus
13. Expression of glycolysis genes

Chapter IV

1. Gene distribution over select parameters
2. Chromosomal locations
3. Correlations between gene expression level and intrinsic genes parameters
4. Analysis of variance, by chromosomal location
5. Correlations between gene expression level and intrinsic genes parameters, by chromosomal location
6. Gene distribution over select parameters, by chromosomal location

List of abbreviations

ABC    ATP Binding Cassette
ANOVA  ANalysis Of Variance
CAI    Codon Adaptation Index
CCR    Carbon Catabolite Repression
CH     CHaperone proteins
CRE    Catabolite Responsive Element
DNA    Deoxyribo Nucleic Acid
EC     Enzyme Commission
FOS    Fructo Oligo Saccharides
GIT    Gastro Intestinal Tract
GPH    Galactoside Pentose Hexuronide
LaOT   Lagging strand, between the Origin and Terminus
LaTO   Lagging strand, between the Terminus and Origin
LeOT   Leading strand, between the Origin and Terminus
LeTO   Leading strand, between the Terminus and Origin
LGT    Lateral Gene Transfer
LSM    Least Squares Means
MSM    Multiple Sugar Metabolism
NCFM   North Carolina Food Microbiology
NDO    Non Digestible Oligosaccharides
ORF    Open Reading Frame
PCR    Polymerase Chain Reaction
PEP    Phospho Enol Pyruvate
PHX    Predicted Highly eXpressed
PTS    Phosphoenolpyruvate Transferase System
RBS    Ribosome Binding Site
RNA    Ribo Nucleic Acid
RP     Ribosomal Proteins
RSCU   Relative Synonymous Codon Usage
SD     Shine Dalgarno
TF     Transcription and Translation Factors

CHAPTER I - Literature review: Transport systems in Lactic Acid Bacteria

1.1 Introduction

Bacteria are a dominant and diverse life form on earth. Molecular comparisons

between life forms divide organisms into three groups, namely eubacteria, archaebacteria

and eukaryotes (Woese et al., 1990). At the molecular level, those three groups are based

on differences within the ribosomal RNA (rRNA) structure and sequence (Woese et al.,

1990). This triad-nomenclature includes the eukaryote-prokaryote dichotomy, which is

based on presence / absence of a nucleus. Specifically, life on earth is divided into three

“domains”, namely Bacteria (replacing eubacteria), Archaea (replacing archaebacteria)

and Eucarya (replacing eukaryotes) (Woese et al., 1990; Embley et al., 1994), wherein

there are six “kingdoms”, bacteria, fungi, plantae, animalia, protoctista (protozoa) and

chromista (Embley et al., 1994; Margulis, 1996; Cavalier-Smith, 2004). Both archaea and

bacteria are monohomogenomic, with no nucleus, whereas eucarya are

polyheterogenomic and contain a nucleus (Margulis, 1996).

The importance of microbes for all life-forms has been illustrated recently. Recent

phylogenetic analyses suggest the eukaryotic genome actually resulted from the fusion of

an archaeal genome with a bacterial genome (Margulis, 1996; Rivera and Lake, 2004),

consequently changing the tree of life into the ring of life (Rivera and Lake, 2004). This

recent theory emphasizes the historical and evolutionary importance of the bacterial

kingdom.

Prokaryotic diversity and predominance illustrate the physiological flexibility of

microbes, as well as their adaptability to many environments. A recent metagenomic

oceanic study investigating microbial genome diversity within a water community

illustrates our limited knowledge and comprehension of microbial diversity and

physiological properties (Venter et al., 2004), a limitation that earlier measures of

microbial diversity had already revealed (Curtis et al.,

2002; Curtis and Sloan, 2004). The limited extent of our knowledge of microbial diversity is well

documented, and most environmental studies end up uncovering novel species and

lineages (Embley et al., 1996; Cavalier-Smith, 2004). Recent “conservative” assumptions

estimate microbial diversity at over 10^30 individuals representing over 10^7 species

(Embley et al., 1994; Curtis et al., 2002; Curtis and Sloan, 2004), although estimates of

microbial diversity may be inaccurate. Nevertheless, recent advances in microbial

genomics have shown microbial diversity at many levels, especially for microbes that can

be cultured. Specifically, microbial diversity is visible both within and between species,

including differences in genome size, genome content, GC content, codon usage, mobile

genetic elements, cell shape, occurrence in the environment, growth conditions

(temperature, oxygen level, energy sources), and many others.

From a genomic standpoint, microbial diversity is visible through genome size,

GC content (ranging between 25% and 75%; Muto and Osawa, 1987), codon usage

(Grantham et al., 1980), genome content, and occurrence of bacteriophage and plasmids.

Although the differences can be overwhelming and represent a large proportion of the

genome, even within a given species, those differences illustrate physiological adaptation

to various environmental conditions. Specifically, microbes tend to adapt to their

environment via evolutionary pressures, in order to optimize their survival and

competitiveness.
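
As an illustration of two of the genomic parameters mentioned above, the following minimal Python sketch computes the overall GC content of a nucleotide sequence and the GC content at the third codon position (GC3) of a coding sequence. The function names and the example sequence are hypothetical and are not taken from the analyses reported in this work; the sketch assumes an in-frame coding sequence containing only unambiguous bases.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a nucleotide sequence.

    Returns 0.0 for an empty sequence. Ambiguous bases (e.g. N) are
    counted in the length but not as G or C.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_content(cds: str) -> float:
    """Return the GC fraction at the third position of each complete codon.

    Assumes the coding sequence starts in frame; trailing bases that do
    not form a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3      # length covered by complete codons
    third_positions = cds[2:usable:3]     # bases at codon positions 3, 6, 9, ...
    return gc_content(third_positions)


if __name__ == "__main__":
    example = "ATGGCCTAA"  # hypothetical three-codon sequence: ATG GCC TAA
    print(f"GC content:  {gc_content(example):.3f}")
    print(f"GC3 content: {gc3_content(example):.3f}")
```

Applied gene by gene across a genome, values of this kind could be compared between organisms or set against other gene-level measurements; codon usage statistics would require a separate codon-counting step that is not shown here.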

Interestingly, genes encoding sugar transporters and carbohydrate hydrolases can

represent a large proportion of strain-specific genes, with ABC transporters reported to have

the highest horizontal gene transfer frequency in Thermotoga maritima (Nesbo et al.,

2002). Similarly, it has been suggested that genes involved in catabolic properties of B.

longum (Schell et al., 2002) and sugar uptake genes in L. plantarum (Kleerebezem et al.,

2003) have been acquired via horizontal gene transfer, as part of the adaptation process of

these bacteria to their respective environments.

Understanding how microbes modulate their genomes to acquire physiological

properties and phenotypic traits that further their ability to withstand environmental

conditions and utilize resources available in their various habitats is important.

Specifically, for lactic acid bacteria, this review illustrates how various transport systems

contribute to their ability to utilize a diversity of energy sources available in a number of

habitats.

1.2 The lactic acid bacteria

Lactic acid bacteria (LAB) are a heterogeneous family of microbes which can

ferment a variety of nutrients (Poolman, 2002) primarily into lactic acid. LAB are mainly

Gram-positive, non-sporulating, acid tolerant, anaerobic bacteria divided into two subsets,

the low GC taxa, and the high GC taxa. Biochemically, lactic acid bacteria include both

homofermenters and heterofermenters. The former produce primarily lactic acid, while

the latter also yield a variety of fermentation by-products, mostly acetic acid,

ethanol, carbon dioxide and formic acid (Hugenholtz et al., 2002; Kleerebezem and

Hugenholtz, 2003). Although their primary contribution consists of the rapid formation of

lactic acid, which results in acidification of food products, they also contribute to flavor,

texture and nutrition in a variety of food products (Kleerebezem and Hugenholtz, 2003).

Environmentally, LAB reside in a variety of habitats, including human cavities

such as the gastrointestinal tract (Lactobacillus plantarum, Lactobacillus acidophilus,

Lactobacillus johnsonii, Bifidobacterium longum, Streptococcus agalactiae,

Enterococcus faecalis), the oral cavity (S. mutans, B. longum), the respiratory tract (S.

pneumoniae) and the vaginal cavity (B. longum, S. agalactiae) (Tannock, 1999;

Ouwehand et al., 2002; Vaughan et al., 2002). Additionally, lactic acid bacteria are

naturally found in a variety of environmental niches including dairy, meat, vegetable and

plant environments (Kleerebezem et al., 2003).

The two driving forces behind the tremendous amount of work performed in lactic

acid bacteria are their use in fermentation processes and as probiotics. Specifically, a

diversity of microbial strains is used as starter cultures in the food industry, primarily in

dairy applications, although Lactococcus lactis is by far the best characterized lactic acid

bacterium (Bolotin et al., 1999). Additionally, select strains are used as health-promoting

probiotics in food products and dietary supplements (Gibson and Roberfroid, 1995; Reid,

1999).

In fermentation processes, lactic acid bacteria are used as starter cultures.

Although they are used in fermentation of meats, vegetables and wine, they are primarily

used in dairy processes. Specifically, they are widely used in cheese and yogurt

manufacturing. As a result, Lactococcus lactis is perhaps the most extensively studied

species among LAB, and a variety of genetic tools have been developed therein (Bolotin

et al., 1999; Hugenholtz et al., 2002; Kleerebezem and Hugenholtz, 2003).

Probiotics are generally defined as “live microorganisms which, when

administered in adequate amounts, confer a health benefit on the host” (Reid et al., 2003).

Probiotic microbes promote health via their presence and sometimes residence in the

human gastrointestinal tract, and interaction with the intestinal flora and host tissue. As a

result, phenomena such as adherence to human epithelial cells, survival at low pH,

resistance to acids, survival in the presence of bile salts, and competition with other

commensals all contribute to their ability to survive and promote human health (Sanders

and Klaenhammer, 2001). However, those functionalities rely on the survival and

competitiveness of the strain, which is dependent upon its ability to efficiently use

nutrient sources available in the intestinal environment. As a result, transporters are a key

factor involved in probiotic functionality. Lactic acid bacteria generally harbor a

significant number of transporters for acquisition of a diverse set of carbohydrates and

amino acids.

Similarly, organisms used in fermentation applications need to use energy sources

available in their environment in order to carry out the desired metabolic processes. As a

result, uptake of nutrients, particularly carbohydrates, is essential for fermentative LAB.

Therefore, identification and characterization of their transport systems is essential to

develop our understanding of the physiological processes involved in their

functionalities.

Although a large diversity of microbes produce lactic acid, only select members

of the lactic acid bacteria are widely used in fermentation processes and probiotic

applications. The primary genera employed are: Lactococcus, Lactobacillus,

Streptococcus, Bifidobacterium, and to a lesser extent Leuconostoc, and Oenococcus.

Additionally, within those genera, most of the work has focused on only a few select

species, as shown in Table 1.

A large and diverse microbial community resides in the human gastrointestinal tract

(Tannock, 1999). In particular, the complex microbial population in the intestine includes

beneficial bacteria such as bifidobacteria and lactobacilli (Tannock, 1999; Ouwehand et

al., 2002; Vaughan et al., 2002). Although they are not dominant microbes, probiotics are

important organisms that can promote health in a variety of mucosal locations, including

the human intestine. In humans, lactobacilli and bifidobacteria in particular, are perceived

as exerting health-promoting properties (Gibson and Roberfroid, 1995; Ouwehand et al.,

2002). A variety of health-promoting functionalities have been widely documented

in humans for Lactobacillus

species (Reid, 1999; Sanders and Klaenhammer, 2001). The large intestine in particular is

the most heavily colonized region of the human digestive tract (Gibson and Roberfroid,

1995). The colonic microbiota feeds on the unabsorbed remains of the diet, which

primarily consist of non-digestible sugars (Alles et al., 1996). Even though microbes have

a limited capacity to utilize substrates present in the environment, some bacteria have a

diverse genomic makeup shaped by evolution and adaptation that is selectively fashioned

to utilize and catabolize a wide range of nutrients present in their environmental niche.

Consequently, a wide carbohydrate catabolic potential likely allows microbes to compete

and survive in environmental niches where sugar molecules are scarce, as previously

suggested for Lactobacillus plantarum (Kleerebezem et al., 2002), Lactobacillus

acidophilus (Barrangou et al., 2003; Altermann, 2004), Lactobacillus johnsonii

(Pridmore et al., 2004) and Bifidobacterium longum (Schell et al., 2004). The ability of

select intestinal microbes to utilize intestinal nutrients, including substrates not digested

by the host, plays an important role in their ability to successfully survive and colonize the

mammalian intestinal tract. Whether they are fermentative organisms, or health-

promoting probiotics, microbial growth is primarily dependent upon energy sources such

as carbohydrates.

1.3 Genomics of lactic acid bacteria

In the recent past, substantial progress has been achieved in microbial genomics,

particularly in genome sequencing. To date, over 193 complete microbial genomes have

been published (NCBI website, www.ncbi.nlm.nih.gov/genomes/MICROBES/

complete.html), including 174 bacteria and 19 archaea, covering a wide diversity of

taxonomic groups. Early microbial genome analyses suggest that genome content reflects

adaptation to environmental conditions, specifically genes involved in transport and

catabolism of nutrients, since microbes shape their genomes to efficiently utilize

available resources and adapt to their habitats, according to temperature, levels of

oxygen, toxic compounds, and other factors.

The genome sequences of several lactic acid bacteria have been published,

including Lactococcus lactis (Bolotin et al., 1999), S. mutans (Ajdic et al., 2002), S.

pneumoniae (Tettelin et al., 2001), S. agalactiae (Tettelin et al., 2002), S. pyogenes

(Ferretti et al., 2001), Bifidobacterium longum (Schell et al., 2002), Lactobacillus

plantarum (Kleerebezem et al., 2003), L. johnsonii (Pridmore et al., 2004) and L.

acidophilus (Altermann et al., 2004). Several more are underway (Klaenhammer et al.,

2002; Siezen et al., 2004). For these LAB, probiotic organisms and other intestinal

microbes, genome features are presented in Table 1.

Lactic acid bacteria include both low GC organisms (lactobacilli, streptococci, lactococci)

and high GC organisms (bifidobacteria, brevibacteria) (Table 1). LAB genomes vary

widely in size (between 1.8 and 4.4 Mbp), although most genomes are between 1.8 and

2.5 Mbp (Table 1). Genetically, LAB are diverse, as illustrated in Figure 1, including

high GC genera such as Bifidobacterium and Brevibacterium, and distinct low GC genera.

Among the latter, Leuconostoc and Oenococcus seem distant from other LAB (Figure 1). In contrast,

streptococci and lactococci appear closely related, as well as lactobacilli and pediococci

(Figure 1).

Recent genome analyses have shown that bifidobacteria, streptococci and

lactobacilli possess specialized saccharolytic potentials which reflect the nutrient

availability in their respective environments (Tettelin et al., 2001; Ajdic et al., 2002;

Schell et al., 2002; Kleerebezem et al., 2003; Altermann et al., 2004; Pridmore et al.,

2004). Analysis of the L. plantarum genome revealed a variety of transporters, suggesting

a broad capacity to adapt to varying environmental conditions (Kleerebezem et al., 2003).

In particular, a “lifestyle adaptation island” bearing genes involved in sugar transport and

metabolism was defined on the chromosome (Kleerebezem et al., 2003). Similarly, the

diversity of transporters in S. mutans and S. pneumoniae has been associated with an

increased ability to utilize nutrient sources present in their environments, namely the oral

cavity and respiratory tract (Tettelin et al., 2001; Ajdic et al., 2002). The L. acidophilus

NCFM genome was also recently determined, and further substantiates these

observations (Altermann et al., 2004). Early analyses indicate that the genome contents of

bifidobacteria and lactobacilli reflect their habitats, particularly with regard to transport

systems able to utilize a variety of carbohydrates. In silico analyses of the genes encoded

in these genomes provide insight as to their fermentative and uptake capabilities. In

particular, a variety of putative carbohydrate transporters have been identified, suggesting

a wide saccharolytic potential for most of these microbes, especially with regard to

mono- and di-saccharides. However, most of the substrates for ABC transporters, and

some of the substrates for PTS transporters remain unknown (Altermann et al., 2004).

This is not uncommon, since a large portion of the content of microbial genomes remains

obscure, even for model organisms, consisting of unknown ORFs and conserved genes

encoding hypothetical proteins.

Within LAB, a diverse saccharolytic potential has previously been associated with

microbial ability to establish residency in specific environmental niches, in particular

adaptation of Bifidobacterium longum to the human gastro-intestinal tract (GIT) (Schell

et al., 2002), cariogenic activity of Streptococcus mutans in the oral cavity

(Vadeboncoeur and Pelletier, 1997; Ajdic et al., 2002), and the incidence of Lactobacillus

plantarum in a variety of environmental niches (Kleerebezem et al., 2003). Perhaps a

diverse catabolic potential is derived from environmental pressures, in response to

competition for scarce nutrients in the intestinal ecosystem (Schell et al., 2002;

Barrangou et al., 2003) and in the mouth cavity (Vadeboncoeur and Pelletier, 1997; Ajdic

et al., 2002). Although energy sources in the environment are vital for survival, the

capacity to take them up efficiently can result in a competitive advantage. Therefore,

understanding the transportomes of microbes is expected to provide insight into their

respective abilities to survive and compete within their natural habitats. The classification

of transporter families encoded within a genome, and the identification of the uptake

systems provides a platform for understanding which resources are used by a specific

microorganism. Although only a few families of transporters are well characterized

in prokaryotes, each family contains a diverse set of uptake systems with

varying substrate specificities.

This overview will describe the main families of transporters identified in lactic

acid bacteria, categorize within each family the uptake systems that are well

characterized, and investigate the diversity of transport capabilities within and between

organisms. Specifically, the capability of LAB to utilize a variety of carbohydrates via

the PTS and ABC superfamilies of transporters will be reviewed.

1.4 Fermentation capabilities of lactic acid bacteria

There are different means by which carbohydrates are utilized by bacteria: either

they are hydrolyzed outside of the cell into readily fermentable sugars and transported

into the cell thereafter, or they are transported into the cell and then catabolized. Either

way, carbohydrates have to be transported into the cell in order to be catabolized and

used as an energy source.

Although early analyses of LAB genomes have specifically looked at

utilization of carbohydrates, the actual substrates for the majority of the transporters

identified remain unknown. Additionally, the classification of transporters into specific

families, and attribution of a specific substrate, derived from in silico analyses, remains

largely putative. Nevertheless, the comparison of both fermentation patterns and genomic

content provides substantial insight into the transport abilities of LAB.

The fermentation profiles for L. acidophilus, L. johnsonii, L. gasseri and L.

plantarum are shown in Table 2. Additionally, detailed transporter annotations are

available for S. mutans, S. pneumoniae, L. lactis and L. acidophilus (Table 2). It appears

that most LAB have the ability to utilize a variety of mono- and di-saccharides,

specifically, hexoses such as fructose, glucose, galactose and mannose, and disaccharides

such as cellobiose, lactose, maltose, sucrose and trehalose (Table 2). In contrast,

utilization discrepancies are observed between LAB for pentoses, oligosaccharides, sugar

alcohols, deoxysugars and modified sugars (Table 2). Globally, it appears that LAB are

specialized for utilization of hexoses and disaccharides, and select species have gained

the ability to utilize more complex carbohydrates individually. This is consistent with

previous findings in LAB suggesting that L. plantarum, L. johnsonii and L. lactis appear

to ferment mainly mono-, di- and tri-saccharides (Siezen et al., 2004).

For the intestinal LAB, a limited number of species have the ability to transport

undigested complex carbohydrates, including prebiotics. Prebiotics are defined as “non-

digestible substances that provide a beneficial physiological effect on the host by

selectively stimulating the favorable growth or activity of a limited number of indigenous

bacteria” (Reid et al., 2003). These compounds include non-digestible plant

oligosaccharides such as FOS and raffinose (Van Laere et al., 2000; Rycroft et al., 2001).

Among LAB, L. acidophilus, L. plantarum, L. casei, S. thermophilus and a variety of

bifidobacteria have the ability to utilize FOS (Kaplan and Hutkins, 2000).

There are three primary families of transporters in LAB that have been identified

for sugar transport: (i) secondary active transport via the major facilitator superfamily

(MFS); (ii) the phosphoenolpyruvate:sugar phosphotransferase system (PTS); and (iii) the ATP binding

cassette (ABC) transport system (Paulsen et al., 1998; Paulsen et al., 2000; Saier, 2000;

Kaplan and Hutkins, 2003).

1.5 ABC transporters

The ABC superfamily (TC #3.1) is a diverse family of transporters which includes

both importers and exporters (Saier, 2000; Davidson and Chen,

2004), whereby substrate translocation is coupled with adenosine triphosphate (ATP)

hydrolysis (Locher et al., 2002). TC numbers represent categories of the Transport

Commission classification (Saier, 2000). ABC transporters are a dominant transporter

superfamily in bacteria (Paulsen et al., 2000), and they are the most abundant class of

primary transport systems in lactic acid bacteria (Poolman, 2002). ABC transporters are

the dominant transporter family in L. plantarum, wherein 57 complete systems were

annotated (Kleerebezem et al., 2003), in S. mutans where over 60 ABC transporters are

hypothetically present (Ajdic et al., 2002) and in S. pneumoniae where over 30% of

transporters are predicted to be sugar transporters. Although ABC transporters recognize

a variety of substrates, in LAB, ABC uptake transporters primarily recognize

carbohydrates. In contrast, in B. longum, most of the 25 ABC transporters seem to have

specificity for oligopeptides and amino acids (Schell et al., 2002). For most LAB,

members of the ATP binding cassette (ABC) family of transporters include uptake

proteins identified primarily for the transport of mono-, di-, tri- and poly- saccharides.

Specifically, ABC transporters have been characterized for the transport of maltose,

trehalose, lactose, arabinose, ribose, glucose, fucose, raffinose, and a variety of peptides.

ABC transporters usually consist of several subunits, namely the nucleotide

binding domains (NBDs), the membrane spanning domains (MSDs), and substrate

binding proteins (SBPs) (Quentin et al., 1999; Braibant et al., 2000; Poolman, 2002). The

minimum “complete” ABC transporter must include both a nucleotide binding domain

and a membrane spanning domain. Importers are usually pentameric, including two

NBDs, two MSDs and one SBP, whereas exporters are tetrameric, including two NBDs

and two MSDs (Braibant et al., 2000) (Figure 2).
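
The subunit rules above can be sketched as a small classifier. This is a minimal illustration only: the function name, the string labels "NBD", "MSD" and "SBP", and the returned category names are hypothetical choices made for this example. The rules themselves (a complete system needs at least one NBD and one MSD; importers are usually two NBDs, two MSDs and one SBP; exporters are two NBDs and two MSDs) are taken from the text, and real systems may deviate from them.

```python
from collections import Counter


def classify_abc_complex(subunits):
    """Classify an assembled ABC complex from a list of subunit labels.

    Labels are "NBD", "MSD" and "SBP" (a hypothetical encoding for this sketch).
    """
    counts = Counter(subunits)
    nbd, msd, sbp = counts["NBD"], counts["MSD"], counts["SBP"]

    # A "complete" transporter needs at least one NBD and one MSD.
    if nbd == 0 or msd == 0:
        return "incomplete"
    # Typical importer: pentameric, two NBDs, two MSDs and one SBP.
    if (nbd, msd, sbp) == (2, 2, 1):
        return "typical importer"
    # Typical exporter: tetrameric, two NBDs and two MSDs, no SBP.
    if (nbd, msd, sbp) == (2, 2, 0):
        return "typical exporter"
    return "complete, atypical stoichiometry"


if __name__ == "__main__":
    print(classify_abc_complex(["SBP", "MSD", "MSD", "NBD", "NBD"]))
```

The function counts subunits in the assembled complex, not genes in a locus. A locus may encode a single NBD gene whose product is present twice in the complex, so gene lists would need to be converted to subunit counts before being passed in.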

There are many sub-families of ABC transporters, which are classified by the

nature of the substrate being translocated, including peptides, amino-acids, drugs,

antibiotics, iron, ions, and carbohydrates (Braibant et al., 2000). For importers, ABC

transporters involved in the uptake of carbohydrates are a key sub-family. Specifically,

most carbohydrate ABC transporters are similar to MalEFGK (Paulsen et al., 2000),

whereby MalE is a periplasmic substrate/solute binding protein (pfam 01547), MalFG are

two membrane-spanning permeases (pfam 00528), and MalK is a cytoplasmic

nucleotide-binding protein (pfam 00005), characteristic of the four subunits of a typical

ABC transport system (Quentin et al., 1999). In prokaryotes, the various elements of

ABC transporters are usually encoded by genes in the same operon, or locus, as

illustrated by the malEFGK and msmEFGK operons (Russell et al., 1992; Quentin et al.,

1999; Braibant et al., 2000; Barrangou et al., 2003). An anchoring motif similar to LPxTG

is usually present at the N-terminus of the substrate binding protein, allowing attachment

of this protein to the cell wall via a hydrophobic lipid extension (Quentin et al., 1999;

Braibant et al., 2000). However, this anchoring motif can vary between organisms, as

shown in L. plantarum, where the anchoring consensus sequence is LPQTxE

(Kleerebezem et al., 2003). Each permease usually contains four to eight transmembrane

α-helices, with most MSDs containing six trans-membrane regions (Table 3, Figure 3),

which form a trans-membrane channel allowing transport of the substrate across the

membrane, into the cell cytoplasm.

For the nucleotide binding protein, which is responsible for the hydrolysis of ATP

associated with transport of each molecule into the cell, there are several well conserved

motifs typical of ABC transporters. Specifically, genome-wide analyses of ABC

transporters in prokaryotes have shown that four motifs within the NBDs are highly

conserved between and within species, namely: Walker A (P loop), Walker B, Linton and

Higgins, and the ABC signature sequence (Linton and Higgins, 1998; Quentin et al.,

1999; Braibant et al., 2000; Locher et al., 2001; Davidson and Chen, 2004). The Walker

A motif has a GxxGxGKST / [AG]xxxxGK[ST] consensus, Walker B has a hhhhDEPT /

DExxxxxD consensus, the Linton and Higgins has a hhhhH+/- consensus, and ABC

signature sequence has a LSGG / LSGGQ consensus, whereby h and +/- represent

hydrophobic and charged residues, respectively (Linton and Higgins, 1998; Quentin et

al., 1999; Braibant et al., 2000; Davidson and Chen, 2004).
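
The consensus patterns listed above translate fairly directly into regular expressions. The sketch below is one possible way to scan a protein sequence for them, under several assumptions that are not in the text: the residue sets chosen for "hydrophobic" (h) and "charged" (+/-), the use of only the more general Walker A form and the hhhhDE core of Walker B, and the synthetic test peptide, which is not a real protein. Short motifs like these match by chance, so hits would only be suggestive.

```python
import re

# Residue classes are assumptions for this sketch; the text only says
# "hydrophobic" (h) and "charged" (+/-).
HYDROPHOBIC = "AILMFVWY"
CHARGED = "DEKR"

NBD_MOTIFS = {
    # [AG]xxxxGK[ST]; this general form also covers GxxGxGKST.
    "Walker A": re.compile(r"[AG].{4}GK[ST]"),
    # Core of the hhhhDEPT form: four hydrophobic residues, then D-E.
    "Walker B": re.compile(rf"[{HYDROPHOBIC}]{{4}}DE"),
    # hhhhH+/-: four hydrophobic residues, His, then a charged residue.
    "Linton and Higgins": re.compile(rf"[{HYDROPHOBIC}]{{4}}H[{CHARGED}]"),
    # LSGG, optionally extended to LSGGQ.
    "ABC signature": re.compile(r"LSGGQ?"),
}


def scan_nbd_motifs(protein):
    """Return {motif name: [(0-based start, matched text), ...]}."""
    seq = protein.upper()
    return {
        name: [(m.start(), m.group()) for m in pattern.finditer(seq)]
        for name, pattern in NBD_MOTIFS.items()
    }


if __name__ == "__main__":
    # Synthetic peptide built to contain all four motifs (not a real protein).
    peptide = "MAAAGPSGSGKSTLLRMLSGGQKQRVAIILLDEPTSALDAAVLVVHDLE"
    for name, hits in scan_nbd_motifs(peptide).items():
        print(name, hits)
```

A candidate nucleotide binding protein would be expected to show all four motifs; a sequence with only one or two hits is more likely a chance match.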

Perhaps the best characterized sugar ABC transporter in LAB is the MsmEFGK

transport system (Russell et al., 1992). It was originally described in S. mutans (Russell et

al., 1992; McLaughlin et al., 1996), and homologs were found in S. pneumoniae

(Rosenow et al., 1999) and L. acidophilus (Barrangou et al., 2003; Altermann et al.,

2004). MsmEFGK is involved in uptake of multiple sugars, including melibiose,

raffinose, isomaltotriose and FOS (Russell et al., 1992; Barrangou et al., 2003; Kaplan

and Hutkins, 2003).

Also, in B. longum, MalEFGK-like ABC transporters seem to be involved in the

transport of plant oligosaccharides such as arabinoglycans and arabinoxylans, which is

consistent with the presence of endoarabinosidases and endoxylanases in the genome

(Schell et al., 2002). A similar combination has also been found in E. faecalis (Paulsen et

al., 2003).

Most exporters are involved in transport of components toxic to the cell, such as

drugs and antibiotics (Poolman, 2002), whereas most importers are involved in transport

of energy sources and building blocks. Multidrug ABC transporters are commonly found

in LAB genomes. In particular, the LmrA multidrug ABC transporter has been well

characterized in Lactococcus lactis (van Veen et al., 1999). LmrA has the ability to

export anthracyclines, vinca-alkaloids, antibiotics and cytotoxic agents such as ethidium

bromide (van Veen et al., 1999). Multidrug ABC transporters are part of the mechanisms

developed by microbes in response to the occurrence of toxic compounds in their natural

habitats.

Overall, ABC transporters involved in carbohydrate uptake seem to have affinity

primarily for tri- and poly-saccharides. The substrate specificity is determined by the

substrate binding protein, although one specific SBP can recognize more than one

substrate, as illustrated by the msm operon in S. mutans (Russell et al., 1992). In

environments where tri- and poly-saccharides are present, such as the lower

gastro-intestinal tract, ABC transport systems are expected to provide a competitive advantage

by expanding the organism’s access to the pool of available substrates.

1.6 PTS transporters

Members of the phosphoenolpyruvate:sugar phosphotransferase system family of

transporters include uptake proteins identified primarily for the transport of mono- and

di-saccharides. The PTS is characterized by a phosphate transfer cascade involving

phosphoenolpyruvate (PEP), enzyme I (EI), HPr, and various EIIABCs, whereby a

phosphate originating from PEP is ultimately transferred to the carbohydrate substrate

(Vadeboncoeur and Pelletier, 1997; Siebold et al., 2001; Poolman, 2002; Warner and

Lolkema, 2003). Specifically, PTS transporters (TC # 4.1 - 4.6) have been characterized

for the transport of glucose, mannose, fructose, cellobiose, sucrose and trehalose (Table 2). It

was previously suggested that the PTS is the primary sugar transport system of

Gram-positive bacteria (Ajdic et al., 2002; Warner and Lolkema, 2003). Although PTS

transporters are not found in archaea or eukarya, they are present in most bacteria

(Paulsen et al., 2000; Saier, 2000). The PTS consists of three (EIIA, B and C) or four

domains (EIIA, B, C and D) (Saier and Reizer, 1992). The hydrophilic chains bearing the

first and second phosphorylation sites are EIIA and EIIB, respectively, while the

transmembrane channel and sugar binding site consist of EIIC (Saier and Reizer, 1998).

The number of predicted transmembrane-spanning domains is usually 10 in PTS

transporters (Table 3, Figure 3), which is different from ABC transporters. When

applicable, EIID is the hydrophobic protein of the splinter group (Saier and Reizer,

1992). The range and specificity of substrates transported by each PTS transporter are

determined by its EII complex.

In streptococci, PTS transporters are important in carbohydrate uptake and

regulation (Vadeboncoeur and Pelletier, 1997). Specifically, in Streptococcus salivarius,

Streptococcus mutans and Streptococcus sobrinus, PTS transporters involved in uptake of

a variety of mono- and di- saccharides have been identified (Vadeboncoeur and Pelletier,

1997). In contrast, only one PTS transporter is present in B. longum (Schell et al., 2002).

In lactococci and lactobacilli, a variety of PTS transporters have been identified, including 13, 16

and 25 complete PTS transporters in L. lactis, L. johnsonii and L. plantarum, respectively

(Bolotin et al., 1999; Schell et al., 2003; Kleerebezem et al., 2003). In streptococci, a

variety of PTS transporters have also been identified, including 21 complete PTS

transporters in S. pneumoniae (Tettelin et al., 2001).

Since there is a correlation between the genomic association of genes and

functional interaction of the proteins they encode (Snel et al., 2002), catabolic enzymes

are expected to be encoded in the vicinity of the genes encoding transporters of their

substrates. Similarly, transcriptional regulators are also commonly found in the vicinity

of the genes they control. As a result, the transcriptional regulators,

transporters and sugar hydrolases for a given carbohydrate are usually clustered in the same operon or locus.

Perhaps the best characterized PTS transporters in LAB are the sucrose and

glucose/mannose transport systems (Luesink et al., 1999b; Cochu et al., 2003). The

sucrose PTS locus has been described in L. lactis (Luesink et al., 1999b), L. plantarum

(Naumoff and Livshits, 2001) and P. pentosaceus (Naumoff and Livshits, 2001). The

glucose/mannose PTS EIIABCDMan transporter was recently characterized in S.

thermophilus (Cochu et al., 2003). Both PTS transporters have also been found in

recently sequenced LAB genomes (see Table 2).

A number of lactic acid bacteria take up glucose and mannose via a PTS

transporter. Specifically, the EIIMan PTS transporter can take up both

mannose and glucose (Cochu et al., 2003). The glucose-mannose PTS transporter has

been well characterized in S. thermophilus (Cochu et al., 2003). The glucose PTS has

been identified in a variety of streptococci, namely S. mutans, S. sobrinus and S.

thermophilus (Vadeboncoeur and Pelletier, 1997).

Several PTS loci have been well characterized in LAB, especially the glucose,

fructose and sucrose loci, which contain the mannose / glucose PTS transporter

EIIABCDMan, the fructose PTS transporter EIIABCFru, and the sucrose PTS transporter

EIIBCASuc. Additionally, the trehalose locus, including a trehalose PTS transporter

EIIABCTre, has been well characterized in L. acidophilus (Duong et al., 2004). Putative

PTS transporters have also been identified in a variety of LAB (Table 2), but the

annotation is based on similarity to other non-LAB transporters, and most have not been

substantiated by functional analyses. In streptococci, PTS activity has been shown for

glucose, fructose, mannose, lactose, mannitol, sorbitol, maltose, sucrose, trehalose, and

xylitol (Vadeboncoeur and Pelletier, 1997).

Overall, PTS transporters involved in carbohydrate uptake appear to have affinity

primarily for mono- and di-saccharides. The substrate specificity is determined by the

EIIA, EIID or EIIC substrate binding protein, although one specific SBP can recognize

more than one substrate, as illustrated by the mannose / glucose EIIABCDMan. In

environments where mono- and di-saccharides are present, such as the upper

gastrointestinal tract, PTS transport systems might provide efficient carbohydrate

utilization and potentially a competitive advantage.

1.7 Other transporters

Lactic acid bacteria possess a variety of transport systems (Saier, 2000; Konings,

2002). In addition to ABC and PTS transporters, the main transporter families include the

F0F1 ATPase, the uniport / symport / antiport systems, and the protein secretion / export

system (Konings, 2002; Figure 2).

The secondary transport system, including uniport / symport / antiport complexes

is involved primarily in transport of amino acids, ions, and acids (Konings, 2002).

Specifically, amino-acid transporters have been well characterized in Lactococcus lactis

(Bolotin et al., 1999; Konings, 2002).

The F0F1-ATPase (TC #3.2, Paulsen et al., 1998) has been well characterized in

several LAB, including lactobacilli (Kullen and Klaenhammer, 1999; Sievers et al.,

2003), bifidobacteria (Ventura et al., 2004), oenococci (Sievers et al., 2003), pediococci

(Sievers et al., 2003) and Lactococcus lactis (Bolotin et al., 1999; Konings, 2002). The

operon has been well characterized in L. acidophilus (Kullen and Klaenhammer, 1999)

and B. lactis (Ventura et al., 2004), whereby atpBEFHAGDC encode the a, c, b, δ, α, γ,

β, and ε subunits of the F0F1-ATPase, respectively. This transport system is an important

element in the response and tolerance to low pH, which is instrumental for resistance to

acid stress in the human gastrointestinal tract. This is another typical example of how

genomes of intestinal microbes include specific transporters which allow them to exist in

various environments. Similarly, members of the Oenococcus and Leuconostoc genera

used in wine fermentation also have an F0F1-ATPase which confers resistance to low pH

(Sievers et al., 2003).

The major facilitator superfamily (MFS) includes a variety of transporters (TC

#2.1-2.2). Specifically, the glycoside-pentoside-hexuronide (GPH):cation symporter

family is associated with transport of carbohydrates, including galactosides (Saier, 2000).

This family includes 12 transmembrane domains (Saier, 2000), which is different from

PTS transporters, and similar to the two combined MSDs in each ABC transporter (Table

3, Figure 3).

With regard to drug resistance, in addition to multidrug ABC transporters, LAB

have also developed secondary transporters which export drugs and toxic compounds

(van Veen et al., 1999). Specifically, in L. lactis, LmrP mediates the extrusion of drugs,

such as antibiotics, in antiport with protons (van Veen et al., 1999). This system is also

part of the major facilitator superfamily (Saier, 2000).

Members of the LacS subfamily of galactoside-pentose-hexuronide (GPH)

translocators have been identified for the uptake of lactose and galactose in Lactobacillus

bulgaricus (Leong-Morgenthaler et al., 1991), Leuconostoc lactis (Vaughan et al., 1996),

S. thermophilus (van den Bogaard et al., 2000; Vaillancourt et al., 2002), S. salivarius

(Vaillancourt et al., 2002; Lessard et al., 2003), L. delbrueckii (Lapierre et al., 2002) and

L. helveticus (Fortina et al., 2003). A similar GPH transporter, LacY, is present in L.

lactis (Bolotin et al., 1999).

Although LacS contains a PTS EIIA at the carboxy-terminus, towards the

cytoplasmic side of the protein (Vaughan et al., 1996; Lessard et al., 2003), it is not a

member of the PTS family of transporters. Also, LacS contains 12 transmembrane

domains, which differs from PTS transporters (Table 3, Figure 3). LacS has been reported

to have the ability to take up both galactose and lactose in select organisms (Vaughan et

al., 1996; van den Bogaard et al., 2000), although the specificity varies between

organisms, and depends on the presence of alternative galactoside transporters in the

organism.

A LacS-LacY homolog was also identified in L. brevis (Djordjevic et al., 2001).

Although it is a member of the GPH family (TC # 2.2), it did not include a PTS IIA

domain, indicating dependence upon a different regulatory network than that of the PTS

and other GPH transporters (Djordjevic et al., 2001).

The gene encoding the GPH transporter is usually associated with ORFs encoding

enzymes involved in the metabolism of galactosides. Specifically, saccharolytic enzymes

likely involved in the metabolism of galactosides include the enzymatic machinery of the

Leloir pathway, although operon organization is variable and unstable among LAB

(Lapierre et al., 2002; Vaillancourt et al., 2002; Boucher et al., 2003; Fortina et al., 2003;

Grossiord et al., 2003; Pridmore et al., 2004). The Leloir pathway allows catabolism of

both lactose and galactose into substrates of glycolysis (Grossiord et al., 2003).

Alternatively, the tagatose pathway may also metabolize galactosides (de Vos, 1996;

Boels et al., 2003).

1.8 Regulation and carbon catabolite repression

To understand how microbes utilize carbohydrates, we must determine the genetic

and biochemical bases for sugar transport into the cell, and identify the regulatory

networks involved in transcription of genes encoding transporters. Carbohydrate transport

and catabolism are well orchestrated in LAB, so as to utilize carbohydrate sources

optimally. The regulatory mechanism for global carbohydrate utilization is carbon

catabolite repression (CCR).

Carbon catabolite repression (CCR) is a mechanism widely distributed amongst

Gram-positive bacteria, usually mediated in cis by catabolite response elements (cre)

(Weickert and Chambliss, 1990; Miwa et al., 2000), and in trans by repressors of the

LacI family, responsible for transcriptional repression of genes encoding unnecessary

saccharolytic components (Weickert and Chambliss, 1990; Viana et al., 2000;

Muscariello et al., 2001; Titgemeyer and Hillen, 2002; Warner and Lolkema, 2003). Cre

sequences (Weickert and Chambliss, 1990) are well conserved amongst Gram-positive

bacteria and found in most LAB in the promoter-operator of many genes involved in

carbohydrate utilization (Barrangou et al., 2003), including L. plantarum (Muscariello et

al., 2001) and L. pentosus (Mahr et al., 2000). CCR controls transcription of genes encoding proteins

involved in transport and catabolism of carbohydrates (Miwa et al., 2000), so as to

transcribe genes encoding the transport and enzymatic machinery of a particular

substrate exclusively when it is present in the environment. This regulatory system

allows cells to coordinate the utilization of diverse carbohydrates, so as to focus primarily

on preferred energy sources (Poolman, 2002). Understanding carbon catabolite repression

is critical to describing how microbes adapt their uptake machinery to changing nutrients

in their environment.
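
Because cre elements are short, degenerate sequences in promoter-operator regions, a first-pass search for them can be sketched as a consensus scan. The example below converts a consensus written in IUPAC nucleotide codes into a regular expression and reports matches on the forward strand only. The function names are hypothetical, and the consensus string and promoter fragment in the usage section are assumptions made for illustration: this chapter does not state a cre consensus, so the string should be replaced with the one given in the cited papers before any real use.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus):
    """Translate an IUPAC consensus string into a regex pattern string."""
    return "".join(IUPAC[base] for base in consensus.upper())


def find_sites(sequence, consensus):
    """Return [(0-based start, matched text)] for forward-strand matches.

    A lookahead is used so that overlapping matches are also reported.
    """
    pattern = re.compile(f"(?=({consensus_to_regex(consensus)}))")
    return [(m.start(), m.group(1)) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Placeholder cre-like consensus (assumption, not taken from this chapter).
    CRE_LIKE = "TGWAANCGNTNWCA"
    # Synthetic promoter fragment containing one matching site.
    promoter = "GGCCTGTAAGCGCTTACATTAA"
    print(find_sites(promoter, CRE_LIKE))
```

A scan like this only flags candidate sites. Whether a candidate actually mediates repression would still have to be shown experimentally, for example by comparing transcription in the presence and absence of glucose.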

CCR is able to control PTS, ABC and GPH transporters alike. Specifically, ABC

transporters of the MsmEFGK family have been shown to be repressed by glucose in a

manner consistent with CCR, in S. pneumoniae (Rosenow et al., 1999; Barrangou et al.,

2003). Similarly, genes of the galactose operon seem to be regulated via CCR in S.

salivarius (Vaillancourt et al., 2002).

The L. acidophilus genome encodes a large variety of genes related to

carbohydrate utilization. In particular, many members of the ABC and PTS families of

transporters were found. Additionally, the members of the general carbohydrate

utilization regulatory network were identified, namely HPr (ptsH), EI (ptsI), CcpA

(ccpA) and HPrK/P (ptsK). Similarly, all those genes were identified in S. pneumoniae

(Tettelin et al., 2001). Those genes are involved in an active regulatory network based on

sugar availability. The regulatory networks involved in sugar utilization are not well

documented in lactobacilli and bifidobacteria, whereas they have been characterized in

streptococci (Vadeboncoeur and Pelletier, 1997). Nevertheless, previous work has

indicated involvement of CcpA in repression of specific operons in L. casei, and L.

plantarum (Viana et al., 2000; Muscariello et al., 2001) and L. pentosus (Mahr et al.,

2000). Specifically, the pepQ-ccpA locus has been identified in L. pentosus, L.

delbrueckii, L. casei, S. mutans and L. lactis (Mahr et al., 2000), and in most cases, a cre

sequence is found in the promoter-operator region of ccpA. The PTS is characterized by a

phosphate transfer cascade involving PEP, EI, HPr, and various EIIABCs, whereby a

phosphate is ultimately transferred to the carbohydrate substrate (Saier, 2000; Titgemeyer

and Hillen, 2002; Warner and Lolkema, 2003). HPr is a key component of CCR, which is

regulated via phosphorylation by enzyme I (EI) and HPr kinase/phosphatase (HPr K/P).

While HPr is the primary regulator of CCR, HPr K/P is the sensor enzyme of CCR in

Gram-positive bacteria (Nessler et al., 2003). HPrK/P has been found in a variety of

LAB, including L. casei, L. brevis, L. delbrueckii, L. gasseri, L. acidophilus, L. lactis,

Streptococcus bovis, S. mutans, S. salivarius, S. pneumoniae, S. pyogenes, S. agalactiae

and Leuconostoc mesenteroides (Warner and Lolkema, 2003; Altermann et al., 2004).

Similarly, HPr has also been found in a variety of LAB, including L. casei, L. sakei, L.

acidophilus, L. gasseri, L. brevis, L. mesenteroides, L. lactis, E. faecalis, S. mutans, S.

salivarius, S. bovis, S. pyogenes, S. pneumoniae, S. thermophilus, S. agalactiae and

Oenococcus oeni (Warner and Lolkema, 2003; Altermann et al., 2004). The HPr-HPrK/P

complex has been characterized structurally (Fieulaine et al., 2002). When HPr is

phosphorylated at His15, the PTS is on (Poolman, 2002), and carbohydrates transported

via the PTS are phosphorylated via EIIABCs. In contrast, when HPr is phosphorylated at

Ser46, the PTS machinery is not functional (Vadeboncoeur and Pelletier, 1997;

Mijakovic et al., 2002; Nessler et al., 2003). HPr-Ser46 acts as a co-repressor by binding

to CcpA (Fieulaine et al., 2002; Nessler et al., 2003). Ultimately, CcpA binds to cre

sequences in the promoter-operator region of operons encoding carbohydrate transporters

and hydrolases, and prevents their transcription (Hueck and Hillen, 1995; Poolman,

2002).
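
The switch described above can be summarized as a toy decision rule. The sketch below is a deliberately simplified model, not a description of the real kinetics: the class and function names are hypothetical, HPr is reduced to two on/off phosphorylation flags, and the handling of a doubly phosphorylated HPr (treated here as PTS off) is an assumption that the text does not address.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HPrState:
    """Phosphorylation state of HPr, reduced to two boolean flags."""
    his15_phosphorylated: bool
    ser46_phosphorylated: bool


def ccr_outcome(hpr, ccpa_present=True, cre_in_promoter=True):
    """Predict PTS activity and operon repression for one target operon.

    Rules taken from the text:
      - HPr phosphorylated at His15 -> the PTS is on.
      - HPr phosphorylated at Ser46 -> the PTS is not functional, and
        HPr-Ser46 binds CcpA as a co-repressor.
      - CcpA then binds a cre site in the promoter-operator region and
        prevents transcription.
    """
    # Assumption: Ser46 phosphorylation overrides His15 phosphorylation.
    pts_active = hpr.his15_phosphorylated and not hpr.ser46_phosphorylated
    operon_repressed = (
        hpr.ser46_phosphorylated and ccpa_present and cre_in_promoter
    )
    return {"pts_active": pts_active, "operon_repressed": operon_repressed}


if __name__ == "__main__":
    print(ccr_outcome(HPrState(his15_phosphorylated=True, ser46_phosphorylated=False)))
    print(ccr_outcome(HPrState(his15_phosphorylated=False, ser46_phosphorylated=True)))
```

The two optional arguments make the dependence on CcpA and on a cre site explicit: in this model, an operon without a cre site in its promoter-operator region is not repressed even when HPr is phosphorylated at Ser46.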

HPr has been identified in E. faecalis (Vadeboncoeur and Pelletier, 1997), S.

pyogenes (Deutscher and Saier, 1983; Vadeboncoeur and Pelletier, 1997), and L. lactis

(Luesink et al., 1999a).

CcpA-dependent repression and activation is well documented in a variety of

LAB, including enterococci, lactobacilli, lactococci and streptococci, especially with

regard to repression of the genes involved in utilization of galactosides (Titgemeyer and

Hillen, 2002).

The interaction between HPr and LacS has been shown in S. salivarius (Lessard et

al., 2003). It occurs between HPr-His and EIIALacS, although LacS is not a member of

the PTS system. Since HPr is the primary regulator of CCR, the interaction between HPr

and LacS illustrates the likely regulation of the GPH system by CCR. In S. thermophilus,

the control of LacS by CCR has been illustrated, likely via interaction between CcpA and

two cre sequences found in the promoter-operator region of the lacSZ operon (van den

Bogaard et al., 2000).

Although the phosphorylation cascade suggests regulation at the protein level,

studies in LAB report both transcriptional modulation and constitutive expression of

ccpA and ptsHI. Specifically, in S. thermophilus, CcpA production is induced by glucose

(van den Bogaard et al., 2000). Similarly, in other bacteria, the carbohydrate source modulates

ptsHI transcriptional levels (Luesink et al., 1999a). In contrast, expression levels of ccpA

in L. pentosus (Mahr et al., 2000) and of ptsHI in S. thermophilus (Cochu et al., 2003) did

not vary in the presence of different carbohydrates.

Carbon catabolite repression is likely present in L. acidophilus, since all the

necessary regulatory proteins are encoded within its genome, cre-like sequences are

present in the promoter-operator regions of several carbohydrate loci (Barrangou et al.,

2003), and transcription of operons involved in utilization of non-preferred carbohydrates

is repressed by glucose (Barrangou et al., 2003).

Carbon catabolite repression illustrates how lactic acid bacteria adapt dynamically

to the diverse carbohydrate sources available in their various habitats.

1.9 Conclusions and perspectives

Although a variety of putative carbohydrate transporters have been identified in

LAB genomes recently published, little information is available regarding their biological

functions and expression profiles. Specifically, the substrate specificity of most PTS and

ABC transporters remains unclear, as illustrated in the incomplete annotation of most

PTS transporters in L. plantarum, L. acidophilus, L. johnsonii and S. pneumoniae

(Kleerebezem et al., 2003; Altermann et al., 2004; Schell et al., 2003; Tettelin et al., 2001). As a

result, in silico analyses must be confirmed and complemented by transcriptional and

biological analyses.

Surveys of carbohydrate uptake systems revealed greater diversity in prokaryotes

than eukaryotes. Specifically, eukaryotic carbohydrate transport is dominated by the

MFS, whereas that of prokaryotes involves the MFS, PTS and ABC superfamilies

of transporters (Saier, 2000).

Recent advances in high throughput technologies, primarily genome sequencing

and microarrays, have yielded global data that provide insight into the physiology of

microbes. Particularly, LAB genome analyses have illustrated the breadth and importance

of carbohydrate transporters in lactobacilli and bifidobacteria. Similarly, global

transcriptome analyses, similar to those carried out in Escherichia coli (Beloin et al.,

2004), Bacillus subtilis (Blencke et al., 2003), Vibrio cholerae (Meibom et al., 2003),

Thermotoga maritima (Chhabra et al., 2003; Pysz et al., 2004a; Pysz et al., 2004b) and

Pyrococcus furiosus (Shockley et al., 2003), applied to carbohydrate utilization

investigation in LAB will provide further insight into the transporters and metabolic

pathways involved in adaptation of LAB to their various environmental conditions.

Ultimately, genetic engineering of LAB could allow development of better starter

cultures and probiotic strains, optimized for utilization of specific carbohydrate sources,

and competition with other commensals. Genetic engineering in LAB is now possible,

following the development of molecular biology tools, including food-grade systems (de

Vos, 1996; Russell and Klaenhammer, 2001; Boucher et al., 2002; Kleerebezem and

Hugenholtz, 2003).

Overall, the combination of a diverse saccharolytic enzymatic machinery with a

polyvalent transport system, consisting primarily of ABC and PTS transporters, allows

lactic acid bacteria to utilize a variety of nutrient resources efficiently and to adapt

their transcriptomes dynamically to environmental conditions, ultimately rendering these microbes

more competitive in their respective environments.

1.10 References

Ajdic, D., McShan, W. M., McLaughlin, R. E., Savic, G., Chang, J., Carson, M. B., Primeaux, C., Tian, R., Kenton, S., Jia, H., Lin, S., Qian, Y., Li, S., Zhu, H., Najar, F., Lai, H., White, J., Roe, B. A. & Ferretti, J. J. (2002) Proc. Natl. Acad. Sci. USA 99, 14434-14439

Alles, M. S., Hautvast, J. G. A. J., Nagengast, F. M., Hartemink, R., Van Laere, K. M. J., & Jansen, J. B. M. (1996) Brit. J. Nutr. 76, 211-221

Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B., McAuliffe, O., Souther, N., Dobson, A., Duong, T., Callanan, M., Lick, S., Hamrick, A., Cano, R., & Klaenhammer, T. R. (2004) J. Bacteriol. In review

Barrangou, R., Altermann, E., Hutkins, R., Cano, R., & Klaenhammer, T. R. (2003) Proc. Natl. Acad. Sci. USA 100, 8957-8962

Beloin, C., Valle, J., Latour-Lambert, P., Faure, P., Kzreminski, M., Balestrino, D., Haagensen, J. A. J., Molin, S., Prensier, G., Arbeile, B., & Ghigo, J. M. (2004) Mol. Microbiol. 51, 659-674

Blencke, H. M., Homuth, G., Ludwig, H., Mader, U., Hecker, M., & Stulke, J. (2003) Metab. Eng. 5, 133-149

Boels, I. C., Kleerebezem, M., & de Vos, W. M. (2003) Appl. Environ. Microbiol. 69, 1129-1135

Bolotin, A., Mauger, S., Malarme, K., Ehrlich, S. D., & Sorokin, A. (1999) Antonie van Leeuwenhoek 76, 27-76

Boucher, I., Parrot, M., Gaudreau, H., Champagne, C. P., Vadeboncoeur, C., & Moineau, S. (2002) Appl. Environ. Microbiol. 68, 6152-6161

Boucher, I., Vadeboncoeur, C., & Moineau, S. (2003) Appl. Environ. Microbiol. 69, 4149-4156

Braibant, M., Gilot, P., & Content, J. (2000) FEMS Microbiol. Rev. 24, 449-467

Cavalier-Smith, T. (2004) Proc. R. Soc. Lond. 271, 1251-1262

Chhabra, S. R., Shockley, K. R., Conners, S. B., Scott, K. L., Wolfinger, R. D., & Kelly, R. M. (2003) J. Biol. Chem. 278, 7540-7552

Cochu, A., Vadeboncoeur, C., Moineau, S., & Frenette, M. (2003) Appl. Environ. Microbiol. 69, 5423-32

Curtis, T. P., & Sloan, W. T. (2004) Curr. Opin. Microbiol. 7, 221-226

Curtis, T. P., Sloan, W. T., & Scannell, J. W. (2002) Proc. Natl. Acad. Sci. USA 99, 10494-10499

Davidson, A. L., & Chen, J. (2004) Annu. Rev. Biochem. 73, 241-268

Deutscher, J., & Saier, M. H. (1983) Proc. Natl. Acad. Sci. USA 80, 6790-6794

De Vos, W. M. (1996) Antonie van Leeuwenhoek 70, 223-242

Djordjevic, G. M., Tchieu, J. H., & Saier, M. H. (2001) J. Bacteriol. 183, 3224-3236

Duong, T., Barrangou, R., Russell, M. W., & Klaenhammer, T. R. (2004) In review

Embley, T. M., Hirt, R. P., & Williams, D. M. (1994) Phil. Trans. R. Soc. Lond. 345, 21-33

Ferretti, J. J., McShan, W. M., Ajdic, D., Savic, D. J., Savic, G., Lyon, K., Primeaux, C., Sezate, S., Suvorov, A., Kenton, S., Lai, H. S., Lin, S. P., Qian, Y., Jia, H. G., Najar, F. Z., Ren, Q., Zhu, H., Song, L., White, J., Yuan, X., Clifton, S. W., Roe, B. A., & McLaughlin, R. (2001) Proc. Natl. Acad. Sci. USA 98, 4658-4663

Fieulaine, S., Morera, S., Poncet, S., Mijakovic, I., Galinier, A., Janin, J., Deutscher, J., & Nessler, S. (2002) Proc. Natl. Acad. Sci. USA 99, 13437-13441

Fortina, M. G., Ricci, G., Mora, D., Guglielmetti, S., & Manachini, P. L. (2003) Appl. Environ. Microbiol. 69, 3238-43

Gibson, G. R. & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412

Grantham, R., Gautier, C., Gouy, M., Mercier, R., & Pave, A. (1980) Nucleic Acids Res. 8, r49-r62

Grossiord, B. P., Luesink, E. J., Vaughan, E. E., Arnaud, A., & De Vos, W. M. (2003) J. Bacteriol. 185, 870-8

Hueck, C. J., & Hillen, W. (1995) Mol. Microbiol. 15, 395-401

Hugenholtz, J., Sybesma, W., Groot, M. N., Wisselink, W., Ladero, V., Burgess, K., van Sinderen, D., Piard, J. C., Eggink, G., Smid, E. J., Savoy, G., Sesma, F., Jansen, T., Hols, P., & Kleerebezem, M. (2002) Antonie van Leeuwenhoek 82, 217-235

Kaplan, H., & Hutkins, R. W. (2000) Appl. Environ. Microbiol. 66, 2682-2684

Kaplan, H., & Hutkins, R. W. (2003) Appl. Environ. Microbiol. 69, 2217-2222

Klaenhammer, T. R., Altermann, E., Arigoni, F., Bolotin, A., Breidt, F., Broadbent, J., Cano, R., Chaillou, S., Deutscher, J., Gasson, M., van de Guchte, M., Guzzo, J., Hartke, A., Hawkins, T., Hols, P., Hutkins, R., Kleerebezem, M., Kok, J., Kuipers, O., Lubbers, M., Maguin, E., McKay, L., Mills, D., Nauta, A., Overbeek, R., Pel, H., Pridmore, D., Saier, M., van Sinderen, D., Sorokin, A., Steele, J., O'Sullivan, D., de Vos, W., Weimer, B., Zagorec, M., & Siezen, R. (2002) Antonie van Leeuwenhoek 82, 29-58

Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M. & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-5

Kleerebezem, M., & Hugenholtz, J. (2003) Curr. Opin. Biotechnol. 14, 232-237

Konings, W. N. (2002) Antonie van Leeuwenhoek 82, 3-27

Krogh, A., Larsson, B., von Heijne, G., & Sonnhammer, E. L. L. (2001) J. Mol. Biol. 305, 567-580

Kullen, M. J., & Klaenhammer, T. R. (1999) Mol. Microbiol. 33, 1152-1161

Kumar, S., Tamura, K., Jakobsen, I. B., & Nei, M. (2001) Bioinformatics 17, 1244-1245

Lapierre, L., Mollet, B., & Germond, J. E. (2002) J. Bacteriol. 184, 928-35

Leong-Morgenthaler, P., Zwahlen, M. C., & Hottinger, H. (1991) J. Bacteriol. 173, 1951-1957

Lessard, C., Cochu, A., Lemay, J. D., Roy, D., Vaillancourt, K., Frenette, M., Moineau, S., & Vadeboncoeur, C. (2003) J. Bacteriol. 185, 6764-72

Linton, K. J., & Higgins, C. F. (1998) Mol. Microbiol. 28, 5-13

Locher, K. P., Lee, A. T., & Rees, D. C. (2002) Science 296, 1091-1098

Luesink, E. J., Beumer, C. M. A., Kuipers, O. P., & de Vos, W. M. (1999a) J. Bacteriol. 181, 764-771

Luesink, E. J., Marugg, J. D., Kuipers, O. P. & de Vos, W. M. (1999b) J. Bacteriol. 181, 1924-1926

Mahr, K., Hillen, W., & Titgemeyer, F. (2000) Appl. Environ. Microbiol. 66, 277-83

Margulis, L. (1996) Proc. Natl. Acad. Sci. USA 93, 1071-1076

McLaughlin, R. E., & Ferretti, J. J. (1996) FEMS Microbiol. Lett. 140, 261-264

Meibom, K. L., Li, X. B., Wu, C. Y., Roseman, S., & Schoolnik, G. K. (2004) Proc. Natl. Acad. Sci. USA 101, 2524-2529

Mijakovic, I., Poncet, S., Galinier, A., Monedero, V., Fieulaine, S., Janin, J., Nessler, S., Marquez, J. A., Scheffzek, K., Hasenbein, S., Hengstenberg, W., & Deutscher, J. (2002) Proc. Natl. Acad. Sci. USA 99, 13442-7

Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M. & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-10

Muscariello, L., Marasco, R., De Felice, M., & Sacco, M. (2001) Appl. Environ. Microbiol. 67, 2903-2907

Muto, A., & Osawa, S. (1987) Proc. Natl. Acad. Sci. USA 84, 166-169

Naumoff, D. G., & Livshits, V. A. (2001) Mol. Biol. 35, 19-27

Nesbo, C. L., Nelson, K. E., & Doolittle, W. F. (2002) J. Bacteriol. 184, 4475-4488

Nessler, S., Fieulaine, S., Poncet, S., Galinier, A., Deutscher, J., & Janin, J. (2003) J. Bacteriol. 185, 4003-4010

Ouwehand, A. C., Salminen, S., & Isolauri, E. (2002) Antonie van Leeuwenhoek 82, 279-289

Paulsen, I. T., Sliwinski, M. K., & Saier, M. H. (1998) J. Mol. Biol. 277, 573-592

Paulsen, I. T., Nguyen, L., Sliwinski, M. K., Rabus, R., & Saier, M. H. (2000) J. Mol. Biol. 301, 75-100

Paulsen, I. T., Banerjei, L., Myers, G. S. A., Nelson, K. E., Seshadri, R., Read, T. D., Fouts, D. E., Eisen, J. A., Gill, S. R., Heidelberg, J. F., Tettelin, H., Dodson, R. J., Umayam, L., Brinkac, L., Beanan, M., Daugherty, S., DeBoy, R. T., Durkin, S., Kolonay, J., Madupu, R., Nelson, W., Vamathevan, J., Tran, B., Upton, J., Hansen, T., Shetty, J., Khouri, H., Utterback, T., Radune, D., Ketchum, K. A., Dougherty, B. A., & Fraser, C. M. (2003) Science 299, 2071-2074

Poolman, B. (2002) Antonie van Leeuwenhoek 82, 147-164

Pridmore, R. D., Berger, B., Desiere, F., Vilanova, D., Barretto, C., Pittet, A. C., Zwahlen, M. C., Rouvet, M., Altermann, E., Barrangou, R., Mollet, B., Mercenier, A., Klaenhammer, T. R., Arigoni, F., & Schell, M. A. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517

Pysz, M. A., Conners, S. B., Montero, C. I., Shockley, K. R., Johnson, M. R., Ward, D. E., & Kelly, R. M. (2004a) Appl. Environ. Microbiol. 70, 6098-6112

Pysz, M. A., Ward, D. E., Shockley, K. R., Montero, C. I., Conners, S. B., Johnson, M. R., & Kelly, R. M. (2004b) Extremophiles 8, 209-17

Quentin, Y., Fichant, G., & Denizot, F. (1999) J. Mol. Biol. 287, 467-484

Reid, G. (1999) Appl. Environ. Microbiol. 65, 3763-6

Reid, G., Sanders, M. E., Gaskins, H. R., Gibson, G. R., Mercenier, A., Rastall, R., Roberfroid, M., Rowland, I., Cherbut, C., & Klaenhammer, T. R. (2003) J. Clin. Gastroenterol. 37, 105-118

Rivera, M. C., & Lake, J. A. (2004) Nature 431, 152-155

Rosenow, C., Maniar, M., & Trias, J. (1999) Genome Res. 9, 1189-97

Russell, R. R. B., Aduse-Opoku, J., Sutcliffe, I. C., Tao, L. & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637

Russell, W. M., & Klaenhammer, T. R. (2001) Appl. Environ. Microbiol. 67, 4361-4364

Rycroft, C. E., Jones, M. R., Gibson, G. R. & Rastall, R. A. (2001) J. Appl. Microbiol. 91, 878-87

Saier, M. H., & Reizer, J. (1992) J. Bacteriol. 174, 1433-1438

Saier, M. H. (2000) Mol. Microbiol. 35, 699-710

Sanders, M. E., & Klaenhammer, T. R. (2001) J. Dairy. Sci. 84, 319-331

Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D. & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427

Shockley, K. R., Ward, D. E., Chhabra, S. R., Conners, S. B., Montero, C. I., & Kelly, R. M. (2003) Appl. Environ. Microbiol. 69, 2365-2371

Siebold, C., Flukiger, K., Beutler, R., & Erni, B. (2001) FEBS Lett. 504, 104-111

Sievers, M., Uermosi, C., Fehlmann, M., & Krieger, S. (2003) System. Appl. Microbiol. 26, 350-356

Siezen, R. J., van Enckevort, F. H. J., Kleerebezem, M., & Teusink, B. (2004) Curr. Opin. Biotechnol. 15, 105-115

Snel, B., Bork, P., & Huynen, M. A. (2002) Proc. Natl. Acad. Sci. USA 99, 5890-5895

Tannock, G. W. (1999) Antonie van Leeuwenhoek 76, 265-278

Tettelin, H., Nelson, K. E., Paulsen, I. T., Eisen, J. A., Read, T. D., Peterson, S., Heidelberg, J., Deboy, R. T., Haft, D. H., Dodson, R. J., Durkin, A. S., Gwinn, M., Kolonay, J. F., Nelson, W. C., Peterson, J. D., Umayam, L. A., White, O., Salzberg, S. L., Lewis, M. R., Radune, D., Holtzapple, E., Khouri, H., Wolf, A. M., Utterback, T. R., Hansen, C. L., McDonald, L. A., Feldblyum, T. V., Angiuoli, S., Dickinson, T., Hickey, E. K., Holt, I. E., Loftus, B. J., Yang, F., Smith, H. O., Venter, J. C., Dougherty, B. A., Morrison, D. A., Hollingshead, S. K., & Fraser, C. M. (2001) Science 293, 498-506

Tettelin, H., Masignani, V., Cieslewicz, M. J., Eisen, J. A., Peterson, S., Wessels, M. R., Paulsen, I. T., Nelson, K. E., Margarit, I., Read, T. D., Madoff, L. C., Wolf, A. M., Beanan, M. J., Brinkac, L. M., Daugherty, S. C., DeBoy, R. T., Durkin, A. S., Kolonay, J. F., Madupu, R., Lewis, M. R., Radune, D., Fedorova, N. B., Scanlan, D., Khouri, H., Mulligan, S., Carty, H. A., Cline, R. T., Van Aken, S. E., Gill, J., Scarselli, M., Mora, M., Iacobini, E. T., Brettoni, C., Galli, G., Mariani, M., Vegni, F., Maione, D., Rinaudo, D., Rappuoli, R., Telford, J. L., Kasper, D. L., Grandi, G., & Fraser, C. M. (2002) Proc. Natl. Acad. Sci. USA 99, 12391-12396

Thompson, J. D., Higgins, D. G., & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680

Titgemeyer, F., & Hillen, W. (2002) Antonie van Leeuwenhoek 82, 59-71

Vadeboncoeur, C., & Pelletier, M. (1997) FEMS Microbiol. Rev. 19, 187-207

Vaillancourt, K., Moineau, S., Frenette, M., Lessard, C., & Vadeboncoeur, C. (2002) J. Bacteriol. 184, 785-793

Van den Bogaard, P. T. C., Kleerebezem, M., Kuipers, O. P., & De Vos, W. M. (2000) J. Bacteriol. 182, 5982-5989

Van Laere, K. M., Hartemink, R., Bosveld, M., Schols, H. A. & Voragen, A. G. (2000) J. Agric. Food Chem. 48, 1644-1652

Van Veen, H. W., Margolles, A., Putman, M., Sakamoto, K., & Konings, W. N. Antonie van Leeuwenhoek 76, 347-352

Vaughan, E. E., David, S., & De Vos, W. M. (1996) Appl. Environ. Microbiol. 62, 1574-82

Vaughan, E. E., de Vries, M. C., Zoetendal, E. G., Ben-Amor, K., Akkermans, A. D. L., & de Vos, W. M. (2002) Antonie van Leeuwenhoek 82, 341-352

Venter, J. C., Remington, K., Heidelberg, J. F., Halpern, A. L., Rusch, D., Eisen, J. A., Wu, D., Paulsen, I., Nelson, K. E., Nelson, W., Fouts, D. E., Levy, S., Knap, A. H., Lomas, M. W., Nealson, K., White, O., Peterson, J., Hoffman, J., Parsons, R., Baden-Tillson, H., Pfannkoch, C., Rogers, Y. H., & Smith, H. O. (2004) Science 304, 66-74

Ventura, M., Canchaya, C., van Sinderen, D., Fitzgerald, G. F., & Zink, R. (2004) Appl. Environ. Microbiol. 70, 3110-3121

Viana, R., Monedero, V., Dossonet, V., Vadeboncoeur, C., Perez-Martinez, G., & Deutscher, J. (2000) Mol. Microbiol. 36, 570-584

Warner, J. B., & Lolkema, J. S. (2003) Microbiol. Mol. Biol. Rev. 67, 475-490

Weickert, M. J. & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-6242

Woese, C. R., Kandler, O., & Wheelis, M. L. (1990) Proc. Natl. Acad. Sci. USA 87, 4576-4579

Table 1. Genomes of lactic acid bacteria and other probiotic species

Genus            Species               Strain       Size (Mbp)  %GC   Status  Reference
Bifidobacterium  longum                NCC2705      2.3         60.1  C       Schell et al.
                 breve                 NCIMB8807    2.4         58.8  C       Siezen et al.
Enterococcus     faecalis              V583         3.2         37.5  C       Paulsen et al.
Lactobacillus    acidophilus           NCFM         2.0         34.7  C       Altermann et al.
                 gasseri               ATCC33323    1.8         35.1  IP      JGI
                 johnsonii             NCC533       2.0         34.6  C       Pridmore et al.
                 plantarum             WCFS1        3.3         44.5  C       Kleerebezem et al.
                 casei                 ATCC334      2.5         41.1  IP      JGI
                 rhamnosus             HN001        2.4         46.4  IP      Klaenhammer et al.
                 helveticus            CNRZ32       2.4         37.1  IP      Klaenhammer et al.
                 brevis                ATCC367      2.0         43.1  IP      JGI
                 sakei                 23K          1.9         41.2  C       Klaenhammer et al.
                 delbrueckii           ATCCBAA365   2.3         45.7  IP      JGI
Lactococcus      lactis ssp. lactis    IL1403       2.3         35.4  C       Bolotin et al.
                 lactis ssp. cremoris  SK11         2.3         30.9  IP      JGI
Leuconostoc      mesenteroides         ATCC8293     2.0         37.4  IP      JGI
Oenococcus       oeni                  ATCCBAA331   1.8         37.5  IP      JGI
Pediococcus      pentosaceus           ATCC25745    2.0         37.0  IP      JGI
Streptococcus    agalactiae            2603V/R      2.2         35.7  C       Tettelin et al.
                 mutans                UA159        2.0         36.8  C       Ajdic et al.
                 pneumoniae            TIGR4        2.2         39.7  C       Tettelin et al.
                 pyogenes              M1           1.9         38.5  C       Ferretti et al.
                 thermophilus          LMD9         1.8         36.8  IP      JGI

1 C, complete

2 IP, in progress

3 JGI, Joint Genome Institute

Adapted from Klaenhammer et al., 2002 and Siezen et al., 2004

Table 2. Carbohydrate utilization profiles for select lactic acid bacteria

Fermentation1 Annotation2

Type Sugar Lac Lpl Ljo Lga Smu Spn Lla

Pentoses
Arabinose Yes
Ribose Yes Yes
Ribulose PTS
Xylose Yes
Xylulose

Hexoses
Fructose PTS PTS Yes Yes PTS PTS PTS
Galactose GPH Yes Yes Yes ABC ABC GPH
Glucose PTS Yes Yes Yes PTS PTS Yes
Mannose PTS PTS Yes Yes PTS PTS PTS

Disaccharides
Cellobiose PTS PTS Yes Yes PTS PTS PTS
Gentiobiose PTS PTS Yes
Lactose GPH Yes Yes Yes PTS PTS GPH
Maltose ABC Yes Yes Yes ABC ABC Yes
Melibiose PTS Yes Yes Yes
Sucrose PTS PTS Yes PTS PTS PTS
Trehalose PTS Yes Yes Yes PTS PTS PTS
Turanose Yes

Oligosaccharides
FOS ABC Yes
Melezitose Yes
Raffinose ABC Yes Yes Yes ABC

Sugar alcohols
Galactitol PTS PTS
Glycerol ABC ABC
Mannitol PTS PTS PTS PTS
Sorbitol PTS PTS

Deoxysugars
Fucose
Rhamnose Yes

Modified Sugars
Amygdalin Yes PTS Yes Yes
Arbutin PTS PTS Yes Yes
Esculin PTS Yes Yes Yes
Gluconate PTS
Malate
N-acetylglucosamine PTS PTS Yes Yes PTS Yes
Salicin PTS PTS Yes Yes Yes

1 determined by fermentation patterns obtained from API50CHO (BioMerieux, Durham, NC)

2 determined by ORF functional assignment from the genome annotation

Lac, Lactobacillus acidophilus; Lpl, Lactobacillus plantarum; Ljo, Lactobacillus johnsonii; Lga, Lactobacillus gasseri; Smu, Streptococcus mutans; Spn, Streptococcus pneumoniae; Lla, Lactococcus lactis

Table 3. Transmembrane domains in L. acidophilus transporters

Family  Gene    ORF#     Substrate          TMD
ABC     msmF    La503    FOS                6
        msmG    La504    FOS                6
        msmF2   La1441   Raffinose          6
        msmG2   La1440   Raffinose          6
PTS     scrA    401      Sucrose            8-10
        treB    1012     Trehalose          10
        fruA    1777     Fructose           10
        manLMN  La452-5  Mannose/glucose    10
GPH     lacS    1463     Lactose/galactose  12

TMD, number of transmembrane domains in a protein, as predicted by the algorithm developed by Krogh et al., 2001.

[Figure 1 image: phylogenetic tree. Leaf labels: L. lactis, S. mutans, S. pneumoniae, S. pyogenes, S. agalactiae, L. gasseri, L. johnsonii, L. acidophilus, L. plantarum, P. pentosaceus, E. faecalis, B. halodurans, B. subtilis, S. aureus, L. mesenteroides, O. oeni, B. longum, B. linens, T. maritima, V. alginolyticus, V. cholerae, E. coli, K. pneumoniae, S. typhimurium]

Figure 1. Phylogenetic tree of lactic acid bacteria and select microbial species. This phylogenetic tree is a neighbor-joining tree obtained from the multiple sequence alignment of 16S rRNA genes in ClustalW (Thompson et al., 1994), visualized in MEGA2 (Kumar et al., 2001). Black, lactic acid bacteria; red, bacillales; yellow, thermotogae; red, proteobacteria. Within LAB, branches for different subgroups have different colors: blue, streptococci, pink, lactobacilli, purple, high GC brevibacteria and bifidobacteria.

[Figure 2 image: schematic of transporters. Labels: PTS (IIA, IIB, IIC); ABC importer and ABC exporter (SBP, MSD, NBD); GPH; ATPase; secondary transporters (uniport, antiport, symport)]

Figure 2. Transporters commonly found in lactic acid bacteria. Green, ABC transporters; Red, PTS transporters; yellow, GPH transporters; Gray, ATPase; blue, secondary transporters.

Figure 3. Transmembrane domains in ABC, PTS and GPH transporters in L. acidophilus. A, TMDs in FOS ABC transporter MsmE; B, TMDs in FOS ABC transporter MsmF; C, TMDs in sucrose PTS transporter ScrB; D, TMDs in lactose/galactose GPH transporter LacS

CHAPTER II – Functional and comparative genomic analyses of an operon involved in fructooligosaccharide utilization by Lactobacillus acidophilus

Published in Proc. Natl. Acad. Sci. USA 100, 8957-8962 – see appendix 1

2.1 Abstract

Lactobacillus acidophilus NCFM is a probiotic organism that displays the ability to

utilize prebiotic compounds, such as fructo-oligosaccharides (FOS), which stimulate the

growth of beneficial commensals in the gastrointestinal tract. However, little is known

about the mechanisms and genes involved in FOS utilization by Lactobacillus species.

Analysis of the L. acidophilus NCFM genome revealed an msm locus composed of a

transcriptional regulator of the LacI family, a four-component ABC transport system, a

fructosidase and a sucrose phosphorylase. Transcriptional analysis of this operon

demonstrated that gene expression was induced by sucrose and FOS, but not by glucose

or fructose, suggesting some specificity for non-readily fermentable sugars. Additionally,

expression was repressed by glucose, but not by fructose, suggesting catabolite

repression, via two cre-like sequences identified in the promoter-operator region.

Insertional inactivation of the genes encoding the ABC transporter substrate binding

protein and the fructosidase reduced the ability of the mutants to grow on FOS.

Comparative analysis of gene architecture within this cluster revealed a high degree of

synteny with operons in Streptococcus mutans and Streptococcus pneumoniae. However,

the association between a fructosidase and an ABC transporter is unusual, and may be

specific to L. acidophilus. This is the first description of a gene locus involved in

transport and catabolism of FOS compounds, which can promote the competitiveness of

beneficial microorganisms in the human gastrointestinal tract.

2.2 Introduction

The ability of select intestinal microbes to utilize substrates not digested by the

host may play an important role in their ability to successfully colonize the mammalian

gastrointestinal (GI) tract. A diverse carbohydrate catabolic potential is associated with

cariogenic activity of S. mutans in the oral cavity (Ajdic et al., 2002), adaptation of L.

plantarum to a variety of environmental niches (Kleerebezem et al., 2003), and residence

of B. longum in the colon (Schell et al., 2002), illustrating the competitive benefits of

complex sugar utilization. Prebiotics are non-digestible food ingredients that selectively

stimulate the growth and/or activity of beneficial microbial strains residing in the host

intestine (Gibson and Roberfroid, 1995). Among sugars that qualify as prebiotics, fructo-

oligosaccharides (FOS) are a diverse family of fructose polymers used commercially in

food products and nutritional supplements, that vary in length and can be either

derivatives of simple fructose polymers, or fructose moieties attached to a sucrose

molecule. The linkage and degree of polymerization can vary widely (usually between 2

and 60 moieties), and several names such as inulin, levan, oligofructose and neosugars

are used accordingly. The average daily intake of such compounds, originating mainly

from wheat, onion, artichoke, banana, and asparagus (Gibson and Roberfroid, 1995;

Moshfegh et al., 1999), is substantial, with nearly 2.6 g of inulin and 2.5 g of

oligofructose consumed in the average American diet (Moshfegh et al., 1999). FOS are

not digested in the upper gastrointestinal tract and can be degraded by a variety of lactic

acid bacteria (Hartemink et al., 1995; Hartemink et al., 1997; Kaplan and Hutkins, 2000;

Van Laere et al., 2000) residing in the human lower gastrointestinal tract (Gibson and

Roberfroid, 1995; Orrhage et al., 2000). FOS and other oligosaccharides have been

shown in vivo to beneficially modulate the composition of the intestinal microbiota, and

specifically to increase bifidobacteria and lactobacilli (Gibson and Roberfroid, 1995;

Orrhage et al., 2000). A variety of L. acidophilus strains in particular have been shown to

utilize several polysaccharides and oligosaccharides such as arabinogalactan,

arabinoxylan and FOS (Kaplan and Hutkins, 2000; Van Laere et al., 2000). Despite the

recent interest in FOS utilization, little information is available about the metabolic

pathways and enzymes responsible for transport and catabolism of such complex sugars

in lactobacilli.

In silico analysis of a particular locus within the L. acidophilus NCFM genome

revealed the presence of a gene cluster encoding proteins potentially involved in prebiotic

transport and hydrolysis. This specific cluster was analyzed computationally and

functionally to reveal the genetic basis for FOS transport and catabolism by L.

acidophilus NCFM.

2.3 Materials and Methods

2.3.1 Bacterial strain and media used in this study

The strain used in this study is L. acidophilus NCFM (Barefoot and

Klaenhammer, 1983). Cultures were propagated at 37°C, aerobically in MRS broth

(Difco). A semi-synthetic medium consisted of: 1% bactopeptone (w/v) (Difco), 0.5%

yeast extract (w/v) (Difco), 0.2% dipotassium phosphate (w/v) (Fisher), 0.5% sodium

acetate (w/v) (Fisher), 0.2% ammonium citrate (w/v) (Sigma), 0.02% magnesium sulfate

(w/v) (Fisher), 0.005% manganese sulfate (w/v) (Fisher), 0.1% Tween 80 (v/v) (Sigma),

0.003% bromocresol purple (v/v) (Fisher), and 1% sugar (w/v). The carbohydrates added

were either glucose (dextrose) (Sigma), fructose (Sigma), sucrose (Sigma), or FOS. Two

types of complex sugars were used as FOS: a GFn mix (manufactured by R. Hutkins),

consisting of glucose monomers linked α-1,2 to two, three or four fructosyl moieties

linked β-2,1, to form kestose (GF2), nystose (GF3) and fructofuranosyl-nystose (GF4),

respectively; and an Fn mix, raftilose, derived from inulin hydrolysis (Orafti). Without

carbohydrate supplementation, the semi-synthetic medium was unable to sustain bacterial

growth above OD600nm~0.2.
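The percentages given for the semi-synthetic medium can be turned into weighed amounts with simple arithmetic, since 1% (w/v) corresponds to 1 g per 100 mL. The following is a minimal sketch of that conversion; the dictionary and function names are hypothetical and introduced only for illustration, and Tween 80 and bromocresol purple are omitted because the text reports them as v/v.

```python
# Percent (w/v) means grams of solute per 100 mL of medium.
# Only the components reported as w/v in the text are listed here;
# Tween 80 and bromocresol purple (reported as v/v) are left out.
MEDIUM_PERCENT_WV = {
    "bactopeptone": 1.0,
    "yeast extract": 0.5,
    "dipotassium phosphate": 0.2,
    "sodium acetate": 0.5,
    "ammonium citrate": 0.2,
    "magnesium sulfate": 0.02,
    "manganese sulfate": 0.005,
    "sugar": 1.0,
}


def grams_for_volume(percent_wv: float, volume_ml: float) -> float:
    """Grams of one component needed for volume_ml of medium."""
    return percent_wv / 100.0 * volume_ml


def recipe(volume_ml: float) -> dict:
    """Grams of every listed component for the requested volume (hypothetical helper)."""
    return {
        name: round(grams_for_volume(pct, volume_ml), 4)
        for name, pct in MEDIUM_PERCENT_WV.items()
    }


if __name__ == "__main__":
    for name, grams in recipe(1000).items():
        print(f"{name}: {grams} g per litre")
```

For growth experiments run at a different sugar concentration, only the "sugar" entry would need to change.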

2.3.2 Computational analysis of the putative msm operon

A 10 kbp DNA locus containing a putative msm (multiple sugar metabolism)

operon was identified from the L. acidophilus NCFM genome sequence. ORF predictions

were carried out by four computational programs: Glimmer (Salzberg et al., 1998;

Delcher et al., 1999), Clone Manager (Scientific and Educational Software), the NCBI

ORF caller (http://www.ncbi.nlm.nih.gov/gorf/gorf.html), and GenoMax (InforMax Inc.,

MD). Glimmer was previously trained with a set of L. acidophilus genes available in

public databases. The predicted ORFs were translated into putative proteins that were

submitted to BlastP analysis (Altschul et al., 1990).
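To illustrate what an ORF caller does at its simplest, the sketch below scans all six reading frames for ATG-to-stop stretches above a minimum length. It is a naive illustration only and does not reproduce Glimmer or any of the other programs named above, which rely on trained statistical models; the function names and the `min_codons` threshold are assumptions made for this example.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list:
    """Return (strand, start, end) for each ATG...stop stretch.

    Coordinates are 0-based, end-exclusive, and refer to the strand that
    was scanned ("-" coordinates are on the reverse complement).
    min_codons counts the start codon but not the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == START:
                    start = i  # first in-frame ATG opens a candidate ORF
                elif start is not None and codon in STOPS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"  # made-up sequence
    print(find_orfs(toy, min_codons=3))
```

Comparing the calls of several independent programs, as done here with four of them, helps flag ORFs that a single heuristic would miss or over-predict.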

2.3.3 RNA isolation and analysis

Total RNA was isolated using TRIzol (GibcoBRL) following the supplier’s

instructions. Cells in the mid-log phase were harvested by centrifugation (2 minutes,

14,000 rpm) and cooled on ice. Pellets were resuspended in TRIzol by vortexing and

subjected to five cycles of 1 min of bead beating and 1 min on ice. Nucleic acids were

subsequently purified using three chloroform extractions, and precipitated using

isopropanol and centrifugation for 10 min at 12,000 rpm. The RNA pellet was washed

with 70% ethanol, and resuspended in DEPC-treated water. RNA samples were treated

with DNase I according to the supplier’s instructions (Boehringer Mannheim). First

strand cDNA was synthesized using the Invitrogen RT-PCR kit according to the

supplier’s instructions. cDNA products were subsequently amplified using PCR with

primers internal to genes of interest. For RNA slot blots, RNA samples were transferred

to nitrocellulose membranes (BioRad) using a slot blot apparatus (Bio-Dot SF, BioRad),

and the RNAs were UV crosslinked to the membranes. Blots were probed with DNA

fragments generated by PCR that had been purified from agarose gels (GeneClean III kit,

Midwest Scientific). Probes were labeled with α-32P, using the Amersham Multiprime

Kit, and consisted of 700 bp and 750 bp fragments internal to the msmE and bfrA genes,

respectively. Hybridization and washes were carried out according to the supplier’s

instructions (Bio-Dot Microfiltration Apparatus, BioRad) and radioactive signals were

detected using Kodak Biomax film. Primers are listed in Supporting Table 1.

2.3.4 Comparative genomic analysis

A gene cluster bearing a fructosidase gene was selected after computational data-

mining of the L. acidophilus NCFM genome. Additionally, microbial clusters containing

fructosidase EC 3.2.1.26 orthologs, or bearing an ABC transport system associated with

an alpha-galactosidase EC 3.2.1.22 were selected from public databases (NCBI, TIGR).

The sucrose operon is a widely distributed cluster, consisting of either three or four

elements, namely: a regulator, a sucrose PTS transporter, a sucrose hydrolase and

occasionally a fructokinase. Two gene cluster alignments were generated: (i) a PTS

alignment, representing similarities over the sucrose operon, bearing a PTS transport

system associated with a sucrose hydrolase; (ii) an ABC alignment, representing

similarities over the multiple sugar metabolism cluster, bearing an ABC transport system

usually associated with a galactosidase. Sequence information is available in Table 2.

2.3.5 Phylogenetic trees

Nucleotide and protein sequences were aligned computationally using the

CLUSTALW algorithm (Thompson et al., 1994). The multiple alignment outputs were

used for generating unrooted neighbor-joining phylogenetic trees using MEGA2 (Kumar

et al., 2001). In addition to a phylogenetic tree derived from 16S rRNA genes, trees were

generated for ABC transporters, PTS transporters, transcription regulators, fructosidases,

and fructokinases.
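Neighbor-joining operates on a matrix of pairwise distances computed from the multiple alignment. As a minimal sketch of that first step, the code below computes the uncorrected p-distance (proportion of differing sites, ignoring gapped positions) between aligned sequences. The distance model actually selected in MEGA2 is not stated in the text, so the p-distance is an assumption chosen for simplicity, and the function names are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differences / compared


def distance_matrix(aligned: dict) -> dict:
    """Pairwise p-distances keyed by (name, name) for all aligned sequences."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n]) for m in names for n in names}


if __name__ == "__main__":
    toy_alignment = {  # made-up sequences, for illustration only
        "a": "ACGT-A",
        "b": "ACGTTA",
        "c": "ATGT-C",
    }
    for pair, d in distance_matrix(toy_alignment).items():
        print(pair, d)
```

A tree-building routine would then repeatedly join the pair of taxa that minimizes the neighbor-joining criterion on this matrix.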

2.3.6 Gene inactivation

Gene inactivation was conducted by site-specific plasmid integration into the L.

acidophilus chromosome via homologous recombination (Russell and Klaenhammer,

2001). Internal fragments of the msmE and bfrA genes were cloned into pORI28 using E.

coli as a host (Law et al., 1995), and the constructs were subsequently purified and

transformed into L. acidophilus NCFM. The ability of the mutant strains to grow on a

variety of carbohydrate substrates was investigated using growth curves. Strains were

grown on semi-synthetic medium supplemented with 0.5% w/v carbohydrate.

2.4 Results

2.4.1 Computational analysis of the msm operon

Analysis of the msm locus using four ORF-calling programs revealed the presence

of seven putative ORFs. Because most of the encoded proteins were homologous to

those of the msm operon present in S. mutans (Russell et al., 1992), a similar gene

nomenclature was used. The analysis of the predicted ORFs suggested the presence of a

transcriptional regulator of the LacI repressor family, MsmR; a four-component transport

system of the ATP binding cassette (ABC) family, MsmEFGK; and two enzymes

involved in carbohydrate metabolism, namely a fructosidase EC 3.2.1.26, BfrA; and a

sucrose phosphorylase EC 2.4.1.7, GtfA. A putative Shine-Dalgarno sequence

5’-AGGAGG-3’ was found within 10 bp upstream of the msmE start codon. A dyad

symmetry analysis revealed the presence of two stem-loop structures that could act as

putative Rho-independent transcriptional terminators: one between msmK and gtfA

(between bp 6,986 and 7,014), free energy –13.6 kcal.mol-1, and one 20 bp downstream of

the last gene of the putative operon (between bp 8,500 and 8,538), free energy –16.5

kcal.mol-1. The operon structure is shown in Figure 1.
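The search for a ribosome binding site described above can be sketched as a scan of the region immediately upstream of a start codon for the AGGAGG motif. This is an illustration under stated assumptions: the function name, the `max_spacing` parameter, and the definition of spacing (bases between the 3' end of the motif and the first base of the start codon) are hypothetical choices, and the toy sequence is made up rather than taken from the msm locus.

```python
def find_sd_upstream(seq: str, start_index: int,
                     motif: str = "AGGAGG", max_spacing: int = 10):
    """Return the spacing between an upstream motif and the start codon.

    start_index is the 0-based position of the first base of the start codon.
    Spacing is the number of bases between the motif's 3' end and that base.
    Returns None when no exact motif lies within max_spacing bases.
    """
    seq = seq.upper()
    for spacing in range(max_spacing + 1):
        motif_end = start_index - spacing
        motif_start = motif_end - len(motif)
        if motif_start < 0:
            break  # ran off the 5' end of the sequence
        if seq[motif_start:motif_end] == motif:
            return spacing
    return None


if __name__ == "__main__":
    toy = "TTTAGGAGGACATTCCATGAAA"  # made-up sequence
    atg = toy.find("ATG")
    print(atg, find_sd_upstream(toy, atg))
```

Exact matching is the simplest case; a real scan would usually tolerate mismatches, since many functional Shine-Dalgarno sequences deviate from the consensus.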

The regulator contained two distinct domains: a DNA binding domain at the

amino-terminus with a predicted helix-turn-helix motif (pfam00354), and a sugar-binding

domain at the carboxy-terminus (pfam00532). The transport elements consisted of a

periplasmic solute binding protein (pfam01547), two membrane spanning permeases

(pfam00528), and a cytoplasmic nucleotide binding protein (pfam00005), characteristic

of the different subunits of a typical ABC transport system (Quentin et al., 1999). A

putative anchoring motif LSLTG was present at the amino-terminus of the substrate-

binding protein. Each permease contained five transmembrane regions predicted

computationally (Krogh et al., 2001). Analyses of ABC transporters in recently

sequenced microbial genomes have defined four characteristic sequence motifs (Linton

and Higgins, 1998; Braibant et al., 2000). The predicted MsmK protein included all four

ABC conserved motifs, namely: Walker A: GPSGCGKST (consensus GxxGxGKST or

[AG]xxxxGK[ST]); Walker B: IFLMDEPLSNLD (consensus hhhhDEPT or

DExxxxxD); ABC signature sequence: LSGG; and Linton and Higgins motif: IAKLHQ

(consensus hhhhH+/-, with h, hydrophobic and +/- charged residues). The putative

fructosidase showed high similarity to glycosyl hydrolases (pfam00251). The putative

sucrose phosphorylase shared 63% residue identity with that of S. mutans.
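The consensus patterns quoted above for the ABC motifs of MsmK translate directly into regular expressions, with each "x" becoming a single-character wildcard. The sketch below scans a protein sequence for the Walker A, Walker B (DExxxxxD form) and LSGG patterns. The dictionary and function names are hypothetical, the looser consensus forms written with "h" and "+/-" are not encoded, and the toy sequence is simply the reported motif strings joined by made-up linkers, not the MsmK sequence.

```python
import re

# Consensus patterns from the text, with "x" written as "." (any residue).
MOTIFS = {
    "Walker A (GxxGxGKST)": r"G..G.GKST",
    "Walker A ([AG]xxxxGK[ST])": r"[AG].{4}GK[ST]",
    "Walker B (DExxxxxD)": r"DE.{5}D",
    "ABC signature (LSGG)": r"LSGG",
}


def scan_motifs(protein: str) -> dict:
    """Return {motif name: [(0-based start, matched text), ...]}."""
    protein = protein.upper()
    return {
        name: [(m.start(), m.group()) for m in re.finditer(pattern, protein)]
        for name, pattern in MOTIFS.items()
    }


if __name__ == "__main__":
    # Toy sequence: motif strings reported for MsmK joined by arbitrary linkers.
    toy = "MK" + "GPSGCGKST" + "AAA" + "LSGG" + "AA" + "IFLMDEPLSNLD"
    for name, hits in scan_motifs(toy).items():
        print(name, hits)
```

Note that `re.finditer` reports non-overlapping matches only, which is sufficient here because each motif is expected once per nucleotide binding domain.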

2.4.2 Sugar induction and co-expression of contiguous genes

Transcriptional analysis of the msm operon using RT-PCR and RNA slot blots

showed that sucrose and both types of oligofructose (GFn and Fn) were able to induce

expression of msmE and bfrA (Figure 2A). In contrast, glucose and fructose did not

induce transcription of those genes, suggesting specificity for non-readily fermentable

sugars and the presence of a regulation system based on carbohydrate availability. In the

presence of both FOS and readily fermentable sugars, glucose repressed expression of

msmE, even if present at a lower concentration, whereas fructose did not (Figure 2B).

Analysis of the transcripts induced by oligofructose indicated that all genes within the

operon are co-expressed (Figure 6) in a manner consistent with the S. mutans msm

operon (McLaughlin and Ferretti, 1996).

2.4.3 Mutant phenotype analysis

The ability of the bfrA (fructosidase) and msmE (ABC transporter) mutant strains

to grow on a variety of carbohydrates was monitored by both optical density at 600 nm

and colony forming units (cfu). The mutants retained the ability to grow on glucose,

fructose, sucrose, galactose, lactose and FOS-GFn, in a manner similar to that of the

control strain (Figure 7), a lacZ mutant of the L. acidophilus parental strain also

generated by plasmid integration (Russell and Klaenhammer, 2001). This strain was

chosen because it also bears a copy of the plasmid used for gene inactivation integrated in

the genome. In contrast, both the bfrA and msmE mutants halted growth on FOS-Fn

prematurely (Figure 3), likely upon exhaustion of simple carbohydrate from the semi-

synthetic medium. After one passage, the msmE mutant displayed slower growth on FOS-

Fn, while the bfrA mutant could not grow (Figure 3). Additionally, terminal cell counts

from overnight cultures grown on FOS-Fn were significantly lower for the mutants,

especially after one passage (Figure 7).

2.4.4 Comparative genomic analyses and locus alignments

Comparative genomic analysis of gene architecture between L. acidophilus, S.

mutans, S. pneumoniae, B. subtilis and B. halodurans revealed a high degree of synteny

within the msm cluster, except for the core sugar hydrolase (Figure 4A). For the sucrose

operon, in contrast, gene content was consistent but gene order was not well conserved

(Figure 4B). The lactic acid bacteria exhibit a divergent sucrose operon, where the

regulator and the hydrolase are transcribed opposite to the transporter and the

fructokinase. In contrast, gene architecture was variable amongst the proteobacteria.
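The synteny observation for the msm cluster can be illustrated with a small gene-order comparison. The sketch below is built on assumptions: the gene orders are transcribed from the labels of Figure 4A, the cluster names ("msm", "msm2") are hypothetical keys, and the letters A and B are taken to denote the α-galactosidase and fructosidase, respectively.

```python
# Gene orders as read from the Figure 4A labels (assumed pairing of
# organisms and gene strings; dictionary keys are illustrative only).
MSM_CLUSTERS = {
    "S. pneumoniae msm": "R A E F G S",
    "S. mutans msm": "R A E F G S K",
    "B. subtilis msm": "R E F G A",
    "B. halodurans msm": "R E F G A",
    "L. acidophilus msm2": "R2 E2 F2 G2 K2 A S2",
    "L. acidophilus msm": "R E F G B K S",
    "S. pneumoniae msm2": "R2 E2 F2 G2 B",
}

HYDROLASES = {"A", "B"}  # assumed: A, alpha-galactosidase; B, fructosidase


def normalize(order):
    """Split a gene-order string and drop the paralog suffix '2'."""
    return [gene.rstrip("2") for gene in order.split()]


def has_contiguous_block(genes, block):
    """True if block occurs as a contiguous run within genes."""
    n = len(block)
    return any(genes[i:i + n] == block for i in range(len(genes) - n + 1))


def hydrolase_position(genes):
    """0-based index of the first hydrolase gene in the cluster."""
    return next(i for i, gene in enumerate(genes) if gene in HYDROLASES)


for name, order in MSM_CLUSTERS.items():
    genes = normalize(order)
    print(name, has_contiguous_block(genes, ["E", "F", "G"]),
          hydrolase_position(genes))
```

Under these assumptions, the transporter genes form a conserved contiguous block while the hydrolase occupies different positions, which is one way to express "synteny except for the core sugar hydrolase".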

2.4.5 Phylogenetic trees

Phylogenetic trees were generated to investigate whether there was a correlation

between protein similarity, gene architecture and the phylogenetic relationships of the

selected microorganisms. The phylogenetic relationships were obtained from 16S

ribosomal DNA alignment. All proteobacteria appeared distant from the LAB, and the

Clostridium species formed a well-defined cluster between T. maritima and the bacillales

(Figure 5A).

For the fructosidases, all enzymes obtained from the LAB sucrose operons

clustered extremely well together, at the left end of the tree, whereas there was apparent

shuffling of the other three groups (Figure 5B). The paralogs of those fructosidases in S.

mutans, S. pneumoniae, and L. acidophilus clustered at the opposite end of the tree.

Interestingly, the L. acidophilus fructosidase was distant from the LAB sucrose

hydrolases cluster, and showed strong homology to enzymes experimentally associated

with oligosaccharide hydrolysis, in organisms such as T. maritima, M. laevaniformans,

and B. subtilis.

Each component of the ABC transport system clustered together (Figure 5C),

namely MsmE (substrate-binding protein), MsmF and MsmG (membrane-spanning

proteins) and MsmK (nucleotide-binding unit). For MsmE, MsmF and MsmG, three

consistent sub-clusters were obtained: (i) the two Bacillus species; (ii) L. acidophilus, S.

mutans and S. pneumoniae from the operons bearing a galactosidase; (iii) L. acidophilus

and S. pneumoniae from the operons bearing a fructosidase.

For the phosphotransferase system (PTS) transporters, the clustering did not

proceed according to phylogeny, especially for lactic acid bacteria, which formed two

separate clusters (Figure 5D). The two distant transporters at the bottom of the tree are

non-PTS sucrose transporters of the major facilitator family of transporters, as suggested

by their initial annotation.

All regulators were repressors, with the exception of those regulators of L.

acidophilus, S. pneumoniae and S. mutans clustering at the bottom of the tree (Figure

5E), which activate transcription of operons bearing an ABC transport system associated

with a galactosidase (Russell et al., 1992). In contrast, the msm regulators for both S.

pneumoniae and L. acidophilus seemed to be repressors similar to that of the sucrose

operon (Figure 5E). The helix-turn-helix DNA binding motif of the regulator was very well

conserved amongst selected regulators of the LacI family (Supporting Figure 3A), as

shown previously (Nguyen and Saier, 1995). In contrast, the seven regulators at the

bottom of the tree did not contain this conserved motif.

The fructokinase clustering was the most similar to that of the 16S phylogenetic

tree, with distinct clustering of lactobacillales, bacillales, clostridia, and proteobacteria

(Figure 5F). The lack of correlation between phylogeny, gene architecture and protein

similarity may be due to extensive gene transfer amongst bacteria and independent

sequence divergence.

2.4.6 Catabolite response elements (cre) analysis

Analysis of the promoter-operator region upstream of the msmE gene revealed the

presence of two 17-bp palindromes separated by 30 nucleotides, showing high similarity

to a consensus sequence for the cis-acting sites controlling catabolite repression in

Gram-positive bacteria, notably Bacillus subtilis (Burne et al., 1999; Weickert and Chambliss,

1990; Miwa et al., 2000; Yamamoto et al., 2001). Several cre-like sequences highly

similar to those found in B. subtilis and S. mutans (Weickert and Chambliss, 1990; Miwa

et al., 2000; Yamamoto et al., 2001) were also retrieved from the promoter-operator

region of the L. acidophilus NCFM sucrose operon as well as that of the other msm locus

(Table 1). Interestingly, sequences nearly identical to the cre-like elements found in the

L. acidophilus msm operon were found in the promoter-operator region of the msm locus

in S. pneumoniae (Table 1).
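Comparison of candidate sites with a degenerate cre consensus can be sketched by translating the IUPAC codes of Table 1 into a regular expression. The example below is a simplified illustration, not the search procedure used in this study: it performs strict, ungapped matching and a reverse-complement palindrome test, and all function names are hypothetical.

```python
import re

# IUPAC codes used in Table 1 (N, any; W, A or T; R, G or A).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "[AT]", "R": "[AG]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate a degenerate consensus into a regular expression."""
    return "".join(IUPAC[base] for base in consensus.upper())


def matches_consensus(consensus, site):
    """Strict, ungapped match of a site against the whole consensus."""
    return re.fullmatch(consensus_to_regex(consensus), site.upper()) is not None


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def is_palindrome(seq):
    """True if the sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq)


# Consensus of Weickert and Chambliss (1990), as listed in Table 1.
WEICKERT_CONSENSUS = "TGWAANCGNTNWCA"

# B. subtilis optimal operator from Table 1.
print(matches_consensus(WEICKERT_CONSENSUS, "TGTAAGCGCTTACA"))
# Uppercase stretch labelled cre1 in Figure 1.
print(is_palindrome("TTGAAACGTTTCAA"))
```

The L. acidophilus and S. pneumoniae entries in Table 1 are aligned to the consensus with gaps, so a gap-tolerant or mismatch-tolerant comparison would be needed to score them; the strict matcher above is intended only to show the consensus encoding.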

2.5 Discussion

The L. acidophilus NCFM msm operon encodes an ABC transporter and an associated

fructosidase, both of which are induced in the presence of FOS. Sucrose and both types

of oligofructose induced expression of the operon, whereas glucose and fructose did not.

Additionally, glucose repressed expression of the operon, suggesting the presence of a

regulation mechanism of preferred carbohydrate utilization based on availability. Specific

induction by FOS and sucrose, and repression by glucose indicated transcriptional

regulation, likely through cre present in the operator-promoter region, similar to those

found in B. subtilis (Miwa et al., 2000) and S. mutans (Burne et al., 1999). Catabolite

repression is a mechanism widely distributed amongst Gram-positive bacteria, usually

mediated in cis by catabolite response elements, and in trans by repressors of the LacI

family, responsible for transcriptional repression of genes encoding catabolic enzymes in

the presence of readily fermentable sugars (Weickert and Chambliss, 1990; Hueck et al.,

1994; Wen and Burne, 2002).

A variety of enzymes have been associated with microbial utilization of

fructooligosaccharides, namely: fructosidase EC 3.2.1.26 (Burne et al., 1987; Liebl et al.,

1998), inulinase EC 3.2.1.7 (Onodera and Shiomi, 1988; McKellar and Modler, 1989;

Xiao et al., 1989), levanase EC 3.2.1.65 (Menendez et al., 2002), fructofuranosidase EC

3.2.1.26 (Muramatsu et al., 1992; Oda and Ito, 2000; Perrin et al., 2000), fructanase EC

3.2.1.80 (Hartemink et al., 1995), and levanbiohydrolase EC 3.2.1.64 (Saito et al., 2000;

Song et al., 2002). Despite the semantic diversity, these enzymes are functionally related,

and should be considered as members of the same β-fructosidase superfamily that

incorporates members of both glycosyl hydrolase families 32 and 68 (Naumoff, 2001). All those

enzymes share the conserved motif H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G, and are all

involved in the hydrolysis of β-D-fructosidic linkages to release fructose. Generally,

fructosidases across genera share approximately 25-30% identity and 35-50% similarity

(Burne et al., 1999), with several regions widely conserved across the glycosyl hydrolase

32 family (Naumoff, 2001). The two residues shown to be involved in the enzymatic

activity of fructan-hydrolases, namely Asp 47 and Cys 230 (Reddy and Maley, 1990;

Liebl et al., 1998), as well as motifs highly conserved in the beta-fructosidase

superfamily, such as the NDPNG, FRDP, and ECP motifs (Liebl et al., 1998; Naumoff,

2001), were extremely well conserved amongst all fructosidase sequences (Supporting

Figure 3B).
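The conserved motif H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G is written in PROSITE-style notation, which maps directly onto a regular expression. The following sketch handles only the elements used in that motif (literal residues, residue classes in brackets, and x or x(n) wildcards); the test peptide is a hypothetical string constructed for illustration and is not a sequence from this study.

```python
import re

MOTIF = "H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G"


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern to a regular expression.

    Supports literal residues, [..] residue classes, x and x(n) only.
    """
    parts = []
    for element in pattern.split("-"):
        wildcard = re.fullmatch(r"x(?:\((\d+)\))?", element)
        if wildcard:
            count = wildcard.group(1)
            parts.append("." if count is None else ".{%s}" % count)
        else:
            parts.append(element)  # literal residue or [..] class
    return "".join(parts)


regex = prosite_to_regex(MOTIF)
print(regex)

# Hypothetical peptide built to satisfy the motif (illustration only).
example = "HFTPQKGWMNDPNG"
print(re.search(regex, example) is not None)
```

Applied to an alignment of fructosidase sequences, the same expression could be used to report the position of the motif in each sequence.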

Since the L. acidophilus fructosidase was similar to that of T. maritima and S.

mutans’ FruA (see Figure 5B), two enzymes that have experimentally been associated

with oligofructose hydrolysis (Burne et al., 1987; Liebl et al., 1998), we initially

hypothesized that BfrA is responsible for FOS hydrolysis. Induction and gene

inactivation data confirmed the correlation between the msm locus and FOS utilization.

The L. acidophilus BfrA fructosidase was most similar to that of T. maritima, which has

the ability to release fructose from sucrose, raffinose, levan (β2,6) and inulin (β2,1) in an

exo-type manner (Liebl et al., 1998). It was also very similar to other enzymes which

have been characterized experimentally, and associated with hydrolysis of FOS

compounds by S. mutans (Burne et al., 1999) and M. laevaniformans (Song et al., 2002).

Analysis of FOS degradation by S. mutans showed that FruA is involved in hydrolysis of

levan, inulin, sucrose and raffinose (Burne et al., 1987; Russell et al., 1992; Hartemink et

al., 1995; Burne et al., 1999). Additionally, it was shown that expression of this gene was

regulated by catabolite response elements (Burne et al., 1999; Wen and Burne, 2002) and that

fruA transcription was induced by levan, inulin and sucrose, but repressed by readily

metabolizable hexoses (Burne et al., 1987; Burne et al., 1999).

In S. mutans, FruA was shown to be an extracellular enzyme, anchored

to the cell wall by an LPxTG motif (Burne and Penders, 1992), that catalyses the

degradation of available complex carbohydrates outside of the cell. Additionally,

microbial fructosidases associated with FOS hydrolysis such as M. laevaniformans LevM

(Song et al., 2002) and S. exfoliatus levanbiohydrolase (Saito et al., 2000) have been

reported as extracellular enzymes as well. In contrast, the L. acidophilus NCFM

fructosidase does not contain an anchoring signal and is thus likely a cytoplasmic enzyme

requiring transport of its substrate(s) through the cell membrane. No additional secreted

levanase or inulinase was found in the L. acidophilus genome sequence. Since transporter

genes are often co-expressed with genes involved in the metabolism of the transported

compounds (Lambert et al., 2001), in silico analysis of the msm operon indicates that the

substrate of the fructosidase is transported by an ABC transport system. This is rather

unusual because, when the fructosidase is not extracellular, its gene is

commonly associated with a sucrose PTS transporter (Figure 4), notably in lactococci,

streptococci and bacilli (Hiratsuka et al., 1998; Luesink et al., 1999), or a sucrose

permease of the major facilitator family, as in B. longum. Those fructosidases usually

associated with PTS transporters are generally sucrose-6-phosphate hydrolases that do

not have FOS as cognate substrate. Therefore, L. acidophilus NCFM may have combined

the ABC transport system usually associated with an alpha-galactosidase with a

fructosidase in the msm locus. The genetic makeup of NCFM appears distinctive,

resembling only that of S. pneumoniae. Additionally, recent evidence in L.

paracasei suggested that an ABC transport system might be involved in FOS utilization

(Kaplan and Hutkins, 2003), which further supports the hypothesis that FOS is

transported by an ABC transporter in L. acidophilus.

Lateral gene transfer (LGT) has increasingly been shown to account for a

significant number of genes in bacterial genomes (Koonin et al., 2001), and may account

for a large proportion of the strain-specific genes found in microbes, as shown in H.

pylori (Salama et al., 2000), C. jejuni (Dorrell et al., 2001), S. pneumoniae (Hakenbeck

et al., 2001), and T. maritima (Nesbo et al., 2002). Notably, in T. maritima, genes

involved in sugar transport and polysaccharide degradation represent a large proportion

of variable genes, with ABC transporters having the highest horizontal gene transfer

frequency (Nesbo et al., 2002). In addition, it was recently suggested that oligosaccharide

catabolic capabilities of B. longum have been expanded through horizontal transfer, as

part of its adaptation to the human GI tract (Schell et al., 2002), and that the large set of

sugar uptake and utilization genes in L. plantarum was acquired through LGT

(Kleerebezem et al., 2003).

Intestinal microbes would benefit greatly from acquisition of gene clusters

involved in transport and catabolism of undigested sugars, especially if they conferred a

competitive edge towards successful colonization of the host GI tract. It is possible that L.

acidophilus acquired the ability to utilize FOS through genetic exchange, since ABC

transporters and polysaccharide degradation enzymes have a high horizontal gene transfer

frequency (Nesbo et al., 2002). The two fructosidase paralogs seemed fairly distant from

one another, sharing 28% identity and 44% similarity, suggesting those genes might have

arisen from LGT rather than gene duplication. Also, since no neighboring genes or

sequences are common to those two genes, a duplication event seems unlikely. Given the

lack of consistency between phylogeny, gene architecture, and protein similarity, it is

possible both the msm and sucrose operons underwent gene rearrangements. However,

there was no evidence the msm cluster was obtained through LGT, since the GC content

was very similar to that of the genome, and there was no discrepancy in the genetic code

usage.
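The GC-content argument can be made concrete with a short comparison between a locus and the genome. This is a minimal sketch under stated assumptions: the tolerance threshold is an arbitrary illustrative value, not one used in this study, and the sequences would have to be supplied by the reader (for example, from the deposited msm locus and the genome sequence).

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    s = seq.upper()
    total = sum(s.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (s.count("G") + s.count("C")) / total


def gc_deviates(region, genome, tolerance=0.05):
    """True if the region GC fraction differs from the genome GC fraction
    by more than the tolerance (hypothetical threshold)."""
    return abs(gc_content(region) - gc_content(genome)) > tolerance


# Toy sequences for illustration only.
print(gc_content("ATGC"))
print(gc_deviates("GGGGCCCC", "ATATATGC"))
```

A locus whose GC content falls within the tolerance of the genome-wide value offers no compositional evidence of recent acquisition, which is the reasoning applied to the msm cluster above; it does not by itself exclude an older transfer event.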

Based on these observations, we conclude that L. acidophilus has combined the

ABC transport system derived from the raffinose operon with a β-fructosidase to form a

distinct gene cluster involved in transport and catabolism of prebiotic compounds

including FOS, suggesting a possible adaptation of the sugar catabolism system towards

different complex sugars. The catabolic properties of this operon might differ from those

of the raffinose and sucrose operons (Figure 9). In light of the theory that environmental

factors and ecology might be dominant over phylogeny for variable genes (Nesbo et al.,

2002), we may hypothesize that L. acidophilus has acquired FOS utilization capabilities

through LGT, or rearranged its genetic make-up to build a competitive edge towards

colonization of the human GI tract by using prebiotic compounds, ultimately contributing

to a more beneficial microbiota.

2.6 References


Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. (1990) J. Mol. Biol. 215, 403-410

Barefoot, S. F. & Klaenhammer, T. R. (1983) Appl. Environ. Microbiol. 45, 1808-1815

Braibant, M., Gilot, P. & Content, J. (2000) FEMS Microbiol. Rev. 24, 449-467

Burne, R. A., Schilling, K., Bowen, W. H. & Yasbin, R. E. (1987) J. Bacteriol. 169, 4507-4517

Burne, R. A. & Penders, J. E. (1992) Infect. Immun. 60, 4621-4632

Burne, R. A., Wen, Z. T., Chen, Y. Y. M. & Penders, J. E. C. (1999) J. Bacteriol. 181, 2863-2871

Delcher, A. L., Harmon, D., Kasif, S., White, O. & Salzberg, S. L. (1999) Nucleic Acids Res. 27, 4636-4641

Dorrell, N., Mangan, J. A., Laing, K. G., Hinds, J., Linton, D., Al-Ghusein, H., Barrell, B. G., Parkhill, J., Stoker, N. G., Karlyshev, A. V., Butcher, P. D. & Wren, B. W. (2001) Genome Res. 11, 1706-1715

Gibson, G. R. & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412

Hakenbeck, R., Balmelle, N., Weber, B., Gardes, C., Keck, W. & de Saizieu, A. (2001) Infect. Immun. 69, 2477-2486

Hartemink, R., Quataert, M. C. J., Vanlaere, K. M. J., Nout, M. J. R. & Rombouts, F. M. (1995) J. Appl. Bacteriol. 79, 551-557

Hartemink, R., VanLaere, K. M. J. & Rombouts, F. M. (1997) J. Appl. Microbiol. 83, 367-374

Hiratsuka, K., Wang, B., Sato, Y. & Kuramitsu, H. (1998) Infect. Immun. 66, 3736-3743

Hueck, C. J., Hillen, W. & Saier, M. H., Jr. (1994) Res. Microbiol. 145, 503-518

Kaplan, H. & Hutkins, R. W. (2000) Appl. Environ. Microbiol. 66, 2682-2684

Kaplan, H. & Hutkins, R. W. (2003) Appl. Environ. Microbiol. 69, 2217-2222

Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer,

Koonin, E. V., Makarova, K. S. & Aravind, L. (2001) Annu. Rev. Microbiol. 55, 709-742

Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E. L. (2001) J. Mol. Biol. 305, 567-580

Kumar, S., Tamura, K., Jakobsen, I. B. & Nei, M. (2001) Bioinformatics 17, 1244-1245

Lambert, A., Osteras, M., Mandon, K., Poggi, M. C. & Le Rudulier, D. (2001) J. Bacteriol. 183, 4709-4717

Law, J., Buist, G., Haandrikman, A., Kok, J., Venema, G. & Leenhouts, K. (1995) J. Bacteriol. 177, 7011-7018

Liebl, W., Brem, D. & Gotschlich, A. (1998) Appl. Microbiol. Biotechnol. 50, 55-64

Linton, K. J. & Higgins, C. F. (1998) Mol. Microbiol. 28, 5-13

Luesink, E. J., Marugg, J. D., Kuipers, O. P. & de Vos, W. M. (1999) J. Bacteriol. 181, 1924-1926

McKellar, R. C. & Modler, H. W. (1989) Appl. Microbiol. Biotechnol. 31, 537-541

McLaughlin, R. E. & Ferretti, J. J. (1996) FEMS Microbiol. Lett. 140, 261-264

Menendez, C., Hernandez, L., Selman, G., Mendoza, M. F., Hevia, P., Sotolongo, M. & Arrieta, J. G. (2002) Curr. Microbiol. 45, 5-12

Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M. & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-1210

Moshfegh, A. J., Friday, J. E., Goldman, J. P. & Ahuja, J. K. C. (1999) J. Nutr. 129, 1407s-1411s

Muramatsu, K., Onodera, S., Kikuchi, M. & Shiomi, N. (1992) Biosci. Biotech. Biochem. 56, 1451-1454

Naumoff, D. G. (2001) Proteins 42, 66-76

Nesbo, C. L., Nelson, K. E. & Doolittle, W. F. (2002) J. Bacteriol. 184, 4475-4488

Nguyen, C. C. & Saier, M. H., Jr. (1995) FEBS Lett. 377, 98-102

Oda, Y. & Ito, M. (2000) Curr. Microbiol. 41, 392-395

Onodera, S. & Shiomi, N. (1988) Agric. Biol. Chem. 52, 2569-2576

Orrhage, K., Sjostedt, S. & Nord, C. E. (2000) J. Antimicrob. Chemother. 46, 603-612

Perrin, S., Grill, J. P. & Schneider, F. (2000) J. Appl. Microbiol. 88, 968-974

Quentin, Y., Fichant, G. & Denizot, F. (1999) J. Mol. Biol. 287, 467-484

Reddy, V. A. & Maley, F. (1990) J. Biol. Chem. 265, 10817-10820

Russell, R. R. B., Aduseopoku, J., Sutcliffe, I. C., Tao, L. & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637

Russell, W. M. & Klaenhammer, T. R. (2001) Appl. Environ. Microbiol. 67, 4361-4364

Rycroft, C. E., Jones, M. R., Gibson, G. R. & Rastall, R. A. (2001) J. Appl. Microbiol. 91, 878-887

Saito, K., Kondo, K., Kojima, I., Yokota, A. & Tomita, F. (2000) Appl. Environ. Microbiol. 66, 252-256

Salama, N., Guillemin, K., McDaniel, T. K., Sherlock, G., Tompkins, L. & Falkow, S. (2000) Proc. Natl. Acad. Sci. USA 97, 14668-14673

Salzberg, S. L., Delcher, A. L., Kasif, S. & White, O. (1998) Nucleic Acids Res. 26, 544-548

Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D. & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427

Song, E. K., Kim, H., Sung, H. K. & Cha, J. (2002) Gene 291, 45-55

Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994) Nucleic Acids Res. 22, 4673-4680

Van Laere, K. M., Hartemink, R., Bosveld, M., Schols, H. A. & Voragen, A. G. (2000) J. Agric. Food Chem. 48, 1644-1652

Weickert, M. J. & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-6242

Wen, Z. T. & Burne, R. A. (2002) J. Bacteriol. 184, 126-133

Xiao, R., Tanida, M. & Takao, S. (1989) J. Ferment. Bioeng. 67, 331-334

Yamamoto, H., Serizawa, M., Thompson, J. & Sekiguchi, J. (2001) J. Bacteriol. 183, 5110-5121

Table 1. Catabolite response element (cre) sequences

Bacterium | Sequence* | Origin

B. subtilis | WTGNAANCGNWNNCW | search sequence, Miwa et al., 2000

B. subtilis | WWTGNAARCGNWWWCAWW | new consensus, Miwa et al., 2000

B. subtilis | TGWAANCGNTNWCA | consensus, Weickert and Chambliss, 1990

B. subtilis | TGTAAGCGCTTACA | optimal operator, Weickert and Chambliss, 1990

B. subtilis | TGTAAACGTTATCA | Yamamoto et al., 2001

L. acidophilus cre1 | ATTG-AAACGTTT-CAA | upstream of msmE

L. acidophilus cre2 | ATAG-AAACGTTT-CAA | upstream of msmE

S. pneumoniae cre1 | AATG-AAACGTTT-CAA | upstream of msmE2

S. pneumoniae cre2 | AATG-AAACGTTT-CAA | upstream of msmE2

L. acidophilus scr | AATAAAAGCGTTTACAT | upstream of scrB

L. acidophilus cre3 | TATGAAAGCGCTTAAAA | upstream of msmE2

S. mutans creW | AGATAGCGATTTGG | Burne et al., 1999

S. mutans creS | AGATAGCGCTTACA | Burne et al., 1999

* N, any; W, A or T; R, G or A; shaded nucleotides were specifically conserved and consistent with the consensus sequences

Table 2. Primers used in this study

Primer | Sequence* | Gene† | Position‡

A | GTAATAATAGTCAAAGTGGC | msmEf | 1,518

B | GATCGGATCCAAGATCAATGCTGCTTTAAA | msmEf2 | 1,706

C | GGAAGGCTGAAGTAGTTTGC | msmEr | 2,192

D | GATCGAATTCGATACAGGATATGGCATTACG | msmEr2 | 2,355

E | AGGATCCATCCATATGCTCCACACT | bfrAf | 4,655

F | AGAATTCAACATGATCAGCACTTCT | bfrAr | 5,370

G | GGAATATCTTCGGCTAATTG | bfrAr2 | 5,540

H | CCACTTCAAGTAGCTGTTACTAATA | msmGf | 4,337

I | CTTGAGTAAGATACTTTTGG | msmGr | 4,469

J | GACCAGAAGATATTCACGCC | msmKf | 6,661

K | ACCTGGCTTGTGATAATCAC | msmKr | 6,833

L | GGTCTTTGAACTTGTTCCGC | gtfAr | 8,269

* underlined sequence indicates restriction site used for cloning

† f indicates forward strand; r indicates reverse strand

‡ position of the 5’ end of the primer, relative to the 10,000 bp DNA locus
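Because the underlining that marked the restriction sites is not preserved in this text, the sites can be located by searching the primer sequences for known recognition sequences. The sketch below assumes, without confirmation from the source, that the cloning sites are BamHI (GGATCC) and EcoRI (GAATTC); only the four primers carrying a 5’ extension are included, and the function name is hypothetical.

```python
# Primers from Table 2 that carry a 5' extension.
PRIMERS = {
    "B": "GATCGGATCCAAGATCAATGCTGCTTTAAA",
    "D": "GATCGAATTCGATACAGGATATGGCATTACG",
    "E": "AGGATCCATCCATATGCTCCACACT",
    "F": "AGAATTCAACATGATCAGCACTTCT",
}

# Recognition sequences; which enzymes were actually used is an assumption.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}


def find_sites(primer):
    """Map enzyme name to the 0-based position of its site in the primer."""
    primer = primer.upper()
    return {name: primer.find(site)
            for name, site in SITES.items() if site in primer}


for label, sequence in PRIMERS.items():
    print(label, find_sites(sequence))
```

Under this assumption, each forward/reverse primer pair would carry two different sites, which is the usual arrangement for directional cloning of an internal gene fragment.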

Table 3. Genes and proteins used for comparative genomic analyses

Bacterium | Genome or locus | Sequence information

B. anthracis | NC_003995 | bfrA NP_654697

B. halodurans | NC_002570 | BH1855 NP_242721, SacP NP_242722, BH1857 NP_242723, SacA NP_242724, 16S (nt 22,819-24,370), MsmR NP_243093, MsmE NP_243092, AmyD NP_243091, AmyC NP_243090, bh2223 NP_243089

B. longum | AE014295 | cscA BL0105 (fructosidase) AE014625_3, cscB (major facilitator family permease) AE014625_4, BL0107 (lacI) AE014625_5, 16S AE014785 nt 2,881-4,400

B. subtilis | NC_000964 | SacT NP_391686, SacP NP_391684, SacA NP_391683, 16S nt 9,809-11,361, MsmR NP_390904, MsmE NP_390905, AmyD NP_390906, AmyC NP_390907, MelA NP_390908, SacC NP_390581, YdhR O05510, YdjE O34768

C. acetobutylicum | NC_003030 | LicT NP_347062, 0423 NP_347063, 0424 NP_347064, SacA NP_347066, 16S nt 9,710-11,219

C. beijerinckii | AF059741 | ScrA AAC99320, ScrR AAC999321, ScrB AAC99322, ScrK AAC99323, 16S X_68179

C. perfringens | NC_003366 | 1531 NP_562447, SacA NP_562448, 1533 NP_562449, 1534 NP_562450, 16S 10,173-11,680

E. coli | NC_002655 | 3623 NP_288931, 3624 NP_288932, 3625 NP_288933, 3626 NP_288934, 16S nt 227,103-228,644

E. faecalis | TIGR shotgun, NC_002938 | EF1601, EF1603, EF1604, 16S AF515223, EFA0067, EFA0069, EFA0070, available at http://www.tigr.org

G. stearothermophilus | TIGR shotgun, NC_002926 | 16S contig221 nt 1,001-2,440, SurT AAB38977, SurP AAB72022, SurA AAB38976, PfK KIBSFF

K. pneumoniae | WashU shotgun, NC_002941 | ScrR P37076, ScrA CAA40658, ScrB CAA40659, 16S AJ233420, locus X57401

L. acidophilus | AY172019 (msm), AY172020 (msm2), AY177419 (scr) | ScrR, ScrB, ScrA, 16S nt 59,261-60,816, MsmR, MsmE, MsmF, MsmG, BfrA, MsmK, GtfA, MsmR2, MsmE2, MsmF2, MsmG2, MsmK2, Aga, GtfA2

L. fermentum | - | ScrK CAD24410

L. gasseri | NZ_AAAB01000011, in progress, JGI | ScrR ZP_00046868, ScrB58 (contig 58) ZP_00046078, ScrB38 (contig 38) ZP_00046869, ScrA21 (contig 21), ScrA58 (contig 58) ZP_00046080, ScrK ZP_00046753, 16S AF519171

L. lactis | M96669 | SacB CAB09690, SacA CAB09689, SacR CAB09692, SacK CAB09691, Luesink et al., 1999, 16S X54260

L. plantarum | AL935263 | 16S AF515222, sacK1 CAD62854, pts1bca CAD62855, sacA CAD62856, sacR CAD62857

L. sakei | - | ScrA AAK92528

Table 3. Genes and proteins used for comparative genomic analyses (continued)

Bacterium | Genome or locus | Sequence information

M. laevaniformans | - | LevM BAB59060

P. multocida | NC_002663 | PtsB NP_246785, ScrR NP_246786, ScrB NP_246787, PM1849 NP_246788, 16S AY078999

P. pentosaceus | Z32771 | ScrK CAA83667, ScrA CAA83668, ScrB CAA83669, ScrR CAA83670, 16S AF515227

R. solanacearum | NC_003296 | ScrR NP_522845, ScrA NP_522844, ScrB NP_522843, 16S nt 1,532,714-1,534,226

S. agalactiae | NC_004116 | ScrR NP_688683, ScrB NP_688682, Sag1690 NP_688681, ScrK NP_688680, 16S nt 16,411-17,916

S. aureus | NC_002758 | ScrR NP_372566, ScrB NP_372565, 2040 NP_372564, 16S P83357

S. mutans | M77351 | ScrK NP_722157, ScrA NP_722158, ScrB NP_722159, ScrR NP_722160, msmR AAA26932, Aga AAA26933, MsmE AAA26934, MsmF AAA26935, MsmG AAA26936, GtfA AAA26937, MsmK AAA26938, FruB AAD28639, FruA Q03174, 16S AF139603

S. pneumoniae | NC_003098 | ScrK NP_359158, ScrA NP_359159, ScrB NP_359160, ScrR NP_359161, 16S nt 15,161-16,674, MsmR NP_359306, Aga NP_359305, MsmE NP_359304, MsmF NP_359303, MsmG NP_359302, GtfA NP_359301, ScrR2 NP_359213, Sbp NP_359212, MspA NP_359211, MspB NP_359210, SacA NP_359209

S. pyogenes | NC_002737 | ScrK NP_269817, ScrA NP_269819, ScrB NP_269820, ScrR NP_269821, 16S nt 17,170-18,504

S. sobrinus | - | ScrB S68598, ScrA S68599

S. typhimurium | - | ScrK P26984, ScrA P08470, ScrR CAA47975, ScrB P37075, 16S Z49264

S. xylosus | - | ScrA S39978, ScrB Q05936, ScrR P74892

T. maritima | NC_000853 | bfrA NP_229215, 1416 NP_229217, 1417 NP_229218, 16S AJ401021, 0296 NP_228108

V. alginolyticus | - | ScrR P24508, ScrB P13394, ScrK P22824, ScrA P22825, 16S AF513447

V. cholerae | NC_002506 | 0653 NP_233042, ScrR NP_233043, 0655 NP_233044, 0656 NP_233045, 16S X74694

msmR msmE msmF msmG bfrA msmK gtfA

1231 aaatggcaataccacaaaaTAActgttgacaagttgtgaaagcgatattatcatttaatt

1291 gtaaattgaaaacgtttccaaagtgttcaaatagttttttgctaaataattatttttttg

1351 tagcgaaaTAGAAACGTTTCAAttaatttaaaacaattagatcttagtaggaaacctttt (cre2)

1411 aatttttgtgcaaaaTTGAAACGTTTCAAaAGGAGGaaaaATGaaaaaatggaaattagg (cre1)

Figure 1. Operon layout. The start and stop codons are in bold, the putative ribosome binding site is boxed, and the cre-like elements are underlined. Terminators are indicated by hairpin structures

[Figure 2 image. Panel A: msmE and bfrA signals for lanes Glc, Fru, Suc, GFn, Fn and ctrl. Panel B: msmE and bfrA signals at 1.0% Fn with 0.0, 0.1, 0.5 or 1.0% Glc or Fru, and a DNA ctrl.]

Figure 2. Sugar induction and repression. A. Transcriptional induction of the msmE, and bfrA genes, monitored by RT-PCR (top) and RNA slot blots (bottom). Cells were grown on glucose (Glc), fructose (Fru), sucrose (Suc), FOS GFn, and FOS Fn. Chromosomal DNA was used as a positive control for the probe. B. Transcriptional repression analysis of msmE and bfrA by variable levels of glucose (Glc) and fructose (Fru): 0.1% (5.5 mM), 0.5% (28 mM) and 1.0% (55 mM), in the presence of 1% Fn. Cells were grown in the presence of Fn until OD600nm approximated 0.5-0.6, glucose was added and cells were propagated for an additional 30 minutes

[Figure 3 image. Two growth-curve panels: Ln OD600nm (e-3 to e1) versus time (0 to 18 hrs). Legend: fructose, GFn, Fn, lacZ, Fn passage.]

Figure 3. Growth curves. The two mutants, bfrA (top) and msmE (bottom) were grown on semi-synthetic medium supplemented with 0.5% w/v carbohydrate: fructose (●), GFn (○), Fn (▼), Fn for one passage ( ). The lacZ mutant grown on Fn was used as control (∇)

A

S. pneumoniae: R A E F G S

S. mutans: R A E F G S K

B. subtilis: R E F G A

B. halodurans: R E F G A

L. acidophilus: R2 E2 F2 G2 K2 A S2

L. acidophilus: R E F G B K S

S. pneumoniae: R2 E2 F2 G2 B

B

E. faecalis: 1604 1603 1601

E. faecalis plasmid: 0070 0069 0067

S. pyogenes: scrR scrB scrA scrK

P. pentosaceus: scrR scrB scrA scrK

S. mutans: scrR scrB scrA scrK

S. agalactiae: scrR scrB sag1 scrK

S. pneumoniae: scrR scrB scrA scrK

L. acidophilus: scrR scrB scrA

L. plantarum: sacR sacA PTS sacK

L. lactis: sacR sacB sacB sacK

E. coli O157:H7: 3626 3625 3624 3623

S. aureus: scrR scrB 2040

C. beijerinckii: scrA scrR scrB scrK

C. perfringens: 1534 1533 sacA 1531

P. multocida: ptsB scrR scrB 1849

V. cholerae: 0653 scrR 0655 0656

B. subtilis: sacT sacP sacA

B. halodurans: 1855 sacP 1857 sacA

C. acetobutylicum: licT 0423 0424 sacA

G. stearothermophilus: surT surP surA

R. solanacearum: scrR scrA scrB

Figure 4. Operon architecture analysis. A. Alignment of the msm locus from selected bacteria. Regulators, white; α-galactosidases, blue; ABC transporters, gray; fructosidases, yellow; sucrose phosphorylase, red. B. Alignment of the sucrose locus from selected microbes. Regulators, white; fructosidases, yellow; PTS transporters, green; fructokinase, purple; putative proteins, black

[Figure 5 graphic: six neighbor-joining trees, panels A to F, with scale bars A, 0.05; B, 0.2; C, 0.2; D, 0.2; E, 0.5; F, 0.2. Tip labels are organism and gene names, for example L. acidophilus msmE, msmF, msmG, msmK, scrR and bfrA.]
Figure 5. Neighbor-joining phylogenetic trees. Lactobacillales, black; bacillales, green; clostridia, blue; thermotogae, yellow; proteobacteria, red. A, 16S; B, fructosidase; C, ABC; D, PTS; E, regulators; F, fructokinase. L. acidophilus proteins are boxed, and shaded when encoded by the msm locus. Bars indicate scales for computed pairwise distances

[Figure 6 graphic: RT-PCR gel image. Gene map: msmE msmF msmG bfrA msmK gtfA, with primer sets A, B, C and D. Lanes in each set: no RT, RT, DNA.]

Figure 6. Co-expression of contiguous genes. Co-transcription of contiguous genes was monitored by RT-PCR using primers as shown on the lower panel. In each set of three bands, a negative control did not undergo reverse transcription (left), and a positive control was obtained from chromosomal DNA used as a template for PCR (right)

[Figure 7 graphic: two bar charts. Upper chart: log10 cfu/ml (axis 1e+6 to 1e+10) for the bfrA, msmE and lacZ mutants on Glc, Fru, Suc, GFn, Fn, FnRP, Lac, Gal and Raff. Lower chart: log10 cfu/ml (axis 1e+3 to 1e+10) for NCFM, lacZ, msmE and bfrA on fructose and Fn.]

Figure 7. Mutant growth on select carbohydrates. Strains were grown overnight (18 hours) on semi-synthetic medium supplemented with 0.5% w/v carbohydrates, either glucose (Glc), fructose (Fru), sucrose (Suc), FOS-GFn (GFn), FOS-Fn from Orafti (Fn), FOS-Fn from Rhone-Poulenc (FnRP), lactose (Lac), or galactose (Gal). Cell counts obtained after one passage of the bfrA mutant on FOS-Fn are shown in the lower graph.

A. Helix-turn-helix motif

Protein                  Sequence                  Position
LacI consensus *           TIKDVARLAGVSKSTVSRVLN
B. halodurans_msmR       MATIKDIAKLANVSNATVSRVLNR  24
B. subtilis_msmR         MVRIKDIALKAKVSSATVSRILNE  24
K. pneumoniae_scrR       RVTIKDIAELAGVSKATASLVLNG  28
S. typhimurium_scrR      RVTIKDIAEQAGVSKATASLVLNG  28
P. multocida_scrR        RITLSDIAKCCGLSTTTVSMILNN  31
C. beijerinckii_scrR     KVTIQDIANMVNVSKSSVSRYLNN  27
C. perfringens_1533      KVTIQDIANMVGVSKSTVSRYLNG  26
B. halodurans_1855       MTTILDIAKLAGVAKSTVSRYLNG  24
S. aureus_scrR           MKNISDIAKLAGVSKSTVSRFLNN  24
S. xylosus_scrR          MKNIADIAKIAGVSKSTVSRYLNN  24
E. faecalis_0070         VAKLTDVAELAGVSPTTVSRVINN  35
S. pyogenes_scrR         VAKLTDVAALAGVSPTTVSRVINK  25
S. agalactiae_scrR       VAKLTDVAALAGVSPTTVSRVINK  25
S. mutans_scrR           VAKLTDVAKLAGVSPTTVSRVINR  25
S. pneumoniae_scrR       VAKLTDVAKLAGVSPTTVSRVINK  25
E. faecalis_1604         VVKLTDVAKLAGVSPTTVSRVINN  28
L. lactis_sacR           MIKLEDVANKAGVSVTTVSRVINR  24
L. acidophilus_scrR      PAKLSDVAREAGCSVTTVSRVINN  25
L. gasseri_scrR          MVKLTDVAAKAGCSVTTVSRVINN  26
L. plantarum_sacR        KPKLNDVAKLAGVSATTVSRVINN  25
P. pentosaceus_scrR      KPKLNDVAKLAGVSATTVSRVINN  25
L. acidophilus_msmR      MATMKDVAQRAGVGVGTVSRVINH  23
S. pneumoniae_scrR2      SITMKDVALEAGVSVGTVSRVINK  32
V. alginolyticus_scrR    --SLHDVARLAGVSKSTVSRVIND  24
E. coli_3626             MASLKDVARLAGVSMMTVSRVMHN  24
B. longum_BL0107         MVTMKEIANKAGVSVSTVSLVLNG  25
R. solanacearum_scrR     RPTIRDVATLAGVSTSTVSRVLNN  34

B. Conserved motifs of the β-fructosidases (sequence, position)

Protein                      NDPNG          FRDP           ECP
L. acidophilus_bfrA          WINDPNGL  38   SNFRDPKV 161   MTECPDYF 214
S. pneumoniae_sacA           WINDPNGF  45   ADFRDPKL 170   MWECPDYF 224
E. coli_3625                 WMNDPNGL  43   MHFRDPKV 165   MWECPDFF 220
B. longum                    WINDPNGL  45   HHYRDPKV 167   MLECPDFF 222
P. multocida_scrB            LLNDPNGL  71   EHVRDPKP 190   MWECPDLL 249
V. alginolyticus_scrB        LLNDPNGL  55   EHFRDPKV 172   MWECPDFF 228
L. acidophilus_scrB          LINDPNGF  51   EHFRDPQI 169   MIECPNLV 226
L. gasseri_scrB38            LLNDPNGF  51   EHFRDPQL 170   MIECPNLV 227
S. mutans_scrB               LLNDPNGF  51   EHFRDPQI 170   MIECPNLV 228
S. sobrinus_scrB             LLNDPNGF  51   EHFRDPQI 170   MIECPNLV 228
E. faecalisp_0069            LLNDPNGF  51   DHFRDPQI 170   MIECPNLL 228
S. pneumoniae_scrB           LLNDPNGF  51   DHFRDPQI 170   MMECPNLV 228
S. agalactiae                LLNDPNGH  51   EHFRDPQI 170   MIECPNII 228
S. pyogenes_scrB             LLNDPNGF  51   EHFRDPQL 170   MIECPNLV 228
L. lactis_sacA               LLNDPNGF  51   DHFRDPQI 171   MIECPNLI 229
L. plantarum_sacA            LLNDPNGF  51   SSFRDPDL 120   MIECPNLV 176
P. pentosaceus_scrB          LLNDPNGF  51   SSFRDPDL 171   MIECPKSG 227
E. faecalis_1603             LLNDPNGF  56   SHFRDPMV 175   MVECPNLV 233
S. aureus_scrB               LLNDPNGL  52   SHFRDPKV 172   MWECPDYF 228
S. xylosus_scrB              LLNDPNGL  52   QHFRDPKV 172   MWECPDYF 228
L. gasseri_scrB58            MLGDPNGF  77   GHFRDPKI 205   MWECSDYF 261
C. beijerinckii_scrB         LINDPNGL  45   AHFRDPYV 164   MWECPNII 220
C. perfringens_sacA          LINDPNGL  43   AHFRDPYI 162   MWECPSFF 218
B. subtilis_sacA             LLNDPNGV  47   AHFSRSEV 167   MWECPDLF 226
G. stearothermophilus_surA   LMNDPNGL  47   AHFRDPK  167   MWECPDLF 226
B. halodurans_sacA           LLNDPNGF  47   AHFRDPKV 165   MWECPDLF 225
C. acetobutylicum_sacA       FMNDPNGL  45   RHFRDPKV 166   MWECPNLF 226
K. pneumoniae_scrB           LLNDPNGF  45   GHVRDPKV 163   MWECPDLF 223
S. typhimurium_scrB          LMNDPNGF  45   GHVRDPKV 163   MWECPDLF 223
V. cholerae_0655             LLNDPNGF 112   EHIRDPKV 232   MWECPDWF 288
R. solanacearum_scrB         LLNDPNGL  33   GHFRDPKA 152   MYECPDLF 207
T. maritima_bfrA             WMNDPNGL  21   HAFRDPKV 141   EIECPDLV 195
B. subtilis_sacC             WMNDPNGM  53   KDFRDPKV 175   VWECPDLF 228
S. mutans_fruA               WANDPNGL 499   QDFRDPKV 605   HTECPDMY 649
M. laevaniformans_levM       WMNDPQRP  90   RDFRDPKV 221   TIECPDLF 275
S. mutans_fruB               FMNDIQTI  52   QNARDPYI 173   LVECPNLK 234

Figure 8. Motifs highly conserved amongst repressors and fructosidases. A, conserved helix-turn-helix motif of the regulators, the consensus sequence was obtained from Nguyen et al., 1995; B, conserved motifs of the β-fructosidases

[Figure 9 graphic: pathway diagram. Loci shown: scrR, scrA, scrB (sucrose); msmR (FOS); msmR2, msmE2, msmF2, msmG2, msmK2, melA, gtfA2 (raffinose). Sucrose: uptake by ScrA as sucrose-6P, then ScrB (EC 3.2.1.26) gives glucose-6P + fructose. FOS: uptake by MsmEFGK, then BfrA (EC 3.2.1.26) gives sucrose + fructose, then GtfA (EC 2.4.1.7) gives glucose-1P + fructose. Raffinose: uptake by MsmEFGK2, then MelA (EC 3.2.1.22) gives sucrose + galactose, then GtfA2 (EC 2.4.1.7) gives glucose-1P + fructose.]

Figure 9. Biochemical pathways. Biochemical pathways describing the likely reactions carried out by the enzymes and transporters encoded in the sucrose, FOS and raffinose loci. For the scr operon, sucrose is transported across the membrane and phosphorylated by a PTS transporter; the sucrose phosphate hydrolase hydrolyses the phosphorylated sucrose molecule into glucose-6-phosphate and fructose. For the msm operon, FOS is transported across the membrane by an ABC transporter; the fructosidase hydrolyses fructose moieties, and the sucrose phosphorylase cleaves sucrose into glucose-1-phosphate and fructose. For the msm2 operon, raffinose is transported across the membrane by an ABC transporter; the alpha-galactosidase hydrolyses the galactose moiety, and the sucrose phosphorylase cleaves sucrose into glucose-1-phosphate and fructose.

Chapter III – Global analysis of carbohydrate utilization and transcriptional regulation in Lactobacillus acidophilus using whole-genome cDNA microarrays

3.1 Abstract

The transport and catabolic machinery involved in carbohydrate utilization by the

probiotic lactic acid bacterium Lactobacillus acidophilus was characterized using whole-

genome cDNA microarrays. Global transcriptional profiles were determined for growth

on glucose, fructose, galactose, sucrose, lactose, trehalose, raffinose and

fructooligosaccharides. Hybridizations were carried out using a round robin design, and

microarray data were analyzed using a two-stage mixed model ANOVA. Genes

differentially expressed were visualized by hierarchical clustering, volcano plots and

novel 3-way contour plots. Quantitative PCR confirmed the fold induction determined by

microarrays. Although 379 genes (20% of the genome) were significantly differentially

expressed, only 63 genes showed induction above 4 fold, indicating that there was a small

number of highly induced genes, which included a variety of carbohydrate transporters

and sugar hydrolases. Specifically, members of the phosphoenolpyruvate:sugar

phosphotransferase system family of transporters were identified for uptake of glucose,

fructose, sucrose and trehalose. Transporters of the ATP binding cassette family were

identified for uptake of raffinose and fructooligosaccharides. A member of the LacS

subfamily of galactoside-pentose-hexuronide translocators was identified for uptake of

galactose and lactose. Saccharolytic enzymes likely involved in the metabolism of mono-,

di- and polysaccharides into substrates of glycolysis were also identified, including the

enzymatic machinery of the Leloir pathway, involved in the catabolism of galactosides.

Results suggested the transcriptome is regulated by carbon catabolite repression.

Although substrate-specific carbohydrate transporters and hydrolases were regulated at

the transcriptional level, genes encoding regulatory proteins CcpA, HPr, HPrK/P and EI

were consistently highly expressed. Collectively, microarray data revealed coordinated

and regulated transcription of genes involved in sugar uptake and metabolism based on

carbohydrate availability in the environment. This dynamic adaptation to environmental

conditions likely contributes to competition with commensals for limited carbohydrate

sources available in the human gastrointestinal tract. This model study provides a global

view of carbohydrate metabolism in L. acidophilus, and illustrates how recently

implemented genomic tools can be used to investigate microbial physiology on a global

scale.

3.2 Introduction

A large, diverse and dynamic microbial community resides in the human

gastrointestinal tract (Tannock, 1999). In particular, the complex intestinal microbial

population includes beneficial bacteria such as bifidobacteria and lactobacilli (Gibson and

Roberfroid, 1995). Among species considered important for human health, a number of

documented lactobacilli have been characterized as probiotics (Reid, 1999). Probiotics

are generally defined as “live microorganisms which, when administered in adequate

amounts confer a health benefit on the host” (Reid et al., 2003). For such microbes,

survival and residence in the intestine relies on their ability to survive gastric passage,

adhere to epithelial cells and utilize nutrients available in the intestine.

Lactobacillus acidophilus NCFM is a gram-positive probiotic lactic acid

bacterium which has the ability to survive in the gastrointestinal tract (Sanders and

Klaenhammer, 2001; Sui et al., 2002), adhere to human epithelial cells in vitro (Greene

and Klaenhammer, 1994; Sanders and Klaenhammer, 2001), modify fecal flora (Sui et

al., 2002), modulate the host immune response (Varcoe et al., 2003), and prevent

microbial gastroenteritis (Varcoe et al., 2003). Additionally, L. acidophilus NCFM has

the ability to utilize prebiotic compounds, which may contribute to the organism’s ability

to compete in the human GIT (Barrangou et al., 2003).

Undigested carbohydrates are a primary source of energy for intestinal microbes

residing in the large intestine. Non-digestible oligosaccharides (NDO) consist primarily

of plant carbohydrates that are resistant to enzymatic degradation and are not absorbed in

the upper intestinal tract. Such dietary compounds eventually reach the large intestine,

where they are hydrolyzed by a limited range of organisms. As a result, NDO have the

ability to selectively modulate the composition of the intestinal microflora (Sui et al.,

2002). NDO such as raffinose and fructooligosaccharides have been shown to selectively

promote the growth of probiotic species, thus are considered prebiotic compounds

(Benno et al., 1987; Gibson et al., 1995). Prebiotics are defined as non-digestible

substances that provide a beneficial physiological effect on the host by selectively

stimulating the favorable growth or activity of a limited number of indigenous bacteria

(Reid et al., 2003). Although considerable attention has been devoted to studying

modulation of the intestinal flora by prebiotics, the molecular mechanisms involved in

uptake and metabolism of those compounds by desirable intestinal microbes remain

mostly uncharacterized.

Lactic acid bacteria are a heterogeneous family of microbes which can use a

variety of nutrients. Specifically, bifidobacteria, streptococci and lactobacilli possess

specialized saccharolytic potentials which reflect the nutrient availability in their

respective environments (Ajdic et al., 2002; Schell et al., 2002; Kleerebezem et al., 2003;

Pridmore et al., 2004). In particular, the versatile saccharolytic potential of L. acidophilus

likely reflects its ability to efficiently utilize energy sources available in the intestinal

environment. Although the Lactobacillus acidophilus NCFM genome encodes numerous

putative genes potentially involved in uptake and metabolism of a variety of

carbohydrates (Altermann et al., 2004), little information is available regarding their

biological functions and expression profiles.

The objective of this study was to use cDNA microarrays to characterize and compare

global gene expression in Lactobacillus acidophilus. Global gene transcription profiles

were used to identify uptake systems, catabolic machinery and regulatory networks

involved in utilization of eight carbohydrates. This is the first comparative global

transcriptional analysis of the fermentation pathways of a lactic acid bacterium over a

range of carbohydrates.

3.3 Materials and Methods

3.3.1 Bacterial strains and media used in this study

The strain used in this study is L. acidophilus NCFM (NCK56) (Altermann et al.,

2004). Cultures were propagated at 37°C, aerobically in MRS broth (Difco). A semi-

synthetic medium consisted of: 1% bactopeptone (w/v) (Difco), 0.5% yeast extract (w/v)

(Difco), 0.2% dipotassium phosphate (w/v) (Fisher), 0.5% sodium acetate (w/v) (Fisher),

0.2% ammonium citrate (w/v) (Sigma), 0.02% magnesium sulfate (w/v) (Fisher), 0.005%

manganese sulfate (w/v) (Fisher), 0.1% Tween 80 (v/v) (Sigma), 0.003% bromocresol

purple (v/v) (Fisher), and 1% sugar (w/v). The carbohydrates added were either: glucose

(dextrose) (Sigma), fructose (Sigma), sucrose (Sigma), FOS (raftilose P95) (Orafti),

raffinose (Sigma), lactose (Fisher), galactose (Sigma), or trehalose (Sigma). Without

carbohydrate supplementation, the semi-synthetic medium was unable to sustain bacterial

growth. Cells underwent at least five passages on each sugar prior to RNA isolation, to

minimize carryover between substrates (Chhabra et al., 2003).
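
The percentages above convert directly to weighed amounts, since 1% (w/v) is 1 g per 100 ml. The following is a minimal sketch of that arithmetic for the w/v components only; the dictionary and function names are illustrative and not part of the original protocol, and the two v/v components (Tween 80 and bromocresol purple) are left out.

```python
# Percent (w/v) means grams per 100 ml, so grams = percent * volume_ml / 100.
# Names below are illustrative; only the percentages come from the text.
MEDIUM_W_V_PERCENT = {
    "bactopeptone": 1.0,
    "yeast extract": 0.5,
    "dipotassium phosphate": 0.2,
    "sodium acetate": 0.5,
    "ammonium citrate": 0.2,
    "magnesium sulfate": 0.02,
    "manganese sulfate": 0.005,
    "sugar": 1.0,
}


def grams_needed(percent_w_v, volume_ml):
    """Return the mass in grams required for a given % (w/v) and volume in ml."""
    return percent_w_v * volume_ml / 100.0


# Amounts for one litre of semi-synthetic medium.
recipe_1l = {name: grams_needed(pct, 1000.0)
             for name, pct in MEDIUM_W_V_PERCENT.items()}

for name, grams in recipe_1l.items():
    print(f"{name}: {grams:g} g per litre")
```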

3.3.2 RNA isolation

Total RNA was isolated using TRIzol (GibcoBRL) by following the

manufacturer’s instructions. L. acidophilus cells were inoculated into semi-synthetic

medium supplemented with 1% (w/v) select sugars and propagated to mid-log phase

(OD600nm~0.6). Cells were harvested by centrifugation (2 minutes, 14,000 rpm) and

immediately cooled on ice. Pellets were resuspended in TRIzol by vortexing and

underwent five cycles of 1 min bead beating and 1 min on ice. Nucleic acids were

purified using three chloroform (Fisher) extractions, and precipitated using isopropanol

(Fisher) and centrifugation for 10 min at 12,000 rpm. The RNA pellet was washed with

70% ethanol (AAPER Alcohol and Chemical Co.) and resuspended in DEPC-treated water (Sigma).

RNA samples were treated with DNAse I according to the manufacturer’s

recommendations (Boehringer Mannheim).

3.3.3 Microarray fabrication

A whole-genome cDNA microarray was used for global gene expression analysis.

The microarray contained triplicate spots of 1,889 cDNA PCR products amplified from

genomic DNA, as described previously (Azcarate-Peril et al., 2004). Purified PCR

amplicons were spotted on GAPPS II aminosilane-coated glass slides (Corning, Acton,

MA), using an Affymetrix 417 Arrayer (Affymetrix, CA), and slides were processed as

described previously (Hedge et al., 2000; Azcarate-Peril et al., 2004).

3.3.4 cDNA target preparation and microarray hybridization

For each hybridization, two total RNA samples (25 µg each) were amino-allyl

labeled by reverse transcription using random hexamers (Invitrogen Life Technologies,

Carlsbad, CA) as primers, in the presence of amino-allyl dUTP (Sigma, Town, state), by

a SuperScript II reverse transcriptase (Invitrogen Life Technologies, Carlsbad, CA), as

described previously (Hedge et al., 2000; Azcarate-Peril et al., 2004;

http://pga.tigr.org/sop/M004_1a.pdf). Labeled cDNA samples were subsequently coupled

with either Cy3 or Cy5 N-hydroxysuccinimidyl-dyes (Amersham Biosciences Corp.,

Piscataway, NJ), and purified using a PCR purification kit (Qiagen). The resulting

samples were hybridized onto microarray slides and further processed as described

previously (Azcarate-Peril et al., 2004), according to the TIGR protocol (Hedge et al.,

2000; http://pga.tigr.org/sop/M005_1a.pdf). Hybridizations were performed according to

a single round-robin design, so that all possible direct pair-wise comparisons were

conducted. With 8 different sugars, a total of 28 hybridizations were performed (Figure

1). Each treatment was labeled 7 times, 4 times with one dye and 3 times with the other,

with the majority dye alternating between Cy3 and Cy5 from one treatment to the next.
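
The design arithmetic can be checked with a short sketch: 8 treatments give 8 × 7 / 2 = 28 direct comparisons, and each treatment appears in 7 of them. The dye-assignment rule used here is an assumption chosen only to reproduce the 4/3 balance described above; the actual assignment is the one shown in Figure 1.

```python
from itertools import combinations

SUGARS = ["glucose", "fructose", "sucrose", "FOS",
          "raffinose", "lactose", "galactose", "trehalose"]


def round_robin_design(treatments):
    """Return one (cy3_sample, cy5_sample) pair per direct pairwise comparison."""
    design = []
    for i, j in combinations(range(len(treatments)), 2):
        # Hypothetical dye rule (not taken from the thesis): orient each pair by
        # the parity of the index gap, which balances the dyes 4/3 per treatment.
        if (j - i) % 2 == 1:
            design.append((treatments[i], treatments[j]))
        else:
            design.append((treatments[j], treatments[i]))
    return design


design = round_robin_design(SUGARS)
cy3_counts = {s: sum(1 for cy3, _ in design if cy3 == s) for s in SUGARS}
cy5_counts = {s: sum(1 for _, cy5 in design if cy5 == s) for s in SUGARS}

print(len(design), "hybridizations")
for sugar in SUGARS:
    print(f"{sugar}: Cy3 x{cy3_counts[sugar]}, Cy5 x{cy5_counts[sugar]}")
```

With this rule the Cy3 counts alternate 4, 3, 4, 3 along the treatment list, so no sugar is confounded with a single dye.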

3.3.5 Microarray data collection and analysis

Microarray images were acquired using a Scanarray 4000 Microarray Scanner

(Packard Biochip Bioscience, MA). Signal fluorescence, including spot and background

intensities were subsequently quantified and assigned to genomic ORFs using Quantarray

3.0 (Packard BioChip Technologies LLC, Billerica, MA).

Raw data were imported into SAS (SAS Institute Inc., Cary, NC), compiled,

background corrected, log2 transformed, and subjected to a mixed model of analysis of

variance (SAS proc mixed) with two sequential linear models (Wolfinger et al., 2001).

ANOVA mixed models have proven successful at analyzing microarray data (Wolfinger

et al., 2001; Jin et al., 2001; Kerr and Churchill, 2001; Chhabra et al., 2003; Madsen et

al., 2004; Hsieh et al., 2004; Pysz et al., 2004). The first model accomplished

normalization of data with respect to the global effects of array, dye, treatment, and spot.

The normalization model was: $\log_2(y_{ijkl}) = \mu + A_i + D_j + T_k + A_i(S_l) + \varepsilon_{ijkl}$, where $\mu$ is the sample

mean, $A_i$ is the effect of the $i$th array, $D_j$ is the effect of the $j$th dye, $T_k$ is the effect of the

$k$th treatment, $S_l$ is the effect of the $l$th spot, and $\varepsilon_{ijkl}$ is the stochastic error. The estimated

effects resulting from this model were used to predict expected intensities for each value,

and residuals were subsequently calculated, as the difference between the observed and

expected intensities. The normalization residuals were subsequently used as input for the

second model, a series of 1,889 gene-specific models, which removed gene-specific

biases and calculated least square mean estimates of treatment effects for each gene under

each treatment. The gene-specific models were: $r_{ijkl} = \mu + A_i + D_j + T_k + A_i(S_l) + \varepsilon_{ijkl}$. The array

and spot effects were treated as random effects, and as such their variance was removed

without performing parameter estimates. Least squares mean estimates were calculated

for fixed effects dye and treatment condition. The resulting difference between least

square estimates for two different treatments is analogous to a log2-transformed ratio of

gene expression between those two treatments.
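
The residual step of the first model can be illustrated with a deliberately simplified sketch. It is not the SAS proc mixed analysis used in this study: it keeps only the array and dye terms as fixed effects, drops the treatment and spot terms, and assumes a balanced layout, in which case the least squares fit reduces to array mean plus dye mean minus grand mean. The function name and the intensities are made up for illustration.

```python
from collections import defaultdict
from math import log2


def normalization_residuals(records):
    """Residuals of log2(y) = mu + A_i + D_j for a balanced array x dye layout.

    records: list of (array_id, dye, intensity) tuples with intensity > 0.
    Returns observed minus expected log2 intensities, in input order.
    """
    logs = [log2(y) for _, _, y in records]
    grand = sum(logs) / len(logs)
    by_array, by_dye = defaultdict(list), defaultdict(list)
    for (array_id, dye, _), value in zip(records, logs):
        by_array[array_id].append(value)
        by_dye[dye].append(value)
    array_mean = {k: sum(v) / len(v) for k, v in by_array.items()}
    dye_mean = {k: sum(v) / len(v) for k, v in by_dye.items()}
    # Expected value under the additive model: array mean + dye mean - grand mean.
    return [value - (array_mean[a] + dye_mean[d] - grand)
            for (a, d, _), value in zip(records, logs)]


# Made-up background-corrected intensities: 2 arrays x 2 dyes x 2 spots.
example_records = [
    ("array1", "Cy3", 1200.0), ("array1", "Cy3", 800.0),
    ("array1", "Cy5", 1500.0), ("array1", "Cy5", 950.0),
    ("array2", "Cy3", 600.0), ("array2", "Cy3", 2400.0),
    ("array2", "Cy5", 700.0), ("array2", "Cy5", 3100.0),
]
residuals = normalization_residuals(example_records)
print([round(r, 3) for r in residuals])
```

In this simplified setting the residuals sum to zero within each array and within each dye, which is the sense in which global array and dye effects are removed before the gene-specific models are fitted.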

Differences were calculated between all pairs of treatments for each gene and a

measure of statistical significance was obtained from a t-test using these differences and

their associated standard errors. A Bonferroni correction was applied to account for bias

due to multiple tests by dividing the desired level of significance (α=0.05) by the total

number of comparisons performed (54,781). Thus, the corrected false positive rate was

$\alpha = 9.12 \times 10^{-7}$, corresponding to $-\log_{10}(p\text{-value}) = 6.04$ ($p = 10^{-6.04}$). All p-values that fell

below $\alpha = 9.12 \times 10^{-7}$ were considered statistically significant. Volcano plots of

log2-transformed fold changes (indicating induction ratios) versus log10-transformed p-values

(indicating statistical significance), and three way plots (contour plots) of individual

treatment effects were used to visualize contrasts between treatments, and statistical

significance of the results. Global patterns in treatment effects were visualized using

Ward’s method of hierarchical clustering in JMP 5.0 (SAS), using least squares mean

estimates and their standardized counterparts as input.
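
The Bonferroni threshold and the volcano plot coordinates follow directly from the numbers given above. The short Python sketch below recomputes them; the helper name volcano_point and the example contrast (16-fold induction, p = 1e-12) are hypothetical and serve only as an illustration.

```python
import math

ALPHA = 0.05            # desired level of significance
N_COMPARISONS = 54781   # total number of comparisons stated in the text

# Bonferroni-corrected threshold and its -log10 equivalent.
threshold = ALPHA / N_COMPARISONS
neg_log10_threshold = -math.log10(threshold)


def volcano_point(log2_ratio, p_value):
    """Return (x, y, significant) for one gene in one treatment contrast."""
    y = -math.log10(p_value)
    return log2_ratio, y, p_value < threshold


# Hypothetical contrast: 16-fold induction with p = 1e-12.
x, y, significant = volcano_point(math.log2(16), 1e-12)
print(f"threshold = {threshold:.3g}, -log10 = {neg_log10_threshold:.2f}")
print(x, y, significant)
```

A gene plotted this way is declared differentially expressed when its vertical coordinate lies above the corrected threshold, independently of the size of the fold change on the horizontal axis.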

3.3.6 Real-Time Quantitative RT-PCR

Experiments were conducted using a Q-PCR thermal cycler (iCycler, Bio-Rad),

in combination with the QuantiTect SYBR Green PCR kit (Qiagen). PCR primers were

designed according to specifications recommended by the manufacturer (Tm, length,

base content). Six carbohydrate samples were included, namely glucose, fructose,

sucrose, FOS, lactose and galactose. Each set of samples was analyzed in triplicate. The

RNA samples used in Q-PCR experiments were identical to those used in microarray

experiments.

3.4 Results

3.4.1 Differentially expressed genes

Global gene expression patterns obtained from growth on eight different

carbohydrates were visualized by cluster analysis (Eisen et al., 1998) using Ward’s

hierarchical clustering method (Figures 2 and 3), volcano plots (Figure 4) and contour

plots (Figure 5). Overall, between 23 and 379 genes were differentially expressed

between paired treatment conditions (with p-values below the Bonferroni correction),

representing between 1% and 20% of the genome, respectively (Figure 6). Although 342

genes (18% of the genome) showed induction levels above two fold, only 63 genes (3%

of the genome) showed induction above 4 fold (Figure 7), indicating a relatively small

number of genes were highly induced. Although overall expression levels of the majority

of the genes remained consistent regardless of the growth substrate (80% of the genome),

select clusters showed differential transcription of genes and operons (Figures 2 and 3).

Nevertheless, for each sugar, a limited number of genes showed specific induction.

In the presence of glucose, ORFs La1679 and La1680 (Figure 3) were highly

induced when compared to other monosaccharides (fructose, galactose) and di-

saccharides (sucrose, lactose, trehalose). The induction levels compared to other sugars

varied between 3.5 and 6.3 for La1679 and between 3.7 and 4.7 for La1680. La1679

encodes an ABC nucleotide binding protein, including commonly found nucleotide

binding domain motifs, namely WalkerA, WalkerB, ABC signature sequence and Linton

and Higgins motif. La1680 encodes an ABC permease, with 10 predicted membrane

spanning domains. No solute binding protein is encoded in their vicinity, suggesting a

possible role as an exporter rather than an importer. Several genes and operons were

specifically repressed by glucose (see Figures 2 and 3), including ORFs La680-686,

which are involved in glycogen metabolism. Since glycogen is metabolized by the cell in

order to store energy, in the presence of a preferred carbon source such as glucose,

energy storage is not necessary. Other genes repressed in the presence of glucose

included proteins involved in uptake of alternative carbohydrate sources, and enzymes

involved in hydrolysis of such carbohydrates.

The three genes of the putative fructose locus, La1777 (FruA, fructose PTS

transporter EIIABCFru), La1778 (FruK, phosphofructokinase EC 2.7.1.56) and La1779

(FruR, transcription regulator) were differentially expressed (Figure 3). Induction levels

were up to 3.9, 4.3 and 4.6 for fruA, fruK and fruR, respectively. These results suggest

fructose is transported into the cell via a PTS transporter, into fructose-6-phosphate,

which the phosphofructokinase FruK phosphorylates into fructose-1,6 bi-phosphate, a

glycolysis intermediate.

In the presence of sucrose, the three genes of the sucrose locus were differentially

expressed (Figure 3), namely La399 (ScrR, transcription regulator), La400 (ScrB,

sucrose-6-phosphate hydrolase EC 3.2.1.26), and La401 (ScrA, sucrose PTS transporter

EIIBCASuc). When compared to glucose, induction levels were up to 3.1, 2.8 and 17.2 for

scrR, scrB and scrA, respectively. La401 in particular showed high induction levels,

between 8.0 and 17.2 when compared to mono- and di-saccharides. These results indicate

that sucrose is transported into the cell via a PTS transporter, into sucrose-6-phosphate,

which is subsequently hydrolyzed into glucose-6-phosphate and fructose by ScrB.

The six genes of the FOS operon were differentially expressed (Figures 3, 4, 5),

namely La502, La503, La504, La506 (MsmEFGK ABC transporter), La505 (BfrA,

β-fructosidase EC 3.2.1.26) and La507 (GtfA, sucrose phosphorylase EC 2.4.1.7).

Induction levels varied between 15.1 and 40.6 when compared to mono- and di-

saccharides, and between 5.5 and 8.9 when compared to raffinose. These results suggest

FOS is transported into the cell via an ABC transporter and subsequently hydrolyzed into

fructose and sucrose by the fructosidase. Sucrose is likely subsequently hydrolyzed into

fructose and glucose-1-P by the sucrose phosphorylase. In addition to the FOS operon,

FOS also induced the fructose operon, the sucrose PTS transporter, the trehalose operon

and an ABC transporter (La1679-La1680).

In the presence of raffinose, the six genes of the raffinose operon were

specifically induced (Figures 3, 4, 5). The raffinose locus consists of La1442, La1441,

La1440, La1439 (MsmEFGK2 ABC transporter), La1438 (MelA α-galactosidase EC

3.2.1.22), and La1437 (GtfA2, sucrose phosphorylase EC 2.4.1.7). Induction levels varied

between 15.1 and 45.6, when compared to all other conditions. Additionally, La1433-4

(di-hydroxyacetone kinase EC 2.7.1.29), and La1436 (glycerol uptake facilitator) were

induced between 1.9 and 24.7 fold when compared to other conditions.

In the presence of lactose and galactose, ten genes distributed in two loci were

differentially expressed, namely La1463 (LacS permease of the GPH translocator

family), La1462 (LacZ, β-galactosidase EC 3.2.1.23), La1461 (conserved hypothetical

protein), La1460 (surface protein), La1459 (GalK, galactokinase EC 2.7.1.6), La1458

(GalT, galactose-1 phosphate uridylyl transferase EC 2.7.7.10), La1457 (GalM, galactose

epimerase EC 5.1.3.3), La1467-8 (LacLM, β-galactosidase EC 3.2.1.23 large and small

subunits), and La1469 (GalE, UDP-glucose epimerase EC 5.1.3.2). LacS is similar to

GPH permeases previously identified in lactic acid bacteria. Although LacS contains an

EIIA domain at the carboxy-terminus, it is not a PTS transporter. Also, LacS includes a His at

position 553, which might be involved in interaction with HPr, as shown in S. salivarius

(Lessard et al., 2003). In the presence of lactose and galactose, galKTM were induced

between 3.7 and 17.6 fold; lacSZ were induced between 2.8 and 17.6 fold; lacL and galE

were induced between 2.7 and 29.5, when compared to other carbohydrates not

containing galactose, i.e., glucose, fructose, sucrose, trehalose and FOS. These results

suggest lactose is transported into the cell via the LacS permease of the galactoside-

pentose hexuronide translocator family. Inside the cell, lactose is hydrolyzed into glucose

and galactose by LacZ. Galactose is then phosphorylated by GalK into galactose-1

phosphate, further transformed into UDP-galactose by GalT. UDP-galactose is

subsequently epimerized to UDP-glucose by GalE. UDP-glucose is likely turned into

glucose-1P by La1719, which encodes a UDP-glucose phosphorylase EC 2.7.7.9,

consistently highly expressed. Finally, the phosphoglucomutase EC 5.4.2.2 likely acts on

glucose-1P to yield glucose-6P, a glycolysis substrate.

The three genes of the putative trehalose locus were also differentially expressed

(Figures 3 and 5). The trehalose locus consists of La1012 (encoding the TreB trehalose

PTS transporter EIIABCTre EC 2.7.1.69), La1013 (TreR, trehalose regulator) and La1014

(TreC, trehalose-6 phosphate hydrolase EC 3.2.1.93). Induction levels were between 4.3

and 18.6 for treB, between 2.3 and 7.3 for treR, and between 2.7 and 18.5 for treC, when

compared to glucose, sucrose, raffinose and galactose. These results suggest trehalose is

transported into the cell via a PTS transporter, phosphorylated to trehalose-6 phosphate

and hydrolyzed into glucose and glucose-6 phosphate by TreC.

In addition, genes showing differential expression included hypothetical genes

La457, La466, La1006, La1008, La1010, La1011, La1206; sugar- and energy-related genes

La874 (beta galactosidase EC 3.2.1.86), La910 (L-LDH EC 1.1.1.27), La1007 (pyridoxal

kinase EC 2.7.1.35), La1812 (alpha glucosidase EC 3.2.1.3), La1632 (aldehyde

dehydrogenase EC 1.2.1.16), La1401 (NADH peroxidase EC 1.11.1.1), La1974

(pyruvate oxidase EC 1.2.3.3); adherence genes La555, La649, La1019; aminopeptidase

La911, La1086; amino-acid permease, La1102 (membrane protein), La1783 (ABC

transporter), La1879 (pyrimidine kinase EC 2.7.4.7).

3.4.2 Real-Time Quantitative RT-PCR

Five genes that were differentially expressed in microarray experiments were

selected for real-time quantitative RT-PCR experiments, in order to validate induction

levels measured by microarrays. These genes were selected for both their broad

expression range (LSM between -1.52 and +3.87), and induction levels between sugars

(fold induction up to 34). All selected genes showed an induction level above 6 fold in at

least one instance. Also, the annotations of the selected genes were correlated

functionally with carbohydrate utilization. The five selected genes were: beta-

fructosidase (La505), trehalose PTS (La1012), glycerol uptake facilitator (La1436), beta-

galactosidase (La1467), and ABC transporter (La1679).

The induction levels measured by microarrays were plotted against induction

levels measured by Q-PCR, for the five selected genes, in order to validate microarray

data (Figure 8). Individual R-square values ranged between 0.642 and 0.883 for each of

the tested genes (between 0.652 and 0.978 using data in a log2 scale). When the data were

combined, the global R-square value was 0.78 (0.88 using data in a log2 scale). A

correlation analysis was run in SAS (Cary, NC), and showed a correlation between the

two methods with P-values less than 0.001, for Spearman, Hoeffding and Kendall tests.

Additionally, a regression analysis was run in Excel (Microsoft, Redmond, WA), and showed a

statistically highly significant ($p < 1.02 \times 10^{-25}$) correlation between microarray data and

Q-PCR results. Nevertheless, Q-PCR measurements revealed larger induction levels,

which is likely due to the smaller dynamic range of the microarray scanner, compared to

that of the Q-PCR cycler. Similar results have been reported previously (Wagner et al.,

2003).
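
One reason the agreement between the two methods improves on a log2 scale can be sketched as follows. The Python example computes the R-square of a simple linear regression on both scales; the fold induction values are invented for illustration (they are not data from this study) and assume that Q-PCR reports larger induction levels than the microarray for the same contrasts.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation, equal to the R-square of a simple linear fit."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


# Hypothetical fold inductions for one gene across several treatment contrasts.
microarray_fold = [1.0, 2.0, 4.0, 8.0, 16.0]
qpcr_fold = [1.0, 4.0, 16.0, 64.0, 256.0]  # wider dynamic range than the array

r2_raw = r_squared(microarray_fold, qpcr_fold)
r2_log2 = r_squared([math.log2(v) for v in microarray_fold],
                    [math.log2(v) for v in qpcr_fold])
print(f"raw scale: {r2_raw:.3f}, log2 scale: {r2_log2:.3f}")
```

When one method compresses the dynamic range relative to the other, the relationship between raw fold inductions is curved, whereas it can remain close to linear after log2 transformation, which raises the R-square value.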

3.5 Discussion

Comparative analyses of global transcription profiles determined for growth on

eight carbohydrates identified the basis for carbohydrate transport and catabolism in L.

acidophilus. Specifically, three different types of carbohydrate transporters were

differentially expressed, namely phosphoenolpyruvate: sugar phosphotransferase system

(PTS), ATP binding cassette (ABC) and galactoside-pentose hexuronide (GPH)

translocator, illustrating the diversity of carbohydrate transporters used by L. acidophilus.

Transcription profiles suggested that galactosides were transported by a GPH

translocator, while mono- and di- saccharides were transported by members of the PTS,

and polysaccharides were transported by members of the ABC family.

Microarray results indicated fructose, sucrose and trehalose are transported by

PTS transporters EIIABCFru (La1777), EIIBCASuc (La401) and EIIABCTre (La1012),

respectively. Those genes are encoded on typical PTS loci (Figure 9), along with

regulators and enzymes that have been well characterized in other organisms. In contrast,

FOS and raffinose are transported by ABC transporters of the MsmEFGK family, La502-

505 and La1437-1442, respectively. In the case of trehalose and FOS, microarray results

correlate well with functional studies in which targeted knock out of carbohydrate

transporters and hydrolases modified the saccharolytic potential of L. acidophilus NCFM

(Barrangou et al., 2003; Duong et al., 2004). Differential expression of the EIIABCTre is

consistent with recent work in L. acidophilus indicating La1012 is involved in trehalose

uptake (Duong et al., 2004). Similarly, differential expression of the fos operon is

consistent with previous work in L. acidophilus indicating those genes are involved in

uptake and catabolism of FOS, and induced in the presence of FOS and repressed in the

presence of glucose (Barrangou et al., 2003). Additionally, induction of the raffinose msm

locus is consistent with previous work in Streptococcus mutans (Russell et al., 1992) and

Streptococcus pneumoniae (Rosenow et al., 1999).

A number of lactic acid bacteria take up glucose via a PTS transporter. The EIIMan

PTS transporter has the ability to import both mannose and glucose (Cochu et al., 2003).

The L. acidophilus mannose PTS system is similar to that of Streptococcus thermophilus,

with proteins sharing 53-65% identity and 72-79% similarity. Specifically, the EIIMan is

composed of three proteins, IIABMan, IICMan and IIDMan, encoded by La452 (manL), La455

(manM) and La456 (manN), respectively (Figure 9). Most of the carbohydrates examined

here specifically induced genes involved in their own transport and hydrolysis, but

glucose did not. Analysis of the mannose PTS revealed that the genes encoding the

EIIABCDMan were consistently highly expressed, regardless of the carbohydrate source

(Figure 3A). This expression profile suggests glucose is a preferred carbohydrate, and L.

acidophilus is also designed for efficient utilization of different carbohydrate sources, as

was suggested previously for L. plantarum (Kleerebezem et al., 2003).

The genes differentially expressed in the presence of galactose and lactose

included a permease (LacS), and the enzymatic machinery of the Leloir pathway.

Members of the LacS subfamily of galactoside-pentose-hexuronide (GPH) translocators

have been described in a variety of lactic acid bacteria, including Leuconostoc lactis

(Vaughan et al., 1996), S. thermophilus (van den Bogaard et al., 2000), Streptococcus

salivarius (Lessard et al., 2003) and Lactobacillus delbrueckii (Lapierre et al., 2002).

Although LacS contains a PTS EIIA at the carboxy terminus, it is not a member of the

PTS family of transporters. LacS has been reported to have the ability to import both

galactose and lactose in select organisms (Vaughan et al., 1996; van den Bogaard et al.,

2000). Although the combination of a LacS lactose permease with two β-galactosidase

subunits LacL and LacM has been described in L. plantarum (Kleerebezem et al., 2003)

and Leuconostoc lactis (Vaughan et al., 1996), it has never been reported in L.

acidophilus. Even though constitutive expression of lacS and lacLM has been reported

previously (Vaughan et al., 1996), our current results indicate specific induction of the

genes involved in uptake and catabolism of both galactose and lactose. Operon

organization for galactoside utilization is variable and unstable among Gram-positive

bacteria (Lapierre et al., 2002; Vaillancourt et al., 2002; Boucher et al., 2003; Fortina et

al., 2003; Grossiord et al., 2003). Interestingly, even amongst closely related

Lactobacillus species, namely L. johnsonii, L. gasseri and L. acidophilus, the lactose-

galactose locus is not well conserved (Pridmore et al., 2004) (Figure 10). Perhaps the

presence of mobile elements in the vicinity of those genes is responsible for the

instability of this locus (Altermann et al., 2004).

Although it was previously suggested that the phosphoenolpyruvate:

phosphotransferase system is the primary sugar transport system of Gram-positive

bacteria (Ajdic et al., 2002; Warner and Lolkema, 2003), current microarray data indicate

that ABC transport systems are also important. While PTS transporters are involved in

uptake of mono- and di-saccharides, those carbohydrates are digested in the upper GIT.

In contrast, oligosaccharides reach the lower intestine, where commensals are likely to

compete for more complex and scarce nutrients. Perhaps under such conditions ABC

transporters are even more crucial than the PTS, given their apparent roles in transport of

oligosaccharides like FOS and raffinose. In this regard, the ability to utilize nutrients that

are non-digestible by the host has been associated with competitiveness and

persistence of beneficial intestinal flora in the colon (Schell et al., 2002).

Transcription profiles of genes differentially expressed in conditions tested

indicated that all carbohydrate uptake systems and their respective sugar hydrolases were

specifically induced by their substrate, except for glucose. Moreover, genes within those

inducible loci were repressed in the presence of glucose, and cre sequences were

identified in their promoter-operator regions (Figure 11). Together, these results indicate

regulation of carbohydrate uptake and metabolism at the transcription level, and implicate

the involvement of a global regulatory system compatible with carbon catabolite

repression. Carbon catabolite repression (CCR) controls transcription of proteins

involved in transport and catabolism of carbohydrates (Miwa et al., 2000). Catabolite

repression is a mechanism widely distributed amongst Gram-positive bacteria, mediated

in cis by catabolite responsive elements (Miwa et al., 2000; Weickert and Chambliss,

1990), and in trans by repressors of the LacI family, which are responsible for

transcriptional repression of genes encoding unnecessary saccharolytic components in the

presence of preferred substrates (Weickert and Chambliss, 1990; Viana et al., 2000;

Muscariello et al., 2001; Warner and Lolkema, 2003). This regulatory mechanism allows

cells to coordinate the utilization of diverse carbohydrates, to focus primarily on

preferred energy sources. CCR is based upon several key proteins, namely HPr (La639,

ptsH), EI (La640, ptsI), CcpA (La431, ccpA), and HPrK/P (La676, ptsK), all of which are

encoded within the L. acidophilus chromosome.

Carbon catabolite repression has already been described in lactobacilli (Mahr et

al., 2000). The PTS is characterized by a phosphate transfer cascade involving PEP, EI,

HPr, EIIABC, whereby a phosphate is ultimately transferred to the carbohydrate substrate

(Saier, 2000; Warner and Lolkema, 2003). HPr is an important component of CCR,

which is regulated via phosphorylation by enzyme I and HPrK/P. When HPr is

phosphorylated at His15, the PTS is active, and carbohydrates transported via the PTS are

phosphorylated via EIIABCs. In contrast, when HPr is phosphorylated at Ser46, the PTS

machinery is not functional (Mijakovic et al., 2002).

Although the phosphorylation cascade suggests regulation at the protein level,

several studies report transcriptional modulation of ccpA and ptsHI. In S. thermophilus,

CcpA production is induced by glucose (van den Bogaard et al., 2000). In several

bacteria, the carbohydrate source modulates ptsHI transcription levels (Luesink et al.,

1999). In contrast, expression levels of ccpA, ptsH, ptsI and ptsK did not vary in the

presence of different carbohydrates in L. acidophilus. These results are consistent with

regulation via phosphorylation at the protein level. Similar results have been reported for

ccpA expression levels in Lactobacillus pentosus (Mahr et al., 2000), and ptsHI

transcription in S. thermophilus (Cochu et al., 2003).

Globally, microarray results allowed reconstruction of carbohydrate transport and

catabolism pathways (Figure 12). Although transcription of carbohydrate transporters and

hydrolases was specifically induced by their respective substrates, glycolysis genes were

consistently highly expressed (Figure 13). Orchestrated carbohydrate uptake likely

withdraws energy sources from the intestinal environment and deprives other bacteria of

access to such resources. Consequently, L. acidophilus may compete well against other

commensals for nutrients.

In summary, a variety of carbohydrate uptake systems were identified and

characterized, with respect to expression profiles in the presence of different

carbohydrates, including PTS, ABC and GPH transporters. The uptake and catabolic

machinery is highly regulated at the transcription level, suggesting the L. acidophilus

transcriptome is flexible, dynamic and designed for efficient carbohydrate utilization.

Differential gene expression indicated the presence of a global carbon catabolite

repression regulatory network. Regulatory proteins were consistently highly expressed,

suggesting regulation at the protein level, rather than the transcriptional level.

Collectively, L. acidophilus appears to be able to efficiently adapt its metabolic

machinery to fluctuating carbohydrate sources available in the nutritionally complex

environment of the small intestine. In particular, ABC transporters of the MsmEFGK

family involved in uptake of FOS and raffinose likely play an important role in the ability

of L. acidophilus to compete with intestinal commensals for complex sugars that are not

digested by the human host. Ultimately, this information provides new insights into how

undigested dietary compounds influence the intestinal microbial balance. This study is a

model for comparative transcriptional analysis of a bacterium exposed to varying growth

substrates.

3.6 References


Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B.,

Azcarate-Peril et al., 2004 In review

Barrangou, R., Altermann, E., Hutkins, R., Cano, & Klaenhammer, T. R. (2003) Proc. Natl. Acad. Sci. USA 100, 8957-8962

Benno, Y., Endo, K., Shiragami, N., Sayama, K., & Mitsuoka, T. (1987) Bifido. Micro. 6, 59-63

Bogaard, van den P. T. C., Kleerebezem, M., Kuipers, O. P., & De Vos, W. M. (2000) J. Bacteriol. 182, 5982-9

Boucher, I., Vadeboncoeur, C., & Moineau, S. (2003) Appl. Environ. Microbiol. 69, 4149-56

Chhabra, S. R., Shockley, K. R., Conners, S. B., Scott, K. L., Wolfinger, R. D., & Kelly, R. M. (2003) J. Biol. Chem. 278, 7540-7552

Cochu, A., Vadeboncoeur, C., Moineau, S., & Frenette, M. (2003) Appl. Environ. Microbiol. 69, 5423-32

Duong, T., Barrangou, R., Russell, M. W., & Klaenhammer, T. R. (2004) In review

Eisen, M. B., Spellman, P. T., Brown, P. O., & Botstein, D. (1998) Proc. Natl. Acad. Sci. USA 95, 14863-8

Fortina, M. G., Ricci, G., Mora, D., Guglielmetti, S., & Manachini, P. L. (2003) Appl. Environ. Microbiol. 69, 3238-43

Gibson, G. R., Beatty, E. R., Wan, X., & Cummings, J. H. (1995) Gastroent. 108, 975-82

Gibson, G. R., & Roberfroid, M. B. (1995) J. Nutr. 125, 1401-1412

Greene, J. D., & Klaenhammer, T. R. (1994) Appl. Environ. Microbiol. 60, 4487-4494

Grossiord, B. P., Luesink, E. J., Vaughan, E. E., Arnaud, A., & De Vos, W. M. (2003) J. Bacteriol. 185, 870-8

Hedge, P., Qi, R., Abernathy, K., Gay, C., Dharap, S., Gaspard, R., Hughes, J. E., Snesrud, E., Lee, N., & Quackenbush, J. (2000) Biotechniques 29, 548-562

Helden, van J., Andre, B., & Collado-Vides, J. (2000) Yeast 16, 177-87

Hsieh, W. P., Chu, T. M., Wolfinger, R. D., & Gibson, G. (2003) Genetics 165, 747-57

Jin, W., Riley, R. M., Wolfinger, R. D., White, K. P., Passador-Gurgel, G., & Gibson, G. (2001) Nature Genet. 29, 389-395

Kerr, M. K., & Churchill, G. A. (2001) Genet. Res. Camb. 77, 123-8

Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer, R., Tarchini, R., Peters, S. A., Sandbrink, H. M., Fiers, M. W., Stiekema, W., Lankhorst, R. M., Bron, P. A., Hoffer, S. M., Groot, M. N., Kerkhoven, R., de Vries, M., Ursing, B., de Vos, W. M., & Siezen, R. J. (2003) Proc. Natl. Acad. Sci. USA 100, 1990-5

Lapierre, L., Mollet, B., & Germond, J. E. (2002) J. Bacteriol. 184, 928-35

Lessard, C., Cochu, A., Lemay, J. D., Roy, D., Vaillancourt, K., Frenette, M., Moineau, S., & Vadeboncoeur, C. (2003) J. Bacteriol. 185, 6764-72

Luesink, E. J., Marugg, J. D., Kuipers, O. P., & de Vos, W. M. (1999) J. Bacteriol. 181, 764-71

Madsen, S. A., Chang, L. C., Hickey, M. C., Rosa, G. J. M., Coussens, P. M., & Burton, J. L. (2004) Physiol. Genomics 16, 212-21

Mahr, K., Hillen, W., & Titgemeyer, F. (2000) Appl. Environ. Microbiol. 66, 277-83

Mijakovic, I., Poncet, S., Galinier, A., Monedero, V., Fieulaine, S., Janin, J., Nessler, S., Marquez, J. A., Scheffzek, K., Hasenbein, S., Hengstenberg, W., & Deutscher, J. (2002) Proc. Natl. Acad. Sci. USA 99, 13442-7

Miwa, Y., Nakata, A., Ogiwara, A., Yamamoto, M., & Fujita, Y. (2000) Nucleic Acids Res. 28, 1206-10

Muscariello, L., Marasco, R., De Felice, M., & Sacco, M. (2001) Appl. Environ. Microbiol. 67, 2903-7

Pridmore, R. D., Berger, B., Desiere, F., Vilanova, D., Barretto, C., Pittet, A. C., Zwahlen, M. C., Rouvet, M., Altermann, E., Barrangou, R., Mollet, B., Mercenier, A., Klaenhammer, T. R., Arigoni, F., & Schell, M. A. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517

Pysz, M. A., Ward, D. E., Shockley, K. R., Montero, C. I., Conners, S. B., Johnson, M. R., & Kelly, R. M. (2004) Extremophiles 8, 209-17

Reid, G. (1999) Appl. Environ. Microbiol. 65, 3763-6

Reid, G., Sanders, M. E., Gaskins, H. R., Gibson, G. R., Mercenier, A., Rastall, R., Roberfroid, M., Rowland, I., Cherbut, C., & Klaenhammer, T. R. (2003) J. Clin. Gastroenterol. 37, 105-118

Rosenow, C., Maniar, M., & Trias, J. (1999) Genome Res. 9, 1189-97

Russell, R. R. B., Aduseopoku, J., Sutcliffe, I. C., Tao, L., & Ferretti, J. J. (1992) J. Biol. Chem. 267, 4631-4637

Saier, M. H. Jr. (2000) Mol. Microbiol. 35, 699-710

Sanders, M. E., & Klaenhammer, T. R. (2001) J. Dairy. Sci. 84, 319-331

Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D., & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427

Sui, J., Leighton, S., Busta, F., & Brady, L. (2002) J. Appl. Microbiol. 92, 907-12

Tannock, G. W. (1999) Antonie van Leeuwenhoek 76, 265-78

Vaillancourt, K., Moineau, S., Frenette, M., Lessard, C., & Vadeboncoeur, C. (2002) J. Bacteriol. 184, 785-93

Varcoe, J. J., Krejcarek, G., Busta, F., & Brady, L. (2003) J. Food Prot. 66, 457-465

Vaughan, E. E., David, S., & De Vos, W. M. (1996) Appl. Environ. Microbiol. 62, 1574-82

Viana, R., Monedero, V., Dossonet, V., Vadeboncoeur, C., Perez-Martinez, G., & Deutscher, J. (2000) Mol. Microbiol. 36, 570-584

Wagner, V. E., Bushnell, D., Passador, L., Brooks, A. I., & Iglewski, H. I. (2003) J. Bac. 185, 2080-95

Warner, J. B., & Lolkema, J. S. (2003) Microbiol. Mol. Rev. 67, 475-90

Weickert, M. J., & Chambliss, G. H. (1990) Proc. Natl. Acad. Sci. USA 87, 6238-42

Wolfinger, R. D., Gibson, G., Wolfinger, E. D., Bennett, L., Hamadeh, H., Bushel, P., Afshari, C., & Paules, R. S. (2001) J. Comput. Biol. 8, 625-637

[Figure 1 image: octagon with one carbohydrate at each vertex (Glc, Fru, Tre, Suc, Gal, FOS, Lac, Raff); the individual hybridizations are numbered 1 to 28.]

Figure 1. Round-robin microarray hybridization design. Each carbohydrate is at a vertex of an octagon. Glc, glucose; Fru, fructose; Suc, sucrose; FOS, fructooligosaccharides; Raf, raffinose; Lac, lactose; Gal, galactose; Tre, trehalose. Each arrow represents a hybridization whereby the plain end of the arrow indicates labeling with Cy3, and the tip of the arrow indicates labeling with Cy5. This design allows direct comparison of all possible pairs of treatments.
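
The number of hybridizations in such a design can be checked by enumerating all unordered pairs of treatments, as in the Python sketch below. The sketch covers only the pairing; the assignment of Cy3 and Cy5 within each pair, which is shown by the arrows in the figure, is not modeled.

```python
from itertools import combinations

# The eight carbohydrates placed at the vertices of the octagon.
treatments = ["Glc", "Fru", "Suc", "FOS", "Raf", "Lac", "Gal", "Tre"]

# One hybridization per unordered pair of treatments.
hybridizations = list(combinations(treatments, 2))

# Number of hybridizations in which each treatment takes part.
per_treatment = {t: sum(t in pair for pair in hybridizations) for t in treatments}

print(len(hybridizations))
print(per_treatment)
```

Under this pairing every treatment is hybridized directly against each of the seven others, so no comparison has to be inferred through a common reference sample.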

Figure 2. Hierarchical clustering analyses of gene expression patterns. The expression of 1,889 genes (vertically) after growth on eight carbohydrates (horizontally) is shown colorimetrically. (A) Least squares means, representing overall gene expression level corrected for systematic and random errors (see Methods): low=blue, high=red. Hierarchical clustering of least squares means allows visualization of the relative expression levels of all genes within each treatment. (B) Standardized least squares means, representing gene expression level standardized across all 8 treatments, with color indicating expression level relative to the mean expression level across all treatments: low=green, high=red. Clustering of standardized least squares means allows comparison of the standardized expression profile of every gene across all treatments. FOS, fructooligosaccharides; FRU, fructose; GAL, galactose; GLC, glucose; LAC, lactose; RAF, raffinose; SUC, sucrose; TRE, trehalose.

Figure 3. Hierarchical clustering analysis of expression patterns for select genes and operons. (A) Least squares means of selected genes and operons of interest, representing overall gene expression within treatments: low=blue, high=red; (B) Standardized least squares means of genes of interest, indicating relative expression level across all treatments: low=green, high=red. Carbohydrate sources are displayed at the bottom: FOS, fructooligosaccharides; FRU, fructose; GAL, galactose; GLC, glucose; LAC, lactose; RAF, raffinose; SUC, sucrose; TRE, trehalose.

[Figure 4 plot. X axis: fold change FOS/RAFF (-64 to 64); Y axis: significance (-log10 P-value, 0 to 50). Labeled genes: 1437, 1438, 1439, 1440, 1441, 1442; 502, 503, 504, 505, 506, 507; 1012, 1014.]

Figure 4. Volcano plot comparison of gene expression between FOS and raffinose. Visualization of the global differential gene expression profiles in the presence of raffinose and FOS. The X axis indicates the differential expression profiles, plotting the fold-induction ratios in a logarithmic-2 scale. The Y axis indicates the statistical significance, plotting the statistical significance of the difference in expression (P-value from a t-test) in a logarithmic-10 scale. Genes within the raffinose msm locus are shown in green, genes within the FOS msm locus are shown in blue, and two genes within the trehalose tre locus are shown in red.

[Figure 5 plot. X axis: Lsm FOS (-3.0 to 5.0); Y axis: Lsm RAFFINOSE (-3.0 to 4.0); color legend: Lsm TREHALOSE, binned as <= -2, <= -1, <= 0, <= 1, <= 2, <= 3, > 3.]

Figure 5. Contour plot comparison of gene expression between FOS, raffinose and trehalose. Three-way plot of the least squares means of all the genes in the presence of FOS (X axis), raffinose (Y axis), trehalose (Z axis, color coded). In the third dimension (Z axis) the gene expression level is coded colorimetrically: blue=low gene expression, red=high gene expression. Each color in-between is representative of a value range. Differentially expressed operons are annotated: 1437-1442 raffinose msm operon, 502-507 FOS msm operon, 1012, 1014 trehalose tre locus.

[Figure 6 plot. X axis: treatment comparison. Y axis: number of genes differentially expressed (0 to 400). Comparison labels as extracted: Lac-Raf, Raf-Suc, Raf-Fru, Fos-Raf, Raf-Glu, Tre-Raf, Lac-Gal, Gal-Fos, Gal-Raf, Gal-Suc, Gal-Tre, Gal-Fru, Fos-Glu, Tre-Fru, Lac-Fos, Lac-Fru, Lac-Glu, Glu-Fru, Gal-Glu, Tre-Fos, Tre-Suc, Tre-Glu, Tre-lac, Suc-Glu, Lac-Suc, Fru-Fos, Suc-Fos, Fru-Suc.]

Figure 6. Global differential gene expression. Quantification of the number of genes declared differentially expressed by statistical criteria. For all 28 possible treatment comparisons, genes with p-values from a t-test below the Bonferroni correction (-log10(p-value) > 6.04) were considered differentially expressed. For each comparison, the number of genes statistically differentially expressed is plotted, in decreasing order.

[Figure 7 plot. X axis: minimum fold induction (1 to 11). Y axis: number of genes (0 to 400).]

Figure 7. Gene fold induction. Quantification of the number of genes differentially expressed above various fold induction cut offs. All possible treatment comparisons were considered, and a gene was considered induced above a particular level if it showed induction in at least one treatment comparison. For genes that showed induction in more than one instance, the highest induction level was selected.

[Figure 8 plot. X axis: fold induction microarrays (1 to 64). Y axis: fold induction Q-PCR (1 to 256). Legend: La1467, La505, La1436, La1012, La1679.]

Figure 8. RT-Q-PCR analysis of differentially expressed genes. For five selected genes, induction levels were compared between six different treatments, resulting in 15 induction levels for each gene. The comparison between the fold induction determined by microarrays (X axis) and real-time quantitative RT-PCR (Y axis) is plotted, on a logarithmic-2 scale. Induction levels for each gene are color-coded.

[Figure 9 diagram. Locus labels: Man, Fru, Suc, Fos, Raff, Lac, Lac, Tre, CCR. Gene labels as extracted: manL, manM, manN, fruR, fruK, fruA, scrR, scrA, scrB, msmR, msmR2, galK, galT, galM, lacS, lacZ, hypo, muB, lacL, lacM, galE, treC, treB, treR, ptsH, ptsI, ptsK, pepQ, ccpA.]

Figure 9. Genetic loci of interest. The layouts of the loci discussed in the text are shown: man, glucose-mannose locus; fru, fructose locus; suc, sucrose locus; fos, FOS locus; raff, raffinose locus; Lac, lactose-galactose loci; tre, trehalose locus; CCR, carbon catabolite loci.

[Figure 10 diagram. Gene labels as extracted:
L. johnsonii: reg lacS bgaB galK galT galM galE lacM lacL
L. gasseri: reg galK galT galM galE
L. acidophilus: galK galT galM lacS lacZ hypo muB reg Tn galE lacM lacL
Legend: lacS, lactose-proton symporter; lacLM, beta-galactosidase; galE, galactose epimerase; galK, galactokinase; galT, galactose-1P uridyl transferase.]

Figure 10. Lactose locus in select lactobacilli. Layout of the lactose loci in Lactobacillus gasseri, Lactobacillus johnsonii and Lactobacillus acidophilus.

ORF      cre    Sequence           Position
La400    cre1   TGataaaCGtttgaCA   -72 bp
La400    cre2   AGataaCGcttaCA     -17 bp
La401    cre1   TGaataCGttatCA     -48 bp
La401    cre2   TAaaagCGtttaCA     -17 bp
La452    cre1   TAaaagCGgattCA     -27 bp
La502    cre1   TGaaagCGatatTA     -172 bp
La502    cre2   TGaaaaCGtttcCA     -140 bp
La502    cre3   TAgaaaCGtttcAA     -78 bp
La502    cre4   TTcaaaCGtttcAA     -14 bp
La680    cre1   AGtaagCGctttCC     -40 bp
La1012   cre1   TGtgatCGctttCA     -82 bp
La1012   cre2   TGaaaaCGctttAT     -15 bp
La1013   cre1   ATaaagCGttttCA     -155 bp
La1013   cre2   TGaaagCGatcaCA     -88 bp
La1442   cre1   AGaataCGcaatAA     -69 bp
La1442   cre2   TGaaagCGcttaAA     -38 bp
La1459   cre1   TGaaaaCGattaCA     -27 bp
La1460   cre1   GAtggaCGaataTA     -22 bp
La1461   cre1   AGgtatCGtcatCT     -103 bp
La1463   cre1   AAaattCGtcttCT     -36 bp
La1465   cre1   AAtaaaCGtaagTA     -27 bp
La1467   cre1   TAaaagCGttttCA     -32 bp
La1469   cre1   TGtaatCGatttCA     -21 bp
La1684   cre1   AGttttCGgacaAC     -61 bp
La1684   cre2   AGaaatCGcttaCA     -25 bp

Figure 11. Catabolite responsive elements sequences. Putative catabolite responsive elements are highlighted in the promoter regions of select differentially expressed genes. Numbers indicate the position of the last cre nucleotide relative to the translational start of the ORF mentioned. The promoter-operator regions of differentially expressed genes and operons were searched for putative catabolite response elements according to consensus sequences TGNNWNCGNNWNCA (Miwa et al., 2000) and TGWAANCGNTNWCA (Weickert and Chambliss, 1990).
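
The consensus search described in this caption can be sketched as a regular-expression scan. This is a minimal illustration and not the procedure used in the study: the helper name `find_cre` and the dictionary names are hypothetical, and only the IUPAC codes that occur in the two quoted consensus sequences (N, any base; W, A or T) are expanded.

```python
import re

# IUPAC codes that occur in the two consensus sequences quoted in the caption.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}

CRE_CONSENSUS = {
    "Miwa et al., 2000": "TGNNWNCGNNWNCA",
    "Weickert and Chambliss, 1990": "TGWAANCGNTNWCA",
}

def find_cre(promoter, consensus):
    """Return (0-based start, matched text) for every exact consensus match.

    A lookahead is used so that overlapping matches are also reported.
    Matching is case-insensitive because the figure mixes upper and lower case.
    """
    body = "".join(IUPAC[base] for base in consensus)
    pattern = re.compile("(?=(" + body + "))", re.IGNORECASE)
    return [(m.start(), m.group(1)) for m in pattern.finditer(promoter)]

if __name__ == "__main__":
    # Hypothetical promoter fragment built around the La1459 cre1 sequence.
    fragment = "ccc" + "TGaaaaCGattaCA" + "ggg"
    for source, consensus in CRE_CONSENSUS.items():
        print(source, find_cre(fragment, consensus))
```

Several of the putative elements listed in Figure 11 differ from both consensus sequences at one or more positions (for example, La452 cre1 starts with TA rather than TG), so an exact-match scan like this one would not recover them; a mismatch-tolerant search would be needed for that.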

[Figure 12 diagram. Substrates: glucose, fructose, sucrose, trehalose, FOS, raffinose, lactose, galactose. Intermediates: glucose-6P, fructose-1P, sucrose-6P, trehalose-6P, fructose-1-6P2, galactose-1P, UDP-galactose, UDP-glucose, glucose-1P, leading to glycolysis. Protein and EC labels as extracted: FruA; 2.7.1.56 FruK; ScrA; 3.2.1.26 ScrB; MsmEFGK; BfrA 3.2.1.26; MsmEFGK2; MelA 3.2.1.22; GtfA; 3.2.1.26; LacS; 3.2.1.23 LacZ; 2.7.1.6 GalK; 2.7.1.10 GalT; GalE; 5.1.3.2; 2.7.7.9 GalU; 5.4.2.2 Pgm; ManLMN; TreB; 3.2.1.93 TreC.]

Figure 12. Carbohydrate utilization in L. acidophilus. This diagram shows carbohydrate transporters and hydrolases as predicted by transcriptional profiles. Protein names and EC numbers are specified for each element. PTS transporters are shown in red. GPH transporters are shown in yellow. ABC transporters are shown in green.

Figure 13. Expression of glycolysis genes. D-lactate dehydrogenase (D-LDH, La55), phosphoglycerate mutase (PGM, La185), L-lactate dehydrogenase (L-LDH, La271), glyceraldehyde 3-phosphate dehydrogenase (GPDH, La698), phosphoglycerate kinase (PGK, La699), glucose 6-phosphate isomerase (GPI, La752), 2-phosphoglycerate dehydratase (PGDH, La889), phosphofructokinase (PFK, La956), pyruvate kinase (PK, La957), fructose-bisphosphate aldolase (FBPA, La1599).

Chapter IV – Global characterization of the Lactobacillus acidophilus transcriptome and analysis of relationships between gene expression level, codon usage, chromosomal location and intrinsic gene characteristics

4.1 Abstract

The relationships between gene expression level, codon usage, chromosomal

location and intrinsic gene parameters were investigated globally, in Lactobacillus

acidophilus. The codon usage profile revealed a general bias towards AT-rich codons, as

expected for a low GC content organism. In contrast, genes showing high codon usage

bias had higher GC-content at the third codon position. Correlation analyses showed that

gene expression levels were most highly correlated with GC content, codon adaptation

index, size and then RBS. Gene expression levels did not correlate with GC content at the

third codon position. The high correlation between GC content and gene expression level

may reflect that genes with GC contents much higher than that of the genome signature

are biologically important and highly expressed. Data were segregated into four

chromosomal locations, by strand, location and orientation, relative to the origin and

terminus of replication. Analysis of variance was used to investigate whether there were

differences in gene expression between the four chromosomal locations. The results

showed that genes on the leading strand were more highly expressed, and showed higher

codon usage bias. Also, genes located between the origin and terminus of replication,

relative to the forward strand, were more highly expressed. Overall, genes on either

strand pointing towards the terminus of replication were more highly expressed. Analysis

of the correlation between gene expression level and intrinsic gene parameters, by

location, revealed a strong influence of chromosomal architecture on gene transcription.

Codon usage showed a strong strand bias. Specifically, genes on the leading strand

located between the origin and terminus of replication, pointing towards the terminus,

showed both the highest codon usage bias and gene expression levels. For this particular

location, gene expression levels were most highly correlated with codon adaptation

index. Additionally, genes on the lagging strand located between the terminus and the

origin of replication, oriented towards the terminus, showed high expression levels, but

low codon usage bias. The correlations between gene expression level, CAI and GC

content indicate very highly expressed genes have a higher GC content, and display

codon bias. Globally, chromosomal architecture strongly influences gene expression

levels, with a bias towards locating the majority of the highly expressed genes on each

strand pointing towards the terminus of replication. This preferred combination of strand

location and orientation allows for more efficient co-directional replication and

transcription, ultimately providing a selective advantage. Although chromosomal location

and intrinsic gene parameters strongly influence gene transcription, additional factors

including environmental conditions and evolutionary forces also affect gene expression.

This study illustrates the importance of chromosomal architecture for gene expression

and shows that, for L. acidophilus, chromosomal location, codon usage and GC content

are correlated with gene expression level.

4.2 Introduction

The universal translation process relies on the genetic code, which describes how

64 codons specify 20 amino acids. The degeneracy of the genetic code allows all amino

acids except Met and Trp to be encoded by more than one codon. As a result, the

“genome hypothesis” proposed that each species developed its own preferred codon

usage pattern (Grantham et al., 1980). The extent to which alternative synonymous

codons are used is not random between and within organisms, which is illustrated by

differences in codon usage between and within species (Sharp et al., 1986; Sharp et al.,

1988; Aota et al., 1988; Lloyd and Sharp, 1992; Coghlan and Wolfe, 2000; Ohno et al.,

2001). Although base composition of the first two codon positions is species-

independent, the base distribution pattern at the third position allows variation between

species (Zhang and Chou, 1994).

The initial premise underlying codon usage studies is based upon the assumption

that selection molds the pattern of codon usage differently in various organisms. In

particular, genes encoding proteins necessary at most stages of the bacterial life cycle

seem to have evolved a specific codon usage that allows such genes to be consistently

highly expressed (Karlin and Mrazek, 2000; Karlin et al., 2004). Specifically, alternative

synonymous codons are not used randomly, and highly expressed genes are

representative of an organism’s codon bias. Codon usage variability is largely determined

by natural selection and mutation, on a genome-wide scale (Sharp et al., 1986; Sharp and

Li, 1987; Chen et al., 2004). Mutational forces are the primary factor responsible for

genome-wide codon bias (Chen et al., 2004), resulting in species genomic differentiation.

Specifically, selection seems to occur via translation efficiency, so as to differentiate highly

expressed genes. Indeed, in model organisms such as Escherichia coli and

Saccharomyces cerevisiae, very highly expressed genes appear to display a relatively

high degree of codon bias (Coghlan and Wolfe, 2000; dosReis et al., 2003).

Among the several measures of codon bias that have been established, most

studies rely upon the codon adaptation index (CAI) (Sharp and Li, 1987) as a measure of

codon bias (Coghlan and Wolfe, 2000). The CAI is a measure

of synonymous codon usage bias in a particular gene, relative to that of a set of highly

expressed genes (Sharp and Li, 1987). The CAI uses a reference set of highly expressed

genes to assess the relative frequencies of each codon. The CAI has been used to predict

the level of expression of a gene, assess adaptation of genes to a host, and make

comparisons of codon usage in different organisms (Sharp and Li, 1987).

Codon usage measures have been used in a variety of organisms to forecast gene

transcription levels. Specifically, studies of predicted highly expressed (PHX) genes have

been carried out in prokaryotes (Karlin and Mrazek, 2000), particularly in low GC Gram-

positive organisms (Karlin et al., 2004). Although CAI has been shown to be strongly

correlated with mRNA concentration in Saccharomyces cerevisiae (Coghlan and Wolfe,

2000), it is unclear whether this correlation is found throughout the prokaryotic kingdom.

Additionally, recent genome-wide studies have questioned the relationship between gene

expression level and codon bias (Coghlan and Wolfe, 2000). Although considerable

attention has been devoted to studying codon usage in a variety of prokaryotes, the

relationship between gene expression levels and codon usage in low GC Gram-positive

bacteria remains mostly uncharacterized.

Lactic acid bacteria are a heterogeneous family of microbes which reside in a

variety of environments. The genome sequences of several lactic acid bacteria have been

published, including bifidobacteria (Schell et al., 2002; Siezen et al., 2004), lactobacilli

(Kleerebezem et al., 2003; Pridmore et al., 2004; Altermann et al., 2004), lactococci

(Bolotin et al., 1999), streptococci (Ferretti et al., 2001; Ajdic et al., 2002; Tettelin et al.,

2001), and several others are underway (Klaenhammer et al., 2003; Siezen et al., 2004).

Although it was recently suggested that lactic acid bacteria are prime candidates for

codon optimization, little information is available regarding codon usage in these

microbes. The Lactobacillus acidophilus NCFM genome and transcriptome are well

characterized (Altermann et al., 2004; Azcarate-Peril et al., 2004; Barrangou et al., 2004),

albeit little information is available regarding codon usage in this organism.

The objective of this study was to investigate the relationships between gene

expression levels determined by microarray expression profiles and codon usage,

chromosomal location and intrinsic gene properties such as size and GC content, in L.

acidophilus. This is the first global analysis of codon usage in lactobacilli, providing a

better understanding of the parameters which underlie gene expression for low GC Gram-

positive bacteria.

4.3 Materials and Methods

4.3.1 Genome and microarray data

The complete genome sequence of the probiotic lactic acid bacterium

Lactobacillus acidophilus NCFM was used, as described by Altermann et al. (2004). All

annotated ORFs were used, including those annotated as hypothetical, and predicted by

computational methods only.

A whole-genome cDNA microarray platform of L. acidophilus has been

implemented recently (Azcarate-Peril et al., 2004). For the current study, we used a

carbohydrate microarray dataset published previously (Barrangou et al., 2004), and for all

ORFs spotted on the array (n=1889), the median LSM over the 8 treatments, as obtained

from the mixed model ANOVA data analysis (Barrangou et al., 2004), was calculated

and used as the gene expression level for microarray experiments.

4.3.2 Gene intrinsic parameters

For each ORF annotated in the genome, characteristics were parsed out of the

GenBank file, namely: ORF number, start position, strand location (either leading or

lagging strand), gene size (in nucleotides), and G+C content for the first (GC1), second (GC2)

and third (GC3) codon positions, and for the whole gene (GCall).
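
The position-specific G+C measures listed above can be illustrated with a short sketch. It assumes the ORF sequence is supplied in frame as a plain nucleotide string; the function names are hypothetical, and whether start and stop codons were included in the original calculation is not stated in the text, so this sketch simply uses every complete codon it is given.

```python
def gc_fraction(bases):
    """Fraction of G or C among the given bases (NaN for an empty string)."""
    bases = bases.upper()
    if not bases:
        return float("nan")
    return sum(base in "GC" for base in bases) / len(bases)

def gc_by_codon_position(orf):
    """Return GC1, GC2, GC3 and GCall for an in-frame ORF sequence.

    Trailing bases that do not form a complete codon are ignored.
    """
    usable = len(orf) - len(orf) % 3
    orf = orf[:usable]
    return {
        "GC1": gc_fraction(orf[0::3]),  # first codon positions
        "GC2": gc_fraction(orf[1::3]),  # second codon positions
        "GC3": gc_fraction(orf[2::3]),  # third codon positions
        "GCall": gc_fraction(orf),      # whole gene
    }

if __name__ == "__main__":
    # Hypothetical toy ORF, for illustration only.
    print(gc_by_codon_position("ATGGCCAAATAA"))
```

With complete codons, GCall is the average of GC1, GC2 and GC3, which is a convenient sanity check on parsed values.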

Additionally, expression levels were added for each ORF (array LSM), as the

median expression level over the carbohydrate array experiments, as well as the gene

position, relative to the chromosome terminus, either between the origin and terminus of

replication (O>T), or between the terminus and the origin of replication (T>O). The

terminus was defined as the intergenic region between ORFs 1128 and 1129, which is

located in the middle of the proposed terminus region defined in the L. acidophilus

NCFM genome (Altermann et al., 2004). The complete data set was also split into four

chromosomal locations, by segregating data by strand (leading or lagging), and relative to

the terminus (between the origin and the terminus, or between the terminus and the

origin, relative to the leading strand). The resulting four subgroups were designated: LeOT,

genes on the Leading strand located from the Origin to the Terminus; LeTO, genes on the

Leading strand located from the Terminus to the Origin; LaOT, genes on the Lagging

strand located from the Origin to the Terminus (relative to the leading strand); LaTO,

genes on the Lagging strand located from the Terminus to the Origin (relative to the

leading strand).
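
The four-way split can be sketched as a small classification function. This is an illustration under stated assumptions, not the script used in the study: it assumes ORFs are numbered consecutively along the chromosome starting at the origin, so that the terminus between ORFs 1128 and 1129 separates the two halves, and it takes the strand label ("leading" or "lagging") as already parsed. The names `TERMINUS_LAST_ORF` and `chromosomal_group` are hypothetical.

```python
from collections import Counter

# The text places the terminus in the intergenic region between ORFs 1128 and 1129.
TERMINUS_LAST_ORF = 1128

def chromosomal_group(orf_number, strand):
    """Assign an ORF to LeOT, LeTO, LaOT or LaTO.

    Assumes ORF numbers increase from the origin, so ORFs up to 1128 lie
    between the origin and the terminus (OT) and the rest between the
    terminus and the origin (TO).
    """
    if strand not in ("leading", "lagging"):
        raise ValueError("strand must be 'leading' or 'lagging'")
    prefix = "Le" if strand == "leading" else "La"
    side = "OT" if orf_number <= TERMINUS_LAST_ORF else "TO"
    return prefix + side

if __name__ == "__main__":
    # Hypothetical (ORF number, strand) records, for illustration only.
    records = [(400, "leading"), (1128, "lagging"), (1129, "leading"), (1442, "lagging")]
    print(Counter(chromosomal_group(number, strand) for number, strand in records))
```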

4.3.3 Codon adaptation index

For codon bias analyses, parameters were calculated according to the method of

Sharp and Li (1987). Specifically, the relative synonymous codon usage (RSCU) table,

the relative adaptedness of each codon (w) and the codon adaptation index (CAI) were

calculated (Sharp and Li, 1987).

The CAI is a measure of the degree of deviation of codon usage in a specific

gene, as compared to that of a selected training set. Three different training sets were

used to calculate CAI. The first training set consisted of the 10 most highly expressed

genes, and resulted in CAI10. The second training set consisted of the 50 most highly

expressed genes, and resulted in CAI50. The third training set included the whole

genome, in order to use the complete genome-wide codon usage bias as the reference.

CAI calculations were carried out using the EMBOSS suite (Rice et al., 2000), using the

cusp and cai tools to compute the codon usage table and CAI, respectively. For the

training set, a desirable sample size of 1% of the predicted coding sequences was

previously proposed (Carbone et al., 2003), and a set of the 20 most highly expressed

genes has also been used for CAI training in S. cerevisiae (Fraser et al., 2004). For the L.

acidophilus genome, 1% represents approximately 20 ORFs, which falls in-between the

first two training sets we used, namely 10 and 50 genes.
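
As an illustration of the calculation, the relative adaptiveness (w) and CAI of Sharp and Li (1987) can be sketched in a few lines. This is not the EMBOSS implementation used in the study, and its output may differ from that of cusp and cai. The standard genetic code, the function names, and the 0.5 substitute for codons absent from the training set are assumptions of this sketch.

```python
import math
from collections import Counter, defaultdict

# Standard genetic code (assumed), with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def codons(seq):
    """Split an in-frame sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]

def relative_adaptiveness(training_genes, absent_count=0.5):
    """w for each codon: its count divided by the count of the most used
    synonymous codon in the training set. Met and Trp (single-codon
    families) and stop codons are left out."""
    counts = Counter(c for gene in training_genes for c in codons(gene))
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        if amino_acid != "*":
            families[amino_acid].append(codon)
    w = {}
    for family in families.values():
        if len(family) == 1:
            continue
        best = max(max(counts[c] for c in family), absent_count)
        for c in family:
            w[c] = max(counts[c], absent_count) / best
    return w

def cai(gene, w):
    """CAI: geometric mean of w over the codons of the gene that have a w value."""
    logs = [math.log(w[c]) for c in codons(gene) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

if __name__ == "__main__":
    # Hypothetical one-gene training set: Lys is encoded by AAA twice and AAG once.
    w = relative_adaptiveness(["AAAAAAAAG"])
    print(w["AAA"], w["AAG"], cai("AAAAAG", w))
```

Changing the training set (10 genes, 50 genes, or the whole genome) only changes the list passed to `relative_adaptiveness`; the CAI step itself is unchanged.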

4.3.4 Ribosome binding site identification

Ribosome binding site (RBS) analyses were carried out using a custom-made

script available at http://sourceforge.net/projects/free2bind. This program is designed to

identify putative RBS based on the lowest calculated energy of the interaction between

the 16S 3’ tail and a given DNA sequence, an approach similar to that of Osada et al.

(1999). The resulting calculated free energy level of the pairing between the 16S 3’ tail

and the putative RBS sequence is used subsequently as an indicator of the RBS quality,

with lower energy levels representing higher quality Shine-Dalgarno (SD) sequences.

Two different RBS types were calculated. Since putative RBS sites have been annotated

in the L. acidophilus NCFM genome (Altermann et al., 2004), the region encompassing

10 bp before and after the first annotated base of the RBS was used as template for search

of RBS sites, which included 1250 ORFs. Additionally, the region upstream of each

annotated ORF was also used as a template for search of RBS sites, which included 1874

ORFs.

4.3.5 Statistical analyses

Correlation analyses were carried out in SAS (SAS Institute, Cary, NC), using the

correlation procedure (proc corr), invoking Pearson, Spearman (non-parametric) and

Kendall (non-parametric) correlation tests. Additionally, regression analysis was carried

out in Excel (Microsoft, CA). Although there is a priori no reason to make assumptions

regarding the normal distribution of parameters, or the linearity of the relationships being

studied, we attempted linear regression analysis. ANOVA statistical analyses were

carried out in SAS as well, using the GLM procedure (proc GLM). Although

correspondence analysis is usually used in studies of codon usage (Lloyd and Sharp,

1992; Perriere and Thioulouse, 2002), our objective was to investigate correlations rather

than describe codon usage statistics distributions.
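
For readers without access to SAS, the three correlation measures named above can be sketched in plain Python. This is an illustrative re-implementation, not the proc corr code used in the study: ties receive average ranks for Spearman, the tau-b variant is used for Kendall, no P-values are computed, and all function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall tau-b, counting concordant and discordant pairs directly."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    untied = concordant + discordant
    return (concordant - discordant) / math.sqrt(
        (untied + tied_x_only) * (untied + tied_y_only)
    )

if __name__ == "__main__":
    # Hypothetical values: a monotonic but non-linear relationship.
    x = [1, 2, 3, 4, 5]
    y = [1, 4, 9, 16, 25]
    print(pearson(x, y), spearman(x, y), kendall_tau_b(x, y))
```

On a monotonic but non-linear relationship such as the toy data above, the two rank-based measures reach 1 while Pearson stays below 1, which is one reason to report both parametric and non-parametric coefficients.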

4.4 Results

4.4.1 Distribution patterns

A total of 1813 ORFs had entries for all the parameters, and were subsequently

used in the study. These 1813 ORFs represent over 97% of the genomic ORFs in the L.

acidophilus NCFM genome.

The codon usage profiles for the three training sets are shown in Table 1. For 18

amino acids, more than one codon was used. The codon usage profile revealed a general

bias towards A-T rich codons, as expected in low GC Gram positive genomes. For

CAI10, 14 dominant codons were AT rich at the third position, and four codons were GC

rich at the third position. The amino acids with preferred GC-rich third position were:

Asn (AAC), His (CAC), Lys (AAG) and Tyr (TAC). In contrast, for those four residues,

CAIall had an AT-rich dominant third codon, namely Asn (AAT), His (CAT), Lys

(AAA) and Tyr (TAT). Those dominant codon preferences indicate that throughout the

genome, low GC third positions are preferred, whereas for genes with high codon bias, high

GC third positions are preferred. This suggests that genes with high codon bias are different

from genes typical of a low GC organism.

For all measured parameters, the distribution of the data over the entire genome

was analyzed, and distributions were visualized for gene expression level, gene size, GC

contents and CAI (Figure 1). The mean expression level was 0.03, ranging between -2.00

and 4.46. The median was -0.36, indicating that most genes have an expression level

below the mean. The transcriptome distribution pattern indicates many genes are lowly

expressed, and few genes are highly expressed. The distribution of gene expression levels

across the genome is similar to that of S. pneumoniae (Martin-Galiano et al., 2004).

For chromosomal location (Figure 2), 996 genes (55%) were on the leading strand

(Le), and 817 (45%) were on the lagging strand (La). Overall, 1042 genes (57%) were

located between the origin and the terminus (OT), and 771 (43%) were located between

the terminus and the origin (TO). Most of the genes on the leading strand (n=813, 82%)

were located between the origin and the terminus (LeOT), and most of the genes on the

lagging strand (n=588, 72%) were located between the terminus and the origin (LaTO).

Gene size varied between approximately 0.1 kb and 13.0 kb, with a mean value

of 952 bp. Overall, 26 genes (1.5%) were bigger than 3.0 kb, and 183 (10.1%) were smaller

than 0.3 kb. Most genes (n=1208, 67%) had sizes between 500 bp and 1,500 bp.

Overall GC content (GCall) varied between 23.2% and 46.3%, with a 35.0%

mean, which is within 0.3% of the genome-wide GC content (Altermann et al., 2004).

There was a strong position specific bias, however, with a GC1 mean (46.2%) over 10%

above and a GC3 mean (25.2%) about 10% below the GCall mean. In contrast, the GC2

mean (33.4%) was close to that of GCall.

For CAI10 and CAI50, distribution profiles were very similar, with values

ranging between 0.24 and 0.87, with a small number of genes (n=232, 13%) showing a

high bias (above 0.65). In contrast, for CAIall, a dual-distribution was observed, with

many genes showing high bias. The distribution of CAI10 values across the genome is

similar to that of S. pneumoniae (Martin-Galiano et al., 2004), although the average is higher in

L. acidophilus.

Putative ribosome binding sites with sequence closest to the RBS consensus

AGGAGG had the lowest free energy of pairing with the 16S 3’ tail. These results are

consistent with previous findings in prokaryotes (Osada et al., 1999; Ma et al., 2002).

4.4.2 Correlation analyses

Correlations between gene expression level and all measured parameters are

summarized in Table 2. Analyses included one parametric (Pearson) and two non-

parametric (Spearman and Kendall) measures of correlation, as well as two indicators of

linear regression analysis fit, namely significance and sum of squares residual.

Globally, parameters showing the highest correlation with gene expression were,

in decreasing order: GCall, GC1, GC2, CAI10, CAI50, size and RBS. GC3, start and

CAIall did not show any significant correlation in any of the three tests. Gene size showed a

higher correlation in the Spearman and Kendall tests than in Pearson’s. The Pearson test

correlated perfectly with both regression measures.

Although no particular parameter showed a very high correlation with gene

expression level (all correlation coefficients were below 0.5), GCall displayed correlation

coefficients of 0.427, 0.411 and 0.289, for Pearson, Spearman and Kendall, respectively.

Additionally, the linear regression analysis between GCall and LSM gave a statistically

very significant fit, although the relationship may not be fully linear (Figure 3).

Visualization of the correlations between gene expression levels and all other

parameters is shown in Figure 3. The best regression curve (lowest residual sum of squares)

corresponded with the most statistically significant P-value, and the highest correlation

coefficient (see Table 2). As previously suggested, the parameter showing the best

regression curve is the best predictor of gene expression level (Coghlan and Wolfe,

2000). However, a prior report suggests that the Pearson correlation is inappropriate for

analyzing CAI correlation with gene expression level, thus Spearman should be preferred

(Coghlan and Wolfe, 2000). Our results indicate both tests gave similar results, with the

exception of correlation to gene size.

A previous study reported a strong correlation between PHX genes and strong

Shine-Dalgarno sequences (Karlin and Mrazek, 2000), and a second study found a

correlation between Shine-Dalgarno sequence conservation and codon usage (Sakai et al.,

2001). In contrast, we found a small correlation between gene expression level and RBS

strength. This weak correlation has been reported for three archaebacteria previously

(Sakai et al., 2001).

4.4.3 Chromosomal location

Although it is tempting to arbitrarily select subsets of genes, based on expression

level, CAI range, or GC content, we segregated the data according to genome location,

within strand, and relative to the origin and terminus of replication (Figure 2). The

correlations between gene expression level and all other parameters were then investigated

again, using methodologies presented above. ANOVA analysis investigated whether

there were differences in the distributions of the parameters between the four

chromosomal locations. Results are summarized in Table 3. Analyses revealed genes on

the leading strand were more highly expressed than those on the lagging strand. Similarly,

genes located from the origin to the terminus were more highly expressed than those

located from the terminus to the origin. Genes were segregated by strand (Leading or

Lagging) and relative to the terminus (from the Origin to the Terminus OT, or from the

Terminus to the Origin TO) into four groups, namely LeTO, LeOT, LaTO, LaOT. A

comparison of gene expression across these four groups revealed that LeOT genes were

the most highly expressed, followed by LaTO, and then by LeTO and LaOT, which were

the most lowly expressed (Figure 4).
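
The group comparison above rests on analysis of variance. As a rough illustration of what a one-way ANOVA across the four locations computes, the F statistic can be sketched as follows. This is not the proc GLM analysis used in the study: the function name is hypothetical, the expression values are invented toy numbers, and no P-value or multiple-comparison step is included.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA.

    `groups` maps a group label to a list of observations. F is the
    between-group mean square divided by the within-group mean square.
    """
    all_values = [value for values in groups.values() for value in values]
    grand_mean = sum(all_values) / len(all_values)
    means = {label: sum(values) / len(values) for label, values in groups.items()}
    ss_between = sum(
        len(values) * (means[label] - grand_mean) ** 2
        for label, values in groups.items()
    )
    ss_within = sum(
        (value - means[label]) ** 2
        for label, values in groups.items()
        for value in values
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

if __name__ == "__main__":
    # Hypothetical expression values (LSM) per location, for illustration only.
    toy = {
        "LeOT": [2.0, 3.0, 4.0],
        "LaTO": [1.0, 2.0, 3.0],
        "LeTO": [0.0, 1.0, 2.0],
        "LaOT": [0.0, 1.0, 2.0],
    }
    print(one_way_anova_f(toy))
```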

In contrast, GC3 showed opposite distributions between the four locations, with

LeOT and LaTO showing a lower GC3 content than LeTO and LaOT (Table 3, Figure 4).

CAI10 and CAI50 both showed differences between strands, with higher values

on the leading strand. These results correlate well with distributions observed in Figure

1, showing differences between CAI10-CAI50 and CAIall. Also, the strand differences

explain the dual distribution observed for CAIall in Figure 1.

Additionally, correlation analyses were carried out, after data were split into the

four chromosomal locations (Table 4). Differences between locations were observed,

consistent with ANOVA results (Table 3). Since genes LeOT and LaTO showed the

highest expression levels (Table 3, Figure 4), particular attention was given to their

correlations with other parameters. For LeOT, CAI showed the highest correlation,

followed by GCall, GC1, GC2, Size and RBS. Interestingly, CAI10/CAI50 showed the

highest correlation coefficients, namely 0.52 and 0.51. For LaTO, GCall, GC1, CAIall,

GC2, CAI10, size, CAI50 and size showed highest correlations. Both GC3 and start did

not show any significant correlation, which is similar to global results (Table 2).

Visualization of the correlation analyses for CAI10, GCall and size can be seen in

Figure 5. For select parameters, gene distribution for each location can be seen in Figure

6. CAI10 showed the strongest correlation with gene expression level, for LeOT (Table

4). Most of those high CAI10 values correspond to genes with high expression levels

(Figure 5). Most of the highly expressed genes are located on LeOT, and some on LaTO

(Figures 5 and 6), while only a few genes located on LeTO and LaOT show expression

above LSM=1.0. In contrast, for the lagging strand, only a few genes show CAI10 above

0.55 (Figures 5 and 6), none of which have high gene expression.

When comparing the relationship between CAI10 and LSM globally (Figure 3)

vs. by location (Figure 5), there is a location-specific difference. In contrast, the

relationship between LSM and gene size, or LSM and GCall, does not change when data

are segregated by location (Figures 3 and 5). This is consistent with a strand discrepancy in

codon adaptation (CAI10), as seen in Figure 6.

Both GCall and GC3 contents are consistent regardless of location (Figure 6),

with a value close to that of the genome in the case of GCall. For GC3, the value has to

be lower than that of the genome, because constraints on the first two codon positions result

in a higher GC content at those positions. As a result, since L. acidophilus is a low GC

organism, the GC content at the third codon position has to be lower than that of the

genome. Gene size distribution is seemingly equal throughout the chromosome, although

many more genes are present on the LeOT, and LaTO. There is a strong difference

between the two strands as to codon adaptation (Table 3, Figures 4 and 6), which was

observed regardless of the training set used. The codon adaptation index is always

higher for the leading strand, regardless of direction relative to the origin or terminus of

replication (Figure 6).

4.5 Discussion

The analysis of the relationships between gene expression levels, codon usage,

chromosomal location and intrinsic gene properties in L. acidophilus revealed strong

correlations between GC content, codon usage, chromosomal location and gene

expression levels. However, there was no correlation between GC3 and gene expression

level. Globally, chromosomal architecture seemed to influence gene expression strongly,

with both a strand bias, and a gene location and orientation effect, relative to the origin

and terminus of replication.

Globally, a relatively small number of genes showed high expression levels.

Predicted highly expressed genes usually encompass ribosomal proteins (RP),

transcription and translation processing factors (TF), chaperone proteins (CH),

recombination and repair proteins, outer membrane proteins and energy metabolism

enzymes (Karlin and Mrazek, 2000; Karlin et al., 2004). Throughout a variety of

prokaryotes, those genes display a high codon bias (Karlin and Mrazek, 2000). Our

results indicate that the 20 most highly expressed genes included genes involved in

glycolysis, transcription, ATP synthesis, membrane construction, ribosomal proteins,

regulators, and a peptidase. Genes encoding glycolytic enzymes and translation factors

have also been shown to be highly expressed in S. pneumoniae (Martin-Galiano et al.,

2004). Although this is consistent with RP and TF families of genes, genes most highly

expressed in L. acidophilus did not include CH genes.

Although most studies analyzing codon bias have relied on multivariate statistical

analyses such as correspondence analysis (Perriere and Thioulouse, 2002), the major

trends identified in codon usage account for a low proportion of the variation (Grocock

and Sharp, 2002). In a thorough study of Pseudomonas aeruginosa, the first axis

accounted for 17% of the variation, and the first four axes combined accounted for a total

of 30% of the variation (Grocock and Sharp, 2002). In another study, the combination of

the first three axes accounted for less than 23% of the variation in codon usage (McInerney,

1994).

Since our objective was to investigate correlation between gene features and

expression levels, rather than describe the variation within CAI distributions, we used

correlation analysis rather than correspondence analysis. Although no assumption can be

made as to the linearity of the relationships between parameters being tested, a linear

regression was attempted nonetheless. Several correlation analyses were carried out,

including both parametric and non-parametric analyses, namely Pearson, Spearman and

Kendall, since no assumptions were made a priori regarding data distribution and

linearity of the relationships. Spearman correlation analysis has previously been used in

codon analysis studies and Spearman ranking was considered a more appropriate statistic

than the Pearson correlation coefficient (Coghlan and Wolfe, 2000). Similarly, Spearman

correlation has also been used previously to investigate correlation between effective

number of codons in a gene (Nc) and CAI (Fuglsang, 2003). Additionally, Kendall

correlation has also been used to analyze the correlation between gene expression level

and codon usage (dosReis et al., 2003). A combination of both Pearson and Spearman

correlation analyses has also been used to investigate correlations between CAI and other

parameters (Jansen et al., 2003). Pearson correlation coefficients have also been used to

analyze the correlation between codon bias and microarray expression data (Fraser et al.,

2004). Our strategy allows comparison of results obtained from both parametric

(Pearson) and non-parametric (Kendall, Spearman) correlation tests. It was previously

suggested that non-parametric tests are more appropriate for such analyses, since they are

robust against non-linearity and non-normality (dosReis et al., 2003).
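As an illustration of the three statistics compared above, the following minimal Python sketch computes Pearson, Spearman and Kendall coefficients from first principles. The data values and function names are hypothetical and are not taken from this study. Kendall's coefficient is computed here as tau-a (no tie correction), which is an assumption, since the variant used in the analysis is not stated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant pairs) / total pairs."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


# Hypothetical toy data: monotonic but non-linear relationship.
feature = [1, 2, 3, 4, 5]
expression = [v ** 2 for v in feature]

r_p = pearson(feature, expression)
r_s = spearman(feature, expression)
r_k = kendall_tau_a(feature, expression)
print(f"Pearson={r_p:.3f} Spearman={r_s:.3f} Kendall={r_k:.3f}")
```

On this toy data the two rank-based coefficients equal 1 while the Pearson coefficient is slightly lower, which is the kind of behavior that motivates reporting non-parametric statistics alongside Pearson.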

Prior studies carried out using correspondence analysis to investigate CAI statistic

distribution (Lloyd and Sharp, 1992) have identified a major and a secondary trend, with

the first axis appearing to differentiate genes according to their expression level (Lloyd

and Sharp, 1992; Kliman et al., 2003). Although our results indicate a correlation

between CAI and gene expression level, globally, our strongest correlation was

established between gene expression level and GC content. Additionally, our

investigation of the correlation between CAI and other statistics indicated it is not

correlated with GC3.

CAI has previously been shown to be the best codon usage bias indicator

(Coghlan and Wolfe, 2000). CAI was also shown to be highly correlated with mRNA

expression levels in S. cerevisiae (Coghlan and Wolfe, 2000). CAI and mRNA levels

have been shown to be correlated previously (dosReis et al., 2003). In our study, we

found a strong correlation between gene expression level and CAI10/CAI50. Although it

was not as strong as that between gene expression level and GCall on a global scale, it

was the strongest correlation for genes positioned on LeOT. Our results show gene GC

content is the parameter most highly correlated with gene expression, which is different

from results shown previously (dosReis et al., 2003), but similar to findings from rodents

(Konu and Li, 2002).

Our results, indicating a positive correlation between gene expression level and

gene size, differ from previous studies reporting a negative correlation between mRNA

concentration and protein length (Coghlan and Wolfe, 2000) and gene length and codon

usage (Kliman et al., 2003). Perhaps this discrepancy reflects the differences between the

organisms used in the studies, eukaryotic S. cerevisiae and prokaryotic L. acidophilus. The

relationship between CAI and mRNA levels has been shown previously in S. cerevisiae

(Coghlan and Wolfe, 2000), and in E. coli (dosReis et al., 2003). Also, a non-parametric

regression on mRNA expression levels in E. coli has shown that gene size followed by

GC and then CAI are the best predictors of mRNA concentration (dosReis et al., 2003).

Although several studies have used the CAI as an indicator of gene expression, a

variable positive correlation is found between codon bias and level of gene expression.

Historically, initial CAI studies claimed the strong correlation between CAI and levels of

gene expression allows the use of CAI as a predictor of gene expression (Sharp and Li,

1987). In contrast, we believe the correlation between CAI and gene expression level is

indicative, rather than predictive of the level at which a gene is expressed.
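For readers unfamiliar with the index, the sketch below follows the general definition of CAI given by Sharp and Li (1987): each codon receives a relative adaptiveness value (its frequency divided by that of the most frequent synonymous codon in a reference set), and the CAI of a gene is the geometric mean of those values. The reference fractions are a small subset of the CAI10 column of Table 1; the two test sequences and all function names are hypothetical, and the software actually used to compute CAI in this study may differ in its details.

```python
from math import exp, log

# Fraction-per-amino-acid values, CAI10 column of Table 1 (subset only).
REFERENCE_FRACTIONS = {
    "Ala": {"GCA": 0.291, "GCC": 0.106, "GCG": 0.029, "GCT": 0.574},
    "Lys": {"AAA": 0.311, "AAG": 0.689},
    "Glu": {"GAA": 0.930, "GAG": 0.070},
}


def relative_adaptiveness(fractions_by_aa):
    """w = codon fraction / fraction of the most used synonymous codon."""
    weights = {}
    for codons in fractions_by_aa.values():
        best = max(codons.values())
        for codon, fraction in codons.items():
            weights[codon] = fraction / best
    return weights


def cai(sequence, weights):
    """Geometric mean of w over the codons that have a reference weight."""
    usable = len(sequence) - len(sequence) % 3
    codons = [sequence[i:i + 3] for i in range(0, usable, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no codon in the sequence has a reference weight")
    return exp(sum(log(w) for w in scored) / len(scored))


WEIGHTS = relative_adaptiveness(REFERENCE_FRACTIONS)

# Hypothetical sequences encoding Ala-Lys-Glu.
preferred = "GCTAAGGAA"  # most frequent codon for each amino acid
rare = "GCGAAAGAG"       # least frequent codon for each amino acid

cai_preferred = cai(preferred, WEIGHTS)
cai_rare = cai(rare, WEIGHTS)
print(f"preferred codons: {cai_preferred:.3f}, rare codons: {cai_rare:.3f}")
```

Because the index depends only on the coding sequence and the reference set, it is fixed for a given gene, which is one reason it can indicate but not fully predict an expression level that varies with conditions.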

The a priori assumption that genes with codon bias close to that of highly

expressed genes should be highly expressed is not consistent with the fact that some

genes with very high CAI values are not highly expressed (Figure 5). Analysis of CAI

and microarray gene expression levels in Streptococcus pneumoniae showed that CAI is

not always predictive of gene expression (Martin-Galiano et al., 2004). Specifically,

genes with high CAI are not always highly expressed, and genes with low CAI can be

highly expressed (Martin-Galiano et al., 2004), which is also shown in our current

findings (Figure 5). Interestingly, S. pneumoniae and L. acidophilus are both low GC

Gram-positive lactic acid bacteria.

A small correlation (r² = 0.09) has been shown between CAI and microarray

fluorescence in S. pneumoniae (Martin-Galiano et al., 2004). A similar correlation level

(r² = 0.07) was shown in L. acidophilus. In contrast, a higher correlation (r² = 0.18) was

found between GC and gene expression level in L. acidophilus.
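The two L. acidophilus values quoted above appear to correspond to the squares of the global Pearson coefficients reported in Table 2 (rP = 0.268 for CAI10 and rP = 0.427 for GCall). The short check below makes that arithmetic explicit; the link between the quoted values and those two table entries is an inference from the numbers, not something stated in the text.

```python
# Global Pearson coefficients (rP) against expression level, from Table 2.
pearson_r = {"GCall": 0.427, "CAI10": 0.268}

# Coefficient of determination for a simple linear fit is the square of r.
r_squared = {name: r ** 2 for name, r in pearson_r.items()}

for name, value in r_squared.items():
    print(f"{name}: r = {pearson_r[name]:.3f}, r^2 = {value:.2f}")
```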

The genomic DNA GC-content varies widely between species, as a result of

mutation pressure (Muto and Osawa, 1987). GC variation has been shown to be the most

important parameter differentiating codon usage bias between organisms, in archaea and

eubacteria (Chen et al., 2004). The relationship between codon usage bias and GC

composition has been characterized across unicellular genomes (Wan et al., 2004).

Specifically, GC3 was shown to be the primary factor within GC content to correlate

highly with codon usage bias (Wan et al., 2004). Further, GC3 was hypothesized as the

key factor driving synonymous codon usage, independently of species (Zhang and Chou,

1994; Wan et al., 2004). Although those results were inferred across 70 bacterial species

and 16 archaeal genomes, our results show this is not the case for L. acidophilus. We

found no correlation between GC3 and CAI. The non-linearity of the relationship

between codon usage bias measures and GC3 has been shown previously in a variety of

bacteria and archaea (Wan et al., 2004).

The L. acidophilus NCFM genome is 34.7% GC, so it is not surprising that codon

usage is related to base composition bias. The observed differences in GC content at the

three codon positions illustrate the overall GC content. Codon degeneracy is located

primarily at the third position of the codon, since there are strict constraints on the first

and second position of each codon (Zhang and Chou, 1994). As a result, the third codon

position is representative of the GC content of an organism, and reflects differences

between species (Muto and Osawa, 1987; Carbone et al., 2003). GC3 has previously been

shown to vary between species (Zhang and Chou, 1994), explaining the species impact

on the correlation between GC3 and CAI (Lloyd and Sharp, 1992). Also, it was previously

reported that CAI can most highly correlate with GC skew (Carbone et al., 2003), and

that gene expression levels are correlated with GC3 (Kliman et al., 2003). The position-

specific GC content within codons has been investigated previously (Muto and Osawa,

1987; Chen and Zhang, 2003), across species with varying GC content, indicating that

low GC content bacteria have higher GC content at the first codon position and lower GC

content at the third codon position, than that of their overall genome content, while that

of the second codon position is close to their genomic content (Chen and Zhang, 2003).

This is consistent with our findings in L. acidophilus (Figure 1). Early work showed that

there is a codon position bias in GC content, which is correlated with genome GC content

(Muto and Osawa, 1987). Specifically, the correlation between GC3 and genome GC

content explains the discrepancies observed at the third codon position between species

with varying GC content (Muto and Osawa, 1987).
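The position-specific GC statistics discussed above (GC1, GC2, GC3 and GCall) can be computed directly from a coding sequence. The following is a minimal sketch under the assumption that the sequence is in frame and starts at the first codon position; the example sequence and the function name are hypothetical.

```python
def gc_by_codon_position(cds):
    """Return GC fraction at codon positions 1, 2, 3 and over all positions."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence shorter than one codon")
    counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            counts[i % 3] += 1
    return {
        "GC1": counts[0] / n_codons,
        "GC2": counts[1] / n_codons,
        "GC3": counts[2] / n_codons,
        "GCall": sum(counts) / usable,
    }


# Hypothetical five-codon sequence: ATG GCT AAA GAA TTA
example = gc_by_codon_position("ATGGCTAAAGAATTA")
print(example)
```

Since GCall is the mean of GC1, GC2 and GC3 for a given gene, a first-position value above the overall content has to be balanced by lower values at the other positions.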

A previous study investigating codon bias in P. aeruginosa (Grocock and Sharp, 2002)

reported that for species with highly biased GC base composition, the CAI methodology

may not be appropriate. While the study in P. aeruginosa (67% GC) illustrated this point

for high GC organisms, our analyses in L. acidophilus (35% GC) might validate this

theory for low GC organisms. It was recently suggested lactic acid bacteria are a

desirable group of organisms for analysis of codon usage (Fuglsang, 2003), but our results

suggest that caution should be applied when using the CAI methodology.

Perhaps the high correlation between GC content and gene expression level is due

to the genomic composition of L. acidophilus. The genomic GC content in prokaryotes

ranges between approximately 25% and 75% (Muto and Osawa, 1987), which allows

great codon usage flexibility and variability. Since L. acidophilus is a low GC organism

(Altermann et al., 2004), perhaps the strong correlation between GC content and gene

expression level is due to the importance of high GC content genes. Indeed, for a low GC

organism such as L. acidophilus, genes with a high GC content differ widely from its

genomic “fingerprint”, since GC content is a main component of genomic signature

(Sandberg et al., 2003). Therefore, retaining genes that vary from its overall genomic

signature may indicate that they are biologically important, and consequently highly

expressed.

A correlation between RBS and gene expression level was found, although it was

minor compared to that of GCall. Nonetheless, a positive correlation between a strong

RBS and gene expression level is intuitive, and consistent with previous findings (Ma et

al., 2002).

We observed a discrepancy between the genome signature (low GC) and highly

expressed genes (high GC), perhaps indicating the codon usage for highly expressed

genes is different from that of the genome. Specifically, the genome-wide codon usage is

characterized by a high AT content at the third codon position, which is consistent with a

low GC organism. In contrast, genes with high codon bias showed a specific preference

for high GC content at the third codon position for select amino acids (Table 1).

However, GC3 was not a good indicator of gene expression (Tables 2 and 4). Perhaps

this is an indicator that for low GC organisms, overall gene GC content is more

representative of bias than codon usage.

Differences in the base composition between strands have been shown previously

(Grocock and Sharp, 2002; Lobry and Sueoka, 2002). The leading-lagging strand bias in

codon usage has been shown in Borrelia burgdorferi (McInerney, 1998; Carbone et al.,

2003). Additionally, replication selection is seemingly responsible for the presence of the

majority of the genes on the leading strand, whereas transcription selection results in

higher expression of genes present on the leading strand (McInerney, 1998).

Interestingly, location per se did not correlate with gene expression level globally

(Table 2). This means that the position of the start of any gene on the chromosome does

not correlate with gene expression level. However, it was shown previously that location

is indeed an important factor in gene expression. We therefore further investigated the

effect of both strand location, and orientation relative to the terminus on gene expression

level.

The importance of chromosomal location has been illustrated before in P.

aeruginosa (Grocock and Sharp, 2002), Borrelia burgdorferi (McInerney, 1998), and

Treponema pallidum (Lafay et al., 1999). Specifically, differences between strands have

been illustrated for codon usage (McInerney, 1998). Strand location was shown to be a

major cause of variation in codon usage (McInerney, 1998). However, the correlation

between gene location and expression level was estimated to be weak in P. aeruginosa,

whereby gene location was only the tertiary trend in correspondence analysis, accounting

for only 4.4% of variation (Grocock and Sharp, 2002). Strand location accounted for

8.6% of the variation in codon usage, as the secondary source of variation (Lafay et al.,

1999). In contrast, in B. burgdorferi, strand location is the primary parameter involved in

codon usage, accounting for 13.7% of the variation (McInerney, 1998). Within species,

inter-strand differences appear on the primary axis of correspondence analysis (Lafay et

al., 1999). Nevertheless, they showed that the position of a gene relative to the strand has

an influence on codon usage (Grocock and Sharp, 2002). In addition to strand location,

the orientation of a gene relative to the direction of DNA replication is also important in

codon usage pattern (McInerney, 1998). Nevertheless, the impact of both strand and

orientation on gene expression had not yet been illustrated simultaneously, prior to our

study.

Chromosomal architecture has a major effect on gene expression, both relative to

strand bias and gene position and orientation relative to the terminus of replication. The

impact of chromosomal architecture is important for many of the parameters measured in

our study, showing a significant bias for genes converging towards the terminus.

Although it was previously shown that the leading strand in low GC Gram-positives

consistently carries more than 75% of the genes (Karlin et al., 2004), this is not the case in L.

acidophilus, where only 55% of the genes are on the leading strand. Nevertheless, very

significant differences in codon usage, GC content and other parameters were observed

between the two strands.
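The 55% figure can be checked against the gene counts listed for the four chromosomal locations in Table 3. The sketch below does this arithmetic and, as a second illustrative quantity, computes the share of genes in the two locations (LeOT and LaTO) that the discussion describes as pointing toward the terminus. Variable names are hypothetical.

```python
# Gene counts per chromosomal location, from Table 3.
genes_per_location = {"LeOT": 813, "LeTO": 183, "LaOT": 229, "LaTO": 588}

total_genes = sum(genes_per_location.values())

# Share of genes on the leading strand (LeOT + LeTO).
leading_fraction = (
    genes_per_location["LeOT"] + genes_per_location["LeTO"]
) / total_genes

# Share of genes in the two locations described as converging on the terminus.
toward_terminus_fraction = (
    genes_per_location["LeOT"] + genes_per_location["LaTO"]
) / total_genes

print(f"total genes: {total_genes}")
print(f"leading strand: {leading_fraction:.1%}")
print(f"LeOT + LaTO: {toward_terminus_fraction:.1%}")
```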

Interestingly, while codon usage, GC content and gene size all showed a global

correlation with gene expression levels, CAI was the parameter which showed the most

variability between chromosomal locations, relative to the strand bias, and the position

and orientation relative to the terminus. Specifically, the correlation between CAI10 and

gene expression level is higher for LeOT genes (Table 4). In contrast, the correlation

between gene expression level and GCall or GC3 was consistent regardless of location.

For CAI particularly, genes on the leading strand located between the origin and the

terminus of replication show the most codon usage bias. Specifically, genes that show the

most codon bias are located in this region, and are likely to be highly expressed (Figure

5).

Globally, it seems chromosomal architecture is a primary factor controlling gene

expression in L. acidophilus. Perhaps the combination of replication efficiency and

transcription efficiency underlies the impact of chromosomal location on gene

expressivity. Indeed, replication is thought to be more efficient while co-directional with

transcription (French, 1992), since collisions between the RNA and DNA polymerases

are likely to slow down both processes (French, 1992). Hence, there is a selective

advantage towards locating the majority of the genes on each strand pointing towards the

terminus. As suggested previously, more efficient replication may be a selective

advantage (McInerney, 1998) and the most desirable gene location would combine genes

on the leading strand and on the lagging strand pointing toward the terminus. This is

consistent with our observations, namely genes on LeOT and LaTO showing higher

expression levels (Table 3, Figures 4 and 5).

The causal link established between codon usage and gene expression level is still

as controversial as when the concept was initially presented (Sharp et al., 1986). Early

work aimed at predicting expression level of a gene given only the nucleotide sequence

of the coding region (Sharp et al., 1986). Although relative tRNA abundances also

impact gene expression level, we did not include them in our study, since our primary

objective was to investigate the correlation between intrinsic gene features and

expression. Nevertheless, a correlation between usage of preferred codons and level of

their respective major isoacceptor tRNA has been shown in E. coli. This correlation

explains an adaptation of highly expressed genes towards translational efficiency

(dosReis et al., 2003).

Although CAI has previously been reported as a predictor of mRNA

concentration, it is an imperfect and unreliable measure of gene expression (Coghlan and

Wolfe, 2000). From a biological standpoint, intrinsic gene parameters are set, regardless

of environmental conditions. Since environmental conditions have been shown to impact

gene expression on a large scale, as shown by microarray studies, intrinsic gene

parameters are unable to predict changes in mRNA levels with changing biological

conditions, as mentioned previously (Coghlan and Wolfe, 2000). Indeed, extrinsic

parameters such as intergenic regions comprising promoter sequences are also involved

in gene expression control.

Globally, gene expression is controlled at several levels, including initiation of

transcription, transcription termination and codon usage. Additionally, the minor codon

modulator hypothesis stipulates that minor codons near the initiation site may play a role

in regulating gene expression (Ohno et al., 2001). As a result, although codon bias

measures may be correlated with intrinsic parameters, they are not good predictors of

mRNA levels. Perhaps a mixed model similar to that presented by dosReis et al. (2003),

including several parameters, is more representative of the heteroscedastic nature of gene

expression. Overall, many factors are involved in gene expression, including codon

usage, gene length, transcription initiation, amino-acid composition, protein function,

tRNA abundance, environmental conditions, mutation and evolutionary forces, GC

compositions, and others, which underlies the complexity in modeling and predicting

gene expression based on a defined number of parameters. It would be utopian to consider

that intrinsic gene parameters alone can be used to predict gene expression. An effective

predictor of gene expression has to include all of the parameters involved in translation,

transcription, environmental conditions and physiological state of the organism.

Nevertheless, this study illustrates the importance of chromosomal architecture for gene

transcription, and shows that codon usage and GC content are best correlated with

expression levels in L. acidophilus.

4.6 References


Altermann, E., Russell, W. M., Azcarate-Peril, M. A., Barrangou, R., Buck, L. B.,


Aota, S. I., Gojobori, T., Ishibashi, F., Maruyama, T., & Ikemura, T. (1988) Nucleic Acids Res. 16, r315-r402

Azcarate-Peril et al., (2004) In review

Barrangou, R., Azcarate-Peril, M. A., Duong, T., Conners, S. B., Kelly, R. M., & Klaenhammer, T. R. (2004) In review.

Bolotin, A., Mauger, S., Malarme, K., Ehrlich, S. D., & Sorokin, A. (1999) Antonie van Leeuwenhoek 76, 27-76

Carbone, A., Zinovyev, A., & Kepes, F. (2003) Bioinformatics 19, 2005-2015

Chen, L. L., & Zhang, C. T. (2003) Biochem. Biophys. Res. Comm. 306, 310-317

Chen, S. L., Lee, W., Hottes, A. K., Shapiro, L., & McAdams, H. H. (2004) Proc. Natl. Acad. Sci. USA 101, 3480-3485

Coghlan, A., & Wolfe, K. H. (2000) Yeast 16, 1131-1145

dosReis, M., Wernisch, L., & Savva, R. (2003) Nucleic Acids Res. 31, 6976-6985

Ferretti, J. J., McShan, W. M., Ajdic, D., Savic, D. J., Savic, G., Lyon, K., Primeaux, C., Sezate, S., Suvorov, A., Kenton, S., Lai, H. S., Lin, S. P., Qian, Y., Jia, H. G., Najar, F. Z., Ren, Q., Zhu, H., Song, L., White, J., Yuan, X., Clifton, S. W., Roe, B. A., & McLaughlin, R. (2001) Proc. Natl. Acad. Sci. USA 98, 4658-4663

Fraser, H. B., Hirsch, A. E., Wall, D. P., & Eisen, M. B. (2004) Proc. Natl. Acad. Sci. USA 101, 9033-9038

French, S. (1992) Science 258, 1362-1365

Fuglsang, A. (2003) Biochem. Biophys. Res. Comm. 312, 285-291

Fuglsang, A. (2004) Antonie van Leeuwenhoek 86, 135-147

Grantham, R., Gautier, C., Gouy, M., Mercier, R., & Pave, A. (1980) Nucleic Acids Res. 8, r49-r62

Grocock, R. J., & Sharp, P. M. (2002) Gene 289, 131-139

Jansen, R., Bussemaker, H. J., & Gerstein, M. (2003) Nucleic Acids Res. 31, 2242-2251

Karlin, S., & Mrazek, J. (2000) J. Bacteriol. 182, 5238-5250

Karlin, S., Theriot, J., & Mrazek, J. (2004) Proc. Natl. Acad. Sci. USA 101, 6182-6187

Klaenhammer, T. R., Altermann, E., Arigoni, F., Bolotin, A., Breidt, F., Broadbent, J., Cano, R., Chaillou, S., Deutscher, J., Gasson, M., van de Guchte, M., Guzzo, J., Hartke, A., Hawkins, T., Hols, P., Hutkins, R., Kleerebezem, M., Kok, J., Kuipers, O., Lubbers, M., Maguin, E., McKay, L., Mills, D., Nauta, A., Overbeek, R., Pel, H., Pridmore, D., Saier, M., van Sinderen, D., Sorokin, A., Steele, J., O'Sullivan, D., de Vos, W., Weimer, B., Zagorec, M., & Siezen, R. (2002) Antonie van Leeuwenhoek 82, 29-58

Kleerebezem, M., Boekhorst, J., van Kranenburg, R., Molenaar, D., Kuipers, O. P., Leer,

Kliman, R. M., Irving, N., & Santiago, M. (2003) J. Mol. Evol. 57, 98-109

Lafay, B., Lloyd, A. T., McLean, M. J., Devine, K. M., Sharp, P. M., & Wolfe, K. H. (1999) Nucleic Acids Res. 27, 1642-1649

Lloyd, A. T., & Sharp, P. M. (1992) Nucleic Acids Res. 20, 5289-5295

Lobry, J. R., & Sueoka, N. (2002) Genome Biol. 3, 1-14

Ma, J., Campbell, A., & Karlin, S. (2002) J. Bacteriol. 184, 5733-5745

Martin-Galiano, A. J., Wells, J. M., & de la Campa, A. G. (2004) Microbiol. 150, 2313-2325

McInerney, J. O. (1998) Proc. Natl. Acad. Sci. USA 95, 10698-10703

Muto, A., & Osawa, S. (1987) Proc. Natl. Acad. Sci. USA 84, 166-169

Ohno, H., Sakai, H., Washio, T., & Tomita, M. (2001) Gene 276, 107-115

Osada, Y., Saito, R., & Tomita, M. (1999) Bioinformatics 15, 578-581

Perriere, G., & Thioulouse, J. (2002) Nucleic Acids Res. 30, 4548-4555

Pridmore, R. D., Berger, B., Desiere, F., Vilanova, D., Barretto, C., Pittet, A. C., Zwahlen, M. C., Rouvet, M., Altermann, E., Barrangou, R., Mollet, B., Mercenier, A., Klaenhammer, T. R., Arigoni, F., & Schell, M. A. (2004) Proc. Natl. Acad. Sci. USA 101, 2512-2517

Rice, P., Longden, I., & Bleasby, A. (2000) Trends Gen. 16, 276-277

Sakai, H., Imamura, C., Osada, Y., Saito, R., Washio, T., & Tomita, M. (2001) J. Mol. Evol. 52, 164-170

Sandberg, R., Branden, C. I., Ernberg, I., & Coster, J. (2003) Gene 311, 35-42

Schell, M. A., Karmirantzou, M., Snel, B., Vilanova, D., Berger, B., Pessi, G., Zwahlen, M. C., Desiere, F., Bork, P., Delley, M., Pridmore, R. D., & Arigoni, F. (2002) Proc. Natl. Acad. Sci. USA 99, 14422-14427

Sharp, P. M., Tuohy, T. M. F., & Mosurski, K. R. (1986) Nucleic Acids Res. 14, 5125-5143

Sharp, P. M., & Li, W. H. (1987) Nucleic Acids Res. 15, 1281-1295

Sharp, P. M., Cowe, E., Higgins, D. G., Shields, D. C., Wolfe, K. H., & Wright, F. (1988) Nucleic Acids Res. 16, 8207-8211

Siezen, R. J., van Enckevort, F. H. J., Kleerebezem, K., & Teusink, B. (2004) Curr. Opin. Biotechnol. 15, 105-115

Tettelin, H., Nelson, K. E., Paulsen, I. T., Eisen, J. A., Read, T. D., Peterson, S., Heidelberg, J., Deboy, R. T., Haft, D. H., Dodson, R. J., Durkin, A. S., Gwinn, M., Kolonay, J. F., Nelson, W. C., Peterson, J. D., Umayam, L. A., White, O., Salzberg, S. L., Lewis, M. R., Radune, D., Holtzapple, E., Khouri, H., Wolf, A. M., Utterback, T. R., Hansen, C. L., McDonald, L. A., Feldblyum, T. V., Angiuoli, S., Dickinson, T., Hickey, E. K., Holt, I. E., Loftus, B. J., Yang, F., Smith, H. O., Venter, J. C., Dougherty, B. A., Morrison, D. A., Hollingshead, S. K., & Fraser, C. M. (2001) Science 293, 498-506

Wan, X. F., Xu, D., Kleinhofs, A., & Zhou, J. (2004) BMC Evol. Biol. 28, 19-30

Zhang, C. T., & Chou, K. C. (1994) J. Mol. Biol. 238, 1-8

Table 1. Codon usage table

             Fraction per AA        Fraction per 1000       Total number
Codon  AA    CAI10  CAI50  CAIall  CAI10   CAI50   CAIall  CAI10  CAI50  CAIall
GCA    Ala   0.291  0.312  0.352   25.581  25.144  18.303  110    510    10551
GCC          0.106  0.078  0.108   9.302   6.311   5.598   40     128    3227
GCG          0.029  0.024  0.101   2.558   1.972   5.232   11     40     3016
GCT          0.574  0.585  0.440   50.465  47.133  22.846  217    956    13170
AGA    Arg   0.214  0.229  0.341   9.302   10.403  13.987  40     211    8063
AGG          0.011  0.007  0.128   0.465   0.296   5.249   2      6      3026
CGA          0.053  0.029  0.103   2.326   1.331   4.243   10     27     2446
CGC          0.043  0.071  0.089   1.86    3.205   3.671   8      65     2116
CGG          0.011  0.008  0.071   0.465   0.345   2.92    2      7      1683
CGT          0.668  0.657  0.268   29.07   29.779  10.982  125    604    6331
AAC    Asn   0.615  0.539  0.311   29.302  25.588  15.404  126    519    8880
AAT          0.385  0.461  0.689   18.372  21.89   34.05   79     444    19629
GAC    Asp   0.401  0.333  0.229   28.837  24.405  9.317   124    495    5371
GAT          0.599  0.667  0.771   43.023  48.809  31.33   185    990    18061
TGC    Cys   0.200  0.179  0.496   0.465   0.592   5.947   2      12     3428
TGT          0.800  0.821  0.504   1.86    2.712   6.032   8      55     3477
CAA    Gln   0.901  0.913  0.691   31.628  29.434  29.627  136    597    17079
CAG          0.099  0.087  0.309   3.488   2.81    13.279  15     57     7655
GAA    Glu   0.930  0.943  0.834   65.349  65.03   35.541  281    1319   20488
GAG          0.070  0.057  0.166   4.884   3.895   7.079   21     79     4081
GGA    Gly   0.090  0.085  0.229   7.209   6.508   10.615  31     132    6119
GGC          0.119  0.143  0.173   9.535   10.994  8.009   41     223    4617
GGG          0.026  0.028  0.102   2.093   2.169   4.722   9      44     2722
GGT          0.765  0.743  0.496   61.163  56.994  22.976  263    1156   13245
CAC    His   0.667  0.610  0.378   12.093  10.649  8.36    52     216    4819
CAT          0.333  0.390  0.622   6.047   6.804   13.73   26     138    7915
ATA    Ile   0.032  0.025  0.265   2.093   1.726   22.263  9      35     12834
ATC          0.389  0.315  0.220   25.349  21.644  18.466  109    439    10645
ATT          0.579  0.660  0.516   37.674  45.407  43.39   162    921    25013
CTA    Leu   0.020  0.027  0.129   1.628   2.169   16.056  7      44     9256
CTC          0.037  0.023  0.043   3.023   1.873   5.383   13     38     3103
CTG          0.014  0.011  0.107   1.163   0.937   13.26   5      19     7644
CTT          0.301  0.315  0.146   24.884  25.736  18.169  107    522    10474
TTA          0.361  0.362  0.332   29.767  29.532  41.241  128    599    23774
TTG          0.268  0.262  0.242   22.093  21.348  30.106  95     433    17355
AAA    Lys   0.311  0.318  0.559   21.163  23.468  46.568  91     476    26845
AAG          0.689  0.682  0.441   46.977  50.288  36.722  202    1020   21169
ATG    Met   1.000  1.000  1.000   20.93   26.426  34.24   90     536    19738
TTC    Phe   0.400  0.392  0.311   13.023  13.657  13.777  56     277    7942
TTT          0.600  0.608  0.689   19.535  21.151  30.538  84     429    17604
CCA    Pro   0.679  0.634  0.493   25.116  24.06   14.219  108    488    8197
CCC          0.038  0.014  0.065   1.395   0.542   1.863   6      11     1074
CCG          0.038  0.036  0.144   1.395   1.38    4.163   6      28     2400
CCT          0.245  0.316  0.298   9.07    11.98   8.597   39     243    4956
AGC    Ser   0.093  0.071  0.133   4.884   3.796   7.423   21     77     4279
AGT          0.182  0.189  0.221   9.535   10.058  12.318  41     204    7101
TCA          0.529  0.544  0.299   27.674  28.94   16.625  119    587    9584
TCC          0.036  0.021  0.061   1.86    1.134   3.379   8      23     1948
TCG          0.009  0.011  0.087   0.465   0.592   4.843   2      12     2792
TCT          0.151  0.163  0.198   7.907   8.677   11.038  34     176    6363
ACA    Thr   0.098  0.111  0.271   6.512   6.409   13.354  28     130    7698
ACC          0.112  0.096  0.125   7.442   5.571   6.136   32     113    3537
ACG          0.010  0.023  0.129   0.698   1.331   6.375   3      27     3675
ACT          0.780  0.770  0.475   51.86   44.569  23.412  223    904    13496
TGG    Trp   1.000  1.000  1.000   3.721   7.247   12.547  16     147    7233
TAC    Tyr   0.566  0.578  0.340   20      18.981  11.279  86     385    6502
TAT          0.434  0.422  0.660   15.349  13.854  21.901  66     281    12625
GTA    Val   0.250  0.275  0.313   19.767  22.087  20.775  85     448    11976
GTC          0.035  0.041  0.101   2.791   3.303   6.722   12     67     3875
GTG          0.029  0.037  0.193   2.326   2.958   12.797  10     60     7377
GTT          0.685  0.647  0.393   54.186  51.965  26.062  233    1054   15024
TAA    Stop  0.000  0.000  0.422   0.000   0.000   14.733  0      0      8493
TAG          0.000  0.000  0.322   0.000   0.000   11.239  0      0      6479
TGA          0.000  0.000  0.257   0.000   0.000   8.972   0      0      5172
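To make the relationship between the columns of Table 1 explicit, the sketch below recomputes the "Fraction per AA" values for alanine in the CAI10 set from the corresponding "Total number" counts. It assumes that the fraction is simply the codon count divided by the summed count of all synonymous codons, which agrees with the tabulated values to three decimals; the function name is hypothetical.

```python
# Alanine codon counts, "Total number" CAI10 column of Table 1.
ala_counts_cai10 = {"GCA": 110, "GCC": 40, "GCG": 11, "GCT": 217}


def fraction_per_aa(counts):
    """Share of each synonymous codon among all codons for one amino acid."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


ala_fractions = fraction_per_aa(ala_counts_cai10)
for codon, fraction in ala_fractions.items():
    print(f"{codon}: {fraction:.3f}")
```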

Table 2. Correlation analyses

        GCall     GC1       GC2       GC3       CAI10     CAI50     CAIall    RBSall    Start     Size
rP*     0.427     0.419     0.280     0.090     0.268     0.262     0.054     -0.171    -0.080    0.240
        <0.0001   <0.0001   <0.0001   0.0001    <0.0001   <0.0001   0.0220    <0.0001   0.0006    <0.0001
rS*     0.411     0.410     0.283     0.070     0.144     0.152     0.042     -0.144    -0.062    0.401
        <0.0001   <0.0001   <0.0001   0.0030    <0.0001   <0.0001   0.077     <0.0001   0.008     <0.0001
rK*     0.289     0.285     0.194     0.047     0.097     0.103     0.026     -0.098    -0.043    0.276
        <0.0001   <0.0001   <0.0001   0.0025    <0.0001   <0.0001   0.1042    <0.0001   0.0064    <0.0001
Pval    4.0E-81   6.8E-78   5.5E-34   1.14E-4   4.3E-31   6.1E-30   2.2E-2    2.1E-13   6.0E-4    3.6E-25
SSR     3,124     3,149     3,520     3,788     3,546     3,556     3,808     3,707     3,795     3,599

* The first number indicates the correlation coefficient, and the second number indicates the statistical significance
rP Pearson correlation coefficient
rS Spearman correlation coefficient
rK Kendall correlation coefficient
P-value significance of the linear regression statistic
SSR Sum of Square Residuals from the linear regression

Table 3. Analysis of variance between chromosomal locations

       genes  LSM         GCall       GC1         GC2         GC3         CAI10       CAI50       CAIall      RBSall      Start       Size
LeOT*  813    0.2665 A    0.3495 B    0.4688 A    0.3356 AB   0.2439 B    0.5762 A    0.5850 A    0.7728 A    -6.3774 A   556132 B    941.28 AB
LeTO   183    -0.3239 C   0.3559 A    0.4548 BC   0.3417 A    0.2708 A    0.5662 A    0.5718 B    0.7425 B    -6.4574 A   1572284 A   929.21 AB
LaOT   229    -0.4418 C   0.3487 B    0.4455 C    0.3295 B    0.2696 A    0.4383 B    0.4292 C    0.6444 C    -6.2983 A   538891 B    837.73 B
LaTO   588    0.0075 B    0.3488 B    0.4636 AB   0.3300 B    0.2513 B    0.4434 B    0.4304 C    0.6305 D    -6.2580 A   1552695 A   1021.17 A

* The first number indicates the mean, and the second number indicates the statistical significance. Means with the same letter are not significantly different.
LeOT leading strand, from the origin to the terminus
LeTO leading strand, from the terminus to the origin
LaOT lagging strand, from the origin to the terminus
LaTO lagging strand, from the terminus to the origin

Table 4. Correlation analyses, by chromosomal location

Parameter   LeOT*               LeTO                LaOT                LaTO
genes       813                 183                 229                 588
GCall        0.4921  <.0001      0.3691  <.0001      0.4729  <.0001      0.3710  <.0001
GC1          0.4658  <.0001      0.2917  <.0001      0.3920  <.0001      0.3412  <.0001
GC2          0.3167  <.0001      0.2600  <.0001      0.3374  <.0001      0.2240  <.0001
GC3          0.0876  0.0124      0.1601  0.0304      0.1934  0.0033      0.1551  0.0002
CAI10        0.5176  <.0001      0.1509  0.0415     -0.1523  0.0211     -0.2205  <.0001
CAI50        0.5118  <.0001      0.1255  0.0905     -0.1131  0.0876     -0.1793  <.0001
CAIall       0.1500  <.0001     -0.1053  0.1560     -0.3368  <.0001     -0.3122  <.0001
RBSall      -0.2631  <.0001      0.0479  0.5198     -0.1322  0.0457     -0.0972  0.0184
Start       -0.0490  0.1629     -0.01831 0.8056     -0.0872  0.1886     -0.0681  0.0992
Size         0.2505  <.0001      0.2604  0.0004      0.44666 <.0001      0.1989  <.0001

* The first number indicates the Pearson correlation coefficient, the second number indicates the statistical significance.
LeOT: leading strand, from the origin to the terminus
LeTO: leading strand, from the terminus to the origin
LaOT: lagging strand, from the origin to the terminus
LaTO: lagging strand, from the terminus to the origin
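The link between a correlation coefficient, the number of genes and the reported significance can be illustrated with a short worked example. The sketch assumes that the significance values correspond to the usual two-sided test of zero correlation and that the gene count of each location is the sample size. The P-value is obtained from a normal approximation rather than the exact t distribution, so it is only indicative.

```python
import math
from statistics import NormalDist

def correlation_t(r, n):
    """t statistic for the null hypothesis of zero correlation (n - 2 degrees of freedom)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

def approx_two_sided_p(t):
    """Normal approximation to the two-sided P-value; reasonable only for large n."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

# CAI50 on LeTO in Table 4: r = 0.1255 over 183 genes.
t_stat = correlation_t(0.1255, 183)
p_value = approx_two_sided_p(t_stat)
print(round(t_stat, 2), round(p_value, 3))
```

For this entry the t statistic is about 1.70 and the approximate P-value is close to 0.09, in line with the value of 0.0905 reported in the table.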

[Figure 1 panels: histograms of the number of genes (Y axis) versus gene expression level (array LSM), gene size (nt), %GC (series GC1, GC2, GC3 and GCall) and CAI (series CAI10, CAI50 and CAIall).]

Figure 1. Gene distribution over select parameters. The gene distribution is shown over gene expression level (top left), gene size (top right), %GC (bottom left) and codon adaptation index (bottom right). For gene expression levels, the distribution is plotted as a factor of the transcription level determined by microarray experiments, namely the LSM (least square means), representing median gene expression level. For gene size, the distribution is plotted as a factor of the size of the gene, in nucleotides. For %GC, the distribution is plotted for each gene, globally, and for each codon position, namely GC1, GC2 and GC3 for the first, second and third position, respectively. For codon adaptation index, the distribution is plotted for all three training sets, namely CAI10 (using the 10 most highly expressed genes as training set), CAI50 (using the 50 most highly expressed genes as training set), and CAIall (using all the genes as training set).
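The GC parameters named in this caption can be computed per gene as in the following minimal sketch. The function names and the toy sequence are hypothetical, and the sketch assumes an in-frame coding sequence; whether the stop codon was included in the original calculation is not stated here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq.upper() if base in "GC") / len(seq)

def gc_by_codon_position(cds):
    """Return GCall, GC1, GC2 and GC3 for an in-frame coding sequence."""
    cds = cds.upper()
    cds = cds[:len(cds) - len(cds) % 3]  # ignore a trailing partial codon
    return {
        "GCall": gc_fraction(cds),
        "GC1": gc_fraction(cds[0::3]),  # first codon positions
        "GC2": gc_fraction(cds[1::3]),  # second codon positions
        "GC3": gc_fraction(cds[2::3]),  # third codon positions
    }

# Hypothetical five-codon sequence: ATG GCT AAA GGT TAA.
toy_cds = "ATGGCTAAAGGTTAA"
gc_values = gc_by_codon_position(toy_cds)
print(gc_values)
```

For the toy sequence, two of the five first positions and two of the five second positions are G or C, but only one of the five third positions, which mirrors the low GC3 values expected in a low-GC genome.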

[Figure 2 diagram labels: Origin, Terminus; leading strand (LeOT, LeTO); lagging strand (LaOT, LaTO).]

Figure 2. Chromosomal locations. Each strand is represented individually. The leading strand is colored in blue, while the lagging strand is colored in red. For genes on the leading strand (Le), genes from the origin to the terminus (OT) are solid blue (LeOT), whereas genes from the terminus to the origin (TO) are dashed blue (LeTO). For genes on the lagging strand (La), genes from the origin to the terminus (OT) (relative to the leading strand) are dashed red (LaOT), whereas genes from the terminus to the origin (TO) are solid red (LaTO).
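The four location classes can be assigned programmatically, as in the sketch below. It rests on several assumptions that are not given in this section: the origin is taken to be at coordinate 0, the terminus coordinate is supplied by the caller, and the strand drawn as "leading" is taken to be the "+" strand of the annotation. The function name and the coordinates are hypothetical.

```python
def chromosomal_location(start, strand, terminus):
    """Classify a gene as LeOT, LeTO, LaOT or LaTO.

    Assumptions (hypothetical, for illustration only):
    - the origin is at coordinate 0 and `terminus` is the terminus coordinate;
    - strand '+' is the strand labelled leading (Le), '-' the one labelled lagging (La).
    """
    strand_label = "Le" if strand == "+" else "La"
    # Genes starting before the terminus lie on the origin-to-terminus segment.
    segment = "OT" if start < terminus else "TO"
    return strand_label + segment

# Hypothetical terminus position and gene coordinates.
TERMINUS = 1_000_000
print(chromosomal_location(556_132, "+", TERMINUS))
print(chromosomal_location(1_552_695, "-", TERMINUS))
```

With these assumptions, the mean start coordinates in Table 3 fall on the expected side of the terminus: roughly 0.5 Mb for the OT classes and 1.5 Mb for the TO classes.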

Figure 3. Correlations between gene expression level and intrinsic gene parameters.

[Figure 3 panels: scatter plots of LSM (Y axis) versus start, CAI10, CAI50, CAIall, size and RBSall (X axis).]

Figure 3 (continued). Correlations between gene expression level and intrinsic gene parameters. All plots investigate the relationship between an intrinsic parameter (X axis) and gene expression level (Y axis). The microarray median LSM represents the gene expression level. For intrinsic parameters, the position of the first translated nucleotide is used as the “start”; the gene length in nucleotides is used as the “size”; CAI10, CAI50 and CAIall are the codon adaptation index values calculated for each training set; GC1, GC2 and GC3 are the GC contents of the first, second and third codon positions, respectively, while GCall represents the global GC content of a gene; RBSall represents the free energy level of the putative Shine-Dalgarno sequence found upstream of the translational start.
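The dependence of the codon adaptation index on its training set can be illustrated with a minimal sketch. It follows the common definition of the index as the geometric mean of relative codon adaptiveness values derived from a set of reference genes. Whether this matches the exact procedure used for CAI10, CAI50 and CAIall is an assumption, and codons absent from the training set are simply skipped here, which is a simplification. All names and sequences are hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
SKIPPED = {"M", "W", "*"}  # single-codon amino acids and stops carry no usage information

def codons(cds):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def relative_adaptiveness(training_genes):
    """w(codon) = codon count / count of the most used synonymous codon in the training set."""
    counts = Counter(c for gene in training_genes for c in codons(gene) if c in CODON_TABLE)
    weights = {}
    for codon, aa in CODON_TABLE.items():
        if aa in SKIPPED:
            continue
        best = max(counts[c] for c, a in CODON_TABLE.items() if a == aa)
        if best > 0 and counts[codon] > 0:
            weights[codon] = counts[codon] / best
    return weights

def cai(cds, weights):
    """Geometric mean of w over the scored codons of one gene."""
    logs = [math.log(weights[c]) for c in codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")

# Hypothetical training set: GCT is preferred over GCA, and AAA over AAG.
training = ["GCTGCTGCTGCA", "AAAAAAAAG"]
weights = relative_adaptiveness(training)
print(cai("GCTAAA", weights), cai("GCAAAG", weights))
```

A gene built only from the preferred codons of the training set scores 1.0, whereas a gene built from the less frequent synonymous codons scores lower. Changing the training set, for example from the 10 most highly expressed genes to all genes, changes the weights and therefore the index.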

[Figure 3 (continued) panels: scatter plots of LSM (Y axis) versus GC1, GC2, GC3 and GCall (X axis).]

Figure 4. Analysis of variance by chromosomal location. For each chromosomal location, namely LeOT (Leading strand, between the origin and the terminus), LeTO (Leading strand, between the terminus and the origin), LaOT (Lagging strand, between the origin and the terminus, relative to the leading strand) and LaTO (Lagging strand, between the terminus and the origin, relative to the leading strand), the mean values for gene expression level, GC-content at the third codon position (%GC3), and codon adaptation index, as determined by the first training set (CAI10), are plotted. Within each plot, data points with the same letter are not significantly different.

[Figure 4 panels: mean expression level, %GC3 and CAI10 (Y axes) for the leading and lagging strands, before and after the terminus (X axis). Point labels, in the order expression level, %GC3, CAI10: LeOT A, B, A; LeTO C, A, A; LaOT C, A, B; LaTO B, B, B.]

[Figure 5 panels: scatter plots of array LSM (Y axis) versus gene size and versus GCall (X axis), one panel each for LeOT, LeTO, LaOT and LaTO.]

Figure 5. Correlations between gene expression level and intrinsic gene parameters, by chromosomal location.

[Figure 5 (continued) panels: scatter plots of array LSM (Y axis) versus CAI10 (X axis), one panel each for LeOT, LeTO, LaOT and LaTO.]

Figure 5 (continued). Correlations between gene expression level and intrinsic gene parameters, by chromosomal location. The relationships between gene expression level (array LSM) and three intrinsic parameters (gene size, GCall and CAI10) are plotted for each location, namely LeOT, LeTO, LaOT and LaTO, as specified previously.

[Figure 6 panels: histograms of the number of genes (Y axis) versus GC3, gene size (bp), CAIall, CAI10, %GCall and gene expression level (LSM), each with one series per location (LaTO, LeTO, LeOT and LaOT).]

Figure 6. Gene distribution over select parameters, by chromosomal location. The gene distribution over select parameters, namely gene expression level, %GCall, gene size, CAI10, CAIall and GC3 is plotted, for each chromosomal location, namely LeOT, LeTO, LaOT, LaTO, as specified previously.

APPENDIX I – Functional and comparative genomic analyses of an operon involved in fructooligosaccharides utilization by Lactobacillus acidophilus
