Analysis of Horizontal Gene Transfers of Potential Relevance to Microbial Virulence
Fiona Brinkman Simon Fraser University,
Greater Vancouver, British Columbia, Canada
Overview
1. High pathogen-host protein similarities: detecting horizontal gene transfer
2. Characteristics of proteins/genes putatively horizontally acquired by bacterial pathogens
3. Implications
4. Proposal: How we should be combating bacterial pathogens
Approach
Idea: Could we identify novel virulence factors by identifying bacterial pathogen proteins more similar to host proteins than you would expect?
1. Primary sequence similarity approach – identifies possible horizontal gene transfer
(2. Structural similarity approach)
• For each complete bacterial and eukaryote genome: BLASTP (and MSP Crunch) analysis of all deduced proteins, searched against non-redundant SWALL database
• Overlay NCBI taxonomy information
• Query database for bacterial proteins whose top BLASTP scoring “hit” is eukaryotic (and eukaryotic proteins whose top hit is bacterial)
• Initial Assumption: Three Domains of life (Bacteria, Eukarya, and Archaea) are so divergent that top hits to another Domain are rare
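The query step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hit` record, its field names, and the toy scores are hypothetical, and real input would come from parsed BLASTP output joined with NCBI taxonomy.

```python
from collections import namedtuple

# Hypothetical record for one BLASTP hit; field names are illustrative only.
Hit = namedtuple("Hit", ["query", "query_domain", "subject", "subject_domain", "bitscore"])

def cross_domain_top_hits(hits):
    """Return {query: top Hit} for queries whose best-scoring hit is in another domain."""
    best = {}
    for h in hits:
        if h.subject == h.query:
            continue  # ignore self-hits
        if h.query not in best or h.bitscore > best[h.query].bitscore:
            best[h.query] = h
    return {q: h for q, h in best.items() if h.subject_domain != h.query_domain}

# Toy data (invented): bactA's best hit is eukaryotic, bactB's best hit is bacterial.
hits = [
    Hit("bactA", "Bacteria", "eukX", "Eukaryota", 410.0),
    Hit("bactA", "Bacteria", "bactQ", "Bacteria", 220.0),
    Hit("bactB", "Bacteria", "bactR", "Bacteria", 500.0),
    Hit("bactB", "Bacteria", "eukY", "Eukaryota", 90.0),
]
unusual = cross_domain_top_hits(hits)
```

Here only `bactA` is reported, because its top-scoring hit lies in a different domain from the query.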
Unusual similarities between Bacteria & Eukaryote genes: Sequence similarity-based approach
• Problem: If a gene transfer occurs from a eukaryote to an ancestor of closely related bacteria, the top hit will be to other bacteria
• Therefore, perform similar query, but filtering different taxonomic groups from the analysis…
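A minimal sketch of the filtered query, assuming each hit carries its taxonomic lineage as a tuple of names. The function name, the generic taxon names, and the scores are made up for illustration and are not taken from the original pipeline.

```python
def top_hit_domain(hits, excluded_taxa=frozenset()):
    """Domain of the best-scoring hit after dropping hits from excluded taxa.

    hits: list of (subject_id, lineage, bitscore), where lineage is a tuple of
    taxon names ordered from domain downward (a hypothetical layout).
    """
    kept = [h for h in hits if not set(excluded_taxa).intersection(h[1])]
    if not kept:
        return None
    best = max(kept, key=lambda h: h[2])
    return best[1][0]  # first lineage entry is the domain

# Toy hits for one bacterial query protein (all values invented).
hits = [
    ("relative_1", ("Bacteria", "PhylumA", "FamilyA"), 900.0),
    ("relative_2", ("Bacteria", "PhylumA", "FamilyA"), 880.0),
    ("euk_1", ("Eukaryota", "PhylumE", "FamilyE"), 700.0),
    ("other_bact", ("Bacteria", "PhylumB", "FamilyB"), 300.0),
]
unfiltered = top_hit_domain(hits)
filtered = top_hit_domain(hits, excluded_taxa={"FamilyA"})
```

Without filtering, the close relatives in `FamilyA` mask the eukaryotic hit; once the query's own family is excluded, the top remaining hit is eukaryotic.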
Unusual similarities between Bacteria & Eukaryote genes: Sequence similarity-based approach
[Diagram: Bacteria 1 and Bacteria 2 are closely related bacteria (same species, family, etc.); both are compared with a Eukaryote]
Problem: Proteins highly conserved in the three domains of life
Top hit to a protein from another domain may occur by chance.
“StepRatio” score helps detect these.
Example: Glucose-6-Phosphate Reductase
BAE-watch: Analysis of Haemophilus influenzae Rd-KW20 proteins for unusual eukaryotic protein similarities
Genome data for…
Anthrax, Cat scratch disease, Chancroid, Chlamydia, Cholera, Dental caries, Diarrhea (E. coli etc.), Diphtheria, Epidemic typhus, Mediterranean fever, Gastroenteritis, Gonorrhea, Legionnaires' disease, Leprosy, Leptospirosis, Listeriosis, Lyme disease, Melioidosis, Meningitis,
Necrotizing fasciitis, Paratyphoid/enteric fever, Peptic ulcers and gastritis, Periodontal disease, Plague, Pneumonia, Salmonellosis, Scarlet fever, Shigellosis, Strep throat, Syphilis, Toxic shock syndrome, Tuberculosis, Tularemia, Typhoid fever, Urethritis, Urinary Tract Infections, Whooping cough,
+ Hospital-acquired infections
Bacterial Pathogens
Chlamydophila psittaci: Respiratory disease, primarily in birds
Mycoplasma mycoides: Contagious bovine pleuropneumonia
Mycoplasma hyopneumoniae: Pneumonia in pigs
Pasteurella haemolytica: Cattle shipping fever
Pasteurella multocida: Cattle septicemia, pig rhinitis
Ralstonia solanacearum: Plant bacterial wilt
Xanthomonas citri: Citrus canker
Xylella fastidiosa: Pierce’s Disease (grapevines)
Bacterial wilt
Trends in this Sequence-based Analysis
• Identifies the strongest cases of lateral gene transfer between bacteria and eukaryotes
• Most common:
Bacteria → Unicellular Eukaryote
Makes sense:
Bacteria to Multicellular eukaryote must involve germline
Eukaryote to Bacteria must not involve introns
Trends in this Sequence-based Analysis
• Identifies nuclear genes with potential organelle origins
• A control: Method identifies all previously reported Chlamydia trachomatis “plant-like” genes.
First case: Bacterium → Eukaryote Lateral Transfer
[Figure: NanA phylogenetic tree (scale bar 0.1) with Bacillus subtilis, Escherichia coli, Salmonella typhimurium, Staphylococcus aureus, Clostridium perfringens, Clostridium difficile, Trichomonas vaginalis, Haemophilus influenzae, Actinobacillus actinomycetemcomitans and Pasteurella multocida]
N-acetylneuraminate lyase (NanA) of the protozoan Trichomonas vaginalis is 92-95% similar to NanA of Pasteurellaceae bacteria.
de Koning et al. (2000) Mol Biol Evol 17:1769-1773
N-acetylneuraminate lyase – role in pathogenicity?
Pasteurellaceae
•Mucosal pathogens of the respiratory tract
T. vaginalis
•Mucosal pathogen, causative agent of the STD trichomoniasis
N-acetylneuraminate lyase (sialic acid lyase, NanA)
Involved in sialic acid metabolism
Role in Bacteria: Proposed to parasitize the mucous membranes of animals for nutritional purposes
Role in Trichomonas: ?
Pathway: Glycoproteins, glycolipids → Sialidase (hydrolysis of glycosidic linkages of terminal sialic residues) → Free sialic acid → Transporter → Free sialic acid → NanA → N-acetyl-D-mannosamine + pyruvate
Another case: A Sensor Histidine Kinase for a Two-component Regulation System
Signal Transduction “In General”
Histidine kinases more common in bacteria
Ser/Thr/Tyr kinases more common in eukaryotes
However, a histidine kinase was recently identified in fungi, including pathogens Fusarium solani and Candida albicans
How did it get there?
[Figure: phylogenetic tree of sensor histidine kinases (scale bar 0.1; bootstrap values shown at nodes).
Fungi: Neurospora crassa NIK-1, Fusarium solani FIK1, Fusarium solani FIK2, Candida albicans CaNIK1.
Bacteria: Streptomyces coelicolor SC4G10.06c, Streptomyces coelicolor SC7C7.03, Escherichia coli RcsC, Erwinia carotovora RpfA / ExpS, Escherichia coli BarA, Salmonella typhimurium BarA, Pseudomonas aeruginosa GacS, Pseudomonas fluorescens GacS / ApdA, Pseudomonas tolaasii RtpA / PheN, Pseudomonas syringae GacS / LemA, Pseudomonas viridiflava RepA, Azotobacter vinelandii GacS, Xanthomonas campestris RpfC, Vibrio cholerae TorS, Escherichia coli TorS, Pseudomonas aeruginosa PhoQ]
Streptomyces Histidine Kinase. The Missing Link?
Virulence Factor ( )in every organism examined to date
Brinkman et al. (2001) Infection and Immunity 69:5207-5211
“Plant-like” genes in Chlamydia
• Proteins: Unusually high number most similar to plant proteins
• Previous proposal: Obtained genes from a plant-like amoebal host?
(A relative of Chlamydiaceae infects Acanthamoeba. Chlamydiaceae: Obligate intracellular pathogens)
• However, the relationship of Acanthamoeba to plants is very controversial
“Plant-like” genes in Chlamydia
NCBI GI    Protein description                           Subcellular localization in plants
4377270    Glycyl tRNA Synthetase                        Chloroplast
4376626    ADP/ATP Translocase                           Chloroplast
4376667    Glycogen Hydrolase                            Chloroplast
4377189    GTP Cyclohydratase & DHBP Synthase            Chloroplast
4377237    Beta-Ketoacyl-ACP Synthase                    Chloroplast
4376686    Enoyl-Acyl-Carrier Reductase                  Chloroplast
4376591    Thioredoxin Reductase                         Chloroplast
4377185    Metal Transport P-type ATPase                 Chloroplast
4377346    Similar to Na+/H+ Antiporter                  Chloroplast
4376650    Phosphate Permease                            Chloroplast
4376637    GcpE protein                                  Chloroplast
4376637    Tyrosyl tRNA Synthetase                       Chloroplast
4377360    Malate Dehydrogenase                          Chloroplast
4376763    GTP Binding protein                           Chloroplast
4376911    ADP/ATP Translocase                           Chloroplast
3329179    Phosphoglycerate Mutase                       Chloroplast
4377281    Glycerol-3-Phosphate Acyltransferase          Chloroplast
4376993    ABC Transporter ATPase                        Chloroplast
4376509    Deoxyoctulonosic Acid Synthetase              Chloroplast
4376872    Sugar Nucleotide Phosphorylase                Chloroplast
6578112    rRNA Methyltransferase                        Chloroplast
3329217    HSP60                                         Chloroplast
3328745    Phosphoribosylanthranilate Isomerase          Chloroplast
6578104    Aspartate Aminotransferase                    Chloroplast
4377328    Polyribonucleotide Nucleotidyltransferase     Chloroplast
4377362    Putative D-Amino Acid Dehydrogenase           Chloroplast
4377331    Cytosine Deaminase                            Chloroplast?
4376915    Lipoate-Protein Ligase A                      Mitochondrial
4377272    Glycogen Synthase                             N/A
4377065    Dihydropteroate Synthase                      N/A
4377239    Inorganic Pyrophosphatase                     N/A
4376904    Uridine 5’-Monophosphate Synthase             N/A
4377173    UDP-Glucose Pyrophosphorylase                 N/A
4376815    GutQ/Kpsf Family Sugar-Phosphate Isomerase    Mitochondrial?
“Plant-like” genes in Chlamydia
• Endosymbiotic theory
• Rickettsia: Many eukaryotic-like genes
• Synechocystis: Many plant-like genes
• Does Chlamydia share an ancient ancestral relationship with the ancestor of the Chloroplast?
Chlamydiaceae share an ancestral relationship with Cyanobacteria and Chloroplast
[Figure: 16S rRNA phylogenetic tree (scale bar 0.1; bootstrap values shown at nodes) with Pyrococcus furiosus (Archaea), Thermotoga maritima, Aquifex pyrophilus, Bacillus subtilis, Chlamydophila pneumoniae, Chlamydophila psittaci, Chlamydia muridarum, Chlamydia trachomatis, Chlamydomonas reinhardtii, Klebsormidium flaccidum, Zea mays, Nicotiana tabacum, Synechococcus PCC6301, Synechocystis PCC6803, Microcystis viridis, Escherichia coli, Zea mays mitochondrion, Rickettsia prowazekii and Caulobacter crescentus; clades labeled Chloroplasts, Cyanobacteria and Chlamydiaceae]
Chlamydiaceae share an ancestral relationship with Cyanobacteria and Chloroplast
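[Figure: ribosomal protein gene clusters (genes labeled L3, L4, L23, L2, S19, L22, S3, L16, L29, S17, L14, L24, L5, S14, S8, L6, L18, S5, L30, L15, S10) compared across Escherichia, Bacillus, Thermotoga, Synechocystis and Chlamydia]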
Unique shared-derived characters unite Chlamydiaceae and Synechocystis
Chlamydiaceae “plant-like” genes reflect an ancestral relationship with Cyanobacteria and the Chloroplast
• Chlamydiaceae do not appear to be exchanging DNA with their hosts
• Existing knowledge of Cyanobacteria may stimulate ideas about the function and control of pathogenic Chlamydia?
Brinkman et al. (2002) Genome Research 12:1159-1167.
Overview
1. High pathogen-host protein similarities: detecting horizontal gene transfer
2. Characteristics of proteins/genes putatively horizontally acquired by bacterial pathogens
3. Implications
4. Proposal: How we should be combating bacterial pathogens
Horizontal Gene Transfer and Bacterial Pathogenicity
Transposons: ST enterotoxin genes in E. coli
Prophages: Shiga-like toxins in EHEC; Diphtheria toxin gene; Cholera toxin; Botulinum toxins
Plasmids: Shigella, Salmonella, Yersinia
Pathogenicity Islands: Uro/Entero-pathogenic E. coli, Salmonella typhimurium, Yersinia spp., Helicobacter pylori, Vibrio cholerae
Pathogenicity Islands
Associated with
– Atypical %G+C
– tRNA sequences
– Transposases, Integrases and other mobility genes
– Flanking repeats
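As a rough illustration of the first feature, the sketch below flags sequences whose %G+C deviates from the mean of all sequences. The one-standard-deviation threshold, the use of the population standard deviation, and the function names are assumptions chosen for the example, not the cutoff used in the work cited on these slides.

```python
from statistics import mean, pstdev

def gc_percent(seq):
    """Percent G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * gc / acgt if acgt else 0.0

def atypical_gc(seqs, n_sd=1.0):
    """Return (index, %G+C, 'high' or 'low') for sequences beyond n_sd SDs of the mean."""
    values = [gc_percent(s) for s in seqs]
    mu, sd = mean(values), pstdev(values)
    flagged = []
    for i, v in enumerate(values):
        if sd > 0 and abs(v - mu) > n_sd * sd:
            flagged.append((i, v, "high" if v > mu else "low"))
    return flagged

# Toy gene sequences (invented): five at 50% G+C and one at 100% G+C.
genes = ["ATGCATGC"] * 5 + ["GGGGCCCC"]
flagged = atypical_gc(genes)
```

Only the last toy gene is flagged, as "high" %G+C.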
IslandPath: Aiding identification of Pathogenicity Islands and other Genomic Islands
Yellow circle = high %G+C
Pink circle = low %G+C
Region of unusual dinucleotide bias
tRNA gene lies between the two dots
rRNA gene lies between the two dots
Both tRNA and rRNA lie between the two dots
Dot is named a transposase
Dot is named an integrase
Hsiao et al. (2003) Bioinformatics 19: 418-420
Dinucleotide bias analysis
Genome divided into “ORF-clusters” of 6 consecutive ORFs
For each ORF cluster, the average absolute dinucleotide relative abundance difference is
\delta^*(f, g) = \frac{1}{16} \sum_{XY} \left| \rho^*_{XY}(f) - \rho^*_{XY}(g) \right|
where f (fragment) is derived from sequences in an ORF-cluster and g (genome) is derived from all predicted ORFs in the genome
Dinucleotide relative abundance is \rho^*_{XY} = f^*_{XY} / (f^*_X f^*_Y)
where f^*_X denotes the frequency of the mononucleotide X and f^*_{XY} the frequency of the dinucleotide XY
See Hsiao et al. (2003) Bioinformatics 19:418-420
and Karlin, S. and Burge, C. (1995) Trends in Genetics 11:283-290 for review
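A minimal sketch of the dinucleotide relative abundance difference described above. It assumes the starred frequencies are computed from the sequence combined with its reverse complement; function names and the zero-frequency handling are illustrative choices, not the IslandPath implementation.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def symmetrized_counts(seq):
    """Mono- and dinucleotide counts over a sequence and its reverse complement."""
    seq = seq.upper()
    rc = seq.translate(COMPLEMENT)[::-1]
    mono = {b: 0 for b in BASES}
    di = {a + b: 0 for a, b in product(BASES, repeat=2)}
    for strand in (seq, rc):
        for base in strand:
            if base in mono:
                mono[base] += 1
        for i in range(len(strand) - 1):
            pair = strand[i:i + 2]
            if pair in di:  # skips pairs containing ambiguous bases
                di[pair] += 1
    return mono, di

def rho_star(seq):
    """Relative abundance f*_XY / (f*_X f*_Y) for all 16 dinucleotides."""
    mono, di = symmetrized_counts(seq)
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for pair, count in di.items():
        if n_mono == 0 or n_di == 0:
            rho[pair] = 0.0
            continue
        fx, fy = mono[pair[0]] / n_mono, mono[pair[1]] / n_mono
        # Assumption: a dinucleotide whose bases never occur gets abundance 0.
        rho[pair] = (count / n_di) / (fx * fy) if fx and fy else 0.0
    return rho

def delta_star(fragment, genome):
    """Average absolute difference of the 16 relative abundances."""
    rf, rg = rho_star(fragment), rho_star(genome)
    return sum(abs(rf[p] - rg[p]) for p in rf) / 16

# Toy sequences (invented), far shorter than a real ORF-cluster.
same = delta_star("ATGGCGTACGATTAGC", "ATGGCGTACGATTAGC")
different = delta_star("GCGCGCGCGC", "ATGGCGTACGATTAGC")
```

In practice `fragment` would be the concatenated ORFs of one cluster and `genome` all predicted ORFs of the genome; the toy strings here only exercise the arithmetic.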
Dinucleotide bias analysis
ORF-clusters sampled in an overlapping manner (shift by one ORF at a time)
The mean is calculated by averaging the results from all ORF-clusters in the genome
Regions more than 1 standard deviation away from the mean are marked on the IslandPath graphical display with strikethrough lines
Why did we use 6 ORFs per cluster?
- Not enough bp in a single ORF to get a good estimate
- 4.5kb (corresponding to approximately 6-8 ORFs) is required for “reliable estimation of nucleotide composition” (Lawrence and Ochman, J Mol Evolution 1997 44:383-97)
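The overlapping sampling and the one-standard-deviation rule can be sketched as follows. The per-cluster scores are assumed to be precomputed, and the use of the population standard deviation and of deviation in either direction are assumptions for illustration.

```python
from statistics import mean, pstdev

def orf_clusters(orfs, size=6):
    """Overlapping clusters of `size` consecutive ORFs, shifting by one ORF."""
    return [orfs[i:i + size] for i in range(len(orfs) - size + 1)]

def flag_clusters(scores, n_sd=1.0):
    """Indices of clusters whose score lies more than n_sd SDs from the mean."""
    mu, sd = mean(scores), pstdev(scores)
    return [i for i, s in enumerate(scores) if abs(s - mu) > n_sd * sd]

# Toy example: 8 ORFs give 3 overlapping clusters of 6.
clusters = orf_clusters(list("ABCDEFGH"))

# Toy per-cluster bias scores (invented); only the last one stands out.
marked = flag_clusters([0.05, 0.05, 0.05, 0.05, 0.30])
```

Because consecutive clusters share five of their six ORFs, a region of unusual bias shows up as a run of flagged clusters rather than a single one.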
[Figure: plot of δ*(f, g) along the genome, with regions labeled 1, 7, 11, 20, 22, 32, 33, 34, 35, 36 and I to X]
Boxes: Known islands in the Salmonella typhi genome
What features best predict Islands?
Examined prevalence of features in over 200 known islands
• 94% of islands have >25% dinucleotide bias coverage (the majority have >75% dinucleotide bias coverage)
• Mobility genes identified in >75% (but identification has recently improved)
• Atypical %G+C (above the cutoff used in Brinkman et al., 2002) covers no more than 50% of an island on average, and tRNA genes are observed with no more than 50% of known islands
[Figure: the same genome plot, with numbered regions and regions I to X marked]
Boxes: “Insertions” in the Salmonella typhi genome versus Salmonella typhimurium
Properties of genes in these islands?
Defined a “putative island” as
– 8 or more genes in a row with dinucleotide bias
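This definition amounts to finding runs of consecutive flagged genes. A minimal sketch, assuming a per-gene boolean list that says whether each gene lies in a region of dinucleotide bias (the function and parameter names are hypothetical):

```python
def putative_islands(biased, min_run=8):
    """Return (start, end) index pairs for runs of >= min_run consecutive True flags."""
    islands, start = [], None
    for i, flag in enumerate(biased):
        if flag and start is None:
            start = i                      # a run begins
        elif not flag and start is not None:
            if i - start >= min_run:
                islands.append((start, i - 1))
            start = None                   # the run ends
    if start is not None and len(biased) - start >= min_run:
        islands.append((start, len(biased) - 1))  # run reaching the last gene
    return islands

# Toy flags (invented): runs of 8, 5 and 9 biased genes.
flags = [False] * 3 + [True] * 8 + [False] * 2 + [True] * 5 + [False] + [True] * 9
islands = putative_islands(flags)
```

The run of 5 is discarded, while the runs of 8 and 9 genes are reported as putative islands.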
Functional category analysis: any difference for genes in islands versus the genome?
[Figure: bar chart of % hypothetical proteins (y-axis, 0 to 80) per organism (x-axis), Genome vs. Put. Islands. Organisms: Bacillus subtilis 168, Borrelia burgdorferi B31, Buchnera sp. APS, Chlamydia trachomatis D, Clostridium acetobutylicum ATCC 824, Escherichia coli K12, Escherichia coli O157_EDL933, Haemophilus influenzae Rd-KW20, Helicobacter pylori 26695, Listeria innocua Clip11262, Mycobacterium leprae, Mycobacterium tuberculosis CDC1551, Mycoplasma pneumoniae M129, Neisseria meningitidis serogroup B strain MC58, Pseudomonas aeruginosa PAO1, Rickettsia prowazekii Madrid E, Salmonella typhimurium LT2, Staphylococcus aureus subsp. aureus N315, Streptococcus pneumoniae TIGR4, Sulfolobus solfataricus, Vibrio cholerae chromosome I, Vibrio cholerae chromosome II, Yersinia pestis CO92]
Hypothetical genes are more common in putative islands vs the genome
(Paired t-test, P = 6.8E-19)
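For readers unfamiliar with the paired test used in these comparisons, the sketch below computes the paired t statistic from per-organism value pairs. The percentages are invented toy numbers, not the data behind the slides, and no P value is computed here; a library routine such as scipy.stats.ttest_rel could be used for that.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic and degrees of freedom for two equal-length samples."""
    if len(first) != len(second):
        raise ValueError("samples must be paired")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    # t = mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Toy % hypothetical proteins for four organisms (invented values).
genome_pct = [30.0, 35.0, 40.0, 25.0]
island_pct = [45.0, 50.0, 52.0, 41.0]
t_stat, dof = paired_t(genome_pct, island_pct)
```

Pairing by organism is what makes the test sensitive here: each genome serves as its own control, so differences in overall annotation quality between organisms cancel out.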
Analysis 1: COG functional category analysis
Analysis 2: SUPERFAMILY HMM search results
SUPERFAMILY: a set of HMMs built from SCOP superfamilies
Fewer ORFs in the putative islands were assigned to a SUPERFAMILY class
Paired t-test, P = 3.3E-14
[Figure: bar chart of % ORFs assigned to a SUPERFAMILY class (y-axis, 0 to 80), Genome vs. Put. Islands, for B. halodurans, B. burgdorferi, C. trachomatis, C. acetobutylicum, E. coli K12, H. influenzae, L. monocytogenes, M. leprae, M. tuberculosis, M. genitalium, N. meningitidis B, P. aeruginosa, R. prowazekii, S. typhimurium, S. aureus, V. cholerae ch I, V. cholerae ch II and Y. pestis]
Analysis 3: Gene size in Putative Islands vs. “Non-Islands”
ORFans (genes with no homologs among 60 microbial genomes) tend to be shorter genes
Are genes in putative islands shorter as well on average?
In most cases, average ORF length in putative islands is shorter
Paired t-test, P = 7.1E-34
[Figure: bar chart of mean gene length (bp) (y-axis, 0 to 400), Non Island vs. Put. Islands, for B. subtilis, B. burgdorferi, B. melitensis ch I, B. melitensis ch II, Buchnera, C. trachomatis D, D. radiodurans ch 1, D. radiodurans ch 2, E. coli K12, E. coli O157, H. influenzae, L. monocytogenes, M. leprae, M. tuberculosis CDC1551, M. tuberculosis H37Rv, N. meningitidis A, N. meningitidis B, P. aeruginosa, R. prowazekii, S. typhimurium, S. flexneri, S. aureus, S. pneumoniae, V. cholerae ch I, V. cholerae ch II and Y. pestis]
Analysis 4: COG analysis after removing ORFs <300 bp
Genes may be less well predicted in such island/atypical dinucleotide bias regions
Some genomes still show a marked increase in % hypothetical genes in islands versus the genome
Hypothetical genes more common in islands?
Paired t-test, P = 0.0016
[Figure: bar chart of % hypothetical proteins (y-axis, 0 to 80), Genome vs. Islands]
Summary: Bacteria gene transfer analysis
• No cases identified in our database to date of clear, recent horizontal gene transfer between bacteria and a multicellular eukaryote (involving >80% sequence similarity)
The pathogens studied are not commonly acquiring genes from their hosts, or vice versa
• Bacterial and eukaryotic pathogens may have exchanged genes
• Overall increased prevalence of hypothetical genes in putative bacterial genomic islands? Cautionary note about gene prediction accuracy
Overview
1. High pathogen-host protein similarities: detecting horizontal gene transfer
2. Characteristics of proteins/genes putatively horizontally acquired by bacterial pathogens
3. Implications
4. Proposal: How we should be combating bacterial pathogens
Implications: Evolution of Pathogenicity
Pathogen mimicry of their host: Convergent evolution or genes selectively maintained
Gene exchange between pathogens: “Arms Deals”
Pathogens and “The Art of War”
“What is of supreme importance in war is to attack the enemy's strategy. Next best is to disrupt his alliances by diplomacy. The next best is to attack his army. And the worst policy is to attack cities.”
FPMI: Functional Pathogenomics of Mucosal Immunity (www.pathogenomics.ca)
INDUSTRY: Anigenics Canada, Inimex Pharma Inc
ACADEMIA: VIDO, U Sask, UBC, SFU, BC GSC
GOVERNMENT: Genome Canada, Genome Prairie, Genome BC, Govt of Saskatchewan
• BC Pathogenomics group: Ann M. Rose, Yossef Av-Gay, David L. Baillie, Fiona S. L. Brinkman, Robert Brunham, Artem Cherkasov, Rachel C. Fernandez, B. Brett Finlay, Hans Greberg, Robert E.W. Hancock, Steven J. Jones, Patrick Keeling, Audrey de Koning, Don G. Moerman, Sarah P. Otto, B. Francis Ouellette, Nancy Price, William Hsiao.
• Jeff Blanchard (NCGR, New Mexico) and Olof Emanuelsson (Stockholm Bioinformatics Center)
• Peter Wall Institute for Advanced Studies, Genome Canada