Bioinformatics: Computational methods to discover ncRNA in bacteria
Bioinformatics and Systems Biology Groupwww.sbi.informatik.uni-rostock.de
Ulf Schmitz, Computational methods to discover ncRNA 2
Outline
1. Problem description
2. Streptococcus pyogenes
3. The RNome, transcriptome
4. Characteristics of bacterial ncRNA
5. Approaches to find fRNA
6. Conclusion / Outlook
Streptococcus pyogenes
• important human pathogen (group A streptococcus or GAS)
• causes the following diseases:
– pyoderma (111 million cases/year)
– pharyngitis (616 million cases/year and 517,000 deaths/year)
[Images: pyoderma (source: DermNet NZ); pharyngitis (source: UCSD)]
• completely adapted to humans as its only natural host
• causes purulent infections of the skin and mucous membranes and, rarely, life-threatening systemic diseases
Streptococcus pyogenes
varies in multiplication rate -> associated with the type of infection
to understand the regulation, researchers studied the growth-phase regulatory factors and gene expression in response to specific environmental differences within the host
a novel growth-phase-associated two-component-type regulator was identified: the fasBCA operon, present in all 12 tested M serotypes
it contains two potential HPK genes (fasB, fasC) and one RR gene (fasA)
it shows its maximum expression and activity at the transition phase and potentially supports the aggressive spreading of the bacteria in the host
HPK = histidine protein kinase
RR = response regulator
Streptococcus pyogenes
• downstream of the fas operon, a ~300-nucleotide transcript (fasX) was identified
• it does not encode a peptide/protein
– but is also growth-phase related
– main effector molecule of the fas regulon
• ncRNA or fRNA
ncRNA
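[Figure: map of the fas locus with a 1 kb scale bar, showing the genes gltX-L, fasB, fasC, fasA, fasX and rnpA-L, the promoters pfas, pfasX and prnpA, and a terminator (tttt)]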
RNome or transcriptome
[Diagram: classification of RNA]
RNA
– mRNA
– ncRNA / fRNA
   – structural RNA: tRNA, rRNA
   – snmRNA / sRNA: miRNA, siRNA, snRNA, snoRNA, stRNA
     putative gene expression regulators (protein-interaction and housekeeping ncRNAs were also found)
RNome or transcriptome
types of RNA:
mRNA     messenger RNA      transcript of a protein-coding gene
rRNA     ribosomal RNA      forms large parts of the ribosome, the protein-producing machinery
tRNA     transfer RNA       also involved in protein production, carrying single amino acids to the growing amino acid chain of a protein
ncRNA    non-coding RNA     found in intergenic regions, playing miscellaneous roles

Non-coding RNA (ncRNA) genes produce functional RNA molecules rather than encoding proteins, and here are the nominees:
fRNA     Functional RNA          essentially synonymous with non-coding RNA
miRNA    MicroRNA                21-24 nucleotide RNAs probably acting as translational regulators
siRNA    Small interfering RNA   active molecules in RNA interference
snRNA    Small nuclear RNA       includes spliceosomal RNAs
snmRNA   Small non-mRNA          essentially synonymous with small ncRNAs
snoRNA   Small nucleolar RNA     most known snoRNAs are involved in rRNA modification
stRNA    Small temporal RNA      for example, lin-4 and let-7 in Caenorhabditis elegans
Functions of ncRNA
• ncRNAs target mRNAs via imperfect sequence complementarity
• binding may result in:
– blockage of ribosome entry (translation repression)
– melting of inhibitory secondary structures (translation activation)
[Figure labels: loop-loop kissing complex; dissolving the fold-back structure]
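To make "imperfect sequence complementarity" concrete, here is a minimal sketch that slides an ncRNA along an mRNA and counts antiparallel base pairs, allowing G-U wobble pairs. It is not taken from the talk: the function names and both sequences are made up for illustration, and an ungapped scan like this ignores bulges, loops and thermodynamics, which real target prediction would have to consider.

```python
# Watson-Crick pairs plus G-U wobble pairs (RNA alphabet).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_pairs(srna, window):
    """Count base pairs between an sRNA and an equally long mRNA window.

    The two strands pair antiparallel, so the sRNA is read 3'->5'
    (reversed) against the mRNA window read 5'->3'.
    """
    return sum((a, b) in PAIRS for a, b in zip(reversed(srna), window))


def best_site(srna, mrna):
    """Return (pairs, position) of the first best ungapped binding site."""
    srna = srna.upper().replace("T", "U")
    mrna = mrna.upper().replace("T", "U")
    n = len(srna)
    best = (0, -1)  # (-1 means: no pairing window found)
    for i in range(len(mrna) - n + 1):
        score = count_pairs(srna, mrna[i:i + n])
        if score > best[0]:
            best = (score, i)
    return best


# Hypothetical sequences: the site at position 3 pairs at 5 of 6 positions.
print(best_site("CCUCCU", "UUUAGGUGGUUAUG"))  # (5, 3)
```

A site that overlaps the ribosome binding region would correspond to the "blockage of ribosome entry" case listed above.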
Streptococcus pyogenes genomes
Strain       Length (bp)   Date
M1 GAS       1852441       Sep 19 2001
MGAS10270    1928252       May 4 2006
MGAS10394    1899877       Aug 3 2004
MGAS10750    1937111       May 4 2006
MGAS2096     1860355       May 4 2006
MGAS315      1900521       Jul 18 2002
MGAS5005     1838554       Aug 8 2005
MGAS6180     1897573       Aug 8 2005
MGAS8232     1895017       Jan 31 2002
MGAS9429     1836467       May 4 2006
SSI-1        1836467       May 4 2006

Genome Info & Features (M1 GAS):
Genes:            1805
Protein coding:   1697
Structural RNAs:  73
Pseudo genes:     35
Length:           1,852,441 nt
GC content:       38%
Coding:           83%
Topology:         circular
Molecule:         dsDNA
Intergenic sequence inspector (ISI)
[Flowchart: ISI workflow]
Modules: IGR extractor -> IGR filtering -> BLAST -> BLAST Analyser -> Genview -> Final results
Data: Annotated genome, IGR databank, Filtered IGR databank, BLAST results, Aligned features, Sequence features, Bacterial genomes database
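The first step of such a workflow, extracting intergenic regions (IGRs) from an annotated genome, can be sketched as follows. This is an illustration only and not ISI's actual code: the function name, the coordinate convention (1-based, inclusive), the 50 nt minimum length and the example gene list are all assumptions. The sketch also treats the genome as linear, so on a circular chromosome the first and last region would still need to be joined.

```python
def extract_igrs(genome_length, genes, min_len=50):
    """Return intergenic regions as (start, end) tuples, 1-based inclusive.

    genes: iterable of (start, end) gene coordinates on the genome;
    strand is ignored and overlapping genes are handled.
    """
    igrs = []
    prev_end = 0  # end of the right-most gene seen so far
    for start, end in sorted(genes):
        gap = start - prev_end - 1
        if gap >= min_len:
            igrs.append((prev_end + 1, start - 1))
        prev_end = max(prev_end, end)
    # Region between the last gene and the end of the sequence.
    if genome_length - prev_end >= min_len:
        igrs.append((prev_end + 1, genome_length))
    return igrs


# Hypothetical annotation on a 1500 nt toy genome.
toy_genes = [(100, 400), (350, 600), (700, 900), (930, 1200)]
print(extract_igrs(1500, toy_genes))
# [(1, 99), (601, 699), (1201, 1500)]
```

The 29 nt gap between the last two toy genes is dropped because it is shorter than the minimum length.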
Characteristics of bacterial ncRNA
• intergenic sequence/structure conservation between related genomes
• encoded by free-standing genes, oriented in opposite fashion to both flanking genes
• 50 to 400 nt long (average >200 nt)
• higher G+C content than the average intergenic space
• σ70 promoter
• ρ-independent terminator
• imperfect sequence complementarity with the target mRNA
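Two of these characteristics, the length range and the elevated G+C content, are easy to turn into a first-pass filter. The sketch below assumes candidate sequences are already available as strings. The function names and the background value of 0.30 are hypothetical, and the average G+C content of intergenic space would have to be measured for the genome at hand.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def looks_like_ncrna_candidate(seq, background_gc, min_len=50, max_len=400):
    """Apply the length range and the G+C criterion from the list above.

    background_gc: average G+C fraction of intergenic space (an input
    the caller has to supply; it is not computed here).
    """
    return min_len <= len(seq) <= max_len and gc_content(seq) > background_gc


# Hypothetical background G+C fraction of intergenic regions.
background = 0.30
print(looks_like_ncrna_candidate("GC" * 30, background))  # True: 60 nt, G+C rich
print(looks_like_ncrna_candidate("AT" * 30, background))  # False: no G or C
print(looks_like_ncrna_candidate("GC" * 10, background))  # False: only 20 nt
```

A filter like this only narrows the search; the promoter, terminator and conservation criteria still have to be checked separately.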
Characteristics of bacterial ncRNA
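[Figure: σ70 promoter and intrinsic terminator; the numbers give the percent conservation of each base]
Promoter
-35 box: T82 T84 G78 A65 C54 A45
spacing between -35 and -10: 16-19 bp
-10 box: T80 A95 T45 A60 A50 T96
spacing between -10 and startpoint: 5-9 bp
Startpoint: C A90 T
intrinsic terminator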
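Read off as plain consensus sequences, the two boxes are TTGACA (-35) and TATAAT (-10), separated by 16-19 bp. A naive scan for this pattern might look like the sketch below. The function names, the threshold of 5 matches out of 6 per box and the example sequence are assumptions for illustration. A position weight matrix built from the conservation percentages would be the more usual choice, and the startpoint and terminator are not checked here.

```python
MINUS_35 = "TTGACA"  # consensus of the -35 box
MINUS_10 = "TATAAT"  # consensus of the -10 box


def matches(window, consensus):
    """Number of positions at which window agrees with the consensus."""
    return sum(1 for a, b in zip(window, consensus) if a == b)


def find_promoters(seq, min_matches=5, spacers=range(16, 20)):
    """Return (pos_35, pos_10, spacer, score) for each candidate promoter.

    Positions are 0-based string indices. A candidate needs at least
    min_matches of 6 in both boxes and a spacer of 16-19 bp.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = matches(seq[i:i + 6], MINUS_35)
        if m35 < min_matches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break
            m10 = matches(seq[j:j + 6], MINUS_10)
            if m10 >= min_matches:
                hits.append((i, j, spacer, m35 + m10))
    return hits


# Made-up sequence: perfect boxes separated by a 17 bp spacer.
example = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGCATGG"
print(find_promoters(example))  # [(2, 25, 17, 12)]
```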
The structure approach with RNAz
1. multiple sequence alignment
2. measure of thermodynamic stability (z score)
3. measure for RNA secondary structure conservation
The function of many ncRNAs depends on a defined secondary structure
The structure approach
Thermodynamic stability
• calculation of the MFE (minimum free energy) as a measure of thermodynamic stability
• the MFE depends on the length and the base composition of the sequence
– and is therefore difficult to interpret in absolute terms
• RNAz calculates a normalized measure of thermodynamic stability by comparing
– the MFE m of a given (native) sequence
– with the MFEs of a large number of random sequences with similar length and base composition
• a z-score is calculated as

$$ z = \frac{m - \mu}{\sigma} $$

where µ and σ are the mean and standard deviation, respectively, of the MFEs of the random samples
• a negative z-score indicates that a sequence is more stable than expected by chance
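The z-score described here is z = (m − µ)/σ, with m the MFE of the native sequence and µ, σ the mean and standard deviation of the MFEs of the random sequences. A minimal sketch of that arithmetic follows. The MFE values are invented numbers standing in for the output of a folding program, the function name is hypothetical, and the use of the sample standard deviation is a choice made here rather than something the slides specify.

```python
from statistics import mean, stdev


def mfe_z_score(native_mfe, random_mfes):
    """z = (m - mu) / sigma for a native MFE against shuffled controls.

    random_mfes: MFEs (kcal/mol) of random sequences with similar length
    and base composition; at least two values are needed.
    """
    mu = mean(random_mfes)
    sigma = stdev(random_mfes)  # sample standard deviation (an assumption)
    if sigma == 0:
        raise ValueError("random MFEs have zero spread; z-score undefined")
    return (native_mfe - mu) / sigma


# Invented example values: the native fold is far more stable
# (more negative MFE) than the shuffled controls.
z = mfe_z_score(-30.0, [-20.0, -22.0, -18.0, -21.0, -19.0])
print(round(z, 2))  # -6.32
```

Because more stable folds have more negative free energies, a strongly negative z is the interesting case.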
The structure approach
Structural conservation
• RNAz predicts a consensus secondary structure for an alignment
– this results in a consensus MFE E_A
• RNAz compares this consensus MFE to the average MFE of the individual sequences, Ē, and calculates a structure conservation index:

$$ \mathrm{SCI} = E_A / \bar{E} $$

• the SCI will be low if no consensus fold can be found.
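The structure conservation index is the ratio SCI = E_A / Ē of the consensus MFE to the mean MFE of the individual sequences. The sketch below only performs that division. The energies are invented placeholders for values a folding program would report, and the function name is hypothetical.

```python
def structure_conservation_index(consensus_mfe, individual_mfes):
    """SCI = E_A / mean(E_i).

    consensus_mfe: MFE of the consensus structure of the alignment (E_A).
    individual_mfes: MFEs of the aligned sequences folded one by one.
    """
    mean_mfe = sum(individual_mfes) / len(individual_mfes)
    if mean_mfe == 0:
        raise ValueError("mean individual MFE is zero; SCI undefined")
    return consensus_mfe / mean_mfe


# Invented energies in kcal/mol.
conserved = structure_conservation_index(-18.0, [-20.0, -22.0, -18.0])
unrelated = structure_conservation_index(-2.0, [-20.0, -22.0, -18.0])
print(round(conserved, 2), round(unrelated, 2))  # 0.9 0.1
```

When the sequences share a fold, the consensus energy stays close to the individual energies and the ratio approaches 1; without a common fold the consensus energy is near zero and so is the SCI.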
The structure approach
• the z-score and the SCI are used to classify an alignment as “structural RNA” or “other”.
• for this, RNAz uses a support vector machine (SVM) learning algorithm that is trained on a set of known ncRNAs.
Analysis pipeline of Freiburg group
[Flowchart, steps in order:]
1. extraction of intergenic regions ≥50 nt
2. BLASTN: local alignment of IGRs with BLASTN
3. E-value ≤ 10^-8? if no: discard
4. reverse complement
5. Unify overlapping candidate sequences
6. Clustering to reduce redundancy, using ClustalW
7. Scoring using RNAz
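Two steps of this pipeline, the E-value cut-off and the unification of overlapping candidates, can be sketched in a few lines. This is an illustration and not the Freiburg group's code: the function names and the hit tuples (start, end, E-value) are hypothetical, and real BLASTN output would need to be parsed first.

```python
def filter_hits(hits, max_evalue=1e-8):
    """Keep hits with E-value <= max_evalue; all others are discarded."""
    return [hit for hit in hits if hit[2] <= max_evalue]


def unify_overlapping(intervals):
    """Merge overlapping (start, end) intervals into single candidates."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous candidate: extend it if necessary.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical BLASTN hits as (start, end, E-value).
hits = [
    (100, 220, 1e-12),
    (200, 300, 3e-9),
    (500, 560, 1e-3),   # above the cut-off, discarded
    (290, 350, 1e-8),
    (900, 980, 1e-20),
]
kept = filter_hits(hits)
candidates = unify_overlapping([(start, end) for start, end, _ in kept])
print(candidates)  # [(100, 350), (900, 980)]
```

The merged candidates would then go on to clustering, alignment and scoring.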
Summary / Conclusion
• there are ‘reliable’ computational methods to find ncRNA genes in bacteria
• key methods involve:
– IGR extraction and filtering
– observing sequence conservation in related genomes (BLAST search, ClustalW alignment)
– checking for structure conservation and thermodynamic stability
• the next step is to prove their existence experimentally via microarrays or Northern blots
Outlook
• might it be possible to predict target mRNA?
Thanks for your attention!