Introduction to bioinformatics, 2010 - Göteborgs...
RNA bioinformatics
Marcela Davila
Department of Medical Biochemistry and Cell Biology
Institute of Biomedicine
Introduction to bioinformatics, 2010
RNA bioinformatics 2
Types and Roles of ncRNAs
• mRNA codes for proteins
• A non-coding RNA (ncRNA) is any RNA molecule that is not translated into a protein
• Genomic stability: telomerase RNA
• RNA processing and modification: spliceosomal snRNA, U7 snRNA, RNase P, RNase MRP
• Transcription: 7SK RNA, 6S RNA
• Translation: tRNA, tmRNA, rRNA
• Protein trafficking: SRP RNA
Gisela Storz, Shoshy Altuvia and Karen M. Wassarman (2005)
Matera, A.G., R.M. Terns, and M.P. Terns, Nat Rev Mol Cell Biol, 2007.
RNA bioinformatics 3
ncRNA content
Are ncRNAs responsible for the complexity in different organisms?
Huttenhofer, A., P. Schattner, and N. Polacek, Trends Genet, 2005
RNA bioinformatics 4
Disease
Prasanth, K.V. and D.L. Spector, Genes Dev, 2007.
Costa, F.F. Drug Discov Today 2009
Pandey, A.K., P. Agarwal, K. Kaur, and M. Datta. Cell Physiol Biochem 2009
• miR: diabetes
• MRP RNA: cartilage-hair hypoplasia
RNA bioinformatics 5
Disease
Thiel, C.T., G. Mortier, I. Kaitila, A. Reis, and A. Rauch. Am J Hum Genet 2007
Cartilage-hair hypoplasia
MRP RNA: processing of pre-rRNA
RNA bioinformatics 6
Protein - Primary sequence
ClustalW
Sequence similarity → biological relation → same function
RNA bioinformatics 7
ncRNA - Primary sequence
No sequence conservation, but structural conservation
Covariation: Consistent and compensatory mutations that (often) conserve the structure
Mfold, RNAfold
RNA bioinformatics 8
A single mutation can radically change the structure
Canonical pairs
Non-canonical pairs: GU wobble
http://prion.bchs.uh.edu/bp_type/bp_structure.html
RNA bioinformatics 9
Secondary structure
RNA functionality depends on structure
Structural elements (figure labels):
• External base
• Stem
• Loop
• Hairpin
• Internal loop
• Bulge
• Multibranched loop
• Pseudoknot
RNA bioinformatics 10
Tertiary structure
RNA tertiary structure comprises interactions between secondary structure (SS) elements:
• two helices
• two unpaired regions
• one unpaired region and a double-stranded helix
Prediction of RNA 3D structure is very difficult and RNA bioinformatics is therefore dominated by the prediction and analysis of secondary structure.
RNA bioinformatics 11
Family structure
tRNA, Telomerase RNA, P RNA
Each family typically adopts a characteristic secondary structure
RNA bioinformatics 12
However...
Dictyostelium discoideum
Candida albicans
Trypanosoma brucei
U1 snRNA
MRP RNA
RNA bioinformatics 13
RNA regulatory elements
A cis-regulatory element or cis-element is a region of RNA that regulates the expression of genes located on that same strand.
Iron-responsive element (IRE) / iron regulatory protein: a 26–30 nt long hairpin with a CAGUGN apical loop sequence, located in the 5’ UTR or the 3’ UTR
RNA bioinformatics 14
IRE regulation
Muckenthaler MU, Galy B, Hentze MW. Annu Rev Nutr. 2008
Ferritin: iron storage protein
Transferrin receptor: iron acquisition protein
RNA bioinformatics 15
RNA regulatory elements
A cis-regulatory element or cis-element is a region of RNA that regulates the expression of genes located on that same strand.
Riboswitch
• Typically found in the 5’ UTR
• Regulates biosynthesis, catabolism and transport of various cellular metabolites (amino acids [K, G], cofactors, nucleotides and metal ions)
• Most known examples occur in Bacteria
RNA bioinformatics 16
Riboswitch examples
Serganov A, Patel DJ. Biochim Biophys Acta. 2009
Transcription Translation
Shine-Dalgarno
RNA bioinformatics 17
RNA regulatory elements
A cis-regulatory element or cis-element is a region of RNA that regulates the expression of genes located on that same strand.
SECIS element: selenocysteine insertion sequence element, located in the 3’ UTR
RNA bioinformatics 18
Selenoprotein synthesis
Incorporation of Sec into selenoproteins requires:
1. UGA codon decoded as Sec (UGA-Sec)
2. Sec-tRNA[Ser]Sec
3. SECIS: selenocysteine insertion sequence element, several kb away from the UGA, in the 3’ UTR
4. SRE: selenocysteine redefinition element, 6 nt downstream of the UGA, in the CDS
5. Several protein factors: EFSec (Sec-specific elongation factor), SBP2, ribosomal protein L30, Secp43 (RBP), SLA (soluble liver antigen)
Berry MJ, Nat Genet. 2005
Papp, LV, et al. Antioxidants & Redox Signaling 2007
RNA bioinformatics 19
RNA regulatory elements
Trans-regulatory elements are RNAs that may modify the expression of genes distant from the gene from which they were themselves transcribed.
U7 snRNA
[Figure: U7 snRNA and histone pre-mRNA with associated factors. Labels: B, D3, E, F, G, Lsm10, Lsm11, Symplekin, CPSF-73, CPSF-100, SLBP, ZFP-100]
Dominski, Z. and W.F. Marzluff. Gene, 2007
RNA bioinformatics 20
Protein vs RNA identification
Protein:
• Conserved primary sequence
• Promoters (Pol II)
• Identification: sequence-similarity based
RNA:
• Primary sequence not conserved
• Promoters (Pol II, Pol III)
• Identification: sequence-similarity based, secondary structure based, comparative genomics
RNA bioinformatics 21
Methods
• Nussinov algorithm
• Mfold (prediction of secondary structure)
• Analysis of mutual information
• Pattern matching
• SCFG (Stochastic context-free grammar models)
• Phylogenetic analysis
Nussinov algorithm: find the structure with the most base pairs. Testing all possible structures is numerically impossible, so the optimum is computed by dynamic programming.
Drawback: the optimal structure is not unique
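The recursion behind the Nussinov algorithm can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in any particular package: the function name `nussinov`, the `min_loop` parameter (minimum number of unpaired bases inside a hairpin) and the decision to allow G-U wobble pairs are assumptions made for the example. The traceback returns only one of the possibly many optimal structures, which is exactly the non-uniqueness drawback noted above.

```python
def nussinov(seq, min_loop=3):
    """Return (max_pairs, dot_bracket) for ONE structure with the most base pairs.

    Assumptions of this sketch: Watson-Crick and G-U wobble pairs are allowed,
    and at least `min_loop` unpaired bases must separate two paired positions.
    """
    seq = seq.upper()
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    # dp[i][j] = maximum number of base pairs in the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best

    # Traceback: recover one optimal structure (others may exist).
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in pairs:
                left = dp[i][k - 1] if k > i else 0
                if dp[i][j] == left + 1 + dp[k + 1][j - 1]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)


print(nussinov("GGGAAACCC"))
```

The table is filled in O(n^3) time, instead of enumerating every possible structure.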
RNA bioinformatics 22
Methods
Zuker folding algorithm (1981): the correct structure is taken to be the one with the lowest equilibrium free energy (ΔG), which is the sum of individual contributions from loops, base pairs and other secondary structure elements
Every system seeks to achieve a minimum free energy (MFE)
However, the structure with the minimum free energy is not always the biologically relevant one
RNA bioinformatics 23
Methods
Mutual information: a quantity that measures the mutual dependence of two variables (here, two alignment positions). The unit of measurement is the bit.
[Figure: example alignment columns with mutual information values of 0.00, 1.00 and 2.00 bits]
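As a hedged illustration of how such values arise, the sketch below computes the mutual information between two alignment columns from observed frequencies. The function name `mutual_information` and the toy columns are assumptions for the example; gaps and small-sample corrections are ignored.

```python
import math
from collections import Counter


def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j hold the residues observed at positions i and j,
    one character per aligned sequence, in the same sequence order.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    freq_i = Counter(col_i)
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy columns (hypothetical data, one character per aligned sequence)
print(mutual_information("AAAA", "UUUU"))  # two invariant columns
print(mutual_information("GGCC", "CCGG"))  # two covarying pair types
print(mutual_information("ACGU", "UGCA"))  # four covarying pair types
```

Invariant columns carry no covariation signal, while two columns that covary perfectly across all four nucleotides reach the maximum of 2 bits.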
RNA bioinformatics 24
Mutual information – exercise
RNA bioinformatics 25
Mutual information plot
Diagonals of covarying positions correspond to the four stems of the tRNA. Dashed lines indicate some of the additional tertiary contacts observed in the yeast tRNA-Phe crystal structure.
RNA bioinformatics 26
Methods
PatScan is a pattern matcher (deterministic motifs as well as secondary structure constraints) that searches protein or nucleotide sequence archives
Example pattern: p1=5...7 GGAA ~p1
Drawback: yes/no answer
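To make the example pattern concrete, here is a small Python sketch of what a hairpin pattern of this shape asks for: a stem of 5 to 7 nt, the loop GGAA, then the reverse complement of the stem. It is not PatScan itself and does not use its syntax or options; the function names are hypothetical, and it assumes that `~p1` means the strict Watson-Crick reverse complement of `p1`.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def find_hairpins(seq, min_stem=5, max_stem=7, loop="GGAA"):
    """Return (start, end, stem) for every match of: stem, loop, revcomp(stem).

    Coordinates are 0-based, end-exclusive. Like a pattern matcher, this
    gives a yes/no answer per position; it does not score partial matches.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq)):
        for stem_len in range(min_stem, max_stem + 1):
            loop_start = start + stem_len
            tail_start = loop_start + len(loop)
            end = tail_start + stem_len
            if end > len(seq):
                break  # longer stems cannot fit either
            stem = seq[start:loop_start]
            if (seq[loop_start:tail_start] == loop
                    and seq[tail_start:end] == reverse_complement(stem)):
                hits.append((start, end, stem))
    return hits


# Toy sequence (hypothetical): CC + stem GCAUC + loop GGAA + GAUGC + CC
print(find_hairpins("CCGCAUCGGAAGAUGCCC"))
```

A sequence with a single mismatch in the stem is simply not reported, which is the yes/no limitation mentioned above.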
RNA bioinformatics 27
PatScan - Example
SRP RNA
N. gonorrhoeae, M. pneumoniae, E. faecalis
r1={au,ua,gc,cg,gu,ug,ga,ag}
p1=4...4 cagr p2=3...3 graa r1~p2 agcaa r1~p1
RNA bioinformatics 28
Methods
Regular grammar: primary sequence models
S → aT | bS
T → aS | bT | ɛ
Derivation: S ⇒ aT ⇒ aaS ⇒ aabS ⇒ aabaT ⇒ aabaɛ ⇒ aaba
Model repeat regions (e.g. FMR-1 triplet repeat region)
S → gW1
W1 → cW2
W2 → gW3
W3 → cW4
W4 → gW5
W5 → gW6
W6 → cW7 | aW4 | cW4
W7 → tW8
W8 → g
Example sequences:
gcg cgg ctg
gcg cgg agg cgg ctg
gag agg ctg
gcg agg cgg ctg
gcg agg cgg cgg
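A regular grammar of this kind can be checked mechanically, since each production consumes one terminal and moves to at most one nonterminal. The sketch below encodes the triplet-repeat productions as a transition table and tracks the set of reachable nonterminals (W6 has two productions starting with c, so the walk is non-deterministic). The names `FMR1_GRAMMAR` and `accepts` are hypothetical, and `None` is used here as an assumed marker for "derivation finished".

```python
# Each production X -> tY is stored as (t, Y); Y is None when the
# production ends the derivation (W8 -> g).
FMR1_GRAMMAR = {
    "S": [("g", "W1")],
    "W1": [("c", "W2")],
    "W2": [("g", "W3")],
    "W3": [("c", "W4")],
    "W4": [("g", "W5")],
    "W5": [("g", "W6")],
    "W6": [("c", "W7"), ("a", "W4"), ("c", "W4")],
    "W7": [("t", "W8")],
    "W8": [("g", None)],
}


def accepts(grammar, sequence, start="S"):
    """True if the regular grammar can derive `sequence` (spaces ignored)."""
    sequence = sequence.replace(" ", "").lower()
    states = {start}
    for symbol in sequence:
        # Follow every production whose terminal matches the current symbol.
        states = {
            nxt
            for state in states
            if state is not None
            for terminal, nxt in grammar[state]
            if terminal == symbol
        }
        if not states:
            return False
    return None in states


print(accepts(FMR1_GRAMMAR, "gcg cgg ctg"))
print(accepts(FMR1_GRAMMAR, "gcg cgg agg cgg ctg"))
print(accepts(FMR1_GRAMMAR, "gcg cgg"))
```

Attaching a probability to each production turns the same table into a stochastic regular grammar, which is equivalent to a hidden Markov model.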
RNA bioinformatics 29
Methods
Context-free grammar: primary sequence models, palindromes
S → aSa | bSb | aa | bb
Derivation: S ⇒ aSa ⇒ aaSaa ⇒ aabSbaa ⇒ aabaabaa
RNA secondary structure
Sequences: CAGGAAACUG, GCUGCAAAGC, GCUGCAACUG
S → aW1u | cW1g | gW1c | uW1a
W1 → aW2u | cW2g | gW2c | uW2a
W2 → aW3u | cW3g | gW3c | uW3a
W3 → ggaa | gcaa
Hairpin drawings (5’ end at bottom left; "." marks a base pair, "x" a mismatch):
G A    C A    C A
G A    G A    G A
G.C    U.A    UxC
A.U    C.G    CxU
C.G    G.C    GxG
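The nested productions of the stem-loop grammar translate directly into a recursive check: each of S, W1 and W2 strips one complementary pair from the two ends, and W3 must match one of the loop alternatives. The sketch below is an illustration only; the names `NEXT_SYMBOL`, `LOOPS`, `derives` and `accepts` are hypothetical, and the loop alternatives are copied as written in the grammar above (ggaa | gcaa).

```python
# Terminal pairs allowed by the productions X -> aYu | cYg | gYc | uYa
PAIRS = {("a", "u"), ("c", "g"), ("g", "c"), ("u", "a")}
# Nonterminal reached after stripping one pair
NEXT_SYMBOL = {"S": "W1", "W1": "W2", "W2": "W3"}
# W3 -> ggaa | gcaa, copied from the grammar in the text
LOOPS = {"ggaa", "gcaa"}


def derives(symbol, s):
    """True if nonterminal `symbol` can derive the string `s`."""
    if symbol == "W3":
        return s in LOOPS
    if len(s) < 2 or (s[0], s[-1]) not in PAIRS:
        return False
    # Strip the outermost base pair and continue with the next nonterminal.
    return derives(NEXT_SYMBOL[symbol], s[1:-1])


def accepts(sequence):
    """True if the stem-loop grammar derives the RNA sequence."""
    return derives("S", sequence.lower())


print(accepts("GCUGCAAAGC"))  # three base pairs closing a GCAA loop
print(accepts("GCUGCAACUG"))  # same loop, but the ends cannot pair
```

Because the two ends of the string are generated by the same production, the grammar captures the long-range pairing dependency that a regular grammar cannot express.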
RNA bioinformatics 30
Methods
Stochastic regular grammar: weighted (probabilistic) primary sequence models
S → rW1 (0.45)
S → kW1 (0.45)
S → nW1 (0.10)
Hidden Markov models
[Figure: hidden Markov model with nucleotide states A, C, G, T and labels ɛ, β]
RNA bioinformatics 31
Methods
Stochastic context-free grammar
Covariance models: probabilistic models that flexibly describe the secondary structure and primary sequence consensus of an RNA sequence family
RNA bioinformatics 32
Infernal Package
•Search for additional and family-related sequences in sequence databases
RNA bioinformatics 33
Rfam
Database containing information about ncRNA families and other structured RNA elements.
RNA bioinformatics 34
Methods
EvoFold:
- Conserved elements alignment
- SCFG secondary structure
- Fold
- Phylogenetic evaluation
RNA bioinformatics 35
miRNA
• Single-stranded (ss) RNA
• ~22 nucleotides
• Accounts for ~1% of all transcripts in humans and potentially regulates 10%-30% of all genes
• Expressed ubiquitously and highly conserved in Metazoans (animal kingdom) and Plants
• Inhibits the translation of mRNAs into their protein products by binding to specific regions in the 3’ UTR
[Figure: miRNA (5’ to 3’) paired to the 3’ UTR of its target mRNA; target labels: m7G cap, CDS, AAUAA, poly(A) tail]
RNA bioinformatics 36
miRNA
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol 2009.
Biological processes: apoptosis, cell proliferation, cell differentiation, development, organism defense against infections, tissue morphogenesis, regulation of metabolism
Diseases: cancer, viral infections, neurodegenerative disorders, cardiac pathologies, muscle disorders, diabetes
RNA bioinformatics 37
miRNA genes
Kim VN Nat Rev Mol Cell Biol. 2005
Winter J et al Nat Cell Biol. 2009
Exonic / Intergenic
Intronic
mirtron
RNA bioinformatics 38
miRNA Biogenesis
Winter, J., S. Jung, S. Keller, R.I. Gregory, and S. Diederichs. Nat Cell Biol 2009.
Paul S. Meltzer, Nature, 2005
Editing
RNA bioinformatics 39
miRNA Biogenesis
Winter, J., S. Jung, S. Keller, R.I. Gregory, and S. Diederichs. Nat Cell Biol 2009.
Paul S. Meltzer, Nature, 2005
RNA bioinformatics 40
miRNA structure
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol 2009.
miRNA
miRNA*
Intervening loop
Hairpin structure
Human genome ~11 million hairpins
RNA bioinformatics 41
miRNA computational identification
Homology search based
• BLAST
• miRAlign, ProMiR, microHARVESTER
Gene finding
• Identification of conserved genomic regions
• Folding of the identified regions (Mfold, RNAfold)
• Evaluation of hairpins
• miRseeker, miRscan
Neighbour stem loop (~42% of human miRNA genes are clustered together)
• Check the surroundings of a known miRNA for candidate secondary structures
Comparative genomics
• BLAST intergenic sequences of two genomes against each other
• Filter using rules inferred from known miRNAs
• miRFinder
Intragenomic matching (a functional miRNA should have at least one target)
• miRNAs show perfect complementarity to their targets (?)
• Simultaneously predicts miRNAs and their targets
• miMatcher
RNA bioinformatics 42
miRNA experimental validation through sequencing
Experimental approach:
– Purify small RNAs (15-35 nt)
– Deep sequencing of the RNA library
– Map sequence traces to the genome
Ruby JG. et al. Genome Res., 2007
RNA bioinformatics 43
miRNA Target prediction
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol 2009.
• Predicting miRNA targets in plants is easier, due to the perfect complementarity to the miRNAs
• In animals, perfect complementarity is not common
– miRNA seed complementarity (6 to 9 nt)
– High false-positive rate
• Common approach
– Experimental evidence
– Validated miRNA/target pairs
– TarBase, miRecords
• Computational methods:
– Base-pairing rules and binding site sequence features
– Conservation
– Thermodynamics
[Figure: miRNA (5’ to 3’) paired to the 3’ UTR of its target mRNA; target labels: m7G cap, CDS, AAUAA, poly(A) tail]
RNA bioinformatics 44
Base-pairing rules
Bartel, D.P. Cell 2009.
• 6-9 nt, usually starting at P2
• P1 is typically unpaired or starts with U
• Often flanked by A
• Usually no G:U wobbles (vs regulation)
3’ compensatory sites
Canonical sites
Atypical sites
lsy-6/cog-1 3’UTR
5’ dominant sites
May compensate for insufficient basepairing in the seed
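The simplest of these rules, a perfectly paired seed starting at P2, can be sketched as a string search. This is a toy illustration rather than any published target predictor: the function names are hypothetical, the default of a 7-nt seed (positions 2 to 8) is one assumed choice within the 6-9 nt range, G:U wobbles are not allowed, and both input sequences are made-up examples. Flanking A, 3’ compensatory pairing, conservation and ΔG are all ignored.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, utr, seed_start=2, seed_len=7):
    """Find perfect seed matches of a miRNA in a 3' UTR.

    seed_start is the 1-based miRNA position where the seed begins (P2 by
    default). Returns (site, positions): the UTR motif that pairs with the
    seed and the 0-based start of every occurrence in the UTR.
    """
    mirna = mirna.upper().replace("T", "U")
    utr = utr.upper().replace("T", "U")
    seed = mirna[seed_start - 1:seed_start - 1 + seed_len]
    site = reverse_complement(seed)
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)  # allow overlapping occurrences
    return site, positions


# Toy inputs (illustrative sequences, not validated miRNA/target pairs)
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
utr = "AAACUACCUCAAA"
print(seed_sites(mirna, utr))
```

On real UTRs such a search returns many hits, which is why the seed criterion alone gives a high false-positive rate and is combined with conservation and thermodynamic filters.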
RNA bioinformatics 45
More methods ...
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol 2009.
Conservation
• Search for conserved seeds in the UTRs across different species
Thermodynamics
• Evaluation of the ΔG of predicted duplexes, usually < -20 kcal/mol
• Discards false positives, but favorable interactions do not always correspond to an actual duplex
Structural accessibility
• The target site on the mRNA should not be involved in any intramolecular base pairing
• Any existing secondary structure must first be removed
RNA bioinformatics 46
miRNA
Bartel, D.P. Cell 2009
RNA bioinformatics 47
miRNA gene expression in cancer
Negrini, M., M.S. Nicoloso, and G.A. Calin. Curr Opin Cell Biol 2009.
Lu, J., et al., Nature, 2005
RNA bioinformatics 48
Carlo Croce 2009
[Figure, panels A-D. Panel A: K562 cells injected SC; miR-29b or scrambled oligos injection (5 µg); timeline in days: 0, 3, 7, 10, 14, Stop. Panel B: tumor volume (mm3) against days (0 to +14) for mock, scrambled and miR-29b groups; * P<0.003. Panel C: tumor weight (grams) for scrambled and miR-29b groups; P<0.001. Panel D: photographs labelled miR-29b and scrambled.]
(A) Diagram illustrating the experimental design of the mice xenograft experiment.
(B) Graphic representing the tumor volume determinations at the indicated days during the experiment for the three groups; mock (n= 6), scrambled (n=12) and synthetic miR-29b (n=12).
(C) Tumor weight averages between scrambled and synthetic miR-29b treated mice groups at the end of the experiment (Day +14). P-values were obtained using t-test. Bars represent ±S.D.
(D) Photographs of two mice injected with miR-29b (left flank) or scrambled (right flank).
MiR-29b inhibits leukemic growth in vivo.
miRNAs as tumor suppressors
RNA bioinformatics 49
miR DBs
• Published miRNAs
• Experimentally supported targets
• Prediction of miRNA targets
• miRNA-disease relationships reported in the literature