Na r cisus

29
“NaRCiSuS” Noncoding RNA Comparitive Searching System Marie Curie European Reintegration Grants (ERG) Call: FP7-PEOPLE-RG-2009

Transcript of Na r cisus

Page 1: Na r cisus

“NaRCiSuS”

Noncoding RNA Comparitive Searching System

Marie Curie European Reintegration Grants (ERG)

Call: FP7-PEOPLE-RG-2009

Page 2: Na r cisus

What are noncoding RNAs?

• RNA transcripts

• without long translable open reading frame

• highly structured

• realising their functions as RNAs

Page 3: Na r cisus

What do they do?

• RNA-protein machine:– Transfer RNA (tRNA).– Ribosomal RNA (rRNA).– RNAs (snRNAs) in spliceosome.

• Catalytic RNAs (ribozymes): catalyzing some functions.

• Micro RNAs (miRNAs): regulatory roles.• Small interfering RNAs (siRNAs): RNA silencing

– The genome’s immune system. [Plasterk, Science (2002)]– The breakthrough of the year by Science magazine in 2002.

Page 4: Na r cisus

Protein coding gene prediction

1. Prokaryota

• translation initiation site• ribosome binding site (Kozak sequence)• ORF features – length, similarity to known proteins,

codon usage, coding capacity

2. Eukaryota

• poli-A site• 3’, 5’ UTR• axon-intron location, splice sites

Page 5: Na r cisus

ncRNA gene prediction ?

• no translation initiation site

• no ribosome binding site

• no KOZAK sequence

• no significant ORF

Page 6: Na r cisus

ncRNA gene prediction ?

• base composition statistics

• simple translated regions search

• secondary structure based search

• combined comparative approach

Page 7: Na r cisus

RNA secondary structure

• ncRNA is not a random sequence.• Most RNAs fold into particular base-paired

secondary structure.

• Canonical basepairs:– Watson-Crick basepairs:

• G - C• A - U

– Wobble basepair:• G – U

Page 8: Na r cisus

RNA secondary structure

• Stacks: continuous nested basepairs. (energetically favorable)

• Non-basepaired loops:– Hairpin loop.– Bulge.– Internal loop.– Multiloop.

Page 9: Na r cisus

RNA secondary structure

• Most basepairs are non-crossing basepairs.– Any two pairs (i, j) and (i’,j’) i < i’ < j’ < j or i’ < i < j < j’

• Pseudoknots are the crossing basepairs.

Page 10: Na r cisus

Pseudoknots• Pseudoknots are important for certain ncRNAs• Violate the non-crossing assumption.• Pseudoknots make most problems harder • We assume there are no pseudoknots otherwise

noted.

[Rivas and Eddy (1999)]

Page 11: Na r cisus

ncRNA evolution is constrained by it secondary

structure

http://www.sanger.ac.uk/Software/Rfam/

• Drastic sequence changes can be tolerated.• Compensatory mutations are very common.

– One basepair mutates into another basepair.– Doesn’t change its secondary structure.

• In this talk:

ncRNA – conserved structured RNA.

tRNA1:

tRNA2:

Compensatory mutation

Page 12: Na r cisus

Secondary Structure alone is not enough

• de novo prediction:– Find stable secondary

structure from genome. [Shapiro et al. (1990)]

• The stability of ncRNA secondary structure is not sufficiently different from the predicted stability of a random sequence. [Rivas and Eddy (2000)].

– Look transcript signals. [Wassarman et al. (2001), Argaman et al. (2000)]

• ncRNA transcript signals are not strong.

• protein coding gene signals (open reading frame, promoter).

[Rivas and Eddy (2000)].

Page 13: Na r cisus

RNA secondary structure prediction

• It is a basic issue in ncRNA analysis

• It is important information to the biologists.

• Searching and alignment algorithms are based on these models.

• RNA secondary structure -- a set of non-crossing base pairs.

Page 14: Na r cisus

Base pair maximization problem

• A simple energy model is to maximize the number of basepairs to minimize the free energy. [Waterman (1978), Nussinov et al (1978), Waterman and Smith (1978)]

• G – C, A – U, and G – U are treated as equal stability.

• Contributions of stacking are ignored.

Page 15: Na r cisus

A dynamic programming solution

• Let s[1…n] be an RNA sequence.• δ(i,j) = 1 if s[i] and s[j] form a complementary base

pair, else δ(i,j) = 0.• M(i,j) is the maximum number of base pairs in s[i…j].

[Nussinov (1980)]

Page 16: Na r cisus

A dynamic programming solution

• M(1,n) is the number of base pairs in the optimal basepaired structure for s[1…n].

• All these basepairs can be found by tracing back through the matrix M.

• Filling M needs O(n3) time.

Page 17: Na r cisus

RNA structure: example

0

1 1

1 1 0

2 2 1 1

3 2 1 1 0

i 1 2 3 4 5 6j

3

4

5

6

2A C G A U UA C G A U U1 2 3 4 5 61 2 3 4 5 6

Page 18: Na r cisus

Zuker-Sankoff minimum energy model

• Stacks (contiguous nested base pairs) are the dominant stabilizing force – contribute the negative energy

• Unpaired bases form loops contribute the positive energy.– Hairpin loops, bulge/internal loops, and multiloops.

• Zuker-Sankoff minimum energy model. [Zuker and Sankoff (1984), Sankoff (1985)]

• Mfold and ViennaRNA are all based on this model. (this model is also called mfold model)

Page 19: Na r cisus

Zuker-Sankoff minimum energy model

:eH(i,j)

:a+3*b+4*c

:eL(i,j,i’,j’)

ij

i

j J’

i’

[Lyngsø (1999)]

ii+1

j j-1:eS(i,j,i+1,j-1)

Page 20: Na r cisus

RNA minimum energy problem

• This problem can be solved by a dynamic programming algorithm in O(n4) time.

• Lyngsø et al. (1999) revise the energy function for internal loop, proposed an O(n3) time solution.

Page 21: Na r cisus

Zuker-Sankoff model

Page 22: Na r cisus

Recursive functions

• W(i) holds the minimum energy of a structure on s[1…i].

• V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair.

• WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.

Page 23: Na r cisus

Recursive functions (Zuker)• W(i) holds the

minimum energy of a structure on s[1…i].

• V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair.

• WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.

Page 24: Na r cisus

A recursive solution• W(i) holds the

minimum energy of a structure on s[1…i].

• V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair.

• WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.

Page 25: Na r cisus

A recursive solution• W(i) holds the

minimum energy of a structure on s[1…i].

• V(i,j) holds the minimum energy of a structure on s[i…j] with s[i] and s[j] forming a basepair.

• WM(i,j) holds the minimum energy of a structure on s[i…j] that is part of multiloop.

Page 26: Na r cisus

Idea of the project• Integrated bioinformatics platform that is specifically addressed

for detecting, verifying, and classifying of noncoding RNAs. • This complex approach to "Computational RNomics" will provide

the pipeline which will be capable of detecting RNA motifs with low sequence conservation.

• It will also integrate RNA motif prediction which should significantly improve the quality of the RNA homolog search.

• The secondary structure of the RNA can be represented as a graph.

• Graph can distinguish RNA of known families from Rfam database.

Page 27: Na r cisus

Goals

• Using experssed sequence tags and profile-profile protein like method for predicting ORF.

• The candidates will be folded “ab initio” usinf method proposed by zucker (MFold)

• The secondary strucetres of the candidates can be then compared to sequence form Rfam or ncRNA database to find novel ncRNAs or detect missing members of already know families.

Page 28: Na r cisus

Service Features• Prediction missing members of already defined RNA

families • Describe properties of ncRNA and measure

confidence level of their functional importance • Prediction of novel ncRNA “ab initio”. • Visualization secondary and tetriary structure “online” • Exchange formats of RNA annotation. • Identify interaction with coding genes. • Design novel RNA’s for therapy and drug delivery

(RNAi)

Page 29: Na r cisus

SERVER

NCBI

NCBI

Sanger

Repeatmasker

BLASTz

tBLASTn

BLASTn

mRNA/EST%GC

PromH

NCBI

Loop Scaner QRNA

mFOLD

Human Mouse

Human Mouse

PFAM

NR

EST

Remove repeats

Searching for sequence of high

functionality

Searching for coding sequences

Comparison of expression patterns

Searching for sequences of well defined structure

orcontaining compensatory

mutations

Searching for remaining genes

1

2

3

4

5

6