Advanced Review Computational analysis of noncoding...

20
Advanced Review Computational analysis of noncoding RNAs Stefan Washietl 1,2,†,, Sebastian Will 1,† , David A. Hendrix 1 , Loyal A. Goff 1,3 , John L. Rinn 2,3 , Bonnie Berger 1,2,4 and Manolis Kellis 1,2 Noncoding RNAs have emerged as important key players in the cell. Understand- ing their surprisingly diverse range of functions is challenging for experimental and computational biology. Here, we review computational methods to analyze non- coding RNAs. The topics covered include basic and advanced techniques to predict RNA structures, annotation of noncoding RNAs in genomic data, mining RNA-seq data for novel transcripts and prediction of transcript structures, computational aspects of microRNAs, and database resources. © 2012 John Wiley & Sons, Ltd. How to cite this article: WIREs RNA 2012. doi: 10.1002/wrna.1134 INTRODUCTION N oncoding RNAs (ncRNAs) are transcripts that are not translated to proteins but act as functional RNAs. Several well-known ncRNAs such as transfer RNAs or ribosomal RNAs can be found throughout the tree of life. They fulfill central functions in the cell and thus have been studied for a long time. However, over the past years a few key discov- eries have shown that ncRNAs have a much richer functional spectrum than anticipated. 1 The discovery of microRNAs (miRNAs/miRs) for example changed our view of how genes are regulated. 2,3 Another surprising observation revealed by high-throughput methods is that in human 90% of the genome is tran- scribed at some time in some tissue. 4 Although the full extent and functional consequences of this pervasive transcription remains highly controversial, 5,6 the vast amount of transcripts produced suggests that many important ncRNA functions are yet to be discovered. These authors contributed equally Correspondence to: [email protected] 1 Computer Science and Artificial Intelligence Laboratory, Mas- sachusetts Institute of Technology, Cambridge, MA, USA 2 Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, MA, USA 3 Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA 4 Mathematics Department, Massachusetts Institute of Technology, Cambridge, MA, USA In particular, long noncoding RNAs (lncR- NAs)—transcripts that can be several kilobases in length, spliced and processed like mRNAs but lack obvious coding potential—seem to be a rich source of novel functions. 7 All these ncRNAs have been sug- gested to form a hidden layer of regulation that is necessary to establish the complexity of eukaryotic genomes. 8 Prokaryotic genomes also contain many sur- prises. Riboswitches, 9 small regulatory RNAs, 10 or completely unknown structured RNAs 11 suggest that ncRNAs also form an important functional layer in bacteria. Understanding the function of ncRNAs—in particular in the age of high-throughput experi- ments—is clearly not possible without computational approaches. Algorithms to annotate, organize and functionally characterize ncRNAs are of increasing relevance. In this article, we give a broad overview of programs and resources to analyze many different aspects of ncRNAs (Figure 1). STRUCTURAL ANALYSIS For many RNAs there is a close connection between structure and function. Having a good model of the structure of a RNA is thus critical and often the first clue towards elucidating its function. Determining the complete three dimensional structure (‘tertiary struc- ture’) of an RNA is a tedious and time-consuming © 2012 John Wiley & Sons, Ltd.

Transcript of Advanced Review Computational analysis of noncoding...

Page 1: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review

Computational analysisof noncoding RNAsStefan Washietl12daggerlowast Sebastian Will1dagger David A Hendrix1Loyal A Goff13 John L Rinn23 Bonnie Berger124 andManolis Kellis12

Noncoding RNAs have emerged as important key players in the cell Understand-ing their surprisingly diverse range of functions is challenging for experimental andcomputational biology Here we review computational methods to analyze non-coding RNAs The topics covered include basic and advanced techniques to predictRNA structures annotation of noncoding RNAs in genomic data mining RNA-seqdata for novel transcripts and prediction of transcript structures computationalaspects of microRNAs and database resources copy 2012 John Wiley amp Sons Ltd

How to cite this articleWIREs RNA 2012 doi 101002wrna1134

INTRODUCTION

Noncoding RNAs (ncRNAs) are transcripts thatare not translated to proteins but act as

functional RNAs Several well-known ncRNAs suchas transfer RNAs or ribosomal RNAs can be foundthroughout the tree of life They fulfill centralfunctions in the cell and thus have been studied for along time

However over the past years a few key discov-eries have shown that ncRNAs have a much richerfunctional spectrum than anticipated1 The discoveryof microRNAs (miRNAsmiRs) for example changedour view of how genes are regulated23 Anothersurprising observation revealed by high-throughputmethods is that in human 90 of the genome is tran-scribed at some time in some tissue4 Although the fullextent and functional consequences of this pervasivetranscription remains highly controversial56 the vastamount of transcripts produced suggests that manyimportant ncRNA functions are yet to be discovered

daggerThese authors contributed equallylowastCorrespondence to washmitedu1Computer Science and Artificial Intelligence Laboratory Mas-sachusetts Institute of Technology Cambridge MA USA2Broad Institute of Massachusetts Institute of Technology andHarvard Cambridge MA USA3Department of Stem Cell and Regenerative Biology HarvardUniversity Cambridge MA USA4Mathematics Department Massachusetts Institute of TechnologyCambridge MA USA

In particular long noncoding RNAs (lncR-NAs)mdashtranscripts that can be several kilobases inlength spliced and processed like mRNAs but lackobvious coding potentialmdashseem to be a rich sourceof novel functions7 All these ncRNAs have been sug-gested to form a hidden layer of regulation that isnecessary to establish the complexity of eukaryoticgenomes8

Prokaryotic genomes also contain many sur-prises Riboswitches9 small regulatory RNAs10 orcompletely unknown structured RNAs11 suggest thatncRNAs also form an important functional layer inbacteria

Understanding the function of ncRNAsmdashinparticular in the age of high-throughput experi-mentsmdashis clearly not possible without computationalapproaches Algorithms to annotate organize andfunctionally characterize ncRNAs are of increasingrelevance In this article we give a broad overviewof programs and resources to analyze many differentaspects of ncRNAs (Figure 1)

STRUCTURAL ANALYSIS

For many RNAs there is a close connection betweenstructure and function Having a good model of thestructure of a RNA is thus critical and often the firstclue towards elucidating its function Determining thecomplete three dimensional structure (lsquotertiary struc-turersquo) of an RNA is a tedious and time-consuming

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 1 | Outline of the main topics covered in this article Many topics overlap depend on each other or share similar concepts The mostimportant of these interconnections are shown by arrows

undertaking Computational methodsmdasheither com-pletely de novo or assisted by experimental datamdasharetherefore routinely used to predict structure modelsA strong focus lies on the prediction of the secondarystructure ie the pattern of intramolecular base-pairs(AmiddotU GmiddotC and GmiddotU) typically formed in RNAs

Secondary StructureThermodynamic Folding of Single SequencesIn RNAs the secondary structural elements areresponsible for most of the overall folding energy andcan be seen as a coarse-grained approximation of thetertiary structure This important biophysical propertyin combination with the fact that secondary structurecan easily be formalized as a simple graph (Figure 2)led to secondary structure being widely studiedearly on One of the first attempts to approach theRNA folding problem (ie predicting the secondarystructure from the primary sequence) was by Nussinovand Jacobson12 They proposed an algorithm to findthe secondary structure with the maximum numberof base pairs It is one of the classical examples ofdynamic programming algorithms in computationalbiology and all modern variants of folding algorithmessentially use the same principle (Box 1)

In practice however finding the structure withthe maximum number of pairs does not give accu-rate results Ideally we want to find the struc-ture of minimum folding energy Since most of thefolding energy in RNAs is contributed by stacking

interactions between neighboring base pairs countingsingle base pairs is not sufficient Therefore cur-rent folding algorithms use a lsquonearest neighborrsquo oralso called lsquoloop-basedrsquo energy model A structureis uniquely decomposed into substructural elements(stacked bases hairpin-loops bulges interior-loopsand multi-way-junctions Figure 2a) The structuralelements are assigned energies which add up to thetotal folding energy of the structure (Figure 2b) Theenergy values are established empirically and typicallycome from systematic melting experiments on smallsynthetic RNAs An up-to-date set of energy parame-ters is maintained by Douglas Turnerrsquos lab1314

A dynamic programming algorithm to find theminimum free energy (MFE) for this more complexenergy model was proposed by Zuker and Stiegler15

and forms the basis for modern prediction pro-grams The most common implementations used areUNAFold16 RNAfold of the Vienna RNA package17

and RNAstructure18 The accuracy of MFE predic-tions depend on the type of RNA Although someRNAs can be predicted with high accuracy in gen-eral one has to expect that roughly a third of thepredicted base pairs are wrong and one third of truebase pairs are missed It is thus important to keep inmind that even the best currently available predictionmethods only give a rough model of the structure Ter-tiary interactions protein context and other inherentlimitations of the energy model are all sources ofpotential errors

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(a)

(b)

S

S

SSSSS

S

S S S S

B Bulge

FIGURE 2 | Principles of RNA structure prediction (a) RNA secondary structure can be represented as an outerplanar graph (right) The backboneis arranged as a circle and base pairs are represented as arcs The faces of this graph correspond to different structural elements This formalization isthe basis for most structure prediction algorithms Any structure can be uniquely decomposed into these basic elements which are independent fromeach other This allows for efficient folding algorithms based on the lsquodynamic programmingrsquo principle that breaks down the problem into smallersubproblems (see also Box 1) (b) Example of energy evaluation of a small RNA structure Thermodynamic folding algorithms assign free energies tothe structural elements In the example shown two stacks and a symmetric interior loop stabilize the structure (negative free energy) while thehairpin loop destabilizes the structure (positive free energy) The total free energy of the structure is the sum of the energy of all structural elements

BOX 1

DYNAMIC PROGRAMMING

The Dynamic Programming (DP) paradigm is usedfor many algorithms related to RNA folding DPbreaks down a problem in smaller subproblemsto find the overall solution efficiently Nussinovrsquosalgorithm is a classical example of a DPalgorithm Let us assume we want to find theminimum free energy Eij between the positionsi and j of a sequence and already know thesolution for a sequence from i + 1 to j ie asequence that is one base shorter The new basei can either be unpaired or forms a base-pairwith some position k

i jj i i+1 j i i+1 kminus1 k k+1|=

The base-pair k divides the problem intotwo smaller sub-problems namely finding thesolution for Ei+1kminus1 and Ek+1j We thus can findthe solution using a recursive algorithm

Eij = min

Ei+1j mini+1leklej

Ei+1kminus1 + Ek+1j + βik

βik is the energy contribution for the base-pair i k in this simplified energy model

At room temperature RNAs usually exist inan ensemble of different structures and the MFEstructure is not necessarily the biologically relevantstructure There are different algorithms to predictsuboptimal structures close to the MFE structure1920

McCaskillrsquos algorithm21 allows one to calculate thepartition function over all possible structures andsubsequently the probability of a particular basepair in the thermodynamic ensemble Consideringthe pair-probability matrix of all possible base pairsgives a more comprehensive view of the structuralproperties of an RNA than just the MFE predictionIt is also possible to obtain individual structurepredictions from the pair-probability matrix either bysampling22 or by finding a structure that maximizesthe expected accuracy considering a weighting factorbetween sensitivity and specificity2324

RNA Folding Using Probabilistic ModelsAn alternative to the thermodynamic approach ofRNA folding is a probabilistic approach based onmachine-learning principles Instead of using energyparameters folding parameters can be estimatedfrom a training set of known structures and areused to predict structures of unknown sequencesThere are several probabilistic frameworks to accom-plish parameter estimation and prediction Stochasticcontext-free grammars (SCFGs) are a generalization

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

BOX 2

STOCHASTIC CONTEXT FREE GRAMMARS

Context free grammars (CFG) are concepts fromformal language theory In most simple termsa CFG is a set production rules V rarr w where Vrepresent a so-called nonterminal symbol thatproduces a string of terminal or non-terminalsymbols w An example of a simple RNA gram-mar would be S rarr aSa|aS|Sa|SS|ε The gram-mar has one type of nonterminal symbol Sand one type of terminal symbols a isin A C G Trepresenting the bases The grammar consistsof production rules for unpaired and pairedbases (aa represent two complementary bases)This simple rules allow to produce all possibleRNA secondary structures A stochastic contextfree grammar (SCFG) extends CFGs by assign-ing probabilities to all production rules In thecase of our RNA grammar a full parametrizedSCFG would thus describe the probability dis-tribution over all structures and sequences

of Hidden Markov models that are widely used inbioinformatics (Box 2) SCFGs allow one to con-sider nested dependencies a prerequisite to modelRNA structure They have been used successfully forhomology search problems and consensus structureprediction (see below) Although SCFGs can be usedfor single sequence structure prediction25 they arenot widely used for this problem CONTRAFOLD23 analternative machine learning approach based on con-ditional random fields however could establish itselfas a serious alternative to thermodynamic methodsThere are also hybrid approaches that try to enhancethe thermodynamic parameters by training on knownstructures2627

Incorporating Structure Probing Data intoFolding AlgorithmsStructure probing experiments typically use enzymaticor chemical agents that specifically target paired orunpaired regions28 Most implementations of thermo-dynamic folding like UNAfold or RNAfold allowfor the incorporation of this type of information byrestricting the folding to structures consistent with theexperimental constraints As an alternative exper-imental information can also be incorporated aslsquopseudo-energiesrsquo into the folding algorithm enforc-ing regions to be preferentially paired or unpairedreflecting the experimental evidence29ndash31 Recentlyhigh-throughput sequencing techniques were used toscale up structure probing experiments massively32ndash34

(a)

(b)

FIGURE 3 | Principles of comparative analysis for RNA structureprediction (a) A short sequence that can fold into a hairpin is aligned tothree other sequences with different mutation patterns Mutated basesare indicated by a lightning symbol The affected base pair is shown inblue green and red for the case of a consistent mutation acompensatory double mutation or a inconsistent mutation thatdestroys the base pair (b) Sequence based alignment versus structuralalignment Consensus structure predicted for two aligned sequencesFirst the alignment is optimized to match the sequences resulting in apoor consensus structure with few conserved base-pairs (green)Second the two sequences are aligned to optimize a common structureresulting in a much better consensus structure with more conservedpairs (All structures are shown in lsquodotbracketrsquo notation in whichbase-pairs are indicated by brackets and unpaired positions are shownas dots)

Dealing with inherently noisy data of this type turnedout to be challenging and is still an active field ofresearch

Secondary Structure Prediction for HomologousSequencesAnother way to improve secondary structure predic-tion is to consider homologous sequences from relatedspecies If two or more sequences share a commonstructure but have diverged on the sequence leveltypical base substitution patterns that maintain thecommon structure can be observed (Figure 3a) A con-sistent mutation changes one base (eg AmiddotU harr GmiddotU)while compensatory mutations change both bases inthe base pair (eg AmiddotU harr GmiddotC or CmiddotG harr GmiddotC)Clearly these patterns provide useful information toinfer a secondary structure

The simplest way to exploit this signal isto calculate a mutual information score to findcolumns that show highly correlated mutation pat-terns This method led to surprisingly accurate struc-tures for rRNAs as early as 30 years ago35 Aswe rarely have such large datasets of RNAs withextremely conserved structures it is natural to combinecovariance analysis with classical folding algorithms

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences

Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail

Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis

The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions

Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44

PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm

Approaches following this idea are LocARNA45

foldalignM46 RAF47 and LocARNA-P48 All these

tools additionally employ sparsity in the structureensemble of the single sequences

The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49

Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing

Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50

Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences

In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53

HotKnots54 KnotSeeker55 and IPknot56 ILM53

applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58

TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences

Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity

Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52

predicts only canonical pseudoknots

(a)

(b)

(c)

(d)

(e)

but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63

(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically

Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are

no empirical energy parameters for pseudoknottedstructure elements

Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365

Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167

Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward

There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(a) (b)

FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)

example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69

predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70

RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions

Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular

base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed

Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72

and RNAduplex73

A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)

The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75

Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets

Finally several approaches predict con-served interactions between multiple sequencealignments7879

Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems

(a)

(b)

(c)

(d)

Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83

is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases

ANNOTATING ncRNAS IN GENOMICDATA

Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs

De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function

As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 2: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

FIGURE 1 | Outline of the main topics covered in this article Many topics overlap depend on each other or share similar concepts The mostimportant of these interconnections are shown by arrows

undertaking Computational methodsmdasheither com-pletely de novo or assisted by experimental datamdasharetherefore routinely used to predict structure modelsA strong focus lies on the prediction of the secondarystructure ie the pattern of intramolecular base-pairs(AmiddotU GmiddotC and GmiddotU) typically formed in RNAs

Secondary StructureThermodynamic Folding of Single SequencesIn RNAs the secondary structural elements areresponsible for most of the overall folding energy andcan be seen as a coarse-grained approximation of thetertiary structure This important biophysical propertyin combination with the fact that secondary structurecan easily be formalized as a simple graph (Figure 2)led to secondary structure being widely studiedearly on One of the first attempts to approach theRNA folding problem (ie predicting the secondarystructure from the primary sequence) was by Nussinovand Jacobson12 They proposed an algorithm to findthe secondary structure with the maximum numberof base pairs It is one of the classical examples ofdynamic programming algorithms in computationalbiology and all modern variants of folding algorithmessentially use the same principle (Box 1)

In practice however finding the structure withthe maximum number of pairs does not give accu-rate results Ideally we want to find the struc-ture of minimum folding energy Since most of thefolding energy in RNAs is contributed by stacking

interactions between neighboring base pairs countingsingle base pairs is not sufficient Therefore cur-rent folding algorithms use a lsquonearest neighborrsquo oralso called lsquoloop-basedrsquo energy model A structureis uniquely decomposed into substructural elements(stacked bases hairpin-loops bulges interior-loopsand multi-way-junctions Figure 2a) The structuralelements are assigned energies which add up to thetotal folding energy of the structure (Figure 2b) Theenergy values are established empirically and typicallycome from systematic melting experiments on smallsynthetic RNAs An up-to-date set of energy parame-ters is maintained by Douglas Turnerrsquos lab1314

A dynamic programming algorithm to find theminimum free energy (MFE) for this more complexenergy model was proposed by Zuker and Stiegler15

and forms the basis for modern prediction pro-grams The most common implementations used areUNAFold16 RNAfold of the Vienna RNA package17

and RNAstructure18 The accuracy of MFE predic-tions depend on the type of RNA Although someRNAs can be predicted with high accuracy in gen-eral one has to expect that roughly a third of thepredicted base pairs are wrong and one third of truebase pairs are missed It is thus important to keep inmind that even the best currently available predictionmethods only give a rough model of the structure Ter-tiary interactions protein context and other inherentlimitations of the energy model are all sources ofpotential errors

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(a)

(b)

S

S

SSSSS

S

S S S S

B Bulge

FIGURE 2 | Principles of RNA structure prediction (a) RNA secondary structure can be represented as an outerplanar graph (right) The backboneis arranged as a circle and base pairs are represented as arcs The faces of this graph correspond to different structural elements This formalization isthe basis for most structure prediction algorithms Any structure can be uniquely decomposed into these basic elements which are independent fromeach other This allows for efficient folding algorithms based on the lsquodynamic programmingrsquo principle that breaks down the problem into smallersubproblems (see also Box 1) (b) Example of energy evaluation of a small RNA structure Thermodynamic folding algorithms assign free energies tothe structural elements In the example shown two stacks and a symmetric interior loop stabilize the structure (negative free energy) while thehairpin loop destabilizes the structure (positive free energy) The total free energy of the structure is the sum of the energy of all structural elements

BOX 1

DYNAMIC PROGRAMMING

The Dynamic Programming (DP) paradigm is usedfor many algorithms related to RNA folding DPbreaks down a problem in smaller subproblemsto find the overall solution efficiently Nussinovrsquosalgorithm is a classical example of a DPalgorithm Let us assume we want to find theminimum free energy Eij between the positionsi and j of a sequence and already know thesolution for a sequence from i + 1 to j ie asequence that is one base shorter The new basei can either be unpaired or forms a base-pairwith some position k

i jj i i+1 j i i+1 kminus1 k k+1|=

The base-pair k divides the problem intotwo smaller sub-problems namely finding thesolution for Ei+1kminus1 and Ek+1j We thus can findthe solution using a recursive algorithm

Eij = min

Ei+1j mini+1leklej

Ei+1kminus1 + Ek+1j + βik

βik is the energy contribution for the base-pair i k in this simplified energy model

At room temperature RNAs usually exist inan ensemble of different structures and the MFEstructure is not necessarily the biologically relevantstructure There are different algorithms to predictsuboptimal structures close to the MFE structure1920

McCaskillrsquos algorithm21 allows one to calculate thepartition function over all possible structures andsubsequently the probability of a particular basepair in the thermodynamic ensemble Consideringthe pair-probability matrix of all possible base pairsgives a more comprehensive view of the structuralproperties of an RNA than just the MFE predictionIt is also possible to obtain individual structurepredictions from the pair-probability matrix either bysampling22 or by finding a structure that maximizesthe expected accuracy considering a weighting factorbetween sensitivity and specificity2324

RNA Folding Using Probabilistic ModelsAn alternative to the thermodynamic approach ofRNA folding is a probabilistic approach based onmachine-learning principles Instead of using energyparameters folding parameters can be estimatedfrom a training set of known structures and areused to predict structures of unknown sequencesThere are several probabilistic frameworks to accom-plish parameter estimation and prediction Stochasticcontext-free grammars (SCFGs) are a generalization

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

BOX 2

STOCHASTIC CONTEXT FREE GRAMMARS

Context free grammars (CFG) are concepts fromformal language theory In most simple termsa CFG is a set production rules V rarr w where Vrepresent a so-called nonterminal symbol thatproduces a string of terminal or non-terminalsymbols w An example of a simple RNA gram-mar would be S rarr aSa|aS|Sa|SS|ε The gram-mar has one type of nonterminal symbol Sand one type of terminal symbols a isin A C G Trepresenting the bases The grammar consistsof production rules for unpaired and pairedbases (aa represent two complementary bases)This simple rules allow to produce all possibleRNA secondary structures A stochastic contextfree grammar (SCFG) extends CFGs by assign-ing probabilities to all production rules In thecase of our RNA grammar a full parametrizedSCFG would thus describe the probability dis-tribution over all structures and sequences

of Hidden Markov models that are widely used inbioinformatics (Box 2) SCFGs allow one to con-sider nested dependencies a prerequisite to modelRNA structure They have been used successfully forhomology search problems and consensus structureprediction (see below) Although SCFGs can be usedfor single sequence structure prediction25 they arenot widely used for this problem CONTRAFOLD23 analternative machine learning approach based on con-ditional random fields however could establish itselfas a serious alternative to thermodynamic methodsThere are also hybrid approaches that try to enhancethe thermodynamic parameters by training on knownstructures2627

Incorporating Structure Probing Data intoFolding AlgorithmsStructure probing experiments typically use enzymaticor chemical agents that specifically target paired orunpaired regions28 Most implementations of thermo-dynamic folding like UNAfold or RNAfold allowfor the incorporation of this type of information byrestricting the folding to structures consistent with theexperimental constraints As an alternative exper-imental information can also be incorporated aslsquopseudo-energiesrsquo into the folding algorithm enforc-ing regions to be preferentially paired or unpairedreflecting the experimental evidence29ndash31 Recentlyhigh-throughput sequencing techniques were used toscale up structure probing experiments massively32ndash34

(a)

(b)

FIGURE 3 | Principles of comparative analysis for RNA structureprediction (a) A short sequence that can fold into a hairpin is aligned tothree other sequences with different mutation patterns Mutated basesare indicated by a lightning symbol The affected base pair is shown inblue green and red for the case of a consistent mutation acompensatory double mutation or a inconsistent mutation thatdestroys the base pair (b) Sequence based alignment versus structuralalignment Consensus structure predicted for two aligned sequencesFirst the alignment is optimized to match the sequences resulting in apoor consensus structure with few conserved base-pairs (green)Second the two sequences are aligned to optimize a common structureresulting in a much better consensus structure with more conservedpairs (All structures are shown in lsquodotbracketrsquo notation in whichbase-pairs are indicated by brackets and unpaired positions are shownas dots)

Dealing with inherently noisy data of this type turnedout to be challenging and is still an active field ofresearch

Secondary Structure Prediction for HomologousSequencesAnother way to improve secondary structure predic-tion is to consider homologous sequences from relatedspecies If two or more sequences share a commonstructure but have diverged on the sequence leveltypical base substitution patterns that maintain thecommon structure can be observed (Figure 3a) A con-sistent mutation changes one base (eg AmiddotU harr GmiddotU)while compensatory mutations change both bases inthe base pair (eg AmiddotU harr GmiddotC or CmiddotG harr GmiddotC)Clearly these patterns provide useful information toinfer a secondary structure

The simplest way to exploit this signal isto calculate a mutual information score to findcolumns that show highly correlated mutation pat-terns This method led to surprisingly accurate struc-tures for rRNAs as early as 30 years ago35 Aswe rarely have such large datasets of RNAs withextremely conserved structures it is natural to combinecovariance analysis with classical folding algorithms

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences

Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail

Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis

The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions

Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44

PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm

Approaches following this idea are LocARNA45

foldalignM46 RAF47 and LocARNA-P48 All these

tools additionally employ sparsity in the structureensemble of the single sequences

The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49

Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing

Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50

Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences

In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53

HotKnots54 KnotSeeker55 and IPknot56 ILM53

applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58

TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences

Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity

Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52

predicts only canonical pseudoknots

(a)

(b)

(c)

(d)

(e)

but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63

(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically

Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are

no empirical energy parameters for pseudoknottedstructure elements

Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365

Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167

Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward

There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(a) (b)

FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)

example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69

predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70

RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions

Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular

base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed

Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72

and RNAduplex73

A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)

The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75

Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets

Finally several approaches predict con-served interactions between multiple sequencealignments7879

Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems

(a)

(b)

(c)

(d)

Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83

is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases

ANNOTATING ncRNAS IN GENOMICDATA

Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs

De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function

As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 3: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

(a)

(b)

S

S

SSSSS

S

S S S S

B Bulge

FIGURE 2 | Principles of RNA structure prediction (a) RNA secondary structure can be represented as an outerplanar graph (right) The backboneis arranged as a circle and base pairs are represented as arcs The faces of this graph correspond to different structural elements This formalization isthe basis for most structure prediction algorithms Any structure can be uniquely decomposed into these basic elements which are independent fromeach other This allows for efficient folding algorithms based on the lsquodynamic programmingrsquo principle that breaks down the problem into smallersubproblems (see also Box 1) (b) Example of energy evaluation of a small RNA structure Thermodynamic folding algorithms assign free energies tothe structural elements In the example shown two stacks and a symmetric interior loop stabilize the structure (negative free energy) while thehairpin loop destabilizes the structure (positive free energy) The total free energy of the structure is the sum of the energy of all structural elements

BOX 1

DYNAMIC PROGRAMMING

The Dynamic Programming (DP) paradigm is usedfor many algorithms related to RNA folding DPbreaks down a problem in smaller subproblemsto find the overall solution efficiently Nussinovrsquosalgorithm is a classical example of a DPalgorithm Let us assume we want to find theminimum free energy Eij between the positionsi and j of a sequence and already know thesolution for a sequence from i + 1 to j ie asequence that is one base shorter The new basei can either be unpaired or forms a base-pairwith some position k

i jj i i+1 j i i+1 kminus1 k k+1|=

The base-pair k divides the problem intotwo smaller sub-problems namely finding thesolution for Ei+1kminus1 and Ek+1j We thus can findthe solution using a recursive algorithm

Eij = min

Ei+1j mini+1leklej

Ei+1kminus1 + Ek+1j + βik

βik is the energy contribution for the base-pair i k in this simplified energy model

At room temperature RNAs usually exist inan ensemble of different structures and the MFEstructure is not necessarily the biologically relevantstructure There are different algorithms to predictsuboptimal structures close to the MFE structure1920

McCaskillrsquos algorithm21 allows one to calculate thepartition function over all possible structures andsubsequently the probability of a particular basepair in the thermodynamic ensemble Consideringthe pair-probability matrix of all possible base pairsgives a more comprehensive view of the structuralproperties of an RNA than just the MFE predictionIt is also possible to obtain individual structurepredictions from the pair-probability matrix either bysampling22 or by finding a structure that maximizesthe expected accuracy considering a weighting factorbetween sensitivity and specificity2324

RNA Folding Using Probabilistic ModelsAn alternative to the thermodynamic approach ofRNA folding is a probabilistic approach based onmachine-learning principles Instead of using energyparameters folding parameters can be estimatedfrom a training set of known structures and areused to predict structures of unknown sequencesThere are several probabilistic frameworks to accom-plish parameter estimation and prediction Stochasticcontext-free grammars (SCFGs) are a generalization

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

BOX 2

STOCHASTIC CONTEXT FREE GRAMMARS

Context free grammars (CFG) are concepts fromformal language theory In most simple termsa CFG is a set production rules V rarr w where Vrepresent a so-called nonterminal symbol thatproduces a string of terminal or non-terminalsymbols w An example of a simple RNA gram-mar would be S rarr aSa|aS|Sa|SS|ε The gram-mar has one type of nonterminal symbol Sand one type of terminal symbols a isin A C G Trepresenting the bases The grammar consistsof production rules for unpaired and pairedbases (aa represent two complementary bases)This simple rules allow to produce all possibleRNA secondary structures A stochastic contextfree grammar (SCFG) extends CFGs by assign-ing probabilities to all production rules In thecase of our RNA grammar a full parametrizedSCFG would thus describe the probability dis-tribution over all structures and sequences

of Hidden Markov models that are widely used inbioinformatics (Box 2) SCFGs allow one to con-sider nested dependencies a prerequisite to modelRNA structure They have been used successfully forhomology search problems and consensus structureprediction (see below) Although SCFGs can be usedfor single sequence structure prediction25 they arenot widely used for this problem CONTRAFOLD23 analternative machine learning approach based on con-ditional random fields however could establish itselfas a serious alternative to thermodynamic methodsThere are also hybrid approaches that try to enhancethe thermodynamic parameters by training on knownstructures2627

Incorporating Structure Probing Data intoFolding AlgorithmsStructure probing experiments typically use enzymaticor chemical agents that specifically target paired orunpaired regions28 Most implementations of thermo-dynamic folding like UNAfold or RNAfold allowfor the incorporation of this type of information byrestricting the folding to structures consistent with theexperimental constraints As an alternative exper-imental information can also be incorporated aslsquopseudo-energiesrsquo into the folding algorithm enforc-ing regions to be preferentially paired or unpairedreflecting the experimental evidence29ndash31 Recentlyhigh-throughput sequencing techniques were used toscale up structure probing experiments massively32ndash34

(a)

(b)

FIGURE 3 | Principles of comparative analysis for RNA structureprediction (a) A short sequence that can fold into a hairpin is aligned tothree other sequences with different mutation patterns Mutated basesare indicated by a lightning symbol The affected base pair is shown inblue green and red for the case of a consistent mutation acompensatory double mutation or a inconsistent mutation thatdestroys the base pair (b) Sequence based alignment versus structuralalignment Consensus structure predicted for two aligned sequencesFirst the alignment is optimized to match the sequences resulting in apoor consensus structure with few conserved base-pairs (green)Second the two sequences are aligned to optimize a common structureresulting in a much better consensus structure with more conservedpairs (All structures are shown in lsquodotbracketrsquo notation in whichbase-pairs are indicated by brackets and unpaired positions are shownas dots)

Dealing with inherently noisy data of this type turnedout to be challenging and is still an active field ofresearch

Secondary Structure Prediction for HomologousSequencesAnother way to improve secondary structure predic-tion is to consider homologous sequences from relatedspecies If two or more sequences share a commonstructure but have diverged on the sequence leveltypical base substitution patterns that maintain thecommon structure can be observed (Figure 3a) A con-sistent mutation changes one base (eg AmiddotU harr GmiddotU)while compensatory mutations change both bases inthe base pair (eg AmiddotU harr GmiddotC or CmiddotG harr GmiddotC)Clearly these patterns provide useful information toinfer a secondary structure

The simplest way to exploit this signal isto calculate a mutual information score to findcolumns that show highly correlated mutation pat-terns This method led to surprisingly accurate struc-tures for rRNAs as early as 30 years ago35 Aswe rarely have such large datasets of RNAs withextremely conserved structures it is natural to combinecovariance analysis with classical folding algorithms

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences

Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail

Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis

The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions

Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44

PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm

Approaches following this idea are LocARNA45

foldalignM46 RAF47 and LocARNA-P48 All these

tools additionally employ sparsity in the structureensemble of the single sequences

The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49

Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing

Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50

Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences

In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53

HotKnots54 KnotSeeker55 and IPknot56 ILM53

applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58

TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences

Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity

Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52

predicts only canonical pseudoknots

(a)

(b)

(c)

(d)

(e)

but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63

(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically

Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are

no empirical energy parameters for pseudoknottedstructure elements

Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365

Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167

Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward

There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(a) (b)

FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)

example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69

predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70

RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions

Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular

base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed

Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72

and RNAduplex73

A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)

The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75

Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets

Finally several approaches predict con-served interactions between multiple sequencealignments7879

Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems

(a)

(b)

(c)

(d)

Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83

is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases

ANNOTATING ncRNAS IN GENOMICDATA

Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs

De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function

As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 4: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

BOX 2

STOCHASTIC CONTEXT FREE GRAMMARS

Context free grammars (CFG) are concepts fromformal language theory In most simple termsa CFG is a set production rules V rarr w where Vrepresent a so-called nonterminal symbol thatproduces a string of terminal or non-terminalsymbols w An example of a simple RNA gram-mar would be S rarr aSa|aS|Sa|SS|ε The gram-mar has one type of nonterminal symbol Sand one type of terminal symbols a isin A C G Trepresenting the bases The grammar consistsof production rules for unpaired and pairedbases (aa represent two complementary bases)This simple rules allow to produce all possibleRNA secondary structures A stochastic contextfree grammar (SCFG) extends CFGs by assign-ing probabilities to all production rules In thecase of our RNA grammar a full parametrizedSCFG would thus describe the probability dis-tribution over all structures and sequences

of Hidden Markov models that are widely used inbioinformatics (Box 2) SCFGs allow one to con-sider nested dependencies a prerequisite to modelRNA structure They have been used successfully forhomology search problems and consensus structureprediction (see below) Although SCFGs can be usedfor single sequence structure prediction25 they arenot widely used for this problem CONTRAFOLD23 analternative machine learning approach based on con-ditional random fields however could establish itselfas a serious alternative to thermodynamic methodsThere are also hybrid approaches that try to enhancethe thermodynamic parameters by training on knownstructures2627

Incorporating Structure Probing Data intoFolding AlgorithmsStructure probing experiments typically use enzymaticor chemical agents that specifically target paired orunpaired regions28 Most implementations of thermo-dynamic folding like UNAfold or RNAfold allowfor the incorporation of this type of information byrestricting the folding to structures consistent with theexperimental constraints As an alternative exper-imental information can also be incorporated aslsquopseudo-energiesrsquo into the folding algorithm enforc-ing regions to be preferentially paired or unpairedreflecting the experimental evidence29ndash31 Recentlyhigh-throughput sequencing techniques were used toscale up structure probing experiments massively32ndash34

(a)

(b)

FIGURE 3 | Principles of comparative analysis for RNA structureprediction (a) A short sequence that can fold into a hairpin is aligned tothree other sequences with different mutation patterns Mutated basesare indicated by a lightning symbol The affected base pair is shown inblue green and red for the case of a consistent mutation acompensatory double mutation or a inconsistent mutation thatdestroys the base pair (b) Sequence based alignment versus structuralalignment Consensus structure predicted for two aligned sequencesFirst the alignment is optimized to match the sequences resulting in apoor consensus structure with few conserved base-pairs (green)Second the two sequences are aligned to optimize a common structureresulting in a much better consensus structure with more conservedpairs (All structures are shown in lsquodotbracketrsquo notation in whichbase-pairs are indicated by brackets and unpaired positions are shownas dots)

Dealing with inherently noisy data of this type turnedout to be challenging and is still an active field ofresearch

Secondary Structure Prediction for HomologousSequencesAnother way to improve secondary structure predic-tion is to consider homologous sequences from relatedspecies If two or more sequences share a commonstructure but have diverged on the sequence leveltypical base substitution patterns that maintain thecommon structure can be observed (Figure 3a) A con-sistent mutation changes one base (eg AmiddotU harr GmiddotU)while compensatory mutations change both bases inthe base pair (eg AmiddotU harr GmiddotC or CmiddotG harr GmiddotC)Clearly these patterns provide useful information toinfer a secondary structure

The simplest way to exploit this signal isto calculate a mutual information score to findcolumns that show highly correlated mutation pat-terns This method led to surprisingly accurate struc-tures for rRNAs as early as 30 years ago35 Aswe rarely have such large datasets of RNAs withextremely conserved structures it is natural to combinecovariance analysis with classical folding algorithms

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences

Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail

Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis

The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions

Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44

PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm

Approaches following this idea are LocARNA45

foldalignM46 RAF47 and LocARNA-P48 All these

tools additionally employ sparsity in the structureensemble of the single sequences

The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49

Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing

Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50

Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences

In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53

HotKnots54 KnotSeeker55 and IPknot56 ILM53

applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58

TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences

Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity

Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52

predicts only canonical pseudoknots

(a)

(b)

(c)

(d)

(e)

but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63

(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically

Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are

no empirical energy parameters for pseudoknottedstructure elements

Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365

Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167

Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward

There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(a) (b)

FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)

example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69

predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70

RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions

Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular

base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed

Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72

and RNAduplex73

A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)

The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75

Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets

Finally several approaches predict con-served interactions between multiple sequencealignments7879

Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems

(a)

(b)

(c)

(d)

Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83

is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases

ANNOTATING ncRNAS IN GENOMICDATA

Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs

De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function

As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 5: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences

Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail

Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis

The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions

Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44

PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm

Approaches following this idea are LocARNA45

foldalignM46 RAF47 and LocARNA-P48 All these

tools additionally employ sparsity in the structureensemble of the single sequences

The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49

Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing

Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50

Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences

In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53

HotKnots54 KnotSeeker55 and IPknot56 ILM53

applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58

TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences

Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity

Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52

predicts only canonical pseudoknots

(a)

(b)

(c)

(d)

(e)

but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63

(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically

Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are

no empirical energy parameters for pseudoknottedstructure elements

Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365

Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167

Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward

There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(a) (b)

FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)

example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69

predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70

RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions

Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular

base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed

Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72

and RNAduplex73

A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)

The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75

Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets

Finally several approaches predict con-served interactions between multiple sequencealignments7879

Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems

(a)

(b)

(c)

(d)

Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83

is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases

ANNOTATING ncRNAS IN GENOMICDATA

Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs

De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function

As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 6: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52

predicts only canonical pseudoknots

(a)

(b)

(c)

(d)

(e)

but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63

(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically

Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are

no empirical energy parameters for pseudoknottedstructure elements

Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365

Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167

Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward

There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(a) (b)

FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)

example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69

predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70

RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions

Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular

base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed

Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72

and RNAduplex73

A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)

The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75

Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets

Finally several approaches predict con-served interactions between multiple sequencealignments7879

Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems

(a)

(b)

(c)

(d)

Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83

is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases

ANNOTATING ncRNAS IN GENOMICDATA

Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs

De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function

As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 7: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

(a) (b)

FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)

example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69

predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70

RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions

Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular

base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed

Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72

and RNAduplex73

A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)

The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75

Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets

Finally several approaches predict con-served interactions between multiple sequencealignments7879

Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems

(a)

(b)

(c)

(d)

Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83

is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases

ANNOTATING ncRNAS IN GENOMICDATA

Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs

De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function

As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 8: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems

(a)

(b)

(c)

(d)

Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83

is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases

ANNOTATING ncRNAS IN GENOMICDATA

Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs

De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function

As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 9: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

(c)

(a)(b)

FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)

of evolutionary signatures in alignments of relatedsequences can improve the signal considerably

The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93

To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95

searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596

and many other genomes (eg Ref 97)

Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization

Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101

A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 10: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models

In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110

signal recognition particle RNAs111

Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112

For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115

found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117

MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS

The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels

Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners

such as TopHat118 GSNAP119 and SpliceMap 120

identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified

Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)

With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121

and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference

In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 11: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms

sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements

Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126

using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood

function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms

RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries

The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 12: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics

miRNAs

miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132

miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137

Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139

Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144

Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs

especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145

and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150

Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153

Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159

miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 13: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)

predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161

DATABASES

There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview

There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162

All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164

RNAdb165 fRNAdb166 lncRNAdb167)

Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated

The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171

CONCLUSIONS AND OUTLOOK

The wide variety of topics covered in this paperreflects the increasing complexity of the field It also

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 14: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements

of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172

ACKNOWLEDGMENTS

This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH

REFERENCES1 Amaral P Dinger M Mercer T Mattick J The

eukaryotic genome as an RNA machine Science 20083191787ndash1789

2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89

3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862

4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816

5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371

6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625

7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300

8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939

9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566

10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682

11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659

12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313

13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940

14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735

15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148

16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 15: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74

18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129

19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165

20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52

21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119

22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166

23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98

24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813

25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571

26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318

27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28

28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304

29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292

30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102

31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272

32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107

33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001

34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068

35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189

36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474

37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428

38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362

39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108

40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439

41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825

42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203

43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908

44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227

45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65

46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932

47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 16: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914

49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012

50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]

51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068

52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104

53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66

54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504

55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640

56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93

57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798

58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911

59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93

60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]

61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62

62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677

63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815

64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539

65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699

66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214

67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42

68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55

69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136

70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101

71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282

72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454

73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188

74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101

75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490

76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182

77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856

78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 17: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219

80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923

81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316

82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]

83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457

84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651

85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178

86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067

87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610

88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605

89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30

90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28

91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091

92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373

93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107

94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459

95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33

96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864

97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292

98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819

99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452

100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125

101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082

102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331

103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735

104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337

105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964

106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108

107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171

108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 18: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164

110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453

111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377

112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927

113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023

114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176

115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48

116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282

117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594

118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111

119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881

120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578

121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510

122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by

RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515

123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829

124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912

125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628

126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847

127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192

128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015

129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22

130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106

131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140

132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355

133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297

134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756

135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403

136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34

137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7

copy 2012 John Wiley amp Sons Ltd

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 19: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

WIREs RNA Computational analysis of noncoding RNAs

138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008

139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263

140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42

141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345

142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879

143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581

144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614

145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917

146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390

147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202

148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910

149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310

150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276

151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322

152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster

at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748

153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39

154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569

155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207

156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207

157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189

158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011

159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234

160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146

161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233

162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15

163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6

164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172

165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130

166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92

copy 2012 John Wiley amp Sons Ltd

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd

Page 20: Advanced Review Computational analysis of noncoding RNAscompbio.mit.edu/publications/79_Washietl_WileyRNA_12.pdf · FIGURE 2| Principles of RNA structure prediction (a) RNA secondary

Advanced Review wireswileycomrna

167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151

168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145

169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157

170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197

171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104

172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946

copy 2012 John Wiley amp Sons Ltd