Stochastic Context Free Grammars CBB 261 for noncoding RNA gene prediction B. Majoros.
Advanced Review Computational analysis of noncoding...
Transcript of Advanced Review Computational analysis of noncoding...
Advanced Review
Computational analysisof noncoding RNAsStefan Washietl12daggerlowast Sebastian Will1dagger David A Hendrix1Loyal A Goff13 John L Rinn23 Bonnie Berger124 andManolis Kellis12
Noncoding RNAs have emerged as important key players in the cell Understand-ing their surprisingly diverse range of functions is challenging for experimental andcomputational biology Here we review computational methods to analyze non-coding RNAs The topics covered include basic and advanced techniques to predictRNA structures annotation of noncoding RNAs in genomic data mining RNA-seqdata for novel transcripts and prediction of transcript structures computationalaspects of microRNAs and database resources copy 2012 John Wiley amp Sons Ltd
How to cite this articleWIREs RNA 2012 doi 101002wrna1134
INTRODUCTION
Noncoding RNAs (ncRNAs) are transcripts thatare not translated to proteins but act as
functional RNAs Several well-known ncRNAs suchas transfer RNAs or ribosomal RNAs can be foundthroughout the tree of life They fulfill centralfunctions in the cell and thus have been studied for along time
However over the past years a few key discov-eries have shown that ncRNAs have a much richerfunctional spectrum than anticipated1 The discoveryof microRNAs (miRNAsmiRs) for example changedour view of how genes are regulated23 Anothersurprising observation revealed by high-throughputmethods is that in human 90 of the genome is tran-scribed at some time in some tissue4 Although the fullextent and functional consequences of this pervasivetranscription remains highly controversial56 the vastamount of transcripts produced suggests that manyimportant ncRNA functions are yet to be discovered
daggerThese authors contributed equallylowastCorrespondence to washmitedu1Computer Science and Artificial Intelligence Laboratory Mas-sachusetts Institute of Technology Cambridge MA USA2Broad Institute of Massachusetts Institute of Technology andHarvard Cambridge MA USA3Department of Stem Cell and Regenerative Biology HarvardUniversity Cambridge MA USA4Mathematics Department Massachusetts Institute of TechnologyCambridge MA USA
In particular long noncoding RNAs (lncR-NAs)mdashtranscripts that can be several kilobases inlength spliced and processed like mRNAs but lackobvious coding potentialmdashseem to be a rich sourceof novel functions7 All these ncRNAs have been sug-gested to form a hidden layer of regulation that isnecessary to establish the complexity of eukaryoticgenomes8
Prokaryotic genomes also contain many sur-prises Riboswitches9 small regulatory RNAs10 orcompletely unknown structured RNAs11 suggest thatncRNAs also form an important functional layer inbacteria
Understanding the function of ncRNAsmdashinparticular in the age of high-throughput experi-mentsmdashis clearly not possible without computationalapproaches Algorithms to annotate organize andfunctionally characterize ncRNAs are of increasingrelevance In this article we give a broad overviewof programs and resources to analyze many differentaspects of ncRNAs (Figure 1)
STRUCTURAL ANALYSIS
For many RNAs there is a close connection betweenstructure and function Having a good model of thestructure of a RNA is thus critical and often the firstclue towards elucidating its function Determining thecomplete three dimensional structure (lsquotertiary struc-turersquo) of an RNA is a tedious and time-consuming
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 1 | Outline of the main topics covered in this article Many topics overlap depend on each other or share similar concepts The mostimportant of these interconnections are shown by arrows
undertaking Computational methodsmdasheither com-pletely de novo or assisted by experimental datamdasharetherefore routinely used to predict structure modelsA strong focus lies on the prediction of the secondarystructure ie the pattern of intramolecular base-pairs(AmiddotU GmiddotC and GmiddotU) typically formed in RNAs
Secondary StructureThermodynamic Folding of Single SequencesIn RNAs the secondary structural elements areresponsible for most of the overall folding energy andcan be seen as a coarse-grained approximation of thetertiary structure This important biophysical propertyin combination with the fact that secondary structurecan easily be formalized as a simple graph (Figure 2)led to secondary structure being widely studiedearly on One of the first attempts to approach theRNA folding problem (ie predicting the secondarystructure from the primary sequence) was by Nussinovand Jacobson12 They proposed an algorithm to findthe secondary structure with the maximum numberof base pairs It is one of the classical examples ofdynamic programming algorithms in computationalbiology and all modern variants of folding algorithmessentially use the same principle (Box 1)
In practice however finding the structure withthe maximum number of pairs does not give accu-rate results Ideally we want to find the struc-ture of minimum folding energy Since most of thefolding energy in RNAs is contributed by stacking
interactions between neighboring base pairs countingsingle base pairs is not sufficient Therefore cur-rent folding algorithms use a lsquonearest neighborrsquo oralso called lsquoloop-basedrsquo energy model A structureis uniquely decomposed into substructural elements(stacked bases hairpin-loops bulges interior-loopsand multi-way-junctions Figure 2a) The structuralelements are assigned energies which add up to thetotal folding energy of the structure (Figure 2b) Theenergy values are established empirically and typicallycome from systematic melting experiments on smallsynthetic RNAs An up-to-date set of energy parame-ters is maintained by Douglas Turnerrsquos lab1314
A dynamic programming algorithm to find theminimum free energy (MFE) for this more complexenergy model was proposed by Zuker and Stiegler15
and forms the basis for modern prediction pro-grams The most common implementations used areUNAFold16 RNAfold of the Vienna RNA package17
and RNAstructure18 The accuracy of MFE predic-tions depend on the type of RNA Although someRNAs can be predicted with high accuracy in gen-eral one has to expect that roughly a third of thepredicted base pairs are wrong and one third of truebase pairs are missed It is thus important to keep inmind that even the best currently available predictionmethods only give a rough model of the structure Ter-tiary interactions protein context and other inherentlimitations of the energy model are all sources ofpotential errors
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a)
(b)
S
S
SSSSS
S
S S S S
B Bulge
FIGURE 2 | Principles of RNA structure prediction (a) RNA secondary structure can be represented as an outerplanar graph (right) The backboneis arranged as a circle and base pairs are represented as arcs The faces of this graph correspond to different structural elements This formalization isthe basis for most structure prediction algorithms Any structure can be uniquely decomposed into these basic elements which are independent fromeach other This allows for efficient folding algorithms based on the lsquodynamic programmingrsquo principle that breaks down the problem into smallersubproblems (see also Box 1) (b) Example of energy evaluation of a small RNA structure Thermodynamic folding algorithms assign free energies tothe structural elements In the example shown two stacks and a symmetric interior loop stabilize the structure (negative free energy) while thehairpin loop destabilizes the structure (positive free energy) The total free energy of the structure is the sum of the energy of all structural elements
BOX 1
DYNAMIC PROGRAMMING
The Dynamic Programming (DP) paradigm is usedfor many algorithms related to RNA folding DPbreaks down a problem in smaller subproblemsto find the overall solution efficiently Nussinovrsquosalgorithm is a classical example of a DPalgorithm Let us assume we want to find theminimum free energy Eij between the positionsi and j of a sequence and already know thesolution for a sequence from i + 1 to j ie asequence that is one base shorter The new basei can either be unpaired or forms a base-pairwith some position k
i jj i i+1 j i i+1 kminus1 k k+1|=
The base-pair k divides the problem intotwo smaller sub-problems namely finding thesolution for Ei+1kminus1 and Ek+1j We thus can findthe solution using a recursive algorithm
Eij = min
Ei+1j mini+1leklej
Ei+1kminus1 + Ek+1j + βik
βik is the energy contribution for the base-pair i k in this simplified energy model
At room temperature RNAs usually exist inan ensemble of different structures and the MFEstructure is not necessarily the biologically relevantstructure There are different algorithms to predictsuboptimal structures close to the MFE structure1920
McCaskillrsquos algorithm21 allows one to calculate thepartition function over all possible structures andsubsequently the probability of a particular basepair in the thermodynamic ensemble Consideringthe pair-probability matrix of all possible base pairsgives a more comprehensive view of the structuralproperties of an RNA than just the MFE predictionIt is also possible to obtain individual structurepredictions from the pair-probability matrix either bysampling22 or by finding a structure that maximizesthe expected accuracy considering a weighting factorbetween sensitivity and specificity2324
RNA Folding Using Probabilistic ModelsAn alternative to the thermodynamic approach ofRNA folding is a probabilistic approach based onmachine-learning principles Instead of using energyparameters folding parameters can be estimatedfrom a training set of known structures and areused to predict structures of unknown sequencesThere are several probabilistic frameworks to accom-plish parameter estimation and prediction Stochasticcontext-free grammars (SCFGs) are a generalization
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
BOX 2
STOCHASTIC CONTEXT FREE GRAMMARS
Context free grammars (CFG) are concepts fromformal language theory In most simple termsa CFG is a set production rules V rarr w where Vrepresent a so-called nonterminal symbol thatproduces a string of terminal or non-terminalsymbols w An example of a simple RNA gram-mar would be S rarr aSa|aS|Sa|SS|ε The gram-mar has one type of nonterminal symbol Sand one type of terminal symbols a isin A C G Trepresenting the bases The grammar consistsof production rules for unpaired and pairedbases (aa represent two complementary bases)This simple rules allow to produce all possibleRNA secondary structures A stochastic contextfree grammar (SCFG) extends CFGs by assign-ing probabilities to all production rules In thecase of our RNA grammar a full parametrizedSCFG would thus describe the probability dis-tribution over all structures and sequences
of Hidden Markov models that are widely used inbioinformatics (Box 2) SCFGs allow one to con-sider nested dependencies a prerequisite to modelRNA structure They have been used successfully forhomology search problems and consensus structureprediction (see below) Although SCFGs can be usedfor single sequence structure prediction25 they arenot widely used for this problem CONTRAFOLD23 analternative machine learning approach based on con-ditional random fields however could establish itselfas a serious alternative to thermodynamic methodsThere are also hybrid approaches that try to enhancethe thermodynamic parameters by training on knownstructures2627
Incorporating Structure Probing Data intoFolding AlgorithmsStructure probing experiments typically use enzymaticor chemical agents that specifically target paired orunpaired regions28 Most implementations of thermo-dynamic folding like UNAfold or RNAfold allowfor the incorporation of this type of information byrestricting the folding to structures consistent with theexperimental constraints As an alternative exper-imental information can also be incorporated aslsquopseudo-energiesrsquo into the folding algorithm enforc-ing regions to be preferentially paired or unpairedreflecting the experimental evidence29ndash31 Recentlyhigh-throughput sequencing techniques were used toscale up structure probing experiments massively32ndash34
(a)
(b)
FIGURE 3 | Principles of comparative analysis for RNA structureprediction (a) A short sequence that can fold into a hairpin is aligned tothree other sequences with different mutation patterns Mutated basesare indicated by a lightning symbol The affected base pair is shown inblue green and red for the case of a consistent mutation acompensatory double mutation or a inconsistent mutation thatdestroys the base pair (b) Sequence based alignment versus structuralalignment Consensus structure predicted for two aligned sequencesFirst the alignment is optimized to match the sequences resulting in apoor consensus structure with few conserved base-pairs (green)Second the two sequences are aligned to optimize a common structureresulting in a much better consensus structure with more conservedpairs (All structures are shown in lsquodotbracketrsquo notation in whichbase-pairs are indicated by brackets and unpaired positions are shownas dots)
Dealing with inherently noisy data of this type turnedout to be challenging and is still an active field ofresearch
Secondary Structure Prediction for HomologousSequencesAnother way to improve secondary structure predic-tion is to consider homologous sequences from relatedspecies If two or more sequences share a commonstructure but have diverged on the sequence leveltypical base substitution patterns that maintain thecommon structure can be observed (Figure 3a) A con-sistent mutation changes one base (eg AmiddotU harr GmiddotU)while compensatory mutations change both bases inthe base pair (eg AmiddotU harr GmiddotC or CmiddotG harr GmiddotC)Clearly these patterns provide useful information toinfer a secondary structure
The simplest way to exploit this signal isto calculate a mutual information score to findcolumns that show highly correlated mutation pat-terns This method led to surprisingly accurate struc-tures for rRNAs as early as 30 years ago35 Aswe rarely have such large datasets of RNAs withextremely conserved structures it is natural to combinecovariance analysis with classical folding algorithms
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences
Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail
Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis
The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions
Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44
PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm
Approaches following this idea are LocARNA45
foldalignM46 RAF47 and LocARNA-P48 All these
tools additionally employ sparsity in the structureensemble of the single sequences
The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49
Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing
Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50
Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences
In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53
HotKnots54 KnotSeeker55 and IPknot56 ILM53
applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58
TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences
Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity
Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52
predicts only canonical pseudoknots
(a)
(b)
(c)
(d)
(e)
but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63
(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically
Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are
no empirical energy parameters for pseudoknottedstructure elements
Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365
Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167
Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward
There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a) (b)
FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)
example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69
predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70
RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions
Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular
base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed
Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72
and RNAduplex73
A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)
The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75
Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets
Finally several approaches predict con-served interactions between multiple sequencealignments7879
Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems
(a)
(b)
(c)
(d)
Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83
is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases
ANNOTATING ncRNAS IN GENOMICDATA
Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs
De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function
As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 1 | Outline of the main topics covered in this article Many topics overlap depend on each other or share similar concepts The mostimportant of these interconnections are shown by arrows
undertaking Computational methodsmdasheither com-pletely de novo or assisted by experimental datamdasharetherefore routinely used to predict structure modelsA strong focus lies on the prediction of the secondarystructure ie the pattern of intramolecular base-pairs(AmiddotU GmiddotC and GmiddotU) typically formed in RNAs
Secondary StructureThermodynamic Folding of Single SequencesIn RNAs the secondary structural elements areresponsible for most of the overall folding energy andcan be seen as a coarse-grained approximation of thetertiary structure This important biophysical propertyin combination with the fact that secondary structurecan easily be formalized as a simple graph (Figure 2)led to secondary structure being widely studiedearly on One of the first attempts to approach theRNA folding problem (ie predicting the secondarystructure from the primary sequence) was by Nussinovand Jacobson12 They proposed an algorithm to findthe secondary structure with the maximum numberof base pairs It is one of the classical examples ofdynamic programming algorithms in computationalbiology and all modern variants of folding algorithmessentially use the same principle (Box 1)
In practice however finding the structure withthe maximum number of pairs does not give accu-rate results Ideally we want to find the struc-ture of minimum folding energy Since most of thefolding energy in RNAs is contributed by stacking
interactions between neighboring base pairs countingsingle base pairs is not sufficient Therefore cur-rent folding algorithms use a lsquonearest neighborrsquo oralso called lsquoloop-basedrsquo energy model A structureis uniquely decomposed into substructural elements(stacked bases hairpin-loops bulges interior-loopsand multi-way-junctions Figure 2a) The structuralelements are assigned energies which add up to thetotal folding energy of the structure (Figure 2b) Theenergy values are established empirically and typicallycome from systematic melting experiments on smallsynthetic RNAs An up-to-date set of energy parame-ters is maintained by Douglas Turnerrsquos lab1314
A dynamic programming algorithm to find theminimum free energy (MFE) for this more complexenergy model was proposed by Zuker and Stiegler15
and forms the basis for modern prediction pro-grams The most common implementations used areUNAFold16 RNAfold of the Vienna RNA package17
and RNAstructure18 The accuracy of MFE predic-tions depend on the type of RNA Although someRNAs can be predicted with high accuracy in gen-eral one has to expect that roughly a third of thepredicted base pairs are wrong and one third of truebase pairs are missed It is thus important to keep inmind that even the best currently available predictionmethods only give a rough model of the structure Ter-tiary interactions protein context and other inherentlimitations of the energy model are all sources ofpotential errors
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a)
(b)
S
S
SSSSS
S
S S S S
B Bulge
FIGURE 2 | Principles of RNA structure prediction (a) RNA secondary structure can be represented as an outerplanar graph (right) The backboneis arranged as a circle and base pairs are represented as arcs The faces of this graph correspond to different structural elements This formalization isthe basis for most structure prediction algorithms Any structure can be uniquely decomposed into these basic elements which are independent fromeach other This allows for efficient folding algorithms based on the lsquodynamic programmingrsquo principle that breaks down the problem into smallersubproblems (see also Box 1) (b) Example of energy evaluation of a small RNA structure Thermodynamic folding algorithms assign free energies tothe structural elements In the example shown two stacks and a symmetric interior loop stabilize the structure (negative free energy) while thehairpin loop destabilizes the structure (positive free energy) The total free energy of the structure is the sum of the energy of all structural elements
BOX 1
DYNAMIC PROGRAMMING
The Dynamic Programming (DP) paradigm is usedfor many algorithms related to RNA folding DPbreaks down a problem in smaller subproblemsto find the overall solution efficiently Nussinovrsquosalgorithm is a classical example of a DPalgorithm Let us assume we want to find theminimum free energy Eij between the positionsi and j of a sequence and already know thesolution for a sequence from i + 1 to j ie asequence that is one base shorter The new basei can either be unpaired or forms a base-pairwith some position k
i jj i i+1 j i i+1 kminus1 k k+1|=
The base-pair k divides the problem intotwo smaller sub-problems namely finding thesolution for Ei+1kminus1 and Ek+1j We thus can findthe solution using a recursive algorithm
Eij = min
Ei+1j mini+1leklej
Ei+1kminus1 + Ek+1j + βik
βik is the energy contribution for the base-pair i k in this simplified energy model
At room temperature RNAs usually exist inan ensemble of different structures and the MFEstructure is not necessarily the biologically relevantstructure There are different algorithms to predictsuboptimal structures close to the MFE structure1920
McCaskillrsquos algorithm21 allows one to calculate thepartition function over all possible structures andsubsequently the probability of a particular basepair in the thermodynamic ensemble Consideringthe pair-probability matrix of all possible base pairsgives a more comprehensive view of the structuralproperties of an RNA than just the MFE predictionIt is also possible to obtain individual structurepredictions from the pair-probability matrix either bysampling22 or by finding a structure that maximizesthe expected accuracy considering a weighting factorbetween sensitivity and specificity2324
RNA Folding Using Probabilistic ModelsAn alternative to the thermodynamic approach ofRNA folding is a probabilistic approach based onmachine-learning principles Instead of using energyparameters folding parameters can be estimatedfrom a training set of known structures and areused to predict structures of unknown sequencesThere are several probabilistic frameworks to accom-plish parameter estimation and prediction Stochasticcontext-free grammars (SCFGs) are a generalization
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
BOX 2
STOCHASTIC CONTEXT FREE GRAMMARS
Context free grammars (CFG) are concepts fromformal language theory In most simple termsa CFG is a set production rules V rarr w where Vrepresent a so-called nonterminal symbol thatproduces a string of terminal or non-terminalsymbols w An example of a simple RNA gram-mar would be S rarr aSa|aS|Sa|SS|ε The gram-mar has one type of nonterminal symbol Sand one type of terminal symbols a isin A C G Trepresenting the bases The grammar consistsof production rules for unpaired and pairedbases (aa represent two complementary bases)This simple rules allow to produce all possibleRNA secondary structures A stochastic contextfree grammar (SCFG) extends CFGs by assign-ing probabilities to all production rules In thecase of our RNA grammar a full parametrizedSCFG would thus describe the probability dis-tribution over all structures and sequences
of Hidden Markov models that are widely used inbioinformatics (Box 2) SCFGs allow one to con-sider nested dependencies a prerequisite to modelRNA structure They have been used successfully forhomology search problems and consensus structureprediction (see below) Although SCFGs can be usedfor single sequence structure prediction25 they arenot widely used for this problem CONTRAFOLD23 analternative machine learning approach based on con-ditional random fields however could establish itselfas a serious alternative to thermodynamic methodsThere are also hybrid approaches that try to enhancethe thermodynamic parameters by training on knownstructures2627
Incorporating Structure Probing Data intoFolding AlgorithmsStructure probing experiments typically use enzymaticor chemical agents that specifically target paired orunpaired regions28 Most implementations of thermo-dynamic folding like UNAfold or RNAfold allowfor the incorporation of this type of information byrestricting the folding to structures consistent with theexperimental constraints As an alternative exper-imental information can also be incorporated aslsquopseudo-energiesrsquo into the folding algorithm enforc-ing regions to be preferentially paired or unpairedreflecting the experimental evidence29ndash31 Recentlyhigh-throughput sequencing techniques were used toscale up structure probing experiments massively32ndash34
(a)
(b)
FIGURE 3 | Principles of comparative analysis for RNA structureprediction (a) A short sequence that can fold into a hairpin is aligned tothree other sequences with different mutation patterns Mutated basesare indicated by a lightning symbol The affected base pair is shown inblue green and red for the case of a consistent mutation acompensatory double mutation or a inconsistent mutation thatdestroys the base pair (b) Sequence based alignment versus structuralalignment Consensus structure predicted for two aligned sequencesFirst the alignment is optimized to match the sequences resulting in apoor consensus structure with few conserved base-pairs (green)Second the two sequences are aligned to optimize a common structureresulting in a much better consensus structure with more conservedpairs (All structures are shown in lsquodotbracketrsquo notation in whichbase-pairs are indicated by brackets and unpaired positions are shownas dots)
Dealing with inherently noisy data of this type turnedout to be challenging and is still an active field ofresearch
Secondary Structure Prediction for HomologousSequencesAnother way to improve secondary structure predic-tion is to consider homologous sequences from relatedspecies If two or more sequences share a commonstructure but have diverged on the sequence leveltypical base substitution patterns that maintain thecommon structure can be observed (Figure 3a) A con-sistent mutation changes one base (eg AmiddotU harr GmiddotU)while compensatory mutations change both bases inthe base pair (eg AmiddotU harr GmiddotC or CmiddotG harr GmiddotC)Clearly these patterns provide useful information toinfer a secondary structure
The simplest way to exploit this signal isto calculate a mutual information score to findcolumns that show highly correlated mutation pat-terns This method led to surprisingly accurate struc-tures for rRNAs as early as 30 years ago35 Aswe rarely have such large datasets of RNAs withextremely conserved structures it is natural to combinecovariance analysis with classical folding algorithms
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences
Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail
Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis
The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions
Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44
PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm
Approaches following this idea are LocARNA45
foldalignM46 RAF47 and LocARNA-P48 All these
tools additionally employ sparsity in the structureensemble of the single sequences
The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49
Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing
Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50
Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences
In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53
HotKnots54 KnotSeeker55 and IPknot56 ILM53
applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58
TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences
Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity
Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52
predicts only canonical pseudoknots
(a)
(b)
(c)
(d)
(e)
but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63
(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically
Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are
no empirical energy parameters for pseudoknottedstructure elements
Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365
Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167
Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward
There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a) (b)
FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)
example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69
predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70
RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions
Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular
base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed
Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72
and RNAduplex73
A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)
The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75
Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets
Finally several approaches predict con-served interactions between multiple sequencealignments7879
Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems
(a)
(b)
(c)
(d)
Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83
is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases
ANNOTATING ncRNAS IN GENOMICDATA
Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs
De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function
As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a)
(b)
S
S
SSSSS
S
S S S S
B Bulge
FIGURE 2 | Principles of RNA structure prediction (a) RNA secondary structure can be represented as an outerplanar graph (right) The backboneis arranged as a circle and base pairs are represented as arcs The faces of this graph correspond to different structural elements This formalization isthe basis for most structure prediction algorithms Any structure can be uniquely decomposed into these basic elements which are independent fromeach other This allows for efficient folding algorithms based on the lsquodynamic programmingrsquo principle that breaks down the problem into smallersubproblems (see also Box 1) (b) Example of energy evaluation of a small RNA structure Thermodynamic folding algorithms assign free energies tothe structural elements In the example shown two stacks and a symmetric interior loop stabilize the structure (negative free energy) while thehairpin loop destabilizes the structure (positive free energy) The total free energy of the structure is the sum of the energy of all structural elements
BOX 1
DYNAMIC PROGRAMMING
The Dynamic Programming (DP) paradigm is usedfor many algorithms related to RNA folding DPbreaks down a problem in smaller subproblemsto find the overall solution efficiently Nussinovrsquosalgorithm is a classical example of a DPalgorithm Let us assume we want to find theminimum free energy Eij between the positionsi and j of a sequence and already know thesolution for a sequence from i + 1 to j ie asequence that is one base shorter The new basei can either be unpaired or forms a base-pairwith some position k
i jj i i+1 j i i+1 kminus1 k k+1|=
The base-pair k divides the problem intotwo smaller sub-problems namely finding thesolution for Ei+1kminus1 and Ek+1j We thus can findthe solution using a recursive algorithm
Eij = min
Ei+1j mini+1leklej
Ei+1kminus1 + Ek+1j + βik
βik is the energy contribution for the base-pair i k in this simplified energy model
At room temperature RNAs usually exist inan ensemble of different structures and the MFEstructure is not necessarily the biologically relevantstructure There are different algorithms to predictsuboptimal structures close to the MFE structure1920
McCaskillrsquos algorithm21 allows one to calculate thepartition function over all possible structures andsubsequently the probability of a particular basepair in the thermodynamic ensemble Consideringthe pair-probability matrix of all possible base pairsgives a more comprehensive view of the structuralproperties of an RNA than just the MFE predictionIt is also possible to obtain individual structurepredictions from the pair-probability matrix either bysampling22 or by finding a structure that maximizesthe expected accuracy considering a weighting factorbetween sensitivity and specificity2324
RNA Folding Using Probabilistic ModelsAn alternative to the thermodynamic approach ofRNA folding is a probabilistic approach based onmachine-learning principles Instead of using energyparameters folding parameters can be estimatedfrom a training set of known structures and areused to predict structures of unknown sequencesThere are several probabilistic frameworks to accom-plish parameter estimation and prediction Stochasticcontext-free grammars (SCFGs) are a generalization
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
BOX 2
STOCHASTIC CONTEXT FREE GRAMMARS
Context free grammars (CFG) are concepts fromformal language theory In most simple termsa CFG is a set production rules V rarr w where Vrepresent a so-called nonterminal symbol thatproduces a string of terminal or non-terminalsymbols w An example of a simple RNA gram-mar would be S rarr aSa|aS|Sa|SS|ε The gram-mar has one type of nonterminal symbol Sand one type of terminal symbols a isin A C G Trepresenting the bases The grammar consistsof production rules for unpaired and pairedbases (aa represent two complementary bases)This simple rules allow to produce all possibleRNA secondary structures A stochastic contextfree grammar (SCFG) extends CFGs by assign-ing probabilities to all production rules In thecase of our RNA grammar a full parametrizedSCFG would thus describe the probability dis-tribution over all structures and sequences
of Hidden Markov models that are widely used inbioinformatics (Box 2) SCFGs allow one to con-sider nested dependencies a prerequisite to modelRNA structure They have been used successfully forhomology search problems and consensus structureprediction (see below) Although SCFGs can be usedfor single sequence structure prediction25 they arenot widely used for this problem CONTRAFOLD23 analternative machine learning approach based on con-ditional random fields however could establish itselfas a serious alternative to thermodynamic methodsThere are also hybrid approaches that try to enhancethe thermodynamic parameters by training on knownstructures2627
Incorporating Structure Probing Data intoFolding AlgorithmsStructure probing experiments typically use enzymaticor chemical agents that specifically target paired orunpaired regions28 Most implementations of thermo-dynamic folding like UNAfold or RNAfold allowfor the incorporation of this type of information byrestricting the folding to structures consistent with theexperimental constraints As an alternative exper-imental information can also be incorporated aslsquopseudo-energiesrsquo into the folding algorithm enforc-ing regions to be preferentially paired or unpairedreflecting the experimental evidence29ndash31 Recentlyhigh-throughput sequencing techniques were used toscale up structure probing experiments massively32ndash34
(a)
(b)
FIGURE 3 | Principles of comparative analysis for RNA structureprediction (a) A short sequence that can fold into a hairpin is aligned tothree other sequences with different mutation patterns Mutated basesare indicated by a lightning symbol The affected base pair is shown inblue green and red for the case of a consistent mutation acompensatory double mutation or a inconsistent mutation thatdestroys the base pair (b) Sequence based alignment versus structuralalignment Consensus structure predicted for two aligned sequencesFirst the alignment is optimized to match the sequences resulting in apoor consensus structure with few conserved base-pairs (green)Second the two sequences are aligned to optimize a common structureresulting in a much better consensus structure with more conservedpairs (All structures are shown in lsquodotbracketrsquo notation in whichbase-pairs are indicated by brackets and unpaired positions are shownas dots)
Dealing with inherently noisy data of this type turnedout to be challenging and is still an active field ofresearch
Secondary Structure Prediction for HomologousSequencesAnother way to improve secondary structure predic-tion is to consider homologous sequences from relatedspecies If two or more sequences share a commonstructure but have diverged on the sequence leveltypical base substitution patterns that maintain thecommon structure can be observed (Figure 3a) A con-sistent mutation changes one base (eg AmiddotU harr GmiddotU)while compensatory mutations change both bases inthe base pair (eg AmiddotU harr GmiddotC or CmiddotG harr GmiddotC)Clearly these patterns provide useful information toinfer a secondary structure
The simplest way to exploit this signal isto calculate a mutual information score to findcolumns that show highly correlated mutation pat-terns This method led to surprisingly accurate struc-tures for rRNAs as early as 30 years ago35 Aswe rarely have such large datasets of RNAs withextremely conserved structures it is natural to combinecovariance analysis with classical folding algorithms
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences
Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail
Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis
The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions
Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44
PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm
Approaches following this idea are LocARNA45
foldalignM46 RAF47 and LocARNA-P48 All these
tools additionally employ sparsity in the structureensemble of the single sequences
The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49
Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing
Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50
Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences
In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53
HotKnots54 KnotSeeker55 and IPknot56 ILM53
applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58
TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences
Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity
Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52
predicts only canonical pseudoknots
(a)
(b)
(c)
(d)
(e)
but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63
(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically
Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are
no empirical energy parameters for pseudoknottedstructure elements
Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365
Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167
Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward
There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a) (b)
FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)
example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69
predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70
RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions
Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular
base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed
Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72
and RNAduplex73
A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)
The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75
Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets
Finally several approaches predict con-served interactions between multiple sequencealignments7879
Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems
(a)
(b)
(c)
(d)
Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83
is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases
ANNOTATING ncRNAS IN GENOMICDATA
Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs
De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function
As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
BOX 2
STOCHASTIC CONTEXT FREE GRAMMARS
Context free grammars (CFG) are concepts fromformal language theory In most simple termsa CFG is a set production rules V rarr w where Vrepresent a so-called nonterminal symbol thatproduces a string of terminal or non-terminalsymbols w An example of a simple RNA gram-mar would be S rarr aSa|aS|Sa|SS|ε The gram-mar has one type of nonterminal symbol Sand one type of terminal symbols a isin A C G Trepresenting the bases The grammar consistsof production rules for unpaired and pairedbases (aa represent two complementary bases)This simple rules allow to produce all possibleRNA secondary structures A stochastic contextfree grammar (SCFG) extends CFGs by assign-ing probabilities to all production rules In thecase of our RNA grammar a full parametrizedSCFG would thus describe the probability dis-tribution over all structures and sequences
of Hidden Markov models that are widely used inbioinformatics (Box 2) SCFGs allow one to con-sider nested dependencies a prerequisite to modelRNA structure They have been used successfully forhomology search problems and consensus structureprediction (see below) Although SCFGs can be usedfor single sequence structure prediction25 they arenot widely used for this problem CONTRAFOLD23 analternative machine learning approach based on con-ditional random fields however could establish itselfas a serious alternative to thermodynamic methodsThere are also hybrid approaches that try to enhancethe thermodynamic parameters by training on knownstructures2627
Incorporating Structure Probing Data intoFolding AlgorithmsStructure probing experiments typically use enzymaticor chemical agents that specifically target paired orunpaired regions28 Most implementations of thermo-dynamic folding like UNAfold or RNAfold allowfor the incorporation of this type of information byrestricting the folding to structures consistent with theexperimental constraints As an alternative exper-imental information can also be incorporated aslsquopseudo-energiesrsquo into the folding algorithm enforc-ing regions to be preferentially paired or unpairedreflecting the experimental evidence29ndash31 Recentlyhigh-throughput sequencing techniques were used toscale up structure probing experiments massively32ndash34
(a)
(b)
FIGURE 3 | Principles of comparative analysis for RNA structureprediction (a) A short sequence that can fold into a hairpin is aligned tothree other sequences with different mutation patterns Mutated basesare indicated by a lightning symbol The affected base pair is shown inblue green and red for the case of a consistent mutation acompensatory double mutation or a inconsistent mutation thatdestroys the base pair (b) Sequence based alignment versus structuralalignment Consensus structure predicted for two aligned sequencesFirst the alignment is optimized to match the sequences resulting in apoor consensus structure with few conserved base-pairs (green)Second the two sequences are aligned to optimize a common structureresulting in a much better consensus structure with more conservedpairs (All structures are shown in lsquodotbracketrsquo notation in whichbase-pairs are indicated by brackets and unpaired positions are shownas dots)
Dealing with inherently noisy data of this type turnedout to be challenging and is still an active field ofresearch
Secondary Structure Prediction for HomologousSequencesAnother way to improve secondary structure predic-tion is to consider homologous sequences from relatedspecies If two or more sequences share a commonstructure but have diverged on the sequence leveltypical base substitution patterns that maintain thecommon structure can be observed (Figure 3a) A con-sistent mutation changes one base (eg AmiddotU harr GmiddotU)while compensatory mutations change both bases inthe base pair (eg AmiddotU harr GmiddotC or CmiddotG harr GmiddotC)Clearly these patterns provide useful information toinfer a secondary structure
The simplest way to exploit this signal isto calculate a mutual information score to findcolumns that show highly correlated mutation pat-terns This method led to surprisingly accurate struc-tures for rRNAs as early as 30 years ago35 Aswe rarely have such large datasets of RNAs withextremely conserved structures it is natural to combinecovariance analysis with classical folding algorithms
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences
Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail
Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis
The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions
Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44
PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm
Approaches following this idea are LocARNA45
foldalignM46 RAF47 and LocARNA-P48 All these
tools additionally employ sparsity in the structureensemble of the single sequences
The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49
Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing
Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50
Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences
In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53
HotKnots54 KnotSeeker55 and IPknot56 ILM53
applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58
TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences
Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity
Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52
predicts only canonical pseudoknots
(a)
(b)
(c)
(d)
(e)
but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63
(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically
Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are
no empirical energy parameters for pseudoknottedstructure elements
Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365
Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167
Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward
There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a) (b)
FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)
example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69
predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70
RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions
Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular
base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed
Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72
and RNAduplex73
A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)
The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75
Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets
Finally several approaches predict con-served interactions between multiple sequencealignments7879
Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems
(a)
(b)
(c)
(d)
Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83
is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases
ANNOTATING ncRNAS IN GENOMICDATA
Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs
De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function
As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
RNAalifold36 extends Zukerrsquos folding algorithm tomultiple sequence alignments by averaging the energycontribution over the sequences and adding covari-ance information in the form of lsquopseudoenergiesrsquoA probabilistic alternative is Pfold37 which usesa simple stochastic context free grammar combinedwith an evolutionary model of sequence evolutionto infer a consensus structure PetFold38 extendsPfold by incorporating pair probabilities from ther-modynamic folding and thus unifies evolutionaryand thermodynamic information A more recent pro-gram TurboFold39 also uses thermodynamic fold-ing and iteratively refines the energy parameters byincorporating pair probabilities from homologoussequences
Structural AlignmentEven unaligned RNAs can provide more informa-tion about their common structure than a sin-gle sequence Low sequence homology below 60sequence identity40 prohibits the sequence alignment-based approach of the previous section (Figure 3b)since correct alignment requires information aboutthe structure Since structure predictions for singlesequences are unreliable folding the sequences fol-lowed by structure-based alignment can also fail
Therefore the most successful strategies foldand align the RNAs simultaneously The first suchalgorithm41 by Sankoff simultaneously optimizes thealignmentrsquos edit distance and the free energies of bothRNA structures applying a loop-based energy modelHowever only recent advances made this strategyapplicable to practical RNA analysis
The first complete pairwise Sankoff-implementation Dynalign42 implements a loop-based energymodel but employs a simple banding technique forincreasing efficiency A further pairwise Sankoff-liketool is Foldalign43 Computing a loop-based energymodel during the alignment these algorithms areaccurate but computationally expensive in practicethey compensate this by strong sequence-basedheuristic restrictions
Several less expensive Sankoff-like algorithmsare based on simplifications introduced by PMcomp44
PMcomp replaces the loop-based energy model byassigning lsquopseudo-energiesrsquo to single base pairs Thisreduces the computational overhead significantly Bycomputing the pseudoenergies of base pairs fromtheir probabilities in the structure ensembles of thesingle RNAs accurate information from the loop-based energy model is fed back into the light-weightalgorithm
Approaches following this idea are LocARNA45
foldalignM46 RAF47 and LocARNA-P48 All these
tools additionally employ sparsity in the structureensemble of the single sequences
The sparsified PMcomp-like approaches aresufficiently fast for multiple alignment and large scalestudies performing for example clustering4546 andde novo prediction of structural RNA49
Prediction with PseudoknotsPseudoknotted structures follow the same rules asother secondary structures but allow nontree likeconfigurations for example because of an additionallevel of nested base pairs or pairing between differenthairpin loops (kissing hairpin) (Figure 4) Pseudoknotsare prevalent in many ncRNAs still most algorithmsignore them for technical reasons pseudoknot-foldingis computationally expensive and accurate empiricalenergy models are missing
Algorithmic ChallengesThe run-time of pseudoknot-free structure predic-tion grows only with the cube of the sequencelength Unfortunately when general pseudoknots areallowed the computation time grows much fasternamely exponentially with the sequence length50
Consequently finding exact solutions is intractablefor all but very short RNAs Note that the lsquoprincipleof optimalityrsquo which allows dynamic programming(Box 1) in the pseudoknot-free case is not applicablein the case of general pseudoknots By this princi-ple every optimal structure can be composed fromoptimal structures of its subsequences
In practice often heuristic methods are appli-cable Among the numerous approaches are ILM53
HotKnots54 KnotSeeker55 and IPknot56 ILM53
applies the classic principle of lsquohierarchic foldingrsquo itconstructs a pseudoknotted structure by iterativelypredicting a most likely stem using pseudoknot-freeprediction which is then added to the structure Hot-Knots54 refines the construction of the pseudoknottedstructure from likely more general secondary struc-ture elements TurboKnot57 even predicts conservedpseudoknots from a set of homologous RNAs Apply-ing a topological classification of RNA structures58
TT2NE59 guarantees to find the best RNA structureregardless of pseudoknot complexity however thislimits the length of treatable sequences
Other algorithms restrict the types of pseu-doknots such that dynamic programming can beapplied50ndash5260ndash63 These algorithms differ in theircomputational complexity and the complexity of con-sidered structures Figure 4 shows pseudoknots ofdifferent complexity
Rivas and Eddy51 proposed the most generalsuch algorithm It predicts three-knots (Figure 4c)
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52
predicts only canonical pseudoknots
(a)
(b)
(c)
(d)
(e)
but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63
(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically
Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are
no empirical energy parameters for pseudoknottedstructure elements
Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365
Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167
Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward
There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a) (b)
FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)
example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69
predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70
RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions
Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular
base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed
Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72
and RNAduplex73
A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)
The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75
Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets
Finally several approaches predict con-served interactions between multiple sequencealignments7879
Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems
(a)
(b)
(c)
(d)
Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83
is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases
ANNOTATING ncRNAS IN GENOMICDATA
Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs
De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function
As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 4 | Pseudoknot types (a) The simplest typeof pseudoknot (H-type) formed by two crossing stemsThe most efficient algorithms predict only this mostcommon form of pseudoknot (b) Three-chain or kissinghairpin Two hairpin loops are connected by one ormore base pairs (c) Three-knot Three stems cross eachother This configuration is predicted only by theexpensive algorithm by Rivas and Eddy51 (d) Four-chainclosed by a fifth stem This complex motif cannot bepredicted by the algorithm of Rivas and Eddy but wouldrequire an even more costly algorithm (e) Canonicalpseudoknot A pseudoknot formed by two perfect stemsof canonical base pairs that are maximally extendedthat is they cannot be extended further by canonicalbase pairs the figure indicates the latter by the dashedlsquoconflictrsquo-arcs between non-canonical base pairs AAGA CA and CC (from left to right) The mostspace-efficient pseudoknot prediction algorithm52
predicts only canonical pseudoknots
(a)
(b)
(c)
(d)
(e)
but cannot predict more complex pseudoknots suchas the one shown in Figure 4d Its large time and spacerequirements prohibit its application to large scaledata analysis The most efficient such algorithm52 hasonly the same space requirements as pseudoknot-freeprediction its run-time grows with the fourth powerof the sequence length adding only a linear factorover pseudoknot-free folding However it predictsonly canonical pseudoknots (Figure 4e) which areformed by two canonical stems such stems are (1)composed of canonical base pairs (2) lsquoperfectrsquo that isthey do not contain interior loops or bulges and (3)maximally extended that is they cannot be extendedby canonical base pairs Further such algorithms aretailored for specific interesting pseudoknot-classes63
(Figure 4a and b) Mohl et al64 recently managed tospeed up such algorithms nonheuristically
Energy Models for PseudoknotsA further challenge of pseudoknot prediction isto find an accurate energy model The establishedloop-based energy models for RNA are tailoredfor pseudoknot-free structures to date there are
no empirical energy parameters for pseudoknottedstructure elements
Consequently some algorithms consider onlythe simplistic case of base-pair maximization5365
Although some authors argue that important entropycontributions in pseudoknots cannot be coveredby a loop-based energy model66 most approachesextend the loop-based energy model for pseudoknot-loops5167
Tertiary StructureWhile secondary structure is strongly stabilizing thethree-dimensional structure the tertiary structuredepends on stabilizing non-canonical base pairs andvan der Waals interactions Furthermore pseudoknotsimpact the tertiary structure Therefore deriving thetertiary structure in a hierarchic way from predictedsecondary structure is not straightforward
There are two main ways to model tertiary RNAstructure One template-based modeling employshomology to other RNAs with known structuresThe other de novo prediction computes struc-tures from physical and knowledge based rules For
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a) (b)
FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)
example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69
predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70
RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions
Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular
base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed
Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72
and RNAduplex73
A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)
The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75
Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets
Finally several approaches predict con-served interactions between multiple sequencealignments7879
Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems
(a)
(b)
(c)
(d)
Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83
is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases
ANNOTATING ncRNAS IN GENOMICDATA
Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs
De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function
As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(a) (b)
FIGURE 5 | Tertiary structure prediction Example prediction fromthe MC-Fold and MC-Sym pipeline68 (a) Secondary structureincluding canonical (bold lines) and noncanonical base-pairs (non-boldlines) as predicted by MC-Fold (b) Tertiary structure predicted fromsecondary structure (a) by MC-Sym The prediction (blue) is compared tothe experimental structure (gold)
example the MC-foldMC-sym pipeline68 (Figure 5)assembles fragments of experimentally determinedthree-dimensional structures On the basis of suchstructures the approach builds a library of frequentsmall secondary structure loop-motifs called NucleicCyclic Motifs (NCMs) together with their three-dimensional configurations Given a RNA sequenceMC-Fold constructs probable secondary structuresby merging NCMs in this process it assigns likelyNCMs to subsequences MC-sym assigns concrete3D-structures to the NCMs to generate a consis-tent 3D structure The approach has been testedon 13 RNAs of an average size of 30 nucleotidesRunning several hours per prediction the known3D-structures have been reproduced within 23 A ataverage68 Similar to MC-Fold the recent RNAwolf69
predicts extended secondary structures consideringnon-canonical base pairs the same work reports amore efficient dynamic programming-based reimple-mentation of MC-Fold which improves parameterestimation A more detailed review and a systematicperformance comparison of RNA tertiary structureprediction programs is provided by Laing et al70
RNARNA InteractionsMany ncRNAs interact with other RNAs by specificbase-pairing most prominently miRNAs targetthe untranslated regions of mRNAs PredictingRNARNA interactions can thus elucidate RNAinteraction partners and potential functions
Most generally one aims to predict the sec-ondary structure of the interaction complex of twoRNAs consisting of intramolecular and intermolecular
base-pairs (Figure 6) Alkan et al71 formalized theproblem and showed thatmdashsimilar to pseudoknot pre-dictionmdashit cannot be solved efficiently Therefore sev-eral simplifications and heuristics have been proposed
Most approaches restrict the possible structuresof the interaction complex to enable efficient algo-rithms using dynamic programming Figure 6 showsinteraction complexes from several restriction classesThe simplest approaches ignore intramolecular base-pairs and predict only the best set of interactingbase-pairs (Figure 6a) examples are RNAhybrid72
and RNAduplex73
A more general approach optimizes intra-and intermolecular base-pairs simultaneously in arestricted structure space lsquoCo-foldingrsquo of RNAs forexample implemented by RNAcofold73 concatenatesthe two RNA sequences and predicts a pseudoknot-free structure for the concatenation Co-folding leadsto a very efficient algorithm but strongly restricts thespace of possible structures such that only externalbases can interact (Figure 6b)
The dynamic-programming algorithms7174 thatpredict more general structures (Figure 6c) forbiddingonly pseudoknots crossing interaction and zig-zags(Figure 6d) are computationally as expensive asthe most complex efficient pseudoknot predictionalgorithm 51 they are therefore rarely used in practicealbeit their efficiency has been improved recently75
Several fast methods717677 assume that inter-actions form in two steps First the RNA unfoldspartially which requires certain energy to open theintramolecular base-pairs Second the unfolded nowaccessible RNA hybridizes with its partner form-ing energetically favorable intermolecular base-pairsRNAup76 computes the energies to unfold each sub-sequence in the single RNAs and combines theunfolding energies with the hybridization energies toapproximate the energy of the interaction complexIntaRNA77 optimizes this approach and extends it toscreen large data sets for potential interaction targets
Finally several approaches predict con-served interactions between multiple sequencealignments7879
Kinetic FoldingCommon structure prediction methods assume thatthe functional RNA structure can be identified solelybased on the thermodynamic equilibrium withoutconsidering the dynamics of the folding processAlthough the true impact of the kinetics on functionalRNA structures is still unknown there are examplessuch as RNA switches80 that highlight the importanceof folding kinetics
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems
(a)
(b)
(c)
(d)
Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83
is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases
ANNOTATING ncRNAS IN GENOMICDATA
Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs
De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function
As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
FIGURE 6 | RNARNA interactions (a) Simplehybridization no internal structure of RNAs Thesimplest interaction prediction approaches predictonly the hybridization at a single site withoutconsidering internal structure (b) Hybridizationand restricted internal structure as in the co-foldingmodel Interactions can occur at several siteshowever only between external bases Whenconcatenating the two interacting RNAs thestructure of all inter- and intramolecular base-pairsis pseudoknot-free such that it can be predictedfrom the concatenation by (a variant of) Zukerrsquosalgorithm (c) Interaction structure as predictableby the most complex dynamic programmingalgorithms Such structures are free ofpseudoknots crossing interactions and zig-zags(see d) (d) Zig-zag Intramolecular stems in eachRNA cover a common interaction as well asinteractions to the outside of the stems
(a)
(b)
(c)
(d)
Several groups have studied the folding processof RNAs (reviewed in Refs 66 83) The RNA foldingprocess is commonly modeled using energy landscapes(Figure 7)84 Such landscapes assign energies to singlestructures or states and define neighborship betweenstates RNAlocopt85 enables studying the Boltzmannensemble of local optima in an RNA energy landscapeThe folding process iteratively moves from one stateto a neighbor the move probability depends on theenergy difference Studying folding by simulation83
is expensive since it requires averaging over manytrajectories Because the exact model of folding as aMarkov process can be solved only for small systemsmany methods coarse-grain the energy landscapeto enable the analysis of the process For examplelsquobarrier-treesrsquo represent the energy landscape as atree of local minima connected by their saddlepoints82 BarMap81 generalizes coarse-graining tononstationary scenarios like temperature changes orco-transcriptional folding86 predicts RNA foldingpathways based on motion planning techniques fromrobotics Kinefold87 simulates single folding pathsover seconds to minutes for sequences up to 400 bases
ANNOTATING ncRNAS IN GENOMICDATA
Another major challenge in understanding thefunction of ncRNAs is to find and annotate them incomplete genomes We distinguish homology searchie trying to identify new members of already knownclasses of ncRNAs and de novo prediction with theaim to discover novel ncRNAs
De Novo PredictionAlthough a general de novo ncRNA finder remainselusive some progress has been made in the identifi-cation of structural RNAs that is ncRNAs that relyon a defined secondary structure for their function
As a first attempt one could use normal foldingalgorithms such RNAfold and hope to find structuralRNAs to be thermodynamically more stable than thegenomic background However although on averagestructural RNAs are more stable than expected thisapproach is generally not significant enough to reliablydistinguish true structural RNAs from the rest of thegenome8889 Comparative approaches that make use
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
(c)
(a)(b)
FIGURE 7 | Kinetic folding pathways (a) Schematic energy landscape and associated barrier tree A barrier tree shows the local minima and theminimum energy barriers between them (Reprinted with permission from Ref 81 Copyright 2010 RNA Society) (b) Barrier tree of the small RNA xbix(c) Exact folding kinetics of xbix starting from the open chain Probability of local minima over time While the minimum free energy (MFE) structure isfinally most prominent other lsquointermediaryrsquo structures (2 3 and 4) are temporarily more probable(Reprinted with permission from Ref 82 Copyright2004 IOP Publishing Ltd)
of evolutionary signatures in alignments of relatedsequences can improve the signal considerably
The first program that used pairwise alignmentsto find structured RNAs was QRNA90 Based onstochastic context free grammars it could successfullyidentify novel ncRNAs in bacteria9192 MSARIwas thefirst algorithm applying the idea of finding conservedRNA structures to multiple sequence alignments93
To screen larger genomes higher accuracy wasnecessary RNAz94 analyzes multiple sequence align-ments and combines evidence from structural con-servation and thermodynamic stability EvoFold95
searches for conserved secondary structures in multi-ple alignments using a phylogenetic stochastic context-free grammar Both programs were used to mappotential RNA secondary structures in the human9596
and many other genomes (eg Ref 97)
Another approach that was used to detect con-served RNA secondary structures in bacteria98 isimplemented in CMFinder99 CMFinder builds acovariance model from a set of unaligned sequencesby iterative optimization
Homology SearchPure sequence based search algorithms like BLASTquickly reach their limits when used to identify distanthomologs of RNAs100101
A solution is to include structure information inthe search Several motif description languages havebeen developed that allow one to manually specifysequence and structure properties and subsequentlyuse these patterns to search databases or genomic dataExamples of such descriptor based search algorithmare RNAMOT102 and RNAmotif103
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
Another class of programs automatically create adescription of a structural RNA from a structure anno-tated alignment The most commonly used program ofthis class is INFERNAL that uses covariance models afull probabilistic description of an RNA family basedon stochastic context-free grammars104 The Rfamdatabase (see below) is based on INFERNAL and pro-vides a curated collection of such covariance models
In addition to these generic homology searchtools there are several specialized programs forfinding ncRNAs of a particular family such astRNAs105 rRNAs106 snoRNAs107ndash109 tmRNAs110
signal recognition particle RNAs111
Coding PotentialA complication during ncRNA annotation is the factthat many transcripts appear to be noncoding butin fact have the potential to code for a protein112
For example short open reading frames can be easilymissed and biological ambiguities of transcripts thatact both on the level of the RNA and protein canmake the annotation difficult113114 A good overviewof different methods to assess the coding potential ofRNAs is given by Frith et al The benchmark study115
found comparative analysis to be one of the mostpromising approaches Purifying selection on the pro-tein sequence turned out to be a reliable indicator ofcoding potential and several programs were developedto exploit this feature90116117
MINING RNA-SEQ DATA FORNONCODING RNA TRANSCRIPTS
The advent of high-throughput RNA sequencing hasprovided a robust platform for the developmentand expansion of several transcriptome-level analy-ses RNA-seq is the highly parallelized process ofsequencing individual cDNA fragments created froma population of RNA molecules Here we discussthree challenges that must be addressed to mine RNA-seq data for noncoding transcripts (1) read mappingto a reference genome (or transcriptome) (2) tran-scriptome reconstruction from mapped reads and (3)quantification of transcript levels
Short Read MappingThe first stage in short-read sequencing data analysisis the alignment of the sequenced reads to a refer-ence genome The algorithmic details of short readmapping is beyond the scope of this review Hereit is important to note that transcript reconstructionrequires so-called lsquospliced alignersrsquo Spliced aligners
such as TopHat118 GSNAP119 and SpliceMap 120
identify and map short reads that span exonndashexonjunctions From these spliced alignments novel splic-ing events and subsequently new transcript modelscan be identified
Reconstruction of Transcript ModelsA key advantage of RNA-seq over traditional formsof RNA expression analysis is the fact that little tono a priori information on the presence of an RNAsequence is required and in principle all requiredinformation can be learned directly from the data Thisadvantage however is dependent on the ability to re-construct a transcriptome fragmented into millionsof short reads Common approaches to solving thisjigsaw puzzle-like problem focus on one of twodifferent strategies (1) reference-guided assembly or(2) de novo assembly (Figure 8)
With a reference-guided assembly reads are ini-tially aligned using a spliced aligner to a referencegenome sequence The requirement for gapped align-ments allows for discovery of putative splice junctionsat the locations in which a read maps to the ref-erence with a gap across an appropriately sizedgenomic interval The two most popular reference-guided transcriptome assembly tools Scripture121
and Cufflinks122 both treat these gaps as candidatesplice junctions and use this information to constructa graph representation of the transcriptome In thecase of Scripture the graph represents the exonicconnectivity potential of the reference genome Cuf-flinks creates independent graph models for eachindependent genomic interval assumed to be a putativelsquogenersquo In either case the various paths through theseconnectivity graphs represent independent transcriptisoforms Scripture will attempt to identify all pos-sible paths through the graph that can be explained bythe mapped reads for a given gene and in this regardis useful for identifying lowly expressed isoformsbut tends to produce more noise in highly splicedstructures In contrast Cufflinks produces a set ofisoforms that represent the most parsimonious pathsthat can explain the given mapped reads which maynot report some redundant (but true) isoforms butdoes not overburden the results with false positivesAdditionally Cufflinks estimates the read cover-age across the paths to assist the selection of the mostparsimonious isoforms The result of either of thesetwo approaches is a reconstructed transcriptome thedetail of which is supported by the read sequencesabundance and mappability to a reliable reference
In contrast to these reference-guided approachesVelvet123 and transABySS 124 use the short-read
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 8 | Reconstructing transcript models from RNA-seq data Two splice isoforms of RNAs are shown for which the RNA-seq experimentgenerated short sequence fragments One approach (left) to reconstruct the transcript is mapping the fragments to a reference genome Spliced readsthat span exons boundaries can be used to infer the connectivity graph The paths through this graph correspond to the different isoformsAlternatively the transcripts can be re-constructed by de novo assembly of the reads into transcripts (right) If available the assembled RNAtranscripts can be mapped to a reference genome afterwards to obtain the intron-exon structure of the isoforms
sequences directly and attempt to construct contig-liketranscripts124 This approach tends to be significantlymore computationally intensive but is essential inspecies that do not have a reliable reference genomeor in the case when the expected transcriptome candeviate significantly from the reference genome due torearrangements
Quantification and Differential ExpressionIn RNA-Seq the number of individual sequencedfragments from a given transcript is used as a proxyfor its abundance Determinations of expression levelcan be coarsely determined at the gene-level125126
using a pseudomodel that consists of either themost-abundant isoform model an intersection modelquantifying only the regions present in all predictedisoforms or a union model The intersection modelhas been shown to to reduce the ability to accuratelydetermine differential expression and the union modelcan underestimate expression for those genes withalternative splicing122127 More accurately gene-levelestimates can be determined as the sum of isoform-level abundance estimates122128 involving a likelihood
function to model the various effects encountered inthe sequencing process129 The result of fitting thesemodels to the data is a maximum likelihood estimateof the isoform-level abundances for each gene Gene-level abundance estimates are easily determined bysumming the expression levels of individual isoforms
RNA-seq expression values must be normal-ized to correct for inherent biases in the dataThe lsquoReads Per Kilobase of transcript per Millionmappedrsquo (RPKM) has emerged as a standard metricfor reporting of estimated abundance levels This met-ric has the advantage of correcting for the two mainsources of variability in RNA-seq data the length ofthe transcript and the depth of the libraries
The robust quantification of transcript levelsalso allows one to study differential expression Sincemost gene-level projections of abundance estimationresult in a single RPKM value for each gene it wouldbe reasonable to directly use most of the many dif-ferential expression tests that have been developedfor microarray analysis over the past few yearsThere are however additional benefits that can begained from using RNA-seq data such as the abilityto derive a distribution of abundance estimates from
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
a given sample or set of samples Short read map-ping to a given genomic interval can be considereda counting problem and many differential expres-sion analyses initially attempted to fit read countsto either Poisson or Binomial distributions to deter-mine enriched transcripts These methods howeverfail to incorporate any information about biologicalvariability Several more recent applications such asCuffdiff122 DESeq130 and EdgeR 131 incorpo-rate variance information from biological replicates intheir differential expression models leading to morerigorous statistics
miRNAs
miRNAs are short endogenous regulatory non-codingRNAs found in eukaryotic cells whose primaryfunction is to post-transcriptionally repress genes132
miRNAs inhibit translation and promote mRNAdegradation via sequence-specific binding to the 3primeUTR regions and coding sequences133ndash135 They areproduced from hairpin precursors (pri-miRNAs) thatare processed by Drosha to form a pre-miR hairpinand then by Dicer to generate one or more 18- to23-nt mature miRNAs136 Mature miRNAs are thenincorporated into RISC where they hybridize withtarget sites of the mRNA which are complementaryto the miRNA seed (positions 2ndash8) leading to post-transcriptional repression Since their discovery therehas been much interest in the computational identifi-cation of miRNAs at a genome wide level using somecombination of evolutionary conservation hairpinstructure thermodynamic stability genomic contextand more recently the presence of the mature miRNAin sequencing data137
Identification of miRNAsEvolutionary ConservationSome of the first approaches such as miRScan andsnarloop use conservation and similarity to knownmiRNAs for the prediction new examples138139
Other approaches such as miRSeeker incorpo-rate miRNA-specific patterns of conservation suchas stronger conservation in the hairpin stem com-pared to the loop140 Additionally patterns of con-servation of target sites have been used to iden-tify novel miRNAs141 and to refine annotations ofknown miRNAs142 Other approaches combine bothsequence and structural alignments to find miRNAhomologs143144
Structural PropertiesThe secondary structure and thermodynamic stabilityare important features for the prediction of miRNAs
especially when they are not conserved or orthologsdo not exist in known species Because miRNAsneed to form stable hairpins for their processingmany studies have used structural features for theirprediction It has been demonstrated that miRNAs aresignificantly more stable than randomized sequencesof the same nucleotide or dinucleotide composition145
and many studies have used programs like RNAz topredict novel miRNAs based on this characteristicfeature146147 Other studies have developed machinelearning approaches that train classifiers on knownmiRNAs and subsequently identify novel and in manycases nonconserved miRNAs148ndash150
Genomic ContextOther approaches have looked for features in thesurrounding genomic context for the prediction ofnovel miRNAs and for refining other predictionsSome early work helped to filter predictions with acharacteristic motif upstream of and patterns of con-servation flanking the pre-miR151 Other studies haveused the fact that miRNAs tend to reside within poly-cistronic clusters of more than one miR to identifynovel miRNAs3152 Some approaches also make useof the fact that regions proximal to miRNAs tend to bedevoid of other non-miR small RNAs and when theyare flanked by other small RNAs such as miRNA off-set RNAs (moRs) the separation from mature miRNAsequences is minimal153
Next Generation SequencingThe analysis of the sequencing of size-selected cDNAlibraries has proved to be the most reliable method forthe identification of novel miRNAs in most instancescoupled with other features such as structure andconservation to enhance predictions because it pro-vides validation that the mature sequence is expressedIn addition to novel miRNAs the analysis of high-throughput sequencing data of small RNAs has led tothe elucidation of many other classes of small RNAsincluding endogenous siRNAs154 piRNAs155156 andmoRs 157 among others There are now a few publiclyavailable software tools for the prediction of miR-NAs from high-throughput sequencing data such asmiRDeep MIReNA and miRTRAP153158159
miRNA Target PredictionmiRNA target prediction is another lively area ofcomputational analysis related to miRNAs Earlyapproaches such as targetScan identify evolution-ary conserved seed matches and later approachessuch as PicTar have incorporated target sitestability155160 The topic is related to the problem of
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
FIGURE 9 | Growth of miRBase and Rfam over the past 8 years For miRBase the number of miRNAs are shown for Rfam the number of structurefamilies and the number of sequences found to be member of an Rfam family (The drop of the number of Rfam sequences in 2011 is the result of there-organization of some large families and the elimination of pseudogenes)
predicting RNARNA interactions discussed aboveFor a comprehensive review on target prediction seeBartel161
DATABASES
There are many databases related to ncRNAs and wecannot cover all of them here A more specializedreview162 and the yearly database issue of NucleicAcids Research163 are good resources to get a moredetailed overview
There are many highly specialized databases thatcollect RNAs of a specific class Basically for all wellknown lsquoclassicalrsquo RNAs like tRNAs rRNAs snoR-NAs SRP RNAs tmRNAs group I or II introns adatabase is available162
All newly identified ncRNA sequences are usu-ally deposited in general sequence database suchas Genbank However typically they are not sys-tematically annotated in these databases and con-sequently a few other databases have emergedthat systematically collect ncRNAs (NONCODE164
RNAdb165 fRNAdb166 lncRNAdb167)
Rfam is an important resource for structuredRNAs and also includes structured regulatory ele-ments in mRNAs168 It collects hand curated covari-ance models (see above) that are used to systematicallysearch sequence databases for new members As ofwriting this review Rfam contains 1973 families witha total of 2756313 members (Figure 9) All RNAfamilies in Rfam are manually annotated
The most extensive database for miRNAsequences hairpins and target sites is miRBase (httpwwwmirbaseorg)169 miRBase has seen rapid growthover the past few years (Figure 9) and is the officialrepository and naming authority for newly discoveredmiRNAs Other related databases include Tarbasewhich is a database of experimentally verified targetsites170 and miR2Disease which is a database thatmaintains a manually curated set of disease associatedmiRNA target sites171
CONCLUSIONS AND OUTLOOK
The wide variety of topics covered in this paperreflects the increasing complexity of the field It also
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
clearly demonstrates the interdisciplinary effort thatis necessary to address these problems It is safeto predict that computational problems related toncRNAs will remain challenging for the comingyears In particular elucidating the functions of lin-cRNAs will require new approaches Many methodsfor structural analysis for example were developedfor rather short structured ncRNAs and cannot bedirectly applied to lncRNAs that can be kilobasesin length Prediction of long range intramolecularinteractions within lncRNAs or prediction of inter-molecular RNARNA or RNADNA interactions oflncRNAs will require extensions and improvements
of established algorithms Also the problem of pre-dicting proteinndashRNA interactions will be of highrelevance given the increasing number of examplesof lincRNAs that act as scaffolds for protein com-plexes Also more accurate and efficient analysis ofhigh-throughput data will be a challenge for the fieldWe have mentioned analysis of RNA-seq data butnext generation sequencing can also used for a vari-ety of other ncRNA related problems such as highthroughput RNA secondary structure probing or map-ping RNAprotein interactions Also new approachesto organize ncRNA data will be important and thereis need for new centralized databases and specializedresources172
ACKNOWLEDGMENTS
This work was supported by the Austrian Fonds zur Forderung der Wissenschaftlichen Forschung [SchrodingerFellowship J2966-B12 to StW] and by the German Research Foundation (grant WI 36281-1 to SW) and NIHaward 1RC1CA147187 to DAH
REFERENCES1 Amaral P Dinger M Mercer T Mattick J The
eukaryotic genome as an RNA machine Science 20083191787ndash1789
2 Pasquinelli A Reinhart B Slack F Martindale MKuroda M Maller B Hayward D Ball E DegnanB Muller P et al Conservation of the sequence andtemporal expression of let-7 heterochronic regulatoryRNA Nature 2000 40886ndash89
3 Lau N Lim L Weinstein E Bartel D An abundantclass of tiny RNAs with probable regulatory roles inCaenorhabditis elegans Science 2001 294858ndash862
4 ENCODE Project Consortium Identification andanalysis of functional elements in 1 of the humangenome by the ENCODE pilot project Nature 2007447799ndash816
5 van Bakel H Nislow C Blencowe BJ Hughes TRMost lsquolsquodark matterrsquorsquo transcripts are associated withknown genes PLoS Biol 2010 8e1000371
6 Clark MB Amaral PP Schlesinger FJ Dinger METaft RJ Rinn JL Ponting CP Stadler PF Morris KVMorillon A et al The reality of pervasive transcrip-tion PLoS Biol 2011 9e1000625
7 Guttman M Donaghey J Carey BW Garber M Gre-nier JK Munson G Young G Lucas AB Ach R BruhnL et al lincRNAs act in the circuitry controllingpluripotency and differentiation Nature 2011 477295ndash300
8 Mattick JS Challenging the dogma the hidden layerof non-protein-coding RNAs in complex organismsBioessays 2003 25930ndash939
9 Breaker RR Riboswitches and the RNA WorldCold Spring Harb Perspect Biol 2012 4a003566doi101101cshperspecta003566
10 Frhlich KS Vogel J Activation of gene expression bysmall RNA Curr Opin Microbiol 2009 12674ndash682
11 Weinberg Z Perreault J Meyer MM Breaker RRExceptional structured noncoding RNAs revealed bybacterial metagenome analysis Nature 2009 462656ndash659
12 Nussinov R Jacobson A Fast algorithm for predictingthe secondary structure of single-stranded RNA ProcNatl Acad Sci U S A 1980 776309ndash6313
13 Mathews D Sabina J Zuker M Turner D Expandedsequence dependence of thermodynamic parametersimproves prediction of RNA secondary structureJ Mol Biol 1999 288911ndash940
14 Xia T SantaLucia J Jr Burkard M Kierzek RSchroeder S Jiao X Cox C Turner D Ther-modynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes withWatson-Crick base pairs Biochemistry 1998 3714719ndash14735
15 Zuker M Stiegler P Optimal computer folding oflarger RNA sequences using thermodynamics andauxiliary information Nucleic Acids Res 19819133ndash148
16 Markham NR Zuker M UNAFold software fornucleic acid folding and hybridization Methods MolBiol 2008 4533ndash31
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
17 Gruber A Lorenz R Bernhart S Neubock RHofacker I The Vienna RNA websuite Nucleic AcidsRes 2008 36W70ndashW74
18 Reuter J Mathews D RNAstructure software forRNA secondary structure prediction and analysisBMC Bioinform 2010 11129
19 Wuchty S Fontana W Hofacker IL Schuster PComplete suboptimal folding of RNA and the sta-bility of secondary structures Biopolymers 1999 49145ndash165
20 Zuker M On finding all suboptimal foldings of anRNA molecule Science 1989 24448ndash52
21 McCaskill J The equilibrium partition function andbase pair binding probabilities for RNA secondarystructure Biopolymers 1990 291105ndash1119
22 Ding Y Chan CY Lawrence CE RNA secondarystructure prediction by centroids in a Boltzmannweighted ensemble RNA 2005 111157ndash1166
23 Do CB Woods DA Batzoglou S CONTRAfold RNAsecondary structure prediction without physics-basedmodels Bioinformatics 2006 22e90ndashe98
24 Lu Z Gloor J Mathews D Improved RNA secondarystructure prediction by maximizing expected pair accu-racy RNA 2009 151805ndash1813
25 Dowell RD Eddy SR Evaluation of several lightweightstochastic context-free grammars for RNA secondarystructure prediction BMC Bioinform 2004 571
26 Andronescu M Condon A Hoos H Mathews D Mur-phy K Computational approaches for RNA energyparameter estimation RNA 2010 162304ndash2318
27 Andronescu M Condon A Hoos H Mathews DMurphy K Efficient parameter estimation for RNAsecondary structure prediction Bioinformatics 200723i19ndashi28
28 Weeks K Advances in RNA structure analysis bychemical probing Curr Opin Struct Biol 2010 20295ndash304
29 Mathews DH Disney MD Childs JL Schroeder SJZuker M Turner DH Incorporating chemical mod-ification constraints into a dynamic programmingalgorithm for prediction of RNA secondary structureProc Natl Acad Sci U S A 2004 1017287ndash7292
30 Deigan KE Li TW Mathews DH Weeks KM Accu-rate SHAPE-directed RNA structure determinationProc Natl Acad Sci U S A 2009 10697ndash102
31 Washietl S Hofacker IL Stadler PF Kellis M RNAfolding with soft constraints reconciliation of probingdata and thermodynamic secondary structure predic-tion Nucleic Acids Res 2012 404261ndash4272
32 Kertesz M Wan Y Mazor E Rinn J Nutter RChang H Segal E Genome-wide measurement ofRNA secondary structure in yeast Nature 2010 467103ndash107
33 Underwood J Uzilov A Katzman S Onodera CMainzer J Mathews D Lowe T Salama S Haus-sler D FragSeq transcriptome-wide RNA structureprobing using high-throughput sequencing Nat Meth-ods 2010 7995ndash1001
34 Lucks J Mortimer S Trapnell C Luo S Aviran SSchroth G Pachter L Doudna J Arkin A Multi-plexed RNA structure characterization with selective2prime-hydroxyl acylation analyzed by primer extensionsequencing (SHAPE-Seq) Proc Natl Acad Sci U S A2011 10811063ndash11068
35 Noller HF Kop J Wheaton V Brosius J Gutell RRKopylov AM Dohme F Herr W Stahl DA GuptaR et al Secondary structure model for 23S ribosomalRNA Nucleic Acids Res 1981 96167ndash6189
36 Bernhart S Hofacker I Will S Gruber A Stadler PRNAalifold improved consensus structure predictionfor RNA alignments BMC Bioinform 2008 9474
37 Knudsen B Hein J Pfold RNA secondary structureprediction using stochastic context-free grammarsNucleic Acids Res 2003 313423ndash3428
38 Seemann S Gorodkin J Backofen R Unifying evo-lutionary and thermodynamic information for RNAfolding of multiple alignments Nucleic Acids Res2008 366355ndash6362
39 Harmanci AO Sharma G Mathews DH TurboFolditerative probabilistic estimation of secondary struc-tures for multiple RNA sequences BMC Bioinform2011 12108
40 Gardner PP Wilm A Washietl S A benchmark ofmultiple sequence alignment programs upon structuralRNAs Nucleic Acids Res 2005 332433ndash2439
41 Sankoff D Simultaneous solution of the RNA foldingalignment and protosequence problems SIAM J ApplMath 1985 45810ndash825
42 Mathews DH Turner DH Dynalign an algorithm forfinding the secondary structure common to two RNAsequences J Mol Biol 2002 317191ndash203
43 Havgaard JH Torarinsson E Gorodkin J Fast pair-wise structural RNA alignments by pruning of thedynamical programming matrix PLoS Comput Biol2007 31896ndash1908
44 Hofacker IL Bernhart SH Stadler PF Alignment ofRNA base pairing probability matrices Bioinformatics2004 202222ndash2227
45 Will S Reiche K Hofacker IL Stadler PF Back-ofen R Inferring non-coding RNA families and classesby means of genome-scale structure-based clusteringPLoS Comput Biol 2007 3e65
46 Torarinsson E Havgaard JH Gorodkin J Multiplestructural alignment and clustering of RNA sequencesBioinformatics 2007 23926ndash932
47 Do CB Foo CS Batzoglou S A max-margin model forefficient simultaneous alignment and folding of RNAsequences Bioinformatics 2008 24i68ndashi76
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
48 Will S Joshi T Hofacker IL Stadler PF BackofenR LocARNA-P accurate boundary prediction andimproved detection of structural RNAs RNA 201218900ndash914
49 Will S Yu M Berger B Structure-based WholeGenome Realignment Reveals Many Novel Non-coding RNAs Proceedings of the 16th InternationalConference on Research in Computational MolecularBiology (RECOMB 2012) Volume 7262 of LNCSHeidelberg Springer-Verlag 2012
50 Lyngso RB Pedersen CNS Pseudoknots in RNASecondary Structures Proc of the Fourth AnnualInternational Conferences on Computational Molec-ular Biology (RECOMBrsquo00) Washington DC ACMPress 2000 [BRICS Report Series RS-00-1]
51 Rivas E Eddy SR A dynamic programming algorithmfor RNA structure prediction including pseudoknotsJ Mol Biol 1999 2852053ndash2068
52 Reeder J Giegerich R Design implementation andevaluation of a practical pseudoknot folding algo-rithm based on thermodynamics BMC Bioinform2004 5104
53 Ruan J Stormo GD Zhang W An iterated loopmatching approach to the prediction of RNA sec-ondary structures with pseudoknots Bioinformatics2004 2058ndash66
54 Ren J Rastegari B Condon A Hoos HH HotKnotsheuristic prediction of RNA secondary structuresincluding pseudoknots RNA 2005 111494ndash1504
55 Sperschneider J Datta A KnotSeeker heuristic pseu-doknot detection in long RNA sequences RNA 200814630ndash640
56 Sato K Kato Y Hamada M Akutsu T Asai K IPknotfast and accurate prediction of RNA secondary struc-tures with pseudoknots using integer programmingBioinformatics 2011 27i85ndashi93
57 Seetin MG Mathews DH TurboKnot rapid predic-tion of conserved RNA secondary structures includingpseudoknots Bioinformatics 2012 28792ndash798
58 Bon M Vernizzi G Orland H Zee A Topologicalclassification of RNA structures J Mol Biol 2008379900ndash911
59 Bon M Orland H TT2NE a novel algorithm topredict RNA secondary structures with pseudoknotsNucleic Acids Res 2011 39e93
60 Uemura Y Hasegawa A Kobayashi S Yokomori TTree adjoining grammars for RNA structure predic-tion Theoret Comput Sci 1999 210277ndash303 [Paperas Print Copy]
61 Akutsu T Dynamic programming algorithms for RNAsecondary structure prediction with pseudoknots DiscAppl Math 2000 10445ndash62
62 Dirks RM Pierce NA A partition function algorithmfor nucleic acid secondary structure including pseudo-knots J Comput Chem 2003 241664ndash1677
63 Chen HL Condon A Jabbari H An O(n(5)) algorithmfor MFE prediction of kissing hairpins and 4-chains innucleic acids J Comput Biol 2009 16803ndash815
64 Mohl M Salari R Will S Backofen R Sahinalp SCSparsification of RNA structure prediction includingpseudoknots Algor Mol Biol 2010 539
65 Tabaska JE Cary RB Gabow HN Stormo GD AnRNA folding method capable of identifying pseu-doknots and base triples Bioinformatics 1998 14691ndash699
66 Chen SJ RNA folding conformational statistics fold-ing kinetics and ion electrostatics Annu Rev Biophys2008 37197ndash214
67 Andronescu MS Pop C Condon AE Improved freeenergy parameters for RNA pseudoknotted secondarystructure prediction RNA 2010 1626ndash42
68 Parisien M Major F The MC-Fold and MC-Sympipeline infers RNA structure from sequence dataNature 2008 45251ndash55
69 zu Siederdissen CH Bernhart SH Stadler PFHofacker IL A folding algorithm for extendedRNA secondary structures Bioinformatics 2011 27i129ndashi136
70 Laing C Schlick T Computational approaches to3D modeling of RNA J Phys Condens Matter 201022283101
71 Alkan C Karakoc E Nadeau JH Sahinalp SC ZhangK RNA-RNA interaction prediction and antisenseRNA target search J Comput Biol 2006 13267ndash282
72 Kruger J Rehmsmeier M RNAhybrid microRNA tar-get prediction easy fast and flexible Nucleic Acids Res2006 34W451ndashW454
73 Hofacker IL Fontana W Stadler PF Bonhoeffer STacker M Schuster P Fast folding and comparison ofRNA secondary structures Monatshefte Chem 1994125167ndash188
74 Pervouchine DD IRIS intermolecular RNA interac-tion search Genome Inform 2004 1592ndash101
75 Salari R Mohl M Will S Sahinalp S Backofen RTime and space efficient RNA-RNA interaction pre-diction via sparse folding In Berger B ed Proc ofRECOMB 2010 Volume 6044 of Lecture Notes inComputer Science BerlinHeidelberg Springer 2010473ndash490
76 Muckstein U Tafer H Hackermuller J Bernhart SHStadler PF Hofacker IL Thermodynamics of RNA-RNA binding Bioinformatics 2006 221177ndash1182
77 Busch A Richter AS Backofen R IntaRNA efficientprediction of bacterial sRNA targets incorporating tar-get site accessibility and seed regions Bioinformatics2008 242849ndash2856
78 Li AX Marz M Qin J Reidys CM RNA-RNAinteraction prediction based on multiple sequencealignments Bioinformatics 2011 27456ndash463
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
79 Seemann SE Richter AS Gesell T Backofen R Gorod-kin J PETcofold predicting conserved interactionsand structures of two multiple alignments of RNAsequences Bioinformatics 2011 27211ndash219
80 Nagel JHA Pleij CWA Self-induced structuralswitches in RNA Biochimie 2002 84913ndash923
81 Hofacker IL Flamm C Heine C Wolfinger MTScheuermann G Stadler PF BarMap RNA fold-ing on dynamic energy landscapes RNA 2010 161308ndash1316
82 Wolfinger MT Svrcek-Seiler WA Flamm C HofackerIL Stadler PF Efficient computation of RNA foldingdynamics J Phys A Math Gen 2004 374731ndash4741[httpstacksioporg0305-4470374731]
83 Flamm C Hofacker I Beyond energy minimizationapproaches to the kinetic folding of RNA Chem Mon2008 139447ndash457
84 Chen SJ Dill KA RNA folding energy landscapesProc Natl Acad Sci USA 2000 97646ndash651
85 Lorenz WA Clote P Computing the partition func-tion for kinetically trapped RNA secondary structuresPLoS One 2011 6e16178
86 Tang X Thomas S Tapia L Giedroc DP Amato NMSimulating RNA folding kinetics on approximatedenergy landscapes J Mol Biol 2008 3811055ndash1067
87 Xayaphoummine A Bucher T Isambert H Kinefoldweb server for RNADNA folding path and structureprediction including pseudoknots and knots NucleicAcids Res 2005 33W605ndashW610
88 Rivas E Eddy SR Secondary structure alone is gen-erally not statistically significant for the detection ofnoncoding RNAs Bioinformatics 2000 16583ndash605
89 Washietl S Hofacker IL Consensus folding of alignedsequences as a new measure for the detection of func-tional RNAs by comparative genomics J Mol Biol2004 34219ndash30
90 Rivas E Eddy SR Noncoding RNA gene detectionusing comparative sequence analysis BMC Bioinform2001 28
91 del Val C Rivas E Torres-Quesada O Toro NJimnez-Zurdo JI Identification of differentiallyexpressed small non-coding RNAs in the legumeendosymbiont Sinorhizobium meliloti by comparativegenomics Mol Microbiol 2007 661080ndash1091
92 Rivas E Klein RJ Jones TA Eddy SR Computationalidentification of noncoding RNAs in E coli by com-parative genomics Curr Biol 2001 111369ndash1373
93 Coventry A Kleitman DJ Berger B MSARI multiplesequence alignments for statistical detection of RNAsecondary structure Proc Natl Acad Sci U S A 200410112102ndash12107
94 Washietl S Hofacker IL Stadler PF Fast and reliableprediction of noncoding RNAs Proc Natl Acad SciU S A 2005 1022454ndash2459
95 Pedersen JS Bejerano G Siepel A Rosenbloom KLindblad-Toh K Lander ES Kent J Miller W Haus-sler D Identification and classification of conservedRNA secondary structures in the human genome PLoSComput Biol 2006 2e33
96 Washietl S Pedersen JS Korbel JO Stocsits C GruberAR Hackermller J Hertel J Lindemeyer M ReicheK Tanzer A et al Structured RNAs in the ENCODEselected regions of the human genome Genome Res2007 17852ndash864
97 Mourier T Carret C Kyes S Christodoulou Z Gard-ner PP Jeffares DC Pinches R Barrell B Berriman MGriffiths-Jones S et al Genome-wide discovery andverification of novel structured RNAs in Plasmodiumfalciparum Genome Res 2008 18281ndash292
98 Weinberg Z Barrick JE Yao Z Roth A Kim JNGore J Wang JX Lee ER Block KF Sudarsan Net al Identification of 22 candidate structured RNAsin bacteria using the CMfinder comparative genomicspipeline Nucleic Acids Res 2007 354809ndash4819
99 Yao Z Weinberg Z Ruzzo WL CMfinderndasha covari-ance model based RNA motif finding algorithmBioinformatics 2006 22445ndash452
100 Freyhult EK Bollback JP Gardner PP Exploringgenomic dark matter a critical assessment of the per-formance of homology search methods on noncodingRNA Genome Res 2007 17117ndash125
101 Menzel P Gorodkin J Stadler PF The tedious taskof finding homologous noncoding RNA genes RNA2009 152075ndash2082
102 Gautheret D Major F Cedergren R Pattern searchingalignment with RNA primary and secondary struc-tures an effective descriptor for tRNA Comput ApplBiosci 1990 6325ndash331
103 Macke TJ Ecker DJ Gutell RR Gautheret D CaseDA Sampath R RNAMotif an RNA secondary struc-ture definition and search algorithm Nucleic Acids Res2001 294724ndash4735
104 Nawrocki EP Kolbe DL Eddy SR Infernal 10inference of RNA alignments Bioinformatics 2009251335ndash1337
105 Lowe TM Eddy SR tRNAscan-SE a program forimproved detection of transfer RNA genes in genomicsequence Nucleic Acids Res 1997 25955ndash964
106 Lagesen K Hallin P Rodland EA Staerfeldt HHRognes T Ussery DW RNAmmer consistent andrapid annotation of ribosomal RNA genes NucleicAcids Res 2007 353100ndash3108
107 Lowe TM Eddy SR A computational screen formethylation guide snoRNAs in yeast Science 19992831168ndash1171
108 Schattner P Decatur WA Davis CA Fournier MJLowe TM Genome-wide searching for pseudouridy-lation guide snoRNAs analysis of the Saccharomycescerevisiae genome Nucleic Acids Res 2004 324281ndash4296
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
109 Hertel J Hofacker IL Stadler PF SnoReport com-putational identification of snoRNAs with unknowntargets Bioinformatics 2008 24158ndash164
110 Laslett D Canback B Andersson S BRUCE a pro-gram for the detection of transfer-messenger RNAgenes in nucleotide sequences Nucleic Acids Res 2002303449ndash3453
111 Regalia M Rosenblad MA Samuelsson T Predic-tion of signal recognition particle RNA genes NucleicAcids Res 2002 303368ndash3377
112 Cabili MN Trapnell C Goff L Koziol M Tazon-Vega B Regev A Rinn JL Integrative annotationof human large intergenic noncoding RNAs revealsglobal properties and specific subclasses Genes Dev2011 251915ndash1927
113 Findei S Engelhardt J Prohaska SJ Stadler PFProtein-coding structured RNAs A computationalsurvey of conserved RNA secondary structures over-lapping coding regions in drosophilids Biochimie2011 932019ndash2023
114 Dinger ME Pang KC Mercer TR Mattick JS Differ-entiating protein-coding and noncoding RNA chal-lenges and ambiguities PLoS Comput Biol 20084e1000176
115 Frith MC Bailey TL Kasukawa T Mignone F Kum-merfeld SK Madera M Sunkara S Furuno M Bult CJQuackenbush J et al Discrimination of non-protein-coding transcripts from protein-coding mRNA RNABiol 2006 340ndash48
116 Lin MF Jungreis I Kellis M PhyloCSF a compara-tive genomics method to distinguish protein codingand non-coding regions Bioinformatics 2011 27i275ndashi282
117 Washietl S Findeiss S Muller SA Kalkhof S vonBergen M Hofacker IL Stadler PF Goldman N RNA-code robust discrimination of coding and noncodingregions in comparative sequence data RNA 2011 17578ndash594
118 Trapnell C Pachter L Salzberg SL TopHat discov-ering splice junctions with RNA-Seq Bioinformatics2009 251105ndash1111
119 Wu TD Nacu S Fast and SNP-tolerant detection ofcomplex variants and splicing in short reads Bioinfor-matics 2010 26873ndash881
120 Au KF Jiang H Lin L Xing Y Wong WH Detectionof splice junctions from paired-end RNA-seq data bySpliceMap Nucleic Acids Res 2010 384570ndash4578
121 Guttman M Garber M Levin JZ Donaghey J Robin-son J Adiconis X Fan L Koziol MJ Gnirke ANusbaum C et al Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conservedmulti-exonic structure of lincRNAs Nat Biotechnol2010 28503ndash510
122 Trapnell C Williams BA Pertea G Mortazavi AKwan G van Baren MJ Salzberg SL Wold BJPachter L Transcript assembly and quantification by
RNA-Seq reveals unannotated transcripts and isoformswitching during cell differentiation Nat Biotechnol2010 28511ndash515
123 Zerbino DR Birney E Velvet algorithms for de novoshort read assembly using de Bruijn graphs GenomeRes 2008 18821ndash829
124 Robertson G Schein J Chiu R Corbett R Field MJackman SD Mungall K Lee S Okada HM Qian JQet al De novo assembly and analysis of RNA-seq dataNat Methods 2010 7909ndash912
125 Mortazavi A Williams BA McCue K Schaef-fer L Wold B Mapping and quantifying mammaliantranscriptomes by RNA-Seq Nat Methods 20085621ndash628
126 Griffith M Griffith OL Mwenifumbo J Goya R Mor-rissy AS Morin RD Corbett R Tang MJ Hou YCPugh TJ et al Alternative expression analysis by RNAsequencing Nat Methods 2010 7843ndash847
127 Wang X Wu Z Zhang X Isoform abundance infer-ence provides a more accurate estimation of geneexpression levels in RNA-seq J Bioinform ComputBiol 2010 8 (Suppl 1)177ndash192
128 Katz Y Wang ET Airoldi EM Burge CB Analysis anddesign of RNA sequencing experiments for identifyingisoform regulation Nat Methods 2010 71009ndash1015
129 Roberts A Trapnell C Donaghey J Rinn JL PachterL Improving RNA-Seq expression estimates by cor-recting for fragment bias Genome Biol 2011 12R22
130 Anders S Huber W Differential expression analysisfor sequence count data Genome Biol 2010 11R106
131 Robinson MD McCarthy DJ Smyth GK edgeR aBioconductor package for differential expression anal-ysis of digital gene expression data Bioinformatics2010 26139ndash140
132 Ambros V The functions of animal microRNAsNature 2004 431350ndash355
133 Bartel D MicroRNAs genomics biogenesis mecha-nism and function Cell 2004 116281ndash297
134 Schnall-Levin M Zhao Y Perrimon N Berger BConserved microRNA targeting in Drosophila is aswidespread in coding regions as in 3primeUTRs Proc NatlAcad Sci U S A 2010 10715751ndash15756
135 Schnall-Levin M Rissland OS Johnston WK Per-rimon N Bartel DP Berger B Unusually effec-tive microRNA targeting within repeat-rich codingregions of mammalian mRNAs Genome Res 2011211395ndash1403
136 Grishok A Pasquinelli A Conte D Li N Parrish SHa I Baillie D Fire A Ruvkun G Mello C Genesand mechanisms related to RNA interference regulateexpression of the small temporal RNAs that control Celegans developmental timing Cell 2001 10623ndash34
137 Berezikov E Cuppen E Plasterk R Approaches tomicroRNA discovery Nat Genet 2006 38(Suppl)S2ndashS7
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
WIREs RNA Computational analysis of noncoding RNAs
138 Lim L Lau N Weinstein E Abdelhakim A Yekta SRhoades M Burge C Bartel D The microR-NAs of Caenorhabditis elegans Genes Dev 200317991ndash1008
139 Grad Y Aach J Hayes G Reinhart B Church GRuvkun G Kim J Computational and experimen-tal identification of C elegans microRNAs Mol Cell2003 111253ndash1263
140 Lai E Tomancak P Williams R Rubin G Computa-tional identification of Drosophila microRNA genesGenome Biol 2003 4R42
141 Xie X Lu J Kulbokas E Golub T Mootha VLindblad-Toh K Lander E Kellis M Systematic dis-covery of regulatory motifs in human promoters and3rsquo UTRs by comparison of several mammals Nature2005 434338ndash345
142 Stark A Kheradpour P Parts L Brennecke J HodgesE Hannon G Kellis M Systematic discovery and char-acterization of fly microRNAs using 12 Drosophilagenomes Genome Res 2007 171865ndash1879
143 Nam J Shin K Han J Lee Y Kim V Zhang BHuman microRNA prediction through a probabilisticco-learning model of sequence and structure NucleicAcids Res 2005 333570ndash3581
144 Wang X Zhang J Li F Gu J He T Zhang X Li YMicroRNA identification based on sequence and struc-ture alignment Bioinformatics 2005 213610ndash3614
145 Bonnet E Wuyts J Rouze P Van de Peer Y Evidencethat microRNA precursors unlike other non-codingRNAs have lower folding free energies than randomsequences Bioinformatics 2004 202911ndash2917
146 Washietl S Hofacker I Lukasser M Huttenhofer AStadler P Mapping of conserved RNA secondarystructures predicts thousands of functional noncodingRNAs in the human genome Nat Biotechnol 2005231383ndash1390
147 Hertel J Stadler P Hairpins in a Haystack recognizingmicroRNA precursors in comparative genomics dataBioinformatics 2006 22e197ndashe202
148 Bentwich I Prediction and validation of microRNAsand their targets FEBS Lett 2005 5795904ndash5910
149 Xue C Li F He T Liu G Li Y Zhang X Classifica-tion of real and pseudo microRNA precursors usinglocal structure-sequence features and support vectormachine BMC Bioinform 2005 6310
150 Pfeffer S Sewer A Lagos-Quintana M Sheridan RSander C Grasser F van Dyk L Ho C Shuman SChien M et al Identification of microRNAs of theherpesvirus family Nat Methods 2005 2269ndash276
151 Ohler U Yekta S Lim L Bartel D Burge C Patternsof flanking sequence conservation and a characteris-tic upstream motif for microRNA gene identificationRNA 2004 101309ndash1322
152 Seitz H Royo H Bortolin M Lin S Ferguson-Smith ACavaille J A large imprinted microRNA gene cluster
at the mouse Dlk1-Gtl2 domain Genome Res 2004141741ndash1748
153 Hendrix D Levine M Shi W miRTRAP a com-putational method for the systematic identificationof miRNAs from high throughput sequencing dataGenome Biol 2010 11R39
154 Lu C Tej S Luo S Haudenschild C Meyers B GreenP Elucidation of the small RNA component of thetranscriptome Science 2005 3091567ndash1569
155 Ruby J Jan C Player C Axtell M Lee W Nusbaum CGe H Bartel D Large-scale sequencing reveals 21U-RNAs and additional microRNAs and endogenoussiRNAs in C elegans Cell 2006 1271193ndash1207
156 Aravin A Gaidatzis D Pfeffer S Lagos-QuintanaM Landgraf P Iovino N Morris P Brownstein MKuramochi-Miyagawa S Nakano T et al A novelclass of small RNAs bind to MILI protein in mousetestes Nature 2006 442203ndash207
157 Shi W Hendrix D Levine M Haley B A distinctclass of small RNAs arises from pre-miRNA-proximalregions in a simple chordate Nat Struct Mol Biol2009 16183ndash189
158 Friedlander M Mackowiak S Li N Chen W Rajew-sky N miRDeep2 accurately identifies known andhundreds of novel microRNA genes in seven animalclades Nucleic Acids Res 2011
159 Mathelier A Carbone A MIReNA finding microR-NAs with high accuracy and no learning at genomescale and from deep sequencing data Bioinformatics2010 262226ndash2234
160 Stark A Brennecke J Bushati N Russell R Cohen SAnimal MicroRNAs confer robustness to gene expres-sion and have a significant impact on 3primeUTR evolutionCell 2005 1231133ndash1146
161 Bartel D MicroRNAs target recognition and regula-tory functions Cell 2009 136215ndash233
162 Washietl S Hofacker IL Nucleic acid sequenceand structure databases Methods Mol Biol 20106093ndash15
163 Galperin MY Cochrane GR The 2011 Nucleic AcidsResearch Database Issue and the online MolecularBiology Database Collection Nucleic Acids Res 201139D1ndashD6
164 He S Liu C Skogerboslash G Zhao H Wang J Liu T BaiB Zhao Y Chen R NONCODE v20 decoding thenon-coding Nucleic Acids Res 2008 36D170ndashD172
165 Pang KC Stephen S Engstrom PG Tajul-Arifin KChen W Wahlestedt C Lenhard B HayashizakiY Mattick JS RNAdbndasha comprehensive mammaliannoncoding RNA database Nucleic Acids Res 200533D125ndashD130
166 Mituyama T Yamada K Hattori E Okida H Ono YTerai G Yoshizawa A Komori T Asai K The Func-tional RNA Database 30 databases to support miningand annotation of functional RNAs Nucleic Acids Res2009 37D89ndashD92
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd
Advanced Review wireswileycomrna
167 Amaral PP Clark MB Gascoigne DK Dinger MEMattick JS lncRNAdb a reference database forlong noncoding RNAs Nucleic Acids Res 201139D146ndashD151
168 Gardner PP Daub J Tate J Moore BL Osuch IHGriffiths-Jones S Finn RD Nawrocki EP KolbeDL Eddy SR et al Rfam Wikipedia clans andthe lsquolsquodecimalrsquorsquo release Nucleic Acids Res 2011 39D141ndashD145
169 Kozomara A Griffiths-Jones S miRBase integrat-ing microRNA annotation and deep-sequencing dataNucleic Acids Res 2011 39D152ndashD157
170 Sethupathy P Corda B Hatzigeorgiou A TarBase acomprehensive database of experimentally supportedanimal microRNA targets RNA 2006 12192ndash197
171 Jiang Q Wang Y Hao Y Juan L Teng M Zhang X LiM Wang G Liu Y miR2Disease a manually curateddatabase for microRNA deregulation in human dis-ease Nucleic Acids Res 2009 37D98ndashD104
172 Bateman A Agrawal S Birney E Bruford EA BujnickiJM Cochrane G Cole JR Dinger ME Enright AJGardner PP et al RNAcentral a vision for an inter-national database of RNA sequences RNA 2011 171941ndash1946
copy 2012 John Wiley amp Sons Ltd