LARGE-SCALE BIOLOGY

A Comprehensive Map of Intron Branchpoints and Lariat RNAs in Plants

Xiaotuo Zhang (a,e), Yong Zhang (a,e), Taiyun Wang (a), Ziwei Li (a), Jinping Cheng (a), Haoran Ge (a), Qi Tang (a), Kun Chen (d), Li Liu (d), Chenyu Lu (c), Junqiang Guo (b,c), Binglian Zheng (a,f), and Yun Zheng (b,c,f)

(a) State Key Laboratory of Genetic Engineering, Ministry of Education Key Laboratory of Biodiversity Sciences and Ecological Engineering, Institute of Plant Biology, School of Life Sciences, Fudan University, Shanghai 200438, China
(b) Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
(c) Yunnan Key Laboratory of Primate Biomedical Research; Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
(d) Faculty of Life Science and Technology, Kunming University of Science and Technology, Kunming, Yunnan 650500, China
(e) These authors contributed equally to the work.
(f) Correspondence: YZ ([email protected]) and BZ ([email protected]).

Short title: Splicing branchpoints and lariats in plants

One-sentence summary: Analysis of 948 RNA sequencing datasets produced a comprehensive map of intron branchpoints and lariat RNAs in Arabidopsis thaliana, tomato, rice, and maize.

The author(s) responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) are: Yun Zheng ([email protected]) and Binglian Zheng ([email protected]).

ABSTRACT
Lariats are formed from excised introns when the 5’ splice site is joined to the branchpoint (BP) during splicing. Although lariat RNAs are usually degraded by RNA debranching enzyme 1 (DBR1), recent studies in animals have detected many lariat RNAs under physiological conditions. By contrast, the features of BPs, and the extent to which lariat RNAs accumulate naturally, remain largely unexplored in plants. Here, we analyzed 948 RNA sequencing datasets to document plant BPs and lariat RNAs on a genome-wide scale. In total, we identified 13,872, 5199, 29,582, and 13,478 BPs in Arabidopsis thaliana, tomato (Solanum lycopersicum), rice (Oryza sativa), and maize (Zea mays), respectively. Plant BPs closely resemble those in yeast and human: they are preferentially adenines and are flanked by uracil-enriched sequences. Intriguingly, ~20% of introns harbor multiple BPs, and BP usage is tissue-specific. Furthermore, 10,580 lariat RNAs accumulate in wild-type Arabidopsis plants, and most of them originate from longer or retroelement-depleted introns. Moreover, the expression of these lariat RNAs is accompanied by the incidence of back-splicing of the parent exons. Collectively, our results provide a comprehensive map of intron BPs and lariat RNAs in four plant species and uncover a link between lariat turnover and splicing.

Plant Cell Advance Publication. Published on March 20, 2019, doi:10.1105/tpc.18.00711
©2019 American Society of Plant Biologists. All Rights Reserved

INTRODUCTION
In eukaryotes, splicing of mRNA precursors (pre-mRNAs), a highly conserved and critical step in gene expression, comprises two catalytic steps (Ruskin et al., 1984). In the first step, the 5’ splice site (5’ss, usually a GU dinucleotide) is attacked and the 5’ end of the intron is concurrently joined to the branchpoint (BP) through a 2’-5’ phosphodiester bond. This produces a free 5’ exon and a lariat intermediate consisting of the intron in lariat form still attached to the 3’ exon. In the second step, the 3’ splice site (3’ss, usually an AG dinucleotide) is attacked and the two exons are ligated to produce the mRNA (Ruskin et al., 1984). The excised lariat introns, termed lariat RNAs, are traditionally thought to be degraded quickly: the dedicated RNA debranching enzyme 1 (DBR1) recognizes the BP and linearizes the lariat, promoting its rapid degradation (Ruskin and Green, 1985; Nam et al., 1997; Kim et al., 2000; Kim et al., 2001; Wang et al., 2004). As an obligate signal, the BP must be properly selected to ensure efficient splicing (Jacquier and Rosbash, 1986). Recent studies showed that BP features are highly conserved from yeast to human, that BP selection is regulated, and that BP mutations occur in various diseases (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015; Taggart et al., 2017; Pineda and Bradley, 2018).
Lariat RNAs derived from excised introns are usually destined for intron recycling, although some debranched lariat RNAs can be further processed into mirtron microRNAs in animals (Ruby et al., 2007; Okamura et al., 2007), into small interfering RNAs in yeast (Dumesic et al., 2013), or into small nucleolar RNAs (Ooi et al., 1998). However, some lariat RNAs accumulate under physiological conditions in animals (Zhang et al., 2013; Talhouarne and Gall, 2014; Tay and Pek, 2017; Talhouarne and Gall, 2018). Moreover, two recent studies showed that intron RNAs promote cell survival in yeast (Parenteau et al., 2019; Morgan et al., 2019), further indicating that intronic RNAs are not useless by-products of splicing but can play essential roles in eukaryotes.
Loss-of-function mutations in DBR1 are embryonic lethal in both plants and animals and are accompanied by over-accumulation of lariat RNAs (Wang et al., 2004; Zheng et al., 2015; Li et al., 2016), indicating that DBR1 is essential for viability in both kingdoms. We previously showed that lariat RNAs act as decoys that inhibit genome-wide miRNA biogenesis by sequestering the Dicer complex (Li et al., 2016). Together with the findings that lariat RNAs act as decoys to sequester toxic TDP-43 in amyotrophic lateral sclerosis (ALS) (Armakola et al., 2012) and that loss of DBR1 compromises retrovirus replication (Ye et al., 2005; Galvis et al., 2014; Galvis et al., 2017; Zhang et al., 2018), these results suggest that controlling lariat RNA abundance is a potential therapeutic strategy.
In earlier studies, lariat RNAs were usually detected by RT-PCR, which exploits the ability of reverse transcriptase to read through the BP (Suzuki et al., 2006). With advances in high-throughput sequencing and bioinformatic analyses, several studies in animals, including Xenopus tropicalis, Drosophila melanogaster, mouse, chicken, zebrafish, and human, revealed genome-wide accumulation of lariat RNAs in stable circular forms (Zhang et al., 2013; Tay and Pek, 2017; Talhouarne and Gall, 2018), implying that the natural accumulation of lariat RNAs is evolutionarily conserved in animals. Although lariat formation is highly conserved in eukaryotes, the features of lariat RNAs in plants remained largely unexplored. Likewise, the features of plant BPs, and whether BP-flanking sequences play a role in lariat RNA turnover, were unclear. In contrast to the growing understanding of BPs and lariat RNAs in yeast (Bitton et al., 2014) and animals (Taggart et al., 2012; Mercer et al., 2015; Pineda and Bradley, 2018), a genome-wide analysis of plant intronic BPs and lariat RNAs had not yet been reported.
Here, we performed large-scale analyses to systematically identify BPs in four plant species, including both dicots and monocots, and provide a genome-wide map of intron BPs in plants. Our results indicate that plant introns prefer adenines as BPs, that many introns have multiple BPs, and that BP usage is tissue-specific. Furthermore, using circular RNA-seq analyses of wild type and a weak viable dbr1 mutant (dbr1-2), we show that over 10,000 lariat RNAs accumulate to at least 5 FPKM (Fragments Per Kilobase of transcript per Million mapped fragments) in wild-type Arabidopsis. The expression of these lariat RNAs is anti-correlated with the frequency of retroelement insertions in the introns but positively correlated with the incidence of back-splicing of the flanking exons. Our data provide insights into the characteristics of plant lariat RNAs and intron BPs, and reveal an unexpected complexity of BP selection, lariat RNA turnover, and splicing.

RESULTS
Transcriptomic analyses of Col-0 and dbr1-2 by circular RNA sequencing
Because recent studies showed that lariat RNAs accumulate widely in animals (Zhang et al., 2013; Tay and Pek, 2017; Talhouarne and Gall, 2018), we investigated whether this is also the case in plants. Because lariat RNAs that escape linearization in vivo exist in a circular form, we performed circular RNA sequencing to identify lariat RNAs globally under physiological conditions. Briefly, taking advantage of dbr1-2, a weak viable allele of dbr1 (Li et al., 2016), we enriched sequencing tags spanning the junction between the 5’ss and the BP from the transcriptomes of Col-0 and dbr1-2 for genome-wide identification of BPs and lariat RNAs (Supplemental Figure 1). The RNA-seq profiles were consistent across multiple samples of both Col-0 and dbr1-2, with and without RNase R treatment (Supplemental Figure 2A).
Because most linear mRNAs were digested by RNase R, the RNase R-treated samples had lower gene expression levels than untreated samples (Supplemental Figure 3A). The global gene expression patterns of Col-0 and dbr1-2 were very similar, as shown by the high correlation coefficients between untreated Col-0 and dbr1-2 samples (Supplemental Figure 3B). In hierarchical clustering, the untreated Col-0 and dbr1-2 samples grouped together, with within-cluster differences much smaller than those among the RNase R-treated samples of Col-0 and dbr1-2 (Supplemental Figure 3B). Principal Component Analysis (PCA) also showed that the untreated samples clustered together (Supplemental Figure 3C). Moreover, relative to Col-0, only 60 genes were deregulated in dbr1-2 (multiple-testing-corrected P < 0.05; Supplemental Data Set 1). These results suggest that the two genotypes have similar gene expression profiles.
However, intron accumulation was significantly increased in dbr1-2, both with and without RNase R treatment (P < 10^-100 in both cases, Mann-Whitney U-test; Figure 1A). Intron expression was consistent among Col-0 and dbr1-2 profiles with and without RNase R treatment (Supplemental Figure 2B). Interestingly, samples clustered according to their intron-level profiles in both hierarchical clustering (Figure 1B) and PCA (Figure 1C). Furthermore, the differences between Col-0 and dbr1-2 samples were larger when measured with intron levels than with gene levels (Supplemental Figure 3B). These results suggest that intronic expression underlies the main differences between the transcriptomes of Col-0 and dbr1-2. By selecting intronic transcripts with average abundances of ≥ 5 FPKM in RNase R-treated Col-0 samples, we identified 10,580 transcripts (619 detected only in Col-0 and 9961 detected in both Col-0 and dbr1-2) from 6585 genes (548 genes from Col-0 only and 6037 genes from both Col-0 and dbr1-2) as lariat RNAs (Figure 1D and Supplemental Data Set 1), indicating that these lariat RNAs accumulate under physiological conditions. Of note, the Arabidopsis thaliana genome contains 128,271 annotated introns from 22,524 genes, but more than 50% of them (64,213/128,271 ≈ 50.1%) are ≤ 100 nucleotides (nt); these short introns were underrepresented in our study, presumably because short fragments are intentionally depleted during library preparation.
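The ≥ 5 FPKM cutoff used above can be made concrete with a short sketch. It assumes the standard FPKM formula, FPKM = fragments × 10^9 / (feature length in bp × total mapped fragments), averages FPKM over replicate libraries, and uses hypothetical function names and input dictionaries; it illustrates only the thresholding step, not the authors' full pipeline.

```python
from statistics import mean


def fpkm(fragments: int, length_bp: int, total_mapped: int) -> float:
    """Fragments Per Kilobase of feature per Million mapped fragments."""
    # 1e9 = 1e3 (per kilobase) * 1e6 (per million mapped fragments)
    return fragments * 1e9 / (length_bp * total_mapped)


def lariat_candidates(counts, lengths, library_sizes, min_fpkm=5.0):
    """Keep introns whose mean FPKM across replicates is >= min_fpkm.

    counts:        {intron_id: [fragments in replicate 1, replicate 2, ...]}
    lengths:       {intron_id: intron length in bp}
    library_sizes: [total mapped fragments in replicate 1, replicate 2, ...]
    """
    selected = {}
    for intron_id, per_replicate in counts.items():
        values = [
            fpkm(n, lengths[intron_id], total)
            for n, total in zip(per_replicate, library_sizes)
        ]
        average = mean(values)
        if average >= min_fpkm:
            selected[intron_id] = average
    return selected


# Hypothetical RNase R-treated Col-0 replicates (20 million mapped fragments each).
example = lariat_candidates(
    counts={"intron_A": [100, 60], "intron_B": [3, 5]},
    lengths={"intron_A": 400, "intron_B": 400},
    library_sizes=[20_000_000, 20_000_000],
)
print(example)  # {'intron_A': 10.0}
```

Here intron_A averages 12.5 and 7.5 FPKM across the two replicates and passes the cutoff, whereas intron_B (0.375 and 0.625 FPKM) does not.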
By comparison, 15,602 intronic transcripts (5641 detected only in dbr1-2 and 9961 detected in both Col-0 and dbr1-2) from 10,242 genes (4205 genes from dbr1-2 only and 6037 genes from both Col-0 and dbr1-2) showed average abundances of ≥ 5 FPKM in RNase R-treated dbr1-2 samples (Figure 1D), and 6720 unique intronic transcripts from 4672 genes had significantly higher expression levels in dbr1-2 than in Col-0 (Figure 1E and Supplemental Data Set 1). Notably, this increase in intronic expression was heavily biased toward long introns (Figure 1F), which might reflect the depletion of small introns by commercial library construction protocols. Because linear RNAs were removed by RNase R treatment, the increased intronic accumulation in dbr1-2 can be attributed to the accumulation of lariat RNAs.
To exclude the possibility that the increased intronic accumulation in dbr1-2 was caused by reduced splicing efficiency, we compared the splicing efficiency (SE) of Col-0 and dbr1-2 using RNA-seq profiles without RNase R treatment. Overall SE did not differ significantly between Col-0 and dbr1-2 (Supplemental Figure 3D). Therefore, because dbr1-2 had minor effects on gene expression but major effects on intron expression, these transcriptomes are suitable for further investigating BPs and lariat RNAs in Arabidopsis.
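The text does not define how SE was computed. One common formulation, used here purely as an assumption, compares reads spanning the exon-exon junction of an intron with reads that still cross its exon-intron boundaries; the function and argument names below are hypothetical.

```python
from typing import Optional


def splicing_efficiency(junction_reads: int,
                        five_prime_boundary_reads: int,
                        three_prime_boundary_reads: int) -> Optional[float]:
    """Fraction of reads at an intron that support the spliced product.

    Assumed definition: SE = J / (J + (E5 + E3) / 2), where J counts
    exon-exon junction reads and E5/E3 count reads crossing the 5' and 3'
    exon-intron boundaries (i.e. unspliced, intron-retaining molecules).
    """
    unspliced = (five_prime_boundary_reads + three_prime_boundary_reads) / 2
    total = junction_reads + unspliced
    if total == 0:
        return None  # no coverage at this intron: SE is undefined
    return junction_reads / total


print(splicing_efficiency(90, 12, 8))  # 0.9
```

Per-intron values of this kind from Col-0 and dbr1-2 could then be compared genome-wide; the statistic behind Supplemental Figure 3D is not specified in this section.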

BP features are highly conserved from dicots to monocots
Reverse transcriptase can traverse the BP and copy the intronic region upstream of it; the resulting product therefore contains two juxtaposed intronic segments that align in inverted order, defining the 5’ss and the BP (Suzuki et al., 2006). Because the same read-through also occurs during the construction of RNA sequencing libraries, we developed a computational pipeline that uses RNA-seq datasets to map plant BPs systematically (Figure 2A). In brief, we first aligned all sequenced reads to the genome with TopHat2 (Kim et al., 2013) or HISAT2 (Kim et al., 2015). We then aligned the unmapped reads to introns with BLASTN or Bowtie 2 (Langmead and Salzberg, 2012). For reads that could be partially mapped to an intron, we examined whether the unmapped part of the same read could be aligned to the same intron (Figure 2A). Reads spanning the 5’ss and another region close to the 3’ss, with at least 6 nt in each segment, were used to infer the BP: the last nucleotide of the segment mapped close to the 3’ss is predicted as the BP, and reads covering the same BP were grouped. Applying this approach to 948 RNA-seq profiles in total (Supplemental Data Set 2; 167 profiles for Arabidopsis (Arabidopsis thaliana), 264 for tomato (Solanum lycopersicum), 207 for rice (Oryza sativa), and 310 for maize (Zea mays)), we obtained ~300,000 informative sequencing reads and identified 13,872 BPs from 6414 introns in Arabidopsis, 5199 BPs from 2566 introns in tomato, 29,582 BPs from 11,026 introns in rice, and 13,487 BPs from 5986 introns in maize (Supplemental Data Set 3).
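The read-splitting rule described above can be sketched in a few lines of Python. This simplified version assumes exact matching of both segments against a single sense-strand intron sequence, whereas the actual pipeline used TopHat2/HISAT2 followed by BLASTN or Bowtie 2, which tolerate mismatches (including the frequent mismatch at the BP itself). `infer_bp`, `group_by_bp`, and the toy intron are hypothetical.

```python
from collections import Counter
from typing import Optional


def infer_bp(read: str, intron: str, min_seg: int = 6) -> Optional[int]:
    """Return the 0-based BP position in `intron` implied by a lariat-junction read.

    In sense orientation, a read that traverses the 2'-5' linkage contains an
    intronic segment ending at the BP followed by the intron's 5' end, i.e. the
    two segments map to the intron in inverted order. Both must be >= min_seg nt.
    """
    for k in range(min_seg, len(read) - min_seg + 1):
        head, tail = read[:k], read[k:]
        if not intron.startswith(tail):  # second segment must start at the 5'ss
            continue
        pos = intron.rfind(head)  # rightmost hit, i.e. the one closest to the 3'ss
        if pos != -1:
            return pos + len(head) - 1  # last nt of the 3'-side segment = BP
    return None


def group_by_bp(reads, intron, min_seg=6):
    """Count supporting reads per BP, reported as 1-based intron positions."""
    counts = Counter()
    for read in reads:
        bp = infer_bp(read, intron, min_seg)
        if bp is not None:
            counts[bp + 1] += 1
    return counts


intron = "GTAAGTACGGCATCGATTGCTCTAACTTTTGTGTGCAG"  # toy intron: 5'ss GT ... 3'ss AG
junction_read = intron[15:25] + intron[:8]  # ends at the BP, then jumps to the 5'ss
linear_read = intron[5:20]  # ordinary collinear intronic read
print(group_by_bp([junction_read, junction_read, linear_read], intron))  # Counter({25: 2})
```

The collinear read is ignored because no split of it places a segment at the 5’ss, while both junction reads point to the adenine at intron position 25.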
In both dicot species (Arabidopsis and tomato) and monocot species (rice and maize), BPs within constitutive introns were most frequently adenines (>50%), followed by thymines/uracils (15-20%), guanines (~8-20%), and cytosines (~2-10%) (Figure 2B), as reported in yeast and human (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015; Taggart et al., 2017; Pineda and Bradley, 2018). Sanger sequencing of 16 randomly selected Arabidopsis lariat RNAs confirmed that these BPs were adenines (Supplemental Figure 3E and 3F). In addition, previous studies showed that the distance from the BP to the 3’ss is tightly constrained (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015). We found that BPs were preferentially positioned within 50 nt upstream of the 3’ss in Arabidopsis, tomato, and rice (Figure 2C), closely resembling yeast and human (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015). However, only 51.2% (6903/13,487) of maize BPs were located within 50 nt upstream of the 3’ss, and around half were positioned between 100 and 1000 nt from the 3’ss (Figure 2C). This heterogeneity in the distance between the BP and the 3’ss suggests that BP selection in maize is more complicated.
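The two summaries above, BP nucleotide composition and the fraction of BPs within 50 nt of the 3’ss, reduce to simple counting. The sketch assumes each BP is given as its genomic base, its 1-based position in the intron, and the intron length, and it measures distance as intron length minus BP position; this convention and the function name are assumptions, not necessarily those behind Figure 2C.

```python
from collections import Counter


def summarize_bps(bps, window=50):
    """Summarize BPs given as (base, 1-based position in intron, intron length).

    Returns per-base fractions (DNA letters; T stands for U in RNA) and the
    fraction of BPs lying within `window` nt upstream of the 3'ss.
    """
    n = len(bps)
    composition = Counter(base.upper() for base, _, _ in bps)
    fractions = {base: composition[base] / n for base in "ACGT"}
    # Distance to the 3'ss, measured as intron length minus BP position.
    near_3ss = sum(1 for _, pos, length in bps if length - pos <= window)
    return fractions, near_3ss / n


toy = [("A", 90, 100), ("A", 20, 300), ("T", 480, 500), ("G", 10, 400)]
print(summarize_bps(toy))
# ({'A': 0.5, 'C': 0.0, 'G': 0.25, 'T': 0.25}, 0.5)
```

Applied to the maize counts quoted above, 6903/13,487 ≈ 0.512, which matches the reported 51.2%.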
Although >50% of constitutive plant introns use adenine as the BP, a substantial portion use other nucleotides (Figure 2B and Supplemental Figure 4). To exclude the possibility that non-adenine BPs resulted from low fidelity of reverse transcriptase when converting the BP, we analyzed BP conversion events by calculating, for each species, the frequency of each reference-genome nucleotide relative to the total number of identified BPs. Adenines in the annotated pre-mRNA sequences were much more readily converted to uracils (Supplemental Figure 5A-D), whereas guanines retained high fidelity during library construction (Supplemental Figure 5A-D), as reported previously (Taggart et al., 2012). Therefore, the identified guanine BPs are most likely genuine non-canonical BPs in plants.
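One way to tabulate the conversion analysis above is to pair, for every supporting read, the reference base at the inferred BP with the base observed in the read, and then report the fraction of reads whose base changed. The sketch assumes such (reference, observed) pairs have already been extracted; the function name and toy data are hypothetical, and this is an illustration of the idea rather than a reconstruction of Supplemental Figure 5.

```python
from collections import Counter, defaultdict


def bp_conversion_rates(pairs):
    """pairs: iterable of (reference base at the BP, base observed in the read).

    Returns the raw substitution table and, for each reference base,
    the fraction of reads in which the base was changed.
    """
    table = defaultdict(Counter)
    for ref, observed in pairs:
        table[ref][observed] += 1
    rates = {}
    for ref, observed_counts in table.items():
        total = sum(observed_counts.values())
        rates[ref] = 1 - observed_counts[ref] / total
    return table, rates


# Toy data: adenines are often read as T (U in RNA); guanines are not.
reads = [("A", "A")] * 6 + [("A", "T")] * 4 + [("G", "G")] * 10
table, rates = bp_conversion_rates(reads)
print(dict(table["A"]), rates)
```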
Because the regions flanking BPs bind U2 snRNA, we surmised that nucleotides flanking the BPs might be important for BP recognition. To identify potential cis-elements, we analyzed the nucleotides around the BP. We identified a consensus motif containing a 10-nt uracil-rich element downstream of the BP (Figure 2D), and in all four plant species the second position upstream of the BP showed a strong preference for U (4.0-fold enrichment) (Figure 2D), consistent with a recent finding in human cell lines (Mercer et al., 2015). Moreover, multiple positions downstream of the BP showed preferences for uracil (Figure 2D). These observations indicate that BP selection is highly conserved from plants to animals.
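Positional preferences such as the 4.0-fold enrichment of U two nucleotides upstream of the BP can be expressed as the observed frequency at each position of a BP-centred window divided by a background frequency. The sketch assumes equal-length windows aligned on the BP and a caller-supplied background composition (for instance, overall intronic base frequencies); both choices are assumptions, because the background used for Figure 2D is not described here.

```python
from collections import Counter


def positional_enrichment(windows, background):
    """Fold enrichment of each base at each position of BP-aligned windows.

    windows:    equal-length sequences, all aligned on the BP
    background: {base: expected frequency}, e.g. intronic base composition
    """
    length = len(windows[0])
    if any(len(w) != length for w in windows):
        raise ValueError("all windows must have the same length")
    n = len(windows)
    profile = []
    for i in range(length):
        counts = Counter(w[i] for w in windows)
        profile.append({b: (counts[b] / n) / freq for b, freq in background.items()})
    return profile


# Toy windows covering positions -2, -1, BP, +1 (index 2 is the BP).
windows = ["TTAC", "TTAA", "CTAG", "TAAT"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
profile = positional_enrichment(windows, uniform)
print(profile[0]["T"], profile[2]["A"])  # 3.0 4.0
```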

Multiple branchpoints in plants
In general, a single default BP is assumed for each intron. However, by counting the BPs per intron in all four plant species, we found that although most introns had only one identified BP, ~20% of introns used two or more BPs (Figure 3A). For example, two closely spaced BPs were identified in At4g39260.1 I1 (the first intron of At4g39260.1) from our RNA sequencing analyses (Supplemental Figure 6A). We ranked the BPs of each intron by the number of mapped lariat reads and defined the BP supported by the most reads as the major BP. In At4g39260.1 I1, the major BP (the 258th nt) is an adenine supported by 15 reads, and the second BP (the 256th nt) is a uracil supported by 4 reads (Supplemental Figure 6B). We then used Sanger sequencing to further validate the two BPs in At4g39260.1 I1. The peaks before the 258th nt of this intron were distinct, each indicating a single nucleotide, whereas from the 258th nt onward there were multiple peaks at each position. Examination of the sequences corresponding to the major and minor peaks showed that the two sequences (Supplemental Figure 6C) resulted from the two BPs at the 258th and 256th nt of At4g39260.1 I1, respectively.
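The ranking rule used above (major BP = the BP with the most supporting lariat reads) and the fraction of introns with two or more BPs can be sketched as follows. The sketch assumes per-intron read counts are already available as `collections.Counter` objects keyed by BP position; the input layout and function name are hypothetical, and ties are broken by insertion order.

```python
from collections import Counter


def rank_bps(bp_reads):
    """bp_reads: {intron_id: Counter({bp_position: supporting_reads})}.

    Returns each intron's major BP and the fraction of introns with >= 2 BPs.
    """
    major = {}
    for intron_id, counts in bp_reads.items():
        # most_common() sorts by read support (ties keep insertion order).
        major[intron_id] = counts.most_common(1)[0][0]
    multi = sum(1 for counts in bp_reads.values() if len(counts) >= 2)
    return major, multi / len(bp_reads)


# Read counts from the At4g39260.1 I1 example, plus a hypothetical single-BP intron.
example = {
    "At4g39260.1_I1": Counter({258: 15, 256: 4}),
    "toy_intron": Counter({120: 3}),
}
print(rank_bps(example))  # ({'At4g39260.1_I1': 258, 'toy_intron': 120}, 0.5)
```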
Next, we investigated whether the distance of the BP from the 3’ss affects BP usage. Plotting the number of lariat reads as a function of BP position showed that most of the frequently used BPs resided within a narrow window in all four plant species, consistent with the restricted genome-wide distribution of BPs (Figure 3B). Together with the observation that multiple BPs occur widely in human cell lines (Pineda and Bradley, 2018), these findings indicate that the phenomenon of multiple BPs is conserved from plants to mammals.

Tissue-specific branchpoints in Arabidopsis and rice
We surmised that multiple BPs might play a regulatory role in pre-mRNA splicing; in other words, BP usage might be developmentally regulated, as recently reported in human cell lines (Pineda and Bradley, 2018). To test this hypothesis, we investigated whether certain introns exhibit tissue-specific BP usage in Arabidopsis and rice. After grouping the 167 Arabidopsis RNA-seq profiles according to the tissue used for RNA extraction, we selected the five tissues (callus, roots, seedlings, leaves, and inflorescences) with the largest numbers of reads supporting BPs, identified tissue-specific BPs using multinomial proportion tests, and estimated the False Discovery Rate (FDR) by the method of Benjamini and Hochberg (1995). Because lariat RNAs are transient and traditional RNA-seq library construction protocols select poly(A)+ RNA, informative reads that traverse the lariat junction between the 5’ss and the BP are rare. Nevertheless, we detected 136 tissue-specific BPs in Arabidopsis (Supplemental Data Set 4). Using the same method, we grouped the 207 rice RNA-seq profiles by tissue, selected the five tissues (nematode-induced giant cells, panicles, roots, shoots, and vascular cells) with the largest numbers of reads supporting BPs, and identified 565 tissue-specific BPs in rice (Supplemental Data Set 4).
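The FDR step cited above (Benjamini and Hochberg, 1995) is standard and can be sketched independently of the multinomial proportion tests that produce the raw P values. The tests themselves are not reproduced here, the function name is hypothetical, and the input P values below are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (q values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    qvalues = [0.0] * m
    running_min = 1.0  # q values are capped at 1
    # Walk from the largest P value down so that q values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

BPs whose q value falls below a chosen cutoff (for example, 0.05) would then be reported as tissue-specific; the cutoff actually applied is not stated in this section.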
Given the positional effects on BP usage described above, we expected to observe preferential usage of BPs within ~50 bp of the 3’ss. Unexpectedly, BP usage was instead highly tissue-specific (Supplemental Data Set 4). For example, three BPs were identified in the ninth intron of At3g01500.1: the distal BP (the 154th nt upstream of the 3’ss) was used significantly more often in leaves and seedlings than in inflorescences (P = 1.7 × 10^-61, multinomial proportion test; Figure 4A), whereas the proximal BPs (the 33rd and 34th nt upstream of the 3’ss) were frequently used in inflorescences but not in leaves or seedlings (Figure 4A). Similarly, three BPs were identified in the first intron of Os04g16748: the most distal BP (the 700th nt upstream of the 3’ss) was detected only in panicles, the intermediate BP (the 87th nt upstream of the 3’ss) was mainly used in giant cells and vascular cells, and the closest BP (the 7th nt upstream of the 3’ss) was frequently used in panicles, shoots, and roots (Figure 4B).
To further validate tissue-specific BP usage, we amplified RT-PCR products of the 7th intron of At3g23590.1 from five tissues (roots, seedlings, leaves, inflorescences, and siliques) using the indicated divergent primers and performed Sanger sequencing. With this single primer pair, we obtained seven different isoforms of the 7th intron of At3g23590, corresponding to seven unique BPs in the lariat RNAs (Figure 4C). To examine whether the usage of these seven BPs was tissue-specific, we sequenced more than 10 independent clones of RT-PCR products for each tissue and counted the frequency of each BP in each tissue (Supplemental Figure 7). Different BPs showed significant preferences for specific tissues (Figure 4D). For example, the 216th BP was mainly selected in leaves, inflorescences, and siliques, whereas the 224th BP was preferentially used in roots and seedlings (Figure 4D). Unexpectedly, the 285th and 287th BPs, two BPs within 50 bp upstream of the 3’ss, were seldom used in any tested tissue. Although the mechanism regulating BP selection remains unknown, these results suggest that multiple-BP usage is regulated in a tissue-specific manner, consistent with results for human introns (Pineda and Bradley, 2018).

A subset of introns self-circularize with the 5’ss and the 3’ss in plants
Several studies have shown that BP selection determines 3’ss recognition, with the first AG downstream of the BP usually used as the 3’ss (Smith et al., 1989; Gooding et al., 2006). Although other criteria, including secondary structure, the context flanking the AG, the distance to neighboring AGs, and an optimal distance between the BP and the AG (shown above, Figure 2D), have been used (Chen et al., 2000; Chua and Reed, 2001; Meyer et al., 2011), the “AG exclusion zone” has been widely accepted for predicting the 3’ss. To test whether the “AG exclusion zone” applies to 3’ss recognition in plant introns, we scanned the 3’ss context of all introns with BPs identified in our study. As expected, 58.2% of BPs in Arabidopsis, 54.2% in tomato, 54.8% in rice, and 48.2% in maize selected the first AG downstream of the BP as the 3’ss (P < 0.001, permutation test; Figure 5A and Supplemental Data Set 3). However, a substantial portion of 3’ss (~30-40% in the four species) skipped the first AG downstream of the BP (Figure 5A and Supplemental Data Set 3). More interestingly, some introns appeared to avoid AGs downstream of the BP and instead used a non-AG 3’ss (Figure 5A and Supplemental Data Set 3). These observations suggest that 3’ss determination in plants is not tightly governed by the “AG exclusion zone”.
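The 3’ss categories used above (first AG downstream of the BP, a later AG, a non-AG 3’ss, or a BP that overlaps the 3’ss) can be assigned with a simple scan. In this sketch the intron is a sense-strand string running from the 5’ss to the 3’ss, the BP is a 0-based index, and "overlap" is assumed to mean that the BP falls within the terminal dinucleotide; these conventions and the function name are hypothetical, and the permutation test is not reproduced.

```python
def classify_3ss(intron: str, bp: int) -> str:
    """Classify the 3'ss of `intron` relative to a 0-based BP index."""
    terminal = len(intron) - 2  # start index of the intron's last dinucleotide
    if bp >= terminal:
        return "BP overlaps 3'ss"  # candidate self-circularized intron
    if not intron.endswith("AG"):
        return "non-AG 3'ss"
    # First AG lying entirely downstream of the BP; the terminal AG guarantees a hit.
    first_ag = intron.find("AG", bp + 1)
    return "first AG" if first_ag == terminal else "skips AG(s)"


intron = "GTAAGTACGGCATCGATTGCTCTAACTTTTGTGTGCAG"  # toy intron
for bp in (24, 2, 37):
    print(bp, classify_3ss(intron, bp))
# 24 first AG
# 2 skips AG(s)
# 37 BP overlaps 3'ss
```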
Unexpectedly, we found that 107 introns in Arabidopsis, 82 in tomato, 429 in rice, and 269 in maize showed an overlap between the BP and the 3’ss (Figure 5A), indicating that these intronic RNAs self-circularized by joining the 5’ss and the 3’ss, as also reported in human cells (Taggart et al., 2012; Gardner et al., 2012; Tay and Pek, 2017; Talhouarne and Gall, 2018). Several examples confirmed that these intron transcripts were indeed circularized from the 5’ss to the 3’ss, and each of these circularized intronic RNAs was detected in at least two independent RNA-seq profiles with more than 30 unique supporting reads (Figure 5B and 5C and Supplemental Figure 8). Moreover, these stably accumulated introns are on average longer than all introns in the four species, especially in rice and maize (Supplemental Figure 9). These observations indicate that some intronic RNAs are not degraded as traditionally assumed but can instead accumulate in a circular form in vivo.

Identification of lariat-derived circular RNAs in Arabidopsis
Lariat RNA formation during splicing is highly conserved in eukaryotes, and we previously showed that some lariat RNAs accumulate naturally in plants (Li et al., 2016), as also reported in animals (Gardner et al., 2012; Zhang et al., 2013; Tay and Pek, 2017; Talhouarne and Gall, 2018). To identify plant lariat RNAs on a genome-wide scale, we performed circular RNA sequencing using total RNA from inflorescences of wild-type plants and focused on reads mapped only to intronic regions. Because RNase R degrades most linear RNAs, introns with significant read accumulation in RNase R-treated Col-0 samples were regarded as lariat-derived circular RNAs. We identified 10,580 lariat-derived circular RNAs with ≥5 FPKM from 6585 genes in Col-0 (Figure 1D and Supplemental Data Set 1), of which 1489 had ≥20 FPKM in wild-type plants (Figure 6A). The average length of the 64,058 introns >100 bp in the Arabidopsis thaliana genome is 253 bp, whereas the average length of the 10,580 introns with lariat accumulation is 378 bp, which is significantly longer (P < 10^-15, Welch’s t-test; Figure 6B). However, among these 10,580 introns, intron length was anti-correlated with the expression of lariat-derived circular RNAs (Figure 6C), suggesting that lariat accumulation from longer introns is limited. Consistent with this, introns up-regulated in dbr1-2 were significantly shorter than introns with lariat accumulation in Col-0 (Figure 6B).
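Welch’s t-test, used above to compare intron lengths, does not assume equal variances in the two groups. A minimal sketch of its statistic and the Welch-Satterthwaite degrees of freedom follows; converting t to a P value requires the Student t distribution (for example, SciPy's `scipy.stats.ttest_ind` with `equal_var=False`), which is omitted here to keep the example dependency-free. The function name and length values are toy assumptions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of the mean of a
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


lariat_introns = [310, 420, 355, 460, 345]  # toy intron lengths (bp)
all_introns = [180, 240, 260, 300, 285]
print(welch_t(lariat_introns, all_introns))
```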
Examining how many lariat RNAs each gene harbors showed that most genes allowed only one lariat RNA to accumulate (Supplemental Figure 10A), and that more than 2,000 lariat RNAs originated from first introns (Supplemental Figure 10B). Coding-potential analysis showed that most lariat-derived circular RNAs are non-coding transcripts (Supplemental Figure 10C). Moreover, the expression of lariat-derived circular RNAs was moderately correlated with that of their parent genes in both Col-0 and dbr1-2 (Supplemental Figure 10D and 10E), consistent with the finding that lariat-derived circular RNAs (ciRNAs) promote expression of their parent genes in human cell lines (Zhang et al., 2013). However, the correlation coefficient between gene and intron expression was significantly lower in dbr1-2 (Supplemental Figure 10D and 10E), consistent with disturbed lariat RNA processing in dbr1-2.
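This section does not state which correlation coefficient underlies Supplemental Figure 10D and 10E. As an assumption, the sketch below computes a Pearson coefficient on log2(FPKM + 1) values of each lariat-derived circular RNA and its parent gene; a rank-based (Spearman) coefficient would be an equally reasonable choice, and the function names are hypothetical.

```python
from math import log2, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lariat_gene_correlation(lariat_fpkm, gene_fpkm):
    """Correlate log2(FPKM + 1) of lariat RNAs with that of their parent genes."""
    return pearson([log2(v + 1) for v in lariat_fpkm],
                   [log2(v + 1) for v in gene_fpkm])


# Toy values: lariat and parent-gene abundances rise together.
print(lariat_gene_correlation([1, 3, 7, 15], [3, 15, 63, 255]))
```

Computing this coefficient separately for Col-0 and dbr1-2 profiles would allow the two genotypes to be compared as described above.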
To validate the identified lariat-derived circular RNAs in vivo, we randomly chose four loci (Figure 6D) for detection by RNA gel blotting. These four lariat-derived circular RNAs represent two types of loci: one type is present in wild-type (Col-0) plants, i.e., At4g17390.1 I2 and At3g52590.1 I3, whereas the other accumulates only in the dbr1-2 mutant, i.e., At1g60995.1 I8 and At5g23050.1 I8 (Figure 6D). We first analyzed these lariat-derived circular RNAs on denaturing agarose gels by RNA gel blotting, using the antisense transcript of each intron as the probe (Figure 6E). As shown in Figure 6F, all four previously unreported intronic RNAs were detected at the expected sizes: At4g17390.1 I2 and At3g52590.1 I3 were detected in Col-0, whereas At1g60995.1 I8 and At5g23050.1 I8 were detected only in dbr1-2. Although the mature mRNA levels of At4g17390.1 and At3g52590.1 were comparable between Col-0 and dbr1-2, the intronic RNA levels of At4g17390.1 I2 and At3g52590.1 I3 were significantly higher in dbr1-2 (Figure 6F). Notably, the intronic RNAs are much smaller than the mRNAs of their parent genes (Figure 6F), indicating that they are individual transcripts.
To exclude the possibility that the bands detected in Figure 6F are alternative linear precursor mRNAs or linear intronic RNAs, we loaded all RNA samples together with three RNA size standards of different lengths (Figure 6G, leftmost panel) on the same denaturing PAGE gels, and then performed RNA gel blotting using the same probes as in Figure 6E for At4g17390.1 I2 and At3g52590.1 I3. Circular RNAs are known to migrate much more slowly than linear RNAs of equivalent size in denaturing PAGE gels. Consistent with the circular nature of the lariat-derived RNAs observed by RNA sequencing, the individual RNAs from At4g17390.1 I2 (290 nt) and At3g52590.1 I3 (343 nt) both migrated much more slowly than the linear RNA standards, although their predicted sizes are much smaller than those of the standards (Figure 6G). Moreover, the RNA from At4g17390.1 I2 migrated slightly more slowly than the one from At3g52590.1 I3 (Figure 6G), consistent with their size difference.
To confirm that these intronic RNAs are circular, we first treated total RNA with RNase R to degrade linear RNAs (Supplemental Figure 10F) and then examined by RNA gel blotting whether At4g17390.1 I2 transcripts were retained. As expected, a distinct At4g17390.1 I2 band was present at exactly the same position in the RNase R-treated samples (Figure 6G), suggesting that this intronic RNA exists in vivo in a circular form. Consistent with this, sequencing reads from At4g17390.1 I2, At3g52590.1 I3, At1g60995.1 I8, and At5g23050.1 I8 covered only their respective intronic regions between the 5’ss and the BP, with no reads corresponding to the region between the BP and the 3’ss (dashed region in Figure 6D). We thus systematically identified hundreds of previously unidentified but highly abundant lariat-derived circular RNAs in Arabidopsis.

Lariat RNA accumulation in introns is accompanied by increased incidence of exonic back-splicing events
Because lariat-derived circular RNAs are formed concurrently with pre-mRNA maturation, we tested whether lariat RNA accumulation affects linear mRNA maturation in plants. Specific intronic sequences, such as Alu elements in mammalian introns, are known to promote back-splicing of adjacent exons, thereby inhibiting the production of linear mature mRNA (Liang and Wilusz, 2014; Zhang et al., 2014; Kramer et al., 2015). However, this mechanism may not apply to species whose introns lack noticeable flanking secondary structure, and a subsequent study showed that the formation of double lariats contributes to exonic back-splicing in yeast (Barrett et al., 2015), indicating that there might be a connection between lariat structure and exon circularization. By analyzing the ratio of back-splicing events of the two flanking exons, we found that the incidence of back-splicing was significantly correlated with the accumulation of lariat-derived circular RNAs (Figure 7A). Moreover, this correlation between exonic circularization and intronic accumulation was independent of the position of the flanking exon: both upstream and downstream adjacent exons exhibited an increased incidence of back-splicing (Figure 7A). Beyond the two adjacent exons, the incidence of back-splicing events across the parent gene also increased significantly with the accumulation of lariat-derived circular RNAs (Supplemental Figure 11A). These results suggest that rapid elimination of lariat RNAs favors the production of linear mature mRNAs.

Exclusion of retroelements from lariat RNA-accumulating introns
To understand the sequence features of lariat RNA-accumulating introns, we investigated whether the insertion of transposable elements (TEs) into intronic regions affects the turnover of lariat RNAs. We used RepeatMasker (Tempel, 2012) to analyze the distribution of TEs in three sets of introns: all 64,058 introns >100 bp, 10,580 introns with lariat-derived circular RNA accumulation in Col-0, and 6,720 introns with higher accumulation of lariat-derived circular RNAs in dbr1-2. As shown in Figure 7B, 6,510 introns harbored various types of TEs or repeated sequences, mainly retroelements (long terminal repeat (LTR) elements, SINEs, and LINEs; ~1% of total length), DNA transposons (~1.5% of total length), and simple repeats (~1.2% of total length). In contrast, introns accumulating lariat-derived circular RNAs were significantly depleted of LTR retroelements and DNA transposons but retained simple repeated sequences (Figure 7B). In particular, introns with the most abundant lariat-derived circular RNAs (≥50 FPKM in Col-0) were specifically enriched in satellite sequences (Figure 7B). In total, 486 of all introns ≥100 bp in the Arabidopsis genome contain retroelements (Supplemental Figure 11B and Supplemental Data Set 6). The proportion of introns with retroelements was significantly reduced among introns with lariat RNA accumulation in Col-0 and among introns with higher expression in dbr1-2 (P = 3.8×10⁻¹⁶ and P = 1.6×10⁻⁶, respectively, Fisher’s exact tests, Supplemental Figure 11B). Compared with naturally accumulated introns in Col-0, the proportion of introns with repeat elements was slightly increased among introns with higher expression in dbr1-2 (P = 0.04, Fisher’s exact test, Supplemental Figure 11B). These results suggest that the insertion of different classes of TEs might play a role in the turnover of lariat RNAs.
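The enrichment and depletion comparisons above rest on 2×2 contingency tables (intron set versus presence of a retroelement). The following is a minimal, standard-library sketch of Fisher's exact test for such a table; it is not the authors' code, and the counts in the usage line are hypothetical placeholders rather than values from this study.

```python
from math import comb


def _hypergeom_pmf(x, row1, col1, n):
    """Probability that the top-left cell equals x when all margins are fixed."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_exact(table, alternative="two-sided"):
    """Fisher's exact test P-value for a 2x2 table [[a, b], [c, d]].

    Rows: intron sets (e.g. lariat-accumulating vs. all other introns).
    Columns: with vs. without a retroelement.
    alternative: "two-sided", "less" (depletion of the top-left cell)
                 or "greater" (enrichment of the top-left cell).
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    support = range(max(0, row1 + col1 - n), min(row1, col1) + 1)
    pmf = {x: _hypergeom_pmf(x, row1, col1, n) for x in support}
    if alternative == "less":
        p = sum(v for x, v in pmf.items() if x <= a)
    elif alternative == "greater":
        p = sum(v for x, v in pmf.items() if x >= a)
    elif alternative == "two-sided":
        # Sum the probabilities of all tables no more likely than the observed one;
        # the small relative tolerance absorbs floating-point ties.
        p_obs = pmf[a]
        p = sum(v for v in pmf.values() if v <= p_obs * (1 + 1e-7))
    else:
        raise ValueError(f"unknown alternative: {alternative}")
    return min(1.0, p)


# Hypothetical counts: 20 of 1,000 lariat introns vs. 466 of 9,000 other introns
# carry a retroelement; "less" tests for depletion in the lariat set.
p_depleted = fisher_exact([[20, 980], [466, 8534]], alternative="less")
```

For genome-scale counts, `scipy.stats.fisher_exact` is a faster, well-tested alternative; the explicit loop here only makes the hypergeometric reasoning visible.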
Given that introns are longer in more complex eukaryotes, and that the insertion of TEs into intronic regions most likely contributes to this increase in intron length, we asked whether introns containing retroelements are longer than other introns. We designated the 486 introns longer than 100 nt that contain retroelement sequences as RE-introns, and all other introns as non-RE introns. Indeed, RE-introns are significantly longer than non-RE introns (P < 10⁻¹⁰⁰, Student's t-test) (Figure 7C). Moreover, because TE insertion often leads to heterochromatinization of the parent gene, which generally inhibits gene expression, we compared the expression levels of RE-introns and non-RE introns. RE-introns themselves had significantly lower expression levels than non-RE introns (P < 10⁻¹⁰, Mann-Whitney U-test) in all 8 RNA-seq profiles (Figure 7D), further suggesting that the presence of retroelements is anti-correlated with lariat RNA accumulation. Furthermore, the expression levels of parent genes with RE-introns were also significantly lower than those of parent genes with non-RE introns (P < 10⁻¹⁰, Mann-Whitney U-test) in all 8 RNA-seq profiles (Supplemental Figure 11C). As shown in Figure 7E, At2g34880.1 I5 contains three retroelements, which might contribute to its extremely low expression levels. In contrast, At2g14080.1 I1 contains only one LINE element, and the expression of both its parent gene and the intron was much higher than that of At2g34880 (Figure 7E). Furthermore, At4g39260.1 I1 contains no retroelements and had much higher expression levels than either At2g34880.1 I5 or At2g14080.1 I1 (Figure 7E), and the parent gene At4g39260.1 also had much higher expression levels than At2g34880.1 and At2g14080.1, suggesting that retroelements affect the expression levels of both parent genes and their introns.
In the Col-0 and dbr1-2 RNase R-untreated transcriptomes, the expression level of At2g14080.1 I1 was very low (Figure 7E), but At2g14080.1 I1 was abundant in the RNase R (+) libraries, especially the dbr1-2 RNase R (+) libraries (Figure 7E), further indicating that DBR1 is responsible for the degradation of At2g14080.1 I1. We therefore systematically examined the types of TEs in introns with higher expression in dbr1-2. Unlike the exclusion of retroelements and DNA transposons from naturally accumulated introns in Col-0 (Figure 7B), a substantial portion of the introns with higher expression in dbr1-2 harbored retroelements and DNA repeats (Supplemental Figure 11D). Collectively, these analyses indicate that retroelement insertion is anti-correlated with the accumulation of lariat RNAs.

DISCUSSION
Although both BP selection and lariat RNA formation are essential for pre-mRNA splicing, BPs and lariat RNAs have mostly been characterized case by case. The first large-scale analysis of BPs was performed in Fairbrother’s lab (Taggart et al., 2012; Taggart et al., 2017), in which high-throughput RNA sequencing data from human cell lines were used to locate BPs and to map the distribution of splicing factors around them. A subsequent study developed a data-driven algorithm, LaSSO (Lariat Sequence Site Origin), to precisely map the location of BPs on a genome-wide scale in yeast (Bitton et al., 2014). With improvements in circular RNA sequencing, Mercer et al. used RNase R digestion followed by RNA sequencing to enrich sequences that traverse the lariat junction, and provided the first comprehensive map of human BPs (Mercer et al., 2015; Taggart et al., 2017). Together, these studies provide comprehensive knowledge of BPs and lariat RNAs in yeast and humans. However, BPs and lariat RNAs in plants remained largely unexplored. In this study, we used a large number of published RNA-seq datasets from four plant species to extract BPs, and took advantage of the viability of a weak dbr1 allele to enrich for lariat RNAs, thus providing a comprehensive view of BPs in both monocots and dicots. Moreover, the systematic identification of lariat-derived circular RNAs in wild-type plants opens a research avenue for examining the unexpected roles of intron transcripts.
The basic principles of BP selection are highly conserved from plants to humans (Taggart et al., 2012; Bitton et al., 2014; Mercer et al., 2015; Taggart et al., 2017; Pineda and Bradley, 2018). First, the BP nucleotide is strictly constrained in its distance from the 3’ss. Second, the BP nucleotide exhibits a strong preference for adenine. Third, the sequences flanking the BP are U-rich. Fourth, uracil is preferred at the second nucleotide upstream of the BP. One of the earliest steps in spliceosome assembly is the binding of SF1 to the BP (Pastuszak et al., 2011), a process for which SF1 requires only the UnA motif; this provides a mechanistic explanation for the importance of the U two positions upstream of the BP. Although the sequences downstream of the BP are U-rich in plants (this study), the presence of both U-rich and C-rich downstream sequences in humans (Mercer et al., 2015) suggests that heterogeneity of downstream sequences may enable sequence-specific selection of multiple BPs by the spliceosome, resulting in more complex regulation of splicing in larger genomes. Besides these common features of the BP nucleotide and its flanking sequences, we also found that multiple BPs (Figure 3) and tissue-specific BP usage (Figure 4) might contribute to the complexity of gene expression in plants.
Interestingly, we observed that the accumulation of lariat-derived circular RNAs was correlated with the occurrence of back-splicing events in the flanking exonic regions (Figure 7), suggesting that rapid turnover of lariat RNAs by DBR1 helps pre-mRNA splicing favor the production of linear mRNA. Recent mechanistic studies show that intronic complementary sequences (Liang and Wilusz, 2014; Zhang et al., 2014; Kramer et al., 2015), homodimerization of specific proteins that bind intronic regions (Conn et al., 2015), and potential intronic RNA-RNA interactions (Ivanov et al., 2015) promote exonic back-splicing events. Our finding that rapid turnover of lariat RNAs may prevent back-splicing offers a new perspective for understanding the biological significance of intron metabolism in gene expression. Identification of lariat RNA-binding proteins will provide further mechanistic evidence of the balance between linear mRNA production and intronic circRNA formation.
In addition, we observed an anti-correlation between intronic retroelement insertion and lariat RNA accumulation in plants (Figure 7 and Supplemental Figure 11). Two possibilities might explain this phenomenon. First, insertion of TEs into the intronic regions of coding genes often leads to heterochromatinization of the parent gene, which limits its transcription and ultimately reduces the production of lariat RNAs. Second, the parent genes are transcribed normally, but their lariat RNAs containing TE sequences are preferentially degraded by DBR1. The latter possibility is consistent with the previous finding that DBR1 was initially identified as a regulator of TE transposition in yeast (Karst et al., 2000). Together with our finding that TE-containing introns were more highly expressed in dbr1-2 (Supplemental Figure 11F), these results suggest that lariat RNAs formed from TE-containing introns might be much more sensitive to DBR1 activity.
In summary, our work provides a comprehensive map of branchpoints and lariat-derived circular RNAs in four plant species, uncovers features of branchpoints and lariat-derived circular RNAs, shows a potential link between intron metabolism and the evolution of transposable elements, and opens a new perspective for understanding the communication between intronic circular RNAs and exonic circular RNAs.

MATERIALS AND METHODS
Materials and RNA-seq libraries
Arabidopsis thaliana Columbia (Col-0) was used as the wild type. Seeds of dbr1-2 were generated previously (Li et al., 2016). Plants were grown in a growth room under a 16-h light/8-h dark photoperiod (bulb type: PHILIPS TLD 36W/865, eight tubes). Inflorescences were collected for total RNA extraction with Trizol (Ambion). Total RNA was treated with a Ribo-Zero kit (Epicentre) to obtain ribosomal RNA-depleted RNA (ribo-RNAs), then incubated with or without RNase R (Epicentre) and purified with phenol:chloroform. Purified RNAs were used for library preparation with the Illumina TruSeq Stranded Total RNA HT Sample Prep Kit (P/N15031048), and libraries were sequenced on an Illumina HiSeq 2500 sequencer at Genergy (Shanghai, China). Two replicates were performed for each sample. The RNA-seq data are deposited in the NCBI GEO database under series accession number GSE117416.

RNA gel blotting
Total RNA was extracted from inflorescences using Trizol reagent (Invitrogen). Twenty micrograms of total RNA was loaded on 1.2% denaturing agarose gels or 5% urea-PAGE gels and transferred to a nylon membrane. [α-³²P]UTP-labeled antisense RNA probes and linear RNA standards were transcribed in vitro using T7 RNA polymerase. Hybridization was performed using hybridization buffer (Ambion), and signals were detected with a Typhoon FLA9500 (GE Healthcare). Primers used for in vitro transcription are listed in Supplemental Table 1.

Validation of lariat RNAs by RT-PCR followed by Sanger sequencing
Lariat RNAs spanning the BP were detected by RT-PCR as described (Suzuki et al., 2006), using RNase R-treated total RNA as the template. cDNA synthesis was carried out using SuperScript III (Invitrogen) with random hexamers. Reaction mixtures were incubated at 30°C for 10 min, 42°C for 120 min, 50°C for 30 min, 60°C for 30 min, and 99°C for 5 min. Lariat-spanning fragments were then amplified by PCR and gel-purified for Sanger sequencing to identify the BP. Primer sequences are listed in Supplemental Table 1.

Computational analysis of the RNA-seq profiles
The RNA-seq libraries were mapped to the Arabidopsis thaliana genome (TAIR10) with TopHat2 (Kim et al., 2013) and processed with Cufflinks v2.2.1 (Trapnell et al., 2010). Cuffquant and Cuffnorm were used to quantify and normalize the FPKM values of genes, respectively. Correlation coefficients of gene expression levels were calculated for Col-0 and dbr1-2, with and without RNase R digestion. Normalized FPKM values of genes in the Col-0 and dbr1-2 samples without RNase R treatment were compared with edgeR (Robinson et al., 2010) to find deregulated genes. Genes with an average abundance of at least 5 FPKM in either dbr1-2 or Col-0 and a multiple-test-corrected P-value below 0.05 were designated as deregulated genes. Genes with an abundance of at least 10 FPKM in at least one of the 8 samples and a standard deviation of at least 1 were used for further analyses. The normalized FPKM values plus one were log-transformed to calculate correlation coefficient (CC) values between samples. The CC values were passed to the pheatmap function of the pheatmap package in R to perform hierarchical clustering. These filtered genes were also used for Principal Component Analysis (PCA): log-transformed normalized FPKM values plus one were passed to the prcomp function in R (stats package) to perform PCA.
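The gene filtering and sample-correlation steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: it assumes the standard-deviation filter is applied to the untransformed normalized FPKM values, uses log2 for the "log-transformed" step, and uses the Pearson coefficient; the text does not specify these choices. The resulting matrix corresponds to what would be handed to pheatmap for clustering.

```python
import math
import statistics


def filter_genes(fpkm, min_peak=10.0, min_sd=1.0):
    """Keep genes with >= min_peak FPKM in at least one sample and SD >= min_sd.

    fpkm: dict mapping gene ID -> list of normalized FPKM values, one per sample.
    (Applying the SD filter to untransformed FPKM is an assumption.)
    """
    return {
        gene: values
        for gene, values in fpkm.items()
        if max(values) >= min_peak and statistics.stdev(values) >= min_sd
    }


def log_transform(fpkm):
    """log2(FPKM + 1) for every value (base 2 is an assumption)."""
    return {gene: [math.log2(v + 1) for v in values] for gene, values in fpkm.items()}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sample_cc_matrix(log_fpkm):
    """Sample-by-sample CC matrix computed over the filtered genes."""
    genes = sorted(log_fpkm)
    n_samples = len(log_fpkm[genes[0]])
    columns = [[log_fpkm[g][j] for g in genes] for j in range(n_samples)]
    return [[pearson(columns[i], columns[j]) for j in range(n_samples)]
            for i in range(n_samples)]
```

The same filtered, log-transformed values would be the input to PCA (prcomp in R).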
Estimation of the expression levels of lariat RNAs
The “bedtools genomecov” command of bedtools (Quinlan and Hall, 2010) was used to calculate the genome coverage of RNA-seq libraries. A custom program was used to calculate FPKMs (Fragments Per Kilobase of transcript per Million mapped fragments) for introns of annotated TAIR10 genes from the genome coverage results. To compare global changes in intron expression, average intron expression levels were calculated for Col-0 and dbr1-2, with and without RNase R treatment. Intronic transcripts with FPKM ≥5 in the RNase R-treated Col-0 samples were defined as lariat RNAs in wild-type plants. Differences in intron expression levels between Col-0 and dbr1-2, with and without RNase R treatment, were evaluated with the Mann-Whitney U-test. Correlation coefficients of intron expression levels were calculated between the two replicates of Col-0 and dbr1-2, with and without RNase R treatment. To find deregulated introns in dbr1-2, introns with at least 5 FPKM in dbr1-2 were retained. The expression levels of these introns in the RNase R-treated dbr1-2 and Col-0 samples were then compared using edgeR (Robinson et al., 2010). Introns with false discovery rate (FDR) values below 0.05 and log-scaled fold changes greater than 1 were deemed more highly expressed in dbr1-2. Introns were filtered using the same criteria as genes and then used for hierarchical clustering and PCA with the same methods as for genes.
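The custom FPKM program is not described in detail. One plausible way to turn per-base coverage into an intron FPKM is sketched below, assuming per-base depths such as those written by `bedtools genomecov -d`, and approximating the fragment count as summed depth divided by read length; the function names and that approximation are illustrative assumptions. The FPKM ≥5 cut-off for calling wild-type lariat RNAs is taken from the text.

```python
def intron_fpkm(depth, start, end, read_length, total_fragments):
    """Approximate FPKM of an intron from per-base read depth.

    depth: dict mapping 1-based genomic position -> read depth
           (e.g. parsed from `bedtools genomecov -d` output for one chromosome).
    start, end: 1-based inclusive intron coordinates.
    read_length: read length in nt, used to convert summed depth to a read count
                 (an approximation assumed here, not stated in the text).
    total_fragments: total mapped reads/fragments in the library.
    """
    length = end - start + 1
    summed_depth = sum(depth.get(pos, 0) for pos in range(start, end + 1))
    fragments = summed_depth / read_length  # approximate reads overlapping the intron
    # FPKM = fragments / (length in kb) / (library size in millions)
    return fragments * 1e9 / (length * total_fragments)


def call_lariat_rnas(fpkm_by_intron, threshold=5.0):
    """Intron IDs whose FPKM in RNase R-treated Col-0 reaches the threshold."""
    return sorted(intron for intron, value in fpkm_by_intron.items() if value >= threshold)
```

For example, a 1,000-bp intron covered uniformly at depth 100 by 100-nt reads in a library of one million fragments gives about 1,000 FPKM under this approximation.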
Correlation between the expression levels of introns and their parent genes
The average FPKM values of introns and of their host genes were used to calculate correlation coefficients in the four Col-0 and four dbr1-2 samples without RNase R treatment, respectively. If a gene had more than one intron, the intron closest to the transcription start site was kept. Both the average intron FPKM and the average gene FPKM were required to be at least 1, and only introns at least 200 bp long were used for the analysis.
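A minimal sketch of this filtering and correlation step follows. The intron record layout (a hypothetical `number` field where 1 marks the intron nearest the TSS), applying the length and FPKM filters after choosing that intron, and the use of Pearson correlation on log2-transformed FPKM are all assumptions; the text does not state which correlation coefficient or transformation was used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_introns(introns):
    """For each gene, keep the intron closest to the TSS (lowest intron number)."""
    chosen = {}
    for intron in introns:
        gene = intron["gene"]
        if gene not in chosen or intron["number"] < chosen[gene]["number"]:
            chosen[gene] = intron
    return chosen


def intron_gene_correlation(introns, gene_fpkm, min_fpkm=1.0, min_length=200):
    """Return (correlation, number of intron-gene pairs used)."""
    xs, ys = [], []
    for gene, intron in first_introns(introns).items():
        host = gene_fpkm.get(gene)
        if host is None or intron["length"] < min_length:
            continue  # no host-gene value, or intron shorter than 200 bp
        if intron["fpkm"] >= min_fpkm and host >= min_fpkm:
            xs.append(math.log2(intron["fpkm"]))
            ys.append(math.log2(host))
    return pearson(xs, ys), len(xs)
```

Running the function once per genotype (Col-0, dbr1-2) gives the two coefficients whose difference the Results section compares.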
The computational pipeline for identifying BPs
Reads whose two segments align to an intron in inverted order across the 2'-5' phosphodiester linkage were identified with a customized computational pipeline (Figure 2A). First, a database of all introns in A. thaliana (for all annotated genes in the TAIR10 database) was generated using a custom program. Second, RNA-seq profiles were aligned to the genome using TopHat2 (Kim et al., 2013) for self-generated data sets or HISAT2 (Kim et al., 2015) for published data sets; for HISAT2, unmapped reads were written out with the option "--un-conc". For TopHat2, reads that could not be mapped to the genome were retrieved with bamToFastq in bedtools (Quinlan and Hall, 2010). Third, the unmapped reads were aligned to the introns of TAIR10 annotated genes with BLASTN for self-generated RNA-seq data sets, using the options "-S 1 -e 1e-20", or with Bowtie 2 (Langmead and Salzberg, 2012) for published RNA-seq data sets, using the options "--local -q --norc --no-unal -p 32 -a --no-hd --no-sq". Finally, a custom program was used to check whether the remaining regions of the partially matched reads could also be aligned to the same introns. We selected reads that spanned the 5’ss and the potential BP, requiring both matched segments of a read to be at least 6 nucleotides long. The BP is then the last nucleotide of the matched segment near the 3' end of the intron. Branch positions detected in at least one of the selected RNA-seq libraries were retained for counting the different branch nucleotides of lariat RNAs.
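The core of the last two steps, splitting a partially matched read into a segment that ends at the BP and a segment that starts at the 5’ss, can be sketched with exact string matching. This is a simplified illustration, not the authors' program: it ignores mismatches and gaps (which the BLASTN and Bowtie 2 alignments in the pipeline tolerate), works on the sense strand only, and the function name and toy intron are hypothetical. It keeps the 6-nt minimum segment length and reports the last nucleotide of the segment near the intron's 3' end as the BP.

```python
def find_branchpoints(read, intron, min_segment=6):
    """Candidate 0-based BP positions in `intron` supported by a lariat-spanning read.

    Reading 5'->3' across the 2'-5' linkage, such a read contains the intron
    sequence ending at the BP ("head") followed directly by the intron
    sequence starting at the 5'ss ("tail"). Exact matches only.
    """
    candidates = set()
    for k in range(min_segment, len(read) - min_segment + 1):
        head, tail = read[:k], read[k:]
        if not intron.startswith(tail):  # tail must begin at the 5'ss
            continue
        pos = intron.find(head)
        while pos != -1:  # every occurrence of the head segment
            candidates.add(pos + k - 1)  # BP = last nucleotide of the head
            pos = intron.find(head, pos + 1)
    return sorted(candidates)


intron = "GTAAGTTTCTCTGATTTTGTTCTAACATTTTGCAG"  # toy intron, not real data
bp = intron.index("CTAAC") + 3  # toy BP: the second A of CTAAC
read = intron[bp - 9: bp + 1] + intron[:10]  # simulated lariat-spanning read
```

Because a short segment can occur more than once in an intron, the sketch returns every candidate position rather than a single answer.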
To identify BPs in rice (Oryza sativa), tomato (Solanum lycopersicum), and maize (Zea mays), the MSU Rice Genome Annotation for Oryza sativa Nipponbare (release 7), the ITAG3.20 annotation of the tomato genome (Tomato Genome Consortium), and the annotation of Zea mays cv B73 (version 4), respectively, were used to retrieve intron sequences. Selected RNA-seq profiles of rice, tomato, and maize (listed in Supplemental Data Set 2) were used to identify BPs in these species with the same method as for Arabidopsis. The regions from 10 bp upstream to 10 bp downstream of each detected BP were used to analyze nucleotide composition.
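Nucleotide composition around BPs can be summarized as a per-position frequency table over the 21-nt windows (10 nt on each side of the BP). The sketch below is illustrative; it assumes 0-based BP coordinates on the sense strand of the supplied sequence and simply skips BPs whose window would run off either end.

```python
from collections import Counter


def bp_windows(sequence, bp_positions, flank=10):
    """Extract sequence[bp - flank : bp + flank + 1] for each 0-based BP position."""
    windows = []
    for bp in bp_positions:
        if bp - flank >= 0 and bp + flank < len(sequence):
            windows.append(sequence[bp - flank: bp + flank + 1])
    return windows


def position_frequencies(windows, alphabet="ACGT"):
    """Fraction of each nucleotide at every window position (BP at index `flank`)."""
    table = []
    for column in zip(*windows):  # one tuple of nucleotides per position
        counts = Counter(column)
        total = len(column)
        table.append({nt: counts[nt] / total for nt in alphabet})
    return table
```

With `flank=10`, index 10 of the returned table describes the BP nucleotide itself and index 8 the position two nucleotides upstream.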
Identifying tissue-specific BPs in Arabidopsis and rice
The numbers of supporting reads for BPs identified in Arabidopsis and rice were grouped by tissue (Supplemental Data Set 4). The five tissues with the largest numbers of BP-supporting reads were used to identify tissue-specific BPs with the multinomial proportion test (Pineda and Bradley, 2018). The resulting P-values were corrected using the Benjamini-Hochberg method (Benjamini and Hochberg, 1995). BPs with multiple-test-corrected P-values below 0.05 were deemed tissue-specific BPs.
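The multinomial proportion test itself is described by Pineda and Bradley (2018) and is not reproduced here; the Benjamini-Hochberg step applied to its P-values can be sketched as follows (standard library only, illustrative). It returns adjusted values in the original order, so a BP would be called tissue-specific when its adjusted value is below 0.05.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For instance, raw values 0.01, 0.04, 0.03, and 0.005 adjust to 0.02, 0.04, 0.04, and 0.02, matching R's `p.adjust(..., method = "BH")`.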
Calculating the splicing efficiency
The "bedtools genomecov" command of bedtools (Quinlan and Hall, 2010) was used to calculate the genome coverage of the RNA-seq libraries. The maximal number of reads covering the +10 bp regions at exon-to-intron sites, EI, was calculated. The number of junction reads, JR, was reported by TopHat in the Cufflinks pipeline. The splicing efficiency of a gene was calculated as log2(EI/JR), as proposed by Bitton et al. (2014).
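A small sketch of this calculation follows. The text does not define the "+10 bp region" exactly, so the sketch assumes EI is the maximum per-base depth over the 10 intronic nucleotides immediately downstream of an exon-intron junction on the plus strand; the optional pseudocount is also an addition, because the ratio is undefined when either count is zero. Under this definition, larger values indicate more intron read-through relative to spliced junction reads.

```python
import math


def max_intronic_coverage(depth, junction, window=10):
    """EI: maximal per-base depth over the `window` nt downstream of a 5' exon-intron junction.

    depth: dict mapping 1-based position -> depth (e.g. from `bedtools genomecov -d`).
    junction: 1-based position of the last exonic nucleotide (plus-strand gene assumed).
    """
    return max(depth.get(pos, 0) for pos in range(junction + 1, junction + window + 1))


def splicing_efficiency(ei, jr, pseudocount=0.0):
    """log2(EI / JR); the pseudocount is an optional addition, not part of the text."""
    numerator, denominator = ei + pseudocount, jr + pseudocount
    if numerator <= 0 or denominator <= 0:
        raise ValueError("EI and JR must be positive; consider a pseudocount")
    return math.log2(numerator / denominator)
```

For example, EI = 8 and JR = 32 give log2(8/32) = -2.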
Analysis of transposable elements in introns
RepeatMasker (version open-4.0.6) (Tempel, 2012) was used to analyze transposable elements in all introns longer than 100 bp, in the 10,580 introns with lariat accumulation (≥5 FPKM) in Col-0, and in the 6,720 introns with higher expression in dbr1-2. The RepeatMasker edition of Repbase (Bao et al., 2015) was used as the repeat library.
Accession numbers
Sequence data from this paper can be found under the accession numbers listed in Supplemental Data Set 2. The RNA-seq data are deposited in the NCBI GEO database under series accession number GSE117416.
SUPPLEMENTAL DATA
Supplemental Figure 1. Schematic view of the experimental design.
Supplemental Figure 2. Correlation of gene and intron expression levels for each group between two biological replicates.
Supplemental Figure 3. Gene expression patterns of the used samples and validation of selected BPs in Arabidopsis.
Supplemental Figure 4. Examples of introns using guanines as their BPs in plants.
Supplemental Figure 5. Fidelity of BPs during library construction of RNA-seq samples.
Supplemental Figure 6. Validation of two BPs in At4g39260.1 I1.
Supplemental Figure 7. Validation of the 7 BPs of At3g23590.1 I7.
Supplemental Figure 8. Examples of self-circularized introns in tomato and rice.
Supplemental Figure 9. Length distribution of self-circularized introns in plants.
Supplemental Figure 10. Other characteristics of lariat RNAs in Arabidopsis.
Supplemental Figure 11. Back-splicing and expression of parent genes with lariat RNA accumulation.
Supplemental Table 1. Primers used in the study.
Supplemental Data Set 1. Transcriptome analysis of Col-0 and dbr1-2.
Supplemental Data Set 2. List of RNA sequencing datasets used in this study.
Supplemental Data Set 3. BPs identified in four plant species.
Supplemental Data Set 4. Tissue-specific BPs identified in Arabidopsis and rice.
Supplemental Data Set 5. Self-circularized introns identified in four plant species.
Supplemental Data Set 6. Introns with retroelements in Arabidopsis.
Competing interests
The authors declare that they have no competing interests.
AUTHOR CONTRIBUTIONS
Y.Zheng and B.Z. conceived and designed the research. X.Z. performed most bioinformatic analyses. Y.Zhang, T.W., and Z.L. performed biological experiments, including preparing samples for RNA-seq and validating lariat RNAs. H.G., Q.T., and J.C. provided technical help and critical comments on this project. Y.Zheng designed and implemented the computational methods. K.C., L.L., C.L., and J.G. helped analyze RNA-seq data. Y.Zheng and B.Z. wrote the manuscript.
ACKNOWLEDGEMENTS
We thank Prof. Sheila McCormick for editing. This work was supported by grants from the National Natural Science Foundation of China (No. 31830045, 31671261, and 31470281) to B.Z., and by a grant from the National Natural Science Foundation of China (No. 31460295) to Y.Zheng.
REFERENCES
Armakola, M., Higgins, M.J., Figley, M.D., Barmada, S.J., Scarborough, E.A., Diaz, Z., Fang, X., Shorter, J., Krogan, N.J., Finkbeiner, S., Farese, R.V., Jr., and Gitler, A.D. (2012). Inhibition of RNA lariat debranching enzyme suppresses TDP-43 toxicity in ALS disease models. Nat Genet 44, 1302-1309.
Bao, W., Kojima, K.K., and Kohany, O. (2015). Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA 6, 11.
Barrett, S.P., Wang, P.L., and Salzman, J. (2015). Circular RNA biogenesis can proceed through an exon-containing lariat precursor. Elife 4, e07540.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 57, 289-300.
Bitton, D.A., Rallis, C., Jeffares, D.C., Smith, G.C., Chen, Y.Y., Codlin, S., Marguerat, S., and Bahler, J. (2014). LaSSO, a strategy for genome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Res 24, 1169-1179.
Chen, S., Anderson, K., and Moore, M.J. (2000). Evidence for a linear search in bimolecular 3' splice site AG selection. Proc Natl Acad Sci U S A 97, 593-598.
Chua, K., and Reed, R. (2001). An upstream AG determines whether a downstream AG is selected during catalytic step II of splicing. Mol Cell Biol 21, 1509-1514.
Conn, S.J., Pillman, K.A., Toubia, J., Conn, V.M., Salmanidis, M., Phillips, C.A., Roslan, S., Schreiber, A.W., Gregory, P.A., and Goodall, G.J. (2015). The RNA binding protein quaking regulates formation of circRNAs. Cell 160, 1125-1134.
Dumesic, P.A., Natarajan, P., Chen, C., Drinnenberg, I.A., Schiller, B.J., Thompson, J., Moresco, J.J., Yates, J.R., 3rd, Bartel, D.P., and Madhani, H.D. (2013). Stalled spliceosomes are a signal for RNAi-mediated genome defense. Cell 152, 957-968.
Galvis, A.E., Fisher, H.E., Fan, H., and Camerini, D. (2017). Conformational changes in the 5' end of the HIV-1 genome dependent on the debranching enzyme DBR1 during early stages of infection. J Virol 91.
Galvis, A.E., Fisher, H.E., Nitta, T., Fan, H., and Camerini, D. (2014). Impairment of HIV-1 cDNA synthesis by DBR1 knockdown. J Virol 88, 7054-7069.
Gardner, E.J., Nizami, Z.F., Talbot, C.C., Jr., and Gall, J.G. (2012). Stable intronic sequence RNA (sisRNA), a new class of noncoding RNA from the oocyte nucleus of Xenopus tropicalis. Genes Dev 26, 2550-2559.
Gooding, C., Clark, F., Wollerton, M.C., Grellscheid, S.N., Groom, H., and Smith, C.W. (2006). A class of human exons with predicted distant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol 7, R1.
Ivanov, A., Memczak, S., Wyler, E., Torti, F., Porath, H.T., Orejuela, M.R., Piechotta, M., Levanon, E.Y., Landthaler, M., Dieterich, C., and Rajewsky, N. (2015). Analysis of intron sequences reveals hallmarks of circular RNA biogenesis in animals. Cell Rep 10, 170-177.
Jacquier, A., and Rosbash, M. (1986). RNA splicing and intron turnover are greatly diminished by a mutant yeast branch point. Proc Natl Acad Sci U S A 83, 5835-5839.
Karst, S.M., Rutz, M.L., and Menees, T.M. (2000). The yeast retrotransposons Ty1 and Ty3 require the RNA lariat debranching enzyme, Dbr1p, for efficient accumulation of reverse transcripts. Biochem Biophys Res Commun 268, 112-117.
Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12, 357-360.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36.
Kim, H.C., Kim, G.M., Yang, J.M., and Ki, J.W. (2001). Cloning, expression, and complementation test of the RNA lariat debranching enzyme cDNA from mouse. Mol Cells 11, 198-203.
Kim, J.W., Kim, H.C., Kim, G.M., Yang, J.M., Boeke, J.D., and Nam, K. (2000). Human RNA lariat debranching enzyme cDNA complements the phenotypes of Saccharomyces cerevisiae dbr1 and Schizosaccharomyces pombe dbr1 mutants. Nucleic Acids Res 28, 3666-3673.
Kramer, M.C., Liang, D., Tatomer, D.C., Gold, B., March, Z.M., Cherry, S., and Wilusz, J.E. (2015). Combinatorial control of Drosophila circular RNA expression by intronic repeats, hnRNPs, and SR proteins. Genes Dev 29, 2168-2182.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359.
Li, Z., Wang, S., Cheng, J., Su, C., Zhong, S., Liu, Q., Fang, Y., Yu, Y., Lv, H., Zheng, Y., and Zheng, B. (2016). Intron lariat RNA inhibits microRNA biogenesis by sequestering the dicing complex in Arabidopsis. PLoS Genet 12, e1006422.
Liang, D., and Wilusz, J.E. (2014). Short intronic repeat sequences facilitate circular RNA production. Genes Dev 28, 2233-2247.
Mercer, T.R., Clark, M.B., Andersen, S.B., Brunck, M.E., Haerty, W., Crawford, J., Taft, R.J., Nielsen, L.K., Dinger, M.E., and Mattick, J.S. (2015). Genome-wide discovery of human splicing branchpoints. Genome Res 25, 290-303.
Meyer, M., Plass, M., Perez-Valle, J., Eyras, E., and Vilardell, J. (2011). Deciphering 3'ss selection in the yeast genome reveals an RNA thermosensor that mediates alternative splicing. Mol Cell 43, 1033-1039.
Morgan, J.T., Fink, G.R., and Bartel, D.P. (2019). Excised linear introns regulate growth in yeast. Nature. Epub ahead of print. doi: 10.1038/s41586-018-0828-1.
Nam, K., Lee, G., Trambley, J., Devine, S.E., and Boeke, J.D. (1997). Severe growth defect in a Schizosaccharomyces pombe mutant defective in intron lariat degradation. Mol Cell Biol 17, 809-818.
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89-100.
Ooi, S.L., Samarsky, D.A., Fournier, M.J., and Boeke, J.D. (1998). Intronic snoRNA biosynthesis in Saccharomyces cerevisiae depends on the lariat-debranching enzyme: intron length effects and activity of a precursor snoRNA. RNA 4, 1096-1110.
Parenteau, J., Maignon, L., Berthoumieux, M., Catala, M., Gagnon, V., and Abou Elela, S. (2019). Introns are mediators of cell response to starvation. Nature. Epub ahead of print. doi: 10.1038/s41586-018-0859-7.
Pastuszak, A.W., Joachimiak, M.P., Blanchette, M., Rio, D.C., Brenner, S.E., and Frankel, A.D. (2011). An SF1 affinity model to identify branch point sequences in human introns. Nucleic Acids Res 39, 2344-2356.
Pineda, J.M.B., and Bradley, R.K. (2018). Most human introns are recognized via multiple and tissue-specific branchpoints. Genes Dev 32, 577-591.
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.
Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140.
Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007). Intronic microRNA precursors that
bypass Drosha processing. Nature 448, 83-86. 871
Ruskin, B., and Green, M.R. (1985). An RNA processing activity that debranches 872
RNA lariats. Science 229, 135-140. 873
Ruskin, B., Krainer, A.R., Maniatis, T., and Green, M.R. (1984). Excision of an 874
intact intron as a novel lariat structure during pre-mRNA splicing in vitro. Cell 875
38, 317-331. 876
Smith, C.W., Porro, E.B., Patton, J.G., and Nadal-Ginard, B. (1989). Scanning 877
from an independently specified branch point defines the 3' splice site of 878
mammalian introns. Nature 342, 243-247. 879
Suzuki, H., Zuo, Y., Wang, J., Zhang, M.Q., Malhotra, A., and Mayeda, A. 880
30
(2006). Characterization of RNase R-digested cellular RNA source that 881
consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic Acids 882
Res 34, e63. 883
Taggart, A.J., DeSimone, A.M., Shih, J.S., Filloux, M.E., and Fairbrother, W.G. 884
(2012). Large-scale mapping of branchpoints in human pre-mRNA transcripts 885
in vivo. Nat Struct Mol Biol 19, 719-721. 886
Taggart, A.J., Lin, C.L., Shrestha, B., Heintzelman, C., Kim, S., and Fairbrother, 887
W.G. (2017). Large-scale analysis of branchpoint usage across species and 888
cell lines. Genome Res 27, 639-649. 889
Talhouarne, G.J., and Gall, J.G. (2014). Lariat intronic RNAs in the cytoplasm of 890
Xenopus tropicalis oocytes. RNA 20, 1476-1487. 891
Talhouarne, G.J.S., and Gall, J.G. (2018). Lariat intronic RNAs in the cytoplasm of 892
vertebrate cells. Proc Natl Acad Sci U S A 115, E7970-E7977. 893
Tay, M.L., and Pek, J.W. (2017). Maternally inherited stable intronic sequence RNA 894
triggers a self-reinforcing feedback loop during development. Curr Biol 27, 895
1062-1067. 896
Tempel, S. (2012). Using and understanding RepeatMasker. Methods Mol Biol 859, 897
29-51. 898
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, 899
M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript 900
assembly and quantification by RNA-Seq reveals unannotated transcripts and 901
isoform switching during cell differentiation. Nat Biotechnol 28, 511-515. 902
Wang, H., Hill, K., and Perry, S.E. (2004). An Arabidopsis RNA lariat debranching 903
enzyme is essential for embryogenesis. J Biol Chem 279, 1468-1473. 904
Ye, Y., De Leon, J., Yokoyama, N., Naidu, Y., and Camerini, D. (2005). DBR1 905
siRNA inhibition of HIV-1 replication. Retrovirology 2, 63. 906
Zhang, S.Y., Clark, N.E., Freije, C.A., Pauwels, E., Taggart, A.J., Okada, S., 907
Mandel, H., Garcia, P., Ciancanelli, M.J., Biran, A., Lafaille, F.G., 908
Tsumura, M., Cobat, A., Luo, J., Volpi, S., Zimmer, B., Sakata, S., Dinis, 909
A., Ohara, O., Garcia Reino, E.J., Dobbs, K., Hasek, M., Holloway, S.P., 910
31
McCammon, K., Hussong, S.A., DeRosa, N., Van Skike, C.E., Katolik, A., 911
Lorenzo, L., Hyodo, M., Faria, E., Halwani, R., Fukuhara, R., Smith, 912
G.A., Galvan, V., Damha, M.J., Al-Muhsen, S., Itan, Y., Boeke, J.D.,913
Notarangelo, L.D., Studer, L., Kobayashi, M., Diogo, L., Fairbrother, 914
W.G., Abel, L., Rosenberg, B.R., Hart, P.J., Etzioni, A., and Casanova,915
J.L. (2018). Inborn errors of RNA lariat metabolism in humans with brainstem916
viral infection. Cell 172, 952-965 e918. 917
Zhang, X.O., Wang, H.B., Zhang, Y., Lu, X., Chen, L.L., and Yang, L. (2014). 918
Complementary sequence-mediated exon circularization. Cell 159, 134-147. 919
Zhang, Y., Zhang, X.O., Chen, T., Xiang, J.F., Yin, Q.F., Xing, Y.H., Zhu, S., 920
Yang, L., and Chen, L.L. (2013). Circular intronic long noncoding RNAs. 921
Mol Cell 51, 792-806. 922
Zheng, S., Vuong, B.Q., Vaidyanathan, B., Lin, J.Y., Huang, F.T., and 923
Chaudhuri, J. (2015). Non-coding RNA generated following lariat 924
debranching mediates targeting of AID to DNA. Cell 161, 762-773. 925
926
Figure Legends
Figure 1. A summary of the results of the 8 RNA-seq profiles.
(A) A global view of intron expression levels in the 8 RNA-seq profiles. Only introns >100 bp were used. The average expression levels of Col-0 and dbr1-2, with and without RNase R treatment, were compared using the Mann-Whitney U-test. r1 and r2 indicate replicate 1 and replicate 2, respectively.
(B) Correlation coefficients of intron expression levels between the profiles, with hierarchical clustering analysis.
(C) Principal component analysis (PCA) based on intron expression levels.
(D) Venn diagram showing the numbers of genes with lariat RNA accumulation in Col-0 and dbr1-2. The numbers indicate genes with an average abundance of intronic transcripts ≥5 FPKM in Col-0 and dbr1-2.
(E) The numbers of down-regulated (blue) and up-regulated (red) introns in dbr1-2 RNase R (+) samples compared with Col-0 RNase R (+) samples.
(F) The lengths of all introns >100 bp and of introns deregulated in RNase R-treated dbr1-2 samples.

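The gene counts in panel (D) follow from a simple threshold on replicate-averaged intronic FPKM. The sketch below assumes a hypothetical input format (a dict mapping each gene ID to its replicate FPKM values) and takes "average abundance" to mean the arithmetic mean across replicates; it illustrates the set logic only and is not the authors' code.

```python
from statistics import mean


def expressed_genes(fpkm_by_gene: dict, threshold: float = 5.0) -> set:
    """Genes whose mean intronic FPKM across replicates is >= threshold."""
    return {g for g, reps in fpkm_by_gene.items() if reps and mean(reps) >= threshold}


def venn_counts(a: set, b: set) -> tuple:
    """Return (only in a, shared, only in b) for a two-set Venn diagram."""
    return len(a - b), len(a & b), len(b - a)


# Hypothetical toy values, not data from the study.
col0 = {"g1": [6.0, 4.0], "g2": [1.0, 2.0], "g3": [10.0, 12.0]}
dbr1 = {"g1": [20.0, 30.0], "g2": [8.0, 9.0], "g4": [5.0, 5.0]}
print(venn_counts(expressed_genes(col0), expressed_genes(dbr1)))  # (1, 1, 2)
```

Whether the threshold is applied per replicate or to the mean changes the counts, so the averaging step is the main assumption to check against the Methods.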
Figure 2. Characterization of BPs in Arabidopsis, tomato, rice, and maize.
(A) The computational pipeline used to identify BPs. Sequencing reads were aligned to the genome with TopHat (v2). Unmapped reads were aligned to the database of introns with BLASTN. Partially mapped reads were examined with a self-developed program to determine whether the remaining parts of the same reads could be aligned to the same introns. Reads supporting the same BP were collectively used to infer that BP, and the results from different RNA-seq profiles were combined.
(B) The percentages of each nucleotide among the identified BPs in the four plant species (Arabidopsis, tomato, rice, and maize). Numbers indicate the numbers of BPs at the indicated nucleotide, and percentages indicate these numbers relative to the total number of identified BPs.
(C) The distribution of the distance from the BP to the 3'ss in the four plant species. In the maize distribution, bars from -1000 to -50 are 19 nt wide, and bars from -50 to 0 are 1 nt wide.
(D) The nucleotide preferences flanking the BP in the four plant species.

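Step 4 of the pipeline in (A) rests on the geometry of a read that crosses the 2'-5' branch: in sense orientation it reads as an intron segment ending at the BP followed by the 5' end of the same intron. The sketch below shows that split-read logic with exact string matching only; `find_branchpoints` and the toy intron are hypothetical, and the published pipeline used BLASTN/Bowtie2 plus a self-developed program, which must also tolerate mismatches (reverse transcriptase often misincorporates at the branch).

```python
def find_branchpoints(read: str, intron: str, min_seg: int = 10) -> list:
    """Candidate BP positions (0-based in `intron`) supported by one read.

    Assumes the read is in the intron's sense orientation and both halves
    match exactly. A branch-spanning read looks like
        intron[bp - k + 1 : bp + 1] + intron[0 : len(read) - k]
    i.e. a segment ending at the BP followed by the intron's 5' end.
    """
    hits = set()
    for k in range(min_seg, len(read) - min_seg + 1):
        head, tail = read[:k], read[k:]
        # The downstream half must align to the 5' end of the intron (5'ss side).
        if not intron.startswith(tail):
            continue
        # The upstream half may align anywhere; its last base marks the BP.
        pos = intron.find(head)
        while pos != -1:
            hits.add(pos + k - 1)
            pos = intron.find(head, pos + 1)
    return sorted(hits)


intron = "GTAAGTACGATCGTTGCATGCCTAGGATCCTTTACTAACTTTTGCAG"  # hypothetical toy intron
bp = 37  # the second A of CTAAC
read = intron[bp - 11 : bp + 1] + intron[:12]  # simulated branch-spanning read
print(find_branchpoints(read, intron))  # [37]
```

A linear read taken from inside the intron yields no hit, because its downstream half never matches the intron's 5' end.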
Figure 3. Multiple branchpoints in the four plant species.
(A) The numbers of introns (Y-axis) with the indicated numbers of BPs (X-axis) in Arabidopsis, tomato, rice, and maize.
(B) The distance distributions of multiple BPs along the intron in the four plant species. Only introns with ≥5 lariat reads were counted. The percentage of lariat reads (Y-axis) was calculated by dividing the number of lariat reads for a specific BP by the total number of lariat reads for the intron.

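The normalization described in (B), together with the BP-per-intron tally in (A), can be sketched as follows. The input layout is a hypothetical assumption: a dict mapping each intron ID to a dict of {BP distance from the 3'ss: lariat read count}.

```python
from collections import Counter


def bp_read_percentages(bp_counts: dict, min_reads: int = 5) -> dict:
    """Percent of an intron's lariat reads at each BP, for introns with >= min_reads."""
    result = {}
    for intron_id, counts in bp_counts.items():
        total = sum(counts.values())
        if total < min_reads:
            continue  # panel (B) keeps only introns with >= 5 lariat reads
        result[intron_id] = {bp: 100.0 * n / total for bp, n in counts.items()}
    return result


def bps_per_intron(bp_counts: dict) -> Counter:
    """Histogram of the number of distinct BPs per intron, as in panel (A)."""
    return Counter(len(counts) for counts in bp_counts.values())


# Hypothetical toy counts, keyed by distance from the 3'ss (nt).
toy = {"i1": {-22: 3, -30: 1, -45: 1}, "i2": {-20: 2}}
print(bp_read_percentages(toy))  # {'i1': {-22: 60.0, -30: 20.0, -45: 20.0}}
```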
Figure 4. Tissue-specific BP usage in Arabidopsis and rice.
(A) and (B) The percentages of supporting lariat reads at the indicated BPs in different tissues for the 9th intron of At3g01500.1 (A) and the first intron of Os04g16748.1 (B). Nucleotides in uppercase and red are tissue-specific BPs identified in the current study. P-values are corrected for multiple testing. The numbers below the intron sequence are the positions of the identified BPs relative to the 3'ss.
(C) Summary of the BPs identified in the 7th intron of At3g23590.1 by RT-PCR followed by Sanger sequencing. The 7th intron of At3g23590.1 is 311 bp long; 216th, 217th, 216th… indicate the distance downstream of the 5'ss, and F and R indicate the primer pair used for RT-PCR to amplify lariat RNAs originating from this intron.
(D) The distributions of multiple BPs in different tissues, determined by RT-PCR followed by Sanger sequencing. In total, 17, 14, 21, 24, and 28 individual clones were sequenced for roots, seedlings, leaves, inflorescences, and siliques, respectively. The numbers within the circles indicate the numbers of clones carrying the indicated BP.

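The legend does not name the multiple-testing correction; the sketch below assumes the Benjamini-Hochberg false discovery rate procedure, which appears in the paper's reference list, and shows how raw P-values become adjusted ones.

```python
def benjamini_hochberg(pvals: list) -> list:
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))  # approximately [0.02, 0.04, 0.04, 0.02]
```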
Figure 5. The distributions of the first AGs downstream of the BP in plants.
(A) Categories of the 3'ss relative to the BP in the four plant species. The 3'ss was classified into four types: (i) the first AG downstream of the BP is the 3'ss (blue); (ii) an AG other than the first AG is the 3'ss (orange); (iii) the 3'ss is not AG (grey); (iv) the BP is one of the nucleotides of the 3'ss (yellow).
(B) The BP of the second intron of At1g70830.1, which lies exactly within the 3'ss, and its supporting reads in two RNA-seq profiles (SRR1190492 and SRR3234408).
(C) The BP of the 8th intron of Zm0001d013156.2 in maize, which lies exactly within the 3'ss, and its supporting reads in two RNA-seq profiles (SRR765414 and SRR765622). Numbers on the right indicate the total numbers of supporting reads for each transcript.

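The four categories in (A) can be expressed as a small decision function. This is a sketch under stated assumptions: the intron string runs from the 5'ss to the 3'ss (so its last two bases are the 3'ss dinucleotide), `bp` is a 0-based index into it, and "downstream of the BP" means an AG starting at bp + 1 or later. The function name and category labels are hypothetical.

```python
def classify_3ss(intron: str, bp: int) -> str:
    """Assign an intron to one of the four 3'ss categories of panel (A)."""
    seq = intron.upper()
    n = len(seq)
    if bp >= n - 2:
        return "bp_in_3ss"  # (iv) the BP is one of the 3'ss nucleotides
    if seq[-2:] != "AG":
        return "non_AG"  # (iii) the 3'ss is not AG
    # Because the intron ends in AG and bp + 1 <= n - 2, this search always succeeds.
    first_ag = seq.find("AG", bp + 1)
    return "first_AG" if first_ag == n - 2 else "other_AG"  # (i) or (ii)


intron = "GTAAGTACGATCGTTGCATGCCTAGGATCCTTTACTAACTTTTGCAG"  # hypothetical toy intron
print(classify_3ss(intron, 37))  # first_AG
```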
Figure 6. The identification and validation of lariat RNAs in Arabidopsis.
(A) The distribution of the 10580 lariat RNAs identified in Col-0 across expression levels. FPKM was used to evaluate the expression level of each lariat RNA, and the numbers indicate the total numbers of lariat RNAs at the indicated expression levels.
(B) The lengths of all introns >100 bp in the Arabidopsis thaliana genome, of introns with lariat RNA expression ≥5 FPKM in Col-0, and of introns with up-regulated lariat RNAs in dbr1-2. * indicates P < 10⁻¹⁵, Welch's t-test.
(C) The lengths of introns with the indicated expression levels of lariat RNAs. * indicates P < 10⁻¹⁵, Welch's t-test, compared with the group of 5542 introns with lariat RNA expression at FPKM 5~10 in Col-0.
(D) Genome browser view of the abundances of four lariat-derived circular RNAs in the 8 RNA-seq profiles. Dashed regions highlight the region between the BP and the 3'ss that lacks supporting reads. "Ch4: 9714955-9715275" indicates the genomic location of the region shown. Numbers in brackets indicate the range of the normalized expression level.
(E) Schematic showing the probes used in (F) and (G).
(F) RNA gel blots detecting the four lariat-derived circular RNAs shown in (D). The lower panel (28S and 18S rRNAs stained with ethidium bromide) shows the loading controls; all total RNA samples were loaded onto the same denaturing agarose gel. The size of each band is indicated in each blot.
(G) RNA gel blots further detecting the lariat-derived circular RNAs of At4g17390.1 I2 and At3g52590.1 I3 on a denaturing PAGE gel. Equal amounts of total RNA from Col-0 and dbr1-2 were loaded onto the same urea-PAGE gel. The rightmost panel shows total RNA treated with RNase R. Linear RNA standards of 380 nt, 238 nt, and 183 nt were used as size markers. Red arrows indicate the detected bands.

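The expression bins in (A) are FPKM values, i.e. reads (fragments, for paired-end data) per kilobase of feature per million mapped reads. The sketch below computes FPKM and assigns the bin labels used in the panel; treating the bins as half-open intervals [low, high) is an assumption, so a value of exactly 50 lands in the top bin.

```python
def fpkm(reads: int, length_nt: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    return reads * 1e9 / (length_nt * total_mapped_reads)


# Bin edges follow the panel labels; half-open [low, high) is an assumption.
BINS = [(5, 10, "5~10"), (10, 20, "10~20"), (20, 30, "20~30"), (30, 50, "30~50")]


def expression_bin(value: float):
    """Return the panel (A) bin label, or None below 5 FPKM."""
    for low, high, label in BINS:
        if low <= value < high:
            return label
    return ">50" if value >= 50 else None


# Hypothetical example: 100 reads on a 500-nt intron, 20 million mapped reads.
value = fpkm(100, 500, 20_000_000)
print(value, expression_bin(value))  # 10.0 10~20
```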
Figure 7. The interplay among exonic back-splicing, intronic TE insertion, and lariat RNA accumulation.
(A) Fold change of back-splicing incidence at two adjacent exons. * indicates P < 0.001, χ² test, compared with all introns in the genome.
(B) The percentages of different types of TEs in different groups of introns: all introns >100 nt in the genome, introns with lariat RNA accumulation at FPKM 5~10, 10~20, 20~30, 30~50, and >50, and introns with up-regulated lariat RNAs in dbr1-2.
(C) Comparison of the lengths of RE introns and non-RE introns.
(D) Comparison of intron expression between RE introns and non-RE introns. * indicates P < 10⁻¹⁰ (Mann-Whitney U-test).
(E) Three examples of introns with or without retroelements in the 8 RNA-seq profiles. "Ch2: 14713970-14715028" indicates the genomic location of the region shown. Numbers in brackets indicate the range of the normalized expression level.

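Panel (A) compares back-splicing incidence in a group of exons with a genome-wide reference using a χ² test. The exact contingency table is not given in the legend; the sketch below assumes a hypothetical 2×2 layout (rows: group of interest, reference; columns: back-spliced, not back-spliced) and computes the fold change and a Pearson χ² test without continuity correction. For 1 degree of freedom, the upper-tail P-value equals erfc(sqrt(χ²/2)).

```python
from math import erfc, sqrt


def backsplice_test(a: int, b: int, c: int, d: int) -> tuple:
    """Fold change and Pearson chi-square (1 df) for the 2x2 table [[a, b], [c, d]].

    a, b: back-spliced / not back-spliced exon pairs in the group of interest
    c, d: the same counts in the reference set (hypothetical layout)
    """
    fold = (a / (a + b)) / (c / (c + d))
    n = a + b + c + d
    rows, cols = (a + b, c + d), (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    p = erfc(sqrt(chi2 / 2))  # upper tail of the chi-square distribution with 1 df
    return fold, chi2, p


fold, chi2, p = backsplice_test(30, 70, 10, 90)  # hypothetical counts
print(round(fold, 2), chi2, p < 0.001)  # 3.0 12.5 True
```

Because the genome-wide set contains the group being tested, a real analysis would more carefully compare the group against the remaining introns; this sketch only illustrates the arithmetic.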
[Figure 1 image. (A) Log2(FPKM+1) of introns in Col-0_r1, Col-0_r2, dbr1-2_r1 and dbr1-2_r2, RNase R (-) and RNase R (+); P-values shown: 3.0 × 10⁻²²⁰ and 0. (B) Correlation heatmap of the same samples (scale 0 to 1). (C) PCA: PC1 (73.4%), PC2 (17.0%); Col-0 RNase R (-), Col-0 RNase R (+), dbr1-2 RNase R (-), dbr1-2 RNase R (+). (D) Venn diagram of Col-0 RNase R (+) and dbr1-2 RNase R (+): 6037, 4205 and 548. (E) -Log10(FDR) versus LogFC; down-regulated and up-regulated introns. (F) Log2 number of introns versus length of introns (nt): all introns, down-regulated introns, up-regulated introns.]
[Figure 2 image. (A) Pipeline: 1. Construct a database of introns of annotated genes. 2. Align reads to the genome with TopHat2 or HISAT2. 3. Align unmapped reads to the intron database with BLASTN or BOWTIE2. 4. Align unmapped regions of partially mapped reads to the same intron. 5. Combine the results from different RNA-seq libraries. (B) BP nucleotide composition (A, U, G, C) for Arabidopsis, tomato, rice and maize. (C) BP frequency versus distance from the 3'ss (nt). (D) Raw nucleotide fraction (A, C, G, U) versus distance from the BP (nt), -10 to +10.]
[Figure 3 image. (A) Numbers of introns versus numbers of BPs per intron (1 to 10) for Arabidopsis, tomato, rice and maize. (B) Lariat reads (%) versus distance from the BP to the 3'ss (nt), colored by BP nucleotide (A, C, G, U).]
[Figure 4 image. (A) At3g01500.1, the 9th intron; BPs at -154, -34 and -33; lariat reads (%) in inflorescences (n = 26), leaves (n = 33), callus (n = 0), roots (n = 0) and seedlings (n = 21). (B) Os04g16748.1, the 1st intron; BPs at -700, -87 and -7; nematode-induced giant cells (n = 54), panicle (n = 19), roots (n = 18), shoots (n = 2745) and vascular cells (n = 2). Corrected P-values shown: 3.0 × 10⁻⁶⁰, 9.8 × 10⁻⁴², 1.6 × 10⁻⁸, 3.3 × 10⁻¹¹, 1.6 × 10⁻⁶⁵ and 5.6 × 10⁻⁴³. (C) AT3G23590.1, the 7th intron (311 bp) between exon 7 and exon 8, primers F and R; BPs at 216th C, 217th T, 220th A, 223rd T, 224th A, 285th T and 287th T. (D) Clones per BP in roots (17 clones), seedlings (14 clones), leaves (21 clones), inflorescences (24 clones) and siliques (28 clones).]
[Figure 5 image. (A) Categories (the 1st AG is the 3'ss; other AG is the 3'ss; the 3'ss is not AG; the BP is the 3'ss), with percentages and counts, for Arabidopsis, tomato, maize and rice. (B) At1g70830.5, the 2nd intron; reads from SRR1190492 and SRR3234408. (C) Zm0001d013156.2, the 8th intron; reads from SRR765414 and SRR765622.]
[Figure 6 image. (A) Lariat RNAs in Col-0 by expression level (FPKM): 5~10, 52.6% (5542); 10~20, 33.3% (3515); 20~30, 8% (848); 30~50, 4.1% (436); >50, 2% (205). (B) Log2(length of introns) for all introns >100 nt, introns expressed in Col-0, and up-regulated introns in dbr1-2. (C) Intron length (nt) by expression level (FPKM) of lariat RNAs. (D) Browser tracks (Col-0_r1, Col-0_r2, dbr1-2_r1, dbr1-2_r2; RNase R (-) and (+)) for At4g17390.1 I2 (Ch4: 9714955-9715275), At3g52590.1 I3 (Ch3: 19506133-19506511), At1g60995.1 I8 (Ch1: 22467482-22467907) and At5g23050.1 I8 (Ch5: 7732986-7733197). (E) Probe for intron and probe for mRNA relative to the BP and flanking exons. (F) RNA gel blots of mRNAs and intron lariats of At4g17390, At3g52590, At1g60995 and At5g23050 in Col-0 and dbr1-2. (G) Urea-PAGE blots of At4g17390.1 I2 and At3g52590.1 I3 in Col-0 and dbr1-2, RNase R (-) and (+), with RNA standards of 380, 238 and 183 nt.]
[Figure 7 image. (A) Fold change of exonic circularization at the 5' exon, the 5' and 3' exons, and the 3' exon. (B) Percent of TE types (SINEs, LINEs, LTR elements, DNA transposons, unclassified TE, satellites, simple repeats, low complexity) in all introns and in introns with average lariat RNA FPKM 10~20, 20~30, 30~50 and >50. (C) Log2(length of intron) for non-RE and RE introns; P < 10⁻¹⁰⁰. (D) Expression of introns (log10(FPKM+1)) for non-RE and RE introns, RNase R (+) and (-). (E) Browser tracks for At2g34880.1 I5, At2g14080.1 I1 and At4g39260.1 I1 (regions Ch2: 14713970-14715028, Ch2: 5925824-5926195 and Ch3: 18274568-18274850), with LTR and LINE annotations.]
Parsed CitationsArmakola, M., Higgins, M.J., Figley, M.D., Barmada, S.J., Scarborough, E.A., Diaz, Z., Fang, X., Shorter, J., Krogan, N.J., Finkbeiner, S.,Farese, R.V., Jr., and Gitler, A.D. (2012). Inhibition of RNA lariat debranching enzyme suppresses TDP-43 toxicity in ALS diseasemodels. Nat Genet 44, 1302-1309.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Bao, W., Kojima, K.K., and Kohany, O. (2015). Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA 6,11.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Barrett, S.P., Wang, P.L., and Salzman, J. (2015). Circular RNA biogenesis can proceed through an exon-containing lariat precursor.Elife 4, e07540.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Benhamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful apporoach to multiple testing. J RStat Soc Series B Stat Methodol, 289-300.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Bitton, D.A., Rallis, C., Jeffares, D.C., Smith, G.C., Chen, Y.Y., Codlin, S., Marguerat, S., and Bahler, J. (2014). LaSSO, a strategy forgenome-wide mapping of intronic lariats and branch points using RNA-seq. Genome Res 24, 1169-1179.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Chen, S., Anderson, K., and Moore, M.J. (2000). Evidence for a linear search in bimolecular 3' splice site AG selection. Proc Natl AcadSci U S A 97, 593-598.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Chua, K., and Reed, R. (2001). An upstream AG determines whether a downstream AG is selected during catalytic step II of splicing.Mol Cell Biol 21, 1509-1514.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Conn, S.J., Pillman, K.A., Toubia, J., Conn, V.M., Salmanidis, M., Phillips, C.A., Roslan, S., Schreiber, A.W., Gregory, P.A., and Goodall,G.J. (2015). The RNA binding protein quaking regulates formation of circRNAs. Cell 160, 1125-1134.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Dumesic, P.A., Natarajan, P., Chen, C., Drinnenberg, I.A., Schiller, B.J., Thompson, J., Moresco, J.J., Yates, J.R., 3rd, Bartel, D.P., andMadhani, H.D. (2013). Stalled spliceosomes are a signal for RNAi-mediated genome defense. Cell 152, 957-968.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Galvis, A.E., Fisher, H.E., Fan, H., and Camerini, D. (2017). Conformational changes in the 5' end of the HIV-1 genome dependent on thedebranching enzyme DBR1 during early stages of infection. J Virol 91.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Galvis, A.E., Fisher, H.E., Nitta, T., Fan, H., and Camerini, D. (2014). Impairment of HIV-1 cDNA synthesis by DBR1 knockdown. J Virol 88,7054-7069.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Gardner, E.J., Nizami, Z.F., Talbot, C.C., Jr., and Gall, J.G. (2012). Stable intronic sequence RNA (sisRNA), a new class of noncodingRNA from the oocyte nucleus of Xenopus tropicalis. Genes Dev 26, 2550-2559.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Gooding, C., Clark, F., Wollerton, M.C., Grellscheid, S.N., Groom, H., and Smith, C.W. (2006). A class of human exons with predicteddistant branch points revealed by analysis of AG dinucleotide exclusion zones. Genome Biol 7, R1.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Ivanov, A., Memczak, S., Wyler, E., Torti, F., Porath, H.T., Orejuela, M.R., Piechotta, M., Levanon, E.Y., Landthaler, M., Dieterich, C., andRajewsky, N. (2015). Analysis of intron sequences reveals hallmarks of circular RNA biogenesis in animals. Cell Rep 10, 170-177.
Pubmed: Author and TitleGoogle Scholar: Author Only Title Only Author and Title
Jacquier, A., and Rosbash, M. (1986). RNA splicing and intron turnover are greatly diminished by a mutant yeast branch point. Proc Natl Acad Sci U S A 83, 5835-5839.
Karst, S.M., Rutz, M.L., and Menees, T.M. (2000). The yeast retrotransposons Ty1 and Ty3 require the RNA Lariat debranching enzyme, Dbr1p, for efficient accumulation of reverse transcripts. Biochem Biophys Res Commun 268, 112-117.
Kim, D., Langmead, B., and Salzberg, S.L. (2015). HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12, 357-360.
Kim, D., Pertea, G., Trapnell, C., Pimentel, H., Kelley, R., and Salzberg, S.L. (2013). TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14, R36.
Kim, H.C., Kim, G.M., Yang, J.M., and Ki, J.W. (2001). Cloning, expression, and complementation test of the RNA lariat debranching enzyme cDNA from mouse. Mol Cells 11, 198-203.
Kim, J.W., Kim, H.C., Kim, G.M., Yang, J.M., Boeke, J.D., and Nam, K. (2000). Human RNA lariat debranching enzyme cDNA complements the phenotypes of Saccharomyces cerevisiae dbr1 and Schizosaccharomyces pombe dbr1 mutants. Nucleic Acids Res 28, 3666-3673.
Kramer, M.C., Liang, D., Tatomer, D.C., Gold, B., March, Z.M., Cherry, S., and Wilusz, J.E. (2015). Combinatorial control of Drosophila circular RNA expression by intronic repeats, hnRNPs, and SR proteins. Genes Dev 29, 2168-2182.
Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359.
Li, Z., Wang, S., Cheng, J., Su, C., Zhong, S., Liu, Q., Fang, Y., Yu, Y., Lv, H., Zheng, Y., and Zheng, B. (2016). Intron lariat RNA inhibits microRNA biogenesis by sequestering the dicing complex in Arabidopsis. PLoS Genet 12, e1006422.
Liang, D., and Wilusz, J.E. (2014). Short intronic repeat sequences facilitate circular RNA production. Genes Dev 28, 2233-2247.
Mercer, T.R., Clark, M.B., Andersen, S.B., Brunck, M.E., Haerty, W., Crawford, J., Taft, R.J., Nielsen, L.K., Dinger, M.E., and Mattick, J.S. (2015). Genome-wide discovery of human splicing branchpoints. Genome Res 25, 290-303.
Meyer, M., Plass, M., Perez-Valle, J., Eyras, E., and Vilardell, J. (2011). Deciphering 3'ss selection in the yeast genome reveals an RNA thermosensor that mediates alternative splicing. Mol Cell 43, 1033-1039.
Morgan, J.T., Fink, G.R., and Bartel, D.P. (2019). Excised linear introns regulate growth in yeast. Nature. Epub ahead of print. doi:10.1038/s41586-018-0828-1.
Nam, K., Lee, G., Trambley, J., Devine, S.E., and Boeke, J.D. (1997). Severe growth defect in a Schizosaccharomyces pombe mutant defective in intron lariat degradation. Mol Cell Biol 17, 809-818.
Okamura, K., Hagen, J.W., Duan, H., Tyler, D.M., and Lai, E.C. (2007). The mirtron pathway generates microRNA-class regulatory RNAs in Drosophila. Cell 130, 89-100.
Ooi, S.L., Samarsky, D.A., Fournier, M.J., and Boeke, J.D. (1998). Intronic snoRNA biosynthesis in Saccharomyces cerevisiae depends on the lariat-debranching enzyme: intron length effects and activity of a precursor snoRNA. RNA 4, 1096-1110.
Parenteau, J., Maignon, L., Berthoumieux, M., Catala, M., Gagnon, V., and Abou Elela, S. (2019). Introns are mediators of cell response to starvation. Nature. Epub ahead of print. doi:10.1038/s41586-018-0859-7.
Pastuszak, A.W., Joachimiak, M.P., Blanchette, M., Rio, D.C., Brenner, S.E., and Frankel, A.D. (2011). An SF1 affinity model to identify branch point sequences in human introns. Nucleic Acids Res 39, 2344-2356.
Pineda, J.M.B., and Bradley, R.K. (2018). Most human introns are recognized via multiple and tissue-specific branchpoints. Genes Dev 32, 577-591.
Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842.
Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-140.
Ruby, J.G., Jan, C.H., and Bartel, D.P. (2007). Intronic microRNA precursors that bypass Drosha processing. Nature 448, 83-86.
Ruskin, B., and Green, M.R. (1985). An RNA processing activity that debranches RNA lariats. Science 229, 135-140.
Ruskin, B., Krainer, A.R., Maniatis, T., and Green, M.R. (1984). Excision of an intact intron as a novel lariat structure during pre-mRNA splicing in vitro. Cell 38, 317-331.
Smith, C.W., Porro, E.B., Patton, J.G., and Nadal-Ginard, B. (1989). Scanning from an independently specified branch point defines the 3' splice site of mammalian introns. Nature 342, 243-247.
Suzuki, H., Zuo, Y., Wang, J., Zhang, M.Q., Malhotra, A., and Mayeda, A. (2006). Characterization of RNase R-digested cellular RNA source that consists of lariat and circular RNAs from pre-mRNA splicing. Nucleic Acids Res 34, e63.
Taggart, A.J., DeSimone, A.M., Shih, J.S., Filloux, M.E., and Fairbrother, W.G. (2012). Large-scale mapping of branchpoints in human pre-mRNA transcripts in vivo. Nat Struct Mol Biol 19, 719-721.
Taggart, A.J., Lin, C.L., Shrestha, B., Heintzelman, C., Kim, S., and Fairbrother, W.G. (2017). Large-scale analysis of branchpoint usage across species and cell lines. Genome Res 27, 639-649.
Talhouarne, G.J., and Gall, J.G. (2014). Lariat intronic RNAs in the cytoplasm of Xenopus tropicalis oocytes. RNA 20, 1476-1487.
Talhouarne, G.J.S., and Gall, J.G. (2018). Lariat intronic RNAs in the cytoplasm of vertebrate cells. Proc Natl Acad Sci U S A 115, E7970-E7977.
Tay, M.L., and Pek, J.W. (2017). Maternally inherited stable intronic sequence RNA triggers a self-reinforcing feedback loop during development. Curr Biol 27, 1062-1067.
Tempel, S. (2012). Using and understanding RepeatMasker. Methods Mol Biol 859, 29-51.
Trapnell, C., Williams, B.A., Pertea, G., Mortazavi, A., Kwan, G., van Baren, M.J., Salzberg, S.L., Wold, B.J., and Pachter, L. (2010). Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol 28, 511-515.
Wang, H., Hill, K., and Perry, S.E. (2004). An Arabidopsis RNA lariat debranching enzyme is essential for embryogenesis. J Biol Chem 279, 1468-1473.
Ye, Y., De Leon, J., Yokoyama, N., Naidu, Y., and Camerini, D. (2005). DBR1 siRNA inhibition of HIV-1 replication. Retrovirology 2, 63.
Zhang, S.Y., Clark, N.E., Freije, C.A., Pauwels, E., Taggart, A.J., Okada, S., Mandel, H., Garcia, P., Ciancanelli, M.J., Biran, A., Lafaille, F.G., Tsumura, M., Cobat, A., Luo, J., Volpi, S., Zimmer, B., Sakata, S., Dinis, A., Ohara, O., Garcia Reino, E.J., Dobbs, K., Hasek, M., Holloway, S.P., McCammon, K., Hussong, S.A., DeRosa, N., Van Skike, C.E., Katolik, A., Lorenzo, L., Hyodo, M., Faria, E., Halwani, R., Fukuhara, R., Smith, G.A., Galvan, V., Damha, M.J., Al-Muhsen, S., Itan, Y., Boeke, J.D., Notarangelo, L.D., Studer, L., Kobayashi, M., Diogo, L., Fairbrother, W.G., Abel, L., Rosenberg, B.R., Hart, P.J., Etzioni, A., and Casanova, J.L. (2018). Inborn errors of RNA lariat metabolism in humans with brainstem viral infection. Cell 172, 952-965.e18.
Zhang, X.O., Wang, H.B., Zhang, Y., Lu, X., Chen, L.L., and Yang, L. (2014). Complementary sequence-mediated exon circularization. Cell 159, 134-147.
Zhang, Y., Zhang, X.O., Chen, T., Xiang, J.F., Yin, Q.F., Xing, Y.H., Zhu, S., Yang, L., and Chen, L.L. (2013). Circular intronic long noncoding RNAs. Mol Cell 51, 792-806.
Zheng, S., Vuong, B.Q., Vaidyanathan, B., Lin, J.Y., Huang, F.T., and Chaudhuri, J. (2015). Non-coding RNA generated following lariat debranching mediates targeting of AID to DNA. Cell 161, 762-773.
A comprehensive map of intron branchpoints and lariat RNAs in plants
Xiaotuo Zhang, Yong Zhang, Taiyun Wang, Ziwei Li, Jinping Cheng, Haoran Ge, Qi Tang, Kun Chen, Li Liu, Chenyu Lu, Junqiang Guo, Binglian Zheng and Yun Zheng
Plant Cell; originally published online March 20, 2019; DOI 10.1105/tpc.18.00711
This information is current as of December 6, 2020
Supplemental Data /content/suppl/2019/03/20/tpc.18.00711.DC1.html /content/suppl/2019/03/31/tpc.18.00711.DC2.html
ADVANCING THE SCIENCE OF PLANT BIOLOGY © American Society of Plant Biologists