Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of...

9
Patterns of tRNA Modifications Peter F. Stadler Bioinformatics Group, Dept. of Computer Science & Interdisciplinary Center for Bioinformatics, University of Leipzig Max Planck Institute for Mathematics in the Sciences,Leipzig RNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig Institute for Theoretical Chemistry, Univ. of Vienna (external faculty) The Santa Fe Institute (external faculty) Joint work with Anne Hoffmann, J¨ org Fallmann, Fabian Amman, and the M¨orl Lab Benasque, Jul 17 2018

Transcript of Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of...

Page 1: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

Patterns of tRNA Modifications

Peter F. Stadler

Bioinformatics Group, Dept. of Computer Science & Interdisciplinary Center forBioinformatics, University of Leipzig

Max Planck Institute for Mathematics in the Sciences,LeipzigRNomics Group, Fraunhofer Institute for Cell Therapy and Immunology, Leipzig

Institute for Theoretical Chemistry, Univ. of Vienna (external faculty)The Santa Fe Institute (external faculty)

Joint work withAnne Hoffmann, Jorg Fallmann, Fabian Amman, and the Morl Lab

Benasque, Jul 17 2018

Page 2: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

Conservation of Modification Patterns

How conserved are chemical modifications of RNAs

between different speciesbetween different tissues

Can this be studied with widely available data?tRNAs (or at least fragments) appear in high coverage inmost small-RNA-seq datasets.Can this information be mined?

The answer is a partial yes: Many but not all chemicalmodification leave detectable traces in normal RNA-seq data

Page 3: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

Hallmarks of Modifications in RNA-seq data

!

! !

!

!

!

!

!

! !

!

!

!

!

!

! !

! ! ! !

!

!

!

!

! !

!

!

!

"#

"#

"#

"#

"#

"# "#

"#

"#

"#

"#

"#

"#

"#

"#

"#

"#

"# "#

"#

"#

"#

"#

"#

"#

"#

"#

"# "#

"#

0

0.1

0.2

0.3

0.4

0.5

0.6

43 48 53 58 63 68 73

err

or

rate

tRNA nucleotide position

1-Methyladenosine

canonical 3’-end

0.25 0.50

A→A

A→C

A→G

A→T

Figure 1: Frequency of mismatches between RNAseq reads and ge-

nomic reference sequence for reads mapping close to the 3’-ends

of human (solid line; triangles) and macaque (dashed line; circles)

tRNAs. Note that the distribution of the error rates is highly cor-

related between the two species. The inset shows the frequency of

nucleotides observed at position 58. Apart from the being read “cor-

rectly” as , this post-transcriptional modification is typically seen as

an A-to-T transversion by the sequencer. Again, human (dark gray

bars) and macaque (light gray bars) have highly similar substitution

patterns.

for this modification seems to be largely invariant to the

library preparation.

This modification is the most prominent one that is

directly visible from the superposition of the error pro-

files of all tRNAs. Several other modifications are de-

tectable as conspicuous accumulations of mismatches

in individual tRNAs. Notably, most of the detectable

positional variations are located either towards the 5’-

or towards the 3’-end of the tRNA. This is caused

by the very uneven coverage of tRNAs with small se-

quencing reads, which is heavily biased towards the

ends and the fact that the error-prone 3’- and 5’-termini

of sequencing reads naturally coincide with the ends

of the tRNA (see Fig. S1). Besides the very strong

ect of 1-methyladenosine on the accuracy of the

cDNA, RNAseq data additionally exhibits moderately

increased error rates for N2,N2-dimethylguanosinesand

dihydrouridines. This is consistent with the find-

ings that the major substitution sites in plant tRNAs

correspond to known RNA base modifications: N1-

methyladenosine (m1A), N2-methylguanosine (m2G),

and N2-N2-methylguanosine (m22G) (Iida et al., 2009;

Ebhardt et al., 2009).

Fig. 2 summarizes the genomic distribution of mis-

matches that can be interpreted as sites of chemical

10 10 10 10 10 10

other

CDS

miRNA

tRNA

Number of potential RNA modifications (log)

Figure 2: Distribution of potential modifications among human (dark

gray) and macaque (light gray) RNA classes. Despite the fact that the

data set generated in the macaque RNAseq experiments is about 50-

times larger than the human one (Tab. 1) the total number of possible

modifications in tRNA and miRNA does not di er proportionally. In

the case of tRNAs, 862 modifications were found in macaque, com-

pared to 657 in human sets; for microRNAs, there are 1400 potential

modifications in macaque and about 500 in human.

modifications. As a consequence of the short RNA se-

quencing protocol, tRNAs and miRNAs, as measured

by the proportion of their genome sequence, accumu-

late most of the positional sequence variations seen in

the data sets used here. For tRNAs, the numbers are

comparable between human and macaque (657 vs. 862),

indicating that even the comparably small human li-

brary is nearly saturated, while for microRNAs there

are about 500 positional sequence variation in human

and 1400 in macaque, leading us to expect that the true

number of putative modifications in human microRNAs

will be larger than observed here. Such an saturation

ect cannot be expected for long transcripts, including

CDS, as a consequence of the small RNA sequencing

protocol. The much higher di erence of observable nu-

cleotide variations within these classes clearly supports

this assumption. Surprisingly, more than two third of

the detectable loci in human fall into coding sequences,

while only 8% are located in tRNAs. In macaque, the

ratio is even smaller, suggesting that chemical modifica-

tions indicated by the observed sequence variations are

a common phenomenon in general.

A detailed analysis of individual modification sites

with respect to their substitution patterns requires a suf-

ficiently large coverage. We have identified two fre-

quent potential modifications in human mir-124-3 and

mir-125b-2 transcript tags. In the case of the human

mir-124-3 (Fig. 3) 33 di erent tags cover nucleotide

77G but only 11 tags are in accordance with the refer-

ence sequence. The other nucleotides A, C, and T were

counted 6, 5, and 11 times, respectively. No genomic

11

00

10

,00

0

.(((((((..((((...........)))).(((((.......)))))....................(((((.......)))))))))))).

D-loop Ac-loop T-loopV-region

no

rma

lize

d r

ea

d c

ou

nt

2

3

4

5

1

1,0

00

,00

0

mismatches RT stops

Page 4: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

Mapping of tRNA reads

Many very similar tRNA paralog

tRNAs with introns

Specialized mapping pipeline

remove pre-tRNA reads that map across the boundaries ofmature tRNAs (and everyhing that maps well outside tRNAs)

mask out tRNA genes from genome

cluster very similar mature tRNAs and use a consensus

add these “tRNA prototypes” as artificial chromosomes

disregard positions with variations between paralogs in thesame cluster

Page 5: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

Variations between tissues

Page 6: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

More Results

Very similar results for rat tissues

Evidence for a few modifications in miRNAs

the method also picks up A-I editing in miRNAs (and othersmall RNAs)

modifications seem to be common also in snoRNAs

Page 7: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

Many modifications require chemical treatment

Page 8: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

Temperature dependence of Dihydrouridine

10◦C

●● ●

● ●

● ●● ●

●● ●●● ●

●● ● ●

● ● ●●

●● ● ●

●●●

●● ●

●● ●

● ●● ●●

●● ●● ●

● ●

● ●

● ●

●●● ●

●●

● ●● ●● ●

● ●

●●

●●

● ●

●●

● ●●

● ●●

●●

●●

●●

●●

● ●

●●

●●

●●●

●●

● ●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Ala−GGC−1Ala−TGC−1Arg−ACG−1Arg−CCG−1Arg−TCT−1Asn−GTT−1Asp−GTC−1Cys−GCA−1Gln−TTG−1Glu−TTC−1Glu−TTC−2Gly−GCC−1Gly−TCC−1Gly−TCC−2His−GTG−1His−GTG−2

Ile−GAT−1Leu−CAA−1Leu−CAG−1Leu−GAG−1Leu−TAA−1Leu−TAG−1Lys−TTT−1Lys−TTT−2Met−CAT−1Met−CAT−2Met−CAT−3

Phe−GAA−1Pro−TGG−1Ser−CGA−1Ser−GCT−1Ser−GGA−1Ser−TGA−1Thr−GGT−1Thr−TGT−1Trp−CCA−1Tyr−GTA−1Val−GAC−1Val−TAC−1

1 2 3 4 5 6 7 8 10 11 12 13 16 17 17a20 20a

20b24 25 26 27 28 29 31 32 33 34 35 36 38 39 41 42 43 44 45 11e

12e13e

15e2e 3e 4e 5e 24e

22e21e

46 47 48 50 51 54 55 59 60 62 63 64 65 66

U tRNA position

tRN

A c

lust

er

10

20

30

ratio

20◦C

●● ●

●● ●●

● ● ●

●● ●●

● ●● ● ●

●●

● ●●●● ●

●● ●● ●●● ● ●

● ●● ●

● ●

● ●●●

●●●

●● ●● ● ●●●

● ●● ● ●●

● ●

● ●●

●● ● ●

●●●

●●

● ●● ●● ●

● ●●●

●●

●●

● ●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●

●●

● ●

●●

●●

●●

●●

●● ●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Ala−GGC−1Ala−TGC−1Arg−ACG−1Arg−CCG−1Arg−TCT−1Asn−GTT−1Asp−GTC−1Cys−GCA−1Gln−TTG−1Glu−TTC−1Glu−TTC−2Gly−GCC−1Gly−TCC−1Gly−TCC−2His−GTG−1His−GTG−2

Ile−GAT−1Leu−CAA−1Leu−CAG−1Leu−GAG−1Leu−TAA−1Leu−TAG−1Lys−TTT−1Lys−TTT−2Met−CAT−1Met−CAT−2Met−CAT−3

Phe−GAA−1Pro−TGG−1Ser−CGA−1Ser−GCT−1Ser−GGA−1Ser−TGA−1Thr−GGT−1Thr−TGT−1Trp−CCA−1Tyr−GTA−1Val−GAC−1Val−TAC−1

1 2 3 4 5 6 7 8 10 11 12 13 16 17 17a20 20a

20b24 25 26 27 28 29 31 32 33 34 35 36 38 39 41 42 43 44 45 11e

12e13e

15e2e 3e 4e 5e 24e

22e21e

46 47 48 50 51 54 55 59 60 62 63 64 65 66

U tRNA position

tRN

A c

lust

er

20

40

60ratio

30◦C

●●

● ●

● ●

● ●●●

● ●●● ●● ●

● ●●● ●

● ●●● ●

●● ● ●●

●●●

●● ●●●

● ●● ●

● ●● ●

●●

●● ● ●

●●

●● ●●● ●

● ●●●

● ●

●●

● ●

●●

●●

●●

● ●

● ●●

●●

●●

●●

●●

●●

● ●

●●

● ●●

●●●

●●

●● ●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Ala−GGC−1Ala−TGC−1Arg−ACG−1Arg−CCG−1Arg−TCT−1Asn−GTT−1Asp−GTC−1Cys−GCA−1Gln−TTG−1Glu−TTC−1Glu−TTC−2Gly−GCC−1Gly−TCC−1Gly−TCC−2His−GTG−1His−GTG−2

Ile−GAT−1Leu−CAA−1Leu−CAG−1Leu−GAG−1Leu−TAA−1Leu−TAG−1Lys−TTT−1Lys−TTT−2Met−CAT−1Met−CAT−2Met−CAT−3

Phe−GAA−1Pro−TGG−1Ser−CGA−1Ser−GCT−1Ser−GGA−1Ser−TGA−1Thr−GGT−1Thr−TGT−1Trp−CCA−1Tyr−GTA−1Val−GAC−1Val−TAC−1

1 2 3 4 5 6 7 8 10 11 12 13 16 17 17a 20 20a 20b 24 25 26 27 28 29 31 32 33 34 35 36 38 39 41 42 43 44 45 11e 12e 13e 15e 2e 3e 4e 5e 24e 22e 21e 46 47 48 50 51 54 55 59 60 62 63 64 65 66

U tRNA position

tRN

A cl

uste

r

20

40

60ratio

Exiguobacterium sibir-icum

3 biological replicates

simple model to determine ap-value for theover-representation of RT-stopsat each position

Benjamini-Hochberg correctionwith 5% FDR

cutoff for minimum fold change

Page 9: Patterns of tRNA Modificationsbenasque.org/2018rna/talks_contr/1710_Stadler-Benasque...Patterns of tRNA Modi cations Peter F. Stadler Bioinformatics Group, Dept. of Computer Science

Temperature dependence of Pseudouridine

10◦C

●●

● ● ●

● ●●●

●●● ●

● ●● ●

● ●

● ●

●● ●●

● ●

●●

●●

● ●

●●

●●

● ●●

● ●●

●●

●●

●●

● ●

●●

● ●●●

●●

●●●

●●

●● ●●

●●

●●

●●

●●

●●

●●●

●●

●●

● ●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Ala−GGC−1Ala−TGC−1Arg−ACG−1Arg−CCG−1Arg−TCT−1Asn−GTT−1Asp−GTC−1Cys−GCA−1Gln−TTG−1Glu−TTC−1Glu−TTC−2Gly−GCC−1Gly−TCC−1Gly−TCC−2His−GTG−1His−GTG−2

Ile−GAT−1Leu−CAA−1Leu−CAG−1Leu−GAG−1Leu−TAA−1Leu−TAG−1Lys−TTT−1Lys−TTT−2Met−CAT−1Met−CAT−2Met−CAT−3

Phe−GAA−1Pro−TGG−1Ser−CGA−1Ser−GCT−1Ser−GGA−1Ser−TGA−1Thr−GGT−1Thr−TGT−1Trp−CCA−1Tyr−GTA−1Val−GAC−1Val−TAC−1

1 2 3 4 5 6 7 8 10 11 12 13 16 17 17a20 20a

20b24 25 26 27 28 29 31 32 33 34 35 36 38 39 41 42 43 44 45 11e

12e13e

15e2e 3e 4e 5e 24e

22e21e

46 47 48 50 51 54 55 59 60 62 63 64 65 66

U tRNA position

tRN

A c

lust

er

3

6

9

ratio

20◦C

● ●

● ●

● ●●

● ● ●

● ●●● ●

●●●

● ●● ●

● ●

● ●●●

●●

● ●

●●

● ●

●●

●●

● ●●

● ●●

●●

●●

●●

● ●

●●

● ●●

●●●

●●

●● ●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Ala−GGC−1Ala−TGC−1Arg−ACG−1Arg−CCG−1Arg−TCT−1Asn−GTT−1Asp−GTC−1Cys−GCA−1Gln−TTG−1Glu−TTC−1Glu−TTC−2Gly−GCC−1Gly−TCC−1Gly−TCC−2His−GTG−1His−GTG−2

Ile−GAT−1Leu−CAA−1Leu−CAG−1Leu−GAG−1Leu−TAA−1Leu−TAG−1Lys−TTT−1Lys−TTT−2Met−CAT−1Met−CAT−2Met−CAT−3

Phe−GAA−1Pro−TGG−1Ser−CGA−1Ser−GCT−1Ser−GGA−1Ser−TGA−1Thr−GGT−1Thr−TGT−1Trp−CCA−1Tyr−GTA−1Val−GAC−1Val−TAC−1

1 2 3 4 5 6 7 8 10 11 12 13 16 17 17a20 20a

20b24 25 26 27 28 29 31 32 33 34 35 36 38 39 41 42 43 44 45 11e

12e13e

15e2e 3e 4e 5e 24e

22e21e

46 47 48 50 51 54 55 59 60 62 63 64 65 66

U tRNA position

tRN

A c

lust

er

5

10

15

20

ratio

30◦C

● ●

● ●●●

● ●●

● ●

●●●

●●

● ●● ●

● ●●

● ●

●● ●●

● ●

●●

●●

● ●

●●

●●

●●

● ●●

● ●●

●●

●●

●●

● ●

●●

●●●

●●

●●●

●●

●● ●●

●●

●●

●●

●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

●●

Ala−GGC−1Ala−TGC−1Arg−ACG−1Arg−CCG−1Arg−TCT−1Asn−GTT−1Asp−GTC−1Cys−GCA−1Gln−TTG−1Glu−TTC−1Glu−TTC−2Gly−GCC−1Gly−TCC−1Gly−TCC−2His−GTG−1His−GTG−2

Ile−GAT−1Leu−CAA−1Leu−CAG−1Leu−GAG−1Leu−TAA−1Leu−TAG−1Lys−TTT−1Lys−TTT−2Met−CAT−1Met−CAT−2Met−CAT−3

Phe−GAA−1Pro−TGG−1Ser−CGA−1Ser−GCT−1Ser−GGA−1Ser−TGA−1Thr−GGT−1Thr−TGT−1Trp−CCA−1Tyr−GTA−1Val−GAC−1Val−TAC−1

1 2 3 4 5 6 7 8 10 11 12 13 16 17 17a20 20a

20b24 25 26 27 28 29 31 32 33 34 35 36 38 39 41 42 43 44 45 11e

12e13e

15e2e 3e 4e 5e 24e

22e21e

46 47 48 50 51 54 55 59 60 62 63 64 65 66

U tRNA position

tRN

A c

lust

er

10

20

ratio

Exiguobacterium sibir-icum