Estimation of alternative splicing isoform frequencies from RNA-Seq data Marius Nicolae Computer Science and Engineering Department University of Connecticut Joint work with Serghei Mangul, Ion Mandoiu and Alex Zelikovsky
Date posted: 21-Dec-2015

Page 1: Marius Nicolae Computer Science and Engineering Department University of Connecticut Joint work with Serghei Mangul, Ion Mandoiu and Alex Zelikovsky.

Estimation of alternative splicing isoform frequencies from RNA-Seq data

Marius Nicolae

Computer Science and Engineering Department

University of Connecticut

Joint work with Serghei Mangul, Ion Mandoiu and Alex Zelikovsky

Page 2: Outline

  • Introduction
  • EM Algorithm
  • Results
  • Conclusions and future work

Page 3: RNA-Seq

  • Make cDNA & shatter into fragments
  • Sequence fragment ends
  • Map reads
  • Gene Expression (GE)
  • Isoform Discovery (ID)
  • Isoform Expression (IE)

[Figure: RNA-Seq workflow; exon diagrams labeled A B C D E, A B C, A C, and D E]

Page 4: Gene Expression Challenges

  • Read ambiguity (multireads)
  • What is the gene length?

[Figure: exon diagram labeled A B C D E]

Page 5: Previous approaches to GE

  • Ignore multireads
  • [Mortazavi et al. 08]
    ◦ Fractionally allocate multireads based on unique read estimates
  • [Pasaniuc et al. 10]
    ◦ EM algorithm for solving ambiguities
  • Gene length: sum of lengths of exons that appear in at least one isoform
    ◦ Underestimate expression levels for genes with 2 or more isoforms [Trapnell et al. 10]
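The fractional allocation of multireads listed among the previous GE approaches can be sketched roughly as follows. This is a simplified illustration, not the cited implementation: the function name `rescue_counts` and the input layout are hypothetical, raw unique read counts are used as the "unique read estimates" (no length normalization), and the even split for genes without unique reads is an assumption.

```python
from collections import defaultdict

def rescue_counts(reads):
    """Allocate reads to genes; `reads` is a list of lists of gene names.

    A read that maps to one gene counts fully for it. A multiread is
    split among its genes in proportion to their unique read counts.
    """
    unique = defaultdict(float)
    for genes in reads:
        if len(genes) == 1:
            unique[genes[0]] += 1.0

    counts = defaultdict(float, unique)
    for genes in reads:
        if len(genes) > 1:
            total = sum(unique[g] for g in genes)
            for g in genes:
                # Assumption: split evenly when no gene has unique reads.
                share = unique[g] / total if total > 0 else 1.0 / len(genes)
                counts[g] += share
    return dict(counts)

# Three unique reads for g1, one for g2, and one multiread shared by both.
example = rescue_counts([["g1"], ["g1"], ["g1"], ["g2"], ["g1", "g2"]])
```

In this toy input the multiread is split 3:1, following the unique counts of g1 and g2.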

Page 6: Read Ambiguity in IE

[Figure: exon diagrams labeled A B C D E and A C]

Page 7: Previous approaches to IE

  • [Jiang&Wong 09]
    ◦ Poisson model, single reads only
  • [Li et al. 10]
    ◦ EM Algorithm, single reads only
  • [Feng et al. 10]
    ◦ Convex quadratic program, pairs used only for ID
  • [Trapnell et al. 10]
    ◦ Extends Jiang’s model to paired reads
    ◦ Fragment length distribution

Page 8: Our contributions

  • EM Algorithm for IE
    ◦ Single and paired reads
    ◦ Fragment length distribution
    ◦ Strand information
    ◦ Base quality scores
  • Solving GE by adding isoform levels

Page 9: Outline

  • Introduction
  • EM Algorithm
  • Results
  • Conclusions and future work

Page 10: Read-Isoform Compatibility

Page 11: Fragment length distribution

  • Paired reads
  • Single reads

[Figure: exon diagrams for isoforms A B C and A C, with fragment length distribution plots]
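One way to read the paired-read case: the same read pair implies a different fragment length on isoform A B C than on A C, so the fragment length distribution can weight the two isoforms differently. The sketch below illustrates that idea only; the exon coordinates, the function names, and the use of a normal density with mean 250 and standard deviation 25 (the values from the experimental setup) are assumptions, not the talk's exact weighting.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def spliced_length(start, end, exons):
    """Length of genomic span [start, end) once introns are removed.

    `exons` is a sorted list of non-overlapping half-open intervals.
    Returns None when either fragment end falls outside the exons,
    i.e. the fragment is incompatible with this isoform.
    """
    def inside(pos):
        return any(s <= pos < e for s, e in exons)

    if not (inside(start) and inside(end - 1)):
        return None
    return sum(max(0, min(e, end) - max(s, start)) for s, e in exons)

def pair_weight(start, end, exons, mean=250.0, sd=25.0):
    """Weight of a read pair on an isoform from its implied fragment length."""
    length = spliced_length(start, end, exons)
    return 0.0 if length is None else normal_pdf(length, mean, sd)

# Hypothetical genomic coordinates for exons A, B and C.
A, B, C = (0, 100), (200, 300), (400, 500)
w_abc = pair_weight(50, 450, [A, B, C])  # implied fragment length 200
w_ac = pair_weight(50, 450, [A, C])      # implied fragment length 100
```

With these made-up coordinates the pair is far more plausible under A B C, because a 200-base fragment is much closer to the assumed mean than a 100-base one.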

Page 12: IsoEM algorithm

  • E-step
  • M-step
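The slide's formulas did not survive in this transcript, so the following is only a generic EM sketch for isoform frequencies under stated assumptions, not the IsoEM code. It assumes each read comes with a precomputed weight per compatible isoform, that the E-step splits each read among isoforms in proportion to frequency times weight, and that the M-step normalizes expected read counts by a caller-supplied effective length. The names `isoform_em` and `eff_len` are hypothetical.

```python
def isoform_em(reads, eff_len, iters=200):
    """Estimate isoform frequencies by expectation-maximization.

    reads:   list of dicts mapping isoform name -> compatibility weight
    eff_len: dict mapping isoform name -> effective length (assumed given)
    """
    isoforms = list(eff_len)
    freq = {j: 1.0 / len(isoforms) for j in isoforms}
    for _ in range(iters):
        # E-step: expected number of reads drawn from each isoform.
        expected = {j: 0.0 for j in isoforms}
        for compat in reads:
            denom = sum(freq[j] * w for j, w in compat.items())
            if denom == 0.0:
                continue  # read cannot be explained by current estimates
            for j, w in compat.items():
                expected[j] += freq[j] * w / denom
        # M-step: length-normalized counts, rescaled to sum to 1.
        density = {j: expected[j] / eff_len[j] for j in isoforms}
        total = sum(density.values())
        if total == 0.0:
            break
        freq = {j: d / total for j, d in density.items()}
    return freq

# Toy gene with isoforms ABC and AC of equal effective length:
# 30 reads unique to ABC, 10 unique to AC, 20 compatible with both.
toy_reads = [{"ABC": 1.0}] * 30 + [{"AC": 1.0}] * 10 + [{"ABC": 1.0, "AC": 1.0}] * 20
toy_freq = isoform_em(toy_reads, {"ABC": 100.0, "AC": 100.0})
```

Under this sketch a gene-level estimate follows by summing the frequencies of the gene's isoforms, which is how "solving GE by adding isoform levels" can be read.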

Page 13: Outline

  • Introduction
  • EM Algorithm
  • Results
  • Conclusions and future work

Page 14: Experimental setup

  • Human genome UCSC known isoforms
  • GNFAtlas2 gene expression levels
    ◦ Uniform/geometric expression of gene isoforms
  • Normally distributed fragment lengths
    ◦ Mean 250, std. dev. 25

[Figure: number of genes (log scale, 1 to 100000) vs. number of isoforms (0 to 55)]

[Figure: number of isoforms (0 to 25000) vs. isoform length]
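The simulation ingredients named in the setup can be sketched as below. This is an illustration under assumptions: the geometric ratio of 0.5, the fixed seed, and the function names are hypothetical and are not stated in the slides; only the normal fragment length distribution with mean 250 and standard deviation 25 comes from the source.

```python
import random

def isoform_shares(k, model="uniform", ratio=0.5):
    """Split a gene's expression among its k isoforms.

    'uniform' gives equal shares; 'geometric' gives shares proportional
    to ratio**i (the ratio value is an assumption for illustration).
    """
    if model == "uniform":
        weights = [1.0] * k
    elif model == "geometric":
        weights = [ratio ** i for i in range(k)]
    else:
        raise ValueError(f"unknown model: {model}")
    total = sum(weights)
    return [w / total for w in weights]

def sample_fragment_lengths(n, mean=250.0, sd=25.0, seed=0):
    """Draw n fragment lengths from a normal distribution, rounded to bases."""
    rng = random.Random(seed)
    return [max(1, round(rng.gauss(mean, sd))) for _ in range(n)]

shares = isoform_shares(3, model="geometric")
lengths = sample_fragment_lengths(10000)
```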

Page 15: Accuracy measurements

  • Error Fraction (EF)
    ◦ Percentage of isoforms (or genes) with relative error larger than given threshold t
  • Median Percent Error (MPE)
    ◦ Threshold t for which EF is 50%
  • r²
    ◦ Coefficient of determination
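The three measures can be computed along these lines. This is a minimal sketch with some assumptions labeled in the code: relative error is taken as |estimate - truth| / truth, a zero true value gives error 0 if the estimate is also 0 and infinity otherwise, and r² is computed as the squared Pearson correlation. Function names are hypothetical.

```python
import math
import statistics

def relative_errors(true, est):
    """Relative error per isoform (or gene)."""
    errs = []
    for t, e in zip(true, est):
        if t == 0:
            # Assumption: zero truth gives 0 if estimate is 0, else infinity.
            errs.append(0.0 if e == 0 else math.inf)
        else:
            errs.append(abs(e - t) / t)
    return errs

def error_fraction(true, est, threshold):
    """Percentage of items whose relative error exceeds the threshold."""
    errs = relative_errors(true, est)
    return 100.0 * sum(err > threshold for err in errs) / len(errs)

def median_percent_error(true, est):
    """Threshold (in percent) at which the error fraction is 50%."""
    return 100.0 * statistics.median(relative_errors(true, est))

def r_squared(true, est):
    """Assumption: r2 taken as the squared Pearson correlation."""
    n = len(true)
    mean_t, mean_e = sum(true) / n, sum(est) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true, est))
    var_t = sum((t - mean_t) ** 2 for t in true)
    var_e = sum((e - mean_e) ** 2 for e in est)
    return cov * cov / (var_t * var_e)

true_freq = [10.0, 20.0, 30.0, 40.0]
est_freq = [11.0, 20.0, 36.0, 20.0]
ef = error_fraction(true_freq, est_freq, 0.15)
mpe = median_percent_error(true_freq, est_freq)
```

For the toy values the relative errors are 0.1, 0, 0.2 and 0.5, so half of the items exceed a threshold of 0.15.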

Page 16: Isoform Error Fraction Curves

  • 30M single reads of length 25
  • Main difference between IsoEM and RSEM is fragment length modeling

[Figure: % of isoforms over threshold (0 to 100) vs. relative error threshold (0 to 1) for Uniq, Rescue, UniqLN, RSEM, and IsoEM]

Page 17: Gene Error Fraction Curves

  • 30M single reads of length 25

[Figure: % of genes over threshold (0 to 100) vs. relative error threshold (0 to 1) for Uniq, Rescue, GeneEM, RSEM, and IsoEM]

Page 18: Read Length Effect

  • Fixed sequencing throughput (750Mb)
  • 50bp reads better than 100bp!

[Figure: Median Percent Error (0 to 25) vs. read length (25 to 95) for paired reads and single reads]

[Figure: r² (0.962 to 0.978) vs. read length (25 to 95) for paired reads and single reads]
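At fixed throughput, longer reads mean fewer reads. The arithmetic below assumes 750Mb means 750,000,000 sequenced bases and counts single reads only; the function name is hypothetical.

```python
THROUGHPUT_BP = 750_000_000  # assumption: 750Mb read as 750 million bases

def reads_at_fixed_throughput(read_length, throughput_bp=THROUGHPUT_BP):
    """Number of single reads of a given length that fit in the throughput."""
    return throughput_bp // read_length

read_counts = {length: reads_at_fixed_throughput(length) for length in (25, 50, 100)}
```

Under this reading, 25bp reads give 30,000,000 reads, which matches the 30M single reads of length 25 used for the error fraction curves, while 100bp reads give only 7,500,000.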

Page 19: Effect of Pairs & Strand Information

  • 1-60M 75bp reads
  • Pairs help, strand info doesn’t
  • [Trapnell et al. 10] r²=.95 for 13M PE reads

[Figure: r² (0.925 to 0.985) vs. # reads (0 to 60,000,000) for RandomStrand-Pairs-PerfectMapping, RandomStrand-Pairs, CodingStrand-pairs, RandomStrand-Single, and CodingStrand-single]

Page 20: Outline

  • Introduction
  • EM Algorithm
  • Results
  • Conclusions and future work

Page 21: Conclusions & Future Work

  • Presented EM algorithm for isoform frequency estimation that exploits fragment length distribution for both single and paired reads
    ◦ Significant accuracy improvement over existing methods
    ◦ Code and datasets to be released publicly soon
  • Ongoing extensions
    ◦ Confidence intervals
    ◦ Allele-specific isoform expression
    ◦ Testing for novel isoforms
    ◦ Integration with isoform discovery

Page 22: Questions?