30 Sept. 2010


Page 1

30 Sept. 2010

Genome Sciences Centre, BC Cancer Agency, Vancouver, BC, Canada

Malachi Griffith

ALEXA-Seq analysis reveals breast cell type specific mRNA isoforms

www.AlexaPlatform.org

Page 2

In most genes, transcript diversity is generated by alternative expression

[Figure panels: types of alternative expression; gene expression]

Page 3

Transcript variation is important to the study of human disease

• Alternative expression generates multiple distinct transcript variants from most human loci

• Specific transcript variants may represent useful therapeutic targets or diagnostic markers

(Venables, 2006)

Page 4

Massively parallel RNA sequencing

• Tissues/cell lines: Luminal, Myoepithelial, hESCs, vHMECs
• Isolate RNAs
• Generate cDNA, fragment, size select, add linkers
• Sequence ends: 263 million paired reads, 21 billion bases of sequence
• Map to genome, transcriptome, and predicted exon junctions
• Discover isoforms and measure abundance
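The two sequencing totals on this slide imply an average read length. A back-of-the-envelope check, assuming every pair contributes two reads of equal length (the slide itself does not state the read length for these libraries):

```python
# Back-of-the-envelope check of the sequencing totals on this slide.
# Assumption: each read pair contributes two reads of equal length.
PAIRED_READS = 263e6  # 263 million read pairs (from the slide)
TOTAL_BASES = 21e9    # 21 billion bases of sequence (from the slide)


def mean_read_length(pairs: float, bases: float) -> float:
    """Average number of bases per individual read."""
    reads = 2 * pairs  # both ends of each cDNA fragment are sequenced
    return bases / reads


length = mean_read_length(PAIRED_READS, TOTAL_BASES)
print(f"Implied mean read length: {length:.1f} bases")  # 39.9
```

Roughly 40 bases per read falls inside the 36-mer to 75-mer range reported for the full set of libraries analyzed, so the two totals are mutually consistent.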

Page 5

Pipeline overview

Page 6

What is an ALEXA-Seq sequence ‘feature’?

Summary of features for human: ~4 million total (14% ‘known’)

• 37k genes
• 62k transcripts
• 278k exons
• 2,210k exon junctions
• 407k alternative exon boundaries
• 560k intron regions
• 227k intergenic regions
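Summing the itemized counts is a quick check on the headline figure. The total of about 3.8 million agrees with the rounded "~4 million" on this slide and with the 3.8 million features per library listed in the output summary. The 14% ‘known’ fraction cannot be reproduced from these numbers alone, because the slide does not split each category into known and novel features.

```python
# ALEXA-Seq human feature counts from this slide, in thousands.
FEATURE_COUNTS_K = {
    "genes": 37,
    "transcripts": 62,
    "exons": 278,
    "exon junctions": 2210,
    "alternative exon boundaries": 407,
    "intron regions": 560,
    "intergenic regions": 227,
}

total_k = sum(FEATURE_COUNTS_K.values())
print(f"Total: {total_k}k features (~{total_k / 1000:.1f} million)")  # 3781k, ~3.8 million

# Share of each feature type, largest first.
for name, count in sorted(FEATURE_COUNTS_K.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:>28}: {count:>5}k ({100 * count / total_k:.1f}%)")
```

Exon junctions alone account for more than half of all features, which is why junction-level mapping dominates the feature space.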

Page 7

Data analyzed to date

• ALEXA-Seq processing: 19 projects (REMC + 18 others)
• 105 libraries (200+ lanes)
• 3.9 billion paired-end reads
• Read lengths from 36 to 75 bases (36-mers to 75-mers)

Page 8

Output

• Expression, differential expression and alternative expression values for 3.8 million features for each library processed
• Library quality analysis
• Number of features expressed (above background)
  – Genes, transcripts, exon regions, junctions, etc.
• Differential gene expression
  – Ranked lists
• Alternative expression
  – Ranked lists
  – Alternative isoforms involving exon skipping, alternative transcript initiation sites, etc.
  – Known or predicted novel isoforms
• Candidate peptides
  – Ranked lists
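Alternative expression has to be separated from a plain change in overall gene expression. One common way to do that, assumed here for illustration and not necessarily the exact statistic ALEXA-Seq reports, is a splicing-index style score: compare each feature's share of its gene's expression between two libraries. Expression values are taken to be something like average read coverage; the pseudocount is a hypothetical choice.

```python
import math


def splicing_index(feature_a, gene_a, feature_b, gene_b, pseudocount=1.0):
    """log2 ratio of feature-to-gene expression in library A versus library B.

    A value near 0 means the feature changes in proportion to its gene.
    A large positive or negative value suggests the feature (an exon,
    junction, etc.) is alternatively expressed rather than simply
    differentially expressed. The pseudocount avoids division by zero.
    """
    ratio_a = (feature_a + pseudocount) / (gene_a + pseudocount)
    ratio_b = (feature_b + pseudocount) / (gene_b + pseudocount)
    return math.log2(ratio_a / ratio_b)


# Gene doubles and the exon doubles with it: no alternative expression.
print(splicing_index(99, 199, 49, 99))  # 0.0
# Gene doubles, but the exon's share of gene expression drops four-fold.
print(splicing_index(24, 199, 49, 99))  # -2.0
```

Normalizing by the gene first is what lets a ranked AE list surface exon-skipping or alternative-initiation events even when the gene as a whole is not differentially expressed.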

Page 9

ALEXA-Seq data browser (using REMC analysis as an example)

• Goals
  – Visualization, interpretation, design of validation experiments, and distribution of results to internal/external collaborators

• What kinds of questions does ALEXA-Seq allow us to ask/answer?

• http://www.alexaplatform.org/alexa_seq/Breast/Summary.htm

Page 10

Is the RNA-Seq library suitable for alternative expression analysis?

• Library summary
• Read quality
• Tag redundancy
• End bias
• Mapping rates
• Signal-to-noise
• hnRNA & gDNA contamination
• Features detected
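Several of these checks (signal-to-noise, features detected) depend on an estimate of background signal. A minimal sketch, assuming background is taken as the 95th percentile of coverage across intergenic regions, which are expected to be mostly unexpressed; both the percentile and the use of intergenic regions alone are hypothetical choices here, and ALEXA-Seq's actual cutoff rule may differ. The function and feature names are illustrative.

```python
import statistics


def background_cutoff(intergenic_coverage, percentile=95):
    """Coverage at the given percentile of intergenic-region coverage values."""
    # With n=100, quantiles() returns 99 cut points; cut point k (1-based)
    # is the k-th percentile of the data.
    cut_points = statistics.quantiles(intergenic_coverage, n=100, method="inclusive")
    return cut_points[percentile - 1]


def count_detected(feature_coverage, cutoff):
    """Number of features whose coverage is above the background cutoff."""
    return sum(1 for value in feature_coverage.values() if value > cutoff)


# Toy values (hypothetical): intergenic coverage 0..100 and three features.
intergenic = list(range(101))
cutoff = background_cutoff(intergenic)  # 95.0
features = {"exon_1": 120.0, "exon_2": 50.0, "junction_1_2": 96.0}
print(count_detected(features, cutoff))  # 2
```

A library with heavy gDNA contamination raises intergenic coverage, which raises the cutoff and lowers the number of features detected, so the two metrics tend to move together.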

Page 11

Is my favorite gene expressed? Is it alternatively expressed?

Page 12

What are the most highly expressed genes, exons, etc. in each library?

• Expression
• Differential expression
• Alternative expression
• Provided for each feature type (gene, exon, junction, etc.)
• Ranked lists of events

Page 13

e.g. most highly expressed genes

Page 14

What are the top differentially expressed (DE) and alternatively expressed (AE) genes for each tissue comparison?

• Candidate genes
• Each comparison
• DE or AE events
• Gains or losses
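Ranking DE events usually comes down to a fold change between two libraries, split into gains and losses. A minimal sketch, assuming a log2 ratio with a pseudocount of 1 (a common convention, chosen here for illustration; ALEXA-Seq's own DE statistic may differ). On this scale, the 422-fold CD10 difference between myoepithelial and luminal cells corresponds to a log2 fold change of about 8.7.

```python
import math


def log2_fold_change(expr_a: float, expr_b: float, pseudocount: float = 1.0) -> float:
    """log2((A + c) / (B + c)): positive = gain in A, negative = loss in A."""
    return math.log2((expr_a + pseudocount) / (expr_b + pseudocount))


def rank_changes(expr_a: dict, expr_b: dict, top: int = 10):
    """Split genes present in both libraries into ranked gains and losses (A vs B)."""
    shared = expr_a.keys() & expr_b.keys()
    scored = [(gene, log2_fold_change(expr_a[gene], expr_b[gene])) for gene in shared]
    gains = sorted((s for s in scored if s[1] > 0), key=lambda s: s[1], reverse=True)
    losses = sorted((s for s in scored if s[1] < 0), key=lambda s: s[1])
    return gains[:top], losses[:top]


# Hypothetical expression values for two libraries (not data from this study).
lib_a = {"GENE_1": 7.0, "GENE_2": 0.0, "GENE_3": 5.0}
lib_b = {"GENE_1": 0.0, "GENE_2": 3.0, "GENE_3": 5.0}
gains, losses = rank_changes(lib_a, lib_b)
print(gains)   # [('GENE_1', 3.0)]
print(losses)  # [('GENE_2', -2.0)]
```

The pseudocount keeps genes that are silent in one library (such as a gene gained only in vHMECs) from producing an infinite ratio, at the cost of damping fold changes for very lowly expressed genes.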

Page 15

Summary page for vHMECs vs. Luminal

Page 16

Candidate features gained in vHMECs

CD10

vHMECs vs. Luminal

Page 17

Which exons/junctions and corresponding peptides might be suitable for antibody design?

Page 18

Candidate peptides gained in vHMECs

vHMECs vs. Luminal
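Peptides that span an exon-exon junction are, by construction, specific to isoforms containing that junction, which is one reason junctions are attractive targets for antibody design. A minimal sketch of building such a peptide, assuming the exon sequences are given 5' to 3' on the coding strand, the reading frame of the upstream exon is known, and the flank length (a hypothetical parameter) is counted in codons. It uses the standard genetic code and is not the ALEXA-Seq implementation.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with each position cycling through TCAG.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string; an incomplete trailing codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def junction_peptide(upstream_exon, downstream_exon, frame=0, flank_codons=5):
    """Peptide spanning the junction between two exons (hypothetical helper).

    `frame` is the offset (0, 1 or 2) of the first complete codon in
    `upstream_exon`. The result covers the codon containing the junction
    (or the first downstream codon when the junction falls between codons)
    plus up to `flank_codons` codons on either side.
    """
    joined = upstream_exon[frame:] + downstream_exon
    junction_pos = len(upstream_exon) - frame  # first downstream base in `joined`
    junction_codon = junction_pos // 3
    start = max(0, junction_codon - flank_codons) * 3
    end = (junction_codon + flank_codons + 1) * 3
    return translate(joined[start:end])


print(translate("ATGGCCTTTAAA"))                              # MAFK
print(junction_peptide("ATGGCC", "TTTAAA", flank_codons=1))   # AFK
print(junction_peptide("ATGGC", "CTTTAA", flank_codons=1))    # MAF
```

In the last call the junction splits the codon GC|C, so the resulting peptide cannot be produced by either exon alone; a real pipeline would also check the candidate against other isoforms of the gene.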

Page 19

Example housekeeping gene (Actin; no change)

Page 20

CD10 (used to sort myoepithelial cells)

[Figure panels: Myoepithelial & vHMECs; Luminal]

422-fold higher in myoepithelial than in luminal cells

Page 21

CD227 (used to sort luminal epithelial cells)

[Figure panels: Myoepithelial; Luminal; CD227 labels]

Page 22

Differential gene expression of CASP14 (Caspase 14 gained in vHMECs)

Page 23

Novel skipping of PTEN exon 6

Page 24

Exon 12 skipping of DDX5 (p68)

Page 25

Tissue-specific isoforms of CA12

[Figure panels: Luminal; Myoepithelial; vHMECs]

Page 26

Alternative first exons of INPP4B

Page 27

Alternative first exons of SERPINB7

Page 28

FERM domain-containing proteins are alternatively expressed*

* FRMD6, FRMD4A and FRMD4B are AE; FRMD3 and FRMD8 are DE

Page 29

Novel isoforms observed only in vHMECs

[Figure labels: E6-E10; E7-E10]
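Labels such as E6-E10 presumably denote a junction joining exon 6 directly to exon 10, which implies that exons 7 to 9 are skipped. If every forward pairing of a gene's exons is treated as a candidate junction (an assumption here about how a comprehensive set like the 2,210k exon junctions in the feature summary could be built; alternative exon boundaries would add more), a gene with n exons yields n(n-1)/2 candidates. A minimal sketch with hypothetical helper names:

```python
from itertools import combinations


def candidate_junctions(n_exons):
    """All forward exon-exon junctions (i, j), i < j, for exons numbered 1..n."""
    return list(combinations(range(1, n_exons + 1), 2))


def skipped_exons(junction):
    """Exons skipped by a junction: those strictly between its two exons."""
    first, last = junction
    return list(range(first + 1, last))


def label(junction):
    """Format a junction in the E<i>-E<j> style used on the slide."""
    return f"E{junction[0]}-E{junction[1]}"


junctions = candidate_junctions(12)               # a hypothetical 12-exon gene
print(len(junctions))                             # 66 = 12 * 11 / 2
print(label((6, 10)), skipped_exons((6, 10)))     # E6-E10 [7, 8, 9]
print(label((7, 10)), skipped_exons((7, 10)))     # E7-E10 [8, 9]
```

Only junctions with adjacent exons (j = i + 1) can occur in an isoform with no skipping; every other candidate is an exon-skipping event, so most of the enumerated junctions will be novel and unobserved in any given library.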

Page 30

How reliable are predictions from ALEXA-Seq?

• Are novel junctions real?
  – What proportion validate by RT-PCR and Sanger sequencing?
• Are differential/alternative expression changes observed between tissues accurate?
  – How well do DE values correlate with qPCR?
• To answer these questions, we performed ~400 validations of ALEXA-Seq predictions from a comparison of two cell lines…
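The second question is typically answered by comparing fold changes from the two methods on a log scale. A minimal sketch, assuming Pearson correlation of log2 fold changes plus a simple same-direction check; both the choice of metrics and the toy values are illustrative assumptions, not the procedure or results of this study.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def direction_agreement(xs, ys):
    """Fraction of features whose fold change has the same sign in both methods.

    Zero is treated as non-positive.
    """
    agree = sum(1 for x, y in zip(xs, ys) if (x > 0) == (y > 0))
    return agree / len(xs)


seq_log2fc = [2.0, -1.0, 0.5, 3.0]   # hypothetical ALEXA-Seq log2 fold changes
qpcr_log2fc = [1.8, -0.7, 0.4, 2.5]  # hypothetical qPCR log2 fold changes
print(f"r = {pearson(seq_log2fc, qpcr_log2fc):.3f}")
print(f"direction agreement = {direction_agreement(seq_log2fc, qpcr_log2fc):.0%}")
```

Working in log2 space keeps large gains and large losses on a symmetric scale, so a few very large fold changes do not dominate the correlation as they would on a linear scale.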

Page 31

Validation (qualitative)

33 of 189 assays shown. Overall validation rate = 85%

Page 32

Validation (quantitative)

qPCR of 192 exons identified as alternatively expressed by ALEXA-Seq

Validation rate = 88%

Page 33

Conclusions

• The ALEXA-Seq approach provides a comprehensive, global transcriptome profile
  – Input: paired-end RNA sequence data
  – Output: expression, differential expression, alternative expression, candidate peptides, etc.
• Detection of both known and novel isoforms
  – Including the subset that differs between conditions
• Predictions are highly accurate
  – 86% validation rate by RT-PCR, qPCR and Sanger sequencing

• www.AlexaPlatform.org
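The overall 86% figure is consistent with pooling the two validation sets (189 qualitative assays at 85%, 192 qPCR assays at 88%), weighting each rate by its number of assays. Because the per-set rates on the slides are rounded, this check only recovers the overall figure approximately; the exact per-assay counts are not given in the deck.

```python
# (assays, validation rate) for each validation set, as reported on the slides.
VALIDATION_SETS = {
    "qualitative": (189, 0.85),
    "quantitative (qPCR)": (192, 0.88),
}


def pooled_rate(sets):
    """Assay-weighted average of per-set validation rates."""
    total_assays = sum(n for n, _ in sets.values())
    validated = sum(n * rate for n, rate in sets.values())
    return validated / total_assays


rate = pooled_rate(VALIDATION_SETS)
total = sum(n for n, _ in VALIDATION_SETS.values())
print(f"{total} assays, pooled validation rate ~ {rate:.1%}")  # 381 assays, ~86.5%
```

The 381 assays also match the "~400 validations" described earlier, and a pooled rate of about 86.5% is in line with the reported 86% once rounding of the input rates is taken into account.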

Page 34

Acknowledgements

Supervisor: Marco Marra

Committee: Joseph Connors, Stephane Flibotte, Steve Jones, Gregg Morin

Bioinformatics: Obi Griffith, Ryan Morin, Rodrigo Goya, Allen Delaney, Gordon Robertson, Richard Corbett

Sequencing: Martin Hirst, Thomas Zeng, Yongjun Zhao, Helen McDonald

Laboratory: Trevor Pugh, Tesa Severson

5-FU resistance: Michelle Tang, Isabella Tai, Marco Marra

Multiple myeloma: Rodrigo Goya, Marco Marra

Neuroblastoma: Olena Morozova, Marco Marra

Morgen: Pamela Hoodless, Jacquie Schein, Inanc Birol, Gordon Robertson, Shaun Jackman

Iressa and Sutent: Obi Griffith, Steven Jones

Lymphoma: Ryan Morin, Marco Marra

Griffith M, Griffith OL, Morin RD, Tang MJ, Pugh TJ, Ally A, Asano JK, Chan SY, Li I, McDonald H, Teague K, Zhao Y, Zeng T, Delaney AD, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA. Alternative expression analysis by RNA sequencing. In review (Nature Methods).
