Whole Transcriptome Analysis of Testicular Germ Cell Tumors




Figure 4. Partek Flow® Whole Transcriptome pipeline. This pipeline performs a complete analysis of whole transcriptome RNA-Seq data, from unaligned reads to a list of differentially expressed genes. Alignment follows the two-step alignment method for Ion Proton™ transcriptome data. The aligned reads from TopHat2 and Bowtie2 are combined into one data node, quantified against a known transcriptome, and analyzed for differential gene expression using Partek® gene-specific analysis.
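The two-step alignment can be sketched as below. This is an illustration under assumptions, not Partek Flow®'s implementation: it assumes the second aligner (Bowtie2 in the caption) realigns only the reads that the first aligner (TopHat2) left unmapped, and the toy aligners `exact_long` and `exact_any` are hypothetical stand-ins that return a genome offset or None.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical aligner interface: read sequence -> genome offset, or None if unmapped.
Aligner = Callable[[str], Optional[int]]


def two_step_align(reads: Dict[str, str], first: Aligner, second: Aligner) -> Dict[str, int]:
    """Align all reads with `first`, realign only its unmapped reads with
    `second`, and merge both sets of hits into one result (one 'data node')."""
    combined: Dict[str, int] = {}
    unmapped: List[str] = []
    for read_id, seq in reads.items():
        pos = first(seq)
        if pos is None:
            unmapped.append(read_id)
        else:
            combined[read_id] = pos
    # Step 2 sees only the leftovers, so no read is counted twice.
    for read_id in unmapped:
        pos = second(reads[read_id])
        if pos is not None:
            combined[read_id] = pos
    return combined


# Toy example; both "aligners" below are hypothetical exact-match stand-ins.
GENOME = "ACGTTGCAAGGCTTAACCGT"


def exact_long(seq: str) -> Optional[int]:
    """Step-1 stand-in: accepts exact matches of at least 6 bases only."""
    i = GENOME.find(seq)
    return i if i >= 0 and len(seq) >= 6 else None


def exact_any(seq: str) -> Optional[int]:
    """Step-2 stand-in: also accepts shorter exact matches."""
    i = GENOME.find(seq)
    return i if i >= 0 else None


reads = {"r1": "TGCAAGG", "r2": "AACC", "r3": "GGGGGG"}
print(two_step_align(reads, exact_long, exact_any))  # {'r1': 4, 'r2': 14}
```

Read r2 is too short for the first stand-in and is recovered only in the second step; r3 matches nowhere and stays unaligned.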

Lovorka Degoricija1, Kathy Y. Lee1, Sunali Patel1, Shirley Chu1, Ad J. M. Gillis2, Martin Rijlaarsdam2, Lambert C. J. Dorssers2 and Leendert Looijenga2. 1Thermo Fisher Scientific, 180 Oyster Point Blvd, South San Francisco, CA, 94080, USA and 2Erasmus Medical Center—University Medical Center Rotterdam, P.O. Box 2040, 3000 CA Rotterdam, The Netherlands

RESULTS

Figure 3. The Ion Torrent Proton™ System. Sequencing is performed on the Ion Torrent Proton™ System (A), which uses semiconductor technology. The sequencing chip (B) consists of millions of individual wells, each with its own sensor (C). A clonally templated Ion Sphere is placed into each well (D). Nucleotides flow sequentially over the chip, and each nucleotide incorporation releases H+ ions, which the sensor detects as a change in pH.
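To make the detection principle concrete, here is a simplified base-calling sketch. Because all bases of a homopolymer are incorporated during the same nucleotide flow, the pH signal scales with homopolymer length. The sketch assumes an idealized, noise-free flowgram and a cyclic flow order of T, A, C, G; both are illustrative assumptions, and real Ion Torrent flow orders and signal processing are more involved.

```python
from typing import List

FLOW_ORDER = "TACG"  # assumed cyclic nucleotide flow order (illustrative only)


def call_bases(flow_signals: List[float], flow_order: str = FLOW_ORDER) -> str:
    """Convert an idealized flowgram into a read.

    Each signal is proportional to the number of nucleotides incorporated in
    that flow (0 = no incorporation, 2 = a 2-base homopolymer, and so on)."""
    bases = []
    for i, signal in enumerate(flow_signals):
        nucleotide = flow_order[i % len(flow_order)]
        count = max(0, round(signal))  # nearest whole number of incorporations
        bases.append(nucleotide * count)
    return "".join(bases)


# Flows:    T     A     C    G    T    A     C     G
signals = [1.02, 0.05, 2.1, 0.9, 0.0, 0.97, 0.03, 1.1]
print(call_bases(signals))  # TCCGAG
```

Rounding to the nearest integer is the simplest possible caller; it also shows why long homopolymers are harder to call, since the difference between, say, 5 and 6 incorporations is a small relative change in signal.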

Figure 8. Validation using TaqMan® Assays. A subset of the genes selected for screening was validated using TaqMan® Gene Expression Assays (1 ng per qPCR reaction).
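The poster does not state how fold changes were derived from the TaqMan® data. A common choice is the comparative Ct (2^-ΔΔCt) method, which assumes close to 100% amplification efficiency for both the target and the reference assay. A minimal sketch under that assumption, with hypothetical Ct values:

```python
from statistics import mean
from typing import Sequence


def fold_change_ddct(target_test: Sequence[float], ref_test: Sequence[float],
                     target_ctrl: Sequence[float], ref_ctrl: Sequence[float]) -> float:
    """Relative expression (test vs control) by the comparative Ct method.

    dCt  = Ct(target) - Ct(reference gene), averaged per group
    ddCt = dCt(test) - dCt(control)
    fold change = 2 ** (-ddCt), assuming ~100% PCR efficiency."""
    d_ct_test = mean(target_test) - mean(ref_test)
    d_ct_ctrl = mean(target_ctrl) - mean(ref_ctrl)
    dd_ct = d_ct_test - d_ct_ctrl
    return 2 ** (-dd_ct)


# Hypothetical triplicate Ct values: tumor (test) vs normal testis (control).
fc = fold_change_ddct(target_test=[24.0, 24.5, 23.5], ref_test=[20.0, 20.0, 20.0],
                      target_ctrl=[27.0, 27.5, 26.5], ref_ctrl=[20.0, 20.0, 20.0])
print(fc)  # 8.0
```

A lower Ct means more template, so a target that crosses threshold three cycles earlier in tumor (after normalization) comes out as an 8-fold increase.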

ABSTRACT Next-generation sequencing of the whole transcriptome enables high-resolution measurement of gene expression activity in different tissue and cell types. This methodology provides an in-depth study of known transcripts and, depending on the data analysis, allows identification of additional transcript types such as transcript variants, fusion transcripts, and small and long ncRNAs. In this study we performed RNA-Seq on the Ion Torrent™ sequencing platform to compare the expression profiles of testicular germ cell cancers (seminoma type, n=3) and normal testis (n=3). Using Partek Flow® 3.0 with the TopHat/Bowtie or Star aligners, we aligned the reads to the human genome and mapped the sequences to the RefSeq database. Differentially expressed genes were identified and screened in additional germ cell tumors. Principal component analysis (PCA) showed clear separation of the two sample types, indicating biological differences. The lists of differentially expressed genes generated with TopHat/Bowtie and with Star were similar. We identified a large number of up- and down-regulated genes with a high degree of significance (p < 0.01, fold change > 2). These included genes related to testicular tissue type, stem cell pluripotency (NANOG, POU5F1), and proliferation (KRAS, CCND2). In addition, a number of differentially expressed noncoding RNAs were identified (SNORD12B, XIST). The results were validated on a small set of genes (n=20) by qPCR (TaqMan® Assays), and the two methods were found to be correlated. We used the OpenArray® platform to quickly and quantitatively screen 102 differentially expressed genes and 10 endogenous control genes across a number of different testicular germ cell cancer types. We used a complete workflow solution, from sample prep to NGS to qPCR, to compare the expression profiles of normal testis and seminoma-type germ cell tumors. From the NGS experiments we identified a large number of differentially expressed genes for qPCR screening with samples from different types of germ cell tumors. Results from these screening studies will be presented.

CONCLUSIONS

1. We successfully used the Ion Proton™ System with Ambion® RNA library preparation kits for whole transcriptome sequencing of normal and seminoma-type cancer samples.
2. Partek Flow® 3.0 with the TopHat2/Bowtie2 and Star aligners was used to identify a large number of differentially expressed genes.
3. Validation with TaqMan® Assays showed directional concordance for 95% (19/20) of the genes tested.
4. From the NGS data we identified 112 genes of interest with p < 0.05 and a greater than 2-fold difference in expression for screening on the OpenArray® Platform of the QuantStudio™ 12K Flex Real-Time qPCR System.
5. Data from 24 samples show that the non-seminoma, seminoma, and normal samples cluster separately and can be differentiated.
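Directional concordance (conclusion 3) can be read as the fraction of validated genes whose qPCR fold change points in the same direction (up or down) as the RNA-Seq fold change. A minimal sketch, assuming both platforms' results are expressed as log2 fold changes (tumor vs normal); the gene names and values are hypothetical, not the study's data.

```python
from typing import Dict


def directional_concordance(ngs_log2fc: Dict[str, float], qpcr_log2fc: Dict[str, float]) -> float:
    """Fraction of genes measured on both platforms whose log2 fold changes
    have the same sign (a value of exactly 0 is treated as 'not up')."""
    shared = sorted(set(ngs_log2fc) & set(qpcr_log2fc))
    if not shared:
        raise ValueError("no genes measured on both platforms")
    agree = sum(1 for g in shared if (ngs_log2fc[g] > 0) == (qpcr_log2fc[g] > 0))
    return agree / len(shared)


ngs = {"GENE_A": 3.1, "GENE_B": -2.4, "GENE_C": 1.8, "GENE_D": -1.2}  # hypothetical
qpcr = {"GENE_A": 2.7, "GENE_B": -3.0, "GENE_C": 2.2, "GENE_D": 0.4}  # hypothetical
print(directional_concordance(ngs, qpcr))  # 0.75 (3 of 4 genes agree)
```

With 20 validated genes of which 19 agree in direction, the same calculation gives 19/20 = 0.95, the 95% reported above. Direction alone ignores magnitude, which is why it is a lenient but robust cross-platform check.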

REFERENCES

1. User Guide: Ion Total RNA-Seq Kit v2 (Pub. No. 4476286, Rev. E), Life Technologies.
2. User Guide: ERCC RNA Spike-In Control Mixes (Pub. No. 4455352, Rev. D), Life Technologies.

TRADEMARKS/LICENSING © 2014 Thermo Fisher Scientific Inc. All rights reserved. All trademarks are the property of Thermo Fisher Scientific and its subsidiaries unless otherwise specified. Flow is a registered trademark of Partek Incorporated.


Life Technologies • 5791 Van Allen Way • Carlsbad, CA 92008 • www.lifetechnologies.com

Figure 1 workflow diagram. A. Profiling on NGS Ion Proton™: sample prep → library prep and template prep → NGS sequencing on Ion Proton™ → data analysis using Partek Flow® (map reads to human genome). B. Validation/Screening on OpenArray®/OA Panel: OpenArray® panel; validation and screening data analysis; identify biomarkers.

MATERIALS AND METHODS

Figure 5. Comparison of ERCC spike-in controls for normal and seminoma-type cancer samples. ERCC analysis was performed using Partek Flow®. Samples depleted of ribosomal RNA with the RiboMinus™ Eukaryote System v2 typically yield R² values of 0.8–0.9. An R² value of 0.8 or above indicates excellent measurement accuracy.
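A minimal sketch of how such an R² can be obtained, assuming it comes from a linear fit of log2 observed read counts against log2 expected ERCC spike-in concentration. This is the usual way ERCC dose-response plots are summarized, but the exact computation in Partek Flow® may differ; the pseudocount and all values are illustrative assumptions.

```python
import math
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of an ordinary least-squares line y ~ a + b*x.
    For a simple linear fit this equals the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)


def ercc_r_squared(expected_conc: Sequence[float], observed_counts: Sequence[float],
                   pseudocount: float = 1.0) -> float:
    """R^2 between log2 expected ERCC concentration and log2 observed counts."""
    log_expected = [math.log2(c) for c in expected_conc]
    # The pseudocount avoids log2(0) for spike-ins with no reads.
    log_observed = [math.log2(c + pseudocount) for c in observed_counts]
    return r_squared(log_expected, log_observed)


# Hypothetical spike-ins whose counts track concentration perfectly on a log scale.
print(ercc_r_squared([1, 2, 4, 8, 16, 32], [3, 7, 15, 31, 63, 127]))  # 1.0
```

Under the caption's criterion, a library passes when this value is at least 0.8. Working on a log scale keeps the highly abundant spike-ins from dominating the fit.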

Figure 3 panels: A) Instrument; B) Semiconductor chip; C) Chip cross-section; D) Detection: dNTP incorporation releases H+, producing ∆pH → ∆Q → ∆V at the sensing layer.

Figure 7. Comparison of the number of differentially expressed genes. Venn diagram comparing the differentially expressed genes (p < 0.05, fold change > 2) identified in Partek® Genomics Suite® with the TopHat2/Bowtie2 (A) and Star (B) aligners. The 112 genes that were differentially expressed with both aligners were selected for screening on the OpenArray® System.
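The selection behind Figure 7 can be sketched as a per-aligner filter followed by a set intersection. This assumes each aligner's results are available as (gene, p-value, fold change) records, with fold change stored as a signed ratio (for example, -3.0 for a three-fold decrease in tumor); that convention, the gene names, and the numbers are all illustrative assumptions.

```python
from typing import Iterable, Set, Tuple

# (gene, p-value, signed fold change); -3.0 = three-fold lower in tumor (assumed convention)
Record = Tuple[str, float, float]


def significant_genes(results: Iterable[Record], p_max: float = 0.05, fc_min: float = 2.0) -> Set[str]:
    """Genes with p < p_max and an absolute fold change > fc_min in either direction."""
    return {gene for gene, p, fc in results if p < p_max and abs(fc) > fc_min}


# Hypothetical results for the two aligners.
tophat_bowtie = [("GENE_A", 0.001, 4.2), ("GENE_B", 0.03, -2.5),
                 ("GENE_C", 0.20, 5.0), ("GENE_D", 0.01, 1.5)]
star = [("GENE_A", 0.002, 3.9), ("GENE_B", 0.04, -2.1),
        ("GENE_C", 0.01, 2.6), ("GENE_E", 0.001, 6.0)]

# The overlap of the two Venn circles: genes called by both aligners.
selected = significant_genes(tophat_bowtie) & significant_genes(star)
print(sorted(selected))  # ['GENE_A', 'GENE_B']
```

Requiring agreement between aligners trades some sensitivity (GENE_C and GENE_E are dropped) for candidates that are less likely to be alignment artifacts.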

Figure 6. PCA on normal and tumor samples. Ion Torrent Proton™ data analysis was performed using Partek Flow®. The PCA plot shows the tumor (blue) and normal (red) samples forming two distinct, well-separated clusters.
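A minimal pure-Python PCA sketch, to show how sample scores like those in Figure 6 arise. It assumes a small samples × genes matrix of already log-transformed expression values (hypothetical numbers), mean-centers each gene, and finds the first principal component by power iteration on the sample-by-sample Gram matrix. Partek Flow® uses its own implementation; real data would normally go through a linear-algebra library.

```python
import math
from typing import List

Matrix = List[List[float]]


def pc1_scores(data: Matrix, iterations: int = 500) -> List[float]:
    """Scores of each sample (row) on the first principal component.

    Genes (columns) are mean-centered; the leading eigenvector u of the
    Gram matrix K = X X^T, scaled by sqrt(eigenvalue), gives the PC1 scores."""
    n, g = len(data), len(data[0])
    col_means = [sum(row[j] for row in data) / n for j in range(g)]
    x = [[row[j] - col_means[j] for j in range(g)] for row in data]
    k = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(n)] for i in range(n)]

    # Power iteration. The start vector is deliberately not constant, because the
    # all-ones vector lies in the null space of K for centered data.
    v = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(k[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm == 0.0:
            return [0.0] * n  # no variance in the data
        v = [wi / norm for wi in w]
        eigenvalue = norm  # ||K v|| for unit v converges to the top eigenvalue
    return [math.sqrt(eigenvalue) * vi for vi in v]


# Hypothetical log-expression for 3 genes; normal samples are high in gene 1,
# tumor samples in gene 2.
normal = [[10.0, 2.0, 5.0], [11.0, 1.0, 5.0], [9.0, 2.0, 6.0]]
tumor = [[2.0, 10.0, 5.0], [1.0, 11.0, 6.0], [2.0, 9.0, 5.0]]
scores = pc1_scores(normal + tumor)
print([round(s, 2) for s in scores])
```

The sign of a principal component is arbitrary, so only the separation matters: all normal samples fall on one side of zero on PC1 and all tumor samples on the other.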

Figure 2. (A) Total RNA quality check. Bioanalyzer trace of total RNA after extraction with the mirVana™ miRNA Isolation Kit. The presence of 18S and 28S rRNA peaks (blue oval) allows calculation of RIN values and indicates high-quality total RNA. RIN values ranged from 8 to 9.2. (B) Fragmented RNA after rRNA depletion. Bioanalyzer trace of fragmented RNA after rRNA depletion using the RiboMinus™ Eukaryote System v2.


Figure 1. Complete workflow for the whole transcriptome translational project. The workflow runs from profiling by sequencing on the Ion Torrent Proton™ System (step 1) to screening and validation on the OpenArray® platform (step 2). The Ion Total RNA-Seq Kit v2 was used to prepare the library, starting with 400–500 ng of rRNA-depleted, DNase-treated RNA. Ambion® ERCC Spike-In Controls were used to monitor library quality. (Ref 1)

Figure 9. Distribution of the 112 genes spotted on the QuantStudio™ OpenArray® Panel. The selected genes showed consistent differential expression between normal and germ cell cancer (GCC) samples and are suggested to play a role in normal and aberrant gonadal development.

CCPG1 CD53 CDX1 CYP17A1 DPPA5 DSP ELOVL6

EPHB4 ESRP1 FAM162B GSG1 IGF2 L1TD1 LDHC

MOB1A MOB1B MSH2 MYCL MYCN NGEF NLRP9

OSBPL3 PDPN PIK3AP1 PIK3C2B PIM2 PLCZ1 PQLC3

PTPN12 SERPINA5 SLA SNX10 SOX13 SOX30 SOX9

SYNJ2BP TET1 TFCP2L1 YWHAZ ZNF107 AKAP4 ALPL

ALPPL2 ARHGDIB BCAR4 CARD11 CASP8 CCND2 CD44

CPEB1 CTSC CXCL9 DICER1 DNAAF1 DSCC1 EFEMP1

CECR1 ETV1 ETV4 ETV5 FMN2 HDAC9 HELLS

IGF2BP1 JUP KIT KRAS LASP1 LDHB LIN28A

MDM2 MSH6 NANOG NANOS3 NFE2L3 NLRP7 NRCAM

PODXL POU5F1 PRDM1 PRDM14 PROM1 PTCHD1 RAB15

RPRM SET SIRT1 SOX2 SOX15 SOX17 SPRY4

SUSD2 TFAP2C XIST ZFP42 TEAD4 WNT3 DLG3

FZD10 PPP1CB FRMD6 FZD5 GAPDH HPRT1 USP32

BCR CASC3 ATP5B ARF1 ALAS1 IPO8 GUSB

Pie chart, Distribution of Selected Genes: differential expression between GCC and normal, 36%; role in normal or GCC development, 49%; Hippo pathway related, 6%; endogenous controls, 9%.

Figure 10. Cluster analysis of OpenArray® Panel screening data. Expression levels were normalized to USP32 (the most stable gene across samples according to the transcriptome sequencing and qPCR data) and hierarchically clustered using an unsupervised centroid algorithm. Green indicates that ΔΔCt is greater than the median for that gene across all samples, red indicates that ΔΔCt is lower than the median, and grey indicates undetectable expression. These selected genes separate seminoma (SE) samples from non-seminoma (NS) and normal (N) samples.
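The steps in the Figure 10 caption can be sketched as follows, under assumptions the poster does not spell out. ΔCt is taken as Ct(gene) minus Ct(USP32) in each sample. Each gene's ΔCt values are then centered on that gene's median across samples, so a positive centered value corresponds to green and a negative one to red. Samples are grouped by agglomerative clustering with centroid linkage on Euclidean distance. Undetectable (grey) values are omitted for brevity, and all Ct values are hypothetical.

```python
import math
from statistics import median
from typing import Dict, List, Tuple

REFERENCE_GENE = "USP32"  # normalizer named in the Figure 10 caption


def median_centered_dct(ct: Dict[str, Dict[str, float]]) -> Dict[str, Dict[str, float]]:
    """ct[sample][gene] -> Ct. Returns, per sample, each gene's dCt (vs USP32)
    minus that gene's median dCt across all samples."""
    samples = list(ct)
    genes = [g for g in ct[samples[0]] if g != REFERENCE_GENE]
    dct = {s: {g: ct[s][g] - ct[s][REFERENCE_GENE] for g in genes} for s in samples}
    med = {g: median(dct[s][g] for s in samples) for g in genes}
    return {s: {g: dct[s][g] - med[g] for g in genes} for s in samples}


def centroid_clustering(profiles: Dict[str, List[float]], n_clusters: int) -> List[List[str]]:
    """Agglomerative clustering with centroid linkage: repeatedly merge the two
    clusters whose centroids are closest (Euclidean) until n_clusters remain."""
    clusters: List[Tuple[List[str], List[float]]] = [
        ([name], list(vec)) for name, vec in profiles.items()]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        (mi, ci), (mj, cj) = clusters[i], clusters[j]
        # The new centroid is the size-weighted mean of the two merged centroids.
        ni, nj = len(mi), len(mj)
        merged = (mi + mj, [(ni * a + nj * b) / (ni + nj) for a, b in zip(ci, cj)])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(members) for members, _ in clusters]


ct = {  # hypothetical Ct values for two seminoma (SE) and two normal (N) samples
    "SE1": {"USP32": 25.0, "POU5F1": 22.0, "NANOG": 24.0},
    "SE2": {"USP32": 25.5, "POU5F1": 22.5, "NANOG": 24.5},
    "N1": {"USP32": 25.0, "POU5F1": 31.0, "NANOG": 32.0},
    "N2": {"USP32": 24.5, "POU5F1": 30.5, "NANOG": 31.0},
}
centered = median_centered_dct(ct)
genes = ["POU5F1", "NANOG"]
profiles = {s: [centered[s][g] for g in genes] for s in centered}
print(sorted(centroid_clustering(profiles, 2)))  # [['N1', 'N2'], ['SE1', 'SE2']]
```

Median-centering removes each gene's overall level, so the clustering responds to relative differences between samples rather than to which assays simply have low or high Ct values.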