Whole Transcriptome Profiling of Cancer Tumors in Mouse PDX Models
http://www.impactjournals.com/oncotarget/index.php?journal=oncotarget&page=article&op=view&path%5B%5D=8014
Based on breast cancer samples taken from the publication “Whole transcriptome profiling of patient-derived xenograft models as a tool to identify both tumor and stromal specific biomarkers” (James R. Bradford et al.; DOI: 10.18632/oncotarget.8014)
Introduction
From a complex data set that included a number of cancer types in several different mouse strains (Whole Transcriptome Profiling of PDX Models), a focused dataset can be extracted to look at transcriptional differences between cancer subtypes and expression-based interaction between tumor and stroma. Specifically, this dataset contains 21 samples from 3 subtypes of breast cancer in 4 different mouse models. Human tumor cells from patients with cancers of varying severity were implanted in the mice. At a later point, RNA from human tumor cells and mouse stroma cells was extracted and analyzed using unsupervised and supervised analysis methods on the T-BioInfo platform. The goal was to identify differences in expression and to select representative genes that could be considered as biomarker candidates. Of special interest was the transcriptional response of the stroma to tumor type, because stroma cells play a major role in determining tumor malignancy.
PDX Mouse Strains
XID: This mouse strain is characterized by the absence of the thymus, mutant B lymphocytes and no T cell function.
NOD SCID: Combined immunodeficiency, with no mature T cells and B cells.
Athymic Nude: This mouse strain lacks the thymus and is unable to produce T cells.
CB17 SCID: A severe combined immunodeficiency affecting both B and T lymphocytes. These mice have normal NK cells, macrophages, and granulocytes.

Breast TN: Triple Negative Breast Cancer. This cancer is negative (based on gene expression) for the common biomarker genes ER, PR, and HER2 (hormone and growth factor receptor genes) and does not respond to typical hormonal therapy. Survival rates are lower for this cancer than for ER+ cancer types.
Breast ER+: Estrogen Receptor Positive. This is the most commonly diagnosed breast cancer. Treatment often includes hormone therapy, and the short-term outlook is more positive.
Breast HER2+: Human Epidermal growth factor Receptor 2 Positive. Tends to be a more aggressive cancer type than ER+.
After samples were extracted, RNA libraries were prepared with the Illumina TruSeq RNA Sample Preparation kit (un-stranded) according to the manufacturer’s protocol. These libraries were then submitted for 100 bp paired-end sequencing on the Illumina HiSeq 2000 platform using one lane per three to six PDX models.
Sample Summary
Analysis Pipeline Overview
The RNA-seq pipeline estimates expression levels for all annotated and non-annotated genomic elements:
1. Mapping with TopHat
2. Finding isoforms using Cufflinks
3. GTF file of isoforms using Cuffmerge
4. Mapping with Bowtie 2 on the new transcriptome
RSEM output tables of genes, isoforms and exons are then prepared for machine learning analysis:
• Removing genomic elements that did not have any expression (all zeros) in the RSEM table
• Quantile Normalization
• Principal Component Analysis
• Factor Regression Analysis
Principal Component Analysis (PCA)
Principal Component Analysis is a data reduction technique that represents the dataset structure on a 2-dimensional plane defined by principal component coordinates. The components that explain the largest percentage of variability are chosen as principal.
After genomic elements such as genes and isoforms were mapped and exported as a table, they were prepared for machine learning analysis. Elements with zero expression across all samples were removed. All values were normalized using quantile normalization. PCA was performed on log values.
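The preprocessing steps named here (dropping all-zero rows, quantile normalization, log transform, PCA) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the T-BioInfo implementation: the function names, the log2(x + 1) transform, the SVD-based PCA, and the random test matrix are all hypothetical choices, and ties in quantile normalization are not averaged.

```python
import numpy as np

def drop_all_zero_rows(expr):
    """Remove genomic elements (rows) with zero expression in every sample."""
    keep = np.any(expr != 0, axis=1)
    return expr[keep]

def quantile_normalize(expr):
    """Give every sample (column) the same empirical distribution."""
    order = np.argsort(expr, axis=0)                   # per-column sort order
    ranks = np.argsort(order, axis=0)                  # rank of each value in its column
    mean_sorted = np.sort(expr, axis=0).mean(axis=1)   # mean value at each rank
    return mean_sorted[ranks]

def pca(expr, n_components=2):
    """PCA of samples via SVD; returns sample scores and explained-variance ratios."""
    x = expr.T                                         # rows = samples, columns = genes
    x = x - x.mean(axis=0)                             # center each gene
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical data: 200 genomic elements x 6 samples, first 10 rows all zero.
rng = np.random.default_rng(0)
expr = rng.gamma(shape=2.0, scale=50.0, size=(200, 6))
expr[:10] = 0.0

filtered = drop_all_zero_rows(expr)
normalized = quantile_normalize(filtered)
logged = np.log2(normalized + 1.0)                     # assumed log transform
scores, explained = pca(logged)
print(scores.shape, explained)
```

The `explained` values correspond to the "PC1: …%, PC2: …%" figures quoted on the PCA slides, expressed as fractions rather than percentages.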
[Figure: PCA of Human and Mouse Genes (RSEM, FDR 0.05), Mouse (Stroma), shown before and after batch correction]
Batch Effect Correction
Batch Effect: unwanted technical interference that occurs when data arise from complex experiments involving, for instance, cell sorting, low-input RNA or different batches (e.g., multiple sequencing centers or different read lengths); we refer to such typically unknown nuisance technical effects as unwanted variation. Removal of batch effects is a crucial normalization step in the analysis of RNA-seq data to remove confounding variability.
http://www.nature.com/nbt/journal/v32/n9/fig_tab/nbt.2931_F2.html
FILE: RSEM0.05_beforeBATCH.xls; 0.05_batch_sorted_PCA_dots.xlsx
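The presentation does not state which correction method was applied. As a rough illustration of the idea only, the sketch below uses per-batch mean centering on log expression values, which is a deliberately simple stand-in and not the method used for these slides. The function name, batch labels, and values are hypothetical.

```python
import numpy as np

def center_batches(log_expr, batches):
    """Subtract each batch's per-gene mean, then restore the overall per-gene mean."""
    log_expr = np.asarray(log_expr, dtype=float)
    batches = np.asarray(batches)
    corrected = log_expr.copy()
    overall = log_expr.mean(axis=1, keepdims=True)
    for b in np.unique(batches):
        cols = batches == b                            # samples belonging to this batch
        corrected[:, cols] -= log_expr[:, cols].mean(axis=1, keepdims=True)
    return corrected + overall

# Hypothetical log expression: 2 genes x 4 samples, with "run2" shifted upward.
log_expr = np.array([[1.0, 2.0, 6.0, 7.0],
                     [3.0, 3.0, 5.0, 5.0]])
batches = ["run1", "run1", "run2", "run2"]
corrected = center_batches(log_expr, batches)
print(corrected)
```

Mean centering also removes any biological signal that is confounded with batch (for example, if one batch contains only one tumor subtype), so it is only appropriate when batches are reasonably balanced across the groups of interest.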
Data Pre-processing
Initial PCA of gene and isoform expression profiles using the concatenated mouse and human genome did not produce any meaningful results. Further investigation identified a batch effect across samples.
GENES: PC1: 17.86%, PC2: 17.65%
ISOFORMS: PC1: 19.65%, PC2: 9.77%
GENES FILE: expression_genes_breast_conct_normalized_PCA.xlsx; ISOFORMS FILE: expression_isoform_nozero_breast_normalized_PCA_.xlsx
[Figure labels: SAMPLES, COMPONENTS]
[Figure: two PCA scatter plots of samples, each point labeled Estrogen Receptor or Triple Negative]
PCA of Human Genes and Isoforms (Tumor)
A total of 40,266 human genes (ENSG identifiers) were identified from the concatenated RSEM results after removing all-zero rows. This table was used to create the following PCA, which shows only the human gene expression from the RSEM table. There is good separation between the triple negative subtype and the ER+ subtype, with sample ERR1084766 as an outlier in both genes and isoforms.
[Figure: PCA panels "Genes" and "Isoforms"; legend: Triple_NEG, ER+, HER2+; labeled samples: ERR1084802, ERR1084810, ERR1084766]
PC1: 22.16%, PC2: 9.22%
PC1: 12.38%, PC2: 10.09%
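Because the reads were quantified against a concatenated human/mouse reference, the human-only (tumor) and mouse-only (stroma) tables can be separated by Ensembl ID prefix. The sketch below assumes the table is held as a dictionary from gene ID to per-sample values; the function name and the expression values are hypothetical, and only the IDs are taken from the slides.

```python
def split_by_species(table):
    """Split a concatenated expression table into human and mouse parts by Ensembl ID prefix."""
    human, mouse = {}, {}
    for gene_id, values in table.items():
        if not any(values):                  # skip elements that are zero in every sample
            continue
        if gene_id.startswith("ENSMUSG"):    # mouse genes (stroma)
            mouse[gene_id] = values
        elif gene_id.startswith("ENSG"):     # human genes (tumor)
            human[gene_id] = values
    return human, mouse

# Hypothetical expression values; only the IDs come from the presentation.
table = {
    "ENSG00000104332": [9.1, 8.7, 2.3],
    "ENSG00000160182": [0.0, 0.0, 0.0],
    "ENSMUSG00000060183": [1.2, 0.0, 3.4],
}
human, mouse = split_by_species(table)
print(len(human), len(mouse))
```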
PCA of Mouse Genes (Stroma)
A total of 22,656 mouse genes (ENSMUSG identifiers) were identified from the concatenated RSEM results after removing all-zero rows. This table was used to create the following PCA, which represents stromal expression in the mouse host of the PDX model. The PCA is labeled to show the different mouse strains used in the study. Athymic nude mice lack the thymus and are unable to produce T cells. Both SCID strains have a combined immunodeficiency. The XID strain is characterized by the absence of the thymus, mutant B lymphocytes and no T cell function.
[Figure: PCA of mouse genes; legend: AthymicNude__Triple_NEG, NOD_SCID_Triple_NEG, Athymicnude__ER+, CB17_SCID_Triple_NEG, NOD_SCID__ER+; groups: Triple NEG, ER+; labeled samples: ERR1084763, ERR1084766, ERR1084799]
Factor Regression Analysis
In order to select genes and isoforms that are affected by tumor type (Factor A) and/or mouse types (Factor B), we can apply Factor Regression Analysis to gene and isoform expression tables. One can select mouse genes under the influence of tumor type and human genes/isoforms that are under the influence of mouse type. Thus, we can select a number of genomic elements that are involved in the interplay of tumor and stroma.
Factor Table (2 factors, 2 levels each)
Factor A: Triple Negative vs. ER+
Factor B: Athymic Mouse vs. SCID Mouse
A0B0: Triple Negative / Athymic Nude
A0B1: Triple Negative / SCID
A1B0: ER+ / Athymic Nude
A1B1: ER+ / SCID
Factor Regression Analysis output table columns: Gene ID, Expression Levels, Factor Influence, F-test
Factor Analysis Output: out_expression_genes_breast_cont_afterbatch_prefactor_changed.xlsx
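To make the two-factor design concrete, the sketch below computes an F statistic per factor for a single gene by comparing nested least-squares fits of an additive model (expression ~ 1 + A + B). This is an assumed stand-in for illustration; the slides do not describe how the platform's Factor Regression Analysis is computed. The 0/1 coding follows the factor table (A: 0 = Triple Negative, 1 = ER+; B: 0 = Athymic Nude, 1 = SCID), and the expression values are hypothetical.

```python
import numpy as np

def rss(design, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def factor_f_statistics(y, a, b):
    """F statistics for factor A and factor B in the additive model y ~ 1 + A + B."""
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ones = np.ones_like(y)
    full = np.column_stack([ones, a, b])
    rss_full = rss(full, y)
    df_resid = len(y) - full.shape[1]                  # residual degrees of freedom
    mse = rss_full / df_resid
    # Each factor has one degree of freedom: compare the fit with and without it.
    f_a = (rss(np.column_stack([ones, b]), y) - rss_full) / mse
    f_b = (rss(np.column_stack([ones, a]), y) - rss_full) / mse
    return f_a, f_b

# Hypothetical log expression of one gene in 8 samples (2 per A/B combination).
a = [0, 0, 0, 0, 1, 1, 1, 1]                           # tumor subtype
b = [0, 0, 1, 1, 0, 0, 1, 1]                           # mouse strain
y = [2.1, 1.9, 2.0, 2.2, 8.0, 7.8, 8.1, 8.3]
f_a, f_b = factor_f_statistics(y, a, b)
print(f_a, f_b)
```

In this toy example the gene responds strongly to tumor subtype and barely to mouse strain, so the F statistic for Factor A is far larger than for Factor B. Applying the same test to mouse genes against Factor A, or to human genes against Factor B, mirrors the tumor/stroma interplay selection described above.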
Factor A (Triple Negative vs. ER+)
Factor A: SFRP1_ENSG00000104332
[Figure: bar chart of SFRP1 expression, ER+ vs. Triple-Negative]
SFRP1 has been associated with the TNBC subtype, in which it is highly upregulated compared with ER+ (hormone receptor positive) samples. In our results, the SFRP1 gene is likewise upregulated in the triple negative samples compared with the ER+ samples. (Bernemann C et al., "Influence of secreted frizzled receptor protein 1 (SFRP1) on neoadjuvant chemotherapy in triple negative breast cancer does not rely on WNT signaling.")
Factor A: TFF1 & TFF3: Estrogen-Regulated Proteins
[Figure: bar chart of ENSG00000160182 expression]
TFF1 and TFF3 play a role in tumors under the effect of estrogen; thus the ER+ samples show increased expression compared with the TNBC subtype, which is supposed to be negative for the estrogen receptor.
http://www.neoplasia.com/article/S1476-5586(10)80022-3/pdf
http://www.ncbi.nlm.nih.gov/pubmed/11919164
Factor B: ER- Athymic Nude vs. NOD SCID MICE
[Figure: bar chart of expression for the genes listed below]
ENSG00000143556: S100 calcium binding protein A7
ENSG00000120075: homeobox B5
ENSG00000205076: galectin 7
ENSG00000170608: forkhead box A3
ENSG00000251533: long intergenic non-protein coding RNA 605
ENSG00000259610
ENSMUSG00000022157: mast cell protease 8
ENSMUSG00000060183: chemokine (C-X-C motif) ligand 11
Chemokine (C-X-C motif) ligand 11: this gene encodes a secretory protein that is a member of the CXC subfamily of chemokines. Chemokines recruit and activate leukocytes and are classified by function (inflammatory or homeostatic) or by structure.
Cancer-Type Specific Examples:
ENST00000434301 is a lncRNA located near the protein-coding gene AFF3. The protein produced by AFF3 has been associated with the ENT/WNT pathway, which is vital for cell migration and invasion.
LncRNA | Relationship | Coding Gene Transcript | Coding Gene Symbol
ENST00000434301 | Natural antisense | ENST00000317233 | AFF3
[Figure: bar charts of lncRNA ENST00000434301 expression and AFF3 (ENSG00000144218) expression per sample, Triple-Negative vs. ER+]
CXorf61 is specific to Triple Negative Breast Cancer
[Figure: bar chart of CXorf61 (ENST00000371894) expression per sample, Triple-Negative vs. ER+. Triple_NEG samples: ERR1084802, ERR1084801, ERR1084800, ERR1084799, ERR1084798, ERR1084810, ERR1084809, ERR1084808, ERR1084804, ERR1084807, ERR1084768, ERR1084766. ER+ samples: ERR1084775, ERR1084765, ERR1084811, ERR1084764, ERR1084763, ERR1084806, ERR1084805]
When overlapping isoforms expressed by mouse and human, CXorf61 was identified as an outlier. CXorf61 fulfils the requirement of an ideal target for cancer immunotherapy, as it is cancer-cell selective and expressed at a high frequency in TNBC tumors.
Second Look at Gene Expression
Determining whether a breast sample is TNBC is often done by protein expression values or histology. It is not uncommon for TNBC samples to show some gene expression for Human Estrogen Receptor; still, when compared with the expression of the ER+ samples, a few could be considered ambiguous. These samples are labeled in red.
[Figure: bar chart of ENSG00000091831 (Human Estrogen Receptor) expression per sample]
[Figure: bar chart of ENSG00000140009 (Human Estrogen Receptor 2) expression per sample] Demonstrates that TNBC does not have ERα protein expression, but does have ERβ protein expression.
Samples shown: Triple_NEG: ERR1084802, ERR1084801, ERR1084800, ERR1084799, ERR1084804, ERR1084803, ERR1084810, ERR1084809, ERR1084808, ERR1084807, ERR1084798, ERR1084768, ERR1084766. ER+: ERR1084765, ERR1084811, ERR1084764, ERR1084763, ERR1084806, ERR1084775, ERR1084805. HER2: ERR1084767.
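One way to make "ambiguous" concrete is to flag triple negative samples whose estrogen receptor expression reaches the lowest value seen among ER+ samples. The slides do not state the criterion actually used to mark samples in red, so this threshold rule is an assumption; the function name, sample names, and values are hypothetical.

```python
def flag_ambiguous(expr_by_sample, subtype_by_sample):
    """Return Triple_NEG samples whose expression is at least the lowest ER+ value."""
    er_values = [expr_by_sample[s] for s, t in subtype_by_sample.items() if t == "ER+"]
    threshold = min(er_values)               # assumed cutoff: weakest ER+ sample
    return sorted(
        s for s, t in subtype_by_sample.items()
        if t == "Triple_NEG" and expr_by_sample[s] >= threshold
    )

# Hypothetical samples and ESR1 expression values, for illustration only.
subtype_by_sample = {
    "TN_a": "Triple_NEG", "TN_b": "Triple_NEG", "TN_c": "Triple_NEG",
    "ER_a": "ER+", "ER_b": "ER+",
}
esr1 = {"TN_a": 0.4, "TN_b": 6.2, "TN_c": 1.1, "ER_a": 5.8, "ER_b": 10.3}
ambiguous = flag_ambiguous(esr1, subtype_by_sample)
print(ambiguous)
```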
Summary:
• Mapping with a concatenated (human/mouse) genome: ~60% mapped to the human genome and ~40% mapped to the mouse genome.
• Human (tumor) RSEM breast expression separates by hormone receptor subtype: ER+ and triple negative samples show clear separation. This trend of expression is in agreement with the publication.
• Mouse (stroma) RSEM breast expression does not show clear separation by either tumor type or stroma type; this needs to be further investigated.
• Factor Regression Analysis is useful for identifying relationships between stroma and tumor gene and isoform expression.
• When analyzing hormone receptor expression, a few of the triple negative samples had higher than expected ESR1 expression. Overall, the trend still showed significantly higher expression in ER+ samples than in triple negative samples.
• This presentation gives a brief overview of the trends found in this data set and shows a few examples of how the T-BioInfo platform can be used with complex data sets to find meaningful results.
Author Results:
• Focused on comparing their PDX models with other clinical data sets to validate the PDX models.
• Identified expression differences between the breast subtypes (triple negative and ER+).
• Comparison of their samples with other clinical samples demonstrated that key markers of breast cancer can be observed with these PDX models and highlighted the difference in expression when stromal recruitment is accounted for.
• The authors struggled to identify any significant association with mouse gender or tumor stage, and found only a small association with mouse strain, with clusters specific to athymic nude mice.
Files: https://www.dropbox.com/sh/mtat4tjdj3f2rcy/AACGks1itR_spMydQ5Dpq5k2a?dl=0