Novel Aberrations Uncovered in Barrett's Esophagus and...



Genomics

Novel Aberrations Uncovered in Barrett's Esophagus and Esophageal Adenocarcinoma Using Whole Transcriptome Sequencing

Jesper L.V. Maag1,2, Oliver M. Fisher2,3, Angelique Levert-Mignon3, Dominik C. Kaczorowski1, Melissa L. Thomas3,4, Damian J. Hussey5, David I. Watson5, Antony Wettstein3, Yuri V. Bobryshev2,3, Melanie Edwards3,4, Marcel E. Dinger1,2, and Reginald V. Lord3,4

Abstract

Esophageal adenocarcinoma (EAC) has one of the fastest increases in incidence of any cancer, along with poor five-year survival rates. Barrett's esophagus (BE) is the main risk factor for EAC; however, the mechanisms driving EAC development remain poorly understood. Here, transcriptomic profiling was performed using RNA-sequencing (RNA-seq) on premalignant and malignant Barrett's tissues to better understand this disease. Machine-learning and network analysis methods were applied to discover novel driver genes for EAC development. Identified gene expression signatures for the distinction of EAC from BE were validated in separate datasets. An extensive analysis of the noncoding RNA (ncRNA) landscape was performed to determine the involvement of novel transcriptomic elements in Barrett's disease and EAC. Finally, transcriptomic mutational investigation of genes that are recurrently mutated in EAC was performed. Through these approaches, novel driver genes were discovered for EAC, which involved key cell cycle and DNA repair genes, such as BRCA1 and PRKDC. A novel 4-gene signature (CTSL, COL17A1, KLF4, and E2F3) was identified, externally validated, and shown to provide excellent distinction of EAC from BE. Furthermore, expression changes were observed in 685 long noncoding RNAs (lncRNA) and a systematic dysregulation of repeat elements across different stages of Barrett's disease, with wide-ranging downregulation of Alu elements in EAC. Mutational investigation revealed distinct pathways activated between EAC tissues with or without TP53 mutations compared with Barrett's disease. In summary, transcriptome sequencing revealed altered expression of numerous novel elements, processes, and networks in EAC and premalignant BE.

Implications: This study identified opportunities to improve early detection and treatment of patients with BE and esophageal adenocarcinoma. Mol Cancer Res; 15(11); 1558–69. ©2017 AACR.

Introduction

Esophageal adenocarcinoma (EAC) has one of the fastest increases in incidence rates of all cancers since the 1970s in white populations of all age groups in many countries (1, 2).

The strongest risk factor for EAC is the presence of Barrett's esophagus (BE), a condition in which columnar intestinal-type mucosa (IM) replaces the normal squamous lining of the esophagus in response to gastroesophageal reflux. Barrett's IM without dysplasia (non-dysplastic BE, NDBE) can progress through the stages of low-grade dysplasia (LGD) to high-grade dysplasia (HGD) to invasive EAC (3). Most patients with EAC present with late stages of disease, contributing to the poor 5-year population survival rates of ~15% (4, 5). Even after highly aggressive chemoradiotherapy followed by esophagectomy for treatment of potentially curable disease, 5-year survival rates are typically below 50% (6).

The mechanisms driving EAC development are poorly understood, and the successes of early detection screening programs remain modest (7). Exome- and whole-genome sequencing studies have recently contributed greatly to the identification of driver mutations and frequent catastrophic, chromothriptic events (8). Further efforts have been made to investigate the order of genomic alterations occurring in the development of EAC by studying the mutations in Barrett's IM, HGD, and EAC (9).

Whereas previous expression studies of Barrett's disease have mainly used microarray-based technologies relying on predefined, mostly coding genes, RNA-sequencing (RNA-seq) offers an unbiased method of analyzing the whole

1Genome Informatics, Genomics and Epigenetics Division, Garvan Institute of Medical Research, Sydney, NSW, Australia. 2Faculty of Medicine, St Vincent's Clinical School, University of New South Wales, Sydney, NSW, Australia. 3Gastroesophageal Cancer Program, St. Vincent's Centre for Applied Medical Research, Sydney, Australia. 4University of Notre Dame School of Medicine, Sydney, Australia. 5Department of Surgery, Flinders University, Adelaide, Australia.

Note: Supplementary data for this article are available at Molecular Cancer Research Online (http://mcr.aacrjournals.org/).

J.L.V. Maag and O.M. Fisher share first authorship and contributed equally to this article.

M.E. Dinger and R.V. Lord share senior authorship of this article.

Corresponding Authors: Marcel E. Dinger, The Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, NSW, Australia 2010. Phone: 61-2-9935-5860; Fax: 61-2-9935-5860; E-mail: [email protected]; Reginald V. Lord, Suite 606, 438 Victoria Street, Darlinghurst, NSW, Australia 2010. Phone: 61-2-8382-6671; Fax: 61-2-8382-6672; E-mail: [email protected]

doi: 10.1158/1541-7786.MCR-17-0332

©2017 American Association for Cancer Research.

Molecular Cancer Research

Mol Cancer Res; 15(11) November 2017 1558

on June 29, 2018. © 2017 American Association for Cancer Research. mcr.aacrjournals.org Downloaded from

Published OnlineFirst July 27, 2017; DOI: 10.1158/1541-7786.MCR-17-0332

transcriptome. Most microarray-based transcriptome analyses will, for example, miss a large proportion of long noncoding RNAs (lncRNAs), which are involved in almost all known cellular processes (10–14) and are important in cancer biology, as shown by the number of functionally characterized lncRNAs involved in carcinogenesis (15). In EAC, only two lncRNAs have been reported (16, 17), and no systematic approach identifying lncRNA dysregulation at different stages of Barrett's disease and EAC development has been published so far.

Despite having potentially important roles in disease, other subsets of the transcriptome are equally difficult to characterize without sequencing-based technologies and have therefore rarely been explored. For example, repeat elements, which make up two thirds of the human genome (18), have been associated with cancer (19) and double-stranded breaks (20), and long-interspersed nuclear element 1 (LINE-1) retrotransposition is increased in EAC and Barrett's (21).

Here we report a characterization of the transcriptomic landscape at different stages of the Barrett's disease spectrum by performing whole transcriptome RNA-seq. Utilizing machine learning and network analysis, we identify novel genes and networks in EAC development. This analysis provides one of the first systematic studies of aberrations in lncRNA and repeat element expression in EAC and identifies a novel, non-TP53 mutation-based pathway for EAC formation through transcriptional mutation analysis.

Materials and Methods

Study population, specimen collection and analysis, and RNA extraction

Institutional review board approval for this study was obtained at all collaborating institutions, and all patients provided written informed consent.

Fifty-one tissue samples [normal squamous esophagus (NE) = 17, NDBE = 14, BE with LGD = 8, EAC = 12] were collected at endoscopy from 44 patients for RNA-seq. Four patients provided matched normal squamous tissues. For the immunohistochemistry analysis, an additional 15 patients (NE = 5, NDBE = 5, EAC = 5) were added for external validation of protein findings. Demographic details of all patients included in this study are provided in Supplementary Table S1.

All cancer specimens were pretreatment biopsies from radiochemotherapy-naïve patients.

Specimen collection

All tissue samples were collected from the tubular esophagus at or above the gastroesophageal junction, which was defined as the proximal border of the gastric rugal folds. In patients with NDBE or LGD, a columnar mucosal segment of at least 1 cm was required for inclusion. Following endoscopic removal, research specimens were immediately placed in a solution for RNA stabilization and stored for 24 to 48 hours at 4°C.

Determining tissue cellularity and dysplasia

Prior to RNA isolation, specimens were cut into three pieces using surgical scalpels treated with an RNAse and DNA decontamination reagent. The central section of the specimen was formalin-fixed and paraffin-embedded with subsequent representative sections taken for H&E staining. The two remaining sections of the specimens were stored at -80°C until further use. The corresponding H&E sections were reviewed by two experienced gastrointestinal pathologists (Y.V. Bobryshev and M. Edwards) to verify the correct diagnosis and determine tissue cellularity as well as the degree and extent of dysplasia. Median microscopically estimated percentages of diagnosis-defining epithelium in the specimens used for RNA extraction were as follows: NE 100% (IQR 100%–100%), NDBE 55% (IQR 50%–88%), LGD 50% (IQR 40%–79%), and EAC 73% (IQR 70%–100%).

RNA sequencing, bioinformatic analysis, gene and lncRNA validation

A detailed description of the RNA sequencing, the bioinformatic pipelines, and qPCR validation of lncRNAs is provided in Supplementary Methods. Briefly, trimming, mapping, and counting were performed using trimgalore, STAR (22), and HTSeq (23). Differential gene expression analysis was performed with EdgeR28. Network analysis was performed using WGCNA (24). The SNV calling pipeline consisted of Samtools (25) mpileup, bcftools, vcfutils, and vcffilter. The mutations were analyzed using VEP. The method used to generate and validate the 12- and 4-gene signatures is described in Supplementary Methods.

A detailed description of lncRNA qPCR validation can be found in Supplementary Methods. Raw expression data have been deposited at http://www.ebi.ac.uk/arrayexpress/ (E-MTAB-4054).

Results

Next-generation RNA sequencing of Barrett's disease tissues

A summary of RNA-seq run metrics and patient demographics is provided in Supplementary Table S1 and in the Supplementary Methods.

Computational tissue purity estimation resulted in assessed purities > 80% for all analyzed samples (Supplementary Fig. S1A), higher than estimated microscopically. This lack of correlation of computational purity and histopathologic estimates is consistent with previous reports (26). The median stromal compartment was higher in EAC samples compared with NDBE and LGD and lowest in NE (Supplementary Fig. S1B), which may reflect tumor-induced desmoplasia. Enrichment for the immune cell score was only observed in EAC, possibly indicating a higher percentage of immune cell infiltration in EAC compared with BE (Supplementary Fig. S1C).

Whole transcriptome sequencing characterizes disease-specific patient cohorts

Correlating gene expression profiles revealed high correlation between NDBE, LGD, and EAC (Supplementary Fig. S1D), with closest similarity between NDBE and LGD (R = 0.97). All tissues are almost equally dissimilar to NE, indicating their non-squamous tissue histology. The correlation between EAC and NDBE and EAC and LGD was R = 0.94 and 0.93, respectively (Supplementary Fig. S1D).
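The pairwise comparison of expression profiles described above can be illustrated with a short sketch. The text does not state which correlation coefficient was used, so Pearson's R is an assumption here, and the expression values and the helper names `pearson` and `correlation_matrix` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical mean log-expression of five genes per tissue group
# (illustrative numbers only, not data from the study).
profiles = {
    "NE":   [9.1, 2.0, 1.5, 7.8, 0.5],
    "NDBE": [3.0, 8.2, 6.9, 4.1, 5.5],
    "LGD":  [3.2, 8.0, 7.1, 4.0, 5.9],
    "EAC":  [2.5, 8.9, 6.0, 5.2, 7.4],
}

def correlation_matrix(groups):
    """Correlate every group profile with every other group profile."""
    names = list(groups)
    return {(a, b): pearson(groups[a], groups[b]) for a in names for b in names}

matrix = correlation_matrix(profiles)
```

In this toy input the NDBE and LGD profiles are nearly identical and both differ strongly from NE, which mirrors the qualitative pattern reported in the text.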

Principal component analysis corroborated these findings, revealing that the major transcriptomic differences found between disease entities are between normal squamous tissues and NDBE/LGD/EAC (Fig. 1A). However, along the second principal component, EAC showed distinct clustering characteristics, with a subset of cancers grouping with the respective, tightly correlated precursor lesions (Fig. 1A).

Transcriptomic Landscape in EAC Development


One sample, LNC_B13, originally classified as NDBE, showed a gene expression signature much closer to normal squamous esophagus. Upon microscopic re-inspection, the specimen contained a larger proportion of squamous esophagus than intestinal metaplasia, confirming the accuracy of RNA-seq–based tissue classification.

Differential gene expression analysis identifies specifically differentially expressed genes across Barrett's disease stages

Differential gene expression (DE) analysis showed that the highest number of DE genes (absolute log2 FC ≥ 1, FDR ≤ 0.05) was identified when comparing EAC to NE (n = 6,766; Fig. 1B), with almost equal numbers of DE genes present when EAC was compared with LGD (n = 2,885) and to NDBE (n = 2,466; Fig. 1B). Of the top 10 DE genes between EAC and NE, 4 genes (HNF4A [log2FC_EAC/NE 8.5, FDR = 9e-125], ANKS4B [log2FC_EAC/NE 8.7, FDR = 2e-104], LGALS4 [log2FC_EAC/NE 7.9, FDR = 6e-104], and BCL2L14 [log2FC_EAC/NE 6.1, FDR = 1e-97], Supplementary Table S2) were also differentially expressed between NDBE and LGD versus NE, suggesting a tissue-specific expression of these genes.
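The DE criteria quoted above (absolute log2 fold change of at least 1 and FDR of at most 0.05) amount to a simple filter on a results table. The sketch below is only an illustration of that thresholding and of the up/down counting shown in Fig. 1B; the `DEResult` type and function names are hypothetical, the first four rows reuse the values quoted in the text, and the `GENE_*` rows are invented to exercise the filter.

```python
from dataclasses import dataclass

@dataclass
class DEResult:
    gene: str
    log2fc: float  # log2 fold change, e.g. EAC relative to NE
    fdr: float     # multiple-testing-adjusted P value

def filter_de(results, min_abs_log2fc=1.0, max_fdr=0.05):
    """Keep genes passing both the effect-size and the FDR threshold."""
    return [r for r in results
            if abs(r.log2fc) >= min_abs_log2fc and r.fdr <= max_fdr]

def count_directions(results):
    """Count upregulated (log2FC > 0) versus downregulated genes."""
    up = sum(1 for r in results if r.log2fc > 0)
    return {"up": up, "down": len(results) - up}

table = [
    DEResult("HNF4A", 8.5, 9e-125),
    DEResult("ANKS4B", 8.7, 2e-104),
    DEResult("LGALS4", 7.9, 6e-104),
    DEResult("BCL2L14", 6.1, 1e-97),
    DEResult("GENE_X", -0.4, 0.01),   # hypothetical: fold change too small
    DEResult("GENE_Y", -2.3, 0.20),   # hypothetical: FDR too high
    DEResult("GENE_Z", -1.8, 0.001),  # hypothetical: passes, downregulated
]

significant = filter_de(table)
directions = count_directions(significant)
```

Requiring both thresholds together excludes genes with a statistically detectable but small change as well as genes with a large but unreliable change.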

Of the 7,803 uniquely DE genes between EAC and the other groups (Fig. 1C), 4,029 were unique to EAC versus NE, showing enrichment for developmental processes (P = 5.6e-22; Supplementary Fig. S1E). Shared DE genes between all groups versus EAC (n = 1,260) showed enrichment for cell motility (P = 7.2e-14), response to stress (P = 2.9e-16), and inflammatory response (P = 1.1e-17) in EAC (Supplementary Fig. S2), while the 316 genes dysregulated between EAC and NDBE and LGD were enriched for biological processes typical of highly replicative malignant tissues, such as mitotic cell cycle (P = 1.3e-11) and DNA-replication initiation (P = 2.5e-7; Supplementary Fig. S1E). Hierarchical clustering of the top 100 DE genes (based on FDR) between EAC and all groups illustrated a perfect classification of histologic tissue types, with EAC and NE displaying perfect separation, whereas NDBE and LGD clustered together, but distinctly from EAC (Fig. 1D). However, despite the similarities between NDBE and LGD compared with EAC, we observed 214 significantly differentially expressed genes between NDBE and LGD (Fig. 1E). Although these lacked significant gene enrichment signatures, the top 10 DE genes included 6 transcription factors (FOSB [log2FC_LGD/NDBE 4.5, FDR = 3.8e-7], NR4A1 [log2FC_LGD/NDBE 2.4, FDR = 6e-5], EGR1 [log2FC_LGD/NDBE 2.6, FDR = 6e-5], FOS [log2FC_LGD/NDBE 2.4, FDR = 1.3e-4], EGR3 [log2FC_LGD/NDBE 2.4, FDR = 1e-3], and ATF3 [log2FC_LGD/NDBE 1.8, FDR = 1e-3]), which showed similar dysregulation in EAC (Supplementary Fig. S2A). Most of these transcription factors are involved in cell proliferation, differentiation, and transformation processes. Thus, these identified novel alterations could function as biomarkers for dysplasia in patients with histopathology that is difficult to interpret, an important and not uncommon clinical problem for LGD (27).

A list of top differentially expressed genes is provided inSupplementary Table S2.
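The clustering heatmap of the top DE genes (Fig. 1D) is described as normalized as a z-score per gene. A minimal sketch of that row-wise standardization follows; the use of the sample standard deviation, the handling of constant genes, and the function name `zscore_rows` are assumptions for illustration, not details taken from the study.

```python
import statistics

def zscore_rows(matrix):
    """Standardize each gene (row) to mean 0 and standard deviation 1."""
    scaled = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.stdev(row)  # sample standard deviation (assumption)
        if sd == 0:
            # A gene with identical values everywhere carries no contrast.
            scaled.append([0.0] * len(row))
        else:
            scaled.append([(value - mean) / sd for value in row])
    return scaled

# Hypothetical expression matrix: rows are genes, columns are samples.
expression = [
    [1.0, 2.0, 3.0],
    [5.0, 5.0, 5.0],
    [10.0, 30.0, 20.0],
]
scaled = zscore_rows(expression)
```

Scaling per gene puts strongly and weakly expressed genes on a common scale, so the clustering reflects the pattern across samples rather than absolute abundance.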

Network analysis reveals EAC-specific coexpression networks

Most transcriptomic studies focus on deriving lists of differentially expressed genes. However, the information captured by transcriptomic experiments is essentially much richer, as the relationship between measured transcripts can also be considered through pair-wise correlation of gene expression profiles. To this end, we used weighted gene coexpression network analysis (WGCNA) to identify gene coexpression networks that may be important for EAC carcinogenesis. WGCNA identifies clusters of highly interconnected genes, which are termed eigenmodules or

Figure 1.

RNA-seq of esophageal tissues. A, Principal component analysis of the first two components, with EAC (red), non-dysplastic Barrett's esophagus (NDBE, blue), BE with LGD (green), and normal squamous esophagus (NE, purple). B, Number of upregulated (red) and downregulated (blue) differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes when comparing LGD/NDBE/NE to EAC, with the percentage of change in each direction described on the x-axis. C, Venn diagram of the uniquely differentially expressed genes from B between LGD/NDBE/NE and EAC. D, Hierarchical clustering heatmap of the top 100 differentially expressed genes between each condition and EAC, normalized as z-score per gene. E, Volcano plot comparing gene expression between LGD and NDBE. Genes with an absolute log2 fold change ≥ 2 and FDR ≤ 0.05 are colored red.

Maag et al.


in short, modules, which can then be explored for their biological plausibility through gene ontology, pathway analysis, and similar approaches. Through WGCNA we identified three modules specifically dysregulated in EAC compared with NDBE, LGD, and NE samples.
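To make the idea of a weighted coexpression network concrete, the sketch below builds an unsigned adjacency of the kind WGCNA is based on, where the absolute pairwise correlation is raised to a soft-threshold power, and sums it into a per-gene connectivity. This is a simplified illustration rather than the WGCNA package itself: the power `beta=6`, the gene names, and the expression values are assumptions, and the module detection step is not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def adjacency(expr, beta=6):
    """Unsigned weighted adjacency: |cor(gene_i, gene_j)| ** beta."""
    genes = list(expr)
    return {(g, h): abs(pearson(expr[g], expr[h])) ** beta
            for g in genes for h in genes if g != h}

def connectivity(adj):
    """Sum of adjacencies per gene; highly connected genes are hub candidates."""
    totals = {}
    for (gene, _), weight in adj.items():
        totals[gene] = totals.get(gene, 0.0) + weight
    return totals

# Hypothetical expression of four genes across five samples.
expr = {
    "gene_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0, 10.5],
    "gene_c": [5.0, 3.0, 4.0, 1.0, 2.0],
    "gene_d": [1.0, 5.0, 2.0, 4.0, 3.0],
}
adj = adjacency(expr)
k = connectivity(adj)
```

Raising the correlation to a power suppresses weak correlations (0.3 becomes about 0.0007 at power 6) while keeping strong ones close to 1, which is why the network is called weighted rather than thresholded.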

Module 1 (Fig. 2) showed increased expression in EAC and was highly enriched for cell-cycle genes (Fig. 2A and B). The most highly connected genes (Fig. 2C) included many significantly differentially expressed genes between EAC and NDBE as well as EAC and LGD (labeled red) such as PRKDC (log2FC_EAC/NDBE 1.1, FDR_EAC/BE 4.8e-09; log2FC_EAC/LGD 1.5, FDR_EAC/LGD 5.0e-09) and BRCA1 (log2FC_EAC/NDBE 2.3, FDR_EAC/NDBE 2.4e-9; log2FC_EAC/LGD 1.5, FDR_EAC/LGD 1.2e-6; Supplementary Fig. S2B), while some were only specifically differentially expressed between EAC and LGD (labeled green, e.g., XPO1 [log2FC_EAC/LGD 1.2, FDR_EAC/LGD 3.6e-09]), or uniquely differentially expressed between EAC and NDBE (labeled blue, e.g., PLK1 [log2FC_EAC/LGD 1.1, FDR_EAC/LGD 1.3e-5]; Fig. 2B). This identification of BRCA1 and PRKDC in EAC is striking, as these genes are involved in DNA editing and repair mechanisms, which could be explained by the high degree of genomic rearrangement and chromothriptic events seen in EAC (8, 28). PRKDC was further validated as being upregulated in EAC at the protein level by IHC (Fig. 2D and E).

WGCNA also confirmed the increased immune cell score observed in EAC by identifying an EAC-specific module that was highly expressed and enriched for genes involved in immune system functions (Supplementary Fig. S2C). Another module, specifically downregulated in EAC and LGD compared with NDBE, mainly included genes involved in protein transport functions (Supplementary Fig. S2D).

Identification of a novel, EAC-specific gene expression signature

Following the identification of EAC-specific gene coexpression networks, we attempted to discover an EAC-specific gene signature

Figure 2.

Top identified weighted gene coexpression network in EAC. A, Heatmap (top) of all genes in the identified EAC-specific module, and median eigengene expression profile (bottom) of the brown module identified by WGCNA, sorted by each condition. B, Top 10 gene ontology enrichment processes of all genes in the brown module. C, Protein–protein interaction network of the top connected mRNA genes (≥10 connections) from the EAC-specific module (A). Size corresponds to number of connections (small ≥10, medium ≥20, large ≥30). Nodes differentially expressed in EAC versus NDBE/LGD (red), EAC versus NDBE (blue), and EAC versus LGD (green). D, IHC intensity score for PRKDC. ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM. E, Representative IHC slide of PRKDC between NDBE (left) and EAC (right).


by applying four machine-learning classification methods (Fig. 3A). Following our selection criteria (Supplementary Fig. S3A), we identified 39 genes showing complete separation of the conditions upon unsupervised hierarchical clustering (Fig. 3B). The identified genes were significantly enriched for biosynthetic processes, inflammatory responses, and cell proliferation

Figure 3.

Identification of EAC-specific gene signature through machine learning. A, Venn diagram of the four different cross-validation methods used to identify EAC-specific genes. Five-CV (five-fold cross-validation), MCCV (Monte–Carlo cross-validation), LOOCV (leave-one-out cross-validation). B, Heatmap of genes present in ≥2 cross-validation methods, and differentially expressed between EAC versus NDBE. C, Gene enrichment analysis of the genes from B. D, Conditional classification tree for subsetting of the 12-gene signature in combined external microarrays (D, top left) with resulting 4-gene signature and corresponding RNA-seq expression (D, bottom left). AUROC for combined microarray datasets comparing EAC to NDBE (D, top right) and for validation data sets of gastric (green) and colorectal (blue) cancer (D, bottom right) compared with corresponding normal tissues. E, IHC intensity score for KLF4, E2F3, COL17A1, and E2F3. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM. F, Representative IHC slide of COL17A1 and E2F3 with NDBE on the left and EAC to the right.


(Fig. 3C). Interestingly, the gene signature included four lncRNAs not previously associated with EAC, such as DLGAP2-AS1 (log2FC_EAC/NE 4.9, FDR_EAC/NE = 1.7e-49; log2FC_EAC/NDBE 2.9, FDR_EAC/NDBE 8.3e-22; log2FC_EAC/LGD 2.0, FDR_EAC/LGD 2.4e-6; Fig. 3B).

Next, 12 genes identified in all four methods and differentially expressed between EAC and NDBE (Supplementary Fig. S3B) were evaluated for their capacity to differentiate between benign and malignant Barrett's tissues in two publicly available microarray datasets [GSE37203, ref. (29); GSE26886, ref. (30)] comparing 87 EAC to non-dysplastic BE tissues (GSE37203, n = 46; GSE26886, n = 41). All corresponding probes of the 12 genes were identified in GSE26886, whereas probes mapping to PM20D2 could not be found in GSE37203. Each gene's discrimination characteristics were then assessed by plotting ROC curves and calculating corresponding areas under the curve (AUC), sensitivities, and specificities. These results are summarized in Supplementary Table S3. Our random forest classification analysis of combined external datasets identified four of the 12 genes (KLF4, E2F3, COL17A1, CTSL) as main drivers of differences, thus contributing the most to classification accuracies (Fig. 3D, top and bottom). Near-perfect distinction of disease states could be achieved when combining these four genes into an EAC-specific gene signature (AUROC of combined validation sets: 0.965 [0.931–1.000]; Fig. 3D, right; Supplementary Fig. S3C). As this 4-gene signature had been derived through combining data from both our internal RNA-seq and the two external microarray datasets, we subsequently tested this novel 4-gene signature in a third, separate cohort of 84 BE and EAC tissues from 73 patients (GSE13898, ref. 31). In this cohort, we observed an AUC of 0.89 (95% CI 0.78–1.00), which translates to a sensitivity of 93% (95% CI 0.85–0.97) and specificity of 78% (95% CI 0.45–0.93). Intriguingly, these 4 genes also provided excellent separation of both gastric [AUROC: 0.917 (0.810–1.000), GSE19826, ref. 32] and colonic [AUROC: 0.974 (0.933–1.000), GSE23878, ref. 33] cancers from corresponding normal tissues (Fig. 3D, bottom right; Supplementary Fig. S3D), indicating a central role of these genes for epithelial gastrointestinal cancer formation.
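The AUC, sensitivity, and specificity figures reported above can be understood through a small sketch. It computes the AUC as the fraction of case/control pairs in which the case receives the higher score (the rank-based equivalent of the area under the ROC curve), and sensitivity and specificity at a single cutoff. The scores, the cutoff, and the function names are hypothetical, and the study's own computation may have differed.

```python
def auc(scores_pos, scores_neg):
    """Chance that a random positive outscores a random negative (ties = 0.5)."""
    wins = 0.0
    for pos in scores_pos:
        for neg in scores_neg:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def sensitivity_specificity(scores_pos, scores_neg, threshold):
    """Call a sample positive when its score is >= threshold."""
    true_pos = sum(1 for s in scores_pos if s >= threshold)
    true_neg = sum(1 for s in scores_neg if s < threshold)
    return true_pos / len(scores_pos), true_neg / len(scores_neg)

# Hypothetical signature scores (higher means more EAC-like).
eac_scores = [0.9, 0.8, 0.75, 0.4]
ndbe_scores = [0.1, 0.3, 0.5, 0.2]

area = auc(eac_scores, ndbe_scores)
sens, spec = sensitivity_specificity(eac_scores, ndbe_scores, threshold=0.45)
```

An AUC of 0.5 corresponds to a score that does not separate the groups at all, and 1.0 to perfect separation; sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes all cutoffs at once.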

To assess cell viability effects of individual shRNA-mediated gene knockdowns of the identified gene signature and EAC-specific coexpression network, data for two EAC cell lines (OE33 and JHESOAD1) were accessed through the Broad Institute's Project Achilles Data Portal (see Methods). In accordance with the identified expression patterns, knockdown of EAC downregulated genes tended to result in increased EAC cell proliferation rates. Accordingly, knockdown of EAC-specific upregulated genes, such as CTSL (log2FC_EAC/NE 4.0, FDR_EAC/NE = 7.7e-57; log2FC_EAC/NDBE 2.0, FDR_EAC/NDBE 1.9e-15; log2FC_EAC/LGD 2.0, FDR_EAC/LGD 5.5e-8), resulted in a marked decrease in EAC cell viability (Supplementary Fig. S4A). Similar effects were observed in other GI-cancer cell lines, indicating that the identified genes may be important for GI malignancy formation and perhaps not specific to EAC (Supplementary Fig. S4B). Despite this, some of the genes identified as producing the lowest cell viabilities if knocked down, such as PRKDC, may potentially be exploited as therapeutic targets (34).

Expression of the gene signature proteins

Protein expression of the genes identified in our 4-gene EAC-specific signature (CTSL, COL17A1, E2F3, and KLF4) was investigated by IHC. Following quantitative analysis, significant protein overexpression of CTSL and E2F3 in EAC compared with NDBE could be confirmed (Fig. 3E and F; Supplementary Fig. S4C). Equally, the significant downregulation of COL17A1 in EAC compared with NDBE, as implied through our gene-expression analyses, was also confirmed in our IHC analysis (Fig. 3E and F), whereas there was no significant difference in protein expression levels for KLF4.

Long noncoding RNAs show characteristic differentialexpression across different stages of Barrett's disease

Our machine-learning approach identified lncRNAs thatwere specifically expressed in EAC, and thus potentially drivingEAC development. Therefore, we sought to systematically char-acterize lncRNA dysregulation at different stages of Barrett'sdisease. Previously, only two lncRNAs (HNF1A-AS1 and AFAP-AS1) have been reported as upregulated in EAC (16, 17).Although these were upregulated in EAC compared with NE,we observed no difference between EAC and NDBE/LGD (Sup-plementary Fig. S5A and S5B). In total, our whole transcrip-tome analyses identify 685 significantly dysregulated lncRNAsin EAC compared with NE, NDBE, and LGD (SupplementaryTable S4).

Of the 685 identified dysregulated lncRNAs, 19 overlappedFantom5 enhancer RNAs (eRNA), suggesting that these are moreinvolved in regulating gene expression than exerting individualmolecular functions.

Unsupervised hierarchical clustering of the top 20 DE lncRNAsbetween EAC and the other groups showed highly accurate tissueclassification (Fig. 4A; Supplementary Table S4), which representsthe first of its kind for lncRNAs.

Comparing EAC to its precursor lesions and NE shows thatthe majority of lncRNAs tend to be upregulated in EAC(Fig. 4B). The two largest classes of DE lncRNAs are antisenseand lincRNA, which also represent the two largest classes ofannotated lncRNAs (Fig. 4C).

Furthermore, we observed significant expression correlation of 93% (129 of 139) of DE lncRNA–mRNA neighboring pairs (Fig. 4D). This was validated for 15 available lncRNA–mRNA pairs in the EAC TCGA dataset (Supplementary Fig. S5C). Guilt-by-association analysis, through k-means clustering, further revealed 53 and 102 lncRNAs in clusters enriched for biological terms associated with cell cycle and immune system, respectively, suggesting these lncRNAs may play a role in those processes of EAC development (Fig. 4E).
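The Figure 4 legend names Pearson correlation as the measure used for neighboring lncRNA–mRNA pairs. A minimal sketch of that coefficient is given below; the pair names and expression values are hypothetical, and no significance test is included because the test used is not described in this section.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical expression of two lncRNA-mRNA neighbor pairs in six samples.
toy_pairs = {
    ("lncRNA_A", "mRNA_A"): ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                             [2.1, 3.9, 6.2, 8.1, 9.8, 12.0]),
    ("lncRNA_B", "mRNA_B"): ([1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                             [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]),
}
correlations = {pair: pearson_r(lnc, mrna) for pair, (lnc, mrna) in toy_pairs.items()}

# Share of correlated pairs reported in the text: 129 of 139.
reported_share = 129 / 139
```

The reported share of 129 of 139 pairs corresponds to 92.8%, which rounds to the 93% stated in the text.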

The expression of four top DE lncRNAs was validated by qRT-PCR in 41 patients (NE = 13, BE = 13, LGD = 6, EAC = 9), confirming significant overexpression of DLGAP1-AS2, DLEU2, AC108488.3, and PCAT7 compared with control tissues (Fig. 4F). With DLGAP1-AS2 also having been identified as a potential driver gene for EAC formation in our machine-learning analysis, its role as an important biomarker for EAC development warrants further exploration.

Taken together, these results indicate an important role for lncRNAs in esophageal adenocarcinogenesis.

Repeat elements are detectable by RNA-seq and show differential expression patterns across different stages of Barrett's disease

A novel endogenous retrovirus-associated long noncoding RNA (EVADR) has recently been reported in various human adenocarcinomas, but EAC was not included in the analysis (35). Here, we report elevated expression levels of EVADR in EAC compared with NE (Fig. 5C; log2FC(EAC/NE) = 4.9, FDR(EAC/NE) = 1.2e-11), confirming the previously reported overexpression of EVADR specifically occurring in adenocarcinomas. However, we also detected substantially elevated levels of EVADR both in NDBE and LGD (log2FC(NDBE/NE) = 3.3, FDR(NDBE/NE) = 2.7e-06; log2FC(LGD/NE) = 2.0, FDR(LGD/NE) = 1.4e-2), indicating the preneoplastic nature of these metaplastic conditions and a potential role for EVADR in step-wise malignancy formation (Fig. 5C).
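To put the reported log2 fold changes on a linear scale, the conversion is simply 2 raised to the log2FC. The short sketch below applies it to the three EVADR values quoted above; the function and variable names are illustrative only.

```python
def fold_change(log2_fc):
    """Convert a log2 fold change to a linear fold change."""
    return 2.0 ** log2_fc


# EVADR log2 fold changes relative to NE, as quoted in the text.
reported_evadr_log2fc = {
    "EAC vs NE": 4.9,
    "NDBE vs NE": 3.3,
    "LGD vs NE": 2.0,
}

for comparison, value in reported_evadr_log2fc.items():
    print(f"{comparison}: log2FC {value} is about {fold_change(value):.1f}-fold")
```

On this scale, EVADR is roughly 30-fold higher in EAC than in NE, about 10-fold higher in NDBE, and 4-fold higher in LGD.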

Seeing that many lncRNAs, including EVADR, are repeat enriched (36), we explored the expression of repeat elements at different stages of EAC development. DE analysis of the four largest repeat types revealed wide dysregulation between EAC, NDBE, LGD, and normal samples, with the largest repeat element differential expression patterns found between EAC and normal tissues (Fig. 5A; Supplementary Table S5). Overall, long terminal repeats (LTR), the class of repeats enriched in EVADR, and LINE elements show the highest expression in EAC, while SINE elements are mostly downregulated in EAC (Fig. 5B). This latter finding is especially interesting as we also found that APOBEC3G, an inhibitor of SINE elements (37), is upregulated in EAC compared with other tissues (log2FC(EAC/NE) = 0.94, FDR(EAC/NE) = 1.1e-4; log2FC(EAC/NDBE) = 0.90, FDR(EAC/NDBE) = 5.3e-4; log2FC(EAC/LGD) = 1.3, FDR(EAC/LGD) = 4.7e-4).

Previous studies have shown that the REP522 family is upregulated in most promoters in a variety of cancers (38). In our data we also found an increase of REP522 expression between EAC and NE, NDBE, and LGD (log2FC(EAC/NE) = 0.89, FDR(EAC/NE) = 4.0e-4; log2FC(EAC/NDBE) = 1.6, FDR(EAC/NDBE) = 1.4e-8; log2FC(EAC/LGD) = 1.0, FDR(EAC/LGD) = 5.9e-3).

Figure 4.

lncRNA expression changes in EAC and premalignant lesions. A, Hierarchically clustered heatmap of the top 20 differentially expressed lncRNAs between each condition and EAC normalized as z-score per gene. B, Number of upregulated (red) and downregulated (blue) differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes between EAC and LGD/NDBE/NE. The respective lncRNA class is shown in the pie chart to the right. C, Representative location of each lncRNA class as classified in Gencode. D, Pearson correlation of differentially expressed neighbor lncRNA-mRNA pairs, subgrouped per class. E, K-means clustering of all differentially expressed Gencode genes between EAC and NE/NDBE/LGD with mean cluster expression (blue/red), number of genes in each cluster (white/green), number of lncRNAs per cluster (white/black), and the top GO biological process enrichment term to the right. F, IGV coverage plot (left) and qPCR (right) validation of (top-to-bottom) DLGAP1-AS2, DLEU2, AC108488.3, and PCAT7. *, P ≤ 0.05; **, P ≤ 0.01; ***, P ≤ 0.001; n.s., nonsignificant; mean ± SEM.

Centromere satellites and mRNA show dysregulated patterns in EAC

Through our repeat element analysis, we also observed increased expression of three centromere satellites in EAC compared with other patient groups (Fig. 5D). α-Satellites, found in the centromere regions of all chromosomes, have been linked to aneuploidy, a well-known feature of EAC (39, 40). In light of these observations, we investigated centromere protein (CENP) expression. CENP are involved in cell-cycle control and are upregulated in several cancers, where they instigate chromosomal instability (41, 42). We found the majority of all CENP to be upregulated in EAC compared with NDBE, LGD, and NE (Fig. 5E). This feature has not been reported previously in EAC; it may be involved in the frequent chromothriptic events and aneuploidy observed in this malignancy (8).

Whole transcriptome sequencing successfully identifies mutations prevalent in EAC and allows for the discovery of an RNA-seq mutational signature of different stages of Barrett's disease

Although RNA-seq is mainly employed to study gene expression, it has been shown that it is possible to identify mutations in expressed genes (43, 44).

To identify commonly occurring mutational patterns in expressed transcripts, we investigated the overall transcriptional mutation signatures of each sample (Supplementary Methods). Based on five discovered signatures, we saw that the majority of EAC cluster closely together, as most EAC tend to have reduced frequencies of signature 2, which mainly consists of C>T variants (Supplementary Fig. S5D and S5E). Interestingly, out of the two LGD and one NDBE patients clustering closely to EAC, one LGD patient later progressed to develop EAC.
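Signature analyses of this kind start from a per-sample tally of substitution classes such as C>T. The sketch below shows only that tallying step, under the common convention of collapsing each change onto the pyrimidine of the reference base pair; whether the study used this convention for RNA-derived variants is an assumption, the signature decomposition itself (described in the Supplementary Methods) is not reproduced, and all names and variants are hypothetical.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
CLASSES = ("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")


def substitution_class(ref, alt):
    """Map a single-base change onto one of six pyrimidine-based classes."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        # Report purine references on the complementary strand.
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def substitution_spectrum(variants):
    """Return the fraction of variants falling in each substitution class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in variants)
    total = sum(counts.values())
    return {cls: (counts[cls] / total if total else 0.0) for cls in CLASSES}


# Hypothetical (ref, alt) calls from one sample.
toy_variants = [("C", "T"), ("G", "A"), ("T", "C"), ("C", "A"), ("A", "G")]
toy_spectrum = substitution_spectrum(toy_variants)
```

A sample with a reduced frequency of a C>T-dominated signature would be expected to show a correspondingly lower C>T fraction in such a spectrum.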

Next, we focused our analysis on recurrently mutated SNPs in EAC (9, 28, 40). In concordance with previous studies (9), we observed a high percentage (50%) of EAC patients with missense mutations in TP53, whereas no TP53 mutation was detected in NDBE or LGD (Fig. 6A). All TP53 mutations had previously been reported in the COSMIC database. Further, mutations in SMARCA4 and CDKN2A were restricted to EAC and LGD patients (Fig. 6A), suggesting that these could be important drivers in tumor development.

Figure 5.

Dysregulation of repeat elements and centromere proteins at different stages of EAC development. A, Expression of EVADR, a long-terminal repeat (LTR) enriched lncRNA previously described in other human adenocarcinomas but uncharacterized in Barrett's disease. B, Number of differentially expressed (FDR ≤ 0.05) repeat elements from the largest four classes (DNA, LINE, LTR, and SINE elements). Y-axis indicates number of differentially expressed repeat elements. C, Heatmap of differentially expressed (FDR ≤ 0.05) Alu/SINE elements between LGD/NDBE/NE and EAC illustrating wide-ranging downregulation of Alu elements in EAC. D, Significant differential expression (FDR ≤ 0.05) and upregulation of centromere satellites in EAC (red) compared with NE (purple), NDBE (blue) and LGD (green). E, Log-fold change of all centromere protein encoding genes in EAC compared with NE/NDBE/LGD. *, FDR ≤ 0.05.

RNA-seq identifies severe dysregulation of the transcriptome in TP53 mutated EACs and wide-ranging metabolic transcriptional dysregulation in TP53 wild-type tumors compared with BE

With 50% of EAC patients having either wild-type TP53 (TP53+) or mutant TP53 (TP53−) transcripts, we compared the gene expression profile of TP53+ and TP53− patient groups to NDBE. We found that TP53− EACs have approximately 1.7 times as many DE genes (n = 3000) as TP53+ tumors (n = 1776) when compared with BE (Fig. 6C). Moreover, we found that TP53− tumors showed significant enrichment for cell cycle processes, whereas TP53+ tumors showed enrichment for cell metabolic functions when compared with BE (Supplementary Fig. S6A). Genes commonly differentially expressed between both types of EACs and BE showed enrichment for inflammatory response processes, indicating two highly distinct pathways of EAC formation with inflammation (potentially triggered by gastroesophageal reflux) as a common pathway in malignancy formation. This was further supported by our selective analysis of genes involved in the TP53 pathway, where we found that TP53− and TP53+ EACs show different expression of TP53 pathway genes, with MDM2 upregulation and TP53AIP downregulation being a characteristic trait of TP53 wild-type tumors compared with BE (Supplementary Fig. S6B).

Discussion

In this study, we characterized transcriptomic alterations occurring at different stages of Barrett's disease through RNA-seq. We have identified key players in EAC-specific gene networks and describe a novel EAC-specific 4-gene expression signature. Our investigation of the noncoding RNA landscape and assessment of expressed gene mutations have also identified new processes involved in the development of this disease.

Figure 6.

Identification of transcribed SNPs present in EAC and premalignant lesions of the esophagus. A, Variants (missense, synonymous, frame shift, stop gained, inframe insertions, splice variant) in known genomically mutated genes across EAC, LGD, and NDBE with aggregated percentage of patients with mutations (bar chart left), and the corresponding frequency of mutations as determined in Broad/TCGA genomic EAC data on the top of each bar. Mutations found in COSMIC were marked with an asterisk. B, MA plot of gene expression between TP53 wild-type (+) EAC, TP53 mutated (−) EAC and Barrett's intestinal metaplasia. Significantly differentially expressed (absolute log2 FC ≥ 1, FDR ≤ 0.05) genes are colored red (upregulated) or blue (downregulated).


We identified upregulated transcription factors (e.g., EGR1, EGR3, FOSB, FOS, NR4A1, ATF3) in Barrett's with LGD and in EAC that have not previously been associated with early stage Barrett's disease progression and could thus be tested as novel biomarkers.

Our investigation of the mRNA landscape through weighted gene coexpression network analysis (WGCNA) revealed a plethora of highly connected, differentially expressed genes mostly involved in cell-cycle control and DNA repair mechanisms. Prominent members of this network included genes not previously characterized in EAC such as BRCA1 and PRKDC. PRKDC was also overexpressed at the protein level, and previous PRKDC knockdown data produced some of the lowest measured viabilities in the two tested EAC as well as multiple other gastrointestinal cancer cell lines. The protein kinase, DNA-activated, catalytic polypeptide (PRKDC, or DNA-PKcs) is critical for DNA double strand break repair and recombination (45, 46). As such, PRKDC activity in EAC is in line with the genomic rearrangement recently described as a central feature of EAC (8, 28). However, PRKDC is also a central regulator of transcriptional networks, facilitating tumor formation and progression (47). PRKDC may also have clinical relevance as a predictor of sensitivity to chemotherapy, as reported for esophageal SCC (48), and it has been reported as an attractive novel chemotherapeutic target (34).

Through machine learning, we discovered and validated a novel gene signature that displayed near-perfect distinction of EAC and BE tissues in two public datasets (GSE37203, ref. 29; GSE26886, ref. 30) and in a third, completely independent validation set (GSE13898, ref. 31). Although, to our knowledge, the derived gene list lacks genes previously associated with EAC and its development, many are associated with other cancers (Supplementary Table S6). Furthermore, our random forest analysis determined four genes (CTSL, COL17A1, E2F3, and KLF4) that could accurately differentiate normal from malignant Barrett's tissues both at the transcript and protein level for CTSL, COL17A1, and E2F3, while only at the RNA expression level for KLF4. E2F3 and KLF4 were equally important for distinguishing benign and malignant gastric and colorectal tissues, suggesting possible roles as onco- and tumor suppressor genes in epithelial GI cancers. E2F family member overexpression and loss of KLF4 expression are known phenomena in both colon and gastric cancer, whereas in colorectal cancer, KLF4 is regarded as a tumor suppressor gene that may also mediate epithelial-mesenchymal transition (49). We surmised that E2F family member overexpression may be the result of defective RB1 signaling, which is known to occur in EAC (3). Moreover, of the other protein coding genes identified through this machine learning and network analysis, many displayed characteristics in public shRNA-mediated gene knockdown data, indicating that they may be drivers of EAC formation and propagation.
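One common way to quantify how well a score separates two tissue groups is the area under the ROC curve (AUC). The sketch below computes it by direct pairwise comparison. This section does not state which metric underlies "near-perfect distinction", so the use of AUC is an assumption for illustration; the scores are invented and are not outputs of the random forest used in the study.

```python
def roc_auc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as one half, which matches the Mann-Whitney formulation.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier scores (higher means "more EAC-like").
toy_eac_scores = [0.91, 0.84, 0.77, 0.95, 0.52]
toy_be_scores = [0.12, 0.30, 0.08, 0.58, 0.21]
toy_auc = roc_auc(toy_eac_scores, toy_be_scores)
```

An AUC of 1.0 would mean every EAC sample scores above every BE sample, and 0.5 corresponds to no separation. In the toy data a single BE sample outscores a single EAC sample.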

Using whole-transcriptome sequencing we discovered 685 lncRNAs differentially expressed between EAC and NE, NDBE, and LGD. Some of these were also identified in our EAC-specific gene signature. lncRNA-mRNA coexpression analysis suggested that lncRNAs function in malignancy-relevant processes such as mitosis, development, and extracellular matrix organizational processes.

Although the identified eRNAs only make up ~3% of the total dysregulated lncRNAs, these elements point to the potential involvement of dysregulated enhancers in EAC. Enhancers regulate the transcription of target genes; however, identifying the targets can be difficult because enhancers can act upon large areas of the genome rather than only regulating genes in cis. The use of chromosome conformation capture methods, such as Hi-C, will help identify the genomic targets of the enhancers in EAC.

Further studies should now aim at establishing the functional role of the identified lncRNAs in EAC development and how these may be exploited from a prognostic and therapeutic perspective.

We found many classes of repeat elements to be upregulated in EAC. We observed increased expression of LINE elements, consistent with recent reports describing increased LINE-1 retrotransposition activity in both BE and EAC (21). Interestingly, however, Alu elements, primate-specific short interspersed nuclear elements (SINE), showed almost uniform downregulation in EAC compared with premalignant Barrett's and normal squamous tissues. Alu elements are known regulators of gene expression (50), and thus their downregulation in EAC could indicate loss of important regulatory elements facilitating malignancy formation.

Previous studies have identified similar patterns of upregulation in repeat elements in a variety of cancers. In hepatocellular carcinoma, LTR promoters show increased activation, and tumors with high LTR expression were less differentiated compared with tumors with low LTR expression (51).

Furthermore, a study conducting a pan-cancer investigation noticed that promoters containing SINE, LINE, and LTRs were upregulated in cancers (38). In our study and in concordance with previous studies, we observed that the majority of LTR and LINE elements were upregulated in EAC. Our study also confirms the upregulation of REP522, which has previously been indicated in cancer (38). What differs from previous studies is that we observed a downregulation of SINE in EAC. Further studies should aim to elucidate the role of repeat elements in EAC.

Intriguingly, comparing gene expression profiles of EAC tumors with or without defective TP53 transcripts to NDBE identified two highly distinct gene enrichment profiles: TP53-mutated tumors were enriched for cell-cycle division and mitotic processes, whereas TP53 wild-type tumors were enriched for metabolic processes. While the acquisition of TP53 mutations has been identified as a key step in Barrett's disease progression (9), approximately 30% of EACs harbor wild-type TP53 (28). Thus, we propose severe dysregulation in metabolic pathways as an alternative, non-TP53 mutation dependent form of EAC development.

There are limitations to our findings. First, this study is descriptive by design and cannot provide information regarding risk of Barrett's progression, as we studied samples from different individuals at different disease stages. Therefore, spatiotemporal gene expression variability, or biologic variance due to sampling of different regions of the esophagus or capturing of particular BE cell clones/subpopulations, cannot be ruled out as explanations for our findings. A second limitation of our study is the lack of Barrett's tissues with high-grade dysplasia (HGD), although we note that recent genomic (9) and gene expression (52) studies indicate that the alterations occurring in HGD are very similar to those identified in EAC.

In summary, by performing whole transcriptome sequencing we have described novel genes and transcription elements and elucidated signaling pathways previously not observed or under-characterized in the Barrett's metaplasia–dysplasia–neoplasia sequence. Through external validation of our findings, we identified a novel four gene signature, which provides almost complete differentiation of EAC from BE but requires further prospective testing. Our results also suggest separate pathways to EAC in TP53 mutant and nonmutant tumors, with inflammation, presumably due to the severe gastroesophageal reflux that is involved in initiating Barrett's disease, as a common factor.

Disclosure of Potential Conflicts of Interest

No potential conflicts of interest were disclosed.

Authors' Contributions

Conception and design: J.L.V. Maag, O.M. Fisher, A. Levert-Mignon, M.L. Thomas, Y.V. Bobryshev, M.E. Dinger, R.V. Lord
Development of methodology: J.L.V. Maag, O.M. Fisher, D.C. Kaczorowski, R.V. Lord
Acquisition of data (provided animals, acquired and managed patients, provided facilities, etc.): O.M. Fisher, A. Levert-Mignon, M.L. Thomas, D.J. Hussey, D.I. Watson, A. Wettstein, Y.V. Bobryshev, M. Edwards, R.V. Lord
Analysis and interpretation of data (e.g., statistical analysis, biostatistics, computational analysis): J.L.V. Maag, O.M. Fisher, Y.V. Bobryshev, R.V. Lord
Writing, review, and/or revision of the manuscript: J.L.V. Maag, O.M. Fisher, M.L. Thomas, D.J. Hussey, D.I. Watson, Y.V. Bobryshev, M.E. Dinger, R.V. Lord
Administrative, technical, or material support (i.e., reporting or organizing data, constructing databases): O.M. Fisher, A. Levert-Mignon, D.C. Kaczorowski, M.L. Thomas, R.V. Lord
Study supervision: D.I. Watson, M.E. Dinger, R.V. Lord

Acknowledgments

The authors would like to thank Alice Boulghoujian and Anaiis Zaratzian of the Garvan Institute for Medical Research Histopathology Department for their excellent technical assistance with all immunohistochemical analyses.

Grant Support

This work was supported by the Progression of Barrett's Esophagus to Cancer Network (PROBE-Net), the Australian National Health and Medical Research Council (NHMRC1040947), Cancer Council New South Wales (SRP 08-04 and RG 13-03), and St Vincent's Clinic Foundation, Sydney. O.M. Fisher is supported by the Swiss Cancer League (BIL-KLS133-02-2013), the Australian National Health & Medical Research Council (GNT1094423) and the Swiss National Science Foundation (P1SKP3_161806).

The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Received June 25, 2017; revised June 30, 2017; accepted July 21, 2017; published OnlineFirst July 27, 2017.

References

1. Pohl H, Welch HG. The role of overdiagnosis and reclassification in the marked increase of esophageal adenocarcinoma incidence. J Natl Cancer Inst 2005;97:142–6.
2. Edgren G, Adami HO, Weiderpass E, Nyrén O. A global assessment of the oesophageal adenocarcinoma epidemic. Gut 2013;62:1406–14.
3. Clemons NJ, Phillips WA, Lord RV. Signaling pathways in the molecular pathogenesis of adenocarcinomas of the esophagus and gastroesophageal junction. Cancer Biol Ther 2013;14:782–95.
4. Enzinger PC, Mayer RJ. Esophageal cancer. N Engl J Med 2003;349:2241–52.
5. Rustgi AK, El-Serag HB. Esophageal carcinoma. N Engl J Med 2014;371:2499–509.
6. van Hagen P, Hulshof MC, van Lanschot JJ, Steyerberg EW, van Berge Henegouwen MI, Wijnhoven BP, et al. Preoperative chemoradiotherapy for esophageal or junctional cancer. N Engl J Med 2012;366:2074–84.
7. Weaver JM, Ross-Innes CS, Fitzgerald RC. The "-omics" revolution and oesophageal adenocarcinoma. Nat Rev Gastroenterol Hepatol 2014;11:19–27.
8. Nones K, Waddell N, Wayte N, Patch AM, Bailey P, Newell F, et al. Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis. Nat Commun 2014;5:5224.
9. Weaver JMJ, Ross-Innes CS, Shannon N, Lynch AG, Forshew T, Barbera M, et al. Ordering of mutations in preinvasive disease stages of esophageal carcinogenesis. Nat Genet 2014;46:837–43.
10. Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 2011;464:1071–6.
11. Kretz M, Siprashvili Z, Chu C, Webster DE, Zehnder A, Qu K, et al. Control of somatic tissue differentiation by the long non-coding RNA TINCR. Nature 2012;493:231–5.
12. Wang P, Xue Y, Han Y, Lin L, Wu C, Xu S, et al. The STAT3-binding long noncoding RNA lnc-DC controls human dendritic cell differentiation. Science 2014;344:310–3.
13. Lin N, Chang KY, Li Z, Gates K, Rana ZA, Dang J, et al. An evolutionarily conserved long noncoding RNA TUNA controls pluripotency and neural lineage commitment. Mol Cell 2014;53:1005–19.
14. Engreitz JM, Pandya-Jones A, McDonel P, Shishkin A, Sirokman K, Surka C, et al. The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science 2013;341:1237973.
15. Quek XC, Thomson DW, Maag JLV, Bartonicek N, Signal B, Clark MB, et al. lncRNAdb v2.0: expanding the reference database for functional long noncoding RNAs. Nucleic Acids Res 2015;43:D168–73.
16. Yang X, Song JH, Cheng Y, Wu W, Bhagat T, Yu Y, et al. Long non-coding RNA HNF1A-AS1 regulates proliferation and migration in oesophageal adenocarcinoma cells. Gut 2014;63:881–90.
17. Wu W, Bhagat TD, Yang X, Song JH, Cheng Y, Agarwal R, et al. Hypomethylation of noncoding DNA regions and overexpression of the long noncoding RNA, AFAP1-AS1, in Barrett's esophagus and esophageal adenocarcinoma. Gastroenterology 2013;144:956–966.e4.
18. de Koning AP, Gu W, Castoe TA, Batzer MA, Pollock DD. Repetitive elements may comprise over two-thirds of the human genome. PLoS Genet 2011;7:e1002384.
19. Thibodeau S, Bren G, Schaid D. Microsatellite instability in cancer of the proximal colon. Science 1993;260:816–9.
20. Gasior SL, Wakeman TP, Xu B, Deininger PL. The human LINE-1 retrotransposon creates DNA double-strand breaks. J Mol Biol 2006;357:1383–93.
21. Doucet-O'Hare TT, Rodić N, Sharma R, Darbari I, Abril G, Choi JA, et al. LINE-1 expression and retrotransposition in Barrett's esophagus and esophageal carcinoma. Proc Natl Acad Sci U S A 2015;112:E4894–900.
22. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 2012;29:15–21.
23. Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics 2015;31:166–9.
24. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 2008;9:559.
25. Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 2009;25:2078–9.
26. Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-Garcia W, et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat Commun 2013;4.
27. Lao-Sirieix P, Fitzgerald RC. Screening for oesophageal cancer. Nat Rev Clin Oncol 2012;9:278–87.


28. Dulak AM, Stojanov P, Peng S, Lawrence MS, Fox C, Stewart C, et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet 2013;45:478–86.
29. Silvers AL, Lin L, Bass AJ, Chen G, Wang Z, Thomas DG, et al. Decreased selenium-binding protein 1 in esophageal adenocarcinoma results from posttranscriptional and epigenetic regulation and affects chemosensitivity. Clin Cancer Res 2010;16:2009–21.
30. Wang Q, Ma C, Kemmner W. Wdr66 is a novel marker for risk stratification and involved in epithelial-mesenchymal transition of esophageal squamous cell carcinoma. BMC Cancer 2013;13:137.
31. Kim SM, Park YY, Park ES, Cho JY, Izzo JG, Zhang D, et al. Prognostic biomarkers for esophageal adenocarcinoma identified by analysis of tumor transcriptome. PLoS One 2010;5:e15074.
32. Wang Q, Wen YG, Li DP, Xia J, Zhou CZ, Yan DW, et al. Upregulated INHBA expression is associated with poor survival in gastric cancer. Med Oncol 2010;29:77–83.
33. Uddin S, Ahmed M, Hussain A, Abubaker J, Al-Sanea N, AbdulJabbar A, et al. Genome-wide expression analysis of middle eastern colorectal cancer reveals FOXM1 as a novel target for cancer therapy. Am J Pathol 2011;178:537–47.
34. Riabinska A, Daheim M, Herter-Sprie GS, Winkler J, Fritz C, Hallek M, et al. Therapeutic targeting of a robust non-oncogene addiction to PRKDC in ATM-defective tumors. Sci Transl Med 2013;5:189ra78.
35. Gibb EA, Warren RL, Wilson GW, Brown SD, Robertson GA, Morin GB, et al. Activation of an endogenous retrovirus-associated long non-coding RNA in human adenocarcinoma. Genome Med 2015;7:101.
36. Kapusta A, Kronenberg Z, Lynch VJ, Zhuo X, Ramsay L, Bourque G, et al. Transposable elements are major contributors to the origin, diversification, and regulation of vertebrate long noncoding RNAs. PLoS Genet 2013;9:e1003470.
37. Hulme AE, Bogerd HP, Cullen BR, Moran JV. Selective inhibition of Alu retrotransposition by APOBEC3G. Gene 2007;390:199–205.
38. Kaczkowski B, Tanaka Y, Kawaji H, Sandelin A, Andersson R, Itoh M, et al. Transcriptome analysis of recurrently deregulated genes across multiple cancers identifies new pan-cancer biomarkers. Cancer Res 2016;76:216–26.
39. Dulak AM, Schumacher SE, van Lieshout J, Imamura Y, Fox C, Shim B, et al. Gastrointestinal adenocarcinomas of the esophagus, stomach, and colon exhibit distinct patterns of genome instability and oncogenesis. Cancer Res 2012;72:4383–93.
40. Stachler MD, Taylor-Weiner A, Peng S, McKenna A, Agoston AT, Odze RD, et al. Paired exome analysis of Barrett's esophagus and adenocarcinoma. Nat Genet 2015;47:1047–55.
41. Guo XZ, Zhang G, Wang JY, Liu WL, Wang F, Dong JQ, et al. Prognostic relevance of Centromere protein H expression in esophageal carcinoma. BMC Cancer 2008;8:233.
42. Tomonaga T. Centromere protein H is up-regulated in primary human colorectal cancer and its overexpression induces aneuploidy. Cancer Res 2005;65:4683–9.
43. Blachly JS, Ruppert AS, Zhao W, Long S, Flynn J, Flinn I, et al. Immunoglobulin transcript sequence and somatic hypermutation computation from unselected RNA-seq reads in chronic lymphocytic leukemia. Proc Natl Acad Sci U S A 2015;112:4323–7.
44. The Cancer Genome Atlas Research Network. Comprehensive molecular profiling of lung adenocarcinoma. Nature 2014;511:543–50.
45. Kurimasa A, Kumano S, Boubnov NV, Story MD, Tung CS, Peterson SR, et al. Requirement for the kinase activity of human DNA-dependent protein kinase catalytic subunit in DNA strand break rejoining. Mol Cell Biol 1999;19:3877–84.
46. Zhao Y, Thomas HD, Batey MA, Cowell IG, Richardson CJ, Griffin RJ, et al. Preclinical evaluation of a potent novel DNA-dependent protein kinase inhibitor NU7441. Cancer Res 2006;66:5354–62.
47. Goodwin JF, Kothari V, Drake JM, Zhao S, Dylgjeri E, Dean JL, et al. DNA-PKcs-mediated transcriptional regulation drives prostate cancer progression and metastasis. Cancer Cell 2015;28:97–113.
48. Noguchi T, Shibata T, Fumoto S, Uchida Y, Mueller W, Takeno S. DNA-PKcs expression in esophageal cancer as a predictor for chemoradiation therapeutic sensitivity. Ann Surg Oncol 2002;9:1017–22.
49. Wei D. Emerging role of KLF4 in human gastrointestinal cancer. Carcinogenesis 2005;27:23–31.
50. Häsler J, Strub K. Alu elements as regulators of gene expression. Nucleic Acids Res 2006;34:5491–7.
51. Hashimoto K, Suzuki AM, Dos Santos A, Desterke C, Collino A, Ghisletti S, et al. CAGE profiling of ncRNAs in hepatocellular carcinoma reveals widespread activation of retroviral LTR promoters in virus-induced tumors. Genome Res 2015;25:1812–24.
52. Varghese S, Newton R, Ross-Innes CS, Lao-Sirieix P, Krishnadath KK, O'Donovan M, et al. Analysis of dysplasia in patients with Barrett's esophagus based on expression pattern of 90 genes. Gastroenterology 2015;149:1511–5.


Jesper L.V. Maag, Oliver M. Fisher, Angelique Levert-Mignon, et al. Novel Aberrations Uncovered in Barrett's Esophagus and Esophageal Adenocarcinoma Using Whole Transcriptome Sequencing. Mol Cancer Res 2017;15:1558-1569. Published OnlineFirst July 27, 2017. doi: 10.1158/1541-7786.MCR-17-0332

Supplementary material:
http://mcr.aacrjournals.org/content/suppl/2017/07/25/1541-7786.MCR-17-0332.DC1
http://mcr.aacrjournals.org/content/suppl/2017/07/26/1541-7786.MCR-17-0332.DC2
