
Identification of novel risk variants for sarcoma and other cancers by whole exome sequencing analysis in cancer cluster families

Submitted by Rachel Jones

This thesis is presented for the degree of Doctor of Philosophy

The University of Western Australia
School of Surgery

2017


Declaration

I, Rachel Jones, certify that:

This thesis has been substantially accomplished during enrolment in the degree.

This thesis does not contain material which has been accepted for the award of any other degree or diploma in my name, in any university or other tertiary institution.

No part of this work will, in the future, be used in a submission in my name, for any other degree or diploma in any university or other tertiary institution without the prior approval of The University of Western Australia and where applicable, any partner institution responsible for the joint-award of this degree.

This thesis does not contain any material previously published or written by another person, except where due reference has been made in the text.

The work(s) are not in any way a violation or infringement of any copyright, trademark, patent, or other rights whatsoever of any person.

The research involving human data reported in this thesis was assessed and approved by The University of Western Australia Human Research Ethics Committee. Approval number: RA/4/1/6434.

Third party editorial assistance was provided in the preparation of the thesis by Dr Tegan McNab.


For Gareth and Abbie

“... [A] knowledge of sequences could contribute much to our understanding of living matter.”

Frederick Sanger [1980]


Abstract

Cancer is a genetic disease caused by an accumulation of genetic and epigenetic alterations. Cancers can be caused by mutations that arise in single somatic cells, resulting in sporadic tumours, or by mutations that occur in the germline, resulting in a hereditary predisposition to cancer. While only a small proportion of cancers are estimated to involve an inherited genetic mutation, familial clustering of cancers is relatively common. More than 100 cancer predisposition genes have been identified using a variety of genetic strategies. However, established cancer susceptibility genes explain only a small proportion of familial cancer risk. The identification of genes that predispose individuals to cancer is of high importance in human medical research because inherited genetic variants in genes that metabolise and process drugs can influence response to treatment.

Sarcomas are a rare group of cancers that arise predominantly from the connective tissues of the body. Despite representing only 1% of all cancers, sarcomas are a high-impact group of cancers that disproportionately affect the young. While it is sometimes difficult to distinguish sporadic from hereditary cancer, the occurrence of a rare cancer such as sarcoma twice within one family is epidemiologically striking. Whole exome sequencing (WES) of families currently represents an optimal study design for identifying rare genetic variants involved in cancer risk. Families in which multiple members develop a rare form of cancer, such as sarcoma, are more likely to have a mutation segregating in an inherited cancer gene than families affected by more common types of cancer.


In this study, three cancer cluster families (19 individuals) with a sarcoma proband were selected from the International Sarcoma Kindred Study, and WES was performed on germline DNA from both affected and unaffected family members using the Ion Proton platform at 100X coverage. WES data were annotated using Annotate Variation (ANNOVAR) and the Regulome database (RegulomeDB). Putative structural variants were filtered by genomic location and variant class, and putative regulatory variants by RegulomeDB score. Three different strategies were used to prioritise rare private variants, known rare variants and candidate gene variants. Association and segregation analyses of the prioritised variants identified eight nominally significant germline risk variants, in the ARHGAP39, C16orf96, ABCB5, ZFP69B, UVSSA, BEAN1, KIF2C and PDIA2 genes, that show segregation with cancer in the families.
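The filtering and segregation logic summarised above can be illustrated with a minimal Python sketch. It assumes a hypothetical, simplified variant record (not the ANNOVAR or RegulomeDB output format), a hypothetical 1% population allele-frequency cut-off, and a simple dominant-model definition of segregation in which every sequenced affected relative carries the alternate allele. The criteria actually used in this thesis are set out in Chapter 3 and may differ.

```python
from typing import Dict, List, NamedTuple

# Hypothetical set of protein-altering variant classes retained by the filter.
PROTEIN_ALTERING = {
    "nonsynonymous SNV", "stopgain", "stoploss",
    "frameshift insertion", "frameshift deletion", "splicing",
}


class Variant(NamedTuple):
    """Simplified variant record; all field names are illustrative assumptions."""
    gene: str
    variant_class: str
    population_af: float          # population allele frequency, e.g. from ExAC
    genotypes: Dict[str, int]     # sample ID -> number of alternate alleles (0, 1 or 2)


def is_rare_protein_altering(v: Variant, max_af: float = 0.01) -> bool:
    """Keep protein-altering variants at or below a hypothetical frequency cut-off."""
    return v.variant_class in PROTEIN_ALTERING and v.population_af <= max_af


def segregates_with_disease(v: Variant, affected: List[str]) -> bool:
    """Dominant-model check: every affected relative carries at least one alternate allele.

    Missing genotypes count as non-carriers, which makes the check conservative.
    """
    return all(v.genotypes.get(sample, 0) >= 1 for sample in affected)


def unaffected_carriers(v: Variant, unaffected: List[str]) -> int:
    """Count unaffected carriers; tolerated here because penetrance may be incomplete."""
    return sum(1 for sample in unaffected if v.genotypes.get(sample, 0) >= 1)


# Toy pedigree with invented genotypes (not data from this study).
example = Variant(
    gene="GENE_X",
    variant_class="nonsynonymous SNV",
    population_af=0.002,
    genotypes={"II-1": 1, "II-2": 1, "II-3": 0, "III-1": 1},
)
affected_members = ["II-1", "III-1"]
unaffected_members = ["II-2", "II-3"]

if is_rare_protein_altering(example) and segregates_with_disease(example, affected_members):
    print(example.gene, "kept;", unaffected_carriers(example, unaffected_members), "unaffected carrier(s)")
```

Allowing unaffected carriers rather than excluding them is a design choice for cancer predisposition, where carriers may remain unaffected or may not yet have developed disease.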

Matched tumour and germline analyses were performed on WES data generated using the Illumina HiSeq 4000 at 60X coverage for two myxoid liposarcoma patients from two of the cancer cluster families. A total of 13 statistically significant somatic mutations were identified using VarScan2 and Strelka: 11 in the genes PRMT5, ASPN, LAMA2, TET2, FHOD3, GATAD2A, ADSSL1, P4HTM, ABL1, SLC6A18 and PLK2, and two intergenic variants, one between SLC22A20 and POLA2 and one between SDR16C6P and PENK. A region of loss of heterozygosity on chromosome 16 was also identified in one of the myxoid liposarcoma tumours.
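The core idea of a matched tumour/germline comparison can be sketched in Python. VarScan2 is described by its authors as comparing read counts between tumour and matched normal samples using Fisher's exact test; the sketch below re-implements only that idea (a one-sided test of whether the tumour carries proportionally more alternate-allele reads than the germline sample), together with a simple intersection of the calls from two callers. It is not VarScan2 or Strelka code, and the allele-fraction and p-value thresholds are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple


def somatic_p_value(normal_ref: int, normal_alt: int,
                    tumour_ref: int, tumour_alt: int) -> float:
    """One-sided Fisher's exact test: is the alternate-read fraction higher in the tumour?

    With all table margins fixed, the tumour alternate-read count follows a
    hypergeometric distribution under the null hypothesis of equal allele fractions.
    """
    n_tumour = tumour_ref + tumour_alt
    alt_total = normal_alt + tumour_alt
    total = normal_ref + normal_alt + n_tumour
    denom = math.comb(total, n_tumour)
    p = 0.0
    # Sum P(X = x) over outcomes with as many or more tumour alternate reads than observed.
    for x in range(tumour_alt, min(alt_total, n_tumour) + 1):
        p += math.comb(alt_total, x) * math.comb(total - alt_total, n_tumour - x) / denom
    return min(1.0, p)


def classify_site(normal_ref: int, normal_alt: int, tumour_ref: int, tumour_alt: int,
                  max_normal_vaf: float = 0.05, min_tumour_vaf: float = 0.10,
                  max_p: float = 0.05) -> str:
    """Label a site 'somatic' if it is essentially absent from germline reads, present in
    tumour reads and the one-sided test is significant (all thresholds hypothetical)."""
    normal_depth = normal_ref + normal_alt
    tumour_depth = tumour_ref + tumour_alt
    if normal_depth == 0 or tumour_depth == 0:
        return "no coverage"
    normal_vaf = normal_alt / normal_depth
    tumour_vaf = tumour_alt / tumour_depth
    p = somatic_p_value(normal_ref, normal_alt, tumour_ref, tumour_alt)
    if normal_vaf <= max_normal_vaf and tumour_vaf >= min_tumour_vaf and p <= max_p:
        return "somatic"
    return "not somatic"


SiteKey = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def concordant_calls(calls_a: Iterable[SiteKey], calls_b: Iterable[SiteKey]) -> Set[SiteKey]:
    """Sites reported by both callers, e.g. VarScan2 and Strelka calls parsed into tuples."""
    return set(calls_a) & set(calls_b)
```

For example, a site with 40 reference and 0 alternate reads in germline DNA but 20 reference and 20 alternate reads in the tumour gives a very small p-value and is labelled somatic, whereas a heterozygous germline site (10 and 10 reads in both samples) is not.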

Whole genome sequencing (WGS) data for germline DNA, generated on the Illumina HiSeq X Ten platform, were available for 561 sarcoma cases and 1,144 healthy ageing controls from the Garvan Institute of Medical Research. Using these WGS data, variant burden analyses were performed independently for summed nonsynonymous deleterious variants and for putative regulatory variants to validate target regions identified in the cancer cluster families. The target regions were defined as the genes in which candidate germline and somatic mutations were identified, plus 1,000 bases on either side; for intergenic variants, both flanking genes were included. Of the 21 regions analysed, six (C16orf96, SLC6A18, TET2, ARHGAP39, ABL1 and a region encompassing SLC22A20 and POLA2) had a significantly higher burden of variants in sarcoma cases than in controls (p-value < 2.38 × 10⁻³, which corresponds to 0.05/21).
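The region definition and case-control burden comparison described above can be sketched in Python. The sketch assumes that each region is tested by comparing the number of cases and controls carrying at least one qualifying variant with a two-sided Fisher's exact test (the List of Tables indicates that Fisher's exact test was used, but this carrier-count layout is an assumption; the thesis may instead have summed allele counts), and that an intergenic region is a single span covering both flanking genes. The gene coordinates and carrier counts are invented for illustration.

```python
import math
from typing import Tuple

FLANK = 1_000  # bases added on either side of each gene, as stated in the abstract


def gene_region(start: int, end: int, flank: int = FLANK) -> Tuple[int, int]:
    """Gene body padded by `flank` bases on each side (1-based, clamped at position 1)."""
    return max(1, start - flank), end + flank


def intergenic_region(left: Tuple[int, int], right: Tuple[int, int],
                      flank: int = FLANK) -> Tuple[int, int]:
    """Assumed definition: one padded region spanning both genes flanking an intergenic variant."""
    return gene_region(min(left[0], right[0]), max(left[1], right[1]), flank)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the family-wise error rate at `alpha`."""
    return alpha / n_tests


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers of a
    qualifying variant. Returns (sample odds ratio, p-value), where the p-value
    sums the probabilities of all tables with the observed margins that are no
    more likely than the observed table.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = math.comb(n, row1)

    def prob(x: int) -> float:
        # Hypergeometric probability that x of the row-1 individuals are carriers.
        return math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance keeps tables tied with the observed one despite rounding.
    p = sum(px for px in map(prob, range(lo, hi + 1)) if px <= p_obs * (1 + 1e-7))
    if b * c:
        odds_ratio = (a * d) / (b * c)
    else:
        odds_ratio = math.inf if a * d else math.nan
    return odds_ratio, min(1.0, p)


# Illustrative use; coordinates and carrier counts are invented, not results from this study.
region = gene_region(5_000, 8_000)          # hypothetical gene coordinates -> (4000, 9000)
threshold = bonferroni_threshold(0.05, 21)  # 0.05 / 21, about 2.38e-3
cases, controls = 561, 1_144
case_carriers, control_carriers = 12, 8
odds, p_value = fisher_exact_two_sided(case_carriers, cases - case_carriers,
                                       control_carriers, controls - control_carriers)
print(f"region={region} OR={odds:.2f} p={p_value:.2e} significant={p_value < threshold}")
```

The exact test is used here instead of a chi-squared approximation because carrier counts for rare variants are often small, where large-sample approximations are unreliable.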


The current study was the first to perform WES on cancer cluster families identified by a sarcoma proband. The results indicate the utility of this approach for identifying novel candidate sarcoma risk genes by sequencing a small number of mixed cancer cluster families and validating the results in larger population cohorts. Genomic regions identified in this study should be prioritised in further studies to determine the role of these genes in cancer and sarcoma pathogenesis.


Contents

Declaration
Abstract
Table of contents
List of tables
List of figures
Acknowledgements
Authorship declaration
Abbreviations

1 Literature review
1.1 Cancer
1.2 Cancer genetics
1.3 Familial cancers
1.3.1 Familial cancer predisposition syndromes
1.3.2 Familial cancer clusters
1.4 Evidence for pleiotropic genetic risk factors
1.5 Sarcoma
1.5.1 Sarcoma genetics
1.6 Methods for identifying genetic risk variants
1.6.1 Linkage mapping
1.6.2 Association
1.6.3 DNA sequencing
1.6.4 Whole exome sequencing


1.6.5 Whole exome sequencing of cancer cluster families
1.7 Next generation sequencing study considerations
1.8 Known cancer predisposition genes
1.9 Summary
1.10 Aims
2 Aim 1: Whole exome sequencing of three cancer cluster families identified by a sarcoma proband
2.1 Introduction
2.1.1 Ion Proton platform
2.2 Methods
2.2.1 Families selected for whole exome sequencing
2.2.2 DNA extraction
2.2.3 Whole exome sequencing
2.2.3.1 Library preparation
2.2.3.2 Exome sequencing
2.2.4 Sequence alignment and variant calling
2.2.5 Variation to sequence alignment and variant calling
2.2.5.1 Torrent variant caller plugin
2.2.5.2 Genome analysis toolkit
2.2.5.3 Intersect variant calls from Torrent Variant Caller and Genome Analysis Toolkit
2.2.6 Recalibrate variants
2.2.7 Genotype concordance
2.3 Results
2.3.1 Families selected for whole exome sequencing
2.3.2 Whole exome sequencing
2.3.3 Variant calling
2.3.4 Recalibrate variants
2.3.5 Genotype concordance
2.4 Discussion
2.4.1 Evaluation of families used in this study
2.4.2 The use of whole exome sequencing to identify disease causing variants
2.4.3 Limitations of whole exome sequencing


2.4.4 The Ion Proton sequencing platform
2.4.5 Base calling software
2.4.6 Concordance
3 Aim 2: Identification of candidate germline risk variants in three cancer cluster families
3.1 Introduction
3.2 Bioinformatic strategies for variant filtering and prioritisation in whole exome sequencing
3.2.1 Annotation
3.2.1.1 Annotation of non-coding regions
3.2.2 Variant class filtering
3.2.3 Population frequency filtering
3.2.4 Evolutionary conservation
3.2.5 Functional impact prediction
3.2.6 Association analysis in families
3.2.7 Familial segregation
3.2.8 Outline of chapter
3.3 Methods
3.3.1 Ascertainment bias correction
3.3.2 Intersection
3.3.3 Annotation and filtration
3.3.4 Prioritisation strategies
3.3.4.1 Prioritisation using a rare private variants strategy
3.3.4.2 Prioritisation using a known rare variants strategy
3.3.4.3 Prioritisation using a candidate gene strategy
3.3.5 Methods for testing association of variants with cancer phenotypes
3.3.6 Bonferroni correction
3.3.7 Familial segregation analysis
3.3.8 Evidence further supporting candidate risk genes
3.4 Results
3.4.1 Variant prioritisation
3.4.1.1 Prioritisation using a rare private variants strategy
3.4.1.2 Prioritisation using a known rare variants strategy


3.4.1.3 Prioritisation using a candidate gene strategy
3.4.1.4 Summary of annotated variants from each prioritisation strategy
3.4.2 Rare private variants
3.4.2.1 Association analysis in SOLAR
3.4.2.2 Segregation analysis results
3.4.3 Known rare variants
3.4.3.1 Association analysis in SOLAR
3.4.3.2 Segregation analysis results
3.4.4 Candidate gene variants
3.4.4.1 Association analysis in SOLAR
3.4.4.2 Segregation analysis results
3.4.5 Evidence further supporting germline risk genes
3.5 Discussion
3.5.1 Variant filtering and prioritisation strategies
3.5.2 Association and segregation analyses of candidate risk variants in families
3.5.2.1 The ABCB5 gene
3.5.2.2 The KIF2C gene
3.5.2.3 The PDIA2 gene
3.5.3 Conclusion
4 Aim 3: A comparison of matched tumour and germline DNA from two sarcoma patients
4.1 Introduction
4.1.1 Myxoid liposarcoma
4.1.1.1 Somatic variants
4.1.1.2 Loss of heterozygosity
4.1.1.3 Somatic copy number alteration
4.1.2 Bioinformatic assessment of matched tumour and germline samples
4.1.3 Somatic mutations and drug sensitivity
4.1.4 Outline of chapter
4.2 Methods
4.2.1 Whole exome sequencing


4.2.2 Pre-processing and quality control
4.2.3 Adapter trimming
4.2.4 Sequence alignment and calling
4.2.5 BAM quality control
4.2.6 Generate mpileup file
4.2.7 Somatic variant calling using VarScan2
4.2.7.1 Somatic variant calling using Strelka
4.2.8 Evidence further supporting somatic risk genes
4.2.9 Drug sensitivity
4.2.10 Loss of heterozygosity variant calling using VarScan2
4.2.11 Variant annotation and filtering
4.2.12 Somatic copy number analysis using VarScan2
4.3 Results
4.3.1 Whole exome sequencing
4.3.2 Sequence alignment and calling
4.3.3 BAM quality control
4.3.4 Somatic variant calling
4.3.4.1 VarScan2
4.3.4.2 Validation of somatic variants using Strelka
4.3.4.3 Evidence further supporting somatic risk genes
4.3.4.4 Drug sensitivity
4.3.5 Loss of heterozygosity variants
4.3.6 Copy number analysis
4.4 Discussion
4.4.1 Comparison of results in the context of published literature on myxoid liposarcoma genetics
4.4.2 Strengths
4.4.3 Limitations
4.4.4 Summary
5 Aim 4: Variant burden analyses at candidate risk loci in sarcoma cases and healthy ageing controls
5.1 Introduction
5.1.1 Variant burden analyses in sarcoma cohorts
5.2 Methods


5.2.1 Study participants
5.2.2 Whole genome sequencing
5.2.3 Genomic regions selected for validation
5.2.4 Statistical analyses
5.3 Results
5.3.1 Identification of nonsynonymous deleterious variants in the target regions
5.3.2 Statistical analyses
5.3.2.1 Nonsynonymous deleterious variants
5.3.2.2 Putative regulatory variants
5.4 Discussion
5.4.1 Novel findings
5.4.2 Known cancer genes
5.4.3 Clinical implications
5.4.4 Strengths and limitations
5.4.5 Conclusion
6 Conclusion
6.1 Summary of results
6.2 Clinical utility of findings
6.3 Review of methodology
6.4 Recommendations for future work
Bibliography
Appendices
A World Health Organisation classification of soft tissue tumours and bone tumours
B Novel tumour-predisposing genes identified by whole exome sequencing
C Familial cancer syndromes associated with sarcomas
D Translocations associated with sarcomas


E Genetically complex sarcomas
F Known cancer predisposition genes
G Candidate genes used for variant prioritisation based on a priori knowledge of cancer biology
H Genes in which variants were also prioritised using the candidate gene prioritisation strategy
I Patient 1-II-2: Copy number variation by chromosome
J Patient 2-II-1: Copy number variation by chromosome
K A list of nonsynonymous deleterious variants included in variant burden analyses
L Genes identified by variant burden analyses by Ballinger et al. (2016) and Brohl et al. (2017)
M A list of putative regulatory variants included in variant burden analyses


List of Tables

2.1 Parameters used to create whole exome sequencing run plans using Torrent Suite software
2.2 Parameters used to run the Torrent Variant Caller plugin to call bases
2.3 Parameters used for Genome Analysis Toolkit UnifiedGenotyper to call bases
2.4 Depth of coverage summary from Torrent Suite
2.5 Genome Analysis Toolkit VariantRecalibrator tranche results
2.6 Discordant genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing for Patient 2-II-1
2.7 Discordant genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing for Patient 3-III-1
3.1 Classification of Regulome database scores
3.2 Functional annotation of intersect file using ANNOVAR
3.3 Summary of variant annotation using Annotate Variation and Regulome Database for each prioritisation strategy
3.4 Summary of SOLAR association results for rare private variants
3.5 Summary of SOLAR association results for known rare variants
3.6 Summary of SOLAR association results for candidate gene variants
3.7 Summary of findings from in silico resources investigating the role of candidate germline risk variants in cancer pathogenesis
3.8 Summary of search results from PubMed for genes in which germline variants were identified
4.1 Parameters specified for VarScan2 somaticfilter to filter false positives from the high confidence somatic mutations


4.2 Raw data summary from Macrogen Inc. for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples
4.3 Summary statistics generated using Samtools flagstat for Patient 1-II-2 and 2-II-1 germline and tumour samples
4.4 Results from VarScan2 somaticfilter to remove possible false positives from the high confidence somatic calls for Patient 1-II-2 and Patient 2-II-1
4.5 Somatic variants identified by VarScan2 and Strelka for Patient 1-II-2
4.6 Somatic variants identified by VarScan2 and Strelka for Patient 2-II-1
4.7 Summary of findings from in silico resources investigating the role of somatic risk variants and the genes in which they arise in cancer pathogenesis
4.8 Summary of search results from PubMed for genes in which somatic variants were identified
4.9 Statistically significant high confidence loss of heterozygosity variants for Patient 1-II-2
5.1 Genomic coordinates for target regions in which germline and somatic risk variants were identified
5.2 Classification of Regulome database scores
5.3 Annotated summary of nonsynonymous deleterious variants and putative regulatory variants in the target regions
5.4 Odds ratios, p-values and 95% confidence intervals from Fisher’s exact test for target regions for nonsynonymous deleterious variants
5.5 Odds ratios and p-values from Fisher’s exact test for target regions for putative regulatory variants


List of Figures

1.1 Location of known cancer predisposition genes
2.1 Pedigree of family 1
2.2 Pedigree of family 2
2.3 Pedigree of family 3
2.4 Whole exome sequencing pipeline flowchart
2.5 The number of variants called by Torrent Variant Caller and Genome Analysis Toolkit UnifiedGenotyper, and the number of variants that were called by both callers (intersect)
2.6 Genome Analysis Toolkit VariantRecalibrator tranche plot
2.7 Genome Analysis Toolkit VariantRecalibrator projection for mapping quality rank sum (MQRankSum) versus haplotype score
2.8 Concordance of genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing on Ion Proton for three patients
3.1 Genotypes for the ARHGAP39 variant that shows segregation in patients with cancer in family 3
3.2 Genotypes for the C16orf96 and ABCB5 variants that show segregation in patients with cancer in family 2
3.3 Genotypes for the ZFP69B, BEAN1, UVSSA and KIF2C variants that show segregation in patients with cancer in family 3
3.4 Genotypes for the PDIA2 variant that shows segregation in patients with cancer in family 2
4.1 Pedigree of family 1 highlighting sarcoma Patient 1-II-2 for tumour-germline comparison
4.2 Pedigree of family 2 highlighting sarcoma Patient 2-II-1 for tumour-germline comparison


4.3 Genome analysis toolkit depth of coverage summary for Patient 1-II-2 and Patient 2-II-1 germline and tumour DNA
4.4 Insert size histogram plots generated by Picard for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples
4.5 Pedigree of family 1 indicating genotypes for each patient at chr16:53513055 (rs8049033) in the RBL2 gene


Acknowledgements

I would like to acknowledge support from Mandy Basson and the Board of Directors of the Abbie Basson Sarcoma Foundation Ltd (Sock it to Sarcoma!).

I would like to sincerely thank David Thomas, Mandy Ballinger and Mark Pinese for providing the DNA samples and data used in this thesis. I would also like to acknowledge the participants from the International Sarcoma Kindred Study and the Medical Genome Reference Bank.

I would like to express my gratitude to my supervisors Eric Moses, Phillip Melton, David Wood, David Thomas and Evan Ingley for their guidance and for the opportunity to pursue this project. I would also like to acknowledge Jane Allen and Barry Iacopetta for their support.

I would like to thank all my friends at the Centre for Genetic Origins of Health and Disease for their daily guidance and support, especially Alex Rea for his assistance in the lab and Gemma Cadby for her helpful advice and for reading drafts. I would also like to thank Tegan McNab for proofreading my thesis.

I am grateful to my family and friends who have always supported my studies.


Abbreviations

Abbreviation Definition

*.bam Binary Alignment/Map

*.bed Browser Extensible Data

*.sam Sequence Alignment/Map

*.vcf Variant Call Format

ABC ATP-binding cassette

Alt Alternate allele

ANNOVAR Annotate Variation

ASPREE ASPirin in Reducing Events in the Elderly

ATP Adenosine Triphosphate

ATPase Adenosine Triphosphatase

ATRA All-Trans Retinoic Acid

B Benign

BCFtools Binary Variant Call Format Tools

BWA Burrows-Wheeler Aligner

BWA-MEM Burrows-Wheeler Aligner Maximal Exact Matches

Chr Chromosome

CNV Copy Number Variation

COSMIC The Catalogue of Somatic Mutations in Cancer

CpG 5’-C-phosphate-G-3’

CREB cAMP Response Element-binding Protein


D Deleterious

dbSNP Short Genetic Variations Database

DNA Deoxyribonucleic Acid

DNase Deoxyribonuclease

dNTP Deoxynucleoside Triphosphate

E2F E2 Factor

ECM Extracellular Matrix

ENCODE Encyclopedia of DNA Elements

eQTL Expression Quantitative Trait Loci

ER Endoplasmic Reticulum

ERbB Erythroblastosis

ERK Extracellular Signal-Regulated Kinase

ESC Embryonic Stem Cells

ExAC Exome Aggregation Consortium

FAMMM Familial Atypical Multiple Mole Melanoma

FFPE Formalin-Fixed and Paraffin-Embedded

GATK Genome Analysis ToolKit

GeneRIF Gene References into Functions

GERP Genomic Evolutionary Rate Profiling

GO Gene Ontology

GOHaD Centre for Genetic Origins of Health and Disease

GPCR G Protein–Coupled Receptor

GTP Guanosine Triphosphate

GTPase Guanosine Triphosphatase

GWA Genome Wide Association

HapMap International Haplotype Map Project

HDI Histone Deacetylase Inhibitor

hg19 Human Genome build 19


hMSCs Human bone marrow-derived Mesenchymal Stromal Cells

IG Intergenic

IGV Integrative Genomics Viewer

INDEL Insertions and Deletions

Int Intronic

isec BCFtools Intersect

ISKS International Sarcoma Kindred Study

Kb Kilobase

LOD Logarithm of the Odds

LOH Loss Of Heterozygosity

MAF Minor Allele Frequency

MGRB Medical Genome Reference Bank

MPNST Malignant Peripheral Nerve Sheath Tumour

MQRankSum Mapping Quality Rank Sum

mRNA Messenger Ribonucleic Acid

NCBI National Center for Biotechnology Information

NGS Next Generation Sequencing

NS Nonsynonymous

NTR Neurotrophins

OMIM Online Mendelian Inheritance in Man

P Possibly damaging

PDI Protein Disulphide Isomerase

PNET Primitive Neuroectodermal Tumour

PolyPhen-2 Polymorphism Phenotyping-2

Probit Probability Unit

Q Base Quality Score

QC Quality Control

Rb Retinoblastoma


Ref Reference allele

RegulomeDB Regulome Database

RNA Ribonucleic Acid

Robo Roundabout family of proteins

rs ID Reference SNP Identification

S Synonymous

SCNA Somatic Copy Number Alteration

SIFT Sorting Intolerant from Tolerant

SLBP Stem-Loop Binding Protein

SNP Single Nucleotide Polymorphism

SNV Single Nucleotide Variant

SOLAR Sequential Oligogenic Linkage Analysis Routines

T Tolerated

TF Transcription Factor

TMAP Torrent Mapping Alignment Program

TVC Torrent Variant Caller

UCSC University of California Santa Cruz

USA United States of America

UTR Untranslated Region

UTR3 3’ Untranslated Region

UTR5 5’ Untranslated Region

UWA The University of Western Australia

VQSLOD Variant Quality Score Log-Odds

VQSR Variant Quality Score Recalibration

WES Whole Exome Sequencing

WGS Whole Genome Sequencing


Chapter 1

Literature review

1.1 Cancer

Collectively, cancers are a diverse spectrum of human diseases with a common progression resulting from the failure to regulate normal cell growth, proliferation and apoptosis.1 Cancers can arise from any of the cell or tissue types in the human body and are classified accordingly.2 The most common cancers in adults are carcinomas (approximately 90% of cancers),2 which are derived from epithelial cells that line body cavities and glands.3 Lymphomas and leukaemias arise in the tissues that give rise to lymphoid and blood cells and account for approximately 8% of human malignancies.3,4 Melanomas, retinoblastomas, neuroblastomas and glioblastomas are derived from dividing cells of melanocytes, the ocular retina, neurons and neural glia, respectively.3 Sarcomas arise from connective tissues such as bones, tendons, cartilage and fat.2

Cancer is one of the leading causes of death worldwide, with over 14 million people diagnosed each year.5 In 2012, there were 4.3 million premature deaths from cancer, and premature deaths are expected to increase by 44% from 2012 to 2030.6,7 The lost years of life and productivity caused by cancer represent the largest cost to the global economy of any cause of death.8


1.2 Cancer genetics

Cancer is a genetic disease arising from an accumulation of genetic mutations and epigenetic alterations.9 These changes can deregulate multiple complex regulatory pathways controlling cellular growth, division, migration and survival.10 Tumour genomes usually carry many mutations and can be highly unstable.11 Mutations range from intragenic changes to large gains and losses of chromosomal material.9

A genetic mutation is a permanent change in the DNA sequence. A polymorphism is a genetic variation that is common in the population. The conventional (and arbitrary) cut-off between a mutation and a polymorphism is 1%; that is, the less common allele of a polymorphism must have a frequency of at least 1% in the population.12
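The following Python sketch works through the 1% convention. It computes the minor allele frequency (MAF) of a biallelic variant from diploid genotype counts and labels the variant accordingly. This is a minimal sketch under stated assumptions. The function names are hypothetical and the genotype counts are invented. The cut-off is a parameter because the 1% threshold is a convention rather than a biological boundary.

```python
def minor_allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Minor allele frequency from diploid genotype counts (hypothetical helper)."""
    n_alleles = 2 * (hom_ref + het + hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotypes supplied")
    # Each heterozygote carries one alternate allele, each hom_alt carries two.
    alt_freq = (het + 2 * hom_alt) / n_alleles
    # The minor allele is whichever allele is less common.
    return min(alt_freq, 1.0 - alt_freq)


def classify_variant(maf: float, cutoff: float = 0.01) -> str:
    """Apply the conventional 1% cut-off between polymorphism and rare variant."""
    return "polymorphism" if maf >= cutoff else "rare variant"


# Invented genotype counts for 1,000 individuals.
maf = minor_allele_frequency(hom_ref=970, het=29, hom_alt=1)
print(f"MAF = {maf:.4f} -> {classify_variant(maf)}")
```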

Mutations in a cancer genome can comprise the following types of DNA change: substitutions, insertions or deletions of small or large segments of DNA, rearrangements, copy number increases, and copy number reductions.13 Cancer cells can also acquire new DNA sequences from viruses, including human papillomavirus, Epstein-Barr virus, hepatitis B virus, human T-lymphotropic virus 1, and human herpesvirus.14

Cancer genomes can also acquire epigenetic changes, which alter chromatin structure and gene expression.15

The number of mutations per cancer genome ranges from tens to thousands.16 The substantial variation in the number and pattern of mutations in individual cancers reflects differences in exposure to risk factors, DNA repair defects, and cell of origin.17

Mutations that occur in cancers fall into two functional categories: mutations required for tumourigenesis, and mutations that merely occur during tumourigenesis and do not contribute to the process. These are called driver and passenger mutations, respectively.

Drivers confer a selective advantage during clonal evolution and therefore ‘drive’ the tumourigenesis process. Passenger mutations are not present in tumours as a result of evolutionary selection. Instead, they arise by chance in a cell that harbours a driver mutation. It is likely that most cancers carry more than one driver mutation, and the number of drivers varies between cancer types.13,16,18–20

Mutations can arise in three broad categories of genes: oncogenes, tumour suppressor genes, and genome stability genes. Activating mutations in oncogenes and inactivating mutations in tumour suppressor genes drive tumourigenesis by increasing cell proliferation or inhibiting apoptosis. Mutations in genome stability genes drive tumourigenesis indirectly, by increasing the rate of mutation in other genes.9

The characterisation of these genes has led to the discovery of the biochemical pathways underlying tumourigenesis, and to a better understanding of the normal homeostatic roles these pathways play in healthy cells and tissues.21

Mutations in these three classes of genes can occur in single somatic cells, resulting in sporadic tumours, or in the germline, resulting in a hereditary predisposition to cancer.

Sporadic cancers develop due to mutations that arise during a person’s lifetime. The majority of cancers (90-95%) develop sporadically, due to genetic mutations that result from DNA damage caused by exposure to environmental and lifestyle factors.22 Environmental risk factors include occupational exposures (chemicals, dust, and industrial processes), sunlight, radiation, and environmental pollution.23

Lifestyle factors that may increase the risk of developing cancer include smoking, excessive alcohol consumption, poor diet, obesity and physical inactivity, chronic infections, sun tanning, and sunburn.24,25

Only a small proportion (5-10%) of cancers are estimated to involve an inherited genetic mutation.26–30 However, familial clustering of cancers is relatively common.31

Familial clustering is the occurrence of a disease, such as cancer, in some families more often than would be expected from its frequency in the general population.32

Familial clustering of cancer can be measured by the familial proportion (the proportion of cases with an affected relative), which has been reported to be as high as 20% in prostate cancer.33 Familial clustering of cancers is likely due to a combination of three kinds of factor that act together to increase cancer susceptibility: environmental factors, rare highly penetrant gene mutations, and more common lower-penetrance gene variants.32,34–36
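The familial proportion is a simple ratio. The sketch below computes it from a list of case records. The `Case` structure, its fields and the cohort are all hypothetical and serve only to show the arithmetic. In this example, 2 of 10 cases have an affected relative, which gives a familial proportion of 20%.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """A hypothetical cancer case record (illustrative fields only)."""
    case_id: str
    affected_relatives: int  # number of relatives diagnosed with the same cancer


def familial_proportion(cases: list[Case]) -> float:
    """Proportion of cases with at least one affected relative."""
    if not cases:
        raise ValueError("no cases supplied")
    with_history = sum(1 for case in cases if case.affected_relatives > 0)
    return with_history / len(cases)


# Invented cohort: 8 cases without and 2 cases with an affected relative.
cohort = [Case(f"C{i}", 0) for i in range(8)] + [Case("C8", 1), Case("C9", 2)]
print(f"Familial proportion: {familial_proportion(cohort):.0%}")
```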


1.3 Familial cancers

All cancers, both rare and common, show some degree of familial clustering.37 Cancers can be two- to four-fold more common in first-degree relatives of individuals with cancer.38

1.3.1 Familial cancer predisposition syndromes

Familial clustering of cancers can sometimes represent a familial cancer predisposition syndrome. A familial cancer predisposition syndrome manifests when multiple members of a family inherit gene mutations that predispose them to one or more types of cancer.39 These families have multiple affected individuals. Family members often show early onset of cancer, multiple primary sites of disease, and occasionally bilateral involvement of paired organs.35,39 Some cancer predisposition syndromes appear to confer an increased risk of adult-onset cancers, such as breast, ovarian and colorectal cancers.40–42 Other syndromes increase susceptibility to tumours in childhood, such as hereditary retinoblastoma,43 or to early-onset tumours in both children and adults, such as von Hippel-Lindau disease.44

Most familial cancer predisposition syndromes are transmitted in a Mendelian autosomal dominant manner.35,45 With a dominant mutation, only one defective allele needs to be present for the individual to be predisposed to cancer. Individuals with one defective and one normal allele are heterozygous. An example of a Mendelian autosomal dominant cancer predisposition syndrome is hereditary breast-ovarian cancer,46 which is caused by mutations in the BRCA1 and BRCA2 genes.47 Women with germline mutations in BRCA1 have a 46–65% risk of developing breast cancer by age 70. Those with a BRCA2 mutation have a lower risk of 43–45% by age 70.40,48
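The transmission risk implied by autosomal dominant inheritance can be checked by enumerating parental gametes, as in a Punnett square. The sketch below is a generic single-locus model and is not a method from this thesis. It makes three assumptions: "M" is a hypothetical mutant allele and "m" the normal allele; each parental allele is transmitted with probability 1/2; and penetrance is ignored.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Offspring genotype probabilities at one autosomal locus.

    Each parent is a two-letter genotype (e.g. "Mm"); each parental allele is
    transmitted with probability 1/2 (Mendelian segregation).
    """
    # Pair every allele of parent 1 with every allele of parent 2 and
    # normalise each pair so that "mM" and "Mm" count as the same genotype.
    counts = Counter("".join(sorted(a + b)) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def prob_inherits_mutant(parent1: str, parent2: str, mutant: str = "M") -> Fraction:
    """Probability that a child receives at least one copy of the mutant allele."""
    return sum(
        (p for g, p in offspring_genotypes(parent1, parent2).items() if mutant in g),
        Fraction(0),
    )


# Heterozygous carrier ("Mm") x non-carrier ("mm"): a dominant-model cross.
print(offspring_genotypes("Mm", "mm"))
print("P(child inherits mutant allele) =", prob_inherits_mutant("Mm", "mm"))
```

When one parent is a heterozygous carrier and the other a non-carrier, each child has a 1/2 chance of inheriting the mutant allele. Calling `offspring_genotypes("Mm", "Mm")` instead returns the 1/4 : 1/2 : 1/4 genotype ratio that underlies recessive inheritance, where only "MM" children carry two mutant alleles.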

Less often, familial cancer predisposition syndromes are transmitted in an autosomal recessive manner. With recessive mutations, both alleles must be mutated for the individual to have a predisposition to cancer. Individuals who inherit a recessive germline mutation in a gene are known as carriers and carry the mutation in every cell of their body. The risk that a carrier will develop cancer is variable, because a carrier will not develop cancer unless the remaining normal allele is also mutated. The particular mutation, other genes, and dietary, lifestyle and environmental factors can influence this risk.49 The likelihood that a carrier will develop cancer is defined as the penetrance of the mutation.3
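Penetrance can be estimated as the fraction of mutation carriers who develop cancer. The sketch below adds a Wilson score interval to the point estimate, to show how uncertain such an estimate is in small samples. The carrier counts are invented. Naive estimates from families ascertained because they contain several cases tend to overestimate penetrance, and this simple calculation does not correct for that bias.

```python
import math


def penetrance(affected_carriers: int, total_carriers: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Point estimate of penetrance with an approximate 95% Wilson score interval."""
    if total_carriers <= 0 or not 0 <= affected_carriers <= total_carriers:
        raise ValueError("invalid carrier counts")
    n = total_carriers
    p = affected_carriers / n  # naive estimate: affected carriers / all carriers
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half_width), min(1.0, centre + half_width)


# Invented data: 12 of 40 mutation carriers developed cancer.
est, lower, upper = penetrance(12, 40)
print(f"penetrance = {est:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```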

An example of an autosomal recessive cancer predisposition syndrome is xeroderma pigmentosum complementation group A. This syndrome is characterised by increased sensitivity to sunlight and the development of carcinomas at an early age.50 Xeroderma pigmentosum complementation group A has been associated with homozygous or compound heterozygous mutations (two different mutations, one on each copy of the gene) in the XPA gene.50
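To tell a homozygous state from a compound heterozygous one, we need to know which chromosome copy carries each variant. The sketch below assumes phased genotypes written in the VCF style "0|1". The allele before "|" lies on one haplotype and the allele after it on the other. The variant identifiers are invented. Unphased genotypes ("0/1") are not handled. Two heterozygous variants could lie on the same copy (in cis) and leave one normal allele intact, so the function raises an error for them instead of guessing.

```python
def biallelic_status(phased_genotypes: dict[str, str]) -> str:
    """Classify variants within one gene from phased VCF-style genotypes.

    Values look like "1|0" (0 = reference, other = alternate, "." = missing).
    Unphased genotypes such as "0/1" raise ValueError when unpacked.
    """
    hap1: set[str] = set()
    hap2: set[str] = set()
    for variant_id, genotype in phased_genotypes.items():
        allele1, allele2 = genotype.split("|")
        if allele1 not in ("0", "."):
            hap1.add(variant_id)
        if allele2 not in ("0", "."):
            hap2.add(variant_id)
    if not hap1 or not hap2:
        return "not biallelic"  # at least one gene copy is unaltered
    if hap1 & hap2:
        return "homozygous"  # the same variant is on both copies
    return "compound heterozygous"  # different variants on the two copies


# Hypothetical XPA variants (identifiers invented for illustration).
print(biallelic_status({"XPA_var1": "1|0", "XPA_var2": "0|1"}))  # in trans
print(biallelic_status({"XPA_var1": "1|0", "XPA_var2": "1|0"}))  # both in cis
print(biallelic_status({"XPA_var1": "1|1"}))
```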

The study of familial cancer predisposition syndromes has led to the identification of genes critical to carcinogenesis and has also informed our understanding of the fundamental biology of human cancer.51 Li and Fraumeni (1969) described a novel familial cancer syndrome in four unrelated children with sarcoma and their affected family members.52 They hypothesised that the occurrence of various malignancies in a family might represent a familial cancer syndrome due to the transmission of an autosomal dominant gene mutation.52 In 1990, TP53 was identified as the gene responsible for Li-Fraumeni syndrome.53 The TP53 gene encodes a tumour suppressor protein that responds to diverse cellular stresses by regulating the expression of target genes. In this way it induces cell cycle arrest, apoptosis, senescence, DNA repair, or changes in metabolism.54–56 Germline mutations in TP53 were later also implicated in many other malignancies.51

1.3.2 Familial cancer clusters

There are also familial cancer clusters that are not defined by known hereditary cancer syndromes. Familial cancer clusters are those that do not exhibit the features of hereditary types of cancer but occur in more individuals in the family than statistically expected.36

In addition to the familial clustering seen for most specific cancers, aggregation of different types of cancer within families has also been observed. For example, individuals with BRCA1 and BRCA2 mutations have increased susceptibility not only to breast and ovarian cancers, but also to colon, cervical, uterine, pancreatic and prostate cancers.57


1.4 Evidence for pleiotropic genetic risk factors

Early studies assessed the discordant clustering of cancer in families (the clustering of different cancer types) to determine whether there was a general susceptibility to cancer. Case-control, registry- and population-based studies have evaluated familial clustering using risk ratio and kinship coefficient estimations.58 Identifying shared genetic associations between diseases (pleiotropy) is a useful approach to identify new risk loci, and may elucidate common aetiologies and help in risk prediction.59 The largest studies, using the Utah Population and Cancer Registry Database and the Swedish Family-Cancer Database, demonstrated excess familial clustering at almost every cancer site in the body.34,60–63 However, these studies focused exclusively on familial clustering in nuclear families. They were therefore not able to separate the roles of shared environmental and genetic factors in the familial aggregation of cancer.

A more extensive study by Cannon-Albright et al. (1994) used the Utah Population Database to evaluate familial clustering in more distant relatives.60 Familial risk can be due to shared exposure to an environmental risk factor and/or a common genetic mutation, so it is useful to examine familial clustering in both near and distant relatives. In more distant relationships, a shared familial environment is less likely, and the probability of shared genotypes can be measured.60 This study found significant clustering of cancer outside the nuclear family for many cancer sites.60 These results support the hypothesis of an inherited basis to cancer at almost all sites. They also support the existence of more than one susceptibility locus for some cancers.60
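One standard way to quantify the probability of shared genotypes between relatives is the kinship coefficient. This is the probability that alleles drawn at random from each of two relatives at the same locus are identical by descent. The sketch below tabulates it for common relationship degrees in non-inbred pedigrees, using the rule φ = (1/2)^(degree + 1). It illustrates why distant relatives share much less of their genome than nuclear family members. It is not presented as the exact statistic used in the studies cited here.

```python
from fractions import Fraction

# Example relationships for each degree (non-inbred pedigrees).
EXAMPLES = {
    1: "parent-offspring, full siblings",
    2: "grandparent-grandchild, half siblings, aunt/uncle-niece/nephew",
    3: "first cousins, great-grandparent-great-grandchild",
    4: "first cousins once removed",
}


def kinship_coefficient(degree: int) -> Fraction:
    """Kinship coefficient phi = (1/2)^(degree + 1) for these relationship types.

    phi is the probability that an allele drawn at random from each of two
    relatives at the same locus is identical by descent (IBD).
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return Fraction(1, 2 ** (degree + 1))


for degree, relatives in EXAMPLES.items():
    phi = kinship_coefficient(degree)
    # 2 * phi is the expected proportion of the genome shared IBD.
    print(f"degree {degree}: phi = {phi}, expected IBD sharing = {2 * phi} ({relatives})")
```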

In support of this finding, a study by Amundadottir et al. (2005) analysed familial aggregation of cancer in extended families from Iceland to search for genetic factors that contribute to cancer at one or more sites in the body.58 The authors found that most cancer sites demonstrated a significantly increased risk for the same cancer beyond the nuclear family.58 They also found significantly increased familial clustering between different cancer sites in both close and distant relatives.58

Therefore, Amundadottir et al. concluded that genetic factors are involved in the aetiology of many cancers, and that some of these factors are shared by different cancer sites.58 These findings support the conclusions of Cannon-Albright et al. However, shared environment and non-random mating for certain risk factors may also contribute to the familial clustering of cancer.58 Several types of study design can be used to identify genetic risk variants that may be involved in the aetiology of cancers.

1.5 Sarcoma

Sarcomas are a rare group of cancers that arise predominantly from tissues of embryonic mesodermal origin (the connective tissues of the body), for example, bones, muscles, cartilage and fat. There are over 70 different subtypes of sarcoma, grouped into two broad classes: bone and soft tissue (Appendix A).64 The majority of sarcomas arise in the soft tissue, while malignant bone tumours make up just over 10% of all sarcomas.65 Soft tissue sarcomas are often further sub-categorised by the line of differentiation, for example, liposarcoma (fat), leiomyosarcoma (smooth muscle), rhabdomyosarcoma (skeletal muscle) and fibrosarcoma (connective tissue).66,67 Bone tumours are further classified into bone-forming tumours, cartilage-forming tumours, marrow tumours, or vascular tumours.68 It can be difficult to diagnose and classify this diverse group of malignancies, which have overlapping histological features. However, it is important to correctly determine the specific histologic subtype for management and treatment decisions.67,68

Sarcomas are a high-impact group of cancers that disproportionately affect the young. Although sarcomas are rare, they contribute significantly to the burden of disease because they tend to affect teenagers and young adults.69,70 Sarcomas represent only 1% of all cancers in adults but represent 10% of cancers in children and 8% of cancers in adolescents and young adults.71 There are approximately 800 new sarcoma cases in Australia each year.72

1.5.1 Sarcoma genetics

There is evidence to suggest a strong genetic basis to sarcomas. First, sarcomas disproportionately affect the young, and an early age at diagnosis is associated with a genetic basis for many heritable diseases, including hereditary cancers.73,74 Second, sarcomas are over-represented among survivors of melanoma, breast cancer, thyroid cancer, Hodgkin’s lymphoma, and leukaemias.75 Third, sarcoma survivors are at increased risk of secondary cancers.76 Finally, several rare genetic syndromes, such as Li-Fraumeni syndrome, are associated with sarcomas.35 Appendix C contains a summary of hereditary syndromes associated with sarcoma, including genes and genomic locations.

In addition to being associated with familial cancer predisposition syndromes,52,77–79 sarcomas also show evidence of familial clustering. Up to 33% of paediatric sarcomas are estimated to be associated with a significant family history of cancer.80 The risk of sarcoma is increased six-fold in relatives of children with sarcoma compared with age-matched controls. When a causal gene mutation is identified, this risk increases to over 250-fold.81

Whilst some sarcomas are associated with an inherited familial predisposition, most sarcomas do not have a known cause. Very little is currently known about the causes of sarcoma because they are so rare.65 Several risk factors have been associated with sarcomas. These include ionising radiation,82,83 viruses (Epstein-Barr virus84 and Kaposi’s sarcoma-associated herpesvirus85), occupation,86–90 exposure to chemicals,91–96 hormones,97,98 antibiotics,99 medications for nausea used during pregnancy,100 use of antibiotics in babies,101 birth weight,102 gestational age,103 birth order, and maternal age.104,105

Sarcomas that arise due to somatic mutations are classified into two main groups based on genetics:

1. Sarcomas with specific recurrent genetic mutations on a background of relatively few other chromosomal changes

2. Sarcomas with no specific genetic mutations on a complex background of numerous chromosomal changes

Approximately one-third of all sarcomas have specific recurrent genetic mutations.106 These tumours contain either disease-specific chromosome translocations or specific activating mutations.

Most sarcomas with specific recurrent genetic mutations are characterised by balanced or reciprocal translocations (the exchange of pieces between two chromosomes). These result in two derivative chromosomes with no net gain or loss of chromosomal material.107 In some cases, only one derivative chromosome is formed, and some genetic material is lost. The fusion proteins produced as a result of the translocation can contribute to oncogenesis in several ways: increasing cell proliferation, promoting anchorage-independent cell growth, overriding cell contact adhesion, inhibiting apoptosis, enhancing invasion and suppressing terminal differentiation.107 Appendix D contains a table of known translocations that have been associated with sarcoma.

The remaining sarcomas in the specific recurrent genetic mutations group are characterised by specific activating mutations.108 These tumours show some degree of aneuploidy but generally have less disordered karyotypes than the complex group of sarcomas.83 An example of a sarcoma subtype with a specific activating mutation is gastrointestinal stromal tumour, which has activating mutations in KIT or PDGFRA.109,110

The remaining two-thirds of sarcomas have highly complex unbalanced karyotypes lacking specific genetic translocations.66,111 This group is mostly composed of spindle cell or pleomorphic sarcomas. Examples are leiomyosarcoma, myxofibrosarcoma, pleomorphic liposarcoma, pleomorphic rhabdomyosarcoma, malignant peripheral nerve sheath tumour, angiosarcoma, extraskeletal osteosarcoma and spindle cell/pleomorphic unclassified sarcoma (previously known as spindle cell/pleomorphic malignant fibrous histiocytoma).111 These neoplasms show gains and losses of many chromosomes or chromosome regions, as well as amplifications.111 Many of them share recurrent aberrations (such as the gain of 5p13-p15) that play a significant role in tumour progression or metastatic dissemination.111 Appendix E lists the genomic regions identified in complex sarcomas.

1.6 Methods for identifying genetic risk variants

Several study designs can be employed to identify genetic risk variants. Each design is suited to identifying different types of variants. These range from highly penetrant mutations in rare Mendelian disorders, to low-penetrance variants in more common diseases, to rare variants.


1.6.1 Linkage mapping

Linkage mapping in families has been used successfully to localise highly penetrant disease-causing genes (e.g., BRCA1 and BRCA2). It has been particularly successful for rare Mendelian human diseases, as catalogued in Online Mendelian Inheritance in Man (OMIM, http://www.omim.org/). Linkage analysis in families is a form of positional cloning and makes no underlying assumptions about the nature of the genes involved. In human disease studies, the first aim of linkage mapping is to determine the chromosomal location of putative risk genes by identifying polymorphic DNA markers that cosegregate with a disease of interest. The genes in such linkage regions are referred to as positional candidate risk genes. These genes are then prioritised for further genetic and molecular analyses to identify the specific causal mutations or polymorphisms.
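Linkage evidence is conventionally summarised as a LOD score. This is the log10 ratio of the likelihood of the data under linkage at recombination fraction θ to the likelihood under no linkage (θ = 0.5). A LOD of 3 or more is the traditional threshold for significant linkage. The sketch below is a simplified two-point calculation. It assumes phase-known, fully informative meioses and full penetrance. Real family analyses use likelihood-based software that handles incomplete penetrance, missing genotypes and unknown phase. The recombinant counts are hypothetical.

```python
import math


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD(theta) = log10[ theta^R * (1 - theta)^NR / 0.5^(R + NR) ]
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if theta == 0.0 and recombinants > 0:
        return float("-inf")  # recombinants are impossible under complete linkage
    log_linked = non_recombinants * math.log10(1.0 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    log_unlinked = (recombinants + non_recombinants) * math.log10(0.5)
    return log_linked - log_unlinked


# Hypothetical family: 1 recombinant among 12 informative meioses.
grid = [i / 100 for i in range(51)]  # theta from 0.00 to 0.50
best_theta = max(grid, key=lambda t: lod_score(1, 11, t))
print(f"max LOD = {lod_score(1, 11, best_theta):.2f} at theta = {best_theta:.2f}")
```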

Linkage mapping has been used to identify highly penetrant susceptibility alleles associated with Mendelian familial cancer predisposition syndromes.112–117 However, these variants explain only a small fraction of all cancer cases. For example, inherited mutations in the BRCA1 and BRCA2 genes account for approximately 2%–3% of all breast cancer cases.118,119 In some populations, however, more prevalent founder mutations in these genes can explain up to about 10% of the disease.47,120–123

1.6.2 Association

It has been postulated that more common cancers that do not show a clear pattern of inheritance are caused by many genes that each confer a small risk of disease.124 For disease risk genes of small effect, association studies can be more powerful than linkage studies.125 Dense panels of single nucleotide polymorphism (SNP) markers became available, together with high-throughput technology for efficiently genotyping them in thousands of individuals. The genome wide association (GWA) study design was then widely adopted for the genetic analysis of common human diseases. GWA is also a form of positional cloning. It relies on linkage disequilibrium, the non-random association of alleles at different loci that is a function of population history.
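Linkage disequilibrium between two biallelic loci is commonly summarised by D, the normalised |D′| and r². A GWA SNP can act as a proxy for an untyped causal variant when r² between them is high. The sketch below computes these measures from haplotype and allele frequencies. The frequencies are invented. In practice, haplotype frequencies are usually estimated from genotype data rather than observed directly.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, |D'|, r^2) for two biallelic loci.

    p_ab: frequency of haplotypes carrying allele A at locus 1 and B at locus 2.
    p_a, p_b: frequencies of alleles A and B.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from independence (linkage equilibrium)
    # D' scales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r2 = d**2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Allele A always occurs with B (complete LD): one SNP fully tags the other.
print(ld_measures(p_ab=0.3, p_a=0.3, p_b=0.3))
# Alleles assort independently: no LD.
print(ld_measures(p_ab=0.09, p_a=0.3, p_b=0.3))
```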


GWA studies have identified thousands of lower-penetrance risk variants for common human traits and diseases, typically with small effect sizes (odds ratios between 1.1 and 1.5).126 Lower-penetrance genetic variants associated with breast cancer outside familial syndromes confer slight risk alterations (odds ratio of approximately 1.2).127 By comparison, the high-penetrance variants in BRCA1 and BRCA2 identified by linkage have odds ratios between 2 and 4.127
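Per-allele odds ratios like these are typically estimated from a 2×2 table of allele counts in cases and controls. The sketch below computes an allelic odds ratio with a Woolf (log-scale) 95% confidence interval. The counts are hypothetical. They were chosen to give an odds ratio of about 1.2, the magnitude described for lower-penetrance breast cancer variants.

```python
import math


def allelic_odds_ratio(case_alt: int, case_ref: int, control_alt: int,
                       control_ref: int, z: float = 1.96) -> tuple[float, float, float]:
    """Allelic odds ratio with a Woolf (log-based) 95% confidence interval.

    Inputs are allele counts (two alleles per genotyped individual).
    """
    counts = (case_alt, case_ref, control_alt, control_ref)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    se_log_or = math.sqrt(sum(1 / c for c in counts))  # standard error of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts from 2,000 cases and 2,000 controls (4,000 alleles each).
or_, lo, hi = allelic_odds_ratio(case_alt=1200, case_ref=2800,
                                 control_alt=1050, control_ref=2950)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```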

Most genetic cancer risk variants identified so far confer relatively small increments in risk and explain only a small proportion of familial clustering.128 The risk variants detected by GWA studies do not account for much of the heritability of most common disorders (the “missing heritability”). This has led to an emerging view that rare variants with larger effect sizes could be responsible for a substantial proportion of the genetic risk for complex human disease.129 Significant advances in genome sequencing now offer the possibility of using this technology as an alternative to GWA studies for detecting rare genetic risk variants.

1.6.3 DNA sequencing

DNA sequencing is the process of determining the precise order of nucleotides in a given DNA sample. One aim of DNA sequencing is to identify genomic variation and to associate those changes with human disease. A breakthrough in DNA sequencing technology was the development of Sanger’s chain termination method.130 In this approach, DNA polymerase copies a template strand, and sequencing relies on the selective incorporation of chain-terminating dideoxynucleotides, which halt extension.130 For approximately 40 years, Sanger sequencing was the most widely used approach.
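The logic of chain termination can be simulated in a few lines. Each ddNTP reaction yields fragments that end wherever that base was incorporated. Ordering all fragments by length then reads out the newly synthesised strand. The sketch below is idealised. It assumes every possible termination product is present, which is what low ddNTP concentrations aim to achieve. Separation by length (gel or capillary electrophoresis) is modelled simply as sorting. The template sequence is invented.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def chain_termination_fragments(template_3_to_5: str) -> dict[str, list[int]]:
    """Lengths of the terminated fragments produced in each ddNTP reaction.

    The template is written 3'->5', so the newly synthesised strand reads
    5'->3' in the same order. In the ddA reaction, for example, extension
    stops wherever a ddATP is incorporated opposite a T in the template.
    """
    synthesised = "".join(COMPLEMENT[base] for base in template_3_to_5)
    fragments: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesised, start=1):
        fragments[base].append(length)
    return fragments


def read_sequence(fragments: dict[str, list[int]]) -> str:
    """Read the synthesised strand by ordering fragments from shortest to longest."""
    base_at_length = {
        length: base for base, lengths in fragments.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


fragments = chain_termination_fragments("TACGGA")
print(fragments)
print(read_sequence(fragments))
```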

Since the completion of the Human Genome Project in 2003, there have been substantive developments in Next Generation Sequencing (NGS) technologies. The first human genome took 13 years and several billion dollars to complete; a human genome can now be sequenced in a day for US$1,000 (at 20X coverage). Sequencing speed has increased because NGS sequences millions of different DNA fragments in parallel, enabling the simultaneous detection of multiple mutations in multiple genes.131 The development of affordable and efficient next generation DNA sequencing technologies has now provided a new study paradigm to search for rare risk variants involved in common, complex diseases.
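Coverage figures such as "20X" refer to the average number of reads overlapping each base. Under the idealised Lander-Waterman model, mean coverage is C = NL/G for N reads of length L on a genome of size G. Per-base depth is then approximately Poisson-distributed. The sketch below uses that model. The genome size (3.1 Gb) and read length (150 bp) are assumptions for illustration. Real coverage is less uniform than the Poisson model predicts, for example because of GC-content bias.

```python
import math


def mean_coverage(n_reads: int, read_length: int, genome_size: int) -> float:
    """Lander-Waterman mean coverage: C = N * L / G."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Number of reads required to reach a target mean coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


def fraction_covered_at_least(k: int, coverage: float) -> float:
    """Poisson probability that a given base is covered by at least k reads."""
    below_k = sum(
        math.exp(-coverage) * coverage**i / math.factorial(i) for i in range(k)
    )
    return 1.0 - below_k


GENOME_SIZE = 3_100_000_000  # assumed approximate haploid human genome size (bp)
READ_LENGTH = 150            # assumed read length (bp)

n = reads_needed(20, READ_LENGTH, GENOME_SIZE)
print(f"reads for 20X: {n:,}")
print(f"fraction of bases with >= 10 reads at 20X: {fraction_covered_at_least(10, 20):.4f}")
```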

The impact of NGS technology on the discovery of genetic variants in human disease has been profound. Since the introduction of NGS, there have been enormous advances in the speed, read length and throughput of sequencing studies.132 The advent of NGS has allowed the interrogation of nearly every base in the genome.133

The growth in cancer genomics discovery has been unprecedented. The number of genes known to be frequently mutated in cancer has grown from four in 2004 to over 600 genes currently listed in the Catalogue of Somatic Mutations in Cancer (COSMIC) (v79, released 14-NOV-16).134 Initiatives such as the Cancer Genome Atlas135 and the International Cancer Genome Consortium136 have employed NGS strategies to characterise tumour genomes. They provide multi-platform data for thousands of tumours from a variety of cancer types and subtypes.137

Typical NGS applications include:

- DNA sequencing;
- RNA sequencing, to measure gene expression changes and discover new transcripts;
- chromatin immunoprecipitation sequencing, to detect genome-wide transcription factor binding sites and chromatin-associated modifications;
- methylation sequencing, to profile various types of DNA methylation.138,139

Next generation DNA sequencing can be used for whole genome sequencing (WGS), whole exome sequencing (WES), or sequencing of a specifically targeted region of the genome.138 The NGS workflow consists of multiple steps, including library preparation and enrichment, sequencing, base calling, sequence alignment and variant calling.
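At the base-calling step, each base is assigned a quality score Q on the Phred scale, Q = -10 log10(P), where P is the estimated probability that the call is wrong. In FASTQ files these scores are stored as ASCII characters, most commonly with the Phred+33 offset used by the Sanger FASTQ format and modern Illumina output. The sketch below shows the conversions. The example quality string is invented.

```python
import math


def phred_quality(error_probability: float) -> float:
    """Phred quality score: Q = -10 * log10(P_error)."""
    return -10 * math.log10(error_probability)


def error_probability(quality: float) -> float:
    """Inverse transform: P_error = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def decode_qualities(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string using the Phred+33 ASCII convention."""
    return [ord(symbol) - offset for symbol in quality_string]


qualities = decode_qualities("II?5+")
print(qualities)
print([error_probability(q) for q in qualities])
```

A score of Q30 therefore corresponds to an estimated 1-in-1,000 chance that the base call is incorrect.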

1.6.4 Whole exome sequencing

WES involves sequencing only the protein-coding region of the genome. The human exome makes up approximately 1% of the human genome. However, the majority (85%) of disease-causing mutations in Mendelian disorders are expected to arise in the exome.140 WES is therefore a cost-effective initial strategy to identify disease-causing variants. In the last decade, WES of unrelated individuals, or of families with multiple members affected by a rare disorder, has identified the genetic basis of diseases such as Freeman-Sheldon syndrome, Kabuki syndrome, Miller syndrome, and autosomal dominant spinocerebellar ataxia.141–146 WES studies have also identified more than 50 novel tumour-predisposing genes, listed in Appendix B.
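Analysis is usually restricted to the exome using a file of capture target intervals in BED format. One common pitfall is that BED intervals are 0-based and half-open, whereas VCF positions are 1-based. The sketch below checks whether a VCF position falls within a set of target intervals. The intervals are invented. The code assumes they have already been merged, so that intervals on the same chromosome do not overlap.

```python
from bisect import bisect_right


def load_targets(bed_lines):
    """Parse BED lines into sorted (starts, ends) lists per chromosome.

    Assumes non-overlapping intervals on each chromosome (e.g. already merged).
    BED coordinates are 0-based, half-open: [start, end).
    """
    intervals: dict[str, list[tuple[int, int]]] = {}
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.split("\t")[:3]
        intervals.setdefault(chrom, []).append((int(start), int(end)))
    targets = {}
    for chrom, ivs in intervals.items():
        ivs.sort()
        targets[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return targets


def in_targets(targets, chrom: str, vcf_pos: int) -> bool:
    """Is a 1-based VCF position inside any target interval?"""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    pos0 = vcf_pos - 1  # convert 1-based VCF POS to a 0-based BED coordinate
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


# Hypothetical capture targets (coordinates invented for illustration).
bed = ["chr1\t100\t200", "chr1\t500\t650", "chr2\t1000\t1100"]
targets = load_targets(bed)
for pos in (100, 101, 200, 201, 600):
    print("chr1", pos, in_targets(targets, "chr1", pos))
```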

1.6.5 Whole exome sequencing of cancer cluster families

While WES has been used with great success to identify novel tumour-predisposing mutations, only one published study has used WES to identify pleiotropic genetic risk variants that predispose families to more than one type of cancer.

The recent WES study by Thutkawkorapin et al. (2016) utilised NGS technology to investigate a family with a dominant cancer syndrome with a high risk of both rectal and gastric cancer.147 The authors hypothesised that the mixed representation of rectal and gastric cancer among family members was due to a single predisposing mutation in one gene.147 They performed WES in three family members, two with rectal cancer and one with gastric cancer. They then followed up with WES and Sanger sequencing in additional family members, other patients and controls.147 Thutkawkorapin et al. identified 12 novel nonsynonymous single nucleotide variants (SNVs) shared among five affected members of this family. The authors suggested that at least five of the 12 variants may be candidates that contributed to the disease in the family.147 These variants did not segregate in other families and are therefore unlikely to be highly penetrant variants.
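The core filtering logic of such a family design can be sketched with set operations. First keep the variants carried by every affected relative, then remove those already catalogued in population databases. This is a simplified illustration and not the pipeline of Thutkawkorapin et al. The sample identifiers and variants are invented. The `known` set stands in for a resource such as dbSNP or ExAC. Genotype quality, zygosity and missing calls are ignored.

```python
Variant = tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def shared_by_affected(calls: dict[str, set[Variant]], affected: list[str]) -> set[Variant]:
    """Variants carried by every listed affected family member."""
    if not affected:
        raise ValueError("at least one affected individual is required")
    # set.intersection returns a new set, so the input sets are not modified.
    return set.intersection(*(calls[sample] for sample in affected))


def novel_candidates(shared: set[Variant], known: set[Variant]) -> set[Variant]:
    """Drop variants already present in a catalogue of known variants."""
    return shared - known


# Hypothetical calls for three affected relatives (all values invented).
calls = {
    "II-1": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr3", 42, "G", "A")},
    "II-3": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T")},
    "III-2": {("chr1", 1000, "A", "G"), ("chr2", 500, "C", "T"), ("chr5", 7, "T", "C")},
}
known = {("chr2", 500, "C", "T")}  # stands in for a population variant database
shared = shared_by_affected(calls, ["II-1", "II-3", "III-2"])
print(sorted(shared))
print(sorted(novel_candidates(shared, known)))
```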

1.7 Next generation sequencing study considerations

NGS technologies can be used to identify rare variants in tumour or germline DNA that increase an individual's susceptibility to developing cancer.133 It is essential to compare tumour DNA with matched germline DNA to distinguish somatic from germline alterations in cancer.148 Germline variants are present in the normal germline sequence.149 Somatic variants are those present in the tumour sequence but not in the normal germline sequence.149 The ability of NGS to detect somatic variants depends on the variant frequency within the tumour sample, sample contamination, tumour heterogeneity, sequencing error, and the scarcity of somatic mutations within a genome.150,151
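The germline/somatic distinction above can be sketched as a set comparison between tumour calls and matched-normal calls. This is a minimal illustration, not a somatic variant caller: it assumes variants are already represented as hypothetical (chrom, pos, ref, alt) tuples and ignores sequencing error and contamination of the normal sample. The second function gives the idealised expected variant allele frequency (VAF) of a clonal somatic mutation, showing why a low proportion of tumour cells makes somatic variants harder to detect.

```python
def classify_tumour_variants(tumour_calls, germline_calls):
    """Label each tumour variant as 'germline' (also present in the matched
    normal sample) or 'somatic' (present in the tumour only).

    Variants are hypothetical (chrom, pos, ref, alt) tuples.
    """
    germline = set(germline_calls)
    return {v: ("germline" if v in germline else "somatic") for v in tumour_calls}


def expected_somatic_vaf(purity, mutated_copies=1, tumour_copies=2, normal_copies=2):
    """Idealised VAF of a clonal somatic mutation in a tumour sample.

    Non-tumour cells in the sample contribute only reference alleles,
    so the mutant allele is diluted as purity falls.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    mutant_alleles = purity * mutated_copies
    total_alleles = purity * tumour_copies + (1 - purity) * normal_copies
    return mutant_alleles / total_alleles


# Hypothetical calls: the chr12 variant is shared with the normal sample,
# the chr17 variant is found in the tumour only.
tumour = [("chr12", 1000, "G", "A"), ("chr17", 2000, "C", "T")]
normal = [("chr12", 1000, "G", "A")]
labels = classify_tumour_variants(tumour, normal)

# 40% tumour cells, heterozygous mutation, diploid region
vaf = expected_somatic_vaf(0.4)
```

Under these assumptions a heterozygous somatic mutation in a sample containing 40% tumour cells is expected at a VAF of about 0.2, rather than the 0.5 expected for a heterozygous germline variant.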

Recently there has been a return to family-based designs to identify rare risk variants involved in common human disease, based on the hypothesis that affected members of the same family will carry the same rare susceptibility variant.133,152–155

Therefore, the number of individuals needed for rare variant discovery is potentially smaller than in cohorts of unrelated individuals.133 Families used in these types of studies to identify rare inherited variants can be either consanguineous families or non-consanguineous, large, multigenerational and multiplex pedigrees.133 Targeted sequencing technologies have been used successfully to identify new causal genes in hereditary non-polyposis colon cancer and familial adenomatous polyposis,156 and in hereditary breast and ovarian cancers.157 Two-phase NGS family study designs are recommended. In the first phase, family members are sequenced, and the discovered variants are ranked according to their likelihood of being associated with the trait.158 In the second phase, the variants are tested for association in an independent population-based sample.158

Families used to study cancer clustering should be selected carefully. Suitable families have multiple affected and unaffected individuals from two or more generations available for analysis.49 Families in which various members develop a rare form of cancer, such as sarcoma, are more likely to have a mutation segregating in an inherited cancer gene than families affected by more common types of cancer, for example, adenocarcinomas of the lung, breast, prostate and colon.49 Therefore, ideal families for genetic studies of familial cancer clustering are those with multiple generations of affected and unaffected individuals and with multiple cases of a rare form of cancer such as sarcoma.

1.8 Known cancer predisposition genes

Over the last 30 years, more than 100 cancer predisposition genes have been identified using a variety of strategies.134,159–161 Figure 1.1 shows the location of known cancer predisposition genes, and a full list of known cancer predisposition genes is available in Appendix F. However, only a small proportion of familial cancer risk can be explained by established cancer predisposition genes.38,162

If much of the missing heritability of cancer predisposition is due to variants that are too rare to be detected by GWA studies but have relatively large effects on risk, family-based NGS strategies may facilitate the discovery of the rare genetic mutations that explain the remaining genetic risk.


1.9 Summary

A substantial number of studies have been performed to identify genetic risk variants associated with cancer. Linkage studies have identified highly penetrant risk alleles associated with Mendelian autosomal dominant cancer predisposition syndromes. Association studies have been used successfully to identify less penetrant variants associated with more common types of cancer. However, much of the heritability of cancer remains unexplained. The introduction of NGS technology has allowed the identification of rare variants that are expected to explain some of the missing heritability of cancer.

Study considerations for using NGS in cancer research include sequencing both tumour and germline DNA to differentiate somatic from germline mutations, and using a family-based study design that selects families with multiple generations of affected and unaffected individuals and multiple cases of a rare form of cancer such as sarcoma.

To date, there have been few studies of shared genetic risk factors in cancer cluster families that are not defined by a known familial cancer predisposition syndrome. This study will perform WES in cancer cluster families of mixed cancer types. WES will be conducted on both affected and unaffected individuals from cancer cluster families identified by a sarcoma proband, to identify rare cancer-predisposing variants. Only one previous study has used NGS technology to investigate shared genetic risk variants across multiple cancer types.147 This study will be the second WES study performed on cancer cluster families to identify shared genetic risk variants, and the first WES study to select cancer cluster families by a sarcoma proband.


Figure 1.1: Location of known cancer predisposition genes (ideogram of chromosomes 1 to 22, X and Y with cytogenetic band labels; marks indicate the position of each known cancer predisposition gene)


1.10 Aims

The identification of genes that predispose individuals to cancer is a high priority in human medical research. It is anticipated that this knowledge will drive a new era of personalised medicine, potentially allowing specific drug treatments and interventions to be tailored to the individual. The use of NGS in families currently represents an optimal study design for the identification of rare genetic variants involved in the risk of cancer and other common complex human diseases. Waves of novel genetic discoveries using this approach now appear regularly in the literature. While it is sometimes difficult to distinguish sporadic from hereditary cancer, a rare cancer such as sarcoma occurring twice within one family is epidemiologically striking.163 The identification of genetic risk factors for cancer will be a significant contribution to medicine, particularly to the provision of health care to cancer patients and their families.

The aims of this study are:

1. To perform WES on three cancer cluster families identified by a sarcoma proband using peripheral blood samples.

2. To identify candidate germline risk variants by prioritising and filtering structural and regulatory variants that segregate with cancer or sarcoma in the three families.

3. To perform a matched tumour and germline analysis on two myxoid liposarcoma patients using peripheral blood genomic DNA and genomic DNA isolated from sarcoma tumour tissue to distinguish somatic mutations.

4. To validate the most significant putative germline and somatic cancer-predisposing mutations in unrelated sarcoma cases and cancer-free controls.


Chapter 2

Aim 1: Whole exome sequencing of three cancer cluster families identified by a sarcoma proband

2.1 Introduction

Next generation sequencing (NGS) has provided tremendous insight into the genomic landscape of several tumour types, including defining tumour subtypes, identifying new druggable targets and understanding the heterogeneity of many tumours.164,165 Protein-coding genes constitute approximately 1% of the human genome but harbour nearly 85% of the disease-causing mutations of Mendelian diseases, although this may be due to ascertainment bias.140,166–169

Genetic variants discovered in the coding regions of genes may inform immediate treatment choices and also further therapeutic discoveries.170,171 Therefore, exome sequencing is an efficient approach for identifying actionable variants. The first aim of this study was to perform whole exome sequencing (WES) in three cancer cluster families ascertained from an index sarcoma patient.


2.1.1 Ion Proton platform

The Ion Proton platform from Thermo Fisher Scientific is a benchtop semiconductor-based system for human genome, exome or transcriptome sequencing. Semiconductor sequencing uses a sequencing-by-synthesis approach and is based on the detection of hydrogen ions released during the polymerisation of DNA.172 The Ion Proton sequencing chemistry uses native deoxynucleoside triphosphates (dNTPs) and electronic sensors to detect the release of hydrogen ions as the dNTPs are incorporated into the growing DNA strand.173 Microwells are sequentially flooded with each dNTP to determine the order of the nucleotides.173 In homopolymer runs, the magnitude of the pH change indicates how many nucleotides were added.173 Errors on the Ion Proton are mostly insertions and deletions in homopolymer runs, because it is difficult to evaluate the magnitude of the signal when several dNTPs are incorporated in one cycle.174
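The relationship between signal magnitude and homopolymer length can be illustrated with a simplified flow-space decoder. This is a hypothetical sketch, not the Torrent Suite base caller: it assumes an idealised, already-normalised signal for each flow and a fixed four-nucleotide flow order (`FLOW_ORDER` is an assumption), and it ignores phasing and signal decay. Rounding each signal to the nearest integer recovers the number of bases incorporated; flows whose signals fall near a half-integer are exactly where homopolymer insertion and deletion errors arise.

```python
FLOW_ORDER = "TACG"  # assumed, simplified cyclic flow order


def call_bases(signals, flow_order=FLOW_ORDER):
    """Convert per-flow normalised signals into a base sequence.

    Each signal is rounded to the nearest integer homopolymer length,
    so a signal close to 3.0 on a 'C' flow is read as 'CCC'.
    """
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        length = max(0, round(signal))  # negative noise counts as no incorporation
        bases.append(nucleotide * length)
    return "".join(bases)


def ambiguous_flows(signals, margin=0.3):
    """Indices of flows whose signal lies within `margin` of a half-integer,
    i.e. where the homopolymer length is hard to decide."""
    return [i for i, s in enumerate(signals) if abs(s - round(s)) > 0.5 - margin]


# Hypothetical normalised signals for eight flows (T, A, C, G, T, A, C, G)
read = call_bases([1.02, 0.05, 2.96, 0.98, 0.10, 1.95, 0.03, 0.07])
uncertain = ambiguous_flows([1.02, 2.45, 0.6])
```

Here `read` is "TCCCGAA", while a signal of 2.45 could plausibly be a two- or three-base homopolymer, which is why it is flagged as uncertain.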

Automated sequencing analysis is performed using the Torrent Suite software, which is preinstalled on the Torrent Server. The web-based interface can be used to plan, monitor and view the results of sequencing runs. The Torrent Suite base calling algorithm converts the raw signal data into a sequence of bases and writes the sequence to an unaligned Binary Alignment/Map (*.bam) file. The *.bam file is then aligned using the Torrent Mapping Alignment Program (TMAP). Variants are called using the Torrent Variant Caller (TVC). Both TMAP and TVC were developed specifically for Ion Torrent data and were used in this chapter.


2.2 Methods

2.2.1 Families selected for whole exome sequencing

The patients were recruited from the International Sarcoma Kindred Study (ISKS). The ISKS was initiated in 2008 to investigate the prevalence and nature of heritable risk in sarcoma populations.175 The ISKS is a global genetic, biological, epidemiological and clinical resource for researchers investigating the hereditary characteristics of sarcoma. Patients were recruited from several sites across Australia, France, New Zealand, India, the United States of America, the United Kingdom and Canada. The ISKS Steering Committee granted access to the database for this study under an ethically approved protocol (University of Western Australia (UWA) Human Research Ethics Committee RA/4/1/6434).

Patients with sarcoma (probands) were recruited from major sarcoma treatment centres, regardless of their family history of cancer. Individuals with adult-onset sarcoma (> 15 years old) were eligible for the ISKS. Family members were also invited to participate if the patient with sarcoma was < 45 years of age or there was a significant family history of cancer.175 Study questionnaires collecting demographic, medical, epidemiological and psychosocial information were completed, including personal history of cancer and exposure to known risk factors for sarcoma.176

Patients were also asked to donate a venous blood sample and a tumour sample, and to provide access to their medical information and to information about deceased relatives (collected from cancer registries and other health organisations). Medical history and treatment records were obtained for each proband where possible.176 All reported cancer diagnoses were independently verified using medical records, Australian and New Zealand cancer registries or death certificates.

There are now more than 1,300 families enrolled in the ISKS, each with detailed pedigree information and verified cancer incidence. More than 1,800 blood samples have been collected and approximately 2,100 questionnaires completed. The average age at onset of sarcoma in the ISKS cohort is 46.6 years (range 3-95 years), and most are soft tissue sarcomas. Family members have reported over 2,000 other cancers. The average age at diagnosis of these other cancers is 57.9 years, compared with 65.6 years in the general population.175


Since the establishment of the ISKS, several studies have focused on identifying TP53 germline mutations in Li-Fraumeni syndrome and the less stringently defined Li-Fraumeni-like syndrome in the cohort.163,176,177 A previous study found pathogenic TP53 mutations in blood DNA of 20 of 559 sarcoma probands (3.6%) in the ISKS cohort.176 A study of familial cancer cluster patterns in the ISKS found that 14% of ISKS families showed patterns of familial clustering that did not conform to any known syndrome.163 A more recent study of the ISKS used a case-control rare variant burden test and found that more than half of the sarcoma patients had an excess of putatively pathogenic monogenic and polygenic germline variation in known and novel cancer genes.178

The finding that 14% of cancer cluster families in the ISKS do not conform to known syndromes, together with the excess of rare monogenic and polygenic germline mutations in more than half of the ISKS patients, indicates the potential utility of this cohort for identifying novel genetic risk factors for sarcoma and for cancer clustering in families. Three ISKS families that do not conform to known cancer syndromes were selected for the current study and represented a unique opportunity to identify novel variants that may influence sarcoma or cancer development.

These three families were selected for the current study based on the following selection criteria:

• The sarcoma proband must have blood and tumour biospecimens available

• The pedigree must contain a first-degree relative with cancer who also has germline samples available

• The pedigree must contain at least one unaffected relative with germline material available, and

• The family is not defined by TP53 or other known familial cancer susceptibility genes
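The four selection criteria above can be expressed as a simple filter over pedigree records. The `Relative` and `Family` structures and their field names are hypothetical and do not reflect the ISKS database schema; the sketch only shows how the criteria combine, with all four required to hold. The example encodes family 1 from its description in this section.

```python
from dataclasses import dataclass, field
from typing import List

FIRST_DEGREE = {"parent", "sibling", "child"}


@dataclass
class Relative:
    relationship: str       # relationship to the proband, e.g. "parent"
    affected: bool          # diagnosed with cancer
    germline_sample: bool   # germline DNA available


@dataclass
class Family:
    proband_blood: bool
    proband_tumour: bool
    relatives: List[Relative] = field(default_factory=list)
    known_syndrome_gene: bool = False  # e.g. pathogenic TP53 variant identified


def meets_selection_criteria(family: Family) -> bool:
    """True only if all four selection criteria are satisfied."""
    proband_samples = family.proband_blood and family.proband_tumour
    affected_first_degree = any(
        r.relationship in FIRST_DEGREE and r.affected and r.germline_sample
        for r in family.relatives
    )
    unaffected_with_dna = any(
        not r.affected and r.germline_sample for r in family.relatives
    )
    return (proband_samples and affected_first_degree
            and unaffected_with_dna and not family.known_syndrome_gene)


# Family 1: affected father; unaffected twin brother, mother, aunt and grandparents
family_1 = Family(
    proband_blood=True,
    proband_tumour=True,
    relatives=[
        Relative("parent", affected=True, germline_sample=True),
        Relative("sibling", affected=False, germline_sample=True),
        Relative("parent", affected=False, germline_sample=True),
        Relative("aunt", affected=False, germline_sample=True),
        Relative("grandparent", affected=False, germline_sample=True),
        Relative("grandparent", affected=False, germline_sample=True),
    ],
)
selected = meets_selection_criteria(family_1)
```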

Family 1 (Figure 2.1) includes a proband (Patient 1-III-1) who developed Ewing's sarcoma at 15 years of age and whose non-identical twin brother (Patient 1-III-2) has not developed sarcoma. The proband's father (Patient 1-II-2) developed myxoid liposarcoma at 39 years of age. Germline DNA was available from the proband and father, and from the proband's twin brother, mother (Patient 1-II-3), an aunt (Patient 1-II-1) and grandparents (Patient 1-I-1 and Patient 1-I-2), all of whom were unaffected by cancer.

Family 2 (Figure 2.2) was identified by a proband (Patient 2-II-1) who developed myxoid liposarcoma at 61 years of age. The proband's father (Patient 2-I-1) developed prostate cancer at 71 years of age, and two of the proband's sisters were diagnosed with skin melanomas at 44 (Patient 2-II-3) and 46 (Patient 2-II-2) years of age. Germline DNA was available for the proband, one of his unaffected children (Patient 2-III-1), three of his sisters (including an unaffected sister, Patient 2-II-4), and his parents (Patient 2-I-1 and Patient 2-I-2).

In family 3 (Figure 2.3), there are two individuals with sarcoma: the proband (Patient 3-III-1), who developed a primitive neuroectodermal tumour (PNET) at 22 years of age, and her grandmother (Patient 3-I-1), who developed a malignant peripheral nerve sheath tumour (MPNST) at 79 years of age. The proband's father (Patient 3-II-1) was diagnosed with prostate cancer at 51 years of age, and the proband's aunt developed breast cancer at age 36. Germline DNA was available from the proband, her parents (Patient 3-II-1 and Patient 3-II-2), her unaffected brother (Patient 3-III-2) and her grandmother.

Figure 2.1: Pedigree of family 1 (individuals 1-I-1, 1-I-2, 1-II-1, 1-II-2 [sarcoma, 39 years], 1-II-3, 1-III-1 [proband; sarcoma, 15 years] and 1-III-2)


Figure 2.2: Pedigree of family 2 (individuals 2-I-1 [prostate cancer, 71 years], 2-I-2, 2-II-1 [proband; sarcoma, 61 years], 2-II-2 [melanoma, 46 years], 2-II-3 [melanoma, 44 years], 2-II-4 and 2-III-1)

Figure 2.3: Pedigree of family 3 (individuals 3-I-1 [sarcoma, 79 years], 3-II-1 [prostate cancer, 51 years], 3-II-2, 3-III-1 [proband; sarcoma, 22 years] and 3-III-2)


2.2.2 DNA extraction

DNA extraction was performed by researchers at the Peter MacCallum Cancer Centre in Melbourne, Australia. Anti-coagulated blood was processed using a Ficoll gradient. DNA was extracted from the nucleated cell product using the QIAamp DNA Blood Kit (Qiagen).176

2.2.3 Whole exome sequencing

WES was performed by the candidate at the Curtin University - UWA Centre for Genetic Origins of Health and Disease (GOHaD). Two germline samples, from Patient 3-I-1 and Patient 3-III-2, were badly degraded and of poor quality. Therefore, whole genome amplification was performed on these samples using a Qiagen REPLI-g Mini Kit (Qiagen) as per the manufacturer's instructions. Exome library preparation was performed using the Thermo Fisher Scientific Ion AmpliSeq Exome RDY Kit as per the manufacturer's instructions. Libraries were loaded onto the Ion PI v2 BC Chip (Thermo Fisher Scientific) using the Ion Chef and sequenced on the Ion Proton as per the manufacturer's instructions. An overview of the WES pipeline is shown in Figure 2.4.

2.2.3.1 Library preparation

The target regions were amplified from 100 ng of genomic DNA using Ion AmpliSeq Exome RDY Library Preparation, with the Ion AmpliSeq Exome RDY plates and the Ion AmpliSeq HiFi Mix. The amplicons were treated with FuPa reagent to partially digest the primers and to phosphorylate the amplicons. The amplicons were then ligated to Ion Xpress Barcode Adapters, purified and dissolved in 50 µl of Low TE. Validation of enrichment and quantification of target DNA were performed on the ViiA 7 (Thermo Fisher Scientific). Three 10-fold dilutions of an Escherichia coli control library were prepared at 6.8 pM, 0.68 pM and 0.068 pM. Nine microlitres of each control library and each sample were added to wells of a 96-well qPCR plate together with 11 µl of the reaction mixture, for a total reaction volume of 20 µl. The qPCR was run for 40 cycles.
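Library quantification against the E. coli control dilutions relies on a standard curve: Ct is linear in the base-10 logarithm of concentration, so a fitted line can be inverted to estimate an unknown library's concentration. The sketch below is a generic illustration of that calculation, not the ViiA 7 software's algorithm. The Ct values are hypothetical placeholders, chosen so that each 10-fold dilution adds about 3.32 cycles (roughly 100% amplification efficiency).

```python
import math


def fit_standard_curve(concs_pm, cts):
    """Least-squares fit of Ct = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concs_pm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency implied by the slope; about -3.32 corresponds to ~100%."""
    return 10 ** (-1 / slope) - 1


def quantify(ct, slope, intercept):
    """Interpolate a library's concentration (pM) from its Ct."""
    return 10 ** ((ct - intercept) / slope)


STANDARDS_PM = [6.8, 0.68, 0.068]    # E. coli control library dilutions
STANDARD_CTS = [14.0, 17.32, 20.64]  # hypothetical Ct values

slope, intercept = fit_standard_curve(STANDARDS_PM, STANDARD_CTS)
efficiency = amplification_efficiency(slope)
library_pm = quantify(16.0, slope, intercept)  # hypothetical sample Ct
```

Any dilution applied to a sample before qPCR would need to be multiplied back into `library_pm`; no such factor is assumed here.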


Figure 2.4: Whole exome sequencing pipeline flowchart (samples: 19 germline samples; library preparation: Ion AmpliSeq Exome RDY Library Preparation; sequencing platform: Life Technologies Ion Proton; quality check: Torrent Suite software; sequence alignment: Torrent Mapping Alignment Program; variant calling: Torrent Variant Caller plugin, merged using BCFtools, and Genome Analysis Toolkit UnifiedGenotyper; combined calls: BCFtools intersect)


2.2.3.2 Exome sequencing

Run plans specifying the barcode and sample identity number were created for each chip on the Torrent Browser server. The plans were created using the Torrent Suite Software with the run parameters listed in Table 2.1.

Table 2.1: Parameters used to create whole exome sequencing run plans using Torrent Suite software

| Parameter | Specified |
| --- | --- |
| Application | DNA |
| Kit | Ion AmpliSeq Exome Kit |
| Library kit type | Ion AmpliSeq Exome RDY – IC Kit 1x8 |
| Template kit | Ion Chef, Ion PI IC 200 kit |
| Flows | 520 |
| Chip type | Ion PI chip |
| Barcode set | IonXpress |
| Reference library | Human genome build 19 (hg19) |
| Plugins | variantCaller and coverageAnalysis |

The sample libraries were diluted to approximately 50 pM, the optimal input concentration. The Ion PI v2 BC chips were prepared for loading by performing alternate washes with 100% isopropanol, Ion PI Chip Preparation Solution, nuclease-free water, 0.1 M NaOH and 1X Ion Chip Priming Solution as per the manufacturer's instructions. The Ion PI IC Reagents 200 cartridge was removed from the freezer and warmed to room temperature 45 min before the Ion Chef Instrument run.
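Diluting each library to the approximately 50 pM input concentration is a C1V1 = C2V2 calculation. The sketch below assumes a hypothetical final volume of 100 µl and a hypothetical library concentration; the actual volumes used are not stated in this section.

```python
def dilution_volumes(stock_pm, target_pm=50.0, final_ul=100.0):
    """Volumes (µl) of library and diluent needed to reach target_pm.

    Uses C1 * V1 = C2 * V2, solved for V1 (the library volume).
    final_ul is an assumed, illustrative final volume.
    """
    if stock_pm <= 0 or target_pm <= 0:
        raise ValueError("concentrations must be positive")
    if stock_pm < target_pm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_pm * final_ul / stock_pm
    return library_ul, final_ul - library_ul


# A library quantified at 500 pM (hypothetical value)
library_ul, diluent_ul = dilution_volumes(500.0)
```

For a 500 pM library this gives 10 µl of library made up with 90 µl of diluent.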

The Ion Chef Instrument was loaded with treated Ion chips, consumables, reagents and libraries as per the manufacturer's instructions (Thermo Fisher Scientific). The Ion Chef Instrument run was completed overnight.


The following day, the Ion Proton Sequencer was initialised as per the manufacturer's instructions (Thermo Fisher Scientific). The Ion chips were unloaded from the Ion Chef Instrument, and the first chip was loaded into the Ion Proton Sequencer. The second chip was stored in a container at 4 °C until 20 min before the end of the first run. When the first run was completed, the second chip was loaded immediately for sequencing.

2.2.4 Sequence alignment and variant calling

Base calling was performed using the Torrent Suite software (Life Technologies, v4.4.3.3), and the resulting base calls were stored in unmapped *.bam format. The Torrent Suite Torrent Mapping Alignment Program (TMAP) was used to align sequencing reads to the reference genome (human genome build 19, hg19). Some or all of the reads produced by the WES pipeline are used as input for TMAP, along with the reference genome and its index files. The output from TMAP is a mapped *.bam file, from which variants were called using the Torrent Variant Caller (TVC).

2.2.5 Variation to sequence alignment and variant calling

2.2.5.1 Torrent variant caller plugin

As an additional measure, variant calling was performed a second time using the TVC Plugin (Life Technologies, version 5.0.0). The TVC Plugin software was installed on Magnus (Pawsey Centre), a Cray XC40 supercomputer. The AmpliSeq Exome capture browser extensible data (*.bed) file from Life Technologies (available from https://www.ampliseq.com) was used as both the target region *.bed file and the primer trim *.bed file. The output is a variant call format (*.vcf) file containing meta-information lines, a header line and data lines, each describing one position in the genome.179 Each individual was called separately using TVC, generating 19 individual *.vcf files. The settings used to run the TVC Plugin on Magnus are outlined in Table 2.2.
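The three kinds of line in a *.vcf file can be separated with a few lines of code. This is a minimal, illustrative parser that uses only the Python standard library; for real analyses a dedicated library such as pysam is more robust. The example records are hypothetical.

```python
def parse_vcf(lines):
    """Split VCF text into meta-information lines ('##'), the header
    columns ('#CHROM ...') and data records (one dict per data line)."""
    meta, header, records = [], None, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("##"):
            meta.append(line)
        elif line.startswith("#"):
            header = line[1:].split("\t")
        else:
            if header is None:
                raise ValueError("data line found before the header line")
            records.append(dict(zip(header, line.split("\t"))))
    return meta, header, records


example = [
    "##fileformat=VCFv4.1",
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1",
    "chr1\t12345\t.\tC\tT\t50\tPASS\t.\tGT\t0/1",
]
meta, header, records = parse_vcf(example)
```

Each data record is keyed by the header columns, so `records[0]["SAMPLE1"]` holds that sample's genotype field.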


Table 2.2: Parameters used to run the Torrent Variant Caller plugin to call variants

| Parameter | Specified |
| --- | --- |
| Input bam | All *.bam files from the Ion Proton |
| Reference fasta | hg19.fasta |
| Region bed | AmpliSeqExome.20131001.designed.bed |
| Primer trim bed | AmpliSeqExome.20131001.designed.bed |
| Error motifs | ampliseqexome_germline_p1_hiq_motifset.txt |

Each of the 19 patients was called individually, and the calls were then merged using Binary Variant Call Format Tools (BCFtools) vcf-merge179 to create a single *.vcf file. Because TVC calls each *.bam file individually, it is uncertain whether a position absent from a sample's calls is truly missing or is homozygous reference. BCFtools missing-to-reference179 was therefore run on the merged file to set unknown positions to homozygous reference (0/0).
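The merge and missing-to-reference steps can be sketched on in-memory genotype tables. This is a hypothetical illustration of the logic, not the BCFtools implementation: each sample's calls are a dictionary keyed by (chrom, pos, ref, alt), a site absent from a sample becomes './.' after merging, and filling then rewrites every './.' as '0/0'. The fill assumes that an absent call means homozygous reference, which will be wrong at positions that simply had no coverage.

```python
MISSING = "./."


def merge_samples(per_sample_calls):
    """Merge {sample: {site: genotype}} into {site: {sample: genotype}}.

    The result covers the union of all sites; a sample with no call at a
    site is given the missing genotype './.'.
    """
    sites = sorted({site for calls in per_sample_calls.values() for site in calls})
    return {
        site: {sample: calls.get(site, MISSING)
               for sample, calls in per_sample_calls.items()}
        for site in sites
    }


def missing_to_reference(merged):
    """Replace './.' with '0/0' in place; return the number of changes."""
    changed = 0
    for genotypes in merged.values():
        for sample, gt in list(genotypes.items()):
            if gt == MISSING:
                genotypes[sample] = "0/0"
                changed += 1
    return changed


# Hypothetical single-sample calls
calls = {
    "sample_A": {("chr1", 100, "A", "G"): "0/1"},
    "sample_B": {("chr1", 100, "A", "G"): "0/1", ("chr2", 50, "C", "T"): "1/1"},
}
merged = merge_samples(calls)
n_filled = missing_to_reference(merged)
```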

2.2.5.2 Genome analysis toolkit

The Genome Analysis Toolkit (GATK, version 3.4.0) UnifiedGenotyper180 was used in addition to single-sample calling with TVC, to check the accuracy of the variant calls. Because GATK can perform multi-sample calling, all 19 patients were called together.

GATK UnifiedGenotyper was run on a secure Linux server owned by GOHaD (operating system: Bio-Linux, based on Ubuntu 14.04.3). UnifiedGenotyper uses a Bayesian genotype likelihood model to estimate the most likely genotypes and allele frequency in a population of samples simultaneously, and produces a genotype for each site. First, each sample was sorted using SAMtools sort and indexed using SAMtools index.181 Picard CreateSequenceDictionary (version 2.4.1, https://github.com/broadinstitute/picard) was used to create a sequence dictionary for the reference sequence, and Picard BedToIntervalList was then used to convert the *.bed file to Picard interval list format. The specifications used to run GATK UnifiedGenotyper on the server are outlined in Table 2.3.
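The idea of a Bayesian genotype likelihood model can be shown for a single biallelic site. This is a deliberately simplified sketch and not GATK's model: GATK uses per-base quality scores and estimates allele frequency jointly across samples, whereas here every read has the same assumed error rate (`error`) and the prior follows Hardy-Weinberg proportions for an assumed alternate allele frequency (`alt_freq`). Both parameter values are illustrative assumptions.

```python
import math


def genotype_posteriors(ref_reads, alt_reads, error=0.01, alt_freq=0.001):
    """Posterior probabilities of 0/0, 0/1 and 1/1 at one biallelic site."""
    if not (0 < error < 0.5 and 0 < alt_freq < 1):
        raise ValueError("error must be in (0, 0.5) and alt_freq in (0, 1)")
    priors = {
        "0/0": (1 - alt_freq) ** 2,
        "0/1": 2 * alt_freq * (1 - alt_freq),
        "1/1": alt_freq ** 2,
    }
    log_post = {}
    for gt, alt_copies in (("0/0", 0), ("0/1", 1), ("1/1", 2)):
        # Probability that a single read shows the alternate allele
        frac = alt_copies / 2
        p_alt = frac * (1 - error) + (1 - frac) * error
        # Binomial log-likelihood; the binomial coefficient is the same for
        # every genotype and cancels when the posteriors are normalised
        log_like = alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
        log_post[gt] = log_like + math.log(priors[gt])
    # Normalise in log space to avoid numerical underflow
    top = max(log_post.values())
    weights = {gt: math.exp(v - top) for gt, v in log_post.items()}
    total = sum(weights.values())
    return {gt: w / total for gt, w in weights.items()}


het_site = genotype_posteriors(ref_reads=48, alt_reads=52)
best = max(het_site, key=het_site.get)
```

With roughly equal reference and alternate read counts, the heterozygous genotype dominates despite its small prior, because the read evidence strongly outweighs it.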


Table 2.3: Parameters used for Genome Analysis Toolkit UnifiedGenotyper to call variants

| Parameter | Specified |
| --- | --- |
| Reference fasta | hg19.fasta |
| Genotype likelihoods model | SNP |
| Input bam | All sorted *.bam files from the Ion Proton |
| Target interval list | AmpliSeqExome.20131001.bed |
| Out mode | EMIT_ALL_CONFIDENT_SITES |
| Metrics | Directory for metrics |
| Stand-conf-call | 50.0 |
| Stand-emit-conf | 10.0 |
| Annotation | AlleleBalance |

2.2.5.3 Intersect variant calls from Torrent Variant Caller and Genome Analysis Toolkit

The resulting *.vcf files from TVC and GATK were combined using BCFtools intersect (isec)181 with exact allele matching to identify the calls common to TVC and GATK. This tool created both the intersection and the complements of the TVC and GATK *.vcf files. The intersect data from both callers were used for the remainder of the analysis.
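Exact allele matching means that a TVC call and a GATK call count as the same variant only if chromosome, position, reference allele and alternate allele all agree. The sketch below illustrates that rule with Python sets over hypothetical (chrom, pos, ref, alt) tuples. It does not reproduce BCFtools isec itself, for instance its handling of multiallelic records.

```python
def intersect_calls(tvc_sites, gatk_sites):
    """Exact-allele comparison of two call sets.

    A call is shared only if CHROM, POS, REF and ALT all match.
    Returns the intersection and both complements.
    """
    tvc, gatk = set(tvc_sites), set(gatk_sites)
    return {
        "intersect": tvc & gatk,
        "tvc_only": tvc - gatk,
        "gatk_only": gatk - tvc,
    }


# Hypothetical calls: both callers report a variant at chr1:200,
# but with different alternate alleles.
tvc_calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
gatk_calls = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "A"), ("chr2", 50, "G", "T")]
result = intersect_calls(tvc_calls, gatk_calls)
```

Because the alternate alleles differ at chr1:200, that position appears in both complements rather than in the intersection.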

2.2.6 Recalibrate variants

GATK VariantRecalibrator180 was used to assign a well-calibrated probability to each variant call in a call set. This tool performs a two-stage process called Variant Quality Score Recalibration (VQSR). The first pass is performed by VariantRecalibrator180 and consists of creating a Gaussian mixture model from the distribution of annotation values over a high-quality subset of the input call set, and then scoring all input variants according to the model.180 The recalibrated variant quality score provides a continuous estimate of the probability that each variant is correct, allowing the call set to be partitioned into quality tranches.182 The primary purpose of the tranches is to establish thresholds within the data that correspond to particular levels of sensitivity relative to the truth sets.

The second pass is performed by the ApplyRecalibration tool,180 which applies the model parameters to each variant in the input *.vcf files to produce a recalibrated VCF file in which each variant is annotated with its variant quality score log-odds (VQSLOD) value.182 This step also filters the calls on this new logarithm of the odds (LOD) score, writing “Pass” in the FILTER column for variants that meet the specified LOD threshold and “LowQual” for variants that do not.180 The filter level selected for the ApplyRecalibration tool was 99.0.
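The two VQSR passes can be illustrated in one dimension. In this simplified sketch, which is an assumption rather than GATK's model, the "good" and "bad" variant populations are each a single Gaussian over one hypothetical annotation (GATK fits multi-component Gaussian mixtures over several annotations), the VQSLOD is the natural-log odds between them, and the tranche threshold is the lowest score that still retains the requested percentage of truth-set sites, such as the 99.0 level used here. The Gaussian parameters and annotation values are hypothetical.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def vqslod(x, good=(0.0, 1.0), bad=(3.0, 1.5)):
    """Log-odds that annotation value x comes from the 'good' model rather
    than the 'bad' model (hypothetical (mean, sd) pairs)."""
    return normal_logpdf(x, *good) - normal_logpdf(x, *bad)


def tranche_threshold(truth_scores, sensitivity=99.0):
    """Lowest VQSLOD cut-off that retains at least `sensitivity` percent
    of truth-set sites."""
    ranked = sorted(truth_scores, reverse=True)
    keep = max(1, math.ceil(len(ranked) * sensitivity / 100))
    return ranked[keep - 1]


def apply_filter(scores, threshold):
    """'PASS' is the VCF convention for records that pass; 'LowQual'
    follows the failing label described in the text."""
    return ["PASS" if s >= threshold else "LowQual" for s in scores]


# Hypothetical annotation values for ten truth-set sites
truth = [vqslod(x) for x in (-0.5, 0.0, 0.2, 0.4, 0.8, 1.0, 1.3, 1.6, 2.0, 2.5)]
cutoff = tranche_threshold(truth, sensitivity=90.0)
labels = apply_filter([vqslod(0.1), vqslod(3.5)], cutoff)
```

Raising the sensitivity level lowers the cut-off, so more true variants pass but so do more false positives, which is the trade-off the tranches are designed to expose.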

2.2.7 Genotype concordance

To validate the genotype calls, concordance was measured in three patients who had previously been genotyped. The three patients, Patient 1-II-2, Patient 2-II-1 and Patient 3-III-1, all sarcoma cases, had been genotyped previously through the ISKS using an Agilent HaloPlex custom capture panel targeting the coding sequences of 85-101 genes. For each of the three sarcoma cases, genotype calls were compared to determine how many calls (0/0, 0/1 or 1/1) were the same in the intersect file and in the previous Agilent HaloPlex genotyping. Any discordant variants were checked in the *.vcf files, and the *.bam files were also visually examined in Integrative Genomics Viewer (IGV, version 2.3.80).183,184
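The concordance check amounts to comparing genotypes at sites called on both platforms. The sketch below is a hypothetical illustration. It normalises genotype strings so that '1/0' and '0/1' compare equal, counts matches, and returns the discordant sites that would then be inspected in the *.vcf and *.bam files. The site keys and genotypes are made up.

```python
def normalise(gt):
    """Canonical genotype string: '|' treated as '/', alleles sorted."""
    alleles = gt.replace("|", "/").split("/")
    return "/".join(sorted(alleles))


def concordance(wes_calls, panel_calls):
    """Compare {site: genotype} dicts at sites present in both.

    Returns (concordant count, compared count, list of discordant sites).
    """
    shared = sorted(set(wes_calls) & set(panel_calls))
    discordant = [s for s in shared
                  if normalise(wes_calls[s]) != normalise(panel_calls[s])]
    return len(shared) - len(discordant), len(shared), discordant


wes = {("chr17", 100): "0/1", ("chr17", 200): "1/1", ("chr13", 300): "0/0"}
panel = {("chr17", 100): "1/0", ("chr17", 200): "0/1", ("chr13", 300): "0/0",
         ("chr5", 400): "0/1"}
agree, compared, to_review = concordance(wes, panel)
rate = agree / compared
```

Sites genotyped on only one platform, such as the chr5 site here, are excluded from the comparison rather than counted as discordant.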

2.3 Results

2.3.1 Families selected for whole exome sequencing

This study included 19 patients from three multigenerational mixed cancer families. Of these, 11 (58%) were female, and nine (47%) had been diagnosed with cancer. The average age of the patients at the time of blood collection was 55.3 years (range: 15 years to 90 years), and the average age at onset of cancer (including sarcoma) was 47.5 years (range: 15 years to 79 years). The average age at onset in the three families is younger than the average age at diagnosis of the other cancers reported in the whole ISKS cohort (57.9 years) but similar to the average age at onset of sarcoma (46.6 years).

2.3.2 Whole exome sequencing

Table 2.4 shows the summary statistics generated by the Torrent Suite software. The average depth of coverage across all samples was 100.66×, which is sufficient for detecting single nucleotide variants (SNVs).185,186 The average number of mapped reads was 38,484,361, and the average total genotyping rate was 98.9%.

Table 2.4: Depth of coverage summary from Torrent Suite

Patient   Mapped reads   On target   Mean depth   Number of variants
1-I-1     43,848,035     94.24%      115.80       47,625
1-I-2     28,509,630     96.37%      79.96        48,690
1-II-1    28,343,027     96.36%      80.53        47,334
1-II-2    38,178,599     93.83%      94.99        47,113
1-II-3    39,158,180     94.60%      98.83        47,915
1-III-1   37,229,527     93.93%      93.94        46,670
1-III-2   42,568,341     95.26%      108.40       47,641
2-I-1     33,480,989     94.43%      87.30        42,574
2-I-2     48,585,532     95.67%      131.80       48,220
2-II-1    35,464,936     94.21%      95.84        47,678
2-II-2    45,333,955     94.63%      119.30       48,491
2-II-3    46,884,691     95.38%      128.30       49,238
2-II-4    36,173,806     95.19%      99.70        48,517
2-III-1   30,353,951     95.56%      82.98        47,282
3-I-1     34,870,702     96.03%      79.57        41,493
3-II-1    42,063,872     95.10%      114.40       53,329
3-II-2    40,663,971     95.07%      110.60       52,846
3-III-1   47,344,500     95.68%      118.20       48,337
3-III-2   32,146,623     95.01%      72.06        41,169
Average   38,484,361     95.08%      100.66       47,482
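As a consistency check, the Average row of Table 2.4 can be recomputed from the per-patient values. The short Python sketch below transcribes the table and takes unweighted column means (one value per patient, so the on-target average is not weighted by read count); the names `TABLE_2_4` and `column_means` are illustrative only.

```python
from statistics import mean

# (patient, mapped reads, % reads on target, mean depth, number of variants),
# transcribed from Table 2.4
TABLE_2_4 = [
    ("1-I-1", 43_848_035, 94.24, 115.80, 47_625),
    ("1-I-2", 28_509_630, 96.37, 79.96, 48_690),
    ("1-II-1", 28_343_027, 96.36, 80.53, 47_334),
    ("1-II-2", 38_178_599, 93.83, 94.99, 47_113),
    ("1-II-3", 39_158_180, 94.60, 98.83, 47_915),
    ("1-III-1", 37_229_527, 93.93, 93.94, 46_670),
    ("1-III-2", 42_568_341, 95.26, 108.40, 47_641),
    ("2-I-1", 33_480_989, 94.43, 87.30, 42_574),
    ("2-I-2", 48_585_532, 95.67, 131.80, 48_220),
    ("2-II-1", 35_464_936, 94.21, 95.84, 47_678),
    ("2-II-2", 45_333_955, 94.63, 119.30, 48_491),
    ("2-II-3", 46_884_691, 95.38, 128.30, 49_238),
    ("2-II-4", 36_173_806, 95.19, 99.70, 48_517),
    ("2-III-1", 30_353_951, 95.56, 82.98, 47_282),
    ("3-I-1", 34_870_702, 96.03, 79.57, 41_493),
    ("3-II-1", 42_063_872, 95.10, 114.40, 53_329),
    ("3-II-2", 40_663_971, 95.07, 110.60, 52_846),
    ("3-III-1", 47_344_500, 95.68, 118.20, 48_337),
    ("3-III-2", 32_146_623, 95.01, 72.06, 41_169),
]


def column_means(rows):
    """Unweighted mean of each numeric column (one value per patient)."""
    return {
        "mapped_reads": mean(r[1] for r in rows),
        "on_target_pct": mean(r[2] for r in rows),
        "mean_depth": mean(r[3] for r in rows),
        "variants": mean(r[4] for r in rows),
    }
```

Rounded to the precision used in the table, these means match the reported averages of 38,484,361 mapped reads, 95.08% on target, a mean depth of 100.66 and 47,482 variants.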


2.3.3 Variant calling

Using the BCFtools missing-to-reference plugin, 5,099,324 unknown (missing) genotype positions in the merged TVC *.vcf files were set to homozygous reference. Overall, 109,503 variants were called by TVC and 238,530 variants were called by GATK UnifiedGenotyper.
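The effect of the missing-to-reference step can be illustrated with a small Python sketch operating on VCF genotype (GT) strings. It is only a conceptual stand-in for the BCFtools plugin: it assumes that only fully missing genotypes (for example `./.`) are converted and that partially missing calls are left untouched, and the function names are hypothetical. It also makes the trade-off discussed in Section 2.4.5 explicit, because a filled-in 0/0 can no longer be distinguished from an observed homozygous reference call.

```python
from typing import List, Tuple


def missing_to_ref(gt: str) -> str:
    """Set a fully missing genotype to homozygous reference, keeping the separator.

    './.' -> '0/0', '.|.' -> '0|0', '.' -> '0'; any other genotype is returned unchanged.
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    if all(a == "." for a in alleles):
        return sep.join("0" for _ in alleles)
    return gt


def fill_missing(genotypes: List[str]) -> Tuple[List[str], int]:
    """Apply missing_to_ref to every genotype and count how many were changed."""
    filled = [missing_to_ref(gt) for gt in genotypes]
    n_changed = sum(1 for old, new in zip(genotypes, filled) if old != new)
    return filled, n_changed
```

Summing `n_changed` over every sample and site in a merged file gives a count analogous to the number of unknown positions reported above.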

Figure 2.5 shows the number of calls made by TVC and GATK and the intersection of both callers. The intersect file contained 94,263 variants across all 19 patients.

[Figure: Venn diagram of variant calls. Genome Analysis Toolkit only: 144,267; intersect: 94,263; Torrent Variant Caller only: 15,240]

Figure 2.5: The number of variants called by Torrent Variant Caller and Genome Analysis Toolkit UnifiedGenotyper, and the number of variants that were called by both callers (intersect)
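The partition in Figure 2.5 follows directly from treating each caller's output as a set of variants. In the Python sketch below, variants are keyed by (chromosome, position, reference allele, alternate allele), which is an assumed matching rule; the matching criteria used in the study may differ. With the totals reported above, the caller-specific counts are 238,530 - 94,263 = 144,267 for GATK and 109,503 - 94,263 = 15,240 for TVC.

```python
from typing import Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def venn_counts(tvc: Set[Variant], gatk: Set[Variant]) -> dict:
    """Caller-specific, shared and combined counts from two call sets."""
    return {
        "tvc_only": len(tvc - gatk),
        "gatk_only": len(gatk - tvc),
        "intersect": len(tvc & gatk),
        "union": len(tvc | gatk),
    }


def partition_from_totals(n_tvc: int, n_gatk: int, n_both: int) -> dict:
    """Recover the same partition when only the totals and the overlap are known."""
    return {
        "tvc_only": n_tvc - n_both,
        "gatk_only": n_gatk - n_both,
        "intersect": n_both,
        "union": n_tvc + n_gatk - n_both,
    }
```

The same arithmetic shows that taking the union instead of the intersection would have retained 253,770 variants.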

2.3.4 Recalibrate variants

Figure 2.6 shows the tranche plot generated by GATK VariantRecalibrator. The first tranche (90.0) has the lowest truth sensitivity but the highest novel transition to transversion (Ti/Tv) ratio; it is very specific but less sensitive.187 Each subsequent tranche introduces additional true positive calls along with a growing number of false positive calls.187 Table 2.5 shows that the 99.0 tranche used in this study contains 85,941 known calls and 3,097 novel calls, and that 48,952 of the 49,447 accessible truth sites were called in this tranche. The resulting file has an additional column generated by VariantRecalibrator that marks each variant as "pass" or "low quality".


Figure 2.6: Genome Analysis Toolkit VariantRecalibrator tranche plot

X-axis: the number of novel variants called. Y-axis: the novel transition to transversion ratio and the overall truth sensitivity. TP (true positive): exact match of non-reference genotype. FP (false positive): additional alternate allele in WES genotype.

Table 2.5: Genome Analysis Toolkit VariantRecalibrator tranche results

Tranche   minVQSLOD   Known calls (Ti/Tv)   Novel calls (Ti/Tv)   Accessible truth sites   Truth sites called
90.0      1.14        75,136 (2.77)         2,254 (1.89)          49,447                   44,502
99.0      1.01        85,941 (2.71)         3,097 (1.50)          49,447                   48,952
99.90     6.32        88,789 (2.69)         4,200 (1.20)          49,447                   49,397
100.00    192.99      88,975 (2.69)         4,528 (1.14)          49,447                   49,447

Ti/Tv: transition to transversion ratio, given in parentheses after the number of known or novel calls.
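Each tranche label corresponds to a target truth sensitivity, which can be recovered as the percentage of accessible truth sites that are called. The Python sketch below is an illustrative check rather than GATK code; it transcribes the last two columns of Table 2.5, and the names `TRANCHES` and `truth_sensitivity` are hypothetical.

```python
# (tranche, accessible truth sites, truth sites called), from Table 2.5
TRANCHES = [
    ("90.0", 49_447, 44_502),
    ("99.0", 49_447, 48_952),
    ("99.90", 49_447, 49_397),
    ("100.00", 49_447, 49_447),
]


def truth_sensitivity(accessible: int, called: int) -> float:
    """Percentage of accessible truth sites recovered by the calls in a tranche."""
    return 100.0 * called / accessible


# Recomputed sensitivity for each tranche, rounded to one decimal place
sensitivities = {name: round(truth_sensitivity(acc, called), 1) for name, acc, called in TRANCHES}
```

For the 99.0 tranche used in this study, 48,952 / 49,447 is approximately 0.990, matching the tranche label.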


Figure 2.7 shows the two-dimensional projection of the mapping quality rank sum (MQRankSum) test versus haplotype score, obtained by marginalising over the other annotation dimensions in the model. The MQRankSum test is the u-based z-approximation from the Mann-Whitney rank sum test188 applied to mapping qualities, comparing reads carrying the reference base with reads carrying the alternate allele.187 This measure can be used to evaluate the likelihood that a SNP is real.
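To make the rank sum statistic concrete, the Python sketch below computes a Mann-Whitney U statistic for the mapping qualities of alternate-allele reads versus reference-allele reads and converts it to the normal (z) approximation. It is a simplified illustration: it omits the tie correction to the variance and any further adjustments GATK may apply, and the function name `mq_rank_sum_z` is hypothetical. With the sign convention used here, a negative value means that alternate-allele reads tend to have lower mapping qualities than reference-allele reads.

```python
import math
from typing import Sequence


def mq_rank_sum_z(ref_mq: Sequence[float], alt_mq: Sequence[float]) -> float:
    """z-approximation of the Mann-Whitney U test (no tie correction)."""
    n_ref, n_alt = len(ref_mq), len(alt_mq)
    if n_ref == 0 or n_alt == 0:
        raise ValueError("need reads supporting both the reference and alternate allele")
    # U counts (alt, ref) read pairs in which the alt read has the higher MQ; ties count 0.5
    u = 0.0
    for a in alt_mq:
        for r in ref_mq:
            if a > r:
                u += 1.0
            elif a == r:
                u += 0.5
    mean_u = n_ref * n_alt / 2.0
    sd_u = math.sqrt(n_ref * n_alt * (n_ref + n_alt + 1) / 12.0)
    return (u - mean_u) / sd_u
```

For example, if every alternate-allele read has a lower mapping quality than every reference-allele read, U is 0 and the score is strongly negative, which is the pattern expected when alternate-allele reads are mis-mapped.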

Figure 2.7: Genome Analysis Toolkit VariantRecalibrator projection for mapping quality rank sum (MQRankSum) versus haplotype score

The upper left panel shows the probability density function that was fitted to the data (green: high quality; red: lowest quality). The remaining three panels are scatter plots in which each single nucleotide polymorphism (SNP) is plotted in the two annotation dimensions (MQRankSum and HaplotypeScore) as a point cloud. In the upper right panel, SNPs are coloured black and red to show which SNPs are retained and filtered, respectively, by the variant quality score recalibration procedure. The lower left panel colours SNPs green, grey and purple to show the distribution of the variants used to train the model (green: SNPs found in the training sets; purple: SNPs given the lowest probability of being true). The lower right panel colours each SNP by its known/novel status (blue: known SNPs; red: novel SNPs).


2.3.5 Genotype concordance

A total of 212 positions across three previously genotyped individuals were used to compare genotype calls from WES with those from the Agilent HaloPlex custom panel (Figure 2.8). Of these 212 positions, 77 were not called in the WES data because of low coverage or primer position.

Of the remaining 135 positions, 123 calls (91%) were concordant between the two data types and 12 calls (9%) were discordant. Two of the 12 discordant calls were in Patient 3-III-1 and were called 1/1 from the Agilent HaloPlex custom panel data but 0/1 from the WES data. The remaining ten discordant calls were all in Patient 2-II-1 and were called 0/0 from the Agilent HaloPlex custom panel data but either 0/1 (six calls) or 1/1 (four calls) from the WES data. Both concordant and discordant calls were kept in the intersect file. Because all of the genotyped positions were located in easy-to-map regions of the genome, these results may not reflect the true false positive and false negative rates across all positions.


[Figure: Venn diagram comparing genotype calls from Agilent HaloPlex custom panel genotyping and Ion Proton whole exome sequencing]

Not called in whole exome sequencing data    77
Concordant                                  123
Discordant                                   12
  Called as variant by Agilent HaloPlex       2
  Called as variant by Ion Proton            10
TOTAL                                       212

Figure 2.8: Concordance of genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing on Ion Proton for three patients

Blue: called homozygous alternate (1/1) by the Agilent HaloPlex custom panel but heterozygous (0/1) by Ion Proton whole exome sequencing. Green: called variant (0/1 or 1/1) by Ion Proton whole exome sequencing but homozygous reference (0/0) by the Agilent HaloPlex custom panel.


Table 2.6 shows the ten positions in Patient 2-II-1 at which the genotype calls are discordant between the Agilent HaloPlex custom panel and WES, that is, where a variant (0/1 or 1/1) is called in the intersect file but no variant is called by the Agilent HaloPlex custom panel. The genotype calls for Patient 2-II-1 were checked in the TVC *.vcf file, the GATK *.vcf file and the intersect file, and were the same across all three files for the ten positions. The WES genotypes for both parents of Patient 2-II-1 (Patient 2-I-1 and Patient 2-I-2) are included in the last two columns of Table 2.6. Given the genotypes of both parents, the WES genotype calls for Patient 2-II-1 at these positions are likely to be correct.
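The parental check used for Table 2.6, and again for Table 2.7, is a trio Mendelian consistency test: at a biallelic autosomal site, a child inherits one allele from each parent. The Python sketch below (the helper name `mendelian_consistent` is hypothetical) tests whether a child's unphased genotype can be formed in this way. Consistency does not prove that a call is correct, and the test ignores de novo mutation, but an inconsistent call, such as 0/0 in a child whose father is 0/0 and whose mother is 1/1, points to a likely genotyping error.

```python
from itertools import product


def _alleles(gt: str) -> tuple:
    """Split a diploid genotype such as '0/1' or '0|1' into its alleles."""
    return tuple(gt.replace("|", "/").split("/"))


def mendelian_consistent(child: str, father: str, mother: str) -> bool:
    """True if the child's unphased genotype can be built from one paternal and one maternal allele."""
    child_alleles = sorted(_alleles(child))
    for paternal, maternal in product(_alleles(father), _alleles(mother)):
        if sorted((paternal, maternal)) == child_alleles:
            return True
    return False
```

Applied to Table 2.6, every intersect-file genotype for Patient 2-II-1 is consistent with the parental genotypes, whereas the Agilent HaloPlex 0/0 call at the chromosome 4 position is not.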

Table 2.7 shows the two discordant variants for Patient 3-III-1, both of which were called homozygous alternate using the Agilent HaloPlex custom panel but heterozygous in the intersect file. The genotype calls for these two positions were checked in the TVC *.vcf file, the GATK *.vcf file and the intersect file. For the first variant (chromosome 7), TVC called 1/1 whereas GATK called 0/1, and the position was therefore called 0/1 in the intersect file. For the second variant (chromosome 13), both TVC and GATK called 1/1; however, the variant was called 0/1 in the intersect file. For both variants, the parents of Patient 3-III-1 (last two columns) have a homozygous alternate genotype call. On visual inspection of the *.bam files in IGV, Patient 3-III-1 also appears to be homozygous for the alternate allele at these positions. The errors in these calls therefore appear to have arisen when the *.vcf files were intersected.
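One way to avoid silently altering genotypes during intersection is to keep the genotype reported by each caller and flag disagreements for review instead of emitting a single merged call. The Python sketch below illustrates this idea for a single sample. It is a hypothetical alternative, not the procedure used to build the intersect file in this study, and the function and field names are illustrative. Applied to the two positions in Table 2.7, it would keep the chromosome 13 call as 1/1 and flag the chromosome 7 call, where TVC and GATK disagree.

```python
from typing import Dict, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def intersect_with_flags(tvc: Dict[Variant, str], gatk: Dict[Variant, str]) -> Dict[Variant, dict]:
    """Intersect two single-sample call sets, recording both genotypes at each shared site."""
    merged = {}
    for site in tvc.keys() & gatk.keys():
        gt_tvc, gt_gatk = tvc[site], gatk[site]
        agree = gt_tvc == gt_gatk
        merged[site] = {
            "gt": gt_tvc if agree else None,  # None = unresolved; review in IGV
            "tvc": gt_tvc,
            "gatk": gt_gatk,
            "flag": "concordant" if agree else "caller_disagreement",
        }
    return merged
```

Flagged sites can then be prioritised for the kind of visual inspection in IGV described above.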


Table 2.6: Discordant genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing for Patient 2-II-1

Chr  Position   Ref  Alt  Agilent HaloPlex GT  Intersect file GT  2-I-1 (Father)   2-I-2 (Mother)
4    84383810   C    T    0/0                  0/1                0/0              1/1
7    6026775    T    C    0/0                  1/1                1/1              0/1
9    86617265   A    G    0/0                  0/1                0/1 (low reads)  1/1
11   108183167  A    G    0/0                  1/1                1/1              1/1
11   125525195  A    G    0/0                  0/1                0/1 (low reads)  1/1 (low reads)
14   75513883   T    C    0/0                  1/1                0/1 (low reads)  1/1
17   7579472    G    C    0/0                  1/1                0/1              1/1
17   59763347   A    G    0/0                  0/1                0/1              0/1
17   63554591   G    A    0/0                  0/1                0/1              0/0
18   60027241   C    T    0/0                  0/1                0/1              0/1

Chr: chromosome. Ref: reference allele. Alt: alternate allele. GT: genotype. Low reads: fewer than 10 reads at this position.

Table 2.7: Discordant genotype calls between the Agilent HaloPlex custom panel and whole exome sequencing for Patient 3-III-1

Chr  Position   Ref  Alt  Agilent HaloPlex GT  Intersect file GT  3-II-1 (Father)  3-II-2 (Mother)
7    6026775    T    C    1/1                  0/1                1/1              1/1
13   103527930  G    C    1/1                  0/1                1/1              1/1

Chr: chromosome. Ref: reference allele. Alt: alternate allele. GT: genotype.


2.4 Discussion

2.4.1 Evaluation of families used in this study

It has long been recognised that cancer has a familial component. Genetic studies were traditionally performed on sets of related individuals, including Mendel's study of the inheritance of traits from parents to offspring in pea plants, which proposed the underlying mechanisms of inheritance.189 Pedigree studies have been used successfully to identify genes influencing a broad range of monogenic, highly penetrant traits.161

There are several reasons why family studies are used for gene discovery. Firstly, pedigrees are likely to involve a more homogeneous and limited set of causal genes, which enhances the statistical power for gene discovery.190 Secondly, clinical characteristics shared among family members also reduce heterogeneity for analysis.190 Thirdly, because family members share both genetic background and environmental exposures to some extent, the analysis of phenotypes among them is partly controlled for these factors.190 Finally, family data allow a deeper level of genotyping quality control than is possible in studies of unrelated individuals.190

There are also disadvantages to using families in genetic research. It can be more costly to recruit entire pedigrees than unrelated individuals.190 However, the analysis of disease or trait segregation in pedigrees with known genetic markers has proven to be a robust approach to gene discovery.

The study of familial cancer predisposition syndromes characterised by sarcoma probands has provided valuable insight into cancer biology and genetic risk. For example, the study of Li-Fraumeni syndrome defined the role of the tumour suppressor gene TP53 in the development of cancer. Since germline mutations in TP53 were first identified in Li-Fraumeni syndrome families, the gene has also been implicated in the sporadic form of most cancers.51 It is now known that TP53 has roles in the regulation of the cell cycle, DNA repair, apoptosis, cellular metabolism and senescence.191 These findings have had a significant impact on the clinical management of familial cancer predisposition syndromes and of cancers in general.192


The ascertainment of cancer cluster families through a sarcoma proband has also been used to study the incidence and distribution of cancers in relatives of sarcoma probands in families not defined by known syndromes.193–197 These studies found an increased cancer risk in relatives of sarcoma probands, and suggest the presence of shared underlying genetic risk variants independent of known cancer predisposition syndromes.195,196 The families selected for investigation in the current study fall into this category; that is, they are not defined by a known cancer predisposition syndrome and therefore represent an opportunity to identify novel risk variants associated with sarcoma and with cancer risk more broadly.

The ISKS families selected for WES in the current study include sarcoma, prostate cancer and melanoma cases. The co-occurrence of these cancers in families has been reported previously in familial cancer syndromes such as Li-Fraumeni syndrome51,52,198,199 and familial atypical multiple mole melanoma (FAMMM) syndrome (characterised by mutations in the CDKN2A gene),200–203 as well as in non-FAMMM families also found to carry mutations in CDKN2A.202,204 However, the three selected families do not have mutations in CDKN2A and therefore represent an opportunity to identify novel genetic variants that may lead to the development of these cancers within a family.

The number and size of pedigrees vary widely in genetic studies of familial cancer. The number of relatives can range from two family members to extended pedigrees with more than 30 individuals.205,206 The families used in this study are similar in size to those studied by Roach et al. (2010) to discover the causative gene for Miller syndrome, and by Shi et al. (2014) to identify rare POT1 variants in familial cutaneous malignant melanoma.207,208

2.4.2 The use of whole exome sequencing to identify disease-causing variants

WES has been a powerful approach for identifying genes that underlie Mendelian disorders and complex traits.141,144,209,210 To date, most of the genes found to underlie rare Mendelian disorders carry variants in protein-coding sequences that are predicted to have functional consequences and to be deleterious.166,211


WES has also been a powerful and efficient approach for the discovery of genetic mutations in various cancers, identifying more than 50 novel tumour-predisposing genes (Appendix B). The identification of clinically actionable driver mutations through WES has enabled the development of precision oncology therapies.212–215

Many of the genes that have been implicated in hereditary sarcomas play a significant role in the cellular response to DNA damage, and this has led to the development of therapies targeting DNA repair.216,217

WES has the advantage of increased coverage of regions of interest (exons) at lower cost and higher throughput compared with current whole genome sequencing (WGS).148 WES was therefore chosen for this study as an appropriate, affordable and robust in-house method.139,210

2.4.3 Limitations of whole exome sequencing

A weakness of WES is that it largely ignores variants residing in non-coding and intergenic regions that can affect gene expression.218 Non-coding DNA plays an important role in gene regulation and 3D chromatin folding.219 However, the effects of non-coding variants on gene expression are not yet completely understood.220 The effects of regulatory variation may be more subtle, and may be more important in common complex diseases such as cancer than in Mendelian diseases.221 The relevance of regulatory variation to cancer susceptibility in humans is unclear, but it is possible that polymorphisms in non-coding regions have an important role.221,222

As the cost of WGS decreases and resources such as the Encyclopedia of DNA Elements (ENCODE)223 become more useful for interpreting the effects of non-coding variants, WGS will become more widespread. The use of WGS to investigate genetic variants in cancer cluster families may lead to the discovery of mutations in regulatory elements that add to the pool of disease-associated variants.224

Structural variations (defined as DNA sequence alterations other than SNVs, including insertions, deletions, duplications, inversions and translocations)225–227 were not examined using WES in this chapter. There are many challenges in somatic structural variation detection, arising from the limitations of NGS technologies, the complexity of tumour samples and the difficulty of structural variant reconstruction.227 As WGS technologies improve, the use of paired-end reads, deeper coverage and longer sequence reads will facilitate the examination of somatic structural variants in cancer.

2.4.4 The Ion Proton sequencing platform

The Ion Proton generally shows performance similar to that of other high-throughput sequencing platforms.228,229 It has also been reported to produce high-quality data at an average depth and read length comparable to those of the Illumina HiSeq, with a faster turnaround time.172,229,230

The average percentage of reads on target in this study was 95.08%. Reads on target is the number of reads mapped within the target regions divided by the total number of mapped reads, expressed as a percentage. Off-target regions are those located 5’ and 3’ of the target regions (upstream, downstream, untranslated and intronic regions). The percentage of on-target reads depends on the platform used, because each platform uses different target choices, bait lengths, bait densities and capture molecules.185
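The on-target percentage can be illustrated with a short Python sketch that counts how many mapped reads overlap at least one target interval. This is a simplified model: reads and targets are half-open (start, end) intervals on a single chromosome, the targets are assumed to be sorted and non-overlapping, and the rule that any overlap of one base or more counts as on target is an assumption; Torrent Suite's exact definition may differ. The function name `percent_on_target` is hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def percent_on_target(reads: List[Interval], targets: List[Interval]) -> float:
    """Percentage of reads overlapping at least one target (targets sorted, non-overlapping)."""
    if not reads:
        return 0.0
    starts = [s for s, _ in targets]
    on_target = 0
    for r_start, r_end in reads:
        # Last target that starts before the read ends; because the targets are sorted
        # and non-overlapping, it also has the largest end of all such targets
        i = bisect_right(starts, r_end - 1) - 1
        if i >= 0 and targets[i][1] > r_start:
            on_target += 1
    return 100.0 * on_target / len(reads)
```

Binary search over the sorted target starts keeps the check fast even for exome-sized target lists.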

2.4.5 Variant calling software

The TVC software was developed specifically to call variants from Ion Proton sequencing data. However, it cannot produce multi-sample variant call files. The advantage of multi-sample calling in cohort analysis is that it distinguishes homozygous reference genotypes from missing genotypes at non-variant positions.149,231

Multi-sample variant calling reduces the probability of calling random sequencing errors and increases the likelihood of calling alleles of low frequency or low coverage in a single sample.149 The sensitivity and accuracy of variant calling are therefore improved.149 When the samples were called individually using TVC, many positions had to be filled with homozygous reference genotypes, and it was impossible to distinguish missing positions from homozygous reference positions. GATK UnifiedGenotyper can perform multi-sample calling and can therefore distinguish between missing and homozygous reference positions. However, GATK is not well suited to Ion Proton data, as the Ion Proton platform produces markedly different data from the Illumina platform.232 GATK UnifiedGenotyper returned more than twice as many variants (238,530) as TVC (109,503). Anecdotally, GATK produces a higher number of false positives (up to ten times as many, as reported on online bioinformatics forums), which may account for the difference in the number of variants called.

An intersect file of the calls made by TVC and GATK UnifiedGenotyper was created to reduce the number of false positives in the final call set, and to overcome both the single-sample calling limitation of TVC and the platform differences affecting GATK. Previous studies have recommended using multiple callers to generate a final call set.233,234 A simple way to combine call sets is to take the intersection or union of calls as the final calls.234 However, the intersection was a very rigorous approach that reduced the number of variants from 109,503 (TVC) and 238,530 (GATK UnifiedGenotyper) to just 94,263 in the intersect file. Some true variants may therefore have been excluded as a result of using the intersect file. Nevertheless, this may be the best approach for reducing the number of false positive calls.

2.4.6 Concordance

In this study, the concordance rate of genotype calls at 135 positions between WES and the Agilent HaloPlex custom panel was 91%. This rate falls within the range reported in previous studies comparing genotypes from targeted genotyping platforms with sequencing data.

A previous study by Motoike et al. (2014) aimed to validate SNV calls from exome analysis. They sequenced 12 independent genomes from Japanese patients by whole exome sequencing on the Ion Proton semiconductor sequencer (average depth 109).235 Reads were aligned to hg19 using TMAP, and genotype calling was performed on each sample using TVC.235 Single nucleotide polymorphism (SNP) calls from the Illumina Human Omni (version 2.5-8) SNP chip were used as the reference. They analysed a total of 79,143 autosomal SNPs and found the concordance rate between the Omni 2.5-8 and Ion Proton calls to be 81.8–96.0%.235 These figures are comparable to results reported in a previous study.229


The intersect file described in this chapter was used in Aim 2 of this study to identify candidate risk variants. None of the discordant calls were removed from the intersect file. However, because of the findings of the concordance analysis, particularly the incorrect call that was present in the intersect file but in neither of the original *.vcf files from TVC or GATK, each variant detected in the analysis of these data was visually verified in the *.bam file using IGV.


Chapter 3

Aim 2: Identification of candidate germline risk variants in three cancer cluster families

3.1 Introduction

Whole exome sequencing (WES) generates data on a large number of variants, most of which are not relevant to the disease of interest because they do not have a functional effect at the protein or systemic level.236 The second aim of this study was to use the WES data described in Chapter 2 to identify candidate germline risk variants that segregate with cancer or sarcoma in three cancer cluster families. The analysis of WES data requires comprehensive computational approaches and strategies to identify candidate risk variants or genes for a disease of interest.237–239 Despite advances in sequencing platform technology, reference data sets, software and analysis pipelines, there is no gold standard for the filtering and prioritisation of variants. However, many guidelines, tools and online resources have been developed to assist in the identification of functional variants from WES.


3.2 Bioinformatic strategies for variant filtering and prioritisation in whole exome sequencing

3.2.1 Annotation

Because the sequencing of cancer genomes can reveal thousands of mutations, an essential step in the interpretation of WES data is the annotation of variants and their potential effects on genes and transcripts.240 Variant annotation is the process of assigning functional information to DNA variants. At a basic level, annotations identify the genes, transcripts and genomic regions affected; at a higher level, they also predict the impact of the variant on the protein product.

There are over 80 bioinformatic tools available for genomic annotation, many of which are available as web-based applications.241 Most tools focus on the annotation of single nucleotide variants (SNVs), as these are easily identified and analysed.242 However, an increasing number of tools are being developed to annotate copy number alterations and other structural variations, including insertions, deletions, inversions and translocations.241,243–250

The most common form of annotation is the provision of links to public databases such as the National Center for Biotechnology Information (NCBI) Short Genetic Variations Database (dbSNP) or the 1000 Genomes Project.251,252 The functional prediction of variants can be based on a simple sequence-based analysis, a region-based analysis, or an evaluation of the structural impact on proteins.242 The choice of annotation tool largely depends on the variant annotations required.

A widely used tool for annotating the functional consequences of sequence variation is Annotate Variation (ANNOVAR).245 ANNOVAR predicts the functional effects of variants on genes, and also performs genomic region-based annotation and comparison of variants with existing databases.245 ANNOVAR incorporates scores based on evolutionary conservation and in silico prediction of functional consequences.


3.2.1.1 Annotation of non-coding regions

A significant portion of the reads obtained in WES come from outside the designed target region.253 In a typical WES study, approximately 40-60% of the reads are off target, and all or most of these off-target reads are usually ignored.254–256 Three main types of off-target reads are found in WES data: reads from introns and intergenic regions, reads from the mitochondrial genome, and reads from viral genomes.218 Although WES is not designed to identify regulatory variants in intronic and intergenic regions, off-target reads should not be discarded, as many changes outside the coding regions may be responsible for disease phenotypes.253

Annotation also plays an essential role in the interpretation of off-target variants. The Regulome Database (RegulomeDB) can guide the interpretation of regulatory variants in the human genome by identifying potential regulatory changes based on experimental data sets from the Encyclopedia of DNA Elements (ENCODE) and other sources.257 RegulomeDB also includes computational predictions and manual annotations to identify putative regulatory potential and functional variants.257 RegulomeDB uses a heuristic scoring system based on the functional consequence of the variant.257

3.2.2 Variant class filtering

Variant filtering can be carried out using annotations for genomic location and variant class. Annotations from ANNOVAR can be used to identify intronic, exonic, intergenic, 5’ and 3’ untranslated region (UTR), splice site, and upstream or downstream variants.245

For exonic variants, ANNOVAR scans annotated messenger ribonucleic acid (mRNA) sequences to identify and report amino acid changes, as well as stop-gain and stop-loss mutations.245 Exonic missense, nonsense, stop-loss, frameshift and splice site variants all have the potential to affect protein function and are retained during this filtering process.211,239 RegulomeDB scores can also be used to filter for variants that are more likely to lie in a functional location.
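A minimal sketch of this variant class filter in Python is shown below. It assumes an ANNOVAR-style table with `Func.refGene` and `ExonicFunc.refGene` columns; the category labels listed are assumptions based on common ANNOVAR output and may differ between ANNOVAR versions and gene definitions, so they should be checked against the actual annotation file. The function names are hypothetical.

```python
from typing import Dict, Iterable, List

# Assumed ANNOVAR ExonicFunc labels for protein-altering classes (verify per version)
KEEP_EXONIC_FUNC = {
    "nonsynonymous SNV",        # missense
    "stopgain",                 # nonsense
    "stoploss",
    "frameshift insertion",
    "frameshift deletion",
    "frameshift substitution",
}


def is_protein_altering(row: Dict[str, str]) -> bool:
    """Keep splice site variants and exonic variants in the classes listed above."""
    # Func.refGene can hold several ';'-separated labels, e.g. 'exonic;splicing'
    func = set(row.get("Func.refGene", "").split(";"))
    if "splicing" in func:
        return True
    if "exonic" in func:
        return row.get("ExonicFunc.refGene", "") in KEEP_EXONIC_FUNC
    return False


def filter_variant_class(rows: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    return [r for r in rows if is_protein_altering(r)]
```

Splitting on ';' means that labels such as `ncRNA_exonic` are not mistaken for protein-coding exonic variants.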


3.2.3 Population frequency filtering

Population frequency is one of the primary criteria for predicting whether a variant is likely to have a functional effect on the encoded protein.258 Rare nonsense variants might be expected to have a larger functional impact than frequently occurring ones.211,259 The Exome Aggregation Consortium (ExAC) database is the largest catalogue of protein-coding genetic variation to date and is intended to be used as a general population resource for filtering variants, for example by minor allele frequency (MAF).260,261 The ExAC database aggregates high-quality exome DNA sequence data from 60,706 individuals of diverse ancestries.261 The ExAC database is recommended because its allele frequencies are calculated from considerably more samples than those of the Exome Variant Server and the 1000 Genomes Project.252,260 In disease studies, a commonly used starting point for filtering is to remove variants with a MAF > 1%.239
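The MAF filter can be sketched in Python as below. The 1% threshold comes from the text; retaining variants that are absent from the population database (treating them as rare) is a common convention but is an assumption here, and the field name `exac_maf` and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def is_rare(maf: Optional[float], threshold: float = 0.01) -> bool:
    """Keep variants with MAF <= threshold; variants absent from the database (None) are kept."""
    return maf is None or maf <= threshold


def filter_by_maf(variants: Iterable[Dict], threshold: float = 0.01) -> List[Dict]:
    """Remove variants whose population MAF exceeds the threshold (default 1%)."""
    return [v for v in variants if is_rare(v.get("exac_maf"), threshold)]
```

Because only variants with a MAF above 1% are removed, a variant at exactly 1% is retained.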

3.2.4 Evolutionary conservation

Genomic Evolutionary Rate Profiling (GERP) uses a comparative genomics approach to identify putatively functional sequences, comparing sequence similarity across divergent species to find sequences that have been maintained during evolution.262 Pathogenic mutations tend to occur at markedly more conserved positions than benign variants.263,264 GERP uses maximum likelihood evolutionary rate estimation for position-specific scoring.262 GERP scores range from a minimum of -12.36 to a maximum of 6.18. Positive scores represent a substitution deficit (expected for sites under selective constraint), while negative scores represent a substitution surplus.
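GERP's position-specific score counts "rejected substitutions": the number of substitutions expected under neutral evolution minus the number estimated to have occurred at the site. The Python sketch below illustrates only this sign convention. The input values are made up, the function names are hypothetical, and real scores come from maximum likelihood estimation over a multi-species alignment.

```python
def gerp_rs(expected_neutral: float, observed: float) -> float:
    """Rejected substitutions: expected under neutrality minus (estimated) observed."""
    return expected_neutral - observed


def interpret_rs(rs: float) -> str:
    """Map the sign of a GERP-style score to its interpretation."""
    if rs > 0:
        return "substitution deficit (constrained)"
    if rs < 0:
        return "substitution surplus"
    return "matches neutral expectation"
```

A site where far fewer substitutions occurred than expected under neutrality therefore receives a high positive score, consistent with selective constraint.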

3.2.5 Functional impact prediction

In silico analysis of the functional consequences of a variant for protein function, together with estimates of evolutionary conservation, is often used for prioritisation in genetic discovery studies. Non-synonymous variants that lead to an amino acid change in the protein product are of particular interest, as amino acid substitutions account for approximately half of the known genetic variants responsible for human inherited disease.265

Sorting Intolerant From Tolerant (SIFT) and Polymorphism Phenotyping v2 (PolyPhen-2) are commonly used tools that predict whether an amino acid substitution will affect protein function.266,267 SIFT uses sequence homology to predict whether an amino acid substitution will affect protein function and potentially alter phenotype.266 A SIFT score ≤ 0.05 is predicted to be damaging, and a score > 0.05 is predicted to be tolerated. PolyPhen-2 predicts the possible impact of amino acid substitutions on the stability and function of human proteins using structural and comparative evolutionary considerations.267 A PolyPhen-2 score between 0.0 and 0.15 is predicted to be benign, a score between 0.15 and 1.0 is predicted to be possibly damaging, and a score between 0.85 and 1.0 is more confidently predicted to be damaging.267
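The score thresholds quoted above can be expressed as simple classification rules. In the Python sketch below, the cut-offs are those given in the text. Because the stated PolyPhen-2 ranges overlap, the sketch assumes that scores of 0.85 or above take the more confident "probably damaging" label and that a score of exactly 0.15 is benign. The function names are hypothetical, and PolyPhen-2 itself reports categories using model-specific thresholds that may differ from these.

```python
def classify_sift(score: float) -> str:
    """SIFT: scores <= 0.05 are predicted damaging; scores > 0.05 are tolerated."""
    return "damaging" if score <= 0.05 else "tolerated"


def classify_polyphen2(score: float) -> str:
    """PolyPhen-2 categories using the ranges quoted in the text."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("PolyPhen-2 scores lie between 0 and 1")
    if score >= 0.85:           # overlapping upper range: most confident label wins
        return "probably damaging"
    if score > 0.15:
        return "possibly damaging"
    return "benign"             # 0.0 to 0.15 inclusive (boundary is an assumption)
```

Keeping the thresholds in one place makes it straightforward to apply the same rules consistently across all candidate variants.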

An alternative strategy for filtering variants is based on a priori knowledge of the functional involvement of variants or genes. For example, association studies of candidate genes have been used to identify a number of risk genes for complex diseases.268 A candidate gene study both takes advantage of, and is limited by, knowledge of the phenotype, tissues, genes and proteins that are likely to be involved or have previously been implicated in the disease.268,269 Assessing candidate genes that carry functional variants in the context of existing biomedical knowledge and known biomolecular functions can produce a manageable set of variants for further validation or exploration.239 Several next generation sequencing (NGS) studies have identified rare variants associated with disease using a candidate gene approach.270–274

In addition to variant filtering based on annotation and functional impact predictions, strong genetic support is also necessary for assigning possible causality to variants identified using WES.239 Evidence of genetic association or familial segregation should be supplemented by functional and bioinformatic support.


3.2.6 Association analysis in families

Association analysis in families can identify genes that influence complex human traits and is protected against population stratification.275 Variance components models assess the amount of variation in a dependent variable that is associated with one or more random-effects variables.276 Variance components analysis is widely used in the genetic analysis of quantitative traits in family studies.275 This approach is favoured because it can accommodate pedigrees of any size, allows both linkage and association analysis, and tends to be more robust than competing approaches.275

Sequential Oligogenic Linkage Analysis Routines (SOLAR) is a software package that performs variance components analysis in pedigrees.277 Almasy and Blangero (1998)278 extended the strategy developed by Amos (1994)279 for pedigree-based variance components analysis to estimate, using SOLAR, the genetic variance attributable to the region around a specific genetic marker. Maximum likelihood methods that account for the relationships among family members can be used in SOLAR to test for association under a polygenic model.
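For orientation, a measured-genotype association test within a polygenic variance components model is commonly written as follows. This is the textbook form rather than SOLAR's exact internal parameterisation, and the symbols are introduced here for illustration: y is the vector of phenotypes, X the covariate matrix with effects β, g the allele counts at the tested variant with effect β_g, Φ the kinship matrix, I the identity matrix, and σ²_a and σ²_e the additive genetic and residual variances.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \beta_g\,\mathbf{g} + \mathbf{a} + \mathbf{e},
\qquad \mathbf{a} \sim N\!\left(\mathbf{0},\, 2\boldsymbol{\Phi}\,\sigma^2_a\right),
\quad \mathbf{e} \sim N\!\left(\mathbf{0},\, \mathbf{I}\,\sigma^2_e\right)\\
\boldsymbol{\Omega} &= \operatorname{Var}(\mathbf{y}) = 2\boldsymbol{\Phi}\,\sigma^2_a + \mathbf{I}\,\sigma^2_e,
\qquad \boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta} + \beta_g\,\mathbf{g}\\
\ln L &= -\tfrac{1}{2}\left[n\ln 2\pi + \ln\lvert\boldsymbol{\Omega}\rvert
  + (\mathbf{y}-\boldsymbol{\mu})^{\top}\boldsymbol{\Omega}^{-1}(\mathbf{y}-\boldsymbol{\mu})\right]\\
\Lambda &= 2\left[\ln L(\hat{\beta}_g) - \ln L(\beta_g = 0)\right]
  \sim \chi^2_1 \ \text{asymptotically under } H_0\colon \beta_g = 0
\end{aligned}
```

The kinship term 2Φσ²_a is what lets the model account for relatedness: two relatives are expected to have correlated phenotypes in proportion to the genes they share, so the association effect β_g is estimated net of that shared polygenic background.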

3.2.7 Familial segregation

Segregation analysis is a general method for evaluating the transmission of a disease or trait within pedigrees. It can be used to prioritise and filter variants by assessing the co-segregation of candidate variants with disease status.276 This analysis identifies variants that segregate with the disease of interest and are absent in unaffected family members. Segregation analysis can be applied to any pedigree structure and works with both qualitative and quantitative traits.280

3.2.8 Outline of chapter

This chapter describes the annotation, filtering, prioritisation and segregation analysis of WES data to identify putative germline risk variants associated with cancer or sarcoma in three cancer cluster families. WES data from Chapter 2 was annotated using ANNOVAR and RegulomeDB. Putative structural and regulatory variants were filtered using genomic location and variant class or RegulomeDB score. Three different strategies were used to further prioritise rare private variants, known rare variants and candidate gene variants. Prioritised variants were tested for association with sarcoma and cancer using SOLAR. Variants with significant or nominally significant associations were assessed for familial segregation with disease.

3.3 Methods

3.3.1 Ascertainment bias correction

The families selected for this study were ascertained from the International Sarcoma Kindred Study (ISKS),175 as described previously in Chapter 2. To account for ascertainment bias in the sample, a weighted covariate was created using probability unit (probit) regression in R281 (bias reduction in binomial-response generalised linear models (brglm) library, version 3.1.2).281 Probit regression assigns a weight to each individual based on their case status, and this weight can be used as a covariate in modelling.

3.3.2 Intersection

The intersect file created in Chapter 2 from the variant call files of the Torrent Variant Caller (TVC, version 5.0.0) and the Genome Analysis Toolkit (GATK, version 3.4.0) UnifiedGenotyper was used in these analyses. This file consists of 94,623 variants.

3.3.3 Annotation and filtration

ANNOVAR (version 2015Jun16)245 was used to annotate the intersect file using gene-based annotation. Using the ANNOVAR annotation, variants were filtered to retain only putative structural variants. Loci were retained if they: (1) were exonic, (2) were nonsynonymous or resulted in a stop gain or stop loss, (3) were predicted to be deleterious by SIFT and probably damaging by PolyPhen-2, and (4) had a GERP score < 3.
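Criteria (1) to (3) can be sketched as a record filter. The `Annotation` fields and the single-letter prediction codes ("D" for deleterious or probably damaging) are assumptions modelled loosely on ANNOVAR-style output, not ANNOVAR's exact column names, and the gene names are placeholders. Criterion (4), the GERP cut-off, is not included in this sketch and would be one further comparison on a GERP field.

```python
from dataclasses import dataclass

# Exonic consequences kept by criterion (2); labels are modelled on
# ANNOVAR's exonic function categories (an assumption for this sketch).
RETAINED_EXONIC_FUNCTIONS = {"nonsynonymous SNV", "stopgain", "stoploss"}


@dataclass
class Annotation:
    gene: str
    func: str            # genomic location, e.g. "exonic", "intronic"
    exonic_func: str     # e.g. "nonsynonymous SNV", "synonymous SNV"
    sift_pred: str       # "D" deleterious, "T" tolerated, "." missing
    polyphen2_pred: str  # "D" probably damaging, "P", "B", "." missing


def is_putative_structural(a: Annotation) -> bool:
    """Apply criteria (1) to (3): exonic, protein-altering, predicted damaging."""
    return (
        a.func == "exonic"
        and a.exonic_func in RETAINED_EXONIC_FUNCTIONS
        and a.sift_pred == "D"
        and a.polyphen2_pred == "D"
    )


if __name__ == "__main__":
    calls = [  # hypothetical records
        Annotation("GENE_A", "exonic", "nonsynonymous SNV", "D", "D"),
        Annotation("GENE_B", "exonic", "synonymous SNV", ".", "."),
        Annotation("GENE_C", "intronic", ".", ".", "."),
    ]
    print([a.gene for a in calls if is_putative_structural(a)])  # ['GENE_A']
```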


All remaining variants that were not classified as putative structural variants were annotated using RegulomeDB.257 Putative regulatory variants with a RegulomeDB score of 1a, 1b, 1c, 1d, 1e, 1f, 2a, 2b or 2c were retained, as these scores represent the highest confidence that a variant lies within a functional location. Table 3.1 shows the classification of scores from RegulomeDB. Variants at known expression quantitative trait loci (eQTL) are associated with gene expression and are the most likely to have a functional consequence.257 Other subcategories that give high confidence for regulatory variants are transcription factor (TF) binding, TF motifs, deoxyribonuclease (DNase) footprints and DNase peaks.257

Table 3.1: Classification of Regulome database scores

Score   Supporting data
1a      eQTL + TF binding + matched TF motif + matched DNase Footprint + DNase peak
1b      eQTL + TF binding + any motif + DNase Footprint + DNase peak
1c      eQTL + TF binding + matched TF motif + DNase peak
1d      eQTL + TF binding + any motif + DNase peak
1e      eQTL + TF binding + matched TF motif
1f      eQTL + TF binding / DNase peak
2a      TF binding + matched TF motif + matched DNase Footprint + DNase peak
2b      TF binding + any motif + DNase Footprint + DNase peak
2c      TF binding + matched TF motif + DNase peak
3a      TF binding + any motif + DNase peak
3b      TF binding + matched TF motif
4       TF binding + DNase peak
5       TF binding or DNase peak
6       Other

eQTL: Expression Quantitative Trait Loci. TF: Transcription Factor. DNase: Deoxyribonuclease.
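The retention rule for putative regulatory variants amounts to a membership test on the score categories 1a to 2c in Table 3.1. A minimal sketch, in which the constant and function names are hypothetical:

```python
# RegulomeDB categories retained as putative regulatory variants (1a to 2c).
RETAINED_REGULOMEDB_SCORES = frozenset(
    ["1a", "1b", "1c", "1d", "1e", "1f", "2a", "2b", "2c"]
)


def is_putative_regulatory(score: str) -> bool:
    """Return True if a RegulomeDB score falls in categories 1a to 2c."""
    return score.strip().lower() in RETAINED_REGULOMEDB_SCORES


if __name__ == "__main__":
    scores = ["1f", "2b", "3a", "5", "6"]
    print({s: is_putative_regulatory(s) for s in scores})
    # {'1f': True, '2b': True, '3a': False, '5': False, '6': False}
```

Scores are compared as strings rather than numbers because the letter suffix carries information: 3a, for example, lies outside the retained set even though its leading digit is close to 2.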


False positive variants that arise from misalignment, and from inaccuracies and biases in the reference sequence, can be identified and provisionally excluded during a search for disease-causing variants. Fuentes Fajardo et al. (2012) analysed WES data from 118 individuals in 29 families to create a list of 2,157 genes that are candidates for provisional exclusion from exome analysis.282 All filtered variants in this study were cross-referenced to this exclusion list (available in the paper's Supplementary material as 'Table S7 gene exclusion list final') to determine whether any results found in polymorphic regions should be excluded to reduce the risk of false positives.

3.3.4 Prioritisation strategies

3.3.4.1 Prioritisation using a rare private variants strategy

The first prioritisation strategy was applied to the filtered variants from the intersect file to identify rare private variants. Rare private variants are defined as those that are unique to individuals or families and have not been previously annotated.283 A major driving hypothesis behind WES of complex diseases is that multiple rare variants in protein-coding genes contribute to the disease or trait of interest.284 The focus on rare genetic variation is supported by studies predicting that numerous functional and deleterious variants segregate in the population at frequencies too low (0.5–5%) to detect by genome-wide association (GWA) studies.128 Investigators have successfully used this approach to identify rare private variants by removing from further consideration known variants with a reference SNP identification (rs ID) that are found in the International HapMap Project,285 the 1000 Genomes Project,286 or dbSNP.251

To prioritise rare private variants in this study, variants in the intersect file that had previously been annotated with an rs ID were removed.251
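A minimal sketch of this filter, assuming each annotated variant is a record with an `rsid` field that is empty or "." when the variant has no dbSNP rs ID (the field name and the missing-value markers are assumptions):

```python
from typing import Iterable, List, Mapping

MISSING_RSID = {"", "."}  # assumed markers for "no rs ID"


def rare_private(variants: Iterable[Mapping[str, str]]) -> List[Mapping[str, str]]:
    """Keep variants without an rs ID, i.e. not previously annotated in dbSNP."""
    return [v for v in variants if v.get("rsid", ".").strip() in MISSING_RSID]


if __name__ == "__main__":
    calls = [
        {"id": "8:145773319:C:T", "rsid": "."},         # ARHGAP39 variant, Table 3.4
        {"id": "1:12345:A:G", "rsid": "rs_example"},   # hypothetical known variant
    ]
    print([v["id"] for v in rare_private(calls)])  # ['8:145773319:C:T']
```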

3.3.4.2 Prioritisation using a known rare variants strategy

The second strategy was used to prioritise known rare variants from the filtered intersect file using a population database and MAF information. Filtering the WES data for rare variants documented in a large database such as ExAC prioritises variants that occur at a low frequency in the population and may be associated with cancer in these cancer cluster families.

The full list of variants from the ExAC browser was downloaded (version 0.3.1, 30 August 2016). Variants in this list with a MAF ≤ 0.01 (1%) that were also present in the intersect file were selected.
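The selection is a set intersection between the study variants and the rare subset of the reference list. The sketch below assumes that variants are matched on a (chromosome, position, reference allele, alternate allele) key; a real pipeline would also need to normalise indel representation and split multi-allelic sites before matching. The function name and example keys are hypothetical.

```python
from typing import Dict, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def known_rare(
    study_keys: Set[VariantKey],
    reference_maf: Dict[VariantKey, float],
    max_maf: float = 0.01,
) -> Set[VariantKey]:
    """Return study variants present in the reference with MAF <= max_maf."""
    rare_reference = {k for k, maf in reference_maf.items() if maf <= max_maf}
    return study_keys & rare_reference


if __name__ == "__main__":
    study = {("1", 100, "A", "G"), ("1", 200, "C", "T"), ("2", 300, "G", "A")}
    exac = {  # hypothetical reference MAFs
        ("1", 100, "A", "G"): 0.005,
        ("1", 200, "C", "T"): 0.2,
        ("3", 400, "T", "C"): 0.001,
    }
    print(known_rare(study, exac))  # {('1', 100, 'A', 'G')}
```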

3.3.4.3 Prioritisation using a candidate gene strategy

The third prioritisation strategy applied to the filtered intersect file in this study was based on candidate genes selected using a priori knowledge of cancer biology. Variants from the intersect file were filtered to retain those detected in 119 known cancer and sarcoma genes, including the 25 kb upstream and downstream of each gene, so that any potential regulatory variants captured in off-target reads were included. Candidate genes were selected from two cancer gene panels and a search of the Online Mendelian Inheritance in Man (OMIM) database.287

Cancer genes were chosen from the HaloPlex Cancer Research Panel288 and Illumina's MiSeq and TruSeq Cancer Panels.289 Both are NGS target enrichment panels designed for known cancer hotspots. The panels contain genes found in previous research to be associated with a broad range of cancer types, as well as published drug targets.

Candidate genes identified by searching the OMIM database for genes known to be associated with the specific sarcoma subtypes in the three families were also included.287 The full list of cancer genes used in the prioritisation process can be found in Appendix G. Variants in the intersect file that fell within the candidate gene regions were selected.
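A sketch of the region test, assuming each candidate gene is described by its chromosome and 1-based inclusive start and end coordinates. The gene coordinates below are hypothetical; real coordinates would come from a gene annotation on the same reference build as the variant calls. A linear scan is adequate for a list of around 119 genes.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

FLANK_BP = 25_000  # 25 kb upstream and downstream of each candidate gene


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def candidate_gene_hit(chrom: str, pos: int, genes: Iterable[Gene],
                       flank: int = FLANK_BP) -> Optional[str]:
    """Return the first candidate gene whose flanked interval contains pos."""
    for g in genes:
        if g.chrom == chrom and g.start - flank <= pos <= g.end + flank:
            return g.name
    return None


if __name__ == "__main__":
    panel = [Gene("GENE_X", "1", 1_000_000, 1_010_000)]  # hypothetical gene
    print(candidate_gene_hit("1", 1_030_000, panel))  # GENE_X (downstream flank)
    print(candidate_gene_hit("1", 1_040_000, panel))  # None
```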

3.3.5 Methods for testing association of variants with cancer phenotypes

SOLAR (version 7.6.4)277 was used to estimate and test the significance of association under a polygenic model for quantitative phenotypes (age at onset of cancer and age at onset of sarcoma) and disease status (cancer and sarcoma). Covariates were the age and sex of the participant and the age–sex interactions, together with a weighting factor assigned to each individual to correct for ascertainment bias. Disease status was analysed as a discrete binary trait using a liability threshold model in SOLAR. This model uses probit regression for the mean effect component and a standard random effects variance component model for the residual additive genetic component of variance.278,290 Because variance component models are sensitive to kurtosis (a descriptor of the shape of a probability distribution), the quantitative phenotypes were inverse normalised using the SOLAR function, inorm.291
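To show what inverse normalisation does, the Python below is an independent sketch of a rank-based inverse normal transformation, not the SOLAR function used in this study. The rank offset c (0.5 by default; Blom's 3/8 is a common alternative) is an assumption and may not match SOLAR's exact formula. Each value is replaced by the standard normal quantile of its offset rank, so the transformed phenotype has a near-normal shape whatever the kurtosis of the original values.

```python
from statistics import NormalDist
from typing import List, Sequence


def inverse_normal(values: Sequence[float], c: float = 0.5) -> List[float]:
    """Rank-based inverse normal transformation.

    Each value is mapped to the standard normal quantile of
    (rank - c) / (n - 2c + 1); with c = 0.5 this is (rank - 0.5) / n.
    Tied values receive the mean of their 1-based ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied values.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    z = NormalDist()
    return [z.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


if __name__ == "__main__":
    ages = [62.0, 35.0, 48.0, 48.0, 71.0]  # hypothetical ages at onset
    print([round(x, 3) for x in inverse_normal(ages)])
```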

3.3.6 Bonferroni correction

Bonferroni correction was applied to each prioritised variant list to correct for multiple testing.292 For each strategy, the correction was based on the number of variants in the prioritised list. Variants that remained significant after correction for multiple testing, together with nominally significant variants (p-value < 0.05), were investigated for co-segregation in the families.
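The corrected threshold is the family-wise error rate divided by the number of tests. A minimal sketch, assuming a family-wise α of 0.05 and using the numbers of prioritised variants from Table 3.3:

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance threshold for a family-wise error rate of alpha."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


if __name__ == "__main__":
    # Numbers of prioritised variants for each strategy (Table 3.3).
    for strategy, n in [("rare private", 4425), ("known rare", 8840),
                        ("candidate gene", 1297)]:
        print(f"{strategy}: p < {bonferroni_threshold(n):.2e}")
    # rare private: p < 1.13e-05
    # known rare: p < 5.66e-06
    # candidate gene: p < 3.86e-05
```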

3.3.7 Familial segregation analysis

Three assumptions were used to determine familial segregation. First, the variant is rare (shared only by cases in one family). Second, every carrier of a putative disease-causing variant has the phenotype (complete penetrance). Third, every individual with the disorder carries the putative disease-causing variant (a 100% probability of observing the genotype given the phenotype).284
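Under the second and third assumptions, a variant segregates in a family exactly when carrier status and affected status agree for every genotyped member; the first assumption is checked across families. A minimal sketch (function names are hypothetical), with a usage example built from the ARHGAP39 genotypes and phenotypes shown for family 3 in Figure 3.1:

```python
from typing import Dict


def segregates(carrier: Dict[str, bool], affected: Dict[str, bool]) -> bool:
    """Complete co-segregation within one family.

    Every affected member must carry the variant (assumption 3) and every
    carrier must be affected (assumption 2), so the two maps must agree.
    """
    if set(carrier) != set(affected):
        raise ValueError("carrier and affected must cover the same individuals")
    return all(carrier[person] == affected[person] for person in affected)


def private_to_one_family(carrier_counts: Dict[str, int]) -> bool:
    """Assumption 1: carriers of the variant occur in exactly one family."""
    return sum(1 for n in carrier_counts.values() if n > 0) == 1


if __name__ == "__main__":
    # Family 3, ARHGAP39 (Figure 3.1): carriers are heterozygous C/T.
    carrier = {"3-I-1": True, "3-II-1": True, "3-II-2": False,
               "3-III-1": True, "3-III-2": False}
    affected = {"3-I-1": True, "3-II-1": True, "3-II-2": False,
                "3-III-1": True, "3-III-2": False}
    print(segregates(carrier, affected))  # True
```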

Given these assumptions, variants identified by this approach were hypothesised to be private mutations that co-segregate with cancer or sarcoma in each family. The genotypes of any variants found to segregate with the phenotype of interest were visually confirmed by loading the Binary Alignment/Map (*.bam) files into Integrative Genomics Viewer (IGV, version 2.3.80) and counting the reads supporting each allele.183,184
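Allele read counts like those reported in Figures 3.1 to 3.3 can be turned into a provisional genotype by allele fraction. This sketch is not how IGV or the variant callers work: the minimum depth of 10 and the 0.1 and 0.9 alternate-allele fraction cut-offs are hypothetical values chosen for illustration only.

```python
def call_genotype(ref_reads: int, alt_reads: int, ref: str, alt: str,
                  min_depth: int = 10, het_low: float = 0.1,
                  het_high: float = 0.9) -> str:
    """Provisional diploid genotype from allele read counts.

    All three thresholds are hypothetical illustration values.
    """
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "./."  # too few reads to make a call
    alt_fraction = alt_reads / depth
    if alt_fraction < het_low:
        return f"{ref}/{ref}"
    if alt_fraction > het_high:
        return f"{alt}/{alt}"
    return f"{ref}/{alt}"


if __name__ == "__main__":
    # Read depths (ref, alt) from Figures 3.1 and 3.2.
    print(call_genotype(18, 16, "C", "T"))  # C/T (patient 3-I-1, ARHGAP39)
    print(call_genotype(62, 1, "T", "C"))   # T/T (patient 2-II-4, C16orf96)
```

With these cut-offs, a single alternate read at high depth (62,1) is still called homozygous reference, consistent with the genotype reported for patient 2-II-4.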

3.3.8 Evidence further supporting candidate risk genes

The candidate germline risk variants and the genes in which they arise were further examined for association with cancer pathogenesis using several in silico resources, including the Catalogue of Somatic Mutations in Cancer (COSMIC),134 the pathway unification database (PathCards),293 gene ontology (GO) annotations,294 PubMeth (a database of methylation in cancer),295 and NCBI.296 A PubMed search was performed in April 2017 using the string (“gene name”) AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma). Abstracts were screened for relevance to the current study.
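The search string can be generated for each candidate gene so that every gene is queried identically. A minimal sketch, with a hypothetical function name; it only builds the query text and does not submit it to PubMed:

```python
def pubmed_query(gene: str) -> str:
    """Build the PubMed search string used to screen a candidate gene."""
    terms = ["cancer", "malignancy", "tumor*", "tumour*", "sarcoma"]
    return f'("{gene}") AND ({" OR ".join(terms)})'


if __name__ == "__main__":
    print(pubmed_query("ARHGAP39"))
    # ("ARHGAP39") AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma)
```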

3.4 Results

3.4.1 Variant prioritisation

The intersect file containing 94,263 variants was annotated with ANNOVAR and RegulomeDB, and variants in known polymorphic regions were removed. Approximately 42% of variants were exonic and 51% were intronic (Table 3.2). Less than 1% of variants were intergenic. Of the exonic variants, approximately 48% were nonsynonymous and 51% were synonymous, with 0.5% classified as stop gain and stop loss variants.

Table 3.2: Functional annotation of intersect file using ANNOVAR

Function                 Percentage
Exonic                   42.45
  Nonsynonymous          47.61
  Synonymous             50.55
  Stop gain/loss          0.50
  Unknown                 1.35
Intronic                 50.74
Intergenic                0.04
Upstream/downstream       0.68
UTR                       4.96
Other                     1.13

Percentages for the exonic subcategories are of exonic variants. UTR: untranslated region.


3.4.1.1 Prioritisation using a rare private variants strategy

The first prioritisation method was used to identify rare, novel variants not previously reported in reference data sets. Of the 94,263 variants in the intersect file, 4,425 had not previously been annotated with an rs ID. Of these, 1,858 (42%) were exonic, and 1,184 of the exonic variants (64%) were nonsynonymous.

3.4.1.2 Prioritisation using a known rare variants strategy

The second prioritisation method was used to identify known rare variants using the ExAC public database. There were over 10 million variants in the ExAC browser (release 0.3.1, 30 March 2016), of which 3,686,062 had a MAF of less than 0.01. Of these ~3.7 million rare variants, 8,840 were also in the intersect file. Of these, 5,184 (59%) were exonic, and 2,815 of the exonic variants (54%) were nonsynonymous.

3.4.1.3 Prioritisation using a candidate gene strategy

The third prioritisation method was based on a priori knowledge of cancer and sarcoma. The WES intersect file was filtered to retain only variants in known cancer and sarcoma genes (1,297 variants). Of these, 806 were in the known cancer genes listed in Appendix G. The remaining 491 variants were located in the regions 25 kb upstream and downstream of each known cancer gene. Appendix H contains a table of the variants in these upstream and downstream regions that were also prioritised using this method. Of the 1,297 variants, 487 (38%) were exonic, and 211 of the exonic variants (43%) were nonsynonymous.

3.4.1.4 Summary of annotated variants from each prioritisation strategy

A summary of the annotated variants from each prioritisation strategy is presented in Table 3.3. The first section of the table shows the number of variants prioritised by each strategy, followed by the genomic location of the variants, their exonic function and their functional prediction. The final section of the table shows the number of variants classified as putative structural and putative regulatory variants. The results of each prioritisation strategy were tested for significant associations with cancer phenotypes using SOLAR.


Table 3.3: Summary of variant annotation using Annotate Variation and Regulome Database for each prioritisation strategy

Strategy                               Rare private variants   Known rare variants   Candidate gene variants
Number of variants prioritised                         4,425                 8,840                     1,297

Location
  Exonic                                               1,858                 5,184                       487
  Intronic                                             2,170                 3,209                       724
  Downstream                                               8                     6                         5
  Upstream                                                25                    14                         3
  5' untranslated region                                 132                   124                        28
  3' untranslated region                                 119                   197                        38
  Splicing                                                19                    19                         2
  Non-coding RNA                                          91                    84                        10
  Intergenic                                               1                     3                         0
  Upstream/downstream                                      1                     0                         0
  5'/3' untranslated region                                1                     0                         0

Exonic function
  Nonsynonymous                                        1,184                 2,815                       211
  Stop gain                                               40                    34                         1
  Stop loss                                                1                     4                         0
  Synonymous                                             601                 2,268                       273
  Unknown                                                 32                    63                         2

Functional prediction
  Deleterious in SIFT and PolyPhen-2                     254                   449                        22
  Tolerated in SIFT and PolyPhen-2                       545                 1,551                       134
  Unknown in SIFT and PolyPhen-2                       3,189                    46                         6
  Regulome database score < 3                              0                   683                       168

Classification
  Putative structural variants                           254                   449                        22
  Putative regulatory variants                             0                   683                       168

SIFT: Sorting Intolerant From Tolerant. PolyPhen-2: Polymorphism Phenotyping-2.


3.4.2 Rare private variants

3.4.2.1 Association analysis in SOLAR

The annotated rare private variants (Table 3.3) were tested for association with cancer phenotypes using SOLAR. The results were corrected for multiple testing using the Bonferroni method, based on the number of prioritised variants (4,425), giving a corrected significance threshold of p < 1.13 × 10⁻⁵. No variants were significantly associated with a cancer phenotype after correction for multiple testing. Because the variants prioritised by this strategy were rare, novel variants, all nominally significant variants (p-value < 0.05) were visually inspected in IGV to determine whether they could be due to alignment or calling errors. Any variant located near an insertion or deletion, or on the edge of a gap or read block, was removed.

Table 3.4 contains a summary of the nominally significant variants (p-value < 0.05) for age at onset of cancer, age at onset of sarcoma and cancer status. The results show eight variants nominally associated with age at onset of cancer, six variants nominally associated with age at onset of sarcoma, and two variants nominally associated with cancer status. There were no variants with a p-value < 0.05 for sarcoma status.

Of these 12 variants, eight were associated with a single cancer phenotype and four were associated with more than one: two with age at onset of cancer and cancer status, and two with age at onset of cancer and age at onset of sarcoma. Because these rare variants lack an rs ID, MAF data from the 1000 Genomes Project and RegulomeDB annotation were not available for them. All variants identified by this prioritisation strategy were therefore treated as rare risk alleles.


Table 3.4: Summary of SOLAR association results for rare private variants

Chr:Pos Gene p-value Beta SE Exonic function SIFT PolyPhen-2 GERP Ref Alt MAF

Age at onset of cancer

8:145773319 ARHGAP39 0.01 1.36 0.51 NS D D 5.37 C T 0.08

2:232790160 NPPC 0.01 1.03 0.42 NS D D 5.29 C G 0.13

20:57568079 NELFCD 0.02 1.42 0.58 NS D D 5.84 T C 0.05

17:11726319 DNAH9 0.04 1.87 0.89 NS D D 4.05 G A 0.03

6:25517633 LRRC16A 0.04 0.93 0.44 NS D D 5.94 G A 0.05

9:95772634 FGD3 0.04 -1.87 0.89 NS D D 4.29 C A 0.47

19:41128570 LTBP4 0.05 1.00 0.50 Unknown D D 3.77 C A 0.08

6:72889438 RIMS1 0.05 1.30 0.65 NS D D 5.65 G A 0.05

Age at onset of sarcoma

17:72878728 FADS6 < 0.01 1.59 0.49 NS D D 5.15 A G 0.03

17:11726319 DNAH9 0.01 1.51 0.57 NS D D 4.05 G A 0.03

6:25517633 LRRC16A 0.01 0.72 0.27 NS D D 5.94 G A 0.05

19:51021625 LRRC4B 0.03 -0.58 0.27 NS D D 3.45 C G 0.18

16:77227362 MON1B 0.03 -0.67 0.31 NS D D 3.61 G T 0.21

6:30624000 DHX16 0.06 0.56 0.30 NS D D 4.89 A G 0.18

Cancer status

8:145773319 ARHGAP39 0.02 -5.14 2.26 NS D D 5.37 C T 0.08

19:41128570 LTBP4 0.02 -3.26 1.44 Unknown D D 3.77 C A 0.08

Chr:Pos: Chromosome:Position. SE: Standard Error. SIFT: Sorting Intolerant From Tolerant score. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score. Ref: reference allele. Alt: alternate allele. MAF: Minor Allele Frequency in the study population. NS: nonsynonymous. D: deleterious.


3.4.2.2 Segregation analysis results

Of the 12 variants identified, seven were seen in only one family (ARHGAP39, NELFCD, LTBP4, RIMS1, DNAH9, LRRC16A and FADS6). However, when all three criteria for familial segregation were applied, only one conserved, deleterious variant, in the ARHGAP39 gene, showed both nominal association with age at onset of cancer and cancer status and complete familial segregation in family 3 (Figure 3.1). Each family member with cancer was heterozygous at this position, whereas unaffected family members were homozygous for the reference allele. None of the other prioritised rare private variants showed complete familial segregation in any of the families according to the familial segregation criteria.

[Pedigree of family 3. Affected: 3-I-1 (sarcoma), 3-II-1 (prostate), 3-III-1 (sarcoma). Unaffected: 3-II-2, 3-III-2. The key distinguishes affected and unaffected males and females and marks the proband.]

Patient    Genotype at position in ARHGAP39 gene    Read depth (ref, alt)
3-I-1      C/T                                      18,16
3-II-1     C/T                                      38,30
3-II-2     C/C                                      66,0
3-III-1    C/T                                      37,57
3-III-2    C/C                                      13,0

Figure 3.1: Genotypes for the ARHGAP39 variant that shows segregation in patients with cancer in family 3


3.4.3 Known rare variants

3.4.3.1 Association analysis in SOLAR

The annotated known rare variants (Table 3.3) were tested for association with cancer phenotypes using SOLAR. The results were corrected for multiple testing using the Bonferroni method, based on the number of prioritised variants (8,840), giving a corrected significance threshold of p < 5.66 × 10⁻⁶. No variants were significant after correction for multiple testing. Table 3.5 contains a summary of the nominally associated variants (p-value < 0.05) for age at onset of cancer, age at onset of sarcoma and cancer status. The results include ten variants nominally associated with age at onset of cancer (eight putative structural and two putative regulatory variants), one putative regulatory variant nominally associated with age at onset of sarcoma, and 15 variants nominally associated with cancer status (12 putative structural and three putative regulatory variants). No variants showed association with a p-value < 0.05 for sarcoma status.

Of these 19 variants, 12 were associated with a single cancer phenotype and seven were associated with more than one; all seven were associated with both cancer status and age at onset of cancer.


Table 3.5: Summary of SOLAR association results for known rare variants

Chr:Pos Gene p-value Beta SE Exonic function SIFT PolyPhen-2 RegulomeDB GERP Ref Alt MAF 1000G MAF

Age at onset of cancer

1:40929077 ZFP69B 0.01 1.36 0.51 NS D D 3a 3.33 C G 0.0016 0.08

16:66503705 BEAN1 0.01 1.36 0.51 NS D . 5 4.11 C A 0.0050 0.08

4:1348920 UVSSA 0.01 1.36 0.51 NS D D 5 5.26 G A 0.0040 0.08

16:4606552 C16orf96 0.01 1.27 0.49 NS D D 2b 5.22 T C 0.0002 0.11

6:4087949 C6orf201 0.03 -0.76 0.34 NS D D 6 3.07 A T 0.0473 0.16

10:72462080 ADAMTS14 0.03 1.11 0.50 NS D D 5 5.93 C T 0.0002 0.08

10:79590510 DLG5 0.03 1.11 0.50 NS D D 5 5.67 C G 0.0058 0.08

8:128750540 MYC < 0.01 1.64 0.57 NS T D 2b 3.91 A G 0.0152 0.05

1:45224937 KIF2C 0.01 1.36 0.51 S . . 2b . G A 0.0012 0.08

7:20721130 ABCB5 0.01 1.27 0.49 S . . 2c . G A 0.0008 0.11

Age at onset of sarcoma

16:70595515 SF3B3 < 0.01 -1.30 0.25 Int . . 2b . T G . 0.16

Cancer status

16:4606552 C16orf96 0.01 -4.79 1.79 NS D D 2b 5.22 T C 0.0002 0.11

1:40929077 ZFP69B 0.02 -5.14 2.26 NS D D 3a 3.33 C G 0.0016 0.08

4:1348920 UVSSA 0.02 -5.14 2.26 NS D D 5 5.26 G A 0.0040 0.08

10:72462080 ADAMTS14 0.02 -3.48 1.54 NS D D 5 5.93 C T 0.0002 0.08

10:79590510 DLG5 0.02 -3.48 1.54 NS D D 5 5.67 C G 0.0058 0.08


13:33703738 STARD13 0.02 -3.29 1.45 NS D D 5 5.82 C T 0.0002 0.08

8:21986479 HR 0.02 -3.29 1.45 NS D D 5 2.95 G A 0.0012 0.08

8:67341481 RRS1 0.02 -3.29 1.45 NS D D 4 2.18 C A . 0.08

3:63264392 SYNPR 0.02 -3.26 1.44 NS D D 4 4.3 C T 0.0012 0.08

5:52397270 MOCS2 0.02 -3.26 1.44 NS D D 5 5.92 G A 0.0008 0.08

4:52926666 SPATA18 0.03 -1.47 0.67 NS D D 5 3.72 A T 0.0024 0.16

6:4087949 C6orf201 0.05 1.18 0.60 NS D D 6 3.07 A T 0.0473 0.18

7:20721130 ABCB5 0.01 -4.79 1.79 S . . 2c . G A 0.0008 0.11

3:10255002 IRAK2 0.01 3.01 1.22 NS T 0.00 2b -0.447 C G 0.0323 0.11

17:39197601 KRTAP1-1 0.02 1.49 0.64 NS T 0.00 2b -8.93 T C . 0.24

Chr:Pos: Chromosome:Position. SE: Standard Error. SIFT: Sorting Intolerant From Tolerant score. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score. Ref: reference allele. Alt: alternate allele. MAF 1000G: Minor Allele Frequency in the 1000 Genomes Project. MAF: Minor Allele Frequency in the study population. NS: nonsynonymous. S: synonymous. Int: intronic. UTR3: 3' untranslated region. UTR5: 5' untranslated region. D: deleterious. T: tolerated. .: not annotated in database.


3.4.3.2 Segregation analysis results

Of the 19 variants, 13 were seen in only one family (ZFP69B, BEAN1, UVSSA, C16orf96, ADAMTS14, DLG5, KIF2C, ABCB5, STARD13, HR, RRS1, SYNPR and MOCS2). Using the three criteria for familial segregation, six variants showed complete familial segregation. Two of these showed complete familial segregation in family 2 (Figure 3.2). A conserved, deleterious exonic nonsynonymous variant in the C16orf96 gene showed nominal association with age at onset of cancer and cancer status. A synonymous variant in the ABCB5 gene also showed nominal association with age at onset of cancer and cancer status. Each family member with cancer was heterozygous at these positions, whereas unaffected family members were homozygous for the reference allele.

[Pedigree of family 2. Affected: 2-I-2 (prostate), 2-II-1 (sarcoma), 2-II-2 (melanoma), 2-II-3 (melanoma). Unaffected: 2-I-1, 2-II-4, 2-III-1. The key distinguishes affected and unaffected males and females and marks the proband.]

Patient    C16orf96       ABCB5
2-I-1      T/T (119,0)    G/G (69,0)
2-I-2      T/C (28,38)    G/A (6,4)
2-II-1     T/C (63,38)    G/A (32,41)
2-II-2     T/C (36,27)    G/A (54,65)
2-II-3     T/C (58,44)    G/A (41,38)
2-II-4     T/T (62,1)     G/G (65,0)
2-III-1    T/T (42,0)     G/G (96,0)

Genotype at each variant position, with read depth (ref, alt) in parentheses.

Figure 3.2: Genotypes for the C16orf96 and ABCB5 variants that show segregation in patients with cancer in family 2


Using the three criteria for familial segregation, four variants showed complete familial segregation in family 3 (Figure 3.3). Conserved, deleterious exonic variants in the ZFP69B and UVSSA genes showed nominal association with both age at onset of cancer and cancer status in family 3. Exonic variants in the BEAN1 (nonsynonymous) and KIF2C (synonymous) genes showed nominal association with age at onset of cancer in family 3. All patients with cancer in family 3 were heterozygous at these positions, and unaffected family members were homozygous for the reference allele. None of the other prioritised known rare variants showed complete familial segregation in any of the families according to the familial segregation criteria.


[Pedigree of family 3. Affected: 3-I-1 (sarcoma), 3-II-1 (prostate), 3-III-1 (sarcoma). Unaffected: 3-II-2, 3-III-2. The key distinguishes affected and unaffected males and females and marks the proband.]

Patient    ZFP69B          UVSSA           BEAN1          KIF2C
3-I-1      C/G (27,19)     G/A (58,31)     C/A (28,26)    G/A (59,52)
3-II-1     C/G (29,37)     G/A (75,69)     C/A (18,19)    G/A (50,66)
3-II-2     C/C (118,0)     G/G (129,0)     C/C (62,0)     G/G (126,0)
3-III-1    C/G (24,41)     G/A (127,64)    C/A (44,30)    G/A (125,107)
3-III-2    C/C (44,0)      G/G (85,0)      C/C (52,0)     G/G (118,0)

Genotype at each variant position, with read depth (ref, alt) in parentheses.

Figure 3.3: Genotypes for the ZFP69B, BEAN1, UVSSA and KIF2C variants that show segregation in patients with cancer in family 3


3.4.4 Candidate gene variants

3.4.4.1 Association analysis in SOLAR

The annotated candidate gene variants (Table 3.3) were tested for association with cancer phenotypes using SOLAR. The results were corrected for multiple testing using the Bonferroni method, based on the number of prioritised variants (1,297), giving a corrected significance threshold of p < 3.86 × 10⁻⁵. No variants were significant after correction for multiple testing. Table 3.6 contains a summary of the nominally associated variants (p-value < 0.05) for age at onset of cancer, age at onset of sarcoma and cancer status. The results include 14 variants nominally associated with age at onset of cancer (two putative structural and 12 putative regulatory variants), two putative regulatory variants nominally associated with age at onset of sarcoma, and 12 variants nominally associated with cancer status (one putative structural and 11 putative regulatory variants). No variants showed an association with a p-value < 0.05 for sarcoma status.

Of the 23 distinct variants, 18 were associated with a single cancer phenotype, and five were associated with more than one cancer phenotype. Three variants were associated with age at onset of cancer and cancer status, one variant was associated with age at onset of cancer and age at onset of sarcoma, and one variant was associated with age at onset of sarcoma and cancer status.
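These overlap counts can be checked by tallying the Chr:Pos identifiers listed under each phenotype in Table 3.6. The sketch below is illustrative only; it treats each Chr:Pos as one variant, which assumes that no two listed variants share a position.

```python
from collections import Counter

# Chr:Pos of nominally associated variants, copied from Table 3.6.
nominal_hits = {
    "age_at_onset_cancer": [
        "16:334543", "11:108098576", "11:64577620", "11:64564208",
        "19:45866972", "17:41622861", "17:18208544", "17:18226177",
        "17:18231998", "17:18232017", "17:18233810", "11:47369443",
        "11:64557132", "19:42342319",
    ],
    "age_at_onset_sarcoma": ["8:145742879", "11:47369443"],
    "cancer_status": [
        "16:334543", "11:64577620", "16:322934", "17:41622861",
        "20:43030160", "8:145730330", "8:145737514", "8:145741765",
        "11:47371578", "16:419923", "16:68857289", "8:145742879",
    ],
}

# Number of phenotypes with which each variant is nominally associated.
phenotype_counts = Counter(
    variant for hits in nominal_hits.values() for variant in set(hits)
)

n_distinct = len(phenotype_counts)
n_single = sum(1 for count in phenotype_counts.values() if count == 1)
n_multi = sum(1 for count in phenotype_counts.values() if count > 1)
print(f"distinct: {n_distinct}, single phenotype: {n_single}, multiple: {n_multi}")
```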


Table 3.6: Summary of SOLAR association results for candidate gene variants

Chr:Pos  Gene  p-value  Beta  SE  Exonic function  SIFT  PolyPhen-2  RegulomeDB  GERP  Ref  Alt  MAF 1000G  MAF

Age at onset of cancer

16:334543 PDIA2 0.01 1.27 0.49 NS D D 5 3.17 C G 0.05 0.11

11:108098576 ATM 0.04 1.87 0.89 NS D D 6 4.22 C G 0.004 0.03

11:64577620 MEN1 0.01 -0.59 0.22 Int . . 2b -6.21 G C 0.17 0.08

11:64564208 MAP4K2 0.01 -1.01 0.38 Int . . 1f 1.94 A G 0.16 0.11

19:45866972 ERCC2 < 0.01 1.42 0.58 Int . . 2b 2.51 C T 0.00 0.05

17:41622861 ETV4 0.03 -0.75 0.34 Int . . 2b 2.49 G A . 0.18

17:18208544 TOP3A 0.03 -0.86 0.40 Int . . 1f -6.25 G A 0.32 0.13

17:18226177 SMCR8 0.03 -0.86 0.40 S . . 2b 5.20 G T 0.25 0.13

17:18231998 SHMT1 0.03 -0.86 0.40 UTR3 . . 1f -0.29 G A 0.32 0.13

17:18232017 SHMT1 0.03 -0.86 0.40 UTR3 . . 1f 1.74 G C 0.20 0.13

17:18233810 SHMT1 0.03 -0.86 0.40 Int . . 1f -6.43 T C 0.20 0.13

11:47369443 MYBPC3 0.04 1.87 0.89 S . . 1f -11.10 G A 0.07 0.03

11:64557132 MAP4K2 0.04 -1.07 0.52 Int . . 1f 1.69 C T 0.17 0.05

19:42342319 LYPD4 0.04 -0.53 0.26 S . . 1f -3.55 A G 0.60 0.55

Age at onset of sarcoma

8:145742879 RECQL4 0.01 0.49 0.18 S . . 2b 0.96 T C 0.53 0.39

11:47369443 MYBPC3 0.01 1.51 0.57 S . . 1f -11.10 G A 0.07 0.03


Cancer status

16:334543 PDIA2 0.01 -2.76 1.03 NS D D 5 3.17 C G 0.06 0.11

11:64577620 MEN1 0.01 4.39 1.73 Int . . 2b -6.21 G C 0.18 0.18

16:322934 RGS11 0.02 -1.50 0.65 Int . . 2b 3.34 C T 0.55 0.82

17:41622861 ETV4 0.02 1.50 0.65 Int . . 2b 2.49 G A . 0.18

20:43030160 HNF4A 0.02 -2.50 1.11 Int . . 2b 2.12 G A 0.02 0.08

8:145730330 GPT 0.03 -1.37 0.64 Int . . 2b -0.87 G A 0.39 0.53

8:145737514 RECQL4 0.03 -1.37 0.64 Int . . 2b -9.84 G A 0.41 0.53

8:145741765 RECQL4 0.03 -1.37 0.64 S . . 2b -9.43 G A 0.36 0.53

11:47371578 MYBPC3 0.04 3.20 1.53 S . . 2b 0.53 G A 0.01 0.08

16:419923 MRPL28 0.04 3.20 1.53 Int . . 2b -7.23 G C 0.11 0.08

16:68857289 CDH1 0.04 2.94 1.41 Int . . 2b 3.50 T C 0.07 0.08

8:145742879 RECQL4 0.05 -0.95 0.48 S . . 2b 0.96 T C 0.53 0.39

Chr:Pos: Chromosome:Position. SE: Standard Error. SIFT: Sorting Intolerant from Tolerant score. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score. Ref: reference allele. Alt: alternate allele. MAF 1000G: Minor Allele Frequency in the 1000 Genomes Project. MAF: Minor Allele Frequency in the study population. NS: nonsynonymous. S: synonymous. Int: intronic. UTR3: 3’ untranslated region. UTR5: 5’ untranslated region. D: deleterious. .: not annotated in database.


3.4.4.2 Segregation analysis results

Of the 23 different candidate variants identified, five were seen in only one family (PDIA2, ERCC2, HNF4A, MYBPC3 and MRPL28). However, when all three criteria for familial segregation were applied, only one variant, in the PDIA2 gene, showed both nominal association with age at onset of cancer and cancer status and complete familial segregation in family 2 (Figure 3.4). Each family member with cancer was heterozygous at this position, whereas unaffected family members were homozygous for the reference allele. None of the other prioritised candidate gene variants showed complete familial segregation in any of the families according to the familial segregation criteria.
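The two genotype conditions described above can be written as a simple per-family check. This is a minimal sketch: it implements only the heterozygous-affected and homozygous-reference-unaffected conditions stated in this paragraph (not the full set of three criteria defined earlier in the chapter), assumes genotypes are coded as VCF-style strings such as "0/1", and uses a hypothetical four-person family. The optional `allow_unaffected_carriers` flag is an illustrative relaxation for incomplete penetrance, which the criteria used in this study did not allow.

```python
from typing import Mapping

HOM_REF, HET = "0/0", "0/1"  # VCF-style genotype codes


def segregates(genotypes: Mapping[str, str], affected: Mapping[str, bool],
               allow_unaffected_carriers: bool = False) -> bool:
    """Check whether a variant co-segregates with cancer within one family.

    Strict rule: every affected member is heterozygous and every unaffected
    member is homozygous for the reference allele. With
    allow_unaffected_carriers=True, unaffected heterozygotes are tolerated
    (a hypothetical relaxation for incomplete penetrance or later onset).
    """
    for person, is_affected in affected.items():
        genotype = genotypes[person]
        if is_affected:
            if genotype != HET:
                return False
        elif genotype != HOM_REF:
            if not (allow_unaffected_carriers and genotype == HET):
                return False
    return True


# Hypothetical family: two affected carriers, one unaffected non-carrier
# and one unaffected carrier.
genotypes = {"A": HET, "B": HET, "C": HOM_REF, "D": HET}
affected = {"A": True, "B": True, "C": False, "D": False}
print(segregates(genotypes, affected))
print(segregates(genotypes, affected, allow_unaffected_carriers=True))
```

Under the strict rule the unaffected carrier D breaks segregation; the relaxed rule accepts the same family, illustrating how some true risk variants could be excluded by strict criteria.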

Pedigree of family 2 (individuals 2-I-1, 2-I-2, 2-II-1, 2-II-2, 2-II-3, 2-II-4 and 2-III-1; cancers in the family: prostate cancer, sarcoma and melanoma). Key: affected male, affected female, unaffected male, unaffected female, proband.

Patient | Genotype at position in PDIA2 gene | Read depth (ref, alt)
Patient 2-I-1 | C/C | 206, 1
Patient 2-I-2 | C/G | 109, 135
Patient 2-II-1 | C/G | 74, 67
Patient 2-II-2 | C/G | 63, 43
Patient 2-II-3 | C/G | 106, 108
Patient 2-II-4 | C/C | 138, 0
Patient 2-III-1 | C/C | 161, 0

Figure 3.4: Genotypes for the PDIA2 variant that shows segregation in patients with cancer in family 2
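The genotypes in Figure 3.4 can be related to the listed read depths through the variant allele fraction (VAF), alt / (ref + alt). The sketch below is a naive illustration, not the variant caller used in this study: the thresholds `het_min = 0.2`, `hom_alt_min = 0.8` and `min_depth = 10` are assumptions chosen for the example. Here "0/0" corresponds to C/C and "0/1" to C/G.

```python
def call_genotype(ref_reads: int, alt_reads: int, het_min: float = 0.2,
                  hom_alt_min: float = 0.8, min_depth: int = 10) -> str:
    """Naive genotype call from the variant allele fraction (illustrative thresholds)."""
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return "./."  # too few reads to make a call
    vaf = alt_reads / depth
    if vaf >= hom_alt_min:
        return "1/1"
    if vaf >= het_min:
        return "0/1"
    return "0/0"


# (ref, alt) read depths for the PDIA2 variant, from Figure 3.4.
pdia2_depths = {
    "2-I-1": (206, 1), "2-I-2": (109, 135), "2-II-1": (74, 67),
    "2-II-2": (63, 43), "2-II-3": (106, 108), "2-II-4": (138, 0),
    "2-III-1": (161, 0),
}

calls = {patient: call_genotype(ref, alt) for patient, (ref, alt) in pdia2_depths.items()}
for patient, (ref, alt) in pdia2_depths.items():
    print(f"{patient}: VAF {alt / (ref + alt):.2f} -> {calls[patient]}")
```

With these thresholds the sketch reproduces the reported calls, treating the single alternate read in Patient 2-I-1 (VAF below 0.01) as sequencing noise.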


3.4.5 Evidence further supporting germline risk genes

The nominally significant (p-value < 0.05) variants that showed familial segregation were researched using several in silico resources. Table 3.7 contains a combined summary of these resources for all nominally significant candidate germline risk variants that were identified by the three prioritisation strategies and showed familial segregation, together with the genes in which they arise. None of the candidate risk variants identified were reported in COSMIC (Catalogue of Somatic Mutations in Cancer, a database of somatic mutations in cancer). However, six of the genes in which germline risk variants were identified have mutations reported in the COSMIC database. None of the genes were listed in the COSMIC Cancer Gene Census. They were also not listed in the PubMeth database, which suggests there is currently no evidence of methylation of these genes in cancer. Two of the genes have NCBI gene function annotations that support involvement in cancer pathogenesis. The Gene References into Functions (GeneRIF) for ABCB5 suggest a role for this gene in chemoresistance, and the GeneRIF for KIF2C suggests that this gene is involved in the directional migration and invasion of tumour cells.

The results of the PubMed searches for the eight candidate risk genes are summarised in Table 3.8. The PubMed searches revealed previously published associations between the ABCB5, KIF2C and PDIA2 genes and cancer. However, there is currently no supporting evidence for the involvement of the ARHGAP39, C16orf96, ZFP69B, UVSSA and BEAN1 genes in cancer pathogenesis. The single publication returned by the search strategy for the ARHGAP39 gene described a role for the gene as a binding partner for CNK2, a spatial modulator of Rac cycling during spine morphogenesis.297 This publication did not report any association between ARHGAP39 and cancer. The PubMed search for the BEAN1 gene returned results on randomised soya trials labelled BEAN1 and BEAN2.298,299 No publications were returned on the function of the BEAN1 gene or its involvement in cancer pathogenesis.


Table 3.7: Summary of findings from in silico resources investigating the role of candidate germline risk variants in cancer pathogenesis

Gene | Genomic location | Variant in COSMIC | No. mutations in COSMIC | Cancer gene census | SuperPath | GO molecular function | Methylation | GeneRIF
ARHGAP39 | 8:145773319 | No | 246 | No | Developmental biology; Signalling by Robo receptor; p75 NTR receptor-mediated signalling; Signalling by GPCR; Signalling by Rho GTPases | GTPase activator activity | No | No
C16orf96 | 16:4606552 | No | 155 | No | . | . | No | No
ABCB5 | 7:20721130 | No | 332 | No | ABC-family proteins mediated transport; Transmembrane transport of small molecules | ATP binding; Xenobiotic-transporting ATPase activity; Efflux transmembrane transporter activity; ATPase activity | No | Chemoresistance
ZFP69B | 1:40929077 | No | 0 | No | Gene expression | DNA binding; Transcription factor activity, sequence-specific DNA binding; Protein binding; Metal ion binding | No | No
UVSSA | 4:1348920 | No | 0 | No | Transcription-coupled nucleotide excision repair; DNA double strand break repair | RNA polymerase II core binding; Protein binding | No | No
BEAN1 | 16:66503705 | No | 31 | No | . | . | No | No
KIF2C | 1:45224937 | No | 141 | No | Golgi-to-ER retrograde transport; Cell cycle; Mitotic metaphase and anaphase; Mitotic prometaphase; Vesicle-mediated transport | Microtubule motor activity; Protein binding; ATP binding; Microtubule binding; ATPase activity | No | Directional migration and invasion of tumour cells
PDIA2 | 16:334543 | No | 114 | No | Statin pathway | Protein disulfide isomerase activity; Steroid binding; Protein binding; Lipid binding; Disulfide oxidoreductase activity | No | No

Genomic location: chromosome:position. COSMIC: Catalogue of Somatic Mutations in Cancer database (http://cancer.sanger.ac.uk/cosmic).134 No. mutations in COSMIC: the number of mutations reported in the gene in the COSMIC database. Cancer gene census: is the gene reported in the Cancer Gene Census in COSMIC? The Cancer Gene Census is a catalogue of genes for which mutations have been causally implicated in cancer. SuperPath: from PathCards, an integrated database of human pathways and their annotations (http://pathcards.genecards.org/).293 Human pathways are clustered into SuperPaths based on gene content similarity. GO molecular function: Gene Ontology molecular function.294

Methylation: is the gene reported to be methylated in cancer by PubMeth (http://www.pubmeth.org)?295

GeneRIF: Gene References Into Functions from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/).296 Are any GeneRIFs associated with cancer reported for the gene? Robo: Roundabout family of proteins. NTR: neurotrophin receptor. GPCR: G-protein-coupled receptors. GTPase: Guanosine triphosphatase. ABC: ATP-binding cassette. ATP: Adenosine triphosphate. ATPase: Adenosine triphosphatase. ER: endoplasmic reticulum.


Table 3.8: Summary of search results from PubMed for genes in which germline variants were identified

Gene | No. of publications | Role of gene | Selected references
ARHGAP39 | 1 | . | .
C16orf96 | 0 | . | .
ABCB5 | 109 | ABCB5 is a drug efflux pump associated with drug resistance in melanoma, colon cancer, Merkel cell carcinoma, oral squamous cell carcinoma, acute leukaemia, colorectal cancer, hepatic cancer, breast cancer and osteosarcoma. ABCB5 has also been found to be overexpressed at the transcriptional level in a number of cancer subtypes, including breast cancer and melanoma. Alterations in ABCB5 have been reported in lung cancer. | 300–311
ZFP69B | 0 | . | .
UVSSA | 5 | UVSSA is involved in transcription-coupled nucleotide excision repair, relieving RNA polymerase II arrest at damaged sites to permit repair of the template strand. Mutations in UVSSA are associated with Cockayne syndrome group B (characterised by photosensitivity, growth failure, progressive neurodevelopmental disorder and premature ageing, but no predisposition to skin cancer). | 312–316
BEAN1 | 2 | . | .
KIF2C | 55 | KIF2C (also known as MCAK) is critical in the regulation of microtubule dynamics during mitosis. KIF2C is also involved in the directional migration and invasion of tumour cells and plays a role in cell proliferation. KIF2C is likely to be involved in carcinogenesis. | 317–323
PDIA2 | 3 | Gene expression of PDIA2 was found to influence the prognostic significance of TWIST (which is correlated with cancer invasion and metastasis in several human cancers). PDIA2 plays a role in the maintenance of endoplasmic reticulum homeostasis and in endoplasmic reticulum stress-induced apoptosis. | 324,325

PubMed search was performed using the string (“gene name”) AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma) in April 2017. Abstracts were screened for relevance to the current study.
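The search string in the note above can be generated for each gene programmatically. This is a minimal sketch; the helper name `pubmed_query` is hypothetical, and the resulting strings would still need to be submitted through the PubMed web interface or the NCBI E-utilities ESearch service, which is not shown here. Counts returned today will differ from those obtained in April 2017.

```python
CANCER_TERMS = ["cancer", "malignancy", "tumor*", "tumour*", "sarcoma"]
GENES = ["ARHGAP39", "C16orf96", "ABCB5", "ZFP69B",
         "UVSSA", "BEAN1", "KIF2C", "PDIA2"]


def pubmed_query(gene: str) -> str:
    """Build the search string described in the Table 3.8 note for one gene."""
    cancer_clause = " OR ".join(CANCER_TERMS)
    return f'("{gene}") AND ({cancer_clause})'


queries = [pubmed_query(gene) for gene in GENES]
for query in queries:
    print(query)
```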


3.5 Discussion

This chapter described the filtering and prioritisation of germline variants generated by WES in three families. Eight candidate germline risk variants were found to show nominal association with cancer status and age at onset of cancer, together with familial segregation, in two of the three cancer cluster families.

3.5.1 Variant filtering and prioritisation strategies

The annotation results from ANNOVAR are consistent with a previous publication reporting that a substantial proportion of DNA fragments captured in WES fall outside the target regions.256 There were slightly more synonymous variants than nonsynonymous variants, which is also consistent with previous findings.286

SIFT and PolyPhen-2 scores from the ANNOVAR annotation were used to determine whether variants were likely to have a deleterious effect on protein function. A previous study reports reasonable sensitivity for SIFT and PolyPhen-2 (69% and 68%, respectively) but low specificity (13% and 16%, respectively).326 Therefore, both programs have a high false-positive rate, and their results should be interpreted with caution and reported in the context of other available evidence.326
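To make the consequence of low specificity concrete, the false-positive rate is 1 − specificity, and the positive predictive value (PPV) also depends on the proportion of truly damaging variants among those scored. The sketch below uses the sensitivity and specificity quoted above; the 10% prevalence of truly damaging variants is a hypothetical value chosen only for illustration.

```python
from typing import Tuple

REPORTED = {"SIFT": (0.69, 0.13), "PolyPhen-2": (0.68, 0.16)}  # (sensitivity, specificity)
HYPOTHETICAL_PREVALENCE = 0.10  # assumption for illustration, not from the study


def error_rates(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (false-negative rate, false-positive rate)."""
    return 1 - sensitivity, 1 - specificity


def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Fraction of variants flagged as deleterious that are truly deleterious."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


for tool, (sens, spec) in REPORTED.items():
    fnr, fpr = error_rates(sens, spec)
    ppv = positive_predictive_value(sens, spec, HYPOTHETICAL_PREVALENCE)
    print(f"{tool}: FNR {fnr:.0%}, FPR {fpr:.0%}, PPV {ppv:.0%}")
```

With these values the PPV (about 8%) is slightly below the assumed 10% prevalence, which illustrates why these predictions should be weighed alongside other evidence.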

In addition to variants reported as deleterious or tolerated by SIFT and PolyPhen-2, a number of variants were not annotated with a score (unknown). In particular, 80% of variants prioritised by the rare variants strategy were filtered out because they were unknown in both databases.

Although two of the prioritisation strategies identified more regulatory variants than structural variants, seven of the eight candidate risk variants that showed familial segregation were structural and only one was regulatory. In this study, exome sequencing combined with variant filtering and prioritisation was an efficient strategy for identifying candidate risk alleles in cancer cluster families.


3.5.2 Association and segregation analyses of candidate risk variants in families

Family segregation studies are re-emerging as an optimal way to classify extremely rare variants.327 In this study, three assumptions were made in determining familial segregation. These assumptions did not take into account the possibility of unaffected carriers (incomplete penetrance), later onset of disease, or risk variants that occur in cases in more than one family. Therefore, some true risk variants may have been excluded using these assumptions.

SOLAR was used to test for association of filtered and prioritised variants with both age at onset of disease and disease status. Despite efforts to filter and prioritise variants, no variants reached statistical significance after correcting for multiple testing, and a nominal p-value of < 0.05 was therefore used to select variants for familial segregation analyses. The large number of variants tested and the relatively small sample size are the likely reasons that no variants reached statistical significance after correction for multiple testing in this study.

Despite these limitations, by treating each family as a separate discovery unit, it was hoped that some insight might be gained into genetic contributions to the risk of cancer in each family. Eight variants nominally associated with age at onset of cancer and cancer status were identified in two of the three cancer cluster families.

The candidate risk variants identified in this study were all private variants, seen in only one family. There has been increasing awareness that rare variants of modest to large effect contribute to complex diseases and may explain a substantial proportion of the “missing heritability”.129 There has therefore been a return to family-based studies to identify rare risk variants involved in common human disease.133,152–155

Recent sequencing studies have shown that the rate of private mutations in individuals is higher than previously expected.328–330 Rare, private mutations found in families could be a consequence of the recent explosive growth of human populations and the relaxation of negative selection due to improved food supplies, sanitation, vaccines and routine health care.329–331 Rare variants that are private to families could constitute a proportion of disease risk variants.328

Based on previous publications and the proposed functions of the encoded proteins, it is plausible that the variants found in the ABCB5, KIF2C and PDIA2 genes are involved in the pathogenesis of cancer. Each of these genes is discussed in more detail below.

3.5.2.1 The ABCB5 gene

The ABCB5 gene encodes an ATP-binding cassette (ABC) drug efflux transporter that is present in a number of stem cell populations.332,333 ABCB5 functions as a determinant of membrane potential and a regulator of cell fusion in physiological skin cells.334 The gene is also expressed in clinical malignant melanoma tumours, where it preferentially marks tumour cells expressing the CD133+ stem cell phenotype.334

ABCB5 is a rhodamine-123 efflux transporter and marks CD133-expressing progenitor cells. ABCB5 regulates membrane potential in these progenitor cells and determines their propensity to undergo cell fusion.334 Membrane hyperpolarisation is associated with the multidrug resistance phenotype of human cancer cells.335

ABCB5 plays a role in the multidrug resistance of multiple malignancies, including human malignant melanoma,333,336,337 colon cancer,304,338 Merkel cell carcinoma,305 oral squamous cell carcinoma,306 acute leukaemia,307 colorectal cancer,309 hepatocellular carcinoma,310 breast cancer,311 and osteosarcoma.303 Melanoma is resistant to the effects of doxorubicin,333 a chemotherapy drug used to treat many different types of cancer. It has been proposed that the drug efflux function of ABCB5 may be involved in doxorubicin resistance.334

The variant identified in the ABCB5 gene may be phenotypically relevant to family 2, as this family has two members affected by melanoma (Patient 2-II-2 and Patient 2-II-3), in addition to a prostate cancer case (Patient 2-I-1) and a sarcoma case (Patient 2-II-1).


3.5.2.2 The KIF2C gene

KIF2C is a kinesin-like protein that functions as a microtubule-dependent molecular motor.339 KIF2C (also known as MCAK) is one of the best characterised members of the kinesin-13 family and plays an important role in microtubule dynamics during mitosis.320 The deregulation of KIF2C induces defects in spindle assembly, chromosome congression and segregation, leading to chromosome instability,340–344 one of the hallmarks of cancer.320 KIF2C is important for the migration and invasion of tumour cells via the modulation of microtubule dynamics in the cytoskeleton.320,345,346

KIF2C has been identified as a tumour antigen in patients with colorectal cancer.347 Overexpression of KIF2C is associated with a more invasive and metastatic phenotype and with poor prognosis in patients with breast, gastric and colorectal cancer.347–350 KIF2C may represent an attractive target for antigen-specific immunotherapies in colorectal cancer and other malignancies.347,348

3.5.2.3 The PDIA2 gene

The PDIA2 gene encodes the pancreas-specific member of the protein disulphide isomerase (PDI) family of proteins. Like other PDIs, PDIA2 has a central role as a reductase, an oxidase, an isomerase and a molecular chaperone in the endoplasmic reticulum.351 It has been proposed that PDIA2 plays a role in the production and secretion of digestive enzymes in vivo,352 and in the binding and regulation of oestrogen synthesis.353

A higher level of PDIA2 expression was associated with shorter survival time in patients whose prostate cancer expressed a high level of TWIST, but not in patients whose prostate cancer expressed a low level of TWIST.324

TWIST is an oncogene that is correlated with cancer invasion and metastasis in human cancers, including breast cancer, rhabdomyosarcoma, gastric carcinoma, and bladder and prostate cancer.354–357 Little is known about the role of PDIA2 in prostate cancer, although lower levels of PDIA2 expression were associated with better survival.324 PDIA2 may therefore promote cancer progression.324 However, PDIA2 alone was a poor prognostic marker for prostate cancer.324


3.5.3 Conclusion

In conclusion, WES data were annotated, filtered and prioritised in an attempt to identify candidate germline risk variants that may be involved in cancer or sarcoma pathogenesis in three cancer cluster families. As there is no gold standard for the filtering and prioritisation of WES data, these results reflect the current state of tools, databases and knowledge of cancer biology. With the data obtained, it is not possible to determine whether the variants in the ARHGAP39, C16orf96, ZFP69B, UVSSA and BEAN1 genes are pathogenic mutations. These genes, however, become candidates that can be further tested for association with cancer in independent families and study populations. Given further genetic evidence of involvement in cancer risk, functional studies, including assays of patient-derived tissue or well-established cell or animal models of gene function, could be undertaken to determine the causal effect of all candidate risk variants on the cancer phenotype.236 Due to time and budget limitations, these types of functional studies are beyond the scope of this thesis.


Chapter 4

Aim 3: A comparison of matched tumour and germline DNA from two sarcoma patients

4.1 Introduction

Next Generation Sequencing (NGS) of tumour samples and matched germline samples is a powerful strategy for studying the genetic basis of cancer initiation, development and growth.133 The third aim of this study was to perform a matched tumour and germline analysis on two myxoid liposarcoma patients, using peripheral blood genomic DNA and genomic DNA isolated from tumour tissue, to identify somatic mutations.

4.1.1 Myxoid liposarcoma

Myxoid liposarcomas are the second most common group of adipocytic/lipogenic sarcomas.64 Myxoid liposarcomas are malignant tumours composed of uniform round to oval primitive non-lipogenic cells and a variable number of small signet-ring cell lipoblasts.64 The tumours typically exhibit a FUS-DDIT3 or EWSR1-DDIT3 rearrangement.64 Myxoid liposarcomas occur most commonly in the deep soft tissue of the extremities and very rarely in the retroperitoneum.


4.1.1.1 Somatic variants

A comparison of matched tumour and germline samples from a patient allows researchers to distinguish between somatic variation (< 0.01% of variants) and inherited germline variation (> 99.99% of variants).133 Germline variants are those that exist in the germline DNA, which is the source of DNA for all cells in the body.149 A variant contained within the germline can be passed from parent to offspring. The identification of putative germline variants was the focus of Aim 2 (Chapter 3); germline variants are therefore not reported in this chapter.

In contrast, somatic variants are those found in the tumour DNA but not in the germline DNA.149 Most cancers arise and evolve as a consequence of somatic mutations.358 The characterisation of somatic mutations in cancer genomes is essential for understanding the disease and for the development of targeted therapeutics.359 Over the last three decades, more than 600 genes have been shown to be somatically mutated in cancers.134,358

Molecular characterisation of somatic driver mutations allows greater understanding of the biological abnormalities within cancer cells and provides information on the function of gene products and on the relationships between genes and biochemical pathways.134 The development of new therapeutic and preventative agents depends on the identification and modulation of these molecular targets.134,360 Targeted therapies for advanced lung cancer,361 melanoma,362 colorectal cancer363 and gastrointestinal stromal tumour364 are examples that have resulted from the translation of knowledge gained from genomics. In addition to somatic variants, a comparison of matched tumour and germline DNA can also identify loss of heterozygosity at loci in tumour DNA relative to germline DNA.

4.1.1.2 Loss of heterozygosity

Loss of heterozygosity (LOH) is a common genetic event in cancer development.365 LOH is a change in polymorphic markers from a heterozygous state in the germline DNA to a homozygous state in the tumour DNA.366 In cancers, the loss of one functional copy of a tumour suppressor gene typically does not affect the phenotype, because the remaining copy still provides the protective function. However, if LOH occurs and the remaining normal copy of the tumour suppressor gene is lost, the protective function of the tumour suppressor gene is lost completely. LOH is known to be involved in the somatic loss of wild-type alleles in many inherited cancer syndromes, such as retinoblastoma and hereditary breast and ovarian cancer syndromes.366,367
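The distinction between germline, somatic and LOH sites described in this section can be expressed as a comparison of the genotype called in the germline sample with the genotype called in the tumour sample. The sketch below is a simplified rule set assuming diploid, VCF-style genotype strings; it is not the algorithm used by VarScan2 or Strelka, which work from read counts rather than from hard genotype calls.

```python
def somatic_status(germline_gt: str, tumour_gt: str) -> str:
    """Classify a site by comparing germline and tumour genotypes.

    Genotypes are VCF-style strings: "0/0", "0/1" or "1/1".
    """
    if germline_gt == "0/0":
        # Variant allele absent from the germline: an alt allele in the tumour is new.
        return "somatic" if tumour_gt in ("0/1", "1/1") else "reference"
    if germline_gt == "0/1" and tumour_gt in ("0/0", "1/1"):
        # Heterozygous in the germline, homozygous in the tumour.
        return "LOH"
    if germline_gt == tumour_gt:
        # Same non-reference genotype in both samples: inherited variant.
        return "germline"
    return "unclassified"


for germline, tumour in [("0/0", "0/1"), ("0/1", "0/1"), ("0/1", "1/1"), ("1/1", "0/1")]:
    print(germline, tumour, somatic_status(germline, tumour))
```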

4.1.1.3 Somatic copy number alteration

In addition to distinguishing somatic and LOH variants, somatic copy number alterations (SCNAs) can be identified in a tumour sample relative to the matched germline sample by comparing the normalised read depth.368,369 The DNA sequence copy number is the number of copies of DNA in a region of a genome.370 Cancer progression often involves alterations in DNA copy number.370 In humans, the normal copy number is two for all the autosomes. A copy number variation (CNV) is defined as a structurally variant region, larger than one kilobase (kb), in which copy number differences have been observed between two or more genomes.371 CNVs can alter the transcription of genes by changing gene dosage or by disrupting proximal or distant regulatory regions.372
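Comparing normalised read depth between tumour and germline samples is commonly summarised as a log2 copy ratio per region. The sketch below illustrates the idea under stated assumptions: depth is normalised only by the total number of mapped reads in each library, the tumour is assumed to be 100% pure, and the germline copy number is assumed to be two. The read counts are hypothetical, and the additional normalisation and segmentation performed by tools such as VarScan2 are not reproduced here.

```python
import math


def log2_copy_ratio(tumour_reads: int, germline_reads: int,
                    tumour_total: int, germline_total: int) -> float:
    """log2 ratio of library-size-normalised read depth (tumour / germline)."""
    if tumour_reads <= 0 or germline_reads <= 0:
        raise ValueError("read counts must be positive to take a log ratio")
    tumour_depth = tumour_reads / tumour_total
    germline_depth = germline_reads / germline_total
    return math.log2(tumour_depth / germline_depth)


def estimated_copy_number(log2_ratio: float, baseline_cn: int = 2) -> float:
    """Copy number implied by a log2 ratio, assuming a pure tumour sample."""
    return baseline_cn * 2 ** log2_ratio


# Hypothetical counts for one targeted region; totals are mapped reads per library.
ratio = log2_copy_ratio(tumour_reads=600, germline_reads=200,
                        tumour_total=60_000_000, germline_total=40_000_000)
print(f"log2 ratio = {ratio:.2f}, estimated copy number = {estimated_copy_number(ratio):.1f}")
```

Normalising by library size matters because the tumour and germline libraries are rarely sequenced to exactly the same depth; without it, a deeper tumour library would appear to carry gains across the whole exome.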

SCNAs, as distinct from germline CNVs, play a role in activating oncogenes and inactivating tumour suppressor genes.13 Identification of SCNAs can provide valuable insights into the cellular defects that cause cancer and suggest potential therapeutic strategies.373 SCNAs and CNVs have a significant role in tumourigenesis in many cancers, including gastric cancer,374 ovarian cancer,375 hepatocellular carcinoma,376 testicular germ cell tumours,377 colorectal carcinoma378 and bladder cancer.379 The characterisation of focal SCNAs has led to the identification of novel cancer genes such as MYB, PAX5 and DUSP4.380–387

4.1.2 Bioinformatic assessment of matched tumour and germline samples

A number of bioinformatic tools have been developed to analyse matched tumour and germline samples. Initially, these tools called variants in the tumour and germline samples separately and then classified them using a statistical significance test or simple subtraction.388 More recently, tools have been developed that compare the tumour and germline samples directly at each locus. VarScan2 and Strelka are two calling algorithms that were specifically designed for the joint analysis of matched tumour and germline samples.368,369,389 VarScan2 uses the tumour and germline samples to heuristically detect sequence variants and classify them by somatic status (germline, somatic or LOH).368,369 Strelka uses a novel Bayesian approach that represents continuous allele frequencies for both the tumour and normal samples to identify somatic variants efficiently.389 In Strelka, the normal sample is represented as a mixture of diploid germline variation with noise, and the tumour sample is represented as a combination of the normal sample with somatic variation.389 It is important to identify somatic mutations in cancer studies, as these variants often play important roles in tumour development and treatment decisions.149
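The idea of representing the tumour sample as a mixture of normal cells and cells carrying somatic variation can be illustrated by the allele fraction expected for a somatic single-nucleotide variant. The function below is a simplified deterministic sketch, not Strelka's Bayesian model; it assumes a clonal variant, no sequencing error, and user-supplied tumour purity and copy numbers (all hypothetical inputs).

```python
def expected_somatic_vaf(purity: float, tumour_cn: int = 2,
                         mutant_copies: int = 1, normal_cn: int = 2) -> float:
    """Expected alt-allele fraction of a clonal somatic SNV in a mixed sample.

    purity: fraction of cells in the sample that are tumour cells.
    tumour_cn / normal_cn: total copy number at the locus in tumour / normal cells.
    mutant_copies: number of tumour copies carrying the variant allele.
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    if not 0 < mutant_copies <= tumour_cn:
        raise ValueError("mutant_copies must be between 1 and tumour_cn")
    alt_alleles = purity * mutant_copies
    total_alleles = purity * tumour_cn + (1 - purity) * normal_cn
    return alt_alleles / total_alleles


# A heterozygous somatic SNV in a diploid tumour sample that is 60% tumour cells.
print(f"{expected_somatic_vaf(0.6):.2f}")
```

Under this model a germline heterozygous variant stays near 0.5 regardless of purity, whereas a somatic variant is diluted by the fraction of normal cells, which is one reason somatic variants can be hard to detect in samples with low tumour content.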

4.1.3 Somatic mutations and drug sensitivity

The identification of somatic driver mutations that arise in tumours is important in developing new cancer therapeutic targets, as genetic variation influences the response of an individual to drug treatments.390 The current treatment for most cancers includes cytotoxic chemotherapy, which is not precisely targeted to the somatic mutations that drive malignant transformation.390 Somatic mutations can influence tumour behaviour and clinical outcome. Therefore, therapies should be targeted to the patient’s tumour genotype rather than being generic. An increased understanding of somatic mutations in individual patients has the potential to make therapies safer and more effective by assisting treatment selection and dosage based on driver mutations in the tumour.

The Genomics of Drug Sensitivity in Cancer database (http://www.cancerrxgene.org/)391 is a large dataset on drug sensitivity in cancer cells, linked to genomic information to facilitate the discovery of new biomarkers of drug response.391 The database contains information on over 250 anticancer drugs across more than 1,000 cell lines.391 Molecular markers are identified by integrating data from the Catalogue of Somatic Mutations in Cancer (COSMIC) database134 with cell line drug sensitivity data.


4.1.4 Outline of chapter

Whole exome sequencing (WES) was performed on matched tumour and germline DNA from two myxoid liposarcoma patients from the families described in Chapter 2. VarScan2 was used to identify candidate somatic variants, which were confirmed using Strelka. VarScan2 was also used to identify LOH variants and SCNA events to determine regions of interest in both patients.

4.2 Methods

4.2.1 Whole exome sequencing

Tumour DNA from formalin-fixed and paraffin-embedded (FFPE) tumour samples and germline DNA from Patient 1-II-2 and Patient 2-II-1 were available to perform a matched tumour-germline analysis. DNA was extracted at the Peter MacCallum Cancer Centre in Melbourne, Australia. After microdissection of tumour material from FFPE tissue, DNA was extracted using a DNeasy Tissue kit (Qiagen) as previously described.392 Anti-coagulated blood was processed using a Ficoll gradient. DNA was extracted from the nucleated cell product using a QIAamp DNA blood kit (Qiagen).

Patient 1-II-2 (Figure 4.1) is a male patient who was diagnosed with myxoid liposarcoma at 39 years of age. Patient 2-II-1 (Figure 4.2) is a male patient who was diagnosed with myxoid liposarcoma at 61 years of age.


[Pedigree diagram: three generations (1-I-1, 1-I-2; 1-II-1, 1-II-2, 1-II-3; 1-III-1, 1-III-2). Sarcoma is recorded for 1-II-2 and 1-III-1. Key: affected male, affected female, unaffected male, unaffected female, proband, patient selected for tumour-normal analysis.]

Figure 4.1: Pedigree of family 1 highlighting sarcoma Patient 1-II-2 for tumour-germline comparison

[Pedigree diagram: three generations (2-I-1, 2-I-2; 2-II-1, 2-II-2, 2-II-3, 2-II-4; 2-III-1). Prostate cancer is recorded in generation I, sarcoma for 2-II-1, and melanoma for 2-II-2 and 2-II-3. Key: affected male, affected female, unaffected male, unaffected female, proband, patient selected for tumour-normal analysis.]

Figure 4.2: Pedigree of family 2 highlighting sarcoma Patient 2-II-1 for tumour-germline comparison


Because of the difficulties of performing WES on older FFPE samples,393 the DNA extracted from these samples was sent to an external sequencing facility. The four samples (germline and tumour DNA from each of the two patients) were sequenced using Agilent SureSelect V5 capture on the Illumina HiSeq 4000 at 60X coverage.

4.2.2 Pre-processing and quality control

FASTQ files were received from Macrogen, Inc. Initial quality control (QC) reports were generated using FastQC (version 0.11.3), a quality control application for high-throughput sequence data.394 FastQC reads FASTQ files and can either provide an interactive application for reviewing the results of several different checks or create an HTML-based report that can be integrated into a pipeline.394 QC reports were generated on sequence quality, GC content, duplication levels and adapter content.
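Two of the metrics reviewed here, GC content and the fraction of high-quality base calls, can be illustrated with a short Python sketch. It assumes Phred+33 quality encoding (standard for current Illumina output) and four-line FASTQ records; the function names are hypothetical, and this is not how FastQC itself is implemented.

```python
"""Minimal FASTQ summary: read count, GC content and Q20/Q30 base fractions.

Illustrative sketch only; assumes Phred+33 quality encoding and
four-line FASTQ records. Not FastQC's implementation.
"""
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        if not header.startswith("@") or len(seq) != len(qual):
            raise ValueError("malformed FASTQ record")
        yield seq, qual


def summarise(lines: Iterable[str], offset: int = 33) -> dict:
    """Return read count and percentages of G/C bases and of Q>=20 / Q>=30 calls."""
    reads = bases = gc = q20 = q30 = 0
    for seq, qual in read_fastq(lines):
        reads += 1
        bases += len(seq)
        gc += sum(base in "GCgc" for base in seq)
        for ch in qual:
            q = ord(ch) - offset  # Phred score of this base call
            q20 += q >= 20
            q30 += q >= 30
    return {
        "reads": reads,
        "gc_pct": 100 * gc / bases,
        "q20_pct": 100 * q20 / bases,
        "q30_pct": 100 * q30 / bases,
    }
```

For example, `summarise(["@r1", "GGCCAATT", "+", "IIII5555"])` treats `I` as Q40 and `5` as Q20, giving 50% GC, 100% of bases at Q20 or above and 50% at Q30 or above.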

4.2.3 Adapter trimming

The presence of technical sequences such as adapters in WES data can result in suboptimal downstream analyses.395 The Illumina-specific adapter sequences were trimmed from the FASTQ files using Trimmomatic (version 0.36).395 As the Illumina reads were paired-end, the 'palindrome mode' was used. This mode is specifically aimed at detecting adapter read-through, which occurs when the DNA fragment is shorter than the read length, so that adapter sequence contaminates the end of the reads.395 After the Illumina-specific adapters had been trimmed from the FASTQ files, a second round of QC reports was generated on the adapter-trimmed data using FastQC.
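The idea behind palindrome-style detection can be sketched in Python under a strong simplifying assumption of exact matching. When the fragment is shorter than the read, read 1 begins with the fragment and read 2 begins with its reverse complement, so the longest prefix length L for which `read1[:L]` equals the reverse complement of `read2[:L]` estimates the fragment length, and everything beyond L is adapter. Trimmomatic's actual palindrome mode aligns adapter-prefixed reads with a mismatch-tolerant score; the functions and the `min_overlap` parameter below are hypothetical.

```python
"""Simplified, exact-match detection of adapter read-through in a read pair.

Illustrative only: Trimmomatic's palindrome mode uses a scoring alignment
that tolerates mismatches; this sketch requires a perfect match.
"""
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment_length(read1: str, read2: str, min_overlap: int = 8):
    """Return the inferred fragment length if the pair reads through, else None."""
    max_len = min(len(read1), len(read2))
    # Try the longest candidate fragment first. A fragment of length L means
    # read1[:L] is the fragment and read2[:L] is its reverse complement.
    for length in range(max_len - 1, min_overlap - 1, -1):
        if read1[:length] == revcomp(read2[:length]):
            return length
    return None


def trim_pair(read1: str, read2: str, min_overlap: int = 8):
    """Trim both reads back to the fragment when read-through is detected."""
    length = fragment_length(read1, read2, min_overlap)
    if length is None:
        return read1, read2  # no read-through detected; keep reads as-is
    return read1[:length], read2[:length]
```

The minimum overlap guards against calling read-through from a short chance match; real trimmers make the same trade-off with score thresholds rather than a fixed length.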

4.2.4 Sequence alignment and calling

The sequencing data were then aligned to the human genome using the Burrows-Wheeler Aligner (BWA, version 0.7.2).396 BWA alignment was performed in two steps. In the first step, an index was built for the human genome build 19 (hg19) reference sequence. In the second step, BWA Maximal Exact Matches (BWA-MEM) was used to align the sequence reads to hg19.
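The reason for the separate indexing step can be shown with a toy FM-index in Python: the reference is transformed once (Burrows-Wheeler transform plus occurrence tables), after which exact matches of a query are found by "backward search" in time proportional to the query length. This is a teaching sketch with hypothetical function names; it is not BWA-MEM's algorithm, which seeds alignments with maximal exact matches and then extends them with gapped alignment.

```python
"""Toy FM-index: exact-match lookup via the Burrows-Wheeler transform.

Illustrates the index-once, query-many principle that BWA builds on.
Not BWA-MEM: no seeding, mismatches, gaps or compressed data structures.
"""


def build_fm_index(reference: str):
    text = reference + "$"  # '$' sorts before A/C/G/T in ASCII
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)  # text[-1] is '$'
    alphabet = sorted(set(text))
    # first[c]: number of characters in text that sort before c
    first, running = {}, 0
    for c in alphabet:
        first[c] = running
        running += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return suffix_array, first, occ


def find(pattern: str, index) -> list:
    """Return sorted 0-based start positions of exact matches of pattern."""
    suffix_array, first, occ = index
    lo, hi = 0, len(suffix_array)
    for ch in reversed(pattern):  # backward search: extend the match leftwards
        if ch not in first:
            return []
        lo = first[ch] + occ[ch][lo]
        hi = first[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(suffix_array[lo:hi])
```

For instance, `find("ACG", build_fm_index("ACGTACGTGACG"))` returns `[0, 4, 9]`.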


The alignment step produces output in Sequence Alignment/Map (*.sam) format. SAMtools view (version 1.3.1)397 was used to convert the *.sam files to the compressed binary *.bam format to reduce the size of the data.

Summary statistics were created for the *.bam files using SAMtools flagstat.397
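The flagstat categories reported later (mapped, properly paired, singletons, mate mapped to a different chromosome) are all derived from the bitwise FLAG field of each alignment record. The Python sketch below shows that derivation. Bit values follow the SAM specification; the counting rules are a simplified reading of samtools flagstat (secondary and supplementary alignments are excluded from pair-based counts, and the QC-passed/QC-failed split is omitted). The `(flag, rname, rnext)` record tuple is a hypothetical input format.

```python
"""Sketch of flagstat-style categories derived from SAM FLAG bits.

Simplified; not a reimplementation of samtools flagstat.
"""
PAIRED, PROPER_PAIR, UNMAPPED, MATE_UNMAPPED = 0x1, 0x2, 0x4, 0x8
READ1, READ2 = 0x40, 0x80
SECONDARY, DUPLICATE, SUPPLEMENTARY = 0x100, 0x400, 0x800


def flagstat(records):
    """records: iterable of (flag, rname, rnext); rnext '=' means same reference."""
    s = dict.fromkeys(["total", "mapped", "duplicates", "paired", "read1", "read2",
                       "properly_paired", "both_mapped", "singletons",
                       "mate_diff_chr"], 0)
    for flag, rname, rnext in records:
        s["total"] += 1
        if not flag & UNMAPPED:
            s["mapped"] += 1
        if flag & DUPLICATE:
            s["duplicates"] += 1
        if flag & (SECONDARY | SUPPLEMENTARY) or not flag & PAIRED:
            continue  # pair statistics use primary, paired reads only
        s["paired"] += 1
        s["read1"] += bool(flag & READ1)
        s["read2"] += bool(flag & READ2)
        mapped = not flag & UNMAPPED
        mate_mapped = not flag & MATE_UNMAPPED
        if flag & PROPER_PAIR and mapped:
            s["properly_paired"] += 1
        if mapped and not mate_mapped:
            s["singletons"] += 1  # read mapped but its mate did not
        if mapped and mate_mapped:
            s["both_mapped"] += 1  # "with itself and mate mapped"
            if rnext not in ("=", rname):
                s["mate_diff_chr"] += 1
    return s
```

For example, FLAG 99 (paired, proper pair, mate reverse strand, read 1) counts towards mapped, properly paired and "with itself and mate mapped", whereas FLAG 73 (read 1 whose mate is unmapped) is counted as a singleton.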

Index files were created for each *.bam file using SAMtools index.397 Local realignment around indels was performed on the *.bam files in two stages using the Genome Analysis Toolkit (GATK, version 3.4.0) RealignerTargetCreator and IndelRealigner tools.180

The Picard (version 2.4.1) FixMateInformation tool was used to ensure that all read entries had their mate information written correctly. The Picard MarkDuplicates tool was then used to identify duplicate reads.
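The principle behind duplicate marking can be sketched in Python: read pairs whose two ends share the same reference, unclipped 5' positions and strands are treated as copies of one original fragment, and all but the pair with the highest summed base quality are flagged. This is a simplified illustration under those assumptions; the `ReadPair` record is hypothetical, and Picard MarkDuplicates additionally handles unpaired reads, library identifiers, pair orientation and optical duplicates.

```python
"""Simplified duplicate marking for read pairs (illustrative only)."""
from typing import NamedTuple


class ReadPair(NamedTuple):
    name: str
    chrom: str
    pos1: int      # unclipped 5' position of read 1
    strand1: str   # '+' or '-'
    pos2: int      # unclipped 5' position of read 2
    strand2: str
    qual_sum: int  # sum of base qualities across both reads


def mark_duplicates(pairs):
    """Return the set of read names flagged as duplicates."""
    best = {}
    duplicates = set()
    for pair in pairs:
        # Order the two ends so the key does not depend on which read is read 1.
        ends = tuple(sorted([(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]))
        key = (pair.chrom, ends)
        kept = best.get(key)
        if kept is None:
            best[key] = pair
        elif pair.qual_sum > kept.qual_sum:
            duplicates.add(kept.name)  # previous representative is demoted
            best[key] = pair
        else:
            duplicates.add(pair.name)
    return duplicates
```

Keeping the highest-quality copy rather than the first one seen means the retained read does not depend on the order of records in the file.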

4.2.5 BAM quality control

A final round of QC was performed on the *.bam files using GATK DepthOfCoverage180 and Picard CollectInsertSizeMetrics and CollectAlignmentSummaryMetrics to determine coverage, insert size (the portion of the library fragment between the adapter sequences) and alignment metrics, respectively.
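Two of these summaries can be sketched in Python from standard alignment fields. Insert size can be estimated from the SAM template length (TLEN), counting each pair once through the read with a positive TLEN, and per-base depths can be binned into fixed-width intervals. Both functions are hypothetical illustrations; Picard CollectInsertSizeMetrics also separates pair orientations and trims outliers, and GATK DepthOfCoverage reports many more statistics.

```python
"""Insert-size summary from TLEN values and a binned depth histogram (sketch)."""
from collections import Counter
from statistics import mean, median


def insert_size_summary(tlens):
    """Summarise positive template lengths (one per properly paired fragment)."""
    sizes = [t for t in tlens if t > 0]  # leftmost read of each pair has TLEN > 0
    if not sizes:
        raise ValueError("no positive template lengths")
    return {"pairs": len(sizes), "median": median(sizes), "mean": mean(sizes)}


def depth_histogram(depths, bin_width=50):
    """Count bases per depth bin (0-49, 50-99, ...), keyed by the bin start."""
    return Counter((d // bin_width) * bin_width for d in depths)
```

For example, TLEN values `[150, -150, 140, -140, 160, -160]` describe three fragments with a median insert size of 150 bp.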

4.2.6 Generate mpileup file

The germline and tumour *.bam files for each patient were combined into a single pileup using SAMtools mpileup.181 Alignment records were assigned to each sample using the sample identifiers in the read group header lines.
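Variant callers that work from mpileup output recover per-sample reference and variant read counts from the encoded read-base column. The Python sketch below decodes that column under the standard mpileup conventions ('.'/',' reference match, ACGTN/acgtn mismatch, '^' plus one mapping-quality character at a read start, '$' at a read end, '+n'/'-n' followed by n indel bases, '*' a deleted base) and assumes the usual multi-sample layout of chromosome, position and reference base followed by depth, bases and qualities per sample. The function names are hypothetical; VarScan2 performs its own parsing.

```python
"""Decode SAMtools mpileup read-base strings into allele counts (sketch)."""
import re
from collections import Counter


def count_pileup_bases(bases: str) -> Counter:
    counts: Counter = Counter()
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":            # read start; skip the following MAPQ character
            i += 2
        elif ch == "$":          # read end marker
            i += 1
        elif ch in "+-":         # indel: sign, length, then the indel sequence
            length = re.match(r"\d+", bases[i + 1:]).group()
            start = i + 1 + len(length)
            end = start + int(length)
            counts[ch + bases[start:end].upper()] += 1
            i = end
        else:
            if ch in ".,":
                counts["REF"] += 1
            elif ch.upper() in "ACGTN":
                counts[ch.upper()] += 1
            elif ch == "*":
                counts["*"] += 1
            i += 1
    return counts


def split_mpileup_line(line: str):
    """Yield (depth, base string) per sample from a multi-sample mpileup line."""
    fields = line.rstrip("\n").split("\t")
    for k in range(3, len(fields), 3):  # chrom, pos, ref, then 3 columns per sample
        yield int(fields[k]), fields[k + 1]
```

For the base string `..,,A^].a$+2AG,*`, the counts are six reference reads, two A reads, one `+AG` insertion and one deleted base.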

4.2.7 Somatic variant calling using VarScan2

The genotype of each sample was determined from the mpileup files using VarScan2 (version 2.3.9).368 VarScan2 read the data from the tumour and germline samples simultaneously and employed a heuristic approach to call variants that met thresholds for read depth, base quality, variant allele frequency and statistical significance.368,369 If the tumour and germline genotypes did not match, the read counts were evaluated by a one-tailed Fisher's exact test on a two-by-two table comparing the numbers of reference-supporting and variant-supporting reads observed in the tumour with those observed in the germline.368 If the resulting p-value met the significance threshold (default 0.10), the variant was called somatic, provided that the germline matched the reference genome at that position.368
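The test described above can be sketched in Python as a hypergeometric tail probability with fixed margins. The sketch assumes the table `[[tumour_ref, tumour_alt], [normal_ref, normal_alt]]` and tests whether the tumour carries more variant-supporting reads than expected; VarScan2's own code may orient or compute the test differently, and `fisher_one_tailed` is a hypothetical name.

```python
"""One-tailed Fisher's exact test on tumour versus germline read counts (sketch)."""
from math import comb


def fisher_one_tailed(tumour_ref, tumour_alt, normal_ref, normal_alt):
    """P(tumour variant reads >= observed) given the table's row and column totals."""
    n_tumour = tumour_ref + tumour_alt          # tumour row total
    n_alt = tumour_alt + normal_alt             # variant-read column total
    n_total = n_tumour + normal_ref + normal_alt
    upper = min(n_tumour, n_alt)
    # Sum hypergeometric probabilities from the observed count to the maximum.
    tail = sum(comb(n_alt, k) * comb(n_total - n_alt, n_tumour - k)
               for k in range(tumour_alt, upper + 1))
    return tail / comb(n_total, n_tumour)
```

For example, five variant reads in the tumour and none among five germline reads gives `fisher_one_tailed(0, 5, 5, 0) == 1/252`, about 0.004, which meets the default 0.10 threshold; if the germline also matches the reference, the site would be called somatic.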

The VarScan2 subcommand processSomatic was then used to split the somatic variants into output files by confidence (low confidence and high confidence). High-confidence variants are those with a tumour variant allele frequency > 15%, a normal variant allele frequency < 5% and a somatic p-value < 0.03; the remaining variants are classed as low confidence. VarScan2 somaticfilter was then used to remove possible false positives from the high-confidence somatic mutations. Table 4.1 shows the settings used to run the somaticfilter command.

Table 4.1: Parameters specified for VarScan2 somaticfilter to filter false positives from the high confidence somatic mutations

Parameter | Value specified
Minimum read depth | 10
Minimum supporting reads for a variant | 2
Minimum number of strands on which variant observed | 1
Minimum average base quality for variant-supporting reads | 20
Minimum variant allele frequency threshold | 0.2
Default p-value threshold for calling variants | 1 x 10^-1
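The processSomatic confidence rule and the Table 4.1 filter settings can be expressed together as a short Python sketch. Thresholds are taken from the text and Table 4.1, but the `SomaticCall` fields are hypothetical, the choice of strict versus inclusive comparisons is an assumption, and the read depth is assumed to refer to the tumour. The SNV-cluster and near-indel filters that somaticfilter also applies are not modelled.

```python
"""Confidence classes and Table 4.1 filter settings for one somatic call (sketch)."""
from dataclasses import dataclass


@dataclass
class SomaticCall:
    tumour_vaf: float     # variant allele frequency in tumour (0-1)
    normal_vaf: float     # variant allele frequency in germline (0-1)
    somatic_p: float      # somatic p-value reported by VarScan2
    depth: int            # read depth at the site (assumed tumour)
    alt_reads: int        # variant-supporting reads
    alt_strands: int      # strands (1 or 2) on which the variant is seen
    alt_mean_qual: float  # mean base quality of variant-supporting reads


def confidence(call: SomaticCall) -> str:
    """processSomatic-style split using the thresholds quoted in the text."""
    high = (call.tumour_vaf > 0.15 and call.normal_vaf < 0.05
            and call.somatic_p < 0.03)
    return "high" if high else "low"


def passes_filter(call: SomaticCall) -> bool:
    """Apply the Table 4.1 settings (comparison directions are assumptions)."""
    return (call.depth >= 10 and call.alt_reads >= 2 and call.alt_strands >= 1
            and call.alt_mean_qual >= 20 and call.tumour_vaf >= 0.2
            and call.somatic_p <= 0.1)
```

Writing the rules as plain predicates makes it easy to see which single threshold removes a given call, as summarised for each patient in Table 4.4.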

Bonferroni adjustments were made to the somatic p-values from VarScan2 to correct for multiple testing.292 The total number of variants in the mpileup files for each patient was used for the correction. The genotypes of any significant variants were visually confirmed by importing the *.bam files into Integrative Genomics Viewer (IGV, version 2.3.80) and determining the number of reads supporting each allele.183,184

4.2.7.1 Somatic variant calling using Strelka

A second somatic variant caller, Strelka (version 1.0.15),389 was used to confirm the statistically significant somatic variants called by VarScan2. The first step of somatic variant analysis using Strelka is a preliminary configuration and validation step, which ensures that the chromosome names in the *.bam headers match those in the reference genome. Template configuration files from Strelka were used in this analysis. The configuration step generates a makefile that controls the analysis, and the second step runs the analysis using this makefile. The sorted tumour and germline *.bam files and the hg19 reference sequence were used in the analysis.

4.2.8 Evidence further supporting somatic risk genes

The significant somatic variants and the genes in which they arise were further examined for evidence of involvement in cancer pathogenesis using several in silico resources, including COSMIC (Catalogue of Somatic Mutations in Cancer),134 the pathway unification database (PathCards),293 Gene Ontology (GO) annotations,294 PubMeth (a database of methylation in cancer)295 and the National Center for Biotechnology Information (NCBI).296 A PubMed search was performed in April 2017 using the string (“gene name”) AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma). Abstracts were screened for relevance to the current study.
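The per-gene PubMed query can be generated programmatically, which keeps the search string identical across genes. In the Python sketch below, the query text follows the string given above; submitting it through NCBI's E-utilities `esearch` endpoint with `db=pubmed` is one possible way to run it and is an assumption, not necessarily how the searches were performed. No network request is made.

```python
"""Build the per-gene PubMed query and an E-utilities esearch URL (sketch)."""
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
CANCER_TERMS = "(cancer OR malignancy OR tumor* OR tumour* OR sarcoma)"


def pubmed_query(gene: str) -> str:
    """Combine the quoted gene name with the cancer-related terms."""
    return f'("{gene}") AND {CANCER_TERMS}'


def esearch_url(gene: str) -> str:
    """URL-encode the query for an esearch request against PubMed."""
    return ESEARCH + "?" + urlencode({"db": "pubmed", "term": pubmed_query(gene)})
```

For example, `pubmed_query("PRMT5")` yields `("PRMT5") AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma)`.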

4.2.9 Drug sensitivity

The genes in which somatic mutations were identified in the two sarcoma patients were searched in the Genomics of Drug Sensitivity in Cancer database (http://www.cancerrxgene.org/)391 to determine whether they were known molecular targets.

4.2.10 Loss of heterozygosity variant calling using VarScan2

VarScan2 was also used to call LOH variants. As in the somatic variant calling process, if the tumour and germline genotypes did not match, the read counts were evaluated by a one-tailed Fisher's exact test. If the resulting p-value met the significance threshold (default 0.10), the variant was called LOH, provided that the germline was heterozygous.
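The decision rules for somatic and LOH calls can be summarised in a small Python sketch. Genotypes are written as hypothetical strings (`"ref"`, `"het"`, `"hom"` for homozygous variant) and the p-value is assumed to come from the Fisher's exact test described earlier. The "Reference" label for sites where both samples match the reference is a hypothetical placeholder; VarScan2's real logic also handles ambiguous or missing genotype calls.

```python
"""Classify a site from germline and tumour genotypes plus a p-value (sketch)."""


def classify_site(normal_gt: str, tumour_gt: str, p_value: float,
                  alpha: float = 0.10) -> str:
    if normal_gt == tumour_gt:
        return "Reference" if normal_gt == "ref" else "Germline"
    if p_value >= alpha:
        return "Germline"  # genotypes differ but not significantly
    if normal_gt == "ref":
        return "Somatic"   # variant in tumour, germline matches reference
    if normal_gt == "het":
        return "LOH"       # germline heterozygous, tumour genotype differs
    return "Unknown"       # e.g. germline homozygous variant
```

A germline heterozygous site that appears homozygous in the tumour with a significant p-value is therefore called LOH, while the same genotype change with p ≥ 0.10 is treated as germline.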

Bonferroni adjustments were made to the LOH p-values from VarScan2 to correct for multiple testing.292 The total number of variants in the mpileup files for each patient was used for the correction. The genotypes of any significant variants were visually confirmed by importing the *.bam files into IGV.


4.2.11 Variant annotation and filtering

Statistically significant somatic and LOH variants were annotated using Annotate Variation (ANNOVAR, version 2015Jun16)245 and the Regulome database (RegulomeDB).257

The somatic and LOH variants that reached statistical significance after Bonferroni correction were cross-referenced against the exclusion list of Fuentes Fajardo et al. (2012) (available in the paper's supplementary material as 'Table S7 gene exclusion list final') to determine whether any variants in highly polymorphic regions should be excluded.282

4.2.12 Somatic copy number analysis using VarScan2

VarScan2 copynumber was applied to the tumour-germline mpileup files to create a single output file of raw SCNAs. VarScan2 copycaller was then used to adjust for GC content and make preliminary calls. The adjusted calls files were imported into R,281 and the package DNAcopy (version 1.48.0)398 was used to perform circular binary segmentation on a per-chromosome basis to smooth and segment the raw output from VarScan2 copycaller.370 The results of DNAcopy were plotted in R to visualise SCNAs.
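The idea of segmentation, splitting a chromosome's log-ratios into runs with a constant mean, can be illustrated with plain recursive binary segmentation in Python. This is explicitly a simpler technique than the circular binary segmentation used by DNAcopy, which tests for a changed interval (two breakpoints at once) and assesses significance by permutation. Here a split is accepted when a two-sample mean-shift z-statistic exceeds a fixed `threshold`, with the noise level `sigma` supplied as an assumed input.

```python
"""Simplified (non-circular) binary segmentation of log-ratios (sketch)."""
from math import sqrt
from typing import List, Optional, Sequence, Tuple


def best_split(x: Sequence[float], sigma: float,
               min_size: int) -> Tuple[Optional[int], float]:
    """Find the split index maximising a mean-shift z-statistic."""
    n, total = len(x), sum(x)
    best_i, best_stat, left = None, 0.0, 0.0
    for i in range(1, n):
        left += x[i - 1]
        if i < min_size or n - i < min_size:
            continue
        diff = left / i - (total - left) / (n - i)
        stat = abs(diff) / (sigma * sqrt(1 / i + 1 / (n - i)))
        if stat > best_stat:
            best_i, best_stat = i, stat
    return best_i, best_stat


def segment(x: Sequence[float], sigma: float, threshold: float = 5.0,
            min_size: int = 3, offset: int = 0) -> List[int]:
    """Return sorted breakpoints (start index of each new segment)."""
    i, stat = best_split(x, sigma, min_size)
    if i is None or stat < threshold:
        return []
    return (segment(x[:i], sigma, threshold, min_size, offset)
            + [offset + i]
            + segment(x[i:], sigma, threshold, min_size, offset + i))
```

For log-ratios of 15 zeros, 10 ones and 15 zeros with `sigma=0.2`, the sketch returns breakpoints `[15, 25]`, delimiting a single gained segment.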

4.3 Results

4.3.1 Whole exome sequencing

Raw data reports from Macrogen, Inc. are summarised in Table 4.2. The GC content of an exome typically falls within the range of 49-51%.399 The samples therefore show slightly below-average GC content, with the tumour samples showing lower GC content than the germline samples. All four samples have over 94% of bases with a base quality (Q) score above 20 on the Phred scale (call accuracy of 99%). Three of the samples also have over 90% of bases with a Q score above 30 (call accuracy of 99.9%); the exception is the Patient 2-II-1 tumour sample, which has 88.58% of bases with a Q score above 30. Pre-processing QC and adapter trimming did not result in any sequences being flagged or trimmed.
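The correspondence between Phred scores and call accuracy used above follows from the definition of the Phred scale, where P_err is the estimated probability that a base call is wrong:

```latex
\begin{aligned}
Q &= -10\log_{10} P_{\mathrm{err}}
  \quad\Longleftrightarrow\quad
  P_{\mathrm{err}} = 10^{-Q/10} \\
Q = 20 &: \; P_{\mathrm{err}} = 10^{-2}
  \;\Rightarrow\; \text{accuracy} = 99\% \\
Q = 30 &: \; P_{\mathrm{err}} = 10^{-3}
  \;\Rightarrow\; \text{accuracy} = 99.9\%
\end{aligned}
```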


Table 4.2: Raw data summary from Macrogen, Inc. for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples

Sample ID | Total read bases (bp) | Total reads | GC (%) | AT (%) | Q20 (%) | Q30 (%)
Patient 1-II-2 germline | 7,720,761,786 | 76,443,186 | 48.9 | 51.1 | 98.6 | 96.09
Patient 2-II-1 germline | 6,079,509,766 | 60,193,166 | 48.88 | 51.12 | 98.24 | 95.19
Patient 1-II-2 tumour | 6,711,209,620 | 66,447,620 | 47.13 | 52.87 | 96.5 | 91.47
Patient 2-II-1 tumour | 5,022,136,524 | 49,724,124 | 47.47 | 52.53 | 94.7 | 88.58

Sample ID: sample name. Total read bases: total number of bases sequenced. Total reads: total number of reads. GC (%): GC content. AT (%): AT content. Q20 (%): percentage of bases with a Phred quality score over 20. Q30 (%): percentage of bases with a Phred quality score over 30.

4.3.2 Sequence alignment and calling

Summary statistics on the trimmed *.bam files were computed using SAMtools flagstat181 and are presented in Table 4.3. Both germline samples had over 99% of reads mapped, and both tumour samples had over 98% of reads mapped. In both germline samples, almost all reads were properly paired (> 98.8% of total reads). The tumour samples had slightly lower proportions of properly paired reads (93.68% for Patient 1-II-2 and 95.86% for Patient 2-II-1).


Table 4.3: Summary statistics generated using SAMtools flagstat for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples

Statistic | Patient 1-II-2 germline | Patient 2-II-1 germline | Patient 1-II-2 tumour | Patient 2-II-1 tumour
Total (QC-passed reads + QC-failed reads) | 76,335,490 | 60,113,342 | 63,327,174 | 49,372,458
Duplicates | 0 | 0 | 0 | 0
Mapped (%) | 75,948,149 (99.49%) | 59,803,630 (99.48%) | 62,074,975 (98.02%) | 48,478,673 (98.19%)
Paired in sequencing | 76,335,490 | 60,113,342 | 63,327,174 | 49,372,458
Read 1 | 38,167,745 | 30,056,671 | 31,663,587 | 24,686,229
Read 2 | 38,167,745 | 30,056,671 | 31,663,587 | 24,686,229
Properly paired | 75,457,268 (98.85%) | 59,445,380 (98.89%) | 59,321,774 (93.68%) | 47,327,518 (95.86%)
With itself and mate mapped | 75,833,239 | 59,704,772 | 61,514,992 | 47,943,636
Singletons (%) | 114,910 (0.15%) | 98,858 (0.16%) | 559,983 (0.88%) | 535,037 (1.08%)
Mate mapped to a different chromosome | 174,551 | 105,540 | 109,018 | 39,240
Mate mapped to a different chromosome (mapQ ≥ 5) | 156,396 | 93,541 | 65,912 | 27,128

QC: quality control.


Local realignment was performed using GATK RealignerTargetCreator. For Patient 1-II-2, 3,793,051 reads (2.72%) were filtered out during the traversal. Of these, 224,019 reads failed the 'bad mate' filter, 3,569,018 reads failed the 'mapping quality zero' filter, and 14 reads failed the 'unmapped read' filter. For Patient 2-II-1, 3,060,049 reads (2.79%) were filtered out during the traversal. Of these, 121,551 reads failed the 'bad mate' filter, 2,938,469 reads failed the 'mapping quality zero' filter, and 29 reads failed the 'unmapped read' filter.

For Patient 1-II-2, no reads were filtered out of 76,335,490 total reads in thegermline sample, and no reads were filtered out of 63,327,174 total reads in thetumour sample. For Patient 2-II-1, no reads were filtered out of 60,113,342 totalreads in the germline sample, and no reads were filtered out of 49,372,458 totalreads in the tumour sample.

4.3.3 BAM quality control

GATK depth of coverage results are presented in Figure 4.3. As expected, the majority of bases were covered at a depth of 100X or less in each sample. The germline samples (blue) of both patients show slightly higher coverage than the tumour samples (orange).

[Figure 4.3 plots not reproduced: histograms of the number of bases (y-axis) against depth (x-axis, bins from ≥0 to ≥500 in steps of 50) for the germline and tumour samples; panel (a) Patient 1-II-2, panel (b) Patient 2-II-1.]

Figure 4.3: Genome Analysis Toolkit depth of coverage summary for Patient 1-II-2 and Patient 2-II-1 germline and tumour DNA

The average insert size for the germline samples of both Patient 1-II-2 and Patient 2-II-1 is approximately 150 base pairs. The tumour samples have slightly smaller insert sizes of approximately 125 and 140 base pairs for Patient 1-II-2 and Patient 2-II-1, respectively. Figure 4.4 shows histograms of the insert size distribution for both patients' germline and tumour samples, generated by Picard (Patient 1-II-2: top panels; Patient 2-II-1: bottom panels).

[Figure 4.4 plots not reproduced: insert size histograms for (a) Patient 1-II-2 germline, (b) Patient 1-II-2 tumour, (c) Patient 2-II-1 germline and (d) Patient 2-II-1 tumour.]

Figure 4.4: Insert size histogram plots generated by Picard for Patient 1-II-2 and Patient 2-II-1 germline and tumour samples

High-level metrics about the alignment of reads within each *.bam file were produced by the Picard CollectAlignmentSummaryMetrics tool. All reads from both patients' germline and tumour samples passed the filter criteria, and the percentage of reads aligned was above 98% for all samples.

4.3.4 Somatic variant calling

4.3.4.1 VarScan2

VarScan2 identified 4,888 somatic variants in Patient 1-II-2, of which 702 were classed as high confidence. Patient 2-II-1 had 2,667 somatic variants, of which 595 were classed as high confidence. The results of the somaticfilter command (to remove possible false positives) for the high-confidence somatic SNV files are presented in Table 4.4. Most of the variants removed in both patients failed the Reads2 requirement (minimum supporting reads for a variant).

Table 4.4: Results from VarScan2 somaticfilter to remove possible false positives from the high confidence somatic calls for Patient 1-II-2 and Patient 2-II-1

Filter | Patient 1-II-2 | Patient 2-II-1
Total variants in input | 702 | 595
Coverage requirement (10) | 3 | 7
Reads2 requirement (2) | 652 | 459
VarFreq requirement (0.2) | 0 | 0
p-value requirement (1 x 10^-1) | 2 | 14
SNP clusters requirement | 0 | 4
Near INDELs | 0 | 0
Passed | 45 | 111

Reads2: minimum supporting reads for a variant filter. VarFreq: minimum variant allele frequency filter. SNP: single nucleotide polymorphism. INDEL: insertion or deletion.


Bonferroni adjustment was performed on the p-values from VarScan2 to correct for multiple testing. As Patient 1-II-2 had 66,265,606 positions for comparison, the significance threshold after Bonferroni correction was p < 7.55 x 10^-10. After correcting for multiple testing, Patient 1-II-2 had 11 statistically significant somatic variants.

Patient 2-II-1 had 67,054,165 positions for comparison; therefore, the significance threshold after Bonferroni correction was p < 7.46 x 10^-10. After correcting for multiple testing, Patient 2-II-1 had three statistically significant somatic variants.
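The two corrected thresholds can be reproduced with a few lines of Python. The sketch assumes a family-wise alpha of 0.05, which is consistent with the reported values, and uses the number of positions compared for each patient. It also checks that the Fisher's p-values of the Patient 1-II-2 variants listed in Table 4.5 all fall below that patient's threshold.

```python
"""Bonferroni-corrected significance thresholds per patient (sketch)."""


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level controlling the family-wise error rate at alpha."""
    return alpha / n_tests


positions = {"1-II-2": 66_265_606, "2-II-1": 67_054_165}
thresholds = {p: bonferroni_threshold(0.05, n) for p, n in positions.items()}
for patient, t in thresholds.items():
    print(f"Patient {patient}: threshold = {t:.2e}")

# Fisher's p-values of the Patient 1-II-2 variants listed in Table 4.5
table_4_5 = [1.70e-26, 5.55e-22, 2.86e-18, 1.09e-16, 9.73e-16,
             3.78e-14, 3.05e-12, 5.26e-12, 4.24e-11, 1.03e-10]
print(all(p < thresholds["1-II-2"] for p in table_4_5))
```

Running it prints thresholds of 7.55e-10 and 7.46e-10, followed by `True`.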

4.3.4.2 Validation of somatic variants using Strelka

Of the 11 somatic variants identified by VarScan2 in Patient 1-II-2, ten were also reported as somatic variants by Strelka (Table 4.5). A variant in the CCDC66 gene (position 19:47768072) was reported by VarScan2 but not by Strelka. All three somatic variants identified by VarScan2 in Patient 2-II-1 were confirmed by Strelka (Table 4.6). The variants were annotated and cross-referenced against a provisional gene exclusion list, but no variants were removed.282


Table 4.5: Somatic variants identified by VarScan2 and Strelka for Patient 1-II-2

Chr:Pos | Gene | p-value | Function | SIFT | PolyPhen-2 | RegulomeDB | GERP | Ref | Alt | Read depth (ref, alt)
chr14:23397376 | PRMT5 | 1.70 x 10^-26 | NS | D | B | . | 4.64 | G | A | 93, 76
chr9:95237133 | ASPN | 5.55 x 10^-22 | NS | T | D | . | 5.12 | T | A | 60, 51
chr6:129704330 | LAMA2 | 2.86 x 10^-18 | NS | T | D | 6 | 5.74 | G | A | 67, 49
chr4:106156358 | TET2 | 1.09 x 10^-16 | NS | T | B | 5 | 2.32 | C | T | 61, 55
chr18:34322702 | FHOD3 | 9.73 x 10^-16 | NS | T | D | 6 | 4.17 | A | T | 54, 54
chr19:19607014 | GATAD2A | 3.78 x 10^-14 | NS | T | D | 5 | 5.65 | G | A | 9, 17
chr14:105212646 | ADSSL1 | 3.05 x 10^-12 | S | . | . | 2b | . | C | T | 41, 34
chr3:49042587 | P4HTM | 5.26 x 10^-12 | NS | . | B | 2b | -4.39 | A | C | 54, 26
chr11:65000621 | SLC22A20, POLA2 | 4.24 x 10^-11 | IG | . | . | 5 | . | G | T | 21, 15
chr9:133760443 | ABL1 | 1.03 x 10^-10 | S | . | . | 5 | . | G | C | 53, 32

Chr:Pos: chromosome:position. p-value: Fisher's p-value. SIFT: Sorting Intolerant From Tolerant. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score (a positive GERP score represents a substitution deficit, while a negative GERP score represents a substitution surplus). Ref: reference allele. Alt: alternate allele. NS: nonsynonymous. S: synonymous. IG: intergenic. D: deleterious. B: benign. T: tolerated.


Table 4.6: Somatic variants identified by VarScan2 and Strelka for Patient 2-II-1

Chr:Pos | Gene | p-value | Function | SIFT | PolyPhen-2 | RegulomeDB | GERP | Ref | Alt | Read depth (ref, alt)
chr8:57307625 | SDR16C6P, PENK | 9.39 x 10^-17 | Intergenic | . | . | . | . | C | T | 44, 32
chr5:1244883 | SLC6A18 | 1.33 x 10^-12 | Splicing | . | . | 5 | 3.29 | G | A | 10, 12
chr5:57754947 | PLK2 | 4.34 x 10^-11 | Intronic | . | . | 4 | . | T | C | 31, 23

Chr:Pos: chromosome:position. p-value: Fisher's p-value. SIFT: Sorting Intolerant From Tolerant. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score (a positive GERP score represents a substitution deficit, while a negative GERP score represents a substitution surplus). Ref: reference allele. Alt: alternate allele. D: deleterious. B: benign.


4.3.4.3 Evidence further supporting somatic risk genes

Table 4.7 summarises the findings from several in silico resources for the somatic risk variants, and the genes in which they arise, in both patients. None of the somatic risk variants were reported in COSMIC. However, all but one of the genes (SDR16C6P) have reported mutations in the COSMIC database, and two genes (TET2 and ABL1) are listed in the COSMIC cancer gene census. The ABL1 and PENK genes were reported in the PubMeth database; none of the other genes were reported in PubMeth, which suggests that methylation of these genes in cancer has not been reported to date. Evidence from NCBI suggests that ten genes have functions that support involvement in cancer. Of these ten genes, six (PRMT5, LAMA2, TET2, FHOD3, ABL1 and PLK2) have Gene References into Functions (GeneRIF) evidence for involvement in cancer pathogenesis, two (POLA2 and PENK) have GeneRIFs indicating that they might be biomarkers for cancer, and two (ASPN and P4HTM) have GeneRIFs suggesting that they may be targets for cancer therapeutics.

The PubMed search results for the candidate somatic risk genes are summarised in Table 4.8. The searches revealed previously published associations with cancer for all genes except ADSSL1 and SDR16C6P. Therefore, there is currently no published evidence supporting the involvement of the ADSSL1 and SDR16C6P genes in cancer pathogenesis.


Table 4.7: Summary of findings from in silico resources investigating the role of somatic risk variants, and the genes in which they arise, in cancer pathogenesis

Gene | Genomic location | Variant in COSMIC | No. mutations in COSMIC | Cancer gene census | SuperPath | GO molecular function | Methylation | GeneRIF
PRMT5 | chr14:23397376 | No | 95 | No | Regulation of TP53 activity; Chromatin organisation; Gene expression; Transport of the SLBP independent mature mRNA; RNA transport | Core promoter sequence-specific DNA binding; Transcription corepressor activity; Protein binding; Methyltransferase activity; Methyl-CpG binding | No | Colorectal cancer pathogenesis; Acute myeloid leukemia growth; Marker of poor prognosis in nasopharyngeal carcinoma; Marker for early colorectal carcinomas
ASPN | chr9:95237133 | No | 96 | No | ECM proteoglycans; Degradation of the extracellular matrix | Protein kinase inhibitor activity; Calcium ion binding; Collagen binding | No | Role in gastric cancer; Therapeutic target molecule
LAMA2 | chr6:129704330 | No | 854 | No | Integrin pathway; ERK signalling; Arrhythmogenic right ventricular cardiomyopathy; Dilated cardiomyopathy; Focal adhesion | Receptor binding; Structural molecule activity | No | Mutations in hepatocellular carcinoma patients
TET2 | chr4:106156358 | No | 2,726 | Yes | Activated PKN1 stimulates transcription of androgen receptor regulated genes; Chromatin regulation / Acetylation; Gene expression | Sulfonate dioxygenase activity; DNA binding; Protein binding; Ferrous iron binding; Zinc ion binding | No | Involved in leukemogenesis; Oncogenic role in myeloid tumour
FHOD3 | chr18:34322702 | No | 393 | No | . | Actin binding; Protein binding | No | Glioma linear migration; Associated with acute lymphoblastic leukemia; Promotes invasive migration and local invasion in vivo
GATAD2A | chr19:19607014 | No | 95 | No | Activated PKN1 stimulates transcription of androgen receptor regulated genes; Chromatin organisation; Gene expression; Regulation of TP53 activity | Contributes to RNA polymerase II regulatory region sequence-specific DNA binding; Transcription factor activity, sequence-specific DNA binding; Protein binding; Zinc ion binding; Protein binding, bridging | No | .
ADSSL1 | chr14:105212646 | No | 114 | No | Purine metabolism; Purine nucleotides de novo biosynthesis; Metabolism purine metabolism; Alanine, aspartate and glutamate metabolism | Magnesium ion binding; GTPase activity; Adenylosuccinate synthase activity; GTP binding; Ligase activity | No | .
P4HTM | chr3:49042587 | No | 75 | No | . | Iron ion binding; Calcium ion binding; Oxidoreductase activity | No | May aid the design of novel therapies for inhibiting bone tumours
SLC22A20 | chr11:65000621 | No | 83 | No | . | Inorganic anion exchanger activity; Sodium-independent organic anion transmembrane transporter activity | No | .
POLA2 | chr11:65000621 | No | 108 | No | Telomere C-strand synthesis; E2F mediated regulation of DNA replication; Regulation of activated PAK-2p34 by proteasome mediated degradation; Purine metabolism; Cell cycle, Mitotic | DNA binding; DNA-directed DNA polymerase activity; Protein heterodimerisation activity | No | Prognostic biomarker in non small cell lung cancer pathogenesis
ABL1 | chr9:133760443 | No | 1,684 | Yes | DNA double-strand break repair; Development Slit-Robo signalling; Regulation of actin dynamics for phagocytic cup formation; Cell cycle; ErbB signalling pathway | Magnesium ion binding; DNA binding; Actin monomer binding; Nicotinate-nucleotide adenylyltransferase activity; Protein kinase activity | Yes | BCR/ABL oncogene in leukaemia; Promotes breast cancer osteolytic metastasis; Progression of gastric cancer
SDR16C6P | chr8:57307625 | No | . | No | . | . | No | .
PENK | chr8:57307625 | No | 163 | No | Apoptotic pathways in synovial fibroblasts; GPCR pathway; ERK signalling; Nanog in Mammalian ESC Pluripotency; CREB Pathway | Opioid peptide activity; Neuropeptide hormone activity; Opioid receptor binding | Yes | Promoter methylation associated with colorectal adenocarcinoma diagnosis
SLC6A18 | chr5:1244883 | No | 186 | No | Transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds; Amino acid transport across the plasma membrane | Neurotransmitter:sodium symporter activity; Amino acid transmembrane transporter activity; Symporter activity | No | .
PLK2 | chr5:57754947 | No | 153 | No | FoxO signalling pathway; Gene expression; TP53 Regulates transcription of cell cycle genes; DNA damage; Regulation of TP53 activity | Nucleotide binding; Protein kinase activity; Protein serine/threonine kinase activity; Signal transducer activity; Protein binding | No | Promoting tumour progression; Increases cell proliferation and decreases apoptosis in gastric cancer cells

Genomic location: chromosome:position. COSMIC: Catalogue of Somatic Mutations in Cancer database.134 No. mutations in COSMIC: the number of mutations reported in the gene in the COSMIC database. Cancer gene census: is the gene reported in the cancer gene census in COSMIC? The cancer gene census is a catalogue of genes for which mutations have been causally implicated in cancer. SuperPath: from PathCards, an integrated database of human pathways and their annotations (http://pathcards.genecards.org/); human pathways were clustered into SuperPaths based on gene content similarity. GO molecular function: Gene Ontology molecular function.294 Methylation: is the gene reported to be methylated in cancer by PubMeth (http://www.pubmeth.org)?295 GeneRIF: Gene References Into Functions from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/); are any GeneRIFs associated with cancer reported for the gene? SLBP: stem-loop binding protein. ECM: extracellular matrix. ERK: extracellular signal-regulated kinase. GTP: guanosine triphosphate. E2F: E2 factor. ErbB: erythroblastosis oncogene B. GPCR: G-protein-coupled receptors. ESC: embryonic stem cells. CREB: cAMP response element-binding protein.


Table 4.8: Summary of search results from PubMed for genes in which somatic variants were identified

Gene No. of publications Role of gene Selected references

PRMT5 156 PRMT5 is a regulator of homologous recombination-mediated double-strand break repair. PRMT5 methyltransferase activity is necessary for tumour cell proliferation and plays an important role in cancer progression by repressing the expression of key tumour suppressor genes. Mutations in PRMT5 have been associated with gastric cancer, oropharyngeal squamous cell carcinoma, hepatocellular carcinoma, prostate cancer, lung adenocarcinoma, lung squamous cell carcinoma, endometrial carcinoma and breast carcinoma.

400–404,404,405

ASPN 33 ASPN is a secreted small leucine-rich proteoglycan with known roles in ligament regulation and chondrogenesis. It is a potential mediator of metastatic progression found within the tumour microenvironment. ASPN has been shown to play a role in breast cancer, scirrhous gastric cancer, pancreatic cancer and prostate cancer.

406–411

LAMA2 22 LAMA2 is functionally involved in the formation of the extracellular matrix and is upregulated in metastatic renal cell carcinoma and during serum-induced differentiation of glioma-initiating cells. Downregulation of LAMA2 has been reported in oesophageal cancer, in the extracellular matrix of a drug-resistant ovarian cancer cell line, in hepatocellular carcinoma and in laryngeal cancer. Abnormal methylation has been reported in breast carcinoma and colorectal cancer. LAMA2 is a candidate marker and indicator of poor prognosis for posterior fossa subgroup A ependymal tumours.

412–424


TET2 664 TET2 is an epigenetic regulator that is frequently mutated or inactivated in cancer, and it has been suggested that the TET proteins may protect against abnormal DNA methylation at promoters. TET2 mutations are frequently observed in myeloid and lymphoid hematological malignancies.

425–430

FHOD3 11 FHOD3 is involved in cancer cell migration and invasion through regulation of dynamic actin spike assembly in invading cells in vitro and in vivo. FHOD3 plays a role in linear migration motility in glioma. FHOD3 was hypomethylated, overexpressed and involved in major deletions, and may play a role in thyroid cancer. FHOD3 mutations in leukaemia have been associated with accumulation of methotrexate polyglutamates.

431–435

GATAD2A 5 GATAD2A is a subunit of the nucleosome remodeling and histone deacetylase complex, a chromatin-level regulator of transcription with a number of important and emerging roles in cancer biology. Knockdown of GATAD2A decreased cell proliferation and colony formation and promoted apoptosis in thyroid cancer cells. A variant in GATAD2A has been associated with susceptibility to three cancers (breast, ovarian and prostate).

436–440

ADSSL1 0 . .

P4HTM 1 P4HTM was found to be hypermethylated in rhabdomyosarcoma. Silencing of P4HTM by promoter DNA methylation is a potential mechanism for HIF-1α stabilisation in rhabdomyosarcomas.

441

SLC22A20 3 SLC22A20 (OAT6) has been described as an uptake carrier of sorafenib. SLC22A20 is differentially methylated in hepatocellular carcinoma.

321,442,443


POLA2 8 POLA2 has been reported to be involved in cell proliferation by mediating DNA replication, recombination and repair. A variant in POLA2 is associated with differential survival and mortality in non-small cell lung cancer patients and could be used as a prognostic biomarker. Knockdown of POLA2 increases gemcitabine resistance in human lung cancer cells. Low mRNA expression of POLA2 was prognostic of poor outcome in ovarian carcinomas. A POLA2-CDC42EP2 read-through fusion transcript was identified in gastrointestinal stromal tumours. POLA2 was found to be overexpressed in mesothelioma.

322,444–449

ABL1 1,368 The product of the ABL1 gene is a tyrosine kinase that plays a role in cellular growth control and the response to DNA damage. The BCR-ABL1 gene fusion (Philadelphia chromosome) is responsible for > 95% of chronic myeloid leukemia. Mutations in the BCR-ABL1 fusion gene have been found to be a major cause of disease progression and resistance to tyrosine kinase inhibitors in patients with chronic myeloid leukemia. Methylation of the proximal promoter of the ABL1 oncogene is a common epigenetic alteration associated with clinical progression of chronic myeloid leukemia. ABL1 was first identified as an oncogene in leukaemia, but mutations have also been reported in lung cancer.

450–453

SDR16C6P 0 . .

PENK 99 PENK is a candidate tumour suppressor gene that is hypermethylated in various cancers. PENK is also a potential biomarker for prostate, colorectal and bladder cancer. Hypermethylation of PENK contributes to cell motility and adhesion.

454–462

SLC6A18 1 Gain of 5p15.33 (harbouring the SLC6A18 gene) has been reported in non-small cell lung cancer cases.

463


PLK2 99 PLK2 plays a critical role in the cell cycle and the response to DNA damage. PLK2 plays a tumour suppressor role in cervical cancer, ovarian cancer, gastric cancer and hematopoietic diseases. PLK2 is involved in paclitaxel resistance in solid tumours. PLK2 phosphorylates TAp73, resulting in inhibited cell proliferation, increased apoptosis, G1 phase arrest and decreased cell invasion. Protein kinases represent one of the most effective classes of therapeutic targets in cancer.

464–474

A PubMed search was performed in April 2017 using the string (“gene name”) AND (cancer OR malignancy OR tumor* OR tumour* OR sarcoma). Abstracts were screened for relevance to the current study.
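For reproducibility, the search string in the footnote above can be generated programmatically for each gene. The following is a minimal Python sketch, not the script used in this study; the function name `pubmed_query` is hypothetical, and straight double quotes are used for the gene phrase, as is conventional when typing PubMed queries.

```python
def pubmed_query(gene: str) -> str:
    """Build the PubMed search string from the Table 4.8 footnote for one gene symbol."""
    # tumor*/tumour* rely on PubMed truncation to match variants such as "tumours"
    cancer_terms = ["cancer", "malignancy", "tumor*", "tumour*", "sarcoma"]
    return f'("{gene}") AND ({" OR ".join(cancer_terms)})'


for gene in ["PRMT5", "ASPN", "PLK2"]:
    print(pubmed_query(gene))
```

Each generated string could then be pasted into PubMed or submitted as the `term` parameter of the NCBI E-utilities `esearch` service; screening the returned abstracts for relevance remains a manual step.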


4.3.4.4 Drug sensitivity

Two of the genes identified as being of interest (TET2 and LAMA2) were reported in the Genomics of Drug Sensitivity in Cancer database.391 The TET2 gene showed a statistically significant association (p-value < 10⁻³) with VNLG/124 and Bexarotene in pan-cancer analysis (drug sensitivity for cell lines from all cancer types, with genomic features identified from the analysis of patient tumours across multiple cancer types).391 Although the LAMA2 gene was reported in the Genomics of Drug Sensitivity in Cancer database, none of its associations reached statistical significance.

VNLG/124 is a novel mutual prodrug of all-trans-retinoic acid (ATRA) and histone deacetylase inhibitors (HDIs).475 TET2 mutations have been reported to determine sensitivity to ATRA. ATRA has previously been shown to induce the interaction and chromatin recruitment of a novel RARβ-TET2 complex that epigenetically activates a specific cohort of target genes.476 Wu et al. (2017) reported a novel RARβ-TET2-miR-200c-PKCζ signalling pathway that directs cancer cell state changes and may have therapeutic implications.476 Bexarotene is a selective retinoid X receptor (RXR) agonist with properties overlapping those of ATRA.477 Bexarotene acts by blocking cell cycle progression, inducing apoptosis and differentiation, preventing multidrug resistance, and inhibiting angiogenesis and metastasis.477 It is therefore considered a promising chemopreventive agent against cancer.477

None of the remaining genes harbouring significant somatic mutations were listed in the Genomics of Drug Sensitivity in Cancer database as of April 2017. However, as understanding of the genes and pathways causally implicated in cancer grows, more therapeutics may be added to the database in the future.

4.3.5 Loss of heterozygosity variants

A total of 2,075 LOH variants were identified in Patient 1-II-2 using VarScan2. Of these, 507 were high confidence. After correcting for multiple testing and removing variants in polygenic regions, 18 LOH variants were statistically significant (Table 4.9). Of these LOH variants, 16 were located on chromosome 16 and the remaining two on chromosome 19.


There were 1,344 LOH variants identified in Patient 2-II-1 using VarScan2, with 785 categorised as high confidence. After correcting for multiple testing, no LOH variants reached statistical significance for Patient 2-II-1.

4.3.6 Copy number analysis

The results of DNAcopy were visualised as SCNA graphs per chromosome for Patient 1-II-2 (Appendix I) and Patient 2-II-1 (Appendix J). The SCNA graphs for Patient 1-II-2 show considerable disruption on chromosome 16. The SCNA graphs for Patient 2-II-1 do not show any large regions of disruption.


Table 4.9: Statistically significant high confidence loss of heterozygosity variants for Patient 1-II-2

Chr:Pos Gene Somatic p-value Function SIFT PolyPhen-2 RegulomeDB GERP Ref Alt Read depth (Ref, alt) MAF 1000G

chr16:87678441 JPH3 8.11 × 10⁻¹⁹ S . . 5 . C T 132,18 0.13

chr16:57503213 POLR2C 1.22 × 10⁻¹⁸ Int . . . . C T 11,77 0.52

chr16:84208335 DNAAF1 2.55 × 10⁻¹⁵ Int . . 1f . G T 6,96 0.33

chr16:89300014 ZNF778, ANKRD11 1.73 × 10⁻¹⁴ IG T P 3a -2 C T 9,132 0.009

chr16:84691044 KLHL36 4.31 × 10⁻¹³ S . . 1f . C T 44,0 0.34

chr16:71319539 CMTR2 4.92 × 10⁻¹² S . . . . C T 10,57 0.68

chr19:33444588 CEP89 7.78 × 10⁻¹² NS T P 2b 2.25 T G 52,0 .

chr16:68598007 ZFP90 1.63 × 10⁻¹¹ S . . 3a . A G 82,10 0.57

chr16:81929488 PLCG2 1.99 × 10⁻¹¹ S . . 3a . C T 82,7 0.35

chr16:53326860 CHD9 3.39 × 10⁻¹¹ S . . . . G A 88,10 0.30

chr16:88884466 GALNS 5.52 × 10⁻¹¹ S . . 4 . C T 1,39 0.38

chr16:87678144 JPH3 7.31 × 10⁻¹¹ S . . . . T C 6,61 0.50

chr19:33444576 CEP89 1.93 × 10⁻¹⁰ NS T B 4 -6.38 C T 34,0 .

chr16:53513055 RBL2 2.44 × 10⁻¹⁰ Int . . 6 . T C 3,60 0.46

chr16:69973297 WWP2 2.60 × 10⁻¹⁰ S . . 4 . C T 84,11 0.03

chr16:55562466 LPCAT2 4.82 × 10⁻¹⁰ NS T B 6 -3.8 G A 68,2 0.65

chr16:67316600 PLEKHG4 5.82 × 10⁻¹⁰ Int . . 1f . G A 7,63 0.44

chr16:89805261 ZNF276 7.18 × 10⁻¹⁰ UTR3 . . 4 . A G 70,3 0.61

Chr:Pos: chromosome:position. SIFT: Sorting Intolerant from Tolerant score. PolyPhen-2: Polymorphism Phenotyping-2. GERP: Genomic Evolutionary Rate Profiling score (a positive GERP score represents a substitution deficit, while a negative GERP score represents a substitution surplus). Ref: reference allele. Alt: alternate allele. MAF 1000G: minor allele frequency in the 1000 Genomes Project. S: synonymous. NS: nonsynonymous. Int: intronic. IG: intergenic. UTR3: 3’ untranslated region. D: deleterious. T: tolerated. B: benign. P: possibly damaging.


4.4 Discussion

In summary, ten somatic variants in Patient 1-II-2 and three somatic variants in Patient 2-II-1 were identified by VarScan2 and confirmed by Strelka. VarScan2 also identified a large region of LOH on chromosome 16 in Patient 1-II-2. This LOH region was supported by the SCNA results, which also indicated a region of SCNA on chromosome 16. Of the genes harbouring somatic mutations, two were listed in the Genomics of Drug Sensitivity in Cancer database, suggesting potential clinical utility of these findings. Of the 13 genes in which somatic mutations were identified, 11 have been previously associated with cancer in the published literature.

4.4.1 Comparison of results in the context of published literature on myxoid liposarcoma genetics

The majority of myxoid liposarcomas are characterised by the reciprocal chromosomal translocation t(12;16)(q13;p11), which creates the FUS-DDIT3 chimeric gene.64 A smaller fraction of myxoid liposarcoma cases harbour a similar variant translocation, t(12;22)(q13;q12), which fuses the EWSR1 gene to the DDIT3 gene.478 These translocations are likely to be the primary genetic event essential for tumour formation.479 However, in solid tumours, single base substitutions outnumber chromosomal translocations by at least one order of magnitude.16 It is therefore possible that sarcomas with fusion gene drivers also harbour other driver gene mutations.479

Myxoid liposarcoma can contain several additional molecular genetic alterations, including TP53, PIK3CA and TERT mutations, which directly influence tumour cell biology and may be involved in round cell transformation, migration capacity and differential response to drugs.480–485 Alterations of the TP53 pathway have also been described in myxoid liposarcoma.480,486,487

One study has previously performed a matched tumour and germline analysis of myxoid liposarcomas. Joseph et al. (2014) performed WES on eight fresh frozen, surgically resected myxoid liposarcomas and matched blood samples.488 A median of 10.8 (range 3–15) somatic mutations per tumour was reported, consistent with the findings of this study (ten somatic mutations in Patient 1-II-2 and three in Patient 2-II-1). Joseph et al. reported one somatic variant in the FHOD3 gene (g.chr18:32552101G>T).488 However, this FHOD3 variant differs from the variant reported in this study.

A PubMed search was performed (May 2017) using the string (“gene name”) AND (“myxoid liposarcoma”) for each of the genes in which somatic mutations were identified. No results were returned for any gene except ABL1. It has previously been suggested that ABL1 may play a role in pre- and post-transcriptional regulatory networks that contribute to sensitivity to trabectedin treatment in myxoid liposarcoma patients.489 The other genes in which somatic mutations were identified in Patient 1-II-2 and Patient 2-II-1 have not been previously reported in myxoid liposarcomas.

A cluster of 16 LOH variants on chromosome 16q was also identified in Patient 1-II-2. The SCNA plots for Patient 1-II-2 also highlight a region of SCNA on chromosome 16, which suggests that this may be the site of a significant genomic disruption in this patient. Of the 1,015 genes in this chromosomal region, 66 have previously been associated with cancer.

Patient 1-II-2 had an LOH variant in one of the cancer genes located in the region of LOH on chromosome 16, RBL2, at position chr16:53513055 (rs8049033). The minor allele frequency (MAF) in the 1000 Genomes Project European population is 0.4602, so this is a common variant in the general population. The RegulomeDB score for rs8049033 is 6, which indicates minimal binding evidence at this position. As this is an intronic variant, the effect of LOH at this position on the phenotype is unknown.

All other members of family 1 are also heterozygous at this position, except Patient 1-I-2 (unaffected), who is homozygous for the reference allele (Figure 4.5). Patient 1-III-1 (Ewing’s sarcoma) is heterozygous at this position in the germline DNA; however, without a tumour sample for Patient 1-III-1 it is not possible to determine whether this position also becomes homozygous for the alternate allele in the tumour DNA.


[Pedigree of family 1 (image not reproduced): generation I, 1-I-1 and 1-I-2; generation II, 1-II-1, 1-II-2 (sarcoma) and 1-II-3; generation III, 1-III-1 (sarcoma) and 1-III-2. Key: affected male, affected female, unaffected male, unaffected female, proband, patient selected for tumour-normal analysis (*).]

Patient Genotype at position in RBL2 gene Read depth (Ref, alt)

Patient 1-I-1 germline T/C 52,71

Patient 1-I-2 germline T/T 115,0

Patient 1-II-1 germline T/C 42,47

Patient 1-II-2 germline T/C 77,60

Patient 1-II-2 tumour C/C 3,60

Patient 1-II-3 germline T/C 70,46

Patient 1-III-1 germline T/C 55,39

Patient 1-III-2 germline T/C 34,55

Figure 4.5: Pedigree of family 1 indicating genotypes for each patient at chr16:53513055 (rs8049033) in the RBL2 gene
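The genotypes in Figure 4.5 can be related to the underlying read depths through the variant allele fraction (VAF), the proportion of reads supporting the alternate allele. The sketch below is illustrative only: the genotype thresholds of 0.2 and 0.8 are assumptions chosen for this example, and VarScan2 instead applies a statistical test to tumour and normal read counts rather than fixed cut-offs. The function names are hypothetical.

```python
def vaf(ref_reads: int, alt_reads: int) -> float:
    """Variant allele fraction: proportion of reads supporting the alternate allele."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads at this position")
    return alt_reads / depth


def genotype(ref_reads: int, alt_reads: int, het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Crude genotype from VAF; the thresholds are illustrative assumptions."""
    f = vaf(ref_reads, alt_reads)
    if f < het_low:
        return "hom_ref"
    if f > het_high:
        return "hom_alt"
    return "het"


def is_loh(normal: tuple[int, int], tumour: tuple[int, int]) -> bool:
    """LOH: heterozygous in germline DNA but homozygous in tumour DNA."""
    return genotype(*normal) == "het" and genotype(*tumour) != "het"


# Read depths (ref, alt) for Patient 1-II-2 at chr16:53513055, from Figure 4.5
germline = (77, 60)
tumour = (3, 60)
print(round(vaf(*germline), 3), round(vaf(*tumour), 3), is_loh(germline, tumour))
# 0.438 0.952 True
```

With these thresholds, the germline genotypes in Figure 4.5 are recovered as heterozygous for every family member except Patient 1-I-2 (115,0), who is classified as homozygous reference.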


The Retinoblastoma-Like 2 (RBL2) gene, also known as RB2 or p130, is a tumour suppressor gene that has been implicated in endometrial cancer,490–495 intraocular melanoma,496,497 lung cancer,498–506 nasopharyngeal cancer,507,508 neuroblastoma,509–511 and retinoblastoma.512–527

In addition to its role in cell cycle control, the retinoblastoma (Rb) protein family plays an important role in regulating other cellular processes, such as terminal differentiation and senescence.528 Previous studies have also shown that Rb proteins are differentially regulated during adipogenic differentiation of pre-adipocyte cell lines,529,530 suggesting that an absence of RB1 or RB2 may promote adipogenesis.531 Human bone marrow-derived mesenchymal stromal cells (hMSCs) are multipotent cells that, under defined conditions, can differentiate into multiple connective tissue cell types, such as adipocytes, osteoblasts, chondrocytes and myoblasts.532 Differentiation of hMSCs into different lineages involves complex regulation and the transcriptional activation or repression of a vast number of genes, and disruption of this regulation can have severe pathological consequences, such as cancer development.533,534

A second cancer gene of interest in the region of LOH on chromosome 16 in Patient 1-II-2 is the fused in sarcoma (FUS) gene, although VarScan2 reported no significant variants in this gene. The FUS gene is involved in the translocation specific to myxoid liposarcomas, t(12;16)(q13;p11).535 This translocation fuses exon 5, 7 or 8 of the FUS gene with exon 2 of the DDIT3 gene. The FUS gene, also known as translocated in liposarcoma (TLS), is involved in pre-messenger ribonucleic acid (mRNA) splicing and the export of fully processed mRNA to the cytoplasm.536 Its protein belongs to the FET family of RNA-binding proteins (FUS, EWS and TAF15), which have been implicated in cellular processes including regulation of gene expression, maintenance of genomic integrity and mRNA/microRNA processing.537 FET genes are directly involved in deleterious genomic rearrangements, primarily in sarcomas and leukaemia.538

Given that Patient 1-II-2 was diagnosed with a myxoid liposarcoma (a tumour derived from primitive cells that undergo adipose differentiation), the region identified on chromosome 16 may be significant. Chromosome 16 shows an extensive region of LOH that encompasses the RBL2 and FUS genes, as well as 64 other known cancer genes, and numerous SCNA events that may contribute to tumour pathogenesis in this patient.


4.4.2 Strengths

A strength of the current study is the confirmation of statistically significant somatic variants using a second, independent variant caller. Many cancer sequencing studies have relied on a single calling pipeline to generate candidates. However, consensus between different callers is imperfect, so the results from a single caller should not be over-interpreted.539 Each calling algorithm has different weaknesses; VarScan2, for example, tends to return a very high total number of calls, which indicates low specificity.540 Combining more than one algorithm with different biases may reduce the number of false positives.539

Therefore, statistically significant somatic variants called by VarScan2 were validated using a second somatic variant caller, Strelka. Of the 14 statistically significant somatic variants called by VarScan2 (11 in Patient 1-II-2 and three in Patient 2-II-1), 13 were also called by Strelka (93%). As these somatic variants were called by two independent callers, they are less likely to be false positives. Nevertheless, these variants should be validated using Sanger sequencing, which was beyond the scope of the current project.
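The cross-caller confirmation step amounts to intersecting two call sets. A minimal Python sketch, assuming that a call is identified by its chromosome, position, reference and alternate alleles (real pipelines may also need to normalise indel representation before comparing); the coordinates below are placeholders, not the variants from this study.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """A somatic call keyed by chromosome, position, reference and alternate allele."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(primary: set[Call], secondary: set[Call]) -> tuple[set[Call], float]:
    """Return the calls made by both callers and the fraction of primary calls confirmed."""
    if not primary:
        raise ValueError("primary call set is empty")
    shared = primary & secondary
    return shared, len(shared) / len(primary)


# Placeholder call sets: 14 primary (e.g. VarScan2) calls, 13 of which are also in the
# secondary (e.g. Strelka) set, plus one call made only by the secondary caller.
varscan_calls = {Call("chr1", 1_000 + i, "C", "T") for i in range(14)}
strelka_calls = {c for c in varscan_calls if c.pos != 1_013} | {Call("chr2", 500, "G", "A")}

shared, fraction = concordance(varscan_calls, strelka_calls)
print(f"{len(shared)}/{len(varscan_calls)} confirmed = {fraction:.0%}")  # 13/14 confirmed = 93%
```

Only the calls in `shared` would be carried forward, which mirrors the 13 of 14 variants retained in this study.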

4.4.3 Limitations

The analysis of matched tumour and germline data has several unique challenges, including accounting for heterogeneity from subclonal variation and sample impurity.148,541,542 The nature of cancer tissue makes somatic variant calling a challenging task.540 The tumour DNA for this analysis was extracted from FFPE samples collected more than ten years earlier. It is difficult to determine tumour purity and heterogeneity from DNA extracted in this manner, as it was not possible to verify whether the block contained a mixture of tumour and adjacent normal tissue, or whether the tumour contained heterogeneous cell populations. Therefore, tumour purity and the effects of heterogeneity could not be taken into account in this analysis, but they should be considered in future studies.


Other issues that arise from using FFPE samples for WES are artefacts such as fragmentation and artificial base alterations.543–547 FFPE samples can be a good resource for the discovery of cancer biomarkers using WES, but fresh frozen tissue is preferred as it minimises damage to nucleotides.543

There are also sources of error from mapping and sequencing processes. In general, data generated on the Illumina platform have increased error rates at the end of reads, a tendency towards transversion base call errors, a low INDEL error rate, and systematic sequence-specific errors following inverted repeat sequences and GG motifs.548–550

The matched tumour and germline comparisons were performed on only two of the five sarcoma cases from the three cancer cluster families described in Chapter 2. A clearer picture of the full somatic mutation burden in these families could be obtained by performing matched tumour and germline analysis for all sarcomas and other cancers in the families. However, tumour DNA was not available for the remaining patients.

4.4.4 Summary

In summary, 13 novel somatic mutations were identified in two myxoid liposarcoma patients. Two of the genes in which somatic mutations were identified (FHOD3 and ABL1) have been previously associated with myxoid liposarcoma in the literature. A large region of LOH and SCNA on chromosome 16q that includes the genes FUS and RBL2 was identified in Patient 1-II-2, which suggests that this chromosomal region may contribute to tumour pathogenesis in this patient.

The genes in which somatic and LOH variants were identified are candidates for further investigation. Independent experimental validation should be performed to screen additional myxoid liposarcomas for variants in these candidate genes. Further functional studies could be carried out to determine the role of these variants or genes in myxoid liposarcoma pathogenesis. Due to time and budget limitations, such functional studies are beyond the scope of this thesis.


Chapter 5

Aim 4: Variant burden analyses at candidate risk loci in sarcoma cases and healthy ageing controls

5.1 Introduction

In genetic studies of complex human diseases such as cancer, the validation of candidate risk variants is an important and often rate-limiting step.551–553 Existing single-variant association tests are underpowered for validating rare risk variants unless sample sizes or effect sizes are large.554,555 A more robust approach combines information across variants in a target region, such as a gene.556 Burden tests combine rare and common variants across a gene or target region and compare an aggregate statistic between cases and controls.272,557,558 A simple approach is to summarise the genotype information by counting the number of minor alleles across all variants in the target region.556 In this chapter, the candidate risk loci identified in Chapter 3 and Chapter 4 are assessed by a case-control variant burden analysis to evaluate the full mutational burden of these regions.
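The minor-allele counting summary described above can be sketched as follows. This is a minimal illustration with hypothetical genotype data, in which each individual's burden score is the total number of minor alleles carried across the variant sites in a region; a burden test would then compare these scores, or the corresponding allele totals, between cases and controls.

```python
from statistics import mean


def burden_scores(genotypes: list[list[int]]) -> list[int]:
    """Per-individual burden: total minor alleles across all variant sites in a region.

    Each inner list is one individual; each entry is the minor-allele count (0, 1 or 2)
    at one variant site in the target region.
    """
    for row in genotypes:
        if any(g not in (0, 1, 2) for g in row):
            raise ValueError("genotype codes must be 0, 1 or 2")
    return [sum(row) for row in genotypes]


# Hypothetical genotypes at three variant sites
cases = [[0, 1, 0], [1, 1, 0], [0, 0, 2]]
controls = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]

case_scores = burden_scores(cases)        # [1, 2, 2]
control_scores = burden_scores(controls)  # [0, 1, 0]
print(case_scores, control_scores)
print(mean(case_scores), mean(control_scores))
```

Collapsing sites in this way trades per-variant resolution for power, which is why burden tests suit the rare variants that single-variant tests struggle to detect.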


5.1.1 Variant burden analyses in sarcoma cohorts

A case-control rare variant burden analysis has previously been performed using sarcoma cases from the International Sarcoma Kindred Study (ISKS) cohort.178 Targeted exon sequencing was performed on 72 genes associated with increased cancer risk in 1,162 sarcoma cases (including 966 from the ISKS) and 6,545 Caucasian controls. Ballinger et al. found an excess of pathogenic germline variants (combined odds ratio (OR) = 1.43, 95% confidence interval = 1.24–1.64, p-value < 0.0001), with approximately half of the sarcoma cases found to have putatively pathogenic monogenic and polygenic variation in known and novel cancer genes.178 This study demonstrated a measurable contribution of polygenic effects to sarcoma risk by rare variant burden analysis of cases and controls.178
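To illustrate how an odds ratio and its confidence interval summarise a case-control comparison, the sketch below computes a sample odds ratio with a Woolf (log-scale Wald) 95% confidence interval from a 2 × 2 table of carriers and non-carriers. The counts are hypothetical, and this is a generic textbook calculation, not necessarily the method Ballinger et al. used to obtain their combined estimate.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96) -> tuple[float, float, float]:
    """Sample odds ratio with a Woolf (log-scale Wald) confidence interval.

    a, b: cases with / without a qualifying variant
    c, d: controls with / without a qualifying variant
    """
    if 0 in (a, b, c, d):
        raise ValueError("a zero cell needs a continuity correction or an exact method")
    odds = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = math.exp(math.log(odds) - z * se_log)
    upper = math.exp(math.log(odds) + z * se_log)
    return odds, lower, upper


# Hypothetical counts: 20/100 cases and 10/100 controls carry a qualifying variant
odds, lower, upper = odds_ratio_ci(20, 80, 10, 90)
print(f"OR = {odds:.2f}, 95% CI = {lower:.2f}-{upper:.2f}")  # OR = 2.25, 95% CI = 0.99-5.09
```

In this hypothetical example the interval spans 1, so the excess would not be significant at the 5% level despite an odds ratio above 2; larger samples, as in the ISKS analysis, narrow the interval.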

A variant burden analysis was also performed using 175 Ewing’s sarcoma patients, including patients from the International Cancer Genome Consortium (100 patients) and the Pediatric Cancer Genome Project (19 patients).559 Pathogenic and likely pathogenic mutations were found in 13.1% of Ewing’s sarcoma cases, a significantly higher proportion than in the same genes in the Exome Aggregation Consortium (ExAC) database (53,105 subjects).559 Brohl et al. found that pathogenic mutations were highly enriched in genes involved in DNA damage repair and cancer predisposition syndromes.559 A table of genes identified by Ballinger et al. and Brohl et al. can be found in Appendix L.

5.2 Methods

5.2.1 Study participants

A total of 561 sarcoma cases were selected from the ISKS,175 described previously in Chapter 2. Briefly, the ISKS was initiated in 2008 and is a global resource for researchers investigating the hereditary characteristics of sarcoma.175 Patients with sarcoma were recruited from major sarcoma treatment centres across Australia, France, New Zealand, India, the United States of America (USA), the United Kingdom and Canada, regardless of their family history of cancer.175 Individuals with adult-onset sarcoma (> 15 years old) were eligible for the ISKS.


A total of 1,144 healthy ageing cancer-free controls were selected from the Medical Genome Reference Bank (MGRB) program.560 The MGRB program is a collaborative project between the New South Wales State Government and the Garvan Institute of Medical Research to sequence healthy, older individuals and create a high quality database that is depleted of damaging genetic variants.560 The MGRB program utilises participants from an existing cohort, the ASPirin in Reducing Events in the Elderly (ASPREE) Study.561 The ASPREE Study is an international clinical trial to determine whether daily low-dose aspirin improves the quality of life for 19,000 older people in Australia and the USA.561

5.2.2 Whole genome sequencing

Whole genome sequencing (WGS) of ISKS cases and MGRB controls was performed by collaborators at the Garvan Institute of Medical Research. Cases and controls were sequenced at one lane per sample on the Illumina HiSeq X Ten platform using TruSeq Nano chemistry (2 × 150 base pair paired-end reads, > 30X mean depth for all samples). Reads from samples passing FastQC394 and verifyBamID562 contamination filters were mapped to the 1000 Genomes Project hs37d5 reference563 with an additional PhiX decoy, and small variants were called using the Genome Analysis Toolkit (GATK) 3.7 best practices pipeline.564 The hs37d5 reference is the hg19-based reference genome used by the 1000 Genomes Project for Phase 3 analysis. It differs from the hg19 genome by the inclusion of 35 Mb of human sequence as an additional contig (hs37d5). Variants passing variant quality score recalibration (VQSR) tranche thresholds of 99.5% (single nucleotide polymorphisms) and 99.0% (insertions and deletions) were retained to summarise frequencies.182

5.2.3 Genomic regions selected for validation

Table 5.1 contains eight target regions that were identified in Chapter 3 (ABCB5, ARHGAP39, BEAN1, C16orf96, KIF2C, PDIA2, UVSSA and ZFP69B). These target regions are genes in which germline risk variants segregating with cancer and age at onset of cancer in three cancer cluster families were identified using whole exome sequencing (WES).


Table 5.1: Genomic coordinates for target regions in which germline and somatic risk variants were identified

Target region Chromosome Start coordinate End coordinate

Identified in Chapter 3

KIF2C 1 45,204,490 45,234,438

ZFP69B 1 40,915,337 40,930,390

UVSSA 4 1,340,104 1,382,837

ABCB5 7 20,654,245 20,797,637

ARHGAP39 8 145,753,563 145,839,888

BEAN1 16 66,460,200 66,517,745

C16orf96 16 4,605,491 4,651,318

PDIA2 16 331,615 338,209

Identified in Chapter 4

P4HTM 3 49,026,341 49,045,581

TET2 4 106,066,842 106,201,960

PLK2 5 57,748,810 57,756,966

SLC6A18 5 1,224,470 1,247,304

LAMA2 6 129,203,286 129,838,710

SDR16C6P,PENK 8 57,286,277 57,359,593

ABL1 9 133,588,268 133,764,062

ASPN 9 95,217,489 95,245,844

SLC22A20,POLA2 11 64,980,311 65,066,088

ADSSL1 14 105,195,228 105,214,647

PRMT5 14 23,388,733 23,399,661

FHOD3 18 33,876,702 34,361,018

GATAD2A 19 19,495,642 19,620,741

Genomic coordinates for each target region (± 1,000 bases) based on human genome 19 (hg19) were obtained from the University of California Santa Cruz (UCSC) Genome Browser (https://genome.ucsc.edu/).565


The additional 13 target regions listed in Table 5.1 were identified in Chapter 4 (ABL1, ADSSL1, ASPN, FHOD3, GATAD2A, LAMA2, P4HTM, PLK2, PRMT5, SLC6A18 and TET2, plus two target regions encompassing SDR16C6P and PENK, and SLC22A20 and POLA2, respectively). These target regions are genes in which candidate somatic risk variants were identified by a matched tumour and germline analysis in two myxoid liposarcoma patients. For intergenic variants, both flanking genes were included.

Genomic coordinates for each target region were obtained from the University of California Santa Cruz (UCSC) Genome Browser (https://genome.ucsc.edu/)565 using human genome build 19 (hg19), and each region was extended by 1,000 bases on either side.
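The ± 1,000-base padding can be illustrated with a short Python sketch. It is not the pipeline used in this chapter: the `Region` class, its `padded` and `contains` helpers, and the gene coordinates in the example are hypothetical, and positions are assumed to be 1-based and inclusive (the UCSC Genome Browser displays 1-based coordinates, whereas its downloadable tables use 0-based starts).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A genomic interval; positions are assumed to be 1-based and inclusive."""
    name: str
    chrom: str
    start: int
    end: int

    def padded(self, flank: int = 1_000) -> "Region":
        """Extend the interval by `flank` bases on each side, never below position 1."""
        return Region(self.name, self.chrom, max(1, self.start - flank), self.end + flank)

    def contains(self, chrom: str, pos: int) -> bool:
        """True if a variant at chrom:pos lies inside the interval."""
        return chrom == self.chrom and self.start <= pos <= self.end


# Hypothetical gene coordinates, for illustration only.
gene = Region("GENE_X", "16", 50_000, 60_000)
target = gene.padded()
print(target)                         # Region(name='GENE_X', chrom='16', start=49000, end=61000)
print(target.contains("16", 49_500))  # True: the site lies in the upstream flank
print(gene.contains("16", 49_500))    # False: outside the unpadded gene
```

Padding in this way lets variants in flanking sequence, such as promoter-proximal regulatory sites, fall inside the target region.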

Frequency summary files in variant call format (*.vcf) covering the target regions were received for both cases and controls, and were annotated using Annotate Variation (ANNOVAR, version 2015Jun16) and the Regulome database (RegulomeDB).245,257

5.2.4 Statistical analyses

Using the ANNOVAR annotation, the numbers of nonsynonymous deleterious alleles and of normal alleles in each target region were summed separately in cases and controls. A variant was classed as deleterious only if it was predicted to be deleterious by both Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping-2 (PolyPhen-2);266,267 requiring agreement between the two tools is more conservative than accepting either prediction alone.566 Likewise, the numbers of putative regulatory alleles (defined as those with a RegulomeDB score of 1a, 1b, 1c, 1d, 1e, 1f, 2a, 2b or 2c) and of normal alleles in each target region were summed in cases and controls. Table 5.2 shows the classification of RegulomeDB scores.
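The classification rules above can be expressed as a minimal Python sketch. The field names (`exonic_func`, `sift_pred`, `polyphen2_pred`, `regulome_score`, `alt_count`, `allele_number`), the use of `"D"` as the deleterious call for both tools, and the convention that normal alleles are the non-variant alleles at each qualifying site are assumptions made for illustration; they are not taken from the ANNOVAR or RegulomeDB output used in this chapter.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple

# RegulomeDB scores counted as "putative regulatory" (categories 1a to 2c in Table 5.2).
REGULATORY_SCORES = {"1a", "1b", "1c", "1d", "1e", "1f", "2a", "2b", "2c"}


@dataclass
class Variant:
    """One annotated site from a frequency summary file (hypothetical field names)."""
    exonic_func: Optional[str]     # e.g. "nonsynonymous SNV"
    sift_pred: Optional[str]       # assumed "D" = deleterious
    polyphen2_pred: Optional[str]  # assumed "D" = damaging
    regulome_score: Optional[str]  # e.g. "1f", "3a"
    alt_count: int                 # variant alleles observed in the cohort
    allele_number: int             # total alleles genotyped at the site


def is_deleterious(v: Variant) -> bool:
    """Nonsynonymous AND called deleterious by both SIFT and PolyPhen-2 (conservative rule)."""
    return v.exonic_func == "nonsynonymous SNV" and v.sift_pred == "D" and v.polyphen2_pred == "D"


def is_putative_regulatory(v: Variant) -> bool:
    """RegulomeDB score in categories 1a to 2c."""
    return v.regulome_score in REGULATORY_SCORES


def allele_totals(variants: Iterable[Variant], keep: Callable[[Variant], bool]) -> Tuple[int, int]:
    """Sum variant and normal (non-variant) alleles over the sites that pass `keep`."""
    selected = [v for v in variants if keep(v)]
    variant_alleles = sum(v.alt_count for v in selected)
    normal_alleles = sum(v.allele_number - v.alt_count for v in selected)
    return variant_alleles, normal_alleles


sites = [
    Variant("nonsynonymous SNV", "D", "D", None, 3, 1000),  # deleterious in both tools
    Variant("nonsynonymous SNV", "D", "B", None, 5, 1000),  # SIFT only: excluded
    Variant(None, None, None, "2b", 7, 1000),               # putative regulatory
]
print(allele_totals(sites, is_deleterious))          # (3, 997)
print(allele_totals(sites, is_putative_regulatory))  # (7, 993)
```

The second site shows the effect of the conservative rule: a SIFT-only deleterious call does not contribute to the burden.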

Odds ratios (ORs) and p-values for the variant burden analysis were obtained from one-sided Fisher's exact tests performed in R281, comparing the total burden of deleterious variants and, separately, of putative regulatory variants between cases and controls, a method previously used by Ballinger et al. (2016).178 A Bonferroni adjustment was applied to correct for multiple testing.292
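Each test compares a 2 x 2 table of variant and normal allele counts in cases and controls. The Python sketch below (standard library only) computes a one-sided Fisher's exact p-value in the direction of excess variant alleles in cases, a simple cross-product odds ratio and a Bonferroni threshold. It is an illustration, not the R code used in this chapter: R's `fisher.test` reports a conditional maximum-likelihood estimate of the odds ratio rather than the cross-product ratio, so the two estimates can differ slightly, and the function names and allele counts are hypothetical.

```python
import math


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for excess variant alleles in cases.

    Table layout:      variant  normal
              cases       a        b
              controls    c        d
    Returns P(X >= a), where X is hypergeometric with the observed margins.
    """
    n_case = a + b     # total alleles in cases
    n_variant = a + c  # total variant alleles
    n = a + b + c + d  # total alleles
    x_max = min(n_case, n_variant)
    log_total = log_comb(n, n_case)
    p = sum(
        math.exp(log_comb(n_variant, x) + log_comb(n - n_variant, n_case - x) - log_total)
        for x in range(a, x_max + 1)
    )
    return min(p, 1.0)  # guard against rounding slightly above 1


def cross_product_or(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); infinite or undefined when a cell is zero."""
    if b * c == 0:
        return math.inf if a * d > 0 else math.nan
    return (a * d) / (b * c)


def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical allele counts for one target region.
a, b, c, d = 50, 950, 10, 990
threshold = bonferroni_alpha(0.05, 21)
p = fisher_greater(a, b, c, d)
print(f"OR = {cross_product_or(a, b, c, d):.2f}, p = {p:.2e}, significant: {p < threshold}")
```

With 21 target regions, `bonferroni_alpha(0.05, 21)` gives a per-test threshold of about 2.38 x 10−3.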


Table 5.2: Classification of Regulome database scores

Score   Supporting data
1a      eQTL + TF binding + matched TF motif + matched DNase footprint + DNase peak
1b      eQTL + TF binding + any motif + DNase footprint + DNase peak
1c      eQTL + TF binding + matched TF motif + DNase peak
1d      eQTL + TF binding + any motif + DNase peak
1e      eQTL + TF binding + matched TF motif
1f      eQTL + TF binding / DNase peak
2a      TF binding + matched TF motif + matched DNase footprint + DNase peak
2b      TF binding + any motif + DNase footprint + DNase peak
2c      TF binding + matched TF motif + DNase peak
3a      TF binding + any motif + DNase peak
3b      TF binding + matched TF motif
4       TF binding + DNase peak
5       TF binding or DNase peak
6       Other

eQTL: expression quantitative trait locus. TF: transcription factor. DNase: deoxyribonuclease.


5.3 Results

5.3.1 Identification of nonsynonymous deleterious variants in the target regions

The results of annotating the frequency summary files with ANNOVAR and RegulomeDB are summarised in Table 5.3. On average, 1,128 variants were identified per target region in the ISKS cohort and 2,282 in the MGRB cohort. Each target region had an average of four nonsynonymous deleterious variants in the ISKS cohort and five in the MGRB cohort. The ISKS cohort had an average of 12 putative regulatory variants per target region, compared with 11 in the MGRB cohort.

Table 5.3: Annotated summary of nonsynonymous deleterious variants and putative regulatory variants in the target regions

                   Total variants        Deleterious variants  Regulatory variants
Target region      ISKS       MGRB       ISKS       MGRB       ISKS       MGRB
KIF2C              345        396        0          1          2          2
ZFP69B             197        224        2          4          0          0
P4HTM              115        163        2          2          7          7
TET2               328        386        5          9          6          5
UVSSA              667        841        8          6          23         22
PLK2               89         106        0          1          2          2
SLC6A18            314        425        5          8          4          4
LAMA2              6,431      7,864      7          14         18         16
ABCB5              2,117      22,112     17         16         10         8
ARHGAP39           1,085      1,313      1          3          4          5
SDR16C6P, PENK     842        1,100      2          1          3          2
ABL1               2,173      2,563      3          2          24         26
ASPN               23         31         1          2          0          0
SLC22A20, POLA2    811        954        1          2          22         19
ADSSL1             249        302        5          6          13         13
PRMT5              51         57         1          1          0          0
BEAN1              506        582        1          0          7          7
C16orf96           552        737        4          9          8          7
PDIA2              128        124        7          3          1          1
FHOD3              5,215      5,892      9          9          62         49
GATAD2A            1,456      1,753      1          2          32         31

ISKS: International Sarcoma Kindred Study. MGRB: Medical Genome Reference Bank. Deleterious variants: nonsynonymous variants predicted to be deleterious by both Sorting Intolerant from Tolerant (SIFT) and Polymorphism Phenotyping-2 (PolyPhen-2). Regulatory variants: variants with a Regulome database score < 3 (categories 1a to 2c). Counts are the numbers of deleterious or regulatory variants within each target region.
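The per-region averages quoted at the start of Section 5.3.1 can be checked directly against Table 5.3. The Python sketch below transcribes the table and computes the mean of each column over the 21 target regions; the dictionary and column names are illustrative.

```python
# Counts transcribed from Table 5.3, one tuple per target region:
# (total ISKS, total MGRB, deleterious ISKS, deleterious MGRB, regulatory ISKS, regulatory MGRB)
TABLE_5_3 = {
    "KIF2C": (345, 396, 0, 1, 2, 2),
    "ZFP69B": (197, 224, 2, 4, 0, 0),
    "P4HTM": (115, 163, 2, 2, 7, 7),
    "TET2": (328, 386, 5, 9, 6, 5),
    "UVSSA": (667, 841, 8, 6, 23, 22),
    "PLK2": (89, 106, 0, 1, 2, 2),
    "SLC6A18": (314, 425, 5, 8, 4, 4),
    "LAMA2": (6431, 7864, 7, 14, 18, 16),
    "ABCB5": (2117, 22112, 17, 16, 10, 8),
    "ARHGAP39": (1085, 1313, 1, 3, 4, 5),
    "SDR16C6P, PENK": (842, 1100, 2, 1, 3, 2),
    "ABL1": (2173, 2563, 3, 2, 24, 26),
    "ASPN": (23, 31, 1, 2, 0, 0),
    "SLC22A20, POLA2": (811, 954, 1, 2, 22, 19),
    "ADSSL1": (249, 302, 5, 6, 13, 13),
    "PRMT5": (51, 57, 1, 1, 0, 0),
    "BEAN1": (506, 582, 1, 0, 7, 7),
    "C16orf96": (552, 737, 4, 9, 8, 7),
    "PDIA2": (128, 124, 7, 3, 1, 1),
    "FHOD3": (5215, 5892, 9, 9, 62, 49),
    "GATAD2A": (1456, 1753, 1, 2, 32, 31),
}
COLUMNS = ("total_isks", "total_mgrb", "del_isks", "del_mgrb", "reg_isks", "reg_mgrb")


def column_means(table):
    """Mean of each column across all target regions."""
    n = len(table)
    return {col: sum(row[i] for row in table.values()) / n for i, col in enumerate(COLUMNS)}


means = column_means(TABLE_5_3)
for col, value in means.items():
    print(f"{col}: {value:.1f}")
```

The computed means are about 1,128 and 2,282 total variants, 3.9 and 4.8 nonsynonymous deleterious variants, and 11.8 and 10.8 putative regulatory variants per target region in the ISKS and MGRB cohorts, respectively.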

5.3.2 Statistical analyses

5.3.2.1 Nonsynonymous deleterious variants

Table 5.4 shows the results of Fisher's exact tests comparing the numbers of nonsynonymous deleterious alleles and normal alleles in each target region between cases and controls. After Bonferroni correction for the 21 target regions tested, the significance threshold was α = 2.38 x 10−3 (0.05/21). A table listing every nonsynonymous deleterious variant included in the variant burden test for each target region is provided in Appendix K.


Table 5.4: Odds ratios, p-values and 95% confidence intervals from Fisher's exact test of nonsynonymous deleterious variants in each target region

Target region      Chr.  Identified as  Odds ratio  p-value          95% CI
KIF2C              1     Germline       0           1                .
ZFP69B             1     Germline       0.51        0.55             0.06-2.17
P4HTM              3     Somatic        3.08        0.34             0.35-36.89
UVSSA              4     Germline       1.29        0.45             0.69-2.41
TET2               4     Somatic        2.24        2.29 x 10−3      1.31-3.78
PLK2               5     Somatic        0           1                .
SLC6A18            5     Somatic        2.12        7.271 x 10−7     1.57-2.85
LAMA2              6     Somatic        0.88        0.58             0.59-1.30
ABCB5              7     Germline       0.99        0.82             0.89-1.09
ARHGAP39           8     Germline       2.07        0.45             0.04-25.76
SDR16C6P, PENK     8     Somatic        2.04        0.62             0.11-120.33
ABL1               9     Somatic        1.36        0.70             0.18-10.19
ASPN               9     Somatic        2.05        0.03             0.99-4.04
SLC22A20, POLA2    11    Somatic        2.04        0.48             0.03-39.19
ADSSL1             14    Somatic        1.36        0.56             0.36-4.52
PRMT5              14    Somatic        2.04        0.55             0.03-160.01
BEAN1              16    Germline       0           1                .
C16orf96           16    Germline       2.78        9.95 x 10−5      1.64-4.64
PDIA2              16    Germline       0.42        < 2.2 x 10−16    0.35-0.51
FHOD3              18    Somatic        1.11        0.19             0.94-1.30
GATAD2A            19    Somatic        2.06        0.48             0.03-39.55

Chr: chromosome. CI: confidence interval. ISKS: International Sarcoma Kindred Study. MGRB: Medical Genome Reference Bank. Odds ratios, p-values and 95% CIs were obtained from Fisher's exact tests performed in R; p-values below 2.2 x 10−16 are reported by R as < 2.2 x 10−16.


Four target regions reached statistical significance after correction for multiple testing (C16orf96, PDIA2, SLC6A18 and TET2). Of these, C16orf96 and PDIA2 were originally identified through germline variants in the three cancer cluster families, whereas SLC6A18 and TET2 were identified through somatic variants in a matched tumour-germline analysis of two myxoid liposarcoma cases.

The odds ratios in Table 5.4 indicate a higher burden of nonsynonymous deleterious alleles in sarcoma cases than in controls for C16orf96, SLC6A18 and TET2. In contrast, the odds ratio for PDIA2 (0.42) indicates a higher burden of variant alleles in controls than in sarcoma cases.

5.3.2.2 Putative regulatory variants

Table 5.5 shows the results of Fisher's exact tests comparing the numbers of putative regulatory alleles and normal alleles in each target region between cases and controls. Three target regions (ZFP69B, ASPN and PRMT5) contained no putative regulatory variants and could not be tested, so the Bonferroni-corrected significance threshold was α = 2.78 x 10−3 (0.05/18). A table listing every putative regulatory variant included in the variant burden test for each target region is provided in Appendix M.


Table 5.5: Odds ratios, p-values and 95% confidence intervals from Fisher's exact test of putative regulatory variants in each target region

Target region      Chr.  Identified as  Odds ratio  p-value          95% CI
KIF2C              1     Germline       1           0.98             0.89-1.13
ZFP69B             1     Germline       .           .                .
P4HTM              3     Somatic        1           0.85             0.95-1.04
UVSSA              4     Germline       0.86        < 2.2 x 10−16    0.83-0.88
TET2               4     Somatic        0.78        5.27 x 10−4      0.68-0.90
PLK2               5     Somatic        0.89        0.19             0.75-1.06
SLC6A18            5     Somatic        1.14        0.21             0.93-1.39
LAMA2              6     Somatic        0.89        1.11 x 10−6      0.85-0.93
ABCB5              7     Germline       0.72        < 2.2 x 10−16    0.68-0.75
ARHGAP39           8     Germline       1.29        4.91 x 10−6      1.16-1.45
SDR16C6P, PENK     8     Somatic        2.04        0.34             0.48-9.84
ABL1               9     Somatic        1.18        5.109 x 10−12    1.12-1.23
ASPN               9     Somatic        .           .                .
SLC22A20, POLA2    11    Somatic        1.27        < 2.2 x 10−16    1.22-1.31
ADSSL1             14    Somatic        1.01        0.64             0.97-1.05
PRMT5              14    Somatic        .           .                .
BEAN1              16    Germline       1           0.99             0.93-1.08
C16orf96           16    Germline       0.84        4.38 x 10−10     0.79-0.89
PDIA2              16    Germline       0.85        0.85             0.36-1.86
FHOD3              18    Somatic        0.7         < 2.2 x 10−16    0.68-0.72
GATAD2A            19    Somatic        0.97        0.08             0.94-1.00

Chr: chromosome. CI: confidence interval. ISKS: International Sarcoma Kindred Study. MGRB: Medical Genome Reference Bank. Odds ratios, p-values and 95% CIs were obtained from Fisher's exact tests performed in R; p-values below 2.2 x 10−16 are reported by R as < 2.2 x 10−16.


Nine target regions reached statistical significance after correction for multiple testing (ABCB5, ARHGAP39, C16orf96, UVSSA, ABL1, FHOD3, LAMA2, TET2 and the region encompassing SLC22A20 and POLA2). Of these, ABCB5, ARHGAP39, C16orf96 and UVSSA were originally identified through germline variants in the three cancer cluster families, whereas ABL1, FHOD3, LAMA2, TET2 and the region encompassing SLC22A20 and POLA2 were identified through somatic variants in a matched tumour-germline analysis of two myxoid liposarcoma cases.

The odds ratios indicate a higher burden of putative regulatory variants in sarcoma cases than in controls for ARHGAP39, ABL1 and the region encompassing SLC22A20 and POLA2. In contrast, the odds ratios for ABCB5, C16orf96, UVSSA, FHOD3, LAMA2 and TET2 indicate a higher burden of variant alleles in controls than in sarcoma cases.

5.4 Discussion

In total, six target regions of interest (C16orf96, SLC6A18, TET2, ARHGAP39, ABL1 and a region encompassing SLC22A20 and POLA2) had a higher burden of nonsynonymous deleterious variants or putative regulatory variants in 561 sarcoma cases than in 1,144 healthy ageing controls.

5.4.1 Novel findings

This is the first study to report associations between sarcoma and the C16orf96, SLC6A18, ARHGAP39, POLA2 and SLC22A20 genes. None of these genes was reported by Ballinger et al. or Brohl et al. in their variant burden analyses of sarcoma cohorts.178,559

C16orf96 (chromosome 16 open reading frame 96) is an uncharacterised protein-coding gene. Its function is currently unknown, and its expression is generally low. In situ hybridisation experiments have detected low levels of C16orf96 RNA in testis and skin only, with no expression in other tissue types.567 No role for this gene in cancer pathogenesis has yet been established.


The SLC6A18 gene is a member of the SLC6 transporter family. SLC6A18 is involved in the transport of glucose and other sugars, bile salts and organic acids, metal ions and amine compounds. A previous study reported a gain of the 5p15.33 region, which contains SLC6A18, in small cell lung cancers,463 and copy number variations in SLC6A18 have also been reported in lung adenocarcinoma.568

The protein encoded by ARHGAP39 is a binding partner of CNK2, a scaffold protein that spatially modulates Rac cycling during dendritic spine morphogenesis, and ARHGAP39 has also been linked to signalling by G protein-coupled receptors (GPCRs).297 At present, there is no evidence supporting a role for ARHGAP39 in cancer pathogenesis.

SLC22A20 is a member of the solute carrier family with inorganic anion exchanger activity. SLC22A20 is differentially methylated in hepatocellular carcinoma and may be useful as a biomarker for early detection.321,442,443

The POLA2 gene has been reported to be involved in cell proliferation by mediating DNA replication, recombination and repair.444 A variant in POLA2 has been associated with differences in survival and mortality in patients with non-small cell lung cancer and could be used as a prognostic biomarker.445,448 Low POLA2 mRNA expression was found to predict poor outcome in ovarian carcinomas,446 and POLA2 was found to be overexpressed in mesothelioma.449

The roles of C16orf96, SLC6A18, ARHGAP39, SLC22A20 and POLA2 in sarcoma pathogenesis remain to be elucidated. The results of this study support prioritising these genes for further research in sarcomas.

5.4.2 Known cancer genes

Both TET2 and ABL1 are known cancer genes listed in the Catalogue of Somatic Mutations in Cancer (COSMIC) Cancer Gene Census.134 TET2 is reported to be frequently mutated or inactivated in cancer, and TET2 mutations are commonly observed in haematological malignancies, including myeloid and lymphoid neoplasms.425–430

TET2 has previously been associated with sarcomas: loss of TET2 is characteristic of myeloid sarcomas and may be useful as a novel marker.569,570


ABL1 is a proto-oncogene that encodes a protein tyrosine kinase involved in a variety of cellular processes, including cell division, adhesion, differentiation and the response to stress.571 ABL1 is fused to a variety of translocation partner genes in various leukaemias, for example BCR-ABL1 in chronic myelogenous leukaemia.572 ABL kinases may also play a role in solid tumours, including breast, colon, lung and kidney carcinomas, and melanoma.573–582

ABL1 variants have previously been reported in sarcomas. In two patients with chronic myeloid leukaemia and a secondary sarcoma (a histiocytic sarcoma and a segregated extramedullary (nodal) myeloid sarcoma), the sarcoma tumours were positive for the t(9;22) BCR/ABL1 translocation, suggesting that the leukaemia and sarcoma lineages may be clonally related.583,584 However, there is no evidence of ABL1 variants in sarcoma cases without chronic myeloid leukaemia.

5.4.3 Clinical implications

Three of the regions of interest identified in this study may have implications for the treatment of sarcomas. TET2 is listed in the Genomics of Drug Sensitivity in Cancer database and shows a statistically significant association (p < 10−3) with response to VNLG/124 and bexarotene.391 Some of the ISKS cases sequenced in this study may be myeloid sarcomas that harbour TET2 mutations and could respond to VNLG/124 or bexarotene; however, other sarcoma subtypes may also harbour TET2 mutations. The role of TET2 in sarcoma subtypes other than myeloid sarcoma, and the use of VNLG/124 and bexarotene to treat sarcomas with TET2 variants, should be investigated further.

ABL1 is associated with trabectedin sensitivity in myxoid liposarcomas.489 There may therefore be an opportunity to treat other sarcoma subtypes that harbour ABL1 variants with trabectedin. An expanded access program tested trabectedin in patients with incurable soft tissue sarcoma whose disease had progressed on standard therapy.585 The study demonstrated disease control, despite a low rate of objective responses, in patients with advanced soft tissue sarcoma after failure of standard chemotherapy, and found a greater clinical benefit rate and longer median overall survival in patients with leiomyosarcoma and liposarcoma than in patients with other histopathological subtypes of sarcoma.585 A second study evaluating trabectedin in patients with soft tissue sarcoma also suggested a benefit in patients with leiomyosarcoma or liposarcoma whose disease had failed standard-of-care agents.586

The SLC22A20 gene may also have clinical utility as an uptake carrier of sorafenib, a multikinase inhibitor.442 Sorafenib has shown activity in metastatic soft tissue sarcoma, particularly leiomyosarcoma.587,588

5.4.4 Strengths and limitations

Classic single-marker association analyses of rare variants are underpowered unless the sample size is extremely large or the variants have a large effect size.558,589 Consequently, burden tests have been developed that assess the joint effect of rare genetic variants on a complex trait within the same functional unit or genomic region. Burden tests assume that all variants in a region are causal and are associated with the trait in the same direction and with the same magnitude of effect.590 Violation of these assumptions can reduce the power of the test.591–593

Among the variants identified in a genomic region by WES and WGS, as in this study, some will have little or no effect on the phenotype, some may be protective and some may be deleterious. The magnitude of effect may also differ between variants; for example, rare variants may have larger effects than common variants. Other rare-variant association methods, such as the sequence kernel association test (SKAT), allow for violations of these assumptions.592

However, because only frequency summary files were available for each cohort, violations of these assumptions could not be addressed in this study.
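Why mixed effect directions weaken a burden test can be seen with a toy example. The Python sketch below uses made-up allele counts for two hypothetical variants in one region, one enriched in cases and one enriched in controls; collapsing them, as a burden test does before testing, cancels the two signals.

```python
from typing import Tuple

Counts = Tuple[int, int, int, int]  # (case variant, case normal, control variant, control normal)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Cross-product odds ratio for a 2 x 2 allele-count table."""
    return (a * d) / (b * c)


def collapse(*variants: Counts) -> Counts:
    """Sum allele counts over variants, as a burden test does before testing."""
    return tuple(sum(v[i] for v in variants) for i in range(4))


# Made-up counts: 1,000 alleles per cohort at each site.
risk = (30, 970, 10, 990)        # enriched in cases
protective = (10, 990, 30, 970)  # enriched in controls

print(round(odds_ratio(*risk), 2))              # 3.06
print(round(odds_ratio(*protective), 2))        # 0.33
print(odds_ratio(*collapse(risk, protective)))  # 1.0
```

Each variant on its own shows a clear association, yet the collapsed burden shows none, which is the loss of power that variance-component methods such as SKAT are designed to avoid.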


Several regions of interest also had a higher mutational rate in controls than in cases. PDIA2 had a higher rate of nonsynonymous deleterious variants in controls than in cases, and ABCB5, C16orf96, UVSSA, FHOD3, LAMA2 and TET2 had higher rates of putative regulatory variants in controls than in cases. This may reflect the presence of minor alleles that are common in the general population (see Appendix K for the minor allele frequencies (MAF) of each variant) or of variants that are phenotypically neutral.

Two of the regions of interest (TET2 and C16orf96) had a higher mutational rate of nonsynonymous deleterious variants in cases than in controls, but a higher mutational rate of putative regulatory variants in controls. This may also reflect common alleles classified as putative regulatory variants (see Appendix M for the MAF of each variant). For example, two putative regulatory variants in C16orf96 have population allele frequencies of 0.61 and 1.00 and may therefore be phenotypically neutral, whereas the nonsynonymous deleterious variants in C16orf96 had MAF < 2%. Likewise, the TET2 nonsynonymous deleterious variants had MAF < 2%, whereas one putative regulatory variant had a MAF of 0.21.

Because of these higher mutational rates in controls, and the contradictory findings for nonsynonymous deleterious and putative regulatory variants in C16orf96 and TET2, further studies are required to confirm these gene-level associations.
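Allele frequencies of 0.61 and 1.00, as quoted for two C16orf96 regulatory variants, describe the variant allele rather than the minor allele, because a minor allele frequency cannot exceed 0.5 by definition. The short Python sketch below makes this distinction explicit. The helper names and the 2% cut-off (borrowed from the MAF < 2% reported for the deleterious variants) are illustrative, and filtering on the variant allele frequency is a hypothetical way of separating common from rare variants, not a method used in this chapter. A filter on the strict MAF would treat a variant allele at frequency 1.00 as rare (MAF = 0), even though almost every allele in the population carries it.

```python
def minor_allele_frequency(variant_af: float) -> float:
    """Frequency of the less common allele at a biallelic site (never above 0.5)."""
    return min(variant_af, 1.0 - variant_af)


def is_rare_variant_allele(variant_af: float, cutoff: float = 0.02) -> bool:
    """Illustrative rule: the variant allele itself must be uncommon."""
    return variant_af < cutoff


for af in (0.61, 1.00, 0.21, 0.015):
    print(f"variant AF {af:.3f}: MAF {minor_allele_frequency(af):.3f}, "
          f"rare variant allele: {is_rare_variant_allele(af)}")
```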

5.4.5 Conclusion

In conclusion, six target regions identified by WES in cancer cluster families and by matched tumour and germline analysis of two myxoid liposarcomas have been validated in a large independent case-control cohort. C16orf96, SLC6A18 and TET2 had a higher mutational burden of nonsynonymous deleterious variants in sarcoma cases than in healthy ageing controls, and ARHGAP39, ABL1 and a region encompassing SLC22A20 and POLA2 had a higher mutational burden of putative regulatory variants in cases. This study reports five novel associations between sarcoma and C16orf96, SLC6A18, ARHGAP39, POLA2 and SLC22A20. Two of the six target regions, TET2 and ABL1, are known cancer genes with potential clinical utility, as both have been found to contribute to drug sensitivity in cancers. This study has identified novel risk genes that appear to have a higher mutational burden in sarcoma cases than in healthy ageing controls and that should be prioritised for further research.


Chapter 6

Conclusion

6.1 Summary of results

Whole exome sequencing (WES) was performed on three mixed cancer cluster families, each identified through a sarcoma proband from the International Sarcoma Kindred Study (ISKS). The selected cancer cluster families could not be explained by known cancer predisposition syndromes and therefore represented an opportunity to identify novel risk variants associated with both sarcoma and broader cancer risk.

The WES data were annotated, filtered and prioritised using three different strategies to identify rare private variants, known rare variants and candidate gene variants. The prioritised variants were then tested for association with cancer phenotypes using Sequential Oligogenic Linkage Analysis Routines (SOLAR). Nominally significant variants were assessed for familial segregation in each cancer cluster family. Eight novel putative germline risk variants were found to segregate with cancer in the families. Each variant was private to a single family and segregated with mixed cancer types. These findings suggest the presence of inherited mutations that may increase cancer risk within these families.


Matched tumour and germline analyses were performed on two myxoid liposarcoma cases from the cancer cluster families. VarScan2 and Strelka were used to identify 13 novel statistically significant somatic mutations. One of the tumours also showed a large region of loss of heterozygosity and somatic copy number alterations on chromosome 16, encompassing the RBL2 and FUS genes, which may contribute to tumour pathogenesis.

Target regions in which germline and somatic mutations were identified in the cancer cluster families were validated using variant burden analyses in 561 sarcoma cases and 1,144 healthy ageing controls. Six target regions showed an increased mutational burden in sarcoma cases compared with controls, either of nonsynonymous deleterious variants (C16orf96, SLC6A18 and TET2) or of putative regulatory variants (ARHGAP39, ABL1 and a region encompassing SLC22A20 and POLA2).

6.2 Clinical utility of findings

Two of the target regions found to have a higher mutational burden in sarcoma cases (TET2 and ABL1) are known cancer genes with potential clinical utility in the treatment of sarcomas, as both have been found to contribute to drug sensitivity in cancers. The SLC22A20 gene also offers potential clinical utility as an uptake carrier of sorafenib.

TET2 and ABL1 have been reported to be associated with myeloid sarcomas and with secondary sarcomas in patients with chronic myeloid leukaemia, respectively. However, there is no evidence of an association with other sarcoma subtypes. The remaining genes identified in this study represent novel candidate risk genes for sarcoma. The POLA2 gene has been reported to be involved in cell proliferation by mediating DNA replication, recombination and repair.444 The roles of the remaining genes of interest (C16orf96, ARHGAP39 and SLC6A18) in cancer pathogenesis remain to be elucidated.

As previously observed by Ballinger et al., and consistent with the findings of the current study, there is a burden of clinically relevant genetic variation in sarcoma patients and their families.178 The results of this study will be returned to the ISKS coordinators and submitted to a central database containing molecular and biological information collected over time on the ISKS families and specimens. Cataloguing these genetic variants is critical, as future studies of these candidates may further our understanding of the aetiology of sarcoma or lead to new therapies that target them.

6.3 Review of methodology

The current study was the first to perform WES in mixed cancer cluster families identified through a sarcoma proband. It is an example of a successful two-phase next generation sequencing (NGS) family study approach: the application of WES to cancer cluster families with rare cancers, followed by larger replication in independent population cohorts. The results show the utility of this approach for identifying novel risk genes for a rare disease, such as sarcoma, in small cancer cluster families.

The current study was limited by the size of the initial study sample (19 people in three families), by the assumptions used for variant filtering, prioritisation and segregation analysis, and by the availability of tumour DNA. The validation using variant burden analysis was also limited by the inability to account for risk, neutral and protective alleles. The study design and analyses were underpinned by the current state of bioinformatic tools, databases and knowledge of cancer biology. The WES data generated in this study may be re-analysed as new tools are developed, and the results may become clinically relevant as knowledge in this field progresses.

The validation of findings from WES (both germline variants in families and the tumour-germline comparison in myxoid liposarcomas) does not provide conclusive evidence that these genes are involved in sarcoma pathogenesis. Rather, the results of this study should be seen as hypothesis-generating, identifying novel candidate risk genes that should be prioritised for future research.


6.4 Recommendations for future work

The current study has identified novel candidate risk genes for sarcoma by performing WES in a small number of cancer cluster families. Elucidating the role of these genes in sarcoma pathogenesis was beyond the scope of this thesis. These genes are, however, candidates for further association testing in other sarcoma and cancer cohorts and for functional validation studies, such as molecular assays of expression or interactions, or biological assays in animal models.

The two-phase NGS family study approach is gaining momentum in the genomics literature as researchers return to family-based study designs to identify rare genetic variants. The current study adds to the growing evidence that this approach can successfully identify novel risk genes for a rare complex disease such as sarcoma, and it may be extended to identify novel risk genes for other complex diseases.


Bibliography

1 Gerard I. Evan and Karen H. Vousden. Proliferation, cell cycle and apoptosis in cancer. Nature, 411(6835):342–348, 2001.

2 SEER Training Modules. Cancer classification. Technical report, U.S. National Institutes of Health, National Cancer Institute, 2016.

3 Fred Bunz. Principles of cancer genetics. Springer, Netherlands, 1st edition, 2008.

4 Geoffrey M. Cooper and Robert E. Hausman. The development and causes of cancer. In The Cell: A Molecular Approach, pages 725–766. Sinauer Associates, Sunderland, 2nd edition, 2000.

5 Jacques Ferlay, Isabelle Soerjomataram, Rajesh Dikshit, Sultan Eser, Colin Mathers, and Marise Rebelo et al. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. International Journal of Cancer, 136(5):E359–E386, 2015.

6 World Health Organization. Global health observatory: the data repository; URL: http://www.who.int/gho/en/, 2016.

7 World Health Organization. Health in 2015: from MDGs to SDGs. World Health Organization, Geneva, 2015.

8 Rijo John and Hana Ross. The global economic cost of cancer. Technical report, American Cancer Society, 2010.

9 Bert Vogelstein and Kenneth W. Kinzler. Cancer genes and the pathways they control. Nature Medicine, 10(8):789–799, 2004.


10 Douglas Hanahan and Robert A. Weinberg. The hallmarks of cancer. Cell, 100(1):57–70, 2000.

11 Keith R. Loeb and Lawrence A. Loeb. Significance of multiple mutations in cancer. Carcinogenesis, 21(3):379–385, 2000.

12 Roshan Karki, Deep Pandya, Robert C. Elston, and Cristiano Ferlini. Defining mutation and polymorphism in the era of personal genomics. BMC Medical Genomics, 8(1):1, 2015.

13 Michael R. Stratton, Peter J. Campbell, and P. Andrew Futreal. The cancer genome. Nature, 458(7239):719–724, 2009.

14 Simon J. Talbot and Dorothy H. Crawford. Viruses and tumours - an update. European Journal of Cancer, 40(13):1998–2005, 2004.

15 Peter A. Jones and Stephen B. Baylin. The fundamental role of epigenetic events in cancer. Nature Reviews Genetics, 3(6):415–428, 2002.

16 Bert Vogelstein, Nickolas Papadopoulos, Victor E. Velculescu, Shibin Zhou, Luis A. Diaz, and Kenneth W. Kinzler. Cancer genome landscapes. Science, 339(6127):1546–1558, 2013.

17 Christopher Greenman, Philip Stephens, Raffaella Smith, Gillian L. Dalgliesh, Christopher Hunter, and Graham Bignell et al. Patterns of somatic mutation in human cancer genomes. Nature, 446(7132):153–158, 2007.

18 Daniel G. Miller. On the nature of susceptibility to cancer. The presidential address. Cancer, 46(6):1307–1318, 1980.

19 Anna C. Schinzel and William C. Hahn. Oncogenic transformation and experimental models of human cancer. Frontiers in Bioscience, 13:71–84, 2007.

20 Niko Beerenwinkel, Tibor Antal, David Dingli, Arne Traulsen, Kenneth W. Kinzler, and Victor E. Velculescu et al. Genetic progression and the waiting time to cancer. PLOS Computational Biology, 3(11):e225, 2007.


21 Pawan Upadhyay, Renu Dwivedi, and Amit Dutt. Applications ofnext-generation sequencing in cancer. Current Science, 107(5):795, 2014.

22 International Agency for Research on Cancer. World cancer report 2014.Technical report, World Health Organisation, 2014.

23Australian Institute of Health and Welfare & Australasian Associationof Cancer Registries. Cancer in Australia: an overview, 2012. Technicalreport, AIHW, 2012.

24 Julian Peto. Cancer epidemiology in the last century and the next decade. Nature, 411(6835):390–395, 2001.

25 Tracey DiSipio, Carla Rogers, Beth Newman, David Whiteman, Elizabeth Eakin, Lin Fritschi, and Joanne Aitken. The Queensland cancer risk study: behavioural risk factor results. Australian and New Zealand Journal of Public Health, 30(4):375–382, 2006.

26 Elizabeth B. Claus, Joellen M. Schildkraut, Douglas W. Thompson, and Neil J. Risch. The genetic attributable risk of breast and ovarian cancer. Cancer, 77(11):2318–2324, 1996.

27 Lauri A. Aaltonen, Reijo Salovaara, Paula Kristo, Federico Canzian, Akseli Hemminki, and Paivi Peltomaki et al. Incidence of hereditary nonpolyposis colorectal cancer and the feasibility of molecular screening for the disease. New England Journal of Medicine, 338(21):1481–1487, 1998.

28 Agnes Chompret, Laurence Brugieres, Muriel Ronsin, Maryvonne Gardes, Francoise Dessarps-Freichey, and Anne Abel et al. P53 germline mutations in childhood cancers and cancer risk for carrier individuals. British Journal of Cancer, 82(12):1932, 2000.

29 Carlo La Vecchia, Eva Negri, Antonella Gentile, and Silvia Franceschi. Family history and the risk of stomach and colorectal cancer. Cancer, 70(1):50–55, 1992.

30 Gianni Zanghieri, Carmela Di Gregorio, Carla Sacchetti, Rossella Fante, Romano Sassatelli, and Giacomo Cannizzo et al. Familial occurrence of gastric cancer in the 2-year experience of a population-based registry. Cancer, 66(9):2047–2051, 1990.

31 Shirley Hodgson. Mechanisms of inherited cancer susceptibility. Journal of Zhejiang University. Science. B, 9(1):1–4, 2008.

32 Knut Borch-Johnsen, Jorgen H. Olsen, and Thorkild I.A. Sorensen. Genes and family environment in familial clustering of cancer. Theoretical Medicine, 15(4):377–386, 1994.

33 Kari Hemminki, Jan Sundquist, and Justo L. Bermejo. How common is familial cancer? Annals of Oncology, 19(1):163–167, 2008.

34 David E. Goldgar, Douglas F. Easton, Lisa A. Cannon-Albright, and Mark H. Skolnick. Systematic population-based assessment of cancer risk in first-degree relatives of cancer probands. Journal of the National Cancer Institute, 86(21):1600–1608, 1994.

35 Frederick P. Li, Joseph F. Fraumeni, John J. Mulvihill, William A. Blattner, Margaret G. Dreyfus, Margaret A. Tucker, and Robert W. Miller. A cancer family syndrome in twenty-four kindreds. Cancer Research, 48(18):5358–5362, 1988.

36 Janice L. Berliner and Angela Musial Fay. Risk assessment and genetic counseling for hereditary breast and ovarian cancer: recommendations of the National Society of Genetic Counselors. Journal of Genetic Counseling, 16(3):241–260, 2007.

37 Kari Hemminki, Mahdi Fallah, and Akseli Hemminki. Collection and use of family history in oncology clinics. Journal of Clinical Oncology, 32(29):3344–3345, 2014.

38 Paul Lichtenstein, Niels V. Holm, Pia K. Verkasalo, Anastasia Iliadou, Jaakko Kaprio, and Markku Koskenvuo et al. Environmental and heritable factors in the causation of cancer - analyses of cohorts of twins from Sweden, Denmark, and Finland. New England Journal of Medicine, 343(2):78–85, 2000.

39 Frederick P. Li and Joseph F. Fraumeni. Prospective study of a family cancer syndrome. The Journal of the American Medical Association, 247(19):2692–2694, 1982.

40 Anthony Antoniou, Paul D.P. Pharoah, Steven Narod, Harvey A. Risch, Jorunn E. Eyfjord, and John L. Hopper et al. Average risks of breast and ovarian cancer associated with BRCA1 or BRCA2 mutations detected in case series unselected for family history: a combined analysis of 22 studies. The American Journal of Human Genetics, 72(5):1117–1130, 2003.

41 Harvey A. Risch, John R. McLaughlin, David E.C. Cole, Barry Rosen, Linda Bradley, and Elaine Kwan et al. Prevalence and penetrance of germline BRCA1 and BRCA2 mutations in a population series of 649 women with ovarian cancer. The American Journal of Human Genetics, 68(3):700–710, 2001.

42 Henry T. Lynch and Albert de la Chapelle. Hereditary colorectal cancer. New England Journal of Medicine, 348(10):919–932, 2003.

43 Alfred G. Knudson. Mutation and cancer: statistical study of retinoblastoma. Proceedings of the National Academy of Sciences, 68(4):820–823, 1971.

44 Abha Gupta and David Malkin. Sarcomas and cancer predisposition syndromes; URL: http://sarcomahelp.org/articles/sarcoma-predisposition-syndromes.html, 2008.

45 Judy E. Garber and Kenneth Offit. Hereditary cancer predisposition syndromes. Journal of Clinical Oncology, 23(2):276–292, 2005.

46 Csilla I. Szabo and Mary-Claire King. Inherited breast and ovarian cancer. Human Molecular Genetics, 4(suppl 1):1811–1817, 1995.

47 Mary-Claire King, Joan H. Marks, and Jessica B. Mandell. Breast and ovarian cancer risks due to inherited mutations in BRCA1 and BRCA2. Science, 302(5645):643–646, 2003.

48 Sining Chen, Edwin S. Iversen, Tara Friebel, Dianne Finkelstein, Barbara L. Weber, and Andrea Eisen et al. Characterization of BRCA1 and BRCA2 mutations in a large United States sample. Journal of Clinical Oncology, 24(6):863–871, 2006.

49 Eric R. Fearon. Human cancer syndromes: clues to the origin and nature of cancer. Science, 278(5340):1043, 1997.

50 Ichiro Satokata, Kiyoji Tanaka, Naoyuki Miura, Michiko Narita, Takashi Mimaki, and Yoshiaki Satoh et al. Three nonsense mutations responsible for group A xeroderma pigmentosum. Mutation Research/DNA Repair, 273(2):193–202, 1992.

51 David Malkin, Frederick P. Li, Louise C. Strong, Joseph F. Fraumeni, Camille E. Nelson, and David H. Kim et al. Germline p53 mutations in a familial syndrome of breast cancer, sarcomas, and other neoplasms. Science, 250(4985):1233–1238, 1990.

52 Frederick P. Li and Joseph F. Fraumeni Jr. Soft-tissue sarcomas, breast cancer, and other neoplasms: a familial syndrome? Annals of Internal Medicine, 71(4):747–752, 1969.

53 David Malkin, Kent W. Jolly, Noele Barbier, A. Thomas Look, Stephen H. Friend, and Mark C. Gebhardt et al. Germline mutations of the p53 tumor-suppressor gene in children and young adults with second malignant neoplasms. New England Journal of Medicine, 326(20):1309–1315, 1992.

54 Arnold J. Levine. P53, the cellular gatekeeper for growth and division. Cell, 88(3):323–331, 1997.

55 Amato J. Giaccia and Michael B. Kastan. The complexity of p53 modulation: emerging patterns from divergent signals. Genes & Development, 12(19):2973–2983, 1998.

56 Charles J. Sherr and Frank McCormick. The RB and p53 pathways in cancer. Cancer Cell, 2(2):103–112, 2002.

57 Fattaneh A. Tavassoli, Peter Devilee, and World Health Organization. Tumours of the breast and female genital organs - pathology and genetics. World Health Organization Classification of Tumours. Lyon, France: IARC Press, 2003.

58 Laufey T. Amundadottir, Sverrir Thorvaldsson, Daniel F. Gudbjartsson, Patrick Sulem, Kristleifur Kristjansson, and Sigurdur Arnason et al. Cancer as a complex phenotype: pattern of cancer distribution within and beyond the nuclear family. PLOS Medicine, 1(3):e65, 2005.

59 Iona Cheng, Jonathan M. Kocarnik, Logan Dumitrescu, Noralane M. Lindor, Jenny Chang-Claude, and Christy L. Avery et al. Pleiotropic effects of genetic risk variants for other cancers on colorectal cancer risk: PAGE, GECCO and CCFR consortia. Gut, 63(5):800–807, 2014.

60 Lisa A. Cannon-Albright, Alun Thomas, David E. Goldgar, Khosrow Gholami, Kerry Rowe, and Matt Jacobsen et al. Familiality of cancer in Utah. Cancer Research, 54(9):2378–2385, 1994.

61 Pauli Vaittinen and Kari Hemminki. Familial cancer risks in offspring from discordant parental cancers. International Journal of Cancer, 81(1):12–19, 1999.

62 Chuanhui Dong and Kari Hemminki. Modification of cancer risks in offspring by sibling and parental cancers from 2,112,616 nuclear families. International Journal of Cancer, 92(1):144–150, 2001.

63 Kamila Czene, Paul Lichtenstein, and Kari Hemminki. Environmental and heritable causes of cancer among 9.6 million individuals in the Swedish family-cancer database. International Journal of Cancer, 99(2):260–266, 2002.

64 Christopher D.M. Fletcher and World Health Organization. WHO classification of tumours of soft tissue and bone. International Agency for Research on Cancer, 2013.

65 Zachary Burningham, Mia Hashibe, Logan Spector, and Joshua Schiffman. The epidemiology of sarcoma. Clinical Sarcoma Research, 2(1):14, 2012.

66 Guy Lahat, Alexander Lazar, and Dina Lev. Sarcoma epidemiology and etiology: potential environmental and genetic factors. Surgical Clinics of North America, 88(3):451–481, 2008.

67 John R. Goldblum, Sharon W. Weiss, and Andrew L. Folpe. Enzinger and Weiss’s soft tissue tumors. Elsevier Health Sciences, 2013.

68 Fritz Schajowicz. Histological typing of bone tumours. Springer Science & Business Media, 2012.

69 W. Archie Bleyer. Cancer in older adolescents and young adults: epidemiology, diagnosis, treatment, survival, and importance of clinical trials. Medical and Pediatric Oncology, 38(1):1–10, 2002.

70 W. Archie Bleyer, Troy Budd, and Michael Montello. Adolescents and young adults with cancer. Cancer, 107(S7):1645–1655, 2006.

71 Ernest K. Amankwah, Anthony P. Conley, and Damon R. Reed. Epidemiology and therapies for metastatic sarcoma. Clinical Epidemiology, 5:147–162, 2013.

72 Australasian Association of Cancer Registries. Cancer in Australia 1998: incidence and mortality data for 1998. Technical report, Australian Institute of Health and Welfare, 2001.

73 Kasmintan A. Schrader, Donavan T. Cheng, Vijai Joseph, Meera Prasad, Michael Walsh, and Ahmet Zehir et al. Germline variants in targeted tumor sequencing using matched normal DNA. JAMA Oncology, 2(1):104–111, 2016.

74 Jinghui Zhang, Michael F. Walsh, Gang Wu, Michael N. Edmonson, Tanja A. Gruber, and John Easton et al. Germline mutations in predisposition genes in pediatric cancer. New England Journal of Medicine, 373(24):2336–2346, 2015.

75 Fabio Levi, Lalao Randimbison, Manuela Maspoli-Conconi, Rafael Blanc-Moya, and Carlo La Vecchia. Incidence of second sarcomas: a cancer registry-based study. Cancer Causes & Control, 25(4):473–477, 2014.

76 Josefin Fernebro, Anna Bladstrom, Anders Rydholm, Pelle Gustafson, Hakan Olsson, Jacob Engellau, and Mef Nilbert. Increased risk of malignancies in a population-based study of 818 soft-tissue sarcoma patients. British Journal of Cancer, 95(8):986–990, 2006.

77 Ruth A. Kleinerman, Sara J. Schonfeld, and Margaret A. Tucker. Sarcomas in hereditary retinoblastoma. Clinical Sarcoma Research, 2, 2012.

78 Michael A. Postow and Mark E. Robson. Inherited gastrointestinal stromal tumor syndromes: mutations, clinical features, and therapeutic implications. Clinical Sarcoma Research, 2, 2012.

79 D. Gareth R. Evans, Susan M. Huson, and Jillian M. Birch. Malignant peripheral nerve sheath tumours in inherited disease. Clinical Sarcoma Research, 2, 2012.

80 Junya Toguchida, Toshikazu Yamaguchi, Siri H. Dayton, Roberta L. Beauchamp, Guillermo E. Herrera, and Kanji Ishizaki et al. Prevalence and spectrum of germline mutations of the p53 gene among patients with sarcoma. New England Journal of Medicine, 326(20):1301–1308, 1992.

81 Shih-Jen Hwang, Guillermina Lozano, Christopher I. Amos, and Louise C. Strong. Germline p53 mutations in a cohort with childhood sarcoma: sex differences in cancer risk. The American Journal of Human Genetics, 72(4):975–983, 2003.

82 Amy Berrington de Gonzalez, Alina Kutsenko, and Preetha Rajaraman. Sarcoma risk after radiation exposure. Clinical Sarcoma Research, 2(1):1, 2012.

83 Lee J. Helman and Paul Meltzer. Mechanisms of sarcoma development. Nature Reviews Cancer, 3(9):685–694, 2003.

84 Kishor Bhatia, Meredith S. Shiels, Alexandra Berg, and Eric A. Engels. Sarcomas other than Kaposi sarcoma occurring in immunodeficiency: interpretations from a systematic literature review. Current Opinion in Oncology, 24(5):537, 2012.

85 Denise Whitby, Chris Boshoff, T. Hatzioannou, Robert A. Weiss, Thomas F. Schulz, and Mark R. Howard et al. Detection of Kaposi sarcoma associated herpesvirus in peripheral blood of HIV-infected individuals and progression to Kaposi’s sarcoma. The Lancet, 346(8978):799–802, 1995.

86 R. Balarajan and Ernest D. Acheson. Soft tissue sarcomas in agriculture and forestry workers. Journal of Epidemiology and Community Health, 38(2):113–116, 1984.

87 Diego Serraino, Silvia Franceschi, Carlo La Vecchia, and Antonino Carbone. Occupation and soft-tissue sarcoma in northeastern Italy. Cancer Causes & Control, 3(1):25–30, 1992.

88 Gun Wingren, Mats Fredrikson, H. Noorlind Brage, Bo Nordenskjold, and Olav Axelson. Soft tissue sarcoma and occupational exposures. Cancer, 66(4):806–811, 1990.

89 Franco Merletti, Lorenzo Richiardi, Franco Bertoni, Wolfgang Ahrens, Antoine Buemi, and Cristina Costa-Santos et al. Occupational factors and risk of adult bone sarcomas: A multicentric case-control study in Europe. International Journal of Cancer, 118(3):721–727, 2006.

90 Eero Pukkala, Jan Ivar Martinsen, Elsebeth Lynge, Holmfridur Kolbrun Gunnarsdottir, Par Sparen, and Laufey Tryggvadottir et al. Occupation and cancer - follow-up of 15 million people in five Nordic countries. Acta Oncologica, 48(5):646–790, 2009.

91 Mikael Eriksson, Lennart Hardell, and Hans-Olov Adami. Exposure to dioxins as a risk factor for soft tissue sarcoma: A population-based case-control study. Journal of the National Cancer Institute, 82(6):486–490, 1990.

92 Jane A. Hoppin, Paige E. Tolbert, W. Dana Flanders, Rebecca H. Zhang, Danni S. Daniels, Bruce D. Ragsdale, and Edward A. Brann. Occupational risk factors for sarcoma subtypes. Epidemiology, 10(3):300–306, 1999.

93 Manolis Kogevinas, Timo Kauppinen, Regina Winkelmann, Heiko Becher, Pier Alberto Bertazzi, and H. Bas Bueno-de-Mesquita et al. Soft tissue sarcoma and non-Hodgkin’s lymphoma in workers exposed to phenoxy herbicides, chlorophenols, and dioxins: two nested case-control studies. Epidemiology, 6(4):396–402, 1995.

94 Lennart Hardell and Mikael Eriksson. The association between soft tissue sarcomas and exposure to phenoxyacetic acids. Cancer, 62(3):652–656, 1988.

95 J. Gustav Smith and Allen J. Christophers. Phenoxy herbicides and chlorophenols: a case control study on soft tissue sarcoma and malignant lymphoma. British Journal of Cancer, 65(3):442, 1992.

96 James S. Woods, Lincoln Polissar, Richard K. Severson, L.S. Heuser, and Bruce G. Kulander. Soft tissue sarcoma and non-Hodgkin’s lymphoma in relation to phenoxyherbicide and chlorinated phenol exposure in western Washington. Journal of the National Cancer Institute, 78(5):899–910, 1987.

97 Francesca Fioretti, Alessandra Tavani, Silvano Gallus, Eva Negri, Silvia Franceschi, and Carlo La Vecchia. Menstrual and reproductive factors and risk of soft tissue sarcomas. Cancer, 88(4):786–789, 2000.

98 Kristin P. Anfinsen, Susan S. Devesa, Freddie Bray, Rebecca Troisi, Thora J. Jonasdottir, Oyvind S. Bruland, and Tom Grotmol. Age-period-cohort analysis of primary bone cancer incidence rates in the United States (1976-2005). Cancer Epidemiology Biomarkers & Prevention, 20(8):1770–1777, 2011.

99 Deborah M. Winn, Frederick P. Li, Leslie L. Robison, John J. Mulvihill, Ann E. Daigle, and Joseph F. Fraumeni. A case-control study of the etiology of Ewing’s sarcoma. Cancer Epidemiology Biomarkers & Prevention, 1(7):525–532, 1992.

100 Seymour Grufferman, Helen H. Wang, Elizabeth R. DeLong, Sue Y.S. Kimm, Elizabeth S. Delzell, and John M. Falletta. Environmental factors in the etiology of rhabdomyosarcoma in childhood. Journal of the National Cancer Institute, 68(1):107–113, 1982.

101 Ann L. Hartley, Jillian M. Birch, Henry B. Marsden, Martin Harris, and Val Blair. Neurofibromatosis in children with soft tissue sarcoma. Pediatric Hematology and Oncology, 5(1):7–16, 1988.

102 Lisa Mirabello, Ruth Pfeiffer, Gwen Murphy, Najat C. Daw, Ana Patino-Garcia, and Rebecca J. Troisi et al. Height at diagnosis and birth-weight as risk factors for osteosarcoma. Cancer Causes & Control, 22(6):899–908, 2011.

103 Logan G. Spector, Susan E. Puumala, Susan E. Carozza, Eric J. Chow, Erin E. Fox, and Scott Horel et al. Cancer risk among children with very low birth weights. Pediatrics, 124(1):96–104, 2009.

104 Simona Ognjanovic, Susan E. Carozza, Eric J. Chow, Erin E. Fox, Scott Horel, and Colleen C. McLaughlin et al. Birth characteristics and the risk of childhood rhabdomyosarcoma based on histological subtype. British Journal of Cancer, 102(1):227–231, 2010.

105 Julie Von Behren, Logan G. Spector, Beth A. Mueller, Susan E. Carozza, Eric J. Chow, and Erin E. Fox et al. Birth order and risk of childhood cancer: a pooled analysis from five US States. International Journal of Cancer, 128(11):2709–2716, 2011.

106 Felix Mitelman, Bertil Johansson, and Fredrik Mertens. Mitelman database of chromosome aberrations and gene fusions in cancer; URL: http://cgap.nci.nih.gov/Chromosomes/Mitelman, 2016.

107 Shujuan J. Xia and Frederic G. Barr. Chromosome translocations in sarcomas and the emergence of oncogenic transcription factors. European Journal of Cancer, 41(16):2513–2527, 2005.

108 Surbhi Jain, Lori W. McGinnes, and Trudy G. Morrison. Thiol/disulfide exchange is required for membrane fusion directed by the Newcastle disease virus fusion protein. Journal of Virology, 81(5):2328–2339, 2007.

109 Brian P. Rubin, Samuel Singer, Connie Tsao, Anette Duensing, Marcia L. Lux, and Robert Ruiz et al. KIT activation is a ubiquitous feature of gastrointestinal stromal tumors. Cancer Research, 61(22):8118–8121, 2001.

110 Michael C. Heinrich, Christopher L. Corless, Anette Duensing, Laura McGreevey, Chang-Jie Chen, and Nora Joseph et al. PDGFRA activating mutations in gastrointestinal stromal tumors. Science, 299(5607):708–710, 2003.

111 Louis Guillou and Alain Aurias. Soft tissue sarcomas with complex genomic profiles. Virchows Archiv, 456(2):201–217, 2009.

112 Jeff M. Hall, Ming K. Lee, Beth Newman, Jan E. Morrow, Lee A. Anderson, Bing Huey, and Mary-Claire King. Linkage of early-onset familial breast cancer to chromosome 17q21. Science, 250(4988):1684, 1990.

113 Richard Wooster, Susan L. Neuhausen, Jonathan Mangion, Yvette Quirk, Deborah Ford, and Nadine Collins et al. Localization of a breast cancer susceptibility gene, BRCA2, to chromosome 13q12-13. Science, 265(5181):2088–2091, 1994.

114 Walter F. Bodmer, Carolyn J. Bailey, Julia G. Bodmer, H.J.R. Bussey, Anthony Ellis, and Patricia Gorman et al. Localization of the gene for familial adenomatous polyposis on chromosome 5. Nature, 328(6131):614–616, 1987.

115 Paivi Peltomaki, Lauri A. Aaltonen, Pertti Sistonen, Lea Pylkkanen, Jukka-Pekka Mecklin, and Heikki Jarvinen et al. Genetic mapping of a locus predisposing to human colorectal cancer. Science, 260(5109):810–812, 1993.

116 Annika Lindblom, Pia Tannergard, Barbro Werelius, and Magnus Nordenskjold. Genetic mapping of a second locus predisposing to hereditary non-polyposis colon cancer. Nature Genetics, 5(3):279–282, 1993.

117 Lisa A. Cannon-Albright, David E. Goldgar, Laurence J. Meyer, Cathryn M. Lewis, David E. Anderson, and J.W. Fountain et al. Assignment of a locus for familial melanoma, MLM, to chromosome 9p13-p22. Science, 258(5085):1148, 1992.

118 Anglian Breast Cancer Study Group. Prevalence and penetrance of BRCA1 and BRCA2 mutations in a population-based series of breast cancer cases. British Journal of Cancer, 83(10):1301, 2000.

119 Kirsi Syrjakoski, Pia Vahteristo, Hannaleena Eerola, Anitta Tamminen, Kati Kivinummi, and Laura Sarantaus et al. Population-based study of BRCA1 and BRCA2 mutations in 1035 unselected Finnish breast cancer patients. Journal of the National Cancer Institute, 92(18):1529–1531, 2000.

120 Gudrun Johannesdottir, Julius Gudmundsson, Jon T. Bergthorsson, Adalgeir Arason, Bjarni A. Agnarsson, and Gudny Eiriksdottir et al. High prevalence of the 999del5 mutation in Icelandic breast and ovarian cancer patients. Cancer Research, 56(16):3663–3665, 1996.

121 Steinunn Thorlacius, Stefan Sigurdsson, Helga Bjarnadottir, Gudridur Olafsdottir, Jon Gunnlaugur Jonasson, and Laufey Tryggvadottir et al. Study of a single BRCA2 mutation with high carrier frequency in a small population. American Journal of Human Genetics, 60(5):1079, 1997.

122 Patricia Hartge, Jeffery P. Struewing, Sholom Wacholder, Lawrence C. Brody, and Margaret A. Tucker. The prevalence of common BRCA1 and BRCA2 mutations among Ashkenazi Jews. The American Journal of Human Genetics, 64(4):963–970, 1999.

123 Steinunn Thorlacius, Jeffery P. Struewing, Patricia Hartge, Gudridur H. Olafsdottir, Helgi Sigvaldason, and Laufey Tryggvadottir et al. Population-based study of risk of breast cancer in carriers of BRCA2 mutation. The Lancet, 352(9137):1337–1339, 1998.

124 Bruce A.J. Ponder. Cancer genetics. Nature, 411(6835):336–341, 2001.

125 Joel N. Hirschhorn and Mark J. Daly. Genome-wide association studies for common diseases and complex traits. Nature Reviews Genetics, 6(2):95–108, 2005.

126 Tony Burdett, Peggy N. Hall, Emma Hastings, Lucia A. Hindorff, and Heather A. Junkins. The NHGRI-EBI Catalog of published genome-wide association studies. Available at: www.ebi.ac.uk/gwas, 2015.

127 Andrew D. Beggs and Shirley V. Hodgson. Genomics and breast cancer: the different levels of inherited susceptibility. European Journal of Human Genetics, 17(7):855–856, 2009.

128 Teri A. Manolio, Francis S. Collins, Nancy J. Cox, David B. Goldstein, Lucia A. Hindorff, and David J. Hunter et al. Finding the missing heritability of complex diseases. Nature, 461(7265):747–753, 2009.

129 Jon McClellan and Mary-Claire King. Genetic heterogeneity in human disease. Cell, 141(2):210–217, 2010.

130 Frederick Sanger, Steven Nicklen, and Alan R. Coulson. DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences, 74(12):5463–5467, 1977.

131 Marcel Margulies, Michael Egholm, William E. Altman, Said Attiya, Joel S. Bader, and Lisa A. Bemben et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437(7057):376–380, 2005.

132 Erwin L. van Dijk, Helene Auger, Yan Jaszczyszyn, and Claude Thermes. Ten years of next-generation sequencing technology. Trends in Genetics, 30(9):418–426, 2014.

133 Daniel C. Koboldt, Karyn Meltz Steinberg, David E. Larson, Richard K. Wilson, and Elaine R. Mardis. The next-generation sequencing revolution and its impact on genomics. Cell, 155(1):27–38, 2013.

134 Sally Bamford, Emily Dawson, Simon Forbes, Jody Clements, Roger Pettett, and Ahmet Dogan et al. The COSMIC (Catalogue Of Somatic Mutations In Cancer) database and website. British Journal of Cancer, 91(2):355–358, 2004.

135 The Cancer Genome Atlas Research Network, John N. Weinstein, Eric A. Collisson, Gordon B. Mills, Kenna R. Mills Shaw, Brad A. Ozenberger, and Kyle Ellrott et al. The Cancer Genome Atlas pan-cancer analysis project. Nature Genetics, 45(10):1113–1120, 2013.

136 Thomas J. Hudson, Warwick Anderson, Axel Aretz, Anna D. Barker, Cindy Bell, and Rosa R. Bernabe et al. International network of cancer genome projects. Nature, 464(7291):993–998, 2010.

137 Veronique G. LeBlanc and Marco A. Marra. Next-generation sequencing approaches in cancer: Where have they brought us and where will they take us? Cancers, 7(3):1925–1958, 2015.

138 Elaine R. Mardis. Next-generation DNA sequencing methods. Annual Review of Genomics and Human Genetics, 9(1):387–402, 2008.

139 Michael L. Metzker. Sequencing technologies - the next generation. Nature Reviews Genetics, 11(1):31–46, 2010.

140 David N. Cooper. The nature and mechanisms of human gene mutation. The Metabolic and Molecular Bases of Inherited Disease, pages 259–291, 1995.

141 Sarah B. Ng, Emily H. Turner, Peggy D. Robertson, Steven D. Flygare, Abigail W. Bigham, and Choli Lee et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature, 461(7261):272–276, 2009.

142 Sarah B. Ng, Abigail W. Bigham, Kati J. Buckingham, Mark C. Hannibal, Margaret J. McMillin, and Heidi I. Gildersleeve et al. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nature Genetics, 42(9):790–793, 2010.

143 Alexander Hoischen, Bregje W.M. van Bon, Christian Gilissen, Peer Arts, Bart van Lier, and Marloes Steehouwer et al. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nature Genetics, 42(6):483–485, 2010.

144 Sarah B. Ng, Kati J. Buckingham, Choli Lee, Abigail W. Bigham, Holly K. Tabor, and Karin M. Dent et al. Exome sequencing identifies the cause of a Mendelian disorder. Nature Genetics, 42(1):30–35, 2010.

145 Jun Ling Wang, Xu Yang, Kun Xia, Zheng Mao Hu, Ling Weng, and Xin Jin et al. TGM6 identified as a novel causative gene of spinocerebellar ataxias using exome sequencing. Brain, 133(12):3510–3518, 2010.

146 Chee-Seng Ku, Nasheen Naidoo, and Yudi Pawitan. Revisiting Mendelian disorders through exome sequencing. Human Genetics, 129(4):351–370, 2011.

147 Jessada Thutkawkorapin, Simone Picelli, Vinaykumar Kontham, Tao Liu, Daniel Nilsson, and Annika Lindblom. Exome sequencing in one family with gastric- and rectal cancer. BMC Genetics, 17:41, 2016.

148 Matthew Meyerson, Stacey Gabriel, and Gad Getz. Advances in understanding cancer genomes through second-generation sequencing. Nature Reviews Genetics, 11(10):685–696, 2010.

149 Riyue Bao, Lei Huang, Jorge Andrade, Wei Tan, Warren A. Kibbe, Hongmei Jiang, and Gang Feng. Review of current methods, applications, and data management for the bioinformatics analysis of whole exome sequencing. Cancer Informatics, 13(Suppl 2):67–82, 2014.

150 Kristian Cibulskis, Michael S. Lawrence, Scott L. Carter, Andrey Sivachenko, David Jaffe, and Carrie Sougnez et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nature Biotechnology, 31(3):213–219, 2013.

151 Qingguo Wang, Peilin Jia, Fei Li, Haiquan Chen, Hongbin Ji, and Donald Hucks et al. Detecting somatic point mutations in cancer genome sequencing data: a comparison of mutation callers. Genome Medicine, 5(10):1–8, 2013.

152 Xiaofeng Zhu, Tao Feng, Yali Li, Qing Lu, and Robert C. Elston. Detecting rare variants for complex traits using family and unrelated data. Genetic Epidemiology, 34(2):171–187, 2010.

153 Tao Feng, Robert C. Elston, and Xiaofeng Zhu. Detecting rare and common variants for complex traits: sibpair and odds ratio weighted sum statistics (SPWSS, ORWSS). Genetic Epidemiology, 35(5):398–409, 2011.

154 Iuliana Ionita-Laza and Ruth Ottman. Study designs for identification of rare disease variants in complex diseases: the utility of family-based designs. Genetics, 189(3):1061–1068, 2011.

155 Gang Shi and D.C. Rao. Optimum designs for next-generation sequencing to discover rare variants for common complex disease. Genetic Epidemiology, 35(6):572–579, 2011.

156 Colin C. Pritchard, Christina Smith, Stephen J. Salipante, Ming K. Lee, Anne M. Thornton, and Alex S. Nord et al. ColoSeq provides comprehensive Lynch and polyposis syndrome mutational analysis using massively parallel sequencing. The Journal of Molecular Diagnostics, 14(4):357–366, 2012.

157 Tom Walsh, Ming K. Lee, Silvia Casadei, Anne M. Thornton, Sunday M. Stray, and Christopher Pennil et al. Detection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing. Proceedings of the National Academy of Sciences, 107(28):12629–12633, 2010.

158 Duncan Thomas, Zhao Yang, and Fan Yang. Two-phase and family-based designs for next-generation sequencing studies. Frontiers in Genetics, 4(276), 2013.

159 Nazneen Rahman. Realizing the promise of cancer predisposition genes. Nature, 505(7483):302–308, 2014.

160 Nazneen Rahman. Mainstreaming genetic testing of cancer predisposition genes. Clinical Medicine, 14(4):436–439, 2014.

161 Victor A. McKusick. Mendelian Inheritance in Man and Its Online Version, OMIM. American Journal of Human Genetics, 80(4):588–604, 2007.

162 Olivia Fletcher and Richard S. Houlston. Architecture of inherited susceptibility to common cancer. Nature Reviews Cancer, 10(5):353–361, 2010.

163 David M. Thomas and Mandy L. Ballinger. Inherited and de novo germline TP53 mutations in adult-onset sarcoma. Hereditary Cancer in Clinical Practice, 10(2):A26, 2012.

164 Levi A. Garraway and Eric S. Lander. Lessons from the cancer genome. Cell, 153(1):17–37, 2013.

165 Himisha Beltran, Davide Prandi, Juan Miguel Mosquera, Matteo Benelli, Loredana Puca, and Joanna Cyrta et al. Divergent clonal evolution of castration-resistant neuroendocrine prostate cancer. Nature Medicine, 22(3):298–305, 2016.

166 Peter D. Stenson, Edward V. Ball, Katy Howells, Andrew D. Phillips, Matthew Mort, and David N. Cooper. The human gene mutation database: providing a comprehensive central mutation database for molecular diagnostics and personalised genomics. Human Genomics, 4(2):69, 2009.

167 Murim Choi, Ute I. Scholl, Weizhen Ji, Tiewen Liu, Irina R. Tikhonova, and Paul Zumbo et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proceedings of the National Academy of Sciences, 106(45):19096–19101, 2009.

168 Dale Hedges, Dan Burges, Eric Powell, Cherylyn Almonte, Jia Huang, and Stuart Young et al. Exome sequencing of a multigenerational human pedigree. PLOS ONE, 4(12):e8232, 2009.

169 David Botstein and Neil Risch. Discovering genotypes underlying human phenotypes: past successes for mendelian disease, future approaches for complex disease. Nature Genetics, 33(3s):228, 2003.

170 Urs A. Meyer. Pharmacogenetics and adverse drug reactions. The Lancet, 356(9242):1667–1671, 2000.

171 Urs A. Meyer, Ulrich M. Zanger, and Matthias Schwab. Omics and drug response. Annual Review of Pharmacology and Toxicology, 53(1):475–502, 2013.

172 Barry Merriman, Ion Torrent Development Team, and Jonathan M. Rothberg. Progress in Ion Torrent semiconductor chip based sequencing. Electrophoresis, 33(23):3397–3417, 2012.

173 Martin Mascher, Shuangye Wu, Paul St Amand, Nils Stein, and Jesse Poland. Application of genotyping-by-sequencing on semiconductor sequencing platforms: a comparison of genetic and reference-based marker ordering in barley. PLOS ONE, 8(10):e76925, 2013.

174 Nicholas J. Loman, Raju V. Misra, Timothy J. Dallman, Chrystala Constantinidou, Saheer E. Gharbia, John Wain, and Mark J. Pallen. Performance comparison of benchtop high-throughput sequencing platforms. Nature Biotechnology, 30(5):434–439, 2012.

175 Australasian Sarcoma Study Group. International sarcoma kindred study, URL: http://www.australiansarcomagroup.org/sarcomakindredstudy, 2013.

176 Gillian Mitchell, Mandy L. Ballinger, Stephen Wong, Chelsee Hewitt, Paul James, and Mary-Anne Young et al. High frequency of germline TP53 mutations in a prospective adult-onset sarcoma cohort. PLOS ONE, 8(7):1–7, 2013.

177 Gang Peng, Jasmina Bojadzieva, Mandy L. Ballinger, Jialu Li, Amanda L. Blackford, and Phuong L. Mai et al. Estimating TP53 mutation carrier probability in families with Li-Fraumeni syndrome using LFSPRO. Cancer Epidemiology and Prevention Biomarkers, pages cebp–0695.2016, 2017.

178 Mandy L. Ballinger, David L. Goode, Isabelle Ray-Coquard, Paul A. James, Gillian Mitchell, and Eveline Niedermayr et al. Monogenic and polygenic determinants of sarcoma risk: an international genetic study. The Lancet Oncology, 17(9):1261–1271, 2016.

179 Petr Danecek, Adam Auton, Goncalo Abecasis, Cornelis A. Albers, Eric Banks, and Mark A. DePristo et al. The variant call format and vcftools. Bioinformatics, 27(15):2156–2158, 2011.

180 Aaron McKenna, Matthew Hanna, Eric Banks, Andrey Sivachenko, Kristian Cibulskis, and Andrew Kernytsky et al. The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research, 20(9):1297–1303, 2010.

181 Heng Li, Bob Handsaker, Alec Wysoker, Tim Fennell, Jue Ruan, and Nils Homer et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16):2078–2079, 2009.

182 GATK Documentation. Variant quality score recalibration (VQSR), URL: http://gatkforums.broadinstitute.org/gatk/discussion/39/variant-quality-score-recalibration-vqsr, 2016.

183 James T. Robinson, Helga Thorvaldsdottir, Wendy Winckler, Mitchell Guttman, Eric S. Lander, Gad Getz, and Jill P. Mesirov. Integrative genomics viewer. Nature Biotechnology, 29(1):24–26, 2011.

184 Helga Thorvaldsdottir, James T. Robinson, and Jill P. Mesirov. Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration. Briefings in Bioinformatics, 14(2):178–192, 2013.

185 Michael J. Clark, Rui Chen, Hugo Y. K. Lam, Konrad J. Karczewski, Rong Chen, and Ghia Euskirchen et al. Performance comparison of exome DNA sequencing technologies. Nature Biotechnology, 29(10):908–914, 2011.

186 Alison M. Meynert, Louise S. Bicknell, Matthew E. Hurles, Andrew P. Jackson, and Martin S. Taylor. Quantifying single nucleotide variant detection sensitivity in exome sequencing. BMC Bioinformatics, 14(1):1, 2013.

187 Robert P. VanderWaal, Douglas R. Spitz, Cara L. Griffith, Ryuji Higashikubo, and Joseph L. Roti Roti. Evidence that protein disulfide isomerase (PDI) is involved in DNA-nuclear matrix anchoring. Journal of Cellular Biochemistry, 85(4):689–702, 2002.

188 Henry B. Mann and Donald R. Whitney. On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, pages 50–60, 1947.

189 William Bateson and Gregor Mendel. Mendel’s principles of heredity. University Press, 1913.

190 Ingrid B. Borecki and Michael A. Province. Genetic and genomic discovery using family studies. Circulation, 118(10):1057–1063, 2008.

191 Diana Merino and David Malkin. p53 and hereditary cancer. In Swati Palit Deb and Sumitra Deb, editors, Mutant p53 and MDM2 in Cancer, pages 1–16. Springer Netherlands, Dordrecht, 2014.

192 Joanne Ngeow and Charis Eng. Precision medicine in heritable cancer: when somatic tumour testing and germline mutations meet. NPJ Genomic Medicine, 1:15006, 2016.

193 Edward D. Lustbader, Wick R. Williams, Melissa L. Bondy, Sara Strom, and Louise C. Strong. Segregation analysis of cancer in families of childhood soft-tissue-sarcoma patients. American Journal of Human Genetics, 51(2):344–356, 1992.

194 Biljana Novakovic, Alisa M. Goldstein, Leonard H. Wexler, and Margaret A. Tucker. Increased risk of neuroectodermal tumors and stomach cancer in relatives of patients with Ewing’s sarcoma family of tumors. Journal of the National Cancer Institute, 86(22):1702–1706, 1994.

195 Ann L. Hartley, Jillian M. Birch, Val Blair, Anna M. Kelsey, Martin Harris, and Patricia H. Morris Jones. Patterns of cancer in the families of children with soft tissue sarcoma. Cancer, 72(3):923–930, 1993.

196 Eileen Burke, Frederick P. Li, Abbe J. Janov, Stephen Batter, Holcombe Grier, and Allen Goorin. Cancer in relatives of survivors of childhood sarcoma. Cancer, 67(5):1467–1469, 1991.

197 Kevin B. Jones, Joshua D. Schiffman, Wendy Kohlmann, R. Lor Randall, Stephen L. Lessnick, and Lisa A. Cannon-Albright. Complex genotype sarcomas display familial inheritance independent of known cancer predisposition syndromes. Cancer Epidemiology Biomarkers & Prevention, 20(5):751–757, 2011.

198 Henry T. Lynch, Gabriel M. Mulcahy, Randall E. Harris, Hoda A. Guirgis, and Jane F. Lynch. Genetic and pathologic findings in a kindred with hereditary sarcoma, breast cancer, brain tumors, leukemia, lung, laryngeal, and adrenal cortical carcinoma. Cancer, 41:2055–2064, 1978.

199 Wick R. Williams and Louise C. Strong. Genetic epidemiology of soft tissue sarcomas in children. In Familial Cancer, pages 151–153. Karger Publishers, 1985.

200 Henry T. Lynch, Randall E. Brand, David Hogg, Carolyn A. Deters, Ramon M. Fusaro, and Jane F. Lynch et al. Phenotypic variation in eight extended CDKN2A germline mutation familial atypical multiple mole melanoma-pancreatic carcinoma-prone families. Cancer, 94(1):84–96, 2002.

201 Stephen J. Rulyak, Teresa A. Brentnall, Henry T. Lynch, and Melissa A. Austin. Characterization of the neoplastic phenotype in the familial atypical multiple mole melanoma pancreatic carcinoma syndrome. Cancer, 98(4):798–804, 2003.

202 Sophie Sun, Pamela M. Pollock, Ling Liu, Sepideh Karimi, Serge Jothy, and Benedict J. Milner et al. CDKN2A mutation in a non-FAMMM kindred with cancers at multiple sites results in a functionally abnormal protein. International Journal of Cancer, 73(4):531–536, 1997.

203 Rodney C.P. Go, Mary-Claire King, Joan Bailey-Wilson, Robert C. Elston, and Henry T. Lynch. Genetic epidemiology of breast cancer and associated cancers in high-risk families. I. Segregation analysis. Journal of the National Cancer Institute, 71(3):455–461, 1983.

204 Henry T. Lynch, Carolyn A. Deters, David Hogg, Jane F. Lynch, Yulia Kinarsky, and Zoran Gatalica. Familial sarcoma. Cancer, 98(9):1947–1957, 2003.

205 Audrey H. Schnell and John S. Witte. Family-based study designs. In Timothy R. Rebbeck, Christine B. Ambrosone, and Peter G. Shields, editors, Molecular Epidemiology: Applications in Cancer and Other Human Diseases, pages 19–28. Taylor & Francis, 2008.

206 Steven A. Narod, Deborah Ford, Peter Devilee, Rosa B. Barkardottir, Henry T. Lynch, and Simon A. Smith et al. An evaluation of genetic heterogeneity in 145 breast-ovarian cancer families. American Journal of Human Genetics, 56(1):254–264, 1995.

207 Jared C. Roach, Gustavo Glusman, Arian F.A. Smit, Chad D. Huff, Robert Hubley, and Paul T. Shannon et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science, 328(5978):636–639, 2010.

208 Jianxin Shi, Xiaohong R. Yang, Bari Ballew, Melissa Rotunno, Donato Calista, and Maria Concetta Fargnoli et al. Rare missense variants in POT1 predispose to familial cutaneous malignant melanoma. Nature Genetics, 46(5):482–486, 2014.

209 Leslie G. Biesecker. Exome sequencing makes medical genomics a reality. Nature Genetics, 42(1):13–15, 2010.

210 Michael J. Bamshad, Sarah B. Ng, Abigail W. Bigham, Holly K. Tabor, Mary J. Emond, Deborah A. Nickerson, and Jay Shendure. Exome sequencing as a tool for Mendelian disease gene discovery. Nature Reviews Genetics, 12(11):745–755, 2011.

211 Gregory V. Kryukov, Len A. Pennacchio, and Shamil R. Sunyaev. Most rare missense alleles are deleterious in humans: implications for complex disease and association studies. The American Journal of Human Genetics, 80(4):727–739, 2007.

212 Colin C. Pritchard, Stephen J. Salipante, Karen Koehler, Christina Smith, Sheena Scroggins, and Brent Wood et al. Validation and implementation of targeted capture and sequencing for the detection of actionable mutation, copy number variation, and gene rearrangement in clinical cancer specimens. The Journal of Molecular Diagnostics, 16(1):56–67, 2014.

213 Antonija Kreso, Catherine A. O’Brien, Peter van Galen, Olga I. Gan, Faiyaz Notta, and Andrew M.K. Brown et al. Variable clonal repopulation dynamics influence chemotherapy response in colorectal cancer. Science, 339(6119):543–548, 2013.

214 Sreenath V. Sharma, Daphne W. Bell, Jeffrey Settleman, and Daniel A. Haber. Epidermal growth factor receptor mutations in lung cancer. Nature Reviews Cancer, 7(3):169–181, 2007.

215 Paul B. Chapman, Axel Hauschild, Caroline Robert, John B. Haanen, Paolo Ascierto, and James Larkin et al. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. New England Journal of Medicine, 364(26):2507–2516, 2011.

216 David M. Thomas and Mandy L. Ballinger. Diagnosis and management of hereditary sarcoma. In Rare Hereditary Cancers, pages 169–189. Springer, 2016.

217 Navnath S. Gavande, Pamela S. VanderVere-Carozza, Hilary D. Hinshaw, Shadia I. Jalal, Catherine R. Sears, Katherine S. Pawelczak, and John J. Turchi. DNA repair targeted therapy: The past or future of cancer treatment? Pharmacology & Therapeutics, 160:65–83, 2016.

218 David C. Samuels, Leng Han, Jiang Li, Sheng Quanghu, Travis A. Clark, Yu Shyr, and Yan Guo. Finding the lost treasures in exome sequencing data. Trends in Genetics, 29(10):593–599, 2013.

219 Malte Spielmann and Stefan Mundlos. Looking beyond the genes: the role of non-coding variants in human disease. Human Molecular Genetics, 25(R2):R157–R165, 2016.

220 Graham R.S. Ritchie and Paul Flicek. Computational approaches to interpreting genomic sequence variation. Genome Medicine, 6(10):87, 2014.

221 Paul D.P. Pharoah, Alison M. Dunning, Bruce A.J. Ponder, and Douglas F. Easton. Association studies for finding cancer-susceptibility genetic variants. Nature Reviews Cancer, 4(11):850–860, 2004.

222 Susanne Horn, Adina Figl, P. Sivaramakrishna Rachakonda, Christine Fischer, Antje Sucker, Andreas Gast, Stephanie Kadel, Iris Moll, Eduardo Nagore, and Kari Hemminki. TERT promoter mutations in familial and sporadic melanoma. Science, 339(6122):959–961, 2013.

223 The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, 489(7414):57–74, 2012.

224 Amanda Warr, Christelle Robert, David Hume, Alan Archibald, Nader Deeb, and Mick Watson. Exome sequencing: current and future perspectives. G3: Genes|Genomes|Genetics, 5(8):1543–1550, 2015.

225 Ken Chen, John W. Wallis, Michael D. McLellan, David E. Larson, Joelle M. Kalicki, and Craig S. Pohl et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nature Methods, 6(9):677–681, 2009.

226 Can Alkan, Bradley P. Coe, and Evan E. Eichler. Genome structural variation discovery and genotyping. Nature Reviews Genetics, 12(5):363–376, 2011.

227 Biao Liu, Jeffrey M. Conroy, Carl D. Morrison, Adekunle O. Odunsi, Maochun Qin, and Lei Wei et al. Structural variation discovery in the cancer genome using next generation sequencing: computational solutions and perspectives. Oncotarget, 6(8):5477–5489, 2015.

228 Shengpei Chen, Sheng Li, Weiwei Xie, Xuchao Li, Chunlei Zhang, and Haojun Jiang et al. Performance comparison between rapid sequencing platforms for ultra-low coverage sequencing strategy. PLOS ONE, 9(3):e92192, 2014.

229 Joseph F. Boland, Charles C. Chung, David Roberson, Jason Mitchell, Xijun Zhang, and Kate M. Im et al. The new sequencer on the block: comparison of Life Technology’s Proton sequencer to an Illumina HiSeq for whole-exome sequencing. Human Genetics, 132(10):1153–1163, 2013.

230 Eric Samorodnitsky, Benjamin M. Jewell, Raffi Hagopian, Jharna Miya, Michele R. Wing, and Ezra Lyon et al. Evaluation of hybridization capture versus amplicon-based methods for whole-exome sequencing. Human Mutation, 36(9):903–914, 2015.

231 Pankaj Kumar, Mashael Al-Shafai, Wadha Ahmed Al Muftah, Nader Chalhoub, Mahmoud F. Elsaid, Alice Abdel Aleem, and Karsten Suhre. Evaluation of SNP calling using single and multiple-sample calling algorithms by validation against array base genotyping and Mendelian inheritance. BMC Research Notes, 7:747, 2014.

232 Pengyuan Zhu, Lingyu He, Yaqiao Li, Wenpan Huang, Feng Xi, and Lin Lin et al. OTG-snpcaller: an optimized pipeline based on TMAP and GATK for SNP calling from Ion Torrent data. PLOS ONE, 9(5):e97507, 2014.

233 Xiangtao Liu, Shizhong Han, Zuoheng Wang, Joel Gelernter, and Bao-Zhu Yang. Variant callers for next-generation sequencing data: a comparison study. PLOS ONE, 8(9):e75619, 2013.

234 Su Yeon Kim, Laurent Jacob, and Terence P. Speed. Combining calls from multiple somatic mutation-callers. BMC Bioinformatics, 15(1):154, 2014.

235 Ikuko N. Motoike, Mitsuyo Matsumoto, Inaho Danjoh, Fumiki Katsuoka, Kaname Kojima, and Naoki Nariai et al. Validation of multiple single nucleotide variation calls by additional exome analysis with a semiconductor sequencer to supplement data of whole-genome sequencing of a human population. BMC Genomics, 15(1):673, 2014.

236 Daniel G. MacArthur, Teri A. Manolio, David P. Dimmock, Heidi L. Rehm, Jay Shendure, and Goncalo R. Abecasis et al. Guidelines for investigating causality of sequence variants in human disease. Nature, 508(7497):469–476, 2014.

237 LaDeana W. Hillier, Gabor T. Marth, Aaron R. Quinlan, David Dooling, Ginger Fewell, and Derek Barnett et al. Whole-genome sequencing and variant discovery in C. elegans. Nature Methods, 5(2):183–188, 2008.

238 Christian Gilissen, Alexander Hoischen, Han G. Brunner, and Joris A. Veltman. Disease gene identification strategies for exome sequencing. European Journal of Human Genetics, 20(5):490–497, 2012.

239 Mahjoubeh Jalali Sefid Dashti and Junaid Gamieldien. Identifying candidate function-impacting variants. BioTechniques, 62(1):18–30, 2017.

240 Damian Smedley and Peter N. Robinson. Phenotype-driven strategies for exome prioritization of human Mendelian disease genes. Genome Medicine, 7(1):81, 2015.

241 Vincent J. Henry, Anita E. Bandrowski, Anne-Sophie Pepin, Bruno J. Gonzalez, and Arnaud Desfeux. OMICtools: an informative directory for multi-omic data analysis. Database, 2014:bau069–bau069, 2014.

242 Stephan Pabinger, Andreas Dander, Maria Fischer, Rene Snajder, Michael Sperk, and Mirjana Efremova et al. A survey of tools for variant analysis of next-generation genome sequencing data. Briefings in Bioinformatics, 15(2):256–278, 2013.

243 Min Zhao and Zhongming Zhao. CNVannotator: a comprehensive annotation server for copy number variation in the human genome. PLOS ONE, 8(11):e80170, 2013.

244 Eric R. Gamazon, Wei Zhang, Anuar Konkashbaev, Shiwei Duan, Emily O. Kistner, and Dan L. Nicolae et al. SCAN: SNP and copy number annotation. Bioinformatics, 26(2):259–262, 2010.

245 Kai Wang, Mingyao Li, and Hakon Hakonarson. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Research, 38(16):e164, 2010.

246 Kai Wang, Mingyao Li, Dexter Hadley, Rui Liu, Joseph Glessner, and Struan F.A. Grant et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research, 17(11):1665–1674, 2007.

247 Vladimir Makarov, Tina O’Grady, Guiqing Cai, Jayon Lihm, Joseph D. Buxbaum, and Seungtai Yoon. AnnTools: a comprehensive and versatile annotation toolkit for genomic variants. Bioinformatics, 28(5):724–725, 2012.

248 Ryan L. Collins, Matthew R. Stone, Harrison Brand, Joseph T. Glessner, and Michael E. Talkowski. CNView: a visualization and annotation tool for copy number variation from whole-genome sequencing. bioRxiv, page 049536, 2016.

249 Yuanwei Zhang, Zhenhua Yu, Rongjun Ban, Huan Zhang, Furhan Iqbal, and Aiwu Zhao et al. DeAnnCNV: a tool for online detection and annotation of copy number variations from whole-exome sequencing data. Nucleic Acids Research, 43(W1):W289–W294, 2015.

250 Galina A. Erikson, Neha Deshpande, Balachandar G. Kesavan, and Ali Torkamani. SG-ADVISER CNV: copy-number variant annotation and interpretation. Genetics in Medicine, 17(9):714–718, 2014.

251 Stephen T. Sherry, Ming H. Ward, Michael Kholodov, Jonathan Baker, Lon Phan, Elizabeth M. Smigielski, and Karl Sirotkin. dbSNP: the NCBI database of genetic variation. Nucleic Acids Research, 29(1):308–311, 2001.

252 The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature, 467(7319):1061–1073, 2010.

253 Feng Zhang and James R. Lupski. Non-coding genetic variants in human disease. Human Molecular Genetics, 24(R1):R102–R110, 2015.

254 Anna-Maija Sulonen, Pekka Ellonen, Henrikki Almusa, Maija Lepisto, Samuli Eldfors, and Sari Hannula et al. Comparison of solution-based exome capture methods for next generation sequencing. Genome Biology, 12(9):R94, 2011.

255 Yu Xu, Hui Jiang, Chris Tyler-Smith, Yali Xue, Tao Jiang, and Jiawei Wang et al. Comprehensive comparison of three commercial human whole-exome capture platforms. Genome Biology, 12(9):1, 2011.

256 Yan Guo, Jirong Long, Jing He, Chung-I. Li, Qiuyin Cai, and Xiao-Ou Shu et al. Exome sequencing generates high quality data in non-target regions. BMC Genomics, 13(1):194, 2012.

257 Alan P. Boyle, Eurie L. Hong, Manoj Hariharan, Yong Cheng, Marc A. Schaub, and Maya Kasowski et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Research, 22(9):1790–1797, 2012.

258 Matthew R. Nelson, Daniel Wegmann, Margaret G. Ehm, Darren Kessner, Pamela St Jean, and Claudio Verzilli et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science, 337(6090):100–104, 2012.

259 Elizabeth T. Cirulli and David B. Goldstein. Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nature Reviews Genetics, 11(6):415–425, 2010.

260 Exome Variant Server. Exome variant server, URL: http://evs.gs.washington.edu/EVS/, 2016.

261 Monkol Lek, Konrad J. Karczewski, Eric V. Minikel, Kaitlin E. Samocha, Eric Banks, and Timothy Fennell et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature, 536(7616):285–291, 2016.

262 Eugene V. Davydov, David L. Goode, Marina Sirota, Gregory M. Cooper, Arend Sidow, and Serafim Batzoglou. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLOS Computational Biology, 6(12):e1001025, 2010.

263 Lisenka E.L.M. Vissers, Joep de Ligt, Christian Gilissen, Irene Janssen, Marloes Steehouwer, and Petra de Vries et al. A de novo paradigm for mental retardation. Nature Genetics, 42(12):1109–1112, 2010.

264 Gregory M. Cooper, David L. Goode, Sarah B. Ng, Arend Sidow, Michael J. Bamshad, Jay Shendure, and Deborah A. Nickerson. Single-nucleotide evolutionary constraint scores highlight disease-causing mutations. Nature Methods, 7(4):250–251, 2010.

265 Michael Krawczak, Edward V. Ball, Iain Fenton, Peter D. Stenson, Shaun Abeysinghe, Nick Thomas, and David N. Cooper. Human gene mutation database - a biomedical information and research resource. Human Mutation, 15(1):45, 2000.

266 Pauline C. Ng and Steven Henikoff. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Research, 31(13):3812–3814, 2003.

267 Ivan Adzhubei, Daniel M. Jordan, and Shamil R. Sunyaev. Predicting functional effect of human missense mutations using PolyPhen-2. Current Protocols in Human Genetics, Unit 7.20, 2013.

268 Holly K. Tabor, Neil J. Risch, and Richard M. Myers. Candidate-gene approaches for studying complex genetic traits: practical considerations. Nature Reviews Genetics, 3(5):391–397, 2002.

269 Jennifer M. Kwon and Alison M. Goate. The candidate gene approach. Alcohol Research and Health, 24(3):164–168, 2000.

270 Nadav Ahituv, Nihan Kavaslar, Wendy Schackwitz, Anna Ustaszewska, Joel Martin, and Sybil Hebert et al. Medical sequencing at the extremes of human body mass. The American Journal of Human Genetics, 80(4):779–791, 2007.

271 Amelie Bonnefond, Nathalie Clement, Katherine Fawcett, Loic Yengo, Emmanuel Vaillant, and Jean-Luc Guillaume et al. Rare MTNR1B variants impairing melatonin receptor 1B function contribute to type 2 diabetes. Nature Genetics, 44(3):297–301, 2012.

272 Jonathan C. Cohen, Robert S. Kiss, Alexander Pertsemlidis, Yves L. Marcel, Ruth McPherson, and Helen H. Hobbs. Multiple rare alleles contribute to low plasma levels of HDL cholesterol. Science, 305(5685):869–872, 2004.

273 Dorothee Diogo, Fina Kurreeman, Eli A. Stahl, Katherine P. Liao, Namrata Gupta, and Jeffrey D. Greenberg et al. Rare, low-frequency, and common variants in the protein-coding sequence of biological candidate genes from GWASs contribute to risk of rheumatoid arthritis. The American Journal of Human Genetics, 92(1):15–27, 2013.

274 Weizhen Ji, Jia Nee Foo, Brian J. O’Roak, Hongyu Zhao, Martin G. Larson, and David B. Simon et al. Rare independent mutations in renal salt handling genes contribute to blood pressure variation. Nature Genetics, 40(5):592–599, 2008.

275 Guoqing Diao and D.Y. Lin. Variance-components methods for linkage and association analysis of ordinal traits in general pedigrees. Genetic Epidemiology, 34(3):232–237, 2010.

276 George D. Garson. Variance Components Analysis. Statistical Associates Publishers, Asheboro, NC, 2012.

277 John Blangero and Laura Almasy. SOLAR: sequential oligogenic linkage analysis routines. Population Genetics Laboratory Technical Report, 6, 1996.

278 Laura Almasy and John Blangero. Multipoint quantitative-trait linkage analysis in general pedigrees. The American Journal of Human Genetics, 62(5):1198–1211, 1998.

279 Christopher I. Amos. Robust variance-components approach for assessing genetic linkage in pedigrees. American Journal of Human Genetics, 54(3):535–543, 1994.

280 Gail P. Jarvik, Laura M. Amendola, Jonathan S. Berg, Kyle Brothers, Ellen W. Clayton, and Wendy Chung et al. Return of genomic results to research participants: the floor, the ceiling, and the choices in between. The American Journal of Human Genetics, 94(6):818–826, 2014.

281 R Core Team. R: a language and environment for statistical computing, 2014.

282 Karin V. Fuentes Fajardo, David Adams, NISC Comparative Sequencing Program, Christopher E. Mason, Murat Sincan, and Cynthia Tifft et al. Detecting false-positive signals in exome sequencing. Human Mutation, 33(4):609–613, 2012.

283 Giulio Genovese, Menachem Fromer, Eli A. Stahl, Douglas M. Ruderfer, Kimberly Chambert, and Mikael Landen et al. Increased burden of ultra-rare protein-altering variants among 4,877 individuals with schizophrenia. Nature Neuroscience, 19(11):1433–1441, 2016.

284 Nathan O. Stitziel, Adam Kiezun, and Shamil Sunyaev. Computational and statistical approaches to analyzing variants identified by exome sequencing. Genome Biology, 12(9):227, 2011.

285 The International HapMap Consortium. A haplotype map of the human genome. Nature, 437(7063):1299–1320, 2005.

286 The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature, 467(7319):1061–1073, 2010.

287 McKusick-Nathans Institute of Genetic Medicine. Online Mendelian Inheritance in Man, OMIM, URL: http://omim.org/, 2015.

288 Agilent Technologies. ClearSeq cancer research panels, URL: http://www.genomics.agilent.com/article.jsp?pageId=6900003#Cancer, 2016.

289 Illumina. TruSeq Amplicon - Cancer Panel, URL: https://www.illumina.com/products/by-type/clinical-research-products/truseq-amplicon-cancer-panel.html, 2016.

290 Ravindranath Duggirala, Jeff T. Williams, Sarah Williams-Blangero, and John Blangero. A variance component approach to dichotomous trait linkage analysis using a threshold model. Genetic Epidemiology, 14(6):987–992, 1997.

291 Bo Peng, Robert K. Yu, Kevin L. DeHoff, and Christopher I. Amos. Normalizing a large number of quantitative traits using empirical normal quantile transformation. BMC Proceedings, 1(1):S156, 2007.

292 J. Martin Bland and Douglas G. Altman. Multiple significance tests: the Bonferroni method. BMJ, 310(6973):170, 1995.

293 Frida Belinky, Noam Nativ, Gil Stelzer, Shahar Zimmerman, Tsippi Iny Stein, Marilyn Safran, and Doron Lancet. PathCards: multi-source consolidation of human biological pathways. Database, 2015, 2015.

294 Michael Ashburner, Catherine A. Ball, Judith A. Blake, David Botstein, Heather Butler, and J. Michael Cherry et al. Gene Ontology: tool for the unification of biology. Nature Genetics, 25(1):25–29, 2000.

295 Mate Ongenaert, Leander Van Neste, Tim De Meyer, Gerben Menschaert, Sofie Bekaert, and Wim Van Criekinge. PubMeth: a cancer methylation database combining text-mining and expert annotation. Nucleic Acids Research, 36(suppl 1):D842–D846, 2008.

296 Donna Maglott, Jim Ostell, Kim D. Pruitt, and Tatiana Tatusova. Entrez Gene: gene-centered information at NCBI. Nucleic Acids Research, 33(suppl 1):D54–D58, 2005.

297 Junghwa Lim, Daniel A. Ritt, Ming Zhou, and Deborah K. Morrison. The CNK2 scaffold interacts with vilse and modulates Rac cycling during spine morphogenesis in hippocampal neurons. Current Biology, 24(7):786–792, 2014.

298 Gertraud Maskarinec, Yukiko Morimoto, Sreang Heak, Marissa Isaki, Astrid Steinbrecher, Laurie J. Custer, and Adrian A. Franke. Urinary estrogen metabolites in two soy trials with premenopausal women. European Journal of Clinical Nutrition, 66(9):1044–1049, 2012.

299 Brook E. Harmon, Yukiko Morimoto, Fanchon Beckford, Adrian A. Franke, Frank Z. Stanczyk, and Gertraud Maskarinec. Oestrogen levels in serum and urine of premenopausal women eating low and high amounts of meat. Public Health Nutrition, 17(9):2087–2093, 2014.

300 Reetobrata Basu, Nicholas Baumgaertel, Shiyong Wu, and John J. Kopchick. Growth hormone receptor knockdown sensitizes human melanoma cells to chemotherapy by attenuating expression of ABC drug efflux pumps. Hormones and Cancer, pages 1–14, 2017.

301 Juntao Yao, Xuan Yao, Tao Tian, Xiao Fu, Wenjuan Wang, and Suoni Li et al. ABCB5-ZEB1 axis promotes invasion and metastasis in breast cancer cells. Oncology Research Featuring Preclinical and Clinical Cancer Therapeutics, 25(3):305–316, 2017.

302 Thilo Gambichler, A.L. Petig, Eggert Stockfleth, and Markus Stucker. Expression of SOX10, ABCB5 and CD271 in melanocytic lesions and correlation with survival data of patients with melanoma. Clinical and Experimental Dermatology, 41(7):709–716, 2016.

303 Yang Wang and Jia-Song Teng. Increased multi-drug resistance and reduced apoptosis in osteosarcoma side population cells are crucial factors for tumor recurrence. Experimental and Therapeutic Medicine, 12(1):81–86, 2016.

304 Huanle Zhang, P. Wang, Miao-zhen Lu, and Shu-Dong Zhang. c-Myc regulation of ATP-binding cassette transporter reverses chemoresistance in CD133 (+) colon cancer stem cells. Sheng Li Xue Bao [Acta Physiologica Sinica], 68(2):171–178, 2016.

305 Sonja Kleffel, Nayoung Lee, Cecilia Lezcano, Brian J. Wilson, KristineSobolewski, and Karim R. Saab et al. ABCB5-targeted chemoresistancereversal inhibits Merkel cell carcinoma growth. Journal of InvestigativeDermatology, 136(4):838–846, 2016.

306Martin Grimm, Marcel Cetindis, Max Lehmann, Thorsten Biegner, AdelheidMunz, Peter Teriete, and Siegmar Reinert. Apoptosis resistance-relatedABCB5 and DNaseX (Apo10) expression in oral carcinogenesis. ActaOdontologica Scandinavica, 73(5):336–342, 2015.

307Hala M. Farawela, Mervat M. Khorshied, Neemat M. Kassem, Heba A.Kassem, and Hamdy M. Zawam. The clinical relevance and prognosticsignificance of adenosine triphosphate ATP-binding cassette (ABCB5)and multidrug resistance (MDR1) genes expression in acute leukemia:an Egyptian study. Journal of Cancer Research and Clinical Oncology,140(8):1323–1330, 2014.

308Ramaswamy Govindan, Li Ding, Malachi Griffith, JanakiramanSubramanian, Nathan D. Dees, and Krishna L. Kanchi et al. Genomiclandscape of non-small cell lung cancer in smokers and never-smokers. Cell,150(6):1121–1134, 2012.

309 Brian J. Wilson, Tobias Schatton, Qian Zhan, Martin Gasser, Jie Ma, and Karim R. Saab et al. ABCB5 identifies a therapy-refractory tumor cell population in colorectal cancer patients. Cancer Research, 71(15):5307–5316, 2011.

310 Siu Tim Cheung, Phyllis F.Y. Cheung, Christine K.C. Cheng, Nicholas C.L. Wong, and Sheung Tat Fan. Granulin-epithelin precursor and ATP-dependent binding cassette (ABC)B5 regulate liver cancer cell chemoresistance. Gastroenterology, 140(1):344–355, 2011.

311 Ji Yeon Yang, Seon-Ah Ha, Yun-Sik Yang, and Jin Woo Kim. p-Glycoprotein ABCB5 and YB-1 expression plays a role in increased heterogeneity of breast cancer cells: correlations with cell fusion and doxorubicin resistance. BMC Cancer, 10(1):388–398, 2010.

312 Mitsuru Higa, Xue Zhang, Kiyoji Tanaka, and Masafumi Saijo. Stabilization of Ultraviolet (UV)-stimulated Scaffold Protein A by interaction with ubiquitin-specific peptidase 7 is essential for transcription-coupled nucleotide excision repair. Journal of Biological Chemistry, 291(26):13771–13779, 2016.

313 James E. Cleaver, Angela M. Brennan-Minnella, Raymond A. Swanson, Ka-wing Fong, Junjie Chen, and Kai-ming Chou et al. Mitochondrial reactive oxygen species are scavenged by Cockayne syndrome B protein in human fibroblasts without nuclear DNA damage. Proceedings of the National Academy of Sciences, 111(37):13487–13492, 2014.

314 Jia Guo, Philip C. Hanawalt, and Graciela Spivak. Comet-FISH with strand-specific probes reveals transcription-coupled repair of 8-oxoGuanine in human cells. Nucleic Acids Research, 41(16):7700–7712, 2013.

315 Petra Schwertman, Wim Vermeulen, and Jurgen A. Marteijn. UVSSA and USP7, a new couple in transcription-coupled DNA repair. Chromosoma, 122(4):275–284, 2013.

316 Jia Fei and Junjie Chen. KIAA1530 protein is recruited by Cockayne syndrome complementation group protein A (CSA) to participate in transcription-coupled repair (TCR). Journal of Biological Chemistry, 287(42):35118–35126, 2012.

317 Gaowu Hu, Ye Xu, Wenquan Chen, Jiandong Wang, Chunying Zhao, and Ming Wang. RNA interference of IQ motif containing GTPase-activating protein 3 (IQGAP3) inhibits cell proliferation and invasion in breast carcinoma cells. Oncology Research Featuring Preclinical and Clinical Cancer Therapeutics, 24(6):455–461, 2016.

318 Malwina Michalak, Uwe Warnken, Sabine Andre, Martina Schnolzer, Hans-Joachim Gabius, and Juergen Kopitz. Detection of proteome changes in human colon cancer induced by cell surface binding of growth-inhibitory human galectin-4 using quantitative SILAC-based proteomics. Journal of Proteome Research, 15(12):4412–4422, 2016.

319 Yanqin Gu, Linfeng Lu, Lingfeng Wu, Hao Chen, Wei Zhu, and Yi He. Identification of prognostic genes in kidney renal clear cell carcinoma by RNA-seq data analysis. Molecular Medicine Reports, 15(4):1661–1667, 2017.

320 Andreas Ritter, Mourad Sanhaji, Alexandra Friemel, Susanne Roth, Udo Rolle, Frank Louwen, and Juping Yuan. Functional analysis of phosphorylation of the mitotic centromere-associated kinesin by Aurora B kinase in human tumor cells. Cell Cycle, 14(23):3755–3767, 2015.

321 Yangxing Zhao, Feng Xue, Jinfeng Sun, Shicheng Guo, Hongyu Zhang, and Bijun Qiu et al. Genome-wide methylation profiling of the different stages of hepatitis B virus-related hepatocellular carcinoma development in plasma cell-free DNA reveals potential biomarkers for early detection and high-risk monitoring of hepatocellular carcinoma. Clinical Epigenetics, 6(1):30, 2014.

322 Yong-Chen Lu, Xin Yao, Jessica S. Crystal, Yong F. Li, Mona El-Gamil, and Colin Gross et al. Efficient identification of mutated cancer antigens recognized by T cells associated with durable tumor regressions. Clinical Cancer Research, 20(13):3401–3410, 2014.

323 Cerys S. Manning, Steven Hooper, and Erik A. Sahai. Intravital imaging of SRF and Notch signalling identifies a key role for EZH2 in invasive melanoma cells. Oncogene, 34(33):4320–4332, 2015.

324 Peng Lyu, Shu-Dong Zhang, Hiu-Fung Yuen, Cian M. McCrudden, Qing Wen, Kwok-Wah Chan, and Hang Fai Kwok. Identification of TWIST-interacting genes in prostate cancer. Science China Life Sciences, pages 1–11, 2017.

325 Kimberly A. Krautkramer, Amelia K. Linnemann, Danielle A. Fontaine, Amy L. Whillock, Ted W. Harris, and Gregory J. Schleis et al. Tcf19 is a novel islet factor necessary for proliferation and survival in the INS-1 beta-cell line. American Journal of Physiology - Endocrinology and Metabolism, 305(5):E600–E610, 2013.

326 Sarah E. Flanagan, Ann-Marie Patch, and Sian Ellard. Using SIFT and PolyPhen to predict loss-of-function and gain-of-function mutations. Genetic Testing and Molecular Biomarkers, 14(4):533–537, 2010.

327 Brian H. Shirts, Colin C. Pritchard, and Tom Walsh. Family-specific variants and the limits of human genetics. Trends in Molecular Medicine, 22(11):925–934, 2016.

328 James R. Lupski, John W. Belmont, Eric Boerwinkle, and Richard A. Gibbs. Clan genomics and the complex architecture of human disease. Cell, 147(1):32–43, 2011.

329 Alex Coventry, Lara M. Bull-Otterson, Xiaoming Liu, Andrew G. Clark, Taylor J. Maxwell, and Jacy Crosby et al. Deep resequencing reveals excess rare recent variants consistent with explosive population growth. Nature Communications, 1:131–136, 2010.

330 Daniel J. Turner, Marcos Miretti, Diana Rajan, Heike Fiegler, Nigel P. Carter, and Martyn L. Blayney et al. Germline rates of de novo meiotic deletions and duplications causing several genomic disorders. Nature Genetics, 40(1):90–95, 2008.

331 Adam R. Boyko, Scott H. Williamson, Amit R. Indap, Jeremiah D. Degenhardt, Ryan D. Hernandez, and Kirk E. Lohmueller et al. Assessing the evolutionary impact of amino acid mutations in the human genome. PLOS Genetics, 4(5):e1000083, 2008.

332 Michael Dean and Tarmo Annilo. Evolution of the ATP-binding cassette (ABC) transporter superfamily in vertebrates. Annual Review of Genomics and Human Genetics, 6:123–142, 2005.

333 Natasha Y. Frank, Armen Margaryan, Ying Huang, Tobias Schatton, Ana Maria Waaga-Gasser, and Martin Gasser et al. ABCB5-mediated doxorubicin transport and chemoresistance in human malignant melanoma. Cancer Research, 65(10):4320–4333, 2005.

334 Natasha Y. Frank, Shona S. Pendse, Peter H. Lapchak, Armen Margaryan, Debbie Shlain, and Carsten Doeing et al. Regulation of progenitor cell fusion by ABCB5 P-glycoprotein, a novel human ATP-binding cassette transporter. Journal of Biological Chemistry, 278(47):47156–47165, 2003.

335 Claudina Aleman, Jean-Philippe Annereau, Xing-Jie Liang, Carol O. Cardarelli, Barbara Taylor, and Jun Jie Yin et al. P-glycoprotein, expressed in multidrug resistant cells, is not responsible for alterations in membrane fluidity or membrane potential. Cancer Research, 63(12):3084, 2003.

336 Marine Chartrain, Joelle Riond, Aline Stennevin, Isabelle Vandenberghe, Bruno Gomes, and Laurence Lamant et al. Melanoma chemotherapy leads to the selection of ABCB5-expressing cells. PLOS ONE, 7(5):e36762, 2012.

337 Brian J. Wilson, Karim R. Saab, Jie Ma, Tobias Schatton, Pablo Putz, and Qian Zhan et al. ABCB5 maintains melanoma-initiating cells through a proinflammatory cytokine signaling circuit. Cancer Research, 74(15):4196–4207, 2014.

338 Ge Yang, Ou Jiang, Daiqiong Ling, Xiaoyue Jiang, Pingzong Yuan, and Guang Zeng et al. MicroRNA-522 reverses drug resistance of doxorubicin-induced HT29 colon cancer cell by targeting ABCB5. Molecular Medicine Reports, 12(3):3930–3936, 2015.

339 Elma Zaganjor, Lauren M. Weil, Joshua X. Gonzales, John D. Minna, and Melanie H. Cobb. Ras transformation uncouples the kinesin-coordinated cellular nutrient response. Proceedings of the National Academy of Sciences, 111(29):10568–10573, 2014.

340 Mourad Sanhaji, Claire Therese Friel, Nina-Naomi Kreis, Andrea Kramer, Claudia Martin, and Jonathon Howard et al. Functional and spatial regulation of mitotic centromere-associated kinesin by cyclin-dependent kinase 1. Molecular and Cellular Biology, 30(11):2594–2607, 2010.

341 Mourad Sanhaji, Andreas Ritter, Hannah R. Belsham, Claire T. Friel, Susanne Roth, Frank Louwen, and Juping Yuan. Polo-like kinase 1 regulates the stability of the mitotic centromere-associated kinesin in mitosis. Oncotarget, 5(10):3130–3144, 2014.

342 Andreas Ritter, Mourad Sanhaji, Kerstin Steinhauser, Susanne Roth, Frank Louwen, and Juping Yuan. The activity regulation of the mitotic centromere-associated kinesin by Polo-like kinase 1. Oncotarget, 6(9):6641–6655, 2015.

343 Liangyu Zhang, Hengyi Shao, Yuejia Huang, Feng Yan, Youjun Chu, and Hai Hou et al. PLK1 phosphorylates mitotic centromere-associated kinesin and promotes its depolymerase activity. Journal of Biological Chemistry, 286(4):3033–3046, 2011.

344 Todd Maney, Andrew W. Hunter, Mike Wagenbach, and Linda Wordeman. Mitotic centromere-associated kinesin is important for anaphase chromosome segregation. The Journal of Cell Biology, 142(3):787–801, 1998.

345 Ayana T. Moore, Kathleen E. Rankin, George Von Dassow, Leticia Peris, Michael Wagenbach, and Yulia Ovechkina et al. MCAK associates with the tips of polymerizing microtubules. The Journal of Cell Biology, 169(3):391–397, 2005.

346 Alexander Braun, Kyvan Dang, Felinah Buslig, Michelle A. Baird, Michael W. Davidson, Clare M. Waterman, and Kenneth A. Myers. Rac1 and Aurora A regulate MCAK to polarize microtubule growth in migrating endothelial cells. The Journal of Cell Biology, 206(1):97–112, 2014.

347 Sacha Gnjatic, Yanran Cao, Uta Reichelt, Emre F. Yekebas, Christina Nolker, and Andreas H. Marx et al. NY-CO-58/KIF2C is overexpressed in a variety of solid tumors and induces frequent T cell responses in patients with colorectal cancer. International Journal of Cancer, 127(2):381–393, 2010.

348 Arata Shimo, Chizu Tanikawa, Toshihiko Nishidate, Meng-Lay Lin, Koichi Matsuda, and Jae-Hyun Park et al. Involvement of kinesin family member 2C/mitotic centromere-associated kinesin overexpression in mammary carcinogenesis. Cancer Science, 99(1):62–70, 2008.

349 Yuji Nakamura, Fumiaki Tanaka, Naoto Haraguchi, Koshi Mimori, Tatsuhiko Matsumoto, and Hiroshi Inoue et al. Clinicopathological and biological significance of mitotic centromere-associated kinesin overexpression in human gastric cancer. British Journal of Cancer, 97(4):543–549, 2007.

350 Kazuhiro Ishikawa, Yukio Kamohara, Fumiaki Tanaka, Naoto Haraguchi, Koshi Mimori, Hiroshi Inoue, and Masatomo Mori. Mitotic centromere-associated kinesin is a novel marker for prognosis and lymph node metastasis in colorectal cancer. British Journal of Cancer, 98(11):1824–1829, 2008.

351 Carlo Turano, Sabina Coppari, Fabio Altieri, and Anna Ferraro. Proteins of the PDI family: unpredicted non-ER locations and functions. Journal of Cellular Physiology, 193(2):154–163, 2002.

352 Peter Klappa, Lloyd W. Ruddock, Nigel J. Darby, and Robert B. Freedman. The b’ domain provides the principal peptide-binding site of protein disulfide isomerase but all domains contribute to binding of misfolded proteins. The EMBO Journal, 17(4):927–935, 1998.

353 Xin-Miao Fu and Bao Ting Zhu. Human pancreas-specific protein disulfide isomerase homolog (PDIp) is an intracellular estrogen-binding protein that modulates estrogen levels and actions in target cells. The Journal of Steroid Biochemistry and Molecular Biology, 115(1):20–29, 2009.

354 Roberta Maestro, Angelo P. Dei Tos, Yasuo Hamamori, Svetlana Krasnokutsky, Vittorio Sartorelli, and Larry Kedes et al. Twist is a potential oncogene that inhibits apoptosis. Genes & Development, 13(17):2207–2217, 1999.

355 Eric N. Olson and William H. Klein. bHLH factors in muscle development: dead lines and commitments, what to leave in and what to leave out. Genes & Development, 8(1):1–8, 1994.

356 Elisabeth H. Villavicencio, Joon Won Yoon, Daniel J. Frank, Ernst-Martin Fuchtbauer, David O. Walterhouse, and Philip M. Iannaccone. Cooperative E-box regulation of human GLI1 by TWIST and USF. Genesis, 32(4):247–258, 2002.

357 Erika Rosivatz, Ingrid Becker, Katja Specht, Elena Fricke, Birgit Luber, and Raymonde Busch et al. Differential expression of the epithelial-mesenchymal transition regulators snail, SIP1, and twist in gastric cancer. The American Journal of Pathology, 161(5):1881–1891, 2002.

358 P. Andrew Futreal, Lachlan Coin, Mhairi Marshall, Thomas Down, Timothy Hubbard, and Richard Wooster et al. A census of human cancer genes. Nature Reviews Cancer, 4(3):177–183, 2004.

359 Zhengyan Kan, Bijay S. Jaiswal, Jeremy Stinson, Vasantharajan Janakiraman, Deepali Bhatt, and Howard M. Stern et al. Diverse somatic mutation patterns and pathway alterations in human cancers. Nature, 466(7308):869–873, 2010.

360 Bing Yu, Sandra A. O’Toole, and Ronald J. Trent. Somatic DNA mutation analysis in targeted therapy of solid tumours. Translational Pediatrics, 4(2):125–138, 2015.

361 J. Guillermo Paez, Pasi A. Janne, Jeffrey C. Lee, Sean Tracy, Heidi Greulich, and Stacey Gabriel et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science, 304(5676):1497–1500, 2004.

362 Keith T. Flaherty, Igor Puzanov, Kevin B. Kim, Antoni Ribas, Grant A. McArthur, and Jeffrey A. Sosman et al. Inhibition of mutated, activated BRAF in metastatic melanoma. New England Journal of Medicine, 363(9):809–819, 2010.

363 Astrid Lievre, Jean-Baptiste Bachet, Delphine Le Corre, Valerie Boige, Bruno Landi, and Emile Jean-Francois et al. KRAS mutation status is predictive of response to cetuximab therapy in colorectal cancer. Cancer Research, 66(8):3992–3995, 2006.

364 Martin H. Cohen, Ann Farrell, Robert Justice, and Richard Pazdur. Approval summary: imatinib mesylate in the treatment of metastatic and/or unresectable malignant gastrointestinal stromal tumors. The Oncologist, 14(2):174–180, 2009.

365 Georgina L. Ryland, Maria A. Doyle, David Goode, Samantha E. Boyle, David Y. H. Choong, and Simone M. Rowley et al. Loss of heterozygosity: what is it good for? BMC Medical Genomics, 8(1):45, 2015.

366 Brenda L. Gallie, A. Linn Murphree, Louise C. Strong, and Rhiannon L. White. Expression of recessive alleles by chromosomal mechanisms in retinoblastoma. Nature, 305:779–784, 1983.

367 Sofia D. Merajver, Thomas S. Frank, Junzhe Xu, Trinh M. Pham, Kathleen A. Calzone, and Pamela Bennett-Baker et al. Germline BRCA1 mutations and loss of the wild-type allele in tumors from families with early onset breast and ovarian cancer. Clinical Cancer Research, 1(5):539–544, 1995.

368 Daniel C. Koboldt, Qunyuan Zhang, David E. Larson, Dong Shen, Michael D. McLellan, and Ling Lin et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Research, 22(3):568–576, 2012.

369 Daniel C. Koboldt, David E. Larson, and Richard K. Wilson. Using VarScan 2 for germline variant calling and somatic mutation detection. Current Protocols in Bioinformatics, 44:15.4.1–15.4.17, 2013.

370 Adam B. Olshen, Venkatraman E. Seshan, Robert Lucito, and Michael Wigler. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5(4):557–572, 2004.

371 Richard Redon, Shumpei Ishikawa, Karen R. Fitch, Lars Feuk, George H. Perry, and T. Daniel Andrews et al. Global variation in copy number in the human genome. Nature, 444(7118):444–454, 2006.

372 Adam Shlien and David Malkin. Copy number variations and cancer. Genome Medicine, 1(6):62, 2009.

373 Darrin Stuart and William R. Sellers. Linking somatic genetic alterations in cancer to therapeutics. Current Opinion in Cell Biology, 21(2):304–310, 2009.

374 Rebecca J. Leary, Jimmy C. Lin, Jordan Cummins, Simina Boca, Laura D. Wood, and D. Williams Parsons et al. Integrated analysis of homozygous deletions, focal amplifications, and sequence alterations in breast and colorectal cancers. Proceedings of the National Academy of Sciences, 105(42):16224–16229, 2008.

375 Evelyn Despierre, Matthieu Moisse, Betul Yesilyurt, Jalid Sehouli, Ioana Braicu, and Sven Mahner et al. Somatic copy number alterations predict response to platinum therapy in epithelial ovarian cancer. Gynecologic Oncology, 135(3):415–422, 2014.

376 Hongtao Xu, Xia Zhu, Zulong Xu, Yue Hu, Shiping Bo, Tongjing Xing, and Kuichun Zhu. Non-invasive analysis of genomic copy number variation in patients with hepatocellular carcinoma by next generation DNA sequencing. Journal of Cancer, 6(3):247, 2015.

377 Sara Martoreli Silveira, Isabela Werneck da Cunha, Fabio Albuquerque Marchi, Ariane Fidelis Busso, Ademar Lopes, and Silvia Regina Rogatto. Genomic screening of testicular germ cell tumors from monozygotic twins. Orphanet Journal of Rare Diseases, 9(1):181, 2014.

378 Sukanya Horpaopan, Isabel Spier, Alexander M. Zink, Janine Altmuller, Stefanie Holzapfel, and Andreas Laner et al. Genome-wide CNV analysis in 221 unrelated patients and targeted high-throughput sequencing reveal novel causative candidate genes for colorectal adenomatous polyposis. International Journal of Cancer, 136(6):E578–E589, 2015.

379 Nadine Bonberg, Beate Pesch, Thomas Behrens, Georg Johnen, Dirk Taeger, and Katarzyna Gawrych et al. Chromosomal alterations in exfoliated urothelial cells from bladder cancer cases and healthy men: a prospective screening study. BMC Cancer, 14(1):854, 2014.

380 Barbara A. Weir, Michele S. Woo, Gad Getz, Sven Perner, Li Ding, and Rameen Beroukhim et al. Characterizing the cancer genome in lung adenocarcinoma. Nature, 450(7168), 2007.

381 Astrid M. Eder, Xiaomei Sui, Daniel G. Rosen, Laura K. Nolden, Kwai Wa Cheng, and John P. Lahad et al. Atypical PKCι contributes to poor prognosis through loss of apical-basal polarity and cyclin E overexpression in ovarian cancer. Proceedings of the National Academy of Sciences of the United States of America, 102(35):12519–12524, 2005.

382 Idoya Lahortiga, Kim De Keersmaecker, Pieter Van Vlierberghe, Carlos Graux, Barbara Cauwelier, and Frederic Lambert et al. Duplication of the MYB oncogene in T cell acute lymphoblastic leukemia. Nature Genetics, 39(5):593–595, 2007.

383 Lars Zender, Mona S. Spector, Wen Xue, Peer Flemming, Carlos Cordon-Cardo, and John Silke et al. Identification and validation of oncogenes in liver cancer using an integrative oncogenomic approach. Cell, 125(7):1253–1267, 2006.

384 Charles G. Mullighan, Salil Goorha, Ina Radtke, Christopher B. Miller, Elaine Coustan-Smith, and James D. Dalton et al. Genome-wide analysis of genetic alterations in acute lymphoblastic leukaemia. Nature, 446(7137):758–764, 2007.

385 Ruprecht Wiedemeyer, Cameron Brennan, Timothy P. Heffernan, Yonghong Xiao, John Mahoney, and Alexei Protopopov et al. Feedback circuit among INK4 tumor suppressors constrains human glioblastoma development. Cancer Cell, 13(4):355–364, 2008.

386 The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216):1061–1068, 2008.

387 Dhananjay Chitale, Yixuan Gong, Barry S. Taylor, Stephen Broderick, Cameron Brennan, and Romel Somwar et al. An integrated genomic analysis of lung cancer reveals loss of DUSP4 in EGFR-mutant tumors. Oncogene, 28(31):2773–2783, 2009.

388 Erin D. Pleasance, R. Keira Cheetham, Philip J. Stephens, David J. McBride, Sean J. Humphray, and Chris D. Greenman et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature, 463(7278):191–196, 2010.

389 Christopher T. Saunders, Wendy S.W. Wong, Sajani Swamy, Jennifer Becq, Lisa J. Murray, and R. Keira Cheetham. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics, 28(14):1811–1817, 2012.

390 Heather E. Wheeler, Michael L. Maitland, M. Eileen Dolan, Nancy J. Cox, and Mark J. Ratain. Cancer pharmacogenomics: strategies and challenges. Nature Reviews Genetics, 14(1):23–34, 2013.

391 Wanjuan Yang, Jorge Soares, Patricia Greninger, Elena J. Edelman, Howard Lightfoot, and Simon Forbes et al. Genomics of drug sensitivity in cancer (GDSC): a resource for therapeutic biomarker discovery in cancer cells. Nucleic Acids Research, 41(D1):D955–D961, 2013.

392 Lin Wu, Nancy Patten, Carl T. Yamashiro, and Buena Chui. Extraction and amplification of DNA from formalin-fixed, paraffin-embedded tissues. Applied Immunohistochemistry & Molecular Morphology, 10(3):269–274, 2002.

393 Sarah Munchel, Yen Hoang, Yue Zhao, Joseph Cottrell, Brandy Klotzle, and Andrew K. Godwin et al. Targeted or whole genome sequencing of formalin fixed tissue samples: potential applications in cancer genomics. Oncotarget, 6(28):25943–25961, 2015.

394 Simon Andrews. FastQC: a quality control tool for high throughput sequence data, 2010.

395 Anthony M. Bolger, Marc Lohse, and Bjoern Usadel. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15):2114–2120, 2014.

396 Heng Li. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Broad Institute of Harvard and MIT, pages 1–3, 2013.

397 Genome Research Limited. SAMtools: utilities for the Sequence Alignment/Map (SAM) format. URL: http://www.htslib.org/doc/samtools.html, 2016.

398 Venkatraman E. Seshan and Adam B. Olshen. DNAcopy: a package for analyzing DNA copy data. Department of Epidemiology and Biostatistics, Memorial Sloan-Kettering Cancer Center, 2007.

399 Yan Guo, Fei Ye, Quanghu Sheng, Travis Clark, and David C. Samuels. Three-stage quality control strategies for DNA re-sequencing data. Briefings in Bioinformatics, 15(6):879–889, 2014.

400 Thomas L. Clarke, Maria Pilar Sanchez-Bailon, Kelly Chiang, John J. Reynolds, Joaquin Herrero-Ruiz, and Tiago M. Bandeiras et al. PRMT5-dependent methylation of the TIP60 coactivator RUVBL1 is a key regulator of homologous recombination. Molecular Cell, 65(5):900–916, 2017.

401 Bhavna Kumar, Arti Yadav, Nicole V. Brown, Songzhu Zhao, Michael J. Cipolla, and Paul E. Wakely et al. Nuclear PRMT5, cyclin D1 and IL-6 are associated with poor outcome in oropharyngeal squamous cell carcinoma patients and is inversely associated with p16-status. Oncotarget, 8(9):14847–14859, 2017.

402 Hao Yang, Xiaoping Zhao, Li Zhao, Liu Liu, Jiajin Li, and Wenzhi Jia et al. PRMT5 competitively binds to CDK4 to promote G1-S transition upon glucose induction in hepatocellular carcinoma. Oncotarget, 7(44):72131, 2016.

403 Xiaxin Deng, Guoqiang Shao, Hong-Tao Zhang, Chunyan Li, Dajie Zhang, and Li Cheng et al. Protein arginine methyltransferase 5 functions as an epigenetic activator of the androgen receptor to promote prostate cancer cell growth. Oncogene, 36(9):1223–1231, 2017.

404 Yan Sheng, Hongtao Wang, Dongchen Liu, Cheng Zhang, Yupeng Deng, and Fan Yang et al. Methylation of tumor suppressor gene CDH13 and SHP1 promoters and their epigenetic regulation by the UHRF1/PRMT5 complex in endometrial carcinoma. Gynecologic Oncology, 140(1):145–151, 2016.

405 H. Chen, Benjamin Lorton, Vijayalaxmi Gupta, and David Shechter. A TGFβ-PRMT5-MEP50 axis regulates cancer cell invasion through histone H3 and H4 arginine methylation coupled transcriptional activation and repression. Oncogene, 36(3):373–386, 2017.

406 Annie Rochette, Nadia Boufaied, Eleonora Scarlata, Lucie Hamel, Fadi Brimo, and Hayley C. Whitaker et al. Asporin is a stromally expressed marker associated with prostate cancer progression. British Journal of Cancer, 116(6):775–784, 2017.

407 Paula J. Hurley, Debasish Sundi, Brian Shinder, Brian W. Simons, Robert M. Hughes, and Rebecca M. Miller et al. Germline variants in asporin vary by race, modulate the tumor microenvironment, and are differentially associated with metastatic prostate cancer. Clinical Cancer Research, 22(2):448, 2016.

408 Pamela Maris, Arnaud Blomme, Ana Perez Palacios, Brunella Costanza, Akeila Bellahcene, and Elettra Bianchi et al. Asporin is a fibroblast-derived TGF-β1 inhibitor and a tumor suppressor associated with good prognosis in breast cancer. PLOS Medicine, 12(9):e1001871, 2015.

409 Qian Ding, Mei Zhang, and Can Liu. Asporin participates in gastric cancer cell growth and migration by influencing EGF receptor signaling. Oncology Reports, 33(4):1783–1790, 2015.

410 Rika Satoyoshi, Sei Kuriyama, Namiko Aiba, Masakazu Yashiro, and Masamitsu Tanaka. Asporin activates coordinated invasion of scirrhous gastric cancer and cancer-associated fibroblasts. Oncogene, 34(5):650–660, 2015.

411 Andrei Turtoi, Davide Musmeci, Yinghong Wang, Bruno Dumont, Joan Somja, and Generoso Bevilacqua et al. Identification of novel accessible proteins bearing diagnostic and therapeutic potential in human pancreatic ductal adenocarcinoma. Journal of Proteome Research, 10(9):4302–4313, 2011.

412 Thai H. Ho, Daniel J. Serie, Mansi Parasramka, John C. Cheville, Brian M. Bot, and Weihong Tan et al. Differential gene expression profiling of matched primary renal cell carcinoma and metastases reveals upregulation of extracellular matrix genes. Annals of Oncology, 28(3):604–610, 2017.

413 Magdalena Zakrzewska, Wojciech Fendler, Krzysztof Zakrzewski, Beata Sikorska, Wieslawa Grajkowska, and Bozenna Dembowska-Baginska et al. Altered microRNA expression is associated with tumor grade, molecular background and outcome in childhood infratentorial ependymoma. PLOS ONE, 11(7):e0158464, 2016.

414 John Richard McPherson, Choon-Kiat Ong, Cedric Chuan-Young Ng, Vikneswari Rajasegaran, Hong-Lee Heng, and Willie Shun-Shing Yu et al. Whole-exome sequencing of breast cancer, malignant peripheral nerve sheath tumor and neurofibroma from a patient with neurofibromatosis type 1. Cancer Medicine, 4(12):1871–1878, 2015.

415 Pooja Ganguly and Niladri Ganguly. Transcriptomic analyses of genes differentially expressed by high-risk and low-risk human papilloma virus E6 oncoproteins. VirusDisease, 26(3):105–116, 2015.

416 O.A. Simonova, Ekaterina B. Kuznetsova, Elena V. Poddubskaya, Tatiana V. Kekeeva, R.A. Kerimov, and I.D. Trotsenko et al. DNA methylation in the promoter regions of the laminin family genes in normal and breast carcinoma tissues. Molecular Biology, 49(4):598–607, 2015.

417 Anbarasu Lourdusamy, Ruman Rahman, Stuart Smith, and Richard Grundy. microRNA network analysis identifies miR-29 cluster as key regulator of LAMA2 in ependymoma. Acta Neuropathologica Communications, 3(1):26–30, 2015.

418 Radoslaw Januchowski, Piotr Zawierucha, Marcin Rucinski, and Maciej Zabel. Microarray-based detection and expression analysis of extracellular matrix proteins in drug-resistant ovarian cancer cell lines. Oncology Reports, 32(5):1981–1990, 2014.

419 Suchit Jhunjhunwala, Zhaoshi Jiang, Eric W. Stawiski, Florian Gnad, Jinfeng Liu, and Oleg Mayba et al. Diverse modes of genomic alteration in hepatocellular carcinoma. Genome Biology, 15(8):436–450, 2014.

420 Radoslaw Januchowski, Piotr Zawierucha, Marcin Rucinski, Michal Nowicki, and Maciej Zabel. Extracellular matrix proteins expression profiling in chemoresistant variants of the A2780 ovarian cancer cell line. BioMed Research International, 2014:1–9, 2014.

421 Akiko Niibori-Nambu, Uichi Midorikawa, Souhei Mizuguchi, Takuichiro Hide, Minako Nagai, and Yoshihiro Komohara et al. Glioma initiating cells form a differentiation niche via the induction of extracellular matrices and integrin αV. PLOS ONE, 8(5):e59558, 2013.

422 Rong Sheng Ni, Xiaohui Shen, Xiaoyun Qian, Chenjie Yu, Haiyan Wu, and Xia Gao. Detection of differentially expressed genes and association with clinicopathological features in laryngeal squamous cell carcinoma. Oncology Letters, 4(6):1354–1360, 2012.

423 Dwain Mefford and Joel Mefford. Stromal genes add prognostic information to proliferation and histoclinical markers: a basis for the next generation of breast cancer gene signatures. PLOS ONE, 7(6):e37646, 2012.

424 Sunwoo Lee, Taejeong Oh, Hyuncheol Chung, Sunyoung Rha, Changjin Kim, and Youngho Moon et al. Identification of GABRA1 and LAMA2 as new DNA methylation markers in colorectal cancer. International Journal of Oncology, 40(3):889–898, 2012.

425 Yizhu Lyu, Jiacheng Lou, Yan Yang, Jiuxing Feng, Yuchao Hao, and Shuyu Huang et al. Dysfunction of the WT1-MEG3 signaling promotes AML leukemogenesis via p53 dependent and independent pathways. Leukemia, 2017.

426 Piotr Ciesielski, Pawel Jozwiak, Katarzyna Wojcik-Krowiranda, Ewa Forma, Lukasz Cwonda, and Sylwia Szczepaniec et al. Differential expression of ten-eleven translocation genes in endometrial cancers. Tumor Biology, 39(3):1–8, 2017.

427 Yoko Kubuki, Takumi Yamaji, Tomonori Hidaka, Takuro Kameda, Kotaro Shide, and Masaaki Sekine et al. TET2 mutation in diffuse large B-cell lymphoma. Journal of Clinical and Experimental Hematopathology, 56(3):145–149, 2017.

428 Lars Bullinger, Konstanze Dohner, and Hartmut Dohner. Genomics of acute myeloid leukemia diagnosis and pathways. Journal of Clinical Oncology, 35(9):934–946, 2017.

429 Gholamreza Bahari, Mohammad Hashemi, Majid Naderi, and Mohsen Taheri. TET2 promoter DNA methylation and expression in childhood acute lymphoblastic leukemia. Asian Pacific Journal of Cancer Prevention, 17(8):3959–3962, 2016.

430 Satoshi Chiba. Significance of TET2 mutations in myeloid and lymphoid neoplasms. [Rinsho Ketsueki] The Japanese Journal of Clinical Hematology, 57(6):715–722, 2016.

431 Joseph H.R. Hetmanski, Egor Zindy, Jean-Marc Schwartz, and Patrick T. Caswell. A MAPK-driven feedback loop suppresses Rac activity to promote RhoA-driven cancer cell invasion. PLOS Computational Biology, 12(5):e1004909, 2016.

432 Pascale Monzo, Yuk Kien Chong, Charlotte Guetta-Terrier, Anitha Krishnasamy, Sharvari R. Sathe, and Evelyn K.F. Yim et al. Mechanical confinement triggers glioma linear migration dependent on formin FHOD3. Molecular Biology of the Cell, 27(8):1246–1261, 2016.

433 Li Chai, Jia Li, and Zhongwei Lv. An integrated analysis of cancer genes in thyroid cancer. Oncology Reports, 35(2):962–970, 2016.

434 Nikki R. Paul, Jennifer L. Allen, Anna Chapman, Maria Morlan-Mairal, Egor Zindy, and Guillaume Jacquemet et al. α5β1 integrin recycling promotes Arp2/3-independent cancer cell invasion via the formin FHOD3. The Journal of Cell Biology, 210(6):1013–1031, 2015.

435 Deborah French, Wenjian Yang, Cheng Cheng, Susana C. Raimondi, Charles G. Mullighan, and James R. Downing et al. Acquired variation outweighs inherited variation in whole genome analysis of methotrexate polyglutamate accumulation in leukemia. Blood, 113(19):4512–4520, 2009.

436 Zongping Wang, Jie Kang, Xianzhao Deng, Bomin Guo, Bo Wu, and Youben Fan. Knockdown of GATAD2A suppresses cell proliferation in thyroid cancer in vitro. Oncology Reports, 37(4):2147–2152, 2017.

437 Cornelia G. Spruijt, Martijn S. Luijsterburg, Roberta Menafra, Rik G.H. Lindeboom, Pascal W.T.C. Jansen, and Raghu Ram Edupuganti et al. ZMYND8 co-localizes with NuRD on target genes and regulates poly(ADP-Ribose)-dependent recruitment of GATAD2A/NuRD to sites of DNA damage. Cell Reports, 17(3):783–798, 2016.

438 Siddhartha P. Kar, Jonathan Beesley, Ali Amin Al Olama, Kyriaki Michailidou, Jonathan Tyrer, and Zsofia Kote-Jarai et al. Genome-wide meta-analyses of breast, ovarian, and prostate cancer association studies identify multiple new susceptibility loci shared by at least two cancer types. Cancer Discovery, 6(9):1052–1067, 2016.

439 Venkatadri Kolla, Koumudi Naraparaju, Tiangang Zhuang, Mayumi Higashi, Sriharsha Kolla, Gerd A. Blobel, and Garrett M. Brodeur. The tumour suppressor CHD5 forms a NuRD-type chromatin remodelling complex. Biochemical Journal, 468(2):345–352, 2015.

440 Morgan P. Torchy, Ali Hamiche, and Bruno P. Klaholz. Structure and function insights into the NuRD chromatin remodeling complex. Cellular and Molecular Life Sciences, 72(13):2491–2507, 2015.

441 Sarah E. Mahoney, Zizhen Yao, C. Chip Keyes, Stephen J. Tapscott, and Scott J. Diede. Genome-wide DNA methylation studies suggest distinct DNA methylation patterns in pediatric embryonal and alveolar rhabdomyosarcomas. Epigenetics, 7(4):400–408, 2012.

442 Eric I. Zimmerman, Alice A. Gibson, Shuiying Hu, Aksana Vasilyeva, Shelley J. Orwick, and Guoqing Du et al. Multikinase inhibitors induce cutaneous toxicity through OAT6-mediated uptake and MAP3K7-driven cell death. Cancer Research, 76(1):117, 2016.

443 Fanfan Zhou and Guofeng You. Molecular insights into the structure-function relationship of organic anion transporters OATs. Pharmaceutical Research, 24(1):28–36, 2007.

444 Wei Cao, Enguang Ma, Li Zhou, Tan Yuan, and Chunying Zhang. Exploring the FGFR3-related oncogenic mechanism in bladder cancer using bioinformatics strategy. World Journal of Surgical Oncology, 15(1):66–73, 2017.

445 Vivien Koh, Hsueh Yin Kwan, Woei Loon Tan, Tzia Liang Mah, and Wei Peng Yong. Knockdown of POLA2 increases gemcitabine resistance in lung cancer cells. BMC Genomics, 17(13):1029–138, 2016.

446 Scooter Willis, Victor M. Villalobos, Olivier Gevaert, Mark Abramovitz, Casey Williams, Branimir I. Sikic, and Brian Leyland-Jones. Single gene prognostic biomarkers in ovarian cancer: a meta-analysis. PLOS ONE, 11(2):e0149183, 2016.

447 Guhyun Kang, Hongseok Yun, Choong-Hyun Sun, Inho Park, Seungmook Lee, and Jekeun Kwon et al. Integrated genomic analyses identify frequent gene fusion events and VHL inactivation in gastrointestinal stromal tumors. Oncotarget, 7(6):6538–6551, 2016.

448 Tzia Liang Mah, Xin Ning Adeline Yap, Vachiranee Limviphuvadh, Nanpu Li, Srinath Sridharan, and Vellaisemy Kuralmani et al. Novel SNP improves differential survivability and mortality in non-small cell lung cancer patients. BMC Genomics, 15(9):S20–S27, 2014.

449 Oluf Dimitri Roe, Adam Szulkin, Endre Anderssen, Arnar Flatberg, Helmut Sandeck, and Tore Amundsen et al. Molecular resistance fingerprint of pemetrexed and platinum in a long-term survivor of mesothelioma. PLOS ONE, 7(8):e40521, 2012.

450 Fotis A. Asimakopoulos, Pesach J. Shteper, Svetlana Krichevsky, Eitan Fibach, Aaron Polliack, and Eliezer Rachmilewitz et al. ABL1 methylation is a distinct molecular event associated with clonal evolution of chronic myeloid leukemia. Blood, 94(7):2452–2460, 1999.

451 Adina Aviram, Bruria Witenberg, Mati Shaklai, and Dorit Blickstein. Detection of methylated ABL1 promoter in Philadelphia-negative myeloproliferative disorders. Blood Cells, Molecules, and Diseases, 30(1):100–106, 2003.

452 Baodong Sun, Guanchao Jiang, Muhammad-Ali A. Zaydan, Vincent F. La Russa, Hana Safah, and Melanie Ehrlich. ABL1 promoter methylation can exist independently of BCR-ABL transcription in chronic myeloid leukemia hematopoietic progenitors. Cancer Research, 61(18):6931–6937, 2001.

453 Jing Jin Gu, Clay Rouse, Xia Xu, Jun Wang, Mark W. Onaitis, and Ann Marie Pendergast. Inactivation of ABL kinases suppresses non-small cell lung cancer metastasis. JCI Insight, 1(21):1–16, 2016.

454 Jean-Philippe Foy, Curtis R. Pickering, Vassiliki A. Papadimitrakopoulou, Jaroslav Jelinek, Steven H. Lin, and William N. William et al. New DNA methylation markers and global DNA hypomethylation are associated with oral cancer development. Cancer Prevention Research, 8(11):1027–1035, 2015.

455 Eun-Joon Lee, Prakash Rath, Jimei Liu, Dungsung Ryu, Lirong Pei, and Satish K. Noonepalle et al. Identification of global DNA methylation signatures in glioblastoma-derived cancer stem cells. Journal of Genetics and Genomics, 42(7):355–371, 2015.

456 Jean-Pierre Roperch, Karim Benzekri, Hicham Mansour, and Roberto Incitti. Improved amplification efficiency on stool samples by addition of spermidine and its use for non-invasive detection of colorectal cancer. BMC Biotechnology, 15(1):41–49, 2015.

457 Nadia Ashour, Javier C. Angulo, Guillermo Andres, Raul Alelu, Ana Gonzalez-Corpas, and Maria V. Toledo et al. A DNA hypermethylation profile reveals new potential biomarkers for prostate cancer diagnosis and prognosis. The Prostate, 74(12):1171–1182, 2014.

458 Bodour Salhia, Jeff Kiefer, Julianna T.D. Ross, Raghu Metapally, Rae Anne Martinez, and Kyle N. Johnson et al. Integrated genomic and epigenomic analysis of breast cancer brain metastasis. PLOS ONE, 9(1):e85448, 2014.

459 Jean-Pierre Roperch, Roberto Incitti, Solene Forbin, Floriane Bard, Hicham Mansour, and Farida Mesli et al. Aberrant methylation of NPY, PENK, and WIF1 as a promising marker for blood-based diagnosis of colorectal cancer. BMC Cancer, 13(1):566–576, 2013.

460 Masahiro Shitani, Shigeru Sasaki, Noriyuki Akutsu, Hideyasu Takagi, Hiromu Suzuki, and Masanori Nojima et al. Genome-wide analysis of DNA methylation identifies novel cancer-related genes in hepatocellular carcinoma. Tumor Biology, 33(5):1307–1317, 2012.

461 Yugo Kishida, Atsushi Natsume, Yutaka Kondo, Ichiro Takeuchi, Byonggu An, and Yasuyuki Okamoto et al. Epigenetic subclassification of meningiomas based on genome-wide DNA methylation analyses. Carcinogenesis, 33(2):436–441, 2012.

462 Woonbok Chung, Jolanta Bondaruk, Jaroslav Jelinek, Yair Lotan, Shoudan Liang, Bogdan Czerniak, and Jean-Pierre J. Issa. Detection of bladder cancer using novel DNA methylation biomarkers in urine sediments. Cancer Epidemiology Biomarkers & Prevention, 20(7):1483–1491, 2011.

463 Ji Un Kang, Sun Hoe Koo, Kye Chul Kwon, Jong Woo Park, and Jin Man Kim. Gain at chromosomal region 5p15.33, containing TERT, is the most frequent genetic event in early stages of non-small cell lung cancer. Cancer Genetics and Cytogenetics, 182(1):1–11, 2008.

464 Yunyu Chen, Jing Zhang, Dongsheng Li, Jiandong Jiang, Yanchang Wang, and Shuyi Si. Identification of a novel Polo-like kinase 1 inhibitor that specifically blocks the functions of Polo-Box domain. Oncotarget, 8(1):1234–1246, 2016.

465 Baochi Ou, Jingkun Zhao, Shaopei Guan, Xiongzhi Wangpu, Congcong Zhu, and Yaping Zong et al. PLK2 promotes tumor growth and inhibits apoptosis by targeting Fbxw7/Cyclin E in colorectal cancer. Cancer Letters, 380(2):457–466, 2016.

466 Fei Liu, Shimeng Zhang, Zhen Zhao, Xinru Mao, Jinlan Huang, and Zixian Wu et al. MicroRNA-27b up-regulated by human papillomavirus 16 E7 promotes proliferation and suppresses apoptosis by targeting polo-like kinase2 in cervical cancer. Oncotarget, 7(15):19666–19679, 2016.

467 Jia-Hui Xu, Shi-Lian Hu, Guo-Dong Shen, and Gan Shen. Tumor suppressor genes and their underlying interactions in paclitaxel resistance in cancer therapy. Cancer Cell International, 16(1):13–23, 2016.

468 M.V. Ramana Reddy, Balireddy Akula, Shashidhar Jatiani, Rodrigo Vasquez-Del Carpio, Vinay K. Billa, and Muralidhar R. Mallireddigari et al. Discovery of 2-(1H-indol-5-ylamino)-6-(2,4-difluorophenylsulfonyl)-8-methylpyrido[2,3-d]pyrimidin-7(8H)-one (7ao) as a potent selective inhibitor of Polo like kinase 2 (PLK2). Bioorganic & Medicinal Chemistry, 24(4):521–544, 2016.

469 Zheng Bo Hu, Xiao Hong Liao, Zun Ying Xu, Xiao Yang, Chao Dong, An Min Jin, and Hai Lu. PLK2 phosphorylates and inhibits enriched TAp73 in human osteosarcoma cells. Cancer Medicine, 5(1):74–87, 2016.

470 Li Ying Liu, Wei Wang, Ling Yu Zhao, Bo Guo, Juan Yang, and Xiao Ge Zhao et al. Silencing of polo-like kinase 2 increases cell proliferation and decreases apoptosis in SGC-7901 gastric cancer cells. Molecular Medicine Reports, 11(4):3033–3038, 2015.

471 Cheng-Wei Li and Bor-Sen Chen. Investigating core genetic-and-epigenetic cell cycle networks for stemness and carcinogenic mechanisms, and cancer drug design using big database mining and genome-wide next-generation sequencing data. Cell Cycle, 15(19):2593–2607, 2016.

472 Vishal Kothari, Iris Wei, Sunita Shankar, Shanker Kalyana-Sundaram, Lidong Wang, and Linda W. Ma et al. Outlier kinase expression by RNA sequencing as targets for precision therapy. Cancer Discovery, 3(3):280–293, 2013.

473 Tobias Berg, Gesine Bug, Oliver G. Ottmann, and Klaus Strebhardt. Polo-like kinases in AML. Expert Opinion on Investigational Drugs, 21(8):1069–1074, 2012.

474 Helen M. Coley, Eleftheria Hatzimichael, Sarah Blagden, Iain McNeish, Alastair Thompson, Tim Crook, and Nelofer Syed. Polo like kinase 2 tumour suppressor and cancer biomarker: new perspectives on drug sensitivity/resistance in ovarian cancer. Oncotarget, 3(1):78–83, 2012.

475 Lalji K. Gediya, Aakanksha Khandelwal, Jyoti Patel, Aashvini Belosay, Gauri Sabnis, and Jhalak Mehta et al. Design, synthesis, and evaluation of novel mutual prodrugs (hybrid drugs) of all-trans-retinoic acid and histone deacetylase inhibitors with enhanced anticancer activities in breast and prostate cancer cells in vitro. Journal of Medicinal Chemistry, 51(13):3895–3904, 2008.

476 Mon-Ju Wu, Mi Ra Kim, Yu-Shan Chen, Jun-Yi Yang, and Chia-Jung Chang. Retinoic acid directs breast cancer cell state changes through regulation of TET2-PKC pathway. Oncogene, 36(22):3193–3206, 2017.

477 Liyan Qu and Xiuwen Tang. Bexarotene: a promising anticancer agent. Cancer Chemotherapy and Pharmacology, 65(2):201–205, 2010.

478 Martin P. Powers, Wei-Lien Wang, Vivian S. Hernandez, Kayuri S. Patel, Dina C. Lev, Alexander J. Lazar, and Dolores H. Lopez-Terrada. Detection of myxoid liposarcoma-associated FUS-DDIT3 rearrangement variants including a newly identified breakpoint using an optimized RT-PCR assay. Modern Pathology, 23(10):1307–1315, 2010.

479 Carola Andersson, Henrik Fagman, Magnus Hansson, and Fredrik Enlund. Profiling of potential driver mutations in sarcomas by targeted next generation sequencing. Cancer Genetics, 209(4):154–160, 2016.

480 Yoshinao Oda, Hidetaka Yamamoto, Tomonari Takahira, Chikashi Kobayashi, Kenichi Kawaguchi, and Naomi Tateishi et al. Frequent alteration of p16INK4a/p14ARF and p53 pathways in the round cell component of myxoid/round cell liposarcoma: p53 gene alterations and reduced p14ARF expression both correlate with poor prognosis. The Journal of Pathology, 207(4):410–421, 2005.

481 Jordi Barretina, Barry S. Taylor, Shantanu Banerji, Alexis H. Ramos, Mariana Lagos-Quintana, and Penelope L. DeCarolis et al. Subtype-specific genomic alterations define new targets for soft-tissue sarcoma therapy. Nature Genetics, 42(8):715–721, 2010.

482 Elizabeth G. Demicco, Keila E. Torres, Markus P. Ghadimi, Chiara Colombo, Svetlana Bolshakov, and Aviad Hoffman et al. Involvement of the PI3K/Akt pathway in myxoid/round cell liposarcoma. Modern Pathology, 25(2):212–221, 2012.

483 Tsuyoshi Saito, Keisuke Akaike, Aiko Kurisaki-Arakawa, Midori Toda-Ishii, Kenta Mukaihara, and Yoshiyuki Suehara et al. TERT promoter mutations are rare in bone and soft tissue sarcomas of Japanese patients. Molecular and Clinical Oncology, 4(1):61–64, 2016.

484 Christian Koelsche, Marcus Renner, Wolfgang Hartmann, Regine Brandt, Burkhard Lehner, and Nina Waldburger et al. TERT promoter hotspot mutations are recurrent in myxoid liposarcomas but rare in other soft tissue sarcoma entities. Journal of Experimental & Clinical Cancer Research, 33(1):33–40, 2014.

485 Marieke A. de Graaff, Jamie S.E. Yu, Hannah C. Beird, Davis R. Ingram, Theresa Nguyen, and Jeffrey Juehui Liu et al. Establishment and characterization of a new human myxoid liposarcoma cell line (DL-221) with the FUS-DDIT3 translocation. Laboratory Investigation, 96(8):885–894, 2016.

486 Cristina R. Antonescu, Sylvia J. Tschernyavsky, Ramona Decuseara, Denis H. Leung, James M. Woodruff, and Murray F. Brennan et al. Prognostic impact of P53 status, TLS-CHOP fusion transcript structure, and histological grade in myxoid liposarcoma. Clinical Cancer Research, 7(12):3977–3987, 2001.

487 Aviad Hoffman, Markus P.H. Ghadimi, Elizabeth G. Demicco, Chad J. Creighton, Keila Torres, and Chiara Colombo et al. Localized and metastatic myxoid/round cell liposarcoma. Cancer, 119(10):1868–1877, 2013.

488 Christine G. Joseph, Heejung Hwang, Yuchen Jiao, Laura D. Wood, Isaac Kinde, and Jian Wu et al. Exomic analysis of myxoid liposarcomas, synovial sarcomas, and osteosarcomas. Genes, Chromosomes and Cancer, 53(1):15–24, 2014.

489 Sarah Uboldi, Enrica Calura, Luca Beltrame, Ilaria Fuso Nerini, Sergio Marchini, and Duccio Cavalieri et al. A systems biology approach to characterize the regulatory networks leading to trabectedin resistance in an in vitro model of myxoid liposarcoma. PLOS ONE, 7(4):e35423, 2012.

490 Walter Pavicic, Esa Perkio, Sippy Kaur, and Paivi Peltomaki. Altered methylation at microRNA-associated CpG islands in hereditary and sporadic carcinomas: a methylation-specific multiplex ligation-dependent probe amplification (MS-MLPA)-based approach. Molecular Medicine, 17(7-8):726–735, 2011.

491 Lina Albitar, Gavin Pickett, Marilee Morgan, Suzy Davies, and Kimberly K. Leslie. Models representing type I and type II human endometrial cancers: Ishikawa H and Hec50co cells. Gynecologic Oncology, 106(1):52–64, 2007.

492 Karin Milde-Langosch, Christoph Goemann, Carola Methner, Gabriele Rieck, Ana-Maria Bamberger, and Thomas Loning. Expression of Rb2/p130 in breast and endometrial cancer: correlations with hormone receptor status. British Journal of Cancer, 85(4):546–551, 2001.

493 Amit Nahum, Keren Hirsch, Michael Danilenko, Colin K.W. Watts, Owen W.J. Prall, Joseph Levy, and Yoav Sharoni. Lycopene inhibition of cell cycle progression in breast and endometrial cancer cells is associated with reduction in cyclin D levels and retention of p27Kip1 in the cyclin E-cdk2 complexes. Oncogene, 20(26):3428, 2001.

494 Tommaso Susini, Daniela Massi, Milena Paglierani, Valeria Masciullo, Giovanni Scambia, and Antonio Giordano et al. Expression of the retinoblastoma-related gene Rb2/p130 is downregulated in atypical endometrial hyperplasia and adenocarcinoma. Human Pathology, 32(4):360–367, 2001.

495 Tommaso Susini, Feliciano Baldi, Candace M. Howard, Alfonso Baldi, Gianluigi Taddei, and Daniela Massi et al. Expression of the retinoblastoma-related gene Rb2/p130 correlates with clinical outcome in endometrial cancer. Journal of Clinical Oncology, 16(3):1085–1093, 1998.

496 Mina Massaro-Giordano, Gianluca Baldi, Antonio De Luca, Alfonso Baldi, and Antonio Giordano. Differential expression of the retinoblastoma gene family members in choroidal melanoma: prognostic significance. Clinical Cancer Research, 5(6):1455, 1999.

497 Maria Pardo, Antonio Pineiro, Maria de la Fuente, Angel Garcia, Sripadi Prabhakar, and Nicole Zitzmann et al. Abnormal cell cycle regulation in primary human uveal melanoma cultures. Journal of Cellular Biochemistry, 93(4):708–720, 2004.

498 Vasily A. Yakovlev. Nitric oxide-dependent downregulation of BRCA1 expression promotes genetic instability. Cancer Research, 73(2):706, 2013.

499 Caterina Cinti, Marcella Macaluso, and Antonio Giordano. Tumor-specific exon 1 mutations could be the ‘hit event’ predisposing Rb2/p130 gene to epigenetic silencing in lung cancer. Oncogene, 24(38):5821–5826, 2005.

500 Hu Xue Jun, Akihiko Gemma, Yoko Hosoya, Kuniko Matsuda, Michiya Nara, and Yukio Hosomi et al. Reduced transcription of the RB2/p130 gene in human lung cancer. Molecular Carcinogenesis, 38(3):124–129, 2003.

501 Giuseppe Russo, Pier Paolo Claudio, Yan Fu, Peter Stiegler, Zailin Yu, Marcella Macaluso, and Antonio Giordano. pRB2/p130 target genes in non-small lung cancer cells identified by microarray analysis. Oncogene, 22(44):6959–6969, 2003.

502 Sanjay Modi, Akihito Kubo, Herbert Oie, Amy B. Coxon, Ahad Rehmatulla, and Frederic J. Kaye. Protein expression of the RB-related gene family and SV40 large T antigen in mesothelioma and lung cancer. Oncogene, 19(40):4632, 2000.

503 Pier Paolo Claudio, Mario Caputi, and Antonio Giordano. The RB2/p130 gene: the latest weapon in the war against lung cancer? Clinical Cancer Research, 6(3):754, 2000.

504 Alfonso Baldi, Vincenzo Esposito, Antonio De Luca, Yan Fu, Ilernando Meoli, and Giovan G. Giordano et al. Differential expression of Rb2/p130 and p107 in normal human tissues and in primary lung cancer. Clinical Cancer Research, 3(10):1691, 1997.

505 Luciano Mutti, Antonio De Luca, Pier Paolo Claudio, Giuseppe Convertino, Michele Carbone, and Antonio Giordano. Simian virus 40-like DNA sequences and large-T antigen-retinoblastoma family protein pRb2/p130 interaction in human mesothelioma. Developments in Biological Standardization, 94:47–53, 1997.

506 Kristian Helin, Karin Holm, Anita Niebuhr, Hans Eiberg, Niels Tommerup, and Susanne Hougaard et al. Loss of the retinoblastoma protein-related p130 protein in small cell lung carcinoma. Proceedings of the National Academy of Sciences of the United States of America, 94(13):6933–6938, 1997.

507 Steven G. Gray, Xiang Guo, Darek Kedra, Bin T. Teh, and Hua-Qing Min. Correspondence re: P.P. Claudio et al., Mutations in the Retinoblastoma-related Gene RB2/p130 in Primary Nasopharyngeal Carcinoma. Cancer Res., 60: 8-12, 2000. Cancer Research, 61(15):5950–5951, 2001.

508 Pier Paolo Claudio, Candace M. Howard, Alfonso Baldi, Antonio De Luca, Yan Fu, and Gianluigi Condorelli et al. p130/pRb2 has growth suppressive properties similar to yet distinctive from those of retinoblastoma family members pRb and p107. Cancer Research, 54(21):5556, 1994.

509 Francesco P. Jori, Umberto Galderisi, Elena Piegari, Gianfranco Peluso, Marilena Cipollaro, and Antonio Cascino et al. RB2/p130 ectopic gene expression in neuroblastoma stem cells: evidence of cell-fate restriction and induction of differentiation. Biochemical Journal, 360(3):569, 2001.

510 Giuseppe Raschella, Barbara Tanno, Francesco Bonetto, Roberto Amendola, Tullio Battista, and Antonio De Luca et al. Retinoblastoma-related protein pRb2/p130 and its binding to the B-myb promoter increase during human neuroblastoma differentiation. Journal of Cellular Biochemistry, 67(3):297–303, 1997.

511 Giuseppe Raschella, Barbara Tanno, Francesco Bonetto, Anna Negroni, Pier Paolo Claudio, and Alfonso Baldi et al. The RB-related gene Rb2/p130 in neuroblastoma differentiation and in B-myb promoter down-regulation. Cell Death and Differentiation, 5(5):401–407, 1998.

512 Riccardo Di Fiore, Antonella D’Anneo, Giovanni Tesoriere, and Renza Vento. RB1 in cancer: different mechanisms of RB1 inactivation and alterations of pRb pathway in tumorigenesis. Journal of Cellular Physiology, 228(8):1676–1687, 2013.

513 Iva Simeonova, Vincent Lejour, Boris Bardot, Rachida Bouarich-Bourimi, Aurelie Morin, and Ming Fang et al. Fuzzy tandem repeats containing p53 response elements may define species-specific p53 target genes. PLOS Genetics, 8(6):e1002731, 2012.

514 Zena Lim and Boon Long Quah. Unilateral retinoblastoma in an eye with Peters anomaly. Journal of American Association for Pediatric Ophthalmology and Strabismus, 14(2):184–186, 2010.

515 Paola Indovina, Antonio Acquaviva, Giulia De Falco, Valeria Rizzo, Anna Onnis, and Anna Luzzi et al. Downregulation and aberrant promoter methylation of p16INK4A: a possible novel heritable susceptibility marker to retinoblastoma. Journal of Cellular Physiology, 223(1):143–150, 2010.

516 Peh-Yean Cheah. The emerging role of RBL2/p130 in multi-step retinoblastoma tumorigenesis. Cancer Biology & Therapy, 8(8):718–719, 2009.

517 Kadam Priya, Srinivasa Rao Jada, Boon Long Quah, Thuan Chong Quah, and Poh San Lai. High incidence of allelic loss at 16q12.2 region spanning RB2/p130 gene in retinoblastoma. Cancer Biology & Therapy, 8(8):714–717, 2009.

518 David MacPherson, Karina Conkrite, Mandy Tam, Shizuo Mukai, David Mu, and Tyler Jacks. Murine bilateral retinoblastoma exhibiting rapid-onset, metastatic progression and N-myc gene amplification. The EMBO Journal, 26(3):784–794, 2007.

519 Gian Marco Tosi, Carmela Trimarchi, Marcella Macaluso, Dario La Sala, Alfredo Ciccodicola, and Stefano Lazzi et al. Genetic and epigenetic alterations of RB2/p130 tumor suppressor gene in human sporadic retinoblastoma: implications for pathogenesis and therapeutic approach. Oncogene, 24(38):5827–5836, 2005.

520 David MacPherson, Julien Sage, Teresa Kim, Dennis Ho, Margaret E. McLaughlin, and Tyler Jacks. Cell type-specific effects of Rb deletion in the murine retina. Genes & Development, 18(14):1681–1694, 2004.

521 Marie Classon and Ed Harlow. The retinoblastoma tumour suppressor in development and cancer. Nature Reviews Cancer, 2(12):910–917, 2002.

522 Cristiana Bellan, Giulia De Falco, Gian Marco Tosi, Stefano Lazzi, Filomena Ferrari, and Giovanna Morbini et al. Missing expression of pRb2/p130 in human retinoblastomas is associated with reduced apoptosis and lesser differentiation. Investigative Ophthalmology & Visual Science, 43(12):3602–3608, 2002.

523 William R. Sellers, Bennett G. Novitch, Satoshi Miyake, Agnieszka Heith, Gregory A. Otterson, and Frederic J. Kaye et al. Stable binding to E2F is not required for the retinoblastoma protein to activate transcription, promote differentiation, and suppress tumor cell growth. Genes & Development, 12(1):95–106, 1998.

524 Yukiharu Sawada, Hajime Nomura, Yuichi Endo, Kazumi Umeki, Teizo Fujita, Sachiya Ohtaki, and Kei Fujinaga. Cloning and characterization of the rat p130, a member of the retinoblastoma gene family. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease, 1361(1):20–27, 1997.

525 Alfonso Baldi, Vincenzo Boccia, Pier Paolo Claudio, Antonio De Luca, and Antonio Giordano. Genomic structure of the human retinoblastoma-related Rb2/p130 gene. Proceedings of the National Academy of Sciences of the United States of America, 93(10):4629–4632, 1996.

526 Jacqueline M. Sterner, Yunxia Tao, Sarah B. Kennett, Hyung G. Kim, and Jonathan M. Horowitz. The amino terminus of the retinoblastoma (Rb) protein associates with a cyclin-dependent kinase-like kinase via Rb amino acids required for growth suppression. Cell Growth & Differentiation, 7(1):53–64, 1996.

527 Peter Whyte. The retinoblastoma protein and its relatives. Seminars in Cancer Biology, 6(2):83–90, 1995.

528 Hugh Cam and Brian David Dynlacht. Emerging roles for E2F: beyond the G1/S transition and DNA replication. Cancer Cell, 3(4):311–316, 2003.

529 Jacob B. Hansen, Hein te Riele, and Karsten Kristiansen. Novel function of the retinoblastoma protein in fat: regulation of white versus brown adipocyte differentiation. Cell Cycle, 3(6):772–776, 2004.

530 Victoria M. Richon, Robert E. Lyle, and Robert E. McGehee. Regulation and expression of retinoblastoma proteins p107 and p130 during 3T3-L1 adipocyte differentiation. Journal of Biological Chemistry, 272(15):10117–10124, 1997.

531 Stefania Capasso, Nicola Alessio, Giovanni Di Bernardo, Marilena Cipollaro, Mariarosa Melone, and Gianfranco Peluso et al. Silencing of RB1 and RB2/P130 during adipogenesis of bone marrow stromal cells results in dysregulated differentiation. Cell Cycle, 13(3):482–490, 2014.

532 Mark F. Pittenger, Alastair M. Mackay, Stephen C. Beck, Rama K. Jaiswal, Robin Douglas, and Joseph D. Mosca et al. Multilineage potential of adult human mesenchymal stem cells. Science, 284(5411):143–147, 1999.

533 Alexander B. Mohseny, Karoly Szuhai, Salvatore Romeo, Emilie P. Buddingh, Inge Briaire-de Bruijn, and Danielle de Jong et al. Osteosarcoma originates from mesenchymal stem cells in consequence of aneuploidization and genomic loss of Cdkn2. The Journal of Pathology, 219(3):294–305, 2009.

534 Nedime Serakinci, Per Guldberg, Jorge S. Burns, Basem Abdallah, Henrik Schrodder, Thomas Jensen, and Moustapha Kassem. Adult human mesenchymal stem cell as a target for neoplastic transformation. Oncogene, 23(29):5095–5098, 2004.

535 Ioannis Panagopoulos, M. Hoglund, Fredrik Mertens, Nils Mandahl, Felix Mitelman, and Pierre Aman. Fusion of the EWS and CHOP genes in myxoid liposarcoma. Oncogene, 12(3):489–494, 1996.

536 Helene Zinszner, John Sok, David Immanuel, Yin Yin, and David Ron. TLS (FUS) binds RNA in vivo and engages in nucleo-cytoplasmic shuttling. Journal of Cell Science, 110(15):1741, 1997.

537 Jessica I. Hoell, Erik Larsson, Simon Runge, Jeffrey D. Nusbaum, Sujitha Duggimpudi, and Thalia A. Farazi et al. RNA targets of wild-type and mutant FET family proteins. Nature Structural & Molecular Biology, 18(12):1428–1431, 2011.

538 Adelene Y. Tan and James L. Manley. The TET family of proteins: Functions and roles in disease. Journal of Molecular Cell Biology, 1(2):82–92, 2009.

539 Nicola D. Roberts, R. Daniel Kortschak, Wendy T. Parker, Andreas W. Schreiber, Susan Branford, and Hamish S. Scott et al. A comparative analysis of algorithms for somatic SNV detection in cancer. Bioinformatics, 29(18):2223–2230, 2013.

540 Anne Bruun Kroigard, Mads Thomassen, Anne-Vibeke Laenkholm, Torben A. Kruse, and Martin Jakob Larsen. Evaluation of nine somatic variant callers for detection of somatic mutations in exome and targeted deep sequencing data. PLOS ONE, 11(3):e0151664, 2016.

541 Li Ding, Michael C. Wendl, Daniel C. Koboldt, and Elaine R. Mardis. Analysis of next generation genomic data in cancer: accomplishments and challenges. Human Molecular Genetics, 19(R2):188–196, 2010.

542 Michael Gundry and Jan Vijg. Direct mutation analysis by high-throughput sequencing: from germline to low-abundant, somatic variants. Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 729(1):1–15, 2012.

543 Ensel Oh, Yoon-La Choi, Mi Jeong Kwon, Ryong Nam Kim, Yu Jin Kim, and Ji-Young Song et al. Comparison of accuracy of whole-exome sequencing with formalin-fixed paraffin-embedded and fresh frozen tissue samples. PLOS ONE, 10(12):e0144162, 2015.

544 Jan A. Sikorsky, Donald A. Primerano, Terry W. Fenger, and James Denvir. DNA damage reduces Taq DNA polymerase fidelity and PCR amplification efficiency. Biochemical and Biophysical Research Communications, 355(2):431–437, 2007.

545 Hongdo Do and Alexander Dobrovic. Dramatic reduction of sequence artefacts from DNA isolated from formalin-fixed cancer biopsies by treatment with uracil-DNA glycosylase. Oncotarget, 3(5):546–558, 2012.

546 Hongdo Do, Stephen Q. Wong, Jason Li, and Alexander Dobrovic. Reducing sequence artifacts in amplicon-based massively parallel sequencing of formalin-fixed paraffin-embedded DNA by enzymatic depletion of uracil-containing templates. Clinical Chemistry, 59(9):1376–1383, 2013.

547 Michael Hofreiter, Viviane Jaenicke, David Serre, Arndt von Haeseler, and Svante Paabo. DNA sequences from multiple amplifications reveal artifacts induced by cytosine deamination in ancient DNA. Nucleic Acids Research, 29(23):4793–4799, 2001.

548 Juliane C. Dohm, Claudio Lottaz, Tatiana Borodina, and Heinz Himmelbauer. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Research, 36(16):e105, 2008.

549 Frazer Meacham, Dario Boffelli, Joseph Dhahbi, David I.K. Martin, Meromit Singer, and Lior Pachter. Identification and correction of systematic error in high-throughput sequence data. BMC Bioinformatics, 12(1):451, 2011.

550 Kensuke Nakamura, Taku Oshima, Takuya Morimoto, Shun Ikeda, Hirofumi Yoshikawa, and Yuh Shiwa et al. Sequence-specific error profile of Illumina sequencers. Nucleic Acids Research, pages 1–13, 2011.

551 Kym M. Boycott, Megan R. Vanstone, Dennis E. Bulman, and Alex E. MacKenzie. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nature Reviews Genetics, 14(10):681–691, 2013.

552 Gregory M. Cooper and Jay Shendure. Needles in stacks of needles: finding disease-causal variants in a wealth of genomic data. Nature Reviews Genetics, 12(9):628–640, 2011.

553 Shamil R. Sunyaev. Inferring causality and functional significance of human coding DNA variants. Human Molecular Genetics, 21(R1):R10–R17, 2012.

554 Matthew Zawistowski, Shyam Gopalakrishnan, Jun Ding, Yun Li, Sara Grimm, and Sebastian Zollner. Extending rare-variant testing strategies: analysis of noncoding sequence and imputed genotypes. American Journal of Human Genetics, 87(5):604–617, 2010.

555 Martin Ladouceur, Zari Dastani, Yurii S. Aulchenko, Celia M.T. Greenwood, and J. Brent Richards. The empirical power of rare variant association methods: results from Sanger sequencing in 1,998 individuals. PLOS Genetics, 8(2):e1002496, 2012.

556 Seunggeun Lee, Goncalo R. Abecasis, Michael Boehnke, and Xihong Lin. Rare-variant association analysis: study designs and statistical tests. American Journal of Human Genetics, 95(1):5–23, 2014.

557 Stephan Morgenthaler and William G. Thilly. A strategy to discover genes that carry multi-allelic or mono-allelic risk for common diseases: A cohort allelic sums test (CAST). Mutation Research/Fundamental and Molecular Mechanisms of Mutagenesis, 615(1-2):28–56, 2007.

558 Bingshan Li and Suzanne M. Leal. Methods for detecting associations with rare variants for common diseases: application to analysis of sequence data. The American Journal of Human Genetics, 83(3):311–321, 2008.

559 Andrew S. Brohl, Rajesh Patidar, Clesson E. Turner, Xinyu Wen, Young K. Song, and Jun S. Wei et al. Frequent inactivating germline mutations in DNA repair genes in patients with Ewing sarcoma. Genetics in Medicine, 2017.

560 Garvan Institute of Medical Research. Medical Genome Reference Bank; URL: https://www.garvan.org.au/research/kinghorn-centre-for-clinical-genomics/clinical-genomics/sydney-genomics-collaborative/mgrb, 2017.

561 John J. McNeil, Robyn L. Woods, Mark R. Nelson, Anne M. Murray, Christopher M. Reid, and Brenda Kirpach et al. Baseline characteristics of participants in the ASPREE (ASPirin in Reducing Events in the Elderly) study. The Journals of Gerontology, 2017.

562 Goo Jun, Matthew Flickinger, Kurt N. Hetrick, Jane M. Romm, Kimberly F. Doheny, and Goncalo R. Abecasis et al. Detecting and estimating contamination of human DNA samples in sequencing and array-based genotype data. The American Journal of Human Genetics, 91(5):839–848, 2012.

563 The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature, 526(7571):68–74, 2015.

564 GATK Documentation. Best practices for germline SNP & Indel discovery in whole genome and exome sequence; URL: https://software.broadinstitute.org/gatk/best-practices/bp_3step.php?case=GermShortWGS, 2016.

565 Donna Karolchik, Robert Baertsch, Mark Diekhans, Terrence S. Furey, Angie Hinrichs, and Y.T. Lu et al. The UCSC genome browser database. Nucleic Acids Research, 31(1):51–54, 2003.

566 Florian Gnad, Albion Baucom, Kiran Mukhyala, Gerard Manning, and Zemin Zhang. Assessment of computational methods for predicting the effects of missense mutations in human cancers. BMC Genomics, 14(3):S7, 2013.

567 NCBI. EST Profile Hs.684212 - C16orf96: Chromosome 16 open reading frame 96; URL: https://www.ncbi.nlm.nih.gov/UniGene/ESTProfileViewer.cgi?uglist=Hs.684212, 2017.

568 Li Liu, Jiao Huang, Ke Wang, Li Li, Yangkai Li, Jingsong Yuan, and Sheng Wei. Identification of hallmarks of lung adenocarcinoma prognosis using whole genome sequencing. Oncotarget, 6(35):38016–38028, 2015.

569 Desheng Xiao, Ying Shi, Chunyan Fu, Jiantao Jia, Yu Pan, and Yiqun Jiang et al. Decrease of TET2 expression and increase of 5-hmC levels in myeloid sarcomas. Leukemia Research, 42:75–79, 2016.

570 Yu Pan, Yongguang Tao, Chunyan Fu, Jiantao Jia, Shuang Liu, and Desheng Xiao. Assessment of PET/CT in multifocal myeloid sarcomas with loss of TET2: a case report and literature review. International Journal of Clinical and Experimental Pathology, 8(10):13630–13634, 2015.

571 Pamela J. Woodring, Tony Hunter, and Jean Y.J. Wang. Regulation of F-actin-dependent processes by the Abl family of tyrosine kinases. Journal of Cell Science, 116(13):2613–2626, 2003.

572 Emma Shtivelman, Batia Lifshitz, Robert P. Gale, and Eli Canaani. Fused transcript of ABL and BCR genes in chronic myelogenous leukaemia. Nature, 315:550–554, 1985.

573 Richard B. Jones, Andrew Gordus, Jordan A. Krall, and Gavin MacBeath. A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature, 439(7073):168–174, 2006.

574 Divyamani Srinivasan and Rina Plattner. Activation of Abl tyrosine kinases promotes invasion of aggressive breast cancer cells. Cancer Research, 66(11):5648–5655, 2006.

575 Liuqing Yang, Chunru Lin, and Zhi-Ren Liu. P68 RNA helicase mediates PDGF-induced epithelial mesenchymal transition by displacing Axin from β-catenin. Cell, 127(1):139–155, 2006.

576 Klarisa Rikova, Ailan Guo, Qingfu Zeng, Anthony Possemato, Jian Yu, and Herbert Haack et al. Global survey of phosphotyrosine signaling identifies oncogenic kinases in lung cancer. Cell, 131(6):1190–1203, 2007.

577 Jeffrey Lin, Tong Sun, Lin Ji, Wei Deng, Jack Roth, John D. Minna, and Ralph Arlinghaus. Oncogenic activation of c-Abl in non-small cell lung cancer cells lacking FUS1 expression: inhibition of c-Abl by the tumor suppressor gene product Fus1. Oncogene, 26(49):6989–6996, 2007.

578 Chang-Jiun Wu, Tianxi Cai, Klarisa Rikova, David Merberg, Simon Kasif, and Martin Steffen. A predictive phosphorylation signature of lung cancer. PLOS ONE, 4(11):e7994, 2009.

579 Julian Carretero, Takeshi Shimamura, Klarisa Rikova, Autumn L. Jackson, Matthew D. Wilkerson, and Christa L. Borgman et al. Integrative genomic and proteomic analyses identify targets for Lkb1-deficient metastatic lung tumors. Cancer Cell, 17(6):547–559, 2010.

580 Justin M. Drake, Nicholas A. Graham, Tanya Stoyanova, Amir Sedghi, Andrew S. Goldstein, and Houjian Cai et al. Oncogene-specific activation of tyrosine kinase networks during prostate cancer progression. Proceedings of the National Academy of Sciences, 109(5):1643–1648, 2012.

581 Alessandro Furlan, Venturina Stagni, Azeemudeen Hussain, Sylvie Richelme, Filippo Conti, and Andrea Prodosmo et al. Abl interconnects oncogenic Met and p53 core pathways in cancer cells. Cell Death & Differentiation, 18(10):1608–1616, 2011.

582 Sourik S. Ganguly, Leann S. Fiore, Jonathan T. Sims, J. Woodrow Friend, Divyamani Srinivasan, and Matthew A. Thacker et al. c-Abl and Arg are activated in human primary melanomas, promote melanoma cell invasion via distinct pathways, and drive metastatic progression. Oncogene, 31(14):1804–1816, 2012.

583 Junaid Ansari, Abdul Rafeh Naqash, Reinhold Munker, Hazem El-Osta, Samip Master, and James D. Cotelingam et al. Histiocytic sarcoma as a secondary malignancy: pathobiology, diagnosis, and treatment. European Journal of Haematology, 97(1):9–16, 2016.

584 Xueyan Chen, Joe C. Rutledge, David Wu, Min Fang, Kent E. Opheim, and Min Xu. Chronic myelogenous leukemia presenting in blast phase with nodal, bilineal myeloid sarcoma and T-lymphoblastic lymphoma in a child. Pediatric and Developmental Pathology, 16(2):91–96, 2013.

585 Brian L. Samuels, Sant Chawla, Shreyaskumar Patel, Margaret von Mehren, Jeremy Hamm, and Pamela E. Kaiser et al. Clinical outcomes and safety with trabectedin therapy in patients with advanced soft tissue sarcomas following failure of prior chemotherapy: results of a worldwide expanded access program study. Annals of Oncology, 24(6):1703–1709, 2013.

586 Fernando A. Angarita, Amanda J. Cannell, Albiruni R. Abdul Razak, Brendan C. Dickson, and Martin E. Blackstein. Trabectedin for inoperable or recurrent soft tissue sarcoma in adult patients: a retrospective cohort study. BMC Cancer, 16(1):30–41, 2016.

587 Kira Bramswig, Ferdinand Ploner, Alexandra Martel, Thomas Bauernhofer, Wolfgang Hilbe, and Thomas Kuhr et al. Sorafenib in advanced, heavily pretreated patients with soft tissue sarcomas. Anti-Cancer Drugs, 25(7):848–853, 2014.

588 Armando Santoro, Alessandro Comandone, Umberto Basso, Hector Soto Parra, Rita De Sanctis, and Elisa Stroppa et al. Phase II prospective study with sorafenib in advanced soft tissue sarcomas after anthracycline-based therapy. Annals of Oncology, 24(4):1093–1098, 2013.

589 Bo Eskerod Madsen and Sharon R. Browning. A groupwise association test for rare mutations using a weighted sum statistic. PLOS Genetics, 5(2):e1000384, 2009.

590 Ya-Jing Zhou, Yong Wang, and Li-Li Chen. Detecting the common and individual effects of rare variants on quantitative traits by using extreme phenotype sampling. Genes, 7(1):2–14, 2016.

591 Benjamin M. Neale, Manuel A. Rivas, Benjamin F. Voight, David Altshuler, Bernie Devlin, and Marju Orho-Melander et al. Testing for an unusual distribution of rare variants. PLOS Genetics, 7(3):e1001322, 2011.

592 Seunggeun Lee, Michael C. Wu, and Xihong Lin. Optimal tests for rare variant effects in sequencing association studies. Biostatistics, 13(4):762–775, 2012.

593 Andriy Derkach, Jerry F. Lawless, and Lei Sun. Robust and powerful tests for rare variants using Fisher’s method to combine evidence of association from two or more complementary tests. Genetic Epidemiology, 37(1):110–121, 2013.

594 Satu Maki-Nevala, Virinder Kaur Sarhadi, Aija Knuuttila, Ilari Scheinin, Pekka Ellonen, and Sonja Lagstrom et al. Driver gene and novel mutations in asbestos-exposed lung adenocarcinoma and malignant mesothelioma detected by exome sequencing. Lung, 194(1):125–135, 2016.

595 Robbert D.A. Weren, Marjolijn J.L. Ligtenberg, C. Marleen Kets, Richarda M. de Voer, Eugene T.P. Verwiel, and Liesbeth Spruijt et al. A germline homozygous mutation in the base-excision repair gene NTHL1 causes adenomatous polyposis and colorectal cancer. Nature Genetics, 47(6):668–671, 2015.

596 Oriol Calvete, Jose Reyes, Sheila Zuniga, Beatriz Paumard-Hernandez, Victoria Fernandez, and Luis Bujanda et al. Exome sequencing identifies ATP4A gene as responsible of an atypical familial type I gastric neuroendocrine tumour. Human Molecular Genetics, 24(10):2914–2922, 2015.

597 Michael W. Ronellenfitsch, Oh Ji Eun, Kaishi Satomi, Koichiro Sumi, Patrick N. Harter, and Joachim P. Steinbach et al. CASP9 germline mutation in a family with multiple brain tumors. Brain Pathology, pages 1–22, 2016.

598 Cezary Cybulski, Jian Carrot-Zhang, Wojciech Kluzniak, Barbara Rivera, Aniruddh Kashyap, and Dominika Wokolorczyk et al. Germline RECQL mutations are associated with breast cancer susceptibility. Nature Genetics, 47(6):643–646, 2015.

599 Jie Sun, Yuxia Wang, Yisui Xia, Ye Xu, Tao Ouyang, and Jinfeng Li et al. Mutations in RECQL gene are associated with predisposition to breast cancer. PLOS Genetics, 11(5):e1005228, 2015.

600 Johanna I. Kiiski, Liisa M. Pelttari, Sofia Khan, Edda S. Freysteinsdottir, Inga Reynisdottir, and Steven N. Hart et al. Exome sequencing identifies FANCM as a susceptibility gene for triple-negative breast cancer. Proceedings of the National Academy of Sciences, 111(42):15172–15177, 2014.

601 Francisco Javier Gracia-Aznarez, Victoria Fernandez, Guillermo Pita, Paolo Peterlongo, Orlando Dominguez, and Miguel de la Hoya et al. Whole exome sequencing suggests much of non-BRCA1/BRCA2 familial breast cancer is due to moderate and low penetrance susceptibility alleles. PLOS ONE, 8(2):e55681, 2013.

602 Paolo Peterlongo, Irene Catucci, Mara Colombo, Laura Caleca, Eliseos Mucaki, and Massimo Bogliolo et al. FANCM c.5791C>T nonsense mutation (rs144567652) induces exon skipping, affects DNA repair activity and is a familial breast cancer risk factor. Human Molecular Genetics, 24(18):5345–5355, 2015.

603 Daniel J. Park, Kayoko Tao, Florence Le Calvez-Kelm, Tu Nguyen-Dumont, Nivonirina Robinot, and Fleur Hammet et al. Rare mutations in RINT1 predispose carriers to breast and Lynch syndrome-spectrum cancers. Cancer Discovery, 4(7):804–815, 2014.

604 Ella R. Thompson, Maria A. Doyle, Georgina L. Ryland, Simone M. Rowley, David Y. H. Choong, and Richard W. Tothill et al. Exome sequencing identifies rare deleterious mutations in DNA repair genes FANCC and BLM as potential breast cancer susceptibility alleles. PLOS Genetics, 8(9):e1002894, 2012.

605 Anna P. Sokolenko, Aglaya G. Iyevleva, Elena V. Preobrazhenskaya, Nathalia V. Mitiushkina, Svetlana N. Abysheva, and Evgeny N. Suspitsin et al. High prevalence and breast cancer predisposing role of the BLM c.1642 C>T (Q548X) mutation in Russia. International Journal of Cancer, 130(12):2867–2873, 2012.

606 Darya Prokofyeva, Natalia Bogdanova, Natalia Dubrowinskaja, Marina Bermisheva, Zalina Takhirova, and Natalia Antonenkova et al. Nonsense mutation p.Q548X in BLM, the gene mutated in Bloom’s syndrome, is associated with breast cancer in Slavic populations. Breast Cancer Research and Treatment, 137(2):533–539, 2013.

607 Marianne Berwick, Jaya M. Satagopan, Leah Ben-Porat, Ann Carlson, Katherine Mah, and Rashida Henry et al. Genetic heterogeneity among Fanconi anemia heterozygotes and risk of cancer. Cancer Research, 67(19):9591–9596, 2007.

608 Daniel J. Park, Fabienne Lesueur, Tu Nguyen-Dumont, Maroulio Pertesi, Fabrice Odefrey, and F. Hammet et al. Rare mutations in XRCC2 increase the risk of breast cancer. The American Journal of Human Genetics, 90(4):734–739, 2012.

609 Kirsi Maatta, Tommi Rantapero, Anna Lindstrom, Matti Nykter, Minna Kankuri-Tammilehto, Satu-Leena Laasanen, and Johanna Schleutker. Whole-exome sequencing of Finnish hereditary breast cancer families. European Journal of Human Genetics, 2016.

610 Abdelkader Heddar, Pierre Fermey, Sophie Coutant, Emilie Angot, Jean-Christophe Sabourin, and Paul Michelin et al. Familial solitary chondrosarcoma resulting from germline EXT2 mutation. Genes, Chromosomes and Cancer, 2016.

611 Lynn R. Goldin, Mary L. McMaster, Melissa Rotunno, Sarah E.M. Herman, Kristine Jones, and Bin Zhu et al. Whole exome sequencing in families with CLL detects a variant in Integrin β2 associated with disease susceptibility. Blood, 128(18):2261–2263, 2016.

612 Helen E. Speedy, Ben Kinnersley, Daniel Chubb, Peter Broderick, Philip J. Law, and Kevin Litchfield et al. Germline mutations in shelterin complex genes are associated with familial chronic lymphocytic leukemia. Blood, 2016.

613 Jun-Xiao Zhang, Lei Fu, Richarda M. de Voer, Marc-Manuel Hahn, Peng Jin, and Chen-Xi Lv et al. Candidate colorectal cancer predisposing gene variants in Chinese early-onset and familial cases. World Journal of Gastroenterology, 21(14):4136–4149, 2015.

614 Nuria Segui, Leonardo B. Mina, Conxi Lazaro, Rebeca Sanz-Pamplona, Tirso Pons, and Matilde Navarro et al. Germline mutations in FAN1 cause hereditary colorectal cancer by impairing DNA repair. Gastroenterology, 149(3):563–566, 2015.

615 Clara Esteban-Jurado, Maria Vila-Casadesus, Pilar Garre, Juan Jose Lozano, Anna Pristoupilova, and Sergi Beltran et al. Whole-exome sequencing identifies rare pathogenic variants in new predisposition genes for familial colorectal cancer. Genetics in Medicine, 2014.

616 Taina T. Nieminen, Marie-Francoise O’Donohue, Yunpeng Wu, Hannes Lohi, Stephen W. Scherer, and Andrew D. Paterson et al. Germline mutation of RPS20, encoding a ribosomal protein, causes predisposition to hereditary nonpolyposis colorectal carcinoma without DNA mismatch repair deficiency. Gastroenterology, 147(3):595–598.e5, 2014.

617 Alexandra E. Gylfe, Riku Katainen, Johanna Kondelin, Tomas Tanskanen, Tatiana Cajuso, and Ulrika Hanninen et al. Eleven candidate susceptibility genes for common familial colorectal cancer. PLOS Genetics, 9(10):e1003876, 2013.

618 Pi-Yueh Chang, Jinn-Shiun Chen, Nai-Chung Chang, Shih-Cheng Chang, Mei-Chia Wang, and Shu-Hui Tsai et al. NRAS germline variant G138R and multiple rare somatic mutations on APC in colorectal cancer patients in Taiwan by next generation sequencing. Oncotarget, 7(25):37566–37580, 2016.

619 Daniel Chubb, Peter Broderick, Sara E. Dobbins, Matthew Frampton, Ben Kinnersley, and Steven Penegar et al. Rare disruptive mutations and their contribution to the heritable risk of colorectal cancer. Nature Communications, 2016.

620 Richarda M. de Voer, Marc-Manuel Hahn, Robbert D.A. Weren, Arjen R. Mensenkamp, Christian Gilissen, and Wendy A. van Zelst-Stams et al. Identification of novel candidate genes for early-onset colorectal cancer susceptibility. PLOS Genetics, 12(2):e1005880, 2016.

621 Claire Palles, Jean-Baptiste Cazier, Kimberley M. Howarth, Enric Domingo, Angela M. Jones, and Peter Broderick et al. Germline mutations affecting the proofreading domains of POLE and POLD1 predispose to colorectal adenomas and carcinomas. Nature Genetics, 45(2):136–144, 2013.

622 Christopher G. Smith, Marc Naven, Rebecca Harris, James Colley, Hannah West, and Ning Li et al. Exome resequencing identifies potential tumor-suppressor genes that predispose to colorectal cancer. Human Mutation, 34(7):1026–1034, 2013.

623 Anna Rohlin, Theofanis Zagoras, Staffan Nilsson, Ulf Lundstam, Jan Wahlstrom, and Leif Hulten et al. A mutation in POLE predisposing to a multi-tumour phenotype. International Journal of Oncology, 45(1):77–81, 2014.

624 Laura Valle, Eva Hernandez-Illan, Fernando Bellido, Gemma Aiza, Adela Castillejo, and Maria-Isabel Castillejo et al. New insights into POLE and POLD1 germline mutations in familial colorectal cancer and polyposis. Human Molecular Genetics, 23(13):3506–3512, 2014.

625 Fernando Bellido, Marta Pineda, Gemma Aiza, Rafael Valdes-Mas, Matilde Navarro, and Diana A. Puente et al. POLE and POLD1 mutations in 529 kindred with familial colorectal cancer and/or polyposis: review of reported cases and recommendations for genetic testing and surveillance. Genetics in Medicine, 18(4):325–332, 2015.

626 Daniel Chubb, Peter Broderick, Matthew Frampton, Ben Kinnersley, Amy Sherborne, and Steven Penegar et al. Genetic diagnosis of high-penetrance susceptibility for colorectal cancer (CRC) is achievable for a high proportion of familial CRC by exome sequencing. Journal of Clinical Oncology, 33(5):426–432, 2015.

627 Fadwa A. Elsayed, C. Marleen Kets, Dina Ruano, Brendy van den Akker, Arjen R. Mensenkamp, and Melanie Schrumpf et al. Germline variants in POLE are associated with early onset mismatch repair deficient colorectal cancer. European Journal of Human Genetics, 23(8):1080–1084, 2015.

628 Maren F. Hansen, Jostein Johansen, Inga Bjornevoll, Anna E. Sylvander, Kristin S. Steinsbekk, and Pal Saetrom et al. A novel POLE mutation associated with cancers of colon, pancreas, ovaries and small intestine. Familial Cancer, 14(3):437–448, 2015.

629 Isabel Spier, Stefanie Holzapfel, Janine Altmuller, Bixiao Zhao, Sukanya Horpaopan, and Stefanie Vogt et al. Frequency and phenotypic spectrum of germline mutations in POLE and seven other polymerase genes in 266 patients with colorectal adenomas and carcinomas. International Journal of Cancer, 137(2):320–331, 2015.

630 Yael Goldberg, Naama Halpern, Ayala Hubert, Samuel N. Adler, Sherri Cohen, and Morasha Plesser-Duvdevani et al. Mutated MCM9 is associated with predisposition to hereditary mixed polyposis and colorectal cancer in addition to primary ovarian failure. Cancer Genetics, 208(12):621–624, 2015.

631 Ronja Adam, Isabel Spier, Bixiao Zhao, Michael Kloth, Jonathan Marquez, and Inga Hinrichsen et al. Exome sequencing identifies biallelic MSH3 germline mutations as a recessive subtype of colorectal adenomatous polyposis. The American Journal of Human Genetics, 99(2):337–351, 2016.

632 Isabel Spier, Martin Kerick, Dmitriy Drichel, Sukanya Horpaopan, Janine Altmuller, and Andreas Laner et al. Exome sequencing identifies potential novel candidate genes in patients with unexplained colorectal adenomatous polyposis. Familial Cancer, 15(2):281–288, 2016.

633 Ryan E. Fecteau, Jianping Kong, Adam Kresak, Wendy Brock, Yeunjoo Song, and Hisashi Fujioka et al. Association between germline mutation in VSIG10L and familial Barrett neoplasia. JAMA Oncology, 2(10):1333–1339, 2016.

634 Caixia Cheng, Heyang Cui, Ling Zhang, Zhiwu Jia, Bin Song, and Fang Wang et al. Genomic analyses reveal FAM84B and the NOTCH pathway are associated with the progression of esophageal squamous cell carcinoma. GigaScience, 5(1):1, 2016.

635 Keqiang Zhang, Jia-Wei Lin, Jinhui Wang, Xiwei Wu, Hanlin Gao, and Yi-Chen Hsieh et al. A germline missense mutation in COQ6 is associated with susceptibility to familial schwannomatosis. Genetics in Medicine, 16(10):787–792, 2014.

636 Iikki Donner, Tuula Kiviluoto, Ari Ristimaki, Lauri A. Aaltonen, and Pia Vahteristo. Exome sequencing reveals three novel candidate predisposition genes for diffuse gastric cancer. Familial Cancer, 14(2):241–246, 2015.

637 Ian J. Majewski, Irma Kluijt, Annemieke Cats, Thomas S. Scerri, Daphne de Jong, and Roelof J.C. Kluin et al. An α-E-catenin (CTNNA1) mutation in hereditary diffuse gastric cancer. The Journal of Pathology, 229(4):621–629, 2013.

638 Samantha Hansford, Pardeep Kaurah, Hector Li-Chang, Michelle Woo, Janine Senz, and Hugo Pinheiro et al. Hereditary diffuse gastric cancer syndrome: CDH1 mutations and beyond. JAMA Oncology, 1(1):23–32, 2015.

639 Matthew N. Bainbridge, Georgina N. Armstrong, M. Monica Gramatges, Alison A. Bertuch, Shalini N. Jhangiani, and Harsha Doddapaneni et al. Germline mutations in shelterin complex genes are associated with familial glioma. Journal of the National Cancer Institute, 107(1):dju384, 2015.

640 Heikki Ristolainen, Outi Kilpivaara, Peter Kamper, Minna Taskinen, Silva Saarinen, and Sirpa Leppa et al. Identification of homozygous deletion in ACAN and other candidate variants in familial classical Hodgkin lymphoma by exome sequencing. British Journal of Haematology, 170(3):428–431, 2015.

641 Silva Saarinen, Mervi Aavikko, Kristiina Aittomaki, Virpi Launonen, Rainer Lehtonen, and Kaarle Franssila et al. Exome sequencing reveals germline NPAT mutation as a candidate risk factor for Hodgkin lymphoma. Blood, 118(3):493–498, 2011.

642 Melissa Rotunno, Mary L. McMaster, Joseph Boland, Sara Bass, Xijun Zhang, and Laurie Burdett et al. Whole exome sequencing in families at high risk for Hodgkin lymphoma: identification of a predisposing mutation in the KDR gene. Haematologica, 101(7):853, 2016.

643 Natalia D. Linhares, Maira C.M. Freire, Raony G.C.C.L. Cardenas, Heloisa B. Pena, Magda Bahia, and Sergio D.J. Pena. Exome sequencing identifies a novel homozygous variant in NDRG4 in a family with infantile myofibromatosis. European Journal of Medical Genetics, 57(11-12):643–648, 2014.

644 Yee Him Cheung, Tenzin Gayden, Philippe M. Campeau, Charles A. LeDuc, Donna Russo, and Van-Hung Nguyen et al. A recurrent PDGFRB mutation causes familial infantile myofibromatosis. The American Journal of Human Genetics, 92(6):996–1000, 2013.

645 John A. Martignetti, Lifeng Tian, Dong Li, Maria Celeste M. Ramirez, Olga Camacho-Vanegas, and Sandra Catalina Camacho et al. Mutations in PDGFRB cause autosomal-dominant infantile myofibromatosis. The American Journal of Human Genetics, 92(6):1001–1007, 2013.

646 Xiaolei Lan, Hua Gao, Fei Wang, Jie Feng, Jiwei Bai, and Peng Zhao et al. Whole-exome sequencing identifies variants in invasive pituitary adenomas. Oncology Letters, 12(4):2319–2328, 2016.

647 Joanne Ngeow, Wanfeng Yu, Lamis Yehia, Farshad Niazi, Jinlian Chen, and Xuhua Tang et al. Exome sequencing reveals germline SMAD9 mutation that reduces phosphatase and tensin homolog expression and is associated with hamartomatous polyposis and gastrointestinal ganglioneuromas. Gastroenterology, 149(4):886–889, 2015.

648 Mervi Aavikko, Eevi Kaasinen, Janne K. Nieminen, Minji Byun, Iikki Donner, and Roberta Mancuso et al. Whole-genome sequencing identifies STAT4 as a putative susceptibility gene in classic Kaposi sarcoma. Journal of Infectious Diseases, 211(11):1842–1851, 2015.

649 Sho Egashira, Masatoshi Jinnin, Miho Harada, Shinichi Masuguchi, Satoshi Fukushima, and Hironobu Ihn. Exome sequence analysis of Kaposiform hemangioendothelioma: identification of putative driver mutations. Anais Brasileiros de Dermatologia, 91(6):748–753, 2016.

650 Stefano Caruso, Julien Calderaro, Eric Letouze, Jean-Charles Nault, Gabrielle Couchy, and Anais Boulais et al. Germline and somatic DICER1 mutations in familial and sporadic liver tumors. Journal of Hepatology, 66(4):734–742, 2016.

651 Donghai Xiong, Yian Wang, Elena Kupert, Claire Simpson, Susan M. Pinney, and Colette R. Gaba et al. A recurrent mutation in PARK2 is associated with familial lung cancer. The American Journal of Human Genetics, 96(2):301–308, 2015.

652 Hsuan-Yu Chen, Sung-Liang Yu, Bing-Ching Ho, Kang-Yi Su, Yi-Chiung Hsu, and Chi-Sheng Chang et al. R331W missense mutation of oncogene YAP1 is a germline risk allele for lung adenocarcinoma with medical actionability. Journal of Clinical Oncology, 33(20):2303–2310, 2015.

653 Makia J. Marafie, Mohammed Dashti, and Fahd Al-Mulla. Identification of a rare germline NBN gene mutation by whole exome sequencing in a lung-cancer survivor from a large family with various types of cancer. Familial Cancer, pages 1–6, 2016.

654 Leila Noetzli, Richard W. Lo, Alisa B. Lee-Sherick, Michael Callaghan, Patrizia Noris, and Anna Savoia et al. Germline mutations in ETV6 are associated with thrombocytopenia, red cell macrocytosis and predisposition to lymphoblastic leukemia. Nature Genetics, 47(5):535–538, 2015.

655 Sabine Topka, Joseph Vijai, Michael F. Walsh, Lauren Jacobs, Ann Maria, and Danylo Villano et al. Germline ETV6 mutations confer susceptibility to acute lymphoblastic leukemia and thrombocytopenia. PLOS Genetics, 11(6):e1005262, 2015.

656 Valentina Silvestri, Veronica Zelli, Virginia Valentini, Piera Rizzolo, Anna Sara Navazio, and Anna Coppa et al. Whole-exome sequencing and targeted gene sequencing provide insights into the role of PALB2 as a male breast cancer susceptibility gene. Cancer, 123(2):210–218, 2016.

657 Carla Daniela Robles-Espinoza, Mark Harland, Andrew J. Ramsay, Lauren G. Aoude, Victor Quesada, and Zhihao Ding et al. POT1 loss-of-function variants predispose to familial melanoma. Nature Genetics, 46(5):478–481, 2014.

658 Satoru Yokoyama, Susan L. Woods, Glen M. Boyle, Lauren G. Aoude, Stuart MacGregor, and Victoria Zismann et al. A novel recurrent mutation in MITF predisposes to familial and sporadic melanoma. Nature, 480(7375):99–103, 2011.

659 Paola Ghiorzo, Lorenza Pastorino, Paola Queirolo, William Bruno, Maria G. Tibiletti, and Sabina Nasti et al. Prevalence of the E318K MITF germline mutation in Italian melanoma patients: associations with histological subtypes and family cancer history. Pigment Cell & Melanoma Research, 26(2):259–262, 2013.

660 Marianne Berwick, Jamie MacArthur, Irene Orlow, Peter Kanetsky, Colin B. Begg, and Li Luo et al. MITF E318K’s effect on melanoma risk independent of, but modified by, other risk factors. Pigment Cell & Melanoma Research, 27(3):485–488, 2014.

661 J. William Harbour, Michael D. Onken, Elisha D.O. Roberson, Shenghui Duan, Li Cao, and Lori A. Worley et al. Frequent mutation of BAP1 in metastasizing uveal melanomas. Science, 330(6009):1410–1413, 2010.

662 Joseph R. Testa, Mitchell Cheung, Jianming Pei, Jennifer E. Below, Yinfei Tan, and Eleonora Sementino et al. Germline BAP1 mutations predispose to malignant mesothelioma. Nature Genetics, 43(10):1022–1025, 2011.

663 Thomas Wiesner, Anna C. Obenauf, Rajmohan Murali, Isabella Fried, Klaus G. Griewank, and Peter Ulz et al. Germline mutations in BAP1 predispose to melanocytic tumors. Nature Genetics, 43(10):1018–1021, 2011.

664 Mohamed H. Abdel-Rahman, Robert Pilarski, Colleen M. Cebulla, James B. Massengill, Benjamin N. Christopher, and Getachew Boru et al. Germline BAP1 mutation predisposes to uveal melanoma, lung adenocarcinoma, meningioma, and other cancers. Journal of Medical Genetics, 48(12):856–859, 2011.

665 Lauren G. Aoude, Karin Wadt, Anders Bojesen, Dorthe Cruger, Ake Borg, and Jeffrey M. Trent et al. A BAP1 mutation in a Danish family predisposes to uveal melanoma and other cancers. PLOS ONE, 8(8):e72144, 2013.

666 Mitchell Cheung, Jacqueline Talarchek, Karen Schindeler, Eduardo Saraiva, Lynette S. Penney, Mark Ludman, and Joseph R. Testa. Further evidence for germline BAP1 mutations predisposing to melanoma and malignant mesothelioma. Cancer Genetics, 206(5):206–210, 2013.

667 Megan N. Farley, Laura S. Schmidt, Jessica L. Mester, Samuel Pena-Llopis, Andrea Pavia-Jimenez, and Alana Christie et al. A novel germline mutation in BAP1 predisposes to familial clear-cell renal cell carcinoma. Molecular Cancer Research, 11(9):1061–1071, 2013.

668 Tatiana Popova, Lucie Hebert, Virginie Jacquemin, Sophie Gad, Virginie Caux-Moncoutier, and Catherine Dubois-d’Enghien et al. Germline BAP1 mutations predispose to renal cell carcinomas. The American Journal of Human Genetics, 92(6):974–980, 2013.

669 David A. Maerker, Michael Zeschnigk, Jasmin Nelles, Dietmar R. Lohmann, Karl Worm, and Anja K. Bosserhoff et al. BAP1 germline mutation in two first grade family members with uveal melanoma. British Journal of Ophthalmology, 98(2):224–227, 2014.

670 Robert Pilarski, Colleen M. Cebulla, James B. Massengill, Karan Rai, Thereasa Rich, and Louise Strong et al. Expanding the clinical phenotype of hereditary BAP1 cancer predisposition syndrome, reporting three new cases. Genes, Chromosomes and Cancer, 53(2):177–182, 2014.

671 Colleen M. Cebulla, Elaine M. Binkley, Robert Pilarski, James B. Massengill, Karan Rai, and David A. Liebner et al. Analysis of BAP1 germline gene mutation in young uveal melanoma patients. Ophthalmic Genetics, 36(2):126–131, 2015.

672 Arnaud de la Fouchardiere, Odile Cabaret, Liliana Savin, Patrick Combemale, Hubert Schvartz, and Clotilde Penet et al. Germline BAP1 mutations predispose also to multiple basal cell carcinomas. Clinical Genetics, 88(3):273–277, 2015.

673 Sonja Klebe, Jack Driml, Masaki Nasu, Sandra Pastorino, Amirmasoud Zangiabadi, Douglas Henderson, and Michele Carbone. BAP1 hereditary cancer predisposition syndrome: a case report and review of literature. Biomarker Research, 3(1):14, 2015.

674 Pedram Gerami, Oriol Yelamos, Christina Y. Lee, Roxana Obregon, Pedram Yazdan, and Lauren M. Sholl et al. Multiple cutaneous melanomas and clinically atypical moles in a patient with a novel germline BAP1 mutation. JAMA Dermatology, 151(11):1235–1239, 2015.

675 Karan Rai, Robert Pilarski, Colleen M. Cebulla, and Mohamed H. Abdel-Rahman. Comprehensive review of BAP1 tumor predisposition syndrome with report of two new cases. Clinical Genetics, 89(3):285–294, 2016.

676 Karin A.W. Wadt, Lauren G. Aoude, Peter Johansson, Annalisa Solinas, Antonia L. Pritchard, and Oana Crainic et al. A recurrent germline BAP1 mutation and extension of the BAP1 tumor predisposition spectrum to include basal cell carcinoma. Clinical Genetics, 88(3):267–272, 2015.

677 David J. Barnes, Edward Hookway, Nick Athanasou, Takeshi Kashima, Udo Oppermann, and Simon Hughes et al. A germline mutation of CDKN2A and a novel RPLP1-C19MC fusion detected in a rare melanotic neuroectodermal tumor of infancy: a case report. BMC Cancer, 16(1):629, 2016.

678 Miriam J. Smith, James O’Sullivan, Sanjeev S. Bhaskar, Kristen D. Hadfield, Gemma Poke, and John Caird et al. Loss-of-function mutations in SMARCE1 cause an inherited disorder of multiple spinal meningiomas. Nature Genetics, 45(3):295–298, 2013.

679 Miriam J. Smith, Andrew J. Wallace, Chris Bennett, Martin Hasselblatt, Ewelina Elert-Dobkowska, and Linton T. Evans et al. Germline SMARCE1 mutations predispose to both spinal and cranial clear cell meningiomas. The Journal of Pathology, 234(4):436–440, 2014.

680 Helen Raffalli-Ebezant, Scott A. Rutherford, Stavros Stivaros, Anna Kelsey, Miriam Smith, D. Gareth Evans, and John-Paul Kilday. Pediatric intracranial clear cell meningioma associated with a germline mutation of SMARCE1: a novel case. Child’s Nervous System, 31(3):441–447, 2015.

681 Linton T. Evans, Jack Van Hoff, William F. Hickey, Miriam J. Smith, D. Gareth Evans, William G. Newman, and David F. Bauer. SMARCE1 mutations in pediatric clear cell meningioma: case report. Journal of Neurosurgery: Pediatrics, 16(3):296–300, 2015.

682 Wei Dai, Hong Zheng, Arthur Kwok Leung Cheung, Clara Sze-man Tang, Josephine Mun Yee Ko, and Bonnie Wing Yan Wong et al. Whole-exome sequencing identifies MST1R as a genetic susceptibility gene in nasopharyngeal carcinoma. Proceedings of the National Academy of Sciences, 113(12):3317–3322, 2016.

683 Sudheer Kumar Gara, Li Jia, Maria J. Merino, Sunita K. Agarwal, Lisa Zhang, and Maggie Cam et al. Germline HABP2 mutation causing familial nonmedullary thyroid cancer. New England Journal of Medicine, 373(5):448–455, 2015.

684 Chang Liu, Yang Yu, Guangliang Yin, Junxia Zhang, Wei Wen, and Xianhui Ruan et al. C14orf93 (RTFC) is identified as a novel susceptibility gene for familial nonmedullary thyroid cancer. Biochemical and Biophysical Research Communications, pages 1–7, 2016.

685 Ed Dicks, Honglin Song, Susan J. Ramus, Elke Van Oudenhove, Jonathan P. Tyrer, and Maria P. Intermaggio et al. Germline whole exome sequencing and large-scale replication identifies FANCM as a likely high grade serous ovarian cancer susceptibility gene. Oncotarget, 2017.

686 Silvia Vilarinho, E. Zeynep Erson-Omay, Akdes Serin Harmanci, Raffaella Morotti, Geneive Carrion-Grant, and Jacob Baranoski et al. Paediatric hepatocellular carcinoma due to somatic CTNNB1 and NFE2L2 mutations in the setting of inherited bi-allelic ABCB11 mutations. Journal of Hepatology, 61(5):1178–1183, 2014.

687 Filemon S. Dela Cruz, Daniel Diolaiti, Andrew T. Turk, Allison R. Rainey, Alberto Ambesi-Impiombato, and Stuart J. Andrews et al. A case study of an integrative genomic and experimental therapeutic approach for rare tumors: identification of vulnerabilities in a pediatric poorly differentiated carcinoma. Genome Medicine, 8(1):116, 2016.

688 Yoko Shimada, Takashi Kohno, Hideki Ueno, Yoshinori Ino, Hideyuki Hayashi, and Takashi Nakaoku et al. An oncogenic ALK fusion and an RRAS mutation in KRAS mutation-negative pancreatic ductal adenocarcinoma. The Oncologist, 22(2):158–164, 2017.

689 Jerneja Tomsic, Huiling He, Keiko Akagi, Sandya Liyanarachchi, Qun Pan, and Blake Bertani et al. A germline mutation in SRRM2, a splicing factor gene, is implicated in papillary thyroid carcinoma predisposition. Scientific Reports, 5, 2015.

690 Alberto Cascon, Inaki Comino-Mendez, Maria Curras-Freixes, Aguirre A. de Cubas, Laura Contreras, and Susan Richter et al. Whole-exome sequencing identifies MDH2 as a new familial paraganglioma gene. Journal of the National Cancer Institute, 107(5):djv053, 2015.

691 Andrew Feber, Daniel C. Worth, Ankur Chakravarthy, Patricia de Winter, Kunal Shah, and Manit Arya et al. Somatic mutations in penile squamous cell carcinoma. Cancer Research, 76(16):4720–4727, 2016.

692 Inaki Comino-Mendez, Francisco J. Gracia-Aznarez, Francesca Schiavi, Inigo Landa, Luis J. Leandro-Garcia, and Rocio Leton et al. Exome sequencing identifies MAX mutations as a cause of hereditary pheochromocytoma. Nature Genetics, 43(7):663–667, 2011.

693 Nelly Burnichon, Alberto Cascon, Francesca Schiavi, Nicole Paes Morales, Inaki Comino-Mendez, and Nassera Abermil et al. MAX mutations cause hereditary and sporadic pheochromocytoma and paraganglioma. Clinical Cancer Research, 18(10):2828–2837, 2012.

694 Mariola Peczkowska, Aldona Kowalska, Jacek Sygut, Dariusz Waligorski, Angelica Malinoc, and Hanna Janaszek-Sitkowska et al. Testing new susceptibility genes in the cohort of apparently sporadic phaeochromocytoma/paraganglioma patients with clinical characteristics of hereditary syndromes. Clinical Endocrinology, 79(6):817–823, 2013.

695 Sohela Shah, Kasmintan A. Schrader, Esme Waanders, Andrew E. Timms, Joseph Vijai, and Cornelius Miething et al. A recurrent germline PAX5 mutation confers susceptibility to pre-B cell acute lymphoblastic leukemia. Nature Genetics, 45(10):1226–1231, 2013.

696 Yuji Ikeda, Kazuma Kiyotani, Poh Yin Yew, Taigo Kato, Kenji Tamura, and Kai Lee Yap et al. Germline PARP4 mutations in patients with primary thyroid and breast cancers. Endocrine-Related Cancer, 23(3):171–179, 2016.

697 Pia Ostergaard, Michael A. Simpson, Fiona C. Connell, Colin G. Steward, Glen Brice, and Wesley J. Woollard et al. Mutations in GATA2 cause primary lymphedema associated with a predisposition to acute myeloid leukemia (Emberger syndrome). Nature Genetics, 43(10):929–931, 2011.

698 Christopher N. Hahn, Chan-Eng Chong, Catherine L. Carmichael, Ella J. Wilkins, Peter J. Brautigan, and Xiao-Chun Li et al. Heritable GATA2 mutations associated with familial myelodysplastic syndrome and acute myeloid leukemia. Nature Genetics, 43(10):1012–1017, 2011.

699 Harriet Holme, Upal Hossain, Michael Kirwan, Amanda Walne, Tom Vulliamy, and Inderjeet Dokal. Marked genetic heterogeneity in familial myelodysplasia/acute myeloid leukaemia. British Journal of Haematology, 158(2):242–248, 2012.

700 Jan Kazenwadel, Genevieve A. Secker, Yajuan J. Liu, Jill A. Rosenfeld, Robert S. Wildin, and Jennifer Cuellar-Rodriguez et al. Loss-of-function germline GATA2 mutations in patients with MDS/AML or MonoMAC syndrome and primary lymphedema reveal a key role for GATA2 in the lymphatic vasculature. Blood, 119(5):1283–1291, 2012.

701 Marlene Pasquet, Christine Bellanne-Chantelot, Suzanne Tavitian, Nais Prade, Blandine Beaupain, and Olivier LaRochelle et al. High frequency of GATA2 mutations in patients with mild chronic neutropenia evolving to MonoMac syndrome, myelodysplasia, and acute myeloid leukemia. Blood, 121(5):822, 2013.

702 Juehua Gao, Ryan D. Gentzler, Andrew E. Timms, Marshall S. Horwitz, Olga Frankfurt, Jessica K. Altman, and LoAnn C. Peterson. Heritable GATA2 mutations associated with familial AML-MDS: a case report and review of literature. Journal of Hematology & Oncology, 7(1):36, 2014.

703 Christopher N. Hahn, Peter J. Brautigan, Chan-Eng Chong, Alex Janssan, Parameswaran Venugopal, and Young Lee et al. Characterisation of a compound in-cis GATA2 germline mutation in a pedigree presenting with myelodysplastic syndrome/acute myeloid leukemia with concurrent thrombocytopenia. Leukemia, 29(8):1795–1797, 2015.

704 Xinan Wang, Hideki Muramatsu, Yusuke Okuno, Hirotoshi Sakaguchi, Kenichi Yoshida, and Nozomu Kawashima et al. GATA2 and secondary mutations in familial myelodysplastic syndromes and pediatric myeloid malignancies. Haematologica, 100(10):e398, 2015.

705 Liesel M. FitzGerald, Akash Kumar, Evan A. Boyle, Yuzheng Zhang, Laura M. McIntosh, and Suzanne Kolb et al. Germline missense variants in the BTNL2 gene are associated with prostate cancer susceptibility. Cancer Epidemiology Biomarkers & Prevention, 22(9):1520–1528, 2013.

706 Daniel C. Koboldt, Krishna L. Kanchi, Bin Gui, David E. Larson, Robert S. Fulton, and William B. Isaacs et al. Rare variation in TET2 is associated with clinically relevant prostate carcinoma in African Americans. Cancer Epidemiology Biomarkers & Prevention, 25(11):1456–1463, 2016.

707 Takahide Hayano, Hiroshi Matsui, Hirofumi Nakaoka, Nobuaki Ohtake, Kazuyoshi Hosomichi, Kazuhiro Suzuki, and Ituro Inoue. Germline variants of prostate cancer in Japanese families. PLOS ONE, 11(10):e0164233, 2016.

708 Danielle M. Karyadi, Milan S. Geybels, Eric Karlins, Brennan Decker, Laura McIntosh, and Amy Hutchinson et al. Whole exome sequencing in 75 high-risk families with validation and replication in independent case-control studies identifies TANGO2, OR5H14, and CHAD as new prostate cancer susceptibility genes. Oncotarget, 8(1):1495–1507, 2016.

709 Sally M. Hunter, Simone M. Rowley, David Clouston, Jason Li, Richard Lupat, and Nishanth Krishnananthan et al. Searching for candidate genes in familial BRCAX mutation carriers with prostate cancer. Urologic Oncology: Seminars and Original Investigations, 34(3):120.e9–120.e16, 2016.

710 Krinio Giannikou, Izabela A. Malinowska, Trevor J. Pugh, Rachel Yan, Yuen-Yi Tseng, and Coyin Oh et al. Whole exome sequencing identifies TSC1/TSC2 biallelic loss as the primary and sufficient driver event for renal angiomyolipoma development. PLOS Genetics, 12(8):e1006242, 2016.

711 Jinwoo Ahn, Kyung Seok Han, Jun Hyeok Heo, Duhee Bang, You Hyun Kang, and Hyun A. Jin et al. FOXC2 and CLIP4: a potential biomarker for synchronous metastasis of < 7-cm clear cell renal cell carcinomas. Oncotarget, 7(32):51423–51434, 2016.

712 Frank Y. Lin, Katie Bergstrom, Richard Person, Abhishek Bavle, Leomar Y. Ballester, and Sarah Scollon et al. Integrated tumor and germline whole-exome sequencing identifies mutations in MAPK and PI3K pathway genes in an adolescent with rosette-forming glioneuronal tumor of the fourth ventricle. Cold Spring Harbor Molecular Case Studies, 2(5):a001057, 2016.

713 Leora Witkowski, Jian Carrot-Zhang, Steffen Albrecht, Somayyeh Fahiminiya, Nancy Hamel, and Eva Tomiak et al. Germline and somatic SMARCA4 mutations characterize small cell carcinoma of the ovary, hypercalcemic type. Nature Genetics, 46(5):438–443, 2014.

714 Pilar Ramos, Anthony N. Karnezis, David W. Craig, Aleksandar Sekulic, Megan L. Russell, and William P. D. Hendricks et al. Small cell carcinoma of the ovary, hypercalcemic type, displays frequent inactivating germline and somatic mutations in SMARCA4. Nature Genetics, 46(5):427–429, 2014.

715 Pierre-Marie Lavrut, Francois Le Loarer, Charline Normand, Celine Grosos, Remi Dubois, and Anni Buenerd et al. Small cell carcinoma of the ovary, hypercalcemic type/ovarian malignant rhabdoid tumor: report of a bilateral case in a teenager associated with SMARCA4 germline mutation. Pediatric Developmental Pathology, 19:56–60, 2015.

716 Joanna Moes-Sosnowska, Lukasz Szafron, Dorota Nowakowska, Agnieszka Dansonka-Mieszkowska, Agnieszka Budzilowska, and Bozena Konopka et al. Germline SMARCA4 mutations in patients with ovarian small cell carcinoma of hypercalcemic type. Orphanet Journal of Rare Diseases, 10(1):32, 2015.

717 Yoshitatsu Sei, Xilin Zhao, Joanne Forbes, Silke Szymczak, Qing Li, and Apurva Trivedi et al. A hereditary form of small intestinal carcinoid associated with a germline mutation in inositol polyphosphate multikinase. Gastroenterology, 149(1):67–78, 2015.

718 Kie Kyon Huang, Kang Won Jang, Sangwoo Kim, Han Sang Kim, Sung-Moo Kim, and Hyeong Ju Kwon et al. Exome sequencing reveals recurrent REV3L mutations in cisplatin-resistant squamous cell carcinoma of head and neck. Scientific Reports, 6:19552, 2016.

719 Huixing Pan, Xiaojian Xu, Deyao Wu, Qiaocheng Qiu, Shoujun Zhou, and Xuefeng He et al. Novel somatic mutations identified by whole-exome sequencing in muscle-invasive transitional cell carcinoma of the bladder. Oncology Letters, 11(2):1486–1492, 2016.

720 Sandra Hanks, Elizabeth R. Perdeaux, Sheila Seal, Elise Ruark, Shazia S. Mahamdallie, and Anne Murray et al. Germline mutations in the PAF1 complex gene CTR9 predispose to Wilms tumour. Nature Communications, 5, 2014.

721 Cristina R. Antonescu and Paola Dal Cin. Promiscuous genes involved in recurrent chromosomal translocations in soft tissue tumours. Pathology, 46(2):105–112, 2014.

722 Jungho Kim and Jerry Pelletier. Molecular genetics of chromosome translocations involving EWS and related family members. Physiological Genomics, 1(3):127–138, 1999.

723 Fredrik Mertens, Cristina R. Antonescu, and Felix Mitelman. Gene fusions in soft tissue tumors: recurrent and overlapping pathogenetic themes. Genes, Chromosomes and Cancer, 55(4):291–310, 2016.

724 Felix Mitelman, Bertil Johansson, and Fredrik Mertens. The impact of translocations and gene fusions on cancer causation. Nature Reviews Cancer, 7(4):233–245, 2007.

725 Johanna Manner, Bernhard Radlwimmer, Peter Hohenberger, Katharina Mossinger, Stefan Kuffer, and Christian Sauer et al. MYC high level gene amplification is a distinctive feature of angiosarcomas after irradiation or chronic lymphedema. The American Journal of Pathology, 176(1):34–39, 2010.

726 Patrick S. Tarpey, Sam Behjati, Susanna L. Cooke, Peter Van Loo, David C. Wedge, and Nischalan Pillay et al. Frequent mutation of the major cartilage collagen gene COL2A1 in chondrosarcoma. Nature Genetics, 45(8):923–926, 2013.

727 Janusz Limon, Anna Szadowska, Mariola Iliszko, Malgorzata Babinska, Krzysztof Mrozek, and Janusz Jaskiewicz et al. Recurrent chromosome changes in two adult fibrosarcomas. Genes, Chromosomes and Cancer, 21(2):119–123, 1998.

728 Eva Van den Berg, Willemina M. Molenaar, Harald J. Hoekstra, Willem A. Kamps, and Bauke De Jong. DNA ploidy and karyotype in recurrent and metastatic soft tissue sarcomas. Modern Pathology, 5(5):505–514, 1992.

729 Paola Dal Cin, Patrick Pauwels, Raf Sciot, and Herman Van Den Berghe. Multiple chromosome rearrangements in a fibrosarcoma. Cancer Genetics and Cytogenetics, 87(2):176–178, 1996.

730 Jilong Yang, Xiaoling Du, Kexin Chen, Antti Ylipaa, Alexander J.F. Lazar, and Jonathan Trent et al. Genetic aberrations in soft tissue leiomyosarcoma. Cancer Letters, 275(1):1–8, 2009.

731 Avery A. Sandberg. Updates on the cytogenetics and molecular genetics of bone and soft tissue tumors: leiomyosarcoma. Cancer Genetics and Cytogenetics, 161(1):1–19, 2005.

732 Ahmed Idbaih, Jean-Michel Coindre, Josette Derre, Odette Mariani, Philippe Terrier, and Dominique Ranchere et al. Myxoid malignant fibrous histiocytoma and pleomorphic liposarcoma share very similar genomic imbalances. Laboratory Investigation, 85(2):176–181, 2005.

733 Hannelore Schmidt, Frank Bartel, Matthias Kappler, Peter Wurl, Heidemarie Lange, and Matthias Bache et al. Gains of 13q are correlated with a poor prognosis in liposarcoma. Modern Pathology, 18(5):638–644, 2005.

734 Barry S. Taylor, Jordi Barretina, Nicholas D. Socci, Penelope DeCarolis, Marc Ladanyi, and Matthew Meyerson et al. Functional copy-number alterations in cancer. PLOS ONE, 3(9):e3179, 2008.

735 Christopher D.M. Fletcher, Paola Dal Cin, Ivo De Wever, Nils Mandahl, Fredrik Mertens, and Felix Mitelman et al. Correlation between clinicopathological features and karyotype in spindle cell sarcomas: a report of 130 cases from the CHAMP study group. The American Journal of Pathology, 154(6):1841–1847, 1999.

736 Fredrik Mertens, Paola Dal Cin, Ivo De Wever, Christopher D.M. Fletcher, Nils Mandahl, and Felix Mitelman et al. Cytogenetic characterization of peripheral nerve sheath tumours: a report of the CHAMP study group. The Journal of Pathology, 190(1):31–38, 2000.

737 R. Stuart Bridge, Julia Ann Bridge, James R. Neff, Sabine Naumann, Pamela A. Althof, and Leslie A. Bruch. Recurrent chromosomal imbalances and structurally abnormal breakpoints within complex karyotypes of malignant peripheral nerve sheath tumour and malignant triton tumour: a cytogenetic and molecular cytogenetic study. Journal of Clinical Pathology, 57(11):1172–1178, 2004.

738 Fredrik Mertens, Christopher D.M. Fletcher, Paola Dal Cin, Ivo De Wever, Nils Mandahl, and Felix Mitelman et al. Cytogenetic analysis of 46 pleomorphic soft tissue sarcomas and correlation with morphologic and clinical features: a report of the CHAMP study group. Genes, Chromosomes and Cancer, 22(1):16–25, 1998.

739 Anwar N. Mohamed, Mark M. Zalupski, James R. Ryan, Fred Koppitch, Stanley Balcerzak, Raymond Kempf, and Sandra R. Wolman. Cytogenetic aberrations and DNA ploidy in soft tissue sarcoma: a Southwest Oncology Group Study. Cancer Genetics and Cytogenetics, 99(1):45–53, 1997.

740 Guidong Li, Akira Ogose, Hiroyuki Kawashima, Hajime Umezu, Tetsuo Hotta, and Tsuyoshi Tohyama et al. Cytogenetic and real-time quantitative reverse-transcriptase polymerase chain reaction analyses in pleomorphic rhabdomyosarcoma. Cancer Genetics and Cytogenetics, 192(1):1–9, 2009.

741 Anthony Gordon, Aidan McManus, John Anderson, Cyril Fisher, Syuiti Abe, and Takayuki Nojima et al. Chromosomal imbalances in pleomorphic rhabdomyosarcomas and identification of the alveolar rhabdomyosarcoma-associated PAX3-FOXO1A fusion gene in one case. Cancer Genetics and Cytogenetics, 140(1):73–77, 2003.

742 Josette Derre, Real Lagace, Andre Nicolas, Aline Mairal, Frederic Chibon, and Jean-Michel Coindre et al. Leiomyosarcomas and most malignant fibrous histiocytomas share very similar comparative genomic hybridization imbalances: an analysis of a series of 27 leiomyosarcomas. Laboratory Investigation, 81(2):211–215, 2000.

743 Marcelo L. Larramendy, Massimiliano Gentile, Sonia Soloneski, Sakari Knuutila, and Tom Bohling. Does comparative genomic hybridization reveal distinct differences in DNA copy number sequence patterns between leiomyosarcoma and malignant fibrous histiocytoma? Cancer Genetics and Cytogenetics, 187(1):1–11, 2008.

744 Ana Carneiro, Princy Francis, Par-Ola Bendahl, Josefin Fernebro, Mans Akerman, and Christopher Fletcher et al. Indistinguishable genomic profiles and shared prognostic markers in undifferentiated pleomorphic sarcoma and leiomyosarcoma: different sides of a single coin? Laboratory Investigation, 89(6):668–675, 2009.

745 Ching C. Lau, Charles P. Harris, Xin-Yan Lu, Laszlo Perlaky, Sheila Gogineni, and Murali Chintagumpala et al. Frequent amplification and rearrangement of chromosomal bands 6p12-p21 and 17p11.2 in osteosarcoma. Genes, Chromosomes and Cancer, 39(1):11–21, 2004.

746 Shamini Selvarajah, Maisa Yoshimoto, Olga Ludkovski, Paul C. Park, Jane Bayani, and Paul Thorner et al. Genomic signatures of chromosomal instability and osteosarcoma progression detected by high resolution array CGH and interphase FISH. Cytogenetic and Genome Research, 122(1):5–15, 2008.

747 Jane Bayani, Maria Zielenska, Ajay Pandita, Khaldoun Al-Romaih, Jana Karaskova, and Karen Harrison et al. Spectral karyotyping identifies recurrent complex rearrangements of chromosomes 8, 17, and 20 in osteosarcomas. Genes, Chromosomes and Cancer, 36(1):7–16, 2003.

748 Julia A. Bridge, Marilu Nelson, Erin McComb, Michael H. McGuire, Howard Rosenthal, and Gerardo Vergara et al. Cytogenetic findings in 73 osteosarcoma specimens and a review of the literature. Cancer Genetics and Cytogenetics, 95(1):74–87, 1997.

Appendices

Appendix A

World Health Organisation classification of soft tissue tumours and bone tumours

SOFT TISSUE TUMOURS

Adipocytic tumours

Benign

Lipoma

Lipomatosis

Lipomatosis of nerve

Lipoblastoma / lipoblastomatosis

Angiolipoma

Myolipoma of soft tissue

Chondroid lipoma

Extra-renal angiomyolipoma

Extra-adrenal myelolipoma

Spindle cell / pleomorphic lipoma

Hibernoma

Intermediate (locally aggressive)

Atypical lipomatous tumour / well differentiated liposarcoma

Malignant

Dedifferentiated liposarcoma

Myxoid liposarcoma

Pleomorphic liposarcoma

Liposarcoma, not otherwise specified

Atypical lipomatous tumour (ALT)

Adipocytic (lipoma-like)

Sclerosing

Inflammatory types

Dedifferentiated liposarcoma

Fibroblastic / myofibroblastic tumours

Benign

Nodular fasciitis

Proliferative fasciitis

Proliferative myositis

Myositis ossificans

Fibro-osseous pseudotumour of digits

Ischemic fasciitis

Elastofibroma

Fibrous hamartoma of infancy

Fibromatosis colli

Juvenile hyaline fibromatosis

Inclusion body fibromatosis

Fibroma of tendon sheath

Desmoplastic fibroblastoma

Mammary-type myofibroblastoma

Calcifying aponeurotic fibroma

Angiomyofibroblastoma

Cellular angiofibroma

Nuchal-type fibroma

Gardner fibroma

Calcifying fibrous tumour

Intermediate (locally aggressive)

Palmar / plantar fibromatosis

Desmoid-type fibromatosis

Lipofibromatosis

Giant cell fibroblastoma

Intermediate (rarely metastasizing)

Dermatofibrosarcoma protuberans

Fibrosarcomatous dermatofibrosarcoma protuberans

Pigmented dermatofibrosarcoma protuberans

Solitary fibrous tumour

Solitary fibrous tumour, malignant

Inflammatory myofibroblastic tumour

Low grade myofibroblastic sarcoma

Myxoinflammatory fibroblastic sarcoma

Atypical myxoinflammatory fibroblastic tumour

Infantile fibrosarcoma

Malignant

Adult fibrosarcoma

Myxofibrosarcoma

Low-grade fibromyxoid sarcoma

Sclerosing epithelioid fibrosarcoma

Nodular fasciitis

Extrapleural solitary fibrous tumour

Low grade fibromyxoid sarcoma (LGFMS)

Sclerosing epithelioid fibrosarcoma (SEF)

So-called fibrohistiocytic tumours

Benign

Tenosynovial giant cell tumour

Localized type

Diffuse type

Malignant

Deep benign fibrous histiocytoma

Intermediate (rarely metastasizing)

Plexiform fibrohistiocytic tumour

Giant cell tumour of soft tissue

Tenosynovial giant cell tumour

Smooth-muscle tumours

Benign

Leiomyoma of deep soft tissue

Malignant

Leiomyosarcoma (excluding skin)

Leiomyosarcoma

Pericytic (perivascular) tumours

Glomus tumour (and variants)

Glomangiomatosis

Malignant glomus tumour

Myopericytoma

Myofibroma

Myofibromatosis

Angioleiomyoma

Skeletal-muscle tumours

Rhabdomyoma

Embryonal rhabdomyosarcoma

Alveolar rhabdomyosarcoma

Pleomorphic rhabdomyosarcoma

Spindle cell / sclerosing rhabdomyosarcoma

Alveolar rhabdomyosarcoma (ARMS)

Vascular tumours

Benign

Haemangioma

Synovial

Venous

Arteriovenous haemangioma / malformation

Epithelioid haemangioma

Angiomatosis

Lymphangioma

Intermediate (locally aggressive)

Kaposiform haemangioendothelioma

Intermediate (rarely metastasizing)

Retiform haemangioendothelioma

Papillary intralymphatic angioendothelioma

Composite haemangioendothelioma

Pseudomyogenic (epithelioid sarcoma-like) haemangioendothelioma

Kaposi sarcoma

Malignant

Epithelioid haemangioendothelioma

Angiosarcoma of soft tissue

Gastrointestinal stromal tumours

Benign gastrointestinal stromal tumour

Gastrointestinal stromal tumour

Gastrointestinal stromal tumour

Nerve sheath tumours

Benign

Schwannoma (including variants)

Melanotic schwannoma

Neurofibroma (including variants)

Plexiform neurofibroma

Perineurioma

Malignant perineurioma

Granular cell tumour

Dermal nerve sheath myxoma

Solitary circumscribed neuroma

Ectopic meningioma

Nasal glial heterotopia

Benign Triton tumour

Hybrid nerve sheath tumours

Malignant

Malignant peripheral nerve sheath tumour

Epithelioid malignant nerve sheath tumour

Malignant Triton tumour

Malignant granular cell tumour

Ectomesenchymoma

Tumours of uncertain differentiation

Benign

Acral fibromyxoma

Intramuscular myxoma (including cellular variant)

Juxta-articular myxoma

Deep (“aggressive”) angiomyxoma

Pleomorphic hyalinizing angiectatic tumour

Ectopic hamartomatous thymoma

Intermediate (locally aggressive)

Haemosiderotic fibrolipomatous tumour

Intermediate (rarely metastasizing)

Atypical fibroxanthoma

Angiomatoid fibrous histiocytoma

Ossifying fibromyxoid tumour

Ossifying fibromyxoid tumour, malignant

Mixed tumour NOS

Mixed tumour NOS, malignant

Myoepithelioma

Myoepithelial carcinoma

Phosphaturic mesenchymal tumour

Phosphaturic mesenchymal tumour

Malignant

Synovial sarcoma NOS

Synovial sarcoma, spindle cell

Synovial sarcoma, biphasic

Epithelioid sarcoma

Alveolar soft-part sarcoma

Clear cell sarcoma of soft tissue

Extraskeletal myxoid chondrosarcoma

Extraskeletal Ewing sarcoma

Desmoplastic small round cell tumour

Extra-renal rhabdoid tumour

Neoplasms with perivascular epithelioid cell differentiation (PEComa)

PEComa NOS, benign

PEComa NOS, malignant

Intimal sarcoma

Undifferentiated / unclassified sarcomas

Undifferentiated spindle cell sarcoma

Undifferentiated pleomorphic sarcoma

Undifferentiated round cell sarcoma

Undifferentiated epithelioid sarcoma

Undifferentiated sarcoma NOS

Undifferentiated round cell and spindle cell sarcoma

Undifferentiated pleomorphic sarcoma (UPS)

TUMOURS OF BONE

Chondrogenic tumours

Benign

Osteochondroma

Chondroma

Enchondroma

Periosteal chondroma

Osteochondromyxoma

Subungual exostosis

Bizarre parosteal osteochondromatous proliferation

Synovial chondromatosis

Intermediate (locally aggressive)

Chondromyxoid fibroma

Atypical cartilaginous tumour / chondrosarcoma grade I

Intermediate (rarely metastasizing)

Chondroblastoma

Malignant

Chondrosarcoma

Grade II, Grade III

Dedifferentiated chondrosarcoma

Mesenchymal chondrosarcoma

Clear cell chondrosarcoma

Osteochondromyxoma

Bizarre parosteal osteochondromatous proliferation

Chondrosarcoma (grades I-III)

Osteogenic tumours

Benign

Osteoma

Osteoid osteoma

Intermediate (locally aggressive)

Osteoblastoma

Malignant

Low-grade central osteosarcoma

Conventional osteosarcoma

Chondroblastic osteosarcoma

Fibroblastic osteosarcoma

Osteoblastic osteosarcoma

Telangiectatic osteosarcoma

Small cell osteosarcoma

Secondary osteosarcoma

Parosteal osteosarcoma

Periosteal osteosarcoma

High-grade surface osteosarcoma

Osteoclastic giant cell rich tumours

Benign

Giant cell lesion of the small bones

Intermediate (locally aggressive)

Giant cell tumour of bone

Malignant

Malignancy in giant cell tumour of bone

Fibrohistiocytic tumours

Benign

Benign fibrous histiocytoma / non-ossifying fibroma

Notochordal tumours

Benign

Benign notochordal tumour

Malignant

Chordoma

Vascular tumours

Benign

Haemangioma

Intermediate (locally aggressive, rarely metastasizing)

Epithelioid hemangioma

Malignant

Epithelioid hemangioendothelioma

Angiosarcoma

Reference: Bridge, J. A., et al. WHO classification of tumours of soft tissue and bone. International Agency for Research on Cancer, 2013.


Appendix B

Novel tumour-predisposing genes identified by whole exome sequencing


Cancer Population Patients Genes Citation Additional studies

Asbestos-exposed lung adenocarcinoma

Finland 26 cases MRPL1, SDK1, SEMA5B, INPP4A

594

Adenomatous polyposis and colorectal carcinomas

Netherlands, USA

51 patients from 48 families, negative for APC and MUTYH mutations

NTHL1 595

Atypical gastric neuroendocrine tumour, type 1

Spain Large family with consanguineous parents and 5/10 affected children

ATP4A 596

Brain Germany 1 family CASP9 597

Breast Poland, Canada

144 Polish and 51 French-Canadian patients with family history and/or early onset, negative for founder mutations in BRCA1, BRCA2, CHEK2, NBN and PALB2

RECQL 598 599

China 9 early-onset patients with family history, negative for BRCA1 and BRCA2 mutations

RECQL 599 598

Finland 24 breast cancer patients from 11 families, negative for BRCA1 and BRCA2 mutations

FANCM 600 601, 602


Multiple 89 early-onset breast cancer patients from 47 families

RINT1 603

Australia 33 breast cancer patients from 15 families, negative for BRCA1 and BRCA2 mutations

FANCC, BLM

604 605–607

Multiple 13 families XRCC2 608

Finland 129 female hereditary breast and/or ovarian cancer patients, up to 989 female controls

ATM, MYC, PLAU, RAD1 and RRM2B

609

Chondrosarcoma France 2 third-degree affected relatives in a single family

EXT2 610

Chronic lymphocytic leukaemia

European 59 chronic lymphocytic leukaemia-prone families and 173 unrelated chronic lymphocytic leukaemia patients

ITGB2 611

UK 66 chronic lymphocytic leukaemia families

POT1 612

Colorectal China 23 early-onset colorectal cancer patients from 21 families

EIF2AK4 613

Spain 3 patients from a large family FAN1 614


Spain Patients from 29 families, negative for mutations in known colorectal cancer genes

CDKN1B, XRCC4, EPHX1, NFKBIZ, SMARCA4, BARD1

615

Finland 4 patients from a large family, negative for mutations in known colorectal cancer genes

RPS20 616

Finland 96 patients with family history of colorectal cancer, negative for mutations in known colorectal cancer genes

UACA, SFXN4, TWSG1, PSPH, NUDT7, ZNF490, PRSS37, CCDC18, PRADC1, MRPL3, AKR1C4

617

Taiwan 50 colorectal cancer cases NRAS 618

UK 1,006 early-onset familial CRC cases and 1,609 healthy controls

MRE11, POLE2 and POT1

619


Netherlands 55 colorectal cancer cases with a disease onset before 45 years of age

PTPN12 and LRP6

620

Colorectal adenomas and carcinomas

UK Probands from 15 colorectal adenoma families, negative for mutations in APC and MUTYH

POLE, POLD1

621 622–629

Ashkenazi 2 sisters MCM9 630

Colorectal adenomatous polyposis

Germany 102 unrelated individuals MSH2 631

Germany 12 colorectal adenomas from seven unrelated patients

DSC2, PIEZO1, ZSWIM7

629

Colorectal adenomatous polyposis

Germany 12 colorectal adenomas from 7 unrelated patients with unexplained sporadic adenomatous polyposis

DSC2, PIEZO1, ZSWIM7

632

Esophageal adenocarcinoma and Barrett esophagus

USA Large family VSIG10L 633


Esophageal squamous cell carcinoma

China 51 stage I and 53 stage III esophageal squamous cell carcinomas

FAM84B 634

Familial schwannomatosis

China Large family, negative for mutations in known disease-causing genes

COQ6 635

Gastric Finland Large family with the diffuse type of gastric cancer, negative for mutations in CDH1

INSR, FBXO24, DOT1L

636

Netherlands Large family with the diffuse type of gastric cancer, negative for mutations in CDH1

CTNNA1 637 638

Glioma Multiple 90 patients from 55 families POT1 639

Hodgkin lymphoma

Middle East Family with 3 out of 5 affectedchildren and healthy parents

ACAN 640

Finland Large family with nodular lymphocyte predominant Hodgkin lymphoma

NPAT 641

USA 17 Hodgkin lymphoma-prone families with three or more affected cases or obligate carriers (69 individuals)

KDR 642

Infantile myofibromatosis

Brazil 2 affected brothers and their healthy consanguineous parents

NDRG4 643


Multiple 11 patients from 4 families, and 5 simplex cases

PDGFRB 644 645

USA 11 patients from 9 families PDGFRB 645

USA Large family, negative for PDGFRB mutations

NOTCH3 645 644

Invasive pituitary adenomas

China 6 invasive pituitary adenomas and 6 non-invasive pituitary adenomas

DPCR1, EGFL7, the PRDM family and LRRC50

646

Juvenile hamartomatous polyposis syndrome

USA Single patient with extensive family history, negative for known disease-causing mutations

SMAD9 647

Kaposi sarcoma Finland Large family STAT4 648

Kaposiform hemangioendothelioma

Japan Matched tumour and normal sample from an individual

ITGB2, IL-32 and DIDO1

649

Liver France 2 individuals from a family with recurrent well-differentiated hepatocellular tumours

DICER1 650

Lung USA Large family PARK 651

Taiwan Large family YAP1 652

Arab An individual with lung cancer from an extended family segregating different types of hereditary cancer

NBN 653

Lymphoblastic leukaemia

Multiple Large family ETV6 654 655

Male breast Italy 1 male and 2 female BRCA1/2 mutation-negative breast cancer cases from a family

PALB2 656

Melanoma Multiple 101 patients from 56 melanoma families, negative for CDKN2A and CDK4 mutations

POT1 208 657

Multiple 184 patients from 105 melanoma families, negative for CDKN2A and CDK4 mutations

POT1 657 208

USA, Australia, UK

Patient from large melanoma family

MITF 658 659, 660

USA Uveal melanoma patients BAP1 661 662–676

Finland 21 cases BAP1 594

Melanotic neuroectodermal tumour of infancy

UK Single patient CDKN2A 677


Multiple spinal meningiomas

UK 3 unrelated individuals with familial multiple spinal meningiomas, negative for mutations in NF2 and SMARCB1

SMARCE1 678 679–681

Nasopharyngeal carcinoma

China 161 NPC cases and 895 controls of Southern Chinese descent

MST1R 682

Nonmedullary thyroid cancer

USA Large family HABP2 683

China 5 subjects from a large family RTFC 684

Ovarian UK, USA, Australia, Germany

412 high-grade serous ovarian cancer cases

FANCM 685

Paediatric hepatocellular carcinoma

USA Single patient ABCB11 686

Paediatric poorly differentiated carcinoma

USA Patient with paediatric poorly differentiated carcinoma

APC 687

Pancreatic ductal adenocarcinoma

Japan 4 cases of KRAS mutation-negative pancreatic ductal adenocarcinoma

DCTN1-ALK fusion

688

Papillary thyroid carcinoma

USA, Canada

Large family SRRM2 689

Paraganglioma Spain Patient with multiple paragangliomas and family history of the disease

MDH2 690

Penile squamous cell carcinoma

UK 27 patients CSN1 691

Pheochromocytoma Spain 3 patients with familial pheochromocytoma, negative for mutations in known disease-causing genes

MAX 692 693, 694

Pre-B cell acute lymphoblastic leukemia

Puerto Rican and African American ancestry

2 families PAX2 695

Primary thyroid and breast

USA 14 female research participants with primary thyroid and breast cancers without mutations in PTEN

PARP4 696


Primary lymphedema associated with a predisposition to acute myeloid leukemia (Emberger syndrome)

European and Chinese descent

2 unrelated patients with family history and 1 sporadic case

GATA2 697 698–704

Prostate USA 91 patients from 19 families BTNL2 705

African American

652 aggressive prostate cancer patients and 752 disease-free controls

TET2 706

Japan 140 patients with prostate cancer from 66 families

TRRAP 707

USA 75 high-risk families TANGO2, OR5H14 and CHAD

708


Australia 5 prostate cancer-affected men from 3 families

PCTP, MCRS1, ATRIP, PARP2, CYP3A43, DOK3, PLEKHH3, HEATR5B, GPR124 and HKR1

709

Renal angiomyolipoma

USA 15 patients TSC1 and TSC2

710

Renal cell carcinoma

Korea 10 patients FOXC2 and CLIP4

711

Rosette-forming glioneuronal tumour

African American

A patient with rosette-forming glioneuronal tumour of the fourth ventricle

FGFR1, PIK3CA, PTPN11

712

Small cell carcinoma of the ovary, hypercalcemic type

USA, Canada, UK

6 patients from 3 families SMARCA4 713 714–716

Multiple 7 patients SMARCA4 714 713, 715, 716

Small intestinal carcinoids

USA Large family IPMK 717


Squamous cell carcinoma of head and neck

Korea 18 cisplatin-resistant metastatictumours and matched germline

REV3L 718

Transitional cell carcinoma

China 2 patients HECW1 719

Wilms tumour UK 35 families CTR9 720

A PubMed search was performed using the string (exome OR exom* OR NGS OR “whole genome” OR “next-generation” OR “next generation” OR WES) AND (familial OR hereditary OR susceptib* OR risk OR germline OR “germline”) AND (sequencing OR analysis) AND (cancer OR malignancy OR tumor* OR tumour*) AND English [lang]. Only studies that reported the identification of novel genes by exome sequencing were included. The search covered publications up to March 2017.
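The Boolean string above can also be assembled and submitted programmatically, which makes it easier to audit or rerun. The sketch below is an illustration only, assuming NCBI's E-utilities ESearch service; the thesis does not state how the search was executed. The names `BLOCKS`, `build_query` and `esearch_url`, the `mindate`/`maxdate` window (chosen to stand in for "up to March 2017") and the `retmax` value are all assumptions, and typographic quotes are replaced by straight quotes so PubMed reads them as phrase delimiters.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (assumed here as the way to run the query).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Term blocks transcribed from the search string; each inner list is one
# parenthesised OR-group. Straight quotes mark phrase searches.
BLOCKS = [
    ["exome", "exom*", "NGS", '"whole genome"', '"next-generation"',
     '"next generation"', "WES"],
    ["familial", "hereditary", "susceptib*", "risk", "germline", '"germline"'],
    ["sequencing", "analysis"],
    ["cancer", "malignancy", "tumor*", "tumour*"],
]


def build_query(blocks, language="English"):
    """OR the terms within each block, then AND the blocks and a language filter."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in blocks]
    return " AND ".join(groups + [f"{language}[lang]"])


def esearch_url(term, mindate="1900/01/01", maxdate="2017/03/31", retmax=10000):
    """Return a PubMed ESearch URL restricted by publication date.

    The URL is only built here, not requested. The date window is a
    hypothetical stand-in for "search results included up to March 2017".
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": mindate,
        "maxdate": maxdate,
        "retmax": retmax,
    }
    return ESEARCH_URL + "?" + urlencode(params)


if __name__ == "__main__":
    query = build_query(BLOCKS)
    print(query)
    print(esearch_url(query))
```

Keeping each OR-group as a list lets the query be checked term by term against the published string, and a term can be added without rebalancing parentheses by hand. Requesting the URL (for example with `urllib.request.urlopen`) would return matching PubMed IDs as XML; the manual screening for novel genes described above still has to follow.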


Appendix C

Familial cancer syndromes associated with sarcomas


Syndrome Sarcoma Inheritance Gene (location) Features

Beckwith-Wiedemann syndrome

RMS AD NSD1 (5q35.3), KIP2 (11p15.4), CDKN1C (11p15.4), H19 (11p15.5), KCNQ1OT1 (11p15.5), ICR1 (11p15.5)

Overgrowth syndrome: macroglossia, omphalocele, hemihypertrophy, gigantism, and associated tumour predisposition

Bloom syndrome RMS AR BLM (15q26.1) Progeroid syndrome: growth retardation, sun sensitivity, telangiectasias and other skin changes, and associated tumour predisposition

Costello syndrome RMS AD HRAS (11p15.5) RASopathy: coarse facies, short stature, distinctive hand posture and appearance, cardiac anomalies, developmental delay, congenital myopathy

Familial adenomatous polyposis

Gardner fibroma, desmoid, RMS

AD APC (5q21-q22) Individuals develop hundreds to thousands of polyps of the colon and rectum that can progress to colorectal carcinoma if not treated

Familial gastrointestinal stromal tumour

Gastrointestinal stromal tumour

AD KIT (4q12), PDGFRA (4q12)

Multiple gastrointestinal stromal tumours

Glomus tumours Glomus tumour AD GLMN (1p22.1) Glomuvenous malformations, glomangioma


Gorlin-Goltz (nevoid basal cell carcinoma) syndrome

RMS, fetal rhabdomyoma

AD PTCH1 (9q22.32) Multiple basal cell carcinomas, odontogenic keratocysts, palmar/plantar pits, calcification of the falx cerebri, rib abnormalities

Hereditary leiomyomatosis and renal cell carcinoma syndrome

Leiomyosarcoma (uterus)

AD FH (1q43) Tumour predisposition syndrome: cutaneous piloleiomyomas, uterine leiomyomas, type 2 papillary renal cell carcinomas

Hereditary retinoblastoma

Sarcomas as second malignant neoplasm, lipoma

AD RB1 (13q14.2) Retinoblastoma, often bilateral and typically in very early childhood

Leiomyomatosis-Alport syndrome

Leiomyoma XLD COL4A6 (Xq22.3) Alport syndrome plus multiple, diffuseleiomyomas

Li-Fraumeni syndrome

RMS, undifferentiated pleomorphic sarcoma, pleomorphic liposarcoma

AD TP53 (17p13.1) Inherited cancer syndrome: early onset of tumours, multiple tumours within an individual; most commonly sarcomas, others include breast cancer, central nervous system tumours, leukaemia and adrenocortical carcinoma

Maffucci syndrome Spindle cell hemangiomas

IDH1 (2q34), IDH2 (15q26.1)

Multiple enchondromas (increased risk of chondrosarcoma) and hemangiomas

Mazabraud syndrome

Myxomas GNAS1 (20q13.32) Myxomas and fibrous dysplasia


Mosaic variegated aneuploidy

RMS AR BUB1B (15q15) Intrauterine growth restriction, microcephaly, spectrum of other anomalies, and a high risk of malignancy including RMS, Wilms tumour, and hematologic malignancy

Neurofibromatosis type 1

Malignant peripheral nerve sheath tumour, RMS, neurofibroma, gastrointestinal stromal tumour

AD NF1 (17q11.2) Cafe-au-lait spots, Lisch nodules in the eye, increased susceptibility to benign and malignant tumours

Neurofibromatosis type 2

Schwannoma, RMS, malignant rhabdoid tumour

AD NF2 (22q12.2) Tumours of the eighth cranial nerve (usually bilateral) and other schwannomas, meningiomas of the brain, and schwannomas of the dorsal roots of the spinal cord

Nijmegen breakage syndrome

RMS AR NBS1 (8q21.3) Chromosomal instability syndrome: microcephaly, growth retardation, immunodeficiency, and tumour predisposition

Noonan syndrome RMS, lymphangioma

AD PTPN11 (12q24) RASopathy: dysmorphic facies, short stature, neck webbing, cardiac anomalies, deafness, bleeding diathesis

Roberts syndrome RMS AR ESCO2 (8p21.1) Range of mild to severe malformations of the bones, arms, legs, skull and face; features similar to those seen in thalidomide exposure


Rothmund-Thomson syndrome

Osteosarcoma AR RECQL4 (8q24.3) Skin atrophy, telangiectasia, hyper- and hypopigmentation, congenital skeletal abnormalities, short stature, premature ageing, and increased risk of malignant disease

Rubinstein-Taybi syndrome

RMS AD CREBBP (16p13.3)

Multiple congenital anomalies, developmental delay, microcephaly, dysmorphic features, and tumour predisposition

Simpson-Golabi-Behmel syndrome

Embryonal tumours

XLR GPC3 Overgrowth syndrome: coarse facies, congenital heart defects, overgrowth, and other anomalies

Tuberous sclerosis RMS, cardiac rhabdomyoma, chordoma, renal angiomyolipoma, perivascular epithelioid cell tumours

AD TSC1 (9q34), TSC2 (16p13.3), TSC3 (12q22-24.1)

Hamartomas of multiple organs, angiomyolipomas, other renal tumours (cysts and renal cell carcinomas), lymphangioleiomyomatosis, angiofibromas and other skin lesions

Werner syndrome RMS AR WRN (8p12-p11.2) Progeroid syndrome: scleroderma-like skin changes, early-onset atherosclerosis and diabetes

AD: autosomal dominant. AR: autosomal recessive. RMS: rhabdomyosarcoma. XLD: X-linked dominant. XLR: X-linked recessive.


Appendix D

Translocations associated with sarcomas

Translocation Genes

Alveolar rhabdomyosarcoma

t(2;13)(q36;q14) PAX3-FOXO1

t(1;13)(p36;q14) PAX7-FOXO1

t(8;13;9)(p11;q14;q32) FOXO1-FGFR1

t(X;2)(q13;q36) PAX3-FOXO4

t(2;2)(p23;q36) PAX3-NCOA1

t(2;8)(q36;q13) PAX3-NCOA2

Alveolar soft-part sarcoma

t(X;17)(p11.2;q25) ASPL-TFE3

Angiomatoid fibrous histiocytoma

t(2;22)(q33;q12) EWSR1-CREB1

t(12;16)(q13;p11) FUS-ATF1

t(12;22)(q13;q12) EWSR1-ATF1

Chondroid lipoma

t(11;16)(q13;p13) C11orf95-MKL2


Clear-cell sarcoma

t(2;22)(q33;q12) EWSR1-CREB1

t(12;22)(q13;q12) EWSR1-ATF1

Congenital fibrosarcoma

t(12;15)(p13;q25) ETV6-NTRK3

Dedifferentiated liposarcoma

t(5;5)(p15;p15) TRIO-TERT

t(9;12)(q33;q15) CNOT2-ASTN2

?t(12)(q14q14) CTDSP2-FAM19A2

t(9;12)(q33;q21) NR6A1-TRHDE

?t(12)(q15q21) NUP107-LGR5

t(9;12)(q33;q15) NUP107-PAPPA

t(5;14)(p13;q32) RCOR1-WDR70

Dermatofibrosarcoma protuberans

t(17;22)(q22;q13) COL1A1-PDGFB

Desmoplastic small round-cell tumour

t(11;22)(p13;q12) EWSR1-WT1

t(21;22)(q22;q12) EWSR1-ERG

Endometrial stromal sarcoma

t(6;10)(p21;p11) EPC1-PHF1

t(6;7)(p21;p15) JAZF1-PHF1

t(7;17)(p15;q11) JAZF1-SUZ12

t(1;6)(p34;p21) MEAF6-PHF1

t(10;17)(q23;p13) YWHAE-FAM22A

t(10;17)(q22;p13) YWHAE-FAM22B

t(X;22)(p11;q13) ZC3H7B-BCOR

Epithelioid hemangioendothelioma

t(1;3)(p36;q25) WWTR1-CAMTA1

t(X;11)(p11;q22) YAP1-TFE3


Epithelioid sarcoma of the ovary

t(12;12)(q23;q24) CMKLR1-HNF1A

t(12;12)(q13;q22) ERBB3-CRADD

t(1;22)(p36;q11) SMARCB1-WASF2

Ewing’s sarcoma

t(11;22)(q24;q12) EWSR1-FLI1

t(21;22)(q22;q12) EWSR1-ERG

t(7;22)(p22;q12) EWSR1-ER81

t(17;22)(q21;q12) EWSR1-ETV4

t(2;22)(q33;q12) EWSR1-FEV

t(16;21)(p11;q22) FUS-ERG

t(2;16)(q35;p11) FUS-FEV

t(20;22)(q13;q12) EWSR1-NFATC2

t(6;22)(p21;q12) EWSR1-POU5F1

t(4;22)(q31;q12) EWSR1-SMARCA5

t(7;22)(p21;q12) EWSR1-ETV1

Fibromyxoid sarcoma

t(7;16)(q34;p11) FUS-CREB3L2

t(11;16)(p11;p11) FUS-CREB3L1

t(11;22)(p11;q12) EWSR1-CREB3L1

Inflammatory myofibroblastic tumour

2p23 rearrangements TPM3-ALK; TPM4-ALK

inv(2)(p23q35) ATIC-ALK

t(2;11)(p23;p15) CARS-ALK

t(2;17)(p23;q23) CLTC-ALK

t(2;12)(p23;p11) PPFIBP1-ALK

t(2;2)(p23;q13) RANBP2-ALK

t(X;6)(p11;p24) RREB1-TFE3

t(2;4)(p23;q21) SEC31A-ALK

t(1;2)(q21;p23) TPM3-ALK


t(2;19)(p23;p13) TPM4-ALK

t(2;2)(p21;p23) EML4-ALK

Kaposi’s sarcoma

EZH2, SIRT1

Leiomyoma of the uterus

inv(7)(p21q22) CUX1-AGR3

t(12;14)(q14;q11) HMGA2-CCNB1IP1

t(7;12)(q31;q14) HMGA2-COG5

t(8;12)(q22;q14) HMGA2-COX6C

t(12;14)(q14;q24) HMGA2-RAD51L1

Leiomyosarcoma

SIRT1

Lipoblastoma

t(7;8)(q21;q12) COL1A2-PLAG1

t(2;8)(q31;q12.1) COL3A1-PLAG1

del(8)(q12q24) HAS2-PLAG1

Lipoma

t(5;12)(q33;q14) EBF1-LOC204010

t(2;12)(q37;q14) HMGA2-CXCR7

t(5;12)(q33;q14) HMGA2-EBF1

t(12;13)(q14;q13) HMGA2-LHFP

t(3;12)(q28;q14) HMGA2-LPP

t(9;12)(p22;q14) HMGA2-NFIB

t(1;12)(p32;q14) HMGA2-PPAP2B

t(3;12)(q28;q14) LPP-C12orf9

Mesenchymal chondrosarcoma

t(8;8)(q12;q21) HEY1-NCOA2

t(1;5)(q42;q32) IRF2BP2-CDX1

Myoepithelioma

t(12;22)(q13;q12) EWSR1-ATF1

t(1;22)(q23;q12) EWSR1-PBX1

t(6;22)(p21;q12) EWSR1-POU5F1

t(19;22)(q13;q12) EWSR1-ZNF444


Myxoid chondrosarcoma

t(9;17)(q31;q12) TAF15-NR4A3

t(3;9)(q12;q31) TFG-NR4A3

t(9;15)(q31;q21) TCF12-NR4A3

t(9;22)(q22-31;q11-12) EWSR1-NR4A3

Myxoid liposarcoma

t(12;16)(q13;p11) FUS-DDIT3

t(12;22)(q13;q12) EWSR1-DDIT3

Ossifying fibromyxoid tumour

t(6;12)(p21;q24) EP400-PHF1

PEComa

t(X;1)(p11;p34) SFPQ-TFE3

t(14;X)(q24;q12) RAD51B-OPHN1

t(14;X)(q24;p11) RAD51B-RRAGB

Pericytoma

t(7;12)(p22;q13) ACTB-GLI1

Primary pulmonary myxoid sarcoma

t(2;22)(q33;q12) EWSR1-CREB1

Sclerosing epithelioid fibrosarcoma

t(7;16)(q34;p11) FUS-CREB3L2

t(11;22)(p11;q12) EWSR1-CREB3L1

t(7;22)(q34;q12) EWSR1-CREB3L2

Soft tissue angiofibroma

t(5;8)(p15;q13) AHRR-NCOA2

t(7;8;14)(q11;q13;q31) GTF2I-NCOA2

Soft tissue chondroma

t(3;12)(q28;q14) HMGA2-LPP

Solitary fibrous tumour

inv(12)(q13q13) NAB2-STAT6


Spindle cell rhabdomyosarcoma

t(6;8)(p21;q13) SRF-NCOA2

t(8;11)(q13;p15) TEAD1-NCOA2

t(6;6)(q22;q24) VGLL2-CITED2

t(6;8)(q22;q13) VGLL2-NCOA2

Synovial sarcoma

t(X;18)(p11;q11) SS18-SSX1, SS18-SSX2, SS18-SSX4

Tenosynovial giant cell tumour

t(1;2)(p13;q37) COL6A3-CSF1

Undifferentiated sarcomas

inv(X)(p11p11) BCOR-CCNB3

t(4;19)(q35;q13) CIC-DUX4

t(10;19)(q26;q13) CIC-DUX4L10

t(6;22)(p21;q12) EWSR1-POU5F1

t(2;22)(q31;q12) EWSR1-SP3

Citations: 106, 107, 721–724


Appendix E

Genetically complex sarcomas


Sarcoma Genes References

Angiosarcoma Amplification: 8q24.21 (MYC), 10p12.33, 5q35.3 725

Chondrosarcoma (types other than extraskeletal myxoid)

COL2A1, IDH1, IDH2, TP53, RB1 pathway 726

Embryonal rhabdomyosarcoma

Polysomy: 8, 2, 11, 12, 13 and 20. Monosomy: 10 and 15. LOH: 11p15.5 (IGF2, H19, CDKN1C, HOTS). Gains: 12q13

64

Fibrosarcoma (other than congenital)

Multiple non-specific numerical and structural chromosomal abnormalities. Gain: 22q (PDGF-B)

727–729

Leiomyosarcoma Gains: 1, 5, 6, 8, 15, 16, 17, 19, 20, 22, X. Losses: 1p, 2, 3, 4, 6q, 8, 9, 10p, 11p, 12q, 11q, 13, 16, 17p, 18, 19, 22q. Amplifications: 1, 5, 8, 12, 13, 17, 19, 20

730,731

Liposarcoma (types other than myxoid)

Gains of 1p, 1q21-q32, 2q, 3p, 3q, 5p12-p15, 5q, 6p21, 7p, 7q22, 8q, 10q, 12q12-q24, 13q, 14q, 15q, 17p, 17q, 18p, 18q12, 19p12, 19q13, 20q, 22q, and Xq21-q27. Losses: 1q, 2q, 3p, 4q, 10q, 11q, 12p13, 13q14, 13q21-qter, 14q23-24, 16q22, 17p13, 17q11.2, and 22q13

732–734

Malignant peripheral nerve-sheath tumour

Gains: 7p21-q36, 7p22, 7q, 8, 8q11-23, 1q25-44, and 5q13-35. Losses: 1p12-13, 1p21, 1p36, 3p21-pter, 9p13-21, 9p22-24, 10, 10p11-15, 11p, 11q21-25, 13q14, 15p, 16/16q24, 17/17p, 17q11-12, 17q21-25, 22, 22p, 22q13, and 22q11-12. Ring chromosomes, trisomy 7, and rearrangements of 11p and 12q13-15. Breakpoints: 1p, 7p22 (ETV1), 11q13-23, 20q13 (SRC), and 22q11-13 (NF2)

735–737

Myxofibrosarcoma Gains: 19p, 19q. Losses: 1q, 2q, 3p, 4q, 10q, 11q, and 13q (RB1). Amplification: 1, 5p, and 20q

732

Extraskeletal osteosarcoma

Gains: 1q, 2, 8, and 17p11. Losses: 1q, 2, 5, 6, 12, 13, 14, 15, 16, 18, 19, 20, 21, and Y

64, 738, 739

Pleomorphic rhabdomyosarcoma

Gains: 1p22-23, 5, 7p, 8, 14, 18/18, 20p, and 22. Losses: 2, 3p, 5q32-qter, 6, 10q23 (PTEN), 11, 13, 14, 15q21-q22, 16, 17, 18, 19, and Y

740, 741

Spindle cell/pleomorphic unclassified sarcoma

Gains: 1p36-p31, 1q21-q24, 2p, 4p16, 5p, 5q34, 6q, 7p15-p22, 7q21-qter, 17q, 9q, 14q, 16p13, 17q, 19p13, 19q13.11-q13.2, 20q, and 21q. Losses: 1q32.1, 2p25.3, 2q36-q37, 8p23, 9p, 10q21-q23, 11q22, 13q14-q21, 16q11, and 16q23. Amplifications: 1p33-p34, 12q13-q15, 17cen-p11.2, and 17p13-pter

742–744

Skeletal osteosarcoma Gains and regional amplifications: 1q, 6p21-p12, 8q23-q24, and 17p13-p11.2 (TP53). Partial or complete loss: 6q. Rearrangements of chromosome 20

745–748


Appendix F

Known cancer predisposition genes

Gene Genomic location Cancer predisposition

SDHB 1p36.13 Gastrointestinal stromal tumour, paraganglioma, gastric stromal sarcoma, pheochromocytoma

MUTYH 1p34.1 Colorectal adenomas, colorectal adenomatous polyposis, gastric cancer (somatic)

UROD 1p34.1 Hepatocellular carcinoma

MPL 1p34 Familial essential thrombocythemia

GBA 1q22 Gaucher disease

SDHC 1q23.3 Gastrointestinal stromal tumour, paraganglioma and gastric stromal sarcoma

CDC73 1q31.2 Parathyroid carcinoma and adenoma

FH 1q43 Leiomyomatosis and renal cell cancer

ALK 2p23.2-p23.1 Familial neuroblastoma

SOS1 2p22.1 Noonan syndrome


MSH2 2p21-p16 Colorectal cancer, hereditary nonpolyposis, type 1

MSH6 2p16.3 Colorectal cancer, hereditary nonpolyposis type 5, endometrial cancer (familial), mismatch repair cancer syndrome

TMEM127 2q11.2 Pheochromocytoma

ERCC3 2q14.3 Xeroderma pigmentosum, group B

ABCB11 2q31.1 Hepatocellular carcinoma

DIS3L2 2q37.1 Perlman syndrome

VHL 3p25.3 Hemangioblastoma, pheochromocytoma, renal cell carcinoma, von Hippel-Lindau syndrome

XPC 3p25.1 Xeroderma pigmentosum, group C

BAP1 3p21.1 Tumour predisposition syndrome

COL7A1 3p21.31 Dystrophic epidermolysis bullosa

MLH1 3p22.2 Colorectal cancer, hereditary nonpolyposis type 2, mismatch repair cancer syndrome

ATR 3q23 Familial cutaneous telangiectasia and cancer syndrome

GATA2 3q21.3 Acute myeloid leukemia, myelodysplastic syndrome

PHOX2B 4p13 Neuroblastoma

KIT 4q12 Gastrointestinal stromal tumour, germ cell tumours, acute myeloid leukemia

PDGFRA 4q12 Gastrointestinal stromal tumour

SDHA 5p15.33 Paragangliomas

TERT 5p15.33 Acute myeloid leukemia, melanoma


APC 5q22.2 Adenomatous polyposis coli, brain tumour-polyposis syndrome 2, colorectal cancer (somatic), Gardner syndrome, gastric cancer (somatic), hepatoblastoma (somatic)

ITK 5q33.3 Lymphoproliferative syndrome 1

HFE 6p22.2 Hemochromatosis

FANCE 6p21-p22 Acute myeloid leukaemia

POLH 6p21.1 Xeroderma pigmentosum, variant type

PMS2 7p22.1 Colorectal cancer, hereditary nonpolyposis type 4, mismatch repair cancer syndrome

EGFR 7p11.2 Adenocarcinoma of lung, non-small cell lung cancer

SBDS 7q11.21 Shwachman-Diamond syndrome

SLC25A13 7q21.3 Hepatocellular carcinoma

MET 7q31.2 Hepatocellular carcinoma, renal cell carcinoma, osteofibrous dysplasia

PRSS1 7q34 Pancreatic cancer

WRN 8p12 Werner syndrome

NBN 8q21.3 Acute lymphoblastic leukemia, Nijmegen breakage syndrome

EXT1 8q24.11 Chondrosarcoma

RECQL4 8q24.3 Rothmund-Thomson syndrome

DOCK8 9p24.3 Hyper-IgE recurrent infection syndrome, autosomal recessive

MTAP 9p21.3 Malignant fibrous histiocytoma

CDKN2A 9p21.3 Melanoma, neural system tumour syndrome, orolaryngeal cancer, pancreatic cancer/melanoma syndrome


RMRP 9p13.3 Metaphyseal dysplasia without hypotrichosis

FANCG 9p13 Acute myeloid leukaemia

XPA 9q22.33 Xeroderma pigmentosum, group A

FANCC 9q22.32 Fanconi anemia, complementation group C

PTCH1 9q22.32 Basal cell carcinoma

TGFBR1 9q22.33 Multiple self-healing squamous epithelioma

TSC1 9q34.13 Lymphangioleiomyomatosis, tuberous sclerosis-1

RET 10q11.21 Medullary thyroid carcinoma, multiple endocrine neoplasia, pheochromocytoma

BMPR1A 10q23.2 Polyposis syndrome

PTEN 10q23.31 Cowden syndrome 1, endometrial carcinoma, malignant melanoma, PTEN hamartoma tumour syndrome, squamous cell carcinoma of the head and neck, glioma susceptibility, prostate cancer

TNFRSF6 10q23.31 Autoimmune lymphoproliferative syndrome, squamous cell carcinoma

SUFU 10q24.32 Medulloblastoma

HRAS 11p15.5 Costello syndrome, bladder cancer, thyroid carcinoma

FANCF 11p15 Acute myeloid leukaemia

WT1 11p13 Mesothelioma, Wilms tumour, type 1

DDB2 11p11.2 Xeroderma pigmentosum, group E, DDB-negative subtype


EXT2 11p11.2 Exostoses, multiple, type 2

SDHAF2 11q12.2 Familial paraganglioma

MEN1 11q13.1 Adrenal adenoma, angiofibroma, carcinoid tumour of lung, lipoma, multiple endocrine neoplasia 1, parathyroid adenoma

ATM 11q22.3 Lymphoma, T-cell prolymphocytic leukemia, breast cancer

CBL 11q23.3 Noonan syndrome-like disorder

SDHD 11q23.1 Intestinal carcinoid tumours, Cowden syndrome, Merkel cell carcinoma, paraganglioma, gastric stromal sarcoma and pheochromocytoma

HMBS 11q23.3 Hepatocellular carcinoma

CDKN1B 12p13.1 Multiple endocrine neoplasia, type IV

CDK4 12q14.1 Melanoma

PTPN11 12q24.13 Juvenile myelomonocytic leukemia, Noonan syndrome 1

HNF1A 12q24.2 Familial hepatic adenoma

POLE 12q24.33 Colorectal cancer

BRCA2 13q13.1 Fanconi anemia, complementation group D1, Wilms tumour, breast cancer (male), breast-ovarian cancer, glioblastoma, medulloblastoma, pancreatic cancer, prostate cancer

GJB2 13q12.11 Vohwinkel syndrome

RB1 13q14.2 Bladder cancer, osteosarcoma, retinoblastoma, small cell cancer of the lung

ERCC5 13q33.1 Xeroderma pigmentosum, group G

MAX 14q23.3 Pheochromocytoma


SERPINA1 14q32.13 Thyroid cancer
DICER1 14q32.13 Pleuropulmonary blastoma, rhabdomyosarcoma
BUB1B 15q15.1 Colorectal cancer
FAH 15q25.1 Hepatocellular carcinoma
BLM 15q26.1 Bloom syndrome
TSC2 16p13.3 Lymphangioleiomyomatosis, tuberous sclerosis-2
ERCC4 16p13.12 Fanconi anemia, complementation group Q, Xeroderma pigmentosum, group F, Cockayne syndrome
PALB2 16p12.2 Fanconi anemia, complementation group N, breast cancer, pancreatic cancer
CYLD 16q12.1 Brooke-Spiegler syndrome, cylindromatosis, trichoepithelioma
CDH1 16q22.1 Endometrial carcinoma, gastric cancer, ovarian carcinoma, breast cancer, prostate cancer
FANCA 16q24.3 Fanconi anemia, complementation group A
TP53 17p13.1 Adrenal cortical carcinoma, breast cancer, choroid plexus papilloma, colorectal cancer, hepatocellular carcinoma, Li-Fraumeni syndrome, nasopharyngeal carcinoma, osteosarcoma, pancreatic cancer, basal cell carcinoma 7, glioma susceptibility
FLCN 17p11.2 Colorectal cancer, renal carcinoma
RAD51D 17q12 Familial breast-ovarian cancer
NF1 17q11.2 Neurofibromatosis, type 1


BRCA1 17q21.31 Familial breast-ovarian cancer, pancreatic cancer
STAT3 17q21.2 Autoimmune disease, Hyper-IgE recurrent infection syndrome
SMARCE1 17q21.2 Familial meningioma
TRIM37 17q22 Breast cancer
BRIP1 17q23.2 Breast cancer, Fanconi anemia, complementation group J
PRKAR1A 17q24.2 Adrenocortical tumour, Carney complex type 1, pigmented nodular adrenocortical disease, primary
AXIN2 17q24.1 Colorectal cancer, oligodontia-colorectal cancer syndrome
RAD51C 17q22 Fanconi anemia, complementation group O, familial breast-ovarian cancer
RHBDF2 17q25.1 Tylosis with esophageal cancer
SMAD4 18q21.2 Juvenile polyposis/hereditary hemorrhagic telangiectasia syndrome, pancreatic cancer (somatic), juvenile intestinal polyposis
SETBP1 18q21.1 Schinzel-Giedion syndrome
ELANE 19p13.3 Cyclic neutropenia, severe congenital neutropenia
STK11 19p13.3 Melanoma, pancreatic cancer, Peutz-Jeghers syndrome, testicular cancer
SMARCA4 19p13.2 Coffin-Siris syndrome 4, rhabdoid tumour predisposition syndrome 2
CEBPA 19q13.11 Acute myeloid leukaemia


ERCC2 19q13.32 Cerebrooculofacioskeletal syndrome 2, trichothiodystrophy 1 photosensitive, xeroderma pigmentosum group D
POLD1 19q13.33 Colorectal cancer
RUNX1 21q22.12 Acute myeloid leukaemia, platelet disorder, familial, with associated myeloid malignancy
SMARCB1 22q11.23 Coffin-Siris syndrome, Rhabdoid tumours, Schwannomatosis-1
LZTR1 22q11.21 Schwannomatosis-2
CHEK2 22q12.1 Li-Fraumeni syndrome, osteosarcoma, breast cancer, colorectal cancer, prostate cancer
NF2 22q12.2 Neurofibromatosis type 2, schwannomatosis, meningioma
WAS Xp11.23 Neutropenia, thrombocytopenia, Wiskott-Aldrich syndrome
SH2D1A Xq25 Lymphoproliferative syndrome
GPC3 Xq26.2 Wilms tumour
DKC1 Xq28 Dyskeratosis congenita
SRY Yp11.2 Hepatocellular carcinoma

The information in this table was sourced from the Online Mendelian Inheritance in Man (OMIM) and Catalogue of Somatic Mutations in Cancer (COSMIC) databases.134,159–161


Appendix G

Candidate genes used for variant prioritisation based on a priori knowledge of cancer biology

Gene name Chromosome Start End No. variants
APC 5 112018202 112206936 13
ARID1A 1 26997522 27133601 4
ATM 11 108068559 108264826 12
ATR 3 142143077 142322668 31
AXIN1 16 312440 427676 10
AXIN2 17 63499683 63582740 6
BARD1 2 215568275 215699428 10
BLM 15 91235579 91383686 9
BRCA1 17 41171312 41302500 13
BRCA2 13 32864617 32998809 20
BRIP1 17 59731547 59965920 9
BUB1B 2 111370409 111460684 1
C17orf85 17 3685045 3774545 3
CD99 Y 2534228 2634350 0
CDH1 16 68746195 68894444 6
CDKN2A 9 21942751 22000132 2
CHEK1 11 125471251 125552042 4
CHEK2 22 29058731 29162822 2
DDB2 11 47211493 47285769 4
DICER1 14 95527565 95633085 3


DKC1 X 153966031 154030964 0
DNA2 10 70148821 70256730 6
ELF3 1 201954690 202011315 3
ELF5 11 34475342 34560347 6
ERCC2 19 45829649 45898845 12
ERCC3 2 127989866 128076752 4
ERCC4 16 13989014 14071205 8
ERCC5 13 103479468 103549748 14
ERF 19 42726717 42784309 1
ERG 21 39726950 39895428 7
ETS1 11 128303656 128482453 4
ETS2 21 40152231 40221878 4
ETV1 7 13905856 14054642 7
ETV2 19 36107647 36160773 4
ETV4 17 41580211 41648305 4
ETV6 12 11777788 12073325 2
EWSR1 22 29638998 29721515 8
EXT1 8 118786602 119149058 3
EXT2 11 44092747 44291980 6
FAM175A 4 84357094 84431290 3
FANCA 16 89778959 89908065 45
FANCB X 14836529 14916184 0
FANCC 9 97836336 98104991 3
FANCD2 3 10043113 10166344 12
FANCE 6 35395138 35459881 7
FANCF 11 22619079 22672387 2
FANCG 9 35048835 35105013 3
FANCI 15 89762194 89885362 13
FANCL 2 58361378 58493515 4
FANCM 14 45580136 45695093 9
FH 1 241635857 241708085 3
FLI1 11 128538811 128708162 3
HNF4A 20 42959441 43061115 7
IDH1 2 209075953 209144806 3
IDH2 15 90602212 90670708 4
KIF1B 1 10245764 10466661 27
KIT 4 55499095 55631881 4
LIG1 19 48593703 48698560 24
LIG4 13 108834792 108892882 2
MDM2 12 69176971 69264320 4
MEN1 11 64545986 64603188 8


MET 7 116287459 116463440 7
MLH1 3 37009841 37117337 6
MLH3 14 75455467 75543235 8
MRE11A 11 94125469 94252040 5
MSH2 2 47605206 47735367 11
MSH3 5 79925467 80197634 13
MSH6 2 47985221 48059092 11
MUTYH 1 45769914 45831142 6
NBN 8 90920564 91021899 10
NEIL2 8 11602172 11669854 5
NF1 17 29396945 29729695 8
NF2 22 29974545 30119589 2
PALB2 16 23589483 23677678 3
PMS1 2 190623811 190767355 3
PMS2 7 5987870 6073737 5
POLH 6 43518878 43613260 4
PPARG 3 12368001 12500855 3
PRKAR1A 17 66482921 66554570 6
PTCH1 9 98180264 98295831 9
PTEN 10 89598195 89753532 2
PTPN11 12 112831536 112972717 2
RAD50 5 131867616 132005313 6
RAD51C 17 56744963 56836692 2
RAD51D 17 33401811 33458500 3
RB1 13 48852883 49081026 9
RECQL4 8 145711667 145768210 10
RET 10 43547517 43650797 12
RMI1 9 86570637 86643987 2
RMI2 16 11414311 11470617 8
RPA1 17 1708273 1827848 8
RPA3 7 7651575 7783238 1
RPS19 19 42338988 42400484 3
SDHA 5 193356 281814 12
SDHB 1 17320225 17405665 3
SDHC 1 161259166 161359535 3
SDHD 11 111932548 111991525 3
SMARCA4 19 11046598 11197958 11
SMARCB1 22 24104150 24201705 3
SPDEF 6 34480579 34549110 1
SPI1 11 47351409 47425127 4
SQSTM1 5 179222842 179290077 3


STK11 19 1180798 1253434 7
TAF15 17 34111459 34199246 4
TGFBR2 3 30622994 30760633 2
TNFRSF11A 18 59967520 60079943 8
TOP1 20 39632462 39778126 0
TOP3A 17 18152235 18243321 8
TP53 17 7546720 7615868 5
TP53BP1 15 43674412 43810354 5
TSC1 9 135741735 135845020 2
TSC2 16 2072990 2163713 22
VHL 3 10158319 10220354 1
WRN 8 30865778 31056277 25
WT1 11 32384322 32482081 9
XPA 9 100412191 100484691 2
XPC 3 14161648 14245172 14
XRCC2 7 152318587 152398250 2

Start and End: chromosomal coordinates of the start and end of each gene, including 25 kb of flanking sequence on each side (±25 kb).
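The Start and End columns above are gene coordinates padded by 25 kb on each side. A minimal Python sketch of how such padded windows might be built and used to count variant calls per gene follows. The function names and the variant positions in the example are hypothetical; the unpadded PTEN bounds were back-calculated from the table (Start + 25 kb, End − 25 kb) rather than taken from an annotation source, and 1-based inclusive coordinates are assumed. Whether the No. variants column was counted in exactly this way is also an assumption.

```python
from dataclasses import dataclass

FLANK_BP = 25_000  # flank added on each side, per the table footnote (±25 kb)


@dataclass(frozen=True)
class GeneRegion:
    name: str
    chrom: str
    start: int  # padded start (assumed 1-based, inclusive)
    end: int    # padded end (inclusive)


def pad_region(name, chrom, gene_start, gene_end, flank=FLANK_BP):
    """Extend gene coordinates by `flank` bp on each side, never below position 1."""
    return GeneRegion(name, chrom, max(1, gene_start - flank), gene_end + flank)


def count_variants(regions, variants):
    """Count (chrom, pos) variant calls falling inside each padded region."""
    counts = {region.name: 0 for region in regions}
    for chrom, pos in variants:
        for region in regions:
            if region.chrom == chrom and region.start <= pos <= region.end:
                counts[region.name] += 1
    return counts


# Unpadded PTEN bounds back-calculated from the table (Start + 25 kb, End - 25 kb).
pten = pad_region("PTEN", "10", 89_623_195, 89_728_532)
print(pten.start, pten.end)  # 89598195 89753532

# Hypothetical calls: one inside the window, one past its end, one on another chromosome.
calls = [("10", 89_600_000), ("10", 89_760_000), ("9", 89_700_000)]
print(count_variants([pten], calls))  # {'PTEN': 1}
```

The nested loop is deliberately simple; for genome-scale call sets an interval tree or sorted sweep would be the usual choice.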


Appendix H

Genes in which variants were also prioritised using the candidate gene prioritisation strategy

Gene Chromosome No. variants
ACCS 11 8
ACP2 11 4
ACRV1 11 1
ACYP1 14 1
AIMP2 7 3
ALX4 11 2
ANKRD49 11 1
ARHGAP39 8 5
ARHGDIG 16 2
ARHGEF1 19 2
ATP13A2 1 4
ATP1B2 17 3
ATP5D 19 3
BRK1 3 3
C11orf57 11 2
C11orf97 11 3
C19orf26 19 5
C22orf15 22 2
C5orf45 5 7
C9orf9 9 1


CAMKK1 17 5
CAT 11 2
CCDC127 5 2
CDC42BPG 11 12
CFAP126 1 3
CHCHD10 22 2
COX6B1 19 2
CPM 12 4
DCTN5 16 5
DERL3 22 4
DHFR 5 3
DHX8 17 5
DLAT 11 1
DMRTC2 19 1
EIF2AK1 7 1
EIF2B2 14 2
EPCAM 2 2
EPM2AIP1 3 1
EVI2A 17 2
FAM20A 17 3
FAM20A, PRKAR1A 17 1
FBXO11 2 1
FDFT1 8 4
FLII 17 7
FNDC8 17 2
FRY 13 1
GAS2L1 22 5
GATA4 8 3
GPT 8 2
GSK3A 19 3
GTPBP2 6 2
HAUS5 19 3
HEATR9 17 1
HELQ 4 11
HNRNPK 9 4
HSCB 22 2
IL13 5 2
INTS2 17 2
IRAK2 3 4
ITFG3 16 2
ITGAE 17 1


KLC3 19 8
LOC100507346 9 3
LOC401052 3 4
LPAR6 13 1
LRRC14 8 3
LRRC14B 5 2
LRRFIP2 3 2
LYPD4 19 3
MAD2L1BP 6 1
MAP3K2 2 1
MAP4K2 11 5
MFSD3 8 1
MIDN 19 3
MIEF2 17 9
MIS18BP1 14 2
MMP11 22 4
MPZ 1 1
MRPL28 16 11
MRPS18C 4 1
MYBPC3 11 13
NDUFAB1 16 2
NPAT 11 1
NR1H3 11 2
ORMDL1 2 3
OSGIN2 8 2
PACSIN1 6 1
PADI2 1 9
PDIA2 16 10
PGD 1 3
PIGO 9 10
PIGV 1 2
PIH1D2 11 1
PKD1 16 17
PLA2G4C 19 10
PLCG1-AS1 19 1
POLG 15 11
PPP1R16A 8 2
R3HDML 20 6
RBM42 19 4
RCBTB2 13 1
RGS11 16 5


RHBDD3 22 2
RNPEP 1 5
RNU6-28P 15 5
RPL10A 6 1
RUFY2 10 2
SHMT1 17 5
SLC2A11 22 1
SLC9A3R2 16 6
SMCR8 17 6
SMYD4 17 3
SRP19 5 2
STOML2 9 3
STT3A 11 3
TANGO6 16 2
TEAD3 6 4
TESK2 1 3
TMEM43 3 10
TMEM8A 16 13
TOE1 1 1
UPK1A 19 3
VAT1 17 1
VCP 9 5
VPS9D1 16 2
VRK2 2 1
WRAP53 17 5
XPO5 6 2
XRN1 3 1
ZAR1L 13 4
ZC2HC1C 14 2
ZNF276 16 13
ZNF526 19 1
ZNF710 15 6


Appendix I

Patient 1-II-2: Copy number variation by chromosome


[Figure: six per-chromosome panels of log2(T/R) against index]

Black: normalised log ratios. Red: mean values among points in segment obtained by circular binary segmentation.
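The red segment means in these plots come from circular binary segmentation (CBS). The Python sketch below illustrates the core idea under simplifying assumptions: it treats the log2(T/R) values of one chromosome as a circle, finds the single arc whose separation from the rest most reduces the sum of squared errors, and recurses on the resulting pieces. The fixed `threshold` and `min_size` parameters are hypothetical stand-ins for the permutation-based significance test used in published CBS implementations such as the DNAcopy R package, so this is an illustration of the technique, not the segmentation actually run for these figures.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_arc(values, min_size):
    """Find the arc values[i:j] whose separation from the rest most reduces SSE.

    The data are treated as circular, so the complement of the arc is
    values[:i] + values[j:]. Both parts must hold at least `min_size` points.
    Returns (gain, i, j); gain stays 0.0 when no split improves the fit.
    """
    n = len(values)
    total = sse(values)
    best = (0.0, 0, n)
    for i in range(n):
        for j in range(i + min_size, n + 1):
            inside, outside = values[i:j], values[:i] + values[j:]
            if len(outside) < min_size:
                continue
            gain = total - sse(inside) - sse(outside)
            if gain > best[0]:
                best = (gain, i, j)
    return best


def segment(values, threshold=1.0, min_size=3, offset=0):
    """Recursively segment log2 ratios into (start, end, mean) tuples; end is exclusive."""
    gain, i, j = best_arc(values, min_size)
    if gain <= 0.0 or gain < threshold:
        return [(offset, offset + len(values), fmean(values))]
    segments = []
    # An arc yields up to three pieces: before, inside and after the arc.
    for a, b in ((0, i), (i, j), (j, len(values))):
        if b > a:
            segments.extend(segment(values[a:b], threshold, min_size, offset + a))
    return segments


# Hypothetical log2(T/R) values with a raised block in the middle.
ratios = [0.0] * 10 + [1.0] * 10 + [0.0] * 10
print(segment(ratios))  # [(0, 10, 0.0), (10, 20, 1.0), (20, 30, 0.0)]
```

The exhaustive arc search is quadratic in the number of points per call, which is fine for a toy example but far slower than the optimised DNAcopy implementation.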


[Figure: six per-chromosome panels of log2(T/R) against index]

Black: normalised log ratios. Red: mean values among points in segment obtained by circular binary segmentation.


[Figure: six per-chromosome panels of log2(T/R) against index]

Black: normalised log ratios. Red: mean values among points in segment obtained by circular binary segmentation.


[Figure: six per-chromosome panels of log2(T/R) against index]

Black: normalised log ratios. Red: mean values among points in segment obtained by circular binary segmentation.


Appendix J

Patient 2-II-1: Copy number variation by chromosome


[Figure: six per-chromosome panels of log2(T/R) against index]

Black: normalised log ratios. Red: mean values among points in segment obtained by circular binary segmentation.


[Figure: six per-chromosome panels of log2(T/R) against index]

Black: normalised log ratios. Red: mean values among points in segment obtained by circular binary segmentation.


[Figure: six per-chromosome panels of log2(T/R) against index]

Black: normalised log ratios. Red: mean values among points in segment obtained by circular binary segmentation.


[Figure: six per-chromosome panels of log2(T/R) against index]

Black: normalised log ratios. Red: mean values among points in segment obtained by circular binary segmentation.


Appendix K

A list of nonsynonymous deleterious variants included in variant burden analyses


Region of interest   ISKS   MGRB

Chr Position SIFT PolyPhen-2 MAF MAF 1000G Chr Position SIFT PolyPhen-2 MAF MAF 1000G

ABCB5 7 20685484 D D 9.80 x 10−3 1.39 x 10−2 7 20682919 D D 4.37 x 10−4 .
7 20685585 D D 8.91 x 10−4 . 7 20685484 D D 3.50 x 10−3 1.39 x 10−2
7 20687600 D D 1.78 x 10−3 1.00 x 10−3 7 20687600 D D 4.37 x 10−4 1.00 x 10−3
7 20687604 D D 1.27 x 10−1 1.44 x 10−1 7 20687604 D D 1.32 x 10−1 1.44 x 10−1
7 20689724 D D 8.91 x 10−4 . 7 20691061 D D 4.37 x 10−4 .
7 20691121 D D 8.91 x 10−4 . 7 20691133 D D 4.37 x 10−4 .
7 20698170 D D 8.91 x 10−4 . 7 20691140 D D 4.37 x 10−4 .
7 20698201 D D 1.78 x 10−3 . 7 20698156 D D 1.31 x 10−3 .
7 20721141 D D 8.91 x 10−4 . 7 20721260 D D 3.50 x 10−3 .
7 20721152 D D 8.91 x 10−4 . 7 20738157 D D 4.37 x 10−4 .
7 20721260 D D 8.91 x 10−4 . 7 20762646 D D 3.68 x 10−1 3.61 x 10−1
7 20762646 D D 3.86 x 10−1 3.61 x 10−1 7 20782621 D D 4.37 x 10−4 .
7 20766690 D D 1.78 x 10−3 . 7 20785017 D D 4.37 x 10−4 .
7 20766694 D D 8.91 x 10−4 . 7 20793107 D D 8.74 x 10−4 .
7 20782621 D D 8.91 x 10−4 . 7 20793113 D D 8.74 x 10−4 .
7 20782629 D D 1.78 x 10−3 . 7 20795051 D D 4.37 x 10−4 .
7 20793113 D D 2.67 x 10−3 .
ARHGAP39 8 145773319 D D 9.03 x 10−4 . 8 145770897 D D 4.37 x 10−4 .
8 145773431 D D 4.37 x 10−4 .
8 145806612 D D 4.37 x 10−4 .
BEAN1 16 66511541 D D 8.91 x 10−4 .
C16orf96 16 4606635 D D 1.96 x 10−2 1.29 x 10−2 16 4606526 D D 8.74 x 10−4 .
16 4626229 D D 1.78 x 10−3 . 16 4606635 D D 1.35 x 10−2 1.29 x 10−2
16 4638318 D D 8.91 x 10−4 2.00 x 10−3 16 4606678 D D 8.74 x 10−4 .
16 4649306 D D 8.91 x 10−4 . 16 4606727 D D 8.74 x 10−4 .


16 4626476 D D 4.37 x 10−4 .
16 4626576 D D 4.37 x 10−4 .
16 4638204 D D 4.37 x 10−4 .
16 4638318 D D 8.74 x 10−4 2.00 x 10−3
16 4649306 D D 4.37 x 10−4 .
KIF2C 1 45226349 D D 4.37 x 10−4 .
PDIA2 16 334543 D D 1.39 x 10−1 1.42 x 10−1 16 334543 D D 1.38 x 10−1 1.42 x 10−1
16 334593 D D 8.99 x 10−4 . 16 336377 D D 1.35 x 10−2 2.68 x 10−2
16 334956 D D 8.99 x 10−4 . 16 336895 D D 4.37 x 10−4 .
16 335515 D D 8.93 x 10−4 .
16 336377 D D 1.07 x 10−2 2.68 x 10−2
16 336413 D D 8.91 x 10−4 .
16 336712 D D 8.91 x 10−4 .
UVSSA 4 1343341 D D 8.98 x 10−4 1.00 x 10−3 4 1343426 D D 4.37 x 10−4 .
4 1343343 D D 8.98 x 10−4 . 4 1345557 D D 4.37 x 10−4 .
4 1345606 D D 8.91 x 10−4 . 4 1348920 D D 8.74 x 10−3 8.90 x 10−3
4 1348920 D D 9.89 x 10−3 8.90 x 10−3 4 1373852 D D 4.37 x 10−4 .
4 1360144 D D 2.67 x 10−3 2.00 x 10−3 4 1373986 D D 4.37 x 10−4 1.00 x 10−3

4 1369196 D D 1.79 x 10−3 . 4 1374746 D D 4.37 x 10−4 2.00 x 10−3

4 1373986 D D 8.98 x 10−4 1.00 x 10−3

4 1374004 D D 8.96 x 10−4 1.00 x 10−3


ZFP69B 1 40928731 D D 8.91 x 10−4 1.00 x 10−3 1 40928825 D D 4.37 x 10−4 1.00 x 10−3

1 40928969 D D 8.91 x 10−4 2.00 x 10−3 1 40928969 D D 5.68 x 10−3 2.00 x 10−3

1 40929068 D D 4.37 x 10−4 .
1 40929182 D D 4.37 x 10−4 .
ABL1 9 133730212 D D 8.91 x 10−4 . 9 133730212 D D 4.37 x 10−4 .
9 133753873 D D 8.91 x 10−4 . 9 133759881 D D 4.37 x 10−4 .
9 133760728 D D 8.99 x 10−4 .
ADSSL1 14 105206110 D D 8.91 x 10−4 . 14 105204718 D D 4.37 x 10−4 .
14 105207039 D D 8.91 x 10−4 . 14 105207039 D D 1.75 x 10−3 .
14 105208220 D D 8.93 x 10−4 . 14 105207580 D D 4.37 x 10−4 .
14 105208302 D D 8.91 x 10−4 . 14 105208271 D D 4.37 x 10−4 .
14 105211222 D D 8.91 x 10−4 . 14 105208301 D D 4.37 x 10−4 .
14 105209473 D D 4.37 x 10−4 .
ASPN 9 95228663 D D 1.25 x 10−2 7.00 x 10−3 9 95227207 D D 4.37 x 10−4 .
9 95228663 D D 1.18 x 10−2 7.00 x 10−3
FHOD3 18 34174816 D D 1.78 x 10−3 1.00 x 10−3 18 33877878 D D 8.74 x 10−4 .
18 34182694 D D 8.91 x 10−4 . 18 34081915 D D 4.37 x 10−4 .
18 34205613 D D 8.91 x 10−4 . 18 34174816 D D 8.74 x 10−4 1.00 x 10−3
18 34261475 D D 8.91 x 10−4 2.00 x 10−3 18 34205591 D D 4.37 x 10−4 .
18 34261511 D D 8.02 x 10−3 8.90 x 10−3 18 34261475 D D 8.74 x 10−4 2.00 x 10−3
18 34273279 D D 1.84 x 10−1 2.19 x 10−1 18 34261476 D D 4.37 x 10−4 .
18 34289285 D D 1.87 x 10−2 2.29 x 10−2 18 34261511 D D 9.18 x 10−3

18 34297890 D D 8.91 x 10−4 . 18 34273279 D D 1.82 x 10−1

18 34335228 D D 8.91 x 10−4 . 18 34349311 D D 4.37 x 10−4


GATAD2A 19 19612148 D D 8.99 x 10−4 . 19 19603483 D D 4.37 x 10−4

19 19603489 D D 4.37 x 10−4

LAMA2 6 129371108 D D 8.91 x 10−4 . 6 129419454 D D 4.37 x 10−4

6 129419358 D D 8.91 x 10−4 . 6 129419474 D D 4.37 x 10−4

6 129571272 D D 2.32 x 10−2 1.39 x 10−2 6 129465054 D D 4.37 x 10−4

6 129601217 D D 1.78 x 10−3 2.00 x 10−3 6 129465109 D D 4.37 x 10−4

6 129712747 D D 8.91 x 10−4 . 6 129465131 D D 4.37 x 10−4

6 129762111 D D 8.91 x 10−4 . 6 129498935 D D 4.37 x 10−4

6 129766875 D D 8.91 x 10−4 . 6 129513850 D D 9.18 x 10−3

6 129571272 D D 2.40 x 10−2

6 129601217 D D 1.75 x 10−3

6 129609065 D D 4.37 x 10−4

6 129618873 D D 4.37 x 10−4

6 129670490 D D 4.37 x 10−4

6 129712722 D D 4.37 x 10−4

6 129722453 D D 2.67 x 10−2

P4HTM 3 49027890 D D 9.01 x 10−4 . 3 49027912 D D 4.37 x 10−4
3 49041623 D D 1.78 x 10−3 2.00 x 10−3 3 49041677 D D 4.37 x 10−4

PLK2 5 57754604 D D 4.37 x 10−4

PRMT5 14 23395485 D D 8.91 x 10−4 . 14 23395461 D D 4.37 x 10−4

SDR16C6P, PENK 8 57353841 D D 8.91 x 10−4 . 8 57354047 D D 4.37 x 10−4

8 57354047 D D 8.91 x 10−4 1.00 x 10−3


SLC22A20, POLA2 11 65049959 D D 8.91 x 10−4 . 11 65046209 D D 4.37 x 10−4 .
11 65061679 D D 4.37 x 10−4 2.00 x 10−3
SLC6A18 5 1232408 D D 6.38 x 10−2 5.17 x 10−2 5 1225719 D D 4.37 x 10−4 2.00 x 10−3
5 1232468 D D 1.79 x 10−3 . 5 1232408 D D 4.63 x 10−2 5.17 x 10−2
5 1243820 D D 8.98 x 10−4 . 5 1232468 D D 1.31 x 10−3 .
5 1244395 D D 8.93 x 10−4 . 5 1238093 D D 4.37 x 10−4 .
5 1244478 D D 1.79 x 10−3 1.00 x 10−3 5 1239586 D D 4.37 x 10−4 .
5 1243785 D D 4.37 x 10−4 .
5 1244478 D D 2.62 x 10−3 1.00 x 10−3
5 1244751 D D 4.37 x 10−4 .
TET2 4 106155185 D D 1.78 x 10−2 2.19 x 10−2 4 106155185 D D 1.27 x 10−2 2.19 x 10−2
4 106158350 D D 1.78 x 10−3 7.00 x 10−3 4 106155185 D D 4.37 x 10−4 .
4 106164741 D D 8.91 x 10−4 . 4 106158215 D D 4.37 x 10−4 .
4 106164866 D D 8.91 x 10−4 . 4 106158263 D D 4.37 x 10−4 .
4 106196756 D D 8.91 x 10−4 . 4 106158350 D D 1.31 x 10−3 7.00 x 10−3
4 106164794 D D 4.37 x 10−4 .
4 106164913 D D 4.37 x 10−4 .
4 106190868 D D 4.37 x 10−4 .
4 106197552 D D 1.31 x 10−3 .

ISKS: International Sarcoma Kindred Study. MGRB: Medical Genome Reference Bank. Chr: chromosome. SIFT: Sorting Intolerant From Tolerant. PolyPhen-2: Polymorphism Phenotyping v2. D: predicted deleterious (SIFT) or probably damaging (PolyPhen-2). MAF: minor allele frequency in the cohort. MAF 1000G: minor allele frequency in the 1000 Genomes Project European population. .: no MAF data available.
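Every variant in this appendix is called deleterious by both SIFT and PolyPhen-2 (D D). As a hedged illustration of how such a filter might feed a gene-level burden comparison between ISKS and MGRB, the Python sketch below keeps variants with D calls from both tools, sums alternate-allele counts per gene in each cohort, and applies a one-sided Fisher exact test. The `Call` record, the allele-collapsing scheme, the "T" (tolerated) label in the example and the choice of Fisher's test are all assumptions for illustration; this appendix does not state which burden test the thesis used, and the counts are toy values.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class Call:
    """Hypothetical per-cohort summary of one variant."""
    gene: str
    sift: str           # "D" = predicted deleterious by SIFT
    polyphen2: str      # "D" = predicted damaging by PolyPhen-2
    alt_count: int      # alternate alleles observed in the cohort
    allele_number: int  # alleles genotyped in the cohort


def cohort_frequency(call):
    """Alternate-allele frequency of a variant within one cohort."""
    return call.alt_count / call.allele_number


def is_deleterious(call):
    """Keep only variants called 'D' by both SIFT and PolyPhen-2."""
    return call.sift == "D" and call.polyphen2 == "D"


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Tests whether the first row (cases) carries more alternate alleles (a)
    than expected, using the hypergeometric distribution with fixed margins.
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, min(row1, col1) + 1))
    return tail / comb(n, row1)


def collapse(calls, gene):
    """Sum alt alleles and allele numbers over qualifying variants in one gene."""
    kept = [c for c in calls if c.gene == gene and is_deleterious(c)]
    return sum(c.alt_count for c in kept), sum(c.allele_number for c in kept)


def gene_burden_p(cases, controls, gene):
    """Allele-collapsed, one-sided burden p-value for cases versus controls."""
    case_alt, case_an = collapse(cases, gene)
    ctrl_alt, ctrl_an = collapse(controls, gene)
    return fisher_greater(case_alt, case_an - case_alt, ctrl_alt, ctrl_an - ctrl_alt)


# Toy counts for a hypothetical gene; the second case variant fails the SIFT filter.
cases = [Call("GENE_X", "D", "D", 3, 4), Call("GENE_X", "T", "D", 5, 6)]
controls = [Call("GENE_X", "D", "D", 1, 4)]
print(round(gene_burden_p(cases, controls, "GENE_X"), 4))  # 0.2429
```

Summing allele numbers across variants is a crude collapse; carrier-based counting or dedicated rare-variant tests (for example SKAT) are common alternatives.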


Appendix L

Genes identified by variant burden analyses by Ballinger et al. (2016) and Brohl et al. (2017)

Gene Study
APC Ballinger, Brohl
ATM Ballinger
ATR Ballinger
AXIN1 Ballinger
AXIN2 Ballinger
BARD1 Ballinger
BLM Ballinger, Brohl
BRCA1 Ballinger, Brohl
BRCA2 Ballinger
BRIP1 Ballinger, Brohl
BUB1 Ballinger
BUB1B Ballinger
CDH1 Ballinger
CDKN2A (ARF) Ballinger
CDKN2A (INK4A) Ballinger
CHEK1 Ballinger
CHEK2 Ballinger
DDB2 Ballinger
ERCC2 Ballinger
ERCC3 Ballinger, Brohl


ERCC4 Ballinger
ERCC5 Ballinger
EXT1 Ballinger
EXT2 Ballinger, Brohl
FAM175A Ballinger
FANCA Ballinger
FANCB Ballinger
FANCC Ballinger, Brohl
FANCD2 Ballinger, Brohl
FANCE Ballinger
FANCF Ballinger
FANCG Ballinger
FANCI Ballinger
FANCL Ballinger
FANCM Ballinger, Brohl
FH Ballinger
FLCN Brohl
IDH1 Ballinger
IDH2 Ballinger
MEN1 Ballinger
MITF Brohl
MLH1 Ballinger
MLH3 Ballinger
MRE11 Ballinger
MRE11A Ballinger
MSH2 Ballinger
MSH6 Ballinger
MUTYH Ballinger
NBN Ballinger
NF1 Ballinger
NF2 Ballinger
PALB2 Ballinger
PMS1 Ballinger
PMS2 Ballinger, Brohl
POLE Brohl
PTCH1 Ballinger
PTCH2 Brohl
PTEN Ballinger
PTPN11 Brohl
RAD50 Ballinger
RAD51 Brohl


RAD51C Ballinger
RAD51D Ballinger, Brohl
RB1 Ballinger
RECQL4 Ballinger
RET Brohl
SDHA Ballinger
SDHB Ballinger
SDHC Ballinger
SDHD Ballinger
SLX4 Brohl
STK11 Ballinger
TINF2 Brohl
TP53 Ballinger, Brohl
TP53BP1 Ballinger
TSC1 Ballinger
TSC2 Ballinger
VHL Ballinger
WRAP53 Brohl
WRN Ballinger
WT1 Ballinger
XPA Ballinger
XPC Ballinger
XRCC2 Ballinger

References

Mandy L. Ballinger, David L. Goode, Isabelle Ray-Coquard, Paul A. James, Gillian Mitchell, Eveline Niedermayr, et al. Monogenic and polygenic determinants of sarcoma risk: an international genetic study. The Lancet Oncology, 17(9):1261–1271, 2016.

Andrew S. Brohl, Rajesh Patidar, Clesson E. Turner, Xinyu Wen, Young K. Song, Jun S. Wei, et al. Frequent inactivating germline mutations in DNA repair genes in patients with Ewing sarcoma. Genetics in Medicine, 2017.


Appendix M

A list of putative regulatory variants included in variant burden analyses
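The regulatory variants in this appendix carry RegulomeDB scores in categories 1 and 2 (for example 1b, 1f, 2a, 2b, 2c). In RegulomeDB, lower scores indicate stronger evidence of a regulatory effect: category 1 combines predicted transcription factor binding with expression quantitative trait locus (eQTL) evidence, and category 2 marks variants likely to affect binding. The Python sketch below filters records to categories 1 and 2. The record layout, the function names and the cut-off at category 2 are assumptions inferred from the scores listed here, not a criterion stated in this appendix; two of the example rows are copied from the table and two are hypothetical.

```python
import re

# RegulomeDB scores: a digit 1-7, optionally followed by a letter (e.g. "1f", "2b", "4").
_SCORE_PATTERN = re.compile(r"^([1-7])([a-f]?)$")


def regulome_category(score: str) -> int:
    """Return the numeric category of a RegulomeDB score, e.g. '2b' -> 2."""
    match = _SCORE_PATTERN.match(score.strip())
    if match is None:
        raise ValueError(f"unrecognised RegulomeDB score: {score!r}")
    return int(match.group(1))


def keep_putative_regulatory(records, max_category=2):
    """Keep (chrom, position, score) records whose category is <= max_category.

    max_category=2 is an assumption matching the scores seen in this appendix.
    """
    return [rec for rec in records if regulome_category(rec[2]) <= max_category]


records = [
    ("7", 20664768, "2a"),  # ABCB5 row from this appendix
    ("4", 1345956, "1b"),   # UVSSA row from this appendix
    ("1", 1000000, "4"),    # hypothetical, weaker evidence
    ("2", 2000000, "5"),    # hypothetical
]
print(keep_putative_regulatory(records))
# [('7', 20664768, '2a'), ('4', 1345956, '1b')]
```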


Region of interest   ISKS   MGRB

Chr Position RegulomeDB MAF MAF 1000G Chr Position RegulomeDB MAF MAF 1000G

ABCB5 7 20664768 2a 3.57 x 10−3 1.00 x 10−3 7 20664768 2a 4.37 x 10−4 1.00 x 10−3

7 20690614 2a 1.27 x 10−1 1.44 x 10−1 7 20690614 2a 1.32 x 10−1 1.44 x 10−1

7 20691693 2b 8.91 x 10−4 . 7 20691842 2b 4.37 x 10−4 .
7 20691842 2b 2.67 x 10−3 . 7 20704880 2b 1.28 x 10−1 1.45 x 10−1
7 20691843 2b 2.67 x 10−3 . 7 20706184 2b 4.37 x 10−4 .
7 20704880 2b 1.27 x 10−1 1.45 x 10−1 7 20706751 2b 1.00 x 10+00 1.00 x 10+00

7 20706390 2b 8.91 x 10−4 . 7 20777911 2b 8.45 x 10−1 8.34 x 10−1

7 20706751 2b 1.00 x 10+00 1.00 x 10+00 7 20784538 2b 8.46 x 10−1 8.39 x 10−1

7 20777911 2b 8.42 x 10−1 8.34 x 10−1

7 20784538 2b 8.40 x 10−1 8.39 x 10−1

ARHGAP39 8 145802447 1f 4.71 x 10−1 4.66 x 10−1 8 145802447 1f 4.65 x 10−1 4.66 x 10−1

8 145758331 2b 4.46 x 10−3 3.00 x 10−3 8 145758331 2b 5.24 x 10−3 3.00 x 10−3

8 145777753 2b 1.78 x 10−3 2.00 x 10−3 8 145777753 2b 5.24 x 10−3 2.00 x 10−3

8 145781088 2b 2.67 x 10−3 1.00 x 10−3 8 145781088 2b 2.62 x 10−3 1.00 x 10−3

8 145800882 2b 4.37 x 10−4 .
BEAN1 16 66515120 1f 7.66 x 10−2 9.05 x 10−2 16 66515120 1f 7.12 x 10−2 9.05 x 10−2

16 66517326 1f 7.66 x 10−2 9.05 x 10−2 16 66517326 1f 7.04 x 10−2 9.05 x 10−2

16 66468122 2b 2.67 x 10−3 1.00 x 10−3 16 66468122 2b 4.37 x 10−4 1.00 x 10−3

16 66478328 2b 6.68 x 10−1 6.50 x 10−1 16 66478328 2b 6.63 x 10−1 6.50 x 10−1

16 66494404 2b 5.53 x 10−2 6.36 x 10−2 16 66494404 2b 6.08 x 10−2 6.36 x 10−2

16 66511627 2b 1.58 x 10−1 1.75 x 10−1 16 66511627 2b 1.61 x 10−1 1.75 x 10−1

16 66512704 2b 1.58 x 10−1 8.05 x 10−2 16 66512704 2b 8.61 x 10−2 8.05 x 10−2


C16orf96 16 4649720 1f 3.03 x 10−2 4.47 x 10−2 16 4649720 1f 3.63 x 10−2 4.47 x 10−2

16 4606743 2a 8.91 x 10−4 . 16 4606733 2c 4.37 x 10−4 1.00 x 10−3

16 4625126 2b 6.31 x 10−1 6.08 x 10−1 16 4625126 2b 6.10 x 10−1 6.08 x 10−1

16 4625164 2b 1.96 x 10−2 1.59 x 10−2 16 4625164 2b 2.93 x 10−2 1.59 x 10−2

16 4625470 2b 6.07 x 10−2 5.57 x 10−2 16 4625470 2b 5.81 x 10−2 5.57 x 10−2

16 4630530 2b 8.21 x 10−1 7.93 x 10−1 16 4630530 2b 7.85 x 10−1 7.93 x 10−1

16 4648520 2b 1.00 x 10+00 1.00 x 10+00 16 4648520 2b 1.00 x 10+00 1.00 x 10+00

16 4649866 2b 8.91 x 10−4 .
KIF2C 1 45218542 1f 4.41 x 10−1 3.99 x 10−1 1 45218542 1f 4.38 x 10−1 3.99 x 10−1

1 45213749 2b 1.78 x 10−3 6.00 x 10−3 1 45213749 2b 3.93 x 10−3 6.00 x 10−3

PDIA2 16 335639 2b 8.96 x 10−3 6.00 x 10−3 16 335639 2b 1.05 x 10−2 6.00 x 10−3

UVSSA 4 1343135 1f 1.31 x 10−1 1.39 x 10−1 4 1343135 1f 1.35 x 10−1 1.39 x 10−1

4 1345956 1b 5.57 x 10−1 5.62 x 10−1 4 1345956 1b 5.69 x 10−1 5.62 x 10−1

4 1346190 1d 1.52 x 10−2 1.59 x 10−2 4 1346190 1d 1.01 x 10−2 1.59 x 10−2

4 1346389 1f 5.03 x 10−1 5.16 x 10−1 4 1346389 1f 5.21 x 10−1 5.16 x 10−1

4 1350322 1f 6.56 x 10−1 6.68 x 10−1 4 1350322 1f 7.02 x 10−1 6.68 x 10−1

4 1357325 1f 1.34 x 10−1 1.40 x 10−1 4 1357325 1f 1.35 x 10−1 1.40 x 10−1

4 1358449 1f 6.56 x 10−1 6.68 x 10−1 4 1358449 1f 7.02 x 10−1 6.68 x 10−1

4 1363628 1f 6.60 x 10−1 6.73 x 10−1 4 1363628 1f 7.11 x 10−1 6.73 x 10−1

4 1363886 1f 1.12 x 10−1 1.26 x 10−1 4 1363886 1f 1.20 x 10−1 1.26 x 10−1

4 1364543 1f 6.56 x 10−1 6.68 x 10−1 4 1364543 1f 7.04 x 10−1 6.68 x 10−1

4 1365623 1f 3.81 x 10−1 3.94 x 10−1 4 1365623 1f 4.11 x 10−1 3.94 x 10−1

4 1365858 1f 7.23 x 10−1 7.20 x 10−1 4 1365858 1f 7.44 x 10−1 7.20 x 10−1

4 1367552 1f 6.87 x 10−1 6.99 x 10−1 4 1367552 1f 7.34 x 10−1 6.99 x 10−1

4 1373042 1f 5.19 x 10−1 5.38 x 10−1 4 1373042 1f 5.48 x 10−1 5.38 x 10−1


4 1340811 2a 5.01 x 10−1 5.14 x 10−1 4 1340811 2a 5.21 x 10−1 5.14 x 10−1

4 1345647 2b 1.32 x 10−1 1.39 x 10−1 4 1345647 2b 1.35 x 10−1 1.39 x 10−1

4 1345817 2a 1.78 x 10−3 1.00 x 10−3 4 1350186 2a 3.72 x 10−2 2.98 x 10−2

4 1350186 2a 3.57 x 10−2 2.98 x 10−2 4 1354306 2b 1.38 x 10−1 1.47 x 10−1

4 1354306 2b 1.40 x 10−1 1.47 x 10−1 4 1354525 2b 4.50 x 10−1 4.16 x 10−1

4 1354525 2b 4.17 x 10−1 4.16 x 10−1 4 1359371 2b 4.06 x 10−1 3.84 x 10−1

4 1359371 2b 3.75 x 10−1 3.84 x 10−1 4 1359608 2b 1.01 x 10−2 1.69 x 10−2

4 1359608 2b 1.52 x 10−2 1.69 x 10−2 4 1359793 2b 4.37 x 10−4 .
4 1359793 2b 8.91 x 10−4 .
ZFP69B
ABL1 9 133595219 1f 1.18 x 10−1 1.26 x 10−1 9 133595219 1f 1.27 x 10−1 1.26 x 10−1

9 133636808 1f 1.10 x 10−1 1.20 x 10−1 9 133636808 1f 1.25 x 10−1 1.20 x 10−1

9 133638943 1f 7.07 x 10−1 7.46 x 10−1 9 133638943 1f 7.16 x 10−1 7.46 x 10−1

9 133588353 2c 8.99 x 10−4 1.00 x 10−3 9 133588682 2b 4.37 x 10−4 .
9 133607956 2b 4.21 x 10−1 4.22 x 10−1 9 133607956 2b 4.03 x 10−1 4.22 x 10−1

9 133617617 2b 7.13 x 10−3 7.00 x 10−3 9 133617617 2b 5.24 x 10−3 7.00 x 10−3

9 133620998 2b 2.86 x 10−1 2.68 x 10−1 9 133620998 2b 2.56 x 10−1 2.68 x 10−1

9 133640597 2b 2.67 x 10−3 1.00 x 10−3 9 133640597 2b 1.75 x 10−3 1.00 x 10−3

9 133640725 2b 1.78 x 10−3 . 9 133656335 2b 4.22 x 10−1 4.18 x 10−1

9 133641045 2a 8.91 x 10−4 . 9 133670047 2b 1.49 x 10−2 1.79 x 10−2

9 133643981 2b 8.91 x 10−4 . 9 133699113 2b 1.31 x 10−3 .

9 133656335 2b 4.43 x 10−1 4.18 x 10−1 9 133700786 2b 1.42 x 10−1 1.55 x 10−1

9 133670047 2b 1.69 x 10−2 1.79 x 10−2 9 133710246 2b 1.39 x 10−1 1.50 x 10−1

9 133700786 2b 1.72 x 10−1 1.55 x 10−1 9 133710345 2b 9.62 x 10−3 1.99 x 10−2

9 133710246 2b 1.67 x 10−1 1.50 x 10−1 9 133712727 2b 4.37 x 10−4 .

9 133710345 2b 1.52 x 10−2 1.99 x 10−2 9 133712801 2b 1.38 x 10−1 1.49 x 10−1

9 133712801 2b 1.67 x 10−1 1.49 x 10−1 9 133719421 2b 4.37 x 10−4 .

9 133723594 2b 1.29 x 10−1 1.13 x 10−1 9 133722953 2a 4.37 x 10−4 .

9 133742253 2b 7.13 x 10−3 2.00 x 10−3 9 133723404 2b 4.37 x 10−4 .

9 133746209 2b 1.78 x 10−3 3.00 x 10−3 9 133723594 2b 9.97 x 10−2 1.13 x 10−1

9 133748664 2b 5.35 x 10−3 5.00 x 10−3 9 133742253 2b 2.62 x 10−3 2.00 x 10−3

9 133761616 2b 1.78 x 10−3 1.00 x 10−3 9 133746209 2b 8.74 x 10−4 3.00 x 10−3

9 133761675 2b 8.91 x 10−3 6.00 x 10−3 9 133748664 2b 7.43 x 10−3 5.00 x 10−3

9 133763500 2b 2.05 x 10−2 1.99 x 10−2 9 133761616 2b 8.74 x 10−4 1.00 x 10−3

9 133761675 2b 5.68 x 10−3 6.00 x 10−3

9 133763500 2b 1.57 x 10−2 1.99 x 10−2

ADSSL1 14 105197567 1f 4.39 x 10−1 4.33 x 10−1 14 105197567 1f 4.40 x 10−1 4.33 x 10−1

14 105200377 1f 4.43 x 10−1 4.40 x 10−1 14 105200377 1f 4.44 x 10−1 4.40 x 10−1

14 105207134 1f 4.58 x 10−1 4.48 x 10−1 14 105207134 1f 4.51 x 10−1 4.48 x 10−1

14 105208057 1f 3.01 x 10−1 3.27 x 10−1 14 105208057 1f 3.06 x 10−1 3.27 x 10−1

14 105208879 1f 4.55 x 10−1 4.48 x 10−1 14 105208879 1f 4.47 x 10−1 4.48 x 10−1

14 105210207 1f 4.54 x 10−1 4.45 x 10−1 14 105210207 1f 4.45 x 10−1 4.45 x 10−1

14 105212399 1f 4.57 x 10−1 4.46 x 10−1 14 105212399 1f 4.48 x 10−1 4.46 x 10−1

14 105213343 1f 4.55 x 10−1 4.45 x 10−1 14 105213343 1f 4.48 x 10−1 4.45 x 10−1

14 105199637 2b 8.91 x 10−4 . 14 105195826 2b 4.37 x 10−4 1.00 x 10−3

14 105201596 2b 4.36 x 10−1 4.30 x 10−1 14 105199637 2b 4.37 x 10−4 .

14 105206172 2b 4.36 x 10−1 1.02 x 10−1 14 105201596 2b 4.38 x 10−1 4.30 x 10−1

14 105207642 2b 6.34 x 10−2 7.06 x 10−2 14 105206172 2b 1.26 x 10−1 1.02 x 10−1

14 105207823 2b 8.94 x 10−4 . 14 105207642 2b 6.34 x 10−2 7.06 x 10−2

ASPN

FHOD3 18 34148508 1f 7.04 x 10−1 7.27 x 10−1 18 34148508 1f 7.31 x 10−1 7.27 x 10−1

18 34203207 1f 1.58 x 10−1 1.44 x 10−1 18 34203207 1f 1.45 x 10−1 1.44 x 10−1

18 34203793 1f 1.60 x 10−1 1.45 x 10−1 18 34203793 1f 1.43 x 10−1 1.45 x 10−1

18 34204260 1f 8.32 x 10−1 8.44 x 10−1 18 34204260 1f 8.47 x 10−1 8.44 x 10−1

18 34216404 1a 8.42 x 10−1 8.49 x 10−1 18 34216404 1a 8.54 x 10−1 8.49 x 10−1

18 34216504 1f 1.57 x 10−1 1.51 x 10−1 18 34216504 1f 1.46 x 10−1 1.51 x 10−1

18 34218537 1d 8.42 x 10−1 8.49 x 10−1 18 34218537 1d 8.55 x 10−1 8.49 x 10−1

18 34220491 1f 3.19 x 10−1 2.91 x 10−1 18 34220491 1f 3.30 x 10−1 2.91 x 10−1

18 34229559 1f 2.18 x 10−1 2.13 x 10−1 18 34229559 1f 2.36 x 10−1 2.13 x 10−1

18 34243972 1f 6.84 x 10−1 7.21 x 10−1 18 34243972 1f 6.82 x 10−1 7.21 x 10−1

18 34256668 1f 4.77 x 10−1 4.83 x 10−1 18 34256668 1f 4.69 x 10−1 4.83 x 10−1

18 34267997 1f 5.01 x 10−1 4.87 x 10−1 18 34267997 1f 4.96 x 10−1 4.87 x 10−1

18 34299071 1f 3.49 x 10−1 3.19 x 10−1 18 34299071 1f 3.51 x 10−1 3.19 x 10−1

18 34299757 1f 3.47 x 10−1 3.18 x 10−1 18 34299757 1f 3.50 x 10−1 3.18 x 10−1

18 34310668 1f 3.46 x 10−1 3.16 x 10−1 18 34310668 1f 3.48 x 10−1 3.16 x 10−1

18 34324190 1f 3.24 x 10−1 3.03 x 10−1 18 34324190 1f 3.29 x 10−1 3.03 x 10−1

18 34324376 1f 4.71 x 10−1 4.88 x 10−1 18 34324376 1f 4.79 x 10−1 4.88 x 10−1

18 34345587 1f 1.36 x 10−1 1.75 x 10−1 18 34345587 1f 1.49 x 10−1 1.75 x 10−1

18 34349962 1f 1.38 x 10−1 1.73 x 10−1 18 34349962 1f 1.48 x 10−1 1.73 x 10−1

18 33889608 2a 9.91 x 10−1 9.92 x 10−1 18 33889608 2a 9.93 x 10−1 9.92 x 10−1

18 33894243 2b 1.78 x 10−3 1.00 x 10−3 18 33895959 2b 3.06 x 10−3 5.00 x 10−3

18 33895959 2b 1.78 x 10−3 5.00 x 10−3 18 33895999 2b 4.37 x 10−4 2.00 x 10−3

18 33895999 2b 3.57 x 10−3 2.00 x 10−3 18 33901857 2b 5.68 x 10−3 5.00 x 10−3

18 33901843 2b 1.78 x 10−3 1.00 x 10−3 18 33908348 2b 8.30 x 10−3 1.09 x 10−2

18 33901857 2b 1.78 x 10−3 5.00 x 10−3 18 33908348 2b 8.74 x 10−4 5.00 x 10−3

18 33908348 2b 9.80 x 10−3 1.09 x 10−2 18 33908571 2b 8.30 x 10−3 1.09 x 10−2

18 33908348 2b 4.46 x 10−3 5.00 x 10−3 18 33909237 2b 4.37 x 10−4 1.00 x 10−3

18 33908571 2b 9.80 x 10−3 1.09 x 10−2 18 33915970 2b 8.74 x 10−4 5.00 x 10−3

18 33915970 2b 4.46 x 10−3 5.00 x 10−3 18 33954556 2b 4.94 x 10−2 3.48 x 10−2

18 33954556 2b 3.92 x 10−2 3.48 x 10−2 18 33956543 2b 3.33 x 10−1 3.69 x 10−1

18 33956543 2b 3.56 x 10−1 3.69 x 10−1 18 33969863 2b 1.70 x 10−1 1.78 x 10−1

18 33969863 2b 1.93 x 10−1 1.78 x 10−1 18 33972395 2b 8.74 x 10−4 1.00 x 10−3

18 33972395 2b 8.91 x 10−4 1.00 x 10−3 18 33974575 2b 8.74 x 10−4 1.00 x 10−3

18 33974575 2b 8.91 x 10−4 1.00 x 10−3 18 33974664 2b 2.10 x 10−2 2.19 x 10−2

18 33974664 2b 2.41 x 10−2 2.19 x 10−2 18 34020888 2b 4.37 x 10−4 .

18 34001063 2b 1.78 x 10−3 1.00 x 10−3 18 34034314 2b 2.75 x 10−2 3.88 x 10−2

18 34020888 2b 1.78 x 10−3 . 18 34034383 2b 2.75 x 10−2 3.88 x 10−2

18 34034314 2b 3.48 x 10−2 3.88 x 10−2 18 34042284 2b 2.71 x 10−2 3.78 x 10−2

18 34034383 2b 3.48 x 10−2 3.88 x 10−2 18 34076272 2b 1.05 x 10−2 8.00 x 10−3

18 34042277 2b 1.78 x 10−3 1.00 x 10−3 18 34160536 2b 7.43 x 10−3 2.00 x 10−3

18 34076196 2b 8.91 x 10−4 . 18 34204586 2b 1.53 x 10−1 1.55 x 10−1

18 34076272 2b 1.43 x 10−2 8.00 x 10−3 18 34204629 2b 1.46 x 10−1 1.45 x 10−1

18 34083621 2b 8.91 x 10−4 2.00 x 10−3 18 34215505 2b 1.54 x 10−1 1.60 x 10−1

18 34112639 2b 8.91 x 10−4 . 18 34240401 2b 2.67 x 10−1 2.57 x 10−1

18 34160536 2b 8.02 x 10−3 2.00 x 10−3 18 34271463 2b 4.69 x 10−1 4.60 x 10−1

18 34204586 2b 1.66 x 10−1 1.55 x 10−1 18 34273111 2b 2.88 x 10−2 3.08 x 10−2

18 34204629 2b 1.56 x 10−1 1.45 x 10−1 18 34291877 2b 5.31 x 10−1 5.47 x 10−1

18 34214122 2b 1.78 x 10−3 1.00 x 10−3 18 34297713 2b 4.37 x 10−4 1.00 x 10−3

18 34214199 2b 8.91 x 10−4 . 18 34351529 2b 7.34 x 10−1 7.18 x 10−1

18 34214531 2b 8.91 x 10−4 .

18 34215502 2b 1.78 x 10−3 1.00 x 10−3

18 34215505 2b 1.67 x 10−1 1.60 x 10−1

18 34215671 2a 1.78 x 10−3 1.00 x 10−3

18 34215802 2b 1.78 x 10−3 1.00 x 10−3

18 34231109 2b 8.91 x 10−4 1.00 x 10−3

18 34240401 2b 2.74 x 10−1 2.57 x 10−1

18 34271463 2b 4.65 x 10−1 4.60 x 10−1

18 34273111 2b 3.57 x 10−2 3.08 x 10−2

18 34291877 2b 5.38 x 10−1 5.47 x 10−1

18 34337473 2b 8.91 x 10−4 1.00 x 10−3

18 34339421 2b 8.91 x 10−4 1.00 x 10−3

18 34351529 2b 7.37 x 10−1 7.18 x 10−1

GATAD2A 19 19519518 1f 3.47 x 10−1 3.37 x 10−1 19 19519518 1f 3.63 x 10−1 3.37 x 10−1

19 19524105 1f 1.84 x 10−1 1.70 x 10−1 19 19524105 1f 1.70 x 10−1 1.70 x 10−1

19 19530270 1f 3.48 x 10−1 3.37 x 10−1 19 19530270 1f 3.62 x 10−1 3.37 x 10−1

19 19545099 1f 1.51 x 10−1 1.53 x 10−1 19 19545099 1f 1.86 x 10−1 1.53 x 10−1

19 19545428 1f 1.84 x 10−1 1.71 x 10−1 19 19545428 1f 1.71 x 10−1 1.71 x 10−1

19 19554803 1f 3.39 x 10−1 3.27 x 10−1 19 19554803 1f 3.59 x 10−1 3.27 x 10−1

19 19568659 1b 3.39 x 10−1 3.25 x 10−1 19 19568659 1b 3.59 x 10−1 3.25 x 10−1

19 19603692 1f 3.40 x 10−1 3.24 x 10−1 19 19603692 1f 3.58 x 10−1 3.24 x 10−1

19 19611550 1f 3.41 x 10−1 3.25 x 10−1 19 19611550 1f 3.60 x 10−1 3.25 x 10−1

19 19495954 2b 6.52 x 10−1 6.63 x 10−1 19 19611550 1f 4.37 x 10−4 .

19 19497195 2b 8.16 x 10−1 8.30 x 10−1 19 19495954 2b 6.37 x 10−1 6.63 x 10−1

19 19499598 2b 1.51 x 10−1 1.53 x 10−1 19 19497195 2b 8.29 x 10−1 8.30 x 10−1

19 19510831 2b 1.07 x 10−2 . 19 19499598 2b 1.85 x 10−1 1.53 x 10−1

19 19516433 2a 3.49 x 10−1 3.37 x 10−1 19 19516433 2a 3.62 x 10−1 3.37 x 10−1

19 19517325 2b 1.91 x 10−1 1.80 x 10−1 19 19517325 2b 1.77 x 10−1 1.80 x 10−1

19 19521761 2b 7.13 x 10−3 8.00 x 10−3 19 19521761 2b 7.87 x 10−3 8.00 x 10−3

19 19528806 2b 3.48 x 10−1 3.37 x 10−1 19 19528806 2b 3.61 x 10−1 3.37 x 10−1

19 19557965 2b 2.41 x 10−2 2.88 x 10−2 19 19557965 2b 2.32 x 10−2 2.88 x 10−2

19 19571531 2b 9.80 x 10−3 3.00 x 10−3 19 19571531 2b 1.18 x 10−2 3.00 x 10−3

19 19571545 2b 8.91 x 10−4 5.00 x 10−3 19 19571545 2b 5.68 x 10−3 5.00 x 10−3

19 19571752 2b 1.78 x 10−1 1.64 x 10−1 19 19571752 2b 1.65 x 10−1 1.64 x 10−1

19 19579557 2b 1.83 x 10−1 1.70 x 10−1 19 19579557 2b 1.71 x 10−1 1.70 x 10−1

19 19585119 2b 8.91 x 10−4 2.00 x 10−3 19 19585119 2b 1.31 x 10−3 2.00 x 10−3

19 19596340 2b 9.80 x 10−3 3.00 x 10−3 19 19585208 2b 4.37 x 10−4 .

19 19598467 2b 8.91 x 10−4 2.00 x 10−3 19 19596340 2b 1.18 x 10−2 3.00 x 10−3

19 19598910 2b 8.91 x 10−4 . 19 19598467 2b 1.31 x 10−3 2.00 x 10−3

19 19599742 2b 3.83 x 10−2 4.08 x 10−2 19 19599742 2b 3.23 x 10−2 4.08 x 10−2

19 19608995 2b 5.35 x 10−3 2.00 x 10−3 19 19608995 2b 8.74 x 10−4 2.00 x 10−3

19 19618060 2b 1.32 x 10−1 . 19 19618060 2b 1.43 x 10−1 .

19 19618060 2b 1.04 x 10−1 . 19 19618060 2b 1.04 x 10−1 .

19 19618060 2b 1.62 x 10−1 . 19 19618060 2b 1.57 x 10−1 .

LAMA2 6 129204292 2b 3.83 x 10−2 3.18 x 10−2 6 129204292 2b 3.89 x 10−2 3.18 x 10−2

6 129204319 2b 8.94 x 10−4 . 6 129204319 2b 4.37 x 10−4 .

6 129425564 2b 8.27 x 10−1 8.20 x 10−1 6 129425564 2b 8.24 x 10−1 8.20 x 10−1

6 129508265 2b 2.08 x 10−1 2.19 x 10−1 6 129508265 2b 2.15 x 10−1 2.19 x 10−1

6 129561473 2b 7.61 x 10−1 7.63 x 10−1 6 129561473 2b 7.61 x 10−1 7.63 x 10−1

6 129639557 2b 8.91 x 10−4 . 6 129671673 2b 8.39 x 10−2 6.66 x 10−2

6 129671673 2b 6.51 x 10−2 6.66 x 10−2 6 129770195 2b 1.23 x 10−1 1.28 x 10−1

6 129761921 2b 8.91 x 10−4 . 6 129770422 2b 1.23 x 10−1 1.29 x 10−1

6 129770195 2b 1.51 x 10−1 1.28 x 10−1 6 129793597 2b 7.30 x 10−2 7.06 x 10−2

6 129770422 2b 1.51 x 10−1 1.29 x 10−1 6 129797037 2a 3.93 x 10−1 4.12 x 10−1

6 129793597 2b 7.58 x 10−2 7.06 x 10−2 6 129811850 2b 8.30 x 10−3 8.00 x 10−3

6 129797037 2a 3.93 x 10−1 4.12 x 10−1 6 129814998 2b 1.76 x 10−1 1.84 x 10−1

6 129811850 2b 1.69 x 10−2 8.00 x 10−3 6 129819824 2b 3.50 x 10−3 1.00 x 10−3

6 129814998 2b 1.93 x 10−1 1.84 x 10−1 6 129820418 2b 3.19 x 10−2 2.68 x 10−2

6 129819824 2b 1.78 x 10−3 1.00 x 10−3 6 129820459 2b 8.74 x 10−3 9.90 x 10−3

6 129820418 2b 2.23 x 10−2 2.68 x 10−2 6 129822127 2b 8.74 x 10−3 9.90 x 10−3

6 129820459 2b 1.69 x 10−2 9.90 x 10−3

6 129822127 2b 1.69 x 10−2 9.90 x 10−3

P4HTM 3 49030828 1f 6.68 x 10−1 6.64 x 10−1 3 49030828 1f 6.74 x 10−1 6.64 x 10−1

3 49032967 1f 7.83 x 10−1 7.83 x 10−1 3 49032967 1f 7.78 x 10−1 7.83 x 10−1

3 49034879 1f 7.84 x 10−1 7.83 x 10−1 3 49034879 1f 7.78 x 10−1 7.83 x 10−1

3 49035211 1f 6.67 x 10−1 6.64 x 10−1 3 49035211 1f 6.72 x 10−1 6.64 x 10−1

3 49035885 1b 6.66 x 10−1 6.64 x 10−1 3 49035885 1b 6.72 x 10−1 6.64 x 10−1

3 49040462 1f 7.81 x 10−1 7.79 x 10−1 3 49040462 1f 7.76 x 10−1 7.79 x 10−1

3 49028114 2b 9.01 x 10−4 . 3 49028114 2b 4.37 x 10−4 .

PLK2 5 57749769 1f 9.45 x 10−2 9.64 x 10−2 5 57749769 1f 1.03 x 10−1 9.64 x 10−2

5 57756305 2b 9.66 x 10−2 1.01 x 10−1 5 57756305 2b 1.10 x 10−1 1.01 x 10−1

PRMT5

SDR16C6P,PENK

8 57354365 2b 8.91 x 10−4 . 8 57354365 2b 4.37 x 10−4 .

8 57358016 2a 1.78 x 10−3 1.00 x 10−3 8 57358016 2a 1.31 x 10−3 1.00 x 10−3

8 57358242 2b 2.67 x 10−3 1.00 x 10−3

SLC22A20,POLA2

11 65000244 1f 9.20 x 10−1 9.22 x 10−1 11 65000244 1f 9.14 x 10−1 9.22 x 10−1

11 65000478 1f 7.79 x 10−1 7.77 x 10−1 11 65000478 1f 7.78 x 10−1 7.77 x 10−1

11 65001679 1f 8.61 x 10−1 8.72 x 10−1 11 65001679 1f 8.63 x 10−1 8.72 x 10−1

11 65033875 1f 1.78 x 10−3 2.00 x 10−3 11 65033875 1f 1.31 x 10−3 2.00 x 10−3

11 64981853 2b 4.38 x 10−1 4.12 x 10−1 11 64981853 2b 4.27 x 10−1 4.12 x 10−1

11 64981871 2b 5.70 x 10−2 4.87 x 10−2 11 64981871 2b 5.03 x 10−2 4.87 x 10−2

11 64982520 2b 5.88 x 10−2 4.87 x 10−2 11 64982520 2b 5.03 x 10−2 4.87 x 10−2

11 64982521 2b 8.11 x 10−2 9.34 x 10−2 11 64982521 2b 8.61 x 10−2 9.34 x 10−2

11 64983580 2c 7.71 x 10−1 7.48 x 10−1 11 64987193 2b 3.50 x 10−2 3.68 x 10−2

11 64983581 2c 7.71 x 10−1 7.48 x 10−1 11 64987391 2b 7.76 x 10−1 7.78 x 10−1

11 64987193 2b 2.76 x 10−2 3.68 x 10−2 11 64990041 2b 7.76 x 10−1 7.78 x 10−1

11 64987391 2b 7.80 x 10−1 7.78 x 10−1 11 64993507 2b 4.37 x 10−4 1.00 x 10−3

11 64990041 2b 7.80 x 10−1 7.78 x 10−1 11 65004416 2b 4.37 x 10−4 1.00 x 10−3

11 64992584 2a 8.91 x 10−4 . 11 65005006 2b 4.37 x 10−4 .

11 64993507 2b 1.78 x 10−3 1.00 x 10−3 11 65005032 2b 4.37 x 10−4 1.00 x 10−3

11 65004416 2b 1.78 x 10−3 1.00 x 10−3 11 65042530 2b 1.49 x 10−1 1.45 x 10−1

11 65005006 2b 8.91 x 10−4 . 11 65042837 2b 9.22 x 10−2 .

11 65005032 2b 1.78 x 10−3 1.00 x 10−3 11 65042921 2b 9.22 x 10−2 1.00 x 10−3

11 65042530 2b 1.53 x 10−1 1.45 x 10−1 11 65045725 2b 1.50 x 10−1 1.48 x 10−1

11 65042837 2b 9.95 x 10−2 .

11 65042837 2b 8.96 x 10−4 .

11 65045725 2b 1.57 x 10−1 1.48 x 10−1

SLC6A18 5 1228240 2b 1.16 x 10−1 1.09 x 10−1 5 1228240 2b 9.88 x 10−2 1.09 x 10−1

5 1239940 2c 8.91 x 10−4 . 5 1239940 2c 4.37 x 10−4 .

5 1247024 2b 2.06 x 10−2 2.49 x 10−2 5 1247024 2b 2.23 x 10−2 2.49 x 10−2

5 1247093 2b 8.98 x 10−4 2.00 x 10−3 5 1247093 2b 8.74 x 10−4 2.00 x 10−3

TET2 4 106067359 2a 3.86 x 10−2 1.99 x 10−2 4 106067359 2a 4.20 x 10−2 1.99 x 10−2

4 106068129 2b 8.91 x 10−4 . 4 106068155 2a 4.37 x 10−4 2.00 x 10−3

4 106068155 2a 2.67 x 10−3 2.00 x 10−3 4 106068498 2b 1.84 x 10−2 1.49 x 10−2

4 106068498 2b 1.87 x 10−2 1.49 x 10−2 4 106068499 2b 9.62 x 10−3 7.00 x 10−3

4 106068499 2b 1.60 x 10−2 7.00 x 10−3 4 106069013 2b 2.17 x 10−1 2.12 x 10−1

4 106069013 2b 1.96 x 10−1 2.12 x 10−1

ISKS: International Sarcoma Kindred Study. MGRB: Medical Genome Reference Bank. Chr: Chromosome. RegulomeDB: RegulomeDB score. MAF: Minor Allele Frequency in the cohort. MAF 1000G: Minor Allele Frequency in the 1000 Genomes Project European population. A full stop (.) indicates that no MAF data were available.