Dr. Ethan Cerami: cBio Cancer Genomics Portal


description

On May 23, 2012, Dr. Ethan Cerami delivered a presentation titled "cBio Cancer Genomics Portal." The presentation introduced the portal and described how to mine data generated by The Cancer Genome Atlas (TCGA) project.

Transcript of Dr. Ethan Cerami: cBio Cancer Genomics Portal

Page 1: Dr. Ethan Cerami: cBio Cancer Genomics Portal

The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data

Ethan Cerami, Ph.D., Director, Cancer Informatics Development

Computational Biology Center (cBio), Memorial Sloan-Kettering Cancer Center

CBIIT Talk May 23, 2012

cBio Cancer Genomics Portal

Introduction

Motivation: Pathway Analysis Network Analysis

Examples of Usage Advanced Options Web API / R Package

TCGA Ecosystem & Future Plans

http://cbioportal.org

Friday, May 18, 12

Page 2: Dr. Ethan Cerami: cBio Cancer Genomics Portal

The Cancer Genome Atlas (TCGA) Project

MSKCC Genome Data Analysis Center (GDAC)

Page 3: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Pathway Analysis

Genomic Alteration(s):

• Single Nucleotide Variants

• Small Insertions and Deletions

• Copy Number Alterations

• mRNA and microRNA Expression Changes

• DNA Methylation

Patient Cohort

PI3K Pathway

TP53 Pathway

Pathway Analysis:

Genomic Inputs:

Epigenetically silenced genes

Copy number altered genes with correlated gene expression

Pathway and Network Data

[Figure: example metabolic pathway diagram showing N-Ac-Neuraminate (Sialate), UDP-N-Ac-Glucosamine, pyruvate, N-Ac-Mannosamine-6-P, CMP-N-Acetylneuraminate, and UDP-N-Ac-Muramate, with reaction steps labeled by EC numbers.]

Metabolic Pathways Signaling Pathways Protein-Protein Interactions Regulatory Networks Drug-Target Networks


Page 4: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Comprehensive genomic characterization defines human glioblastoma genes and core pathways. The Cancer Genome Atlas Research Network. Nature 455, 1061-1068 (23 October 2008).

Page 5: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Homologous Repair (HR) Alterations

BRCA Altered Cases, N=103 (33%)

[Figure: alteration tracks for BRCA1 and BRCA2. Legend: Epigenetic Silencing via Hypermethylation, Somatic Mutation, Germline Mutation.]

[Figure: HR pathway diagram, from DNA damage through sensors to HR-mediated repair, with per-gene alteration frequencies. Alteration labels used in the figure: mutated, hypermethylated, amplified, deleted.]

• ATM: 1%

• ATR: <1%

• FA core complex: 5%

• FANCD2: <1%

• BRCA1: 23%

• BRCA2: 11%

• EMSY: 8%

• RAD51C: 3%

• PTEN: 7% (deleted)

HR Pathway: 51% of cases altered

[Figure: survival curves for BRCA Mutated [66], BRCA1 Epigenetically Silenced [33], and BRCA Wildtype [212]. x-axis: Months Survival (0 to 150); y-axis: Patient Survival (0 to 100). Log-rank test p-value: 0.0008602]

Integrated genomic analyses of ovarian carcinoma. The Cancer Genome Atlas Research Network. Nature 474, 609–615 (30 June 2011).

Page 6: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Pathway Analysis

Genomic Alteration(s):

• Single Nucleotide Variants

• Small Insertions and Deletions

• Copy Number Alterations

• mRNA and microRNA Expression Changes

• DNA Methylation

Patient Cohort

PI3K Pathway

TP53 Pathway

Pathway Analysis:

Genomic Inputs:

Epigenetically silenced genes

Copy number altered genes with correlated gene expression

Pathway and Network Data

[Figure: example metabolic pathway diagram showing N-Ac-Neuraminate (Sialate), UDP-N-Ac-Glucosamine, pyruvate, N-Ac-Mannosamine-6-P, CMP-N-Acetylneuraminate, and UDP-N-Ac-Muramate, with reaction steps labeled by EC numbers.]

Metabolic Pathways Signaling Pathways Protein-Protein Interactions Regulatory Networks Drug-Target Networks

cBio Cancer Genomics Portal

Pathway Commons


Page 7: Dr. Ethan Cerami: cBio Cancer Genomics Portal

[Figure: overview of the cBio Cancer Genomics Portal. Panel labels follow.]

Comprehensive Cancer Genomic Studies:

• Mutations

• Copy Number

• mRNA Expression

• DNA Methylation

• Protein / Phosphoprotein

• Biological Pathways

• Clinical Survival

cBio Cancer Genomics Portal: Integration of Genomic Data Types, Clinical Data, and Biological Pathways.

Web-Based Interface for Iterative Exploratory Data Analysis:

• OncoPrint: Compact Visualization of Discrete Genomic Events (Gene A, Gene B, Gene C, ...)

• Survival Analysis

• Network Analysis

• Mutation Details

• Predicted Functional Impact of Mutations

• Multidimensional Genomic Data Plots

• Alteration Frequency (%)

• Other Reports

Also shown:

• Web-Service Interface

• R-Package

• MATLAB ToolBox

• Biological Insight

• Clinical Trial Design

The cBio Cancer Genomics Portal. Cerami et al., Cancer Discovery (May 2012).

Page 8: Dr. Ethan Cerami: cBio Cancer Genomics Portal


cBio Portal in Context

• Other Portals available:

• TCGA Data Portal

• ICGC Data Portal

• UCSC Cancer Genome Browser

• cBio Portal:

• Supports Exploratory Data Analysis

• Lowers the barrier to access, specifically for biologists and clinical researchers

• Provides integrated access to data


Page 9: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Multiple Portals

• Public Portal: http://www.cbioportal.org/

• Contains published TCGA studies + a few other studies.

• Now also contains public copy number, mRNA, and RPPA data for all TCGA tumor types (everything except mutation data).

• Open Access.

• TCGA Portal: http://cbio.mskcc.org/gdac-portal/

• Contains all provisional TCGA data, updated monthly.

• Requires a user name / password.

• Register at: http://bit.ly/gdac-form.

• Stand-Up to Cancer (SU2C) Portal


Page 10: Dr. Ethan Cerami: cBio Cancer Genomics Portal

4-Step Web Interface

4-step web interface for querying a single cancer study:

1. Select a Cancer Study or “All Cancer Studies”. Example shown: Glioblastoma (TCGA). The Cancer Genome Atlas (TCGA) Glioblastoma project. 206 primary glioblastoma samples. Nature 2008. Raw data via the TCGA Data Portal.

2. Select one or more genomic profiles, for example Mutation and Copy Number Data. Profiles shown:

• Mutations

• Copy Number Data (select one of the profiles): Putative copy-number alterations (RAE); Putative copy-number alterations (GBM Pathways)

• mRNA Expression z-Scores

3. Select a Patient/Case Set. Example shown: All Complete Tumors (seq, mRNA, CNA).

4. Enter a Gene or Gene Set: a User-Defined List (example shown: RB1 CDK4 CDKN2A) or one of the Example Gene Sets. Advanced: Onco Query Language (OQL).

Optional argument: compute mutual exclusivity / co-occurrence between all pairs of genes. (Not recommended for more than 10 genes.)

Tabs: Query, Download Data. Button: Submit.
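The optional mutual exclusivity / co-occurrence computation can be pictured as a 2x2 contingency table per gene pair. The sketch below is not the portal's implementation: it assumes a log odds ratio with a 0.5 continuity correction and a two-sided Fisher's exact test, and the sample calls and function names are hypothetical.

```python
from itertools import combinations
from math import comb, log


def contingency(a, b):
    """Count samples by alteration status of two genes (parallel lists of bools)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    neither = len(a) - both - only_a - only_b
    return both, only_a, only_b, neither


def log_odds_ratio(both, only_a, only_b, neither):
    """Log odds ratio with a 0.5 continuity correction (an assumption of this sketch).
    Negative values lean toward mutual exclusivity, positive toward co-occurrence."""
    return log(((both + 0.5) * (neither + 0.5)) / ((only_a + 0.5) * (only_b + 0.5)))


def fisher_two_sided(both, only_a, only_b, neither):
    """Two-sided Fisher's exact test: sum the probabilities of all tables with
    the same margins that are no more likely than the observed table."""
    row1 = both + only_a          # samples altered in gene A
    col1 = both + only_b          # samples altered in gene B
    n = both + only_a + only_b + neither
    denom = comb(n, col1)

    def prob(k):
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    p_obs = prob(both)
    lo = max(0, col1 - (n - row1))
    hi = min(row1, col1)
    total = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def pairwise(alterations):
    """Return {(gene1, gene2): (log odds ratio, p-value)} for every gene pair."""
    results = {}
    for g1, g2 in combinations(sorted(alterations), 2):
        table = contingency(alterations[g1], alterations[g2])
        results[(g1, g2)] = (log_odds_ratio(*table), fisher_two_sided(*table))
    return results


# Hypothetical alteration calls for 8 samples (True = altered).
example = {
    "RB1":    [True, True, True, False, False, False, False, False],
    "CDK4":   [False, False, False, True, True, False, False, False],
    "CDKN2A": [False, False, False, False, False, True, True, True],
}
for pair, (lor, p) in pairwise(example).items():
    print(pair, round(lor, 2), round(p, 3))
```

The number of pairs grows quadratically with the number of genes, which is one plausible reason for the slide's advice to keep the gene set small.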

Page 11: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Main Features:


Page 12: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Key Abstraction: Discrete Genomic-Level Events

• Each gene within each sample is assigned multiple discrete genomic-level events:

• Mutations: Mutated or WT.

• Copy Number: Amplification, Homozygous Deletion, etc.

• Important caveats:

• Portal does not provide confidence intervals for mutations.

• Copy number calls (as determined by GISTIC or RAE) are putative.
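One way to picture this abstraction is a per-gene, per-sample table of discrete events. The sketch below is an illustration under stated assumptions, not the portal's data model: the event labels come from this slide, while the sample IDs, the `EVENTS` dictionary, and the helper names are hypothetical.

```python
# Hypothetical discrete events: gene -> sample -> set of event labels.
# An empty set means no event was called for that gene in that sample.
EVENTS = {
    "BRCA1": {"S1": {"Mutated"}, "S2": set(), "S3": set(), "S4": set()},
    "BRCA2": {"S1": set(), "S2": {"Mutated"}, "S3": set(), "S4": set()},
    "PTEN":  {"S1": set(), "S2": {"Homozygous Deletion"}, "S3": set(), "S4": set()},
}


def gene_frequency(events, gene):
    """Percentage of samples with at least one discrete event in the gene."""
    calls = events[gene]
    return 100.0 * sum(1 for e in calls.values() if e) / len(calls)


def gene_set_frequency(events, genes):
    """Percentage of samples altered in at least one gene of the set."""
    samples = sorted(next(iter(events.values())))
    altered = [s for s in samples if any(events[g][s] for g in genes)]
    return 100.0 * len(altered) / len(samples)


for gene in EVENTS:
    print(gene, gene_frequency(EVENTS, gene))
print("gene set", gene_set_frequency(EVENTS, list(EVENTS)))
```

Because each call is a discrete label rather than a score, the caveats on this slide apply directly: a sample counts as altered or not, with no confidence attached.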


Page 13: Dr. Ethan Cerami: cBio Cancer Genomics Portal

New Tutorials Available


Page 14: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Querying a Single Cancer Study


Page 15: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Page 16: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Page 17: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Page 18: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Page 19: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Mutation Assessor is maintained by Boris Reva & Yevgeniy Antipin @ cBio.


Page 20: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Page 21: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Page 22: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Cross-Cancer Queries

How do PI3K alterations vary across ovarian and endometrial cancers?

Page 23: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Available soon...!


Page 24: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Pathway Commons (PC)

http://www.pathwaycommons.org

[Figure: Pathway Commons overview. Labels follow.]

• Data sources: Reactome, HPRD, HumanCyc, BioGRID, MSKCC Cancer Cell Map, IMID, IntAct, MINT, NCI Nature PID

• Formats: PSI-MI, BioPAX

• ID Mapping: UniProt, Entrez Gene, RefSeq

• Access: Web Site, Web Service, Batch Download

Pathway Commons, a web resource for biological pathway data. Cerami et al., Nucleic Acids Res. 2011.

Page 25: Dr. Ethan Cerami: cBio Cancer Genomics Portal


Page 26: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Network View for BRCA1/BRCA2 in TCGA Ovarian Cancer

[Figure: network view with four panels (A to D). Panel labels follow.]

Network Filtering, Cropping and Searching:

• Filter Neighbors by Alteration (%)

• Hide selected / Show only selected / Show all

• Search by Gene Symbol

Filter Edges by Interaction Type and/or Data Source

Interaction Legend: In Same Component, Reacts With, State Change, Other, Merged (multiple types)

Node Legend:

• Copy Number: Amplification, Homozygous Deletion, Gain, Hemizygous Deletion

• Mutation: Mutated

• mRNA Expression: Up-Regulated, Down-Regulated

• Alteration Frequency (%): 0 to 100

• Thick Border: seed gene. Thin Border: linker gene.

Collaboration with Ugur Dogrusoz, Bilkent University; separately funded by National Resource for Network Biology (NRNB) grant.


Page 27: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Ovarian Cancer Gene Set: PTEN

Recently Added: RPPA Analysis


Page 28: Dr. Ethan Cerami: cBio Cancer Genomics Portal

OncoQuery Language (OQL)

A) Onco Query Examples: Copy Number and Mutations

Steps 1-3: User selects TCGA Ovarian Cancer, with genomic profiles Mutations (next-gen) and Putative CNA (GISTIC), and case set All Complete Tumors.

Step 4:

• Onco Query: RB1. Description: Default. Shows putative amplifications, homozygous deletions, and mutations.

• Onco Query: RB1: MUT. Description: Shows only mutations.

• Onco Query: RB1: HOMDEL MUT. Description: Shows putative homozygous deletions and mutations.

B) Onco Query Examples: mRNA Expression Data

Steps 1-3: User selects TCGA GBM, with genomic profile mRNA Expression (Z-Scores), and case set All Complete Tumors.

Step 4:

• Onco Query: PTEN. Description: Default. Shows mRNA up- or down-regulation at least 2 standard deviations from the mean.

• Onco Query: PTEN: EXP < -1. Description: Shows only down-regulated mRNA events more than 1 standard deviation below the mean.

[Figure: OncoPrint Output for each query. Legend: Putative Copy Number Amplification, Putative Homozygous Deletion, Mutation, mRNA up-regulation, mRNA down-regulation.]
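The query strings above can be read as a gene symbol followed by an optional list of event filters. The sketch below is a hypothetical, minimal interpreter for just the forms shown on this slide, not the portal's OQL parser. The `AMP` keyword, the `<=` and `>=` operators, the +2/-2 copy-number coding, and all function names are assumptions of the sketch.

```python
import operator
import re

OPS = {"<": operator.lt, ">": operator.gt, "<=": operator.le, ">=": operator.ge}

# Defaults described on the slide: amplifications, homozygous deletions and
# mutations; mRNA events at least 2 standard deviations from the mean.
DEFAULT_SPEC = [("AMP",), ("HOMDEL",), ("MUT",), ("EXP", ">=", 2.0), ("EXP", "<=", -2.0)]

EXP_PATTERN = re.compile(r"EXP\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)")


def parse_oql(line):
    """Split 'GENE: FILTERS' into (gene, spec). spec is None when no filter is given."""
    gene, _, rest = line.partition(":")
    gene, rest = gene.strip(), rest.strip()
    if not rest:
        return gene, None
    exp = EXP_PATTERN.fullmatch(rest)
    if exp:
        return gene, [("EXP", exp.group(1), float(exp.group(2)))]
    return gene, [(token,) for token in rest.split()]


def is_altered(spec, cna=None, mutated=False, z=None):
    """Apply a parsed spec to one gene in one sample.
    cna: putative copy-number call (+2 amplification, -2 homozygous deletion),
    mutated: True if a mutation was called, z: mRNA expression z-score."""
    for term in (DEFAULT_SPEC if spec is None else spec):
        if term[0] == "MUT" and mutated:
            return True
        if term[0] == "AMP" and cna == 2:
            return True
        if term[0] == "HOMDEL" and cna == -2:
            return True
        if term[0] == "EXP" and z is not None and OPS[term[1]](z, term[2]):
            return True
    return False


for query in ["RB1", "RB1: MUT", "RB1: HOMDEL MUT", "PTEN: EXP < -1"]:
    print(parse_oql(query))
```

With this reading, `PTEN: EXP < -1` flags a sample with a z-score of -1.5 that the default 2-standard-deviation threshold would leave unmarked.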


Page 29: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Endometrial Cancer: PIK3CA

[Figure: panels A, B, and C for PIK3CA.]

Page 30: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Web Service API and R/MATLAB Packages

• Access via Web API

• Access via R Package and MATLAB Library

A) Example Query: Retrieve all Cancer Studies

http://www.cbioportal.org/public-portal/webservice.do?cmd=getCancerStudies

Output:

cancer_type_id      name                            description
tcga_gbm            Glioblastoma (TCGA)             ...
mskcc_prad          Prostate Cancer (MSKCC)         ...
mskcc_broad_sarc    Sarcoma (MSKCC/Broad)           ...
tcga_ova            Serous Ovarian Cancer (TCGA)    ...

B) Example Query: Retrieve Copy Number Data for CCNE1 in TCGA Ovarian Cancer

http://www.cbioportal.org/public-portal/webservice.do?cmd=getProfileData&case_set_id=ova_all&genetic_profile_id=ova_gistic&gene_list=CCNE1

URL parameters:

• cmd=getProfileData: Get Genomic Profile Data

• case_set_id=ova_all: Restrict to all TCGA Ovarian Cancer Samples

• genetic_profile_id=ova_gistic: Retrieve Copy Number (GISTIC) Data

• gene_list=CCNE1: Gene List

Output:

GENE_ID   COMMON   TCGA-04-1331   TCGA-04-1332   TCGA-04-1336   TCGA-04-1337
898       CCNE1    1              1              0              0

Putative Copy Number Status:

+2   Amplification
+1   Gain
 0   Diploid
-1   Hemizygous Deletion
-2   Homozygous Deletion
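Example query B can be assembled and its output decoded with a few lines of standard-library Python. This is a sketch under assumptions: the base URL and parameters are copied from the slide and may no longer resolve, the response is assumed to be tab-delimited with optional `#` comment lines, and the helper names and `SAMPLE_OUTPUT` string are hypothetical. No network request is made.

```python
import csv
import io
from urllib.parse import urlencode

# Base URL as shown on the slide (2012); it may have changed since.
BASE_URL = "http://www.cbioportal.org/public-portal/webservice.do"

# Putative copy number status codes from the slide.
CNA_LABELS = {
    2: "Amplification",
    1: "Gain",
    0: "Diploid",
    -1: "Hemizygous Deletion",
    -2: "Homozygous Deletion",
}


def build_query(cmd, **params):
    """Build a web service URL; parameter order follows the call order."""
    return BASE_URL + "?" + urlencode({"cmd": cmd, **params})


def parse_profile_data(text):
    """Parse (assumed) tab-delimited output into {gene: {sample: label}}."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    result = {}
    for row in reader:
        gene = row.pop("COMMON")
        row.pop("GENE_ID", None)
        result[gene] = {sample: CNA_LABELS[int(value)] for sample, value in row.items()}
    return result


# Sample response text mirroring the output shown on the slide.
SAMPLE_OUTPUT = (
    "GENE_ID\tCOMMON\tTCGA-04-1331\tTCGA-04-1332\tTCGA-04-1336\tTCGA-04-1337\n"
    "898\tCCNE1\t1\t1\t0\t0\n"
)

url = build_query(
    "getProfileData",
    case_set_id="ova_all",
    genetic_profile_id="ova_gistic",
    gene_list="CCNE1",
)
print(url)
print(parse_profile_data(SAMPLE_OUTPUT))
```

To fetch live data, the URL could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_profile_data`, assuming the service still answers at that address.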


Page 31: Dr. Ethan Cerami: cBio Cancer Genomics Portal

R and MATLAB Packages

• Access portal data within R via the CGDS-R package.

• Available via CRAN.

• Vignette and Reference PDF available.

R Package maintained by Anders Jacobsen; MATLAB package maintained by Erik Larsson.

Page 32: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Integrating with the Cancer Genome Atlas Project (TCGA)

GDAC Broad Firehose

Data Coordination Center (DCC)

All Data...

TCGA Researchers

TCGA Disease Working Groups

cBio Portal(s)

Page 33: Dr. Ethan Cerami: cBio Cancer Genomics Portal


TCGA Ecosystem

[Figure: components of the TCGA ecosystem and the links between them. Labels follow.]

• Data Coordination Center (DCC) @ NCI: Central repository for all TCGA data.

• Firehose @ Broad: Pipeline for processing all TCGA data.

• cBio Portal @ MSKCC: Open platform for exploring, mining and visualizing TCGA data.

• UCSC Cancer Genome Browser: Web portal for exploring TCGA genomic, clinical, and image data.

• Integrative Genomics Viewer (IGV) @ Broad: High-performance visualization tool for interactive exploration of large, integrated genomic datasets.

• Oncotator @ Broad: Web application for annotating human genomic point mutations and indels with data relevant to cancer researchers.

• Mutation Assessor @ cBio: Predicted functional consequences of mutations in cancer.

• Tools at ISB: Regulome Explorer, ...

• Analysis Working Groups: Generates freeze lists, subtypes, and other case lists.

Link labels: Web API; Multidimensional genomic profiling data; User Cross Links (Beta); User Cross Links for IGV and Network Visualization; Freeze lists, subtypes, and other case lists; Download of Firehose Data via DCC. Example gene set shown: RB1 CDK4 CDKN2A.

Legend: Implemented; Work In Progress; Proposed / Planned.

Page 34: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Planned Features

• Adding Drugs and Drug Targets to the network view.

• Adding clinical features and new sort features to the OncoPrint, e.g. group/sort by MSI-Status or Histological Grade.

• Improved analysis and visualization of RPPA (collaboration with Gordon Mills).

• Integration of mutation and copy number algorithm results, e.g. MutSig and GISTIC.

• Full support for DNA methylation events.

• [your idea here...]


Page 35: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Open Source

• Portal software is open source (GNU Lesser GPL).

• Available on Google code:

• http://code.google.com/p/cbio-cancer-genomics-portal/

• Amazon Machine Image (AMI) also available.

• Upstream pre-processing activities required before data can be imported into the portal:

• Mutation data finalization and format.

• Discrete copy number data, e.g. GISTIC algorithm.

• Case lists.

• Some of this is currently handled by the TCGA Broad Firehose.


Page 36: Dr. Ethan Cerami: cBio Cancer Genomics Portal

Acknowledgements

• cBio Portal: Nikolaus Schultz, Benjamin Gross, Arthur Goldberg, Caitlin Byrne, Anders Jacobsen, Jianjiong Gao, Erik Larsson, Selcuk Onur Sumer (Bilkent University), Sinan Sonlu (Bilkent University), Ugur Dogrusoz (Bilkent University), Chris Sander

• Collaborators: Broad Firehose Team, The TCGA Project Team

• Pathway Commons: Benjamin Gross, Emek Demir, Igor Rodchenkov (U. Toronto), Ozgün Babur, Nadia Anwar, Nikolaus Schultz, Gary D. Bader (U. Toronto), Chris Sander
