Dr. Ethan Cerami: cBio Cancer Genomics Portal
The cBio Cancer Genomics Portal: An Open Platform for Exploring Multidimensional Cancer Genomics Data
Ethan Cerami, Ph.D.
Director, Cancer Informatics Development
Computational Biology Center (cBio)
Memorial Sloan-Kettering Cancer Center
CBIIT Talk, May 23, 2012
cBio Cancer Genomics Portal: CBIIT Talk Outline
• Motivation: Pathway Analysis
• Introduction
• Examples of Usage
• Network Analysis
• Advanced Options
• Web API / R Package
• TCGA Ecosystem & Future Plans
http://cbioportal.org
Friday, May 18, 12
The Cancer Genome Atlas (TCGA) Project
MSKCC Genome Data Analysis Center (GDAC)
Pathway Analysis
Patient Cohort
Genomic Alteration(s):
• Single Nucleotide Variants, Small Insertions and Deletions
• Copy Number Alterations
• mRNA and microRNA Expression Changes
• DNA Methylation
Genomic Inputs:
• Epigenetically silenced genes
• Copy number altered genes with correlated gene expression
Pathway Analysis:
• PI3K Pathway
• TP53 Pathway
Pathway and Network Data
[Figure: example metabolic pathway diagram with chemical structures and enzyme (EC) numbers; labeled compounds include N-Ac-Neuraminate (Sialate), UDP-N-Ac-Glucosamine, N-Ac-Mannosamine-6-P, CMP-N-Acetylneuraminate, and UDP-N-Ac-Muramate]
• Metabolic Pathways
• Signaling Pathways
• Protein-Protein Interactions
• Regulatory Networks
• Drug-Target Networks
Comprehensive genomic characterization defines human glioblastoma genes and core pathways. The Cancer Genome Atlas Research Network. Nature 455, 1061-1068 (23 October 2008)
Homologous Repair (HR) Alterations
• HR Pathway: 51% of cases altered
• BRCA Altered Cases: N=103 (33%)
• BRCA1 / BRCA2 alteration types: Epigenetic Silencing via Hypermethylation, Somatic Mutation, Germline Mutation
• Alteration frequencies: ATM 1%, ATR <1%, FA core complex 5%, FANCD2 <1%, BRCA1 23%, BRCA2 11%, EMSY 8%, RAD51C 3%, PTEN 7% (deleted)
• Alteration labels shown: mutated, hypermethylated, amplified, deleted
• Pathway stages shown: DNA damage, Sensors, HR-Mediated repair
[Figure: Patient Survival (%) versus Months Survival (0 to 150) for BRCA Mutated [66], BRCA1 Epigenetically Silenced [33], and BRCA Wildtype [212]; log-rank test p-value: 0.0008602]
Integrated genomic analyses of ovarian carcinoma. The Cancer Genome Atlas Research Network. Nature 474, 609–615 (30 June 2011)
cBio Cancer Genomics Portal
Pathway Commons
[Figure: overview of the cBio Cancer Genomics Portal]
• Integration of Genomic Data Types, Clinical Data, and Biological Pathways: Mutations, Copy Number, mRNA Expression, DNA Methylation, Protein / Phosphoprotein, Biological Pathways, Clinical Survival
• Web-Based Interface for Iterative Exploratory Data Analysis
• OncoPrint: Compact Visualization of Discrete Genomic Events (Gene A, Gene B, Gene C)
• Survival Analysis
• Network Analysis
• Mutation Details
• Predicted Functional Impact of Mutations
• Multidimensional Genomic Data Plots
• Other Reports
• Comprehensive Cancer Genomic Studies: Alteration Frequency (%)
• Web-Service Interface, R Package, MATLAB Toolbox
• Outcomes: Biological Insight, Clinical Trial Design
The cBio Cancer Genomics Portal. Cerami et al., Cancer Discovery (May 2012)
cBio Portal in Context
• Other Portals available:
• TCGA Data Portal
• ICGC Data Portal
• UCSC Cancer Genome Browser
• cBio Portal:
• Supports Exploratory Data Analysis
• Lowers the barrier to access, specifically for biologists and clinical researchers
• Provides integrated access to data
Multiple Portals
• Public Portal: http://www.cbioportal.org/
• Contains published TCGA studies plus a few other studies.
• Now also contains public copy number, mRNA, and RPPA data for all TCGA tumor types (everything except mutation data).
• Open Access.
• TCGA Portal: http://cbio.mskcc.org/gdac-portal/
• Contains all provisional TCGA data, updated monthly.
• Requires a user name / password.
• Register at: http://bit.ly/gdac-form.
• Stand-Up to Cancer (SU2C) Portal
4-Step Web Interface
4-step web interface for querying a single cancer study (tabs: Query, Download Data):
1. Select a Cancer Study or “All Cancer Studies”.
   Example: Glioblastoma (TCGA). The Cancer Genome Atlas (TCGA) Glioblastoma project. 206 primary glioblastoma samples. Nature 2008. Raw data via the TCGA Data Portal.
2. Select one or more genomic profiles, for example Mutation and Copy Number Data.
   Profiles shown: Mutations; Copy Number Data (select one of: Putative copy-number alterations (RAE), Putative copy-number alterations (GBM Pathways)); mRNA Expression z-Scores.
3. Select a Patient/Case Set.
   Example: All Complete Tumors (seq, mRNA, CNA).
4. Enter a Gene or Gene Set, or select from the example gene sets.
   Example (User-Defined List): RB1 CDK4 CDKN2A
   Advanced: Onco Query Language (OQL).
Optional argument: compute mutual exclusivity / co-occurrence between all pairs of genes. (Not recommended for more than 10 genes.)
Submit
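The slide does not say which statistic the portal uses for the mutual exclusivity / co-occurrence option. As a rough sketch, assuming each gene is reduced to the set of altered case IDs, one common choice is the log odds ratio of the 2x2 contingency table for each gene pair. The function names, the 0.5 continuity correction, and the toy case IDs below are illustrative assumptions, not the portal's implementation.

```python
import math
from itertools import combinations


def log_odds_ratio(altered_a, altered_b, all_cases):
    """Log odds ratio for two genes' sets of altered cases.

    Negative values suggest mutual exclusivity, positive values suggest
    co-occurrence. The 0.5 continuity correction (an assumption of this
    sketch) avoids division by zero in sparse tables.
    """
    both = len(altered_a & altered_b)
    only_a = len(altered_a - altered_b)
    only_b = len(altered_b - altered_a)
    neither = len(all_cases) - both - only_a - only_b
    return math.log(((both + 0.5) * (neither + 0.5)) /
                    ((only_a + 0.5) * (only_b + 0.5)))


def pairwise_scores(alterations, all_cases):
    """Score every pair of genes; alterations maps gene -> set of case IDs."""
    return {
        (g1, g2): log_odds_ratio(alterations[g1], alterations[g2], all_cases)
        for g1, g2 in combinations(sorted(alterations), 2)
    }


# Toy data (hypothetical case IDs, not TCGA results).
cases = {f"case{i}" for i in range(10)}
alterations = {
    "RB1": {"case0", "case1", "case2"},
    "CDK4": {"case3", "case4", "case5"},
    "CDKN2A": {"case0", "case1", "case2", "case6"},
}
scores = pairwise_scores(alterations, cases)
```

The number of pairs grows as n(n-1)/2, so 10 genes already yield 45 pairwise comparisons.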
Main Features:
Key Abstraction: Discrete Genomic-Level Events
• Each gene within each sample is assigned multiple discrete genomic-level events:
• Mutations: Mutated or WT (wild type).
• Copy Number: Amplification, Homozygous Deletion, etc.
• Important caveats:
• The portal does not provide confidence intervals for mutations.
• Copy number calls (as determined by GISTIC or RAE) are putative.
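To make the abstraction concrete, here is a minimal sketch of how per-gene, per-sample data could be collapsed into discrete events. The integer copy number codes (-2 to +2) follow the putative copy number legend given later in the talk, and the "altered" rule mirrors the default query described in the OQL examples (amplification, homozygous deletion, or mutation). The function names, data layout, and sample values are assumptions for illustration only.

```python
# Putative copy number codes, as in the talk's GISTIC legend.
CNA_EVENTS = {
    2: "Amplification",
    1: "Gain",
    0: "Diploid",
    -1: "Hemizygous Deletion",
    -2: "Homozygous Deletion",
}


def discrete_events(cna_code, is_mutated):
    """Return the discrete genomic events for one gene in one sample."""
    return {
        "copy_number": CNA_EVENTS[cna_code],
        "mutation": "Mutated" if is_mutated else "WT",
    }


def is_altered(events):
    """Illustrative rule: amplified, homozygously deleted, or mutated."""
    return (events["copy_number"] in ("Amplification", "Homozygous Deletion")
            or events["mutation"] == "Mutated")


# Made-up values for one gene in three samples: (copy number code, mutated?).
samples = {"S1": (2, False), "S2": (0, True), "S3": (-1, False)}
events = {sid: discrete_events(*vals) for sid, vals in samples.items()}
altered = [sid for sid, ev in events.items() if is_altered(ev)]
percent_altered = 100.0 * len(altered) / len(samples)
```

Because the calls are putative, a count like `percent_altered` inherits the caveats listed above.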
New Tutorials Available
Querying a Single Cancer Study
Mutation Assessor is maintained by Boris Reva & Yevgeniy Antipin @ cBio.
Cross-Cancer Queries
How do PI3K alterations vary across ovarian and endometrial cancers?
Available soon...!
Pathway Commons (PC)
http://www.pathwaycommons.org
• Data sources: Reactome, HPRD, HumanCyc, BioGRID, MSKCC Cancer Cell Map, IMID, IntAct, MINT, NCI Nature PID
• Exchange formats: PSI-MI, BioPAX
• ID Mapping: UniProt, Entrez Gene, RefSeq
• Access: Web Site, Web Service, Batch Download
Pathway Commons, a web resource for biological pathway data. Cerami et al., Nucleic Acids Res. 2011
Network View for BRCA1/BRCA2 in TCGA Ovarian Cancer
Network Filtering, Cropping and Searching
• Filter Neighbors by Alteration (%)
• Hide selected / Show only selected / Show all
• Search by Gene Symbol
• Filter Edges by Interaction Type and/or Data Source
Node Legend
• Copy Number: Amplification, Homozygous Deletion, Gain, Hemizygous Deletion
• Mutation: Mutated
• mRNA Expression: Up-Regulated, Down-Regulated
• Alteration Frequency (%): 0 to 100
• Thick Border: seed gene; Thin Border: linker gene
Interaction Legend
• In Same Component
• Reacts With
• State Change
• Other
• Merged (multiple types)
Collaboration with Ugur Dogrusoz, Bilkent University; separately funded by National Resource for Network Biology (NRNB) grant.
Ovarian Cancer Gene Set: PTEN
Recently Added: RPPA Analysis
OncoQuery Language (OQL)

A) Onco Query Examples: Copy Number and Mutations
Steps 1-3: User selects TCGA Ovarian Cancer, All Complete Tumors, with genomic profiles: Mutations (next-gen), Putative CNA (GISTIC).
Step 4, Onco Query: Description
• RB1: Default. Shows putative amplifications, homozygous deletions, and mutations.
• RB1: MUT: Shows only mutations.
• RB1: HOMDEL MUT: Shows putative homozygous deletions and mutations.

B) Onco Query Examples: mRNA Expression Data
Steps 1-3: User selects TCGA GBM, All Complete Tumors, with genomic profile: mRNA Expression (Z-Scores).
Step 4, Onco Query: Description
• PTEN: Default. Shows mRNA up- or down-regulation at least 2 standard deviations from the mean.
• PTEN: EXP < -1: Shows only down-regulated mRNA events more than 1 standard deviation below the mean.

OncoPrint Output legend: Putative Copy Number Amplification, Putative Homozygous Deletion, Mutation, mRNA up-regulation, mRNA down-regulation
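The example queries above can be read as a gene symbol followed by optional keywords. The following is a minimal sketch of a parser and matcher for just those examples, not the portal's actual OQL parser. The `AMP` keyword, the function names, and the dictionary layout are assumptions of this sketch; only `MUT`, `HOMDEL`, and `EXP` appear on the slide.

```python
import operator
import re

# Defaults described in the examples: without keywords, a query shows
# amplifications, homozygous deletions and mutations, and expression
# events at least 2 standard deviations from the mean.
DEFAULT_DISCRETE = {"AMP", "HOMDEL", "MUT"}  # "AMP" is assumed here
DEFAULT_EXP = [(">=", 2.0), ("<=", -2.0)]

OPS = {"<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}
_EXP_RE = re.compile(r"EXP\s*(<=|>=|<|>)\s*(-?\d+(?:\.\d+)?)")


def parse_query(query):
    """Parse one line such as 'RB1', 'RB1: HOMDEL MUT' or 'PTEN: EXP < -1'."""
    gene, _, spec = query.partition(":")
    gene, spec = gene.strip(), spec.strip()
    if not spec:
        return {"gene": gene, "discrete": set(DEFAULT_DISCRETE),
                "exp": list(DEFAULT_EXP)}
    exp = [(op, float(val)) for op, val in _EXP_RE.findall(spec)]
    discrete = set(_EXP_RE.sub(" ", spec).split())
    return {"gene": gene, "discrete": discrete, "exp": exp}


def matches(parsed, events=(), z_score=None):
    """True if a sample's discrete events or z-score satisfy the query."""
    if parsed["discrete"] & set(events):
        return True
    if z_score is None:
        return False
    return any(OPS[op](z_score, val) for op, val in parsed["exp"])


default_rb1 = parse_query("RB1")
mut_only = parse_query("RB1: MUT")
pten_down = parse_query("PTEN: EXP < -1")
```

For instance, `matches(mut_only, events=["AMP"])` is false while `matches(default_rb1, events=["AMP"])` is true, which corresponds to the first two rows of example A.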
Endometrial Cancer: PIK3CA
Web Service API and R/MATLAB Packages
• Access via Web API
• Access via R Package and MATLAB Library

A) Example Query: Retrieve all Cancer Studies
http://www.cbioportal.org/public-portal/webservice.do?cmd=getCancerStudies
Output:
cancer_type_id      name                            description
tcga_gbm            Glioblastoma (TCGA)             ...
mskcc_prad          Prostate Cancer (MSKCC)         ...
mskcc_broad_sarc    Sarcoma (MSKCC/Broad)           ...
tcga_ova            Serous Ovarian Cancer (TCGA)    ...

B) Example Query: Retrieve Copy Number Data for CCNE1 in TCGA Ovarian Cancer
http://www.cbioportal.org/public-portal/webservice.do?cmd=getProfileData&case_set_id=ova_all&genetic_profile_id=ova_gistic&gene_list=CCNE1
URL parameters:
• cmd=getProfileData: Get Genomic Profile Data
• case_set_id=ova_all: Restrict to all TCGA Ovarian Cancer Samples
• genetic_profile_id=ova_gistic: Retrieve Copy Number (GISTIC) Data
• gene_list=CCNE1: Gene List
Output:
GENE_ID    COMMON    TCGA-04-1331    TCGA-04-1332    TCGA-04-1336    TCGA-04-1337
898        CCNE1     1               1               0               0

Putative Copy Number Status:
+2    Amplification
+1    Gain
 0    Diploid
-1    Hemizygous Deletion
-2    Homozygous Deletion
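The example queries above can be assembled and parsed with the Python standard library alone. This is a sketch under stated assumptions: the base URL and parameter names are taken from the slide and may no longer be served, the output is assumed to be tab-delimited with optional `#` comment lines, and values are assumed to be integer copy number codes. The helper names are hypothetical. An actual request could be made with `urllib.request.urlopen(url)`, which is deliberately not executed here.

```python
import csv
import io
from urllib.parse import urlencode

BASE_URL = "http://www.cbioportal.org/public-portal/webservice.do"


def build_url(cmd, **params):
    """Build a web service URL; parameter names follow the slide's examples."""
    return BASE_URL + "?" + urlencode({"cmd": cmd, **params})


def parse_profile_data(text):
    """Parse tab-delimited getProfileData output into {gene: {case: value}}."""
    # Skipping '#' lines is an assumption about the response format.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    result = {}
    for row in reader:
        gene = row.pop("COMMON")
        row.pop("GENE_ID")
        result[gene] = {case: int(value) for case, value in row.items()}
    return result


url = build_url("getProfileData", case_set_id="ova_all",
                genetic_profile_id="ova_gistic", gene_list="CCNE1")

# Sample response text transcribed from the slide's example output.
sample = (
    "GENE_ID\tCOMMON\tTCGA-04-1331\tTCGA-04-1332\tTCGA-04-1336\tTCGA-04-1337\n"
    "898\tCCNE1\t1\t1\t0\t0\n"
)
data = parse_profile_data(sample)
```

Here `data["CCNE1"]["TCGA-04-1331"]` is `1`, which the legend above reads as a putative gain.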
R and MATLAB Packages
• Access portal data within R via the CGDS-R package.
• Available via CRAN.
• Vignette and Reference PDF available.
R Package maintained by Anders Jacobsen; MATLAB package maintained by Erik Larsson.
Integrating with the Cancer Genome Atlas Project (TCGA)
Components shown: Data Coordination Center (DCC, all data), GDAC Broad Firehose, cBio Portal(s), TCGA Researchers, TCGA Disease Working Groups
TCGA Ecosystem
• Data Coordination Center (DCC) @ NCI: Central repository for all TCGA data.
• Firehose @ Broad: Pipeline for processing all TCGA data.
• cBio Portal @ MSKCC: Open platform for exploring, mining and visualizing TCGA data.
• UCSC Cancer Genome Browser: Web portal for exploring TCGA genomic, clinical, and image data.
• Integrative Genomics Viewer (IGV) @ Broad: High-performance visualization tool for interactive exploration of large, integrated genomic datasets.
• Oncotator @ Broad: Web application for annotating human genomic point mutations and indels with data relevant to cancer researchers.
• Mutation Assessor @ cBio: Predicted functional consequences of mutations in cancer.
• Tools at ISB: Regulome Explorer, ...
• Analysis Working Groups: Generate freeze lists, subtypes, and other case lists.
Connections shown: Multidimensional genomic profiling data; Web API; User Cross Links (Beta) for IGV and Network Visualization, e.g. RB1 CDK4 CDKN2A; Freeze lists, subtypes, and other case lists; Download of Firehose Data via DCC.
Legend: Implemented; Work In Progress; Proposed / Planned
Planned Features
• Adding Drugs and Drug Targets to the network view.
• Adding clinical features and new sort features to the OncoPrint, e.g. group/sort by MSI-Status or Histological Grade, etc.
• Improved analysis and visualization of RPPA (collaboration with Gordon Mills).
• Integration of mutation and copy number algorithm results, e.g. MutSig and GISTIC.
• Full support for DNA methylation events.
• [your idea here...]
Open Source
• Portal software is open source (GNU Lesser GPL).
• Available on Google Code:
• http://code.google.com/p/cbio-cancer-genomics-portal/
• Amazon Machine Image (AMI) also available.
• Upstream pre-processing activities required before data can be imported into the portal:
• Mutation data finalization and format.
• Discrete copy number data, e.g. GISTIC algorithm.
• Case lists.
• Some of this is currently handled by the TCGA Broad Firehose.
Acknowledgements
• cBio Portal: Nikolaus Schultz; Benjamin Gross; Arthur Goldberg; Caitlin Byrne; Anders Jacobsen; Jianjiong Gao; Erik Larsson; Selcuk Onur Sumer, Bilkent University; Sinan Sonlu, Bilkent University; Ugur Dogrusoz, Bilkent University; Chris Sander
• Collaborators: Broad Firehose Team; The TCGA Project Team
• Pathway Commons: Benjamin Gross; Emek Demir; Igor Rodchenkov, U. Toronto; Ozgün Babur; Nadia Anwar; Nikolaus Schultz; Gary D. Bader, U. Toronto; Chris Sander