Applied Bioinformatics Week 9 Jens Allmer. Theory I Gene Expression Microarray.

Applied Bioinformatics

Week 9

Jens Allmer

Theory I

• Gene Expression

• Microarray

Gene Expression

• Is there a transcript?

• How much transcript is made?

• Does the transcript differ from the DNA?

• Does the transcript differ from the annotation?

Measure Expression

• Northern/Western Blot

• qPCR

• Next generation sequencing

• Microarray

http://www.nature.com/leu/journal/v17/n7/images/2402974f1.jpg

Chip Construction

http://www.dkfz.de/gpcf/fileadmin/_migrated/RTE/RTEmagicC_AffyChipProduction.jpg.jpg

https://www.bcm.edu/cms_web/110/affy3.jpg

Chip Construction

http://www.scq.ubc.ca/wp-content/GeneChip.gif

http://angerer.swissbrain.org/expression_oveview.gif

http://www.scq.ubc.ca/wp-content/cDNAarray.gif

Bioinformatics Analysis

• Experimental design

• Standardization

• Data Analysis

– Image processing, normalization

– ...

– Clustering, Visualization

• Data Storage

List of MA Data Sources

Database | Scope | Microarray experiment sets | Sample profiles | As of date
The Cancer Genome Atlas (TCGA) | Collection of expression data for different cancers | 21229 | ? | 30-Aug-13
Stanford Microarray database | Private and published microarray and molecule abundance database | 82542 | ? | 23-Oct-11
ArrayExpress at EBI | Any curated MIAME or MINSEQE compliant transcriptomics data | 24838 | 708914 | 28-Oct-11
Gene Expression Omnibus - NCBI | Any curated MIAME compliant molecular abundance study | 25859 | 641770 | 28-Oct-11
Genevestigator | Gene expression search engine based on manually curated, well annotated public and proprietary microarray and RNA-seq datasets | 2615 | 119,400 | Aug-14
NCI mAdb | Hosts NCI data with integrated analysis and statistics tools | ? | 105,000 | Mar-12
ArrayTrack | Hosts both public and private data, including MAQC benchmark data, with integrated analysis tools | 1622 | 50,093 | Feb-12
ImmGen database | Open access across all immune system cells; expression data, differential expression, coregulated clusters, regulation | 267 | 1059 | Jan-12
UPSC-BASE | Data generated by microarray analysis within Umeå Plant Science Centre (UPSC) | ~100 | ? | 15-Nov-07
UPenn RAD database | MIAME compliant public and private studies, associated with ArrayExpress | ~100 | ~2500 | Sept. 1, 2007
GeneNetwork system | Open access standard arrays, exon arrays, and RNA-seq data for genetic analysis (eQTL studies) with analysis suite | ~100 | ~10000 | July, 2010
caArray at NCI | Cancer data, prepared for analysis on caBIG | 41 | 1741 | 15-Nov-06
UNC Microarray database | Provides the service for microarray data storage, retrieval, analysis, and visualization | ~31 | 2093 | 1-Apr-07
MUSC database | Repository for DNA microarray data generated by MUSC investigators as well as researchers in the global research community | ~45 | 555 | 1-Apr-07
UNC modENCODE Microarray database | Nimblegen customer 2.1 million array | ~6 | 180 | 17-Jul-09

Gene Expression Omnibus (GEO): Gene Expression and Molecular Abundance Data Repository

A public repository for the archiving and distribution of gene expression data submitted by the scientific community.

Holds MIAME-compliant data (MIAME: Minimum Information About a Microarray Experiment).

http://www.mged.org/Workgroups/MIAME/miame.html

Convenient for deposition of gene expression data, as required by funding agencies and journals.

Curated, online resource for gene expression data browsing, query, analysis and retrieval.

GEO Gene Expression Omnibus - TeachLine

GEO Architecture

GEO has four kinds of data records:

• Platform (GPL): the technology used and the features detected.

• Sample (GSM): preparation and description of the sample.

• Series (GSE): a set of samples and how they are related.

• DataSets (GDS): sample data collections assembled by GEO staff.

Submitters may provide raw data:

• Original microarray scans

• Raw quantification data
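
As a small illustration of the four record types, the accession prefix alone tells which kind of record an ID names. The following is a minimal Python sketch; the function name `geo_record_type` and the lookup table are made up for this example and are not part of any GEO tool.

```python
import re

# Accession prefix -> record type, as listed on the slide.
RECORD_TYPES = {
    "GPL": "Platform",
    "GSM": "Sample",
    "GSE": "Series",
    "GDS": "DataSet",
}

# A GEO accession is assumed here to be one of the four prefixes plus digits.
_ACCESSION = re.compile(r"^(GPL|GSM|GSE|GDS)(\d+)$")


def geo_record_type(accession: str) -> str:
    """Return the GEO record type for an accession such as 'GSE48874'."""
    match = _ACCESSION.match(accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return RECORD_TYPES[match.group(1)]


for acc in ("GSE48874", "GSM1186226"):
    print(acc, "->", geo_record_type(acc))
```

The two accessions used here are the ones from the practice session later in this lecture.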

GEO Architecture

• GPL: Platform descriptions. Submitted by manufacturer*.

• GSM: Raw/processed spot intensities from a single slide/chip. Submitted by experimentalists.

• GSE: Grouping of slide/chip data, “a single experiment”. Submitted by experimentalists.

• GDS: Grouping of experiments. Curated by NCBI.

GEO Home Page

Simple interface to:

• show status

• find documentation

• query data

• browse data

• submit data

Basic Search: Repository Browser

Selecting the total public data or Repository Browser links on the GEO home page takes you to the Repository Browser, which lists:

• the number of each type of submitted file, both public and unreleased

• the total number of each technology type under Platforms

• the total number of each Sample type

Basic Search: Browse Platforms

All GEO submissions need to be associated with a platform file. These describe the features on a given platform, required to understand the data.

A platform file must be submitted if one is not already present in GEO. Commercial array platform files are submitted to GEO by the manufacturer.

Basic Search: Browse Platforms

Accession: GEO ID

Title: brief description of platform

Contact: submitter

Samples: number of samples in GEO associated with platform ID

Technology: platform type

Release date: when file is publicly accessible

The table can be sorted on any field except organism by clicking on the header. Specific platform files can be found using the ‘Find Platform’ option.

Basic Search: Find Platforms

• Select ‘Find Platform’

• Select company

• Select distribution

• Select species

• Enter title keyword

Basic Search: Find Platforms (continued)

• Start the platform search

• Select the accession for the U133 plus 2.0 array

• Scroll down to find data table information

Data Retrieval: Browse Series

Data is submitted to GEO as a Series, which represents the experiment design.

Selecting Browse>Series brings up a list sorted by release date. Selecting a Series ID brings up the Series file summary.

Data Retrieval: Series Accession Page

GEO Accession Results Display Options

All GEO accession results pages have the same header, which allows different views and formats for the data to be displayed.

Scope controls what information is displayed: Self; Platform, Samples or Series; Family

Format controls how information is displayed: HTML; SOFT (Simple Omnibus Format in Text); MINiML (MIAME Notation in Markup Language)

Amount controls how much information is displayed: Brief; Quick; Full; Data
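
The three display options can also be set directly in the accession page URL. The sketch below builds such a URL in Python without contacting the server. The parameter names (`targ`, `form`, `view`) and their values are assumptions based on how GEO's `acc.cgi` links are commonly written (with `text` for SOFT, `xml` for MINiML and `all` for Family), and the helper name `geo_accession_url` is made up for this example, so check the current GEO documentation before relying on it.

```python
from urllib.parse import urlencode

GEO_ACC_URL = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# Assumed mapping of the header options to URL parameter values.
SCOPES = {"self", "gpl", "gsm", "gse", "all"}   # "all" = Family
FORMATS = {"html", "text", "xml"}               # text = SOFT, xml = MINiML
AMOUNTS = {"brief", "quick", "data", "full"}


def geo_accession_url(acc, scope="self", fmt="html", amount="quick"):
    """Build an accession display URL; no network request is made."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    if amount not in AMOUNTS:
        raise ValueError(f"unknown amount: {amount!r}")
    query = urlencode({"acc": acc, "targ": scope, "form": fmt, "view": amount})
    return f"{GEO_ACC_URL}?{query}"


# Brief SOFT-format view of a Series record.
print(geo_accession_url("GSE48874", scope="self", fmt="text", amount="brief"))
```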

Data Retrieval: Series Accession Page

Biological sample summary

Design summary

Publication information

Platform (total)

Samples (total)

Data Retrieval: Sample File Summary

Sample preparation

Hybridization and data processing

Platform

Series

Data Retrieval: Sample File Data Table

Data table field descriptions

Truncated data table from Quick view

Total data rows and file size

Supplementary raw data file

Querying GEO with IDs from Papers

A common way to access GEO data is through accessions cited in papers. Online journals include hyperlinks to the GEO accession page. Alternatively, enter the accession into the Query>GEO accession text box on the GEO home page.

GEO Links in PubMed Search Results

One option for displaying PubMed search results is GEO DataSet links. When present, the results page is actually from Entrez GEO DataSets.

Advanced Searches

GEO data can be queried as:

• Datasets: experiment-centric view using Entrez GEO DataSets

• Gene profiles: gene-centric view using Entrez GEO Profiles

Selecting either takes you to a similar Entrez introduction page.
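
The same two views can be reached programmatically through NCBI's Entrez E-utilities. The sketch below only constructs an ESearch URL and sends no request. The database names `gds` (GEO DataSets) and `geoprofiles` (GEO Profiles) are stated here as assumptions about the Entrez database identifiers, and `geo_esearch_url` is a name made up for this example.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Assumed Entrez database identifiers for the two GEO views.
ENTREZ_DB = {
    "datasets": "gds",          # experiment-centric view
    "profiles": "geoprofiles",  # gene-centric view
}


def geo_esearch_url(view, term, retmax=20):
    """Build an ESearch URL for one of the two GEO views."""
    if view not in ENTREZ_DB:
        raise ValueError(f"view must be one of {sorted(ENTREZ_DB)}")
    params = {"db": ENTREZ_DB[view], "term": term, "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


print(geo_esearch_url("datasets", "GSE48874"))
print(geo_esearch_url("profiles", "TP53"))
```

The gene symbol `TP53` is only a placeholder search term.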

Querying GEO DataSets

Start a GEO DataSets search with the Query>DataSets text box. This brings up an Entrez GEO DataSets results form.

Total results

Number of DataSets

Number of Platforms

Number of Series

DataSet Search Result

DataSet ID

Description

Platform

Reference Series

Supplementary files

Number of Samples and truncated list

Cluster image

Select the DataSet ID or click on the cluster image to go to the DataSet record.

GEO DataSet Record

Experiment design and DataSet information.

Sample and analysis information. Data retrieval.

Selecting analysis takes you to the data clustering interface.

Selecting the cluster image takes you to the clustering page

GEO Gene Profiles

GEO DataSet ID

Platform ID, Platform Feature ID

Gene description

Target sequence accession

Expression profile

GEO Gene Profiles use gene IDs from Platform files to show the expression of a gene across DataSets.

Entering a gene ID into the Query>Gene profiles text box takes you to the Entrez results page.

GEO BLAST

On the GEO BLAST page, enter sequences in FASTA format or GenBank accessions, or select sequence files on local disks, for blastn comparisons.

These are compared to GenBank sequences listed in Platform files associated with GEO DataSets.

From the BLAST result page, select the ‘E’ option to the right of an alignment to show GEO Gene Profiles for that sequence in GEO DataSets.

E button

End Theory I

• Mindmapping

• 10 min break

Practice I

• Gene Expression Omnibus

– http://www.ncbi.nlm.nih.gov/geo/

NCBI GEO

• Take 15 minutes to browse the website

Repository

• Go to the repository browser

– http://www.ncbi.nlm.nih.gov/geo/summary/

• Explore the available tabs

• What kind of different data is available?

Where is the actual data?

• Try to find the following accessions:

– GSE48874

– GSM1186226
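
One place to look for the underlying files is the GEO download area. The sketch below derives the directory for an accession, assuming the commonly described layout in which records are grouped into folders whose names replace the last three digits with `nnn`. Both the layout and the host name are assumptions to verify against the GEO download documentation, and `geo_ftp_dir` is a name made up for this example. The code does not check that the directory actually exists.

```python
# Assumed top-level folders of the GEO download area, by accession prefix.
KIND_DIRS = {
    "GSE": "series",
    "GSM": "samples",
    "GPL": "platforms",
    "GDS": "datasets",
}


def geo_ftp_dir(accession):
    """Return the assumed download directory URL for a GEO accession."""
    prefix, digits = accession[:3].upper(), accession[3:]
    if prefix not in KIND_DIRS or not digits.isdigit():
        raise ValueError(f"not a GEO accession: {accession!r}")
    # GSE48874 -> GSE48nnn; accessions with three or fewer digits -> GSEnnn.
    stub = prefix + digits[:-3] + "nnn"
    return f"https://ftp.ncbi.nlm.nih.gov/geo/{KIND_DIRS[prefix]}/{stub}/{prefix}{digits}/"


for acc in ("GSE48874", "GSM1186226"):
    print(geo_ftp_dir(acc))
```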

End Practice I

• 15 min break

Theory II

• Next generation sequencing

Microarray vs NGS

[Figure: number of papers in PubMed per publication year, 1990 to 2015, for Microarray and NGS; y-axis from 0 to 1400]

Doug Brutlag 2011

The Human Genome

How fast is the cost going down?

• 2006: $50 million

• 2008: $500,000

• 2009: $50,000

• 2010: $20,000

• 2011: $5,000

• 2012: ??? $1,000

Thanks to Serafim Batzoglou
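
To put a number on "how fast", the figures on this slide can be turned into an average yearly fold decrease. The short Python sketch below uses only the values listed above and leaves out the 2012 entry because the slide marks it as uncertain. The function name `annual_fold_decrease` is made up for this example.

```python
# Cost per human genome in US dollars, as listed on the slide.
# 2012 is omitted because it is marked "???" there.
COSTS = {
    2006: 50_000_000,
    2008: 500_000,
    2009: 50_000,
    2010: 20_000,
    2011: 5_000,
}


def annual_fold_decrease(costs):
    """Geometric mean fold decrease per year between first and last year."""
    first, last = min(costs), max(costs)
    total_fold = costs[first] / costs[last]
    return total_fold ** (1 / (last - first))


total = COSTS[2006] / COSTS[2011]
print(f"Total decrease 2006 to 2011: {total:,.0f}-fold")
print(f"Average decrease per year: {annual_fold_decrease(COSTS):.1f}-fold")
```

From 2006 to 2011 the cost falls 10,000-fold, which averages out to roughly a 6.3-fold drop every year.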

Platforms

• Roche/454 FLX: 2004

• Illumina Solexa Genome Analyzer: 2006

• Applied Biosystems SOLiD™ System: 2007

• Helicos HeliScope™: recently available

• Pacific Biosciences SMRT: launching 2010

• And many more

Doug Brutlag 2011

Pacific Biosciences Sequencing

Applications of next-generation sequencing

Jay Shendure & Hanlee Ji, Nature Biotechnology 26, 1135 - 1145 (2008)

OK, but where is the data?

• http://www.1000genomes.org/

• http://trace.ddbj.nig.ac.jp/dra/index_e.html

• http://www.ebi.ac.uk/ena

• http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=studies

End Theory II

• Mindmapping

• 10 min break

Practice II

NCBI

• http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi

• Browse the webpage for 15 minutes

Available Data

• Search for human data

• How much data is available?

• Find accession ERX628533

• How large is the dataset?

• Why is it so large?

End Practice II

Homework

• Select one next-generation sequencing platform and give a step-by-step description of how it works

• Max 500 words and at most 5 figures.

http://dx.doi.org/10.6084/m9.figshare.100940

http://www.nature.com/nbt/journal/v26/n10/fig_tab/nbt1486_F1.html

http://www.nature.com/nbt/journal/v26/n10/fig_tab/nbt1486_F2.html

http://www.nature.com/nbt/journal/v26/n10/fig_tab/nbt1486_F3.html
