Functional Genomics I
MED263: Bioinformatics Applications to Human Disease

Jason Young | Email: [email protected] | MED 263 | Winter 2015


What You Will Learn Today...

• Functional genomic methods for gene expression analysis

• Typical workflow for a gene expression study

• Aspects of microarray data analysis

• Kicic et al. (2010): Example of a differential expression microarray study


The Central Dogma of Biology


Genomics

Transcriptomics

Proteomics



Functional Genomics

Functional genomic studies are often labeled as “descriptive” rather than “hypothesis-driven” research.

“Fishing Expeditions”

Use functional genomics data to generate hypotheses that can then be tested with further experimentation.

“Without speculation there is no good and original observation” - Charles Darwin


History of Transcript Analysis

Pre-Functional Genomics: One transcript at a time
- Northern blotting (1977)
- Reverse Transcriptase PCR (RT-PCR)
- RNase protection
* Highly quantitative, still essential for validation of functional genomic gene expression results

Functional Genomics: Many transcripts at a time
- cDNA libraries (early 1990s)
- Serial Analysis of Gene Expression (SAGE) (mid 1990s)
- DNA Microarrays (late 1990s)
- RNA-Seq (late 2000s)
* Less quantitative, but provide a rapid, broad overview of genome-wide transcript abundance


cDNA Libraries

cDNA Library Construction
1. Isolate mRNA from organism, cell type, developmental stage, or physiological condition
2. Reverse transcribe to cDNA
3. Clone into a vector for propagation in bacteria
4. Sequence cDNA inserts to produce Expressed Sequence Tags (ESTs)
5. ESTs represent a sampling of the expression repertoire of the original samples

Shotgun Single-Pass Approach: Adams et al. 1991, 1993


Caveats of cDNA libraries and ESTs
1. Time consuming and laborious
2. Depth of sequencing of library determines how well rare transcripts are represented (counting)
3. Incomplete transcripts often present (5’ end missing)
4. Clones can be used to express protein products in addition to measuring ESTs
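
Caveat 2 can be made concrete with a little arithmetic. As a rough sketch, assume each sequenced clone is an independent random draw from the mRNA pool (an idealization, not something the slides state). Then a transcript that makes up a fraction f of the pool is seen at least once among N ESTs with probability 1 - (1 - f)^N. The function names and the example numbers below are illustrative only.

```python
import math


def detection_probability(fraction: float, n_clones: int) -> float:
    """Chance of sampling a transcript at least once in n_clones random draws."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if n_clones < 0:
        raise ValueError("n_clones must be non-negative")
    return 1.0 - (1.0 - fraction) ** n_clones


def clones_needed(fraction: float, target: float = 0.95) -> int:
    """Smallest number of clones that reaches the target detection probability."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be strictly between 0 and 1")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - fraction))


if __name__ == "__main__":
    rare = 1e-5  # hypothetical transcript: 1 in 100,000 mRNA molecules
    for depth in (1_000, 10_000, 100_000):
        p = detection_probability(rare, depth)
        print(f"{depth:>7} clones: P(seen at least once) = {p:.3f}")
    print("clones for 95% detection:", clones_needed(rare))
```

Under this model, shallow libraries will usually miss rare transcripts entirely, which is why sequencing depth matters for any counting-based method.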


Serial Analysis of Gene Expression (SAGE)

SAGE Library Construction
1. Isolate mRNA from organism, cell type, developmental stage, or physiological condition
2. Reverse transcribe to cDNA (with biotin tag)
3. Cleave with anchoring enzyme (AE) and attach to beads
4. Divide into two pools and ligate distinct linkers (A & B)
5. Cleave using blunt end tagging enzyme (TE)
6. Perform blunt ligation to generate ditags
7. Concatenate, clone and sequence

Velculescu et al. 1995


Caveats of SAGE libraries
1. Still time consuming and laborious (specialized skill), but shorter tags make SAGE more cost efficient than EST libraries
2. Only detects the 3’ end of transcripts and relies on the presence of an appropriately spaced anchoring enzyme (AE) site
3. Relies on counting, like ESTs; counts for genes expressed at low levels are difficult to reproduce
4. Like cDNA libraries, no need for knowledge of genome sequence to obtain tags (not true of microarrays)
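
The "counting" in caveat 3 amounts to tallying how often each tag sequence appears and scaling by library size so that libraries of different depth can be compared. The sketch below is a minimal illustration under assumptions: the tag strings are made up, and tags-per-million is one common scaling choice, not a method given in the slides.

```python
from collections import Counter


def count_tags(tags):
    """Tally occurrences of each tag sequence (case-insensitive)."""
    return Counter(tag.upper() for tag in tags)


def tags_per_million(counts):
    """Scale raw tag counts to a library size of one million tags."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n * 1_000_000 / total for tag, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical tags extracted from sequenced concatemers.
    library = ["AAACGTTGCA", "aaacgttgca", "GGGTTTACCA", "TTTAGGCATC"]
    counts = count_tags(library)
    for tag, tpm in sorted(tags_per_million(counts).items()):
        print(tag, counts[tag], f"{tpm:.0f}")
```

A tag seen only once or twice carries large sampling noise, which is one way to read the reproducibility problem for low-abundance genes.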



DNA Microarrays

Sequences need to be known for probe design

Relies on hybridization rather than sequence counting

Fast: Can obtain genome-wide transcript levels in days

Comprehensive: Entire transcriptomes can be represented on one array

Flexible: Probes against any gene can be represented on a chip.

Affordable: Technology is >10 years old.


Types of DNA Microarrays

Spotted Array
• Generally 60-80 nucleotides
• Spotted mechanically
• Generally <10k features
• +s: flexibility
• -s: low-density, reproducibility
• Dual color (intra array)

In Situ Synthesized
• 25-80 nucleotides
• Generated using photolithography
• >1 million features, static
• +s: high-density, reproducibility
• -s: flexibility
• Single or dual color (inter or intra array)

Affymetrix / Nimblegen (Roche) / Agilent / Illumina

Cy5 (Red)

Cy3 (Green)


Affymetrix GeneChips
• Traditionally have dominated the market
• 11-20 distinct 25nt probes measure expression of each gene
• Attempted to account for non-specific hybridization to perfect match (PM) probes using mismatch (MM) probes
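
One simple way to use PM/MM pairs is to subtract each MM intensity from its PM partner and average the differences across the probe set. The sketch below illustrates that idea only; the function name and intensity values are hypothetical, and it is not the algorithm of any particular software package.

```python
from statistics import mean


def average_difference(pm, mm):
    """Mean of PM minus MM over the probe pairs of one gene."""
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and the same length")
    return mean(p - m for p, m in zip(pm, mm))


if __name__ == "__main__":
    # Hypothetical scanner intensities for a three-pair probe set.
    pm = [100, 200, 300]
    mm = [50, 100, 150]
    print(average_difference(pm, mm))
```

Note that the result goes negative whenever the MM probes are, on average, brighter than their PM partners, so a plain subtraction does not always give a usable expression value.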

Nimblegen (Roche) - Madison, WI

Agilent

Illumina

23 & Me

But Wait!?! Who uses microarrays anymore?!?


Microarrays vs NGS

Q1: Do you know what transcripts you’re looking for?

Microarrays: Yes
NGS: No


Q2: Do you have a lot of money to spend on experiments?

Microarrays: No
NGS: Yes

Q3: Do you want to rely on the most well-tested and developed methods?

[Figures from: David M. Rocke, UC-Davis]

Microarrays: Yes
NGS: No

Other Microarray Applications

• Genotyping arrays

• Methylation arrays

• Target enrichment (pre-sequencing)

• Rapid pathogen detection in-the-field (Influenza sub-typing)

• Protein arrays (parallelized ELISA)

• Antibody arrays

• High-throughput standardized testing (drug development) ($$$)

1 Knife = 1 Knife

Microarrays NGS

Microarray Gene Expression Workflow

1. Experimental Design

2. RNA Isolation and Labeling

3. Hybridization

4. Preprocessing

5. Data Analysis

6. Biological Confirmation


Experimental Design

Define Biological Question and Samples Needed
• Tissue comparison. Ex. Regions of the brain
• Time course. Ex. Pathogen life cycle
• +/- Treatment. Ex. Drug treatment

Determine Appropriate Array Platform and Labeling Procedures
• Are arrays commercially available for your purpose?
• 1 or 2 color labeling needed? (2-color requires reverse labeling)
• Amount of material needed? (1 to 5 µg total RNA/sample)
• Make sure probes are randomized on an array

Plan entire workflow ahead of time to maximize experimental control
• Prepare a well-defined sample preparation procedure!
• Do all steps for samples in parallel if possible, from RNA isolation, to labeling, to hybridization, to scanning (same person and machine too).
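
The "reverse labeling" requirement for 2-color arrays can be planned before any bench work. As a minimal sketch, assuming each treated sample is paired with one control, the code below lists two arrays per pair with the dyes swapped, then draws a reproducible random processing order. The sample names, function names, and seed are all hypothetical.

```python
import random


def dye_swap_design(pairs):
    """Return two arrays per (treated, control) pair, with dyes reversed."""
    arrays = []
    for treated, control in pairs:
        arrays.append({"Cy5": treated, "Cy3": control})
        arrays.append({"Cy5": control, "Cy3": treated})  # reverse labeling
    return arrays


def randomized_run_order(items, seed=0):
    """Shuffle a copy of items so processing order is not tied to group."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled


if __name__ == "__main__":
    pairs = [("drug_1", "ctrl_1"), ("drug_2", "ctrl_2")]
    for number, array in enumerate(dye_swap_design(pairs), start=1):
        print(f"array {number}: Cy5={array['Cy5']}  Cy3={array['Cy3']}")
    print(randomized_run_order(["drug_1", "ctrl_1", "drug_2", "ctrl_2"], seed=1))
```

Swapping the dyes lets any dye-specific bias be separated from the treatment effect, and a fixed seed keeps the run order reproducible for the lab notebook.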


RNA Isolation and Labeling

Isolate RNA
• Total RNA with Trizol
• Further isolation of mRNA if needed
• Assess quality of RNA

Agilent 2100 Bioanalyzer
• Calculates RNA Integrity Number (RIN)
• Examines the entire electrophoretic trace of the RNA sample including the presence/absence of degradation products


RNA and Probe Preparation

Direct Labeling

Indirect Labeling
• Improved efficiency of nucleotide incorporation


Affymetrix Protocol: Indirect labeling w/ Amplification (1 color)

1. Reverse Transcription
2. In Vitro Transcription to produce cRNA (signal amplification)


Hybridization

Affymetrix Protocol

1. Pre-Hyb (10’)
2. Hyb (16hr)
3. Streptavidin-Phycoerythrin (SAPE)
4. anti-SA Ab-biotin (more signal amplification!)
5. SAPE
6. Scan

~3 days from RNA isolation to scan


RNA and Probe Preparation

Why all the signal amplification?

1-5 µg total RNA

Jason Young | Email: [email protected] | MED 263 | Winter 2015

Page 66: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Microarray Gene Expression Workflow

1. Experimental Design

2. RNA Isolation and Labeling

3. Hybridization

4. Preprocessing

5. Data Analysis

6. Biological Confirmation


Preprocessing

Goal: To remove the systematic bias in the data as completely as possible while preserving the variation in gene expression that occurs because of biologically relevant changes in transcription.


Preprocessing

Steps:

• Quantitation - Convert image into a series of numbers (image analysis) (.CEL files).
• Data import - Data must be collated from different formats housed in different files/databases.
• Quality assessment - Detects divergent measurements beyond the level of random fluctuations.
• Background adjustment - Adjustment of observed expression levels to account for non-specific hybridization (noise).
• Normalization - Allows for arrays to be compared to one another by controlling for different efficiencies of reverse transcription, labeling or hybridization reactions, physical problems with the arrays, reagent batch effects and different laboratory conditions.
• Summarization - Combines multiple probe intensities for a particular gene to produce a single expression value for that gene.


Quality Assessment

What do you see?

• First thing to do: obtain an overview of array signal.
• Box plots and histograms.
• Identify outlier arrays by examining probe intensities across all arrays at once.
• Array “f” appears to stand out in the box plot (Note: normalization can often correct this difference).
• Array “a” appears to have a bimodal distribution in the histogram, which usually indicates a spatial artifact, i.e. a large section of the array has abnormally high values.

[Figure: box plot of log intensity by array (x-axis: Arrays, y-axis: log Intensity) and histogram of log intensity]
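As a rough numerical companion to the box plot, the sketch below compares each array's median log2 intensity with the median across arrays and flags arrays that sit far away. The function name `flag_outlier_arrays`, the 1.0 log2-unit threshold and the toy intensities are all assumptions made for illustration; they are not taken from the lecture or from any particular package.

```python
import math
import statistics


def flag_outlier_arrays(intensities, threshold=1.0):
    """Return names of arrays whose median log2 intensity differs from the
    median of all per-array medians by more than `threshold` log2 units.

    `threshold` is an arbitrary illustrative cutoff, not a standard value.
    """
    medians = {
        name: statistics.median(math.log2(v) for v in values)
        for name, values in intensities.items()
    }
    center = statistics.median(medians.values())
    return sorted(name for name, m in medians.items() if abs(m - center) > threshold)


# Toy raw probe intensities for three hypothetical arrays.
arrays = {
    "a": [100, 200, 400, 800],
    "b": [110, 190, 420, 790],
    "f": [800, 1600, 3200, 6400],  # uniformly brighter than the others
}
outliers = flag_outlier_arrays(arrays)
print(outliers)
```

A check like this only catches overall brightness shifts; a bimodal histogram or a spatial artifact still needs to be inspected visually.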


Raw Image Inspection

[Figure: example array image artifacts: crop circles, ring of fire, full moon, tricolor, thumb print, arcs]

http://plmimagegallery.bmbolstad.com/


Background Adjustment

• Background noise distribution calculated using negative controls or empty spots
• Subtract background noise from raw probe intensities
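The two bullets above can be sketched in a few lines of Python. This is a deliberately simplified illustration: it summarizes the background by the mean of the negative-control intensities and floors the result at a small positive value so that a later log transform stays defined. Both choices, and the name `background_adjust`, are assumptions; real pipelines typically model the background distribution rather than subtracting a single number.

```python
import statistics


def background_adjust(probe_intensities, negative_controls, floor=1.0):
    """Subtract an estimated background level from raw probe intensities.

    The background is taken as the mean of the negative controls
    (an assumption for this sketch). Values are floored at `floor`
    so that no adjusted intensity is zero or negative.
    """
    background = statistics.mean(negative_controls)
    return [max(value - background, floor) for value in probe_intensities]


raw = [120.0, 95.0, 540.0, 2300.0]   # hypothetical raw probe intensities
controls = [90.0, 100.0, 110.0]      # hypothetical negative-control spots
adjusted = background_adjust(raw, controls)
print(adjusted)
```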


Why do we need normalization?

• Some arrays are brighter than others.
• This is not due to the biological data but to unavoidable experimental differences.
• Goal: Normalization aims to correct this kind of difference without altering the biological data, so that cross-array analyses (differential expression, etc.) can be conducted.

[Figure: array intensities before normalization and after normalization]


Scatter Plots

Simple to compare inter-array expression, no normalization

Genes on the 45-degree line are expressed the same in both

1 - Higher expressed genes in Control
2 - Higher expressed genes in Downs
3 - Low expression genes in both
4 - High expression genes in both


Why Log Transformation?

• Experimentalists using microarrays are very often interested in fold change.
• Log scale provides symmetry in expression ratios.
• Example in raw ratio space: 2-fold up-regulation = 2, but 2-fold down-regulation = 0.5.
• Without transformation, all down-regulated fold changes are compressed between 0 and 1.

             t1    t2    t3
Raw Ratio    1     2     0.5
Log2 Ratio   0     1     -1
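The symmetry in the table can be checked directly. The short sketch below takes the three raw ratios (the dictionary keys simply reuse the column labels t1, t2, t3) and converts them with `math.log2`; a 2-fold increase and a 2-fold decrease end up equally far from zero.

```python
import math

# Raw expression ratios from the table above.
raw_ratios = {"t1": 1.0, "t2": 2.0, "t3": 0.5}

# On the log2 scale, up- and down-regulation of the same magnitude
# are symmetric around 0.
log2_ratios = {label: math.log2(ratio) for label, ratio in raw_ratios.items()}
print(log2_ratios)
```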


MA Plots

MA plots are used to determine whether data need normalization and to assess whether the normalization worked (essentially a sideways scatter plot).

M = log fold change for gene x
A = average log intensity for gene x

[Figure: MA plots of Array1/Array2, before normalization and after normalization]

• A local regression (LOESS) curve can be fitted to the scatter plot to summarize non-linear data.

• A LOESS curve that oscillates and/or has variability of M values greater than other arrays indicates an issue.


• Instead of 1-to-1 comparisons, each array can also be compared to a “synthetic” array calculated by taking probe-wise medians
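The definitions of M and A, and the probe-wise median "synthetic" array, can be sketched as follows. The function names `ma_values` and `synthetic_reference` and the toy intensities are hypothetical; fitting the LOESS curve itself is left out because it needs a smoothing library.

```python
import math
import statistics


def ma_values(array1, array2):
    """Return (M, A) lists for two arrays of raw probe intensities.

    M = log2(array1) - log2(array2)       (log fold change)
    A = (log2(array1) + log2(array2)) / 2 (average log intensity)
    """
    m = [math.log2(x) - math.log2(y) for x, y in zip(array1, array2)]
    a = [(math.log2(x) + math.log2(y)) / 2 for x, y in zip(array1, array2)]
    return m, a


def synthetic_reference(arrays):
    """Build a 'synthetic' array from the probe-wise medians of all arrays."""
    return [statistics.median(values) for values in zip(*arrays)]


array1 = [8.0, 64.0, 1024.0]
array2 = [4.0, 64.0, 2048.0]
array3 = [16.0, 64.0, 512.0]

m, a = ma_values(array1, array2)
reference = synthetic_reference([array1, array2, array3])
m_ref, a_ref = ma_values(array3, reference)
print(m, a, reference, m_ref)
```

Plotting M against A for each array versus the synthetic reference gives one MA plot per array instead of one per pair of arrays.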


Normalization Strategies

Simplest idea:
• Calculate median expression from all arrays.
• Do global normalization by multiplying all probes by a normalization constant.

Global Normalization

                               Array 1   Array 2
Median Expression              5,000     10,000
Normalization Factor           2         1
Normalized Median Expression   10,000    10,000

However...

Often there is a non-linear dependence on intensity
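A minimal sketch of global normalization as described above: each array is multiplied by one constant so that its median matches a target. Using the largest per-array median as the target is an assumption chosen here so that the factors come out as 2 and 1, like the example values; the probe intensities and the name `global_normalize` are made up for illustration.

```python
import statistics


def global_normalize(arrays, target=None):
    """Scale every array by a single constant so all medians equal `target`.

    If `target` is None, the largest per-array median is used
    (an assumption for this sketch).
    """
    medians = {name: statistics.median(values) for name, values in arrays.items()}
    if target is None:
        target = max(medians.values())
    factors = {name: target / med for name, med in medians.items()}
    normalized = {
        name: [value * factors[name] for value in values]
        for name, values in arrays.items()
    }
    return factors, normalized


# Hypothetical probe intensities with medians of 5,000 and 10,000.
arrays = {
    "Array 1": [2500.0, 5000.0, 7500.0],
    "Array 2": [4000.0, 10000.0, 30000.0],
}
factors, normalized = global_normalize(arrays)
print(factors)
```

Because a single constant is applied per array, this cannot correct a bias that changes with intensity, which is the limitation the slide points to.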


Normalization Strategies

Global Normalization - NGS

Before

           Gene 1   Gene 2   Gene 3   Gene 4   Total Reads
Sample 1   10,000   100      150      200      10,450
Sample 2   20,000   10       150      200      20,360

After

           Gene 1   Gene 2   Gene 3   Gene 4   Total Reads
Sample 1   14,742   147      221      295      15,405
Sample 2   15,133   8        113      151      15,405
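The scaling in this example can be reproduced with a short sketch. It assumes the common target depth is the mean of the two samples' total reads, which gives 15,405 for these counts, and rounds each scaled count to the nearest integer. The function name `scale_to_common_depth` is hypothetical.

```python
def scale_to_common_depth(counts):
    """Scale each sample's read counts so every sample has the same total.

    The target total is the mean of the per-sample totals (an assumption
    that matches the 15,405 total in the example above).
    """
    totals = {sample: sum(genes.values()) for sample, genes in counts.items()}
    target = sum(totals.values()) / len(totals)
    return {
        sample: {gene: round(count * target / totals[sample])
                 for gene, count in genes.items()}
        for sample, genes in counts.items()
    }


counts = {
    "Sample 1": {"Gene 1": 10000, "Gene 2": 100, "Gene 3": 150, "Gene 4": 200},
    "Sample 2": {"Gene 1": 20000, "Gene 2": 10, "Gene 3": 150, "Gene 4": 200},
}
scaled = scale_to_common_depth(counts)
for sample, genes in scaled.items():
    print(sample, genes)
```

Genes 3 and 4 have identical raw counts in both samples, yet differ after scaling because Gene 1 dominates the totals. That is a useful caution about relying on a single global factor.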



Normalization Strategies

Parametric methods:

Force distributions (not just medians) to be the same:
• Amaratunga and Cabrera (2001)
• Bolstad et al. (2003)

Use curve estimators such as splines to adjust for the effect:
• Li and Wong (2001)
• Colantuoni et al. (2002)
• Dudoit et al. (2002)

Adjustments based on additive/multiplicative model:
• Rocke and Durbin (2003)
• Huber et al. (2002)
• Cui et al. (2003)

Quantile Normalization (non-parametric)
• Bolstad et al. (2003)
• Every probe value on any one chip is mapped to the corresponding quantile of the standard distribution; hence quantile normalization
• The average of all available arrays can be used to form an average empirical distribution
• Simple & effective!

[Figure: panels I-VI; axis label: Arrays]
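Quantile normalization as described above can be sketched in plain Python: sort each array, average across arrays at each rank to form the empirical reference distribution, then give every probe the reference value for its rank. This is a simplified illustration rather than the implementation used by any particular package; the name `quantile_normalize` and the toy data are assumptions, and ties are broken by position instead of being averaged.

```python
def quantile_normalize(arrays):
    """Quantile-normalize a list of equal-length arrays of probe intensities.

    Returns (normalized_arrays, reference), where `reference` is the mean
    of the sorted arrays at each rank. Ties within an array are broken by
    original position (a simplification).
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(values) for values in arrays]
    reference = [sum(column) / len(arrays) for column in zip(*sorted_arrays)]

    normalized = []
    for values in arrays:
        # Indices of the probes ordered from lowest to highest intensity.
        order = sorted(range(n_probes), key=lambda i: values[i])
        result = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            result[probe_index] = reference[rank]
        normalized.append(result)
    return normalized, reference


arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized, reference = quantile_normalize(arrays)
for row in normalized:
    print(row)
```

After this step every array contains exactly the same set of values, and only the assignment of values to probes differs between arrays.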

Normalization Strategies

[Figure: Before Normalization]

[Figure: After Normalization]

MAS5.0, RMA, GCRMA

MAS 5.0 (Microarray Suite - Affymetrix):
• Adjusts for background noise by subtracting the MM signal from the PM signal, but this is an over-adjustment.
• MM probes also detect specific signal (they capture specific + non-specific binding), so about a third of all MM probes are brighter than their PM counterparts.

RMA (Robust Multi-array Average):
• Increases precision but sacrifices some accuracy by using a background adjustment step that corrects PM probe intensities chip by chip but ignores MM intensities.
• Also uses quantile normalization.

GCRMA (GeneChip Robust Multiarray Averaging):
• Similar to RMA, but corrects background using sequence data of probes to account for non-specific binding (NSB).
• MM probes not ignored; improved precision and accuracy.
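A small sketch of why plain PM minus MM subtraction is an over-adjustment. The intensities below are hypothetical, and the subtraction is the naive version described on the slide rather than the full MAS 5.0 algorithm.

```python
import math

# Hypothetical perfect-match (PM) and mismatch (MM) intensities for one probe set.
pm = [220.0, 180.0, 95.0, 310.0, 140.0, 75.0]
mm = [100.0, 210.0, 60.0, 120.0, 90.0, 130.0]

# Naive background correction: subtract each MM from its PM partner.
diff = [p - m for p, m in zip(pm, mm)]

# Probe pairs where MM is brighter than PM give a negative "signal".
negative = [d for d in diff if d < 0]
fraction_negative = len(negative) / len(diff)

# Log2 expression values are only defined for positive differences.
log2_signal = [math.log2(d) if d > 0 else None for d in diff]
```

In this made-up probe set a third of the pairs end up negative and cannot be log-transformed, which is the practical problem that RMA sidesteps by ignoring MM intensities.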


Page 113: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Microarray Gene Expression Workflow

1. Experimental Design

2. RNA Isolation and Labeling

3. Hybridization

4. Preprocessing

5. Data Analysis

6. Biological Confirmation



Page 126: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

How to Identify Differential Expression

1. Calculate expression ratio and rank order
Problems:
• What threshold? Background subtraction? Ex. with a two-fold change threshold and a background of 50: 150/100 = 1.5 is not significant (NS), but after background subtraction 100/50 = 2 is significant (S)!

2. Percentage
Problems:
• What threshold? What is significant? There is always a top 5%.

3. Statistical Tests (t-test, ANOVA)
Null hypothesis: there is no difference in a gene’s expression between groups.
Example: 7 treated, 7 untreated cells
A p-value below a threshold x means that, if the null hypothesis were true, a difference at least this large would arise by chance less than x of the time (assuming a normal distribution).

Multiple Testing Problem:
• 1 gene, p = 0.05 means a 5% chance of a false positive. (OK)
• 100 genes, p = 0.05 means about 5 would be expected to be false positives by chance alone. (!!!!)
Example: Assuming the 100 tests are statistically independent, the probability of obtaining at least one significant result by chance is $1 - 0.95^{100} \approx 0.994$.

Solution:
• Filter out non-expressed genes to limit the number of tests
• Use a correction for multiple tests (False Discovery Rate-adjusted p-value)
• Dividing the p-value threshold by the number of tests (p / # tests) is too stringent!
• $\mathrm{FDR} = \dfrac{\#\ \text{false positives}}{\#\ \text{called significant}}$
• Benjamini-Hochberg procedure (1995): produces an adjusted p-value
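The background-subtraction example from method 1 can be worked through directly. This is a minimal sketch using the slide's numbers; the function name `fold_change` and the idea of a single constant background are assumptions made for illustration.

```python
def fold_change(treated, control, background=0.0):
    """Ratio of treated to control intensity after subtracting a constant background."""
    return (treated - background) / (control - background)

THRESHOLD = 2.0  # two-fold change cutoff from the example

raw_ratio = fold_change(150.0, 100.0)                          # 150 / 100
corrected_ratio = fold_change(150.0, 100.0, background=50.0)   # 100 / 50

raw_call = raw_ratio >= THRESHOLD              # not significant without correction
corrected_call = corrected_ratio >= THRESHOLD  # significant after correction
```

The same pair of measurements passes or fails the cutoff depending only on whether the background is subtracted, which is why a bare ratio threshold is fragile.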

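The multiple-testing arithmetic and the Benjamini-Hochberg adjustment can be sketched as follows. This is an illustrative implementation under the usual textbook definition (scale the i-th smallest of m p-values by m/i, then enforce monotonicity); the function name `benjamini_hochberg` and the example p-values are made up, and a real analysis would normally rely on an established statistics package.

```python
import numpy as np

alpha, n_tests = 0.05, 100

expected_false_positives = alpha * n_tests       # about 5 if no gene truly differs
family_wise_error = 1 - (1 - alpha) ** n_tests   # P(at least one false positive)
bonferroni_threshold = alpha / n_tests           # the stringent "p / # tests" cutoff

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted_sorted = np.minimum(np.minimum.accumulate(scaled[::-1])[::-1], 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted
    return adjusted

pvals = [0.001, 0.008, 0.039, 0.041, 0.6]  # made-up p-values for five genes
adj = benjamini_hochberg(pvals)
```

Calling a gene significant when its adjusted p-value is below a chosen FDR level controls the expected fraction of false positives among the genes called significant, rather than the chance of any single false positive.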

Page 127: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Clustering

Which genes are associated with each other or a particular state/condition?


Page 133: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Clustering

Unsupervised (no prior knowledge used)
• Hierarchical (Trees)
• Non-hierarchical (K-means)
• Cluster 3.0: http://bonsai.hgc.jp/~mdehoon/software/cluster/software.htm#ctv

1. Filter out genes (not expressed and/or stable)
2. Calculate distance between samples based on gene expression
• Euclidean
• Pearson
3. Cluster samples based on these distances
• Single (Maximum similarity)
• Complete (Minimum similarity)
• Average / Centroid (Average similarity)

Supervised (prior knowledge used)
Many methods available (machine learning, etc.)...
Ex. Ontology-based Pattern Identification (OPI)
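Steps 2 and 3 of the unsupervised workflow above can be sketched with a naive agglomerative clustering routine. This is a teaching sketch, not how Cluster 3.0 is implemented: the function names and the expression values are hypothetical, gene filtering (step 1) is assumed to have been done already, and the "average" option here is average linkage, which is related to but not identical to centroid linkage.

```python
import numpy as np

def pearson_distance(data):
    """Pairwise 1 - Pearson correlation between rows (samples)."""
    return 1.0 - np.corrcoef(data)

def euclidean_distance(data):
    """Pairwise Euclidean distance between rows (samples)."""
    diff = data[:, None, :] - data[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def agglomerate(dist, n_clusters, linkage="average"):
    """Merge the two closest clusters until n_clusters remain."""
    combine = {"single": np.min, "complete": np.max, "average": np.mean}[linkage]
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Distance between clusters a and b under the chosen linkage.
                d = combine(dist[np.ix_(clusters[a], clusters[b])])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]

# Hypothetical expression matrix: 4 samples (rows) by 5 filtered genes (columns).
expr = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
                 [1.1, 2.1, 2.9, 4.2, 5.1],
                 [5.0, 4.0, 3.0, 2.0, 1.0],
                 [5.2, 3.9, 3.1, 2.2, 0.8]])

groups = agglomerate(pearson_distance(expr), n_clusters=2, linkage="average")
```

Swapping `pearson_distance` for `euclidean_distance`, or `"average"` for `"single"` or `"complete"`, shows how the choice of distance and linkage can change the grouping on less clear-cut data.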

Page 134: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Centroid Clustering



Page 137: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Microarray Gene Expression Workflow

1. Experimental Design

2. RNA Isolation and Labeling

3. Hybridization

4. Preprocessing

5. Data Analysis

6. Biological Confirmation


Page 138: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Biological Confirmation

Microarray gene expression must be confirmed using other experimental techniques.

• Northern Blot
• RT-PCR
• qPCR
• Also functional confirmation
• mRNA != protein


Page 140: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Microarray Gene Expression Workflow

1. Experimental Design

2. RNA Isolation and Labeling

3. Hybridization

4. Preprocessing

5. Data Analysis

6. Biological Confirmation

7. Sharing of Data


Page 141: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Sharing of Data

Minimum Information About a Microarray Experiment (MIAME) (2001)



Page 144: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Kicic, et al., Decreased fibronectin production significantly contributes to dysregulated repair of asthmatic epithelium. Am J Respir Crit Care Med, 2010. 181(9): p. 889-98.

AIM: Identify differentially expressed genes between disease and control groups.
What differences in gene expression may be responsible for differences in phenotype?

a.k.a. A Fishing Expedition!

Page 145: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Methods
• Epithelial cells were collected by bronchial brushing and cultured, and then classified as healthy non-atopic (pAEC_HNA), healthy atopic (pAEC_HA), or atopic asthmatic (pAEC_AA).
• RNA from 16 hybridizations (9 pAEC_HNA, 7 pAEC_AA) was quantified, assessed for quality using an Agilent Bioanalyzer, and processed for hybridization to Affymetrix Human Genome U133 Arrays.
• Data were normalized by GCRMA and differential gene expression between groups was assessed using LIMMA (supervised method).
• LIMMA fits a linear model to the expression data of each gene and uses empirical Bayes to calculate a moderated t-statistic, which smooths the standard errors across genes, giving more reliable results.

For more information see the LIMMA user guide (http://www.bioconductor.org/packages/2.5/bioc/html/limma.html)

Note: atopic = caused by a hereditary predisposition towards developing certain hypersensitivity reactions, such as asthma.

Page 146: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Heatmap

• Figure 2. Differences in lower airway epithelial gene expression between healthy non-atopic children (HNA) and children with atopic asthma (AA). Differentially expressed genes based on false discovery rate of less than 0.25 and an absolute fold change of greater than or equal to 1.5 were arranged using unsupervised two-dimensional hierarchical clustering. Each column represents a differentially expressed gene and each row represents an individual subject. Colors represent fold change in each individual, with red indicating up-regulated genes and green indicating down-regulated genes with respect to the average of HNA subjects.

• Differentially regulated genes: 1612 (763 up, 848 down)
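The selection rule in the figure legend (FDR below 0.25 and absolute fold change of at least 1.5) can be sketched as a simple filter. The gene names and values below are placeholders, not results from the study, and the sketch assumes fold changes are stored as log2 ratios of AA relative to HNA.

```python
import math

FDR_CUTOFF = 0.25          # false discovery rate threshold from the figure legend
FOLD_CHANGE_CUTOFF = 1.5   # absolute fold-change threshold from the figure legend

# Hypothetical per-gene results: (gene, log2 fold change AA vs HNA, FDR-adjusted p-value).
results = [
    ("GENE_A", -1.20, 0.01),
    ("GENE_B", 0.30, 0.02),   # significant, but below the fold-change cutoff
    ("GENE_C", 0.90, 0.40),   # large change, but fails the FDR cutoff
    ("GENE_D", 0.70, 0.10),
]

# A 1.5-fold change in either direction corresponds to |log2 FC| >= log2(1.5).
min_abs_log2fc = math.log2(FOLD_CHANGE_CUTOFF)

selected = [(gene, lfc) for gene, lfc, fdr in results
            if fdr < FDR_CUTOFF and abs(lfc) >= min_abs_log2fc]
up = [gene for gene, lfc in selected if lfc > 0]
down = [gene for gene, lfc in selected if lfc < 0]
```

Requiring both criteria keeps genes that are statistically supported and that also change by a meaningful amount.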


Page 147: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

Conclusion
• Deposition of the extracellular matrix (ECM) is required to heal wounded epithelial cells.
• Kicic, et al. noted that fibronectin (FN1) was the only down-regulated ECM component in asthmatic epithelial cell samples and hypothesized that this was the reason for their inability to heal wounds.

Practical 1: You will reanalyze the data from this study to see if you arrive at the same conclusions as the original authors. (R - http://www.bioconductor.org)

Page 148: Functional Genomics Ihyphy.org/w/images/0/03/20140217_Week7Lecture_jyoung.pdf · Functional Genomics Functional genomic studies often labeled as “descriptive” versus “hypothesis-driven”

What You Learned Today...

Evaluations!

• Functional genomic methods for gene expression analysis
• Typical workflow for a microarray gene expression study
• Aspects of microarray data analysis
• Kicic et al. (2010): Example of a differential expression microarray study