1 Expression Analysis Platforms. Friday's Class 4:00-5:00 140 SH Work in the Laboratory of Signal...


Expression Analysis Platforms


Friday's Class

4:00-5:00

140 SH

Work in the Laboratory of Signal Processing of the EESC / USP revolves around modeling biological systems. Currently, our lines of research are: diagnosing speech pathology, ultrasound signal processing, and bioinformatics (particularly for phylogeny). All lines of research involve clustering algorithms.

The work on clustering seeks to determine the natural number of groups and to validate the clustering algorithms. Several techniques have been applied to genomic databases, among them resampling, analysis of missing data, and the use of a priori information. We are now focusing on probabilistic models and the validation of structural models. The studies use data generated by electrophoresis of species with agricultural applications, provided by Embrapa (www.embrapa.br). We currently work with 3 doctoral students on linking phenotypic and genotypic information for varieties of corn.


Other Courses

• Intro to Informatics (in CS)

• Intro to Bioinformatics (51:121)

– provides a first exposure to some available computational techniques and resources

– however, the emphasis is on utilization

• In this course (51:123) -- I try to emphasize tools and techniques that you would use to go about developing your own computational resources (software, systems, tools, etc).

• Computational Methods in Molecular Biology (51:122 -- Casavant, Bair)

– advanced topics


Bioinformatics Certificate

• Offered by the Graduate College (MS/PhD)

• http://informatics.grad.uiowa.edu/bioinformatics/


Final Exam

• 25 questions – mostly short answer/T/F
– 1 paper
– 1 genome sequencing
– 2 Ensembl
– 1 references
– 1 array
– 2 programming
– 1 pattern matching
– 2 expression
– 3 other
– 3 p-genes
– 1 Blast/Blat
– 5 Hash questions
– 1 N-W
– 1 sequencing


Outline

• What is expression
• Platforms
– ChIP on Chip
– Gene expression
– Exon arrays
– Tiling arrays
– SNP chips
• Applied S/W for Expression "Library" -- OTDB
• Alternative Splicing
• Association Study Example -- AMD
• How to analyze


What is expression?

Gene expression

mRNA - transcription

- microRNAs

Protein - translation


A Typical Experiment

Case vs. Control

Ex. Retina cells +/- 7-keto-cholesterol
– 3x redundancy

Look for differentially expressed genes
– t-test, ANOVA
– fold-change

Result --> set of genes
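The case vs. control comparison above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the course: the gene names, intensities, and cutoffs are hypothetical, and Welch's t statistic is swapped in as one common form of the t-test. A p-value would come from the t distribution with the returned degrees of freedom, which a statistics library can supply.

```python
import math
from statistics import mean, variance


def welch_t(case, control):
    """Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(case), len(control)
    v1, v2 = variance(case), variance(control)  # sample variances (n - 1)
    se2 = v1 / n1 + v2 / n2                     # squared standard error
    t = (mean(case) - mean(control)) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


def log2_fold_change(case, control):
    """log2 of the ratio of mean intensities (case over control)."""
    return math.log2(mean(case) / mean(control))


# Hypothetical intensities: three treated and three untreated replicates.
expression = {
    "GENE_A": ([850.0, 810.0, 870.0], [400.0, 390.0, 410.0]),
    "GENE_B": ([150.0, 162.0, 149.0], [155.0, 148.0, 160.0]),
}


def differentially_expressed(data, min_abs_log2fc=1.0, min_abs_t=4.0):
    """Genes passing both cutoffs (the cutoffs are illustrative assumptions)."""
    hits = []
    for gene, (case, control) in data.items():
        t, _df = welch_t(case, control)
        lfc = log2_fold_change(case, control)
        if abs(lfc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            hits.append(gene)
    return hits


print(differentially_expressed(expression))
```

Requiring both a fold-change cutoff and a test statistic cutoff is one design choice among several; with only three replicates per group, the variance estimates are noisy, so either filter alone can be misleading.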


But there’s so much more…

1. Differential expression of genes
2. Time-courses
3. Alternative splicing
4. ChIP-on-chip
5. High-density SNP genotyping
6. Using chips to select genomic fragments for re-sequencing
7. Additional annotation/analyses


Definition of Microarray

• What is a gene expression array?

– “A microarray is a small analytical device that allows genomic exploration with speed and precision unprecedented in the history of biology” - Schena 2003


Source: www.bioteach.ubc.ca/MolecularBiology/microarray/graphics by Jiang Long


Physical Spotting


http://www.nimblegen.com/technology/manufacture.html


Advantages to Arrays

• A single array permits monitoring thousands of genes in parallel

• Provides information at genomic scale
– Reveals gene function and gene interactions
– Identifies relationship between genetic and biochemical pathways
– Identifies traits associated with multigenic origins

• Caveat - further modifications may occur
– Post-transcriptional
– Translational
– Protein


Microarray Research

• Ubiquitous in biology & agriculture research

• Interdisciplinary
– Biology
– Computer Science
– Statistics

• Experiments require teams of individuals

• Analysis presents many obstacles that need to be overcome


Statement of the Problem

• Obstacles impeding analysis process
– Analysis is complex with multiple steps
– Requires multiple discipline expertise
  • Bio - understand underlying biology
  • Stats - normalization & statistical measures
  • Comp Sci - programmatic solutions, computation resources

• Necessity for centralized analysis system
– Robust
– Extensible
– Portable


Platforms

Gene Expression Arrays

Exon Arrays

Tiling Arrays

SNP chips

Vendors: Affymetrix, NimbleGen, Agilent, others


An Aside

• State-of-the-art sequencing technology + microarray == ?

• 454-, pyro-, pyrophosphate sequencing


Margulies et al., Nature, 2005


GS 20 System Brochure 454/Roche


Gene Chip + Sequencing

454, pyro- or pyrophosphate sequencing

Genome sequencing in microfabricated high-density picolitre reactors, Margulies et al., Nature, 2005

Nature 2007


Sequence Capture


http://www.nimblegen.com/products/seqcap/index.html


Gene Expression Arrays

The traditional method: typically one or more probes interrogate the expression level of each gene.

U133Plus2 - 54,000 probe sets


Exon Arrays

Target each exon of a gene individually
– 1,400,000 probe sets

Different levels of confidence/quality
– 300,000 exons from full-length mRNAs
– 880,000+ exons from gene predictions
– 500,000+ “control” exons

Available for human, mouse and rat


Tiling Array

http://www.affymetrix.com/products/arrays/specific/human_tiling.affx


Tiling Arrays

Covering the entire genome with probes
– Probes every 35 bp across the genome
– 7-14 chips (depending on the application)

… or can focus on a specific area
– 10,000 bp proximal promoter of every gene
– 1 chip
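A rough back-of-the-envelope check of these numbers, as a sketch only. The 35 bp spacing, the 7-14 chip range, and the 10,000 bp promoter window come from the slide; the genome length (about 3.1 billion bp) and the gene count (about 20,000) are assumptions added here, as is the idea that the promoter design uses the same spacing.

```python
def probes_needed(region_bp, spacing_bp):
    """Probe start positions needed to place one probe every spacing_bp bases."""
    return -(-region_bp // spacing_bp)  # integer ceiling division


GENOME_BP = 3_100_000_000  # assumption: approximate human genome length
SPACING_BP = 35            # from the slide
N_GENES = 20_000           # assumption: approximate human gene count
PROMOTER_BP = 10_000       # from the slide

# Whole-genome tiling, and the per-chip load when split across 7 or 14 chips.
whole_genome_probes = probes_needed(GENOME_BP, SPACING_BP)
probes_per_chip = {
    chips: probes_needed(whole_genome_probes, chips) for chips in (7, 14)
}

# Promoter-only tiling, assuming the same spacing is kept.
promoter_probes = probes_needed(N_GENES * PROMOTER_BP, SPACING_BP)

print(f"whole genome: {whole_genome_probes:,} probes")
for chips, load in probes_per_chip.items():
    print(f"  split over {chips} chips: {load:,} probes per chip")
print(f"promoters only: {promoter_probes:,} probes")
```

Under these assumptions the promoter-only design needs roughly as many probes as a single chip of the whole-genome set carries, which is consistent with it fitting on 1 chip.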


Tiling Arrays - Applications

Applications
– expression
– protein-DNA interaction
– DNA modifications
  • methylation
  • acetylation

Anywhere in the genome!


• What can you use tiling arrays for?


ENCODE Project

Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project

Nature, V 447, June 14, 2007


Transcript Connectivity
• protein-coding loci are more transcriptionally complex than previously thought
• 19% of pseudogenes transcribed
• genes had, on average, 10 different transcriptional start sites


ChIP on Chip

http://www.chem.agilent.com/Scripts/generic.asp?lpage=37461&indcol=N&prodcol=N


SNP chips

SNPs - single nucleotide polymorphisms

Affymetrix 6.0 Array
• 906,600 SNPs
• 946,000 (non-polymorphic) "monomorphic" SNPs

Applications:
– Linkage
– Association Studies
– Changes in Copy Number (deletions/duplications)


Association

Allele frequencies by population

           Unaffected        Affected
           A1      A2        A1      A2
SNP 1      0.74    0.26      0.75    0.25
SNP 2      0.70    0.30      0.10    0.90

Power increases with more samples, and more SNPs
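To make the comparison in the table concrete, here is a minimal sketch of an allelic association test in Python. The allele frequencies are the ones on the slide; the sample size (400 chromosomes per group, i.e. 200 people) is a hypothetical value chosen for illustration, and the Pearson chi-square test on a 2x2 table of allele counts is one standard choice, not necessarily the test used in the study.

```python
import math


def allele_counts(freq_a1, n_chromosomes):
    """Convert an A1 allele frequency into (A1, A2) counts."""
    a1 = round(freq_a1 * n_chromosomes)
    return a1, n_chromosomes - a1


def chi_square_2x2(row1, row2):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    a, b = row1
    c, d = row2
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


N_CHROMOSOMES = 400  # assumption: 200 individuals per group, 2 alleles each

# A1 frequencies from the slide: (unaffected, affected).
snps = {"SNP 1": (0.74, 0.75), "SNP 2": (0.70, 0.10)}

results = {}
for name, (freq_unaffected, freq_affected) in snps.items():
    unaffected = allele_counts(freq_unaffected, N_CHROMOSOMES)
    affected = allele_counts(freq_affected, N_CHROMOSOMES)
    results[name] = chi_square_2x2(unaffected, affected)
    chi2, p_value = results[name]
    print(f"{name}: chi2 = {chi2:.2f}, p = {p_value:.3g}")
```

SNP 1 shows nearly identical frequencies in both groups, so its statistic is small; SNP 2 differs sharply, so its statistic is large. Because the statistic scales with the number of chromosomes, larger samples make smaller frequency differences detectable. When many SNPs are tested at once, the p-values also need a multiple-testing correction.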


Alternative Splicing in the Eye

GOAL: To identify the splicing variants expressed in retina, retinal pigment epithelia, and optic nerve head. (3x biological replicates)

Motivation: To guide/focus screening efforts to those exons that are expressed.

In collaboration with Rob Mullins


Show Probes


Ocular Tissue Expression Database

Survey of 10 ocular tissues

GOAL: catalog which genes are expressed across tissues of specific interest in ocular

In collaboration with Abe Clark at Alcon


End


AMD Association Study

GOAL: Identify the major susceptibility regions for age-related macular degeneration.

Several regions have been reported

– How many susceptibility regions are there?

Genotyping 400 AMD patients and controls with high-density SNP chips

400,000,000 genotypes


Association

Allele frequencies by population

           Unaffected        Affected
           A1      A2        A1      A2
SNP 1      0.74    0.26      0.75    0.25
SNP 2      0.70    0.30      0.10    0.90

Power increases with more samples, and more SNPs


How to analyze the data?

First step is acquiring the data!

Normalization

Analysis


Analysis

Differential Expression
– t-test
– ANOVA
– Fold-change

Time series (all of the above)
– Correlation of expression
– Early response vs. late response
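For the time-series case, a small Python sketch of correlating expression profiles and separating early from late responders. The time points, gene names, and expression values are hypothetical, and Pearson correlation plus "time of peak expression" are just one simple way to express the two ideas on this slide.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def peak_time(timepoints, profile):
    """Time point at which a profile reaches its maximum."""
    return timepoints[profile.index(max(profile))]


# Hypothetical time course (hours after treatment) and relative expression.
hours = [0, 1, 2, 4, 8, 24]
profiles = {
    "EARLY_1": [1.0, 4.0, 3.0, 2.0, 1.2, 1.0],
    "EARLY_2": [1.1, 3.8, 3.1, 1.9, 1.3, 1.0],
    "LATE_1": [1.0, 1.0, 1.2, 2.0, 3.5, 4.0],
}

for gene, profile in profiles.items():
    r = pearson(profiles["EARLY_1"], profile)
    print(f"{gene}: peak at {peak_time(hours, profile)} h, "
          f"r vs EARLY_1 = {r:.2f}")
```

Genes whose profiles correlate strongly are candidates for co-regulation; the peak time gives a crude early vs. late label that a clustering method could refine.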


Analysis

DAVID - Database for Annotation, Visualization and Integrated Discovery
http://david.abcc.ncifcrf.gov/

– Look for conservation of a particular function or annotation in the set of differentially expressed genes.

GSEA - Gene Set Enrichment Analysis
http://www.broad.mit.edu/gsea/software/software_index.html

– Look for annotations that are differentially expressed (as a group).

Ex. Tour de France
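The idea behind the first kind of analysis (is an annotation more common in the differentially expressed set than chance would predict?) can be illustrated with a plain hypergeometric tail probability. This is a generic sketch, not the statistic DAVID or GSEA actually computes; both tools differ in detail, and GSEA in particular works on a ranked gene list rather than a fixed set. All counts below are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, selected, overlap):
    """P(X >= overlap) under the hypergeometric distribution.

    population: genes measured on the array
    annotated:  genes in the population carrying the annotation
    selected:   differentially expressed genes
    overlap:    selected genes that carry the annotation
    """
    total = comb(population, selected)
    upper = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical example: 20,000 genes on the array, 200 carry the annotation,
# 100 genes are differentially expressed, and 10 of those carry it.
# By chance we would expect about 100 * 200 / 20,000 = 1 annotated gene.
p_value = enrichment_p_value(20_000, 200, 100, 10)
print(f"enrichment p-value: {p_value:.3g}")
```

As with the SNP tests, checking many annotations at once calls for a multiple-testing correction before any single p-value is taken at face value.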