Affymetrix GeneChips and Analysis Methods Neil Lawrence.


Page 1: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Affymetrix GeneChips and Analysis Methods

Neil Lawrence

Page 2: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Schedule

18th April Introduction and Background

25th April cDNA Microarrays

2nd May No Lecture

9th May Affymetrix GeneChips

16th May Guest Lecturer – Dr Pen Rashbass

23rd May Analysis methods

and some of this

Page 3: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Photolithography

• Photolithography (Affymetrix)

– Based on the same technique used to make microprocessors.

– Oligonucleotides are generated in situ on a silicon surface.

– Oligonucleotides up to 30bp in length.

– Array density of 10^6 probes per cm^2.

Page 4: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Affymetrix Stock Price

Page 5: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Affymetrix

• Only one biological sample per chip.

• Oligonucleotides represent a portion of a gene’s sequence.

• Twenty sub-sequences present for each gene.

Page 6: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Perfect vs Mismatch

• For each oligonucleotide there is

– A perfect match

– A mismatch

• The perfect match is a sub-sequence of the true sequence.

• The mismatch is a sub-sequence with a ‘central’ base-pair replaced.

Page 7: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Affymetrix Analysis

• Mismatch is designed to measure ‘background’.

• Signal from each sub-sequence is I_perfect match - I_mismatch

• Twenty of these sub-sequences are present.

• Average of all these signals is taken.
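The averaging described above can be sketched in a few lines. This is a minimal illustration, not Affymetrix's own software: the function name `average_difference` and the intensity values are made up for the example.

```python
from statistics import mean


def average_difference(perfect_match, mismatch):
    """Average of (PM - MM) over the probe pairs of one gene."""
    if len(perfect_match) != len(mismatch):
        raise ValueError("each perfect match needs its mismatch partner")
    signals = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return mean(signals)


# Hypothetical intensities for four probe pairs (a real gene has twenty).
pm = [520.0, 610.0, 480.0, 700.0]
mm = [120.0, 150.0, 500.0, 130.0]
print(average_difference(pm, mm))
```

Note that the third probe pair contributes a negative signal, which is the problem raised on the next slide.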

Page 8: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Problems

• Sometimes I_mismatch > I_perfect match

– Solution: set it to 20??!!!

• Other issues

– Present/Absent call

• Based on the number of Signals > 0.

• Proprietary Technology

– You don’t know what the subsequences are.

• Apparently this is changing!
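One possible reading of the workaround and of the present/absent call is sketched below. Only the floor value of 20 comes from the slide. Applying it to each probe-pair signal, the function names, and the 50% threshold for a "present" call are all assumptions for illustration; the actual Affymetrix rules may differ.

```python
def floored_signals(perfect_match, mismatch, floor=20.0):
    """PM - MM per probe pair, with non-positive signals replaced by `floor`."""
    return [pm - mm if pm > mm else floor
            for pm, mm in zip(perfect_match, mismatch)]


def present_call(perfect_match, mismatch, min_fraction=0.5):
    """Call a gene 'present' when enough probe pairs have a signal > 0.

    `min_fraction` is an assumed threshold, not a documented value.
    """
    positive = sum(1 for pm, mm in zip(perfect_match, mismatch) if pm - mm > 0)
    return positive / len(perfect_match) >= min_fraction


# Hypothetical intensities for four probe pairs.
pm = [520.0, 610.0, 480.0, 700.0]
mm = [120.0, 150.0, 500.0, 130.0]
print(floored_signals(pm, mm))
print(present_call(pm, mm))
```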

Page 9: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Scaling Factors – Maximum likelihood estimation

• The data produced is still affected by undesirable variations that we need to remove.

• We can assume that the variations are primarily multiplicative (no intensity-dependent or print-tip effect):

observed expression level = true expression level × error × random noise

where the error term covers chip variations and the random noise is biological noise.

Page 10: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Model Assumption

• Organise the twelve values from three exogenous control species in a matrix:

X = [N_Controls × N_Chips]

• Error model: Here m_i is associated with each control and r_j is associated with each chip or experiment.

Taking logs we have:
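The error model and its log form can be written out as follows. This is a reconstruction from the multiplicative model stated on the previous slide, not a copy of the original formulas: x_ij (the observed value of control i on chip j) and epsilon_ij (the random noise) are assumed notation.

```latex
\begin{align}
x_{ij} &= m_i \, r_j \, \epsilon_{ij},
  \qquad i = 1, \dots, N_{\text{Controls}}, \quad j = 1, \dots, N_{\text{Chips}} \\
\log x_{ij} &= \log m_i + \log r_j + \log \epsilon_{ij}
\end{align}
```

Taking logs turns the multiplicative chip effect r_j into an additive offset, which is what makes the estimation on the next slide tractable.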

Page 11: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Scaling Factors

• Calculating scaling factors using maximum likelihood estimation of the model parameters

Likelihood:

• Estimates are calculated solving

Scaling factors are thus :
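A minimal sketch of how such scaling factors could be computed, under assumptions that are not stated on the slide: the log-noise is Gaussian with equal variance, so maximum likelihood reduces to least squares, and the chip effects are constrained so that their logs sum to zero. The function name `scaling_factors` and the data are hypothetical.

```python
import math


def scaling_factors(x):
    """x[i][j] is the intensity of control i on chip j; returns one factor per chip."""
    n_controls, n_chips = len(x), len(x[0])
    logs = [[math.log(v) for v in row] for row in x]
    grand_mean = sum(map(sum, logs)) / (n_controls * n_chips)
    # Least-squares (Gaussian ML) estimate of log r_j, constrained to sum to zero:
    # the mean log intensity of chip j minus the grand mean.
    log_r = [
        sum(logs[i][j] for i in range(n_controls)) / n_controls - grand_mean
        for j in range(n_chips)
    ]
    # Multiplying chip j by 1 / r_j removes the estimated chip effect.
    return [math.exp(-b) for b in log_r]


# Hypothetical noise-free data: three controls on four chips (twelve values),
# built as m_i * r_j with chip effects r = [1.0, 2.0, 0.5, 1.0].
m = [100.0, 400.0, 1600.0]
r = [1.0, 2.0, 0.5, 1.0]
x = [[mi * rj for rj in r] for mi in m]
factors = scaling_factors(x)
print([round(f, 3) for f in factors])
```

In this noise-free example each factor is the reciprocal of the chip effect used to build the data, so multiplying chip j by its factor recovers m_i.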

Page 12: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

You Should Know

• The Central Dogma (Gene Expression).

• cDNA chip overview.

• Noise in cDNA chips.

• Affymetrix GeneChip overview.

Page 13: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Analysis of Microarray Data

• Vanilla-flavour analysis:

– Obtain temporal profiles (e.g. from last week’s mouse experiment).

– ‘Cluster’ profiles.

– Assume genes in the same cluster are functionally related.

Page 14: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Temporal Profiles

• Lack of statistical independence.

• Take temporal differences to recover.

• Justified by assuming an underlying Markov process.
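Taking temporal differences is a one-line operation, sketched here with made-up expression levels. The function name `temporal_differences` is hypothetical. The idea is that if each day's level is the previous day's level plus an independent change, the changes are the quantities that can be treated as independent.

```python
def temporal_differences(profile):
    """Day-to-day changes: profile[t + 1] - profile[t]."""
    return [later - earlier for earlier, later in zip(profile, profile[1:])]


# Hypothetical expression levels for Day 1 to Day 6.
profile = [40, 80, 120, 80, 40, 0]
print(temporal_differences(profile))  # [40, 40, -40, -40, -40]
```

A profile of six days gives five differences, labelled 2-1 to 6-5 on the slides.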

Page 15: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Analysis of Microarray Data

[Figure, two panels. "Original Temporal Profile": gene expression level (axis 0 to 120) against Day 1 to Day 6. "Take Temporal Differences": change in expression level (axis -80 to 80) against the day-to-day differences 2-1 to 6-5.]

Page 16: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Consider Clustering via MSE

These two similar profiles won’t cluster

[Figure, two panels: gene expression level against Day 1 to Day 6 for two profiles, with vertical axes running from 0 to 120 and from 20 to 140 respectively.]

Page 17: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

The Temporal Differences Will

[Figure, two panels: change in expression level (axis -80 to 80) against the day-to-day differences 2-1 to 6-5 for each of the two profiles.]
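The point of the last two slides can be checked numerically. The sketch below assumes the two profiles differ only by a constant offset of 20, as the axis ranges on the slides suggest; the values themselves and the helper names are made up.

```python
def mse(a, b):
    """Mean squared error between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def temporal_differences(profile):
    """Day-to-day changes: profile[t + 1] - profile[t]."""
    return [later - earlier for earlier, later in zip(profile, profile[1:])]


# Hypothetical profiles with the same shape, offset by 20.
profile_a = [0, 40, 120, 80, 40, 0]
profile_b = [level + 20 for level in profile_a]

raw_mse = mse(profile_a, profile_b)
diff_mse = mse(temporal_differences(profile_a), temporal_differences(profile_b))
print(raw_mse)   # 400.0
print(diff_mse)  # 0.0
```

The constant offset dominates the MSE between the raw profiles but cancels in the differences, so the two genes look identical to a clustering method that uses MSE on the differences.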

Page 18: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Many Other Different Techniques

• Hierarchical Clustering

• Self-Organising Maps

• ML-Group

– Generative Topographic Mappings (GTM)
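As an illustration of the first technique, here is a naive agglomerative (hierarchical) clustering sketch in plain Python. The choice of average linkage, MSE as the distance, the function names, and the data are all assumptions; a real analysis would normally use a library implementation.

```python
def mse(a, b):
    """Mean squared error between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering; returns lists of profile indices."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Average pairwise MSE between the members of the two clusters.
        total = sum(mse(profiles[i], profiles[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical temporal-difference profiles for four genes.
diffs = [
    [40, 40, -40, -40, -40],  # gene 0
    [38, 42, -41, -39, -40],  # gene 1, close to gene 0
    [-40, -40, 40, 40, 40],   # gene 2
    [-42, -38, 39, 41, 40],   # gene 3, close to gene 2
]
result = agglomerate(diffs, 2)
print(result)
```

Stopping at a fixed number of clusters is a simplification; hierarchical clustering usually builds the full tree and cuts it afterwards.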

Page 19: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

GTM

• Data lies in high dimensional space (>2).

• Model it with a lower embedded dimensionality (2).

• MATLAB Demo of embedded dimensions.

Page 20: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

GTM on Gene Data

• MATLAB Demo.

Page 21: Affymetrix GeneChips and Analysis Methods Neil Lawrence.

Conclusions

• Take Temporal differences of Profiles.

• Attempt to Cluster.

• Test Hypothesis that clustered Genes are functionally related.

• Good luck in the Exam!