What is Bioinformatics?


Transcript of What is Bioinformatics?

Page 1: What is Bioinformatics?

What is Bioinformatics?

The Data

The Analysis

Comparison

Evolution

Long Distance: Comparative Genomics

Short Distance: Variation Analysis

Homology

Non-homology

Physical/Chemical/Statistical Mathematical Modelling

Page 2: What is Bioinformatics?

The Data & its growth

1976/79 The first viral genome – MS2/ΦX174

1995 The first prokaryotic genome – H. influenzae

1996 The first unicellular eukaryotic genome - Yeast

1997 The first multicellular eukaryotic genome – C. elegans

2001 The human genome (3 Gb)

1.5.03: Known

>1000 viral genomes

96 prokaryotic genomes

16 archaebacterial genomes

A series of multicellular genomes is coming.

A general increase in data involving higher structures and dynamics of biological systems

Page 3: What is Bioinformatics?

Genomes & Tree of Life

•3.5-3.8 Gyr Origin of Life

•3+ Gyr LUCA

•~1.4 Gyr Origin of Eukaryotes

•500-600 Myr Origin of Vertebrates

•200+ Myr Origin of Mammals

•80-100 Myr Mouse Mammalian Split

•5-7 Myr Chimp-Human Split

•100 Kyr – Myr Age of Polymorphisms

From Janssen, 2003

Page 4: What is Bioinformatics?

Comparison of Evolutionary Objects.

Objects compared:

• Sequences (example: ACTGT vs. ACTCCT)

• RNA (Secondary) Structure

• Protein Structure (example: Renin vs. HIV proteinase)

• Gene Order/Orientation (example: Cabbage vs. Turnip, genes numbered 1 to 8)

• Gene Structure

• Interaction Networks

• Any Graph

General Theme:

• Formal Model of Structure

• Stochastic Model of Structure Evolution

Page 5: What is Bioinformatics?

The Phylogeny for Evolutionary Objects

[Figure: a phylogeny with observable sequences (ATTGCGTATATAT….CAG) at the tips, an unobservable evolutionary path between them, and an unknown (?) MRCA (Most Recent Common Ancestor). Parameters: time, rates, selection. An arrow marks the time direction.]

3 Problems:

i. Test all possible relationships.

ii. Examine unknown internal states.

iii. Explore unknown paths between states at nodes.
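Problem i is demanding because the number of possible relationships grows very quickly with the number of sequences. As a rough illustration that is not on the slide: the number of rooted, bifurcating trees on n labelled tips is (2n-3)!! = 3 · 5 · … · (2n-3). The function name below is chosen only for this sketch.

```python
def rooted_tree_count(n_tips: int) -> int:
    """Number of rooted, bifurcating, tip-labelled trees: (2n-3)!!."""
    if n_tips < 1:
        raise ValueError("need at least one tip")
    count = 1
    # Multiply the odd factors 3, 5, ..., 2n-3 (empty product for n <= 2).
    for factor in range(3, 2 * n_tips - 2, 2):
        count *= factor
    return count


if __name__ == "__main__":
    for n in (3, 4, 10, 20):
        print(n, rooted_tree_count(n))
```

Already at 10 sequences there are tens of millions of candidate trees, which is why exhaustive testing is replaced by heuristic or sampling-based searches in practice.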

Page 6: What is Bioinformatics?

Gene and Genome Evolution

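$$P_{\alpha,t}(C,G) = \tfrac{1}{4}\left(1 - e^{-4\alpha t}\right)$$

[Figure: sequences TGCGTATC and TGTGTATA compared; labels shown: Higher Cells, Chimp, Mouse, Fish, E. coli]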
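The substitution probability on this slide appears to be of the Jukes-Cantor type. Under that assumption, which is an interpretation of the slide rather than something it states, every nucleotide changes to each of the three others at rate alpha, and the probability of seeing a specific different nucleotide after time t is 1/4 · (1 − e^(−4·alpha·t)). A minimal sketch, with a hypothetical function name and parameter values:

```python
import math


def jc_probability(alpha: float, t: float, same: bool = False) -> float:
    """Jukes-Cantor transition probability after time t.

    alpha is the rate of change to each specific other nucleotide
    (an assumption of this sketch). same=True gives the probability
    of observing the starting nucleotide again.
    """
    decay = math.exp(-4.0 * alpha * t)
    if same:
        return 0.25 + 0.75 * decay
    return 0.25 - 0.25 * decay


if __name__ == "__main__":
    # Illustrative values only: alpha and t are not taken from the slides.
    for t in (0.0, 10.0, 100.0, 1000.0):
        print(t, jc_probability(alpha=1e-3, t=t))
```

For long times the probability approaches 1/4 for every nucleotide, so all information about the starting state is lost.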

Basic Events

• substitutions.

• insertions and deletions.

• Chromosome Level events: inversions, duplications, transpositions,..

Average Number of Mitoses

•Per Male generation (15:35 .. 20:150)

•Per Female generation: ~24

• Single nucleotide substitutions: ~10^-7

• Microsatellites (~100,000): ~10^-2

• Small insertions/deletions: ~10^-8

Page 7: What is Bioinformatics?

Principles of String Comparison: Alignment

Sequences to compare: ACTGT and ACTCCT

Intermediate sequences shown: ACTGCT, ACTCGT

Two alignments (each labelled .41 on the slide):

ACT-GT      ACTG-T
ACTCCT      ACTCCT

Cost: 2

Probability: e^-16.47
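The cost of 2 for ACTGT versus ACTCCT can be reproduced with the classic dynamic-programming recursion for edit distance. The sketch below assumes unit costs for a substitution and for an insertion or deletion; the slide does not state which cost scheme it uses.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                       # delete char_a
                curr[j - 1] + 1,                   # insert char_b
                prev[j - 1] + (char_a != char_b),  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACTGT", "ACTCCT"))
```

Both alignments on the slide use one gap and one mismatch, so they tie at this minimum cost.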

Page 8: What is Bioinformatics?

Sequences: Human alpha hemoglobin; Human beta hemoglobin; Human myoglobin; Bean leghemoglobin

Probability of data: e^-1560.138

Probability of data and alignment: e^-1593.223

Probability of alignment given data: 4.279 * 10^-15 = e^-33.085

Ratio of insertion-deletions to substitutions: 0.0334
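The three probabilities on this slide are linked by the definition of conditional probability: P(alignment | data) = P(data and alignment) / P(data). Working with the logarithms avoids numerical underflow. The short check below uses only the two log-probabilities quoted on the slide; the variable names are chosen for this sketch.

```python
import math

# Natural-log probabilities as quoted on the slide.
log_p_data = -1560.138
log_p_data_and_alignment = -1593.223

# Division of probabilities becomes subtraction of logarithms.
log_p_alignment_given_data = log_p_data_and_alignment - log_p_data
p_alignment_given_data = math.exp(log_p_alignment_given_data)

if __name__ == "__main__":
    print(log_p_alignment_given_data)
    print(p_alignment_given_data)
```

The result agrees with the slide's figures of e^-33.085, or about 4.279 * 10^-15.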

Maximum likelihood phylogeny and alignment

Gerton Lunter

Istvan Miklos

Alexei Drummond

Yun Song

Page 9: What is Bioinformatics?

Rooting using irreversibility (Lunter)

Lunter and Hein, ISMB2004

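Reversibility: $P(a)\,P(a \to b) = P(b)\,P(b \to a)$

The Pulley Principle: [figure: with a reversible model, moving the root along the tree leaves the likelihood unchanged]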

Contagious Dependence

CG avoidance creates irreversibility
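Reversibility can be checked numerically. A substitution process with stationary frequencies pi and transition probabilities P is reversible when pi[a] · P[a][b] equals pi[b] · P[b][a] for every pair of states (detailed balance). The sketch below is illustrative only: both matrices are invented for the example, and neither models CG avoidance.

```python
def is_reversible(pi, P, tol=1e-12):
    """Check detailed balance: pi[a] * P[a][b] == pi[b] * P[b][a] for all a, b."""
    n = len(pi)
    return all(
        abs(pi[a] * P[a][b] - pi[b] * P[b][a]) <= tol
        for a in range(n)
        for b in range(n)
    )


# Uniform stationary frequencies over A, C, G, T.
uniform = [0.25, 0.25, 0.25, 0.25]

# Symmetric transition matrix: every change is as likely as its reverse.
symmetric = [[0.7 if a == b else 0.1 for b in range(4)] for a in range(4)]

# Made-up "cyclic" matrix favouring A -> C -> G -> T -> A.
# Rows and columns each sum to 1, so the uniform distribution is still stationary.
cyclic = [
    [0.70, 0.20, 0.05, 0.05],
    [0.05, 0.70, 0.20, 0.05],
    [0.05, 0.05, 0.70, 0.20],
    [0.20, 0.05, 0.05, 0.70],
]

if __name__ == "__main__":
    print(is_reversible(uniform, symmetric))
    print(is_reversible(uniform, cyclic))
```

Only when detailed balance fails, as in the second matrix, does the direction of time leave a trace in the data, which is what makes rooting by irreversibility possible.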

Page 10: What is Bioinformatics?

Comparison of Evolutionary Objects.

[Figure: an RNA sequence and its secondary structure; each may be observable or unobservable.]

$$P(\text{Sequence})\,P(\text{Structure}\mid\text{Sequence}) = P(\text{Structure})\,P(\text{Sequence}\mid\text{Structure})$$

Goldman, Thorne & Jones, 96

Knudsen & Hein, 99

Eddy & co.

Meyer and Durbin 02 Pedersen & Hein, 03 Siepel & Haussler 03

Page 11: What is Bioinformatics?

The Rise of Comparative Genomics

Lander et al. (2001), Figure 25A

Page 12: What is Bioinformatics?

Recursive Definition of Strings

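Gene Grammar

[Figure: a gene with Exon 1, Exon 2 and Exon 3, annotated with E and I symbols; ATG and GAG are marked.]

S -> E | I

E -> eE | eI

I -> iE | iI

RNA Grammar

[Figure: an RNA secondary structure with positions labelled s and d.]

S -> sS | Ss | dSd | SS

Example derivation: S -> SS -> sSS -> ssSS -> ssdSdS -> ssddSddS -> ssddSddsS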

Page 13: What is Bioinformatics?

Stochastic Grammars

If a string has a unique derivation, its probability is the product of the probabilities of the rules applied.

Rules: S -> (0.3) aSa | (0.5) bSb | (0.1) aa | (0.1) bb

Derivation: S -> aSa -> abSba -> abaaba, with rule probabilities 0.3, 0.5 and 0.1, giving 0.3 * 0.5 * 0.1 = 0.015
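The product rule can be sketched for this particular grammar, which generates even-length palindromes over a and b. Every string has at most one derivation, so peeling matching outer letters and multiplying the rule probabilities gives the string probability. The function and dictionary names are chosen for this sketch.

```python
# Rule probabilities as given on the slide (right-hand sides of S).
RULES = {"aSa": 0.3, "bSb": 0.5, "aa": 0.1, "bb": 0.1}


def string_probability(s: str) -> float:
    """Probability that the grammar generates s (0.0 if s is not derivable)."""
    prob = 1.0
    # Peel one outer pair per applied rule aSa or bSb.
    while len(s) > 2:
        if s[0] != s[-1] or s[0] not in "ab":
            return 0.0
        prob *= RULES[s[0] + "S" + s[0]]
        s = s[1:-1]
    # The innermost pair must come from a terminating rule (aa or bb).
    return prob * RULES.get(s, 0.0)


if __name__ == "__main__":
    print(string_probability("abaaba"))
```

For abaaba this multiplies 0.3, 0.5 and 0.1, matching the 0.015 on the slide up to floating-point rounding.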

Page 14: What is Bioinformatics?

Grammars: Finite Set of Rules for Generating Strings

i. A starting symbol:

ii. A set of substitution rules applied to variables in the present string:

Grammar classes shown: Regular; Context Free; Context Sensitive; General (also erasing)

finished – no variables

Page 15: What is Bioinformatics?

Structure Dependent Evolution: RNA

[Figure: the RNA sequence U A C A C C G U with numbered positions, shown folded into a secondary structure. For a pair of positions i, j the figure contrasts P(Paired History i,j) with P(Unpaired History i,j).]

From Bjarne Knudsen

Page 16: What is Bioinformatics?

Knudsen & Hein, 2003

From Knudsen & Hein (1999)

RNA Structure Application

Page 17: What is Bioinformatics?

Observing Evolution has 2 parts

P(x):

P(Further history of x):

[Figure: an RNA secondary structure, with x marked]

Page 18: What is Bioinformatics?

Inter- and Intra-species Comparisons

At shorter time scales

•For sequences sampled within a population, their relationship is determined by population structure. Interspecies sequences have no analogue of this.

•Is within-species variation a short time slice of long-term variation?

•Where do the species and population perspectives meet?

Page 19: What is Bioinformatics?

Short Time Evolution: Population Genetics and History

[Figure: a population of size N followed through time; each individual carries two gene copies labelled 1 and 2.]

Cardon

Donnelly

Griffiths

McVean

Wiuf

Song

Schierup

Three large areas of application:

Interpretation of Variation

Human Population History

Gene Mapping

Pathogen Evolution

Ancestral Recombination Graph

Page 20: What is Bioinformatics?

Time slices

[Figure: a population of size N followed through time; each individual carries two gene copies labelled 1 and 2.]

All positions have found a common ancestor

All positions have found a common ancestor on one sequence

Page 21: What is Bioinformatics?

A randomly picked ancestor: (ancestral material comes in batteries!)

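[Figure: ancestral material along the chromosome at three scales: 260 Mb (segments 0 to 52,000), 7.5 Mb (*35 zoom, segments 6890 to 8360) and 30 kb (*250 zoom).]

4Ne: 20,000; Segments: 52,000; Ancestors: 6,800

Applications to Human Genome (Chr 1) (Wiuf and Hein, 97)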

Page 22: What is Bioinformatics?

The Origin of Variation

Show variation

[Figure: a population of size N followed through time, with nucleotides (A, G, C, T) marked on the lineages to show where variation arises.]

Inter. SNP Consortium (2001): A map of human genome sequence variation containing 1.42 million SNPs. Nature 409: 928-33

Page 23: What is Bioinformatics?

Slice in Space

[Figure: a population of size N followed through time.]

Page 24: What is Bioinformatics?

a: (3,4)

b: (3,4)

c: (15,16)

d: (16,17)

e: (35,36)

f: (35,36)

g: (36,37)

Minimal ARGs and Haplotype Blocks (Song)

Page 25: What is Bioinformatics?

Yun Song, 2004

Page 26: What is Bioinformatics?

Genotype and Phenotype Covariation: Gene Mapping

[Figure: time axis]

Reich et al. (2001)

Rafnar et al. (2004) – Morris et al. (2001) +

Page 27: What is Bioinformatics?

Finding Homologies

Database

New Sequence

P( ) P( ) / * P( )

R. Doolittle et al. (1983).

New Sequence: Simian Sarcoma Virus onc Gene

Similar Sequence: Platelet-Derived Growth Factor

Properties for the known sequence are transferred to the new sequence, immediately yielding biological hypotheses about the new sequence.

P28SIS 51 GGELESLARGSLGSLSVAEPAMIAECKTRTEVFEISAALIDATNANFLVWPPCVEVQACSGCCNNRN..
PDGF-1  1 ----------SLGSLTIAEPAMIAECKTREEVCFCIAAL?DA????????PPCVEVKACTGCCNNRN..
                    *****  ************ **    *** **        ****** ** *******

Page 28: What is Bioinformatics?

“Knowledge Based..”: The Products of Evolution - An Example (D.Baker)

Sequence Structure

Make a List:

Choose global structure that doesn’t create new local structures!

Page 29: What is Bioinformatics?

What is Bioinformatics?

The Data

The Analysis

Comparison

Evolution

Long Distance: Comparative Genomics

Short Distance: Variation Analysis

Homology

Non-homology

Physical/Chemical/Statistical Mathematical Modelling

Page 30: What is Bioinformatics?

Lizhong Hao, Ben Holtom, Stephen McCauley, Gerton Lunter, Rune Lyngsoe, Irmtraud Meyer, Yun Song, Jennifer Taylor

Jotun Hein, Alexei Drummond, Roald Forsberg, Bjarne Knudsen, Istvan Miklos, Jakob Skou Pedersen, Santiago Schnell, Carsten Wiuf….

Homepage: http://www.stats.ox.ac.uk/mathgen/bioinformatics/

Methodology

•Evolutionary Models

•Alignment

•Expression Data

•Genome and Gene Evolution

•Sequence Variation Data & Recombination

•RNA Secondary Structure and Evolution

•…………

Collaborations

•William Cookson (WCHG)

•John Hancock (Harwell MRC)

•Peter Simmonds (Edinburgh)

•Bioinformatics Research Centre, Dk

•………

Funding: MRC & EPSRC