Bioinformatics and Statistics: A Real World Example Joseph D. Szustakowski.

Date posted: 18-Dec-2015


Bioinformatics and Statistics: A Real World Example

Joseph D. Szustakowski


Words of Encouragement

• “There are three kinds of lies: lies, damned lies, and statistics” – Benjamin Disraeli

• “Statistics in the hands of an engineer are like a lamppost to a drunk – they’re used more for support than illumination”

• “Then there is the man who drowned crossing a stream with an average depth of six inches.” – W.I.E. Gates


Outline

• Basic idea – what are we trying to do?

• Extreme Value Distribution – a brief review

• Overwhelm audience with lots of pictures


The Basic Idea

• Most experiments result in one or more quantitative measurements
  – Height, length, weight, time, speed
  – SW score, threading potential, Viterbi score

• Is that measurement unusual?
  – Tall or short; heavy or light; fast or slow
  – ‘good’ or ‘bad’; homologous or non-homologous


The Basic Idea

• ‘Unusualness’ is only definable if we know what ‘usual’ is.
  – Make lots of random measurements
  – Model the ‘background’ distribution
  – Compare measurements of interest to the background
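The three steps above can be sketched in a few lines of Python. This is only an illustration, not the method used in the talk: the Gaussian background, the sample size, and the helper name `empirical_p_value` are all assumptions made for the example.

```python
import random

def empirical_p_value(background, observed):
    """Fraction of background measurements at least as large as `observed`.

    The +1 in numerator and denominator keeps the estimate from being
    exactly zero when no background value reaches `observed`.
    """
    hits = sum(1 for x in background if x >= observed)
    return (hits + 1) / (len(background) + 1)

rng = random.Random(0)

# Step 1: make lots of random measurements (hypothetical Gaussian noise).
background = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Steps 2 and 3: use the sample itself as the background model and
# compare measurements of interest against it.
print(empirical_p_value(background, 0.0))  # a 'usual' value: large p
print(empirical_p_value(background, 4.0))  # an 'unusual' value: small p
```

An empirical estimate like this cannot resolve p-values much smaller than one over the number of background measurements, which is one reason to fit a parametric background distribution instead.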


What’s the Point?

• Are our results good / bad / the same / meaningful / garbage?

• Consultation with an oracle
  – Definitive
  – Elusive

• Magic eight ball
  – Readily available
  – Inconsistent results


What’s the Point?

• Statistics
  – Readily available
  – Reproducible
  – Provide an estimate of how likely it is that a better / worse / same result could be obtained by chance.


Background Distributions

• Gaussian – sum of independent variables (central limit theorem)

• Extreme Value Distributions – optimization procedures


Extreme Value Distributions – a Brief Review

• Extreme value distributions often result from optimization procedures
  – Sequence alignments (BLAST, SW)
  – Viterbi algorithm (HMMER, SAM)

• EVDs are skewed

• EVDs have a ‘heavy’ tail
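A small simulation can make the contrast between the two background distributions concrete. The sketch below is illustrative only: the choice of uniform terms for the sums, exponential terms for the maxima, and the trial counts are assumptions, and "taking the maximum" stands in for a generic optimization procedure.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: mean cubed deviation in units of the standard deviation."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(1)
n_trials, n_terms = 5000, 50

# Summing many independent variables: approximately Gaussian, symmetric.
sums = [sum(rng.random() for _ in range(n_terms)) for _ in range(n_trials)]

# Keeping only the best (largest) of many variables: an extreme value
# distribution, skewed with a long right tail.
maxima = [max(rng.expovariate(1.0) for _ in range(n_terms))
          for _ in range(n_trials)]

print("skewness of sums:  ", skewness(sums))    # close to 0
print("skewness of maxima:", skewness(maxima))  # clearly positive
```

The practical consequence is that a score far out in the right tail is much less surprising under an EVD than under a Gaussian with the same mean and variance.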


Real World Example

• Protein structure alignment
  – Identify equivalent backbone positions in two proteins
  – Maximize the number of equivalent pairs
  – Minimize the distance between pairs

• K2
  – Target function to evaluate alignments
  – Searches for the best alignment
    • Dynamic programming
    • Weighted bipartite matching
    • Genetic algorithm
    • Simulated annealing
    • Kitchen sink (in progress)


Serine Proteases

• Yellow – human protease

• Red – viral coat protein

• Asp-His-Ser catalytic triad shown in balls and sticks


DNA Methyltransferases

[Figure: DNA methyltransferases M.TaqI and M.PvuII, each annotated with labels 1–7, A–E and Z and with the N and C termini marked]


Background Distribution

Extreme value distribution:

$$Z = \lambda S - \ln(\kappa L)$$

$$\rho(Z) = e^{-Z}\, e^{-e^{-Z}}$$

P-Value:

$$P(Z) = 1 - e^{-e^{-Z}}$$
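For a normalized score Z that follows the standard extreme value (Gumbel) form, the density is exp(−Z − e^(−Z)) and the p-value, the chance of a background score at least as large as Z, is 1 − exp(−e^(−Z)). A minimal Python sketch of both follows; the function names are hypothetical, and the code assumes Z has already been normalized.

```python
import math

def evd_density(z):
    """Standard Gumbel density: exp(-z - e^(-z))."""
    if z < -700:           # e^(-z) would overflow; the density is ~0 here
        return 0.0
    return math.exp(-z - math.exp(-z))

def evd_p_value(z):
    """P(background score >= z) = 1 - exp(-e^(-z))."""
    if z < -700:           # e^(-z) would overflow; the p-value is ~1 here
        return 1.0
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny,
    # which is exactly the regime of interesting (small) p-values.
    return -math.expm1(-math.exp(-z))

for z in (0.0, 2.0, 5.0, 10.0):
    print(f"Z = {z:4.1f}  density = {evd_density(z):.3e}  p = {evd_p_value(z):.3e}")
```

For large Z the p-value is very nearly e^(−Z), so each additional unit of normalized score lowers the p-value by roughly a factor of e.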


Table 2: K2 and BLAST Sensitivities

Positive Set                 LL      LR       UL   UR      Total    K2 Sensitivity   BLAST Sensitivity
Fold, 95% Identity           31909   133709   79   34092   199789   83%              16%
Fold, 40% Identity           1582    22237    8    7927    31754    75%              5%
Superfamily, 95% Identity    31909   95722    79   16541   144251   89%              22%
Superfamily, 40% Identity    1582    11519    8    2873    15982    82%              10%
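The arithmetic behind the table can be checked with a short script. The slides do not define the LL, LR, UL and UR columns, so the reading used here is an assumption inferred from the numbers alone: LL + LR is taken as the pairs detected by K2 and LL + UL as the pairs detected by BLAST. Under that reading the four counts sum to the reported totals exactly, and the computed sensitivities agree with the reported percentages to within about one percentage point.

```python
# Counts copied from Table 2: (LL, LR, UL, UR, Total).
table2 = {
    "Fold, 95% identity":        (31909, 133709, 79, 34092, 199789),
    "Fold, 40% identity":        (1582, 22237, 8, 7927, 31754),
    "Superfamily, 95% identity": (31909, 95722, 79, 16541, 144251),
    "Superfamily, 40% identity": (1582, 11519, 8, 2873, 15982),
}

def sensitivities(ll, lr, ul, ur):
    """Return (total, K2 sensitivity, BLAST sensitivity).

    Assumed reading, not stated in the slides:
    K2 detects the LL and LR pairs, BLAST detects the LL and UL pairs.
    """
    total = ll + lr + ul + ur
    return total, (ll + lr) / total, (ll + ul) / total

for name, (ll, lr, ul, ur, reported_total) in table2.items():
    total, k2, blast = sensitivities(ll, lr, ul, ur)
    assert total == reported_total  # the four counts add up to the Total column
    print(f"{name}: K2 {k2:.1%}, BLAST {blast:.1%}")
```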
