Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Andrew Kernytsky, Burkhard Rost, Columbia University

Page 1: Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Andrew Kernytsky, Burkhard Rost

Columbia University

Page 2: Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Enzyme function prediction

Given a protein sequence, predict its Enzyme Commission (EC) number

NC-IUBMB (1992) Recommendations of the International Union of Biochemistry on the Nomenclature and Classification of Enzymes. In: Enzyme Nomenclature. Academic Press, New York.

EC Wheel Figure: Porter CT, Bartlett GJ, Thornton JM. Nucleic Acids Res. 2004 January 1; 32: D129–D133.

EC wheel, top-level classes: Oxidoreductases, Transferases, Hydrolases, Lyases, Isomerases, Ligases
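To make the prediction target concrete, here is a minimal sketch of how an EC number breaks down into hierarchy levels. The class numbering (1 to 6) follows the standard EC scheme rather than anything printed on the slide, and the helper names `parse_ec`, `ec_prefix` and `top_level_class` are hypothetical, chosen only for illustration.

```python
# Top-level EC classes named on the slide; the numeric keys follow the
# standard EC scheme (an addition, not shown in the transcript).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec_number: str) -> list[str]:
    """Split an EC number such as 'EC 3.4.21.4' into its four level fields."""
    parts = ec_number.strip().removeprefix("EC").strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated fields, got {ec_number!r}")
    return parts


def ec_prefix(ec_number: str, level: int) -> str:
    """Return the label at a given EC level (1 to 4), e.g. level 2 -> '3.4'."""
    if not 1 <= level <= 4:
        raise ValueError("level must be between 1 and 4")
    return ".".join(parse_ec(ec_number)[:level])


def top_level_class(ec_number: str) -> str:
    """Map an EC number to the name of its top-level class."""
    return EC_CLASSES[int(parse_ec(ec_number)[0])]


print(top_level_class("EC 3.4.21.4"), ec_prefix("3.4.21.4", 2))
```

Predicting at a shallower level means predicting a shorter prefix, so a classifier for level 1 has six possible labels while deeper levels have many more.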

Page 3: Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Intersection properties capture local information

Example per-residue tracks (one column per residue):

AA      TAGHCVNYDYGAGCQSGSPV
Acc     bbbbbieeeiibbieeeeee
Cons    ..|....|......||....
Feat 4  HHHEEEEELLEEEEELLLLL
Feat 5  iiibbbbbbboooobbbbbb
Feat 6  36788842100000000123

[Figure: panels "All Global" and "All Intersection"; tick labels 20%, 10%, 5%, 1%, 0.1%, 0.01%]

Limited local information

Significant risk of overfitting during training: 10^3+ features > 10^2 positive samples
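The difference between a global property and an intersection property can be sketched with the AA and Acc tracks from this slide. The transcript does not say how the features are actually computed, so this example assumes simple composition frequencies; the function name `composition` and the `×`-joined key format are illustrative choices.

```python
from collections import Counter

# Tracks copied from the slide: amino acids and accessibility states.
aa = "TAGHCVNYDYGAGCQSGSPV"
acc = "bbbbbieeeiibbieeeeee"


def composition(*tracks: str) -> dict[str, float]:
    """Frequency of each per-residue symbol combination across aligned tracks.

    One track gives a global composition (e.g. fraction of G).
    Several tracks give an intersection composition (e.g. fraction of
    residues that are G and in state 'b' at the same position).
    """
    if len({len(t) for t in tracks}) != 1:
        raise ValueError("tracks must be aligned and of equal length")
    n = len(tracks[0])
    counts = Counter(zip(*tracks))
    return {"×".join(key): count / n for key, count in counts.items()}


global_aa = composition(aa)        # assumed form of a global feature class
aa_x_acc = composition(aa, acc)    # assumed form of an intersection class

print(global_aa["G"], aa_x_acc["G×b"])
```

Under this assumption the number of possible features multiplies with every property added to an intersection (20 amino acids times 3 accessibility states already gives 60), which is how the feature count can outgrow the number of positive samples.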

Page 4: Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Algorithm overview

Protein sequence: MSNLLKDFEVAQC

All intersection and global feature classes: AA, sec, AA×sec

All possible combinations of feature classes [genomes]: AA; sec; AA×sec; AA, sec; AA, AA×sec; sec, AA×sec; AA, sec, AA×sec

Genetic Algorithm (GA evolution): 1st, 2nd, 3rd, 4th generation genome populations

Example genomes entering fitness assessment: AA; AA×sec; sec, AA×sec

Inner learning algorithm: SVM or neural network

Fitness assessed (example values): 0.635, 0.688, 0.677

Selection, crossover, mutation

Example genomes after selection, crossover, mutation: AA×sec; AA, AA×sec; sec, AA×sec
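The loop on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slides do not specify the selection scheme, crossover operator, mutation rate or population size, so tournament selection, uniform crossover and the parameter values here are placeholders. `toy_fitness` is a hypothetical stand-in for the score that the inner learner (SVM or neural network) would return for a genome.

```python
import random

# Illustrative subset of feature classes; "×" marks an intersection class.
FEATURE_CLASSES = ["AA", "sec", "acc", "AA×sec", "acc×sec"]


def random_genome(rng):
    """A genome is a non-empty set of feature classes."""
    genome = frozenset(c for c in FEATURE_CLASSES if rng.random() < 0.5)
    return genome or frozenset([rng.choice(FEATURE_CLASSES)])


def crossover(a, b, rng):
    """Uniform crossover (assumed): each class is inherited from one parent."""
    child = frozenset(
        c for c in FEATURE_CLASSES if c in (a if rng.random() < 0.5 else b)
    )
    return child or a  # never return an empty genome


def mutate(genome, rng, rate=0.1):
    """Flip membership of each class with a small probability (assumed rate)."""
    flipped = {c for c in FEATURE_CLASSES if rng.random() < rate}
    return frozenset(genome ^ flipped) or genome


def evolve(fitness, generations=4, pop_size=8, seed=0):
    """Run the select / crossover / mutate loop and return the best genome seen."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations - 1):
        def pick():
            # Tournament selection of size 2 (an assumption, not from the slides).
            return max(rng.sample(population, 2), key=fitness)
        population = [
            mutate(crossover(pick(), pick(), rng), rng) for _ in range(pop_size)
        ]
        best = max(population + [best], key=fitness)
    return best


def toy_fitness(genome):
    """Hypothetical stand-in for the inner learner's score on held-out data."""
    target = {"AA", "acc×sec"}
    return len(genome & target) - 0.25 * len(genome - target)


best = evolve(toy_fitness)
print(sorted(best))
```

In a real run the fitness call would train and evaluate the inner learner on the features selected by the genome, so caching scores per genome would likely matter far more than it does in this toy version.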

Page 5: Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

GA improves performance

[Figure; axis label: EC level]

Page 6: Genetic Algorithms Select Protein Features Most Predictive of Enzyme Function

Balance between intersection and global features gives best performance

AA, acc, sec, htm, cons-95

AA, acc, sec, cons-95

AA, acc, acc×sec, htm, cons-95

AA, sec, cons-97

AA, acc×sec, sec, cons-95

AA, acc, acc×sec×cons-94, sec

AA, AA×acc×sec×cons-95, sec, cons-95

AA, sec

AA, acc, sec×cons-94, cons-83×cons-94

AA, acc×sec×cons-89, cons-95

AA, acc×sec×cons-84×cons-94, sec

AA×acc×htm×cons-84×cons-95, acc, cons-94

AA

AA, acc×cons-96, sec×cons-91

AA, acc×sec×cons-94, acc×cons-94

AA, acc×sec×cons-84×cons-94

AA, acc×sec×cons-88×cons-91×cons-95

AA×cons-94, acc×cons-94

AA×cons-82, acc×sec×cons-94

AA×cons-82, acc×sec×cons-94×cons-96

AA×sec×htm×cons-95×cons-96
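The genomes listed above can be read mechanically to see how each one mixes global and intersection classes. The sketch below assumes that a comma separates feature classes and that `×` joins the properties of an intersection class; labels such as `cons-95` are treated as opaque names because the transcript does not define them. The function names are hypothetical.

```python
def parse_genome(line: str) -> list[tuple[str, ...]]:
    """Split a genome line into feature classes, each a tuple of properties."""
    return [
        tuple(part.strip() for part in cls.split("×"))
        for cls in line.split(",")
        if cls.strip()
    ]


def summarize(line: str) -> dict[str, int]:
    """Count global (single-property) and intersection (multi-property) classes."""
    classes = parse_genome(line)
    n_intersection = sum(1 for c in classes if len(c) > 1)
    return {
        "global": len(classes) - n_intersection,
        "intersection": n_intersection,
    }


for genome in [
    "AA, acc, sec, htm, cons-95",
    "AA, acc, acc×sec×cons-94, sec",
    "AA×sec×htm×cons-95×cons-96",
]:
    print(genome, summarize(genome))
```

Applied to the whole list, a tally like this would show which genomes use only global classes, only intersection classes, or a mix of both.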