Protein Modeling using Machine Learning...

Post on 22-May-2020

Machine Learning for Protein Modeling

Presented by Ellen Huynh

What are Proteins?

• Complex organic compounds of high molecular mass

• Consist of amino acids (aa’s) joined in a specific order by peptide bonds

• The order of the aa’s is determined by the base sequence of nucleotides in the gene that codes for the protein
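
The step from gene to amino acid order can be illustrated with a minimal Python sketch. It assumes the standard genetic code and a DNA coding strand read from its first base. The function name `translate` is hypothetical and not part of the presentation.

```python
from itertools import product

# Standard genetic code, with codons enumerated using the base order T, C, A, G
# (third position varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a coding-strand DNA sequence into one-letter amino acid codes."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":  # a stop codon ends the polypeptide
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCTGGTAA"))  # prints MAW
```

Each group of three bases selects one amino acid, so changing the base sequence changes the primary structure.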

Why Study Proteins?

• Proteins are required for the structure, function and regulation of the body’s cells, tissues and organs

• Each protein has a unique function, determined by its structure

• Examples of proteins include enzymes, hormones and antibodies

Protein Structure (1/3)

• Primary

– Amino acid sequence of the polypeptide chain (linear)

– Determined by the gene that encodes it

• Secondary

– Three types: α-helix, β-sheets, coils

– Local ordered structure brought about via hydrogen bonding, mainly within the peptide backbone

– α-helix: backbone H-bonds link residues i and i+4

– β-sheets: H-bonds link two sequence segments
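
The i to i+4 rule for helices can be made concrete with a small Python sketch. It assumes an idealized helical segment in which every possible backbone H-bond forms. The helper name `helix_hbond_pairs` is hypothetical.

```python
def helix_hbond_pairs(start, end):
    """Return the (i, i + 4) residue pairs linked by backbone H-bonds in an
    idealized helix spanning residues start..end (inclusive)."""
    # Residue i can only pair with i + 4 if i + 4 is still inside the segment.
    return [(i, i + 4) for i in range(start, end - 3)]

print(helix_hbond_pairs(1, 8))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

A segment shorter than five residues yields no pairs, which is one reason very short helices are hard to stabilize.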

Protein Structure (2/3)

• Tertiary

– "Global" folding of a single polypeptide chain

– The driving force in determining the tertiary structure of globular proteins is the hydrophobic effect

– The chain folds so that the side chains of the nonpolar amino acids are "hidden" within the structure and the side chains of the polar residues are exposed on the outer surface

Protein Structure (3/3)

• Quaternary

– Involves two or more polypeptide chains that form a multi-subunit structure

Pre-Machine Learning Methods

• X-ray crystallography and NMR spectroscopy were used to determine the structure and function of proteins

• These methods were costly and time-consuming

Goal

• Increase the accuracy of protein structure prediction, mainly at the secondary-structure level, in an effective manner, to help improve the understanding of protein function

Machine Learning Methods (1/2)

• Neural Networks

– Trained pairwise neural networks

– Networks are initialized with random, uniformly distributed weights and then trained through backpropagation

• Hidden Markov Model

– Models stochastic sequences with a probabilistic finite-state machine

– The character in position t depends only on the k preceding characters, where k is the order of the Markov chain

– Hidden process: secondary structure of the protein

– Observed process: amino acid sequence

– Prediction is achieved with the forward/backward algorithm
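
A minimal Python sketch of forward/backward prediction with a hidden Markov model is given here. It is an illustration under several assumptions: a first-order chain (k = 1), hand-picked toy probabilities rather than trained values, and a two-letter observation alphabet ('h' for hydrophobic, 'p' for hydrophilic) in place of the full amino acid alphabet. All names are hypothetical.

```python
STATES = ("H", "B", "C")  # helix, sheet, coil

# Toy parameters chosen by hand for illustration only; not trained values.
START = {"H": 0.3, "B": 0.3, "C": 0.4}
TRANS = {
    "H": {"H": 0.80, "B": 0.05, "C": 0.15},
    "B": {"H": 0.05, "B": 0.80, "C": 0.15},
    "C": {"H": 0.20, "B": 0.20, "C": 0.60},
}
EMIT = {
    "H": {"h": 0.5, "p": 0.5},
    "B": {"h": 0.7, "p": 0.3},
    "C": {"h": 0.3, "p": 0.7},
}

def forward_backward(obs):
    """Return, for each position, the posterior P(state | whole sequence)."""
    n = len(obs)
    # Forward pass: fwd[t][s] = P(obs[0..t], state at t is s)
    fwd = [{s: START[s] * EMIT[s][obs[0]] for s in STATES}]
    for t in range(1, n):
        fwd.append({
            s: EMIT[s][obs[t]] * sum(fwd[t - 1][r] * TRANS[r][s] for r in STATES)
            for s in STATES
        })
    # Backward pass: bwd[t][s] = P(obs[t+1..] | state at t is s)
    bwd = [dict.fromkeys(STATES, 1.0) for _ in range(n)]
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            s: sum(TRANS[s][r] * EMIT[r][obs[t + 1]] * bwd[t + 1][r] for r in STATES)
            for s in STATES
        }
    total = sum(fwd[-1].values())  # likelihood of the whole sequence
    return [{s: fwd[t][s] * bwd[t][s] / total for s in STATES} for t in range(n)]

def predict(obs):
    """Pick the most probable hidden state at each position."""
    return "".join(max(p, key=p.get) for p in forward_backward(obs))

print(predict("hhhhpppp"))
```

The sketch multiplies raw probabilities, so long sequences would underflow. A practical version would rescale each step or work in log space.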

Protein Alphabets

• Structural Alphabet: 20 amino acids

• Chemical Alphabet: acidic, aliphatic, amide, aromatic, basic, hydroxyl, etc.

• Functional Alphabet: acidic, basic, hydrophobic nonpolar, polar uncharged

• Charge Alphabet: acidic, basic, neutral

• Hydrophobic Alphabet: hydrophobic, hydrophilic
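
Reducing a sequence from the 20-letter alphabet to a smaller one can be sketched in Python as follows. The charge grouping (D, E acidic; K, R, H basic) is standard. The hydrophobic grouping is one common choice and is an assumption here, since sources classify residues such as G, C, Y and W differently. Function names are hypothetical.

```python
# Charge alphabet: acidic (a), basic (b), neutral (n).
ACIDIC = set("DE")
BASIC = set("KRH")

# Hydrophobic alphabet: hydrophobic (h) or hydrophilic (p).
# Assumed grouping; other sources draw the boundary differently.
HYDROPHOBIC = set("AVLIMFWPG")

def to_charge_alphabet(seq):
    """Map each residue to 'a', 'b' or 'n'."""
    return "".join(
        "a" if aa in ACIDIC else "b" if aa in BASIC else "n" for aa in seq
    )

def to_hydrophobic_alphabet(seq):
    """Map each residue to 'h' or 'p'."""
    return "".join("h" if aa in HYDROPHOBIC else "p" for aa in seq)

seq = "MKDLEA"
print(to_charge_alphabet(seq))       # nbanan
print(to_hydrophobic_alphabet(seq))  # hpphph
```

A smaller alphabet means fewer parameters to train, at the cost of discarding distinctions between residues in the same group.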

Attributes

• Window size (W) that covers a relevant stretch of the sequence

• Input: protein sequence p = p1p2…pn

• Output: α-helix (H), β-sheet (B), coil (C)

• Data Set: www.pdb.org

• Trained weights: determined by the alphabets listed above and the data set
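
The window attribute can be illustrated with a Python sketch that turns a sequence into one input vector per residue. The one-hot encoding, the odd window size and the end padding are assumptions made for illustration. The presentation does not specify how windows are encoded, and the names `one_hot` and `windows` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def one_hot(aa):
    """Encode one residue as a 21-element vector; the last slot marks padding
    or any symbol outside the 20 standard residues."""
    vec = [0] * (len(AMINO_ACIDS) + 1)
    idx = AMINO_ACIDS.find(aa)
    vec[idx if idx >= 0 else len(AMINO_ACIDS)] = 1
    return vec

def windows(seq, w):
    """Yield one flattened input vector per residue, built from a window of
    odd size w centred on that residue and padded at the sequence ends."""
    if w % 2 == 0:
        raise ValueError("window size must be odd")
    half = w // 2
    padded = "-" * half + seq + "-" * half
    for i in range(len(seq)):
        window = padded[i:i + w]
        yield [bit for aa in window for bit in one_hot(aa)]

features = list(windows("MKDLEA", w=5))
print(len(features), len(features[0]))  # 6 105
```

Each vector would be paired with the H, B or C label of the central residue to form one training example.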

Additional Information

• How far into Project?

– Researched possible algorithms that could be implemented

• Risk?
