Protein Modeling using Machine Learning...

Machine Learning For Protein Modeling Presented By Ellen Huynh


Page 1: Protein Modeling using Machine Learning Methodspages.cpsc.ucalgary.ca/~mrichter/ML/Older/midtermexamples...exposed on the outer surface Protein Structure (3/3) •Quaternary –Involves

Machine Learning For Protein Modeling

Presented By Ellen Huynh


What are Proteins?

• Complex, high-molecular-mass organic compounds

• Consist of a specific sequence of amino acids (aa’s) joined together by peptide bonds

• The order of the aa’s is determined by the base sequence of nucleotides in the gene that codes for the protein
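To make the gene-to-sequence relationship concrete, here is a minimal Python sketch that translates a coding DNA sequence into its amino-acid sequence using the standard genetic code. The function name `translate` is hypothetical; the sketch assumes a single reading frame starting at the first base and stops at the first stop codon.

```python
# Standard genetic code. Codons are ordered by first, second, third base
# over the alphabet "TCAG", so a codon's index is 16*first + 4*second + third.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence (read in frame from position 0) into
    one-letter amino-acid codes, stopping at the first stop codon ('*')."""
    dna = dna.upper()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[pos:pos + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCTAA"))  # → MA
```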


Why Study Proteins?

• Proteins are required for the structure, function and regulation of the body’s cells, tissues and organs

• Each protein has a unique function, determined by its structure

• Examples of proteins include enzymes, hormones and antibodies


Protein Structure (1/3)

• Primary

– Amino acid sequence of the polypeptide chain (linear)

– Determined by the gene that encodes it

• Secondary

– Three types: α-helices, β-sheets, coils

– Local ordered structure brought about via hydrogen bonding, mainly within the peptide backbone

– α-helix: backbone H-bonds link residues i and i+4

– β-sheets: H-bonds link two sequence segments (strands), which may lie far apart in the sequence
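The i → i+4 pattern can be written out directly. The sketch below lists the backbone hydrogen-bond pairs expected in an idealized α-helix spanning residues `start` to `end` (inclusive); the function name is hypothetical, and real helices deviate from this ideal, especially at their ends.

```python
def helix_hbond_pairs(start: int, end: int) -> list[tuple[int, int]]:
    """Return (i, i+4) pairs for an idealized alpha-helix covering residues
    start..end inclusive: the backbone C=O of residue i accepts an H-bond
    from the backbone N-H of residue i+4."""
    return [(i, i + 4) for i in range(start, end - 3)]


print(helix_hbond_pairs(1, 8))  # → [(1, 5), (2, 6), (3, 7), (4, 8)]
```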


Protein Structure (2/3)

• Tertiary

– "Global" folding of a single polypeptide chain

– The driving force in determining the tertiary structure of globular proteins is the hydrophobic effect

– The chain folds so that the side chains of nonpolar amino acids are "hidden" within the structure and the side chains of polar residues are exposed on the outer surface


Protein Structure (3/3)

• Quaternary

– Involves two or more polypeptide chains assembling into a multi-subunit structure


Pre-Machine Learning Methods

• X-ray crystallography and NMR spectroscopy were used to determine the structure and function of proteins

• These methods were costly and time-consuming


Goal

• Increase the accuracy of protein structure prediction, mainly at the secondary-structure level, in an efficient way, to help improve the understanding of protein functions
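Accuracy of three-state secondary-structure prediction is commonly reported as Q3: the percentage of residues whose predicted state (H, B or C) matches the observed one. A minimal sketch, assuming both label strings are already aligned residue by residue (the function name `q3` is our own):

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state label matches
    the observed label (Q3 accuracy)."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("label strings must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHCC", "HHCB"))  # → 75.0
```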


Machine Learning Methods (1/2)

• Neural Networks

– Trained pairwise neural networks

– Networks are initialized with weights drawn from a uniform random distribution and subsequently trained through backpropagation

• Hidden Markov Model

– Models stochastic sequences with a probabilistic finite-state machine

– In a Markov chain of order k, the character at position t depends only on the k preceding characters

– Hidden process: secondary structure of the protein

– Observed process: amino acid sequence

– Prediction is achieved with the forward-backward algorithm
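As a rough illustration of the neural-network approach (random uniform initialization followed by backpropagation), the sketch below trains a one-hidden-layer network with a softmax output over the three states H, B and C. It is a generic feed-forward network, not a reproduction of the pairwise networks mentioned above; the layer sizes, learning rate, helper names and toy data are all assumptions, and NumPy is assumed to be available.

```python
import numpy as np


def init_params(n_in, n_hidden, n_out, rng, scale=0.1):
    """Draw weights uniformly from [-scale, scale]; biases start at zero."""
    return {
        "W1": rng.uniform(-scale, scale, size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.uniform(-scale, scale, size=(n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def forward(params, X):
    """Return hidden (tanh) activations and softmax class probabilities."""
    h = np.tanh(X @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return h, probs


def cross_entropy(params, X, y):
    """Mean negative log-likelihood of the true class labels y."""
    _, probs = forward(params, X)
    return -np.mean(np.log(probs[np.arange(len(y)), y]))


def backprop_step(params, X, y, lr=0.1):
    """One full-batch gradient-descent step using backpropagated gradients."""
    n = len(y)
    h, probs = forward(params, X)
    d_logits = probs.copy()
    d_logits[np.arange(n), y] -= 1.0  # gradient of softmax + cross-entropy
    d_logits /= n
    grads = {"W2": h.T @ d_logits, "b2": d_logits.sum(axis=0)}
    d_hidden = (d_logits @ params["W2"].T) * (1.0 - h ** 2)  # tanh derivative
    grads["W1"] = X.T @ d_hidden
    grads["b1"] = d_hidden.sum(axis=0)
    for name, grad in grads.items():  # update only after all grads are computed
        params[name] -= lr * grad


# Toy data (placeholder): random binary feature vectors, labels 0=H, 1=B, 2=C.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 20)).astype(float)
y = rng.integers(0, 3, size=60)
params = init_params(20, 16, 3, rng)
before = cross_entropy(params, X, y)
for _ in range(300):
    backprop_step(params, X, y)
after = cross_entropy(params, X, y)
print(f"loss before {before:.3f}, after {after:.3f}")
```

In a real setup, the rows of `X` would be encoded sequence windows and `y` the observed secondary-structure states; here both are random placeholders, so the only thing the demo shows is that training lowers the loss.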
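The forward-backward step can be sketched for a first-order HMM whose hidden states are the secondary-structure classes H, B and C and whose observations are symbols derived from the amino-acid sequence (here a two-letter hydrophobic/hydrophilic alphabet, to keep the example small). All parameter values below are made-up placeholders, not trained estimates, and NumPy is assumed. The function returns, for each position, the posterior probability of each hidden state; taking the most probable state per position gives a prediction.

```python
import numpy as np


def forward_backward(obs, start, trans, emit):
    """Posterior state probabilities P(state at t | whole sequence) for an HMM.

    obs   : sequence of observation indices, length T
    start : (S,) initial state probabilities
    trans : (S, S) with trans[i, j] = P(next state j | current state i)
    emit  : (S, V) with emit[i, v] = P(symbol v | state i)
    Per-step scaling avoids numerical underflow on long sequences.
    """
    T, S = len(obs), len(start)
    alpha = np.zeros((T, S))
    beta = np.ones((T, S))
    scale = np.zeros(T)

    # Forward pass: alpha[t, j] is proportional to P(o_1..o_t, state_t = j).
    alpha[0] = start * emit[:, obs[0]]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emit[:, obs[t]]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass: beta[t, i] is proportional to P(o_{t+1}..o_T | state_t = i).
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emit[:, obs[t + 1]] * beta[t + 1])
        beta[t] /= scale[t + 1]

    posterior = alpha * beta
    return posterior / posterior.sum(axis=1, keepdims=True)


STATES = "HBC"
SYMBOLS = {"h": 0, "p": 1}  # placeholder alphabet: h = hydrophobic, p = hydrophilic
start = np.array([0.3, 0.3, 0.4])
trans = np.array([[0.80, 0.05, 0.15],
                  [0.05, 0.80, 0.15],
                  [0.20, 0.20, 0.60]])
emit = np.array([[0.6, 0.4],
                 [0.7, 0.3],
                 [0.3, 0.7]])

obs = [SYMBOLS[c] for c in "hhphhpppph"]
post = forward_backward(obs, start, trans, emit)
prediction = "".join(STATES[i] for i in post.argmax(axis=1))
print(prediction)
```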


Protein Alphabets

• Structural Alphabet: 20 amino acids

• Chemical Alphabet: acidic, aliphatic, amide, aromatic, basic, hydroxyl, etc.

• Functional Alphabet: acidic, basic, hydrophobic nonpolar, polar uncharged

• Charge Alphabet: acidic, basic, neutral

• Hydrophobic Alphabet: hydrophobic, hydrophilic
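Mapping a sequence onto one of these reduced alphabets is a per-residue lookup. The sketch below implements the charge and hydrophobic alphabets. The group memberships are assumptions (classifications vary between sources, e.g. whether His counts as basic, or whether Gly, Pro or Tyr count as hydrophobic), and the one-character codes and function name are our own.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Charge alphabet: '-' acidic, '+' basic, '0' neutral.
# Assumption: His is treated as basic.
CHARGE = {aa: "0" for aa in AMINO_ACIDS}
CHARGE.update({aa: "-" for aa in "DE"})
CHARGE.update({aa: "+" for aa in "KRH"})

# Hydrophobic alphabet: 'h' hydrophobic, 'p' hydrophilic.
# Assumption: one common choice of hydrophobic set; other sources differ.
HYDROPHOBIC_SET = set("AVILMFWC")
HYDROPHOBIC = {aa: ("h" if aa in HYDROPHOBIC_SET else "p") for aa in AMINO_ACIDS}


def reduce_sequence(seq: str, alphabet: dict[str, str]) -> str:
    """Re-encode a one-letter amino-acid sequence in a reduced alphabet."""
    return "".join(alphabet[aa] for aa in seq.upper())


print(reduce_sequence("MKDLE", CHARGE))       # → 0+-0-
print(reduce_sequence("MKDLE", HYDROPHOBIC))  # → hpphp
```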


Attributes

• Window size (W): the number of consecutive residues covered by each input segment

• Input: protein sequence p = p1p2…pn

• Output: α-helix (H), β-sheet (B), coil (C)

• Data set: www.pdb.org

• Trained weights: determined by the previously chosen alphabet and the data set
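A common way to turn the input sequence into fixed-size examples is a sliding window: for each residue, the W residues centered on it are one-hot encoded and concatenated, with a padding symbol beyond the chain ends, and the label is that residue's state (H, B or C). A minimal sketch, assuming an odd W, the 20-letter alphabet plus one padding symbol, and hypothetical function and variable names; NumPy is assumed.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids
PAD = len(ALPHABET)               # extra index for positions past either chain end
STATE_INDEX = {"H": 0, "B": 1, "C": 2}


def encode_windows(seq: str, w: int) -> np.ndarray:
    """One-hot encode a window of w residues centered on each position.

    Returns an array of shape (len(seq), w * 21).
    """
    if w % 2 == 0:
        raise ValueError("window size w must be odd")
    half = w // 2
    width = len(ALPHABET) + 1  # 20 residues + padding symbol
    idx = [ALPHABET.index(aa) for aa in seq.upper()]
    padded = [PAD] * half + idx + [PAD] * half
    X = np.zeros((len(seq), w * width))
    for i in range(len(seq)):
        for j, code in enumerate(padded[i:i + w]):
            X[i, j * width + code] = 1.0
    return X


def encode_labels(states: str) -> np.ndarray:
    """Map a string of H/B/C labels to integer class indices."""
    return np.array([STATE_INDEX[s] for s in states])


X = encode_windows("MKVLA", w=3)
y = encode_labels("CHHHC")
print(X.shape, y)  # → (5, 63) [2 0 0 0 2]
```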


Additional Information

• How far into the project?

– Researched possible algorithms that could be implemented

• Risk?


References

• Baldi, P., Brunak, S. (1998). “Bioinformatics: The Machine Learning Approach.” The MIT Press.

• Gorga, F.R. (2001). “Introduction to Protein Structure.” http://webhost.bridgew.edu/fgorga/proteins/default.htm

• Martin, J., Gibrat, J., Rodolphe, F. “Hidden Markov Model for Protein Secondary Structure.”

• Won, K., Hamelryck, T., Prugel-Bennett, A., Krogh, A. “Evolving Hidden Markov Models for Protein Secondary Structure Prediction.”

• Zhang, B., Zhihang, C., Murphey, Y.L. (2005). “Protein Secondary Structure Prediction Using Machine Learning.”