Protein Folding Bioinformatics Ch 7 (with a little of Ch 8)


The Protein Folding Problem

• Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?”

• Input: AAVIKYGCAL…

• Output: φ1ψ1, φ2ψ2… = backbone conformation (no side chains yet)

Disulfide Bonds

• Two cysteines in close proximity will form a covalent bond

• Disulfide bond, disulfide bridge, or dicysteine bond.

• Significantly stabilizes tertiary structure.

Protein Folding – Biological perspective

• “Central dogma”: Sequence specifies structure

• Denature – to “unfold” a protein back to random coil configuration
  – β-mercaptoethanol – breaks disulfide bonds
  – Urea or guanidine hydrochloride – denaturant
  – Also heat or pH

• Anfinsen’s experiments
  – Denatured ribonuclease
  – Spontaneously regained enzymatic activity
  – Evidence that it re-folded to native conformation

Folding intermediates

• Levinthal’s paradox
  – Consider a 100-residue protein. If each residue can take only 3 positions, there are 3^100 ≈ 5 × 10^47 possible conformations.
  – If it takes 10^-13 s to convert from one structure to another, exhaustive search would take 1.6 × 10^27 years!

• Folding must proceed by progressive stabilization of intermediates
  – Molten globules – most secondary structure formed, but much less compact than “native” conformation.
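The Levinthal estimate above can be checked with a few lines of arithmetic. This is only a sketch: the function name and the 365.25-day year are assumptions, while the three inputs (100 residues, 3 positions each, 10^-13 s per conversion) come from the slide.

```python
# Back-of-the-envelope check of Levinthal's numbers.
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year

def levinthal_years(n_residues=100, states_per_residue=3, seconds_per_step=1e-13):
    """Return (number of conformations, years needed to visit them all)."""
    conformations = states_per_residue ** n_residues
    years = conformations * seconds_per_step / SECONDS_PER_YEAR
    return conformations, years

conformations, years = levinthal_years()
print(f"{conformations:.1e} conformations, {years:.1e} years")
```

The result is on the order of 5 × 10^47 conformations and 1.6 × 10^27 years, which is why folding cannot be an exhaustive search.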

Forces driving protein folding

• It is believed that hydrophobic collapse is a key driving force for protein folding
  – Hydrophobic core
  – Polar surface interacting with solvent

• Minimum volume (no cavities)

• Disulfide bond formation stabilizes the fold

• Hydrogen bonds

• Polar and electrostatic interactions

Folding help

• Proteins are, in fact, only marginally stable
  – Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form

• Many proteins help in folding
  – Protein disulfide isomerase – catalyzes shuffling of disulfide bonds
  – Chaperones – break up aggregates and (in theory) unfold misfolded proteins

The Hydrophobic Core

• Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.

• The mutation E6V in the β chain places a hydrophobic Val on the surface of hemoglobin

• The resulting “sticky patch” causes hemoglobin S to agglutinate (stick together) and form fibers which deform the red blood cell and do not carry oxygen efficiently

• Sickle cell anemia was the first identified molecular disease

Sickle Cell Anemia

Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.

Computational Problems in Protein Folding

• Two key questions:
  – Evaluation – how can we tell a correctly-folded protein from an incorrectly folded protein?
    • H-bonds, electrostatics, hydrophobic effect, etc.
    • Derive a function, see how well it does on “real” proteins
  – Optimization – once we get an evaluation function, can we optimize it?
    • Simulated annealing/Monte Carlo
    • Evolutionary computation (EC)
    • Heuristics

Fold Optimization

• Simple lattice models (HP-models)
  – Two types of residues: hydrophobic and polar
  – 2-D or 3-D lattice
  – The only force is hydrophobic collapse
  – Score = number of HH contacts

Scoring Lattice Models

• H/P model scoring: count noncovalent hydrophobic interactions.

• Sometimes:
  – Penalize for buried polar or surface hydrophobic residues
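A minimal sketch of the basic H/P score on a 2-D square lattice. It assumes the fold is already given as one (x, y) coordinate per residue; the function name `hh_contacts` is made up for illustration. "Noncovalent" is taken to mean that residues adjacent in the chain do not count.

```python
def hh_contacts(sequence, coords):
    """Count H-H pairs that are lattice neighbours but not bonded in the chain."""
    if len(sequence) != len(coords):
        raise ValueError("need one coordinate per residue")
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # i + 1 is the covalent neighbour
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:  # one lattice step apart
                    contacts += 1
    return contacts

# Four residues folded into a unit square: the two end H residues touch.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hh_contacts("HPPH", square))
```

The optional penalties for buried polar or surface hydrophobic residues are not included here; they would need a definition of "buried", for example counting occupied neighbouring sites.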

What can we do with lattice models?

• For smaller polypeptides, exhaustive search can be used
  – Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process

• For larger chains, other optimization and search methods must be used
  – Greedy, branch and bound
  – Evolutionary computing, simulated annealing
  – Graph theoretical methods
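For short chains, exhaustive search can be sketched as a depth-first enumeration of every self-avoiding walk on a 2-D square lattice, keeping the fold with the most HH contacts. The function name `best_fold` and the choice of a 2-D lattice are assumptions for illustration.

```python
def best_fold(sequence):
    """Try every self-avoiding 2-D walk; return (max HH contacts, coordinates)."""
    if not sequence:
        return 0, []
    steps = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    best = (-1, None)

    def contacts(coords):
        where = {pos: i for i, pos in enumerate(coords)}
        total = 0
        for i, (x, y) in enumerate(coords):
            if sequence[i] != "H":
                continue
            for dx, dy in steps:
                j = where.get((x + dx, y + dy))
                # j > i + 1 counts each pair once and skips chain neighbours
                if j is not None and j > i + 1 and sequence[j] == "H":
                    total += 1
        return total

    def extend(coords, occupied):
        nonlocal best
        if len(coords) == len(sequence):
            score = contacts(coords)
            if score > best[0]:
                best = (score, list(coords))
            return
        x, y = coords[-1]
        for dx, dy in steps:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:  # never place two residues on one site
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    extend([(0, 0)], {(0, 0)})
    return best

score, coords = best_fold("HPPH")
print(score, coords)
```

The number of walks grows roughly threefold with each added residue, so this is only practical for short chains; that growth is the reason the other search methods listed above are needed.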

Learning from Lattice Models

• The “hydrophobic zipper” effect (Ken Dill, ~1997)

Representing a lattice model

• Absolute directions
  – UURRDLDRRU

• Relative directions
  – LFRFRRLLFL
  – Advantage: immediate reversals (UD or RL in absolute directions) cannot be encoded
  – Only three directions: L, R, F

• What about bumps? LFRRR
  – Bad score
  – Use a better representation
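A sketch of how a relative-direction string could be decoded and checked for bumps. It assumes the chain starts at (0, 0) facing "up" and that each letter first turns (L or R) or keeps the heading (F) and then takes one step; the slides do not state these conventions, and the function names are made up.

```python
def decode_relative(moves, heading=(0, 1)):
    """Decode a string of L/R/F moves into a list of lattice coordinates."""
    x, y = 0, 0
    dx, dy = heading
    coords = [(x, y)]
    for m in moves:
        if m == "L":
            dx, dy = -dy, dx   # turn 90 degrees counter-clockwise
        elif m == "R":
            dx, dy = dy, -dx   # turn 90 degrees clockwise
        elif m != "F":
            raise ValueError(f"unknown move {m!r}")
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def count_bumps(coords):
    """Number of residues placed on a site that was already occupied."""
    return len(coords) - len(set(coords))

print(count_bumps(decode_relative("LFRRR")))       # the bump example
print(count_bumps(decode_relative("LFRFRRLLFL")))  # the bump-free example
```

Three right turns in a row always walk around a unit square and land on the residue placed before the F step, which is why LFRRR bumps whatever the starting orientation.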

Preference-order representation

• Each position has two “preferences”
  – If it can’t have either of the two, it will take the “least favorite” path if possible

• Example: {LR},{FL},{RL},{FR},{RL},{RL},{FR},{RF}

• Can still cause bumps: {LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}
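One possible reading of the preference-order representation, as a sketch. Assumptions not stated on the slide: the chain starts at (0, 0) facing "up"; each pair is tried in order, then the remaining third direction; and when all three neighbouring sites are taken, the first preference is used anyway and counted as a bump. The name `decode_preferences` is hypothetical.

```python
def decode_preferences(prefs, heading=(0, 1)):
    """Decode ordered preference pairs such as 'LR' into (coords, bumps)."""
    turn = {
        "L": lambda dx, dy: (-dy, dx),
        "R": lambda dx, dy: (dy, -dx),
        "F": lambda dx, dy: (dx, dy),
    }
    x, y = 0, 0
    dx, dy = heading
    coords = [(x, y)]
    occupied = {(x, y)}
    bumps = 0
    for pair in prefs:
        least = [d for d in "LRF" if d not in pair]  # the "least favorite" move
        chosen = None
        for d in list(pair) + least:
            ndx, ndy = turn[d](dx, dy)
            if (x + ndx, y + ndy) not in occupied:
                chosen = (ndx, ndy)
                break
        if chosen is None:
            # Boxed in: assumed fallback is the first preference, scored as a bump.
            chosen = turn[pair[0]](dx, dy)
            bumps += 1
        dx, dy = chosen
        x, y = x + dx, y + dy
        coords.append((x, y))
        occupied.add((x, y))
    return coords, bumps

first = ["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"]
second = ["LF", "FR", "RL", "FL", "RL", "FL", "RF", "RL", "FL"]
print(decode_preferences(first)[1], decode_preferences(second)[1])
```

Under these assumptions the first example above decodes without a bump, while the second boxes itself in on its final move.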

“Decoding” the representation

• The optimizer works on the representation, but to score, we have to “decode” into a structure that lets us check for bumps and score.

• Example: How many bumps in: URDDLLDRURU?

• We can do it on graph paper
  – Start at 0,0
  – Fill in the graph
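The graph-paper procedure can be sketched in code: walk the absolute directions from (0, 0), record each site, and report any site used more than once. The function names are illustrative, and "U" is assumed to mean +y and "R" +x.

```python
from collections import Counter

STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def decode_absolute(moves, start=(0, 0)):
    """Turn a string of U/D/L/R moves into a list of lattice coordinates."""
    x, y = start
    coords = [(x, y)]
    for m in moves:
        dx, dy = STEP[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    return coords

def find_bumps(coords):
    """Return the sites occupied by more than one residue, with their counts."""
    counts = Counter(coords)
    return {pos: n for pos, n in counts.items() if n > 1}

coords = decode_absolute("URDDLLDRURU")
bumps = find_bumps(coords)
print(len(bumps), sorted(bumps))
```

For the example string, the last three moves (U, R, U) each land on a site filled earlier, so the decoded structure has three bumps.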

More realistic models

• Higher resolution lattices (45° lattice, etc.)

• Off-lattice models
  – Local moves
  – Optimization/search methods and representations
    • Greedy search
    • Branch and bound
    • EC, Monte Carlo, simulated annealing, etc.
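As an illustration of the Monte Carlo / simulated annealing family, here is a sketch on the simple 2-D HP lattice rather than an off-lattice model, to keep it short. Everything specific is an assumption: the absolute-direction encoding, the single-letter mutation as the local move, the bump penalty of 10, the geometric cooling schedule, and the function names. Chains of at least two residues are assumed.

```python
import math
import random

STEP = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def energy(sequence, moves, bump_penalty=10):
    """Lower is better: -1 per HH contact, +bump_penalty per bump."""
    x, y = 0, 0
    coords = [(x, y)]
    for m in moves:
        dx, dy = STEP[m]
        x, y = x + dx, y + dy
        coords.append((x, y))
    bumps = len(coords) - len(set(coords))
    contacts = 0
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                dist = abs(coords[i][0] - coords[j][0]) + abs(coords[i][1] - coords[j][1])
                if dist == 1:
                    contacts += 1
    return bump_penalty * bumps - contacts

def anneal(sequence, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Simulated annealing over direction strings; returns (moves, energy)."""
    rng = random.Random(seed)
    moves = [rng.choice("UDLR") for _ in range(len(sequence) - 1)]
    current = energy(sequence, moves)
    best_moves, best = list(moves), current
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / steps)  # geometric cooling
        i = rng.randrange(len(moves))
        old = moves[i]
        moves[i] = rng.choice("UDLR")  # local move: change one direction
        candidate = energy(sequence, moves)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept worse folds
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if current < best:
                best_moves, best = list(moves), current
        else:
            moves[i] = old  # reject: undo the move
    return "".join(best_moves), best

fold, fold_energy = anneal("HPPH")
print(fold, fold_energy)
```

Treating bumps as a penalty rather than forbidding them lets the search pass through invalid folds on the way to better ones; a representation that cannot encode bumps would be the alternative design.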

Threading: Fold recognition

• Given:
  – Sequence: IVACIVSTEYDVMKAAR…
  – A database of molecular coordinates

• Map the sequence onto each fold

• Evaluate
  – Objective 1: improve scoring function
  – Objective 2: folding

X-Ray Crystallography

~0.5 mm

• The crystal is a mosaic of millions of copies of the protein.

• As much as 70% is solvent (water)!

• May take months (and a “green” thumb) to grow.

X-Ray diffraction

• Image is averaged over:
  – Space (many copies)
  – Time (of the diffraction experiment)

The Protein Data Bank

ATOM      1  N   ALA E   1      22.382  47.782 112.975  1.00 24.09      3APR 213
ATOM      2  CA  ALA E   1      22.957  47.648 111.613  1.00 22.40      3APR 214
ATOM      3  C   ALA E   1      23.572  46.251 111.545  1.00 21.32      3APR 215
ATOM      4  O   ALA E   1      23.948  45.688 112.603  1.00 21.54      3APR 216
ATOM      5  CB  ALA E   1      23.932  48.787 111.380  1.00 22.79      3APR 217
ATOM      6  N   GLY E   2      23.656  45.723 110.336  1.00 19.17      3APR 218
ATOM      7  CA  GLY E   2      24.216  44.393 110.087  1.00 17.35      3APR 219
ATOM      8  C   GLY E   2      25.653  44.308 110.579  1.00 16.49      3APR 220
ATOM      9  O   GLY E   2      26.258  45.296 110.994  1.00 15.35      3APR 221
ATOM     10  N   VAL E   3      26.213  43.110 110.521  1.00 16.21      3APR 222
ATOM     11  CA  VAL E   3      27.594  42.879 110.975  1.00 16.02      3APR 223
ATOM     12  C   VAL E   3      28.569  43.613 110.055  1.00 15.69      3APR 224
ATOM     13  O   VAL E   3      28.429  43.444 108.822  1.00 16.43      3APR 225
ATOM     14  CB  VAL E   3      27.834  41.363 110.979  1.00 16.66      3APR 226
ATOM     15  CG1 VAL E   3      29.259  41.013 111.404  1.00 17.35      3APR 227
ATOM     16  CG2 VAL E   3      26.811  40.649 111.850  1.00 17.03      3APR 228

• http://www.rcsb.org/pdb/
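A sketch of reading the ATOM records shown above. It assumes the fields are separated by whitespace, as they appear on the slide; real PDB files use fixed columns, so a whitespace split can fail when fields run together. The field names in the dictionary are illustrative.

```python
import math

def parse_atom_line(line):
    """Parse one whitespace-separated ATOM record into a dictionary."""
    f = line.split()
    if f[0] != "ATOM":
        raise ValueError("not an ATOM record")
    return {
        "serial": int(f[1]),
        "name": f[2],        # atom name, e.g. CA for the alpha carbon
        "res_name": f[3],
        "chain": f[4],
        "res_seq": int(f[5]),
        "xyz": (float(f[6]), float(f[7]), float(f[8])),  # coordinates in angstroms
        "occupancy": float(f[9]),
        "b_factor": float(f[10]),
    }

records = [
    "ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214",
    "ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35 3APR 219",
]
ca1, ca2 = (parse_atom_line(r) for r in records)
distance = math.dist(ca1["xyz"], ca2["xyz"])
print(f"CA-CA distance between residues 1 and 2: {distance:.2f} angstroms")
```

The two alpha carbons come out about 3.8 Å apart, the usual spacing between consecutive residues in a polypeptide backbone.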
