Workshop in Computational Structural Biology 2015 81855 & 81813, 4 points Ora Schueler-Furman TA:...

Post on 20-Jan-2016


Workshop in Computational Structural Biology

2015

81855 & 81813, 4 points

Ora Schueler-Furman

TA: Orly Marcu

Introduction – When, Where, How?

• When & Where:
– Thursdays, Givat Ram
– Lecture: 14:00-15:45, Sprinzak 25
– Exercise: 16:00-18:45, Sprinzak computer class #2
– Lectures & exercises available in moodle

• How:
– Make sure you have an account in CS

• Exercises:
– Submit 7/10 exercises
– Due within 2 weeks
– Submit by email to orly.marcu@gmail.com
– 1/3 of grade

• Contact: Ora (87094, oraf@ekmd.huji.ac.il) or Orly (87063, orly.marcu@gmail.com)

Acknowledgements: Sources of figures and slides include slides from Branden & Tooze; some slides have been adapted from members of the Rosetta Community, especially from Jens Meiler. Exercises in PyRosetta have been adapted from teaching material by Jeff Gray.

What will we learn:

Part I: Protein structure in the eye of the computational biologist

1. Introduction to computational structural biology
• The basics of protein structure
• Challenges in computational biology and bioinformatics
• Protein structure prediction and design


2. Introduction to Rosetta and structural modeling
• Approaches for structural modeling of proteins
• The Rosetta framework and its prediction modes
• Cartesian and polar coordinates
• Sampling (find the structure) and scoring (select the structure)

3. Optimization techniques
• Energy minimization
• Monte Carlo (MC) sampling
• MC with minimization (MCM)

Part II: Protein modeling and design

4. Ab initio modeling: Principles and approaches

5. Full-atom refinement
• Local optimization
• Side chain modeling
– The representation of side chains as rotamers
– Rotamer and off-rotamer sampling
– Finding minimum energy rotamer combinations


6. Homology modeling
• Selection of template and alignment of query sequence to template
• Loop modeling approaches (modeling of unaligned regions)

7. Protein design
• The theoretical basis of protein design; how different design goals are achieved
• Success and challenge in computational design

Part III: Protein interactions

8. Protein-protein docking
• Challenges and approaches in protein docking
• The theoretical basis of low-resolution and high-resolution docking

9. Interface analysis and design
• Determinants of binding affinity and specificity
• Identification of interface residue hotspots: computational alanine scanning
• Success and challenge in interface design

10. Summary

What will we learn: Exercises

Exercises will span a variety of subjects and involve both Rosetta and other widely-used protocols

• Basic introduction: how to look at proteins
• Protein structure evaluation and classification: What does my protein do, how good is its structure?
• Structure comparison
• Running Rosetta
• PyRosetta and RosettaScripts: running and programming
• Ab initio modeling
• Homology modeling
• Structure refinement
• Modeling side chains
• Loop modeling
• Protein docking
• Interface analysis – computational alanine scanning
• Protein design and protein interface design

1. Introduction to Computational Structural Biology

The Basics of Protein Structure

The central dogma

The code: 4 bases, 64 triplets, 20 amino acids
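The count on this slide can be checked directly. The following is a minimal sketch (the variable names are illustrative, not from the course material) that enumerates every ordered triplet of the four RNA bases:

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases

# Every ordered triplet of bases is one codon: 4 * 4 * 4 = 64.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))  # 64
```

Because 64 triplets encode only 20 amino acids (plus stop signals), most amino acids are specified by more than one codon.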

4 Hierarchies of protein structure

• Anfinsen: sequence determines structure

The building blocks: 20 amino acids

• Differ in size, polarity, charge, secondary structure propensity …

Special amino acids

• Glycine: the simplest amino acid (aa); no side chain (sc); very flexible backbone (bb)
• Proline: cyclic aa; sc connects to the bb N; very constrained bb

Aliphatic amino acids

• sc contains only carbon and hydrogen atoms
• hydrophobic

Amino acids with hydroxyl group

Negatively charged amino acids

Different size → different tendency for secondary structure

Amide amino acids

Positively charged amino acids

• large sc

• pKa 11.1 • pKa 12

Aromatic amino acids

• sc contains aromatic ring

• pKa 7

• benzene ring

Amino acids with sulfur

Cystine

Oxidation of the sulfur atoms creates a covalent disulfide bond (S-S bond) between two cysteines

S-S bonds stabilize the protein

Insulin

A chain: G I V E Q C C A S V C S L Y Q L E N E N Y C N

B chain: F V N Q H L C G S H L V E A L Y L V C G E R G F..

[Figure: insulin A and B chains, N- to C-terminus, with S-S bonds marked between cysteines]

Post-translational modifications

• Processing (pro-insulin/insulin) – control of protein activity

• Glycosylation – protein trafficking

• Phosphorylation (Tyr, Ser, Thr) – regulation of signaling

• Methylation, acetylation – histone tagging

• ….


Metal binding proteins

• aa: H, C, D, E
• Metals: Fe, Zn, Mg, Ca
• Fe
– blood: red hemoglobin
– electron transfer: cytochrome c
• Zn
– in DNA-binding “Zn-finger” proteins
– alcohol dehydrogenase: oxidation of alcohol


Important bonds for protein folding and stability

• Van der Waals force: dipole moments attract each other (transient and very weak: 0.1-0.2 kcal/mol)
• Hydrophobic interaction: hydrophobic groups/molecules tend to cluster together and shield themselves from the hydrophilic solvent

Hydrogen bonding potential of amino acids

Primary sequence: concatenated amino acids


Formation of a peptide bond

[Figure: formation of a peptide bond; atoms in CPK colors: O - oxygen, N - nitrogen, H - hydrogen, C - carbon]

The geometry of the peptide backbone

• Peptide bond lengths and angles do not change
• Peptide dihedral angles define structure
• The peptide bond is planar & polar: ω = 180° (trans) or 0° (cis)

Dihedral angles

Dihedral angles χ1-χ4 define the side chain conformation

From wikipedia

• Dihedral angle: defines geometry of 4 consecutive atoms (given bond lengths and angles)
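The definition above can be turned into a small calculation. This is a minimal sketch, assuming plain Cartesian coordinates given as 3-tuples; the helper names (`sub`, `dot`, `cross`, `dihedral`) and the test coordinates are illustrative and not taken from the course material. It uses the common atan2 formulation, which returns a signed angle in the range -180° to 180°.

```python
import math

def sub(a, b):
    """Vector a - b."""
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    """Scalar product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Vector product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four consecutive atoms."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)      # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates: first and last atom on opposite sides (trans)
# or on the same side (cis) of the central bond.
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 1.0, 0.0))
cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0))
```

The same function applies to any four consecutive atoms, so backbone and side chain dihedral angles can be computed from the corresponding atom coordinates.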

Ramachandran plot

[Figure: Ramachandran plots. Panels: all amino acids except glycine; glycine (flexible backbone)]

Secondary structure: local interactions

Secondary structure – built from backbone hydrogen bonds

α-helix
• discovered 1951 by Pauling
• 5-40 aa long
• average: 10 aa
• right handed
• O(i)-NH(i+4) hydrogen bonds: bb atoms satisfied
• π-helix: i - i+5
• 3₁₀ helix: i - i+3
• 1.5 Å/res

Favored: Ala, Leu, Arg, Met, Lys

Disfavored: Asn, Thr, Cys, Asp, Gly

α-helix: dipole

• binds negative charges at N-terminus

View down one helical turn

α-helix: side chains point out

Frequent amino acids at the N-terminus of helices

• Pro: blocks the continuation of the helix by its side chain
• Asn, Ser: block the continuation of the helix by hydrogen bonding with the donor (NH) of N3

Helix positions: Ncap, N1, N2, N3 ……. Ccap

Helices of different character

Representation: helical wheel

1. buried
2. partially exposed: amphipathic helix
3. exposed
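A helical wheel projects the residues of a helix onto a circle viewed down the helix axis. The sketch below assumes an ideal α-helix with 3.6 residues per turn, i.e. 100° between consecutive residues; that value is a standard textbook figure rather than something stated on these slides, and the function name and example sequence are hypothetical.

```python
def helical_wheel(sequence, degrees_per_residue=100.0):
    """Return (position, residue, angle) for each residue on the wheel.

    degrees_per_residue=100.0 assumes an ideal alpha-helix
    (3.6 residues per turn); this is an assumption of the sketch.
    """
    return [
        (i + 1, aa, (i * degrees_per_residue) % 360.0)
        for i, aa in enumerate(sequence)
    ]

# Hypothetical sequence, for illustration only.
wheel = helical_wheel("LKELLKKLLEELLKKLLEE")
for position, aa, angle in wheel:
    print(f"{position:2d} {aa} {angle:5.1f}")
```

Residues i, i+3, i+4 and i+7 land at nearby angles, so in an amphipathic helix the hydrophobic residues tend to recur with that spacing and cluster on one face of the wheel.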

β-sheet
• Involves several regions in sequence
• O(i)-NH(j) hydrogen bonds
• Parallel and anti-parallel sheets

Favored: Tyr, Thr, Ile, Phe, Trp

Disfavored: Glu, Ala, Asp, Gly, Pro

Antiparallel β-sheet

• Parallel H-bonds
• Residue side chains point up/down/up ..
• Pleated

Parallel β-sheet

• less stable than antiparallel sheet
• angled H-bonds

Connecting elements of secondary structure define tertiary structure


Loops

• connect helices and strands
• at surface of molecule
• more flexible
• contain functional sites

Hairpin loops (β-turns)

• Connect strands in antiparallel sheet

Residues labeled in the figure: G,N,D G G S,T

Super secondary structures – Greek Key Motif

Most common topology for 2 hairpins


Super secondary structures – β-α-β motif

• connects strands in parallel sheet
• always right-handed

Repeated motif creates β-meander: TIM barrel

Tertiary structure defines protein function

The quaternary structure of a protein defines its biological functional unit

Quaternary structure: Hemoglobin consists of 4 distinct chains

Quaternary structure: assembly of protein domains (from two distinct protein chains, or two domains in one protein sequence)

Glyceraldehyde phosphate dehydrogenase:
• domain 1 binds the substrate to be metabolized
• domain 2 binds a cofactor

1. Introduction to Computational Structural Biology

Experimental determination of protein structure: X-ray diffraction and NMR

Experimental determination of structure

X-ray crystallography
• Determines electron density – positions of atoms in structure
• Highly accurate
• Static: depends on crystal

NMR
• Determines constraints between labeled spins
• Allows measure of structure in solution
• Resolution not defined: more constraints – better defined structure

X-ray diffraction

If the direction is such that Bragg's law holds (nλ = 2d·sin θ) -> constructive addition -> reflection spot in the diffraction pattern

• Wavelength of x-ray ~ crystal plane separations

• Rotation of crystal relative to beam allows recording of different diffractions

• Diffraction maps are translated to electron density maps using Fourier Transform

Resolution measures diffraction angles (high angle peaks – high resolution data)
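The role of the Fourier transform and of resolution can be illustrated with a toy one-dimensional "crystal". This is a sketch under strong simplifying assumptions: point atoms at hypothetical fractional positions with made-up scattering weights, and complex structure factors that are fully known. In a real experiment only the intensities are measured, so the phases have to be determined separately.

```python
import cmath
import math

def structure_factors(atoms, h_max):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for h = -h_max .. h_max."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }

def density(factors, x):
    """Inverse Fourier sum: rho(x) = sum_h F(h) * exp(-2*pi*i*h*x)."""
    return sum(F * cmath.exp(-2j * math.pi * h * x)
               for h, F in factors.items()).real

# Hypothetical 1-D unit cell: (fractional position, scattering weight).
atoms = [(0.2, 6.0), (0.7, 8.0)]

factors = structure_factors(atoms, h_max=10)  # larger h_max = higher resolution
grid = [k / 100 for k in range(100)]
rho = [density(factors, x) for x in grid]

# The highest density peak sits at the position of the heavier atom.
peak = grid[max(range(len(grid)), key=rho.__getitem__)]
print(peak)
```

Including only low-order terms (small `h_max`) broadens the peaks in the reconstructed density, which is the one-dimensional analogue of low-resolution data lacking the high-angle reflections.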


Iterative refinement allows improvement of structure

R-factor measures quality

Fo – observed, Fc – calculated
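The formula itself did not survive in this transcript. A commonly used definition of the crystallographic R-factor is R = Σ| |Fo| − |Fc| | / Σ|Fo|, summed over all reflections; the sketch below assumes that definition, and the amplitude values are hypothetical.

```python
def r_factor(f_obs, f_calc):
    """R = sum(| |Fo| - |Fc| |) / sum(|Fo|), assuming the standard definition."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Hypothetical amplitudes for three reflections.
observed = [10.0, 20.0, 30.0]
calculated = [12.0, 18.0, 30.0]
r = r_factor(observed, calculated)
print(round(r, 3))
```

A model that reproduces the observed amplitudes exactly gives R = 0; larger values mean poorer agreement between the model and the data.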


1950’s first protein structure solved by Kendrew & Perutz: sperm whale myoglobin

Today: ~107’000 structures solved, most by x-ray crystallography

Challenges
• Grow crystal
• Determine phase

NMR (Nuclear Magnetic Resonance)

NMR-active nuclei (with spin): ¹H, ¹³C

Application of magnetic field reorients spins – measure resonance between close nuclei

Extract constraints & determine structure

1. Introduction to Computational Structural Biology

Challenges in Computational Structural Biology

Protein structure prediction and design

[Figure: protein sequence → protein structure: protein structure prediction; protein structure → protein sequence: protein design]

FASTA

>2180 hSERT
METTPLNSQKQ……

PDB

ATOM    490  N   GLN A  31      52.013 -87.359  -8.797  1.00  7.06           N
ATOM    491  CA  GLN A  31      52.134 -87.762 -10.201  1.00  8.67           C
ATOM    492  C   GLN A  31      51.726 -89.222 -10.343  1.00 10.90           C
ATOM    493  O   GLN A  31      51.015 -89.601 -11.275  1.00  9.63           O
…..….
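The ATOM records above carry the atom serial number, atom name, residue name, chain, residue number, x/y/z coordinates (in Å), occupancy, B-factor and element. The sketch below reads these four records and measures the N-CA bond length. It splits each record on whitespace, which works for these lines but is an assumption of the sketch: PDB is a fixed-column format, and fields in real files can run together. The function name and dictionary keys are illustrative.

```python
import math

# The four ATOM records shown on the slide.
PDB_LINES = [
    "ATOM 490 N GLN A 31 52.013 -87.359 -8.797 1.00 7.06 N",
    "ATOM 491 CA GLN A 31 52.134 -87.762 -10.201 1.00 8.67 C",
    "ATOM 492 C GLN A 31 51.726 -89.222 -10.343 1.00 10.90 C",
    "ATOM 493 O GLN A 31 51.015 -89.601 -11.275 1.00 9.63 O",
]

def parse_atom(line):
    """Parse one ATOM record by whitespace (simplification, see lead-in)."""
    fields = line.split()
    return {
        "serial": int(fields[1]),
        "name": fields[2],
        "res_name": fields[3],
        "chain": fields[4],
        "res_seq": int(fields[5]),
        "xyz": tuple(float(v) for v in fields[6:9]),
        "element": fields[11],
    }

atoms = {atom["name"]: atom for atom in map(parse_atom, PDB_LINES)}

# Distance between backbone N and CA of Gln 31, in Angstrom (Python 3.8+).
n_ca = math.dist(atoms["N"]["xyz"], atoms["CA"]["xyz"])
print(round(n_ca, 2))  # 1.47
```

The result is close to the typical N-CA bond length, a quick sanity check that the coordinates were read correctly.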

Additional topics in computational structural biology

• Nucleic acids - Prediction of binding and structure
– RNA stem & loops, pseudoknots; protein-RNA binding
– DNA curvature; protein-DNA binding

• Prediction of macromolecular structures
– Reconstruction of protein assemblies from low-resolution cryo-EM maps

• Protein-ligand interactions
– Docking of small ligands
– Design of inhibitors

… and many many more!