Molecular Modeling and Simulation

Tamar Schlick, Molecular Modeling and Simulation: An Interdisciplinary Guide, 2nd edition (Springer)

Contents

About the Cover v

Book URLs ix

Preface xi

Prelude xix

Table of Contents xxi

List of Figures xxxiii

List of Tables xxxix

Acronyms, Abbreviations, and Units xli

1 Biomolecular Structure and Modeling: Historical Perspective 1

1.1 A Multidisciplinary Enterprise 2

1.1.1 Consilience 2

1.1.2 What is Molecular Modeling? 3

1.1.3 Need For Critical Assessment 5

1.1.4 Text Overview 6

1.2 The Roots of Molecular Modeling in Molecular Mechanics 8

1.2.1 The Theoretical Pioneers 8

1.2.2 Biomolecular Simulation Perspective 11

1.3 Emergence of Biomodeling from Experimental Progress in Proteins and Nucleic Acids 14

1.3.1 Protein Crystallography 14

1.3.2 DNA Structure 17

1.3.3 The Technique of X-ray Crystallography 18

1.3.4 The Technique of NMR Spectroscopy 20

1.4 Modern Era of Technological Advances 22

1.4.1 From Biochemistry to Biotechnology 22

1.4.2 PCR and Beyond 23

1.5 Genome Sequencing 25

1.5.1 Projects Overview: From Bugs to Baboons 25

1.5.2 The Human Genome 30

2 Biomolecular Structure and Modeling: Problem and Application Perspective 41

2.1 Computational Challenges in Structure and Function 41

2.1.1 Analysis of the Amassing Biological Databases 41

2.1.2 Computing Structure From Sequence 46

2.2 Protein Folding - An Enigma 46

2.2.1 'Old' and 'New' Views 46

2.2.2 Folding Challenges 48

2.2.3 Folding by Dynamics Simulations? 49

2.2.4 Folding Assistants 50

2.2.5 Unstructured Proteins 52

2.3 Protein Misfolding - A Conundrum 53

2.3.1 Prions and Mad Cows 53

2.3.2 Infectious Protein? 53

2.3.3 Other Possibilities 54

2.3.4 Other Misfolding Processes 55

2.3.5 Deducing Function From Structure 56

2.4 From Basic to Applied Research 57

2.4.1 Rational Drug Design: Overview 58

2.4.2 A Classic Success Story: AIDS Therapy 58

2.4.3 Other Drugs and Future Prospects 65

2.4.4 Gene Therapy - Better Genes 67

2.4.5 Designed Compounds and Foods 69

2.4.6 Nutrigenomics 72

2.4.7 Designer Materials 74

2.4.8 Cosmeceuticals 74

3 Protein Structure Introduction 77

3.1 The Machinery of Life 77

3.1.1 From Tissues to Hormones 77

3.1.2 Size and Function Variability 78

3.1.3 Chapter Overview 79

3.2 The Amino Acid Building Blocks 82

3.2.1 Basic Cα Unit 82

3.2.2 Essential and Nonessential Amino Acids 83

3.2.3 Linking Amino Acids 85

3.2.4 The Amino Acid Repertoire: From Flexible Glycine to Rigid Proline 85

3.3 Sequence Variations in Proteins 89

3.3.1 Globular Proteins 90

3.3.2 Membrane and Fibrous Proteins 90

3.3.3 Emerging Patterns from Genome Databases 92

3.3.4 Sequence Similarity 92

3.4 Protein Conformation Framework 97

3.4.1 The Flexible φ and ψ and Rigid ω Dihedral Angles 97

3.4.2 Rotameric Structures 99

3.4.3 Ramachandran Plots 99

3.4.4 Conformational Hierarchy 103

4 Protein Structure Hierarchy 105

4.1 Structure Hierarchy 106

4.2 Helices: A Common Secondary Structural Element 106

4.2.1 Classic α-Helix 106

4.2.2 3₁₀ and π Helices 107

4.2.3 Left-Handed α-Helix 109

4.2.4 Collagen Helix 110

4.3 β-Sheets: A Common Secondary Structural Element 110

4.4 Turns and Loops 110

4.5 Formation of Supersecondary and Tertiary Structure 113

4.5.1 Complex 3D Networks 113

4.5.2 Classes in Protein Architecture 113

4.5.3 Classes are Further Divided into Folds 114

4.6 α-Class Folds 114

4.6.1 Bundles 114

4.6.2 Folded Leafs 115

4.6.3 Hairpin Arrays 115

4.7 β-Class Folds 115

4.7.1 Anti-Parallel β Domains 116

4.7.2 Parallel and Antiparallel Combinations 116

4.8 α/β and α + β-Class Folds 117

4.8.1 α/β Barrels 117

4.8.2 Open Twisted α/β Folds 118

4.8.3 Leucine-Rich α/β Folds 118

4.8.4 α+β Folds 118

4.8.5 Other Folds 118

4.9 Number of Folds 118

4.9.1 Finite Number? 119

4.10 Quaternary Structure 119

4.10.1 Viruses 119

4.10.2 From Ribosomes to Dynamic Networks 123

4.11 Protein Structure Classification 126

5 Nucleic Acids Structure Minitutorial 129

5.1 DNA, Life's Blueprint 130

5.1.1 The Kindled Field of Molecular Biology 130

5.1.2 Fundamental DNA Processes 132

5.1.3 Challenges in Nucleic Acid Structure 133

5.1.4 Chapter Overview 134

5.2 The Basic Building Blocks of Nucleic Acids 135

5.2.1 Nitrogenous Bases 135

5.2.2 Hydrogen Bonds 136

5.2.3 Nucleotides 137

5.2.4 Polynucleotides 137

5.2.5 Stabilizing Polynucleotide Interactions 140

5.2.6 Chain Notation 140

5.2.7 Atomic Labeling 141

5.2.8 Torsion Angle Labeling 142

5.3 Nucleic Acid Conformational Flexibility 142

5.3.1 The Furanose Ring 143

5.3.2 Backbone Torsional Flexibility 145

5.3.3 The Glycosyl Rotation 148

5.3.4 Sugar/Glycosyl Combinations 148

5.3.5 Basic Helical Descriptors 150

5.3.6 Base-Pair Parameters 151

5.4 Canonical DNA Forms 155

5.4.1 B-DNA 156

5.4.2 A-DNA 157

5.4.3 Z-DNA 160

5.4.4 Comparative Features 161

6 Topics in Nucleic Acids Structure: DNA Interactions and Folding 163

6.1 Introduction 164

6.2 DNA Sequence Effects 165

6.2.1 Local Deformations 165

6.2.2 Orientation Preferences in Dinucleotide Steps 166

6.2.3 Orientation Preferences in Dinucleotide Steps With Flanking Sequence Context: Tetranucleotide Studies 169

6.2.4 Intrinsic DNA Bending in A-Tracts 169

6.2.5 Sequence Deformability Analysis Continues 173

6.3 DNA Hydration and Ion Interactions 174

6.3.1 Resolution Difficulties 175

6.3.2 Basic Patterns 176

6.4 DNA/Protein Interactions 180

6.5 Cellular Organization of DNA 182

6.5.1 Compaction of Genomic DNA 182

6.5.2 Coiling of the DNA Helix Itself 184

6.5.3 Chromosomal Packaging of Coiled DNA 185

6.6 Mathematical Characterization of DNA Supercoiling 195

6.6.1 DNA Topology and Geometry 195

6.7 Computational Treatments of DNA Supercoiling 197

6.7.1 DNA as a Flexible Polymer 198

6.7.2 Elasticity Theory Framework 199

6.7.3 Simulations of DNA Supercoiling 200

7 Topics in Nucleic Acids Structure: Noncanonical Helices and RNA Structure 205

7.1 Introduction 205

7.2 Variations on a Theme 206

7.2.1 Hydrogen Bonding Patterns in Polynucleotides 206

7.2.2 Hybrid Helical/Nonhelical Forms 210

7.2.3 Unusual Forms: Overstretched and Understretched DNA 214

7.3 RNA Structure and Function 216

7.3.1 DNA's Cousin Shines 216

7.3.2 RNA Chains Fold Upon Themselves 216

7.3.3 RNA's Diversity 217

7.3.4 Non-Coding and Micro-RNAs 221

7.3.5 RNA at Atomic Resolution 222

7.4 Current Challenges in RNA Modeling 225

7.4.1 RNA Folding 225

7.4.2 RNA Motifs 225

7.4.3 RNA Structure Prediction 226

7.5 Application of Graph Theory to Studies of RNA Structure and Function 229

7.5.1 Graph Theory 229

7.5.2 RNA-As-Graphs (RAG) Resource 230

8 Theoretical and Computational Approaches to Biomolecular Structure 237

8.1 The Merging of Theory and Experiment 238

8.1.1 Exciting Times for Computationalists! 238

8.1.2 The Future of Biocomputations 240

8.1.3 Chapter Overview 240

8.2 Quantum Mechanics (QM) Foundations of Molecular Mechanics (MM) 241

8.2.1 The Schrödinger Wave Equation 241

8.2.2 The Born-Oppenheimer Approximation 242

8.2.3 Ab Initio QM 242

8.2.4 Semi-Empirical QM 244

8.2.5 Recent Advances in Quantum Mechanics 244

8.2.6 From Quantum to Molecular Mechanics 247

8.3 Molecular Mechanics: Underlying Principles 251

8.3.1 The Thermodynamic Hypothesis 251

8.3.2 Additivity 252

8.3.3 Transferability 254

8.4 Molecular Mechanics: Model and Energy Formulation 256

8.4.1 Configuration Space 258

8.4.2 Functional Form 259

8.4.3 Some Current Limitations 262

9 Force Fields 265

9.1 Formulation of the Model and Energy 266

9.2 Normal Modes 267

9.2.1 Quantifying Characteristic Motions 267

9.2.2 Complex Biomolecular Spectra 269

9.2.3 Spectra As Force Constant Sources 269

9.2.4 In-Plane and Out-of-Plane Bending 271

9.3 Bond Length Potentials 272

9.3.1 Harmonic Term 273

9.3.2 Morse Term 274

9.3.3 Cubic and Quartic Terms 275

9.4 Bond Angle Potentials 276

9.4.1 Harmonic and Trigonometric Terms 277

9.4.2 Cross Bond Stretch / Angle Bend Terms 278

9.5 Torsional Potentials 281

9.5.1 Origin of Rotational Barriers 281

9.5.2 Fourier Terms 281

9.5.3 Torsional Parameter Assignment 282

9.5.4 Improper Torsion 286

9.5.5 Cross Dihedral/Bond Angle and Improper/Improper Dihedral Terms 287

9.6 The van der Waals Potential 288

9.6.1 Rapidly Decaying Potential 288

9.6.2 Parameter Fitting From Experiment 289

9.6.3 Two Parameter Calculation Protocols 289

9.7 The Coulomb Potential 291

9.7.1 Coulomb's Law: Slowly Decaying Potential 291

9.7.2 Dielectric Function 292

9.7.3 Partial Charges 294

9.8 Parameterization 295

9.8.1 A Package Deal 295

9.8.2 Force Field Comparisons 295

9.8.3 Force Field Performance 297

10 Nonbonded Computations 299

10.1 A Computational Bottleneck 301

10.2 Approaches for Reducing Computational Cost 302

10.2.1 Simple Cutoff Schemes 302

10.2.2 Ewald and Multipole Schemes 303

10.3 Spherical Cutoff Techniques 304

10.3.1 Technique Categories 304

10.3.2 Guidelines for Cutoff Functions 305

10.3.3 General Cutoff Formulations 306

10.3.4 Potential Switch 307

10.3.5 Force Switch 308

10.3.6 Shift Functions 309

10.4 The Ewald Method 311

10.4.1 Periodic Boundary Conditions 311

10.4.2 Ewald Sum and Crystallography 314

10.4.3 Mathematical Morphing of a Conditionally Convergent Sum 316

10.4.4 Finite-Dielectric Correction 320

10.4.5 Ewald Sum Complexity 320

10.4.6 Resulting Ewald Summation 321

10.4.7 Practical Implementation: Parameters, Accuracy, and Optimization 322

10.5 The Multipole Method 324

10.5.1 Basic Hierarchical Strategy 324

10.5.2 Historical Perspective 329

10.5.3 Expansion in Spherical Coordinates 330

10.5.4 Biomolecular Implementations 332

10.5.5 Other Variants 333

10.6 Continuum Solvation 333

10.6.1 Need for Simplification! 333

10.6.2 Potential of Mean Force 334

10.6.3 Stochastic Dynamics 335

10.6.4 Continuum Electrostatics 338

11 Multivariate Minimization in Computational Chemistry 345

11.1 Ubiquitous Optimization: From Enzymes to Weather to Economics 347

11.1.1 Algorithmic Sophistication Demands Basic Understanding 347

11.1.2 Chapter Overview 347

11.2 Optimization Fundamentals 348

11.2.1 Problem Formulation 348

11.2.2 Independent Variables 349

11.2.3 Function Characteristics 349

11.2.4 Local and Global Minima 351

11.2.5 Derivatives of Multivariate Functions 353

11.2.6 The Hessian of Potential Energy Functions 353

11.3 Basic Algorithmic Components 356

11.3.1 Greedy Descent 356

11.3.2 Line-Search-Based Descent Algorithm 359

11.3.3 Trust-Region-Based Descent Algorithm 361

11.3.4 Convergence Criteria 362

11.4 The Newton-Raphson-Simpson-Fourier Method 364

11.4.1 The One-Dimensional Version of Newton's Method 364

11.4.2 Newton's Method for Minimization 367

11.4.3 The Multivariate Version of Newton's Method 368

11.5 Effective Large-Scale Minimization Algorithms 369

11.5.1 Quasi-Newton (QN) 370

11.5.2 Conjugate Gradient (CG) 372

11.5.3 Truncated-Newton (TN) 374

11.5.4 Simple Example 376

11.6 Available Software 378

11.6.1 Popular Newton and CG 378

11.6.2 CHARMM's ABNR 379

11.6.3 CHARMM's TN 379

11.6.4 Comparative Performance on Molecular Systems 379

11.7 Practical Recommendations 380

11.8 Future Outlook 383

12 Monte Carlo Techniques 385

12.1 MC Popularity 386

12.1.1 A Winning Combination 386

12.1.2 From Needles to Bombs 387

12.1.3 Chapter Overview 387

12.1.4 Importance of Error Bars 388

12.2 Random Number Generators 388

12.2.1 What is Random? 388

12.2.2 Properties of Generators 389

12.2.3 Linear Congruential Generators (LCG) 392

12.2.4 Other Generators 396

12.2.5 Artifacts 400

12.2.6 Recommendations 401

12.3 Gaussian Random Variates 403

12.3.1 Manipulation of Uniform Random Variables 403

12.3.2 Normal Variates in Molecular Simulations 403

12.3.3 Odeh/Evans Method 404

12.3.4 Box/Muller/Marsaglia Method 405

12.4 Means for Monte Carlo Sampling 406

12.4.1 Expected Values 406

12.4.2 Error Bars 409

12.4.3 Batch Means 410

12.5 Monte Carlo Sampling 411

12.5.1 Density Function 411

12.5.2 Dynamic and Equilibrium MC: Ergodicity, Detailed Balance 411

12.5.3 Statistical Ensembles 412

12.5.4 Importance Sampling: Metropolis Algorithm and Markov Chains 413

12.6 Monte Carlo Applications to Molecular Systems 418

12.6.1 Ease of Application 418

12.6.2 Biased MC 419

12.6.3 Hybrid MC 420

12.6.4 Parallel Tempering and Other MC Variants 421

13 Molecular Dynamics: Basics 425

13.1 Introduction: Statistical Mechanics by Numbers 426

13.1.1 Why Molecular Dynamics? 426

13.1.2 Background 427

13.1.3 Outline of MD Chapters 428

13.2 Laplace's Vision of Newtonian Mechanics 429

13.2.1 The Dream Becomes Reality 429

13.2.2 Deterministic Mechanics 432

13.2.3 Neglect of Electronic Motion 432

13.2.4 Critical Frequencies 433

13.2.5 Hybrid Quantum/Classical Mechanics Treatments 435

13.3 The Basics: An Overview 435

13.3.1 Following the Equations of Motion 435

13.3.2 Perspective on MD Trajectories 436

13.3.3 Initial System Settings 437

13.3.4 Sensitivity to Initial Conditions and Other Computational Choices 440

13.3.5 Simulation Protocol 442

13.3.6 High-Speed Implementations 443

13.3.7 Analysis and Visualization 445

13.3.8 Reliable Numerical Integration 445

13.3.9 Computational Complexity 446

13.4 The Verlet Algorithm 448

13.4.1 Position and Velocity Propagation 449

13.4.2 Leapfrog, Velocity Verlet, and Position Verlet 451

13.5 Constrained Dynamics 453

13.6 Various MD Ensembles 455

13.6.1 Need for Other Ensembles 455

13.6.2 Simple Algorithms 456

13.6.3 Extended System Methods 459

14 Molecular Dynamics: Further Topics 463

14.1 Introduction 464

14.2 Symplectic Integrators 465

14.2.1 Symplectic Transformation 466

14.2.2 Harmonic Oscillator Example 467

14.2.3 Linear Stability 467

14.2.4 Timestep-Dependent Rotation in Phase Space 469

14.2.5 Resonance Condition for Periodic Motion 470

14.2.6 Resonance Artifacts 471

14.3 Multiple-Timestep (MTS) Methods 472

14.3.1 Basic Idea 472

14.3.2 Extrapolation 473

14.3.3 Impulses 474

14.3.4 Vulnerability of Impulse Splitting to Resonance Artifacts 475

14.3.5 Resonance Artifacts in MTS 476

14.3.6 Limitations of Resonance Artifacts on Speedup; Possible Cures 478

14.4 Langevin Dynamics 479

14.4.1 Many Uses 479

14.4.2 Phenomenological Heat Bath 480

14.4.3 The Effect of γ 480

14.4.4 Generalized Verlet for Langevin Dynamics 482

14.4.5 The LN Method 482

14.5 Brownian Dynamics (BD) 487

14.5.1 Brownian Motion 487

14.5.2 Brownian Framework 489

14.5.3 General Propagation Framework 491

14.5.4 Hydrodynamic Interactions 491

14.5.5 BD Propagation Scheme: Cholesky vs. Chebyshev Approximation 494

14.6 Implicit Integration 496

14.6.1 Implicit vs. Explicit Euler 497

14.6.2 Intrinsic Damping 498

14.6.3 Computational Time 498

14.6.4 Resonance Artifacts 499

14.7 Enhanced Sampling Methods 503

14.7.1 Overview 503

14.7.2 Harmonic-Analysis Based Techniques 503

14.7.3 Other Coordinate Transformations 505

14.7.4 Coarse Graining Models 507

14.7.5 Biasing Approaches 508

14.7.6 Variations in MD Algorithm and Protocol 509

14.7.7 Other Rigorous Approaches for Deducing Mechanisms, Free Energies, and Reaction Rates 511

14.8 Future Outlook 513

14.8.1 Integration Ingenuity 513

14.8.2 Current Challenges 514

15 Similarity and Diversity in Chemical Design 519

15.1 Introduction to Drug Design 520

15.1.1 Chemical Libraries 520

15.1.2 Early Drug Development Work 521

15.1.3 Molecular Modeling in Rational Drug Design 523

15.1.4 The Competition: Automated Technology 524

15.1.5 Chapter Overview 526

15.2 Problems in Chemical Libraries 526

15.2.1 Database Analysis 526

15.2.2 Similarity and Diversity Sampling 527

15.2.3 Bioactivity Relationships 529

15.3 General Problem Definitions 532

15.3.1 The Dataset 532

15.3.2 The Compound Descriptors 534

15.3.3 Characterizing Biological Activity 535

15.3.4 The Target Function 536

15.3.5 Scaling Descriptors 536

15.3.6 The Similarity and Diversity Problems 538

15.4 Data Compression and Cluster Analysis 540

15.4.1 Data Compression Based on Principal Component Analysis (PCA) 540

15.4.2 Data Compression Based on the Singular Value Decomposition (SVD) 542

15.4.3 Relation Between PCA and SVD 544

15.4.4 Data Analysis via PCA or SVD and Distance Refinement 545

15.4.5 Projection, Refinement, and Clustering Example 546

15.5 Future Perspectives 551

Epilogue 555

Appendices 556

A Molecular Modeling Sample Syllabus 557

B Article Reading List 559