
Understanding the Molecular Mechanism of Elasticity in Elastin from a Solvation Perspective

by

Zhuyi Xue

A thesis submitted in conformity with the requirements for the degree of Masters of Science

Graduate Department of Biochemistry

University of Toronto

Copyright © 2013 by Zhuyi Xue

Abstract

Understanding the Molecular Mechanism of Elasticity in Elastin from a Solvation Perspective

Zhuyi Xue

Masters of Science

Graduate Department of Biochemistry

University of Toronto

2013

Elastin is an extracellular matrix protein that provides tissues with elasticity. In this thesis, we studied three aspects of elastin-based peptides by performing molecular dynamics (MD) simulations in explicit solvents: aggregation, solvent quality, and mechanical properties. First, by simulating the peptides in water and methanol, we found that methanol stabilizes the secondary structure of amyloid-like peptides, based on which we hypothesized that the reduction of the solvophobic effect in methanol compared to that in water prevents their formation of amyloid-like fibrils. Second, we studied the effects of various solvents with different polarities on the peptides, and found that they exhibit different solvent qualities. Third, we developed a model to predict the Young’s modulus of elastin-like material using data from MD simulations. This model produces results consistent with experimental measurements, and hence provides a way to evaluate solvent effects on elasticity. We conclude that the hydrophobic effect plays an important role in generating elasticity.


Acknowledgements

To study abroad for the first time is a wonderful yet very challenging experience. I feel grateful to all the people who have helped me along the way.

To my supervisor and committee. Thanks to Dr. Regis Pomes, who brought me here initially, and to my committee, Dr. Fred Keeley, Dr. Simon Sharpe and Dr. Zhaolei Zhang, for their guidance, suggestions and comments.

To my colleagues. Thanks to Dr. John Holyoake, Dr. Chris Neale, Dr. Nilu Chakrabarti, Dr. Loan Huynh, Dr. Chris Madill, Dr. Sarah Rauscher, David Caplan, Grace Li, Kethika Kulleperuma, Aditi Ramesh, Christopher Ing, and Ana Nikolic for their guidance, suggestions and comments.

To my friends and family members. Thanks to Lois Yin, Guang Shi and Feiyang Liu. Thanks to my best friends, Gangzhi Zheng, Jian Yu, Quan Jin and Yong Zhu. Thanks to my dear Beibei Zhang. Thanks to my mom and all the other family members for being understanding and supportive all the time.

Finally, I would also like to thank the following high-performance computing consortia of Compute Canada for providing computational resources for the work in this thesis: SciNet, RQCHP, CLUMEQ, WestGrid and SHARCNET.


https://computecanada.ca/

http://www.scinethpc.ca/

https://rqchp.ca/

http://www.clumeq.ca/

https://www.westgrid.ca/

https://www.sharcnet.ca/

Contents

List of Tables
List of Figures
List of Acronyms
List of Symbols

1 Introduction
1.1 Elastomeric Proteins
1.2 Elastin and Tropoelastin
1.3 In Vivo and In Vitro Elastogenesis
1.4 Aggregation Propensities of Elastin-based Peptides
1.5 Solvent Quality & Conformational Equilibria
1.6 Molecular Mechanism of Elasticity in Elastin
1.7 Review of Previous MD Simulations on Elastin-based Peptides
1.8 Objectives
1.9 Organization of this Thesis

2 Methods
2.1 Molecular Dynamics Simulations
2.2 Force Fields
2.2.1 All-atom Force Fields
2.2.2 Coarse Grained Force Fields
2.3 Sampling Errors

3 Elastin-based Peptides in Water and Methanol
3.1 Background
3.2 Results
3.2.1 Intrinsically Disordered Peptides
3.2.2 Radius of Gyration
3.2.3 Intramolecular Peptide-peptide Interactions
3.2.4 Interactions between Peptide and Solvent
3.2.5 β-sheet Content
3.3 Discussion
3.4 Conclusion
3.5 Material & Methods

4 Solvent Quality Studies
4.1 Background
4.2 Results
4.2.1 Radius of Gyration
4.2.2 Secondary Structure Content
4.2.3 Size of peptides In Vacuo
4.2.4 The Discrepancy in β-sheet content
4.2.5 Ratio of cis/trans Peptide Bonds
4.3 Discussion

5 Modeling Mechanical Properties
5.1 Background
5.2 Theory
5.2.1 Modulus of a Monomer as a Spring
5.2.2 Young’s Modulus in the tetrahedron model
5.3 Results
5.3.1 Modulus of Peptide Monomers
5.3.2 Young’s Modulus
5.3.3 Stress-strain Curve
5.4 Discussion
5.4.1 Comparison between Experiments and Simulations
5.4.2 Comparison between Results in Water and in Methanol
5.5 Conclusion

6 Summary & Future Directions
6.1 Summary
6.2 Future Directions

Appendix A Force Fields Comparison
A.1 Background
A.2 Results
A.2.1 Force Fields Comparison for (GVPGV)7
A.2.2 Force Fields Comparison for (GV)18
A.2.3 Force Fields Comparison for Dipeptides In Vacuo
A.3 Discussion
A.4 Material & Methods

Appendix B sumcoresg
B.1 Motivation
B.2 Material & Methods
B.3 Workflow
B.4 Screen Shots
B.5 Future Directions

Appendix C xit
C.1 Motivation
C.2 Material & Methods
C.3 Usage Examples
C.4 Future Directions

Appendix D tprparser
D.1 Motivation
D.2 Material & Methods
D.3 Results
D.4 Discussion

Bibliography

List of Tables

1.1 Definition of the mechanical properties for quantification of elasticity
1.2 Comparison of in vivo and in vitro elastogenesis
2.1 Functional forms of bond and angle potentials
2.2 Functional forms of the potentials of proper and improper dihedral angles
2.3 Functional forms of the Lennard-Jones and electrostatic potentials
2.4 Evolutions of different force fields in chronological order
3.1 Model peptides
4.1 Peptide hydrophobicity and the solvent in which the peptide first reaches its maximum Rg
4.2 Percentages of different cis-X-Pro in PDB database
4.3 Summary of the fraction of cis-X-nonPro and cis-X-Pro
4.4 Box size of and number of solvent molecules in each system in OPLS-AA/L
4.5 Box size of and number of solvent molecules in each system in CHARMM22*
5.1 Comparison of Young’s moduli
A.1 Selected force field sets for comparison
B.1 Summary of scripts and folders in sumcoresg
C.1 Summary of scripts and folders in xit

List of Figures

2.1 Workflow of an MD simulation
2.2 Family tree of AMBER force fields
3.1 Snapshots of (GVPGV)7 and (GGVGV)7 in water and methanol
3.2 Distribution of Rg
3.3 Propensity for intramolecular peptide-peptide interactions
3.4 Propensity for intermolecular peptide-solvent interactions
3.5 Propensity to form β-sheet structure
3.6 RDFs between peptide and solvent nonpolar atoms
3.7 RDFs between the peptide nonpolar and solvent polar atoms
3.8 Time evolution of the peptide Rg
4.1 Average Rg of model peptides in water, alcohol solvents, and octane
4.2 Various types of backbone structures as defined in DSSP
4.3 Rg and intramolecular peptide-peptide H-bonds propensity of ELPs in vacuo as a function of temperature
4.4 Distribution of end-to-end distances in vacuo at 2707 K
4.5 Intramolecular H-bonds propensity of the model peptides in various solvents
4.6 Comparison of β-sheet content between Dataset 1 and Dataset 2
4.7 Comparison of Rg in Dataset 1 and Dataset 2
4.8 Average Rg of (PGV)12 in different solvents in CHARMM22* and OPLS-AA/L
4.9 Fraction of cis-X-nonPro
4.10 Fraction of cis-X-Pro
5.1 Illustration of a spring complex in the tetrahedron model
5.2 Illustration of a unit cell in the tetrahedron model
5.3 PMF along the end-to-end distance of ELPs
5.4 Young’s modulus as a function of strain for (GVPGV)7 and (PGV)12
5.5 Stress-strain curves
A.1 Distributions of Rg of (GVPGV)7 in different force field sets
A.2 PMFs of (GVPGV)7 in different force field sets
A.3 Average Rg of (GVPGV)7, (GV)18 and G36 in water, alcohol solvents, and octane
A.4 A snapshot of the zigzag extension of (GV)18 in methanol in CHARMM22*
A.5 H-bonding maps of (GV)18 in different force fields
A.6 H-bonding maps of (GV)18 in other solvents in CHARMM22*
A.7 PMFs of Ramachandran plots for Gly in (GV)18 in different force fields
A.8 PMFs of Ramachandran plots for Val in (GV)18 in different force fields
A.9 PMFs of Ramachandran plots for Gly in G36 in different force fields
A.10 Potential energy maps of the Gly dipeptide in different force fields
A.11 Potential energy maps of the Val dipeptide in different force fields
A.12 Potential energy maps of the Pro dipeptide in different force fields
B.1 Workflow of sumcoresg
B.2 Historical usage data along the time
B.3 Historical usage data in a bar chart

List of Acronyms

Rg      radius of gyration
ALP     amyloid-like peptide
AMBER   Assisted Model Building with Energy Refinement
CD      circular dichroism
CG      coarse-grained
CHARMM  Chemistry at HARvard Macromolecular Mechanics
EBP     elastin-based peptide
ELP     elastin-like peptide
ENM     elastic network model
FTIR    Fourier transform infrared spectroscopy
GP      genipin
H-bond  hydrogen bond
HP      hydrophobic
HPC     high-performance computing
HTTP    Hypertext Transfer Protocol
IDP     intrinsically disordered peptide
LINCS   linear constraint solver
MD      molecular dynamics
MR      multiple replica
NMR     nuclear magnetic resonance
OPLS    Optimized Potentials for Liquid Simulations
PME     Particle-Mesh Ewald
PMF     potential of mean force
PQQ     pyrroloquinoline quinone
RDF     radial distribution function
REX     replica exchange
RHS     right-hand side
SEM     standard error of the mean
SSH     Secure Shell 2
ssNMR   solid-state NMR
STDR    simulated tempering distributed replica
US      umbrella sampling
VREX    virtual replica exchange
XDR     External Data Representation
XL      cross-linking

List of Symbols

A       cross-sectional area perpendicular to the direction of extension of a piece of elastin-like material
F       pulling force
G0      system free energy when the peptide is in its relaxed state
Gd      system free energy when the peptide’s end-to-end distance is d
H       system enthalpy
KY      Young’s modulus
S       system entropy
T       temperature
U       potential energy
X       length of a spring complex in the tetrahedron model in Figure 5.1
Z       partition function
∆S      change of system entropy between the extended and relaxed states
∆d      extension of a peptide monomer
∆l      extension of a piece of elastin-like material
∆G      change of system free energy when the peptide end-to-end distance changes from d to d0
F       force applied on a particular atom by the rest of the system during MD simulations
R       coordinates of all the atoms in an MD system
r       coordinate of a particular atom in an MD system
v       velocity of a particular atom in an MD system
θtetra  tetrahedron angle, 109.4712°
m       (superscript) value of the property in methanol
w       (superscript) value of the property in water
u       (subscript) solute component of the property
v       (subscript) solvent component of the property
d       end-to-end distance of a peptide monomer
d0      end-to-end distance of a peptide monomer in its relaxed state
dt      time step used in MD systems
f       recoiling force of a piece of elastin-like material in its extended state
fe      enthalpic part of the recoiling force (f)
fs      entropic part of the recoiling force (f)
h0      height of a piece of elastin-like material in its relaxed state
k       modulus of a spring or a peptide monomer
kB      Boltzmann constant
kc      modulus of a spring complex in the tetrahedron model
ku      modulus of a unit cell in the tetrahedron model
l       length of a piece of elastin-like material
l0      length of a piece of elastin-like material in its relaxed state
m       mass of a particular atom in an MD system
nu,x    number of unit cells along the x axis in the tetrahedron model
nu,y    number of unit cells along the y axis in the tetrahedron model
nu,z    number of unit cells along the z axis in the tetrahedron model
p       pressure
p0      probability when the peptide is in its relaxed state
pd      probability when the peptide’s end-to-end distance is d
r       ratio of the extension (x) over the original length (x0) of a spring complex in the tetrahedron model; equal to the strain of elastin-like material
r′      ratio of the shrinkage (x) over the original width and height of a piece of elastin-like material
s       length of OO1 in the tetrahedron model in Figure 5.1
s0      length of OO1 in the tetrahedron model in the relaxed state in Figure 5.1
t       time in an MD simulation
w0      width of a piece of elastin-like material in its relaxed state
x       extension of a spring complex in the tetrahedron model in Figure 5.1
x0      length of a spring complex in its relaxed state in the tetrahedron model in Figure 5.1

Chapter 1

Introduction

1.1 Elastomeric Proteins

A protein is considered elastomeric or elastic if it possesses elasticity, which is the physical property of a material to return to its original shape after being deformed by an external force. Measures of elasticity upon stretching include resilience, stiffness, strength, extensibility and toughness. Their definitions are given in Table 1.1. Elastomeric proteins play crucial biological roles throughout the animal kingdom [101]. Among them, those that possess high resilience, large extensibility and low stiffness are usually described as rubber-like proteins [41], since such properties are also characteristic of rubber. Typical rubber-like proteins include elastin and resilin. Elastin exists in most vertebrates and is responsible for the extensibility and recoil of biological tissues such as blood vessels, lung and elastic ligaments. Resilin, while very similar to elastin, exists only in insects, and is responsible for conveying essential mechanical properties in tissues such as the wing joints of dragonflies, the cuticle of fleas, and the tymbal of cicadas [101]. Examples of proteins that are elastomeric but not rubber-like include collagen fibers, which are highly resilient but also very stiff, and CoIP from mussel byssus threads and spider dragline silks, which have considerable stiffness, strength, and extensibility that protect them from fracture [41, 101]. A more comprehensive review of various elastomeric proteins and the measurement of their mechanical properties can be found in Rauscher & Pomes, 2010 [101].

The mechanical properties of elastomeric proteins are determined by their underlying structures at the molecular level. Because of their promising applications in biomedical engineering and material science [4], a variety of studies have investigated their structure-function relationships [32, 22, 18]. The work presented in this thesis focuses on one of these proteins: elastin.

Property | Definition
stress | force applied on the material, normalized by its cross-sectional area during deformation
strain | extension of the material along the direction of the applied force, normalized by its original length
resilience | reflects the efficiency of the material in storing energy; defined as the difference between the work done upon deformation and the heat released upon relaxation, normalized by the work
stiffness | measured by the Young’s modulus of the material, which is defined as the slope of the stress-strain curve upon stretching
strength | defined as the stress at which the material ruptures
extensibility | defined as the strain at which the material ruptures
toughness | defined as the total amount of work needed to rupture the material

Table 1.1: Definition of the mechanical properties for quantification of elasticity.
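The definitions in Table 1.1 can be illustrated with a minimal Python sketch. The function names and the choice of a least-squares line to estimate the slope of the stress-strain curve are assumptions made here for illustration; they are not taken from the thesis, and real stress-strain curves need not be linear over the whole range.

```python
def stress(force, area):
    """Force applied on the material normalized by its cross-sectional area."""
    return force / area


def strain(extension, original_length):
    """Extension along the pulling direction normalized by the original length."""
    return extension / original_length


def youngs_modulus(strains, stresses):
    """Slope of the stress-strain curve, estimated here by a least-squares
    straight-line fit (an illustrative assumption, not the thesis's method)."""
    if len(strains) != len(stresses) or len(strains) < 2:
        raise ValueError("need at least two matching (strain, stress) points")
    n = len(strains)
    mean_x = sum(strains) / n
    mean_y = sum(stresses) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(strains, stresses))
    sxx = sum((x - mean_x) ** 2 for x in strains)
    if sxx == 0:
        raise ValueError("strain values must not all be identical")
    return sxy / sxx


def resilience(work_done, heat_released):
    """(Work done upon deformation - heat released upon relaxation) / work done."""
    return (work_done - heat_released) / work_done
```

With stress in pascals and dimensionless strain, `youngs_modulus` returns a value in pascals; a perfectly resilient material would give `resilience(...) == 1.0`.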


1.2 Elastin and Tropoelastin

Elastin is an extracellular matrix protein [123, 85] that has been found in all vertebrates except for jawless agnathans such as lampreys [23]. The elastin content varies among tissues. For example, it is about 28-32% of dry mass in major vascular vessels, 3-7% in lung, 50% in elastic ligaments, 4% in tendon, and 2-3% in skin [112, 123]. In addition, elastin has also been found in vertebral ligamenta flava, vocal cords, elastic cartilage, and bladder [83, 24].

As a polymeric matrix protein, elastin has a monomeric precursor called tropoelastin. While mature elastin is extremely insoluble, tropoelastin is soluble at room temperature. Tropoelastin is unusual in both its amino acid composition and its domain composition. At the amino acid level, tropoelastin is mostly made of hydrophobic residues. As a result, tropoelastin is among the most hydrophobic proteins. For example, human tropoelastin has 34 exons [17] and over 700 amino acids, yet 75% of the entire sequence consists of only four hydrophobic residues: Gly, Val, Ala, and Pro [52]. Such a high level of hydrophobicity is common in the elastin of all higher vertebrates despite some species variation [123]. At the domain level, tropoelastin consists of alternating hydrophobic (HP) and cross-linking (XL) domains. HP domains are usually rich in nonpolar residues and highly repetitive. For example, Exon 24 of human tropoelastin contains seven-fold PGVGV[L/A] repeats [52]. XL domains are usually rich in Ala, with a couple of Lys interspersed in the form of KAAK or KAAAK [123]. When tropoelastins are crosslinked to form a matrix, four Lys residues from two XL domains act as the crosslinkers and are oxidatively deaminated to form the crosslink, a desmosine or isodesmosine [123]. It is thought that the XL domains impart strength and stability to elastin, while the HP domains confer extensibility [101].


1.3 In Vivo and In Vitro Elastogenesis

The process of elastin generation, which includes tropoelastin production and matrix formation, is called elastogenesis [115]. The term is usually used for the in vivo process, but since in vitro synthesis of elastin-like material has been made possible [10, 120], we consider that it also applies to the in vitro process.

In vivo, isoforms of tropoelastin mRNA are produced from a single tropoelastin gene through alternative splicing. They are transported to the rough endoplasmic reticulum (RER) in the cytoplasm and translated into tropoelastin polypeptides. With very few post-translational modifications, tropoelastin binds to the elastin-binding protein (EBP), which protects it from degradation, and together they are transported close to the cell surface via the Golgi apparatus. At the cell surface, EBP also binds to a β-galactosugar, which reduces its affinity for tropoelastin. As a result, tropoelastin is released into the extracellular environment. Released tropoelastins align with each other upon the microfibrils, a scaffold made of multiple distinctive proteins, so that their crosslinker residues (i.e. Lys in the XL domains of tropoelastin) can come close enough to each other for crosslinking reactions to happen. This process is also named coacervation. After the alignment, tropoelastins are crosslinked together by lysyl oxidase via oxidative deamination, which results in the formation of mature elastin. The crosslinks prevent elastin from falling apart under extension, which is essential in conveying its mechanical properties. Since elastin is closely connected with microfibrils, the final structure of elastin and microfibrils together is also called the elastic fiber. A more comprehensive description of elastogenesis can be found in Vrhovski and Weiss, 1998 [123] and Eldijk et al., 2012 [115].

An in vitro process similar in flow to in vivo elastogenesis has been developed to produce elastin-like materials [10, 120], though the details are quite different. Instead of using tropoelastin directly, a much shorter elastin-like peptide can be used as the precursor peptide for the later-formed polymeric matrix. First, the monomeric peptides are produced in genetically modified E. coli and purified. Second, coacervation is induced by increasing the temperature. Then, a crosslinking agent such as genipin (GP) or pyrroloquinoline quinone (PQQ) is added to the coacervate to start the crosslinking reactions. The coacervate with added crosslinking agent is left overnight, during which self-alignment and the formation of crosslinks take place. Some of the major differences compared to the in vivo process are summarized in Table 1.2.

Step | In Vivo [123, 115] | In Vitro [11, 10, 120]
Monomeric peptide | Tropoelastin | Elastin-like peptide or tropoelastin
Production of monomeric peptides | Produced through transcription, splicing, translation, transportation to cytoplasm, post-translational modification, transportation out of membrane | Produced with genetically modified E. coli and purified
Coacervation | Induced by increase of concentration | Induced by increase of temperature
Crosslinking | Achieved with lysyl oxidase | Achieved with chemical crosslinker like GP or PQQ

Table 1.2: Comparison of in vivo and in vitro elastogenesis.


1.4 Aggregation Propensities of Elastin-based Peptides

As described above, there is a well-documented protocol for producing elastin-like material in vitro, but very little is known about this process at the molecular level. Of the unknowns, coacervation is of particular interest to this thesis. For example, very little is known about the structure of the coacervate, the protein-rich phase formed after phase separation.

In the simplest sense, coacervation can be understood as a type of protein aggregation. More precisely, in vitro coacervation is a reversible temperature-induced phase separation process, in which tropoelastin molecules aggregate, self-assemble, and form a turbid, protein-rich second phase [122, 123, 24]. Coacervation is generally considered to result from increased hydrophobic interactions between the HP domains of tropoelastin as the temperature increases [123]. The onset temperature of coacervation depends on multiple factors such as sequence composition, peptide concentration, ionic strength, pH, and solvent hydrophobicity [123, 11, 79, 80].

It has been found that model peptides derived from the sequences of HP domains, i.e. elastin-based peptides (EBPs), can also coacervate. Furthermore, materials made from such sequences have been shown to possess mechanical properties similar to those of native elastin, hence they are described as elastin-like [10]. However, not all EBPs coacervate. The aggregation propensities of EBPs can be modulated by introducing sequence variation or changing solvent conditions. For example, an EBP with PGVGVA repeats named EP20-24-24 is capable of coacervation, but when P is mutated to G, resulting in GGVGVA repeats, it forms amyloid-like fibrils instead, which contain a large amount of β-sheet [79]. Another EBP, (VGGVG)n, forms amyloid-like structures when deposited in water [36, 34, 35], which is consistent with the previous observation [79], since its repetitive unit, GGVGV, is very similar to GGVGVA, the repeat of EP20-24-24 after the P-to-G mutation. However, if (VGGVG)n is deposited in methanol instead, it initially forms an amorphous film, which eventually evolves into beaded string structures [36]. The beaded string morphology might be an artifact caused by the oxidized silicon on the substrate surface used for deposition [34]. To impede contact between the peptides and the substrate, more recent work on a very similar sequence, (VGGLG)n, used a pegboard-like substrate surface, which led to the formation of cigar-like bundles instead of beaded strings. Meanwhile, like (VGGVG)n, (VGGLG)n also forms amyloid-like fibrils when deposited in water [20]. The presence of β-sheet structure in those fibrils has been confirmed by Fourier transform infrared spectroscopy (FTIR) [105], and models for those structures have been proposed and evaluated [35, 105].

On the one hand, coacervation is believed to be an important step in in vivo elastogenesis, and it has been suggested that coacervation concentrates and aligns tropoelastin before cross-linking [123]. On the other hand, the formation and deposition of amyloid-like fibrils are associated with many neurodegenerative diseases such as Alzheimer’s and Parkinson’s diseases [29]. Amyloid-like structure has even been proposed by Dobson to be a generic and inherent structural form accessible to all proteins under appropriate conditions [28, 29]. Therefore, it is important to understand the molecular mechanism of amyloid-like fibril formation in order to develop effective treatments for those diseases. Given these two types of aggregation, it is striking that some EBPs can display both, depending on the conditions.

In the literature, elastin-based peptides have also been called elastin-like peptides or elastin-derived peptides. To keep the nomenclature consistent, all peptides derived from HP domains of tropoelastin are called elastin-based peptides (EBPs) in this thesis; EBPs that tend to coacervate are called elastin-like peptides (ELPs), while those that tend to form amyloid-like fibrils are called amyloid-like peptides (ALPs).


In 2006, my colleague Sarah Rauscher and coworkers investigated the structural properties of a set of EBPs of different sequence compositions. They found that ELPs and ALPs are distinguishable according to backbone hydration and peptide-peptide hydrogen bonding, and that ELPs remain disordered in both the monomeric and the aggregated state [98]. Furthermore, they discovered that Pro-Gly (PG) content is a very important criterion for determining a peptide’s aggregation propensity [98]. Their PG diagram shows that peptides with higher PG content are unlikely to form amyloid-like fibrils, since both Pro and Gly are secondary structure breakers due to their extreme rigidity and flexibility, respectively. As a result, peptides with a high percentage of P and G are destined to be disordered. This discovery challenges Dobson’s proposal that amyloid-like structure is accessible to all kinds of proteins.

However, very little is known about how solvent conditions affect EBPs’

aggregation propensities at the molecular level, which is one of the most important questions

addressed in this thesis.

1.5 Solvent Quality & Conformational Equilibria

In order to develop a comprehensive understanding of the aggregation process, it is impor-

tant to have a quantitative measure of the structure of peptides in solution. That EBPs

tend to aggregate and be disordered [98] reminds us of the similarity between intrinsi-

cally disordered peptides (IDPs) and synthetic polymers [104, 90], whose conformation

has been well studied in the discipline of polymer physics. The conformational equilibria

of synthetic polymers are governed by the balance of chain-chain and chain-solvent inter-

actions, which are in turn determined by solvent quality [90]. Therefore, we can adopt

the analysis methods from polymer physics and apply them to the polypeptides.


A single polymer molecule in a dilute solution can adopt a swollen coil in a good sol-

vent, a collapsed globule in a poor solvent, or a state in-between. In a good solvent

(e.g. polystyrene in benzene), chain-solvent interactions are favored over chain-chain in-

teractions, so the molecule swells; in a poor solvent (e.g. polystyrene in ethanol), the

chain-chain interactions dominate over chain-solvent interactions, so the molecule col-

lapses and becomes compact. If the poor and good solvents are interpolated, there will

be an ideal point where chain-chain interactions and chain-solvent interactions balance

out. This point is called the θ-point, and the corresponding solvent and temperature are

called the θ-solvent and θ-temperature. At the θ-point (e.g. polystyrene in cyclohexane at

34.5 °C), the chain adopts a random coil, and it reaches its maximum chain entropy [104].
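The three regimes above are often summarized by how the size of a chain grows with its length. As a minimal sketch, assuming the standard Flory scaling law Rg ∝ N^ν (with ν ≈ 3/5 in a good solvent, 1/2 at the θ-point and 1/3 in a poor solvent; these exponents, the function names and the tolerance below are not taken from this chapter), the exponent can be estimated from a log-log fit and used as a rough label of solvent quality. The data in the example are synthetic and are generated from the scaling law itself.

```python
import math

def fit_scaling_exponent(chain_lengths, radii):
    """Least-squares slope of log(Rg) versus log(N), i.e. the exponent nu."""
    xs = [math.log(n) for n in chain_lengths]
    ys = [math.log(r) for r in radii]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

def classify_solvent(nu, tol=0.05):
    """Hypothetical labelling rule: compare nu with the ideal-chain value 1/2."""
    if nu > 0.5 + tol:
        return "good"
    if nu < 0.5 - tol:
        return "poor"
    return "theta-like"

# Synthetic radii of gyration that follow Rg = 0.3 * N^0.5 exactly.
lengths = [10, 20, 40, 80, 160]
theta_radii = [0.3 * n ** 0.5 for n in lengths]
nu_theta = fit_scaling_exponent(lengths, theta_radii)
```

In practice the radii would come from simulations of chains of several lengths, and the fitted exponent would carry a statistical uncertainty that this sketch ignores.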

A special case of solvation is called polymer melt, which means that the polymers are

solvated by themselves. It was first predicted by Paul Flory that polymer molecules may

behave as ideal chains when solvated by themselves [104], and this prediction has since

been validated for synthetic homopolymers like poly(methyl methacrylate) [62]. Encouragingly,

using MD simulations, my colleague Sarah Rauscher recently discovered that the conformation of the ELP

(GVPGV)7 in the aggregate resembles that in a polymer melt [97],

which not only contributes to our understanding of the structure of the coacervate, but also

suggests that the Flory theorem is applicable to polypeptides as well.

The solvent quality of a particular solvent is mainly affected by the inherent properties of

the solvent molecules and the temperature [90]. The theory from polymer physics works

well for uniform polymers like polyethylene, but when it comes to a polypeptide, the case

is often more complicated due to the uneven distribution of polar and nonpolar groups,

i.e. polar backbone and nonpolar sidechains in the EBPs, which causes the formation of

secondary structures in proteins. Therefore, even with the aid of polymer physics, the

problem of how solvent quality affects the conformational equilibria of a peptide needs

to be further explored.


1.6 Molecular Mechanism of Elasticity in Elastin

Elastin has been under study for over 70 years [130], and it has been shown that its

elasticity is primarily due to entropy loss between the stretched and relaxed state [82,

49]. However, because of its conformational heterogeneity [98] and extreme insolubility

[96], the characterization of its atomistic structure remains elusive, hence its molecular

mechanism for elasticity still remains controversial. This section presents a brief review of

the various models proposed to explain the molecular mechanism of elasticity in elastin.

Over the course of elastin research, two major groups of structure-function models have

been proposed, which consider elastin to be either isotropic or anisotropic [123, 83].

The isotropic model considers elastin to be a random-chain network like rubber, where

each individual peptide is kinetically free. As a result, the elasticity in elastin is mainly

due to the decrease in chain entropy when it is being stretched [49, 30]. There are many

research results compatible with the random-chain network model. For example, elastin

contains a high percentage of Pro and Gly, which is conducive to disordered peptide

structure, and polarized light microscopy on elastin exhibits no birefringence, which

suggests isotropic conformation [1]. However, this model cannot explain the fact that

elastin is not self-lubricating and requires a plasticizer such as water in order to exhibit

elasticity [92].

The anisotropic models can be further categorized into the two-phase model (mainly

the liquid drop model [129] and the oiled-coil model [44]) and the β-spiral model [119].

The two-phase model emphasizes that elastin contains both a hydrophobic phase and

a hydrophilic phase. When elastin is in its relaxed state, the hydrophobic phase is

buried inside while the hydrophilic phase is on the surface, but when it is stretched, with

the increase in its surface area, the hydrophobic phase becomes more exposed, which

results in relative ordering of the surrounding water molecules, and induces decrease in


the total entropy of the system [92, 129, 44]. The two-phase model is supported by

a fluorescence study, in which the dye-labeled elastin exhibits lower fluorescence when

being stretched, indicating a nonhomogeneous environment inside the elastin network

[43], but it is criticized for being unlikely to convey high backbone mobility, which is

observed by both nuclear magnetic resonance (NMR) and solid-state NMR (ssNMR)

studies [110, 96]. The β-spiral model suggests that elastin consists of helical structures

which comprise repetitive β-turns, and elasticity is caused by reduced librational entropy

upon stretching [119]. However, the β-spiral structure has been reported to be very

unstable [67] though transient β-turns are abundant [122, 98]. In a previous study from

our group, it has been shown that the hydrophobic domains of elastin remain disordered

even in the aggregated state due to its richness in Pro [98]. Therefore, the β-spiral model

is highly unreliable.

More detailed reviews of the various models for elastin can also be found in published

papers and reviews [68, 123, 83]. Overall, the proposed models are supported by some ex-

perimental results, but unfortunately, none of them can explain all the evidence observed

in experiments properly [68].

1.7 Review of Previous MD Simulations on Elastin-

based Peptides

Because there is still no experimental approach for obtaining high-resolution structural

information about intrinsically disordered peptides like EBPs, molecular dynamics (MD)

simulation is a good technique to study their structures at the atomistic level. MD

simulation is also the major technique used in this thesis. In this section, we briefly

review all of the MD simulations that have been conducted on peptides relevant to


elastin.

The first MD simulation on an EBP was conducted by Chang and Urry in 1989 [21].

Starting from a previously developed β-spiral structure [119], they simulated the repetitive

polypeptide VPGVG in vacuo for 100 ps in both relaxed and stretched states [21].

In 1990, Wasserman and Salemme simulated (VPGVG)18 in the β-spiral structure for

130 ps, this time with water molecules included [125].

The analyses of both of the above simulations turned out to be supportive of the so-called

“librational elasticity mechanism” for explaining elastin’s elasticity [114]. However,

since elastin is known to be functional only in its hydrated state and brittle otherwise [6],

the in vacuo state of VPGVG in Chang and Urry’s simulations is probably not

representative of elastin’s functional states. Besides, in both simulations, the time

scales are only on the order of 100 ps, which from today’s perspective is much too short to allow adequate

conformational relaxation of the peptide from its initial state.

Interestingly, the β-spiral model was refuted about a decade later, in 2001, after Li et al.

simulated a 90-residue β-spiral-structured EBP, (VPGVG)18, with explicit water for a

total of 80 ns at 7 different temperatures between 7 and 42 °C [67]. They found that the

peptide collapses at all temperatures, which shows the instability of the β-spiral structure.

They concluded that the well-ordered β-spiral model is not a good description of

elastin in water [68]. Besides, Li et al. also found that the peptide at higher temperatures

above the transition temperature, i.e. the temperature at which coacervation happens,

appeared to be more compact than at lower temperatures below the transition tempera-

ture. Based on these results, they proposed an atomic-level description of coacervation.

However, a later study from our group, which involved much more extensive sampling

on the peptide (GVPGV)7 in a total sampling time of 84 µs at 105 temperatures be-

tween 266 and 749 K (800 ns per temperature) suggests that the compactness observed


at higher temperatures is probably due to a shorter relaxation time [97]. Therefore, Li’s

results on the overall compactness of the peptide are probably an artifact of insufficient

sampling time (80 ns). Had the simulations of Li et al. [67] been extended,

the peptides at lower temperatures would be expected to become more compact than at higher

temperatures.

Although it has been known for many years that elasticity in elastin is mainly entropic

[121, 82, 30, 5], what is still not clear is which part of the system is the major contributor

to the entropy change. Is it the change of backbone chain entropy, or that of the entropy

of the water (a.k.a. the hydrophobic effect)? The earliest simulations failed to answer

this question because either they were in vacuo [21] or the hydration properties were

not analyzed [125, 68]. In 2002, by pulling and releasing (VPGVG)18 at 10 and

42 °C, Li et al. found that the orientational entropy of water molecules hydrating the

hydrophobic groups decreases upon pulling and increases upon releasing, while the chain

entropy undergoes the opposite change, at least for small extensions, which is consistent

with results from a previous microcalorimetry experiment in 1978 [42]. Therefore, they

concluded that hydrophobic hydration is an important source of elasticity in elastin [66].

In 2004, Floquet et al. characterized the structural properties of hexapeptide VGVAPG

derived from the repetitive HP domains using both MD simulations and experimental

techniques, and they found that the GVAP sequence in the peptide exhibits a so-called

type VIII β-turn [37]. In 2006, another small elastin-based oligopeptide GVG(VPGVG) was

simulated for around 100 ns [6], and the peptide’s kinetics were analyzed.

The simulations published in 2004 and 2006 only involved oligopeptides. Also in 2006,

a much more extensive study on much longer peptides was conducted in our group

[98]. The key objective of that work was to examine the structural properties of ELPs

and ALPs. As mentioned in Section 1.4, ELPs and ALPs are both derived from the

HP domains of tropoelastin, but display different aggregation propensities. That study


showed that the two types of peptides are separable based on backbone hydration and

peptide-peptide hydrogen bonding, and ELPs remain disordered in both monomeric and

aggregated state [98]. Another important contribution from that work is the discovery

of a PG threshold in the peptides’ sequence composition, above which the peptides are

elastin-like and below which the peptides are amyloid-like [98]. The total sampling time

of this work reached 800 ns, which is about 8000 times longer than the first MD simulation

done in 1989.

However, despite the enormous advancement in computational power, the time scale accessible to

MD simulations is still very limited compared to that of experiments on macroscopic

systems. This bottleneck is even more challenging for disordered proteins because of their

structural heterogeneity. To alleviate this limitation, our group developed an enhanced-sampling

technique called simulated tempering distributed replica (STDR). In 2009, a

sampling time of 42 µs for the system of (GVPGV)7 as a monomer in explicit water [99]

was reached after deploying STDR on high-performance computing (HPC) facilities.

Continuing from the monomer study, Sarah Rauscher also explored the structural prop-

erties of the ELP (GVPGV)7 in the aggregated state. Surprisingly, she found that an

elastin-like aggregate state resembles a polymer melt, in which the monomers become

very flexible and behave similarly to a polymer chain in a θ-solvent. Based on her results,

she proposed a unified model that aims to resolve the contradictions between

different structure-function models: only in the aggregate, through extensive

intermolecular peptide-peptide nonpolar interactions (consistent with the two-phase

model), can the peptides’ chain entropy be maximized, so that the peptides

become random chains (consistent with the random-chain network model) [97].

As reviewed above, all of the previous MD simulations are done either in vacuo or in

water. Since EBPs can display different aggregation propensities in different solvents,

and hydrophobic effect can be an important source of elasticity, it would be interesting


and informative to simulate EBPs in different solvents of different polarities from water.

1.8 Objectives

The objectives of this thesis include:

1. Characterize the structural properties of EBPs in water and methanol, and explain

the variation of their aggregation propensities.

2. Characterize the structural properties of EBPs in other alcoholic solvents of varying

polarities, and compare their solvent quality for these peptides.

3. Model the Young’s modulus of macroscopic material based on MD data, and com-

pare the modeled results to experimental measurements.

1.9 Organization of this Thesis

Chapter 1 provides a general introduction to elastin-relevant topics. Chapter 2 briefly in-

troduces MD simulations, the developments of MD force fields and models, and sampling

errors. Chapters 3–5 present the major results of this thesis. Chapter 3 examines the

structural properties of EBPs successively in water and in methanol, and discusses the

solvents’ effects on their aggregation properties. Chapter 4 extends the solvent sets to

include more alcoholic solvents and octane, and discusses their solvent qualities for EBPs.

Chapter 5 describes how the modulus of a monomeric ELP is calculated, and proposes

a mathematical model to calculate the Young’s modulus of a piece of macroscopic mate-

rial based on these ELPs. Chapter 6 summarizes the contributions from this thesis and

proposes future directions. Finally, Appendix A presents the results of ongoing work

on force field comparison, which aims to find an optimal force field for this project, and

Appendices B, C and D describe three computational tools developed while preparing

this thesis.

Chapter 2

Methods

2.1 Molecular Dynamics Simulations

MD simulation is the major technique employed in this thesis. MD simulations intend

to generate a conformational ensemble of the target molecular system by simulating its

dynamics using classical Newtonian mechanics, and based on the ensemble, interesting

structural, thermodynamic and mechanical properties can be calculated. The rest of this

section presents the basic theory and practice of MD simulations.

A typical MD simulation needs two ingredients. The first one is a set of atom coordinates

of the system of interest, which can be either obtained from experiments like X-ray

crystallography or NMR study, or constructed de novo, and the second one is a force

field, which includes a set of functions that define the calculation of the potential energy

of the target system and the corresponding parameters used by these functions [38]. The

exact mathematical forms of the functions depend on the force field as discussed in the

next section.


To start the simulation, the initial velocities of all atoms are assigned artificially according

to the Maxwell distribution at a particular temperature. The force at time 0 is calculated

as

$$\mathbf{F}_i(0) = -\frac{\partial U(0)}{\partial \mathbf{r}_i(0)}, \tag{2.1}$$

where $\mathbf{F}_i$ is the force acting on the $i$th atom, $\mathbf{r}_i$ denotes its coordinates, and $U$ is the potential

energy of the system as a function of the coordinates of all the atoms ($\mathbf{R}$).
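As a minimal sketch of these two initial steps, assuming a toy one-dimensional harmonic potential, reduced units with $k_B = 1$, and hypothetical helper names (`maxwell_velocity`, `numerical_force`) that are not part of any MD package: each velocity component is drawn from a Gaussian with variance $k_B T/m$, and the force is evaluated as the negative derivative of the potential.

```python
import math
import random

K_B = 1.0  # Boltzmann constant in reduced units (assumption for illustration)

def maxwell_velocity(mass, temperature, rng):
    """Draw one Cartesian velocity component from the Maxwell-Boltzmann distribution."""
    sigma = math.sqrt(K_B * temperature / mass)
    return rng.gauss(0.0, sigma)

def harmonic_potential(x, k=2.0, x0=1.0):
    """Toy potential U(x) = k (x - x0)^2, standing in for a real force field."""
    return k * (x - x0) ** 2

def numerical_force(potential, x, h=1e-5):
    """Force as the negative gradient of U, via a central finite difference."""
    return -(potential(x + h) - potential(x - h)) / (2.0 * h)

rng = random.Random(0)  # fixed seed so the example is reproducible
velocities = [maxwell_velocity(mass=1.0, temperature=1.5, rng=rng) for _ in range(20000)]
# The mean of v^2 should approach k_B T / m for a large sample.
mean_square_velocity = sum(v * v for v in velocities) / len(velocities)

# For U = 2 (x - 1)^2 the analytical force at x = 0 is -dU/dx = +4.
force_at_origin = numerical_force(harmonic_potential, 0.0)
```

MD packages evaluate forces analytically from the force field; the finite difference here only serves to show the sign convention.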

To integrate Newton’s equations of motion over the next time step $dt$, suppose the

current time is $t$; the Taylor expansion of $\mathbf{r}(t+dt)$ is

$$\mathbf{r}(t+dt) = \mathbf{r}(t) + dt\,\mathbf{v}(t) + \frac{dt^2}{2}\frac{\mathbf{F}(t)}{m} + \frac{dt^3}{3!}\frac{\partial^3 \mathbf{r}(t)}{\partial t^3} + O(dt^4), \tag{2.2}$$

where $m$ is the mass of the atom.

Several algorithms have been developed to perform the integration, and the most common

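one is called the Verlet algorithm. It starts by expanding $\mathbf{r}(t-dt)$,

$$\mathbf{r}(t-dt) = \mathbf{r}(t) - dt\,\mathbf{v}(t) + \frac{dt^2}{2}\frac{\mathbf{F}(t)}{m} - \frac{dt^3}{3!}\frac{\partial^3 \mathbf{r}(t)}{\partial t^3} + O(dt^4), \tag{2.3}$$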

and then adding it to Equation (2.2), which yields

$$\mathbf{r}(t+dt) = 2\mathbf{r}(t) - \mathbf{r}(t-dt) + \frac{\mathbf{F}(t)}{m}dt^2, \tag{2.4}$$

the error of which is of order $dt^4$. Similarly, subtracting Equation (2.3) from (2.2) yields

$$\mathbf{v}(t) = \frac{\mathbf{r}(t+dt) - \mathbf{r}(t-dt)}{2dt}, \tag{2.5}$$

the error of which is of order $dt^2$. It is possible to obtain a more accurate $\mathbf{v}(t)$ using refined

algorithms, which are described in [38, 3].
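To make the cancellation explicit, the sum and the difference of the two Taylor expansions can be written out term by term. This is only a worked restatement of the step above, using the same symbols: the odd-order terms cancel in the sum, and the even-order terms cancel in the difference.

```latex
\begin{aligned}
\mathbf{r}(t+dt) + \mathbf{r}(t-dt) &= 2\mathbf{r}(t) + \frac{\mathbf{F}(t)}{m}\,dt^2 + O(dt^4),\\
\mathbf{r}(t+dt) - \mathbf{r}(t-dt) &= 2\,dt\,\mathbf{v}(t) + \frac{dt^3}{3}\frac{\partial^3 \mathbf{r}(t)}{\partial t^3} + O(dt^5).
\end{aligned}
```

Dividing the second line by $2dt$ leaves a leading error term of $\frac{dt^2}{6}\frac{\partial^3 \mathbf{r}(t)}{\partial t^3}$ in the velocity, which is why the velocity is only accurate to order $dt^2$.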

Returning to time 0, after evaluating Equation (2.1), integrating to the next time step with the Verlet

algorithm requires the coordinates at time $-dt$, which can be estimated using

$$\mathbf{r}_i(-dt) = \mathbf{r}_i(0) - \mathbf{v}_i(0)\,dt. \tag{2.6}$$


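The accuracy of $\mathbf{r}(-dt)$ is not so important since it is only used to bootstrap the simulation

[38]. Then,

$$\mathbf{r}_i(dt) = 2\mathbf{r}_i(0) - \mathbf{r}_i(-dt) + \frac{\mathbf{F}_i(0)}{m}dt^2. \tag{2.7}$$

With $\mathbf{F}_i(dt)$ calculated similarly to Equation (2.1),

$$\mathbf{r}_i(2dt) = 2\mathbf{r}_i(dt) - \mathbf{r}_i(0) + \frac{\mathbf{F}_i(dt)}{m}dt^2. \tag{2.8}$$

Then,

$$\mathbf{v}_i(dt) = \frac{\mathbf{r}_i(2dt) - \mathbf{r}_i(0)}{2dt}. \tag{2.9}$$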

The same calculation is applied for every atom and continuously repeated until the in-

tended sampling time is reached. For this algorithm, r is always one step ahead of v.

The only difference between the first and following calculations is that the coordinates at

the previous time step can be obtained directly rather than from estimation. The same

process is also illustrated in Figure 2.1.
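The procedure of Equations (2.1)–(2.9) can be sketched in a few lines. This is a minimal illustration rather than the integrator of any MD package: it assumes a single particle in one dimension, a toy harmonic force $F = -kx$, and hypothetical function and parameter names. The position at $-dt$ is bootstrapped as in Equation (2.6), positions are advanced as in Equation (2.4), and velocities follow from the central difference of Equation (2.5), so they lag one step behind the positions.

```python
import math

def harmonic_force(x, k=1.0):
    """Toy force F = -dU/dx for U = k x^2 / 2 (stand-in for a force field)."""
    return -k * x

def verlet(x0, v0, mass, dt, n_steps, force=harmonic_force):
    """Integrate one particle with the basic Verlet scheme.

    Returns lists of times, positions and velocities for t = 0 .. (n_steps - 1) * dt.
    Each velocity v(t) is computed only after r(t + dt) is known.
    """
    x_prev = x0 - v0 * dt          # bootstrap r(-dt), Eq. (2.6)
    x_curr = x0
    times, positions, velocities = [], [], []
    for step in range(n_steps):
        f = force(x_curr)                                   # Eq. (2.1)
        x_next = 2.0 * x_curr - x_prev + f / mass * dt**2   # Eq. (2.4)
        v_curr = (x_next - x_prev) / (2.0 * dt)             # Eq. (2.5)
        times.append(step * dt)
        positions.append(x_curr)
        velocities.append(v_curr)
        x_prev, x_curr = x_curr, x_next
    return times, positions, velocities

times, positions, velocities = verlet(x0=1.0, v0=0.0, mass=1.0, dt=0.01, n_steps=1000)
# For this toy system the exact solution is x(t) = cos(t).
max_error = max(abs(x - math.cos(t)) for t, x in zip(times, positions))
```

Reducing `dt` shrinks the deviation from the exact solution, at the cost of more steps for the same simulated time.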

2.2 Force Fields

A force field in MD simulations is defined as a set of functions used to calculate the

potential energy of the system (U) together with the corresponding parameters used

in these functions [9]. $U$ is usually decomposed into two groups: the terms arising from

bonded interactions ($U_{\text{bonded}}$) and those arising from nonbonded interactions ($U_{\text{nonbonded}}$).

The bonded potential energy terms include those of bonds ($U_{\text{bond}}$), angles ($U_{\text{angle}}$), dihedral

angles ($U_{\text{dihedral}}$) and improper dihedral angles ($U_{\text{improper}}$), the last of which are used to retain the

chirality and planarity of particular chemical groups such as sp$^3$ C atoms and planar rings,

while the nonbonded energy terms include the pairwise Lennard-Jones potential ($U_{\text{LJ}}$) and

the electrostatic potential ($U_{\text{electrostatic}}$). Their relationship can be summarized in Equations

(2.10)–(2.12):

$$U = U_{\text{bonded}} + U_{\text{nonbonded}}, \tag{2.10}$$

where

$$U_{\text{bonded}} = U_{\text{bond}} + U_{\text{angle}} + U_{\text{dihedral}} + U_{\text{improper}}, \tag{2.11}$$

and

$$U_{\text{nonbonded}} = U_{\text{LJ}} + U_{\text{electrostatic}}. \tag{2.12}$$

Despite this relationship, the exact mathematical forms of $U_{\text{bond}}$, $U_{\text{angle}}$, $U_{\text{dihedral}}$, $U_{\text{improper}}$,

$U_{\text{LJ}}$ and $U_{\text{electrostatic}}$ may differ between force fields, as discussed in a

moment.

[Figure 2.1 (flowchart):

1. Start with $\mathbf{r}_i(t-dt)$ and $\mathbf{r}_i(t)$; if $t = 0$, assign velocities and bootstrap $\mathbf{r}(-dt)$.

2. Calculate the force: $\mathbf{F}_i(t) = -\partial U(t)/\partial \mathbf{r}_i(t)$ (2.1).

3. Calculate the position: $\mathbf{r}_i(t+dt) = 2\mathbf{r}_i(t) - \mathbf{r}_i(t-dt) + \frac{\mathbf{F}_i(t)}{m}dt^2$ (2.8).

4. Calculate the velocity: $\mathbf{v}_i(t) = \frac{\mathbf{r}_i(t+dt) - \mathbf{r}_i(t-dt)}{2dt}$ (2.9).

5. Set $t = t + dt$.

6. Output $\mathbf{r}_i(t)$ (and $\mathbf{v}_i(t)$) for future analysis.]

Figure 2.1: Workflow of an MD simulation. $U$ is the potential energy of the system. $\mathbf{F}$

is the force applied on the $i$th atom. $\mathbf{r}$, $\mathbf{v}$ and $m$ are its coordinates, velocity and mass,

respectively. $dt$ is the length of a time step in the simulation. The calculation at each

time step is looped through all atoms in the system. For details of each step, please refer

to Section 2.1.

In this thesis, all the force fields discussed are limited to empirical ones. In contrast to

the force fields that involve quantum calculations, empirical force fields consider atoms

as the smallest particles, and only include relatively simple empirical functional forms.

In the rest of this section, three all-atom force field families, Optimized Potentials for

Liquid Simulations (OPLS) [59], Assisted Model Building with Energy Refinement (AMBER) [126],

and Chemistry at HARvard Macromolecular Mechanics (CHARMM) [19], as well as a

coarse-grained (CG) force field, MARTINI [76], are introduced.
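To make the decomposition in Equations (2.10)–(2.12) concrete, the sketch below sums a few illustrative energy terms. It is not the functional form of any particular force field: the harmonic bond and angle terms, the 12-6 Lennard-Jones term, the Coulomb term in reduced units and all parameter values are assumptions chosen for illustration, and the dihedral and improper terms are left as placeholders.

```python
import math

def harmonic(value, equilibrium, force_constant):
    """Generic harmonic term K (x - x_eq)^2, used here for bonds and angles."""
    return force_constant * (value - equilibrium) ** 2

def lennard_jones(r, a, b):
    """12-6 Lennard-Jones pair energy A / r^12 - B / r^6."""
    return a / r**12 - b / r**6

def coulomb(r, qi, qj, epsilon=1.0):
    """Electrostatic pair energy q_i q_j / (epsilon r), in reduced units."""
    return qi * qj / (epsilon * r)

def total_energy(bonds, angles, pairs):
    """U = U_bonded + U_nonbonded for lists of already-measured geometry.

    bonds:  (length, equilibrium length, K_r)
    angles: (angle, equilibrium angle, K_theta), angles in radians
    pairs:  (distance, A, B, q_i, q_j)
    """
    u_bond = sum(harmonic(r, r_eq, k) for r, r_eq, k in bonds)
    u_angle = sum(harmonic(t, t_eq, k) for t, t_eq, k in angles)
    u_dihedral = 0.0   # placeholder: form differs between force fields
    u_improper = 0.0   # placeholder: form differs between force fields
    u_bonded = u_bond + u_angle + u_dihedral + u_improper

    u_lj = sum(lennard_jones(r, a, b) for r, a, b, _, _ in pairs)
    u_elec = sum(coulomb(r, qi, qj) for r, _, _, qi, qj in pairs)
    u_nonbonded = u_lj + u_elec
    return u_bonded + u_nonbonded

# Hypothetical geometry and parameters, for illustration only.
energy = total_energy(
    bonds=[(1.1, 1.0, 100.0)],
    angles=[(math.radians(110.0), math.radians(109.5), 50.0)],
    pairs=[(2.0, 1.0, 1.0, 0.5, -0.5)],
)
```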

2.2.1 All-atom Force Fields

With tremendous progress in computational power of both hardware and software in

the last decade, all-atom force fields have become dominant over united-atom (a.k.a.

extended-atom) force fields, in which aliphatic hydrogen atoms are incorporated into

the heavy atoms to which they are bonded. To be consistent with previous studies

from this laboratory [98, 99, 100], the work presented in this thesis started with

OPLS-AA/L [61]. However, studies published in the last several years compared

a variety of modern force fields, and their results suggest that OPLS-AA/L is relatively

inferior in reproducing NMR measurements for biomolecular systems [8, 71]. Besides,

results from Sarah Rauscher in our group suggest that OPLS-AA/L overcollapses

the N-terminal SH3 domain of the protein drk and hence underestimates its radius of

gyration ($R_g$) [97]. Therefore, a force field comparison study for the selection of an

optimal force field for this project is underway, and the preliminary results are shown

in Appendix A. As part of the background for this comparison, a brief review of the

differences in the functional forms among the three most commonly used all-atom force

field families, OPLS [59], AMBER [126] and CHARMM [19], as well as of the developments

since their invention, is given here.

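| Force Field | $U_{\text{bond}}$ | $U_{\text{angle}}$ | Ref. |
|---|---|---|---|
| OPLS | $\sum_{\text{bonds}} K_r (r - r_{eq})^2$ | $\sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2$ | [58, 61] |
| AMBER | $\sum_{\text{bonds}} K_r (r - r_{eq})^2$ | $\sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2$ | [25] |
| CHARMM | $\sum_{\text{bonds}} K_r (r - r_{eq})^2$ | $\sum_{\text{angles}} K_\theta (\theta - \theta_{eq})^2 + \sum_{\text{UB}} K_{1,3} (r_{1,3} - r_{1,3}^{0})^2$ | [74] |

Table 2.1: Functional forms of bond and angle potentials in the OPLS, AMBER and

CHARMM force fields.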

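| Force Field | $U_{\text{dihedral}}$ | $U_{\text{improper}}$ | Ref. |
|---|---|---|---|
| OPLS | $\sum_{\text{dih.}} \sum_{n=1}^{3} \frac{V_n}{2} \left[ 1 + (-1)^{n-1} \cos(n\phi + \gamma_n) \right]$ | - | [58, 61] |
| AMBER | $\sum_{\text{dih.}} \frac{V_n}{2} \left[ 1 + \cos(n\phi - \gamma) \right]$ | - | [25] |
| CHARMM | $\sum_{\text{dih.}} K_\chi \left[ 1 + \cos(n\chi - \delta) \right]$ | $\sum_{\text{imp.}} K_{\text{imp.}} (\phi - \phi_0)^2$ | [74] |

Table 2.2: Functional forms of the potentials of proper and improper dihedral angles

in the OPLS, AMBER and CHARMM force fields. $U_{\text{improper}}$ for OPLS and AMBER is

filled with “-” because it is modeled using the same functional form as the proper periodic

dihedral angles.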

| Force Field | $U_{\text{LJ}}$ | $U_{\text{electrostatic}}$ | Ref. |
|---|---|---|---|
| OPLS | $\sum_{i<j}^{\text{atoms}} \left( \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} \right) f_{ij}$ | $\sum_{i<j}^{\text{atoms}} \frac{q_i q_j}{\varepsilon R_{ij}} f_{ij}$ | [58, 61] |
| AMBER | $\sum_{i<j}^{\text{atoms}} \left( \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} \right) f_{ij}$ | $\sum_{i<j}^{\text{atoms}} \frac{q_i q_j}{\varepsilon R_{ij}} f_{ij}$ | [25] |
| CHARMM | $\sum_{i<j}^{\text{atoms}} \left( \frac{A_{ij}}{R_{ij}^{12}} - \frac{B_{ij}}{R_{ij}^{6}} \right)$ | $\sum_{i<j}^{\text{atoms}} \frac{q_i q_j}{\varepsilon R_{ij}}$ | [74] |

Table 2.3: Functional forms of the Lennard-Jones (LJ) and electrostatic potentials in the

OPLS, AMBER and CHARMM force fields. In OPLS, $f_{ij} = 0.5$ for intramolecular 1,4-interactions

and $f_{ij} = 1$ otherwise. In AMBER, $f_{ij} = 0.5$ for intramolecular 1,4-LJ interactions,

$f_{ij} = 0.833$ for intramolecular 1,4-electrostatic interactions, and $f_{ij} = 1$ otherwise. In

CHARMM, there is no $f_{ij}$ term.

Tables 2.1–2.3 show that the potential energy functions used for the three force fields

are very similar. The major differences lie in the calculation of $U_{\text{angle}}$, $U_{\text{dihedral}}$ and

$U_{\text{improper}}$. When calculating $U_{\text{angle}}$, an additional Urey-Bradley (UB) component is

included in CHARMM to model a virtual harmonic bond between the 1st and 3rd atoms

involved in an angle, $\theta_{\angle 123}$. This term was originally described by the Urey-Bradley force

field [113]. For historical reasons, it was included when modeling an aqueous dipeptide

solution system [103]. At first, the UB term also included a linear component, but this

component was later dropped as it was found to be unnecessary in absolute energy calculations

[94]. Therefore, it is now simplified to a single quadratic term, which serves to model

the vibrational spectra more accurately [9]. OPLS uses Ryckaert-Bellemans (RB) potentials

for calculating $U_{\text{dihedral}}$, while AMBER and CHARMM model it with periodic trigonometric

functions. OPLS and AMBER model $U_{\text{improper}}$ with the same form as proper dihedral angles, yet with

different parameters, while CHARMM uses a quadratic potential. In addition, the scaling

factors for 1,4 interactions also differ among the three force fields, as described in

the caption of Table 2.3. Although the three force fields have very similar functional forms for calculating

the potential energy of the system, significant differences exist in their

parameterization philosophies [9] (i.e. how the parameters in the potential energy functions are

obtained or derived, and optimized). A discussion of these differences is beyond the scope

of this thesis.
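The functional forms in Tables 2.2 and 2.3 can be compared side by side in code. The sketch below transcribes the dihedral and improper expressions from Table 2.2 and the 1,4 scaling factors $f_{ij}$ from the caption of Table 2.3. The function names and the parameter values in the example calls are hypothetical, and no real force field parameters are used.

```python
import math

def opls_dihedral(phi, v, gamma=(0.0, 0.0, 0.0)):
    """OPLS form: sum over n = 1..3 of V_n/2 [1 + (-1)^(n-1) cos(n phi + gamma_n)]."""
    return sum(
        v[n - 1] / 2.0 * (1.0 + (-1) ** (n - 1) * math.cos(n * phi + gamma[n - 1]))
        for n in (1, 2, 3)
    )

def amber_dihedral(phi, v_n, n, gamma):
    """AMBER form for one term: V_n/2 [1 + cos(n phi - gamma)]."""
    return v_n / 2.0 * (1.0 + math.cos(n * phi - gamma))

def charmm_dihedral(chi, k_chi, n, delta):
    """CHARMM form for one term: K_chi [1 + cos(n chi - delta)]."""
    return k_chi * (1.0 + math.cos(n * chi - delta))

def charmm_improper(phi, k_imp, phi0):
    """CHARMM improper: K_imp (phi - phi0)^2."""
    return k_imp * (phi - phi0) ** 2

# 1,4 scaling factors (LJ, electrostatic) as given in the caption of Table 2.3.
# CHARMM has no f_ij term, which is represented here by a factor of 1.
SCALE_14 = {
    "OPLS": (0.5, 0.5),
    "AMBER": (0.5, 0.833),
    "CHARMM": (1.0, 1.0),
}

def scaled_pair_energy(force_field, u_lj, u_elec, is_14_pair):
    """Apply f_ij to precomputed LJ and electrostatic pair energies."""
    f_lj, f_elec = SCALE_14[force_field] if is_14_pair else (1.0, 1.0)
    return f_lj * u_lj + f_elec * u_elec

# Hypothetical inputs, for illustration only.
example_opls = opls_dihedral(phi=0.0, v=(1.0, 1.0, 1.0))
example_pair = scaled_pair_energy("AMBER", u_lj=-0.2, u_elec=1.0, is_14_pair=True)
```

Keeping the scaling factors in a lookup table makes it visible that, for the nonbonded terms, the three families differ only in how 1,4 pairs are weighted.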

OPLS was first developed as a united-atom force field (a.k.a. OPLS-UA) in 1988 [59].

In 1996, OPLS-AA, the all-atom version of OPLS [58], was developed. A major

improvement of OPLS-AA took place in 2001, when the Fourier torsional

coefficients in the calculation of $U_{\text{dihedral}}$ were reparameterized with more accurate quantum chemistry software,

resulting in OPLS-AA/L [61], which became widely used and stable.

AMBER first appeared in 1981 as a program for building models of molecules and

calculating their interactions [126]. The first so-named AMBER force field was developed

by Weiner et al. as a united-atom force field in 1984 (ff84) [127]. In 1986, it was extended

to become an all-atom force field (ff86) [128]. The first major improvement of AMBER

was published in 1994 by Cornell et al., who used a new charge model (RESP), new

VDW parameters that took consideration of vicinal electronegative atoms, and high-

level quantum mechanical data that were not available in Weiner’s time [25]. This version

of AMBER (ff94) has been dubbed the second-generation force field, after which AMBER

became one of the most widely used force fields for biomolecular simulations. Over

time, many variants of AMBER have been developed, whose names can be confusing

for newcomers. In general, different versions of the AMBER force field are named as

“ff” + “last two digits of the year when it began to be used” + “any particular feature

(optional)”.
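The naming rule can be illustrated with a small parser. This is a sketch under stated assumptions: the regular expression, the function name and the rule for expanding two-digit years (values of 50 and above map to 19xx, others to 20xx) are choices made here for illustration and are not part of AMBER.

```python
import re

_NAME_PATTERN = re.compile(r"^ff(\d{2})(.*)$", re.IGNORECASE)

def parse_amber_name(name):
    """Split an AMBER force field name into (year, feature suffix)."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an AMBER-style name: {name!r}")
    two_digits = int(match.group(1))
    # Assumed pivot: 50-99 -> 1950-1999, 00-49 -> 2000-2049.
    year = 1900 + two_digits if two_digits >= 50 else 2000 + two_digits
    feature = match.group(2).lstrip("-")
    return year, feature

parsed = [parse_amber_name(n) for n in ("ff94", "ff99SB-ILDN", "ff03w")]
```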

With the progress of computational power, deficiencies in ff94 such as over-stabilization

of α-helices became apparent [51]. This issue was first addressed in ff96 [63], and later in

ff99 [124]. Both ff96 and ff99 tried to improve the force field by refitting the backbone

dihedral parameters for φ and ψ, but it has been revealed that the way dihedral parameters

were optimized in ff96 and ff99 results in incorrect conformational preferences

for Gly. Besides, over-stabilization of β-sheets and α-helices has also been observed in

ff96 and ff99, respectively [51]. The same problem of ff94 was also addressed by García

and Sanbonmatsu, who simply set the backbone dihedral potential for φ and ψ to zero,

resulting in ff94GS [39]. The over-stabilization of α-helices in ff99 was addressed

separately by Sorin and Pande, who developed ff99φ by replacing the backbone dihedral

potential for φ in ff99 with that from ff94 [108]; by Duan et al., who developed ff03 with a

fundamentally new approach for deriving atomic partial charges [31]; and by Hornak et al.,

who developed ff99SB with extensive optimizations of backbone dihedral parameters for

both φ and ψ [51].

In more recent years, the AMBER force fields have continued to be improved. Best et al.,

in an attempt to obtain the correct balance of secondary structure propensities, developed

ff99SB* [15] and ff03* [15] using simple backbone energy corrections. Li and

Brüschweiler integrated existing NMR data, and developed ff99SBnmr1 [69]. Nerenberg

and Head-Gordon revised the φ′ backbone dihedral potential, and developed ff99SB-phi [87].

All of the above modifications of AMBER are focused on the backbone parameters. In

contrast, Lindorff-Larsen et al. improved the side-chain torsional potentials based on ff99SB,

yielding ff99SB-ILDN [72]. It is named as such because the parameterization is based on

four types of residues, Ile (I), Leu (L), Asp (D) and Asn (N). In the literature, names like

ff99sb*-ildn, ff99sb-ildn-nmr or ff99sb-ildn-phi [95, 71, 8] can also be found. Such names

indicate combinations of two force fields that modified different aspects of the same base

force field without conflicts. For example, ff99sb*-ildn is a combination of ff99SB* and

ff99SB-ILDN, both of which were developed based on ff99SB. The former only modifies

the backbone dihedral potential terms while the latter modifies those of the side chains. In

2012, a new charge model was proposed to be used together with ff99sb*-ildn to improve

residue-specific α-helix propensities, resulting in ff99SB*-ILDN-Q [14].

Most of the AMBER force fields have been developed using TIP3P [57] as the water

model. However, recognizing the deficiencies of the primitive three-site water model in

reproducing the phase diagram of water, Best et al. combined ff03* [15] and a highly opti-

mized water model called TIP4P/2005 [2], which behaves well in non-standard conditions

such as low temperature and high pressure, and developed ff03w [13].

Because the development of AMBER is convoluted, Figure 2.2 illustrates

the relationships among the different AMBER variants, in other words, how

AMBER has evolved.

CHARMM first appeared as a program for the calculation of macromolecular en-

ergy minimization and dynamics in 1982 [19]. In 1985, the first so-named united-atom

CHARMM force field, CHARMM19, was developed [102]. The naming convention for

CHARMM force field is “CHARMM” + “the version number of CHARMM program

which for the first time includes the then newest version of the CHARMM force field” [74].

For example, CHARMM19 indicates that this version of the CHARMM force field was

first included in version 19 of the CHARMM program. The first all-atom CHARMM

force field for proteins, CHARMM22, was developed in 1998 by MacKerell et al. [74]. In

2004, a new potential energy component, energy correction map (CMAP), was added to

CHARMM22 to improve the accuracy of the backbone dihedral potential, resulting in

CHARMM22/CMAP [75] (a.k.a. CHARMM27 [71]). In 2011, based on CHARMM27,

Piana et al. developed CHARMM22* by removing the CMAP for all residues but Gly and

Pro, and adding modification of the backbone torsional potentials [95]. In 2012, in order

to overcome the over-stabilization of α-helix conformations in CHARMM22/CMAP, its

parameters were optimized again, leading to the development of the most recent version

of the CHARMM force field as of this writing, CHARMM36 [16].


[Figure 2.2: tree diagram with the nodes ff94 [25]; ff94GS [39], ff96 [63], ff99 [124]; ff99φ [108], ff99SB [51]; ff99SB* [15]; ff99SB*-ILDN [8]; ff99SB*-ILDN-Q [14]; ff99SB-ILDN [72]; ff99SBnmr1 [69]; ff99SB-phi [87]; ff03 [31]; ff03* [15]; ff03w + TIP4P/2005 [13].]

Figure 2.2: Family tree of AMBER force fields. Note that not every AMBER force field

is developed based on the most recently released one. Instead, the history of the AMBER family is

more like a tree, as shown above.


Overall, there is no doubt that all-atom force fields will continue to evolve, and that new representations of the energy surface, such as ones that include the effects of charge polarization, will be developed [16]. Table 2.4 lists all the aforementioned force fields in chronological order.

2.2.2 Coarse Grained Force Fields

An alternative to all-atom force fields is a coarse-grained (CG) force field, which can be used to probe length and time scales that are currently infeasible for atomistic systems. The aforementioned united-atom force fields are one type of CG force field, in which aliphatic hydrogens are merged into their attached heavy atoms so that the total number of atoms in the system is reduced and the simulations are sped up.

A CG force field that is particularly promising for future work on this project is MARTINI. MARTINI was first developed in 2004 for coarse-grained lipid simulations [76] and was extended in 2007 to biomolecular simulations, though still without proteins [77]; this version is tagged MARTINI 2.0. In 2008, MARTINI 2.1 added parameters for simulations of coarse-grained peptides [84].

The major coarse-graining techniques used in MARTINI are the reduction of the number of degrees of freedom via four-to-one mapping (i.e. four heavy atoms are typically represented by one particle) and the use of short-range potentials, meaning that the nonbonded potential vanishes when the interparticle distance exceeds a specified cutoff (e.g. rcut = 1.2 nm [84]). Compared with its atomistic counterparts, this force field extends the accessible time scale by 2–3 orders of magnitude [84]. It therefore also extends the accessible length scale of the system and can be useful for large-scale simulations, which in the context of this project means large aggregates of ELPs.
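The idea of a short-range potential that vanishes beyond a cutoff can be illustrated with a minimal sketch. It uses a plain 12-6 Lennard-Jones form with simple truncation at the 1.2 nm cutoff quoted above; the actual MARTINI functional form and any smoothing near the cutoff are not reproduced here, and the function name and the `epsilon` and `sigma` values are hypothetical placeholders.

```python
import numpy as np

R_CUT = 1.2  # nm, the cutoff quoted in the text [84]


def truncated_lj(r, epsilon=1.0, sigma=0.47, r_cut=R_CUT):
    """Plain 12-6 Lennard-Jones energy, set to zero beyond r_cut.

    epsilon (kJ/mol) and sigma (nm) are hypothetical placeholder values,
    chosen only to make the example concrete.
    """
    r = np.asarray(r, dtype=float)
    sr6 = (sigma / r) ** 6
    energy = 4.0 * epsilon * (sr6 ** 2 - sr6)
    # Pairs farther apart than the cutoff contribute nothing
    return np.where(r <= r_cut, energy, 0.0)


distances = np.array([0.47, 0.6, 1.19, 1.21, 2.0])  # nm
energies = truncated_lj(distances)
```

Because every pair beyond the cutoff is skipped, the cost of the nonbonded calculation grows with the number of neighbours inside the cutoff sphere rather than with the total number of particle pairs.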


Family   Year  Name                               First Author             Ref.
OPLS     1988  OPLS-UA                            Jorgensen et al.         [59]
OPLS     1996  OPLS-AA                            Jorgensen et al.         [58]
OPLS     2001  OPLS-AA/L                          Kaminski et al.          [61]
AMBER    1984  ff84                               Weiner et al.            [127]
AMBER    1986  ff86                               Weiner et al.            [128]
AMBER    1995  ff94                               Cornell et al.           [25]
AMBER    1997  ff96                               Kollman et al.           [63]
AMBER    2000  ff99                               Wang et al.              [124]
AMBER    2002  ff94GS                             Garcia et al.            [39]
AMBER    2003  ff03                               Duan et al.              [31]
AMBER    2005  ff99φ                              Sorin et al.             [108]
AMBER    2006  ff99SB                             Hornak et al.            [51]
AMBER    2009  ff99SB*, ff03*                     Best et al.              [15]
AMBER    2010  ff99SB-ILDN                        Lindorff-Larsen et al.   [72]
AMBER    2010  ff99SBnmr1                         Li et al.                [69]
AMBER    2010  ff03w+TIP4P/2005                   Best et al.              [13]
AMBER    2011  ff99SB-phi                         Nerenberg et al.         [87]
AMBER    2012  ff99SB*-ILDN-Q                     Best et al.              [14]
CHARMM   1985  CHARMM19                           Reiher                   [102]
CHARMM   1998  CHARMM22                           MacKerell et al.         [74]
CHARMM   2004  CHARMM22/CMAP (a.k.a. CHARMM27)    MacKerell et al.         [75, 71]
CHARMM   2011  CHARMM22*                          Piana et al.             [95]
CHARMM   2012  CHARMM36                           Best et al.              [16]

Table 2.4: Evolution of OPLS, AMBER and CHARMM in chronological order.


A special version of MARTINI with improved internal peptide dynamics has been developed by our group in collaboration with Mikyung Seo and Peter Tieleman in order to simulate coarse-grained ELPs and ALPs [107]. Unfortunately, because of an unresolved issue in the parameters that causes repeated crashes in simulations of multiple peptides, my attempts to simulate mesoscopic aggregates of ELPs using this force field have been unsuccessful as of the writing of this thesis.

2.3 Sampling Errors

When applying MD simulations to solve scientific problems, errors are unavoidable. In

general, there are two types of errors, statistical error (a.k.a. statistical sampling error)

and systematic error. The second type of error can be further divided into systematic

sampling error and systematic force field error.

Statistical error and systematic sampling error mainly come from insufficient sampling

time, while systematic force field error is a direct result of using an imperfect force field.

In terms of their effect on the properties calculated from MD simulations, statistical error

affects a value’s precision while systematic error affects its accuracy [86].

One way to alleviate both systematic sampling error and statistical error is to explore the conformational (sampling) space of the target system as comprehensively as possible, which can be achieved either with enhanced sampling techniques or with multiple replica (MR) simulations started from different initial conformations. Common enhanced sampling techniques include umbrella sampling (US) [111] and replica exchange (REX) [109], as well as more recent algorithms developed in our laboratory, such as STDR [99] and virtual replica exchange (VREX) [99]. MR simulations, also called brute-force sampling, are the method used exclusively in this thesis.


To reduce the systematic force field error, the key is to select an appropriate force field. If the force field is biased or error-prone, then even if the whole conformational space has been well sampled, which is not always possible in the first place, the results, however precise, will still be biased or even wrong. In fact, with the enormous advances in computational power and the improvements in sampling techniques, limitations of common force fields that were once hidden have become apparent. In the last two years, multiple studies have compared and evaluated a variety of modern force fields [95, 8, 71]. The results are surprising in that a force field once considered superior may turn out to be inferior (e.g. OPLS-AA/L), which has forced us to rethink our choice of force field for the near future. As mentioned above, a comparison study for selecting a better force field is underway, and the results obtained as of this writing are presented in Appendix A.

Chapter 3

Elastin-based Peptides in Water and

Methanol

3.1 Background

As introduced in Section 1.4, model peptides derived from native tropoelastin, i.e. EBPs, can coacervate or form amyloid-like fibrils, and this behavior can be modulated by sequence composition or solvent conditions. Previous work from my group has shown that a high combined Pro and Gly content is required to prevent an EBP from forming amyloid-like fibrils in water [98], but very little is known about how solvent conditions affect the aggregation propensity of EBPs at the molecular level, which is investigated in this chapter.

We have performed atomistic MD simulations in explicit water and methanol to study the effects of these solvents on a set of model EBPs: (GVPGV)7, (PGV)12, (GGVGV)7, (GVGVA)7 and (GV)18. The first two are referred to as ELPs since they are representative of a single HP domain in native tropoelastin and tend to coacervate. The last three are referred to as ALPs


since they tend to form amyloid-like fibrils [98]. However, (GGVGV)7 has been found to form amyloid-like fibrils only in water; in methanol it forms an amorphous film, which eventually develops into beaded-string structures [36]. The major difference in sequence composition between ELPs and ALPs is the presence of Pro in the former. In addition, we have included G35 in our study. G35 is considered a good control for studying solvent effects on the peptide backbone because it has no sidechains. The set of model peptides is summarized in Table 3.1. All the peptides are capped with an acetyl group at the N-terminus and an amide group at the C-terminus, and are simulated as monomers. The major results are: (1) all peptides become more extended in methanol than in water; (2) the peptides remain disordered in both solvents, which is consistent with previous results in water from our group [98]; (3) the solvophobic effect (a.k.a. the hydrophobic effect in water) is weaker in methanol than in water; (4) in methanol, ALPs form extensive β-sheets, but ELPs do not.

That methanol promotes extensive formation of β-sheets in ALPs, especially in (GGVGV)7, might at first sight appear to contradict the experimental observation that methanol inhibits the formation of β-sheet-rich amyloid-like fibrils [36]. To resolve this paradox, we hypothesize that the promotion of β-sheet formation in a monomer and the inhibition of amyloid-like fibril formation by methanol have the same cause: the reduction of the solvophobic effect. For a monomer, a reduction of the solvophobic effect leads to better solvation of the peptide by the surrounding solvent molecules. In particular, for (GGVGV)7 in methanol, the relatively nonpolar methanol molecules preferentially solvate the peptide’s nonpolar sidechains over its polar backbone, resulting in the formation of β-sheets, in which the sidechains are highly exposed to the solvent. For fibrils, however, the same reduction leads to weaker interactions among neighbouring β-sheet layers, as in the stacking β-sheet model [93], and hence inhibits the formation of amyloid-like fibrils.


Class                          Peptides
Elastin-like peptides (ELPs)   (GVPGV)7, (PGV)12
Amyloid-like peptides (ALPs)   (GGVGV)7, (GVGVA)7 and (GV)18
Backbone control               G35

Table 3.1: Model peptides.

3.2 Results

3.2.1 Intrinsically Disordered Peptides

Figure 3.1 shows four representative snapshots from the simulations of one ELP, (GVPGV)7, and one ALP, (GGVGV)7, in water and in methanol. The ELP is highly collapsed in water because of the hydrophobic effect, whereas it is much more extended in methanol. The ALP is also highly collapsed in water, but with more β-sheet than the ELP, and it forms extensive β-sheets in methanol.

In our simulations, although methanol promotes the formation of β-sheets in ALPs, none of the model peptides forms any stable tertiary structure as a monomer in water or methanol, where tertiary structure is defined as the packing of secondary structural elements. This confirms the high propensity for intrinsic disorder of these peptides shown previously [98]. In addition to the presence of Pro, a secondary structure breaker, in ELPs, the disorder of the monomeric peptides is probably due to their extremely simple and hydrophobic sequences. To sample a comprehensive conformational ensemble for IDPs, as many conformational states as possible should be sampled per system. In this study this is approached with multiple replica simulations, in which each replica starts from a unique initial conformation. To obtain a good description of each system, the properties of interest are calculated as statistical averages over all replicas.
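Averaging a property over replicas, together with the standard error of the mean (SEM) used for the error bars in this chapter, can be sketched as follows. This is a minimal illustration that assumes one time-averaged value per replica and treats the replicas as independent samples; the function name and the numbers are hypothetical and are not data from this thesis.

```python
import numpy as np


def replica_mean_and_sem(values):
    """Return the mean over replicas and the standard error of the mean.

    `values` holds one time-averaged number per replica (an assumption of
    this sketch about how the averaging is organized).
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean()
    # Sample standard deviation (ddof=1) divided by sqrt(number of replicas)
    sem = values.std(ddof=1) / np.sqrt(values.size)
    return mean, sem


# Hypothetical per-replica Rg averages in nm (not data from the thesis)
rg_per_replica = [0.82, 0.85, 0.84, 0.86, 0.83]
mean_rg, sem_rg = replica_mean_and_sem(rg_per_replica)
```

With independent replicas the SEM shrinks as the square root of the number of replicas, which is the statistical argument for running many of them.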


[Figure 3.1: four snapshot panels; left column (GVPGV)7, right column (GGVGV)7.]

Figure 3.1: Representative snapshots from the simulations of an ELP, (GVPGV)7 (left column), and an ALP, (GGVGV)7 (right column), in water (red) and methanol (blue). The blue ends indicate the N-terminus of the peptides.


3.2.2 Radius of Gyration

The Rg was calculated to quantify the overall size of the peptides in all the systems. As shown in Figure 3.2, every sequence has a much broader distribution of Rg in methanol than in water, and the average Rg, indicated by the vertical bars, is larger in methanol than in water. An increased Rg indicates that the conformation becomes more extended, which suggests that methanol is a better solvent than water for these hydrophobic model peptides, since they are more likely to interact with the solvent in methanol than in water. More extended conformations suggest that there are fewer intramolecular peptide-peptide interactions and more peptide-solvent interactions. Therefore, we quantified the different types of intra- and intermolecular interactions.
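The Rg definition given in the caption of Figure 3.2 can be written as a short function. This is a minimal sketch that assumes the Cα coordinates of one frame are already available as an array in nm; because all Cα atoms have the same mass, their center of mass is taken as the unweighted mean position. The function name and the toy coordinates are illustrative only.

```python
import numpy as np


def radius_of_gyration(ca_positions):
    """Rg from C-alpha positions of shape (N, 3), in nm.

    Implements Rg^2 = (1/N) * sum_i |R_i - R_cm|^2, with R_cm the
    unweighted mean of the C-alpha positions.
    """
    pos = np.asarray(ca_positions, dtype=float)
    r_cm = pos.mean(axis=0)
    sq_dist = np.sum((pos - r_cm) ** 2, axis=1)
    return np.sqrt(sq_dist.mean())


# Hypothetical toy coordinates: four atoms on a square of side 1 nm
square = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
rg = radius_of_gyration(square)
```

Applying the function to every stored frame of every replica gives the samples from which distributions such as those in Figure 3.2 are built.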

3.2.3 Intramolecular Peptide-peptide Interactions

Two types of interactions within a peptide were calculated and are shown in Figure 3.3. The x axis shows the number of peptide-peptide hydrogen bonds (H-bonds) normalized by the number of H-bonding groups, and the y axis shows the number of peptide-peptide nonpolar interactions normalized by the number of nonpolar groups. Going from water to methanol, the number of nonpolar interactions decreases significantly for all the model peptides, which is consistent with the increased Rg and hence more extended conformations. The number of H-bonds, on the other hand, increases significantly for ALPs and barely changes for ELPs, neither of which is consistent with an increased Rg. The trend for G35 resembles that for ALPs. Compared with ALPs, the relatively higher propensity for nonpolar interactions in G35, despite its lack of nonpolar sidechains, reflects the packing of methylene Cα groups in the highly collapsed polypeptide chain (see Figure 3.2), as CαH2 is the only nonpolar group in polyglycine.
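The H-bond normalization uses the number of H-bonding groups, given in the caption of Figure 3.3 as 2 × N − P. The sketch below evaluates this count for the six model peptides. It assumes the formula counts one backbone C=O and one backbone N-H per residue, with Pro lacking the N-H, and it ignores the terminal caps; both points are assumptions of this example, and the function and dictionary names are illustrative.

```python
def h_bonding_groups(repeat_unit, n_repeats):
    """Number of H-bonding groups, 2 * N - P.

    N is the number of residues and P the number of Pro residues.
    Terminal capping groups are ignored (an assumption of this sketch).
    """
    sequence = repeat_unit * n_repeats
    n_residues = len(sequence)
    n_pro = sequence.count("P")
    return 2 * n_residues - n_pro


peptides = {
    "(GVPGV)7": ("GVPGV", 7),
    "(PGV)12": ("PGV", 12),
    "(GGVGV)7": ("GGVGV", 7),
    "(GVGVA)7": ("GVGVA", 7),
    "(GV)18": ("GV", 18),
    "G35": ("G", 35),
}
groups = {name: h_bonding_groups(unit, n) for name, (unit, n) in peptides.items()}
```

Dividing the observed number of peptide-peptide H-bonds by this count puts peptides of different length and Pro content on a common scale.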


[Figure 3.2: six panels, one per peptide, (GVPGV)7, (PGV)12, (GGVGV)7, (GVGVA)7, (GV)18 and (G)35, each showing the probability P (0.0 to 0.4) as a function of Rg (0.6 to 1.4 nm) in water and in methanol.]

Figure 3.2: Distribution of Rg of the model peptides in water and methanol. The vertical bar indicates the average value. Error bars in this and all the following figures are calculated as SEM. Rg is calculated from the equation $R_g^2 = \frac{1}{N}\sum_{i=1}^{N} \lVert \vec{R}_i - \vec{R}_{cm} \rVert^2$, where $\vec{R}_i$ is the position of the Cα atom of the ith residue, $\vec{R}_{cm}$ is the center of mass of all the Cα atoms and N is the number of residues in the peptide.


The fact that nonpolar interactions are reduced in all the sequences suggests that the

solvophobic effect becomes weaker in methanol than in water. This is due to the presence

of methyl groups in methanol molecules, which makes them much more nonpolar than

water molecules. The fact that the H-bonds of ALPs are more abundant in methanol

suggests that more extended conformations lead to the formation of more H-bonds, which

is indicative of the formation of secondary structure.

3.2.4 Interactions between Peptide and Solvent

Interactions between peptide and solvent are categorized as the solvation of polar groups

and nonpolar groups. Each type of solvation includes the interactions of the polar or

nonpolar atoms of the peptide with both the polar groups of the solvent (i.e. OH groups

of both water and methanol) and, in the case of methanol, the nonpolar (methyl) group

of the solvent. The results are then normalized by the corresponding values calculated

from another set of control simulations, in which the peptides are restrained to their most

extended state so as to maximize their interactions with solvent. The normalized values

are always between 0 and 1 and are used to quantify the extent of solvation.

Figure 3.4 shows that from water to methanol, ELPs move roughly along the diagonal, which means that both the polar and nonpolar groups become better solvated in methanol than in water, consistent with an increased Rg. ALPs, however, move towards the upper left corner of the plot: although their nonpolar groups also become better solvated, as in ELPs, their polar groups become more desolvated, suggesting that methanol preferentially solvates the nonpolar groups of the solute over its polar groups. The trend for G35 is again similar to that seen for ALPs, but with a lower level of solvation of nonpolar groups, which is consistent with a relatively high propensity for intramolecular nonpolar interactions as shown in Figure 3.3.

[Figure 3.3: scatter plot of nonpolar interactions (y axis, 0.6 to 1.6) against peptide-peptide H-bonds (x axis, 0.06 to 0.16).]

Figure 3.3: Propensity for intramolecular peptide-peptide interactions in water (red) and in methanol (blue). The number of nonpolar interactions is normalized by the number of primary and secondary nonpolar C atoms in the peptides. The number of H-bonds is normalized by the number of H-bonding groups in the peptides, which is calculated as 2 × N − P, where N and P are respectively the total number of residues and the number of proline residues in each peptide. ELP: [▲: (GVPGV)7, ▼: (PGV)12], ALP: [•: (GGVGV)7, ◀: (GVGVA)7, ▶: (GV)18], ★: G35.

The results for ALPs are inconsistent with an increased Rg, but consistent with the increased number of H-bonds shown above. Overall, this again indicates the formation of secondary structure. Therefore, we analyzed the secondary structure content of all the systems.

3.2.5 β-sheet Content

There is no α-helix formation in any of the systems, whereas the β-sheet content changes significantly from water to methanol. As shown in Figure 3.5, the β-sheet content roughly doubles from water to methanol for all model peptides, but in absolute terms, ELPs and G35 form far less β-sheet than ALPs, especially in methanol. That ELPs cannot form extensive β-sheets is consistent with a previous study, which ascribed this effect to the presence of Pro, a secondary structure breaker [98], while G35 consists entirely of Gly, which makes it too flexible to be stabilized in an extended secondary structure. The effect of Pro is made more evident by the fact that the β-sheet content of (PGV)12 is even lower than that of (GVPGV)7 because of its higher fraction of Pro. The results depicted in Figure 3.5 show that methanol promotes the formation of β-sheet in ALPs, as seen in Figure 3.1(d).

3.3 Discussion

We have shown the results for all of the model peptides in water and in methanol. The results in water are consistent with those reported previously by our group. In particular, ALPs form more β-sheet in water than ELPs [98], the Rg of (GVPGV)7 is about 0.84 nm [99], and the peptides are all intrinsically disordered [98, 99].


[Figure 3.4: scatter plot of solvation of nonpolar groups (y axis, 0.60 to 0.85) against solvation of polar groups (x axis, 0.60 to 0.85).]

Figure 3.4: Propensity for intermolecular peptide-solvent interactions in water (red) and in methanol (blue). The solvation of polar groups in water is quantified as the number of H-bonds between the peptide and water, and in methanol, as the sum of the number of H-bonds and that of pairwise interactions between the polar heavy atoms of the peptide (i.e. backbone O and N atoms) and the nonpolar heavy atoms of methanol (i.e. methyl C atoms). The solvation of nonpolar groups in water is quantified as the sum of the number of pairwise interactions between the nonpolar heavy atoms (i.e. primary and secondary nonpolar C atoms) of the peptide and the O atom of water, and in methanol, as the sum of the number of pairwise interactions between the nonpolar heavy atoms of the peptide and the O atoms of methanol and that of the nonpolar pairwise interactions between the heavy atoms of the peptide and those of methanol. ELP: [▲: (GVPGV)7, ▼: (PGV)12], ALP: [•: (GGVGV)7, ◀: (GVGVA)7, ▶: (GV)18], ★: G35.


[Figure 3.5: bar chart of β-sheet content (y axis, 0.00 to 0.25) for (GVPGV)7, (PGV)12, (GGVGV)7, (GVGVA)7, (GV)18 and (G)35 in water and in methanol. Bar labels in extraction order: 0.04, 0.02, 0.10, 0.10, 0.09, 0.06, 0.08, 0.04, 0.22, 0.21, 0.21, 0.09.]

Figure 3.5: Propensity to form β-sheet structure in all the model peptides in water and in methanol.


Comparing the results in the two solvents, we found that all the model peptides become more extended in methanol than in water. More extended structures would be expected to correspond to better solvation, which is true for ELPs but not for the polar groups of ALPs, owing to their significant propensity to form β-sheets. In a β-sheet, the nonpolar groups of the peptide monomer are highly exposed to the solvent and hence well solvated. Concurrently, the polar groups of the backbone become relatively buried as they form H-bonds, and hence are desolvated. Therefore, in ALPs, forming β-sheets makes it possible to solvate nonpolar groups, desolvate polar groups, and increase the overall size of the peptide at the same time. Similar results showing that alcohol can promote the formation of extended secondary structure (i.e. α-helix and β-sheet) have been reported previously, but for very different sequences such as the globular proteins ferredoxin [106] and BBA5 [55]. A common feature of these two sequences and our ALPs is that they contain very few or no Pro residues. In ELPs, the presence of Pro inhibits the formation of β-sheets, so while their nonpolar groups also become much better solvated, their polar groups are forced to become more solvated as well as the peptides increase in size. In other words, methanol swells ELPs.

Our results show that methanol promotes β-sheet formation in ALPs, including (GGVGV)7, which has the same repeat unit as (VGGVG)n. However, the latter has been found to form amyloid-like fibrils, which contain a large amount of β-sheet, only in water, and to form an amorphous film in methanol instead [36, 34, 35]. This leads to an apparent paradox: why does methanol promote the formation of β-sheet in ALPs while preventing them from forming β-sheet-rich amyloid-like fibrils?

Based on this study, we propose a hypothesis to resolve this apparent paradox. The increased β-sheet content of (GGVGV)7 as a monomer in methanol is mainly a result of the reduced solvophobic effect, as reflected by the decrease in intramolecular peptide-peptide nonpolar interactions and the exposure of its nonpolar groups to the solvent. It is therefore reasonable to assume that the solvophobic effect between different monomers is also reduced when multiple peptides are present. According to the stacking β-sheet model for Aβ-amyloid fibrils [93], such a reduction in the solvophobic effect would necessarily weaken the nonpolar interactions between neighbouring layers of cross-β-sheets, resulting in ineffective stacking that prevents the formation of amyloid-like fibrils. Instead, our results suggest that in methanol many small β-sheets are present, but they cannot assemble into highly ordered amyloid fibrils because of the weak solvophobic effect. Consistent with previous studies [106, 64, 55], while methanol promotes secondary structure, it does not favor the tertiary structure required for protein folding or fibril formation.

3.4 Conclusion

We have characterized different types of structural properties of our model peptides suc-

cessively in water and in methanol. Interestingly, methanol promotes β-sheet formation

of ALPs, but prevents them from forming amyloid-like fibrils, whose core cross-β-sheet

structure consists of stacked β-sheets. We hypothesize that this effect is due to the weak-

ening of nonpolar interactions among peptides because of a reduced solvophobic effect of

nonpolar groups in methanol. Concurrently, the preferential solvation of nonpolar side

chains over that of polar polypeptide backbone promotes β-sheet formation. As a result,

even though there is a higher amount of β-sheets, they cannot stack effectively to form

fibrils. In contrast, ELPs cannot form β-sheets because of their high Pro content. Instead,

they swell because both their polar and nonpolar groups are better solvated.


3.5 Material & Methods

Simulation Setup We performed MD simulations of 40 replicas of each of the 6 model peptides as a monomer in water and in methanol. Each replica was simulated at 1 bar and 300 K for 200 ns, and the simulations of (GVPGV)7 in water and in methanol were extended to 500 ns. Each system was solvated in a triclinic box with angles of 60°, 60° and 90°. For peptides in water, the box size was 5.4 × 5.4 × 3.8 nm3 with 3700 water molecules per system. For (GVPGV)7 and (GVGVA)7 in methanol, the box size was 7.3 × 7.3 × 5.2 nm3 with 4000 methanol molecules per system. For (PGV)12, (GGVGV)7 and (GV)18 in methanol, the box size was 6.9 × 6.9 × 4.9 nm3 with 3300 methanol molecules per system. For G35 in methanol, the box size was 6.4 × 6.4 × 4.5 nm3 with 2700 methanol molecules per system. The initial structures of the peptides for the different replicas were selected from simulations at 700 K in vacuo to ensure that all replicas started from very different initial states. For simplicity, only peptide structures without cis-Val-Pro were selected.

All simulations were performed at constant pressure and temperature with periodic boundary conditions using Gromacs 4.0.5 [12, 48] and the OPLS-AA/L force field [58, 61]. Explicit TIP4P water [57] and methanol [58] models were used. The linear constraint solver (LINCS) algorithm was used to constrain all bond lengths [47, 46]. A cutoff of 1.4 nm was used for Lennard-Jones interactions. The Particle-Mesh Ewald (PME) algorithm [26, 33] was used to calculate long-range electrostatic interactions with a Fourier spacing of 0.12 nm and an interpolation order of 4. The Nosé-Hoover thermostat [88, 50] was used for temperature coupling, with the peptide and solvent coupled to two separate temperature baths and a time constant of 0.1 ps. The Parrinello-Rahman algorithm [91] was used for pressure coupling with a time constant of 2 ps. The integration time step was 2 fs, and the system coordinates were stored every 10 ps.
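The simulation settings imply some simple bookkeeping, sketched below: the number of integration steps and stored coordinate intervals per replica, and the production time per system once the first 150 ns of each trajectory is discarded as described under Structural Analysis. The constant and function names are illustrative and do not come from the simulation input files.

```python
N_REPLICAS = 40          # replicas per system
TIME_STEP_FS = 2         # integration time step
SAVE_INTERVAL_PS = 10    # interval between stored coordinates
EQUILIBRATION_NS = 150   # discarded at the start of each trajectory


def bookkeeping(total_ns):
    """Steps and stored intervals per replica, production time per system."""
    steps = total_ns * 1_000_000 // TIME_STEP_FS              # 1 ns = 1e6 fs
    saved_intervals = total_ns * 1_000 // SAVE_INTERVAL_PS    # excludes the t = 0 frame
    production_us = N_REPLICAS * (total_ns - EQUILIBRATION_NS) / 1_000
    return steps, saved_intervals, production_us


standard = bookkeeping(200)   # most systems
extended = bookkeeping(500)   # (GVPGV)7 in water and in methanol
```

The production times obtained this way, 2 µs for a 200 ns system and 14 µs for a 500 ns system, agree with the totals quoted in the Structural Analysis paragraph.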


Structural Analysis The first 150 ns of each trajectory was discarded as equilibration based on the convergence of Rg over time shown in Figure 3.8, resulting in a total of 14 µs of production time for each of the (GVPGV)7 systems in water and in methanol, and a total of 2 µs of production time for each of the other systems. When characterizing interactions between the peptide and solvent, a nonpolar interaction was counted whenever two nonpolar heavy atoms were within 0.55 nm of each other, which is the radius of the first solvation shell of the peptide’s nonpolar C atoms according to the radial distribution function (RDF) analysis shown in Figure 3.6. Only primary and secondary C atoms were considered because tertiary C atoms were found to have a very different solvation shell, as shown in Figure 3.6. A cutoff of 0.45 nm was selected in the same way for the interactions between polar and nonpolar heavy atoms, as shown in Figure 3.7. The RDF describes how the density of the particles of interest (e.g. nonpolar atoms of the peptide) varies with distance from a reference particle (e.g. a nonpolar atom of the solvent), and was calculated using g_rdf from the Gromacs tools with a bin size of 0.02 nm. H-bonds were calculated using g_hbond, also from the Gromacs tools, with a donor-acceptor distance cutoff of 0.35 nm and an acceptor-donor-hydrogen angle cutoff of 30° [73]. The same criteria were applied to the characterization of intramolecular peptide-peptide interactions, except that when counting nonpolar interactions, the two nonpolar heavy atoms had to be at least two residues apart in the primary sequence. The β-sheet content was calculated using the DSSP program [60]. The peptide snapshots were generated with VMD [53], and the plots were created with Matplotlib [54].
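The counting rule for intramolecular nonpolar interactions (distance within 0.55 nm, atoms at least two residues apart) can be sketched as follows. This is not the analysis script used for the thesis: it assumes the coordinates of the selected nonpolar C atoms and their residue numbers are already available as arrays, it ignores periodic boundary conditions, and the function name and toy data are hypothetical.

```python
import numpy as np

CUTOFF_NM = 0.55       # nonpolar-nonpolar distance cutoff quoted in the text
MIN_RESIDUE_GAP = 2    # minimum sequence separation quoted in the text


def count_intramolecular_nonpolar_contacts(positions, residue_ids):
    """Count atom pairs within the cutoff that are far enough apart in sequence."""
    pos = np.asarray(positions, dtype=float)
    res = np.asarray(residue_ids)
    # All pairwise distances between the selected atoms (no periodic images)
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    gap = np.abs(res[:, None] - res[None, :])
    # Upper triangle only, so that each pair is counted once
    i, j = np.triu_indices(len(pos), k=1)
    mask = (dist[i, j] <= CUTOFF_NM) & (gap[i, j] >= MIN_RESIDUE_GAP)
    return int(mask.sum())


# Hypothetical toy data: four atoms on a line, in residues 1, 2, 3 and 6
toy_positions = [[0.0, 0, 0], [0.3, 0, 0], [0.5, 0, 0], [2.0, 0, 0]]
toy_residues = [1, 2, 3, 6]
n_contacts = count_intramolecular_nonpolar_contacts(toy_positions, toy_residues)
```

In the toy data only the pair in residues 1 and 3 qualifies: the other close pairs are in adjacent residues, and the atom in residue 6 is too far from everything else.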


[Figure 3.6: two panels, (GVPGV)7 in methanol and (GGVGV)7 in methanol, each showing density (0.0 to 1.0) against radius (0.2 to 1.0 nm) for primary, secondary and tertiary C atoms.]

Figure 3.6: RDFs between the primary, secondary, and tertiary nonpolar C atoms of the peptide and the nonpolar heavy atoms of the solvent for one ELP, (GVPGV)7, and one ALP, (GGVGV)7. The first solvation shell (i.e. the radius corresponding to the first trough in density) of tertiary C atoms extends much further than that of primary and secondary C atoms.
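Reading a first-solvation-shell radius off an RDF, taken here as the first trough after the first peak, can be sketched as below. This is a minimal illustration on a smooth synthetic curve; a noisy RDF from a real trajectory would probably need smoothing first, and the function name and the data are hypothetical rather than taken from the thesis.

```python
import numpy as np


def first_shell_radius(r, g):
    """Radius of the first trough of g(r) that follows its first peak."""
    r = np.asarray(r, dtype=float)
    g = np.asarray(g, dtype=float)
    # Interior points higher than both neighbours are peaks, lower are troughs
    is_peak = (g[1:-1] > g[:-2]) & (g[1:-1] > g[2:])
    is_trough = (g[1:-1] < g[:-2]) & (g[1:-1] < g[2:])
    peaks = np.flatnonzero(is_peak) + 1
    troughs = np.flatnonzero(is_trough) + 1
    if peaks.size == 0:
        raise ValueError("no peak found in g(r)")
    after_peak = troughs[troughs > peaks[0]]
    if after_peak.size == 0:
        raise ValueError("no trough found after the first peak")
    return r[after_peak[0]]


# Synthetic, noise-free curve (not thesis data)
r_nm = [0.30, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65]
g_r = [0.0, 0.3, 0.9, 0.7, 0.5, 0.4, 0.6, 0.7]
shell_radius = first_shell_radius(r_nm, g_r)
```

The resolution of the result is limited by the bin size of the RDF, which is 0.02 nm in the analysis described above.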


[Figure 3.7: four panels, (GVPGV)7 in water, (GVPGV)7 in methanol, (GGVGV)7 in water and (GGVGV)7 in methanol, each showing density (0.0 to 1.0) against radius (0.2 to 1.0 nm) for primary, secondary and tertiary C atoms.]

Figure 3.7: RDFs between the primary, secondary, and tertiary C atoms of the peptide and the polar O atom of the solvent.


[Figure 3.8: six panels, one per peptide, (GVPGV)7, (PGV)12, (GGVGV)7, (GVGVA)7, (GV)18 and (G)35, each showing Rg (0.7 to 1.3 nm) against time (ns) in water and in methanol.]

Figure 3.8: Time evolution of the peptide Rg averaged over all 40 replicas for each system. The vertical line marks the simulation time of 150 ns, after which the systems are considered to be equilibrated and the data is used for analysis.

Chapter 4

Solvent Quality Studies

4.1 Background

We have shown in Chapter 3, from the perspective of aggregation, that methanol has a profound effect on the conformations of the model peptides. This chapter examines the quality of various solvents for these model peptides.

As introduced in Section 1.5, it is known from polymer physics that in a poor solvent a polymer molecule tends to collapse, while in a good solvent it tends to swell; in a θ-solvent, which lies between the two, the molecule behaves as a random coil [104, 90]. The larger Rg shown in Section 3.2.2, which corresponds to greater swelling of the model peptides in methanol than in water, suggests that methanol is a better solvent than water for such hydrophobic peptides. However, this does not necessarily mean that methanol is a good solvent, because whether a solvent is good or poor is an absolute concept and we do not know what constitutes a θ-solvent for these peptides. Whether methanol is a good solvent, and what kind of solvent can be a θ-solvent for the model peptides, are the questions discussed in this chapter.
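For orientation, the three solvent regimes are commonly summarized in polymer physics by how Rg scales with the chain length N. The relation below is the standard Flory scaling picture, added here as background rather than taken from this chapter, and the exponents are the usual idealized values for long chains.

```latex
R_g \sim N^{\nu}, \qquad
\nu \approx
\begin{cases}
1/3 & \text{poor solvent (collapsed globule)} \\
1/2 & \text{$\theta$-solvent (ideal chain)} \\
3/5 & \text{good solvent (swollen coil)}
\end{cases}
```

In this picture a solvent is classified by the exponent it produces, not by comparison with another solvent, which is why a larger Rg in methanol than in water does not by itself show that methanol is a good solvent.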


The model peptides used in this chapter are the same as in Chapter 3: two ELPs, (GVPGV)7 and (PGV)12; three ALPs, (GGVGV)7, (GVGVA)7 and (GV)18; and G35 as a backbone control. The solvent set, however, is expanded: in addition to water (H2O) and methanol (CH3OH), it includes the alcohols ethanol (C2H5OH), 1-propanol (C3H7OH), 1-butanol (C4H9OH), 1-pentanol (C5H11OH), 1-hexanol (C6H13OH), 1-heptanol (C7H15OH) and 1-octanol (C8H17OH), as well as octane (C8H18). This solvent series represents a trend of decreasing polarity, with water and octane representing the polar and nonpolar extremes. For convenience, the prefix “1-” is omitted when referring to the alcohols in the rest of this thesis. Note that polar and nonpolar are used to describe the solvents, while hydrophilic and hydrophobic are used to describe the peptides. The difference is that a hydrophobic molecule is not necessarily nonpolar, while a nonpolar molecule is usually also hydrophobic; the same logic applies to hydrophilic and polar. For example, a hydrophobic peptide still has a polar backbone and hence is not nonpolar, whereas a nonpolar octane molecule can also be described as hydrophobic. Moreover, it would be awkward in certain circumstances to describe a solvent as hydrophobic or hydrophilic. For example, it does not make much sense to describe water as more hydrophilic or less hydrophobic than methanol, while it is natural to say that water is more polar or less nonpolar than methanol.

By comparison with results from simulations at high temperature in vacuo, a condition used to mimic a θ-solvent, we found that none of the above solvents is a θ-solvent or a good solvent for the model peptides. Finding a θ-solvent or a good solvent requires taking into account factors other than polarity, such as the uneven distribution of polar and nonpolar groups in the peptides.


4.2 Results

4.2.1 Radius of Gyration

The Rg of the 6 model peptides in different solvents are shown in Figure 4.1. For each

peptide, Rg is shown as a function of the length of the alkyl chain of the solvent. The

shape of the curve varies from peptide to peptide. For (PGV)12, Rg keeps increasing

until ethanol, then flattens out, and starts decreasing at heptanol. For (GVPGV)7, Rg

reaches its peak at propanol. The dip at butanol is unexpected; according to the distribution of Rg, it results from the peptide being trapped in local minima where Rg is low. Its Rg also starts decreasing at heptanol. The Rg of (GVGVA)7 and (GV)18

are very close from water to pentanol except for that at butanol, and they reach their

peaks at hexanol and heptanol, respectively. For (GGVGV)7, the Rg reaches its peak

at hexanol. For G35, the Rg also reaches its peak at heptanol, but at a much lower

scale than the other peptides. Although the model peptides are more hydrophobic than

average [78], they still have polar backbones, so both water and octane, which represent

the polar and nonpolar ends of the spectrum of solvent polarity, are the poorest solvents

in the series. The initial increase of Rg for all the peptides at the beginning indicates

an improvement in solvent quality. This is expected because as the alkyl chain of the

alcohol becomes longer, the solvent becomes more nonpolar, which results in a weaker

solvophobic effect and hence stronger peptide-solvent interactions. The decrease after

heptanol indicates that if the solvent becomes too nonpolar, peptide-peptide interactions between the polar backbone groups become more favorable and overcompensate for the improved solvation of the hydrophobic sidechains.

If the peptides are ordered by the length of the alkyl chain of the solvent in which each first reaches its peak Rg, as shown in Table 4.1, the following relationship is obtained: the less hydrophobic the peptide is, the more sensitive it is to a decrease in solvent polarity (i.e. the shorter the alkyl chain at which its Rg peaks), with G35 as the only outlier.

Peptide     Hydrophobicity score   Solvent
(PGV)12      0.73                  ethanol (C2H5OH)
(GVPGV)7     1.2                   propanol (C3H7OH)
(GGVGV)7     1.44                  hexanol (C6H13OH)
(GVGVA)7     1.88                  heptanol (C7H15OH)
(GV)18       1.9                   heptanol (C7H15OH)
G35         -0.4                   heptanol (C7H15OH)

Table 4.1: Peptide hydrophobicity and the solvent in which the peptide first reaches its

maximum Rg. The hydrophobicity score is calculated using the Kyte-Doolittle hydropho-

bicity scales [65].
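The scores in Table 4.1 can be reproduced by averaging Kyte-Doolittle hydropathy values over each sequence. The sketch below illustrates this under that assumption; the function and variable names are hypothetical, and only the four residue types present in the model peptides are included.

```python
# Kyte-Doolittle hydropathy values for the residues in the model peptides.
KYTE_DOOLITTLE = {"G": -0.4, "V": 4.2, "P": -1.6, "A": 1.8}

# Model peptides as (repeat unit, number of repeats).
PEPTIDES = {
    "(PGV)12": ("PGV", 12),
    "(GVPGV)7": ("GVPGV", 7),
    "(GGVGV)7": ("GGVGV", 7),
    "(GVGVA)7": ("GVGVA", 7),
    "(GV)18": ("GV", 18),
    "G35": ("G", 35),
}


def mean_hydropathy(repeat_unit, n_repeats):
    """Average Kyte-Doolittle value over the full repeated sequence."""
    sequence = repeat_unit * n_repeats
    return sum(KYTE_DOOLITTLE[res] for res in sequence) / len(sequence)


scores = {
    name: round(mean_hydropathy(unit, n), 2)
    for name, (unit, n) in PEPTIDES.items()
}
```

Because every peptide is a pure repeat, the mean over the full sequence equals the mean over one repeat unit; the repeat count only matters for the total length.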

These results are consistent with how tropoelastin is isolated in experiments, where it

remains soluble in propanol:butanol (3:5 in volume), while other proteins precipitate [78].

4.2.2 Secondary Structure Content

Figure 4.2 shows the analysis of various structures defined in DSSP [60]. As it shows, there

is no helix formation in any of the peptides in any of the solvents. The content of bends

remains roughly constant across all solvents. The coil content remains constant in all solvents except octane, where it decreases dramatically, corresponding to an increase in the propensity for β-sheets, β-bridges, and H-bonded turns.

As for β-sheet content, it is not surprising that the content changes little for the ELPs, because of the Pro residues they contain; G35 is an outlier. However, it is unexpected that the content does not keep increasing for ALPs as the solvent becomes more nonpolar. Instead, it shows a decreasing trend except in octane. Comparing the β-sheet content in water and methanol to that in Figure 3.5, which is calculated from another independent set of simulations, there is a discrepancy. As we found out later, the discrepancy is due to the introduction of cis- peptide bonds in the initial conformations of the peptides, which will be discussed in detail in Subsections 4.2.4 and 4.2.5.

[Figure 4.1 plot: average Rg (nm), 0.7 to 1.4, versus solvent (water, methanol, ethanol, propanol, butanol, pentanol, hexanol, heptanol, octanol, octane); one curve per peptide: (GVPGV)7, (PGV)12, (GGVGV)7, (GVGVA)7, (GV)18, (G)35.]

Figure 4.1: Average Rg of the model peptides in water, alcoholic solvents, and octane. The alcoholic solvents are ordered by the length of the alkyl chain. Rg is calculated the same way as in Figure 3.2.

[Figure 4.2 plot: eight panels, β-sheet (E), α-helix (H), H-bonded turn (T), β-bridge (B), 310-helix (G), coil (C), bend (S), and π-helix (I), each showing content (0.0 to 0.5) versus solvent (water to octane) for (GVPGV)7, (PGV)12, (GGVGV)7, (GVGVA)7, (GV)18, (G)35.]

Figure 4.2: Various types of backbone structures as defined in DSSP [60] for all model peptides in all solvents.


4.2.3 Size of Peptides In Vacuo

The results of Rg in alcohols in Figure 4.1 show that there is a plateau for ELPs between

propanol and heptanol, which is around Rg = 1.22 nm, and a solvent with a longer alkyl

chain than heptanol does not extend the peptide any further, as octanol is too nonpolar. An interesting question raised by this observation is whether 1.22 nm is greater than the value of Rg in a θ-solvent, which we call θ-Rg. In other words, are any of the alcoholic solvents considered either θ-solvents or good solvents for the ELPs? To address this question, simulations at a series of temperatures from 300 to 4039 K in vacuo were conducted to estimate θ-Rg: at a sufficiently high temperature, conformational entropy dominates the conformational ensemble, so the peptide is maximally disordered, as in a θ-solvent.

Figure 4.3 shows that the Rg of ELPs increases rapidly as the temperature rises to 1000

K, and becomes constant above 2000 K at around 1.51 nm and 1.48 nm. We postulate

that these maximum values correspond to maximum disorder and therefore approximate

the θ-Rg of the two ELPs. The normal distribution of end-to-end distances as shown

in Figure 4.4 also indicates that the ELPs behave approximately as random chains at

such high temperatures. These results suggest that the peptides are not as extended in alcohols as they are at high temperatures in vacuo. Therefore, the alcohols are neither θ- nor good solvents. One possible reason is the presence of intramolecular H-bonds, which presumably contribute to the peptides’ collapse. Increasing solvent nonpolarity does not always diminish intramolecular peptide-peptide H-bonds, as shown in Figure 4.5, whereas the peptides at high temperatures in vacuo form virtually no H-bonds, as shown in Figure 4.3.


[Figure 4.3 plot: x-axis, temperature (K), 0 to 4500; y-axes, Rg (nm), 0.8 to 1.6, and peptide-peptide H-bonds, 0.00 to 0.16; series: Rg and H-bond propensity for (GVPGV)7 and (PGV)12.]

Figure 4.3: Rg and intramolecular peptide-peptide H-bonds propensity of ELPs in vacuo as a function of temperature.

[Figure 4.4 plot: two panels, (GVPGV)7 and (PGV)12, each showing probability P (0.000 to 0.035) versus end-to-end distance (0 to 9); r² = 0.996 in both panels.]

Figure 4.4: Distribution of end-to-end distance of ELPs in vacuo at 2707 K. The solid lines are fits to normal distributions. r² indicates the fitting quality.


[Figure 4.5 plot: peptide-peptide H-bonds (0.00 to 0.25) versus solvent (water, methanol, ethanol, propanol, butanol, pentanol, hexanol, heptanol, octanol, octane); one curve per peptide: (GVPGV)7, (PGV)12, (GGVGV)7, (GVGVA)7, (GV)18, (G)35.]

Figure 4.5: Intramolecular H-bonds propensity of the model peptides in various solvents. The total number of H-bonds is normalized by the number of H-bonding groups in each peptide.


4.2.4 The Discrepancy in β-sheet Content

The simulations reported in Chapter 3 and Chapter 4 are referred to as Dataset 1 &

Dataset 2 (Ds. 1 & Ds. 2). Ds. 1 is comprised of 12 different systems, which consist of 6

model peptides successively in water and in methanol. Ds. 2 is comprised of 60 systems,

which correspond to all possible combinations of the same 6 model peptides and 10

different solvents. Therefore, the systems included in Ds. 2 are a superset of those in Ds. 1, and the comparison between them will focus on the common simulation systems.

The discrepancy in β-sheet content observed between Ds. 1 and Ds. 2 is plotted in Figure

4.6. It shows that the β-sheet content is significantly higher in Ds. 1 than that in Ds. 2.

The only difference between the two independent sets of simulations is the temperature

selected when preparing the initial conformations in vacuo. By design, we used high

temperature in order to ensure that the initial conformations for different replicas of the

same system are very different from each other so as to reduce the systematic sampling

error (as discussed in Chapter 2). The temperatures selected for Ds. 1 and Ds. 2 are 700 K

and 3000 K, respectively. The conformations generated at 3000 K are expected to be more

heterogeneous than those produced at 700 K. However, the former turn out to contain an unrealistic number of cis- peptide bonds, which is probably the major cause of the discrepancy in β-sheet content, because cis- peptide bonds are known to induce structural properties different from those of their trans- counterparts [56, 89, 40]. The details about the

ratio of the cis/trans peptide bonds will be shown in the next subsection. In contrast,

as shown in Figure 4.7, the distributions of Rg of the peptides in water and in methanol

are not significantly affected by those cis- peptide bonds. Moreover, similar results have

already been obtained for (PGV)12 using another force field, CHARMM22* [95], as shown

in Figure 4.8. Taken together, the analysis suggests that the results depicted in Figure

4.1 are reproducible, which is currently being verified.


[Figure 4.6 plot: β-sheet content (0.00 to 0.25) for each of the six model peptides; series: Ds. 1 water, Ds. 2 water, Ds. 1 methanol, Ds. 2 methanol.]

Figure 4.6: Comparison of β-sheet content between Ds. 1 and Ds. 2.


[Figure 4.7 plot: one panel per model peptide, each showing probability P (0.0 to 0.4) versus Rg (nm), 0.6 to 1.6; series: Ds. 1 water, Ds. 1 methanol, Ds. 2 water, Ds. 2 methanol.]

Figure 4.7: Comparison of Rg in Ds. 1 and Ds. 2.


[Figure 4.8 plot: radius of gyration, Rg (nm), 0.7 to 1.3, versus solvent (water, methanol, ethanol, propanol, butanol, pentanol, hexanol, heptanol, octanol, octane); series: (PGV)12 CHARMM22* and (PGV)12 OPLS-AA/L.]

Figure 4.8: Average Rg of (PGV)12 in different solvents, successively using the low-temperature (Ds. 1) protocol with CHARMM22* and the high-temperature (Ds. 2) protocol with OPLS-AA/L. The simulations in CHARMM22* include only 7 solvents.


4.2.5 Ratio of cis/trans Peptide Bonds

In total, 4 types of residues (G, V, P, A) and two types of bonds are present in the model peptides: single bonds and bonds with (partial) double-bond character, i.e. the C=O bond in the peptide backbone and the peptide bond. The equilibrium population of torsional isomers around single bonds is readily sampled at room temperature because of their low torsional energy barriers, and that around C=O bonds is also readily sampled because there is only one isomer. However, the equilibrium population around the peptide bonds is not easily sampled because of the high energy barrier between their cis and trans configurations, which can only be crossed at high temperatures in the simulations.

Pal et al. studied the occurrence of cis/trans peptide bonds in the PDB database (http://www.rcsb.org) in 1999, and found that peptide bonds are mostly in their trans configurations, with only 0.3% in their cis configurations. For X-Pro, the proportion of cis configurations is much higher at 5.7%, which accounts for 87% of all the cis- peptide bonds [89]. The high fraction of cis-X-Pro is due to the rigidity of the pyrrolidine ring in Pro, which decreases the double-bond character of the pre-proline peptide bond and therefore the energy barrier between its cis and trans configurations.

Considering that Pal’s study in 1999 may be outdated since only 294 X-ray structures

were included, we have reanalyzed the percentage of cis-X-Pro using the current PDB

database, which included 86468 structures at the time of analysis. A peptide bond is defined as being in its cis configuration if the corresponding dihedral angle (ω) is closer to 0° than to 180°; otherwise, it is defined as being in its trans configuration. No redundancy check was applied to the structures, which include both proteins and nucleic acids determined by X-ray crystallography and NMR. The inclusion of nucleic acids does not affect the result since they contain no peptide bonds. The percentage of cis-X-Pro is calculated as

\[ \text{cis-X-Pro}\ \% = \frac{N_{\text{cis-X-Pro}}}{N_{\text{cis-X-Pro}} + N_{\text{trans-X-Pro}}}, \tag{4.1} \]



and the result is shown in Table 4.2. The overall percentage of cis-X-Pro is 4.5%, which is somewhat lower than, but comparable to, the 5.7% reported by Pal et al. Interestingly, the percentage of cis-V-Pro, which is the only type of cis-X-Pro available in the model peptides, is even lower than the overall value, at 2.6%, while those involving aromatic residues (F, Y, W) are relatively higher. With 4.5%, the energy difference between the cis and trans configurations of X-Pro is calculated to be 7.6 kJ·mol−1, or 1.8 kcal·mol−1, assuming a temperature of 300 K.
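The thesis does not spell out how the energy difference was obtained. The quoted 7.6 kJ·mol−1 is consistent with a two-state Boltzmann estimate, ΔG = RT ln(p_trans/p_cis), which is the assumption made in the sketch below. The cis/trans classification follows the dihedral-angle criterion stated above; all function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3   # gas constant in kJ/(mol K)
KJ_PER_KCAL = 4.184     # thermochemical calorie


def is_cis(omega_deg):
    """True if the dihedral angle is closer to 0 than to 180 degrees."""
    folded = abs((omega_deg + 180.0) % 360.0 - 180.0)  # map into [0, 180]
    return folded < 90.0


def cis_fraction(omegas_deg):
    """Fraction of peptide bonds classified as cis."""
    flags = [is_cis(w) for w in omegas_deg]
    return sum(flags) / len(flags)


def cis_trans_energy_gap(cis_frac, temperature=300.0):
    """Two-state estimate (assumed here): RT * ln(p_trans / p_cis), in kJ/mol."""
    return R_KJ * temperature * math.log((1.0 - cis_frac) / cis_frac)


gap_kj = cis_trans_energy_gap(0.045)   # overall cis-X-Pro fraction, 4.5%
gap_kcal = gap_kj / KJ_PER_KCAL
```

With the overall fraction of 4.5% and T = 300 K, this gives about 7.6 kJ·mol−1 (1.8 kcal·mol−1), matching the values in the text.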

X-Pro   cis %     X-Pro   cis %     X-Pro   cis %     X-Pro   cis %
IP       2.1      LP       2.8      KP       3.9      EP       6.8
CP       2.1      TP       3.4      HP       4.0      GP       6.9
DP       2.3      AP       3.5      NP       5.0      FP       7.4
VP       2.6      RP       3.5      SP       6.1      YP      12.1
MP       2.7      QP       3.7      PP       6.3      WP      13.7
Overall: 4.5

Table 4.2: Percentages of different cis-X-Pro calculated based on data from the PDB

database. The overall percentage is that of cis configurations out of all X-Pro peptide

bonds. The row in blue highlights that V-Pro is the only type of X-Pro available in the

model peptides.

Therefore, for a short peptide such as (PGV)12, which has 35 peptide bonds, 11 of them V-Pro, two simplifications are justifiable. First, the expected number of cis-X-nonPro peptide bonds can be set to zero (35 × 0.3% × (1 − 87%) = 0.014 ≈ 0). Second, the number of cis-X-Pro peptide bonds can also be set to zero for simplicity (11 × 2.6% ≈ 0.29, i.e. fewer than one cis bond expected per peptide). However, at a temperature as high as 3000 K, the systems were found to be contaminated with too many cis- peptide bonds, the details of which are shown in Figures 4.9 and 4.10 and summarized in Table 4.3.

[Figure 4.9 plot: one panel per model peptide, each showing the number of replicas (0 to 40) versus the fraction of cis-X-nonPro (0.0 to 1.0); series: Ds. 2 water, Ds. 2 methanol.]

Figure 4.9: Number of replicas (y-axis) vs. fraction of cis-X-nonPro (x-axis) in Ds. 2.

[Figure 4.10 plot: two panels, (GVPGV)7 and (PGV)12, each showing the number of replicas (0 to 40) versus the fraction of cis-X-Pro (0.0 to 1.0); series: Ds. 2 water, Ds. 2 methanol.]

Figure 4.10: Number of replicas (y-axis) vs. fraction of cis-X-Pro (x-axis) in Ds. 2.

             cis-X-nonPro        cis-X-Pro
Peptide      Ds. 1    Ds. 2      Ds. 1    Ds. 2
(GVPGV)7     0.01     0.17       0        0.33
(PGV)12      0.01     0.17       0        0.30
(GGVGV)7     0.02     0.18       -        -
(GVGVA)7     0.01     0.18       -        -
(GV)18       0.01     0.18       -        -
G35          0.03     0.25       -        -

Table 4.3: Summary of the fraction of cis-X-nonPro and cis-X-Pro in Ds. 1 and Ds. 2.

In Ds. 1, in which the initial conformations were generated at 700 K, few X-nonPro peptide bonds were in their cis configurations. By contrast, Figure 4.9 shows that in Ds. 2, in which the initial conformations were generated at 3000 K, the fractions of cis configurations in different replicas vary from 0 to as high as 0.6, which is unrealistic compared to a fraction of 0.00039 (0.3% × (1 − 87%)) in nature [89]. Since the presence of cis- peptide bonds affects the formation of secondary structure [56, 89, 40], it is not surprising that, with such a high amount of cis-X-nonPro, the β-sheet content in Ds. 2 is much lower, as shown in Figure 4.6.

For X-Pro, at 700 K, little cis-X-Pro was introduced in (GVPGV)7, whereas its fraction was about 0.2 in (PGV)12, which suggests that, with its larger number of X-Pro bonds, (PGV)12 acquires cis-X-Pro bonds more easily than (GVPGV)7. These conformations were omitted from the analysis for simplicity, as mentioned above. At 3000 K, cis-X-Pro becomes much more abundant, with fractions varying from 0.0 to 0.6, as shown in Figure 4.10.


4.3 Discussion

We have examined the solvent quality of various solvents of different polarities for a set of

model peptides. Relative solvent quality was measured by the peptides’ average Rg in a

particular solvent: the larger the Rg, the better the solvent quality. Although the shape of the dependence of Rg on the solvent differs from peptide to peptide, in general, solvent quality increases as the solvent becomes less polar, up to a turning point (e.g. heptanol) beyond which the solvent becomes too nonpolar and its quality begins to decrease. The increase of solvent quality along with its nonpolarity

is due to the hydrophobicity of the model peptides, while the existence of a turning

point despite such overall hydrophobicity is probably due to the polarity of the peptides’

backbone. For some of the model peptides, most notably ELPs, there is a plateau in

Rg, suggesting that Rg has little dependence on the solvent within the range of polarity

covered by this plateau. In addition, we also found a positive relationship between the

peptide’s hydrophobicity score and the nonpolarity of the solvent in which the peptide

first reaches its maximum Rg value, with G35 as an outlier.

We have also investigated θ-Rg of the ELPs by simulating them at very high temperatures

in vacuo. Interestingly, θ-Rg is much higher than the maximum Rg obtained in alcohols, which are therefore all poor solvents for the model peptides, even though their nonpolarity/hydrophobicity is similar to that of the peptides. This result suggests that a solvent that is as

nonpolar/hydrophobic as the solute peptide is not necessarily a θ-solvent. Other factors

such as the uneven distribution of the polar and nonpolar groups, which distinguishes the

peptides from their homogeneous counterparts (i.e. synthetic polymers like polystyrene

and polyethylene), must be taken into consideration. The finding from our group that

the backbone of an ELP remains partially hydrated even as the peptide approaches the

condition of a polymer melt, corroborates the difficulty of achieving ideal (θ) solvation

even for such highly disordered peptides [97].


Unfortunately, inconsistency in properties such as β-sheet content is observed between

the results presented in this chapter and Chapter 3. A thorough investigation on this

issue revealed that the inconsistency is due to an unusually high ratio of cis/trans pep-

tide bonds in the peptides in this chapter, which is a result of improper preparation of

the initial peptide conformations at too high a temperature (3000 K) in vacuo. The

abnormal ratio makes many structural properties unreliable, but as we have shown, Rg is not significantly affected, and qualitatively the same result for the Rg of (PGV)12 has been obtained with proper initial conformations in another force field, CHARMM22*. Therefore, we think that the Rg results in Section 4.2.1 will be reproducible after correcting the inappropriate step, which is currently being verified. We recommend that the next student who continues this project pay particular attention to the preparation of the initial conformations: they are generated de novo in silico rather than determined by experimental techniques like X-ray crystallography or NMR, and are therefore more likely to be subject to artifacts.


Simulations in solvents in OPLS-AA/L We performed 40 replicas of each of the 6 model peptides as a monomer in each of the 10 solvents. Each replica was simulated at 1 bar and 300 K for 200 ns, and the simulations of (GVPGV)7 in water and in methanol were extended to 500 ns. Each system was solvated in a triclinic box with angles of 60°, 60°, 90°. The box size of each system and the number of solvent molecules in it are listed in Table 4.4. The initial structures of the peptides were generated at 3000 K in vacuo, which resulted in a considerable number of cis- peptide bonds, as discussed in Subsection 4.2.5.

Solvent    Box size (nm³)     No.     Solvent    Box size (nm³)     No.
water      5.4 × 5.4 × 3.8    3700    pentanol   7.8 × 7.8 × 5.5    1800
methanol   6.9 × 6.9 × 4.9    3300    hexanol    8.0 × 8.0 × 5.7    1700
ethanol    6.7 × 6.7 × 4.8    2200    heptanol   7.9 × 7.9 × 5.6    1400
propanol   7.1 × 7.1 × 5.0    2000    octanol    7.0 × 7.0 × 4.9     900
butanol    7.6 × 7.6 × 5.4    2000    octane     5.2 × 5.2 × 3.7     350

Table 4.4: Box size of and number of solvent molecules in each system.

All simulations were performed at constant pressure and constant temperature with periodic boundary conditions. The simulation package used was Gromacs-4.0.5 for all systems except those in heptanol, for which Gromacs-4.5.5 was used. Models of explicit TIP4P water [57], methanol, ethanol and propanol [58] from the Gromacs-4.5.5 software package were used directly. Models for all the other solvents were constructed with g_x2top from the Gromacs-4.5.5 tools and completed with an in-house script. The LINCS algorithm was used to constrain all bond lengths [47, 46]. A cutoff of 1.4 nm was used for Lennard-Jones interactions. The PME algorithm [26, 33] was used to calculate long-range electrostatic interactions with a Fourier spacing of 0.12 nm and an interpolation order of 4. The Nose-Hoover thermostat [88, 50] was used for temperature coupling, with the peptide and solvent coupled to two separate temperature baths and a time constant of 0.1 ps. The Parrinello-Rahman barostat [91] was used for pressure coupling with a time constant of 2 ps. The integration step size was 2 fs and the system coordinates were stored every 10 ps.

The first 150 ns of each trajectory was omitted for equilibration, resulting in a total of 14 µs of production time for each of the systems of (GVPGV)7 in water and in methanol, and a total of 2 µs of production time for each of the other systems.
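The production times quoted in this section follow from the number of replicas and the length of each trajectory after discarding equilibration. A minimal sketch of that bookkeeping is shown below; the helper name is hypothetical, and treating the in vacuo runs as having no discarded equilibration period is an assumption that is consistent with the quoted 1.6 µs.

```python
def production_time_us(n_replicas, length_ns, equilibration_ns):
    """Total production time in microseconds across all replicas."""
    return n_replicas * (length_ns - equilibration_ns) / 1000.0


opls_default = production_time_us(40, 200, 150)    # most OPLS-AA/L systems
opls_extended = production_time_us(40, 500, 150)   # (GVPGV)7 in water/methanol
charmm = production_time_us(10, 300, 150)          # CHARMM22* systems
in_vacuo = production_time_us(8, 200, 0)           # per temperature, in vacuo
```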

Simulations in vacuo Simulations in vacuo were performed using Gromacs-4.5.5 with the OPLS-AA/L force field at constant temperature without pressure coupling. The temperatures were 300, 366, 447, 546, 667, 815, 996, 1216, 1485, 1814, 2216, 2707, 3306, and 4039 K. For each system, 8 replicas were simulated at each temperature for 200 ns, resulting in 1.6 µs of production time per system at each temperature. The LINCS algorithm was used to constrain all bond lengths [47, 46]. The shift algorithm was used to calculate Lennard-Jones and electrostatic interactions with cutoffs of 0.9 and 0.8 nm, respectively. The Nose-Hoover thermostat [88, 50] was used for temperature coupling with a time constant of 0.1 ps. The integration step size was 0.1 fs to avoid system crashes at high temperatures, and the system coordinates were stored every 10 ps. The translation of and rotation around the center of mass were removed every 10 steps to avoid the flying ice cube effect [45].
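The thesis does not state how the temperature series was chosen. As an observation only, the listed values coincide with a geometric ladder T_i = 300 · exp(0.2 i) truncated to integers, which is a common way to space temperatures evenly on a logarithmic scale. The sketch below reproduces the list under that assumption; the function name and parameters are hypothetical.

```python
import math


def temperature_ladder(t0=300.0, log_step=0.2, n_levels=14):
    """Geometric temperature ladder, truncated to whole kelvin."""
    return [int(t0 * math.exp(log_step * i)) for i in range(n_levels)]


ladder = temperature_ladder()
```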

Simulations in CHARMM22* The CHARMM22* [95] force field was downloaded from http://www.gromacs.org/Downloads/User_contributions/Force_fields. The solvent models were from CGenFF version 2b7 [116]. The atomic charges were generated by the CGenFF program, version 0.9.6 beta, from https://www.paramchem.org/AtomTyping/ [118, 117]. The simulations in CHARMM22* were conducted using Gromacs-4.5.5, and included 10 300-ns replicas with the first 150 ns truncated as equilibration, resulting in 1.5 µs of production time per system. All other technical details were the same as for the simulations in OPLS-AA/L, except that the time constant for the thermostat was 2 ps. The box size and number of solvent molecules per system are shown in Table 4.5.

Solvent    Box size (nm³)     No.     Solvent    Box size (nm³)     No.
water      5.4 × 5.4 × 3.8    3700    heptanol   7.6 × 7.6 × 5.3    1300
methanol   6.8 × 6.8 × 4.8    3300    octanol    7.2 × 7.2 × 5.1    1000
ethanol    6.7 × 6.7 × 4.7    2200    octane     6.1 × 6.1 × 4.3     600
pentanol   7.7 × 7.7 × 5.4    1800

Table 4.5: Box size of and number of solvent molecules in each system in CHARMM22* force field.


Chapter 5

Modeling Mechanical Properties

5.1 Background

The most important function of elastin is to provide elasticity to biological tissues, and

understanding the underlying mechanism is crucial for future applications like biomimetic

materials engineering. Therefore, in this chapter, we focus on gaining insights into the

molecular mechanism of elasticity in elastin by modeling its mechanical properties based

on MD simulations of ELPs.

For all MD simulations studies, it would be very helpful to have a direct comparison be-

tween results in silico and those in experiments. However, experimentalists are usually

working at a much larger scale of both time and size than MD simulators. Therefore,

we attempted to model the macroscopic properties from the microscopic ones obtainable

from our MD studies. At the microscopic level, the major mechanical properties con-

cerned in this chapter are the modulus (k) and equilibrium length (d0) of a monomer

peptide. Both k and d0 can be calculated from the peptide’s end-to-end distance distri-

bution, which is directly obtainable from MD simulations. At the macroscopic level, we


modeled the Young’s modulus of a piece of elastin-like material with k and d0. Young’s

modulus is defined as

\[ K_Y = \frac{\text{stress}}{\text{strain}} = \frac{F/A}{\Delta l / l}, \tag{5.1} \]

where stress is defined as the recoiling force (F) divided by the cross-sectional area (A) of the material, and strain is defined as the ratio of the extension (∆l) to the length (l) of the material. Since strain is dimensionless, KY has units of pressure and is given here in MPa.

Our approach for relating the microscopic and macroscopic worlds is to start by calculat-

ing k and d0 of a monomer, then to use these values to model KY for a piece of elastin-like

material, and at last to compare the modeled KY to the experimental values.

5.2 Theory

5.2.1 Modulus of a Monomer as a Spring

The modulus of a peptide monomer is calculated by fitting a parabola, which has the same shape as the potential of a spring, to the system’s potential of mean force (PMF) profile along the peptide’s end-to-end distance (d). The PMF is defined as the change of free energy along a reaction coordinate, which is the end-to-end distance of the peptide in this case. According to the Boltzmann distribution,

\[ p_d = \frac{e^{-G_d / k_B T}}{Z} = \frac{e^{-\beta G_d}}{Z}, \tag{5.2} \]

where pd and Gd are the probability and free energy when the end-to-end distance is d, kB is the Boltzmann constant, T is the absolute temperature, β = 1/(kBT), and Z is the partition function. At zero extension, i.e. when d = d0,

\[ p_0 = \frac{e^{-\beta G_0}}{Z}, \tag{5.3} \]

where p0 and G0 are the corresponding probability and free energy. Dividing Equation (5.2) by (5.3) yields

\[ \frac{p_d}{p_0} = e^{-\beta (G_d - G_0)}. \tag{5.4} \]

Since the PMF only determines free energy differences, G0 can be set to 0. Taking the logarithm of both sides of Equation (5.4) then gives

\[ \ln \frac{p_d}{p_0} = -\beta G_d, \tag{5.5} \]

so that

\[ G_d = -\frac{1}{\beta} \ln \frac{p_d}{p_0}, \tag{5.6} \]

which is then fitted to a parabola,

\[ G_d = -\frac{1}{\beta} \ln \frac{p_d}{p_0} \cong \frac{1}{2} k (d - d_0)^2, \tag{5.7} \]

where k is the modulus and d0 is the equilibrium length, i.e. the end-to-end distance when the peptide is in its relaxed state.
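The fit in Equations (5.6) and (5.7) can be sketched as follows. This is an illustration rather than the analysis code used in the thesis: it assumes the end-to-end distances have already been binned into a probability histogram, uses the most populated bin as the reference p0, and fits the parabola by ordinary least squares using only the Python standard library. The names `fit_spring` and `solve3` are hypothetical. If d is in nm, k is returned in kJ·mol−1·nm−2.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (a[r][3] - tail) / a[r][r]
    return x


def fit_spring(distances, probabilities, temperature=300.0):
    """Return (k, d0) from a least-squares parabola fit to the PMF.

    The PMF is G_d = -kT ln(p_d / p_max); the constant term of the fitted
    parabola absorbs the choice of reference probability.
    """
    kT = KB * temperature
    pairs = [(d, p) for d, p in zip(distances, probabilities) if p > 0]
    p_max = max(p for _, p in pairs)
    shift = sum(d for d, _ in pairs) / len(pairs)  # center d for conditioning
    xs = [d - shift for d, _ in pairs]
    gs = [-kT * math.log(p / p_max) for _, p in pairs]
    # Normal equations for G ~ a*x^2 + b*x + c
    s = [sum(x ** n for x in xs) for n in range(5)]
    t = [sum(g * x ** n for x, g in zip(xs, gs)) for n in range(3)]
    a, b, _ = solve3(
        [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]],
        [t[2], t[1], t[0]],
    )
    return 2.0 * a, shift - b / (2.0 * a)


# Synthetic check: a Gaussian distribution of d corresponds to a harmonic PMF
sigma, mean = 0.5, 3.0
grid = [2.0 + 0.05 * i for i in range(41)]
probs = [math.exp(-((d - mean) ** 2) / (2 * sigma ** 2)) for d in grid]
k_fit, d0_fit = fit_spring(grid, probs)
```

For a Gaussian distribution with standard deviation σ, the fitted modulus equals kBT/σ², which gives a quick sanity check for any real histogram that looks approximately normal.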

5.2.2 Young’s Modulus in the Tetrahedron Model

We developed a mathematical model named the tetrahedron model to calculate the

Young’s modulus for a piece of macroscopic elastin-like material based on the obtained

modulus (k) and equilibrium length (d0) of a monomer. The calculated Young’s modulus

can then be directly compared to experimental measurements.

The tetrahedron model considers the elastin-like material to be a collection of tetrahedra

at its relaxed state. In this model, each XL domain is represented by a node or a crosslink,

and each HP domain is represented by the edge connecting two crosslinks. Each crosslink

has a valence of four, i.e. it is connected to four HP domains. This is similar to native elastin, where four Lys residues from two XL domains interact to form a crosslink (e.g. desmosine or isodesmosine) [123]. Since each XL domain is flanked by two HP domains, each crosslink is connected to four HP domains. The material in

this model is assumed to retain a constant volume during extension, so as the material

increases in length, it decreases in width and height. The model construction consists of

the following sequence: we start by calculating the modulus of a spring complex (kc) as

shown in Figure 5.1; then, based on kc, we analyze the modulus of a unit cell (ku) which

contains 4 tetrahedra as shown in Figure 5.2; finally, based on ku, we derive the Young’s

modulus (KY ) of the material.

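[Figure 5.1 diagram: two springs, O1A and O2A, meeting at point A with half-angle θ = 54.7°; under the force F, A moves to A′ and O1, O2 move to O′1, O′2 (θ becomes θ′); labeled lengths are d0 and d (spring), x0, x and X (complex), and s0 and s (projection); points O, B and B′ are also marked.]

Figure 5.1: A spring complex as defined in the tetrahedron model. Please refer to the text for a detailed description.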

A spring complex is defined as two springs forming a tetrahedral angle (θtetra), i.e. ∠O1AO2 = θtetra = 109.4712°, as shown in Figure 5.1. Consider the following process. Starting from Point A, where the complex is in its relaxed state, a force F is applied to A along the OA direction, perpendicular to O1O2. Then A shifts to A′, and O1 and O2 shift to O′1 and O′2, respectively; s0, the projection of O1A on O1O, becomes s; ∠O1AO, which equals θ = ½∠O1AO2 = 54.7356°, becomes θ′; and the length of the complex changes from its equilibrium value, x0, to X, an extension of x.


[Figure 5.2 diagram: a cubic unit cell drawn on x, y, z axes, with points labeled A to J, tetrahedron centers O1 to O4, and points N1 to N4.]

Figure 5.2: A unit cell as defined in the tetrahedron model. Within each unit cell, there are 4 tetrahedra, the centers of which are labeled O1 to O4. Please refer to the text for detailed description.


The approach is first to derive F as a function of x, F(x), and then to calculate the derivative of F with respect to x, dF/dx, which is defined as the modulus of the complex (kc). Since

\[ X = x_0 + x, \tag{5.8} \]

it follows that

\[ \frac{dF}{dx} = \frac{dF}{dX} \cdot \frac{dX}{dx} = \frac{dF}{dX} \cdot \frac{d(x_0 + x)}{dx} = \frac{dF}{dX}. \tag{5.9} \]

We prefer to calculate dF/dX instead because it is easier to obtain. According to Figure 5.1,

\[ F = 2k(d - d_0)\cos\theta', \tag{5.10} \]

where k is the modulus of a single spring, d0 and d are the lengths of the spring in its relaxed and extended states, and the factor 2 accounts for the two springs. On the right-hand side (RHS), k and d0 are constants, so we replace the variables d and cos θ′ with expressions in X. Given that

\[ d = \sqrt{X^2 + s^2}, \tag{5.11} \]

\[ \cos\theta' = \frac{X}{\sqrt{X^2 + s^2}}, \tag{5.12} \]

Equation (5.10) can be rewritten as

\[ F = 2k \left( \sqrt{X^2 + s^2} - d_0 \right) \cdot \frac{X}{\sqrt{X^2 + s^2}} = 2kX \left[ 1 - d_0 (X^2 + s^2)^{-\frac{1}{2}} \right]. \tag{5.13} \]

In order to replace s with X as well, we use the relationship between the two that results from the constraint that the material’s volume is constant during its extension. At the macroscopic level, in order to conserve the volume of the material, if its length increases by a ratio of r while its width and height shrink by a ratio of r′, the following relationship must hold,

\[ (1 + r) l_0 \cdot (1 - r') w_0 \cdot (1 - r') h_0 = l_0 \cdot w_0 \cdot h_0, \tag{5.14} \]

where l0, w0, h0 are the material’s length, width and height in its relaxed state. Solving Equation (5.14) gives

\[ r' = 1 - \frac{1}{\sqrt{1 + r}}. \tag{5.15} \]

At the microscopic level, a unit cell (Figure 5.2) is considered to consist of 8 spring complexes of the kind shown in Figure 5.1, arranged in a 2 × 4 pattern, i.e. 2 parallel groups of 4 complexes in series. Assuming that each complex deforms in the same proportions as the bulk material, the following relationships hold,

\[ s = (1 - r') s_0, \tag{5.16} \]

\[ X = (1 + r) x_0. \tag{5.17} \]

With θ ≈ 54.7356°,

\[ s_0 = d_0 \sin\theta = \frac{\sqrt{6}}{3} d_0, \tag{5.18} \]

\[ x_0 = d_0 \cos\theta = \frac{\sqrt{3}}{3} d_0, \tag{5.19} \]

so that

\[ s = (1 - r') \frac{\sqrt{6}}{3} d_0 = \frac{\sqrt{6}}{3} \frac{1}{\sqrt{1 + r}} d_0, \tag{5.20} \]

\[ X = \frac{\sqrt{3}}{3} (1 + r) d_0. \tag{5.21} \]

Then,

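\[ s^2 = \frac{2}{3}\,\frac{1}{1 + r}\, d_0^2 = \frac{2}{3}\left(\frac{\sqrt{3}}{3}\,\frac{d_0}{X}\right) d_0^2 = \frac{2\sqrt{3}}{9}\,\frac{d_0^3}{X}. \tag{5.22} \]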

Substituting s2 in Equation (5.13) with (5.22),

\[ F = 2kX\left[1 - d_0\left(X^2 + \frac{2\sqrt{3}}{9}\,\frac{d_0^3}{X}\right)^{-\frac{1}{2}}\right]. \tag{5.23} \]

Therefore, the modulus of the spring complex, i.e. the derivative of F with respect to X,


is

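\[
\begin{aligned}
k_c = \frac{dF}{dX}
&= 2k\left[1 - d_0\left(X^2 + \frac{2\sqrt{3}}{9}\,\frac{d_0^3}{X}\right)^{-\frac{1}{2}}\right] + kX d_0\left(X^2 + \frac{2\sqrt{3}}{9}\,\frac{d_0^3}{X}\right)^{-\frac{3}{2}}\left(2X - \frac{2\sqrt{3}}{9}\,\frac{d_0^3}{X^2}\right) \\
&= 2k\left[1 - d_0\left(X^2 + \frac{C}{X}\right)^{-\frac{1}{2}}\right] + kX d_0\left(X^2 + \frac{C}{X}\right)^{-\frac{3}{2}}\left(2X - \frac{C}{X^2}\right) \\
&= k\left[2 - d_0\,\frac{2X^2 + \frac{2C}{X} - 2X^2 + \frac{C}{X}}{\left(X^2 + \frac{C}{X}\right)\sqrt{X^2 + \frac{C}{X}}}\right] \\
&= k\left[2 - d_0\cdot\frac{3C}{X}\cdot\left(X^2 + \frac{C}{X}\right)^{-\frac{3}{2}}\right],
\end{aligned}
\tag{5.24}
\]

where

\[ C = \frac{2\sqrt{3}\, d_0^3}{9}. \tag{5.25} \]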
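The closed form in Equation (5.24) can be checked against a finite-difference derivative of the force in Equation (5.23). The sketch below is only a sanity check under the stated equations; the function names and the sample values of X, k and d0 are illustrative assumptions.

```python
import math


def shape_constant(d0: float) -> float:
    """C from Equation (5.25)."""
    return 2.0 * math.sqrt(3.0) * d0 ** 3 / 9.0


def complex_force(X: float, k: float, d0: float) -> float:
    """Force of the spring complex, Equation (5.23)."""
    C = shape_constant(d0)
    return 2.0 * k * X * (1.0 - d0 / math.sqrt(X * X + C / X))


def complex_modulus(X: float, k: float, d0: float) -> float:
    """Analytical modulus k_c of the spring complex, last line of Equation (5.24)."""
    C = shape_constant(d0)
    return k * (2.0 - d0 * (3.0 * C / X) * (X * X + C / X) ** (-1.5))


def numerical_modulus(X: float, k: float, d0: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of dF/dX."""
    return (complex_force(X + h, k, d0) - complex_force(X - h, k, d0)) / (2.0 * h)


# Illustrative values: k in pN/nm, d0 and X in nm.
k, d0, X = 8.7, 1.3, 1.2
print(complex_modulus(X, k, d0), numerical_modulus(X, k, d0))
```

At the relaxed length X = x0 = d0/√3 both the force and the modulus of the complex vanish, which is consistent with KY starting from zero at zero strain in this model.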

With kc obtained, next we calculate the modulus of a unit cell, ku. The topology of a

unit cell, as shown in Figure 5.2, is similar to that of a diamond except that the carbon

atoms and C-C bonds in the latter are replaced with crosslinks and HP domains. Each

cubic unit cell consists of 4 tetrahedra which are made up of 16 HP domains and 18 crosslinks. Of the 18 crosslinks, the 8 at the corners are each shared by 8 neighbouring unit cells, the 6 at the face centres are each shared by 2 neighbouring unit cells, and the remaining 4 lie entirely inside a single unit cell, so there are only 8 effective crosslinks per unit cell (8 × 1/8 + 6 × 1/2 + 4). A unit cell is a highly symmetrical structure in which each HP domain forms an angle of α = (180° − θtetra)/2 = 35.2644° with each surface, where θtetra ≈ 109.4712° is the tetrahedral angle. Given that

the equilibrium length of a HP domain is d0, the length of a unit cell is

\[ l_u = 2\cdot\left(2\cos\alpha\cdot d_0\cdot\cos 45^\circ\right) = 4\cdot\frac{\sqrt{6}}{3}\cdot d_0\cdot\frac{\sqrt{2}}{2} = \frac{4}{\sqrt{3}}\, d_0. \tag{5.26} \]
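The geometry behind Equation (5.26) and the crosslink count can be reproduced with a few lines of arithmetic. This is a minimal sketch; the variable and function names are illustrative, and the tetrahedral angle is taken as arccos(−1/3).

```python
import math

# Tetrahedral angle and the angle each HP domain forms with a face of the cell.
theta_tetra = math.degrees(math.acos(-1.0 / 3.0))
alpha = (180.0 - theta_tetra) / 2.0


def unit_cell_length(d0: float) -> float:
    """Edge length of the cubic unit cell, first form of Equation (5.26)."""
    a = math.radians(alpha)
    return 2.0 * (2.0 * math.cos(a) * d0 * math.cos(math.radians(45.0)))


# Corner crosslinks are shared by 8 cells, face-centre crosslinks by 2,
# and the interior crosslinks belong to one cell only.
effective_crosslinks = 8 * (1 / 8) + 6 * (1 / 2) + 4

print(f"theta_tetra = {theta_tetra:.4f} deg, alpha = {alpha:.4f} deg")
print(f"l_u / d0 = {unit_cell_length(1.0):.4f}, effective crosslinks = {effective_crosslinks}")
```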

As mentioned above, a spring complex in Figure 5.1 is considered as the basic structure

for a unit cell as in Figure 5.2. The 16 HP domains in a unit cell can be grouped into 2×4


spring complexes, consisting of parallel arrangement of 2 groups of 4 in-series complexes.

Given that in physics, two springs of modulus k result in an overall modulus of 2k if in parallel, and k/2 if in series, the modulus of a single unit cell is

\[ k_u = \frac{2 k_c}{4} = \frac{1}{2} k_c. \tag{5.27} \]

Now that we know ku, we are ready to calculate the Young’s modulus of a piece of

macroscopic material (KY ). Let the number of unit cells along the x, y, z axes of the

material be nu,x, nu,y, nu,z, then

nu,x = w0/lu, (5.28)

nu,y = l0 /lu, (5.29)

nu,z = h0/lu. (5.30)

Let the pulling force be along the y direction. Then nu,y unit cells are in series, while nu,x × nu,z such series are in parallel. Hence the modulus of the material is

\[ K = \frac{k_u}{n_{u,y}}\cdot n_{u,x}\, n_{u,z} = \frac{k_u\, w_0 h_0}{l_u\, l_0} = \frac{\sqrt{3}}{8}\,\frac{k_c}{d_0}\,\frac{w_0 h_0}{l_0}, \tag{5.31} \]

and its Young’s modulus is

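\[ K_Y = \frac{\text{stress}}{\text{strain}} = \frac{K\Delta l/(w_0 h_0)}{\Delta l/l_0} = \frac{K l_0}{w_0 h_0} = \frac{k_u}{l_u} = \frac{\sqrt{3}}{8}\,\frac{k_c}{d_0}, \tag{5.32} \]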

where ∆l is extension of the material. Substituting kc with (5.24),

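\[ K_Y = \frac{\sqrt{3}}{8}\,\frac{k}{d_0}\left[2 - d_0\cdot\frac{3C}{X}\cdot\left(X^2 + \frac{C}{X}\right)^{-\frac{3}{2}}\right] = \frac{\sqrt{3}}{4}\,\frac{k}{d_0}\left[1 - \frac{\sqrt{3}}{3}\,\frac{d_0^4}{X}\left(X^2 + \frac{2\sqrt{3}}{9}\,\frac{d_0^3}{X}\right)^{-\frac{3}{2}}\right]. \tag{5.33} \]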

However, the above equation is not very convenient for comparing our results with experimental ones. Instead, KY as a function of strain is preferred. Strain equals the ratio of increase in length, r, which has already been introduced above, namely

\[ \text{strain} = r = \frac{\Delta l}{l_0}. \tag{5.34} \]


Substituting X in (5.33) with (5.21) yields

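\[
\begin{aligned}
K_Y &= \frac{\sqrt{3}}{4}\,\frac{k}{d_0}\left(1 - \frac{d_0^3}{1 + r}\left[\frac{d_0^2}{3}(1 + r)^2 + \frac{2}{3}\,\frac{d_0^2}{1 + r}\right]^{-\frac{3}{2}}\right) \\
&= \frac{\sqrt{3}}{4}\,\frac{k}{d_0}\left(1 - \frac{1}{1 + r}\left[\frac{(1 + r)^2}{3} + \frac{2}{3(1 + r)}\right]^{-\frac{3}{2}}\right).
\end{aligned}
\tag{5.35}
\]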

Equation (5.35) shows that KY changes with strain, with k/d0 as part of the constant prefactor,

which is determined by the inherent property of the sequence the material is made of.
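Equation (5.35) is straightforward to evaluate numerically. The sketch below is a minimal illustration, not the code used in this work; the function names are assumptions. With k in pN/nm and d0 in nm, k/d0 is in pN/nm², and since 1 pN/nm² = 10⁻¹² N / 10⁻¹⁸ m² = 1 MPa, the result is directly in MPa.

```python
import math


def youngs_modulus(r: float, k: float, d0: float) -> float:
    """Young's modulus K_Y in MPa from Equation (5.35).

    r  : strain (dimensionless)
    k  : monomer modulus in pN/nm
    d0 : monomer equilibrium length in nm
    """
    u = 1.0 + r
    bracket = u * u / 3.0 + 2.0 / (3.0 * u)
    return math.sqrt(3.0) / 4.0 * (k / d0) * (1.0 - (bracket ** (-1.5)) / u)


def asymptotic_modulus(k: float, d0: float) -> float:
    """Large-strain limit of Equation (5.35), in MPa."""
    return math.sqrt(3.0) / 4.0 * k / d0


# Monomer parameters quoted for (GVPGV)7 in water: k = 8.7 pN/nm, d0 = 1.3 nm.
for r in (0.0, 0.2, 0.4, 0.8, 1.0):
    print(f"strain = {r:.1f}  K_Y = {youngs_modulus(r, 8.7, 1.3):.2f} MPa")
print(f"asymptotic limit = {asymptotic_modulus(8.7, 1.3):.2f} MPa")
```

With these inputs the modulus is zero at zero strain, about 2.0 MPa at a strain of 0.8, and approaches about 2.9 MPa at large strain.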

If KY is integrated with respect to r, a stress-strain curve can be obtained, while in

experiments, it is usually the stress-strain curve that is measured first, then the Young’s

modulus is obtained by fitting a straight line tangent to the seemingly linear region of

the curve. Therefore, the two curves from simulations and experiments can be compared

as well to test the quality of this model. Let stress = 0 when r = 0, the integral of

Equation (5.35) turns out to be

\[ \text{stress} = \frac{\sqrt{3}}{4}\,\frac{k}{d_0}\left(r - \sqrt{3}\,(1 + r)\sqrt{\frac{r + 1}{(r + 1)^3 + 2}} + 1\right). \tag{5.36} \]

In experiments, the material’s cross-sectional area can sometimes be difficult to measure,

so m/l0 may be used instead where m is the material mass and l0 is its original length.

Then the definition of stress in experiments becomes

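\[ \text{stress}_{exp} = \frac{F}{m/l_0} = \frac{K\Delta l}{m/l_0}, \tag{5.37} \]

Let the density of the material be ρ, where

\[ \rho = \frac{m}{V} = \frac{m}{l_0 w_0 h_0}. \tag{5.38} \]

Substituting Equation (5.38) into (5.37), the relationship between stressexp and stress is

\[ \text{stress}_{exp} = \frac{K\Delta l}{\rho\cdot w_0 h_0} = \frac{\text{stress}}{\rho}. \tag{5.39} \]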


Therefore, according to the specific case, either Equation (5.36) or the following equation

can be used for comparison between results in simulations and experiments.

\[ \text{stress}_{exp} = \frac{\sqrt{3}}{4\rho}\,\frac{k}{d_0}\left(r - \sqrt{3}\,(1 + r)\sqrt{\frac{r + 1}{(r + 1)^3 + 2}} + 1\right). \tag{5.40} \]
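The integrated forms in Equations (5.36) and (5.40) can be checked numerically: the derivative of the stress with respect to strain should reproduce Equation (5.35). The following is a minimal sketch with illustrative function names. It assumes k in pN/nm and d0 in nm (so stress is in MPa, i.e. N/mm²) and a density in g/mm³, which gives stressexp in N·mm/g.

```python
import math


def youngs_modulus(r: float, k: float, d0: float) -> float:
    """K_Y in MPa from Equation (5.35); k in pN/nm, d0 in nm."""
    u = 1.0 + r
    bracket = u * u / 3.0 + 2.0 / (3.0 * u)
    return math.sqrt(3.0) / 4.0 * (k / d0) * (1.0 - (bracket ** (-1.5)) / u)


def stress(r: float, k: float, d0: float) -> float:
    """Stress in MPa from Equation (5.36), with stress = 0 at r = 0."""
    u = 1.0 + r
    return math.sqrt(3.0) / 4.0 * (k / d0) * (
        r - math.sqrt(3.0) * u * math.sqrt(u / (u ** 3 + 2.0)) + 1.0
    )


def stress_exp(r: float, k: float, d0: float, rho: float) -> float:
    """Stress normalised by mass per unit length, Equation (5.40); rho in g/mm^3."""
    return stress(r, k, d0) / rho


# Finite-difference check that d(stress)/dr equals K_Y at an arbitrary strain.
r, h = 0.5, 1e-6
slope = (stress(r + h, 8.7, 1.3) - stress(r - h, 8.7, 1.3)) / (2.0 * h)
print(slope, youngs_modulus(r, 8.7, 1.3))

# Density value taken from the caption of Figure 5.5.
print(stress_exp(1.0, 8.7, 1.3, rho=1.3e-3))
```

For the (GVPGV)7 parameters in water, the normalised stress at a strain of 1 comes out on the order of 1000 N·mm/g, in line with the axis range of the modeled curves in Figure 5.5.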

5.3 Results

5.3.1 Modulus of Peptide Monomers

Figure 5.3 shows the result of fitting a parabola to the PMF of peptide end-to-end

distance for (GVPGV)7 and (PGV)12 successively in water and in methanol. It shows

that a monomer’s modulus k is lower and its equilibrium length d0 is larger in methanol

than in water. A lower k is reflected by a broader parabola while a larger d0 is seen

from the right shift of the parabola. Therefore, the Young’s modulus must be lower in

methanol than in water for both ELPs according to (5.35) in the tetrahedron model.

Comparing between sequences, the modulus of (PGV)12 in water is about 26 % lower

than (GVPGV)7, which is significant based on the error bars. However, in methanol, the

moduli of the two ELPs are not significantly different since their error bars overlap. The

fitting qualities for (GVPGV)7 and (PGV)12 in water are the best and the poorest of the four systems, respectively.
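The parabolic fit behind these values can be sketched as an ordinary least-squares fit of a quadratic. The example below is an illustration on synthetic data, not the analysis code used in this work. It assumes the fit form given in the caption of Figure 5.3, PMF = k · (d − d0)² + C, with the PMF in kJ/mol and d in nm; the function names and the synthetic values are hypothetical.

```python
# 1 kJ/mol/nm^2 = (1000 / 6.02214076e23) J / 1e-18 m^2 = 1.66054e-3 N/m = 1.66054 pN/nm
KJ_PER_MOL_NM2_TO_PN_PER_NM = 1.66054


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_parabola(d_values, pmf_values):
    """Least-squares fit of pmf = k * (d - d0)**2 + C; returns (k, d0, C)."""
    n = len(d_values)
    mean_d = sum(d_values) / n
    x = [d - mean_d for d in d_values]  # centre the data for better conditioning
    s = [sum(xi ** p for xi in x) for p in range(5)]  # s[p] = sum of x**p
    t = [sum(y * xi ** p for xi, y in zip(x, pmf_values)) for p in range(3)]
    # Normal equations for pmf ~ a*x**2 + b*x + c, solved with Cramer's rule.
    A = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    det = det3(A)
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for row in range(3):
            M[row][col] = rhs[row]
        coeffs.append(det3(M) / det)
    a, b, c = coeffs
    # Complete the square: a*x**2 + b*x + c = a*(x + b/(2a))**2 + c - b**2/(4a).
    return a, mean_d - b / (2.0 * a), c - b * b / (4.0 * a)


# Synthetic PMF with known parameters (hypothetical values, not simulation data).
d = [0.8 + 0.1 * i for i in range(11)]
pmf = [5.0 * (di - 1.3) ** 2 + 0.2 for di in d]
k_fit, d0_fit, c_fit = fit_parabola(d, pmf)
print(k_fit * KJ_PER_MOL_NM2_TO_PN_PER_NM, "pN/nm", d0_fit, "nm")
```

In practice only the PMF values below a chosen energy cutoff would be passed to such a fit, so that the parabola describes the region around the minimum.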

5.3.2 Young’s Modulus

Figure 5.4 shows the Young’s modulus (KY ) as a function of strain according to Equation

(5.35). The comparison of the modeled moduli with experimental measurements is

summarized in Table 5.1. Figure 5.4 shows that as strain increases, the modulus converges

to 2.9 MPa in water and 0.51 MPa in methanol for (GVPGV)7, and 1.8 MPa in water


[Figure 5.3: four panels of PMF (kJ/mol) versus end-to-end distance d (nm), each with a parabolic fit. (GVPGV)7 in water: k = 8.7 ± 0.7 pN/nm, d0 = 1.3 ± 0.1 nm, r2 = 0.97. (GVPGV)7 in methanol: k = 2.1 ± 0.3 pN/nm, d0 = 1.8 ± 0.1 nm, r2 = 0.89. (PGV)12 in water: k = 6.4 ± 0.9 pN/nm, d0 = 1.5 ± 0.1 nm, r2 = 0.88. (PGV)12 in methanol: k = 2.7 ± 0.4 pN/nm, d0 = 1.9 ± 0.1 nm, r2 = 0.90.]

Figure 5.3: PMF along the end-to-end distance obtained from MD simulations of ELPs,

(GVPGV)7 and (PGV)12. A parabolic fit of the form PMF = k · (d − d0)2 + C, where

d0 is the equilibrium length of the peptide and C is a constant, is shown as a dashed line.

Mean values and SEM are obtained by partitioning the MD trajectories used to compute

W (d) into four groups.


and 0.6 MPa in methanol for (PGV)12. The shaded areas indicate KY values obtained

when the strain is between 0 and 0.8, which covers the range where experimental values

are measured as shown in Table 5.1. KY in methanol is only 15 % and 30 % of that

in water for (GVPGV)7 and (PGV)12, respectively, when the strain is 0.8. Although KY continues to be lower in methanol than in water when the strain is larger than one, only KY within a strain of one is shown because this is the approximate extensibility of elastin-like materials in experiments. Table 5.1 shows that KY at a strain of 0.8 from the tetrahedron model tends to overestimate the experimental values, except for (PGV)12 versus EP-20-244 (GP). Although the overestimation ranges from a factor of 1.1 (2/1.8) to a factor of 8 (2/0.25), considering the variance among experimental values and the simplicity of this model, we think the results from the tetrahedron model are remarkably close to the experimental measurements.

[Figure 5.4: two panels of KY (MPa) versus strain (0 to 1.0). (GVPGV)7 panel: water (k = 8.7 ± 0.7 pN/nm; d0 = 1.3 ± 0.1 nm; asymptotic limit = 2.9; KY = 2.0) and methanol (k = 2.1 ± 0.3 pN/nm; d0 = 1.8 ± 0.1 nm; KY = 0.3). (PGV)12 panel: water (k = 6.4 ± 0.9 pN/nm; d0 = 1.5 ± 0.1 nm; KY = 1.3) and methanol (k = 2.7 ± 0.4 pN/nm; d0 = 1.9 ± 0.1 nm; KY = 0.4).]

Figure 5.4: Young’s modulus as a function of strain for (GVPGV)7 and (PGV)12. The

thick dashed lines indicate the converged values for KY in water and methanol, and the

shaded areas indicate KY values obtained in the strain range of 0–0.8.


Source   Peptide/Protein            Strain       Water (MPa)    Methanol (MPa)
Tetra.   (GVPGV)7                   0 – 0.8      0 – 2.0        0 – 0.3
Tetra.   (PGV)12                    0 – 0.8      0 – 1.3        0 – 0.4
Tetra.   (GVPGV)7                   ∞            2.9 ± 0.1      0.5 ± 0.1
Tetra.   (PGV)12                    ∞            1.9 ± 0.1      0.6 ± 0.1
Exp.     EP-20-24-24 (PQQ) [10]     0.1 – 0.5    0.25 ± 0.10    -
Exp.     EP-20-244 (PQQ) [10]       0.4 – 0.8    0.25 ± 0.09    -
Exp.     EP-20-244 (PQQ) [120]      0 – 0.2      0.4            -
Exp.     EP-20-244 (GP) [120]       0 – 0.2      1.8            -
Exp.     aortic elastin [70, 10]    0.2 – 0.6    0.8            -

Table 5.1: Comparison of the Young’s moduli of the ELPs calculated using the tetra-

hedron model and those for other peptides/protein obtained in experiments. ∞ means

that the strain is large enough to have converged KY . The values for peptides/protein

other than (GVPGV)7 and (PGV)12 are from experimental studies. Unit: MPa.


5.3.3 Stress-strain Curve

Figure 5.5 shows the stress-strain curves constructed with the tetrahedron model as well

as that measured in experiments. Although the modeled curves tend to underestimate the stress at a given strain compared to the experimental one, the two types of curves are roughly on the same order of magnitude (hundreds of N/g/mm). The presence of an inflection point in the experimental curve suggests partial breakage of the matrix in

the material. In experiments, KY is calculated by fitting a straight line to the stress-

strain curve where it looks most linear. However, according to our model, KY is still

increasing though it may look linear when the strain is less than 1. Therefore, the model

suggests that fitting a tangent line to the curve for obtaining a value for KY can be an

oversimplification. Since the KY in methanol is much lower than in water as shown in

Table 5.1, unsurprisingly, the stress-strain curve is also considerably lower in methanol

than in water.

5.4 Discussion

We have shown that the modulus (k) and equilibrium length (d0) of a monomer can

be calculated by fitting a parabola to the system’s PMF along the peptide’s end-to-end

distance. Based on k and d0, we developed a mathematical model, the tetrahedron model,

to calculate the Young’s modulus (KY ) of macroscopic elastin-like material made of such

monomers. In this section, we present two comparative discussions of our results. First,

the results from MD simulations are compared to those from experiments. Second, the

results from MD simulations in water are compared to those in methanol.


[Figure 5.5: stress (N/g/mm) versus strain (0 to 1.0); legend entries include (GVPGV)7 in water, (PGV)12 in water and (PGV)12 in methanol.]

Figure 5.5: The upper plot shows the stress-strain curves constructed with Equation

(5.40) from the tetrahedron model for (GVPGV)7 and (PGV)12 successively in water

and in methanol, assuming a density of 1.3 × 10^-3 g·mm^-3 [10]. The lower plot is a stress-

strain curve of an ELP measured in experiments in 20% methanol solution (personal

communication with Fred Keeley).


5.4.1 Comparison between Experiments and Simulations

At the monomer level, k of the model ELPs is compared to that obtained from a full

tropoelastin in Baldock et al. [7]. Using the worm-like-chain model as described in the

paper, the modulus of a single full tropoelastin molecule is calculated to be about 0–9

pN/nm at an end-to-end distance of 0–140 nm. Therefore, the modulus of a monomer as shown in Figure 5.3, 8.7 ± 0.7 pN/nm for (GVPGV)7 or 6.4 ± 0.9 pN/nm for (PGV)12, is on the same order of magnitude, which is notable considering the difference in length

(35 vs. 786 residues) and sequence composition between the full tropoelastin [52] and

the model ELPs.

At the material macroscopic level, the results of KY for both (GVPGV)7 and (PGV)12

calculated with the tetrahedron model are remarkably close to the experimental results

as shown in Table 5.1, with an overestimation by a factor of 1.1 to 8. The overestimation of

the tetrahedron model can be justified by the following points. First, both (GVPGV)7

and (PGV)12 are quite different from the HP domains used in the experiments [120, 10].

As for native elastin [52], it consists of many different types of HP domains. Second, the finite length of the XL domains has been completely ignored, and their function is limited to providing linkage between HP domains in the model. Third, the crosslinking efficiency of XL domains is assumed to be 100% in the model, but it is probably lower in experiments, which would reduce KY. Fourth, in a phase-separated aggregate of self-assembled elastin, the modulus of an HP domain should be lower because of the

reduced solvophobic effect and maximized chain entropy [97] (see the next subsection for

a more detailed discussion on the influence of solvophobic effect and chain entropy on

the modulus), which would also reduce the final KY according to Equation (5.35). All

the differences are likely to affect the consequent Young’s modulus, but their effects are

difficult to assess quantitatively.


Overall, the consistency between k and KY in simulations and in experiments suggests

that the tetrahedron model is a reasonable way to model the Young’s modulus of elastin-

like materials using data from MD simulations. The tetrahedron was chosen as the most basic unit because it is the most symmetrical structure in which each node is connected by

4 edges. Another model, the cubic model, was also tried before the tetrahedron model

was developed. The cubic model is not as good as the tetrahedron one because: first, it is

not as symmetrical; second, it cannot be used to construct a stress-strain curve because

its resultant Young’s modulus does not change with strain; third, it overestimates KY to

an even larger extent.

5.4.2 Comparison between Results in Water and in Methanol

The modeled Young’s modulus (KY ) is higher in water than in methanol. According

to Equation (5.35), this is a result of a higher modulus (k) and a smaller equilibrium

length (d0) of a monomer in water than in methanol. Therefore, the comparison only

needs to be focused on the monomer level. Why is k higher and d0 lower in water? In

order to provide a detailed explanation to this question, we have attempted the following

derivation.

We make two assumptions. First, since the recoiling force is mainly entropic for the

native elastin [82, 49] and tropoelastin [7] in water, it is assumed to be also entropic

for ELPs both in water and in methanol. Second, in the most stretched state, the chain

entropy is assumed to be zero in both water and methanol regardless of sidechain entropy.

We started with deriving a relationship between the change of system free energy (∆G),

modulus (k) and the change of end-to-end distance of the peptide (∆d). Since the

recoiling process is entropic, ∆G is approximated to the change of system entropy (∆S).

As the material recoils, the recoiling force (f) does positive work, ∆d is below 0, and


∆S is above 0, which means a gain of system entropy. After the relationship is derived,

∆S is decomposed into the solvent and solute parts and compared in different solvents

so as to understand how they modulate the modulus of the peptide.

Based on its thermodynamic origin, the recoiling force can be decomposed into

\[ f = f_e + f_s = \left(\frac{\partial H}{\partial d}\right)_{p,T} - T\left(\frac{\partial S}{\partial d}\right)_{p,T}, \tag{5.41} \]

where f is the recoiling force, fs and fe are the entropic and enthalpic contributions, and

H, S, d, T , p, are system enthalpy, system entropy, peptide end-to-end distance, tem-

perature, and pressure, respectively. Since the recoiling force is assumed to be entropic,

f is approximated to

\[ f \approx f_s = -T\left(\frac{\partial S}{\partial d}\right)_{p,T} \tag{5.42} \]

The above equation shows how f is determined by the change of system entropy ∆S

with respect to a change in the peptide’s end-to-end distance (∆d) between the relaxed

and stretched states. To relate f to k,

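\[ \Delta G = -T\Delta S = \int_{d_0}^{d} f_s\, dx = \int_{d_0}^{d} -k(x - d_0)\, dx = -\frac{1}{2}k(d - d_0)^2 = -\frac{1}{2}k(\Delta d)^2. \tag{5.43} \]

Therefore,

\[ \Delta S = \frac{k(\Delta d)^2}{2T}, \tag{5.44} \]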

which means that if ∆d and T are fixed, k is proportional to ∆S. Therefore, the more

significant the gain of system entropy, the higher the modulus. Since k is lower in

methanol than in water, ∆S should also be lower. However, Equation (5.44) cannot

explain why ∆S is lower. For a system of a hydrophobic polymer such as (GVPGV)7 or

(PGV)12 in a polar solvent such as water, the gain of system entropy is two fold upon the

peptide’s recoil. First, the chain entropy increases but only to a certain point, after which

it decreases instead. Second, the solvent entropy also increases because the solvophobic


effect is mainly entropic at room temperature [27]. Therefore, ∆S can be decomposed

into

\[ \Delta S = S^{d_0} - S^{d} = \left(S_u^{d_0} - S_u^{d}\right) + \left(S_v^{d_0} - S_v^{d}\right) = \Delta S_u + \Delta S_v, \tag{5.45} \]

where subscripts u and v mean solute (i.e. the peptide) and solvent (i.e. water or

methanol). To distinguish different solvents, Equation (5.45) is rewritten as

\[ \Delta S^w = \Delta S_u^w + \Delta S_v^w \tag{5.46} \]

in water, and as

\[ \Delta S^m = \Delta S_u^m + \Delta S_v^m \tag{5.47} \]

in methanol, where superscript w and m mean in water and in methanol, respectively.

Since ∆S should be higher in water than in methanol as indicated by Equation (5.44),

\[ \Delta\Delta S = \Delta S^w - \Delta S^m > 0. \tag{5.48} \]

The above inequality can be expanded by substituting Equations (5.46) and (5.47) and rearranged to

\[ \Delta S_u^m - \Delta S_u^w < \Delta S_v^w - \Delta S_v^m, \tag{5.49} \]

which is the first key inequality. Because the chain entropy in the most stretched state

is assumed to be zero, and the chain entropy is believed to be higher in methanol than

in water because of its broader distribution of Rg,

\[ \Delta S_u^w - \Delta S_u^m = S_u^w - S_u^m < 0, \tag{5.50} \]

which is the second key inequality. Substituting Equation (5.50) into (5.49),

\[ \Delta S_v^w - \Delta S_v^m > 0, \tag{5.51} \]

which is the third key inequality. For convenience, the three key inequalities obtained so far are put together,

\[ \Delta S_u^m - \Delta S_u^w < \Delta S_v^w - \Delta S_v^m, \tag{5.49} \]

\[ \Delta S_u^w - \Delta S_u^m < 0, \tag{5.50} \]

\[ \Delta S_v^w - \Delta S_v^m > 0, \tag{5.51} \]

which are interpreted as follows: comparing the recoiling process in water with that in methanol, methanol brings both an increase in the gain of chain entropy (Equation (5.50)) and a decrease in the gain of solvent entropy (Equation (5.51)), which counteract each other in modulating the modulus of an ELP according to Equation (5.44). Since the modulus turns out to be lower in methanol than in water, the increase cannot compensate for the decrease (Equation (5.49)). Therefore, the lower modulus in methanol must be caused by the decrease in the gain of solvent entropy, which is equivalent to saying that the solvophobic effect is reduced in methanol. Despite an increase in the gain of chain entropy, the reduced solvophobic effect results in a broader distribution of Rg and a larger equilibrium length.
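To give a sense of scale for Equation (5.44), the sketch below converts a monomer modulus into the corresponding entropy gain per mole of peptide. It is only an illustration: the function name is hypothetical, and the extension ∆d = 1 nm is an arbitrary value chosen for the example, applied identically to both solvents.

```python
AVOGADRO = 6.02214076e23  # 1/mol


def entropy_gain(k_pn_per_nm: float, delta_d_nm: float, temperature_k: float) -> float:
    """Entropy gain Delta S in J/(mol K) from Equation (5.44)."""
    k_si = k_pn_per_nm * 1e-3          # pN/nm -> N/m
    delta_d_si = delta_d_nm * 1e-9     # nm -> m
    per_molecule = k_si * delta_d_si ** 2 / (2.0 * temperature_k)  # J/K
    return per_molecule * AVOGADRO


# Moduli quoted for (GVPGV)7 in water and in methanol; delta_d = 1 nm is hypothetical.
ds_water = entropy_gain(8.7, 1.0, 300.0)
ds_methanol = entropy_gain(2.1, 1.0, 300.0)
print(f"water: {ds_water:.2f} J/(mol K), methanol: {ds_methanol:.2f} J/(mol K)")
```

For a fixed ∆d and T the ratio of the two entropy gains equals the ratio of the moduli, which is the proportionality stated after Equation (5.44).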

5.5 Conclusion

Our results show that our estimates of k are commensurate with experimental values obtained on full tropoelastin [7]. In addition, a mathematical model named the tetrahedron model was constructed to model KY and, remarkably, it results in very good agreement with experimental measurements on self-assembled ELPs. Based on the tetrahedron model, the stress-strain curves can also be modeled, and they turn out to be of a similar shape to the experimental curve. The fact that the values of KY predicted by our model are close to experimental measurements despite the neglect of XL domains

is a strong argument for HP domains being mainly responsible for generating elasticity

in elastin-like materials.

The same approach was applied successively to ELPs in water and in methanol. The

results suggest that methanol would decrease k of a monomer as well as KY of the


elastin-like material, which suggests that the hydrophobic effect plays an important role

in generating elasticity in elastin in support of the two-phase model [92, 129, 44, 66].


The simulation data used to compute the elastic moduli of (GVPGV)7 and (PGV)12 were

obtained from simulations described in Chapter 3 with the production time for (PGV)12

in water and in methanol extended to 500 ns per replica, the same as that of (GVPGV)7.

The end-to-end distance was calculated between the C atom of the C-terminal acetyl

group and the N atom of the N-terminal NH2 group. To fit parabolas to the end-to-

end distance distributions, all replicas were used for (GVPGV)7, but 3 and 4 replicas

were removed for (PGV)12 in water and methanol respectively in order to improve the

fitting quality. Total sampling times of 14 µs, 14 µs, 12.95 µs, and 12.6 µs were used

for (GVPGV)7 in water, (GVPGV)7 in methanol, (PGV)12 in water and (PGV)12 in

methanol, respectively. In the removed replicas, the peptides were trapped in a region of

small end-to-end distances which would have produced an abrupt peak in the resultant

PMFs. As for the cutoff of the PMF during the fitting process, a few values were tried

ranging from 1–4 RT , and the one that produced the highest fit quality was selected,

which is 2 RT (i.e. 4.99 kJ/mol, R: gas constant, T : temperature (300 K)), though the

modulus and equilibrium length produced with different cutoffs were not significantly

different from each other.

Chapter 6

Summary & Future Directions

6.1 Summary

Below is a compiled list of the work presented in this thesis.

• In Chapter 3, we investigated the self-aggregation propensities of a set of 6 model

peptides, (GVPGV)7, (PGV)12, (GGVGV)7, (GVGVA)7, (GV)18 and G35, in both

water and methanol by analyzing their conformational properties as monomers. We

found that ELPs swell in methanol and we also concluded that it is the reduction

of the solvophobic effect that prevents the ALPs from forming amyloid-like fibrils

in methanol.

• In Chapter 4, we studied the solvent qualities of water, a set of primary alcohols

from methanol to 1-octanol, and octane on the 6 model peptides. We found that

water and octane, which represent the polar and nonpolar extremes of the solvent

set, are the poorest solvents. In between, as the alkyl chain of the alcohol becomes longer (i.e. the polarity of the solvent decreases), the solvent quality increases up to


heptanol as indicated by the peptide’s Rg, but none of the alcohols studied is a θ- or

good solvent. We postulate that the uneven distribution of the polar and nonpolar

groups in a peptide is a major impeding factor that prevents a solvent of similar

hydrophobicity/nonpolarity as the peptide from being its θ-solvent.

Due to improper preparation of the initial conformations, other structural prop-

erties like the contents of secondary structures were unreliable. This issue was

thoroughly discussed in Section 4.3.

• In Chapter 5, we derived the modulus (k) and equilibrium length (d0) of ELPs

from the PMF along the end-to-end distance. Based on k and d0, we developed the tetrahedron model to calculate the Young’s modulus (KY ) of elastin-like materials. The results are commensurate with experimental measurements. Applying this approach to simulations in different solvents shows that a monomer has a lower

k and a larger d0 in methanol than in water. As a result, the corresponding Young’s

modulus of a material made of such a monomer is also lower in methanol. This

observation highlights the important role of hydrophobic effect in generating the

elasticity in elastin-like material, which is consistent with the two-phase model.

6.2 Future Directions

First, in recent years, along with the rapid increase of computational power and the

development of more efficient algorithms, the time scale of MD simulations has been

continuously extended. As a result, the accuracy of current force fields turns out to

be limited, which has spurred a new trend of force field optimization and comparison

(see Chapter 2 for references). However, most of the force fields were optimized and validated with folded proteins in mind, and they were assumed to be adequate for IDPs as well when the MD simulations in this thesis were conducted. Such an assumption turns out to be questionable, which has motivated a comparison of the most recent force fields

for the model peptides. The preliminary results of this comparison study are shown

in Appendix A. Some of the interesting findings include (1) in CHARMM22* [95], the

Rg curve is roughly reproduced for (PGV)12, but not for (GV)18 or G35; (2) the ALPs

form β-sheets in OPLS-AA/L, but not in CHARMM22*; (3) an XL-domain-derived Ala-rich peptide forms α-helices in CHARMM22*, but not in OPLS-AA/L (personal

communication with Aditi Ramesh). Therefore, a force field that is good for ELPs may

not be as suitable for ALPs, and one that is good for XL domains may not be as valid

for HP domains. In order to achieve a more accurate description of the conformational

ensembles of IDPs, a thorough comparison and evaluation of force fields are necessary.

Second, since elastin is an extracellular matrix protein, MD simulations of peptide aggregation as well as of mature elastin-like materials are necessary to deepen our understanding of the underlying structure-function relationships in elastin. However, due to the

extraordinary demand of atomistic models on computational resources, CG models need

to be used if a certain degree of loss of the atomistic details is acceptable. The MARTINI

model mentioned in Subsection 2.2.2 is currently under development and will be tested in

the near future. Once a CG model is validated, it can also be used to test the hypothesis

proposed in Chapter 3. An alternative to circumvent the bottleneck of computational

power is to use an elastic network model (ENM). The tetrahedron model built

in Chapter 5 provides the possibility of developing an extremely coarse-grained ENM

for modeling the Young’s modulus of a piece of elastin-like material. Equation (5.7) can

be used as the potential energy function with the calculated k and d0 as its constant

parameters. In this model, each XL and HP domain corresponds to an atom and a bond,

respectively. By removing a number of HP domains randomly, the Young’s modulus

under the conditions of limited cross-linking efficiency can also be simulated.

Appendix A

Force Fields Comparison

A.1 Background

There are multiple pieces of evidence showing that OPLS-AA/L [61], the force field

used for most of the work presented in this thesis, is not the best one among a variety

of modern force fields. First, the results from a couple of very recent studies on force

fields comparison [8, 71] suggest that OPLS-AA/L is not the best at reproducing NMR

measurements for biomolecular systems. Second, results of Sarah Rauscher from our

group show that OPLS-AA/L produces over-collapsed conformations of the N-terminal

SH3 domain of the protein drk, and hence underestimates its Rg [97]. Third, our colleague Aditi found that an Ala-rich peptide, A7K (footnote 1), which forms an α-helix according to circular dichroism (CD), can hardly form any α-helix in OPLS-AA/L (unpublished

results). Therefore, we started a comparison study that focuses on selecting an optimal

force field for this project, in particular, the MD simulations of EBPs.

At the first stage of this study, an ELP, (GVPGV)7, has been simulated in 7 force fields.

Footnote 1: The sequence of A7K is AAAAAAAKAAKAAAAAAA.


One of them, CHARMM22*, has been tested with two water models. Therefore, in total,

8 force field sets have been tested and compared for (GVPGV)7. The 8 force field sets

are shown in Table A.1. A force field set simply means the force field plus a particular

type of water model. Usually, a force field has a preferred water model, which is the one

used when the force field was being developed. Initially, CHARMM27 and CHARMM22*

were paired with TIP3P. Afterwards, on realizing that there is a CHARMM-modified variant of TIP3P, namely TIPS3P, we also added CHARMM22* + TIPS3P to the force field

sets for comparison. The results on (GVPGV)7 suggest that CHARMM22* + TIPS3P is

the best force field set because the peptide reaches its largest average Rg. Concurrently,

a parallel comparison conducted by Aditi shows that A7K forms an extensive amount of

α-helix in CHARMM22*, which does not happen in any of the other force fields (footnote 2) she has

compared.

Force field               Water model
ff99SB-ILDN [51, 72]      TIP3P [57]
ff99SB*-ILDN [15, 72]     TIP3P
ff03* [31]                TIP3P
ff03w [13]                TIP4P/2005 [2]
OPLS-AA/L [61]            TIP4P [57]
CHARMM27 [75, 71]         TIP3P
CHARMM22* [95]            TIPS3P [74]

Table A.1: Selected force field set for comparison.

At the second stage of the study, we simulated 3 of the model peptides, (PGV)12, (GV)18

and G36, in 7 solvents, water, methanol, ethanol, pentanol, heptanol, octanol, octane in

CHARMM22*, trying to reproduce the Rg curve as shown in Chapter 4 and with proper

initial conformations. As mentioned in Subsection 4.2.4, the results have been reproduced

Footnote 2: The force fields Aditi has compared include ff99SB*-ILDN, ff03w, OPLS-AA/L and CHARMM22*.


qualitatively for (PGV)12. However, as for the other two peptides, the Rg curve of (GV)18

includes an abnormally high spike in methanol and that of G36 is flattened out. After

looking into the structure of (GV)18, we found a serious artifact that involves the formation of continuous G-V β-turns, which we name the zigzag extension. After communicating with one of the major developers of the CHARMM force field, Alex MacKerell from the University of Maryland, we took his suggestion and tested the newest CHARMM force field, CHARMM36 [16]. Unfortunately, the zigzag extension still exists in CHARMM36, although it is not as abundant as in CHARMM22*.

We suspect that this artifact is caused by inadequate parameterization of the Gly backbone dihedral angles, φ and ψ. Therefore, at the third stage of the force fields comparison study, we plotted the potential energy map for dipeptides of 3 residues, Gly, Val and Pro, in vacuo with different φ and ψ angles in 4 force fields, CHARMM22*, CHARMM36, OPLS-AA/L, and ff99SB*-ILDN. The results indicate that the potential energy maps of the different force field families are fundamentally different for the Gly dipeptide but very similar for the Val and Pro dipeptides. Since the zigzag extension has not been observed in OPLS-AA/L, the results support our suspicion that the artifact is most likely due to improper parameters of the backbone dihedral angles of Gly.

This study is still ongoing, and probably will need assistance from the CHARMM22*

force field developers to improve the parameters of the backbone dihedral angles of Gly.

The following sections present the results that have been obtained as of writing.


A.2 Results

A.2.1 Force Fields Comparison for (GVPGV)7

Radius of Gyration The distributions of Rg of (GVPGV)7 in different force fields are shown in Figure A.1. According to the broadness of the distributions, the force fields can be roughly divided into three groups. Unsurprisingly, OPLS-AA/L belongs to the first group, in which the peptide is more collapsed than in the other groups.

In addition, ff03* and ff03w also belong to the first group. The second group includes

ff99SB-ILDN, ff99SB*-ILDN, CHARMM27 and CHARMM22* with TIP3P water model,

in which the peptide has a larger Rg than in group one. Surprisingly, when CHARMM22*

is paired with the CHARMM-modified variant of TIP3P [74], TIPS3P, Rg becomes even

larger. Therefore, CHARMM22* itself forms the third group when paired with TIPS3P.

PMF The PMF along the end-to-end distance of (GVPGV)7 has been computed, from which the modulus of the peptide has been calculated using the method described in Subsection 5.2.1. The results are shown in Figure A.2. In terms of the fitting quality, r² is above 0.90 for ff03*, ff03w, OPLS-AA/L and CHARMM22*+TIPS3P, which is better than for the others and suggests that the PMF converges faster in these force fields. However, as shown in Figure A.1, ff03*, ff03w and OPLS-AA/L tend to underestimate Rg. As a result, CHARMM22*+TIPS3P is the preferred force field set. As for the resulting modulus, the values are all of the same order of magnitude and the experimental value is unknown, so the differences among force fields do not help with the selection. Therefore, we decided to use CHARMM22*+TIPS3P as the force field set for future work in this project.
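To make the meaning of k, d0 and the fitting quality concrete, the following is a minimal sketch of a parabolic fit to a PMF. It is not the method of Subsection 5.2.1: the functional form PMF(d) = k (d − d0)²/2 + c, the function names and the synthetic data are all assumptions made for illustration. The only fixed fact used is the unit conversion 1 kJ/(mol nm²) ≈ 1.6605 pN/nm.

```python
# Sketch: least-squares fit of a parabola to a PMF (pure Python, no dependencies).
# Assumed model (not taken from the thesis): pmf(d) = 0.5 * k * (d - d0)**2 + c

KJ_PER_MOL_NM2_TO_PN_PER_NM = 1.0e3 / 6.02214076e23 * 1.0e21  # about 1.6605


def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= factor * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = s / a[r][r]
    return x


def fit_parabola(d, pmf):
    """Return (k, d0, r2) for the quadratic least-squares fit of pmf against d."""
    s = [sum(x ** p for x in d) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(d, pmf)) for p in range(3)]
    # Normal equations for pmf ~ c0 + c1*d + c2*d**2
    c0, c1, c2 = solve3([[s[0], s[1], s[2]],
                         [s[1], s[2], s[3]],
                         [s[2], s[3], s[4]]], t)
    k = 2.0 * c2                 # curvature, kJ/(mol nm^2)
    d0 = -c1 / (2.0 * c2)        # position of the minimum, nm
    fitted = [c0 + c1 * x + c2 * x * x for x in d]
    mean = sum(pmf) / len(pmf)
    ss_res = sum((y - f) ** 2 for y, f in zip(pmf, fitted))
    ss_tot = sum((y - mean) ** 2 for y in pmf)
    return k, d0, 1.0 - ss_res / ss_tot


# Synthetic, noise-free PMF (made-up numbers, not simulation data).
d = [0.5 + 0.1 * i for i in range(26)]             # 0.5 ... 3.0 nm
pmf = [0.5 * 2.0 * (x - 1.9) ** 2 for x in d]      # k = 2.0 kJ/(mol nm^2), d0 = 1.9 nm
k, d0, r2 = fit_parabola(d, pmf)
k_pn = k * KJ_PER_MOL_NM2_TO_PN_PER_NM             # about 3.32 pN/nm
```

With real PMFs, r² drops below 1 wherever the profile deviates from a parabola, which is how the fitting quality in Figure A.2 can be read.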


[Figure A.1 plot: probability P versus Rg (nm), with the Rg axis running from 0.6 to 1.3 nm. Curves: ff99SB-ILDN + TIP3P, ff99SB*-ILDN + TIP3P, ff03* + TIP3P, ff03w + TIP4P/2005, OPLS + TIP4P, CHARMM27 + TIP3P, CHARMM22* + TIP3P, CHARMM22* + TIPS3P.]

Figure A.1: Distributions of Rg of (GVPGV)7 in different force field sets. Only backbone

Cα atoms were used for the calculation.


[Figure A.2 plot: PMF (kJ/mol, 0.0 to 3.0) versus d (nm, 0.5 to 3.0), one panel per force field set. Fit parameters shown in the panels:]

ff99SB-ILDN + TIP3P: k = 3.5 ± 0.1 pN/nm, d0 = 1.7 ± 0.0 nm, r² = 0.87
ff99SB*-ILDN + TIP3P: k = 3.3 ± 0.3 pN/nm, d0 = 1.8 ± 0.0 nm, r² = 0.84
ff03* + TIP3P: k = 6.7 ± 0.2 pN/nm, d0 = 1.4 ± 0.0 nm, r² = 0.93
ff03w + TIP4P/2005: k = 4.1 ± 0.2 pN/nm, d0 = 1.5 ± 0.0 nm, r² = 0.92
OPLS-AA/L + TIP4P: k = 6.5 ± 0.5 pN/nm, d0 = 1.3 ± 0.0 nm, r² = 0.92
CHARMM27 + TIP3P: k = 3.9 ± 0.2 pN/nm, d0 = 1.4 ± 0.0 nm, r² = 0.80
CHARMM22* + TIP3P: k = 3.8 ± 0.1 pN/nm, d0 = 1.6 ± 0.0 nm, r² = 0.82
CHARMM22* + TIPS3P: k = 3.4 ± 0.1 pN/nm, d0 = 1.9 ± 0.0 nm, r² = 0.93

Figure A.2: PMFs of (GVPGV)7 in different force field sets. The dashed lines show parabolic fits to the PMFs. k and d0 are the modulus and the equilibrium length of the peptide, respectively, and r² indicates the fitting quality.


A.2.2 Force Fields Comparison for (GV)18

Zigzag extension As mentioned above, the Rg curve of (GV)18 contains a spike in methanol. After analyzing the structure, we found that the spike is caused by a structure called the zigzag extension, which also exists in simulations in CHARMM36. A snapshot of the zigzag extension is shown in Figure A.4. In the zigzag extension, there is only one type of β-turn, which is formed by the H-bond between the backbone C=O group of Gly(2i+1) and the N-H group of Val(2i+4), where the number in parentheses is the residue index. In a perfect zigzag extension, nearly all residues of the peptide (except Val2 and Gly35) are engaged in the formation of β-turns, which would transform the peptide into a pseudo-2D helix. The zigzag extension is most prominent in methanol, but also appears in other solvents. Although the β-spiral model [119] mentioned in Section 1.6 also consists of repetitive β-turns, it is very different from the zigzag extension. In a hypothetical β-spiral, for example in (PGV)n where all VPGV units form β-turns, both the N-H and C=O groups of a single Val participate in the formation of two consecutive β-turns.
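The H-bond pattern described above can be checked by simple enumeration. The sketch below only assumes that residues are numbered from 1 and that (GV)18 starts with Gly; the variable names are illustrative. It lists the C=O(2i+1) to N-H(2i+4) pairs that fit into 36 residues and reports which residues take part in none of them.

```python
# Sketch: enumerate the H-bonds of a perfect zigzag extension in (GV)18.
N_REPEATS = 18
sequence = ["Gly", "Val"] * N_REPEATS      # residues 1..36, Gly at odd positions
n_res = len(sequence)

# C=O of residue 2i+1 (acceptor) bonded to N-H of residue 2i+4 (donor)
pairs = [(2 * i + 1, 2 * i + 4) for i in range(n_res) if 2 * i + 4 <= n_res]


def name(index):
    """Residue name plus 1-based index, e.g. 'Gly1'."""
    return f"{sequence[index - 1]}{index}"


acceptors = {acc for acc, _ in pairs}
donors = {don for _, don in pairs}
unpaired = [name(r) for r in range(1, n_res + 1)
            if r not in (acceptors | donors)]

print(len(pairs), "H-bonds:", name(pairs[0][0]), "->", name(pairs[0][1]),
      "...", name(pairs[-1][0]), "->", name(pairs[-1][1]))
print("not H-bonded:", unpaired)
```

The enumeration gives 17 H-bonds, from Gly1 to Val4 up to Gly33 to Val36, and leaves only Val2 and Gly35 without a partner, in agreement with the description above.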

H-bonding Map To obtain a quantitative view of the structure of the zigzag extension, H-bonding maps were calculated for (GV)18 in different force fields as well as in different solvents, as shown in Figures A.5 and A.6. Since the zigzag extension involves H-bonds between the backbone C=O group of Gly(2i+1) and the N-H group of Val(2i+4), only every other C=O group is hydrogen bonded, which is exactly what the maps in CHARMM22* and CHARMM36 in Figure A.5 show. Similarly, the H-bonding maps of (GV)18 in other solvents in Figure A.6 also indicate the existence of zigzag extensions. In contrast, as shown in Figure A.5, the C=O group of every residue is involved in the formation of an H-bond in OPLS-AA/L, and the corresponding type of turn is the γ-turn.


[Figure A.3 plot: Radius of Gyration (Rg) (nm), from 0.8 to 1.6, versus solvent (water, methanol, ethanol, pentanol, heptanol, octanol, octane). Legend: (PGV)12, (GV)18, (G)36.]

Figure A.3: Average Rg of (PGV)12, (GV)18 and G36 in water, alcoholic solvents, and octane. The alcoholic solvents are ordered by the length of their alkyl chains.


Figure A.4: A snapshot of the zigzag extension of (GV)18 in methanol in CHARMM22*

force field. The upper half is a far-sight view of the whole peptide, the lower half zooms

into the zigzag region.


Figure A.5: H-bonding maps of (GV)18 in CHARMM22*, OPLS-AA/L and CHARMM36

force fields.


Figure A.6: H-bonding map of (GV)18 in other solvents in CHARMM22* force field.


PMF of the Ramachandran Plot In addition, we plotted the PMF of the Ramachandran plot for Gly and Val in (GV)18 in CHARMM22*, CHARMM36 and OPLS-AA/L, as shown in Figures A.7 and A.8.

On the one hand, the PMFs of Gly in CHARMM22* and in CHARMM36 are very similar, and both favor the helical region over the extended region. This is not exactly consistent with the PMF of a short tripeptide, G3, published in the paper that first announced CHARMM36 [16], in which the extended region was preferred instead, but the overall contour shapes are similar among those PMFs. In contrast, the PMF of Gly in OPLS-AA/L is very different from those in the CHARMM force fields, not only because there is no preference for either the helical or the extended region, but also because its contour shape is drastically different. Since the zigzag extension happens in both CHARMM22* and CHARMM36, but not in OPLS-AA/L, we think this result suggests that it is Gly that causes the zigzag extension. In addition, although the PMF of Gly should be symmetrical with respect to the point (φ = 0, ψ = 0), as shown in Figure A.9, which is calculated for G36, the introduction of Val breaks this symmetry and results in a PMF slightly biased towards the φ > 0 region.

On the other hand, the PMFs of Val all differ from each other in terms of both the contour shape and the preferred region of φ-ψ combinations. In CHARMM22*, the extended region is favored. In CHARMM36, the helical region is favored. In OPLS-AA/L, though the extended region is favored, the helical region is also populated. Given that the zigzag extension happens in both CHARMM22* and CHARMM36, we think it is not very sensitive to the differences in the PMFs of Val. In other words, Val is less likely to be the major cause of the zigzag extension.
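For readers unfamiliar with such maps, the sketch below shows one common way to turn sampled (φ, ψ) pairs into a PMF, namely Boltzmann inversion of a 2D histogram, PMF = −RT ln P. The text does not state how the PMFs in Figures A.7 to A.9 were computed, so the procedure, the 10° bin width, the temperature of 300 K and the function name are assumptions for illustration only.

```python
# Sketch: PMF on the Ramachandran plane by Boltzmann inversion of a histogram.
import math
from collections import Counter

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def ramachandran_pmf(phi_psi, temperature=300.0, bin_width=10.0):
    """Return {(phi_bin, psi_bin): PMF in kJ/mol}, with the minimum shifted to 0.

    phi_psi is an iterable of (phi, psi) pairs in degrees. Bins are labelled
    by their lower edge. Bins that were never visited are absent from the result.
    """
    counts = Counter(
        (math.floor(phi / bin_width) * bin_width,
         math.floor(psi / bin_width) * bin_width)
        for phi, psi in phi_psi
    )
    total = sum(counts.values())
    pmf = {b: -R * temperature * math.log(n / total) for b, n in counts.items()}
    lowest = min(pmf.values())
    return {b: w - lowest for b, w in pmf.items()}


# Made-up sample: 90 frames in a helical bin, 10 frames in an extended bin.
samples = [(-65.0, -40.0)] * 90 + [(-120.0, 130.0)] * 10
pmf = ramachandran_pmf(samples)
helical = pmf[(-70.0, -40.0)]      # most populated bin, 0 by construction
extended = pmf[(-120.0, 130.0)]    # RT ln(9) above the minimum
```

A region that is populated nine times less often than the most populated one therefore sits RT ln 9 above the minimum of the PMF at the chosen temperature.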


Figure A.7: PMFs of Ramachandran plots for Gly in (GV)18 in CHARMM22*, OPLS-

AA/L and CHARMM36 force fields.


Figure A.8: PMFs of Ramachandran plots for Val in (GV)18 in CHARMM22*, OPLS-

AA/L and CHARMM36 force fields.


Figure A.9: PMFs of Ramachandran plots for Gly in G36 in CHARMM22*, OPLS-AA/L

and CHARMM36 force fields.


A.2.3 Force Fields Comparison for Dipeptides In Vacuo

Potential Energy Map The potential energy maps of the dipeptides of 3 residues, Gly, Val and Pro, in 4 force fields, CHARMM22*, CHARMM36, OPLS-AA/L and ff99SB*-ILDN, are shown in Figures A.10, A.11 and A.12. The maps of CHARMM22* and CHARMM36 are exactly the same for all three dipeptides, because the system of a single dipeptide is too simple to distinguish the two. The maps of the Gly dipeptide are very different among force field families. In particular, the peaks and troughs of the potential energy differ drastically. In OPLS-AA/L and ff99SB*-ILDN, the favored region of φ-ψ combinations is close to that of the extended structures, while in the CHARMM force fields it is closer to the helical region. For the Val dipeptide, the peaks and troughs are much closer to each other among force fields than those of the Gly dipeptide, though not exactly the same. The trough of the potential energy in the CHARMM force fields favors the right part of the region of extended structures, whereas ff99SB*-ILDN favors its left part and OPLS-AA/L displays no preference. In contrast, the potential energy maps of the Pro dipeptide are nearly the same among the different force fields.

A.3 Discussion

The force field comparison on (GVPGV)7 suggests that CHARMM22* is the favored force field, but CHARMM22* produces a significant artifact for (GV)18. We think that the sequence repetitiveness of (GV)18 probably amplifies the zigzag effect. The comparison of the PMFs of the Ramachandran plots of (GV)18 and G36, as well as of the potential energy maps of the dipeptides in different force fields, suggests that the problem is likely to be caused by Gly, since it is where the force fields differ most from each other, at least at the residue level.

As mentioned in the background, this work is still ongoing. Currently, the major bottleneck is resolving the artifact by obtaining better parameters for the Gly backbone dihedral angles in the CHARMM force fields, which probably requires help from the CHARMM force field developers.

[Figure A.10 plot: potential energy maps on the φ-ψ plane, with tick labels from −150 to 150 on both axes. Panels: Gly, CHARMM22*; Gly, CHARMM36; Gly, OPLS-AA/L; Gly, ff99SB*-ILDN. Color scale: 0 to 120 kJ/mol.]

Figure A.10: Potential energy maps of the Gly dipeptide in different force fields.


[Figure A.11 plot: potential energy maps on the φ-ψ plane, with tick labels from −150 to 150 on both axes. Panels: Val, CHARMM22*; Val, CHARMM36; Val, OPLS-AA/L; Val, ff99SB*-ILDN. Color scale: 0 to 120 kJ/mol.]

Figure A.11: Potential energy maps of the Val dipeptide in different force fields.


[Figure A.12 plot: potential energy maps on the φ-ψ plane, with tick labels from −150 to 150 on both axes. Panels: Pro, CHARMM22*; Pro, CHARMM36; Pro, OPLS-AA/L; Pro, ff99SB*-ILDN. Color scale: 0 to 360 kJ/mol.]

Figure A.12: Potential energy maps of the Pro dipeptide.


A.4 Material & Methods

All the peptides, including the dipeptides, are capped with an N-terminal acetyl group and a C-terminal amide group.

(GVPGV)7 The same setup parameters apply to simulations in all 8 force field sets shown in Table A.1. We simulated 40 300-ns replicas with the first 150 ns truncated as equilibration, which results in 6 µs of sampling time (40 × 150 ns) in each force field set. The initial structures of the peptides were generated at 300 K in vacuo. All simulations were performed at constant pressure (1 bar) and constant temperature (300 K) with periodic boundary conditions. The simulation package used was Gromacs-4.5.5. The LINCS algorithm was used to constrain all bond lengths [47, 46], and an integration time step of 2 fs was applied. A cutoff of 1.4 nm was used for Lennard-Jones interactions. The PME algorithm [26, 33] was used to calculate long-range electrostatic interactions with a Fourier spacing of 0.12 nm and an interpolation order of 4. The Nosé-Hoover thermostat [88, 50] was used for temperature coupling, with the peptide and the solvent coupled to two separate temperature baths and a time constant of 2 ps. The Parrinello-Rahman barostat [91] was used for pressure coupling with a time constant of 2 ps.
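As an illustration of how the settings above map onto a Gromacs parameter (.mdp) file, here is a small sketch that renders them as key = value lines. The option names are standard Gromacs .mdp keys. The entries in ASSUMED are not stated in the text (integrator, the directions of periodicity, the names of the coupling groups, the coupling type and the compressibility) and are placeholders only. This is not the actual input file used for the simulations.

```python
# Sketch: render the stated simulation settings as a Gromacs .mdp fragment.

# Values taken from the text; keys are Gromacs .mdp option names.
STATED = {
    "dt": "0.002",                    # 2 fs integration time step, in ps
    "constraints": "all-bonds",       # all bond lengths constrained
    "constraint_algorithm": "lincs",
    "rvdw": "1.4",                    # Lennard-Jones cutoff, in nm
    "coulombtype": "PME",
    "fourierspacing": "0.12",         # in nm
    "pme_order": "4",                 # interpolation order
    "tcoupl": "nose-hoover",
    "tau_t": "2.0 2.0",               # ps, one value per temperature bath
    "ref_t": "300 300",               # K
    "pcoupl": "Parrinello-Rahman",
    "tau_p": "2.0",                   # ps
    "ref_p": "1.0",                   # bar
}

# NOT stated in the text: assumptions added only to complete the fragment.
ASSUMED = {
    "integrator": "md",
    "pbc": "xyz",
    "tc_grps": "Protein Non-Protein",
    "pcoupltype": "isotropic",
    "compressibility": "4.5e-5",
}


def render_mdp(options):
    """Render a dict of options as aligned 'key = value' lines."""
    width = max(len(key) for key in options)
    return "\n".join(f"{key:<{width}} = {value}" for key, value in options.items())


mdp_text = render_mdp({**ASSUMED, **STATED})
print(mdp_text)
```

Keeping the stated and the assumed options in separate dictionaries makes it obvious which lines would have to be checked against the real input files before reuse.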

(PGV)12, (GV)18, G36 For simulations in CHARMM22*, please refer to Section 4.4. For simulations in OPLS-AA/L, please refer to Section 3.5. For simulations in CHARMM36, all the technical setup was the same as in CHARMM22* except for the force field. In the H-bonding maps, the value for each possible intramolecular peptide-peptide H-bond was calculated as the number of frames in which it appears along the trajectory, normalized by the total number of frames and averaged over all replicas.
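The normalization of the H-bonding map can be written down in a few lines. The sketch below assumes that the H-bonds present in each frame have already been detected by some other tool and are given as sets of (acceptor residue, donor residue) pairs. The data layout and the function name are hypothetical, and the detection criteria are outside the scope of the example.

```python
# Sketch: occupancy of each H-bond, normalized per replica and then averaged.
from collections import defaultdict


def hbond_map(replicas):
    """Return {(acceptor, donor): occupancy averaged over replicas}.

    replicas: list of trajectories; each trajectory is a list of frames;
    each frame is a set of (acceptor_residue, donor_residue) pairs.
    """
    totals = defaultdict(float)
    for frames in replicas:
        n_frames = len(frames)
        for frame in frames:
            for bond in frame:
                totals[bond] += 1.0 / n_frames   # fraction of frames in this replica
    return {bond: occ / len(replicas) for bond, occ in totals.items()}


# Made-up input: two short replicas of different length.
replica_a = [{(1, 4), (3, 6)}, {(3, 6)}, {(1, 4), (3, 6)}, {(3, 6)}]
replica_b = [{(1, 4)}, set()]
occupancy = hbond_map([replica_a, replica_b])
```

In this toy input, the (1, 4) bond is present in half of the frames of each replica, while the (3, 6) bond is always present in one replica and never in the other, so both end up with an averaged occupancy of 0.5.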


Dipeptides The potential energy was calculated from the structure after energy minimization, during which a quadratic term was added to the potential energy function to restrain the φ and ψ dihedral angles to their respective target values. Please note that the final potential energy was calculated without the quadratic term. The target values for each dihedral angle were varied from −180° to 170° at 10° intervals. To compare among the force fields, the minimum of each potential energy map was shifted to the same level as that in CHARMM22*.
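The scan described above amounts to a 36 × 36 grid of (φ, ψ) targets, followed by a constant shift of each map. The sketch below illustrates both steps. The energies are made-up toy numbers and the function name is hypothetical; the restrained minimization itself is not shown.

```python
# Sketch: the (phi, psi) scan grid and the alignment of map minima.
from itertools import product

angles = range(-180, 180, 10)             # -180, -170, ..., 170 degrees
grid = list(product(angles, angles))      # 36 x 36 = 1296 (phi, psi) targets


def align_minimum(energy_map, reference_map):
    """Shift energy_map so that its minimum equals the minimum of reference_map."""
    shift = min(reference_map.values()) - min(energy_map.values())
    return {point: energy + shift for point, energy in energy_map.items()}


# Toy maps with two grid points each (made-up energies in kJ/mol).
reference = {(-180, -180): -50.0, (-60, -40): -72.0}    # plays the role of CHARMM22*
other = {(-180, -180): 10.0, (-60, -40): 31.0}
aligned = align_minimum(other, reference)
```

Only the absolute level of each map changes in this step, so the energy differences within a map, which are what the figures compare, are preserved.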

Appendix B

sumcoresg

sumcoresg, which stands for sum up the cores usage for a research group, is a web

application (web app) that collects, analyzes and presents usage data of computational

resources on multiple computer clusters.

B.1 Motivation

Our group has been allocated a significant amount of computational resources in recent years by Compute Canada (https://computecanada.ca/), so we think it necessary to track our usage in order to fully utilize these resources. Since the allocated resources are distributed over multiple computer clusters across the country, usage tracking also helps our group avoid underutilizing some clusters while overutilizing others, which would result in unnecessary queueing time before jobs start running.

We started by assigning each cluster to a group member who would be in charge of collecting the usage data for that particular cluster. The data was collected by executing a customized script, keeping it running without hanging up, and restarting it immediately after a cluster shutdown and reboot. At the end of each week, each member would report the data collected during the past week to a group leader, who would compile all the results and discuss them in the upcoming group meeting. This process was rather tedious, inefficient and error-prone. For example, people all used their own scripts for their own clusters, hence it was uncertain whether they all calculated the usage in the same way. Also, it was not always easy to notice a cluster shutdown and reboot in time, which could result in the loss of one or two days’ data.

Later on, one of our group members, Chris Ing, wrote a script that could collect the current usage data from all allocated clusters at once when executed, which made the data collection process much less laborious. However, the downside of Chris’ code was that it still needed to be executed manually, which limited the frequency of data collection. Besides, group members could not visualize the usage data freely until it was presented.

Therefore, I decided to build a web application to automate the whole process, which includes data collection, analysis and presentation. One major advantage of a web application over a desktop program is that nothing needs to be installed on the client side other than a web browser and an Internet connection, which are available by default on most modern computers and cell phones.

This work turned into sumcoresg, which is accessible at http://usage.pomeslab.com at any time, but only authorized people are able to view the usage data. Currently, sumcoresg collects usage data from 8 computer clusters across Canada about every 10 minutes, which results in about 1000 data points per week (6 per hour × 24 hours × 7 days = 1008). Therefore, the usage data collected is reasonably accurate. After the data analysis, the latest usage information of all clusters of interest is shown in a table, which helps a user select a relatively less busy cluster and start running new jobs there. The historical usage data is visualized as a plot along time or as a bar chart. However, the usage data can essentially be presented in any way that feels straightforward and convenient.

B.2 Material & Methods

At the backend of sumcoresg, the language used is Python (http://www.python.org/), the web framework used is Flask (http://flask.pocoo.org/), the templating engine used is Jinja2 (http://jinja.pocoo.org/docs/templates/), and the database used is PostgreSQL (http://www.postgresql.org/).

At the frontend, the markup, styling and programming languages used are HTML, CSS and JavaScript, respectively.

The hosting service used is Heroku (https://www.heroku.com/), a cloud platform initially developed for Ruby on Rails (http://rubyonrails.org/) and later extended to support development in Python as well. The major advantage of Heroku is that it is free for small web applications.

The network protocol used for communication between the web server and the clusters of interest is Secure Shell version 2 (SSH-2), and the Python module used that implements it is paramiko (http://www.lag.net/paramiko/). The network protocol used for communication between the web server and users is the Hypertext Transfer Protocol (HTTP).

All the code scripts and folders in sumcoresg are summarized in Table B.1. The source code will be available upon request.



sumcoresg.py
    contains all URL handlers. The important ones include main, login, signup, logout, report, plot, plot_dur, histo and pomeslab_png. It also contains two functions, one for starting to collect data (start_collecting_data) and one for starting the application (start_app_run).

app_config.py
    includes global configurations.

thedata.py
    includes constant variables.

util.py
    includes utility functions.

obj.py
    includes Python classes, e.g. Cluster, Report.

statparsers.py
    includes the parsers for processing usage data, which are in XML format, fetched from different clusters.

data_collector.py
    includes functions for data collection, processing and presentation.

db_tables.py
    includes table schemas used in the database, e.g. Usage, Account, Figure.

manage.py
    includes management functions for the initial launch of the app.

distribute_pub_key.py
    for distributing the SSH public key to different clusters once it is updated. To save trouble, it is better to use this script after confirming that all the computer clusters of interest are up.

write_xml.py
    based on CLUSTER_TAGS, CLUSTER_DATA and USER_DATA in thedata.py, it generates static/xml/clusters.xml and static/xml/users.xml, which contain the configurations for clusters and users. Each time the above three variables are updated, write_xml.py needs to be executed so that clusters.xml and users.xml are up to date. In hindsight, writing the configurations directly into the database would have been much more convenient and easier to update.

queue_data.py
    for importing the data that was manually collected previously. This code is now deprecated.

generate_key.sh
    contains a code sample for generating a new SSH key pair, i.e. both public and private keys.

templates
    This folder contains all HTML templates.

static
    This folder contains all non-dynamically generated files, e.g. css files and js files.

afternoon.backup
    This folder contains backups for usage data.

Makefile
    contains make rules for daily maintenance of the app.

matplotlibrc
    used by matplotlib [54], it contains customizations for decorating the plots.

Procfile
    Heroku specific file, contains the code to be executed when the app is launched.

requirements.txt
    Heroku specific file, contains the modules that need to be installed for launching the app.

runtime.txt
    Heroku specific file, specifies a specific runtime.

.sumcoresgk.pub
    contains the public SSH key, which needs to be stored in the ${HOME}/.ssh/authorized_keys file on each computer cluster of interest. It should not be version controlled in order to reduce the risk of being attacked.

.sumcoresgk
    contains the private SSH key, which is needed by the web server to interact with the computer clusters. It should never be made public. To reduce the risk of being attacked, the key pair should be regenerated regularly, and the new public key distributed to all clusters using distribute_pub_key.py.

Table B.1: Summary of scripts and folders in sumcoresg.

B.3 Workflow

The workflow of sumcoresg is illustrated in Figure B.1. The web server first connects to the target computer cluster over SSH and executes the command that generates the latest usage data on that cluster. The specific command depends on which queueing system is installed on the particular cluster and how it is configured. Generally, on a system with Moab/Torque installed, the command looks like

/path/to/showq/or/qstat -some -options --format=xml

where --format=xml makes sure the data returned is in XML format. The returned usage data is received by the web server, and then processed and formatted in HTML. The formatted result is cached in memory with memcached in order to speed up the response when a user visits the web application. The interaction between the web server and the computer cluster takes place at a specified time interval (e.g. 10 minutes).
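The server-side part of this workflow can be sketched as follows. This is not the sumcoresg source code. The XML layout, the attribute names (User, ReqProcs), the user names and the TimedCache class are all hypothetical, the SSH step is replaced by a stub that returns a fixed string, and a small in-process cache stands in for memcached.

```python
# Sketch: parse queue data in XML, sum the cores in use, and cache the result.
import time
import xml.etree.ElementTree as ET

# Hypothetical XML in the spirit of `showq --format=xml`; element and attribute
# names are assumptions for illustration, not real cluster output.
SAMPLE_XML = """
<Data>
  <queue option="active">
    <job User="alice" ReqProcs="64"/>
    <job User="bob" ReqProcs="128"/>
  </queue>
  <queue option="eligible">
    <job User="alice" ReqProcs="32"/>
  </queue>
</Data>
"""


def cores_in_use(xml_text, users):
    """Sum the cores of active jobs that belong to the given users."""
    root = ET.fromstring(xml_text.strip())
    total = 0
    for queue in root.findall("queue"):
        if queue.get("option") != "active":
            continue
        for job in queue.findall("job"):
            if job.get("User") in users:
                total += int(job.get("ReqProcs"))
    return total


class TimedCache:
    """In-process stand-in for memcached: the value expires after ttl seconds."""

    def __init__(self, ttl, clock=time.time):
        self.ttl, self.clock = ttl, clock
        self.value, self.stamp = None, None

    def get(self, refresh):
        now = self.clock()
        if self.stamp is None or now - self.stamp >= self.ttl:
            self.value, self.stamp = refresh(), now
        return self.value


def fetch_usage():
    # In a real deployment this step would run the queue command over SSH
    # (for example with paramiko) and parse its output; here it is stubbed.
    return cores_in_use(SAMPLE_XML, users={"alice", "bob"})


cache = TimedCache(ttl=600)          # refresh at most every 10 minutes
latest = cache.get(fetch_usage)      # cores used by active jobs of the group
```

Unlike the workflow described above, where the server polls the clusters on a fixed schedule, this sketch refreshes lazily on the first request after the cached value has expired. The effect for the visitor is the same: requests are answered from the cache instead of waiting for the cluster.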


When a user visits the web application over HTTP, the server responds with the cached result immediately. After the user receives it, the browser on the user’s computer generates the graph for visualization. In contrast to the server-cluster interaction, the interaction between the web server and a user only happens when the user visits the website.

In addition, in order to restrict access to authorized users, only those who know a secret code, which is set up by the webmaster, are able to sign up and then log in to view the usage data. However, please note that this is not a very strong authentication system, and it could become insecure as the user base grows.

B.4 Screen Shots

Figures B.1–B.3 show what the data look like as of writing. Figure B.1 shows the latest report on usage, which is updated about every 10 minutes. Figures B.2 and B.3 show the historical usage data along time and in a bar chart, respectively. As mentioned above, the data presentation can be versatile.

B.5 Future Directions

The current code works very well, but there are still many ways to optimize and improve

the code both for readability and development of new features in the future.

1. Reimplement the configurations of clusters and users in the database in replacement

of write_xml.py, static/xml/clusters.xml and static/xml/users.xml, and

CLUSTER_TAGS, CLUSTER_DATA and USER_DATA variables in thedata.py.

2. Further modularize data_collector.py, and rewrite big functions into smaller ones.

3. Isolate URL handlers for the user system (e.g. login, signup, logout) from sumcoresg.py into a separate script.

4. Build a content management system to make addition and removal of clusters and users easy.

5. Build more interactive ways of data visualization.

6. Generalize the app so that it can be easily adopted by any research group that uses multiple clusters simultaneously.

[Figure B.1 diagram: the web server runs showq/qstat on the computer cluster over SSH2, and the cluster returns the usage data in XML. The web server receives, processes and formats the usage data, and then caches the results. A visitor sends the request "GET / HTTP/1.1, Host: usage.pomeslab.com" and receives an HTML page titled "Latest report".]

Figure B.1: Workflow of sumcoresg. Please see the detailed description in the text. At the end of the screen shot, results from more computer clusters are omitted, as indicated by the ellipsis. The cartoons of the web server and the visitor were downloaded directly from clker.com (http://www.clker.com/). The cartoon of the computer cluster was drawn by duplicating a unit of 4 computers, which is also from clker.com.

Figure B.2: Historical usage data along time. The y axis shows the usage as a percentage of the allocated core hours. The numbers of allocated cores in the legend are arbitrary and for illustration purposes only. The dashed line indicates 100% usage; values below or above it mean that the cluster is temporarily being underutilized or overutilized, respectively. The density of data points is actually much higher than what is shown in the figure. The data resolution was decreased because the figure would otherwise be very large in size and take a long time to load.

Figure B.3: Historical usage data in a bar chart. A bar chart is good for summarizing the usage data over a long period of time. The x axis shows the names of the clusters being tracked, and the y axis shows the usage as a percentage of the allocated core hours. Bars in green and red indicate overutilization and underutilization, respectively. The dashed line indicates 100% usage. The title means the data is a summary of the usage data since the beginning of the year.

Appendix C

xit

xit is a program that eases the process of system set up and analysis for multiple replica

MD simulations.

C.1 Motivation

For MD simulations, it is routine to set up multiple replicas, submit jobs, analyze the trajectories, and visualize the results. Newcomers usually do each step separately, which ends up in a number of scripts distributed all over the system. Even worse, when it comes to a new set of systems, old scripts are likely to be copied and modified so as to be adapted to the new systems. As a result, it does not take long to end up with many different but highly similar files all over the place, which is a big challenge for maintenance. Therefore, it is necessary to generalize the process and write a single program that is able to handle all kinds of routine jobs, and that is also highly extensible and easy to adapt to new projects. This program turned out to be xit. Besides, xit also implements a queueing mechanism for executing most of the jobs it is capable of in parallel via multi-threading.

The following list provides a more specific description of what xit does.

1. Set up simulation replicas, and then submit jobs to the queue on the compute cluster for calculations. Given the templates of input files, xit will generate the code to be run for each replica, and then submit it to the queueing system. This step is done via xit prep --some --options.

2. After the simulations are done, xit handles all kinds of analysis. This step is done via xit anal --some --options.

3. After each type of analysis, if the result is not in the most convenient format, especially when the analysis code is not self-written, it may need to be transformed first and then stored properly in a data file. This step is done via xit transform --some --options.

4. After the transformation, the result is almost ready for visualization. The command to generate a figure is xit plot --some --options.

5. xit plot is more appropriate for visualizing the analysis of a single property. When it comes to multiple properties, use the command xit plotmp --some --options, where mp means multiple properties.

In all of the above steps, xit takes care of looping through all replicas.
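The subcommand structure listed above maps naturally onto Python's argparse module. The following is a minimal sketch of such a command-line layout, not the actual xit code. Only the subcommand names and the --vars and --test options are taken from this appendix; the function names and the return value are illustrative.

```python
# Sketch: a command-line interface with one subparser per subcommand.
import argparse

SUBCOMMANDS = ["prep", "anal", "transform", "plot", "plotmp"]


def build_parser():
    parser = argparse.ArgumentParser(prog="xit")
    subparsers = parser.add_subparsers(dest="subcommand", required=True)
    for name in SUBCOMMANDS:
        sub = subparsers.add_parser(name)
        # Variables that identify the replicas to loop over
        sub.add_argument("--vars", nargs="+", default=[])
        # Dry run: print the commands instead of executing them
        sub.add_argument("--test", action="store_true")
    return parser


def dispatch(argv):
    """Parse argv and return what would be handed to the chosen subcommand."""
    args = build_parser().parse_args(argv)
    return args.subcommand, args.vars, args.test


result = dispatch(["anal", "--vars", "sq3 sq6", "w m", "[00-09]", "--test"])
```

Each subcommand gets its own subparser, so options that only make sense for one step (for example a plot type) can be added there without affecting the others.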

C.2 Material & Methods

xit is written purely in Python. The file format chosen for storing the configuration file, which is project specific, is YAML (http://www.yaml.org/spec/1.2/spec.html). YAML turns out to be very powerful and useful. The Python module used to parse the configuration files is PyYAML (http://pyyaml.org/). The file format chosen for storing analyzed and transformed results is HDF5 (http://www.hdfgroup.org/HDF5/), and the corresponding Python module used is PyTables (http://www.pytables.org/). The templating engine used is Jinja2 (http://jinja.pocoo.org/docs/), both for templating topology files and for generating replica-specific analysis commands.

Currently, all of the subcommands of xit, i.e. prep, anal, transform, plot and plotmp, have been written, and new types of analysis and plots are constantly being added. Out of the five subcommands, prep and anal are always executed in parallel.

All the code scripts and folders in xit are summarized in Table C.1. The source code is available upon request.

xit.py
    processes the command-line arguments and then invokes one of the following functions: prep, anal, transform, plot, plotmp, based on the subcommand typed.

prep.py
    handles routine work to set up multiple simulation replicas, for example, creating directories, copying over initial structure files (e.g. pdb or gro files), and templating topology files (e.g. top files) and scripts for system equilibration. Please note that xit does not have code for equilibrating the MD system, but it can generate the code to do that based on an input template file.

anal.py
    handles different kinds of analysis and invokes the corresponding functions in the analysis_methods directory.

transform.py
    handles different kinds of transformation of the raw results generated by other analysis codes (e.g. parsing xvg or xpm files generated by many Gromacs tools), and stores the results in an HDF5 file.

plot.py
    prepares the data to be plotted and invokes the corresponding plotting function in the plot_types directory.

plotmp.py
    similar to plot.py, but handles plotting of multiple properties and uses plotting functions in the plotmp_types directory.

xutils.py
    includes the function for parsing the command-line arguments. The function is called by the function main in xit.py.

prop.py
    includes all table schemas for storing the analysis results in an HDF5 file.

objs.py
    includes Python class objects.

utils.py
    includes various utility functions.

analysis_methods
    This folder contains files for all types of analysis. When adding a new type of analysis, please modify one of the files or add a new file to this folder. If a new file is added, please edit analysis_methods/__init__.py to make sure it is properly imported.

plot_types
    This folder contains scripts for generating plots of different types. Similar to analysis_methods, modify one of the files in it or add a new file to this folder when a new type of plot is needed. If a new file is added, please edit plot_types/__init__.py to make sure it is properly imported.

plotmp_types
    Similar to plot_types, but scripts in this folder plot multiple properties in a single figure.

.xitconfig.yaml
    This is the configuration file, which contains all project-specific information. It should be located in the root of the project directory.

Table C.1: Summary of scripts and folders in xit.

C.3 Usage Examples

In this section, an example is shown for using each of the subcommands, prep, anal,

transform, plot, plotmp.

The following command will make directories for 10 replicas (from 00 to 09) of systems

of all combinations of sequence (sq) 3, 4, 5, 6 in water (w), methanol (m) and ethanol (e).

In total, there will be 120 jobs (10 replicas × 12 systems). The naming of the sequences

and solvents is totally arbitrary.

xit prep --vars sq[3-6] 'w m e' [00-09] --prepare mkdir
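To show where the number 120 comes from, the sketch below expands the three --vars arguments of the command above and forms all combinations. The expansion rules (a space-separated list, and a numeric range in square brackets with zero padding preserved) are inferred from this example only. The function is hypothetical and not taken from the xit source.

```python
# Sketch: expand --vars arguments and enumerate all replica combinations.
import re
from itertools import product


def expand(token):
    """Expand one --vars argument into a list of values.

    'w m e'   -> ['w', 'm', 'e']
    'sq[3-6]' -> ['sq3', 'sq4', 'sq5', 'sq6']
    '[00-09]' -> ['00', '01', ..., '09']
    """
    values = []
    for word in token.split():
        match = re.fullmatch(r"(.*)\[(\d+)-(\d+)\](.*)", word)
        if match is None:
            values.append(word)
            continue
        prefix, start, stop, suffix = match.groups()
        width = len(start)                      # keep the zero padding of the range
        for n in range(int(start), int(stop) + 1):
            values.append(f"{prefix}{n:0{width}d}{suffix}")
    return values


var_lists = [expand(token) for token in ["sq[3-6]", "w m e", "[00-09]"]]
replicas = list(product(*var_lists))    # 4 sequences x 3 solvents x 10 replicas
```

The product of the three expanded lists contains 4 × 3 × 10 = 120 combinations, from ('sq3', 'w', '00') to ('sq6', 'e', '09'), one per job.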

The following command will analyze the radius of gyration of all Cα atoms of the backbone for 10 replicas (from 00 to 09) of systems of sequence 3 and 6 in water and methanol. --nolog means no log files will be generated; instead, the standard output (stdout) and standard error (stderr) will be printed to the screen directly. Without --nolog, a log file will be generated for each replica. A very handy option for debugging purposes is --test, which will print the commands to be executed for each replica instead of actually executing them. It is like a dry run.

xit anal --vars 'sq3 sq6' 'w m' [00-09] --analysis rg_c_alpha --nolog

The following command will transform the results in xvg format to a proper one as specified in prop.py, and then store them in an HDF5 file, which is specified in the configuration file, .xitconfig.yaml. Previously transformed results can be overwritten by appending the option --overwrite.

xit transform --vars 'sq3 sq6' 'w m' [00-09] --property rg_c_alpha \

--filetype xvg

The following command will plot the transformed rg_c_alpha results as a bar chart.

The option --grptoken path2 is used to decide how the replicas should be grouped. In

this example, it is assumed that the directories with a deeper level than path2 represent

the replica numbers of a particular system, so they should be grouped together and only

their average will be used for plotting. The calculated values will also be stored in the

HDF5 file in order to reduce the time of re-plotting. The calculated values can also be

overwritten by appending the option --overwrite.

xit plot --vars 'sq3 sq6' 'w m' [00-09] --property rg_c_alpha \

--plot_type bars --grptoken path2
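The grouping controlled by --grptoken can be pictured as follows. This is a minimal sketch, assuming replica directories of the form level1/level2/replica and made-up radius-of-gyration values. The function name group_average and the data are hypothetical and are not part of xit.

```python
from collections import defaultdict
from statistics import mean


def group_average(values_by_path, depth=2):
    """Average values over all paths that share their first `depth` levels."""
    groups = defaultdict(list)
    for path, value in values_by_path.items():
        key = "/".join(path.split("/")[:depth])
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


# Made-up per-replica values (nm), keyed by directory path.
rg = {
    "w300/sq3/00": 1.0,
    "w300/sq3/01": 1.2,
    "m300/sq3/00": 1.5,
    "m300/sq3/01": 1.7,
}

# depth=2 plays the role of path2: everything deeper is a replica.
averages = group_average(rg, depth=2)
for system, value in sorted(averages.items()):
    print(system, round(value, 2))
```

Only one value per system remains after grouping, which is what a bar chart with one bar per system needs.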

The following command is an example of plotting the two properties, Property 1 (p1)

and Property 2 (p2) on the x and y axes respectively, as indicated by the name of

plotmp_type, xy. The option, --overwrite, works for xit plotmp as well.

xit plotmp --vars 'sq3 sq6' 'w m' [00-09] --properties p1 p2 \

--plotmp_type xy --grptoken path2 --overwrite

For a new project, new code needs to be written only if a new type of analysis or plot is required. Otherwise, what is needed is usually just a new version of the configuration file.

Below is part of an example configuration file in YAML format, where the text after # is a comment.

systems:
  # variables used to identify a single replica of
  # a particular system
  var1: [sq1, sq2, sq3, sq4, sq5, sq6]
  var2: [w, m]
  var3: ['00','01','02','03','04','05','06','07','08','09']

  dir1: '{var2}300'           # dir of level 1
  dir2: '{var1}'              # dir of level 2
  dir3: '{var3}'              # dir of level 3
  id  : '{var1}{var2}{var3}'  # a unique id for each replica

data:
  repository: 'repository'    # dir containing mdp, templates, etc.
  analysis  : 'analysis'      # dir containing plain text results
  plots     : 'plots'         # dir for storing plotted figures
  log       : 'log'           # dir for storing logs

hdf5:
  title   : 'in water and methanol'
  filename: 'mono_meo.h5'

# includes another YAML file which contains configurations about
# different types of analysis
anal: !include .xitconfig_anal.yaml

# configurations for plotting
plot:
  rg_c_alpha:                   # property to be plotted
    bars:                       # name of a particular plot type
      ylabel: {ylabel: $R_g$}   # y label using LaTeX syntax
    grped_bars:                 # name of a particular plot type
      grp_REs: ['w300/sq[1-6]', 'm300/sq[1-6]']
      ylabel: {ylabel: $R_g$, labelpad: 10}
      xticklabels:
        labels: ['(GVPGV)7', '(PGV)12', '(GGVGV)7',
                 '(GVGVA)7', '(GV)18', '(G)35']
        rotation: 15
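To see how the dir1, dir2, dir3 and id templates in the systems section might resolve to concrete replica directories, here is a minimal sketch using Python's str.format. It assumes that each template is filled with one combination of var1, var2 and var3. How xit itself performs the substitution is not shown in this appendix, so the code is an illustration only.

```python
import itertools

# Values taken from the example configuration (lists shortened).
variables = {
    "var1": ["sq1", "sq2"],
    "var2": ["w", "m"],
    "var3": ["00", "01"],
}
templates = {
    "dir1": "{var2}300",
    "dir2": "{var1}",
    "dir3": "{var3}",
    "id": "{var1}{var2}{var3}",
}

replicas = {}
names = sorted(variables)
for combo in itertools.product(*(variables[name] for name in names)):
    values = dict(zip(names, combo))
    # Fill the three directory templates and join them into one path.
    path = "/".join(
        templates[level].format(**values) for level in ("dir1", "dir2", "dir3")
    )
    replicas[templates["id"].format(**values)] = path

print(replicas["sq1w00"])  # w300/sq1/00
```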

C.4 Future Directions

xit is like a pipeline for setting up MD simulations and analyzing the resultant data.

As more analysis methods and plotting types are added, xit will become more feature-rich

and more useful.

Appendix D

tprparser

tprparser is a component of MDAnalysis, a popular package for analyzing trajectories from multiple MD simulation packages. It parses tpr files generated by Gromacs and extracts useful topology information.

D.1 Motivation

A tpr file contains all the information about the structural topology and running parameters of an MD system in Gromacs. To start an MD run, Gromacs needs to extract all the information from a tpr file. Although a tpr file contains all the useful information, its file structure is poorly documented, which limits its use by other MD analysis packages such as MDAnalysis[81]. Prior to the work described in this appendix, when using MDAnalysis to analyze trajectories generated by Gromacs, pdb or gro files had to be used to obtain the structural topology. However, the information contained in these files is limited compared with that in a tpr file. For example, the charges of individual atoms are not available in either pdb or gro files. Therefore, there is a need in the community for a tpr parser that can interact with MDAnalysis, and this need was first raised in 2008 (see https://code.google.com/p/mdanalysis/issues/detail?id=2 for a chronological discussion on this topic). A working tprparser has now been written and will be included in the upcoming 0.8 release of MDAnalysis.

D.2 Material & Methods

MDAnalysis is written in Python, and so is tprparser. A tpr file is written in External Data Representation (XDR) format, a standard for the description and encoding of data that is used for transferring data between different computer architectures (http://tools.ietf.org/html/rfc4506). In XDR, data is serialized. The full description of the XDR format can be obtained from RFC 4506 at the above URL. In short, XDR uses a base unit of 4 bytes to represent all items in the data. The Python package used for encoding and decoding XDR data is called xdrlib. xdrlib was written following RFC 1014 (http://tools.ietf.org/html/rfc1014) of 1987, which was obsoleted by RFC 1832 (http://tools.ietf.org/html/rfc1832) in 1995, which was in turn obsoleted by RFC 4506 in 2006, but without technical changes. Based on trial and error and communication with the Gromacs developers on the mailing list, it turns out that there is no XDR version incompatibility between the tpr file and xdrlib.
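To make the 4-byte base unit concrete, the following sketch encodes and decodes an integer and a variable-length string by hand with Python's struct module, following the rules in RFC 4506: integers are 4 big-endian bytes, and strings are prefixed by a 4-byte length and zero-padded to a multiple of 4 bytes. The function names are hypothetical, and this is not the code used in tprparser, which relies on xdrlib.

```python
import struct


def pack_int(value):
    """Encode a signed 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">i", value)


def pack_string(data):
    """Encode bytes as a 4-byte length followed by zero-padded content."""
    padding = (4 - len(data) % 4) % 4
    return struct.pack(">I", len(data)) + data + b"\x00" * padding


def unpack_int(buffer, offset):
    """Return (value, new_offset) for an integer starting at offset."""
    (value,) = struct.unpack_from(">i", buffer, offset)
    return value, offset + 4


def unpack_string(buffer, offset):
    """Return (bytes, new_offset) for a string starting at offset."""
    (length,) = struct.unpack_from(">I", buffer, offset)
    start = offset + 4
    padded = (length + 3) // 4 * 4  # round up to a multiple of 4
    return buffer[start:start + length], start + padded


encoded = pack_int(42) + pack_string(b"hello")
print(len(encoded))  # 4 (int) + 4 (length) + 8 (5 bytes padded to 8) = 16

number, position = unpack_int(encoded, 0)
text, position = unpack_string(encoded, position)
print(number, text, position)
```

Because every item occupies a whole number of 4-byte units, a decoder only needs to track a single offset while reading items in the order in which they were written.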

Since only decoding a tpr file, i.e. extracting useful information from it, is of interest here, we will not discuss how to encode one. What needs to be done, therefore, is to follow the structure of the Gromacs source code, which is written in C, work out how it decodes a tpr file, and then implement the same routines in Python. Once all the necessary information is decoded, it needs to be organized in a form that can be used by MDAnalysis.


D.3 Results

The structure of the tpr file turns out to be quite convoluted, which may be why no parser had been written for it in the five years since the need was first raised. What makes the routines even more difficult to follow is that the structure of the tpr file changes with every major Gromacs release. This can be seen in the numerous “if . . . elif . . . else . . .” structures in the source code, which result from keeping Gromacs backward compatible.
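The version-dependent branching described above can be illustrated with a toy format. The sketch below is purely hypothetical: the layout, field names and version numbers are invented for illustration and do not correspond to the real tpr format or to the Gromacs source.

```python
import struct


def read_header(buffer):
    """Parse a toy header whose layout depends on its version field.

    Invented layout: version 1 stores only the atom count, version 2
    adds a residue count, and version 3 and later add a precision flag.
    """
    offset = 0
    (version,) = struct.unpack_from(">i", buffer, offset)
    offset += 4
    (n_atoms,) = struct.unpack_from(">i", buffer, offset)
    offset += 4

    header = {"version": version, "n_atoms": n_atoms}
    if version >= 2:
        (header["n_residues"],) = struct.unpack_from(">i", buffer, offset)
        offset += 4
    if version >= 3:
        (flag,) = struct.unpack_from(">i", buffer, offset)
        header["double_precision"] = bool(flag)
        offset += 4
    return header, offset


old_file = struct.pack(">ii", 1, 100)
new_file = struct.pack(">iiii", 3, 100, 20, 1)
print(read_header(old_file))
print(read_header(new_file))
```

Each new release adds another branch of this kind, so a parser that supports many versions accumulates conditionals in the same way.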

Because the running parameters are rarely used when analyzing an MD trajectory, the

current version of the tprparser only extracts the structural information, which includes

atoms (number, name, type, resname, resid, segid, mass, charge, residue, segment, radius,

bfactor, resnum), bonds, angles, dihedral angles and improper dihedral angles. Accord-

ing to one of the major developers and maintainers of MDAnalysis, Oliver Beckstein,

tprparser is still the only available parser for reading tpr files written in pure Python

as of this writing.

D.4 Discussion

The major advantage of writing analysis tools using MDAnalysis in Python over using

the Gromacs template in C is that it is generally easier and faster to program in a high-level

language like Python than in a low-level language like C. However, there can also be

disadvantages. For example, regular Python code is much slower than the corresponding

C code that can do the same job. To speed up the process, the bottleneck part of the

slow Python code may need to be compiled into binary and then called by Python.

Currently, tprparser is capable of parsing tpr files generated by Gromacs versions 4.0.x–4.6.x. It is available only in the develop branch of MDAnalysis (https://code.google.com/p/mdanalysis/source/list?name=develop), but will be included in the upcoming 0.8 release.

Keeping up with the release of new tpr structures is probably the major challenge for

the future maintenance of tprparser. Ideally, the Gromacs developers could develop a

stable structure of the tpr file and have it well documented.


Bibliography

[1] B. B. Aaron and J. M. Gosline. Optical properties of single elastin fibres indicate

random protein conformation. Nature, 287:865–867, 1980.

[2] J. L. Abascal and C. Vega. A general purpose model for the condensed phases

of water: TIP4P/2005. The Journal of Chemical Physics, 123(23):234505–234512,

2005.

[3] M. P. Allen and D. J. Tildesley. Molecular Dynamics. In Computer Simulation of

Liquids, chapter 3, pages 71–109. Oxford University Press, 1989.

[4] J. Alper. Stretching the limits. Science, 297(5580):329–331, 2002.

[5] A. L. Andrady and J. E. Mark. Thermoelasticity of swollen elastin networks at

constant composition. Biopolymers, 19(4):849–855, 1980.

[6] M. Baer, E. Schreiner, A. Kohlmeyer, R. Rousseau, and D. Marx. Inverse tem-

perature transition of a biomimetic elastin model: Reactive flux analysis of fold-

ing/unfolding and its coupling to solvent dielectric relaxation. The Journal of

Physical Chemistry B, 110(8):3576–3587, 2006.

[7] C. Baldock, A. F. Oberhauser, L. Ma, D. Lammie, V. Siegler, S. M. Mithieux,

Y. Tu, J. Y. H. Chow, F. Suleman, M. Malfois, S. Rogers, L. Guo, T. C. Irving, T. J.

Wess, and A. S. Weiss. Shape of tropoelastin, the highly extensible protein that

controls human tissue elasticity. Proceedings of the National Academy of Sciences

of the United States of America, 108(11):4322–4327, 2011.

[8] K. A. Beauchamp, Y.-S. Lin, R. Das, and V. S. Pande. Are protein

force fields getting better? A systematic benchmark on 524 diverse NMR measure-

ments. Journal of Chemical Theory and Computation, 8(4):1409–1414, 2012.

[9] O. M. Becker, A. D. MacKerell, Jr., B. Roux, and M. Watanabe. Computational

Biochemistry and Biophysics. CRC Press, 2001.

[10] C. M. Bellingham, M. A. Lillie, J. M. Gosline, G. M. Wright, B. C. Starcher,

A. J. Bailey, K. A. Woodhouse, and F. W. Keeley. Recombinant human elastin

polypeptides self-assemble into biomaterials with elastin-like properties. Biopoly-

mers, 70(4):445–455, 2003.

[11] C. M. Bellingham, K. A. Woodhouse, P. Robson, S. J. Rothstein, and F. W. Keeley.

Self-aggregation characteristics of recombinantly expressed human elastin polypep-

tides. Biochimica et Biophysica Acta, 1550(1):6–19, 2001.

[12] H. J. C. Berendsen, D. van der Spoel, and R. van Drunen. GROMACS: A message-

passing parallel molecular dynamics implementation. Computer Physics Commu-

nications, 91(1-3):43–56, 1995.

[13] R. Best. Protein simulations with an optimized water model: cooperative helix for-

mation and temperature-induced unfolded state collapse. The Journal of Physical

Chemistry B, 114(46):14916–14923, 2010.

[14] R. B. Best, D. de Sancho, and J. Mittal. Residue-specific α-helix propensities from

molecular simulation. Biophysical Journal, 102(6):1462–1467, 2012.

[15] R. B. Best and G. Hummer. Optimized molecular dynamics force fields applied

to the helix-coil transition of polypeptides. The Journal of Physical Chemistry B,

113(26):9004–9015, 2009.

[16] R. B. Best, X. Zhu, J. Shim, P. E. M. Lopes, J. Mittal, M. Feig, and A. D. Mackerell,

Jr. Optimization of the additive CHARMM all-atom protein force field targeting

improved sampling of the backbone φ, ψ and side-chain χ1 and χ2 dihedral angles.

Journal of Chemical Theory and Computation, 8(9):3257–3273, 2012.

[17] B. Bochicchio, A. Pepe, and A. M. Tamburro. Investigating by CD the molecular

mechanism of elasticity of elastomeric proteins. Chirality, 20(9):985–994, 2008.

[18] S. L. Brazee and E. Carrington. Interspecific comparison of the mechanical prop-

erties of mussel byssus. The Biological Bulletin, 211(3):263–274, 2006.

[19] B. R. Brooks, R. E. Bruccoleri, B. D. Olafson, D. J. States, S. Swaminathan, and

M. Karplus. CHARMM: A program for macromolecular energy, minimization, and

dynamics calculations. Journal of Computational Chemistry, 4(2):187–217, 1983.

[20] J. E. Castle, A. M. Salvi, R. Flamia, and G. Satriano. Surface science aspects

of supramolecular conformation in elastin-like polypeptides. Surface and Interface

Analysis, 44(2):246–257, 2012.

[21] D. K. Chang and D. W. Urry. Polypentapeptide of elastin: Damping of internal

chain dynamics on extension. Journal of Computational Chemistry, 10(6):850–855,

1989.

[22] H. Chung, T. Y. Kim, and S. Y. Lee. Recent advances in production of recombinant

spider silk proteins. Current Opinion in Biotechnology, 23(6):957–964, 2012.

[23] M. I. S. Chung, M. Miao, R. J. Stahl, E. Chan, J. Parkinson, and F. W. Keeley. Se-

quences and domain structures of mammalian, avian, amphibian and teleost tropoe-

lastins: Clues to the evolutionary history of elastins. Matrix biology, 25(8):492–504,

2006.

[24] J. T. Cirulis. Self-Assembly and Fibre Formation of Elastin-Like Polypeptides. PhD thesis, 2009.

[25] W. D. Cornell, P. Cieplak, C. I. Bayly, I. R. Gould, K. M. Merz, D. M. Ferguson,

D. C. Spellmeyer, T. Fox, J. W. Caldwell, and P. A. Kollman. A Second Generation

Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules.

Journal of the American Chemical Society, 117(19):5179–5197, 1995.

[26] T. Darden, D. York, and L. Pedersen. Particle mesh Ewald: An Nlog(N) method

for Ewald sums in large systems. The Journal of Chemical Physics, 98(12):10089–

10092, 1993.

[27] K. A. Dill. Dominant Forces in Protein Folding. Biochemistry, 29(31):7133–7155,

1990.

[28] C. M. Dobson. Protein misfolding, evolution and disease. Trends in Biochemical

Sciences, 24(9):329–332, 1999.

[29] C. M. Dobson. Protein folding and misfolding. Nature, 426(6968):884–890, 2003.

[30] K. L. Dorrington and N. G. McCrum. Elastin as a rubber. Biopolymers, 16(6):1201–

1222, 1977.

[31] Y. Duan, C. Wu, S. Chowdhury, M. C. Lee, G. Xiong, W. Zhang, R. Yang,

P. Cieplak, R. Luo, T. Lee, J. Caldwell, J. Wang, and P. Kollman. A point-charge

force field for molecular mechanics simulations of proteins based on condensed-

phase quantum mechanical calculations. Journal of Computational Chemistry,

24(16):1999–2012, 2003.

[32] C. M. Elvin, A. G. Carr, M. G. Huson, J. M. Maxwell, R. D. Pearson, T. Vuo-

colo, N. E. Liyou, D. C. C. Wong, D. J. Merritt, and N. E. Dixon. Synthesis

and properties of crosslinked recombinant pro-resilin. Nature, 437(7061):999–1002,

2005.

[33] U. Essmann, L. Perera, M. L. Berkowitz, T. Darden, H. Lee, and L. G. Peder-

sen. A smooth particle mesh Ewald method. The Journal of Chemical Physics,

103(19):8577–8593, 1995.

[34] R. Flamia, G. Lanza, A. M. Salvi, J. E. Castle, and A. M. Tamburro. Conforma-

tional study and hydrogen bonds detection on elastin-related polypeptides using

X-ray photoelectron spectroscopy. Biomacromolecules, 6(3):1299–1309, 2005.

[35] R. Flamia, A. M. Salvi, L. D’Alessio, J. E. Castle, and A. M. Tamburro. Transformation of amyloid-like fibers, formed from an elastin-based biopolymer, into a

hydrogel: an X-ray photoelectron spectroscopy and atomic force microscopy study.

Biomacromolecules, 8(1):128–138, 2007.

[36] R. Flamia, P. A. Zhdan, M. Martino, J. E. Castle, and A. M. Tamburro. AFM

study of the elastin-like biopolymer poly(ValGlyGlyValGly). Biomacromolecules,

5(4):1511–1518, 2004.

[37] N. Floquet, S. Hery-Huynh, M. Dauchez, P. Derreumaux, A. M. Tamburro, and

A. J. P. Alix. Structural characterization of VGVAPG, an elastin-derived peptide.

Biopolymers, 76(3):266–280, 2004.

[38] D. Frenkel and B. Smit. Molecular Dynamics Simulations. In Understanding Molec-

ular Simulation (Second Edition): From Algorithms to Applications, chapter 4,

pages 63–107. Academic Press, 2nd edition, 2002.

[39] A. E. García and K. Y. Sanbonmatsu. α-helical stabilization by side chain shielding

of backbone hydrogen bonds. Proceedings of the National Academy of Sciences of

the United States of America, 99(5):2782–2787, 2002.

[40] R. Glaves, M. Baer, E. Schreiner, R. Stoll, and D. Marx. Conformational dynam-

ics of minimal elastin-like polypeptides: the role of proline revealed by molecular

dynamics and nuclear magnetic resonance. Chemphyschem, 9(18):2759–2765, 2008.

[41] J. Gosline, M. Lillie, E. Carrington, P. Guerette, C. Ortlepp, and K. Savage. Elastic

proteins: biological roles and mechanical properties. Philosophical Transactions

of the Royal Society of London. Series B, Biological Sciences, 357(1418):121–132,

2002.

[42] J. M. Gosline. Hydrophobic interaction and a model for the elasticity of elastin.

Biopolymers, 17(3):677–695, 1978.

[43] J. M. Gosline, F. F. Yew, and T. Weis-Fogh. Reversible structural changes in a

hydrophobic protein, elastin, as indicated by fluorescence probe analysis. Biopoly-

mers, 14(9):1811–1826, 1975.

[44] W. R. Gray, L. B. Sandberg, and J. A. Foster. Molecular model for elastin structure

and function. Nature, 246(5434):461–466, 1973.

[45] S. C. Harvey, R. K.-Z. Tan, and T. E. Cheatham III. The flying ice cube: velocity

rescaling in molecular dynamics leads to violation of energy equipartition. Journal

of Computational Chemistry, 19(7):726–740, 1998.

[46] B. Hess. P-LINCS: A parallel linear constraint solver for molecular simulation.

Journal of Chemical Theory and Computation, 4(1):116–122, 2008.

[47] B. Hess, H. Bekker, H. J. C. Berendsen, and J. G. E. M. Fraaije. LINCS: A linear

constraint solver for molecular simulations. Journal of Computational Chemistry,

18(12):1463–1472, 1997.

[48] B. Hess, C. Kutzner, D. van der Spoel, and E. Lindahl. GROMACS 4: Algorithms

for highly efficient, load-balanced, and scalable molecular simulation. Journal of

Chemical Theory and Computation, 4(3):435–447, 2008.

[49] C. A. J. Hoeve and P. J. Flory. The elastic properties of elastin. Biopolymers,

13(4):677–686, 1974.

[50] W. G. Hoover. Canonical dynamics: Equilibrium phase-space distributions. Phys-

ical Review A, 31(3):1695–1697, 1985.

[51] V. Hornak, R. Abel, A. Okur, B. Strockbine, A. Roitberg, and C. Simmerling. Com-

parison of multiple Amber force fields and development of improved protein back-

bone parameters. Proteins: Structure, Function, and Bioinformatics, 65(3):712–

725, 2006.

[52] http://www.uniprot.org/uniprot/P15502. Human tropoelastin sequence.

[53] W. Humphrey, A. Dalke, and K. Schulten. VMD: visual molecular dynamics.

Journal of Molecular Graphics, 14(1):33–38, 1996.

[54] J. D. Hunter. Matplotlib: A 2D graphics environment. Computing in Science &

Engineering, 9(3):90–95, 2007.

[55] S. Hwang, Q. Shao, H. Williams, C. Hilty, and Y. Q. Gao. Methanol Strengthens Hydrogen Bonds and Weakens Hydrophobic Interactions in Proteins: A Combined Molecular Dynamics and NMR study. The Journal of Physical Chemistry

B, 115(20):6653–6660, 2011.

[56] A. Jabs, M. S. Weiss, and R. Hilgenfeld. Non-proline cis peptide bonds in proteins.

Journal of molecular biology, 286(1):291–304, 1999.

[57] W. L. Jorgensen, J. Chandrasekhar, J. D. Madura, R. W. Impey, and M. L. Klein.

Comparison of simple potential functions for simulating liquid water. The Journal

of Chemical Physics, 79(2):926–935, 1983.

[58] W. L. Jorgensen, D. S. Maxwell, and J. Tirado-Rives. Development and testing

of the OPLS all-atom force field on conformational energetics and properties of

organic liquids. Journal of the American Chemical Society, 118(45):11225–11236,

1996.

[59] W. L. Jorgensen and J. Tirado-Rives. The OPLS potential functions for proteins,

energy minimizations for crystals of cyclic peptides and crambin. Journal of the

American Chemical Society, 110(6):1657–1666, 1988.

[60] W. Kabsch and C. Sander. Dictionary of protein secondary structure: pat-

tern recognition of hydrogen-bonded and geometrical features. Biopolymers,

22(12):2577–2637, 1983.

[61] G. A. Kaminski, R. A. Friesner, J. Tirado-Rives, and W. L. Jorgensen. Evaluation

and reparametrization of the OPLS-AA force field for proteins via comparison

with accurate quantum chemical calculations on peptides. The Journal of Physical

Chemistry B, 105(28):6474–6487, 2001.

[62] R. G. Kirste, W. A. Kruse, and K. Ibel. Determination of the conformation of

polymers in the amorphous solid state and in concentrated solution by neutron

diffraction. Polymer, 16(2):120–124, 1975.

[63] P. Kollman, R. Dixon, W. Cornell, T. Fox, C. Chipot, and A. Pohorille. The de-

velopment/application of the minimalist organic/biochemical molecular mechanic

force field using a combination of ab initio calculations and experimental data. In

W. van Gunsteren, P. Weiner, and A. Wilkinson, editors, Computer Simulation of

Biomolecular Systems: Theoretical and Experimental Application Vol. 3. Springer,

1997.

[64] D. B. Kony, P. H. Hunenberger, and W. F. van Gunsteren. Molecular dynam-

ics simulations of the native and partially folded states of ubiquitin: influence of

methanol cosolvent, pH, and temperature on the protein structure and dynamics.

Protein science : a publication of the Protein Society, 16(6):1101–1118, 2007.

[65] J. Kyte and R. F. Doolittle. A simple method for displaying the hydropathic

character of a protein. Journal of Molecular Biology, 157(1):105–132, 1982.

[66] B. Li, D. O. V. Alonso, B. J. Bennion, and V. Daggett. Hydrophobic hydration

is an important source of elasticity in elastin-based biopolymers. Journal of the

American Chemical Society, 123(48):11991–11998, 2001.

[67] B. Li, D. O. V. Alonso, and V. Daggett. The molecular basis for the inverse

temperature transition of elastin. Journal of Molecular Biology, 305(3):581–592,

2001.

[68] B. Li and V. Daggett. Molecular basis for the extensibility of elastin. Journal of

Muscle Research and Cell Motility, 23(5-6):561–573, 2002.

[69] D.-W. Li and R. Bruschweiler. NMR-Based Protein Potentials. Angewandte

Chemie, 122(38):6930–6932, 2010.

[70] M. A. Lillie, G. J. David, and J. M. Gosline. Mechanical role of elastin-associated

microfibrils in pig aortic elastic tissue. Connective Tissue Research, 37(1-2):121–

141, 1998.

[71] K. Lindorff-Larsen, P. Maragakis, S. Piana, M. P. Eastwood, R. O. Dror, and D. E.

Shaw. Systematic validation of protein force fields against experimental data. PloS

one, 7(2):e32131, 2012.

[72] K. Lindorff-Larsen, S. Piana, K. Palmo, P. Maragakis, J. L. Klepeis, R. O. Dror,

and D. E. Shaw. Improved side-chain torsion potentials for the Amber ff99SB

protein force field. Proteins, 78(8):1950–1958, 2010.

[73] A. Luzar and D. Chandler. Effect of environment on hydrogen bond dynamics in

liquid water. Physical Review Letters, 76(6):928–931, 1996.

[74] A. D. MacKerell Jr., D. Bashford, M. Bellott, R. L. Dunbrack, J. D. Evanseck,

M. J. Field, S. Fischer, J. Gao, H. Guo, S. Ha, D. Joseph-McCarthy, L. Kuchnir,

K. Kuczera, F. T. K. L. Lau, C. Mattos, S. Michnick, T. Ngo, D. T. Nguyen,

B. Prodhom, W. E. Reiher, III, B. Roux, M. Schlenkrich, J. C. Smith, R. Stote,

J. Straub, M. Watanabe, J. Wiorkiewicz-Kuczera, D. Yin, and M. Karplus. All-

atom empirical potential for molecular modeling and dynamics studies of proteins.

The Journal of Physical Chemistry B, 102(18):3586–3616, 1998.

[75] A. D. Mackerell, Jr., M. Feig, and C. L. Brooks III. Extending the treatment of

backbone energetics in protein force fields: limitations of gas-phase quantum me-

chanics in reproducing protein conformational distributions in molecular dynamics

simulations. Journal of Computational Chemistry, 25(11):1400–1415, 2004.

[76] S. J. Marrink, A. H. de Vries, and A. E. Mark. Coarse grained model for semiquan-

titative lipid simulations. The Journal of Physical Chemistry B, 108(2):750–760,

2004.

[77] S. J. Marrink, H. J. Risselada, S. Yefimov, D. P. Tieleman, and A. H. de Vries.

The MARTINI force field: coarse grained model for biomolecular simulations. The

Journal of Physical Chemistry B, 111(27):7812–7824, 2007.

[78] R. P. Mecham. Methods in elastic tissue biology: elastin isolation and purification.

Methods, 45(1):32–41, 2008.

[79] M. Miao, C. M. Bellingham, R. J. Stahl, E. E. Sitarz, C. J. Lane, and F. W.

Keeley. Sequence and structure determinants for the self-aggregation of recombi-

nant polypeptides modeled after human elastin. Journal of Biological Chemistry,

278(49):48553–48562, 2003.

[80] M. Miao, J. T. Cirulis, S. Lee, and F. W. Keeley. Structural determinants of

cross-linking and hydrophobic domains for self-assembly of elastin-like polypep-

tides. Biochemistry, 44(43):14367–14375, 2005.

[81] N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis:

A toolkit for the analysis of molecular dynamics simulations. Journal of Compu-

tational Chemistry, 32(10):2319–2327, 2011.

[82] F. Mistrali, D. Volpin, G. B. Garibaldo, and A. Ciferri. Thermodynamics of elas-

ticity in open systems. Elastin. The Journal of Physical Chemistry, 75(1):142–149,

1971.

[83] S. M. Mithieux and A. S. Weiss. Elastin. Advances in Protein Chemistry, 70:437–

461, 2005.

[84] L. Monticelli, S. K. Kandasamy, X. Periole, R. G. Larson, D. P. Tieleman, and S.-J.

Marrink. The MARTINI coarse-grained force field: extension to proteins. Journal

of Chemical Theory and Computation, 4(5):819–834, 2008.

[85] L. D. Muiznieks, A. S. Weiss, and F. W. Keeley. Structural disorder and dynamics

of elastin. Biochemistry and Cell Biology, 88(2):239–250, 2010.

[86] C. Neale, W. D. Bennett, D. P. Tieleman, and R. Pomes. Statistical Convergence

of Equilibrium Properties in Simulations of Molecular Solutes Embedded in Lipid

Bilayers. Journal of Chemical Theory and Computation, 7(12):4175–4188, 2011.

[87] P. S. Nerenberg and T. Head-Gordon. Optimizing Protein–Solvent Force Fields

to Reproduce Intrinsic Conformational Preferences of Model Peptides. Journal of

Chemical Theory and Computation, 7(4):1220–1230, 2011.

[88] S. Nose. A unified formulation of the constant temperature molecular dynamics

methods. The Journal of Chemical Physics, 81(1):511–519, 1984.

[89] D. Pal and P. Chakrabarti. Cis peptide bonds in proteins: residues involved,

their conformations, interactions and locations. Journal of Molecular Biology,

294(1):271–288, 1999.

[90] R. V. Pappu, X. Wang, A. Vitalis, and S. L. Crick. A polymer physics perspective

on driving forces and mechanisms for protein aggregation. Archives of Biochemistry

and Biophysics, 469(1):132–141, 2008.

[91] M. Parrinello and A. Rahman. Polymorphic transitions in single crystals: A new

molecular dynamics method. Journal of Applied Physics, 52(12):7182–7190, 1981.

[92] S. M. Partridge. Isolation and Characterization of Elastin. In E. A. Balazs, editor,

Chemistry and Molecular Biology of the Intercellular Matrix, volume 1, pages 593–

616. Academic Press, London, 1970.

[93] A. T. Petkova, W.-M. Yau, and R. Tycko. Experimental Constraints on Quaternary

Structure in Alzheimer's β-Amyloid fibrils. Biochemistry, 45(2):498–512, 2006.

[94] B. M. Pettitt and M. Karplus. Role of electrostatics in the structure, energy and

dynamics of biomolecules: a model study of N-methylalanylacetamide. Journal of

the American Chemical Society, 107(5):1166–1173, 1985.

[95] S. Piana, K. Lindorff-Larsen, and D. E. Shaw. How robust are protein folding

simulations with respect to force field parameterization? Biophysical Journal,

100(9):L47–L49, 2011.

[96] M. S. Pometun, E. Y. Chekmenev, and R. J. Wittebort. Quantitative observation of

backbone disorder in native elastin. Journal of Biological Chemistry, 279(9):7982–

7987, 2004.

[97] S. Rauscher. Protein Non-Folding : A Molecular Simulation Study of the Structure

and Self-Aggregation of Elastin. PhD thesis, University of Toronto, 2011.

[98] S. Rauscher, S. Baud, M. Miao, F. W. Keeley, and R. Pomes. Proline and glycine

control protein self-organization into elastomeric or amyloid fibrils. Structure,

14(11):1667–1676, 2006.

[99] S. Rauscher, C. Neale, and R. Pomes. Simulated tempering distributed replica sam-

pling, virtual replica exchange, and other generalized-ensemble methods for con-

formational sampling. Journal of Chemical Theory and Computation, 5(10):2640–

2662, 2009.

[100] S. Rauscher and R. Pomes. Molecular simulations of protein disorder. Biochemistry

and Cell Biology, 88(2):269–290, 2010.

[101] S. Rauscher and R. Pomes. Structural Disorder and Protein Elasticity. In Fuzziness,

pages 159–183. 2012.

[102] W. Reiher. Theoretical studies of hydrogen bonding. PhD thesis, 1985.

[103] P. J. Rossky, M. Karplus, and A. Rahman. A model for the simulation of an

aqueous dipeptide solution. Biopolymers, 18(4):825–854, 1979.

[104] M. Rubinstein and R. H. Colby. Polymer physics. Oxford University Press, Reading,

Massachusetts, 2003.

[105] A. M. Salvi, P. Moscarelli, B. Bochicchio, G. Lanza, and J. E. Castle. Combined

effects of solvation and aggregation propensity on the final supramolecular struc-

tures adopted by hydrophobic, glycine-rich, elastin-like polypeptides. Biopolymers,

99(5):292–313, 2013.

[106] M. S. Searle, R. Zerella, D. H. Williams, and L. C. Packman. Native-like hairpin

structure in an isolated fragment from ferredoxin : NMR and CD studies of solvent

effects on the N-terminal 20 residues. Protein engineering, 9(7):559–565, 1996.

[107] M. Seo, S. Rauscher, R. Pomes, and D. P. Tieleman. Improving Internal Peptide

Dynamics in the Coarse-Grained MARTINI Model: Toward Large-Scale Simu-

lations of Amyloid- and Elastin-like Peptides. Journal of Chemical Theory and

Computation, 8(5):1774–1785, 2012.

[108] E. J. Sorin and V. S. Pande. Exploring the helix-coil transition via all-atom equi-

librium ensemble simulations. Biophysical Journal, 88(4):2472–2493, 2005.

[109] Y. Sugita and Y. Okamoto. Replica-exchange molecular dynamics method for

protein folding. Chemical Physics Letters, 314(1-2):141–151, 1999.

[110] D. A. Torchia and K. A. Piez. Mobility of elastin chains as determined by 13C

nuclear magnetic resonance. Journal of Molecular Biology, 76(3):419–424, 1973.

[111] G. M. Torrie and J. P. Valleau. Nonphysical sampling distributions in Monte Carlo

free-energy estimation: Umbrella sampling. Journal of Computational Physics,

23(2):187–199, 1977.

[112] J. Uitto. Biochemistry of the elastic fibers in normal connective tissues and its

alterations in diseases. Journal of Investigative Dermatology, 72(1):1–10, 1979.

[113] H. C. Urey and C. A. Bradley, Jr. The Vibrations of Pentatonic Tetrahedral

Molecules. Physical Review, 38(11):1969–1978, 1931.

[114] D. W. Urry and C. M. Venkatachalam. A librational entropy mechanism for elas-

tomers with repeating peptide sequences in helical array. International Journal of

Quantum Chemistry, 24(S10):81–93, 1983.

[115] M. B. van Eldijk, C. L. Mcgann, K. L. Kiick, and J. C. M. van Hest. Elastomeric

polypeptides. Topics in Current Chemistry, 310:71–116, 2012.

[116] K. Vanommeslaeghe, E. Hatcher, C. Acharya, S. Kundu, S. Zhong, J. Shim, E. Dar-

ian, O. Guvench, P. Lopes, I. Vorobyov, and A. D. MacKerell, Jr. CHARMM gen-

eral force field: A force field for druglike molecules compatible with the CHARMM

all-atom additive biological force fields. Journal of Computational Chemistry,

31(4):671–690, 2010.

[117] K. Vanommeslaeghe and A. D. Mackerell, Jr. Automation of the CHARMM Gen-

eral Force Field (CGenFF) I: bond perception and atom typing. Journal of Chem-

ical Information and Modeling, 52(12):3144–3154, 2012.

[118] K. Vanommeslaeghe, E. P. Raman, and A. D. MacKerell, Jr. Automation of

the CHARMM General Force Field (CGenFF) II: assignment of bonded param-

eters and partial atomic charges. Journal of Chemical Information and Modeling,

52(12):3155–3168, 2012.

[119] C. M. Venkatachalam and D. W. Urry. Development of a linear helical conformation

from its cyclic correlate. β-Spiral model of the elastin poly(pentapeptide)

(VPGVG)n. Macromolecules, 14(5):1225–1229, 1981.

[120] S. Vieth, C. M. Bellingham, F. W. Keeley, S. M. Hodge, and D. Rousseau.

Microstructural and tensile properties of elastin-based polypeptides crosslinked with

Genipin and pyrroloquinoline quinone. Biopolymers, 85(3):199–206, 2007.

[121] D. Volpin and A. Ciferri. Thermoelasticity of elastin. Nature, 225(5230):382–382,

1970.

[122] B. Vrhovski, S. Jensen, and A. S. Weiss. Coacervation characteristics of recombinant

human tropoelastin. European Journal of Biochemistry, 250(1):92–98, 1997.

[123] B. Vrhovski and A. S. Weiss. Biochemistry of tropoelastin. European Journal of

Biochemistry, 258(1):1–18, 1998.

[124] J. Wang, P. Cieplak, and P. A. Kollman. How well does a restrained electrostatic

potential (RESP) model perform in calculating conformational energies of organic

and biological molecules? Journal of Computational Chemistry, 21(12):1049–1074,

2000.

[125] Z. R. Wasserman and F. R. Salemme. A molecular dynamics investigation of the

elastomeric restoring force in elastin. Biopolymers, 29(12-13):1613–1631, 1990.

[126] P. K. Weiner and P. A. Kollman. AMBER: Assisted model building with energy

refinement. A general program for modeling molecules and their interactions. Journal

of Computational Chemistry, 2(3):287–303, 1981.

[127] S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta, Jr.,

and P. Weiner. A new force field for molecular mechanical simulation of

nucleic acids and proteins. Journal of the American Chemical Society,

106(3):765–784, 1984.

[128] S. J. Weiner, P. A. Kollman, D. T. Nguyen, and D. A. Case. An all atom force field

for simulations of proteins and nucleic acids. Journal of Computational Chemistry,

7(2):230–252, 1986.

[129] T. Weis-Fogh and S. O. Andersen. New Molecular Model for the Long-range

Elasticity of Elastin. Nature, 227(5259):718–721, 1970.

[130] E. Wöhlisch. Static-kinetic theory, thermodynamics and biological significance of

caoutchouc type elasticity. Kolloid-Z, 89:239–271, 1939.