
© Heptares Therapeutics 2018

The HEPTARES name, the logo and STAR are trade marks of Heptares Therapeutics Ltd

Heptares Therapeutics is a wholly owned subsidiary of Sosei Group Corporation

HELM-Driven Tools for Peptide-based Drug Design

Using the ChemAxon Biomolecular Toolkit

ChemAxon User Group Meeting

Budapest, March 21st 2018

Conor Scully - Computational Chemistry and Informatics

Non-Confidential © 2018 Heptares Therapeutics

Biomolecular Toolkit Integration at Heptares

1

Figure: Monomer DB (~400 amino acid residues, ~200 chemical groups) with manual curation via the ChemAxon Biomolecular Toolkit RESTful API and BioEddie; HELM Generation; MOL Block; Molecule Registration; Molecule Display; Library Enumeration; Sequence Analysis.

Despite the structural complexity of peptides, registration is via MOL block.

MOL blocks are generated by interconversion from HELM strings.

Used to draw peptides manually.

The introduction of HELM and the Biomolecular Toolkit has reduced errors in database registration.

The Biomolecular Toolkit is centred around a curated database of peptide and chemical monomers.


API: Monomer DB and Macromolecule Control

2

Access the RESTful API from Python scripts to register, modify and delete monomers.

BioEddie is used for single-monomer registration and updates.

Constructing various monomer dictionaries directly from the Monomer DB via the API has proven indispensable for peptide informatics work.

Examples:

Abbreviation : Monomer Type, e.g. A: PEPTIDE, D-Leu: PEPTIDE, Ac: CHEM, + 600 …

Abbreviation : Attachment Points, e.g. A: [R1, R2], C: [R1, R2, R3], Ac: [R1], …
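The dictionary-building step can be sketched as below. This is a minimal illustration that assumes the monomer records have already been fetched from the Monomer DB; the slide does not give the REST endpoints or the record schema, so `MonomerRecord`, its field names and `build_dictionaries` are hypothetical stand-ins, not the Biomolecular Toolkit's API.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical record shape; the real Monomer DB / REST payload may differ.
@dataclass
class MonomerRecord:
    symbol: str                   # monomer abbreviation, e.g. "D-Leu"
    polymer_type: str             # "PEPTIDE" or "CHEM"
    attachment_points: list[str]  # R-group labels, e.g. ["R1", "R2"]
    natural_analog: str = "X"     # natural analogue, "X" if none


def build_dictionaries(records):
    """Build abbreviation-keyed lookup dictionaries from monomer records."""
    monomer_type = {r.symbol: r.polymer_type for r in records}
    attachment_points = {r.symbol: list(r.attachment_points) for r in records}
    natural_analog = {r.symbol: r.natural_analog for r in records}
    return monomer_type, attachment_points, natural_analog


records = [
    MonomerRecord("A", "PEPTIDE", ["R1", "R2"], "A"),
    MonomerRecord("C", "PEPTIDE", ["R1", "R2", "R3"], "C"),
    MonomerRecord("D-Leu", "PEPTIDE", ["R1", "R2"], "L"),
    MonomerRecord("Ac", "CHEM", ["R1"]),
]
types, points, analogs = build_dictionaries(records)
print(types)        # {'A': 'PEPTIDE', 'C': 'PEPTIDE', 'D-Leu': 'PEPTIDE', 'Ac': 'CHEM'}
print(points["C"])  # ['R1', 'R2', 'R3']
```

Keeping all lookups keyed by the abbreviation means any HELM token can be resolved to its type, attachment points or natural analogue with a single dictionary access.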

Conversion of MOL blocks in the database to HELMs

Checking that HELMs are valid

Checking that MOL blocks are valid

Generating sequences for peptide molecules

API controllers: Monomer Controller, Macromolecule Controller
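As a rough illustration of what a HELM check involves, the sketch below splits a HELM V2.0 string into its `$`-separated sections and confirms that every connection refers to an existing polymer and residue position. `split_helm` and `basic_helm_check` are hypothetical helpers, not the Macromolecule Controller's validation, and they assume monomers are written as abbreviations (inline SMILES monomers containing `.`, `}` or `$` would defeat this naive splitting).

```python
import re

# One polymer chunk, e.g. PEPTIDE1{R.A.[D-Leu]} or CHEM1{[Ac]}
POLYMER_RE = re.compile(r"(PEPTIDE|CHEM|RNA|BLOB)(\d+)\{([^}]*)\}")
# One connection, e.g. PEPTIDE1,PEPTIDE1,3:R3-8:R3
CONNECTION_RE = re.compile(r"(\w+),(\w+),(\d+):(R\d+)-(\d+):(R\d+)")


def split_helm(helm):
    """Split HELM V2.0 into ({polymer_id: [monomers]}, [connection tuples])."""
    sections = helm.split("$")
    if len(sections) != 5 or sections[4] != "V2.0":
        raise ValueError("expected five '$'-separated sections ending in V2.0")
    chunks = POLYMER_RE.findall(sections[0])
    if len(chunks) != len(sections[0].split("|")):
        raise ValueError("unparseable polymer section")
    # strip square brackets from multi-letter monomers such as [D-Leu]
    polymers = {ptype + num: [m.strip("[]") for m in body.split(".")]
                for ptype, num, body in chunks}
    connections = []
    for conn in filter(None, sections[1].split("|")):
        match = CONNECTION_RE.fullmatch(conn)
        if match is None:
            raise ValueError(f"unrecognised connection: {conn}")
        a, b, pos_a, r_a, pos_b, r_b = match.groups()
        connections.append((a, b, int(pos_a), r_a, int(pos_b), r_b))
    return polymers, connections


def basic_helm_check(helm):
    """Rough structural check: parseable sections, and every connection
    points at an existing polymer and residue position."""
    try:
        polymers, connections = split_helm(helm)
    except ValueError:
        return False
    for a, b, pos_a, _, pos_b, _ in connections:
        if a not in polymers or b not in polymers:
            return False
        if not (1 <= pos_a <= len(polymers[a]) and 1 <= pos_b <= len(polymers[b])):
            return False
    return True


print(basic_helm_check(
    "PEPTIDE1{R.A.C.Y.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0"))  # True
print(basic_helm_check("PEPTIDE1{R.A}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0"))     # False
```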


Integrating Peptide Design Into Drug Discovery Platforms – Common Restrictions

3

Day-to-day data analysis is performed within tabular data frames such as spreadsheets and chemically aware variants.

Data (pharmacological, ADMET, properties) are pulled in automatically from an Oracle database.

SAR analysis, visualizations and data models are commonly used functionalities.

Bespoke, complex peptides are becoming common, to support structure-based binding hypotheses and property enhancements.

2D representations of complex peptides are not visually compatible with chemically aware data tables.

How can we collate data about peptides and perform SAR analysis in an automated way?

How can we integrate sequence information about peptides into the spreadsheet environment of small-molecule chemistry?

Figure: mock-up of a typical small-molecule data environment; complex peptides represent a challenge for 2D representation.


Peptide Design - Tabulating Simple Linear Peptides

4

PEPTIDE1{R.A.R.Y.[D-Leu].P.M.E.S.F}$$$$V2.0

PEPTIDE1{R.A.R.H.[D-Leu].P.M.E.S.F}$$$$V2.0

PEPTIDE1{R.A.R.H.L.P.M.E.S.F}$$$$V2.0

Peptide HELMs

Python script (HELM to TABLE)

MOL blocks are extracted from the database and converted to HELM via the Biomolecular Toolkit API.

Extract the sequence from each HELM and arrange it in a spreadsheet.

A Python script pulls apart the HELMs and arranges the residues left to right.

This approach allows compound data to be integrated directly with sequence information for a large linear peptide library.
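A minimal sketch of the HELM-to-table step for simple linear peptides is shown below. It assumes the HELM strings have already been obtained from the Biomolecular Toolkit (the conversion call is not shown, since the slide does not specify it), and the function names are illustrative rather than those of the original script.

```python
import csv
import io
import re

# A single-chain linear peptide with no connections, e.g.
# PEPTIDE1{R.A.R.Y.[D-Leu].P.M.E.S.F}$$$$V2.0
LINEAR_HELM_RE = re.compile(r"PEPTIDE1\{([^}]*)\}\$\$\$\$V2\.0")


def linear_helm_to_residues(helm):
    """Return the residues of a simple linear peptide HELM, left to right."""
    match = LINEAR_HELM_RE.fullmatch(helm)
    if match is None:
        raise ValueError(f"not a simple linear peptide HELM: {helm}")
    # strip the square brackets around multi-letter monomers such as [D-Leu]
    return [res.strip("[]") for res in match.group(1).split(".")]


def helms_to_table(helms):
    """One row per peptide, one positional column (P1..Pn) per residue."""
    rows = [linear_helm_to_residues(h) for h in helms]
    width = max(len(r) for r in rows)
    header = [f"P{i}" for i in range(1, width + 1)]
    # pad shorter peptides with empty cells so all rows share the same columns
    return [header] + [r + [""] * (width - len(r)) for r in rows]


def table_to_csv(table):
    """Serialise the table as CSV text for a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(table)
    return buffer.getvalue()


helms = [
    "PEPTIDE1{R.A.R.Y.[D-Leu].P.M.E.S.F}$$$$V2.0",
    "PEPTIDE1{R.A.R.H.[D-Leu].P.M.E.S.F}$$$$V2.0",
    "PEPTIDE1{R.A.R.H.L.P.M.E.S.F}$$$$V2.0",
]
print(table_to_csv(helms_to_table(helms)))
```

Because every peptide lands in the same P1..Pn columns, the sequence table can be joined row-by-row with the compound data pulled from the database.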


Peptides with Simple Modifications

5

Peptide Connection

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

R A C* Y D-Leu P M C* S F

R A C* H D-Leu P M C* S F

… … … … … … … … … …

PEPTIDE1{R.A.C.Y.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0

PEPTIDE1{R.A.C.H.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0

Panels: HELM, Schematic, Table

Simple modifications to amino acids can be incorporated.

Cyclic molecules can be handled programmatically or manually.

Lay out residues left to right and use the HELM connection section to add a marker (*) to residues involved in a modification.
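The marking step above might look like the following sketch, assuming a single chain (`PEPTIDE1`) with intra-chain connections only; `star_connected_residues` is a hypothetical name, not the original script's.

```python
import re


def star_connected_residues(helm):
    """Lay out PEPTIDE1 residues left to right and append '*' to every residue
    named in an intra-chain PEPTIDE1-PEPTIDE1 connection."""
    sections = helm.split("$")
    body = re.search(r"PEPTIDE1\{([^}]*)\}", sections[0]).group(1)
    residues = [res.strip("[]") for res in body.split(".")]
    for conn in filter(None, sections[1].split("|")):
        # e.g. PEPTIDE1,PEPTIDE1,3:R3-8:R3 -> 1-based positions 3 and 8
        match = re.fullmatch(r"PEPTIDE1,PEPTIDE1,(\d+):R\d+-(\d+):R\d+", conn)
        if match is None:
            continue  # connections to other polymers are ignored here
        for pos in match.groups():
            idx = int(pos) - 1
            if not residues[idx].endswith("*"):
                residues[idx] += "*"
    return residues


print(star_connected_residues(
    "PEPTIDE1{R.A.C.Y.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0"))
# ['R', 'A', 'C*', 'Y', 'D-Leu', 'P', 'M', 'C*', 'S', 'F']
```

With two connections (for example 3:R3-8:R3 and 2:R3-7:R3) the same function marks four residues, and nothing in the resulting row says which pairs are linked, which is the ambiguity that arises for multiply modified peptides.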


Peptides with Simple Modifications

6

Peptide Connection

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

R A C* Y D-Leu P M C* S F

R A C* H D-Leu P M C* S F

R K* C* H D-Leu E* M C* S F

… … … … … … … … … …

PEPTIDE1{R.A.C.Y.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0

PEPTIDE1{R.A.C.H.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0

PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3|PEPTIDE1,PEPTIDE1,2:R3-7:R3$$$V2.0


This system quickly breaks down when multiple modifications are present in a complex peptide: the * marks no longer show which residues are connected to each other, so ambiguity is introduced.

Panels: HELM, Schematic, Table


Numerical Notation of Peptide Modifications in Tables

7

P1 P2 P3 P4 P5 P6 P7 P8 P9 P10

R K{1&3} C{2&3} H D-Leu E{1&3} M C{2&3} S F

R K* C* H D-Leu E* M C* S F

… … … … … … … … … …

PEPTIDE1{R.K.C.Y.[D-Leu].E.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3|PEPTIDE1,PEPTIDE1,2:R3-6:R3$$$V2.0

Parse the coloured connections to map out residue-to-residue connections.

Replace ambiguous symbols with unambiguous notation.

The new notation can be translated directly back to HELM (and hence to MOL blocks).

Braces ({ }) and “&” are used to provide a regular-expression “hook” for scripting.

The first number identifies the connection (residues sharing it are linked to each other); the second number, e.g. the 3 in {2&3}, denotes the R group of the residue that makes the connection.
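The brace notation can be generated from, and turned back into, HELM connections as sketched below. The sketch assumes a single chain (`PEPTIDE1`) and numbers connections in the order they are supplied, so its numbering may differ from the slide's table; `to_brace_notation` and `from_brace_notation` are hypothetical names.

```python
import re

# The regular-expression "hook": {connection&R-group}
BRACE_RE = re.compile(r"\{(\d+)&(\d+)\}")


def to_brace_notation(residues, connections):
    """Annotate residues as NAME{k&r}: k = 1-based connection number,
    r = R-group number the residue uses in that connection.
    connections: (pos_a, r_a, pos_b, r_b) tuples within a single chain."""
    cells = list(residues)
    for k, (pos_a, r_a, pos_b, r_b) in enumerate(connections, start=1):
        cells[pos_a - 1] += f"{{{k}&{r_a}}}"
        cells[pos_b - 1] += f"{{{k}&{r_b}}}"
    return cells


def from_brace_notation(cells, chain="PEPTIDE1"):
    """Rebuild a single-chain HELM V2.0 string from brace-annotated cells."""
    residues, ends = [], {}
    for pos, cell in enumerate(cells, start=1):
        for k, r in BRACE_RE.findall(cell):
            ends.setdefault(int(k), []).append((pos, int(r)))
        name = BRACE_RE.sub("", cell)
        # multi-letter monomers need square brackets in HELM
        residues.append(f"[{name}]" if len(name) > 1 else name)
    links = []
    for k in sorted(ends):
        (pos_a, r_a), (pos_b, r_b) = ends[k]  # each connection has two ends
        links.append(f"{chain},{chain},{pos_a}:R{r_a}-{pos_b}:R{r_b}")
    return f"{chain}{{{'.'.join(residues)}}}${'|'.join(links)}$$$V2.0"


cells = to_brace_notation(
    ["R", "K", "C", "Y", "D-Leu", "E", "M", "C", "S", "F"],
    [(2, 3, 6, 3), (3, 3, 8, 3)],
)
print(cells)
# ['R', 'K{1&3}', 'C{2&3}', 'Y', 'D-Leu', 'E{1&3}', 'M', 'C{2&3}', 'S', 'F']
print(from_brace_notation(cells))
# PEPTIDE1{R.K.C.Y.[D-Leu].E.M.C.S.F}$PEPTIDE1,PEPTIDE1,2:R3-6:R3|PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0
```

Because every connection end carries both its connection number and its R group, the round trip back to HELM needs no information outside the table cells.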


Mixing Peptides and Chemical Modifications in Tables

8

Ideally, HELM chunks would be displayed in a “left-to-right” fashion, in line with our data tables:

CHEM1{[Ac]}|PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}|CHEM2{[NH2]}$PEPTIDE1,CHEM1,1:R1-1:R1|PEPTIDE1,CHEM2,10:R2-1:R1$$$V2.0

In reality, the ordering of HELM chunks can seem arbitrary:

PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}|CHEM1{[Ac]}|CHEM2{[NH2]}$PEPTIDE1,CHEM1,1:R1-1:R1|PEPTIDE1,CHEM2,10:R2-1:R1$$$V2.0

How can we reliably get to the left-to-right representation starting from any HELM?

A Python script breaks up the HELM string using regular expressions:

PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}
CHEM1{[Ac]}
CHEM2{[NH2]}
PEPTIDE1,CHEM1,1:R1-1:R1 --> [PEPTIDE1,CHEM1,1,R1,1,R1]
PEPTIDE1,CHEM2,10:R2-1:R1 --> [PEPTIDE1,CHEM2,10,R2,1,R1]

Take advantage of R-group numbering (R1 = left, R2 = right, ...).

Apply logical rules to determine the relative orientation of the sections:

Arrange polymer chunks, then extract residues: [CHEM1] -> [PEPTIDE1] -> [CHEM2]
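One way to encode these orientation rules is sketched below. It assumes the backbone chunk is already known and that each end chunk attaches directly to the backbone (longer chains of capping chunks would need the rule applied recursively); `arrange_chunks` and `residues_in_order` are illustrative names.

```python
def arrange_chunks(backbone, connections):
    """Order chunks left to right around `backbone`.

    connections: [polymer_a, polymer_b, pos_a, rgroup_a, pos_b, rgroup_b],
    e.g. ["PEPTIDE1", "CHEM1", 1, "R1", 1, "R1"].
    Rule: a chunk attached at the backbone's R1 (left, N-terminal side) goes
    left; at R2 (right, C-terminal side) goes right; anything else is a branch.
    """
    left, right, branches = [], [], []
    for poly_a, poly_b, _pos_a, r_a, _pos_b, r_b in connections:
        if poly_a == backbone:
            other, r_backbone = poly_b, r_a
        elif poly_b == backbone:
            other, r_backbone = poly_a, r_b
        else:
            continue  # connection does not touch the backbone
        if r_backbone == "R1":
            left.append(other)
        elif r_backbone == "R2":
            right.append(other)
        else:
            branches.append(other)
    return left + [backbone] + right, branches


def residues_in_order(order, polymers):
    """Flatten the ordered chunks into one left-to-right residue row."""
    return [res for chunk in order for res in polymers[chunk]]


polymers = {
    "PEPTIDE1": ["R", "K", "C", "Y", "D-Leu", "P", "E", "C", "S", "F"],
    "CHEM1": ["Ac"],
    "CHEM2": ["NH2"],
}
connections = [
    ["PEPTIDE1", "CHEM1", 1, "R1", 1, "R1"],
    ["PEPTIDE1", "CHEM2", 10, "R2", 1, "R1"],
]
order, branches = arrange_chunks("PEPTIDE1", connections)
print(order)  # ['CHEM1', 'PEPTIDE1', 'CHEM2']
print(residues_in_order(order, polymers))
```

Checking both sides of each connection makes the result independent of the order in which polymers and connections happen to be written in the HELM string.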


Dealing with Branched Peptides in Tables

9

PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}|CHEM1{[Ac]}|CHEM2{[NH2]}|PEPTIDE2{K.F}$CHEM1,PEPTIDE1,1:R1-1:R1|PEPTIDE1,CHEM2,10:R2-1:R1


Determine the longest chunk in the HELM string: the chunk with the most residues is, by definition, the backbone.

Which chunks are connected to the current backbone by R1/R2 connections?

Push chunks “not in backbone” to dedicated branch columns. Using the brace notation ensures the HELM can be reconstituted.
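The backbone and branch-column steps might look like the sketch below. It assumes polymers and connections have already been parsed into Python structures; because the slide's HELM for this example is truncated, the `PEPTIDE2` connection in the usage is hypothetical. Recording each branch's attachment point with the brace notation, which the slide uses so that the HELM can be rebuilt, is not shown.

```python
def split_backbone_and_branches(polymers, connections):
    """polymers: {chunk_id: [monomers]};
    connections: (polymer_a, polymer_b, pos_a, r_a, pos_b, r_b) tuples.
    The chunk with the most residues is taken as the backbone (the first one
    wins a tie). Chunks joined through the backbone's R1/R2 extend it in the
    row; every other chunk is pushed to its own branch column."""
    backbone = max(polymers, key=lambda cid: len(polymers[cid]))
    in_row = {backbone}
    for poly_a, poly_b, _pos_a, r_a, _pos_b, r_b in connections:
        if poly_a == backbone and r_a in ("R1", "R2"):
            in_row.add(poly_b)
        elif poly_b == backbone and r_b in ("R1", "R2"):
            in_row.add(poly_a)
    branch_chunks = [cid for cid in polymers if cid not in in_row]
    branch_columns = {f"Branch{n}": ".".join(polymers[cid])
                      for n, cid in enumerate(branch_chunks, start=1)}
    return backbone, in_row, branch_columns


polymers = {
    "PEPTIDE1": ["R", "K", "C", "Y", "D-Leu", "P", "E", "C", "S", "F"],
    "CHEM1": ["Ac"],
    "CHEM2": ["NH2"],
    "PEPTIDE2": ["K", "F"],
}
connections = [
    ("CHEM1", "PEPTIDE1", 1, "R1", 1, "R1"),
    ("PEPTIDE1", "CHEM2", 10, "R2", 1, "R1"),
    ("PEPTIDE1", "PEPTIDE2", 2, "R3", 2, "R2"),  # hypothetical branch link
]
backbone, in_row, branches = split_backbone_and_branches(polymers, connections)
print(backbone, sorted(in_row), branches)
# PEPTIDE1 ['CHEM1', 'CHEM2', 'PEPTIDE1'] {'Branch1': 'K.F'}
```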


Peptide Backbone Alignments to Templates

10

Figure: unaligned sequences (← Unaligned →) are mapped to “Natural Analogues”, aligned to the template, and shown as aligned sequences (← Aligned →).

The addition of chemical groups to peptides results in different numbers of monomers across peptide library members.

Aligning library members to a reference molecule organizes the sequences into a single frame of reference.

The template can be the native ligand or an assay standard.

Modern peptide libraries contain many unnatural and modified amino acids.

Highly complex residues are registered with natural analogues (i.e. one of the 20 natural amino acids) in the Monomer DB.

Transform peptide residues to their natural analogue (or X) using a Monomer Abbreviation : Natural Analogue dictionary, e.g.:

A: A, D-Leu: L, Ac: X

Align to the template using BLOSUM62 in Biopython, then map the aligned sequence back to the original residues.
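The map, align and map-back procedure can be sketched as follows. The slide aligns with BLOSUM62 in Biopython (for example `Bio.Align.PairwiseAligner` with `substitution_matrices.load("BLOSUM62")`); to keep this sketch self-contained it instead uses a plain Needleman-Wunsch global alignment with simple match/mismatch/gap scores, which are illustrative assumptions rather than the original parameters. The analogue dictionary contains only the slide's examples.

```python
# Natural-analogue dictionary: entries from the slide; other single-letter
# codes map to themselves, other multi-letter monomers default to "X".
NATURAL_ANALOG = {"A": "A", "D-Leu": "L", "Ac": "X"}


def to_natural(residues, analogs=NATURAL_ANALOG):
    return [analogs.get(r, r if len(r) == 1 else "X") for r in residues]


def score(a, b, match=2, mismatch=-1):
    # simple stand-in for BLOSUM62; X (no natural analogue) never scores a match
    return match if a == b and a != "X" else mismatch


def global_align(seq, template, gap=-2):
    """Needleman-Wunsch; returns (seq_index, template_index) pairs, None = gap."""
    n, m = len(seq), len(template)
    f = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        f[i][0] = i * gap
    for j in range(1, m + 1):
        f[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            f[i][j] = max(f[i - 1][j - 1] + score(seq[i - 1], template[j - 1]),
                          f[i - 1][j] + gap,
                          f[i][j - 1] + gap)
    pairs, i, j = [], n, m
    while i > 0 or j > 0:  # trace back from the bottom-right cell
        if i > 0 and j > 0 and f[i][j] == f[i - 1][j - 1] + score(seq[i - 1], template[j - 1]):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and f[i][j] == f[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    return pairs[::-1]


def align_to_template(residues, template):
    """Align via natural analogues, then report the ORIGINAL residues per column."""
    pairs = global_align(to_natural(residues), list(template))
    return [(template[j] if j is not None else "-",
             residues[i] if i is not None else "-") for i, j in pairs]


print(align_to_template(["Ac", "H", "A", "E", "G", "T", "F"], "HAEGTF"))
# [('-', 'Ac'), ('H', 'H'), ('A', 'A'), ('E', 'E'), ('G', 'G'), ('T', 'T'), ('F', 'F')]
```

Aligning on natural analogues but reporting the original monomers is what lets D-Leu sit in the template's L column while the table still shows D-Leu.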


GLP-1: A Peptide Target for Type 2 Diabetes

11

Efficacy values (pEC50) for GLP1 and GCG at their receptors:

Receptor  GLP1  GCG
GLP1R     10.7  8.6
GCGR      5.8   9.6

Figure: GLP-1 and GCG; GLP1R with GLP1 (PDB: 5VAI)

Class B G Protein-Coupled Receptors (GPCRs) have been identified as targets for a broad range of diseases, including disorders of glucose metabolism and bone composition, as well as inflammation and pain.

GLP-1 Receptor (GLP1R) and Glucagon Receptor (GCGR) are closely related Class B GPCRs and bind long helical peptides of approximately 30 residues.

GLP-1, a 33-residue peptide, is the endogenous ligand for the GLP-1 receptor. It is closely homologous to GCG in structure and function.

Several GLP-1 analogues have been approved as treatments for Type 2 diabetes.

Databases such as Reaxys Med Chem and ChEMBL hold a rich set of data for peptide analogues of GLP1 and GCG, which can be computationally linked to the chemical structures.

Figure: Heptares PDB: 5NX2 (GLP1), 5EE7 (GCGR)


GLP-1 and Glucagon Peptide Data Extraction and Processing

12

Patent and literature efficacy data for GLP1R and GCGR ligands were extracted from the databases as SD files.

The molecule files were filtered for peptide molecules by a substructure search with a tripeptide SMARTS.

There are ~700 GCG peptide analogues and over 2500 GLP-1 peptide analogues in the Reaxys Med Chem and ChEMBL databases.

The Monomer DB was updated to include all residues present in the GLP1/GCG collection.

The structures are transformed to HELMs using the Biomolecular Toolkit and passed through the HELM processing workflow.


Aligned GLP-1 Peptides in Tabular Form

13

Molecules in the GLP-1 dataset underwent sequence alignment using GLP-1 as a template.

Sequences are generated directly from the database MOL blocks, so all attendant pharmacological and physical fields are retained.

Different databases use different field codes for activity and properties, so these must be mapped to common fields.

Older patents and literature contain dirtier data (for example, peptides with no stereochemistry in the MOL blocks), which must be dealt with.

Statistical analyses and other learning methods are now available for application to the sequence columns.

Figure panels: Aligned GLP-1 sequences; GLP-1 data
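Field harmonisation across sources might be sketched as below: a small pure-Python SD reader (in practice a cheminformatics toolkit's own SD reader would normally be used) plus a renaming map. The field names in `FIELD_MAP` are hypothetical placeholders, not the actual Reaxys Med Chem or ChEMBL field codes, and the records in the usage are toy values.

```python
import re

# "> <FIELD>" header line of an SD data item (may carry extra text around it)
FIELD_RE = re.compile(r"^>.*?<([^>]+)>")

# Hypothetical mapping from source-specific field names to one common name.
FIELD_MAP = {"SOURCE_A_PEC50": "pEC50", "Source B pEC50": "pEC50"}


def _parse_record(lines):
    end = next((i for i, line in enumerate(lines) if line.startswith("M  END")), None)
    if end is None:
        raise ValueError("record has no 'M  END' line")
    molblock = "\n".join(lines[: end + 1])
    fields, current = {}, None
    for line in lines[end + 1:]:
        match = FIELD_RE.match(line)
        if match:
            current = match.group(1)
            fields[current] = []
        elif line.strip() and current is not None:
            fields[current].append(line.strip())
        else:
            current = None  # a blank line ends the current data item
    return molblock, {name: "\n".join(values) for name, values in fields.items()}


def parse_sd(text):
    """Split SD-file text into (molblock, {field: value}) records."""
    records, lines = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(_parse_record(lines))
            lines = []
        else:
            lines.append(line)
    if any(line.strip() for line in lines):
        records.append(_parse_record(lines))
    return records


def harmonise(fields, field_map=FIELD_MAP):
    """Rename source-specific fields to common names; unmapped fields are kept."""
    return {field_map.get(name, name): value for name, value in fields.items()}


sd_text = (
    "pep1\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    "> <SOURCE_A_PEC50>\n7.1\n\n$$$$\n"
    "pep2\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n"
    ">  <Source B pEC50>\n6.5\n\n$$$$\n"
)
for molblock, fields in parse_sd(sd_text):
    print(harmonise(fields))
```

Keeping unmapped fields unchanged means a new source only needs entries added to the map, rather than changes to the parser.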


Residue Level Analysis of Peptide Properties

Native GLP-1 residues at positions 1 to 30:

H A E G T F T S D V S S Y L E G Q A A K E F I A W L V K G R

Mean pEC50 by substitution at selected GLP-1 positions:

Substitution  pEC50 (mean)
D-His         5.0
4-ImidProp    10.0
ThiazAla      7.8
Y             10.8

Substitution  pEC50 (mean)
D-Phe         10.7
F             9.7
N-Me-Phe      9.0
a-Me-D-Phe    10.3

Substitution  pEC50 (mean)
1-Nal         8.5
Chg           7.6
D-Nle         6.0
D-Val         10.7
F             7.5
I             8.2
K             10.9
L             10.0

Substitution  pEC50 (mean)
Aib           10.4
Bip           6.0
D-Asn         6.0
E             10.2
G             9.9
K             10.7
N             6.8
S             8.4

Substitution  pEC50 (mean)
D-Trp         8.2
F             9.9
H             10.0
K             9.9
W             9.7

Substitution  pEC50 (mean)
D-Lys         6.0
E             10.2
G             9.9
I             9.7
K             9.0
NH2           10.5
P             10.6

The ability to include positional columns opens up the possibility of using residue identities in data analysis.

Statistical analysis of sequences can generate lists of substitutions that are tolerated or not tolerated for a given property of interest.

Knowledge like this can enable intelligent peptide library design.

14
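A per-position summary like the mean pEC50 tables above can be sketched as follows, assuming each row already holds one aligned residue per template position plus a single pEC50 value. The threshold used to split tolerated from non-tolerated substitutions is an arbitrary illustrative choice, the function names are hypothetical, and the input values are toy numbers, not GLP-1 data.

```python
from collections import defaultdict
from statistics import mean


def substitution_means(rows):
    """rows: (aligned_residues, pEC50) pairs, one residue per template position
    ('-' for a gap). Returns {position: {residue: mean pEC50}} (1-based)."""
    values = defaultdict(lambda: defaultdict(list))
    for residues, pec50 in rows:
        for pos, res in enumerate(residues, start=1):
            if res != "-":
                values[pos][res].append(pec50)
    return {pos: {res: mean(v) for res, v in by_res.items()}
            for pos, by_res in values.items()}


def split_by_threshold(means, threshold):
    """Per position: (residues with mean >= threshold, residues below it)."""
    return {pos: ({r for r, m in by_res.items() if m >= threshold},
                  {r for r, m in by_res.items() if m < threshold})
            for pos, by_res in means.items()}


# Toy input, not the GLP-1 dataset
rows = [
    (["H", "A", "E"], 10.0),
    (["D-His", "A", "E"], 5.0),
    (["D-His", "Aib", "E"], 6.0),
    (["Y", "A", "E"], 11.0),
]
means = substitution_means(rows)
print(means[1])                          # {'H': 10.0, 'D-His': 5.5, 'Y': 11.0}
print(split_by_threshold(means, 8.0)[1])
```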


Analysis of PK/PD Property-Modifying Groups

Figure: (i) average length of appended monomers at each GLP-1 position (1 to 30); (ii) number of peptides with stabilizing groups, split by position, with the most commonly occurring monomer shown at selected positions.

GLP-1 analogues are subject to rapid degradation in vivo due to proteolytic activity.

Peptide modification strategies can enhance the PK/PD properties of the peptides and hence their exposure and efficacy.

Common appendages include long-chain PEG groups or fatty acids attached at tolerant residues.

Analysed here are (i) the average length of appended chains, grouped by residue, and (ii) the number of branched peptides in the dataset (278 peptides), grouped by residue.

15
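Both per-position statistics can be sketched as below, assuming the branched-peptide workflow has already reduced each peptide to a list of (aligned backbone position, number of appended monomers) pairs. `appendage_stats` is a hypothetical helper and the input is toy data, not the 278-peptide set.

```python
from collections import Counter, defaultdict
from statistics import mean


def appendage_stats(peptides):
    """peptides: one list per peptide of (backbone_position, n_appended_monomers)
    for each group attached to the aligned backbone.
    Returns (mean appended length per position,
             number of peptides carrying a group at each position)."""
    lengths = defaultdict(list)
    counts = Counter()
    for groups in peptides:
        for pos, n_monomers in groups:
            lengths[pos].append(n_monomers)
        # count each peptide at most once per position
        counts.update({pos for pos, _ in groups})
    return {pos: mean(v) for pos, v in lengths.items()}, dict(counts)


# Toy input, not the GLP-1 dataset
avg_length, n_peptides = appendage_stats([[(20, 3)], [(20, 1), (20, 2)], [(26, 4)], []])
print(avg_length)
print(n_peptides)  # {20: 2, 26: 1}
```

Counting a set of positions per peptide keeps a peptide with two groups at the same residue from being counted twice in the "number of peptides" statistic, while both groups still contribute to the average length.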


Acknowledgements

16

Discovery - Chemistry: Giles Brown, Miles Congreve, Rebecca Nonoo

Discovery - Computational Chemistry and Informatics: Rob Smith, Ben Tehan, Giovanni Bottegoni, Rob Cooke, Francesca Deflorian, Jon Mason, Juan Carlos Mobarec
