© Heptares Therapeutics 2018
The HEPTARES name, the logo and STAR are trade marks of Heptares Therapeutics Ltd
Heptares Therapeutics is a wholly owned subsidiary of Sosei Group Corporation
HELM-Driven Tools for Peptide-based Drug Design
Using the ChemAxon Biomolecular Toolkit
ChemAxon User Group Meeting
Budapest, March 21st 2018
Conor Scully - Computational Chemistry and Informatics
Non-Confidential © 2018 Heptares Therapeutics
Biomolecular Toolkit Integration at Heptares
Diagram components: Monomer DB (~400 amino acid residues, ~200 chemical groups; manual curation via the ChemAxon Biomolecular Toolkit RESTful API and BioEddie), HELM generation, MOL block, molecule registration, molecule display, library enumeration, sequence analysis.
Despite the structural complexity of peptides, registration is via MOL block.
MOL blocks are generated by interconversion of HELM strings.
Used to draw peptides manually.
The introduction of HELM and the Biomolecular Toolkit has reduced errors in database registration.
The Biomolecular Toolkit is centred around a curated database of peptide and chemical monomers.
API: Monomer DB and Macromolecule Control
Monomer Controller
The RESTful API is accessed within Python scripts to register, modify and delete monomers.
BioEddie is used for single-monomer registration and updates.
Constructing various monomer dictionaries directly from the Monomer DB via the API has proven indispensable for peptide informatics work. Examples:
Abbreviation : Monomer Type, e.g. A: PEPTIDE, D-Leu: PEPTIDE, Ac: CHEM, + 600 …
Abbreviation : Attachment Points, e.g. A: [R1, R2], C: [R1, R2, R3], Ac: [R1], …
Macromolecule Controller
Conversion of MOL blocks in the database to HELMs
Checking that HELMs are valid
Checking that MOL blocks are valid
Generating sequences for peptide molecules
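A minimal sketch of how such dictionaries might be assembled once monomer records have been fetched from the Monomer DB. The record keys (`symbol`, `polymerType`, `rgroups`, `naturalAnalog`) are hypothetical stand-ins for whatever the API actually returns, and the HTTP call itself is deliberately left out rather than guessed. The example values are the ones quoted on this slide and on the natural-analogue slide.

```python
# Sketch: build lookup dictionaries from Monomer DB records.
# ASSUMPTION: each record is a dict with hypothetical keys "symbol",
# "polymerType", "rgroups" and "naturalAnalog"; the real API payload may differ.


def build_monomer_dictionaries(records):
    """Return (abbreviation -> monomer type, abbreviation -> attachment points,
    abbreviation -> natural analogue) dictionaries."""
    monomer_type = {}
    attachment_points = {}
    natural_analogue = {}
    for record in records:
        symbol = record["symbol"]
        monomer_type[symbol] = record["polymerType"]
        # Sort R groups numerically so that R10 would follow R9, not R1.
        attachment_points[symbol] = sorted(record["rgroups"], key=lambda r: int(r[1:]))
        # Monomers without a natural analogue are mapped to X.
        natural_analogue[symbol] = record.get("naturalAnalog") or "X"
    return monomer_type, attachment_points, natural_analogue


records = [
    {"symbol": "A", "polymerType": "PEPTIDE", "rgroups": ["R1", "R2"], "naturalAnalog": "A"},
    {"symbol": "D-Leu", "polymerType": "PEPTIDE", "rgroups": ["R1", "R2"], "naturalAnalog": "L"},
    {"symbol": "C", "polymerType": "PEPTIDE", "rgroups": ["R3", "R1", "R2"], "naturalAnalog": "C"},
    {"symbol": "Ac", "polymerType": "CHEM", "rgroups": ["R1"]},
]
types, points, analogues = build_monomer_dictionaries(records)
print(points["C"])  # ['R1', 'R2', 'R3']
```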
Integrating Peptide Design Into Drug Discovery Platforms – Common Restrictions
Day-to-day data analysis is performed in tabular data frames such as spreadsheets and their chemically aware variants.
Data (pharmacological, ADMET, properties) are pulled in automatically from an Oracle database.
SAR analysis, visualizations and data models are commonly used functionalities.
Bespoke, complex peptides are becoming common, to support structure-based binding hypotheses and property enhancements.
2D representations of complex peptides are not visually compatible with chemically aware data tables.
How can we collate data about peptides and perform SAR analysis in an automated way?
How can we integrate sequence information about peptides into the spreadsheet environment of small-molecule chemistry?
Figure: mock-up of a typical small-molecule data environment; complex peptides are a challenge for 2D representation.
Peptide Design - Tabulating Simple Linear Peptides
Peptide HELMs:
PEPTIDE1{R.A.R.Y.[D-Leu].P.M.E.S.F}$$$$V2.0
PEPTIDE1{R.A.R.H.[D-Leu].P.M.E.S.F}$$$$V2.0
PEPTIDE1{R.A.R.H.L.P.M.E.S.F}$$$$V2.0
Python script (HELM to TABLE):
MOL blocks are extracted from the database and converted to HELM via the Biomolecular Toolkit API.
The sequence is extracted from each HELM and arranged in a spreadsheet.
The Python script pulls apart the HELMs and arranges the residues left to right.
A setup like this allows compound data to be integrated directly with sequence information for a large linear peptide library.
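The HELM-to-table step for simple linear peptides can be sketched with the standard library alone. This assumes a single PEPTIDE polymer, an empty connection section and no inline SMILES monomers (which could themselves contain dots); the function names are illustrative and this is not the script used in this work.

```python
import csv
import io
import re

# One polymer followed by the first "$", e.g. PEPTIDE1{R.A.R.Y.[D-Leu]}$$$$V2.0
SIMPLE_PEPTIDE_RE = re.compile(r"^PEPTIDE\d+\{([^}]*)\}\$")


def helm_to_residues(helm):
    """Split a single-chain peptide HELM into monomer abbreviations (brackets stripped)."""
    match = SIMPLE_PEPTIDE_RE.match(helm)
    if match is None:
        raise ValueError(f"not a simple linear peptide HELM: {helm}")
    return [token.strip("[]") for token in match.group(1).split(".")]


def helms_to_csv(helms):
    """Lay residues out left to right under P1..Pn position columns."""
    rows = [helm_to_residues(helm) for helm in helms]
    width = max(len(row) for row in rows)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([f"P{i}" for i in range(1, width + 1)])
    for row in rows:
        writer.writerow(row + [""] * (width - len(row)))  # pad shorter peptides
    return buffer.getvalue()


helms = [
    "PEPTIDE1{R.A.R.Y.[D-Leu].P.M.E.S.F}$$$$V2.0",
    "PEPTIDE1{R.A.R.H.[D-Leu].P.M.E.S.F}$$$$V2.0",
    "PEPTIDE1{R.A.R.H.L.P.M.E.S.F}$$$$V2.0",
]
print(helms_to_csv(helms))
```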
Peptides with Simple Modifications
Table:
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
R A C* Y D-Leu P M C* S F
R A C* H D-Leu P M C* S F
… … … … … … … … … …
HELM (peptide section, then connection section):
PEPTIDE1{R.A.C.Y.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0
PEPTIDE1{R.A.C.H.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0
(Schematic not reproduced.)
Simple modifications to amino acids can be incorporated.
Cyclic molecules can be handled programmatically or manually.
Lay out residues left to right and use the HELM connection section to add a mark (*) to residues involved in a modification.
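A sketch of the star-marking step, assuming intra-chain connections written as `PEPTIDE1,PEPTIDE1,3:R3-8:R3`; `starred_row` is a hypothetical helper name, not part of any toolkit.

```python
import re

# Intra-chain link such as "PEPTIDE1,PEPTIDE1,3:R3-8:R3" (polymer, polymer, pos:Rn-pos:Rn).
CONNECTION_RE = re.compile(r"(\w+),(\w+),(\d+):R\d+-(\d+):R\d+")


def starred_row(helm, polymer="PEPTIDE1"):
    """Residues of one peptide, with '*' appended to residues used in an intra-chain link."""
    polymer_section, connection_section = helm.split("$")[:2]
    body = re.search(re.escape(polymer) + r"\{([^}]*)\}", polymer_section).group(1)
    residues = [token.strip("[]") for token in body.split(".")]
    marked = set()
    for first, second, pos1, pos2 in CONNECTION_RE.findall(connection_section):
        if first == second == polymer:
            marked.update({int(pos1), int(pos2)})
    return [res + "*" if i in marked else res for i, res in enumerate(residues, start=1)]


print(starred_row("PEPTIDE1{R.A.C.Y.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0"))
```

With a single disulfide the two starred cysteines are unambiguous. Given two connections, as in the third peptide on the next slide, four residues receive a * and the row no longer records which residues are paired.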
Peptides with Simple Modifications
Table:
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
R A C* Y D-Leu P M C* S F
R A C* H D-Leu P M C* S F
R K* C* H D-Leu E* M C* S F
… … … … … … … … … …
HELM (peptide section, then connection section):
PEPTIDE1{R.A.C.Y.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0
PEPTIDE1{R.A.C.H.[D-Leu].P.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3$$$V2.0
PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3|PEPTIDE1,PEPTIDE1,2:R3-7:R3$$$V2.0
This system quickly breaks down: ambiguity is introduced when a complex peptide carries multiple modifications, because the * marks do not show which residues are connected to each other.
Numerical Notation of Peptide Modifications in Tables
P1 P2 P3 P4 P5 P6 P7 P8 P9 P10
R K{1&3} C{2&3} H D-Leu E{1&3} M C{2&3} S F
R K* C* H D-Leu E* M C* S F
… … … … … … … … … …
PEPTIDE1{R.K.C.Y.[D-Leu].E.M.C.S.F}$PEPTIDE1,PEPTIDE1,3:R3-8:R3|PEPTIDE1,PEPTIDE1,2:R3-6:R3$$$V2.0
Parse the HELM connections (colour-coded on the slide) to map out residue-to-residue connections.
Replace the ambiguous * symbols with unambiguous notation.
The new notation can be translated directly back to HELM (and hence to MOL blocks).
Braces ({}) and “&” provide a regular-expression “hook” for scripting.
The first number identifies the connection (K{1&3} pairs with E{1&3}); the second number, e.g. the 3 in {2&3}, denotes the R group of the residue that makes the connection.
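A round-trip sketch of the brace notation: the first number is the connection, the second the R-group number. It assumes connections are numbered left to right by their first residue, which reproduces the K{1&3} / C{2&3} pattern above, and that multi-character monomer names are bracketed in HELM. Both function names are hypothetical.

```python
import re


def to_brace_notation(residues, connections):
    """Annotate table cells as e.g. K{1&3} (connection number & R-group number).

    connections: (pos1, rgroup1, pos2, rgroup2) tuples with 1-based positions.
    ASSUMPTION: connections are numbered left to right by their first residue.
    """
    cells = list(residues)
    ordered = sorted(connections, key=lambda c: min(c[0], c[2]))
    for conn_id, (pos1, r1, pos2, r2) in enumerate(ordered, start=1):
        cells[pos1 - 1] += f"{{{conn_id}&{r1.lstrip('R')}}}"
        cells[pos2 - 1] += f"{{{conn_id}&{r2.lstrip('R')}}}"
    return cells


def from_brace_notation(cells, polymer="PEPTIDE1"):
    """Rebuild a single-chain HELM string from brace-annotated table cells."""
    residues, ends = [], {}
    for pos, cell in enumerate(cells, start=1):
        residues.append(cell.split("{", 1)[0])
        # The braces and '&' act as the regular-expression "hook".
        for conn_id, rgroup in re.findall(r"\{(\d+)&(\d+)\}", cell):
            ends.setdefault(int(conn_id), []).append((pos, rgroup))
    links = []
    for conn_id in sorted(ends):
        (pos1, r1), (pos2, r2) = ends[conn_id]
        links.append(f"{polymer},{polymer},{pos1}:R{r1}-{pos2}:R{r2}")
    # Multi-character monomer abbreviations are written in square brackets in HELM.
    monomers = ".".join(r if len(r) == 1 else f"[{r}]" for r in residues)
    return f"{polymer}{{{monomers}}}$" + "|".join(links) + "$$$V2.0"


cells = to_brace_notation(
    ["R", "K", "C", "Y", "D-Leu", "E", "M", "C", "S", "F"],
    [(3, "R3", 8, "R3"), (2, "R3", 6, "R3")],
)
print(cells)
print(from_brace_notation(cells))
```

The rebuilt HELM lists the connections in brace-number order, which differs from the original string but describes the same molecule.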
Mixing Peptides and Chemical Modifications in Tables
Ideally, HELM chunks would be displayed in a “left-to-right” fashion, in line with our data tables:
CHEM1{[Ac]}|PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}|CHEM2{[NH2]}$PEPTIDE1,CHEM1,1:R1-1:R1|PEPTIDE1,CHEM2,10:R2-1:R1$$$V2.0
In reality, the ordering of HELM chunks can seem arbitrary:
PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}|CHEM1{[Ac]}|CHEM2{[NH2]}$PEPTIDE1,CHEM1,1:R1-1:R1|PEPTIDE1,CHEM2,10:R2-1:R1$$$V2.0
How can we reliably get to the left-to-right representation starting from any HELM?
A Python script breaks up the HELM string using regular expressions:
PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}
CHEM1{[Ac]}
CHEM2{[NH2]}
PEPTIDE1,CHEM1,1:R1-1:R1 --> [PEPTIDE1,CHEM1,1,R1,1,R1]
PEPTIDE1,CHEM2,10:R2-1:R1 --> [PEPTIDE1,CHEM2,10,R2,1,R1]
Take advantage of R-group numbering [R1 = left, R2 = right, ..].
Apply logical rules to determine the relative orientation of the chunks.
Arrange the polymer chunks, then extract the residues: [CHEM1] -> [PEPTIDE1] -> [CHEM2]
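One way the chunk-ordering rules could look in Python. This sketch assumes only PEPTIDE and CHEM polymers, standard `pos:Rn-pos:Rn` connections and a known backbone chunk: a chunk attached through the backbone's R1 goes to the left, one attached through R2 goes to the right, and other links are ignored. The regexes and function names are illustrative.

```python
import re

# Polymer chunks such as PEPTIDE1{R.K.C} or CHEM1{[Ac]}.
POLYMER_RE = re.compile(r"((?:PEPTIDE|CHEM)\d+)\{([^}]*)\}")
# Connections such as PEPTIDE1,CHEM2,10:R2-1:R1.
CONNECTION_RE = re.compile(r"(\w+),(\w+),(\d+):(R\d+)-(\d+):(R\d+)")


def split_helm(helm):
    """Return ({chunk_id: [monomers]}, [(id1, id2, pos1, r1, pos2, r2), ...])."""
    sections = helm.split("$")
    polymers = {
        chunk: [monomer.strip("[]") for monomer in body.split(".")]
        for chunk, body in POLYMER_RE.findall(sections[0])
    }
    connections = [
        (id1, id2, int(pos1), r1, int(pos2), r2)
        for id1, id2, pos1, r1, pos2, r2 in CONNECTION_RE.findall(sections[1])
    ]
    return polymers, connections


def left_to_right(polymers, connections, backbone):
    """Order chunks by the backbone R group they use (R1 = left, R2 = right)."""
    left, right = [], []
    for id1, id2, _pos1, r1, _pos2, r2 in connections:
        if id1 == backbone and id2 != backbone:
            other, rgroup = id2, r1
        elif id2 == backbone and id1 != backbone:
            other, rgroup = id1, r2
        else:
            continue  # self-links and links that do not touch the backbone
        if rgroup == "R1":
            left.append(other)
        elif rgroup == "R2":
            right.append(other)
    order = left + [backbone] + right
    row = [monomer for chunk in order for monomer in polymers[chunk]]
    return order, row


helm = ("PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}|CHEM1{[Ac]}|CHEM2{[NH2]}"
        "$PEPTIDE1,CHEM1,1:R1-1:R1|PEPTIDE1,CHEM2,10:R2-1:R1$$$V2.0")
polymers, connections = split_helm(helm)
print(left_to_right(polymers, connections, "PEPTIDE1"))
```

Both chunk orderings shown above produce the same result, because the order is derived from the R groups rather than from the position of each chunk in the string.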
Dealing with Branched Peptides in Tables
PEPTIDE1{R.K.C.Y.[D-Leu].P.E.C.S.F}|CHEM1{[Ac]}|CHEM2{[NH2]}|PEPTIDE2{K.F}$CHEM1,PEPTIDE1,1:R1-1:R1|PEPTIDE1,CHEM2,10:R2-1:R1|PEPTIDE1,PEPTIDE2,7:R3-1:R1$$$V2.0
Determine the longest chunk in the HELM string: the chunk with the most residues is, by definition, the backbone.
Which chunks are connected to the current backbone by R1/R2 connections?
Push chunks “not in the backbone” to dedicated branch columns. Using the brace notation ensures that the HELM can be reconstituted.
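A sketch of the backbone/branch split, taking chunks and connections already parsed into Python structures (the shapes a regex splitter like the one on the previous slide would produce). Picking the first chunk on a tie, and ignoring chunks attached to other branches, are assumptions of this sketch.

```python
def classify_chunks(polymers, connections):
    """Split HELM chunks into backbone, in-line (R1/R2) chunks and branches.

    polymers: {chunk_id: [monomers]}
    connections: [(id1, id2, pos1, rgroup1, pos2, rgroup2), ...]
    """
    # The chunk with the most residues is the backbone (first one wins on a tie).
    backbone = max(polymers, key=lambda chunk: len(polymers[chunk]))
    in_line, branches = [], {}
    for id1, id2, pos1, r1, pos2, r2 in connections:
        if id1 == id2:
            continue  # intra-chunk links (e.g. disulfides) stay in the brace notation
        if id1 == backbone:
            other, position, rgroup = id2, pos1, r1
        elif id2 == backbone:
            other, position, rgroup = id1, pos2, r2
        else:
            continue  # chunk attached to a branch rather than the backbone: not handled
        if rgroup in ("R1", "R2"):
            in_line.append(other)  # extends the backbone to the left or right
        else:
            branches[other] = (position, rgroup)  # goes to a dedicated branch column
    return backbone, in_line, branches


polymers = {
    "PEPTIDE1": ["R", "K", "C", "Y", "D-Leu", "P", "E", "C", "S", "F"],
    "CHEM1": ["Ac"],
    "CHEM2": ["NH2"],
    "PEPTIDE2": ["K", "F"],
}
connections = [
    ("CHEM1", "PEPTIDE1", 1, "R1", 1, "R1"),
    ("PEPTIDE1", "CHEM2", 10, "R2", 1, "R1"),
    ("PEPTIDE1", "PEPTIDE2", 7, "R3", 1, "R1"),
]
print(classify_chunks(polymers, connections))
```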
Peptide Backbone Alignments to Templates
Figure: unaligned sequences are mapped to “natural analogues” and aligned to a template, giving aligned sequences.
Adding chemical groups to peptides means that members of a peptide library contain different numbers of monomers.
Aligning library members to a reference molecule (the template) organizes the sequences into one frame of reference.
The template can be the native ligand or an assay standard.
Modern peptide libraries contain many unnatural and modified amino acids.
Highly complex residues are registered in the Monomer DB with a natural analogue (one of the 20 natural amino acids).
Each peptide residue is transformed to its natural analogue (or X) using a Monomer Abbreviation : Natural Analogue dictionary, e.g. A: A, D-Leu: L, Ac: X.
The analogue sequence is aligned to the template using the BLOSUM62 substitution matrix in Biopython, and the aligned sequence is then mapped back to the original residues.
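The analogue-mapping and map-back steps can be sketched as follows. The alignment helper uses Biopython's `PairwiseAligner` with the BLOSUM62 matrix; the gap penalties are assumed values, not ones stated in this talk, and Biopython is imported inside that helper so the pure-Python functions run without it. All function names are illustrative.

```python
NATURAL = set("ACDEFGHIKLMNPQRSTVWY")


def to_analogue_string(residues, analogues):
    """One-letter string of natural analogues; monomers without one become X."""
    return "".join(
        analogues.get(res, res if res in NATURAL else "X") for res in residues
    )


def align_blocks(template, query):
    """Global BLOSUM62 alignment of two one-letter strings with Biopython."""
    from Bio import Align  # lazy import: only this helper needs Biopython
    from Bio.Align import substitution_matrices

    aligner = Align.PairwiseAligner()
    aligner.mode = "global"
    aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
    aligner.open_gap_score = -10.0  # assumed penalties, not taken from the talk
    aligner.extend_gap_score = -0.5
    best = aligner.align(template, query)[0]
    return best.aligned  # (template blocks, query blocks) of (start, end) pairs


def map_back(blocks, template_length, residues):
    """Put each original monomer under the template position it aligned to ('-' = gap)."""
    row = ["-"] * template_length
    template_blocks, query_blocks = blocks
    for (t_start, t_end), (q_start, _q_end) in zip(template_blocks, query_blocks):
        for offset in range(t_end - t_start):
            row[t_start + offset] = residues[q_start + offset]
    return row


analogues = {"A": "A", "D-Leu": "L", "Ac": "X"}
print(to_analogue_string(["Ac", "R", "D-Leu", "F"], analogues))  # XRLF
```

A full pass would be `map_back(align_blocks(template, to_analogue_string(residues, analogues)), len(template), residues)`. Monomers that fall in insertions relative to the template are dropped by `map_back` in this sketch; they could instead be pushed to extra columns.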
GLP-1: A Peptide Target for Type 2 Diabetes
Efficacy values of GLP1 and GCG at their receptors (pEC50):
Receptor  GLP1  GCG
GLP1R  10.7  8.6
GCGR  5.8  9.6
Structures shown: GLP-1; GCG; GLP1R with GLP1 (PDB: 5VAI).
Class B G Protein-Coupled Receptors (GPCRs) have been identified as targets for a broad range of diseases, including disorders of glucose metabolism and bone composition, as well as inflammation and pain.
GLP-1 Receptor (GLP1R) and Glucagon Receptor (GCGR) are closely related Class B GPCRs and bind long helical peptides of approximately 30 residues.
GLP-1, a 30-residue peptide, is the endogenous ligand of the GLP-1 receptor. It is closely homologous to glucagon (GCG) in structure and function.
Several GLP-1 analogues have been approved as treatments for type 2 diabetes.
Databases such as Reaxys Med Chem and ChEMBL hold a rich set of data for peptide analogues of GLP1 and GCG, which can be computationally linked to the chemical structures.
Heptares PDB structures: 5NX2 (GLP1), 5EE7 (GCGR).
GLP-1 and Glucagon Peptide Data Extraction and Processing
Patent and literature efficacy data for GLP1R and GCGR ligands were extracted from the databases as SD files.
The molecule files were filtered for peptides by a substructure search with a tripeptide SMARTS pattern.
The Reaxys Med Chem and ChEMBL databases contain ~700 GCG peptide analogues and over 2500 GLP-1 peptide analogues.
The Monomer DB was updated to include all residues present in the GLP1/GCG collection.
The structures were transformed to HELMs using the Biomolecular Toolkit and passed through the HELM processing workflow.
Aligned GLP-1 Peptides in Tabular Form
Molecules in the GLP-1 dataset were aligned using GLP-1 as the template.
Sequences are generated directly from the database MOL blocks, so all associated pharmacological and physical property fields are present.
Different databases use different field codes for activities and properties, so these must be mapped to common names.
Older patents and literature contain dirtier data (for example, peptides with no stereochemistry in the MOL blocks), which must be dealt with.
Statistical analyses and other learning methods can now be applied to the sequence columns.
Figure panels: aligned GLP-1 sequences; GLP-1 data.
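Field-code harmonisation can be as simple as a per-source rename table applied to each SD record. All source keys and tag names below are hypothetical placeholders, not the actual fields exported by Reaxys Med Chem or ChEMBL, and `harmonise` is an illustrative name.

```python
# Hypothetical per-source SD tag names; replace with the real tags in each export.
FIELD_MAP = {
    "source_a": {"PACT": "p_activity", "ASSAY": "assay_type"},
    "source_b": {"pAct_value": "p_activity", "assay_kind": "assay_type"},
}


def harmonise(record, source):
    """Rename source-specific fields to common names and parse the activity as a float."""
    mapping = FIELD_MAP[source]
    harmonised = {}
    for key, value in record.items():
        name = mapping.get(key, key)  # unmapped fields (e.g. HELM) pass through unchanged
        if name == "p_activity":
            value = float(value) if value not in ("", None) else None
        harmonised[name] = value
    return harmonised


print(harmonise({"PACT": "10.7", "ASSAY": "EC50"}, "source_a"))
```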
Residue Level Analysis of Peptide Properties
GLP-1 position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Native residue: H A E G T F T S D V S S Y L E G Q A A K E F I A W L V K G R
Mean pEC50 of substitutions at selected positions:

Substitution  pEC50 (mean)
D-His  5.0
4-ImidProp  10.0
ThiazAla  7.8
Y  10.8

Substitution  pEC50 (mean)
D-Phe  10.7
F  9.7
N-Me-Phe  9.0
a-Me-D-Phe  10.3

Substitution  pEC50 (mean)
1-Nal  8.5
Chg  7.6
D-Nle  6.0
D-Val  10.7
F  7.5
I  8.2
K  10.9
L  10.0

Substitution  pEC50 (mean)
Aib  10.4
Bip  6.0
D-Asn  6.0
E  10.2
G  9.9
K  10.7
N  6.8
S  8.4

Substitution  pEC50 (mean)
D-Trp  8.2
F  9.9
H  10.0
K  9.9
W  9.7

Substitution  pEC50 (mean)
D-Lys  6.0
E  10.2
G  9.9
I  9.7
K  9.0
NH2  10.5
P  10.6

The ability to include positional columns opens up the possibility of using residue identities in data analysis.
Statistical analysis of sequences can generate lists of substitutions that are, or are not, tolerated for a given property of interest.
Knowledge like this can enable intelligent peptide library design.
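Once sequences sit in aligned positional columns, a per-position substitution summary like the one above reduces to a group-by. A sketch, assuming each row holds the aligned monomers (with '-' for gaps) and a numeric pEC50 under a hypothetical row layout; it is not the analysis code behind these tables.

```python
from collections import defaultdict
from statistics import mean


def substitution_means(rows, position, activity="pEC50"):
    """Mean activity per monomer observed at one aligned position (1-based).

    ASSUMPTION: each row is {"residues": [...aligned monomers...], activity: float},
    with "-" marking a gap in the alignment.
    """
    by_monomer = defaultdict(list)
    for row in rows:
        residue = row["residues"][position - 1]
        value = row.get(activity)
        if residue != "-" and value is not None:
            by_monomer[residue].append(value)
    return {monomer: mean(values) for monomer, values in sorted(by_monomer.items())}
```

Comparing each substitution's mean with that of the native residue at the same position is one simple way to flag substitutions as tolerated or not; the threshold would be a project-specific choice.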
Analysis of PK/PD Property-Modifying Groups
GLP-1 position: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Native residue: H A E G T F T S D V S S Y L E G Q A A K E F I A W L V K G R
Chart: average length of appended monomers at each position (y-axis from 1 to 4).
Chart: number of peptides with stabilizing groups, split by position.
Panel: most commonly occurring monomer at selected positions.
GLP-1 analogues are subject to rapid degradation in vivo due to proteolytic activity.
Peptide modification strategies can enhance the PK/PD properties of the peptides and hence their exposure and efficacy.
Common appendages include long-chain PEG groups or fatty acids attached at tolerant residues.
Analysed here are (i) the average length of appended chains, grouped by residue, and (ii) the number of branched peptides in the dataset (278 peptides), grouped by residue.
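Both per-position statistics can be computed from the branch attachments found for each peptide. A sketch, assuming each peptide has already been reduced to a list of (backbone position, number of appended monomers) pairs with at most one appendage per position; this input format and the function name are hypothetical.

```python
from collections import defaultdict


def appendage_stats(peptides):
    """Per backbone position: (number of peptides with an appendage there,
    average number of appended monomers).

    ASSUMPTION: each peptide is a list of (position, n_appended_monomers) pairs,
    with at most one appendage per position.
    """
    lengths = defaultdict(list)
    for attachments in peptides:
        for position, n_monomers in attachments:
            lengths[position].append(n_monomers)
    return {
        position: (len(values), sum(values) / len(values))
        for position, values in sorted(lengths.items())
    }
```

Unbranched peptides contribute nothing, so the counts cover only the branched subset of the dataset.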
Acknowledgements
Discovery - Chemistry: Giles Brown, Miles Congreve, Rebecca Nonoo
Discovery - Computational Chemistry and Informatics: Rob Smith, Ben Tehan, Giovanni Bottegoni, Rob Cooke, Francesca Deflorian, Jon Mason, Juan Carlos Mobarec