Line notations for nucleic acids (both natural and therapeutic)

Roger Sayle

NextMove Software, Cambridge, UK

252nd ACS National Meeting, Philadelphia, PA, Wednesday 24th August 2016

Overview

• This presentation provides an overview of efforts to update and extend the aging 1970 IUPAC-IUB recommendations on nucleic acid notation.

• Examples are given of representational challenges, literature precedents, and the relationship to HELM and Novartis’ sitrack representations.

• IUPAC-IUB Commission on Biochemical Nomenclature (CBN), “Abbreviations and Symbols for Nucleic Acids, Polynucleotides and their Constituents”, http://www.chem.qmul.ac.uk/iupac/misc/naabb.html

Level 1: bioinformatics

• IUPAC70 defines condensed forms for DNA and RNA.

• dG-dA-dT-dC

– dGuo-P-dAdo-P-dThd-P-dCyd

– Gua-dRibf-P-Ade-dRibf-P-Thy-dRibf-P-Cyt-dRibf

– 2'-deoxyguanylyl-(3'->5')-2'-deoxyadenylyl-(3'->5')-thymidylyl-(3'->5')-2'-deoxycytidine

• G-A-U-C

– rGuo-P-rAdo-P-rUrd-P-rCyd

– Gua-Ribf-P-Ade-Ribf-P-Ura-Ribf-P-Cyt-Ribf

– guanylyl-(3'->5')-adenylyl-(3'->5')-uridylyl-(3'->5')-cytidine
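
For the standard residues, converting between these forms is a mechanical lookup. The Python sketch below, offered only as an illustration, expands a one-letter condensed sequence into the nucleoside form and the base-sugar form. The lookup tables and function names are hypothetical, and the sketch assumes that a leading d marks a deoxy residue while unprefixed residues are ribo (written with an explicit r).

```python
# Hypothetical lookup tables: one-letter code -> IUPAC70 nucleoside and base symbols.
NUCLEOSIDE = {"A": "Ado", "C": "Cyd", "G": "Guo", "T": "Thd", "U": "Urd"}
BASE = {"A": "Ade", "C": "Cyt", "G": "Gua", "T": "Thy", "U": "Ura"}


def parse_one_letter(seq: str) -> list[tuple[bool, str]]:
    """Split e.g. 'dG-dA-dT-dC' into (is_deoxy, letter) pairs."""
    residues = []
    for tok in seq.split("-"):
        deoxy = tok.startswith("d")
        letter = tok[1:] if deoxy else tok
        if letter not in BASE:
            raise ValueError(f"unsupported residue: {tok!r}")
        residues.append((deoxy, letter))
    return residues


def to_nucleoside_form(seq: str) -> str:
    """'dG-dA' -> 'dGuo-P-dAdo'; unprefixed (ribo) residues get an explicit 'r'."""
    return "-P-".join(
        ("d" if deoxy else "r") + NUCLEOSIDE[letter]
        for deoxy, letter in parse_one_letter(seq)
    )


def to_base_sugar_form(seq: str) -> str:
    """'dG-dA' -> 'Gua-dRibf-P-Ade-dRibf'; ribo residues use 'Ribf'."""
    return "-P-".join(
        BASE[letter] + ("-dRibf" if deoxy else "-Ribf")
        for deoxy, letter in parse_one_letter(seq)
    )


print(to_nucleoside_form("dG-dA-dT-dC"))  # dGuo-P-dAdo-P-dThd-P-dCyd
print(to_base_sugar_form("G-A-U-C"))      # Gua-Ribf-P-Ade-Ribf-P-Ura-Ribf-P-Cyt-Ribf
```

Generating the systematic names (the third form) would need a further table of "-ylyl" stems and is left out.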

Revision: carbohydrate suffix

• The first revision to IUPAC70 is to better conform to the 1996 IUPAC-IUBMB recommendations on the Nomenclature of Carbohydrates (2-Carb).

– “It is understood that the rings are in pyranose form unless otherwise specified.” (2-Carb-38.4)

• Hence, Ribf is the preferred “canonical” form on output, to avoid ambiguity.

• For backward compatibility, Rib is read as Ribf in nucleic acid sequences, but as Ribp in carbohydrates.
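
The ring-size rule can be sketched as a context-dependent default. In this hypothetical Python helper, a sugar token lacking an f or p ring letter is completed as furanose in nucleic acid context and as pyranose in carbohydrate context, so that output always carries an explicit suffix. The stem list and the context strings are assumptions made for the example, not Sugar & Splice's actual implementation.

```python
import re

# Stems this sketch recognizes (a hypothetical, deliberately short list).
STEMS = ("Rib", "Ara", "Xyl", "Lyx", "Glc", "Gal", "Man")

# Optional deoxy prefix (d or dd), a stem, an optional ring letter, then any
# substituent suffix such as '2Me' or '2MeOEt'.
SUGAR = re.compile(r"^(d{0,2})(" + "|".join(STEMS) + r")([fp]?)(.*)$")


def canonical_sugar(token: str, context: str) -> str:
    """Return the token with an explicit ring-size letter.

    context is 'nucleic' or 'carbohydrate'. A bare stem such as 'Rib' is read
    as furanose (f) in nucleic acid sequences and as pyranose (p), the 2-Carb
    default, in carbohydrates. Tokens with an explicit ring letter are unchanged.
    """
    if context not in ("nucleic", "carbohydrate"):
        raise ValueError(f"unknown context: {context!r}")
    m = SUGAR.match(token)
    if m is None:
        return token  # not a sugar this sketch knows; pass through unchanged
    deoxy, stem, ring, rest = m.groups()
    if not ring:
        ring = "f" if context == "nucleic" else "p"
    return f"{deoxy}{stem}{ring}{rest}"


for tok in ("Rib", "dRib", "Rib2MeOEt", "Ribp"):
    print(tok, "->", canonical_sugar(tok, "nucleic"))
```

Always writing the ring letter on output means a reader never needs to know which context a token came from.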

Division of monomers

• Sugar and Splice represents nucleic acids as sequences of monomers, following the convention that the 5’ phosphate is associated with the sugar/base [this matches IUPAC and PDB].

• G-A-U-C is “rGuo”, “P-rAdo”, “P-rUrd”, “P-rCyd”.

• HELM currently disagrees, associating each phosphate (PO3H2) with the 3’ end of the preceding nucleotide:

– RNA1{R(G)P.R(A)P.R(U)P.R(C)}$$$$

• Fortunately, HELM also accepts the conventional 5’-phosphate grouping:

– RNA1{R(G).PR(A).PR(U).PR(C)}$$$$
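
The two groupings differ only in which nucleoside each internal phosphate is attached to. This minimal sketch (the function name and the phosphate_5prime flag are hypothetical) generates either HELM string for a sequence of natural bases, handling only ribose (R) and phosphate (P) monomers.

```python
def helm_rna(bases: list[str], phosphate_5prime: bool) -> str:
    """Build a simple HELM RNA string from natural base letters.

    phosphate_5prime=False groups each phosphate with the preceding (3')
    nucleoside, e.g. R(G)P.R(A); phosphate_5prime=True groups it with the
    following (5') nucleoside, e.g. R(G).PR(A). No terminal phosphates are added.
    """
    n = len(bases)
    monomers = []
    for i, base in enumerate(bases):
        if phosphate_5prime:
            prefix = "P" if i > 0 else ""  # first residue has no 5' phosphate
            monomers.append(f"{prefix}R({base})")
        else:
            suffix = "P" if i < n - 1 else ""  # last residue has no 3' phosphate
            monomers.append(f"R({base}){suffix}")
    return "RNA1{" + ".".join(monomers) + "}$$$$"


print(helm_rna(list("GAUC"), phosphate_5prime=False))  # RNA1{R(G)P.R(A)P.R(U)P.R(C)}$$$$
print(helm_rna(list("GAUC"), phosphate_5prime=True))   # RNA1{R(G).PR(A).PR(U).PR(C)}$$$$
```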

The perception problem 1

Multiple tautomeric forms of guanine

The perception problem 2

Multiple protonation states of thymine

The perception problem 3

Multiple mesomeric forms of 7-methylguanine

Level 2: natural RNA variants

• IUPAC70 provides the framework for handling the many “non-standard” RNA features often observed.

• Unusual bases:

– m2Gua, m3Ura, m5Cyt, m7Gua, hUra, Yra, Wyb, br5Ura, Hyp, Xan, etc.

• Unusual backbones:

– Ribf2Me, P-P-P-rAdo

• Sugar & Splice’s proposed syntax closely resembles that of the University at Albany’s RNA Modification Database, http://mods.rna.albany.edu/mods/
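
Symbols like these share a regular shape: lowercase modifier abbreviations (m for methyl, for instance) with optional locant digits, a three-letter parent base, and optionally an (R) or (S) descriptor as in Wyb(R). A minimal tokenizer under those assumptions follows. It also assumes single-digit locants, so m22Gua reads as two methyl groups at position 2. It is an illustration inferred from the examples here, not the grammar of Sugar & Splice or of the RNA Modification Database.

```python
import re

# Modifier: lowercase abbreviation (e.g. m for methyl) plus optional single-digit locants.
MOD = re.compile(r"([a-z]+)(\d*)")
# Whole symbol: optional modifier prefix, three-letter parent base, optional (R)/(S).
SYMBOL = re.compile(r"^((?:[a-z][a-z0-9]*)?)([A-Z][a-z]{2})(?:\(([RS])\))?$")


def parse_base(symbol: str) -> dict:
    """Decompose e.g. 'm22Gua' or 'h56m5Ura(S)' into parent, modifiers and stereo."""
    m = SYMBOL.match(symbol)
    if m is None:
        raise ValueError(f"not a base symbol: {symbol!r}")
    prefix, parent, stereo = m.groups()
    mods = [(name, [int(d) for d in locants]) for name, locants in MOD.findall(prefix)]
    return {"parent": parent, "mods": mods, "stereo": stereo}


for s in ("m22Gua", "hUra", "br5Ura", "h56m5Ura(S)", "Wyb(R)"):
    print(s, parse_base(s))
```

The prefix pattern is kept linear (one leading letter, then letters or digits) so malformed input fails quickly instead of backtracking.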

IUPAC condensed form for PDB 4TNA

• The perceived line notation/identifier for 4TNA is:

P-rGuo-P-rCyd-P-rGuo-P-rGuo-P-rAdo-P-rUrd-P-rUrd-P-rUrd-P-rAdo-P-m2Gua-Ribf-P-rCyd-P-rUrd-P-rCyd-P-rAdo-P-rGuo-P-hUra-Ribf-P-hUra-Ribf-P-rGuo-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-rAdo-P-rGuo-P-rCyd-P-m22Gua-Ribf-P-rCyd-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-Cyt-Ribf2Me-P-rUrd-P-Gua-Ribf2Me-P-rAdo-P-rAdo-P-Wyb(R)-Ribf-P-rAdo-P-rYrd-P-m5Cyt-Ribf-P-rUrd-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-m7Gua-Ribf-P-rUrd-P-rCyd-P-m5Cyt-Ribf-P-rUrd-P-rGuo-P-rUrd-P-rGuo-P-rThd-P-rYrd-P-rCyd-P-rGuo-P-m1Ade-Ribf-P-rUrd-P-rCyd-P-rCyd-P-rAdo-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-rAdo-P-rUrd-P-rUrd-P-rCyd-P-rGuo-P-rCyd-P-rAdo-P-rCyd-P-rCyd-P-rAdo
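
Because each residue carries its 5' phosphate, a string like the one above can be split back into residues by starting a new residue at each phosphate token. The sketch below does that and lists the residues that need an explicit base plus sugar pair (for example m2Gua-Ribf). The set of phosphate tokens and the function names are assumptions, and the sketch does not handle sugar prefixes that themselves contain a hyphen, such as L-Ribf.

```python
PHOSPHATES = {"P", "sP", "RsP", "SsP"}  # linkage tokens this sketch recognizes (assumption)


def group_residues(condensed: str) -> list[list[str]]:
    """Split a condensed string into residues (5'-phosphate convention).

    A phosphate token starts a new residue unless the current residue holds
    only phosphates so far, so 'P-P-P-rAdo' stays a single residue.
    """
    residues: list[list[str]] = []
    current: list[str] = []
    for tok in condensed.split("-"):
        if tok in PHOSPHATES and any(t not in PHOSPHATES for t in current):
            residues.append(current)
            current = []
        current.append(tok)
    if current:
        residues.append(current)
    return residues


def base_sugar_residues(condensed: str) -> list[tuple[int, str]]:
    """1-based positions of residues written as a separate base and sugar."""
    out = []
    for i, res in enumerate(group_residues(condensed), start=1):
        body = [t for t in res if t not in PHOSPHATES]
        if len(body) > 1:
            out.append((i, "-".join(body)))
    return out


excerpt = "P-rGuo-P-m2Gua-Ribf-P-Cyt-Ribf2Me-P-rCyd"
print(base_sugar_residues(excerpt))  # [(2, 'm2Gua-Ribf'), (3, 'Cyt-Ribf2Me')]
```

Note that residues such as rThd and rYrd are modified too, but have single nucleoside symbols and so are not listed by this helper.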

The 7-methylguanosine debate

m7Gua (Wikipedia) m7h8Gua (PDB’s 7MG)

m7Gua (Sigma-Aldrich)

Base stereochemistry

h56m5Ura h56m5Ura(S) PDB: QBT

h56m5Ura(R) PDB: PBT

h34Ura h34Ura(S) h34Ura(R) PDB: DDN

Level 3: nucleic acid therapeutics

• Non-standard phosphates

– Thiophosphates (sP)

• Non-standard sugars

– dRibf2F, L-Ribf (spiegelmer), Ribf2MeOEt, ddRibf, Araf, Araf2Me, dAraf2F

– Morpholino, tricyclo, lockedRibf, cEt constrained

• Non-standard topologies

– Head-to-Head, Tail-to-Tail and mixed sequences.

– Cyclic sequences.

Examples: arabinosyl nucleosides

• Clevudine m5Ura-dAraf2F

• Clofarabine cl2Ade-dAraf2F

• Cytarabine Cyt-Araf

• Fludarabine fl2Ade-Araf

• Nelarabine m6Gua-Araf

• Sorivudine brvin5Ura-Araf

• Vidarabine Ade-Araf

https://en.wikipedia.org/wiki/Arabinosyl_nucleosides
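
Each entry above is a base symbol followed by a sugar symbol. The hypothetical Python helper below splits the pair at the first hyphen and decomposes the sugar into an optional L- prefix, a d or dd deoxy prefix, a three-letter stem, a ring letter, and a free-form substituent suffix such as 2F. That grammar is inferred from the examples on these slides rather than taken from a published specification, and a sugar without an L- prefix is simply assumed to be D.

```python
import re

# Sugar symbol: optional L- prefix, d/dd (deoxy) prefix, three-letter stem,
# ring letter f or p, then any substituent suffix (e.g. 2F, 2Me, 2MeOEt).
SUGAR = re.compile(r"^(L-)?(d{0,2})([A-Z][a-z]{2})([fp])(.*)$")


def split_nucleoside(symbol: str) -> tuple[str, dict]:
    """Split 'base-sugar' (e.g. 'cl2Ade-dAraf2F') into the base symbol and sugar parts."""
    base, sep, sugar = symbol.partition("-")  # base symbols contain no hyphen
    m = SUGAR.match(sugar) if sep else None
    if m is None:
        raise ValueError(f"expected base-sugar notation: {symbol!r}")
    l_prefix, deoxy, stem, ring, subst = m.groups()
    return base, {
        "enantiomer": "L" if l_prefix else "D",  # D assumed when no L- prefix
        "deoxy": deoxy,
        "stem": stem,
        "ring": ring,
        "substituent": subst,
    }


for drug in ("m5Ura-dAraf2F", "cl2Ade-dAraf2F", "Cyt-Araf", "brvin5Ura-Araf"):
    print(drug, split_nucleoside(drug))
```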

Mipomersen (kynamro)

• Gua-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Ura-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-dAdo-sP-dGuo-sP-dThd-sP-m5Cyt-dRibf-sP-dThd-sP-dGuo-sP-m5Cyt-dRibf-sP-dThd-sP-dThd-sP-m5Cyt-dRibf-sP-Gua-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-Ade-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt

• The ChEMBL structure, CHEMBL2219536, is almost certainly wrong because of its thiophosphate connectivity.
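
In a string this long, a truncated or misspelled symbol (Ribf2MeOt for Ribf2MeOEt, say) is easy to miss by eye. A minimal checker, assuming small hypothetical vocabularies that cover only the symbols in this one sequence, counts linkage tokens and reports anything it does not recognize; a real tool would consult a full residue dictionary.

```python
from collections import Counter

# Hypothetical vocabularies covering only the symbols used in the mipomersen string.
LINKAGES = {"P", "sP", "RsP", "SsP"}
NUCLEOSIDES = {"dAdo", "dGuo", "dThd", "dCyd"}
BASES = {"Gua", "Ade", "m5Cyt", "m5Ura"}
SUGARS = {"dRibf", "Ribf2MeOEt"}


def check_condensed(condensed: str) -> tuple[Counter, list[str]]:
    """Count linkage types and report tokens outside the known vocabularies."""
    tokens = condensed.split("-")
    linkages = Counter(t for t in tokens if t in LINKAGES)
    known = LINKAGES | NUCLEOSIDES | BASES | SUGARS
    unknown = [t for t in tokens if t not in known]
    return linkages, unknown


counts, unknown = check_condensed("Gua-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOt-sP-m5Cyt-dRib")
print(counts, unknown)  # Counter({'sP': 2}) ['Ribf2MeOt', 'dRib']
```

Applied to the 20-mer above with all symbols spelled correctly, it would report 19 sP linkages and no unknown tokens.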

Thiophosphate stereochemistry

• Not only does ChEMBL get the thiophosphate connectivity wrong, but HELM also ignores thiophosphate stereochemistry, which influences siRNA activity:

– Hartmut Jahns, Jonathan Hall, et al. “Stereochemical Bias Introduced During RNA Synthesis Modulates the Activity of Phosphorothioate siRNAs”, Nature Communications, Vol. 6, p. 6317, March 2015.

• Hence Sugar & Splice also supports RsP and SsP.

– For example, dC-RsP-dC has the SMILES string c1cn(c(=O)nc1N)[C@H]2C[C@@H]([C@H](O2)CO[P@](=S)(O)O[C@H]3C[C@@H](O[C@@H]3CO)n4ccc(nc4=O)N)O
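
One consequence of leaving thiophosphate stereochemistry unspecified can be shown with simple arithmetic: each undefined P-stereocentre has two possible configurations, so n plain sP linkages describe a mixture of up to 2^n diastereomers, while RsP and SsP fix the configuration. The sketch below (hypothetical names) computes that factor from a condensed string, ignoring any other stereocentres.

```python
# Hypothetical classification of linkage tokens by phosphorus configuration.
LINKAGE_STEREO = {"P": "not a stereocentre", "RsP": "R", "SsP": "S", "sP": "undefined"}


def p_diastereomer_count(condensed: str) -> int:
    """Count stereoisomers arising from undefined thiophosphate centres.

    Every 'sP' without an R/S prefix can take either configuration, so n such
    linkages give up to 2**n diastereomers. Sugar stereocentres are assumed
    fixed and are not counted.
    """
    undefined = sum(
        1 for tok in condensed.split("-") if LINKAGE_STEREO.get(tok) == "undefined"
    )
    return 2 ** undefined


print(p_diastereomer_count("dC-RsP-dC"))               # 1
print(p_diastereomer_count("-sP-".join(["dC"] * 20)))  # 524288
```

A fully phosphorothioated 20-mer such as mipomersen has 19 sP linkages, giving 2^19 = 524,288 possible P-diastereomers.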

On-going and future work

• Chemists continue to push back the boundaries of chemical space around nucleic acid therapeutics.

• NextMove Software endeavors to extend the state of the art in nucleic acid perception, hopefully continuing to close the “RNA informatics gap”.

Acknowledgements

• Joann Prescott-Roy, Novartis, Cambridge, MA.

• Evan Bolton, PubChem, NCBI, Bethesda, MD.

• Noel O’Boyle, NextMove Software, UK.

• Thank you for your time.

PDB nucleic acids

IUPAC/SnS          PDB
P-dAdo             _DA
P-dCyd             _DC
P-dGuo             _DG
P-dThd             _DT
P-dUrd             _DU
P-rAdo             __A
P-rCyd             __C
P-rGuo             __G
P-rUrd             __U
P-rThd             5MU
P-rYrd             PSU
P-Wyb(R)-Ribf      _YG
P-m1Ade-Ribf       1MA
P-m2Gua-Ribf       2MG
P-Thy-Ribf2Me      2MU
P-m5Cyt-Ribf       5MC
P-m7Gua-Ribf       7MG
P-Ade-Ribf2Me      A2M
P-m22Gua-Ribf      M2G
P-Cyt-Ribf2Me      OMC
P-Gua-Ribf2Me      OMG
P-Ura-Ribf2Me      OMU
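
A lookup like this table is straightforward to apply residue by residue. The sketch below assumes every residue (including the first) carries its 5' phosphate, splits a condensed string on -P-, and maps each residue to a PDB component code, returning ? for anything missing. It assumes the leading underscores in the table only pad codes to three characters and strips them, and it maps rGuo to G and rUrd to U, the standard PDB residue names for guanosine and uridine monophosphate. The dictionary and function names are hypothetical.

```python
# Residue -> PDB component code, with padding underscores stripped (assumption).
SNS_TO_PDB = {
    "P-dAdo": "DA", "P-dCyd": "DC", "P-dGuo": "DG", "P-dThd": "DT", "P-dUrd": "DU",
    "P-rAdo": "A", "P-rCyd": "C", "P-rGuo": "G", "P-rUrd": "U",
    "P-rThd": "5MU", "P-rYrd": "PSU", "P-Wyb(R)-Ribf": "YG",
    "P-m1Ade-Ribf": "1MA", "P-m2Gua-Ribf": "2MG", "P-Thy-Ribf2Me": "2MU",
    "P-m5Cyt-Ribf": "5MC", "P-m7Gua-Ribf": "7MG", "P-Ade-Ribf2Me": "A2M",
    "P-m22Gua-Ribf": "M2G", "P-Cyt-Ribf2Me": "OMC", "P-Gua-Ribf2Me": "OMG",
    "P-Ura-Ribf2Me": "OMU",
}


def to_pdb(condensed: str) -> list[str]:
    """Map each 5'-phosphorylated residue of a condensed string to a PDB code."""
    if not condensed.startswith("P-"):
        raise ValueError("this sketch expects every residue to carry a 5' phosphate")
    residues = ["P-" + r for r in condensed[2:].split("-P-")]
    return [SNS_TO_PDB.get(r, "?") for r in residues]


print(to_pdb("P-rGuo-P-rCyd-P-m2Gua-Ribf-P-Wyb(R)-Ribf"))  # ['G', 'C', '2MG', 'YG']
```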

PDB monosaccharides: α-D-, α-L-, β-D-, β-L-

allHexp AFD ALL

altHexp SHD

araHexp ARA ARB

araPenf BXY AHR BXX FUB

galHexp GLA GXL GAL

glcHexp GLC BGC

gulHexp GUP GL0

lyxHexp LDY

manHexp MAN BMA

ribHexp RIP

ribPenf RIB BDR

xylHexp XYS HSY XYP LXC

xylPenf XYZ