Line notations for nucleic acids (both natural and therapeutic)

Roger Sayle

NextMove Software, Cambridge, UK

252nd ACS National Meeting, Philadelphia, PA, Wednesday 24th August 2016

Overview

• This presentation provides an overview of efforts to update and extend the aging 1970 IUPAC/IUBMB recommendations on nucleic acid notations.

• Examples are given of representational challenges, literature precedents, and the relationship to HELM and Novartis’ sitrack representations.

• IUPAC-IUB Commission on Biochemical Nomenclature (CBN), “Abbreviations and Symbols for Nucleic Acids, Polynucleotides and their Constituents”, http://www.chem.qmul.ac.uk/iupac/misc/naabb.html

Level 1: bioinformatics

• IUPAC70 defines condensed forms for DNA and RNA.

• dG-dA-dT-dC

– dGuo-P-dAdo-P-dThd-P-dCyd

– Gua-dRibf-P-Ade-dRibf-P-Thy-dRibf-P-Cyt-dRibf

– 2'-deoxy-guanylyl-(3'->5')-2'-deoxy-adenylyl-(3'->5')-thymidylyl-(3'->5')-2'-deoxy-cytidine

• G-A-T-C

– rGuo-P-rAdo-P-rUrd-P-rCyd

– Gua-Ribf-P-Ade-Ribf-P-Ura-Ribf-P-Cyt-Ribf

– guanylyl-(3'->5')-adenylyl-(3'->5')-uridylyl-(3'->5')-cytidine
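
The equivalent forms listed above can be generated mechanically from a one-letter sequence. The following is a minimal sketch, not the Sugar & Splice implementation; the lookup tables and function names are assumptions introduced for illustration, with symbols taken from the examples on this slide.

```python
# Hypothetical lookup tables; symbols follow the slide's examples.
NUCLEOSIDE = {"A": "Ado", "C": "Cyd", "G": "Guo", "T": "Thd", "U": "Urd"}
BASE = {"A": "Ade", "C": "Cyt", "G": "Gua", "T": "Thy", "U": "Ura"}


def to_nucleoside_form(seq, dna=False):
    """Expand e.g. 'GAUC' to 'rGuo-P-rAdo-P-rUrd-P-rCyd'."""
    prefix = "d" if dna else "r"
    return "-P-".join(prefix + NUCLEOSIDE[c] for c in seq)


def to_base_sugar_form(seq, dna=False):
    """Expand e.g. 'GAUC' to 'Gua-Ribf-P-Ade-Ribf-P-Ura-Ribf-P-Cyt-Ribf'."""
    sugar = "dRibf" if dna else "Ribf"
    return "-P-".join(f"{BASE[c]}-{sugar}" for c in seq)


print(to_nucleoside_form("GATC", dna=True))
print(to_base_sugar_form("GAUC"))
```

Each residue is written as base plus sugar, and residues are joined by a 3'->5' phosphate, written "P".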

Revision: carbohydrate suffix

• The first revision/tweak to IUPAC70 is to conform better to IUPAC-IUBMB’s 1996 recommendations on Nomenclature of Carbohydrates, 2-CARB.

– “It is understood that the rings are in pyranose form unless otherwise specified.” (2-Carb-38.4)

• Hence, Ribf is the preferred “canonical” form on output, to avoid ambiguity.

• For backward compatibility, Rib is considered Ribf in nucleic acid sequences, but Ribp in carbohydrates.
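
The context-dependent reading of a bare "Rib" can be sketched as a small normalisation step. This is an illustrative assumption about how such a rule might be coded, not the actual Sugar & Splice logic; the function name and the two context labels are hypothetical.

```python
import re


def canonical_sugar(symbol, context):
    """Make the ring size explicit in a sugar symbol such as 'Rib' or 'dRib'.

    context is a hypothetical label: 'nucleic' reads a bare Rib as furanose
    (Ribf), 'carbohydrate' reads it as pyranose (Ribp). Symbols that already
    carry an 'f' or 'p' suffix are returned unchanged.
    """
    suffix = {"nucleic": "f", "carbohydrate": "p"}[context]
    # Only rewrite 'Rib' when it is not already followed by a ring-size letter.
    return re.sub(r"Rib(?![fp])", "Rib" + suffix, symbol)


print(canonical_sugar("Rib", "nucleic"))       # Ribf
print(canonical_sugar("Rib", "carbohydrate"))  # Ribp
```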

Division of monomers

• Sugar and Splice represents nucleic acids as sequences of monomers, following the convention that the 5’ phosphate is associated with the sugar/base [this matches IUPAC and PDB].

• G-A-U-C is “rGuo”, “P-rAdo”, “P-rUrd”, “P-rCyd”.

• HELM currently disagrees, associating the 3’ phosphate (PO3H2) with each monomer instead.

– RNA1{R(G)P.R(A)P.R(U)P.R(C)}$$$$

• Fortunately, the conventional form is accepted:

– RNA1{R(G).PR(A).PR(U).PR(C)}$$$$
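
The two groupings differ only in which monomer carries each phosphate. The sketch below formats both HELM strings for a plain RNA sequence; it is simple string formatting under that assumption, not a call into any HELM toolkit, and the function name and its `phosphate_on` parameter are hypothetical.

```python
def helm_rna(bases, phosphate_on="3"):
    """Format a simple RNA polymer as a HELM string.

    phosphate_on="3" attaches each phosphate to the preceding monomer
    (HELM's usual grouping); phosphate_on="5" attaches it to the following
    monomer (the IUPAC/PDB grouping). Assumes a non-empty sequence.
    """
    if not bases:
        raise ValueError("sequence must not be empty")
    if phosphate_on == "3":
        monomers = [f"R({b})P" for b in bases[:-1]] + [f"R({bases[-1]})"]
    else:
        monomers = [f"R({bases[0]})"] + [f"PR({b})" for b in bases[1:]]
    return "RNA1{" + ".".join(monomers) + "}$$$$"


print(helm_rna("GAUC"))                    # 3' grouping
print(helm_rna("GAUC", phosphate_on="5"))  # 5' grouping
```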

The perception problem 1

Multiple tautomeric forms of guanine

The perception problem 2

Multiple protonation states of thymine

The perception problem 3

Multiple mesomeric forms of 7-methylguanine

Level 2: natural RNA variants

• IUPAC70 provides the framework for handling the many “non-standard” RNA features often observed.

• Unusual bases:

– m2Gua, m3Ura, m5Cyt, m7Gua, hUra, Yra, Wyb, br5Ura, Hyp, Xan, etc.

• Unusual backbones:

– Ribf2Me, P-P-P-rAdo

• Sugar & Splice’s proposed syntax closely resembles that of the University of Albany’s RNA modification database, http://mods.rna.albany.edu/mods/

IUPAC condensed for PDB 4TNA

• The perceived line notation/identifier for 4TNA is:

P-rGuo-P-rCyd-P-rGuo-P-rGuo-P-rAdo-P-rUrd-P-rUrd-P-rUrd-P-rAdo-P-m2Gua-Ribf-P-rCyd-P-rUrd-P-rCyd-P-rAdo-P-rGuo-P-hUra-Ribf-P-hUra-Ribf-P-rGuo-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-rAdo-P-rGuo-P-rCyd-P-m22Gua-Ribf-P-rCyd-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-Cyt-Ribf2Me-P-rUrd-P-Gua-Ribf2Me-P-rAdo-P-rAdo-P-Wyb(R)-Ribf-P-rAdo-P-rYrd-P-m5Cyt-Ribf-P-rUrd-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-m7Gua-Ribf-P-rUrd-P-rCyd-P-m5Cyt-Ribf-P-rUrd-P-rGuo-P-rUrd-P-rGuo-P-rThd-P-rYrd-P-rCyd-P-rGuo-P-m1Ade-Ribf-P-rUrd-P-rCyd-P-rCyd-P-rAdo-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-rAdo-P-rUrd-P-rUrd-P-rCyd-P-rGuo-P-rCyd-P-rAdo-P-rCyd-P-rCyd-P-rAdo

The 7-methylguanosine debate

• m7Gua (Wikipedia)

• m7h8Gua (PDB’s 7MG)

• m7Gua (Sigma-Aldrich)

Base stereochemistry

h56m5Ura h56m5Ura(S) PDB: QBT

h56m5Ura(R) PDB: PBT

h34Ura h34Ura(S) h34Ura(R) PDB: DDN

Level 3: nucleic therapeutics

• Non-standard phosphates

– Thiophosphates (sP)

• Non-standard sugars

– dRibf2F, L-Ribf (spiegelmer), Ribf2MeOEt, ddRibf, Araf, Araf2Me, dAraf2F

– Morpholino, tricyclo, lockedRibf, cEt constrained

• Non-standard topologies

– Head-to-Head, Tail-to-Tail and mixed sequences.

– Cyclic sequences.

Examples: arabinosyl nucleosides

• Clevudine m5Ura-dAraf2F

• Clofarabine cl2Ade-dAraf2F

• Cytarabine Cyt-Araf

• Fludarabine fl2Ade-Araf

• Nelarabine m6Gua-Araf

• Sorivudine brvin5Ura-Araf

• Vidarabine Ade-Araf

https://en.wikipedia.org/wiki/Arabinosyl_nucleosides

Mipomersen (Kynamro)

• Gua-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Ura-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-dAdo-sP-dGuo-sP-dThd-sP-m5Cyt-dRibf-sP-dThd-sP-dGuo-sP-m5Cyt-dRibf-sP-dThd-sP-dThd-sP-m5Cyt-dRibf-sP-Gua-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-Ade-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt

• The ChEMBL structure, CHEMBL2219536, is almost certainly wrong: its thiophosphate connectivity is incorrect.
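
A gapmer such as mipomersen has a regular layout: modified-sugar wings around a deoxyribose gap, joined throughout by thiophosphates. The sketch below assembles a condensed string from that description. The function and parameter names are hypothetical, and it always writes the explicit base-sugar form (for example Thy-dRibf rather than the shorthand dThd).

```python
def gapmer(wing5, gap, wing3, wing_sugar="Ribf2MeOEt", linker="sP"):
    """Build a condensed string for a wing-gap-wing oligonucleotide.

    wing5, gap and wing3 are lists of base symbols; wing residues get
    wing_sugar, gap residues get dRibf, and every linkage uses linker.
    """
    residues = [f"{b}-{wing_sugar}" for b in wing5]
    residues += [f"{b}-dRibf" for b in gap]
    residues += [f"{b}-{wing_sugar}" for b in wing3]
    return f"-{linker}-".join(residues)


# Base order read off the mipomersen string above (5-10-5 layout).
mipomersen = gapmer(
    ["Gua", "m5Cyt", "m5Cyt", "m5Ura", "m5Cyt"],
    ["Ade", "Gua", "Thy", "m5Cyt", "Thy", "Gua", "m5Cyt", "Thy", "Thy", "m5Cyt"],
    ["Gua", "m5Cyt", "Ade", "m5Cyt", "m5Cyt"],
)
print(mipomersen)
```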

Thiophosphate stereochemistry

• Not only does ChEMBL get the thiophosphate connectivity wrong, but HELM ignores the influence of stereochemistry.

– Hartmut Jahns, Jonathan Hall, et al. “Stereochemical Bias Introduced During RNA Synthesis Modulates the Activity of Phosphorothioate siRNAs”, Nature Communications, Vol. 6, p. 6317, March 2015.

• Hence Sugar & Splice also supports RsP and SsP.

– For example, dC-RsP-dC has the SMILES string c1cn(c(=O)nc1N)[C@H]2C[C@@H]([C@H](O2)CO[P@](=S)(O)O[C@H]3C[C@@H](O[C@@H]3CO)n4ccc(nc4=O)N)O

On-going and future work

• Chemists continue to push back the boundaries of chemical space around nucleic acid therapeutics.

• NextMove Software endeavors to extend the state-of-the-art in nucleic acid perception, hopefully continuing to close the “RNA informatics gap”.

Acknowledgements

• Joann Prescott-Roy, Novartis, Cambridge, MA.

• Evan Bolton, PubChem, NCBI, Bethesda, MD.

• Noel O’Boyle, NextMove Software, UK.

• Thank you for your time.

PDB nucleic acids

IUPAC/SnS   PDB    IUPAC/SnS         PDB
P-dAdo      _DA    P-Wyb(R)-Ribf     _YG
P-dCyd      _DC    P-m1Ade-Ribf      1MA
P-dGuo      _DG    P-m2Gua-Ribf      2MG
P-dThd      _DT    P-Thy-Ribf2Me     2MU
P-dUrd      _DU    P-m5Cyt-Ribf      5MC
P-rAdo      __A    P-m7Gua-Ribf      7MG
P-rCyd      __C    P-Ade-Ribf2Me     A2M
P-rGuo      __G    P-m22Gua-Ribf     M2G
P-rUrd      __U    P-Cyt-Ribf2Me     OMC
P-rThd      5MU    P-Gua-Ribf2Me     OMG
P-rYrd      PSU    P-Ura-Ribf2Me     OMU

PDB monosaccharides

a-D- a-L- b-D- b-L-

allHexp AFD ALL

altHexp SHD

araHexp ARA ARB

araPenf BXY AHR BXX FUB

galHexp GLA GXL GAL

glcHexp GLC BGC

gulHexp GUP GL0

lyxHexp LDY

manHexp MAN BMA

ribHexp RIP

ribPenf RIB BDR

xylHexp XYS HSY XYP LXC

xylPenf XYZ
