Line notations for nucleic acids (both natural and therapeutic)
Roger Sayle
NextMove Software, Cambridge, UK
252nd ACS National Meeting, Philadelphia, PA, Wednesday 24th August 2016
Overview
• This presentation provides an overview of efforts to update and extend the aging 1970 IUPAC/IUBMB recommendations on nucleic acid notations.
• Examples are given of representational challenges, literature precedents, and the relationship to HELM and Novartis’ sitrack representations.
• IUPAC-IUB Commission on Biochemical Nomenclature (CBN), “Abbreviations and Symbols for Nucleic Acids, Polynucleotides and their Constituents”, http://www.chem.qmul.ac.uk/iupac/misc/naabb.html
Level 1: bioinformatics
• IUPAC70 defines condensed forms for DNA and RNA.
• dG-dA-dT-dC
– dGuo-P-dAdo-P-dThd-P-dCyd
– Gua-dRibf-P-Ade-dRibf-P-Thy-dRibf-P-Cyt-dRibf
– 2'-deoxy-guanylyl-(3'->5')-2'-deoxy-adenylyl-(3'->5')-thymidylyl-(3'->5')-2'-deoxy-cytidine
• G-A-U-C
– rGuo-P-rAdo-P-rUrd-P-rCyd
– Gua-Ribf-P-Ade-Ribf-P-Ura-Ribf-P-Cyt-Ribf
– guanylyl-(3'->5')-adenylyl-(3'->5')-uridylyl-(3'->5')-cytidine
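The equivalence between the condensed forms above can be sketched in a few lines of Python. This is only an illustration of the mapping, not Sugar & Splice's implementation; the function names and lookup tables are hypothetical, and only the five standard bases are handled.

```python
# Symbol tables following the IUPAC70-style abbreviations used on this slide.
# The table layout and function names are illustrative assumptions.
NUCLEOSIDE = {"A": "Ado", "C": "Cyd", "G": "Guo", "T": "Thd", "U": "Urd"}
BASE = {"A": "Ade", "C": "Cyt", "G": "Gua", "T": "Thy", "U": "Ura"}


def split_residue(residue):
    """Split a condensed residue such as 'dG' or 'G' into (is_deoxy, letter)."""
    is_deoxy = residue.startswith("d")
    letter = residue[1:] if is_deoxy else residue
    if letter not in BASE:
        raise ValueError(f"unknown residue: {residue!r}")
    return is_deoxy, letter


def to_nucleoside_form(condensed):
    """'dG-dA-dT-dC' -> 'dGuo-P-dAdo-P-dThd-P-dCyd'."""
    parts = []
    for residue in condensed.split("-"):
        is_deoxy, letter = split_residue(residue)
        parts.append(("d" if is_deoxy else "r") + NUCLEOSIDE[letter])
    return "-P-".join(parts)


def to_base_sugar_form(condensed):
    """'dG-dA-dT-dC' -> 'Gua-dRibf-P-Ade-dRibf-P-Thy-dRibf-P-Cyt-dRibf'."""
    parts = []
    for residue in condensed.split("-"):
        is_deoxy, letter = split_residue(residue)
        parts.append(BASE[letter] + "-" + ("dRibf" if is_deoxy else "Ribf"))
    return "-P-".join(parts)


print(to_nucleoside_form("dG-dA-dT-dC"))
print(to_base_sugar_form("dG-dA-dT-dC"))
```

Each hyphen in the one-letter form stands for a 3'->5' phosphodiester link, which is why the expanded forms join residues with "-P-".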
Revision: carbohydrate suffix
• The first revision/tweak to IUPAC70 is to conform better to IUPAC-IUBMB’s 1996 recommendations on the Nomenclature of Carbohydrates, 2-CARB.
– It is understood that the rings are in pyranose form unless otherwise specified. -- 2-Carb-38.4
• Hence, Ribf is the preferred “canonical” form on output, to avoid ambiguity.
• For backward compatibility, Rib is considered Ribf in nucleic acid sequences, but Ribp in carbohydrates.
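A minimal sketch of this canonicalisation rule follows. The helper names and the `context` parameter are hypothetical. Extending the rule to "dRib" and to substituted forms such as "Rib2MeOEt" is an assumption made here for illustration; the slide only states the rule for "Rib".

```python
import re

# A sugar token that starts with "Rib" or "dRib" but carries no explicit
# ring-size letter ("f" furanose, "p" pyranose), e.g. "Rib", "dRib", "Rib2MeOEt".
_BARE_RIB = re.compile(r"^(d?Rib)(?![fp])(.*)$")


def canonical_sugar(token, context="nucleic"):
    """Add the ring-size letter to a bare ribose token; leave other tokens alone."""
    match = _BARE_RIB.match(token)
    if not match:
        return token
    # Furanose is assumed in nucleic acid sequences, pyranose in carbohydrates.
    ring = "f" if context == "nucleic" else "p"
    return match.group(1) + ring + match.group(2)


def canonical_sequence(notation):
    """Canonicalise every hyphen-separated token of a nucleic acid notation."""
    return "-".join(canonical_sugar(token) for token in notation.split("-"))


print(canonical_sugar("Rib"))                          # nucleic acid context
print(canonical_sugar("Rib", context="carbohydrate"))  # carbohydrate context
print(canonical_sequence("m5Cyt-dRib-sP-dThd"))
```

Writing the explicit "Ribf" on output keeps a string unambiguous even if it is later read by carbohydrate software that assumes pyranose rings.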
Division of monomers
• Sugar & Splice represents nucleic acids as sequences of monomers, following the convention that the 5’ phosphate is associated with the sugar/base [this matches IUPAC and PDB].
• G-A-U-C is “rGuo”, “P-rAdo”, “P-rUrd”, “P-rCyd”.
• HELM currently disagrees, associating the 3’ phosphate (PO3H2) with the sugar/base.
– RNA1{R(G)P.R(A)P.R(U)P.R(C)}$$$$
• Fortunately, the conventional form is accepted:
– RNA1{R(G).PR(A).PR(U).PR(C)}$$$$
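The two groupings differ only in where the "." separators fall, which the following sketch makes explicit. The function names are hypothetical, and the code simply formats strings like those shown above; it does not validate them against a HELM monomer library.

```python
def helm_three_prime(bases, polymer_id="RNA1"):
    """Group each phosphate with the residue on its 3' side: R(G)P.R(A)P...R(C)."""
    if not bases:
        raise ValueError("empty sequence")
    units = [f"R({b})P" for b in bases[:-1]] + [f"R({bases[-1]})"]
    body = ".".join(units)
    return polymer_id + "{" + body + "}$$$$"


def helm_five_prime(bases, polymer_id="RNA1"):
    """Group each phosphate with the residue on its 5' side: R(G).PR(A)...PR(C)."""
    if not bases:
        raise ValueError("empty sequence")
    units = [f"R({bases[0]})"] + [f"PR({b})" for b in bases[1:]]
    body = ".".join(units)
    return polymer_id + "{" + body + "}$$$$"


print(helm_three_prime("GAUC"))  # RNA1{R(G)P.R(A)P.R(U)P.R(C)}$$$$
print(helm_five_prime("GAUC"))   # RNA1{R(G).PR(A).PR(U).PR(C)}$$$$
```

Both strings describe the same atoms and bonds; only the division into monomers changes.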
The perception problem 1
Multiple tautomeric forms of guanine
The perception problem 2
Multiple protonation states of thymine
The perception problem 3
Multiple mesomeric forms of 7-methylguanine
Level 2: natural RNA variants
• IUPAC70 provides the framework for handling the many “non-standard” RNA features often observed.
• Unusual bases:
– m2Gua, m3Ura, m5Cyt, m7Gua, hUra, Yra, Wyb, br5Ura, Hyp, Xan, etc.
• Unusual backbones:
– Ribf2Me, P-P-P-rAdo
• Sugar & Splice’s proposed syntax closely resembles that of the University of Albany’s RNA modification database, http://mods.rna.albany.edu/mods/
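The modified-base symbols above follow a regular pattern: lowercase substituent prefixes, each optionally followed by locant digits, then a capitalised three-letter base. The sketch below splits a symbol along those lines. It is an assumption-laden illustration rather than the Sugar & Splice grammar: `parse_base` is a hypothetical name, every digit is read as a separate single-digit locant, and stereo suffixes such as "(R)" are not handled.

```python
import re

# Zero or more (lowercase prefix + optional digits) groups, then a base symbol
# made of one capital letter followed by lowercase letters (e.g. "Gua", "Ura").
_SYMBOL = re.compile(r"((?:[a-z]+\d*)*)([A-Z][a-z]+)")
_PREFIX = re.compile(r"([a-z]+)(\d*)")


def parse_base(symbol):
    """Split e.g. 'm22Gua' into ('Gua', [('m', [2, 2])])."""
    match = _SYMBOL.fullmatch(symbol)
    if not match:
        raise ValueError(f"unrecognised base symbol: {symbol!r}")
    prefix, base = match.groups()
    modifications = [
        (name, [int(digit) for digit in digits])
        for name, digits in _PREFIX.findall(prefix)
    ]
    return base, modifications


for symbol in ("Gua", "m2Gua", "hUra", "br5Ura"):
    print(symbol, parse_base(symbol))
```

Reading each digit as its own locant is what lets a symbol with two adjacent digits, such as m22Gua, yield two locants for a single prefix.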
IUPAC condensed for PDB 4TNA
• The perceived line notation/identifier for 4TNA is:
P-rGuo-P-rCyd-P-rGuo-P-rGuo-P-rAdo-P-rUrd-P-rUrd-P-rUrd-P-rAdo-P-m2Gua-Ribf-P-rCyd-P-rUrd-P-rCyd-P-rAdo-P-rGuo-P-hUra-Ribf-P-hUra-Ribf-P-rGuo-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-rAdo-P-rGuo-P-rCyd-P-m22Gua-Ribf-P-rCyd-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-Cyt-Ribf2Me-P-rUrd-P-Gua-Ribf2Me-P-rAdo-P-rAdo-P-Wyb(R)-Ribf-P-rAdo-P-rYrd-P-m5Cyt-Ribf-P-rUrd-P-rGuo-P-rGuo-P-rAdo-P-rGuo-P-m7Gua-Ribf-P-rUrd-P-rCyd-P-m5Cyt-Ribf-P-rUrd-P-rGuo-P-rUrd-P-rGuo-P-rThd-P-rYrd-P-rCyd-P-rGuo-P-m1Ade-Ribf-P-rUrd-P-rCyd-P-rCyd-P-rAdo-P-rCyd-P-rAdo-P-rGuo-P-rAdo-P-rAdo-P-rUrd-P-rUrd-P-rCyd-P-rGuo-P-rCyd-P-rAdo-P-rCyd-P-rCyd-P-rAdo
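A string like this can be cut back into monomers by starting a new monomer at every phosphate token, in line with the 5’ phosphate convention. The sketch below is a hypothetical helper, not the Sugar & Splice parser. It assumes "-" only separates tokens and that the linker set is P, sP, RsP and SsP. A run of consecutive phosphates, as in P-P-P-rAdo, would be split into separate pieces under this simple rule.

```python
LINKERS = ("P", "sP", "RsP", "SsP")


def split_monomers(notation, linkers=LINKERS):
    """Split a condensed notation into monomers, each led by its 5' phosphate."""
    monomers = []
    current = []
    for token in notation.split("-"):
        # A linker token closes the monomer collected so far and opens the next.
        if token in linkers and current:
            monomers.append("-".join(current))
            current = []
        current.append(token)
    if current:
        monomers.append("-".join(current))
    return monomers


# A short excerpt in the style of the 4TNA notation above.
excerpt = "P-rGuo-P-m2Gua-Ribf-P-Cyt-Ribf2Me-P-rAdo"
for monomer in split_monomers(excerpt):
    print(monomer)
```

On the excerpt this yields the four monomers P-rGuo, P-m2Gua-Ribf, P-Cyt-Ribf2Me and P-rAdo; a sequence without a leading phosphate, such as rGuo-P-rAdo, yields rGuo and P-rAdo.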
The 7-methylguanosine debate
m7Gua (Wikipedia) m7h8Gua (PDB’s 7MG)
m7Gua (Sigma-Aldrich)
Base stereochemistry
h56m5Ura h56m5Ura(S) PDB: QBT
h56m5Ura(R) PDB: PBT
h34Ura h34Ura(S) h34Ura(R) PDB: DDN
Level 3: nucleic therapeutics
• Non-standard phosphates
– Thiophosphates (sP)
• Non-standard sugars
– dRibf2F, L-Ribf (spiegelmer), Ribf2MeOEt, ddRibf, Araf, Araf2Me, dAraf2F
– Morpholino, tricyclo, lockedRibf, cEt constrained
• Non-standard topologies
– Head-to-Head, Tail-to-Tail and mixed sequences.
– Cyclic sequences.
Examples: arabinosyl nucleosides
• Clevudine m5Ura-dAraf2F
• Clofarabine cl2Ade-dAraf2F
• Cytarabine Cyt-Araf
• Fludarabine fl2Ade-Araf
• Nelarabine m6Gua-Araf
• Sorivudine brvin5Ura-Araf
• Vidarabine Ade-Araf
https://en.wikipedia.org/wiki/Arabinosyl_nucleosides
Mipomersen (Kynamro)
• Gua-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Ura-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-dAdo-sP-dGuo-sP-dThd-sP-m5Cyt-dRibf-sP-dThd-sP-dGuo-sP-m5Cyt-dRibf-sP-dThd-sP-dThd-sP-m5Cyt-dRibf-sP-Gua-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-Ade-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt-sP-m5Cyt-Ribf2MeOEt
• The ChEMBL structure, CHEMBL2219536, is almost certainly wrong due to thiophosphate connectivity.
Thiophosphate stereochemistry
• Not only does ChEMBL get the thiophosphate connectivity wrong, but HELM ignores the influence of stereochemistry.
– Hartmut Jahns, Jonathan Hall, et al. “Stereochemical Bias Introduced During RNA Synthesis Modulates the Activity of Phosphorothioate siRNAs”, Nature Communications, Vol. 6, p. 6317, March 2015.
• Hence Sugar & Splice also supports RsP and SsP.
– For example, dC-RsP-dC has the SMILES string c1cn(c(=O)nc1N)[C@H]2C[C@@H]([C@H](O2)CO[P@](=S)(O)O[C@H]3C[C@@H](O[C@@H]3CO)n4ccc(nc4=O)N)O
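Because each sP linkage may be either RsP or SsP, an unspecified sequence stands for a family of diastereomers. The sketch below enumerates that family at the notation level only. The function name is hypothetical, and no SMILES or 3D structure is generated.

```python
from itertools import product


def enumerate_thiophosphate_stereo(notation):
    """Return every RsP/SsP assignment for the unspecified 'sP' linkages."""
    tokens = notation.split("-")
    sites = [index for index, token in enumerate(tokens) if token == "sP"]
    variants = []
    for labels in product(("RsP", "SsP"), repeat=len(sites)):
        variant = list(tokens)
        for index, label in zip(sites, labels):
            variant[index] = label
        variants.append("-".join(variant))
    return variants


print(enumerate_thiophosphate_stereo("dC-sP-dC"))
# ['dC-RsP-dC', 'dC-SsP-dC']
print(len(enumerate_thiophosphate_stereo("dC-sP-dC-sP-dC")))
# 4
```

The count doubles with every linkage: a 20-mer in which all 19 linkages are sP corresponds to 2**19 = 524,288 combinations, so full enumeration is only practical for short sequences.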
On-going and future work
• Chemists continue to push back the boundaries of chemical space around nucleic acid therapeutics.
• NextMove Software endeavors to extend the state-of-the-art in nucleic acid perception, hopefully continuing to close the “RNA informatics gap”.
Acknowledgements
• Joann Prescott-Roy, Novartis, Cambridge, MA.
• Evan Bolton, PubChem, NCBI, Bethesda, MD.
• Noel O’Boyle, NextMove Software, UK.
• Thank you for your time.
PDB nucleic acids
IUPAC/SnS   PDB   IUPAC/SnS        PDB
P-dAdo      _DA   P-Wyb(R)-Ribf    _YG
P-dCyd      _DC   P-m1Ade-Ribf     1MA
P-dGuo      _DG   P-m2Gua-Ribf     2MG
P-dThd      _DT   P-Thy-Ribf2Me    2MU
P-dUrd      _DU   P-m5Cyt-Ribf     5MC
P-rAdo      __A   P-m7Gua-Ribf     7MG
P-rCyd      __C   P-Ade-Ribf2Me    A2M
P-rGuo      __G   P-m22Gua-Ribf    M2G
P-rUrd      __U   P-Cyt-Ribf2Me    OMC
P-rThd      5MU   P-Gua-Ribf2Me    OMG
P-rYrd      PSU   P-Ura-Ribf2Me    OMU
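A correspondence table like this lends itself to a plain dictionary lookup. The sketch below is illustrative only: the names `PDB_TO_SNS` and `pdb_to_condensed` are hypothetical, the "_" padding is stripped before lookup, and guanosine and uridine are keyed by their standard PDB residue names G and U.

```python
# PDB residue name -> IUPAC/SnS monomer, keyed without the "_" padding.
PDB_TO_SNS = {
    "DA": "P-dAdo", "DC": "P-dCyd", "DG": "P-dGuo", "DT": "P-dThd",
    "DU": "P-dUrd",
    "A": "P-rAdo", "C": "P-rCyd", "G": "P-rGuo", "U": "P-rUrd",
    "5MU": "P-rThd", "PSU": "P-rYrd",
    "YG": "P-Wyb(R)-Ribf", "1MA": "P-m1Ade-Ribf", "2MG": "P-m2Gua-Ribf",
    "2MU": "P-Thy-Ribf2Me", "5MC": "P-m5Cyt-Ribf", "7MG": "P-m7Gua-Ribf",
    "A2M": "P-Ade-Ribf2Me", "M2G": "P-m22Gua-Ribf", "OMC": "P-Cyt-Ribf2Me",
    "OMG": "P-Gua-Ribf2Me", "OMU": "P-Ura-Ribf2Me",
}


def pdb_to_condensed(residue_names):
    """Join the monomers for a list of PDB residue names into one notation."""
    monomers = []
    for name in residue_names:
        key = name.strip("_ ")  # drop padding such as "__A" or "  A"
        if key not in PDB_TO_SNS:
            raise KeyError(f"no IUPAC/SnS monomer known for {name!r}")
        monomers.append(PDB_TO_SNS[key])
    return "-".join(monomers)


print(pdb_to_condensed(["__G", "__C", "2MG", "OMC"]))
# P-rGuo-P-rCyd-P-m2Gua-Ribf-P-Cyt-Ribf2Me
```

Raising an error for unknown residue names, rather than skipping them, avoids silently producing a notation with missing monomers.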
PDB monosaccharides
a-D- a-L- b-D- b-L-
allHexp AFD ALL
altHexp SHD
araHexp ARA ARB
araPenf BXY AHR BXX FUB
galHexp GLA GXL GAL
glcHexp GLC BGC
gulHexp GUP GL0
lyxHexp LDY
manHexp MAN BMA
ribHexp RIP
ribPenf RIB BDR
xylHexp XYS HSY XYP LXC
xylPenf XYZ