Dial-A-Molecule Talk
AMI2: High-throughput extraction of semantic chemistry from the scientific literature – the ChemistryVisitor
Andy Howlett, Mark Williamson, Peter Murray-Rust, Robert Glen
Unilever Centre, Cambridge
AMI2 is a framework that can extract semantic data from the scientific literature.
Importance of semantic data
<html>
<p>
<a href="http://news.bbc.co.uk">BBC News</a>
</p>
<p>
<a href="http://www.cam.ac.uk">Cambridge University</a>
</p>
</html>
AMI2 architecture
SpeciesVisitor
ChemistryVisitor
PhylogeneticTreeVisitor
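The slides name the visitors but do not show how they are wired together. As a rough illustration only, the sketch below assumes that each visitor receives the same parsed SVG tree and returns its own findings. The `Visitor` base class, the `visit` method, the `run_visitors` helper and the placeholder logic inside each visitor are all hypothetical and are not taken from the AMI2 code base (which is not written in Python).

```python
from abc import ABC, abstractmethod
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


class Visitor(ABC):
    """Hypothetical base class: every visitor inspects the same SVG tree."""

    @abstractmethod
    def visit(self, svg_root: ET.Element) -> dict:
        ...


class ChemistryVisitor(Visitor):
    def visit(self, svg_root: ET.Element) -> dict:
        # Placeholder logic (assumption): treat every <line> as a candidate bond.
        lines = svg_root.findall(f".//{SVG_NS}line")
        return {"candidate_bonds": len(lines)}


class SpeciesVisitor(Visitor):
    def visit(self, svg_root: ET.Element) -> dict:
        # Placeholder logic (assumption): count <text> chunks that could hold names.
        texts = list(svg_root.iter(f"{SVG_NS}text"))
        return {"text_chunks": len(texts)}


def run_visitors(svg_text: str, visitors: list) -> dict:
    """Parse the SVG once, then hand the same tree to each visitor in turn."""
    root = ET.fromstring(svg_text)
    return {type(v).__name__: v.visit(root) for v in visitors}
```

Parsing once and sharing the tree keeps each visitor independent of the others, so a new domain (for example phylogenetic trees) could be added without touching the existing visitors.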
Turning dumb PDF into smart SVG
<svg version="1.0" xmlns="http://www.w3.org/2000/svg">
<line x1="10" y1="10"
x2="200" y2="200"
stroke="black" />
<circle cx="75" cy="300" r="50"
fill="blue"
stroke="red" />
<path d="M 100 410 L 410 710 L 100 700 z"
fill="red"
stroke="blue"
stroke-width="3" />
</svg>
X1,Y1
X2,Y2
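To make the "smart SVG" idea concrete, here is a minimal sketch of reading the endpoints (X1,Y1) and (X2,Y2) of each `<line>` back out of an SVG document, using Python's standard `xml.etree.ElementTree`. The function name `extract_lines` is an illustrative assumption and is not part of AMI2.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def extract_lines(svg_text):
    """Return every <line> in the SVG as a pair of (x, y) endpoints."""
    root = ET.fromstring(svg_text)
    segments = []
    for line in root.iter(f"{SVG_NS}line"):
        p1 = (float(line.get("x1")), float(line.get("y1")))
        p2 = (float(line.get("x2")), float(line.get("y2")))
        segments.append((p1, p2))
    return segments


example = """<svg version="1.0" xmlns="http://www.w3.org/2000/svg">
<line x1="10" y1="10" x2="200" y2="200" stroke="black" />
</svg>"""

print(extract_lines(example))  # [((10.0, 10.0), (200.0, 200.0))]
```

Once the primitives are available as plain coordinates, later steps can reason about geometry rather than about pixels.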
1) SpeciesVisitor
2) ChemistryVisitor
3) PhylogeneticTreeVisitor
ChemistryVisitor:
converting graphics primitives
<svg>
<line x1="269.97" y1="528.12" x2="269.97" y2="536.58" />
<line x1="272.16" y1="528.12" x2="272.16" y2="536.58" />
</svg>
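The two lines above are parallel and only about 2 units apart, which is how a double bond is typically drawn. The following sketch shows one possible way to recognise such a pair. It is not AMI2's actual algorithm: the function name `is_double_bond` and the thresholds `max_gap` and `angle_tol_deg` are assumptions chosen for illustration.

```python
import math


def is_double_bond(seg_a, seg_b, max_gap=3.0, angle_tol_deg=5.0):
    """Heuristic: two nearly parallel, closely spaced, side-by-side segments."""
    (ax1, ay1), (ax2, ay2) = seg_a
    (bx1, by1), (bx2, by2) = seg_b
    adx, ady = ax2 - ax1, ay2 - ay1
    bdx, bdy = bx2 - bx1, by2 - by1
    len_a = math.hypot(adx, ady)
    len_b = math.hypot(bdx, bdy)
    if len_a == 0 or len_b == 0:
        return False
    # |sin| of the angle between the two directions; 0 means exactly parallel.
    sin_angle = abs(adx * bdy - ady * bdx) / (len_a * len_b)
    if sin_angle > math.sin(math.radians(angle_tol_deg)):
        return False
    # Midpoint of segment b, measured relative to the line through segment a.
    mx, my = (bx1 + bx2) / 2, (by1 + by2) / 2
    gap = abs(adx * (my - ay1) - ady * (mx - ax1)) / len_a    # perpendicular distance
    along = (adx * (mx - ax1) + ady * (my - ay1)) / len_a     # position along a
    # The segments must be distinct, close together, and lie alongside each other.
    return 0 < gap <= max_gap and 0 <= along <= len_a


line_1 = ((269.97, 528.12), (269.97, 536.58))
line_2 = ((272.16, 528.12), (272.16, 536.58))
print(is_double_bond(line_1, line_2))  # True
```

The thresholds are in SVG user units and would need tuning to the bond lengths found in a given diagram.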
SVGBuilder ChemistryVisitor
SVG from PDF
“Magic Happens Here”
Removal of cosmetic artifacts
(graphics-program eye candy)
<?xml version="1.0"?>
<molecule xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="H"/>
    <atom id="a4" elementType="H"/>
    <atom id="a5" elementType="H"/>
    <atom id="a6" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="2"/>
    <bond atomRefs2="a1 a3" order="1"/>
    <bond atomRefs2="a1 a4" order="1"/>
    <bond atomRefs2="a2 a5" order="1"/>
    <bond atomRefs2="a2 a6" order="1"/>
  </bondArray>
</molecule>
SVG to CML
Implicit hydrogens
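A drawn skeleton usually omits hydrogens, so they have to be inferred before a full CML molecule can be written out. The sketch below shows the basic arithmetic under a deliberately simple assumption: each neutral atom has one fixed valence, and the number of implicit hydrogens is that valence minus the sum of its bond orders. The `DEFAULT_VALENCE` table and the `implicit_hydrogens` function are illustrative and are not taken from AMI2; the code also assumes numeric bond orders, as in the CML shown here.

```python
import xml.etree.ElementTree as ET

CML_NS = "{http://www.xml-cml.org/schema}"

# Simplifying assumption: one common valence per neutral element.
DEFAULT_VALENCE = {"C": 4, "N": 3, "O": 2}


def implicit_hydrogens(cml_text):
    """Map atom id -> number of hydrogens needed to fill the default valence."""
    root = ET.fromstring(cml_text)
    used = {atom.get("id"): 0 for atom in root.iter(f"{CML_NS}atom")}
    # Add up the bond orders attached to each atom.
    for bond in root.iter(f"{CML_NS}bond"):
        order = int(bond.get("order"))
        for ref in bond.get("atomRefs2").split():
            used[ref] += order
    counts = {}
    for atom in root.iter(f"{CML_NS}atom"):
        element = atom.get("elementType")
        if element in DEFAULT_VALENCE:
            missing = DEFAULT_VALENCE[element] - used[atom.get("id")]
            counts[atom.get("id")] = max(missing, 0)
    return counts


# Carbon skeleton of ethene: two carbons joined by a double bond.
skeleton = (
    '<molecule xmlns="http://www.xml-cml.org/schema">'
    '<atomArray><atom id="a1" elementType="C"/>'
    '<atom id="a2" elementType="C"/></atomArray>'
    '<bondArray><bond atomRefs2="a1 a2" order="2"/></bondArray>'
    '</molecule>'
)

print(implicit_hydrogens(skeleton))  # {'a1': 2, 'a2': 2}
```

Two hydrogens per carbon matches the four explicit hydrogen atoms in the ethene CML above. Charged atoms, aromatic bonds and elements with several common valences would need more careful handling.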
http://bitbucket.org/AndyHowlett/ami2-poc
A1) Reactions using ChemistryVisitor
Fig 4: Metabolites 2012, 2(1), 100-133
<reactant> <conditions> <product>
A2) More complex reactions
B) Error detection
C) Fraud detection
Org. Lett., 2011, 13 (15), pp 4084–4087
Original thanks to ChemBark
After AMI2 processing…
… AMI2 has detected a square
Conclusions
We can extract reactions semantically.
We can validate structures with associated names.
This can be done as part of a generic framework for extracting data.
Such capabilities are useful for Dial-a-Molecule: so far they have been applied only to
metabolism, but they could also be used for synthesis.
https://bitbucket.org/petermr/ami/wiki/Home
Acknowledgments
Unilever metabolism team, Cambridge
Unilever for funding