Chem4Word Wade

Post on 09-Feb-2017


Chem4Word: Semantic Chemical Authoring within Microsoft Word

Alex D. Wade, Director, Scholarly Communications, Microsoft Research Connections
Tony Hey, Corporate VP, Microsoft Research Connections

GEPS 2011
http://research.microsoft.com/connections/

Imagine…

• Live research reports that have multiple end-user ‘views’ and can dynamically tailor their presentation to each user

• An authoring environment that absorbs and encapsulates research workflows and outputs from the lab experiments

• A report that can be dropped into an electronic lab workbench in order to reconstitute an entire experiment

• A researcher working with multiple reports on a Surface and having the ability to mash up data and workflows across experiments

• The ability to apply new analyses and visualizations and to perform new in silico experiments

Envisioning a New Era of Research Reporting

Dynamic Documents

Reputation & Influence

Reproducible Research

Interactive Data

Collaboration

Words & Pictures

• Papers/reports today describe chemical reactions/entities in a variety of ways:
  – common (or brand-name) labels
  – identifiers and shorthand notations
  – chemical formulae
  – two- (and three-) dimensional graphical images of molecular structure
• Describing chemical data becomes an exercise in typesetting and/or graphics, and cross- and re-referencing existing chemical entities is labor intensive.
  – The resulting text is usually interpretable by humans, but chemical data are lost in the process, making it difficult to programmatically extract meaningful information from such reports.
• The goals of Chem4Word are to:
  – simplify the task of authoring a chemical document
  – do so in a way that produces a semantically meaningful document, facilitating downstream tasks such as publishers' workflows, entity extraction, and semantic applications

Chemistry Add-in for Word, aka Chem4Word

• Chem4Word allows chemists to create, edit and manipulate chemistry in the Word environment by:
  – providing a built-in dictionary of chemical structures
  – enabling online lookup of further structures via web services (e.g. PubChem)
  – facilitating linking/embedding of chemical structures inside a Word document
  – supporting modification of chemical structures and representations of those structures
• Authoring is backed by semantic data in Chemical Markup Language (CML), enabling:
  – novel functionality in data checking during the authoring process
  – chemistry-centric article reading support
  – data-mining applications

• Open source project (Outercurve Foundation); Apache 2.0 license

• ~500K downloads to date
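To make the idea of CML-backed chemistry concrete, here is a minimal sketch of what a small CML fragment might look like and how a program could read structure out of it. The water molecule, the helper names `element_counts` and `bonds`, and the namespace URI are illustrative assumptions; none of them come from the slides, and this is not Chem4Word's own code.

```python
import xml.etree.ElementTree as ET

# Assumed CML namespace URI; check the CML schema you actually target.
CML_NS = "http://www.xml-cml.org/schema"

# Hand-written illustrative molecule (water), not taken from the slides.
WATER_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <name>water</name>
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""


def element_counts(cml_text):
    """Count atoms per element symbol in a CML molecule."""
    root = ET.fromstring(cml_text)
    counts = {}
    for atom in root.iter(f"{{{CML_NS}}}atom"):
        symbol = atom.get("elementType")
        counts[symbol] = counts.get(symbol, 0) + 1
    return counts


def bonds(cml_text):
    """Return each bond as a pair of atom ids, in document order."""
    root = ET.fromstring(cml_text)
    return [
        tuple(bond.get("atomRefs2").split())
        for bond in root.iter(f"{{{CML_NS}}}bond")
    ]


if __name__ == "__main__":
    print(element_counts(WATER_CML))
    print(bonds(WATER_CML))
```

Because atoms and bonds are explicit elements rather than pixels in a picture, data checking and data mining can work directly on the markup.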

Word UI Extensibility

• Ribbon
• Task Pane
• Gallery
• Templates
• Recognizers
• Applications

FILE FORMATS: OFFICE OPEN XML DOCUMENTS

Thanks to: http://www.slideshare.net/HollowKnight/a-quick-tour-of-open-xml-format

Binary format vs. Office Open XML format

THEY LOOK IDENTICAL, BUT …

Office Open XML is a ZIP file … that contains XML parts

Images stored in native format (JPEG, PNG, GIF, …)

Programmer View of Open XML Files

• ZIP Archive
• Document Parts
  – XML Parts
  – Binary Parts
  – Typed (RFC 2616)
• Relationships
  – Connections between parts
• Content Type Stream
  – A specially-named stream
  – Defines mappings from part names to content types
  – Not itself a part, not URI addressable
• Folder structure for convenience only
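The programmer's view above can be sketched with nothing but a ZIP reader and an XML parser. The example below builds a tiny in-memory package so that it runs without a real Word file; the toy package is deliberately incomplete (it would not open in Word), and the function names `build_toy_package` and `describe_package` are hypothetical. The namespace URIs and content type strings are the ones used by the Open Packaging Conventions.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

CONTENT_TYPES = (
    f'<Types xmlns="{CT_NS}">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Override PartName="/word/document.xml" '
    'ContentType="application/vnd.openxmlformats-officedocument.'
    'wordprocessingml.document.main+xml"/>'
    "</Types>"
)

ROOT_RELS = (
    f'<Relationships xmlns="{REL_NS}">'
    '<Relationship Id="rId1" '
    'Type="http://schemas.openxmlformats.org/officeDocument/2006/'
    'relationships/officeDocument" Target="word/document.xml"/>'
    "</Relationships>"
)

DOCUMENT = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body/></w:document>'
)


def build_toy_package():
    """Return the bytes of a minimal, illustrative package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("[Content_Types].xml", CONTENT_TYPES)  # content type stream
        z.writestr("_rels/.rels", ROOT_RELS)              # package relationships
        z.writestr("word/document.xml", DOCUMENT)         # an XML part
    return buf.getvalue()


def describe_package(data):
    """List entries, content type mappings, and root relationships."""
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        names = z.namelist()
        types = ET.fromstring(z.read("[Content_Types].xml"))
        rels = ET.fromstring(z.read("_rels/.rels"))
    defaults = {d.get("Extension"): d.get("ContentType")
                for d in types.findall(f"{{{CT_NS}}}Default")}
    overrides = {o.get("PartName"): o.get("ContentType")
                 for o in types.findall(f"{{{CT_NS}}}Override")}
    relationships = [(r.get("Id"), r.get("Type"), r.get("Target"))
                     for r in rels.findall(f"{{{REL_NS}}}Relationship")]
    return names, defaults, overrides, relationships


if __name__ == "__main__":
    for item in describe_package(build_toy_package()):
        print(item)
```

Note that the main document is found by following the relationship target rather than by assuming the `word/` folder name, which reflects the point that the folder structure is for convenience only.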

Multiple ‘views’ backed by a single CML data file
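As a rough illustration of several views derived from one CML source, the sketch below computes a name view and a molecular-formula view from the same parsed molecule. The ethanol fragment, the function names `name_view` and `formula_view`, and the namespace URI are assumptions made for this example; the slides do not show how Chem4Word itself renders its views.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed CML namespace URI.
CML_NS = "http://www.xml-cml.org/schema"

# Hand-written illustrative molecule with implicit hydrogens.
ETHANOL_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <name>ethanol</name>
  <atomArray>
    <atom id="a1" elementType="C" hydrogenCount="3"/>
    <atom id="a2" elementType="C" hydrogenCount="2"/>
    <atom id="a3" elementType="O" hydrogenCount="1"/>
  </atomArray>
</molecule>"""


def name_view(root):
    """Return the molecule's name, or None if no <name> child exists."""
    node = root.find(f"{{{CML_NS}}}name")
    return node.text if node is not None else None


def formula_view(root):
    """Return a molecular formula in Hill order (C, then H, then A-Z)."""
    counts = Counter()
    for atom in root.iter(f"{{{CML_NS}}}atom"):
        counts[atom.get("elementType")] += 1
        # Implicit hydrogens attached to this atom, if declared.
        counts["H"] += int(atom.get("hydrogenCount", "0"))
    if counts["C"]:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C", "H"] + rest
    else:
        order = sorted(counts)
    return "".join(
        symbol + (str(counts[symbol]) if counts[symbol] > 1 else "")
        for symbol in order
        if counts[symbol]
    )


if __name__ == "__main__":
    molecule = ET.fromstring(ETHANOL_CML)
    print(name_view(molecule), formula_view(molecule))
```

Both views read from the same parsed molecule, so changing the underlying CML changes every view consistently.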

EXAMPLE OF GETTING CML DATA BACK OUT OF A DOCUMENT
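One way such an extraction could work is sketched below, under a stated assumption: that the CML travels inside the document package as a custom XML part under `customXml/`. The slides do not spell out the storage layout, so treat the part names, the namespace URI, and the functions `build_toy_docx` and `extract_cml_parts` as hypothetical. The toy package is built in memory so the example runs on its own, and it is not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumed CML namespace URI.
CML_NS = "http://www.xml-cml.org/schema"

CML_PART = (
    f'<cml xmlns="{CML_NS}">'
    '<molecule id="m1"><name>water</name></molecule>'
    "</cml>"
)
OTHER_PART = '<metadata xmlns="urn:example:not-chemistry"/>'


def build_toy_docx():
    """Return bytes of an illustrative package holding two custom XML parts."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", "<document/>")
        z.writestr("customXml/item1.xml", CML_PART)
        z.writestr("customXml/item2.xml", OTHER_PART)
    return buf.getvalue()


def extract_cml_parts(data, cml_ns=CML_NS):
    """Map part name -> parsed root for custom XML parts in the CML namespace."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as z:
        for name in z.namelist():
            if not (name.startswith("customXml/") and name.endswith(".xml")):
                continue
            try:
                root = ET.fromstring(z.read(name))
            except ET.ParseError:
                continue  # skip anything that is not well-formed XML
            # Identify chemistry by the root namespace, not by the file name.
            if root.tag.startswith(f"{{{cml_ns}}}"):
                found[name] = root
    return found


if __name__ == "__main__":
    for part_name, root in extract_cml_parts(build_toy_docx()).items():
        names = [n.text for n in root.iter(f"{{{CML_NS}}}name")]
        print(part_name, names)
```

Filtering on the root element's namespace means unrelated custom XML parts are ignored without relying on any particular file naming scheme.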

Current publishing… is broken for data-rich science

With Chem4Word… the cycle is closed

Data publication difficult and unsupported

Insufficient data to fully support research

Data preparation integrated into user workflow

Open Standards promote Open Semantic Science

To conclude…

Important Details

• Project Site
  – http://research.microsoft.com/chem4word
• Binaries and source code
  – http://chem4word.codeplex.com
• Facebook Page
  – http://www.facebook.com/groups/186300551397797/
• Outercurve Foundation
  – http://www.outercurve.org

Contributors

University of Cambridge
• Peter Murray-Rust
• Jim Downing
• Joe Townsend

Microsoft Research
• Alex D. Wade
• Savas Parastatidis
• Oscar Naim
• Pablo Fernicola
• Murray Sargent
• Geraldine Wade
• Tola Chhoeun
• Anthony Hanses
• Jim McGill