Chem4Word Wade

Chem4Word: Semantic Chemical Authoring within Microsoft Word

Alex D. Wade, Director, Scholarly Communications, Microsoft Research Connections

Tony Hey, Corporate VP, Microsoft Research Connections

Transcript of Chem4Word Wade

Page 1: Chem4Word Wade


Page 2: Chem4Word Wade

GEPS 2011

http://research.microsoft.com/connections/

Page 3: Chem4Word Wade

Envisioning a New Era of Research Reporting

Imagine…

• Live research reports that had multiple end-user ‘views’ and which could dynamically tailor their presentation to each user
• An authoring environment that absorbs and encapsulates research workflows and outputs from the lab experiments
• A report that can be dropped into an electronic lab workbench in order to reconstitute an entire experiment
• A researcher working with multiple reports on a Surface and having the ability to mash up data and workflows across experiments
• The ability to apply new analyses and visualizations and to perform new in silico experiments

Dynamic Documents

Reputation & Influence

Reproducible Research

Interactive Data

Collaboration

Page 4: Chem4Word Wade

Words & Pictures

• Papers and reports today describe chemical reactions and entities in a variety of ways:
– common (or brand-name) labels
– identifiers and shorthand notations
– chemical formulae
– two- (and three-) dimensional graphical images of molecular structure
• Describing chemical data becomes an exercise in typesetting and/or graphics, and cross- and re-referencing existing chemical entities is labor-intensive.
– The resulting text is usually interpretable by humans, but chemical data are lost in the process, making it difficult to programmatically extract meaningful information from such reports.
• The goals of Chem4Word are to:
– simplify the task of authoring a chemical document
– do so in a way that produces a semantically meaningful document, facilitating downstream tasks such as publishers' workflows, entity extraction, and semantic applications

Page 5: Chem4Word Wade

Chemistry Add-in for Word, aka Chem4Word

• Chem4Word allows chemists to create, edit and manipulate chemistry in the Word environment by:
– providing a built-in dictionary of chemical structures
– enabling online lookup of further structures via web services (e.g. PubChem)
– facilitating linking/embedding of chemical structures inside a Word document
– supporting modification of chemical structures and of the representations of those structures
• Authoring is backed by semantic data in Chemical Markup Language (CML), enabling:
– novel functionality in data checking during the authoring process
– chemistry-centric article reading support
– data-mining applications
• Open source project (Outercurve Foundation); Apache 2.0 license
• ~500K downloads to date
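
To make the "online lookup via web services" bullet concrete, here is a minimal Python sketch of a name-based lookup against PubChem's public PUG REST interface. It is only an illustration: the slides do not say which PubChem endpoint Chem4Word calls, and the helper names `pubchem_property_url` and `lookup_formula` are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Base URL of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name: str, prop: str = "MolecularFormula") -> str:
    """Build a PUG REST URL that asks for one property of a compound by name."""
    # Percent-encode the name so spaces and slashes are safe in the URL path.
    return f"{PUG_REST}/compound/name/{quote(name, safe='')}/property/{prop}/JSON"


def lookup_formula(name: str) -> str:
    """Fetch the molecular formula for a compound name (needs network access)."""
    with urlopen(pubchem_property_url(name), timeout=10) as resp:
        data = json.load(resp)
    # PUG REST wraps property results in PropertyTable -> Properties.
    return data["PropertyTable"]["Properties"][0]["MolecularFormula"]


if __name__ == "__main__":
    print(pubchem_property_url("acetic acid"))
```

Only the URL builder runs offline; `lookup_formula` is left uncalled here because it depends on the live service.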

Page 6: Chem4Word Wade

Word UI Extensibility

• Ribbon
• Task Pane
• Gallery
• Templates
• Recognizers
• Applications

Page 7: Chem4Word Wade

FILE FORMATS: OFFICE OPEN XML DOCUMENTS

Thanks to: http://www.slideshare.net/HollowKnight/a-quick-tour-of-open-xml-format

Page 8: Chem4Word Wade

Binary format

Office Open XML format

Page 9: Chem4Word Wade

Binary format

Page 10: Chem4Word Wade

Office Open XML format

Page 11: Chem4Word Wade

THEY LOOK IDENTICAL, BUT …

Page 12: Chem4Word Wade

Binary format

Page 13: Chem4Word Wade

Office Open XML format

Page 14: Chem4Word Wade

Office Open XML is a ZIP file …

Page 15: Chem4Word Wade

That contains XML parts

Page 16: Chem4Word Wade

Images stored in native format

(JPEG, PNG, GIF, …)

Page 17: Chem4Word Wade

Programmer View of Open XML Files

• ZIP Archive
• Document Parts
– XML Parts
– Binary Parts
– Typed (RFC 2616)
• Relationships
– Connections between parts
• Content Type Stream
– A specially-named stream
– Defines mappings from part names to content types
– Not itself a part, not URI addressable
• Folder structure for convenience only
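
The programmer's view above can be sketched in a few lines of Python using only the standard library. The example builds a tiny illustrative package in memory (it is not a complete Word document) and then resolves each part's content type from the `[Content_Types].xml` stream. The helper names `build_demo_package` and `part_content_types` are made up for this sketch.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the content type stream in Open Packaging Conventions packages.
CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def build_demo_package() -> bytes:
    """Build a tiny OPC-style ZIP in memory (illustrative, not a valid .docx)."""
    content_types = (
        f'<Types xmlns="{CT_NS}">'
        '<Default Extension="rels" ContentType='
        '"application/vnd.openxmlformats-package.relationships+xml"/>'
        '<Default Extension="png" ContentType="image/png"/>'
        '<Override PartName="/word/document.xml" ContentType='
        '"application/vnd.openxmlformats-officedocument.'
        'wordprocessingml.document.main+xml"/>'
        "</Types>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", content_types)
        zf.writestr("_rels/.rels", "<Relationships/>")  # placeholder body
        zf.writestr("word/document.xml", "<document/>")  # placeholder body
        zf.writestr("word/media/image1.png", b"\x89PNG")  # placeholder bytes
    return buf.getvalue()


def part_content_types(package: bytes) -> dict[str, str]:
    """Map every part name in the package to its declared content type."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
        defaults = {
            d.get("Extension").lower(): d.get("ContentType")
            for d in root.findall(f"{{{CT_NS}}}Default")
        }
        overrides = {
            o.get("PartName"): o.get("ContentType")
            for o in root.findall(f"{{{CT_NS}}}Override")
        }
        result = {}
        for name in zf.namelist():
            if name == "[Content_Types].xml":
                continue  # the content type stream is not itself a part
            part_name = "/" + name
            extension = name.rsplit(".", 1)[-1].lower()
            # An Override for the exact part name wins over the extension Default.
            result[part_name] = overrides.get(
                part_name, defaults.get(extension, "unknown")
            )
        return result


if __name__ == "__main__":
    for part, ctype in part_content_types(build_demo_package()).items():
        print(part, "->", ctype)
```

The lookup depends only on part names and the content type stream, which is why the folder layout inside the ZIP is a convenience and not part of the format's semantics.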

Page 34: Chem4Word Wade

Multiple ‘views’ backed by a single CML data file
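
As a rough illustration of several views derived from one CML source, the Python sketch below parses a hand-written CML molecule (water, with explicit hydrogens) and produces a name view, a formula view, and a bond-table view. The CML fragment and the function names are assumptions made for this example and are not taken from Chem4Word itself.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org/schema"  # CML namespace URI

# Hand-written CML for water; hydrogens are listed explicitly.
WATER_CML = f"""
<molecule xmlns="{CML_NS}" id="m1">
  <name>water</name>
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
"""


def name_view(mol: ET.Element) -> str:
    """Return the molecule's name as stored in the CML."""
    return mol.findtext(f"{{{CML_NS}}}name")


def formula_view(mol: ET.Element) -> str:
    """Return a Hill-order formula computed from the explicit atoms."""
    counts = Counter(a.get("elementType") for a in mol.iter(f"{{{CML_NS}}}atom"))
    has_carbon = "C" in counts

    def hill_key(symbol: str):
        # Hill order: C then H when carbon is present, the rest alphabetical.
        if has_carbon and symbol == "C":
            return (0, "")
        if has_carbon and symbol == "H":
            return (1, "")
        return (2, symbol)

    return "".join(
        symbol + (str(counts[symbol]) if counts[symbol] > 1 else "")
        for symbol in sorted(counts, key=hill_key)
    )


def bond_view(mol: ET.Element) -> list[tuple[str, str, str]]:
    """Return (atom id, atom id, order) triples, a minimal connection table."""
    bonds = []
    for bond in mol.iter(f"{{{CML_NS}}}bond"):
        first, second = bond.get("atomRefs2").split()
        bonds.append((first, second, bond.get("order")))
    return bonds


if __name__ == "__main__":
    molecule = ET.fromstring(WATER_CML.strip())
    print(name_view(molecule), formula_view(molecule), bond_view(molecule))
```

Because every view is computed from the same parsed element, editing the CML once would change all three outputs together.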

Page 35: Chem4Word Wade

EXAMPLE OF GETTING CML DATA BACK OUT OF A DOCUMENT
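
The slides that follow are screenshots whose content did not survive in this transcript. As a stand-in, here is a minimal Python sketch of one way to pull CML out of a .docx package. It assumes the CML is stored in custom XML parts under `customXml/` with a root element in the CML namespace; that storage layout is an assumption of this sketch, not something these slides state. `build_demo_docx` creates a hypothetical stand-in file so the example runs without a real document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # CML namespace URI


def extract_cml_parts(docx_bytes: bytes) -> dict[str, str]:
    """Return {part name: XML text} for custom XML parts rooted in the CML namespace."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        for name in zf.namelist():
            # Assumption: CML lives in custom XML parts under customXml/.
            if not (name.startswith("customXml/") and name.endswith(".xml")):
                continue
            raw = zf.read(name)
            try:
                root = ET.fromstring(raw)
            except ET.ParseError:
                continue  # skip anything that is not well-formed XML
            if root.tag.startswith("{" + CML_NS + "}"):
                # Assumes UTF-8; undecodable bytes are replaced, not fatal.
                found[name] = raw.decode("utf-8", errors="replace")
    return found


def build_demo_docx() -> bytes:
    """Hypothetical stand-in for a chemistry-enabled document (not a full .docx)."""
    cml = (
        f'<cml xmlns="{CML_NS}">'
        '<molecule id="m1"><name>water</name></molecule>'
        "</cml>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", "<document/>")  # placeholder body
        zf.writestr("customXml/item1.xml", cml)
        zf.writestr("customXml/itemProps1.xml", "<props/>")  # non-CML part
    return buf.getvalue()


if __name__ == "__main__":
    for part, text in extract_cml_parts(build_demo_docx()).items():
        print(part)
        print(text)
```

Filtering on the root element's namespace, not on a fixed file name, keeps the sketch independent of how the custom XML parts happen to be numbered.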

Page 48: Chem4Word Wade

To conclude…

Current publishing… is broken for data-rich science

• Data publication difficult and unsupported
• Insufficient data to fully support research

With Chem4Word… the cycle is closed

• Data preparation integrated into user workflow
• Open Standards promote Open Semantic Science

Page 49: Chem4Word Wade

Important Details

• Project Site
– http://research.microsoft.com/chem4word
• Binaries and source code
– http://chem4word.codeplex.com
• Facebook Page
– http://www.facebook.com/groups/186300551397797/
• Outercurve Foundation
– http://www.outercurve.org

Page 50: Chem4Word Wade

Contributors

University of Cambridge

• Peter Murray-Rust
• Jim Downing
• Joe Townsend

Microsoft Research

• Alex D. Wade
• Savas Parastatidis
• Oscar Naim
• Pablo Fernicola
• Murray Sargent
• Geraldine Wade
• Tola Chhoeun
• Anthony Hanses
• Jim McGill