The Possibilities and Pitfalls of Internet-Based Chemical Data

The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams Royal Society of Chemistry


Page 1: The Possibilities and Pitfalls of Internet-Based Chemical Data · The Possibilities and Pitfalls of Internet-Based Chemical Data Antony Williams ... and more data might just be Open.

The Possibilities and Pitfalls of Internet-Based Chemical Data

Antony Williams

Royal Society of Chemistry

Page 2:

I’ve performed a few dozen chemical syntheses

I’ve run thousands of analytical spectra

I’ve generated thousands of NMR assignments

I’ve probably published <5% of all that work

But things can be different today…

About Me…as a Chemist

Page 3:

My Early Scientific Computing

Page 4:

If it was not just about me…

Page 6:

If it was not just about me…

Together we might:

build an encyclopedia

…and rate restaurants

…provide book reviews to each other

…or movie reviews

…or reviews of service providers

…organize sit-ins and social action

…and more data might just be Open

…more Chemists might share rather than just take!

Page 7:

A hobby-project to connect chemistry data on the web

Three servers – one purchased, two hand-built

Software begged and borrowed – and thanks to Microsoft!

Some late nights – 10pm to 2am for over a year

Some survival of the naysayers in the community

…and taking advantage of a changing world of data availability and the crowdsourcing of willing participants

NO formal funding. Simply passion and abilities lining up.

A story of a hobby gone wild… Years 1 and 2

Page 8:

ChemSpider (Year 2-present)

Building a Free Chemical Database

A central hub for chemists to source information

>28 million unique chemical records

Aggregated from >400 data sources

Chemicals, analytical data, movies, images, podcasts, links to patents, publications, predictions

Web services for integration

Daily updates of data

Page 9:

Answer Questions for Chemists

Questions a chemist might ask…

What is the melting point of n-heptanol?

What is the chemical structure of Xanax?

Chemically, what is phenolphthalein?

What are the stereocenters of cholesterol?

Where can I find publications about xylene?

What are the different trade names for Ketoconazole?

What is the NMR spectrum of Aspirin?

What are the safety handling issues for Thymol Blue?

Page 10:

A LITTLE Chemistry First

Page 11:

Structural Diagrams

Page 12:

Structural Diagrams

Page 13:

Analytical Data

Page 14:

Does Stereochemistry Matter?

Page 15:

Does one stereocenter matter?

Distaval, Talimol, Nibrol, Sedimide, Quietoplex, Contergan, Neurosedyn, Softenon, Thalidomide

Page 16:

Structural Representations

Page 17:

The InChI Standard

Page 18:

InChIKeys Search the Web by Structure
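
To make the idea of searching by structure concrete, here is a minimal sketch in Python. It assumes the standard InChIKey layout (a 14-letter connectivity block, a second block of 8 letters plus a standard flag and a version letter, and a final protonation letter) and uses the widely published InChIKey for aspirin as the example. The function names are hypothetical, and nothing here calls ChemSpider or any search engine.

```python
import re

# Assumed layout of an InChIKey: AAAAAAAAAAAAAA-BBBBBBBBFV-P
#   A: 14 letters hashed from the connectivity (the molecular skeleton)
#   B: 8 letters hashed from the remaining layers (stereo, isotopes, ...)
#   F: "S" for a standard InChI, "N" for non-standard
#   V: InChI version letter
#   P: protonation indicator ("N" means neutral)
INCHIKEY_RE = re.compile(r"^([A-Z]{14})-([A-Z]{8})([SN])([A-Z])-([A-Z])$")


def split_inchikey(key):
    """Split an InChIKey into its blocks, or raise ValueError if malformed."""
    match = INCHIKEY_RE.match(key.strip())
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {key!r}")
    connectivity, other_layers, flag, version, protonation = match.groups()
    return {
        "connectivity": connectivity,
        "other_layers": other_layers,
        "standard": flag == "S",
        "version": version,
        "protonation": protonation,
    }


def same_skeleton(key_a, key_b):
    """True when two keys share the first (connectivity) block."""
    return split_inchikey(key_a)["connectivity"] == split_inchikey(key_b)["connectivity"]


def exact_match_query(key):
    """Wrap the key in quotes so a text search engine treats it as one token."""
    split_inchikey(key)  # validate before building the query
    return f'"{key.strip()}"'


ASPIRIN = "BSYNRYMUTXBXSQ-UHFFFAOYSA-N"
print(split_inchikey(ASPIRIN)["connectivity"])
print(exact_match_query(ASPIRIN))
```

Because the first block depends only on connectivity, comparing first blocks is one quick way to spot records that share a skeleton but may differ in stereochemistry.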

Page 19:

I want to know about “Vincristine”

Page 20:
Page 21:

Vincristine: Identifiers and Properties

Page 22:

Vincristine: Vendors and Sources

Page 23:

Vincristine: Patents

Page 24:

Chemical Names and Synonyms

VALIDATION OF NAMES

Page 25:

Validated Names for Searching…

Page 26:

Information System Architecture

Input Filtering Curation Archival

Storage

Indexing

Processing Search Browse

Presentation

API

Page 27:

The Quality of Chemical Data Online

What is the Structure of Vitamin K?

A lipid cofactor that is required for normal blood clotting. Several forms of vitamin K have been identified: VITAMIN K1 (phytomenadione) derived from plants, VITAMIN K2 (menaquinone) from bacteria & synthetic naphthoquinone provitamins, VITAMIN K3 (menadione).

Page 28:

What is the Structure of Vitamin K1?

Page 29:

What is the Structure of Vitamin K1?

Page 30:

CAS’s Common Chemistry

Page 31:

Wikipedia

Page 32:

Wolfram Alpha

Page 33:

DailyMed

Page 34:
Page 35:

People Use Trusted Resources…

Page 36:

Just Yesterday…

Page 37:

How will it improve?

Participation

and

contribution

Page 38:

ALL Different, ALL “Domoic Acids”

Page 39:

ALL Different, ALL “Domoic Acids”

Page 40:

The EXPERTS must get it right?!

Page 41:

Question Everything Online: www.dhmo.org

Page 42:

ANYBODY can annotate a record on ChemSpider

Registered users can deposit new data

Registered users can validate existing data

Deposition, Annotation and Validation

Page 43:

CURATION

Search “Vitamin H”

Page 44:

“Curate” Identifiers

Page 45:

“Curate” Identifiers

Page 46:

ChemSpider Web Services

Page 47:

ChemSpider via web service access

For structure identification for mass spectrometry

For name and structure resolution

For structure and substructure searching

For an “innovative medicines initiative” semantic web project…

Open APIs for Science

Page 48:

Open PHACTS Project

Develop a set of robust standards

Integrate Chemistry and Biology data by implementing the standards in a semantic integration hub

Deliver services to support drug discovery programs in pharma and public domain

INITIALLY 22 partners, 8 pharmaceutical companies, 3 biotechs

36-month project – first public release version is imminent

Guiding principle is open access, open usage, open source: key to standards adoption

Page 49:

Using RDF permalinks

http://www.chemspider.com/Chemical-Structure.7787.rdf

Using a Search Term

http://www.chemspider.com/rdf.ashx?q=cyclohexane

http://rdf.chemspider.com/cyclohexane

RDF and the semantic web
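
The URL patterns on this slide can be generated programmatically. The sketch below only builds the three string forms shown above and makes no network requests. The helper names are hypothetical, and the endpoints themselves may have changed or been retired since this talk was given.

```python
from urllib.parse import quote, urlencode

# Base addresses copied from the slide; treat them as historical examples.
CHEMSPIDER = "http://www.chemspider.com"
RDF_HOST = "http://rdf.chemspider.com"


def rdf_permalink(record_id):
    """RDF permalink for a numeric ChemSpider record ID."""
    return f"{CHEMSPIDER}/Chemical-Structure.{int(record_id)}.rdf"


def rdf_search_url(term):
    """RDF lookup by search term, passed as the q= query parameter."""
    return f"{CHEMSPIDER}/rdf.ashx?" + urlencode({"q": term})


def rdf_short_url(term):
    """Short form with the search term as the URL path."""
    return f"{RDF_HOST}/" + quote(term, safe="")


print(rdf_permalink(7787))
print(rdf_search_url("cyclohexane"))
print(rdf_short_url("cyclohexane"))
```

Encoding the search term matters for names that contain spaces or punctuation, which is common for chemical names.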

Page 50:

RDF and the semantic web

Page 51:

www.SpectralGame.com http://www.jcheminf.com/content/1/1/9

Page 52:

Times have changed

Immediacy of social networks

Commenting on articles/data is here

The “participating scientist” has high profile

And who can be a scientist now???

The World of Contribution

Page 53:

A Ten Year Old Scientist

Page 54:
Page 55:

Challenging a Publication

Page 56:
Page 57:

Oops…

Page 58:

>2 Years to Resolution

Page 60:

The Blogosphere “Discusses”…

Page 61:

Oxidation by Sodium Hydride?

Page 62:

The Blogosphere Analyzes…

Page 63:

The Blogosphere Analyzes…

Page 64:

How much is in the archives?

Page 65:

Open Notebook Science Analysis

Page 66:

Motivation: Faster Science, Better Science

Page 67:

Openness – Still Carries Licensing

Openness may be hard…

Open Access flavors

Open Source licenses

Open Data licenses

Open Notebook Science

Page 68:

License data based on GOALS: scientific, commercial, or mixed

Explore the benefits of open licensing and drawbacks of enclosure

Provide simple explanations of terms of use

If you can't make the data public domain, make the metadata public domain.

We Suggest Rules for Licensing Data

Page 69:

We Suggest Rules for Licensing Data

Page 70:

Challenged in the Twittersphere

Page 71:

Annotating Articles Today…

Page 72:

Attribution to me…

Page 73:

Other Publications to Annotate…

Page 74:

Other Publications to Annotate…

Page 75:

Publications to Annotate…

“We then established a collaboration with professor Sum Ting Wong, a fugitive from the North Korean University Hu Yu Hai Ding”

“..identified as the new protein Wai So Dim”

Page 76:

A New World for Publishing?

Page 77:

An Adventure into the World of Small but Significant Contribution…

Page 78:

ChemSpider SyntheticPages

Page 79:

Micropublishing with Peer Review (a chemical synthesis blog?)

Page 80:

Multi-Step Synthesis

Page 81:

Interactive Data

Page 82:

A New Route for Scientific Recognition?

Page 83:

How do “we” measure a scientist?

The funding bodies, department heads etc. use

Publication profile

Impact factors

An index – h, m, g, i10, c, s …

Grants brought in

Scientists are notable in different ways – technology can help measure different types of “impact”

The Measure of a Scientist?
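
As an illustration of how some of these indices are computed, here is a short sketch using the usual definitions of the h-index, i10-index and g-index. The citation counts are made up for the example, and the function names are my own.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations):
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    This version caps g at the number of papers in the list.
    """
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, count in enumerate(ranked, start=1):
        total += count
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for six papers.
papers = [25, 8, 5, 3, 3, 0]
print(h_index(papers), i10_index(papers), g_index(papers))
```

The same publication list gives quite different numbers under each definition, which is one reason a single index is a blunt measure of a scientist.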

Page 84:

What makes a Scientist Notable?

Page 85:

Online tools track activities of scientists

Some are totally opt-in; an increasing number are about you and need checking!

Take responsibility for your profile online

Actively BUILD your online profile

Public Profiles of Scientists

Page 86:

Microsoft Academic Search

Page 87:

My Academic Search Profile

Page 88:

My Co-author Graph

Page 89:

How many times do you see errors where:

1) You have not been able to annotate or curate

2) You have chosen not to annotate or curate

Q: How Often Do You Contribute? Annotation and Validation

Page 90:

My Co-author Graph

Page 91:

Contribute when you can!

Page 92:

Contribute when you can!

Page 93:
Page 94:

Scientists and ORCIDs?

Page 95:

A unique identifier for a scientist – a Scientist’s InChI!

Will enable aggregation of a scientist’s activities

ORCIDs associated with publications, data, blog comments, other contributions (Wikipedia, reviews etc.) will be a way to measure their impact
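
Like an InChIKey, an ORCID iD has a fixed, machine-checkable format. The sketch below assumes the published iD structure as I understand it: 16 characters in four hyphen-separated groups, where the last character is an ISO 7064 MOD 11-2 check character computed from the first 15 digits. The function names are hypothetical, and the example iD is the one ORCID documents for the fictitious researcher Josiah Carberry.

```python
def orcid_check_char(base15):
    """Check character for the first 15 digits (ISO 7064 MOD 11-2)."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """True if the string is 16 characters (ignoring hyphens) with a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_char(base) == check


print(is_valid_orcid("0000-0002-1825-0097"))
```

A check character catches most typing errors, but it says nothing about whether the iD has actually been registered or belongs to the person claiming it.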

Page 96:

The Alt-Metrics Manifesto

http://altmetrics.org/manifesto/

Page 97:

ImpactStory

Page 98:

ImpactStory

Page 99:

SlideShare

Page 100:

SlideShare via ImpactStory

Page 101:

ImpactStory

Page 102:

Where do I contribute? How might I be measured?

Page 103:

Article Level Metrics

Page 104:

Article Level Metrics

Page 105:

Impact will be an aggregate measure of

Publications – classic measures and article level metrics

Data, algorithms and code – and their distribution and reuse

Contributions as comments, annotation and curation activities

New “impact factors” will develop with time

New Measures of Impact

Page 106:

Some challenges are technology based

The growth in data – storage and compute speed

Ontologies, dictionaries and trusted sources

Many challenges are “about us”

Licenses and rights

Rewards and recognition

Participation, contribution and collaboration

The Challenges

Page 107:

There are many government institutions building public compound databases that should collaborate more:

National Cancer Institute (NCI)

National Institutes of Health (NIH)

Environmental Protection Agency (EPA)

Food and Drug Administration (FDA)

National Library of Medicine (NLM)

Tear Down Walls between Government Labs

Page 108:
Page 109:

Release STRUCTURES Please!

Page 110:

What Does the Future Hold?

Page 111:

The Linked Network Will Grow

Page 112:

The Data Deluge Will Not Go Away

Page 113:

RSC Activities in Development

Deliver a Global Chemistry Hub

“Data enable” the RSC archive back to 1841:

Extract chemistry – chemicals, reactions, experimental data points, complex data

Enrich the articles for interactive viewing and crowdsourced annotation and curation

Enhance queries possible across the archive

Page 114:

Federated Data Segregation

Page 115:

Future System Architecture

Input

Filtering: smarter algorithms

Curation

Archival

Storage: elastic, distributed

Indexing: new algorithms

Processing: distributed

Search: over federated systems

Browse: over federated systems

Presentation: no more complex

API: complexity is hidden

Page 116:

Data Validation is Exacting Work

Page 117:

“Challenge” the Community

Page 118:

Chemistry is NOT just small molecules!

Data in RSC publications will be “enabled”

Data available for validation and curation

The delivery of the “Datument”

Data will be fed to models for validation, to retrain the models, full provenance retained

Algorithms will be provided to the community

Chemistry Data at RSC

Page 119:

Enhanced Mark-Up?

Page 120:

An Error in my Abstract?

Page 121:

An Error in my Abstract?

Chemists have embraced the web as a rich source of data and knowledge. However, all that glisters is not gold

Page 122:

Thanks Shakespeare

Page 123:

Acknowledgments

RSC and RSC|Cheminformatics team

All data source providers, curators and annotators

All software providers: commercial and open source

Contributors, curators, collaborators

Trusted Advisors: Jean-Claude Bradley, Sean Ekins, Lee Harland, Gary Martin, Martin Walker and…

Page 124:

Meet Valery… We’d love to chat…

Page 125:

Thank you

Email: [email protected]

Twitter: ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams