The application of cloud computing to Royal Society of Chemistry data platforms
Valery Tkachenko, Ken Karapetyan, Jon Steele,
Alexey Pshenichnov, Antony J. Williams
ACS 247th National Meeting
Dallas, TX
March 18th 2014
ChemSpider
RSC Archive
RSC Chemistry Platform
Big Data world and chemistry
Data quality
Cloud Computing considerations
• ~30 million chemicals and growing
• Data sourced from >500 different sources
• Live depositions
• Live crowd curation and annotation
• A structure-centric hub for web searching
ChemSpider – user view
ChemSpider – under the hood
ChemSpider – load over years
2007
• 1 visitor (there is always the first one)
2009
• 3000 – 7000 visits/day
2014
• 50000 visits/day
• 40000 unique visitors/day
• 150000 page views/day
• 100 – 400 real-time visitors
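As a rough back-of-the-envelope check, the 2014 figures can be turned into average rates. This is only a sketch: it assumes traffic is spread evenly over the day, and the peak factor is an illustrative assumption that does not come from the slides.

```python
# Figures quoted on the slide for 2014.
VISITS_PER_DAY = 50_000
PAGE_VIEWS_PER_DAY = 150_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average number of pages viewed in one visit.
pages_per_visit = PAGE_VIEWS_PER_DAY / VISITS_PER_DAY

# Average page views per second if load were perfectly even across the day.
avg_views_per_second = PAGE_VIEWS_PER_DAY / SECONDS_PER_DAY

# ASSUMPTION (not from the slides): peak load is a multiple of the average.
ASSUMED_PEAK_FACTOR = 5
assumed_peak_views_per_second = avg_views_per_second * ASSUMED_PEAK_FACTOR

print(f"pages per visit: {pages_per_visit:.1f}")
print(f"average page views/s: {avg_views_per_second:.2f}")
print(f"assumed peak page views/s: {assumed_peak_views_per_second:.2f}")
```

The average works out to about 3 pages per visit and under 2 page views per second. Real traffic is bursty, so capacity planning would normally start from measured peaks rather than from this average.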
ChemSpider – bottlenecks analysis
• “Live” database
o Read-only is easier to scale-out
• Application server(s)
o Standard ways to scale
o Session persistence
• SQL server(s)
o Expensive, but not all data are relational - NoSQL
o Overhead for replication
o Alternatives do not work well for “live” databases
• Backend (processing) server(s)
o Use of grid computing
• UI technology
o ASP.NET Forms
o MVC/REST
• Software as a Service (SaaS)
o API
o Widgets
o High-scalability
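The point that read-only access is easier to scale out can be illustrated with a small read/write splitting sketch. This is not the ChemSpider implementation: the class `QueryRouter`, the server names, and the "starts with SELECT" rule are all hypothetical and chosen only to show the idea.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterator, List


@dataclass
class QueryRouter:
    """Send writes to one primary and spread reads across read-only replicas."""

    primary: str
    replicas: List[str]
    _next_replica: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Fall back to the primary when no replicas are configured.
        self._next_replica = cycle(self.replicas or [self.primary])

    def route(self, sql: str) -> str:
        """Return the name of the server that should run this statement."""
        # Simplistic rule (an assumption): anything starting with SELECT is a read.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._next_replica) if is_read else self.primary


# Hypothetical server names, for illustration only.
router = QueryRouter(primary="primary-db", replicas=["replica-1", "replica-2"])
print(router.route("SELECT * FROM compounds WHERE id = 1"))
print(router.route("SELECT * FROM compounds WHERE id = 2"))
print(router.route("INSERT INTO compounds (id) VALUES (3)"))
```

Reads rotate across the replicas while every write goes to the single primary. With a "live" database that accepts continuous depositions, replicas lag behind the primary, which is one reason the replication overhead listed above matters.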
ChemSpider – scaling out
ChemSpider – geography
Globalization
Localization
CDN
RSC Archive
RSC Archive – since 1841
Published article example
Compounds
Reaction
Analytical Data
Text and References
New navigation style
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known Pathways?
Working On Now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
RSC Chemistry Platform
New architecture
• Data tier: Compounds, Reactions, Spectra, Crystals, Documents
• Data access tier: Compounds API, Reactions API, Spectra API, Crystals API, Documents API
• User interface components tier: Compounds Widgets, Reactions Widgets, Spectra Widgets, Crystals Widgets, Documents Widgets
• User interface tier (examples)
o Analytical Laboratory application
o Electronic Laboratory Notebook
o Chemical Inventory application
o Paid 3rd party integrations (various platforms – SharePoint, Google, etc.)
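The layering above (data tier, data access tier, user interface components tier) can be sketched for a single data type. All class names, fields, and the sample record here are hypothetical, and an in-memory dictionary stands in for the real data store; this only illustrates how each tier talks to the one below it.

```python
from dataclasses import dataclass
from html import escape
from typing import Dict, Optional


@dataclass(frozen=True)
class Compound:
    compound_id: int
    name: str
    formula: str


class CompoundStore:
    """Data tier: holds the records (a dict stands in for a database)."""

    def __init__(self) -> None:
        self._rows: Dict[int, Compound] = {}

    def add(self, compound: Compound) -> None:
        self._rows[compound.compound_id] = compound

    def get(self, compound_id: int) -> Optional[Compound]:
        return self._rows.get(compound_id)


class CompoundsAPI:
    """Data access tier: the only route by which upper tiers reach the data."""

    def __init__(self, store: CompoundStore) -> None:
        self._store = store

    def get_compound(self, compound_id: int) -> Optional[dict]:
        c = self._store.get(compound_id)
        if c is None:
            return None
        return {"id": c.compound_id, "name": c.name, "formula": c.formula}


class CompoundWidget:
    """User interface components tier: renders API output as an HTML fragment."""

    def __init__(self, api: CompoundsAPI) -> None:
        self._api = api

    def render(self, compound_id: int) -> str:
        data = self._api.get_compound(compound_id)
        if data is None:
            return '<div class="compound">Not found</div>'
        return (f'<div class="compound">{escape(data["name"])} '
                f'({escape(data["formula"])})</div>')


# Illustrative record; the id is made up and is not a real ChemSpider identifier.
store = CompoundStore()
store.add(Compound(1, "Water", "H2O"))
widget = CompoundWidget(CompoundsAPI(store))
print(widget.render(1))
print(widget.render(2))
```

Because the widget depends only on the API, an application in the user interface tier (for example an electronic laboratory notebook) could embed the widget or call the API directly without knowing how the data tier is stored.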
Big Data world and chemistry
We are a part of a much larger world
APIs, endpoints and widgets
Challenges of Big Data: indexing, navigation, visualization
Managing Big Data
Consuming Big Data
Data quality
Chemistry Validation and Standardization Platform
Cloud Computing considerations
Cloud continuum
Cloud services from major players
Big Data in a Cloud whoops…
Summary
Cloud definition is foggy
Demand for computing resources is growing tremendously as we move into a Big Data world
Moving into the Cloud is not an “if” question, it’s a “when” question
It’s also a question of timing, budgets and resources
Thank you
Email: tkachenkov@rsc.org
Slides: http://www.slideshare.net/valerytkachenko16