Big Data Supporting Drug Discovery
Cautionary Tales from the World of Chemistry for Translational Informatics
Valery Tkachenko
RSC-CSIR/OSDD meeting
Pune, India
February 3rd 2014
• Big Data
• Chemical Space
• Drug Discovery pipeline
• Machine learning
• Training sets
• RSC/ChemSpider platforms
• RSC/Archive
• Research data management
• Data quality, crowdsourcing and AltMetrics
• Building Global Chemistry Network
• ~30 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web searching
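A structure-centric hub can be pictured as an index that merges records from many sources under one structure key. The sketch below assumes the InChIKey as that key and uses a hypothetical record layout (`source`, `inchikey`, `name`); it illustrates the idea only and is not ChemSpider's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    # Hypothetical layout: one entry deposited by one data source.
    source: str
    inchikey: str
    name: str


def build_hub(records):
    """Group source records by structure key (assumed here: InChIKey).

    Returns a dict mapping each InChIKey to the set of sources that
    deposited it and the set of names those sources attach to it.
    """
    hub = defaultdict(lambda: {"sources": set(), "names": set()})
    for rec in records:
        entry = hub[rec.inchikey]
        entry["sources"].add(rec.source)
        entry["names"].add(rec.name)
    return dict(hub)
```

For example, two sources that call LFQSCWFLJHTTHZ-UHFFFAOYSA-N "ethanol" and "ethyl alcohol" end up under one hub entry carrying both names, which is what makes the structure (not the name) the search entry point.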
ChemSpider Databases
ChemSpider Compounds
ChemSpider Reactions
ChemSpider Spectra
ChemSpider Crystals
ChemSpider Materials
ChemSpider Assays
ChemSpider Algorithms
Research data inflow
Deposition Gateway
Staging databases
Compounds
Reactions
Spectra
Materials
Articles / CSSP
Compounds Module
Spectra Module
Reactions Module
Materials Module
Text-mining Module
Web UI for unified depositions
Dropbox, Google Drive, SkyDrive, etc.
LabTrove and other templated data
Documents
API, FTP, etc.
Raw data
Validated data
Staging databases
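The inflow can be read as raw depositions landing in staging databases and being checked by per-type modules before they count as validated data. A minimal sketch, assuming each module is a function that returns a list of problems and that only problem-free records are promoted; the module rules shown are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class StagedRecord:
    # Hypothetical raw deposition held in a staging database.
    kind: str                      # e.g. "compound" or "spectrum"
    payload: dict
    problems: list = field(default_factory=list)


def check_compound(payload):
    # Hypothetical Compounds Module rule: a structure must be present.
    return [] if payload.get("structure") else ["missing structure"]


def check_spectrum(payload):
    # Hypothetical Spectra Module rule: a spectrum must carry data points.
    return [] if payload.get("points") else ["empty spectrum"]


# Dispatch table from record type to its validation module.
MODULES = {"compound": check_compound, "spectrum": check_spectrum}


def process_staging(staged):
    """Validate staged records; promote clean ones, keep the rest in staging."""
    validated, still_staged = [], []
    for rec in staged:
        module = MODULES.get(rec.kind)
        if module is None:
            rec.problems = [f"no module for {rec.kind}"]
        else:
            rec.problems = module(rec.payload)
        if rec.problems:
            still_staged.append(rec)
        else:
            validated.append(rec)
    return validated, still_staged
```

Keeping rejected records in staging with their problems attached, rather than discarding them, leaves room for manual or crowdsourced curation later.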
All databases are sliced by data source/data collection and share a simple security model in which each data slice/source is private, public or embargoed.
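The three-state security model can be expressed as a single read-access check per slice. A minimal sketch under two assumptions not stated on the slide: the slice owner can always read it, and an embargo lifts on a fixed date. The `DataSlice` layout and `can_read` name are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Visibility(Enum):
    PRIVATE = "private"
    PUBLIC = "public"
    EMBARGOED = "embargoed"


@dataclass(frozen=True)
class DataSlice:
    # One data source / collection inside a database (hypothetical layout).
    name: str
    owner: str
    visibility: Visibility
    embargo_until: Optional[date] = None  # only meaningful for EMBARGOED


def can_read(slice_: DataSlice, user: str, today: date) -> bool:
    """Decide whether `user` may read records in `slice_` on `today`."""
    if user == slice_.owner:
        return True  # assumption: owners always see their own slice
    if slice_.visibility is Visibility.PUBLIC:
        return True
    if slice_.visibility is Visibility.EMBARGOED:
        # Assumption: an embargoed slice becomes readable once its date passes.
        return slice_.embargo_until is not None and today >= slice_.embargo_until
    return False  # PRIVATE
```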
Research data outflow
Data tier: Compounds, Reactions, Spectra, Materials, Documents
Data access tier: Compounds API, Reactions API, Spectra API, Materials API, Documents API
User interface components tier: Compounds widgets, Reactions widgets, Spectra widgets, Materials widgets, Documents widgets
User interface tier (examples): Electronic Laboratory Notebook, Analytical Laboratory application, Chemical Inventory application, paid 3rd-party integrations (various platforms: SharePoint, Google, etc.)
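The outflow layering (data tier, data access tier, user interface components, applications) can be sketched for one data type. Assumptions: an in-memory dict stands in for the compounds database, and the class and function names (`CompoundStore`, `CompoundsAPI`, `compound_widget`) are hypothetical, not the RSC APIs.

```python
import html


class CompoundStore:
    """Data tier: hypothetical in-memory stand-in for the compounds database."""

    def __init__(self):
        self._rows = {}

    def put(self, compound_id, name, formula):
        self._rows[compound_id] = {"id": compound_id, "name": name, "formula": formula}

    def fetch(self, compound_id):
        return self._rows.get(compound_id)


class CompoundsAPI:
    """Data access tier: the only layer that talks to the store."""

    def __init__(self, store):
        self._store = store

    def get_compound(self, compound_id):
        row = self._store.fetch(compound_id)
        if row is None:
            raise KeyError(f"unknown compound {compound_id!r}")
        return dict(row)  # a copy, so callers cannot mutate the store


def compound_widget(api, compound_id):
    """UI component tier: render one compound as an HTML fragment via the API."""
    c = api.get_compound(compound_id)
    return (f'<div class="compound"><b>{html.escape(c["name"])}</b> '
            f'{html.escape(c["formula"])}</div>')
```

An application in the top tier (an ELN or an inventory tool, say) would only embed `compound_widget` output; because widgets go through the API rather than the store, the same widget can be reused across applications.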
It is so difficult to navigate…
• What’s the structure?
• Are they in our file?
• What’s similar?
• What’s the target?
• Pharmacology data?
• Known pathways?
• Working on now?
• Connections to disease?
• Expressed in right cell type?
• Competitors?
• IP?
Data quality issues and CVSP (Chemical Validation and Standardization Platform)
– Robochemistry
– Proliferation of errors in public and private databases
– Automated quality control system
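One kind of rule an automated quality control system can apply is an internal-consistency check between fields of the same record. The sketch below swaps in a deliberately simple rule (deposited molecular weight versus the weight computed from the molecular formula); it is an illustration, not CVSP's rule set. It assumes flat formulas without parentheses, charges or hydrates, a hypothetical record layout with `formula` and `mw` keys, and rounded standard atomic weights for a handful of elements.

```python
import re

# Rounded standard atomic weights for a few common elements (extend as needed).
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "F": 18.998, "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904,
}

_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Parse a flat molecular formula such as 'C2H6O' into element counts."""
    counts, pos = {}, 0
    for m in _TOKEN.finditer(formula):
        if m.start() != pos:  # unparsed characters before this token
            raise ValueError(f"cannot parse formula {formula!r}")
        element, n = m.group(1), int(m.group(2) or 1)
        counts[element] = counts.get(element, 0) + n
        pos = m.end()
    if pos != len(formula) or not counts:
        raise ValueError(f"cannot parse formula {formula!r}")
    return counts


def average_mass(formula):
    # Raises KeyError for elements missing from ATOMIC_WEIGHTS.
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


def check_record(record, tolerance=0.05):
    """Return a list of problems for a record with 'formula' and 'mw' fields."""
    try:
        computed = average_mass(record["formula"])
    except (ValueError, KeyError) as exc:
        return [f"bad formula: {exc}"]
    if abs(computed - record["mw"]) > tolerance:
        return [f"mw {record['mw']} does not match formula ({computed:.3f})"]
    return []
```

A record such as `{"formula": "C2H6O", "mw": 60.1}` would be flagged, while `{"formula": "C2H6O", "mw": 46.07}` passes; run over a whole database, rules like this catch errors before they propagate to other collections.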
DrugBank dataset (6516 records)
J. Brecher, IUPAC, Graphical Representation of Stereochemical Configurations, Section ST-1.1.10
DB06287
Research data management
University 1
Data Hub
Workstations
University 2
Data Hub
Workstations
Company 3
Data Hub
Workstations
Data Repository: indexed storage
Data Repository: provided data storage
Chemically intelligent services
Indexes
Data
External clients: Publishers, Scientists, Funding bodies
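The diagram distinguishes a repository that only indexes data kept at each institution's data hub from one that also provides storage for it. A minimal sketch of that distinction, assuming a plain keyword index in place of the chemically intelligent services and hypothetical names (`DataHub`, `Repository`, `register`, `keep_copy`).

```python
from dataclasses import dataclass, field


@dataclass
class DataHub:
    # A university or company data hub that keeps its own files (hypothetical).
    name: str
    files: dict = field(default_factory=dict)  # file id -> content


@dataclass
class Repository:
    index: dict = field(default_factory=dict)   # keyword -> set of (hub, file id)
    stored: dict = field(default_factory=dict)  # (hub, file id) -> content copy

    def register(self, hub, file_id, keywords, keep_copy=False):
        """Index a hub file; with keep_copy=True also store it (provided storage)."""
        ref = (hub.name, file_id)
        for kw in keywords:
            self.index.setdefault(kw.lower(), set()).add(ref)
        if keep_copy:
            self.stored[ref] = hub.files[file_id]

    def search(self, keyword):
        return sorted(self.index.get(keyword.lower(), set()))

    def fetch(self, ref, hubs):
        """Serve from provided storage if present, else from the owning hub."""
        if ref in self.stored:
            return self.stored[ref]
        hub_name, file_id = ref
        return hubs[hub_name].files[file_id]
```

In indexed-only mode the hub stays the system of record and external clients (publishers, scientists, funding bodies) are pointed back to it; in provided-storage mode the repository can serve a snapshot even if the hub later changes or goes offline.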
RSC/Rewards and Recognition
Congratulations! Your 1st CSSP article has been published. Philosopher Lao Tzu said “A journey of a thousand miles begins with a single step”. In the same way we hope that this will be the first of many submissions that you make to CSSP.
The First Step badge is awarded when a user submits, and has published, their first CSSP article.
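The badge rule stated above can be written as a small function over submission records. A sketch only: the `Submission` layout and the `first_step_awardees` name are hypothetical, and awarding the badge once per user is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    # Hypothetical record of one CSSP submission.
    user: str
    published: bool


def first_step_awardees(submissions, already_awarded=frozenset()):
    """Users with at least one published CSSP article who lack the badge yet."""
    return {
        s.user
        for s in submissions
        if s.published and s.user not in already_awarded
    }
```

Passing the set of users who already hold the badge keeps the rule idempotent, so it can be re-run after every publication event.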
http://www.openphacts.org
Open PHACTS is an Innovative Medicines Initiative (IMI) project that aims to reduce the barriers to drug discovery in industry, in academia and for small businesses.
The Semantic Web is one of the cornerstones