Andrea de Souza
Director, Informatics, Data Analysis & Finance
Center for the Science of Therapeutics
May 29, 2013
BioAssay Research Database
Direct Contributors
NIH Molecular Libraries – Glenn McFadden, Ajay Pillai
NIH Chemical Genomics Center – Chris Austin (PI), John Braisted, Marc
Ferrer, Rajarshi Guha, Ajit Jadhav, Dac-Trung Nguyen, Tyler Peryea, Noel
Southall, Henrike Veith
Broad Institute – Benjamin Alexander, Jacob Asiedu, Kay Aubrey, Joshua
Bittker, Steve Brudz, Simon Chatwin, Paul Clemons, Vlado Dancik, Siva
Dandapani, Andrea de Souza, Dan Durkin, David Lahr, Jeri Levine, Judy
McGloughlin, Phil Montgomery, Jose Perez, Stuart Schreiber (PI), Gil
Walzer, Xiaorong Xiang
University of New Mexico – Cristian Bologa, Steve Mathias, Tudor Oprea,
Larry Sklar (PI), Oleg Ursu, Anna Waller, Jeremy Yang
University of Miami – Saminda Abeyruwan, Hande Küküc, Vance
Lemmon, Ahsan Mir, Magdalena Przydzial, Kunie Sakurai, Stephan
Schürer, Uma Vempati, Ubbo Visser
Vanderbilt University – Eric Dawson, Bill Graham, Craig Lindsley (PI),
Shaun Stauffer
Sanford-Burnham Medical Research Institute – “T.C.” Chung, Jena
Diwan, Michael Hedrick, Gavin Magnuson, Siobhan Malany, Ian Pass,
Anthony Pinkerton, Derek Stonich, John Reed (PI)
Scripps Research Institute – Yasel Cruz, Mark Southern,
Hugh Rosen (PI)
BARD: BioAssay Research Database
Mission: Enable biomedical researchers and cheminformatic scientists to effectively use MLP data to generate new hypotheses
• Unique collaboration amongst 7 NIH & academic centers
• Develop and adopt an Assay Definition Standard (ADS)
• Provide tools for assay registration, querying & visualization
o Deploy predictive models
o Foster new methods to interpret chemical biology data
o Enable private data sharing
• Developed as an open-source, industrial-strength platform to support public translational research
BARD: BioAssay Research Database
• Team Science
• Research Data Management
• Technology
• Predictive Models
The BARD platform will support public translational research
Research Data Management
The Value of Context
PubChem BioAssay
PubChem BioAssay and BARD
structure the data
PubChem BioAssay and BARD
PubChem | BARD
Missing or fuzzy assay definitions, experiments and project concepts | Introduce assay definitions, experiments and projects
‘Column header’ centric with concentration details embedded | Result types and concentrations as experimental variables
Extensive use of unstructured text | Transition to structured use of common language
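The shift from ‘column header’ centric results to result types and concentrations as experimental variables can be illustrated with a minimal sketch. The header pattern ("Inhibition at 10 uM"), the `Measurement` record and the `structure_row` helper are all hypothetical names chosen for illustration; they are not the PubChem deposition format or the BARD schema.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical header convention such as "Inhibition at 10 uM".
# This pattern is an assumption for illustration, not a PubChem or BARD format.
HEADER = re.compile(
    r"^(?P<result_type>.+?) at (?P<conc>\d+(?:\.\d+)?) (?P<unit>[A-Za-z]+)$"
)


@dataclass(frozen=True)
class Measurement:
    """One result with its concentration held as an explicit variable."""
    result_type: str
    value: float
    concentration: Optional[float]
    concentration_unit: Optional[str]


def structure_row(row: dict) -> list:
    """Split concentration details out of column headers into fields."""
    measurements = []
    for header, value in row.items():
        match = HEADER.match(header)
        if match:
            measurements.append(
                Measurement(
                    match["result_type"],
                    value,
                    float(match["conc"]),
                    match["unit"],
                )
            )
        else:
            # No embedded concentration: keep the header as the result type.
            measurements.append(Measurement(header, value, None, None))
    return measurements


row = {"Inhibition at 10 uM": 82.5, "Inhibition at 1 uM": 31.0, "IC50": 2.4}
for measurement in structure_row(row):
    print(measurement)
```

Once concentration is a field rather than part of a header string, results from different assays can be filtered and compared on it directly.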
[Figure: structuring PubChem MLP-BioAssay data. Ontologies and sources shown: Entrez, Uniprot, Gene Ontology, Disease Ontology, BioAssay Ontology, Unit Ontology, Chemical Ontology. Also labeled: BARD Dictionary & Term Hierarchy, BARD Assay Definition Hierarchy]
• Annotate all assays to a minimum standard
• Integrate and extend ontologies
• Enable assay registration
• Represent assays, results, experiments using ADS
• Exchange information in ADS via ADF
Structuring the Data
BARD Technology Components
• Define & Register Assays
o Data Dictionary – standard terms
o Catalog of Assay Protocols
• High Quality Data & Result Deposition
o Calculations & Results
o Project-experiment association
• Query & Interpret Information
o Intuitive Guided Queries
o Cross Assay & SAR centric views
o Advanced applications
Enable Hypothesis Generation (Novice to Expert)
Web Client
• Google-like searching of 4,000+ assays, 35M+ compounds, 300+ projects
• Filter on annotations, such as detection method type
• Amazon-like Query Cart: save items of interest for further analysis
Web Client – Project Specific Views
Web Client – Probe Development Workflow
Sunburst Visualization
Molecular activity against target classes
Target classifications from PantherDB
PANTHER in 2013: modeling the evolution of gene function,
and other gene attributes, in the context of phylogenetic trees.
Huaiyu Mi, Anushya Muruganujan and Paul D. Thomas
Nucl. Acids Res. (2012) doi: 10.1093/nar/gks1118
As open source as possible
• Components: Web Query & Desktop Clients, Data Warehouse & REST API, Catalog of Assay Protocols
• Libraries named: Jersey, D3.js, JGoodies
• Commercial License
• MySQL support for CAP coming soon
Chemaxon Usage in BARD
• UNM Promiscuity Plugin: JChem for scaffold decomposition
• REST API & Warehouse: JChem for rendering structures and molecule fingerprint generation
http://bard.nih.gov/api/latest/compounds/6915727/image?s=200
http://bard.nih.gov/api/latest/compounds/?filter=n1cccc2ccccc12%5Bstructure%5D&type=sim&cutoff=0.9&expand=true
http://bard.nih.gov/api/latest/plugins/badapple/prom/cid/6915727?expand=true
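The example URLs above follow a regular pattern, so they can be assembled programmatically. The sketch below only builds the URL strings and makes no network request; the helper names `compound_image_url` and `similarity_search_url` are hypothetical, and the parameter meanings (`s` as image size, `cutoff` as similarity threshold) are inferred from the example URLs rather than from API documentation. The service may also no longer be reachable at this address.

```python
from urllib.parse import urlencode

# Base taken from the example URLs on the slide.
API_BASE = "http://bard.nih.gov/api/latest"


def compound_image_url(cid: int, size: int = 200) -> str:
    """URL for a rendered structure image of a compound (assumed: s = size)."""
    return f"{API_BASE}/compounds/{int(cid)}/image?" + urlencode({"s": size})


def similarity_search_url(smiles: str, cutoff: float = 0.9) -> str:
    """URL for a structure similarity search, mirroring the slide's example."""
    query = urlencode(
        {
            # The "[structure]" suffix is percent-encoded by urlencode.
            "filter": f"{smiles}[structure]",
            "type": "sim",
            "cutoff": cutoff,
            "expand": "true",
        }
    )
    return f"{API_BASE}/compounds/?{query}"


print(compound_image_url(6915727))
print(similarity_search_url("n1cccc2ccccc12"))
```

Using `urlencode` rather than string concatenation keeps the square brackets in the filter safely encoded, which matches the `%5B` and `%5D` seen in the example.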
Chemaxon Usage in BARD
• Web Query Client: JChem for rendering structures
• Desktop Client: JChem for rendering structures, molecule import & export; Marvin for drawing query structures
• BioActivity Data Associative Promiscuity Pattern Learning Engine (BADAPPLE)
• Associations via scaffolds for chemical space navigation
Example URI* | Description
<base>/badapple/prom/cid/752424 | For compound with specified ID, return scaffold IDs and scores.
<base>/badapple/prom/cid/752424?expand=true | Additional statistics, scaffold smiles, and inDrug flag.
<base>/badapple/prom/scafid/233 | For scaffold with specified ID, return statistics and smiles.
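The three URI forms above differ only in the lookup kind (`cid` or `scafid`), the numeric ID and the optional `expand` flag, which a small helper can capture. This is a sketch under assumptions: `badapple_uri` is a hypothetical name, and the default for `<base>` is inferred from the plugin URL shown earlier in the deck (`http://bard.nih.gov/api/latest/plugins/...`), not stated on this slide. Nothing is fetched.

```python
from urllib.parse import urlencode

# Assumption: <base> is the plugin root seen in the earlier example URL.
DEFAULT_BASE = "http://bard.nih.gov/api/latest/plugins"

# Lookup kinds that appear in the example URIs.
KINDS = ("cid", "scafid")


def badapple_uri(kind: str, ident: int, expand: bool = False,
                 base: str = DEFAULT_BASE) -> str:
    """Build a promiscuity lookup URI for a compound or scaffold ID."""
    if kind not in KINDS:
        raise ValueError(f"kind must be one of {KINDS}, got {kind!r}")
    uri = f"{base.rstrip('/')}/badapple/prom/{kind}/{int(ident)}"
    if expand:
        # The slide shows expand=true only on the compound lookup.
        uri += "?" + urlencode({"expand": "true"})
    return uri


print(badapple_uri("cid", 752424))
print(badapple_uri("cid", 752424, expand=True))
print(badapple_uri("scafid", 233))
```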
Predictive Models
• Predicts sites of metabolism for CYP450 isoforms from 2D structures
• Patrik Rydberg et al.
• Released under LGPL
• BARD plugin
– Summary HTML view
– Data view
Navigating the Maze
Long-Term Path Forward
BARD as a Platform
• Datasets: MLP, NCI-60, TBD, TBD
• Tools: CAP, Web Query, Desktop, APIs
• Methods: BAD Apple, CYP450, TBD, TBD
• Data Analysis: Workflow 1, Workflow 2, Workflow 3
Sustained Community Engagement
ADS