PIMS data management and harvesting
General Introduction
Design a LIMS
Protein Production Data Model
What can PIMS do for you?
Information Management System
■ An Information Management System (IMS) is a combined database and information management system
■ A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data
■ Information management is the handling of knowledge acquired by many disparate sources in a way that optimizes access by all who have a share in that knowledge
Scientific goals
■ Recording laboratory information
  ■ A lot of data keeping
  ■ 10,000s of experiments
  ■ 1,000,000s of samples
■ Data interchange and interoperation
  ■ Collaboration in protein production
  ■ Share data between stages and sites
  ■ Data transfer to beamline or NMR ops
■ Data mining and reporting
  ■ Analysis
  ■ Negative results can be mined to improve methods
■ Scientific publications
  ■ Data deposition
PIMS
■ Protein Information Management System
■ Started in January 2005
■ 5-year UK project, funded by the Biotechnology and Biological Sciences Research Council (BBSRC)
■ Based on the Protein Production Data Model paper
  ■ Proteins. 2005 Feb 1;58(2):278-84. “Design of a data model for developing laboratory information management and analysis systems for protein production.”
Scope of PIMS
[Figure: the PIMS pipeline. Stages: Target selection, Target optimisation, Cloning, Expression, Purification & Concentration, Crystallisation, Microcrystals, Data collection, Phasing, Model building, Refinement. Areas: Bioinformatics, Molecular Biology, Crystallography. Other labels: import, export.]
Stakeholders
■ BBSRC SPoRT funding
■ Scottish Structural Proteomics Facility (SSPF)
  ■ Universities of Dundee, St. Andrews, Glasgow and Warwick.
■ Membrane Protein Structure Initiative (MPSI)
  ■ Universities of Glasgow, Leeds, Oxford, Sheffield, Imperial College, Birkbeck College, UMIST and CCLRC Daresbury.
■ Protein Information Management System (PIMS)
  ■ CCP4, Diamond
  ■ Oxford Protein Production Facility
  ■ IBBMC, University Paris Sud
  ■ European Bioinformatics Institute
  ■ York Structural Biology Laboratory
  ■ Daresbury Laboratory
  ■ Other UK protein scientists
  ■ Other protein scientists worldwide
[Figure: BBSRC funding linked to SSPF, MPSI and PIMS]
Collaborations
■ Seamless data transfer and a consistent UI ...
  ■ ... from target to structure deposition
  ■ ... so far as possible
■ Bioinformatics: SSPF pipeline, EBI workflow
■ Crystallization: NKI, EMBL Hamburg & Grenoble (BIOXHIT)
■ Data transfer: e-HTPX
■ Data collection: DNA, X-track
■ Structure solution: CCP4, CCPN
■ Instruments: Kendro, CSols
Design
■ The data model
  ■ focuses on what data should be stored
  ■ is used to design the entities (classes or tables) that we are dealing with, their various attributes, and their relationships
■ The goal of the data model is to make sure that all the data objects required are completely and accurately represented
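To make "entities, attributes and relationships" concrete, here is a minimal sketch in Python. The class names (Target, Sample, Experiment) and their attributes are illustrative assumptions chosen for this example; they are not taken from the actual PIMS data model.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical entities: names and attributes are assumptions for illustration,
# not the real PIMS classes.

@dataclass
class Target:
    name: str
    sequence: str


@dataclass
class Sample:
    label: str
    target: Target  # relationship: a sample is derived from one target


@dataclass
class Experiment:
    kind: str  # e.g. "Cloning" or "Expression"
    inputs: List[Sample] = field(default_factory=list)   # samples consumed
    outputs: List[Sample] = field(default_factory=list)  # samples produced


target = Target(name="T001", sequence="MKV")
plasmid = Sample(label="T001-plasmid", target=target)
culture = Sample(label="T001-culture", target=target)
expression = Experiment(kind="Expression", inputs=[plasmid], outputs=[culture])
```

Each class corresponds to an entity, each field to an attribute, and fields that hold other entities (target, inputs, outputs) to relationships.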
Reliability
■ Loss of data is inexcusable
■ Must be able to correct wrong data
■ Must keep audit trails
■ Must allow future changes
■ All made feasible by
  ■ Data model
  ■ Database
  ■ Software engineering standards
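One way to reconcile "correct wrong data" with "keep audit trails" is to record every change in an append-only history. The sketch below assumes an in-memory store; AuditedStore, AuditEntry and the field names are hypothetical and say nothing about how PIMS itself implements auditing.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class AuditEntry:
    """One immutable line of the audit trail (hypothetical structure)."""
    record_id: str
    attribute: str
    old_value: Any
    new_value: Any
    user: str


class AuditedStore:
    """Holds current values plus an append-only history of every change."""

    def __init__(self) -> None:
        self._records: Dict[str, Dict[str, Any]] = {}
        self.history: List[AuditEntry] = []

    def set(self, record_id: str, attribute: str, value: Any, user: str) -> None:
        record = self._records.setdefault(record_id, {})
        old = record.get(attribute)  # None if the attribute was never set
        self.history.append(AuditEntry(record_id, attribute, old, value, user))
        record[attribute] = value

    def get(self, record_id: str, attribute: str) -> Any:
        return self._records[record_id][attribute]


store = AuditedStore()
store.set("sample-1", "volume_ul", 500, user="alice")
store.set("sample-1", "volume_ul", 50, user="bob")  # correcting a wrong entry
```

The corrected value is what readers see, while the earlier value and the person who changed it remain recoverable from the history.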
Ancestry
■ HalX: an open-source LIMS (Laboratory Information Management System) for small- to large-scale laboratories.
  ■ Acta Crystallogr D Biol Crystallogr. 2005 Jun;61(Pt 6):671-8.
  ■ Prilusky J, Oueillet E, Ulryck N, Pajon A, Bernauer J, Krimm I, Quevillon-Cheruel S, Leulliot N, Graille M, Liger D, Tresaugues L, Sussman JL, Janin J, van Tilbeurgh H, Poupon A.
■ OPPF based on Nautilus
■ MOLE: a data management application based on a protein production data model.
  ■ Proteins. 2005 Feb 1;58(2):285-9.
  ■ Morris C, Wood P, Griffiths SL, Wilson KS, Ashton AW.
PIMS
■ The aim is to provide a Laboratory Information Management System (LIMS)
  ■ for laboratories that produce proteins from target genes
  ■ that can be incorporated into commercial software in the area of biotech and protein production
■ Improve the quality of the experimental data deposited into PDB
  ■ by providing software for lab scientists to harvest their daily experimental data from protein production to structure
■ My roles
  ■ Data Model
  ■ Database / Persistence layer / Java API
  ■ Java Applet development
Why is Data Modelling Important?
■ A Data Model is a plan for building a database
  ■ detailed enough to be used to create the physical structure
  ■ simple enough to communicate to the end user the data structure
■ The Unified Modelling Language (UML)
Data Model
■ Related to protein production & crystallisation■ Suitable for large & small facilities
■ Required to reproduce the samples & experiments involved
■ Used for tracking samples, experiments & results
■ Developed to help software developers to collect, store and exchange information through the provision of a common platform
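Tracking samples, experiments and results amounts to being able to walk back from any sample to the experiments that produced it. The following is a minimal sketch under that assumption; the sample names, experiment names and the produced_by mapping are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical records: output sample -> (experiment that made it, input samples).
produced_by: Dict[str, Tuple[str, List[str]]] = {
    "crystal-1": ("Crystallisation", ["protein-1"]),
    "protein-1": ("Purification", ["lysate-1"]),
    "lysate-1": ("Expression", ["plasmid-1"]),
    "plasmid-1": ("Cloning", []),
}


def history(sample: str) -> List[str]:
    """Return the chain of experiments that led to `sample`, earliest first."""
    if sample not in produced_by:
        return []  # unknown or purchased sample: no recorded history
    experiment, inputs = produced_by[sample]
    steps: List[str] = []
    for parent in inputs:
        steps.extend(history(parent))  # first recurse into the inputs
    steps.append(f"{experiment} -> {sample}")
    return steps


for step in history("crystal-1"):
    print(step)
```

With records of this kind, reproducing a sample means replaying the listed experiments in order.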
Area covered
■ Protein production work is generally the investigation of a particular protein, the Target
■ The work often aims to produce a derivative of the Target, such as a single domain or a complex
[Figure: from target through protein production to either crystallisation, X-ray, phasing and structure, or NMR tube and NMR]
The Core Data Model
Change Control Board
■ The data model is a work in progress
  ■ The science is developing too
  ■ Local protocols, which are novel and confidential
  ■ Not easy work
■ Thanks to…
  ■ Geoff Barton (Dundee)
  ■ Steve Prince (Manchester)
  ■ Anne Poupon (IBBMC)
  ■ Jon Diprose (OPPF)
  ■ Alun Ashton (Diamond)
  ■ Rasmus Fogh (CCPN)
Generation machinery
■ Implemented in UML (Object Domain)
■ Developed within a framework provided by the CCPN project
■ Information stored in the UML Data Model is used to automatically generate
  ■ SQL schema,
  ■ Java Application Program Interfaces (APIs) and
  ■ Documentation
[Figure: the framework (www.ccpn.ac.uk) takes the UML Data Model and generates the Java API, Python API, Doc, SQL schema and XML schema]
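The idea of generating several artefacts from one model can be shown with a toy generator. This sketch is not the CCPN framework and does not read UML; it assumes a much-simplified model held in a Python dictionary (MODEL, to_sql and to_doc are hypothetical names) and produces an SQL schema and a short documentation listing from it.

```python
import sqlite3
from typing import Dict

# Hypothetical, much-simplified model: class name -> {attribute: SQL type}.
MODEL: Dict[str, Dict[str, str]] = {
    "Target": {"name": "TEXT", "sequence": "TEXT"},
    "Sample": {"label": "TEXT", "target_id": "INTEGER"},
}


def to_sql(model: Dict[str, Dict[str, str]]) -> str:
    """Generate one CREATE TABLE statement per class in the model."""
    statements = []
    for table, attrs in model.items():
        columns = ["id INTEGER PRIMARY KEY"]
        columns += [f"{name} {sql_type}" for name, sql_type in attrs.items()]
        statements.append(f"CREATE TABLE {table} ({', '.join(columns)});")
    return "\n".join(statements)


def to_doc(model: Dict[str, Dict[str, str]]) -> str:
    """Generate a plain-text description of the same model."""
    lines = []
    for table, attrs in model.items():
        lines.append(f"{table}: {', '.join(attrs)}")
    return "\n".join(lines)


# Both outputs come from the same source, so they cannot drift apart.
connection = sqlite3.connect(":memory:")
connection.executescript(to_sql(MODEL))
print(to_doc(MODEL))
```

Because the schema and the documentation are both derived from MODEL, a change to the model only has to be made in one place.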
Architecture
■ The API provides methods to access the underlying DB to store and retrieve data
  ■ This allows applications to manipulate data without a detailed knowledge of the way in which the data is stored
■ Various different applications make use of the API
  ■ LIMS
  ■ Any High Throughput applications (non-GUI)
■ They are able to exchange data easily
[Figure: tools (GUI, standalone applications, …) call the API (Java API), which sits on a persistence layer over the storage (DB, SQL schema)]
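The layering described here can be sketched as a small API class that is the only code aware of SQL. The example uses Python and SQLite for brevity, whereas the slides describe a Java API; SampleApi, its methods and the table layout are assumptions made for this illustration.

```python
import sqlite3
from typing import Optional


class SampleApi:
    """Hypothetical API layer: callers never see SQL or the table layout."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sample "
            "(label TEXT PRIMARY KEY, volume_ul REAL)"
        )

    def save_sample(self, label: str, volume_ul: float) -> None:
        # Insert a new sample or overwrite the one with the same label.
        self._db.execute(
            "INSERT OR REPLACE INTO sample (label, volume_ul) VALUES (?, ?)",
            (label, volume_ul),
        )
        self._db.commit()

    def find_volume(self, label: str) -> Optional[float]:
        row = self._db.execute(
            "SELECT volume_ul FROM sample WHERE label = ?", (label,)
        ).fetchone()
        return row[0] if row else None


# A GUI and a non-GUI high-throughput script could both use the same calls.
api = SampleApi(sqlite3.connect(":memory:"))
api.save_sample("T001-culture", 250.0)
print(api.find_volume("T001-culture"))
```

If the storage changes, only the body of SampleApi has to change; applications written against save_sample and find_volume are unaffected.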
From data model to application
■ Data Model
■ Use cases
  ■ Scientific logic into requirements
■ Specifications
  ■ security, performance, usability, etc
■ Java API
■ Test data
■ UI Design
■ Application
Modular Construction
■ http://www.pims-lims.org/project/use-case-suite.html
System Administration
Setup & Configuration
Access Rights Management
Project Management
Reference Data
Instrument Management
Scheduling Data Capture
Inventory Management
Sample Management
Bioinformatics
Mobile Data Collection
Reporting, Visualisation, Data Mining
Training & Support
Workflow
Reference data
■ Supplier details
■ Protocols
  ■ documenting set of editable default protocols
  ■ user interface design with Ed Daniel
■ Reagents
  ■ protocol-related reference samples
  ■ chemical hazard information
    ■ e.g. R and S-phrases
  ■ documenting lab chemicals as ‘MolComponents’
    ■ includes synonyms, formula, CAS-number and mass
    ■ naming system under discussion with NKI
    ■ ~400 identified, ~180 based on crystallisation screens
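A reference-data record for a lab chemical might look like the sketch below. The attribute names are assumptions based on the list above (synonyms, formula, CAS number, mass) and are not the actual MolComponent definition. The validity check uses the standard CAS Registry Number check digit: the digits before the check digit are weighted 1, 2, 3, ... from right to left and the sum modulo 10 must equal the check digit.

```python
from dataclasses import dataclass, field
from typing import List


def cas_is_valid(cas: str) -> bool:
    """Check the format and check digit of a CAS Registry Number."""
    parts = cas.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return False
    if not 2 <= len(parts[0]) <= 7 or len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    digits = parts[0] + parts[1]
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * w for w, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(parts[2])


@dataclass
class MolComponent:
    """Hypothetical record; field names are illustrative only."""
    name: str
    formula: str
    cas_number: str
    mass: float  # molecular mass in g/mol
    synonyms: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not cas_is_valid(self.cas_number):
            raise ValueError(f"invalid CAS number: {self.cas_number}")


salt = MolComponent(
    name="sodium chloride",
    formula="NaCl",
    cas_number="7647-14-5",
    mass=58.44,
    synonyms=["NaCl", "table salt"],
)
```

Validating the CAS number when the record is created keeps mistyped identifiers out of the reference data.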
Instrument management
■ Analytical Data: A Tower of Babel
■ Integration
  ■ CSols
    ■ produces a widely used Instrument Integration Package
    ■ if the PIMS I/O is implemented in a reasonable timescale CSols may develop a PIMS Driver
  ■ Kendro/Thermo
[Figure: example analytical traces from four instrument types: MS (Mass, m/z), NMR (Parts Per Million), IR (Wavenumber, cm-1) and LC (Minutes)]
What can PIMS do for you?
Not a lot right now
Whatever you want, eventually ...
... as long as it's data management for protein production
Version 0.2
■ October 2005
■ Then incremental delivery
  ■ … for one customer at a time and integrate with trunk
  ■ … and repeat until project complete
Protocol Editor
Applet Protocol Editor
■ Choose a step from a list
■ Draw Temperature step
■ List the protocol's steps already done and reload them from the bottom of the screen
■ Record the protocol in DB
■ Display the list of protocols from the DB in the explorer and reload any one of them
Applet Workflow
■ Select the experiment categories in the tabs
■ Drag and drop the selected experiments
■ Build a workflow or load an existing one
■ Associate a protocol with an experiment
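The data behind such a workflow editor can be pictured as an ordered list of experiments, each optionally linked to a protocol. The sketch below is an assumption about that structure, written in Python rather than as a Java applet; Protocol, WorkflowExperiment and Workflow are hypothetical names, and the example steps are invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Protocol:
    name: str
    steps: List[str]


@dataclass
class WorkflowExperiment:
    category: str                        # e.g. "Cloning", "Expression"
    protocol: Optional[Protocol] = None  # associated later by the user


@dataclass
class Workflow:
    name: str
    experiments: List[WorkflowExperiment] = field(default_factory=list)

    def add(self, category: str) -> WorkflowExperiment:
        """Append an experiment, as a drag-and-drop would do in the editor."""
        experiment = WorkflowExperiment(category)
        self.experiments.append(experiment)
        return experiment

    def missing_protocols(self) -> List[str]:
        """Categories of experiments that have no protocol yet."""
        return [e.category for e in self.experiments if e.protocol is None]


workflow = Workflow("standard production")
cloning = workflow.add("Cloning")
workflow.add("Expression")
cloning.protocol = Protocol("PCR cloning", ["Mix reagents", "Temperature step"])
print(workflow.missing_protocols())
```

A check like missing_protocols lets the interface flag experiments that still need a protocol before the workflow is saved.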
A collaborative framework
■ … to develop a family of LIMSes
■ Developers have difficulty in justifying the time required to create the software needed
■ The biologist doesn't want to wait
■ The result is a rapidly written LIMS that is fragile and cannot scale if the project grows
■ Need a generic LIMS
  ■ helps to solve these problems by giving developers a tool that can scale to meet the needs of a large project
  ■ and which welcomes plugins for novel methods
Conclusion
■ Each “Click” could be a lot of coding ...
■ What do molecular biologists really want?
  ■ Expectations are High!
■ Users make an indispensable contribution
  ■ Tell us when it's not good enough ...
  ■ ... we will respond
Acknowledgements
■ PIMS developer group
  ■ Chris Morris (CCP4)
  ■ Anne Pajon (EBI)
  ■ Ed Daniel (Daresbury)
  ■ Peter Troshin (MPSI)
  ■ Jo van Niekerk (SSPF)
  ■ Susy Griffiths (YSBL)
  ■ Jon Diprose (OPPF)
  ■ Katherine Pilicheva (OPPF)
  ■ Anne Poupon (IBBMC)
  ■ Eric Oeuillet (IBBMC)
  ■ Sabrina Haquin (IBBMC)
  ■ Alun Ashton (Diamond)
■ EBI-MSD
  ■ Kim Henrick
  ■ Wim Vranken
  ■ John Ionides
■ CCPN
  ■ Wayne Boucher
  ■ Rasmus Fogh
  ■ Tim Stevens
  ■ Dan