PIMS data management and harvesting. General Introduction Design a LIMS Protein Production Data...

PIMS data management and harvesting

General Introduction

Design a LIMS

Protein Production Data Model

What can PIMS do for you?

Information Management System

■ An Information Management System (IMS) is a combined database and information management system

■ A database management system (DBMS) is a system, usually automated and computerized, for the management of any collection of compatible, and ideally normalized, data

■ Information management is the handling of knowledge acquired from many disparate sources in a way that optimizes access by all who have a share in that knowledge

Scientific goals

■ Recording laboratory information
  ■ A lot of data keeping
  ■ 10,000s of experiments
  ■ 1,000,000s of samples
■ Data interchange and interoperation
  ■ Collaboration in protein production
  ■ Share data between stages and sites
  ■ Data transfer to beamline or NMR ops
■ Data mining and reporting
  ■ Analysis
  ■ Negative results can be mined to improve methods
■ Scientific publications
  ■ Data deposition

PIMS

■ Protein Information Management System
■ Started in January 2005
■ 5-year UK project, funded by the Biotechnology and Biological Sciences Research Council (BBSRC)
■ Based on the Protein Production Data Model paper
  ■ Proteins. 2005 Feb 1;58(2):278-84. “Design of a data model for developing laboratory information management and analysis systems for protein production.”

Scope of PIMS

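[Figure: pipeline stages within the scope of PIMS: target selection, target optimisation, cloning, expression, purification & concentration, crystallisation, microcrystals, data collection, phasing, model building, refinement. The stages are labelled by area (Bioinformatics, Molecular Biology, Crystallography), with data import and export.]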

Stakeholders

■ BBSRC SPoRT funding
■ Scottish Structural Proteomics Facility (SSPF)
  ■ Universities of Dundee, St. Andrews, Glasgow and Warwick.
■ Membrane Protein Structure Initiative (MPSI)
  ■ Universities of Glasgow, Leeds, Oxford, Sheffield, Imperial College, Birkbeck College, UMIST and CCLRC Daresbury.
■ Protein Information Management System (PIMS)
■ CCP4, Diamond
■ Oxford Protein Production Facility
■ IBBMC, University Paris Sud
■ European Bioinformatics Institute
■ York Structural Biology Laboratory
■ Daresbury Laboratory
■ Other UK protein scientists
■ Other protein scientists worldwide

[Figure labels: BBSRC funding, SSPF, MPSI, PIMS]

Collaborations

■ Seamless data transfer and a consistent UI ...
  ■ ... from target to structure deposition
  ■ ... so far as possible
■ Bioinformatics: SSPF pipeline, EBI workflow
■ Crystallization: NKI, EMBL Hamburg & Grenoble (BIOXHIT)
■ Data transfer: e-HTPX
■ Data collection: DNA, X-track
■ Structure solution: CCP4, CCPN
■ Instruments: Kendro, CSols

General Introduction

Design a LIMS

Protein Production Data Model

What can PIMS do for you?

Design

■ The data model
  ■ focuses on what data should be stored
  ■ is used to design the entities (classes or tables) that we are dealing with, their various attributes, and their relationships
■ The goal of the data model is to make sure that all the required data objects are completely and accurately represented
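
To make "entities, attributes and relationships" concrete, here is a minimal sketch in Python. The class and field names (Target, Sample, Experiment and their attributes) are hypothetical illustrations and are not taken from the actual PIMS data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Target:
    """Entity: the protein under investigation (hypothetical attributes)."""
    name: str
    sequence: str


@dataclass
class Sample:
    """Entity: a physical sample derived from a target."""
    label: str
    target: Target  # relationship: each sample refers to one target


@dataclass
class Experiment:
    """Entity: one experiment, related to its input and output samples."""
    kind: str
    inputs: List[Sample] = field(default_factory=list)
    outputs: List[Sample] = field(default_factory=list)


# Example data: one expression experiment turning a plasmid into protein.
target = Target(name="T0001", sequence="MKV")
plasmid = Sample(label="T0001-plasmid", target=target)
protein = Sample(label="T0001-protein", target=target)
expression = Experiment(kind="expression", inputs=[plasmid], outputs=[protein])
```

Following the relationships from `expression.outputs[0].target` leads back to the original target, which is the kind of traceability a data model is meant to guarantee.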

Reliability

■ Loss of data is inexcusable
■ Must be able to correct wrong data
■ Must keep audit trails
■ Must allow future changes
■ All made feasible by
  ■ Data model
  ■ Database
  ■ Software engineering standards
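
The requirements to correct wrong data while keeping an audit trail can be sketched as an append-only change log. This is an illustrative assumption about one possible approach, not a description of how PIMS implements it; `AuditedStore`, `AuditEntry` and the field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class AuditEntry:
    """One immutable line of the audit trail."""
    record_id: str
    field_name: str
    old_value: Optional[str]
    new_value: str
    user: str


class AuditedStore:
    """Hypothetical in-memory store: values can be corrected, history is kept."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, str], str] = {}
        self._trail: List[AuditEntry] = []

    def set(self, record_id: str, field_name: str, value: str, user: str) -> None:
        key = (record_id, field_name)
        old = self._values.get(key)
        # Log the change before applying it, so no correction goes unrecorded.
        self._trail.append(AuditEntry(record_id, field_name, old, value, user))
        self._values[key] = value

    def get(self, record_id: str, field_name: str) -> str:
        return self._values[(record_id, field_name)]

    def history(self, record_id: str) -> List[AuditEntry]:
        return [entry for entry in self._trail if entry.record_id == record_id]


store = AuditedStore()
store.set("sample-1", "concentration_mg_ml", "1.0", user="alice")
store.set("sample-1", "concentration_mg_ml", "10.0", user="bob")  # correction
```

The current value reflects the correction, while `store.history("sample-1")` still shows who entered the earlier value.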

Ancestry

■ HalX: an open-source LIMS (Laboratory Information Management System) for small- to large-scale laboratories.
  ■ Acta Crystallogr D Biol Crystallogr. 2005 Jun;61(Pt 6):671-8.
  ■ Prilusky J, Oueillet E, Ulryck N, Pajon A, Bernauer J, Krimm I, Quevillon-Cheruel S, Leulliot N, Graille M, Liger D, Tresaugues L, Sussman JL, Janin J, van Tilbeurgh H, Poupon A.
■ OPPF based on Nautilus
■ MOLE: a data management application based on a protein production data model.
  ■ Proteins. 2005 Feb 1;58(2):285-9.
  ■ Morris C, Wood P, Griffiths SL, Wilson KS, Ashton AW.

PIMS

■ The aim is to provide a Laboratory Information Management System (LIMS)
  ■ for laboratories that produce proteins from target genes
  ■ that can be incorporated into commercial software in the area of biotech and protein production
■ Improve the quality of the experimental data deposited into the PDB
  ■ by providing software for lab scientists to harvest their daily experimental data, from protein production to structure
■ My roles
  ■ Data Model
  ■ Database / Persistence layer / Java API
  ■ Java Applet development

General Introduction

Design a LIMS

Protein Production Data Model

What can PIMS do for you?

Why is Data Modelling Important?

■ A Data Model is a plan for building a database
  ■ detailed enough to be used to create the physical structure
  ■ simple enough to communicate the data structure to the end user

■ The Unified Modelling Language (UML)

Data Model

■ Related to protein production & crystallisation

■ Suitable for large & small facilities

■ Required to reproduce the samples & experiments involved

■ Used for tracking samples, experiments & results

■ Developed to help software developers to collect, store and exchange information through the provision of a common platform

Area covered

■ Protein production work is generally the investigation of a particular protein, the Target

■ The work often aims to produce a derivative of the Target, such as a single domain or a complex

[Figure labels: target, protein production, crystallisation, X-ray, phasing, structure, NMR tube, NMR]

The Core Data Model

Change Control Board

■ The data model is a work in progress
  ■ The science is developing too
  ■ Local protocols, which are novel and confidential
  ■ Not easy work
■ Thanks to…
  ■ Geoff Barton (Dundee)
  ■ Steve Prince (Manchester)
  ■ Anne Poupon (IBBMC)
  ■ Jon Diprose (OPPF)
  ■ Alun Ashton (Diamond)
  ■ Rasmus Fogh (CCPN)

Generation machinery

■ Implemented in UML (Object Domain)

■ Developed within a framework provided by the CCPN project

■ Information stored in the UML Data Model is used to automatically generate
  ■ SQL schema,
  ■ Java Application Program Interfaces (APIs) and
  ■ Documentation

[Figure: the UML data model feeds the framework (www.ccpn.ac.uk), which produces the Java API, Python API, documentation, SQL schema and XML schema]
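
The idea of generating several artefacts from a single model can be sketched in a few lines of Python. This is only an illustration of the principle: the `MODEL` dictionary is a hypothetical stand-in for the UML model, and the generator functions are not the CCPN framework's actual machinery.

```python
import sqlite3

# Hypothetical, simplified model description standing in for the UML model.
MODEL = {
    "target": {"name": "TEXT NOT NULL", "sequence": "TEXT"},
    "sample": {"label": "TEXT NOT NULL", "target_id": "INTEGER REFERENCES target(id)"},
}


def generate_sql(model):
    """Emit one CREATE TABLE statement per entity in the model."""
    statements = []
    for table, columns in model.items():
        cols = ["id INTEGER PRIMARY KEY"]
        cols += [f"{name} {sql_type}" for name, sql_type in columns.items()]
        statements.append(f"CREATE TABLE {table} (\n  " + ",\n  ".join(cols) + "\n);")
    return statements


def generate_doc(model):
    """Emit a plain-text summary of the same model."""
    return "\n".join(f"{table}: {', '.join(columns)}" for table, columns in model.items())


ddl = generate_sql(MODEL)
connection = sqlite3.connect(":memory:")
for statement in ddl:
    connection.execute(statement)  # check that the generated schema is valid SQL
```

Because the schema and the documentation come from the same source, a change to the model propagates to both without manual editing.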

Architecture

■ The API provides methods to access the underlying DB to store and retrieve data
  ■ This allows applications to manipulate data without detailed knowledge of the way in which the data is stored
■ Various applications make use of the API
  ■ LIMS
  ■ Any high-throughput applications (non-GUI)
■ They are able to exchange data easily

[Figure labels: Tools (GUI, standalone applications, …), API, Java API, persistence layer, storage, DB, SQL schema]
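
The layering described on this slide can be illustrated with a small Python sketch in which callers use an API and never touch SQL directly. `SampleApi`, its methods and the table layout are hypothetical and are not the PIMS Java API; SQLite is used here only because it ships with Python.

```python
import sqlite3


class SampleApi:
    """Hypothetical API: callers never see SQL or the table layout."""

    def __init__(self, connection):
        self._db = connection
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS sample "
            "(id INTEGER PRIMARY KEY, label TEXT, stage TEXT)"
        )

    def add_sample(self, label, stage):
        cursor = self._db.execute(
            "INSERT INTO sample (label, stage) VALUES (?, ?)", (label, stage)
        )
        self._db.commit()
        return cursor.lastrowid

    def find_by_stage(self, stage):
        rows = self._db.execute(
            "SELECT label FROM sample WHERE stage = ? ORDER BY id", (stage,)
        )
        return [label for (label,) in rows]


# Two kinds of "application" sharing one API and one database.
api = SampleApi(sqlite3.connect(":memory:"))
api.add_sample("T0001-plasmid", "cloning")        # e.g. entered through a GUI
api.add_sample("T0001-protein", "purification")   # e.g. written by a non-GUI script
api.add_sample("T0002-protein", "purification")
```

If the storage layout changes, only `SampleApi` needs updating; the applications calling `add_sample` and `find_by_stage` are unaffected.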

From data model to application

■ Data Model
■ Use cases
  ■ Scientific logic into requirements
■ Specifications
  ■ security, performance, usability, etc.
■ Java API
■ Test data
■ UI Design
■ Application

Modular Construction

■ http://www.pims-lims.org/project/use-case-suite.html

[Figure: PIMS modules]

System Administration
Setup & Configuration
Access Rights Management
Project Management
Reference Data
Instrument Management
Scheduling Data Capture
Inventory Management
Sample Management
Bioinformatics
Mobile Data Collection
Reporting, Visualisation, Data Mining
Training & Support
Workflow

Reference data

■ Supplier details
■ Protocols
  ■ documenting set of editable default protocols
  ■ user interface design with Ed Daniel
■ Reagents
  ■ protocol-related reference samples
  ■ chemical hazard information
    ■ e.g. R and S-phrases
  ■ documenting lab chemicals as ‘MolComponents’
    ■ includes synonyms, formula, CAS-number and mass
    ■ naming system under discussion with NKI
    ■ ~400 identified, ~180 based on crystallisation screens
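
A reference-data record with synonyms, formula, CAS number and mass could look like the following Python sketch. The class layout is an assumption made for illustration and is not the PIMS MolComponent definition; the validation uses the standard check-digit rule for CAS Registry Numbers.

```python
from dataclasses import dataclass, field
from typing import List


def cas_is_valid(cas_number: str) -> bool:
    """Check the final check digit of a CAS Registry Number."""
    parts = cas_number.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return False
    if len(parts[1]) != 2 or len(parts[2]) != 1:
        return False
    digits = parts[0] + parts[1]
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(parts[2])


@dataclass
class MolComponent:
    """Hypothetical record for one lab chemical."""
    name: str
    formula: str
    cas_number: str
    mass: float  # molecular mass in g/mol
    synonyms: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject records whose CAS number fails the check-digit test.
        if not cas_is_valid(self.cas_number):
            raise ValueError(f"invalid CAS number: {self.cas_number}")


water = MolComponent("water", "H2O", "7732-18-5", 18.015, synonyms=["aqua"])
```

Validating identifiers when reference data is entered keeps typing errors out of the shared chemical list.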

Instrument management

■ Analytical Data: A Tower of Babel

■ Integration
  ■ CSols
    ■ produces a widely used Instrument Integration Package
    ■ if the PIMS I/O is implemented in a reasonable timescale CSols may develop a PIMS Driver
  ■ Kendro/Thermo

[Figure: example instrument outputs: MS (mass, m/z), NMR (parts per million), IR (wavenumber, cm-1), LC (minutes)]

General Introduction

Design a LIMS

Protein Production Data Model

What can PIMS do for you?

What can PIMS do for you?

Not a lot right now

Whatever you want, eventually ...

... as long as it's data management for protein production

Version 0.2

■ October 2005
■ Then incremental delivery
  ■ … for one customer at a time and integrate with trunk
  ■ … and repeat until project complete

Protocol Editor

Applet Protocol Editor

■ Choose a step from a list
■ Draw a temperature step
■ List the protocol steps already defined and reload them from the bottom of the screen
■ Record the protocol in the DB
■ Display the list of protocols from the DB in the explorer and reload any one of them

Applet Workflow

■ Select the experiment categories from the tabs
■ Drag and drop the selected experiments
■ Build a workflow or load an existing one
■ Associate a protocol with an experiment

A collaborative framework

■ … to develop a family of LIMSes

■ Developers have difficulty in justifying the time required to create the software needed

■ The biologist doesn't want to wait
■ The result is a rapidly written LIMS that is fragile and cannot scale if the project grows
■ Need a generic LIMS
  ■ one that helps to solve these problems by giving developers a tool that can scale to meet the needs of a large project
  ■ and that welcomes plugins for novel methods

Conclusion

■ Each “Click” could be a lot of coding ...

■ What do molecular biologists really want?
  ■ Expectations are high!
■ Users make an indispensable contribution
  ■ Tell us when it's not good enough ...
  ■ ... we will respond

Acknowledgements

■ PIMS developer group

■ Chris Morris (CCP4)

■ Anne Pajon (EBI)

■ Ed Daniel (Daresbury)

■ Peter Troshin (MPSI)

■ Jo van Niekerk (SSPF)

■ Susy Griffiths (YSBL)

■ Jon Diprose (OPPF)

■ Katherine Pilicheva (OPPF)

■ Anne Poupon (IBBMC)

■ Eric Oeuillet (IBBMC)

■ Sabrina Haquin (IBBMC)

■ Alun Ashton (Diamond)

■ EBI-MSD
  ■ Kim Henrick
  ■ Wim Vranken
  ■ John Ionides
■ CCPN
  ■ Wayne Boucher
  ■ Rasmus Fogh
  ■ Tim Stevens
  ■ Dan