Capture, integration, and sharing of functional genomic data Steve Oliver Professor of Genomics School of Biological Sciences University of Manchester http://www.cogeme.man.ac.uk http://www.bioinf.man.ac.uk

Page 1:

Capture, integration, and sharing of functional genomic data

Steve Oliver

Professor of Genomics

School of Biological Sciences

University of Manchester

http://www.cogeme.man.ac.uk

http://www.bioinf.man.ac.uk

Page 2:

What are biologists interested in?

Complete organisms are much too complicated.

Only very well understood systems have well-defined pathways.

Many biologists focus on one or a small number of genes.

Page 3:

GENOME

TRANSCRIPTOME

PROTEOME

METABOLOME

Page 4:

The nature of proteomics experiment data

• Sample generation

– Origin of sample

• hypothesis, organism, environment, preparation, paper citations

• Sample processing

– Gels (1D/2D) and columns

• images, gel type and ranges, band/spot coordinates

• stationary and mobile phases, flow rate, temperature, fraction details

• Mass spectrometry

• machine type, ion source, voltages

• In silico analysis

• peak lists, database name + version, partial sequence, search parameters, search hits, accession numbers

Page 5:

A Systematic Approach to Modelling, Capturing and Disseminating Proteomics Experimental Data

http://pedro.man.ac.uk/

Page 6:

The PEDRo UML schema in reduced form

[Figure: UML class diagram. Classes shown: Experiment, SampleOrigin, Organism, TaggingProcess, OntologyEntry, AnalyteProcessingStep, Analyte, Gel, Gel1D, Gel2D, DiGEGel, GelItem, DiGEGelItem, Band, Spot, BoundaryPoint, RelatedGelItem, ChemicalTreatment, TreatedAnalyte, Column, GradientStep, MobilePhaseComponent, PercentX, Fraction, AssayDataPoint, OtherAnalyteProcessingStep, OtherAnalyte, MassSpecExperiment, MassSpecMachine, IonSource, MALDI, Electrospray, OtherIonisation, mzAnalysis, Quadrupole, Hexapole, CollisionCell, IonTrap, ToF, OthermzAnalysis, Detection, PeakList, Peak, ChromatogramPoint, Peak-SpecificChromatogramIntegration, ListProcessing, MSMSFraction, TandemSequenceData, DBSearch, DBSearchParameters, PeptideHit, ProteinHit, Protein.]

S

Page 7:

The Framework Around PEDRo

1. Lab-generated data is encoded using the PEDRo data entry tool, producing an XML (PEML) file for local storage or submission

2. Locally stored PEML files may be viewed in a web browser (with XSLT), allowing web pages to be quickly generated from datasets

3. Upon receipt of a PEML file at the repository site, a validation tool checks the file before entering it into the database

4. The repository (a relational database) holds submitted data, allowing various analyses to be performed, or data to be extracted as a PEML file or another format
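
The validation step (3) can be sketched roughly as follows. This is only an illustration: the element names checked here are borrowed from class names on the schema slide, and treating them as required XML elements is an assumption, not the actual PEML validation tool.

```python
import xml.etree.ElementTree as ET

# Assumed for illustration only: the real PEML schema defines its own
# element names and rules.
REQUIRED_ELEMENTS = ("SampleOrigin", "MassSpecExperiment", "PeakList")


def validate_peml(xml_text):
    """Return a list of problems; an empty list means the file looks loadable."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed XML: {err}"]
    problems = []
    for tag in REQUIRED_ELEMENTS:
        # ".//tag" searches all descendants of the root element.
        if root.find(f".//{tag}") is None:
            problems.append(f"missing element: {tag}")
    return problems


good = ("<Experiment><SampleOrigin/>"
        "<MassSpecExperiment><PeakList/></MassSpecExperiment></Experiment>")
bad = "<Experiment><SampleOrigin/></Experiment>"

print(validate_peml(good))
print(validate_peml(bad))
```

A file that fails the check would be returned to the submitter rather than entered into the database.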

Page 8:

INTEGRATION

Page 9:

Why integrate data?

“These 200 genes are up-regulated in my experiment. Are any of their protein products known to interact?”

• Data is stored at a variety of sites and in a variety of formats.

• Databases are designed mainly for browsing (MIPS, SGD, BIND, SCPD, KEGG).

• Need databases that allow complex queries.

• Need to be easily usable by biologists.
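
Once expression and interaction data sit in one place, the example question above becomes a simple join. A minimal sketch, using placeholder identifiers rather than real genes or results:

```python
# Placeholder identifiers for illustration; not real experimental data.
up_regulated = {"geneA", "geneB", "geneC", "geneD"}
known_interactions = [("geneA", "geneC"), ("geneB", "geneX"), ("geneD", "geneY")]


def interacting_pairs(genes, interactions):
    """Return the known interactions in which both partners are in `genes`."""
    return sorted(
        tuple(sorted(pair))
        for pair in interactions
        if pair[0] in genes and pair[1] in genes
    )


print(interacting_pairs(up_regulated, known_interactions))
```

With browse-only databases, the same answer would need one manual lookup per gene.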

Page 10:

Genome Information Management System (GIMS)

Paton NW, Khan SA, Hayes A, Moussouni F, Brass A, Eilbeck K, Goble CA, Hubbard SJ, Oliver SG (2000) Conceptual modelling of genomic information. Bioinformatics 16, 548-557.

Page 11:

GIMS

• Integrates genomic and functional data.

• Consists of two parts:

–GIMS Database

–GIMS User Interface

Page 12:

GIMS data warehouse

[Figure: architecture diagram. Labels: SGD, MIPS, maxD; GIMS Database; Analysis Library; Canned Queries; Browser.]

Page 13:

Database implementation

• Uses the object database FastObjects.

• All database classes and analysis programs are written in Java.

• Allows close integration of the programming language with the database.

• Allows fast access to database data from application programs.

• Allows data to be stored in a way that reflects the underlying mechanisms in the organism.

• Very flexible and extensible.

Page 14:
Page 15:

GIMS Contents

Data type | Data source

DNA sequences, chromosome locations of coding regions, e.g. ORFs, tRNAs, centromeres, telomeres etc. | MIPS

Predicted protein sequences, pI, mol weight, number of transmembrane regions. | MIPS

Protein attributes (e.g. cellular location, function, protein class, Prosite motifs, phenotype). | MIPS

Protein interaction data (affinity purification, yeast two-hybrid, genetic interactions). | Ho et al. (2002), Gavin et al. (2002), MIPS, Uetz et al. (2000), Ito et al. (2001)

Page 16:

GIMS Contents

Data type | Data source

Metabolic data (reactions, compounds and enzymes). | L-compound, L-enzyme

Transcription factors | SCPD

Transcriptome data | Stanford Microarray Database, University of Manchester (BBSRC COGEME Project)

Ontology data | GO

Sequence similarity | SGD

Page 17:

GIMS User Interface

• Java application.

• Can download from http://img.cs.man.ac.uk/gims

• Communicates with database via RMI.

• On start-up, application is sent information about database classes and canned queries.

• Very flexible.

• Allows user to browse database, ask canned queries, and store and combine data sets.

• Can save results as txt, html or xml.

Page 18:
Page 19:

Selecting Canned Queries

Query categories.

Queries in selected category

Initially empty store.

Page 20:

Parameterising a Query

Previously selected query

Parameters for specific run – selects down-regulated genes in the nucleus

Page 21:

Viewing the Results

Result collection

Operations on collections

Page 22:

Selecting a Second Query

Page 23:

Setting Its Parameters

Parameters for specific run – selects down-regulated genes in the same experiment that are transcription factors

Page 24:

Obtaining Its Results

Page 25:

Inter-relating Results

Collections selected for operating on

Remove one result from the other

Page 26:

Result of Difference
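
The walkthrough on the preceding slides amounts to a set difference between two stored result collections. A minimal sketch in Python (GIMS itself is a Java application; the gene identifiers below are placeholders, not real query results):

```python
# Result of the first canned query: down-regulated genes located in the nucleus.
nuclear_down = {"geneA", "geneB", "geneC"}

# Result of the second canned query: down-regulated genes in the same
# experiment that are transcription factors.
tf_down = {"geneB", "geneD"}

# "Remove one result from the other": nuclear down-regulated genes
# that are not transcription factors.
difference = nuclear_down - tf_down

print(sorted(difference))
```

Other operations on stored collections, such as union (`|`) and intersection (`&`), follow the same pattern.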

Page 27:

GIMS empowers the biologist

Page 28:

Resources at the centre

Provenance record on how the data was produced

Workflows that could be used to generate this data

People who have registered an interest in this data

Ontologies describing data

Services that can use or produce this data

Annotations

Data holdings

Literature relevant

Related Data

Page 29:

Biologists at the centre

Provenance record of workflow runs they have made

People

Ontologies

Preferences for Services

Notes

Data holdings

Literature

Workflows they wrote or used

People they collaborate with

Page 30:

myGrid

• EPSRC UK e-Science pilot project.

• Open Source Upper Middleware for Bioinformatics.

• (Web) Service-based architecture -> Grid services.

• 42 months, 24 months in.

• Prototype v1 Release Sept 2004; some services available now.

www.mygrid.org.uk

Page 31:

Workflows are in silico experiments

Annotation Pipeline

What is known about my candidate gene?

Medline

OMIM

GO

BLAST

EMBL

DQP

Query

Page 32:

Application: Workbench demonstrator

The myGrid service components are used in a demonstration application called the “myGrid WorkBench”, which provides a common point of use for the services.

We can select data from the myGrid Information repository (mIR), select a workflow based on its semantic description, and examine the results.

Page 33:

e-Science: Provenance

As in a bench experiment, myGrid records the materials and methods it has used for an in silico experiment in a provenance log.

This is the where, what, when and how of the experiment run.

Derivation paths ~ workflows, queries

Annotations ~ notes

Evolution paths ~ workflow -> workflow
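
One way to picture a provenance log entry is as a record of what ran, where, when and on which inputs, plus a link to the workflow it evolved from. The sketch below is hypothetical: the field names, workflow names, host and dates are invented for illustration and are not myGrid's actual log format.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ProvenanceRecord:
    workflow: str                        # what was run
    service: str                         # where it ran
    inputs: List[str]                    # materials used
    started: datetime                    # when it ran
    evolved_from: Optional[str] = None   # earlier workflow this one was adapted from
    notes: List[str] = field(default_factory=list)  # annotations


def evolution_path(log, workflow):
    """Follow evolved_from links from `workflow` back to the original workflow."""
    by_name = {record.workflow: record for record in log}
    path = [workflow]
    while by_name[path[-1]].evolved_from is not None:
        path.append(by_name[path[-1]].evolved_from)
    return path


# Placeholder entries; every value here is invented.
log = [
    ProvenanceRecord("annotate_v1", "example-host", ["candidate_gene.fasta"],
                     datetime(2003, 1, 1, tzinfo=timezone.utc)),
    ProvenanceRecord("annotate_v2", "example-host", ["candidate_gene.fasta"],
                     datetime(2003, 1, 2, tzinfo=timezone.utc),
                     evolved_from="annotate_v1", notes=["added a GO lookup"]),
]

print(evolution_path(log, "annotate_v2"))
```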

Page 34:

e-Science: Notification

A notification service can inform the mIR and the user (proxy) that data, workflows, services, etc. have changed and thus prompt actions over data in the mIR.

Notifications are presented to the user with a client in the workbench environment.

User registers interest in notification topics
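
Registering interest in topics is a publish/subscribe pattern. A minimal sketch of the idea, with hypothetical class, method and topic names that are not the myGrid notification service's API:

```python
from collections import defaultdict


class NotificationService:
    """Toy topic-based notifier: callbacks are registered per topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def register(self, topic, callback):
        """Register interest in a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to every subscriber of the topic; return the count."""
        callbacks = self._subscribers[topic]
        for callback in callbacks:
            callback(topic, message)
        return len(callbacks)


inbox = []  # stands in for the notification client in the workbench
service = NotificationService()
service.register("workflow-changed", lambda topic, msg: inbox.append((topic, msg)))

delivered = service.publish("workflow-changed", "annotation pipeline updated")
ignored = service.publish("data-changed", "new entry")  # nobody registered

print(inbox)
```

Only topics the user has registered for reach the inbox; everything else is dropped.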

Page 35:

The myGrid Team

Matthew Addis, Nedim Alpdemir, Rich Cawley, Vijay Dialani, Alvaro Fernandes, Justin Ferris, Rob Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Claire Jennings, Ananth Krishna, Xiaojian Liu, Darren Marvin, Karon Mee, Simon Miles, Luc Moreau, Juri Papay, Norman Paton, Simon Pearce, Steve Pettifer, Milena Radenkovic, Peter Rice, Angus Roberts, Alan Robinson, Martin Senger, Nick Sharman, Paul Watson, Anil Wipat and Chris Wroe.

Page 36:

Need GRID to empower the biologist