EBI is an Outstation of the European Molecular Biology Laboratory. ChEBI: The story so far.


Private Data Public Data


The state of affairs of bioinformatics in 2002

• Bioinformatics is booming

• Human Genome sequence rough draft published June 2000

• Free resources and free data


A different story for chemoinformatics

• Private data and private software


Too hard to solve… let's put our heads in the sand


Bioinformatics data too large to keep track of chemical compounds

• 100000 Protein entries in SwissProt (2002)

• 20 million entries in EMBL Database (2002)

• Small databases unable to keep track

• ENZYME resources ~ 3500 enzymatic reactions


New initiatives start up

• PubChem
  • Chemical repository, millions of entries, focus on screening assays

• ChEBI
  • Manually annotated database, nomenclature reference and compound database, tens of thousands of entries


Principles of foundation

• December 2002: email exchanges within the EBI address the issue of chemistry

• Three principles outlined

[Timeline graphic: 2002 to 2008]


“Nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution/availability to anyone.”


“Every data item in the database should be fully traceable and explicitly referenced to the original source/version.”


“Although the EBI will provide a web interface, the entirety of the data should be available to all without constraint as, for example, SQL table dumps, ASCII tables, and XML (e.g. DAML+OIL)”


We make a start using existing resources

• Integrate three resources
  • KEGG Compound
  • IntEnz
  • Chemical Ontology

• Annotation starts summer 2003
  • Focus on nomenclature



Our first release was modest but it was a start

• 21 July 2004

• 2783 annotated entities

• Data:
  • ChEBI Name, ChEBI Id
  • IUPAC Names, Synonyms
  • Formula
  • Cross-references


We introduce structures - Sep 2005

• Molfiles

• InChI (IUPAC International Chemical Identifier)

• SMILES (Simplified Molecular Input Line Entry System)

• Image (PNG)
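
To illustrate how the line notations differ from a Molfile, the sketch below holds one compound (ethanol) as a SMILES string and as an InChI string, and splits the InChI into its layers. The helper name `split_inchi` is hypothetical, and the parsing is deliberately simplified: it assumes the second segment is always the formula and that no layer prefix repeats, which does not hold for every InChI.

```python
def split_inchi(inchi: str) -> dict:
    """Split an InChI string into version, formula and prefixed layers.

    Simplified sketch: assumes a formula segment is present and that
    each one-letter layer prefix occurs at most once.
    """
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    parts = inchi[len(prefix):].split("/")
    layers = {"version": parts[0]}
    if len(parts) > 1:
        layers["formula"] = parts[1]
    for part in parts[2:]:
        # Remaining layers start with a one-letter prefix,
        # e.g. "c" for connectivity and "h" for hydrogens.
        layers[part[0]] = part[1:]
    return layers


# Ethanol in two line notations.
ethanol = {
    "smiles": "CCO",
    "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
}

layers = split_inchi(ethanol["inchi"])
for key, value in layers.items():
    print(f"{key}: {value}")
```

The SMILES string is compact and easy to type, while the InChI is layered so that formula, connectivity and hydrogens can be compared separately.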



Marvin in ChEBI


We start editing the chemical ontology – Dec 2005



Web Services - Oct 2006

• Programmatic access to a ChEBI entry
  • SOAP-based Java implementation

• Clients currently available in Java and Perl

• Four methods with which to access data
  • getLiteEntity
  • getCompleteEntity
  • getOntologyParents
  • getOntologyChildren
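
As a rough illustration of what a SOAP call to one of these methods could look like on the wire, the sketch below builds a SOAP 1.1 request envelope for `getCompleteEntity` using only the Python standard library. It does not contact any server. The service namespace `CHEBI_NS`, the parameter element name `chebiId`, and the helper `build_request` are assumptions made for illustration; the real values should be taken from the service's WSDL.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# ASSUMPTION: service namespace; check the WSDL for the real one.
CHEBI_NS = "https://www.ebi.ac.uk/webservices/chebi"


def build_request(method: str, chebi_id: str) -> bytes:
    """Build a SOAP request body for a single-argument method call."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{CHEBI_NS}}}{method}")
    # ASSUMPTION: the identifier is passed in an element named "chebiId".
    ET.SubElement(call, f"{{{CHEBI_NS}}}chebiId").text = chebi_id
    return ET.tostring(envelope, encoding="utf-8")


# Example identifier, used here only to show the shape of the request.
request = build_request("getCompleteEntity", "CHEBI:15377")
print(request.decode("utf-8"))
```

In practice a SOAP client library would generate this envelope from the WSDL; building it by hand only shows that each of the four methods is a named element wrapping its arguments.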



Automated Cross References – Aug 2007

Current Databases: UniProtKB, Reactome, BioModels, IntAct, SABIO-RK, PubChem and ArrayExpress




Chemical Structure Searching – May 2008


After all this, where are we?


Annotation is linear

[Figure: ChEBI Data Growth. Y axis: number of annotated entities (0 to 16000). X axis: releases in month and year, from 07/2004 (Rel 1) to 03/2008 (Rel 43).]


Diversity of users

Constant challenge of balancing our users' varied interests.


Our positives

• Nomenclature database

• Manually annotated data

• Attention to detail

• Free and accessible

• Loyal users


Our not so positives

• Size for some people

• Not well integrated into other bioinformatics resources

• Community interaction

• No software publicly available to manipulate the database


Involve the community

• Create a web-based submission tool
  • Users can easily submit their entities on a one-to-one basis
  • Also allowing bulk submission from other resources


Improvements to data depth

• Addition of more Xrefs: PDB, MACIE ???

• Addition of more chemical attributes? What chemical attributes?

• Text mining projects to extract relevant chemical information from patents and journals
  • European Patent Office


Going Open Source

• Commercial software packages will be replaced with Open Source

• Long term goal: allow people to create a free local installation of ChEBI

• Distribution of data in useful formats: CML, SDF


Acknowledgements

• ChEBI Team
  • Paula de Matos, Kirill Degtyarenko, Marcus Ennis, Janna Hastings, Christoph Steinbeck

• Alumni
  • Michael Darsow, Mickael Guedj, Alan McNaught, Martin Zbinden

• ChEBI supporters
  • Rolf Apweiler, Michael Ashburner, Henning Hermjakob, Janet Thornton

• IntEnz Team
  • Rafael Alcántara, Volker Ast, Kristian Axelsen, Anne Morgat

• EPO Collaborators
  • Hélène Courrier, Stephane Nauche, Jeremy Parsons

• Database supporters
  • ArrayExpress, IntAct, Reactome, SABIO-RK, RSC, GO, RESID etc…


Requirement for submitting data to ChEBI

• Disclaimer: this is only a summary of a chat I had with the ChEBI coordinator last night, so no promises!

• Information needed to submit a compound:
  • Structure
  • Name, synonyms
  • Registry
  • Database accession(s)
  • Mapping to ChEBI Ontology

• ChEBI currently quite busy with ongoing projects, but would consider taking submissions.
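
One way to prepare for such a submission is to collect the listed items in a single record and check for gaps before sending anything. The sketch below is only an illustration: the class `CompoundSubmission`, its field names, and the choice to treat synonyms as optional are all assumptions, not a format defined by ChEBI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompoundSubmission:
    """Hypothetical record holding the items listed on the slide."""
    structure: str = ""            # e.g. a Molfile or a line notation
    name: str = ""
    synonyms: List[str] = field(default_factory=list)
    registry_number: str = ""
    database_accessions: List[str] = field(default_factory=list)
    ontology_parent: str = ""      # mapping to the ChEBI Ontology

    def missing_fields(self) -> List[str]:
        """Return names of empty fields (synonyms assumed optional)."""
        required = {
            "structure": self.structure,
            "name": self.name,
            "registry_number": self.registry_number,
            "database_accessions": self.database_accessions,
            "ontology_parent": self.ontology_parent,
        }
        return [label for label, value in required.items() if not value]


draft = CompoundSubmission(structure="CCO", name="ethanol")
print(draft.missing_fields())
```

A check like this makes it easy to see, per compound, which pieces of information still have to be gathered before a bulk submission.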


What Could be done within APO-SYS

• From Pekka’s talk, I gathered that there are about 5,000 to 10,000 compounds in these siRNA libraries.

• Question: who else is dealing with compounds in APO-SYS?

• One could use the ChEBI web service with InChI to identify what is already in the database.

• ChEBI can do targeted curation, provided there is funding for the curation team.
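
The InChI-based check suggested above could be organised roughly as follows: each library compound is looked up by its InChI, and the library is split into compounds already present and compounds that would need submission. This is a minimal sketch. The function `partition_by_inchi` is hypothetical, and `stub_lookup` with its one-entry index merely stands in for whatever web service call is eventually used; it says nothing about what the real database contains.

```python
from typing import Callable, Dict, List, Optional, Tuple


def partition_by_inchi(
    library: Dict[str, str],
    lookup: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split a {name: InChI} library into known and unknown compounds.

    `lookup` returns a database identifier for an InChI, or None.
    """
    known: Dict[str, str] = {}
    unknown: List[str] = []
    for name, inchi in library.items():
        identifier = lookup(inchi)
        if identifier is None:
            unknown.append(name)
        else:
            known[name] = identifier
    return known, unknown


# ASSUMPTION: illustrative stand-in for a remote lookup by InChI.
_STUB_INDEX = {"InChI=1S/H2O/h1H2": "CHEBI:15377"}


def stub_lookup(inchi: str) -> Optional[str]:
    return _STUB_INDEX.get(inchi)


library = {
    "library compound A": "InChI=1S/H2O/h1H2",
    "library compound B": "InChI=1S/CH4/h1H4",
}
known, unknown = partition_by_inchi(library, stub_lookup)
print("already present:", known)
print("to submit:", unknown)
```

Passing the lookup in as a function keeps the partitioning logic independent of how the service is actually queried.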
