EBI is an Outstation of the European Molecular Biology Laboratory. ChEBI: The story so far.



Private Data Public Data


The state of affairs of bioinformatics in 2002

• Bioinformatics is booming

• Rough draft of the human genome sequence announced June 2000

• Free resources and free data


A different story for chemoinformatics

• Private data and private software


Too hard to solve… let's put our heads in the sand


Bioinformatics data too large to keep track of chemical compounds

• 100,000 protein entries in Swiss-Prot (2002)

• 20 million entries in EMBL Database (2002)

• Small databases unable to keep track

• ENZYME resource: ~3500 enzymatic reactions


New initiatives start up

• PubChem: chemical repository, millions of entries, focus on screening assays

• ChEBI: manually annotated database, nomenclature reference and compound database, tens of thousands of entries


Principles of foundation

• December 2002 email exchanges within the EBI to address the issue of chemistry

• Three principles outlined


“Nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution/availability to anyone.”


“Every data item in the database should be fully traceable and explicitly referenced to the original source/version.”


“Although the EBI will provide a web interface, the entirety of the data should be available to all without constraint as, for example, SQL table dumps, ASCII tables, and XML (e.g. DAML+OIL)”


We make a start using existing resources

• Integrate three resources: KEGG Compound, IntEnz, Chemical Ontology

• Annotation starts summer 2003: focus on nomenclature


Our first release was modest but it was a start

• 21 July 2004

• 2783 annotated entities

• Data: ChEBI Name, ChEBI Id, IUPAC Names, Synonyms, Formula, Cross-references


We introduce structures - Sep 2005

• Molfiles

• InChI (IUPAC International Chemical Identifier)

• SMILES (Simplified Molecular Input Line Entry System)

• Image (PNG)


Marvin in ChEBI


We start editing the chemical ontology – Dec 2005


Web Services - Oct 2006

• Programmatic access to a ChEBI entry: SOAP-based Java implementation

• Clients currently available in Java and Perl

• Four methods with which to access data: getLiteEntity, getCompleteEntity, getOntologyParents, getOntologyChildren
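
As a rough illustration of what a SOAP call to one of these methods might look like on the wire, the sketch below builds a SOAP 1.1 request envelope in Python using only the standard library. The service namespace and the `chebiId` parameter name are assumptions that do not come from the slides and should be checked against the service's WSDL. `getLiteEntity` is left out because the slides do not say what parameters it takes. The helper `build_request` is hypothetical and sends nothing over the network.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# ASSUMPTION: target namespace of the ChEBI web service; verify against the WSDL.
CHEBI_NS = "https://www.ebi.ac.uk/webservices/chebi"

# Methods named on the slide that are assumed here to take a single ChEBI id.
ID_METHODS = ("getCompleteEntity", "getOntologyParents", "getOntologyChildren")


def build_request(method: str, chebi_id: str) -> str:
    """Return a SOAP envelope (as text) asking `method` about `chebi_id`."""
    if method not in ID_METHODS:
        raise ValueError(f"unsupported method: {method}")
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{CHEBI_NS}}}{method}")
    # ASSUMPTION: the identifier is passed in an element called "chebiId".
    ET.SubElement(call, f"{{{CHEBI_NS}}}chebiId").text = chebi_id
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # CHEBI:15377 is the ChEBI identifier for water.
    print(build_request("getCompleteEntity", "CHEBI:15377"))
```

The resulting text would be posted to the service endpoint by a SOAP client; the Java and Perl clients mentioned above hide this step from the user.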


Automated Cross References – Aug 2007

Current databases: UniProtKB, Reactome, BioModels, IntAct, SABIO-RK, PubChem and ArrayExpress


Chemical Structure Searching – May 2008


After all this, where are we?


Annotation is linear

[Figure: ChEBI Data Growth. x-axis: releases by month and year, from 07/2004 (Rel 1) to 03/2008 (Rel 43); y-axis: number of annotated entities, 0 to 16000.]


Diversity of users

Constant challenge of balancing our users' varied interests.


Our positives

• Nomenclature database

• Manually annotated data

• Attention to detail

• Free and accessible

• Loyal users


Our not so positives

• Size (too small for some people)

• Not well integrated into other bioinformatics resources

• Community interaction

• No software publicly available to manipulate the database


Involve the community

• Create a web-based submission tool

• Users can easily submit their entities one at a time

• Bulk submission from other resources is also allowed


Improvements to data depth

• Addition of more cross-references (Xrefs): PDB, MACiE?

• Addition of more chemical attributes? What chemical attributes?

• Text mining projects to extract relevant chemical information from patents and journals (European Patent Office)


Going Open Source

• Commercial software packages will be replaced with Open Source

• Long term goal: allow people to create a free local installation of ChEBI

• Distribution of data in useful formats: CML, SDF


Acknowledgements

• ChEBI Team: Paula de Matos, Kirill Degtyarenko, Marcus Ennis, Janna Hastings, Christoph Steinbeck

• Alumni: Michael Darsow, Mickael Guedj, Alan McNaught, Martin Zbinden

• ChEBI supporters: Rolf Apweiler, Michael Ashburner, Henning Hermjakob, Janet Thornton

• IntEnz Team: Rafael Alcántara, Volker Ast, Kristian Axelsen, Anne Morgat

• EPO Collaborators: Hélène Courrier, Stephane Nauche, Jeremy Parsons

• Database supporters: ArrayExpress, IntAct, Reactome, SABIO-RK, RSC, GO, RESID etc.


Requirements for submitting data to ChEBI

• Disclaimer: this is only a summary of a chat I had with the ChEBI coordinator last night, so no promises!

• Information needed to submit a compound: structure; name and synonyms; registry; database accession(s); mapping to the ChEBI Ontology

• ChEBI is currently quite busy with ongoing projects, but would consider taking submissions.
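
To make the list of required information concrete, here is a minimal Python sketch of a submission record together with a completeness check. The class `Submission`, its field names, and the helper `missing_fields` are all hypothetical; they are not a ChEBI submission format. "Registry" is read here as a registry number such as a CAS number, which is also an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field, fields


@dataclass
class Submission:
    """Hypothetical record holding the items listed on the slide."""
    structure: str = ""                      # e.g. a Molfile, InChI or SMILES string
    name: str = ""
    synonyms: list[str] = field(default_factory=list)
    registry_numbers: list[str] = field(default_factory=list)   # assumed: CAS-style numbers
    database_accessions: list[str] = field(default_factory=list)
    ontology_parents: list[str] = field(default_factory=list)   # mapping to the ChEBI Ontology


def missing_fields(sub: Submission) -> list[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(sub) if not getattr(sub, f.name)]


if __name__ == "__main__":
    water = Submission(
        structure="O",                       # SMILES for water
        name="water",
        synonyms=["oxidane"],
        registry_numbers=["7732-18-5"],      # CAS number of water
        database_accessions=["KEGG:C00001"],
        # ontology_parents deliberately left empty
    )
    print(missing_fields(water))
```

A check like this could be run over a whole compound collection before contacting the curators, so that incomplete records are found early.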


What could be done within APO-SYS

• From Pekka’s talk, I gathered that there are about 5,000 to 10,000 compounds in these siRNA libraries.

• Question: who else is dealing with compounds in APO-SYS?

• One could query the ChEBI web service by InChI to identify what is already in the database.

• ChEBI can do targeted curation, provided there is funding for the curation team.
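
The InChI-based check suggested above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `known` dictionary stands in for whatever the ChEBI web service or a downloaded data dump would return, and the function `split_by_inchi` is hypothetical. Matching is done on exact InChI strings after trimming whitespace.

```python
from __future__ import annotations


def split_by_inchi(
    compounds: dict[str, str],
    known: dict[str, str],
) -> tuple[dict[str, str], list[str]]:
    """Split local compounds into those already known and those that are new.

    compounds: local compound name -> InChI string
    known:     InChI string -> ChEBI identifier (stand-in for a service lookup)
    Returns (name -> ChEBI identifier for matches, names with no match).
    """
    found: dict[str, str] = {}
    missing: list[str] = []
    for name, inchi in compounds.items():
        chebi_id = known.get(inchi.strip())
        if chebi_id is not None:
            found[name] = chebi_id
        else:
            missing.append(name)
    return found, missing


if __name__ == "__main__":
    # Stand-in for database content; a real run would query ChEBI instead.
    known = {
        "InChI=1S/H2O/h1H2": "CHEBI:15377",                  # water
        "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3": "CHEBI:16236",  # ethanol
    }
    library = {
        "water": "InChI=1S/H2O/h1H2",
        "ethanol": " InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3 ",
        "methane": "InChI=1S/CH4/h1H4",  # absent from the stand-in table
    }
    found, missing = split_by_inchi(library, known)
    print(found)
    print(missing)
```

Compounds that end up in the second list would be the candidates for submission or targeted curation.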