EBI is an Outstation of the European Molecular Biology Laboratory. ChEBI: The story so far.
Private Data Public Data
The state of affairs of bioinformatics in 2002
• Bioinformatics is booming
• Human Genome sequence rough draft published June 2000
• Free resources and free data
A different story for chemoinformatics
• Private data and private software
Too hard to solve… let's put our heads in the sand
Bioinformatics data too large to keep track of chemical compounds
• 100,000 protein entries in Swiss-Prot (2002)
• 20 million entries in EMBL Database (2002)
• Small databases unable to keep track
• ENZYME resources ~ 3500 enzymatic reactions
New initiatives start up
• PubChem
  • Chemical repository, millions of entries, focus on screening assays
• ChEBI
  • Manually annotated database, nomenclature reference and compound database, tens of thousands of entries
Principles of foundation
• December 2002 email exchanges within the EBI to address the issue of chemistry
• Three principles outlined
“Nothing held in the database must be proprietary or derived from a proprietary source that would limit its free distribution/availability to anyone.”
“Every data item in the database should be fully traceable and explicitly referenced to the original source/version.”
“Although the EBI will provide a web interface, the entirety of the data should be available to all without constraint as, for example, SQL table dumps, ASCII tables, and XML (e.g. DAML+OIL)”
We make a start using existing resources
• Integrate three resources
  • KEGG Compound
  • IntEnz
  • Chemical Ontology
• Annotation starts summer 2003
  • Focus on nomenclature
Our first release was modest but it was a start
• 21 July 2004
• 2783 annotated entities
• Data:
  • ChEBI Name, ChEBI ID
  • IUPAC Names, Synonyms
  • Formula
  • Cross-references
We introduce structures - Sep 2005
• Molfiles
• InChI (IUPAC International Chemical Identifier)
• SMILES (Simplified Molecular Input Line Entry System)
• Image (PNG)
Marvin in ChEBI
We start editing the chemical ontology – Dec 2005
Web Services - Oct 2006
• Programmatic access to a ChEBI entry
  • SOAP-based Java implementation
• Clients currently available in Java and Perl
• Four methods with which to access data
  • getLiteEntity
  • getCompleteEntity
  • getOntologyParents
  • getOntologyChildren
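As a rough illustration of what a call to one of these four methods looks like on the wire, the sketch below builds a SOAP 1.1 request envelope in Python. Only the method names come from the slide. The namespace `CHEBI_NS`, the parameter name `chebiId`, and the helper `build_request` are assumptions made for this example; the real values should be taken from the service's WSDL.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV_NS = "http://schemas.xmlsoap.org/soap/envelope/"
# Assumption: placeholder namespace, NOT the real one; read it from the WSDL.
CHEBI_NS = "urn:example:chebi-webservice"


def build_request(method: str, params: dict) -> bytes:
    """Serialise a SOAP request for one web-service method (hypothetical helper)."""
    envelope = ET.Element(f"{{{SOAP_ENV_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV_NS}}}Body")
    call = ET.SubElement(body, f"{{{CHEBI_NS}}}{method}")
    for name, value in params.items():
        # Each parameter becomes a child element of the method element.
        ET.SubElement(call, f"{{{CHEBI_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


# "chebiId" is an assumed parameter name; CHEBI:15377 is the entry for water.
request = build_request("getCompleteEntity", {"chebiId": "CHEBI:15377"})
```

The resulting bytes would be sent in an HTTP POST to the service endpoint. In practice a SOAP client library that reads the WSDL saves writing envelopes by hand.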
Automated Cross References – Aug 2007
Current databases: UniProtKB, Reactome, BioModels, IntAct, SABIO-RK, PubChem and ArrayExpress
Chemical Structure Searching – May 2008
After all this, where are we?
Annotation is linear
[Figure: ChEBI Data Growth. x-axis: releases by month and year, 07/2004 (Rel 1) to 03/2008 (Rel 43); y-axis: number of annotated entities, 0 to 16000]
Diversity of users
Constant challenge of balancing our users' varied interests.
Our positives
• Nomenclature database
• Manually annotated data
• Attention to detail
• Free and accessible
• Loyal users
Our not-so-positives
• Size: too small for some users
• Not well integrated into other bioinformatics resources
• Community interaction
• No software publicly available to manipulate the database
Involve the community
• Create a web-based submission tool
  • Users can easily submit their entities one at a time
  • Also allows bulk submission from other resources
Improvements to data depth
• Addition of more Xrefs: PDB, MACIE ???
• Addition of more chemical attributes? What chemical attributes?
• Text mining projects to extract relevant chemical information from patents and journals
  • European Patent Office
Going Open Source
• Commercial software packages will be replaced with Open Source
• Long term goal: allow people to create a free local installation of ChEBI
• Distribution of data in useful formats: CML, SDF
Acknowledgements
• ChEBI Team
  • Paula de Matos, Kirill Degtyarenko, Marcus Ennis, Janna Hastings, Christoph Steinbeck
• Alumni
  • Michael Darsow, Mickael Guedj, Alan McNaught, Martin Zbinden
• ChEBI supporters
  • Rolf Apweiler, Michael Ashburner, Henning Hermjakob, Janet Thornton
• IntEnz Team
  • Rafael Alcantára, Volker Ast, Kristian Axelsen, Anne Morgat
• EPO Collaborators
  • Hélène Courrier, Stephane Nauche, Jeremy Parsons
• Database supporters
  • ArrayExpress, IntAct, Reactome, SABIO-RK, RSC, GO, RESID etc.
Requirements for submitting data to ChEBI
• Disclaimer: this is only a summary of a chat I had with the ChEBI coordinator last night, so no promises!
• Information needed to submit a compound:
  • Structure
  • Name, synonyms
  • Registry
  • Database accession(s)
  • Mapping to ChEBI Ontology
• ChEBI is currently quite busy with ongoing projects, but would consider taking submissions.
What could be done within APO-SYS
• From Pekka’s talk, I gathered that there are about 5,000 to 10,000 compounds in these siRNA libraries.
• Question: who else is dealing with compounds in APO-SYS?
• One could use ChEBI's web service with InChI to identify what is already in the database.
• ChEBI can do targeted curation, provided there is funding for the curation team.
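The InChI check suggested above could be sketched roughly as follows: each compound in a library is looked up by its InChI string, and the library is split into compounds already present and compounds that would need to be submitted. The function `split_by_presence` and the `lookup` callable are hypothetical names for this example. A real lookup would wrap a web-service call, which is replaced here by a small in-memory table so that the example runs offline.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def split_by_presence(
    inchis: Iterable[str],
    lookup: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Split InChI strings into those with a database ID and those without."""
    known: Dict[str, str] = {}
    missing: List[str] = []
    for inchi in inchis:
        chebi_id = lookup(inchi)  # hypothetical: returns an ID or None
        if chebi_id is None:
            missing.append(inchi)
        else:
            known[inchi] = chebi_id
    return known, missing


# Stand-in for the web service: a one-entry table used only for illustration.
# It says nothing about what the real database does or does not contain.
_LOCAL_TABLE = {"InChI=1S/H2O/h1H2": "CHEBI:15377"}  # water


def fake_lookup(inchi: str) -> Optional[str]:
    return _LOCAL_TABLE.get(inchi)


library = ["InChI=1S/H2O/h1H2", "InChI=1S/CH4/h1H4"]  # water, methane
known, missing = split_by_presence(library, fake_lookup)
```

Passing the lookup in as a callable keeps the matching logic separate from the network access, so the same function works with a cached table, a local database dump, or a live service.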