Copyright Elsevier MDL 2007
Present and future of informatics in chemistry
Symposium in Honor of Gary Wiggins
Division of Chemical Information
233rd ACS National Meeting, Chicago
Phil McHale, Elsevier MDL
25 March 2007
Outline
Informatics in chemistry?
Where have we got to?
What can we do now?
What’s left to do?
Where are we going?
Informatics in chemistry?
Cheminformatics vs. Chemoinformatics
Structure representation
Information acquisition
Information management
Information use
This Awful Neologism ….
Date: Fri, 17 Oct 1997
From: Wendy Warr
Subject: Re: Cheminformatics/Two new refs.
I wonder if any of the sources define this awful neologism ("chemoinformatics" or "cheminformatics"). Does it really differ from "chemical information" or "computational chemistry". As I have said before, I suspect that it is merely an image-enhancing name for some practitioners of computational chemistry.
2 O or X 2 O?
Data copyrighted (C) by Molinspiration Cheminformatics. http://www.molinspiration.com/chemoinformatics.html
[Chart. Axes: Date (Jul-00 to Mar-07), Citations (0 to 400,000), Ratio (0 to 4). Series: Cheminformatics, Chemoinformatics, Ratio.]
The Building Blocks
Molecules – 2D, 3D, stereoisomers, conformers, polymers, mixtures, formulations, sequences, combichem libraries, virtual libraries, Markush …
Reactions – reagents, products, catalysts, solvents, reacting centers, transition states, metabolic pathways …
Nomenclature, fragment codes, line notations, graphics, file formats
Representing Chemistry: Benzene?
Connection table:
Benzene
  -ISIS-  08200115272D

  6  6  0  0  0  0  0  0  0  0999 V2000
   -1.0306   -1.4375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0318   -2.2648    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3169   -2.6777    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.3995   -2.2644    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.3966   -1.4338    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3187   -1.0247    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  2  3  1  0  0  0  0
  5  6  2  0  0  0  0
  6  1  1  0  0  0  0
M  END
[Figures: benzene drawn with explicit hydrogens; orbital diagram with labels b2u, a2u, e2u, e1g]
Benzene
ID #: MUSE00000002
CAS #: 71-43-2
Other names: Benzol; Cyclohexa-1,3,5-triene
Line notation
• Wiswesser: RH
• MDL LN: C-C=C-C=C-C=@1
• SMILES: c1ccccc1
• InChI: InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H
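To make the connection-table representation concrete, here is a minimal Python sketch that reads the benzene record shown on this slide and derives its molecular formula. It is an illustration only, not how any MDL product works. The function names and the fixed-valence table are assumptions introduced for the example, and it handles only Kekulé structures with no charges, radicals, or aromatic bond types.

```python
# Benzene connection table from the slide, as V2000 molfile lines.
MOLFILE_LINES = [
    "Benzene",
    "  -ISIS-  08200115272D",
    "",
    "  6  6  0  0  0  0  0  0  0  0999 V2000",
    "   -1.0306   -1.4375    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.0318   -2.2648    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3169   -2.6777    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.3995   -2.2644    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.3966   -1.4338    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3187   -1.0247    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  2  0  0  0  0",
    "  3  4  2  0  0  0  0",
    "  4  5  1  0  0  0  0",
    "  2  3  1  0  0  0  0",
    "  5  6  2  0  0  0  0",
    "  6  1  1  0  0  0  0",
    "M  END",
]

# Assumption for this sketch: one fixed valence per element.
ASSUMED_VALENCE = {"C": 4, "N": 3, "O": 2}


def parse_molfile(lines):
    """Return (atom symbols, bonds) read from fixed-width V2000 fields."""
    counts = lines[3]  # the counts line follows three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atoms = [line[31:34].strip() for line in lines[4:4 + n_atoms]]
    bonds = [
        (int(line[0:3]), int(line[3:6]), int(line[6:9]))
        for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]
    ]
    return atoms, bonds


def formula_with_implicit_h(atoms, bonds):
    """Fill unused valence with hydrogens; list C, then H, then the rest."""
    used = [0] * len(atoms)
    for first, second, order in bonds:  # atom numbers are 1-based
        used[first - 1] += order
        used[second - 1] += order
    counts = {}
    for symbol, valence_used in zip(atoms, used):
        counts[symbol] = counts.get(symbol, 0) + 1
        implicit_h = max(ASSUMED_VALENCE[symbol] - valence_used, 0)
        if implicit_h:
            counts["H"] = counts.get("H", 0) + implicit_h
    ordered = [s for s in ("C", "H") if s in counts]
    ordered += sorted(s for s in counts if s not in ("C", "H"))
    return "".join(f"{s}{counts[s] if counts[s] > 1 else ''}" for s in ordered)


atoms, bonds = parse_molfile(MOLFILE_LINES)
print(formula_with_implicit_h(atoms, bonds))
```

The hydrogens never appear in the connection table itself. They are implied by the bond orders, which is why the same record can be drawn with or without explicit H atoms.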
But have we really progressed?
Subject: Re: Beilstein R-groups
From: Dana Roth
Reply-To: CHEMICAL INFORMATION SOURCES DISCUSSION LIST
Date: Fri, 16 Mar 2007 10:57:59 -0700
Content-Type: text/plain
Howard: we are still teaching v.6 since most people here are using MACs. From my little experience with v.7, it appears that the structure editor is the same. I just followed these instructions (which I borrowed many years ago from Andrea Twiss-Brooks) in v.7 and it works fine.
=================
Creating User Defined Groups and Atom Lists
Atoms: Click on the atom in the structure, which needs to be variable. Type 'A1' in the Atom Box and click OK to make the change. Next, click the 'An' button in the Tool Box (left side), and the 'Atom List Number' box will appear. Click OK to display a 'Define Atom List A1' periodic table. Click as many elements or element groups as needed and click OK. A list of all the selected atoms will appear in the Structure Editor window.
Groups: Click the atom, which will be the variable group in the structure. Type 'G1' in the Atom Box and click OK to effect the change. Next, draw a group in the Structure Editor window, 'Select' a group structure (i.e. by double clicking an atom or bond with the select tool) and click the 'Gn' button in the tool box. Set G=1 and click OK. Repeat for additional groups. One atom in each group must be designated as the attachment point. Click on this atom (with the Edit tool), to display the 'Atom Attributes' box. Click 'Set User Defined' and then click 'Attachments'. Click '1' in the 'Attachment Points' box and click OK (in that box). Then click OK in the 'Atom Attributes' box.
After drawing the structure, click on the Crossed Red Arrows → Beilstein Commander.
Information Acquisition: Structure tools and presentation
Structure drawing
Name ↔ structure converters
Virtual chemistry – de novo structure generation, enumeration
Chemical OCR: dead structure → live structure
Text mining: text → structure
Renderers - on screen, in print, within applications, 2D, 3D, shapes, animations
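As a rough illustration of the enumeration idea named on this slide, the Python sketch below expands a generic structure into specific members by plain string substitution. The SMILES template, the A1 atom list, and the G1 group list are hypothetical examples chosen for the sketch. Real enumeration tools work on connection tables with attachment points, not on strings.

```python
from itertools import product

# Hypothetical generic structure: a para-disubstituted benzene written as a
# SMILES template with one variable atom (A1) and one variable group (G1).
TEMPLATE = "{A1}c1ccc({G1})cc1"
ATOM_LIST_A1 = ["F", "Cl", "Br"]
GROUP_LIST_G1 = ["C", "OC", "C(=O)O"]  # methyl, methoxy, carboxylic acid


def enumerate_library(template, a1_values, g1_values):
    """Expand every A1/G1 combination into one specific SMILES string."""
    return [
        template.format(A1=a1, G1=g1)
        for a1, g1 in product(a1_values, g1_values)
    ]


library = enumerate_library(TEMPLATE, ATOM_LIST_A1, GROUP_LIST_G1)
for smiles in library:
    print(smiles)
```

Three atoms and three groups give nine structures. The count grows multiplicatively with each added variable position, which is why virtual libraries quickly become a performance and scalability concern.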
Data Management
Structure storage systems – online, in-house, local, distributed, open, closed, proprietary systems, Oracle cartridges
Registration, novelty check, definitions, business rules
Search systems
• Molecules, reactions
• 2D, 3D, conformations
• Exact, substructure, similarity, fuzzy, shape, property-based, pharmacophores
Pre/Post-search processing – fingerprints, clustering, filtering, diversity analysis
Performance and scalability – virtual chemistry
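The fingerprint and similarity-search items above can be sketched in a few lines of Python. This is a toy example under stated assumptions: the fingerprints are invented sets of on-bit positions rather than output from any real fingerprinting scheme, the molecule names are placeholders, and the Tanimoto coefficient is used as one common similarity measure, not as the measure of any particular search system.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Hypothetical database: each fingerprint is a set of on-bit positions.
DATABASE = {
    "mol_A": frozenset({1, 4, 7, 9, 12}),
    "mol_B": frozenset({1, 4, 7, 10}),
    "mol_C": frozenset({2, 3, 15}),
}


def similarity_search(query_fp, database, threshold=0.5):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in database.items()]
    hits = [hit for hit in scored if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


query = frozenset({1, 4, 7, 9})
for name, score in similarity_search(query, DATABASE):
    print(name, round(score, 2))
```

Because the comparison uses only set operations on precomputed bits, it is cheap enough to run as a pre-search filter before slower substructure matching.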
Information Use: What we can do now
“Publish” information in lab notebooks, databases, reports, papers, patents
Detect, analyze and harvest structures and reactions from printed materials
Create, maintain, publish and link to databases
Search, browse and analyze structures and reactions in databases and documents
Link structures with their properties and with other disciplines – pathways, proteins, genes
Virtual chemistry and screening
Predict/calculate properties, activity, reactivity, drug-likeness
Render, share and communicate
Collaborate and reuse
Sample workflows
Finding out what’s known about a molecule
Exploring possible synthetic routes to a target molecule
Assessing metabolic and toxic liabilities and outcomes
Evaluating Metabolic and Toxic Liabilities
From one parent in MDL Metabolite
From another parent in MDL Metabolite
From Corporate Database
Link to Toxicity
Transformation Details
What’s left to do?
Structure Representation
• Generic structures and patents
• More stereochemistry
• Organometallics, composites, stuff
• Biomolecules
• Transition states, reaction mechanisms, pathways
Information Acquisition
• Authoring tools
• Annotation - semantics
• Web 2.0 – social networking, wikis
What else is left to do?
Information Management
• Integration
• Performance
• Timeliness
• Accessibility
• Portability
Information Use
• Better predictors: activity, ADMET, reactivity
• Better virtual screening
• Presenting QSAR results that chemists can act on
• Capturing and automating intellectual processes: synthesis design
• Knowledge extraction, inference generation
Where are we going?
Automated data capture and indexing
• Papers, patents, theses ….
Robust predictors and inference generators
Blurring of boundaries
• Internal and external information
• Text and structures
• Publications and databases
• Small molecules and -omics
• Mash ups
in cranio >> in silico >> in vitro