Drawing Query Structures in Chem & Bio Draw


A White Paper by CambridgeSoft

Contents

Introduction
What is a query structure?
Search Modes
Exact structure searching
Full structure search
Similarity search
Tautomeric search
Drawing Query Structures
Drawing hydrogen atoms
Generic atom queries
Atom lists and NOT lists
Atom properties
  Saturation / unsaturation
Substituent count
  Exact substituent count
  Inexact substitution and free sites
Link nodes
Ring bond count
Bond query features
  Bond order
  Bond Topology
Alternative group queries
Multiple attachments
  Variable attachments
cis/trans isomers
A final note on queries


Introduction

In this white paper, we illustrate a variety of the Chem & Bio Draw features for drawing query structures to search chemical databases. We first explain what a query structure is and the types of queries you can perform using these structures. We then illustrate several sample query structures and query results that may be returned.

The query results shown throughout this white paper are for demonstration only. Running the sample queries on a real database (such as your own) that contains its own unique set of records will likely return different results.

What is a query structure?

More than likely, you’ve performed some kind of database query before. For example, when you search the Internet, you enter a text string and the search engine returns a list of results.

You search chemical databases in much the same way. You enter a chemical formula or molecular weight, and records that meet those criteria are returned. However, assume that you want to return all records for compounds that contain a tertiary amide functional group. Alphanumeric text fields simply aren’t practical for entering this information. Instead, you need to enter a chemical drawing of what to search for, called a query structure; in this case, a drawing that includes a tertiary amide group.

To apply a query structure to a search, you either draw or paste the query structure into a search form that is designed for entering chemical structures. When executed, the search returns all records that match the query structure (and any other search parameters you may have entered). In our amide example, the query structure might look like this:

Some of the results of the query may be:

Query structures let you search for much more than just functional groups. You can search for nearly any structural feature or property—substructures, atoms, bond types, charges, etc.


Search Modes

Search modes are important because they determine how a search engine matches the query structure to the data set being searched. Although search modes are not set in ChemDraw itself, it is important to understand how common search modes work when planning a query structure. This section reviews the common search modes.

To perform a search, you enter your query structure into ChemBioFinder or a similar structure search application. You then use the search engine to choose a search mode and run the search. The search mode tells ChemBioFinder how to determine whether a record in the database matches the query structure; typically, the search mode is just as important as the query structure. For example, assume you enter o-xylene as your query structure. Depending on the mode you use, the search may return only o-xylene, structures that contain o-xylene as a substructure, or structures that are similar to o-xylene (such as m-xylene).

Exact structure searching

An exact structure search (also called an “identity” search) returns structures that are identical to the query structure. This means that returned structures must match the charges, isotopes, and stereoisomers of the query structure. For example, if you enter benzene as the query structure, only structures of benzene are returned.

[Figure not reproduced: with benzene as the query, an exact search finds benzene but does not find deuterated benzene]
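To make the contrast with the full structure search (described next) concrete, here is a minimal Python sketch. It is a hypothetical model, not how ChemBioFinder works internally: each record is a list of fragments, and the `skeleton` string stands in for the canonical connectivity key a real search engine would compute.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One connected piece of a record, in a deliberately simplified model."""
    skeleton: str       # stand-in for a canonical connectivity key (hypothetical)
    charge: int = 0
    isotopes: str = ""  # e.g. "D6" for a ring carrying six deuterium atoms
    stereo: str = ""    # stereo descriptor, if any


def exact_match(query: Fragment, record: list[Fragment]) -> bool:
    # Identity: same connectivity, charges, isotopes, and stereochemistry,
    # and no additional fragments in the record.
    return record == [query]


def full_match(query: Fragment, record: list[Fragment]) -> bool:
    # Full structure: connectivity only; charges, isotopes, and stereo are
    # ignored, and other fragments (such as counterions) may be present.
    return any(frag.skeleton == query.skeleton for frag in record)


benzene = Fragment("benzene")
benzene_d6 = Fragment("benzene", isotopes="D6")
acetate = Fragment("acetate", charge=-1)
sodium = Fragment("sodium", charge=1)

print(exact_match(benzene, [benzene_d6]), full_match(benzene, [benzene_d6]))
print(exact_match(acetate, [acetate, sodium]), full_match(acetate, [acetate, sodium]))
```

The benzene and acetate cases mirror the examples in this part of the paper: the exact comparison rejects the deuterated ring and the salt, while the full comparison accepts both.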


Full structure search

A full structure search returns all records that an exact structure search returns, but it does not require that stereoisomers, atomic charges, or isotopes match the query structure for a record to be returned. For example:

Results in a full structure search may also include fragments (such as ions) that are part of the returned record; an exact search will not. For example, if your query structure is the acetate ion, your results may include:

Similarity search

Similarity searching depends entirely on molecular descriptors. Whether a structure is returned depends on its percentage of similarity to the query, which reflects how many descriptors the query compound and the candidate compound share and how closely they agree. For example, consider a search for compounds with 90% similarity to 2-ethylisoindoline-1,3-dione:

[Figures not reproduced: full structure search examples, including acetate-ion hits that contain Al3+ and Na+ counterions; the similarity query structure, 2-ethylisoindoline-1,3-dione]

Example search results:

Using a smaller similarity value, such as 75%, also returns records that are less similar to the query structure, so more records are typically returned. The best similarity value to use depends on how closely you want the returned structures to resemble the query structure.
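Percentage similarity is often computed as a Tanimoto coefficient over structural fingerprints. This paper does not say which measure ChemBioFinder uses, so the sketch below simply assumes Tanimoto similarity over hypothetical feature sets to show how lowering the threshold admits more records.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared features / features present in either set."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_search(query: set, records: dict, threshold: float) -> list:
    """Return names of records whose similarity to the query meets the threshold."""
    return [name for name, features in records.items()
            if tanimoto(query, features) >= threshold]


# Hypothetical fingerprints: each integer stands for one structural feature.
query = set(range(10))
records = {
    "record_a": set(range(9)),               # shares 9 of 10 features -> 0.90
    "record_b": set(range(8)),               # shares 8 of 10 features -> 0.80
    "record_c": set(range(6)) | {100, 101},  # 6 shared, 2 extra -> 0.50
}

print(similarity_search(query, records, 0.90))  # ['record_a']
print(similarity_search(query, records, 0.75))  # ['record_a', 'record_b']
```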

Tautomeric search

A tautomeric search finds all tautomers of the query structure that are in the database and adds them to the search results. For example, a search for pyridin-2(1H)-imine will find its amine tautomer:

Here's an example in which 2-butanone is used as the query structure to find its enol tautomer1:

Some search tools let you combine tautomeric searches with full, substructure, or similarity searches.
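One way to picture a tautomeric search is as a lookup over groups of interconverting structures. In the sketch below the tautomer pairs are hand-entered SMILES strings (an assumption for illustration); a real search tool derives tautomers with its own rules and canonicalizes structures rather than comparing strings.

```python
from collections import deque

# Hypothetical, hand-entered tautomer relation between SMILES strings.
TAUTOMER_PAIRS = [
    ("N=C1NC=CC=C1", "Nc1ccccn1"),  # pyridin-2(1H)-imine <-> pyridin-2-amine
    ("CCC(C)=O", "CC=C(C)O"),       # 2-butanone <-> but-2-en-2-ol (enol)
    ("CCC(C)=O", "C=C(O)CC"),       # 2-butanone <-> but-1-en-2-ol (enol)
]

_NEIGHBORS: dict = {}
for a, b in TAUTOMER_PAIRS:
    _NEIGHBORS.setdefault(a, set()).add(b)
    _NEIGHBORS.setdefault(b, set()).add(a)


def tautomer_family(smiles: str) -> set:
    """All structures reachable from `smiles` through tautomer pairs (BFS)."""
    seen = {smiles}
    queue = deque([smiles])
    while queue:
        current = queue.popleft()
        for other in _NEIGHBORS.get(current, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen


def tautomer_search(query: str, database: list) -> list:
    """Return database records that belong to the query's tautomer family."""
    family = tautomer_family(query)
    return [record for record in database if record in family]


print(tautomer_search("CCC(C)=O", ["CC=C(C)O", "CCO", "C=C(O)CC"]))
```

Walking the relation transitively means that a query for one enol also reaches the other enol through the ketone.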

1. Not all commercially available query software recognizes keto-enol tautomerization.


Drawing Query Structures

The search modes we’ve introduced are quite useful, but they have their limitations when used by themselves. For example, any record for benzoic acid may be found using a full or substructure search. But what if you are interested in other substituents on the benzene ring besides a carboxyl group? You can create query structures for bromobenzene, aniline, benzenethiol, and so on, and run a query on each one, but this can be a daunting task.

A better method is to use special indicators that represent the atoms and bond types you want to find. For example, to search for benzene rings that carry one heteroatom attachment, with hydrogen at every other position, you can use a generic query atom:

The query atom 'Q' can represent any heteroatom. When you run a query using this structure, all records that contain a benzene ring with one heteroatom attached to it will be returned. Using a substructure search, some example search results can be:

‘Q’ is just one of several available query atom types. Throughout the rest of this guide, we describe other query atom types and a variety of atom, bond, and stereochemistry attributes that you can apply to your query structures.

Drawing hydrogen atoms

Not all hydrogen atoms in a query structure are necessarily the same, and knowing the differences can save you a lot of confusion when viewing your search results. In a drawn structure, hydrogen atoms can be either explicit or implicit. Explicit hydrogens are always visible and connected to the structure with a visible bond. Implicit hydrogens may or may not be visible (the hydrogen is assumed to be there) and do not have visible bonds. For example:

These differences are important because of how explicit and implicit hydrogens are treated in queries. Implicit hydrogens can be replaced with other atoms in a query whereas explicit hydrogens cannot be. For example, consider these two query structures of methanamine for a substructure search.

[Figures not reproduced: structures drawn with implicit and with explicit hydrogens; the two methanamine query structures, with implicit hydrogens (left) and with all hydrogens explicit (right)]

Having only explicit hydrogen atoms, the query structure on the right will likely return only a small number of records (perhaps just methanamine itself) because none of its hydrogens can be replaced. However, the left structure, with implicit hydrogens, could return thousands of records because any one or more of its hydrogens can be replaced by another atom or substructure.

This means that, to ensure that all returned records have a hydrogen atom at a specific location, you should draw your query structure with an explicit hydrogen at that location1. Also, keep in mind that using explicit or implicit hydrogens in your query structure does not mean that hydrogens have to appear as implicit or explicit in a record to be returned. For example, a query structure with implicit hydrogens can return records with either implicit or explicit hydrogens.
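The rule above can be sketched as a comparison of nonhydrogen neighbor counts. This is a simplified, hypothetical model: each query atom is assumed to be already mapped onto a target atom, and only the number of heavy-atom neighbors of that target atom is compared.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryAtom:
    element: str
    heavy_neighbors: int  # nonhydrogen neighbors drawn in the query
    explicit_h: bool      # True when every hydrogen on the atom is drawn explicitly


def atom_matches(q: QueryAtom, target_heavy_neighbors: int) -> bool:
    if q.explicit_h:
        # Explicit hydrogens cannot be replaced: no extra heavy atoms allowed.
        return target_heavy_neighbors == q.heavy_neighbors
    # Implicit hydrogens may be replaced by other atoms or groups.
    return target_heavy_neighbors >= q.heavy_neighbors


def query_matches(query: list, target: dict) -> bool:
    """`target` maps each query element to the heavy-neighbor count of the
    target atom it is mapped onto (one C and one N in this example)."""
    return all(atom_matches(q, target[q.element]) for q in query)


# Methanamine drawn with implicit hydrogens and with all hydrogens explicit.
implicit_query = [QueryAtom("C", 1, False), QueryAtom("N", 1, False)]
explicit_query = [QueryAtom("C", 1, True), QueryAtom("N", 1, True)]

records = {
    "methanamine": {"C": 1, "N": 1},
    "ethanamine": {"C": 2, "N": 1},           # CH2 bonded to CH3 and NH2
    "N-methylmethanamine": {"C": 1, "N": 2},  # N bonded to two carbons
}

implicit_hits = [n for n, t in records.items() if query_matches(implicit_query, t)]
explicit_hits = [n for n, t in records.items() if query_matches(explicit_query, t)]
print(implicit_hits)
print(explicit_hits)
```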

Generic atom queries

Earlier in this guide, we mentioned the ‘Q’ generic query atom. In this section, we describe other generic atoms you can use. You add a generic atom to your structure in Chem & Bio Draw the same way you would add any other atom: click where you want it and then select the appropriate letter.

You can think of the generic atoms as wildcards that each stand for a class of atoms rather than one specific element. They differ in subtle but important ways:

• 'R' represents any atom, including hydrogen and carbon.
• 'A' represents any nonhydrogen atom, including carbon.
• 'Q' represents any heteroatom (a nonhydrogen, noncarbon atom).

[Table not reproduced: methanamine query structures and the possible records each returns]

1. In Chem & Bio Draw, you can also define the properties of an atom to indicate whether explicit or implicit hydrogens are allowed. Another alternative is to define the number of allowed substituents or free sites for an atom. See “Inexact substitution and free sites” below.


To understand the differences among the generic atoms, assume that you have a hypothetical database that consists only of these five structures:

We'll perform a substructure search of the database using each of the three query structures below and compare the results:

[Figures not reproduced: the five database structures (benzene, toluene, aniline, N-methylaniline, and ethylbenzene) and the three query structures, each a benzene ring carrying one generic atom (R, A, or Q) with hydrogen at the other positions]

The R query returned all five records because any atom, including hydrogen, is allowed at the indicator site. The A query did not return benzene because hydrogen is not allowed at the indicator site. The Q query returned only aniline and N-methylaniline because they are the only records in the database that have a heteroatom at the indicator site.
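The three results above can be reproduced with a small Python model of the generic atoms. The record names are read from the figure that is not reproduced here, so treat them as an assumption; the model only inspects the atom at the indicator site. The `full` flag approximates the full structure search discussed next by also requiring that no other nonhydrogen atom be attached to that site.

```python
def generic_matches(label: str, element: str) -> bool:
    """Does an atom of the given element satisfy a generic query label?"""
    if label == "R":
        return True                       # any atom, including H and C
    if label == "A":
        return element != "H"             # any nonhydrogen atom
    if label == "Q":
        return element not in ("H", "C")  # any heteroatom
    raise ValueError(f"unsupported label: {label}")


# For each record: (element at the indicator site, number of nonhydrogen atoms
# attached to that site atom besides the ring). Hypothetical encoding.
SITES = {
    "benzene": ("H", 0),
    "toluene": ("C", 0),
    "aniline": ("N", 0),
    "N-methylaniline": ("N", 1),
    "ethylbenzene": ("C", 1),
}


def hits(label: str, full: bool = False) -> list:
    """Substructure search by default; full=True also forbids extra attachments."""
    return [name for name, (element, extra) in SITES.items()
            if generic_matches(label, element) and (not full or extra == 0)]


print(hits("Q"))             # ['aniline', 'N-methylaniline']
print(hits("A", full=True))  # ['toluene', 'aniline']
```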

Substructure search results for the R, A, and Q queries:

Query structure | Records returned | Notes
R | benzene, toluene, aniline, N-methylaniline, ethylbenzene | All records in the database match the query structure.
A | toluene, aniline, N-methylaniline, ethylbenzene | Benzene is not included in the results because only nonhydrogen atoms match the query indicator.
Q | aniline, N-methylaniline | Only aniline and N-methylaniline are returned because only heteroatoms match the query indicator.

It is important to understand that the A and Q indicators represent atoms, not attachments. In a substructure search, results will commonly show the atom of interest attached to other substructures (such as the methyl group in N-methylaniline), not just the query structure. However, in a full structure search, the query indicator matches only records in which no other nonhydrogen atom is attached to the indicator atom, so the record as a whole must match the query structure (assuming the database contains such records). For example, using the same ‘A’ and ‘Q’ query structures in a full structure search may yield these results:

Query structure | Records returned
A | toluene, aniline
Q | aniline

Chem & Bio Draw offers two other heteroatom query indicators besides ‘Q’. To specify that an atom must be a metal, use the ‘M’ atom label. To return structures with a halogen present, use ‘X’. Consider the following query structure:

[Figure not reproduced: query structure containing the halogen indicator X]

Valid query hits may include:

A query structure to search for metals might look like this:

Query results may include:

Atom lists and NOT lists

In Chem & Bio Draw, you can use atom lists and NOT lists to require or prohibit certain atoms at specific locations. For example, you can require that either nitrogen or oxygen appear at one location while prohibiting both at another location. Consider this query structure:

[Figures not reproduced: halogen query hits 1-bromo-3-chloro-2-methylpentane and perfluorocyclobutane; the metal query, which uses M, and hits containing Fe and Au; the atom list query, with an [N,O] list at one position and a [NOT N,O] list at another]

The following structures are valid results:

However, the query would not yield this:

Atom lists and NOT lists can include any atom or set of atoms, including the generic atoms.
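A minimal sketch of how atom lists, NOT lists, and the generic labels could be evaluated for a single atom. The `[NOT N,O]` label syntax follows the query drawing described above; the metal set is deliberately partial and, like the helper names, is an assumption for illustration.

```python
# Element classes for the generic labels. HALOGENS follows the usual
# definition; METALS is a deliberately partial, illustrative set.
HALOGENS = {"F", "Cl", "Br", "I", "At"}
METALS = {"Li", "Na", "K", "Mg", "Ca", "Al", "Cr", "Fe", "Cu", "Zn",
          "Mo", "Ag", "W", "Pt", "Au"}


def item_matches(item: str, element: str) -> bool:
    """One list entry: a generic label or a plain element symbol."""
    generic = {
        "R": True,
        "A": element != "H",
        "Q": element not in ("H", "C"),
        "M": element in METALS,
        "X": element in HALOGENS,
    }
    return generic.get(item, item == element)


def parse_list_label(label: str) -> tuple:
    """'[N,O]' -> (['N', 'O'], False); '[NOT N,O]' -> (['N', 'O'], True)."""
    body = label.strip().strip("[]").strip()
    negate = body.upper().startswith("NOT ")
    if negate:
        body = body[4:]
    return [part.strip() for part in body.split(",") if part.strip()], negate


def list_matches(label: str, element: str) -> bool:
    items, negate = parse_list_label(label)
    hit = any(item_matches(item, element) for item in items)
    return not hit if negate else hit


print(list_matches("[N,O]", "O"), list_matches("[NOT N,O]", "N"))  # True False
```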

Atom properties

Saturation / unsaturation

Chem & Bio Draw lets you specify that an atom must be saturated. When you apply the saturation atom property, an ‘S’ indicator appears. Consider the query structure below:

The query hits may include:

[Figures not reproduced: atom list query hits cyclohexanone and (1Z,2E)-cyclohexane-1,2-dione dioxime; saturation query hits 1,1-difluoroethane and 1,1-difluorocyclopentane]
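As a sketch, saturation can be read from the bond types around an atom. This assumes that a saturated atom is one with only single bonds, which makes it the complement of the unsaturated property defined in the next paragraph; the `Bond` enum is a hypothetical stand-in for a real bond-type field.

```python
from enum import Enum


class Bond(Enum):
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3
    QUADRUPLE = 4
    AROMATIC = 5


def is_saturated(bonds: list) -> bool:
    # Saturated (assumed): every bond to the atom is a single bond.
    return all(b is Bond.SINGLE for b in bonds)


def is_unsaturated(bonds: list) -> bool:
    # Unsaturated: at least one double, triple, quadruple, or aromatic bond.
    return any(b is not Bond.SINGLE for b in bonds)


cf2_carbon = [Bond.SINGLE, Bond.SINGLE, Bond.SINGLE]    # C bonded to F, F, C
carbonyl_carbon = [Bond.SINGLE, Bond.DOUBLE, Bond.SINGLE]
print(is_saturated(cf2_carbon), is_unsaturated(carbonyl_carbon))  # True True
```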


Alternatively, you can require that a location be unsaturated. This means that the query will return only those results where the specified atom is attached to at least one double, triple, quadruple, or aromatic bond. Consider this query structure:

The query results can include:

Substituent count

Chem & Bio Draw lets you specify how many substituents are allowed at a specified location in the query structure. Before describing this feature, we need to define “substituent”. In Chem & Bio Draw, a substituent is any nonhydrogen attachment. This means that, although the carbonyl carbons in a carboxylic acid and in an aldehyde are each bonded to three atoms, they have different numbers of substituents:

Exact substituent count

Using Chem & Bio Draw, you can specify that a location has exactly n substituents, where n can be any number from 0 to 15. For example, the query structure below specifies that exactly four substituents must be attached to a metal. In this case, one of the substituents is already defined:

[Figures not reproduced: unsaturation query hits 2,2,2-trifluoroacetic acid and 1,1-difluoro-1H-cyclopropabenzene; the carbonyl carbons of a carboxylic acid (three substituents) and an aldehyde (two substituents); the exact substituent count query on a metal atom]

Some results may be:

Inexact substitution and free sites

Inexact substitution lets you specify that a location may have up to n substituents, where n is any number from 0 to 15. Some environments, however, may not support the inexact substitution feature. Therefore, Chem & Bio Draw also provides the free site feature. Whether you use free sites or inexact substitution, you define the maximum number of substituents an atom can have. The two features behave the same way, but you derive the maximum number differently: the number you use for inexact substitution must include the substituents shown in the query structure, whereas the free site count does not. For example, the two query structures below are equivalent. The structure on the left uses the free site feature with a value of 1 because the hydroxyl groups are not counted. The structure on the right uses inexact substitution and includes the hydroxyl groups in the count, so it has a value of 3.
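The counting rules in this section can be summarized in a few lines of Python. This is an illustrative model (the neighbor lists and function names are hypothetical); it shows why the free site value of 1 and the inexact value of 3 in the hydroxyl example describe the same limit.

```python
MAX_COUNT = 15  # counts range from 0 to 15, as stated in the text


def substituent_count(neighbors: list) -> int:
    """A substituent is any nonhydrogen attachment."""
    return sum(1 for element in neighbors if element != "H")


def _check(n: int) -> None:
    if not 0 <= n <= MAX_COUNT:
        raise ValueError(f"count must be between 0 and {MAX_COUNT}")


def exact_ok(neighbors: list, n: int) -> bool:
    # Exact count: exactly n substituents.
    _check(n)
    return substituent_count(neighbors) == n


def inexact_ok(neighbors: list, n: int) -> bool:
    # Inexact substitution: at most n substituents, counting those drawn.
    _check(n)
    return substituent_count(neighbors) <= n


def free_sites_to_inexact(drawn: int, free_sites: int) -> int:
    # Free sites count only the substituents allowed beyond those drawn.
    return drawn + free_sites


acid_carbon = ["C", "O", "O"]      # carbonyl carbon of acetic acid
aldehyde_carbon = ["C", "O", "H"]  # carbonyl carbon of acetaldehyde
print(substituent_count(acid_carbon), substituent_count(aldehyde_carbon))  # 3 2
print(free_sites_to_inexact(drawn=2, free_sites=1))  # 3
```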

Either query structure can yield these results:

Link nodes

Use link nodes to find rings or chains of varying sizes. For example, consider this query:

[Figures not reproduced: exact substituent count hits containing Mo, Cr, and W; the equivalent free site and inexact substitution query structures and their hits; the link node query, which contains the link node (CH2)1-5]

Results may include:

When using link nodes, the query structures must follow these rules:

• The link node must consist of a generic name or no more than a single element symbol (except for hydrogen, such as in CH2).

• The number range must be in the form of integer-integer with the integers and hyphen subscripted, such as (CH2)1-4.

• Parentheses or braces may be used instead of brackets.
• Brackets can be omitted if not needed for clarity, for example with O1-3.
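These rules can be checked mechanically. The parser below works on a hypothetical plain-text rendering of a link-node label (ChemDraw stores link nodes in its own format, not as strings like these); it accepts the bracket styles listed above, only allows the bare form for a plain element symbol, and expands the range into repeated units.

```python
import re

# Optional bracket pair around one unit, then a "min-max" repeat range.
_LINK = re.compile(
    r"^(?:\[(?P<b>[^\]]+)\]|\((?P<p>[^)]+)\)|\{(?P<c>[^}]+)\}|(?P<bare>[A-Z][a-z]?))"
    r"(?P<lo>\d+)-(?P<hi>\d+)$"
)
# One element symbol, optionally followed by hydrogens (e.g. O, CH2, NH).
_UNIT = re.compile(r"^[A-Z][a-z]?(?:H\d*)?$")


def parse_link_node(label: str) -> tuple:
    m = _LINK.match(label)
    if not m:
        raise ValueError(f"not a link node: {label!r}")
    unit = m.group("b") or m.group("p") or m.group("c") or m.group("bare")
    lo, hi = int(m.group("lo")), int(m.group("hi"))
    if not _UNIT.match(unit) or unit == "H":
        raise ValueError(f"unit must be one non-hydrogen element symbol: {unit!r}")
    if lo > hi:
        raise ValueError("range minimum exceeds maximum")
    return unit, lo, hi


def expand(label: str) -> list:
    """Every chain length the link node allows, written as repeated units."""
    unit, lo, hi = parse_link_node(label)
    return [unit * n for n in range(lo, hi + 1)]


print(parse_link_node("(CH2)1-5"))  # ('CH2', 1, 5)
print(expand("O1-3"))               # ['O', 'OO', 'OOO']
```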

Ring bond count

Using the ring bond count, you can specify how many bonds attached to an atom are also part of a ring (of any size). For example, the two bridgehead carbons in the structure below each have a ring bond count of three. All other carbons in the structure have a ring bond count of two.

[Figure not reproduced: bicyclo[2.2.2]octane]

Chem & Bio Draw lets you specify exactly 0, 2 (a simple ring), or 3 (fusion) ring bonds. You can also specify four or more ring bonds or the exact number of ring bonds as drawn. Here are a few examples of counting ring bonds (for illustration, the target atom and ring bonds appear in red):

Figure 1.1 Ring bonds defined in Chem & Bio Draw. Select the ring bond count (“No Ring Bonds”, “Simple Ring”, etc.) in the Chem & Bio Draw Atom Properties dialog box.
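Ring bond counts follow directly from the molecular graph: a bond lies in some ring exactly when it is not a bridge (removing it would not disconnect the molecule). The sketch below uses Tarjan's bridge-finding depth-first search on hypothetical atom-index bond lists for molecules named in this section.

```python
from collections import defaultdict


def ring_bond_counts(bonds: list) -> dict:
    """For each atom, count the attached bonds that lie in at least one ring."""
    adjacency = defaultdict(list)
    for index, (a, b) in enumerate(bonds):
        adjacency[a].append((b, index))
        adjacency[b].append((a, index))

    discovery, low, bridges = {}, {}, set()
    counter = 0

    def visit(atom, via_bond):
        nonlocal counter
        discovery[atom] = low[atom] = counter
        counter += 1
        for neighbor, bond in adjacency[atom]:
            if bond == via_bond:
                continue  # do not walk straight back along the tree edge
            if neighbor in discovery:
                low[atom] = min(low[atom], discovery[neighbor])
            else:
                visit(neighbor, bond)
                low[atom] = min(low[atom], low[neighbor])
                if low[neighbor] > discovery[atom]:
                    bridges.add(bond)  # no cycle passes through this bond

    for atom in list(adjacency):
        if atom not in discovery:
            visit(atom, None)

    counts = {atom: 0 for atom in adjacency}
    for index, (a, b) in enumerate(bonds):
        if index not in bridges:
            counts[a] += 1
            counts[b] += 1
    return counts


# Bridgeheads are atoms 0 and 1; each bridge has two carbons.
BICYCLO_222 = [(0, 2), (2, 3), (3, 1), (0, 4), (4, 5), (5, 1), (0, 6), (6, 7), (7, 1)]
# Benzene ring atoms 0-5, nitrogen is atom 6.
ANILINE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
# Two six-membered rings sharing spiro atom 0.
SPIRO_55 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
            (0, 6), (6, 7), (7, 8), (8, 9), (9, 10), (10, 0)]
# Fusion atoms are 4 and 9.
NAPHTHALENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 9), (9, 0),
               (4, 5), (5, 6), (6, 7), (7, 8), (8, 9)]

print(ring_bond_counts(BICYCLO_222))
```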

Bond query features

Query properties are not limited to atoms. You can also indicate properties for bonds. For example, you can indicate that a particular bond must be either a single or double bond.

Other bond query features you can specify in Chem & Bio Draw are described below.

Bond order

These query indicators describe the bond order regardless of bond topology:
• Double or aromatic. The bond must be either double or aromatic.
• Single or aromatic. The bond must be either single or aromatic.
• Single or double. The bond must be either single or double.

Bond Topology

These query indicators describe the bond location regardless of the bond order:
• Ring. The bond must be part of one or more rings.
• Chain. The bond must not be in a ring.
• Ring or chain. The bond may be in a chain or a ring.
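Because bond order and bond topology are described independently, a query bond can be modeled as a set of allowed orders plus one topology option. The enum and constant names below are hypothetical, chosen to mirror the indicators listed above.

```python
from enum import Enum


class Order(Enum):
    SINGLE = "single"
    DOUBLE = "double"
    TRIPLE = "triple"
    AROMATIC = "aromatic"


class Topology(Enum):
    RING = "ring"
    CHAIN = "chain"
    RING_OR_CHAIN = "ring or chain"


DOUBLE_OR_AROMATIC = frozenset({Order.DOUBLE, Order.AROMATIC})
SINGLE_OR_AROMATIC = frozenset({Order.SINGLE, Order.AROMATIC})
SINGLE_OR_DOUBLE = frozenset({Order.SINGLE, Order.DOUBLE})


def bond_matches(allowed: frozenset, topology: Topology,
                 order: Order, in_ring: bool) -> bool:
    """Check the target bond's order and ring membership independently."""
    if order not in allowed:
        return False
    if topology is Topology.RING:
        return in_ring
    if topology is Topology.CHAIN:
        return not in_ring
    return True  # RING_OR_CHAIN accepts either location


print(bond_matches(SINGLE_OR_DOUBLE, Topology.RING, Order.DOUBLE, in_ring=True))  # True
```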

Alternative group queries

Because generic atoms are intended to be nonspecific, using them in query structures may, in some cases, not provide enough control over which records are returned. Using alternative groups, you can create a list of only those atoms or substructures a generic atom is allowed to represent in a query.

[Figure 1.1 panels not reproduced: aniline, no ring bonds (search: “No Ring Bonds”); phenylphosphine, two ring bonds (search: “Simple Ring”); naphthalene, three ring bonds (search: “Fusion”); spiro[5.5]undecane, four ring bonds (search: “Spiro or higher”). Also not reproduced: a bond drawn with the S/D (single or double) query indicator]

An example is shown here:

In this example, the query structure is composed of a benzyl group attached to an R atom. The R-group is defined in a table that lists the functional groups of interest, and the attachment points (signified by ‘1’) indicate how each functional group must be attached to the query structure. A hit is returned if any of the alternative substructures matches the target.

Using a full structure search, the results may be:

R-groups are commonly represented with the letter ‘R’. However, two or more R-groups may be defined so that each represents its own set of atoms or groups. In this case, each R-group is indicated with a subscript, such as:

[Figures not reproduced: the alternative groups and full structure search hits for the benzyl query; a root structure carrying R1, R2, and a methyl group]
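A query with several R-groups can be thought of as shorthand for every combination of their alternatives. The sketch below enumerates those combinations with `itertools.product`, using the R1 (hydroxyl or amino) and R2 (chlorine or bromine) alternatives from the example in the next paragraph. It is only a mental model; real search engines test the alternatives during matching rather than expanding every combination up front.

```python
from itertools import product

# Alternatives for each R-group, as plain strings (not a real R-group table format).
R_GROUPS = {
    "R1": ["OH", "NH2"],
    "R2": ["Cl", "Br"],
}


def enumerate_r_groups(r_groups: dict) -> list:
    """Every assignment of one alternative to each R-group."""
    names = sorted(r_groups)
    return [dict(zip(names, combo))
            for combo in product(*(r_groups[name] for name in names))]


for combo in enumerate_r_groups(R_GROUPS):
    print(combo)
```

Each printed assignment corresponds to one kind of hit, so two R-groups with two alternatives each give four combinations.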

Using this structure, we can say that R1 must be either a hydroxyl or amino group while R2 must be either chlorine or bromine. Here is an example of how you would draw these groups in Chem & Bio Draw:

Your search may return these results:

[Figure not reproduced: four hits, one for each combination of R1 (OH or NH2) and R2 (Cl or Br) on the root structure]

Multiple attachments

Just as with single R-groups, alternative R-groups may be polyvalent. For example:

It is typically necessary to indicate how the alternative substituents attach to the main structure. For example, assume you want to perform a query using the root structure shown above and the following groups:

Depending on how you define the orientation for the groups in your query, your results can include:

Variable attachments

In some cases, you may want to perform a search where a substituent may be attached to any one of several atoms. For example, assume you want to search for all structures that include a dichlorobenzene ring regardless of where the chlorine atoms are attached to the ring. To do so, you assign two variable attachment points, one for each chlorine atom, and associate each of them with every carbon atom in the ring.

The query structure would look something like this1:

[Figure not reproduced: multiple attachment hits N-phenylmethanediamine, 2-amino-1-phenylethanone, benzylhydrazine, and 2-phenylacetamide]

1. You must use variable attachment points to draw this structure. Simply drawing disconnected bonds creates an incorrect structure.

[Figure not reproduced: dichlorobenzene query with two variable attachment points]
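To see why two variable attachment points on one benzene ring cover all dichlorobenzenes, the sketch below places two substituents on every pair of ring positions and removes placements that are the same up to rotation or reflection of the ring. It is a geometric toy model, not a structure search.

```python
from itertools import combinations

RING_SIZE = 6


def canonical(positions: tuple) -> tuple:
    """Smallest equivalent position set under rotations and reflections of the ring."""
    variants = []
    for shift in range(RING_SIZE):
        for sign in (1, -1):
            variants.append(tuple(sorted((sign * p + shift) % RING_SIZE
                                         for p in positions)))
    return min(variants)


def distinct_placements(n_substituents: int) -> list:
    """Distinct ways to put identical substituents on distinct ring carbons."""
    return sorted({canonical(combo)
                   for combo in combinations(range(RING_SIZE), n_substituents)})


ISOMER_NAMES = {(0, 1): "ortho", (0, 2): "meta", (0, 3): "para"}
print([ISOMER_NAMES[p] for p in distinct_placements(2)])  # ['ortho', 'meta', 'para']
```

The three classes correspond to the 1,2-, 1,3-, and 1,4-dichlorobenzene isomers that a full structure search with this query is expected to return.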

Finding all dichlorobenzene rings requires either a full structure search or a substructure search. A full structure search yields the dichlorobenzene isomers:

Used in a substructure search, the same query structure yields the same isomers but also other structures such as:

cis/trans isomers

This section describes how cis/trans isomers may be found, using 1-bromo-2-chloroethylene as an example:

[Figure not reproduced: cis-1-bromo-2-chloroethylene and trans-1-bromo-2-chloroethylene]

When searching for cis/trans isomers, you indicate in the query structure whether your results should include the cis isomer, the trans isomer, or both; for example, a Double Either bond leaves the double-bond geometry unspecified. The table below lists the query structure you would draw to return each desired result.

Desired result | Query structures
Return cis and trans isomers | (drawing not reproduced)
Return cis and trans isomers | (drawing not reproduced)
Return trans isomer only | (drawing not reproduced)
Return cis isomer only | (drawing not reproduced)

A final note on queries

Whenever you run a query, remember that you are not simply searching a database for drawings that look like your query structure. Instead, you are looking for chemical structures, represented in two dimensions, using a query structure that is also represented in two dimensions. This means that a query result may look completely different from your query structure even though both drawings accurately represent the same molecule, such as:

[Figure not reproduced: two different-looking drawings of the same molecule]

It is important that you draw your query structure correctly and as intended (take particular care when drawing stereochemical structures1). For the same reason, it is just as important that you interpret your query results correctly. With a solid understanding of chemical structures and the information they provide, you will find Chem & Bio Draw to be a valuable tool for drawing structures for your database queries.

1. For the IUPAC recommendations for representing stereochemical configurations, see http://www.iupac.org/publications/pac/78/10/1897/.