SciBorg: Deep Processing and Chemical Informatics

SciBorg: Deep Processing and Chemical Informatics Ann Copestake, Peter Corbett, CJ Rupp, Advaith Siddharthan, Simone Teufel, Ben Waldron University of Cambridge


Page 1: SciBorg: Deep Processing and Chemical Informatics

SciBorg: Deep Processing and Chemical Informatics

Ann Copestake, Peter Corbett, CJ Rupp, Advaith Siddharthan, Simone Teufel, Ben Waldron

University of Cambridge

Page 2: SciBorg: Deep Processing and Chemical Informatics

Overview

• semantic markup language for integrated processing
• introduction to the SciBorg project
• overview of architecture
• semantic markup in SciBorg
• domain-dependent modules
• citation classification
• conclusion

Page 3: SciBorg: Deep Processing and Chemical Informatics

Compositional semantics as a common representation for NLP integration

• Different NLP systems have different strengths and weaknesses
• Pairwise compatibility between systems is too limiting
– Syntax is theory-specific and too language-specific
– Eventual goal should be semantics
• Core idea: shallow processing gives underspecified semantic representation with respect to a normative `deep’ analysis
• Integrate processors with different capabilities
• Applications work on a standard representation
• Reuse of knowledge sources, integration with ontologies
• First experiments done on Deep Thought and QUETAL: RMRS language

Page 4: SciBorg: Deep Processing and Chemical Informatics

Extracting the science from scientific publications: SciBorg

• 4-year EPSRC-funded project started in October 2005
– Computer Laboratory, Chemistry, Cambridge eScience Centre
– Nature Publishing, Royal Society of Chemistry, International Union of Crystallography (papers and publishing expertise)
• Aims:

1. Develop an NL markup language (RMRS) which will act as a platform for extraction of information. Link to semantic web languages.

2. Develop IE technology and core ontologies for use by publishers, researchers, readers, vendors and regulatory organisations.

3. Model scientific argumentation and citation purpose in order to support novel modes of information access.

4. Demonstrate the applicability of this infrastructure in a real-world eScience environment.

Page 5: SciBorg: Deep Processing and Chemical Informatics

General assumptions

• There is lots of useful information in the published scientific literature that is not currently being retrieved
• Language processing is required for some sorts of analyses (text-mining versus data-mining)
• Building specialized language processing tools for each task isn’t cost-effective (time and skill), so we need to build and exploit general purpose language technology

• Eventually language technology should be a standard part of Computer Science, like database technology: i.e., needs some time and expertise to adapt to new tasks and domains, but not (as currently) a research project

• Text processing tools based directly on text patterns (regular expressions) work adequately for some tasks, but often fail to achieve high enough precision and recall

Page 6: SciBorg: Deep Processing and Chemical Informatics

Variation in expression

Example 1: searching for papers describing synthesis of Tröger’s base from anilines:

A: The synthesis of 2,8-dimethyl-6H,12H-5,11-methanodibenzo[b,f][1,5]diazocine (Troger's base) from p-toluidine and of two Troger's base analogs from other anilines

B: … Tröger’s base (TB) ... The TBs are usually prepared from para-substituted anilines

linguistic variation and syntactic relationship (synthesis of X, synthesize X, prepare X and so on), coreference, chemistry names, ontological information …

Example 2: searching for papers describing Tröger’s base syntheses which don’t involve anilines.
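To make the variation problem concrete, here is a minimal sketch of a surface text pattern applied to abridged versions of sentences A and B. The regular expression is a hypothetical illustration, not something taken from SciBorg; it only shows why a pattern written for one phrasing can miss a paraphrase.

```python
import re

# Hypothetical surface pattern for "synthesis of X from Y" (illustration only).
PATTERN = re.compile(r"synthesis of .+ from .+", re.IGNORECASE)

# Abridged versions of sentences A and B from the slide.
SENTENCE_A = (
    "The synthesis of 2,8-dimethyl-6H,12H-5,11-methanodibenzo[b,f][1,5]diazocine "
    "(Troger's base) from p-toluidine and of two Troger's base analogs "
    "from other anilines"
)
SENTENCE_B = "The TBs are usually prepared from para-substituted anilines"


def matches(sentence: str) -> bool:
    """Return True if the surface pattern occurs anywhere in the sentence."""
    return PATTERN.search(sentence) is not None


if __name__ == "__main__":
    print("A matches:", matches(SENTENCE_A))
    print("B matches:", matches(SENTENCE_B))
```

Sentence B describes the same kind of synthesis but uses "prepared from" and the abbreviation "TBs", so the pattern does not fire. Adding more alternatives to the expression helps only until the next paraphrase, and a negated query such as Example 2 is harder still to state over raw text.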

Page 7: SciBorg: Deep Processing and Chemical Informatics

SciBorg, or the Chemist’s amanuensis

• Research prototype, bringing together different language processing tools supporting different types of information extraction (IE)

• Process chemistry texts using combined domain-independent and domain-dependent processing: markup in RMRS

• IE based on patterns expressed via semantics and rhetorical organization:

retrieve all papers X: PAPER-AIM(X,h), h:synthesis, SYN-RESULT(h,<TB>), SYN-SOURCE(h,y) & NOT(aniline(y))
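The retrieval pattern can be read as a query over extracted facts. The sketch below evaluates it over a handful of invented records. The record layout, the paper identifiers and the small aniline list are all assumptions made for illustration, and NOT(aniline(y)) is read as "no source is an aniline", which matches the wording of Example 2 but is an interpretation.

```python
from typing import NamedTuple


class Synthesis(NamedTuple):
    """Toy record for one synthesis event h extracted from a paper."""
    paper: str          # paper identifier X
    is_paper_aim: bool  # PAPER-AIM(X, h) holds
    result: str         # SYN-RESULT(h, result)
    sources: tuple      # every y with SYN-SOURCE(h, y)


# Hypothetical ontology fragment: compounds that count as anilines.
ANILINES = {"aniline", "p-toluidine", "4-chloroaniline"}

# Invented facts, not taken from any real extraction run.
FACTS = [
    Synthesis("paperA", True, "TB", ("p-toluidine", "formaldehyde")),
    Synthesis("paperB", True, "TB", ("aminopyrazole", "formaldehyde")),
    Synthesis("paperC", False, "TB", ("aminopyrimidinone",)),
    Synthesis("paperD", True, "alkaloid", ("enamine",)),
]


def retrieve(facts, target="TB"):
    """Papers whose aim is a synthesis of `target` with no aniline source."""
    return sorted({
        f.paper
        for f in facts
        if f.is_paper_aim
        and f.result == target
        and not any(src in ANILINES for src in f.sources)
    })


if __name__ == "__main__":
    print(retrieve(FACTS))
```

Only paperB survives: paperA uses an aniline, paperC mentions the synthesis without it being the paper's aim, and paperD makes a different product.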

Page 8: SciBorg: Deep Processing and Chemical Informatics

Information Extraction

Chemistry IE: e.g., Organic chemistry syntheses
Text: To a solution of aldimine 1 (1.5 mmol) in THF (5 mL) was added LDA (1 mL, 1.6 M in THF) at 0 °C under argon, the resulting mixture was stirred for 2 h, then was cooled to -78 °C ...
Output: recipe expressed in chemistry formalism (CML)

Ontology extraction (to support other IE)
Text: ... alkaloids and other complex polycyclic azacycles ...
Output: <owl:Class rdf:ID="Alkaloid"> <rdfs:subClassOf rdf:resource="#Azacycle" />

Research markup
Text: Enamines have been used widely ... (citation Y), however, ... did not provide the desired products.
Output: X cites Y (contrast)

Page 9: SciBorg: Deep Processing and Chemical Informatics

Citation map

[Citation map figure; labels only]
Papers: Abonia et al. 2002; Tröger 1887; Elguero et al 2001; Cowart et al 1998; Claridge 1999; Wagner 1935; Katritzky et al. 1998; Merona-Fuquen et al 2001; Cerrada et al. 1995; Wilcox and Scott 1991; Goldberg and Alper 1995
Link types: Criticism/contrast; Support/basis

Example citation contexts:

The bridging 15/17-CH2 protons appear as singlets, in agreement with what has been observed for similar systems [9].

However, some of the above methodologies possess tedious work-up procedures or include relatively strong reaction conditions, such as treatment of the starting materials for several hours with an ethanolic solution of conc. hydrochloric acid or TFA solution, with poor to moderate yields, as is the case for analogues 4 and 5.
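As a rough illustration of how the two link types on the citation map might be assigned, the sketch below labels a citation context by looking for cue phrases. The cue lists are invented for this example and the matching is plain substring search, so this is a toy stand-in rather than the project's classifier.

```python
# Hypothetical cue phrases; a real system would need far richer evidence.
CONTRAST_CUES = ("however", "did not provide", "tedious", "poor to moderate")
SUPPORT_CUES = ("in agreement with", "consistent with", "following the procedure")


def classify_citation_context(text: str) -> str:
    """Return a coarse label for the sentence surrounding a citation."""
    lowered = text.lower()
    if any(cue in lowered for cue in CONTRAST_CUES):
        return "criticism/contrast"
    if any(cue in lowered for cue in SUPPORT_CUES):
        return "support/basis"
    return "unclassified"


SUPPORT_EXAMPLE = (
    "The bridging 15/17-CH2 protons appear as singlets, in agreement with "
    "what has been observed for similar systems [9]."
)
CONTRAST_EXAMPLE = (
    "However, some of the above methodologies possess tedious work-up "
    "procedures or include relatively strong reaction conditions."
)

if __name__ == "__main__":
    print(classify_citation_context(SUPPORT_EXAMPLE))
    print(classify_citation_context(CONTRAST_EXAMPLE))
```

String cues like these break under paraphrase in the same way as any other surface pattern, which is one motivation for expressing cue phrases over a semantic representation instead.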

Page 10: SciBorg: Deep Processing and Chemical Informatics

Outline architecture

[Architecture diagram; labels only]
Sources: RSC, Nature, IUCr, Biology and CL (pdf)
Document format: SciXML, standoff annotation
Components: sentence extraction, OSCAR3, RASP tokeniser and POS tagger, RASP parser, ERG tokeniser, ERG/PET, WSD, anaphora, rhetorical analysis
Representations: sentence RMRS, document RMRS
TASKS

Page 11: SciBorg: Deep Processing and Chemical Informatics

Details of sentence parsing

[Pipeline diagram; labels only]
Components: sentence splitter, citation parser, section selection, OSCAR3, RASP tokeniser and POS tagger, RASP parser, ERG tokeniser, ERG/PET
Representations: domain token lattice (SMAF), RMRS lattice (SMAF)
Other label: (unknown words)

Page 12: SciBorg: Deep Processing and Chemical Informatics

SciXML: text markup for scientific papers

<?xml version="1.0" encoding="UTF-8"?>
<PAPER>
  <METADATA>
    <FILENO>b200862a</FILENO>
    <JOURNAL><NAME>P1</NAME><YEAR>2002</YEAR> <ISSUE>13</ISSUE> <PAGES>1588-1591</PAGES></JOURNAL>
  </METADATA>
  <TITLE>Synthesis of pyrazole and pyrimidine Tröger's-base analogues</TITLE>
  <AUTHORLIST>
    <AUTHOR ID="1">Rodrigo<SURNAME>Abonia</SURNAME></AUTHOR>
    <AUTHOR ID="2">Andrea<SURNAME>Albornoz</SURNAME></AUTHOR>
    …
  </AUTHORLIST>
  <ABSTRACT>Tröger's-base analogues bearing fused pyrazolic or pyrimidinic rings were prepared in acceptable to good yields through the reaction of 3-alkyl-5-amino-1-arylpyrazoles and 6-aminopyrimidin-4(3<IT>H</IT>)-ones with formaldehyde under mild conditions (<IT>i.e.</IT>, in ethanol at 50 °C in the presence of catalytic amounts of acetic acid). Two key intermediates were isolated from the reaction mixtures, which helped us to suggest a sequence of steps for the formation of the Tröger's bases obtained. The structures of the products were assigned by <SP>1</SP>H and <SP>13</SP>C NMR, mass spectra and elemental analysis and confirmed by X-ray diffraction for one of the obtained compounds.</ABSTRACT>
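A short sketch of how a downstream tool might read this SciXML with Python's standard library. The sample is an abridged copy of the slide's markup: the XML declaration is left out, the abstract is shortened, and a closing </PAPER> tag is added so that the fragment parses on its own. The function name summarise is made up for the example.

```python
import xml.etree.ElementTree as ET

# Abridged SciXML; the closing </PAPER> and the shortened abstract are
# additions for this example, not part of the slide.
SCIXML = """<PAPER>
<METADATA><FILENO>b200862a</FILENO>
<JOURNAL><NAME>P1</NAME><YEAR>2002</YEAR><ISSUE>13</ISSUE><PAGES>1588-1591</PAGES></JOURNAL></METADATA>
<TITLE>Synthesis of pyrazole and pyrimidine Tröger's-base analogues</TITLE>
<AUTHORLIST>
<AUTHOR ID="1">Rodrigo<SURNAME>Abonia</SURNAME></AUTHOR>
<AUTHOR ID="2">Andrea<SURNAME>Albornoz</SURNAME></AUTHOR>
</AUTHORLIST>
<ABSTRACT>... 6-aminopyrimidin-4(3<IT>H</IT>)-ones with formaldehyde ...</ABSTRACT>
</PAPER>"""


def summarise(scixml: str) -> dict:
    """Pull basic metadata and the plain-text abstract out of a SciXML string."""
    root = ET.fromstring(scixml)
    authors = [
        # Given name is the element text; surname sits in a child element.
        f"{(a.text or '').strip()} {a.findtext('SURNAME', default='')}".strip()
        for a in root.iter("AUTHOR")
    ]
    abstract = root.find("ABSTRACT")
    return {
        "fileno": root.findtext("METADATA/FILENO"),
        "year": int(root.findtext("METADATA/JOURNAL/YEAR")),
        "title": root.findtext("TITLE"),
        "authors": authors,
        # itertext() flattens inline markup such as <IT>...</IT>.
        "abstract": "".join(abstract.itertext()) if abstract is not None else "",
    }


if __name__ == "__main__":
    for key, value in summarise(SCIXML).items():
        print(key, "=", value)
```

Flattening the inline <IT> and <SP> elements with itertext() is convenient for text processing, but it discards typography that can carry chemical meaning, so a real pipeline may prefer to keep the markup as standoff annotation.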

Page 13: SciBorg: Deep Processing and Chemical Informatics

Domain-independent language processing

• ERG (English Resource Grammar)/PET
– DELPH-IN technology (www.delph-in.net), Open Source
– LKB for grammar development (and generation), PET for fast parsing
– HPSG, stochastic ranking
– detailed lexicon, various approaches to unknown words
– max coverage about 80% on general text, tuning required for some constructions, relatively slow (100 words/sec)
– Minimal Recursion Semantics (MRS) output, converted to RMRS
• RASP 2
– Briscoe and Carroll et al
– initial POS tagging stage, symbolic grammar over tags (hand-written), stochastic ranking, no lexicon required
– robust to missing lexical entries, faster (1000 words/sec), relatively shallow
– RASP-RMRS (Deep Thought/SciBorg DELPH-IN licence)

Page 14: SciBorg: Deep Processing and Chemical Informatics

Simplified RMRS example: `the mixture was allowed to warm’

• ERG-RMRS
_the_q(h1,x2) RSTR(h1,h3) BODY(h1,h8)
_mixture_n(h3,x4) ARG1(h3,u10)
_allow_v_1(h5,e6) ARG1(h5,u11) ARG2(h5,x3) ARG3(h5,h8) qeq(h8,h7)
_warm_v(h7,e8) ARG1(h7,x4)
x2=x4

• POS-RMRS
_the_q(h1,x2)
_mixture_n(h3,x4)
_allow_v(h5,e6)
_warm_v(h7,e8)

• RASP-RMRS
_the_q(h1,x2) RSTR(h1,h3) BODY(h1,h8)
_mixture_n(h3,x4)
_allow_v(h5,e6) ARG2(h5,x3) ARG3(h5,h8) qeq(h8,h7)
_warm_v(h7,e8)
x2=x4
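The three analyses differ only in how much they specify. The sketch below encodes them as plain Python data and checks that each shallower analysis is an underspecified version of the deeper one. The dictionary layout and the two helper functions are assumptions for illustration. The check also relies on the slide's simplification that all three analyses share label and variable names; a real comparison would first have to align them.

```python
# Each analysis: predicate per label, plus argument/constraint facts as tuples.
ERG = {
    "preds": {"h1": "_the_q", "h3": "_mixture_n", "h5": "_allow_v_1", "h7": "_warm_v"},
    "facts": {
        ("RSTR", "h1", "h3"), ("BODY", "h1", "h8"),
        ("ARG1", "h3", "u10"),
        ("ARG1", "h5", "u11"), ("ARG2", "h5", "x3"), ("ARG3", "h5", "h8"),
        ("qeq", "h8", "h7"),
        ("ARG1", "h7", "x4"),
        ("eq", "x2", "x4"),
    },
}
RASP = {
    "preds": {"h1": "_the_q", "h3": "_mixture_n", "h5": "_allow_v", "h7": "_warm_v"},
    "facts": {
        ("RSTR", "h1", "h3"), ("BODY", "h1", "h8"),
        ("ARG2", "h5", "x3"), ("ARG3", "h5", "h8"),
        ("qeq", "h8", "h7"),
        ("eq", "x2", "x4"),
    },
}
POS = {
    "preds": {"h1": "_the_q", "h3": "_mixture_n", "h5": "_allow_v", "h7": "_warm_v"},
    "facts": set(),
}


def pred_subsumes(shallow: str, deep: str) -> bool:
    """A shallow predicate subsumes a deep one that only adds a sense suffix."""
    return deep == shallow or deep.startswith(shallow + "_")


def is_underspecification_of(shallow: dict, deep: dict) -> bool:
    """True if everything the shallow analysis asserts also holds in the deep one."""
    preds_ok = all(
        label in deep["preds"] and pred_subsumes(pred, deep["preds"][label])
        for label, pred in shallow["preds"].items()
    )
    return preds_ok and shallow["facts"] <= deep["facts"]


if __name__ == "__main__":
    print(is_underspecification_of(POS, RASP))
    print(is_underspecification_of(RASP, ERG))
    print(is_underspecification_of(ERG, RASP))
```

Because the shallow output is a subset of the deep one under this encoding, an application written against the common representation can consume whichever analysis is available and simply gets fewer constraints from the shallower processors.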

Page 15: SciBorg: Deep Processing and Chemical Informatics

<ep cfrom='0' cto='4'><realpred lemma='some' pos='q'/><label vid='3'/><var sort='x' vid='4' pers='3' num='pl'/></ep>
<ep cfrom='0' cto='4'><gpred>part_of_rel</gpred><label vid='7'/><var sort='x' vid='4' pers='3' num='pl'/></ep>
<ep cfrom='8' cto='11'><realpred lemma='the' pos='q'/><label vid='9'/><var sort='x' vid='8' pers='3' num='pl'/></ep>
<ep cfrom='12' cto='26'><gpred>compound_rel</gpred><label vid='12'/><var sort='e' vid='14' tense='u'/></ep>
<ep cfrom='12' cto='26'><gpred>udef_q_rel</gpred><label vid='15'/><var sort='x' vid='13'/></ep>
<ep cfrom='12' cto='17'><realpred lemma='train' pos='n' sense='of'/><label vid='18'/><var sort='x' vid='13'/></ep>
<ep cfrom='18' cto='26'><realpred lemma='station' pos='n' sense='1'/><label vid='10001'/><var sort='x' vid='8' pers='3' num='pl'/></ep>
<ep cfrom='27' cto='33'><gpred>neg_rel</gpred><label vid='20'/><var sort='e' vid='22' tense='u'/></ep>
<ep cfrom='39' cto='46'><realpred lemma='check' pos='v' sense='1'/><label vid='23'/><var sort='e' vid='2' tense='past'/></ep>
<ep cfrom='47' cto='55'><gpred>unspec_loc_rel</gpred><label vid='10002'/><var sort='e' vid='26' tense='u'/></ep>
<ep cfrom='47' cto='55'><gpred>proper_q_rel</gpred><label vid='27'/><var sort='x' vid='25' pers='3' num='sg'/></ep>
<ep cfrom='47' cto='55'><gpred>dofw_rel</gpred><label vid='30'/><var sort='x' vid='25' pers='3' num='sg'/></ep>
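The elementary predications above carry character offsets (cfrom, cto) that tie each predicate back to the source text. A minimal sketch of reading them with Python's standard library follows. The slide shows bare <ep> elements, so the wrapper element name fragment is invented here, and rendering a realpred as _lemma_pos_sense is an assumption based on names such as _allow_v_1 in the simplified example.

```python
import xml.etree.ElementTree as ET

# Three of the slide's <ep> elements, copied verbatim.
EPS = """<ep cfrom='12' cto='17'><realpred lemma='train' pos='n' sense='of'/><label vid='18'/><var sort='x' vid='13'/></ep>
<ep cfrom='27' cto='33'><gpred>neg_rel</gpred><label vid='20'/><var sort='e' vid='22' tense='u'/></ep>
<ep cfrom='39' cto='46'><realpred lemma='check' pos='v' sense='1'/><label vid='23'/><var sort='e' vid='2' tense='past'/></ep>"""


def read_eps(fragment: str) -> list:
    """Turn <ep> elements into dictionaries with span, predicate, label and variable."""
    # Wrap the bare elements in a made-up root so the parser accepts them.
    root = ET.fromstring(f"<fragment>{fragment}</fragment>")
    rows = []
    for ep in root.iter("ep"):
        real = ep.find("realpred")
        if real is not None:
            # Lexical predicate: join whichever of lemma, pos, sense are present.
            parts = [real.get(k) for k in ("lemma", "pos", "sense") if real.get(k)]
            name = "_" + "_".join(parts)
        else:
            # Grammar predicate: the name is the element text.
            name = ep.findtext("gpred")
        var = ep.find("var")
        rows.append({
            "span": (int(ep.get("cfrom")), int(ep.get("cto"))),
            "pred": name,
            "label": "h" + ep.find("label").get("vid"),
            "var": var.get("sort") + var.get("vid"),
        })
    return rows


if __name__ == "__main__":
    for row in read_eps(EPS):
        print(row)
```

Keeping the character span on every predication is what allows output from different processors to be lined up against the same stretch of text.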


Page 18: SciBorg: Deep Processing and Chemical Informatics

RMRS construction

• OSCAR-3: different types of chemical compound reference mapped to simple RMRSs (analogous to nouns etc)
• POS-RMRS: tag lexicon
• RASP-RMRS: tag lexicon plus semantic rules associated with RASP rules
– no lexical subcategorization, so rely on grammar rules to provide the ARGs
– developed on basis of ERG semantic test suite
– default composition principles when no rule RMRS specified
• ERG-RMRS: converted from MRS
• Research Markup: RMRS versions of cue phrases

Page 19: SciBorg: Deep Processing and Chemical Informatics

Chemistry naming

2,4-dinitrotoluene

toluene

Trivial name: (toluene), plus additional groups (dinitro) and positions (2,4)

Alternative names: 1-methyl-2,4-dinitro-benzene, 2,4-dinitromethylbenzene, 2,4-DNT and so on

Generic references: dinitrotoluenes
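The structure of a name such as 2,4-dinitrotoluene (positions, multiplier, group, parent) can be sketched with a toy decomposition. The regular expression, the short substituent list and the function name decompose are all invented for illustration, and nothing here reflects how OSCAR-3 recognises names. The sketch handles only the simplest pattern and deliberately rejects everything else.

```python
import re

# Multiplying prefixes and how many positions each one implies.
MULTIPLIERS = {"": 1, "di": 2, "tri": 3, "tetra": 4}

# Toy grammar: locants, optional multiplier, one known group, then a parent name.
NAME = re.compile(
    r"^(?P<locants>\d+(?:,\d+)*)-"
    r"(?P<mult>di|tri|tetra)?"
    r"(?P<group>nitro|chloro|bromo)"
    r"(?P<parent>[a-z]+)$"
)


def decompose(name: str):
    """Split a simple substituted name into parts, or return None if it does not fit."""
    m = NAME.match(name)
    if m is None:
        return None
    locants = [int(n) for n in m.group("locants").split(",")]
    mult = m.group("mult") or ""
    if len(locants) != MULTIPLIERS[mult]:
        return None  # e.g. two positions but no "di" prefix
    return {
        "positions": locants,
        "count": MULTIPLIERS[mult],
        "group": m.group("group"),
        "parent": m.group("parent"),
    }


if __name__ == "__main__":
    for candidate in ("2,4-dinitrotoluene", "2,4-dinitromethylbenzene",
                      "1-methyl-2,4-dinitro-benzene", "2,4-DNT"):
        print(candidate, "->", decompose(candidate))
```

The last two alternative names fall outside the toy grammar even though they denote the same compound, which illustrates why name recognition needs more than a fixed pattern and why mapping names to a structural representation is useful.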

Page 20: SciBorg: Deep Processing and Chemical Informatics

Chemistry Markup Language (CML, Murray-Rust et al)

• Language for formal, precise specification of organic chemistry structures in XML

• Language being actively extended
• Markup of chemistry papers with CML
• Already extensive online appendices to chemistry papers (spectra etc)
• Authoring tools for checking papers (e.g., checking that name used matches with spectrum)
• OSCAR-3: identification of productive chemistry terms and conversion to CML
• OSCAR-3: now in use by RSC journal publications

Page 21: SciBorg: Deep Processing and Chemical Informatics

Oscar Annotations

• We use Oscar3 to identify possible chemical terms (and formatted data sections)

• Interpretations:
– {compound, element, substance} -> nominal lexical entry (possibly plural)
– reaction (e.g., methylate) -> verb (or nominalisation)
• Ambiguity: e.g., lead, In
• High recall, low precision mode: treat as token and sense ambiguity for ERG (and RASP?)
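A minimal sketch of the high-recall idea: when a token may be a chemical term, keep that reading alongside the general-language readings and let the parser choose. The type-to-category table follows the interpretations listed on the slide. The function name, the tuple layout and the sample readings are assumptions for illustration, not the SciBorg data format.

```python
# Mapping from the slide's annotation types to a coarse lexical category.
OSCAR_TO_CATEGORY = {
    "compound": "noun",
    "element": "noun",
    "substance": "noun",
    "reaction": "verb",
}


def token_alternatives(token, oscar_type, general_readings):
    """List every reading of a token: general-language ones plus any chemical one."""
    alternatives = [(token, category, "general") for category in general_readings]
    if oscar_type is not None:
        alternatives.append(
            (token, OSCAR_TO_CATEGORY[oscar_type], "chemical:" + oscar_type)
        )
    return alternatives


if __name__ == "__main__":
    # "lead" and "In" are the slide's ambiguity examples.
    print(token_alternatives("lead", "element", ["verb", "noun"]))
    print(token_alternatives("In", "element", ["preposition"]))
    print(token_alternatives("stirred", None, ["verb"]))
```

Nothing is discarded at this stage, so precision is low, but a later component with more context can rule out the chemical reading of "In" at the start of a sentence.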

Page 22: SciBorg: Deep Processing and Chemical Informatics

Research Markup for e-chemistry

• Better, rhetorically oriented search
– “Find me contradictory claims to the ones in that paper”
• Improve automatic indexing (e.g. CiteSeer)
– At-a-glance map shows type of rhetorical relations between papers
– Automatic classification rather than human perusing of each citation context
• Which citations are more important in the paper?
• What is the authors’ stance towards them?
• Find “schools of thought”
• Difference and similarity-oriented summaries

Page 23: SciBorg: Deep Processing and Chemical Informatics

Research markup

Synthesis of pyrazole and pyrimidine Tröger’s base analogues

Rodrigo<SURNAME>Abonia</SURNAME></NAME></AUTHOR><AUTHOR ID="2"><NAME>Andrea<SURNAME>Albornoz</SURNAME></NAME></AUTHOR><AUTHOR

ID="3"><NAME>Hector<SURNAME>Larrahondo</SURNAME></NAME></AUTHOR><AUTHOR ID="4"><NAME>Jairo<SURNAME>Quiroga</SURNAME></NAME></AUTHOR><AUTHOR

ID="5"><NAME>Braulio<SURNAME>Insuasty</SURNAME></NAME></AUTHOR><AUTHOR ID="6"><NAME>Henry<SURNAME>Insuasty</SURNAME></NAME></AUTHOR><AUTHOR

ID="7"><NAME>Angelina<SURNAME>Hormaza</SURNAME></NAME></AUTHOR><AUTHOR ID="8"><NAME>Adolfo<SURNAME>Sánchez</SURNAME></NAME></AUTHOR><AUTHOR

ID="9"><NAME>Manuel<SURNAME>Nogueras</SURNAME></NAME></AUTHOR></AUTHORLIST><ABSTRACT>Tröger's-base analogues bearing fused pyrazolic or pyrimidinic rings were prepared in acceptable to good yields through the reaction of 3-alkyl-5-amino-1-arylpyrazoles and 6-aminopyrimidin-

4(3<IT>H</IT>)-ones with formaldehyde under mild conditions (<IT>i.e.</ IT>, in ethanol at 50 °C in the presence of catalytic amounts of acetic acid). Two key intermediates were isolated from the reaction mixtures, which helped us to suggest a sequence of steps for the formation of the Tröger's bases obtained.

The structures of the products were assigned by <SP>1</SP>H and <SP>13</SP>C NMR, mass spectra and elemental analysis and confirmed by X-ray diffraction for one of the obtained compounds.</ABSTRACT><BODY> <DIV DEPTH="1"><HEADER>Introduction</HEADER>

<P>Although the first Tröger's base <XREF ID="chem1" TYPE="COMPOUND">1</XREF> was obtained more than a century ago from the reaction of <IT>p</IT>-toluidine and formaldehyde,<REF ID="cit1" TYPE="P">1</REF> recently the study of these compounds has gained importance due to their potential applications. They possess a relatively rigid chiral structure which makes them suitable for the development of possible synthetic enzyme and artificial receptor systems,<REF ID="cit2" TYPE="P">2</REF> chelating and biomimetic systems,<REF ID="cit3" TYPE="P">3</REF> and transition metal

complexes for regio- and stereoselective catalytic reactions.<REF ID="cit4" TYPE="P">4</REF> For these reasons, numerous Tröger's-base derivatives have been prepared bearing different types of substituents and structures (<IT>i.e.</IT>, <XREF ID="chem2 chem3 chem4 chem5" TYPE="COMPOUND">2–

5</XREF> Scheme 1), with the purpose of increasing their potential applications.<REF ID="cit2 cit3 cit5" TYPE="P">2,3,5</REF> However, some of the above methodologies possess tedious work-up procedures or include relatively strong reaction conditions, such as treatment of the starting materials for several hours with an ethanolic solution of conc. hydrochloric acid or TFA solution, with poor to moderate yields, as is the case for analogues <XREF ID="chem4" TYPE="COMPOUND">4</XREF> and <XREF ID="chem5" TYPE="COMPOUND">5</XREF>.<REF

ID="cit5e" TYPE="P">5<IT>e</ IT></REF></P> <P>Considering these potential applications, we now report a simple synthetic method for the preparation of 5,12-dialkyl-3,10-diaryl-1,3,4,8,10,11-

hexaazatetracyclo[6.6.1.0<SP>2,6</SP>.0<SP>9,13</SP>]pentadeca-2(6),4,9(13),11-tetraenes <XREF ID="chem8a chem8b chem8c chem8d chem8e" TYPE="COMPOUND">8a–e</XREF> and 4,12-dimethoxy-1,3,5,9,11,13-hexaazatetracyclo[7.7.1.0<SP>2,7</SP>.0<SP>10,15</SP>]heptadeca-

2(7),3,10(15),11-tetraene-6,14-diones <XREF ID="chem10a chem10b" TYPE="COMPOUND">10a,b</XREF> based on the reaction of 3-alkyl-5-amino-1-arylpyrazoles <XREF ID="chem6" TYPE="COMPOUND">6</XREF> and 6-aminopyrimidin-4(3<IT>H</IT>)-ones <XREF ID="chem9" TYPE="COMPOUND">9</XREF> with formaldehyde in ethanol and catalytic amounts of acetic acid. Compounds <XREF ID="chem8"

TYPE="COMPOUND">8</XREF> and <XREF ID="chem10" TYPE="COMPOUND">10</XREF> are new Tröger's-base analogues bearing heterocyclic rings instead of the usual phenyl rings in their aromatic parts.</P> </DIV> <DIV DEPTH="1"><HEADER>Results and discussion</HEADER>

<P>In an attempt to prepare the benzotriazolyl derivative <XREF ID="chem7a" TYPE="COMPOUND">7a</XREF>, which could be used as an intermediate in the synthesis of new hydroquinoline analogues of interest,<REF ID="cit6" TYPE="P">6</REF> a mixture of 5-amino-3-methyl-1-

phenylpyrazole <XREF ID="chem6a" TYPE="COMPOUND">6a</XREF>, formaldehyde and benzotriazole in 10 mL of ethanol, with catalytic amounts of acetic acid, was heated at 50 °C for 5 minutes. A solid precipitated from the solution while it was still hot. However, no consumption of benzotriazole was observed

by TLC.</P> <P>The reaction conditions were modified and the same product was obtained when the reaction was carried out without using benzotriazole, as shown in Chart 1. On the basis of NMR and mass spectra and X-ray crystallographic analysis we established that the structure of this compound is 5,12-dimethyl-3,10-diphenyl-1,3,4,8,10,11-hexaazatetracyclo[6.6.1.0<SP>2,6</SP>.0<SP>9,13</SP>]pentadeca-2(6),4,9(13),11-tetraene

<XREF ID="chem8a" TYPE="COMPOUND">8a</XREF>, a new pentagonal Tröger's-base analogue.<REF ID="cit7" TYPE="P">7</REF> This result prompted us to explore other aminopyrazoles <XREF ID="chem6a chem6b chem6c chem6d chem6e" TYPE="COMPOUND">6b–e</XREF> and aminopyrimidinones<XREF ID="chem9a chem9b" TYPE="COMPOUND">9a,b</XREF>, which have now shown similar chemical reactivity, yielding the corresponding products

<XREF ID="chem8a chem8b chem8c chem8d chem8e" TYPE="COMPOUND">8b–e</XREF> and <XREF ID="chem10a chem10b" TYPE="COMPOUND">10a,b</XREF>in acceptable to good yields and in relatively short reaction times, as shown in Table 1 and the Experimental

section.</P> <P>In the preparation of <XREF ID="chem8e" TYPE="COMPOUND">8e</XREF>, a yellow and sparingly soluble precipitate was initially obtained under the above conditions, and which corresponded to the

partially cyclized intermediate <XREF ID="chem11e" TYPE="COMPOUND">11e</XREF> (Chart 2). Heating of <XREF ID="chem11e" TYPE="COMPOUND">11e</XREF> for one hour with more formaldehyde (1.5 equivalents) in ethanol (˜ 20 mL), until complete dissolution, yielded the

expected product <XREF ID="chem8e" TYPE="COMPOUND">8e</XREF> in 70% yield. Compound <XREF ID="chem8e" TYPE="COMPOUND">8e</XREF> was directly obtained in 67% yield, by heating of the starting materials in 20–30 mL of ethanol without precipitation of <XREF ID="chem11e"

TYPE="COMPOUND">11e</XREF>. A similar result was obtained from the reaction of the aminopyrimidine <XREF ID="chem9c" TYPE="COMPOUND">9c</XREF> with formaldehyde, but in this case it was impossible to cyclize the intermediate <XREF ID="chem12"

TYPE="COMPOUND">12</XREF> to <XREF ID="chem10c" TYPE="COMPOUND">10c</XREF> under our experimental conditions, due to its poor solubility (Chart 2). Some compounds of type <XREF ID="chem11" TYPE="COMPOUND">11</XREF>(<XREF ID="chem12" TYPE="COMPOUND">12</XREF>) have previously been obtained from similar reactions.<REF ID="cit8" TYPE="P">8</REF></P> <P>All compounds

were extensively characterized by <SP>1</SP>H and <SP>13</SP>C NMR spectra (including DEPT, COSY and HMBC techniques)<REF ID="cit9" TYPE="P">9</REF> and by mass spectra and elemental analysis. All signals in the <SP>1</SP>H NMR spectrum are consistent with the structures proposed

for compounds <XREF ID="chem8" TYPE="COMPOUND">8</XREF> and <XREF ID="chem10" TYPE="COMPOUND">10</XREF>, where the most relevant feature is the non-equivalence of the geminal protons 7-/14-CH<SB>2</SB> and 8-/16-CH<SB>2</SB> respectively, each showing a geminally coupled doublet with reference to H-<IT>endo</IT> and H-<IT>exo</IT> in the framework. The bridging 15-/17-CH<SB>2</SB> protons appear as singlets, in

agreement with what has previously been observed for similar systems.<REF ID="cit5" TYPE="P">5</REF> The main feature observed in the <SP>13</SP>C NMR spectra to both compounds <XREF ID="chem8" TYPE="COMPOUND">8</XREF> and <XREF ID="chem10"

TYPE="COMPOUND">10</XREF>is the regular sequence 7-/14-C, 15-C, 6-/13-C, 2-/9-C and 8-/16-C, 17-C, 7-/15-C, 2-/10-C from high field to low field respectively, corresponding to the four carbon atoms of their concavities. The other aliphatic and aromatic carbon atoms were also assigned to both

structures. The structures of compounds <XREF ID="chem11e" TYPE="COMPOUND">11e</XREF> and <XREF ID="chem12" TYPE="COMPOUND">12</XREF> were also supported by the appearance of N–H stretchings at <IT>?</IT> = 3295 and

<IT>?</IT> = 3400, respectively, in the IR spectra and by a singlet at <IT>d</ IT> = 5.82 in the <SP>1</SP>H NMR spectrum of compound <XREF ID="chem11e" TYPE="COMPOUND">11e</XREF>. This signal corresponds to the free pyrazolic proton, which commonly appears at a higher field than a normal aromatic proton.<REF ID="cit10" TYPE="P">10</REF> Mass spectra and elemental analysis were also consistent with structures <XREF ID="chem11e" TYPE="COMPOUND">11e</XREF>and <XREF ID="chem12" TYPE="COMPOUND">12</XREF>.</P>

<P>According to these results, compounds <XREF ID="chem8a chem8b chem8c chem8d chem8e" TYPE="COMPOUND">8a–e</XREF> and <XREF ID="chem10a chem10b" TYPE="COMPOUND">10a,b</XREF> could be formed through intermediates of type <XREF ID="chem11"

TYPE="COMPOUND">11</XREF> and <XREF ID="chem12" TYPE="COMPOUND">12</XREF>, respectively, by an intramolecular cyclization from protonated alcohol <XREF ID="chem15" TYPE="COMPOUND">15</XREF> (in the case of compounds <XREF ID="chem8" TYPE="COMPOUND">8</XREF>) as shown in

Scheme 2. C-Alkylation as the first step (forming protonated alcohol <XREF ID="chem13" TYPE="COMPOUND">13</XREF>) is well supported for aminopyrazoles and aminopyrimidines.<REF ID="cit11" TYPE="P">11</REF> The presence of a 5 (or 6)-amino group increases the reactivity of position 4 or 5, respectively, toward condensation reactions. Then water is displaced by a second molecule of <XREF ID="chem6" TYPE="COMPOUND">6</XREF> through

intermediate <XREF ID="chem14" TYPE="COMPOUND">14</XREF> (not isolated) which reacts with another molecule of <XREF ID="chem6" TYPE="COMPOUND">6</XREF> to afford the isolated intermediate type <XREF ID="chem11e" TYPE="COMPOUND">11e</XREF>. The last step (conversion

of <XREF ID="chem15" TYPE="COMPOUND">15</XREF> to <XREF ID="chem8" TYPE="COMPOUND">8</XREF>) could occur under an <IT>S</IT><SB>N</SB>1 or <IT>S</ IT><SB>N</SB>2 reaction. However, it seems more likely that an <IT>S</ IT><SB>N</SB>1 reaction occurred,

according to the reaction conditions used. This proposed sequence is also supported by the lack of formation of compound <XREF ID="chem7a" TYPE="COMPOUND">7a</XREF> (Chart 1). In fact, if N-alkylation had been the first step instead of C-alkylation, compound <XREF ID="chem7a"

TYPE="COMPOUND">7a</XREF> would have certainly been the only product obtained from this reaction, as is usually the case.<REF ID="cit6" TYPE="P">6</REF></P> <P>In conclusion, we have adapted milder and more efficient reaction conditions (in

comparison with the previous report)<REF ID="cit5e" TYPE="P">5<IT>e</IT></REF> for the synthesis of five new pyrazole and two new pyrimidine Tröger's-base analogues. This methodology could be extended to other starting monoamines for Tröger's bases, and the newly obtained compounds offer further possibilities for potential applications, considering that only a few examples of Tröger's bases bearing heterocyclic rings instead of the usual phenyl

group in their aromatic part have previously been reported.<REF ID="cit5e" TYPE="P">5<IT>e</IT></REF> Also, we have reported the isolation of two key intermediates from the reaction mixtures (<IT>i.e.</ IT>, compounds <XREF ID="chem11e" TYPE="COMPOUND">11e</XREF> and <XREF ID="chem12"

TYPE="COMPOUND">12</XREF>), which helped us to suggest a sequence of steps for the formation of the Tröger's bases obtained. Similar findings previously reported support this proposal.<REF ID="cit8" TYPE="P">8</REF> Finally, owing to the high content of nitrogen atoms in compounds <XREF

ID="chem8 chem9 chem10" TYPE="COMPOUND">8–10</XREF>, we are planning to try some of them as possible mono- or bidentate ligands in the synthesis of interesting transition metal clusters of some of the group eight metals (<IT>i.e.</ IT>, Fe, Ru and Os), as recently has been reported for other

homo- and heterocyclic organic molecules.<REF ID="cit12" TYPE="P">12</REF></P> </DIV> <DIV DEPTH="1"><HEADER>Experimental</HEADER> <DIV DEPTH="2"><HEADER>General methods</HEADER>

<P>All melting points were determined on a Büchi melting-point apparatus and are uncorrected. NMR spectra were recorded on a Bruker DPX 300 (300 MHz and 75.5 MHz, for <SP>1</SP>H and <SP>13</SP>C, respectively), CDCl<SB>3</SB> and DMSO-<IT>d</ IT><SB>6</SB>

(<IT>d</ IT><SB>H</SB> = 2.5; <IT>d</IT><SB>C</SB> = 39.5) as solvents, TMS as internal standard. IR spectra were recorded on an ATI-MATTSON FT spectrophotometer for samples in KBr discs. Mass spectra were run on a Hewlett

Packard 5989-B spectrometer (EI , 70 eV). Microanalyses were performed with a LECO CHNS-900 elemental analyzer. The starting aminopyrazoles <XREF ID="chem6" TYPE="COMPOUND">6</XREF> were prepared from 3-aminocrotononitrile and the appropriate phenylhydrazine following a general procedure described in ref. 13. For <IT>tert</IT>-butyl derivatives 4,4-dimethyl-3-oxopentanonitrile was used instead of the 3-aminocrotononitrile. Aminopyrimidines

<XREF ID="chem9a chem9b chem9c" TYPE="COMPOUND">9a–c</XREF>were prepared following the procedure described in ref. 14.</P></DIV> <DIV DEPTH="2"><HEADER>General procedure for preparing the compounds <XREF ID="chem8a chem8b chem8c chem8d

chem8e" TYPE="COMPOUND">8a–e</XREF> and <XREF ID="chem10a chem10b" TYPE="COMPOUND">10a,b</XREF></HEADER><P>A solution of a 5-amino-3-alkyl-1-arylpyrazole <XREF ID="chem6" TYPE="COMPOUND">6</XREF> (2.89 mmol), formaldehyde (10.0 mmol;

37% solution) and acetic acid (0.2–0.5 mL) in 10–30 mL of ethanol was heated to 50 °C for 30–90 minutes and monitored by TLC. After cooling, the precipitate was filtered off, and recrystallized from ethanol or alternatively purified by column chromatography on silica gel with chloroform as eluent. The same procedure was followed for compounds <XREF ID="chem10a chem10b" TYPE="COMPOUND">10a,b</XREF> by using 50 mL of ethanol and heating the mixtures for two hours.</P>

<DIV DEPTH="3"><HEADER>5,12-Dimethyl-3,10-diphenyl-1,3,4,8,10,11-hexaazatetracyclo[6.6.1.0<SP>2,6</SP>.0<SP>9,13</SP>]pentadeca-2(6),4,9(13),11-tetraene <XREF ID="chem8a" TYPE="COMPOUND">8a</XREF></HEADER><P>White solid (Found: C, 72.1; H, 5.9; N, 22.1. C<SB>23</SB>H<SB>22</SB>N<SB>6</SB> requires C, 72.2; H, 5.8; N, 22.0%); <IT>ν</IT><SB>max</SB>(disc)/cm<SP>–1</SP> 1593br and 1498br; <IT>δ</IT><SB>H</SB>(300 MHz; DMSO-<IT>d</IT><SB>6</SB>; Me<SB>4</SB>Si) 1.92 (6 H, s, 5-/12-Me), 3.56 (2 H, d, <IT>J</IT><SB><IT>gem</IT></SB> 15.7, 7-/14-H<SB><IT>endo</IT></SB>), 4.26 (2 H, d, <IT>J</IT><SB><IT>gem</IT></SB> 15.7, 7-/14-H<SB><IT>exo</IT></SB>), 4.31 (2 H, s, 15-H<SB>2</SB>), 7.29 (2 H, t, <IT>J</IT> 7.4, Ph–H<SB><IT>para</IT></SB>), 7.50 (4 H, br t, <IT>J</IT> 7.9, Ph–H<SB><IT>meta</IT></SB>) and 7.96 (4 H, d, <IT>J</IT> 8.7, Ph–H<SB><IT>ortho</IT></SB>); <IT>δ</IT><SB>C</SB>(75 MHz; DMSO-<IT>d</IT><SB>6</SB>; Me<SB>4</SB>Si) 12.1 (5-/12-Me), 47.4 (7-/14-C), 67.5 (15-C), 104.3 (6-/13-C), 119.7 (Ph–C<SB><IT>meta</IT></SB>), 125.5 (Ph–C<SB><IT>para</IT></SB>), 129.3 (Ph–C<SB><IT>ortho</IT></SB>), 139.2 (Ph–C<SB><IT>ipso</IT></SB>), 144.7 (2-/9-C) and 144.9 (5-/12-C); <IT>m</IT>/<IT>z</IT> (EI) 382 (M<SP>+</SP>, 100%), 354 (23), 198 (53), 77 (30).</P> </DIV>

<DIV DEPTH="3"><HEADER>5,12-Di-(<IT>tert</IT>-butyl)-3,10-diphenyl-1,3,4,8,10,11-hexaazatetracyclo[6.6.1.0<SP>2,6</SP>.0<SP>9,13</SP>]pentadeca-2(6),4,9(13),11-tetraene <XREF ID="chem8b" TYPE="COMPOUND">8b</XREF></HEADER><P>White solid (Found: C, 74.7; H, 7.25; N, 18.15. C<SB>29</SB>H<SB>34</SB>N<SB>6</SB> requires C, 74.65; H, 7.3; N, 18.0%); <IT>ν</IT><SB>max</SB>(disc)/cm<SP>–1</SP> 1598br and 1500br; <IT>δ</IT><SB>H</SB>(300 MHz; CDCl<SB>3</SB>; Me<SB>4</SB>Si) 1.15 (18 H, s, 5-/12-Bu<SP><IT>t</IT></SP>), 3.82 (2 H, d, <IT>J</IT><SB><IT>gem</IT></SB> 15.6, 7-/14-H<SB><IT>endo</IT></SB>), 4.23 (2 H, s, 15-H<SB>2</SB>), 4.25 (2 H, d, <IT>J</IT><SB><IT>gem</IT></SB> 15.6, 7-/14-H<SB><IT>exo</IT></SB>), 7.24 (2 H, t, <IT>J</IT> 7.3, Ph–H<SB><IT>para</IT></SB>), 7.43 (4 H, t, <IT>J</IT> 7.5, Ph–H<SB><IT>meta</IT></SB>) and 7.95 (4 H, d, <IT>J</IT> 8.3, Ph–H<SB><IT>ortho</IT></SB>); <IT>δ</IT><SB>C</SB>(75 MHz; CDCl<SB>3</SB>; Me<SB>4</SB>Si) 29.4 (5-/12-Bu<SP><IT>t</IT></SP>- × 6C), 33.2 (5-/12-Bu<SP><IT>t</IT></SP>- × 2C), 49.6 (7-/14-C), 68.1 (15-C), 102.4 (6-/13-C), 121.3 (Ph–C<SB><IT>meta</IT></SB>), 125.8 (Ph–C<SB><IT>para</IT></SB>), 129.1 (Ph–C<SB><IT>ortho</IT></SB>), 139.8 (Ph–C<SB><IT>ipso</IT></SB>), 145.3 (2-/9-C) and 157.0 (5-/12-C); <IT>m</IT>/<IT>z</IT> (EI) 466 (M<SP>+</SP>, 100%), 438 (18), 240 (60), 77 (26).</P> </DIV>

<DIV DEPTH="3"><HEADER>3,10-Bis-(<IT>p</IT>-chlorophenyl)-5,12-dimethyl-1,3,4,8,10,11-hexaazatetracyclo[6.6.1.0<SP>2,6</SP>.0<SP>9,13</SP>]pentadeca-2(6),4,9(13),11-tetraene <XREF ID="chem8c" TYPE="COMPOUND">8c</XREF></HEADER><P>Pale yellow solid (Found: C, 61.1; H, 4.55; N, 18.7. C<SB>23</SB>H<SB>20</SB>Cl<SB>2</SB>N<SB>6</SB> requires C, 61.2; H, 4.5; N, 18.6%); <IT>ν</IT><SB>max</SB>(disc)/cm<SP>–1</SP> 1602br and 1493br; <IT>δ</IT><SB>H</SB>(300 MHz; CDCl<SB>3</SB>; Me<SB>4</SB>Si) 2.02 (6 H, s, 5-/12-Me), 3.58 (2 H, d, <IT>J</IT><SB><IT>gem</IT></SB> 15.6, 7-/14-H<SB><IT>endo</IT></SB>), 4.14 (2 H, d, <IT>J</IT><SB><IT>gem</IT></SB> 16.1, 7-/14-H<SB><IT>exo</IT></SB>), 4.21 (2 H, s, 15-H<SB>2</SB>), 7.39 (4 H, d, <IT>J</IT> 8.3, Ar–H<SB><IT>meta</IT></SB>) and 7.92 (4 H, d, <IT>J</IT> 8.3, Ar–H<SB><IT>ortho</IT></SB>); <IT>δ</IT><SB>C</SB>(75 MHz; CDCl<SB>3</SB>; Me<SB>4</SB>Si) 13.3 (5-/12-Me), 49.2 (7-/14-C), 69.5 (15-C), 105.6 (6-/13-C), 123.2 (Ar–C<SB><IT>meta</IT></SB>), 130.8 (Ar–C<SB><IT>ortho</IT></SB>), 132.6 (Ar–C<SB><IT>para</IT></SB>), 139.4 (Ar–C<SB><IT>ipso</IT></SB>), 146.4 (2-/9-C) and 147.3 (5-/12-C); <IT>m</IT>/<IT>z</IT> (EI) 454/452/450 (M<SP>+</SP>, Cl<SB>2</SB> pattern, 100%), 422 (31), 232 (48), 111 (17).</P> </DIV>

<DIV DEPTH="3"><HEADER>5,12-Di-(<IT>tert</IT>-butyl)-3,10-bis-(<IT>p</IT>-chlorophenyl)-1,3,4,8,10,11-hexaazatetracyclo[6.6.1.0<SP>2,6</SP>.0<SP>9,13</SP>]pentadeca-2(6),4,9(13),11-tetraene <XREF ID="chem8d" TYPE="COMPOUND">8d</XREF></HEADER><P>Light pink solid (Found: C, 65.15; H, 5.9; N, 13.2. C<SB>29</SB>H<SB>32</SB>Cl<SB>2</SB>N<SB>6</SB> requires C, 65.0; H, 6.0; N, 13.2%); <IT>ν</IT><SB>max</SB>(disc)/cm<SP>–1</SP> 1590br and 1502br; <IT>δ</IT><SB>H</SB>(300 MHz; CDCl<SB>3</SB>; Me<SB>4</SB>Si) 1.15 (18 H, s, 5-/12-Bu<SP><IT>t</IT></SP>), 3.78 (2 H, d, <IT>J</IT><SB><IT>gem</IT></SB> 16.1, 7-/14-H<SB><IT>endo</IT></SB>), 4.20 (2 H, s, 15-H<SB>2</SB>), 4.26 (2 H, d, <IT>J</IT><SB><IT>gem</IT></SB> 15.6, 7-/14-H<SB><IT>exo</IT></SB>), 7.38 (4 H, d, <IT>J</IT> 8.8, Ar–H<SB><IT>meta</IT></SB>) and 7.93 (4 H, d, <IT>J</IT> 8.8, Ar–H<SB><IT>ortho</IT></SB>); <IT>δ</IT><SB>C</SB>(75 MHz; CDCl<SB>3</SB>; Me<SB>4</SB>Si) 29.3 (5-/12-Bu<SP><IT>t</IT></SP>- × 6C), 33.3 (5-/12-Bu<SP><IT>t</IT></SP>- × 2C), 49.6 (7-/14-C), 69.9 (15-C), 102.6 (6-/13-C), 122.0 (Ar–C<SB><IT>meta</IT></SB>), 129.3 (Ar–C<SB><IT>ortho</IT></SB>), 131.1 (Ar–C<SB><IT>para</IT></SB>), 133.3 (Ar–C<SB><IT>ipso</IT></SB>), 145.3 (2-/9-C) and 157.4 (5-/12-C); <IT>m</IT>/<IT>z</IT> (EI) 538/536/534 (M<SP>+</SP>, Cl<SB>2</SB> pattern, 100%), 506 (28), 274 (45), 111 (18).</

Rodrigo Abonia, Andrea Albornoz, Hector Larrahondo, Jairo Quiroga, Braulio Insuasty, Henry Insuasty, Angelina Hormaza, Adolfo Sánchez, Manuel Nogueras

J. Chem. Soc., Perkin Trans. 1, 2002, 1588–1591. DOI: 10.1039/b200862a. This journal is © The Royal Society of Chemistry 2002

Tröger's-base analogues bearing fused pyrazolic or pyrimidinic rings were prepared in acceptable to good yields through the reaction of 3-alkyl-5-amino-1-arylpyrazoles and 6-aminopyrimidin-4(3H)-ones with formaldehyde under mild conditions (i.e., in ethanol at 50 °C in the presence of catalytic amounts of acetic acid). Two key intermediates were isolated from the reaction mixtures, which helped us to suggest a sequence of steps for the formation of the Tröger's bases obtained. The structures of the products were assigned by ¹H and ¹³C NMR, mass spectra and elemental analysis and confirmed by X-ray diffraction for one of the obtained compounds.

Introduction

Although the first Tröger's base 1 was obtained more than a century ago from the reaction of p-toluidine and formaldehyde,[1] recently the study of these compounds has gained importance due to their potential applications. They possess a relatively rigid chiral structure which makes them suitable for the development of possible synthetic enzyme and artificial receptor systems,[2] chelating and biomimetic systems,[3] and transition metal complexes for regio- and stereoselective catalytic reactions.[4]

For these reasons, numerous Tröger's-base derivatives have been prepared bearing different types of substituents and structures (i.e., 2–5, Scheme 1), with the purpose of increasing their potential applications.[2,3,5]

However, some of the above methodologies possess tedious work-up procedures or include relatively strong reaction conditions, such as treatment of the starting materials for several hours with an ethanolic solution of conc. hydrochloric acid or TFA solution, with poor to moderate yields, as is the case for analogues 4 and 5.

Considering these potential applications, we now report a simple synthetic method for the preparation of 5,12-dialkyl-3,10-diaryl-1,3,4,8,10,11-hexaazatetracyclo[6.6.1.0^{2,6}.0^{9,13}]pentadeca-2(6),4,9(13),11-tetraenes 8a–e and 4,12-dimethoxy-1,3,5,9,11,13-hexaazatetracyclo[7.7.1.0^{2,7}.0^{10,15}]heptadeca-2(7),3,10(15),11-tetraene-6,14-diones 10a,b based on the reaction of 3-alkyl-5-amino-1-arylpyrazoles 6 and 6-aminopyrimidin-4(3H)-ones 9 with formaldehyde in ethanol and catalytic amounts of acetic acid. Compounds 8 and 10 are new Tröger's-base analogues bearing heterocyclic rings instead of the usual phenyl rings in their aromatic parts.

Results and discussion

In an attempt to prepare the benzotriazolyl derivative 7a, which could be used as an intermediate in the synthesis of new hydroquinoline analogues of interest,[6] a mixture of 5-amino-3-methyl-1-phenylpyrazole 6a, formaldehyde and benzotriazole in 10 mL of ethanol, with catalytic amounts of acetic acid, was heated at 50 °C for 5 minutes. A solid precipitated from the solution while it was still hot. However, no consumption of benzotriazole was observed by TLC.

The reaction conditions were modified and the same product was obtained when the reaction was carried out without using benzotriazole, as shown in Chart 1. On the basis of NMR and mass spectra and X-ray crystallographic analysis we established that the structure of this compound is 5,12-dimethyl-3,10-diphenyl-1,3,4,8,10,11-hexaazatetracyclo[6.6.1.0^{2,6}.0^{9,13}]pentadeca-2(6),4,9(13),11-tetraene 8a, a new pentagonal Tröger's-

Legend: Background, Other, Own, Based, Contrast, Textual, Aim

Page 24: SciBorg: Deep Processing and Chemical Informatics

Research markup

• Chemistry: The primary aims of the present study are (i) the synthesis of an amino acid derivative that can be incorporated into proteins via standard solid-phase synthesis methods, and (ii) a test of the ability of the derivative to function as a photoswitch in a biological environment.

• Computational Linguistics: The goal of the work reported here is to develop a method that can automatically refine the Hidden Markov Models to produce a more accurate language model.

Page 25: SciBorg: Deep Processing and Chemical Informatics

RMRS and research markup

• Specify cues in RMRS: e.g.,
  – l1:objective(x), ARG1(l1,y), l2:research(y)
  – The concept objective generalises the predicates for aim, goal etc., and research generalises study, work etc. Ontology for rhetorical structure.
• Deep process possible cue phrases to get RMRSs:
  – feasible because domain-independent
  – more general and reliable than shallow techniques
  – allows for complex interrelationships, e.g., our goal is not to ... but to ...
• Use zones for advanced citation maps (e.g., X cites Y (contrast)) and other enhancements to repositories
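
The cue on this slide can be read as a small pattern over RMRS elementary predications and their argument relations. The Python sketch below shows one way such a match might be checked. It is only an illustration: the `EP` and `Arg` classes, the `GENERALISES` table and the function names are assumptions made for this example and are not taken from the SciBorg code or from any RMRS library.

```python
from dataclasses import dataclass

# Hypothetical generalisation table. The slide names only aim/goal and
# study/work; a real system would draw these from an ontology.
GENERALISES = {
    "aim": "objective",
    "goal": "objective",
    "study": "research",
    "work": "research",
}


@dataclass(frozen=True)
class EP:
    """An elementary predication, written label:pred(var) on the slide."""
    label: str
    pred: str
    var: str


@dataclass(frozen=True)
class Arg:
    """An argument relation, written ROLE(label, var) on the slide."""
    role: str
    label: str
    var: str


def concept(pred):
    """Map a predicate to its generalising concept, if one is listed."""
    return GENERALISES.get(pred, pred)


def matches_aim_cue(eps, args):
    """Check for l1:objective(x), ARG1(l1,y), l2:research(y)."""
    objective_labels = {ep.label for ep in eps if concept(ep.pred) == "objective"}
    research_vars = {ep.var for ep in eps if concept(ep.pred) == "research"}
    return any(
        a.role == "ARG1" and a.label in objective_labels and a.var in research_vars
        for a in args
    )


# Hand-built fragment for "The goal of the work reported here is ..."
eps = [EP("l1", "goal", "x1"), EP("l2", "work", "x2")]
args = [Arg("ARG1", "l1", "x2")]
found = matches_aim_cue(eps, args)
```

Because arguments are stored separately from predications, a shallower analysis that supplies the predications but not the `ARG1` link would simply fail to match rather than cause an error.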

Page 26: SciBorg: Deep Processing and Chemical Informatics

Conclusion: extending technology in several ways

• SciXML (and standoff)
  – general framework for scientific texts
• more extensive and more varied IE-like operations
  – support for scientific discourse processing
  – ontology extraction
• finer-grained deep-shallow integration
  – deep cue phrase analysis
• unusual NER-like processing for chemistry with OSCAR3
• discourse level processing with DELPH-IN technology
  – anaphora, WSD, citations and research markup

Page 27: SciBorg: Deep Processing and Chemical Informatics

Status of SciBorg aims

1. NL markup language (RMRS). Basic architecture for text processing in place (SciXML, standoff, lattices, OSCAR-3, RASP2 and ERG/PET). Next steps:
  – debugging scripts, regression test sets
  – Treebank with ERG (maybe use for evaluating RASP ranking too?)
  – RMRS lattices from packed representations?
  – use of CamGrid (coarse-grained parallelism)
2. IE technology and core ontologies. OSCAR-3 in use by RSC.
  – Initial experiments with ontology extraction based on RASP-RMRS from Wikipedia (Aurelie Herbelot).
3. Model scientific argumentation and citation purpose. Finding rhetorical cues with aid of RMRS (so far in CL papers only).
4. Applicability in a real-world eScience environment.
  – Partial change in emphasis to using technology for authoring support, based on publishers’ interests.

Page 28: SciBorg: Deep Processing and Chemical Informatics

Using external ontologies

• concepts like research generalizing study, work etc.: automatic acquisition? (machine learning or FrameNet)
• IE is ontologically driven (some ontologies exist for Chemistry, but not as rich as biology, hence the need to augment)
• chemical naming provides implicit ontology
• ontologies bootstrapping ontology acquisition
• CML target for IE tasks
• classification of trivial chemistry names etc.