SWG Strategy (C) Copyright IBM Corp. 2006, 2012. All Rights Reserved. International Technology Alliance Programme: Fact Extraction using a Controlled Natural Language David Mott, Dave Braines, ETS, Hursley, IBM UK

Page 1:

SWG Strategy

(C) Copyright IBM Corp. 2006, 2012. All Rights Reserved.

International Technology Alliance Programme:

Fact Extraction using a Controlled Natural Language

David Mott, Dave Braines, ETS, Hursley, IBM UK

Page 2:

SWG Strategy – Emerging Technology Services, Hursley

(C) Copyright IBM Corp. 2006, 2011. All Rights Reserved.

Team

Dave Braines, David Mott

– IBM, Hursley

Steve Poteet, Ping Xue, Anne Kao

– Boeing, Seattle

Paul Smart, Antonio Penta, Ron Tasker

– University of Southampton

Page 3:

International Technology Alliance (ITA) in network and information sciences

How can coalition operations be assisted by networks of computer systems?

US/UK Academic/Industry collaboration

10-year programme ending in May 2016

– Sponsored by UK MOD and US ARL

– Research must be scientific, fundamental, reviewed by academic peers, and published

Page 4:

ITA Consortium Members

Page 5:

Fundamental Research Issues

How do we assist people to create and use applications that reason?

– Modelling concepts, relationships and rules of inference

– Grasping the basic logic of the model and rules

– Understanding the reasoning performed by others

– Sharing understanding across the human team

– Sharing reasoning and artefacts across different systems

Page 6:

Supporting the "analyst"

[Diagram labels: documents (doc27), NLP, CE Facts, Inference, Rationale, Argumentation, Query, Analyst's Conceptual Model, Assumptions, Uncertainty, CNL Tools, Requirements, Product]

Page 7:

Analyst's "Conceptual Model"

Analyst represents specialist knowledge as concepts, facts and rules for inference

– a conceptual model

– a common set of concepts

The system must "understand" the conceptual model

– assist analyst to search for patterns, deduce information

A language to build the conceptual model

– analyst: easy to understand

– system: readable, unambiguous and formal

We use Controlled English to express the model

Page 8:

Controlled English

A Controlled Natural Language, being a subset of English

– limited syntax, but still readable as English

– meanings of the expressions unambiguously defined

Avoids the complexity of a real Natural Language

– computer systems can read, interpret and apply it

Retains the appearance of a real language

– humans can naturally use it, without learning "computer speak"

The analyst may use Controlled English to construct their Conceptual Model

the person John is married to the person Jane and has red as hair colour.

Based on work by John Sowa
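
As a rough illustration of why a limited syntax is easy for a computer to read, the example sentence above can be broken into facts with a single pattern. This is only a sketch in Python: the pattern, the function name `parse_ce` and the triple layout are assumptions made for illustration, cover only this one sentence shape, and are not the CE Store implementation.

```python
import re

# Hypothetical pattern for one sentence shape only:
# "the <concept> <name> <relation> the <concept> <name> and has <value> as <property>."
CE_PATTERN = re.compile(
    r"^the (?P<c1>\w+) (?P<n1>\w+) (?P<rel>.+?) the (?P<c2>\w+) (?P<n2>\w+)"
    r" and has (?P<val>\w+) as (?P<prop>.+?)\.$"
)

def parse_ce(sentence):
    """Return (subject, relation, object) triples for the toy sentence shape."""
    match = CE_PATTERN.match(sentence.strip())
    if match is None:
        raise ValueError("sentence is outside this toy CE subset")
    g = match.groupdict()
    return [
        (g["n1"], "is a", g["c1"]),        # John is a person
        (g["n2"], "is a", g["c2"]),        # Jane is a person
        (g["n1"], g["rel"], g["n2"]),      # the relationship between the two
        (g["n1"], g["prop"], g["val"]),    # the property and its value
    ]

facts = parse_ce(
    "the person John is married to the person Jane and has red as hair colour."
)
for fact in facts:
    print(fact)
```

Because every expression has one defined reading, no disambiguation step is needed here; a sentence that does not fit the pattern is rejected rather than guessed at.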

Page 9:

CE for Reasoning

CE used to define:

– "propositions", facts, assumptions

– logical rules

– queries

– meta model of concepts

Inference engines constructed to apply logical rules

– Specific Prolog implementations

– CE Store based on Java and SQL

Rationale may be constructed:

– presented to users for hybrid man/machine reasoning

– to determine dependencies

Formal semantics for CE

– (partially defined) in FOPL

Applications

– analysis of information

– societal and open government data

– planning and resource allocation

– (in progress) NLP

Page 10:

Fact Extraction using Controlled Natural Language

As the target of the NL processing

– facts in documents can be used for further reasoning

As a means of describing the NL processing

– to share understanding of the linguistic processing

– to help configure NL tooling

Page 11:

Controlled English is "Curiously Useful" – Why?

perhaps because humans are naturally good at using language to model, understand and reason

we can build upon "literary devices" already developed to solve problems in expressing knowledge

Page 12:

Conceptual Model(s)

Meta Model

– concepts: Concept, Entity Concept, Relation Concept, Conceptual Model

– relations: belongs to, has as domain

Semiotic Triangle

– concepts: Thing, Meaning, Symbol

– relations: stands for, expresses

General

– concepts: Agent, Spatial Entity, Temporal Entity, Situation, Container

– relations: has as agent role, is contained in

Linguistic

– concepts: Sentence, Phrase, Word, Noun, Linguistic Category, Linguistic Frame

– relations: has as dependent, is parsed from

ACM

– concepts: Place, Church, Person, Village, IED, Facility, ....

– relations: is located in

[Diagram: triangle with corners meaning, symbol and thing, and sides labelled conceptualises, stands for and expresses]

"Our" Semiotic Triangle, based on the original [Ogden, C. K. and Richards, I. A. (1923). ]

Page 13:

Current NL Processing

[Diagram labels: SYNCOIN Reports, Message PreProcessor, Stanford Parser, Entity Extractor, Situation Extractor, Names, CE Aggregator, CE Store, "Stylistic" CE, Conceptual Model (concepts, logical rules, linguistic expression), Proper Nouns (places, units), For Analysis]

Our focus is on the semantics of the conceptual model

Page 14:

General Semantics: Containers

if ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and ( the noun phrase NP2 stands for the thing T2 )
then ( the thing T2 is a container ).

if ( the noun phrase NP1 stands for the thing T1 and has the prepositional phrase PP as dependent ) and ( the prepositional phrase PP has the word '|in|' as head and has the noun phrase NP2 as object ) and ( the noun phrase NP2 stands for the container T2 )
then ( the thing T1 is contained in the container T2 ).

Example: "the patrol in East Rashid discovers the facility."

[Diagram labels: the noun phrase np1, the prepositional phrase pp1, the word |in|, the noun phrase np2, the thing t1, the thing t2, container; links: has as dependent, has as head, has as object, stands for, is a, is contained in]

Least Commitment approach – don't say what sort of container
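
The two container rules can be sketched as forward-chaining over a small set of facts. The Python below is an illustration only, not the Prolog or CE Store engines mentioned earlier: the triple representation, the helper names and the assignment of links to np1, pp1, np2, t1 and t2 are assumptions read off the rule text.

```python
# Facts are (subject, relation, object) triples. The identifiers follow the
# slide's labels; which link joins which node is assumed from the rules.
facts = {
    ("np1", "stands for", "t1"),
    ("np1", "has as dependent", "pp1"),
    ("pp1", "has as head", "|in|"),
    ("pp1", "has as object", "np2"),
    ("np2", "stands for", "t2"),
}

def objects(fact_set, subject, relation):
    """All objects linked to `subject` by `relation`."""
    return [o for s, r, o in fact_set if s == subject and r == relation]

def apply_container_rules(fact_set):
    """Return only the facts newly derived by the two container rules."""
    derived = set(fact_set)

    # Rule 1: the object of an '|in|' prepositional phrase stands for a container.
    for pp, rel, head in fact_set:
        if rel == "has as head" and head == "|in|":
            for np2 in objects(fact_set, pp, "has as object"):
                for t2 in objects(fact_set, np2, "stands for"):
                    derived.add((t2, "is a", "container"))

    # Rule 2: the thing a noun phrase stands for is contained in that container.
    for np1, rel, pp in fact_set:
        if rel != "has as dependent":
            continue
        if "|in|" not in objects(fact_set, pp, "has as head"):
            continue
        for t1 in objects(fact_set, np1, "stands for"):
            for np2 in objects(fact_set, pp, "has as object"):
                for t2 in objects(fact_set, np2, "stands for"):
                    if (t2, "is a", "container") in derived:
                        derived.add((t1, "is contained in", t2))

    return derived - set(fact_set)

new_facts = apply_container_rules(facts)
print(sorted(new_facts))
```

In line with the least commitment approach, the sketch concludes only that t2 is some container and says nothing about what kind of container it is.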

Page 15:

Specific Semantics: Entities from Noun Phrases

if ( the noun phrase NP has the noun N as head and stands for the thing T ) and ( the noun N expresses the entity concept EC )
then ( the thing T realises the entity concept EC ).

Example: "the patrol in East Rashid discovers the facility."

[Diagram labels: the noun phrase np1, the noun |patrol|, the thing s1, the entity concept 'patrol unit', patrol unit, Analyst's helper; links: has as head, stands for, expresses, realises, is a]

Requires "expresses" link between words and concepts
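
A minimal sketch of this rule, again in Python with facts held as triples. The function name and the way the example's links are assigned are assumptions for illustration; the point is only that nothing is derived unless an "expresses" fact connects the head noun to an entity concept.

```python
# Facts for the example, as (subject, relation, object) triples.
# The link assignments are assumed from the rule and the slide's labels.
facts = {
    ("np1", "has as head", "|patrol|"),
    ("np1", "stands for", "s1"),
    ("|patrol|", "expresses", "patrol unit"),
}

def realised_concepts(fact_set):
    """Derive (thing, 'realises', entity concept) from head nouns."""
    derived = set()
    for np, rel, noun in fact_set:
        if rel != "has as head":
            continue
        things = [o for s, r, o in fact_set if s == np and r == "stands for"]
        concepts = [o for s, r, o in fact_set if s == noun and r == "expresses"]
        for thing in things:
            for concept in concepts:
                derived.add((thing, "realises", concept))
    return derived

with_link = realised_concepts(facts)
# Without the "expresses" fact the rule cannot fire.
without_link = realised_concepts(
    {f for f in facts if f[1] != "expresses"}
)
print(with_link, without_link)
```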

Page 16:

"Analyst's Helper"

[Diagram labels: Analyst, Analyst Helper, NL parser, conceptual model, "expresses", semantic rules, Proper Names, meta information, MetaModel generator, ITAnet, wordnet/etc, gazetteers etc, translate]

Example message to the analyst: the word |xxx| is an unrecognised word

Example link: the word |www| expresses the concept yyy

Only the analyst knows what the concepts mean

Page 17:

Current question

How should the "expresses" link be made more expressive?

– conditional rules to handle ambiguous words

– selectional constraints based on semantics of models?

– introduce verbnet, etc?

– ...

Page 18:

The ambiguity barrier

We start from basic CE and move towards full English

Can we control the crossing of the ambiguity barrier?

[Diagram labels: Basic CE, Full English, Ambiguity, Ambiguity Barrier; features between Basic CE and Full English: anaphoric reference, sub clauses, prepositional phrases, flexible identities, verb inflections, domain specific syntax]

CE needs to be enhanced

Page 19:

"Identical" NL and CNL parsers

[Diagram labels: NL Parser, CNL Parser, lexicon, conceptual model, Reference English Grammar, Semantic Theory, NLP, stylistically expressive CE, basic CE or predicate logic or CE-in-Java]

Increase stylistic expressibility of CE

Better understanding of linguistics

Page 20:

Linguistic Frame for semantics

there is a linguistic frame named vp0 that
  has 'is the dog Fido' as example and
  defines the verb phrase VP_vp0 and
  has the sequence ( the copula BE_vp0 , and the noun phrase OBJ_vp0 ) as syntactic pattern and
  is predicated on the thing T and
  has the statement that ( the noun phrase OBJ_vp0 is predicated on the thing OBJ ) and ( the thing T is the same as the thing OBJ ) as semantic statement.

the word |is| belongs to the linguistic category 'copula'.

the word |dog| is a noun.

the entity concept ce:Dog is expressed by the word |dog| and
  has 'dog' as concept term.

[Diagram labels: syntax (copula, noun phrase, verb phrase) for "is the dog fido"; semantics v(OBJ), dog(OBJ).. and v(T) T=OBJ,...; Analyst's Conceptual Model; Linguistic Model]

We want exactly the same logic here as in the real NL processing
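
The frame vp0 pairs a syntactic pattern (copula, noun phrase) with a semantic statement. The Python below is a loose sketch of that pairing, under several assumptions: the lexicon and concept tables mirror the CE sentences above plus an added determiner entry for "the", the function `match_vp0` is hypothetical, and the "realises" relation is borrowed from the earlier entity rule rather than taken from this frame.

```python
# Word categories and concept links, mirroring the CE statements on this slide.
# The determiner entry is an assumption added so that "the" is recognised.
LEXICON = {"is": "copula", "the": "determiner", "dog": "noun"}
CONCEPTS = {"dog": "ce:Dog"}

def match_vp0(words, subject="T"):
    """Match (copula, noun phrase) and return the frame's semantic statements.

    Words missing from the lexicon (such as the name 'Fido') are ignored
    in this sketch.
    """
    if not words or LEXICON.get(words[0].lower()) != "copula":
        return None  # the pattern must start with the copula
    nouns = [w.lower() for w in words[1:] if LEXICON.get(w.lower()) == "noun"]
    if not nouns or nouns[0] not in CONCEPTS:
        return None  # no noun phrase head with an "expresses" link
    return [
        ("OBJ", "realises", CONCEPTS[nouns[0]]),  # cf. dog(OBJ)
        (subject, "is the same as", "OBJ"),       # cf. T = OBJ
    ]

statements = match_vp0("is the dog Fido".split())
no_match = match_vp0("the dog Fido".split())
print(statements, no_match)
```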

Page 21:

Could we?

use LKB instead of the Stanford Parser?

use the ERG instead of WordNet etc?

– where does the Analyst's Helper fit in?

improve our linguistic model to take account of LKB semantic theory?

represent MRS in CE?

represent linguistic rules in CE?