Lecture semantic augmentation

COMP3725 Knowledge Enriched Information Systems, Lecture 13: Semantic Augmentation. Dhavalkumar Thakker (Dhaval), School of Computing, University of Leeds

Transcript of Lecture semantic augmentation

Page 1: Lecture semantic augmentation

COMP3725 Knowledge Enriched Information Systems

Lecture 13: Semantic Augmentation

Dhavalkumar Thakker (Dhaval), School of Computing, University of Leeds

Page 2: Lecture semantic augmentation

Outline

• Semantic Augmentation
– What
– Why
– How

• Existing systems & services for Semantic Augmentation

• Challenges

Page 3: Lecture semantic augmentation

Semantic Augmentation

• From:

(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

• To:

(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

– “New York”: http://dbpedia.org/Ontology/New_York_City
– “Apple Corps”: http://dbpedia.org/Ontology/Apple_Corps

Page 4: Lecture semantic augmentation

Semantic Augmentation

• Semantic augmentation is a process of attaching semantics to a selected part of a text to assist automatic interpretation of the meaning conveyed by the text.

• Also called semantic annotation or semantic tagging
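To make the definition concrete, here is a minimal sketch of one way to attach semantics to a selected part of a text: a stand-off annotation that records a character span and a URI. The `Annotation` class and the `annotate` helper are hypothetical names chosen for illustration; the URIs are the ones shown on the earlier slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: a character span plus the URI that gives it meaning."""
    start: int  # offset of the first character of the span
    end: int    # offset one past the last character of the span
    uri: str    # identifier of the concept the span refers to


def annotate(text: str, phrase: str, uri: str) -> Annotation:
    """Attach `uri` to the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Annotation(start, start + len(phrase), uri)


text = ("Upon their return, Lennon and McCartney went to New York "
        "to announce the formation of Apple Corps.")

annotations = [
    annotate(text, "New York", "http://dbpedia.org/Ontology/New_York_City"),
    annotate(text, "Apple Corps", "http://dbpedia.org/Ontology/Apple_Corps"),
]

for a in annotations:
    # The original text is untouched; the meaning lives in the annotation.
    print(f"{text[a.start:a.end]!r} [{a.start}:{a.end}] -> {a.uri}")
```

Keeping the annotations separate from the text means several annotators can add their own layers without rewriting the document.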

Page 5: Lecture semantic augmentation

It provides additional information about an existing piece of data.

Page 6: Lecture semantic augmentation

Why Semantic Augmentation?

• Links to complementary information
– “More about this”

• Show related or similar information

• Reasoning and inferencing offered by semantics

• Semantic annotation is the glue that ties ontologies into document spaces
– Remember: the existing web is a web of documents

• Manual metadata production cost is too high

Page 7: Lecture semantic augmentation

GATE for Semantic Augmentation

• GATE (General Architecture for Text Engineering) – see gate.ac.uk

• GATE Developer is a development environment that provides a rich set of graphical interactive tools for the creation, measurement and maintenance of software components for processing human language.

• See: http://gate.ac.uk/family/developer.html

Page 8: Lecture semantic augmentation

Overview of GATE Developer

• GATE Developer

• Resources Pane
– applications: groups of processes to run on a document or corpus
– language resources: corpus, ontologies, schemas
– processing resources: tools that operate on unstructured text
– datastores: saved documents and resources

• Display Pane: whatever you’re currently working with.

• See next slide

Page 9: Lecture semantic augmentation

GATE: Interface

[Screenshot: GATE Developer interface, showing the Resources Pane and the Display Pane]

Page 10: Lecture semantic augmentation

Processing Resources: ANNIE

• A family of Processing Resources for language analysis included with GATE

• Stands for A Nearly-New Information Extraction system.

• Uses finite-state techniques to implement various tasks: tokenization, semantic tagging, verb phrase chunking, and so on.

Page 11: Lecture semantic augmentation

ANNIE IE Modules

http://gate.ac.uk/sale/tao/splitch6.html#chap:annie

Page 12: Lecture semantic augmentation

Some ANNIE Components

• Tokenizer
– Splits text into tokens: word, number, symbol, punctuation, and SpaceToken

• Sentence Splitter
– Segments text into sentences

• Part of Speech Tagger
– Produces a part-of-speech tag as an annotation on each word or symbol (nouns, verbs, etc.)

• GATE Morphological Analyser
– Detects morphemes in a piece of text (e.g. car, caring)

• OntoGazetteer
– Semantic tagging component that uses an ontology
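As a rough illustration of what the first two components do, the sketch below tokenizes text into the token kinds listed above and splits it into sentences using regular expressions. It is a simplified stand-in written in plain Python, not the ANNIE implementation; the patterns and the function names `tokenize` and `split_sentences` are assumptions made for this example.

```python
import re

# One named alternative per token kind; the catch-all "symbol" comes last.
TOKEN_PATTERN = re.compile(
    r"(?P<word>[^\W\d_]+)"
    r"|(?P<number>\d+)"
    r"|(?P<SpaceToken>\s+)"
    r"|(?P<punctuation>[.,;:!?()\-])"
    r"|(?P<symbol>.)",
    re.DOTALL,
)


def tokenize(text):
    """Yield (kind, start, end, string) for every token; tokens cover the whole text."""
    for m in TOKEN_PATTERN.finditer(text):
        yield m.lastgroup, m.start(), m.end(), m.group()


def split_sentences(text):
    """Naive splitter: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


sample = "Lennon went to New York. He announced Apple Corps in 1968."
for sentence in split_sentences(sample):
    print(sentence)
    for kind, start, end, string in tokenize(sentence):
        if kind != "SpaceToken":
            print(f"  {kind:<12} {string!r}")
```

Real sentence splitters also handle abbreviations such as "Inc." that this naive rule would wrongly treat as a sentence end.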

Page 13: Lecture semantic augmentation

Demo:

• From:

(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

• To:

(…) Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

– “New York”: http://dbpedia.org/Ontology/New_York_City
– “Apple Corps”: http://dbpedia.org/Ontology/Apple_Corps

Page 14: Lecture semantic augmentation

Step: Download & start the GATE application

• Download GATE from: http://gate.ac.uk/download/

• Note: the demonstration is using GATE 6.0

Page 15: Lecture semantic augmentation

Step: From Language Resources, select

• GATE Document: make sure that String content is selected in the last field (see screenshot). Name the document “Test”.

Page 16: Lecture semantic augmentation

Paste the following text into the document

• Upon their return, Lennon and McCartney went to New York to announce the formation of Apple Corps.

Page 17: Lecture semantic augmentation

Step: From Processing Resources, select the following resources

• ANNIE English Tokeniser

• ANNIE Sentence Splitter

• ANNIE POS Tagger

• GATE Morphological Analyser

• Note: For all the above, leave the “Name” field empty

Page 18: Lecture semantic augmentation

Step: From Processing resources select following resources

Page 19: Lecture semantic augmentation

Step: From Language Resources, select

• OWLIM Ontology
– Specify the location of the ontology you would like to use for semantic augmentation
– For example, we are using the DBpedia ontology

Page 20: Lecture semantic augmentation

OWLIM Ontology window

Page 21: Lecture semantic augmentation

From Processing Resources, select

• Select Onto Root Gazetteer

• Specify parameters as follows:

Page 22: Lecture semantic augmentation

Final steps: Create Corpus

• Go to Language Resources, click on GATE Corpus, and add the “Test” document created earlier

Page 23: Lecture semantic augmentation

Final steps: Create Corpus Pipeline

• From Applications, create a Corpus Pipeline

• Add the processing resources in the order shown and press “Run this Application”

Page 24: Lecture semantic augmentation

Results: Open the document, then click on Annotation Sets, Annotations List, and Lookup

Semantic Augmentation
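The demo steps can be summarised in code: create a document, put it in a corpus, run processing resources over it in order, and read off the Lookup annotations. The sketch below mimics that flow in plain Python and does not use the GATE API. The dictionary-based document, the two toy resources, and the hand-written `GAZETTEER` (a stand-in for labels that the gazetteer would derive from the ontology) are all assumptions made for illustration.

```python
import re

# Hypothetical miniature gazetteer: surface form -> URI (URIs as shown on the slides).
GAZETTEER = {
    "New York": "http://dbpedia.org/Ontology/New_York_City",
    "Apple Corps": "http://dbpedia.org/Ontology/Apple_Corps",
}


def new_document(name, content):
    """A document is its text plus annotation lists keyed by annotation type."""
    return {"name": name, "content": content, "annotations": {}}


def tokeniser(doc):
    """Add a Token annotation for every run of word characters."""
    doc["annotations"]["Token"] = [
        {"start": m.start(), "end": m.end()}
        for m in re.finditer(r"\w+", doc["content"])
    ]


def gazetteer(doc):
    """Add a Lookup annotation wherever a gazetteer entry occurs in the text."""
    lookups = []
    for phrase, uri in GAZETTEER.items():
        for m in re.finditer(re.escape(phrase), doc["content"]):
            lookups.append({"start": m.start(), "end": m.end(), "URI": uri})
    doc["annotations"]["Lookup"] = sorted(lookups, key=lambda a: a["start"])


def run_pipeline(corpus, resources):
    """Apply each processing resource, in order, to every document in the corpus."""
    for doc in corpus:
        for resource in resources:
            resource(doc)


corpus = [new_document("Test", "Upon their return, Lennon and McCartney went to "
                               "New York to announce the formation of Apple Corps.")]
run_pipeline(corpus, [tokeniser, gazetteer])

for lookup in corpus[0]["annotations"]["Lookup"]:
    span = corpus[0]["content"][lookup["start"]:lookup["end"]]
    print(span, "->", lookup["URI"])
```

The order of resources matters in a real pipeline because later resources typically rely on annotations produced by earlier ones; this toy gazetteer matches raw strings and so does not show that dependency.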

Page 25: Lecture semantic augmentation

Other features

• JAPE
– The Java Annotation Patterns Engine provides regular-expression-based pattern/action rules over annotations.
– Grammars to detect entities, validate detected entities, and carry out pre- and post-processing
– Example: “at the Carnegie Stadium”, “at the Emirates Stadium”, “at the O2 Arena”
– See the tutorial: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html
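The idea behind the venue example can be sketched with an ordinary regular expression. Note the difference: JAPE rules match over annotations (such as tokens and lookups), whereas this Python analogue matches over raw characters. The pattern and the function name `find_venues` are assumptions made for this illustration, not a JAPE grammar.

```python
import re

# "at the" + one or more capitalised words + a venue keyword.
VENUE_PATTERN = re.compile(
    r"\bat the (?P<venue>(?:[A-Z][\w']*\s)+(?:Stadium|Arena))\b"
)


def find_venues(text):
    """Return (start, end, venue) for every 'at the <Name> Stadium/Arena' match."""
    return [
        (m.start("venue"), m.end("venue"), m.group("venue"))
        for m in VENUE_PATTERN.finditer(text)
    ]


text = ("The match is at the Emirates Stadium, "
        "and the concert is at the O2 Arena.")
for start, end, venue in find_venues(text):
    print(f"[{start}:{end}] {venue}")
```

The left-hand context "at the" triggers the match, but only the venue name itself is captured, which mirrors how a pattern/action rule can use context without annotating it.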

Page 26: Lecture semantic augmentation

Some Links

• Home page: http://gate.ac.uk/

• Some good short tutorial videos for getting started (only a few minutes each): http://gate.ac.uk/demos/developer-videos/

• User Guide: http://gate.ac.uk/sale/tao/index.html (this is apparently for version 7.1, which is a development build, but it seems to be fine)

• Lots of documentation: http://gate.ac.uk/documentation.html

• The wiki: http://gate.ac.uk/wiki/

• JAPE grammar tutorial by Dhaval Thakker et al.: http://gate.ac.uk/sale/thakker-jape-tutorial/index.html

Page 27: Lecture semantic augmentation

Challenge: Term Ambiguity

• ...this apple on the palm of my hand...

• ...Apple tried to acquire Palm Inc....

• ...eating an apple sitting by a palm tree...

• What do “apple” and “palm” mean in each case?

• Objective is to recognize entities and disambiguate their meaning.

DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.
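One simple way to approach disambiguation is to compare the words around an ambiguous term with words typically associated with each candidate meaning, and pick the candidate with the largest overlap. The sketch below does this for the three example phrases. It is a deliberately simplified stand-in, not the method used by DBpedia Spotlight; the sense labels and context word sets in `CANDIDATES` are hand-written assumptions, not real DBpedia data.

```python
import re

# Hypothetical candidate senses, each with a hand-written set of context words.
CANDIDATES = {
    "apple": {
        "apple (fruit)": {"eat", "eating", "fruit", "tree", "hand"},
        "Apple Inc. (company)": {"acquire", "company", "inc", "tried"},
    },
    "palm": {
        "palm (hand)": {"hand", "finger", "my"},
        "Palm Inc. (company)": {"acquire", "company", "inc", "apple"},
        "palm tree": {"tree", "eating", "sitting"},
    },
}


def disambiguate(term, sentence):
    """Pick the sense whose context words overlap most with the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower())) - {term}
    senses = CANDIDATES[term]
    return max(senses, key=lambda sense: len(senses[sense] & words))


sentences = [
    "this apple on the palm of my hand",
    "Apple tried to acquire Palm Inc.",
    "eating an apple sitting by a palm tree",
]
for sentence in sentences:
    print(sentence)
    for term in ("apple", "palm"):
        print(f"  {term}: {disambiguate(term, sentence)}")
```

In practice the context words would be learned from a large corpus rather than written by hand, and ties or zero overlap would need an explicit fallback.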

Page 28: Lecture semantic augmentation

Challenges

• Disambiguation

• Unknown entities

• Ontology learning

• Scale and speed

• Co-referencing

Page 29: Lecture semantic augmentation

Existing Services for Semantic Augmentation

Page 30: Lecture semantic augmentation

Existing Services for Semantic Augmentation

Page 31: Lecture semantic augmentation

DBpedia Spotlight

• DBpedia is a collection of entity descriptions extracted from Wikipedia & shared as linked data

• DBpedia Spotlight uses data from DBpedia and text from associated Wikipedia pages

• Learns how to recognize that a DBpedia resource was mentioned

• Given plain text as input, generates annotated text

http://dbpedia-spotlight.github.com/demo/
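A service of this kind can also be called from a program. The sketch below builds an annotation request and parses a JSON response using only the Python standard library. The endpoint URL, the `text` and `confidence` parameters, and the `Resources`, `@surfaceForm`, `@offset` and `@URI` field names are assumptions about the public DBpedia Spotlight web service and are not taken from the slides; check the current service documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; verify against the current DBpedia Spotlight documentation.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Build (but do not send) an annotate request asking for a JSON response."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    return urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"}
    )


def extract_annotations(response_body):
    """Pull (surface form, offset, URI) triples out of a Spotlight-style JSON body."""
    payload = json.loads(response_body)
    return [
        (r["@surfaceForm"], int(r["@offset"]), r["@URI"])
        for r in payload.get("Resources", [])  # assumed field names
    ]


def annotate(text, confidence=0.5):
    """Send the request; needs network access and a reachable service."""
    with urllib.request.urlopen(build_request(text, confidence), timeout=10) as resp:
        return extract_annotations(resp.read().decode("utf-8"))


request = build_request("Lennon and McCartney went to New York.")
print(request.full_url)
```

Separating request building and response parsing from the network call makes both parts easy to check without contacting the service.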

Page 32: Lecture semantic augmentation

DBpedia Spotlight

Page 33: Lecture semantic augmentation

DBpedia Spotlight

Page 34: Lecture semantic augmentation

References

• DBpedia Spotlight: Shedding Light on the Web of Documents. Pablo Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. In: Proceedings of the 7th International Conference on Semantic Systems (I-Semantics), 2011.

• Introduction to GATE, Dr. Paula Matuszek

• Various resources from gate.ac.uk