GATE : General Architecture for Text Engineering
![Page 1: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/1.jpg)
GATE: General Architecture for Text Engineering
Presented by Ahmed Magdy Ezzeldin
![Page 2: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/2.jpg)
What is Text Engineering?
● Text or Language Engineering means applying scientific principles to the design, construction and maintenance of tools to help deal with information that has been expressed in natural languages (the languages that people use for communicating with one another).
![Page 3: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/3.jpg)
Applications
● Automatic summarization
● Co-reference resolution
● Discourse analysis (elaboration, explanation, contrast, question, statement, assertion)
● Machine translation
● Morphological segmentation
● Named entity recognition
● Natural language generation
● Natural language understanding
● Optical character recognition (OCR)
● Part-of-speech tagging
● Parsing
● Question answering
● Relationship extraction
● Sentiment analysis (Polarity)
● Speech recognition (Speech segmentation)
● Sentence breaking, Word segmentation, Topic segmentation
● Word sense disambiguation
![Page 4: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/4.jpg)
What is GATE?
● General Architecture for Text Engineering
● Java suite of NLP tools
● University of Sheffield
● Initial Release 1995 (17 years ago)
● Last Stable Release 6.1 May 6, 2011
● Languages : English, Spanish, Chinese, Arabic, Bulgarian, French, German, Hindi, Italian, Cebuano, Romanian, Russian.
● Accepted input formats: TXT, HTML, XML, Doc, PDF
● Supported datastores: Java Serial, PostgreSQL, Lucene, Oracle
● GATE Developer is the GATE graphical user interface. Much as Eclipse does for Java programmers, it provides a graphical environment for research and development of language processing software.
![Page 5: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/5.jpg)
GATE Components and APIs
![Page 6: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/6.jpg)
ANNIE GATE Application
● A Nearly-New Information Extraction System
● Example Application for English Language Engineering
● A set of modules:
● Tokenizer
● Gazetteer
● Sentence splitter
● Part-of-speech tagger
● Named entities transducer
● Co-reference tagger
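To make the idea of a module pipeline concrete, here is a minimal Python sketch, assuming a much simplified model: each stage is a plain function that reads a shared document and appends annotations to it. The `Annotation`, `Document`, `tokenizer`, `sentence_splitter` and `run_pipeline` names are hypothetical and are not part of GATE or ANNIE; only two of the six modules are imitated.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str   # e.g. "Token" or "Sentence"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    features: dict = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)


def tokenizer(doc):
    """Add one Token annotation per word or punctuation mark."""
    for m in re.finditer(r"\w+|[^\w\s]", doc.text):
        kind = "word" if m.group()[0].isalnum() else "punctuation"
        doc.annotations.append(Annotation("Token", m.start(), m.end(), {"kind": kind}))


def sentence_splitter(doc):
    """Add Sentence annotations, closing a sentence at '.', '!' or '?' tokens."""
    tokens = [a for a in doc.annotations if a.type == "Token"]
    start = None
    for tok in tokens:
        if start is None:
            start = tok.start
        if doc.text[tok.start:tok.end] in {".", "!", "?"}:
            doc.annotations.append(Annotation("Sentence", start, tok.end))
            start = None
    if start is not None:
        # Trailing text without final punctuation still forms a sentence.
        doc.annotations.append(Annotation("Sentence", start, tokens[-1].end))


def run_pipeline(doc, stages):
    """Run each stage in order; later stages can use earlier annotations."""
    for stage in stages:
        stage(doc)
    return doc


if __name__ == "__main__":
    doc = run_pipeline(Document("GATE is written in Java. It comes from Sheffield."),
                       [tokenizer, sentence_splitter])
    for a in doc.annotations:
        if a.type == "Sentence":
            print(doc.text[a.start:a.end])
```

The order of stages matters: the sentence splitter in this sketch relies on the Token annotations produced before it, which is the same reason the ANNIE modules run in sequence.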
![Page 7: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/7.jpg)
ANNIE Architecture
![Page 8: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/8.jpg)
Demos
● ANNIE Gazetteer: A list lookup component. The list files are located in $GATE_HOME/plugins/ANNIE/resources/gazetteer
● JAPE Transducer: JAPE is a Java Annotation Patterns Engine. JAPE provides finite state transduction over annotations based on regular expressions. Example files are located in $GATE_HOME/plugins/ANNIE/resources/NE
● ANNIE NE Transducer: (ANNIE named entity grammar) a semantic tagger based on the JAPE language.
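The two ideas above, list lookup and pattern rules over the resulting annotations, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not ANNIE's implementation: the `GAZETTEER` lists are made up and held in memory, and `title_person_rule` is a hand-written stand-in for what a JAPE grammar rule would express declaratively.

```python
import re

# Hypothetical in-memory lists; ANNIE keeps its real lists in files on disk.
GAZETTEER = {
    "title": ["Dr", "Prof", "Mr", "Mrs"],
    "location": ["Sheffield", "Cairo"],
}


def gazetteer_lookup(text, lists):
    """Return sorted (start, end, major_type) for each whole-word list entry in text."""
    lookups = []
    for major_type, entries in lists.items():
        for entry in entries:
            for m in re.finditer(r"\b" + re.escape(entry) + r"\b", text):
                lookups.append((m.start(), m.end(), major_type))
    return sorted(lookups)


def title_person_rule(text, lookups):
    """Pattern rule: a 'title' lookup followed by a capitalised word yields a Person."""
    persons = []
    for start, end, major_type in lookups:
        if major_type != "title":
            continue
        # Allow an optional full stop after the title, then one capitalised word.
        m = re.match(r"\.?\s+([A-Z][a-z]+)", text[end:])
        if m:
            persons.append((start, end + m.end(), "Person"))
    return persons


if __name__ == "__main__":
    text = "Dr. Smith moved from Cairo to Sheffield."
    lookups = gazetteer_lookup(text, GAZETTEER)
    for start, end, label in lookups + title_person_rule(text, lookups):
        print(label, repr(text[start:end]))
```

The rule works on the lookup annotations rather than on raw text alone, which mirrors the division of labour on the slide: the gazetteer marks candidates, and the transducer combines them into named entities.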
![Page 9: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/9.jpg)
Mimir
● Provides indexing and search over the linguistic and semantic information generated by GATE
Demo
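As a rough picture of what indexing text together with its annotations means, the Python sketch below builds a tiny inverted index that records, per document, both the words and the annotation types found in it. It is a toy under stated assumptions and does not reflect Mimir's actual data structures, API or query language; the `AnnotationIndex` class and its methods are hypothetical.

```python
from collections import defaultdict


class AnnotationIndex:
    """Toy inverted index over words and annotation types."""

    def __init__(self):
        # Maps ("word", w) or ("annotation", type) to the set of matching document ids.
        self._postings = defaultdict(set)

    def add_document(self, doc_id, text, annotations):
        """Index a document; annotations is an iterable of (type, start, end)."""
        for word in text.lower().split():
            self._postings[("word", word.strip(".,;:!?"))].add(doc_id)
        for ann_type, _start, _end in annotations:
            self._postings[("annotation", ann_type)].add(doc_id)

    def search(self, words=(), annotation_types=()):
        """Return ids of documents containing all given words and annotation types."""
        keys = [("word", w.lower()) for w in words]
        keys += [("annotation", t) for t in annotation_types]
        if not keys:
            return set()
        result = set(self._postings.get(keys[0], set()))
        for key in keys[1:]:
            result &= self._postings.get(key, set())
        return result


if __name__ == "__main__":
    index = AnnotationIndex()
    index.add_document("d1", "Ahmed works in Cairo.", [("Person", 0, 5), ("Location", 15, 20)])
    index.add_document("d2", "The tool works well.", [])
    print(index.search(words=["works"], annotation_types=["Person"]))
```

Treating annotation types as index keys alongside words is what lets a query mix plain text with linguistic conditions, for example "documents that mention a Person and the word works".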
![Page 10: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/10.jpg)
Installing Mimir
![Page 11: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/11.jpg)
● Open GATE and load the ANNIE system with defaults
● Then click Manage CREOLE Plugins
![Page 12: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/12.jpg)
Add Mimir Client Path
● Add Mimir as a plugin and point it at the mimir-client directory
![Page 13: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/13.jpg)
● Make sure the Mimir plugin is set to load both now and every time you open GATE
![Page 14: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/14.jpg)
● Add the Mimir Indexing PR (processing resource) to Processing Resources
![Page 15: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/15.jpg)
● Create a new corpus from Language Resources
![Page 16: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/16.jpg)
● Right-click the corpus and populate it with documents
![Page 17: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/17.jpg)
Edit the Default Index Template
● Open http://localhost:8080/mimir-demo in your browser and go to the configuration page
● Then go to the Index Templates section and choose to manage the templates
● Then click the default index template to edit it
![Page 18: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/18.jpg)
Add some annotations to the Default Index Template
![Page 19: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/19.jpg)
Add a new Index
![Page 20: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/20.jpg)
Edit the Index you created and set the Scorer Algorithm
![Page 21: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/21.jpg)
Copy the Index URL
![Page 22: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/22.jpg)
Paste the Index URL into the Mimir Indexing PR and Run ANNIE on the Corpus
![Page 23: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/23.jpg)
Double-click any document and check the Annotations yourself
![Page 24: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/24.jpg)
Close and Search the Index
![Page 25: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/25.jpg)
Example Query
![Page 26: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/26.jpg)
Thank you
![Page 27: GATE : General Architecture for Text Engineering](https://reader033.fdocuments.net/reader033/viewer/2022050808/5538879d550346b04c8b478d/html5/thumbnails/27.jpg)
References
● http://gate.ac.uk (GATE website; it is huge)
● http://www.wikipedia.com (mother of all knowledge)