Anatoly Starostin (ABBYY), "ABBYY InfoExtractor: technology..."
ABBYY InfoExtractor: technology of producing domain oriented information extraction systems
Starostin A.S.
NLP-technologies
Rule-based: technologies that rely on hand-written language rules applicable to a particular task.
Statistics-based: technologies that rely on machine learning over large text corpora (labeled or parallel).
Hybrid: technologies that combine several approaches, for example rule-based + statistics-based.
Model-based: technologies based on universal (complete) language modeling.
ABBYY Compreno
Universal Semantic Hierarchy
• It’s a tree
• Intermediate nodes represent semantic classes (concepts)
• Leaves represent lexical classes
• Concrete lexemes are linked to lexical classes
• All nodes are labeled with grammatical and semantic information (a set of grammemes and a set of semantemes)
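The structure described on this slide can be sketched as a small tree type. This is a minimal illustration only: the `Node` class, the `path_to_lexeme` helper and every class name, grammeme and semanteme in the toy fragment are hypothetical and are not taken from ABBYY Compreno.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """One node of the hierarchy: a semantic class (inner) or a lexical class (leaf)."""
    name: str
    grammemes: Set[str] = field(default_factory=set)
    semantemes: Set[str] = field(default_factory=set)
    children: List["Node"] = field(default_factory=list)
    lexemes: List[str] = field(default_factory=list)  # filled only on leaves

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it, so calls can be chained."""
        self.children.append(child)
        return child

    def is_lexical_class(self) -> bool:
        """In this sketch a node without children is treated as a lexical class."""
        return not self.children


def path_to_lexeme(root: Node, lexeme: str) -> Optional[List[str]]:
    """Return class names from the root down to the lexical class holding `lexeme`."""
    if root.is_lexical_class():
        return [root.name] if lexeme in root.lexemes else None
    for child in root.children:
        sub = path_to_lexeme(child, lexeme)
        if sub is not None:
            return [root.name] + sub
    return None


# Hypothetical toy fragment; all labels are illustrative assumptions.
root = Node("ENTITY")
org = root.add(Node("ORGANIZATION", semantemes={"Collective"}))
org.add(Node("company:COMPANY", grammemes={"Noun"}, lexemes=["company", "firm"]))
action = root.add(Node("ACTION"))
action.add(Node("sell:TO_SELL", grammemes={"Verb"}, lexemes=["sell", "sold"]))

print(path_to_lexeme(root, "firm"))
```

Walking from a lexeme up to its ancestors is what lets a rule written for a general concept apply to every word below it.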
Syntactic-Semantic tree
Google sold Motorola to Lenovo for $2.91 billion.
OWL ontology
RDF graph
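The slide's graph is not preserved in this transcript. As a rough illustration of what an RDF graph for the talk's example sentence ("Google sold Motorola to Lenovo for $2.91 billion.") might look like, the sketch below stores subject-predicate-object triples as plain tuples. The `ex:` vocabulary and the `sale_to_triples` and `objects` helpers are assumptions made for this example, not the ontology or API used by ABBYY InfoExtractor.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def sale_to_triples(event_id: str, seller: str, goods: str,
                    buyer: str, price: str) -> List[Triple]:
    """Describe one sale event as a list of triples (hypothetical vocabulary)."""
    return [
        (event_id, "rdf:type", "ex:Sale"),
        (event_id, "ex:seller", seller),
        (event_id, "ex:goods", goods),
        (event_id, "ex:buyer", buyer),
        (event_id, "ex:price", price),
    ]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object attached to `subject` through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


triples = sale_to_triples(
    "ex:sale1", "ex:Google", "ex:Motorola", "ex:Lenovo", '"2.91 billion USD"'
)

for triple in triples:
    print(triple)
print(objects(triples, "ex:sale1", "ex:buyer"))
```

Modeling the sale as its own node keeps seller, buyer, goods and price attached to one event instead of scattering them across binary relations.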
IE development factory
Extraction algorithm
Parse subtree interpretation rules
Identification rules
Types of statements
IE system production
Design
• Input: customer needs (informal), text examples (marked up or not)
• Output: OWL ontology in which every object is well documented
Development
• Input: well-documented OWL ontology, marked-up text examples
• Output: production system of rules
Testing
• Nightly testing (marked-up corpora)
• Reclamations (pointed error examples)
All three activities take place within one framework, called OntoDPS.
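The testing activity above compares system output with marked-up corpora. A minimal sketch of such a nightly check follows, assuming that facts can be represented as hashable tuples. The `score` function, the tuple layout and the sample facts are all hypothetical and do not describe how OntoDPS works internally.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]  # (document id, fact type, value): an assumed layout


def score(expected: Set[Fact], extracted: Set[Fact]) -> Dict[str, object]:
    """Compare extracted facts with the marked-up (gold) facts."""
    true_positives = len(expected & extracted)
    precision = true_positives / len(extracted) if extracted else 1.0
    recall = true_positives / len(expected) if expected else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "missed": sorted(expected - extracted),    # candidates for reclamations
        "spurious": sorted(extracted - expected),  # wrongly extracted facts
    }


# Hypothetical gold markup and system output for two documents.
gold = {
    ("doc1", "Seller", "Google"),
    ("doc1", "Buyer", "Lenovo"),
    ("doc1", "Goods", "Motorola"),
    ("doc2", "Buyer", "Acme"),
}
system_output = {
    ("doc1", "Seller", "Google"),
    ("doc1", "Buyer", "Lenovo"),
    ("doc1", "Goods", "Motorola"),
    ("doc2", "Seller", "Acme"),
}

report = score(gold, system_output)
print(report)
```

Keeping the missed and spurious facts in the report gives developers concrete error examples to investigate, in the spirit of the reclamations mentioned above.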
IE system design
IE system design (marked up text example)
IE system development: libraries
IE system development: projects
IE system development: reuse and customization
• Adding new items to dictionaries
• Adding new instances to ontologies
• Reuse of libraries and rules
• Complex rule customization
IE system testing: nightly testing
Thank you! Questions?