Porting the QALL-ME framework to Romanian
Constantin Orasan
Research Group in Computational Linguistics
Research Institute in Information and Language Processing
University of Wolverhampton
http://www.wlv.ac.uk/~in6093/
29th March 2010
Structure of the presentation
1 Introduction
2 The QALL-ME project
3 Multilingual information access in QALL-ME
4 Conclusions
Need to access information
• as a result of the development of the Internet, more and more information becomes available
• this information is in many languages
• fields from computational linguistics such as automatic summarisation, question answering, text mining, etc. can help people deal with information
Question answering (QA)
• Question answering aims at identifying the answer to a question in a large collection of documents
• the information provided by QA is more focused than that provided by information retrieval
• the output can be the exact answer or a text snippet which contains the answer
• the field took off as a result of the introduction of the QA track in TREC, and cross-lingual QA as a result of CLEF
Types of QA systems
• open-domain QA systems: can answer any question from any collection
  + can potentially answer any question
  - very low accuracy (especially in cross-lingual settings)
• canned QA systems: rely on a very large repository of questions for which the answer is known
  + very little processing necessary
  - limited to the answers in the database
• closed-domain QA systems: are built for very specific domains and exploit expert knowledge in them
  + very high accuracy
  - can require extensive language processing and are limited to one domain
Purpose of the presentation
• briefly present the QALL-ME project
• show how it was adapted to answer questions in Romanian about movies
The QALL-ME project
• QALL-ME = Question Answering Learning technologies in a multiLingual and Multimodal Environment
• EU-funded project, part of FP6
• 7 partners:
  • FBK-irst, Italy
  • University of Wolverhampton, UK
  • University of Alicante, Spain
  • DFKI, Germany
  • Comdata, Italy
  • UbiEST, Italy
  • WayCom, Italy
• Web page: http://qallme.fbk.eu
The QALL-ME project
• aimed at establishing a shared infrastructure for multilingual and multimodal QA in the domain of tourism
• In the QALL-ME system
  • users ask natural language questions in several languages (in both textual and speech modality) using a variety of input devices (e.g. mobile phones), and
  • the system returns a list of specific answers formatted in the most appropriate modality, ranging from short texts to maps, videos and pictures.
[Figure: QALL-ME architecture. Components shown: QALL-ME central QA planner, Service Provider, Speech Recognizers, Dialog Models, English / German / Italian / Spanish Answer Extractors, Question Type ontology, Answer Type ontology, Semantic representation, Local Information Sources]
Main outputs of the project
• an ontology for the domain of tourism
• an entailment-based QA framework
• the QALL-ME benchmark
• an entailment framework
(all accessible from the project’s web page: http://qallme.fbk.eu)
The ontology
• A domain-specific ontology for the tourism domain was developed and shared among all the partners.
• The ontology was used to serve as:
  • a bridge between different languages
  • a communication language between different components of the system
• The ontology was linked to domain-independent ontologies such as MultiWordNet and SUMO
• For more information see (Ou et al., 2008)
Design of the ontology
• Analysis of data from content providers
• Analysis of user requirements
• Inspired by similar ontologies:
  • Harmonise and eTourism: focus on static information (e.g. accommodation and events/activities)
  • Similar to eTourism, as it is written in OWL rather than RDFS
  • but wider coverage
• Introspection
The ontology
• Main classes: Country, Destination, Site (i.e. Accommodation, Attraction, Gastro, and Infrastructure), Transportation, EventContent and Event
• Element classes: Facility, Room, PersonOrganization, Language, and Currency
• Attribute classes: Contact, Location, Period and Price.
• Element and attribute classes cannot exist independently and have to be attached to other main or element classes
[Figure: diagram of the part of the ontology covering cinemas and movies. Classes shown: MovieShow, Cinema, Movie, CinemaRoom, Event, EventContent, Period, DateTimePeriod, DatePeriod, TimePeriod, TicketPrice, SitePrice, Currency, Director, Star, Producer, Writer, Contact, PostalAddress, GPSCoordinate, DirectionLocation, SiteFacility, RoomFacility. Properties shown: isInSite, hasPrice, hasEventContent, hasPeriod, hasDatePeriod, hasTimePeriod, hasCurrency, hasContact, hasPostalAddress, hasGPSCoordinate, hasSiteFacility, hasRoom, hasRoomFacility, hasDirector, hasStar, hasProducer, hasWriter, name, description, genre, certificate, synopsis, priceType, priceValue, startDate, endDate, startTime, endTime. Several classes are linked by subClassOf.]
The ontology
• Encoded using OWL DL, since it has more expressive power than OWL Lite and has more efficient reasoning support than OWL Full
• Used Protégé-OWL as the editor and RacerPro7 as the reasoner
• The ontology contains
  • 122 classes (concepts),
  • 55 datatype properties and
  • 52 object properties which indicate the relationships among the 122 classes.
  • 15 top-level classes.
• The class hierarchy has a maximum depth of 4.
The QALL-ME framework
• is an architecture skeleton for multilingual QA systems for closed domains
• designed in such a way that it allows fast development of closed-domain QA systems
• freely available from http://qallme.sourceforge.net/
• is based on a Service Oriented Architecture (SOA) which is realised using web services
• relies on textual entailment recognisers
Web services
1 Context providers: are used to anchor questions in space and time
2 Annotators: currently three types of annotators are available:
  • named entity annotators, which identify names of cinemas, movies, persons, etc.
  • term annotators, which identify hotel facilities, movie genres and other domain-specific terminology
  • temporal annotators, which recognise and normalise temporal expressions in user questions
3 Entailment engine: determines whether a user question entails a retrieval procedure
4 Query generator: relies on an entailment engine to generate a query to extract the answer.
5 Answer pool: retrieves the answers from a database.
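The way the five services hand data to each other can be sketched as a small pipeline. This is only an illustration: the function names, the dictionary-based gazetteer, the exact-match stand-in for the entailment step and the toy database are all assumptions, not the framework's actual web service interfaces.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Context:
    location: str
    time: datetime


def provide_context():
    # Hypothetical static context; a real provider could read it from a phone.
    return Context(location="Wolverhampton", time=datetime(2010, 3, 29, 18, 0))


def annotate(question, gazetteer):
    # Replace each known entity or term with its slot label, e.g. [DESTINATION].
    annotations = {}
    for surface, label in gazetteer.items():
        if surface in question:
            question = question.replace(surface, f"[{label}]")
            annotations[label] = surface
    return question, annotations


def generate_query(annotated_question, patterns):
    # Stand-in for the entailment step: exact match against stored patterns.
    return patterns.get(annotated_question)


def answer_pool(query, database):
    # Stand-in for running the generated query against the data.
    return database.get(query, [])


def answer(question, gazetteer, patterns, database):
    context = provide_context()
    annotated, slots = annotate(question, gazetteer)
    # If the question names no destination, fall back to the user's position.
    slots.setdefault("DESTINATION", context.location)
    query = generate_query(annotated, patterns)
    if query is None:
        return []
    return answer_pool(query.format(**slots), database)
```

In this sketch each stage only sees the output of the previous one, which mirrors the idea that services can be swapped per language without touching the rest of the chain.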
Context providers
• are used to anchor a question in space and time
• return the current position and time
• used by the presentation module when maps are displayed
• used by the temporal annotator to normalise temporal expressions
• determine which services are used in a cross-lingual scenario
• can be static or determined from a mobile phone
Named entity and term annotators
• named entity recogniser = identifies names of hotels, movies, persons, etc.
• term annotator = identifies domain-specific terms such as hotel facilities, movie genres, etc.
• the entities and terms are known, so the task is reduced to a database look-up
• Gazetteers are the main source for determining the entities
• The annotation module needs to determine the canonical form of an entity
• uses character-based similarity, a modified TF*IDF and a greedy algorithm
• overlapping annotations are not allowed, and there are few ambiguities
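Mapping a noisy mention to its canonical gazetteer entry can be illustrated with a character-based similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as the similarity measure; that choice, the function name `canonical_form` and the 0.8 threshold are assumptions made for illustration, and the modified TF*IDF weighting and greedy selection mentioned above are not reproduced.

```python
from difflib import SequenceMatcher


def canonical_form(mention, gazetteer, threshold=0.8):
    """Return the gazetteer entry most similar to the mention, or None."""
    best_entry, best_score = None, 0.0
    for entry in gazetteer:
        # ratio() is a character-level similarity between 0.0 and 1.0.
        score = SequenceMatcher(None, mention.lower(), entry.lower()).ratio()
        if score > best_score:
            best_entry, best_score = entry, score
    # Reject weak matches so that unknown names are not forced onto an entry.
    return best_entry if best_score >= threshold else None
```

Because the set of entities is known in advance, a look-up of this kind can tolerate misspellings without any statistical named entity recogniser.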
Named entity and term annotators
• Annotates both standard and non-standard entities: cinema, movie, location, genre, certificate
• Needs to deal with noisy input:
  • misspelt words/input from ASR engines/SMS input, e.g. becaming Jane, becoming Jade
  • free word order (Will Smith / Smith, Will)
  • equivalent strings (saw III / three / 3; Smith, Will / Smith, W.)
• Needs to deal with questions in mixed languages
• Needs to deal with ambiguous entities
Temporal annotator
• questions from the domain of tourism contain a large number of temporal expressions
• we use a simplified version of the tagger implemented by Puscasu (2004)
• the simplification was done to reduce the processing time (Varga, Puscasu, and Orasan, 2009)
• identifies both self-contained temporal expressions (TEs) and indexical/under-specified TEs
• uses the TIMEX2 standard
• the output is used by the TIMEX2SPARQL service to restrict the extracted answers
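Resolving an indexical expression means anchoring it to the time supplied by the context provider. The sketch below shows the idea for three expressions only; it returns a plain (start, end) interval rather than a TIMEX2 annotation, and the function name `normalise` and the reading of "tonight" as 18:00 to midnight are assumptions.

```python
from datetime import datetime, timedelta


def normalise(expression, now):
    """Map an under-specified expression to a (start, end) interval."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    if expression == "today":
        return midnight, midnight + timedelta(days=1)
    if expression == "tomorrow":
        return midnight + timedelta(days=1), midnight + timedelta(days=2)
    if expression == "tonight":
        # Assumption: "tonight" covers 18:00 until midnight.
        return midnight + timedelta(hours=18), midnight + timedelta(days=1)
    # Anything else is left unresolved in this sketch.
    return None
```

An interval of this kind is the sort of value that could then be turned into a restriction on show times when answers are extracted.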
Entailment engine
• closed-domain QA systems often transform a question into a Prolog fact or an SQL query
• this solution often works only partially, due to language variability
• in QALL-ME this problem is solved using textual entailment
• the entailment engine determines whether two questions entail the same meaning, so that they share the same retrieval procedure:
  • T is the input question
  • H is a textual pattern stored in a repository
  • textual patterns have SPARQL retrieval procedures
• we calculate the similarity between two sentences to determine whether there is an entailment relation between them
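A similarity-based entailment decision between T and the stored patterns H can be sketched as follows. The slides do not say which similarity measure the engine uses, so everything here is an assumption: token overlap (Jaccard) as the measure, the requirement that T and H contain the same slots, the 0.3 threshold, and the function names.

```python
import re


def tokens(text):
    # Slots such as [TIMEX] are kept as single tokens.
    return set(re.findall(r"\[\w+\]|\w+", text.lower()))


def slots(text):
    return {t for t in tokens(text) if t.startswith("[")}


def similarity(t, h):
    a, b = tokens(t), tokens(h)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_pattern(question, patterns, threshold=0.3):
    # Only patterns with the same slots can share a retrieval procedure.
    candidates = [p for p in patterns if slots(p) == slots(question)]
    best = max(candidates, key=lambda p: similarity(question, p), default=None)
    if best is None or similarity(question, best) < threshold:
        return None
    return best
```

The slot check matters in this sketch: on word overlap alone, "What movie can I see [TIMEX] in [DESTINATION]?" is closer to "Where can I see [MOVIE] [TIMEX]?" than to the pattern about movies in a destination, even though only the latter asks for the same information.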
Query generation service
• produces a SPARQL query that can be used to answer the question
• has a list of question templates with their associated SPARQL queries
• relies on the entailment engine to determine which of the question patterns entail the same meaning as the user question
• fills in the slots of the question patterns
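Slot filling can be pictured as substituting the annotated values into the query stored with the matched pattern. In the sketch below the `PROCEDURES` table, the `q:` prefix and its URI, the shape of the graph and the `isInDestination` property are all hypothetical; only some property names (hasEventContent, isInSite, name) are borrowed from the ontology diagram, and the temporal restriction is left out.

```python
from string import Template

# Hypothetical retrieval procedure; prefix, graph shape and isInDestination
# are illustrative assumptions, not the project's actual query.
PROCEDURES = {
    "What movies are on in [DESTINATION] [TIMEX]?": Template(
        "PREFIX q: <http://example.org/qallme#>\n"
        "SELECT ?title WHERE {\n"
        "  ?show q:hasEventContent ?movie ; q:isInSite ?cinema .\n"
        "  ?movie q:name ?title .\n"
        '  ?cinema q:isInDestination "$DESTINATION" .\n'
        "}"
    ),
}


def generate_query(pattern, slot_values):
    # substitute() raises KeyError if a slot value is missing.
    return PROCEDURES[pattern].substitute(slot_values)
```

Values are inserted verbatim here; a real service would need to escape them before building the query string.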
Example
User question (T): What movie can I see tonight in Wolverhampton? → What movie can I see [TIMEX] in [DESTINATION]?
List of patterns (H):
• Who is the director of [MOVIE]?
• Where can I see [MOVIE] [TIMEX]?
• What movies are on in [DESTINATION] [TIMEX]?
• What is the address of [CINEMA]?
• . . .
Select the retrieval pattern associated with the question "What movies are on in Wolverhampton tonight?"
Answer Pool service
• takes the SPARQL query generated by the query generator and extracts the answer
• SPARQL is a query language for accessing RDF graphs, developed by the W3C RDF Data Access Working Group
• SPARQL provides interoperability between languages
Cross-lingual QA
• the QALL-ME tourism prototype is designed to allow both monolingual and cross-lingual QA
• relevant web services are activated depending on the source and target language
• user scenario: Romanian tourist in the UK who wants to find out more about the movies in Wolverhampton
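Activating services by language pair can be pictured as a look-up in a registry. The sketch below is one possible reading of that idea, not the prototype's actual configuration: the registry layout, the service names, and the split between question-side services (chosen by the language of the question) and the answer pool (chosen by the language of the data) are all assumptions.

```python
# Hypothetical service registry; the service names are placeholders.
SERVICES = {
    "ro": {"annotator": "ro-annotator", "entailment": "ro-entailment"},
    "en": {"annotator": "en-annotator", "entailment": "en-entailment"},
}
ANSWER_POOLS = {"en": "wolverhampton-answer-pool"}


def select_services(question_language, data_language):
    # Question analysis follows the user's language, data access the target's.
    chosen = dict(SERVICES[question_language])
    chosen["answer_pool"] = ANSWER_POOLS[data_language]
    return chosen
```

For the scenario above, the call would be `select_services("ro", "en")`: Romanian question analysis combined with the English-language data for Wolverhampton.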
Prototype for Romanian
• we wanted to find out how long it takes to develop a demo for Romanian
• components had to be adapted:
  • named entity and term annotators had to be trained on a different list of entities
  • a simple temporal annotator was implemented on the basis of the English one
  • the language-independent similarity-based entailment engine was used
  • the question patterns were translated into Romanian
  • the answer pool did not require any change
• the whole process took under one week
Romanian demo
http://qallme.wlv.ac.uk:8080/QALL-ME-web-demo/index.jsp
Conclusions
• multilinguality is a very important issue for the QALL-ME project
• the ontology constitutes the bridge between languages
• the QALL-ME framework can be used to quickly develop prototypes for other languages
Thank you!
References
Ou, Shiyan, Viktor Pekar, Constantin Orasan, Christian Spurk, and Matteo Negri. 2008. Development and alignment of a domain-specific ontology for question answering. In European Language Resources Association (ELRA), editor, Proceedings of the Sixth International Language Resources and Evaluation (LREC’08), Marrakech, Morocco, May 28 – 30.
Puscasu, Georgiana. 2004. A framework for temporal resolution. In Proceedings of the 4th Conference on Language Resources and Evaluation (LREC 2004), Lisbon, Portugal, May 26 – 28.
Varga, Andrea, Georgiana Puscasu, and Constantin Orasan. 2009. Identification of temporal expressions in the domain of tourism. In Knowledge Engineering: Principles and Techniques, volume 1, pages 29 – 32, Cluj-Napoca, Romania, July 2 – 4.