Question Answering over Linked Data: Challenges, Approaches & Trends (Tutorial @ ESWC 2014)
Question Answering over Linked Data
Challenges, Approaches & Trends
André Freitas and Christina Unger
Tutorial @ ESWC 2014
Understand the changes in the database landscape toward more heterogeneous data scenarios, especially the Semantic Web.
Understand how Question Answering (QA) fits into this new scenario.
Give you the fundamental pointers to develop your own QA system from the state of the art.
Focus on coverage.
2
Goal of this Talk
Motivation & Context
Big Data, Linked Data & QA
Challenges for QA over LD
The Anatomy of a QA System
QA over Linked Data (Case Studies)
Evaluation of QA over Linked Data
Do-it-yourself (DIY): Core Resources
Trends
Take-away Message
4
Outline
5
Motivation & Context
Humans come equipped with natural language communication capabilities.
Very natural way for humans to communicate information needs.
The archetypal AI system.
6
Why Question Answering?
A research field on its own.
Empirical bias: Focus on the development and evaluation of approaches and systems to answer questions over a knowledge base.
Multidisciplinary:
◦ Natural Language Processing
◦ Information Retrieval
◦ Knowledge Representation
◦ Databases
◦ Linguistics
◦ Artificial Intelligence
◦ Software Engineering
◦ ...
7
What is Question Answering?
8
What is Question Answering?
QA System
Knowledge Bases
Question: Who is the daughter of Bill Clinton married to?
Answer: Marc Mezvinsky
From the QA expert perspective:
◦ QA depends on mastering different semantic computing techniques.
QA as a ‘Semantic Pentathlon’
9
Keyword Search:
◦ The user still carries most of the effort in interpreting the data.
◦ Satisfying information needs may depend on multiple search operations.
◦ Answer-driven information access.
◦ Input: keyword search (typically the specification of simpler information needs).
◦ Output: documents, structured data.
QA:
◦ Delegates more ‘interpretation effort’ to the machine.
◦ Query-driven information access.
◦ Input: natural language query (the specification of complex information needs).
◦ Output: direct answer.
10
QA vs IR
Structured Queries:
◦ A priori user effort in understanding the schemas behind databases.
◦ Effort in mastering the syntax of a query language.
◦ Satisfying information needs may depend on multiple querying operations.
◦ Input: structured query.
◦ Output: data records, aggregations, etc.
QA:
◦ Delegates more ‘semantic interpretation effort’ to the machine.
◦ Input: natural language query.
◦ Output: direct natural language answer.
11
QA vs Database Querying
Keyword search:
◦ Simple information needs.
◦ Predictable search behavior.
◦ Vocabulary redundancy (large document collections, Web).
Structured queries:
◦ Demand for absolute precision/recall guarantees.
◦ Small & centralized schemas.
◦ More data volume/less semantic heterogeneity.
QA:
◦ Specification of complex information needs.
◦ Heterogeneous and schema-less data.
◦ More automated semantic interpretation.
12
When to use?
13
QA: Vision
14
QA: Reality (FB Graph Search)
15
QA: Reality (Watson)
16
QA: Reality (Siri)
QA: Reality (Google Knowledge Graph)
QA is usually associated with the delegation of more of the ‘interpretation effort’ to the machines.
QA supports the specification of more complex information needs.
QA, Information Retrieval and Databases are complementary perspectives.
18
To Summarize
19
Big Data, Linked Data
& QA
Big Data: More complete data-based picture of the world.
20
Big Data
Heterogeneous, complex and large-scale databases. Very large and dynamic “schemas”.
21
From Rigid Schemas to Schema-less
circa 2000: 10s-100s of attributes
circa 2013: 1,000s-1,000,000s of attributes
Multiple perspectives (conceptualizations) of the reality. Ambiguity, vagueness, inconsistency.
22
Multiple Interpretations
Franklin et al. (2005): From Databases to Dataspaces.
Helland (2011): If You Have Too Much Data, then “Good Enough” Is Good Enough.
Fundamental trends:
◦ Co-existence of heterogeneous data.
◦ Semantic best-effort behavior.
◦ Pay-as-you-go data integration.
◦ Co-existing query/search services.
23
Big Data & Dataspaces
24
Trend 1: Data size
Gantz & Reinsel, The Digital Universe in 2020 (2012).
25
Trend 2: Connectedness
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009).
Decentralization of content generation.
26
Trend 3
Eifrem, A NOSQL Overview And The Benefits Of Graph Database (2009)
Volume Velocity Variety
27
Big Data
Variety: the most interesting but usually neglected dimension.
The Semantic Web Vision: Software which is able to understand meaning (intelligent, flexible).
QA as a way to support this vision.
QA & Semantic Web
28
From which university did the wife of Barack Obama graduate?
With Linked Data we are still in the DB world
QA & Linked Data
29
With Linked Data we are still in the DB world
(but slightly worse)
QA & Linked Data
30
Addresses practical problems of data accessibility in a data heterogeneity scenario.
A fundamental part of the original Semantic Web vision.
31
QA & Linked Data
32
Challenges for QA over Linked Data
From questions to answers
33
Example: What is the currency of the Czech Republic?
SELECT DISTINCT ?uri WHERE { res:Czech_Republic dbo:currency ?uri . }
Main challenges:
The gap between natural language and linked data
Mapping natural language expressions to vocabulary elements (accounting for lexical and structural differences).
Handling meaning variations (e.g. ambiguous or vague expressions, anaphoric expressions).
34
URIs are language independent identifiers.
Their only actual connection to natural language is by the labels that are attached to them.
dbo:spouse rdfs:label “spouse”@en , “echtgenoot”@nl .
Labels, however, do not capture lexical variation:
wife of
husband of
married to
...
Mapping natural language expressions to vocabulary elements
35
Mapping natural language expressions to vocabulary elements
Which Greek cities have more than 1 million inhabitants?
SELECT DISTINCT ?uri WHERE { ?uri rdf:type dbo:City . ?uri dbo:country res:Greece . ?uri dbo:populationTotal ?p . FILTER (?p > 1000000)}
36
Often the conceptual granularity of language does not coincide with that of the data schema.
Mapping natural language expressions to vocabulary elements
When did Germany join the EU?
SELECT DISTINCT ?date WHERE { res:Germany dbp:accessioneudate ?date .}
Who are the grandchildren of Bruce Lee?
SELECT DISTINCT ?uri WHERE { res:Bruce_Lee dbo:child ?c . ?c dbo:child ?uri .}
37
In addition, there are expressions with a fixed, dataset-independent meaning.
Mapping natural language expressions to vocabulary elements
Who produced the most films?
SELECT DISTINCT ?uri WHERE { ?x rdf:type dbo:Film . ?x dbo:producer ?uri .}ORDER BY DESC(COUNT(?x))OFFSET 0 LIMIT 1
38
Different datasets usually follow different schemas, thus provide different ways of answering an information need.
Example:
Meaning variation: Ambiguity
39
The meaning of expressions like the verbs to be, to have, and prepositions of, with, etc. strongly depends on the linguistic context.
Meaning variation: Semantically light expressions
Which museum has the most paintings?
?museum dbo:exhibits ?painting .
Which country has the most caves?
?cave dbo:location ?country .
40
The number of non-English actors on the web is growing substantially.
◦ Accessing data.
◦ Creating and publishing data.
Semantic Web: In principle very well suited for multilinguality, as URIs are language-independent. But adding multilingual labels is not common practice (less than a quarter of the RDF literals have language tags, and most of those tags are in English).
Multilinguality
41
Requirement: Completeness and accuracy (wrong answers are worse than no answers).
In the context of the Semantic Web: QA systems need to deal with heterogeneous and imperfect data.
◦ Datasets are often incomplete.
◦ Different datasets sometimes contain duplicate information, often using different vocabularies even when talking about the same things.
◦ Datasets can also contain conflicting information and inconsistencies.
Data quality and data heterogeneity
42
Data is distributed among a large collection of interconnected datasets.
Distributed and linked datasets
Example: What are side effects of drugs used for the treatment of Tuberculosis?

SELECT DISTINCT ?x WHERE {
  disease:1154 diseasome:possibleDrug ?d1 .
  ?d1 a drugbank:drugs .
  ?d1 owl:sameAs ?d2 .
  ?d2 sider:sideEffect ?x .
}
43
Requirement: Real-time answers, i.e. low processing time.
In the context of the Semantic Web:
◦ Datasets are huge.
◦ There are a lot of distributed datasets that might be relevant for answering the question.
◦ Reported performance of current QA systems amounts to ~20-30 seconds per question (on one dataset).
Performance and scalability
44
Bridge the gap between natural languages and data.
Deal with incomplete, noisy and heterogeneous datasets.
Scale to a large number of huge datasets.
Use distributed and interlinked datasets.
Integrate structured and unstructured data.
Low maintainability costs (easily adaptable to new datasets and domains).
Summary: Challenges
45
The Anatomy of a QA System
46
Natural Language Interfaces (NLI)
◦ Input: natural language queries
◦ Output: either processed or unprocessed answers
 Processed: direct answers (QA).
 Unprocessed: database records, text snippets, documents.
47
NLI & QAs
NLI
QA
Categorization of questions and answers. Important for:
◦ Understanding the challenges before attacking the problem.
◦ Scoping the QA system.
Based on:
◦ Chin-Yew Lin: Question Answering.
◦ Farah Benamara: Question Answering Systems: State of the Art and Future Directions.
48
Basic Concepts & Taxonomy
The part of the question that says what is being asked:
◦ Wh-words: who, what, which, when, where, why, and how
◦ Wh-words + nouns, adjectives or adverbs: “which party …”, “which actress …”, “how long …”, “how tall …”.
49
Terminology: Question Phrase
Useful for distinguishing different processing strategies
◦ FACTOID:
 PREDICATIVE QUESTIONS: “Who was the first man in space?”, “What is the highest mountain in Korea?”, “How far is Earth from Mars?”, “When did the Jurassic Period end?”, “Where is Taj Mahal?”
 LIST: “Give me all cities in Germany.”
 SUPERLATIVE: “What is the highest mountain?”
 YES-NO: “Was Margaret Thatcher a chemist?”
50
Terminology: Question Type
Useful for distinguishing different processing strategies
◦ OPINION: “What do most Americans think of gun control?”
◦ CAUSE & EFFECT: “What is the most frequent cause for lung cancer?”
◦ PROCESS: “How do I make a cheese cake?”
◦ EXPLANATION & JUSTIFICATION: “Why did the revenue of IBM drop?”
◦ ASSOCIATION QUESTION: “What is the connection between Barack Obama and Indonesia?”
◦ EVALUATIVE OR COMPARATIVE QUESTIONS: “What is the difference between impressionism and expressionism?”
51
Terminology: Question Type
The class of object sought by the question:
 Abbreviation
 Entity: event, color, animal, plant, ...
 Description, Explanation & Justification: definition, manner, reason, ... (“How, why …”)
 Human: group, individual, ... (“Who …”)
 Location: city, country, mountain, ... (“Where …”)
 Numeric: count, distance, size, ... (“How many, how far, how long …”)
 Temporal: date, time, ... (“When …”)
52
Terminology: Answer Type
Question focus: the property or entity that is being sought by the question.
◦ “In which city was Barack Obama born?”
◦ “What is the population of Galway?”
Question topic: what the question is generally about.
◦ “What is the height of Mount Everest?” (geography, mountains)
◦ “Which organ is affected by the Meniere’s disease?” (medicine)
53
Terminology: Question Focus & Topic
Structure level:◦ Structured data (databases).◦ Semi-structured data (e.g. XML).◦ Unstructured data.
Data source:◦ Single dataset (centralized).◦ Enumerated list of multiple, distributed datasets.◦ Web-scale.
54
Terminology: Data Source Type
Domain Scope:◦ Open Domain◦ Domain specific
Data Type:◦ Text◦ Image◦ Sound◦ Video
Multi-modal QA
55
Terminology: Domain Type
Long answers◦ Definition/justification based.
Short answers ◦ Phrases.
Exact answers◦ Named entities, numbers, aggregate, yes/no.
56
Terminology: Answer Format
Relevance: The degree to which the answer addresses the user’s information needs.
Correctness: The degree to which the answer is factually correct.
Conciseness: The answer should not contain irrelevant information.
Completeness: The answer should be complete.
Simplicity: The answer should be easy to interpret.
Justification: Sufficient context should be provided to support the data consumer in determining the correctness of the answer.
Answer Quality Criteria
57
Right: The answer is correct and complete.
Inexact: The answer is incomplete or incorrect.
Unsupported: The answer does not have appropriate evidence/justification.
Wrong: The answer is not appropriate for the question.
Answer Assessment
58
Simple Extraction: Direct extraction of snippets from the original document(s) / data records.
Combination: Combines excerpts from multiple sentences, documents / multiple data records, databases.
Summarization: Synthesis from large texts / data collections.
Operational/functional: Depends on the application of functional operators.
Reasoning: Depends on the application of an inference process over the original data.
59
Answer Processing
Semantic Tractability (Popescu et al., 2003): Vocabulary overlap/distance between the query and the answer.
Answer Locality (Webber et al., 2003): Whether answer fragments are distributed across different document fragments/documents or datasets/dataset records.
Derivability (Webber et al., 2003): Whether the answer is explicit or implicit; the level of reasoning required.
Semantic Complexity: Level of ambiguity and discourse/data heterogeneity.
60
Complexity of the QA Task
61
High-level Components
Data pre-processing: Pre-processes the database data (includes indexing, data cleaning, feature extraction).
Question Analysis: Performs syntactic analysis and detects/extracts the core features of the question (NER, answer type, etc).
Data Matching: Matches terms in the question to entities in the data.
Query Construction: Generates structured query candidates considering the question-data mappings and the syntactic constraints in the query and in the database.
Scoring: Data matching and the query construction components output several candidates that need to be scored and ranked according to certain criteria.
Answer Retrieval & Extraction: Executes the query and extracts the natural language answer from the result set.
62
Main Components
64
QA over Linked
Data(Case Studies)
ORAKEL & Pythia (Cimiano et al., 2007; Unger & Cimiano, 2011)
◦ Ontology-specific question answering
TBSL (Unger et al., 2012)
◦ Template-based question answering
Aqualog & PowerAqua (Lopez et al., 2006)
◦ Querying on a Semantic Web scale
Treo (Freitas et al., 2011, 2014)
◦ Schema-agnostic querying using distributional semantics
IBM Watson (Ferrucci et al., 2010)
◦ Large-scale evidence-based model for QA
65
QA Systems (Case Studies)
QuestIO & Freya (Damljanovic et al. 2010)
QAKIS (Cabrio et al. 2012)
Yahya et al., 2013
66
QA Systems (Not covered)
67
ORAKEL (Cimiano et al., 2007) & Pythia (Unger & Cimiano, 2011)
Ontology-specific question answering

Key contributions:
◦ Using ontologies to interpret user questions.
◦ Relies on a deep linguistic analysis that returns semantic representations aligned to the ontology vocabulary and structure.

Evaluation: Geobase
68
ORAKEL (Cimiano et al. 2007) &Pythia (Unger & Cimiano 2011)
69
ORAKEL (Cimiano et al. 2007) &Pythia (Unger & Cimiano 2011)
Ontology-based QA:
◦ Ontologies play a central role in interpreting user questions.
◦ Output is a meaning representation that is aligned to the ontology underlying the dataset that is queried.
◦ Ontological knowledge is used for drawing inferences, e.g. for resolving ambiguities.

Grammar-based QA:
◦ Relies on linguistic grammars that assign a syntactic and semantic representation to lexical units.
◦ Advantage: can deal with questions of arbitrary complexity.
◦ Drawback: brittleness (fails if a question cannot be parsed because expressions or constructs are not covered by the grammar).
70
Grammars
Ontology-independent entries
◦ mostly function words: quantifiers (some, every, two), wh-words (who, when, where, which, how many), negation (not)
◦ manually specified and re-usable for all domains
Ontology-specific entries
◦ content words and phrases corresponding to concepts and properties in the ontology
◦ automatically generated from an ontology lexicon
Aim: capture rich and structured linguistic information about how ontology elements are lexicalized in a particular language
Ontology Lexicon
lemon (Lexicon Model for Ontologies): http://lemon-model.net
◦ meta-model for describing ontology lexica with RDF
◦ declarative (abstracting from specific syntactic and semantic theories)
◦ separation of lexicon and ontology
71
Semantics by reference:◦ The meaning of lexical entries is specified by
pointing to elements in the ontology.
Example:
Ontology Lexicon
72
Architecture
73
Which cities have more than three universities?

Example

SELECT DISTINCT ?x WHERE {
  ?x rdf:type dbo:City .
  ?y rdf:type dbo:University .
  ?y dbo:city ?x .
}
GROUP BY ?x
HAVING (COUNT(?y) > 3)
74
75
TBSL (Unger et al., 2012)
Template-based question answering

Key contributions:
◦ Constructs a query template that directly mirrors the linguistic structure of the question.
◦ Instantiates the template by matching natural language expressions with ontology concepts.

Evaluation: QALD 2012
76
TBSL (Unger et al., 2012)
In order to understand a user question, we need to understand:
Motivation
The words (dataset-specific):
Abraham Lincoln → res:Abraham_Lincoln
died in → dbo:deathPlace
The semantic structure (dataset-independent)who → SELECT ?x WHERE { … }
the most N → ORDER BY DESC(COUNT(?N)) LIMIT 1more than i N → HAVING COUNT(?N) > i
77
Goal: An approach that combines both an analysis of the semantic structure and a mapping of words to URIs.
Two-step approach:
◦ 1. Template generation
 Parse the question to produce a SPARQL template that directly mirrors the structure of the question, including filters and aggregation operations.
◦ 2. Template instantiation
 Instantiate the SPARQL template by matching natural language expressions with ontology concepts, using statistical entity identification and predicate detection.
Template-based QA
78
Example: Who produced the most films?
SPARQL template:
SELECT DISTINCT ?x WHERE { ?y rdf:type ?c . ?y ?p ?x .}ORDER BY DESC(COUNT(?y))OFFSET 0 LIMIT 1
?c CLASS [films]?p PROPERTY [produced]
Instantiations:
?c = <http://dbpedia.org/ontology/Film>?p = <http://dbpedia.org/ontology/producer>
79
Architecture
80
Step 1: Template generationLinguistic processing
81
1. Natural language question is tagged with part-of-speech information.
2. Based on POS tags, grammar entries are built on the fly. ◦ Grammar entries are pairs of:
tree structures (Lexicalized Tree Adjoining Grammar) semantic representations (ext. Discourse Representation
Structures)
3. These lexical entries, together with domain-independent lexical entries, are used for parsing the question (cf. Pythia).
4. The resulting semantic representation is translated into a SPARQL template.
Step 1: Template generationLinguistic processing
82
Domain-independent: who, the most Domain-dependent: produced/VBD, films/NNS
Example: Who produced the most films?
SPARQL template 1:
SELECT DISTINCT ?x WHERE { ?x ?p ?y . ?y rdf:type ?c .}ORDER BY DESC(COUNT(?y)) LIMIT 1
?c CLASS [films]?p PROPERTY [produced]
SPARQL template 2:
SELECT DISTINCT ?x WHERE { ?x ?p ?y .}ORDER BY DESC(COUNT(?y)) LIMIT 1
?p PROPERTY [films]
83
Step 2: Template instantiationEntity identification and predicate detection
84
1. For resources and classes, a generic approach to entity detection is applied:
◦ Identify synonyms of the label using WordNet.
◦ Retrieve entities with a label similar to the slot label based on string similarities (trigram, Levenshtein and substring similarity).
2. For property labels, the label is additionally compared to natural language expressions stored in the BOA pattern library.
3. The highest-ranking entities are returned as candidates for filling the query slots.
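As an illustration of the string-similarity part of this step, a character-trigram similarity can be sketched as follows. The padding and the Dice-coefficient combination are common choices for trigram matching, not necessarily TBSL's exact implementation:

```python
def trigrams(s):
    """Character trigrams of a padded, lower-cased string."""
    s = "  " + s.lower() + "  "
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_similarity(a, b):
    """Dice coefficient over the two trigram sets."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))
```

Under this measure, a slot label like "films" matches the label of dbo:Film more strongly than that of dbo:FilmFestival, which is the behavior the candidate ranking relies on.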
Step 2: Template instantiationEntity identification and predicate detection
85
Example: Who produced the most films?
?c CLASS [films]
<http://dbpedia.org/ontology/Film><http://dbpedia.org/ontology/FilmFestival>...
?p PROPERTY [produced]
<http://dbpedia.org/ontology/producer><http://dbpedia.org/property/producer>
<http://dbpedia.org/ontology/wineProduced>
86
Step 3: Query ranking and selection
87
1. Every entity receives a score considering string similarity and prominence.
2. The score of a query is then computed as the average of the scores of the entities used to fill its slots.
3. In addition, type checks are performed:
◦ For all triples ?x rdf:type <class>, all query triples ?x p e and e p ?x are checked w.r.t. whether the domain/range of p is consistent with <class>.
4. Of the remaining queries, the one with the highest score that returns a result is chosen.
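A minimal sketch of this scoring scheme; the linear weighting inside `entity_score` is an assumption for illustration, since the exact combination of string similarity and prominence is not spelled out here:

```python
def entity_score(string_similarity, prominence, alpha=0.6):
    """Combine string similarity and prominence into one entity score.
    The linear weighting (alpha) is an illustrative assumption."""
    return alpha * string_similarity + (1 - alpha) * prominence

def query_score(slot_entity_scores):
    """A query's score is the average of the scores of the entities
    used to fill its slots."""
    return sum(slot_entity_scores) / len(slot_entity_scores)
```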
Step 3: Query ranking and selection
88
Example: Who produced the most films?
SELECT DISTINCT ?x WHERE { ?x <http://dbpedia.org/ontology/producer> ?y . ?y rdf:type <http://dbpedia.org/ontology/Film> .}ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.76
SELECT DISTINCT ?x WHERE { ?x <http://dbpedia.org/ontology/producer> ?y . ?y rdf:type <http://dbpedia.org/ontology/FilmFestival>.}ORDER BY DESC(COUNT(?y)) LIMIT 1
Score: 0.60
89
The created template structure does not always coincide with how the data is actually modelled.
Considering all possibilities of how the data could be modelled leads to a large number of templates (and even more queries) for a single question.
Main drawback
90
91
Aqualog & PowerAqua (Lopez et al., 2006)
Querying on a Semantic Web scale

Key contributions:
◦ Pioneering work on QA over Semantic Web data.
◦ Semantic similarity mapping.

Terminological Matching:
◦ WordNet-based
◦ Ontology-based
◦ String similarity
◦ Sense-based similarity matcher

Evaluation: QALD (2011). Extends the AquaLog system.
92
PowerAqua (Lopez et al. 2006)
93
WordNet
Two words are strongly similar if any of the following holds:
◦ 1. They have a synset in common (e.g. “human” and “person”).
◦ 2. One word is a hypernym/hyponym in the taxonomy of the other word.
◦ 3. There exists an allowable “is-a” path connecting a synset associated with each word.
◦ 4. If any of the previous cases holds and the definition (gloss) of one of the synsets of the word (or its direct hypernyms/hyponyms) includes the other word as one of its synonyms, they are considered highly similar.
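The first condition (a shared synset) can be illustrated with a toy synset inventory; the SYNSETS dictionary below is a hand-made stand-in for WordNet, for illustration only:

```python
# Toy synset inventory standing in for WordNet (illustration only).
SYNSETS = {
    "human":  [{"human", "person", "individual"}],
    "person": [{"human", "person", "individual"}],
    "spouse": [{"spouse", "partner"}],
    "wife":   [{"wife", "married woman"}],
}

def strongly_similar(w1, w2, synsets=SYNSETS):
    """Condition 1 of the matcher: the two words share a synset."""
    return any(s1 & s2
               for s1 in synsets.get(w1, [])
               for s2 in synsets.get(w2, []))
```

With this inventory, "human" and "person" come out strongly similar, while "wife" and "spouse" would need one of the other conditions (hypernymy or an "is-a" path) to match.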
94
Sense-based similarity matcher
Lopez et al. 2006
95
PowerAqua (Lopez et al. 2006)
96
Treo (Freitas et al. 2011, 2014)
Schema-agnostic querying using distributional semantics

Key contributions:
◦ Distributional semantic relatedness matching model.
◦ Compositional-distributional model for QA.

Terminological Matching:
◦ Explicit Semantic Analysis (ESA)
◦ String similarity + node cardinality

Evaluation: QALD (2011)
97
Treo (Freitas et al. 2011, 2014)
Treo (Irish): Direction
98
Simple Queries (Video)
99
More Complex Queries (Video)
100
Vocabulary Search (Video)
101
Treo Answers Jeopardy Queries (Video)
102
Semantics for a Complex World
• Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.
• If these idealizations are removed, it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.
Sahlgren, 2013
Formal World Real World
103
Baroni et al. 2013
Distributional Semantics“Words occurring in similar (linguistic) contexts are semantically related.”
If we can equate meaning with context, we can simply record the contexts in which a word occurs in a collection of texts (a corpus).
This can then be used as a surrogate of its semantic representation.
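As a toy illustration of this context-recording idea: the corpus, window size and cosine measure below are illustrative assumptions; real distributional models (such as ESA, used later by Treo) are built from very large corpora like Wikipedia:

```python
from collections import Counter
from math import sqrt

# Toy corpus; real distributional models use very large corpora.
corpus = [
    "the daughter married her husband in june",
    "the spouse of the president is the first lady",
    "his wife and husband terms denote a spouse",
    "the child is the daughter of the couple",
]

def context_vector(word, sentences, window=2):
    """Record the words co-occurring with `word` within a +/- `window`."""
    vec = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vec[tokens[j]] += 1
    return vec

def cosine(u, v):
    """Cosine of the angle between two sparse context vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

rel = cosine(context_vector("daughter", corpus),
             context_vector("spouse", corpus))
```

`rel` is then a (corpus-dependent) semantic relatedness score between 0 and 1, which is exactly the kind of ranking function used in the next slides.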
104
Distributional Semantic Model
(Figure: words such as “child”, “spouse” and “husband” plotted as vectors over context dimensions c1 ... cn; each coordinate is a function of the number of times the word occurs in that context. Commonsense knowledge is captured here.)
105
Semantic Relatedness
(Figure: semantic relatedness as the angle θ between word vectors such as “child”, “spouse” and “husband”; it works as a semantic ranking function.)
106
Approach Overview
Query Planner
Ƭ-Space
Large-scale unstructured data
Commonsense knowledge
Database
Distributional semantics
Core semantic approximation &
composition operations
Query AnalysisQuery Query Features
Query Plan
107
Approach Overview
Query Planner
Ƭ-Space
Wikipedia
Commonsense knowledge
RDF
Explicit Semantic Analysis
Core semantic approximation &
composition operations
Query AnalysisQuery Query Features
Query Plan
108
Ƭ-Space
e
p
r
109
Ƭ-Space The vector space is segmented by the instances
110
Core OperationsQuery
111
Core Operations
Search & Composition Operations
Query
112
Search and Composition Operations
Instance search
◦ Proper nouns.
◦ String similarity + node cardinality.
Class (unary predicate) search
◦ Nouns, adjectives and adverbs.
◦ String similarity + distributional semantic relatedness.
Property (binary predicate) search
◦ Nouns, adjectives, verbs and adverbs.
◦ Distributional semantic relatedness.
Navigation
Extensional expansion
◦ Expands the instances associated with a class.
Operator application
◦ Aggregations, conditionals, ordering, position.
Disjunction & conjunction
Disambiguation dialog (instance, predicate)
113
Core Principles
Minimize the impact of Ambiguity, Vagueness, Synonymy.
Address the simplest matchings first (heuristics).
Semantic Relatedness as a primitive operation.
Distributional semantics as commonsense knowledge.
114
Transform natural language queries into triple patterns.
“Who is the daughter of Bill Clinton married to?”
Query Pre-Processing (Question Analysis)
115
Step 1: POS Tagging
◦ Who/WP
◦ is/VBZ
◦ the/DT
◦ daughter/NN
◦ of/IN
◦ Bill/NNP
◦ Clinton/NNP
◦ married/VBN
◦ to/TO
◦ ?/.
Query Pre-Processing (Question Analysis)
116
Step 2: Core Entity Recognition
◦ Rule-based: POS tags + TF/IDF

Who is the daughter of Bill Clinton married to? (PROBABLY AN INSTANCE)
Query Pre-Processing (Question Analysis)
117
Step 3: Determine answer type◦ Rules-based.
Who is the daughter of Bill Clinton married to? (PERSON)
Query Pre-Processing (Question Analysis)
118
Step 4: Dependency parsing
◦ dep(married-8, Who-1)
◦ auxpass(married-8, is-2)
◦ det(daughter-4, the-3)
◦ nsubjpass(married-8, daughter-4)
◦ prep(daughter-4, of-5)
◦ nn(Clinton-7, Bill-6)
◦ pobj(of-5, Clinton-7)
◦ root(ROOT-0, married-8)
◦ xcomp(married-8, to-9)
Query Pre-Processing (Question Analysis)
119
Step 5: Determine Partial Ordered Dependency Structure (PODS)
◦ Rule-based:
 Remove stop words.
 Merge words into entities.
 Reorder the structure from the core entity position.
Query Pre-Processing (Question Analysis)
Bill Clinton daughter married to
(INSTANCE)
Person
ANSWER TYPE
QUESTION FOCUS
120
Step 5: Determine Partial Ordered Dependency Structure (PODS)
◦ Rule-based:
 Remove stop words.
 Merge words into entities.
 Reorder the structure from the core entity position.
Query Pre-Processing (Question Analysis)
Bill Clinton daughter married to
(INSTANCE)
Person
(PREDICATE) (PREDICATE) Query Features
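The three rules above can be roughly sketched on the running example. The stop-word list and the entity gazetteer below are illustrative assumptions, and the real reordering follows the dependency structure rather than this simplified "core entity first" rule:

```python
STOP_WORDS = {"who", "is", "the", "of", "?"}
# Assumed gazetteer entry produced by the core entity recognition step.
ENTITIES = {("bill", "clinton"): "Bill Clinton"}

def pods(tokens):
    """Build a Partial Ordered Dependency Structure: merge multi-word
    entities, drop stop words, then put the core entity first."""
    merged, i = [], 0
    while i < len(tokens):
        pair = (tokens[i].lower(), tokens[i + 1].lower()) if i + 1 < len(tokens) else None
        if pair in ENTITIES:
            merged.append(ENTITIES[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    kept = [t for t in merged
            if t in ENTITIES.values() or t.lower() not in STOP_WORDS]
    core = next((t for t in kept if t in ENTITIES.values()), None)
    if core is None:
        return kept
    return [core] + [t for t in kept if t != core]

pods("Who is the daughter of Bill Clinton married to ?".split())
# → ["Bill Clinton", "daughter", "married", "to"]
```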
121
Map query features into a query plan. A query plan contains a sequence of:
◦ Search operations.◦ Navigation operations.
Query Planning
(INSTANCE) (PREDICATE) (PREDICATE) Query Features
(1) INSTANCE SEARCH (Bill Clinton)
(2) DISAMBIGUATE ENTITY TYPE
(3) GENERATE ENTITY FACETS
(4) p1 <- SEARCH RELATED PREDICATE (Bill Clinton, daughter)
(5) e1 <- GET ASSOCIATED ENTITIES (Bill Clinton, p1)
(6) p2 <- SEARCH RELATED PREDICATE (e1, married to)
(7) e2 <- GET ASSOCIATED ENTITIES (e1, p2)
(8) POST PROCESS (Bill Clinton, e1, p1, e2, p2)
Query Plan
122
Core Entity Search
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked Data:
Entity Search
123
Distributional Semantic Search
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked Data:
:Chelsea_Clinton
:child
:Baptists:religion
:Yale_Law_School
:almaMater
...(PIVOT ENTITY)
(ASSOCIATED TRIPLES)
124
Distributional Semantic Search
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked Data:
:Chelsea_Clinton
:child
:Baptists:religion
:Yale_Law_School
:almaMater
...
sem_rel(daughter,child)=0.054
sem_rel(daughter,religion)=0.004
sem_rel(daughter,alma mater)=0.001
Which properties are semantically related to ‘daughter’?
125
Distributional Semantic Search
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked Data:
:Chelsea_Clinton
:child
126
Computation of a measure of “semantic proximity” between two terms.
Allows a semantic approximate matching between query terms and dataset terms.
It supports a commonsense reasoning-like behavior based on the knowledge embedded in the corpus.
Distributional Semantic Relatedness
127
Use distributional semantics to semantically match query terms to predicates and classes.
Distributional principle: Words that co-occur together tend to have related meaning.
◦ Allows the creation of a comprehensive semantic model from unstructured text.
◦ Based on statistical patterns over large amounts of text.
◦ No human annotations.
Distributional semantics can be used to compute a semantic relatedness measure between two words.
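The predicate-matching step can then be sketched as a ranking over the pivot entity's outgoing properties. Here SEM_REL stubs the distributional relatedness measure (e.g. ESA) with the scores shown on the slide, and the threshold is an illustrative assumption:

```python
# sem_rel stands for a distributional relatedness measure (e.g. ESA);
# here it is stubbed with the scores shown on the slide.
SEM_REL = {
    ("daughter", "child"): 0.054,
    ("daughter", "religion"): 0.004,
    ("daughter", "alma mater"): 0.001,
}

def best_predicate(query_term, candidate_labels, threshold=0.01):
    """Rank the pivot entity's predicate labels by semantic relatedness
    to the query term; the threshold is an illustrative assumption."""
    scored = sorted(((SEM_REL.get((query_term, c), 0.0), c)
                     for c in candidate_labels), reverse=True)
    top_score, top_label = scored[0]
    return top_label if top_score >= threshold else None

best_predicate("daughter", ["child", "religion", "alma mater"])  # → "child"
```

This is how "daughter" resolves to :child even though the two strings share no characters.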
Distributional Semantic Search
128
Distributional Semantic Search
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked Data:
:Chelsea_Clinton
:child
(PIVOT ENTITY)
129
Distributional Semantic Search
Bill Clinton daughter married to Person
:Bill_Clinton
Query:
Linked Data:
:Chelsea_Clinton
:child
:Marc_Mezvinsky
:spouse
130
Architecture
133
Index Structure
134
Results
135
Post-Processing
136
What is the highest mountain?
Second Query Example
(CLASS) (OPERATOR) Query Features
mountain - highest PODS
137
Entity Search
Mountain highest
:Mountain
Query:
Linked Data:
:typeOf
(PIVOT ENTITY)
138
Extensional Expansion
Mountain highest
:Mountain
Query:
Linked Data:
:Everest
:typeOf
(PIVOT ENTITY)
:K2:typeOf
...
139
Distributional Semantic Matching
Mountain highest
:Mountain
Query:
Linked Data:
:Everest:typeOf
(PIVOT ENTITY)
:K2:typeOf
...
:elevation
:location...
:deathPlaceOf
140
Get all numerical values
Mountain highest
:Mountain
Query:
Linked Data:
:Everest:typeOf
(PIVOT ENTITY)
:K2:typeOf
...
:elevation
:elevation
8848 m
8611 m
141
Apply operator functional definition
Mountain highest
:Mountain
Query:
Linked Data:
:Everest
:typeOf
(PIVOT ENTITY)
:K2:typeOf
...
:elevation
:elevation
8848 m
8611 m
SORT
TOP_MOST
142
Results
143
Semantic approximation in databases (as in any IR system): semantic best-effort.
Need some level of user disambiguation, refinement and feedback.
As we move in the direction of semantic systems, we should expect the need for principled dialog mechanisms (as in human communication).
Pull the user interaction back into the system.
From Exact to Approximate
144
From Exact to Approximate
145
User Feedback
146
147
IBM Watson (Ferrucci et al., 2010)
Large-scale evidence-based model for QA

Key contributions:
◦ Evidence-based QA system.
◦ Complex and high-performance QA pipeline.
◦ The system uses more than 50 scoring components that produce scores ranging from probabilities and counts to categorical features.
◦ Major cultural impact:
 Before: QA as an AI vision, an academic exercise.
 After: QA as an attainable software architecture in the short term.
Evaluation: Jeopardy! Challenge
148
IBM Watson
Lexical Answer Type
Ferrucci et al. 2010
149
Question analysis: includes shallow and deep parsing, extraction of logical forms, semantic role labelling, coreference resolution, relations extraction, named entity recognition, among others.
Question decomposition: decomposition of the question into separate phrases, which will generate constraints that need to be satisfied by evidence from the data.
Query processing
150
Query processing
Ferrucci et al. 2010
Question
Question & Topic Analysis
QuestionDecomposition
151
Hypothesis generation:
◦ Primary search
   Document and passage retrieval; SPARQL queries are used over triple stores.
◦ Candidate answer generation (maximizing recall)
   Information extraction techniques are applied to the search results to generate candidate answers.

Soft filtering: application of lightweight (less resource-intensive) scoring algorithms to a larger set of initial candidates, to prune the list of candidates before the more intensive scoring components.
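The soft-filtering idea can be sketched as a threshold over a cheap score. The scorer, the threshold value and the candidate names below are hypothetical illustrations, not Watson's actual components:

```python
# Minimal sketch of soft filtering: a lightweight score prunes the
# candidate list before the expensive deep-scoring stage runs.
def soft_filter(candidates, cheap_score, threshold=0.2):
    """Keep only candidates whose cheap score clears the threshold."""
    return [c for c in candidates if cheap_score(c) >= threshold]

# Example with precomputed lightweight scores per candidate.
scores = {"Jupiter": 0.9, "Saturn": 0.6, "Mercury": 0.1}
kept = soft_filter(scores, scores.get)
# kept == ["Jupiter", "Saturn"]
```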
Hypothesis Generation
152
Hypothesis Generation

[Architecture diagram (Ferrucci et al. 2010): Question → Question & Topic Analysis → Question Decomposition → Hypothesis Generation (Primary Search over the Answer Sources, then Candidate Answer Generation) → Hypothesis and Evidence Scoring.]
153
Hypothesis and evidence scoring:
◦ Supporting evidence retrieval
   Seeks additional evidence for each candidate answer from the data sources.
◦ Deep evidence scoring
   Determines the degree of certainty that the retrieved evidence supports the candidate answers. Scores are then combined into an overall evidence profile, which groups individual features into aggregate evidence dimensions.
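One illustrative evidence scorer is term overlap between the question and a supporting passage. This single hypothetical scorer only sketches the idea; Watson combines more than 50 scorers of this kind into an evidence profile:

```python
# Toy deep-evidence scorer: the fraction of question terms that also
# occur in a retrieved supporting passage.
def overlap_score(question: str, passage: str) -> float:
    q = set(question.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0

# 2 of the 3 question terms ("discovered", "radium") appear in the passage.
score = overlap_score("Who discovered radium",
                      "Marie Curie discovered radium in 1898")
```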
Evidence Scoring
154
Evidence Scoring

[Architecture diagram (Ferrucci et al. 2010): the pipeline is extended with Hypothesis and Evidence Scoring; for each candidate answer, Evidence Retrieval runs over the Evidence Sources, followed by Deep Evidence Scoring and Answer Scoring.]
155
Answer merging: merges answer candidates (hypotheses) with different surface forms but related content, combining their scores.

Ranking and confidence estimation: ranks the hypotheses and estimates their confidence based on the scores, using machine learning approaches over a training set. Multiple trained models cover different question types.
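Merging and ranking can be sketched as follows. The candidate names, feature weights, canonicalization map and the max-based score combination are all hypothetical simplifications; Watson learns the combination with trained models per question type:

```python
# Sketch of answer merging + ranking: candidates with different surface
# forms but the same referent are merged, then ranked by a weighted sum.
candidates = [
    ("J.F.K.",          {"passage": 0.7, "type": 0.9}),
    ("John F. Kennedy", {"passage": 0.8, "type": 0.9}),
    ("R. Nixon",        {"passage": 0.3, "type": 0.9}),
]
canonical = {"J.F.K.": "John F. Kennedy",
             "John F. Kennedy": "John F. Kennedy",
             "R. Nixon": "Richard Nixon"}
weights = {"passage": 0.6, "type": 0.4}

# Merge: group by canonical form, combining scores (here: feature-wise max).
merged = {}
for surface, scores in candidates:
    agg = merged.setdefault(canonical[surface], {f: 0.0 for f in weights})
    for feature, s in scores.items():
        agg[feature] = max(agg[feature], s)

# Rank: weighted sum of the merged feature scores as a confidence estimate.
confidence = {k: sum(weights[f] * v for f, v in agg.items())
              for k, agg in merged.items()}
ranked = sorted(confidence, key=confidence.get, reverse=True)
# ranked[0] == "John F. Kennedy"
```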
Answer Generation
156
Answer Generation

[Architecture diagram (Ferrucci et al. 2010; Farrell, 2011): the full pipeline ends with Synthesis and Final Confidence Merging & Ranking, where learned models help combine and weigh the evidence, producing the final Answer & Confidence.]
157
www.medbiq.org/sites/default/files/presentations/2011/Farrell.pp
Architecture

[Architecture diagram (Ferrucci et al. 2010; Farrell, 2011): from one Question, the system generates 100s of possible answers under multiple interpretations, consults 100s of sources, gathers 1000s of pieces of evidence, and computes 100,000s of scores from many simultaneous text analysis algorithms, before merging and ranking them into an Answer & Confidence.]
158
UIMA for interoperability
UIMA-AS for scale-out and speed
Evidence Profile
Ferrucci et al. 2010

159
160
Evaluation of QA over Linked Data
Test Collection◦ Questions◦ Datasets◦ Answers (Gold-standard)
Evaluation Measures
Evaluation Campaigns
161
Measures how complete the answer set is: the fraction of relevant instances that are retrieved.

Which are the Jovian planets in the Solar System?
◦ Returned answers: Mercury, Jupiter, Saturn

Recall

Gold standard:
– Jupiter
– Saturn
– Neptune
– Uranus

Recall = 2/4 = 0.5
162
Measures how accurate the answer set is: the fraction of retrieved instances that are relevant.

Which are the Jovian planets in the Solar System?
◦ Returned answers: Mercury, Jupiter, Saturn

Precision

Gold standard:
– Jupiter
– Saturn
– Neptune
– Uranus

Precision = 2/3 = 0.67
163
Measures the ranking quality. The reciprocal rank of a query is 1/r, where r is the rank at which the system returns the first relevant result; MRR averages this value over all queries.

Mean Reciprocal Rank (MRR)

Which are the Jovian planets in the Solar System?
Returned answers:
– Mercury
– Jupiter
– Saturn

Gold standard:
– Jupiter
– Saturn
– Neptune
– Uranus

rr = 1/2 = 0.5
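The three worked examples above can be written as a few lines of Python; the values match the slides (recall 2/4, precision 2/3, reciprocal rank 1/2):

```python
# Recall, precision and reciprocal rank for the Jovian-planets example.
returned = ["Mercury", "Jupiter", "Saturn"]
gold = {"Jupiter", "Saturn", "Neptune", "Uranus"}

relevant = [a for a in returned if a in gold]
precision = len(relevant) / len(returned)   # 2/3
recall = len(relevant) / len(gold)          # 2/4 = 0.5

# Reciprocal rank of the first relevant result (Jupiter, at rank 2).
rr = next((1.0 / (i + 1) for i, a in enumerate(returned) if a in gold), 0.0)
# MRR would average rr over a set of queries.
```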
164
Query execution time
Indexing time
Index size
Dataset adaptation effort (indexing time)
Semantic enrichment/disambiguation
◦ # of operations / time
Additional measures
Question Answering over Linked Data (QALD-CLEF)
INEX Linked Data Track
BioASQ
SemSearch
Test Collections
168
QALD
http://www.sc.cit-ec.uni-bielefeld.de/qald/
169
QALD is a series of evaluation campaigns on question answering over linked data.
◦ QALD-1 (ESWC 2011)
◦ QALD-2 as part of the workshop Interacting with Linked Data (ESWC 2012)
◦ QALD-3 (CLEF 2013)
◦ QALD-4 (CLEF 2014)
It is aimed at all kinds of systems that mediate between a user, expressing his or her information need in natural language, and semantic data.
QALD
170
QALD-4 is part of the Question Answering track at CLEF 2014:
http://nlp.uned.es/clef-qa/

Tasks:
◦ 1. Multilingual question answering over DBpedia
◦ 2. Biomedical question answering on interlinked data
◦ 3. Hybrid question answering
QALD-4
171
Task: Given a natural language question or keywords, either retrieve the correct answer(s) from a given RDF repository, or provide a SPARQL query that retrieves these answer(s).

◦ Dataset: DBpedia 3.9 (with multilingual labels)
◦ Questions: 200 training + 50 test
◦ Seven languages: English, Spanish, German, Italian, French, Dutch, Romanian
Task 1: Multilingual question answering
172
Example
<question id = "36" answertype = "resource" aggregation = "false" onlydbo = "true" >
Through which countries does the Yenisei river flow?Durch welche Länder fließt der Yenisei?¿Por qué países fluye el río Yenisei?...
PREFIX res: <http://dbpedia.org/resource/>PREFIX dbo: <http://dbpedia.org/ontology/>SELECT DISTINCT ?uri WHERE { res:Yenisei_River dbo:country ?uri .}
173
Datasets: SIDER, Diseasome, Drugbank
Questions: 25 training + 25 test
Questions require the integration of information from different datasets.

Example: What are the side effects of drugs used for Tuberculosis?

Task 2: Biomedical question answering on interlinked data

SELECT DISTINCT ?x
WHERE {
  disease:1154 diseasome:possibleDrug ?v2 .
  ?v2 a drugbank:drugs .
  ?v3 owl:sameAs ?v2 .
  ?v3 sider:sideEffect ?x .
}
174
Dataset: DBpedia 3.9 (with English abstracts)
Questions: 25 training + 10 test
Questions require both structured data and free text from the abstract to be answered.

Example: Give me the currencies of all G8 countries.

Task 3: Hybrid question answering

SELECT DISTINCT ?uri
WHERE {
  ?x text:"member of" text:"G8" .
  ?x dbo:currency ?uri .
}
175
Focuses on the combination of textual and structured data.

Datasets:
◦ English Wikipedia (MediaWiki XML format)
◦ DBpedia 3.8 & YAGO2 (RDF)
◦ Links among the Wikipedia, DBpedia 3.8, and YAGO2 URIs

Tasks:
◦ Ad-hoc Task: return a ranked list of results in response to a search topic that is formulated as a keyword query (144 search topics).
◦ Jeopardy Task: investigate retrieval techniques over a set of natural-language Jeopardy clues (105 search topics: 74 (2012) + 31 (2013)).
INEX Linked Data Track (2013)
https://inex.mmci.uni-saarland.de/tracks/lod/
181
INEX Linked Data Track (2013)
182
Focuses on entity search over Linked Data datasets.

Datasets:
◦ Sample of Linked Data crawled from publicly available sources (based on the Billion Triple Challenge 2009).

Tasks:
◦ Entity Search: queries that refer to one particular entity. A tiny sample of the Yahoo! search query log.
◦ List Search: the goal of this track is to select objects that match particular criteria. These queries have been hand-written by the organizing committee.
SemSearch Challenge
http://semsearch.yahoo.com/datasets.php#
183
List Search queries:
◦ republics of the former Yugoslavia
◦ ten ancient Greek city kingdoms of Cyprus
◦ the four of the companions of the prophet
◦ Japanese-born players who have played in MLB
◦ where the British monarch is also head of state
◦ nations where Portuguese is an official language
◦ bishops who sat in the House of Lords
◦ Apollo astronauts who walked on the Moon
SemSearch Challenge
184
Entity Search queries:◦ 1978 cj5 jeep ◦ employment agencies w. 14th street ◦ nyc zip code ◦ waterville Maine ◦ LOS ANGELES CALIFORNIA ◦ ibm ◦ KARL BENZ ◦ MIT
SemSearch Challenge
185
Baseline
Balog & Neumayer, A Test Collection for Entity Search in DBpedia (2013).
186
Datasets:
◦ PubMed documents

Tasks:
◦ 1a: Large-Scale Online Biomedical Semantic Indexing
   Automatic annotation of PubMed documents. Training data is provided.
◦ 1b: Introductory Biomedical Semantic QA
   300 questions and related material (concepts, triples and golden answers).

BioASQ
187
Metrics, Statistics, Tests - Tetsuya Sakai (IR)
◦ http://www.promise-noe.eu/documents/10156/26e7f254-1feb-4169-9204-1c53cc1fd2d7

Building Test Collections (IR Evaluation) - Ian Soboroff
◦ http://www.promise-noe.eu/documents/10156/951b6dfb-a404-46ce-b3bd-4bbe6b290bfd
Going Deeper
188
189
Do-it-yourself (DIY): Core Resources
DBpedia◦ http://dbpedia.org/
YAGO◦ http://www.mpi-inf.mpg.de/yago-naga/yago/
Freebase◦ http://www.freebase.com/
Wikipedia dumps◦ http://dumps.wikimedia.org/
ConceptNet◦ http://conceptnet5.media.mit.edu/
Common Crawl◦ http://commoncrawl.org/
Where to use:◦ As a commonsense KB or as a data source
Datasets
190
High domain coverage:
◦ ~95% of Jeopardy! answers
◦ ~98% of TREC answers

Wikipedia is entity-centric, with a curated link structure.

Complementary tools:
◦ Wikipedia Miner

Where to use:
◦ Construction of distributional semantic models
◦ As a commonsense KB
Wikipedia
191
WordNet◦ http://wordnet.princeton.edu/
Wiktionary
◦ http://www.wiktionary.org/
◦ API: https://www.mediawiki.org/wiki/API:Main_page
FrameNet◦ https://framenet.icsi.berkeley.edu/fndrupal/
VerbNet◦ http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
English lexicon for DBpedia 3.8 (in the lemon format)◦ http://lemon-model.net/lexica/dbpedia_en/
PATTY (collection of semantically-typed relational patterns)◦ http://www.mpi-inf.mpg.de/yago-naga/patty/
BabelNet◦ http://babelnet.org/
Where to use:
◦ Query expansion
◦ Semantic similarity
◦ Semantic relatedness
◦ Word sense disambiguation
Lexical Resources
192
Lucene & Solr◦ http://lucene.apache.org/
Terrier◦ http://terrier.org/
Where to use:
◦ Answer Retrieval
◦ Scoring
◦ Query-Data matching
Indexing & Search Engines
193
GATE (General Architecture for Text Engineering)◦ http://gate.ac.uk/
NLTK (Natural Language Toolkit)◦ http://nltk.org/
Stanford NLP◦ http://www-nlp.stanford.edu/software/index.shtml
LingPipe◦ http://alias-i.com/lingpipe/index.html
Where to use:◦ Question Analysis
Text Processing Tools
194
MALT
◦ http://www.maltparser.org/
◦ Languages (pre-trained): English, French, Swedish

Stanford parser
◦ http://nlp.stanford.edu/software/lex-parser.shtml
◦ Languages: English, German, Chinese, and others

CHAOS
◦ http://art.uniroma2.it/external/chaosproject/
◦ Languages: English, Italian
C&C Parser◦ http://svn.ask.it.usyd.edu.au/trac/candc
Where to Use:◦ Question Analysis
Parsers
195
NERD (Named Entity Recognition and Disambiguation)◦ http://nerd.eurecom.fr/
Stanford Named Entity Recognizer◦ http://nlp.stanford.edu/software/CRF-NER.shtml
FOX (Federated Knowledge Extraction Framework)◦ http://fox.aksw.org
DBpedia Spotlight◦ http://spotlight.dbpedia.org
Where to use:
◦ Question Analysis
◦ Query-Data Matching
Named Entity Recognition/Linking
196
Wikipedia Miner◦ http://wikipedia-miner.cms.waikato.ac.nz/
WS4J (Java API for several semantic relatedness algorithms)◦ https://code.google.com/p/ws4j/
SecondString (string matching)◦ http://secondstring.sourceforge.net
EasyESA (distributional semantic relatedness)◦ http://treo.deri.ie/easyESA
Sspace (distributional semantics framework)◦ https://github.com/fozziethebeat/S-Space
Where to use:
◦ Query-Data matching
◦ Semantic relatedness & similarity
◦ Word Sense Disambiguation
String similarity and semantic relatedness
197
DIRT
◦ Paraphrase Collection: http://aclweb.org/aclwiki/index.php?title=DIRT_Paraphrase_Collection
◦ Demo: http://demo.patrickpantel.com/demos/lexsem/paraphrase.htm
PPDB (The Paraphrase Database)◦ http://www.cis.upenn.edu/~ccb/ppdb/
Where to use:◦ Query-Data matching
Text Entailment
198
Apache UIMA◦ http://uima.apache.org/
Open Advancement of Question Answering Systems (OAQA)◦ http://oaqa.github.io/
OKBQA◦ http://www.okbqa.org/documentation
https://github.com/okbqa
Where to use:◦ Components integration
NLP Pipelines
199
200
Trends
Querying distributed linked data
Integration of structured and unstructured data
User interaction and context mechanisms
Integration of reasoning (deductive, inductive, counterfactual, abductive, ...) in QA approaches and test collections
Measuring confidence and answer uncertainty
Multilinguality
Machine learning
Reproducibility in QA research
Research Topics/Opportunities
201
Big/Linked Data demands new principled semantic approaches to cope with the scale and heterogeneity of data.

Part of the Semantic Web/AI vision can be addressed today with a multi-disciplinary perspective:
◦ Linked Data, IR and NLP

The multidisciplinarity of the QA problem can showcase what semantic computing has achieved and what can be transported to other types of information systems.

Challenges are moving from the construction of basic QA systems to more sophisticated semantic functionalities.
Very active research area.
Take-away Message
202
Unger, C., Freitas, A., Cimiano, P., Introduction to Question Answering for Linked Data, Reasoning Web Summer School, 2014.
Introductory Reference
203
[1] Eifrem, A NOSQL Overview And The Benefits Of Graph Database, 2009
[2] Idehen, Understanding Linked Data via EAV Model, 2010.
[3] Kaufmann & Bernstein, How Useful are Natural Language Interfaces to the Semantic Web for Casual End-users?, 2007
[4] Chin-Yew Lin, Question Answering.
[5] Farah Benamara, Question Answering Systems: State of the Art and Future Directions.
[6] Yahya et al Robust Question Answering over the Web of Linked Data, CIKM, 2013.
[7] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends, 2012.
[8] Freitas et al., Answering Natural Language Queries over Linked Data Graphs: A Distributional Semantics Approach, 2014.
[9] Freitas et al., Querying Heterogeneous Datasets on the Linked Data Web: Challenges, Approaches and Trends, 2012.
[10] Freitas & Curry, Natural Language Queries over Heterogeneous Linked Data Graphs: A Distributional-Compositional Semantics Approach, IUI, 2014.
[11] Cimiano et al., Towards portable natural language interfaces to knowledge bases, 2008.
[12] Lopez et al., PowerAqua: fishing the semantic web, 2006.
[13] Damljanovic et al., Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-based Lookup through the User Interaction, 2010
[14] Unger et al. Template-based Question Answering over RDF Data, 2012.
[15] Cabrio et al., QAKiS: an Open Domain QA System based on Relational Patterns, 2012.
[16] Kaufmann & Bernstein, How Useful Are Natural Language Interfaces to the Semantic Web for Casual End-Users?, 2007.
[17] Popescu et al.,Towards a theory of natural language interfaces to databases., 2003.
[18] Farrell, IBM Watson: A Brief Overview and Thoughts for Healthcare Education and Performance Improvement, 2011.
References
204