112/04/19 1
Learn Question Focus and Dependency Relations from Web Search Results for Question Classification
Wen-Hsiang Lu (盧文祥)
Web Mining and Multilingual Knowledge System Laboratory, Department of Computer Science and Information Engineering, National Cheng Kung University
WMMKS Lab
Research Interest
Web Mining
Natural Language Processing
Information Retrieval
Research Issues
Unknown Term Translation & Cross-Language Information Retrieval: A Multi-Stage Translation Extraction Method for Unknown Terms Using Web Search Results
Question Answering & Machine Translation: Using Web Search Results to Learn Question Focus and Dependency Relations for Question Classification; Using Phrase and Fluency to Improve Statistical Machine Translation
User Modeling & Web Search: Learning Question Structure based on Website Link Structure to Improve Natural Language Search; Improving Short-Query Web Search based on User Goal Identification
Cross-Language Medical Information Retrieval: MMODE (http://mmode.no-ip.org/)
雅各氏症候群 (Jacobs syndrome)
Outline
Introduction, Related Work, Approach, Experiment, Conclusion, Future Work
Outline: Introduction
Question Answering (QA) System
1. Question Analysis: question classification, keyword extraction.
2. Document Retrieval: retrieve related documents.
3. Answer Extraction: extract an exact answer.
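The three stages can be pictured as a simple pipeline in which each stage feeds the next. The sketch below is a toy: the function names are hypothetical and the logic is deliberately naive (keyword filtering, first-hit "answer"); it only illustrates the data flow, not how any real QA system implements the stages.

```python
from typing import Dict, List, Optional

def analyze_question(question: str) -> Dict[str, object]:
    """Stage 1 (question analysis): placeholder type plus naive keyword extraction."""
    words = [w.strip("?.,") for w in question.split()]
    keywords = [w for w in words if len(w) > 3]
    return {"type": "UNKNOWN", "keywords": keywords}

def retrieve_documents(keywords: List[str], corpus: List[str]) -> List[str]:
    """Stage 2 (document retrieval): keep documents that mention any keyword."""
    return [d for d in corpus if any(k.lower() in d.lower() for k in keywords)]

def extract_answer(analysis: Dict[str, object], documents: List[str]) -> Optional[str]:
    """Stage 3 (answer extraction): placeholder that returns the first retrieved document."""
    return documents[0] if documents else None

def answer(question: str, corpus: List[str]) -> Optional[str]:
    """Run the three stages in order."""
    analysis = analyze_question(question)
    docs = retrieve_documents(analysis["keywords"], corpus)
    return extract_answer(analysis, docs)
```

A real system would replace stage 1 with a question classifier (the subject of this talk) and stage 3 with type-aware answer extraction.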
Motivation (1/3)
Importance of question classification: see the report by Dan Moldovan [Dan Moldovan 2000].
Motivation (2/3)
Rule-based Question Classification: a manual and impractical method.
Machine Learning-based Question Classification (Support Vector Machine, SVM):
- Requires a large amount of training data.
- Too many features may introduce noise.
Motivation (3/3)
Propose a new method for question classification.
Identify useful features of questions.
Solve the problem of insufficient training data.
Idea of Approach (1/4)
Many questions contain ambiguous question words.
Importance of the Question Focus (QF): use QF identification for question classification.
Idea of Approach (2/4)
When there is not enough information to identify the type of the QF, the dependency features of the question are also used: the Dependency Verb, the Dependency Quantifier, and the Dependency Noun.
[Figure: Question → QF, Dependency Verb, Dependency Quantifier, Dependency Noun → Question Type. Legend: dependency features; question type; (unigram) semantic dependency relation; (bigram) semantic dependency relation.]
Idea of Approach (3/4)
Example
Idea of Approach (4/4)
Use the QF and dependency features to classify questions.
Learn the QF and other dependency features from the Web.
Propose a Semantic Dependency Relation Model (SDRM).
Outline: Related Work
Rule-based Question Classification
[Richard F. E. Sutcliffe 2005][Kui-Lam Kwok 2005][Ellen Riloff 2000]
5W (Who, When, Where, What, Why):
Who → Person
When → Time
Where → Location
What → type is difficult to determine
Why → Reason
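The 5W rules amount to a lookup table keyed on the question word. A minimal sketch, assuming English questions and whitespace tokenization; the function name and the "Ambiguous"/"Unknown" labels are illustrative and not part of the cited systems:

```python
from typing import Dict

# Wh-word rules from the slide; "what" is deliberately absent because its type is ambiguous.
WH_RULES: Dict[str, str] = {
    "who": "Person",
    "when": "Time",
    "where": "Location",
    "why": "Reason",
}

def classify_by_wh_word(question: str) -> str:
    """Return the type for the first wh-word in the question.

    'what' yields 'Ambiguous' (it needs further analysis, e.g. of the question
    focus); questions with no wh-word yield 'Unknown'.
    """
    for token in question.lower().replace("?", " ").split():
        if token in WH_RULES:
            return WH_RULES[token]
        if token == "what":
            return "Ambiguous"
    return "Unknown"
```

The "What" branch is exactly where such rules break down, which motivates looking at the question focus instead of the question word alone.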
Machine Learning-based Question Classification
Several methods are based on SVM [Zhang, 2003; Suzuki, 2003; Day, 2005].
[Figure: Question → Feature Vector → SVM (KDAG kernel) → Question Type]
Web-based Question Classification
Use a Web search engine to identify the question type [Solorio, 2004].
Example: “Who is the President of the French Republic?”
Statistics-based Question Classification
Language model for question classification [Li, 2002].
Too many features may introduce noise.
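For comparison, a generic unigram language-model classifier (one token distribution per question type, add-one smoothing, pick the type maximizing prior times question likelihood) can be sketched as follows. This textbook formulation is assumed for illustration only; it is not necessarily the model of [Li, 2002] or the LM baseline used in the experiments, and all names are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

def train_unigram_lm(data: List[Tuple[List[str], str]]):
    """Count tokens per question type and the number of questions per type."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    class_freq: Counter = Counter()
    for tokens, label in data:
        counts[label].update(tokens)
        class_freq[label] += 1
    vocab: Set[str] = {t for c in counts.values() for t in c}
    return counts, class_freq, vocab

def classify_lm(tokens: List[str], counts, class_freq, vocab) -> str:
    """argmax over C of log P(C) + sum_t log P(t | C), with add-one smoothing.

    The extra +1 in the denominator reserves one bucket for unseen tokens.
    """
    n_questions = sum(class_freq.values())
    best_label, best_score = "", float("-inf")
    for label, c in counts.items():
        denom = sum(c.values()) + len(vocab) + 1
        score = math.log(class_freq[label] / n_questions)
        score += sum(math.log((c[t] + 1) / denom) for t in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Every token in the question contributes a feature here, which is one way to see the "too many features may introduce noise" concern.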
Outline: Approach
Architecture of Question Classification
Question Type
Six question types: Person, Location, Organization, Number, Date, Artifact.
Basic Classification Rules
We define 17 basic rules for simple questions.
Learning Semantic Dependency Features (1/3)
Architecture for Learning Dependency Features
Extracting Dependency Features Algorithm
Learning Semantic Dependency Features (2/3)
Architecture for Learning Dependency Features
Learning Semantic Dependency Features (3/3)
Extracting Dependency Features Algorithm
Question Focus Identification Algorithm (1/2)
Algorithm
Question Focus Identification Algorithm (2/2)
Example
Semantic Dependency Relation Model (SDRM) (1/12)
Unigram-SDRM
Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (2/12)
Unigram-SDRM
Estimating P(C|Q) directly requires many training questions.
[Figure: Q (Question) → P(C|Q) → C (Question Type)]
Semantic Dependency Relation Model (SDRM) (3/12)
Unigram-SDRM
P(DC|C): collect related Web search results DC for each question type C.
P(Q|DC): use DC to determine the question type of Q.
[Figure: C (Question Type) → P(DC|C) → DC (Web search results) → P(Q|DC) → Q (Question)]
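Read together, the two factors P(DC|C) and P(Q|DC) suggest the decomposition below. This is a hedged reconstruction rather than a formula copied from the slides: it assumes a uniform prior over question types and that Q depends on C only through the search results D_C.

```latex
P(C \mid Q) \;\propto\; P(D_C \mid C)\, P(Q \mid D_C),
\qquad
\hat{C} = \arg\max_{C}\; P(D_C \mid C)\, P(Q \mid D_C)
```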
Semantic Dependency Relation Model (SDRM) (4/12)
Unigram-SDRM
Semantic Dependency Relation Model (SDRM) (5/12)
Unigram-SDRM
Q = {QF, QD}, QD = {DV, DQ, DN}
DV: Dependency Verb
DQ: Dependency Quantifier
DN: Dependency Noun
Semantic Dependency Relation Model (SDRM) (6/12)
Unigram-SDRM
DV = {dv_1, dv_2, ⋯, dv_i}, DQ = {dq_1, dq_2, ⋯, dq_j}, DN = {dn_1, dn_2, ⋯, dn_k}
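Under a unigram independence assumption, P(Q|DC) plausibly factors over the question focus and the three dependency sets DV, DQ, and DN. The product below is a sketch consistent with the Unigram-SDRM parameters P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC); it is an assumed form, not one stated on the slides.

```latex
P(Q \mid D_C) = P(QF \mid D_C)
  \prod_{x=1}^{i} P(dv_x \mid D_C)
  \prod_{y=1}^{j} P(dq_y \mid D_C)
  \prod_{z=1}^{k} P(dn_z \mid D_C)
```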
Semantic Dependency Relation Model (SDRM) (7/12)
Parameter Estimation of Unigram-SDRM
Parameters: P(DC|C), P(QF|DC), P(dv|DC), P(dq|DC), P(dn|DC)
N(QF): the number of occurrences of the QF in Q.
NQF(DC): the total number of all QFs collected from the search results.
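The estimate can be sketched as relative-frequency counting over features harvested from each type's search results. Assumptions not stated on the slides: features are already extracted; each feature kind ("QF", "DV", "DQ", "DN") is normalized by its own total, reading P(QF|DC) as N(QF)/NQF(DC) with counts taken over DC; P(DC|C) and the type prior are treated as constant; and unseen terms get a fixed floor in place of real smoothing. All function names and the floor value are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def estimate_unigram_model(features: Dict[str, List[str]]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood estimate per feature kind, e.g.
    P(QF | D_C) = N(QF) / N_QF(D_C), where N_QF(D_C) is the total count of QF terms."""
    model = {}
    for kind, terms in features.items():
        counts = Counter(terms)
        total = sum(counts.values())
        model[kind] = {term: n / total for term, n in counts.items()}
    return model

def unigram_log_score(question: Dict[str, List[str]],
                      model: Dict[str, Dict[str, float]],
                      floor: float = 1e-6) -> float:
    """log P(Q | D_C): sum of log-probabilities of the QF and every dependency feature."""
    score = 0.0
    for kind, terms in question.items():
        dist = model.get(kind, {})
        for term in terms:
            score += math.log(dist.get(term, floor))  # floor stands in for smoothing
    return score

def classify(question: Dict[str, List[str]], models: Dict[str, Dict[str, Dict[str, float]]]) -> str:
    """Return the question type whose search-result model scores the question highest."""
    return max(models, key=lambda c: unigram_log_score(question, models[c]))
```

With toy models in which the Location results contain “國” and “設於”, the question {"QF": ["國"], "DV": ["設於"]} is assigned to Location. The floor is crude, in line with the conclusion that a better smoothing method is needed.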
Semantic Dependency Relation Model (SDRM) (8/12)
Parameter Estimation of Unigram-SDRM
Semantic Dependency Relation Model (SDRM) (9/12)
Bigram-SDRM
Semantic Dependency Relation Model (SDRM) (10/12)
Bigram-SDRM
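Bigram-SDRM conditions each dependency feature on the question focus, through the parameters P(dV|QF,DC), P(dQ|QF,DC), and P(dN|QF,DC). Assuming the same independence structure as Unigram-SDRM otherwise, a hedged sketch of the resulting factorization is:

```latex
P(Q \mid D_C) = P(QF \mid D_C)
  \prod_{x=1}^{i} P(dv_x \mid QF, D_C)
  \prod_{y=1}^{j} P(dq_y \mid QF, D_C)
  \prod_{z=1}^{k} P(dn_z \mid QF, D_C)
```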
Semantic Dependency Relation Model (SDRM) (11/12)
Parameter Estimation of Bigram-SDRM
P(DC|C): the same as in Unigram-SDRM.
P(QF|DC): the same as in Unigram-SDRM.
P(dV|QF,DC), P(dQ|QF,DC), P(dN|QF,DC)
Nsentence(dv,QF): the number of sentences containing both dv and the QF.
Nsentence(QF): the total number of sentences containing the QF.
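The sentence-level ratio P(dv|QF,DC) = Nsentence(dv,QF) / Nsentence(QF) can be sketched over the sentences of the search results DC. Assumptions for illustration: a sentence "contains" a term if the term occurs as a substring (a real system would first segment the Chinese text into words), and the estimate falls back to 0 when no sentence contains the QF. The function name is hypothetical.

```python
from typing import List

def bigram_dependency_prob(dep: str, qf: str, sentences: List[str]) -> float:
    """P(dep | QF, D_C) = N_sentence(dep, QF) / N_sentence(QF).

    Containment is tested by substring match, a simplification of word-level matching.
    """
    with_qf = [s for s in sentences if qf in s]
    if not with_qf:  # N_sentence(QF) = 0: the ratio is undefined, return 0 here
        return 0.0
    with_both = sum(1 for s in with_qf if dep in s)
    return with_both / len(with_qf)
```

Because the count requires dv and the QF to co-occur in one sentence, bigram estimates are sparser than unigram ones, which is consistent with the bigram example “人_創下” failing to be trained.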
Semantic Dependency Relation Model (SDRM) (12/12)
Parameter Estimation of Bigram-SDRM
Outline: Experiment
Experiment
SDRM Performance Evaluation
- Unigram-SDRM vs. Bigram-SDRM
- Combination with different weights
SDRM vs. Language Model
- Use questions as training data
- Use the Web as training data
- Questions vs. Web
Experimental Data
Questions are collected from NTCIR-5 CLQA and evaluated with 4-fold cross-validation.
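In 4-fold cross-validation, the questions are split into four folds; each fold serves once as test data while the other three are used for training. The sketch assumes a simple round-robin assignment of questions to folds, since how the folds were actually formed is not stated; the function name is hypothetical.

```python
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def k_fold_splits(items: Sequence[T], k: int = 4) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) pairs; item i goes to test fold i % k (round-robin)."""
    for fold in range(k):
        test = [x for i, x in enumerate(items) if i % k == fold]
        train = [x for i, x in enumerate(items) if i % k != fold]
        yield train, test
```

Reported accuracy would then be averaged over the four test folds.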
Unigram-SDRM vs. Bigram-SDRM
Result
Unigram-SDRM vs. Bigram-SDRM (2/2)
Example
For unigram: “人” (person), “創下” (set [a record]), and “駕駛” (drive) are trained successfully.
For bigram: “人_創下” (person_set) is not trained successfully.
Combination with different weights (1/3)
Different weights for different features:
α: the weight of QF; β: the weight of dV; γ: the weight of dQ; δ: the weight of dN.
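The slides do not say how the weighted features are combined. One common choice, assumed here purely for illustration, is a log-linear combination in which each feature's log-probability is scaled by its weight (α for QF, β for dV, γ for dQ, δ for dN). The function name and dictionary layout are hypothetical.

```python
import math
from typing import Dict

def weighted_log_score(feature_probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear combination: sum of weight * log P(feature | D_C).

    Keys are feature kinds such as 'QF', 'DV', 'DQ', 'DN'; the weights play the
    roles of alpha, beta, gamma and delta. Features without a weight are ignored.
    """
    return sum(weights[k] * math.log(p) for k, p in feature_probs.items() if k in weights)
```

A linear interpolation of probabilities would be an equally plausible reading; the log-linear form is shown only because it extends the product form of the model directly.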
Combination with different weights (2/3)
Comparison of the four features (QF, DV, DQ, DN)
Combination with different weights (3/3)
Results of 16 experiments. Best weighting: 0.23 QF, 0.29 DV, 0.48 DQ.
The weights are computed by a simple formula. Example with QF and DV:
α: the weight of QF; β: the weight of DV.
α = (1 - 0.77) / [(1 - 0.77) + (1 - 0.71)]
β = (1 - 0.71) / [(1 - 0.77) + (1 - 0.71)]
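The α and β formulas normalize (1 - value) over the features being combined. The sketch generalizes that arithmetic to any number of features; the constants 0.77 (for QF) and 0.71 (for DV) are copied from the example, while the function name is hypothetical.

```python
from typing import Dict

def complement_normalized_weights(values: Dict[str, float]) -> Dict[str, float]:
    """Weight_k = (1 - v_k) / sum_j (1 - v_j), as in the alpha/beta example."""
    complements = {k: 1.0 - v for k, v in values.items()}
    total = sum(complements.values())
    return {k: c / total for k, c in complements.items()}

weights = complement_normalized_weights({"QF": 0.77, "DV": 0.71})
print(round(weights["QF"], 3), round(weights["DV"], 3))  # 0.442 0.558
```

That is, α = 0.23 / 0.52 ≈ 0.442 and β = 0.29 / 0.52 ≈ 0.558, and the weights always sum to 1.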
Use questions as training data (1/2)
Result
Use questions as training data (2/2)
Example
For LM: “網球選手” (tennis player) and “選手為” (the player is) are not trained successfully.
For SDRM: “選手” (player) and “奪得” (won) are trained successfully.
Use Web search results as training data (1/2)
Result
Use Web search results as training data (2/2)
Example
For LM: “何國” (which country) is not trained successfully.
For SDRM: “國” (country) and “設於” (located in) are trained successfully.
Question vs. Web (1/3)
Result
Trained question: the QF of the question is covered by the LM's training data.
Untrained question: the QF of the question is not covered by the LM's training data.
Question vs. Web (2/3)
Example of a trained question
For LM: “何地” (where) is trained successfully.
For SDRM: “地” (place) and “舉行” (held) are trained successfully, but these terms also appear in the training data of other types.
Question vs. Web (3/3)
Example of an untrained question
For LM: “女星” (actress) and “獲得” (won) are not trained successfully.
For SDRM: “女星” and “獲得” are trained successfully.
Conclusion
Discussion
We need to improve our learning method and its performance. We need a better smoothing method.
Conclusion
We propose a new model, SDRM, which uses the question focus and dependency features for question classification. We use Web search results as training data to solve the problem of insufficient training data.
Future Work
Improve the performance of the learning method.
Consider the importance of each feature in the question.
The question focus and dependency features may be used in other processing steps of question answering systems.
Thank You