Intro to Deep Learning for Question Answering


Page 1: Intro to Deep Learning for Question Answering

Intro to Deep Learning for Question Answering

Traian Rebedea, Ph.D.

Department of Computer Science, UPB

[email protected] / [email protected]

Page 2: Intro to Deep Learning for Question Answering

About Me

• The Academic Part:

• Education

• B.Sc., “Politehnica” University of Bucharest, CS Dept., Romania

• M.Sc., “Politehnica” University of Bucharest, CS Dept., Romania

• Ph.D., Natural Language Processing & Technology-Enhanced Learning, “Politehnica” University of Bucharest, CS Dept., Romania

• Over 25 articles published at world-wide top conferences: http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/r/Rebedea:Traian.html

• 4 book chapters on NLP & Technology Enhanced Learning

• Jobs

• Lecturer, “Politehnica” University of Bucharest, CS Dept., Romania

• Teaching Assistant, “Politehnica” University of Bucharest, CS Dept., Romania

Intro to Deep Learning for Question Answering, 30 January 2017

Page 3: Intro to Deep Learning for Question Answering

About Me

• The Industrial Part

• Jobs

• PeopleGraph, Bucharest, Romania – Researcher, Natural Language Processing, Machine Learning & Information Retrieval

• TeamNet, Bucharest, Romania – Research Consultant, Opinion Mining & Natural Language Processing

• Create IT, Bucharest, Romania – Founder & Web Developer

• ProSoft Solutions, Bucharest, Romania – Java Developer

• Various collaborations with other companies: Bitdefender, Adobe, Treeworks, UberVU

• Other

• Tutor for the Erasmus-Mundus DMKM Information Retrieval course (taught by Ricard Gavalda from UPC)


Page 4: Intro to Deep Learning for Question Answering

Overview

• Why question answering (QA)?

• Previous work in QA (before deep learning)

• Deep learning for QA (intro)

• Simple CNN

• Dependency tree – RNN

• LSTM-based solution


Page 5: Intro to Deep Learning for Question Answering

Why Question Answering?

• QA systems have been around for quite some time

• In the 60s-80s, mostly domain-dependent QA

• Quite related to conversational agents, at least at the beginning

• Open-domain QA systems received more attention in the 90s

• Combination of NLP and IR/IE techniques

• One of the most famous: MIT START system (http://start.csail.mit.edu/index.php)

• Wolfram Alpha (https://www.wolframalpha.com/)

• Advanced systems use a combination of “shallow” methods together with knowledge bases and more complex NLP methods


Page 6: Intro to Deep Learning for Question Answering

Why Question Answering?

• In the last 20 years, TREC and ACL provided workshops and tracks for various flavors of QA tasks (closed and open-domain)

• Lately, a large number of new datasets and tasks have become available which have improved the performance of (open-domain) QA systems

• QALD: Question-Answering for Linked Data (http://qald.sebastianwalter.org/)

• Given a knowledge base and a question in natural language, extract the correct answers from the knowledge base

• Small corpus: each year ~100 Q-A pairs for training and 100 for evaluation; over 6 years => ~600 Q-A pairs for training and 600 for evaluation

• Allen AI Question Answering (http://allenai.org/data.html)

• (Open-domain) QA task which contains questions asked to primary/secondary students in different topics (science, maths, etc.)

• Several datasets ~ 400-1000 Q-A pairs


Page 7: Intro to Deep Learning for Question Answering

Why Question Answering?

• SQuAD - Stanford QA Dataset (https://rajpurkar.github.io/SQuAD-explorer/)

• Open-domain answer sentence selection

• 100,000+ Q-A pairs on 500+ articles

• VisualQA (http://www.visualqa.org/)

• Given an image and a question in natural language, provide the correct answer (open-domain)

• 600,000+ questions on more than 200,000 images

• MovieQA (http://movieqa.cs.toronto.edu/home/)

• Given a movie and a question in natural language, provide the correct answer (open-domain)

• Almost 15,000 multiple choice question answers obtained from over 400 movies

• Several others

• Right now we are building a dataset similar to QALD, however it is aimed at answering questions from databases


Page 8: Intro to Deep Learning for Question Answering

Previous work in Question Answering

• Before deep-learning / non deep-learning

• Use NLP techniques to find the best match between question and candidate answers: feature engineering or use of expensive semantic resources

• Lexical / IR: similarity measures (cosine + tf/idf, stemming, lemmatization, BM25, other retrieval models)

• Semantic: use non-neural word embeddings (e.g. Latent Semantic Analysis - LSA), use additional resources (linguistic ontologies – WordNet, other databases, thesauri, ontologies – Freebase, DBpedia)

• Syntactic: compute constituency/dependency trees of question and answer – try to align/match the two trees

• Mixed/other: string kernels, tree kernels, align/match based both on syntax and semantics, classifiers using a mix of several features discussed until now


Page 9: Intro to Deep Learning for Question Answering

Discussed Question Answering Tasks

• Answer sentence selection

• Given a question

• Several possible sentences that also contain the answer (and anything else)

• Find the ones containing the answer

• Usually the sentences (answers) are longer than the questions

Q: When did Amtrak begin operations?

A: Amtrak has not turned a profit since it was founded in 1971.

• Factoid question answering (“quiz-bowl”)

• Given a longer description of the factoid answer (usually an entity, event, etc.) – the question

• Find the entity as “fast” as possible – the answer (using as little information, i.e. as few sentences/words, as possible from the description)

• Question is longer than answer

Q: A: Holy Roman Empire


Page 10: Intro to Deep Learning for Question Answering

Deep learning for QA (intro)

• Simple CNN

• Yu, Lei, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. "Deep learning for answer sentence selection." arXiv preprint arXiv:1412.1632 (2014) (Oxford U. & Google DeepMind)

• Extension (good study): Feng, Minwei, Bing Xiang, Michael R. Glass, Lidan Wang, and Bowen Zhou. "Applying deep learning to answer selection: A study and an open task." In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on, pp. 813-820. IEEE, 2015 (IBM Watson)

• Dependency tree – RNN

• Iyyer, Mohit, Jordan L. Boyd-Graber, Leonardo Max Batista Claudino, Richard Socher, and Hal Daumé III. "A Neural Network for Factoid Question Answering over Paragraphs." In EMNLP, pp. 633-644. 2014 (Maryland & Colorado & Stanford U.)

• LSTM-based solution

• Tan, Ming, Bing Xiang, and Bowen Zhou. "LSTM-based Deep Learning Models for non-factoid answer selection." arXiv preprint arXiv:1511.04108 (2015) (IBM Watson)


Page 11: Intro to Deep Learning for Question Answering

Simple CNN for Answer Sentence Selection

• Training data as triples (qi, aij, yij)

• Binary classification problem

• qi – question

• aij – candidate sentences

• yij = 1 if aij contains the answer to question qi, 0 otherwise

• Assumption: correct answers have high semantic similarity to questions

• No (actually few) hand-crafted features

• Focus on modeling questions and answers as vectors, and evaluate the relatedness of each QA pair in a shared vector space


Page 12: Intro to Deep Learning for Question Answering

Simple CNN for Answer Sentence Selection

• Given that the question and the answer are modelled in the same d-dimensional vector space, the probability of the answer being correct is: $p(y = 1 \mid q, a) = \sigma(q^\top M a + b)$

• Intuition: transform the answer into the question-space q’ = M a, then use dot-product to assess the similarity between q and q’

• Finally, the sigmoid function transforms the generated scores (dot-products are not normalized!) to a probability (number between 0..1)

• Training by minimizing cross-entropy on training/labelled set
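The scoring steps on this slide (transform the answer with M, take the dot product with the question, squash with a sigmoid, train with cross-entropy) can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the bias term `b`, the function names, and the toy vectors are assumptions, and any regularisation used in the paper is left out.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def match_probability(q, a, M, b):
    """Probability that answer vector a matches question vector q."""
    # q' = M a: project the answer into the question space
    q_prime = [sum(M[i][j] * a[j] for j in range(len(a))) for i in range(len(q))]
    # Unnormalised dot-product score, then sigmoid to get a value in (0, 1)
    score = sum(qi * qpi for qi, qpi in zip(q, q_prime)) + b
    return sigmoid(score)

def cross_entropy(triples, M, b):
    """Summed negative log-likelihood over labelled (q, a, y) triples."""
    loss = 0.0
    for q, a, y in triples:
        p = match_probability(q, a, M, b)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return loss

# Toy 2-dimensional example (hypothetical values)
M = [[1.0, 0.0], [0.0, 1.0]]
q = [1.0, 0.0]
a_good = [2.0, 0.0]
a_bad = [-2.0, 0.0]
p_good = match_probability(q, a_good, M, 0.0)
p_bad = match_probability(q, a_bad, M, 0.0)
```

With an identity M the score reduces to a plain dot product, so the answer pointing in the same direction as the question gets the higher probability.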


Page 13: Intro to Deep Learning for Question Answering

Simple CNN for Answer Sentence Selection

• Bag-of-words model (simplest model, just embeddings, no NN here)

• Uses word embeddings for the vector space

• Then averages over all the words in the text (question or answer)
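As a rough illustration of the bag-of-words model, the sketch below averages the embeddings of the words in a text. The embedding table is a hypothetical toy dictionary, and skipping out-of-vocabulary words is an assumption made here, not something stated on the slide.

```python
def bag_of_words_vector(tokens, embeddings):
    """Average the embeddings of all known tokens in a question or answer."""
    dim = len(next(iter(embeddings.values())))
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        # Assumption: a text with no known words maps to the zero vector
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]

# Toy 2-dimensional embeddings (hypothetical values)
embeddings = {
    "amtrak": [1.0, 0.0],
    "begin": [0.0, 1.0],
    "operations": [1.0, 1.0],
}
q_vec = bag_of_words_vector(
    ["when", "did", "amtrak", "begin", "operations"], embeddings
)
```

Because the result is a plain average, any reordering of the same words produces the same vector, which is the limitation the bigram model addresses.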


Page 14: Intro to Deep Learning for Question Answering

Simple CNN for Answer Sentence Selection

• Bigram model

• Uses a simple CNN (Convolutional NN)

• Sensitive to word order

• Can capture information from n-grams

• Authors only use bigrams (adjacent words), but can be extended

• Use a single convolutional layer + average (sum) pooling layer

• Convolution vector (filter) is shared by all bigrams


Page 15: Intro to Deep Learning for Question Answering

Simple CNN for Answer Sentence Selection

• Convolutional filter combines adjacent words (bigrams)

• Then average pooling combines all bigram features

• In practice, we just need to learn how to combine the embeddings of the words in the bigram
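The bigram composition can be sketched as follows, assuming the filter takes the form tanh(T_L·w_i + T_R·w_{i+1} + b) with the same parameters shared by every bigram, followed by average pooling. The names `T_left`, `T_right` and `bias` are illustrative; the exact parameterisation should be checked against the paper.

```python
import math

def matvec(T, x):
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]

def bigram_sentence_vector(word_vecs, T_left, T_right, bias):
    """Shared bigram filter over adjacent words, then average pooling."""
    dim = len(bias)
    total = [0.0] * dim
    # Slide over adjacent word pairs; the filter parameters are shared
    for left, right in zip(word_vecs, word_vecs[1:]):
        l = matvec(T_left, left)
        r = matvec(T_right, right)
        for k in range(dim):
            total[k] += math.tanh(l[k] + r[k] + bias[k])
    n_bigrams = max(len(word_vecs) - 1, 1)
    return [v / n_bigrams for v in total]

# Toy example: the left matrix is the identity, the right one is zero,
# so the output depends on which word comes first (hypothetical values)
T_left = [[1.0, 0.0], [0.0, 1.0]]
T_right = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]
x, y = [1.0, 0.0], [0.0, 1.0]
vec_xy = bigram_sentence_vector([x, y], T_left, T_right, bias)
vec_yx = bigram_sentence_vector([y, x], T_left, T_right, bias)
```

Unlike the bag-of-words average, swapping the two words changes the resulting vector, which is what "sensitive to word order" means here.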


Page 16: Intro to Deep Learning for Question Answering

Simple CNN for Answer Sentence Selection

• Experiments on Text Retrieval Conference (TREC) QA track (8-13) datasets, with candidate answers automatically selected from each question’s document pool

• Task: rank candidate answers given question (IR specific task)

• Assess using Mean Average Precision (MAP) and Mean Reciprocal Rank (MRR)


Page 17: Intro to Deep Learning for Question Answering

What is MAP & MRR?

• IR metrics, more details here: https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf OR any IR Book
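For a quick intuition, both metrics can be computed from ranked lists of binary relevance labels (1 = the candidate contains the answer). The sketch below uses the standard definitions and assumes every relevant candidate appears somewhere in the ranked list; the example rankings are made up.

```python
def average_precision(ranked_labels):
    """Mean of precision@k taken at each rank k that holds a relevant item."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def reciprocal_rank(ranked_labels):
    """1 / rank of the first relevant item, or 0 if there is none."""
    for rank, label in enumerate(ranked_labels, start=1):
        if label:
            return 1.0 / rank
    return 0.0

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

# Two questions, candidates already sorted by model score (hypothetical)
rankings = [[0, 1, 0, 1], [1, 0, 0]]
map_score = mean(average_precision(r) for r in rankings)
mrr_score = mean(reciprocal_rank(r) for r in rankings)
```

MRR only looks at the first correct answer per question, while MAP rewards ranking all correct answers near the top.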


Page 18: Intro to Deep Learning for Question Answering

Simple CNN for Answer Sentence Selection

• Experimental results

• Used precomputed word embeddings (d=50) – details in paper, embeddings available online

• Embeddings could be improved for this task, but dataset is small

• Other weights randomly initialised using a Gaussian distribution

• All hyperparameters were optimised via grid search

• AdaGrad for training

• Also added some hand-crafted features (there is a justification in the paper, not very convincing):

• word co-occurrence count between Q & A

• word co-occurrence count weighted by IDF between Q & A

• These features, together with the QA matching probability provided by the distributional model (CNN), are used to train a logistic regression classifier
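The two hand-crafted features can be illustrated with a small sketch. It counts distinct word types shared by the question and the answer and uses log(N / df) as the IDF weight; both choices are assumptions, since the slide does not give the exact definitions used in the paper.

```python
import math

def idf_table(documents):
    """IDF = log(N / document frequency) over a list of tokenised documents."""
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        for word in set(doc):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

def cooccurrence_features(question, answer, idf):
    """Return (shared word count, IDF-weighted shared word count)."""
    shared = set(question) & set(answer)
    count = float(len(shared))
    weighted = sum(idf.get(word, 0.0) for word in shared)
    return count, weighted

# Toy corpus of candidate sentences (hypothetical)
documents = [
    ["amtrak", "was", "founded", "in", "1971"],
    ["the", "train", "was", "late"],
]
idf = idf_table(documents)
question = ["when", "was", "amtrak", "founded"]
count, weighted = cooccurrence_features(question, documents[0], idf)
```

In the setup described on the slide, these two numbers and the CNN matching probability would form the three-dimensional input of the logistic regression classifier.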


Page 19: Intro to Deep Learning for Question Answering

Simple CNN for Answer Sentence Selection

• Results were encouraging

• Co-occurrence features are important

• Distributional model can assess semantics


Page 20: Intro to Deep Learning for Question Answering

Dependency Tree – RNN

• Solution proposed for factoid “quiz bowl” QA

• Use a dependency tree recursive neural network (DT-RNN)

• Extend it to combine predictions across sentences to produce a question answering neural network with trans-sentential averaging (called QANTA)


Page 21: Intro to Deep Learning for Question Answering

Dependency Tree – RNN

• Dependency trees are used to model syntax in NLP

• Two main types of (syntactic) parse trees: constituency and dependency

• Dependencies are actually directed edges between words


Page 22: Intro to Deep Learning for Question Answering

Dependency Tree – RNN

• DT-RNN is just briefly explained in the paper

• More details are available in another paper: http://nlp.stanford.edu/~socherr/SocherKarpathyLeManningNg_TACL2013.pdf

• Key elements: original word embeddings, hidden representation for words (of the same size as the original embeddings), one transformation for each dependency type in the hidden space

For leaf nodes

For inner nodes
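The two cases can be sketched recursively. The sketch assumes a leaf computes h = f(W_v·x + b) and an inner node adds one transformed child state per dependency edge, h_n = f(W_v·x_n + b + Σ_k W_R(n,k)·h_k), with f = tanh. The names `W_v` and `W_rel`, the tree encoding, and the plain tanh are illustrative assumptions to be checked against the paper.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dt_rnn_hidden(node, tree, x, W_v, W_rel, b):
    """Hidden vector of a node; tree maps a node to (child, relation) pairs."""
    # Every node starts from its own word embedding: W_v x_n + b
    total = [s + bias for s, bias in zip(matvec(W_v, x[node]), b)]
    # Inner nodes add each child's hidden state, transformed by the matrix
    # of the dependency relation that links the child to this node
    for child, relation in tree.get(node, []):
        h_child = dt_rnn_hidden(child, tree, x, W_v, W_rel, b)
        contribution = matvec(W_rel[relation], h_child)
        total = [t + c for t, c in zip(total, contribution)]
    return [math.tanh(t) for t in total]

# Toy tree: "called" has one child "helots" via a DOBJ edge (hypothetical values)
identity = [[1.0, 0.0], [0.0, 1.0]]
x = {"helots": [1.0, 0.0], "called": [0.0, 1.0]}
tree = {"called": [("helots", "DOBJ")]}
W_rel = {"DOBJ": identity}
b = [0.0, 0.0]
h_leaf = dt_rnn_hidden("helots", tree, x, identity, W_rel, b)
h_root = dt_rnn_hidden("called", tree, x, identity, W_rel, b)
```

The hidden vector of the root then serves as the representation of the whole sentence.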


Page 23: Intro to Deep Learning for Question Answering

Dependency Tree – RNN

• Example

Simpler formula for inner nodes


Page 24: Intro to Deep Learning for Question Answering

Dependency Tree – RNN

• Training: limit the number of possible answers => problem viewed as a multi-class classification task

• Softmax can be used for the decision in the final layer by using features from question and answer

• Improvement: word vectors associated with answers to be trained in the same vector space as the question text

• Train both the answers and questions jointly in a single model

• Encourage vectors of question sentences to be near their correct answers and far away from incorrect answers

• => Can use hinge loss

• => “While we are not interested in obtaining a ranked list of answers, we observe better performance by adding the weighted approximate-rank pairwise (WARP) loss”


Page 25: Intro to Deep Learning for Question Answering

Dependency Tree – RNN

• Correct answer c

• Randomly sample j incorrect answers from the set of all incorrect answers and denote this subset as Z

• S – set of all nodes in a dependency tree

• Cost / loss function is WARP – a variation of hinge loss

• More details on how to approximate L(rank(c, s, Z)) in section 3.2 of the paper

• Training using backpropagation through structure
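A rough sketch of such a rank-weighted hinge loss is given below. It assumes the cost sums, over every node s in S and every sampled wrong answer z in Z, the term L(rank)·max(0, 1 − x_c·h_s + x_z·h_s), with L(r) = 1 + 1/2 + ... + 1/r. Unlike the approximation referred to above, the rank here is simply counted as the number of sampled wrong answers that violate the margin; that shortcut, and all names, are assumptions of this sketch.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rank_weight(r):
    """L(r) = sum_{i=1..r} 1/i, so margin violations at worse ranks cost more."""
    return sum(1.0 / i for i in range(1, r + 1))

def warp_hinge_loss(node_vectors, x_correct, wrong_vectors, margin=1.0):
    """Rank-weighted hinge loss over all tree nodes and sampled wrong answers."""
    total = 0.0
    for h in node_vectors:
        correct_score = dot(x_correct, h)
        hinge = [
            max(0.0, margin - correct_score + dot(x_z, h))
            for x_z in wrong_vectors
        ]
        # Assumption: rank = number of sampled wrong answers inside the margin
        rank = sum(1 for value in hinge if value > 0.0)
        total += rank_weight(rank) * sum(hinge)
    return total

# Toy example with a single node vector (hypothetical values)
nodes = [[1.0, 0.0]]
wrong = [[0.0, 0.0], [1.5, 0.0]]
loss_good = warp_hinge_loss(nodes, [2.0, 0.0], wrong)
loss_bad = warp_hinge_loss(nodes, [0.0, 0.0], wrong)
```

The loss pushes node vectors towards the vector of the correct answer and away from the sampled incorrect ones, as described on the previous slide.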


Page 26: Intro to Deep Learning for Question Answering

Dependency Tree – RNN

• QANTA: Previous model + average the representations of each sentence seen so far in a particular question

• This was the best aggregation found by the authors

• Datasets:

• History questions: training set of 3,761 questions with 14,217 sentences and a test set of 699 questions with 2,768 sentences

• Literature questions: training set of 4,777 questions with 17,972 sentences and a test set of 908 questions with 3,577 sentences

• 451 history answers and 595 literature answers that occur on average twelve times in the corpus

• Word embeddings (We): word2vec trained on the preprocessed question text in our training set, then optimized in the current model

• Embedding size: 100, num incorrect sampled answers: 100
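The trans-sentential averaging used by QANTA can be pictured as a running mean: after each new sentence of a question, the representation is the average of all sentence vectors seen so far. The sketch below assumes each sentence has already been reduced to one vector by the DT-RNN; the function names and values are illustrative.

```python
def average_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def incremental_question_vectors(sentence_vectors):
    """Question representation after seeing 1, 2, ..., n sentences."""
    return [
        average_vectors(sentence_vectors[:i])
        for i in range(1, len(sentence_vectors) + 1)
    ]

# Three sentence vectors from one quiz-bowl question (hypothetical values)
sentences = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question_states = incremental_question_vectors(sentences)
```

Each intermediate state can be scored against the answer vectors, which matches the quiz-bowl setting where a player may answer before the question is finished.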


Page 27: Intro to Deep Learning for Question Answering

Dependency Tree – RNN

• Results on test sets

• Several baselines, including comparison with all the text in Wikipedia page for the answer

• Also comparison with human players, after the first sentence in the question


Page 28: Intro to Deep Learning for Question Answering

Dependency Tree – RNN


Page 29: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• Work on answer sentence selection

• Use a sequence NN model to build the representation of Q&A

• LSTM is the obvious choice


Page 30: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• Use a bidirectional LSTM (BiLSTM)

• Uses both the previous and the future context by processing the sequence in two directions

• Generate two independent sequences of LSTM output vectors

• One processes the input sequence forward, and one backward

• The input sequence contains the word embeddings for the analyzed text (Q&A)

• Output at each step contains the concatenation of the output vectors for both directions
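The mechanics of a BiLSTM can be sketched without any framework. The code below uses the standard LSTM equations with input, forget and output gates plus a candidate cell; the random initialisation, the dimensions and all names are illustrative and unrelated to the trained models in the paper.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Random parameters for the gates i, f, o and the candidate cell c."""
    def matrix(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        g: {"W": matrix(hidden_dim, input_dim),
            "U": matrix(hidden_dim, hidden_dim),
            "b": [0.0] * hidden_dim}
        for g in "ifoc"
    }

def gate(p, x, h, activation):
    """activation(W x + U h + b), computed one hidden unit at a time."""
    out = []
    for row_w, row_u, bias in zip(p["W"], p["U"], p["b"]):
        z = sum(w * xi for w, xi in zip(row_w, x))
        z += sum(u * hi for u, hi in zip(row_u, h)) + bias
        out.append(activation(z))
    return out

def run_lstm(params, inputs):
    """Return the output vector h(t) for every position of the sequence."""
    hidden_dim = len(params["i"]["b"])
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    outputs = []
    for x in inputs:
        i = gate(params["i"], x, h, sigmoid)
        f = gate(params["f"], x, h, sigmoid)
        o = gate(params["o"], x, h, sigmoid)
        g = gate(params["c"], x, h, math.tanh)
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        outputs.append(h)
    return outputs

def run_bilstm(fwd, bwd, inputs):
    """Concatenate forward and backward outputs at each position."""
    forward = run_lstm(fwd, inputs)
    # Process the reversed sequence, then re-align with the original order
    backward = run_lstm(bwd, inputs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]

rng = random.Random(0)
fwd = init_lstm(3, 2, rng)
bwd = init_lstm(3, 2, rng)
words = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
states = run_bilstm(fwd, bwd, words)
```

Each of the four positions ends up with a vector of size 4 (2 forward + 2 backward), so every word is represented together with context from both sides.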


Page 31: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• Basic QA-LSTM model

• Compute BiLSTM representation for Q&A, then use a pooling method and cosine similarity for comparison

• Dropout on the last layer, before cosine

• Hinge loss for training
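The comparison and training objective of the basic model can be sketched as below, assuming the hinge loss has the form max(0, M − cos(q, a+) + cos(q, a−)) for a correct answer a+ and a sampled wrong answer a−. The margin value 0.2 is a placeholder rather than the tuned value, and dropout is omitted.

```python
import math

def max_pool(states):
    """Element-wise maximum over all time steps."""
    return [max(column) for column in zip(*states)]

def mean_pool(states):
    """Element-wise average over all time steps."""
    return [sum(column) / len(states) for column in zip(*states)]

def cosine(u, v):
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)

def hinge_loss(q, a_pos, a_neg, margin=0.2):
    """Zero once the correct answer beats the wrong one by at least margin."""
    return max(0.0, margin - cosine(q, a_pos) + cosine(q, a_neg))

# Toy BiLSTM outputs for a two-token text (hypothetical values)
pooled_max = max_pool([[1.0, -2.0], [0.0, 3.0]])
pooled_mean = mean_pool([[1.0, -2.0], [0.0, 3.0]])
loss_ok = hinge_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
loss_violated = hinge_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

The pooled question and answer vectors play the role of `q`, `a_pos` and `a_neg`; only pairs that violate the margin contribute to the loss.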


Page 32: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• Best model when Q & A sides share the same network parameters

• Significantly better than the variant in which the question and answer sides have their own separate parameters

• Converges much faster


Page 33: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• First improvement: QA-LSTM/CNN

• Put a CNN on top of the outputs of the BiLSTM

• Filter size m, output of the CNN for one filter is:
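As a hedged illustration, the sketch below assumes the output of one filter F of size m at position t is tanh(Σ_{i=0..m−1} h(t+i)·F(i) + b), where h(t) are the BiLSTM output vectors, followed by max pooling over positions. The absence of padding and all names are assumptions of this sketch.

```python
import math

def conv_filter_outputs(states, F, b):
    """Slide one filter of size m = len(F) over the BiLSTM output vectors."""
    m = len(F)
    outputs = []
    for t in range(len(states) - m + 1):
        z = b
        for i in range(m):
            # Dot product between the state at t+i and the i-th filter column
            z += sum(h * f for h, f in zip(states[t + i], F[i]))
        outputs.append(math.tanh(z))
    return outputs

def max_over_time(outputs):
    """Keep the strongest response of the filter across all positions."""
    return max(outputs)

# Toy BiLSTM outputs (3 positions, size 2) and one filter with m = 2
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
F = [[1.0, 0.0], [0.0, 1.0]]
filter_outputs = conv_filter_outputs(states, F, 0.0)
feature = max_over_time(filter_outputs)
```

With several filters, the pooled values are collected into one vector per text, which then replaces the plain pooled BiLSTM output in the cosine comparison.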


Page 34: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• “The intuition of this structure is, instead of evenly considering the lexical information of each token as the previous subsection, we emphasize on certain parts of the answer, such that QA-LSTM/CNN can more effectively differentiate the ground truths and incorrect answers.”


Page 35: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• Second improvement: Attention-based QA-LSTM

• “The fixed width of hidden vectors becomes a bottleneck, when the bidirectional LSTM models must propagate dependencies over long distances over the questions and answers.

• An attention mechanism is used to alleviate this weakness by dynamically aligning the more informative parts of answers to the questions.”

• Simple attention mechanism over the basic QA-LSTM model

• Prior to pooling, each biLSTM output vector for the answer will be multiplied by a softmax weight, which is determined by the question embedding from biLSTM
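The attention step can be sketched as follows, assuming the weight of answer position t comes from m(t) = tanh(W_am·h_a(t) + W_qm·o_q) followed by a softmax over w_ms·m(t), where o_q is the pooled question vector. The parameter names follow that assumed form and the toy values are arbitrary.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def attention_weights(answer_states, q_vec, W_am, W_qm, w_ms):
    """Softmax weight for each answer position, conditioned on the question."""
    q_part = matvec(W_qm, q_vec)
    scores = []
    for h in answer_states:
        m = [math.tanh(a + q) for a, q in zip(matvec(W_am, h), q_part)]
        scores.append(sum(w * mk for w, mk in zip(w_ms, m)))
    # Numerically stable softmax over the answer positions
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def attend(answer_states, weights):
    """Scale each answer output vector by its attention weight, before pooling."""
    return [[w * x for x in h] for h, w in zip(answer_states, weights)]

# Toy example with identity projections (hypothetical values)
identity = [[1.0, 0.0], [0.0, 1.0]]
answer_states = [[1.0, 0.0], [-1.0, 0.0]]
q_vec = [1.0, 0.0]
weights = attention_weights(answer_states, q_vec, identity, identity, [1.0, 1.0])
attended = attend(answer_states, weights)
```

In this toy case the first answer position aligns with the question vector and receives the larger weight; the re-weighted vectors are then pooled as in the basic model.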


Page 36: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• Conceptually, the attention mechanism gives more weight to certain words, just like tf-idf does for each word

• But it computes the weights according to question information


Page 37: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• Experiment 1: InsuranceQA

• Grid search for hyper-parameter tuning

• Word embeddings are initialized using word2vec (size 100) and are further optimized during training

• LSTM output vector size is 141 for each direction

• Also tried various norms

• SGD training


Page 38: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• QA-LSTM compared against several baselines

• Metric is accuracy


Page 39: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• Models’ performance by ground-truth answer length


Page 40: Intro to Deep Learning for Question Answering

LSTM Solution for Question Answering

• TREC-QA results


Page 41: Intro to Deep Learning for Question Answering

CNN for QA – extended study

• Feng, Minwei, Bing Xiang, Michael R. Glass, Lidan Wang, and Bowen Zhou. "Applying deep learning to answer selection: A study and an open task." In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on, pp. 813-820. IEEE, 2015 – online here: https://arxiv.org/pdf/1508.01585.pdf

• Proposes several CNN architectures for QA


Page 42: Intro to Deep Learning for Question Answering

References

• [1] Yu, Lei, Karl Moritz Hermann, Phil Blunsom, and Stephen Pulman. "Deep learning for answer sentence selection." arXiv preprint arXiv:1412.1632 (2014). Online here: https://arxiv.org/pdf/1412.1632.pdf

• [2] Iyyer, Mohit, Jordan L. Boyd-Graber, Leonardo Max Batista Claudino, Richard Socher, and Hal Daumé III. "A Neural Network for Factoid Question Answering over Paragraphs." In EMNLP, pp. 633-644. 2014. Online here: https://cs.umd.edu/~miyyer/pubs/2014_qb_rnn.pdf

• [3] Tan, Ming, Bing Xiang, and Bowen Zhou. "LSTM-based Deep Learning Models for non-factoid answer selection." arXiv preprint arXiv:1511.04108 (2015). Online here: https://arxiv.org/pdf/1511.04108v4.pdf

• [4] Feng, Minwei, Bing Xiang, Michael R. Glass, Lidan Wang, and Bowen Zhou. "Applying deep learning to answer selection: A study and an open task." In Automatic Speech Recognition and Understanding (ASRU), 2015 IEEE Workshop on, pp. 813-820. IEEE, 2015. Online here: https://arxiv.org/pdf/1508.01585.pdf


Page 43: Intro to Deep Learning for Question Answering

Thank you!

[email protected]
