Dialog State Tracking for Interview Coaching Using Two-Level … · 2019-03-01 · Dialog State...

Dialog State Tracking for Interview Coaching Using Two-Level LSTM

Ming-Hsiang Su, Chung-Hsien Wu, Kun-Yi Huang, Tsung-Hsien Yang, and Tsui-Ching Huang

Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan

Outline

Introduction
Motivation and goal
Data collection and annotation
System framework
  Word embedding
  LSTM for answer vector extraction
  ANN for dialog state detection
Experimental results
Conclusions

Introduction

While in school, students are busy studying for a better future and rarely have the opportunity to practice interviewing for work or further study. If students are afraid or nervous during an interview, they cannot properly answer the interviewers' questions.

Motivation and Goal

To provide convenient and inexpensive opportunities to practice interviewing, a coaching system is constructed to improve users' interview skills.

Motivation and Goal

The problem addressed is how to develop a dialog state tracker (DST) for user goal inference in an interview coaching system. DST is one of the key sub-tasks of dialog management. It is crucial because the dialog policy relies on the DST to detect the correct dialog state in order to choose an appropriate action.

Introduction (3)

In this study, we propose an encoder-classifier model.
The word2vec method is used to encode words into word vectors.
A two-level LSTM-based method is proposed to encode the answer sentences into answer hidden vectors: the first level generates sentence hidden vectors, and the second level generates answer hidden vectors.
An ANN-based method is used to predict dialog states.

[Figure: Word Encoder (word2vec) -> Hidden Vector Generation (LSTM) -> Classifier (ANN)]

Data collection and annotation

Interview dialog corpus collection:
The domain of the corpus is the college admission interview.
12 participants were invited.
During corpus collection, two participants completed the interview without questions and answers designed in advance.
A total of 75 dialogs with 540 question-and-answer pairs were collected.
The average number of sentences per answer is 3.95.

Data collection and annotation

According to the collected corpus, 10 semantic slots were defined.

Semantic Slot                                   Number
Association and cadre leadership                50
Performance and achievement                     57
Leisure interest                                50
Pros. and cons.                                 45
Motivation                                      54
Study and future plan                           49
Curriculum areas                                143
Programming language and professional skills    38
Personality trait                               136
Others                                          12

System Framework

[Figure: system framework.
Training phase: Interview Corpus -> Word Embedding Model Training -> LSTM Sentence Hidden Vector Model Training -> LSTM Answer Hidden Vector Model Training -> Dialog State Classification Model Training.
Testing phase: Answer -> Word Embedding -> Sentence Hidden Vector Generation -> Answer Hidden Vector Generation -> Dialog State Detection -> Dialog State (e.g. 0 0 0 1 0 0 1 0 0 0).
Each trained model (Word Embedding Model, LSTM Sentence Hidden Vector Model, LSTM Answer Hidden Vector Model, Dialog State Classification Model) is used by the corresponding testing step.]

Word embedding model

Each word is mapped to its corresponding word vector $w_i$ by using word2vec.
Word2vec creates vectors that are distributed numerical representations of word features, such as the context of individual words.
The purpose and usefulness of word2vec is to group the vectors of similar words together in the vector space.
Word2vec encodes each word as a vector and trains words against other words that neighbor them in the input corpus.

Word embedding model

This work uses the Skip-gram model.
The word vectors are then used to build the vector representation of the sentence.
We use the Chinese Gigaword corpus to train the word2vec model. In total, 42,619 words were obtained.

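[Figure: Skip-gram architecture with input, projection, and output layers. The center word $w_t$ (input) passes through the projection layer to predict the context words $w_{t-2}$, $w_{t-1}$, $w_{t+1}$, $w_{t+2}$ (output). Example word sequence: "the cat sits on the".]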
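As a rough illustration of how Skip-gram forms its training examples, the sketch below enumerates (center word, context word) pairs. The function name `skipgram_pairs` is hypothetical, and the window size of 2 is an assumption read off the figure's $w_{t-2}$ to $w_{t+2}$ labels; the slides do not state the actual training settings.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# The word sequence shown in the Skip-gram figure.
tokens = ["the", "cat", "sits", "on", "the"]
pairs = skipgram_pairs(tokens, window=2)

# Context words paired with the center word "sits".
sits_context = [context for center, context in pairs if center == "sits"]
```

Each pair is one training example in which the model is asked to predict the context word from the center word.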

Dialogue State Tracking

The semantic relationship between consecutive sentences in an answer is important for state detection. The first LSTM is therefore employed to combine the word vectors of a sentence to obtain the hidden vector of the sentence.
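For reference, a standard LSTM cell can be written as follows. This is the common textbook formulation and not necessarily the exact variant used in this work; $w_k$ denotes the $k$-th word vector of the sentence, and the weight matrices $W_\ast$ and biases $b_\ast$ are illustrative symbols that do not appear on the slides.

```latex
\begin{aligned}
i_k &= \sigma\left(W_i [h_{k-1}; w_k] + b_i\right) \\
f_k &= \sigma\left(W_f [h_{k-1}; w_k] + b_f\right) \\
o_k &= \sigma\left(W_o [h_{k-1}; w_k] + b_o\right) \\
\tilde{c}_k &= \tanh\left(W_c [h_{k-1}; w_k] + b_c\right) \\
c_k &= f_k \odot c_{k-1} + i_k \odot \tilde{c}_k \\
h_k &= o_k \odot \tanh(c_k)
\end{aligned}
```

One common choice, assumed here, is to take the hidden state after the last word of the sentence as the sentence hidden vector.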

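[Figure: sentence-level LSTM. The words $w_1$, $w_2$, $w_3$ of a sentence pass through word embedding; the LSTM reads the embedded words (hidden states $h$) and outputs the sentence hidden vector at time $t$.]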


The second LSTM is employed to combine sentence hidden vectors to obtain the hidden vector of the answer.

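Question: "If you were admitted to all of them, which school would you choose, and why?"

"I would probably choose National Cheng Kung University." "I want to leave Taipei, which I know well, and study in Tainan." "I also see it as a way to train myself."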
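The two-level encoding can be sketched in plain Python as below. This is a minimal illustration with random, untrained weights and random stand-in word vectors, not the authors' implementation. The helper names (`init_lstm`, `lstm_encode`, `encode_answer`) are hypothetical, and using the last hidden state as the sequence summary is an assumption. The dimensions (30-dimensional word vectors, 32 hidden nodes in both LSTMs) follow the values reported in the experimental setup and results.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) weights for the input, forget, output and candidate gates."""
    size = input_dim + hidden_dim
    return {
        gate: {
            "W": [[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in range(hidden_dim)],
            "b": [0.0] * hidden_dim,
        }
        for gate in ("i", "f", "o", "g")
    }


def affine(gate, z):
    return [sum(w * v for w, v in zip(row, z)) + b for row, b in zip(gate["W"], gate["b"])]


def lstm_encode(params, vectors):
    """Run an LSTM over a sequence of vectors and return the last hidden state."""
    hidden_dim = len(params["i"]["b"])
    h = [0.0] * hidden_dim
    c = [0.0] * hidden_dim
    for x in vectors:
        z = list(x) + h  # concatenate current input and previous hidden state
        i = [sigmoid(v) for v in affine(params["i"], z)]
        f = [sigmoid(v) for v in affine(params["f"], z)]
        o = [sigmoid(v) for v in affine(params["o"], z)]
        g = [math.tanh(v) for v in affine(params["g"], z)]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h


def encode_answer(sentence_lstm, answer_lstm, answer):
    """Level 1: words -> sentence hidden vectors. Level 2: sentences -> answer hidden vector."""
    sentence_vectors = [lstm_encode(sentence_lstm, words) for words in answer]
    return lstm_encode(answer_lstm, sentence_vectors)


rng = random.Random(0)
WORD_DIM, HIDDEN_DIM = 30, 32
sentence_lstm = init_lstm(WORD_DIM, HIDDEN_DIM, rng)
answer_lstm = init_lstm(HIDDEN_DIM, HIDDEN_DIM, rng)

# A stand-in answer of three sentences with 4, 6 and 3 random "word vectors".
answer = [[[rng.uniform(-1, 1) for _ in range(WORD_DIM)] for _ in range(n)] for n in (4, 6, 3)]
answer_hidden = encode_answer(sentence_lstm, answer_lstm, answer)
```

The three-sentence stand-in mirrors the shape of the example answer above: each sentence is encoded separately, and only the resulting sentence vectors are seen by the second LSTM.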


Finally, the answer hidden vector is fed into an ANN to detect the dialog states for dialog state representation.



Because historical information can affect the user's current answer, we combine the hidden vector of the current answer with the historical information to obtain the final dialog state.
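One simple way to combine these inputs is concatenation, sketched below. The slides only say the information is "combined", so concatenation, the one-hot action encoding, the function name `build_ann_input`, and the number of dialog actions (5) are all assumptions made for illustration.

```python
def build_ann_input(answer_hidden, prev_state, prev_action_id, num_actions):
    """Concatenate AHV_t, DS_{t-1} and a one-hot DA_{t-1} into one input vector."""
    if not 0 <= prev_action_id < num_actions:
        raise ValueError("prev_action_id out of range")
    action_one_hot = [0.0] * num_actions
    action_one_hot[prev_action_id] = 1.0
    return [float(v) for v in answer_hidden] + [float(s) for s in prev_state] + action_one_hot


answer_hidden = [0.1] * 32                        # stand-in for the answer hidden vector at time t
prev_state = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]       # 10-slot dialog state at time t-1
features = build_ann_input(answer_hidden, prev_state, prev_action_id=2, num_actions=5)
```

With these assumed sizes the classifier input has 32 + 10 + 5 = 47 dimensions.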

[Figure: two-level LSTM for Answer(t). Sentence 1, Sentence 2, and Sentence 3 each pass through word embedding ($w_1$, $w_2$, $w_3$) and the sentence-level LSTM to give the sentence hidden vectors at $t-2$, $t-1$, and $t$. The answer-level LSTM combines them into the answer hidden vector at $t$, which, together with state $t-1$ and action $t-1$, is fed to ANN1 ... ANN10 to produce State(t), e.g. 0 0 1 0 0 0 ...]


Dialog state detection: each dialog state consists of 10 semantic slots, and we use 10 ANNs to detect them, one per slot.
We feed the answer hidden vector into the ANN models to obtain the dialog state.
The dialog state and dialog action of the previous time step are combined with the answer hidden vector as input.
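The per-slot detection can be sketched as ten independent binary classifiers. The code below is a minimal illustration with random, untrained weights. The single tanh hidden layer, the sigmoid output, the 0.5 threshold, the helper names, and the 47-dimensional input (32 + 10 + a hypothetical 5-action one-hot) are assumptions, since the slides do not specify the ANN architecture beyond its input, hidden, and output layers.

```python
import math
import random


def make_ann(input_dim, hidden_dim, rng):
    """One small feed-forward network with a single hidden layer and one output unit."""
    return {
        "W1": [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "W2": [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def ann_probability(ann, x):
    """Probability that the slot handled by this ANN is present in the answer."""
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(ann["W1"], ann["b1"])]
    logit = sum(w * h for w, h in zip(ann["W2"], hidden)) + ann["b2"]
    return 1.0 / (1.0 + math.exp(-logit))


def detect_dialog_state(anns, x, threshold=0.5):
    """Run every slot-specific ANN and threshold its output to 0 or 1."""
    return [1 if ann_probability(ann, x) >= threshold else 0 for ann in anns]


rng = random.Random(0)
INPUT_DIM = 32 + 10 + 5   # answer hidden vector + previous state + assumed action one-hot
NUM_SLOTS = 10
anns = [make_ann(INPUT_DIM, hidden_dim=16, rng=rng) for _ in range(NUM_SLOTS)]

x = [rng.uniform(-1, 1) for _ in range(INPUT_DIM)]   # stand-in input vector
state = detect_dialog_state(anns, x)
```

Because each slot has its own network, several slots can be active in the same dialog state.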

[Figure: each ANN has input, hidden, and output layers. Input: the answer sentences at time $t$ + the dialog state at time $t-1$ + the dialog action at time $t-1$. Output: 1 if the sentence contains this semantic slot, 0 if it does not.]

Experimental setup

The proposed method was evaluated using 5-fold cross validation.
The dimension of each word vector is set to 30.
The output is a 10-dimensional semantic slot vector.
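A 5-fold split can be sketched as follows. The slides do not say whether the folds were formed over dialogs or over question-and-answer pairs, nor whether the data were shuffled, so splitting the 75 dialogs in round-robin order is purely an assumption, and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_items, k=5):
    """Return k (train, test) index splits; every item is tested exactly once."""
    folds = [list(range(i, n_items, k)) for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


splits = k_fold_indices(75, k=5)   # 75 dialogs in the collected corpus
```

Under this assumption each fold trains on 60 dialogs and tests on the remaining 15.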

Example of the 10-dimensional semantic slot output:

Slot1  Slot2  Slot3  Slot4  Slot5  Slot6  Slot7  Slot8  Slot9  Slot10
1      0      0      0      1      0      0      0      0      0

Here Slot1 is "Association and cadre leadership" and Slot5 is "Motivation".
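The mapping between slot names and this multi-hot vector can be sketched as below. The slot order is assumed to follow the semantic slot table, which is consistent with the example above; `encode_state` and `decode_state` are hypothetical helper names.

```python
SLOTS = [
    "Association and cadre leadership",
    "Performance and achievement",
    "Leisure interest",
    "Pros. and cons.",
    "Motivation",
    "Study and future plan",
    "Curriculum areas",
    "Programming language and professional skills",
    "Personality trait",
    "Others",
]


def encode_state(active_slots):
    """Turn a collection of slot names into a 10-dimensional 0/1 vector."""
    unknown = set(active_slots) - set(SLOTS)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    return [1 if slot in active_slots else 0 for slot in SLOTS]


def decode_state(vector):
    """Turn a 0/1 vector back into the list of active slot names."""
    return [slot for slot, bit in zip(SLOTS, vector) if bit]


state = encode_state({"Association and cadre leadership", "Motivation"})
names = decode_state(state)
```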

Experimental results (1)

Parameter tuning was conducted, based on the accuracy of each ANN's output, to obtain the best performance of the LSTM and ANN models.
The number of hidden nodes in both the LSTM-based sentence model and the answer model was set to 32.

Experimental results (2)

This result shows that adding the dialog state and dialog action at time t-1 made the largest positive contribution in this experiment.

Accuracy by number of ANN hidden nodes and input configuration:

ANN Hidden Nodes   AHVt      AHVt+DSt-1   AHVt+DSt-1+DAt-1
8                  87.44%    89.45%       89.69%
16                 84.98%    89.08%       89.93%
32                 84.37%    88.70%       89.64%
64                 88.78%    88.53%       88.72%
128                87.66%    88.55%       89.05%
256                87.53%    88.17%       88.82%

1. AHVt: Input of ANN contains the answer hidden vectors at time t.
2. AHVt+DSt-1: Input of ANN contains the answer hidden vectors at time t and the dialog states at time t-1.
3. AHVt+DSt-1+DAt-1: Input of ANN contains the answer hidden vectors at time t, and the dialog states along with the dialog action at time t-1.
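To make the comparison easier to read, the short sketch below averages the reported accuracies over the six hidden-node settings and finds the best input configuration per row. The numbers are copied from the table; the averaging itself is an added summary, not a result reported on the slides.

```python
CONFIGS = ("AHVt", "AHVt+DSt-1", "AHVt+DSt-1+DAt-1")

# Accuracy (%) per number of ANN hidden nodes, in the column order of CONFIGS.
ACCURACY = {
    8: (87.44, 89.45, 89.69),
    16: (84.98, 89.08, 89.93),
    32: (84.37, 88.70, 89.64),
    64: (88.78, 88.53, 88.72),
    128: (87.66, 88.55, 89.05),
    256: (87.53, 88.17, 88.82),
}

# Mean accuracy of each input configuration across all hidden-node settings.
mean_accuracy = {
    name: sum(row[i] for row in ACCURACY.values()) / len(ACCURACY)
    for i, name in enumerate(CONFIGS)
}

# Best-performing input configuration for each hidden-node setting.
best_per_row = {nodes: CONFIGS[row.index(max(row))] for nodes, row in ACCURACY.items()}

# Overall best cell in the table: (accuracy, hidden nodes, configuration).
best_overall = max(
    (acc, nodes, CONFIGS[i]) for nodes, row in ACCURACY.items() for i, acc in enumerate(row)
)
```

The means come to roughly 86.79%, 88.75%, and 89.31% for the three configurations. The full input is the best choice in every row except 64 hidden nodes, where AHVt alone is marginally higher, and the best single result is 89.93% with 16 hidden nodes.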

Conclusions

We propose an approach to dialog state tracking (DST) in an interview coaching system.
The word2vec model is employed to encode words into embedding vectors as distributed word representations.
The LSTM-based model is used for sentence and answer vector representation.
The ANN-based model is applied to detect the final dialog state for further policy control.

In the future, a richer interview corpus and a more robust DST model would help improve system performance.
