Neural Networks for Information Retrieval: Generating Responses
nn4ir.com/sigir2017/slides/07_GeneratingResponses.pdf


Outline

Morning program
- Preliminaries
- Text matching I
- Text matching II

Afternoon program
- Learning to rank
- Modeling user behavior
- Generating responses
  - One-shot dialogues
  - Open-ended dialogues (chit-chat)
  - Goal-oriented dialogues
  - Resources
- Wrap up


Generating responses: General Formulation of Typical DL Models

The general basic formulation:

- Learnable parametric function
- Inputs: generally considered i.i.d.
- Outputs: classification or regression
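To make this formulation concrete, here is a minimal sketch of a learnable parametric function f_theta that maps an i.i.d. input to classification outputs; the linear softmax form, names, and shapes are illustrative assumptions, not something prescribed by the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Learnable parameters theta = (W, b) of a linear classifier f_theta.
n_features, n_classes = 16, 3
W = rng.normal(scale=0.1, size=(n_classes, n_features))
b = np.zeros(n_classes)

def f_theta(x):
    """Map one i.i.d. input vector to a distribution over classes."""
    logits = W @ x + b
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

x = rng.normal(size=n_features)           # one i.i.d. input
print(f_theta(x))                         # output: classification probabilities
```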

Question: Is this enough?


Generating responses: Example Scenario

Joe went to the kitchen. Fred went to the kitchen. Joe picked up the milk. Joe traveled to the office. Joe left the milk. Joe went to the bathroom.

- Where is the milk now? A: office
- Where is Joe? A: bathroom
- Where was Joe before the office? A: kitchen


Generating responses: What is Required?

- The model needs to remember the context.
- Given an input, the model needs to know where to look in the context.
- It needs to know what to look for in the context.
- It needs to know how to reason using this context.
- It needs to handle a potentially changing context.

A possible solution:

- Hidden states of RNNs have memory: run an RNN over the context/story/KB and get its representation, then use that representation to map the question to an answer/response.
- This will not scale, as RNNs cannot capture long-term dependencies: gradients vanish, and the memory state is not large enough to capture rich information.


Generating responses: Neural Networks with Memory

- Memory Networks
  - Fully Supervised MemNNs
  - End2End MemNNs
  - Key-Value MemNNs
- Neural Turing Machines
- Stack/List/Queue-Augmented RNNs


Generating responses: Memory Networks

A class of models that combine a large memory with a learning component that can read from and write to it.

- Step 1: the controller converts incoming data to an internal feature representation (I)
- Step 2: the write head updates the memories and writes the data into memory (G)
- Step 3: given the external input, the read head reads the memory and fetches relevant data (O)
- Step 4: the controller combines the external data with the memory contents returned by the read head to generate the output (O, R)
- Inference: given the question, pick the memory that scores highest and use the selected memory and the question to generate the answer.
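The four steps map naturally onto code. Below is a minimal, illustrative skeleton of the I/G/O/R components; the bag-of-words features and the toy response rule are assumptions made for this sketch, not the components of any published model.

```python
import numpy as np

class MemoryNetworkSketch:
    """Minimal I/G/O/R skeleton with bag-of-words features (illustrative only)."""

    def __init__(self, vocab):
        self.vocab = {w: i for i, w in enumerate(vocab)}
        self.memory = []                       # list of stored feature vectors

    def I(self, text):
        """Input: convert raw text to an internal feature representation."""
        v = np.zeros(len(self.vocab))
        for w in text.lower().split():
            if w in self.vocab:
                v[self.vocab[w]] += 1.0
        return v

    def G(self, x):
        """Generalization: write the new representation into memory."""
        self.memory.append(x)

    def O(self, q):
        """Output: read memory and fetch the highest-scoring entry for query q."""
        scores = [m @ q for m in self.memory]
        return self.memory[int(np.argmax(scores))]

    def R(self, q, o):
        """Response: combine query and fetched memory into an output (toy rule)."""
        idx = int(np.argmax(q + o))
        return list(self.vocab)[idx]

vocab = "joe fred milk kitchen office bathroom went picked up to the".split()
net = MemoryNetworkSketch(vocab)
for fact in ["Joe went to the kitchen", "Joe traveled to the office"]:
    net.G(net.I(fact))
q = net.I("where is joe")
print(net.R(q, net.O(q)))
```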


Generating responses: Fully Supervised Memory Networks [Weston et al., 2015]

- The current supporting facts are supplied as a supervision signal for learning the memory addressing stage.
- This is essentially like hard attention, except that you already know where to attend!


Generating responses: Fully Supervised Memory Networks

Context:
John was in the bathroom.
Bob was in the office.
John went to the kitchen. ⟸ Supporting Fact
Bob traveled back home.

Question-Answer Pair:
Where is John? A: kitchen
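A minimal sketch of how the supporting-fact supervision enters training: the memory-addressing distribution is penalized with a cross-entropy loss against the known supporting-fact index. The random embeddings stand in for learned encoders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative memory/question embeddings (rows = stored facts).
rng = np.random.default_rng(1)
M = rng.normal(size=(4, 8))          # 4 stored facts
q = rng.normal(size=8)               # encoded question
supporting_fact = 2                  # known at training time (hard supervision)

p = softmax(M @ q)                   # addressing distribution over memories
loss = -np.log(p[supporting_fact])   # supervise the addressing step directly
print(f"addressing loss: {loss:.3f}")
```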


Generating responses: End2End Memory Networks [Sukhbaatar et al., 2015]

- No current supporting fact supplied.
- Learns which parts of the memory are relevant.
- This is achieved by reading using soft attention, as opposed to hard attention.
- Performs multiple lookups to refine its guess about memory relevance.
- Only needs supervision at the final output.
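A minimal sketch of the soft-attention read with multiple lookups (hops); the random matrices A and C stand in for learned input and output memory embeddings.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 8))   # input memory embeddings (one row per sentence)
C = rng.normal(size=(5, 8))   # output memory embeddings
u = rng.normal(size=8)        # encoded question

# Soft attention read, repeated over several hops to refine the estimate.
for hop in range(3):
    p = softmax(A @ u)        # soft (differentiable) memory relevance
    o = p @ C                 # weighted sum of output embeddings
    u = u + o                 # update the controller state
print("final controller state:", u[:4])
```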


Generating responses: End2End Memory Networks (Multiple Hops)

- Share the input and output embeddings or not
- What to store in memories: individual words, word windows, full sentences
- How to represent the memories: bag-of-words, RNN-style reading at the word or character level


Generating responses: Attentive Memory Networks [Kenter and de Rijke, 2017]

- Proposed model: an end-to-end trainable memory network with a hierarchical input encoder.
- Frames the task of conversational search as a general machine reading task.


Generating responses: Key-Value Memory Networks [Miller et al., 2016]

- Facts are stored in a key-value structured memory.
- The memory is designed so that the model learns to use keys to address memories that are relevant with respect to the question.
- The structure allows the model to encode prior knowledge for the task at hand.
- It also allows the model to leverage possibly complex transforms between key and value.


Generating responses: Key-Value Memory Networks

Example: for a KB triple [subject, relation, object], the key could be [subject, relation] and the value [object], or vice versa.
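A minimal sketch of key-value addressing for this KB-triple example: questions are matched against keys, and the answer is read out from the associated values. The toy hash-based embeddings are a stand-in for learned encoders.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def embed(text, dim=16):
    """Toy deterministic text embedding (stand-in for a learned encoder)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=dim)

# KB triples stored as (key = subject + relation, value = object).
triples = [("paris", "capital_of", "france"),
           ("berlin", "capital_of", "germany"),
           ("france", "has_capital", "paris")]
keys = np.stack([embed(s + " " + r) for s, r, o in triples])
values = [o for s, r, o in triples]

q = embed("paris capital_of")        # encoded question
p = softmax(keys @ q)                # address memories via their keys
print(values[int(np.argmax(p))])     # read out the value (here: 'france')
```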


Generating responses: Machine Reading [Hermann et al., 2015]

Cloze-style question answering: teaching a machine to understand language, i.e., to read a passage and answer questions pertaining to it. However, the questions should be such that they cannot be answered using external world knowledge.
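For example, a cloze query can be constructed by blanking an entity out of a summary sentence, with entities anonymized so the question cannot be answered from world knowledge; the marker strings below are illustrative.

```python
# Build a cloze-style query by blanking out an entity in a summary sentence.
passage = "@entity1 visited @entity2 to sign the climate accord ."
summary = "@entity1 signed the accord in @entity2 ."

answer = "@entity2"
query = summary.replace(answer, "@placeholder")
print(query)   # "@entity1 signed the accord in @placeholder ."
```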


Generating responses: Teaching Machines to Read and Comprehend

General structure: a two-layer Deep LSTM Reader, with the question encoded before the document.

Document attentive reading: the encoded query attends over the document token representations (the Attentive Reader).
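A minimal sketch of the attentive-reading step: the query representation attends over document token states, and the attended summary is combined with the query for answer scoring. Random vectors stand in for LSTM outputs.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(3)
D = rng.normal(size=(20, 32))   # document token states (stand-in for LSTM outputs)
u = rng.normal(size=32)         # query representation

a = softmax(D @ u)              # attention weights over document tokens
r = a @ D                       # attended document representation
joint = np.tanh(r + u)          # combine document and query for answer scoring
print(joint[:4])
```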


Generating responses: WikiReading [Hewlett et al., 2016]

Task: predict textual values from the structured knowledge base Wikidata by reading the text of the corresponding Wikipedia articles.

- Categorical properties, such as instance of, gender, and country, require selecting between a relatively small number of possible answers.
- Relational properties, such as date of birth, parent, and capital, typically require extracting rare or totally unique answers from the document.


Generating responses: WikiReading

- Answer classification: encode document and question, and use a softmax classifier to assign a probability to each of the top 50k answers (limited answer vocabulary).
  - Models: sparse BoW baseline, averaged embeddings, Paragraph Vector, LSTM Reader, Attentive Reader, Memory Network.
  - Generally, models with RNNs and attention work better, especially on relational properties.
- Answer extraction (labeling/pointing): for each word in the document, compute the probability that it is part of the answer.
  - Works regardless of the answer vocabulary, but the answer must be mentioned in the document.
  - RNN Labeler: shows a complementary set of strengths, performing better on relational properties than on categorical ones.
- Sequence to sequence: encode query and document, and decode the answer as a sequence of words or characters.
  - Models: basic seq2seq, placeholder seq2seq, basic character seq2seq.
  - Unifies classification and extraction in one model: a greater degree of balance between relational and categorical properties.


Generating responses: Dialogue systems

Dialogues / conversational agents / chat bots

Open-ended dialogues
- ELIZA
- Twitterbots
- Alexa/Google Home/Siri/Cortana

Goal-oriented dialogues
- Restaurant finding
- Hotel reservations
- Set an alarm clock
- Order a pizza
- Play music
- Alexa/Google Home/Siri/Cortana

Is this IR?


Generating responses: Dialogue systems

Chit-chat bots

[Figure: sequence-to-sequence dialogue; the user's "Hello how are you" is encoded, and the machine decodes "I am fine thanks".]

Straightforward seq-to-seq [Vinyals and Le, 2015]. ([Sordoni et al., 2015b] is a precursor, but no RNN-to-RNN, and no LSTM.)

[Figure: the same exchange, with the decoder attending over the encoder states.]

Same idea, but with attention [Shang et al., 2015].
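A minimal sketch of the seq-to-seq setup in these figures: an RNN encodes the user utterance into a vector, and a decoder RNN emits the response token by token. The weights are random and untrained, so the output is gibberish; the sketch only shows the mechanics.

```python
import numpy as np

rng = np.random.default_rng(4)
vocab = ["<s>", "</s>", "hello", "how", "are", "you", "i", "am", "fine", "thanks"]
V, H = len(vocab), 16
idx = {w: i for i, w in enumerate(vocab)}

E = rng.normal(scale=0.1, size=(V, H))     # word embeddings
Wh = rng.normal(scale=0.1, size=(H, H))    # recurrent weights (shared for brevity)
Wo = rng.normal(scale=0.1, size=(V, H))    # output projection

def rnn_step(h, w):
    return np.tanh(Wh @ h + E[idx[w]])

# Encoder: read the user utterance into a fixed vector.
h = np.zeros(H)
for w in "hello how are you".split():
    h = rnn_step(h, w)

# Decoder: greedily emit the machine response, token by token.
w, out = "<s>", []
for _ in range(6):
    h = rnn_step(h, w)
    w = vocab[int(np.argmax(Wo @ h))]
    if w == "</s>":
        break
    out.append(w)
print("untrained response:", " ".join(out))   # gibberish until trained
```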


Generating responses: Dialogue systems

Limitations

- 'Wrong' optimization criterion
- Generic responses
- No way to incorporate world knowledge
- No model of conversation
- Inconsistency
- No memory of what was said earlier on

Human: what is your job?
Machine: i'm a lawyer.
Human: what do you do?
Machine: i'm a doctor.
(Example from [Vinyals and Le, 2015])

Evaluation

- Perplexity?
- BLEU/METEOR?
- Nice overview: How NOT To Evaluate Your Dialogue System [Liu et al., 2016].
- Open problem...
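For reference, perplexity (the first metric above) is the exponentiated average negative log-likelihood of the reference tokens; a minimal sketch with made-up token probabilities:

```python
import numpy as np

# Model-assigned probabilities of each reference token (illustrative values).
token_probs = [0.25, 0.10, 0.50, 0.05, 0.40]

nll = -np.mean(np.log(token_probs))   # average negative log-likelihood
perplexity = np.exp(nll)
print(f"perplexity: {perplexity:.2f}")
```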


Generating responses: Dialogue systems

Four solutions:

- More consistency by maintaining a persona
- More consistency in dialogue with a hierarchical network
- Less generic responses with a different optimization function
- More natural responses with GANs


Generating responses: Persona-based

[Figure: a seq2seq model whose decoder conditions on word embeddings (50k) and on speaker embeddings (70k); for the source "where do you live" and speaker Rob_712, the target "in england ." is generated with Rob's speaker embedding injected at every decoder step. Nearby speaker embeddings cluster similar users; nearby word embeddings cluster related words (england, london, u.s.; great, good; monday, tuesday).]

A persona-based model for generating responses [Li et al., 2016].
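A minimal sketch of the core mechanism: the decoder's recurrent update receives a learned speaker embedding in addition to the word embedding at every step, so responses become speaker-consistent. Dimensions, names, and the tanh update are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
H, W_DIM, S_DIM = 16, 16, 8
Wh = rng.normal(scale=0.1, size=(H, H))     # recurrent weights
Ww = rng.normal(scale=0.1, size=(H, W_DIM)) # word-input weights
Ws = rng.normal(scale=0.1, size=(H, S_DIM)) # speaker-input weights

speaker_emb = {"Rob_712": rng.normal(size=S_DIM)}   # one vector per known speaker

def decoder_step(h, word_vec, speaker):
    """Persona-conditioned update: the speaker embedding enters every step."""
    return np.tanh(Wh @ h + Ww @ word_vec + Ws @ speaker_emb[speaker])

h = np.zeros(H)
h = decoder_step(h, rng.normal(size=W_DIM), "Rob_712")
print(h[:4])
```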


Generating responses: Dialogue systems

Hierarchical seq-to-seq [Serban et al., 2016]. Main evaluation metric: perplexity.
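A minimal sketch of the hierarchical idea: a word-level RNN encodes each utterance into a vector, and a second, utterance-level RNN runs over those vectors to maintain dialogue context. All weights here are random stand-ins for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(6)
H = 16
W_enc = rng.normal(scale=0.1, size=(H, H))   # word-level encoder weights
W_ctx = rng.normal(scale=0.1, size=(H, H))   # context-level recurrent weights
U_ctx = rng.normal(scale=0.1, size=(H, H))   # context-level input weights

def embed(word):
    r = np.random.default_rng(abs(hash(word)) % (2**32))
    return r.normal(size=H)

def encode_utterance(utterance):
    h = np.zeros(H)
    for w in utterance.split():
        h = np.tanh(W_enc @ h + embed(w))
    return h

# The context RNN runs over utterance vectors, not individual words.
context = np.zeros(H)
for utt in ["hello how are you", "i am fine thanks", "what are you doing"]:
    context = np.tanh(W_ctx @ context + U_ctx @ encode_utterance(utt))
print("dialogue context state:", context[:4])
```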


Generating responses: Dialogue systems

Avoid generic responses

Usually: optimize the log-likelihood of the predicted utterance, given the previous context:

C_LL = argmax_{u_t} log p(u_t | context) = argmax_{u_t} log p(u_t | u_0 ... u_{t-1})

To avoid repetitive/boring answers ("I don't know"), use the maximum mutual information between the previous context and the predicted utterance [Li et al., 2015]:

C_MMI = argmax_{u_t} log [ p(u_t, context) / (p(u_t) p(context)) ]
      = [derivation, next page ...]
      = argmax_{u_t} (1 − λ) log p(u_t | context) + λ log p(context | u_t)


Generating responses: Dialogue systems

Bayes' rule:

log p(u_t | context) = log [ p(context | u_t) p(u_t) / p(context) ]
log p(u_t | context) = log p(context | u_t) + log p(u_t) − log p(context)
log p(u_t) = log p(u_t | context) − log p(context | u_t) + log p(context)

C_MMI = argmax_{u_t} log [ p(u_t, context) / (p(u_t) p(context)) ]
      = argmax_{u_t} log [ p(u_t | context) p(context) / (p(u_t) p(context)) ]
      = argmax_{u_t} log [ p(u_t | context) / p(u_t) ]
      = argmax_{u_t} log p(u_t | context) − log p(u_t)    ← weird: minus a language-model score
      = argmax_{u_t} log p(u_t | context) − λ log p(u_t)    ← introduce λ
      = argmax_{u_t} log p(u_t | context) − λ (log p(u_t | context) − log p(context | u_t) + log p(context))
      = argmax_{u_t} (1 − λ) log p(u_t | context) + λ log p(context | u_t)    ← λ log p(context) is constant in u_t and drops out

(More is needed to get it to work. See [Li et al., 2015] for more details.)
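In practice, the MMI objective is often applied by reranking N-best candidate responses. The sketch below uses made-up log-probabilities for the two directional models; only the scoring formula comes from the slide.

```python
lam = 0.5

# Made-up N-best candidates with illustrative log-probs from the two models:
# (log p(target | context), log p(context | target)).
candidates = {
    "i don't know":  (-2.0, -9.0),   # likely, but uninformative about the context
    "i'm a lawyer":  (-4.0, -3.0),   # less likely, but mutually informative
    "i'm a doctor":  (-4.5, -3.5),
}

def mmi_score(lp_t_given_c, lp_c_given_t):
    return (1 - lam) * lp_t_given_c + lam * lp_c_given_t

best = max(candidates, key=lambda t: mmi_score(*candidates[t]))
print(best)   # MMI prefers "i'm a lawyer" over the generic answer
```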


Generating responses: Generative adversarial network for dialogues

- Discriminator network
  - Classifier: real or generated utterance
- Generator network
  - Generate a realistic utterance

[Figure: the generator's output and real data both feed the discriminator, which outputs p(real vs. generated).]

Original GAN paper: [Goodfellow et al., 2014]. Conditional GANs: e.g., [Isola et al., 2016].


Generating responses: Generative adversarial network for dialogues

- Discriminator network
  - Classifier: real or generated utterance
- Generator network
  - Generate a realistic utterance

See [Li et al., 2017] for more details.

[Figure: the provided context "Hello how are you" and the generated reply "I am fine thanks" feed into the generator; the discriminator scores p(x = real) vs. p(x = generated).]

Code available at https://github.com/jiweil/Neural-Dialogue-Generation
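The linked repository contains the actual implementation; the toy sketch below only illustrates the adversarial signal, using a frozen discriminator score as a REINFORCE reward for a softmax generator policy over canned responses. Everything here is an invented stand-in, not the method of [Li et al., 2017].

```python
import numpy as np

rng = np.random.default_rng(7)

responses = ["i don't know", "i'm a lawyer", "i am fine thanks"]
gen_logits = np.zeros(len(responses))        # generator: softmax policy over responses
disc_scores = {"i don't know": 0.1,          # discriminator: P(real) per utterance
               "i'm a lawyer": 0.7,          # (frozen here; normally trained jointly)
               "i am fine thanks": 0.8}

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

lr = 1.0
for step in range(200):
    p = softmax(gen_logits)
    i = rng.choice(len(responses), p=p)      # sample an utterance from the generator
    reward = disc_scores[responses[i]]       # reward = discriminator's P(real)
    grad = -p                                # REINFORCE gradient for a softmax policy
    grad[i] += 1.0
    gen_logits += lr * reward * grad         # push probability toward rewarded samples

print(responses[int(np.argmax(gen_logits))]) # generator drifts toward realistic replies
```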


Generating responses: Dialogue systems

Open-ended dialogue systems

- Very cool, current problem
- Very hard
- Many problems:
  - Training data
  - Evaluation
  - Consistency
  - Persona
  - ...


Generating responses: Goal-oriented

Idea

- Closed domain
  - Restaurant reservations
  - Finding movies
- Have the dialogue system find out what the user wants

Challenges

- Training data
- Keeping track of dialogues
- Handling out-of-domain words or requests
- Going beyond task-specific slot filling
- Intermingling live API calls, chit-chat, information requests, etc.
- Evaluation:
  - Solving the task
  - Naturalness
  - Tone of voice
  - Speed
  - Error recovery


Generating responses: Goal-oriented as seq2seq

Memory network [Bordes and Weston, 2017]

- Simulated dataset
  - Finite set of things the bot can say, because of the way the dataset is constructed
- Memory networks
- Training: next-utterance prediction
- Evaluation:
  - response-level
  - dialogue-level

Restaurant knowledge base, i.e., a table, queried by API calls. Each row = one restaurant:

- cuisine (10 choices, e.g., French, Thai)
- location (10 choices, e.g., London, Tokyo)
- price range (cheap, moderate, or expensive)
- rating (from 1 to 8)

For words of relevant entity types, add a trainable entity vector.
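A sketch of how such an API call can be assembled from the slots gathered during the dialogue; the call syntax is an illustrative assumption modeled on the dataset's style.

```python
# Slots gathered from the dialogue so far (illustrative values).
slots = {"cuisine": "french", "location": "london",
         "price": "cheap", "party_size": 4}

def api_call(slots):
    """Format a KB query in the style of the bAbI dialog tasks (illustrative)."""
    return "api_call {cuisine} {location} {party_size} {price}".format(**slots)

print(api_call(slots))   # api_call french london 4 cheap
```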


Generating responses: Goal-oriented as reinforcement learning

A typical reinforcement learning system:

- States S
- Actions A
- State transition function: T : S × A → S
- Reward function: R : S × A × S → ℝ
- Policy: π : S → A

An RL system needs an environment to interact with (e.g., real users).

Typically [Shah et al., 2016]:

- States: the agent's interpretation of the environment: a distribution over user intents, dialogue acts, and slots and their values
  - intent(buy ticket)
  - inform(destination=Atlanta)
  - ...
- Actions: possible communications, usually designed as a combination of dialogue act tags, slots, and possibly slot values
  - request(departure date)
  - ...
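To make the formalism concrete, here is a toy tabular Q-learning sketch over an invented slot-filling dialogue environment: states are the sets of filled slots, actions are request(...) or book(), and reward arrives only for a fully specified booking. Everything here is an illustrative assumption.

```python
import random
from collections import defaultdict

random.seed(0)
SLOTS = ("destination", "departure_date")
ACTIONS = ["request(destination)", "request(departure_date)", "book()"]

def step(state, action):
    """Toy environment: state = frozenset of filled slots."""
    if action == "book()":
        # Reward only if booking happens with all slots filled.
        return state, (1.0 if len(state) == len(SLOTS) else -1.0), True
    slot = action[len("request("):-1]
    return state | {slot}, -0.1, False       # small cost for each extra turn

Q = defaultdict(float)
alpha, gamma, eps = 0.5, 0.9, 0.2
for episode in range(500):
    s, done = frozenset(), False
    while not done:
        if random.random() < eps:
            a = random.choice(ACTIONS)       # explore
        else:
            a = max(ACTIONS, key=lambda act: Q[s, act])   # exploit
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2, a2] for a2 in ACTIONS)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

s = frozenset()
print(max(ACTIONS, key=lambda act: Q[s, act]))   # learned: request a slot first
```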


Generating responsesGoal-oriented as reinforcement learning

Restaurant finding [Wen et al., 2017]:

I Neural belief tracking: distributionover a possible values of a set ofslots

I Delexicalisation: swap slot-valuesfor generic token (e.g. Chinese,Indian, Italian → FOOD TYPE )

Movie finding [Dhingra et al., 2017]:

I Simulated user

I Soft attention over databaseI Neural belief tracking:

I Multinomial distribution forevery column over possiblecolumn values

I RNN, input is dialogue so far,output softmax over possiblecolumn values

Reward based on finding the right KBentry.
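Delexicalisation itself is straightforward to sketch: surface slot values are replaced with generic placeholder tokens before the utterance reaches the model. The token names and value lists below are illustrative.

```python
# Map surface slot values to generic placeholder tokens (values illustrative).
DELEX = {"chinese": "FOOD_TYPE", "indian": "FOOD_TYPE", "italian": "FOOD_TYPE",
         "cheap": "PRICE_RANGE", "expensive": "PRICE_RANGE"}

def delexicalise(utterance):
    return " ".join(DELEX.get(w, w) for w in utterance.lower().split())

print(delexicalise("I want a cheap Italian place"))
# -> "i want a PRICE_RANGE FOOD_TYPE place"
```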


Generating responses: Goal-oriented

Goal-oriented models

- Currently work primarily in very small domains
- How about multiple speakers?
- Not clear what kind of architecture is best
- Reinforcement learning might be the way to go (?)
- Open research area...


Generating responses: Resources

Open-ended dialogues
- OpenSubtitles [Tiedemann, 2009]
- Twitter: http://research.microsoft.com/convo/
- Weibo: http://www.noahlab.com.hk/topics/ShortTextConversation
- Ubuntu Dialogue Corpus [Lowe et al., 2015]
- Switchboard: https://web.stanford.edu/~jurafsky/ws97/
- Coarse Discourse (Google Research): https://research.googleblog.com/2017/05/coarse-discourse-dataset-for.html

Goal-oriented dialogues
- MISC: a data set of information-seeking conversations [Thomas et al., 2017]
- Maluuba Frames: http://datasets.maluuba.com/Frames
- Loqui Human-Human Dialogue Corpus: https://academiccommons.columbia.edu/catalog/ac:176612
- bAbI (Facebook Research): https://research.fb.com/downloads/babi/

Machine reading
- bAbI QA (Facebook Research): https://research.fb.com/downloads/babi/
- WikiReading (Google Research): https://github.com/google-research-datasets/wiki-reading