An explanation of "Distributed representation of sentences and documents"

“Distributed Representation of Sentences and Documents”: an explanation

西尾泰和 (Hirokazu Nishio)

Friday, June 6, 2014

The story so far

http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf




What about sentences?

Variable length ↑


Bag-of-Words (BoW)


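(Added later)

• A BoW vector has as many dimensions as the vocabulary

• Word-order information is lost: "A is better than B" and "B is better than A" become identical

• There is a form of higher brain dysfunction in which people lose the ability to make exactly this kind of distinction.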
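A minimal Python sketch of the order problem, assuming simple lowercasing and whitespace tokenization: a BoW is just a table of word counts, so the two opposite sentences end up with exactly the same representation.

```python
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    # Lowercase and split on whitespace; Counter keeps only counts, not positions
    return Counter(sentence.lower().split())


forward = bag_of_words("A is better than B")
backward = bag_of_words("B is better than A")
print(forward == backward)  # True: opposite claims, identical BoW
```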


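Vectorizing text

• BoW is the sum of the words' 1-of-K (one-hot) representations

• Then could we simply use the sum of the words' distributed representations instead?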
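The two ideas on this slide can be sketched side by side in Python. The toy vocabulary and the random 3-dimensional "word vectors" are hypothetical stand-ins for trained embeddings; the point is only the arithmetic: BoW is a sum of 1-of-K vectors, and the alternative is a sum (or average) of dense word vectors.

```python
import random


def one_hot(index: int, size: int) -> list[float]:
    # 1-of-K vector: all zeros except a 1 at the word's index
    v = [0.0] * size
    v[index] = 1.0
    return v


def vec_sum(vectors: list[list[float]]) -> list[float]:
    # Element-wise sum of equal-length vectors
    return [sum(col) for col in zip(*vectors)]


vocab = ["a", "is", "better", "than", "b"]  # hypothetical toy vocabulary
word_to_id = {w: i for i, w in enumerate(vocab)}
tokens = "a is better than b".split()

# BoW = sum of the words' 1-of-K vectors, i.e. a |vocab|-dimensional count vector
bow = vec_sum([one_hot(word_to_id[t], len(vocab)) for t in tokens])

# Hypothetical 3-dimensional word vectors (random stand-ins for trained embeddings)
rng = random.Random(0)
emb = {w: [rng.uniform(-1.0, 1.0) for _ in range(3)] for w in vocab}

summed = vec_sum([emb[t] for t in tokens])                # sum of word vectors
average = [x / len(tokens) for x in summed]               # average of word vectors
reordered = vec_sum([emb[t] for t in reversed(tokens)])   # same words, reversed order
```

Comparing `summed` with `reordered` shows why this is not enough on its own: like BoW, a sum or average of word vectors is the same for any reordering of the same words, so it is much lower-dimensional but still order-blind.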


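(Figure: results of an experiment that judges from a text whether it is positive or negative, comparing BoW, the sum (average) of word vectors, and the proposed method)

The proposed method performs even better!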


Proposed method

• PV-DM: Distributed Memory Model

• PV-DBOW: Distributed Bag of Words

The method combines these two. PV-DM alone is already quite good, but combining it with PV-DBOW is even better.


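PV-DM

(Figure: PV-DM architecture. Each context word enters as a 1-of-K vector over the vocabulary (↑), and the paragraph ID enters as a 1-of-K vector over the paragraphs (→). As in CBOW, the distributed representations are learned by having the network solve a prediction task.)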


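PV-DM

• Builds distributed representations by having the model solve a prediction task

• This concept is the same as in CBOW

• But the vectors are concatenated rather than summed or averaged, so word-order information is preserved

• The Introduction criticizes BoW for losing word order

• The figure also shows averaging, but the reported main results use concatenation only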
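To make the concatenation vs. averaging point concrete, here is a small Python sketch of how the PV-DM hidden input could be assembled from a paragraph vector and the context word vectors (the softmax that then predicts the next word is omitted). The vector size, vocabulary, and random vectors are hypothetical stand-ins for learned parameters.

```python
import random

DIM = 4  # hypothetical size of both paragraph and word vectors
rng = random.Random(0)
vocab = ["who", "is", "the", "main", "character", "?"]
word_vec = {w: [rng.uniform(-1.0, 1.0) for _ in range(DIM)] for w in vocab}
para_vec = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]  # one row of matrix D


def hidden_concat(paragraph: list[float], context: list[str]) -> list[float]:
    # Concatenation: [d; w_1; ...; w_k], length DIM * (1 + k).
    # Each context position has its own slot, so word order is kept.
    h = list(paragraph)
    for w in context:
        h.extend(word_vec[w])
    return h


def hidden_average(paragraph: list[float], context: list[str]) -> list[float]:
    # Averaging variant: element-wise mean, length DIM; word order is lost.
    vectors = [paragraph] + [word_vec[w] for w in context]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


context = ["who", "is", "the"]  # window used to predict the next word, "main"
swapped = ["the", "is", "who"]  # same words, different order
```

With concatenation, `context` and `swapped` give different hidden inputs; with averaging they give the same one, which is the loss of order the Introduction criticizes.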


PV-DM

(Figure: the PV-DM diagram with the average option crossed out ✕)

Let's ignore averaging.

(Added later)


PV-DM

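• The projection from the paragraph ID to the hidden layer (matrix D) serves to represent information that the context alone cannot express, which improves prediction accuracy

• For a new paragraph that was not in the training data, the word vectors and other parameters are held fixed and only the new paragraph's vector is trained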
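The inference step for an unseen paragraph can be sketched in Python as gradient descent on the new paragraph vector alone. This is a minimal sketch under assumptions: full-batch gradient descent, a plain softmax instead of the hierarchical softmax or sampling a real implementation would use, a zero initial vector, and hypothetical toy parameters `W`, `U`, `b` standing in for trained, frozen weights.

```python
import math
import random


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def infer_paragraph_vector(windows, W, U, b, dim, lr=0.05, epochs=50):
    """Learn a vector d for an unseen paragraph while W, U, b stay fixed.

    windows: list of (context_word_ids, target_word_id) from the new paragraph.
    Hidden input h = concat(d, W[c_1], ..., W[c_k]); prediction = softmax(U h + b).
    """
    d = [0.0] * dim
    losses = []
    for _ in range(epochs):
        loss, grad_d = 0.0, [0.0] * dim
        for context, target in windows:
            h = d + [x for c in context for x in W[c]]
            logits = [sum(u * x for u, x in zip(row, h)) + bj for row, bj in zip(U, b)]
            p = softmax(logits)
            loss -= math.log(p[target])
            # d(loss)/d(logits) = p - onehot(target); only the d part of h gets a gradient
            g = [pj - (1.0 if j == target else 0.0) for j, pj in enumerate(p)]
            for i in range(dim):
                grad_d[i] += sum(g[j] * U[j][i] for j in range(len(U)))
        n = len(windows)
        losses.append(loss / n)
        d = [di - lr * gi / n for di, gi in zip(d, grad_d)]
    return d, losses


# Hypothetical toy setup: 5-word vocabulary, 4-dim vectors, 2-word contexts
rng = random.Random(1)
V, DIM, K = 5, 4, 2
W = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(V)]            # frozen word vectors
U = [[rng.uniform(-1.0, 1.0) for _ in range(DIM + K * DIM)] for _ in range(V)]  # frozen softmax weights
b = [0.0] * V                                                                  # frozen softmax bias
windows = [([0, 1], 2), ([1, 2], 3), ([2, 3], 4)]
d, losses = infer_paragraph_vector(windows, W, U, b, DIM)
```

Because only `d` moves and the logits are linear in `d`, this inference problem is convex, so a small enough step size steadily lowers the loss in `losses`.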


PV-DBOW

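In effect, it compresses "how often each word occurs in that paragraph" from the vocabulary's hundreds of thousands of dimensions down to 400 dimensions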
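A hedged Python sketch of that reading of PV-DBOW: the paragraph vector is the only input and must predict the paragraph's words through a softmax. Averaged over the paragraph's words, the loss is the cross-entropy between the empirical word frequencies and `softmax(U d)`, so `d` is pushed toward a compressed encoding of the frequency profile. Simplifications (assumptions, not the paper's procedure): the output weights `U` are held fixed and random here (in real training they are learned jointly), and the full frequency distribution replaces random sampling of words.

```python
import math
import random
from collections import Counter


def softmax(z: list[float]) -> list[float]:
    m = max(z)
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


def fit_pv_dbow_vector(tokens, vocab, U, lr=0.05, steps=200):
    """Fit one paragraph vector d so that softmax(U d) predicts the paragraph's words.

    U (|vocab| x dim) is held fixed for simplicity; every token must be in vocab.
    """
    counts = Counter(tokens)
    freq = [counts[w] / len(tokens) for w in vocab]  # empirical word distribution
    dim = len(U[0])
    d = [0.0] * dim
    losses = []
    for _ in range(steps):
        p = softmax([sum(u * x for u, x in zip(row, d)) for row in U])
        # Mean negative log-likelihood of the paragraph's words
        losses.append(-sum(f * math.log(pj) for f, pj in zip(freq, p) if f > 0))
        g = [pj - f for pj, f in zip(p, freq)]  # d(loss)/d(logits)
        grad = [sum(g[j] * U[j][i] for j in range(len(U))) for i in range(dim)]
        d = [di - lr * gi for di, gi in zip(d, grad)]
    return d, losses


# Hypothetical toy setup: a 6-word vocabulary compressed into a 3-dim paragraph vector
vocab = ["the", "movie", "was", "a", "mess", "great"]
rng = random.Random(2)
U = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in vocab]
d, losses = fit_pv_dbow_vector("the movie was a mess a mess".split(), vocab, U)
```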


Experiment 1

• Read a movie review and decide whether it is positive or negative


Positive? Negative?

• It starts out like a very serious social commentary which quickly makes one think of other Clark movies like Kids, Bully, etc. But then just as quickly, it unravels into a direction-less mess. Who is the main character? Is this a serious film or some Gregg Araki-esquire over the top goofy film? Is this a skate documentary with moments of dialog inserted? I have no clue. I found myself watching the clock and wonder when this turd was going to end. I kept thinking there would be some big shocker culmination which never came. I cut a good 20 minutes out of the movie by fast forwarding through the pointless skate scenes. Yes, it illustrates the changing landscape


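The importance of word order

• A human can immediately tell that "Who is the main character?" is negative

• But "main character" alone is not negative, and none of the other tokens (is, the, who, ?) carries a negative meaning on its own

• (If anything, ? is slightly negative)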


Protocol

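Each input is a single sentence; the training data contains 8,544 paragraphs. "Going from 8544 to 800 dimensions isn't much of a dimensionality reduction, is it?"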
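As I understand the paper's protocol for this experiment, each sentence gets a 400-dimensional PV-DM vector and a 400-dimensional PV-DBOW vector, the two are concatenated into 800 dimensions, and a logistic regression classifier is trained on top. The Python sketch below shows only that last stage on hypothetical random vectors with a synthetic label; the step size and epoch count are arbitrary assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.001, epochs=30):
    """Plain full-batch gradient descent on the mean logistic loss."""
    n, dim = len(X), len(X[0])
    w, bias = [0.0] * dim, 0.0
    losses = []
    for _ in range(epochs):
        gw, gb, loss = [0.0] * dim, 0.0, 0.0
        for x, t in zip(X, y):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)
            loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            for i, xi in enumerate(x):
                gw[i] += (p - t) * xi
            gb += p - t
        losses.append(loss / n)
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        bias -= lr * gb / n
    return w, bias, losses


# Hypothetical data: random stand-ins for the learned 400-dim vectors of 20 sentences
rng = random.Random(0)
X, y = [], []
for _ in range(20):
    pv_dm = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    pv_dbow = [rng.uniform(-1.0, 1.0) for _ in range(400)]
    X.append(pv_dm + pv_dbow)                        # concatenation: 800 dimensions
    y.append(1 if pv_dm[0] + pv_dbow[0] > 0 else 0)  # synthetic label
w, bias, losses = train_logreg(X, y)
```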


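(Figure: results table comparing BoW-based methods, methods that require syntactic parsing, and the proposed method)

The proposed method performs well and does not need syntactic parsing!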


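Experiment 2

• Read a movie review and decide whether it is positive or negative

• In Experiment 1 each input is a single sentence; here each input consists of multiple sentences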


Protocol

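A neural network sits in between because making the classifier nonlinear performed better than linear logistic regression (LogReg).

"With as many as 800 dimensions, can't a linear separator perform well?" "What is the dimensionality of the NN's output?"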
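The shape of that pipeline (paragraph vector, then a hidden layer, then a logistic output) can be sketched as an untrained forward pass in Python. The ReLU activation, the hidden width of 50, and the initialization scheme are hypothetical choices for illustration, not details taken from the slide.

```python
import math
import random


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_out: int, n_in: int, rng: random.Random) -> list[list[float]]:
    # Uniform init scaled by 1/sqrt(fan_in) (an assumption for this sketch)
    a = 1.0 / math.sqrt(n_in)
    return [[rng.uniform(-a, a) for _ in range(n_in)] for _ in range(n_out)]


def classify(x, W1, b1, w2, b2):
    """800-dim paragraph vector -> ReLU hidden layer -> logistic output P(positive)."""
    hidden = [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bj) for row, bj in zip(W1, b1)]
    p = sigmoid(sum(wi * hi for wi, hi in zip(w2, hidden)) + b2)
    return p, hidden


rng = random.Random(0)
IN_DIM, HIDDEN = 800, 50  # hidden width is a hypothetical choice
W1 = init_layer(HIDDEN, IN_DIM, rng)
b1 = [0.0] * HIDDEN
w2 = init_layer(1, HIDDEN, rng)[0]
b2 = 0.0
x = [rng.uniform(-1.0, 1.0) for _ in range(IN_DIM)]  # stand-in paragraph vector
p, hidden = classify(x, W1, b1, w2, b2)
```

The nonlinearity between the two linear maps is what lets this model draw non-linear decision boundaries; without it, the two layers would collapse into a single linear classifier equivalent to LogReg.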


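(Figure: Experiment 2 results table listing methods such as RBM and Naive Bayes + SVM; an arrow (↑) marks the PV-DM + PV-DBOW row)

PV-DM only: 7.63% error rate; PV-DM with sum instead of concatenation: 8.06%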

Varying the window size between 5 and 12 changes the error rate by 0.7% → the window size should be chosen by cross-validation
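A minimal Python sketch of choosing the window size by k-fold cross-validation. `fake_train_and_score` is a hypothetical stand-in: a real version would train Paragraph Vector with the given window on the training folds and return validation accuracy.

```python
from collections.abc import Callable, Iterator, Sequence
from statistics import mean


def k_fold_splits(n: int, k: int) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, validation_indices) for k contiguous folds."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def choose_window(data: Sequence, windows: Sequence[int],
                  train_and_score: Callable[[list, list, int], float], k: int = 5) -> int:
    """Return the window size with the best mean validation score (higher is better)."""
    best_window, best_score = windows[0], float("-inf")
    for window in windows:
        scores = []
        for train_idx, val_idx in k_fold_splits(len(data), k):
            train = [data[i] for i in train_idx]
            val = [data[i] for i in val_idx]
            scores.append(train_and_score(train, val, window))
        score = mean(scores)
        if score > best_score:
            best_window, best_score = window, score
    return best_window


def fake_train_and_score(train: list, val: list, window: int) -> float:
    # Hypothetical stand-in scoring function that happens to peak at window = 8
    return -abs(window - 8)


best = choose_window(list(range(100)), range(5, 13), fake_train_and_score)
print(best)  # 8
```

Note that this multiplies the training cost by (number of candidate windows) × k, which feeds directly into the time-cost concern.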


Time cost

• "It can be expensive, but testing can be parallelized: on 16 cores, 25,000 paragraphs averaging 230 words took 30 minutes"

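• But the training phase before testing naively involves 3× as much data, and searching window sizes 5 to 12 multiplies the cost by another 8×

• In the test phase the word vectors and other parameters are fixed, so the cost of learning them is not included in that figure

→ Wouldn't the full pipeline take quite a long time?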
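The slide's back-of-the-envelope estimate, written out in Python under its own naive assumption that cost scales linearly with data size and with the number of window settings tried:

```python
# Naive estimate using the slide's numbers (linear scaling assumed)
test_minutes = 30                   # test phase: 25,000 paragraphs (~230 words) on 16 cores
data_factor = 3                     # training phase sees ~3x as much data (slide's estimate)
window_sizes = list(range(5, 13))   # candidate window sizes 5..12
estimated_minutes = test_minutes * data_factor * len(window_sizes)
print(len(window_sizes), estimated_minutes, estimated_minutes / 60)  # 8 720 12.0
```

That is roughly 12 hours, and it still leaves out the cost of learning the word vectors themselves, which the 30-minute test figure excludes, so the real total would likely be higher.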


If anything, doesn't this highlight how strong the bigram Naive Bayes family is?

(Added later)

Naive Bayes + SVM, bigram →
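A hedged Python sketch of the feature side of a bigram Naive Bayes + SVM model: bigram counts (which, unlike plain BoW, do distinguish "A is better than B" from "B is better than A") and the Naive Bayes log-count ratio r = log((p/‖p‖₁)/(q/‖q‖₁)) that NB-SVM-style models use to scale features before a linear SVM. The SVM itself and count binarization are omitted, and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def bigrams(sentence: str) -> Counter:
    # Adjacent word pairs; unlike plain BoW these keep some local word order
    tokens = sentence.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def log_count_ratio(pos_docs: list[str], neg_docs: list[str], alpha: float = 1.0) -> dict:
    """Per-bigram r = log((p / |p|_1) / (q / |q|_1)) with add-alpha smoothing."""
    pos, neg = Counter(), Counter()
    for doc in pos_docs:
        pos.update(bigrams(doc))
    for doc in neg_docs:
        neg.update(bigrams(doc))
    features = sorted(set(pos) | set(neg))
    p = {f: alpha + pos[f] for f in features}
    q = {f: alpha + neg[f] for f in features}
    p_norm, q_norm = sum(p.values()), sum(q.values())
    return {f: math.log((p[f] / p_norm) / (q[f] / q_norm)) for f in features}


# Hypothetical toy reviews
r = log_count_ratio(["a great movie", "really great fun"], ["not great at all", "not good"])
```

Bigrams that occur mostly in negative reviews, such as ("not", "good") here, get a negative ratio, and positive-leaning ones such as ("great", "movie") get a positive ratio.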

