NTC_Tensor flow Deep Learning Quick-Start Course_Part4 - Natural Language
Uploaded by notch-training-center
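TensorFlow Deep Learning Quick-Start Course
4. Natural Language Processing Applications
By Mark Chang
• Introduction to Natural Language Processing
• The Word2vec Neural Network
• Semantic Operations in Practice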
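Introduction to Natural Language Processing
Natural Language Processing
• Natural language processing is a branch of artificial intelligence and linguistics
– It studies how to process and make use of natural language.
• Natural language understanding systems
– Convert natural language into a form that computers can easily process.
• Natural language generation systems
– Convert data from computer programs into natural language.
• https://zh.wikipedia.org/wiki/%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86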
Semantic understanding
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Machine translation
http://arxiv.org/abs/1409.0473
Poetry generation
http://emnlp2014.org/papers/pdf/EMNLP2014074.pdf
Image caption generation
http://arxiv.org/pdf/1411.4555v2.pdf
Visual question answering
http://arxiv.org/pdf/1505.00468v6.pdf
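The Word2vec Neural Network
The Meaning of Words
• The meaning of a word can be inferred from its context.
dog and cat appear in similar contexts, so their meanings are similar:
The dog run. A cat run. A dog sleep. The cat sleep. A dog bark. The cat meows.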
Semantic Vectors
Counting the words that appear in the same sentence as dog and as cat:
       the   a   run   sleep   bark   meow
dog     1    2    1      1       1      0
cat     2    1    1      1       0      1
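The counts in the table can be reproduced with a short Python sketch that, for a target word, counts the other words in every sentence containing it. Treating the whole sentence as the context window and folding "meows" into the column "meow" are assumptions made to match the table's columns; the helper name context_counts is ours.

```python
from collections import Counter

# The six toy sentences from the slide (wording kept as on the slide).
sentences = ["The dog run.", "A cat run.", "A dog sleep.",
             "The cat sleep.", "A dog bark.", "The cat meows."]
# Context columns of the table.
columns = ["the", "a", "run", "sleep", "bark", "meow"]

def context_counts(target):
    """Count words that share a sentence with `target`, in column order."""
    counts = Counter()
    for s in sentences:
        words = [w.lower().strip(".") for w in s.split()]
        # Assumed normalization so that "meows" lands in the "meow" column.
        words = ["meow" if w == "meows" else w for w in words]
        if target in words:
            counts.update(w for w in words if w != target)
    return [counts[c] for c in columns]

print("dog", context_counts("dog"))  # dog [1, 2, 1, 1, 1, 0]
print("cat", context_counts("cat"))  # cat [2, 1, 1, 1, 0, 1]
```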
Semantic Vectors
dog (1, 2, ..., xn)
cat (2, 1, ..., xn)
car (0, 0, ..., xn)
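Semantic Vector Similarity
• The cosine similarity of A and B is: $\frac{A \cdot B}{|A||B|}$
dog (a1, a2, ..., an)
cat (b1, b2, ..., bn)
The cosine similarity of dog and cat is:
$$\frac{a_1 b_1 + a_2 b_2 + \cdots + a_n b_n}{\sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}\,\sqrt{b_1^2 + b_2^2 + \cdots + b_n^2}}$$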
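A direct transcription of the cosine formula as a minimal Python sketch. The example uses only the first two components (the, a) of the dog and cat vectors from the slide; the function name cosine_similarity is ours.

```python
import math

def cosine_similarity(a, b):
    """(a . b) / (|a| |b|), written out term by term."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

dog = [1, 2]   # (the, a) components of dog
cat = [2, 1]   # (the, a) components of cat
print(cosine_similarity(dog, cat))  # approximately 0.8
```

Because the result depends only on the angle between the vectors, words that occur in proportionally similar contexts score close to 1 even if one of them is much more frequent.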
Semantic Vector Arithmetic
Woman + King - Man = Queen
[Figure: the points Man, King, Woman and Queen, with the offset King - Man drawn twice]
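Vector arithmetic such as Woman + King - Man = Queen is commonly evaluated by computing the target vector and returning the nearest remaining word by cosine similarity. The sketch below does that on hand-made, hypothetical 2-D vectors (real word2vec vectors are learned and have many more dimensions); the helper name analogy is illustrative.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical 2-D vectors, chosen by hand so that king - man == queen - woman.
vectors = {
    "man":   [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king":  [3.0, 0.2],
    "queen": [3.0, 1.2],
    "apple": [0.2, 3.0],
}

def analogy(a, b, c):
    """Word closest to vec(b) - vec(a) + vec(c), excluding a, b and c themselves."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("man", "king", "woman"))  # queen
```

Excluding the three query words matters in practice: without it, the nearest vector to King - Man + Woman is often King itself.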
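Semantic Vectors Have Too Many Dimensions
dog (x1=the, x2=a, ..., xn)
The dimension of a semantic vector equals the total vocabulary size.
[Figure: the vector for dog drawn as components x1, x2, x3, x4, ..., xn]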
The Word2vec Neural Network
[Figure: dog → one-hot encoding (1, 0, 0, 0) → word2vec neural network → compressed semantic vector (1.2, 0.7, 0.5)]
One-Hot Encoding
dog (1, 0, 0, 0)
cat (0, 1, 0, 0)
run (0, 0, 1, 0)
fly (0, 0, 0, 1)
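One-hot encoding in a few lines of Python, using the four-word vocabulary from the slide (the function name one_hot is ours):

```python
vocabulary = ["dog", "cat", "run", "fly"]

def one_hot(word):
    """Vector with a 1 at the word's index in the vocabulary and 0 elsewhere."""
    vec = [0] * len(vocabulary)
    vec[vocabulary.index(word)] = 1
    return vec

for w in vocabulary:
    print(w, one_hot(w))
# dog [1, 0, 0, 0]
# cat [0, 1, 0, 0]
# run [0, 0, 1, 0]
# fly [0, 0, 0, 1]
```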
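Initialize Weights
The rows of both matrices correspond to the words dog, cat, run, fly.
$$W = \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{bmatrix} \qquad V = \begin{bmatrix} v_{11} & v_{12} & v_{13} \\ v_{21} & v_{22} & v_{23} \\ v_{31} & v_{32} & v_{33} \\ v_{41} & v_{42} & v_{43} \end{bmatrix}$$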
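A sketch of the initialization for this 4-word, 3-dimension example in plain Python. Assuming V plays the role of `embeddings` and W the role of `nce_weights` in the TensorFlow code later in this section, the distributions mirror that code: uniform in [-1, 1] for V, truncated normal with standard deviation 1/sqrt(embedding_size) for W. The seed and helper names are our own choices.

```python
import math
import random

vocabulary = ["dog", "cat", "run", "fly"]
embedding_size = 3         # three columns, as in the 4 x 3 matrices above
rng = random.Random(0)     # fixed seed so the sketch is reproducible

# V (assumed counterpart of `embeddings`): uniform values in [-1, 1].
V = [[rng.uniform(-1.0, 1.0) for _ in range(embedding_size)] for _ in vocabulary]

def truncated_normal(stddev):
    """Draw from N(0, stddev^2), redrawing any value beyond 2 standard deviations."""
    while True:
        x = rng.gauss(0.0, stddev)
        if abs(x) <= 2 * stddev:
            return x

# W (assumed counterpart of `nce_weights`): truncated normal, stddev 1/sqrt(d).
stddev = 1.0 / math.sqrt(embedding_size)
W = [[truncated_normal(stddev) for _ in range(embedding_size)] for _ in vocabulary]

for word, v_row, w_row in zip(vocabulary, V, W):
    print(word, [round(x, 3) for x in v_row], [round(x, 3) for x in w_row])
```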
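Compressing the Semantic Vector
[Figure: the high-dimensional one-hot vector for dog, (1, 0, 0, 0), is mapped to the low-dimensional vector (v11, v12, v13), the row of V for dog]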
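Multiplying a one-hot row vector by V simply picks out one row of V, which is why this compression step can be implemented as a table lookup; tf.nn.embedding_lookup in the TensorFlow code later on selects rows by index in the same way. A plain-Python check with a hypothetical V:

```python
def one_hot_times_matrix(one_hot_vec, matrix):
    """Ordinary (1 x n) @ (n x d) matrix product, written out explicitly."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [sum(one_hot_vec[i] * matrix[i][j] for i in range(n_rows))
            for j in range(n_cols)]

# Hypothetical 4 x 3 V; rows are dog, cat, run, fly.
V = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9],
     [1.0, 1.1, 1.2]]
dog = [1, 0, 0, 0]

print(one_hot_times_matrix(dog, V))  # [0.1, 0.2, 0.3]
print(V[0])                          # the same row, obtained by indexing
```

Indexing costs O(d) instead of O(n·d) for the full product, which matters when n is a 50,000-word vocabulary.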
Compressed Vectors
[Figure: rows of V for dog (v11, v12, v13) and cat (v21, v22, v23); rows of W for run (w31, w32, w33) and fly (w41, w42, w43)]
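Context Word: dog and run
[Figure: the one-hot vector for dog, (1, 0, 0, 0), selects V1 = (v11, v12, v13) from V; the one-hot vector for run, (0, 0, 1, 0), selects W3 = (w31, w32, w33) from W]
$\frac{1}{1 + e^{-V_1 \cdot W_3}} \approx 1$
$V_1 \cdot W_3 = v_{11}w_{31} + v_{12}w_{32} + v_{13}w_{33}$
Because run appears in the context of dog, the output for this pair should be close to 1.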
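The output for a (word, context word) pair is the logistic sigmoid of the dot product of the two rows. A small Python sketch with hypothetical, untrained values for V1 (dog) and W3 (run):

```python
import math

def sigmoid(x):
    """1 / (1 + e^(-x))"""
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    """V1 . W3 = v11*w31 + v12*w32 + v13*w33 for 3-dimensional rows."""
    return sum(x * y for x, y in zip(a, b))

V1 = [0.5, -0.2, 0.8]   # hypothetical row of V for "dog"
W3 = [0.6, -0.1, 0.9]   # hypothetical row of W for "run"

score = dot(V1, W3)     # 0.30 + 0.02 + 0.72, about 1.04
p = sigmoid(score)      # about 0.74; training pushes this toward 1
print(score, p)
```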
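Context Word: cat and run
[Figure: the one-hot vector for cat, (0, 1, 0, 0), selects V2 = (v21, v22, v23) from V; the one-hot vector for run, (0, 0, 1, 0), selects W3 = (w31, w32, w33) from W]
$V_2 \cdot W_3 = v_{21}w_{31} + v_{22}w_{32} + v_{23}w_{33}$
$\frac{1}{1 + e^{-V_2 \cdot W_3}} \approx 1$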
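Non-context Word: dog and fly
[Figure: the one-hot vector for dog, (1, 0, 0, 0), selects V1 = (v11, v12, v13) from V; the one-hot vector for fly, (0, 0, 0, 1), selects W4 = (w41, w42, w43) from W]
$V_1 \cdot W_4 = v_{11}w_{41} + v_{12}w_{42} + v_{13}w_{43}$
$\frac{1}{1 + e^{-V_1 \cdot W_4}} \approx 0$
Because fly does not appear in the context of dog, the output for this pair should be close to 0.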
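Non-context Word: cat and fly
[Figure: the one-hot vector for cat, (0, 1, 0, 0), selects V2 = (v21, v22, v23) from V; the one-hot vector for fly, (0, 0, 0, 1), selects W4 = (w41, w42, w43) from W]
$V_2 \cdot W_4 = v_{21}w_{41} + v_{22}w_{42} + v_{23}w_{43}$
$\frac{1}{1 + e^{-V_2 \cdot W_4}} \approx 0$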
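Training adjusts the rows so that a context pair's output rises toward 1 and a non-context pair's output falls toward 0. Below is a minimal gradient-ascent sketch on the per-pair log-likelihood, log σ(V·W) for a context word and log(1 - σ(V·W)) for a non-context word; in both cases the derivative with respect to the score V·W is (label - σ(V·W)). The vectors and learning rate are hypothetical, and this is plain SGD written out by hand, not TensorFlow's optimizer.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sgd_step(v, w, label, lr=0.5):
    """One gradient-ascent step on the log-likelihood of a single pair.

    label=1: maximize log sigma(v.w)      (context word)
    label=0: maximize log(1 - sigma(v.w)) (non-context word)
    Either way d/d(v.w) = label - sigma(v.w). Returns updated copies of v and w.
    """
    g = label - sigmoid(dot(v, w))
    new_v = [vi + lr * g * wi for vi, wi in zip(v, w)]
    new_w = [wi + lr * g * vi for vi, wi in zip(v, w)]
    return new_v, new_w

V1 = [0.1, 0.2, 0.3]   # hypothetical row for dog
W3 = [0.2, 0.1, 0.0]   # hypothetical row for run (context word, label 1)
W4 = [0.3, 0.2, 0.1]   # hypothetical row for fly (non-context word, label 0)

before_pos = sigmoid(dot(V1, W3))
v_a, w3_new = sgd_step(V1, W3, label=1)
after_pos = sigmoid(dot(v_a, w3_new))

before_neg = sigmoid(dot(V1, W4))
v_b, w4_new = sgd_step(V1, W4, label=0)
after_neg = sigmoid(dot(v_b, w4_new))

print(before_pos, "->", after_pos)  # increases toward 1
print(before_neg, "->", after_neg)  # decreases toward 0
```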
Result
[Figure: the resulting rows of V for dog (v11, v12, v13) and cat (v21, v22, v23), and of W for run (w31, w32, w33) and fly (w41, w42, w43)]
Semantic Operations in Practice
Notebook: https://github.com/ckmarkoh/ntc_deeplearning_tensorflow/blob/master/sec4/semantics.ipynb
Training data: anarchism originated as a term of abuse first used against early working class radicals including the diggers of the english revolution and the sans culottes of the french revolution whilst the term is still used in a pejorative way to describe any act that used violent means to destroy the organization of society it has also been taken up as a positive label by self defined anarchists the word anarchism is derived from the greek without archons ruler chief king anarchism as a political philosophy is the belief that rulers are unnecessary and should be abolished although there are differing interpretations of what this means anarchism also refers to related social movements that advocate the elimination of authoritarian institutions particularly the state the word anarchy as most anarchists use it does not imply chaos nihilism or anomie but rather a harmonious anti authoritarian society in place of what
Preprocessing: anarchism originated as a term of abuse first used against early working class radicals including the diggers of the english revolution and the sans culottes of the french revolution whilst the term is still used in a pejorative way to describe any act that used violent means to destroy the organization of society it has also been taken up …
['anarchism', 'originated', 'as', 'a', 'term', 'of', 'abuse', 'first', 'used', 'against', 'early', 'working', 'class', 'radicals', 'including', 'the', 'diggers', 'of', 'the', 'english', 'revolution', 'and', 'the', 'sans', 'culottes', 'of', 'the', 'french', 'revolution', 'whilst', 'the', 'term', 'is', 'still', 'used', 'in', 'a', 'pejorative', 'way', 'to', 'describe', 'any', 'act', 'that', 'used', 'violent', 'means', 'to', 'destroy', 'the', ...]
Preprocessing
Build a dictionary according to word frequency:
{"UNK": 0, "the": 1, "of": 2, "and": 3, "one": 4, "in": 5, "a": 6, "to": 7, "zero": 8, "nine": 9, .... }
# vocabulary size
vocabulary_size = 50000
'the', 'english', 'revolution', 'and', 'the', 'sans', 'culottes', 'of', 'the', 'french', 'revolution', …
Replace words that are not in the dictionary with UNK:
'the', 'english', 'revolution', 'and', 'the', 'sans', UNK, 'of', 'the', 'french', 'revolution', …
Convert each word into its code in the dictionary:
1, 103, 855, 3, 1, 15068, 0, 2, 1, 151, 855, …
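The three preprocessing steps (rank words by frequency, map words outside the dictionary to UNK, replace words by their codes) fit in one function. This is a sketch in the spirit of the notebook's preprocessing, not its actual code; the name build_vocab and the tiny example corpus are ours.

```python
from collections import Counter

def build_vocab(words, vocabulary_size):
    """Return (ids, dictionary, reverse_dictionary); code 0 is reserved for UNK."""
    dictionary = {"UNK": 0}
    # Keep the vocabulary_size - 1 most frequent words (slot 0 is UNK).
    for word, _count in Counter(words).most_common(vocabulary_size - 1):
        if word not in dictionary:
            dictionary[word] = len(dictionary)
    # Any word outside the dictionary is replaced by UNK's code, 0.
    ids = [dictionary.get(word, 0) for word in words]
    reverse_dictionary = {i: w for w, i in dictionary.items()}
    return ids, dictionary, reverse_dictionary

words = "the cat the dog the cat a".split()
ids, dictionary, reverse_dictionary = build_vocab(words, vocabulary_size=3)
print(dictionary)  # {'UNK': 0, 'the': 1, 'cat': 2}
print(ids)         # [1, 2, 1, 0, 1, 2, 0]
```

With vocabulary_size = 50000 on the training text, the same procedure keeps the 49,999 most frequent words, which is how a rare word such as culottes ends up as UNK.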
Preprocessing
5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156, 128, 742, 477, 10572, 134, 1, 27549, 2, 1, 103, 855, 3, 1, 15068, 0, 2, 1, 151, 855, …
Each word is used as a word2vec input, and each of its neighbors as an output:
input   output
3084    5239
3084    12
12      3084
12      6
6       12
6       195
195     6
195     2
Preprocessing: Generating Batches
generate_batch(batch_size=8, num_skips=2, skip_window=1)
input    3084  3084  12    12  6   6    195  195
output   5239  12    3084  6   12  195  6    2
• batch_size = 8: the number of (input, output) pairs in one batch.
• num_skips = 2: the number of pairs generated from each input word.
• skip_window = 1: how many words to the left and to the right count as context.
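A deterministic sketch of generate_batch that reproduces the batch shown above. TensorFlow's word2vec tutorial has a function of the same name that picks context words at random and keeps a global cursor across calls; this simplified version instead walks the data in order and takes the first num_skips context words (left before right) of each center word. The start parameter is our own addition.

```python
def generate_batch(data, batch_size, num_skips, skip_window, start=0):
    """Return (inputs, labels) skip-gram pairs; labels have shape [batch_size, 1]."""
    assert batch_size % num_skips == 0
    assert num_skips <= 2 * skip_window
    inputs, labels = [], []
    center = start + skip_window
    while len(inputs) < batch_size:
        if center + skip_window >= len(data):
            raise ValueError("not enough data for this batch")
        # Context positions: skip_window words on the left, then on the right.
        context = [center + off
                   for off in range(-skip_window, skip_window + 1) if off != 0]
        for pos in context[:num_skips]:
            inputs.append(data[center])
            labels.append([data[pos]])
        center += 1
    return inputs, labels

data = [5239, 3084, 12, 6, 195, 2, 3137, 46, 59, 156]
inputs, labels = generate_batch(data, batch_size=8, num_skips=2, skip_window=1)
print(inputs)                  # [3084, 3084, 12, 12, 6, 6, 195, 195]
print([l[0] for l in labels])  # [5239, 12, 3084, 6, 12, 195, 6, 2]
```

Labels are wrapped as one-element lists so their shape matches train_labels, [batch_size, 1].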
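Computational Graph
import math
import tensorflow as tf  # TensorFlow 1.x graph API

# Assumes batch_size, embedding_size, num_sampled and vocabulary_size (50000) are already defined.
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
with tf.device('/cpu:0'):
    # V: one embedding row per word in the vocabulary
    embeddings = tf.Variable(
        tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embeddings, train_inputs)
    # W and biases used by the NCE loss
    nce_weights = tf.Variable(
        tf.truncated_normal([vocabulary_size, embedding_size],
                            stddev=1.0 / math.sqrt(embedding_size)))
    nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
# Keyword arguments: TensorFlow 1.0 swapped the positional order of inputs and labels.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                   inputs=embed, labels=train_labels,
                   num_sampled=num_sampled, num_classes=vocabulary_size))
optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)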
Device: with tf.device('/cpu:0')
Runs the part of the computational graph defined inside this block on the CPU.
Because TensorFlow did not support running embedding_lookup on the GPU, it has to be placed on the CPU.
Inputs & Outputs
[Figure: train_inputs → word2vec → train_labels]
train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
train_inputs   3084  3084  12    12  6   6    195  195
train_labels   5239  12    3084  6   12  195  6    2
Embedding Lookup
embeddings = tf.Variable(tf.random_uniform([vocabulary_size,
                                            embedding_size], -1.0, 1.0))
embed = tf.nn.embedding_lookup(embeddings, train_inputs)
[Figure: train_inputs (e.g. 2) → embedding_lookup → the matching row of embeddings]
NCE Weights
• NCE: Noise Contrastive Estimation
nce_weights = tf.Variable(
    tf.truncated_normal([vocabulary_size, embedding_size],
                        stddev=1.0 / math.sqrt(embedding_size)))
nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
[Figure: nce_weights and nce_biases]
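NCE Loss
# Keyword arguments: TensorFlow 1.0 swapped the positional order of inputs and labels.
loss = tf.reduce_mean(
    tf.nn.nce_loss(weights=nce_weights, biases=nce_biases,
                   inputs=embed, labels=train_labels,
                   num_sampled=num_sampled, num_classes=vocabulary_size))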
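Positive: cat and run
[Figure: cat's row V2 = (v21, v22, v23) and run's row W3 = (w31, w32, w33)]
$\frac{1}{1 + e^{-V_2 \cdot W_3}} \approx 1$
Negative: cat and fly
[Figure: cat's row V2 = (v21, v22, v23) and fly's row W4 = (w41, w42, w43)]
$\frac{1}{1 + e^{-V_2 \cdot W_4}} \approx 0$
$$\text{cost} = \log\left(\frac{1}{1 + e^{-v_I^{T} w_{pos}}}\right) + \sum_{neg} \log\left(1 - \frac{1}{1 + e^{-v_I^{T} w_{neg}}}\right)$$
Here v_I is the vector of the input word, w_pos is the vector of its context (positive) word, and the sum runs over the vectors w_neg of sampled non-context (negative) words. Training maximizes this quantity, i.e. minimizes its negative.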
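The cost formula can be evaluated directly. This Python sketch computes it for one input word, one positive word and a list of negative words, using hypothetical 3-D vectors. It implements the formula on the slide; tf.nn.nce_loss additionally adds the biases and, by default, corrects the logits for the sampling distribution, so its numbers will differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cost(v_in, w_pos, w_negs):
    """log sigma(v_I . w_pos) + sum over negatives of log(1 - sigma(v_I . w_neg))."""
    value = math.log(sigmoid(dot(v_in, w_pos)))
    for w_neg in w_negs:
        value += math.log(1.0 - sigmoid(dot(v_in, w_neg)))
    return value

v_cat = [0.5, 0.5, 0.0]     # hypothetical input vector (cat)
w_run = [1.0, 1.0, 0.0]     # hypothetical positive (context) word
w_fly = [-1.0, -1.0, 0.0]   # hypothetical negative (sampled) word

good = cost(v_cat, w_run, [w_fly])   # positive scores high, negative scores low
bad = cost(v_cat, w_fly, [w_run])    # roles swapped
print(good, bad)                     # good is larger (closer to 0)
print("loss to minimize:", -good)
```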
Train
feed_dict = {train_inputs: batch_inputs,
             train_labels: batch_labels}
_, loss_val = session.run([optimizer, loss], feed_dict=feed_dict)
batch_inputs   3084  3084  12    12  6   6    195  195
batch_labels   5239  12    3084  6   12  195  6    2
[Figure: session.run returns loss_val]
Result: final_embeddings
array([[-0.02782757, -0.16879494, -0.06111901, ..., -0.25700757, -0.07137159,  0.0191142 ],
       [-0.00155336, -0.00928817, -0.0535327 , ..., -0.23261793, -0.13980433,  0.18055709],
       [ 0.02576068, -0.06805354, -0.03688766, ..., -0.15378961,  0.00459271,  0.0717089 ],
       ...,
       [ 0.01061165, -0.09820389, -0.09913248, ...,  0.00818674, -0.12992384,  0.05826835],
       [ 0.0849214 , -0.14137401,  0.09674817, ...,  0.04111136, -0.05420518, -0.01920278],
       [ 0.08318492, -0.08202577,  0.11284919, ...,  0.03887166,  0.01556483,  0.12496017]], dtype=float32)
Visualization
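Most Similar Words
import numpy as np

def get_most_similar(word, top=10):
    wid = dictionary.get(word, -1)
    if wid < 0:
        raise KeyError(word)  # the word is not in the dictionary
    # Dot product of this word's vector with every word's vector
    result = np.dot(final_embeddings[wid:wid+1, :], final_embeddings.T)
    # Word ids ordered from most to least similar
    result = result[0].argsort().tolist()
    result.reverse()
    for idx in result[:top]:
        print(reverse_dictionary[idx])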
get_most_similar("one")
one six two four seven three ...
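The slide's get_most_similar ranks words by a raw dot product, which equals cosine similarity only if the rows of final_embeddings have unit length (TensorFlow's word2vec tutorial normalizes the embeddings before evaluating them; the notebook's exact step is not shown here). The plain-Python sketch below normalizes explicitly and skips the query word itself; the toy embeddings and the function name most_similar are hypothetical.

```python
import math

def normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def most_similar(word, embeddings, dictionary, reverse_dictionary, top=3, skip_self=True):
    """Rank words by cosine similarity to `word` (dot product of unit vectors)."""
    if word not in dictionary:
        raise KeyError(word)
    unit = [normalize(row) for row in embeddings]
    query = unit[dictionary[word]]
    scores = [(sum(q * u for q, u in zip(query, row)), i) for i, row in enumerate(unit)]
    scores.sort(reverse=True)
    ranked = [reverse_dictionary[i] for _, i in scores]
    if skip_self:
        ranked = [w for w in ranked if w != word]
    return ranked[:top]

# Hypothetical 2-D embeddings; "two" is much longer than "one" but points the same way.
dictionary = {"one": 0, "two": 1, "dog": 2, "six": 3}
reverse_dictionary = {i: w for w, i in dictionary.items()}
embeddings = [[1.0, 0.1], [5.0, 0.5], [0.1, 1.0], [0.9, 0.3]]
print(most_similar("one", embeddings, dictionary, reverse_dictionary))  # ['two', 'six', 'dog']
```

Normalizing first keeps long vectors (often those of frequent words) from dominating the ranking purely because of their length.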
Instructor Information
Mark Chang
• Email: ckmarkoh at gmail dot com
• Blog: http://cpmarkchang.logdown.com
• Github: https://github.com/ckmarkoh
• Facebook: https://www.facebook.com/ckmarkoh.chang
• Slideshare: http://www.slideshare.net/ckmarkohchang
• Linkedin: https://www.linkedin.com/pub/mark-chang/85/25b/847