Statistical Semantics入門 ~分布仮説からword2vecまで~ (An Introduction to Statistical Semantics: from the distributional hypothesis to word2vec)
Uploaded by yuya-unno. Category: Technology.
Transcript of the slides.
Statistical Semantics入門 ~分布仮説からword2vecまで~
Yuya Unno, Preferred Infrastructure (@unnonouno)
2014/02/06 PFI Seminar
Self-introduction
! Yuya Unno (@unnonouno)
! Specialty: natural language processing, text mining
! Career: university (information science) → IBM Research → PFI

Today's agenda
Semantics

"Semantics" in 入門 自然言語処理 [Bird+10]
Chapter 10: Analyzing the Meaning of Sentences — 10.1 Natural Language Understanding, 10.2 Propositional Logic, 10.3 First-Order Logic, 10.4 The Semantics of English Sentences, 10.5 Discourse Semantics, 10.6 Summary, 10.7 Further Reading, 10.8 Exercises

"Semantics" in 自然言語処理 (Iwanami Software Science series) [Nagao+96]
Chapter 5: Semantic Analysis — 5.1 What semantic analysis is, 5.2–5.3 approaches to semantic disambiguation, 5.4 other topics
Even the single word "semantics" covers many different fields
! Major branches of semantics (per Wikipedia)
! generative semantics, formal semantics, cognitive semantics, and so on
I won't be covering most of that today.

Caveats for today
! I will survey topics related to Statistical Semantics broadly but shallowly
! Some of the methods are not usually presented under the heading of Statistical Semantics
! I won't discuss "semantics" in the general linguistic sense
What's the difference between Statistical Semantics and Distributional Semantics?
! Sorry — I couldn't figure it out
! I treat them as the same thing; if you know the difference, please tell me
Reference material [Evert10]
! I draw mainly on Stefan Evert's NAACL 2010 tutorial, "Distributional Semantic Models"
Quiz: what is the mystery word "???"? [Evert10]
! (A series of example sentences is shown with one word masked out; the surrounding contexts mention, among other things, a cat and a knife.)
Answer: dog
The distributional hypothesis (分布仮説)
! Words that occur in the same contexts tend to have the same meaning
! Every method today assumes (or at least relates to) this hypothesis to some degree

"The Distributional Hypothesis is that words that occur in the same contexts tend to have similar meanings (Harris, 1954)." (ACL wiki)
What is Statistical Semantics (統計的意味論)?
! Deriving meaning from the statistical properties of word usage
! Almost every method assumes some such statistical regularity

"Statistical Semantics is the study of how the statistical patterns of human word usage can be used to figure out what people mean, at least to a level sufficient for information access." (ACL wiki)
Word meaning and the context of occurrence

! (Two example sentences are shown in which the same word fills different slots, and what can be inferred about it changes accordingly.)

How do children learn word meanings? [Imai13]
! Examples suggest children infer the meaning of a new word by contrasting it with the words they already know and the context in which it appears
So meaning comes from a word's relations to other words…?

What this should make possible, in principle
! Synonyms, antonyms, near-synonyms: handle meaning relations between words
! Polysemy: tell which sense a word is used in, depending on context
! Unknown and novel words: infer a new word's meaning from its surrounding context
! Translation: reveal which words correspond across different languages
Why talk about this?
! These methods need no expensive labeled training data — raw text is enough
! They might solve problems PFI's products actually face
! Search: synonym expansion of queries, word sense disambiguation
! Document classification: shortage of labeled training data; multilingual search: need for bilingual dictionaries
! And remarkable results have been announced over roughly the past year
Computing "meaning" roughly involves three kinds of processing
! Linguistic preprocessing — ex: tokenization, handling of function words, etc.
! Representing a word by its "context" — ex: document co-occurrence counts, sentence co-occurrence counts, window co-occurrence, dependency relations, predicate–argument relations, etc.
! Re-representing those word representations more compactly — ex: matrix-based, probability-based, NN-based; dimensionality reduction, probabilistic models, neural models, and so on
Latent semantic analysis: Latent Semantic Indexing (LSI) / Latent Semantic Analysis (LSA) [Deerwester+90]
! Born in the information-retrieval community
! (Practically every reference on this topic covers it — IR and NLP are that closely intertwined)
! Represent a word by the set of documents in which it occurs
! That word vector is surely redundant, so its dimensionality can be compressed
Computing LSI
! Build the matrix X whose entries count how many times each word appears in each document
! Apply singular value decomposition, X ≈ U_k Σ_k V_k^T, keeping only the top k singular values (k smaller than the original dimension)
! Word i is then represented by a k-dimensional vector
The key idea of LSI
! A word's "meaning" is determined by its context vector
! Here "context" means the documents the word appears in; in practice, how you build the matrix is where the craft lies (more below)
! The context vectors are redundant, so with SVD they should be expressible as much lower-dimensional vectors
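The whole LSI pipeline fits in a few lines of numpy. A minimal sketch on an invented toy word–document count matrix (the words and counts are made up for illustration):

```python
import numpy as np

# Toy word-document count matrix: rows = words, columns = documents.
# (Hypothetical counts, just to show the shape of the computation.)
X = np.array([
    [2, 1, 0, 0],   # "dog"
    [1, 2, 0, 0],   # "cat"
    [0, 0, 3, 1],   # "stock"
    [0, 0, 1, 2],   # "market"
], dtype=float)

# Singular value decomposition: X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the top-k singular values: the rank-k approximation.
k = 2
word_vecs = U[:, :k] * s[:k]   # each word as a k-dimensional vector

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Words that co-occur in the same documents end up close together.
print(cosine(word_vecs[0], word_vecs[1]))  # dog vs cat: high
print(cosine(word_vecs[0], word_vecs[2]))  # dog vs stock: near zero
```

With real corpora X is huge and sparse, so in practice one would use a truncated sparse SVD rather than the dense routine above.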
Many ways to build the context vectors
! word–document matrix, word–word matrix, etc.
! sentence co-occurrence counts, window sizes, weighting schemes, etc.
! dependency relations, predicate–argument relations, relations within the syntactic structure, etc.
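One of the simplest choices above — co-occurrence within a fixed token window — can be sketched directly (the corpus and window size here are invented):

```python
from collections import defaultdict

def cooccurrence(tokens, window=2):
    """Count word pairs that co-occur within `window` tokens of each other."""
    counts = defaultdict(int)
    for i, w in enumerate(tokens):
        # Only look backwards; each unordered pair is seen exactly once,
        # and we record it symmetrically.
        for j in range(max(0, i - window), i):
            counts[(w, tokens[j])] += 1
            counts[(tokens[j], w)] += 1
    return counts

corpus = "the dog barks the cat meows the dog runs".split()
counts = cooccurrence(corpus, window=2)
print(counts[("the", "dog")])  # "the" and "dog" fall in the same window 3 times
```

Each row of `counts`, viewed per word, is exactly the kind of context vector the matrix methods then compress.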
Example: how the choice of context definition changes a word's nearest neighbors (figure omitted)
Three main lines of evolution
! Matrix factorization family: LSI → NMF → NTF
! Language model family: LSI → PLSI → LDA
! Neural network family: NNLM → RNNLM → Skip-gram
The language model family
! Probabilistic language models in the lineage of LSI (PLSI etc.) and their descendants
! Evolved mainly into topic models, which model the topics of a document and the generation of words from those topics
Good ! Estimation techniques are well established, so the models are comparatively easy to fit
! Easy to combine with other probabilistic models
Bad ! Computation is heavy
What is a probabilistic language model in the first place?
! Assume that text is sampled from some probability distribution over words
! From the large amount of observed text data, estimate that underlying distribution
! The craft lies in which distribution you assume and how you statistically estimate it
Probabilistic Latent Semantic Indexing (PLSI) [Hofmann99]
! A model that yields an LSI-like semantic interpretation
! Different topics favor different words — ex: sports topics favor words like "soccer"; economy topics favor words like "stock price"
! Intuition: just as LSI drops words into a low-dimensional space, treat both words and documents as mixtures of topics, recast as a probabilistic model
PLSI, a little more precisely
! Each document has its own distribution over topics — ex: a mixture like "mostly soccer, some economy"
! For each word position: draw one topic according to the document's topic distribution, then draw a word from that topic
! Repeat, and regard the document as generated this way
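The generative story above can be simulated directly. A toy sketch — the topics, vocabularies, and mixture weights below are all invented for illustration:

```python
import random

random.seed(0)

# Hypothetical topics: each topic is a distribution over words.
topics = {
    "sports":  {"soccer": 0.5, "player": 0.3, "goal": 0.2},
    "economy": {"stock": 0.5, "market": 0.3, "price": 0.2},
}

def generate_document(topic_mixture, length):
    """PLSI-style generative story: pick a topic, then pick a word, repeat."""
    doc = []
    for _ in range(length):
        topic = random.choices(list(topic_mixture),
                               weights=topic_mixture.values())[0]
        word_dist = topics[topic]
        doc.append(random.choices(list(word_dist),
                                  weights=word_dist.values())[0])
    return doc

# A document that is 80% sports, 20% economy.
doc = generate_document({"sports": 0.8, "economy": 0.2}, length=10)
print(doc)
```

Fitting PLSI is this process run in reverse: given only the documents, estimate the topic–word distributions and per-document mixtures that most plausibly generated them.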
Latent Dirichlet Allocation (LDA) [Blei+03]
! An evolution of PLSI
! In PLSI each document's topic distribution was a fixed parameter; in LDA it, too, is generated probabilistically
LDA keeps on evolving
! It opened up the whole field of topic modeling and is now counted among the most important NLP algorithms
! It has been applied beyond language, e.g. to music and other kinds of data
! If you're interested, there are many good tutorial materials to consult
What's nice about the language model family
! Being probabilistic models, they compose well with other probabilistic methods
! A probability value carries a natural interpretation by itself — ex: statistical information retrieval, statistical machine translation, etc.
! If the distribution is estimated well, real text should come out with high probability
! Scores are normalized to sum to 1.0, so the absolute value of a score carries meaning
The matrix factorization family
! Build a word-context matrix (or tensor) and approximate it at low rank
! The original LSI, with its SVD, belongs here too
Good ! Off-the-shelf matrix decomposition machinery applies directly
Bad ! Hard to attach interpretations to the factors
! Not usable as a probabilistic model, so it composes poorly with other applications
Non-negative Matrix Factorization (NMF) [Lee+99]
! The negative components produced by plain SVD are hard to interpret
! So decompose the matrix under the constraint that all values stay non-negative
! The results come out sparse, and the meaning of each factor is often easier to read off
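Lee and Seung's multiplicative update rules fit in a few lines of numpy. A sketch on random toy data (dimensions and iteration count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Nonnegative data matrix (e.g. word-document counts), factorized as X ≈ W @ H.
X = rng.random((6, 8))
k = 3
W = rng.random((6, k))
H = rng.random((k, 8))

def frob_error(X, W, H):
    return np.linalg.norm(X - W @ H)

before = frob_error(X, W, H)
eps = 1e-9  # guards against division by zero
for _ in range(200):
    # Multiplicative updates keep W and H nonnegative by construction,
    # and never increase the Frobenius reconstruction error.
    H *= (W.T @ X) / (W.T @ W @ H + eps)
    W *= (X @ H.T) / (W @ H @ H.T + eps)
after = frob_error(X, W, H)

print(before, ">", after)
```

Because the updates only ever multiply nonnegative quantities, no explicit projection step is needed — that is the appeal over constrained gradient descent.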
The factors really do come out as meaningful parts (see the parts-of-faces figure in [Lee+99])

NMF = PLSI [Ding+08]
! NMF and PLSI later turned out to optimize essentially the same objective
! It is surprisingly common for independently proposed methods to turn out, after the fact, to mean the same thing
Non-negative Tensor Factorization (NTF) [Cruys10]
! A matrix can only hold pairwise relations
! So don't stop at 2 dimensions — make it 3 and factorize a tensor instead
�Ǔ�ĐƢ�Á=Ř�å�
! �Ǔm\rk\"0=CCħ�CQ.N ! ƋǗ¥ƇSVDT�a�f~"�N9 0=CCĐDz"¥Ƈ<;N
The neural network family
! Learn vector representations of words with a neural network
! The recent word2vec papers made this suddenly, wildly popular
Good ! For the first time, vector representations with meaning beyond mere similarity were obtained in practice (though it's still not well understood)
Bad ! Training is expensive, and implementations look heavyweight
! And it's not really clear what is going on inside
Neural Network Language Model (NNLM) [Bengio+03]
! An n-gram language model, neural-network style
! Builds a neural network for the probabilistic model that predicts the next word from the previous N−1 words
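The architecture is small enough to sketch as a single forward pass: embed the previous N−1 words, concatenate, apply a hidden layer, then a softmax over the vocabulary. All sizes and weights below are invented and untrained:

```python
import numpy as np

rng = np.random.default_rng(0)

V, d, h, n = 10, 4, 8, 3      # vocab size, embedding dim, hidden dim, n-gram order
C = rng.normal(size=(V, d))   # word embedding table (the projection layer)
W_h = rng.normal(size=(h, (n - 1) * d))
W_o = rng.normal(size=(V, h))

def nnlm_probs(context_ids):
    """P(next word | previous n-1 words): embed, concatenate, tanh, softmax."""
    x = np.concatenate([C[i] for i in context_ids])  # concatenated context embeddings
    hidden = np.tanh(W_h @ x)
    scores = W_o @ hidden
    e = np.exp(scores - scores.max())                # numerically stable softmax
    return e / e.sum()

p = nnlm_probs([1, 5])  # two previous word ids; untrained, so probs are arbitrary
print(p.sum())
```

Training adjusts C, W_h, and W_o to maximize the log-probability of each observed next word; the rows of C are the word vectors that fall out as a by-product.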
Recurrent Neural Network Language Model (RNNLM) [Mikolov+10]
! Encode the "state" after reading words up to position t−1 as a vector, and predict word t from that state (NNLM predicted it directly from the preceding word vectors)
! The feeling is that all the contextual information seen so far gets embedded in that state
! http://rnnlm.org
(Diagram: at each step the current word and the copied-over hidden layer feed the network, which predicts the next output; the hidden layer is carried forward to the next step.)
It emerged that the representations learned by RNNLM capture syntactic and semantic regularities [Mikolov+13a]
! Then somebody had the idea of adding and subtracting the vectors…
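The add/subtract trick can be shown with toy vectors built by hand so that the offsets line up exactly — real learned embeddings only approximate this:

```python
import numpy as np

# Hand-built toy embeddings: dimension 0 ~ "royalty", dimension 1 ~ "gender".
vecs = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),   # an unrelated distractor word
}

def nearest(target, exclude):
    """Most cosine-similar word to `target`, excluding the query words."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], target))

v = vecs["king"] - vecs["man"] + vecs["woman"]
print(nearest(v, exclude={"king", "man", "woman"}))  # -> queen
```

Excluding the three query words matters in practice too: with learned embeddings the nearest neighbor of the offset vector is usually one of the inputs themselves.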
The relation between RNNLM and transition-based parsers
! RNNLM's model looks similar to a transition-based parser (my personal impression)
! The stack seems to correspond to the recurrent vector
! Is that why syntactic structure gets captured? Perhaps information like "a word filling such-and-such a role has appeared" is being embedded
The Skip-gram model (word2vec) [Mikolov+13b]
! A model that predicts the surrounding words from a word
! A model that predicts a word from its surrounding words (CBOW) is proposed alongside it
! Overwhelmingly accurate at analogical reasoning
! The implementation is publicly available
! (So it wasn't really about parsers after all…)
The core of the Skip-gram model [Mikolov+13b]
! Input corpus: w1, w2, …, wT (each wi a word)
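From an input corpus w1…wT, skip-gram's training signal is just (center word, nearby word) pairs. A sketch of extracting them, with an invented corpus and window size c = 2:

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs: for each t, every w_{t+j} with 0 < |j| <= window."""
    for t, center in enumerate(tokens):
        for j in range(-window, window + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                yield center, tokens[t + j]

corpus = "the quick brown fox jumps".split()
pairs = list(skipgram_pairs(corpus, window=2))
print(pairs[:4])  # [('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ('quick', 'brown')]
```

Training then nudges each center word's vector toward predicting its context words, pair by pair.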
! Train the word vectors to maximize the average log-probability of the surrounding words:
! (1/T) Σ_t Σ_{−c ≤ j ≤ c, j ≠ 0} log p(w_{t+j} | w_t), where p(w_O | w_I) is a softmax over inner products of word vectors
Results on the same analogy experiments as before…
! The accuracy gain is almost unbelievable
The implementation tricks keep evolving too [Mikolov+13c]
! e.g. cleverly cutting corners where the normalizing sum over all words would be taken
Why was word2vec such a shock?
! Vector representations of words existed before
! They did support similarity, but operations like adding and subtracting never worked before
! The language model family basically models document-level co-occurrence, so it only captures meaning at the coarse granularity of a "topic"
! Recall that in the matrix factorization family, NMF was invented precisely because the meaning of negative scores was troublesome
! The rules of meaning were thought to be far more complex — impossible to represent without coordination with hand-built linguistic resources
Gradations of meaning get embedded in the vector space [Kim+13]
! Between "good" and "best" there appears, of all things, "better"
Word translation across languages becomes possible [Mikolov+13d]
! Representation vectors learned from corpora of two different languages look alike
! With a small seed dictionary of translation pairs, learn a linear mapping between the two vector spaces
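The linear-mapping step reduces to ordinary least squares. A toy sketch: the "Spanish" space below is fabricated as a rotation of the "English" one plus noise, standing in for two independently trained spaces that happen to share structure:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4
en = {w: rng.normal(size=d) for w in ["one", "two", "three", "four", "dog"]}

# Fabricate the "Spanish" space: a rotation of the English one plus small noise.
theta = 0.5
R = np.eye(d)
R[0, 0], R[0, 1] = np.cos(theta), -np.sin(theta)
R[1, 0], R[1, 1] = np.sin(theta), np.cos(theta)
es = {s: R @ en[e] + rng.normal(scale=0.001, size=d)
      for e, s in [("one", "uno"), ("two", "dos"), ("three", "tres"),
                   ("four", "cuatro"), ("dog", "perro")]}

# Learn the linear map from a small seed dictionary (everything except "dog").
seed = [("one", "uno"), ("two", "dos"), ("three", "tres"), ("four", "cuatro")]
X = np.stack([en[e] for e, _ in seed])
Y = np.stack([es[s] for _, s in seed])
W, *_ = np.linalg.lstsq(X, Y, rcond=None)   # solves X @ W ≈ Y

# "Translate" the held-out word: map en["dog"] and find the nearest Spanish vector.
v = en["dog"] @ W
nearest = min(es, key=lambda s: np.linalg.norm(es[s] - v))
print(nearest)
```

The held-out word is never in the seed dictionary, yet the learned map carries it to the right neighborhood — the same trick the paper uses to induce translations.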
What's exciting about the NN family
! It is becoming clear that fundamental properties of meaning are, for whatever reason, embedded in the vector space
! You can add, subtract, and map the vectors…
! Almost all of this was published in 2013
! Mikolov alone has put out some 15 papers in this short span
! The real understanding is probably still to come
Comparing the families
! What is taken as context?
! Language model family: document co-occurrence, preceding words
! Matrix factorization family: document co-occurrence
! NN family: preceding words, windows
! How is it abstracted?
! Language model family: collapse it onto latent variables
! Matrix factorization family: collapse it onto a low rank
! NN family: adjust it so that an objective function is optimized
The methods differ, but the underlying picture is the same

Where does it go from here…?
! The essential ingredient of the NN family gets isolated ! why didn't it work before, and what is the essentially important difference?
! Some new mathematical interpretation may emerge ! if it can be recast in matrix factorization terms, training time might drop dramatically and computation become far more tractable
! Opening up applications ! right now it's still at the "somehow amazing" stage ! it will find its way into real applications
Summary
! What is Statistical Semantics? ! The study of how statistical properties — mainly the contexts in which a word occurs — relate to that word's meaning
! Three broad families ! matrix factorization, language models, neural networks
! The NN family deserves close attention ! adding, subtracting, mapping…
! These neural language models have only just begun
References 1 (General)
! [Bird+10] Steven Bird, Ewan Klein, Edward Loper. 入門 自然言語処理 (Japanese edition of Natural Language Processing with Python). O'Reilly Japan, 2010.
! [Nagao+96] Makoto Nagao (ed.). 自然言語処理. Iwanami Shoten, 1996.
! [Evert10] Stefan Evert. Distributional Semantic Models. NAACL 2010 Tutorial.
! [Imai13] Mutsumi Imai. ことばの発達の謎を解く. Chikuma Shobo, 2013.
! [Deerwester+90] Scott Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, Richard Harshman. Indexing by Latent Semantic Analysis. JASIS, 1990.
References 2 (Language model family, matrix factorization family)
! [Hofmann99] Thomas Hofmann. Probabilistic Latent Semantic Indexing. SIGIR, 1999.
! [Blei+03] David M. Blei, Andrew Y. Ng, Michael I. Jordan. Latent Dirichlet Allocation. JMLR, 2003.
! [Lee+99] Daniel D. Lee, H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, vol. 401, 1999.
! [Ding+08] Chris Ding, Tao Li, Wei Peng. On the equivalence between Non-negative Matrix Factorization and Probabilistic Latent Semantic Indexing. Computational Statistics & Data Analysis, 52(8), 2008.
! [Cruys10] Tim Van de Cruys. A Non-negative Tensor Factorization Model for Selectional Preference Induction. Natural Language Engineering, 16(4), 2010.
References 3 (NN family 1)
! [Bengio+03] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin. A Neural Probabilistic Language Model. JMLR, 2003.
! [Mikolov+10] Tomas Mikolov, Martin Karafiat, Lukas Burget, Jan "Honza" Cernocky, Sanjeev Khudanpur. Recurrent neural network based language model. Interspeech, 2010.
! [Mikolov+13a] Tomas Mikolov, Wen-tau Yih, Geoffrey Zweig. Linguistic Regularities in Continuous Space Word Representations. HLT-NAACL, 2013.
! [Mikolov+13b] Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean. Efficient Estimation of Word Representations in Vector Space. CoRR, 2013.
References 4 (NN family 2)
! [Mikolov+13c] Tomas Mikolov, Ilya Sutskever, Kai Chen, Gregory S. Corrado, Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS, 2013.
! [Kim+13] Joo-Kyung Kim, Marie-Catherine de Marneffe. Deriving adjectival scales from continuous space word representations. EMNLP, 2013.
! [Mikolov+13d] Tomas Mikolov, Quoc V. Le, Ilya Sutskever. Exploiting Similarities among Languages for Machine Translation. CoRR, 2013.