Word representations: a simple and general method for semi-supervised learning · 2016-01-07 ·...
Transcript of Word representations: a simple and general method for semi-supervised learning · 2016-01-07 ·...
![Page 1: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/1.jpg)
G U A N N A N L U
M A R C H 1 7 , 2 0 1 5
Word representations: a simple and general method for
semi-supervised learning
![Page 2: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/2.jpg)
Outline
Motivation
Word representations
Distributional representations
Clustering-based representations
Distributed representations
Supervised evaluation tasks
Chunking
Named entity recognition (NER)
Experiments & Results
Summary
![Page 3: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/3.jpg)
Motivation
Semi-supervised approaches can improve accuracy
It can be tricky and time-consuming
![Page 4: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/4.jpg)
Motivation
Semi-supervised approaches can improve accuracy
It can be tricky and time-consuming
A popular approach:
use unsupervised methods to induce word features
![Page 5: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/5.jpg)
Motivation
Semi-supervised approaches can improve accuracy
It can be tricky and time-consuming
A popular approach:
use unsupervised methods to induce word features
clustering
word embeddings
![Page 6: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/6.jpg)
Motivation
Semi-supervised approaches can improve accuracy
It can be tricky and time-consuming
A popular approach:
use unsupervised methods to induce word features
clustering
word embeddings
Questions:
Which features are good for what tasks?
Should we prefer certain word features?
Can we combine them?
![Page 7: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/7.jpg)
Word Representations
Word representation: A mathematical object associated with each
word, often a vector
Word feature: each dimension’s value
Conventional representation E.g. One-hot representation
Problems:
Data sparsity
![Page 8: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/8.jpg)
Distributional representations
![Page 9: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/9.jpg)
Distributional representations
![Page 10: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/10.jpg)
Distributional representations
![Page 11: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/11.jpg)
Clustering-based representations
Brown clustering (Brown et al., 1992)
A hierarchical clustering algorithm
A class-based bigram language model
Time complexity: O(V*K2)
V is the size of the vocabulary, K is the number of clusters.
Limitations:
Only based on bigram statistics
not consider word usage
![Page 12: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/12.jpg)
Distributed representations
Not to be confused with distributional representations!
![Page 13: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/13.jpg)
Distributed representations
Not to be confused with distributional representations!
also known as word embeddings
dense, real-valued, low-dimensional
Neural language models
![Page 14: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/14.jpg)
Distributed representations
Collobert and Weston embeddings (2008) Neural language model
Discriminative and non-probabilistic
General architecture (e.g. SRL, NER, POS tagging)
Differences on implementation Not achieve the low log-rank
Corrupt the last word for each n-gram
Learning rates are separated
![Page 15: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/15.jpg)
Distributed representation
HLBL embeddings(2009)
Log-bilinear model
Predict the feature vector of the next word
Hierarchical structure (binary tree)
Represent each word as a leaf with a particular path
Calculate the product of the probability of each binary choice
![Page 16: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/16.jpg)
Evaluation tasks
Chunking: syntactic sequence labeling
CoNLL-2000 shared task
CRFsuite
Data
The Penn Treebank
7936 sentences(training)
1ooo sentences (development)
![Page 17: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/17.jpg)
Evaluation tasks
NER: sequence prediction problem
The regularized averaged perceptron model (Ratinov and Roth, 2009)
CoNLL03 shared task
204k words for training, 51k words for development, 46K words for testing
Out-of-domain dataset: MUC7 formal run (59K words)
![Page 18: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/18.jpg)
Evaluation---Features
Chunking NER
![Page 19: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/19.jpg)
Experiment
Unlabeled data
RCV1 corpus (63 millions words in 3.3 million sentences)
Preprocessing technique(Liang, 2005)
Remove all sentences that are less than 90% lowercase a-z.
![Page 20: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/20.jpg)
Results
![Page 21: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/21.jpg)
Results
Capacity of word representations
![Page 22: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/22.jpg)
Results
Chunking NER
![Page 23: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/23.jpg)
Results
Chunking NER
![Page 24: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/24.jpg)
Summary
Word features in an unsupervised, task-inspecific, and model-agnostic
manner The disadvantage Accuracy might be lower than a task-specific semi-
supervised method The contributions The first work to compare different word representations Combining different word representations can improve
accuracy further Future work Induce phrase representations Apply to other supervised NLP systems
![Page 25: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/25.jpg)
References
Brown, P. F., deSouza, P. V., Mercer, R. L., Pietra, V. J. D., & Lai, J. C. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18, 467–479.
Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML.
Landauer, T. K., Foltz, P.W., & Laham, D. (1998).An introduction to latent semantic analysis. Discourse Processes, 259–284.
Liang, P. (2005). Semi-supervised learning for natural language. Master’s thesis, Massachusetts Institute of Technology
Mnih, A., & Hinton, G. E. (2009). A scalable hierarchical distributed language model. NIPS (pp. 1081–1088).
Ratinov, L., & Roth, D. (2009). Design challenges and misconceptions in named entity recognition. CoNLL.
Turian, J., Ratinov, L., & Bengio, Y. (2010, July). Word representations: a simple and general method for semi-supervised learning. In Proceedings of the 48th annual meeting of the association for computational linguistics (pp. 384-394). Association for Computational Linguistics.
![Page 26: Word representations: a simple and general method for semi-supervised learning · 2016-01-07 · Word representations: a simple and general method for semi-supervised learning. In](https://reader034.fdocuments.net/reader034/viewer/2022042308/5ed45b86ef5fbd20d62cdbea/html5/thumbnails/26.jpg)
Q&A
Any questions?
Thank you!