Text Classification and Convolutional Neural Networks Fall 2017 … · 2017-11-27 · Text...

Text Classification and Convolutional Neural Networks

COSC 7336: Advanced Natural Language Processing

Fall 2017

Some content on these slides was borrowed from J&M

Today’s lecture

★ Text Classification: task definition
★ Classical approaches to Text Classification
★ Convolutional Neural Networks (CNN)
★ Recent work using CNNs for Text Classification problems
★ Demo: CNN for text
★ Practical

What do these books have in common?

Other tasks that can be solved as TC

★ Sentiment classification
★ Native language identification
★ Profiling

Formal definition of the TC task

★ Input:
  ○ a document d
  ○ a fixed set of classes C = {c1, c2, …, cJ}
★ Output: a predicted class c ∈ C

Methods for TC tasks

★ Rule-based approaches
★ Machine Learning algorithms
  ○ Naive Bayes
  ○ Support Vector Machines
  ○ Logistic Regression
  ○ And now deep learning approaches

Naive Bayes for Text Classification

★ Simple approach
★ Based on the bag-of-words representation

Bag of words

The first reference to Bag of Words is attributed to a 1954 paper by Zellig Harris.
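A minimal sketch of what a bag-of-words representation does, assuming lowercase whitespace tokenization. The function name `bag_of_words` and the example sentences are illustrative and do not come from the slides.

```python
from collections import Counter

def bag_of_words(text):
    """Map a document to its word counts, discarding word order.

    Assumption: tokens are lowercase, whitespace-separated strings.
    """
    return Counter(text.lower().split())

doc_a = "the movie was great great fun"
doc_b = "great fun the movie was great"

# The word order differs, but both documents map to the same bag.
print(bag_of_words(doc_a) == bag_of_words(doc_b))
print(bag_of_words(doc_a)["great"])
```

Because only counts are kept, any model built on this representation cannot tell the two example sentences apart.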

Naive Bayes

Probabilistic classifier (eq. 1):

$$\hat{c} = \arg\max_{c \in C} P(c \mid d)$$

According to Bayes' rule (eq. 2):

$$P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)}$$

Substituting eq. 2 into eq. 1:

$$\hat{c} = \arg\max_{c \in C} \frac{P(d \mid c)\, P(c)}{P(d)}$$

Dropping the denominator, which is the same for every class:

$$\hat{c} = \arg\max_{c \in C} P(d \mid c)\, P(c)$$

Naive Bayes

A document d is represented as a set of features f_1, f_2, …, f_n:

$$\hat{c} = \arg\max_{c \in C} P(f_1, f_2, \ldots, f_n \mid c)\, P(c)$$

How many parameters do we need to learn in this model?

Naive Bayes Assumptions

1. Position doesn’t matter
2. Naive Bayes assumption: the probabilities P(f_i|c) are independent given the class c, and thus we can multiply them:

$$P(f_1, f_2, \ldots, f_n \mid c) = P(f_1 \mid c) \cdot P(f_2 \mid c) \cdots P(f_n \mid c)$$

This leads us to:

$$\hat{c} = \arg\max_{c \in C} P(c) \prod_{i=1}^{n} P(f_i \mid c)$$

Naive Bayes in Practice

We consider word positions:

$$\hat{c} = \arg\max_{c \in C} P(c) \prod_{i \in \text{positions}} P(w_i \mid c)$$

We also do everything in log space:

$$\hat{c} = \arg\max_{c \in C} \left[ \log P(c) + \sum_{i \in \text{positions}} \log P(w_i \mid c) \right]$$

Naive Bayes: Training

How do we compute P(c) and P(w_i|c)?
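One common answer is to estimate both quantities from counts in the training data. The sketch below assumes add-one (Laplace) smoothing and pre-tokenized documents; the names `train_nb` and `predict_nb` and the toy data are hypothetical, and the slides do not state which estimator the lecture used.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Estimate log P(c) and log P(w|c) from tokenized documents.

    Assumption: add-one smoothing, so a word unseen in a class
    still gets a non-zero probability.
    """
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values()) + len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / total) for w in vocab}
    return log_prior, log_lik, vocab

def predict_nb(doc, log_prior, log_lik, vocab):
    """Return the class maximizing log P(c) + sum_i log P(w_i|c).

    Words outside the training vocabulary are skipped.
    """
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)

docs = [["fun", "great", "film"], ["boring", "plot"],
        ["great", "acting"], ["boring", "film"]]
labels = ["pos", "neg", "pos", "neg"]
model = train_nb(docs, labels)
print(predict_nb(["great", "fun"], *model))
```

Scoring in log space, as in `predict_nb`, avoids the numerical underflow that multiplying many small probabilities would cause on long documents.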

Is Naive Bayes a good option for TC?

Evaluation in TC

Confusion table

                      Gold Standard
                      True                    False
Predicted   True      TP = true positives     FP = false positives
            False     FN = false negatives    TN = true negatives

Accuracy = (TP + TN) / (TP + TN + FN + FP)
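The four cells of the confusion table can be counted directly from paired gold and predicted labels. This is a small sketch for the binary case, assuming labels are Python booleans with `True` as the positive class; the function names and the example labels are made up for illustration.

```python
def confusion_counts(gold, predicted):
    """Count TP, FP, FN, TN for binary labels (True = positive class)."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum((not g) and p for g, p in zip(gold, predicted))
    fn = sum(g and (not p) for g, p in zip(gold, predicted))
    tn = sum((not g) and (not p) for g, p in zip(gold, predicted))
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Fraction of all instances that were classified correctly."""
    return (tp + tn) / (tp + tn + fn + fp)

gold      = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
tp, fp, fn, tn = confusion_counts(gold, predicted)
print(tp, fp, fn, tn, accuracy(tp, fp, fn, tn))
```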

Evaluation in TC: Issues with Accuracy?

Suppose we want to learn to classify each message in a web forum as “extremely negative”. We have collected gold standard data:

★ 990 instances are labeled as negative
★ 10 instances are labeled as positive
★ Test data has 100 instances (99 negative and 1 positive)
★ A dumb classifier can get 99% accuracy by always predicting “negative”!
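The numbers on this slide can be reproduced in a few lines. The sketch below builds the 100-instance test set described above and scores a baseline that always predicts "neg"; the label strings and variable names are assumptions made for the example.

```python
# Test set from the slide: 99 negative instances and 1 positive instance.
gold = ["neg"] * 99 + ["pos"]

# Baseline that ignores its input and always predicts the majority class.
predicted = ["neg"] * len(gold)

accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)

# Recall on the rare class: the fraction of gold positives that were found.
found = sum(g == "pos" and p == "pos" for g, p in zip(gold, predicted))
recall_pos = found / gold.count("pos")

print(accuracy, recall_pos)
```

Accuracy looks excellent while the baseline never finds a single positive instance, which is why accuracy alone is misleading on skewed data.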

More Sensible Metrics: Precision, Recall and F-measure

P = TP / (TP + FP)

R = TP / (TP + FN)

F-measure (balanced, F1) = 2PR / (P + R)

                      Gold Standard
                      True                    False
Predicted   True      TP = true positives     FP = false positives
            False     FN = false negatives    TN = true negatives
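A short sketch of the three metrics computed from confusion counts. It assumes the balanced F-measure (F1, the harmonic mean of P and R) and reports 0.0 when a denominator is zero; both choices are conventions picked for this example, and the counts are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute P, R and balanced F-measure (F1) from confusion counts.

    Assumption: a metric with a zero denominator is reported as 0.0.
    """
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical counts: 8 true positives, 2 false positives, 8 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, f1)
```

Note that TN does not appear in any of the three formulas, so a large number of easy negatives cannot inflate them the way it inflates accuracy.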

What about Multi-class problems?

● Multi-class: |C| > 2
● P, R, and F-measure are defined for a single class
● We assume classes are mutually exclusive
● We use per-class evaluation metrics

P_i = TP_i / (TP_i + FP_i)    R_i = TP_i / (TP_i + FN_i)

Micro vs Macro Average

★ Macro average: measure performance per class and then average
★ Micro average: collect predictions for all classes, then compute TP, FP, FN, and TN over the pooled predictions and derive the metric from those counts
★ Weighted average: compute performance per label and then average, where each label score is weighted by its support
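The three averaging schemes can give quite different numbers on imbalanced data. The sketch below computes all three for precision from per-class counts; the classes "A", "B", "C" and their counts are hypothetical, and support is taken to be the number of gold instances of a class (TP + FN).

```python
def precision(tp, fp):
    """Precision, reported as 0.0 when nothing was predicted (an assumption)."""
    return tp / (tp + fp) if tp + fp else 0.0

# Hypothetical per-class counts (tp, fp, fn) for a three-class problem.
counts = {
    "A": (90, 10, 5),
    "B": (5, 5, 10),
    "C": (1, 9, 9),
}

# Macro average: compute precision per class, then take the unweighted mean.
per_class = {c: precision(tp, fp) for c, (tp, fp, fn) in counts.items()}
macro_p = sum(per_class.values()) / len(per_class)

# Micro average: pool the counts over all classes, then compute precision once.
total_tp = sum(tp for tp, fp, fn in counts.values())
total_fp = sum(fp for tp, fp, fn in counts.values())
micro_p = precision(total_tp, total_fp)

# Weighted average: weight each class score by its support (tp + fn).
support = {c: tp + fn for c, (tp, fp, fn) in counts.items()}
weighted_p = sum(per_class[c] * support[c] for c in counts) / sum(support.values())

print(macro_p, micro_p, weighted_p)
```

The macro average treats the rare classes "B" and "C" as equally important as "A", so it is pulled down by their poor scores, while the micro and weighted averages are dominated by the frequent class.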

Example

Train/Test Data Separation

Convolutional Neural Networks

Visual Cortex

Neocognitron (Fukushima, 1980)

LeNet (LeCun, 1998)

Convolution

(Source: Feature extraction using convolution, Stanford Deep Learning Wiki)

Pooling or Subsampling

(Source: Karpathy, CS231n Convolutional Neural Networks for Visual Recognition)

Properties

★ Local invariance
★ Compositionality

Adapted from: http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/

CNNs for NLP

★ Like images, text exhibits some local invariance properties that can be modeled by CNNs
★ CNNs are not as popular as recurrent neural networks (to be discussed next class) for text analysis, but there are many cases where they work well
★ Big advantage: CNNs can be trained efficiently since they take full advantage of parallelism
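To make the local invariance point concrete, here is a dependency-free sketch of one convolutional filter sliding over a sequence of token vectors, followed by max-over-time pooling. The ReLU nonlinearity, the toy 2-dimensional "embeddings" and the filter weights are all assumptions chosen for illustration; a real model would learn the weights and use a tensor library.

```python
def conv1d(sequence, kernel):
    """Slide `kernel` over `sequence` (no padding, stride 1).

    Each position holds a feature vector; the filter spans
    len(kernel) consecutive positions.
    """
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = sum(
            x * w
            for vec, kvec in zip(window, kernel)
            for x, w in zip(vec, kvec)
        )
        outputs.append(max(0.0, total))  # ReLU nonlinearity (an assumption)
    return outputs

def max_over_time(feature_map):
    """Keep the strongest filter response, wherever in the text it occurred."""
    return max(feature_map)

# Toy 2-dimensional vectors for a 5-token sentence (made-up values).
sentence = [[0.1, 0.0], [0.9, 0.2], [0.8, 0.7], [0.0, 0.1], [0.2, 0.0]]
# One filter spanning 2 tokens (made-up weights).
kernel = [[1.0, 0.0], [0.0, 1.0]]

feature_map = conv1d(sentence, kernel)
pooled = max_over_time(feature_map)

# Shifting the sentence by one padding token moves the peak in the
# feature map but leaves the pooled value unchanged.
shifted = [[0.0, 0.0]] + sentence
print(pooled == max_over_time(conv1d(shifted, kernel)))
```

Each window is processed independently of the others, which is what lets the computation run in parallel across positions.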

A character-level CNN

Example from Sebastián Sierra: http://lin99.github.io/NLPTM-2016/4.Docs/cnn%20for%20text.pdf

Convolutional neural networks for sentence classification

Kim, Yoon. "Convolutional neural networks for sentence classification." arXiv preprint arXiv:1408.5882 (2014).

Recent work using CNNs: Text Classification

★ Architecture with up to 29 convolutional layers
★ Idea is to learn a hierarchical representation of text
★ Achieve state of the art on most datasets and outperform recent work using shallow CNNs
★ They reach state of the art on large data sets (> 630k)
★ No statistical tests for significance
★ They couldn’t outperform a hierarchical method adapted for multiple sentences

Recent work using CNNs: Authorship Attribution