Natural Language Processing (demo.clab.cs.cmu.edu/NLP/S20/files/slides/09...)

Natural Language Processing

Lecture 9: Classification

Classification

Notation

• Training examples: x = (x1, x2, ..., xN)

• Their categories: y = (y1, y2, ..., yN)

• A classifier C seeks to map xi to yi

• A learner L infers C from (x, y)

[Diagram: (x, y) → L → C; x → C → y]

Probabilistic Classifiers

[Diagram: x → C → y]

C: return argmax_{y'} p(y' | x)
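The decision rule above can be sketched in a few lines of Python. This is only an illustration: the function name `classify` and the posterior values are made up, and the sketch assumes some model has already produced p(y' | x) for every label.

```python
def classify(posterior):
    """Return the label y' with the highest p(y' | x).

    `posterior` is a hypothetical dict mapping each label to p(label | x).
    """
    return max(posterior, key=posterior.get)


# Made-up posterior for a single email x.
posterior = {"spam": 0.7, "not_spam": 0.3}
print(classify(posterior))
```

The classifier itself is just an argmax; all of the modeling effort goes into estimating the distribution it is applied to.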

Noisy Channel Model (General)

[Diagram: source, p(y) → y → channel, p(x | y) → x; decoding recovers y from x]

What proportion of emails are expected to be spam vs. not spam?

What proportion of product reviews are expected to get 1, 2, 3, 4, 5 stars?

Noisy Channel Classifiers

[Diagram: x → C → y]

C: return argmax_y p(y) × p(x | y)
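The noisy channel rule gives the same answer as the probabilistic rule argmax p(y | x). A short derivation using Bayes' rule shows why; the only step that needs justification is dropping p(x), which does not depend on y and so cannot change the argmax.

```latex
\begin{aligned}
\hat{y} &= \arg\max_{y} \, p(y \mid x) \\
        &= \arg\max_{y} \, \frac{p(y)\, p(x \mid y)}{p(x)} \\
        &= \arg\max_{y} \, p(y)\, p(x \mid y)
\end{aligned}
```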

Representing Text: Features

• Any object you might be given to classify can be represented as a vector in a vector space

  – Vectors representing text are often sparse and high-dimensional

• Designing Φ (“Feature engineering”)

  – What information do you need to solve the problem?

  – What information do you need to avoid mistakes?

  – Very common: bag-of-words
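As a rough illustration of a bag-of-words feature map Φ, the sketch below counts whitespace-separated tokens. The function name `bag_of_words` and the tokenization (lowercasing and splitting on whitespace) are assumptions made for this example, not something the slides specify.

```python
from collections import Counter


def bag_of_words(text):
    """Map a text to a sparse vector of word counts (word -> count)."""
    return Counter(text.lower().split())


phi = bag_of_words("the movie was good , really good")
print(phi["good"])  # count of the word "good"
print(phi["bad"])   # words that never occur have count 0
```

Storing only the nonzero counts is one way to handle the sparsity noted above: the vector space has one dimension per vocabulary word, but any single text uses very few of them.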

Naïve Bayes Classifier

[Diagram: x → C → y]

C:

ϕ_j ← [Φ(x)]_j

return argmax_{y'} p(y') × Π_j p(ϕ_j | y')
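A minimal sketch of this decision rule follows. It makes several assumptions for illustration: the name `nb_classify` is hypothetical, the probability tables are made up, the features are words, and one likelihood table per class is shared across all positions j. The product is computed as a sum of logs, which has the same argmax and avoids numerical underflow on long inputs.

```python
import math


def nb_classify(features, prior, likelihood):
    """Return argmax_y' p(y') * prod_j p(phi_j | y'), computed in log space.

    features:   list of feature values phi_j (here: words)
    prior:      dict label -> p(label)
    likelihood: dict label -> dict feature value -> p(value | label)
    """
    def log_score(y):
        return math.log(prior[y]) + sum(
            math.log(likelihood[y][f]) for f in features
        )

    return max(prior, key=log_score)


# Made-up probability tables for a two-class problem.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.30, "meeting": 0.05},
    "ham": {"free": 0.05, "meeting": 0.30},
}
print(nb_classify(["free", "free"], prior, likelihood))
```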

Naïve Bayes Learner

[Diagram: (x, y) → L → p]

p consists of:

∀y, p(y)

∀y, ∀j, ∀f, p(ϕ_j(x) = f | y)
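The slide lists what the learner must output but not how to estimate it. One common choice, assumed here, is relative-frequency counting. In the sketch below the name `nb_learn`, the tuple representation of Φ(x), and the toy data are all hypothetical.

```python
from collections import Counter, defaultdict


def nb_learn(X, Y):
    """Estimate p(y) and p(phi_j = f | y) by relative frequency.

    X: list of feature vectors (tuples of equal length); Y: list of labels.
    """
    n = len(Y)
    label_counts = Counter(Y)
    prior = {y: c / n for y, c in label_counts.items()}

    # counts[y][j][f] = number of examples labeled y whose j-th feature equals f
    counts = defaultdict(lambda: defaultdict(Counter))
    for phi, y in zip(X, Y):
        for j, f in enumerate(phi):
            counts[y][j][f] += 1

    likelihood = {
        y: {
            j: {f: c / label_counts[y] for f, c in value_counts.items()}
            for j, value_counts in counts[y].items()
        }
        for y in label_counts
    }
    return prior, likelihood


# Toy training set: two binary features per example.
X = [(1, 0), (1, 1), (0, 1), (0, 0)]
Y = ["spam", "spam", "ham", "ham"]
prior, likelihood = nb_learn(X, Y)
print(prior["spam"])             # p(y = spam)
print(likelihood["spam"][0][1])  # p(phi_0 = 1 | spam)
```

Unsmoothed counts assign no probability to a feature value never seen with a class, so practical implementations usually add some form of smoothing.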

Linear Classifiers

C:

1. Use Φ(x) to map x onto a real-valued feature space.

2. Calculate the linear score z = w ᵀ Φ(x).

3. If z > 0, then return y = YES, else y = NO.

[Diagram: x → C → y]
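The three steps of the binary linear classifier can be sketched as follows, assuming Φ(x) has already been computed as a list of numbers. The function name `linear_classify` and the weights and features are made up for illustration.

```python
def linear_classify(w, phi):
    """Return "YES" if the linear score z = w^T phi is positive, else "NO"."""
    z = sum(w_j * phi_j for w_j, phi_j in zip(w, phi))
    return "YES" if z > 0 else "NO"


w = [2.0, -1.0, 0.5]   # hypothetical weight vector
phi = [1.0, 3.0, 0.0]  # hypothetical feature vector Phi(x)
print(linear_classify(w, phi))  # z = 2.0 - 3.0 + 0.0 = -1.0
```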

[Figure: the decision boundary is the hyperplane {u : wᵀu = 0}, drawn together with the weight vector w]

Linear Classifiers (> 2 Classes)

[Diagram: x → C → y]

C: return argmax_y wᵀΦ(x, y)
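With more than two classes, the feature function takes the candidate label as a second argument. The sketch below assumes one simple construction of Φ(x, y), pairing each word count with the label; the names `joint_features` and `predict` and the weights are hypothetical.

```python
from collections import Counter


def joint_features(text, y):
    """Hypothetical Phi(x, y): bag-of-words counts, each paired with label y."""
    return Counter((word, y) for word in text.lower().split())


def predict(w, text, labels):
    """Return argmax_y w^T Phi(x, y) for a sparse weight dict w."""
    def score(y):
        return sum(
            w.get(k, 0.0) * v for k, v in joint_features(text, y).items()
        )

    return max(labels, key=score)


# Made-up weights; any (word, label) pair not listed has weight 0.
w = {("great", "pos"): 1.5, ("awful", "neg"): 2.0, ("okay", "neutral"): 1.0}
labels = ["pos", "neg", "neutral"]
print(predict(w, "an awful film", labels))
```

Under this construction a single weight vector holds a separate block of word weights for each class, and the argmax picks the class whose block scores the input highest.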

Perceptron Learner

[Diagram: (x, y) → L → w]

L:

w ← 0
for t = 1 ... T:
    select (xt, yt)
    # run current classifier
    ŷ ← argmax_{y'} wᵀΦ(xt, y')
    if ŷ ≠ yt then  # mistake
        w ← w + α [Φ(xt, yt) − Φ(xt, ŷ)]
return w
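The perceptron pseudocode translates fairly directly into Python. The sketch below makes several assumptions not stated on the slide: Φ(x, y) pairs bag-of-words counts with the label, "select" means cycling through the training data in order, and the names `joint_features`, `predict`, and `perceptron_learn` are hypothetical.

```python
from collections import Counter, defaultdict


def joint_features(text, y):
    """Hypothetical Phi(x, y): bag-of-words counts, each paired with label y."""
    return Counter((word, y) for word in text.lower().split())


def predict(w, text, labels):
    """Run the current classifier: argmax_y' w^T Phi(x, y')."""
    def score(y):
        return sum(
            w.get(k, 0.0) * v for k, v in joint_features(text, y).items()
        )

    return max(labels, key=score)


def perceptron_learn(data, labels, T, alpha=1.0):
    """Perceptron learner; returns a sparse weight dict."""
    w = defaultdict(float)                   # w <- 0
    for t in range(T):
        x_t, y_t = data[t % len(data)]       # select (x_t, y_t): cycle in order
        y_hat = predict(w, x_t, labels)      # run current classifier
        if y_hat != y_t:                     # mistake
            for k, v in joint_features(x_t, y_t).items():
                w[k] += alpha * v            # move toward the correct label
            for k, v in joint_features(x_t, y_hat).items():
                w[k] -= alpha * v            # move away from the predicted label
    return w


# Toy training set.
data = [("great film", "pos"), ("awful film", "neg")]
labels = ["pos", "neg"]
w = perceptron_learn(data, labels, T=10)
print(predict(w, "great film", labels))
print(predict(w, "awful film", labels))
```

Weights change only on mistakes, so once every training example is classified correctly the remaining iterations leave w untouched.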
