Machine Learning in Natural Language Processing

Jinho D. Choi [email protected] Machine Learning in Natural Language Processing Data Science ATL Meetup October 9th, 2014


Panel talk given to the ATL Data Science meet-up. http://www.meetup.com/Data-Science-ATL/events/205956952/

Transcript of Machine Learning in Natural Language Processing

Page 1: Machine Learning in Natural Language Processing

Jinho D. Choi [email protected]

Machine Learning in Natural Language Processing

Data Science ATL Meetup October 9th, 2014

Page 2: Machine Learning in Natural Language Processing

Natural Language Processing


According to Wikipedia:

NLP is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages.

What are NLP tasks?

Page 3: Machine Learning in Natural Language Processing

Natural Language Processing


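Example sentence: John bought two books from me that he wanted

Part-of-speech Tagging

John/NNP bought/VBD two/CD books/NNS from/IN me/PRP that/WDT he/PRP wanted/VBD

Dependency Parsing (head -> dependent: label)

bought -> John: nsubj
bought -> books: dobj
bought -> from: prep
books -> two: num
books -> wanted: rcmod
from -> me: pobj
wanted -> he: nsubj
wanted -> that: dobj

Semantic Role Labeling

bought: John = agent, books = theme, me = source
wanted: he = agent, that = theme

Semantic Understanding

start possession, end possession

Coreference Resolution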

How?

Page 4: Machine Learning in Natural Language Processing

Rule-based Approach


Find the part-of-speech tag of each word.

John/noun has/verb two/num majors/noun
John/noun majors/verb in/prep Math/noun

if w[i].form == 'John': w[i].pos = 'noun'    # Good.

if w[i].form == 'majors': w[i].pos = 'noun'    # Really?

if w[i].form == 'majors' and w[i-1].form == 'two': w[i].pos = 'noun'    # Too specific!

if w[i].form == 'studies' and w[i-1].pos == 'num': w[i].pos = 'noun'    # Keep doing this?
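To make the problem with hand-written rules concrete, here is a minimal Python sketch of the first two rules on this slide. The `Word` class, the `tag_with_rules` function, and the `"?"` placeholder for untagged words are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str
    pos: str = "?"  # "?" means no rule has fired yet (hypothetical placeholder)


def tag_with_rules(forms):
    """Apply the first two hand-written rules from the slide, left to right."""
    w = [Word(f) for f in forms]
    for i in range(len(w)):
        if w[i].form == "John":
            w[i].pos = "noun"
        if w[i].form == "majors":
            w[i].pos = "noun"  # wrong whenever 'majors' is used as a verb
    return [x.pos for x in w]


print(tag_with_rules(["John", "has", "two", "majors"]))
print(tag_with_rules(["John", "majors", "in", "Math"]))
```

In the second sentence "majors" is a verb, yet the rule still tags it as a noun. Patching that with more specific rules is what leads to the "Keep doing this?" question.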

Page 5: Machine Learning in Natural Language Processing

Machine Learning Approach


John/noun has/verb two/num majors/noun
John/noun majors/verb in/prep Math/noun

Extract features for each word.

w[i].form   w[i-1].form   w[i+1].form   w[i-1].f + w[i].f   w[i].f + w[i+1].f   Label
John        ∅             has           ∅                   John_has            noun
majors      two           ∅             two_majors          ∅                   noun
majors      John          in            John_majors         majors_in           verb
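The feature templates in this table can be sketched in Python as follows. The function name `extract_features`, the dictionary keys, and the use of `None` for the empty value ∅ are assumptions made for illustration.

```python
def extract_features(forms, i):
    """Return the five string features for the word at position i.

    None stands for the empty value (∅) when a neighbor does not exist.
    """
    cur = forms[i]
    prev = forms[i - 1] if i > 0 else None
    nxt = forms[i + 1] if i + 1 < len(forms) else None
    return {
        "w[i].form": cur,
        "w[i-1].form": prev,
        "w[i+1].form": nxt,
        "w[i-1].f+w[i].f": f"{prev}_{cur}" if prev is not None else None,
        "w[i].f+w[i+1].f": f"{cur}_{nxt}" if nxt is not None else None,
    }


# 'majors' used as a verb in "John majors in Math"
print(extract_features(["John", "majors", "in", "Math"], 1))
```

Unlike the hand-written rules, nothing here decides the tag. The features only describe the context, and a learning algorithm is left to weigh them against the labels.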

Convert string features into vector.

[Figure: three binary vectors over the dimensions John, has, two, majors, in, Math. Each vector has a single 1 marking one word, for example 0 1 0 0 0 0 for "has" and 0 0 0 0 1 0 for "in".]

Space?
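One way to convert string features into a binary vector is to give every distinct feature string its own column. The sketch below assumes that scheme; the helper names `build_index` and `to_vector` and the `template=value` string format are hypothetical.

```python
def build_index(feature_strings):
    """Assign a column to every distinct feature string, in first-seen order."""
    index = {}
    for s in feature_strings:
        index.setdefault(s, len(index))
    return index


def to_vector(active, index):
    """Binary vector with a 1 in the column of each active feature."""
    vec = [0] * len(index)
    for s in active:
        if s in index:  # feature strings never seen before are ignored
            vec[index[s]] = 1
    return vec


vocab = ["John", "has", "two", "majors", "in", "Math"]
index = build_index(f"w[i].form={v}" for v in vocab)
print(to_vector(["w[i].form=has"], index))  # [0, 1, 0, 0, 0, 0]
```

Under this scheme every feature template needs its own block of columns, one per distinct value, so the vector length grows with the vocabulary while only a handful of entries are ever 1. That is where the question of space comes from.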

Page 6: Machine Learning in Natural Language Processing

Issues with NLP Features


NLP tasks often deal with 1 to 10 million features.

These feature vectors are very sparse.

The values in these vectors are often binary.

Many features are redundant in some way.

Feature selection takes a long time.
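Sparsity and binary values can be exploited together: instead of storing millions of zeros, store only the positions of the active features. The following Python sketch shows the idea for scoring with a weight vector. The function names and the example weights are made up for illustration and do not come from the talk.

```python
def sparse_score(active_indices, weights):
    """Dot product of a sparse binary vector with a dense weight list.

    Every stored value is 1, so the product reduces to the sum of the
    weights at the active positions.
    """
    return sum(weights[j] for j in active_indices)


def dense_score(vector, weights):
    """Reference dot product over the full dense vector."""
    return sum(x * w for x, w in zip(vector, weights))


weights = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]  # hypothetical values
dense = [0, 1, 0, 0, 1, 0]
active = [j for j, x in enumerate(dense) if x]  # positions of the 1s

assert sparse_score(active, weights) == dense_score(dense, weights)
```

With this representation the cost of scoring one word depends on the number of active features, not on the total number of features.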

Is machine learning easier or harder for NLP?