Machine Learning in Natural Language Processing
Jinho D. Choi
Data Science ATL Meetup October 9th, 2014
Natural Language Processing
According to Wikipedia:
NLP is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages.
What are NLP tasks?
Natural Language Processing
John/NNP bought/VBD two/CD books/NNS from/IN me/PRP that/WDT he/PRP wanted/VBD
[Figure: the sentence above annotated with a dependency tree (arc labels nsubj, dobj, prep, num, rcmod, pobj), semantic roles (agent, theme, source), and possession markers (start possession, end possession).]
Part-of-speech Tagging
Dependency Parsing
Semantic Role Labeling
Semantic Understanding
Coreference Resolution
How?
Rule-based Approach
Find the part-of-speech tag of each word.
John/noun has/verb two/num majors/noun
John/noun majors/verb in/prep Math/noun
if w[i].form == 'John': w[i].pos = 'noun'    # Good.
if w[i].form == 'majors': w[i].pos = 'noun'    # Really?
if w[i].form == 'majors' and w[i-1].form == 'two': w[i].pos = 'noun'    # Too specific!
if w[i].form == 'studies' and w[i-1].pos == 'num': w[i].pos = 'noun'    # Keep doing this?
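The hand-written rules on this slide can be sketched in Python as follows. This is only an illustration: the function name `rule_based_tag`, the `broad_majors_rule` switch, and the helper rule that tags 'two' as 'num' are assumptions added here so that the 'studies' rule has a numeral tag to look at. Words that no rule covers are left as None.

```python
from typing import List, Optional


def rule_based_tag(words: List[str], broad_majors_rule: bool = False) -> List[Optional[str]]:
    """Tag words with hand-written rules; None means no rule fired."""
    tags: List[Optional[str]] = [None] * len(words)
    for i, form in enumerate(words):
        prev_form = words[i - 1] if i > 0 else None
        prev_tag = tags[i - 1] if i > 0 else None
        if form == "John":
            tags[i] = "noun"
        elif form == "two":
            # Hypothetical helper rule (not on the slide) so a 'num' tag exists.
            tags[i] = "num"
        elif form == "majors" and (broad_majors_rule or prev_form == "two"):
            # Broad rule: every 'majors' is a noun. Narrow rule: only after 'two'.
            tags[i] = "noun"
        elif form == "studies" and prev_tag == "num":
            tags[i] = "noun"
    return tags


narrow = rule_based_tag("John majors in Math".split())
broad = rule_based_tag("John majors in Math".split(), broad_majors_rule=True)
print(narrow)
print(broad)
```

With the broad rule, 'majors' in "John majors in Math" is tagged as a noun even though it is a verb there; with the narrow rule it gets no tag at all. Either way, every new word or context needs another rule.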
Machine Learning Approach
John/noun has/verb two/num majors/noun
John/noun majors/verb in/prep Math/noun
Extract features for each word.
Label   w[i].form   w[i-1].form   w[i+1].form   w[i-1].f + w[i].f   w[i].f + w[i+1].f
noun    John        ∅             has           ∅                   John_has
noun    majors      two           ∅             two_majors          ∅
verb    majors      John          in            John_majors         majors_in
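The feature templates in this table can be sketched as a small Python function. The function name `extract_features` and the dictionary keys are illustrative choices, not part of the slides; None plays the role of ∅ when a neighbouring word does not exist.

```python
from typing import Dict, List, Optional


def extract_features(words: List[str], i: int) -> Dict[str, Optional[str]]:
    """Return the five string features for the word at position i."""
    form = words[i]
    prev_form = words[i - 1] if i > 0 else None
    next_form = words[i + 1] if i + 1 < len(words) else None
    return {
        "w[i-1].form": prev_form,
        "w[i].form": form,
        "w[i+1].form": next_form,
        # Joined features are only defined when the neighbour exists.
        "w[i-1].f+w[i].f": f"{prev_form}_{form}" if prev_form is not None else None,
        "w[i].f+w[i+1].f": f"{form}_{next_form}" if next_form is not None else None,
    }


# 'majors' as a noun (end of sentence) vs. 'majors' as a verb.
print(extract_features("John has two majors".split(), 3))
print(extract_features("John majors in Math".split(), 1))
```

The two occurrences of 'majors' share the same word form but differ in every context feature, which is what lets a learned model separate the noun from the verb.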
Convert string features into vectors.
[Figure: each string feature is shown as a binary vector over the vocabulary (John, has, two, majors, in, Math), for example 0 1 0 0 0 0 and 0 0 0 0 1 0.]
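A minimal sketch of this conversion, assuming a one-hot encoding over the six-word vocabulary from the example sentences. The names `VOCAB`, `one_hot`, and `encode` are illustrative, and concatenating one block per feature is one common choice rather than necessarily the scheme used in the slides.

```python
from typing import List, Optional

VOCAB = ["John", "has", "two", "majors", "in", "Math"]


def one_hot(form: Optional[str], vocab: List[str] = VOCAB) -> List[int]:
    """Binary vector with a 1 at the position of `form`; all zeros if absent."""
    return [1 if form == v else 0 for v in vocab]


def encode(prev_form: Optional[str], form: str, next_form: Optional[str]) -> List[int]:
    """Concatenate one block per feature: w[i-1].form, w[i].form, w[i+1].form."""
    return one_hot(prev_form) + one_hot(form) + one_hot(next_form)


vector = encode("John", "majors", "in")
print(len(vector), sum(vector))
```

Even in this tiny example the vector has 18 dimensions and only three of them are non-zero; the length grows with the vocabulary and the number of feature templates.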
Space?
Issues with NLP Features
NLP tasks often deal with 1 to 10 million features.
These feature vectors are very sparse.
The values in these vectors are often binary.
Many features are redundant in some way.
Feature selection takes a long time.
Is machine learning easier or harder for NLP?
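Because these vectors are sparse and binary, one common way to save space is to store only the indices of the active features. The sketch below illustrates that idea; the function names `to_sparse` and `sparse_dot` and the toy weights are assumptions for illustration, not taken from the slides.

```python
from typing import List


def to_sparse(dense: List[int]) -> List[int]:
    """Keep only the positions of the non-zero entries of a binary vector."""
    return [j for j, x in enumerate(dense) if x != 0]


def sparse_dot(active: List[int], weights: List[float]) -> float:
    """Dot product of a binary sparse vector with a dense weight vector."""
    # For binary features the product reduces to summing the active weights.
    return sum(weights[j] for j in active)


dense = [0, 1, 0, 0, 1]
weights = [0.5, 1.0, 2.0, 3.0, -0.25]  # toy values, purely illustrative
active = to_sparse(dense)
print(active, sparse_dot(active, weights))
```

With millions of features and only a handful active per word, the index list stays short, and scoring an instance costs time proportional to the number of active features rather than to the full dimensionality.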