5525: Speech and Language Processing

Alan Ritter (many slides from Greg Durrett)

Slides: aritter.github.io/courses/5525_slides_v2/lec1-intro.pdf

Administrivia

‣ Course website: http://aritter.github.io/courses/5525_fall19.html
‣ Piazza: link on the course website
‣ My office hours: Friday 4-5pm, DL 595
‣ TA: Ashutosh Baheti; Office hours: Wednesday 1-2pm, DL 574

Course Requirements

‣ Prior exposure to machine learning very helpful but not required
‣ Programming/Python experience
‣ Probability
‣ Linear Algebra
‣ Calculus

There will be a lot of math and programming!

Enrollment

‣ Homework 1 is out now (due August 30):
‣ Please look at the assignment well before then
‣ If this seems like it’ll be challenging for you, come and talk to me (this is smaller-scale than the later assignments, which are smaller-scale than the final project)

Texts

‣ 2 great textbooks for NLP
‣ There will be assigned readings from both
‣ Both freely available online

What’s the goal of NLP?

‣ Be able to solve problems that require deep understanding of text
‣ Example: dialogue systems

User: Siri, what’s the most valuable American company?
Siri: Apple
User: Who is its CEO?
Siri: Tim Cook

What the system has to do along the way:
recognize marketCap is the target value
recognize predicate
do computation
resolve references

Automatic Summarization

One of New America’s writers posted a statement critical of Google. Eric Schmidt, Google’s CEO, was displeased.

The writer and his team were dismissed.

What the summarizer has to do:
compress text
provide missing context
paraphrase to provide clarity

Machine Translation

People’s Daily, August 30, 2017

Trump Pope family watch a hundred years a year in the White House balcony

NLP Analysis Pipeline

Text → Text Analysis → Annotations → Applications

Text Analysis:
‣ Syntactic parses
‣ Coreference resolution
‣ Entity disambiguation
‣ Discourse analysis

Applications:
‣ Summarize
‣ Extract information
‣ Answer questions
‣ Identify sentiment
‣ Translate

‣ NLP is about building these pieces!
‣ All of these components are modeled with statistical approaches trained with machine learning

How do we represent language?

Text can be mapped to labels, sequences/tags, or trees.

Labels:
‣ “the movie was good” → +
‣ “Beyoncé had one of the best videos of all time” → subjective

Sequences/tags:
‣ “Tom Cruise stars in the new Mission Impossible film” → PERSON (Tom Cruise), WORK_OF_ART (Mission Impossible)

Trees:
‣ “I eat cake with icing” → parse tree (node labels on the slide: S, NP, VP, VBZ, NN, PP, NP)
‣ “flights to Miami” → λx. flight(x) ∧ dest(x) = Miami
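As a rough illustration of the four kinds of representation on this slide, here is a minimal Python sketch. The data structures, the helper names (`span_text`, `leaves`, `Entity`, `flights_to_miami`), the toy database, and the exact bracketing of the tree are assumptions made for this example; the slide only gives the sentences, the tags, the node labels, and the logical form.

```python
from dataclasses import dataclass

# 1. Labels: one label for a whole piece of text.
labeled = [
    ("the movie was good", "+"),
    ("Beyoncé had one of the best videos of all time", "subjective"),
]

# 2. Sequences/tags: labeled spans over tokens (start inclusive, end exclusive).
tokens = "Tom Cruise stars in the new Mission Impossible film".split()
spans = [(0, 2, "PERSON"), (6, 8, "WORK_OF_ART")]

def span_text(tokens, span):
    """Return the words covered by a (start, end, tag) span."""
    start, end, _tag = span
    return " ".join(tokens[start:end])

# 3. Trees: nested (label, child, ...) tuples; leaves are plain strings.
# This bracketing is one plausible reading, not necessarily the slide's.
tree = ("S",
        ("NP", "I"),
        ("VP",
         ("VBZ", "eat"),
         ("NP",
          ("NP", ("NN", "cake")),
          ("PP", "with", "icing"))))

def leaves(node):
    """Collect the words of a tree from left to right."""
    if isinstance(node, str):
        return [node]
    words = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words

# 4. Logical form: λx. flight(x) ∧ dest(x) = Miami as a predicate
# that can be evaluated against a (hypothetical) toy database.
@dataclass(frozen=True)
class Entity:
    kind: str
    dest: str

def flights_to_miami(x):
    return x.kind == "flight" and x.dest == "Miami"

db = [Entity("flight", "Miami"), Entity("flight", "Boston"), Entity("train", "Miami")]
matches = [e for e in db if flights_to_miami(e)]

print(span_text(tokens, spans[0]))  # Tom Cruise
print(leaves(tree))                 # ['I', 'eat', 'cake', 'with', 'icing']
print(matches)                      # [Entity(kind='flight', dest='Miami')]
```

Each structure answers a different question about the text: what it is about as a whole, which words play which role, how the words group, and what the text denotes.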

How do we use these representations?

Text → Text Analysis → Labels / Sequences / Trees → Applications

Ways the representations feed into applications:
Extract syntactic features
Tree-structured neural networks
Tree transducers (for machine translation)
end-to-end models
…

‣ Main question: What representations do we need for language? What do we want to know about it?
‣ Boils down to: what ambiguities do we need to resolve?

Why is language hard? (and how can we handle that?)

Language is Ambiguous!

‣ Hector Levesque (2011): “Winograd schema challenge” (named after Terry Winograd, the creator of SHRDLU)

The city council refused the demonstrators a permit because they ______ violence

they feared (they = the city council)
they advocated (they = the demonstrators)

‣ This is so complicated that it’s an AI challenge problem! (AI-complete)
‣ Referential/semantic ambiguity

Language is Ambiguous!

‣ Headlines
  ‣ Teacher Strikes Idle Kids
  ‣ Hospitals Sued by 7 Foot Doctors
  ‣ Ban on Nude Dancing on Governor’s Desk
  ‣ Iraqi Head Seeks Arms
  ‣ Stolen Painting Found by Tree
  ‣ Kids Make Nutritious Snacks
  ‣ Local HS Dropouts Cut in Half
‣ Syntactic/semantic ambiguity: parsing needed to resolve these, but need context to figure out which parse is correct

slide credit: Dan Klein
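To make the idea of several competing parses concrete, here is a small sketch that counts the parses of “I eat cake with icing” under a toy grammar using the CKY algorithm. The grammar and lexicon are invented for this illustration (they are not from the slides), and the grammar is deliberately tiny; the point is only that a prepositional phrase can attach either to the noun (“cake with icing”) or to the verb (“eat ... with icing”).

```python
from collections import defaultdict

# Toy lexicon and binary grammar rules (parent, left child, right child).
# Both are assumptions made for this example.
LEXICON = {"I": ["NP"], "eat": ["V"], "cake": ["NP"], "with": ["P"], "icing": ["NP"]}
RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase
    ("PP", "P", "NP"),
]

def count_parses(words, root="S"):
    """CKY over spans: chart[(i, j)][symbol] = number of ways to build symbol over words[i:j]."""
    n = len(words)
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON[word]:
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for parent, left, right in RULES:
                    n_left = chart[(i, k)].get(left, 0)
                    n_right = chart[(k, j)].get(right, 0)
                    if n_left and n_right:
                        chart[(i, j)][parent] += n_left * n_right
    return chart[(0, n)].get(root, 0)

print(count_parses("I eat cake with icing".split()))  # 2
```

A parser can enumerate both analyses, but picking the intended one needs context or world knowledge, which is the point of the bullet above.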

Language is Really Ambiguous!

‣ There aren’t just one or two possibilities which are resolved pragmatically

il fait vraiment beau

Candidate translations:
It is really nice out
It’s really nice
The weather is beautiful
It is really beautiful outside
He makes truly beautiful
He makes truly boyfriend
It fact actually handsome

‣ Combinatorially many possibilities, many you won’t even register as ambiguities, but systems still have to resolve them
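A quick way to see why the possibilities are combinatorial: even a naive word-by-word translation has to choose one gloss per source word, and the number of candidates is the product of the per-word choices. The gloss table below is an assumption pieced together from the candidate translations on the slide; a real system would also consider reorderings, insertions, and phrase-level translations, so the true space is far larger.

```python
from itertools import product

# Hypothetical per-word glosses, taken from the words that appear
# in the slide's candidate translations.
GLOSSES = {
    "il": ["it", "he"],
    "fait": ["is", "makes", "fact"],
    "vraiment": ["really", "truly", "actually"],
    "beau": ["nice", "beautiful", "handsome", "boyfriend"],
}

def word_by_word_candidates(sentence):
    """Enumerate every combination of one gloss per source word."""
    options = [GLOSSES[word] for word in sentence.split()]
    return [" ".join(choice) for choice in product(*options)]

candidates = word_by_word_candidates("il fait vraiment beau")
print(len(candidates))  # 2 * 3 * 3 * 4 = 72
print("he makes truly boyfriend" in candidates)  # True
```

Four words already give 72 word-by-word candidates, most of which a human reader would never consider.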

What do we need to understand language?

‣ Lots of data!

slide credit: Dan Klein

What do we need to understand language?

‣ World knowledge: have access to information beyond the training data

DOJ greenlights Disney-Fox merger

DOJ: Department of Justice
greenlights: metaphor; “approves”

‣ What is a green light? How do we understand what “greenlighting” does?

What do we need to understand language?

‣ Grounding: learn what fundamental concepts actually mean in a data-driven way

Golland et al. (2010), McMahan and Stone (2015)

What do we need to understand language?

‣ Linguistic structure
‣ …but computers probably won’t understand language the same way humans do
‣ However, linguistics tells us what phenomena we need to be able to deal with and gives us hints about how language works

Centering Theory, Grosz et al. (1995)

What techniques do we use? (to combine data, knowledge, linguistics, etc.)

A brief history of (modern) NLP

Timeline (axis: 1980, 1990, 2000, 2010, 2018), in the order the slide builds:
“AI winter”; rule-based, expert systems
earliest stat MT work at IBM
Penn treebank (S → NP VP)
Ratnaparkhi tagger (NNP VBZ)
Collins vs. Charniak parsers
Sup: SVMs, CRFs, NER, Sentiment
Unsup: topic models, grammar induction
Semi-sup, structured prediction
Neural

Structured Prediction

‣ All of these techniques are data-driven! Some data is naturally occurring, but may need to label
‣ Supervised techniques work well on very little data
‣ Even neural nets can do pretty well!

Figure labels: unsupervised learning; annotation (two hours!); better system!

“Learning a Part-of-Speech Tagger from Two Hours of Annotation”, Garrette and Baldridge (2013)

Less Manual Structure?

Bahdanau et al. (2014), DeNero et al. (2008)

Does manual structure have a place?

‣ Neural nets don’t always work out of domain!
‣ Coreference: rule-based systems are still about as good as deep learning out-of-domain
‣ LORELEI: transition point below which phrase-based systems are better
‣ Why is this? Inductive bias!
‣ Can multi-task learning help?

Figure: Moosavi and Strube (2017); panels: Wikipedia, Newswire

Does manual structure have a place?

Trump Pope family watch a hundred years a year in the White House balcony

‣ Maybe manual structure would help…

Where are we?

‣ NLP consists of: analyzing and building representations for text, solving problems involving text
‣ These problems are hard because language is ambiguous; solving them requires drawing on data, knowledge, and linguistics
‣ Knowing which techniques to use requires understanding dataset size, problem complexity, and a lot of tricks!
‣ NLP encompasses all of these things

NLP vs. Computational Linguistics

‣ NLP: build systems that deal with language data
‣ CL: use computational tools to study language

Hamilton et al. (2016)

NLP vs. Computational Linguistics

‣ Computational tools for other purposes: literary theory, political science…

Bamman, O’Connor, Smith (2013)

Outline of the Course

The schedule is grouped into four blocks:
ML and structured prediction for NLP
Neural nets
Syntax/semantics
Applications: MT, IE, summarization, dialogue, etc.

Course Goals

‣ Cover fundamental machine learning techniques used in NLP
‣ Understand how to look at language data and approach linguistic phenomena
‣ Cover modern NLP problems encountered in the literature: what are the active research topics in 2018?
‣ Make you a “producer” rather than a “consumer” of NLP tools
‣ The four assignments should teach you what you need to know to understand nearly any system in the literature

Assignments

‣ 4 Homework Assignments
‣ Implementation-oriented, with an open-ended component to each
‣ Homework 1 (Naive Bayes for sentiment classification) is out NOW
‣ ~2 weeks per assignment, 3 “slip days” for automatic extensions

These projects require understanding of the concepts, ability to write performant code, and ability to think about how to debug complex systems. They are challenging, so start early!
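For orientation, here is a minimal sketch of what a Naive Bayes sentiment classifier looks like. It is not the assignment's starter code or required design: the function names, the whitespace tokenization, the add-one smoothing, and the four-sentence training set are all assumptions chosen to keep the example short. The classifier picks the label y that maximizes log P(y) plus the sum of log P(w | y) over the words w in the text.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """examples: list of (text, label) pairs. Returns (log_prior, log_likelihood, vocab)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    total = sum(label_counts.values())
    log_prior = {y: math.log(c / total) for y, c in label_counts.items()}
    log_likelihood = {}
    for y in label_counts:
        # Add-one (Laplace) smoothing so unseen (word, label) pairs get nonzero probability.
        denom = sum(word_counts[y].values()) + len(vocab)
        log_likelihood[y] = {w: math.log((word_counts[y][w] + 1) / denom) for w in vocab}
    return log_prior, log_likelihood, vocab

def predict(text, model):
    """Return the label with the highest log-probability score."""
    log_prior, log_likelihood, vocab = model
    scores = {}
    for y in log_prior:
        score = log_prior[y]
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += log_likelihood[y][word]
        scores[y] = score
    return max(scores, key=scores.get)

# Made-up toy training data, for illustration only.
train = [
    ("the movie was good", "+"),
    ("a great film", "+"),
    ("the movie was bad", "-"),
    ("a terrible film", "-"),
]
model = train_naive_bayes(train)
print(predict("the film was good", model))  # +
print(predict("a terrible movie", model))   # -
```

Working in log space avoids numerical underflow when many small probabilities are multiplied, which matters once documents are longer than a few words.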

Final Project

‣ Final project (20%)
‣ Groups of 3-4 preferred, 1 is possible.
‣ Good idea to run your project idea by me in office hours or by email.
‣ 4 page report + final project presentation.