Search-Guided, Lightly-Supervised Training of Structured … · 2020. 9. 20. · Structured...

Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks Andrew McCallum Pedram Rooshenas Dongxu Zhang Gopal Sharma

Search-Guided, Lightly-Supervised Training of Structured Prediction Energy Networks

Andrew McCallum, Pedram Rooshenas, Dongxu Zhang, Gopal Sharma

Structured Prediction

• We are interested in learning a function
• X input variables
• Y output variables

• We can define as
• For a Gibbs distribution:
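The symbols on this slide are missing from the transcript. Under the standard energy-based formulation, and assuming the name F for the learned function (an assumption) and E for the energy (the name used on the following slide), the definitions would plausibly read as below. This is a reconstruction sketch, not the slide's own notation; Z(x) denotes the usual normalizing constant.

```latex
F : X \to Y, \qquad
F(x) = \arg\max_{y \in Y} P(y \mid x), \qquad
P(y \mid x) = \frac{\exp\left(-E(x, y)\right)}{Z(x)}
\;\Longrightarrow\;
F(x) = \arg\min_{y \in Y} E(x, y)
```

Because Z(x) does not depend on y, maximizing the Gibbs probability is the same as minimizing the energy, which is what makes energy minimization usable as the prediction rule.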

Structured Prediction Energy Networks (SPENs)

• If E is parameterized using a differentiable model such as a deep neural network:
• We can find a local minimum of E using gradient descent

• The energy networks express the correlation among input and output variables.
• Traditionally, graphical models are used for representing the correlation among output variables.
• Inference is intractable for most expressive graphical models.
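Gradient-descent inference, as described in the first two bullets, can be sketched as follows. The quadratic energy with a coupling term, the function names, and the step size are illustrative assumptions, not the network from the talk. Outputs are relaxed to the interval [0, 1] so that the energy is differentiable in y.

```python
def energy(x, y, coupling=0.5):
    """Toy differentiable energy E(x, y) (hypothetical, for illustration)."""
    # Local terms pull each y_i toward x_i; the coupling term favours
    # neighbouring outputs that agree, i.e. a correlation among outputs.
    local = sum((yi - xi) ** 2 for xi, yi in zip(x, y))
    pair = sum((y[i] - y[i + 1]) ** 2 for i in range(len(y) - 1))
    return local + coupling * pair


def energy_grad(x, y, coupling=0.5):
    """Analytic gradient of the toy energy with respect to y."""
    g = [2.0 * (yi - xi) for xi, yi in zip(x, y)]
    for i in range(len(y) - 1):
        d = 2.0 * coupling * (y[i] - y[i + 1])
        g[i] += d
        g[i + 1] -= d
    return g


def gd_inference(x, steps=200, lr=0.1):
    """Find a local minimum of E(x, .) by projected gradient descent."""
    y = [0.5] * len(x)  # start from the centre of the relaxed domain
    for _ in range(steps):
        g = energy_grad(x, y)
        # Gradient step, then clip back into [0, 1].
        y = [min(1.0, max(0.0, yi - lr * gi)) for yi, gi in zip(y, g)]
    return y


x = [1.0, 1.0, 0.0]
y_hat = gd_inference(x)
```

With a deep network in place of the toy energy, the gradient would come from automatic differentiation rather than a hand-written formula, and only a local minimum is guaranteed.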

Energy Models

[picture from Belanger (2016)]

[picture from Altinel (2018)]

Training SPENs

• Structural SVM (Belanger and McCallum, 2016)
• End-to-End (Belanger et al., 2017)
• Value-based training (Gygli et al., 2017)
• Inference Network (Lifu Tu and Kevin Gimpel, 2018)
• Rank-Based Training (Rooshenas et al., 2018)

Indirect Supervision

• Data annotation is expensive, especially for structured outputs.
• Domain knowledge can serve as the source of supervision.

• It can be written as reward functions.
• A reward function evaluates a pair of input and output configurations into a scalar value.
• For a given x, we are looking for the best y that maximizes the reward.
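As an illustration of a reward function built from domain knowledge, the sketch below scores a tag sequence for a short citation string with two hand-written rules and finds the best y for a given x by exhaustive enumeration. The rules, tag names, and function names are hypothetical (the slides do not list the rules that were used), and enumeration is only feasible for very small output spaces.

```python
from itertools import product

TAGS = ("author", "title", "year")


def reward(tokens, tags):
    """Hypothetical rule-based reward R(x, y): returns a scalar score."""
    score = 0.0
    for tok, tag in zip(tokens, tags):
        # Rule 1: a four-digit number is probably a year, and nothing
        # else should be tagged as a year.
        if tok.isdigit() and len(tok) == 4:
            score += 1.0 if tag == "year" else -1.0
        elif tag == "year":
            score -= 1.0
    # Rule 2: a citation is assumed to start with an author.
    if tags[0] == "author":
        score += 0.5
    return score


def best_output(tokens):
    """Exhaustively search for the y that maximizes R(x, y)."""
    return max(product(TAGS, repeat=len(tokens)),
               key=lambda tags: reward(tokens, tags))


tokens = ["Li", "Learning", "2019"]
best = best_output(tokens)
```

The rules leave the middle token ambiguous between "author" and "title", which is typical of indirect supervision: the reward constrains the output without fully determining it.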

Search-Guided Training

We have a reward function that provides indirect supervision.

We want to learn a smooth version of the reward function such that we can use gradient-descent inference at test time.

Search-Guided Training

[Figure: points y0 through y5, added one at a time across the animation frames]

We sample a point from the energy function using noisy gradient-descent inference.
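A minimal sketch of noisy gradient-descent sampling, under stated assumptions: the energy is linear in y (a SPEN would use a deep network), the output is a single variable relaxed to the probability simplex, and the simplex constraint is handled with a softmax parameterization. The function names, step size, and noise scale are all illustrative.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def noisy_gd_sample(w, steps=50, lr=0.5, noise=0.1, rng=None):
    """Sample a point on the simplex by noisy gradient descent on E(y) = w . y.

    The linear energy and the softmax parameterization are assumptions made
    to keep the sketch short.
    """
    rng = rng or random.Random(0)
    z = [0.0] * len(w)  # logits; softmax(z) always lies on the simplex
    for _ in range(steps):
        y = softmax(z)
        mean_w = sum(yk * wk for yk, wk in zip(y, w))
        # Chain rule through softmax: dE/dz_k = y_k * (w_k - sum_j y_j w_j).
        grad = [yk * (wk - mean_w) for yk, wk in zip(y, w)]
        # Descent step plus Gaussian noise, which turns the minimizer
        # into a sampler that explores around low-energy regions.
        z = [zk - lr * gk + noise * rng.gauss(0.0, 1.0)
             for zk, gk in zip(z, grad)]
    return softmax(z)


w = [2.0, 0.0, 1.0]                       # energy of each of three labels
y = noisy_gd_sample(w)                    # noisy sample
y_det = noisy_gd_sample(w, noise=0.0)     # noise-free run for comparison
```

With the noise switched off, the procedure reduces to plain gradient descent and concentrates mass on the lowest-energy label; with noise, repeated calls with different seeds return different points near low-energy regions.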

Search-Guided Training

Then we project the sample onto the domain of the reward function (the sample is a point in the simplex, but the domain of the reward function is often discrete, i.e., the vertices of the simplex).
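One simple way to carry out this projection, assuming argmax rounding (the slides do not say which projection was used), is to map each relaxed variable to its largest coordinate. For a point on the simplex, that coordinate identifies the nearest vertex in Euclidean distance. The helper names are hypothetical.

```python
def project_to_vertices(y_relaxed):
    """Map each row (a point on the simplex) to its nearest vertex.

    y_relaxed: list of per-variable distributions, e.g. one per token.
    Returns the index of the largest entry in each row (ties: first index).
    """
    return [max(range(len(row)), key=row.__getitem__) for row in y_relaxed]


def one_hot(indices, num_labels):
    """Write the chosen vertices back as one-hot vectors."""
    return [[1.0 if k == i else 0.0 for k in range(num_labels)]
            for i in indices]


y_relaxed = [[0.9, 0.1], [0.4, 0.6], [0.5, 0.5]]
y_discrete = project_to_vertices(y_relaxed)   # [0, 1, 0]
y_vertices = one_hot(y_discrete, num_labels=2)
```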

Search-Guided Training

Then the search procedure uses the sample as input and returns an output structure by searching the reward function.
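The search step can be sketched as randomized hill climbing on the reward, starting from the projected sample. This particular operator, its budget parameter, and the toy reward are assumptions for illustration; the talk only states that simple search operators were used.

```python
import random


def local_search(x, y_start, reward, num_labels, budget=20, rng=None):
    """Hypothetical search operator: randomized hill climbing on R(x, y).

    Starts from the projected sample y_start and tries `budget`
    single-variable changes, keeping only strict improvements.
    """
    rng = rng or random.Random(0)
    y_best = list(y_start)
    r_best = reward(x, y_best)
    for _ in range(budget):
        i = rng.randrange(len(y_best))            # pick a variable
        candidate = list(y_best)
        candidate[i] = rng.randrange(num_labels)  # propose a new label
        r = reward(x, candidate)
        if r > r_best:
            y_best, r_best = candidate, r
    return y_best, r_best


def agreement_reward(x, y):
    """Toy reward: number of positions where y matches a target stored in x."""
    return sum(1.0 for xi, yi in zip(x, y) if xi == yi)


x = [1, 0, 2, 1]
y_start = [0, 0, 0, 0]
y_found, r_found = local_search(x, y_start, agreement_reward, num_labels=3)
```

Because only improvements are accepted, the returned structure never has a lower reward than the starting sample, which is the property the ranking step relies on.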

Search-Guided Training

We expect the two points to have the same ranking under the reward function and under the negative of the energy function.

[Figure annotation: ranking violation]

Search-Guided Training

When we find a pair of points that violates the ranking constraints, we update the energy function to reduce the violation.
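The update on a violating pair can be sketched with a margin-based hinge loss. The exact loss is not shown on the slides, so the form below is an assumption: the margin is taken proportional to the reward gap (scaled by a hypothetical factor alpha), and the energy is linear in y so that the gradient can be written by hand.

```python
def energy(w, y):
    """Linear energy E_w(y) = w . y (an assumption; SPENs use deep networks)."""
    return sum(wk * yk for wk, yk in zip(w, y))


def rank_update(w, y_a, r_a, y_b, r_b, alpha=1.0, lr=0.1):
    """One update on a pair of outputs with rewards r_a and r_b.

    Returns (new_w, violation). The pair violates the ranking constraint when
    the higher-reward point does not have lower energy by at least the margin
    alpha * (r_h - r_l).
    """
    # Order the pair so that y_h is the point with the higher reward.
    if r_a >= r_b:
        y_h, r_h, y_l, r_l = y_a, r_a, y_b, r_b
    else:
        y_h, r_h, y_l, r_l = y_b, r_b, y_a, r_a
    violation = alpha * (r_h - r_l) + energy(w, y_h) - energy(w, y_l)
    if violation <= 0.0:
        return list(w), 0.0               # ranking already satisfied
    # For a linear energy, the hinge-loss gradient w.r.t. w is (y_h - y_l).
    new_w = [wk - lr * (h - l) for wk, h, l in zip(w, y_h, y_l)]
    return new_w, violation


w = [0.0, 0.0]
y_sample, r_sample = [1.0, 0.0], 0.2      # projected sample and its reward
y_search, r_search = [0.0, 1.0], 0.9      # search result and its reward
w_new, v_before = rank_update(w, y_sample, r_sample, y_search, r_search)
# Re-evaluate the same pair with lr=0.0 to measure the remaining violation.
_, v_after = rank_update(w_new, y_sample, r_sample, y_search, r_search, lr=0.0)
```

After one step the energy of the higher-reward point drops and that of the lower-reward point rises, so the measured violation shrinks; repeating the update drives it to zero for this pair.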

Task-Loss as Reward Function for Multi-Label Classification

• The simplest form of indirect supervision is to use the task loss as the reward function:
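The formula that follows the colon is not preserved in the transcript. As one possible reading, assuming example-level F1 is the task metric for multi-label classification (an assumption, as is the function name), the reward compares a candidate label vector with the ground-truth labels:

```python
def f1_reward(y_pred, y_true):
    """Example-level F1 between predicted and true label sets (0/1 lists)."""
    tp = sum(1 for p, t in zip(y_pred, y_true) if p == 1 and t == 1)
    n_pred = sum(y_pred)
    n_true = sum(y_true)
    if n_pred + n_true == 0:
        return 1.0                     # both empty: treat as a perfect match
    return 2.0 * tp / (n_pred + n_true)


y_true = [1, 0, 1, 0]
r_exact = f1_reward([1, 0, 1, 0], y_true)    # 1.0
r_partial = f1_reward([1, 1, 0, 0], y_true)  # 0.5
```

A reward of this kind needs ground-truth labels, so it corresponds to the supervised end of the spectrum; it is not differentiable in the discrete labels, which is why search rather than gradient ascent is applied to it.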

Domain Knowledge as Reward Function for Citation Field Extraction

[Slides 21 to 24: figures not preserved in the transcript]

Energy Model

[Figure: energy network for citation field extraction. Labeled components: tokens ("Wei Li.", "Deep Learning", "for", ...); input embedding; tag distribution over tags such as author and title (example values between 0.04 and 0.9); convolutional layer with multiple filters and different window sizes (axes labeled "filter size"); max pooling and concatenation; multi-layer perceptron; energy.]

Performance on Citation Field Extraction

Semi-Supervised Setting

• Alternatively, use the output of search and the ground-truth label for training.

Shape Parser

[Figure, built up over slides 28 to 31: an image I is parsed into a program that combines the shapes c(32,32,28), c(32,32,24), and t(32,32,20) with the operators + and -. Later frames add a "Predict" step, several candidate programs, and a "Graphic Engine" block labeled with I and O.]

Shape Parser Energy Model

[Figure: energy network for the shape parser. Labeled components: input image; CNN; program (circle(16,16,12), triangle(32,48,16), +, circle(16,24,12), -); output distribution (example values from 1e-5 to 0.9); convolutional layer; multi-layer perceptron; energy.]

Search Budget vs. Constraints

Performance on Shape Parser

Conclusion and Future Directions

• If a reward function exists to evaluate every structured output into a scalar value
• We can use unlabeled data for training structured prediction energy networks

• Domain knowledge or non-differentiable pipelines can be used to define the reward functions.
• The main ingredient for learning from the reward function is the search operator.
• Here we only use simple search operators, but more complex search functions derived from domain knowledge can be used for complicated problems.