Machine Learning
Introduction to Bayesian Learning

What we have seen so far

What does it mean to learn?
– Mistake-driven learning
  • Learning by counting (and bounding) the number of mistakes
– PAC learnability
  • Sample complexity and bounds on errors on unseen examples

Various learning algorithms
– Analyzed algorithms under these models of learnability
– In all cases, the algorithm outputs a function that produces a label y for a given input x

Coming up

Another way of thinking about “What does it mean to learn?”
– Bayesian learning

Different learning algorithms in this regime
– Naïve Bayes
– Logistic regression

Today’s lecture

• Bayesian learning

• Maximum a posteriori and maximum likelihood estimation

• Two examples of maximum likelihood estimation
– Binomial distribution
– Normal distribution


Probabilistic Learning

Two different notions of probabilistic learning:

• Learning probabilistic concepts
– The learned concept is a function c: X → [0, 1]
– c(x) may be interpreted as the probability that the label 1 is assigned to x
– The learning theory that we have studied before is applicable (with some extensions)

• Bayesian learning: use of a probabilistic criterion in selecting a hypothesis
– The hypothesis can be deterministic, e.g. a Boolean function
– The criterion for selecting the hypothesis is probabilistic



Bayesian Learning: The basics

• Goal: to find the best hypothesis from some space H of hypotheses, using the observed data D

• Define best = most probable hypothesis in H

• In order to do that, we need to assume a probability distribution over the class H

• We also need to know something about the relation between the observed data and the hypotheses
– As we will see in the coming lectures, we can be Bayesian about other things, e.g. the parameters of the model

Bayesian methods have multiple roles

• Provide practical learning algorithms

• Combine prior knowledge with observed data
– Guide the model towards something we know

• Provide a conceptual framework
– For evaluating other learners

• Provide tools for analyzing learning


Bayes Theorem

  P(Y | X) = P(X | Y) P(Y) / P(X)

Short for:

  ∀x, y:  P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
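To make the theorem concrete, here is a minimal numeric sketch (all numbers are invented for illustration): Y is whether a bulb is faulty and X is whether a quick test flags it.

```python
# Hypothetical numbers, for illustration only.
p_y = 0.2              # prior P(Y = faulty)
p_x_given_y = 0.9      # likelihood P(X = flagged | Y = faulty)
p_x_given_not_y = 0.1  # P(X = flagged | Y = ok)

# Evidence P(X = flagged), by total probability.
p_x = p_x_given_y * p_y + p_x_given_not_y * (1 - p_y)

# Bayes rule: P(Y = faulty | X = flagged).
p_y_given_x = p_x_given_y * p_y / p_x
print(round(p_y_given_x, 3))  # -> 0.692
```

Seeing the test flag a bulb raises our belief that it is faulty from 0.2 to about 0.69.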


Bayes Theorem

  P(Y | X) = P(X | Y) P(Y) / P(X)

Posterior probability: what is the probability of Y given that X is observed?

Likelihood: what is the likelihood of observing X given a specific Y?

Prior probability: what is our belief in Y before we see X?

  Posterior ∝ Likelihood × Prior

Probability Refresher

• Product rule: P(A, B) = P(A | B) P(B) = P(B | A) P(A)

• Sum rule: P(A ∨ B) = P(A) + P(B) − P(A, B)

• Events A, B are independent if:
– P(A, B) = P(A) P(B)
– Equivalently, P(A | B) = P(A) and P(B | A) = P(B)

• Theorem of total probability: for mutually exclusive events A1, A2, …, An (i.e. Ai ∩ Aj = ∅) with P(A1) + ⋯ + P(An) = 1,
  P(B) = Σi P(B | Ai) P(Ai)
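These identities can be checked mechanically on a tiny joint distribution; the events and probabilities below are made up for illustration.

```python
from fractions import Fraction as F

# A tiny joint distribution over two binary events A and B (made-up numbers).
joint = {(True, True): F(1, 8), (True, False): F(3, 8),
         (False, True): F(1, 8), (False, False): F(3, 8)}

p_a = sum(p for (a, _), p in joint.items() if a)   # marginal P(A) = 1/2
p_b = sum(p for (_, b), p in joint.items() if b)   # marginal P(B) = 1/4
p_ab = joint[(True, True)]                         # P(A, B)

# Product rule: P(A, B) = P(A | B) P(B).
p_a_given_b = p_ab / p_b
assert p_ab == p_a_given_b * p_b

# Here A and B happen to be independent: P(A, B) = P(A) P(B).
assert p_ab == p_a * p_b

# Total probability: P(A) = P(A | B) P(B) + P(A | not B) P(not B).
p_a_given_notb = joint[(True, False)] / (1 - p_b)
assert p_a == p_a_given_b * p_b + p_a_given_notb * (1 - p_b)
```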

Bayesian Learning

Given a dataset D, we want to find the best hypothesis h.

What does best mean?

Bayesian learning uses P(h | D), the conditional probability of a hypothesis given the data, to define best.


Bayesian Learning

Given a dataset D, we want to find the best hypothesis h. What does best mean?

Posterior probability: what is the probability that h is the hypothesis, given that the data D is observed?

Key insight: both h and D are events.
• D: the event that we observed this particular dataset
• h: the event that the hypothesis h is the true hypothesis

So we can apply the Bayes rule here.


Bayesian Learning

Given a dataset D, we want to find the best hypothesis h. What does best mean?

  P(h | D) = P(D | h) P(h) / P(D)

Posterior probability P(h | D): what is the probability that h is the hypothesis, given that the data D is observed?

Likelihood P(D | h): what is the probability that this data point (an example or an entire dataset) is observed, given that the hypothesis is h?

Prior probability P(h): background knowledge. What do we expect the hypothesis to be even before we see any data? For example, in the absence of any information, maybe the uniform distribution.

Evidence P(D): what is the probability that the data D is observed (independent of any knowledge about the hypothesis)?

Today’s lecture

• Bayesian learning

• Maximum a posteriori and maximum likelihood estimation

• Two examples of maximum likelihood estimation
– Binomial distribution
– Normal distribution


Choosing a hypothesis

Given some data, find the most probable hypothesis
– The maximum a posteriori hypothesis h_MAP:

  h_MAP = argmax_h P(h | D) = argmax_h P(D | h) P(h) / P(D) = argmax_h P(D | h) P(h)

  (Posterior ∝ Likelihood × Prior; the denominator P(D) does not depend on h.)

If we assume that the prior is uniform, i.e. P(hi) = P(hj) for all hi, hj
– Simplify this to get the maximum likelihood hypothesis:

  h_ML = argmax_h P(D | h)

It is often computationally easier to maximize the log likelihood, argmax_h log P(D | h).



Brute force MAP learner

Input: data D and a hypothesis set H
1. Calculate the posterior probability P(h | D) for each h ∈ H
2. Output the hypothesis with the highest posterior probability

This is difficult to compute, except for the most simple hypothesis spaces.
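For a hypothesis space small enough to enumerate, the two steps above can be sketched directly. The setup below (coin-bias hypotheses, a made-up prior, and toy data) is invented for illustration, not taken from the slides.

```python
import math

# Toy brute-force MAP learner.
# Hypotheses: possible values of P(heads); data: observed flip counts.
hypotheses = [0.2, 0.5, 0.8]
prior = {0.2: 0.25, 0.5: 0.5, 0.8: 0.25}  # P(h), made up
heads, tails = 7, 3                       # observed data D

def likelihood(h):
    # P(D | h) for i.i.d. coin flips.
    return math.comb(heads + tails, heads) * h**heads * (1 - h)**tails

# Step 1: posterior score P(D | h) P(h) for each h. The denominator P(D)
# is the same for every h, so it can be ignored when ranking.
scores = {h: likelihood(h) * prior[h] for h in hypotheses}

# Step 2: output the hypothesis with the highest posterior.
h_map = max(scores, key=scores.get)
h_mle = max(hypotheses, key=likelihood)
print(h_map, h_mle)  # -> 0.5 0.8
```

Note the contrast: the data alone favor 0.8 (the ML hypothesis), but the strong prior on 0.5 outweighs ten flips, so the MAP hypothesis is 0.5.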

Today’s lecture

• Bayesian learning

• Maximum a posteriori and maximum likelihood estimation

• Two examples of maximum likelihood estimation
– Bernoulli trials
– Normal distribution

Maximum Likelihood estimation (MLE)

What we need in order to define learning:
1. A hypothesis space H
2. A model that says how the data D is generated given h

Example 1: Bernoulli trials

The CEO of a startup hires you for your first consulting job.

• CEO: My company makes light bulbs. I need to know the probability that they are faulty.

• You: Sure. I can help you out. Are they all identical?

• CEO: Yes!

• You: Excellent. I know how to help. We need to experiment…

Faulty light bulbs

The experiment: try out 100 light bulbs; 80 work, 20 don’t.

You: The probability is P(failure) = 0.2.

CEO: But how do you know?

You: Because…

Bernoulli trials

• P(failure) = p, P(success) = 1 − p

• Each trial is i.i.d.
– Independent and identically distributed

• You have seen D = {80 work, 20 don’t}

• What is the most likely value of p for this observation?

  P(D | p) = C(100, 80) p^20 (1 − p)^80

  argmax_p P(D | p) = argmax_p C(100, 80) p^20 (1 − p)^80

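The argmax can be checked numerically. A quick grid search (a sketch, using the 80-work / 20-fail data above) confirms the likelihood peaks at p = 0.2:

```python
from math import comb

# With 80 working and 20 faulty bulbs, the likelihood
# P(D | p) = C(100, 80) * p^20 * (1 - p)^80 peaks at p = 0.2.
def likelihood(p):
    return comb(100, 80) * p**20 * (1 - p)**80

# Grid search over candidate failure probabilities in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=likelihood)
print(p_hat)  # -> 0.2
```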


The “learning” algorithm

Say you have a Work and b Not-Work outcomes.

Log likelihood:

  p_best = argmax_p P(D | h)
         = argmax_p log P(D | h)
         = argmax_p log [ C(a + b, b) p^b (1 − p)^a ]
         = argmax_p [ b log p + a log(1 − p) ]

Calculus 101: set the derivative to zero.

  p_best = b / (a + b)

The model we assumed is Bernoulli. You could assume a different model! Next we will consider other models and see how to learn their parameters.
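A quick numeric sanity check of the closed form p_best = b / (a + b), a sketch using the light-bulb counts from earlier (a = #Work, b = #Not-Work, p = failure probability):

```python
import math

a, b = 80, 20  # a Work, b Not-Work

def log_likelihood(p):
    # b log p + a log(1 - p); the binomial coefficient does not depend on p.
    return b * math.log(p) + a * math.log(1 - p)

p_closed = b / (a + b)  # the calculus answer
# Direct numeric maximization over a fine grid in (0, 1).
p_grid = max((i / 10000 for i in range(1, 10000)), key=log_likelihood)

print(p_closed, p_grid)  # both ≈ 0.2
```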

Today’s lecture

• Bayesian learning

• Maximum a posteriori and maximum likelihood estimation

• Two examples of maximum likelihood estimation
– Bernoulli trials
– Normal distribution



MaximumLikelihoodandleastsquares

SupposeHconsistsofrealvaluedfunctionsInputsarevectorsx 2 <d andtheoutputisarealnumbery2 <

Supposethetrainingdataisgeneratedasfollows:• Aninputxi isdrawnrandomly(sayuniformlyatrandom)• Thetruefunctionfisappliedtogetf(xi)• Thisvalueisthenperturbedbynoiseei

– DrawnindependentlyaccordingtoanunknownGaussian withzeromean

Saywehavem trainingexamples(xi,yi)generatedbythisprocess

57

Example2:

Maximum Likelihood and least squares

Suppose we have a hypothesis h. We want to know: what is the probability that a particular label yi was generated by this hypothesis as h(xi)?

The error for this example is yi − h(xi).

We believe that this error is from a Gaussian distribution with mean 0 and standard deviation σ.

We can compute the probability of observing one data point (xi, yi), if it were generated using the function h:

  P(yi | xi, h) = (1 / (σ √(2π))) exp( −(yi − h(xi))² / (2σ²) )
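As a sketch in code (the hypothesis, data point, and σ below are all made up for illustration), the density of one observation under h:

```python
import math

# Gaussian density of one label y_i under hypothesis h with noise std sigma.
def p_point(y_i, x_i, h, sigma):
    err = y_i - h(x_i)  # the error for this example
    return math.exp(-err**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

h = lambda x: 2 * x               # a candidate hypothesis (made up)
val = p_point(4.1, 2.0, h, 0.5)   # y_i is close to h(x_i) = 4.0
print(round(val, 3))  # -> 0.782
```

The density is highest when h(xi) is close to yi, so hypotheses that fit the data well make the observations more probable.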


Maximum Likelihood and least squares

Probability of observing one data point (xi, yi), if it were generated using the function h:

  P(yi | xi, h) = (1 / (σ √(2π))) exp( −(yi − h(xi))² / (2σ²) )

Each example in our dataset D = {(xi, yi)} is generated independently by this process, so

  P(D | h) = Πi P(yi | xi, h)



Maximum Likelihood and least squares

Our goal is to find the most likely hypothesis:

  h_ML = argmax_h P(D | h) = argmax_h Πi (1 / (σ √(2π))) exp( −(yi − h(xi))² / (2σ²) )

How do we maximize this expression? Any ideas?

Answer: take the logarithm to simplify.


Maximum Likelihood and least squares

Our goal is to find the most likely hypothesis:

  h_ML = argmax_h log P(D | h)
       = argmax_h Σi [ −log(σ √(2π)) − (yi − h(xi))² / (2σ²) ]
       = argmax_h Σi −(yi − h(xi))²

The first term and the 1/(2σ²) factor can be dropped because we assumed that the standard deviation is a constant.

Maximum Likelihood and least squares

The most likely hypothesis is

  h_ML = argmin_h Σi (yi − h(xi))²

If we consider the set of linear functions as our hypothesis space, h(xi) = wᵀ xi:

  h_ML = argmin_w Σi (yi − wᵀ xi)²

This is the probabilistic explanation for least squares regression.
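The claim can be sketched end to end: generate data by the assumed process and fit a one-dimensional linear hypothesis h(x) = w·x by least squares. The true slope 3, noise level 0.5, and sample size are made up for illustration.

```python
import random

random.seed(0)
m = 200
xs = [random.uniform(-1, 1) for _ in range(m)]
# True function f(x) = 3x, perturbed by zero-mean Gaussian noise.
ys = [3 * x + random.gauss(0, 0.5) for x in xs]

# Least squares fit for h(x) = w*x: setting the derivative of
# sum_i (y_i - w x_i)^2 with respect to w to zero gives the closed form.
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
print(w)  # close to the true slope 3
```

With Gaussian noise of fixed variance, this least squares fit is exactly the maximum likelihood hypothesis for this model.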


Linear regression: Two perspectives

Bayesian perspective:
We believe that the errors are normally distributed with zero mean and a fixed variance.
Find the linear regressor using the maximum likelihood principle.

Loss minimization perspective:
We want to minimize the squared difference between our predictions and the true labels.
Minimize the total squared loss.
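The two perspectives can be compared directly on a toy dataset (all numbers below are invented): the same w wins under both criteria, because the Gaussian log likelihood is a constant minus the squared loss divided by 2σ².

```python
import math

data = [(0.0, 0.1), (1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (x, y) pairs
sigma = 0.5  # assumed fixed noise standard deviation

def log_likelihood(w):
    # Bayesian perspective: Gaussian log likelihood of the data under h(x) = w*x.
    return sum(-math.log(sigma * math.sqrt(2 * math.pi))
               - (y - w * x) ** 2 / (2 * sigma ** 2) for x, y in data)

def squared_loss(w):
    # Loss minimization perspective: total squared loss of h(x) = w*x.
    return sum((y - w * x) ** 2 for x, y in data)

grid = [i / 1000 for i in range(500, 3000)]
w_mle = max(grid, key=log_likelihood)
w_ls = min(grid, key=squared_loss)
assert w_mle == w_ls  # the same grid point wins under both criteria
print(w_mle)
```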

Today’s lecture: Summary

• Bayesian learning
– Another way to ask: what is the best hypothesis for a dataset?
– Two answers to the question: maximum a posteriori (MAP) and maximum likelihood estimation (MLE)

• We saw two examples of maximum likelihood estimation
– Binomial distribution, normal distribution
– You should be able to apply both MAP and MLE to simple problems