Machine Learning
Introduction to Bayesian Learning
What we have seen so far
• What does it mean to learn?
– Mistake-driven learning
• Learning by counting (and bounding) the number of mistakes
– PAC learnability
• Sample complexity and bounds on errors on unseen examples
• Various learning algorithms
– Analyzed algorithms under these models of learnability
– In all cases, the algorithm outputs a function that produces a label y for a given input x
Coming up
Another way of thinking about "What does it mean to learn?"
– Bayesian learning
Different learning algorithms in this regime
– Naïve Bayes
– Logistic Regression
Today's lecture
• Bayesian Learning
• Maximum a posteriori and maximum likelihood estimation
• Two examples of maximum likelihood estimation
– Binomial distribution
– Normal distribution
Probabilistic Learning
Two different notions of probabilistic learning
• Learning probabilistic concepts
– The learned concept is a function c: X → [0, 1]
– c(x) may be interpreted as the probability that the label 1 is assigned to x
– The learning theory that we have studied before is applicable (with some extensions)
• Bayesian Learning: Use of a probabilistic criterion in selecting a hypothesis
– The hypothesis can be deterministic, a Boolean function
– The criterion for selecting the hypothesis is probabilistic
Bayesian Learning: The basics
• Goal: To find the best hypothesis from some space H of hypotheses, using the observed data D
• Define best = most probable hypothesis in H
• In order to do that, we need to assume a probability distribution over the class H
• We also need to know something about the relation between the observed data and the hypotheses
– As we will see, we can be Bayesian about other things, e.g., the parameters of the model
Bayesian methods have multiple roles
• Provide practical learning algorithms
• Combine prior knowledge with observed data
– Guide the model towards something we know
• Provide a conceptual framework
– For evaluating other learners
• Provide tools for analyzing learning
Bayes Theorem

P(Y | X) = P(X | Y) P(Y) / P(X)
Bayes Theorem

P(Y | X) = P(X | Y) P(Y) / P(X)

Short for:
∀x, y: P(Y = y | X = x) = P(X = x | Y = y) P(Y = y) / P(X = x)
Bayes Theorem

P(Y | X) = P(X | Y) P(Y) / P(X)

Posterior probability: What is the probability of Y given that X is observed?
Likelihood: What is the likelihood of observing X given a specific Y?
Prior probability: What is our belief in Y before we see X?
Posterior ∝ Likelihood × Prior
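As a quick numeric sketch of the rule (the probabilities below are made-up illustrative numbers, not from the lecture):

```python
# Bayes rule: P(Y | X) = P(X | Y) P(Y) / P(X), with P(X) computed by
# total probability. Hypothetical setup: Y = "machine is faulty",
# X = "a sampled bulb fails".
p_faulty = 0.1                 # prior P(Y = faulty)
p_fail_given_faulty = 0.5      # likelihood P(X = fail | Y = faulty)
p_fail_given_ok = 0.05         # likelihood P(X = fail | Y = ok)

# Evidence P(X = fail) via the theorem of total probability
p_fail = p_fail_given_faulty * p_faulty + p_fail_given_ok * (1 - p_faulty)

# Posterior P(Y = faulty | X = fail)
posterior = p_fail_given_faulty * p_faulty / p_fail
print(round(posterior, 3))  # 0.05 / 0.095 ≈ 0.526
```

Observing one failure moves the belief that the machine is faulty from 0.1 up to about 0.53: the posterior combines the prior with how much more likely the evidence is under each hypothesis.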
Probability Refresher
• Product rule: P(A, B) = P(A | B) P(B) = P(B | A) P(A)
• Sum rule: P(A ∪ B) = P(A) + P(B) − P(A, B)
• Events A, B are independent if:
– P(A, B) = P(A) P(B)
– Equivalently, P(A | B) = P(A), P(B | A) = P(B)
• Theorem of Total probability: For mutually exclusive events A1, A2, …, An (i.e., Ai ∩ Aj = ∅) with P(A1) + P(A2) + … + P(An) = 1:
P(B) = Σi P(B | Ai) P(Ai)
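These identities can be checked mechanically on a small joint distribution (the table below is made up for illustration):

```python
# A tiny joint distribution over binary events A and B, stored as a dict
# mapping (a, b) -> P(A = a, B = b). The entries sum to 1.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def marginal_A(a):
    return sum(p for (ai, b), p in joint.items() if ai == a)

def marginal_B(b):
    return sum(p for (a, bi), p in joint.items() if bi == b)

def cond_A_given_B(a, b):
    # P(A = a | B = b) = P(A = a, B = b) / P(B = b)
    return joint[(a, b)] / marginal_B(b)

# Product rule: P(A, B) = P(A | B) P(B)
assert abs(joint[(1, 1)] - cond_A_given_B(1, 1) * marginal_B(1)) < 1e-12

# Total probability: P(B = 1) = sum_a P(B = 1 | A = a) P(A = a)
total = sum((joint[(a, 1)] / marginal_A(a)) * marginal_A(a) for a in (0, 1))
assert abs(total - marginal_B(1)) < 1e-12
```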
Bayesian Learning
Given a dataset D, we want to find the best hypothesis h
What does best mean?
Bayesian learning uses P(h | D), the conditional probability of a hypothesis given the data, to define best.
Bayesian Learning
Given a dataset D, we want to find the best hypothesis h. What does best mean?
Posterior probability: What is the probability that h is the hypothesis, given that the data D is observed?
Key insight: Both h and D are events.
• D: The event that we observed this particular dataset
• h: The event that the hypothesis h is the true hypothesis
So we can apply the Bayes rule here:
P(h | D) = P(D | h) P(h) / P(D)
Bayesian Learning
Given a dataset D, we want to find the best hypothesis h. What does best mean?

P(h | D) = P(D | h) P(h) / P(D)

Posterior probability P(h | D): What is the probability that h is the hypothesis, given that the data D is observed?
Likelihood P(D | h): What is the probability that this data point (an example or an entire dataset) is observed, given that the hypothesis is h?
Prior probability P(h): Background knowledge. What do we expect the hypothesis to be even before we see any data? For example, in the absence of any information, maybe the uniform distribution.
Evidence P(D): What is the probability that the data D is observed (independent of any knowledge about the hypothesis)?
Today's lecture
• Bayesian Learning
• Maximum a posteriori and maximum likelihood estimation
• Two examples of maximum likelihood estimation
– Binomial distribution
– Normal distribution
Choosing a hypothesis
Given some data, find the most probable hypothesis
– The Maximum a Posteriori hypothesis h_MAP:
h_MAP = argmax_h P(h | D) = argmax_h P(D | h) P(h) / P(D) = argmax_h P(D | h) P(h)
If we assume that the prior is uniform, i.e. P(hi) = P(hj) for all hi, hj
– Simplify this to get the Maximum Likelihood hypothesis:
h_ML = argmax_h P(D | h)
Often computationally easier to maximize the log likelihood:
h_ML = argmax_h log P(D | h)
Brute force MAP learner
Input: Data D and a hypothesis set H
1. Calculate the posterior probability P(h | D) for each h ∈ H
2. Output the hypothesis with the highest posterior probability
Difficult to compute, except for the simplest hypothesis spaces.
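For a small finite hypothesis space the brute-force learner is easy to write down. A minimal sketch (the hypothesis space, prior, and data here are illustrative choices, not from the lecture):

```python
# Brute-force MAP learner over a finite hypothesis set.
# Since P(D) is the same for every h, argmax over P(D | h) P(h) suffices.
from math import prod

def map_learner(hypotheses, prior, likelihood, data):
    """Return the h in `hypotheses` maximizing P(D | h) P(h)."""
    return max(hypotheses, key=lambda h: likelihood(data, h) * prior(h))

# Hypothetical finite hypothesis space: candidate failure probabilities
H = [0.1, 0.2, 0.3, 0.5]
uniform_prior = lambda h: 1.0 / len(H)

# i.i.d. Bernoulli likelihood: x = 1 is a failure (prob h), x = 0 a success
bernoulli = lambda data, h: prod(h if x == 1 else 1 - h for x in data)

data = [1] * 20 + [0] * 80   # 20 failures, 80 successes
print(map_learner(H, uniform_prior, bernoulli, data))  # 0.2
```

With a uniform prior this reduces to maximum likelihood, so the learner picks the candidate closest to the empirical failure rate 20/100.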
Today's lecture
• Bayesian Learning
• Maximum a posteriori and maximum likelihood estimation
• Two examples of maximum likelihood estimation
– Bernoulli trials
– Normal distribution
Maximum Likelihood estimation (MLE)
What we need in order to define learning:
1. A hypothesis space H
2. A model that says how data D is generated given h
Example 1: Bernoulli trials
The CEO of a startup hires you for your first consulting job
• CEO: My company makes light bulbs. I need to know what is the probability they are faulty.
• You: Sure. I can help you out. Are they all identical?
• CEO: Yes!
• You: Excellent. I know how to help. We need to experiment…
Faulty lightbulbs
The experiment: Try out 100 lightbulbs; 80 work, 20 don't
You: The probability is P(failure) = 0.2
CEO: But how do you know?
You: Because…
Bernoulli trials
• P(failure) = p, P(success) = 1 − p
• Each trial is i.i.d.
– Independent and identically distributed
• You have seen D = {80 work, 20 don't}
• The most likely value of p for this observation is?

P(D | p) = C(100, 80) p^20 (1 − p)^80

argmax_p P(D | p) = argmax_p C(100, 80) p^20 (1 − p)^80
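This likelihood can be evaluated directly for a few candidate values of p; among them it peaks at the empirical failure rate:

```python
# Binomial likelihood P(D | p) = C(100, 80) p^20 (1 - p)^80
# for D = 20 failures out of 100 trials, p = P(failure).
from math import comb

def likelihood(p, failures=20, successes=80):
    n = failures + successes
    return comb(n, successes) * p**failures * (1 - p)**successes

for p in (0.1, 0.2, 0.3):
    print(p, likelihood(p))

best = max((0.1, 0.2, 0.3), key=likelihood)  # 0.2
```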
The "learning" algorithm
Say you have a Work and b Not-Work

p_best = argmax_p P(D | h)
       = argmax_p log P(D | h)     (log likelihood)
       = argmax_p log [ C(a + b, b) p^b (1 − p)^a ]
       = argmax_p [ b log p + a log(1 − p) ]

Calculus 101: Set the derivative to zero:
b/p − a/(1 − p) = 0  ⇒  p_best = b/(a + b)

The model we assumed is Bernoulli. You could assume a different model! Next we will consider other models and see how to learn their parameters.
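The closed form can be sanity-checked against a direct numeric maximization of the log likelihood:

```python
# MLE for a Bernoulli model: closed form p = b / (a + b) versus a grid
# search over the log likelihood b*log(p) + a*log(1 - p).
from math import log

def log_likelihood(p, a, b):
    # a successes ("work"), b failures ("not work"); p = P(failure)
    return b * log(p) + a * log(1 - p)

a, b = 80, 20
closed_form = b / (a + b)                      # 0.2

# Grid search over p in (0, 1); the grid contains 0.2 exactly
grid = [i / 1000 for i in range(1, 1000)]
numeric = max(grid, key=lambda p: log_likelihood(p, a, b))
print(closed_form, numeric)  # 0.2 0.2
```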
Today's lecture
• Bayesian Learning
• Maximum a posteriori and maximum likelihood estimation
• Two examples of maximum likelihood estimation
– Bernoulli trials
– Normal distribution
Example 2: Maximum Likelihood and least squares
Suppose H consists of real-valued functions
Inputs are vectors x ∈ ℝ^d and the output is a real number y ∈ ℝ
Suppose the training data is generated as follows:
• An input x_i is drawn randomly (say, uniformly at random)
• The true function f is applied to get f(x_i)
• This value is then perturbed by noise e_i
– Drawn independently according to an unknown Gaussian with zero mean
Say we have m training examples (x_i, y_i) generated by this process
Example 2: Maximum Likelihood and least squares
Suppose we have a hypothesis h. We want to know: what is the probability that a particular label y_i was generated by this hypothesis as h(x_i)?
The error for this example is y_i − h(x_i)
We believe that this error is from a Gaussian distribution with mean = 0 and standard deviation = σ
We can compute the probability of observing one data point (x_i, y_i), if it were generated using the function h:

P(y_i | x_i, h) = (1 / (σ√(2π))) exp(−(y_i − h(x_i))² / (2σ²))
Example 2: Maximum Likelihood and least squares
Probability of observing one data point (x_i, y_i), if it were generated using the function h:

P(y_i | x_i, h) = (1 / (σ√(2π))) exp(−(y_i − h(x_i))² / (2σ²))

Each example in our dataset D = {(x_i, y_i)} is generated independently by this process:

P(D | h) = ∏_{i=1}^m (1 / (σ√(2π))) exp(−(y_i − h(x_i))² / (2σ²))
Example 2: Maximum Likelihood and least squares
Our goal is to find the most likely hypothesis:

h_ML = argmax_h P(D | h) = argmax_h ∏_{i=1}^m (1 / (σ√(2π))) exp(−(y_i − h(x_i))² / (2σ²))

How do we maximize this expression? Any ideas?
Answer: Take the logarithm to simplify.
Example 2: Maximum Likelihood and least squares
Our goal is to find the most likely hypothesis:

h_ML = argmax_h Σ_{i=1}^m [ −log(σ√(2π)) − (y_i − h(x_i))² / (2σ²) ]
     = argmax_h Σ_{i=1}^m [ −(y_i − h(x_i))² ]
     = argmin_h Σ_{i=1}^m (y_i − h(x_i))²

Why can we drop σ and the constant term? Because we assumed that the standard deviation is a constant.
Example 2: Maximum Likelihood and least squares
The most likely hypothesis is:

h_ML = argmin_h Σ_{i=1}^m (y_i − h(x_i))²

If we consider the set of linear functions as our hypothesis space, h(x_i) = w^T x_i:

w_ML = argmin_w Σ_{i=1}^m (y_i − w^T x_i)²

This is the probabilistic explanation for least squares regression.
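The equivalence can be demonstrated end to end on synthetic data (generated inside the script, not from the lecture); one common way to solve the squared-loss minimization for linear h is the normal equations:

```python
# Maximum likelihood under zero-mean Gaussian noise = least squares.
# Synthetic data: y_i = w^T x_i + e_i with e_i ~ N(0, 0.1^2).
import numpy as np

rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
X = rng.normal(size=(100, 2))                      # m = 100 inputs in R^2
y = X @ w_true + rng.normal(scale=0.1, size=100)   # noisy labels

# Minimize sum_i (y_i - w^T x_i)^2 via the normal equations:
# (X^T X) w = X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat)  # close to [2.0, -1.0]
```

Because the noise is small and m = 100, the recovered weights land close to the true generating weights.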
Linear regression: Two perspectives
Bayesian perspective: We believe that the errors are Normally distributed with zero mean and a fixed variance. Find the linear regressor using the maximum likelihood principle.
Loss minimization perspective: We want to minimize the squared difference between our predictions and the true labels. Minimize the total squared loss.
Today's lecture: Summary
• Bayesian Learning
– Another way to ask: What is the best hypothesis for a dataset?
– Two answers to the question: Maximum a posteriori (MAP) and maximum likelihood estimation (MLE)
• We saw two examples of maximum likelihood estimation
– Binomial distribution, normal distribution
– You should be able to apply both MAP and MLE to simple problems