Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova©...

35
Naïve Bayes Robot Image Credit: Viktoriya Sukhanova © 123RF.com These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Transcript of Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova©...

Page 1: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Naïve Bayes

Robot Image Credit: Viktoriya Sukhanova © 123RF.com

These slides were assembled by Byron Boots, with only minor modifications from Eric Eaton’s slides and grateful acknowledgement to the many others who made their course materials freely available online. Feel free to reuse or adapt these slides for your own academic purposes, provided that you include proper attribution.

Page 2: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Bayes’ Rule• Recall Baye’s Rule:

• Equivalently, we can write:

where X is a random variable representing the evidence and Y is a random variable for the label (our hypothesis)

• This is actually short for:

where Xj denotes the random variable for the j th feature2

P (hypothesis | evidence) = P (evidence | hypothesis)⇥ P (hypothesis)

P (evidence)

P (Y = yk | X = xi) =P (Y = yk)P (X1 = xi,1 ^ . . . ^Xd = xi,d | Y = yk)

P (X1 = xi,1 ^ . . . ^Xd = xi,d)

P (Y = yk | X = xi) =P (Y = yk)P (X = xi | Y = yk)

P (X = xi)

Page 3: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Naïve Bayes ClassifierIdea: Use the training data to estimate

Then, use Bayes rule to infer for new data

• Estimating the joint probability distribution is not practical

3

P (X | Y ) P (Y )and .P (Y |Xnew)

P (Y = yk | X = xi) =P (Y = yk)P (X1 = xi,1 ^ . . . ^Xd = xi,d | Y = yk)

P (X1 = xi,1 ^ . . . ^Xd = xi,d)

P (X1, X2, . . . , Xd | Y )

Easy to estimate from data

Unnecessary, as it turns out

Impractical, but necessary

Page 4: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Naïve Bayes ClassifierProblem: estimating the joint PD or CPD isn’t practical– Can severely overfit, as we saw before

However, if we make the assumption that the attributes are independent given the class label, estimation is easy!

• In other words, we assume all attributes are conditionally independent given Y

• Often this assumption is violated in practice, but more on that later…

4

P (X1, X2, . . . , Xd | Y ) =dY

j=1

P (Xj | Y )

Page 5: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = ? P(¬play) = ?P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

5

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 6: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = ? P(¬play) = ?P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

6

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 7: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

7

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 8: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = ? P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

8

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 9: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

9

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 10: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

10

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 11: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

11

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 12: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

12

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 13: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = ?... ...

13

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 14: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = ?... ...

14

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 15: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve BayesEstimate and directly from the training data by counting!

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 1 P(Sky = sunny | ¬play) = 0P(Humid = high | play) = 2/3 P(Humid = high | ¬play) = 1... ...

15

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 16: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Laplace Smoothing• Notice that some probabilities estimated by counting

might be zero– Possible overfitting!

• Fix by using Laplace smoothing:– Adds 1 to each count

where – cv is the count of training instances with a value of v for

attribute j and class label yk

– |values(Xj)| is the number of values Xj can take on16

P (Xj = v | Y = yk) =cv + 1X

v02values(Xj)

cv0 + |values(Xj)|

Page 17: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve Bayes with Laplace SmoothingEstimate and directly from the training data by counting with Laplace smoothing:

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = ?P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

17

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 18: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve Bayes with Laplace SmoothingEstimate and directly from the training data by counting with Laplace smoothing:

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = ? P(Humid = high | ¬play) = ?... ...

18

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 19: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve Bayes with Laplace SmoothingEstimate and directly from the training data by counting with Laplace smoothing:

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = 3/5 P(Humid = high | ¬play) = ?... ...

19

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 20: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Training Naïve Bayes with Laplace SmoothingEstimate and directly from the training data by counting with Laplace smoothing:

P(play) = 3/4 P(¬play) = 1/4P(Sky = sunny | play) = 4/5 P(Sky = sunny | ¬play) = 1/3P(Humid = high | play) = 3/5 P(Humid = high | ¬play) = 2/3... ...

20

Sky Temp Humid Wind Water Forecast Play?sunny warm normal strong warm same yessunny warm high strong warm same yesrainy cold high strong warm change nosunny warm high strong cool change yes

P (Xj | Y ) P (Y )

Page 21: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Using the Naïve Bayes Classifier• Now, we have

– In practice, we use log-probabilities to prevent underflow

• To classify a new point x,

21

P (Y = yk | X = xi) =P (Y = yk)

Qdj=1 P (Xj = xi,j | Y = yk)

P (X = xi)This is constant for a given instance, and so irrelevant to our prediction

h(x) = argmaxyk

P (Y = yk)dY

j=1

P (Xj = xj | Y = yk)

= argmaxyk

logP (Y = yk) +dX

j=1

logP (Xj = xj | Y = yk)

j th attribute value of x

h(x) = argmaxyk

P (Y = yk)dY

j=1

P (Xj = xj | Y = yk)

= argmaxyk

logP (Y = yk) +dX

j=1

logP (Xj = xj | Y = yk)

Page 22: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

22

The Naïve Bayes Classifier Algorithm

• For each class label yk– Estimate P(Y = yk) from the data– For each value xi,j of each attribute Xi

• Estimate P(Xi = xi,j | Y = yk)

• Classify a new point via:

• In practice, the independence assumption doesn’t often hold true, but Naïve Bayes performs very well despite this

h(x) = argmaxyk

logP (Y = yk) +dX

j=1

logP (Xj = xj | Y = yk)

Page 23: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

The Naïve Bayes Graphical Model

• Nodes denote random variables• Edges denote dependency• Each node has an associated conditional probability

table (CPT), conditioned upon its parents23

Attributes (evidence)

Labels (hypotheses)

Y

X1 Xi Xd…

P(Y)

P(X1 | Y) P(Xi | Y) P(Xd | Y)

Page 24: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

24

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Data:

Page 25: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

25

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yesno

Data:

Page 26: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

26

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Data:

Page 27: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

27

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky | Play)sunny yesrainy yessunny norainy no

Data:

Page 28: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

28

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky | Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Data:

Page 29: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

29

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky | Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp | Play)warm yescold yes

warm nocold no

Data:

Page 30: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

30

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky | Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp | Play)warm yes 4/5cold yes 1/5

warm no 1/3cold no 2/3

Data:

Page 31: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

31

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky | Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp | Play)warm yes 4/5cold yes 1/5

warm no 1/3cold no 2/3

Humid Play? P(Humid | Play)high yesnorm yeshigh nonorm no

Data:

Page 32: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example NB Graphical Model

32

Play

Sky Temp Humid…

Sky Temp Humid Play?sunny warm normal yessunny warm high yesrainy cold high nosunny warm high yes

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky | Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp | Play)warm yes 4/5cold yes 1/5

warm no 1/3cold no 2/3

Humid Play? P(Humid | Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3

Data:

Page 33: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example Using NB for Classification

Goal: Predict label for x = (rainy, warm, normal)33

Play

Sky Temp Humid

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky | Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp | Play)warm yes 4/5cold yes 1/5

warm no 1/3cold no 2/3

Humid Play? P(Humid | Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3

h(x) = argmaxyk

logP (Y = yk) +dX

j=1

logP (Xj = xj | Y = yk)

Page 34: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Example Using NB for Classification

Predict label for: x = (rainy, warm, normal)

34

Play

Sky Temp Humid

Play? P(Play)yes 3/4no 1/4

Sky Play? P(Sky | Play)sunny yes 4/5rainy yes 1/5sunny no 1/3rainy no 2/3

Temp Play? P(Temp | Play)warm yes 4/5cold yes 1/5

warm no 1/3cold no 2/3

Humid Play? P(Humid | Play)high yes 3/5norm yes 2/5high no 2/3norm no 1/3

predict PLAY

Page 35: Naïve Bayes - University of Washington · Naïve Bayes Robot Image Credit: ViktoriyaSukhanova© 123RF.com These slides were assembled by Byron Boots, with only minor modifications

Naïve Bayes SummaryAdvantages:• Fast to train (single scan through data)• Fast to classify • Not sensitive to irrelevant features• Can handle real-valued data (e.g. fit Gaussian for

each attribute)• Handles streaming data well

Disadvantages:• Assumes independence of features

35Slide by Eamonn Keogh