Data Mining and Machine Learning Naive Bayes David Corne, HWU dwcorne@gmail.com.


Data Mining and Machine Learning

Naive Bayes
David Corne, HWU

dwcorne@gmail.com

A very simple dataset – one field / one class

P34 level   Prostate cancer
High        Y
Medium      Y
Low         Y
Low         N
Low         N
Medium      N
High        Y
High        N
Low         N
Medium      Y


A new patient has a blood test – his P34 level is HIGH.

What is our best guess for prostate cancer?


It’s useful to know: P(cancer = Y)


- on the basis of this tiny dataset, P(c = Y) is 5/10 = 0.5


But we know that P34 = H, so actually we want: P(cancer = Y | P34 = H)

- the probability that cancer is Y, given that P34 is high


- this seems to be 2/3 ≈ 0.67


So we have:

P(c = Y | P34 = H) = 0.67
P(c = N | P34 = H) = 0.33

The class value with the highest probability is our best guess.
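The counting above can be sketched in a few lines of Python; this is a minimal illustration using the ten rows of the toy dataset, not part of the original slides:

```python
# Toy dataset from the slides: (P34 level, prostate cancer class).
from collections import Counter

data = [("High", "Y"), ("Medium", "Y"), ("Low", "Y"), ("Low", "N"),
        ("Low", "N"), ("Medium", "N"), ("High", "Y"), ("High", "N"),
        ("Low", "N"), ("Medium", "Y")]

# Restrict to rows where P34 = High, then count the class values there.
high = [c for p34, c in data if p34 == "High"]
counts = Counter(high)
p_y = counts["Y"] / len(high)   # P(c = Y | P34 = H) = 2/3
p_n = counts["N"] / len(high)   # P(c = N | P34 = H) = 1/3
print(round(p_y, 2), round(p_n, 2))  # 0.67 0.33
```

The highest of the two probabilities picks the guess, exactly as on the slide.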

In general we may have any number of class values

P34 level   Prostate cancer
High        Y
Medium      Y
Low         Y
Low         N
Low         N
Medium      N
High        Y
High        N
High        Maybe
Medium      Y

Suppose again we know that P34 is High; here we have:

P(c = Y | P34 = H) = 0.5
P(c = N | P34 = H) = 0.25
P(c = Maybe | P34 = H) = 0.25

... and again, Y is the winner

That is the essence of Naive Bayes, but the probability calculations are much trickier when there is more than one field, so we make a ‘naive’ assumption that makes it simpler.

Bayes’ theorem

As we saw, from the table we computed: P(cancer = Y | P34 = H)

Now consider the reverse: P(P34 = H | cancer = Y)

This is a different thing, which turns out to be 2/5 = 0.4.

Bayes’ theorem is this:

P(A | B) = P(B | A) × P(A) / P(B)

It is very useful when it is hard to get P(A | B) directly, but easier to get the things on the right
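The theorem can be checked on the toy numbers: counting directly from the table gives P(P34=H | Y) = 2/5, P(Y) = 5/10 and P(P34=H) = 3/10, and the theorem recovers the 2/3 we counted earlier. A quick sketch (the numbers are those counted from the slides' dataset):

```python
# Bayes' theorem on the toy numbers: P(Y | H) = P(H | Y) * P(Y) / P(H).
p_h_given_y = 2 / 5   # 2 of the 5 cancer=Y rows have P34 = High
p_y = 5 / 10          # 5 of the 10 rows have cancer = Y
p_h = 3 / 10          # 3 of the 10 rows have P34 = High

p_y_given_h = p_h_given_y * p_y / p_h
print(round(p_y_given_h, 2))  # 0.67 -- matches the direct count of 2/3
```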

Bayes’ theorem in 1-non-class-field DMML context:

P(Class = X | Fieldval = F) = P(Fieldval = F | Class = X) × P(Class = X) / P(Fieldval = F)


We want to check this for each class and choose the class that gives the highest value.


E.g. we compare:

P(F | Yes) × P(Yes)
P(F | No) × P(No)
P(F | Maybe) × P(Maybe)

... We can ignore the P(Fieldval = F) bit ... why? Because it is the same for every class, so it cannot change which one wins.

And that is exactly how we do Naive Bayes for a 1-field dataset.

Note how this relates to our beloved histograms

[Histogram over P34 values (Low, Med, High), with bars such as P(L | N) and P(H | Y) on a 0–0.6 probability axis]

Naive Bayes with many fields

P34 level   P61 level   BMI     Prostate cancer
High        Low         Medium  Y
Medium      Low         Medium  Y
Low         Low         High    Y
Low         High        Low     N
Low         Low         Low     N
Medium      Medium      Low     N
High        Low         Medium  Y
High        Medium      Low     N
Low         Low         High    N
Medium      High        High    Y


New patient: P34 = M, P61 = M, BMI = H

Best guess at the cancer field?


P(p34=M | Y) × P(p61=M | Y) × P(BMI=H | Y) × P(cancer = Y)
P(p34=M | N) × P(p61=M | N) × P(BMI=H | N) × P(cancer = N)

which of these gives the highest value?



For Y: 0.4 × 0 × 0.4 × 0.5 = 0
For N: 0.2 × 0.4 × 0.2 × 0.5 = 0.008

Which of these gives the highest value? The N line – so our best guess is N.
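The whole many-field calculation fits in a short sketch; the rows and the new patient below are the ones from the table above, and `nb_score` is an illustrative helper, not part of the original slides:

```python
# Toy dataset from the slides; columns: P34, P61, BMI, class.
rows = [
    ("H", "L", "M", "Y"), ("M", "L", "M", "Y"), ("L", "L", "H", "Y"),
    ("L", "H", "L", "N"), ("L", "L", "L", "N"), ("M", "M", "L", "N"),
    ("H", "L", "M", "Y"), ("H", "M", "L", "N"), ("L", "L", "H", "N"),
    ("M", "H", "H", "Y"),
]
new = ("M", "M", "H")  # the new patient: P34=M, P61=M, BMI=H

def nb_score(cls):
    """P(F1|cls) * P(F2|cls) * P(F3|cls) * P(cls), counted from the rows."""
    in_class = [r for r in rows if r[3] == cls]
    score = len(in_class) / len(rows)           # P(cls)
    for i, v in enumerate(new):                 # one factor per field
        score *= sum(r[i] == v for r in in_class) / len(in_class)
    return score

print(nb_score("Y"), nb_score("N"))  # Y scores 0, N scores ~0.008 -- N wins
```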

Naive Bayes – in general

N fields, q possible class values; new unclassified instance: F1 = v1, F2 = v2, ..., Fn = vn.

What is the class value? I.e. is it c1, c2, ..., or cq?

Calculate each of these q things – the biggest one gives the class:

P(F1=v1 | c1) × P(F2=v2 | c1) × ... × P(Fn=vn | c1) × P(c1)
P(F1=v1 | c2) × P(F2=v2 | c2) × ... × P(Fn=vn | c2) × P(c2)
...
P(F1=v1 | cq) × P(F2=v2 | cq) × ... × P(Fn=vn | cq) × P(cq)

Naive Bayes – in general

Actually ... what we normally do, when there are more than a handful of fields, is this.

Calculate:

log(P(F1=v1 | c1)) + ... + log(P(Fn=vn | c1)) + log(P(c1))
log(P(F1=v1 | c2)) + ... + log(P(Fn=vn | c2)) + log(P(c2))
...

and choose class based on highest of these. Why ?

Naive Bayes – in general

Because

log(a × b × c × …) = log(a) + log(b) + log(c) + …

and this means we won’t get underflow errors, which we would otherwise get with, e.g.

0.003 × 0.000296 × 0.001 × … [100 fields] × 0.042 …
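The underflow is easy to demonstrate; this small sketch (the 0.003 likelihood and the 200-field count are made up for illustration) shows the product collapsing to zero while the log-sum stays usable:

```python
# Multiplying a few hundred small probabilities underflows double-precision
# floats to zero, but summing their logs stays well-behaved.
import math

probs = [0.003] * 200          # e.g. 200 fields, each with a small likelihood

product = math.prod(probs)                     # underflows to 0.0
log_sum = sum(math.log(p) for p in probs)      # about -1162, still comparable

print(product)   # 0.0 -- every class would score 0, so we could not choose
print(log_sum)   # a finite negative number we can still rank between classes
```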

Deriving NB

The essence of Naive Bayes, with one non-class field, is to calculate this for each class value, given some new instance with fieldval = F:

P(class = C | Fieldval = F)

For many fields, our new instance is (e.g.) (F1, F2, ..., Fn), and the ‘essence of Naive Bayes’ is to calculate this for each class:

P(class = C | F1,F2,F3,...,Fn)

i.e. what is the probability of class C, given all these field values together?

Apply magic dust and Bayes’ theorem ... and if we make the naive assumption that all of the fields are independent of each other (e.g. P(F1 | F2) = P(F1), etc.), then:

P(class = C | F1,F2,...,Fn)
∝ P(F1 and F2 and ... and Fn | C) × P(C)
= P(F1 | C) × P(F2 | C) × ... × P(Fn | C) × P(C)

which is what we calculate in NB.

EN(B)D