Classification & preduction

Classification and prediction

Prepared By - Mr. Nilesh Magar

•What Is Classification? •Example•Two Step Process:

Learning Step: Training set made up of DB tuples & their associated class labels- Classification Rule or

Decision Tree or mathematical Formulae Classification Step:

•Supervised Learning:•Accuracy of the classifier:


Class label Attribute

Classifier


Decision Tree

•Between 1970-1980 J. Rose Quinlan, a researcher in Machine Learning developed a decision tree algorithm known as ID3 (Iterative dichotomiser), C4.5 is the succesor of ID3.•CART(Classification & Regression tree is also developed during the same period which describe the generation of binary tree. • Flowchart like tree structure- root, Node, Branch, leaf node.

•How are decision trees used for classification?Prepared By - Mr. Nilesh Magar

3 Termination Condition

3 Splitting scenarios

3 Attribute Selection methods


Splitting Scenarios

1) A is Discrete value 2) A is continuous Valued

3) Discrete Value but Binary tree must be produced


Termination Condition : Recursive

1. All of the tuples in partition D (represented at node N) belong to the same class

(steps 2 and 3), or

2. There are no remaining attributes on which the tuples may be further partitioned

(step 4). In this case, majority voting is employed (step 5). This involves converting

node N into a leaf and labeling it with the most common class in D. Alternatively,

the class distribution of the node tuples may be stored.

3. There are no tuples for a given branch, that is, a partition Dj is empty (step 12).

In this case, a leaf is created with the majority class in D (step 13).


Attribute Selection Measures:

1. Information Gain:

2. Gain Ratio:

3. Gini Index


Performance:•Quite simple, suitable for relatively small data sets

•Large real-world databases?

•Training tuples should reside in main memory

Issues:

•Over fitting

Tree pruning

1. Pre-pruning2. Post-pruning


Bayes Classification Method

•Statistical Classifier

•They use to predict class membership probability

•Based on Bayes’ Theorm

•Naïve

•It assumes “effects of an attribute value on a given class is independent

of the value of the other attributes” – class condition independence

•The name bayes is taken from the name thomas Bayes who did early

work in probability and decision theoryduring 18th century.


•Let X is data tuple “evidence” & H is hypothesis that X belongs to specific

class.

•Determine P(H|X):

•Posterior probability: P(H|X), tuple X contains customers attribute age=35

& salary=40,000 , H customer will buy a computer.

•Prior Probability: P(H)

•P(X|H) :

•P(X)

•Bayesian Theorem:

P(H|X) = P(X|H) P(H) / P(X)

Bayesian Theorem


Naïve Bayesian classifier:

Suppose there are m classes, C1, C2, …..,Cm. Given a tuple, X, the classifier will predict that X belongs to the class having the highest posterior probability, conditioned on X. X belong to Ci If & only if

P(Ci|X)>P(Cj|X) for 1<= j <= m, j!=I

So Bayes theorem is

P(Ci|X) = P(X|Ci) P(Ci) / P(X)

As P(X) is constant for all classes so only P(X|Ci) P(Ci) need to be maximize, If class prior

probability is not known then P(C1) = P(C2) = …… = P(Cm) so only P(X|Ci) need to maximize.

But maximization of P(X|Ci) is computationally expensive so we will apply Class conditional independence,


Example:


Prediction

Regression Analysis Can be used to model the relationship between 2 variables.Predictor Variable: The values of the predictor variables are known.Response variable: The response variable is what we want to predict.

Linear regression: y = b+wx;

y = w0+w1x


Animal height (feet) weight (lbs)

Animal1 9 300

Animal2 8.78 295

Animal3 9.6 312

Animal4 8.09 280

Animal5 5 200

Animal6 5.5 250

Animal7 5.42 230

Animal8 5.75 250

Example


Given the above data, we compute

= 7.15 and = 264.7

(9-7.15)(300–264.7)+(8.78–7.15)(295–264.7)+(9.6–7.15)(312–264.7)+………+(5.75-7.15)(250–264.7)

W1=

(9 – 7.15)2 + (8.78 – 7.15 ) 2 +……… (5.75-7.15) 2

= 19.35337Let w 0 = 264.7 – (19.35337)(7.15)

= 126.3234y = 126.3234 + 19.35337x. Using this equation, we can predict that the Animal with 8 feet height can have 281.1504 lbs weight.( 126.3234 + 19.35337(8))


Subjects

1) U.M.L.2) P.P.L.3) D.M.D.W.4) O.S.5) Programming Languages6) RDBMS

Mr. Nilesh MagarLecturer at MIT, Kothrud, Pune.9975155310.


Thank You


Classification & preduction

Documents

Transcript of Classification & preduction