Classification & Prediction
•What Is Classification?
•Example
•Two Step Process:
Learning Step: a training set made up of database tuples and their associated class labels is analyzed to build the classifier, represented as classification rules, a decision tree, or mathematical formulae.
Classification Step:
•Supervised Learning:
•Accuracy of the classifier:
Prepared By - Mr. Nilesh Magar
Decision Tree
•In the late 1970s and early 1980s, J. Ross Quinlan, a researcher in machine learning, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser). C4.5 is the successor of ID3.
•CART (Classification and Regression Trees), which describes the generation of binary trees, was also developed around the same period.
•Flowchart-like tree structure: root node, internal nodes, branches, leaf nodes.
•How are decision trees used for classification?
3 Termination Conditions
3 Splitting Scenarios
3 Attribute Selection Measures
Splitting Scenarios
1) A is discrete-valued
2) A is continuous-valued
3) A is discrete-valued, but a binary tree must be produced
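The three splitting scenarios can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the toy rows, the split point 40, and the subset {"low", "medium"} are assumptions made for the example, not part of the original slides.

```python
from collections import defaultdict

def split_discrete(rows, attr):
    """Scenario 1: A is discrete-valued, one branch per distinct value."""
    branches = defaultdict(list)
    for row in rows:
        branches[row[attr]].append(row)
    return dict(branches)

def split_continuous(rows, attr, split_point):
    """Scenario 2: A is continuous-valued, two branches around a split point."""
    return {
        "<=": [r for r in rows if r[attr] <= split_point],
        ">": [r for r in rows if r[attr] > split_point],
    }

def split_binary_subset(rows, attr, subset):
    """Scenario 3: A is discrete but the tree must be binary, so test 'A in subset'."""
    return {
        "in": [r for r in rows if r[attr] in subset],
        "not in": [r for r in rows if r[attr] not in subset],
    }

# Hypothetical toy tuples, invented for illustration only.
rows = [
    {"income": "low", "age": 25},
    {"income": "medium", "age": 35},
    {"income": "high", "age": 45},
    {"income": "low", "age": 52},
]

by_income = split_discrete(rows, "income")
by_age = split_continuous(rows, "age", 40)
by_subset = split_binary_subset(rows, "income", {"low", "medium"})

print({k: len(v) for k, v in by_income.items()})
print({k: len(v) for k, v in by_age.items()})
print({k: len(v) for k, v in by_subset.items()})
```

How the split point or the subset is chosen depends on the attribute selection measure in use; here both are simply passed in.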
Termination Conditions: recursive partitioning stops when any one of the following holds
1. All of the tuples in partition D (represented at node N) belong to the same class
(steps 2 and 3), or
2. There are no remaining attributes on which the tuples may be further partitioned
(step 4). In this case, majority voting is employed (step 5). This involves converting
node N into a leaf and labeling it with the most common class in D. Alternatively,
the class distribution of the node tuples may be stored.
3. There are no tuples for a given branch, that is, a partition Dj is empty (step 12).
In this case, a leaf is created with the majority class in D (step 13).
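A minimal sketch of where the three termination conditions sit inside a recursive tree builder is given below. It assumes discrete attributes only, and it picks the next attribute in list order as a stand-in for a real attribute selection measure. The names `build_tree`, `majority_class`, `domains`, and the toy data are hypothetical.

```python
from collections import Counter

def majority_class(rows):
    """Most common class label among the given tuples (majority voting)."""
    return Counter(r["label"] for r in rows).most_common(1)[0][0]

def build_tree(rows, attributes, domains):
    classes = {r["label"] for r in rows}
    # Condition 1: all tuples in the partition belong to the same class.
    if len(classes) == 1:
        return classes.pop()
    # Condition 2: no remaining attributes, so label the leaf by majority voting.
    if not attributes:
        return majority_class(rows)
    # Placeholder choice: a real builder would use an attribute selection measure.
    attr, remaining = attributes[0], attributes[1:]
    node = {"attr": attr, "branches": {}}
    for value in domains[attr]:
        subset = [r for r in rows if r[attr] == value]
        if not subset:
            # Condition 3: empty partition, leaf gets the majority class of the parent.
            node["branches"][value] = majority_class(rows)
        else:
            node["branches"][value] = build_tree(subset, remaining, domains)
    return node

# Hypothetical toy data, invented for illustration only.
rows = [
    {"student": "yes", "credit": "fair", "label": "buy"},
    {"student": "yes", "credit": "good", "label": "buy"},
    {"student": "no", "credit": "fair", "label": "no"},
    {"student": "no", "credit": "fair", "label": "buy"},
    {"student": "no", "credit": "fair", "label": "no"},
]
domains = {"student": ["yes", "no"], "credit": ["fair", "good", "poor"]}

tree = build_tree(rows, ["student", "credit"], domains)
print(tree)
```

In this toy run the "yes" branch stops by condition 1, the "fair" branch under "no" stops by condition 2, and the "good" and "poor" branches stop by condition 3.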
Attribute Selection Measures:
1. Information Gain:
2. Gain Ratio:
3. Gini Index
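The three measures can be computed as in the following sketch, using their standard textbook definitions (entropy-based information gain, gain divided by split information, and Gini impurity weighted over the partitions). The function names and the toy attribute/label lists are assumptions for illustration. The Gini index is normally evaluated over binary splits, so the example uses a two-valued attribute.

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(values, labels):
    """Group the class labels by the attribute value of each tuple."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return list(groups.values())

def information_gain(values, labels):
    """Gain(A) = Info(D) - sum(|Dj|/|D| * Info(Dj))."""
    n = len(labels)
    expected = sum(len(p) / n * entropy(p) for p in partition(values, labels))
    return entropy(labels) - expected

def gain_ratio(values, labels):
    """GainRatio(A) = Gain(A) / SplitInfo(A); SplitInfo is the entropy of A's values."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

def gini_split(values, labels):
    """Gini_A(D) = sum(|Dj|/|D| * Gini(Dj)) over the partitions induced by A."""
    n = len(labels)
    return sum(len(p) / n * gini(p) for p in partition(values, labels))

# Hypothetical toy data, invented for illustration only.
student = ["yes", "yes", "yes", "no", "no", "no"]
label = ["buy", "buy", "buy", "buy", "no", "no"]

gain = information_gain(student, label)
ratio = gain_ratio(student, label)
gini_a = gini_split(student, label)
print(round(gain, 4), round(ratio, 4), round(gini_a, 4))
```

Information gain and gain ratio are maximized when choosing an attribute, while the weighted Gini value is minimized (equivalently, the reduction in impurity is maximized).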
Performance:
•Quite simple, suitable for relatively small data sets
•Large real-world databases?
•Training tuples should reside in main memory
Issues:
•Overfitting
Tree pruning
1. Pre-pruning
2. Post-pruning
Bayes Classification Method
•Statistical Classifier
•Used to predict class membership probabilities
•Based on Bayes’ theorem
•Naïve
•It assumes that “the effect of an attribute value on a given class is independent of the values of the other attributes” (class conditional independence)
•Named after Thomas Bayes, who did early work in probability and decision theory during the 18th century.
•Let X be a data tuple (the “evidence”) and H the hypothesis that X belongs to a specific class.
•Determine P(H|X):
•Posterior probability: P(H|X). For example, tuple X describes a customer with age=35 and salary=40,000, and H is the hypothesis that the customer will buy a computer.
•Prior probability: P(H)
•P(X|H): posterior probability of X conditioned on H (the likelihood)
•P(X): prior probability of X
•Bayes’ theorem:
P(H|X) = P(X|H) P(H) / P(X)
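As a small numeric illustration of the formula, the sketch below plugs in made-up probabilities. All three input values are assumptions invented for the example, and P(X) is obtained from the law of total probability over H and not-H.

```python
# Hypothetical probabilities, invented for illustration only.
p_h = 0.3              # P(H): prior probability that a customer buys a computer
p_x_given_h = 0.2      # P(X|H): probability of seeing tuple X among buyers
p_x_given_not_h = 0.05 # P(X|not H): probability of seeing tuple X among non-buyers

# P(X) by the law of total probability.
p_x = p_x_given_h * p_h + p_x_given_not_h * (1 - p_h)

# Bayes' theorem: P(H|X) = P(X|H) P(H) / P(X)
p_h_given_x = p_x_given_h * p_h / p_x

print(round(p_x, 4), round(p_h_given_x, 4))
```

With these assumed numbers, observing X raises the probability of H from the prior 0.3 to a noticeably larger posterior.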
Naïve Bayesian classifier:
Suppose there are m classes, C1, C2, ..., Cm. Given a tuple X, the classifier predicts that X belongs to the class having the highest posterior probability conditioned on X. X belongs to Ci if and only if
P(Ci|X) > P(Cj|X) for 1 <= j <= m, j != i
By Bayes’ theorem,
P(Ci|X) = P(X|Ci) P(Ci) / P(X)
As P(X) is constant for all classes, only P(X|Ci) P(Ci) needs to be maximized. If the class prior probabilities are not known, it is commonly assumed that P(C1) = P(C2) = ... = P(Cm), so only P(X|Ci) needs to be maximized.
Computing P(X|Ci) directly is computationally expensive, so class conditional independence is applied:
P(X|Ci) = P(x1|Ci) × P(x2|Ci) × ... × P(xn|Ci), where x1, ..., xn are the attribute values of X.
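A minimal sketch of a naïve Bayesian classifier for categorical attributes follows. It estimates P(Ci) and each P(xk|Ci) by simple counting and then maximizes P(X|Ci) P(Ci). The function names and the toy training tuples are assumptions for illustration, and no smoothing is applied, so an unseen attribute value gives a class a score of zero.

```python
from collections import Counter, defaultdict

def train(rows, labels):
    """Estimate class priors and per-class attribute value counts."""
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: class_counts[c] / n for c in class_counts}
    # (class, attribute index) -> Counter of attribute values seen in that class
    cond_counts = defaultdict(Counter)
    for row, c in zip(rows, labels):
        for k, value in enumerate(row):
            cond_counts[(c, k)][value] += 1
    return priors, cond_counts, class_counts

def predict(x, priors, cond_counts, class_counts):
    """Return the class maximizing P(Ci) * product of P(xk|Ci), plus all scores."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for k, value in enumerate(x):
            # P(xk|Ci) estimated as a relative frequency within class Ci.
            score *= cond_counts[(c, k)][value] / class_counts[c]
        scores[c] = score
    return max(scores, key=scores.get), scores

# Hypothetical training tuples (age group, student), invented for illustration only.
rows = [
    ("youth", "no"), ("youth", "yes"), ("middle", "no"),
    ("senior", "yes"), ("senior", "no"), ("middle", "yes"),
]
labels = ["no", "yes", "yes", "yes", "no", "yes"]  # buys a computer?

priors, cond_counts, class_counts = train(rows, labels)
predicted, scores = predict(("youth", "yes"), priors, cond_counts, class_counts)
print(predicted, scores)
```

P(X) is never computed because it is the same for every class and does not change which class has the largest score.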
Prediction
Regression analysis can be used to model the relationship between two variables.
Predictor variable: the values of the predictor variable are known.
Response variable: the response variable is what we want to predict.
Linear regression: y = b + wx, or equivalently
y = w0 + w1x
Example
Animal    Height (feet)  Weight (lbs)
Animal1   9              300
Animal2   8.78           295
Animal3   9.6            312
Animal4   8.09           280
Animal5   5              200
Animal6   5.5            250
Animal7   5.42           230
Animal8   5.75           250
Given the above data, we compute the mean height $\bar{x} \approx 7.15$ and the mean weight $\bar{y} \approx 264.7$.

$$w_1 = \frac{(9 - 7.15)(300 - 264.7) + (8.78 - 7.15)(295 - 264.7) + (9.6 - 7.15)(312 - 264.7) + \cdots + (5.75 - 7.15)(250 - 264.7)}{(9 - 7.15)^2 + (8.78 - 7.15)^2 + \cdots + (5.75 - 7.15)^2} = 19.35337$$

$$w_0 = 264.7 - (19.35337)(7.15) = 126.3234$$

So y = 126.3234 + 19.35337x. Using this equation, we can predict that an animal 8 feet tall weighs about 281.1504 lbs (126.3234 + 19.35337 × 8).
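The same least-squares fit can be checked with a short Python sketch. The helper name `fit_line` is an assumption. The code uses the unrounded means of the table data, so its coefficients differ slightly from the hand calculation, which works with rounded means.

```python
# Height (feet) and weight (lbs) of the eight animals from the example table.
heights = [9, 8.78, 9.6, 8.09, 5, 5.5, 5.42, 5.75]
weights = [300, 295, 312, 280, 200, 250, 230, 250]

def fit_line(xs, ys):
    """Least-squares estimates for y = w0 + w1 * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # w1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    w1 = sxy / sxx
    # w0 = y_mean - w1 * x_mean
    w0 = y_mean - w1 * x_mean
    return w0, w1

w0, w1 = fit_line(heights, weights)
predicted = w0 + w1 * 8  # predicted weight for an animal 8 feet tall

print(round(w0, 4), round(w1, 4), round(predicted, 2))
```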
Subjects
1) U.M.L.
2) P.P.L.
3) D.M.D.W.
4) O.S.
5) Programming Languages
6) RDBMS
Mr. Nilesh Magar
Lecturer at MIT, Kothrud, Pune.
9975155310