Copyright R. Weber Machine Learning, Data Mining INFO 629 Dr. R. Weber.

Copy

right

R. W

eber

Machine Learning, Data Mining

INFO 629

Dr. R. Weber

Copy

right

R. W

eber

The picnic game

• How did you reason to find the rule?

• According to Michalski (1983) A theory and methodology of inductive learning. In Machine Learning, chapter 4, “inductive learning is a heuristic search through a space of symbolic descriptions (i.e., generalizations) generated by the application of rules to training instances.”

Copy

right

R. W

eber

Learning

• Rote Learning– Learn multiplication tables

• Supervised Learning– Examples are used to help a program identify a concept– Examples are typically represented with attribute-value

pairs– Notion of supervision originates from guidance from

examples• Unsupervised Learning

– Human efforts at scientific discovery, theory formation

Copy

right

R. W

eber

Inductive Learning

• Learning by generalization• Performance of classification tasks

– Classification, categorization, clustering• Rules indicate categories• Goal:

– Characterize a concept

Copy

right

R. W

eber

•Learner uses:–positive examples (instances ARE examples of

a concept) and –negative examples (instances ARE NOT

examples of a concept)

Concept Learning is a Form of Inductive Learning

Copy

right

R. W

eber

• Needs empirical validation• Dense or sparse data determine quality

of different methods

Concept Learning

Copy

right

R. W

eber

• The learned concept should be able to correctly classify new instances of the concept– When it succeeds in a real instance of the

concept it finds true positives – When it fails in a real instance of the concept

it finds false negatives

Validation of Concept Learning i

Copy

right

R. W

eber

• The learned concept should be able to correctly classify new instances of the concept– When it succeeds in a counterexample it

finds true negatives– When it fails in a counterexample it finds

false positives

Validation of Concept Learning ii

Copy

right

R. W

eber

Basic classification tasks

• Classification• Categorization• Clustering

Copy

right

R. W

eber

Categorization

Copy

right

R. W

eber

Classification

Copy

right

R. W

eber

Clustering

Copy

right

R. W

eber

Clustering

• Data analysis method applied to data• Data should naturally possess groupings• Goal: group data into clusters• Resulting clusters are collections where objects within a

cluster are similar to each other• Objects outside the cluster are dissimilar to objects inside• Objects from one cluster are dissimilar to objects in other

clusters • Distance measures are used to compute similarity

Copy

right

R. W

eber

Rule Learning

• Learning widely used in data mining• Version Space Learning is a search

method to learn rules• Decision Trees

Copy

right

R. W

eber

Version Space i

A=1,B=1,C=1 Outcome=1A=0,B=.5,C=.5 Outcome=0A=0,B=0,C=.3 Outcome=.5• Creates tree that includes all possible

combinations• Does not learn for rules with disjunctions (i.e. OR

statements)• Incremental method, trains additional data

without the need to retrain all data

Copy

right

R. W

eber

Decision trees

• Knowledge representation formalism• Represent mutually exclusive rules (disjunction)• A way of breaking up a data set into classes or

categories• Classification rules that determine, for each instance

with attribute values, whether it belongs to one or another class

Decision treesconsist of:-leaf nodes (classes)

- decision nodes (tests on attribute values)

-from decision nodes branches grow for each possible outcome of the test

From Cawsey, 1997

Copy

right

R. W

eber

Decision tree induction

• Goal is to correctly classify all example data• Several algorithms to induce decision trees: ID3

(Quinlan 1979) , CLS, ACLS, ASSISTANT, IND, C4.5

• Constructs decision tree from past data• Not incremental• Attempts to find the simplest tree (not

guaranteed because it is based on heuristics)

Copy

right

R. W

eber

•From:– a set of target classes–Training data containing objects of more than one class

•ID3 uses test to refine the training data set into subsets that contain objects of only one class each•Choosing the right test is the key

ID3 algorithm

Copy

right

R. W

eber

• Information gain or ‘minimum entropy’• Maximizing information gain corresponds to minimizing entropy•Predictive features (good indicators of the outcome)

How does ID3 chooses tests

Copy

right

R. W

eber

ID3 algorithm

No. Student First last year?

Male? Works hard? Drinks? First this year?

1 Richard yes yes no yes yes

2 Alan yes yes yes no yes

3 Alison no no yes no yes

4 Jeff no yes no yes no

5 Gail yes no yes yes yes

6 Simon no yes yes yes no

Copy

right

R. W

eber

Explanation-based learning

• Incorporates domain knowledge into the learning process

• Feature values are assigned a relevance factor if their values are consistent with domain knowledge

• Features that are assigned relevance factors are considered in the learning process

Copy

right

R. W

eber

Familiar Learning Task

• Learn relative importance of features• Goal: learn individual weights• Commonly used in case-based reasoning• Methods include a similarity measure to get

feedback about verify their relative importance: feedback methods

• Search methods: gradient descent• ID3

Copy

right

R. W

eber

Classification using Naive Bayes

• Naïve Bayes classifier uses two sources of information to classify a new instance– The distribution of the rtaining dataset (prior probability)– The region surrounding the new instance in the dataset (likelihood)

• Naïve because assumes conditional independence not always applicable

• It is made to simplify the computation and in this sense considered to be “Naïve”.

• Conditional independence reduces the requirement for large number of observations

• Bias in estimating probabilities often may not make a difference in practice -- it is the order of the probabilities, not their exact values, that determine the classifications.

• Comparable in performance with classification trees and with neural networks

• Highly accurate and fast when applied to large databases• Some links:

– http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm– http://www.statsoft.com/textbook/stnaiveb.html

http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm

http://www.resample.com/xlminer/help/NaiveBC/classiNB_intro.htm

Copy

right

R. W

eber

KDD: definition

Knowledge Discovery in Databases (KDD) is the non-trivial process of identifying valid, novel, and potential useful and understandable patterns in data. (R.Feldman,2000)KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data (Fayad, Piatetsky-Shapiro, Smyth 1996 p. 6).

Data mining is one of the steps in the KDD process.

Text mining concerns applying data mining techniques to unstructured text.

The KDD ProcessDATA

patternsinterpretation

filtering

SELECTED

DATApreprocessing

PROCESSED

DATA

transformation

Data mining

browsing

KNOWLEDGE

TRANSFORMED

DATA

Copy

right

R. W

eber

• Predictive modeling/risk assessment

• Database segmentation

Data mining tasks i

Classification, decision trees

Kohonen nets, clustering techniques

Copy

right

R. W

eber

• Link analysis

• Deviation detection

Data mining tasks ii

Rules: • Association generation• Relationships between entities

• How things change over time, trends

Copy

right

R. W

eber

KDD applications• Fraud detection

– Telecom (calling cards, cell phones)– Credit cards– Health insurance

Loan approval Investment analysis Marketing and sales data analysis

Identify potential customers Effectiveness of sales campaign Store layout

Copy

right

R. W

eber

Text mining

The problem starts with a query and the solution is a set of information (e.g., patterns, connections, profiles, trends) contained in several different texts that are potentially relevant to the initial query.

Copy

right

R. W

eber

Text mining applications

• IBM Text Navigator– Cluster documents by content;– Each document is annotated by the 2 most

frequently used words in the cluster;

• Concept Extraction (Los Alamos)– Text analysis of medical records;– Uses a clustering approach based on trigram

representation;– Documents in vectors, cosine for comparison;

Copyright R. Weber Machine Learning, Data Mining INFO 629 Dr. R. Weber.

Documents

Transcript of Copyright R. Weber Machine Learning, Data Mining INFO 629 Dr. R. Weber.