Post on 18-Dec-2015

By Dan Stalloch

Data Mining

CS 541

Association – what could be linked together in some way with something else

Patterns – sequential and time series, shows us how often certain things occur

Classification – shows us how data is grouped

An Overview of the Uses of Data Mining

Prediction – the detection of a stable occurrence within the data that may continue into the future

Identification – what can be learned from how a system is used, or what might be present in an item

Classification – how the data could be grouped

Optimization – finding ways to utilize resources

Why Data Mining is Useful

Apriori – frequent large item sets

Sampling – small frequent item sets

Frequent-Pattern (FP) Tree and FP-Growth – better version of Apriori

Partition – efficient way to use the Apriori algorithm

Decision Tree Induction – constructing a decision tree from a training data set

k-Means – creates clustering

And others

Data Mining Algorithms
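
To make the first entry in the list more concrete, here is a minimal sketch of the Apriori idea: count single items, keep the frequent ones, then grow candidate item sets one item at a time and prune any candidate that has an infrequent subset. The function name `apriori`, the `min_support` parameter (an absolute count), and the tiny `baskets` data are all illustrative assumptions, not material from the slides.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every item set seen in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}

    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: merge frequent (k-1)-item sets into size-k candidates.
        candidates = {a | b for a, b in combinations(list(frequent), 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count how many transactions contain each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Made-up shopping baskets, used only to exercise the function.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread", "eggs"},
    {"milk", "eggs"},
]
frequent_itemsets = apriori(baskets, min_support=2)
for itemset, count in sorted(frequent_itemsets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

This sketch rescans every transaction at each level, which is the cost that the FP-Growth and Partition approaches in the list aim to reduce.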

Marketing – analyzing customer behavior

Finance – keeping track of credit and fraud

Manufacturing – optimizing use of resources

Health Care – checking patterns for useful information

Areas that use Data Mining to Enhance Performance

http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data

This is a car database from the repository of data sets that UCI makes available to everyone

When mining a database, it is essential to ask what you would like to be able to predict from it. In this instance, we would like to know which cars have decent mpg

We might also be able to predict which companies are likely to stay in business

An Example of a Database We May Wish to Mine and Why
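
As a rough sketch of how a row of such a file might be read before mining it, the snippet below parses one whitespace-separated record and flags whether the car has "decent" mpg. The column order in `COLUMNS`, the use of `?` for missing values, the 25.0 mpg threshold, and the sample row are all assumptions made for illustration; check them against the description file that accompanies the data set.

```python
import shlex

# Assumed column order (verify against the data set's own description file).
COLUMNS = ["mpg", "cylinders", "displacement", "horsepower",
           "weight", "acceleration", "model_year", "origin", "car_name"]


def parse_row(line):
    """Parse one whitespace-separated row; a '?' field is treated as missing."""
    fields = shlex.split(line)  # keeps the double-quoted car name as a single field
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {}
    for name, raw in zip(COLUMNS, fields):
        if name == "car_name":
            row[name] = raw
        elif raw == "?":
            row[name] = None
        else:
            row[name] = float(raw)
    return row


def has_decent_mpg(row, threshold=25.0):
    """Hypothetical definition of 'decent mpg': at or above an arbitrary threshold."""
    return row["mpg"] is not None and row["mpg"] >= threshold


# Made-up row in the assumed layout, not copied from the real file.
sample = '31.5  4  98.0  ?  2045.  18.5  77  3\t"example car"'
parsed = parse_row(sample)
print(parsed["car_name"], has_decent_mpg(parsed))
```

Turning mpg into a yes/no target like this is one simple way to phrase the prediction question; other cut-offs or several mpg bands would work the same way.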

We must create or use programs that show us either a 2-D or a 3-D contingency table

How Can We Predict Information from Mining a Database?
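
A 2-D contingency table simply counts how often each pair of attribute values occurs together. The sketch below builds one with the standard library; the function name `contingency_table`, the attribute names, and the six car records are invented for illustration and are not taken from the real data set.

```python
from collections import Counter


def contingency_table(rows, row_attr, col_attr):
    """Count co-occurrences of row_attr and col_attr values across the records."""
    counts = Counter((r[row_attr], r[col_attr]) for r in rows)
    row_values = sorted({r[row_attr] for r in rows})
    col_values = sorted({r[col_attr] for r in rows})
    # Counter returns 0 for pairs that never occur, so every cell is filled in.
    return {rv: {cv: counts[(rv, cv)] for cv in col_values} for rv in row_values}


# Made-up records: number of cylinders and a coarse mpg label.
cars = [
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "good"},
    {"cylinders": 4, "mpg": "bad"},
    {"cylinders": 6, "mpg": "bad"},
    {"cylinders": 8, "mpg": "bad"},
    {"cylinders": 8, "mpg": "bad"},
]

table = contingency_table(cars, "cylinders", "mpg")
for cylinders, cells in table.items():
    print(cylinders, cells)
```

A 3-D table follows the same pattern with a third attribute added to the counted tuple.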

http://www.autonlab.org/tutorials/dtree18.pdf

We use a formula to decide which areas have the highest information gain, depending on what we would like to know. That formula is:

IG(Y|X) = H(Y) - H(Y|X)

where H(X) is the entropy of X and H(Y|X) is the conditional entropy of Y given X

How Do We Know What Information Is Worth Mining?
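
The information gain formula can be computed directly from value counts. The following is a minimal sketch, assuming entropy is measured in bits (log base 2) and that X and Y are given as equal-length lists of discrete values; the function names and the sample lists are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(values):
    """H(V): entropy in bits of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(xs, ys):
    """H(Y|X): entropy of Y within each group of equal X, weighted by group size."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)


# Made-up attribute values: X is the number of cylinders, Y is a coarse mpg label.
cylinders = [4, 4, 4, 6, 8, 8]
mpg = ["good", "good", "bad", "bad", "bad", "bad"]

gain = information_gain(cylinders, mpg)
print(f"IG(mpg | cylinders) = {gain:.3f} bits")
```

An attribute with a higher gain tells us more about the target, which is the criterion decision tree induction typically uses to choose which attribute to split on.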

http://www.autonlab.org/tutorials/dtree18.pdf

http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data

http://www.autonlab.org/tutorials/infogain11.pdf

Chapter 28 from Fundamentals of Database Systems, 6th Edition, by Elmasri and Navathe

Pictures from Andrew W. Moore's slides

References