By Dan Stalloch
Data Mining
CS 541
Association – what could be linked together in some way with something else
Patterns – sequential and time series, shows us how often certain things occur
Classification – shows us how data is grouped
An Overview of the Uses of Data Mining
Prediction – the detection of a stable occurrence within the data that may continue into the future
Identification – what can be found out by system usage or what might be present in a thing
Classification – how the data could be grouped
Optimization – finding ways to utilize resources
Why Data Mining is Useful
Apriori – finds frequent (large) item sets
Sampling – finds frequent item sets from a small sample of the data
Frequent-Pattern (FP) Tree and FP-Growth – better version of Apriori
Partition – efficient way to use the Apriori algorithm
Decision Tree Induction – constructing a decision tree from a training data set
k-Means – creates clusters
And others
Data Mining Algorithms
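To make the first item on this list concrete, here is a minimal sketch of the Apriori idea in Python: count item sets level by level, and only build larger candidates from smaller item sets that are already frequent. The function name `apriori`, the `min_support` parameter, and the shopping-basket transactions are illustrative assumptions, not material from the slides or the textbook.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: every single item is a candidate.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates and keep only
        # those whose k-item subsets are all frequent (the pruning step).
        k += 1
        candidates = {
            a | b
            for a in level for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent

# Made-up transactions, for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step is what keeps the candidate list small: an item set is only counted if every smaller item set inside it has already been found frequent.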
Marketing – analyzing customer behavior
Finance – keeping track of credit and fraud
Manufacturing – optimizing use of resources
Health Care – checking patterns for useful information
Areas that use Data Mining to Enhance Performance
http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
This is a car database from a repository of data sets that UCI makes available to everyone.
When mining a database, it is essential to ask what we would like to be able to predict from it. In this instance, we would like to know which cars have decent mpg.
We might also be able to predict which companies are likely to stay in business.
An Example of a Database We May Wish to Mine and Why
We must create or use programs that show us either a 2-D or a 3-D contingency table
How Can We Predict Information from Mining a Database?
http://www.autonlab.org/tutorials/dtree18.pdf
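A 2-D contingency table simply counts how often each pair of attribute values occurs together. The following is a minimal Python sketch of that idea. The records are made up for illustration and are not taken from the auto-mpg file, and the names `records` and `contingency_table` are hypothetical.

```python
from collections import Counter

# Made-up records for illustration only: (cylinders, mpg rating).
records = [
    (4, "good"), (4, "good"), (4, "bad"),
    (6, "bad"), (6, "good"),
    (8, "bad"), (8, "bad"), (8, "bad"),
]

def contingency_table(rows):
    """Count how often each (x, y) pair of attribute values occurs."""
    return Counter(rows)

table = contingency_table(records)
x_values = sorted({x for x, _ in records})
y_values = sorted({y for _, y in records})

# Print the table: one row per cylinder count, one column per mpg rating.
# A Counter returns 0 for pairs that never occur, so empty cells print as 0.
print("cyl", *y_values, sep="\t")
for x in x_values:
    print(x, *(table[(x, y)] for y in y_values), sep="\t")
```

A 3-D table works the same way with three-value tuples as keys.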
We use a formula to decide which attributes have the highest information gain, depending on what we would like to know. The formula is:
IG(Y|X) = H(Y) - H(Y|X)
where H(X) is the entropy of X and H(Y|X) is the conditional entropy of Y given X
How do we Know what Information is Worth Mining?
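The information gain formula on this slide can be computed directly from value counts. The sketch below is one possible Python version, assuming entropy is measured in bits (log base 2) and estimated from observed frequencies. The function names and the sample data are illustrative assumptions, not part of the original slides.

```python
from collections import Counter
from math import log2

def entropy(values):
    """H(V): entropy in bits of the observed distribution of values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())

def conditional_entropy(xs, ys):
    """H(Y|X): entropy of Y inside each X group, weighted by group size."""
    total = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / total * entropy(g) for g in groups.values())

def information_gain(xs, ys):
    """IG(Y|X) = H(Y) - H(Y|X)."""
    return entropy(ys) - conditional_entropy(xs, ys)

# Made-up data for illustration only: does cylinder count tell us about mpg?
cylinders = [4, 4, 4, 6, 6, 8, 8, 8]
mpg = ["good", "good", "bad", "bad", "good", "bad", "bad", "bad"]
print("H(mpg) =", entropy(mpg))
print("IG(mpg|cylinders) =", information_gain(cylinders, mpg))
```

An attribute that fully determines Y gives a gain equal to H(Y), while an attribute unrelated to Y gives a gain of zero, so attributes with higher gain are the ones worth examining first.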
http://www.autonlab.org/tutorials/dtree18.pdf
http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
http://www.autonlab.org/tutorials/infogain11.pdf
Chapter 28 from Fundamentals of Database Systems, 6th Edition, by Elmasri and Navathe
Pictures from Andrew W. Moore slides
References