An Investigation into Commercial Data Mining
An Evaluation of Commercial Data Mining
Proposed and Presented by
Emily Davis
Supervisor: John Ebden
Statement of the Problem
An evaluation of commercial data mining capabilities, for example those of Oracle9i’s Data Mining Suite.
Background
Data mining is a relatively new offshoot of database technology which has arisen as a result of the ability of computers to:
- store vast quantities of data in data warehouses;
- implement ingenious algorithms for the mining of data;
- use these algorithms to analyse these vast quantities of data in a reasonable amount of time.
Data mining discovers the patterns in data that represent knowledge.
Of interest are which algorithms data mining suites use, how well each category of data mining algorithm performs on data, and what kind of results each produces.
Another important issue is the usability of the algorithms.
Random number example, taken from http://www.saltspring.com/brochmann/math/mining/mining1.html
| # | data a | data b | data c |
|---|---|---|---|
| 1 | 0.71132700 | 0.15379400 | 1.88403600 |
| 2 | 0.62219935 | 0.83119106 | 3.73797189 |
| 3 | 0.33872289 | 0.80881084 | 3.10387831 |
| 4 | 0.54262732 | 0.35427095 | 2.14806749 |
| 5 | 0.50631348 | 0.71599532 | 3.16061290 |
| 6 | 0.00132503 | 0.22447315 | 0.67606951 |
| 7 | 0.76211535 | 0.94620700 | 4.36285170 |
| 8 | 0.91026206 | 0.89499186 | 4.50549970 |
| 9 | 0.92640874 | 0.47156928 | 3.26752532 |
| 10 | 0.49323546 | 0.27673696 | 1.81668179 |
| 11 | 0.04501477 | 0.30142353 | 0.99430013 |
| 12 | 0.49180000 | 0.17909135 | 1.52087404 |
| 13 | 0.06747225 | 0.85629071 | 2.70381663 |
| 14 | 0.84239974 | 0.41916601 | 2.94229750 |
| ... | ... | ... | ... |
| 49 | 0.07845276 | 0.69584199 | 2.24443147 |
| 50 | 0.07548299 | 0.52973340 | 1.74016616 |
| 51 | 0.72301849 | 0.97594044 | ???????? |
Data a and data b are random numbers generated in Excel.
Data c = 2*(data a) + 3*(data b).
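As a quick sanity check, the generating rule can be recomputed for a few of the listed rows. This is a small Python sketch: the row values are copied from the table (row 51's data c is the value Excel calculated for the missing entry), and the helper names `data_c` and `max_residual` are illustrative, not part of the original experiment. The residuals come out on the order of 1e-8, presumably because the table shows data a and data b rounded to eight decimal places.

```python
# Rows copied from the table: (row, data a, data b, data c).
# For row 51, data c is the value Excel calculated for the missing entry.
ROWS = [
    (1, 0.71132700, 0.15379400, 1.88403600),
    (2, 0.62219935, 0.83119106, 3.73797189),
    (49, 0.07845276, 0.69584199, 2.24443147),
    (50, 0.07548299, 0.52973340, 1.74016616),
    (51, 0.72301849, 0.97594044, 4.37385831),
]


def data_c(a: float, b: float) -> float:
    """The generating rule from the slide: data c = 2*(data a) + 3*(data b)."""
    return 2 * a + 3 * b


def max_residual(rows) -> float:
    """Largest absolute gap between a listed data c and the rule's value."""
    return max(abs(c - data_c(a, b)) for _, a, b, c in rows)


if __name__ == "__main__":
    for row, a, b, c in ROWS:
        print(f"row {row}: listed {c:.8f}, rule gives {data_c(a, b):.8f}")
    print(f"largest residual: {max_residual(ROWS):.1e}")
```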
51st value calculated by Excel: 4.37385831
Value calculated using Knowledge Miner, a Macintosh data mining tool: 4.34791231, with the equation:
1.97*(data a) + 2.96*(data b) + 0.0324
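For comparison, here is what a plain ordinary least-squares fit does on the same kind of data. This is a minimal Python sketch, not necessarily the method Knowledge Miner uses; `fit_linear` and `predict` are hypothetical helpers, and the seeded `random.Random` stands in for Excel's random number generator. Because data c is an exact linear function of data a and data b, least squares recovers the coefficients 2 and 3 and a zero intercept up to floating-point rounding, which gives a reference point for judging how close a tool's fitted equation comes.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least squares for y = w1*x1 + ... + wk*xk + w0.

    Builds the normal equations (X^T X) w = X^T y, with a column of ones
    for the intercept, and solves them by Gauss-Jordan elimination with
    partial pivoting. Returns [w1, ..., wk, w0].
    """
    rows = [list(x) + [1.0] for x in xs]  # append the intercept column
    n = len(rows[0])
    # Augmented matrix [X^T X | X^T y].
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(n)
    ]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def predict(w, x):
    """Apply fitted weights [w1, ..., wk, w0] to one input row x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w[-1]


if __name__ == "__main__":
    rng = random.Random(42)  # hypothetical stand-in for Excel's RAND()
    xs = [(rng.random(), rng.random()) for _ in range(50)]
    ys = [2 * a + 3 * b for a, b in xs]  # data c = 2*(data a) + 3*(data b)
    w = fit_linear(xs, ys)
    print("coefficients (a, b, intercept):", [round(v, 6) for v in w])
    print("row 51 prediction:", round(predict(w, (0.72301849, 0.97594044)), 8))
```

The same function accepts any number of input columns, so it can also be pointed at a three-column experiment.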
The experiment was repeated using three columns of random numbers and this equation:
Data d = 23*(data a) - 4.5*(data b) + (data a + data c).
The last five entries of data d were left out of the column.
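Since data a appears twice in this rule, it helps to simplify it before judging a tool's output. Writing $a$, $b$, $c$ and $d$ for the four columns:

$$
d = 23a - 4.5b + (a + c) = 24a - 4.5b + c
$$

So a linear model fitted to this data would ideally recover coefficients 24, -4.5 and 1 for data a, b and c, with no intercept.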
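The five missing values as generated by Excel, and what Knowledge Miner predicted for them:

| Missing entry | Generated by Excel | Predicted by Knowledge Miner |
|---|---|---|
| 1 | 14.7314558 | 14.7341613 |
| 2 | 12.0720505 | 12.0731391 |
| 3 | 22.0008992 | 22.0080223 |
| 4 | 7.52633344 | 7.52465867 |
| 5 | 5.25167700 | 5.24861860 |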
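To put "close" into numbers, the two columns above can be compared with a few standard error measures. This Python sketch uses mean absolute error, root-mean-square error and the largest absolute and relative errors; the choice of metrics and the name `error_summary` are illustrative, not part of the original experiment. On these five values every prediction lies within 0.1% of the Excel value (the largest relative error, on the fifth entry, is about 0.06%).

```python
import math

# The five missing data d values, as generated by Excel and as predicted
# by Knowledge Miner (copied from the table above).
EXCEL = [14.7314558, 12.0720505, 22.0008992, 7.52633344, 5.25167700]
KNOWLEDGE_MINER = [14.7341613, 12.0731391, 22.0080223, 7.52465867, 5.24861860]


def error_summary(actual, predicted):
    """Common regression error measures for paired actual/predicted values."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "mean_abs_error": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "max_abs_error": max(abs(e) for e in errors),
        "max_rel_error": max(abs(e) / abs(a) for e, a in zip(errors, actual)),
    }


if __name__ == "__main__":
    for name, value in error_summary(EXCEL, KNOWLEDGE_MINER).items():
        print(f"{name}: {value:.6f}")
```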
Plan of Action
- Literature Survey (and other resources)
- Install Software for Oracle
- Get to know the Oracle Suite
- Evaluate Oracle9i’s Data Mining Suite
Install Software for Oracle
- Including JDeveloper.
- May be extended to the installation of other commercial data mining suites, e.g.:
  - DB2’s Intelligent Miner
  - Informix’s Data Mine
Investigate Oracle9i’s Data Mining Suite
- Two major algorithm types: supervised and unsupervised learning.
- A medical example:
  - Supervised learning: researchers input medical profiles into a leukaemia model to predict propensity for the disease.
  - Unsupervised learning: the algorithm searches for clusters of related information in data sets to reveal insights about diseases and patient populations.
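The difference between the two learning styles can be sketched in a few lines of Python. The algorithms here (a nearest-centroid classifier for the supervised case and k-means for the unsupervised case) are simple stand-ins chosen for illustration, not necessarily the algorithms Oracle9i provides, and the points and "low risk"/"high risk" labels are hypothetical. The key contrast: the classifier is handed the labels when it is trained, while k-means receives only the points and has to find the groups itself.

```python
import math

# Hypothetical toy data: each point stands for a patient profile reduced to
# two numeric features; the labels are used only in the supervised case.
POINTS = [(1.0, 1.2), (8.0, 8.5), (1.5, 0.8), (0.9, 1.1), (7.6, 8.1), (8.3, 7.9)]
LABELS = ["low risk", "high risk", "low risk", "low risk", "high risk", "high risk"]


def centroid(points):
    """Mean position of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


# Supervised: the training step is given the known label of every point.
def train_nearest_centroid(points, labels):
    """Learn one centroid per class label."""
    return {
        lab: centroid([p for p, label in zip(points, labels) if label == lab])
        for lab in sorted(set(labels))
    }


def classify(model, point):
    """Predict the label whose centroid is closest to the point."""
    return min(model, key=lambda lab: math.dist(point, model[lab]))


# Unsupervised: no labels; k-means looks for k groups on its own.
def k_means(points, k, iterations=100):
    """Plain k-means, naively initialised with the first k points."""
    centres = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty.
        new_centres = [centroid(c) if c else centres[i] for i, c in enumerate(clusters)]
        if new_centres == centres:  # centres unchanged, so assignments are stable
            break
        centres = new_centres
    return centres, clusters


if __name__ == "__main__":
    model = train_nearest_centroid(POINTS, LABELS)
    print("new profile (1.2, 1.0) ->", classify(model, (1.2, 1.0)))
    centres, clusters = k_means(POINTS, k=2)
    for centre, members in zip(centres, clusters):
        print("cluster around", tuple(round(v, 2) for v in centre), "->", members)
```

Initialising k-means from the first k points keeps the sketch deterministic; practical implementations usually use smarter or randomised initialisation.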
Get to know the Oracle DM Suite (a major task).
- Explore JDeveloper, Oracle's Java development environment, and Oracle9i Data Mining's Java-based API. The API is aligned with JDM (Java Data Mining), a Java standard supported by Oracle, Sun, IBM and others.
- Explore DM4J (Data Mining for Java), the new graphical user interface for Oracle DM.
Addressing the Problem:
- Run the different algorithms available in the data mining suite.
- Document and analyse the results in terms of the performance and effectiveness of each algorithm.
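One way to make "performance and effectiveness" concrete is to time each algorithm's build and scoring steps and to measure its error on held-out data. The Python harness below is a sketch under that assumption: `evaluate`, the RMSE metric and the two toy algorithms are hypothetical, and with Oracle9i the build and apply steps would go through the suite's own API instead.

```python
import math
import random
import time
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]
Trainer = Callable[[list, list], Model]


def evaluate(train: Trainer, xs_train, ys_train, xs_test, ys_test):
    """Return (build seconds, scoring seconds, RMSE on the test set)."""
    t0 = time.perf_counter()
    model = train(xs_train, ys_train)  # performance: model build time
    t1 = time.perf_counter()
    preds = [model(x) for x in xs_test]  # performance: scoring time
    t2 = time.perf_counter()
    # Effectiveness: root-mean-square error against the held-out answers.
    rmse = math.sqrt(sum((p - y) ** 2 for p, y in zip(preds, ys_test)) / len(ys_test))
    return t1 - t0, t2 - t1, rmse


# Two deliberately simple, hypothetical algorithms to compare.
def mean_baseline(xs, ys) -> Model:
    """Ignore the inputs and always predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m


def nearest_neighbour(xs, ys) -> Model:
    """Predict the target of the closest training row (1-NN regression)."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: math.dist(x, p[0]))[1]


if __name__ == "__main__":
    rng = random.Random(1)
    data = [(rng.random(), rng.random()) for _ in range(50)]
    target = [2 * a + 3 * b for a, b in data]  # data c from the random number example
    for name, algo in [("mean baseline", mean_baseline), ("1-nearest neighbour", nearest_neighbour)]:
        build, score, rmse = evaluate(algo, data[:40], target[:40], data[40:], target[40:])
        print(f"{name}: build {build * 1e3:.3f} ms, score {score * 1e3:.3f} ms, RMSE {rmse:.4f}")
```

Timings this small are noisy; repeating each run (for example with Python's `timeit` module) would give steadier figures.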
Expected Results:
The ability to say conclusively whether Oracle's data mining capabilities are inferior or superior to other products on the market, and why.
Possible Extensions to the Project:
- To have sufficient knowledge of the topic to give recommendations or feedback:
  - to Oracle regarding its data mining suite;
  - to IT customers wanting to purchase data mining suites.
- To explore the field of random-dot stereograms: could a computer see them? If not, why not?
Literature Survey
- Principles of Data Mining by David Hand, Heikki Mannila and Padhraic Smyth, Cambridge, Massachusetts, MIT Press, 2001: algorithmic concepts.
- Data Mining: Concepts and Techniques by Jiawei Han and Micheline Kamber, San Francisco, California, Morgan Kaufmann, 2001: algorithmic evaluations.
- Data Mining: A Tutorial-Based Primer by Richard J. Roiger and Michael W. Geatz, Boston, Massachusetts, Addison Wesley, 2003: practical knowledge and processing.
- Data Mining by Pieter Adriaans and Dolf Zantinge, Harlow, England, Addison Wesley, 1996: real-life applications.
- Data Mining and Statistical Analysis Using SQL by Robert P. Trueblood and John N. Lovett, Jnr., USA, Apress, 2001: statistical principles.
- Data Mining Using SAS Applications by George Fernandez, USA, Chapman and Hall/CRC, 2003: methodologies.
- Mastering Data Mining: The Art and Science of Customer Relationship Management by Michael J.A. Berry and Gordon S. Linoff, USA, Wiley Computer Publishing, 2000: building effective models.
- Data Preparation for Data Mining by Dorian Pyle, San Francisco, California, Morgan Kaufmann, 2000: demo code, 10 Golden Rules.
- The white paper "Data Mining: Beyond Algorithms" by Dr Akeel Al-Attar, available at http://www.attar.com/tutor/mining.htm
- Summary from the KDD-03 panel "Data Mining: The Next Ten Years", available at http://www.acm.org/sigs/sigkdd/explorations/issue5-2/pnl_10yrs_final1.pdf
- Oracle website
- Oracle Magazine