Understanding Data Mining
Craig A. Stevens, PMP, CC
craigastevens@westbrookstevens.com
![Page 2: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/2.jpg)
Examples of Classical Statistical
Methods
![Page 3: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/3.jpg)
Latitude 36.19N and Longitude 86.78W
Nashville, TN, USA
![Page 4: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/4.jpg)
$Y_i = a + b x_i + e$
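As a rough illustration of the simple regression model on this slide, the sketch below estimates the intercept a and slope b by ordinary least squares. The function name `fit_simple_regression` and the four data points are made up for this example; the slides do not say which fitting procedure or data the author used.

```python
from statistics import mean


def fit_simple_regression(xs, ys):
    """Ordinary least squares estimates (a, b) for Y_i = a + b*x_i + e."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x, so every residual is zero
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_simple_regression(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)
```

With real data the residuals would not vanish; they are the sample counterpart of the error term e in the model.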
![Page 5: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/5.jpg)
http://www.ats.ucla.edu/stat/sas/faq/spplot/reg_int_cont.htm
Multiple Regression
![Page 6: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/6.jpg)
http://www.ats.ucla.edu/stat/sas/faq/spplot/reg_int_cont.htm
Multiple Regression
![Page 7: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/7.jpg)
http://www.ats.ucla.edu/stat/sas/faq/spplot/reg_int_cont.htm
Multiple Regression
![Page 8: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/8.jpg)
http://www.ats.ucla.edu/stat/sas/faq/spplot/reg_int_cont.htm
Multiple Regression
![Page 9: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/9.jpg)
http://www.ats.ucla.edu/stat/sas/faq/spplot/reg_int_cont.htm
Multiple Regression
![Page 10: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/10.jpg)
Data Mining
![Page 11: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/11.jpg)
http://datamining.typepad.com/photos/uncategorized/livejournal.png
![Page 12: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/12.jpg)
![Page 13: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/13.jpg)
What is Data Mining?
• The process of identifying hidden patterns, trends, and relationships in large quantities of data.

Why Do Data Mining?
• To discover useful information for making decisions.
• Too many variables for classical statistical methods to work.
  – Large number of records: 10^8 to 10^12
    • Gigabyte to terabyte
  – High-dimensional data
    • Lots of variables (10 to 10^4 attributes)
![Page 14: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/14.jpg)
The Huber-Wegman Taxonomy of Data Set Sizes

| Descriptor | Data Set Size in Bytes | Storage Mode |
|---|---|---|
| Tiny | 10^2 | Piece of Paper |
| Small | 10^4 | A Few Pieces of Paper |
| Medium | 10^6 | A Floppy Disk |
| Large | 10^8 | Hard Disk |
| Huge | 10^10 | Multiple Hard Disks |
| Massive | 10^12 | Robotic Magnetic Tape Storage Silos |
| Super Massive | 10^15 | Distributed Data Archives |
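To make the taxonomy above concrete, here is a small sketch that maps a byte count to the nearest descriptor. The taxonomy lists only nominal sizes, so matching to the nearest order of magnitude is an assumption of this example, as is the helper name `describe_size`.

```python
import math

# (descriptor, exponent of the nominal size in bytes) from the taxonomy
TAXONOMY = [
    ("Tiny", 2),
    ("Small", 4),
    ("Medium", 6),
    ("Large", 8),
    ("Huge", 10),
    ("Massive", 12),
    ("Super Massive", 15),
]


def describe_size(n_bytes):
    """Return the descriptor whose nominal size is closest on a log10 scale.

    Assumption: the source gives no explicit boundaries between classes,
    so this sketch picks the nearest order of magnitude (ties go to the
    smaller class).
    """
    if n_bytes <= 0:
        raise ValueError("size must be a positive number of bytes")
    log_n = math.log10(n_bytes)
    return min(TAXONOMY, key=lambda item: abs(item[1] - log_n))[0]


print(describe_size(1_440_000))   # roughly one floppy disk
print(describe_size(5 * 10**9))   # a few gigabytes
```

Under this rule the "gigabyte to terabyte" range cited for data mining falls between Large and Massive.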
![Page 15: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/15.jpg)
| Name | Model Role | Measurement Level | Description |
|---|---|---|---|
| BAD | Target | Binary | 1=client defaulted on loan, 0=loan repaid |
| CLAGE | Input | Interval | Age of oldest trade line in months |
| CLNO | Input | Interval | Number of trade lines |
| DEBTINC | Input | Interval | Debt-to-income ratio |
| DELINQ | Input | Interval | Number of delinquent trade lines |
| DEROG | Input | Interval | Number of major derogatory reports |
| JOB | Input | Nominal | Six occupational categories |
| LOAN | Input | Interval | Amount of the loan request |
| MORTDUE | Input | Interval | Amount due on existing mortgage |
| NINQ | Input | Interval | Number of recent credit inquiries |
| REASON | Input | Binary | DebtCon=debt consolidation, HomeImp=home improvement |
| VALUE | Input | Interval | Value of current property |
| YOJ | Input | Interval | Years at present job |
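One way to carry the model roles and measurement levels listed above into code is a small metadata table. This is only an illustrative sketch: the `Variable` class and the derived lists are hypothetical names, and it does not reproduce how SAS Enterprise Miner stores this information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    role: str    # "Target" or "Input"
    level: str   # "Binary", "Interval", or "Nominal"


# Names, roles, and levels copied from the variable list above
VARIABLES = [
    Variable("BAD", "Target", "Binary"),
    Variable("CLAGE", "Input", "Interval"),
    Variable("CLNO", "Input", "Interval"),
    Variable("DEBTINC", "Input", "Interval"),
    Variable("DELINQ", "Input", "Interval"),
    Variable("DEROG", "Input", "Interval"),
    Variable("JOB", "Input", "Nominal"),
    Variable("LOAN", "Input", "Interval"),
    Variable("MORTDUE", "Input", "Interval"),
    Variable("NINQ", "Input", "Interval"),
    Variable("REASON", "Input", "Binary"),
    Variable("VALUE", "Input", "Interval"),
    Variable("YOJ", "Input", "Interval"),
]

# The single target is what a predictive model tries to explain
target = next(v.name for v in VARIABLES if v.role == "Target")
inputs = [v.name for v in VARIABLES if v.role == "Input"]
# Non-interval inputs usually need some encoding before numeric modeling
categorical_inputs = [
    v.name for v in VARIABLES if v.role == "Input" and v.level != "Interval"
]

print(target, len(inputs), categorical_inputs)
```

Separating the binary target BAD from the twelve inputs mirrors the Model Role column, and the measurement level indicates which inputs (JOB and REASON) are categorical.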
![Page 16: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/16.jpg)
SAS Enterprise Miner Objects
![Page 17: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/17.jpg)
![Page 18: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/18.jpg)
Shows the Cut off Point is 6 Variables
![Page 19: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/19.jpg)
Small Number of Useful Variables
![Page 20: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/20.jpg)
![Page 21: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/21.jpg)
Comparing Methods and Profit vs Marketing Cost
![Page 22: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/22.jpg)
![Page 23: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/23.jpg)
![Page 24: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/24.jpg)
Decision Trees for Predictive Modeling, Padraic G. Neville, SAS Institute Inc., 4 August 1999
![Page 25: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/25.jpg)
Clustering As in Different Brands
![Page 26: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/26.jpg)
[Figure: pairwise plots of the variables MOIS_I9B, PROT_TR3, FAT_FCLJ, ASH_JOD6, SODI_HGQ, CARB_SZ0, and CAL_JOH4; additional labels PCR1_1, PCR2_1, and PCR3_1]
![Page 27: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/27.jpg)
Data Mining Art found at http://datamining.typepad.com/data_mining/dataviz/page/2/
![Page 28: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/28.jpg)
Data Mining Art found at http://datamining.typepad.com/data_mining/dataviz/page/2/
![Page 29: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/29.jpg)
![Page 30: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/30.jpg)
National Energy Research Scientific Computing Center
![Page 31: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/31.jpg)
SurfStat
A Matlab toolbox for the statistical analysis of univariate and multivariate surface and volumetric data using linear mixed effects models and random field theory
Keith J. Worsley
![Page 32: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/32.jpg)
Latitude 36.19N and Longitude 86.78W
Nashville, TN, USA
![Page 33: Understanding Data Mining Craig A. Stevens, PMP, CC craigastevens@westbrookstevens.com .](https://reader036.fdocuments.net/reader036/viewer/2022062511/5517a9d755034645368b5d71/html5/thumbnails/33.jpg)
http://www.youtube.com/watch?v=CnniJR5Ah7g
Genealogical Tree on YouTube