1
Chapter 8: Introduction to Pattern Discovery
8.1 Introduction
8.2 Cluster Analysis
8.3 Market Basket Analysis (Self-Study)
2
8.1 Introduction
4
Pattern Discovery
The Essence of Data Mining?
“…the discovery of interesting, unexpected, or valuable structures in large data sets.”
– David Hand
“If you’ve got terabytes of data, and you’re relying on data mining to find interesting things in there for you, you’ve lost before you’ve even begun.”
– Herb Edelstein
5
Pattern Discovery Caution
Poor data quality
Opportunity
Interventions
Separability
Obviousness
Non-stationarity
6
Pattern Discovery Applications
Data reduction
Novelty detection
Profiling
Market basket analysis
Sequence analysis
7
Pattern Discovery Tools
Data reduction
Novelty detection
Profiling
Market basket analysis
Sequence analysis
9
8.2 Cluster Analysis
10
Unsupervised Classification
Unsupervised classification: grouping of cases based on similarities in input values.
[Figure: cases described by their inputs are grouped into cluster 1, cluster 2, and cluster 3]
12
k-means Clustering Algorithm
Training Data
1. Select inputs.
2. Select k cluster centers.
3. Assign cases to closest center.
4. Update cluster centers.
5. Reassign cases.
6. Repeat steps 4 and 5 until convergence.
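The k-means steps listed on the slide can be sketched in a few lines of Python. This is a minimal illustration, not the SAS Enterprise Miner Cluster tool's implementation: the function name `kmeans`, the toy cases, the use of Euclidean distance, and the caller-supplied starting centers are all assumptions made for the example.

```python
import math


def kmeans(cases, centers, max_iter=100):
    """Cluster `cases` (equal-length numeric tuples) around the given starting centers."""
    centers = list(centers)  # step 2: the caller selects the k starting centers
    k = len(centers)
    labels = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign every case to its closest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(case, centers[j]))
            for case in cases
        ]
        # Step 6: converged when no case changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Step 4: move each center to the mean of the cases assigned to it.
        for j in range(k):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Step 1: the selected inputs are the two coordinates of each (made-up) case.
cases = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = kmeans(cases, centers=[(1.0, 1.0), (8.0, 8.0)])
# labels == [0, 0, 0, 1, 1, 1]
```

Starting centers are passed in explicitly so that the run is reproducible; picking k cases at random is a common alternative, and different starting centers can lead to different final clusters.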
27
Segmentation Analysis
When no clusters exist, use the k-means algorithm to partition cases into contiguous groups.
Training Data
28
Demographic Segmentation Demonstration
Analysis goal:
Group geographic regions into segments based on income, household size, and population density.
Analysis plan:
Select and transform segmentation inputs.
Select the number of segments to create.
Create segments with the Cluster tool.
Interpret the segments.
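Income, household size, and population density are measured on very different scales, so a distance-based method would otherwise be dominated by the input with the largest numbers. The sketch below shows one common transformation, z-score standardization. It is an assumption for illustration, not necessarily the transformation used in the demonstration, and the `regions` values and the `standardize` helper are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical regions: (median income, mean household size, population density).
regions = [
    (52000.0, 2.4, 1200.0),
    (38000.0, 3.1, 300.0),
    (75000.0, 2.0, 5400.0),
    (41000.0, 2.8, 90.0),
]

# Standardize each input column separately, then rebuild the rows.
columns = list(zip(*regions))
scaled_columns = [standardize(col) for col in columns]
scaled_regions = list(zip(*scaled_columns))
```

After this step every input contributes on a comparable scale when distances between regions are computed.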
29
Segmenting Census Data
This demonstration introduces SAS Enterprise Miner tools and techniques for cluster and segmentation analysis.
30
Exploring and Filtering Analysis Data
This demonstration introduces SAS Enterprise Miner tools and techniques that explore and filter analysis data, particularly data source exploration and case filtering.
31
Setting Cluster Tool Options
This demonstration illustrates how to use the Cluster tool to segment the cases in the CENSUS2000 data set.
32
Creating Clusters with the Cluster Tool
This demonstration illustrates how the Cluster tool determines the number of clusters in the data.
33
Specifying the Segment Count
This demonstration illustrates how you can change the number of clusters created by the Cluster node.
34
Exploring Segments
This demonstration illustrates how to use graphical aids to explore the segments.
35
Profiling Segments
This demonstration illustrates using the Segment Profile tool to interpret the composition of clusters.
37
8.3 Market Basket Analysis (Self-Study)
38
Market Basket Analysis
Rule          Support   Confidence
A ⇒ D         2/5       2/3
C ⇒ A         2/5       2/4
A ⇒ C         2/5       2/3
B & C ⇒ D     1/5       1/3

Transactions: A B C; A C D; B C D; A D E; B C E
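The support and confidence values on this slide can be recomputed from the five transactions. The following is a small sketch, with the helper names `support` and `confidence` chosen for illustration: support is the fraction of all transactions containing every item in the rule, and confidence is the fraction of transactions containing the left-hand side that also contain the right-hand side.

```python
from fractions import Fraction

# The five transactions from the slide, one set of items each.
transactions = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]


def count(items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in transactions if set(items) <= t)


def support(lhs, rhs):
    """Share of all transactions containing both sides of the rule."""
    return Fraction(count(set(lhs) | set(rhs)), len(transactions))


def confidence(lhs, rhs):
    """Share of transactions with the left-hand side that also hold the right-hand side."""
    return Fraction(count(set(lhs) | set(rhs)), count(lhs))


rules = [("A", "D"), ("C", "A"), ("A", "C"), ("BC", "D")]
results = {rule: (support(*rule), confidence(*rule)) for rule in rules}
```

`Fraction` reduces automatically, so the confidence of C ⇒ A appears as 1/2 rather than 2/4.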
40
Implication?
[Table: Checking Account (No, Yes) by Savings Account (No, Yes)]
Savings Account totals: No 4,000; Yes 6,000; all customers 10,000
Support(SVG ⇒ CK) = 50%
Confidence(SVG ⇒ CK) = 83%
Expected Confidence(SVG ⇒ CK) = 85%
Lift(SVG ⇒ CK) = 0.83/0.85 < 1
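The arithmetic behind these four measures can be written out as a short sketch. The totals of 10,000 customers and 6,000 savings customers come from the slide; the counts of 5,000 customers holding both accounts and 8,500 checking customers are not shown in the transcript and are inferred here from the stated 50% support and 85% expected confidence, so treat them as assumptions.

```python
total = 10_000    # all customers (from the slide)
savings = 6_000   # customers with a savings account (from the slide)
both = 5_000      # assumed: implied by 50% support
checking = 8_500  # assumed: implied by 85% expected confidence

support = both / total                   # share of customers holding both accounts
confidence = both / savings              # share of savings customers who also have checking
expected_confidence = checking / total   # share of all customers who have checking
lift = confidence / expected_confidence  # below 1: savings customers are less likely to have checking
```

Although the rule's confidence looks high, the lift is below 1, so knowing that a customer has a savings account makes a checking account slightly less likely than it is for customers overall.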
41
Barbie Doll ⇒ Candy
1. Put them closer together in the store.
2. Put them far apart in the store.
3. Package candy bars with the dolls.
4. Package Barbie + candy + poorly selling item.
5. Raise the price on one, and lower it on the other.
6. Offer Barbie accessories for proofs of purchase.
7. Do not advertise candy and Barbie together.
8. Offer candies in the shape of a Barbie doll.
43
Association Tool Demonstration
Analysis goal:
Explore associations between retail banking services used by customers.
Analysis plan:
Create an association data source.
Run an association analysis.
Interpret the association rules.
Run a sequence analysis.
Interpret the sequence rules.
46
Pattern Discovery Tools: Review
Generate cluster models using automatic settings and segmentation models with user-defined settings.
Compare within-segment distributions of selected inputs to overall distributions. This helps you understand segment definition.
Conduct market basket and sequence analysis on transactions data. The data source must have one target, one ID, and (if desired) one sequence variable.