Data Mining: Association Analysis (Market Basket)
APPLICATION OF ASSOCIATION MINING IN ANALYZING THE CONSUMER BEHAVIOR BY MARKET BASKET TRANSACTION
13.11.14 Association Analysis of Market Basket Transaction
Association Analysis of Market Basket Transaction
Prepared by: Sowmiyan Morri, Swapnil Soni
DoMS, IISc
Course: Data Mining
Instructor: Prof. Parthasarathy
Index
• Visualization of dataset
• Pre-processing of dataset
• Association analysis: 3 tasks (results, insights)
• Classification vs. association
• Conclusion & recommendation (for business, for business analyst)
Visualization of dataset
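Transaction ID | Item-1 | Item-2 | Item-3 | -- | Item-70 | Total
(item name) | Acorn Squash | Apple Brats | Bacon | -- | Yukon Gold Potatoes |
1 | T | F | F | -- | -- | 1
2 | F | F | F | -- | -- | 1
3 | F | F | T | -- | -- | 2
4 | F | F | F | -- | -- | 1
5 | F | F | F | -- | -- | 1
6 | F | T | F | -- | -- | 1
7 | F | F | F | -- | -- | 1
8 | F | F | F | -- | -- | 2
9 | F | F | F | -- | -- | 1
10 | F | F | F | -- | -- | 1
11 | F | F | F | -- | -- | 3
12 | F | F | F | -- | -- | 2
13 | F | F | F | -- | -- | 2
14 | F | F | F | -- | -- | 3
15 | F | F | F | -- | -- | 1
-- | -- | -- | -- | -- | -- | 2
1731 | -- | -- | -- | -- | -- | 1
Total (support count) | 76 | 38 | 39 | -- | 71 | 3815
Support (count / 1731) | 4.39% | 2.20% | 2.25% | -- | 4.10% |
Total no. of attributes/items: 70
Total no. of transactions: 1731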
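The Support row is the Total (support count) divided by the 1731 transactions. A minimal Python sketch of that calculation, assuming the table is held as rows of "T"/"F" strings; the toy rows and the helper name item_support are hypothetical, chosen only for illustration:

```python
# Hypothetical toy rows in the same T/F layout as the table above
# (the real dataset has 70 item columns and 1731 transactions).
ITEMS = ["Acorn Squash", "Apple Brats", "Bacon"]
ROWS = [
    ["T", "F", "F"],
    ["F", "F", "T"],
    ["T", "F", "T"],
    ["F", "T", "F"],
]


def item_support(rows, items):
    """Return {item: (support count, support fraction)} for a T/F table."""
    n = len(rows)
    result = {}
    for j, item in enumerate(items):
        count = sum(1 for row in rows if row[j] == "T")
        result[item] = (count, count / n)
    return result


if __name__ == "__main__":
    for item, (count, frac) in item_support(ROWS, ITEMS).items():
        print(f"{item}: count={count}, support={frac:.2%}")
    # The same formula reproduces the Support row, e.g. 76 / 1731:
    print(f"{76 / 1731:.2%}")  # 4.39%
```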
Visualization of dataset
[Figure: Frequency of Attributes (support count of 1-itemset)]
Statistics
Range [0,1731]
Average 54.5
Std Deviation 51.4
Min 1
Max 167
Attention: since an itemset can never be more frequent than any of its items, the maximum support an itemset can have = 167/1731 ≈ 9.6%.
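The 9.6% ceiling follows from the anti-monotonicity of support: every transaction that contains an itemset also contains each of its items, so support(X) can never exceed the support of its least frequent item. A small sketch that checks this on hypothetical toy transactions (item sets, not the real data); the helper names are illustrative only:

```python
from itertools import combinations

# Hypothetical toy transactions, each stored as the set of items bought.
TRANSACTIONS = [
    {"Butter", "Blue cheese"},
    {"Butter", "Black eye peas", "Blue cheese"},
    {"Bacon"},
    {"Butter"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def check_upper_bound(transactions, k=2):
    """Check support(X) <= min item support for every k-itemset X."""
    items = sorted(set().union(*transactions))
    for combo in combinations(items, k):
        bound = min(support([i], transactions) for i in combo)
        if support(combo, transactions) > bound:
            return False
    return True


if __name__ == "__main__":
    print(check_upper_bound(TRANSACTIONS))  # True
    # Slide figure: the most frequent single item has count 167 of 1731
    print(f"{167 / 1731:.1%}")  # 9.6%
```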
[Figure: No. of Items in Transaction, plotted per transaction ID]
Quite sparse dataset: pre-processing required!
Statistics
Range [0,70]
Average 2.20
Std Deviation 1.8
Min 1
Max 15
Real motivation: ‘Weka’ failed to handle the dataset!
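The statistics above can be computed from a sparse representation in which each transaction is stored as the set of items it contains, rather than as 70 T/F cells. A sketch under that assumption; the toy baskets and the helper name basket_stats are hypothetical, and since the slide does not say whether its standard deviation is a population or sample figure, the population version is assumed here:

```python
import statistics

# Hypothetical toy baskets; each transaction keeps only the items present.
BASKETS = [
    {"Butter", "Blue cheese"},
    {"Butter", "Black eye peas", "Blue cheese"},
    {"Bacon"},
    {"Salad Mix"},
]


def basket_stats(baskets, n_items):
    """Summarize basket sizes and the fraction of filled cells (density)."""
    sizes = [len(b) for b in baskets]
    return {
        "average": statistics.mean(sizes),
        "std": statistics.pstdev(sizes),  # population std (assumption)
        "min": min(sizes),
        "max": max(sizes),
        "density": sum(sizes) / (len(baskets) * n_items),
    }


if __name__ == "__main__":
    print(basket_stats(BASKETS, n_items=5))
    # For the full dataset: 3815 item occurrences / (1731 x 70) cells
    print(f"{3815 / (1731 * 70):.1%}")  # 3.1%
```

On the real data the same density figure (about 3.1% of cells set to T) is what makes the dataset "quite sparse".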
Pre-processing of dataset
Total no. of attributes/items: 70
Total no. of transactions: 1731
Total no. of attributes/items with support < 2%: 34
Total no. of items after pruning: 36
Pruning of attributes below the desired level of support
Logic (Apriori principle): if an individual item is not frequent, then none of its supersets can be frequent either.
Gain: computation and memory are reduced by pruning.
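The pruning step can be sketched as follows, assuming transactions are stored as Python sets of item names; the function name prune_items and the toy items are hypothetical. Items whose support falls below the threshold (2% on the slide) are dropped from every transaction before Apriori runs:

```python
from collections import Counter


def prune_items(transactions, min_support):
    """Drop items whose support is below min_support.

    Returns (kept items, transactions restricted to the kept items).
    By the Apriori principle, no itemset containing a dropped item
    can be frequent, so no frequent itemset is lost.
    """
    n = len(transactions)
    counts = Counter(item for t in transactions for item in t)
    keep = {item for item, c in counts.items() if c / n >= min_support}
    pruned = [t & keep for t in transactions]
    return keep, pruned


if __name__ == "__main__":
    toy = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
    kept, rows = prune_items(toy, min_support=0.5)
    print(sorted(kept))  # ['a', 'b']
    print(rows)
```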
Task-1
Fix the confidence level at 60%. Set the minimum support at 2%, 5%, 10%, 20%, and 50%, run the Apriori algorithm to discover association rules, and summarize your findings.
Task-1: Result
Confidence: 60%
Minimum Support | 2% | 5% | 10% | 20% | 50%
Rules generated | 297 | 22 | NA | NA | NA
Generated sets of large itemsets:
Size of set of large itemsets L(1) | 36 | 18 | NA | NA | NA
Size of set of large itemsets L(2) | 37 | 10 | NA | NA | NA
Size of set of large itemsets L(3) | 36 | 2 | NA | NA | NA
Size of set of large itemsets L(4) | 21 | NA | NA | NA | NA
Size of set of large itemsets L(5) | 5 | NA | NA | NA | NA
Total itemsets | 135 | 30 | 0 | 0 | 0
[Figure: Min Support Vs Rules @ 60% Confidence, plotting rules generated and itemsets against minimum support (2%, 5%)]
Inferences:
1. Frequent itemsets can be found only up to 5% minimum support, since no itemset can exceed the 9.6% maximum item support.
2. The number of frequent itemsets decreases as the minimum support increases.
3. At the fixed confidence level, the number of association rules decreases as the number of frequent itemsets decreases.
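The level-wise search behind the L(k) counts can be illustrated with a compact Apriori sketch in Python. It is not the Weka implementation used in the study, just a textbook version (join, prune, count) run on hypothetical toy baskets, to show why raising the minimum support shrinks every L(k); the function names are illustrative:

```python
from itertools import combinations


def count_frequent(candidates, transactions, min_support):
    """Keep the candidates whose support reaches min_support."""
    n = len(transactions)
    frequent = {}
    for c in candidates:
        s = sum(c <= t for t in transactions) / n
        if s >= min_support:
            frequent[c] = s
    return frequent


def apriori(transactions, min_support):
    """Return {k: {frozenset itemset: support}} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    singles = {frozenset([i]) for t in transactions for i in t}
    level = count_frequent(singles, transactions, min_support)
    result, k = {}, 1
    while level:
        result[k] = level
        k += 1
        prev = list(level)
        # Join: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(prev, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count_frequent(candidates, transactions, min_support)
    return result


# Hypothetical toy baskets (not the study's data).
BASKETS = [
    {"Butter", "Blue cheese", "Black eye peas"},
    {"Butter", "Blue cheese"},
    {"Butter", "Black eye peas"},
    {"Bacon"},
]

if __name__ == "__main__":
    for min_support in (0.25, 0.5, 0.75, 1.0):
        sizes = {k: len(v) for k, v in apriori(BASKETS, min_support).items()}
        print(min_support, sizes)
```

As on the slide, each increase in minimum support removes itemsets at every level, until nothing qualifies.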
Task-1: Insights
Top-10 Rules
Antecedent > Consequent
1. Butter Earthworm Segments > Black eye peas
2. Black eye peas Blue cheese > Butter
3. Black eye peas Butter > Earthworm Segments
4. Black eye peas > Earthworm Segments
5. Butter > Blue cheese
6. Black eye peas Butter > Blue cheese
7. Chilly Red Flame > Earthworm Segments
8. Blue cheese > Butter
9. Black eye peas Earthworm Segments > Butter
10. Basilisk Tail > Strawberry Essence
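Each rule above is read as Antecedent > Consequent, with confidence = support(Antecedent ∪ Consequent) / support(Antecedent). A hypothetical helper that enumerates every rule obtainable from one frequent itemset and keeps those meeting a confidence threshold; the toy baskets are illustrative only, not the study's data:

```python
from itertools import combinations


def rules_from_itemset(itemset, transactions, min_conf):
    """Rules X > Y with X | Y == itemset (X, Y non-empty), conf >= min_conf."""
    itemset = frozenset(itemset)
    baskets = [frozenset(t) for t in transactions]

    def count(s):
        return sum(s <= t for t in baskets)

    n_itemset = count(itemset)
    if n_itemset == 0:
        return []
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            antecedent = frozenset(antecedent)
            # Ratio of counts equals the ratio of supports.
            conf = n_itemset / count(antecedent)
            if conf >= min_conf:
                rules.append((antecedent, itemset - antecedent, conf))
    return rules


# Hypothetical toy baskets.
BASKETS = [
    {"Butter", "Blue cheese", "Black eye peas"},
    {"Butter", "Blue cheese"},
    {"Butter", "Black eye peas"},
    {"Bacon"},
]

if __name__ == "__main__":
    for x, y, conf in rules_from_itemset({"Butter", "Blue cheese"}, BASKETS, 0.6):
        print(sorted(x), ">", sorted(y), f"{conf:.0%}")
```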
Task-2
Fix the minimum support at 2%. Set the confidence level at 90%, 80%, 70%, 60%, and 50%, run the Apriori algorithm to discover association rules, and summarize your findings.
Task-2: Result
Minimum Support: 2%
Confidence | 90% | 80% | 70% | 60% | 50% | 40% | 30% | 20% | 10% | 5%
Rules generated | 134 | 140 | 245 | 297 | 417 | 478 | 596 | 734 | 734 | 734
Generated sets of large itemsets:
Size of set of large itemsets L(1) | 36 | 36 | 36 | 36 | 36 | 36 | 36 | 36 | 36 | 36
Size of set of large itemsets L(2) | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37 | 37
Size of set of large itemsets L(3) | 36 | 36 | 36 | 36 | 36 | 36 | 36 | 36 | 36 | 36
Size of set of large itemsets L(4) | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21 | 21
Size of set of large itemsets L(5) | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 5
Total itemsets | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135 | 135
[Figure: Confidence Vs Rules @ 2% Min Support, plotting rules generated and itemsets against confidence (90% down to 5%)]
Inferences:
1. At a fixed minimum support, the number of frequent itemsets stays constant regardless of confidence, because confidence only filters the rules generated from those itemsets.
2. The number of rules increases as the confidence level decreases.
3. The maximum number of rules that can be extracted at 2% minimum support is 734 (reached at 20% confidence and below).
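The 734 ceiling can be cross-checked from the L(k) counts alone. A frequent k-itemset can produce at most 2^k − 2 rules (any non-empty proper subset as antecedent, the remaining items as consequent), and 1-itemsets produce none. With L(2) to L(5) = 37, 36, 21, 5 this gives 74 + 216 + 294 + 150 = 734, so at 20% confidence and below every candidate rule passes. A short Python check (the function name max_rules is illustrative):

```python
def max_rules(level_sizes):
    """Upper bound on rule count: each frequent k-itemset yields 2**k - 2 rules."""
    return sum(count * (2 ** k - 2) for k, count in level_sizes.items())


if __name__ == "__main__":
    # L(k) sizes at 2% minimum support, from the Task-2 table
    print(max_rules({1: 36, 2: 37, 3: 36, 4: 21, 5: 5}))  # 734
    # L(k) sizes at 5% minimum support, from the Task-1 table
    print(max_rules({1: 18, 2: 10, 3: 2}))  # 32
```

At 5% support the same bound is 32, consistent with the 22 rules reported at 60% confidence in Task-1.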
Task-2: Insights
Antecedent > Consequent
1. Butter Earthworm Segments > Black eye peas
2. Black eye peas Blue cheese > Butter
3. Chilly Red Flame Black eye peas > Earthworm Segments
4. Garden soil Strawberry Essence > Salamander Skin
5. Basilisk Tail Salamander Skin > Strawberry Essence
6. Blue cheese Earthworm Segments > Black eye peas
7. Blue cheese Earthworm Segments > Butter
8. Butter Blue cheese Earthworm Segments > Black eye peas
9. Black eye peas Blue cheese Earthworm Segments > Butter
10. Blue cheese Earthworm Segments > Black eye peas Butter
Task-3
Identify the dairy products (milk, cheese, etc.) from the item list and group them into one binary variable. If a transaction has dairy products, replace them (only the dairy products) with the binary variable. Use it as the class label and build a decision tree using ID3 to predict the purchase of dairy products. Compare the rules generated from the decision tree with those generated earlier. Draw conclusions on the impact of minimum support and confidence levels.
Supervised Learning
Pre-determined Class Attribute: Dairy Product
Task-3: Pre-processing
Dairy Products (8 items): Blue cheese, Butter, Butter Cheese, Ewezerella Cheese, Feta Cheese, Juustoleipa Cheese, salted sweet cream butter, Vanilla Ice Cream
Total no. of independent attributes: 62 (70 items minus the 8 dairy products)
Total no. of transactions: 1731
Class attribute: Dairy Product
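Transaction ID | Item-1 | Item-2 | Item-3 | -- | Item-62 | Dairy Product (class attribute)
(item name) | Acorn Squash | Apple Brats | Bacon | -- | Yukon Gold Potatoes |
1 | T | F | F | -- | -- | F
2 | F | F | F | -- | -- | F
3 | F | F | T | -- | -- | F
4 | F | F | F | -- | -- | F
5 | F | F | F | -- | -- | F
6 | F | T | F | -- | -- | F
7 | F | F | F | -- | -- | F
8 | F | F | F | -- | -- | F
9 | F | F | F | -- | -- | F
10 | F | F | F | -- | -- | F
11 | F | F | F | -- | -- | F
12 | F | F | F | -- | -- | F
13 | F | F | F | -- | -- | F
14 | F | F | F | -- | -- | F
15 | F | F | F | -- | -- | F
-- | -- | -- | -- | -- | -- | F
1731 | -- | -- | -- | -- | -- | --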
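The Task-3 pre-processing can be sketched as below, assuming each transaction is a set of item names. The eight dairy items come from the list above; the function name to_classification and the toy data are hypothetical. Dairy items are removed from the feature columns and collapsed into one binary class label:

```python
# The eight dairy items listed on the Task-3 pre-processing slide.
DAIRY = frozenset({
    "Blue cheese", "Butter", "Butter Cheese", "Ewezerella Cheese",
    "Feta Cheese", "Juustoleipa Cheese", "salted sweet cream butter",
    "Vanilla Ice Cream",
})


def to_classification(transactions, all_items):
    """Split baskets into non-dairy T/F features and a binary Dairy label."""
    features = sorted(set(all_items) - DAIRY)  # 70 - 8 = 62 on the real data
    X = [[item in t for item in features] for t in transactions]
    y = [bool(t & DAIRY) for t in transactions]
    return features, X, y


if __name__ == "__main__":
    items = ["Bacon", "Butter", "Feta Cheese", "Salad Mix"]
    baskets = [{"Bacon", "Butter"}, {"Salad Mix"}, {"Feta Cheese"}]
    print(to_classification(baskets, items))
```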
Supervised classification: ID3 algorithm applied!
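ID3 grows the tree by repeatedly splitting on the attribute with the highest information gain. A minimal sketch of that selection step for binary (T/F) attributes, on hypothetical toy data; it computes only the root split, not the full recursive tree, and the function names are illustrative:

```python
import math


def entropy(labels):
    """Binary entropy (bits) of a list of True/False class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(column, labels):
    """Entropy reduction from splitting on one binary attribute."""
    n = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / n * entropy(subset)
    return gain


def best_attribute(X, y, features):
    """ID3 root choice: the attribute with the highest information gain."""
    gains = {
        name: information_gain([row[j] for row in X], y)
        for j, name in enumerate(features)
    }
    return max(gains, key=gains.get)


# Hypothetical toy data: two binary attributes, Dairy label in y.
FEATURES = ["Salad Mix", "Bacon"]
X = [[True, True], [True, False], [False, True], [False, False]]
y = [True, True, False, False]

if __name__ == "__main__":
    print(best_attribute(X, y, FEATURES))  # Salad Mix
```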
Task-3: Result
Rules: J48 decision tree
1 | | | | | | | | | | | | | | | | | | | | | | | | Salamander Skin = T > Dairy
2 | | | | | | | | | | Bacon = T > Dairy
3 | | | | | | | Chilly Red Flame = T > Dairy
4 | | | | | | Roast potato = T > Dairy
5 | | | | | | | Strawberry Essence = T > Dairy
6 | | | | | | Bacon = T > Dairy
7 | | | | Ground Chicken = T > Dairy
8 | | | Red Potatoes = T > Dairy
9 | Salad Mix = T > Dairy
Rules: ID3 decision tree
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Salad Mix = T > Dairy
2 Black eye peas = T > Dairy
Observation: the highlighted rules above are common to both association and classification.
Conclusion & Recommendation
For Business Analyst
Comparison between Supervised & Unsupervised learning
Supervised learning- Classification
• A large number of ‘Binary’ attributes blows up into a huge, uninterpretable decision tree.
• Highly conditional decisions: if item-1 is absent, item-2 is absent, … and so on, then Dairy product = Yes.
• Symmetric treatment: the ‘Presence’ and ‘Absence’ of an item in a transaction are treated with equal importance.
Unsupervised learning- Association
• A large number of ‘Binary’ attributes is handled prudently using the ‘Min Support’ criterion; only qualified attributes/itemsets are considered for analysis.
• Asymmetric treatment: only the ‘Presence’ of an item in a transaction is of interest.
• Simple, interpretable rules are required for market basket transactions to design market strategies such as ‘Cross selling’.
Association mining is observed to be the better technique for Market Basket Analysis!
Conclusion & Recommendation
For Business
• Trivial associations: the relation among the dairy products ‘Butter’ and ‘Blue Cheese’ and ‘Black Eye Peas’ seems obvious, as they act as supplements of Vitamin ‘D’. The relation between ‘Salamander Skin’ and ‘Strawberry Essence’ is expected, as both are used to make Salamander Brandy through fermentation.
• Non-trivial associations: the relation between ‘Garden Soil’ and ‘Strawberry Essence’; the relation between ‘Earthworm Segments’ and ‘Black Eye Peas’.
Cross selling: a good opportunity to maximize revenue by deploying ‘Association Mining’!
Sources: http://www.grailtrail.ndo.co.uk/grails/brandy.html, http://greatist.com/health/18-surprising-dairy-free-sources-calcium
References & Tools
• Wikipedia
• Greatist.com
• MS Excel
• Data mining tool: Weka
Thank you!
Inference: Data mining enables businesses!