Data mining- Association Analysis -market basket

APPLICATION OF ASSOCIATION MINING IN ANALYZING THE CONSUMER BEHAVIOR BY MARKET BASKET TRANSACTION. 13.11.14. Association Analysis of Market Basket Transaction. Prepared by: Sowmiyan Morri, Swapnil Soni (DoMS, IISc). Course: Data Mining. Instructors: Prof Parthasarathy.


Page 1: Data mining- Association Analysis -market basket

APPLICATION OF ASSOCIATION MINING IN ANALYZING THE CONSUMER BEHAVIOR BY MARKET BASKET TRANSACTION

13.11.14

Association Analysis of Market Basket Transaction

Prepared by: Sowmiyan Morri, Swapnil Soni

DoMS, IISc

Course: Data Mining

Instructors: Prof Parthasarathy

Page 2: Data mining- Association Analysis -market basket

Index

• Visualization of dataset
• Pre-processing of dataset
• Association analysis: 3 tasks
  - Results
  - Insights
• Classification Vs Association
• Conclusion & Recommendation
  - For Business
  - For Business Analyst

Page 3: Data mining- Association Analysis -market basket

Visualization of dataset

Transaction ID  Item-1        Item-2       Item-3  --  Item-70              Total
                Acorn Squash  Apple Brats  Bacon   --  Yukon Gold Potatoes
1               T             F            F       --  --                   1
2               F             F            F       --  --                   1
3               F             F            T       --  --                   2
4               F             F            F       --  --                   1
5               F             F            F       --  --                   1
6               F             T            F       --  --                   1
7               F             F            F       --  --                   1
8               F             F            F       --  --                   2
9               F             F            F       --  --                   1
10              F             F            F       --  --                   1
11              F             F            F       --  --                   3
12              F             F            F       --  --                   2
13              F             F            F       --  --                   2
14              F             F            F       --  --                   3
15              F             F            F       --  --                   1
--              --            --           --      --  --                   2
1731            --            --           --      --  --                   1
Total           76            38           39      --  71                   3815
Support         4.39%         2.20%        2.25%   --  4.10%

Total no. of Attributes/Items 70

Total no. of Transactions 1731
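The Support row above is the item's transaction count divided by the total number of transactions. A minimal Python sketch of that arithmetic follows; the four toy baskets and the helper name `item_support` are illustrative assumptions, not the course dataset or the tool used in the slides.

```python
from collections import Counter

# Toy baskets (hypothetical, for illustration only; not the 1731-row dataset).
transactions = [
    {"Acorn Squash"},
    {"Bacon", "Butter"},
    {"Apple Brats"},
    {"Butter", "Blue cheese"},
]

def item_support(transactions):
    """Return {item: fraction of transactions that contain the item}."""
    n = len(transactions)
    counts = Counter(item for basket in transactions for item in set(basket))
    return {item: count / n for item, count in counts.items()}

support = item_support(transactions)
print(support["Butter"])          # 0.5 (2 of the 4 toy baskets)

# Same arithmetic as the slide: 76 of 1731 transactions contain Acorn Squash.
print(round(76 / 1731 * 100, 2))  # 4.39
```

The same division gives 38/1731 = 2.20% for Apple Brats and 71/1731 = 4.10% for Yukon Gold Potatoes.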

Page 4: Data mining- Association Analysis -market basket

Visualization of dataset

[Chart: Frequency of Attributes (support count of 1-itemsets); y-axis 0 to 180]

Statistics
Range [0,1731]
Average 54.5
Std Deviation 51.4
Min 1
Max 167

Attention: The maximum support an itemset can have = 167/1731 = 9.6%

[Chart: No. of Items in Transaction, by transaction ID; y-axis 0 to 16]

Quite sparse dataset. Pre-processing required!

Statistics
Range [0,70]
Average 2.20
Std Deviation 1.8
Min 1
Max 15

Real motivation: ‘Weka’ failed to handle the dataset!

Page 5: Data mining- Association Analysis -market basket

Pre-processing of dataset

Total no. of Attributes/Items 70
Total no. of Transactions 1731
Total no. of Attributes/Items with support < 2%: 34
Total no. of Items after pruning: 36

Pruning of attributes below the desired level of support
Logic: Apriori principle. If an itemset is not frequent, then none of its supersets can be frequent.
Gain: Pruning reduces both computation and memory.
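The pruning step can be sketched as below. This is a minimal illustration under assumptions: the helper name `prune_infrequent_items` and the toy baskets are hypothetical, and the slides do not state how the pruning was actually carried out (MS Excel and Weka are the listed tools).

```python
def prune_infrequent_items(transactions, min_support):
    """Drop every item whose support is below min_support.

    Returns (kept_items, pruned_transactions). By the Apriori principle,
    an item below the threshold cannot appear in any frequent itemset,
    so removing it loses nothing for the later mining step.
    """
    n = len(transactions)
    counts = {}
    for basket in transactions:
        for item in set(basket):
            counts[item] = counts.get(item, 0) + 1
    kept = {item for item, count in counts.items() if count / n >= min_support}
    pruned = [set(basket) & kept for basket in transactions]
    return kept, pruned

# Toy baskets (hypothetical, not the course dataset).
baskets = [{"Butter", "Bacon"}, {"Butter"}, {"Butter", "Apple Brats"}, {"Bacon"}]
kept, pruned = prune_infrequent_items(baskets, min_support=0.5)
print(sorted(kept))  # ['Bacon', 'Butter']
```

On the slide's numbers, the same filter at a 2% threshold removes 34 of the 70 items and leaves 36.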

Page 6: Data mining- Association Analysis -market basket

Task-1

Fix the confidence level at 60%. Set the minimum support at 2%, 5%, 10%, 20%, and 50%, run the Apriori algorithm to discover association rules, and summarize your findings.

Page 7: Data mining- Association Analysis -market basket

Task-1 : Result

Confidence 60%

Minimum Support                      2%    5%    10%   20%   50%
Rules generated                      297   22    NA    NA    NA
Generated sets of large itemsets:
Size of set of large itemsets L(1)   36    18    NA    NA    NA
Size of set of large itemsets L(2)   37    10    NA    NA    NA
Size of set of large itemsets L(3)   36    2     NA    NA    NA
Size of set of large itemsets L(4)   21    NA    NA    NA    NA
Size of set of large itemsets L(5)   5     NA    NA    NA    NA
Total Itemsets                       135   30    0     0     0

[Chart: Min Support Vs Rules @ 60% Confidence; x-axis Minimum Support (2%, 5%); series Rules generated (297, 22) and Itemsets (135, 30)]

Inferences
1. Among the tested thresholds, frequent itemsets are found only at 2% and 5% Min Support; the most frequent single item has a support of 9.6%, so no itemset qualifies at 10% or above.
2. The number of frequent itemsets reduces with an increase in Min Support.
3. At the fixed confidence level, the number of association rules decreases as the number of frequent itemsets decreases.
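The effect of Min Support on the number of frequent itemsets can be reproduced on a small scale. The following is a minimal pure-Python sketch of level-wise Apriori, not the Weka implementation used for the results above; the function name `apriori` and the five toy baskets are assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1 candidates: every distinct item on its own.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        level = {c: support(c) for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-itemsets; keep a candidate only
        # if all of its (k-1)-subsets are frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent

# Toy baskets (hypothetical; item names borrowed from the slides).
baskets = [
    {"Butter", "Blue cheese", "Black eye peas"},
    {"Butter", "Blue cheese"},
    {"Butter", "Black eye peas"},
    {"Bacon"},
    {"Butter", "Blue cheese", "Black eye peas"},
]
for threshold in (0.4, 0.6, 0.9):
    print(threshold, len(apriori(baskets, threshold)))
# 0.4 7
# 0.6 5
# 0.9 0
```

As in the Task-1 table, raising the threshold can only shrink the set of frequent itemsets, and once it exceeds the support of the most frequent item nothing remains.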

Page 8: Data mining- Association Analysis -market basket

Task-1: Insights

Top-10 Rules

     Antecedent                              Consequent
1.   Butter, Earthworm Segments          =>  Black eye peas
2.   Black eye peas, Blue cheese         =>  Butter
3.   Black eye peas, Butter              =>  Earthworm Segments
4.   Black eye peas                      =>  Earthworm Segments
5.   Butter                              =>  Blue cheese
6.   Black eye peas, Butter              =>  Blue cheese
7.   Chilly Red Flame                    =>  Earthworm Segments
8.   Blue cheese                         =>  Butter
9.   Black eye peas, Earthworm Segments  =>  Butter
10.  Basilisk Tail                       =>  Strawberry Essence

Page 9: Data mining- Association Analysis -market basket

Task-2

Fix the minimum support at 2%. Set the confidence level at 90%, 80%, 70%, 60%, and 50%, run the Apriori algorithm to discover association rules, and summarize your findings.

Page 10: Data mining- Association Analysis -market basket

Task-2 : Result

Minimum Support 2%

Confidence                           90%   80%   70%   60%   50%   40%   30%   20%   10%   5%
Rules generated                      134   140   245   297   417   478   596   734   734   734
Generated sets of large itemsets:
Size of set of large itemsets L(1)   36    36    36    36    36    36    36    36    36    36
Size of set of large itemsets L(2)   37    37    37    37    37    37    37    37    37    37
Size of set of large itemsets L(3)   36    36    36    36    36    36    36    36    36    36
Size of set of large itemsets L(4)   21    21    21    21    21    21    21    21    21    21
Size of set of large itemsets L(5)   5     5     5     5     5     5     5     5     5     5
Total                                135   135   135   135   135   135   135   135   135   135

[Chart: Confidence Vs Rules @ 2% Min Support; x-axis Confidence (90% down to 5%); series Rules generated and Itemsets]

Inference
1. At a fixed Min Support, the number of frequent itemsets remains constant irrespective of Confidence.
2. The number of rules increases as the Confidence level decreases.
3. The maximum number of rules that can be extracted at the given Min Support is 734, reached at a Confidence of 20% and below.
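The pattern in these inferences follows from how rules are derived: each frequent itemset is split into an antecedent and a consequent, and a rule is kept when support(itemset) / support(antecedent) meets the confidence threshold. The sketch below illustrates this with hypothetical support counts (out of 5 toy transactions); `generate_rules` is an assumed helper name, not Weka's API.

```python
from itertools import combinations

# Hypothetical support counts for a closed family of frequent itemsets
# (every subset of a listed itemset is also listed).
support_count = {
    frozenset({"Butter"}): 4,
    frozenset({"Blue cheese"}): 3,
    frozenset({"Black eye peas"}): 3,
    frozenset({"Butter", "Blue cheese"}): 3,
    frozenset({"Butter", "Black eye peas"}): 3,
    frozenset({"Blue cheese", "Black eye peas"}): 2,
    frozenset({"Butter", "Blue cheese", "Black eye peas"}): 2,
}

def generate_rules(support_count, min_confidence):
    """Return (antecedent, consequent, confidence) for every qualifying rule."""
    rules = []
    for itemset, count in support_count.items():
        # Every non-empty proper subset of the itemset can be an antecedent.
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = count / support_count[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules

for threshold in (0.9, 0.7, 0.6, 0.5):
    print(threshold, len(generate_rules(support_count, threshold)))
# 0.9 3
# 0.7 5
# 0.6 11
# 0.5 12
```

The itemsets never change with the confidence threshold; only the filter on candidate rules does. Once every candidate rule passes (12 in this toy case, 734 in the slides), lowering the confidence further adds nothing.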

Page 11: Data mining- Association Analysis -market basket

Task-2 : Insights

     Antecedent                                       Consequent
1.   Butter, Earthworm Segments                   =>  Black eye peas
2.   Black eye peas, Blue cheese                  =>  Butter
3.   Chilly Red Flame, Black eye peas             =>  Earthworm Segments
4.   Garden soil, Strawberry Essence              =>  Salamander Skin
5.   Basilisk Tail, Salamander Skin               =>  Strawberry Essence
6.   Blue cheese, Earthworm Segments              =>  Black eye peas
7.   Blue cheese, Earthworm Segments              =>  Butter
8.   Butter, Blue cheese, Earthworm Segments      =>  Black eye peas
9.   Black eye peas, Blue cheese, Earthworm Segments  =>  Butter
10.  Blue cheese, Earthworm Segments              =>  Black eye peas, Butter

Page 12: Data mining- Association Analysis -market basket

Task-3

Identify the dairy products (milk, cheese etc.) from the item list and group them into one binary variable. If a transaction has dairy products, replace them (only the dairy products) with the binary variable. Use it as the class label and build a decision tree using ID3 to predict the purchase of dairy products. Compare the rules generated from the decision tree with those generated earlier. Draw conclusions on the impact of minimum support and confidence levels.

Supervised Learning

Pre-determined Class Attribute: Dairy Product

Page 13: Data mining- Association Analysis -market basket

Task-3 : Pre-processing

Dairy Products (8 Nos.)
• Blue cheese
• Butter
• Butter Cheese
• Ewezerella Cheese
• Feta Cheese
• Juustoleipa Cheese
• salted sweet cream butter
• Vanilla Ice Cream

Total no. of Independent Attributes 62
Total no. of Transactions 1731
Class Attribute: Dairy Product

Transaction ID  Item-1        Item-2       Item-3  --  Item-62              Class Attribute
                Acorn Squash  Apple Brats  Bacon   --  Yukon Gold Potatoes  Dairy Product
1               T             F            F       --  --                   F
2               F             F            F       --  --                   F
3               F             F            T       --  --                   F
4               F             F            F       --  --                   F
5               F             F            F       --  --                   F
6               F             T            F       --  --                   F
7               F             F            F       --  --                   F
8               F             F            F       --  --                   F
9               F             F            F       --  --                   F
10              F             F            F       --  --                   F
11              F             F            F       --  --                   F
12              F             F            F       --  --                   F
13              F             F            F       --  --                   F
14              F             F            F       --  --                   F
15              F             F            F       --  --                   F
--              --            --           --      --  --                   F
1731            --            --           --      --  --                   --

Supervised classification: ID3 algorithm applied!
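The relabelling described for Task-3 can be sketched as follows. The eight dairy item names are taken from the slide; the helper name `label_dairy` and the example baskets are hypothetical, and the slides do not say how the transformation was actually performed.

```python
# The eight dairy items listed on the slide.
DAIRY_ITEMS = {
    "Blue cheese",
    "Butter",
    "Butter Cheese",
    "Ewezerella Cheese",
    "Feta Cheese",
    "Juustoleipa Cheese",
    "salted sweet cream butter",
    "Vanilla Ice Cream",
}

def label_dairy(basket):
    """Split a basket into (non-dairy items, dairy class label).

    The dairy items are removed from the attribute set and replaced by a
    single True/False class label, as the task statement asks.
    """
    basket = set(basket)
    has_dairy = bool(basket & DAIRY_ITEMS)
    return basket - DAIRY_ITEMS, has_dairy

print(label_dairy({"Bacon", "Butter"}))  # ({'Bacon'}, True)
print(label_dairy({"Acorn Squash"}))     # ({'Acorn Squash'}, False)
```

Removing the 8 dairy columns from the 70 items leaves the 62 independent attributes reported above.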

Page 14: Data mining- Association Analysis -market basket

Task-3 : Result

J48 Decision Tree

Rules : J48
1 | | | | | | | | | | | | | | | | | | | | | | | | Salamander Skin = T => Dairy
2 | | | | | | | | | | Bacon = T => Dairy
3 | | | | | | | Chilly Red Flame = T => Dairy
4 | | | | | | Roast potato = T => Dairy
5 | | | | | | | Strawberry Essence = T => Dairy
6 | | | | | | Bacon = T => Dairy
7 | | | | Ground Chicken = T => Dairy
8 | | | Red Potatoes = T => Dairy
9 | Salad Mix = T => Dairy

Rules : ID3
1 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Salad Mix = T => Dairy
2 Black eye peas = T => Dairy

Observation: The highlighted rules above are common to both association analysis and classification.
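For context on how such a tree is grown: ID3 chooses, at each node, the attribute with the highest information gain with respect to the class label. A minimal sketch of that criterion follows; the function names and the four-row toy data are assumptions for illustration and do not reproduce the Weka run.

```python
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / n
        result -= p * log2(p)
    return result

def information_gain(attribute, labels):
    """Reduction in label entropy obtained by splitting on the attribute."""
    n = len(labels)
    remainder = 0.0
    for value in set(attribute):
        subset = [l for a, l in zip(attribute, labels) if a == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

# Toy data (hypothetical): one row per transaction.
dairy     = [True, True, False, False]   # class label
salad_mix = [True, True, False, False]   # perfectly predicts the label
bacon     = [True, False, True, False]   # carries no information

print(information_gain(salad_mix, dairy))  # 1.0
print(information_gain(bacon, dairy))      # 0.0
```

Note that the criterion treats an item's absence (F) exactly like its presence (T) when evaluating a split.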

Page 15: Data mining- Association Analysis -market basket

Conclusion & Recommendation

For Business Analyst

Comparison between Supervised & Un-supervised learning

Supervised learning: Classification
• A large number of ‘Binary’ attributes makes the decision tree explode into a huge, uninterpretable structure.
• Highly conditional decisions: if item-1 is absent, item-2 is absent, and so on, then Dairy product = Yes.
• Symmetric treatment: ‘Presence’ and ‘Absence’ of an item in a transaction are treated with equal importance.

Unsupervised learning: Association
• A large number of ‘Binary’ attributes is handled prudently using the ‘Min Support’ criterion; only qualified attributes/itemsets are considered for analysis.
• Asymmetric treatment: only the ‘Presence’ of an item in a transaction is of interest.
• Simple and interpretable rules are required for market basket transactions to design market strategies such as ‘Cross selling’.

Association mining is observed to be the better technique for Market Basket Analysis!

Page 16: Data mining- Association Analysis -market basket

Conclusion & Recommendation

For Business

Good opportunity to maximize revenue by deploying ‘Association Mining’!

• Trivial Associations
  - The relation among dairy products (‘Butter’, ‘Black Eye Peas’, ‘Blue Cheese’) seems obvious, as they act as supplements of Vitamin ‘D’.
  - The relation between ‘Salamander Skin’ & ‘Strawberry Essence’ is observed, as they are used for Salamander Brandy through a fermentation process.

• Non-trivial Associations
  - Relation between ‘Garden Soil’ & ‘Strawberry Essence’
  - Relation between ‘Earthworm’ & ‘Black Eye Peas’

Cross selling

Source: http://www.grailtrail.ndo.co.uk/grails/brandy.html
http://greatist.com/health/18-surprising-dairy-free-sources-calcium

Page 17: Data mining- Association Analysis -market basket

References & Tools

• Wikipedia
• Greatist.com
• MS Excel
• Data Mining tool: Weka

Page 18: Data mining- Association Analysis -market basket

Thank you!

Inference: Data Mining enables businesses!