Randomly Sampling Maximal Itemsets Sandy Moens and Bart Goethals


Source: poloclub.gatech.edu/idea2013/papers/IDEA_RMIS.pdf · Author: Sandy Moens · Created Date: 8/11/2013 11:44:08 AM


Frequent Itemset Mining

•  Finding interesting patterns, e.g. by support

•  Problems:
   -  Much redundancy
   -  Many, many patterns
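To make the redundancy problem concrete, here is a minimal Python sketch that counts support and enumerates every frequent itemset by brute force. The toy transactions, the `min_support` value, and the function names are illustrative assumptions and do not come from the slides.

```python
from itertools import combinations

# Hypothetical toy transaction database (an assumption, not from the slides).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of all non-empty frequent itemsets."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[frozenset(candidate)] = s
    return result


frequent = frequent_itemsets(TRANSACTIONS, min_support=2)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

On this toy data, every frequent itemset is a subset of {a, b, c}, so a single long pattern already implies all the shorter ones. That is the kind of redundancy the slide refers to.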


Pattern Set Mining

•  Less redundancy
•  Fewer patterns
•  But: large enumeration space!

Step 1: Enumerate
Step 2: Filter
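The two steps can be sketched in Python as follows. Keeping only the maximal frequent itemsets is used here as one possible filter. This choice, the toy data, and all names are assumptions made for illustration, not the authors' procedure.

```python
from itertools import combinations

# Hypothetical toy transaction database (an assumption, not from the slides).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
    {"c", "d"},
]


def enumerate_frequent(transactions, min_support):
    """Step 1: enumerate all non-empty frequent itemsets (brute force)."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            candidate = frozenset(candidate)
            if sum(1 for t in transactions if candidate <= t) >= min_support:
                frequent.append(candidate)
    return frequent


def filter_maximal(itemsets):
    """Step 2: keep only itemsets that have no proper superset in the list."""
    return [x for x in itemsets if not any(x < y for y in itemsets)]


candidates = enumerate_frequent(TRANSACTIONS, min_support=2)
maximal = filter_maximal(candidates)
print(len(candidates), "frequent itemsets,", len(maximal), "maximal")
```

The filter itself is cheap. The cost sits in step 1, which must visit the whole enumeration space before anything can be discarded.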


Output Space Sampling

•  No explicit enumeration


Random Maximal Itemset Sampling

•  Long patterns with low support
   -  E.g. microarray data, recommendation

•  Simple random walk over extensions
   -  Quality measure q
   -  Approximation measure p
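A rough Python sketch of such a random walk is given below. It assumes that q is support checked against a minimum threshold, and that p is also support and serves as the sampling weight for each candidate extension. The slides do not spell out these choices, so they, the toy data, and the function names should be read as assumptions rather than the authors' exact algorithm.

```python
import random

# Hypothetical toy transaction database (an assumption, not from the slides).
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a", "b"},
    {"c", "d"},
    {"c", "d"},
]


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def sample_maximal_itemset(transactions, min_support, q=support, p=support, rng=random):
    """Walk from the empty set, adding one item at a time until stuck.

    Assumes q is monotone decreasing under extension (as support is) and
    min_support >= 1, so every sampling weight is strictly positive.
    """
    items = sorted(set().union(*transactions))
    current = frozenset()
    while True:
        # Items whose addition keeps the quality q above the threshold.
        extensions = [
            i for i in items
            if i not in current and q(current | {i}, transactions) >= min_support
        ]
        if not extensions:
            return current  # no valid extension left: the itemset is maximal
        # Pick the next item with probability proportional to p.
        weights = [p(current | {i}, transactions) for i in extensions]
        current = current | {rng.choices(extensions, weights=weights, k=1)[0]}


print(sorted(sample_maximal_itemset(TRANSACTIONS, min_support=2)))
```

Each call returns one maximal itemset without enumerating the others. Different runs may end in different maximal itemsets, with frequencies that depend on the weights p.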


Random Walk


Spreading the Search

•  Uniform Metropolis-Hastings
   -  E.g. Hasan and Zaki, Musk: Uniform sampling of k-maximal patterns (SDM’09)

•  Weight approximation score
   -  Additive
   -  Multiplicative
   -  Adaptive


DEMO TIME