Facilitating Interactive Mining of Global and Local Association Rules

  • Slide 1
  • Facilitating Interactive Mining of Global and Local Association Rules. Abhishek Mukherji*, Elke A. Rundensteiner, Matthew O. Ward. Department of Computer Science, Worcester Polytechnic Institute, MA, USA. *Samsung Research America, CA, USA. XmdvTool is an open-source multivariate visual analytics tool developed at WPI with a series of NSF grants over the past 20 years (http://sourceforge.net/projects/xmdvtool/). This PhD research work was partly supported by NSF under grants IIS-0812027, CCF-0811510 and IIS-1117139.
  • Slide 2
  • Era of Big Data. And we are DRIVING! Volume, Veracity, Variety, Velocity. 1. Where's the Data in the Big Data Wave? Gerhard Weikum, Res. Director at Max Planck Institute, http://wp.sigmod.org/?p=786. 2. Analytic DB Technology for the Data Enthusiast. Pat Hanrahan, Stanford & Tableau, SIGMOD'12 Keynote Talk. 11/03/2014
  • Slide 3
  • XmdvTool's Efforts Towards This Paradigm Shift. Visualize Static Data. I. Visualize Stream & Sensor Data: SNIFTool & FireStream, ViStream*. II. Visualize Mined Results: PARAS/FIRE, COLARM. Visualize Data Records. *Di Yang et al., Interactive visual exploration of neighbor-based patterns in data streams, ACM SIGMOD'10 Demo.
  • Slide 4
  • Summary of Graduate Research Works
    I. Stream & Sensor Data Processing
       1. SNIFTool/FireStream: Discover Patterns in Live Stream [CIKM'08, ICDE Demo'07]
       2. JAQPOT: High Velocity Streams MJoin Exec. [BNCOD'11]
    II. Interactive Mining
       1. PARAS/FIRE [VLDB'13, SIGMOD'13, CIKM'13]
       2. COLARM [EDBT'14]
    III. Scalable Nugget-guided Hypothesis Testing
       1. SPHINX: Evidence-Hypotheses Explor. [CIKM'13]
       2. Iterative Multi-Evidence-Hypotheses Model
    CAPE*, XMDVTool^
    * http://davis.wpi.edu/dsrg/PROJECTS/CAPE/index.html
    ^ http://davis.wpi.edu/xmdv/index.html
  • Slide 5
  • PARAS/FIRE: Interactive Visual Support for Parameter Space-Driven Mining of Global Rules [PVLDB 2013, SIGMOD 2013, CIKM 2013] Joint work with Xika Lin, Christopher Ryan Botaish, Jason Whitehouse, Elke A. Rundensteiner, Matthew O. Ward Department of Computer Science, Worcester Polytechnic Institute (WPI), MA, USA.
  • Slide 6
  • Association Rule Mining (ARM) Basics
    RecordID | Age | Married | NumCars
    100      | 23  | No      | 1
    200      | 25  | Yes     | 1
    300      | 29  | No      | 0
    400      | 34  | Yes     | 2
    500      | 38  | Yes     | 2
    Support = 40%, Confidence = 100%
    Which customers to target for multi-car discount promos?
    R. Agrawal and R. Srikant, Fast algorithms for mining association rules in large databases, VLDB'94.
    R. Srikant and R. Agrawal, Mining quantitative association rules in large relational tables, SIGMOD'96.
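The support and confidence figures on this slide can be reproduced with a short sketch. The slide text does not preserve the rule itself, so the rule used here (Age in 30..39 and Married = Yes implies NumCars = 2) is an assumption, chosen because it matches the table and the stated 40% / 100% values. The function and variable names are illustrative only.

```python
# Records from the slide's table: (RecordID, Age, Married, NumCars).
records = [
    (100, 23, False, 1),
    (200, 25, True, 1),
    (300, 29, False, 0),
    (400, 34, True, 2),
    (500, 38, True, 2),
]


def antecedent(rec):
    # Assumed antecedent (not shown in the slide text): Age in 30..39 and Married = Yes.
    _, age, married, _ = rec
    return 30 <= age <= 39 and married


def consequent(rec):
    # Assumed consequent (not shown in the slide text): NumCars = 2.
    return rec[3] == 2


def support_and_confidence(records, antecedent, consequent):
    """Support = fraction of all records matching both sides of the rule;
    confidence = fraction of antecedent-matching records that also match the consequent."""
    n_antecedent = sum(1 for r in records if antecedent(r))
    n_both = sum(1 for r in records if antecedent(r) and consequent(r))
    support = n_both / len(records)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence


support, confidence = support_and_confidence(records, antecedent, consequent)
print(f"support = {support:.0%}, confidence = {confidence:.0%}")
```

Under that assumed rule, records 400 and 500 match both sides, which gives 2 of 5 records for support and 2 of 2 antecedent matches for confidence.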
  • Slide 7
  • Motivation for Interactive Mining
    [Diagram: Data Analyst, Data Miner, (minsupp, minconf), {ARs}]
    Limitations: Unacceptably long response time. Trial-and-error iterations. Forced to rerun for each subset.
    Research Goals: Improve turnaround times of mining queries. Provide parameter recommendations. Preprocess data to enable fast interactive mining experience.
    C.C. Aggarwal and P.S. Yu, A new approach to online generation of association rules, IEEE TKDE'01.
    C. Hidber, Online Association Rule Mining, SIGMOD'99.
    B. Nag, P. M. Deshpande, and D. J. DeWitt, Using a knowledge cache for interactive discovery of association rules, SIGKDD'99.
    M. Kubat et al., Itemset trees for targeted association querying, IEEE TKDE'03.
    M. Kaya and R. Alhajj, Online mining of fuzzy multidimensional weighted association rules, Applied Intelligence'08.
  • Slide 8
  • [Figure: itemset lattice over {X, Y, Z} with nodes {}, X, Y, Z, XY, XZ, YZ, XYZ; values shown: 100, 80, 60, 40, 20, 10.] I. Frequent Itemset Generation. II. Rule Generation. Offline, Online. Assumptions: 1. Cost(Freq. Itemset Generation) >> Cost(Rule Generation), 2. Count(Itemsets) ...
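The two phases named on this slide, frequent itemset generation followed by rule generation, can be sketched as below. This is a brute-force illustration rather than the PARAS algorithm: the transactions, thresholds, and function names are hypothetical and are not taken from the slides. It shows why the first phase tends to dominate the cost: phase I scans the data for every candidate itemset, while phase II only looks up supports that were already stored.

```python
from itertools import combinations

# Hypothetical toy transactions over items X, Y, Z (not data from the slides).
transactions = [
    {"X", "Y", "Z"},
    {"X", "Y"},
    {"X", "Z"},
    {"X"},
    {"Y", "Z"},
]


def frequent_itemsets(transactions, minsupp):
    """Phase I: scan the transactions once per candidate itemset and keep
    those whose support reaches minsupp. Returns {itemset: support}."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= minsupp:
                frequent[frozenset(candidate)] = count / n
    return frequent


def generate_rules(frequent, minconf):
    """Phase II: derive rules from the stored supports only, with no further
    pass over the transactions. Every subset of a frequent itemset is itself
    frequent, so the lookup frequent[lhs] always succeeds."""
    rules = []
    for itemset, supp in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), size):
                lhs = frozenset(lhs)
                conf = supp / frequent[lhs]
                if conf >= minconf:
                    rhs = itemset - lhs
                    rules.append((tuple(sorted(lhs)), tuple(sorted(rhs)), supp, conf))
    return rules


frequent = frequent_itemsets(transactions, minsupp=0.4)
rules = generate_rules(frequent, minconf=0.6)
for lhs, rhs, supp, conf in rules:
    print(f"{lhs} => {rhs}  supp={supp:.2f} conf={conf:.2f}")
```

Because phase II never touches the transactions, rerunning it with a different minconf is cheap as long as the phase I output is kept.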
  • Slide 22
  • 2. Pre-processing Times. PARAS requires ~10% extra offline preprocessing time compared with AdjLatticeRR. Rule generation: T5000k = 4 sec, Webdocs = 220 sec. Confirmed: Cost(Freq. Itemset Generation) >> Cost(Rule Generation). C.C. Aggarwal and P.S. Yu, A new approach to online generation of association rules, IEEE TKDE'01. B. Nag, P. M. Deshpande, and D. J. DeWitt, Using a knowledge cache for interactive discovery of association rules, SIGKDD'99.
  • Slide 23
  • FIRE: User Study Questions
    Stable Region Usage Tests
       T1: What are the most prominent rules by support and confidence?
       T2: Which setting (out of a choice of 4) returns a different set of rules?
       T3: Find the common and unique rules for two distinct parameter settings.
    Filter/Redundancy Test
       T4: Find the most frequent characteristics of edible and poisonous mushrooms.
    Skyline View Test
       T5: Find the parameter settings that produce the top-k rules in the dataset, where k = 20, 50, 100.
    Setup: 22 subjects. Mushroom and chess datasets. Cached Rule Miner (CRM) versus FIRE. Randomization to eliminate pre-knowledge.
  • Slide 24
  • Mushroom Dataset: Tasks 1, 2 and 3. Overall, FIRE outperforms the competing CRM approach: users achieve similar or better accuracy while spending significantly less time on the tasks.
  • Slide 25
  • Tasks 4 and 5. Overall, FIRE outperforms the competing CRM approach: users achieve similar or better accuracy while spending significantly less time on the tasks.
  • Slide 26
  • Conclusion. Gains of several orders of magnitude when using PARAS for online processing outweigh the one-time minimal offline preprocessing time and storage requirements. We proposed a novel parameter space model, developed optimal algorithms and designed effective visualizations to facilitate interactive rule exploration by tackling challenges related to both computational and visualization aspects of online rule mining. Our user study establishes the usability and effectiveness of the proposed features and interactions of the FIRE system in facilitating interactive rule mining.
  • Slide 27
  • Recent works at Samsung Research America
    MobileMiner: Mining Your Frequent Behavior Patterns On Your Phone. V. Srinivasan et al., ACM UbiComp 2014 (Best Paper Nominee), HotMobile 2013.
    Mobile Sequence Miner: Adding Intelligence to Your Mobile Device via On-Device Sequential Pattern Mining. A. Mukherji et al., ACM MCSS Workshop in UbiComp 2014.
    User Behavior Analysis via On-device Mobile Sensing
       Unobtrusively learn sequential patterns of mobile users: "Typically, when I am home on Sunday nights, I call my parents."
       Association rule mining over multi-modal mobile context data.
  • Slide 28
  • Thanks. Contact me with questions: Abhishek Mukherji, Samsung Research America, [email protected]