Scott Burton and Richard Morris CS 676 Presentation 12 April 2011.
Mining Rules from Surveys and Questionnaires
Scott Burton and Richard Morris
CS 676 Presentation
12 April 2011
Surveys and Questionnaires
• Frequently used
• Problems for data mining
◦ Rarity
◦ Related and dependent questions
◦ Ordinal / Likert scale
Association Rule Mining
Market basket analysis
Cookies -> Milk
Customer   Milk   Cookies   Butter   Bread
A           x        x
B           x        x         x
C           x        x
D           x        x
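A rule such as Cookies -> Milk is scored by its support and confidence over a basket table like the one above. A minimal Python sketch (the basket contents are illustrative, since the exact column positions are ambiguous in the transcript):

```python
# Minimal support/confidence sketch for market basket analysis.
# Basket contents below are illustrative, not the slide's exact data.
transactions = {
    "A": {"Milk", "Cookies"},
    "B": {"Milk", "Cookies", "Butter"},
    "C": {"Milk", "Cookies"},
    "D": {"Milk", "Cookies"},
}

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    hits = sum(1 for t in transactions.values() if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(confidence({"Cookies"}, {"Milk"}, transactions))  # 1.0 with these baskets
```

Apriori builds on exactly these two quantities, keeping only itemsets above a minimum support and rules above a minimum confidence.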
Our Goal: Improve Precision
Standard algorithms/approaches
• Apriori, MS-Apriori
• Too many rules
• Rules are not “interesting” or actionable
• Finding the needle in the haystack

Our goal
• Improve precision
• How do you measure “interestingness?”
Interestingness Measures
• Mostly based on support or confidence
• Considered about 40 different metrics
• All seemed to favor the wrong types of rules
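Beyond support and confidence, lift is one of the many interestingness measures in that family: it compares how often antecedent and consequent co-occur against what independence would predict. A small sketch with made-up supports:

```python
def lift(sup_ab, sup_a, sup_b):
    """Lift: ratio of observed co-occurrence of A and B to the
    co-occurrence expected if A and B were independent.
    Arguments are supports (fractions) of A∪B, A, and B."""
    return sup_ab / (sup_a * sup_b)

# Illustrative numbers: A and B each appear in 50% of records,
# together in 40% of records.
print(lift(0.4, 0.5, 0.5))  # > 1 means a positive association
```

A lift near 1 means the rule adds nothing over chance, which is one reason confidence alone can favor uninteresting rules.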
Our Datasets
• Smoking habits of middle school students in Mexico
◦ Global Youth Tobacco Survey for the Pan American Health Organization (GYTSPAHO)
◦ ~65 questions and 13,000 responses
• HINTS (Health Information National Trends Survey)
◦ hints.cancer.gov
◦ 2007 response data had ~475 questions and 8,000 responses
◦ We focused on a subset of ~100 questions
Apriori vs. MS-Apriori
Apriori (Figure 1)
MS-Apriori (Figure 2)
Related and Dependent Questions
True but worthless rules
• Do you smoke=no -> Did you smoke last week=no

Our approach
• Cluster similar questions
• Remove any intra-cluster rules
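The intra-cluster pruning step can be sketched as follows. The cluster assignments and rules here are hypothetical; the idea is that a rule whose attributes all come from one cluster of near-duplicate questions is true but worthless:

```python
# Sketch of intra-cluster rule pruning. Question-to-cluster assignments
# and the rules themselves are hypothetical examples.
clusters = {"smoke_now": 0, "smoked_last_week": 0, "age": 1, "grade": 1}

def is_intra_cluster(rule, clusters):
    """True when every attribute in the rule belongs to one cluster."""
    antecedent, consequent = rule
    involved = {clusters[q] for q in antecedent | consequent}
    return len(involved) == 1

rules = [
    ({"smoke_now"}, {"smoked_last_week"}),  # same cluster -> pruned
    ({"age"}, {"smoke_now"}),               # spans clusters -> kept
]
kept = [r for r in rules if not is_intra_cluster(r, clusters)]
print(kept)
```

Only rules that relate genuinely different questions survive the filter.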
(diagram: nine numbered questions grouped into clusters)
Distance Metrics
◦ Bi-conditional prediction
◦ Attribute vs. Attribute-Value pair
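One plausible reading of a bi-conditional prediction distance: two questions are close when each one's answers predict the other's, in both directions. A sketch under that assumption (the function names and the specific formula are illustrative, not necessarily the authors' exact metric):

```python
from collections import Counter, defaultdict

def prediction_accuracy(xs, ys):
    """How well the best value-to-value mapping predicts ys from xs."""
    by_value = defaultdict(Counter)
    for x, y in zip(xs, ys):
        by_value[x][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return correct / len(xs)

def biconditional_distance(xs, ys):
    """Illustrative bi-conditional metric: 1 minus the weaker
    direction of mutual predictability (0 = fully redundant)."""
    return 1.0 - min(prediction_accuracy(xs, ys), prediction_accuracy(ys, xs))

smoke_now = ["no", "no", "yes", "no"]
smoked_last_week = ["no", "no", "yes", "no"]
print(biconditional_distance(smoke_now, smoked_last_week))  # 0.0: redundant pair
```

A distance of 0 flags question pairs like the smoking example above as candidates for the same cluster.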
Involving the subject matter expert
Creating Clusters
A Sample Clustering of Questions
(see handout)
Effects of Cluster Pruning
MS-Apriori (Figure 2)
After cluster pruning (Figure 3)
Similar Rules
Abstract viewpoint:
• A B -> C D
• A -> C D
• A B -> C
• A B Z -> C D
Similar Rule Pruning
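One common way to prune such similar rules is subsumption: drop a rule when another rule with a smaller antecedent already implies at least as large a consequent. A sketch of that criterion (this may differ from the authors' exact record-matching approach):

```python
# Subsumption-based similar-rule pruning, using the slide's abstract rules.
def subsumes(general, specific):
    """general prunes specific when its antecedent is a subset and
    its consequent a superset of specific's."""
    gen_ante, gen_cons = general
    spec_ante, spec_cons = specific
    return gen_ante <= spec_ante and spec_cons <= gen_cons and general != specific

rules = [
    (frozenset("AB"), frozenset("CD")),
    (frozenset("A"), frozenset("CD")),   # most general: survives
    (frozenset("AB"), frozenset("C")),
    (frozenset("ABZ"), frozenset("CD")),
]
kept = [r for r in rules if not any(subsumes(other, r) for other in rules)]
print(kept)
```

Of the four abstract rules above, only A -> C D survives, since it says the most with the least.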
Effects of Similar Rule Pruning
After cluster pruning (Figure 3)
After Similar Rule Pruning (Figure 4)
Ordinal and Likert Data
Two approaches
• Pre-process
• Post-process
(diagram contrasting an ordinal scale with a Likert scale)
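The pre-processing approach amounts to binning ordinal or Likert values before mining, so rules apply to coarse levels such as "agree"/"disagree" rather than each of the five points. A sketch with illustrative bin boundaries:

```python
# Pre-binning sketch: collapse a 5-point Likert scale into three bins
# before rule mining. The boundaries chosen here are illustrative.
def bin_likert(value):
    """Map a 1-5 Likert response to a coarse category."""
    if value <= 2:
        return "disagree"
    if value == 3:
        return "neutral"
    return "agree"

responses = [1, 2, 3, 4, 5]
print([bin_likert(v) for v in responses])
# ['disagree', 'disagree', 'neutral', 'agree', 'agree']
```

Fewer, broader attribute values raise the support of each value, which helps rules on rare answer levels clear the minimum-support threshold.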
Effects of Pre-Binning (Figure 5)
HINTS Data
(see handout, Figures 6-10)
Other Examples
Conclusions and Future Work
Conclusions
• Increased precision of “interesting” rules
• More work to be done

Future work
• Tuning of existing processes
• Handle numerical data
• Handle questions not asked to everyone
• Handle questions with multiple responses
• Try other record matching techniques for similar rule pruning