Presented at the American Evaluation Association/Canadian Evaluation Society Joint Conference
Date posted: 18-Dec-2015
Association Analysis (4)(Evaluation)
Evaluation of Association Patterns
• Association analysis algorithms have the potential to generate a large number of patterns.
• In real commercial databases we could easily end up with thousands or even millions of patterns, many of which might not be interesting.
• It is therefore very important to establish a set of well-accepted criteria for evaluating the quality of association patterns.
• The first set of criteria can be established through statistical arguments.
– Patterns that involve mutually independent items or that cover very few transactions are considered uninteresting because they may capture spurious relationships in the data [confidence, support].
– The interest factor is also discussed.
• The second set of criteria can be established through subjective arguments.
Subjective Arguments
• A pattern is considered subjectively uninteresting unless it reveals unexpected information about the data.
• E.g., the rule {Butter} → {Bread} isn’t interesting, despite having high support and confidence values.
• On the other hand, the rule {Diapers} → {Beer} is interesting because the relationship is quite unexpected and may suggest a new cross-selling opportunity for retailers.
• Drawback: Incorporating subjective knowledge into pattern evaluation is difficult because it requires a considerable amount of prior information from domain experts.
Computing Interestingness Measures
• Given a rule X → Y, the information needed to compute rule interestingness can be obtained from a contingency table.

Contingency table for X → Y (¬ denotes absence):

        Y      ¬Y
X       f11    f10    f1+
¬X      f01    f00    f0+
        f+1    f+0    |T|

f11: support of X and Y
f10: support of X and ¬Y
f01: support of ¬X and Y
f00: support of ¬X and ¬Y

These counts are used to define various measures.
Pitfall of Confidence

         Coffee   ¬Coffee
Tea      150      50        200
¬Tea     750      150       900
         900      200       1100

Consider the association rule Tea → Coffee.
Confidence = P(Coffee,Tea)/P(Tea) = P(Coffee|Tea) = 150/200 = 0.75 (seems quite high)
But P(Coffee) = 900/1100 ≈ 0.82
Thus knowing that a person is a tea drinker actually decreases his/her probability of being a coffee drinker from 82% to 75%!
Although the confidence is high, the rule is misleading.
In fact, P(Coffee|¬Tea) = P(Coffee,¬Tea)/P(¬Tea) = 750/900 = 0.83
The pitfall of confidence can be traced to the fact that the measure ignores the support of the itemset in the rule consequent.
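The comparison above can be reproduced directly from the four cell counts of the tea/coffee table. This is a minimal sketch; the variable names are illustrative and not part of the original slides.

```python
# Cell counts taken from the tea/coffee contingency table above.
f11, f10 = 150, 50    # Tea with Coffee, Tea without Coffee
f01, f00 = 750, 150   # no Tea with Coffee, no Tea without Coffee
n = f11 + f10 + f01 + f00

conf_tea_coffee = f11 / (f11 + f10)       # P(Coffee | Tea)
p_coffee = (f11 + f01) / n                # P(Coffee), the consequent's support
conf_not_tea_coffee = f01 / (f01 + f00)   # P(Coffee | not Tea)

# A rule is misleading when its confidence is below the consequent's support.
misleading = conf_tea_coffee < p_coffee
print(f"{conf_tea_coffee:.2f} {p_coffee:.2f} {conf_not_tea_coffee:.2f} {misleading}")
```

Comparing the confidence against the support of the consequent is exactly the check that plain confidence pruning omits.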
Statistical Independence
• Population of 1000 students
• 600 students know how to swim (S)
• 700 students know how to bike (B)
• 420 students know how to swim and bike (S,B)
• P(S|B) = P(S), since P(S,B)/P(B) = 0.42/0.7 = 0.6 = P(S)
• P(S,B)/P(B) = P(S) can be rewritten as P(S,B) = P(S) P(B)
• P(S,B) = P(S) P(B) => Statistical independence
• P(S,B) > P(S) P(B) => Positively correlated
– i.e. if someone knows how to swim, then it is more probable he/she knows how to bike, and vice versa
• P(S,B) < P(S) P(B) => Negatively correlated
– i.e. if someone knows how to swim, then it is less probable he/she knows how to bike, and vice versa
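The three cases can be checked mechanically. The sketch below works on raw counts to avoid floating-point comparisons; the function name `correlation_type` is an assumption made for illustration.

```python
def correlation_type(n, count_x, count_y, count_xy):
    """Compare P(X,Y) with P(X)P(Y) using counts: n*count_xy vs count_x*count_y."""
    observed = n * count_xy          # proportional to P(X,Y)
    baseline = count_x * count_y     # proportional to P(X)P(Y)
    if observed == baseline:
        return "independent"
    return "positively correlated" if observed > baseline else "negatively correlated"

# Swim/bike example from the slide: 1000 students, 600 swim, 700 bike, 420 do both.
print(correlation_type(1000, 600, 700, 420))
```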
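Interest Factor
• Measure that takes into account statistical dependence (N is the total number of transactions):
$$\mathrm{Interest}(X, Y) = \frac{P(X,Y)}{P(X)\,P(Y)} = \frac{f_{11}/N}{(f_{1+}/N)(f_{+1}/N)} = \frac{N f_{11}}{f_{1+}\, f_{+1}}$$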
• Interest factor compares the frequency of a pattern against a baseline frequency computed under the statistical independence assumption.
• The baseline frequency for a pair of mutually independent variables is:
$$\frac{f_{11}}{N} = \frac{f_{1+}}{N} \times \frac{f_{+1}}{N}, \quad \text{or equivalently} \quad f_{11} = \frac{f_{1+}\, f_{+1}}{N}$$
Interest Equation
• The previous equation follows from the standard approach of using simple fractions as estimates for probabilities.
• The fraction f11/N is an estimate for the joint probability P(X,Y), while f1+/N and f+1/N are the estimates for P(X) and P(Y), respectively.
• If X and Y are statistically independent, then P(X,Y) = P(X) × P(Y), and thus the Interest is 1.
Example: Interest
Association Rule: Tea → Coffee
Interest = 150*1100 / (200*900) = 0.92
(< 1, therefore they are negatively correlated)

         Coffee   ¬Coffee
Tea      150      50        200
¬Tea     750      150       900
         900      200       1100
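The same calculation can be written as a small function of the four contingency-table cells. This is a sketch; the function name `interest` and its argument order are assumptions for illustration.

```python
def interest(f11, f10, f01, f00):
    """Interest factor N*f11 / (f1+ * f+1) computed from contingency-table cells."""
    n = f11 + f10 + f01 + f00
    row_total = f11 + f10   # f1+: transactions containing X
    col_total = f11 + f01   # f+1: transactions containing Y
    return n * f11 / (row_total * col_total)

# Tea -> Coffee example: 150*1100 / (200*900)
value = interest(150, 50, 750, 150)
print(round(value, 2))
```

A value below 1 indicates negative correlation, a value above 1 positive correlation, and a value of 1 statistical independence.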
Effect of Support Distribution
• Many real data sets have a skewed support distribution, where most of the items have relatively low to moderate frequencies but a small number of them have very high frequencies.
[Figure: skewed support distribution]
• It is tricky to choose the right support threshold for mining such data sets.
• If we set the threshold too high (e.g., 20%), then we may miss many interesting patterns involving the low support items from G1.
– Such low support items may correspond to expensive products (such as jewelry) that are seldom bought by customers, but whose patterns are still interesting to retailers.
• Conversely, when the threshold is set too low, there is the risk of generating spurious patterns that relate a high frequency item such as milk to a low frequency item such as caviar.
Cross support patterns
• They are patterns that relate a high frequency item such as milk to a low frequency item such as caviar.
– Likely to be spurious because their correlations tend to be weak.
– A large number of weakly correlated cross support patterns can be generated when the support threshold is sufficiently low.
• E.g., the confidence of {caviar} → {milk} is likely to be high, but the pattern is still spurious, since there probably isn’t any correlation between caviar and milk.
• However, we don’t want to use the Interest Factor during the computation of frequent itemsets because it doesn’t have the anti-monotone property.
– The interest factor is instead used as a post-processing step.
• So, we want to detect cross support patterns by looking at some anti-monotone property.
– A definition towards this comes next.
Cross support patterns
Definition
A cross support pattern is an itemset X = {i1, i2, …, ik} whose support ratio
$$r(X) = \frac{\min\{s(i_1), s(i_2), \ldots, s(i_k)\}}{\max\{s(i_1), s(i_2), \ldots, s(i_k)\}}$$
is less than a user specified threshold hc.
Example
Suppose the support for milk is 70%, while the support for sugar is 10% and the support for caviar is 0.04%.
Given hc = 0.01, the frequent itemset {milk, sugar, caviar} is a cross support pattern because its support ratio is
r = min {0.7, 0.1, 0.0004} / max {0.7, 0.1, 0.0004}
= 0.0004 / 0.7 = 0.00057 < 0.01
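The definition translates into a few lines of code. The sketch below takes the individual item supports as given; the function names `support_ratio` and `is_cross_support` are illustrative assumptions.

```python
def support_ratio(item_supports):
    """r(X): smallest individual item support divided by the largest."""
    supports = list(item_supports)
    return min(supports) / max(supports)

def is_cross_support(item_supports, hc):
    """True when the support ratio falls below the user-specified threshold hc."""
    return support_ratio(item_supports) < hc

# Milk / sugar / caviar example from the slide.
supports = {"milk": 0.7, "sugar": 0.1, "caviar": 0.0004}
r = support_ratio(supports.values())
print(f"{r:.5f}", is_cross_support(supports.values(), hc=0.01))
```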
Detecting cross support patterns
• E.g., assuming that hc = 0.3, the itemsets {p,q}, {p,r}, and {p,q,r} are cross support patterns.
– Because their support ratios, which are equal to 0.2, are less than the threshold hc.
• We can apply a high support threshold, say 20%, to eliminate the cross support patterns, but this may come at the expense of discarding other interesting patterns such as the strongly correlated itemset {q,r}, which has support equal to 16.7%.
Detecting cross support patterns
• Naïve confidence pruning also doesn’t help because the confidence of the rules extracted from cross support patterns can be very high.
• For example, the confidence for {q} → {p} is 80% even though {p, q} is a cross support pattern.
– This is no surprise, because one of its items (p) appears very frequently in the data. Therefore, p is expected to appear in many of the transactions that contain q.
• Meanwhile, the rule {q} → {r} also has high confidence even though {q, r} is not a cross support pattern.
• These examples demonstrate the difficulty of using the confidence measure to distinguish between rules extracted from cross support and non cross support patterns.
Lowest confidence rule
• Notice that the rule {p} → {q} has very low confidence because most of the transactions that contain p do not contain q.
• This observation suggests that:
Cross support patterns can be detected by examining the lowest confidence rule that can be extracted from a given itemset.
Finding lowest confidence
• Recall the anti-monotone property of confidence:
conf( {i1, i2} → {i3, i4, …, ik} ) ≤ conf( {i1, i2, i3} → {i4, …, ik} )
• This property suggests that confidence never increases as we shift more items from the left to the right hand side of an association rule.
• Hence, the lowest confidence rule that can be extracted from a frequent itemset contains only one item on its left hand side.
Finding lowest confidence
• Given a frequent itemset {i1, i2, …, ik}, the rule
{ij} → {i1, i2, …, ij-1, ij+1, …, ik}
has the lowest confidence if
s(ij) = max {s(i1), s(i2), …, s(ik)}
• This follows directly from the definition of confidence as the ratio between the rule's support and the support of the rule antecedent.
Finding lowest confidence
• Summarizing, the lowest confidence attainable from a frequent itemset X = {i1, i2, …, ik} is
$$\frac{s(\{i_1, i_2, \ldots, i_k\})}{\max\{s(i_1), s(i_2), \ldots, s(i_k)\}}$$
• This is also known as the h-confidence measure or all-confidence measure.
• Because of the anti-monotone property of support, the numerator of the h-confidence measure is bounded by the minimum support of any item that appears in the frequent itemset. So,
$$\text{h-confidence}(X) = \frac{s(\{i_1, i_2, \ldots, i_k\})}{\max\{s(i_1), s(i_2), \ldots, s(i_k)\}} \le \frac{\min\{s(i_1), s(i_2), \ldots, s(i_k)\}}{\max\{s(i_1), s(i_2), \ldots, s(i_k)\}} = r(X)$$
h confidence
• Clearly, cross support patterns can be eliminated by ensuring that the h-confidence values for the patterns exceed hc.
• Finally, observe that the measure is also anti-monotone, i.e.,
h-confidence({i1, i2, …, ik}) ≥ h-confidence({i1, i2, …, ik+1})
and thus can be incorporated directly into the mining algorithm.
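A minimal sketch of h-confidence on a toy data set follows. The ten transactions are hypothetical (they are not the p/q/r data referred to earlier), and the helper names `support` and `h_confidence` are assumptions for illustration.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def h_confidence(itemset, transactions):
    """Itemset support divided by the largest single-item support."""
    largest = max(support({item}, transactions) for item in itemset)
    return support(itemset, transactions) / largest

# Hypothetical data: milk is frequent, caviar is rare.
transactions = (
    [{"milk", "bread"}] * 4 + [{"milk"}] * 3
    + [{"milk", "caviar"}] + [{"bread"}] * 2
)

conf_caviar_milk = support({"caviar", "milk"}, transactions) / support({"caviar"}, transactions)
h_milk_caviar = h_confidence({"milk", "caviar"}, transactions)
h_milk_bread = h_confidence({"milk", "bread"}, transactions)
print(conf_caviar_milk, h_milk_caviar, h_milk_bread)
```

In this toy data the rule {caviar} → {milk} has perfect confidence, while the h-confidence of {milk, caviar} is low, so a threshold hc between the two h-confidence values would prune {milk, caviar} but keep {milk, bread}.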
Association Analysis (4)(Applications)
Type of attributes in assoc. analysis
• Association rule mining assumes the input data consists of binary attributes called items.
– The presence of an item in a transaction is also assumed to be more important than its absence.
– As a result, an item is treated as an asymmetric binary attribute.
• Now we extend the formulation to data sets with symmetric binary, categorical, and continuous attributes.
Type of attributes
• Symmetric binary attributes
– Gender
– Computer at Home
– Chat Online
– Shop Online
– Privacy Concerns
• Nominal attributes
– Level of Education
– State
• Example of rules:
{Shop Online = Yes} → {Privacy Concerns = Yes}.
This rule suggests that most Internet users who shop online are concerned about their personal privacy.
Transforming attributes into Asymmetric Binary Attributes
• The categorical and symmetric binary attributes are transformed into “items.”
• Create a new item for each distinct attribute-value pair.
• E.g., the nominal attribute Level of Education can be replaced by three binary items:
– Education = College
– Education = Graduate
– Education = High School
• Symmetric binary attributes such as Gender are converted into a pair of binary items:
– Male
– Female
Internet survey data after binarizing attributes into “items”
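The transformation can be sketched as one item per attribute-value pair. The record below and the uniform "attribute = value" item naming are assumptions for illustration; the slides name some items differently (e.g. Male, Female).

```python
def binarize(record):
    """Turn a record of attribute values into a set of 'attribute = value' items."""
    return {f"{attribute} = {value}" for attribute, value in record.items()}

# Hypothetical survey response.
record = {"Gender": "Female", "Education": "Graduate", "Shop Online": "Yes"}
items = binarize(record)
print(sorted(items))
```

Each resulting item is present or absent in a transaction, so standard frequent itemset algorithms can be applied to the output unchanged.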
Issues I (infrequent values)
• Some attribute values may not be frequent enough to be part of a frequent pattern.
– More evident for nominal attributes that have many possible values, e.g., state names.
• Lowering the support threshold does not help because it:
– Exponentially increases the number of frequent patterns found (many of which may be spurious).
– Makes the computation more expensive.
Practical solution
• Group related attribute values into a small number of categories.
– E.g., each state name can be replaced by its corresponding geographical region, such as Midwest, Pacific Northwest, Southwest, and East Coast.
• Another possibility is to aggregate the less frequent attribute values into a single category called Others.
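Both options can be sketched with the standard library. The state-to-region mapping and the `min_count` cutoff below are illustrative assumptions, not values from the slides.

```python
from collections import Counter

# Hypothetical, deliberately incomplete mapping from state to region.
REGION = {"Minnesota": "Midwest", "Iowa": "Midwest", "Arizona": "Southwest"}

def to_region(states, region=REGION):
    """Replace each state by its region; unmapped states are left unchanged."""
    return [region.get(state, state) for state in states]

def group_rare_values(values, min_count, other="Others"):
    """Replace values occurring fewer than min_count times by a single category."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

states = ["Minnesota", "Iowa", "Minnesota", "Arizona", "Vermont"]
print(to_region(states))
print(group_rare_values(states, min_count=2))
```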
Issues II (skewed distribution)
• Some attribute values may have considerably higher frequencies than others.
– E.g., suppose 85% of the survey participants own a home computer.
– We may potentially generate many redundant, spurious patterns such as:
{Computer at home = Yes, Shop Online = Yes} → {Privacy Concerns = Yes}
Solutions
• Apply the techniques for skewed distributions.
Handling Continuous Attributes
• Solution: Discretize
• Example of rules:
– Age ∈ [21,35) ∧ Salary ∈ [70k,120k) → Buy
– Salary ∈ [70k,120k) ∧ Buy → Age: μ=28, σ=4
• Of course, discretization isn’t always easy.
– If the intervals are too large, rules may not have enough confidence:
Age ∈ [12,36) → Chat Online = Yes (s = 30%, c = 57.7%) (minconf=60%)
– If the intervals are too small, rules may not have enough support:
Age ∈ [16,20) → Chat Online = Yes (s = 4.4%, c = 84.6%) (minsup=15%)
• Potential solution: use all possible intervals (but too expensive)
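Discretization itself is a simple mapping from a value to a half-open interval item. The sketch below assumes a sorted list of interval edges chosen by the analyst; the function name `interval_item` and the item label format are illustrative.

```python
def interval_item(name, value, edges):
    """Map a numeric value to the item for the half-open interval [lo, hi) containing it."""
    for lo, hi in zip(edges, edges[1:]):
        if lo <= value < hi:
            return f"{name} in [{lo},{hi})"
    return None  # value falls outside all intervals

coarse = interval_item("Age", 18, [12, 36])          # one wide interval
fine = interval_item("Age", 18, [12, 16, 20, 36])    # narrower intervals
print(coarse, "|", fine)
```

Wider intervals cover more transactions (higher support) but mix heterogeneous groups (lower confidence), which is the trade-off described in the bullets above.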
Statistics-based quantitative association rules
Salary ∈ [70k,120k) ∧ Buy → Age: μ=28, σ=4
Generated as follows:
• Specify the target attribute (e.g., Age).
• Withhold the target attribute, and “itemize” the remaining attributes.
• Apply algorithms such as Apriori or FP-growth to extract frequent itemsets from the itemized data.
– Each frequent itemset identifies an interesting segment of the population.
• Derive a rule for each frequent itemset.
– E.g., a rule of this kind is obtained by averaging the age of Internet users who support the frequent itemset {Annual Income > $100K, Shop Online = Yes}
• Remark: The notion of confidence is not applicable to such rules.
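The last step, deriving the rule consequent for one frequent itemset, can be sketched as follows. The records are hypothetical, the frequent itemset is assumed to have been found already, and the population standard deviation is used as one reasonable choice of statistic.

```python
from statistics import mean, pstdev

def quantitative_rule(records, itemset, target):
    """Mean and standard deviation of the target over records that contain the itemset."""
    values = [r[target] for r in records if itemset <= r["items"]]
    if not values:
        return None
    return mean(values), pstdev(values)

# Hypothetical itemized records with the withheld target attribute Age.
records = [
    {"items": {"Salary in [70k,120k)", "Buy"}, "Age": 24},
    {"items": {"Salary in [70k,120k)", "Buy"}, "Age": 32},
    {"items": {"Salary in [70k,120k)"}, "Age": 50},
]
mu, sigma = quantitative_rule(records, {"Salary in [70k,120k)", "Buy"}, "Age")
print(mu, sigma)
```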
Multi-level Association Rules
[Figure: concept hierarchy with top-level categories Food and Electronics. Nodes shown: Bread (Wheat, White), Milk (Skim, 2%), Foremost, Kemps, Computers (Desktop, Laptop), Accessory, Printer, Scanner, Home, DVD, TV]
Multi-level Association Rules
• Why should we incorporate a concept hierarchy?
– Rules at lower levels may not have enough support to appear in any frequent itemsets.
– Rules at lower levels of the hierarchy are overly specific, e.g.,
skim milk → white bread,
2% milk → wheat bread,
skim milk → wheat bread, etc.
are all indicative of an association between milk and bread.
Multi-level Association Rules
• How do support and confidence vary as we traverse the concept hierarchy?
– If X is the parent item for both X1 and X2, and they are the only children, then s(X) ≤ s(X1) + s(X2) (Why?)
– Because X1 and X2 might appear in the same transactions.
– If s(X1 ∪ Y1) ≥ minsup, and X is the parent of X1 and Y is the parent of Y1, then
s(X ∪ Y1) ≥ minsup
s(X1 ∪ Y) ≥ minsup
s(X ∪ Y) ≥ minsup
– If conf(X1 → Y1) ≥ minconf, then conf(X1 → Y) ≥ minconf
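The first inequality can be seen on a tiny hypothetical data set in which one transaction contains both children. The sketch below uses support counts, and treats a parent as present whenever any of its children is present; the names are illustrative.

```python
def support_count(items, transactions):
    """Number of transactions containing at least one of the given items."""
    return sum(1 for t in transactions if t & items)

# Hypothetical transactions; the third one contains both children of "milk".
transactions = [
    {"skim milk"},
    {"2% milk"},
    {"skim milk", "2% milk"},
    {"wheat bread"},
]

count_x1 = support_count({"skim milk"}, transactions)
count_x2 = support_count({"2% milk"}, transactions)
count_x = support_count({"skim milk", "2% milk"}, transactions)  # parent: milk
print(count_x, count_x1 + count_x2)
```

The transaction with both children is counted once for the parent but once for each child, which is why the parent's support can be strictly smaller than the sum.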
Multi-level Association Rules: Approach 1
• Extend the current association rule formulation by augmenting each transaction with higher level items.
Original Transaction: {skim milk, wheat bread}
Augmented Transaction: {skim milk, wheat bread, milk, bread, food}
• Issues:
– Items that reside at higher levels have much higher support counts; if the support threshold is low, we get too many frequent patterns involving items from the higher levels.
– Increased dimensionality of the data.
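The augmentation step can be sketched with a child-to-parent map. The `PARENT` dictionary below encodes only the part of the hierarchy needed for the example and is an assumption for illustration.

```python
# Hypothetical child -> parent map for part of the concept hierarchy.
PARENT = {
    "skim milk": "milk", "2% milk": "milk",
    "wheat bread": "bread", "white bread": "bread",
    "milk": "food", "bread": "food",
}

def augment(transaction, parent=PARENT):
    """Add every ancestor of every item to the transaction."""
    augmented = set(transaction)
    for item in transaction:
        while item in parent:       # walk up the hierarchy to the root
            item = parent[item]
            augmented.add(item)
    return augmented

print(sorted(augment({"skim milk", "wheat bread"})))
```

The augmented transactions can then be passed to an unmodified frequent itemset algorithm, at the cost of the issues listed above.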
Multi-level Association Rules: Approach 2
• Generate frequent patterns at the highest level first.
• Then, generate frequent patterns at the next highest level, and so on.
• Issues:
– I/O requirements will increase.
– May miss some potentially interesting cross-level association patterns.
E.g., skim milk → white bread and 2% milk → white bread might not survive because of low support, but milk → white bread could. However, we don’t generate a cross-level itemset such as {milk, white bread}.