Posted on 21-Jan-2016
Tag-Based Optimization for Top-k Product Design
Mahashweta Das, Gautam Das (University of Texas at Arlington)
Vagelis Hristidis (Florida International University)
Motivation
Given a database of tagged products, the task is to design k new products (attribute values) that are likely to attract the maximum number of desirable tags
◦ Tag desirability is just one aspect of product design
Applications
◦ Electronics, autos, apparel
◦ Musical artists, bloggers
Problem Statement
Example attributes: Resolution? Zoom? Flash? Shooting mode? Light sensitivity?
Optimization Function
Given a database of products, each having a set of attributes and a set of desirable tags:
◦ Build a Naive Bayes classifier and compute P(Tag | Attributes)
Given the classifier, we derive:
◦ The expected number of desirable tags a new product p is annotated with: Σ_j P(T_j | p), summing over all desirable tags T_j
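As a minimal sketch of this objective, the Python below assumes Boolean attributes and one Naive Bayes model per desirable tag: a prior P(T_j = 1) plus, for each attribute, P(A_i = 1 | T_j = 1) and P(A_i = 1 | T_j = 0). The function names, the model layout, and the toy probabilities are hypothetical. By linearity of expectation, the expected number of desirable tags is the sum of the per-tag posteriors.

```python
from math import prod

def tag_posterior(product, prior, cond_pos, cond_neg):
    """Naive Bayes posterior P(T = 1 | product) for Boolean attributes.

    prior:       P(T = 1)
    cond_pos[i]: P(A_i = 1 | T = 1)   (hypothetical model layout)
    cond_neg[i]: P(A_i = 1 | T = 0)
    """
    def likelihood(cond):
        # Naive Bayes: attributes are conditionally independent given the tag
        return prod(c if a else 1 - c for a, c in zip(product, cond))

    pos = prior * likelihood(cond_pos)
    neg = (1 - prior) * likelihood(cond_neg)
    return pos / (pos + neg)

def expected_desirable_tags(product, model):
    """Sum of per-tag posteriors (linearity of expectation)."""
    return sum(tag_posterior(product, *params) for params in model.values())

# Toy model (hypothetical): two desirable tags over two Boolean attributes
model = {
    "T1": (0.5, [0.9, 0.2], [0.3, 0.6]),
    "T2": (0.4, [0.7, 0.8], [0.2, 0.5]),
}
print(expected_desirable_tags((1, 0), model))  # 6/7 + 14/29, about 1.34
```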
Proposed Solution
The problem is NP-complete, even for:
◦ Boolean attributes
◦ Top-1
◦ Naïve Bayes classifier
Exact Algorithms
◦ Naïve brute-force
◦ Exact Two-Tier Top-K (ETT)
Approximation Algorithms
◦ Hill Climbing
◦ Approximate Two-Tier Top-K
◦ PTAS
Exact Algorithm: Naïve Brute-Force
◦ Consider all 2^m possible products (m Boolean attributes) and compute the expected number of desirable tags for each
◦ Exponential complexity
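The brute-force baseline can be sketched in a few lines of Python, assuming m Boolean attributes and an arbitrary objective function `score` (for example, the expected number of desirable tags). `brute_force_top_k` and the toy weights are hypothetical; the point is that the candidate set has 2^m members.

```python
import heapq
from itertools import product as cartesian

def brute_force_top_k(m, k, score):
    """Naive exact top-k: score every one of the 2**m Boolean products.

    score(p) is the objective for a candidate product p (a tuple of 0/1
    values), e.g. the expected number of desirable tags.
    """
    candidates = cartesian((0, 1), repeat=m)  # 2**m tuples: exponential in m
    return heapq.nlargest(k, candidates, key=score)

# Toy objective (hypothetical): weights make every product's score distinct
weights = (1, 2, 4)
best = brute_force_top_k(3, 2, lambda p: sum(w * a for w, a in zip(weights, p)))
print(best)  # [(1, 1, 1), (0, 1, 1)]
```

Because every candidate is scored, the running time grows as 2^m regardless of the objective.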
Exact two-tier top-k (ETT)
◦ Applies the Rank-Join and TA top-k algorithms in a two-tier architecture
◦ Does not need to compute all possible products; performs significantly better than naïve brute-force
◦ Works well for moderate data instances but does not scale to larger data; in the worst case, it may have exponential running time
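The Rank-Join step of tier 1 can be sketched as an HRJN-style rank join over one tag's two sorted partial-product lists. This is a sketch under assumptions, not the authors' implementation: every pair of partial products joins, scores are positive and combine by multiplication (mirroring the Naive Bayes factorization), and the threshold peeks at the next unread score in each list. `rank_join` and `_bound` are hypothetical names.

```python
import heapq

def _bound(l1, l2, i, j):
    """Upper bound on any join that still involves an unread list entry."""
    bounds = []
    if i < len(l1):
        bounds.append(l1[i][1] * l2[0][1])
    if j < len(l2):
        bounds.append(l1[0][1] * l2[j][1])
    return max(bounds, default=float("-inf"))

def rank_join(l1, l2):
    """HRJN-style rank join of two lists of (partial product, score).

    Both lists must be non-empty, sorted by score descending, with positive
    scores; every pair joins and scores combine by multiplication (assumed).
    Yields (full product, score) in non-increasing score order.
    """
    seen1, seen2, buffer = [], [], []  # buffer: max-heap via negated scores
    i = j = 0
    read_left = True
    while i < len(l1) or j < len(l2):
        if (read_left and i < len(l1)) or j == len(l2):
            key, s = l1[i]
            i += 1
            for other, t in seen2:
                heapq.heappush(buffer, (-s * t, key + other))
            seen1.append((key, s))
        else:
            key, t = l2[j]
            j += 1
            for other, s in seen1:
                heapq.heappush(buffer, (-s * t, other + key))
            seen2.append((key, t))
        read_left = not read_left
        bound = _bound(l1, l2, i, j)
        # Emit every buffered join that no unformed join can beat
        while buffer and -buffer[0][0] >= bound:
            neg_score, full = heapq.heappop(buffer)
            yield full, -neg_score

# T1's lists from the worked example (L1 over A1 A2, L2 over A3 A4)
t1_l1 = [("10", 1.97), ("00", 0.84), ("11", 0.84), ("01", 0.36)]
t1_l2 = [("10", 1.97), ("00", 0.84), ("11", 0.84), ("01", 0.36)]
print(next(rank_join(t1_l1, t1_l2))[0])  # 1010
```

Each `next()` call plays the role of one per-tag GetNext(); because the generator is lazy, it reads list entries only until the bound certifies the next result.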
ETT: Two-Tier Architecture
• Tier 1: determine the “best” product for each tag
• Tier 2: match these products to compute the global best product across all tags
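Tier 2 can be sketched as a Threshold Algorithm (TA) style merge over the per-tag streams produced by tier 1. Assumptions: each tag's stream yields (product, per-tag score) in descending order of that same score, and a product's complete score is the sum of its per-tag scores. `two_tier_top_k` and the toy data are hypothetical.

```python
import heapq

def two_tier_top_k(streams, complete_score, k):
    """Tier 2: merge per-tag tier-1 streams into the global top-k products.

    streams: dict tag -> iterable of (product, per-tag score), sorted by
             score descending (assumed to be what tier 1's GetNext returns).
    complete_score(product): sum of the product's per-tag scores (assumed).
    """
    iters = {tag: iter(s) for tag, s in streams.items()}
    last_seen, seen, top = {}, set(), []  # top: min-heap of (score, product)
    while iters:
        for tag in list(iters):
            item = next(iters[tag], None)  # one GetNext() call per tag
            if item is None:
                del iters[tag]  # this tag's stream is exhausted
                continue
            product, score = item
            last_seen[tag] = score
            if product in seen:
                continue
            seen.add(product)
            entry = (complete_score(product), product)
            if len(top) < k:
                heapq.heappush(top, entry)
            else:
                heapq.heappushpop(top, entry)
        # MUS bounds the complete score of any product not yet seen
        mus = sum(last_seen.values())
        if len(top) == k and top[0][0] >= mus:  # MinK >= MUS: stop
            break
    return sorted(top, reverse=True)

# Toy per-tag scores (hypothetical): product -> (T1 score, T2 score)
per_tag = {"a": (0.9, 0.2), "b": (0.5, 0.7), "c": (0.1, 0.8)}
streams = {
    tag: sorted(((p, s[i]) for p, s in per_tag.items()), key=lambda x: x[1], reverse=True)
    for tag, i in (("T1", 0), ("T2", 1))
}
print(two_tier_top_k(streams, lambda p: sum(per_tag[p]), k=1))  # [(1.2, 'b')]
```

The stop test is the TA threshold: any product not yet seen scores at most the last-seen value in every stream, so its complete score cannot exceed their sum.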
Main Algorithm
Database: attributes {A1, A2, A3, A4}, tags {T1, T2}, top-1
◦ Partition the attributes into 2 groups, {A1, A2} and {A3, A4}, to form 2 lists of partial products
◦ Each list has 2^2 = 4 entries (partial products)
◦ Compute the score of each partial product for each tag and sort each list in descending order of score
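Building the tier-1 input lists for one tag might look like this in Python. It assumes Boolean attributes and that a partial product's score is the product of hypothetical per-attribute factors `factor(i, v)`, in line with the Naive Bayes factorization; `partial_product_lists` and the factor table are made up for illustration.

```python
from itertools import product as cartesian

def partial_product_lists(groups, factor):
    """Build one sorted list of partial products per attribute group, for one tag.

    groups:       tuples of attribute indices, e.g. [(0, 1), (2, 3)]
    factor(i, v): hypothetical per-tag score contribution of setting attribute i to v
    Returns one list per group of (bit string, score), sorted by score descending.
    """
    lists = []
    for group in groups:
        entries = []
        for bits in cartesian((0, 1), repeat=len(group)):  # 2**len(group) entries
            score = 1.0
            for attr, value in zip(group, bits):
                score *= factor(attr, value)  # assumed multiplicative combination
            entries.append(("".join(map(str, bits)), score))
        entries.sort(key=lambda e: e[1], reverse=True)
        lists.append(entries)
    return lists

# Hypothetical factors for attributes A1..A4 (index 0..3): (value 0, value 1)
table = [(0.5, 2.0), (1.5, 1.0), (0.8, 1.2), (1.0, 0.6)]
l1, l2 = partial_product_lists([(0, 1), (2, 3)], lambda i, v: table[i][v])
print(l1)  # [('10', 3.0), ('11', 2.0), ('00', 0.75), ('01', 0.5)]
```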
Worked example (top-1, tags T1 and T2)
Tier 1: lists of partial products (entry, score), sorted in descending order
T1, L1 (A1 A2)   T1, L2 (A3 A4)   T2, L1 (A1 A2)   T2, L2 (A3 A4)
10, 1.97         10, 1.97         11, 2.76         11, 4.57
00, 0.84         00, 0.84         01, 1.18         10, 2.53
11, 0.84         11, 0.84         10, 1.18         01, 0.91
01, 0.36         01, 0.36         00, 0.51         00, 0.51
Tier 1: joined products per tag
Tag   Join   Product   Actual Score   MPFS
T1    1      1010      0.95           0.95
T2    1      1111      0.93           0.93
Tier 2: GetNext(T1) = 1010, GetNext(T2) = 1111
BufferTop-K()
Product   Complete Score
1111      1.75
1010      1.70
MinK (1.75) < MUS (1.88): return to Tier 1
MUS: sum of the last-seen scores returned by GetNext() across all tags
MPFS:
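The stop test in the worked example can be checked directly. This sketch assumes the last-seen scores are the tier-1 actual scores shown for the first join of each tag (0.95 for T1, 0.93 for T2); their sum reproduces MUS = 1.88. `return_to_tier1` is a hypothetical name.

```python
def return_to_tier1(buffer, last_seen, k):
    """TA-style test: keep pulling from tier 1 while the k-th best
    complete score in the buffer (MinK) is below MUS."""
    if len(buffer) < k:
        return True  # fewer than k candidates buffered so far
    min_k = sorted(buffer.values(), reverse=True)[k - 1]
    mus = sum(last_seen.values())  # sum of last-seen GetNext() scores
    return min_k < mus

buffer = {"1111": 1.75, "1010": 1.70}  # BufferTop-K: complete scores
last_seen = {"T1": 0.95, "T2": 0.93}   # actual scores of 1010 and 1111
print(return_to_tier1(buffer, last_seen, k=1))  # True: 1.75 < 0.95 + 0.93 = 1.88
```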
Questions?
Thank You