Posted on 01-Jan-2016
Rachid Guerraoui, EPFL
What is a good recommendation system?
Recommendation systems are good
A good recommendation system is one that provides good recommendations
What is a good recommendation?
You know it when you see it
“I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["hard-core pornography"]; and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the motion picture involved in this case is not that.”
Justice Potter Stewart, US Supreme Court, 1964
Ideally: Build and deploy your system
Pragmatically: Transform the past into the future
What is a good recommendation system?
Example
• The members of a program committee (20) want to evaluate the submitted papers (200)
• Nobody has enough time to read all the papers
• Each researcher is assigned a subset of the papers
• A recommendation system uses the scores they give to predict the opinion of every member about every paper
What is a good recommendation?
It depends on the correlation
Theory to the rescue
• n users
• k * n objects
• For each user and object: a grade
  – The grades of a user form his preference vector
  – The vectors of users form the preference matrix
  – Grades may be binary, discrete, continuous
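To make the model concrete, here is a minimal sketch of a preference matrix and of the partial view the players start from. It assumes binary grades drawn at random, which is hypothetical data, and it uses None for a grade that nobody has probed. The names make_preference_matrix and partial_view are illustrative and do not come from the talk.

```python
import random

def make_preference_matrix(n, k, seed=0):
    """Return an n x (k*n) matrix of binary grades (hypothetical random data)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(k * n)] for _ in range(n)]

def partial_view(matrix, probed):
    """Hide every grade a player has not probed; None marks an unknown entry.

    probed[p] is the set of object indices that player p has graded.
    """
    return [
        [grade if obj in probed[p] else None for obj, grade in enumerate(row)]
        for p, row in enumerate(matrix)
    ]

# PC example: 20 members (n = 20) and 200 papers, so k = 10.
v = make_preference_matrix(n=20, k=10)
# Each member grades 10 papers: member p gets papers 10p .. 10p+9.
probed = {p: set(range(10 * p, 10 * p + 10)) for p in range(20)}
known = partial_view(v, probed)
print(len(v), len(v[0]))                                 # 20 200
print(sum(g is not None for row in known for g in row))  # 200
```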
General recommendation model
Input? Vectors of grades v(p) (known partially to the players)
Output? Vectors of grades w(p) (seeking to approximate v(p))
Ideal output: w(p) = v(p)
Target output: minimize $\max_p |w(p) - v(p)|$ (Hamming distance)
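The target objective can be computed directly once the true vectors are known, which is the case in an analysis but not for the players themselves. The sketch below assumes that grades are discrete and stored as equal-length lists. The names hamming and max_error are hypothetical helpers.

```python
def hamming(a, b):
    """Number of positions where two equal-length grade vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))

def max_error(w, v):
    """Target objective: the largest Hamming error over all players."""
    return max(hamming(w[p], v[p]) for p in range(len(v)))

v = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]  # true grades
w = [[1, 0, 1, 1], [0, 1, 1, 1], [1, 1, 0, 1]]  # estimated grades
print(max_error(w, v))  # 2 (player 1 is wrong on two objects)
```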
Compare with a perfect on-line algorithm
How to account for the level of correlation?
The perfect on-line algorithm
(1) All players know all partial vectors (shared billboard)
(2) Chooses elements of the partial vectors to fill (budget B)
    The player is initially indulgent (learning phase): the algorithm assigns initial papers
(3) Knows the level of correlation
Hamming diameter of a set P
$D(P) = \max_{p,q \in P} |v(p) - v(q)|$
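Here is a brute-force sketch of the Hamming diameter. It assumes the full vectors v(p) are available, which holds only in analysis. The function name diameter is illustrative, and a set with a single player is given diameter 0.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions where two grade vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def diameter(P, v):
    """D(P): largest Hamming distance between the vectors of two players in P."""
    return max((hamming(v[p], v[q]) for p, q in combinations(P, 2)), default=0)

v = {
    "a": [1, 0, 1, 1, 0],
    "b": [1, 0, 1, 0, 0],
    "c": [0, 0, 1, 0, 1],
}
print(diameter(["a", "b"], v))       # 1
print(diameter(["a", "b", "c"], v))  # 3 (a and c differ on three objects)
```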
The perfect on-line algorithm
20 PC members; 200 papers
Every member can read 10 papers
All have the same taste
Perfect solution possible?
20 PC members; 200 papers
Two clusters of 10 have the same taste
Perfect solution possible?
Every member needs to read 20
Assume player p can probe B objects
How many other players does p need to collaborate with to fill its vector?
$kn/B - 1$
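The answer reads as kn/B − 1: kn objects must be covered, and each player contributes B probes. The sketch below checks this against the PC examples (n = 20 members, kn = 200 papers). It assumes that no two players probe the same object and rounds up when B does not divide kn. The name collaborators_needed is hypothetical.

```python
def collaborators_needed(n, k, B):
    """Other players p must pool probes with so that all k*n objects are covered.

    Assumes probes never overlap, so ceil(k*n / B) players are needed in total.
    """
    total = -(-(k * n) // B)  # ceiling of k*n / B
    return total - 1

# PC examples: n = 20 members, k*n = 200 papers, so k = 10.
print(collaborators_needed(20, 10, 10))  # 19: everyone reads 10, all 20 must share a taste
print(collaborators_needed(20, 10, 20))  # 9: everyone reads 20, a cluster of 10 suffices
```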
20 PC members; 200 papers
4 clusters of 5 with diameter 8
Every member reads 20
What is the minimal error rate?
Ideal algorithm (k=1)
• A player p has to use the opinions of (n/B) − 1 other players to estimate her/his preferences
• In the worst case, p cannot do better
• The rate of error for p depends on the Hamming distance between p and the other (n/B) − 1 players
• This is within a constant factor of the diameter of these n/B players
Claim
For every B-algorithm, there is some distribution of preferences such that (with constant probability)
$|w(p) - v(p)| \ge \min_{P:\, p \in P,\ |P| \ge n/B} D(P)/4$
Proof (sketch)
Consider a constant D > 2B. Define the preference vectors as follows:
- Let P be a set of players of size n/B, and let p ∈ P have a random preference vector
- Assign a random preference vector to every player outside P
- Choose a set S of D objects. For every player q in P, v(q) = v(p) except on S, where v(q) is random
Probes outside P provide no information to p
Probes inside P provide no information to p w.r.t. S
Since p probes at most B objects and S contains D > 2B objects, there are at least D/2 objects in S for which p has no information
No algorithm can do better than guess p's preferences on these objects; with binary grades, each guess is wrong with probability 1/2
The error is therefore at least D/4 with constant probability, while the diameter of P is at most D
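The lower-bound construction can be simulated to see why collaboration does not help on S. In this sketch, p plays a hypothetical B-algorithm: it probes B random objects and copies every other grade from a randomly chosen peer in P. It assumes binary grades, m = 200 objects, and B dividing n. The names build_group and copy_from_peers are illustrative. On S, a peer's grade is an independent coin flip, so about half of p's unprobed objects in S come out wrong. That is well above D/4.

```python
import random

def build_group(size, D, m, rng):
    """Preferences of the group P from the proof sketch (binary grades, m objects).

    P[0] plays the role of p. Every other member q copies v(p) except on a
    random set S of D objects, where q's grades are independent coin flips.
    Players outside P are omitted: their random vectors say nothing about p.
    """
    v_p = [rng.randint(0, 1) for _ in range(m)]
    S = set(rng.sample(range(m), D))
    others = [
        [rng.randint(0, 1) if obj in S else v_p[obj] for obj in range(m)]
        for _ in range(size - 1)
    ]
    return [v_p] + others, S

def copy_from_peers(P, B, m, rng):
    """Hypothetical B-algorithm: p probes B random objects and copies every
    other grade from a randomly chosen peer in P."""
    probed = set(rng.sample(range(m), B))
    w_p = []
    for obj in range(m):
        if obj in probed:
            w_p.append(P[0][obj])  # p's own probe: always correct
        else:
            w_p.append(P[rng.randrange(1, len(P))][obj])  # a peer's grade
    return w_p

rng = random.Random(1)
n, B, D, m = 100, 10, 40, 200  # D > 2B, as the proof requires
size = n // B                  # |P| = n/B (assumes B divides n)
errors = []
for _ in range(500):
    P, S = build_group(size, D, m, rng)
    w_p = copy_from_peers(P, B, m, rng)
    errors.append(sum(w != v for w, v in zip(w_p, P[0])))
avg = sum(errors) / len(errors)
print(avg >= D / 4)  # True: roughly half of the unprobed objects in S are wrong
```

Outside S, copying a peer is always correct. The errors all come from S, which is exactly the argument of the proof.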
Optimality
An algorithm is (B,c)-optimal if for every input set of preferences
$|w(p) - v(p)| \le c \cdot \min_{P:\, p \in P,\ |P| \ge n/B} D(P)$ for every player p
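On a tiny instance, (B,c)-optimality can be checked by brute force over all groups P that contain p and have at least n/B players. The enumeration is exponential, so this sketch only suits a handful of players. It assumes discrete grades. The names benchmark and is_bc_optimal are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def diameter(P, v):
    return max((hamming(v[p], v[q]) for p, q in combinations(P, 2)), default=0)

def benchmark(p, v, B):
    """min D(P) over all groups P that contain p and have at least n/B players."""
    n = len(v)
    others = [q for q in range(n) if q != p]
    min_size = -(-n // B)  # ceiling of n / B
    best = None
    for size in range(min_size, n + 1):
        for rest in combinations(others, size - 1):
            d = diameter((p,) + rest, v)
            best = d if best is None else min(best, d)
    return best

def is_bc_optimal(w, v, B, c):
    """Check |w(p) - v(p)| <= c * benchmark(p) for every player p."""
    return all(hamming(w[p], v[p]) <= c * benchmark(p, v, B) for p in range(len(v)))

v = [[1, 1, 0, 0], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 0]]  # two pairs of similar players
w = [[1, 1, 0, 1], [1, 1, 0, 1], [0, 0, 1, 1], [0, 0, 1, 1]]  # each pair shares one vector
print([benchmark(p, v, B=2) for p in range(4)])  # [1, 1, 1, 1]
print(is_bc_optimal(w, v, B=2, c=1))             # True
```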
So what?
The best we can do is find clusters of players that are
- Small enough (small diameter) to provide “accurate” preferences, and
- Big enough to cover all objects
• Practically speaking? Try different sizes of clusters
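Here is a heuristic sketch of "try different sizes of clusters". It is not the algorithm from the talk. For a player p, it ranks the other players by their disagreement rate on commonly probed objects. It then takes p's m − 1 nearest players and fills each object p did not probe by majority vote, breaking ties toward 1. The data is synthetic and hypothetical, mirroring the 20-member example: 4 tastes of 5 players, 2% of grades flipped, and 20 random probes per player. The names distance and predict are illustrative.

```python
import random

def distance(known, p, q):
    """Fraction of disagreements on objects both p and q probed (1.0 if none)."""
    common = known[p].keys() & known[q].keys()
    if not common:
        return 1.0
    return sum(known[p][o] != known[q][o] for o in common) / len(common)

def predict(known, p, m, num_objects):
    """Fill p's vector using p's own probes plus its m - 1 nearest players.

    Objects nobody in the cluster probed stay None (not covered).
    """
    others = sorted((q for q in known if q != p), key=lambda q: distance(known, p, q))
    cluster = others[: m - 1]
    w = []
    for o in range(num_objects):
        if o in known[p]:
            w.append(known[p][o])  # p's own probe wins
            continue
        votes = [known[q][o] for q in cluster if o in known[q]]
        w.append(int(2 * sum(votes) >= len(votes)) if votes else None)
    return w

rng = random.Random(0)
n, num_objects, B = 20, 200, 20
bases = [[rng.randint(0, 1) for _ in range(num_objects)] for _ in range(4)]
# Hypothetical ground truth: 4 tastes of 5 players each, 2% of grades flipped.
v = [[g ^ (rng.random() < 0.02) for g in bases[p // 5]] for p in range(n)]
known = {p: {o: v[p][o] for o in rng.sample(range(num_objects), B)} for p in range(n)}

p = 0
results = []
for m in (1, 5, 10, 20):
    w = predict(known, p, m, num_objects)
    covered = [o for o in range(num_objects) if w[o] is not None]
    errors = sum(w[o] != v[p][o] for o in covered)
    results.append((m, len(covered), errors))
    print(m, len(covered), errors)  # cluster size, objects covered, errors on them
```

Small clusters leave most objects uncovered. Large clusters cover more but mix in players with other tastes, which is the trade-off described above.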
Optimality
• Assume each player can evaluate B objects.
• Given B and the level of correlation among players, there is a minimum rate of error that can be achieved.
• There is an algorithm that obtains a constant approximation of this error rate, with each player evaluating O(B · polylog(n)) objects.
Definition of Optimality
• An algorithm is asymptotically optimal in terms of error rate if for every player p we have:
• $|w(p) - v(p)| \le c \cdot \min_{P:\, p \in P,\ |P| \ge n/B} D(P)$
• where c is a constant and D(P) is the diameter of set P; P can be any set of players of size at least n/B that contains p.