On the Limits of Dictatorial Classification
Reshef Meir, School of Computer Science and Engineering, Hebrew University
Joint work with Shaull Almagor, Assaf Michaely and Jeffrey S. Rosenschein
Strategy-Proof Classification
• An Example
• Motivation
• Our Model and previous results
• Filling the gap: proving a lower bound
• The weighted case
Strategic labeling: an example
(figure: the classifier learned from the joint data makes 5 errors)
There is a better classifier! (for me…)
If I just change the labels…
(figure: after the manipulation, the chosen classifier makes 2 + 5 = 7 errors)
Classification
The supervised classification problem:
– Input: a set of labeled data points {(xi, yi)}i=1..m
– Output: a classifier c from some predefined concept class C (e.g., functions of the form f : 𝒳 → {−,+})
– We usually want c not just to classify the sample correctly, but to generalize well, i.e., to minimize
R(c) ≡ E(x,y)~D[ c(x) ≠ y ],
the expected number of errors w.r.t. the distribution D (the 0/1 loss function)
Classification (cont.)
• A common approach is to return the ERM (Empirical Risk Minimizer): the concept in C that best fits the given samples, i.e., has the lowest number of errors
• The ERM generalizes well under some assumptions on the concept class C (e.g., linear classifiers tend to generalize well)
• With multiple experts, we can't trust our ERM!
Where do we find "experts" with incentives?
Example 1: A firm learning purchase patterns
– Information gathered from local retailers
– The resulting policy affects them
– "The best policy is the policy that fits my pattern"
(diagram: Users → Reported dataset → Classification algorithm → Classifier)
Example 2: Internet polls / polls of experts
Motivation from other domains
• Aggregating partitions
• Judgment aggregation
• Facility location (on the binary cube)
Agent   A   B   A & B   A | ~B
1       T   F   F       T
2       F   T   F       F
3       F   F   F       T
A problem instance is defined by
• A set of agents I = {1,...,n}
• A set of data points X = {x1,...,xm} ⊆ 𝒳
• For each xk ∈ X, agent i has a label yik ∈ {−,+}
– Each pair sik = ⟨xk, yik⟩ is a sample
– All samples of a single agent compose the labeled dataset Si = {si1,...,si,m(i)}
• The joint dataset S = ⟨S1, S2,…, Sn⟩ is our input
– m = |S|
• We denote the dataset with the reported labels by S′
Input: example
(figure: Agents 1, 2 and 3 each assign a label in {−,+} to every point in X)
X ∈ 𝒳m
Y1 ∈ {−,+}m, Y2 ∈ {−,+}m, Y3 ∈ {−,+}m
S = ⟨S1, S2,…, Sn⟩ = ⟨(X,Y1),…, (X,Yn)⟩
Mechanisms
• A mechanism M receives a labeled dataset S and outputs c = M(S) ∈ C
• Private risk of agent i (the fraction of errors on Si): Ri(c,S) = |{k : c(xik) ≠ yik}| / mi
• Global risk (the fraction of errors on S): R(c,S) = |{⟨i,k⟩ : c(xik) ≠ yik}| / m
• We allow non-deterministic mechanisms
– In that case we measure the expected risk
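The two risk measures are straightforward to state in code. A minimal sketch, where the dataset layout (a list of (x, label) pairs per agent) and the threshold classifier are illustrative assumptions, not part of the model:

```python
# Sketch of the private and global risk definitions.
# Each agent i reports a list of (x, label) samples; a classifier
# is any function c: x -> '+' or '-'.

def private_risk(c, samples_i):
    """R_i(c, S): fraction of agent i's samples that c misclassifies."""
    errors = sum(1 for x, y in samples_i if c(x) != y)
    return errors / len(samples_i)

def global_risk(c, datasets):
    """R(c, S): fraction of all samples (over all agents) misclassified."""
    all_samples = [s for samples_i in datasets for s in samples_i]
    errors = sum(1 for x, y in all_samples if c(x) != y)
    return errors / len(all_samples)

# Hypothetical data: two agents labeling points on the real line.
S1 = [(0.1, '-'), (0.4, '-'), (0.9, '+')]
S2 = [(0.2, '+'), (0.8, '+')]
c = lambda x: '+' if x >= 0.5 else '-'   # an example classifier

print(private_risk(c, S1))        # agent 1: 0 of 3 errors -> 0.0
print(private_risk(c, S2))        # agent 2: 1 of 2 errors -> 0.5
print(global_risk(c, [S1, S2]))   # overall: 1 of 5 errors -> 0.2
```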
ERM
We compare the outcome of M to the ERM:
c* = ERM(S) = argmin_{c ∈ C} R(c,S)
r* = R(c*,S)
Can our mechanism simply compute and return the ERM?
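For a finite concept class, the ERM can be computed by direct enumeration. A minimal sketch, where the three concepts and the dataset are hypothetical:

```python
# ERM over a small finite concept class: return the concept in C
# with the lowest empirical (global) risk on the joint dataset.

def global_risk(c, samples):
    return sum(1 for x, y in samples if c(x) != y) / len(samples)

def erm(C, samples):
    """c* = argmin over c in C of R(c, S)."""
    return min(C, key=lambda c: global_risk(c, samples))

# Illustrative concept class: two constant classifiers and one threshold.
C = [
    lambda x: '+',                         # "all positive"
    lambda x: '-',                         # "all negative"
    lambda x: '+' if x >= 0.5 else '-',    # a threshold classifier
]

S = [(0.1, '-'), (0.3, '-'), (0.6, '+'), (0.9, '+')]
best = erm(C, S)   # here the threshold concept fits S perfectly
print(global_risk(best, S))   # -> 0.0
```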
Requirements
1. Good approximation: ∀S, R(M(S),S) ≤ α·r*
2. Strategy-proofness (SP): ∀i, S, Si′: Ri(M(S−i, Si′), S) ≥ Ri(M(S), S)
(the left-hand side is agent i's risk when lying with Si′, the right-hand side when reporting the truth)
• ERM(S) is 1-approximating but not SP
• ERM(S1) is SP but gives a bad approximation
The most important question: are there any mechanisms that guarantee both SP and good approximation?
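That ERM is not SP can be demonstrated already on the tiny class {all positive, all negative}; the instance below is a hypothetical one constructed for illustration:

```python
# Sketch: ERM is not strategy-proof. Concept class C = {all '+', all '-'}.
# Agent 1's true labels are mixed; by reporting all '+' it flips the ERM
# outcome and lowers its own private risk (measured on its TRUE labels).

def erm(reports):
    """Return '+' or '-': the constant classifier with fewest errors on the reports."""
    labels = [y for samples in reports for _, y in samples]
    return '+' if labels.count('+') >= labels.count('-') else '-'

def private_risk(c_label, true_samples):
    return sum(1 for _, y in true_samples if y != c_label) / len(true_samples)

S1_true = [(1, '+'), (2, '+'), (3, '+'), (4, '-'), (5, '-')]  # agent 1: 3 '+', 2 '-'
S2      = [(6, '-'), (7, '-'), (8, '-'), (9, '-')]            # agent 2: 4 '-'

honest = erm([S1_true, S2])                # 3 '+' vs 6 '-'  -> all '-'
S1_lie = [(x, '+') for x, _ in S1_true]    # agent 1 reports all '+'
lying  = erm([S1_lie, S2])                 # 5 '+' vs 4 '-'  -> all '+'

print(private_risk(honest, S1_true))  # truthful: 3/5 = 0.6
print(private_risk(lying,  S1_true))  # lying:    2/5 = 0.4 (profitable manipulation)
```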
Related work
• A study of SP mechanisms in regression learning
– O. Dekel, F. Fischer and A. D. Procaccia, SODA 2008; JCSS 2009 [supervised learning]
• No SP mechanisms for clustering
– J. Perote-Peña and J. Perote, Economics Bulletin 2003 [unsupervised learning]
Results

Previous work: a simple case
(Meir, Procaccia and Rosenschein, AAAI 2008)
• Tiny concept class: |C| = 2, either "all positive" or "all negative"
Theorem:
• There is an SP 2-approximation mechanism
• There is no SP α-approximation mechanism for any α < 2
Previous work: general concept classes
(Meir, Procaccia and Rosenschein, IJCAI 2009)
Theorem: Selecting a dictator at random is SP and guarantees a (3 − 2/n)-approximation
– True for any concept class C
– Generalizes well from sampled data when C has a bounded VC dimension
Open question #1: are there better mechanisms?
Open question #2: what if agents are weighted?
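A sketch of the random-dictator mechanism (the tiny two-concept class and the agents' datasets below are illustrative). It is SP because the output depends only on the chosen dictator's own report, and each agent's best response is to report truthfully:

```python
import random

# Random dictator: pick one agent uniformly at random and fit the
# classifier to that agent's reported data alone.

def global_risk(c, datasets):
    samples = [s for d in datasets for s in d]
    return sum(1 for x, y in samples if c(x) != y) / len(samples)

def erm(C, samples):
    return min(C, key=lambda c: sum(1 for x, y in samples if c(x) != y))

def random_dictator(C, datasets, rng=random):
    dictator = rng.choice(datasets)
    return erm(C, dictator)

C = [lambda x: '+', lambda x: '-']    # the tiny |C| = 2 class
datasets = [
    [(1, '+'), (2, '+')],             # agent 1 prefers "all +"
    [(3, '-'), (4, '-')],             # agent 2 prefers "all -"
    [(5, '-'), (6, '-')],             # agent 3 prefers "all -"
]

c_once = random_dictator(C, datasets, rng=random.Random(0))

# Expected risk of the mechanism = average over the possible dictators;
# compare it to the optimal (ERM) risk on the joint data.
expected = sum(global_risk(erm(C, d), datasets) for d in datasets) / len(datasets)
optimal = global_risk(erm(C, [s for d in datasets for s in d]), datasets)
print(expected, optimal)   # the ratio here is well within the 3 - 2/n bound
```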
A lower bound
Theorem: There is a concept class C (with |C| = 3) for which any SP mechanism has an approximation ratio of at least 3 − 2/n
Our main result:
o Matches the upper bound from IJCAI-09
o Proof is by a careful reduction to a voting scenario
o We will see the proof sketch
Proof sketch
Gibbard ['77] proved that every (randomized) SP voting rule for 3 candidates must be a lottery over dictators*.
We define X = {x, y, z}, and C as follows:

      x   y   z
cx    +   −   −
cy    −   +   −
cz    −   −   +

We also restrict the agents, so that each agent can have mixed labels on just one point.
(figure: each agent's dataset, with many labels per point and mixed labels on exactly one of them)
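The reduction can be illustrated in code: an agent's labels induce a ranking of the three concepts by private risk, so the agent effectively "votes" over C. The agent's labels below are hypothetical (all − on x, mixed labels on y, all + on z):

```python
# Reduction to voting: with X = {x, y, z} and the three concepts above,
# an agent's labels rank {cx, cy, cz} by increasing private risk.

C = {
    'cx': {'x': '+', 'y': '-', 'z': '-'},
    'cy': {'x': '-', 'y': '+', 'z': '-'},
    'cz': {'x': '-', 'y': '-', 'z': '+'},
}

def private_risk(concept, labels):
    """labels: list of (point, label) samples for one agent."""
    return sum(1 for p, y in labels if C[concept][p] != y) / len(labels)

# A hypothetical agent: four '-' labels on x, mixed labels on y,
# four '+' labels on z.
agent = [('x', '-')] * 4 + [('y', '+'), ('y', '-')] + [('z', '+')] * 4

ranking = sorted(C, key=lambda c: private_risk(c, agent))
print(ranking)   # -> ['cz', 'cy', 'cx'], i.e. the vote cz > cy > cx
```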
Proof sketch (cont.)
Suppose that M is SP. Then:
1. M must be monotone on the mixed point
2. M must ignore the mixed point
3. M is therefore a (randomized) voting rule: each agent's labels induce a preference order over C, e.g. an agent with all − on x, mixed labels on y, and all + on z ranks cz > cy > cx, while an agent with all + on x, all − on y, and mixed labels on z ranks cx > cz > cy
4. By Gibbard ['77], M is a random dictator
5. We construct an instance where random dictators perform poorly
Weighted agents
• We must select a dictator randomly
• However, the selection probability may be based on the agents' weights
• Naïve approach: pick agent i with probability pr(i) proportional to its weight wi
o Only gives a 3-approximation
• An optimal SP algorithm uses a different probability assignment pr(i)
o Matches the lower bound of 3 − 2/n
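The naïve proportional selection can be sketched directly as a weighted random draw (the weights below are hypothetical):

```python
import random

# Naive weighted dictator selection: choose agent i with probability
# proportional to its weight w_i (random.choices performs a weighted draw).

def pick_dictator(weights, rng=random):
    agents = list(range(len(weights)))
    return rng.choices(agents, weights=weights, k=1)[0]

weights = [3.0, 1.0, 1.0]          # hypothetical agent weights
counts = [0, 0, 0]
rng = random.Random(0)             # seeded for reproducibility
for _ in range(10_000):
    counts[pick_dictator(weights, rng)] += 1

# Agent 0 should be chosen about 3/5 of the time.
print([c / 10_000 for c in counts])
```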
Future work
• Other concept classes
• Other loss functions (linear loss, quadratic loss, …)
• Alternative assumptions on the structure of the data
• Other models of strategic behavior
• …