On the Limits of Dictatorial Classification
Reshef Meir, School of Computer Science and Engineering, Hebrew University
Joint work with Shaull Almagor, Assaf Michaely and Jeffrey S. Rosenschein
Strategy-Proof Classification
• An Example
• Motivation
• Our Model and previous results
• Filling the gap: proving a lower bound
• The weighted case
Strategic labeling: an example
(figure: the classifier learned from the joint data makes 5 errors)
There is a better classifier! (for me…)
If I just change the labels…
(figure: after the manipulation, the chosen classifier makes 2 + 5 = 7 errors)
Classification
The supervised classification problem:
– Input: a set of labeled data points {(xi, yi)}i=1..m
– Output: a classifier c from some predefined concept class C (e.g., functions of the form f : 𝒳 → {−,+})
– We usually want c not just to classify the sample correctly, but to generalize well, i.e., to minimize
R(c) ≡ E(x,y)~D[ c(x) ≠ y ],
the expected number of errors w.r.t. the distribution D (the 0/1 loss function)
Classification (cont.)
• A common approach is to return the ERM (Empirical Risk Minimizer): the concept in C that best fits the given samples, i.e., has the lowest number of errors
• The ERM generalizes well under some assumptions on the concept class C (e.g., linear classifiers tend to generalize well)
• With multiple experts, we can't trust our ERM!
Where do we find "experts" with incentives?
Example 1: A firm learning purchase patterns
– Information gathered from local retailers
– The resulting policy affects them
– "The best policy is the policy that fits my pattern"
(diagram: Users → Reported dataset → Classification algorithm → Classifier)
Example 2: Internet polls / polls of experts
Motivation from other domains
• Aggregating partitions
• Judgment aggregation
• Facility location (on the binary cube)
Agent   A   B   A & B   A | ~B
1       T   F   F       T
2       F   T   F       F
3       F   F   F       T
A problem instance is defined by
• A set of agents I = {1,...,n}
• A set of data points X = {x1,...,xm} ⊆ 𝒳
• For each xk ∈ X, agent i has a label yik ∈ {−,+}
– Each pair sik = ⟨xk, yik⟩ is a sample
– All samples of a single agent compose the labeled dataset Si = {si1,...,si,m(i)}
• The joint dataset S = ⟨S1, S2,…, Sn⟩ is our input
– m = |S|
• We denote the dataset with the reported labels by S′
Input: example
(figure: Agents 1, 2 and 3 each assign a label in {−,+} to every point in X)
X ∈ 𝒳m
Y1 ∈ {−,+}m, Y2 ∈ {−,+}m, Y3 ∈ {−,+}m
S = ⟨S1, S2,…, Sn⟩ = ⟨(X,Y1),…, (X,Yn)⟩
Mechanisms
• A mechanism M receives a labeled dataset S and outputs c = M(S) ∈ C
• Private risk of agent i (the fraction of errors on Si): Ri(c,S) = |{k : c(xik) ≠ yik}| / mi
• Global risk (the fraction of errors on S): R(c,S) = |{⟨i,k⟩ : c(xik) ≠ yik}| / m
• We allow non-deterministic mechanisms
– In that case we measure the expected risk
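The two risk measures are straightforward to state in code. A minimal sketch, where the dataset layout (a list of (x, label) pairs per agent) and the threshold classifier are illustrative assumptions, not part of the model:

```python
# Sketch of the private and global risk definitions.
# Each agent i reports a list of (x, label) samples; a classifier
# is any function c: x -> '+' or '-'.

def private_risk(c, samples_i):
    """R_i(c, S): fraction of agent i's samples that c misclassifies."""
    errors = sum(1 for x, y in samples_i if c(x) != y)
    return errors / len(samples_i)

def global_risk(c, datasets):
    """R(c, S): fraction of all samples (over all agents) misclassified."""
    all_samples = [s for samples_i in datasets for s in samples_i]
    errors = sum(1 for x, y in all_samples if c(x) != y)
    return errors / len(all_samples)

# Hypothetical data: two agents labeling points on the real line.
S1 = [(0.1, '-'), (0.4, '-'), (0.9, '+')]
S2 = [(0.2, '+'), (0.8, '+')]
c = lambda x: '+' if x >= 0.5 else '-'   # an example classifier

print(private_risk(c, S1))        # agent 1: 0 of 3 errors -> 0.0
print(private_risk(c, S2))        # agent 2: 1 of 2 errors -> 0.5
print(global_risk(c, [S1, S2]))   # overall: 1 of 5 errors -> 0.2
```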
ERM
We compare the outcome of M to the ERM:
c* = ERM(S) = argmin_{c ∈ C} R(c,S)
r* = R(c*,S)
Can our mechanism simply compute and return the ERM?
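For a finite concept class, the ERM can be computed by direct enumeration. A minimal sketch, where the three concepts and the dataset are hypothetical:

```python
# ERM over a small finite concept class: return the concept in C
# with the lowest empirical (global) risk on the joint dataset.

def global_risk(c, samples):
    return sum(1 for x, y in samples if c(x) != y) / len(samples)

def erm(C, samples):
    """c* = argmin over c in C of R(c, S)."""
    return min(C, key=lambda c: global_risk(c, samples))

# Illustrative concept class: two constant classifiers and one threshold.
C = [
    lambda x: '+',                         # "all positive"
    lambda x: '-',                         # "all negative"
    lambda x: '+' if x >= 0.5 else '-',    # a threshold classifier
]

S = [(0.1, '-'), (0.3, '-'), (0.6, '+'), (0.9, '+')]
best = erm(C, S)   # here the threshold concept fits S perfectly
print(global_risk(best, S))   # -> 0.0
```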
Requirements
1. Good approximation: ∀S, R(M(S),S) ≤ α·r*
2. Strategy-proofness (SP): ∀i, S, Si′: Ri(M(S−i, Si′), S) ≥ Ri(M(S), S)
(the left-hand side is agent i's risk when lying with Si′, the right-hand side when reporting the truth)
• ERM(S) is 1-approximating but not SP
• ERM(S1) is SP but gives a bad approximation
The most important question: are there any mechanisms that guarantee both SP and good approximation?
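That ERM is not SP can be demonstrated already on the tiny class {all positive, all negative}; the instance below is a hypothetical one constructed for illustration:

```python
# Sketch: ERM is not strategy-proof. Concept class C = {all '+', all '-'}.
# Agent 1's true labels are mixed; by reporting all '+' it flips the ERM
# outcome and lowers its own private risk (measured on its TRUE labels).

def erm(reports):
    """Return '+' or '-': the constant classifier with fewest errors on the reports."""
    labels = [y for samples in reports for _, y in samples]
    return '+' if labels.count('+') >= labels.count('-') else '-'

def private_risk(c_label, true_samples):
    return sum(1 for _, y in true_samples if y != c_label) / len(true_samples)

S1_true = [(1, '+'), (2, '+'), (3, '+'), (4, '-'), (5, '-')]  # agent 1: 3 '+', 2 '-'
S2      = [(6, '-'), (7, '-'), (8, '-'), (9, '-')]            # agent 2: 4 '-'

honest = erm([S1_true, S2])                # 3 '+' vs 6 '-'  -> all '-'
S1_lie = [(x, '+') for x, _ in S1_true]    # agent 1 reports all '+'
lying  = erm([S1_lie, S2])                 # 5 '+' vs 4 '-'  -> all '+'

print(private_risk(honest, S1_true))  # truthful: 3/5 = 0.6
print(private_risk(lying,  S1_true))  # lying:    2/5 = 0.4 (profitable manipulation)
```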
Related work
• A study of SP mechanisms in regression learning
– O. Dekel, F. Fischer and A. D. Procaccia, SODA 2008; JCSS 2009 [supervised learning]
• No SP mechanisms for clustering
– J. Perote-Peña and J. Perote, Economics Bulletin 2003 [unsupervised learning]
Results

Previous work: a simple case
(Meir, Procaccia and Rosenschein, AAAI 2008)
• Tiny concept class: |C| = 2, either "all positive" or "all negative"
Theorem:
• There is an SP 2-approximation mechanism
• There is no SP α-approximation mechanism for any α < 2
Previous work: general concept classes
(Meir, Procaccia and Rosenschein, IJCAI 2009)
Theorem: Selecting a dictator at random is SP and guarantees a (3 − 2/n)-approximation
– True for any concept class C
– Generalizes well from sampled data when C has a bounded VC dimension
Open question #1: are there better mechanisms?
Open question #2: what if agents are weighted?
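A sketch of the random-dictator mechanism (the tiny two-concept class and the agents' datasets below are illustrative). It is SP because the output depends only on the chosen dictator's own report, and each agent's best response is to report truthfully:

```python
import random

# Random dictator: pick one agent uniformly at random and fit the
# classifier to that agent's reported data alone.

def global_risk(c, datasets):
    samples = [s for d in datasets for s in d]
    return sum(1 for x, y in samples if c(x) != y) / len(samples)

def erm(C, samples):
    return min(C, key=lambda c: sum(1 for x, y in samples if c(x) != y))

def random_dictator(C, datasets, rng=random):
    dictator = rng.choice(datasets)
    return erm(C, dictator)

C = [lambda x: '+', lambda x: '-']    # the tiny |C| = 2 class
datasets = [
    [(1, '+'), (2, '+')],             # agent 1 prefers "all +"
    [(3, '-'), (4, '-')],             # agent 2 prefers "all -"
    [(5, '-'), (6, '-')],             # agent 3 prefers "all -"
]

c_once = random_dictator(C, datasets, rng=random.Random(0))

# Expected risk of the mechanism = average over the possible dictators;
# compare it to the optimal (ERM) risk on the joint data.
expected = sum(global_risk(erm(C, d), datasets) for d in datasets) / len(datasets)
optimal = global_risk(erm(C, [s for d in datasets for s in d]), datasets)
print(expected, optimal)   # the ratio here is well within the 3 - 2/n bound
```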
A lower bound
Theorem: There is a concept class C (with |C| = 3) for which any SP mechanism has an approximation ratio of at least 3 − 2/n
Our main result:
o Matches the upper bound from IJCAI-09
o Proof is by a careful reduction to a voting scenario
o We will see the proof sketch
Proof sketch
Gibbard ['77] proved that every (randomized) SP voting rule for 3 candidates must be a lottery over dictators*.
We define X = {x, y, z}, and C as follows:

      x   y   z
cx    +   −   −
cy    −   +   −
cz    −   −   +

We also restrict the agents, so that each agent can have mixed labels on just one point.
(figure: each agent's dataset, with many labels per point and mixed labels on exactly one of them)
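The reduction can be illustrated in code: an agent's labels induce a ranking of the three concepts by private risk, so the agent effectively "votes" over C. The agent's labels below are hypothetical (all − on x, mixed labels on y, all + on z):

```python
# Reduction to voting: with X = {x, y, z} and the three concepts above,
# an agent's labels rank {cx, cy, cz} by increasing private risk.

C = {
    'cx': {'x': '+', 'y': '-', 'z': '-'},
    'cy': {'x': '-', 'y': '+', 'z': '-'},
    'cz': {'x': '-', 'y': '-', 'z': '+'},
}

def private_risk(concept, labels):
    """labels: list of (point, label) samples for one agent."""
    return sum(1 for p, y in labels if C[concept][p] != y) / len(labels)

# A hypothetical agent: four '-' labels on x, mixed labels on y,
# four '+' labels on z.
agent = [('x', '-')] * 4 + [('y', '+'), ('y', '-')] + [('z', '+')] * 4

ranking = sorted(C, key=lambda c: private_risk(c, agent))
print(ranking)   # -> ['cz', 'cy', 'cx'], i.e. the vote cz > cy > cx
```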
Proof sketch (cont.)
Suppose that M is SP. Then:
1. M must be monotone on the mixed point
2. M must ignore the mixed point
3. M is therefore a (randomized) voting rule: each agent's labels induce a preference order over C, e.g. an agent with all − on x, mixed labels on y, and all + on z ranks cz > cy > cx, while an agent with all + on x, all − on y, and mixed labels on z ranks cx > cz > cy
4. By Gibbard ['77], M is a random dictator
5. We construct an instance where random dictators perform poorly
Weighted agents
• We must select a dictator randomly
• However, the selection probability may be based on the agents' weights
• Naïve approach: pick agent i with probability pr(i) proportional to its weight wi
o Only gives a 3-approximation
• An optimal SP algorithm uses a different probability assignment pr(i)
o Matches the lower bound of 3 − 2/n
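The naïve proportional selection can be sketched directly as a weighted random draw (the weights below are hypothetical):

```python
import random

# Naive weighted dictator selection: choose agent i with probability
# proportional to its weight w_i (random.choices performs a weighted draw).

def pick_dictator(weights, rng=random):
    agents = list(range(len(weights)))
    return rng.choices(agents, weights=weights, k=1)[0]

weights = [3.0, 1.0, 1.0]          # hypothetical agent weights
counts = [0, 0, 0]
rng = random.Random(0)             # seeded for reproducibility
for _ in range(10_000):
    counts[pick_dictator(weights, rng)] += 1

# Agent 0 should be chosen about 3/5 of the time.
print([c / 10_000 for c in counts])
```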
Future work
• Other concept classes
• Other loss functions (linear loss, quadratic loss, …)
• Alternative assumptions on the structure of the data
• Other models of strategic behavior
• …