Preserving Worker Privacy in Crowdsourcing

Hiroshi Kajino (1), Hiromi Arai (2), Hisashi Kashima (3)
1. The University of Tokyo, 2. RIKEN, 3. Kyoto University
18/09/14, ECML/PKDD 2014


Outline

■ Introduction & Existing Work
  □ Crowdsourcing: Outsourcing to unspecified people
  □ Quality control: Quality of results is variable

■ Proposed Problem Setting
  □ Worker privacy: Sensitive info of workers can be inferred
  □ Worker-private quality control problem

■ Proposed Method
  □ Existing quality control method + secure computation

■ Experiments
  □ Accuracy: Validate approximation in secure computation
  □ Computation time: Validate computational overhead

Propose & address a worker privacy problem in crowdsourcing


Research Target

■ Crowdsourcing
  □ Pros: Easy to use at low cost
    • Industry: Reduces financial/time costs of outsourcing
    • Academia: Triggers new AI research areas (human computation)
  □ Cons: Quality issues, privacy issues, etc.

Crowdsourcing is a method to outsource tasks to unspecified workers

[Figure: the requester submits instances to workers (1. Submit instances), and the workers return answers (2. Return answers). Example image from http://www.captcha.net/]

Existing Work

■ Quality of answers depends on the abilities of workers
  □ Collecting labels from multiple workers is necessary

■ Quality control problem (in a labeling task)
  □ Input: Crowd labels $\{y_{ij} \in \{0,1\} \mid i = 1, \dots, I,\ j = 1, \dots, J\}$
  □ Output: Estimated true labels $\{y_i \in \{0,1\} \mid i = 1, \dots, I\}$

Estimate ground truth labels by aggregating multiple workers' answers

Task example: Label whether an image contains a bird or not (1 = Bird, 0 = Not Bird)

[Figure: crowd labels in {0, 1} indexed by instance i and worker j; the ground truth label of each instance is unknown (?)]
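As a point of reference for the problem stated above, the simplest aggregation rule is an unweighted majority vote over the labels each instance received. The sketch below is only a baseline illustration and is not the method discussed in these slides; the function name `majority_vote`, the use of `None` for a missing label, and the tie-breaking rule are assumptions made for this example.

```python
from typing import List, Optional


def majority_vote(labels: List[List[Optional[int]]]) -> List[int]:
    """Aggregate crowd labels by unweighted majority vote.

    labels[i][j] is worker j's label (0 or 1) for instance i,
    or None if worker j did not label instance i (assumed encoding).
    """
    estimates = []
    for row in labels:
        given = [y for y in row if y is not None]
        ones = sum(given)
        # Ties (and instances without labels) fall back to 0: an arbitrary choice.
        estimates.append(1 if 2 * ones > len(given) else 0)
    return estimates


crowd = [[1, 1, 0], [0, 0, None], [1, 0, 0]]
estimated = majority_vote(crowd)
print(estimated)
```

Majority voting treats every worker as equally reliable, which is the limitation that worker-ability models are meant to address.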

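Existing Work

■ Latent Class Method [Dawid & Skene, 1979]
  □ Model: Latent class model
    • $p = \Pr[y_i = 1]$: Prob. of true label = 1
    • $\alpha_j = \Pr[y_{ij} = y_i \mid y_i = 1]$
    • $\beta_j = \Pr[y_{ij} = y_i \mid y_i = 0]$
    • $I, J$: #(Instances), #(Workers)
  □ Inference: Given $\{y_{ij}\}$, estimate $\{y_i\}$, $\{\alpha_j, \beta_j\}$, $p$
    • E-step: Estimate $\{y_i\}$, fixing $\{\alpha_j, \beta_j\}$, $p$
    • M-step: Estimate $\{\alpha_j, \beta_j\}$, $p$, fixing $\{y_i\}$

Estimate consensus labels by inferring worker models

[Figure: graphical model of the latent class model with nodes $p$, $y_i$, $y_{ij}$, $\alpha_j$, $\beta_j$ and plates over $I$ instances and $J$ workers; $\alpha_j$ and $\beta_j$ are the abilities of worker $j$]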
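The E-step / M-step alternation above can be sketched in plain Python for binary labels. This is a minimal, non-private illustration under stated assumptions, not the authors' implementation: the function name `latent_class_em`, the initialization by label fractions, the clipping constant `eps`, and the `None` encoding of missing labels are all choices made for this example.

```python
import math
from typing import List, Optional, Tuple

Labels = List[List[Optional[int]]]


def _clip(x: float, eps: float) -> float:
    # Keep probabilities away from 0 and 1 so the logarithms stay finite.
    return min(max(x, eps), 1.0 - eps)


def _sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def latent_class_em(labels: Labels, n_iter: int = 20, eps: float = 1e-6
                    ) -> Tuple[List[float], List[float], List[float], float]:
    """Return (mu, alpha, beta, p) where mu[i] approximates Pr[y_i = 1 | data]."""
    I, J = len(labels), len(labels[0])
    # Assumed initialization: fraction of 1-labels each instance received.
    mu = []
    for row in labels:
        given = [y for y in row if y is not None]
        mu.append(sum(given) / len(given) if given else 0.5)
    alpha, beta, p = [0.5] * J, [0.5] * J, 0.5

    for _ in range(n_iter):
        # M-step: re-estimate p and each worker's abilities, fixing mu.
        p = _clip(sum(mu) / I, eps)
        for j in range(J):
            idx = [i for i in range(I) if labels[i][j] is not None]
            w1 = sum(mu[i] for i in idx)
            w0 = sum(1.0 - mu[i] for i in idx)
            if w1 > 0:
                alpha[j] = _clip(sum(mu[i] * labels[i][j] for i in idx) / w1, eps)
            if w0 > 0:
                beta[j] = _clip(
                    sum((1.0 - mu[i]) * (1 - labels[i][j]) for i in idx) / w0, eps)
        # E-step: re-estimate mu, fixing p, alpha, beta.
        for i in range(I):
            log_odds = math.log(p / (1.0 - p))
            for j in range(J):
                y = labels[i][j]
                if y is None:
                    continue
                if y == 1:
                    log_odds += math.log(alpha[j] / (1.0 - beta[j]))
                else:
                    log_odds += math.log((1.0 - alpha[j]) / beta[j])
            mu[i] = _sigmoid(log_odds)
    return mu, alpha, beta, p


# Workers 0 and 1 always agree; worker 2 deviates on two instances.
crowd = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
mu, alpha, beta, p = latent_class_em(crowd)
print([round(m) for m in mu], alpha, beta)
```

In this toy input the two agreeing workers end up with higher estimated abilities than the third, so their labels dominate the consensus.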


Worker Privacy Issue

■ Sensitive information in answers
  □ Location
    • AED4 collects locations of AEDs on a map
    • The movement history of a worker is revealed
  □ Personal Information in Questionnaire Task
    • Interests of workers, personal information (quasi-identifiers)
    • Joining with other data sets can identify anonymous workers
  □ Ability
    • Quality control methods reveal the ability of a worker
    • This can demotivate workers from joining volunteer-based crowdsourcing

Simply passing answers to the requester can invade worker privacy

Our Problem Setting

■ Worker-Private Quality Control Problem
  □ Input: Crowd labels $\{y_{ij} \mid i = 1, \dots, I,\ j = 1, \dots, J\}$
  □ Output: Estimated true labels $\{y_i \mid i = 1, \dots, I\}$
  □ Subject to: Labels and abilities are kept worker-private
    cf. A similar definition can be found in query auditing

We propose a worker-private quality control problem

Definition: Worker $j$'s value $v_j$ is worker-private if others cannot determine $v_j$ uniquely


Proposed Method: Overview

■ Worker-Private Latent Class Protocol
  □ Model: Latent class model (same as the previous one)
  □ Secure Inference:
    • E-step: Requester & workers estimate $\{y_i\}$ by secure computation
    • M-step: Each worker updates $\alpha_j$, $\beta_j$ secretly

Propose a privacy-preserving inference algorithm for the LC model

[Figure: workers keep their answers secret, and the requester obtains the true answers through secure computation (labeled "New!" on the slide)]

Proposed Method: Building Block

■ Secure Sum Protocol (Generalized Paillier cryptosystem [Damgård+,01])
  Compute $\sum_j v_j$ when each worker $j$ holds a value $v_j$ secretly
  □ Additive Homomorphic Cryptosystem:
    For plaintexts $v_1, v_2 \in \mathbb{Z}_n$ and ciphertexts $\mathrm{Enc}(v_1)$, $\mathrm{Enc}(v_2)$,
    $\mathrm{Enc}(v_1 + v_2) = \mathrm{Enc}(v_1) \cdot \mathrm{Enc}(v_2)$ holds
  □ Protocol:
    1) Each worker $j$ computes $\mathrm{Enc}(v_j)$, and the parties compute $\mathrm{Enc}(\sum_j v_j)$
    2) The parties decrypt $\mathrm{Enc}(\sum_j v_j)$ using distributed secret keys

Secure sum allows us to compute the sum without privacy invasion

Lemma: After executing the protocol, no party learns anything other than its initial knowledge & the sum.
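The additive homomorphism behind the secure sum can be illustrated with a toy version of the basic Paillier cryptosystem. This sketch makes several simplifying assumptions that differ from the slides: it uses plain Paillier rather than the generalized variant cited, a single key holder decrypts instead of parties with distributed secret keys, and the primes are far too small for real use. The function names are hypothetical, and `pow(x, -1, n)` requires Python 3.8 or later.

```python
import math
import secrets


def keygen(p: int, q: int):
    """Toy Paillier key generation from two distinct primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n: int, m: int) -> int:
    """Enc(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def add_ciphertexts(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return (c1 * c2) % (n * n)


def decrypt(n: int, sk, c: int) -> int:
    lam, mu = sk
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


# Demo: three workers each hold a private integer v_j.
n, sk = keygen(104729, 1299709)  # small known primes, for illustration only
worker_values = [3, 5, 7]
ciphertexts = [encrypt(n, v) for v in worker_values]

encrypted_sum = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_sum = add_ciphertexts(n, encrypted_sum, c)

total = decrypt(n, sk, encrypted_sum)
print(total)
```

Only the aggregated ciphertext is decrypted, so the individual `v_j` never appear in the clear in this sketch. With a single key holder, however, nothing stops that party from decrypting individual ciphertexts, which is why the protocol on the slide relies on distributed secret keys.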


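Proposed Method: Algorithm

■ Worker-Private Latent Class Protocol
  □ Parameters: $\{\mu_i\}$, $p$, $\{\alpha_j\}$, $\{\beta_j\}$
    • $\mu_i = \Pr[y_i = 1 \mid \text{Data}]$, $p = \Pr[y_i = 1]$
    • $\alpha_j = \Pr[y_{ij} = y_i \mid y_i = 1]$, $\beta_j = \Pr[y_{ij} = y_i \mid y_i = 0]$

Incorporate workers into computation to preserve worker privacy

[Table: public and private values in the protocol]
               Instance 1   Instance 2   Instance 3   p / Abilities
True labels    $\mu_1$      $\mu_2$      $\mu_3$      $p$                    (Public)
Worker 1       1            0            1            $\alpha_1, \beta_1$    (Private values of each worker)
Worker 2       1            0            0            $\alpha_2, \beta_2$    (Private values of each worker)
Worker 3       0            0            0            $\alpha_3, \beta_3$    (Private values of each worker)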

Proposed Method: Algorithm

■ Worker-Private Latent Class Protocol
  □ Parameters: $\{\mu_i\}$, $p$, $\{\alpha_j\}$, $\{\beta_j\}$
  □ E-Step: Parties update true labels using secure sum

[Figure: table of true labels $\mu_1, \mu_2, \mu_3$, prior $p$, each worker's labels, and abilities $(\alpha_j, \beta_j)$; the update of the true labels is annotated "Weighted majority vote of crowd labels"]
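One way to see why a secure sum suffices for the E-step, assuming the latent class model defined earlier with worker labels conditionally independent given the true label: the posterior log-odds of $y_i = 1$ split into a public prior term plus one term per worker. The symbol $v_{ij}$ is introduced here only for illustration and does not come from the slides.

```latex
\log\frac{\mu_i}{1-\mu_i}
  = \log\frac{p}{1-p} + \sum_{j} v_{ij},
\qquad
v_{ij} = y_{ij}\,\log\frac{\alpha_j}{1-\beta_j}
       + (1-y_{ij})\,\log\frac{1-\alpha_j}{\beta_j}
```

Each $v_{ij}$ depends only on worker $j$'s own label and abilities, so $\sum_j v_{ij}$ is the kind of quantity a secure sum protocol can compute; a worker who gave no label to instance $i$ would contribute $v_{ij} = 0$. The result is a majority vote in which each label is weighted by the worker's abilities.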

Proposed Method: Algorithm

■ Worker-Private Latent Class Protocol
  □ Parameters: $\{\mu_i\}$, $p$, $\{\alpha_j\}$, $\{\beta_j\}$
  □ M-Step: Each worker independently updates abilities

[Figure: table of true labels $\mu_1, \mu_2, \mu_3$, prior $p$, each worker's labels, and abilities $(\alpha_j, \beta_j)$; the update of the abilities is annotated "Checking agreement"]
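Under the same latent class model, the standard EM updates for worker $j$'s abilities use only the public posteriors $\mu_i$ and worker $j$'s own labels, which is consistent with each worker performing the update locally. The index set $\mathcal{I}_j$ (instances labeled by worker $j$) is notation assumed here for illustration and does not come from the slides.

```latex
\alpha_j = \frac{\sum_{i \in \mathcal{I}_j} \mu_i \, y_{ij}}
                {\sum_{i \in \mathcal{I}_j} \mu_i},
\qquad
\beta_j = \frac{\sum_{i \in \mathcal{I}_j} (1-\mu_i)(1-y_{ij})}
               {\sum_{i \in \mathcal{I}_j} (1-\mu_i)},
\qquad
p = \frac{1}{I}\sum_{i=1}^{I} \mu_i
```

The update of $p$ involves only the public $\mu_i$, so it requires no private values.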

Proposed Method: Security Analysis

■ Conditions
  □ #(workers) ≥ 3
  □ For each instance, there exists at least one worker who does not give a label to the instance.

Making true labels public does not invade worker privacy

Theorem: After executing the protocol, each worker's labels and abilities are kept worker-private.


Experiments: Overview

■ Cons of secure computation
  1) Approximation:
    • Secure sum protocol works only on integers
    • Use approximation parameter $L$ (a large number) to convert worker $j$'s value as $v_j \to \mathrm{round}(L v_j)$
  2) Computation Time:
    • Cryptographic (& communication) overhead

■ Data Set
  □ Duchenne Data Set: [Whitehill+,09]
    • Judge whether a smile is fake or not
    • #(workers)=20, #(instances)=159

Evaluate two drawbacks of introducing secure computation

[Figure: example from the data set, cited from [Whitehill+,09]]
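The integer conversion described above can be sketched as a simple fixed-point encoding. The helper names `encode` and `decode` and the sample values are assumptions for illustration; no encryption is performed here, only the rounding step whose error the approximation parameter controls.

```python
def encode(v: float, L: int) -> int:
    """Map a real value to an integer with approximation parameter L."""
    return round(L * v)


def decode(total: int, L: int) -> float:
    """Map an integer sum of encoded values back to a real value."""
    return total / L


values = [0.25, -1.5, 0.3333]  # hypothetical per-worker real values v_j
L = 10 ** 6

integer_sum = sum(encode(v, L) for v in values)  # what a secure sum would return
approx = decode(integer_sum, L)
exact = sum(values)

# Each rounding step is off by at most 0.5, so the decoded sum
# is off by at most len(values) * 0.5 / L.
error_bound = len(values) * 0.5 / L
print(approx, exact, error_bound)
```

Increasing `L` shrinks the bound proportionally. Inside an actual cryptosystem over $\mathbb{Z}_n$, negative integers would additionally need a representation modulo $n$, which this sketch omits.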

Experiments: (1) Approximation Accuracy

■ Relative Errors of Estimated Parameters
  □ Compare estimated model parameters with & without secure computation
  □ Approximation parameter $L$ can control errors arbitrarily
  □ Note: Accuracy of the true labels was the same as the original

Estimation errors can be handled by approximation parameter $L$

[Figure: plot with x-axis "Approx. Parameter L"]

Experiments: (2) Computation Time

■ Cryptographic Overhead
  □ Key generation
  □ One iteration of the algorithm (encryption & decryption)

0.8 sec on the real data set (#(workers)=20, #(instances)=159, #(iterations)=15)

Additional computation time on a real data set was less than a second

[Figure: plot with x-axis "#(workers)"]

Conclusion

■ Contributions of Our Work
  □ Notion of worker privacy
    • Workers' sensitive information can leak from their answers
  □ WPLC protocol
    • Introducing secure computation into the LC method
    • Security is theoretically guaranteed
  □ Experiments
    • Accuracy can be controlled by a hyperparameter
    • Computation time is tolerable

We proposed the notion of worker privacy

Questions?