PRIVACY-AWARE Personalization FOR Mobile Advertising.
PRIVACY-AWARE PERSONALIZATION FOR MOBILE ADVERTISING
Authors: Michaela Hardt, Suman Nath
Presented by: Michael Clegg
OUTLINE
Introduction: mobile ads
Problem
Design Goals
Solution
Analysis
MOBILE ADS
Important to internet economy
$1.2 billion in advertising
Increasingly personalized ads raise privacy concerns
Want to preserve privacy while gathering useful statistics
Mobile devices present new problems
AD DELIVERY SYSTEMS
3 main components: statistics gathering, ad delivery, and billing
Current systems collect data: server-only and client-only personalization
Max efficiency or utility versus max privacy; sacrifice one or other
Something in between?
PROBLEMS
Want to balance ad relevance, privacy, and efficiency
Maximizing all three variables is impossible without a TTP (trusted third party)
Optimization problem of balancing these variables is NP-hard
Must use an approximate solution with some tradeoff
Users are unwilling to share contexts or even clicks, which are needed to estimate CTR
Distributed privacy-preserving aggregation protocols are unsuitable for mobile devices and dynamic networks; query timing, participation
DEFINITIONS AND CONSIDERATIONS
Participant classes: users (clients), advertisers, ad service (server)
Utility: revenue from or relevance/usefulness of ads to user
Privacy: measured in degree of context information shared
Efficiency: how few ads are sent to the client; overhead, computational and communication costs, specifically between clients and server
DEFINITIONS AND CONSIDERATIONS (CONT.)
CTR: context-dependent click-through rate of an ad; #clicks / #impressions (times shown)
Performance: efficiency, scalability, robustness
PER: privacy, efficiency, and relevance
c → ĉ: generalization of context c to a partial context ĉ
DESIGN GOALS
Optimization problem w/ 3 variables: privacy, communication efficiency, and utility is hard. How to solve? Approximate
Create a hybrid system to approximate solution with some tradeoff; performs better than both
Able to target specific users w/o violating user privacy too much
Ads sent should be as relevant as possible to maximize utility
Efficiently performs computations and transmits data
Handles churning (dynamic network), requires no TTP
SOLUTION
Greedy algorithm with tight approximation guarantee
Work is split between client and server: the client decides how much context to share; the server maximizes utility within the communication bound
2 possibly parallel phases: statistics gathering and ad-delivery
Differentially private aggregation protocol to compute CTR in a churning network without a TTP
Tested in Microsoft Bing
PHASE 1: STATISTICS GATHERING
The server gathers/computes offline the CTRs of various ads
Happens periodically in the background
Uses historical context from many users
Guarantees limited privacy by generalization of context info
Can guarantee l-diversity to protect sensitive contexts
User may decide whether to participate or not
PHASE 2: AD-DELIVERY
Uses client’s current context (as limited) and estimated relevance of ads (based on CTR) to select a set of ads.
#ads sent does not exceed allowed overhead
Most relevant one is chosen by the client using the full context
User decides whether or not to click (should be likely)
ADVANTAGES
Robustness to dynamic (churning) population
Novel distributed aggregation protocol resistant to attack
No TTP
DISADVANTAGES
PER tradeoff
Less accurate/private in general
Ignores other costs
Conflicting goals of client, server, and advertisers
Revenue != relevance
More computations for the server (see 15)
TRADEOFF
Not possible to maximize PER without a TTP
Maximizing any 2 means minimizing the third
In practice, however, the authors found that all three can be optimized to a reasonable degree at the same time
CLIENT SIDE: THE EASY COMPUTATION
Given a set of ads A, a client with context c maximizes utility by choosing the ad with the greatest expected revenue, max over a in A of p_a * CTR(a|c), where p_a is how much the advertiser pays for a click on ad a and CTR(a|c) is the probability that ad a is clicked when shown in context c.
NOTE: ad relevance and revenue are related, but not equal
Easy, since the ad set is small, i.e. |A| <= k
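The client-side step above can be sketched in a few lines of Python. This is only an illustration: the function name `pick_best_ad`, the dictionary layout for bids and CTRs, and the sample numbers are assumptions, not taken from the paper.

```python
from typing import Dict, Iterable, Tuple


def pick_best_ad(
    ads: Iterable[str],
    context: str,
    bids: Dict[str, float],
    ctr: Dict[Tuple[str, str], float],
) -> str:
    """Return the ad a maximizing p_a * CTR(a | c) for the full context c.

    Hypothetical data layout: bids[a] is p_a, ctr[(a, c)] is CTR(a | c).
    Missing (ad, context) pairs are treated as CTR 0.
    """
    return max(ads, key=lambda a: bids[a] * ctr.get((a, context), 0.0))


# Made-up example: three candidate ads received from the server (k = 3).
bids = {"pizza": 1.0, "gym": 2.0, "taxi": 0.5}
ctr = {
    ("pizza", "restaurant"): 0.10,
    ("gym", "restaurant"): 0.01,
    ("taxi", "restaurant"): 0.08,
}
best = pick_best_ad(["pizza", "gym", "taxi"], "restaurant", bids, ctr)
print(best)
```

The scan is linear in |A|, which is why this side of the computation is cheap when the server sends at most k ads.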
SERVER-SIDE COMPUTATION: THE HARD PROBLEM
Needs to determine the best k ads given partial context ĉ
Considers probabilities of all possible contexts c’ that generalize to ĉ and the expected revenues of the ads in A
Uses these to determine the expected revenue E[Rev(A | ĉ)]
Needs to find A* = argmax of E[Rev(A | ĉ)] over ad sets A with |A| = k
This is an NP-hard problem! No polynomial-time solution known
GREEDY APPROXIMATION
Algorithm 1 shows the greedy approximation
Constructs A incrementally
Approximates optimal value to within a factor of (1-1/e)
Uses the benefit function E[Rev(A+A’| ĉ)] – E[…A…], where we add A’ to set A
Choose A’ that maximizes the benefit, hence greedy
They prove these claims formally
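A minimal sketch of the greedy construction, under stated assumptions: expected revenue is taken to be the sum over full contexts c of Pr[c | ĉ] times the best p_a * CTR(a|c) among the ads sent (the client picks the best ad, as on the client-side slide), and ads are added one at a time. The function names, data layout, and numbers are hypothetical and are not the paper's Algorithm 1 verbatim.

```python
from typing import Dict, List, Sequence, Tuple

Ctr = Dict[Tuple[str, str], float]


def expected_revenue(
    ad_set: Sequence[str],
    context_probs: Dict[str, float],  # assumed Pr[c | c_hat] for each full context c
    bids: Dict[str, float],
    ctr: Ctr,
) -> float:
    """Assumed form of E[Rev(A | c_hat)]: the client picks the best ad per context."""
    if not ad_set:
        return 0.0
    return sum(
        prob * max(bids[a] * ctr.get((a, c), 0.0) for a in ad_set)
        for c, prob in context_probs.items()
    )


def greedy_select(
    ads: Sequence[str],
    k: int,
    context_probs: Dict[str, float],
    bids: Dict[str, float],
    ctr: Ctr,
) -> List[str]:
    """Build the ad set incrementally, always adding the ad with the largest benefit."""
    chosen: List[str] = []
    remaining = list(ads)
    for _ in range(min(k, len(remaining))):
        base = expected_revenue(chosen, context_probs, bids, ctr)
        best = max(
            remaining,
            key=lambda a: expected_revenue(chosen + [a], context_probs, bids, ctr) - base,
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Made-up example: the partial context covers two equally likely full contexts.
context_probs = {"gym": 0.5, "restaurant": 0.5}
bids = {"protein": 1.0, "pizza": 1.0, "generic": 1.0}
ctr = {
    ("protein", "gym"): 0.20,
    ("pizza", "restaurant"): 0.20,
    ("generic", "gym"): 0.12,
    ("generic", "restaurant"): 0.12,
}
picked = greedy_select(["protein", "pizza", "generic"], 2, context_probs, bids, ctr)
print(picked, expected_revenue(picked, context_probs, bids, ctr))
```

In this toy instance greedy first takes the generic ad (best single ad) and ends with expected revenue 0.16, while the best pair {protein, pizza} reaches 0.20. The ratio 0.8 is within the (1-1/e) bound quoted above, which shows greedy can be suboptimal but not arbitrarily bad.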
NP-COMPLETENESS: DETAILS
Some problems can’t be solved at all, let alone in polynomial time, i.e. O(n^k) for constant k and input size n
There are also problems for which no polynomial-time algorithm is known, but none has been proven impossible either. The NP-complete (NPC) problems are of this kind
P = set of problems solvable in polynomial time
NP-COMPLETENESS: DETAILS (CONT.)
NP = set of problems whose solutions are verifiable in polynomial time. Is NP == P? Open question. BIG problem!
NP-hard = {L | L’ <=_P L for every L’ in NP}
In other words, the set of problems that are at least as hard as every problem in NP: any problem in NP can be reduced to them in polynomial time
NPC: set whose elements are in both NP and NP-hard
PRIVATE STATISTICS GATHERING
Algorithm uses a server and proxy for scalable and robust statistics gathering protocols
The user has ad history. Allows access via proposed protocol
The server distributes keys and computes the final result
The proxy is responsible for aggregation and anonymization
Probability distributions over contexts Pr[c], CTR(a|c)
ASSUMPTIONS
Honest but curious servers; i.e. no collusion
Honest fraction of users; at most a fraction t of users are malicious
Can use noise if server cannot be trusted; ensures differential privacy in a distributed manner
Protocol 1 Robust, distributed count computing a privacy-preserving version of the sum over all private user bits b_i.
Count(σ², t)
1. Each user i with bit b_i samples k_i ∈ {0, ..., p-1} i.i.d.
2. Each user i samples r_i from N(σ²/((1-t)N - 1)).
3. Each user i uses 2-Phase-Commit to atomically send k_i to the server and m_i = b_i + ⌊r_i⌋ + k_i mod p to the proxy.
4. The proxy sums up all incoming messages m_i. It forwards s = Σ m_i mod p to the server.
5. The server subtracts from s the random numbers k_i (mod p) it received and releases the result Σ (b_i + ⌊r_i⌋).
COUNTING PROTOCOL
Used to compute CTR
Sum with Gaussian noise of the users' bits
Ad viewed vector, sum of user bits
Number of messages exchanged is linear in #users N
Advantage: successful termination if at least (1-t)*N users send messages to the server and proxy
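The message flow of the counting protocol can be simulated in a single process to see why the proxy learns nothing useful from masked messages and why the server recovers only the noisy sum. This is a hedged sketch: the function name `noisy_count`, the choice of modulus `p`, and running all three parties in one loop are assumptions for illustration, and the 2-Phase-Commit step is not modeled.

```python
import math
import random
from typing import Optional, Sequence


def noisy_count(
    bits: Sequence[int],
    sigma2: float,
    t: float,
    p: int = 2**31 - 1,  # assumed modulus; any p much larger than N works here
    rng: Optional[random.Random] = None,
) -> int:
    """Simulate users, proxy, and server of the counting protocol in one process."""
    rng = rng or random.Random()
    n = len(bits)
    denom = (1 - t) * n - 1
    if denom <= 0:
        raise ValueError("need (1 - t) * N > 1 users")
    std = math.sqrt(sigma2 / denom)  # per-user noise, variance sigma^2 / ((1-t)N - 1)

    keys_at_server = []     # k_i, sent to the server
    messages_at_proxy = []  # m_i, sent to the proxy
    for b in bits:
        k = rng.randrange(p)                    # step 1: uniform mask in {0..p-1}
        r = math.floor(rng.gauss(0.0, std))     # step 2: Gaussian noise, floored
        keys_at_server.append(k)
        messages_at_proxy.append((b + r + k) % p)  # step 3

    s = sum(messages_at_proxy) % p              # step 4: proxy aggregates
    result = (s - sum(keys_at_server)) % p      # step 5: server removes the masks
    # Map the residue back to a signed integer, since noise can make the sum negative.
    return result - p if result > p // 2 else result


bits = [1, 0, 1, 1] * 25  # 100 users, 75 of them hold bit 1
print(noisy_count(bits, sigma2=0.0, t=0.1, rng=random.Random(0)))
```

With `sigma2=0.0` the noise vanishes and the masks cancel exactly, so the call returns the true count; this only checks the modular arithmetic. With positive `sigma2` each user contributes a small share of the noise, so no single party adds it all.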
Algorithm 2 Privacy-preserving Estimates.
Estimates(context-driven ad log, noise scale λ, threshold min_support, contribution bound m, hierarchy H)
  for each user do
    Delete all but m views or clicks on ads and their contexts of this user from the ad log.
  return TopDown(ad log, root(H), λ, min_support)
Algorithm 3 Top-Down computation of noisy statistics.
TopDown(context-driven ad log, node v in the hierarchy, noise scale λ, threshold min_support)
  A’ = set of ads with bids on context of v
  for a ∈ A’ do
    clicks_{a,v} = Count(# of clicks on a in v in ad log)
    no_clicks_{a,v} = Count(# of views of a w/o clicks in v)
    release estimated CTR(a|v) = clicks_{a,v} / (clicks_{a,v} + no_clicks_{a,v})
  count_v = Count(# of appearances of node v)
  release count_v
  if count_v > min_support then
    for each child w of v do
      return TopDown(ad log, w, λ, min_support)
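Algorithms 2 and 3 can be sketched together in Python to show how the contribution bound and the min_support threshold interact. Everything concrete here is an assumption for illustration: the log layout (user, leaf context, ad, clicked), the hierarchy as a dict of children, treating "ads with bids on context of v" as the ads seen under v, and passing the private counting routine in as a `count` callable. The sketch recurses into every child of a node.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

LogEntry = Tuple[str, str, str, bool]  # (user, leaf context, ad, clicked), hypothetical
Stats = Dict[str, Tuple[float, Dict[str, float]]]  # node -> (count_v, {ad: CTR})


def descendants(hierarchy: Dict[str, List[str]], v: str) -> Set[str]:
    """All context nodes covered by v, including v itself."""
    covered = {v}
    for w in hierarchy.get(v, []):
        covered |= descendants(hierarchy, w)
    return covered


def top_down(log: List[LogEntry], hierarchy: Dict[str, List[str]], v: str,
             count: Callable[[int], float], min_support: float,
             stats: Optional[Stats] = None) -> Stats:
    """Release noisy counts and CTRs for v, then descend only if v is well supported."""
    stats = {} if stats is None else stats
    covered = descendants(hierarchy, v)
    entries = [e for e in log if e[1] in covered]
    ctrs: Dict[str, float] = {}
    for ad in sorted({e[2] for e in entries}):
        clicks = count(sum(1 for e in entries if e[2] == ad and e[3]))
        no_clicks = count(sum(1 for e in entries if e[2] == ad and not e[3]))
        total = clicks + no_clicks
        ctrs[ad] = clicks / total if total > 0 else 0.0  # noisy values may leave [0, 1]
    count_v = count(len(entries))
    stats[v] = (count_v, ctrs)
    if count_v > min_support:
        for w in hierarchy.get(v, []):
            top_down(log, hierarchy, w, count, min_support, stats)
    return stats


def estimates(log: List[LogEntry], hierarchy: Dict[str, List[str]], root: str,
              count: Callable[[int], float], min_support: float, m: int) -> Stats:
    """Keep at most m log entries per user, then run the top-down pass."""
    kept: List[LogEntry] = []
    per_user: Dict[str, int] = {}
    for entry in log:
        per_user[entry[0]] = per_user.get(entry[0], 0) + 1
        if per_user[entry[0]] <= m:
            kept.append(entry)
    return top_down(kept, hierarchy, root, count, min_support)


hierarchy = {"any": ["food", "sports"], "food": [], "sports": []}
log = [
    ("u1", "food", "pizza", True),
    ("u1", "food", "pizza", False),
    ("u1", "food", "pizza", False),
    ("u2", "sports", "gym", True),
    ("u3", "sports", "gym", False),
]
# Identity stands in for the private Count routine so the output is deterministic.
print(estimates(log, hierarchy, "any", count=lambda x: x, min_support=3, m=2))
```

Stopping the descent at poorly supported nodes means that rare, fine-grained contexts are never released, and the per-user bound m limits how much any one user can shift a count.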
REFERENCES
“Privacy-Aware Personalization for Mobile Advertising”, Michaela Hardt and Suman Nath, ACM CCS, 2012.