PRIVACY-AWARE Personalization FOR Mobile Advertising.

PRIVACY-AWARE PERSONALIZATION FOR MOBILE ADVERTISING

Authors: Michaela Hardt, Suman Nath

Presented by: Michael Clegg

OUTLINE

Introduction: mobile ads

Problem

Design Goals

Solution

Analysis

MOBILE ADS

Important to the internet economy: $1.2 billion in advertising

Increasingly personalized ads raise privacy concerns

Want to preserve privacy while gathering useful statistics

Mobile devices present new problems

AD DELIVERY SYSTEMS

3 main components: statistics gathering, ad delivery, and billing

Current systems collect data: server-only and client-only personalization

Max efficiency or utility versus max privacy; current systems sacrifice one or the other

Something in between?

PROBLEMS

Want to balance ad relevance, privacy, and efficiency

Maximizing all three is impossible without a trusted third party (TTP)

The optimization problem of balancing these variables is NP-hard

Must use an approximate solution with some tradeoff

Users are unwilling to share content contexts or even clicks, yet clicks are needed to compute CTR

Distributed privacy-preserving aggregation protocols are unsuitable for mobile devices and dynamic networks (query timing, participation)

DEFINITIONS AND CONSIDERATIONS

Participant classes: users (clients), advertisers, ad service (server)

Utility: revenue from ads, or relevance/usefulness of ads to the user

Privacy: measured by the amount of context information shared

Efficiency: how few ads are sent to the client; overhead, i.e. computational and communication costs, specifically between clients and server

DEFINITIONS AND CONSIDERATIONS (CONT.)

CTR: context-dependent click-through rate of an ad; #clicks / #times shown (impressions)
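As a sketch of the CTR definition, the snippet below computes CTR(a|c) for every (ad, context) pair from a hypothetical impression log of (ad, context, clicked) tuples. The log format and the name context_ctr are assumptions for illustration, not the paper's data model.

```python
from collections import Counter


def context_ctr(log):
    """Return CTR(a|c) = #clicks / #impressions for every (ad, context) pair.

    `log` is a hypothetical impression log: one (ad, context, clicked)
    tuple per time an ad was shown.
    """
    shown = Counter()
    clicked = Counter()
    for ad, context, was_clicked in log:
        shown[(ad, context)] += 1
        if was_clicked:
            clicked[(ad, context)] += 1
    # Every key in `shown` has at least one impression, so no division by zero.
    return {key: clicked[key] / shown[key] for key in shown}


if __name__ == "__main__":
    log = [
        ("coffee_ad", "near cafe", True),
        ("coffee_ad", "near cafe", False),
        ("coffee_ad", "at gym", False),
        ("shoe_ad", "at gym", True),
    ]
    print(context_ctr(log))
```

The same ad can have very different CTRs in different contexts (0.5 near a cafe versus 0.0 at the gym in this toy log), which is why the statistics are kept per context.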

Performance: efficiency, scalability, robustness

PER: privacy, efficiency, and relevance

ĉ: generalization of a full context c to a partial context
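To make "generalization" concrete, here is a minimal sketch over a hypothetical location hierarchy. PARENT and generalize are illustrative names, not from the paper; sharing a higher level of the hierarchy reveals less about the user but gives the server a coarser context to target.

```python
# Hypothetical location hierarchy: each context maps to its parent (root -> None).
PARENT = {
    "Pike Place Market": "Seattle",
    "Seattle": "Washington",
    "Washington": "USA",
    "USA": None,
}


def generalize(context, levels):
    """Walk `levels` steps up the hierarchy, stopping at the root."""
    current = context
    for _ in range(levels):
        parent = PARENT.get(current)
        if parent is None:  # already at the root
            break
        current = parent
    return current


if __name__ == "__main__":
    for levels in range(4):
        print(levels, generalize("Pike Place Market", levels))
```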

DESIGN GOALS

Optimizing three variables at once (privacy, communication efficiency, and utility) is hard. How to solve it? Approximate

Create a hybrid system that approximates the solution with some tradeoff; it performs better than both server-only and client-only personalization

Able to target specific users without violating user privacy too much

Ads sent should be as relevant as possible to maximize utility

Efficiently performs computations and transmits data

Handles churning (dynamic network) and requires no TTP

SOLUTION

Greedy algorithm with tight approximation guarantee

Work is split between client and server: the client decides how much context to share, and the server maximizes utility within the communication bound

2 possibly parallel phases: statistics gathering and ad-delivery

Differentially private aggregation protocol to compute CTR in a churning network without a TTP

Tested in Microsoft Bing

PHASE 1: STATISTICS GATHERING

The server gathers/computes offline the CTRs of various ads

Happens periodically in the background

Uses historical context from many users

Protects privacy, to a limited degree, by generalizing context information

Can guarantee l-diversity to protect sensitive contexts

Users may decide whether or not to participate

PHASE 2: AD-DELIVERY

Uses the client's current context (generalized, as limited by the client) and the estimated relevance of ads (based on CTR) to select a set of ads

The number of ads sent does not exceed the allowed overhead

The client chooses the most relevant ad using its full context

The user decides whether or not to click (a click should be likely)

ADVANTAGES

Robustness to dynamic (churning) population

Novel distributed aggregation protocol resistant to attack

No TTP

DISADVANTAGES

PER tradeoff

Less accurate/private in general

Ignores other costs

Conflicting goals of client, server, and advertisers

Revenue != relevance

More computation for the server (see the server-side computation)

TRADEOFF

Not possible to maximize PER without a TTP

Maximizing any two comes at the expense of the third

In practice, however, the authors found that a good balance of all three can be achieved

CLIENT SIDE: THE EASY COMPUTATION

Given a set of ads A, a client with context c maximizes utility by choosing the ad with the greatest expected revenue, max over a in A of p_a · CTR(a|c), where p_a is how much the advertiser pays for ad a in A and CTR(a|c) is the probability that ad a is clicked when shown in context c.

NOTE: ad relevance and revenue are related, but not the same

Easy, since the ad set is small, i.e. |A| ≤ k
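The client-side rule above is a single linear scan over at most k ads. A minimal sketch, assuming the bids p_a and the CTR estimates arrive as plain dicts (a hypothetical format; choose_ad is an illustrative name):

```python
def choose_ad(ads, bids, ctr, context):
    """Pick the ad a in `ads` maximizing p_a * CTR(a | context).

    bids: dict ad -> p_a (amount the advertiser pays; assumed per click)
    ctr:  dict (ad, context) -> estimated CTR; missing pairs count as 0.
    """
    return max(ads, key=lambda a: bids[a] * ctr.get((a, context), 0.0))


if __name__ == "__main__":
    bids = {"ad1": 1.0, "ad2": 0.5, "ad3": 2.0}
    ctr = {("ad1", "gym"): 0.10, ("ad2", "gym"): 0.30, ("ad3", "gym"): 0.02}
    # Expected revenues: ad1 0.10, ad2 0.15, ad3 0.04 -> ad2 wins.
    print(choose_ad(["ad1", "ad2", "ad3"], bids, ctr, "gym"))
```

Note that the highest bidder (ad3) loses here because its CTR in this context is low, which illustrates why revenue and relevance are related but not the same.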

SERVER-SIDE COMPUTATION: THE HARD PROBLEM

Needs to determine the best k ads given a partial context ĉ

Considers the probabilities of all possible full contexts c' that generalize to ĉ, and the expected revenue of ad set A in each

Uses these to determine the expected revenue E[Rev(A | ĉ)]

Needs to find A* = the set A of k ads that maximizes E[Rev(A | ĉ)]

This is an NP-hard problem! No polynomial-time solution is known

GREEDY APPROXIMATION

Algorithm 1 (in the paper) shows the greedy approximation

Constructs A incrementally

Approximates the optimal value to within a factor of (1 − 1/e)

Uses the benefit function E[Rev(A ∪ {a'} | ĉ)] − E[Rev(A | ĉ)] of adding ad a' to set A

At each step, chooses the a' that maximizes the benefit, hence "greedy"

The authors prove these claims formally
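The greedy step can be sketched as follows. This is an illustration under the same assumed revenue model as before (the client shows its single best ad in each full context), not the paper's Algorithm 1 verbatim; greedy_ads and the dict formats are hypothetical.

```python
import math


def expected_revenue(ad_set, bids, ctr, context_probs):
    """Sum over c' of Pr[c' | c_hat] * max_{a in A} p_a * CTR(a | c') (assumed model)."""
    if not ad_set:
        return 0.0
    return sum(
        prob * max(bids[a] * ctr.get((a, c), 0.0) for a in ad_set)
        for c, prob in context_probs.items()
    )


def greedy_ads(ads, k, bids, ctr, context_probs):
    """Build A one ad at a time, each time adding the ad with the largest benefit."""
    chosen = []
    current = 0.0
    for _ in range(min(k, len(ads))):
        best_ad, best_gain = None, -math.inf
        for a in ads:
            if a in chosen:
                continue
            # Benefit of a: E[Rev(A + {a})] - E[Rev(A)]
            gain = expected_revenue(chosen + [a], bids, ctr, context_probs) - current
            if gain > best_gain:
                best_ad, best_gain = a, gain
        chosen.append(best_ad)
        current = expected_revenue(chosen, bids, ctr, context_probs)
    return chosen


if __name__ == "__main__":
    bids = {"a": 1.0, "b": 1.0, "c": 1.0}
    ctr = {("a", "gym"): 0.4, ("a", "cafe"): 0.0,
           ("b", "gym"): 0.0, ("b", "cafe"): 0.3,
           ("c", "gym"): 0.35, ("c", "cafe"): 0.25}
    probs = {"gym": 0.5, "cafe": 0.5}
    picked = greedy_ads(["a", "b", "c"], 2, bids, ctr, probs)
    print(picked, expected_revenue(picked, bids, ctr, probs))
```

On this toy input greedy first takes c (the best single ad, 0.3) and finishes at about 0.325, below the optimum of 0.35 reached by {a, b}, yet well within the (1 − 1/e) bound. Each step costs one revenue evaluation per remaining ad, so the whole run is polynomial.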

NP-COMPLETENESS: DETAILS

Some problems cannot be solved at all, let alone in O(n^k) (polynomial) time, for a constant k and input size n

There are also "intractable" problems for which we do not know whether they can be solved in O(n^k) time; the hardest such problems in NP are called NP-complete (NPC)

P = set of problems solvable in polynomial time

NP-COMPLETENESS: DETAILS (CONT.)

NP = set of problems whose solutions can be verified in polynomial time. Does P = NP? This is a major open question
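To illustrate "verifiable in polynomial time", here is a checker for the decision version of maximum coverage (given sets, a budget k, and a target, can at most k sets cover at least target elements?), a standard NP-complete problem. The certificate is a list of chosen set indices; checking it takes time linear in the total size of the chosen sets, even though finding a good certificate is hard. The function name is illustrative.

```python
def verify_max_coverage(sets, k, target, certificate):
    """Polynomial-time check that `certificate` picks at most k distinct sets
    whose union has at least `target` elements."""
    if len(certificate) > k or len(set(certificate)) != len(certificate):
        return False
    if any(i < 0 or i >= len(sets) for i in certificate):
        return False
    covered = set()
    for i in certificate:
        covered |= sets[i]
    return len(covered) >= target


if __name__ == "__main__":
    sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
    print(verify_max_coverage(sets, 2, 6, [0, 2]))  # covers all 6 elements
    print(verify_max_coverage(sets, 2, 6, [0, 1]))  # covers only 4 elements
```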

NP-hard = {L | L' ≤_P L for every L' in NP}

In other words, the set of problems that are at least as hard as every problem in NP: every problem in NP can be reduced to them in polynomial time

NPC: set whose elements are in both NP and NP-hard

PRIVATE STATISTICS GATHERING

The algorithm uses a server and a proxy to make the statistics-gathering protocols scalable and robust

Each user keeps their own ad history and allows access to it via the proposed protocol

The server distributes keys and computes the final result

The proxy is responsible for aggregation and anonymization

Statistics computed: probability distributions over contexts, Pr[c], and CTR(a|c)

ASSUMPTIONS

The server and proxy are honest but curious and do not collude

Most users are honest: at most a fraction t of the users are malicious

Noise is added in case the server cannot be trusted; this ensures differential privacy in a distributed manner

Protocol 1: Count(σ², t). Robust, distributed count computing a privacy-preserving version of the sum over all private user bits b_i.

1. Each user i with bit b_i samples k_i ∈ {0, ..., p − 1} i.i.d.
2. Each user i samples r_i from N(0, σ²/((1 − t)N − 1)).
3. Each user i uses two-phase commit to atomically send k_i to the server and m_i = b_i + ⌊r_i⌋ + k_i mod p to the proxy.
4. The proxy sums up all incoming messages m_i and forwards s = Σ_i m_i mod p to the server.
5. The server subtracts from s the random numbers k_i it received (mod p) and releases the result Σ_i (b_i + ⌊r_i⌋).
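A minimal single-process simulation of Protocol 1, to show why the masks k_i cancel and the server ends up with only the noisy sum. Assumptions not taken from the slides: p = 2^31 − 1 as an arbitrary large prime, k_i drawn uniformly, all users honest, and two-phase commit modeled as all-or-nothing delivery (a churned user sends neither k_i nor m_i). The name count_protocol and its parameters are hypothetical.

```python
import math
import random


def count_protocol(bits, sigma2, t, p=2_147_483_647, drop_prob=0.0, seed=None):
    """Simulate Count(sigma^2, t); return (released_value, check_value).

    check_value is computed only to show that the masks cancel; neither the
    server nor the proxy would ever see it.
    """
    rng = random.Random(seed)
    denom = (1 - t) * len(bits) - 1
    if denom <= 0:
        raise ValueError("need (1 - t) * N > 1")
    sd = math.sqrt(sigma2 / denom)  # per-user noise standard deviation

    server_keys = []     # k_i received by the server
    proxy_messages = []  # m_i received by the proxy
    check_value = 0      # sum of b_i + floor(r_i) over delivered users

    for b in bits:
        k = rng.randrange(p)                 # step 1: mask k_i in {0, ..., p-1}
        r = math.floor(rng.gauss(0.0, sd))   # step 2: noise r_i (floored, as in step 3)
        m = (b + r + k) % p                  # step 3: m_i = b_i + floor(r_i) + k_i mod p
        if rng.random() < drop_prob:         # user churns: both messages are lost
            continue
        server_keys.append(k)
        proxy_messages.append(m)
        check_value += b + r

    s = sum(proxy_messages) % p              # step 4: proxy sums the m_i
    released = (s - sum(server_keys)) % p    # step 5: server removes the masks
    if released > p // 2:                    # map back from Z_p to a signed integer
        released -= p
    return released, check_value


if __name__ == "__main__":
    bits = [1] * 300 + [0] * 700
    print(count_protocol(bits, sigma2=100.0, t=0.1, drop_prob=0.05, seed=1))
```

Under the non-collusion assumption, the proxy sees only uniformly masked m_i and the server sees only the k_i and the aggregate s, so neither learns an individual b_i; the released value carries the sum of the per-user noise terms.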

COUNTING PROTOCOL

Used to compute CTR

Computes the sum of the users' bits with Gaussian noise added

Ad viewed vector, sum of user bits

The number of messages exchanged is linear in the number of users N

Advantage: terminates successfully if at least (1 − t)·N users send messages to the server and proxy

Algorithm 2: Privacy-preserving estimates.
Estimates(context-driven ad log, noise scale, threshold min_support, contribution bound m, hierarchy H)
    for each user do
        delete all but m views or clicks on ads (and their contexts) of this user from the ad log
    return TopDown(ad log, root(H), noise scale, min_support)
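The first step of Algorithm 2 caps how many log entries any one user contributes. Bounding each user's contribution limits how much a single user can change any count, which is presumably why the step is there: it is the usual prerequisite for calibrating differential-privacy noise. A minimal sketch, assuming a hypothetical log of (user, context, ad, clicked) rows; the slide does not say which m entries survive, so this version keeps each user's first m.

```python
from collections import defaultdict


def bound_contributions(ad_log, m):
    """Keep at most m entries (views or clicks) per user.

    ad_log: list of (user, context, ad, clicked) rows (hypothetical format).
    Keeps each user's first m rows; that choice is an assumption.
    """
    kept = []
    per_user = defaultdict(int)
    for row in ad_log:
        user = row[0]
        if per_user[user] < m:
            per_user[user] += 1
            kept.append(row)
    return kept


if __name__ == "__main__":
    log = [("u1", "WA", "ad1", True)] * 5 + [("u2", "CA", "ad2", False)] * 2
    print(len(bound_contributions(log, 3)))  # u1 capped at 3, u2 keeps 2
```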

Algorithm 3: Top-down computation of noisy statistics.
TopDown(context-driven ad log, node v in the hierarchy, noise scale, threshold min_support)
    A0 = set of ads with bids on the context of v
    for a in A0 do
        clicks_{a,v} = Count(# of clicks on a in v in the ad log)
        no_clicks_{a,v} = Count(# of views of a without clicks in v)
        release estimated CTR(a|v) = clicks_{a,v} / (clicks_{a,v} + no_clicks_{a,v})
    count_v = Count(# of appearances of node v)
    release count_v
    if count_v > min_support then
        for each child w of v do
            TopDown(ad log, w, noise scale, min_support)
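A runnable sketch of the top-down recursion. Assumptions: the distributed Count of Protocol 1 is replaced by a hypothetical central stand-in (true count plus Gaussian noise), the hierarchy is a dict from node to children, bids is a dict from node to the ads bidding on it, and log rows are (user, context, ad, clicked) with contexts at the leaves. All names are illustrative.

```python
import random


def subtree_leaves(hierarchy, v):
    """All leaf contexts under node v (hierarchy maps node -> list of children)."""
    children = hierarchy.get(v, [])
    if not children:
        return {v}
    leaves = set()
    for w in children:
        leaves |= subtree_leaves(hierarchy, w)
    return leaves


def noisy_count(true_count, sigma, rng):
    # Hypothetical stand-in for Protocol 1: a central count plus Gaussian noise.
    return true_count + rng.gauss(0.0, sigma)


def top_down(ad_log, hierarchy, bids, v, sigma, min_support, rng, released=None):
    """Release noisy CTR(a|v) and count_v, then recurse into children if supported."""
    if released is None:
        released = {}
    leaves = subtree_leaves(hierarchy, v)
    in_v = [row for row in ad_log if row[1] in leaves]  # rows whose context is under v
    for a in bids.get(v, []):                            # A0: ads bidding on v
        clicks = noisy_count(sum(1 for _, _, ad, c in in_v if ad == a and c), sigma, rng)
        no_clicks = noisy_count(sum(1 for _, _, ad, c in in_v if ad == a and not c), sigma, rng)
        views = clicks + no_clicks
        released[("ctr", a, v)] = clicks / views if views > 0 else None
    count_v = noisy_count(len(in_v), sigma, rng)
    released[("count", v)] = count_v
    if count_v > min_support:                            # only refine well-supported nodes
        for w in hierarchy.get(v, []):
            top_down(ad_log, hierarchy, bids, w, sigma, min_support, rng, released)
    return released


if __name__ == "__main__":
    hierarchy = {"USA": ["WA", "CA"]}
    bids = {"USA": ["ad1"], "WA": ["ad1"], "CA": ["ad1"]}
    log = [("u1", "WA", "ad1", True), ("u2", "WA", "ad1", False),
           ("u3", "CA", "ad1", False), ("u4", "CA", "ad1", False)]
    print(top_down(log, hierarchy, bids, "USA", 0.5, 1, random.Random(0)))
```

The min_support check is what keeps the released statistics coarse where data is sparse: children of a node with too few (noisy) appearances are never released.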

REFERENCES

“Privacy-Aware Personalization for Mobile Advertising”, Michaela Hardt and Suman Nath, ACM CCS, 2012.

http://www.cs.wichita.edu/~jadliwala/CS898AB/classpapers/Week8/p662-hardt.pdf
