Graph cluster randomization
Graph Cluster Randomization: Network Exposure to Multiple Universes
Authors:
Johan Ugander, Cornell University
Brian Karrer, Facebook
Lars Backstrom, Facebook
Jon Kleinberg, Cornell University
Presented by:
Subhashis Hazarika,
Ohio State University
Motivation
• To estimate the “average effect” of a treatment on a sample when the treatment of individuals in the sample spills over to neighboring individuals via an underlying social network.
• A/B testing is so far the standard approach for estimating the “average effect” of a treatment on a sample population.
• But A/B testing doesn’t take into account social interference among the individuals being treated.
17-10-2013 2
A/B testing
• Assumption: SUTVA (stable unit treatment value assumption)
• Universe A and Universe B are treated as two separate parallel universes.
New page A
• Treatment group
• Individuals respond independently
Default page B
• Control group
• Independent response
Proposed Solution
Graph Cluster Randomization
– Formulate Average Treatment and Network Exposure w.r.t graph-theoretic conditions
– Apply graph cluster randomization algorithms on the formulated model
– Come up with an unbiased estimator, namely the Horvitz-Thompson estimator, with an upper bound on the estimator variance that is linear in the degrees of the graph.
Average Treatment
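• Defined following Aronow and Samii, without assuming SUTVA.
• Let $z \in \{0,1\}^n$ be the treatment assignment vector.
• Let $Y_i(z)$ be the potential outcome of user i under the treatment assignment vector z.
• Then the avg. treatment effect between two assignments $z_1$ and $z_0$ is given by:
$\tau(z_1, z_0) = \frac{1}{n} \sum_{i=1}^{n} \left[ Y_i(Z = z_1) - Y_i(Z = z_0) \right]$
with the case of interest being $z_1 = \mathbf{1}$ (everyone treated) versus $z_0 = \mathbf{0}$ (no one treated).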
Network Exposure
• User i is “network exposed to treatment” under an assignment vector z if i’s response under z is the same as i’s response under the all-treated assignment vector 1.
• The following exposure conditions can be used in the experiment:
o Full exposure
o Absolute k exposure
o Fractional q exposure
Graph Cluster Randomization
• At a high level, GCR is a technique in which the graph is partitioned into clusters and then randomization between treatment and control is performed at the cluster level.
• To compute a vertex’s exposure probability, we only need to know how the set of clusters intersects the local graph structure near that vertex.
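As a rough illustration of the two points above, here is a minimal Python sketch. It assumes the graph is given as an adjacency dict `adj`, that a clustering is supplied as a dict `clusters` mapping each vertex to a cluster id, and that each cluster is independently treated with probability `p`; all of these names are hypothetical. For the exposure probability it further assumes the full neighborhood exposure condition, under which a vertex is exposed to treatment exactly when every cluster touching it or its neighbors is treated.

```python
import random

def cluster_randomize(clusters, p=0.5, seed=None):
    """Assign treatment (1) or control (0) per cluster, then copy to vertices.

    clusters: dict mapping vertex -> cluster id (hypothetical input format).
    """
    rng = random.Random(seed)
    cluster_ids = sorted(set(clusters.values()))
    # One independent coin flip per cluster, not per vertex.
    cluster_arm = {c: 1 if rng.random() < p else 0 for c in cluster_ids}
    return {v: cluster_arm[c] for v, c in clusters.items()}

def full_exposure_probability(v, adj, clusters, p=0.5):
    """Probability that v and all its neighbors are treated.

    Only the clusters intersecting v's closed neighborhood matter:
    all of them must be treated, each independently with probability p.
    """
    touched = {clusters[v]} | {clusters[u] for u in adj[v]}
    return p ** len(touched)
```

On a path 0-1-2-3 with clusters {0, 1} and {2, 3}, vertex 0 touches one cluster and vertex 1 touches two, so their probabilities would be p and p squared respectively.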
Exposure Models
• Exposure Condition of an individual determines how they experience the intervention in full conjunction with how the world experiences the intervention.
• Let $\sigma_i^x$ be the set of all assignment vectors z for which i experiences outcome x; this is the exposure condition for i.
• The exposure model for user i is a set of exposure conditions that completely partitions the possible assignment vectors z.
• Here we are interested only in $\sigma_i^1$ and $\sigma_i^0$, the conditions for network exposure to treatment and to control.
Exposure Conditions
• Neighborhood exposure (local exposure conditions):
o Full neighborhood exposure
o Absolute k-neighborhood exposure
o Fractional q-neighborhood exposure
• Core exposure (global dependency):
o Component exposure
o Absolute k-core exposure
o Fractional q-core exposure
Note: the assignment vectors of each core exposure condition are entirely contained in those of the associated neighborhood exposure condition.
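The three neighborhood exposure conditions can be checked directly from an assignment. The sketch below is illustrative only: it assumes an adjacency dict `adj` and an assignment dict `z` (vertex -> 0/1), both hypothetical names, and it tests exposure to treatment (vertex treated, plus all, at least k, or at least a fraction q of its neighbors treated). Core exposure is not covered here.

```python
def treated_neighbors(v, adj, z):
    """Number of v's neighbors that are treated under assignment z."""
    return sum(z[u] for u in adj[v])

def full_exposure(v, adj, z):
    """v and every neighbor of v are treated."""
    return z[v] == 1 and all(z[u] == 1 for u in adj[v])

def absolute_k_exposure(v, adj, z, k):
    """v is treated and at least k of its neighbors are treated."""
    return z[v] == 1 and treated_neighbors(v, adj, z) >= k

def fractional_q_exposure(v, adj, z, q):
    """v is treated and at least a fraction q of its neighbors are treated."""
    return z[v] == 1 and treated_neighbors(v, adj, z) >= q * len(adj[v])
```

Exposure to control would be checked the same way with the roles of 0 and 1 swapped.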
Randomization and Estimation
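Select the assignment vector z at random from Z, a random vector whose range is $\{0,1\}^n$.
$\Pr(Z = z)$ is the distribution of Z.
$\Pr(Z \in \sigma_i^1)$ is the probability of network exposure to treatment and $\Pr(Z \in \sigma_i^0)$ the probability of network exposure to control, where $\sigma_i^1$ ($\sigma_i^0$) is the set of assignment vectors under which i is network exposed to treatment (control).
Therefore the avg. treatment effect can be estimated by the Horvitz-Thompson estimator,
$\hat{\tau}(Z) = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{Y_i(Z)\,\mathbf{1}[Z \in \sigma_i^1]}{\Pr(Z \in \sigma_i^1)} - \frac{Y_i(Z)\,\mathbf{1}[Z \in \sigma_i^0]}{\Pr(Z \in \sigma_i^0)} \right]$
The expectation over Z gives the actual avg. treatment effect, provided all exposure probabilities are positive.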
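A minimal sketch of the Horvitz-Thompson estimate for one realized experiment follows. It assumes the observed outcomes, the per-user exposure indicators, and the per-user exposure probabilities are already available as parallel lists; the function and parameter names are hypothetical. Users who are exposed to neither condition contribute nothing to the sum.

```python
def horvitz_thompson(outcomes, exposed_treat, exposed_ctrl, prob_treat, prob_ctrl):
    """Inverse-probability-weighted estimate of the average treatment effect.

    outcomes[i]       observed response of user i
    exposed_treat[i]  True if user i is network exposed to treatment
    exposed_ctrl[i]   True if user i is network exposed to control
    prob_treat[i]     probability that user i is exposed to treatment (> 0)
    prob_ctrl[i]      probability that user i is exposed to control (> 0)
    """
    n = len(outcomes)
    total = 0.0
    for i in range(n):
        if exposed_treat[i]:
            total += outcomes[i] / prob_treat[i]
        if exposed_ctrl[i]:
            total -= outcomes[i] / prob_ctrl[i]
    # Divide by the full population size, not by the number of exposed users.
    return total / n
```

Dividing by n rather than by the count of exposed users is what makes the weighting by inverse probabilities unbiased; it is also why very small exposure probabilities inflate the variance.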
Exposure Probabilities
Model: full neighborhood exposure + independent vertex randomization (each vertex treated with probability p)
– Probability of exposure to treatment will be $p^{d_i+1}$
– Probability of exposure to control will be $(1-p)^{d_i+1}$
– The exposure probability for a high-degree vertex will be exponentially small in its degree $d_i$, and this will dramatically increase the variance of the HT estimator.
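To make the exponential decay concrete, the sketch below assumes each vertex is treated independently with probability `p` and that full neighborhood exposure requires the vertex and all `degree` neighbors to share the same arm, giving p^(d+1) for treatment and (1-p)^(d+1) for control. The function name is hypothetical.

```python
def exposure_probabilities(degree, p=0.5):
    """Return (P[exposed to treatment], P[exposed to control]) for a vertex.

    Assumes independent vertex randomization and full neighborhood exposure:
    the vertex and all of its `degree` neighbors must land in the same arm.
    """
    return p ** (degree + 1), (1 - p) ** (degree + 1)

if __name__ == "__main__":
    for d in (1, 10, 100):
        pt, _ = exposure_probabilities(d)
        # The HT weight 1/pt grows exponentially with the degree.
        print(d, pt, 1 / pt)
```

Because the estimator divides by these probabilities, a single exposed high-degree vertex can dominate the estimate.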
Exposure Probabilities
For absolute and fractional neighborhood models we have the following probabilities.
Exposure Probabilities
• This model has an upper bound given by .
• This also gives an upper bound on the core exposure probabilities, given by the following proposition.
Estimator Variance
• Thus we achieve an O(1/n) bound on the variance, but only when the maximum degree is bounded.
• In general, the variance can grow exponentially with the degree.
• Hence the authors introduce a condition on the graph (restricted growth) under which a suitable clustering keeps the variance at O(1/n) while its dependence on the degrees is only linear rather than exponential.
Restricted-Growth Graph
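• Let Br(v) be the set of vertices within r hops of a vertex v.
• G is a restricted-growth graph if there is a constant κ such that $|B_{r+1}(v)| \le \kappa\,|B_r(v)|$ for all vertices v and all r > 0.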
Variance in Restricted-Growth Graph
• Consider single cycle (k=1) graph of n vertices with basic cluster size c=2
• For c = 2
• For c >= 2
Clustering Restricted-Growth Graph
• Use a 3-net for the shortest-path metric of graph G:
– Initially all vertices are unmarked.
– While there are unmarked vertices, in step j pick an arbitrary unmarked vertex v, set vj = v, and mark all vertices in B2(vj).
– Suppose k such vertices are defined, and let S = {v1, v2, …, vk}.
– For every vertex w of G, assign w to the closest vertex vi in S, breaking ties consistently.
– For every vj, let Cj be the set of all vertices assigned to vj.
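The clustering steps above can be sketched in Python as follows. This is an illustrative sketch, not the authors' code: it assumes an unweighted graph given as an adjacency dict `adj` with sortable vertex labels, picks the "arbitrary" unmarked vertex as the smallest unmarked label for reproducibility, and breaks ties by the order in which centers were chosen. All function names are hypothetical.

```python
from collections import deque

def ball(adj, source, radius):
    """Vertices within `radius` hops of `source`, found by bounded BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand past the radius
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return set(dist)

def three_net_clustering(adj):
    """Return dict center -> list of vertices assigned to that center."""
    marked = set()
    centers = []
    for v in sorted(adj):  # deterministic stand-in for "arbitrary"
        if v not in marked:
            centers.append(v)
            marked |= ball(adj, v, 2)  # mark B2(v)
    # Multi-source BFS: each vertex is claimed by a nearest center;
    # ties go to the center that reaches it first in queue order.
    owner = {c: c for c in centers}
    queue = deque(centers)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in owner:
                owner[w] = owner[u]
                queue.append(w)
    clusters = {c: [] for c in centers}
    for v, c in owner.items():
        clusters[c].append(v)
    return clusters
```

On a path with vertices 0 through 6 this picks centers 0, 3 and 6, which are pairwise at least 3 hops apart, and every vertex ends up within 2 hops of its center.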