Using Social Network Information In Survey Estimation

Thomas Süße and Raymond Chambers

National Institute for Applied Statistics Research Australia (NIASRA), University of Wollongong

2013 Graybill Conference, Fort Collins, Colorado

11 June 2013


Outline

1 Introduction

2 Social Networks

3 Linear Models that Use Social Network Data

4 Simulation Study

5 Conclusions


Introduction

- Population $U$ of size $N$
- Sample $s$ of size $n$; remainder of population $r := U \setminus s$ of size $N - n$
- Survey variable $Y$ with realisations $y_i$, $i \in U$
- Focus on estimating population total $t_y = \sum_{i \in U} y_i$
- Auxiliary variables $X_1, \ldots, X_p$
- Non-informative sampling method given population values of auxiliaries
- Model-based approach


A Place To Start

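Simple linear model for $Y$ in terms of $X_1, \ldots, X_p$

$$y_i = x_{1i}\beta_1 + \cdots + x_{pi}\beta_p + \varepsilon_i, \qquad \varepsilon_i \sim (0, \sigma^2)$$

In matrix terms

$$y_i = X_i^T \beta + \varepsilon_i \quad \text{or} \quad Y_U = X_U \beta + \varepsilon_U$$

Best linear unbiased predictor (BLUP) for population total $t_y$

$$\hat{t}_y = \sum_{i \in s} y_i + \sum_{i \in r} \hat{y}_i = 1_s^T Y_s + 1_r^T (X_r \hat{\beta})$$

$$\hat{\beta} = (X_s^T X_s)^{-1} X_s^T Y_s, \qquad Y_U = \begin{pmatrix} Y_s \\ Y_r \end{pmatrix}, \qquad X_U = \begin{bmatrix} X_s \\ X_r \end{bmatrix}$$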
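The predictor on this slide can be sketched numerically as follows. This is a minimal illustration, not the authors' code: the function name `blup_total_standard`, the demo population and the seed are all assumptions made for the example.

```python
import numpy as np

def blup_total_standard(X_s, y_s, X_r):
    """Sample sum plus predicted non-sample sum under y = X beta + e, e ~ (0, sigma^2)."""
    # Least-squares estimate beta_hat = (X_s' X_s)^{-1} X_s' y_s
    beta_hat, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
    t_hat = y_s.sum() + (X_r @ beta_hat).sum()
    return t_hat, beta_hat

# Illustrative population (all values below are assumptions for the demo)
rng = np.random.default_rng(0)
N, n = 1000, 100
x = rng.integers(1, 10, size=N).astype(float)
X_U = np.column_stack([np.ones(N), x])
y_U = X_U @ np.array([40.0, 5.0]) + rng.normal(size=N)

# Simple random sample without replacement; r is the complement of s
in_s = np.zeros(N, dtype=bool)
in_s[rng.choice(N, size=n, replace=False)] = True

t_hat, beta_hat = blup_total_standard(X_U[in_s], y_U[in_s], X_U[~in_s])
print(t_hat, y_U.sum())
```

Only the non-sampled values are predicted; the sampled values enter the total as observed.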


A More Complex Reality: Hierarchical Data

- Data available at enumeration district (ED) and ward level
- Individuals $i$ (level 1); EDs $j$ (level 2); wards $k$ (level 3)
- Multilevel model:

$$y_{ijk} = X_{ijk}^T \beta + u_k^{(3)} + u_{jk}^{(2)} + u_{ijk}^{(1)}$$

with

$$u_k^{(3)} \sim (0, \tau^{(3)}), \qquad u_{jk}^{(2)} \sim (0, \tau^{(2)}), \qquad u_{ijk}^{(1)} \sim (0, \tau^{(1)})$$


A Patterned Covariance Structure

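$$\mathrm{Var}(y_{ijk}) = \sigma^2 = \tau^{(3)} + \tau^{(2)} + \tau^{(1)}$$

$$\mathrm{Cov}(y_{ijk}, y_{lmn}) = \begin{cases} \tau^{(2)} + \tau^{(3)} & \text{different people, same ED} \\ \tau^{(3)} & \text{different EDs, same ward} \\ 0 & \text{different wards} \end{cases}$$

Linear model for population has the form

$$Y_U = X_U \beta + \varepsilon_U, \qquad \varepsilon_U \sim (0, \sigma^2 V_U)$$

where $V_U$ has a nested block diagonal structure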
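To make the nested block diagonal pattern concrete, the sketch below builds the covariance matrix of the responses from ED and ward labels. The function name and the label encoding are assumptions for illustration; the matrix returned is the full covariance, i.e. $\sigma^2 V_U$ in the notation above.

```python
import numpy as np

def nested_covariance(ed, ward, tau1, tau2, tau3):
    """Covariance of y under the three-level model, given one ED and ward label per person."""
    ed = np.asarray(ed)
    ward = np.asarray(ward)
    same_ward = ward[:, None] == ward[None, :]
    # EDs are nested within wards, so "same ED" also requires "same ward"
    same_ed = same_ward & (ed[:, None] == ed[None, :])
    # Ward effect is shared within wards, ED effect within EDs, individual effect on the diagonal
    return tau3 * same_ward + tau2 * same_ed + tau1 * np.eye(len(ed))

# Four people: two in ED 0, one in ED 1 (same ward), one in a different ward
V = nested_covariance(ed=[0, 0, 1, 2], ward=[0, 0, 0, 1], tau1=1.0, tau2=0.5, tau3=0.25)
print(V)
```

Sorting people by ward and then by ED makes the nested blocks visible when the matrix is printed.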


The General BLUP

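BLUP for dependent responses

$$\hat{t}_y = 1_s^T Y_s + 1_r^T \left\{ X_r \hat{\beta}_s + V_{rs} V_{ss}^{-1} (Y_s - X_s \hat{\beta}_s) \right\}$$

with best linear unbiased estimator (BLUE)

$$\hat{\beta}_s = (X_s^T V_{ss}^{-1} X_s)^{-1} X_s^T V_{ss}^{-1} Y_s$$

and

$$V_U = \begin{bmatrix} V_{ss} & V_{sr} \\ V_{rs} & V_{rr} \end{bmatrix}$$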
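A minimal numerical sketch of this general BLUP follows. The function name `blup_total_general` is an assumption; the covariance blocks are taken as known, whereas in practice they usually depend on parameters that must be estimated first.

```python
import numpy as np

def blup_total_general(X_s, y_s, X_r, V_ss, V_rs):
    """BLUP of the population total when responses are correlated."""
    # Generalised least squares (BLUE) for beta, using solves instead of explicit inverses
    Vinv_X = np.linalg.solve(V_ss, X_s)
    Vinv_y = np.linalg.solve(V_ss, y_s)
    beta_hat = np.linalg.solve(X_s.T @ Vinv_X, X_s.T @ Vinv_y)
    # Predict non-sample values: regression part plus the share of the
    # sample residuals that carries over through the covariance V_rs
    resid = y_s - X_s @ beta_hat
    y_r_hat = X_r @ beta_hat + V_rs @ np.linalg.solve(V_ss, resid)
    return y_s.sum() + y_r_hat.sum(), beta_hat

# Tiny example: intercept-only model, one non-sampled unit correlated with the first sampled unit
t_hat, beta_hat = blup_total_general(
    X_s=np.ones((2, 1)), y_s=np.array([0.0, 2.0]), X_r=np.ones((1, 1)),
    V_ss=np.eye(2), V_rs=np.array([[0.5, 0.0]]),
)
print(t_hat, beta_hat)
```

With $V_{ss} = I$ and $V_{rs} = 0$ the function reduces to the ordinary least-squares predictor of the standard model.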


Using Social Networks to Characterise Non-Hierarchical Dependence

- Widespread (Facebook, LinkedIn, Google, family, friends, colleagues, etc.)
- $N$ actors or nodes
- Simplest characterisation via adjacency matrix $Z_U = [Z_{ij}]_{i,j=1}^{N}$ with $Z_{ij} = 1$ if relationship ('edge') exists between $i$ and $j$; $Z_{ij} = 0$ otherwise
- $Z_U$ has zero main diagonal and is symmetric (undirected network) or asymmetric (directed network)
- Extensions exist for multiple types of relationships and count or continuous values for $Z_{ij}$, e.g. level/strength of communication between two nodes
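As a small illustration of this characterisation, the sketch below builds an adjacency matrix from an edge list. The helper name and the zero-based node labels are assumptions made for the example.

```python
import numpy as np

def adjacency_from_edges(N, edges, directed=False):
    """Return the N x N 0/1 adjacency matrix for a list of (i, j) pairs."""
    Z = np.zeros((N, N), dtype=int)
    for i, j in edges:
        if i == j:
            continue            # keep the main diagonal at zero
        Z[i, j] = 1
        if not directed:
            Z[j, i] = 1         # undirected networks are symmetric
    return Z

Z = adjacency_from_edges(4, [(0, 1), (1, 2), (2, 0)])
degrees = Z.sum(axis=1)         # number of edges attached to each node
print(Z)
print(degrees)
```

Node 3 has no edges here, so its row and column are zero.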


Example: Law Firm Collaborations

- Working relations among N = 36 partners in a law firm (Lazega, 2001)
- An edge exists between two partners if, and only if, both indicate that they collaborate with the other
- Undirected network
- Numbers of edges (row and column sums) associated with each of the N = 36 nodes range from 0 to 16, with an average of 6.4
- Node attributes (covariates collected on each partner) include seniority (rank number of entry into the firm), gender, office (three offices in different cities), and practice (litigation = 0, and corporate law = 1)


Example: Adjacency Matrix for Law Firm Collaborations

Rows and columns are partners 1 to 36; the first entry of each row is the row label.

   1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 0 1 0 0 0 1 0 1 1 0 1 0 0 0 0 0
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 1 1 1 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 1 1 1 0 0 0 0
7 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
12 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 1 1 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 1 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 1 0 1 0 1 0 0 1 0 0 1 0 0 1 1
16 0 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 1 0 1 0 0 1 0 1 0 1
17 1 1 0 1 0 0 0 0 0 0 1 1 0 1 0 1 0 0 1 0 0 1 0 1 1 1 0 1 1 0 0 0 0 1 0 0
18 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 0 1 0
19 0 0 0 1 0 0 0 0 0 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 1 1 0
20 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
21 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
22 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
23 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
24 0 0 0 0 1 1 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
25 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0
26 0 1 0 1 0 0 0 0 0 1 0 1 0 0 1 1 1 0 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0
27 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
28 0 0 1 1 1 1 0 0 0 0 0 0 0 1 0 0 1 1 1 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1 0
29 0 1 0 1 0 0 0 0 1 1 0 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
30 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0
31 0 0 0 1 1 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 1 0 0 0 1 0 1 0 1 1 0 1 0
32 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 1 0 0 0 1 0 1 0 0 1 0 1 0 1 0
33 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0
34 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
35 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 1 0 0 1 1 0 0 0 0
36 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0


Example: Graph of Law Firm Collaborations

[Figure: graph of the law firm collaboration network, with nodes labelled 1 to 36]


Modelling ZU : Exponential Random Graph Models

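- Widely used family of models for network data
- Probability distribution generated by an ERGM is

$$\Pr(Z_U = z) = \exp\left\{ \eta^T g(z) - \kappa(\eta) \right\} = \frac{\exp\left\{ \eta^T g(z) \right\}}{\sum_{\zeta \in \mathcal{Z}} \exp\left\{ \eta^T g(\zeta) \right\}}$$

- $\eta$: vector of model parameters
- $g(z)$: vector of network statistics
- $\kappa$ is the normalising constant

$$\kappa(\eta) = \log\left\{ \sum_{\zeta \in \mathcal{Z}} \exp\left( \eta^T g(\zeta) \right) \right\}$$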
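For a very small network the normalising constant can be computed by enumerating every possible graph, which shows what the formula means and why it does not scale. The sketch below is only an illustration: it assumes an undirected network and takes $g(z)$ to be the edge and triangle counts, and all function names are hypothetical.

```python
import itertools
import numpy as np

def network_stats(Z):
    """g(z) = (number of edges, number of triangles) for an undirected 0/1 adjacency matrix."""
    edges = np.triu(Z, 1).sum()
    triangles = np.trace(Z @ Z @ Z) // 6   # each triangle is counted 6 times in the trace
    return np.array([edges, triangles], dtype=float)

def all_undirected_graphs(N):
    """Yield every undirected graph on N nodes (2^(N(N-1)/2) of them)."""
    pairs = list(itertools.combinations(range(N), 2))
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        Z = np.zeros((N, N), dtype=int)
        for (i, j), b in zip(pairs, bits):
            Z[i, j] = Z[j, i] = b
        yield Z

def ergm_log_prob(Z, eta, N):
    """log Pr(Z_U = z) = eta' g(z) - kappa(eta), with kappa found by full enumeration."""
    scores = np.array([eta @ network_stats(G) for G in all_undirected_graphs(N)])
    kappa = np.log(np.exp(scores).sum())
    return eta @ network_stats(Z) - kappa

eta = np.array([-1.0, 0.5])
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(np.exp(ergm_log_prob(triangle, eta, N=3)))
```

The number of graphs doubles with every additional pair of nodes, so enumeration is only possible for a handful of nodes.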


Examples of Network Statistics

- Edges statistic [diagram: nodes A, B]
- Two-star statistic [diagram: nodes A, B, C]
- Edgewise shared-partner statistic [diagram: nodes A, B, C, D, E]
- Triangle statistic [diagram: nodes A, B, C]


The GWESP Statistic

- $EP_k(Z_U)$ is the number of edges ($Z_{ij} = 1$ with $i < j$) that share exactly $k$ neighbors in common
- $EP_0 + \cdots + EP_{N-2}$ = number of edges
- Geometrically weighted edgewise shared partner (GWESP) statistic defined as

$$\mathrm{GWESP}(Z_U, \theta) = \exp(\theta) \sum_{k=1}^{N-2} \left\{ 1 - (1 - \exp(-\theta))^k \right\} EP_k(Z_U)$$

- Geometrically weighted sum of $EP_k(Z_U)$ values, with parameter $\theta$ controlling distribution of weights
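The definition above can be evaluated directly for an undirected network, since the number of partners shared by $i$ and $j$ is the $(i, j)$ entry of $Z_U^2$. The function names below are assumptions for illustration; this is a sketch, not a replacement for the routines in dedicated ERGM software.

```python
import numpy as np

def esp_counts(Z):
    """Return [EP_0, ..., EP_{N-2}] for an undirected 0/1 adjacency matrix."""
    Z = np.asarray(Z, dtype=int)
    N = Z.shape[0]
    shared = Z @ Z                      # shared[i, j] = number of common neighbours of i and j
    upper = np.triu_indices(N, 1)       # each pair i < j once
    is_edge = Z[upper] == 1
    return np.bincount(shared[upper][is_edge], minlength=N - 1)

def gwesp(Z, theta):
    """Geometrically weighted edgewise shared partner statistic."""
    ep = esp_counts(Z)
    k = np.arange(len(ep))
    # The k = 0 weight is exactly zero, so including EP_0 matches the sum from k = 1
    weights = np.exp(theta) * (1.0 - (1.0 - np.exp(-theta)) ** k)
    return float(weights @ ep)

triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
print(esp_counts(triangle), gwesp(triangle, theta=1.0))
```

In a triangle every edge has exactly one shared partner, and the weight for $k = 1$ is $\exp(\theta)\exp(-\theta) = 1$, so the statistic equals the number of edges for any $\theta$.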


Fitting ERGMs

- ERGMs are difficult to fit because the normalising constant $\kappa$ cannot be calculated explicitly in any realistic application
- MCMC techniques are typically used to approximate the log-likelihood
- Geometrically weighted statistics (e.g. GWESP) generate MCMC samples that are degenerate less often


Three Questions

(a) Is embedding social network information into linear models useful for survey estimation based on these models?

(b) If the answer to (a) is yes, then

(b1) Which network-based linear models are potentially useful?

(b2) How much network data needs to be collected in order to obtain potentially higher precision for survey estimation?


Linear Models with Embedded Social Network Information

There are basically three types of linear models that use the information in the adjacency matrix $Z_U$ generated by a social network

1. Contextual Network models
2. Autocorrelation models
3. Network Disturbance models


Contextual Network (CN) Models

- Basic idea is to add one or more network-based contextual covariates to the model
- Motivation: Student academic performance (AP) as a function of socio-economic status (SES)
- Network: Student friendship network
- Model student's AP as a function of his/her SES and average SES of his/her friends (Friedkin, 1990)


Contextual Network (CN) Models

CN model can be written as

$$Y_U = X_U \beta + W_U T_U \gamma + \varepsilon_U$$

where

- $T_U$ is the population matrix of covariates measured on the network
- $W_U$ is a row-normalised version of $Z_U$, i.e. the rows of $W_U$ sum to one
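The contextual covariate $W_U T_U$ is simply the average of $T$ over each individual's network neighbours. A minimal sketch follows; the function names are hypothetical, and leaving the rows of isolated nodes at zero is an assumption, since the slides do not say how such nodes are handled.

```python
import numpy as np

def row_normalise(Z):
    """Divide each row of Z by its row sum so that rows sum to one."""
    Z = np.asarray(Z, dtype=float)
    deg = Z.sum(axis=1, keepdims=True)
    # Assumption: nodes with no neighbours keep an all-zero row
    return np.divide(Z, deg, out=np.zeros_like(Z), where=deg > 0)

def contextual_covariate(Z, T):
    """W_U T_U: average of T over the neighbours of each node."""
    return row_normalise(Z) @ T

Z = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]])
ses = np.array([1.0, 3.0, 5.0, 7.0])      # illustrative covariate values
print(contextual_covariate(Z, ses))
```

Node 0 is linked to nodes 1 and 2, so its contextual value is the mean of 3 and 5.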


Autocorrelation (AR) Models

- The matrix $T_U$ can be any set of measurements on the individuals in the network, and in particular it can be $Y_U$
- Autocorrelation (AR) models, also known as network effects models (Ord, 1975; Doreian et al., 1984; Duke, 1993; Leenders, 2002), are defined by

$$Y_U = X_U \beta + \lambda W_U Y_U + \varepsilon_U$$

where $\lambda \in (-1, +1)$

- The conditional (on $X_U$) mean and variance of $Y_U$ are $\mu = D_U^{-1} X_U \beta$ and $V_U = \sigma^2 (D_U^T D_U)^{-1}$, where $D_U = I_U - \lambda W_U$
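The stated mean and variance follow from solving the AR equation for $Y_U$. The steps below assume $\varepsilon_U \sim (0, \sigma^2 I_U)$, as in the standard model, and that $D_U$ is invertible, which holds for a row-normalised $W_U$ when $|\lambda| < 1$:

```latex
\begin{aligned}
Y_U &= X_U \beta + \lambda W_U Y_U + \varepsilon_U \\
(I_U - \lambda W_U)\, Y_U &= X_U \beta + \varepsilon_U \\
Y_U &= D_U^{-1} X_U \beta + D_U^{-1} \varepsilon_U \\
E(Y_U \mid X_U) &= D_U^{-1} X_U \beta \\
\mathrm{Var}(Y_U \mid X_U) &= D_U^{-1} (\sigma^2 I_U) D_U^{-T} = \sigma^2 (D_U^T D_U)^{-1}
\end{aligned}
```

The last equality uses $(D_U^T D_U)^{-1} = D_U^{-1} D_U^{-T}$.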


Network Disturbance (ND) Models

- The linear model errors are assumed to have an AR structure, i.e. $Y_U = X_U \beta + \varepsilon_U$, where $\varepsilon_U = \lambda W_U \varepsilon_U + v_U$ and $v_U \sim (0, \sigma^2 I_U)$
- Conditional on $X_U$, the mean and variance of $Y_U$ are $\mu = X_U \beta$ and $V_U = \sigma^2 (D_U^T D_U)^{-1}$ respectively. That is, the network induces correlation structure but does not affect mean structure (Ord, 1975; Leenders, 2002)
- AR and ND models are similar to conditional autoregressive (CAR) and simultaneous autoregressive (SAR) models commonly used for spatial data (Banerjee et al., 2004)


BLUP/EBLUP Specification

In order to calculate the BLUP, we generally need to specify

- A design matrix $H_U$ such that the conditional mean $\mu$ of $Y_U$ given $X_U$ satisfies $\mu = H_U \xi$
- A positive definite matrix $V_U$ proportional to the conditional variance of $Y_U$ given $X_U$

When these quantities themselves depend on unknown parameters, we first estimate these parameters from the sample data and then substitute in $H_U$ and $V_U$ before calculating the BLUP. This is the 'plug-in' version of the EBLUP


Model Specification

- Standard: $H_U = X_U$ and $V_U = \sigma^2 I_U$. The residual mean squared error is an unbiased estimator of $\sigma^2$
- CN: $H_U = [X_U, W_U T_U]$ and $V_U = \sigma^2 I_U$. Again, the residual mean squared error is an unbiased estimator of $\sigma^2$
- AR: $H_U = D_U^{-1} X_U$ and $V_U = \sigma^2 (D_U^T D_U)^{-1}$ with $D_U = I_U - \lambda W_U$. Estimates of $\sigma^2$ and $\lambda$ can be obtained by maximum likelihood (ML)
- ND: $H_U = X_U$ and $V_U = \sigma^2 (D_U^T D_U)^{-1}$. Both $\sigma^2$ and $\lambda$ can be estimated via ML
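The four specifications can be collected in one small helper that returns the design matrix and the variance pattern. This is a sketch under stated assumptions: the function name `specify` is hypothetical, $\lambda$ is treated as already estimated, and the returned variance matrix omits the factor $\sigma^2$.

```python
import numpy as np

def specify(model, X_U, W_U=None, T_U=None, lam=0.0):
    """Return (H_U, V_U / sigma^2) for model in {'standard', 'CN', 'AR', 'ND'}."""
    N = X_U.shape[0]
    I = np.eye(N)
    if model == "standard":
        return X_U, I
    if model == "CN":
        # Contextual covariates are appended to the design matrix
        return np.column_stack([X_U, W_U @ T_U]), I
    D = I - lam * W_U
    V = np.linalg.inv(D.T @ D)
    if model == "AR":
        # The network enters both the mean and the variance
        return np.linalg.solve(D, X_U), V
    if model == "ND":
        # The network enters the variance only
        return X_U, V
    raise ValueError(f"unknown model: {model}")

Z = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
W = Z / Z.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(3), [1.0, 2.0, 3.0]])
H_ar, V_ar = specify("AR", X, W, lam=0.5)
print(H_ar)
```

The pair returned here is what a general BLUP routine needs, once the rows and columns are split into sampled and non-sampled parts.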


Imputation of Missing Network Information

- Calculation of the EBLUPs defined by the CN, AR and ND models assumes that the population network $Z_U$ is known
- In practice this is extremely unlikely, and it is more realistic to consider situations where $Z_U$ is partially known
  - SS: We only know $Z_{ss}$, i.e. the sub-network of relationships between the $n$ sampled individuals in $s$
  - SS+SR: We also know the links between the sampled individuals and the remaining $N - n$ non-sampled individuals in the population, i.e. we know $Z_{sr}$. Note that for an undirected network this means that we know $Z_{rs}$ as well
- We use model-based imputation to 'fill in' the rest of $Z_U$ in either case
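The two observation scenarios can be represented by a Boolean mask over the entries of $Z_U$. The sketch below assumes an undirected network and treats the main diagonal as always known (it is zero by definition); the function name is hypothetical.

```python
import numpy as np

def observed_mask(N, s, scenario):
    """Boolean N x N mask of the entries of Z_U that are observed."""
    in_s = np.zeros(N, dtype=bool)
    in_s[s] = True
    mask = np.outer(in_s, in_s)                    # Z_ss: both ends sampled
    if scenario == "SS+SR":
        mask |= in_s[:, None] | in_s[None, :]      # adds Z_sr and, by symmetry, Z_rs
    elif scenario != "SS":
        raise ValueError(f"unknown scenario: {scenario}")
    np.fill_diagonal(mask, True)                   # diagonal entries are known zeros
    return mask

m_ss = observed_mask(4, [0, 1], "SS")
m_sr = observed_mask(4, [0, 1], "SS+SR")
print(m_ss.sum(), m_sr.sum())
```

Under SS+SR only the links among non-sampled individuals, the block $Z_{rr}$, remain to be imputed.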


Optimum Imputation

- An optimal model-based approach is to assume that $Z_U$ can be adequately modelled via an ERGM and to use the minimum mean squared error predictor $E(Z_U^{mis} \mid Z_U^{obs} = z^{obs})$
- In this case the conditional distribution of $Z_U^{mis}$ is defined by

$$\Pr(Z_U^{mis} = z^{mis} \mid Z_U^{obs} = z^{obs}) = \frac{\exp\left\{ \eta^T g(z^{mis}, z^{obs}; \theta) \right\}}{\sum_{\zeta^{mis} \in \mathcal{Z}^{mis}} \exp\left\{ \eta^T g(\zeta^{mis}, z^{obs}; \theta) \right\}}$$

where $\mathcal{Z}^{mis}$ is the sample space of $Z_U^{mis}$

- In theory, MCMC techniques can be used to sample from this conditional distribution, with $\eta$ and $\theta$ replaced by estimates based on the observed network. However, this is impractical at present


Practical Imputation: Method 1

- Suppose that, given $z^{obs}$, $Z_{ij}^{mis}$ and $Z_{kl}^{mis}$ are conditionally independent for any two distinct pairs $ij$ and $kl$, i.e.

$$\Pr(Z^{mis} = z^{mis} \mid Z^{obs} = z^{obs}) = \prod_{ij} \Pr(Z_{ij}^{mis} \mid Z^{obs} = z^{obs})$$

- This leads to

$$\frac{\Pr(Z_{ij}^{mis} = 1 \mid Z^{obs} = z^{obs})}{\Pr(Z_{ij}^{mis} = 0 \mid Z^{obs} = z^{obs})} = \exp(\eta^T \Delta g_{ij}^{mis})$$

where $\Delta g_{ij}^{mis}$ is the change statistic, i.e. the difference in $g$ between $(z_{ij}^{mis}, z^{obs}) = (1, z^{obs})$ and $(z_{ij}^{mis}, z^{obs}) = (0, z^{obs})$


Practical Imputation: Method 1

- Re-arranging this equation gives the minimum mean squared error predictor (MMSEP) under conditional independence,

$$E(Z_{ij} \mid Z^{obs} = z^{obs}) = \Pr(Z_{ij} = 1 \mid Z^{obs} = z^{obs}) = \mathrm{expit}(\eta^T \Delta g_{ij}^{mis})$$

with $\mathrm{expit}(x) = \exp(x)/(1 + \exp(x))$

- It is only necessary to compute $\Delta g_{ij}^{mis}$ in order to obtain this MMSEP for any distinct pair $ij \in mis$
- Since the conditional independence assumption is generally unwarranted, this approach can only be considered as defining an approximation to $\Pr(Z^{mis} \mid Z^{obs} = z^{obs})$
- However, it is computationally feasible for realistic sample and population sizes
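Method 1 can be sketched as follows. Several simplifications here are assumptions, not part of the slides: $g$ is taken to be the edge and triangle counts rather than GWESP, the other unobserved pairs are set to zero when the change statistic is evaluated, and $\eta$ is treated as already estimated. The function names are hypothetical.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def stats(Z):
    """g(z) = (edges, triangles) for an undirected 0/1 adjacency matrix."""
    edges = np.triu(Z, 1).sum()
    triangles = np.trace(Z @ Z @ Z) / 6
    return np.array([edges, triangles], dtype=float)

def impute_method1(Z_obs, mask, eta):
    """Replace each unobserved pair by expit(eta' * change statistic)."""
    N = Z_obs.shape[0]
    base = np.where(mask, Z_obs, 0)      # assumption: unobserved entries count as 0 in g
    Z_imp = base.astype(float)
    for i in range(N):
        for j in range(i + 1, N):
            if mask[i, j]:
                continue                 # observed pairs are kept as they are
            Z1 = base.copy()
            Z1[i, j] = Z1[j, i] = 1
            Z0 = base.copy()
            Z0[i, j] = Z0[j, i] = 0
            delta = stats(Z1) - stats(Z0)          # change statistic for pair ij
            Z_imp[i, j] = Z_imp[j, i] = expit(eta @ delta)
    return Z_imp

# Pair (1, 2) is unobserved; adding it would create one edge and close one triangle
Z_obs = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
mask = np.ones((3, 3), dtype=bool)
mask[1, 2] = mask[2, 1] = False
Z_imp = impute_method1(Z_obs, mask, eta=np.array([-1.0, 2.0]))
print(Z_imp)
```

The imputed entries are probabilities rather than 0/1 values, in line with using the conditional expectation as the predictor.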


Practical Imputation: Method 2

- A very simple approach is to calculate the proportion of $Z_{ij} = 1$ in $z^{obs}$ and use this proportion (the network density) to impute $Z^{mis}$
- This corresponds to imputing on the basis of an ERGM model defined by just the EDGES statistic, i.e. the number of edges in the network
  - Equivalent to assuming that each $Z_{ij}$ in the network matrix $Z_U$ is an independent Bernoulli variable with a common probability of a 'success'
- If the network model also contains exogenous effects, then this simple approach corresponds to imputation on the basis of the logistic regression model defined by these effects
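Method 2 without exogenous effects reduces to a few lines. In this sketch the function name is hypothetical, the mask convention (True where an entry of $Z_U$ is observed) is an assumption, and diagonal entries are excluded from the density because they are zero by definition.

```python
import numpy as np

def impute_density(Z_obs, mask):
    """Fill unobserved entries of Z with the density of the observed off-diagonal entries."""
    N = Z_obs.shape[0]
    off_diag = ~np.eye(N, dtype=bool)
    observed = mask & off_diag
    density = Z_obs[observed].mean()             # proportion of observed pairs with an edge
    Z_imp = np.where(mask, Z_obs, density).astype(float)
    np.fill_diagonal(Z_imp, 0.0)
    return Z_imp, density

# Nodes 0, 1, 2 are sampled (SS case); one of the three observed pairs has an edge
Z_obs = np.zeros((4, 4), dtype=int)
Z_obs[0, 1] = Z_obs[1, 0] = 1
mask = np.zeros((4, 4), dtype=bool)
mask[:3, :3] = True
Z_imp, density = impute_density(Z_obs, mask)
print(density)
print(Z_imp)
```

The same value would come out of Method 1 with an edges-only model, because the change statistic for the edge count is always 1.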


Simulation Study - Model Specification

- Standard: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, $\varepsilon_i \sim N(0, 1)$; $\beta_0 = 40$, $\beta_1 = 5$ and $X_i$ is drawn randomly from $1, \ldots, 9$
- CN: $Y_i = \beta_0 + X_i \beta_1 + U_i \gamma + \varepsilon_i$, $\varepsilon_i \sim N(0, 1)$; $\gamma = 2$ and $U_i$ is the contextual variable defined by the average value of $X$ for all individuals in the network that are linked to individual $i$
- AR: $Y_i = \beta_0 + X_i \beta_1 + U_i \lambda + \varepsilon_i$, $\varepsilon_i \sim N(0, 1)$; $\lambda = 0.5$ and $U_i$ is the average value of $Y$ for all individuals in the network that are linked to individual $i$
- ND: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, with $\varepsilon_i = U_i \lambda + v_i$, $v_i \sim N(0, 1)$ and $U_i$ is the average value of $\varepsilon$ for all individuals in the network that are linked to individual $i$
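The four data-generating models can be simulated with one function once a row-normalised $W_U$ is available. This is a sketch, not the authors' simulation code: the function name is hypothetical, and the AR and ND cases are generated by solving the implied linear systems, which is one way of satisfying the defining equations.

```python
import numpy as np

def simulate_population(model, W, rng, beta0=40.0, beta1=5.0, gamma=2.0, lam=0.5):
    """Draw (x, y) for one population under the 'standard', 'CN', 'AR' or 'ND' model."""
    N = W.shape[0]
    x = rng.integers(1, 10, size=N).astype(float)    # X_i drawn from 1, ..., 9
    eps = rng.normal(size=N)                          # N(0, 1) disturbances
    lin = beta0 + beta1 * x
    D = np.eye(N) - lam * W
    if model == "standard":
        y = lin + eps
    elif model == "CN":
        y = lin + gamma * (W @ x) + eps               # U_i = neighbours' average X
    elif model == "AR":
        y = np.linalg.solve(D, lin + eps)             # solves y = lin + lam * W y + eps
    elif model == "ND":
        y = lin + np.linalg.solve(D, eps)             # errors solve e = lam * W e + v
    else:
        raise ValueError(f"unknown model: {model}")
    return x, y

# Small illustrative network: 5 groups of 4 fully connected members
Z = np.kron(np.eye(5), np.ones((4, 4)) - np.eye(4))
W = Z / Z.sum(axis=1, keepdims=True)
x, y_ar = simulate_population("AR", W, np.random.default_rng(3))
print(y_ar[:4])
```

Passing a generator with the same seed to each call reuses the same `x` and disturbances, which makes the four models directly comparable.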


Simulation Study - Network Specification

Two types of networks were simulated

- An ERGM network. Here $Z_U$ was generated as a random draw from an ERGM with a density of about 15 network links for each subject (EDGES statistic equal to −4.18 on the logit scale) and a weight parameter of $\theta = 1$ for the GWESP statistic
- A Gang network, where $Z_U$ defined a network of 100 'gangs', each of size 10. In this network each gang member only knows every other member of his/her gang, so $Z$, after re-ordering rows, is block diagonal. This is analogous to the network defined by members of the same household.
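The Gang network is easy to construct explicitly, and the quoted EDGES value can be checked with simple arithmetic. The sketch below uses hypothetical names, assumes N = 1,000 for the ERGM network (so that each subject has 999 potential partners), and ignores the GWESP term when converting the logit value into an expected number of links.

```python
import numpy as np

def gang_network(n_gangs=100, gang_size=10):
    """Block diagonal adjacency matrix: everyone is linked to all others in the same gang."""
    block = np.ones((gang_size, gang_size), dtype=int) - np.eye(gang_size, dtype=int)
    return np.kron(np.eye(n_gangs, dtype=int), block)

Z = gang_network()
print(Z.shape, Z.sum(axis=1)[:3])

# Edge probability implied by -4.18 on the logit scale, and the implied mean number of links
p_edge = 1.0 / (1.0 + np.exp(4.18))
mean_degree = (1000 - 1) * p_edge
print(round(p_edge, 4), round(mean_degree, 1))
```

Every member of a gang has the same 9 neighbours, whereas the ERGM network has roughly 15 links per subject spread across the population.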


Simulation Study - Characteristics & Notation

- Population of size N = 1,000 was independently simulated 2,000 times
- Independent simple random samples of size n = 100 and n = 200 were selected without replacement from each simulated population
- SS denotes the case where only $Z_{ss}$ is observed
- SS+SR/1 denotes the case where $Z_{ss}$ and $Z_{sr}$ are observed and imputation method 1 is used
- SS+SR/2 denotes the case where $Z_{ss}$ and $Z_{sr}$ are observed and imputation method 2 is used
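The structure of such a Monte Carlo study can be sketched for the simplest case, the standard model with no network. This is an illustration only: the function name, the seed and the reduced number of replications (200 instead of 2,000) are assumptions, and the network-based predictors are not reproduced here.

```python
import numpy as np

def monte_carlo_bias_mse(n_rep=200, N=1000, n=100, seed=1):
    """Monte Carlo bias and MSE of the standard-model predictor of the population total."""
    rng = np.random.default_rng(seed)
    errors = np.empty(n_rep)
    for b in range(n_rep):
        # Simulate a fresh population under the standard model
        x = rng.integers(1, 10, size=N).astype(float)
        X = np.column_stack([np.ones(N), x])
        y = 40.0 + 5.0 * x + rng.normal(size=N)
        # Draw a simple random sample without replacement
        in_s = np.zeros(N, dtype=bool)
        in_s[rng.choice(N, size=n, replace=False)] = True
        # Predict the total and record the prediction error
        beta_hat, *_ = np.linalg.lstsq(X[in_s], y[in_s], rcond=None)
        t_hat = y[in_s].sum() + (X[~in_s] @ beta_hat).sum()
        errors[b] = t_hat - y.sum()
    return errors.mean(), (errors ** 2).mean()

bias, mse = monte_carlo_bias_mse()
print(bias, mse)
```

Bias is the average prediction error and MSE the average squared prediction error over the simulated populations, which are the quantities reported in the tables.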


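Table: Monte Carlo Bias of EBLUP (average population total ≈ 65K); n = 100. Columns give the true model under each network type.

| Prediction based on | Network data | ERGM: CN | ERGM: AR | ERGM: ND | Gang: CN | Gang: AR | Gang: ND |
|---|---|---|---|---|---|---|---|
| BLUP | | 1.81 | 2.19 | 2.27 | 2.14 | −2.11 | −1.93 |
| full network known | | 1.81 | 2.50 | 2.14 | 2.14 | −1.84 | −1.83 |
| CN | SS | 1.96 | 3.06 | 2.22 | 2.90 | 0.93 | −1.12 |
| CN | SS+SR/1 | 0.92 | 2.11 | 2.07 | – | – | – |
| CN | SS+SR/2 | 1.05 | 2.37 | 2.12 | 2.10 | −1.24 | −1.17 |
| AR | SS | 0.51 | 2.29 | 2.08 | 2.34 | −1.57 | −0.84 |
| AR | SS+SR/1 | 1.94 | 2.71 | 2.08 | – | – | – |
| AR | SS+SR/2 | 1.26 | 1.27 | 2.05 | 2.24 | −3.47 | −1.15 |
| ND | SS | 0.78 | 1.48 | 2.20 | 2.39 | −1.54 | −1.63 |
| ND | SS+SR/1 | 0.80 | 1.79 | 2.23 | – | – | – |
| ND | SS+SR/2 | 0.85 | 1.49 | 2.13 | 3.09 | 14.1 | −1.83 |
| standard model | | 0.78 | 1.59 | 2.13 | 2.98 | 1.21 | −1.06 |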


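Table: Monte Carlo MSE of EBLUP relative to MSE of BLUP; n = 100. Columns give the true model under each network type.

| Prediction based on | Network data | ERGM: CN | ERGM: AR | ERGM: ND | Gang: CN | Gang: AR | Gang: ND |
|---|---|---|---|---|---|---|---|
| BLUP - actual MSE | | 9,390 | 8,739 | 8,736 | 9,421 | 12,330 | 12,315 |
| full network known | | 1.00 | 1.10 | 1.02 | 1.00 | 1.08 | 1.01 |
| CN | SS | 2.28 | 3.38 | 1.00 | 2.66 | 9.57 | 1.05 |
| CN | SS+SR/1 | 1.19 | 1.39 | 1.00 | – | – | – |
| CN | SS+SR/2 | 1.14 | 1.31 | 1.00 | 1.01 | 1.07 | 1.06 |
| AR | SS | 2.30 | 3.34 | 1.01 | 2.57 | 8.67 | 1.05 |
| AR | SS+SR/1 | 1.45 | 1.42 | 1.00 | – | – | – |
| AR | SS+SR/2 | 1.31 | 1.30 | 1.00 | 1.24 | 1.10 | 1.06 |
| ND | SS | 2.30 | 3.52 | 1.02 | 2.54 | 8.79 | 1.01 |
| ND | SS+SR/1 | 2.30 | 3.48 | 1.02 | – | – | – |
| ND | SS+SR/2 | 2.30 | 3.49 | 1.02 | 3.14 | 11.2 | 1.01 |
| standard model | | 2.30 | 3.50 | 1.00 | 2.87 | 10.5 | 1.04 |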


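Table: Monte Carlo average length of nominal 95% Gaussian confidence interval generated by EBLUP relative to that generated by BLUP; n = 100; corresponding coverage (%) shown in parentheses. Columns give the true model under each network type.

| Prediction based on | Network data | ERGM: CN | ERGM: AR | ERGM: ND | Gang: CN | Gang: AR | Gang: ND |
|---|---|---|---|---|---|---|---|
| BLUP - average length | | 370 (94.5) | 382 (96.2) | 382 (96.4) | 370 (94.0) | 418 (93.9) | 418 (93.4) |
| full network known | | 1.00 (94.4) | 0.99 (95.0) | 0.98 (95.3) | 1.00 (94.0) | 1.00 (92.8) | 1.27 (94.1) |
| CN | SS | 1.54 (93.5) | 1.79 (95.4) | 0.99 (95.5) | 1.65 (94.4) | 3.07 (91.8) | 1.00 (93.2) |
| CN | SS+SR/1 | 1.00 (92.3) | 1.01 (92.2) | 0.99 (95.6) | – | – | – |
| CN | SS+SR/2 | 1.00 (92.9) | 1.01 (92.7) | 0.99 (95.8) | 1.00 (93.7) | 1.01 (93.0) | 1.01 (93.7) |
| AR | SS | 1.54 (94.1) | 1.77 (95.3) | 0.99 (95.4) | 1.62 (94.2) | 2.89 (92.4) | 1.00 (93.4) |
| AR | SS+SR/1 | 1.10 (92.2) | 1.01 (92.2) | 0.99 (95.5) | – | – | – |
| AR | SS+SR/2 | 1.10 (93.1) | 1.01 (92.8) | 0.99 (95.6) | 1.06 (93.7) | 0.99 (92.4) | 1.00 (93.4) |
| ND | SS | 1.59 (94.8) | 1.83 (95.7) | 0.98 (95.1) | 1.59 (93.5) | 2.93 (92.3) | 0.98 (93.2) |
| ND | SS+SR/1 | 1.59 (94.8) | 1.77 (94.4) | 0.97 (94.4) | – | – | – |
| ND | SS+SR/2 | 1.60 (94.8) | 1.80 (95.2) | 0.98 (94.9) | 2.01 (90.1) | 2.88 (89.5) | 1.27 (94.1) |
| standard model | | 1.59 (94.8) | 1.84 (96.0) | 0.99 (95.7) | 1.72 (93.8) | 3.23 (92.8) | 1.01 (93.6) |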


Tentative Answers to Three Questions

(a) Is embedding social network information into linear models useful for survey estimation based on these models?

- Yes

(b1) Which network-based linear models are potentially useful?

- CN and AR models are useful when either model is true, since in both cases the mean of the response depends on the network
- Ignoring the network does not result in a significant loss of efficiency when the ND model is true

(b2) How much network data needs to be collected in order to obtain potentially higher precision for survey estimation?

- Both $Z_{ss}$ and $Z_{sr}$ must be available in order to obtain efficiency gains. Knowledge of $Z_{ss}$ alone is not enough


A Recommendation & A Caution

- The AR model can be difficult to fit, see Suesse (2012), so we recommend that the CN model be used if it is a reasonable fit to the data and relevant population level auxiliary network data are available. Otherwise ignoring the network might be the best option
- Note that we have assumed that the method of sampling is independent of the network structure given the available population auxiliary information
  - There are important applications, see Thompson and Seber (1996), where inclusion in sample depends on being linked to another sampled individual via a network
  - In these cases we cannot treat the observed network structure in $Z_{ss}$ and $Z_{sr}$ as ancillary (as we have here), and this 'informative' method of sampling needs to be taken into account when we impute the unknown components of $Z_U$


References

Banerjee, S., Carlin, B. P. and Gelfand, A. E. (2004) Hierarchical modelling and analysis for spatial data. Boca Raton, Fla.: Chapman & Hall/CRC Press.

Doreian, P., Teuter, K. and Wang, C. H. (1984) Network auto-correlation models - some Monte-Carlo results. Sociological Methods & Research 13, 155–200.

Duke, J. B. (1993) Estimation of the network effects model in a large data set. Sociological Methods & Research 21, 465–481.

Friedkin, N. E. (1990) Social networks in structural equation models. Social Psychology Quarterly 53, 316–328.

Lazega, E. (2001) The Collegial Phenomenon: The Social Mechanism of Cooperation Among Peers in a Corporate Law Partnership. Oxford: Oxford University Press.

Leenders, R. (2002) Modeling social influence through network autocorrelation: constructing the weight matrix. Social Networks 24, 21–47.

Ord, K. (1975) Estimation methods for models of spatial interaction. Journal of the American Statistical Association 70, 120–126.

Suesse, T. (2012) Estimation in autoregressive population models. In Proceedings of Fifth Annual ASEARC Research Conference. University of Wollongong: ASEARC. 2-3 February 2012.

Thompson, S. K. and Seber, G. A. F. (1996) Adaptive sampling. Wiley series in probability and mathematical statistics. New York: Wiley.
