DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR...

DAY 2:

STATISTICAL METHODS FOR NETWORK ANALYSIS

Martina Morris, Ph.D.Steven M. Goodreau, Ph.D.Samuel M. Jenness, Ph.D.

Supported by the US National Institutes of Health

NME Workshop 1

Today we will cover

NME Workshop 2

Three classes of statistical models for networks

Simple null models

Generative models for static networks

Generative models for dynamic networks

Ending with an example How we can use these models to answer questions about

epidemic dynamics and interventions

morning

afternoon

Basic null hypothesis statistical tests Does your network differ from a simple random graph?

Simple null models to test against: CUG and BRG

ERGMs : generative models to test for multiple structural properties simultaneously in static networks Components of an ERG model

Interpretation of coefficients

Estimation algorithms (and when they fail)

Model Diagnostics (estimation, and goodness of fit)

Simulation from fitted models

Network data Types

Principles for estimation

Outline for the morning session

NME Workshop 3

This is what we’ll use ERGMs for in Epi modeling

Note

NME Workshop 4

We’ll cover a lot of ground here some of the material and vocabulary may be unfamiliar

Don’t worry if you don’t understand everything Focus on getting the big picture, not the details

EpiModel puts a lot of this behind the curtain So you don’t have to deal with it, for the most part

The details do matter when you have a problematic model though

And – don’t be afraid to ask questions

Getting started

NME Workshop 5

Recall, the two ways to access statnetWeb https://statnet.shinyapps.io/statnetWeb/ library(statnetWeb); run_sw();

Open statnetWeb and load the faux.mesa.high network

https://statnet.shinyapps.io/statnetWeb/

How do you know if your network is significantly different than a simple random graph?

Statistical Testing: Basics6

NME Workshop

Description vs. Inference in statistics

NME Workshop 7

So far we have been using descriptive statistics to explore our network data Density

Degree and geodesic distributions

Mixing matrices

Component size distributions

Next, we might want to compare these statistics to what we would expect by chance What do we mean “by chance”?

Is there a natural “null hypothesis test” in this context?

Recap

NME Workshop 8

Does the structure of our social network differ from a simple random graph?

What are some structural differences you can see?

faux.mesa.high network Simple random graph with the same tie probability

Consider triangles

Suppose kids have a tendency to become friends with their friends’ friends

And this is the only generative process occurring.

Presumably, this would mean that you would observe more triangles than expected by chance in the graph.

How would you test this for a specific network?

NME Workshop 9

A basic statistical test for triangles

Begin by counting the # triangles in your network

Say this is “T”, your test statistic

Then determine the probability of observing T or more triangles in this network …

And see if it is less than 5%

NME Workshop 10

But … how do you determine that probability?

For that you need a null hypothesis of some sort

What is the natural null hypothesis?

NME Workshop 11

It turns out there’s more than one …

But they all get used the same way when constructing a statistical test.

To create a sampling distribution consistent with the null

And compare your observed value to that distribution

Null probability distribution (1)

Unconditional: For a network this size (size = # nodes)

enumerate all possible networks for a fixed number of nodes,

count the number of triangles in each network, and

construct the frequency distribution of these counts.

Where does the number of triangles in your network lie in this distribution?

Top 5%?

Bottom 5%

Near the middle?

NME Workshop 12


For example: Take a network with 3 nodes

How many dyads are there?

32=

3!

2! 3−2 != 3

How many different networks on these dyads? Every dyad has 2 possible values, and there are 3 dyads

So the number of possible networks is: 23 = 8

What is the distribution of triangle counts? 7 networks have 0 triangles

1 network has 1 triangle

So if your network has 1 triangle, what do you think?

NME Workshop 13

0

2

4

6

8

0 1

Triangle distribution for 3 node networks


One problem with the unconditional null distribution

enumerate all possible networks for a fixed number of nodes

this is not so easy in practice

for 4 nodes: # of dyads is 4*3/2 = 6

# of possible networks = 26 = 64


# of possible networks = 245 ≈ 35 trillion


# of possible networks = 2190 ≈ 1057

We can solve this problem by sampling from the space of networks.

NME Workshop 14


More important question for the unconditional null distribution

Do you really care about comparing your network to networks with zero ties?

Or all possible ties?

Or does it make more sense to compare your network to other networks with the same number of ties?

Controlling for density, does your network have more or less triangles than expected?

NME Workshop 15


Condition on the density, the number of nodes and links

This is the Conditional Uniform Graph test (CUG)

enumerate all possible networks for a fixed number of nodes and links,

count the number of triangles in each network,

construct the frequency distribution of the counts

compare the value in your network

This also reduces the sample space

but it’s still a lot of graphs… 𝑛2𝑒= 𝑛2! /𝑒! ( 𝑛

2− 𝑒)!

so we will still need to sample from this space in practice

NME Workshop 16

The CUG is implemented as a permutation test

NME Workshop 17

Since full enumeration is typically not possible

We sample the enumeration space by permutation

Randomly choose a tied dyad, and a dyad without a tie

Permute the tie and the non-tie This preserves the exact density in the network

Count the number of triangles in the new network

Repeat until you have the desired sample size

Permutation tests are often used in statistics When the distribution of a sample statistic is not known


Condition on the probability of a tie

This is the Bernoulli Random Graph test (BRG)

Similar to the CUG, but treats density as a random variable

Implemented via Markov Chain Monte-Carlo (MCMC)

Randomly choose a dyad

Flip a coin with probability(tie) = density of the network This will not preserve the exact density for each network, but will preserve it on average

Repeat many times, then count the number of triangles in the final network

Repeat until you have a sample of the desired size

NME Workshop 18

Null models in statnetWeb

NME Workshop

Select a summary measure for the observed data

Compare it to the distribution simulated from a null model

In statnetWeb:

We can plot null distribution overlays on degree and geodesic distributions

And plot the CUG and BRG distributions for selected network summary measures

19

In statnetWeb: Degree distribution

Compare the degree distribution in faux.mesa.high to what we would expect by chance

20NME Workshop

Network Descriptives

Degree Distribution

Select CUG and BRG

null models

What do you see now?

Overlays the mean and 95% confidence intervals from 100 simulations

Test for the number of isolates

Compare the number of isolates in faux.mesa.high to what we would expect by chance

21NME Workshop

Network Descriptives

MoreNull model

tests

How likely is the number of isolates we observed in our network, under the null model?

CUG test for triangles

NME Workshop 22

Are there more triangles in the observed network?

Choose the triangle term from the dropdown menu and run 100 simulations to see how our network compares to the two null models

“CUG” and “BRG”

Indeed…

NME Workshop 23

Yes the observed triangle count is high

NME Workshop 24

But why?

… a simple null hypothesis test doesn’t provide any insight about that.

Limitations of simple null hypotheses

If we are only interested in whether the triangle counts are different than expected given the density of the graph

One can use these simple null hypothesis tests

But if we want to understand the underlying generative process, quantify the impact of each process on our network, and control for other network features …

This requires a more general statistical modeling framework

NME Workshop 25

Can you control for more than just density?

What if you want to test more than one network feature?

And you want a model grounded in generative social theory?

… That’s when you turn to ERGMs

Statistical Testing: Beyond Basics26

NME Workshop

Motivation

NME Workshop 27

Why are there so many more triangles?

What do you see when color-coding the nodes by their attributes?

faux.mesa.high network Simple random graph with the same tie probability

(At least) two theories about the process that generates triangles:

1. Homophily: People tend to chose friends who are like them, in terms of

grade, race, etc. (“birds of a feather”), triad closure is a by-product

2. Transitivity: People who have friends in common tend to become friends

(“friend of a friend”), triad closure is the key process

So, for three actors in

the same grade

A cycle-closing tie may

form due to transitivityBut it may be due

instead to homophily

Friend of a friend, or birds of a feather?

NME Workshop 28

But not completely. Any tie may be classified by whether it is:

The cells represent how the processes jointly influence that tie, so the distribution of ties in this table is informative.

This suggests we should be able to disentangle the two processes statistically

Transitivity and homophily are confounded

NME Workshop 29

Within Grade:

Triangle forming:

Yes No

Yes Both Homophily

No Transitivity Neither

partially

ERGMs: Basic idea

NME Workshop 30

We want to model the probability of a tie as a function of:

Nodal attributes (that influence degree and mixing)

The propensity for certain “configurations” (like triangles)

The dyads may be dependent

Nodal attribute effects do not induce dyad dependence

But triad closure does

So we model the joint distribution directly

where: g(y) = vector of network statistics

= vector of model parameters

k( ) = numerator summed over all possible networks on node set y

Probability of observing a graph (set of relationships) y on a fixed set of nodes:

• Exponential family model• Well understood statistical properties (≠ well understood models)• Very general and flexible

Exponential Random Graph Model (ERGM)

NME Workshop 31

𝑃 𝑌 = 𝑦 ) =exp(𝜽′𝒈 𝒚 )

𝑘( )

Probability of observing a graph (set of relationships) y on a fixed set of nodes:

If you’re not familiar with this kind of compact vector notation, the numerator is just:

exp(𝜽1𝑥1 + 𝜽2𝑥2 +⋯+ 𝜽𝑝𝑥𝑝)

Kind of like a linear model, but a bit different (watch out for this later)

Exponential Random Graph Model (ERGM)

NME Workshop 32

𝑃 𝑌 = 𝑦 ) =exp(𝜽′𝒈 𝒚 )

𝑘( )

where 𝝏 𝒈 𝒚 represents the change in 𝒈 𝒚 when Yij is toggled between 0 and 1

The conditional odds of a tie

NME Workshop 33

can be re-expressed as𝑃 𝑌 = 𝑦 ) =exp(𝜽′𝒈 𝒚 )

𝑘( )

𝑙𝑜𝑔𝑖𝑡 𝑃 𝑌𝑖𝑗 = 1 rest of the graph ) = 𝑙𝑜𝑔𝑃 𝑌𝑖𝑗 = 1 rest of the graph )

𝑃 𝑌𝑖𝑗 = 0 rest of the graph )

= 𝜽′𝝏 𝒈 𝒚

The probability of the graph

The conditional log odds of a specific tie

This is an auto logistic regression (auto because of the possible dependence)

ERGM specification: 𝜽′𝒈 𝒚

NME Workshop 34

The 𝒈 𝒚 terms in the model are summary “network statistics”

Counts of network configurations, for example:1. Edges: 𝑦𝑖𝑗

2. Within-group ties: 𝑦𝑖𝑗𝐼(𝑖 ∈ 𝐶, 𝑗 ∈ 𝐶)

3. 2-stars: 𝑦𝑖𝑗𝑦𝑖𝑘

4. 3-cycles: 𝑦𝑖𝑗𝑦𝑖𝑘𝑦𝑗𝑘

A key distinction in the types of terms: Dyad independent (1 & 2 are examples)

Dyad dependent (3 & 4 are examples)

ERGM specification: 𝜽′𝒈 𝒚

NME Workshop 35

Model specification involves:

1. Choosing the set of network statistics 𝒈 𝒚

From minimal : # of edges

To saturated: one term for every dyad in the network

NB: statnetWeb allows you to choose from the list of terms and retrieve documentation for each one

2. Choosing “homogeneity constraints” on the parameter , for example, with edges:

all homogeneous

group specific (e.g., sex or age specific )

dyad specific

Let’s explore the Florentine marriage network

… to StatnetWeb36

NME Workshop

Flomarriage: Bernoulli Model

Load the flomarriage network

Network of marriage ties between families in Renaissance Florence

On the Fit Model page, look up the documentation on the edges term

37NME Workshop

Flomarriage: Bernoulli Model

Add edges to the ergm formula

Fit the model

What does this model imply? Homogeneous edge probability

Every tie is equally likely

Not a very interesting model

38NME Workshop

Step 1

Step 2

Interpreting the coefficients

The log-odds of any tie existing is:

Corresponding probability:

39NME Workshop

= −1.609 × change in # ties

= −1.609 × 1

=exp −1.609

1 + exp −1.609

= 0.1667

You can confirm that this is the density of the network

𝜃 = 𝑙𝑛𝑝

(1 − 𝑝)

𝑝 =𝑒𝜃

1 + 𝑒𝜃

Flomarriage: Triad Formation

NME Workshop 40

The “triangle” term is a measure of clustering

Read the documentation for the triangle term

Fit the model edges + triangle

Hint: you can just add the triangle term if edges is already in your formula

Then click Fit Model

Triangle is a dyad dependent term, so the estimation algorithm changes to MCMC (more on this later)

Flomarriage: Triad Formation

NME Workshop 41

Now how to interpret the coefficients?

Conditional log-odds of two actors having a tie:

(−1.68 × change in the # of ties) + (0.16 × change in # of triangles)

always=1 how many triangles can one tie change?

For a tie that will create zero triangles −1.68 + 0 = −1.68

One triangle −1.68 + (0.16 × 1) = −1.52

Two triangles −1.68 + (0.16 × 2) = −1.36

Note, not significant

Still unlikely, but a bit less so

Flomarriage: Nodal covariates

NME Workshop 42

What do you notice?

We can test whether edge probabilities are a function of wealth

This is a quantitative nodal attribute, so we use the ergm term “nodecov”

flomarriage sized by wealth


NME Workshop 43

Reset the ergm formula and fit the following model:

There is a significant positive wealth effect on the odds of a tie

What does the positive coefficient mean?

Not that there is homophily by wealth

Just that wealthy nodes have more ties

Note that the wealth effect operates on both nodes in a dyad.


NME Workshop 44

The conditional log-odds of a tie between two actors is:

For a tie between two nodes with minimum wealth (3)

For a tie between two nodes with maximum wealth (146)

For a tie between nodes with maximum and minimum wealth

Note: To specify homophily on wealth, you would use the ergm-term absdiff

−2.59 × change in # ties + 0.01 × wealth of node 1 + 0.01 × wealth of node 2

−2.59 + 0.01 × 3 + 3 = −2.53

−2.59 + 0.01 × 146 + 146 = 0.33

−2.59 + 0.01 × 146 + 3 = −1.1

Estimation (in one slide)

NME Workshop 45

There is no “closed form” or analytic solution for the estimated coefficients (as there is in OLS: 𝛽 = 𝑋′𝑋 −1(𝑋′𝑌))

Instead, we rely on a defining property of Maximum Likelihood Estimates (MLEs) for exponential family models

At the MLE of the coefficients:

expected values of the statistics under the model = the observed statistics

And we find these MLEs using an iterative search algorithm

A “Markov Chain Monte Carlo” (MCMC) algorithm Start with some initial θ values, simulate a sample of networks from those values

Compare the means of the simulated statistics to the observed values

Update the values of θ based on the deviations

Repeat until the (expected – observed) < epsilon

Estimation (ok, I needed 2 slides)

NME Workshop 46

What does it mean to “simulate networks from those values”?

Pick a dyad at random

Toss a coin to set the tie status

The probability of the tie is determined by the model

And the details of the MCMC sampling algorithm (Gibbs, Metropolis, Metropolis-Hastings)

Repeat (many many many times)

This produces a Markov Chain of networks

Sample from this chain, every 1000th element (say)

Calculate the mean of the model statistics from this sample

And compare the this mean to the observed network statistics

Computationally intensive estimation

NME Workshop 47

Has been key to statistical estimation of complex (i.e., realistic and interesting) models for dependent data

And to the emergence of the field of “data science”

In most cases, it works really well

And there is lots of mathematical theory proving it has good convergence properties

… but, it can run into trouble

especially if the model you’re trying to fit is not a good one for the observed network

Model Degeneracy

NME Workshop 48

Models with dyad dependent terms can behave differently than we expect

They look simple, almost like logistic regression

But they represent effects that cascade through a network via a chain of dependence (this is the “watch out” from earlier)

Homogeneous triangle and k-star terms turn out to be some of the worst offenders

Model Degeneracy

NME Workshop 49

Technical Definition:

When a model places almost all probability on a small number of uninteresting graphs

Most common “uninteresting” graphs: Complete (all links exist)

Empty

Model degeneracy = misspecification The model you specified would almost never produce the network you

observed

Model Degeneracy

NME Workshop 50

Switch to the faux.mesa.high network

Fit a model with: edges + triangle

What happens?

Trying to fit this model, the algorithm heads off into networks that are much more dense than the observed network.

What does this mean? That this model would not have produced this network, for any combination of parameter estimates for the two terms

i.e., this is a model misspecification problem

Degeneracy Plot (for the 2 star model)

NME Workshop 51

Only the white area has networks with some interesting variation

The dark areas are complete graphs, or empty graphs (+/-1 or 2 edges)

This model does not produce many useful networksFrom Mark Handcock’s 2003 tech report:

https://www.csss.washington.edu/Papers/2003/wp39.pdf

https://www.csss.washington.edu/Papers/2003/wp39.pdf

Solution: better network statistics

NME Workshop 52

Old statistic: t(x) = # of triangles in the graph

Here, every additional 3-cycle has the same impact,

New statistic: Set declining marginal returns for each additional 3-cycle involving the

same edge

The specific function we place on this shared partner distribution involves a geometric weighting

Hence the name: geometrically weighted edge-wise shared partners A.k.a. GWESP

The parameter that specifies the rate of decline in marginal returns is α

The smaller the α, the more rapid the decline


NME Workshop 53

spi = # of edges with i shared partners𝑔𝑤𝑒𝑠𝑝 = 𝑒α

𝑖=1

𝑛−2

1 − 1 − 𝑒−α 𝑖 𝑠𝑝𝑖

This configuration contains:• 1 edge with 3 shared partners • 6 edges with 1 shared partner

α GWESP(α)

0 𝑒0 1 − 1 − 𝑒−0 1 × 6 + 𝑒0 1 − 1 − 𝑒−0 3 × 1 = 7

0.5 𝑒0.5 1 − 1 − 𝑒−0.5 1 × 6 + 𝑒0.5 1 − 1 − 𝑒−0.5 3 × 1 = 7.55

1 𝑒1 1 − 1 − 𝑒−1 1 × 6 + 𝑒1 1 − 1 − 𝑒−1 3 × 1 = 8.03

The # of edges with 1+ shared partners


NME Workshop 54

spi = # of edges with i shared partners𝑔𝑤𝑒𝑠𝑝 = 𝑒α

𝑖=1

𝑛−2

1 − 1 − 𝑒−α 𝑖 𝑠𝑝𝑖

Count of edges in each triangle (i.e. # of triangles x 3)

Count of edges in at least one triangle (because only an edge’s first triangle counts)

Adding a gwesp term to the faux.mesa.high model

And conducting model assessments

… to StatnetWeb55

NME Workshop

Fitting and diagnosing a model

NME Workshop 56

Convergence is the first assessment

Dyad independent models will always converge

Dyad dependent models may or may not

Next step depends on the model:

Dyad independent Dyad dependent

Goodness of fit assessment: GOF plots

Convergence assessment: MCMC diagnostics

What are MCMC Diagnostics?

NME Workshop 57

MCMC Diagnostics tell us if the estimation algorithm is mixing well

These look pretty good

The traceplots on the left show random walks around the target value (a bit of correlation in the sampled networks, but not enough to cause concern)

The distribution of sampled statistics on the right is centered on the target values

These are taken from the penultimate MCMC chain, which is stored in the ergmoutput object

Goodness of Fit (GOF)

NME Workshop 58

Traditional GOF stats can be used AIC, BIC are included in the model summary

We also take another approach

We are interested in how well we fit aggregate properties of the network structure that we did not include as model terms

This helps to identify what the model gets wrong

We use 3 “higher order” statistics:

Degree distribution

Shared partner distribution (non-parametric) (local clustering)

Geodesic distance distribution (global clustering)

DATA

MODEL

ESTIMATEDCOEFFICIENTS

SIMULATED DATA(draws from the prob. dist.)

HIGHER ORDERGRAPH STATISTICS

OF DATA

HIGHER ORDERGRAPH STATISTICS

OF SIMULATED DATA

GOODNESS OF FIT OF MODEL

TO DATA

NME Workshop 59

We’ll show how to do this next

Take a break?60

NME Workshop

So let’s run and compare several models

NME Workshop 61

These will allow us to examine the evidence for homophily vs. transitivity

We’ll assess the convergence of the different models

As well as the goodness of fit

And the implications for the generative process of high school friendship patterns in this network

Fit the Bernoulli model to faux.mesa.high

62

Estimate, and run the default set of GOF terms for this model:

NME Workshop

faux.mesa.high ~ edges

Is this a dyad independent or dyad dependent model?

Dyad independent models are not fit with MCMC, so we don’t need to check MCMC diagnostics

We can move directly to GOF

Save the model

NME Workshop 63

This will keep the results so we can compare them later

Run the GOF for this model

64

Go to the Goodness of Fit tab

NME Workshop

Run the default GOF

This will take a moment because GOF is simulating 100 networks from the model, and calculating the default summary statistics for each one

Data:

Model: Bernoulli

(i.e. edges only)

Goodness of fit measure 1: degree distribution

NME Workshop 65

Boxplots show 100 simulations from the Bernoulli model

Black line shows the observed data from faux.mesa.high

Data:

Model: Bernoulli

(i.e. edges only)

This edge has an ESP value of 3

Goodness of fit measure 2: ESP distribution (local clustering)

NME Workshop 66

Data:

Model: Bernoulli

(i.e. edges only)

A/B have geodesic 2

A/C have geodesic ∞

AB

C

Goodness of fit measure 3: geodesic distribution(global clustering)

NME Workshop 67

faux.mesa.high ~ edges

degree edgewise shared partner geodesic

Goodness of fit measures assembled

NME Workshop 68

Summary: Not a good fit to any of the aggregate structural properties observed

Fit a model with gwesp

69

Estimate, save and assess this model:

NME Workshop

faux.mesa.high ~ edges + gwesp(0.25, fixed = TRUE)

Save this model too.

This is a dyad dependent model

It converges (unlike the triangle model)

It is fit with MCMC

So we should check the MCMC diagnostics

Run the MCMC diagnostics for this model

70

Go to the MCMC diagnostics tab

NME Workshop

Select Model 2

Looks pretty good

Run the Goodness of Fit

71

edge-wise shared partnersdegree minimum geodesic distance

NME Workshop

faux.mesa.high ~ edges + gwesp(0.25, fixed = TRUE)

Much better, though the ESP distribution fit isn’t great

And, a quick eyeball test…

NME Workshop 72

So, back to our original question:How much of the clustering is due to homophily, and how much to transitivity?

The global structure looks kinda similar now, ... But something is not right

Observed network Network simulated from model*

* We’ll get to simulation in just a bit

Test this by comparing four models

Model Network Statistics g(y)

Edges # of edges

Edges + GWESP

(transitivity)

# of edges

weighted shared partners

Edges + Attributes

(homophily)

# of edges

# of edges for each race, sex, grade

# of edges that are within-race, within-grade, within-sex

Edges + Attributes + GWESP

(both)

# of edges

# of edges for each race, sex, grade

# of edges that are within-race, within-grade, within-sex

weighted shared partners

NME Workshop 73

Fitting and saving models

NME Workshop 74

statnetWeb allows you to save up to 5 models – we’ll fit 4 (you can cut and paste from here to statnetWeb):

1. edgesFit model, save model, reset formula

2. edges + gwesp(0.25, fixed = T)Fit model, save model, reset formula

3. edges + nodefactor("Grade") + nodefactor("Race") + nodefactor("Sex") + nodematch("Grade", diff = T) + nodematch("Race", diff = F) + nodematch("Sex", diff = F) Fit model, save model, reset formula

4. edges + nodefactor("Grade") + nodefactor("Race") + nodefactor("Sex") + nodematch("Grade", diff = TRUE) + nodematch("Race", diff = FALSE) + nodematch("Sex", diff = FALSE) + gwesp(0.25, fixed = TRUE)Fit model, save model

You’ve already fit and saved these

Model Comparison

NME Workshop 75

Note how the gwespestimate changes from model 2 to 4

About 25% smaller

That’s the impact of controlling for attribute effects, including homophily

Homophilyestimates change also, once you control for transitivity

GOF comparison for all 4 models:

1. Edges

AIC: 2288

3. Edges + Attributes

AIC: 1809

2. Edges + GWESP

AIC: 1999

4. Edges + Attributes +

GWESP

AIC: 1648

76NME Workshop

This will take some time to run

Summary

NME Workshop 77

Both transitivity and homophily play a role in clustering these friendships• Homophily accounts for the distribution of path lengths (geodesics)

• Transitivity (Triadic closure)

• Accounts for the large number of isolates

• Captures the local clustering (ESP) reasonably well

• ~25% of the transitivity effect is a by-product of homophily

• The gwesp coefficient drops by ~25% when homophily is added to the model

The GOF suggests the ESP distribution is still not well fit• You could tinker some more, if this was a real research question

• But we’ll move on…

Simulating networks from the model

NME Workshop 78

A fitted model describes a probability distribution across all networks of this size

The model assigns a probability to every possible network

The model terms and the estimated coefficients make some networks more likely than others

You can simulate networks from this distribution

Using the same MCMC algorithm that was used for estimation

And the simulated networks will be centered on the network statistics in the original observed network

This, of course, is why these models are really useful for network epidemiology

Simulations

NME Workshop 79

Choose one of the models that you have saved and run 100 simulations with the default control settings

Choose the model on the Simulations page next to “ergm formula”

Do you see autocorrelation in the simulation statistics?

Increase the MCMC interval to 10,000 and re-run the simulations to see how this changes the autocorrelation

Some common statistics in ergms

NME Workshop 80

Term Formula Unit Value(s)

~edges # of edges edges 8

~nodefactor(“color”) Sum of degrees for nodes of each color nodes/edges* [8,] 6, 2

~nodefactor(“color”,

base=2)

Sum of degrees for nodes of each color nodes/edges* 8, [6,] 2

~nodematch(“color”) # of edges between nodes of same color edges 6

~nodematch(“color”, diff

= TRUE)

# of edges between nodes of same color, for each color

edges 3, 2, 1

undirected network of 10 nodes, including nodal attribute “color”, with values:

1=black,

2=red,

3=green


NME Workshop 81


~nodemix(“color”, base=1) # of edges between nodes of each color combo edges [3,] 2, 2, 0, 0, 1

~degree(0) # of nodes of degree 0 nodes 2

~degree(2:5) # of nodes of degrees 2, 3, 4, 5 each nodes 1, 2, 1, 0

~concurrent # of nodes of at least degree 2 nodes 4


1=black,

2=red,

3=green


NME Workshop 82


~triangle # of triangles (beware!) triangles 2

~gwesp(0) # of edges in at least one triangle edges 5

~gwesp(∞) # of edges in triangles total (=3 * # triangles) triangles 6


1=black,

2=red,

3=green

Next

NME Workshop 83

Collecting network data

There are different types

And the type you collect has implications for the statistical methods that you can use

Selected References

NME Workshop 84

Journal of Statistical Software (v42) 2008 – Eight papers on the statnet software; covering theory, algorithms and usage

Goodreau, S., et al. (2009). "Birds of a Feather, or Friend of a Friend? Using Statistical Network Analysis to Investigate Adolescent Social Networks." Demography 46(1): 103–125.

Handcock MS. (2003) Assessing Degeneracy in Statistical Models of Social Networks. CSSS working paper 39. https://www.csss.washington.edu/research/working-papers/39

Hunter DR. Curved Exponential Family Models for Social Networks. (2007) Social networks. 29(2):216-30. doi: 10.1016/j.socnet.2006.08.005. PubMed PMID: PMC2031865.

Hunter DR, Handcock MS. Inference in Curved Exponential Family Models for Networks. (2006) Journal of Computational and Graphical Statistics. 15(3):565-83. doi: 10.1198/106186006X133069.

Hunter DR, Goodreau SM, Handcock MS. Goodness of fit of social network models. (2008) J Am Stat Assoc. 103(481):248-58. doi: 10.1198/016214507000000446. PubMed PMID: WOS:000254311500029.

https://www.csss.washington.edu/research/working-papers/39

DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR...

Documents

Transcript of DAY 2: STATISTICAL METHODS FOR NETWORK ANALYSISstatnet.org/nme/d2-s2.pdf · STATISTICAL METHODS FOR...