Bayesian estimation of complex networks and dynamic choice...

Bayesian estimation of complex networks and dynamic choice in the music industry

Stefano Nasini, Víctor Martínez-de-Albéniz

Dept. of Production, Technology and Operations Management, IESE Business School, University of Navarra, Barcelona, Spain

ENBIS-Spring-meeting-2015

Outline

1 Data sets from the music broadcasting industry

2 Multidimensional panel data

3 An exponential random model
   Multidimensional Gaussian reduction
   The exponential family of distribution

4 Estimation method
   Numerical results
   Goodness of fit


Artist goods: the music broadcasting industry

Artist goods
Their life cycles resemble clothing fashion trends, with a time window in which their popularity increases shortly after their premiere and then decreases.

This is due to network externalities in individual preferences and opinions.


Artist goods: the music broadcasting industry

A data set of songs played on TV channels and radio stations

                          Germany      UK
Broadcasting companies    41           51
Artists                   13860        16169
Songs                     48785        65531
Time periods              163 weeks    163 weeks


Artist goods: the music broadcasting industry

A song’s popularity increases after its premiere and then decreases

Figure panels:
(a) B. Mars, Just the way you are in Germany.
(b) B. Mars, Locked Out Of Heaven in Germany.
(c) B. Mars, Just the way you are in the UK.
(d) B. Mars, Locked Out Of Heaven in the UK.


Artist goods: the music broadcasting industry

Correlated choices from different broadcasting companies

                      BBC 1 Xtra  Capital FM  Kiss 100 FM  Metro Radio  Radio City  Smooth Radio London
BBC 1 Xtra            1.000       0.729       0.668        0.686        0.010       –
Capital FM                        1.000       0.814        0.830        -0.135      –
Kiss 100 FM                                   1.000        0.829        -0.142      –
Metro Radio                                                1.000        0.078       –
Radio City                                                              1.000       –
Smooth Radio London                                                                 1.000

Table: Spearman’s correlations among the dynamic plays of Locked Out Of Heaven.

                      BBC 1 Xtra  Capital FM  Kiss 100 FM  Metro Radio  Radio City  Smooth Radio London
BBC 1 Xtra            1.000       0.508       –            0.329        0.001       -0.076
Capital FM                        1.000       –            0.417        -0.128      -0.091
Kiss 100 FM                                   1.000        –            –           –
Metro Radio                                                1.000        -0.268      -0.222
Radio City                                                              1.000       0.495
Smooth Radio London                                                                 1.000

Table: Spearman’s correlations among the dynamic plays of Just the way you are.
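
As a rough illustration of how correlation tables like the ones above can be produced, the sketch below computes Spearman's rank correlations between weekly play series. The play counts are synthetic and the station names are hypothetical; only the 163-week length is taken from the data description. Spearman's correlation is computed here as the Pearson correlation of average ranks, which handles ties in count data.

```python
import numpy as np

def average_ranks(v):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                  # highest rank in each tie group
    mean_rank = last_rank - (counts - 1) / 2.0     # average rank of each tie group
    return mean_rank[inverse]

def spearman_matrix(data):
    """Spearman correlation between the columns of a (weeks x stations) array."""
    ranks = np.column_stack([average_ranks(data[:, j]) for j in range(data.shape[1])])
    return np.corrcoef(ranks, rowvar=False)

# Synthetic weekly plays (NOT the real data): a shared life cycle plus Poisson noise.
rng = np.random.default_rng(0)
weeks = np.arange(1, 164)                          # 163 weeks, as in the data set
life_cycle = 30.0 * weeks * np.exp(-weeks / 20.0)
stations = ["Station A", "Station B", "Station C"]  # hypothetical names
plays = np.column_stack([rng.poisson(life_cycle * scale) for scale in (1.0, 0.6, 0.3)])

rho = spearman_matrix(plays)
for name, row in zip(stations, rho):
    print(name, np.round(row, 3))
```

Missing entries in the tables (shown as a dash) would correspond to stations with no plays of the song, for which a rank correlation is undefined.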


Artist goods: the music broadcasting industry

Our goal is to have a joint model which allows:

- Predicting the common life cycle of song diffusion within the music broadcasting industry.
- Detecting the structure of imitation and spillover between radio stations and TV channels, based on the observed correlations.
- Deciding where in the broadcasting industry to launch a song in order to maximize the future number of plays.


Multidimensional panel data as two-mode network

Notation
$R$ := set of individuals (primary layer); $S$ := set of items (secondary layer); $T$ := set of time periods;

$x_{st} = [x_{s1t}\; x_{s2t}\; \dots\; x_{s|R|t}]^T \in \chi$ is the $|R|$-dimensional connection profile of the $s$-th item at time $t$;

$E \subseteq R \times R$ := a set of connections between broadcasting companies.


Multidimensional panel data as two-mode network

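Spillover measurements to internalize cross-section dependency in the panel

i
\[ G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = \frac{1}{|E|\tau} \sum_{\ell=0}^{\tau} d_\ell \left( (x_{sht})^{u_h} (x_{sk(t-\ell)})^{u_k} \right)^{1/p}; \]

ii
\[ G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = -\frac{1}{|E|\tau} \sum_{\ell=0}^{\tau} d_\ell \left| \frac{x_{sht}}{u_h} - \frac{x_{sk(t-\ell)}}{u_k} \right|^{2/p}; \]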
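
The two spillover measurements above can be sketched numerically as follows. This is a minimal sketch under stated assumptions: the weights u_h and u_k are read as exponents in measurement (i) and as scale divisors in measurement (ii), the outer exponents are read as 1/p and 2/p, and the function names, argument names and default values are all hypothetical. Each function evaluates G_hk for one song, given the play series of companies h and k.

```python
import numpy as np

def spillover_product(x_h, x_k, t, tau, d, u_h=1.0, u_k=1.0, p=2.0, n_edges=1):
    """Measurement (i): weighted sum over lags of (x_h[t]^u_h * x_k[t-l]^u_k)^(1/p)."""
    x_h, x_k, d = (np.asarray(a, dtype=float) for a in (x_h, x_k, d))
    lags = np.arange(tau + 1)                      # l = 0, ..., tau (requires t >= tau)
    terms = (x_h[t] ** u_h * x_k[t - lags] ** u_k) ** (1.0 / p)
    return float(np.sum(d * terms) / (n_edges * tau))

def spillover_distance(x_h, x_k, t, tau, d, u_h=1.0, u_k=1.0, p=2.0, n_edges=1):
    """Measurement (ii): minus the weighted sum of |x_h[t]/u_h - x_k[t-l]/u_k|^(2/p)."""
    x_h, x_k, d = (np.asarray(a, dtype=float) for a in (x_h, x_k, d))
    lags = np.arange(tau + 1)
    terms = np.abs(x_h[t] / u_h - x_k[t - lags] / u_k) ** (2.0 / p)
    return float(-np.sum(d * terms) / (n_edges * tau))

# Toy series for two companies (illustrative numbers only).
x_h = [1, 4, 9]
x_k = [1, 4, 9]
g_i = spillover_product(x_h, x_k, t=2, tau=2, d=[1, 1, 1])
g_ii = spillover_distance(x_h, x_k, t=2, tau=2, d=[1, 1, 1])
print(g_i, g_ii)
```

Measurement (i) grows when both companies play the song heavily, while measurement (ii) is closest to zero when their scaled play counts are similar.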


An exponential random model

\[ P(x_{st} \mid x_{s,t-1}, \dots, x_{s,t-\tau}) \propto h(x_{st}) \exp\Big( \alpha_{st} S_s + \sum_{r \in R} \beta_r R_r + \sum_{(h,k) \in E} \gamma_{hk} G_{hk} \Big) \]

- $S_s$ accounts for the size effect of each item in the secondary layer, for $s \in S$;

- $R_r$ accounts for the size effect of each individual in the primary layer, for $r \in R$;

- $G_{hk}$ internalizes the one-mode projection into the primary layer, for $(h,k) \in E$;

Underlying measure: either $h(x_{st}) = \frac{1}{\prod_{r \in R} x_{srt}!}$ or $h(x_{st}) = (2\pi)^{-\frac{(\tau+1)|R|}{2}}$.


An exponential random model

The spillover measurement Ghk plays an important role.

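\[ P(x_{srt} \mid x_{sr't'} \text{ such that } r' \neq r,\; t' < t) \propto \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right), \]

where

\[ \eta = \frac{1}{\tau |E|} \sum_{k \in R} \gamma_{rk} \sum_{\ell=1}^{\tau} (x_{sk(t-\ell)})^{1/p} \quad \text{and} \quad C(x_{srt}) = (x_{srt})^{1/p}, \quad \text{for i,} \]

\[ \eta = \frac{1}{\tau |E|} \begin{bmatrix} \gamma_{r1} \\ \vdots \\ \gamma_{rn} \end{bmatrix} \quad \text{and} \quad C(x_{srt}) = \begin{bmatrix} \sum_{\ell=1}^{\tau} d_\ell \left| \frac{x_{srt}}{u_r} - \frac{x_{s1(t-\ell)}}{u_1} \right|^{2/p} \\ \vdots \\ \sum_{\ell=1}^{\tau} d_\ell \left| \frac{x_{srt}}{u_r} - \frac{x_{sn(t-\ell)}}{u_n} \right|^{2/p} \end{bmatrix} \quad \text{for ii.} \]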


An exponential random model

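Figure (plots omitted): bivariate illustration of the two spillover measurements. Columns: α = 1 and γ = 1; α = −1 and γ = 1. Rows, by spillover measurement:

\[ \frac{1}{x!\,y!} \exp\left( \alpha (x + y) + \gamma (xy)^{1/2} \right) \]

\[ \frac{1}{x!\,y!} \exp\left( \alpha (x + y) + \gamma |x - y| \right) \]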
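
To make the bivariate illustration concrete, the following sketch evaluates both unnormalized densities on a truncated grid and normalizes them numerically. The truncation point `n_max` and the function name are assumptions introduced here; the factorial terms make the tail beyond the grid negligible for the parameter values shown on the slide.

```python
import numpy as np
from math import lgamma

def bivariate_pmf(alpha, gamma, kind, n_max=60):
    """Normalized pmf on {0..n_max}^2 for exp(alpha(x+y) + gamma*spill(x,y)) / (x! y!).

    kind="product" uses spill = sqrt(x*y); kind="distance" uses spill = |x - y|.
    """
    values = np.arange(n_max + 1)
    X, Y = np.meshgrid(values, values, indexing="ij")
    log_fact = np.array([lgamma(v + 1.0) for v in values])   # log(x!)
    spill = np.sqrt(X * Y) if kind == "product" else np.abs(X - Y)
    log_w = alpha * (X + Y) + gamma * spill - log_fact[X] - log_fact[Y]
    w = np.exp(log_w - log_w.max())                          # stabilize before normalizing
    return w / w.sum()

# The two parameter settings shown on the slide, for each spillover measurement.
for kind in ("product", "distance"):
    for alpha in (1.0, -1.0):
        pmf = bivariate_pmf(alpha, 1.0, kind)
        mode = np.unravel_index(np.argmax(pmf), pmf.shape)
        print(kind, alpha, "mode at", mode)
```

With γ = 0 both variants reduce to two independent Poisson variables with mean exp(α), which gives a simple sanity check on the normalization.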


Multidimensional Gaussian reduction

Under special conditions:

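\[ P(x_{st} \mid x_{s,t-1}, \dots, x_{s,t-\tau}) \propto h(x_{st}) \exp\Big( \alpha_{st} S_s + \sum_{r \in R} \beta_r R_r + \sum_{(h,k) \in E} \gamma_{hk} G_{hk} \Big) \]

- $G_{hk}(x_{st}; x_{s,t-1}, \dots, x_{s,t-\tau}) = \sum_{\ell=0}^{\tau} d_\ell \left( x_{sht}\, x_{sk(t-\ell)} \right)$;

- $h(x_{st}) = (2\pi)^{-\frac{(\tau+1)|R|}{2}}$;

\[ \begin{bmatrix} X_{st} \\ \vdots \\ X_{s,t-\tau} \end{bmatrix} \sim N(\mu, \Sigma), \quad \text{where} \quad \mu = \Sigma \begin{bmatrix} \alpha_{st}\, e + \beta \\ \vdots \\ \alpha_{s,t-\tau}\, e + \beta \end{bmatrix} \quad \text{and} \quad \Sigma = -\frac{1}{2} \begin{bmatrix} d_0 \Gamma & \dots & d_\tau \Gamma \\ \vdots & \ddots & \vdots \\ d_\tau \Gamma & \dots & d_0 \Gamma \end{bmatrix}^{-1}. \]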


Why is our model an extension of the ERGM?

Exponential Family
Whenever the density of a random variable may be written $f(x) \propto h(x) \exp\{\theta^T C(x)\}$, the family of all such random variables (for all possible $\theta$) is called an exponential family.

Exponential Random Graph Model (ERGM)

\[ P_\theta(X = x) = \frac{\exp\{\theta^T C(x)\}}{Z(\theta)}, \quad \text{where} \]

- $X$ is a random network on $n$ nodes (a matrix of 0’s and 1’s);

- $\theta$ is a vector of parameters;

- $C(x)$ is a known vector of graph statistics on $x$.


Why it is difficult to find the MLE

The log-likelihood function

- The model: $P(X = x^{(0)} \mid \theta) = \frac{\exp\{\theta^T C(x^{(0)})\}}{Z(\theta)}$, where $x^{(0)}$ is the observed data set.

- The log-likelihood function is

\[ \ell(\theta) = \theta^T C(x^{(0)}) - \log Z(\theta) = \theta^T C(x^{(0)}) - \log\Big( \sum_{\text{all possible } x} \exp\{\theta^T C(x)\} \Big) \]

- Even in the simplest case of undirected graphs without self-edges, the number of graphs in the sum is very large: $2^{n(n-1)/2}$ for $n$ nodes.
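
The size of the sum in log Z(θ) can be illustrated with a brute-force computation that is only feasible for tiny graphs. The sketch below assumes, purely for illustration, an ERGM on undirected graphs whose statistics C(x) are the number of edges and the number of triangles; the function names are hypothetical.

```python
import itertools
import math
import numpy as np

def graph_stats(edge_states, pairs, n):
    """C(x) = (number of edges, number of triangles) for an undirected graph."""
    adj = np.zeros((n, n), dtype=int)
    for (i, j), on in zip(pairs, edge_states):
        adj[i, j] = adj[j, i] = on
    n_edges = sum(edge_states)
    n_triangles = int(np.trace(adj @ adj @ adj)) // 6   # each triangle is counted 6 times
    return np.array([n_edges, n_triangles], dtype=float)

def log_Z_bruteforce(theta, n):
    """log of the sum of exp(theta^T C(x)) over all undirected graphs on n nodes."""
    pairs = list(itertools.combinations(range(n), 2))
    log_terms = [float(theta @ graph_stats(cfg, pairs, n))
                 for cfg in itertools.product((0, 1), repeat=len(pairs))]
    m = max(log_terms)                                   # log-sum-exp for stability
    return m + math.log(sum(math.exp(v - m) for v in log_terms))

def number_of_graphs(n):
    """Number of undirected graphs without self-edges on n labelled nodes."""
    return 2 ** (n * (n - 1) // 2)

theta = np.array([0.3, -0.5])
print(log_Z_bruteforce(theta, 4))        # sums over 64 graphs
print(number_of_graphs(10))              # already far too many to enumerate routinely
```

Enumeration is exact for four or five nodes, but the count of graphs doubles with every additional pair of nodes, which is why the exact likelihood is out of reach for networks of realistic size.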


Maximum Pseudo-likelihood

Let $x_w$ be a single component of $x$ and $x_{-w}$ the vector of all the remaining components.

The pseudo-likelihood function

Approximate the marginal $P(x_w \mid \theta)$ by the conditional $P(x_w \mid x_{-w}; \theta)$.

Then
\[ \tilde{\ell}(\theta) = \prod_w P(x_w \mid x_{-w}; \theta). \]

Result: the maximum pseudo-likelihood (MPL) estimate. Unfortunately, little is known about the quality of MPL estimates.


Pseudo-likelihood for ERGM

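Notation: for a network $x$ and a pair $(i,j)$ of nodes, $x_{ij}$ is the entry for that pair and $x_{-ij}$ collects all the remaining entries.

\[ \tilde{\ell}(\theta) = \prod_w P(x_w \mid x_{-w}; \theta) \]

\[ = \prod_{(i,j)} \frac{\exp\{\theta^T C(x^{(0)})\}}{\exp\{\theta^T C(x_{ij} = 1, x_{-ij})\} + \exp\{\theta^T C(x_{ij} = 0, x_{-ij})\}} \]

\[ = \frac{\exp\{n(n-1)\, \theta^T C(x^{(0)})\}}{\prod_{(i,j)} \left( \exp\{\theta^T C(x_{ij} = 1, x_{-ij})\} + \exp\{\theta^T C(x_{ij} = 0, x_{-ij})\} \right)} \]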
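
The product over pairs above can be evaluated directly by toggling one entry at a time. The following is a minimal sketch for a directed network with n(n − 1) ordered pairs, assuming for illustration that C(x) counts ties and reciprocated pairs; these statistics and the function names are assumptions, not the statistics used in the talk.

```python
import numpy as np

def stats(x):
    """C(x) = (number of ties, number of reciprocated pairs) of a directed 0/1 matrix."""
    ties = x.sum()
    mutual = np.sum(x * x.T) / 2.0
    return np.array([ties, mutual], dtype=float)

def log_pseudo_likelihood(theta, x):
    """Sum over ordered pairs (i, j) of log P(x_ij | x_-ij; theta)."""
    n = x.shape[0]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue                       # no self-ties
            x_on, x_off = x.copy(), x.copy()
            x_on[i, j], x_off[i, j] = 1, 0
            s_on = float(theta @ stats(x_on))
            s_off = float(theta @ stats(x_off))
            s_obs = s_on if x[i, j] == 1 else s_off   # theta^T C of the observed network
            total += s_obs - np.logaddexp(s_on, s_off)
    return total

# Small directed network with a zero diagonal (illustrative only).
x_obs = np.array([[0, 1, 0],
                  [1, 0, 1],
                  [0, 0, 0]])
print(log_pseudo_likelihood(np.array([0.5, 1.0]), x_obs))
```

When the reciprocity parameter is zero the ties are independent, and the pseudo-likelihood coincides with the exact likelihood; this is the one case where its quality is not in question.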


Pseudo-likelihood for our model

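\[ \tilde{\ell}(\theta) = \prod_{(r,t)} P(x_{srt} \mid x_{sr't'} \text{ such that } r' \neq r,\; t' < t) \propto \prod_{(r,t)} \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right), \]

What is the normalizing constant for the full conditional?

\[ Z(\alpha_{st}, \beta_r, \eta) = \sum_{x_{srt} \geq 0} \frac{1}{x_{srt}!} \exp\left( \begin{bmatrix} \alpha_{st} + \beta_r \\ \eta \end{bmatrix}^T \begin{bmatrix} x_{srt} \\ C(x_{srt}) \end{bmatrix} \right) \]

Even the pseudo-likelihood is hard to define for our model.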
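
One pragmatic way to get a feel for this normalizing constant is to truncate the infinite sum. The sketch below does so for the scalar case with C(x) = x^{1/p}; the truncation point `x_max`, the default p = 2 and the function name are assumptions made here for illustration, and this is not presented as the approach taken in the talk.

```python
import math

def log_Z_conditional(alpha_st, beta_r, eta, p=2.0, x_max=500):
    """log of sum_{x=0}^{x_max} exp((alpha_st + beta_r) x + eta x^(1/p)) / x!.

    The factorial makes terms beyond a moderate x negligible for moderate
    parameter values, so a fixed truncation is used instead of a stopping rule.
    """
    log_terms = [(alpha_st + beta_r) * x + eta * x ** (1.0 / p) - math.lgamma(x + 1.0)
                 for x in range(x_max + 1)]
    m = max(log_terms)                       # log-sum-exp for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in log_terms))

print(log_Z_conditional(0.5, 0.5, 0.0))      # Poisson case, eta = 0
print(log_Z_conditional(0.5, 0.5, 0.8))      # with a spillover term
```

For η = 0 the sum is the Poisson normalizer, so log Z equals exp(α_st + β_r) exactly; for η ≠ 0 no such shortcut is available and the value must be recomputed for every (s, r, t) and every candidate θ.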


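Bayesian posterior
Let $\theta = [\alpha_{1t}, \dots, \alpha_{|S|t}, \beta_1, \dots, \beta_{|R|}, \gamma_{11}, \dots, \gamma_{|R|,|R|}]^T$ be the vector of natural parameters, $\pi(\theta)$ a prior distribution and $x^{(0)}$ the observed data set. By applying Bayes’ rule we have:

\[ P(\theta \mid x^{(0)}) = \frac{P(x^{(0)} \mid \theta)\, \pi(\theta)}{\int_\theta P(x^{(0)} \mid \theta)\, \pi(\theta)\, d\theta} \]

\[ = \frac{P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} P(x_t \mid x_{t-1}, \dots, x_{t-\tau}; \theta)\, \pi(\theta)}{\int_\theta P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} P(x_t \mid x_{t-1}, \dots, x_{t-\tau}; \theta)\, \pi(\theta)\, d\theta} \]

\[ = \frac{\dfrac{P(x_1, \dots, x_\tau; \theta)\, \pi(\theta)}{Z(\theta)} \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})}{\int_\theta \dfrac{P(x_1, \dots, x_\tau; \theta)\, \pi(\theta)}{Z(\theta)} \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})\, d\theta} \]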


Metropolis-Hastings

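Since both $P(x^{(0)} \mid \theta)$ and $P(\theta \mid x^{(0)})$ can only be specified up to proportionality, almost all known valid MCMC algorithms for $\theta$ cannot be applied. Consider for instance the Metropolis-Hastings acceptance probability:

\[ \pi_{\text{accept}}(\theta, \theta') = \min\left\{ 1,\; \frac{P(x^{(0)} \mid \theta')\, \pi(\theta')}{P(x^{(0)} \mid \theta)\, \pi(\theta)} \times \frac{Q(\theta \mid \theta')}{Q(\theta' \mid \theta)} \right\} \]

\[ = \min\left\{ 1,\; \frac{P(x_1, \dots, x_\tau; \theta') \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta'}(x_{st})\, \pi(\theta')}{P(x_1, \dots, x_\tau; \theta) \prod_{t=\tau+1}^{w} \prod_{s=1}^{m} q_{s,t,\theta}(x_{st})\, \pi(\theta)} \times \frac{Z(\theta)\, Q(\theta \mid \theta')}{Z(\theta')\, Q(\theta' \mid \theta)} \right\} \]

where $Q(\theta' \mid \theta)$ is the proposal distribution. The ratio $Z(\theta)/Z(\theta')$ of intractable normalizing constants does not cancel, so this probability cannot be evaluated.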


Specialized MCMC for doubly intractable distributions

Murray proposed an MCMC approach which overcomes the drawback to a large extent, based on the simulation of the joint distribution of the parameter and the sample spaces, conditioned on the observed data set $x^{(0)}$, that is to say $P(x, \theta \mid x^{(0)})$.

Algorithm 1 Exchange algorithm of Murray.

1: Initialize $\theta$
2: repeat
3:    Draw $\theta'$ from an arbitrary proposal distribution;
4:    Draw $x'$ from $P(\cdot \mid \theta')$
5:    Accept $\theta'$ with probability $\min\left\{ 1,\; \frac{P(x' \mid \theta)\, P(x^{(0)} \mid \theta')\, \pi(\theta')}{P(x^{(0)} \mid \theta)\, P(x' \mid \theta')\, \pi(\theta)} \right\}$
6:    Update $\theta$
7: until Convergence
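
The steps of Algorithm 1 can be sketched on a toy model where everything is checkable. The example below is not the network model of the talk: it assumes i.i.d. counts with unnormalized likelihood exp(θx)/x! (a Poisson model whose normalizer is treated as unknown), a normal prior, and a symmetric random-walk proposal, all chosen here for illustration. The point is that the acceptance ratio only uses unnormalized densities, because Z(θ) and Z(θ′) cancel between the observed and the auxiliary data.

```python
import numpy as np

def log_q(x, theta):
    """Unnormalized log-likelihood theta * sum(x); the 1/x! factors cancel in the ratio."""
    return theta * np.sum(x)

def log_prior(theta, sd=10.0):
    """Log density of a N(0, sd^2) prior, up to an additive constant."""
    return -0.5 * (theta / sd) ** 2

def exchange_sampler(x_obs, n_iter=5000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = 0.0                                              # step 1: initialize
    draws = np.empty(n_iter)
    for it in range(n_iter):
        theta_new = theta + step * rng.normal()              # step 3: symmetric proposal
        x_aux = rng.poisson(np.exp(theta_new), size=len(x_obs))  # step 4: exact draw
        log_ratio = (log_q(x_aux, theta) + log_q(x_obs, theta_new) + log_prior(theta_new)
                     - log_q(x_obs, theta) - log_q(x_aux, theta_new) - log_prior(theta))
        if np.log(rng.uniform()) < log_ratio:                # step 5: accept or reject
            theta = theta_new                                # step 6: update
        draws[it] = theta
    return draws

x_obs = np.random.default_rng(1).poisson(3.0, size=50)       # synthetic observations
draws = exchange_sampler(x_obs)
print(draws[1000:].mean(), np.log(x_obs.mean()))
```

Step 4 requires an exact draw from the model at θ′, which is trivial for this toy model but is itself a substantial simulation task for a network model.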


Goodness of fit: graphical illustration

Total number of plays over time by the top-30 songs

Figure panels: (a) Full model. (b) Null model (γ = 0).


Goodness of fit: graphical illustration

Total number of plays over time by the top-30 songs

Figure panels: (a) Total plays over time. (b) Market share.


Reducing the dimensionality of the parameter space

Model specification based on structural properties of the music industry
The parameter space is the whole (|T| × |S| + |R| + |E|)-dimensional Euclidean space, while the sample space has dimension (|T| × |S| × |R|). We use two strategies to reduce the dimensionality of the parameter space:

A. Define communities of broadcasting companies to consider only within-group spillover effects γ;

B. Define a functional form for the effect of the song life cycle α.
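
To give a sense of scale, the dimension formulas above can be evaluated with the UK counts reported earlier in the slides (163 weeks, 65531 songs, 51 broadcasting companies). The size of E is not stated, so the sketch uses the upper bound |R|² implied by E ⊆ R × R; that bound and the variable names are assumptions made here.

```python
# Counts reported for the UK data set earlier in the slides.
n_weeks, n_songs, n_companies = 163, 65531, 51

n_alpha = n_weeks * n_songs                    # one alpha_st per (song, week)
n_beta = n_companies                           # one beta_r per company
n_gamma_max = n_companies * n_companies        # upper bound on |E|, since E is a subset of R x R
n_params_max = n_alpha + n_beta + n_gamma_max  # dimension of the unrestricted parameter space
n_observations = n_weeks * n_songs * n_companies

print(f"alpha parameters: {n_alpha:,}")
print(f"parameters (upper bound): {n_params_max:,}")
print(f"sample-space dimension: {n_observations:,}")
```

The α block dominates the parameter count, which is why a functional form for the life cycle (strategy B) removes most of the dimensionality.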


Reducing the dimensionality of the parameter space

A. Reducing the |E| effects γ

- Pairwise spillover effects γkh between individual companies h and k with the same radio format.

- A common spillover effect γkh between different radio formats, if h and k have different formats.

B. Reducing the |T| × |S| effects α

The broadcasting pattern of songs exhibits a time window in which their popularity quickly increases shortly after their premiere and then decreases.


Groups of broadcasting companies

Figure (diagram omitted): WITHIN FORMAT – BETWEEN FORMATS. Group labels: TV channels; Radio stations; Contemporary and Easy listening; Top 40 and Urban; Rock music.

Let’s introduce only the effects γ which are associated with TV channels and radio stations of the same format.


The estimated spillover effects

Columns: Contemporary, Rock, News, Sport, Top-40, World-Music, TV channels

Contemporary      (−0.089,0.004)   (0.012,0.021)    (−0.028,0.014)  (−0.164,0.012)   (−0.030,0.019)
                  (−0.186,−0.068)
Rock              (−0.035,−0.021)  (−0.049,0.037)   (−0.018,0.001)  (−0.032,0.001)   (−0.015,−0.021)
News              (−0.023,0.047)   (−0.072,−0.010)  (−0.035,0.008)  (−0.005,−0.024)  (0.009,0.030)
Sport             (−0.009,0.076)   (−0.036,−0.001)  (−0.068,0.030)  (−0.015,0.013)   (−0.029,0.001)
Top-40            (−0.070,0.001)   (−0.083,0.022)   (−0.052,0.000)  (−0.038,0.022)   (−0.025,0.019)
World-Music       (−0.017,0.014)   (−0.029,0.036)   (−0.022,0.005)  (−0.017,0.011)   (−0.014,0.024)
TV channels       (−0.291,−0.038)

Columns: BBC 1 Xtra, Capital FM, Kiss 100 FM, Metro Radio, Radio City, Smooth R. London

BBC 1 Xtra        (−0.009,0.060)  (−0.104,0.057)  (−0.015,0.012)  (0.005,0.024)   (−0.015,0.012)
Capital FM        (−0.015,0.051)  (−0.060,0.001)  (−0.009,0.025)  (0.000,0.025)   (−0.013,0.019)
Kiss 100 FM       (−0.020,0.124)  (−0.028,0.025)  (−0.009,0.025)  (−0.032,0.021)  (0.001,0.029)
Metro Radio       (−0.008,0.094)  (−0.009,0.012)  (−0.027,0.026)  (−0.014,0.037)  (0.000,0.055)
Radio City        (−0.019,0.110)  (−0.040,0.012)  (−0.015,0.022)  (−0.021,0.009)  (0.010,0.033)
Smooth R. London  (−0.033,0.011)  (−0.021,0.014)  (−0.022,0.016)  (−0.032,0.023)  (−0.022,0.001)


Songs’ dynamics

Define a functional form for the effect of song dynamics

The attractiveness trajectory of the $s$-th song can be specified by letting $t_0$ be the week when the song is launched and then using a gamma kernel to shape its time dynamics:

\[ \alpha_{st} = \begin{cases} \delta^0_s + \delta^1_s (t - t_0) + \delta^2_s \log(t - t_0) & \text{if } t > t_0 \\ -\infty & \text{otherwise} \end{cases} \]
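
The shape implied by this specification can be checked with a few lines of code. Since exp(α_st) is proportional to (t − t0)^δ2 · exp(δ1 (t − t0)), the attractiveness rises and then falls when δ1 < 0 < δ2, with a peak at t − t0 = −δ2/δ1 (obtained by setting the derivative δ1 + δ2/(t − t0) to zero). The sketch below uses hypothetical function names and illustrative parameter values, not estimates from the talk.

```python
import math

def alpha_st(t, t0, delta0, delta1, delta2):
    """Life-cycle effect: delta0 + delta1*(t - t0) + delta2*log(t - t0), or -inf before launch."""
    if t <= t0:
        return -math.inf
    age = t - t0
    return delta0 + delta1 * age + delta2 * math.log(age)

def attractiveness(t, t0, delta0, delta1, delta2):
    """exp(alpha_st): zero up to the launch week, gamma-kernel shaped afterwards."""
    return math.exp(alpha_st(t, t0, delta0, delta1, delta2))

def peak_age(delta1, delta2):
    """Weeks after launch at which alpha_st is maximal (needs delta1 < 0 < delta2)."""
    if not (delta1 < 0 < delta2):
        raise ValueError("a rise-then-fall shape needs delta1 < 0 < delta2")
    return -delta2 / delta1

# Illustrative parameters (not estimates): launch in week 5, peak 8 weeks later.
t0, d0, d1, d2 = 5, 1.0, -0.25, 2.0
for week in (5, 6, 9, 13, 20, 40):
    print(week, round(attractiveness(week, t0, d0, d1, d2), 3))
print("peak age:", peak_age(d1, d2))
```

Three parameters per song replace one α per song and week, which is the dimensionality reduction this functional form is meant to deliver.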


Songs life cycle

Common life cycle of the top-30 songs


Propagation of the broadcasting decision after the premiere week t0.

\[ \max \; \frac{1}{S} \sum_{s \in S} \sum_{t'=1}^{T} E\left[ x_{s,\bullet,t+t'} \mid x_{srt} = z_r \text{ for all } r \in R \right], \]

subject to

\[ \sum_{r \in R} y_r = 1, \]
\[ z_r \leq \min\{ M y_r, \phi \}, \quad r \in R, \]
\[ y_r \in \{0, 1\}, \quad z_r \geq 0, \quad F \geq \phi \geq 0, \quad r \in R. \]

                            Expected plays in t0 + 1    Expected plays in t0 + 2
Format        Eigenvector   φ = 10      φ = 100         φ = 10      φ = 100
Contemporary  0.098         265.795     267.647         265.949     267.720
Rock          0.121         265.209     261.803         265.687     261.381
News          0.098         265.609     264.058         265.995     263.211
Sport         0.177         260.301     263.021         260.875     263.055
Top-40        0.097         264.272     265.318         264.879     265.098
World Music   0.187         267.345     266.350         266.858     266.603
TV-channels   0.101         264.165     263.425         264.171     263.438
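
Because the constraint on the y variables selects exactly one launch point, the decision can be read off by comparing candidates. The sketch below simply takes the expected plays reported in the table above and returns the best format in each column; the dictionary layout and function name are assumptions for illustration, and the numbers are copied from the slide rather than recomputed from the model.

```python
# Expected plays from the table: (t0+1, phi=10), (t0+1, phi=100), (t0+2, phi=10), (t0+2, phi=100)
expected_plays = {
    "Contemporary": (265.795, 267.647, 265.949, 267.720),
    "Rock":         (265.209, 261.803, 265.687, 261.381),
    "News":         (265.609, 264.058, 265.995, 263.211),
    "Sport":        (260.301, 263.021, 260.875, 263.055),
    "Top-40":       (264.272, 265.318, 264.879, 265.098),
    "World Music":  (267.345, 266.350, 266.858, 266.603),
    "TV-channels":  (264.165, 263.425, 264.171, 263.438),
}
columns = ("t0+1, phi=10", "t0+1, phi=100", "t0+2, phi=10", "t0+2, phi=100")

def best_format(col):
    """Format with the largest expected plays in the given table column."""
    return max(expected_plays, key=lambda fmt: expected_plays[fmt][col])

best = {name: best_format(i) for i, name in enumerate(columns)}
for name, fmt in best.items():
    print(name, "->", fmt)
```

In this table the preferred launch format changes with the cap φ on launch plays, so the recommendation depends on how much initial airplay is available.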


Discussion

What are the real achievements of this work?

We considered a large multidimensional panel of songs broadcast weekly on radio stations and TV channels and detected a pattern of cross-section dependencies, based on pairwise imitations.

An exponential random model has been proposed to internalize, in a single probabilistic framework, both the songs’ life cycle and the complex correlation structure.

A specialized MCMC method has been implemented to estimate the model parameters.

The out-of-sample goodness of fit has been analyzed, assessing the model adequacy for the observed data set.


THANK YOU FOR YOUR ATTENTION

Acknowledgements: The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. 283300.
