Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf ·...

21
Community Detection with Partially Observable Links and Node Attributes Xiaokai Wei, Bokai Cao, Weixiang Shao, Chun-Ta Lu and Philip S. Yu

Transcript of Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf ·...

Page 1: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

Community Detection with Partially Observable Links and Node Attributes

Xiaokai Wei, Bokai Cao, Weixiang Shao, Chun-Ta Lu and Philip S. Yu

Page 2: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

2

Partially Observable Network

§  Networks are partially observable w.r.t. links and attributes under various scenarios: – Online social network: news users typically have no

links and some users does not fill up their profile information. Besides, prudent users might set strict privacy setting which limits the visibility of their profile information and connections.

– Bioinformatics: It is more costly to obtain attributes/interaction links for certain genes.

– Terrorist network: some terrorists’ attribute and linkage information could be difficult to obtain.

Page 3: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

3

Introduction

§  Link and node attributes provide complementary information about nodes

§  Challenge: nodes with only observable links and nodes with only observable attributes are not directly comparable

v4

v3

v2

v1

v6

Node attribute viewLink structure view

v4

v3

v2

v1v5

Page 4: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

4

Introduction

§  POLNA: –  community detection with Partially Observable

Links and Node Attributes

§  Idea: – Learn a latent representation for each node via

kernel matrix alignment based on either link structure or node content

– Link-based representations and content-based representations of fully observable nodes interact with each other by co-regularization

Page 5: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

5

Formulation

Definition 1 Information Network An information network G = (V,E,X)

consists of V , the set of nodes, E ✓ V ⇥ V , the set of links, and the feature

matrix X = [x1,x2, . . . ,xn] (n = |V |), where xi 2 RD(i = 1 . . . n) is the feature

vector of node vi.

Definition 2 Partially Observable Information Network In a partially

observable information network G = (V,E,X), each node belongs to one or

two (overlapping) of the following sets: the set of nodes with observable links

(denoted as Og) and nodes with observable attributes (denoted as Oa

). We

further denote the set of nodes with both observable links and attributes as Os=

Og \ Oa. Note that V = Og [ Oa

since we assume each node has at least one

source of information.

Let the number of nodes with observable links or observable attributes be

ng= |Og| and na

= |Oa|, respectively. For more concise notations in the

following sections, we assign local indices 1 ⇠ ngto the nodes in Og

and 1 ⇠ na

to the nodes in Oa. A mapping function �g

(i) (or �a(i)) maps the local index

i 2 {1, . . . , ng} (or i 2 {1, . . . , na}) in Og(or Oa

) to its global index.

Page 6: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

6

Matrix Alignment

§  Measuring the similarity between two (vectorized) matrices

Matrix Alignment For two matrices K1 2 Rn⇥nand K2 2 Rn⇥n

(assume

||K1||F > 0 and ||K2||F > 0), the alignment between K1 and K2 is defined as

⇢(K1,K2) =Tr(K1K2)

||K1||F · ||K2||F(1)

where Tr(·) is the trace of a matrix.

Unnormalized Matrix Alignment For two matrices K1 2 Rn⇥nand

K2 2 Rn⇥n(assume ||K1||F > 0 and ||K2||F > 0), the alignment between K1

and K2 is defined as

⇢(K1,K2) = Tr(K1K2) (1)

Centered Matrix Alignment For two matrices K1 2 Rn⇥nand K2 2

Rn⇥n(assume ||K1||F > 0 and ||K2||F > 0), the centered alignment between

K1 and K2 is defined as

⇢(K1,K2) =Tr(HK1HHK2H)

=Tr(HK1HK2)

where the second equation can be obtained by noting HH = H and Tr(AB) =

Tr(BA) for arbitrary matrices A,B 2 Rn⇥n. H = I� 1

n11T

Page 7: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

7

Representation Learning with Kernel Matrix Alignment

§  Kernel constructed from latent representations

§  Learn latent representations that best align with the network structure and attribute kernel

Suppose Kg 2 Rng⇥ngand Ka 2 Rna⇥na

are kernel matrices computed from

the latent representation Ugand Ua

with Gaussian Kernel, respectively.

Kgij =e�||Ug

i �Ugj ||

2

Kaij =e�||Ua

i �Uaj ||

2

minUg

fg = �Tr(HLgHKg) + �||Ug||2F

minUa

fa = �Tr(HLaHKa) + �||Ua||2F

Attribute-based representation

Network-based representation

Page 8: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

8

Partial Co-regularization

§  Learn a consensus from the link-based representation and attribute-based representation

§  POLNA with hard constraint

minUa,Ug,U⇤

f =fg⇤ + fa⇤ + f�⇤

=� Tr(HLgHKg)

� Tr(HLaHKa)

+ �||U⇤||2Fs.t. Ug

�g�1(i)= Ua

�a�1(i) = U⇤i

minU⇤

f =� Tr(HLgHKg)

� Tr(HLaHKa)

+ �||U⇤||2F

Page 9: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

9

Partial Co-regularization

§  POLNA with soft constraint – Enforcing the link representation and attribute

representation to be exactly the same might be too strict, since links and attributes could contain certain information reflecting their unique characteristics

minUa,Ug,U⇤

f =fg + fa + fs

=� Tr(HLgHKg) + �||Ug||2F� Tr(HLaHKa) + �||Ua||2F+ ↵

X

i2Os

(||Ug�g�1(i)

�U⇤i ||2F + ||Ua

�a�1(i) �U⇤i ||2F )

Page 10: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

10

Optimization

§  Gradient

@fg

@Ugi

= �ngX

j=1

((HLgH)ij ·@Kg

ij

@Ugi

+ (HLgH)ji ·@Kg

ji

@Ugi

) + 2�Ugi

= 4 ·ngX

j=1

(((HLgH)�Kg)ij�Ug

i �Ugj

�) + 2�Ug

i

Page 11: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

11

Optimization

§  POLNA with hard constraint

Algorithm 1 Optimization for POLNA with Hard Consensus Constraint(POLNA-HC)

1: Input: (Partially observable) feature matrix X, (Partially observable) net-work G, �.

2: Output: Consensus representation U

3: Initialize: Ugi = rand(0, 1), Ua

i = rand(0, 1), U⇤ = 0

4: Update U

⇤ with Eq (15) by L-BFGS

Page 12: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

12

Optimization

§  POLNA with soft constraint – Alternating optimization

Algorithm 1 Alternating Optimization for POLNA with Soft Consensus Con-straint (POLNA-SC)

1: Input: (Partially observable) feature matrix X, (Partially observable) net-work G, � and ↵.

2: Output: Consensus representation U

⇤.3: Initialize: Ug

i = rand(0, 1), Uai = rand(0, 1), U⇤ = 0

4: Set t = 15: while not converged do

6: Find the optimal Ug by L-BFGS with Eq (16)7: if t = 1 then

8: U

⇤i = U

g�g�1(i)

, 8i 2 Os

9: end if

10: Find the optimal Ua by L-BFGS with Eq (17)11: U

⇤i = (Ug

�g�1(i)+U

a�a�1(i))/2, 8i 2 Os

12: t = t+ 113: end while

14: U

⇤i = U

g�g�1(i)

, 8i 2 Og/Os

15: U

⇤i = U

a�a�1(i), 8i 2 Oa/Os

Page 13: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

13

Evaluation

§  Datasets – Citeseer: citation network – Cora: citation network

§  Evaluation Metrics – Accuracy – NMI

Table 1: Statistics of two datasets

Statistics Citeseer Cora

# of instances 3312 2708

# of links 4598 5429

# of attributes 3703 1433

# of classes 6 7

Page 14: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

14

Evaluation

§  Methods: –  NMF: Non-negative Matrix Factorization on combined

network –  Spectral Clustering on combined network –  POLNA-HC: POLNA with hard constraint –  POLNA-SC: POLNA with soft constraint

§  Combined network for baselines: –  Create kNN network from observable node attributes and

combine it with observable link structure

Page 15: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

15

Evaluation

§  Accuracy with different observable rate

00.20.40.60.810.2

0.3

0.4

0.5

0.6

0.7

Percentage of Fully Observable Nodes

Accu

racy

POLNA−SCPOLNA−HCSNMFSpectral

00.20.40.60.810.25

0.3

0.35

0.4

0.45

0.5

0.55

0.6

0.65

Percentage of Fully Observable Nodes

Accu

racy

POLNA−SCPOLNA−HCSNMFSpectral

Citeseer Cora

Page 16: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

16

Evaluation

§  NMI with different observable rate

00.20.40.60.810

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

Percentage of Fully Observable Nodes

NMI

POLNA−SCPOLNA−HCSNMFSpectral

00.20.40.60.810

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

Percentage of Fully Observable Nodes

NMI

POLNA−SCPOLNA−HCSNMFSpectral

Citeseer Cora

Page 17: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

17

Convergence Analysis

§  Accuracy and objective function w.r.t. # iterations

0 2 4 6 8 10 12 14 16 18 200.2

0.3

0.4

0.5

0.6

0.7

Number of Iterations

ACC

0 2 4 6 8 10 12 14 16 18 20−4

−3

−2

−1

0

1x 104

Obj

ectiv

e Fu

nctio

n

ACCObjective Function

0 2 4 6 8 10 12 14 16 18 200.2

0.4

0.6

0.8

Number of Iterations

ACC

0 2 4 6 8 10 12 14 16 18 20−4

−2

0

2x 104

Obj

ectiv

e Fu

nctio

n

ACCObjective Function

Citeseer Cora

Page 18: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

18

Sensitivity Analysis

§  POLNA-HC w.r.t.

00.20.40.60.810.2

0.3

0.4

0.5

0.6

0.7

0.8

Percentage of Fully Observable Nodes

Accu

racy

h=0.01h=0.05h=0.1h=0.5h=1h=5h=10

00.20.40.60.810.2

0.3

0.4

0.5

0.6

0.7

0.8

Percentage of Fully Observable Nodes

Accu

racy

h=0.01h=0.05h=0.1h=0.5h=1h=5h=10

Citeseer Cora

Page 19: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

19

Sensitivity Analysis

§  POLNA-SC w.r.t.

00.20.40.60.810.2

0.3

0.4

0.5

0.6

0.7

0.8

Percentage of Fully Observable Nodes

Accu

racy

h=0.01h=0.05h=0.1h=0.5h=1h=5

00.20.40.60.810.2

0.3

0.4

0.5

0.6

0.7

0.8

Percentage of Fully Observable Nodes

Accu

racy

h=0.01h=0.05h=0.1h=0.5h=1h=5

Citeseer Cora

Page 20: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

20

Sensitivity Analysis

§  POLNA-SC w.r.t.

00.20.40.60.810.2

0.3

0.4

0.5

0.6

0.7

0.8

Percentage of Fully Observable Nodes

Accu

racy

_=0.01_=0.05_=0.1_=0.5_=1_=5_=10

00.20.40.60.810.2

0.3

0.4

0.5

0.6

0.7

0.8

Percentage of Fully Observable Nodes

Accu

racy

_=0.01_=0.05_=0.1_=0.5_=1_=5_=10

Citeseer Cora

Page 21: Community Detection with Partially Observable Links and ...clu/doc/bigdata16_polna_slides.pdf · POLNA with soft constraint – Alternating optimization 3: Algorithm 1 Alternating

21

Questions?