
LinkNBed: Multi-Graph Representation Learning with Entity Linkage

Rakshit Trivedi∗
College of Computing, Georgia Tech

Christos Faloutsos
SCS, CMU and Amazon.com

Bunyamin Sisman
Amazon.com

Hongyuan Zha
College of Computing, Georgia Tech

Jun Ma
Amazon.com

Xin Luna Dong
Amazon.com

Abstract

Knowledge graphs have emerged as an important model for studying complex multi-relational data. This has given rise to the construction of numerous large-scale but incomplete knowledge graphs encoding information extracted from various resources. An effective and scalable approach to jointly learn over multiple graphs and eventually construct a unified graph is a crucial next step for the success of knowledge-based inference for many downstream applications. To this end, we propose LinkNBed, a deep relational learning framework that learns entity and relationship representations across multiple graphs. We identify entity linkage across graphs as a vital component for achieving our goal. We design a novel objective that leverages entity linkage and build an efficient multi-task training procedure. Experiments on link prediction and entity linkage demonstrate substantial improvements over state-of-the-art relational learning approaches.

1 Introduction

Reasoning over multi-relational data is a key concept in Artificial Intelligence, and knowledge graphs have appeared at the forefront as an effective tool to model such multi-relational data. Knowledge graphs have found increasing importance due to their wide range of important applications such as information retrieval (Dalton et al., 2014), natural language processing (Gabrilovich and Markovitch, 2009), recommender systems (Catherine and Cohen, 2016), question answering (Cui et al., 2017)

∗Correspondence: [email protected]. Work done when the author interned at Amazon.

and many more. This has led to increased efforts in constructing numerous large-scale Knowledge Bases (e.g., Freebase (Bollacker et al., 2008), DBpedia (Auer et al., 2007), Google's Knowledge Graph (Dong et al., 2014), Yago (Suchanek et al., 2007) and NELL (Carlson et al., 2010)) that can cater to these applications by representing information available on the web in relational format.

All knowledge graphs share the common drawbacks of incompleteness and sparsity, and hence most existing relational learning techniques focus on using observed triplets in an incomplete graph to infer unobserved triplets for that graph (Nickel et al., 2016a). Neural embedding techniques that learn vector space representations of entities and relationships have achieved remarkable success in this task. However, these techniques only focus on learning from a single graph. In addition to being incomplete, these knowledge graphs also share sets of overlapping entities and relationships with varying information about them. This makes a compelling case to design a technique that can learn over multiple graphs and eventually aid in constructing a unified giant graph out of them. While research on learning representations over a single graph has progressed rapidly in recent years (Nickel et al., 2011; Dong et al., 2014; Trouillon et al., 2016; Bordes et al., 2013; Xiao et al., 2016; Yang et al., 2015), there is a conspicuous lack of principled approaches to tackle the unique challenges involved in learning across multiple graphs.

One approach to multi-graph representation learning could be to first solve the graph alignment problem to merge the graphs and then use existing relational learning methods on the merged graph. Unfortunately, graph alignment is an important but still unsolved problem, and the several techniques addressing its challenges (Liu and Yang, 2016; Pershina et al., 2015; Koutra et al., 2013; Buneman and Staworko, 2016) work only in limited settings.


The key challenges for the graph alignment problem emanate from the fact that real-world data are noisy and intricate in nature. Noisy or sparse data make it difficult to learn robust alignment features, while data abundance leads to computational challenges due to the combinatorial permutations needed for alignment. These challenges are compounded in multi-relational settings due to the heterogeneous nodes and edges in such graphs.

Recently, deep learning has shown significant impact in learning useful information over noisy, large-scale and heterogeneous graph data (Rossi et al., 2017). We therefore posit that combining the graph alignment task with deep representation learning across multi-relational graphs has the potential to induce a synergistic effect on both tasks. Specifically, we identify that a key component of the graph alignment process, entity linkage, also plays a vital role in learning across graphs. For instance, the embeddings learned over two knowledge graphs for an actor should be closer to one another compared to the embeddings of all the other entities. Similarly, the entities that are already aligned across the two graphs should produce better embeddings due to the shared context and data. To model this phenomenon, we propose LinkNBed, a novel deep learning framework that jointly performs representation learning and the graph linkage task. To achieve this, we identify the key challenges involved in the learning process and make the following contributions to address them:

• We propose a novel and principled approach toward jointly learning entity representations and entity linkage. The novelty of our framework stems from its ability to support the linkage task across heterogeneous types of entities.

• We devise a graph-independent inductive framework that learns functions to capture contextual information for entities and relations. It combines the structural and semantic information in the individual graphs for joint inference in a principled manner.

• Labeled instances (specifically positive instances for the linkage task) are typically very sparse, and hence we design a novel multi-task loss function where the entity linkage task is tackled in a robust manner across various learning scenarios, such as learning only with unlabeled instances or only with negative instances.

• We design an efficient training procedure to perform joint training in time linear in the number of triples. We demonstrate the superior performance of our method on two datasets curated from Freebase and IMDB against state-of-the-art neural embedding methods.

2 Preliminaries

2.1 Knowledge Graph Representation

A knowledge graph $G$ comprises a set of facts represented as triplets $(e^s, r, e^o)$ denoting the relationship $r$ between subject entity $e^s$ and object entity $e^o$. Associated with the knowledge graph, we have a set of attributes that describe observed characteristics of an entity. Attributes are represented as a set of key-value pairs for each entity, and an attribute can have a null (missing) value for an entity. We follow the Open World Assumption: triplets not observed in the knowledge graph are considered missing but not false. We assume that there are no duplicate triplets or self-loops.

2.2 Multi-Graph Relational Learning

Definition. Given a collection of knowledge graphs $\mathcal{G}$, Multi-Graph Relational Learning refers to the task of learning information-rich representations of entities and relationships across graphs. The learned embeddings can further be used to infer new knowledge in the form of link prediction or to learn new labels in the form of entity linkage. We motivate our work with the setting of two knowledge graphs where, given two graphs $G_1, G_2 \in \mathcal{G}$, the task is to match an entity $e_{G_1} \in G_1$ to an entity $e_{G_2} \in G_2$ if they represent the same real-world entity. We discuss a straightforward extension of this setting to more than two graphs in Section 7.

Notations. Let $X$ and $Y$ represent realizations of two such knowledge graphs extracted from two different sources. Let $n_e^X$ and $n_e^Y$ represent the number of entities in $X$ and $Y$ respectively. Similarly, $n_r^X$ and $n_r^Y$ represent the number of relations in $X$ and $Y$. We combine the triplets from both $X$ and $Y$ to obtain the set of all observed triplets $D = \{(e^s, r, e^o)_p\}_{p=1}^{P}$, where $P$ is the total number of available records across both graphs. Let $\mathcal{E}$ and $\mathcal{R}$ be the sets of all entities and all relations in $D$ respectively, with $|\mathcal{E}| = n$ and $|\mathcal{R}| = m$. In addition to $D$, we also have a set of linkage labels $L$ for entities between $X$ and $Y$. Each record in $L$ is represented as a triplet $(e_X \in X, e_Y \in Y, l \in \{0, 1\})$, where $l = 1$ when the entities are matched and $l = 0$ otherwise.
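To make the notation concrete, here is a minimal, hypothetical Python sketch (all names such as `triplets_D` are ours, not the paper's) of how the combined triplet set $D$ and the linkage label set $L$ might be stored:

```python
# Hypothetical representation of the combined triplet set D and the linkage
# label set L from Section 2.2 (all identifiers are illustrative).

# Observed facts (e^s, r, e^o); the last field tags the source graph (X or Y).
triplets_D = [
    ("x:actor/1", "acted_in", "x:movie/7", "X"),
    ("y:person/9", "acted_in", "y:film/3", "Y"),
]

# Linkage labels (e_X, e_Y, l): l = 1 when the entities match, 0 otherwise.
linkage_L = [
    ("x:actor/1", "y:person/9", 1),
    ("x:actor/1", "y:person/4", 0),
]

# Global vocabularies over D: |E| = n entities and |R| = m relations.
entities = sorted({e for (s, _, o, _) in triplets_D for e in (s, o)})
relations = sorted({r for (_, r, _, _) in triplets_D})
```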


3 Proposed Method: LinkNBed

We present a novel inductive multi-graph relational learning framework that learns a set of aggregator functions capable of ingesting various contextual information for both entities and relationships in a multi-relational graph. These functions encode the ingested structural and semantic information into low-dimensional entity and relation embeddings. Further, we use these representations to learn a relational score function that computes how likely two entities are to be connected by a particular relationship. The key idea behind this formulation is that when a triplet is observed, the relationship between the two entities can be explained using various contextual information, such as the local neighborhood features of both entities, the attribute features of both entities, and the type information of the entities that participate in that relationship.

We outline two key insights for establishing the relationships between embeddings of the entities over multiple graphs in our framework:

Insight 1 (Embedding Similarity): If two entities $e_X \in X$ and $e_Y \in Y$ represent the same real-world entity, then their embeddings $\mathbf{e}_X$ and $\mathbf{e}_Y$ will be close to each other.

Insight 2 (Semantic Replacement): For a given triplet $t = (e^s, r, e^o) \in X$, denote by $g(t)$ the function that computes a relational score for $t$ using entity and relation embeddings. If there exists a matching entity $e^{s\prime} \in Y$ for $e^s \in X$, denote by $t' = (e^{s\prime}, r, e^o)$ the triplet obtained after replacing $e^s$ with $e^{s\prime}$. In this case, $g(t) \sim g(t')$, i.e., the scores of triplets $t$ and $t'$ will be similar.

For a triplet $(e^s, r, e^o) \in D$, we describe the encoding mechanism of LinkNBed as a three-layered architecture that computes the final output representations $\mathbf{z}^r, \mathbf{z}^{e^s}, \mathbf{z}^{e^o}$ for the given triplet. Figure 1 provides an overview of the LinkNBed architecture, and we describe the three steps below:

3.1 Atomic Layer

Entities, relations, types and attributes are first encoded in their basic vector representations. We use these basic representations to derive more complex contextual embeddings further on.

Entities, Relations and Types. The embedding vectors corresponding to these three components are learned as follows:

$$\mathbf{v}^{e^s} = f(\mathbf{W}^E \mathbf{e}^s) \qquad \mathbf{v}^{e^o} = f(\mathbf{W}^E \mathbf{e}^o) \qquad (1)$$

$$\mathbf{v}^{r} = f(\mathbf{W}^R \mathbf{r}) \qquad \mathbf{v}^{t} = f(\mathbf{W}^T \mathbf{t}) \qquad (2)$$

where $\mathbf{v}^{e^s}, \mathbf{v}^{e^o} \in \mathbb{R}^d$ and $\mathbf{e}^s, \mathbf{e}^o \in \mathbb{R}^n$ are "one-hot" representations of $e^s$ and $e^o$ respectively; $\mathbf{v}^r \in \mathbb{R}^k$ and $\mathbf{r} \in \mathbb{R}^m$ is the "one-hot" representation of $r$; $\mathbf{v}^t \in \mathbb{R}^q$ and $\mathbf{t} \in \mathbb{R}^z$ is the "one-hot" representation of $t$. $\mathbf{W}^E \in \mathbb{R}^{d \times n}$, $\mathbf{W}^R \in \mathbb{R}^{k \times m}$ and $\mathbf{W}^T \in \mathbb{R}^{q \times z}$ are the entity, relation and type embedding matrices respectively. $f$ is a nonlinear activation function (ReLU in our case). $\mathbf{W}^E$, $\mathbf{W}^R$ and $\mathbf{W}^T$ can be initialized randomly or using pre-trained word embeddings or vector compositions based on the name phrases of the components (Socher et al., 2013).

Attributes. For a given attribute $a$ represented as a key-value pair, we use a paragraph2vec-style (Le and Mikolov, 2014) embedding network to learn the attribute embedding. Specifically, we represent the attribute embedding vector as:

$$\mathbf{a} = f(\mathbf{W}^{key} \mathbf{a}^{key} + \mathbf{W}^{val} \mathbf{a}^{val}) \qquad (3)$$

where $\mathbf{a} \in \mathbb{R}^y$, $\mathbf{a}^{key} \in \mathbb{R}^u$ and $\mathbf{a}^{val} \in \mathbb{R}^v$; $\mathbf{W}^{key} \in \mathbb{R}^{y \times u}$ and $\mathbf{W}^{val} \in \mathbb{R}^{y \times v}$. $\mathbf{a}^{key}$ is a "one-hot" vector and $\mathbf{a}^{val}$ is a feature vector. Note that the dimensions of the embedding vectors do not necessarily need to be the same.
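For concreteness, the following is a minimal NumPy sketch of the atomic layer (Eqs. 1-3) under toy dimensions of our choosing; since the inputs are one-hot, each matrix-vector product reduces to a column lookup, which is how we implement it. All names are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, z_types = 1000, 50, 20        # entities, relations, types (toy sizes)
d, k, q = 256, 64, 16               # embedding dims (the paper's d, k, q)

# Embedding matrices W^E, W^R, W^T; a one-hot product is a column lookup.
W_E = rng.normal(scale=0.1, size=(d, n))
W_R = rng.normal(scale=0.1, size=(k, m))
W_T = rng.normal(scale=0.1, size=(q, z_types))

relu = lambda x: np.maximum(x, 0.0)  # f in Eqs. 1-2

def atomic(es_id, eo_id, r_id, t_id):
    v_es = relu(W_E[:, es_id])       # Eq. 1
    v_eo = relu(W_E[:, eo_id])
    v_r = relu(W_R[:, r_id])         # Eq. 2
    v_t = relu(W_T[:, t_id])
    return v_es, v_eo, v_r, v_t

# Attribute embedding (Eq. 3): a = f(W^key a^key + W^val a^val).
y_dim, u, v = 16, 30, 512
W_key = rng.normal(scale=0.1, size=(y_dim, u))
W_val = rng.normal(scale=0.1, size=(y_dim, v))

def attribute_embedding(a_key_onehot, a_val_features):
    return relu(W_key @ a_key_onehot + W_val @ a_val_features)
```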

3.2 Contextual Layer

While the entity and relationship embeddings described above help capture very generic latent features, the embeddings can be further enriched to capture structural information, attribute information and type information to better explain the existence of a fact. Such information can be modeled as the context of nodes and edges in the graph. To this end, we design the following canonical aggregator function that learns various contextual information by aggregating over the relevant embedding vectors:

$$\mathbf{c}(z) = \mathrm{AGG}(\mathbf{z}', \forall z' \in C(z)) \qquad (4)$$

where $\mathbf{c}(z)$ is the vector representation of the aggregated contextual information for component $z$. Here, component $z$ can be either an entity or a relation. $C(z)$ is the set of components in the context of $z$, and $\mathbf{z}'$ are the vector embeddings of those components. AGG is the aggregator function, which can take many forms such as Mean, Max, Pooling or more complex LSTM-based aggregators. It is plausible that different components in a context may have varied impact on the component for which the embedding is being learned. To account for this, we employ a soft attention mechanism where we learn attention coefficients to weight components based on their impact before aggregating them. We modify Eq. 4 as:

$$\mathbf{c}(z) = \mathrm{AGG}(q(z) * \mathbf{z}', \forall z' \in C(z)) \qquad (5)$$

where

$$q(z) = \frac{\exp(\theta_z)}{\sum_{z' \in C(z)} \exp(\theta_{z'})} \qquad (6)$$

and the $\theta_z$'s are the parameters of the attention model.

[Figure 1: LinkNBed Architecture Overview - one-step score computation for a given triplet $(e^s, r, e^o)$. The attribute embeddings are not simple lookups; they are learned as shown in Eq. 3.]

The following contextual information is modeled in our framework:

Entity Neighborhood Context $N_c(e) \in \mathbb{R}^d$. Given a triplet $(e^s, r, e^o)$, the neighborhood context for an entity $e^s$ will be the nodes located near $e^s$ other than the node $e^o$. This captures the effect of the local neighborhood in the graph surrounding $e^s$ that drives $e^s$ to participate in the fact $(e^s, r, e^o)$. We use Mean as the aggregator function. As there can be a large number of neighbors, we collect the neighborhood set for each entity as a pre-processing step using a random walk method. Specifically, given a node $e$, we run $k$ rounds of random walks of length $l$ following (Hamilton et al., 2017) and create the set $N(e)$ by adding all unique nodes visited across these walks. This context can be computed similarly for the object entity.

Entity Attribute Context $A_c(e) \in \mathbb{R}^y$. For an entity $e$, we collect all attribute embeddings for $e$ obtained from the Atomic Layer and learn aggregated information over them using the Max operator given in Eq. 4.

Relation Type Context $T_c(r) \in \mathbb{R}^q$. We use the type context for relation embeddings, i.e., for a given relationship $r$, this context aims at capturing the effect of the types of entities that have participated in this relationship. For a given triplet $(e^s, r, e^o)$, the type context for relationship $r$ is computed by aggregating with Mean over the type embeddings corresponding to the context of $r$. Appendix C provides the specific forms of the contextual information.
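As an illustration of the attention-weighted aggregation in Eqs. 5-6, here is a small self-contained NumPy sketch (function and variable names are ours; the paper does not prescribe an implementation). It stabilizes the softmax numerically and uses a Mean aggregator, matching the neighborhood context described above:

```python
import numpy as np

def soft_attention_aggregate(context_vecs, theta):
    """Sketch of Eqs. 5-6: attention-weighted aggregation over a context C(z).

    context_vecs: array of shape (|C(z)|, dim), the stacked embeddings z'.
    theta: array of shape (|C(z)|,), the attention parameters theta_{z'}.
    """
    q = np.exp(theta - theta.max())   # softmax of Eq. 6, numerically stabilized
    q = q / q.sum()
    # AGG = Mean over the attention-weighted vectors; Max or an LSTM-based
    # reduction could be substituted here, as the paper notes.
    return (q[:, None] * context_vecs).mean(axis=0)

# Example: aggregate a neighborhood context of three d-dimensional neighbors.
rng = np.random.default_rng(1)
c_z = soft_attention_aggregate(rng.normal(size=(3, 256)), rng.normal(size=3))
```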

3.3 Representation Layer

Having computed the atomic and contextual embeddings for a triplet $(e^s, r, e^o)$, we obtain the final embedded representations of the entities and relation in the triplet using the following formulation:

$$\mathbf{z}^{e^s} = \sigma(\underbrace{\mathbf{W}_1 \mathbf{v}^{e^s}}_{\text{Subject Entity Embedding}} + \underbrace{\mathbf{W}_2 N_c(e^s)}_{\text{Neighborhood Context}} + \underbrace{\mathbf{W}_3 A_c(e^s)}_{\text{Subject Entity Attributes}}) \qquad (7)$$

$$\mathbf{z}^{e^o} = \sigma(\underbrace{\mathbf{W}_1 \mathbf{v}^{e^o}}_{\text{Object Entity Embedding}} + \underbrace{\mathbf{W}_2 N_c(e^o)}_{\text{Neighborhood Context}} + \underbrace{\mathbf{W}_3 A_c(e^o)}_{\text{Object Entity Attributes}}) \qquad (8)$$

$$\mathbf{z}^{r} = \sigma(\underbrace{\mathbf{W}_4 \mathbf{v}^{r}}_{\text{Relation Embedding}} + \underbrace{\mathbf{W}_5 T_c(r)}_{\text{Entity Type Context}}) \qquad (9)$$

where $\mathbf{W}_1, \mathbf{W}_2 \in \mathbb{R}^{d \times d}$, $\mathbf{W}_3 \in \mathbb{R}^{d \times y}$, $\mathbf{W}_4 \in \mathbb{R}^{d \times k}$ and $\mathbf{W}_5 \in \mathbb{R}^{d \times q}$. $\sigma$ is a nonlinear activation function, typically Tanh or ReLU.


The rationale for our formulation is as follows: an entity's representation can be enriched by encoding information about the local neighborhood features and attribute information associated with the entity, in addition to its own latent features. Parameters $\mathbf{W}_1, \mathbf{W}_2, \mathbf{W}_3$ learn to capture these different aspects and map them into the entity embedding space. Similarly, a relation's representation can be enriched by encoding information about the entity types that participate in that relationship, in addition to its own latent features. Parameters $\mathbf{W}_4, \mathbf{W}_5$ learn to capture these aspects and map them into the relation embedding space. Further, as the ultimate goal is to jointly learn over multiple graphs, the shared parameterization in our model facilitates the propagation of information across graphs, thereby making it a graph-independent inductive model. The flexibility of the model stems from the ability to shrink it (to a very simple model considering atomic entity and relation embeddings only) or expand it (to a complex model by adding different contextual information) without affecting any other step in the learning procedure.
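A compact NumPy sketch of the representation layer (Eqs. 7-9), assuming Tanh as the activation $\sigma$ and the toy dimensions used earlier; the projection-matrix names follow the paper, everything else is illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
d, y_dim, k, q = 256, 16, 64, 16

# Projection matrices from Eqs. 7-9.
W1, W2 = rng.normal(scale=0.1, size=(2, d, d))   # unpacks into two (d, d) arrays
W3 = rng.normal(scale=0.1, size=(d, y_dim))
W4 = rng.normal(scale=0.1, size=(d, k))
W5 = rng.normal(scale=0.1, size=(d, q))

def entity_representation(v_e, nhbr_ctx, attr_ctx):
    # Eq. 7/8: latent features + neighborhood context + attribute context.
    return np.tanh(W1 @ v_e + W2 @ nhbr_ctx + W3 @ attr_ctx)

def relation_representation(v_r, type_ctx):
    # Eq. 9: latent relation features + entity-type context.
    return np.tanh(W4 @ v_r + W5 @ type_ctx)
```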

3.4 Relational Score Function

Having observed a triplet $(e^s, r, e^o)$, we first use Eqs. 7, 8 and 9 to compute the entity and relation representations. We then use these embeddings to capture the relational interaction between the two entities using the following score function $g(\cdot)$:

$$g(e^s, r, e^o) = \sigma(\mathbf{z}^{r\,T} \cdot (\mathbf{z}^{e^s} \odot \mathbf{z}^{e^o})) \qquad (10)$$

where $\mathbf{z}^r, \mathbf{z}^{e^s}, \mathbf{z}^{e^o} \in \mathbb{R}^d$ are the $d$-dimensional representations of the entities and relationship as described above, $\sigma$ is the nonlinear activation function, and $\odot$ represents the element-wise product.
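The score of Eq. 10 is a direct one-liner in NumPy; we use the logistic sigmoid for $\sigma$ so that scores lie in $[0, 1]$, though the paper leaves $\sigma$ generic (names and the example inputs are ours):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score(z_r, z_es, z_eo):
    """Eq. 10: g(e^s, r, e^o) = sigma(z^r . (z^{e^s} * z^{e^o}))."""
    return sigmoid(z_r @ (z_es * z_eo))   # * is the element-wise product

# Example with random d-dimensional representations.
rng = np.random.default_rng(3)
g_val = score(rng.normal(size=256), rng.normal(size=256), rng.normal(size=256))
```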

4 Efficient Learning Procedure

4.1 Objective Function

The complete parameter space of the model is given by $\Omega = \{\{\mathbf{W}_i\}_{i=1}^{5}, \mathbf{W}^E, \mathbf{W}^R, \mathbf{W}^{key}, \mathbf{W}^{val}, \mathbf{W}^T, \Theta\}$. To learn these parameters, we design a novel multi-task objective function that jointly trains over the two graphs. As identified earlier, the goal of our model is to leverage the available linkage information across graphs for optimizing the entity and relation embeddings such that they can explain the observed triplets across the graphs. Further, we want to leverage these optimized embeddings to match entities across graphs and expand the available linkage information. To achieve this goal, we define the following two loss functions, one for each learning task, and jointly optimize over them as a multi-task objective to learn the model parameters:

Relational Learning Loss. This is the conventional loss function used to learn knowledge graph embeddings. Specifically, given the $p$-th triplet $(e^s, r, e^o)_p$ from the training set $D$, we sample $C$ negative samples by replacing either the head or the tail entity and define a contrastive max-margin function as in (Socher et al., 2013):

$$L_{rel} = \sum_{c=1}^{C} \max\big(0, \gamma - g(e^s_p, r_p, e^o_p) + g'(e^s_c, r_p, e^o_p)\big) \qquad (11)$$

where $\gamma$ is the margin, $e^s_c$ represents the corrupted entity and $g'(e^s_c, r_p, e^o_p)$ represents the corrupted triplet score.

Linkage Learning Loss. We design a novel loss function to leverage the pairwise label set $L$. Given a triplet $(e^s_X, r_X, e^o_X)$ from knowledge graph $X$, we first find the entity $e^+_Y$ from graph $Y$ that represents the same real-world entity as $e^s_X$. We then replace $e^s_X$ with $e^+_Y$ and compute the score $g(e^+_Y, r_X, e^o_X)$. Next, we find the set $E^-_Y$ of all entities from graph $Y$ that have a negative label with entity $e^s_X$. We consider them analogous to the negative samples we generated for Eq. 11. We then propose the label learning loss function as:

$$L_{lab} = \sum_{z=1}^{Z} \max\big(0, \gamma - g(e^+_Y, r_X, e^o_X) + g'(e^-_Y, r_X, e^o_X)_z\big) \qquad (12)$$

where $Z$ is the total number of negative labels for $e_X$, $\gamma$ is the margin (usually set to 1), and $e^-_Y \in E^-_Y$ represents an entity from graph $Y$ with which entity $e^s_X$ has a negative label. Note that this applies symmetrically to triplets that originate from graph $Y$ in the overall dataset, and that if both entities of a triplet have labels, we include both cases when computing the loss. Eq. 12 is inspired by Insight 1 and Insight 2 defined in Section 3. Given a set $D$ of $N$ observed triplets across the two graphs, we define the complete multi-task objective as:

$$L(\Omega) = \sum_{i=1}^{N} \big[ b \cdot L_{rel} + (1 - b) \cdot L_{lab} \big] + \lambda \|\Omega\|_2^2 \qquad (13)$$

where $\Omega$ is the set of all model parameters, $\lambda$ is the regularization hyper-parameter, and $b$ is a weight hyper-parameter used to attribute importance to each task. We train with a mini-batch SGD procedure (Algorithm 1) using the Adam optimizer.
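A plain Python/NumPy sketch of the multi-task objective (Eqs. 11-13); the helper names and the way scores are passed in are our own assumptions, not the paper's API:

```python
import numpy as np

def hinge(pos_score, neg_scores, gamma=1.0):
    """Contrastive max-margin term shared by Eq. 11 and Eq. 12."""
    return np.maximum(0.0, gamma - pos_score + np.asarray(neg_scores)).sum()

def multi_task_loss(rel_terms, lab_terms, omega_sq_norm, b=0.6, lam=1e-4):
    """Sketch of Eq. 13: blend of relational and linkage losses plus L2.

    rel_terms: list of (positive score, corrupted scores) pairs (Eq. 11).
    lab_terms: list of (positive-label score, negative-label scores) pairs
               (Eq. 12); for an entity with no positive label, pass the
               original triplet's own score as the positive-label score
               (the missing-label fallback described later in this section).
    omega_sq_norm: squared L2 norm of all model parameters, ||Omega||_2^2.
    """
    L_rel = sum(hinge(p, negs) for p, negs in rel_terms)
    L_lab = sum(hinge(p, negs) for p, negs in lab_terms)
    return b * L_rel + (1.0 - b) * L_lab + lam * omega_sq_norm
```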


Algorithm 1 LinkNBed mini-batch training

Input: mini-batch M; negative sample size C; negative label size Z; attribute data att_data; neighborhood data nhbr_data; type data type_data; positive label dictionary pos_dict; negative label dictionary neg_dict
Output: mini-batch loss L_M

L_M = 0
score_pos = []; score_neg = []; score_pos_lab = []; score_neg_lab = []
for i = 0 to size(M) do
    (e^s, r, e^o) = M[i]
    sc = compute_triplet_score(e^s, r, e^o)   (Eq. 10)
    score_pos.append(sc)
    for j = 0 to C do
        select e^s_c from the entity list such that e^s_c ≠ e^s, e^s_c ≠ e^o and (e^s_c, r, e^o) ∉ D
        sc_neg = compute_triplet_score(e^s_c, r, e^o)
        score_neg.append(sc_neg)
    end for
    if e^s in pos_dict then
        e^{s+} = positive label for e^s
        sc_pos_l = compute_triplet_score(e^{s+}, r, e^o)
        score_pos_lab.append(sc_pos_l)
    end if
    for k = 0 to Z do
        select e^{s-} from neg_dict
        sc_neg_l = compute_triplet_score(e^{s-}, r, e^o)
        score_neg_lab.append(sc_neg_l)
    end for
end for
L_M += compute_minibatch_loss(score_pos, score_neg, score_pos_lab, score_neg_lab)   (Eq. 13)
Back-propagate errors and update parameters Ω
return L_M

Missing Positive Labels. It is expensive to obtain positive labels across multiple graphs, and hence it is highly likely that many entities will not have positive labels available. For those entities, we modify Eq. 12 to use the original triplet $(e^s_X, r_X, e^o_X)$ in place of the perturbed triplet $(e^+_Y, r_X, e^o_X)$ for the positive label. The rationale here again arises from Insight 2, wherein the embeddings of two duplicate entities should be able to replace each other without affecting the score.

Training Time Complexity. Most contextual information is pre-computed and available to all training steps, which leads to constant-time embedding lookup for those contexts. But for the attribute network, an embedding needs to be computed for each attribute separately, and hence the complexity to compute the score for one triplet is $O(2a)$ where $a$ is the number of attributes. Also, for training we generate $C$ negative samples for the relational loss function and use $Z$ negative labels for the label loss function. Let $k = C + Z$. Hence, the training time complexity for a set of $n$ triplets will be $O(2ak \cdot n)$, which is linear in the number of triplets with a constant factor, as $ak \ll n$ for real-world knowledge graphs. This is desirable, as the number of triplets tends to be very large per graph in multi-relational settings.

Memory Complexity. We borrow notations from (Nickel et al., 2016a) and describe the parameter complexity of our model in terms of the number of each component and the corresponding embedding dimension requirements. Let $H_a = 2 N_e H_e + N_r H_r + N_t H_t + N_k H_k + N_v H_v$. The parameter complexity of our model is then $H_a \cdot (H_b + 1)$. Here, $N_e, N_r, N_t, N_k, N_v$ signify the number of entities, relations, types, attribute keys and the vocab size of attribute values across both datasets, and $H_b$ is the output dimension of the hidden layer.

5 Experiments

5.1 Datasets

We evaluate LinkNBed and the baselines on two real-world knowledge graphs: D-IMDB (derived from a large-scale IMDB data snapshot) and D-FB (derived from a large-scale Freebase data snapshot). Table 1 provides statistics for the final datasets used in the experiments. Appendix B.1 provides complete details about the dataset processing.

Dataset   # Entities   # Relations   # Attributes   # Entity Types   # Available Triples
D-IMDB    378,207      38            23             41               143,928,582
D-FB      39,667       146           69             324              22,140,475

Table 1: Statistics for Datasets: D-IMDB and D-FB

5.2 Baselines

We compare the performance of our method against state-of-the-art representation learning baselines that use neural embedding techniques to learn entity and relation representations. Specifically, we consider the compositional methods RESCAL (Nickel et al., 2011), a basic matrix factorization method; DISTMULT (Yang et al., 2015), a simple multiplicative model good at capturing symmetric relationships; and ComplEx (Trouillon et al., 2016), an upgrade over DISTMULT that can capture asymmetric relationships using complex-valued embeddings. We also compare against the translational model STransE, which combined the original structured embedding with TransE and has shown state-of-the-art performance in benchmark testing (Kadlec et al., 2017). Finally, we compare with GAKE (Feng et al., 2016), a model that captures context in entity and relationship representations.

In addition to the above state-of-the-art models, we analyze the effectiveness of the different components of our model by comparing against various versions that use partial information. Specifically, we report results on the following variants: LinkNBed - Embed Only (only entity embeddings), LinkNBed - Attr Only (only attribute context), LinkNBed - Nhbr Only (only neighborhood context), LinkNBed - Embed + Attr (entity embeddings and attribute context), LinkNBed - Embed + Nhbr (entity embeddings and neighborhood context), and LinkNBed - Embed All (all three contexts).

5.3 Evaluation Scheme

We evaluate our model using two inference tasks:

Link Prediction. Given a test triplet $(e^s, r, e^o)$, we first score this triplet using Eq. 10. We then replace $e^o$ with all other entities in the dataset and filter the resulting set of triplets as shown in (Bordes et al., 2013). We score the remaining set of perturbed triplets using Eq. 10. All the scored triplets are sorted by score, and the rank of the ground-truth triplet is used for evaluation. We use this ranking mechanism to compute HITS@10 (predicted rank $\leq 10$) and the reciprocal rank ($\frac{1}{\text{rank}}$) of each test triplet. We report the mean over all test samples.
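For reference, a small Python sketch of these filtered ranking metrics (helper names are ours; it assumes the corrupted triplets have already been filtered as in (Bordes et al., 2013)):

```python
def filtered_rank(true_score, corrupted_scores):
    """Rank of the ground-truth triplet among its corruptions.

    corrupted_scores: scores of the perturbed triplets after removing any
    corruption that is itself an observed triplet (the "filtered" protocol).
    """
    return 1 + sum(s > true_score for s in corrupted_scores)

def hits_at_10_and_mrr(ranks):
    """HITS@10 (fraction of ranks <= 10) and mean reciprocal rank."""
    hits10 = sum(r <= 10 for r in ranks) / len(ranks)
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    return hits10, mrr
```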

Entity Linkage. In alignment with Insight 2, we pose a novel evaluation scheme to perform entity linkage. Let there be two ground-truth test sample triplets: $(e_X, e^+_Y, 1)$ representing a positive duplicate label and $(e_X, e^-_Y, 0)$ representing a negative duplicate label. Algorithm 2 outlines the procedure to compute the linkage probability or score $q \in [0, 1]$ for a pair $(e_X, e_Y)$.

Algorithm 2 Entity Linkage Score Computation

Input: test pair (e_X ∈ X, e_Y ∈ Y).
Output: linkage score q.

1. Collect all triplets involving e_X from graph X and all triplets involving e_Y from graph Y into a combined set O. Let |O| = k.
2. Construct S_orig ∈ R^k: for each triplet o ∈ O, compute its score g(o) using Eq. 10 and store the score in S_orig.
3. Create the triplet set O' as follows:
   if o ∈ O contains e_X ∈ X, replace e_X with e_Y to create the perturbed triplet o' and store it in O';
   if o ∈ O contains e_Y ∈ Y, replace e_Y with e_X to create the perturbed triplet o' and store it in O'.
4. Construct S_repl ∈ R^k: for each triplet o' ∈ O', compute its score g(o') using Eq. 10 and store the score in S_repl.
5. Compute q: elements in S_orig and S_repl have a one-to-one correspondence, so take the mean absolute difference q = |S_orig − S_repl|_1.
return q

We use the L1 distance between the two score vectors, analogous to the Mean Absolute Error (MAE). In lieu of hard-labeling test pairs, we use the score $q$ to compute the Area Under the Precision-Recall Curve (AUPRC).

For the baselines and the unsupervised version of our model (with no labels for entity linkage), we use a second-stage multilayer neural network as the classifier for evaluating entity linkage. Appendix B.2 provides the training configuration details.
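A NumPy sketch of Algorithm 2 (names are ours; `score_fn` stands for the learned score $g$ of Eq. 10). It returns the mean absolute difference between the original and entity-swapped triplet scores, so $q \in [0, 1]$ when $g$ is a sigmoid score:

```python
import numpy as np

def linkage_score(triplets_X, triplets_Y, score_fn, eX, eY):
    """Sketch of Algorithm 2: linkage score q for a candidate pair (eX, eY).

    triplets_X: triplets (s, r, o) from graph X that involve eX.
    triplets_Y: triplets (s, r, o) from graph Y that involve eY.
    score_fn: the learned score g(e_s, r, e_o) of Eq. 10.
    """
    O = triplets_X + triplets_Y                                 # step 1
    s_orig = np.array([score_fn(s, r, o) for (s, r, o) in O])   # step 2

    def swap(e):                                                # step 3: O'
        return eY if e == eX else (eX if e == eY else e)

    s_repl = np.array([score_fn(swap(s), r, swap(o))            # step 4
                       for (s, r, o) in O])
    return float(np.abs(s_orig - s_repl).mean())                # step 5
```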

5.4 Predictive Analysis

Link Prediction Results. We train the LinkNBed model jointly across the two knowledge graphs and then perform inference over the individual graphs to report link prediction performance. For the baselines, we train each baseline on the individual graphs and use the graph-specific parameters to perform link prediction inference over each individual graph. Table 2 shows the link prediction performance for all methods. Our model variant with the attention mechanism outperforms all the baselines, with a 4.15% improvement over the single-graph state-of-the-art ComplEx model on D-IMDB and an 8.23% improvement on the D-FB dataset.


Method                            D-IMDB-HITS10   D-IMDB-MRR   D-FB-HITS10   D-FB-MRR
RESCAL                            75.3            0.592        69.99         0.147
DISTMULT                          79.5            0.691        72.34         0.556
ComplEx                           83.2            0.725        75.67         0.629
STransE                           80.7            0.421        69.87         0.397
GAKE                              69.5            0.114        63.22         0.093
LinkNBed - Embed Only             79.9            0.612        73.2          0.519
LinkNBed - Attr Only              82.2            0.676        74.7          0.588
LinkNBed - Nhbr Only              80.1            0.577        73.4          0.572
LinkNBed - Embed + Attr           84.2            0.673        78.39         0.606
LinkNBed - Embed + Nhbr           81.7            0.544        73.45         0.563
LinkNBed - Embed All              84.3            0.725        80.2          0.632
LinkNBed - Embed All (Attention)  86.8            0.733        81.9          0.677
Improvement (%)                   4.15            1.10         7.61          7.09

Table 2: Link Prediction Results on both datasets

D-FB is the more challenging dataset to learn, as it has a large set of sparse relationships, types and attributes, and it has an order of magnitude less relational evidence (number of triplets) compared to D-IMDB. Hence, LinkNBed's pronounced improvement on D-FB demonstrates the effectiveness of the model. The simplest version of LinkNBed, with only entity embeddings, resembles the DISTMULT model with a different objective function; the close performance of those two models therefore aligns with the expected outcome. We observed that the neighborhood context alone provides only marginal improvements, while the model benefits more from the use of attributes. Despite being marginal, the attention mechanism also improves accuracy on both datasets. Compared to the baselines, which are trained and evaluated on individual graphs, our superior performance demonstrates the effectiveness of multi-graph learning.

Entity Linkage Results. We report entity linkage results for our method in two settings: (a) the supervised case, where we train using both objective functions, and (b) the unsupervised case, where we learn with only the relational loss function. The latter case resembles the baseline training, where each model is trained separately on the two graphs in an unsupervised manner. For performing entity linkage in the unsupervised case, for all models we first train a second-stage simple neural network classifier and then perform inference. In the supervised case, we use Algorithm 2 for performing the inference. Table 3 shows the performance of all methods on this task. Our method significantly outperforms all the baselines, with a 33.86% improvement over the second-best baseline in the supervised case and 17.35% better performance in the unsupervised case. The difference in our method's performance between the two cases demonstrates that the two training objectives help one another by learning across the graphs. GAKE's superior performance on this task compared to the other state-of-the-art relational baselines shows the importance of using contextual information for entity linkage.

Method                            AUPRC (Supervised)   AUPRC (Unsupervised)
RESCAL                            -                    0.327
DISTMULT                          -                    0.292
ComplEx                           -                    0.359
STransE                           -                    0.231
GAKE                              -                    0.457
LinkNBed - Embed Only             0.376                0.304
LinkNBed - Attr Only              0.451                0.397
LinkNBed - Nhbr Only              0.388                0.322
LinkNBed - Embed + Attr           0.512                0.414
LinkNBed - Embed + Nhbr           0.429                0.356
LinkNBed - Embed All              0.686                0.512
LinkNBed - Embed All (Attention)  0.691                0.553
Improvement (%)                   33.86                17.35

Table 3: Entity Linkage Results - the unsupervised case uses a classifier at the second step

The performance of the other variants of our model again demonstrates that attribute information is more helpful than neighborhood context, and that attention provides marginal improvements. We provide further insights, with examples and a detailed discussion of the entity linkage task, in Appendix A.

6 Related Work

6.1 Neural Embedding Methods for Relational Learning

Compositional Models learn representations via various composition operators on entity and relational embeddings. These models are multiplicative in nature and highly expressive, but often suffer from scalability issues. Initial models include RESCAL (Nickel et al., 2011), which uses a relation-specific weight matrix to explain triplets via pairwise interactions of latent features; the Neural Tensor Network (Socher et al., 2013), a more expressive model that combines a standard NN layer with a bilinear tensor layer; and (Dong et al., 2014), which employs a concatenation-projection method to project entities and relations to a lower-dimensional space. Later, more sophisticated models (the Neural Association Model (Liu et al., 2016), HolE (Nickel et al., 2016b)) were proposed. Path-based composition models (Toutanova et al., 2016) and contextual models such as GAKE (Feng et al., 2016) have recently been studied to capture more information from graphs. Recently, models like ComplEx (Trouillon et al., 2016) and Analogy (Liu et al., 2017) have demonstrated state-of-the-art performance on relational learning tasks.


Translational Models ((Bordes et al., 2014), (Bordes et al., 2011), (Bordes et al., 2013), (Wang et al., 2014), (Lin et al., 2015), (Xiao et al., 2016)) learn representations by employing translational operators on the embeddings and optimizing based on their score. They offer an additive and efficient alternative to expensive multiplicative models, though due to their simplicity they often lose expressive power. For a comprehensive survey of relational learning methods and empirical comparisons, we refer the reader to (Nickel et al., 2016a), (Kadlec et al., 2017), (Toutanova and Chen, 2015) and (Yang et al., 2015). None of these methods address multi-graph relational learning, and they cannot be adapted to tasks like entity linkage in a straightforward manner.

6.2 Entity Resolution in Relational Data

Entity resolution refers to resolving entities available in knowledge graphs with entity mentions in text. (Dredze et al., 2010) proposed an entity disambiguation method for KB population, (He et al., 2013) learn entity embeddings for resolution, (Huang et al., 2015) propose a sophisticated DNN architecture for resolution, (Campbell et al., 2016) propose entity resolution across multiple social domains, (Fang et al., 2016) jointly embed text and knowledge graph to perform resolution, while (Globerson et al., 2016) propose an attention mechanism for collective entity resolution.

6.3 Learning across multiple graphs

Recently, learning over multiple graphs has gained traction. (Liu and Yang, 2016) divide a multi-relational graph into multiple homogeneous graphs and learn associations across them by employing a product operator; unlike our work, they do not learn across multiple multi-relational graphs. (Pujara and Getoor, 2016) provide logic-based insights for cross-graph learning, (Pershina et al., 2015) perform pairwise entity matching across multi-relational graphs, which is very expensive, (Chen et al., 2017) learn embeddings to support multi-lingual learning, and Big-Align (Koutra et al., 2013) tackles the graph alignment problem efficiently for bipartite graphs. None of these methods learn latent representations or jointly train graph alignment and learning, which is the goal of our work.

7 Concluding Remarks and Future Work

We present a novel relational learning framework that learns entity and relationship embeddings across multiple graphs. The proposed representation learning framework leverages an efficient learning and inference procedure that takes into account duplicate entities representing the same real-world entity in a multi-graph setting. We demonstrate superior accuracy on link prediction and entity linkage tasks compared to existing approaches that are trained only on individual graphs. We believe that this work opens a new research direction in joint representation learning over multiple knowledge graphs.

Many data-driven organizations such as Google and Microsoft take the approach of constructing a unified super-graph by integrating data from multiple sources. Such unification has been shown to significantly help in various applications such as search, question answering, and personal assistance. To this end, there exists a rich body of work on linking entities and relations, and on conflict resolution (e.g., knowledge fusion (Dong et al., 2014)). Still, the problem remains challenging for large-scale knowledge graphs, and this paper proposes a deep learning solution that can play a vital role in this construction process. In a real-world setting, we envision our method being integrated into a large-scale system that would include various other components for tasks like conflict resolution, active learning and human-in-the-loop learning to ensure the quality of the constructed super-graph. However, we point out that our method is not restricted to such use cases; one can readily apply our method to directly make inferences over multiple graphs to support applications like question answering and conversations.

For future work, we would like to extend the current evaluation of our work from a two-graph setting to multiple graphs. A straightforward approach is to create a unified dataset out of more than two graphs by combining the sets of triplets as described in Section 2, and to apply learning and inference on the unified graph without any major change in the methodology. Our inductive framework learns functions to encode contextual information and hence is graph independent. Alternatively, one can develop more sophisticated approaches that iteratively merge and learn over pairs of graphs until all graphs in an input collection are exhausted.

Acknowledgments

We would like to give special thanks to Ben London, Tong Zhao, Arash Einolghozati, Andrew Borthwick and many others at Amazon for helpful comments and discussions. We thank the reviewers for their valuable comments and efforts towards improving our manuscript. This project was supported in part by NSF (IIS-1639792, IIS-1717916).


References

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In The Semantic Web.

Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: A collaboratively created graph database for structuring human knowledge. In SIGMOD Conference.

Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. 2014. A semantic matching energy function for learning with multi-relational data. Machine Learning.

Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787-2795.

Antoine Bordes, Jason Weston, Ronan Collobert, and Yoshua Bengio. 2011. Learning structured embeddings of knowledge bases. In AAAI.

Peter Buneman and Slawek Staworko. 2016. RDF graph alignment with bisimulation. Proc. VLDB Endow.

W. M. Campbell, Lin Li, C. Dagli, J. Acevedo-Aviles, K. Geyer, J. P. Campbell, and C. Priebe. 2016. Cross-domain entity resolution in social media. arXiv:1608.01386v1.

Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proceedings of the Twenty-Fourth Conference on Artificial Intelligence (AAAI 2010).

Rose Catherine and William Cohen. 2016. Personalized recommendations using knowledge graphs: A probabilistic logic programming approach. In Proceedings of the 10th ACM Conference on Recommender Systems.

Muhao Chen, Yingtao Tian, Mohan Yang, and Carlo Zaniolo. 2017. Multilingual knowledge graph embeddings for cross-lingual knowledge alignment.

Wanyun Cui, Yanghua Xiao, Haixun Wang, Yangqiu Song, Seung-won Hwang, and Wei Wang. 2017. KBQA: Learning question answering over QA corpora and knowledge bases. Proc. VLDB Endow.

Jeffrey Dalton, Laura Dietz, and James Allan. 2014. Entity query feature expansion using knowledge base links. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval.

Xin Dong, Evgeniy Gabrilovich, Geremy Heitz, Wilko Horn, Ni Lao, Kevin Murphy, Thomas Strohmann, Shaohua Sun, and Wei Zhang. 2014. Knowledge vault: A web-scale approach to probabilistic knowledge fusion. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 601-610.

Mark Dredze, Paul McNamee, Delip Rao, Adam Gerber, and Tim Finin. 2010. Entity disambiguation for knowledge base population. In Proceedings of the 23rd International Conference on Computational Linguistics.

Wei Fang, Jianwen Zhang, Dilin Wang, Zheng Chen, and Ming Li. 2016. Entity disambiguation by knowledge and text jointly embedding. In CoNLL.

Jun Feng, Minlie Huang, Yang Yang, and Xiaoyan Zhu. 2016. GAKE: Graph aware knowledge embedding. In COLING.

Evgeniy Gabrilovich and Shaul Markovitch. 2009. Wikipedia-based semantic interpretation for natural language processing. J. Artif. Int. Res.

Amir Globerson, Nevena Lazic, Soumen Chakrabarti, Amarnag Subramanya, Michael Ringaard, and Fernando Pereira. 2016. Collective entity resolution with multi-focal attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.

William L. Hamilton, Rex Ying, and Jure Leskovec. 2017. Representation learning on graphs: Methods and applications. arXiv:1709.05584.

Zhengyan He, Shujie Liu, Mu Li, Ming Zhou, Longkai Zhang, and Houfeng Wang. 2013. Learning entity representation for entity disambiguation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics.

Hongzhao Huang, Larry Heck, and Heng Ji. 2015. Leveraging deep neural networks and knowledge graphs for entity disambiguation. arXiv:1504.07678v1.

Rudolf Kadlec, Ondrej Bajgar, and Jan Kleindienst. 2017. Knowledge base completion: Baselines strike back. In Proceedings of the 2nd Workshop on Representation Learning for NLP.

Danai Koutra, Hanghang Tong, and David Lubensky. 2013. Big-Align: Fast bipartite graph alignment. In 2013 IEEE 13th International Conference on Data Mining.

Quoc Le and Tomas Mikolov. 2014. Distributed representations of sentences and documents. In Proceedings of the 31st International Conference on Machine Learning.

Yankai Lin, Zhiyuan Liu, Maosong Sun, and Xuan Zhu. 2015. Learning entity and relation embeddings for knowledge graph completion. AAAI Conference on Artificial Intelligence.

Hanxiao Liu, Yuexin Wu, and Yiming Yang. 2017. Analogical inference for multi-relational embeddings. In Proceedings of the 34th International Conference on Machine Learning.

Hanxiao Liu and Yiming Yang. 2016. Cross-graph learning of multi-relational associations. In Proceedings of the 33rd International Conference on Machine Learning.

Quan Liu, Hui Jiang, Andrew Evdokimov, Zhen-Hua Ling, Xiaodan Zhu, Si Wei, and Yu Hu. 2016. Probabilistic reasoning via deep learning: Neural association models. arXiv:1603.07704v2.

Maximilian Nickel, Kevin Murphy, Volker Tresp, and Evgeniy Gabrilovich. 2016a. A review of relational machine learning for knowledge graphs. Proceedings of the IEEE.

Maximilian Nickel, Lorenzo Rosasco, and Tomaso Poggio. 2016b. Holographic embeddings of knowledge graphs.

Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 809-816.

Maria Pershina, Mohamed Yakout, and Kaushik Chakrabarti. 2015. Holistic entity matching across knowledge graphs. In 2015 IEEE International Conference on Big Data (Big Data).

Jay Pujara and Lise Getoor. 2016. Generic statistical relational entity resolution in knowledge graphs. In Sixth International Workshop on Statistical Relational AI.

Ryan A. Rossi, Rong Zhou, and Nesreen K. Ahmed. 2017. Deep feature learning for graphs. arXiv:1704.08829.

Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in Neural Information Processing Systems, pages 926-934.

Fabian M. Suchanek, Gjergji Kasneci, and Gerhard Weikum. 2007. Yago: A core of semantic knowledge. In Proceedings of the 16th International Conference on World Wide Web.

Kristina Toutanova and Danqi Chen. 2015. Observed versus latent features for knowledge base and text inference. In ACL.

Kristina Toutanova, Xi Victoria Lin, Wen-tau Yih, Hoifung Poon, and Chris Quirk. 2016. Compositional learning of embeddings for relation paths in knowledge bases and text. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1434-1444.

Théo Trouillon, Johannes Welbl, Sebastian Riedel, Éric Gaussier, and Guillaume Bouchard. 2016. Complex embeddings for simple link prediction. In Proceedings of the 33rd International Conference on Machine Learning.

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. 2014. Knowledge graph embedding by translating on hyperplanes.

Han Xiao, Minlie Huang, and Xiaoyan Zhu. 2016. TransG: A generative model for knowledge graph embedding. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics.

Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2015. Embedding entities and relations for learning and inference in knowledge bases. arXiv:1412.6575.


Appendix

A Discussion and Insights on Entity Linkage Task

The entity linkage task is novel in the space of multi-graph learning and has not yet been tackled by existing relational learning approaches, hence we analyze our performance on the task in more detail here. We acknowledge that the baseline methods are not tailored to the task of entity linkage, so their low performance is natural. But we observe that our model performs well even in the unsupervised scenario, where essentially the linkage loss function is switched off and our model becomes a relational learning baseline. We believe that the inductive ability of our model and its shared parameterization help capture knowledge across graphs and allow for better linkage performance. This outcome demonstrates the merit of multi-graph learning for different inference tasks. Having said that, we admit that our results are far from comparable to state-of-the-art linkage results (Das et al., 2017), and much work needs to be done to advance representation and relational learning methods to support effective entity linkage. But we note that our model works for multiple types of entities in a very heterogeneous environment with some promising results, which serves as evidence for pursuing this direction for the entity linkage task.

We now discuss several use-case scenarios where our model did not perform well, to gain insight into what further steps can be pursued to improve over this initial model:

Han Solo with many attributes (false-negative example). Han Solo is a fictional character in Star Wars and appears in both the D-IMDB and D-FB records. We have a positive label for this sample, but we do not predict it correctly. Our model combines multiple components to effectively learn across graphs, so we investigated all the components to check for failures. One observation is the mismatch in the amount of attributes across the two datasets, which is further compounded by multi-value attributes. As described, we use a paragraph2vec-like model to learn attribute embeddings, where for each attribute we aggregate over all its values. This appears to produce very noisy embeddings. As we have seen, attributes affect the final result with high impact, and hence learning very noisy attributes is not helping. Further, the mismatch in the number of types is also an issue: even after filtering the types, the difference is quite large. Types are also included as attributes, and they contribute context to relation embeddings. We believe that the skew in type difference leads the model to learn poor embeddings. This specifically happens in cases where a lot of information is available, as for Han Solo, since it creates a scenario of abundant noisy data. Based on our investigation, we believe that contextual embeddings need further sophistication to handle such scenarios. Further, as we already learn relation, type and attribute embeddings in addition to entity embeddings, aligning relations, types and attributes as an integral task could also be an important future direction.

Alfred Pennyworth is never the subject of the matter (false-negative example). In this case, we observe a new pattern which was found in many other examples. While there are many triples available for this character in D-IMDB, very few triplets are available in D-FB. This skew in the availability of data hampers the learning of the deep network, which ends up learning very different embeddings for the two realizations. Further, we observe another pattern: Alfred Pennyworth appears only as an object in those few D-FB triplets, while it appears as both subject and object in D-IMDB. Accounting for asymmetric relationships in an explicit manner may be helpful in this scenario.

Thomas Wayne is Martha Wayne! (false-positive example). This is a case of an abundance of similar contextual information, as our model predicts Thomas Wayne and Martha Wayne to be the same entity. The two characters share a lot of context, and hence many triples, attributes, neighborhoods etc. are similar for both of them, eventually leading to very similar embeddings. Further, as we have seen before, the neighborhood has proven to be a weak context, which seems to hamper the learning in this case. Finally, the key insight here is the need to attend to the very few discriminative features for the entities in both datasets (e.g., male vs. female), and hence a more sophisticated attention mechanism would help.


In addition to the above specific use cases, we would like to discuss insights on the following general concepts that naturally arise when learning over multiple graphs:

• Entity Overlap Across Graphs. In terms of overlap, one needs to distinguish between *real* and *known* overlap between entities. For the known overlap between entities, we use that knowledge in the linkage loss function $L_{lab}$, but our method does not need to assume either type of overlap. In case there is no real overlap, the model will learn embeddings as if they were on two separate graphs and hence will only provide marginal (if any) improvement over state-of-the-art embedding methods for single graphs. If there is real overlap but no known overlap (i.e., no linked entity labels), the only change is that Equation (13) will ignore the term $(1 - b) \cdot L_{lab}$. Table 3 shows that in this case (corresponding to AUPRC (Unsupervised)), we are still able to learn similar embeddings for graph entities corresponding to the same real-world entity.

• Disproportionate Evidence for Entities Across Graphs. While a higher proportion of occurrences helps to provide more evidence for training an entity embedding, the overall quality of the embedding will also be affected by all other contexts, and hence we expect varied entity-specific behavior when entities occur in different proportions across the two graphs.

• Ambiguity vs. Accuracy. The effect of ambiguity on accuracy depends on the type of semantic differences. For example, we observed that similar entities with major differences in attributes across graphs hurt accuracy, while the impact is not as prominent for similar entities when only their neighborhoods differ.

B Implementation Details

B.1 Additional Dataset Details

We perform light pre-processing on the dataset to remove self-loops from triples, clean the attributes to remove garbage characters, and collapse CVT (Compound Value Type) entities into single triplets. Further, we observe that there is a big skew in the number of types between D-IMDB and D-FB. D-FB contains much non-informative type information such as #base.∗. We remove all such non-informative types from both datasets, which retains 41 types in D-IMDB and 324 types in D-FB. This filtering does not reduce the number of entities or triples by a significant amount (fewer than 1000 entities were filtered).

For comparing at scale with the baselines, we further reduce the dataset using techniques similar to those adopted in producing the widely accepted FB-15K or FB-237K datasets. Specifically, we filter relational triples such that both entities in any triple contained in our dataset must appear in more than k triples. We use k = 50 for D-FB and k = 100 for D-IMDB, as D-IMDB has orders of magnitude more triples compared to D-FB in our curated datasets. We still maintain the overall ratio of the number of triples between the two datasets.

Positive and Negative Labels. We obtain 500,662 positive labels using the existing links between the two datasets. Note that any entity can have only one positive label. We also generate 20 negative labels for each entity using the following method: (i) randomly select 10 entities from the other graph such that both entities belong to the same type and no positive label exists between the entities; (ii) randomly select 10 entities from the other graph such that the two entities belong to different types.
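A hypothetical Python sketch of this negative-label sampling procedure; the data structures (`pools_by_type`, `positives`) are our assumptions about how the entity pools and known positive links could be represented:

```python
import random

def sample_negative_labels(entity, entity_type, pools_by_type, positives):
    """Hypothetical sketch of the negative-label generation in Appendix B.1.

    pools_by_type: dict mapping an entity type to the list of entities of that
    type in the *other* graph; positives: set of known positive pairs.
    """
    same_type = [e for e in pools_by_type.get(entity_type, [])
                 if (entity, e) not in positives]
    diff_type = [e for t, pool in pools_by_type.items() if t != entity_type
                 for e in pool]
    # (i) 10 same-type negatives with no positive link, (ii) 10 cross-type negatives.
    return (random.sample(same_type, min(10, len(same_type))) +
            random.sample(diff_type, min(10, len(diff_type))))
```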

B.2 Training Configurations

We performed a hyper-parameter grid search to obtain the best performance of our method and finally used the following configuration to obtain the reported results: entity embedding size 256, relation embedding size 64, attribute embedding size 16, type embedding size 16, and attribute value embedding size 512. We tried multiple batch sizes with very minor differences in performance and finally used a size of 2000. For hidden units per layer, we use size 64. We used C = 50 negative samples and Z = 20 negative labels. The learning rate was initialized as 0.01 and then decayed over epochs. We ran our experiments for 5 epochs, after which the training starts to converge as the dataset is very large. We use loss weight b = 0.6 and margin 1.


Further, we use k = 50 random walks of length l = 3 for each entity. We used a train/test split of 60%/40% for both the triples set and the labels set. For the baselines, we used the implementations provided by the respective authors and performed a grid search for all methods according to their requirements.

C Contextual Information Formulations

Here we describe the exact formulation of each context used in our work.

Neighborhood Context. Given a triplet $(e^s, r, e^o)$, the neighborhood context for the entity $e^s$ will be all the nodes at 1-hop distance from $e^s$ other than the node $e^o$. This captures the effect of other nodes in the graph surrounding $e^s$ that drive $e^s$ to participate in the fact $(e^s, r, e^o)$. Concretely, we define the neighborhood context of $e^s$ as follows:

$$N_c(e^s) = \frac{1}{n_{e'}} \sum_{\substack{e' \in N(e^s) \\ e' \neq e^o}} \mathbf{v}^{e'} \qquad (14)$$

where $N(e^s)$ is the set of all entities in the neighborhood of $e^s$ other than $e^o$. We collect the neighborhood set for each entity as a pre-processing step using a random walk method. Specifically, given a node $e$, we run $k$ rounds of random walks of length $l$ and create the neighborhood set $N(e)$ by adding all unique nodes visited across these walks. Note that we can also use the max function in (14) instead of the sum. $N_c(e^s) \in \mathbb{R}^d$, and the context can be computed similarly for the object entity.
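A small Python sketch of the random-walk neighborhood collection described above (the `adj` adjacency map is a hypothetical input of ours; the defaults k = 50 and l = 3 match the values reported in Appendix B.2):

```python
import random

def neighborhood_set(adj, e, k=50, l=3):
    """Collect N(e) via k random walks of length l from node e (Appendix C).

    adj: dict mapping each entity to the list of its 1-hop neighbors.
    """
    visited = set()
    for _ in range(k):
        node = e
        for _ in range(l):
            neighbors = adj.get(node, [])
            if not neighbors:
                break
            node = random.choice(neighbors)
            visited.add(node)
    return visited

# N_c(e^s) of Eq. 14 is then the mean (or max) of the embeddings of the
# visited nodes, excluding the object entity e^o of the triplet at hand.
```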

Attribute Context. For an entity $e^s$, the corresponding attribute context is defined as

$$A_c(e^s) = \frac{1}{n_a} \sum_{i=1}^{n_a} \mathbf{a}_i^{e^s} \qquad (15)$$

where $n_a$ is the number of attributes and $\mathbf{a}_i^{e^s}$ is the embedding of attribute $i$. $A_c(e^s) \in \mathbb{R}^y$.

Type Context. We use the type context mainly for relationships, i.e., for a given relationship $r$, this context aims at capturing the effect of the types of entities that have participated in this relationship. For a given triplet $(e^s, r, e^o)$, we define the type context for relationship $r$ as:

$$T_c(r) = \frac{1}{n_t^r} \sum_{i=1}^{n_t^r} \mathbf{v}^{t'_i} \qquad (16)$$

where $n_t^r$ is the total number of types of entities that have participated in relationship $r$ and $\mathbf{v}^{t'_i}$ is the type embedding corresponding to type $t'_i$. $T_c(r) \in \mathbb{R}^q$.