
Research Article
On Efficient Link Recommendation in Social Networks Using Actor-Fact Matrices

Michał Ciesielczyk, Andrzej Szwabe, and Mikołaj Morzy

Poznan University of Technology, Piotrowo 2, 60-965 Poznan, Poland

Correspondence should be addressed to Mikołaj Morzy; mikolaj.morzy@put.poznan.pl

Received 28 February 2014; Revised 21 November 2014; Accepted 21 November 2014

Academic Editor: Reda Alhajj

Copyright © 2015 Michał Ciesielczyk et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Link recommendation is a popular research subject in the field of social network analysis and mining. Often, the main emphasis is put on the development of new recommendation algorithms, semantic enhancements to existing solutions, design of new similarity measures, and so forth. However, relatively little scientific attention has been paid to the impact that various data representation models have on the performance of recommendation algorithms. By performance we do not mean the time or memory efficiency of algorithms, but the precision and recall of recommender systems. Our recent findings consistently show that the choice of network representation model has an important and measurable impact on the quality of recommendations. In this paper we argue that the computation quality of link recommendation algorithms depends significantly on the social network representation, and we advocate the use of the actor-fact matrix as the best alternative. We verify our findings using several state-of-the-art link recommendation algorithms, such as SVD, RSVD, and RRI, using both single-relation and multirelation datasets.

1. Introduction

Link recommendation, along with link prediction, is a popular research topic in the domain of social network analysis and mining [1]. Numerous algorithms have been proposed over the years [2]. The main objective of link recommendation and prediction is to predict, based on historical data, unobserved relationships and interactions between actors of a social network [3]. It should be stressed that the term "link" is used here freely, as the task can refer to predicting possible (existing or future) relationships between people, recommending interesting resources to actors of the network, or discovering latent similarities between objects. Usually a distinction is drawn between link prediction (where the task is to evaluate the probability of a given relationship's existence between actors) and link recommendation (where the task is to select the top $k$ resources relevant to a given actor). One can see, however, that it is relatively easy to combine the two tasks under a single framework. For the sake of brevity we will refer to both problems as "link recommendation" throughout this paper. Link recommendation is predicated on the existence of data, either panel data or event data [4]. Panel data refer to snapshots of the social network taken at certain intervals and representing a possibly coarse-grained view of existing relationships. In contrast, event data refer to detailed records of activities between actors in the network. Event data are time-stamped and fine-grained and often result from automated measurements or transactions. These two types of data are merged, processed, and split into a training set and a test set for the purpose of training link recommendation models.

Although link recommendation tasks have attracted significant attention from the scientific community in recent years, in our opinion relatively little work has been done on the impact of data representation models on the quality of recommendations. By far the most popular data representation model is an actor-object matrix, where actors of the social network are represented as rows and objects that are the subject of recommendations are represented as columns. The cells of such a matrix may contain either binary flags denoting the existence of a relation (e.g., Adam likes "The Police") or a value of the relation, either discrete or numerical (e.g., Beth ate at "Pizza Paradise" and rated it with 4.5 stars). One may note that the social network need not be a bipartite graph. When the relation is defined between actors (e.g., Carol likes Douglas), the actor-object matrix simply becomes a square matrix. The situation becomes slightly more complex in the case of multirelational social networks, where multiple different relations of possibly varying semantics may exist between actors in the network. A typical example is a network where actors may express both fondness for and rejection of certain objects (e.g., Eve likes to watch comedy movies, but she hates horror movies). If the storage of relation values is permitted by a given data model, multirelational networks may be modeled by assigning distinct values (or sets of values) to particular relations, but for a binary actor-object matrix it is necessary to represent each relation by a separate matrix and to have the recommendation algorithm process multiple matrices.

Hindawi Publishing Corporation, Scientific Programming, Volume 2015, Article ID 450215, 9 pages, http://dx.doi.org/10.1155/2015/450215

In this paper we argue that the actor-object matrix is not the optimal data model for recommendation algorithms. Our experiments show that the transformation from the actor-object to the actor-fact matrix significantly improves recommendation quality, as measured by the popular "area under the receiver operating characteristic curve" (AUROC) measure. We perform extensive experiments on a large real-world dataset to support our claims. Given that the vast majority of link recommendation algorithms for social networks compute actor-object, actor-actor, or object-object similarities by applying linear algebra to data representation matrices, the superiority of the actor-fact matrix representation becomes quite obvious (in particular for methods based on the singular value decomposition paradigm). The original contribution of this paper consists in the introduction of two elements:

(i) a data representation method based on a binary actor-fact matrix;

(ii) a similarity quasimeasure based on the 1-norm length of the Hadamard product of the given tuple of vectors.

Our key finding is that the proposed data representation and the new similarity measure, when combined with reflexive matrix processing, significantly outperform state-of-the-art collaborative filtering methods based on the use of a standard actor-object matrix.

Our paper is organized as follows. In Section 2 we report on the related work on the subject and present the referenced recommendation algorithms. Section 3 introduces the concept of the actor-fact matrix. In Section 4 we present the evaluation methodology of the actor-fact matrix representation, and we report the results of the conducted experiments in Section 5. The paper concludes in Section 6 with a brief summary.

2. Related Work

By far the most popular approach to link recommendation in social networks is collaborative filtering using an input matrix which represents each actor as a vector in the space of objects and each object as a vector in the space of actors. Many previous works consider building a model of collaborative similarity from a model of content-based interobject relations to be the most promising hybrid link recommendation technique [5, 6]. As far as algebraic representations of graph data are concerned, the actor-fact matrix model is similar to the model described in [7]. Indeed, our model was inspired by the semantic data model of RDF triples. Also, as far as the algebraic transformation of the graph data is concerned, the model presented in this paper may be regarded as similar to RDF data search methods which are based on spreading activation, realized by means of iterative matrix data processing [8] or a single multiplication by a random projection matrix [7]. However, the latter method is limited to RDF graph node search using a traditional bilateral similarity measure, whereas we extend the model by using a vector-space quasisimilarity measure, which allows the likelihood of an unknown relationship to be computed efficiently.

In our evaluation we use three main types of collaborative filtering recommender algorithms. The baseline is established by a simple popularity-based algorithm favoring objects having the highest number of positive relationships in the training set [9]. Next, we have employed several different approaches to the input matrix decomposition. Firstly, we have used the algorithm based on reflexive random indexing [10]. Secondly, we have used two types of algorithms that are based on the singular value decomposition: a traditional implementation of the method (PureSVD), in which actor vectors are represented as combinations of object vectors without any specific parameterization, and an implementation of the randomized singular value decomposition (RSVD) [11], which is a combination of reflexive random indexing and SVD. We have made this choice because SVD-based methods have long been considered to be the most efficient recommender engines in real-world settings [12–15].
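The popularity baseline described above can be illustrated with a minimal sketch. The function names, the toy matrix, and the choice to exclude objects an actor already likes are assumptions made for illustration; the paper only states that the baseline favors objects with the most positive relationships in the training set.

```python
import numpy as np


def popularity_scores(train_likes):
    """Count, for every object (column), its positive relations in the training set."""
    return np.asarray(train_likes).sum(axis=0)


def recommend_top_k(train_likes, actor, k):
    """Return the indices of the k most popular objects not yet linked to `actor`."""
    train_likes = np.asarray(train_likes)
    scores = popularity_scores(train_likes).astype(float)
    # Assumption: objects the actor already likes are not recommended again.
    scores[train_likes[actor] == 1] = -np.inf
    # A stable sort keeps the lower column index first when counts tie.
    ranked = np.argsort(-scores, kind="stable")
    return ranked[:k].tolist()


# Hypothetical toy training matrix: 3 actors (rows) x 4 objects (columns).
toy_train = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
```

Note that the scores do not depend on the actor at all, which is why such a baseline cannot model individual preference profiles.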

Section 5 presents the results of the conducted experiments. Since our data have the form of binary propositions (i.e., our social network is a signed network), the evaluation of the proposed method is oriented toward the task of finding relevant links [16] rather than toward the minimization of recommendation rating error. Classification metrics, such as the area under the ROC curve (AUROC), measure the probability of making correct or incorrect decisions by the recommender algorithm about whether an object is relevant. Moreover, classification metrics tolerate differences between actual and predicted values as long as they do not lead to wrong decisions. Thus, these metrics are appropriate for examining binary relevance relationships. In particular, when using AUROC it is assumed that the ordering among relevant items does not matter. According to [17], AUROC is equivalent to the probability of the system being able to choose properly between two objects, one randomly selected from the set of relevant objects and one randomly selected from the set of nonrelevant objects. For this reason, the results of the theoretical research are evaluated by means of experiments based on quality measures that are probabilistically interpretable, such as AUROC.
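The pairwise interpretation of AUROC given above can be written down directly. The sketch below is only an illustration of that interpretation, not the evaluation code used in the experiments; the function name and the convention of counting ties as one half are assumptions.

```python
def auroc(relevant_scores, nonrelevant_scores):
    """Fraction of (relevant, nonrelevant) pairs that the scores order correctly."""
    wins = 0.0
    for r in relevant_scores:
        for n in nonrelevant_scores:
            if r > n:
                wins += 1.0   # the relevant object is ranked higher: correct choice
            elif r == n:
                wins += 0.5   # assumption: a tie counts as a coin flip
    return wins / (len(relevant_scores) * len(nonrelevant_scores))


# Hypothetical prediction scores for two relevant and three nonrelevant objects.
example_auroc = auroc([0.9, 0.4], [0.5, 0.1, 0.4])
```

In this toy case 4.5 of the 6 pairs are ordered correctly, so the value is 0.75. The ordering among the relevant objects themselves never enters the computation.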

3. Actor-Fact Matrix

Let us recall that our model is influenced by the semantic model of RDF triples. Each RDF triple combines information about the predicate that relates a subject to an object. We consider a generic social network (for simplicity we constrain ourselves to nonvalued relations, but the proposed method may easily be extended to valued relations), which conceptually consists of a set of actors $A = \{a_1, \ldots, a_n\}$, a set of objects $O = \{o_1, \ldots, o_m\}$, and a set of relations $R = \{R_1, \ldots, R_l\}$, where each relation $R_i$ represents a function $R_i : A \times O \to \{0, 1\}$. Let us now combine all actors, objects, and possible predicates into a single set $E = A \cup O \cup R$. Furthermore, let $|A| = n$, $|O| = m$, and $|R| = l$. Of course, there is no requirement that the set of actors be separate from the set of objects; that is, in general it is possible that $A \subseteq O$. It should be noted, though, that if the sets $A$ and $O$ overlapped, that is, if they were represented by the same vectors, it would not be possible to take advantage of the semantics of actors constituting relationships. In other words, putting actors and objects together into a single set would make it impossible to distinguish between semantically correct relationships, such as "Alice likes apples," and semantically incorrect relationships, such as "apples like Alice." Being able to encode such semantics directly in the social network matrix representation is obviously a very desirable property, but this issue is outside the scope of this paper.

We refer to the set of actual instances of relations as the set of facts, denoted by $F$, and let $|F| = f$. The binary actor-fact matrix is defined as $X = [x_{ij}]_{(n+m+l) \times f}$, where each column of the matrix $X$ represents a single fact (i.e., an existing dyad connected in the social network by a relation), each row of the matrix $X$ represents an entity (actor, object, or relation), and each column contains exactly three nonzero entries; that is, for each $j$ there exist exactly three nonzero entries $x_{i_1 j}$, $x_{i_2 j}$, and $x_{i_3 j}$ such that $1 \le i_1 \le n$, $n + 1 \le i_2 \le n + m$, and $n + m + 1 \le i_3 \le n + m + l$ (the rows containing these three nonzero entries correspond to the actor, object, and relation of a given dyad or, in RDF parlance, to the subject, object, and predicate of a triple). At the same time, the number of nonzero entries in each row represents the number of dyads in which a given actor or object participates, or the number of dyads of a given relation.

Let us consider the simple social network depicted in Figure 1. It represents two different relationships between the actors Alice, Bob, Titanic, and Star Wars. The relationships between these actors are "likes" and "is a friend of." Implicitly, we understand that liking is a relationship between an actor representing a person and an actor representing a movie, whereas being a friend of is a relationship between two actors representing persons. This network can easily be transformed into the actor-fact model. Three facts exist in this network:

(i) $\mathit{fact}_1$: Alice is a friend of Bob;

(ii) $\mathit{fact}_2$: Alice likes Titanic;

(iii) $\mathit{fact}_3$: Bob likes Star Wars.

In our actor-fact matrix representation, $\mathit{fact}_1$ will constitute a column in the matrix $X$, and this column will have nonzero entries for the cells $X[\mathit{Alice}, \mathit{fact}_1]$, $X[\mathit{is\ a\ friend\ of}, \mathit{fact}_1]$, and $X[\mathit{Bob}, \mathit{fact}_1]$. The entire social network from Figure 1 is presented in the actor-fact matrix representation in Table 1.

Figure 1: Example of a social network (nodes: Alice, Bob, Titanic, Star Wars; edge labels: "is a friend of", "likes").

Table 1: Actor-fact matrix representing the network from Figure 1.

                  fact_1   fact_2   fact_3
Alice                1        1        0
Bob                  1        0        1
Is a friend of       1        0        0
Likes                0        1        1
Titanic              0        1        0
Star Wars            0        0        1
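The construction of Table 1 from the three facts can be sketched as follows. This is a minimal illustration assuming a dense NumPy array and a fixed row order copied from Table 1; the helper name `build_actor_fact_matrix` is hypothetical, and a realistic implementation would use a sparse matrix.

```python
import numpy as np


def build_actor_fact_matrix(facts, entities):
    """Build a binary matrix with one row per entity and one column per fact."""
    row_of = {name: i for i, name in enumerate(entities)}
    X = np.zeros((len(entities), len(facts)), dtype=int)
    for j, (subject, predicate, obj) in enumerate(facts):
        # Each fact switches on exactly three rows: its subject, predicate, and object.
        for name in (subject, predicate, obj):
            X[row_of[name], j] = 1
    return X


# Row order follows Table 1.
ENTITIES = ["Alice", "Bob", "is a friend of", "likes", "Titanic", "Star Wars"]
FACTS = [
    ("Alice", "is a friend of", "Bob"),   # fact_1
    ("Alice", "likes", "Titanic"),        # fact_2
    ("Bob", "likes", "Star Wars"),        # fact_3
]

X = build_actor_fact_matrix(FACTS, ENTITIES)
```

Every column of `X` sums to three, and each row sum gives the number of facts in which the corresponding entity takes part.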

When using the actor-fact matrix as the data representation, one has to perform the prediction generation step in a special way. Initially, as in many of the most accurate collaborative filtering methods, the missing values of the input matrix are estimated. To achieve this, the input matrix $X$ is processed into its reconstructed form $\hat{X}$ using one of the evaluated recommendation algorithms. Afterwards, each of the predictions is calculated as the 1-norm length of the Hadamard product of row vectors. The Hadamard product (also known as the Schur product or the entrywise product) of two vectors $X = [x_1, \ldots, x_n]$ and $Y = [y_1, \ldots, y_n]$ of the same length $n$ is defined as

$$X \circ Y = [x_1 \cdot y_1, \ldots, x_n \cdot y_n]. \tag{1}$$

Each dyad forms a proposition which is the subject of the likelihood estimation. More formally, the prediction value $p_{ijk}$ is calculated according to the formula

$$p_{ijk} = \left\| \hat{x}_i \circ \hat{x}_j \circ \hat{x}_k \right\|_1, \tag{2}$$

where $\hat{x}_i$, $\hat{x}_j$, and $\hat{x}_k$ are the row vectors of the reconstructed matrix $\hat{X} = [\hat{x}_{ij}]_{(n+m+l) \times f}$ corresponding to the elements of the given dyad, and the symbol $\circ$ represents the Hadamard product.

For instance, using the proposed measure on the example shown in Table 1, one may predict the likelihood of Alice liking the movie Titanic (i.e., the likelihood of the joint incidence of the entities Alice, likes, and Titanic, represented by the row vectors $X[\mathit{Alice}, \cdot] = [1, 1, 0]$, $X[\mathit{likes}, \cdot] = [0, 1, 1]$, and $X[\mathit{Titanic}, \cdot] = [0, 1, 0]$, resp.). This likelihood equals $\|[1, 1, 0] \circ [0, 1, 1] \circ [0, 1, 0]\|_1 = 1$. Conversely, the likelihood of any nonexistent fact, such as Bob likes Titanic, equals 0. Naturally, the practical value of such a measure lies in estimating the likelihood of missing links after the application of appropriate collaborative filtering algorithms.
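The worked example can be checked with a few lines of code. The sketch below applies the 1-norm of the Hadamard product to the raw rows of Table 1; the function name `joint_incidence` is hypothetical, and in the method itself the rows would come from the reconstructed matrix rather than from the raw input.

```python
import numpy as np


def joint_incidence(*row_vectors):
    """1-norm length of the Hadamard (entrywise) product of the given row vectors."""
    product = np.ones(len(row_vectors[0]), dtype=float)
    for vector in row_vectors:
        product = product * np.asarray(vector, dtype=float)
    return float(np.abs(product).sum())


# Rows of Table 1 (columns are fact_1, fact_2, fact_3).
alice = [1, 1, 0]
bob = [1, 0, 1]
likes = [0, 1, 1]
titanic = [0, 1, 0]

alice_likes_titanic = joint_incidence(alice, likes, titanic)   # existing fact
bob_likes_titanic = joint_incidence(bob, likes, titanic)       # nonexistent fact
```

On the binary input matrix the score simply counts the columns in which all three entities co-occur, so it is 1 for the existing fact and 0 for the nonexistent one. The function accepts any number of vectors, which mirrors the remark that the measure extends to larger tuples.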

The proposed formula may be seen as a generalization of the dot product formula, as in the hypothetical case of measuring the quasisimilarity of two (rather than three) vectors the formula is equivalent to the dot product of the two vectors. It should also be noted that the measure may easily be extended to a larger number of vectors. The interpretation of the proposed formula as the likelihood of the joint incidence of two or more facts represented as vectors is based on the quantum information retrieval model [18]. It has to be admitted that for the methods presented in this paper the coordinates of the modeled entities' representations do not formally denote probabilities. Therefore, formally speaking, the proposed method may be regarded as a technique for providing the likelihood of the joint incidence of two or more events represented as vectors, which is inspired by the quantum information retrieval model of probability calculation.
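The two-vector case mentioned above can be spelled out as a short derivation. The index $c$ over the $f$ columns and the explicit nonnegativity condition are additions made here for illustration; the condition is what makes the absolute values in the 1-norm vanish.

```latex
\[
  \left\| x \circ y \right\|_1
  = \sum_{c=1}^{f} \left| x_c \, y_c \right|
  = \sum_{c=1}^{f} x_c \, y_c
  = x \cdot y
  \qquad \text{whenever } x_c \, y_c \ge 0 \text{ for every } c .
\]
```

For binary or otherwise nonnegative vectors the condition always holds. When coordinates may be negative, as the paper notes for SVD results, the 1-norm and the dot product can differ.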

4. Evaluation Methodology

Let us now present the evaluation methodology for the experiments. Our goal is to quantitatively compare two matrix-based methods of social network representation, the classical actor-object matrix and the new actor-fact matrix, from the point of view of the link recommendation task. Taking into consideration that link recommendation tasks may vary, we have additionally considered two subproblems: one-class link recommendation, where the aim is to discover only the missing links of a single relation (e.g., for a given actor, recommend to her a set of possible new friends), and biclass link recommendation, where the aim is to discover missing links of one particular relation while not recommending any of the links of another relation (e.g., for a given actor, show him possible friends who share theatrical preferences, but do not recommend any new movies). The combination of the two results in the following four scenarios:

(i) S1: using a single relation and an actor-object matrix $B = [b_{ij}]_{n \times m}$, where $n$ is the number of actors and $m$ is the number of objects; this scenario corresponds to friend recommendations using only information on friendship between actors;

(ii) S2: using two antagonistic relations and an actor-object matrix $B = [b_{ij}]_{n \times m}$, where $n$ is the number of actors and $m$ is the number of objects; this scenario corresponds to friend recommendations using information on friendship and dislike between actors;

(iii) S3: using a single relation and an actor-fact matrix $X = [x_{ij}]_{(n+m+l) \times f}$, where $n$ is the number of actors, $m$ is the number of objects, and $l = 1$ is the number of predicates (in this case a single predicate);

(iv) S4: using two antagonistic relations and an actor-fact matrix $X = [x_{ij}]_{(n+m+l) \times f}$, where $n$ is the number of actors, $m$ is the number of objects, and $l = 2$ is the number of predicates.

In order to evaluate the effect of the data representation model on collaborative filtering methods, we have decided to use one of the most widely referenced datasets in the recommender systems area. We have deliberately chosen to turn a typical recommender system dataset into an artificial social network instead of using a genuine network (e.g., the Facebook friend graph or the Twitter followers graph) because we also wanted to compare our results with previous results; thus, we needed a well-established benchmark dataset. The MovieLens ML100k set was collected over various periods of time from Internet users who expressed their opinions on different movies in order to receive personalized recommendations. It contains 100,000 ratings of 1682 movies given by 943 unique users. Each rating above the average for a given movie has been treated as an indication that a user likes the movie. Analogously, each rating below the average has been used as an indication that a user dislikes the movie. Finally, training and test datasets were generated by randomly dividing the set of all known facts into two subsets. The data were divided according to the specified training ratio, denoted by $tr$. To compensate for the impact that the randomness in the dataset partitioning has on the results of the presented methods, each plot in this section shows a series of values that represent averaged results of individual experiments.
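The preprocessing described above can be sketched on a toy rating list. Everything named here (`ratings_to_facts`, `split_facts`, the predicate labels, the toy ratings, and the seed) is hypothetical. The paper does not say how ratings exactly equal to the movie average are handled; this sketch simply drops them.

```python
import numpy as np


def ratings_to_facts(ratings):
    """Turn (user, movie, rating) tuples into 'likes' / 'dislikes' facts."""
    totals, counts = {}, {}
    for _, movie, rating in ratings:
        totals[movie] = totals.get(movie, 0.0) + rating
        counts[movie] = counts.get(movie, 0) + 1
    facts = []
    for user, movie, rating in ratings:
        movie_mean = totals[movie] / counts[movie]
        if rating > movie_mean:
            facts.append((user, "likes", movie))
        elif rating < movie_mean:
            facts.append((user, "dislikes", movie))
        # Assumption: a rating equal to the movie average yields no fact.
    return facts


def split_facts(facts, tr, seed=0):
    """Randomly split the facts into a training part (ratio tr) and a test part."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(facts))
    cut = int(round(tr * len(facts)))
    train = [facts[i] for i in order[:cut]]
    test = [facts[i] for i in order[cut:]]
    return train, test


# Hypothetical toy ratings; the real input would be the ML100k rating file.
toy_ratings = [
    ("u1", "m1", 5), ("u2", "m1", 3), ("u3", "m1", 4),
    ("u1", "m2", 2), ("u2", "m2", 4),
]
toy_facts = ratings_to_facts(toy_ratings)
toy_train, toy_test = split_facts(toy_facts, tr=0.5)
```

Repeating the split with several seeds and averaging the resulting scores corresponds to the averaging over individual experiments mentioned above.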

As we have previously stated, four recommendation algorithms are used: a simple popularity-based method, a traditional SVD (PureSVD), a randomized version of SVD (RSVD), and reflexive random indexing (RRI). To clarify, the actual algorithm being used is collaborative filtering (CF), but it works on a matrix decomposed using the above algorithms. The decomposition of the original matrix is, of course, necessary to make collaborative filtering computation feasible in practice. In real social networks the size of the matrix (actor-object and actor-fact in particular) is so large that vector similarity computation in the original dimensions is infeasible. Each of the methods has been tested using the following parameters (where applicable):

(i) vector dimension: 256, 512, 768, 1024, 1536, 2048;

(ii) seed length: 2, 4, 8;

(iii) SVD $k$-cut: 2, 4, 6, 8, 10, 12, 14, 16, 20, 24.

The number of dimensions (i.e., the SVD $k$-cut value) which we have used in the experiments may appear quite small when compared to a typical LSI application scenario. This choice has been made in order to avoid overfitting, in accordance with the assumptions concerning dimensionality reduction sensitivity presented in [14]. Moreover, it has been observed that for each investigated scenario the optimal algorithm performance was achieved for an SVD $k$-cut value less than or equal to 16, so experiments for $k$-cuts higher than 24 were not necessary. We have also varied the number of reflections used in RRI and RSVD between 3 and 15.
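The role of the $k$-cut parameter can be illustrated by a plain rank-$k$ reconstruction. This is only a generic truncated SVD written with NumPy, assuming a small dense matrix; the PureSVD and RSVD implementations evaluated in the paper may differ in details that are not described here.

```python
import numpy as np


def truncated_svd_reconstruction(matrix, k):
    """Reconstruct `matrix` from its k largest singular triplets (the k-cut)."""
    U, s, Vt = np.linalg.svd(np.asarray(matrix, dtype=float), full_matrices=False)
    k = min(k, len(s))
    # Scale the first k left singular vectors by their singular values, then project back.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# The actor-fact matrix from Table 1, used here only as a small test input.
table1 = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
])

rank1 = truncated_svd_reconstruction(table1, 1)
rank2 = truncated_svd_reconstruction(table1, 2)
rank3 = truncated_svd_reconstruction(table1, 3)
```

With a small `k` the reconstruction is dense and contains nonzero estimates in cells that were zero in the input, which is what allows missing values to be scored. With `k` equal to the rank of the input, the original matrix is recovered and nothing new is predicted.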


Figure 2: AUROC results for S1 and S3 scenarios (horizontal axis: tr, training ratio; vertical axis: AUROC; scenarios: Popularity, S1, S3; methods: Popularity, PureSVD, RRI, RSVD).

Figure 3: AUROC results for S2 and S4 scenarios (vertical axis: AUROC).




5. Experiments

Figures 2 and 3 show a comparison of the investigated recommendation algorithms, each using either the classical actor-object or the actor-fact matrix data representation. The comparison has been performed using the AUROC measure and datasets of various sparsity. The presented results have been obtained using optimized parameters for each method and each data model. Figure 2 presents AUROC evaluation results obtained for the network consisting of a single relation (i.e., only positive ratings), whereas Figure 3 presents analogous results obtained for the full network, that is, the one containing both positive and negative relations.

As has been confirmed experimentally, the actor-fact data representation matrix yields recommendation quality which is higher than that obtained with the classical actor-object matrix representation. It can be observed that the advantage of the proposed model is especially visible in the case of employing the full network, containing both positive and negative relations, together with the RRI method. Such behavior is the result of the more native ability to represent multiple relations provided by the actor-fact model.

One may realize that the popularity-based algorithm, instead of modeling actors' preference profiles, simply reflects the ratio between the number of positive relation instances (hits) and negative relation instances (misses) for the most popular objects in a given network. Since a random procedure is used to divide the dataset into a training set and a test set, the values of AUROC observed for the popularity-based algorithm are almost identical for both $tr = 0.2$ and $tr = 0.8$, which additionally confirms the reliability of the AUROC measurement.

In Figures 4, 5, and 6 the impact of the data representation method on the performance evaluation results is presented. As can be seen, the application of the new fact-based data representation method, accompanied by the Hadamard-based reconstruction technique, improves the results of using RRI for both single-relation and multiple-relation networks (see Figures 4(a) and 4(b)). Moreover, for the network with multiple relations, RRI outperforms any other presented method. It may be concluded that, in the context of the proposed data representation scheme, the calculation of the 1-norm length of the Hadamard product is an operation that is synergistic with reflective data processing.

On the other hand, the application of the new representation method, accompanied by the reconstruction technique based on the Hadamard product, decreases the quality of results of using PureSVD for both single-relation and multiple-relation networks (see Figures 5(a) and 5(b)). The reason for such behavior is the fact that the prediction method based on the Hadamard product is not compatible with data processing techniques based on the SVD decomposition. In the case of using SVD dimensionality reduction, the input matrix reconstruction result should rather be used directly as the set of prediction values. The comparatively low quality of the method based on PureSVD and the Hadamard product may be explained by the nonprobabilistic nature of SVD results; this is especially evident in cases when the vectors multiplied together (by means of the Hadamard product) have negative coordinates, which indicates that they have no probabilistic interpretation.

Furthermore, in the case of using RSVD (see Figures 6(a) and 6(b)), which is a combination of RI-based preprocessing and SVD-based vector space optimization, the application of the new data representation method improves the performance when a single-relation network is concerned (especially for small values of the ratio $tr$). On the other hand, the application of the new data representation method decreases the system performance for the multiple-relation network scenario, for the same reasons as in the case of using PureSVD. It may additionally be concluded that, when methods based on dimensionality reduction are used, the new representation method performs relatively (i.e., with respect to results obtained for standard representation methods) better for smaller values of the ratio $tr$, that is, for sparser datasets, for which the recommendation task is harder.

Figure 4: Impact of the input data representation method on AUROC results achieved using the RRI method. (a) One relation (scenarios S1 and S3). (b) Two relations (scenarios S2 and S4). The vertical axis shows AUROC.

Figure 5: Impact of the input data representation method on AUROC results achieved using the PureSVD method. (a) One relation (scenarios S1 and S3). (b) Two relations (scenarios S2 and S4). The vertical axis shows AUROC.

Figure 6: Impact of the input data representation method on AUROC results achieved using the RSVD method. (a) One relation (scenarios S1 and S3). (b) Two relations (scenarios S2 and S4). The vertical axis shows AUROC.

Figure 7 presents the performance of the recommender algorithms as compared in the investigated scenarios (i.e., in scenarios S1–S4). It may be concluded that, as was already shown in [11], the RSVD method outperforms other methods (i.e., PureSVD and RRI) when the standard input data representation is used. As far as the S1 scenario is concerned, that is, the one with the standard data representation based on the actor-object coincidence matrix and a single relation, it may be seen that, in general, the decomposition-based methods (i.e., PureSVD and RSVD) achieve comparable recommendation quality and that, in general, these methods perform better than RRI (for various values of the training ratio). It may also be seen that the decomposition-based methods behave quite differently in the S3 scenario, in which the novel fact-based data representation is used: in that case, RSVD is the method which not only outperforms all the other methods compared in the scenario (including PureSVD) but also provides a high recommendation quality for various values of the training ratio. When analyzed together, the S1 and S3 scenarios show the superiority of RSVD in cases when a single-relation network is used. Moreover, as long as RSVD is combined with the fact-based data representation, it provides the most reliable recommendation quality, which is higher than the quality observed when any other method is used for the majority of investigated values of the training ratio. In the case of scenario S4 (fact-based data representation with multiple relations), the RRI method outperforms both decomposition-based methods, which shows the compatibility of the Hadamard-based reconstruction technique with the reflective processing of multirelational data. Such a combination, that is, the application of RRI together with the fact-based multirelational data representation, provides the highest recommendation quality among all the combinations presented in this paper. As the RRI method does not involve any computationally expensive spectral decomposition, this result may be very valuable from the perspective of the practical applicability of RRI-based link recommendation systems in real-world scenarios.

The results of the experiments presented herein clearly indicate that the presence of additional information about the negative relation improves the recommendation quality. The results for the S2 and S4 scenarios (see Figure 3) are significantly better than the results obtained in the S1 and S3 scenarios (see Figure 2). However, the main conclusion from the experiments is that the best quality is observed in scenario S4 (in which the proposed data representation and prediction method have been applied) when RRI-based data processing is used.

The results of the comparison show that, in general, as long as the proposed multirelational actor-fact matrix data representation is used, the reflective processing methods (in particular RRI) outperform the well-known SVD-based dimensionality reduction methods. While trying to explain this observation, one may note that the typical actor-object matrix (representing only positive relations between actors and objects) is equivalent to a part of another, much bigger matrix. This bigger matrix may be obtained as the result of multiplying the actor-fact matrix (with both actors and objects represented as rows) by its transpose. The "submatrix" of the bigger matrix (together with its transposed "clone") is just a typical collaborative filtering matrix: it represents the "magnitudes" of the actor-object positive preference relation. Demonstrating this correspondence between the actor-fact matrix format and the widely used actor-object matrix format (typically used together with SVD-based dimensionality reduction) requires an additional matrix multiplication (i.e., an additional reflection). Therefore, it may be expected that, as long as the proposed data representation is used, only reflective data processing methods can take full advantage of it by applying appropriately many reflections. To put this observation (confirmed by the results of the experiments presented herein) in other words, while using the fact-based data representation, SVD-based collaborative filtering methods need at least one more matrix multiplication to provide recommendation quality comparable to that achieved by means of optimized reflective matrix processing.
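The correspondence between the two formats can be checked on the small network from Table 1. The sketch below is an illustration only: the row indices follow Table 1, and the variable names are hypothetical.

```python
import numpy as np

# Actor-fact matrix from Table 1; rows: Alice, Bob, is a friend of, likes, Titanic, Star Wars.
X = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
    [0, 1, 0],
    [0, 0, 1],
])

# The "bigger matrix": entity-by-entity co-occurrence counts over all facts.
G = X @ X.T

person_rows = [0, 1]   # Alice, Bob
movie_rows = [4, 5]    # Titanic, Star Wars

# Person-by-movie block of G: the classical actor-object matrix for this network.
actor_object = G[np.ix_(person_rows, movie_rows)]
```

Here `actor_object` equals `[[1, 0], [0, 1]]`, that is, Alice likes Titanic and Bob likes Star Wars. `G` is symmetric, so the transposed copy of the block appears in the movie-by-person position. With several relations the block would count every fact linking a pair regardless of its predicate, so separating the relations would require using the predicate rows as well.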

6 Conclusions

The new framework proposed in this paper consists of two core elements: the new data representation method based on the actor-fact matrix and the new prediction calculation technique based on the Hadamard product of vectors. The 1-norm length of the Hadamard product vector may be seen as a natural extension of the vector dot product (in this case, as a kind of group inner product of the three vectors representing the actor, the object, and the relation), whereas the dot product may be seen as an elemental step of matrix multiplication, that is, the basic operation used in reflective matrix processing. Therefore, the calculation of the 1-norm length of the Hadamard product vector may be regarded as an operation compatible with the reflective matrix processing, seen as an “additional reflection” (i.e., the next step of the reflective data exploration process). This observation may additionally explain why the optimal number of reflections for the RRI method in the S4 scenario is relatively small (equal to 3 for each training ratio). On the other hand, the prediction based on the Hadamard product is not well suited to data processing techniques based on SVD decomposition. This explains the relatively weak results of the dimensionality reduction methods in the scenarios in which the proposed data modeling method is used. When techniques based on dimensionality reduction are used, the reconstructed input matrix is used directly as the set of prediction values, and an additional step of Hadamard product calculation is not required.

Figure 7: Comparison of all recommender algorithms for various scenarios. Panels: (a) S1, (b) S2, (c) S3, (d) S4. Vertical axis: AUROC. Series: PureSVD, RRI, RSVD.
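The relation between the 1-norm length of the Hadamard product and the dot product can be illustrated with a short sketch. The helper name `hadamard_l1` and the example vectors are assumptions made for illustration; for two nonnegative vectors the value coincides with the ordinary dot product, and the same expression extends to three vectors.

```python
import numpy as np

def hadamard_l1(*vectors):
    """1-norm length of the element-wise (Hadamard) product of the given vectors."""
    product = np.ones(len(vectors[0]), dtype=float)
    for v in vectors:
        product = product * np.asarray(v, dtype=float)
    return float(np.abs(product).sum())

# Hypothetical nonnegative activation vectors (illustrative values only).
actor = np.array([1.0, 0.0, 2.0, 1.0])
obj = np.array([0.5, 3.0, 1.0, 0.0])
relation = np.array([1.0, 1.0, 0.5, 2.0])

# For two nonnegative vectors this equals the dot product.
pair_score = hadamard_l1(actor, obj)

# For three vectors it acts as a "group inner product".
triple_score = hadamard_l1(actor, obj, relation)

print(pair_score, float(actor @ obj), triple_score)
```

The equality with the dot product holds here only because all entries are nonnegative; with mixed signs the absolute value in the 1-norm would make the two quantities differ.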

We have shown that the proposed fact-based approach to social network representation improves the quality of collaborative filtering. We have investigated the application of the proposed actor-fact matrix in systems featuring the most widely known methods for input data processing, such as SVD-based dimensionality reduction and reflective matrix processing. We have also shown that using the actor-fact matrix together with reflective data processing enables us to design a collaborative filtering system that outperforms systems based on dimensionality reduction techniques.

We have demonstrated the superiority of multiple matrix data reflections by realizing a new kind of spreading activation. However, the purpose of the spreading activation mechanism introduced herein is to realize probabilistic reasoning about any fact that may be composed of actors, relations, and objects appearing in the network. To state it more precisely, in order to estimate the probability that a given fact represents a true statement, the three constituents of the fact are independently primed. On the basis of the three independently generated vectors, each one representing the levels of node activation obtained as the result of priming the node represented by the vector, the 1-norm length of a Hadamard product is applied to measure holistically (i.e., by taking into account the state of all nodes) the amount of joint similarity of the three fact constituents or, more precisely, of their representations obtained as the result of the spreading activation procedure.
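The priming-and-scoring procedure described above can be sketched as follows. This is a simplified illustration and not the RRI implementation evaluated in the paper: it assumes that activation is spread by repeated multiplication with the actor-fact matrix and its transpose, that the state is rescaled to unit length after each reflection, and that three reflections are used. The function names `spread` and `fact_score` and the toy data are hypothetical.

```python
import numpy as np

def spread(X, node, reflections):
    """Prime one row node and spread activation for a number of reflections."""
    state = np.zeros(X.shape[0])
    state[node] = 1.0                      # priming of a single node
    for _ in range(reflections):
        state = X @ (X.T @ state)          # nodes -> facts -> nodes
        norm = np.linalg.norm(state)
        if norm > 0:
            state = state / norm           # assumed rescaling step
    return state

def fact_score(X, actor, relation, obj, reflections=3):
    """1-norm length of the Hadamard product of the three activation vectors."""
    a = spread(X, actor, reflections)
    r = spread(X, relation, reflections)
    o = spread(X, obj, reflections)
    return float(np.abs(a * r * o).sum())

# Hypothetical toy network; "Dave" takes part in no fact.
row_names = ["Alice", "Bob", "Carol", "Dave", "Titanic", "Star Wars", "likes"]
row = {name: i for i, name in enumerate(row_names)}
facts = [
    ("Alice", "likes", "Titanic"),
    ("Bob", "likes", "Titanic"),
    ("Bob", "likes", "Star Wars"),
    ("Carol", "likes", "Star Wars"),
]
X = np.zeros((len(row_names), len(facts)))
for j, triple in enumerate(facts):
    for name in triple:
        X[row[name], j] = 1.0

score_connected = fact_score(X, row["Alice"], row["likes"], row["Star Wars"])
score_isolated = fact_score(X, row["Dave"], row["likes"], row["Star Wars"])
print(score_connected, score_isolated)
```

In this sketch an actor with no recorded facts receives a score of zero for every candidate fact, whereas an unobserved fact whose constituents are connected through other facts receives a positive score.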

From the perspective of related areas of research (such as Web scale reasoning), it may be particularly interesting to investigate our proposal of using the 1-norm length of the Hadamard product as the measure of the likelihood of an unknown dyad. The authors believe that, because it realizes probabilistic reasoning as a vector-space technique, the introduced solution provides basic means for extending the capacity for reasoning on social networks beyond the boundaries provided by currently used nonstatistical methods. In our opinion, the application of the introduced methods (in particular the new actor-fact data representation and the new Hadamard product likelihood calculation) leads to a significant improvement in link recommendation quality, at least when reflective matrix processing is used. Although this paper provides an evaluation using only the two-relation scenario, the proposed approach to matrix-based propositional data representation also appears promising from the perspective of its extensibility to truly multirelational applications.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

Mikołaj Morzy has been supported by the National Science Centre Grant 2011/03/B/ST6/01563. Michał Ciesielczyk and Andrzej Szwabe have been supported by the National Science Centre Grant DEC-2011/01/D/ST6/06788.

References

[1] D. Liben-Nowell and J. Kleinberg, “The link-prediction problem for social networks,” Journal of the American Society for Information Science and Technology, vol. 58, no. 7, pp. 1019–1031, 2007.

[2] L. Lü and T. Zhou, “Link prediction in complex networks: a survey,” Physica A: Statistical Mechanics and Its Applications, vol. 390, no. 6, pp. 1150–1170, 2011.

[3] R. Lichtenwalter and N. V. Chawla, “Link prediction: fair and effective evaluation,” in Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM '12), pp. 376–383, August 2012.

[4] C. Lee, B. Nick, U. Brandes, and P. Cunningham, “Link prediction with social vector clocks,” in Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2013.

[5] B. Mobasher, X. Jin, and Y. Zhou, “Semantically enhanced collaborative filtering on the Web,” in Proceedings of the 1st European Web Mining Forum (EWMF '03), pp. 57–76, Springer, Cavtat-Dubrovnik, Croatia, September 2003.

[6] J. Salter and N. Antonopoulos, “CinemaScreen recommender agent: combining collaborative and content-based filtering,” IEEE Intelligent Systems, vol. 21, no. 1, pp. 35–41, 2006.

[7] D. Damljanovic, J. Petrak, M. Lupu, et al., “Random indexing for finding similar nodes within large RDF graphs,” in The Semantic Web: ESWC 2011 Workshops, vol. 7117 of Lecture Notes in Computer Science, pp. 156–171, Springer, Berlin, Germany, 2012.

[8] P. Todorova, A. Kiryakov, D. Ognyano, I. Peikov, R. Velkov, and Z. Tashev, “Conclusions from experimental data and combinatorics analysis,” Tech. Rep., The Large Knowledge Collider (LarKC), 2009.

[9] P. Cremonesi, Y. Koren, and R. Turrin, “Performance of recommender algorithms on top-N recommendation tasks,” in Proceedings of the 4th ACM Recommender Systems Conference (RecSys '10), pp. 39–46, ACM, September 2010.

[10] T. Cohen, R. Schvaneveldt, and D. Widdows, “Reflective Random Indexing and indirect inference: a scalable method for discovery of implicit connections,” Journal of Biomedical Informatics, vol. 43, no. 2, pp. 240–256, 2010.

[11] M. Ciesielczyk and A. Szwabe, “RSVD-based dimensionality reduction for recommender systems,” International Journal of Machine Learning and Computing, vol. 1, no. 2, pp. 170–175, 2011.

[12] G. Adomavicius and A. Tuzhilin, “Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions,” IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 6, pp. 734–749, 2005.

[13] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, vol. 42, no. 8, pp. 30–37, 2009.

[14] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, “Analysis of recommendation algorithms for e-commerce,” in Proceedings of the 2nd ACM Conference on Electronic Commerce (EC '00), pp. 158–167, 2000.

[15] J. B. Schafer, J. A. Konstan, and J. Riedl, “E-commerce recommendation applications,” in Applications of Data Mining to Electronic Commerce, pp. 115–153, Springer, New York, NY, USA, 2001.

[16] J. L. Herlocker, J. A. Konstan, L. G. Terveen, and J. T. Riedl, “Evaluating collaborative filtering recommender systems,” ACM Transactions on Information Systems, vol. 22, no. 1, pp. 5–53, 2004.

[17] J. A. Hanley and B. J. McNeil, “The meaning and use of the area under a receiver operating characteristic (ROC) curve,” Radiology, vol. 143, no. 1, pp. 29–36, 1982.

[18] I. Pitowsky, “Quantum mechanics as a theory of probability,” in Physical Theory and Its Interpretation, vol. 72 of The Western Ontario Series in Philosophy of Science, pp. 213–240, Springer, Dordrecht, The Netherlands, 2006.




a bipartite graph. When the relation is defined between actors (e.g., Carol likes Douglas), the actor-object matrix becomes simply a square matrix. The situation becomes slightly more complex in the case of multirelational social networks, where multiple different relations of possibly varying semantics may exist between actors in the network. A typical example is a network where actors may express both fondness for and rejection of certain objects (e.g., Eve likes to watch comedy movies, but she hates horror movies). If the storage of relation values is permitted by a given data model, multirelational networks may be modeled by assigning distinct values (or sets of values) to particular relations, but for a binary actor-object matrix it is necessary to represent each relation by a separate matrix and to include the processing of multiple matrices in the recommendation algorithm.
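The two modeling options for a multirelational network can be sketched as follows. The example assumes two actors and two object categories; Eve's preferences follow the example in the text, while Carol's row and all variable names are illustrative assumptions.

```python
import numpy as np

# Rows: actors, columns: objects (hypothetical toy network).
actors = ["Eve", "Carol"]
objects = ["comedy", "horror"]

# Option 1: one binary actor-object matrix per relation.
likes = np.array([[1, 0],    # Eve likes comedy
                  [0, 1]])   # Carol likes horror (assumed)
hates = np.array([[0, 1],    # Eve hates horror
                  [0, 0]])

# Option 2: a single valued matrix, possible only if the data model
# permits storing relation values (+1 = likes, -1 = hates, 0 = unknown).
valued = likes - hates

print(valued)
```

With the binary option the recommendation algorithm has to process `likes` and `hates` as two separate inputs, which is the complication the paragraph above refers to.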

In this paper we argue that the actor-object matrix is not the optimal data model for recommendation algorithms. Our experiments show that the transformation from the actor-object to the actor-fact matrix improves recommendation quality significantly, as measured by the popular “area under the receiver operating characteristic curve” (AUROC) measure. We perform extensive experiments on a large real-world dataset to support our claims. Given that the vast majority of link recommendation algorithms for social networks compute actor-object, actor-actor, or object-object similarities by applying linear algebra to data representation matrices, the advantage of the actor-fact matrix representation becomes apparent (in particular for methods based on the singular value decomposition paradigm). The original contribution of this paper consists in the introduction of two elements:

(i) a data representation method based on a binary actor-fact matrix;

(ii) a similarity quasimeasure based on the 1-norm length of the Hadamard product of the given tuple of vectors.

Our key finding is that the proposed data representation and the new similarity measure, when combined with reflective matrix processing, significantly outperform state-of-the-art collaborative filtering methods based on the use of a standard actor-object matrix.

Our paper is organized as follows. In Section 2 we report on related work and present the referenced recommendation algorithms. Section 3 introduces the concept of the actor-fact matrix. In Section 4 we present the evaluation methodology for the actor-fact matrix representation, and in Section 5 we report the results of the conducted experiments. The paper concludes in Section 6 with a brief summary.

2 Related Work

By far the most popular approach to link recommendation in social networks is collaborative filtering using an input matrix which represents each actor as a vector in the space of objects and each object as a vector in the space of actors. Many previous works consider building a model of collaborative similarity from a model of content-based interobject relations to be the most promising hybrid link recommendation technique [5, 6]. As far as algebraic representations of graph data are concerned, the actor-fact matrix model is similar to the model described in [7]. Indeed, our model was inspired by the semantic data model of RDF triples. Also, as far as the algebraic transformation of the graph data is concerned, the model presented in this paper may be regarded as similar to RDF data search methods based on spreading activation realized by means of iterative matrix data processing [8] or a single multiplication by a random projection matrix [7]. However, the latter method is limited to RDF graph node search using a traditional bilateral similarity measure, whereas we extend the model with a vector-space quasisimilarity measure which allows the likelihood of an unknown relationship to be computed efficiently.

In our evaluation we use three main types of collaborative filtering recommender algorithms. The baseline is established by a simple popularity-based algorithm favoring objects with the highest number of positive relationships in the training set [9]. Next, we have employed several different approaches to the input matrix decomposition. Firstly, we have used the algorithm based on reflective random indexing (RRI) [10]. Secondly, we have used two types of algorithms based on the singular value decomposition: a traditional implementation of the method (PureSVD), in which actor vectors are represented as combinations of object vectors without any specific parameterization, and an implementation of the randomized singular value decomposition (RSVD) [11], which is a combination of reflective random indexing and SVD. We have chosen these methods because SVD-based methods have long been considered to be the most efficient recommender engines in real-world settings [12–15].
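Two of the algorithm families mentioned above can be sketched in a few lines. The snippet below is an illustrative approximation only: it shows a popularity ranking and a rank-k truncated SVD reconstruction in the spirit of PureSVD, on a hypothetical binary actor-object matrix `A`; the value of `k` is an arbitrary assumption, and neither RRI nor RSVD is reproduced here.

```python
import numpy as np

# Hypothetical binary actor-object matrix (rows: actors, columns: objects).
A = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
], dtype=float)

# Popularity baseline: rank objects by their number of positive relations.
popularity = A.sum(axis=0)
popularity_ranking = np.argsort(-popularity, kind="stable")

# Rank-k truncated SVD: keep the k largest singular values and use the
# reconstructed matrix directly as the set of prediction values.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]

print(popularity_ranking)
print(np.round(A_k, 2))
```

The popularity ranking is identical for every actor, whereas the rows of `A_k` give actor-specific scores, which is why the former serves only as a baseline.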

Section 5 presents the results of the conducted experiments. Since our data have the form of binary propositions (i.e., our social network is a signed network), the evaluation of the proposed method is oriented toward the task of finding relevant links [16] rather than toward the minimization of recommendation rating error. Classification metrics, such as the area under the ROC curve (AUROC), measure the probability of making correct or incorrect decisions by the recommender algorithm about whether an object is relevant. Moreover, classification metrics tolerate differences between actual and predicted values as long as they do not lead to wrong decisions. Thus, these metrics are appropriate for examining binary relevance relationships. In particular, when AUROC is used, it is assumed that the ordering among relevant items does not matter. According to [17], AUROC is equivalent to the probability of the system being able to choose properly between two objects, one randomly selected from the set of relevant objects and one randomly selected from the set of nonrelevant objects. For this reason, the results of the theoretical research are evaluated by means of experiments based on quality measures that are probabilistically interpretable, such as AUROC.
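The probabilistic interpretation of AUROC can be made concrete with a small sketch that compares every relevant object with every nonrelevant one. The function name `auroc`, the example scores, and the convention of counting ties as one half are assumptions made for illustration.

```python
def auroc(relevant_scores, nonrelevant_scores):
    """Fraction of (relevant, nonrelevant) pairs ranked in the right order."""
    wins = 0.0
    for r in relevant_scores:
        for n in nonrelevant_scores:
            if r > n:
                wins += 1.0      # relevant object scored higher
            elif r == n:
                wins += 0.5      # tie counted as half (assumed convention)
    return wins / (len(relevant_scores) * len(nonrelevant_scores))

# Hypothetical prediction scores produced by some recommender.
relevant = [0.9, 0.7, 0.4]
nonrelevant = [0.8, 0.3]

value = auroc(relevant, nonrelevant)
print(value)
```

Only comparisons between the two sets contribute to the value, so reordering the relevant objects among themselves leaves it unchanged, in line with the remark that the ordering among relevant items does not matter.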

3 Actor-Fact Matrix

Let us recall that our model is influenced by the semantic model of RDF triples. Each RDF triple combines information about the predicate that relates a subject to an object. We



$A = \{a_1, \ldots, a_n\}$, $O = \{o_1, \ldots, o_m\}$, $R = \{r_1, \ldots, r_l\}$

$r_i \colon A \times O \rightarrow$

$X = [x_{ij}]_{(n+m+l) \times f}$

$x_{i_1 j}$, $x_{i_2 j}$, and $x_{i_3 j}$ such that $1 \le i_1 \le n$, $n + 1 \le i_2 \le n + m$, and $n + m + 1 \le i_3 \le n + m + l$

$X[\mathit{Alice}, \mathit{fact}_1]$, $X[\mathit{is\ a\ friend\ of}, \mathit{fact}_1]$, and $X[\mathit{Bob}, \mathit{fact}_1]$

[Figure: example network with the nodes Alice, Bob, Titanic, and Star Wars, the relations “Is a friend of” and “Likes”, and the facts $\mathit{fact}_1$, $\mathit{fact}_2$, $\mathit{fact}_3$]

$X = [x_1, \ldots, x_n]$ and $Y = [y_1, \ldots, y_n]$

$$X \circ Y = [x_1 \cdot y_1, \ldots, x_n \cdot y_n] \qquad (1)$$

$$p_{ijk} = \lVert x_i \circ x_j \circ x_k \rVert_1 \qquad (2)$$

where $x_i$, $x_j$, and $x_k$

matrix $X = [x_{ij}]_{(n+m+l) \times f}$

$[\,\cdot\,]_{n \times m}$, $[x_{ij}]_{(n+m+l) \times f}$
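The example cells $X[\mathit{Alice}, \mathit{fact}_1]$, $X[\mathit{is\ a\ friend\ of}, \mathit{fact}_1]$, and $X[\mathit{Bob}, \mathit{fact}_1]$ suggest how a single fact occupies one column of the actor-fact matrix. The sketch below builds such a matrix for the small example network; the content of `fact1` follows those cells, whereas the assignment of the two “likes” facts to Alice and Bob, the row ordering, and the helper `cell` are assumptions made for illustration.

```python
import numpy as np

actors = ["Alice", "Bob"]
objects = ["Titanic", "Star Wars"]
relations = ["is a friend of", "likes"]

facts = {
    "fact1": ("Alice", "is a friend of", "Bob"),
    "fact2": ("Alice", "likes", "Titanic"),     # assumed assignment
    "fact3": ("Bob", "likes", "Star Wars"),     # assumed assignment
}

# Rows: n actors, then m objects, then l relations; columns: f facts.
rows = actors + objects + relations
row_of = {name: i for i, name in enumerate(rows)}
fact_names = list(facts)

X = np.zeros((len(rows), len(fact_names)), dtype=int)
for j, fact_name in enumerate(fact_names):
    for element in facts[fact_name]:
        X[row_of[element], j] = 1    # mark every constituent of the fact

def cell(row_name, fact_name):
    """Value of X[row_name, fact_name]."""
    return int(X[row_of[row_name], fact_names.index(fact_name)])

print(X)
print(cell("Alice", "fact1"), cell("is a friend of", "fact1"), cell("Bob", "fact1"))
```

Each column contains exactly three nonzero cells, one per constituent of the fact, so the matrix has $(n + m + l)$ rows and $f$ columns regardless of how many relations are stored.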






(iii) SVD $k$-cut: 2, 4, 6, 8, 10, 12, 14, 16, 20, 24



[Figures: two plots with AUROC on the vertical axis]




5 Experiments











[Figures: three AUROC plots, each with panel (a) One relation (scenarios S1 and S3) and panel (b) Two relations (scenarios S2 and S4)]





