
Received July 11, 2018, accepted September 4, 2018, date of publication September 13, 2018, date of current version October 8, 2018.

Digital Object Identifier 10.1109/ACCESS.2018.2869924

Deep Learning Modeling for Top-N Recommendation With Interests Exploring

WANG ZHOU1, JIANPING LI1, MALU ZHANG1, YAZHOU WANG2, AND FADIA SHAH1
1School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China
2School of Optoelectronic Information, University of Electronic Science and Technology of China, Chengdu 610054, China

Corresponding authors: Wang Zhou ([email protected]) and Jianping Li ([email protected])

This work was supported by the National Science Foundation of China under Grant 61370073.

ABSTRACT Recommender systems (RS) currently play a crucial role in information filtering and retrieval, and have been applied ubiquitously in many domains, although they suffer from data sparsity and cold-start problems. Many studies have tried to improve the performance of RS from different directions, such as traditional matrix factorization techniques and, in recent years, deep learning methods; however, this remains a challenging research issue. Motivated by this, a two-stage deep learning-based model for top-N recommendation with interests exploring (DLMR) is proposed in this paper: 1) DLMR explores latent interests for each user, captures factors from reviews and contextual information via a convolutional neural network, and performs convolutional matrix factorization to generate the candidates list; 2) to enhance recommendation performance, DLMR further ranks the candidates through a three-layer denoising autoencoder, taking heterogeneous side information into account. DLMR provides a flexible scheme to leverage the available resources for recommendation: it is able to explore users' latent interests, capture the intricate interactions between users and items, and provide accurate and personalized recommendations. Experimental analysis over real-world data sets demonstrates that DLMR provides high-performance top-N recommendation in sparse settings and outperforms state-of-the-art recommender approaches significantly.

INDEX TERMS Sparse latent Dirichlet allocation, interests exploring, convolutional neural network, denoising autoencoder, top-N recommendation.

I. INTRODUCTION
In recent years, with the continuous development of the Internet, the volume of online information has increased exponentially; as a result, finding the most useful information effectively and efficiently is a serious problem that constantly confronts Web users. As a crucial tool for information retrieval and recommendation, recommender systems (RS) can alleviate information overload, and they have achieved great success in many industries. For example, RS influence about 80% of movie-watching choices on Netflix [1], and on YouTube the homepage recommendations influence about 60% of video clicks [2], [3]. However, in practice, RS usually confront the data sparsity problem and lack abundant knowledge of users' inherent interests and preferences, which often leads to unsatisfactory performance.

In the real world, users may be concerned with only a small proportion (dozens, say) of a large-scale item corpus (millions, say), and they always make decisions over items according to their interests and preferences; therefore, it is plausible that interests-exploring-based recommender approaches can overcome the data sparsity problem and provide accurate recommendation results. In this regard, many previous studies try to explore latent interests and preferences for users by taking account of the available information, and in practice these methods have been demonstrated, both in academia and in industry, to improve the performance of RS to some extent [4], [5]. In general, there are two ways to explore the distribution of a user's latent interests: one is the naive statistic of behavior records [6], and the other is via a probabilistic topic model, latent Dirichlet allocation (LDA), applied to textual reviews, which can sufficiently reflect a user's interests and preferences [7]-[9]: regard the user's interests and the textual reviews as the latent topics and the document respectively; accordingly, the user's interests can be obtained via transfer learning, and RS can indeed provide much more accurate recommendations with the learned interests [10].



FIGURE 1. Framework of the hybrid DLMR recommender engine. Offline computation is for interests inference through LDA, according to the observed ratings and textual reviews. In general, DLMR includes two steps: candidates generation (hundreds) through convolutional matrix factorization (CMF), and candidates ranking through a denoising autoencoder (DAE). Finally, DLMR returns a top-N recommendation list (dozens) to the user.

In general, RS are designed to predict missing ratings and provide accurate top-N recommendations for queries according to historical behavior records. The traditional matrix factorization (MF) technique is an effective and powerful tool for rating prediction [11]-[13], and essentially many recommender approaches are extended versions of MF that try to improve the performance of RS from different aspects, such as interests exploring, social networks and so on [8], [14]-[16]; moreover, the additional information is usually normalized as regularization terms that constrain MF to learn low-rank feature vectors for users and items. However, the performance of these RS is often unstable over unbalanced datasets, and the final recommendation results are often unsatisfactory. On the other hand, much previous research neglects available side information, which could be greatly helpful for improving recommendation accuracy.

In the past decade, deep learning methods have developed remarkably and have become applicable to various areas, such as speech recognition, image analysis and so on, owing to their powerful capability and advantages in mass data processing [3]. In contrast to traditional recommender systems, an increasing number of recent studies try to introduce deep learning methods to improve the performance of RS [17], [18], such as the multilayer perceptron (MLP) [9], [19], the autoencoder (AE) [20]-[22], the convolutional neural network (CNN) [23], hybrid methods and so on. Deep learning-based recommender systems readily process various available data with different attributes; moreover, it has already been demonstrated that they can capture intricate information within numerical and textual data, work well over unbalanced datasets, and yield significantly better recommendation performance.

Because there is no ground truth for RS, the recommendation accuracy in real life is often relatively low. At present, solving the data sparsity and cold-start problems, as well as the unstable recommendation accuracy, remains challenging work for RS, even though introducing deep learning methods into RS is promising. Moreover, the essential goal of RS is to provide a ranked top-N recommendation list for queries, not just to perform rating prediction.

Motivated by the challenges stated above, a novel deep learning-based recommender system, DLMR, is proposed in this article; it integrates traditional recommendation methods and deep learning methods. DLMR leverages available resources to reduce uncertainty and improve recommendation performance, and generally contains two stages: candidates generation and candidates ranking. During the first stage, DLMR explores users' latent interests via sparse LDA, learns low-rank feature vectors for users and items through convolutional matrix factorization (CMF), and then generates the candidates list; during the second stage, DLMR ranks the candidates through a three-layer denoising autoencoder (DAE). The corresponding graphical illustration of DLMR is presented in Fig. 1, from which we can see that DLMR generally comprises offline computation and online query. The contributions of this article are as follows:
• A general recommendation system, DLMR, is proposed, which explores the user's latent interests distribution via sparse latent Dirichlet allocation (sparseLDA) from the textual reviews, which reflect the user's interests and preferences. Accordingly, DLMR is able to provide personalized recommendations with the learned interests.

• To improve prediction accuracy, DLMR captures factors from both textual reviews and contextual information for items through a convolutional neural network, and then conducts convolutional matrix factorization to learn low-rank feature vectors for users and items respectively, in order to generate the candidates list.

• DLMR performs candidates ranking for top-N recommendation via a three-layer denoising autoencoder architecture, taking heterogeneous side information into consideration.

• DLMR is able to deal with explicit and implicit feedback and to work stably over unbalanced datasets. Experimental analysis on three large-scale datasets indicates that DLMR outperforms state-of-the-art recommender approaches significantly, especially in recommendation accuracy and efficiency.

The remainder of this article is organized as follows: In Section II, we will present the related work about recommender systems. Section III will present candidates generation for the proposed model DLMR. Section IV is for candidates ranking. Section V will report the experimental analysis on three real-world datasets, and the conclusion and future work are presented in Section VI.

II. RELATED WORK
In this section, some related work for DLMR will be presented, including interests exploring, probabilistic matrix factorization and deep learning-based recommendation.

A. INTERESTS EXPLORING
It is conceivable that recommendation according to users' interests will improve the performance of RS significantly, and many previous studies try to explore interests for users [24], [25]. Actually, it is intractable to acquire the exact interests distribution for each user. In the existing literature, researchers resort to transfer learning to explore interests for users, owing to the great success of the latent topic model, latent Dirichlet allocation (LDA). Liu et al. [4] observe that the "user-interests-items" modeling is just similar to the "document-topic-words" modeling, and propose the iExpand method to improve recommendation accuracy, with interests exploited via LDA. In [6] and [9], the user's interests distribution is learned via the statistics of historical records, which can also improve the performance of RS. Wang et al. [5] and Wang and Blei [8] develop probabilistic topic modeling via LDA to exploit topics related to the user's interests and preferences for document recommendation. Ren et al. [15] explore the user's interests via LDA for point-of-interest recommendation. In practice, experimental analysis over real-world datasets indicates that recommender approaches with interests exploring significantly outperform those without.

B. PROBABILISTIC MATRIX FACTORIZATION
Probabilistic matrix factorization (PMF) aims to learn latent feature vectors for users and items respectively, and to perform rating prediction via the product of the learned feature vectors. PMF has become a popular tool for recommendation systems [11], [26], [27], and many current recommendation approaches are extended versions of PMF, owing to its high performance for RS. In practice, PMF can alleviate data sparsity and scales linearly with the number of observations. To improve prediction accuracy, traditional recommender systems try to incorporate social networks [6], [28], [29], contextual information [30] and other information into PMF. Recently, to enhance recommendation performance, many recommender approaches try to combine PMF and deep learning methods, such as the convolutional neural network [23], the autoencoder [31] and so on. For the recommender system proposed in this article, DLMR, PMF is also employed as the basic tool for low-rank feature vector learning.

C. DEEP LEARNING BASED RECOMMENDATION
Currently, deep learning methods have gained tremendous success and are widely applied in many areas, such as speech recognition, text processing, image classification, sentiment analysis and so on, owing to their high efficacy and superiority in information processing [32]-[36]. In the past decade, many studies have tried to introduce promising deep learning methods into recommender systems to improve performance [37]-[40]; as a result, these methods obtain satisfactory recommendation results in contrast to traditional RS. For example, the Restricted Boltzmann Machine (RBM) consists of a hidden layer and a visible layer, and Salakhutdinov et al. [41] incorporate the RBM into collaborative filtering for recommendation and achieve high performance. To exploit information from various kinds of sources, a multilayer perceptron (MLP)-based recommender engine is proposed for YouTube in [19], with multiple hidden layers between the input layer and the output layer. Analogously, Gao et al. [9] employ an MLP for document recommendation, which achieves slightly better performance than previous work. Moreover, in [42] and [43], the autoencoder (AE) is introduced for recommendation, the goal of which is to reconstruct the input ratings in the output layer. To utilize textual information for recommendation, Kim et al. [23] introduce the convolutional neural network (CNN) into matrix factorization for document recommendation, and achieve better recommendation accuracy than other methods. Recently, Zhang et al. [3] provided a comprehensive survey of deep learning-based recommender systems; this panorama will be greatly helpful to future research on introducing deep learning methods into recommender systems. In DLMR, deep learning methods, including the convolutional neural network (CNN) and the denoising autoencoder (DAE), are employed to improve recommendation performance.

III. CANDIDATES GENERATION FOR DLMR
In this section the first stage of the deep learning-based recommender system DLMR, candidates generation, will be presented in detail, including the framework of DLMR, interests exploring, the CNN architecture, and probabilistic matrix factorization with interests and CNN. DLMR can not only perform rating prediction, but also provide top-N recommendation.

TABLE 1. Key notations and meanings.

A. PROBLEM DEFINITION AND FRAMEWORK
To avoid confusion, we first define the notations; Table 1 shows the key ones. The incentive of recommendation systems is to provide timely and accurate recommendations for users; however, this issue has been a persistent challenge for previous research over a long period of time. Recently, deep learning methods have provided a promising direction for recommender systems. Building on previous research, in this article we propose a novel hybrid deep learning-based recommender system; the task of DLMR is to provide an accurate top-N recommendation list according to the user's historical behavior records and other available information. Fig. 1 shows the framework of DLMR.

B. INTERESTS EXPLORING
First, we present interests exploring for each user. Many previous studies try to explore users' interests in different ways [4], [7], since users always make decisions according to their interests and preferences. As an unsupervised generative probabilistic topic model, latent Dirichlet allocation (LDA) can be applied to discover a set of topics from collections of discrete words. Motivated by this, in DLMR a generic three-layer scheme is proposed for interests exploring from the available textual corpus, such as reviews, which reflect the user's interests and preferences. Here, T = {T_1, T_2, ..., T_n} is introduced to denote the latent interests for each user, O+ denotes the tasted items, and we resort to the sparse latent Dirichlet allocation (sparseLDA) model to mimic the process of interests inference [44].

FIGURE 2. A three-layer modeling for interests exploring through sparse latent Dirichlet allocation. The rectangle denotes the latent interests for user u_i, and O+ denotes the set of items tasted by u_i.

The corresponding graphical illustration of interests exploring is presented in Fig. 2. The textual review corpus, which is combined into one training document for each user, derives from the user's latent interests and preferences; in other words, the user's interests are characterized by the distribution over the review corpus. The "user-interests-items" modeling is also interpretable, since users always make decisions according to their interests and preferences in real life; therefore, it is plausible that recommendation according to the user's interests will enhance the robustness of the recommender engine significantly. Note that here we consider only the items tasted by the user for interests exploring.

Suppose there are K ∈ N+ latent interest variables T_i = {t_1, t_2, ..., t_K} for user u_i ∈ U, and a concatenation of C words for the textual review corpus Ψ_i^{(u)}|O+ = {w_1, w_2, ..., w_C} assigned by u_i. To learn the interests distribution for u_i, we define the following steps for interests inference:
• Draw the interest distribution θ_{T_i} ~ Dirichlet(α) for user u_i;
• Draw the word distribution φ_{u_i} ~ Dirichlet(β) for the review corpus assigned by u_i;
• For each word w_k in the textual corpus Ψ_i^{(u)}|O+:
  (a) choose an interest t_k ~ Multinomial(θ_{T_i});
  (b) choose a word w_k ~ Multinomial(w_k | t_k, β).

In the learning process above, α and β are given hyperparameters for the Dirichlet distributions. Owing to the large scale of the review corpus and the relatively small number of latent interests per user, a Gibbs sampler is employed here to approximate the hidden variables θ, φ and t while reducing redundant computation without loss of quality, according to the following conditional distribution [45]:

$$p(t_i = k \mid t_{\neg i}, \Psi_i^{(u)}|O^+) \;\propto\; \underbrace{\sum_{k}\frac{\alpha_k \beta_w}{n_{k,\neg i} + \beta V}}_{\varpi_1} \;+\; \underbrace{\sum_{k}\frac{\beta_w\, n_{u,k}^{\neg i}}{n_{k,\neg i} + \beta V}}_{\varpi_2} \;+\; \underbrace{\sum_{k}\frac{n_{k,\neg i}^{w}\,\big(n_{u,k}^{\neg i} + \alpha_k\big)}{n_{k,\neg i} + \beta V}}_{\varpi_3}, \quad (1)$$

where t_{¬i} denotes excluding the i-th term. Let ϖ = ϖ_1 + ϖ_2 + ϖ_3, and sample χ ~ Uniform(0, ϖ) as follows:
• if χ ≤ ϖ_1, sample the interest from bucket ϖ_1;
• if ϖ_1 < χ ≤ ϖ_1 + ϖ_2, sample from bucket ϖ_2;
• if χ > ϖ_1 + ϖ_2, sample from bucket ϖ_3.
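To make the three-bucket draw concrete, the following is a minimal NumPy sketch of one sampling step, assuming the count arrays have already been maintained by the Gibbs sampler; the variable names and the dense representation are illustrative (a real sparseLDA implementation exploits the sparsity of n_uk and n_kw):

```python
import numpy as np

def sample_interest(alpha, beta, V, n_k, n_uk, n_kw):
    """Draw one interest assignment via the three-bucket scheme of Eq. (1).

    alpha : (K,) Dirichlet hyperparameters for interests
    beta  : scalar Dirichlet hyperparameter for words
    V     : vocabulary size
    n_k   : (K,) tokens currently assigned to each interest (current token excluded)
    n_uk  : (K,) tokens of this user assigned to each interest (current token excluded)
    n_kw  : (K,) counts of the current word under each interest (current token excluded)
    """
    denom = n_k + beta * V
    s = alpha * beta / denom            # smoothing-only bucket (varpi_1 terms)
    r = n_uk * beta / denom             # user-interest bucket  (varpi_2 terms)
    q = n_kw * (n_uk + alpha) / denom   # word-interest bucket  (varpi_3 terms)

    w1, w2, w3 = s.sum(), r.sum(), q.sum()
    chi = np.random.uniform(0.0, w1 + w2 + w3)
    if chi <= w1:                       # landed in the smoothing bucket
        return np.searchsorted(np.cumsum(s), chi)
    elif chi <= w1 + w2:                # landed in the user-interest bucket
        return np.searchsorted(np.cumsum(r), chi - w1)
    else:                               # landed in the word-interest bucket
        return np.searchsorted(np.cumsum(q), chi - w1 - w2)
```

The point of the decomposition is efficiency: the smoothing-bucket mass changes only when global counts change and can be cached across tokens; the sketch recomputes everything for clarity.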


After convergence is achieved, the values of φ and θ can be calculated through the following formulas:

$$\varphi_{k,w} = \frac{n_{k,w} + \beta_w}{\sum_{w \in \Psi_i^{(u)}|O^+} \big(n_{k,w} + \beta_w\big)}, \qquad \theta_{T_i,k} = \frac{n_{u,k} + \alpha_k}{\sum_{k=1}^{K} \big(n_{u,k} + \alpha_k\big)}, \quad (2)$$

where n_{k,w} is the number of tokens of word w assigned to interest t_k, and n_{u,k} is the number of tokens of user u_i assigned to interest t_k. Through the sampling procedure above, the interests distribution θ_{T_i} can be obtained for u_i via sparseLDA, and in the following, DLMR performs recommendation according to the learned interests distribution θ_T for each user.
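As a quick illustration of Eq. (2), the point estimates can be read off the Gibbs-sampling count arrays directly; this is a minimal sketch with illustrative array shapes:

```python
import numpy as np

def estimate_phi_theta(n_kw, n_uk, alpha, beta):
    """Point estimates of Eq. (2) from Gibbs-sampling counts.

    n_kw  : (K, V) word counts per interest
    n_uk  : (K,)   interest counts for one user
    alpha : (K,)   Dirichlet prior on interests
    beta  : scalar Dirichlet prior on words
    """
    phi = (n_kw + beta) / (n_kw + beta).sum(axis=1, keepdims=True)  # (K, V)
    theta = (n_uk + alpha) / (n_uk + alpha).sum()                   # (K,)
    return phi, theta
```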

Actually, in real life the user's behavior may not accord with the learned interests to some extent. Alternatively, the naive statistic of historical behavior records provides another scheme to approximate the user's interests distribution; Φ is introduced to denote the interests distribution for user u_i obtained in this way:

$$\Phi_{u_i,t_k} = \sum_{v_{tag} \in O^+ \wedge\, tag \in c_{t_k}} v_{tag} \,\big/\, |O^+|, \quad (3)$$

where c_{t_k} is the categorical tag for v_{tag} to which interest t_k belongs.

So far, the symmetrical Jensen-Shannon divergence can be employed to measure the similarity of the learned θ_T and Φ, and the interests coefficient for user u_i can be defined as follows [13], [15]:

$$\Gamma(u_i) = \Gamma(\theta_{T_i}, \Phi_i) = 1 - \frac{1}{2}\Big[D_{KL}\big(\theta_{T_i} \,\|\, \Lambda\big) + D_{KL}\big(\Phi_i \,\|\, \Lambda\big)\Big], \quad (4)$$

where Λ = ½(θ_{T_i} + Φ_i) and D_{KL}(·||·) is the Kullback-Leibler divergence.

The obtained interests coefficient Γ(u_i) denotes the degree of approximation between the interest distribution learned by sparseLDA and the realistic interest distribution; its value lies in (0, 1), and it will be incorporated into convolutional matrix factorization as a regularization term to constrain feature vector learning.
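A minimal sketch of Eq. (4) follows, assuming θ and Φ are length-K probability vectors; base-2 logarithms are used so that the Jensen-Shannon divergence, and hence the coefficient, stays within the unit interval:

```python
import numpy as np

def interest_coefficient(theta, phi_hist, eps=1e-12):
    """Interest coefficient of Eq. (4): 1 minus the Jensen-Shannon
    divergence between the sparseLDA interests `theta` and the
    behavior-statistic interests `phi_hist` (length-K distributions)."""
    theta = theta / theta.sum()
    phi_hist = phi_hist / phi_hist.sum()
    m = 0.5 * (theta + phi_hist)  # the mixture Lambda of Eq. (4)
    kl = lambda p, q: np.sum(p * np.log2((p + eps) / (q + eps)))
    return 1.0 - 0.5 * (kl(theta, m) + kl(phi_hist, m))
```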

C. CNN ARCHITECTURE
In this article a convolutional neural network (CNN) is employed to capture information from the textual reviews and the contextual document simultaneously for learning item representation vectors, since textual reviews indicate users' interests and preferences over items, while contextual information describes the items. The corresponding graphical illustration of the CNN architecture is presented in Fig. 3; in general it contains a symmetric embedding layer, a convolution layer, a max-pooling layer and an output layer [32], [46].

FIGURE 3. CNN architecture for latent feature vector learning for items, through the textual review corpus and contextual document. In general, this module includes the symmetric embedding layer, convolution layer, max-pooling layer and output layer.

1) SYMMETRIC EMBEDDING LAYER
Here, take the textual review corpus Ψ^{(v)} assigned to item v as an example for the symmetric embedding layer. A set of high-frequency words is selected from the review corpus Ψ^{(v)} as the vocabulary, and the embedding layer then maps each word in Ψ^{(v)} into a fixed f-dimensional vector. The embedding operation for the contextual document Υ is identical to that for the review corpus. After that, two matrices Ψ^{(v)} ∈ R^{f×l} and Υ ∈ R^{f×l} are obtained for the next convolution layer, where f denotes the embedding dimension and l is the length of Ψ^{(v)} and Υ.

2) CONVOLUTION LAYER
The convolution layer captures local features across the input sequence by multiple filters with a sliding window. The feature vectors generated for Ψ^{(v)} and Υ through the convolution layer are, respectively:

$$\Omega_x^j = \{x_1^j, x_2^j, \ldots, x_{l-win+1}^j\}, \qquad \Omega_y^j = \{y_1^j, y_2^j, \ldots, y_{l-win+1}^j\}, \quad (5)$$

in which

$$x_\xi^j = \pi\big(W_x^j \otimes \Psi^{(v)}_{\xi:(\xi+win-1)} + b_x^j\big), \qquad y_\xi^j = \pi\big(W_y^j \otimes \Upsilon_{\xi:(\xi+win-1)} + b_y^j\big), \quad (6)$$

where j ∈ {1, 2, ..., d} indexes the filters, ξ ∈ {1, 2, ..., l−win+1}, win denotes the size of the sliding window, π is the non-linear activation function (the rectified linear unit, ReLU), W_x^j ∈ R^{f×win} and W_y^j ∈ R^{f×win} are the weights of filter j shared across the input space, ⊗ is the convolution operation, and b_x, b_y are biases.

3) MAX-POOLING LAYER
To compress the multi-dimensional features extracted by the convolution layer, max-pooling is employed here to extract the global textual features. Through the convolution layer, two local feature vectors are obtained from the textual sequences of the review corpus and the contextual document respectively, which are merged into a global variable via the following max operation:

$$\Omega_z^\varrho = \max\big\{\max(\Omega_x^\varrho),\; \max(\Omega_y^\varrho)\big\}, \quad (7)$$

where ϱ ∈ {1, 2, ..., d}. The generated representation vector Ω_z captures features from both the raw review corpus and the contextual document, and serves as the input to the next layer.

4) OUTPUT LAYER
From the output of the max-pooling layer, the final feature vector is obtained through a fully connected network with non-linear activation functions:

$$o = \tanh\big(W_o \cdot \tanh(W_h \cdot \Omega_z + b_h) + b_o\big), \quad (8)$$

where W_h, W_o are weight matrices and b_h, b_o are bias vectors. In the next section, the learned representation vector o_j ∈ R^d (for item j) is incorporated into matrix factorization as the latent feature vector of the item.
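The following PyTorch sketch assembles the four layers of Eqs. (5)-(8) into one module. It is a sketch under stated assumptions, not the authors' implementation: the hyperparameter values and the two separate vocabularies and embeddings are illustrative, and padding or truncation of both inputs to a fixed length l is assumed to have been done upstream.

```python
import torch
import torch.nn as nn

class ItemCNN(nn.Module):
    """Two-stream CNN of Eqs. (5)-(8): one stream for the review corpus,
    one for the contextual document (f: embedding size, d: filters,
    win: window size, h: hidden units; values are illustrative)."""

    def __init__(self, vocab=20000, f=200, d=30, win=3, h=64):
        super().__init__()
        self.embed_x = nn.Embedding(vocab, f)   # review-corpus embedding
        self.embed_y = nn.Embedding(vocab, f)   # contextual-document embedding
        self.conv_x = nn.Conv1d(f, d, win)      # Eq. (6), review stream
        self.conv_y = nn.Conv1d(f, d, win)      # Eq. (6), context stream
        self.fc = nn.Sequential(nn.Linear(d, h), nn.Tanh(),
                                nn.Linear(h, d), nn.Tanh())  # Eq. (8)

    def forward(self, review_ids, context_ids):
        # (batch, l) token ids -> (batch, f, l) for Conv1d
        x = self.embed_x(review_ids).transpose(1, 2)
        y = self.embed_y(context_ids).transpose(1, 2)
        ox = torch.relu(self.conv_x(x)).max(dim=2).values  # max over positions
        oy = torch.relu(self.conv_y(y)).max(dim=2).values
        z = torch.maximum(ox, oy)                          # Eq. (7), merge streams
        return self.fc(z)                                  # item vector o in R^d
```

The forward pass returns the item representation o of Eq. (8), which is then used as the prior mean o_j of the item feature vector v_j in Eq. (9) below.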

D. PROBABILISTIC MATRIX FACTORIZATION WITH INTERESTS AND CNN
The probabilistic matrix factorization (PMF) model is a powerful tool for rating prediction and has been applied in much previous research. In this article, PMF is employed to learn the low-rank feature vectors for users and items, together with the learned interests for each user and the representation feature vectors for items; accordingly, the predictive ratings are obtained as products of the learned low-rank vectors. The graphical illustration of this module is presented in Fig. 4. With the CNN architecture, we also refer to this module as convolutional matrix factorization (CMF).

FIGURE 4. Probabilistic matrix factorization module with the learned interests and CNN. The yellow cell denotes the learned interests, and the dashed box is for the CNN architecture.

As mentioned before, the output o_j of the CNN architecture is incorporated into PMF as the latent feature vector for each item. For CMF, suppose the users and items are independently and identically distributed, and place zero-mean Gaussian priors over the latent low-rank feature vectors U ∈ R^{d×n} and V ∈ R^{d×m} for users and items respectively; accordingly, we have:

$$p(U \mid \sigma_U) = \prod_{i=1}^{n} \mathcal{N}\big(u_i \mid 0, \sigma_U^2 I\big), \qquad p(V \mid \sigma_V) = \prod_{j=1}^{m} \mathcal{N}\big(v_j \mid o_j, \sigma_V^2 I\big), \quad (9)$$

where N(x | µ, σ²) denotes the Gaussian distribution with mean µ and variance σ², d is the dimension of the latent feature vectors, and I is the identity matrix. Let I_{i,j} be the indicator function, equal to 1 if user u_i assigned a rating to item v_j and 0 otherwise. Accordingly, the conditional distribution over the observed rating matrix R ∈ R^{n×m} is:

$$P(R \mid U, V, \sigma^2) = \prod_{i=1}^{n} \prod_{j=1}^{m} \mathcal{N}\big(r_{i,j} \mid u_i^T v_j, \sigma^2\big)^{I_{i,j}}. \quad (10)$$

Here we also suppose the latent interest variables T of the users to be independently and identically distributed with zero-mean Gaussian priors; accordingly, following the interests exploring scheme, we have:

$$P(T \mid U, V, \Psi, \sigma_T) = \prod_{i=1}^{n} \prod_{k=1}^{K} \mathcal{N}\big(t_{i,k} \mid u_i^T v_{j \in t_{i,k}}, \sigma_T^2\big), \quad (11)$$

where σ_T is the variance of the interest variable T. For convenience, Q is introduced to denote the weights and biases in the CNN architecture [23], Q = {W_x, W_y, W_h, W_o, b_x, b_y, b_h, b_o}, and a zero-mean Gaussian prior is placed over Q:

$$P(Q \mid \sigma_Q) = \prod_{z=1}^{|Q|} \mathcal{N}\big(q_z \mid 0, \sigma_Q^2 I\big). \quad (12)$$

With the learned latent interests T and the representative output of the CNN architecture, the posterior probability over the latent feature vectors U and V is obtained through Bayesian inference:

$$P\big(U, V, Q \mid R, T, \Psi, \Upsilon, \sigma_U^2, \sigma_V^2, \sigma_Q^2, \sigma_T^2\big) \propto P(R \mid U, V, \sigma^2)\, P(U \mid \sigma_U^2)\, P(V \mid Q, \Psi, \Upsilon, \sigma_V^2)\, P(Q \mid \sigma_Q^2)\, P(T \mid U, V, \Psi, \sigma_T^2)\, P(V \mid U, T, \sigma_V^2). \quad (13)$$

The negative log-posterior of the variables U, V and Q is then obtained from Eq. (13) as the objective function:

$$\begin{aligned} L(U, V, Q, R, T, \Psi, \Upsilon) &= \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{m} I_{i,j}\,\big(r_{i,j} - u_i^T v_j\big)^2 + \frac{\lambda_U}{2} \|U\|_F^2 \\ &\quad + \frac{\lambda_V}{2} \sum_{j=1}^{m} (v_j - o_j)^T (v_j - o_j) + \frac{\lambda_Q}{2} \|Q\|_F^2 \\ &\quad + \frac{\mu}{2} \sum_{j=1}^{m} \Big(v_j - \sum_{i=1}^{n} \Gamma(u_i)\, v_j\Big)^T \Big(v_j - \sum_{i=1}^{n} \Gamma(u_i)\, v_j\Big) \\ &\quad + \frac{\gamma}{2} \sum_{i=1}^{n} \sum_{k \in T_i} \big(r_{i,k} - u_i^T v_k\big)^2, \quad (14) \end{aligned}$$


where λ_U = σ²/σ_U², λ_V = σ²/σ_V², λ_Q = σ²/σ_Q², µ and γ are trade-off parameters in the objective function to avoid overfitting, and ‖·‖_F² denotes the squared Frobenius norm.

1) OPTIMIZATION
Here, the stochastic gradient descent (SGD) method is employed over the objective function L to optimize the variables U, V and Q coordinately. Accordingly, the latent feature vectors U and V are updated alternately via:

$$u_i \leftarrow u_i - \eta \Big[\lambda_U u_i - \sum_{j=1}^{m} I_{i,j}\, v_j \big(r_{i,j} - u_i^T v_j\big) + \gamma \sum_{k \in T_i} v_k \big(u_i^T v_k - r_{i,k}\big)\Big], \quad (15)$$

$$v_j \leftarrow v_j - \eta \Big[\sum_{i=1}^{n} I_{i,j}\, u_i \big(u_i^T v_j - r_{i,j}\big) + \lambda_V (v_j - o_j) + \mu\, v_j \Big(1 - \sum_{i=1}^{n} \Gamma(u_i)\Big)^2 + \gamma \sum_{i:\, j \in T_i} u_i \big(u_i^T v_j - r_{i,j}\big)\Big], \quad (16)$$

where η is the learning rate. While u_i and v_j are fixed, the back-propagation algorithm is employed to optimize the weights and biases in the CNN architecture [9].
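A minimal sketch of one stochastic step of Eqs. (15)-(16) for a single observed rating is given below; the interest-related γ terms, which run over the interest sets T_i, are handled analogously and omitted here for brevity:

```python
import numpy as np

def sgd_step(i, j, r_ij, U, V, O, lam_U, lam_V, mu, gamma_sum, eta):
    """One stochastic step of Eqs. (15)-(16) for a single observed rating
    r_ij; gamma_sum is the sum of interest coefficients over users."""
    u = U[:, i].copy()   # copy so the V update uses the pre-update u
    v = V[:, j].copy()
    err = r_ij - u @ v   # prediction error
    U[:, i] = u - eta * (lam_U * u - err * v)                  # Eq. (15)
    V[:, j] = v - eta * (-err * u + lam_V * (v - O[:, j])
                         + mu * (1.0 - gamma_sum) ** 2 * v)    # Eq. (16)
```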

2) PREDICTION
After convergence is achieved, rating prediction is performed via the product of the learned latent feature vectors, r̂_{i,j} = u_i^T v_j; accordingly, the candidate items are those with high predicted scores.

In contrast to previous recommender approaches, DLMR is able to explore the user's latent interests, capture information from both textual reviews and contextual information, and conduct convolutional matrix factorization with the learned interests T; therefore, DLMR can overcome data sparsity, reduce uncertainty in prediction, and provide high-quality and personalized recommendations. The process of candidates generation for DLMR is presented in Algorithm 1.

IV. CANDIDATES RANKING
Given the efficacy of the first stage of DLMR presented above, items with high prediction scores are much more likely to be recommended to users, and vice versa. Eventually, a user may be interested in only several recommended items; therefore, in the first stage of DLMR the scale of the candidate items derived from the raw item corpus decreases drastically, from millions to hundreds. DLMR highlights recommendation quality, so rating prediction alone is often not enough for satisfactory results, and candidates ranking plays a crucial role for the final top-N recommended items, which can even affect the performance of the recommender system to some extent.

Algorithm 1 Candidates Generation for DLMR
Input: textual corpus Ψ_i^{(u)}|O+, K, hyperparameters α and β; contextual information Υ;
Output: φ_{k,w}, θ_{u,k}, interests T, interest coefficient Γ; candidates list;
initialize: Γ, θ ~ Dirichlet(α), φ ~ Dirichlet(β), interest t_{u_i} ~ Multinomial(θ_{u_i}), word w_{u_i} ~ Multinomial(φ_{t_{u_i}}), Q = {W_x, W_y, W_h, W_o, b_x, b_y, b_h, b_o}, U ~ N(0, σ_U), V ~ N(o_j, σ_V);
repeat
    Gibbs sampling according to Eq. (1);
until convergence
calculate: φ_{k,w}, θ_{u,k} according to Eq. (2);
calculate: interest coefficient Γ according to Eq. (4);
while (iteration > 0)
    update u_i according to Eq. (15);
    update v_j according to Eq. (16);
    update q_z using the back-propagation algorithm;
end while
perform prediction via r̂_{i,j} = u_i^T v_j;
return: candidates with high predicted scores.

During the second stage, a user-based denoising autoencoder (DAE) network with sigmoid activations is employed for candidates ranking in DLMR, in contrast to traditional ranking methods [47]. With three hidden layers, the DAE offers superior representation and computation; furthermore, it is flexible for the DAE to leverage available heterogeneous side information to improve recommendation performance [33], [42]. Here, side information (SI) includes the user's profile, the item's various attributes, time, venue and so on, such as demographic features and properties of items. Actually, incorporating rich side information into a deep learning network is rather challenging work [19], [21], although it benefits recommendation performance.

The graphical illustration of the DAE for candidates ranking is presented in Fig. 5. Assume the side information matrix S = {s_1, s_2, ..., s_n} ∈ R^{p×n}, with s_i = {s_{i,1}, s_{i,2}, ..., s_{i,p}}; the side information S is incorporated into the DAE as real values normalized to [0, 1].

Given the candidates list, the objective of the DAE is to re-rank the candidate items with the available side information and provide much better top-N recommendation results. Let O_i denote the generated candidate items for user u_i, let there be κ hidden neurons in the DAE, and let the partially observed rating vector of u_i be R_i = {r_{i,1}, r_{i,2}, ..., r_{i,q}} ∈ R^q; the reconstruction of R_i for user u_i is then obtained through the DAE as:

$$\hat{R}_i = h(R_i, s_i; \vartheta)\big|_{O_i} = f\Big(W_d^{(r)}\, g\big(W_e^{(r)} \cdot R_i + W_e^{(s)} \cdot s_i + b_e\big) + b_d\Big), \quad (17)$$

where ϑ = {W_e^{(r)}, W_d^{(r)}, W_e^{(s)}, W_d^{(s)}, b_e, b_d}, f(·) and g(·) are sigmoid activation functions, W_e^{(r)} ∈ R^{q×κ}, W_e^{(s)} ∈ R^{p×κ}, W_d^{(r)} ∈ R^{κ×q} and W_d^{(s)} ∈ R^{κ×p} are weight matrices, and b_e ∈ R^κ and b_d ∈ R^q are biases. These parameters can be learned using the back-propagation method, with the following objective function:

$$\arg\min_{\vartheta}\; \big\|R_i - \hat{R}_i\big|_{O_i}\big\|_F^2 + \big\|s_i - \hat{s}_i\big\|_F^2 + \lambda\Big(\big\|W_e^{(r)}\big\|_F^2 + \big\|W_e^{(s)}\big\|_F^2 + \big\|W_d^{(r)}\big\|_F^2 + \big\|W_d^{(s)}\big\|_F^2\Big). \quad (18)$$

FIGURE 5. Denoising autoencoder modeling for candidates ranking. The dark cells denote the observed ratings in R_i, and the yellow cell denotes the side information s_i.

Algorithm 2 Candidates Ranking of DLMR
Input: side information s_i; observed ratings R_i; candidates list (hundreds);
Output: top-N recommendation list;
initialize: ϑ = {W_e^{(r)}, W_d^{(r)}, W_e^{(s)}, W_d^{(s)}, b_e, b_d};
while (iteration > 0)
    reconstruct the input via Eq. (17);
end while
return: top-N recommendation with high scores.

With the learned parameters ϑ, the reconstruction R̂_i|_{O_i} = h(R_i, s_i; ϑ) can be obtained.
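The following PyTorch sketch shows one possible shape for this DAE: a κ-unit hidden layer encodes the corrupted rating vector together with the side-information vector, and decodes both, trained with the reconstruction-plus-weight-decay objective of Eq. (18). The sizes, the Gaussian corruption, and the use of Adam with weight_decay standing in for the λ terms are all illustrative assumptions:

```python
import torch
import torch.nn as nn

class RankingDAE(nn.Module):
    """Sketch of the candidates-ranking DAE of Eq. (17)."""

    def __init__(self, q=500, p=10, kappa=250):
        super().__init__()
        self.enc_r = nn.Linear(q, kappa)              # W_e^(r), b_e
        self.enc_s = nn.Linear(p, kappa, bias=False)  # W_e^(s)
        self.dec_r = nn.Linear(kappa, q)              # W_d^(r), b_d
        self.dec_s = nn.Linear(kappa, p)              # W_d^(s), reconstructs s_i

    def forward(self, r, s, noise=0.1):
        r_noisy = r + noise * torch.randn_like(r)     # denoising corruption
        h = torch.sigmoid(self.enc_r(r_noisy) + self.enc_s(s))            # g(.)
        return torch.sigmoid(self.dec_r(h)), torch.sigmoid(self.dec_s(h))  # f(.)

# training sketch for one user: reconstruct observed entries plus side info
dae = RankingDAE()
opt = torch.optim.Adam(dae.parameters(), lr=1e-3, weight_decay=0.1)
r, s = torch.rand(1, 500), torch.rand(1, 10)  # normalized ratings and side info
mask = (r > 0.5).float()                      # toy observation mask O_i
r_hat, s_hat = dae(r, s)
loss = ((mask * (r - r_hat)) ** 2).sum() + ((s - s_hat) ** 2).sum()
opt.zero_grad(); loss.backward(); opt.step()
```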

Obviously, the DAE re-ranks the candidate items by taking account of available side information, which previous research has often neglected. According to the results of the DAE, DLMR provides dozens of final recommended items with high scores for each user, which is referred to as top-N recommendation. The process of candidates ranking in DLMR is presented in detail in Algorithm 2.

V. EXPERIMENTAL ANALYSIS
In this section, to evaluate the proposed recommendation approach DLMR, we perform experiments on the Amazon Books (Amazon-b),1 Amazon Movies and TV (Amazon-m&t), and Douban Movie (Douban-m)2 [6] datasets, and compare the performance of DLMR with other benchmark recommendation methods. In addition, we also investigate the impact of the parameters on recommendation performance.

1 http://jmcauley.ucsd.edu/data/amazon/
2 https://www.douban.com/

A. DATASETS
The Amazon-b and Amazon-m&t datasets range from May 1996 to July 2014, and include a numerical score, helpfulness, time, textual reviews, descriptions and category information for each user-product pair. For Amazon-b and Amazon-m&t, we crawl the corresponding contextual description information from IMDB3 for each item, and choose helpfulness as the side information for each user. Douban-m is a collection ranging from December 2012 to March 2013, and we crawl additional textual reviews for each user-item pair and descriptive information for each item.

In general, all three datasets contain numerical ratings for user-item pairs on a 1-5 scale, textual reviews assigned by users, and contextual information for each item. Note that the Amazon-b and Amazon-m&t datasets used in this article are subsets derived from the versions provided online. With outliers and items with fewer than three ratings removed, the statistics of the three datasets are presented in Table 2, which shows that the datasets are rather sparse. Five-fold cross validation is employed in the experiments, and each dataset is split randomly into a training set (80%) and a testing set (20%).

B. EVALUATION METRICS
The essential goal of the proposed DLMR is to provide accurate top-N recommendations to users; therefore, Recall@N and Precision@N are employed as evaluation metrics:

$$\mathrm{Recall@N} = \frac{\big|O_{topN} \cap O_{adopted}\big|}{\big|O_{adopted}\big|}, \qquad \mathrm{Precision@N} = \frac{\big|O_{topN} \cap O_{adopted}\big|}{N}, \quad (19)$$

where O_{topN} is the set of top-N recommended items provided by DLMR, and O_{adopted} is the set of items adopted by the user. On the other hand, the root mean squared error (RMSE) and the mean absolute error (MAE) are also employed to evaluate the performance of DLMR. RMSE and MAE are useful metrics for rating prediction, defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{|O_{test}|} \sum_{(u,i) \in O_{test}} \big(r_{u,i} - \hat{r}_{u,i}\big)^2}, \qquad \mathrm{MAE} = \frac{1}{|O_{test}|} \sum_{(u,i) \in O_{test}} \big|r_{u,i} - \hat{r}_{u,i}\big|, \quad (20)$$

where O_{test} denotes the testing set, r_{u,i} denotes the real rating in the testing set, and r̂_{u,i} denotes the prediction obtained by DLMR.
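A minimal sketch of the four metrics follows, computed per user for Eq. (19) and over the test set for Eq. (20); the plain Python/NumPy containers are illustrative:

```python
import numpy as np

def recall_precision_at_n(top_n, adopted, N):
    """Recall@N and Precision@N of Eq. (19) for one user."""
    hits = len(set(top_n[:N]) & set(adopted))
    return hits / max(len(adopted), 1), hits / N

def rmse_mae(r_true, r_pred):
    """RMSE and MAE of Eq. (20) over the test set."""
    err = np.asarray(r_true) - np.asarray(r_pred)
    return np.sqrt(np.mean(err ** 2)), np.mean(np.abs(err))
```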

C. BENCHMARK METHODS
To evaluate the performance of DLMR, the following recommender algorithms are employed for comparison:

3 http://www.imdb.com


TABLE 2. Statistics of the three real-world datasets.

FIGURE 6. Parameter tuning for α, β, d, λU, λV, λQ, µ and γ over Amazon-b.

PMF: The probabilistic matrix factorization model proposed in [11], which conducts recommendation based only on the numerical user-item rating matrix and learns latent feature vectors for users and items respectively.

CTR: Collaborative topic regression [8] provides accurate recommendations by combining probabilistic topic modeling and traditional collaborative filtering.

CDL: Collaborative deep learning [5] performs representation learning for content information through a stacked denoising autoencoder (SDAE) and conducts collaborative filtering for rating prediction; accordingly, CDL provides accurate recommendation results.

ConvMF: ConvMF [23] introduces a convolutional neural network (CNN) to learn representation vectors for items from contextual information, which are incorporated into matrix factorization (MF) to learn feature vectors for users and items respectively.

DLMR-PMF: To investigate the performance of the proposed DLMR, DLMR-PMF denotes the candidates-generation stage of DLMR alone, without candidates ranking.

DLMR-DAE: For comparison, DLMR-DAE denotes the full proposed recommendation system, deep learning modeling for top-N recommendation with interests exploring, including interests exploring, representation vector learning for items via the CNN architecture, candidates generation and candidates ranking.

D. PARAMETERS SETTING
Parameter settings affect the performance of DLMR. Below, we investigate the settings of K, α, β, d, λ_U, λ_V, λ_Q, µ, γ, κ and λ. For the interests exploring process in sparseLDA, K is the number of latent interests, and the hyperparameters α and β of the Dirichlet distributions determine the distributions of interests and words respectively. Empirically, we set α = K/50 and β = 0.1 for interests exploring [7]. d and κ are the dimensions of the latent vectors, and the others are trade-off parameters for the regularization terms. The dimension for latent feature vector learning in CMF is set to d = 30 [11].

While exploring latent interests for users, the selected textual reviews are concatenated into one document per user. When preprocessing the textual information for the CNN architecture over the three datasets, similar to [23], we first remove stop words from the raw reviews and contextual documents, and then choose the top 20000 discriminative words according to tf-idf values to form the two vocabularies respectively. Note that we keep 300 words for each raw document. For the CNN architecture, the dimension of the word embeddings is set to 200, and the dropout rate is set to 0.2 to avoid overfitting [5], [23].

In practice, owing to the small scale (hundreds) of the candidate items in DLMR, we set κ = 250 and λ = 0.1, and employ a (500-250-500) architecture for candidates ranking in the DAE. For Amazon-b, optimal performance is achieved with the following parameter setting: λ_U = 1, λ_V = 10, λ_Q = 0.001, µ = 0.01, γ = 10; the corresponding parameter-tuning results over Amazon-b are reported in Fig. 6. Small parameter values cause inaccurate recommendation results, whereas large values lead to overfitting. For Amazon-m&t and Douban-m, the optimal values for each parameter are shown in Table 3.


TABLE 3. Parameter settings for each dataset.

TABLE 4. Performance comparison in terms of RMSE and MAE for PMF, CTR, CDL, ConvMF, DLMR-PMF and DLMR-DAE over Amazon-b (K = 50).

TABLE 5. Performance comparison in terms of RMSE and MAE for PMF, CTR, CDL, ConvMF, DLMR-PMF and DLMR-DAE over Amazon-m&t (K = 50).

TABLE 6. Performance comparison in terms of RMSE and MAE for PMF, CTR, CDL, ConvMF, DLMR-PMF and DLMR-DAE over Douban-m (K = 30).

E. PERFORMANCE COMPARISON
Below, we carry out a series of experiments over the three datasets to evaluate the effectiveness and practicability of DLMR, and compare the performance of DLMR with the benchmark approaches in terms of RMSE, MAE, Recall@N and Precision@N.

Tables 4, 5 and 6 report the performance comparison for rating prediction of PMF, CTR, CDL, ConvMF, DLMR-PMF and DLMR-DAE, in terms of RMSE and MAE. Note that the number of latent interests per user is set to K = 50 for Amazon-b and Amazon-m&t, and K = 30 for Douban-m, and the dimension of the latent feature vectors is set to d = 10 and d = 30 respectively. The results in Tables 4, 5 and 6 indicate that:
• The RMSE and MAE values of PMF over each dataset are much larger than those of the other methods, since PMF performs recommendation taking only the numerical user-item rating matrix into consideration and neglects other available information. In addition, PMF is the basic technique underlying the other, more advanced recommender systems, such as CTR, CDL, ConvMF, DLMR-PMF and DLMR-DAE.

• CTR, CDL and ConvMF try to improve recommendation performance by introducing a topic-regression module, a stacked denoising autoencoder and a convolutional neural network, respectively, into collaborative filtering. The experimental results indicate that CTR, CDL and ConvMF perform comparably in terms of RMSE and MAE, and slightly better than PMF.

• Building on previous research, the proposed methods DLMR-PMF and DLMR-DAE outperform PMF, CTR, CDL and ConvMF significantly over Amazon-b, Amazon-m&t and Douban-m.

• DLMR-DAE, with candidates ranking, outperforms DLMR-PMF slightly.

• Overall, for each recommender approach, the performance with d = 30 is slightly better than that with d = 10 over each dataset.

As mentioned before, DLMR-DAE explores the latent interests of each user, performs convolutional matrix factorization for candidates generation with the learned interests and textual information (reviews and contextual information), and re-ranks the candidate items with available side information for top-N recommendation, which improves recommendation performance significantly. Take Amazon-b for example: with K = 50 and d = 30, the RMSE and MAE values of DLMR-DAE are 0.841 and 0.621 respectively, improvements of 7.8% and 11.9% over PMF. Compared with CTR, CDL and ConvMF, the improvements of DLMR-DAE are around 5.0% in terms of RMSE and MAE. Moreover, the RMSE and MAE values of DLMR-PMF are 0.851 and 0.637 respectively, slightly larger than those of DLMR-DAE. Obviously, these results demonstrate the superiority of DLMR-DAE over the other methods. Similar experimental results are reported over Amazon-m&t and Douban-m. On the other hand, the RMSE and MAE values over Amazon-b and Amazon-m&t are slightly larger than those over Douban-m, since Douban-m is a little denser.

The performance comparison for top-N recommendation in terms of Recall@N and Precision@N for PMF, CTR, CDL, ConvMF, DLMR-PMF and DLMR-DAE is reported in Figs. 7, 8 and 9, which indicate that:
• The Recall@N and Precision@N values share similar trends across these approaches over each dataset: Recall@N increases gradually with increasing N, whereas Precision@N decreases slowly with increasing N.


FIGURE 7. Performance comparison in terms of Recall@N and Precision@N over Amazon-b.

FIGURE 8. Performance comparison in terms of Recall@N and Precision@N over Amazon-m&t.

FIGURE 9. Performance comparison in terms of Recall@N and Precision@N over Douban-m.

• The Recall@N and Precision@N values of PMF are relatively small compared with those of CTR, CDL, ConvMF, DLMR-PMF and DLMR-DAE.

• The Recall@N and Precision@N values over Amazon-b are comparable with those over Amazon-m&t, but slightly smaller than those over Douban-m.

• DLMR-PMF and DLMR-DAE outperform PMF, CTR, CDL and ConvMF significantly in terms of Recall@N and Precision@N over Amazon-b, Amazon-m&t and Douban-m. In addition, the performance of DLMR-DAE is slightly better than that of DLMR-PMF.

Because the three datasets are rather sparse, the Recall@N and Precision@N values in Figs. 7, 8 and 9 are relatively small. For Amazon-b, the Recall@25 and Precision@25 values of DLMR-PMF are 0.279 and 0.167 respectively, slightly better than those of PMF, CTR, CDL and ConvMF; by contrast, the Recall@25 and Precision@25 values of DLMR-DAE are 0.393 and 0.198 respectively, outperforming DLMR-PMF owing to candidates ranking. Similar experimental results are obtained for Amazon-m&t and Douban-m. Obviously, DLMR-DAE obtains significant improvements compared with PMF, CTR, CDL, ConvMF and DLMR-PMF, in terms of Recall@N and Precision@N, over the three datasets.

FIGURE 10. Investigation of the number of latent interests K in terms of Recall@N over Douban-m. The optimal values of Recall@N are obtained when K = 30.

In summary, from the experimental analysis over Amazon-b, Amazon-m&t and Douban-m, we conclude that the performance of DLMR-DAE is stable and effective over real-world datasets, providing much more effective and accurate top-N recommendation than state-of-the-art RS.

F. DISCUSSION
As presented before, we have investigated the performance of DLMR over real-world datasets in contrast to PMF, CTR, CDL and ConvMF. In this section, we investigate the impacts of interests exploring and candidates ranking on the performance of DLMR. Finally, we present the computational complexity of DLMR.

1) IMPACT OF INTERESTS EXPLORING
Owing to the data sparsity of each dataset, providing accurate recommendations for users from large numbers of items is a tricky problem. Actually, users always make decisions according to their interests; therefore, it is conceivable and interpretable that recommendations according to users' interests will enhance the performance of recommender systems significantly. The procedure of latent interests exploring via sparse latent Dirichlet allocation (sparseLDA) has been presented in detail above.

Here, we investigate the impact of the number of latent interests K on recommendation performance for Douban-m; the corresponding results are reported in Fig. 10, which indicates that the optimal values of Recall@N are obtained when K is around 30.

Furthermore, we conducted further studies to gain better insight into the interests exploring of DLMR, and Fig. 11 reports the corresponding results over Douban-m. Fig. 11(a) shows the long-tail distribution of item popularity, which indicates that Douban-m is rather sparse. Fig. 11(b) is the statistic of the top-10 learned interests, and Fig. 11(c) is a heatmap of the interests coefficient as a function of the number of ratings and the number of latent interests. The results indicate that the number of items tasted by each user is relatively small in real life, and that the proper number of latent interests is around 30 over Douban-m.

Table 7 shows the top-10 candidates list and the final top-10 recommendation list provided by DLMR for three users over Amazon-b, Amazon-m&t and Douban-m respectively, according to the learned latent interests. Obviously, the top-10 recommendation list provides better recommendation results for each user than the top-10 candidates list, owing to candidates ranking.

2) IMPACT OF CANDIDATES RANKING
As stated above, the candidates list is generated according to the predictive scores; in the real world, however, the candidate items alone are often not enough to provide accurate recommendations. Actually, candidates ranking plays a crucial role in the performance of recommender systems, which has been confirmed in previous research. For DLMR, candidates ranking is conducted via a user-based denoising autoencoder (DAE) network with three hidden layers. For Amazon-b and Amazon-m&t, the helpfulness values are normalized to [0, 1] as side information for the DAE; by contrast, for Douban-m, candidates ranking is performed through the DAE without any available side information.

From Table 7, we can see that the top-10 recommendation list with candidates ranking is much better than the candidates list; the bold items denote the successful recommendations (accepted by the user). For user I on Amazon-b, the top-10 candidates list provides two successful recommendations, Atlas of Middle-earth and The Lost World; by contrast, the top-10 recommendation list provides three: Atlas of Middle-earth, The Lost World and The Lord of the Rings. For user II on Amazon-m&t, the top-10 candidates list provides two successful recommendations, Sense and Sensibility and The Best Years of Our Lives; by contrast, the top-10 recommendation list provides four: The Best Years of Our Lives, To Have and Have Not, Sense and Sensibility and Now and Then. For user III on Douban-m, the top-10 candidates list provides one successful recommendation, Avatar; by contrast, the top-10 recommendation list provides three: Avatar, The Mummy and The Green Mile. In a word, the results demonstrate the significant superiority of DLMR-DAE, since candidates ranking considers not only the numerical ratings but also the available side information.

On the other hand, experiments are performed to investigate the impact of the number of hidden units κ on the performance of the DAE, and the corresponding results are presented in Fig. 12. Recall@N increases rapidly as κ grows up to 200; once κ exceeds 300, Recall@N increases only very slowly with κ while the computational cost becomes heavy. Therefore, κ = 250 is chosen as the optimal value for Douban-m.
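Under the same assumptions, a sweep like the one behind Fig. 12 could look as follows, reusing the RankingDAE and recall_at_n sketches above; the learning rate, epoch count, and reconstruction loss are all assumptions of this sketch, not the paper's training protocol.

```python
import torch
import torch.nn.functional as F

def sweep_kappa(R, side, held_out, kappas=(50, 100, 200, 250, 300, 400)):
    """R: (num_users, num_items) training ratings scaled to [0, 1];
    side: (num_users, 1) normalized side information (zeros if none);
    held_out: per-user sets of held-out ground-truth item ids."""
    for kappa in kappas:
        model = RankingDAE(R.shape[1], side_dim=side.shape[1], kappa=kappa)
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        for _ in range(100):                          # epochs (assumed)
            opt.zero_grad()
            loss = F.mse_loss(model(R, side), R)      # reconstruct the clean input
            loss.backward()
            opt.step()
        model.eval()                                  # disable the masking noise
        with torch.no_grad():
            scores = model(R, side).numpy()
        print(kappa, recall_at_n(scores, held_out, n=10))
```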


FIGURE 11. Investigation of interests exploring over Douban-m. (a) Long-tail distribution of item popularity. (b) Top-10 learned interests. (c) Heatmap of the interests coefficient.

TABLE 7. Toy example of the top-10 candidates list and top-10 recommendation list provided by DLMR for three users over Amazon-b, Amazon-m&t, and Douban-m, respectively.

3) COMPLEXITY ANALYSIS
Firstly, we analyze the computational complexity of DLMR. In sparseLDA, the computational complexity of interests exploring for each user is $O(K_\Psi + K_w)$, in contrast to $O(K)$ for the standard collapsed Gibbs sampler, where $K_\Psi$ is the number of latent interests contained in the review corpus $\Psi$, and $K_w$ is the number of interests that word $w$ belongs to. In CMF, the computational complexities of the CNN structure and of the low-rank feature vector learning procedure are $O(dflm)$ and $O(d^2|O^+| + d^3n + d^3m)$, respectively.


FIGURE 12. Performance investigation for the number of hidden units κ of the DAE over Douban-m, in terms of Recall@5, Recall@10, Recall@20, and Recall@30.

Accordingly, the overall computational complexity of DLMR-PMF is $O(nK_\Psi + nK_w + d^2|O^+| + d^3n + d^3m + dflm)$. In the DAE, the scale of the candidates list is relatively small in contrast to the low-rank feature vector learning procedure, and its computational cost is $O(np\kappa + nq\kappa)$.
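For readability, the per-stage costs derived above can be lined up side by side (same symbols as in the preceding paragraphs; this is only a restatement, not a new result):

\[
\underbrace{O(nK_\Psi + nK_w)}_{\text{sparseLDA}}
\;+\; \underbrace{O(dflm)}_{\text{CNN}}
\;+\; \underbrace{O(d^2|O^+| + d^3 n + d^3 m)}_{\text{feature learning}}
\;+\; \underbrace{O(np\kappa + nq\kappa)}_{\text{DAE ranking}}
\]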

All experiments are performed in Python on a PC with an Intel i7-8700K CPU and an NVIDIA GTX 1080 Ti GPU. For Amazon-b and Amazon-m&t, DLMR-PMF needs about 200 epochs to converge in the training phase, with each epoch taking 40 s on average; by contrast, the DAE needs about 100 epochs. For Douban-m, DLMR-PMF needs 250 epochs in the training phase, each taking about 10 s on average. By contrast, the computational cost of an online query is negligible.

VI. CONCLUSION
At present, recommender systems are widely applied in many areas, and it is promising to incorporate deep learning methods to enhance their performance. In this article, a novel deep learning based recommender system, DLMR, is developed, which alleviates the data sparsity problem and provides high-performance top-N recommendation. The essential goal of DLMR is to leverage the available information for accurate recommendation. To this end, DLMR explores each user's interest distribution via the probabilistic topic model sparseLDA, captures information from both textual reviews and contextual information via a convolutional neural network, and performs convolutional matrix factorization to learn low-rank feature vectors for users and items, respectively; the candidates list is then formed according to the prediction scores.

Furthermore, to improve the recommendation performance, DLMR performs candidates ranking via a three-layer denoising autoencoder with heterogeneous side information to produce the final top-N recommendation list.

As a hybrid model of the traditional matrix factorization technique and deep learning methods, DLMR is able to overcome the data sparsity problem, provide personalized recommendations based on the learned interests, and scale easily to large datasets. In addition, performance analysis over real-world datasets verifies that DLMR is an effective and efficient recommender system and significantly outperforms state-of-the-art approaches.

A major limitation of DLMR is that it is hard to capture the change of a user's interests over time. Therefore, as future work, we intend to enhance the interests exploring module and introduce the user's social network into DLMR to further boost the recommendation performance.

REFERENCES
[1] C. A. Gomez-Uribe and N. Hunt, ‘‘The Netflix recommender system: Algorithms, business value, and innovation,’’ ACM Trans. Manage. Inf. Syst., vol. 6, no. 4, 2016, Art. no. 13.

[2] J. Davidson et al., ‘‘The YouTube video recommendation system,’’ in Proc. ACM Conf. Recommender Syst., 2010, pp. 293–296.

[3] S. Zhang, L. Yao, Y. Tay, and A. Sun. (Jul. 2017). ‘‘Deep learning based recommender system: A survey and new perspectives.’’ [Online]. Available: https://arxiv.org/abs/1707.07435

[4] Q. Liu, E. Chen, H. Xiong, C. H. Q. Ding, and J. Chen, ‘‘Enhancing collaborative filtering by user interest expansion via personalized ranking,’’ IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 1, pp. 218–233, Feb. 2012.

[5] H. Wang, N. Wang, and D.-Y. Yeung, ‘‘Collaborative deep learning for recommender systems,’’ in Proc. 21st ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2015, pp. 1235–1244.

[6] X. Qian, H. Feng, G. Zhao, and T. Mei, ‘‘Personalized recommendation combining user interest and social circle,’’ IEEE Trans. Knowl. Data Eng., vol. 26, no. 7, pp. 1763–1777, Jul. 2014.

[7] D. M. Blei, A. Y. Ng, and M. I. Jordan, ‘‘Latent Dirichlet allocation,’’ J. Mach. Learn. Res., vol. 3, pp. 993–1022, Mar. 2003.

[8] C. Wang and D. M. Blei, ‘‘Collaborative topic modeling for recommending scientific articles,’’ in Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2011, pp. 448–456.

[9] J. Gao, P. Pantel, M. Gamon, X. He, and L. Deng, ‘‘Modeling interestingness with deep neural networks,’’ in Proc. Conf. Empirical Methods Natural Lang. Process. (EMNLP), 2014, pp. 2–13.

[10] S. Yang, G. Huang, Y. Xiang, X. Zhou, and C.-H. Chi, ‘‘Modeling user preferences on spatiotemporal topics for point-of-interest recommendation,’’ in Proc. IEEE Int. Conf. Services Comput., Jun. 2017, pp. 204–211.

[11] R. Salakhutdinov and A. Mnih, ‘‘Probabilistic matrix factorization,’’ in Proc. Adv. Neural Inf. Process. Syst., 2008, pp. 1257–1264.

[12] Y. Koren, ‘‘Factorization meets the neighborhood: A multifaceted collaborative filtering model,’’ in Proc. 14th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2008, pp. 426–434.

[13] T. Wu, L. Chen, X. Xian, and Y. Guo, ‘‘Evolution prediction of multi-scale information diffusion dynamics,’’ Knowl.-Based Syst., vol. 113, pp. 186–198, Dec. 2016.

[14] R. Forsati, M. Mahdavi, M. Shamsfard, and M. Sarwat, ‘‘Matrix factorization with explicit trust and distrust side information for improved social recommendation,’’ ACM Trans. Inf. Syst., vol. 32, no. 4, Oct. 2014, Art. no. 17.

[15] X. Ren, M. Song, E. Haihong, and J. Song, ‘‘Context-aware probabilistic matrix factorization modeling for point-of-interest recommendation,’’ Neurocomputing, vol. 241, pp. 38–55, Jun. 2017.

[16] Y. Shi, M. Larson, and A. Hanjalic, ‘‘Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges,’’ ACM Comput. Surv., vol. 47, no. 1, 2014, Art. no. 3.


[17] X. He and T.-S. Chua, ‘‘Neural factorization machines for sparse predictive analytics,’’ in Proc. 40th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2017, pp. 355–364.

[18] T. Ebesu and Y. Fang, ‘‘Neural citation network for context-aware citation recommendation,’’ in Proc. 40th Int. ACM SIGIR Conf. Res. Develop. Inf. Retr., 2017, pp. 1093–1096.

[19] P. Covington, J. Adams, and E. Sargin, ‘‘Deep neural networks for YouTube recommendations,’’ in Proc. 10th ACM Conf. Recommender Syst., 2016, pp. 191–198.

[20] B. Bai, Y. Fan, W. Tan, and J. Zhang, ‘‘DLTSR: A deep learning framework for recommendation of long-tail Web services,’’ IEEE Trans. Services Comput., to be published, doi: 10.1109/TSC.2017.2681666.

[21] X. Dong, L. Yu, Z. Wu, Y. Sun, L. Yuan, and F. Zhang, ‘‘A hybrid collaborative filtering model with deep structure for recommender systems,’’ in Proc. AAAI, 2017, pp. 1309–1315.

[22] F. Strub, R. Gaudel, and J. Mary, ‘‘Hybrid recommender system based on autoencoders,’’ in Proc. 1st Workshop Deep Learn. Recommender Syst., 2016, pp. 11–16.

[23] D. Kim, C. Park, J. Oh, S. Lee, and H. Yu, ‘‘Convolutional matrix factorization for document context-aware recommendation,’’ in Proc. 10th ACM Conf. Recommender Syst., 2016, pp. 233–240.

[24] L. Wu et al., ‘‘Modeling the evolution of users’ preferences and social links in social networking services,’’ IEEE Trans. Knowl. Data Eng., vol. 29, no. 6, pp. 1240–1253, Jun. 2017.

[25] W. Zhou, J. Li, M. Zhang, and J. Ning, ‘‘Incorporating social network and user’s preference in matrix factorization for recommendation,’’ Arabian J. Sci. Eng., pp. 1–15, Jun. 2018.

[26] R. Salakhutdinov and A. Mnih, ‘‘Bayesian probabilistic matrix factorization using Markov chain Monte Carlo,’’ in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 880–887.

[27] Y. Koren, R. Bell, and C. Volinsky, ‘‘Matrix factorization techniques for recommender systems,’’ IEEE Comput., vol. 42, no. 8, pp. 30–37, Aug. 2009.

[28] M. Jiang, P. Cui, F. Wang, W. Zhu, and S. Yang, ‘‘Scalable recommendation with social contextual information,’’ IEEE Trans. Knowl. Data Eng., vol. 26, no. 11, pp. 2789–2802, Nov. 2014.

[29] W. Zhou, J. Li, and Q. Xue, ‘‘Social network enhanced collective recommendation,’’ in Proc. 14th Int. Comput. Conf. Wavelet Act. Media Technol. Inf. Process. (ICCWAMTIP), 2017, pp. 251–257.

[30] Y. Shi, M. Larson, and A. Hanjalic, ‘‘Mining contextual movie similarity with matrix factorization for context-aware recommendation,’’ ACM Trans. Intell. Syst. Technol., vol. 4, no. 1, 2013, Art. no. 16.

[31] X. Li and J. She, ‘‘Collaborative variational autoencoder for recommender systems,’’ in Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2017, pp. 305–314.

[32] G. E. Hinton, S. Osindero, and Y.-W. Teh, ‘‘A fast learning algorithm for deep belief nets,’’ Neural Comput., vol. 18, no. 7, pp. 1527–1554, 2006.

[33] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, ‘‘Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion,’’ J. Mach. Learn. Res., vol. 11, no. 12, pp. 3371–3408, Dec. 2010.

[34] M. Zhang, H. Qu, X. Xie, and J. Kurths, ‘‘Supervised learning in spiking neural networks with noise-threshold,’’ Neurocomputing, vol. 219, pp. 333–349, Jan. 2017.

[35] J. B. P. Vuurens, M. Larson, and A. P. de Vries, ‘‘Exploring deep space: Learning personalized ranking in a semantic space,’’ in Proc. 1st Workshop Deep Learn. Recommender Syst., 2016, pp. 23–28.

[36] A. Sehgal and N. Kehtarnavaz, ‘‘A convolutional neural network smartphone app for real-time voice activity detection,’’ IEEE Access, vol. 6, pp. 9017–9026, 2018.

[37] L. Zheng, V. Noroozi, and P. S. Yu, ‘‘Joint deep modeling of users and items using reviews for recommendation,’’ in Proc. 10th ACM Int. Conf. Web Search Data Mining, 2017, pp. 425–434.

[38] T. Bansal, D. Belanger, and A. McCallum, ‘‘Ask the GRU: Multi-task learning for deep text recommendations,’’ in Proc. 10th ACM Conf. Recommender Syst., 2016, pp. 107–114.

[39] R. Catherine and W. Cohen. (Jun. 2017). ‘‘TransNets: Learning to transform for recommendation.’’ [Online]. Available: https://arxiv.org/abs/1704.02298

[40] S. Cao, N. Yang, and Z. Liu, ‘‘Online news recommender based on stacked auto-encoder,’’ in Proc. IEEE/ACIS 16th Int. Conf. Comput. Inf. Sci. (ICIS), May 2017, pp. 721–726.

[41] R. Salakhutdinov, A. Mnih, and G. Hinton, ‘‘Restricted Boltzmann machines for collaborative filtering,’’ in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 791–798.

[42] S. Sedhain, A. K. Menon, S. Sanner, and L. Xie, ‘‘AutoRec: Autoencoders meet collaborative filtering,’’ in Proc. 24th Int. Conf. World Wide Web, 2015, pp. 111–112.

[43] Y. Wu, C. DuBois, A. X. Zheng, and M. Ester, ‘‘Collaborative denoising auto-encoders for top-N recommender systems,’’ in Proc. 9th ACM Int. Conf. Web Search Data Mining, 2016, pp. 153–162.

[44] J. Yuan et al., ‘‘LightLDA: Big topic models on modest computer clusters,’’ in Proc. 24th Int. Conf. World Wide Web, 2015, pp. 1351–1361.

[45] L. Yao, D. Mimno, and A. McCallum, ‘‘Efficient methods for topic model inference on streaming document collections,’’ in Proc. 15th ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, 2009, pp. 937–946.

[46] K. Muhammad, J. Ahmad, I. Mehmood, S. Rho, and S. W. Baik, ‘‘Convolutional neural networks based fire detection in surveillance videos,’’ IEEE Access, vol. 6, pp. 18174–18183, 2018.

[47] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme, ‘‘BPR: Bayesian personalized ranking from implicit feedback,’’ in Proc. Conf. Uncertainty Artif. Intell., 2009, pp. 452–461.

WANG ZHOU received the B.S. degree in information security and the M.S. degree in communication and information system from the College of Electronics and Information Engineering, Sichuan University. He is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. His current research interests include deep learning, recommender systems, and data mining.

JIANPING LI received the Ph.D. degree in computer science from Chongqing University. He is currently a Professor and the Vice Dean with the School of Computer Science and Engineering, University of Electronic Science and Technology of China, and the Director of the International Centre for Wavelet Analysis and Its Applications, Logistical Engineering University of China. He has authored sixteen academic books and over 150 journal and refereed conference publications in his research areas. His research interests include wavelet analysis and its applications, information security, biometric recognition, and personal authentication and its applications. He is one of the Founders and an Associate Editor of the International Journal of Wavelets, Multiresolution and Information Processing.

MALU ZHANG is currently pursuing the Ph.D. degree with the Department of Computer Science and Engineering, University of Electronic Science and Technology of China. His current research interests include neural networks, machine learning, and big data.


YAZHOU WANG received the B.S. degree from Southwest University, China, and the M.S. degree from Sichuan University, China. He is currently pursuing the Ph.D. degree with the School of Optoelectronic Information, University of Electronic Science and Technology of China. He is also a Visiting Student with DTU Fotonik, Technical University of Denmark. His research interests include algorithm design of nonlinear optics, infrared fiber lasers, ultrafast terahertz pulses, and laser supercontinuum.

FADIA SHAH received the M.S. degree in computer science. She is currently pursuing the Ph.D. degree with the School of Computer Science and Engineering, University of Electronic Science and Technology of China. She has vast academic, technical, and professional experience in Pakistan. Her research interests include medical big data, IoT, SDN, e-Health and tele-medicine, and related technologies and algorithms. She received the Excellent Students Academics Award in 2016.
