Parallelizing Random Walk with Restart for Large-Scale Query Recommendation
![Page 1: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/1.jpg)
Parallelizing Random Walk with Restart for Large-Scale Query Recommendation
Meng-Fen Chiang, Tsung-Wei Wang and Wen-Chih Peng
Department of Computer Science
National Chiao Tung University (R.O.C.)
![Page 2: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/2.jpg)
Outline
• Introduction
• Related Work
• Problem Definition
• Parallel RWR
  – Temporal following pattern mining
  – Recommendation graph construction
  – Random walk with restart for multiple queries
• Experimental Results
• Conclusion
![Page 3: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/3.jpg)
Introduction
• Yahoo! Asia Knowledge Plus (AKP)
[Figure: AKP page with "Question" and "Answer" labels]
![Page 4: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/4.jpg)
Introduction (contd.)
• User access log
  – Consider a QA pair as an item
  – A sequence of items clicked by a user
  – Typically, what a user looks for during a short period shares certain topics
    • Within 4 min, 18 sec.: "Upload photos to Facebook"
![Page 5: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/5.jpg)
Introduction (contd.)
• Random Walk with Restart (RWR)
  – Compute relevance scores of a set of nodes for a query node
  – Query node: Node 4

Node      Relevance score
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02

[Figure: 12-node example graph annotated with the relevance scores listed above]
![Page 7: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/7.jpg)
Related Work
• Random Walk with Restart (RWR)
  – Off-line mode
    • Pre-compute required information off-line
      – Pros: fast on-line recommendation for a query
      – Cons: prohibitive storage consumption
  – On-line mode
    • Compute recommendation for a query on-line
      – Pros: less storage consumption
      – Cons: longer response time
  – Fast RWR
    • Less storage consumption
    • Fast on-line response time for a query
![Page 8: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/8.jpg)
Related Work (contd.)
• Scalable recommendation
  – SmartMiner
    • Identify user sessions
    • Mine frequent navigation patterns
  – Personalized community recommendation
    • 312 K active users, 109 K popular communities
    • Training time ~ 14 mins (200 nodes)
  – Personalized news recommendation
    • Handle streaming content
    • No explicit runtime analysis of off-line training and on-line recommendation
![Page 10: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/10.jpg)
Problem Definition
• Goal
  – Given user click logs and a query item I
  – Recommend relevant items w.r.t. I
• Requirements
  – Effectiveness
    • Mine frequent navigation patterns from click logs
  – Scalability
    • Efficiently manage large-scale click logs within a few hours
      – Parallelization of RWR
      – Parallelization of RWR for multiple query nodes
![Page 11: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/11.jpg)
Outline
• Introduction
• Related Work
• Problem Definition
• A framework for scalable recommendation
  – Temporal following pattern mining
  – Recommendation graph construction
  – Random walk with restart for multiple queries
• Experimental Results
• Conclusion
![Page 12: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/12.jpg)
System Architecture
[Figure: system architecture. Legend: Input, Off-Line Computation, Storage]
• User Access Logs
• → Temporal Following Pattern Mining (parameters: 1. window size, 2. bin size)
• → Item ID : <Item List> . . .
• → Recommendation Graph Construction
• → Random Walk with Restart (query items: Item 1, Item 2, . . .)
• → Item ID : <Item List> . . .
![Page 13: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/13.jpg)
Mining Temporal Following Patterns in Parallel
[Figure: system architecture diagram (User Access Logs, Temporal Following Pattern Mining, Recommendation Graph Construction, Random Walk with Restart)]
![Page 14: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/14.jpg)
Temporal Following Relation
• Frequent QA browsing behaviors of users within a pre-defined time window
  – E.g., window size = 150 sec.
• User click stream: Item 1 (t = 0), Item 2 (t = 30), Item 3 (t = 70), Item 4 (t = 160)
• Temporal following relations:
  – <Item 1, Item 2> : dt = 30
  – <Item 1, Item 3> : dt = 70
  – <Item 1, Item 4> : dt = 160
  – . . .
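The pair extraction on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `temporal_following_pairs`, the `(item, time)` tuple layout, and the choice to treat the window bound as inclusive are all assumptions. It reuses the click stream from the slide with a 150 sec. window.

```python
from typing import List, Tuple

Click = Tuple[str, int]  # (item id, click time in seconds); layout is an assumption


def temporal_following_pairs(clicks: List[Click], window: int) -> List[Tuple[str, str, int]]:
    """Return (earlier item, later item, dt) for every pair with dt <= window."""
    ordered = sorted(clicks, key=lambda c: c[1])
    pairs = []
    for i, (item_i, t_i) in enumerate(ordered):
        for item_j, t_j in ordered[i + 1:]:
            dt = t_j - t_i
            if dt > window:
                break  # clicks are time-ordered, so every later click is farther away
            if item_j != item_i:
                pairs.append((item_i, item_j, dt))
    return pairs


# Click stream from the slide: Item 1 at 0 s, Item 2 at 30 s, Item 3 at 70 s, Item 4 at 160 s.
stream = [("Item 1", 0), ("Item 2", 30), ("Item 3", 70), ("Item 4", 160)]
pairs = temporal_following_pairs(stream, window=150)
for pair in pairs:
    print(pair)
```

Under these assumptions the pair <Item 1, Item 4> (dt = 160) is not emitted, because 160 sec. exceeds the 150 sec. window.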
![Page 15: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/15.jpg)
Temporal Following Pattern Mining
[Figure: MapReduce data flow]
• Input: user click logs, parameters
• Mapper 1 . . . Mapper N: emit temporal following pairs for each item
  – Output (temporal following relations): <Itemi , Itemj:cntij>
• Reducer 1 . . . Reducer N: aggregate temporal following relation for each item
  – Output (temporal following patterns): <Itemi , <Itemj:cntij, …, Itemz:cntiz>>
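The map and reduce steps in this data flow can be simulated in a single Python process. The sketch below is an assumption-laden illustration rather than the authors' Hadoop job: `map_session`, `reduce_item`, and `run_job` are hypothetical names, each user's click list is treated as one mapper input, and the "bin size" parameter is left out because the slides do not define it.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Click = Tuple[str, int]  # (item id, click time in seconds); layout is an assumption


def map_session(clicks: List[Click], window: int) -> Iterator[Tuple[str, Tuple[str, int]]]:
    """Mapper: emit <Item_i, Item_j:1> for each temporal following pair in one click stream."""
    ordered = sorted(clicks, key=lambda c: c[1])
    for i, (item_i, t_i) in enumerate(ordered):
        for item_j, t_j in ordered[i + 1:]:
            if t_j - t_i > window:
                break
            if item_j != item_i:
                yield item_i, (item_j, 1)


def reduce_item(item: str, values: List[Tuple[str, int]]) -> Tuple[str, Dict[str, int]]:
    """Reducer: aggregate the counts into <Item_i, <Item_j:cnt_ij, ...>>."""
    counts: Dict[str, int] = defaultdict(int)
    for follower, cnt in values:
        counts[follower] += cnt
    return item, dict(counts)


def run_job(sessions: List[List[Click]], window: int) -> Dict[str, Dict[str, int]]:
    """Simulate the shuffle phase by grouping mapper output by key, then reduce."""
    grouped: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for clicks in sessions:
        for key, value in map_session(clicks, window):
            grouped[key].append(value)
    return dict(reduce_item(key, values) for key, values in grouped.items())


# Two hypothetical user click streams.
sessions = [
    [("A", 0), ("B", 30), ("C", 70)],
    [("A", 0), ("B", 100)],
]
patterns = run_job(sessions, window=150)
print(patterns)
```

Keying the mapper output by the leading item means each reducer sees every follower of one item, which matches the per-item pattern record shown on the slide.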
![Page 16: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/16.jpg)
Recommendation Graph Construction
[Figure: system architecture diagram (User Access Logs, Temporal Following Pattern Mining, Recommendation Graph Construction, Random Walk with Restart)]
![Page 17: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/17.jpg)
Recommendation Graph Construction
• Goal
  – Transform discovered temporal following patterns to a recommendation graph
• E.g.,
  – Temporal following patterns:
    • <Item 1, <Item2:cnt12, Item3:cnt13>>
    • <Item 4, <Item3:cnt43>>
  – Recommendation graph: nodes n1, n2, n3, n4; edges n1 → n2 (weight cnt12), n1 → n3 (weight cnt13), n4 → n3 (weight cnt43)
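One way to picture this transformation is as a nested-dictionary adjacency list. The sketch below is illustrative only: `build_graph` and `normalize` are hypothetical helpers, the edges are assumed to be directed from an item to its followers, the counts are made-up numbers, and the row normalization is an assumption added so the weights can later serve as transition probabilities.

```python
from typing import Dict

Patterns = Dict[str, Dict[str, int]]   # item -> {follower: count}
Graph = Dict[str, Dict[str, float]]    # node -> {neighbor: edge weight}


def build_graph(patterns: Patterns) -> Graph:
    """Create one node per item and one weighted edge per temporal following pattern."""
    graph: Graph = {}
    for src, followers in patterns.items():
        graph.setdefault(src, {})
        for dst, cnt in followers.items():
            graph[src][dst] = graph[src].get(dst, 0.0) + cnt
            graph.setdefault(dst, {})  # make sure pure followers are nodes too
    return graph


def normalize(graph: Graph) -> Graph:
    """Scale outgoing weights of each node to sum to 1 (an assumption, not from the slides)."""
    normalized: Graph = {}
    for src, edges in graph.items():
        total = sum(edges.values())
        normalized[src] = {dst: w / total for dst, w in edges.items()} if total else {}
    return normalized


# Hypothetical counts standing in for cnt12, cnt13 and cnt43.
patterns = {"n1": {"n2": 3, "n3": 1}, "n4": {"n3": 2}}
graph = build_graph(patterns)
transition = normalize(graph)
print(graph)
print(transition)
```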
![Page 18: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/18.jpg)
Parallelizing Random Walk with Restart
[Figure: system architecture diagram (User Access Logs, Temporal Following Pattern Mining, Recommendation Graph Construction, Random Walk with Restart)]
![Page 19: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/19.jpg)
Parallelizing Random Walk with Restart
• With single query
  – Query node: Node 4

Node      Relevance score
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02

[Figure: 12-node example graph, shown without scores and annotated with the relevance scores for query Node 4]
![Page 20: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/20.jpg)
Parallelizing RWR With Single Query
[Figure: flowchart]
• Input: user click logs, parameters, q (an item)
• Initialization: Machine 1 . . . Machine N set initial score for q
• RWR: Machine 1 . . . Machine N calculate relevance score for each item
• Convergence: Machine 1 . . . Machine N calculate difference of relevance score vectors
• Converged? No: repeat the RWR step; Yes: stop
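The loop in this flowchart (initialize, propagate scores, compare successive score vectors, stop when converged) can be sketched on a single machine as follows. This is a hedged illustration, not the distributed implementation from the talk: the function `rwr`, the restart probability `restart=0.15`, the L1 convergence test, and the handling of nodes without outgoing edges are all assumptions, and the graph is a small made-up example with row-normalized weights.

```python
from typing import Dict

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: transition probability}


def rwr(graph: Graph, query: str, restart: float = 0.15,
        tol: float = 1e-6, max_iter: int = 100) -> Dict[str, float]:
    """Iterate RWR for one query node until the score vector stops changing."""
    # Initialization: all score mass starts on the query node.
    scores = {node: (1.0 if node == query else 0.0) for node in graph}
    for _ in range(max_iter):
        # RWR step: restart mass goes to the query, the rest follows the edges.
        nxt = {node: (restart if node == query else 0.0) for node in graph}
        for src, edges in graph.items():
            if not edges:
                # Assumption: a node with no out-edges sends its mass back to the query.
                nxt[query] += (1.0 - restart) * scores[src]
                continue
            for dst, weight in edges.items():
                nxt[dst] += (1.0 - restart) * scores[src] * weight
        # Convergence step: L1 difference between successive score vectors.
        diff = sum(abs(nxt[node] - scores[node]) for node in graph)
        scores = nxt
        if diff < tol:
            break
    return scores


# Hypothetical 3-node graph; each row of weights sums to 1.
graph = {"a": {"b": 1.0}, "b": {"a": 0.5, "c": 0.5}, "c": {"a": 1.0}}
scores = rwr(graph, "a")
print(scores)
```

In the parallel version on the slide, each machine would handle a share of the items in the RWR and convergence steps; the arithmetic per item stays the same.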
![Page 21: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/21.jpg)
Parallelizing Random Walk with Restart
• With multiple queries
[Figure: copies of the 12-node example graph, one per query node, each annotated with its relevance scores]
![Page 22: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/22.jpg)
Parallelizing RWR With Multiple Queries
[Figure: flowchart]
• Input: user click logs, parameters, Q (items)
• Initialization: Machine 1 . . . Machine N set initial score for Q
• RWR (repeat until maximum iteration)
  – Mapper 1 . . . Mapper N: calculate diffusion score for each item w.r.t. each q
  – Reducer 1 . . . Reducer N: sum up diffusion score for each item w.r.t. q
• Record format: <Itemi , <q1:rs1i, …, qz:rs1z> <adjacent list>>
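The mapper and reducer roles in this flowchart can be imitated in plain Python by keeping, for every item, one score per query plus its adjacency list. The sketch below is an assumption-based illustration: `mapper`, `reducer`, and `rwr_multi` are hypothetical names, the restart probability and iteration count are arbitrary, the restart mass is injected as an extra contribution to each query node, and every node in the example graph has outgoing edges so no mass is lost.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

Graph = Dict[str, Dict[str, float]]  # item -> {neighbor: transition probability}


def mapper(scores: Dict[str, float], edges: Dict[str, float],
           restart: float) -> Iterator[Tuple[str, Tuple[str, float]]]:
    """Emit the diffusion score sent to each neighbor, once per query q."""
    for q, score in scores.items():
        for dst, weight in edges.items():
            yield dst, (q, (1.0 - restart) * score * weight)


def reducer(contributions: List[Tuple[str, float]]) -> Dict[str, float]:
    """Sum up the diffusion scores received by one item, separately for each q."""
    total: Dict[str, float] = defaultdict(float)
    for q, value in contributions:
        total[q] += value
    return dict(total)


def rwr_multi(graph: Graph, queries: List[str], restart: float = 0.15,
              max_iter: int = 50) -> Dict[str, Dict[str, float]]:
    """Run RWR for all queries together for a fixed number of iterations."""
    # Initialization: each query node holds score 1.0 for itself.
    state = {item: ({item: 1.0} if item in queries else {}) for item in graph}
    for _ in range(max_iter):
        grouped: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
        for item, edges in graph.items():
            for dst, contribution in mapper(state[item], edges, restart):
                grouped[dst].append(contribution)
        for q in queries:
            grouped[q].append((q, restart))  # restart mass returns to the query node
        state = {item: reducer(grouped.get(item, [])) for item in graph}
    return state


# Hypothetical 3-node graph; each row of weights sums to 1.
graph = {"a": {"b": 1.0}, "b": {"a": 0.5, "c": 0.5}, "c": {"a": 1.0}}
result = rwr_multi(graph, ["a", "c"])
print(result)
```

Carrying all query scores in one record per item means the graph is scanned once per iteration regardless of the number of queries, which is one plausible reading of why the talk treats multiple queries together.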
![Page 23: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/23.jpg)
Parallelizing RWR With Multiple Queries
• Diffusion score for each item w.r.t. q
• Sum up diffusion scores for each item w.r.t. q
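For reference, the two steps named on this slide line up with the textbook form of RWR. The notation below is an assumption and may differ from the authors' formulas: \alpha is the restart probability, w_{ji} is the normalized weight of the edge from item j to item i, and r_q^{(t)}(i) is the relevance score of item i w.r.t. query q after t iterations.

```latex
% Diffusion score sent from item j to item i w.r.t. query q (mapper side)
d_q^{(t)}(j \to i) = (1 - \alpha)\, w_{ji}\, r_q^{(t)}(j)

% Summed diffusion scores plus restart mass (reducer side)
r_q^{(t+1)}(i) = \alpha \,[\, i = q \,] + \sum_{j \,:\, (j,i) \in E} d_q^{(t)}(j \to i)
```

Here [i = q] equals 1 when item i is the query item and 0 otherwise.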
![Page 25: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/25.jpg)
Experimental Setup
• Yahoo! Asia Knowledge Plus (AKP)
  – Duration: 1 week in July 2009
  – #clicks: 90 M
  – #items: 4 M
  – #users: 2 M
• Performance evaluation
  – Quality study
  – Scalability study
  – Case study
![Page 26: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/26.jpg)
Quality Study
• User access logs
  – Train: 80%
  – Test: 20%
• Ground truth
  – For each item I clicked by user U
  – The set of items clicked by U after I within T sec.
• Measure the similarity with historical user click logs
  – Item-precision
  – Item-recall
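The slides do not spell out how item-precision and item-recall are computed. A minimal sketch, assuming the usual set-based definitions (shared items divided by the number of recommended items, and by the number of ground-truth items), looks like this; the function name and the sample item ids are hypothetical.

```python
from typing import Iterable, Tuple


def item_precision_recall(recommended: Iterable[str],
                          ground_truth: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall of a recommendation list (assumed definition)."""
    rec, truth = set(recommended), set(ground_truth)
    hits = len(rec & truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical example: 4 recommended items, 3 items actually clicked within T sec.
precision, recall = item_precision_recall(["a", "b", "c", "d"], ["b", "d", "e"])
print(precision, recall)
```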
![Page 27: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/27.jpg)
Quality Study (contd.)
– Top-k hot items in the category of test item (HC)
– Temporal following pattern (TFP)
– RWR based on temporal following pattern (RWRTFP)
  • Higher precision & recall
![Page 28: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/28.jpg)
Scalability Study
• Temporal following pattern (TFP)
  – 4.1M items
  – 40 sec.
• RWR based on temporal following pattern (RWRTFP)
  – Size of input data
  – Number of computing nodes
![Page 29: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/29.jpg)
Scalability Study (contd.)
• Computational cost is significantly reduced as the number of machines increases
• The more queries, the more cost-effective the computation
  – 0.74 sec. (2K queries) → 0.49 sec. (10K queries)
![Page 30: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/30.jpg)
Case Study
• Query item
  – "What can I do if I do not have Word?"
![Page 31: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/31.jpg)
Conclusion
• Proposes a parallel RWR for multiple query recommendation
  – Parallelize mining frequent navigation behavior
  – Parallelize RWR
  – Compute RWR for multiple queries in parallel
• The recommender system
  – General
  – Content-agnostic
![Page 32: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/32.jpg)
Q & A
![Page 33: Parallelizing Random Walk with Restart for Large-Scale Query Recommendation](https://reader035.fdocuments.net/reader035/viewer/2022062410/5681586a550346895dc5cb67/html5/thumbnails/33.jpg)
Temporal Following Pattern Mining
[Figure: MapReduce data flow]
• Input: user click logs, parameters
• Mapper 1 . . . Mapper N: emit temporal following pairs for each item
  – Output (temporal following relations): <Itemi , Itemj:dtij>
• Reducer 1 . . . Reducer N: aggregate temporal following relation for each item
  – Output (temporal following patterns): <Itemi , <Itemj:dtij, …, Itemz:dtiz>>