Decay-based Ranking for Social Application Content · Experiment #3 effect of time decay function...
Decay-based Ranking for Social Application Content
Wolfgang Nejdl, Claudia Niederée, George Papadakis
Maximilian Peters
Introduction
- social web: wikis, blogs
- user-generated content (UGC): a lot of effortless sharing
- information overload
- hard to find relevant information
- heterogeneous information space
Problem
- information overload
- heterogeneous information space of social apps: formats are not interconnected
- classical IR methods are not adequate
- people don't find relevant information easily
Idea
- present resources to users by estimating their likelihood of being used in the future, while considering their recent usage in the past
Example

Usage counts per information item over transaction batches t0 … tN; at ranking time, the counts for batch tN+1 are still unknown:

       t0   t1   t2  ...   tN   tN+1
  A     1    3    8  ...   22      ?
  B     0    0   19  ...   38      ?
  C     4    1    9  ...  157      ?
  D     8    3   16  ...    3      ?
  E     5    0   78  ...    0      ?

After the next batch, the actual usage is revealed:

       t0   t1   t2  ...   tN   tN+1
  A     1    3    8  ...   22      3
  B     0    0   19  ...   38    190
  C     4    1    9  ...  157     28
  D     8    3   16  ...    3     17
  E     5    0   78  ...    0      1
Idea
- rank information items after the n-th transaction batch so that their average ranking position after the (n+1)-th transaction batch is minimized
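The minimization target above can be sketched as a small evaluation function. This is our illustration, not the paper's code; the names `average_rank`, `ranking`, and `next_batch` are assumptions.

```python
# Sketch: score a ranking by the average position its items
# occupy among the usages of the next transaction batch.

def average_rank(ranking, next_batch):
    # 1-based rank of every item in the current ranking
    position = {item: i + 1 for i, item in enumerate(ranking)}
    # each usage in batch n+1 contributes the rank of the item used
    used = [position[item] for item in next_batch if item in position]
    return sum(used) / len(used) if used else float("inf")

# ranking produced after batch n vs. items actually used in batch n+1
ranking = ["B", "C", "D", "A", "E"]
next_batch = ["B", "B", "C", "A"]         # repeated use counts each time
print(average_rank(ranking, next_batch))  # -> (1 + 1 + 2 + 4) / 4 = 2.0
```

A lower average rank means the frequently used items were placed near the top, which is exactly what the decay-based valuation tries to achieve.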
Approach
- assign a value to each information item, taking into account recency of usage and degree of usage
- rank items according to this value
4 Parameters
1. size of transaction batch: number of transactions that triggers an update of the information space
2. size of sliding window: number of transactions considered when calculating an information item's value
3. time decay function
4. accesses/editings (a/e) ratio
Time Decay Function
- calculates an item's value based on its usage history
- several time decay functions, distinguishable by their rate of decay
- steep rate of decay: emphasizes recency
- slow rate of decay: emphasizes degree of usage
- e.g. exponential, polynomial, logarithmic
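The three decay families named above can be sketched as follows. The exact formulas and parameter values are our assumptions for illustration, not the paper's definitions:

```python
import math

def exp_decay(age, rate=0.5):
    # steep decay: old usages vanish quickly, so recency dominates
    return math.exp(-rate * age)

def poly_decay(age, power=1.0):
    # polynomial decay: a middle ground between the two extremes
    return 1.0 / (1.0 + age) ** power

def log_decay(age, base=2.0):
    # slow decay: old usages still count, so degree of usage dominates
    return 1.0 / (1.0 + math.log(1.0 + age, base))

def value(usage_ages, decay):
    # usage history = ages (in transactions) of each past usage inside
    # the sliding window; item value = sum of decayed weights
    return sum(decay(a) for a in usage_ages)

# an item used once just now vs. one used ten times long ago:
fresh_once = [0]
old_heavy = list(range(50, 60))
print(value(fresh_once, exp_decay) > value(old_heavy, exp_decay))  # True
print(value(old_heavy, log_decay) > value(fresh_once, log_decay))  # True
```

Under exponential decay the single fresh usage wins (recency), while under logarithmic decay the heavily used old item wins (degree of usage); a polynomial decay sits between them.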
4 Experiments: evaluate the 4 parameters

datasets:
- L3S internal wiki (small information space)
- CMS in OKKAM (big information space)

                  L3S wiki   OKKAM cms
  #days                539         367
  #transactions    237,118      33,808
  #accesses        224,402      28,848
  #editings         12,716       4,960
  #pages             2,097         646
Experiment #1 effect of transaction batch size on performance
[Figure: performance (y-axis, 0-90) vs. transaction batch size (x-axis, 1-100) for L3S wiki and OKKAM cms]
result: update value of all information items with every new transaction to gain highest performance
Experiment #2 effect of sliding window size on performance
[Figure: performance (y-axis, 15-29) vs. sliding window size (x-axis, 1-20) for L3S wiki and OKKAM cms]
result: best performance for window size w between 13,000 and 14,000 transactions
Experiment #3 effect of time decay function on performance
- LOG performs worst: slow decay, captures only degree of usage
- EXP = LRU: steep decay, captures only recency
- PLN performs best: balances recency and degree of usage
[Figure: performance (y-axis, 0-150) per decay function (LRU, EXP, polynomial P0.25-P3.00, logarithmic L1.1-L100) for L3S wiki and OKKAM cms]
Experiment #3 effect of time decay function on performance
my experiment:
- compare PLN over LRU on 200 non-subsequent usages to PLN over LRU on 20×10 subsequent usages
observation: in the L3S wiki, PLN performs 10% better than LRU; in the OKKAM cms, it performs only 3% better than LRU
how come?
result:
- non-subsequent PLN/LRU > subsequent PLN/LRU
- the L3S wiki has fewer subsequent transactions than the OKKAM cms
Experiment #4 effect of a/e ratio on performance
scenario: if one user updates a resource in a social app, what would you expect others to do?
others want to keep up to date, so they access it
what would you expect to be more important for ranking an information item?
result contrasts the assumption: editings are not more important than accesses, even though an editing is followed by many updates on the edited resource
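The a/e ratio of Experiment #4 can be sketched as a per-transaction-type weight on top of the decayed value. The function and parameter names here are our own, and the decay function is an arbitrary placeholder:

```python
# Sketch: value an item from its usage history, where accesses may
# weigh differently from editings; ae_ratio scales accesses while
# the editing weight is fixed at 1.0.

def weighted_value(events, ae_ratio, decay=lambda age: 1.0 / (1.0 + age)):
    # events: (age, kind) pairs, kind is "access" or "edit"
    total = 0.0
    for age, kind in events:
        weight = ae_ratio if kind == "access" else 1.0
        total += weight * decay(age)
    return total

history = [(0, "edit"), (1, "access"), (2, "access")]
# ae_ratio = 1.0 treats accesses and editings alike, in line with the
# finding that editings are not intrinsically more important
print(round(weighted_value(history, ae_ratio=1.0), 3))  # -> 1.833
```

Sweeping `ae_ratio` over a grid (as the x-axis of the following figure does, from 0.1 to 2.0) then shows how much weighting one transaction type affects the resulting ranking.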
Experiment #4 effect of a/e ratio on performance
[Figure: performance (y-axis, 1-1000, log scale) vs. a/e ratio (x-axis, 0.1-2.0) for L3S wiki and OKKAM cms]
Conclusion
- a novel method of information valuation that remedies information overload in the context of social applications
- ranks content by its likelihood of being used in the future
- balances recency and degree of usage
- applicable to several environments thanks to its parameters
- outperforms the methods normally employed
- though not yet perfect: average rank of 10 for the small, 20-30 for the big information space
Related Work
- information valuation is a branch of "Information Lifecycle Management" (ILM)
- most related approaches quantify on a binary scale (intensively vs. slightly used resources); this approach ranks on a floating scale
- data streams use time decay models as well; data stream methods focus on approximating the highest streams through summarization, while this approach focuses on accuracy in ranking
Future Work
- propagate the value of an information resource to its neighbouring resources to improve the overall ranking
- purge redundant information accumulated over time in a wiki
- accelerate calculations when information resources' values are updated rapidly
- adaptation to work with queries as well