Temporal Latent Topic User Profiles for Search Personalisation

Thanh Vu, Alistair Willis, Dawei Song, The Open University, UK

Son N. Tran, City University London

The 37th European Conference on Information Retrieval, 31st of March, 2015

Search Personalisation

Return search results based on both the input query and the user's search interests

Different users who submit the same query will probably get different result lists

Even a single user may get different results for the same query at different times (e.g., "Open US")

The performance of search personalisation depends on the richness of a user profile

J. Teevan, M. R. Morris, and S. Bush. Discovering and using groups to improve personalized search. In WSDM’2009

Topic-based user profiles

Use a human-generated ontology (ODP, dmoz.org) to extract topics from all the clicked/relevant documents of a specific user, and build her profile from these topics

1. R. W. White, et al., Enhancing Personalized Search by Mining and Modeling Task Behavior. In WWW’2013
2. P. N. Bennett, et al., Modeling the impact of short- and long-term behavior on search personalization. In SIGIR’2012

Challenges for Human-Generated Ontology

New topics that are not covered by the ontology may emerge over time

Classifying each document into the correct categories, and maintaining the ontology, requires expensive human effort

Challenges for Time-awareness

Previous methods use all the clicked/relevant documents of a user to build her search profile

The documents are treated equally, without considering temporal features (i.e., the time at which each document was clicked and viewed)

The profile is therefore too broad and cannot fully express the user's current interest

1. T. T. Vu, et al., Improving search personalisation with dynamic group formation. In SIGIR’2014
2. K. Raman, et al., Toward whole-session relevance: Exploring intrinsic diversity in web search. In SIGIR’2013

Research Questions

1. How can we build user profiles with time-awareness?

2. Do the time-aware profiles help improve search performance?

Applying Latent Dirichlet Allocation

Building temporal latent topic user profiles (1)

Non-temporal method

Clicked document (1st = earliest)   Football   Law    OS     Health
1st                                 0.51       0.33   0.05   0.11
2nd                                 0.55       0.27   0.10   0.08
3rd                                 0.10       0.41   0.37   0.12
4th                                 0.10       0.21   0.65   0.04
Mean over topics                    0.32       0.30   0.29   0.09

Each row is a clicked document's distribution over topics; the mean over topics (last row) is the topic-based user profile.
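
As a minimal Python sketch of the non-temporal method, the profile is an unweighted per-topic mean of the clicked documents' topic distributions. The column order (Football, Law, OS, Health) and the function name `mean_profile` are illustrative choices, not taken from the slides.

```python
from typing import List

# Column order (illustrative names for LDA topics): Football, Law, OS, Health.
Distribution = List[float]


def mean_profile(doc_topic_dists: List[Distribution]) -> Distribution:
    """Non-temporal profile: every clicked document is weighted equally."""
    n_docs = len(doc_topic_dists)
    n_topics = len(doc_topic_dists[0])
    return [sum(dist[k] for dist in doc_topic_dists) / n_docs for k in range(n_topics)]


clicked = [
    [0.51, 0.33, 0.05, 0.11],  # 1st clicked document
    [0.55, 0.27, 0.10, 0.08],  # 2nd
    [0.10, 0.41, 0.37, 0.12],  # 3rd
    [0.10, 0.21, 0.65, 0.04],  # 4th (most recent)
]
profile = mean_profile(clicked)
print([f"{p:.4f}" for p in profile])
```

The exact means are 0.315, 0.305, 0.2925 and 0.0875, which the slide shows as 0.32, 0.30, 0.29 and 0.09.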

Building temporal latent topic user profiles (2)

Our method: weight each clicked document by recency. The most recent click gets weight 0.9^0, the click before it 0.9^1, and so on; the weighted topic distributions are summed and normalised to give the temporal topic user profile.

After the 1st click
Clicked document              Weight   Football   Law    OS     Health
1st                           0.9^0    0.51       0.33   0.05   0.11
Temporal topic user profile            0.51       0.33   0.05   0.11

After the 2nd click
Clicked document              Weight   Football   Law    OS     Health
1st                           0.9^1    0.51       0.33   0.05   0.11
2nd                           0.9^0    0.55       0.27   0.10   0.08
Temporal topic user profile            0.53       0.30   0.08   0.09

After the 3rd click
Clicked document              Weight   Football   Law    OS     Health
1st                           0.9^2    0.51       0.33   0.05   0.11
2nd                           0.9^1    0.55       0.27   0.10   0.08
3rd                           0.9^0    0.10       0.41   0.37   0.12
Temporal topic user profile            0.37       0.34   0.19   0.10

After the 4th click
Clicked document              Weight   Football   Law    OS     Health
1st                           0.9^3    0.51       0.33   0.05   0.11
2nd                           0.9^2    0.55       0.27   0.10   0.08
3rd                           0.9^1    0.10       0.41   0.37   0.12
4th                           0.9^0    0.10       0.21   0.65   0.04
Temporal topic profile                 0.29       0.30   0.32   0.09
Non-temporal topic profile             0.32       0.30   0.29   0.09
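
A minimal Python sketch of this decay-weighted averaging follows. It assumes the documents are listed from the earliest click to the most recent and uses α = 0.9 as in the running example. The function name `temporal_profile` is illustrative.

```python
from typing import List

Distribution = List[float]


def temporal_profile(doc_topic_dists: List[Distribution], alpha: float = 0.9) -> Distribution:
    """Decay-weighted profile.

    `doc_topic_dists` is ordered from the earliest click to the most recent.
    The most recent document gets recency rank t = 1 (weight alpha**0),
    the one before it t = 2 (weight alpha**1), and so on.
    """
    n_docs = len(doc_topic_dists)
    n_topics = len(doc_topic_dists[0])
    weighted = [0.0] * n_topics
    norm = 0.0  # K: sum of the weights, so the profile sums to 1
    for i, dist in enumerate(doc_topic_dists):
        t = n_docs - i  # recency rank of this document
        weight = alpha ** (t - 1)
        norm += weight
        for k in range(n_topics):
            weighted[k] += weight * dist[k]
    return [w / norm for w in weighted]


# Columns: Football, Law, OS, Health (values from the running example).
clicked = [
    [0.51, 0.33, 0.05, 0.11],  # 1st clicked document
    [0.55, 0.27, 0.10, 0.08],  # 2nd
    [0.10, 0.41, 0.37, 0.12],  # 3rd
    [0.10, 0.21, 0.65, 0.04],  # 4th (most recent)
]
for n in range(1, len(clicked) + 1):
    print(n, [round(p, 2) for p in temporal_profile(clicked[:n])])
```

After the 2nd and 4th clicks, this sketch reproduces the slide's profiles to two decimals. With `alpha=1.0`, every weight is 1 and the function returns the plain mean of the non-temporal method, which is presumably how the non-temporal profile relates to the temporal one.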

Building temporal latent topic user profiles (3)

D_u = {d_1, d_2, …, d_n} is the set of relevant documents of user u

The user profile of u is a distribution over the topics Z (extracted by LDA)

t_{d_i} = n indicates that d_i is the nth most recent relevant/clicked document of u

α is the decay parameter; K is the normalisation factor
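
Written out, a profile formula consistent with these definitions and with the running example is shown below. In that example α = 0.9, and the most recent relevant document gets weight α^0 = 1.

```latex
p(z \mid u) = \frac{1}{K} \sum_{d_i \in D_u} \alpha^{\,t_{d_i} - 1}\, p(z \mid d_i),
\qquad
K = \sum_{d_i \in D_u} \alpha^{\,t_{d_i} - 1}
```

Each p(z | d_i) sums to 1 over Z. Choosing K as the sum of the weights therefore makes p(z | u) a distribution over Z as well. Setting α = 1 gives every document the same weight.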

Building temporal latent topic user profiles (4)

Long-term user profile: uses relevant documents extracted from the user’s whole search history

Daily user profile: uses relevant documents extracted from the user’s search history on the current search day

Session user profile: uses relevant documents extracted from the user’s search history in the current search session
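
Here is a sketch of how the three time scales might select their documents. The `RelevantClick` record, the `session_id` field and the use of the calendar day as the daily boundary are all assumptions. Each selected list would then be combined into a profile with the decay-weighted formula of the previous slide.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class RelevantClick:
    """Hypothetical record of one relevant (SAT) click in a user's history."""
    time: datetime
    session_id: int
    topic_dist: List[float]  # p(z|d) for the clicked document, e.g. from LDA


def profile_documents(history: List[RelevantClick], now: datetime,
                      current_session_id: int, scale: str) -> List[List[float]]:
    """Select the documents behind one profile, ordered from earliest to latest.

    scale: "long-term" (whole history), "daily" (current calendar day) or
    "session" (current search session). Only clicks before `now` are used.
    """
    past = [c for c in history if c.time < now]
    if scale == "long-term":
        selected = past
    elif scale == "daily":
        selected = [c for c in past if c.time.date() == now.date()]
    elif scale == "session":
        selected = [c for c in past if c.session_id == current_session_id]
    else:
        raise ValueError(f"unknown scale: {scale!r}")
    return [c.topic_dist for c in sorted(selected, key=lambda c: c.time)]


history = [
    RelevantClick(datetime(2012, 7, 2, 9, 0), 1, [0.51, 0.33, 0.05, 0.11]),
    RelevantClick(datetime(2012, 7, 3, 8, 0), 2, [0.55, 0.27, 0.10, 0.08]),
    RelevantClick(datetime(2012, 7, 3, 14, 5), 3, [0.10, 0.41, 0.37, 0.12]),
]
now = datetime(2012, 7, 3, 14, 20)
for scale in ("long-term", "daily", "session"):
    print(scale, len(profile_documents(history, now, 3, scale)))
```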

Re-ranking search results (1)

Returned documents (d) and the user profile (p) as distributions over topics:

                        Football   Law    OS     Health
Document 1              0.11       0.33   0.05   0.51
Document 2              0.55       0.27   0.05   0.13
Document 3              0.41       0.10   0.37   0.12
The user profile (p)    0.47       0.24   0.16   0.12

The documents in the original rank are re-ranked according to their closeness to the user profile (p).

Re-ranking search results (2)

Personalised scores: use the Jensen-Shannon divergence D_JS[d||p] between each returned document (d) and the user profile (p)
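
The following is a minimal sketch of the Jensen-Shannon personalised score. It assumes natural logarithms and renormalises both inputs, because the slide's profile sums to 0.99. Documents are numbered 1 to 3 in the order they are listed on the re-ranking slide. Ordering by the divergence alone is only an illustration, since in the method these scores are features for the learned re-ranker.

```python
import math
from typing import Dict, List

Distribution = List[float]


def normalise(dist: Distribution) -> Distribution:
    total = sum(dist)
    return [x / total for x in dist]


def kl_divergence(p: Distribution, q: Distribution) -> float:
    """KL(p || q) in nats; topics with p_k = 0 contribute nothing."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def js_divergence(d: Distribution, p: Distribution) -> float:
    """Jensen-Shannon divergence D_JS[d || p] (symmetric, between 0 and ln 2)."""
    d, p = normalise(d), normalise(p)
    m = [(dk + pk) / 2 for dk, pk in zip(d, p)]
    return 0.5 * kl_divergence(d, m) + 0.5 * kl_divergence(p, m)


def rerank_by_profile(docs: Dict[int, Distribution], profile: Distribution) -> List[int]:
    """Order document ids from smallest divergence (closest to the profile) to largest."""
    return sorted(docs, key=lambda doc_id: js_divergence(docs[doc_id], profile))


# Columns: Football, Law, OS, Health. Ids follow the order on the slide.
returned = {
    1: [0.11, 0.33, 0.05, 0.51],
    2: [0.55, 0.27, 0.05, 0.13],
    3: [0.41, 0.10, 0.37, 0.12],
}
user_profile = [0.47, 0.24, 0.16, 0.12]
print(rerank_by_profile(returned, user_profile))  # [2, 3, 1]
```

Document 2, which like the profile is mostly Football and Law, comes first. Document 1, which is mostly Health, comes last.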

Re-ranking search results (3)

Re-ranking features

Re-ranking algorithm: LambdaMART [1]

Feature          Description
Personalised features
LongTermScore    Personalised score between the document and the long-term profile
DailyScore       Personalised score between the document and the daily profile
SessionScore     Personalised score between the document and the session profile
Non-personalised features
DocRank          Rank of the document in the original returned list
QuerySim         Cosine similarity score between the current and previous queries
QueryNo          Total number of queries submitted in the current search session (including the current query)

1. C. J. Burges, et al., Learning to rank with non-smooth cost functions. In NIPS’2007.
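
The sketch below assembles the six features for one (query, document) pair. Several details are assumptions: bag-of-words term counts for the cosine similarity, taking QuerySim as the highest similarity to any previous query in the session, and the example score values. The resulting vectors could be passed to any LambdaMART implementation, for example LightGBM's `LGBMRanker` with the `lambdarank` objective. That training step is not shown here.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine_similarity(query_a: str, query_b: str) -> float:
    """Cosine similarity between the term-count vectors of two queries."""
    a = Counter(query_a.lower().split())
    b = Counter(query_b.lower().split())
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def feature_vector(long_term: float, daily: float, session: float,
                   doc_rank: int, session_queries: List[str]) -> Dict[str, float]:
    """Assemble the six re-ranking features for one (query, document) pair.

    `session_queries` holds the session's queries in order, current one last.
    Assumption: QuerySim is the highest similarity to any previous query.
    """
    current, previous = session_queries[-1], session_queries[:-1]
    query_sim = max((cosine_similarity(current, q) for q in previous), default=0.0)
    return {
        "LongTermScore": long_term,
        "DailyScore": daily,
        "SessionScore": session,
        "DocRank": doc_rank,
        "QuerySim": query_sim,
        "QueryNo": len(session_queries),  # includes the current query
    }


# Hypothetical personalised scores (e.g. negated JS divergences) for one document.
features = feature_vector(-0.02, -0.03, -0.01, 2, ["us open", "open us tennis"])
print(features)
```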

Evaluation

Dataset

The query logs of 1166 anonymous users over four weeks, from 1st to 28th July 2012

A log entry consists of an anonymous user identifier, a query, the top-10 returned URLs, and the clicked documents along with the user’s dwell time

Download the content of all the URLs for learning topics

A search session is demarcated by 30 minutes of user inactivity

A relevant document is a click with a dwell time of at least 30 seconds, or the last click in a session (SAT click)
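
Here is a sketch of the session demarcation and SAT-click rules. It assumes a hypothetical event format: dicts with "time" and "type", plus "dwell" and "url" for clicks. Whether a gap of exactly 30 minutes starts a new session is also an assumption.

```python
from datetime import datetime, timedelta
from typing import Dict, List

SESSION_GAP = timedelta(minutes=30)  # inactivity that ends a session
MIN_DWELL = timedelta(seconds=30)    # minimum dwell time for a SAT click

Event = Dict[str, object]  # hypothetical log record


def split_sessions(events: List[Event]) -> List[List[Event]]:
    """Split one user's time-ordered events into sessions: a gap of more than
    30 minutes since the previous event starts a new session."""
    sessions: List[List[Event]] = []
    for event in events:
        if sessions and event["time"] - sessions[-1][-1]["time"] <= SESSION_GAP:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


def sat_clicks(session: List[Event]) -> List[Event]:
    """Clicks with a dwell time of at least 30 s, plus the last click of the session."""
    clicks = [e for e in session if e["type"] == "click"]
    return [c for i, c in enumerate(clicks)
            if c["dwell"] >= MIN_DWELL or i == len(clicks) - 1]


t0 = datetime(2012, 7, 1, 10, 0)
events = [
    {"time": t0, "type": "query"},
    {"time": t0 + timedelta(seconds=20), "type": "click", "dwell": timedelta(seconds=10), "url": "a"},
    {"time": t0 + timedelta(minutes=1), "type": "click", "dwell": timedelta(seconds=45), "url": "b"},
    {"time": t0 + timedelta(minutes=2), "type": "click", "dwell": timedelta(seconds=5), "url": "c"},
    {"time": t0 + timedelta(hours=1), "type": "query"},
]
sessions = split_sessions(events)
print(len(sessions), [c["url"] for c in sat_clicks(sessions[0])])  # 2 ['b', 'c']
```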

Evaluation methodology

Assign a positive (relevant) label to a returned URL if:
• it is a SAT click in the current query, or
• it is a SAT click in one of the other repeated queries in the same search session

Assign negative (irrelevant) labels to the rest of the URLs
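
Here is a sketch of the labelling rule. The record layout is hypothetical. Treating two queries as "repeated" when their lower-cased, whitespace-normalised text is identical is an assumption.

```python
from typing import Dict, List


def normalise_query(text: str) -> str:
    return " ".join(text.lower().split())


def label_results(session_queries: List[dict], i: int) -> Dict[str, int]:
    """Label the returned URLs of the i-th query in a session.

    Each entry of `session_queries` is a hypothetical record with keys
    "query" (text), "results" (top-10 URLs) and "sat" (SAT-clicked URLs).
    A URL is positive (1) if it is a SAT click of this query or of another
    occurrence of the same query in the session; otherwise it is negative (0).
    """
    current = session_queries[i]
    positives = set(current["sat"])
    for j, other in enumerate(session_queries):
        if j != i and normalise_query(other["query"]) == normalise_query(current["query"]):
            positives |= set(other["sat"])
    return {url: int(url in positives) for url in current["results"]}


session = [
    {"query": "us open", "results": ["u1", "u2", "u3"], "sat": {"u2"}},
    {"query": "tennis scores", "results": ["u4", "u1"], "sat": {"u4"}},
    {"query": "US Open", "results": ["u1", "u2", "u3"], "sat": {"u3"}},
]
print(label_results(session, 0))
```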

Personalisation Methods and Baselines

Personalisation methods
• LON uses only the LongTermScore from the long-term profile
• DAI uses only the DailyScore from the daily profile
• SES uses the SessionScore from the session profile
• ALL uses all the personalised scores from the three profiles

Baselines
• Default is the default ranking returned by the search engine
• Static uses the LongTermScore from the long-term profile without time-awareness (i.e., without the decay function)

Results

Evaluation metrics
• Mean Average Precision (MAP)
• Precision (P@k)
• Mean Reciprocal Rank (MRR)
• Normalized Discounted Cumulative Gain (nDCG@k)

For each evaluation metric, a higher value indicates a better ranking
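
Here is a sketch of the four metrics for the binary labels produced by the evaluation methodology. Several details are assumptions: average precision is normalised by the number of positive URLs among the returned results, and nDCG uses a linear gain (identical to 2^rel - 1 for binary labels). MAP and MRR are the means over queries.

```python
import math
from typing import Callable, List

Labels = List[int]  # binary relevance of the ranked results, top first


def precision_at_k(labels: Labels, k: int) -> float:
    return sum(labels[:k]) / k


def average_precision(labels: Labels) -> float:
    """Mean of the precision values at the ranks of the positive results."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def reciprocal_rank(labels: Labels) -> float:
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(labels: Labels, k: int) -> float:
    def dcg(rels: Labels) -> float:
        return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(rels, start=1))
    ideal = dcg(sorted(labels, reverse=True)[:k])
    return dcg(labels[:k]) / ideal if ideal else 0.0


def mean_over_queries(metric: Callable[[Labels], float], queries: List[Labels]) -> float:
    """MAP = mean of average_precision; MRR = mean of reciprocal_rank."""
    return sum(metric(labels) for labels in queries) / len(queries)


queries = [[0, 1, 0, 1], [1, 0, 0, 0]]
print(mean_over_queries(average_precision, queries),
      mean_over_queries(reciprocal_rank, queries))  # 0.75 0.75
```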

Overall Performance

• All the improvements over the baselines are significant (paired t-test, p < 0.001)

Conclusions (1)

• The three temporal profiles help to improve search performance over both the default ranking and the non-temporal profile

Conclusions (2)

• Using all the features (ALL) achieves the highest performance

Conclusions (3)

• The session profile achieves better performance than the daily profile

• The daily profile outperforms the long-term profile

Conclusions (4)

• Without time-awareness, the long-term profile yields no improvement over the default ranking

Summary

Build long-term, daily and session profiles with time-awareness, using topics extracted automatically from relevant documents at different time scales

Use the three profiles to re-rank the search results returned by Bing, showing a significant improvement in search performance

Thank you! Any questions?

Dataset (2)

Example of query logs

Click Entropies

P(d|q) is the proportion of all the clicks for query q that are on document d

A smaller query click entropy indicates more agreement between users on clicking a small number of web pages
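
Click entropy is usually defined as ClickEntropy(q) = -Σ_d P(d|q) log2 P(d|q). The short sketch below assumes that definition, including the base-2 logarithm, and computes it from a list of clicked document ids for one query:

```python
import math
from collections import Counter
from typing import Hashable, List


def click_entropy(clicked_docs: List[Hashable]) -> float:
    """Click entropy of one query from its clicks (one document id per click):
    -sum_d P(d|q) * log2 P(d|q), with P(d|q) = clicks on d / all clicks for q."""
    counts = Counter(clicked_docs)
    total = sum(counts.values())
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


print(click_entropy(["a", "a", "a"]),
      click_entropy(["a", "b"]),
      click_entropy(["a", "b", "c", "d"]))  # 0.0 1.0 2.0
```

All clicks on one page give entropy 0, the strongest agreement between users. Clicks spread evenly over four pages give 2 bits.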

Click entropies

Query Positions in Search Session

Aim: to study whether the position of a query in a search session has any effect on the performance of the temporal latent topic profiles

Label the queries by their positions in the search session

The topic-based user profile

Clicked document   Football   Law    OS     Health
Document 1         0.51       0.33   0.05   0.11
Document 2         0.55       0.27   0.05   0.13
Document 3         0.10       0.41   0.37   0.12
Document 4         0.11       0.15   0.65   0.09
Mean over topics   0.32       0.29   0.28   0.11

Each row is a clicked document's distribution over topics; the mean over topics (last row) is the topic-based user profile.

Re-ranking search results (1)

Query: MU

Pre-processing

Remove from the dataset the queries whose positive label set is empty

Discard the domain-related queries (e.g., Facebook, YouTube)
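
Here is a sketch of the two filters. The slides do not say how domain-related queries are detected, so the fixed `DOMAIN_QUERIES` set below is purely hypothetical.

```python
from typing import Dict, Set

# Hypothetical list of domain-related (navigational) queries to discard.
DOMAIN_QUERIES: Set[str] = {"facebook", "youtube"}


def keep_query(query: str, labels: Dict[str, int]) -> bool:
    """Keep a query only if it has at least one positive label and is not
    one of the listed domain-related queries."""
    has_positive = any(label == 1 for label in labels.values())
    is_domain_query = " ".join(query.lower().split()) in DOMAIN_QUERIES
    return has_positive and not is_domain_query


print(keep_query("us open", {"u1": 0, "u2": 1}), keep_query("YouTube", {"u1": 1}))
```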