Dynamic Collective Entity Representations for Entity Ranking
1
Dynamic Collective Entity Representations for Entity Ranking
David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
2
First, entities and structure; I get to show the mandatory entity search example.
3
You are not interested in documents but in things: the person/artist Kendrick Lamar, referring to him with his former stage name.
4
Entity search?
Index = Knowledge Base (= Wikipedia)
Documents = Entities
Real-world entities have a single representation (in the KB)
So it is like web search, but the units of retrieval are real-life entities, so we can collect data for them.
5
Representation is not static
People talk about entities all the time
Associations between words and entities change over time
This is what we try to leverage in this work
6
Example 1: News events
July 31st vs. after August 7th -> added content, new word associations
7
Example 2: Social media chatter
*****
This looks a bit extreme because there's swearing, but there's a serious intuition here: the vocabulary gap (formal KB, informal chatter).
8
Dynamic Collective Entity Representations
Use collective intelligence to mine entity descriptions to enrich the representation.
Is like document expansion (add terms found through explicit links)
Is not query expansion (terms found through predicted links)
Our method aims to leverage this: enrich the representation and close the gap.
9
Advantages
Cheap: change documents in the index, leverage tried & tested retrieval algorithms
Free smoothing: dynamic sources (e.g., tweets) may capture newly evolving word associations (Ferguson shooting) and incorporate out-of-document terms
Move relevant documents closer to queries (= close the gap between searcher vocabulary & docs in the index)
...of collective intelligence / description sources.
10
Haven't we seen this before?
Anchors & queries in particular have been shown to improve retrieval [1]
Tweets have been shown to be similar to anchors [2]
Social tags, same [3]
But: in batch (i.e., add data, see how it affects retrieval), and single-source
[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, URLs and anchors. TREC 2001.
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. SIGIR 2012.
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. OAIR 2013.
we look at a scenario where the expansions come in a streaming manner
11
Description sources
[Slide figure: description sources for the example entity Anthropornis nordenskjoeldi]
KB redirects: Anthropornis; Nordenskjoeld's Giant Penguin; Nordenskjoeld's giant penguin
KB categories: Eocene birds; Oligocene birds; Extinct penguins; Oligocene extinctions; Bird genera
KB anchors / KB links: Eocene; Oligocene; Animal; Chordate; Aves; Sphenisciformes; Spheniscidae; ...; emperor penguin
Web anchors: Anthropornis nordenskjoeldi
Tags: megafauna
Tweets
Queries: biggest penguin; anthropornis; extinct penguin; prehistoric birds
Fielded document representation
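As a concrete illustration of the fielded representation, here is a minimal sketch (not the paper's actual data structure; class and field names are assumptions) in which each description source maps to its own field and dynamic sources are appended as they stream in:

```python
from collections import defaultdict

class FieldedEntity:
    """Hypothetical fielded entity document: one field per description source."""
    FIELDS = ["kb_text", "kb_anchors", "kb_categories", "kb_redirects",
              "web_anchors", "tags", "tweets", "queries"]

    def __init__(self, entity_id):
        self.entity_id = entity_id
        self.fields = defaultdict(list)  # field name -> accumulated terms

    def expand(self, field, terms):
        """Append newly observed description terms to a field."""
        assert field in self.FIELDS, f"unknown field: {field}"
        self.fields[field].extend(terms)

# Usage: expansions arrive over time and accumulate per field.
penguin = FieldedEntity("Anthropornis nordenskjoeldi")
penguin.expand("tags", ["megafauna"])
penguin.expand("queries", ["biggest", "penguin"])
```

Because expansions only append to fields, the base KB text stays intact while the dynamic fields grow, which is what later makes field-level weighting (and the swamping concern) meaningful.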
12
Challenge
Heterogeneity: description sources, entities
Dynamic nature: content changes over time
You could do vanilla retrieval, but two challenges arise: description sources differ along several dimensions (e.g., volume, quality, novelty), and head entities are likely to receive a larger number of external descriptions than tail entities. Also, content changes over time, so expansions may accumulate and swamp the representation.
13
Method: Adaptive ranking
Supervised single-field weighting model
Features:
field similarity: retrieval score per field
field importance: length, novel terms, etc.
entity importance: time since last update
(Re-)learn optimal weights from clicks
Our solution is to dynamically learn how to combine fields into a single representation. Features (more detail in the paper): field similarity features (per field) = query-field similarity scores; field importance features (per field) to inform the ranker of the status of the field at that time (i.e., more and novel content); entity importance features (to favor recently updated entities). (What about the experimental setup?)
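The field-weighting idea can be sketched as follows. This is a minimal illustration, not the paper's actual learner: the final score is a weighted sum over field similarity, field importance, and entity importance features, and the weights are updated online from click feedback with a pairwise logistic update. Feature names and the learning rule are assumptions.

```python
import math

# Illustrative feature names (assumed, not from the paper).
FEATURES = ["sim_kb", "sim_tweets", "sim_tags",   # field similarity
            "len_tweets", "novel_tweets",         # field importance
            "time_since_update"]                  # entity importance

weights = {f: 0.0 for f in FEATURES}

def score(features):
    """Linear combination of features: the single-field weighting model."""
    return sum(weights[f] * features.get(f, 0.0) for f in FEATURES)

def update_from_click(clicked, skipped, lr=0.1):
    """Pairwise logistic update: push the clicked entity above a skipped one."""
    margin = score(clicked) - score(skipped)
    grad = -1.0 / (1.0 + math.exp(margin))  # gradient of -log sigmoid(margin)
    for f in FEATURES:
        weights[f] -= lr * grad * (clicked.get(f, 0.0) - skipped.get(f, 0.0))

# Usage: a click on an entity with high tweet similarity upweights that field.
update_from_click(clicked={"sim_kb": 0.2, "sim_tweets": 0.9},
                  skipped={"sim_kb": 0.4, "sim_tweets": 0.1})
```

Re-running the update as new clicks stream in is what makes the ranker adaptive: field weights track the current usefulness of each description source.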
14
Experimental setup
Data: MSN query log (62,841 queries + clicks on entities)
Each query is treated as a time unit. For each query:
Produce ranking
Observe click
Evaluate ranking (MAP/P@1)
Expand entities (with dynamic descriptions)
[Re-train ranker]
We took all queries that yield Wikipedia clicks, performed top-k retrieval, and extracted features. This setup allows us to track performance over time.
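The streaming protocol above can be sketched as a loop: rank, observe the click, evaluate, expand, and periodically retrain. This is a toy illustration under assumed interfaces (the `ToyIndex`/`ToyRanker` classes and term-overlap ranking are placeholders, not the paper's components), with queries themselves used as the dynamic expansion source:

```python
class ToyIndex:
    """Placeholder index: entity -> set of accumulated description terms."""
    def __init__(self):
        self.fields = {}
    def expand(self, entity, terms):
        self.fields.setdefault(entity, set()).update(terms)

class ToyRanker:
    """Placeholder ranker: orders entities by term overlap with the query."""
    def rank(self, query, index):
        terms = set(query.split())
        return sorted(index.fields,
                      key=lambda e: len(terms & index.fields[e]),
                      reverse=True)
    def retrain(self, index):
        pass  # placeholder: re-learn field weights from accumulated clicks

def precision_at_1(ranking, clicked):
    return 1.0 if ranking and ranking[0] == clicked else 0.0

def streaming_eval(query_stream, ranker, index, retrain_every=1000):
    """Each query is a time unit: rank, observe click, evaluate, expand, retrain."""
    scores = []
    for t, (query, clicked_entity) in enumerate(query_stream, 1):
        ranking = ranker.rank(query, index)                    # produce ranking
        scores.append(precision_at_1(ranking, clicked_entity)) # observe click, evaluate
        index.expand(clicked_entity, terms=query.split())      # dynamic expansion
        if t % retrain_every == 0:
            ranker.retrain(index)                              # re-train ranker
    return sum(scores) / len(scores)
```

The key property of this protocol is that evaluation at time t only uses expansions observed before t, so the performance curves over the query stream reflect how well newly incoming data is incorporated.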
15
Main results
Comparing effectiveness of different description sources
Comparing adaptive vs. non-adaptive ranker performance
In this talk I focus on the contribution of sources, and on adaptive vs. static ranker performance.
16
Description sources
[Figure: MAP over no. of queries]
1. Each source contributes to better ranking; tags/web anchors do best, and tweets are significantly better than KB-only.
2. Dynamic sources have higher learning rates (suggests that newly incoming data is successfully incorporated).
3. Tags start below web anchors but approach them; new tags improve ranking.
[NEXT] To see the effect of incoming data: feature weights.
17
Feature weights over time
[Figure: relative feature importance over no. of queries]
- Static field weights go down, dynamic ones go up (suggests retraining is important with dynamic expansions).
- Tweets only marginally, but as we know KB+Tweets > KB, the tweets do help.
- Not shown: static expansions stay roughly the same.
[NEXT] Increasing field weights plus increased performance suggest retraining is needed; next:
18
Non-adaptive vs. adaptive ranking
1. [LEFT] Lower performance overall (more data without more training queries).
2. [LEFT] Dynamic sources have higher slopes, so newly incoming data helps even in the static setting.
3. [RIGHT] Same patterns, but tags + web anchors do comparatively better (because of swamping?).
[END] Higher performance: retraining increases the ranker's ability to optimally combine descriptions into a single representation.
19
In summary
Expanding entity representations with different sources enables better matching of queries to entities.
As new content comes in, it is beneficial to retrain the ranker.
Informing the ranker of the expansion state further improves performance.
More data helps, but to optimally benefit you need to inform your ranker.
20
Thank you
(Also, thank you WSDM & SIGIR travel grants)