Daan Odijk | Semantic Search ContentCafé #11
![Page 1: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/1.jpg)
Semantic Search
Daan Odijk
ContentCafé, 8 April 2015
![Page 2: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/2.jpg)
ContentCafé proudly presents: Seek and ye shall find… right? by CHARLOTTE VAN OOSTRUM, posted on 13 MARCH 2015
When Google was offline for 5 minutes in 2013, the number of page views on the internet dropped by 40%. We navigate the web through search engines: every 60 seconds, month after month, we collectively put some 2.66 million questions to Google’s inscrutable algorithms. So it is not so strange to think that navigation or interaction problems can also be ‘solved’ with search. If you need arguments to show that this does not work, read this article.
But when does search work, and how do you know whether a search engine is functioning well? How can you provide input for the implementation? What is semantic search, what are the practical possibilities, and how can you use it so that your visitors no longer even need to search?
The eleventh edition of ContentCafé takes place on Wednesday 8 April at 19:00 at Performance Solutions in Hoofddorp. We will gladly let you get lost in, and find your way back out of, the world of search, semantics, and algorithms.
![Page 3: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/3.jpg)
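[Table: term frequency (TF) of the terms in the announcement text, sorted by frequency. Terms shown: je, en, het, dat, de, search, hoe, zoeken, contentcafé, denken, solutions, implementatie, web, presents, input, hoofddorp, gek, bezoekers, zoekmachine. TF values: 6, 5, 5, 4, 3, 3, 3, 2, 2, then 1 for the remaining terms.]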
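Term frequency (TF) is simply the number of times a term occurs in a text. A minimal Python sketch, assuming lowercasing and a plain regular-expression word tokenizer (the tokenizer behind the slide is not stated, so counts on the full announcement may differ):

```python
import re
from collections import Counter


def term_frequencies(text: str) -> Counter:
    """Count lowercase word tokens; the regex tokenizer is an assumption."""
    return Counter(re.findall(r"\w+", text.lower()))


tf = term_frequencies(
    "Hoe weet je of een zoekmachine goed functioneert? Hoe kun je input leveren?"
)
print(tf.most_common(2))  # [('hoe', 2), ('je', 2)]
```

Function words such as “je” and “hoe” quickly dominate raw counts, which is why TF alone is a weak relevance signal.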
![Page 4: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/4.jpg)
![Page 5: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/5.jpg)
![Page 6: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/6.jpg)
[Table: the same terms with four columns: term frequency (TF), document frequency (DF), TF.IDF weight, and Lucene score. TF.IDF weights range from 0.02 to 0.50; Lucene scores range from 0.00 to 4.81.]
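Multiplying TF by an inverse document frequency (IDF) down-weights terms that occur in many documents. The sketch below uses one common variant, tf × log(N / df), on a hypothetical three-document collection; the normalization behind the slide’s TF.IDF column and Lucene’s scoring formula are not given, so it is not meant to reproduce those numbers.

```python
import math
from collections import Counter


def tf_idf(doc_tokens, collection):
    """Weight each term of one document as tf * log(N / df).

    doc_tokens: tokens of the document to weight.
    collection: list of token lists; it should include the document itself,
    otherwise df can be 0 for some terms.
    """
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(doc))  # count each term at most once per document
    tf = Counter(doc_tokens)
    return {term: freq * math.log(n_docs / df[term]) for term, freq in tf.items()}


# Hypothetical toy collection of three tokenized documents.
docs = [
    ["de", "zoekmachine", "zoekt", "de", "web"],
    ["de", "web", "is", "groot"],
    ["de", "bezoekers", "zoeken"],
]
weights = tf_idf(docs[0], docs)
```

Because “de” occurs in every document, log(3 / 3) = 0 and its weight vanishes despite its TF of 2, while terms that occur in only one document, such as “zoekmachine”, get the highest weight (log 3 ≈ 1.10).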
![Page 7: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/7.jpg)
• TF (Luhn, 1957)
• TF.IDF (Spärck Jones, 1972)
• BM25 (Robertson, 1995)
• Language Models (Kalt, 1996)
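BM25 extends TF.IDF with term-frequency saturation and document-length normalization. A minimal sketch of the common Okapi BM25 formulation, assuming the usual defaults k1 = 1.2 and b = 0.75 and a non-negative IDF variant; implementations differ in these details, and the toy documents are hypothetical.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every tokenized document in `docs` against the tokenized `query`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Non-negative IDF: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # TF saturates via k1; b controls how much long documents are penalized.
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    ["semantic", "search", "entity"],
    ["search", "engine", "search"],
    ["music", "ice", "cube"],
]
scores = bm25_scores(["semantic", "search"], docs)
```

The second document mentions “search” twice, but saturation keeps it below the first document, which matches both query terms.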
![Page 8: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/8.jpg)
• PageRank (Brin & Page, 1998)
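PageRank scores a page by the long-run probability that a “random surfer” lands on it. A power-iteration sketch, assuming the customary damping factor 0.85 (the slide gives no parameters) and a hypothetical three-page link graph:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Pages without outlinks (dangling pages) spread their score uniformly.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Teleportation: every page gets (1 - d) / n regardless of links.
        new = {p: (1 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical three-page web graph.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
```

Page “c” ends up with the highest score because both other pages link to it; the scores always sum to 1.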
![Page 9: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/9.jpg)
• Learning to Rank (Fuhr, 1992)
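Learning to rank learns how to combine ranking signals, for instance a BM25 score and a PageRank score, from relevance judgements. A minimal pairwise sketch, assuming a linear model trained with perceptron-style updates on document pairs with different relevance labels; this is not Fuhr’s original formulation, and the feature values and labels are invented toy data.

```python
from itertools import combinations


def train_pairwise(queries, epochs=20, lr=0.1):
    """Learn weights w so that w·x(better) > w·x(worse) for judged pairs.

    queries: one list per query of (feature_vector, relevance_label) tuples.
    """
    n_features = len(queries[0][0][0])
    w = [0.0] * n_features
    for _ in range(epochs):
        for docs in queries:
            for (x_i, y_i), (x_j, y_j) in combinations(docs, 2):
                if y_i == y_j:
                    continue  # no preference between equally relevant documents
                better, worse = (x_i, x_j) if y_i > y_j else (x_j, x_i)
                diff = [a - b for a, b in zip(better, worse)]
                if sum(wk * dk for wk, dk in zip(w, diff)) <= 0:
                    # Misordered pair: move w towards the preferred document.
                    w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def score(w, x):
    """Linear ranking score of a feature vector x."""
    return sum(wk * xk for wk, xk in zip(w, x))


# Hypothetical features per document: (bm25, pagerank); label 1 = relevant.
training = [
    [((2.1, 0.2), 1), ((2.5, 0.01), 0), ((0.4, 0.3), 0)],
    [((1.8, 0.15), 1), ((1.9, 0.02), 0)],
]
weights = train_pairwise(training)
```

In this toy data the relevant documents never have the highest BM25 score, so the learned weights must give PageRank real influence to order every pair correctly.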
![Page 10: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/10.jpg)
Font sizes: 48pt, 18pt, 24pt
[Search box: “Zoek” (Search)]
![Page 11: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/11.jpg)
[Search box: “Zoek” (Search)]
BM25
PageRank
![Page 12: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/12.jpg)
Semantic Search
![Page 13: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/13.jpg)
Semantic search
• Improve search accuracy by understanding searcher intent and the contextual meaning of terms and documents.
• Move beyond “ten blue links” (towards actually answering information needs) using rich context.
![Page 14: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/14.jpg)
Semantic search
• What is “semantic” search?
  • understanding intent, contextual meaning
  • finding actual answers for information needs
  • combining text and structure
• “Entity-centric search”
  • Entity: uniquely identifiable thing or object
  • “A thing with a distinct and independent existence”
![Page 15: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/15.jpg)
Challenges
[Search box: “Zoek” (Search)]
Query Understanding
Presentation & Interaction
Document Understanding
Presentation & Interaction
![Page 16: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/16.jpg)
Challenges
[Search box: “Zoek” (Search)]
Document Understanding
![Page 17: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/17.jpg)
Interplay: (un)structured data
[Diagram: unstructured text and structured data]
adding structure to text
adding text to structure
![Page 18: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/18.jpg)
Entity Profiling
- Entity profiling
  - generate a profile of an entity
  - summary (keywords/full-text)
  - timelines
  - …
- Slot filling
  - automatically fill attribute fields
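Slot filling can be approached in many ways; one simple and brittle baseline matches hand-written lexical patterns around the entity name. A sketch under that assumption; the slot names and patterns are hypothetical examples, not a standard schema.

```python
import re

# Hypothetical slot patterns; "{e}" is replaced by the escaped entity name.
SLOT_PATTERNS = {
    "birth_place": r"{e} was born in (?P<value>[A-Z][\w ]+?)[,.]",
    "spouse": r"{e} (?:married|is married to) (?P<value>[A-Z]\w+ [A-Z]\w+)",
}


def fill_slots(entity, text):
    """Return {slot: value} for every pattern that matches in the text."""
    slots = {}
    for slot, template in SLOT_PATTERNS.items():
        pattern = template.replace("{e}", re.escape(entity))
        match = re.search(pattern, text)
        if match:
            slots[slot] = match.group("value").strip()
    return slots


text = "Tom Cruise was born in Syracuse, New York. In 2006, Tom Cruise married Katie Holmes."
profile = fill_slots("Tom Cruise", text)
```

Patterns like these are precise but miss every paraphrase (“his wife, Katie Holmes”), which is why learned extractors are usually preferred.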
![Page 19: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/19.jpg)
But first…
ice cube music
![Page 20: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/20.jpg)
![Page 21: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/21.jpg)
michelangelo
![Page 22: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/22.jpg)
But first…
![Page 23: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/23.jpg)
vin diesel
![Page 24: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/24.jpg)
![Page 25: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/25.jpg)
schema.org (RDFa)
• used by Google, Bing, Yandex, Yahoo!, IPTC, etc.
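schema.org markup in RDFa Lite uses the attributes `vocab`, `typeof` and `property` on ordinary HTML elements. As a sketch of how a crawler might read such markup, the following uses Python’s standard `html.parser` to collect property/text pairs; it deliberately ignores nesting, the `content`/`href`/`resource` attributes and the rest of RDFa, so a real RDFa parser is the better choice in practice. The HTML snippet is a hypothetical example.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (type, property, text) triples from flat RDFa Lite markup.

    Simplifications: no nested items, no content/href/resource attributes,
    no prefixes; only the first text directly inside a property element.
    """

    def __init__(self):
        super().__init__()
        self.current_type = None   # last typeof value seen
        self.open_property = None  # property waiting for its text
        self.items = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "typeof" in attrs:
            self.current_type = attrs["typeof"]
        if "property" in attrs:
            self.open_property = attrs["property"]

    def handle_data(self, data):
        if self.open_property and data.strip():
            self.items.append((self.current_type, self.open_property, data.strip()))
            self.open_property = None


snippet = """
<div vocab="https://schema.org/" typeof="Event">
  <span property="name">ContentCafé #11</span>
  <time property="startDate">2015-04-08</time>
  <span property="location">Hoofddorp</span>
</div>
"""
collector = PropertyCollector()
collector.feed(snippet)
collector.close()
```

The resulting triples are exactly the kind of structured data a search engine can attach to a page, or show directly in a rich result.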
![Page 26: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/26.jpg)
Challenges
[Search box: “Zoek” (Search)]
Query Understanding
![Page 27: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/27.jpg)
Distribution of web search queries [Pound et al. 2010]
• Entity (“1978 cj5 jeep”): 41%
• Type (“doctors in barcelona”): 12%
• Attribute (“zip code waterville Maine”): 5%
• Relation (“tom cruise katie holmes”): 1%
• Other (“nightlife in Barcelona”): 36%
• Uninterpretable: 6%
![Page 28: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/28.jpg)
Query Understanding
• First step: recognize, label, and disambiguate entities in queries
  • add: attributes/aspects
  • add: types
  • add: relationships
  • add: actions/verbs
  • etc.
• Then: query understanding
  • what is the intent?
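A common baseline for recognizing and disambiguating entities in queries matches query n-grams against a dictionary of surface forms (for example, harvested from Wikipedia anchor texts) and picks, for each form, the entity it most often refers to, its “commonness”. A sketch assuming a tiny hand-made dictionary with invented link counts and hypothetical entity identifiers; a realistic system would also use the rest of the query as context.

```python
# Hypothetical surface-form dictionary: form -> {entity: number of links}.
SURFACE_FORMS = {
    "ice cube": {"Ice Cube (rapper)": 900, "ice cube (frozen water)": 300},
    "michelangelo": {"Michelangelo (artist)": 1200, "Michelangelo (TMNT character)": 150},
    "vin diesel": {"Vin Diesel (actor)": 800},
    "music": {"Music": 500},
}


def link_entities(query, max_len=3):
    """Greedy longest-match entity linking, scored by commonness."""
    tokens = query.lower().split()
    links, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            form = " ".join(tokens[i:i + n])
            if form in SURFACE_FORMS:
                senses = SURFACE_FORMS[form]
                total = sum(senses.values())
                entity, count = max(senses.items(), key=lambda kv: kv[1])
                links.append((form, entity, count / total))
                i += n
                break
        else:
            i += 1  # no known surface form starts at this token
    return links


print(link_entities("ice cube music"))
# [('ice cube', 'Ice Cube (rapper)', 0.75), ('music', 'Music', 1.0)]
```

Commonness alone always picks the same sense for “ice cube”; here the word “music” would be the contextual clue a better disambiguator exploits.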
![Page 29: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/29.jpg)
Query Understanding
• Adding structure to queries
• Query intents
• Query context (sessions, users, history, etc.)
• Interaction
![Page 30: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/30.jpg)
Template-based query understanding
• Rule-based approaches (editorial)
  • high precision
  • difficult to generalize
  • costly to create/maintain
• Research into more generic approaches is ongoing
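Template-based approaches map query patterns to structured interpretations. A sketch with two hypothetical editorial templates written as regular expressions, modelled on the attribute and type query examples from the query-distribution slide; the labels are invented for illustration.

```python
import re

# Hypothetical editorial templates: compiled regex -> interpretation label.
TEMPLATES = [
    (re.compile(r"^zip code (?P<entity>.+)$"), "attribute:postalCode"),
    (re.compile(r"^(?P<type>\w+) in (?P<entity>.+)$"), "type_in_location"),
]


def interpret(query):
    """Return (label, slots) for the first matching template, else None."""
    q = query.strip().lower()
    for pattern, label in TEMPLATES:
        match = pattern.match(q)
        if match:
            return label, match.groupdict()
    return None


print(interpret("doctors in barcelona"))
```

The rules are precise for the queries they were written for, but “nightlife in Barcelona”, an Other query in the Pound et al. study, also matches the second template even though “nightlife” is not a type: a small illustration of why such rules are hard to generalize.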
![Page 31: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/31.jpg)
Challenges
[Search box: “Zoek” (Search)]
Presentation & Interaction
Presentation & Interaction
![Page 32: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/32.jpg)
Result presentation
• Rich result pages (SERPs)
• Directly displaying answers and relevant information or context
![Page 33: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/33.jpg)
![Page 34: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/34.jpg)
Rich result pages
![Page 35: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/35.jpg)
Direct displays
![Page 36: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/36.jpg)
![Page 37: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/37.jpg)
![Page 38: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/38.jpg)
![Page 39: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/39.jpg)
![Page 40: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/40.jpg)
![Page 41: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/41.jpg)
Keyword Queries
- Single-search-box paradigm
- Typical web search queries
- “Telegraphic”, i.e., neither well-formed nor grammatically correct
Keyword++ queries
- Augmented with context
- form/facet-based input
- location/date/time of day (TOD)/…
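A keyword++ query can be represented as free-text keywords plus structured context. A sketch, assuming a hypothetical result format in which each candidate carries facet fields, where the context acts as a hard filter before a naive keyword-overlap ranking:

```python
from dataclasses import dataclass, field


@dataclass
class KeywordPlusQuery:
    keywords: str
    facets: dict = field(default_factory=dict)  # e.g. {"location": "Hoofddorp"}


def apply_context(query, results):
    """Keep results matching every facet, then rank by keyword overlap."""
    terms = set(query.keywords.lower().split())
    kept = [r for r in results
            if all(r.get(key) == value for key, value in query.facets.items())]
    return sorted(kept,
                  key=lambda r: len(terms & set(r["title"].lower().split())),
                  reverse=True)


# Hypothetical candidate results with a location facet.
results = [
    {"title": "Italian restaurant", "location": "Hoofddorp"},
    {"title": "Italian pizza restaurant", "location": "Amsterdam"},
    {"title": "Sushi restaurant", "location": "Hoofddorp"},
]
q = KeywordPlusQuery("italian restaurant", {"location": "Hoofddorp"})
ranked = apply_context(q, results)
```

Treating context as a hard filter is the simplest choice; a softer alternative is to use it as an extra ranking feature so that near-misses are not discarded.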
![Page 42: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/42.jpg)
Example keyword++ queries
![Page 43: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/43.jpg)
![Page 44: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/44.jpg)
Example keyword++ queries
![Page 45: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/45.jpg)
Interaction: recommendation, auto-completion
![Page 46: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/46.jpg)
Interaction: recommendation, auto-completion
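Query auto-completion is often bootstrapped from a query log: suggest the most frequent past queries that start with what the user has typed so far. A minimal sketch over a hypothetical log, using a sorted list and binary search to find the prefix range; real systems add personalization, freshness and entity-aware suggestions.

```python
import bisect
from collections import Counter


class Autocompleter:
    def __init__(self, query_log):
        self.counts = Counter(q.lower() for q in query_log)
        self.sorted_queries = sorted(self.counts)

    def suggest(self, prefix, k=3):
        """Return up to k logged queries starting with prefix, most frequent first."""
        prefix = prefix.lower()
        # Queries sharing a prefix form a contiguous range in sorted order;
        # "\uffff" as an upper bound assumes no characters beyond the BMP.
        start = bisect.bisect_left(self.sorted_queries, prefix)
        end = bisect.bisect_right(self.sorted_queries, prefix + "\uffff")
        matches = self.sorted_queries[start:end]
        return sorted(matches, key=lambda q: (-self.counts[q], q))[:k]


log = ["vin diesel", "vin diesel movies", "vin diesel", "vincent van gogh",
       "vin diesel age", "vin diesel movies", "vin diesel"]
ac = Autocompleter(log)
print(ac.suggest("vin d"))
```

Ties in frequency are broken alphabetically so that suggestions stay stable between keystrokes.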
![Page 47: Daan Odijk | Semantic Search ContentCafé #11](https://reader033.fdocuments.net/reader033/viewer/2022042716/55b6c346bb61ebdf378b4644/html5/thumbnails/47.jpg)
Want to learn more?
daan.odijk.me
Edgar Meij – @edgarmeij, Yahoo Labs
Krisztian Balog – @krisztianbalog, University of Stavanger
Daan Odijk – @dodijk, University of Amsterdam
Entity Linking and Retrieval
Edgar Meij – @edgarmeij, Yahoo! Research
Krisztian Balog – @krisztianbalog, University of Stavanger
Daan Odijk – @dodijk, University of Amsterdam
Monday, May 13, 2013
Tutorial on Entity Linking and Retrieval for Semantic Search
bit.ly/ELR-slides