Connect and search your data
Uploaded by brendonpage
Transcript of Connect and search your data
Me
twitter @brendonpaginate
blog http://geekswithblogs.com/brendonpage
Software Developer @ Chillisoft
Situation
2 billion users don’t know what IRC is: High Expectations
0.9 billion users know what IRC is: Normal Expectations
Situation
Most internet users know an internet with:
Flickr, Gmail, Reddit, Pandora, YouTube, Google Earth, Twitter, Khan Academy, Facebook, Street View, Kindle, Tumblr, SoundCloud, GitHub, Spotify, Dropbox, Google Docs, Wolfram Alpha, Kickstarter, WhatsApp, Bitcoin, Instagram, Pinterest, Google+, Coursera, Mega
Search
The Guardian, SoundCloud, Foursquare, Stack Exchange, YouTube, Kalahari, GitHub, StumbleUpon
… and many more
Search
But what makes search good?
Fuzziness & Synonyms
“batmn” -> “batman”
“i-Pod” -> “iPod”
Did you mean?
“example did you meen” -> “example did you mean”
Ranking
Speed
“About 1 780 000 000 results (0.37 seconds)”
Auto Complete
Advanced Query
“crack -ass”
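The fuzziness and "did you mean?" examples above can be approximated with Python's standard library. This is only a sketch of the idea, not how Elasticsearch does it: the `VOCABULARY` list and the `did_you_mean` and `normalise` helpers are made up for illustration, and a real engine would take its known terms from the index.

```python
import difflib

# Hypothetical list of known terms; a real engine would read these from its index.
VOCABULARY = ["batman", "returns", "ipod", "superman", "mean"]


def did_you_mean(word, vocabulary=VOCABULARY):
    """Return the most similar known term, or None if nothing is close enough."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=0.6)
    return matches[0] if matches else None


def normalise(word):
    """Drop hyphens and lowercase, so that "i-Pod" and "iPod" compare equal."""
    return word.replace("-", "").lower()
```

For example, `did_you_mean("batmn")` suggests `"batman"`, and `normalise("i-Pod")` equals `normalise("iPod")`.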
Indexing
{ "Title": "Batman Returns", "Year": "2005" }
REST API -> Elasticsearch
Tokenise: “Batman”, “Returns”, “2005”
Analyse: “batman”, “returns”, “return”, “2005”
Store document with id (1)
Update index:
returns: [1]
return: [1]
batman: [1]
2005: [1]
Indexing
{ "Title": "Batman", "Year": "2007" }
REST API -> Elasticsearch
Tokenise: “Batman”, “2007”
Analyse: “batman”, “2007”
Store document with id (2)
Update index:
returns: [1]
return: [1]
batman: [1, 2]
2005: [1]
2007: [2]
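The tokenise, analyse, store and update-index steps can be sketched as a tiny in-memory inverted index. This is an illustration under simplifying assumptions, not Elasticsearch's implementation: `TinyIndex`, `tokenise` and `analyse` are hypothetical names, and the trailing-"s" rule is a crude stand-in for a real stemmer, chosen only so the terms match the slides.

```python
from collections import defaultdict


def tokenise(document):
    """Split every field value into raw tokens ("Batman Returns" -> "Batman", "Returns")."""
    tokens = []
    for value in document.values():
        tokens.extend(str(value).split())
    return tokens


def analyse(tokens):
    """Lowercase each token and also add a naive singular form ("returns" -> "return")."""
    terms = set()
    for token in tokens:
        term = token.lower()
        terms.add(term)
        if term.endswith("s") and len(term) > 3:
            terms.add(term[:-1])  # assumption: strip a trailing "s" instead of real stemming
    return terms


class TinyIndex:
    def __init__(self):
        self.documents = {}                # doc id -> stored document
        self.postings = defaultdict(list)  # term -> list of doc ids containing it
        self._next_id = 1

    def index(self, document):
        """Store the document under a new id and update the inverted index."""
        doc_id = self._next_id
        self._next_id += 1
        self.documents[doc_id] = document
        for term in analyse(tokenise(document)):
            self.postings[term].append(doc_id)
        return doc_id


index = TinyIndex()
index.index({"Title": "Batman Returns", "Year": "2005"})
index.index({"Title": "Batman", "Year": "2007"})
```

After both calls, `index.postings["batman"]` holds `[1, 2]` while `index.postings["returns"]` holds only `[1]`.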
Search
Term frequency – inverse document frequency
Term frequency: score for each matching word in the document
Inverse document frequency: word weight is higher if uncommon across documents
Query: “batman returns”
Index:
returns: [1]
return: [1]
batman: [1, 2]
2005: [1]
2007: [2]
Results:
1. (id 1) { "Title": "Batman Returns", "Year": "2005" }
2. (id 2) { "Title": "Batman", "Year": "2007" }
??
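To see why document 1 ranks above document 2 for “batman returns”, here is a minimal TF-IDF sketch. The exact formula is an assumption made for illustration (raw term count times `log(N / df) + 1`), and Elasticsearch's real scoring is more involved. `tf_idf_scores` and `terms` are hypothetical helpers.

```python
import math

DOCUMENTS = {
    1: {"Title": "Batman Returns", "Year": "2005"},
    2: {"Title": "Batman", "Year": "2007"},
}


def terms(text):
    """Very small analyser: lowercase and split on whitespace."""
    return text.lower().split()


def tf_idf_scores(query, documents):
    """Score each document as the sum of tf * idf over the query terms."""
    doc_terms = {doc_id: terms(" ".join(doc.values())) for doc_id, doc in documents.items()}
    total = len(doc_terms)
    scores = {}
    for doc_id, words in doc_terms.items():
        score = 0.0
        for term in terms(query):
            tf = words.count(term)  # term frequency within this document
            if tf == 0:
                continue
            df = sum(1 for other in doc_terms.values() if term in other)
            idf = math.log(total / df) + 1.0  # rarer terms get a higher weight
            score += tf * idf
        scores[doc_id] = score
    return scores


scores = tf_idf_scores("batman returns", DOCUMENTS)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Both documents match "batman", which appears everywhere and so carries little weight. Only document 1 matches the rarer "returns", so it comes first in `ranking`.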
Search Summary
Fuzziness & Synonyms
“batmn” -> “batman”
“i-Pod” -> “iPod”
Ranking
Speed
“About 1 780 000 000 results (0.37 seconds)”
Auto Complete
Advanced Query
“crack -ass”
With very little effort we’ve gotten all of this using Elasticsearch
Friend Recommendation
[Graph diagram: Me, Friend, Friend of Friend; callout: “This guy”]
I don’t want to do that in SQL!
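The friend-of-friend idea is a two-hop walk over a graph. A rough sketch over an in-memory adjacency map follows. The names in `FRIENDS` and the `recommend_friends` helper are invented for illustration, and it only shows the shape of the traversal, not any particular graph store's query language.

```python
from collections import Counter

# Hypothetical friendship graph; each edge is listed in both directions.
FRIENDS = {
    "me": {"ann", "bob"},
    "ann": {"me", "cat", "dan"},
    "bob": {"me", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"ann"},
}


def recommend_friends(graph, person):
    """Suggest friends of friends, ordered by the number of mutual friends."""
    counts = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            # Skip the person themselves and people they already know.
            if candidate != person and candidate not in graph[person]:
                counts[candidate] += 1
    # Most mutual friends first; ties broken alphabetically for a stable order.
    return sorted(counts, key=lambda name: (-counts[name], name))
```

Here `recommend_friends(FRIENDS, "me")` puts "cat" (two mutual friends) ahead of "dan" (one).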
Book Recommendation
[Graph diagram: Me, Book, Genre, Book, Other User; callouts: “Too many”, “This guy”]
I don’t want to do that in SQL!
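One way to read the book graph is Me -> Book -> Other User -> Book: recommend what readers with overlapping taste have read. The sketch below assumes that reading. The `READ` data and the `recommend_books` helper are hypothetical, and weighting by overlap is just one simple choice.

```python
from collections import Counter

# Hypothetical reading history: user -> set of books they have read.
READ = {
    "me": {"book_1", "book_2"},
    "user_a": {"book_1", "book_2", "book_3"},
    "user_b": {"book_1", "book_4"},
    "user_c": {"book_5"},
}


def recommend_books(read, person):
    """Suggest unread books, weighted by how many books each other reader shares with person."""
    mine = read[person]
    counts = Counter()
    for other, books in read.items():
        if other == person:
            continue
        overlap = len(mine & books)
        if overlap == 0:
            continue  # no shared books, so this reader tells us nothing
        for book in books - mine:
            counts[book] += overlap
    # Highest weight first; ties broken alphabetically for a stable order.
    return sorted(counts, key=lambda book: (-counts[book], book))
```

With this data, `recommend_books(READ, "me")` suggests "book_3" before "book_4", and never "book_5".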
Other Uses?
Logistics
Permissions
Fraud Detection
Almost anything you draw as a graph on the whiteboard
Improving Search
Session
“man of steel” -> Doc 1: “steel workers are cool”
“super man” -> Doc 2: “super man”
Doc 2: { "Title": "Super Man", "Year": "2013" }
Doc 2, updated: { "Title": "Super Man", "Year": "2013", "OtherMatches": ["man of steel"] }
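The session idea can be sketched as follows: when a session ends on a document, earlier queries from that session are attached to it as extra matches. The `record_session` helper, the shape of the session data, and the assumption that the last search in a session is the successful one are all illustrative. Only the "OtherMatches" field name comes from the slides.

```python
def record_session(documents, session):
    """Attach a session's earlier queries to the document the session ended on.

    `session` is an ordered list of (query, doc_id) pairs. Assumption: the last
    pair is the search that found what the user wanted.
    """
    if not session:
        return
    final_query, final_doc_id = session[-1]
    other_matches = documents[final_doc_id].setdefault("OtherMatches", [])
    for query, _ in session[:-1]:
        if query != final_query and query not in other_matches:
            other_matches.append(query)


# Hypothetical store; the field used for Doc 1 is a guess, as the slide shows only its text.
documents = {
    1: {"Title": "steel workers are cool"},
    2: {"Title": "Super Man", "Year": "2013"},
}
record_session(documents, [("man of steel", 1), ("super man", 2)])
```

After this call, document 2 carries "man of steel" in its "OtherMatches" list, so indexing that field would let a later search for “man of steel” find it.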