BOSS: HackU IIT Bombay

HackU: IIT Bombay, 5th Feb 2009
Chris Heilmann, Saurabh Sahni
Build your Own Search Service

Description: An introduction to the BOSS API

Transcript of BOSS: HackU IIT Bombay

Page 1: BOSS: HackU IIT Bombay

HackU: IIT Bombay, 5th Feb 2009

Chris Heilmann Saurabh Sahni

Build your Own Search Service

Page 2: BOSS: HackU IIT Bombay

Outline

•  Search engines using BOSS
•  About BOSS API
   –  What?
   –  Why?
   –  Features
•  How to use it
   –  BOSS API
   –  BOSS Mashup framework

Page 3: BOSS: HackU IIT Bombay

Search engines using BOSS

Page 4: BOSS: HackU IIT Bombay

hakia: http://hakia.com/

Page 7: BOSS: HackU IIT Bombay

Cluuz: http://cluuz.com

Page 10: BOSS: HackU IIT Bombay

Keyword finder - http://keywordfinder.org/

Page 11: BOSS: HackU IIT Bombay

askBOSS: http://ask-boss.appspot.com/

Page 16: BOSS: HackU IIT Bombay

About BOSS API

Page 17: BOSS: HackU IIT Bombay

What?

•  Open Yahoo’s core search features via web services to let 3rd parties revolutionize Search

•  Unrestricted

http://developer.yahoo.com/search/boss

Page 19: BOSS: HackU IIT Bombay

Usage

Opening the search technology stack

50B pages * 20ms page download = 31 years

[Diagram: the search technology stack, with the stages CRAWL, EXTRACT, SPAM <-> Gold, Analyze, Index, Rank Assist, Index, Web Map and Retrieve, followed by WEB API and "Your App here"]
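
The crawl-time figure on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes pages are downloaded strictly one after another and that a year has 365.25 days; the function name is made up for illustration.

```python
# Back-of-the-envelope check of the slide's crawl-time estimate.
PAGES = 50_000_000_000            # 50B pages (figure from the slide)
SECONDS_PER_PAGE = 0.020          # 20 ms per page download (figure from the slide)
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60   # assumption: average year length


def sequential_crawl_years(pages=PAGES, seconds_per_page=SECONDS_PER_PAGE):
    """Years needed to download every page one after another."""
    return pages * seconds_per_page / SECONDS_PER_YEAR


years = sequential_crawl_years()
print(f"{years:.1f} years")
```

The result lands between 31 and 32 years, which matches the slide's rounded figure.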

Page 20: BOSS: HackU IIT Bombay

Why?

•  Removes entry barriers
   –  Massive capital investment
   –  Access to top technical talent
•  Asset to innovate
   –  Develop new relevance models
      •  Leverage user insights
      •  Use tags, bookmarks
   –  Change presentation style
•  Search anywhere
   –  Improve vertical quality w/ Web comprehensiveness
   –  Fragment the market, foster more players, choice, competition

Page 21: BOSS: HackU IIT Bombay

BOSS API features

•  Unlimited queries per day
•  No branding or attribution
•  No restrictions on presentation
•  Ability to re-order results and blend in additional content
•  Access to multiple verticals (web search, image, news)
•  Spell checks, keyword suggestions
•  40+ supported language and region pairs
•  Ability to monetize
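
Re-ordering and blending could look roughly like the sketch below. It is not part of BOSS: the `rerank` and `blend` helpers, the `title`/`url` keys and all sample data are assumptions made up for illustration, operating on results that have already been fetched and parsed.

```python
from urllib.parse import urlparse


def rerank(results, preferred_hosts):
    """Move results from preferred hosts to the front, keeping relative order."""
    def is_preferred(result):
        host = urlparse(result["url"]).netloc.lower()
        return any(host == h or host.endswith("." + h) for h in preferred_hosts)
    # sorted() is stable, and False sorts before True.
    return sorted(results, key=lambda r: not is_preferred(r))


def blend(results, extra, position=0):
    """Insert locally sourced items into the result list at a given position."""
    return results[:position] + list(extra) + results[position:]


# Hypothetical, already-parsed results; keys are assumptions for this sketch.
web_results = [
    {"title": "A", "url": "http://example.org/a"},
    {"title": "B", "url": "http://movies.example.com/b"},
    {"title": "C", "url": "http://example.net/c"},
]
own_content = [{"title": "Editor's pick", "url": "http://mysite.example/pick"}]

page = blend(rerank(web_results, ["example.com"]), own_content, position=1)
print([r["title"] for r in page])
```

Because there are no restrictions on presentation, this kind of post-processing can happen entirely in the application before anything is shown to the user.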

Page 22: BOSS: HackU IIT Bombay

How to use it?

Page 23: BOSS: HackU IIT Bombay

Get Started

•  Register for an application id http://developer.yahoo.com/wsregapp/

•  Documentation http://developer.yahoo.com/search/boss/boss_guide/

•  Code samples: JavaScript, PHP and Python http://www.saurabhsahni.com/boss-examples.zip

Page 24: BOSS: HackU IIT Bombay

BOSS API

Searching Slumdog Millionaire

(Source: http://en.wikipedia.org/wiki/File:Slumdog_Millionaire_poster.jpg)

Page 25: BOSS: HackU IIT Bombay

BOSS API

•  Search for slumdog millionaire: – http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&format=xml
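
A minimal sketch of building the request URL above with the Python standard library. The helper name `web_search_url` is made up for illustration, and `xyz` stands in for a real application id, as on the slide.

```python
from urllib.parse import quote_plus, urlencode

BOSS_BASE = "http://boss.yahooapis.com/ysearch"  # base URL as shown on the slide


def web_search_url(query, appid, fmt="xml"):
    """Build a BOSS web-search URL; the query goes in the path, url-encoded."""
    path_query = quote_plus(query)  # spaces become '+'
    params = urlencode({"appid": appid, "format": fmt})
    return f"{BOSS_BASE}/web/v1/{path_query}?{params}"


url = web_search_url("slumdog millionaire", appid="xyz")
print(url)
```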

Page 26: BOSS: HackU IIT Bombay

BOSS API: XML response

http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&format=xml
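
The response body is not reproduced in this transcript, so the sketch below parses a made-up document instead. The element names (`ysearchresponse`, `resultset_web`, `result`, `title`, `url`, `abstract`) and all values are assumptions for illustration only; check the BOSS documentation for the real schema, which may also use an XML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; NOT a captured BOSS response.
SAMPLE_XML = """<ysearchresponse responsecode="200">
  <resultset_web count="2" start="0">
    <result>
      <title>Example title one</title>
      <url>http://example.com/one</url>
      <abstract>First placeholder abstract.</abstract>
    </result>
    <result>
      <title>Example title two</title>
      <url>http://example.com/two</url>
      <abstract>Second placeholder abstract.</abstract>
    </result>
  </resultset_web>
</ysearchresponse>"""


def parse_results(xml_text):
    """Return one dict per <result> element, mapping child tag to text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in node}
        for node in root.iter("result")
    ]


results = parse_results(SAMPLE_XML)
for item in results:
    print(item["title"], "->", item["url"])
```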

Page 27: BOSS: HackU IIT Bombay

BOSS API

•  Exact search for “slumdog millionaire” –  http://boss.yahooapis.com/ysearch/web/v1/%22slumdog+millionaire%22?appid=xyz&format=xml
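
The `%22` in the URL above is simply the url-encoded double quote. A small sketch, with a hypothetical `phrase` helper, shows how the exact-match path segment can be produced:

```python
from urllib.parse import quote_plus


def phrase(query):
    """Wrap a query in double quotes for exact-phrase matching."""
    return '"' + query.strip('"') + '"'


encoded = quote_plus(phrase("slumdog millionaire"))
print(encoded)  # %22slumdog+millionaire%22
```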

Page 28: BOSS: HackU IIT Bombay

BOSS API

•  Search for slumdog millionaire only on indiatimes.com:
   –  Add site:indiatimes.com to your query
   –  http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire+site%3Aindiatimes.com?appid=xyz&format=xml

•  Search for slumdog millionaire on selected movie sites
   –  Add param sites=indiatimes.com,movies.yahoo.com,imdb.com
   –  http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&sites=indiatimes.com%2Cmovies.yahoo.com&format=xml
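
Both ways of restricting a search to sites can be sketched as follows. The helper names are made up for illustration; the `site:` operator and the `sites` parameter are the ones shown on the slide.

```python
from urllib.parse import quote_plus, urlencode

BASE = "http://boss.yahooapis.com/ysearch/web/v1"  # as shown on the slide


def single_site_url(query, site, appid):
    """Restrict to one site by appending the site: operator to the query."""
    encoded = quote_plus(f"{query} site:{site}")  # ':' becomes %3A
    return f"{BASE}/{encoded}?{urlencode({'appid': appid, 'format': 'xml'})}"


def multi_site_url(query, sites, appid):
    """Restrict to several sites via the comma-separated sites parameter."""
    params = urlencode({"appid": appid, "sites": ",".join(sites), "format": "xml"})
    return f"{BASE}/{quote_plus(query)}?{params}"  # ',' becomes %2C


one = single_site_url("slumdog millionaire", "indiatimes.com", "xyz")
many = multi_site_url("slumdog millionaire",
                      ["indiatimes.com", "movies.yahoo.com"], "xyz")
print(one)
print(many)
```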

Page 29: BOSS: HackU IIT Bombay

http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&sites=indiatimes.com%2Cmovies.yahoo.com&format=xml

Page 30: BOSS: HackU IIT Bombay

BOSS API

•  Find related keywords
   –  Add parameter view=keyterms
   –  http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&view=keyterms&format=xml

Page 31: BOSS: HackU IIT Bombay

http://boss.yahooapis.com/ysearch/web/v1/slumdog+millionaire?appid=xyz&view=keyterms&format=xml

Page 32: BOSS: HackU IIT Bombay

BOSS API

•  Search images
   –  http://boss.yahooapis.com/ysearch/images/v1/slumdog+millionaire?dimensions=small

Page 33: BOSS: HackU IIT Bombay

http://boss.yahooapis.com/ysearch/images/v1/slumdog+millionaire?dimensions=small

Page 34: BOSS: HackU IIT Bombay

BOSS API

•  Search news
   –  http://boss.yahooapis.com/ysearch/news/v1/slumdog+millionaire?age=15d

Page 35: BOSS: HackU IIT Bombay

http://boss.yahooapis.com/ysearch/news/v1/slumdog+millionaire?age=15d

Page 36: BOSS: HackU IIT Bombay

BOSS API

Spell check request

http://boss.yahooapis.com/ysearch/spelling/v1/milionare?format=xml

Response

Page 37: BOSS: HackU IIT Bombay

BOSS API REST Interface

•  {query}: term to look for (url-encoded)
•  {vert} := {web, news, images, spelling}
•  @ required
   –  appid
•  @ optional
   –  start, count, lang, region, format, callback, sites

http://boss.yahooapis.com/ysearch/{vert}/v1/{query}
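
The URL template above maps naturally onto one small builder. This is a sketch under stated assumptions: `boss_url` and its `extra` argument are made up for illustration, the validation only knows the verticals and optional parameters listed on this slide, and vertical-specific parameters seen elsewhere in the deck (`view`, `dimensions`, `age`) are passed through `extra` unchecked.

```python
from urllib.parse import quote_plus, urlencode

VERTICALS = {"web", "news", "images", "spelling"}  # {vert} values from the slide
OPTIONAL = {"start", "count", "lang", "region", "format", "callback", "sites"}


def boss_url(vert, query, appid, extra=None, **params):
    """Fill in http://boss.yahooapis.com/ysearch/{vert}/v1/{query}."""
    if vert not in VERTICALS:
        raise ValueError(f"unknown vertical: {vert!r}")
    unknown = set(params) - OPTIONAL
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # appid is the only required parameter; the rest is appended in call order.
    query_string = urlencode({"appid": appid, **params, **(extra or {})})
    return (f"http://boss.yahooapis.com/ysearch/{vert}/v1/"
            f"{quote_plus(query)}?{query_string}")


# Second page of ten web results, then news from the last 15 days.
page_two = boss_url("web", "slumdog millionaire", "xyz", start=10, count=10)
recent_news = boss_url("news", "slumdog millionaire", "xyz", extra={"age": "15d"})
print(page_two)
print(recent_news)
```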

Page 38: BOSS: HackU IIT Bombay

BOSS Mashup Framework

•  Python (v2.5+) library

•  BOSS Search SDK plus …

•  SQL for remixing arbitrary XML/JSON sources

http://developer.yahoo.com/search/boss/mashup.html

Page 39: BOSS: HackU IIT Bombay

BMF + Google App Engine

•  Enhanced version of BMF for the GAE platform

•  http://zooie.wordpress.com/2008/08/04/yahoo-boss-google-app-engine-integrated/

•  Enables quick deployment of BOSS applications online

Page 40: BOSS: HackU IIT Bombay

One more thing…

Page 41: BOSS: HackU IIT Bombay

BOSS in Academic Research

•  The biggest dataset available on the web
•  Very useful for Web-mining research experiments
   –  Natural language processing
   –  Semantic extraction
   –  Related keywords
   –  Similarity detection
   –  Clustering algorithms
   –  Spelling corrections

Page 42: BOSS: HackU IIT Bombay

Questions?

Thank You

More: http://developer.yahoo.com/search/boss/

Page 43: BOSS: HackU IIT Bombay

Appendix

Page 44: BOSS: HackU IIT Bombay

http://www.yahoo.com

Search UI Templates are Included in the BOSS Mashup Framework

BOSS Mashup Framework simplifies aggregating and presenting multiple data sources

Page 45: BOSS: HackU IIT Bombay

BMF Features

•  select, group, sort, union, joins, udfs, where
•  Text normalization and duplicate removal
•  Auto-transformation of resource-oriented API results into tables w/o parsing
•  All-in-memory storage and retrieval operations
•  Ability to join lists of tables via an arbitrary predicate function (map-like)
•  Search UI template framework
•  Single search function provides total access to BOSS REST API
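
To give a feel for what these operations mean, here is a plain-Python sketch of a few of them over in-memory lists of dicts. It does NOT use the BOSS Mashup Framework and does not reproduce its API; every function name and all sample rows are assumptions made up for illustration.

```python
import re


def normalize(text):
    """Text normalization: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def where(rows, predicate):
    return [row for row in rows if predicate(row)]


def select(rows, *fields):
    return [{f: row.get(f) for f in fields} for row in rows]


def union(*tables):
    return [row for table in tables for row in table]


def dedupe(rows, key):
    """Duplicate removal on a normalized column, keeping the first row seen."""
    seen, unique = set(), []
    for row in rows:
        marker = normalize(row[key])
        if marker not in seen:
            seen.add(marker)
            unique.append(row)
    return unique


def join(left, right, predicate):
    """Join two tables via an arbitrary predicate function."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


# Hypothetical tables standing in for web results, news results and ratings.
web = [{"title": "Slumdog  Millionaire", "url": "http://example.com/1"},
       {"title": "Other film", "url": "http://example.com/2"}]
news = [{"title": "slumdog millionaire", "url": "http://example.org/n"}]
ratings = [{"film": "slumdog millionaire", "stars": 5}]

merged = dedupe(union(web, news), key="title")
rated = join(merged, ratings, lambda l, r: normalize(l["title"]) == r["film"])
top = sorted(select(rated, "title", "stars"),
             key=lambda row: row["stars"], reverse=True)
print(top)
```

Everything stays in memory, which mirrors the "all-in-memory storage and retrieval" point above.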