
Open Search

David Wolber


Overview

• Proliferation of Digital Libraries
• Metasearch and Fixed Lists of Sources
• Open Search Architecture
• PublishMe for P2P Knowledge Sharing
• Webtop Metasearch Clients


Contributors

Michael Kepe, Igor Ranitovic, Iman Sadreddin
Senior Team ’03: Ken Chong, Rudd Stevens, Colin Bean, Tim Chan, Julian Chan, Pooja Garg


Information Source Explosion

• Google and Amazon APIs
• Internet Archive
• Technorati
  – The World Live Web
• Domain-specific:
  – ACM Digital Library for CS
  – Lexis-Nexis for law
  – MLA for literature


End-User Created Digital Libraries

• Personal Web (a shared Google Desktop)
• Personal Web Neighborhood
• Topic-Specific Personal Crawlers
• Ordinary people creating search engines as easily as they create web pages

[Figure: a PersonalWeb and its 1st-degree, 2nd-degree, and Nth-degree neighborhoods]


Subsets of the Web


Motivation for Small, Independent Subsets of the Web

• Avoid having information channeled through a single portal: the “Googleopoly”
• Google does no evil, but…
  – Censorship in China
  – A creeping level of commercialization
  – Unregulated manipulation of secret ranking algorithms (see the SearchKing case)
• Other media are lost; this is the last frontier


Little support for using multiple search engines


Metasearch

• Help users discover and use digital libraries
• Send queries to multiple, selected search engines
• Filter, process, and unify the results
• Example: A9.com, Amazon’s metasearch engine
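Here is a minimal sketch of the metasearch loop above. It is written in Java because the deck's own tooling (wsdl2java) is Java-based. The `SearchSource` interface and the `StaticSource` stand-in are hypothetical, and so is the merge policy: round-robin interleaving with duplicate URLs dropped is one simple way to "unify" ranked lists, not A9's or Webtop's actual method.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical abstraction of one search engine (not a real API).
interface SearchSource {
    String name();
    List<String> search(String keywords); // result URLs, best first
}

// Stand-in source that returns a fixed ranked list, for demonstration only.
record StaticSource(String name, List<String> results) implements SearchSource {
    public List<String> search(String keywords) {
        return results;
    }
}

class Metasearch {
    // Sends the query to every selected source, then unifies the ranked lists
    // round-robin (every source's rank 1, then every rank 2, ...), keeping
    // only the first occurrence of each URL.
    static List<String> search(List<SearchSource> selected, String keywords) {
        List<List<String>> perSource = new ArrayList<>();
        int longest = 0;
        for (SearchSource source : selected) {
            List<String> hits = source.search(keywords);
            perSource.add(hits);
            longest = Math.max(longest, hits.size());
        }
        Set<String> merged = new LinkedHashSet<>(); // preserves insertion order
        for (int rank = 0; rank < longest; rank++) {
            for (List<String> hits : perSource) {
                if (rank < hits.size()) {
                    merged.add(hits.get(rank));
                }
            }
        }
        return new ArrayList<>(merged);
    }
}
```

For example, merging `[u1, u2, u3]` from one source with `[u2, u4]` from another yields `[u1, u2, u4, u3]`.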


Web Services Basis

[Figure: in the Web Page Model, the server returns HTML; in the Web Service Model, the server returns XML to client software, which produces the HTML]
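The Web Service Model moves presentation to the client: the server returns data, and client software decides how to display it. The sketch below uses the JDK's DOM parser to turn an XML result list into HTML. The XML format and the class names are invented for illustration and do not come from any real service.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical XML result format, invented for this sketch:
// <results><result><title>...</title><url>...</url></result>...</results>
record Result(String title, String url) {}

class WebServiceClient {
    // The server sends data (XML); the client parses it...
    static List<Result> parse(String xml) {
        try {
            var doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("result");
            List<Result> results = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element item = (Element) nodes.item(i);
                results.add(new Result(childText(item, "title"), childText(item, "url")));
            }
            return results;
        } catch (Exception ex) {
            throw new IllegalArgumentException("malformed result XML", ex);
        }
    }

    private static String childText(Element parent, String tag) {
        NodeList matches = parent.getElementsByTagName(tag);
        return matches.getLength() == 0 ? "" : matches.item(0).getTextContent().trim();
    }

    // ...and decides for itself how to present it (no HTML escaping, for brevity).
    static String toHtml(List<Result> results) {
        StringBuilder html = new StringBuilder("<ul>\n");
        for (Result r : results) {
            html.append("  <li><a href=\"").append(r.url()).append("\">")
                .append(r.title()).append("</a></li>\n");
        }
        return html.append("</ul>").toString();
    }
}
```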


How does metasearch evolve?

1. A new digital library appears.
2. Metasearch clients discover it.
3. Metasearch programmers write an adaptor/scraper for it.
4. Users can access it within the metasearch client.

SLOWLY…


Goal: Automate the Process

• Metasearch engines should provide users with up-to-date lists of existing digital libraries.
• Digital libraries should be able to register and be made immediately available to all metasearch clients.
• Metasearch development and library development are independent.


What is Necessary?

• Standard Search API
  – So metasearch clients can use polymorphism to access sources:

      for each source s in sourceList {
          searchEngine.endPointUrl = s.endPointUrl;
          resultList += searchEngine.keywordSearch(keywords);
      }

• Search API Registry
  – So metasearch clients can get a dynamic list of sources
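Together, the two requirements look roughly like this in Java. The sketch assumes a hypothetical `StandardSearch` interface and an in-memory `SearchRegistry`. The pseudocode above retargets one stub by setting `endPointUrl`; this sketch instead keeps one object per registered source. Both approaches rely on the same shared interface.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standard search API that every conforming library implements.
interface StandardSearch {
    String endPointUrl();
    List<String> keywordSearch(String keywords);
}

// Stand-in implementation returning fixed hits (a real one would call the endpoint).
record FixedSource(String endPointUrl, List<String> hits) implements StandardSearch {
    public List<String> keywordSearch(String keywords) {
        return hits;
    }
}

// Hypothetical registry: libraries register, clients fetch the current list.
class SearchRegistry {
    private final List<StandardSearch> sources = new ArrayList<>();

    synchronized void register(StandardSearch source) {
        sources.add(source);
    }

    synchronized List<StandardSearch> currentSources() {
        return List.copyOf(sources); // snapshot, safe to iterate
    }
}

class MetasearchClient {
    // The loop from the slide: one call site, any number of sources.
    static List<String> searchAll(SearchRegistry registry, String keywords) {
        List<String> resultList = new ArrayList<>();
        for (StandardSearch s : registry.currentSources()) {
            resultList.addAll(s.keywordSearch(keywords));
        }
        return resultList;
    }
}
```

Registering another library makes its results appear in the next `searchAll` call without any change to client code.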


Web Service Standards

WSDL – Web Service Description Language

SOAP – Simple Object Access Protocol

UDDI – Universal Description, Discovery, and Integration


Standards on top of Web Services

• WSDL, SOAP, and UDDI are the basis for standards in many domains
  – e.g., a Microsoft-initiated standard for securities information providers
• Businesses agree on a standard; then client applications can use polymorphism, and new businesses can register services.
• In this case, we want a cross-domain standard.


Open Search Architecture

• Open Search Protocol (OSP)
  – Cross-domain: search-related services
  – Not just keyword search, but also citations, authorOf, etc.
• Open Search Registry
  – Based on UDDI
  – Can add customization, e.g., parsing to find out which search operations are implemented
  – Web and web service access
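The registry customization ("parsing to find out which search operations are implemented") might work roughly as sketched below. The sketch assumes each library submits a WSDL 1.1 document when it registers. The class and method names are hypothetical, and a real UDDI registry stores this information differently. The sketch only illustrates the parse-then-look-up idea, using the JDK's namespace-aware DOM parser.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical registry that records which operations each service offers.
class OperationAwareRegistry {
    private static final String WSDL_NS = "http://schemas.xmlsoap.org/wsdl/";
    private final Map<String, Set<String>> opsByService = new TreeMap<>();

    // Returns the names of all <wsdl:operation> elements (portType and binding).
    static Set<String> operationsIn(String wsdl) {
        try {
            var factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for getElementsByTagNameNS
            var doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wsdl)));
            NodeList ops = doc.getElementsByTagNameNS(WSDL_NS, "operation");
            Set<String> names = new TreeSet<>();
            for (int i = 0; i < ops.getLength(); i++) {
                names.add(((Element) ops.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception ex) {
            throw new IllegalArgumentException("unparseable WSDL", ex);
        }
    }

    void register(String endPointUrl, String wsdl) {
        opsByService.put(endPointUrl, operationsIn(wsdl));
    }

    // Lets a client ask, e.g., "which sources support citations?"
    Set<String> servicesSupporting(String operation) {
        Set<String> matches = new TreeSet<>();
        opsByService.forEach((url, ops) -> {
            if (ops.contains(operation)) matches.add(url);
        });
        return matches;
    }
}
```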


Open Search Architecture

[Figure: OSP-conforming libraries register their services with the OS Registry; OSP metasearch clients get their source list from the registry]


User Can Choose Sources


Open Search Protocol

Keyword search

Citations (inward links, outward links)

AuthorOf and other associative operations…

Results returned as metadata objects based on Dublin Core

Restriction object for “advanced search” options
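The sketch below shows one way these operations could appear as a Java interface. It assumes `Metadata` carries a handful of Dublin Core elements and `Restriction` holds a few advanced-search constraints. The field choices, operation names, and signatures are all hypothetical. The OSP WSDL specification remains the authoritative definition.

```java
import java.util.ArrayList;
import java.util.List;

// A few of the fifteen Dublin Core elements; which ones OSP uses is an assumption.
// Dates are assumed to be non-null ISO-8601 strings (yyyy-mm-dd).
record Metadata(String identifier, String title, String creator, String date) {}

// Hypothetical "advanced search" restriction; null fields mean "no constraint".
record Restriction(String creator, String dateFrom, int maxResults) {
    boolean allows(Metadata m) {
        if (creator != null && !creator.equalsIgnoreCase(m.creator())) return false;
        // ISO-8601 dates in the same format compare correctly as strings.
        if (dateFrom != null && m.date().compareTo(dateFrom) < 0) return false;
        return true;
    }

    // Keeps allowed candidates, in order, up to maxResults.
    List<Metadata> apply(List<Metadata> candidates) {
        List<Metadata> kept = new ArrayList<>();
        for (Metadata m : candidates) {
            if (kept.size() >= maxResults) break;
            if (allows(m)) kept.add(m);
        }
        return kept;
    }
}

// The operations listed above, as a Java interface (names are illustrative).
interface OpenSearchProtocol {
    List<Metadata> keywordSearch(String keywords, Restriction restriction);
    List<Metadata> inwardLinks(String identifier, Restriction restriction);
    List<Metadata> outwardLinks(String identifier, Restriction restriction);
    List<Metadata> authorOf(String creator, Restriction restriction);
}
```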


Publishing a Library

• Access OSP WSDL Specification from webtop.cs.usfca.edu

• Generate code in language of choice

• Implement the search operations for the digital library

• Deploy the service

• Register with Open Search registry
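Each library supplies its own code at the "Implement the search operations" step. The minimal sketch below assumes wsdl2java has produced an interface like the hypothetical `OpenSearchSkeleton`. An in-memory list stands in for the library's catalog. The search itself is deliberately naive: a record matches when its title contains every query term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical shapes standing in for wsdl2java-generated types.
record Metadata(String identifier, String title, String creator) {}

interface OpenSearchSkeleton {
    List<Metadata> keywordSearch(String keywords);
    List<Metadata> authorOf(String creator);
}

// The library's own implementation of the generated skeleton.
class InMemoryLibrary implements OpenSearchSkeleton {
    private final List<Metadata> catalog;

    InMemoryLibrary(List<Metadata> catalog) {
        this.catalog = List.copyOf(catalog);
    }

    // Returns records whose title contains every query term (case-insensitive).
    @Override
    public List<Metadata> keywordSearch(String keywords) {
        String[] terms = keywords.strip().toLowerCase(Locale.ROOT).split("\\s+");
        List<Metadata> hits = new ArrayList<>();
        for (Metadata m : catalog) {
            String title = m.title().toLowerCase(Locale.ROOT);
            boolean matchesAll = true;
            for (String term : terms) {
                if (!title.contains(term)) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) hits.add(m);
        }
        return hits;
    }

    // Returns records whose creator matches exactly, ignoring case.
    @Override
    public List<Metadata> authorOf(String creator) {
        List<Metadata> hits = new ArrayList<>();
        for (Metadata m : catalog) {
            if (m.creator().equalsIgnoreCase(creator)) hits.add(m);
        }
        return hits;
    }
}
```

How the service is then deployed and registered depends on the chosen SOAP toolkit and on the registry's interface, so those steps are not sketched here.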


Deploying an Open Search Lib.

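[Figure: 1. the programmer gets the OS WSDL from the Open Search information site; 2. wsdl2java processes the WSDL; 3. producing skeleton code; 4. the service is deployed on the library server; 5. registration info goes to the Open Search Registry]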


Wrapping a Library

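[Figure: 1. the metasearch client sends an OSP query to the Open Search wrapper, located on a third-party server; 2. the wrapper sends a custom query to the library’s custom search API (e.g., the Google API); 3. the API returns a custom result; 4. the wrapper returns an OSP result to the client]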
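A wrapper is an instance of the adapter pattern. In the Java sketch below, the hypothetical `CustomSearchApi` stands in for a real vendor API such as Google's; that API's actual interface is not reproduced here. `OspResult` is likewise an invented result shape. The comments map the code to the four numbered steps in the diagram.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical proprietary API of the wrapped engine:
// returns rows of {title, url} for results start..start+count-1.
interface CustomSearchApi {
    String[][] query(String q, int start, int count);
}

// Hypothetical OSP-side result shape.
record OspResult(String title, String url, String source) {}

// Runs on a third-party server and translates in both directions.
class OpenSearchWrapper {
    private final CustomSearchApi api;
    private final String sourceName;

    OpenSearchWrapper(CustomSearchApi api, String sourceName) {
        this.api = api;
        this.sourceName = sourceName;
    }

    // 1. OSP query arrives here.
    List<OspResult> keywordSearch(String keywords, int maxResults) {
        // 2. Translate to a custom query; 3. receive the custom result.
        String[][] rows = api.query(keywords, 0, maxResults);
        // 4. Translate each row into an OSP result (capped in case the API ignores count).
        List<OspResult> results = new ArrayList<>();
        for (int i = 0; i < rows.length && i < maxResults; i++) {
            results.add(new OspResult(rows[i][0], rows[i][1], sourceName));
        }
        return results;
    }
}
```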


Wrappers Developed at USF

Google, Amazon (sort of), Internet Archive, Technorati, Feedster


PublishMe

• Like Google Desktop, but shared
• Periodically updates an inverted index and a linkbase on the PC
• Deploys a web service on the user’s PC
• Auto-registers with the Open Search Registry
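PublishMe's two data structures can be sketched in a few lines of Java. An inverted (inverse) index maps each term to the documents that contain it and answers keyword searches. A linkbase maps each document to its outward and inward links and answers the citation operations. This in-memory design is an assumption made for illustration; it is not PublishMe's actual storage or update strategy.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// In-memory inverted index plus linkbase for one user's documents.
class PersonalIndex {
    private final Map<String, Set<String>> docsByTerm = new HashMap<>();
    private final Map<String, Set<String>> outLinks = new HashMap<>();
    private final Map<String, Set<String>> inLinks = new HashMap<>();

    // Indexes a document's words and records its outgoing links.
    // (A periodic update could simply rebuild a fresh PersonalIndex.)
    void addDocument(String docId, String text, Collection<String> linkTargets) {
        for (String term : tokenize(text)) {
            docsByTerm.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
        for (String target : linkTargets) {
            outLinks.computeIfAbsent(docId, k -> new TreeSet<>()).add(target);
            inLinks.computeIfAbsent(target, k -> new TreeSet<>()).add(docId);
        }
    }

    // Documents containing every query term (AND semantics).
    Set<String> keywordSearch(String query) {
        Set<String> result = null;
        for (String term : tokenize(query)) {
            Set<String> docs = docsByTerm.getOrDefault(term, Set.of());
            if (result == null) {
                result = new TreeSet<>(docs);
            } else {
                result.retainAll(docs);
            }
        }
        return result == null ? Set.of() : result;
    }

    Set<String> inwardLinks(String docId) {
        return Collections.unmodifiableSet(inLinks.getOrDefault(docId, Set.of()));
    }

    Set<String> outwardLinks(String docId) {
        return Collections.unmodifiableSet(outLinks.getOrDefault(docId, Set.of()));
    }

    // Lowercases and splits on non-word characters.
    private static Set<String> tokenize(String text) {
        Set<String> terms = new TreeSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }
}
```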


Metasearch with P2P Knowledge Sharing

WEBTOP


Integrating Global and Personal Libraries


Motivation for Sharing Personal Webs

People create knowledge every day when they bookmark, annotate, link, organize, and synthesize.

Communication is a separate step, which often doesn’t happen.


Motivation for Sharing Personal Webs

Experts’ Collaborative Work


Computers are designed using our brains as a model, just as we model our word processors on paper:
• Knowledge creation and dissemination are separate
• Explicit effort is required to communicate


Additions to OSP for P2P

• GetFile
• OnLine(ip)
  – Handles the user starting up
  – Handles dynamic IPs
• OffLine
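OnLine(ip) and OffLine imply that something tracks which peers are reachable and at what address. The sketch below is a hypothetical in-memory peer directory; its class name, method names, and the GetFile URL layout are all assumptions. A peer that comes back with a new dynamic IP simply overwrites its old entry.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry-side bookkeeping for the P2P additions.
class PeerDirectory {
    private final Map<String, String> ipByPeer = new ConcurrentHashMap<>();

    // OnLine(ip): called when a user's PC starts up; a new dynamic IP
    // replaces whatever address was recorded before.
    void onLine(String peerId, String ip) {
        ipByPeer.put(peerId, ip);
    }

    // OffLine: the peer's personal web is no longer reachable.
    void offLine(String peerId) {
        ipByPeer.remove(peerId);
    }

    Optional<String> currentIp(String peerId) {
        return Optional.ofNullable(ipByPeer.get(peerId));
    }

    Set<String> onlinePeers() {
        return Set.copyOf(ipByPeer.keySet());
    }

    // Where a GetFile request would be sent; the URL layout is an assumption.
    Optional<String> getFileUrl(String peerId, String path) {
        return currentIp(peerId).map(ip ->
                "http://" + ip + "/getFile?path=" + URLEncoder.encode(path, StandardCharsets.UTF_8));
    }
}
```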


But What About PRIVACY?

The Big Question: How much of the information hidden within your personal web is hidden due to privacy concerns?


I Want you to be a Search Engine!


Goal: Implement Vannevar Bush’s Associative Trails

View a document/thing in context

History of an idea


Thinkmap-like Interface


Association Types

• Outward links
• Inward links
• Similar-content links
• People links
  – author, people referenced in a paper
• Domain-specific links
  – law citations
  – movie-actor
• Associations specified by annotators


Webtop Tree View webtop.cs.usfca.edu


Expanding a Tree

• Bird’s Eye View

• Local/Web files integrated

• Follow different Associative Trails

• Ins of Outs of Ins, etc.

• Siblings

• Weird, though, since ins and outs both expand to the right
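Trails such as "Ins of Outs of Ins" amount to repeated one-hop expansions over a link graph. The Java sketch below uses a hypothetical in-memory `LinkGraph` to expand a trail one level at a time. The definition of "siblings" used here (documents sharing an inward link) is an assumption; the slides do not define the term.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

enum Direction { IN, OUT }

// Link graph over documents, used to expand associative trails.
class LinkGraph {
    private final Map<String, Set<String>> outs = new HashMap<>();
    private final Map<String, Set<String>> ins = new HashMap<>();

    void addLink(String from, String to) {
        outs.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
        ins.computeIfAbsent(to, k -> new TreeSet<>()).add(from);
    }

    // One hop in the given direction (read-only view of the stored set).
    Set<String> step(String doc, Direction d) {
        Map<String, Set<String>> edges = (d == Direction.IN) ? ins : outs;
        return edges.getOrDefault(doc, Set.of());
    }

    // Follows a trail level by level, e.g. [IN, OUT, IN] = ins of outs of ins.
    Set<String> follow(String start, List<Direction> trail) {
        Set<String> frontier = new TreeSet<>(Set.of(start));
        for (Direction d : trail) {
            Set<String> next = new TreeSet<>();
            for (String doc : frontier) {
                next.addAll(step(doc, d));
            }
            frontier = next;
        }
        return frontier;
    }

    // Assumed definition: siblings are other documents linked from any
    // document that links to this one (outs of ins, minus the document itself).
    Set<String> siblings(String doc) {
        Set<String> result = follow(doc, List.of(Direction.IN, Direction.OUT));
        result.remove(doc);
        return result;
    }
}
```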


Webtop Side Panel View


Project Status

Too many bugs, Dad


Future Work

• Open Search Protocol
  – In-depth study of existing search APIs
  – Provide a REST alternative to SOAP
• Metasearch development
  – Complete and refine existing clients
  – Dream up new ones
• Thinkmap graph
• Automated source selection and reputation system
• Page ranking
• Initiate grass-roots involvement


Future Work: Documents and Things

[Figure: a resource, with its associations and annotations, can be a document (HTML, Word, PDF), a person, or a creative work (film, book)]
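The resource hierarchy in the figure maps naturally onto a closed set of Java types. The sketch below assumes Java 17 sealed interfaces and records. All type names are hypothetical, and so is the representation of an association as a typed link to another resource.

```java
import java.util.List;
import java.util.Locale;

enum DocFormat { HTML, WORD, PDF }
enum WorkKind { FILM, BOOK }

// An association from one resource to another, e.g. "author" or "cites".
record Association(String type, Resource target) {}

// Every resource carries associations and free-text annotations.
sealed interface Resource permits Document, Person, CreativeWork {
    String name();
    List<Association> associations();
    List<String> annotations();
}

record Document(String name, DocFormat format,
                List<Association> associations, List<String> annotations) implements Resource {}

record Person(String name,
              List<Association> associations, List<String> annotations) implements Resource {}

record CreativeWork(String name, WorkKind kind,
                    List<Association> associations, List<String> annotations) implements Resource {}

class Resources {
    // Describes where a resource sits in the hierarchy.
    static String category(Resource r) {
        if (r instanceof Document d) return "document/" + d.format().name().toLowerCase(Locale.ROOT);
        if (r instanceof CreativeWork w) return "creative work/" + w.kind().name().toLowerCase(Locale.ROOT);
        return "person"; // the only remaining permitted subtype
    }
}
```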


Stop talking about Webtop, daddy!

webtop.cs.usfca.edu