Search engines and digital libraries

© Tefko Saracevic 1 part 1: search engines part 2: digital libraries


8/14/2019 Search engines and digital libraries

http://slidepdf.com/reader/full/search-engines-and-digital-libraries 1/42

© Tefko Saracevic 1

part 1: search engines

part 2: digital libraries


dictionary definitions

search
  COMPUTING (transitive verb) to examine a computer file, disk, database, or network for particular information

engine
  something that supplies the driving force or energy to a movement, system, or trend

search engine
  a computer program that searches for particular keywords and returns a list of documents in which they were found, especially a commercial service that scans documents on the Internet


about definition of search engines

• oh well …
  - search engines do not search only for keywords; some search for other stuff as well
• and they are really not “engines” in the classical sense
  - but then a mouse is not a “mouse”


use of search engines… among others


How Search Engines Work (Sherman 2003)

[Diagram: The Web (URL1, URL2, URL3, URL4), Crawler, Indexer, Database, Search Engine, Your Browser. Example query “Eggs?” with ranked results: Eggs - 90%, Eggo - 81%, Ego - 40%, Huh? - 10%; sample document “All About Eggs” by S. I. Am]


how do search engines work? elaboration

• crawlers, spiders: go out to find content
  - in various ways go through the web looking for new & changed sites
  - periodic, not for each query
    - no search engine works in real time
  - some search engines do it for themselves, others do not
    - they buy content from companies such as Inktomi
  - for a number of reasons crawlers do not cover all of the web – just a fraction
  - what is not covered is the “invisible web”


elaboration …

• organizing content: labeling, arranging
  - indexing for searching – automatic
    - keywords and other fields
    - arranging by URL popularity – PageRank, as in Google
  - classifying as directory
    - mostly human handpicked & classified
• as a result of different organization we have basically two kinds of search engines:
  - search – input is a query that is then searched & results displayed
  - directory – classified content – a class is displayed


elaboration (cont.)

• databases, caches: storing content
  - humongous files usually distributed over many computers
• query processor: searching, retrieval, display
  - takes your query as input
    - engines have differing rules for how it is handled
  - displays ranked output
    - some engines also cluster output and provide visualization
• at the other end is your browser
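
The parts named on these slides (crawler, indexer, database, query processor) can be sketched in a few lines. This is a toy illustration only, not how any real engine is built: the in-memory `WEB` dictionary, the function names, and the ranking by simple term counts are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "url1": ("all about eggs and more eggs", ["url2", "url3"]),
    "url2": ("eggo waffles with eggs", ["url1"]),
    "url3": ("the ego and the id", []),
    "url4": ("eggs nobody links to", []),  # never reached from url1
}

def crawl(seed):
    """Crawler: follow links from a seed page and collect the pages found."""
    seen, queue, pages = set(), [seed], {}
    while queue:
        url = queue.pop(0)
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        pages[url] = text
        queue.extend(links)
    return pages

def build_index(pages):
    """Indexer: map each keyword to the pages containing it, with counts."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index

def search(index, query):
    """Query processor: score pages by matching term counts, best first."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

pages = crawl("url1")        # done periodically, not for each query
index = build_index(pages)   # the stored "database"
print(search(index, "eggs"))
```

Note that `url4` mentions eggs but is never returned: no crawled page links to it, so in this toy setup it stays part of the “invisible web”.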


elaboration … similarities, differences

• all search engines have these basic parts in common
• BUT the actual processes – methods how they do it – are based on various algorithms & they differ
  - most are proprietary, with details kept mostly secret, but based on well known principles from information retrieval or classification
  - to some extent Google is an exception – they published their method


case of Google

• developed by Sergey Brin and Lawrence Page while students at Stanford
  - in the beginning run on Stanford computers
• basic approach has been described in their famous paper “The Anatomy of a Large-Scale Hypertextual Web Search Engine”
  - well written, simple language, has their pictures
  - in acknowledgement they cite the support by NSF’s Digital Library Initiative, i.e. initially, Google came out of government sponsored research
  - describe their method PageRank – based on ranking hyperlinks as in citation indexing
  - “We chose our system name, Google, because it is a common spelling of googol, or 10^100”
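
The idea of ranking pages by their incoming links, as in citation indexing, can be illustrated with a small power-iteration sketch. It uses a common textbook formulation of PageRank (normalized so the ranks sum to 1, with rank from pages that have no outgoing links spread evenly), which is an assumption of this example and not a claim about Google's production system. The four-page link graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small base share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split evenly over its links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank

# Hypothetical web: A, B and D all link to C; C links back to A.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(score, 3))
```

Page C comes out on top because three pages point to it, and A ranks above B and D because its single incoming link comes from the highly ranked C.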


coverage differences

• no engine covers more than a fraction of WWW
  - estimates: none more than 16%
  - hard (even impossible) to discern & compare coverage, but they differ substantially in what they cover
• in addition:
  - many national search engines
    - own coverage, orientation, governance
  - many specialized or domain search engines
    - own coverage geared to subject of interest
  - many comprehensive sources independent of search engines
    - some have compilations of evaluated web sources


searching differences

• substantial differences among search engines on searching, retrieval, display
  - need to know how they work & differ in respect to
    - defaults in searching a query
    - searching of phrases, case sensitivity, categories
    - searching of different fields, formats, types of resources
    - advanced search capabilities and features
    - possibilities for refinement, using relevance feedback
    - display options
    - personalization options
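
Why the defaults matter can be shown with a toy matcher in which the same query gives different results under different rules. The three documents, the `matches` and `run` helpers, and the rule names are all hypothetical; phrase matching is simplified here to a substring test.

```python
DOCS = {
    1: "Digital Libraries and search engines",
    2: "digital cameras",
    3: "public libraries go digital",
}

def matches(text, query, operator="AND", case_sensitive=False, phrase=False):
    """Return True if text satisfies query under the given engine rules."""
    if not case_sensitive:
        text, query = text.lower(), query.lower()
    if phrase:
        # Simplified: exact substring match instead of token-level phrases.
        return query in text
    words = set(text.split())
    hits = [term in words for term in query.split()]
    return all(hits) if operator == "AND" else any(hits)

def run(query, **rules):
    """Return the ids of all documents matching the query under the rules."""
    return [doc_id for doc_id, text in DOCS.items() if matches(text, query, **rules)]

print(run("digital libraries"))                       # terms combined with AND
print(run("digital libraries", operator="OR"))        # terms combined with OR
print(run("digital libraries", phrase=True))          # treated as a phrase
print(run("Digital Libraries", case_sensitive=True))  # case matters
```

With AND the query finds documents 1 and 3, with OR all three, and as a phrase or with case sensitivity only document 1, so a searcher who does not know an engine's defaults cannot tell what a result list really means.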


business model differences

several business models
• public good – have independent budget
  - e.g. PubMed, Librarians’ Index to Internet
• earn revenue from provision of information
  - all commercial search engines
• using search engines to promote their other activities
  - e.g. telephone directories


sponsorship differences

• need to understand treatment of sponsorship – they influence what they search & how they display results
  - some list separately results from sponsored sites so you are reasonably clear what is there because it is sponsored & what is not
  - some have display-per-pay – showing first sites that paid most & do not even tell you that
  - some have pay per update of sites
• imperative to find sources that explain these models for different engines to


limitations

• every search engine has limitations as to
  - coverage
    - meta engines just follow coverage limitations & have more of their own
  - search capabilities
  - finding quality information
• some have compromised search with economics
  - becoming little more than advertisers
• but search engines are also many times victims of spamindexing
  - affecting what is included and how ranked


spamming a search engine

• use of techniques that push rankings higher than they belong is also called spamdexing
  - methods typically include textual as well as link-based techniques
  - like e-mail spam, search engine spam is a form of adversarial information retrieval
    - the conflicting goals of accurate results of search providers & high positioning by content pagerank


meta search engines

• meta engines search multiple engines
  - getting combined results from a variety of engines
• do not have their own databases
  - but have their own business models affecting results
• a number of techniques used
  - interesting ones: clustering, statistical analyses
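
The slide does not say how a meta engine combines the lists it gets back. As one plausible illustration, the sketch below merges ranked lists with reciprocal rank fusion, a standard rank-merging technique chosen for this example and not attributed to any particular meta engine. The engine names, URLs, and the constant `k=60` are assumptions.

```python
from collections import defaultdict

def fuse(result_lists, k=60):
    """Merge ranked lists with reciprocal rank fusion.

    result_lists maps an engine name to its ranked list of URLs.
    Returns (url, score, engines) tuples, best first.
    """
    scores = defaultdict(float)
    sources = defaultdict(list)
    for engine, urls in result_lists.items():
        for rank, url in enumerate(urls, start=1):
            # Higher positions contribute more; every engine counts equally.
            scores[url] += 1.0 / (k + rank)
            sources[url].append(engine)
    ranked = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, scores[url], sources[url]) for url in ranked]

# Hypothetical results from three underlying engines for one query.
results = {
    "engine_a": ["u1", "u2", "u3"],
    "engine_b": ["u2", "u1", "u4"],
    "engine_c": ["u2", "u5"],
}
for url, score, engines in fuse(results):
    print(url, round(score, 4), engines)
```

Keeping the list of engines that returned each URL lets a searcher see the source of each hit and compare overlap between engines.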


how to find a search engine?

• variety of resources that list or categorize engines

SearchEngines.com
  search for engines by topic, geography, reference
Search Engine Guide
  engines categorized by topic; other engine information
Search Engine Colossus
  international directory of search engines by country, topic; from 198 countries and 61 territories; engines in choice of languages
Phil Bradley’s country based search engines
  over 2000 search engines from countries all over the globe


sample of meta engines - with organized results

Dogpile
  results from a number of leading search engines; gives source, so overlap can be compared; (has also a (bad) joke of the day)
Surfwax
  gives statistics and text sources & linking to sources; for some terms gives related terms to focus
Teoma
  results with suggestions for narrowing; links resources derived; originated at Rutgers
Turbo10
  provides results in clusters; engines searched can be edited


meta search engines (cont.)

• Large directory
  Complete Planet
    directory of over 70,000 databases & specialty engines
• Results with graphical displays
  Vivisimo
    clusters results; innovative
  Webbrain
    results in tree structure – fun to use
  Kartoo
    results in display by topics of query


domain engines & catalogs

• cover specific subjects & topics
• important tool for subject searches
  - particularly for subject specialist
  - valued by professional searchers
• selection mostly hand-picked rather than by crawlers, following inclusion criteria
  - often not readily discernable
  - but content more trustworthy


domain engines … sample

Open Directory Project
  large edited catalog of the web – global, run by volunteers
BUBL LINK
  selected Internet resources covering all academic subject areas; organized by Dewey Decimal System – from UK
Profusion
  search in categories for resources & search engines
Resource Discovery Network – UK
  “UK's free national gateway to Internet resources for the learning, teaching and research”


domain engines … sample

Think Quest – Oracle Education Foundation
  education resources, programs; web sites created by students
All Music Guide
  resource about musicians, albums, and songs
Internet Movie Database
  treasure trove of American and British movies
Genealogy links and surname search engines
  well.. that is getting really specialized (and popular)
Daypop
  searches the “living web”: “The living web is composed of sites that update on a daily basis: newspapers, online magazine and weblogs”


science, scholarship engines … sample free access

Psychcrawler - Amer Psychological Association
  web index for psychology
Entrez PubMed – Nat Library of Medicine
  biomedical literature from MEDLINE & health journals
CiteSeer - NEC Research Center
  scientific literature, citations index; strong in computer science
Scholar Google
  searches for scholarly articles & resources
Infomine
  scholarly internet research collections
Scirus
  scientific information in journals & on the web


science, scholarship engines … sample commercial access

• an addition to freely accessible engines
  - many provide search free but access to full text paid
    - by subscription or per item
  - RUL provides access to these & many more:

ScienceDirect
  Elsevier: “world's largest electronic collection of science, technology and medicine full text and bibliographic information”
ACM Portal
  Assoc. for Computing Machinery: access to ACM Digital Library & Guide to Computing


where to find out?

• information about search engines in sources that have updates, news, tips for searching and more – a MUST for searchers:

Search Engine Watch
  ratings, news, statistics, charts, explanations, tutorials
Search Engine Showdown
  “The users’ guide to web searching” - run by a librarian, news links, ratings
Virtual Chase
  a site about “Teaching Legal Professionals How To Do Research”; this section has very good tips and links for consideration of quality on the web


where? ….

SiteLines
  a blog, written by Rita Vine, a professional librarian & web search trainer; many evaluations in archive
ResourceShelf
  “Resources and News for Information Professionals,” edited by Gary Price, a librarian & author of Invisible Web – has extensive archive
WebsearchAbout
  not evaluative, but provides news, capabilities, sources, articles about web searching


art of searching search engines


part 2: digital libraries 


definition

• digital libraries are viewed from several perspectives
  - technical: “Digital library is a managed collection of information, with associated services, where information is stored in digital format and accessible over a network.” (Arms, 2000)
  - institutional: “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are


a bit of context

• short but volatile history
  - research & development took off by start/mid 1990’s
  - in the next decade phenomenal growth worldwide
  - large investment in research & building
• number of communities involved
  - computer science, primarily in research
  - many subjects: digital libraries in their domain
  - library & information science: operations, studies of users, use, usability


libraries & digital resources

• libraries (particularly research, academic & special) directed massive funding toward such resources
  - electronic journals
  - databases
  - catalogs
  - digitization of parts of collection
• thus becoming in effect digital libraries – or more accurately hybrid libraries
  - with graphic and digital versions or types of resources


emphasis here

• on large academic or research digital libraries that also are related to searching
  - provide search capabilities or access to search engines
  - provide electronic journals that provide full text of articles after a search
• such libraries have become also search portals of sort, essential for their users in
  - education, research & related activities


sample

New York Public Library Digital
  “NYPL Digital is your gateway to The New York Public Library’s rare and unique collections in digitized form.” Includes access to searchable databases
U California Berkeley Digital Library SUNsite
  “builds digital collections and services while providing information and support to digital library developers worldwide.”
The British Library
  “The world’s knowledge.” Includes “Services for library and information Professionals.”
Los Angeles Public Library Kids’ Path
  resources for children; search through directory


sample …

New Zealand Digital Library
  searching of a number of digital collections, including humanity development library
Research Library Group
  “RLG is a not-for-profit organization of over 150 research libraries, archives, museums, and other cultural memory institutions.” Includes links to a number of searchable collections
Public Library of Science
  “PLoS is a nonprofit organization of scientists and physicians committed to making the world's scientific and medical literature a public resource.” Publishes open access journals


Rutgers libraries – digital components

• strategic planning in developing digital access
• rich & complex content of digital resources
  - several hundred indexes & databases for searching
  - some 20,000 electronic journals
  - thousand & more digital reference sources
  - subject research guides
  - Searchpath & other tutorials
  - electronic reserve
• affected teaching, learning, research b


some critical issues for searching

• no way yet to do federated searching in digital libraries
  - to search several indexes at the same time
  - each source has to be searched separately
  - most have very different search features, capabilities
• finding items in indexes does not mean that you are always able to get full text
• thus, searching is time-consuming, chaotic
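
To make the term concrete: federated searching would mean sending one query to several indexes at the same time, with a translation step for each source's own search features. The sketch below is a minimal illustration under made-up assumptions; the two sources, their records, their query syntaxes, and the adapter functions are all hypothetical and stand in for real library indexes.

```python
from concurrent.futures import ThreadPoolExecutor

# Two hypothetical indexes with different search interfaces.
def search_journals(boolean_query):
    """Expects terms joined by ' AND ', e.g. 'digital AND libraries'."""
    records = ["Digital libraries in practice", "Search engine design"]
    terms = boolean_query.lower().split(" and ")
    return [r for r in records if all(t in r.lower() for t in terms)]

def search_catalog(keywords):
    """Expects a list of keywords."""
    records = ["Handbook of digital libraries", "Library cataloguing rules"]
    return [r for r in records if all(k.lower() in r.lower() for k in keywords)]

# One adapter per source translates a common term list into the native form.
SOURCES = {
    "journals": lambda terms: search_journals(" AND ".join(terms)),
    "catalog": lambda terms: search_catalog(list(terms)),
}

def federated_search(terms):
    """Send one query to every source at the same time and label each hit."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(adapter, terms) for name, adapter in SOURCES.items()}
        return [(name, hit) for name, future in futures.items() for hit in future.result()]

print(federated_search(["digital", "libraries"]))
```

The adapters are where the real difficulty lies: every source with different features and capabilities needs its own translation, and finding an item this way still says nothing about whether its full text is available.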


where to find out?

• information about digital libraries

LibWeb U California, Berkeley
  “lists currently over 7200 pages from libraries in over 125 countries”
Digital Library Federation
  “a consortium of libraries and related agencies that are pioneering the use of electronic-information technologies to extend their collections and services”
D-Lib Magazine
  “a solely electronic publication with a primary focus on digital library research and development, including but not limited to new technologies, applications, and contextual social and economic issues”


where? …

Ariadne (UK)
  “to report on information service developments and information networking issues worldwide, keeping the busy practitioner abreast of current digital library initiatives”
Information Technology and Libraries
  ALA publication; “related to all aspects of libraries and information technology, including digital libraries”
Journal of Digital Information
  “Publishing papers on the management, presentation and uses of information in digital environments”
Biblio Tech Review
  “Information Technology for Libraries” – monthly news and review magazine


in conclusion

• search engines are great but you have to KNOW what is under the hood
  - as to coverage, business model, search features, outputs …
  - they are NOT for every kind of information need
• digital libraries are great for searching but you have to KNOW requirements for searching different resources that are included
  - there is no federated searching as yet, or for the time to come


art of searching digital libraries


and rewards …