Search Technologies for Digital Libraries
![Page 1: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/1.jpg)
Contemporary Search Technologies - also for Libraries?
Clemens Neudecker, KB – 20/04/2011
![Page 2: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/2.jpg)
Table of contents
Retrieval: Status Quo
New ways of searching
Prototypes & Outlook
![Page 3: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/3.jpg)
Lossau (dlib, 2004)
How to position the library as an information provider in the 21st century?
Search services are critical!
http://www.dlib.org/dlib/june04/lossau/06lossau.html
![Page 4: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/4.jpg)
Library as a “depot”
Collect
Preserve
![Page 5: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/5.jpg)
Library as a “gateway”
New ways of searching and/or browsing
Service infrastructure
User-Generated content
Competition: Internet Search Engines
![Page 6: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/6.jpg)
Simple Search
• By keyword
• Boolean operators
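Keyword search with Boolean operators can be sketched with plain set operations. The three documents below are hypothetical and only serve as an illustration; a production system would not scan every text at query time.

```python
# Hypothetical toy collection (not from the talk).
DOCS = {
    1: "digital library search services",
    2: "library catalogue and metadata",
    3: "internet search engines",
}


def postings(term):
    """Return the set of document ids whose text contains the keyword."""
    term = term.lower()
    return {doc_id for doc_id, text in DOCS.items() if term in text.lower().split()}


# Boolean operators map directly onto set operations.
AND_HITS = postings("library") & postings("search")      # library AND search
OR_HITS = postings("library") | postings("search")       # library OR search
NOT_HITS = postings("library") - postings("catalogue")   # library NOT catalogue
```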
![Page 7: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/7.jpg)
Advanced Search
Facets
Views
Phrases
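Facets can be thought of as value counts over a result set, plus a filter that narrows the set once a value is chosen. The sketch below assumes hypothetical records with `type` and `language` fields; real catalogues define their own facet fields.

```python
from collections import Counter

# Hypothetical search results, for illustration only.
RESULTS = [
    {"title": "Record A", "type": "newspaper", "language": "nl"},
    {"title": "Record B", "type": "book", "language": "nl"},
    {"title": "Record C", "type": "newspaper", "language": "de"},
]


def facet_counts(records, field):
    """Count how many records carry each value of the given facet field."""
    return Counter(r[field] for r in records if field in r)


def apply_facet(records, field, value):
    """Narrow the result set to records matching the selected facet value."""
    return [r for r in records if r.get(field) == value]


TYPE_FACET = facet_counts(RESULTS, "type")
NEWSPAPERS = apply_facet(RESULTS, "type", "newspaper")
```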
![Page 8: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/8.jpg)
Meta-Search
![Page 9: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/9.jpg)
Basics
• Crawling
• Indexing
• Searching
• Ranking results
http://nlp.stanford.edu/IR-book/
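The indexing, searching and ranking steps can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical three-document collection in place of a crawl and a simple TF-IDF score; it is not the scoring any particular engine uses.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy collection; a real system would fill this by crawling.
DOCS = {
    "d1": "digital library search",
    "d2": "library catalogue metadata",
    "d3": "search engine ranking for digital collections",
}


def build_index(docs):
    """Indexing: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def search(index, n_docs, query):
    """Searching and ranking: sum TF-IDF over the query terms per document."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties are broken by document id.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


INDEX = build_index(DOCS)
RANKED = search(INDEX, len(DOCS), "library search")
```

In this toy collection only `d1` contains both query terms, so it ranks first.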
![Page 10: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/10.jpg)
Technology
Apache Lucene/Solr (KB: Migration Verity)
http://lucene.apache.org/
http://lucene.apache.org/solr/
SRU = Search/Retrieve via URL
http://www.loc.gov/standards/sru/
CQL = Contextual Query Language
http://www.loc.gov/standards/sru/specs/cql.html
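An SRU request is an ordinary URL whose `query` parameter carries a CQL expression. The sketch below builds such a URL; the endpoint `http://example.org/sru` and the index names in the query are assumptions for illustration, since each server announces its own supported indexes.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start=1, maximum=10):
    """Build an SRU searchRetrieve request URL for a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and query, used only for illustration.
CQL = 'dc.title = "digital library" and dc.creator = lossau'
URL = sru_search_url("http://example.org/sru", CQL)
```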
![Page 11: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/11.jpg)
Retrieval: Status Quo
Catalogue
Metadata
![Page 12: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/12.jpg)
Catalogue Search
![Page 13: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/13.jpg)
Metadata
Dublin Core (DCMI)
http://dublincore.org/
Z39.50
http://www.loc.gov/z3950/agency/
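A Dublin Core description is a flat set of elements such as title, creator and date. The sketch below reads them from XML; the `record` wrapper element is an assumption, and the values describe this talk itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical wrapper element; the values are taken from the title slide.
RECORD_XML = f"""
<record xmlns:dc="{DC_NS}">
  <dc:title>Search Technologies for Digital Libraries</dc:title>
  <dc:creator>Clemens Neudecker</dc:creator>
  <dc:date>2011-04-20</dc:date>
</record>
"""


def parse_dc(xml_text):
    """Collect Dublin Core elements into {element name: [values]}."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        if child.tag.startswith("{" + DC_NS + "}"):
            name = child.tag.split("}", 1)[1]
            fields.setdefault(name, []).append((child.text or "").strip())
    return fields


DC_FIELDS = parse_dc(RECORD_XML)
```

Values are kept as lists because Dublin Core elements are repeatable.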
![Page 14: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/14.jpg)
Metadata Harvesting
Open Archives Initiative: OAI-PMH
http://www.openarchives.org/
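Harvesting with OAI-PMH amounts to issuing `ListRecords` requests and following the `resumptionToken` until the repository returns none. The sketch below only builds the request URLs and reads the token; the base URL and the sample response are assumptions, and no network call is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml):
    """Return the resumption token of a response, or None when harvesting is done."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hypothetical, heavily shortened response for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

FIRST_URL = list_records_url("http://example.org/oai")
NEXT_URL = list_records_url("http://example.org/oai",
                            resumption_token=next_token(SAMPLE_RESPONSE))
```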
![Page 15: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/15.jpg)
Linked Data
![Page 16: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/16.jpg)
Authority Data
Named Entities
(Persons, Places, Institutions)
http://viaf.org/
Gazetteers
http://www.world-gazetteer.com/
Other Examples:
LocAuth, PND, NaCo
![Page 17: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/17.jpg)
Persistent Identifier
URN = Uniform Resource Name
NBN = National Bibliography Number
Resolver = Translation into web address
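The resolver step can be pictured as a lookup from a stable name to the current web address. The sketch below uses an in-memory table with a made-up URN and URL; an actual resolver is a maintained service, not a dictionary.

```python
# Hypothetical resolver table: persistent name -> current web address.
RESOLVER_TABLE = {
    "urn:nbn:xx:example-0001": "http://example.org/objects/0001",
}


def resolve(urn, table=RESOLVER_TABLE):
    """Translate a URN:NBN into a web address, or None if it is unknown."""
    scheme, namespace, rest = urn.split(":", 2)
    # The "urn" prefix and the namespace identifier are case-insensitive.
    if scheme.lower() != "urn" or namespace.lower() != "nbn":
        raise ValueError(f"not a URN:NBN identifier: {urn!r}")
    return table.get(f"urn:nbn:{rest}")
```

If an object moves, only the table entry changes; the URN cited elsewhere stays the same.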
![Page 18: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/18.jpg)
Problems
Correctness of data
Coverage
Formats
Alignment
Multilingualism
![Page 19: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/19.jpg)
![Page 20: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/20.jpg)
What happened since
Google Books
The European Library
Europeana
Wolfram/Watson
What’s next?
![Page 29: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/29.jpg)
![Page 30: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/30.jpg)
The web
The web is not limited to the www!
Data deluge
“Deep web” – not indexed (dynamic) parts
Web of users – currently ~2 billion
![Page 33: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/33.jpg)
Web archiving
![Page 34: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/34.jpg)
The web as a resource
Knowledge Extraction (not the actual data!)
→ Semantic Web
(web of knowledge,
rather than data)
![Page 35: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/35.jpg)
Semantic Web
RDF http://www.w3.org/RDF/
OWL http://www.w3.org/2004/OWL/
SPARQL http://www.w3.org/TR/rdf-sparql-query/
SKOS http://www.w3.org/2004/02/skos/
![Page 36: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/36.jpg)
Ontologies
Ontology = “Model of the World”
Classes
Instances
Properties
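Classes, instances and properties can all be written as subject-predicate-object triples, which is the data model RDF uses. The sketch below stores a few triples and matches a pattern against them, loosely in the spirit of a basic SPARQL pattern; all `ex:` names are hypothetical and no RDF library is involved.

```python
# Hypothetical triples: one class, two instances, one property.
TRIPLES = {
    ("ex:Library", "rdf:type", "rdfs:Class"),    # class
    ("ex:KB", "rdf:type", "ex:Library"),         # instance of the class
    ("ex:KB", "ex:locatedIn", "ex:TheHague"),    # property linking two instances
    ("ex:TheHague", "rdf:type", "ex:Place"),
}


def match(pattern, triples=TRIPLES):
    """Return triples matching (subject, predicate, object); None is a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


LIBRARIES = match((None, "rdf:type", "ex:Library"))
ABOUT_KB = match(("ex:KB", None, None))
```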
![Page 37: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/37.jpg)
Semantic Graphs
![Page 38: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/38.jpg)
![Page 39: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/39.jpg)
New resources
Digital libraries (Images + OCR)
Born-digital material
The web
→ Interoperability (STITCH, CATCH)
![Page 40: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/40.jpg)
Full text (OCR)
"... tte->e°n.m.66-..ie k>okke cire-5^ea. ver.è. 6.or ^ ^ ^ °
kiesrellj-oe-ikei^, v-in eeo ^elj-escdapeo ^UOI^, 7
^n>5«--'-/-r. veel8-Iiec-jc ttui5vroll^ v,a 'z » ^ v e . X. «. ^ ^ I» 2 L t. L ^-i ? > " Z Z^
l»v«e».ic. sx ^ ^ , 6en 2 l8c«. Leb. ^ L I L I tZ.
6eo zc> ^pr>!, >«(ZS. 8 O II 0 v ? L W. . L^-L"
. . ^ ... ,. , ^,a «ore Vrienilea ea Lekenaêll zeven dy aeeea ^^
^ LLQ d2i« 4 urea, 18 myoe ttuisvi-ouiv, van Kenoi5, Sis asr 0v?e darlelvk >zetief6e Vscier', ?. L08, op L
«eea vel.^esckspLa ^5^()I>Z verlof. Ke6ed w»cj6zZ reo l2urev, as eev Verval vsn ^evev^drscdceo, ^ ^ ^ "A.
Oevki>i7L«., K0>.^^Q8N()VL^, secZerr z ''Vckev öeclle^ri^ te , jv6evou6er6oru " ^
<Zen Zv ^pri!, 1806. ^x>0lè:ecsr. vsv dyQ!l 92 ^sr^n, ker ^clelvke vzet det Leu^visie vervvzilelc! 'O L ^ ^ ^ '-
".' «eckea mi6ck»z ruim êên uur verlatte ovvorfpieck-z. i>«kl. ^0-6 k»rskter verdeaxSe »Ue ryve iiinöeren en L--»S « > I L^Z
![Page 41: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/41.jpg)
OCR Lexica
Word matching (fuzzy words)
Frequency
Morphology
Historic forms
Inflected forms
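Fuzzy word matching against a lexicon can be sketched with the standard library's `difflib`. The lexicon and the cutoff below are assumptions; a real OCR lexicon would also carry frequencies, historic spellings and inflected forms.

```python
import difflib

# Hypothetical lexicon, for illustration only.
LEXICON = ["library", "librarian", "catalogue", "metadata"]


def correct(token, lexicon=LEXICON, cutoff=0.8):
    """Return the closest lexicon word for an OCR token, or None if none is close."""
    matches = difflib.get_close_matches(token.lower(), lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else None


FIXED = correct("1ibrary")   # a digit misread for the letter "l"
UNKNOWN = correct("zzzz")    # nothing in the lexicon is similar enough
```

The cutoff trades recall for precision: lowering it repairs more tokens but also produces more wrong corrections.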
![Page 42: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/42.jpg)
Visibility
“Hidden” - only indexed
Highlighting in image
Full text behind image (PDF)
Parallel/switched mode
User Correction/Annotation
![Page 43: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/43.jpg)
Hidden in index
![Page 44: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/44.jpg)
Image highlighting
![Page 45: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/45.jpg)
![Page 46: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/46.jpg)
Parallel/Switched
![Page 47: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/47.jpg)
![Page 48: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/48.jpg)
Crowdsourcing
![Page 49: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/49.jpg)
Crowdsourcing examples
UIBK Catalogue
NLA Newspapers
http://trove.nla.gov.au/newspaper
Digitalkoot
http://www.digitalkoot.fi/en/splash
Concert
TranscriBentham
http://www.transcribe-bentham.da.ulcc.ac.uk/td/Transcribe_Bentham
![Page 50: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/50.jpg)
UIBK Catalogue
![Page 51: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/51.jpg)
Trove I
![Page 52: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/52.jpg)
Trove II
![Page 53: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/53.jpg)
Digitalkoot
![Page 54: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/54.jpg)
Concert
![Page 55: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/55.jpg)
TranscriBentham
![Page 56: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/56.jpg)
![Page 57: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/57.jpg)
Prototypes
![Page 58: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/58.jpg)
Prototype: FEP
![Page 59: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/59.jpg)
Prototype: Assets
http://virserv.isti.cnr.it:8080/assetsIRService/index
![Page 60: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/60.jpg)
Prototype: Semantic Search
http://eculture.cs.vu.nl/europeana/session/search
![Page 61: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/61.jpg)
Prototype: Waisda
http://waisda.q42.net/, http://blog.waisda.nl/
![Page 62: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/62.jpg)
Prototype: Geospatial Search
![Page 63: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/63.jpg)
Prototype: Image Annotation
http://dme.arcs.ac.at/annotation/
Problem: No Flash in Europeana (A/V content)
![Page 64: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/64.jpg)
Prototype: GeoEuropeana
http://amercader.net/dev/geoeuropeana/
![Page 65: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/65.jpg)
Prototype: Random Image Explorer
http://europeana.fe2.nl/ (Willem Jan Faber, KB)
![Page 66: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/66.jpg)
![Page 67: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/67.jpg)
Solution: Common API
API = Application Programming Interface
Set of descriptions defining how to access an electronic resource/application through a common interface
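One way to picture a common interface is as a single `search` method that every collection implements, so that clients can combine sources without knowing how each one is hosted. The class and function names below are hypothetical and only illustrate the idea.

```python
from typing import Protocol


class SearchAPI(Protocol):
    """Hypothetical common interface: anything with a search() method."""

    def search(self, query: str) -> list[str]: ...


class InMemoryCollection:
    """Toy collection that satisfies the interface (illustration only)."""

    def __init__(self, name: str, titles: list[str]) -> None:
        self.name = name
        self.titles = titles

    def search(self, query: str) -> list[str]:
        q = query.lower()
        return [f"{self.name}: {t}" for t in self.titles if q in t.lower()]


def combined_search(sources: list[SearchAPI], query: str) -> list[str]:
    """Query every source through the same interface and merge the hits."""
    hits: list[str] = []
    for source in sources:
        hits.extend(source.search(query))
    return hits


SOURCES = [
    InMemoryCollection("A", ["Newspaper archive", "Map collection"]),
    InMemoryCollection("B", ["Historic newspapers"]),
]
HITS = combined_search(SOURCES, "newspaper")
```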
![Page 68: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/68.jpg)
API
Documented Interface Definition
Machine readable
Public/shared
![Page 69: Search Technologies for Digital Libraries](https://reader033.fdocuments.net/reader033/viewer/2022061206/5482a9e2b4af9fa50d8b486d/html5/thumbnails/69.jpg)
API Benefits
Data/functionality available through documented, public interfaces
Anybody can use it
Can be integrated in other services/tools
Can be compared, combined, linked
Libraries need not be the actual host