
Bieber et al., NJIT ©2005 - Slide 1

Lightweight Integration and Recommendation of Documents and Services

Digital Library Service Integration, IntegraL and IntLib Projects

Michael Bieber*, Il Im*, Vincent Oria**, Richard Sweeney***, Yi-Fang Wu*

* Information Systems Department
** Computer Science Department
*** Robert Van Houten Library

College of Computing Sciences
New Jersey Institute of Technology

http://is.njit.edu/integral
April 2005

Bieber et al., NJIT ©2005 - Slide 2

Outline
• Motivation
• Illustrations
• Structural Relationships
• 3 Types of Integration
• Personalizing Links
• Federated Metasearch
• Recommendations
• Contributions and Vision
• Call for Collaboration
• Project Details

Bieber et al., NJIT ©2005 - Slide 3

Challenges for Library Users
• Need to know what resources to use before they can access them
• Finding related information outside current system
• Need to leave current page to do related tasks

Why?
• Library resources aren’t integrated well

==> Project Goal:
– Bring relevant resources directly to the user

Library resources: databases (e.g., EBSCOhost, ACM Digital Library), external digital libraries, on-line catalog, special collections, library services (e.g., interlibrary loan)...

Bieber et al., NJIT ©2005 - Slide 4

Integration through Linking
• automatically generate link anchors on elements recognized based on:
– structural relationships
– lexical relationships
• automatically generate links
– to related information
– to relevant services

==> lightweight integration of
– documents containing links and
– documents/services the links point to

Bieber et al., NJIT ©2005 - Slide 5

Prototype

Services for a launch-date element:
- search by launch date
- search by month and year
- search by year

Bieber et al., NJIT ©2005 - Slide 6

Prototype

Services for a document element:
- open
- summarize in 3 sentences

Bieber et al., NJIT ©2005 - Slide 7

Mock-up for a library database

Services from multiple systems (customized to user tasks/preferences)

Bieber et al., NJIT ©2005 - Slide 8

Benefits of Integration for a system (collection/service)

• Users: direct access to related systems
– enlarges a system’s feature set
• Links lead users to a system
– systems gain wider use
• Users become aware of other systems
– systems gain wider awareness
• Direct access to a system’s features
– streamlined access (bypassing menus)

Bieber et al., NJIT ©2005 - Slide 9

structural elements and links

lexical elements and links

Two Types of Links:
(1) structural, based on element type
* title, author, source
(2) lexical (found in a glossary)

Bieber et al., NJIT ©2005 - Slide 10

Structural Relationships

• Links generated based on application structure, not search or lexical analysis

– You cannot do a search on the display text “$127,322.12” to find related information…

– But you can find relationships for the element Sales[2002]

2002 Expenses: $85,101.99
2002 Sales: $127,322.12
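A minimal sketch of the idea on this slide, assuming a simple in-memory model: links are looked up from the element's place in the application structure (its type and key), never from the text shown on screen. The `Element` class, the `RELATIONSHIPS` table, and the link labels are hypothetical names chosen for illustration; they are not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A displayed value together with its place in the application structure."""
    element_type: str   # e.g. "Sales"
    key: str            # e.g. "2002"
    display_text: str   # e.g. "$127,322.12" (not used for link lookup)


# Hypothetical relationship table, keyed by element type rather than display text.
RELATIONSHIPS = {
    "Sales": ["Show sales by region for {key}", "Compare with expenses for {key}"],
    "Expenses": ["Show expense breakdown for {key}"],
}


def links_for(element: Element) -> list[str]:
    """Return link labels derived from the element's type and key."""
    templates = RELATIONSHIPS.get(element.element_type, [])
    return [template.format(key=element.key) for template in templates]


sales_2002 = Element("Sales", "2002", "$127,322.12")
print(links_for(sales_2002))
```

Searching for the string "$127,322.12" would find nothing useful, but the element type "Sales" and key "2002" are enough to select relationships.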

Bieber et al., NJIT ©2005 - Slide 11

Outline
• Motivation
• Illustrations
• Structural Relationships
• 3 Types of Integration
• Personalizing Links
• Federated Metasearch
• Recommendations
• Contributions and Vision
• Call for Collaboration
• Project Details

Bieber et al., NJIT ©2005 - Slide 12

Three Types of Integration:
(1) for documents to receive anchors and links
(2) to provide services (which become links)
(3) to provide glossaries for content analysis

Require a document schema mapper to recognize structural elements:
- wrapper
- fixed template
- XML markup
- etc.
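As a rough illustration of the XML-markup case, the sketch below walks a document and reports the elements whose tags match known structural types. It assumes well-formed XML and a hypothetical tag set (`title`, `author`, `source`); a real schema mapper for a wrapper or fixed template would need different logic.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of tags treated as structural element types.
STRUCTURAL_TYPES = {"title", "author", "source"}


def map_elements(xml_text: str) -> list[tuple[str, str]]:
    """Return (element type, element instance) pairs found in the document."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, el.text.strip())
        for el in root.iter()
        if el.tag in STRUCTURAL_TYPES and el.text
    ]


record = (
    "<record><title>Mars Missions</title>"
    "<author>A. Author</author><note>internal</note></record>"
)
print(map_elements(record))
```

Each pair returned is a candidate link anchor: the type selects which services apply, and the instance is the value passed to them.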

Bieber et al., NJIT ©2005 - Slide 13

Three Types of Integration:
(1) for documents to receive anchors and links
(2) to provide services (which become links)
(3) to provide glossaries for content analysis

Linking Rules represent every service that a system can provide for each kind of element.

Bieber et al., NJIT ©2005 - Slide 14

Three Types of Integration:
(1) for documents to receive anchors and links
(2) to provide services (which become links)
(3) to provide glossaries for content analysis

Linking Rules represent every service that a system can provide for each kind of element.

Example ==>

Bieber et al., NJIT ©2005 - Slide 15

Example Linking Rule from the AskNSDL system

– a) element type (“concept”)

– b) link display label (“Ask an expert about this”)

– c) relationship metadata

– d) destination collection or service (“Ask NSDL”)

– e) the exact command to send to the destination system

• (logs the user into AskNSDL, opens question template, fills in the element instance (i.e., “physics teaching”) as the subject, and places the cursor in the question area)

– f) any relevant conditions for including this relationship
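The six parts (a) through (f) map naturally onto a small record type. The following is only a sketch under stated assumptions: the `LinkingRule` class, the `build_links` function, the `example.org` command URL, and the "logged in" condition are all hypothetical, and the real AskNSDL command format is not given on the slide.

```python
from dataclasses import dataclass
from typing import Callable
from urllib.parse import quote_plus


@dataclass
class LinkingRule:
    element_type: str        # (a) kind of element the rule applies to
    label: str               # (b) text shown to the user for the link
    metadata: dict           # (c) relationship metadata
    destination: str         # (d) destination collection or service
    command_template: str    # (e) command sent to the destination (hypothetical format)
    condition: Callable[[dict], bool] = lambda user: True  # (f) inclusion condition


def build_links(element_type, instance, user, rules):
    """Return (label, destination, command) for every rule that applies."""
    links = []
    for rule in rules:
        if rule.element_type == element_type and rule.condition(user):
            command = rule.command_template.format(instance=quote_plus(instance))
            links.append((rule.label, rule.destination, command))
    return links


ask_rule = LinkingRule(
    element_type="concept",
    label="Ask an expert about this",
    metadata={"relationship": "expert-help"},
    destination="Ask NSDL",
    command_template="https://example.org/ask?subject={instance}",  # placeholder URL
    condition=lambda user: user.get("logged_in", False),
)

print(build_links("concept", "physics teaching", {"logged_in": True}, [ask_rule]))
```

Because each rule is a self-contained record, a new service can be added by appending rules without touching the documents that will display the links.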

Bieber et al., NJIT ©2005 - Slide 16

Three Types of Integration:
(1) for documents to receive anchors and links
(2) to provide services (which become links)
(3) to provide glossaries for content analysis

Lexical analysis by:
• NJIT Noun Phrase Extractor
• NJIT Ontology Developer

Bieber et al., NJIT ©2005 - Slide 17

Each system is integrated independently:
(1) Schema mappers for individual systems
(2) Linking rules are “plugged in” independently for each service
(3) Glossaries and thesauri can be independent of other systems

Bieber et al., NJIT ©2005 - Slide 18

Outline
• Motivation
• Illustrations
• Structural Relationships
• 3 Types of Integration
• Personalizing Links
• Federated Metasearch
• Recommendations
• Contributions and Vision
• Call for Collaboration
• Project Details

Bieber et al., NJIT ©2005 - Slide 19

Personalizing the Links
Customize the list of links according to:
• Collaborative Filtering
– Matching user’s “click stream” to other users’
• time spent at each destination
• asking users to rate links
• user task information

Bieber et al., NJIT ©2005 - Slide 20

Federated Metasearch
• Searches, merges & ranks

Bieber et al., NJIT ©2005 - Slide 21

Federated Metasearch
• Searches, merges & ranks
• Clusters results by concept

Bieber et al., NJIT ©2005 - Slide 22

Federated Metasearch: Clustering by Concept

concept hierarchy

search results are clustered by concept
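The search, merge, rank, and cluster steps can be sketched as follows. This is an illustration only: the merge policy (summing scores for duplicate titles), the flat title-to-concept map standing in for a concept hierarchy, and the toy sources are all assumptions, not the project's method.

```python
from collections import defaultdict


def metasearch(query, sources):
    """Query every source, merge duplicate titles, and rank by combined score.

    `sources` maps a source name to a function returning (title, score) pairs.
    """
    merged = defaultdict(float)
    for search in sources.values():
        for title, score in search(query):
            merged[title] += score  # assumed policy: sum scores across sources
    return sorted(merged.items(), key=lambda item: (-item[1], item[0]))


def cluster_by_concept(ranked, concept_of):
    """Group ranked titles under a concept, keeping rank order within a group."""
    clusters = defaultdict(list)
    for title, _score in ranked:
        clusters[concept_of.get(title, "Other")].append(title)
    return dict(clusters)


# Toy sources standing in for a catalog and a database.
sources = {
    "catalog": lambda q: [("Mars Rover Overview", 0.5), ("Solar Wind Basics", 0.25)],
    "database": lambda q: [("Mars Rover Overview", 0.25), ("Lunar Geology", 0.5)],
}
concept_of = {"Mars Rover Overview": "Planets", "Lunar Geology": "Moons"}

ranked = metasearch("space", sources)
clusters = cluster_by_concept(ranked, concept_of)
print(ranked)
print(clusters)
```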

Bieber et al., NJIT ©2005 - Slide 23

Outline
• Motivation
• Illustrations
• Structural Relationships
• 3 Types of Integration
• Personalizing Links
• Federated Metasearch
• Recommendations: General Recommendation Engine
• Contributions and Vision
• Call for Collaboration
• Project Details

Bieber et al., NJIT ©2005 - Slide 24

Integration With Partner Libraries

Minimum integration More complex integration

Bieber et al., NJIT ©2005 - Slide 25

[Figure: General Recommendation Engine (GRE) architecture]
• GRE Manager, with a User Task Database
• CF Engine, with a Clickstream/Evaluation Database, producing CF recommendations
• CB Engine, with a Document Database, producing CB recommendations
• KB Engine, with an Ontology Database, producing KB recommendations
• Collections A through N and their users send user info, clickstreams, documents, and evaluations to the GRE and receive final recommendations
• Data-flow labels on the engine inputs: documents and clickstreams; user info, clickstreams, and evaluations; user info and documents

Bieber et al., NJIT ©2005 - Slide 26

General Recommendation Engine Research Goals

• Integrate three major recommendation technologies

– Collaborative filtering (CF), Content-based (CB), and Knowledge-based (KB) recommendation

• Automatically identify users’ current task (search mode)

• Study the impacts of the recommendations on information search
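The slides do not say how the three engines' outputs are integrated. One common approach, shown here purely as an assumption, is a weighted sum of per-item scores; the `combine` function and the weights are hypothetical and could, for example, be chosen per user task.

```python
def combine(cf, cb, kb, weights=(0.5, 0.25, 0.25)):
    """Rank items by a weighted sum of CF, CB, and KB scores (assumed scheme).

    Each argument maps an item id to a score; missing items count as 0.
    """
    w_cf, w_cb, w_kb = weights
    items = set(cf) | set(cb) | set(kb)
    final = {
        item: w_cf * cf.get(item, 0.0) + w_cb * cb.get(item, 0.0) + w_kb * kb.get(item, 0.0)
        for item in items
    }
    # Highest score first; ties broken by item id for a stable order.
    return sorted(final, key=lambda item: (-final[item], item))


cf_scores = {"d1": 1.0, "d2": 0.5}
cb_scores = {"d2": 1.0, "d3": 0.5}
kb_scores = {"d3": 1.0}
print(combine(cf_scores, cb_scores, kb_scores))
```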

Bieber et al., NJIT ©2005 - Slide 27

Collaborative Filtering (CF)

• Recommendations based on similarities of people

• Traditional CF requires direct user inputs

• Clickstream-based CF (CCF) does not require direct user inputs

• Works well for preference goods

• Does not work so well for information-intensive items
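A minimal sketch of clickstream-based CF, assuming each user's clickstream is reduced to the set of documents visited. Jaccard similarity between users is an illustrative choice, not necessarily the measure the project uses, and the user and document ids are made up.

```python
def jaccard(a, b):
    """Overlap between two sets of visited documents, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, clickstreams, top_n=3):
    """Suggest documents visited by users whose clickstreams resemble the target's."""
    seen = clickstreams[target]
    scores = {}
    for user, visited in clickstreams.items():
        if user == target:
            continue
        similarity = jaccard(seen, visited)
        if similarity == 0:
            continue  # no shared documents, so no evidence of similar interests
        for item in visited - seen:
            scores[item] = scores.get(item, 0.0) + similarity
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


clickstreams = {
    "ann": {"d1", "d2"},
    "bob": {"d1", "d2", "d3"},
    "cy": {"d2", "d4"},
    "dee": {"d9"},
}
print(recommend("ann", clickstreams))
```

No ratings are requested from anyone; the only input is which documents each user opened.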

Bieber et al., NJIT ©2005 - Slide 28

Content-based Filtering (CB)

• Recommendations based on similarities of contents

– titles, authors, abstracts, or full texts

• Information retrieval (IR) techniques are used

– e.g., tf.idf value

• Documents similar in content are recommended

• Demo: http://highlight.njit.edu/ais/
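The tf.idf idea can be illustrated with a small, dependency-free sketch. It assumes whitespace tokenization and the plain `tf * log(N / df)` weighting with cosine similarity; the function names and sample titles are illustrative, and the demo system may weight terms differently.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf.idf vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append(
            {term: term_freq[term] * math.log(n_docs / doc_freq[term]) for term in term_freq}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def most_similar(index, docs):
    """Return the index of the document most similar in content to docs[index]."""
    vectors = tfidf_vectors(docs)
    scored = [(cosine(vectors[index], v), i) for i, v in enumerate(vectors) if i != index]
    return max(scored)[1]


docs = [
    "digital library integration services",
    "digital library link services",
    "collaborative filtering clickstream",
]
print(most_similar(0, docs))
```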

Bieber et al., NJIT ©2005 - Slide 29

Knowledge-based Recommendation (KB)

• CF and CB lack a holistic view

– why a document is relevant for a user

• KB recommends items based on a certain knowledge structure (ontology)

• KB requires knowledge engineering

• Goal: to build an automated (or semi-automated) ontology engine based on the ‘Self-organizing tree’ algorithm (Khan and Luo, 2002)

Bieber et al., NJIT ©2005 - Slide 30

Automatic User Profile Extractor

• Records a user’s recent documents

• The user’s profile is represented by a set of keywords from those documents

• As the user visits more documents, his/her profile is updated
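The three bullets above can be sketched as a sliding window over recently visited documents. The window size, the stopword list, and the choice of raw term frequency for picking keywords are assumptions made for illustration; `UserProfile` is a hypothetical class name.

```python
from collections import Counter, deque

# Small illustrative stopword list; a real extractor would use a fuller one.
STOPWORDS = {"the", "a", "of", "and", "for", "in", "to"}


class UserProfile:
    """Keeps a user's most recent documents and derives keywords from them."""

    def __init__(self, max_documents=5, max_keywords=3):
        self.recent = deque(maxlen=max_documents)  # oldest document drops out automatically
        self.max_keywords = max_keywords

    def visit(self, text):
        """Record a newly visited document; the profile updates as a result."""
        self.recent.append(text)

    def keywords(self):
        """Return the most frequent non-stopword terms in the recent documents."""
        counts = Counter(
            word
            for text in self.recent
            for word in text.lower().split()
            if word not in STOPWORDS
        )
        ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
        return [word for word, _count in ranked[: self.max_keywords]]


profile = UserProfile(max_documents=2, max_keywords=2)
profile.visit("mars rover launch")
profile.visit("mars orbit")
first = profile.keywords()
profile.visit("lunar orbit")   # pushes "mars rover launch" out of the window
second = profile.keywords()
print(first, second)
```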

Bieber et al., NJIT ©2005 - Slide 31

Outline

• Motivation

• Illustrations

• Structural Relationships

• 3 Types of Integration

• Personalizing Links

• Federated Metasearch

• Recommendations

• Contributions and Vision

• Call for Collaboration

• Project Details

Bieber et al., NJIT ©2005 - Slide 32

Contributions
• straightforward, sustainable approach for integrating documents and services
– Lightweight integration through linking
• combining structural links with content-based links
• next-generation collaborative filtering
• federated metasearch
• next-generation recommendations
• integrating traditional and digital libraries
• widespread dissemination

Bieber et al., NJIT ©2005 - Slide 33

Vision
A nationwide virtual library
• to and from
– your local library
– other physical libraries
– digital libraries
• incorporating
– traditional library resources
– digital library resources

Bringing relevant resources directly to the user!

Bieber et al., NJIT ©2005 - Slide 34

Looking for Collaboration

• Additional document systems, digital library collections, services and glossaries to integrate

• Physical library partners

• Digital library partners

• Web services to integrate

• Other suggestions welcome!

Bieber et al., NJIT ©2005 - Slide 35

Additional Slides

Bieber et al., NJIT ©2005 - Slide 36

Outline
• Motivation
• Illustrations
• Structural Relationships
• 3 Types of Integration
• Personalizing Links
• Federated Metasearch
• Contributions and Vision
• Call for Collaboration
• Project Details

Bieber et al., NJIT ©2005 - Slide 37

Digital Library Service Integration
NSF National Science Digital Library Award #DUE-0226075; 2002-2005

Tasks
• Develop Integration Infrastructure
• Integrate digital library collections and services
• Collaborative filtering
• Evaluation

Partners
• NASA GSFC Library
• AskNSDL
• Earth Science Picture of the Day System
• Atmospheric Visualization Collection
• Metis Workflow (University of Colorado, Boulder)
• University of Arizona

Bieber et al., NJIT ©2005 - Slide 38

IntLib
Institute of Museum and Library Services Award #LG-02-04-0002-04; 2004-2007

Tasks - to integrate:
• EBSCOhost
• Gale’s Discovery Collection
• ProQuest
• On-line Catalog Systems
• New Jersey Digital Highway

The IntLib Project focuses on integrating the resources of public libraries primarily (and university libraries secondarily) with digital libraries.

Additional Partner
• Newark Public Library

Bieber et al., NJIT ©2005 - Slide 39

IntegraL
NSF National Science Digital Library Award #DUE-0434581; 2004-2007

Tasks - to integrate:
• ACM Digital Library
• Elsevier Science Direct (permission pending)
• NJIT Electronic Thesis collection
• JerseyClicks
• StartingPoint
• Digital Library for Earth Science Education (DLESE)
• Science@NASA
• NSDL Core Integration features
• an on-line bookstore

The IntegraL project focuses on integrating specific resources of college libraries with those of the NSDL.

Additional Partners
• Cumberland C.C.
• Ramapo College
• Olin College of Engineering

Bieber et al., NJIT ©2005 - Slide 40

General Recommendation Engine
NSF National Science Digital Library Award; 2004-2007

Tasks - to integrate:
• Collaborative filtering recommendations
• Content-based recommendations
• Knowledge-based recommendations

Partners
• Digital Library for Earth Science Education (DLESE)
• Eisenhower National Clearinghouse for Mathematics and Science Education
• Computer Vision Education Digital Library