Data Recommender System: Improving the Discovery of ...€¦ · Search functions in PANGAEA Dataset...

Post on 26-Jul-2020

4 views 0 download

Transcript of Data Recommender System: Improving the Discovery of ...€¦ · Search functions in PANGAEA Dataset...

Data Recommender System: Improving the Discovery of Environmental Datasets through

Text Analytics and Usage Mining

Anusuriya Devaraju, Uwe Schindler, Robert Huber, Michael Diepenbroek{adevaraju, uschindler, rhuber, mdiepenbroek}@marum.de

PANGAEA Data Publisher for Earth & Environmental Science,MARUM - Center for Marine Environmental Sciences, University of Bremen, Germany.

Geospatial Sensing Conference 2019From Sensing to Understanding our World September 2 – 4, 2019 Münster, Germany

Presentation Outline

2

• PANGAEA Digital Data Repository

• Data Recommender Systema. Metadata-based data recommendation

b. Users interaction-based data recommendation

• Online Evaluation & Results

• Conclusions

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

PANGAEA Digital Data Repository

3

> 386000 scientific datasets (30.08.2019)• Founded in 1993

• Jointly managed by AWI and MARUM.

• Datasets from researchers, projects, research centres and infrastructures (national & international) published with DOIs

• Data types, e.g., time series, spatial, images, audio, video.

PANGAEA Data Portal (https://pangaea.de/)

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

Data Search in PANGAEA

4

• PANGAEA offers tools/APIs for meta(data) access and discovery.

Data DOI

Publication DOI

IGSN

ORCID

Project

Full-text search with autocomplete

Search by topics

Map-based search

Search functions in PANGAEA Dataset and its related research objects

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

Data Search in PANGAEA

5Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

Search Meets Discovery

6

• “Search is often struggling to deliver meaningful results, unless you’re very explicit and goal oriented…” (Bibblio, Search vs Discovery, 2015)

• How about users ..• may not know what/how to search

• who are not aware of the range of available datasets

• if presented with many datasets may not be able to choose the datasets-of-interest.

How can users discover relevant and ‘novel’ datasets on the portal?

Keyword search brings too many almost identical datasets; diversity is missing!

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

Recommender System

7

A recommender system is an information filtering system that provide users with personalized contents and services.

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

PANGAEA Data Recommender

8

View

Dataset

Metadata-based RecommendationsDatasets with similar

metadata

Data Users

Interaction-based Recommendations

Users interested in this dataset were also

interested in…

1 2

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

9

Metadata-based Recommendation Interaction-based Recommendation

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

More diverse recommendations.

Image : World Register of Marine Species

Metadata-based Data Recommendation

10

• Leverages ElasticSearch More Like This (MLT) with boosting.

• MLT returns datasets that are similar to a provided data based on metadata elements, e.g., title, abstract, related publication, authors, topics, projects, devices, campaign, location, time.

Query Index Database

Documents (Datasets)

Set of documents ranked by their similarity to the query

Ranking function

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

Metadata

Set of documents ranked by their similarity to the metadata

Usage-based Data Recommendation

11

• Utilizes 3 user interactions (extracted from the server logs) such as search interaction, joint download, and total download.

T1 T2 Tn

D1

D2

D3

D4

Dataset vs. Search Term Matrix

Joint downloadJointly downloaded datasets are likely to be similar.

Search interactionDatasets examined after launching similar searches are likely to be similar.

U1 U2 U3 Un

D1

D2

D3

Dn

Dataset vs. User Matrix

USER 1USER 1USER 1USER 1USER 1

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

System Implementation

12

Offline processing of server logs (search interaction, joint download and total download)

Interaction-based Data

Recommendations

PANGAEA Data Portal

Users interested in this dataset were also

interested in…

Datasets with similar metadata

PANGAEA Elastic Search Engine

Data Usage Index

Data Full Text Index

Metadata-based Data

Recommendations

2

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

Online Evaluation

13Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

• Average clicks/day : 26• % contribution to total events

• Metadata : 83%• Interaction : 17%

14

• Click Through Rate (CTR) = Clicks ÷ Impressions

Online Evaluation

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

2,6

2 1,9 1,9 1,8 1,82 2,1

1,6

2,1

1,8 1,9

0

0,5

1

1,5

2

2,5

3

Aug 18

Sep 18

Okt 18

Nov 18

Dez 18

Jan 19

Feb 19

Mrz 19

Apr 19

Mai 19

Jun 19

Jul 19

CTR

%

Monthly Click Through Rates (CTR) (Aug 2018 - July 2019)

Conclusions

15

• Developed a recommender to improve data discovery, which presents users with two kinds of recommendations.

• Building a data recommender on top of the ElasticSearch enhances the scalability and maintainability of the recommender system.

Ongoing/planned work:• More features –

ontological concepts of parameters.

• Improve the presentation of recommendations.

• Extend online evaluation (conversion rate).

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.