Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining

Anusuriya Devaraju, Uwe Schindler, Robert Huber, Michael Diepenbroek
{adevaraju, uschindler, rhuber, mdiepenbroek}@marum.de

PANGAEA Data Publisher for Earth & Environmental Science, MARUM - Center for Marine Environmental Sciences, University of Bremen, Germany

Geospatial Sensing Conference 2019: From Sensing to Understanding our World
September 2 – 4, 2019, Münster, Germany


Presentation Outline

• PANGAEA Digital Data Repository

• Data Recommender System

a. Metadata-based data recommendation

b. User interaction-based data recommendation

• Online Evaluation & Results

• Conclusions

Devaraju, A. et al., Data Recommender System: Improving the Discovery of Environmental Datasets through Text Analytics and Usage Mining, Geospatial Sensing Conference 2019.

PANGAEA Digital Data Repository

• More than 386,000 scientific datasets (as of 30.08.2019)

• Founded in 1993

• Jointly managed by AWI and MARUM.

• Datasets from researchers, projects, research centres and infrastructures (national & international) published with DOIs

• Data types, e.g., time series, spatial, images, audio, video.

PANGAEA Data Portal (https://pangaea.de/)

Data Search in PANGAEA

• PANGAEA offers tools and APIs for (meta)data access and discovery.

Search functions in PANGAEA: full-text search with autocomplete, search by topics, map-based search.

Dataset and its related research objects: data DOI, publication DOI, IGSN, ORCID, project.

Search Meets Discovery

• “Search is often struggling to deliver meaningful results, unless you’re very explicit and goal oriented…” (Bibblio, Search vs Discovery, 2015)

• How about users who:

• may not know what or how to search?

• are not aware of the range of available datasets?

• may not be able to choose the datasets of interest when presented with many datasets?

How can users discover relevant and ‘novel’ datasets on the portal?

Keyword search brings too many almost identical datasets; diversity is missing!

Recommender System

A recommender system is an information filtering system that provides users with personalized content and services.

PANGAEA Data Recommender

When a data user views a dataset, two kinds of recommendations are offered:

1. Metadata-based recommendations: datasets with similar metadata.

2. Interaction-based recommendations: "Users interested in this dataset were also interested in…"


Figure panels: Metadata-based Recommendation; Interaction-based Recommendation. Annotation: More diverse recommendations.

Image: World Register of Marine Species

Metadata-based Data Recommendation

• Leverages the Elasticsearch More Like This (MLT) query with boosting.

• MLT returns datasets that are similar to a given dataset based on metadata elements, e.g., title, abstract, related publication, authors, topics, projects, devices, campaign, location, time.

Diagram: a query is matched against an index built from the database of documents (datasets), and a ranking function returns a set of documents ranked by their similarity to the query. For recommendations, the metadata of a dataset takes the place of the query, so the result is a set of documents ranked by their similarity to the metadata.
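As a rough illustration of the More Like This approach with boosting, the sketch below builds an Elasticsearch request body in Python. It is not PANGAEA's actual query: the index name, document id, field names and boost values are hypothetical, and combining one `more_like_this` clause per field inside a `bool`/`should` query is only one possible way to weight metadata elements differently.

```python
import json

# Hypothetical field names and boosts, for illustration only; the slides do
# not give PANGAEA's actual index mapping or boost values.
FIELD_BOOSTS = {
    "title": 3.0,
    "abstract": 2.0,
    "topics": 1.5,
    "authors": 1.0,
}


def build_mlt_query(index, doc_id, field_boosts, size=5):
    """Build a request body with one More Like This clause per metadata field."""
    clauses = [
        {
            "more_like_this": {
                "fields": [field],
                # Use an already indexed dataset as the "like" document.
                "like": [{"_index": index, "_id": doc_id}],
                "min_term_freq": 1,
                "min_doc_freq": 1,
                # Per-clause boost: matches on this field count more or less.
                "boost": boost,
            }
        }
        for field, boost in field_boosts.items()
    ]
    return {"size": size, "query": {"bool": {"should": clauses}}}


# "datasets" and "example-dataset-id" are placeholder values.
body = build_mlt_query("datasets", "example-dataset-id", FIELD_BOOSTS)
print(json.dumps(body, indent=2))
```

The resulting dictionary could be sent to a search endpoint with any Elasticsearch client; the datasets returned would be ranked by their boosted similarity to the viewed dataset.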

Usage-based Data Recommendation

• Utilizes three kinds of user interaction extracted from the server logs: search interaction, joint download, and total download.

• Search interaction: datasets examined after launching similar searches are likely to be similar. Dataset vs. Search Term Matrix: rows D1, D2, D3, D4; columns T1, T2, …, Tn.

• Joint download: jointly downloaded datasets are likely to be similar. Dataset vs. User Matrix: rows D1, D2, D3, …, Dn; columns U1, U2, U3, …, Un.
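To make the joint-download idea concrete, here is a minimal sketch that scores datasets by comparing their rows in a dataset vs. user matrix. The slides do not say which similarity measure is used, so cosine similarity is an assumption here, and the matrix values are made up.

```python
import math

# Toy dataset-vs-user matrix (made-up values): rows are datasets, columns are
# users, and 1 means the user downloaded the dataset.
DOWNLOADS = {
    "D1": [1, 1, 0, 0],
    "D2": [1, 1, 1, 0],
    "D3": [0, 0, 1, 1],
    "D4": [0, 0, 0, 1],
}


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(matrix, target, top_n=2):
    """Return up to top_n (dataset, score) pairs most similar to the target."""
    scores = [
        (other, cosine(matrix[target], row))
        for other, row in matrix.items()
        if other != target
    ]
    # Drop datasets that share no users with the target.
    scores = [(name, score) for name, score in scores if score > 0]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]


for name, score in recommend(DOWNLOADS, "D3"):
    print(name, round(score, 3))
```

The same function would work on a dataset vs. search term matrix, since it only needs one numeric row per dataset.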

System Implementation

Architecture diagram, components:

• Offline processing of server logs (search interaction, joint download and total download)

• PANGAEA Elastic Search Engine, with a Data Usage Index and a Data Full Text Index

• Interaction-based data recommendations, shown on the PANGAEA Data Portal as "Users interested in this dataset were also interested in…"

• Metadata-based data recommendations, shown on the PANGAEA Data Portal as "Datasets with similar metadata"

Online Evaluation

• Average clicks/day: 26

• Contribution to total events: metadata-based recommendations 83%, interaction-based recommendations 17%

• Click-Through Rate (CTR) = Clicks ÷ Impressions
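The click-through rate formula can be written as a small helper. The function name and the counts below are illustrative only and are not PANGAEA figures; the result is expressed as a percentage.

```python
def click_through_rate(clicks, impressions):
    """Return CTR as a percentage; 0.0 when there were no impressions."""
    if impressions == 0:
        return 0.0
    return 100.0 * clicks / impressions


# Made-up counts: 20 clicks on recommendations shown 1000 times.
print(f"{click_through_rate(20, 1000):.1f}%")  # prints "2.0%"
```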

Figure: Monthly Click-Through Rates (CTR), Aug 2018 to Jul 2019. One value per month; y-axis: CTR (%), scaled from 0 to 3. The monthly values lie between 1.6% and 2.6%.

Conclusions

• Developed a recommender to improve data discovery, which presents users with two kinds of recommendations.

• Building the data recommender on top of Elasticsearch enhances the scalability and maintainability of the recommender system.

Ongoing/planned work:

• More features: ontological concepts of parameters.

• Improve the presentation of recommendations.

• Extend online evaluation (conversion rate).
