Rijksmuseum presentation



Trusting user-contributed data in Cultural Heritage Domain

Archana Nottamkandath (Work done with Davide Ceolin & Wan Fokkink)

VU University Amsterdam

COMMIT/SEALINC


Context

• COMMIT/SEALINC project
• Museums have collections which can be annotated with (external) user-contributed information, enabling better search through the collection

(Example images with tags: Tulips, Butterfly, Portrait)

Can we directly trust the user-provided content?


Can we trust the user-provided content directly? – Apparently not!


(Examples of untrustworthy annotations: "Stella is Gay", "www.apartmentvermeer.com")

Possible Solution: Manually evaluate annotations


Accept

Not sure

Reject

But… the RMA (Rijksmuseum Amsterdam) has over 1 million collection items!

Evaluation costs resources

• Expensive manual labor
• Costs a lot of time
• Requires adherence to museum policies
  – Museum X: [Accept, Not sure, Reject]
  – Museum Y: [Foreign, Judgmental, Strong reject, Strong accept] …

Need for automated trust analysis

• Algorithms automatically or semi-automatically evaluate annotations

(Example annotations: (a) Flower, (b) 19th century, (c) Sunshine, (d) Vermeer, (e) Bronze)

Automated trust analysis algorithms

• Requirements
  – High accuracy (accurately predict evaluations most of the time)
  – Minimum input from cultural heritage professionals
  – Scalable and efficient (w.r.t. resources and time)
  – Works with different cultural heritage data

Definition

• Trustworthy annotation:
  – Relevant to the image
  – Enhances or reinstates existing knowledge
  – Acceptable under museum policies for publication on their website

Existing workflow

(Diagram: annotators use the Accurator Interface to contribute annotations to the collection.)

How to determine trust from users contributing annotations to the system?

(Diagram: user Jones contributed the tags Tulips, Roses, Night Sky, Van Gogh, Buddhist, Portrait, Monument, Asian, War memorial through the Accurator Interface.)

How to determine trust from the annotation process?

(Diagram: the tags contributed by user Jones (Tulips, Roses, Night Sky, etc.) through the Accurator Interface.)

How to determine trust from the contributed data?

(Diagram: the tags contributed by user Jones (Tulips, Roses, Night Sky, etc.) through the Accurator Interface.)

How to determine trust from users? [1]

• Evaluate a subset of each user's tags

(Diagram: the tags contributed by user Jones (Tulips, Roses, Night Sky, etc.) through the Accurator Interface.)

(Diagram: user Jones's contributed tags are split into a train set [Tulips, Van Gogh, Buddhist, Monument], which the museum evaluates, and a test set [Roses, Night sky, Van Gogh, Asian, War memorial].)
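The evaluate-a-subset step can be sketched as follows. This is only an illustration: the accept/reject verdicts are hypothetical, and the Laplace-smoothed fraction is an assumed trust estimator, not necessarily the presentation's actual algorithm.

```python
def estimate_trust(evaluations):
    """Estimate a user's trust as the smoothed fraction of accepted tags.

    `evaluations` maps each train-set tag to the museum's verdict.
    The Laplace smoothing (+1/+2) is an illustrative choice.
    """
    accepted = sum(1 for verdict in evaluations.values() if verdict == "accept")
    return (accepted + 1) / (len(evaluations) + 2)

# Hypothetical museum evaluations of Jones's train set
train_evaluations = {
    "Tulips": "accept",
    "Van Gogh": "accept",
    "Buddhist": "reject",
    "Monument": "accept",
}
trust = estimate_trust(train_evaluations)  # (3 + 1) / (4 + 2)
```

The resulting score can then be used to decide whether the test-set tags (the unevaluated remainder) are accepted without manual review.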

• A user who is an expert on one topic might be an expert on similar topics


How to determine trust from users? [1]

(Diagram: Jones is an expert on Tulips, so with a certain probability he is possibly an expert on the similar topics Roses and Lilies. His tags remain split into the train set [Tulips, Van Gogh, Buddhist, Monument] and the test set [Roses, Night sky, Van Gogh, Asian, War memorial].)
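The "possibly expert, with a certain probability" idea can be sketched as similarity-weighted trust propagation. The similarity scores below are hypothetical, and multiplying trust by similarity is an assumed rule, not the presentation's stated method.

```python
def propagate_trust(known_trust, similarities):
    """Propagate trust from an evaluated topic to similar topics.

    `known_trust` is the trust established on a known topic (e.g. Tulips);
    `similarities` maps related topics to a similarity score in [0, 1].
    Multiplying trust by similarity is an illustrative rule.
    """
    return {topic: known_trust * sim for topic, sim in similarities.items()}

# Hypothetical similarities between flower topics
estimates = propagate_trust(0.8, {"Roses": 0.9, "Lilies": 0.85})
# Jones is likely, though less certainly, trustworthy on Roses and Lilies
```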

Determine trust from users [2]

• User profile: [experience, education, country, gender, income, museum visits, …]

(Steve.museum dataset)

Determine trust from users [2]

• Predict user reputation using machine learning
• [Feature1, Feature2, …] -> Category of user
  – [21 yrs, Female, Bachelors, Australia] -> Excellent
  – [60 yrs, Male, PhD, America] -> Good
  – [56 yrs, Female, Masters, Croatia] -> Bad
  – [30 yrs, Male, Bachelors, Mexico] -> ?
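The slides do not say which learner was used, so as an illustration here is a minimal nearest-neighbour sketch over hand-encoded profile features. The encoding, the distance function, and the omission of the country feature are all assumptions.

```python
def encode(profile):
    """Encode [age, gender, education] into a numeric vector.

    These encodings are illustrative assumptions, not the features
    actually engineered for the Steve.museum dataset.
    """
    age, gender, education = profile
    genders = {"Female": 0, "Male": 1}
    levels = {"Bachelors": 0, "Masters": 1, "PhD": 2}
    return (age / 100, genders[gender], levels[education])

def predict_category(profile, labelled):
    """Predict a user's category as that of the nearest labelled profile."""
    x = encode(profile)

    def dist(item):
        y = encode(item[0])
        return sum((a - b) ** 2 for a, b in zip(x, y))

    return min(labelled, key=dist)[1]

# Labelled examples from the slide (country omitted for brevity)
labelled = [
    ([21, "Female", "Bachelors"], "Excellent"),
    ([60, "Male", "PhD"], "Good"),
    ([56, "Female", "Masters"], "Bad"),
]
category = predict_category([30, "Male", "Bachelors"], labelled)
```

In practice one would use a proper classifier (e.g. decision trees or SVMs) trained on many profiles, but the input/output shape is the same: a feature vector in, a reputation category out.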

How to determine trust from the annotation process?

• Time of day, day of week, day of month, etc. affect the quality of a user's contributions
• Typing speed affects contribution quality
  – Typing fast might indicate higher confidence

(Example tags: Tulips, Van Gogh, Buddhist, Monument; Rich Lady, Plant, Leonardo, Bronze plate)

How to determine trust from the annotation process?

• Predict tag quality using machine learning
• [Feature1, Feature2, …] -> Category of tag
  – [10:00, Monday, June, 3s] -> Excellent
  – [12:00, Wednesday, 15s] -> Good
  – [23:56, Friday, April, 80s] -> Bad
  – [06:00, Thursday, March, 70s] -> ?
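Extracting the process features in the slide's examples (time of day, day of week, month, typing time) can be sketched as follows; the timestamp format and field names are assumptions for illustration.

```python
from datetime import datetime

def process_features(timestamp, typing_seconds):
    """Extract annotation-process features from one tagging event.

    The feature set mirrors the slide's examples; the exact features
    used in the actual experiments are not specified there.
    """
    t = datetime.fromisoformat(timestamp)
    return {
        "hour": t.hour,                    # time of day
        "weekday": t.strftime("%A"),       # day of week
        "month": t.strftime("%B"),         # month
        "typing_seconds": typing_seconds,  # time taken to type the tag
    }

features = process_features("2013-06-03T10:00:00", 3)
```

These feature vectors are then fed to the same kind of classifier as the user-profile features, producing a tag-quality category instead of a user category.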

How to determine trust from the annotation process?

• Why is this important?
  – Useful for anonymous users who did not fill in profile information

How to determine trust from data?

• The contributed data itself has features; use machine learning to predict the quality of a tag:
  – Length
  – Specificity
  – Presence in vocabularies
  – Times already contributed
  – Whether it is a noun

(Example tags: Tulips, Van Gogh, Buddhist, Monument)
[6, specific, yes, English, 10, no, …] -> Good
[7, specific, yes, Dutch, 1, yes, …] -> Bad
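A small extractor for the data features listed above might look like this. The vocabulary and noun sets are hypothetical stand-ins for the controlled vocabularies the presentation refers to, and the specificity test is a crude assumed proxy.

```python
def tag_features(tag, vocabulary, nouns, previous_count):
    """Extract quality-prediction features from one contributed tag.

    `vocabulary` and `nouns` stand in for real controlled vocabularies
    (and a part-of-speech lookup); they are illustrative assumptions.
    """
    return {
        "length": len(tag),                      # character length
        "specific": tag.istitle(),               # crude specificity proxy
        "in_vocabulary": tag.lower() in vocabulary,
        "times_contributed": previous_count,     # how often already contributed
        "is_noun": tag.lower() in nouns,
    }

# Hypothetical vocabulary lookup
vocabulary = {"tulips", "van gogh", "monument"}
nouns = {"tulips", "monument"}
features = tag_features("Tulips", vocabulary, nouns, 10)
```

As with the other two sources of trust evidence, the resulting feature vector is classified into a quality category (Good, Bad, etc.).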

Goals achieved

• Requirements
  – High accuracy (accurately predict evaluations most of the time)
  – Minimum input from cultural heritage professionals
  – Scalable and efficient
  – Works with different cultural heritage data

Goal 1: High Accuracy


• Predicted the quality of a tag based on the user profile with an accuracy of 68% to 72% (Steve dataset results)

Goal 2: Minimum input from Cultural Heritage Institutions

• The algorithms require a minimum of 5 evaluated tags per user to make predictions
• We are working to minimize or eliminate this requirement

Goal 3: Scalable and efficient

• Reduced computation time while maintaining accuracy on the Steve dataset

Goal 4: Works with different cultural heritage data

• Steve Museum dataset
• Waisda? dataset
  – Video tagging game
• SEALINC Media experiments at CWI

Future Work

• Employ our experiences and algorithms to analyze the data from Accurator

• Employ trust scores for ranking in search
• Identify techniques to visualize trust

Thank you!
[email protected]