Capturing the Ineffable: Collecting, Analysing, and Automating Web Document Quality Assessments

Davide Ceolin, Julia Noordegraaf, Lora Aroyo

Outline

• Introduction
• Nichesourcing Web Document Quality Assessments
• User Studies
• Conclusion and Future Work

Introduction

Web Document Quality Assessment

• Source criticism
  • Methodological practice from the humanities.
  • E.g., from the American Library Association:
    • How was the source located?
    • What type of source is it?
    • Who is the author and what are the qualifications of the author in regard to the topic that is discussed?
    • When was the information published?
    • In which country was it published?
    • What is the reputation of the publisher?
    • Does the source show a particular cultural or political bias?
• How does it apply to Web sources?

Web Document Quality Assessment

What is the quality of each of these documents?

[Two example documents shown side by side, each labelled with quality markers]

• First document: Authoritative source ✓, Accurate ✓, Precise ✓, Complete ✓, Neutral (?)
• Second document: Blog post (?), Accurate (?), Precise (?), Complete (?), Neutral ✗

Quality and Quality Markers

• We adapt source criticism to Web documents and aim to automate quality estimation by:
  • Gathering quality assessments (mostly from experts).
  • Looking for markers (document features) that correlate with them.

Objectives

• Analyse the consistency of quality assessments.
  • Are quality assessments consistent among users, over time, etc.?
• Analyse user ability to interpret document features.
  • Can users estimate the quality of a document from its sentiment or trustworthiness level?
• Analyse the predictability of quality assessments.
  • Can we automatically estimate the quality of a document?

Nichesourcing Web Document Quality Assessments

Dataset, Features, and Quality Dimensions

• Dataset: documents about vaccinations
  • Initially, 50 documents from various sources (blogs, authorities, etc.).
• Features
  • Information automatically extracted from the documents using AlchemyAPI and Web of Trust.
  • Entities, topics, sentiment, emotions, trustworthiness.
• Quality dimensions
  • Overall quality, accuracy, completeness, precision, trustworthiness, readability, neutrality.

WebQ: Nichesourcing Web Quality Assessments

• Setup:
  • 6 documents per participant.
  • Random selection.
  • Even distribution of assessments.
• Scenario: "Suppose you are asked to write an article about the debate on vaccinations triggered by the 2015 measles outbreak at Disneyland in California."
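
The setup above (6 documents per participant, chosen at random, with assessments spread evenly over the documents) could be sketched as follows. This is only an illustration under stated assumptions: the slides do not say how WebQ balances the selection, so the greedy "least-assessed first" scheme, the function name `assign_documents`, the seed, and the document IDs are all hypothetical.

```python
import random
from collections import Counter


def assign_documents(doc_ids, n_participants, per_participant=6, seed=0):
    """Give each participant `per_participant` distinct documents.

    Hypothetical scheme: each participant receives the documents that have
    been assigned least often so far, with ties broken at random.
    """
    if per_participant > len(doc_ids):
        raise ValueError("not enough documents for one participant")
    rng = random.Random(seed)
    counts = Counter({doc: 0 for doc in doc_ids})
    assignments = []
    for _ in range(n_participants):
        pool = list(doc_ids)
        # Shuffle first so that equally-assigned documents are ordered randomly;
        # the stable sort below keeps that random order within each count.
        rng.shuffle(pool)
        pool.sort(key=lambda doc: counts[doc])
        chosen = pool[:per_participant]
        for doc in chosen:
            counts[doc] += 1
        assignments.append(chosen)
    return assignments, counts


if __name__ == "__main__":
    docs = [f"doc{i:02d}" for i in range(50)]  # placeholder IDs
    assignments, counts = assign_documents(docs, n_participants=20)
    print(assignments[0])
    print(min(counts.values()), max(counts.values()))
```

With this scheme the number of assignments per document never differs by more than one between any two documents, which is one way to read "even distribution".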

WebQ: Task 1

• Documents are anonymized.
• Users choose documents that meet their quality criteria based on features only.
• All feature values are shown, alone and together.

WebQ: Task 2

• Read each of the 6 articles.
• Assess it.
  • Rate completeness, accuracy, etc.
  • Likert scale 1-5.
• Annotate the article to explain the ratings.
  • Articles are proxied and annotated through AnnotatorJS.
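
One possible way to represent what Task 2 collects is sketched below: a set of 1-5 Likert ratings over the quality dimensions listed on the "Dataset, Features, and Quality Dimensions" slide, plus free-text annotations. The class and field names are hypothetical and do not reflect WebQ's actual data model or the AnnotatorJS storage format.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Quality dimensions as named in the slides (identifier spelling is ours).
DIMENSIONS = (
    "overall_quality", "accuracy", "completeness", "precision",
    "trustworthiness", "readability", "neutrality",
)


@dataclass
class Annotation:
    quote: str    # passage of the article the participant highlighted
    comment: str  # participant's explanation of a rating


@dataclass
class Assessment:
    participant_id: str
    document_id: str
    ratings: Dict[str, int]
    annotations: List[Annotation] = field(default_factory=list)

    def __post_init__(self):
        # Reject unknown dimensions and values outside the 1-5 Likert scale.
        for dimension, value in self.ratings.items():
            if dimension not in DIMENSIONS:
                raise ValueError(f"unknown dimension: {dimension}")
            if isinstance(value, bool) or not isinstance(value, int):
                raise ValueError(f"rating for {dimension} must be an integer")
            if not 1 <= value <= 5:
                raise ValueError(f"rating for {dimension} must be in 1..5")


if __name__ == "__main__":
    example = Assessment(
        participant_id="p01",  # placeholder values
        document_id="doc07",
        ratings={"overall_quality": 4, "accuracy": 5, "neutrality": 2},
        annotations=[Annotation("some quoted passage", "cites no sources")],
    )
    print(example)
```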

User Studies

Setup

• User Study 1
  • Participants: 20 final-year UvA journalism students.
  • Duration: 60 minutes.
• User Study 2
  • Participants: 20 RMA media scholars.
  • Duration: 45 minutes.
  • Improvements (learnt from User Study 1).

Results

• Data collected:
  • 104 (US1) + 47 (US2) assessments.
  • 238 (US1) + 89 (US2) annotations.
• No significant difference between use cases (Wilcoxon signed-rank test).
  • The assessments can be treated as comparable.
• Assessment predictability (SVC):
  • Up to 63% accuracy (5 classes).
  • Up to 89% accuracy (2 classes).
  • Promising predictability. We will try other algorithms.
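
For readers unfamiliar with the Wilcoxon signed-rank test mentioned above, the following is a minimal stdlib-only sketch of the statistic. It assumes paired samples (for instance, one mean rating per document from each group); the slides do not say how the samples were paired, and the numbers in the usage example are made up. The p-value uses the normal approximation without tie or continuity correction, so it is rough for small samples; in practice a statistics library such as SciPy (`scipy.stats.wilcoxon`) would normally be used instead.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_signed_rank(xs, ys):
    """Return (W+, W-, two-sided p-value) for paired samples xs and ys."""
    if len(xs) != len(ys):
        raise ValueError("samples must be paired and of equal length")
    # Zero differences carry no sign information and are dropped.
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Normal approximation of the null distribution of min(W+, W-).
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (min(w_plus, w_minus) - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, w_minus, p_value


if __name__ == "__main__":
    group_a = [3.5, 4.0, 2.5, 4.5, 3.0, 2.0]  # illustrative values only
    group_b = [3.0, 4.5, 2.0, 4.0, 3.5, 2.5]
    print(wilcoxon_signed_rank(group_a, group_b))
```

A large p-value means the test finds no evidence of a systematic shift between the two groups, which is the kind of outcome the slide reports.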

Results

• Highest correlation with overall quality:
  • Accuracy
  • Trustworthiness
  • Precision
  • Completeness
• Given the task at hand, neutrality is not relevant.
• Weak correlation between Task 1 choices and overall quality (Task 2).
  • Users were mostly unable to interpret those features.
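
The correlations with overall quality listed above could be computed along the following lines. The slides do not name the coefficient that was used; Spearman's rank correlation is assumed here because Likert ratings are ordinal. The ratings in the usage example are invented purely for illustration.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples of length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


if __name__ == "__main__":
    overall = [5, 4, 2, 1, 3]  # illustrative Likert ratings, not study data
    dimensions = {
        "accuracy": [5, 4, 1, 2, 3],
        "neutrality": [3, 1, 4, 2, 5],
    }
    for name, ratings in dimensions.items():
        print(name, round(spearman(ratings, overall), 2))
```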

Conclusion

Conclusion

• We collected Web document quality assessments.
  • WebQ – Nichesourcing application.
  • 2 user studies with experts.
  • Clearly defined task.
  • Controlled dataset.
• We analysed the assessments and automated their prediction.
  • The task matters more than subjectivity.
  • Assessments are quite uniform and coherent.
  • Features in isolation are not very meaningful.
  • The application setup is important.

(Current and) Future Work

• We are working on, or plan to work on:
  • Extending the dataset (currently ~1,500 documents).
  • Scaling up the experiments and gathering more assessments.
  • Involving laymen via crowdsourcing.
  • Extending the analyses.
  • Utilising other automated reasoning approaches.

Thank you!

https://qupid-project.net/