Quality and collaboration in Wikidata


QUALITY AND COLLABORATION IN WIKIDATA

Elena Simperl and Alessandro Piscopo

University of Southampton, UK

@esimperl

OVERVIEW

Wikidata is a critical AI asset in many applications

A recent Wikimedia project (launched in 2012), edited collaboratively

Our research assesses the quality of Wikidata and the link between community processes and quality

WHAT IS WIKIDATA

BASIC FACTS

Collaborative knowledge graph

100k registered users, 35M items

Open licence

RDF exports, connected to Linked Open Data Cloud

THE KNOWLEDGE GRAPH: STATEMENTS, ITEMS, PROPERTIES

Item identifiers start with a Q, property identifiers start with a P

Example: Q84 (London) has the statement P6 (head of government) → Q334155 (Sadiq Khan)
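For illustration (not part of the original slides), this statement can be retrieved from the public Wikidata SPARQL endpoint; the following is a minimal sketch, with an arbitrary User-Agent string.

```python
# Minimal sketch (illustrative): query the public Wikidata SPARQL endpoint for
# the current head of government (P6) of London (Q84).
import requests

SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT ?headOfGov ?headOfGovLabel WHERE {
  wd:Q84 wdt:P6 ?headOfGov .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""

response = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "wikidata-quality-demo/0.1"},  # arbitrary identifier
)
for row in response.json()["results"]["bindings"]:
    print(row["headOfGov"]["value"], row["headOfGovLabel"]["value"])
```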

THE KNOWLEDGE GRAPH: ITEMS CAN BE CLASSES, ENTITIES, VALUES

Example items: Q7259 (Ada Lovelace), Q84 (London), Q334155 (Sadiq Khan), Q727 (Amsterdam), Q515 (city), Q6581097 (male), Q59360 (Labour Party), Q145 (United Kingdom); linked by properties such as P6 (head of government)

THE KNOWLEDGE GRAPH: ADDING CONTEXT TO STATEMENTS

Statements may include context: qualifiers (optional) and references (required)

Two types of references: internal, linking to another item; external, linking to a webpage

Example: Q84 (London) P6 (head of government) → Q334155 (Sadiq Khan), with qualifier 9 May 2016 and reference https://www.london.gov.uk/...
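As a sketch (our illustration, not from the slides), the entity JSON served by the Wikidata API represents such a statement roughly as below; treating P580 (start time) as the qualifier and P854 (reference URL) as the reference snak is an assumption about this particular example.

```python
# Sketch (assumption about the exact shape): abbreviated, Python-dict view of
# how Wikidata's entity JSON represents the statement Q84 --P6--> Q334155
# with one qualifier and one external reference.
statement = {
    "mainsnak": {"property": "P6", "datavalue": {"value": {"id": "Q334155"}}},
    "qualifiers": {
        # P580 ("start time") is assumed to hold the "9 May 2016" qualifier
        "P580": [{"datavalue": {"value": {"time": "+2016-05-09T00:00:00Z"}}}],
    },
    "references": [
        {
            # P854 ("reference URL") points to the external web page
            "snaks": {"P854": [{"datavalue": {"value": "https://www.london.gov.uk/..."}}]},
        }
    ],
}

# An internal reference would instead use a property such as P248 ("stated in")
# pointing to another Wikidata item.
```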

THE KNOWLEDGE GRAPH: CO-EDITED BY BOTS AND HUMANS

Human editors can register or work anonymously

Bots are created by the community for routine tasks

OUR WORK

Influence of community make-up on outcomes

Effects of editing practice on outcomes

Data quality, as a function of its provenance

THE RIGHT MIX OF USERS

Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata. International Conference on Social Informatics, 305-322, Springer.

BACKGROUND

Wikidata editors have varied tenure and interests

Group composition impacts outcomes

Diversity can have multiple effects

Moderate tenure diversity increases outcome quality

Interest diversity leads to increased group productivity

Chen, J., Ren, Y., & Riedl, J. (2010). The effects of diversity on group productivity and member withdrawal in online volunteer groups. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10), p. 821. ACM Press, New York, USA.

OUR STUDY

Analysed the edit history of items: used a corpus of 5,000 items whose quality has been manually assessed (5 levels)*

Edit history analysis focused on community make-up

Community defined as the set of editors of an item

Considered features from group diversity literature and Wikidata-specific aspects

*https://www.wikidata.org/wiki/Wikidata:Item_quality

RESEARCH HYPOTHESES

Hypothesis | Activity | Outcome
H1 | Bot edits | Item quality
H2 | Bot-human interaction | Item quality
H3 | Anonymous edits | Item quality
H4 | Tenure diversity | Item quality
H5 | Interest diversity | Item quality

DATA AND METHODS

Ordinal regression analysis; four models were trained

Dependent variable: quality level of the 5,000 labelled Wikidata items

Independent variables

Proportion of bot edits

Bot-human edit proportion

Proportion of anonymous edits

Tenure diversity: Coefficient of variation

Interest diversity: User editing matrix

Control variables: group size, item age
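The following is a minimal sketch of how such an ordinal regression could be set up in Python; the file name, column names, and quality labels are hypothetical, and the paper's exact model specification may differ.

```python
# Sketch (illustrative, not the authors' code): ordinal regression of item
# quality (5 ordered levels) on group-composition features.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

items = pd.read_csv("item_groups.csv")  # hypothetical: one row per labelled item

# Item quality as an ordered categorical (E = lowest ... A = highest)
quality = items["quality_label"].astype(
    pd.CategoricalDtype(categories=["E", "D", "C", "B", "A"], ordered=True)
)

features = items[[
    "prop_bot_edits",        # proportion of bot edits (H1)
    "prop_bot_human_edits",  # bot-human edit proportion (H2)
    "prop_anonymous_edits",  # proportion of anonymous edits (H3)
    "tenure_diversity",      # coefficient of variation of editors' tenure (H4)
    "interest_diversity",    # derived from the user editing matrix (H5)
    "group_size",            # control
    "item_age",              # control
]]

model = OrderedModel(quality, features, distr="logit")
result = model.fit(method="bfgs")
print(result.summary())
```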

RESULTS: ALL HYPOTHESES SUPPORTED


LESSONS LEARNED

01 The more is not always the merrier

02 Bot edits are key for quality, but bots and humans together are better

03 Diversity matters

IMPLICATIONS

01 Encourage registration

02 Identify further areas for bot editing

03 Design effective human-bot workflows

04 Suggest items to edit based on tenure and interests

LIMITATIONS AND FUTURE WORK

▪ Measures of quality over time required

▪ Sample not representative of Wikidata as a whole (most items rated C or lower)

▪ Other group features (e.g., coordination) not considered

▪ No distinction between editing activities (e.g., schema vs. instances, topics, etc.)

▪ Different metrics of interest (topics, type of activity)

THE DATA IS AS GOOD AS ITS REFERENCES

Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. International Semantic Web Conference, 542-558, Springer.

PROVENANCE IN WIKIDATA

Statements may include context: qualifiers (optional) and references (required)

Two types of references: internal, linking to another item; external, linking to a webpage

Example: Q84 (London) P6 (head of government) → Q334155 (Sadiq Khan), with qualifier 9 May 2016 and reference https://www.london.gov.uk/...

THE ROLE OF PROVENANCE

Wikidata aims to become a hub of references

Data provenance increases trust in Wikidata

Lack of provenance hinders data reuse

The quality of references is as yet unknown

Hartig, O. (2009). Provenance Information in the Web of Data. LDOW, 538.

OUR STUDY

Approach to evaluate quality of external references in Wikidata

Quality is defined by the Wikidata verifiability policy

Relevant: references support the statement they are attached to

Authoritative: trustworthy, up-to-date, and free of bias for supporting a particular statement

Large-scale (the whole of Wikidata)

Bot vs. human-contributed references

RESEARCH QUESTIONS

RQ1 Are Wikidata external references relevant?

RQ2 Are Wikidata external references authoritative?

▪ I.e., do they match the author and publisher types from the Wikidata policy?

RQ3 Can we automatically detect non-relevant and non-authoritative references?

METHODS: TWO-STAGE MIXED APPROACH

1. Microtask crowdsourcing

▪ Evaluate relevance & authoritativeness of a reference sample

▪ Create training set for machine learning model

2. Machine learning

▪ Large-scale reference quality prediction

(Stage 1 addresses RQ1 and RQ2; Stage 2 addresses RQ3)

STAGE 1: MICROTASK CROWDSOURCING

▪ 3 tasks on Crowdflower

▪ 5 workers per task, majority voting (see the sketch after the table below)

▪ Test questions to select workers

Feature | Microtask | Description
Relevance | T1 | Does the reference support the statement?
Authoritativeness | T2 | Choose author type from list
Authoritativeness | T3.A | Choose publisher type from list
Authoritativeness | T3.B | Verify publisher type, then choose sub-type from list

(RQ1, RQ2)
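A minimal sketch of the majority-vote aggregation mentioned above (illustrative; the slides do not describe the exact aggregation pipeline):

```python
# Sketch (illustrative): aggregate the 5 worker judgements collected per
# microtask by majority vote.
from collections import Counter

def majority_vote(judgements):
    """Return the most frequent answer and its share of the judgements."""
    counts = Counter(judgements)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(judgements)

# Example: five workers judging whether a reference supports a statement (T1)
print(majority_vote(["yes", "yes", "no", "yes", "yes"]))  # ('yes', 0.8)
```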

STAGE 2: MACHINE LEARNING

Compared three algorithms: Naïve Bayes, Random Forest, SVM

Features based on [Lehmann et al., 2012; Potthast et al., 2008]

Baseline: item label matching (relevance); deprecated domains list (authoritativeness)

(RQ3)

Features: URL the reference uses, source HTTP code, statement item vector, statement object vector, author activity, subject parent class, property parent class, object parent class, author type, author activity on references
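A minimal sketch of how the three classifiers could be compared on such features (illustrative; X and y are placeholders, and the authors' feature extraction and evaluation setup may differ):

```python
# Sketch (illustrative): compare Naïve Bayes, Random Forest and SVM on the
# crowdsourced labels, reporting F1 and Matthews correlation (MCC) as in the
# results table below. X (feature matrix) and y (binary labels) are placeholders.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

classifiers = {
    "Naïve Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": SVC(kernel="rbf"),
}
scoring = {"f1": make_scorer(f1_score), "mcc": make_scorer(matthews_corrcoef)}

def compare(X, y):
    """Cross-validate each classifier and print mean F1 and MCC."""
    for name, clf in classifiers.items():
        scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
        print(f"{name}: F1={scores['test_f1'].mean():.2f} "
              f"MCC={scores['test_mcc'].mean():.2f}")
```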

DATA

1.6M external references (6% of total); 1.4M from two sources (protein knowledge bases)

83,215 English-language references; sample of 2,586 (99% confidence, 2.5% margin of error)

885 assessed automatically, e.g., non-working links or CSV files

RESULTS (CROWDSOURCING): CROWDSOURCING WORKS

▪ Trusted workers: >80% accuracy

▪ 95% of responses from T3.A confirmed in T3.B

Task | No. of microtasks | Total workers | Trusted workers | Workers' accuracy | Fleiss' k
T1 | 1701 references | 457 | 218 | 75% | 0.335
T2 | 1178 links | 749 | 322 | 75% | 0.534
T3.A | 335 web domains | 322 | 60 | 66% | 0.435
T3.B | 335 web domains | 239 | 116 | 68% | 0.391

RESULTS (CROWDSOURCING): MAJORITY OF REFERENCES ARE HIGH QUALITY

2586 references evaluated

Found 1674 valid references from 345 domains

Broken URLs deemed not relevant and not authoritative

(RQ1, RQ2)

RESULTS (CROWDSOURCING): HUMANS ARE BETTER AT EDITING REFERENCES

(RQ1, RQ2)

RESULTS (CROWDSOURCING): DATA FROM GOVERNMENT AND ACADEMIA

Most common author type (T2): organisation (78%)

Most common publisher types (T3): governmental agencies (37%), academic organisations (24%)

(RQ2)

RESULTS (MACHINE LEARNING): RANDOM FORESTS PERFORM BEST

Relevance | F1 | MCC
Baseline | 0.84 | 0.68
Naïve Bayes | 0.90 | 0.86
Random Forest | 0.92 | 0.89
SVM | 0.91 | 0.87

Authoritativeness | F1 | MCC
Baseline | 0.53 | 0.16
Naïve Bayes | 0.86 | 0.78
Random Forest | 0.89 | 0.83
SVM | 0.89 | 0.79

(RQ3)

LESSONS LEARNED

Crowdsourcing+ML works!

Many external sources are high quality

Bad references are mainly non-working links; continuous monitoring is required

Lack of diversity in bot-added sources

Humans and bots are good at different things

LIMITATIONS AND FUTURE WORK

Studies with non-English sources

New approach for internal references

Deployment in Wikidata, including changes in editing behaviour

THE COST OF FREEDOM: ON THE ROLE OF PROPERTY CONSTRAINTS IN WIKIDATA


BACKGROUND

Wikidata is built by the community, from scratch

Editors are free to carry out any kind of edit

There is tension between editing freedom and quality of the modelling

Property constraints have been introduced at a later stage

Currently 18 constraints, but they are not enforced

Hall, A., McRoberts, S., Thebault-Spieker, J., Lin, Y., Sen, S., Hecht, B., & Terveen, L. (2017). Freedom versus standardization: structured data generation in a peer production community. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 6352-6362). ACM.

OUR STUDY

Effects of property constraints on:

Content quality, i.e., increasing user awareness of property use

Diversity of expression

Editor behaviour, by increasing conflict level

▪ Several claims can be expressed for a statement, thanks to qualifiers and references


The cost of freedom: Claims

Example: for Q84 (London), P6 (head of government), two claims can be expressed: Q334155 (Sadiq Khan), with qualifier 9 May 2016 and reference https://www.london.gov.uk/…, and Q180589 (Boris Johnson), with qualifier 4 May 2008 and reference https://www.london.gov.uk/…

RESEARCH HYPOTHESES

Hypothesis | Activity | Outcome
H1 | Property constraints | Property perspicuity
H2 | Property constraints | Knowledge diversity
H3 | Property constraints | Level of conflict

METRICS

▪ Property perspicuity: V = Nviolations/Nclaims

▪ Knowledge diversity: KDscore = Nclaims/Nstatements

▪ Controversy metric:

▪ Conflicting edits

▪ Cscore = Nconfl.edits/Nedits (0> Cscore>>1)

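As a minimal sketch (illustrative; the counts passed in the example are placeholders), the three metrics are simple ratios over per-property counts:

```python
# Sketch (illustrative): the three metrics as simple ratios over per-property counts.

def property_perspicuity(n_violations: int, n_claims: int) -> float:
    """V = N_violations / N_claims: ratio of constraint violations to claims."""
    return n_violations / n_claims

def knowledge_diversity(n_claims: int, n_statements: int) -> float:
    """KD_score = N_claims / N_statements: claims expressed per statement."""
    return n_claims / n_statements

def controversy(n_conflicting_edits: int, n_edits: int) -> float:
    """C_score = N_confl.edits / N_edits, bounded between 0 and 1."""
    return n_conflicting_edits / n_edits

# Example with placeholder counts
print(property_perspicuity(12, 480), knowledge_diversity(530, 500), controversy(7, 950))
```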

METHODS

H1: Linear trend analysis of the violation ratio V

H2 and H3: Lagged multiple regression models to predict changes in KD_score and C_score between T_n and T_n-1
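A sketch of the lagged regression setup for H2 (H3 is analogous, with C_score as the outcome); the panel layout, file name, column names, and included controls are assumptions, not the authors' exact specification:

```python
# Sketch (illustrative): lagged multiple regression predicting the change in
# KD_score between T_n-1 and T_n from the number of property constraints at T_n-1.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per property and time period
panel = pd.read_csv("property_periods.csv").sort_values(["property", "period"])

panel["kd_change"] = panel.groupby("property")["kd_score"].diff()                # KD_score(T_n) - KD_score(T_n-1)
panel["constraints_lag"] = panel.groupby("property")["n_constraints"].shift(1)   # constraints at T_n-1
panel["kd_lag"] = panel.groupby("property")["kd_score"].shift(1)                 # level at T_n-1 as control

model = smf.ols("kd_change ~ constraints_lag + kd_lag", data=panel.dropna()).fit()
print(model.summary())
```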

RESULTS

H1 was supported, but limited to some constraints

12 constraints out of 18 showed significant variation over the time frame observed

Constraint with largest variation was type (i.e., property domain)

RESULTS

H2 was rejected, but more property constraints at the beginning of a time frame led to decreased knowledge diversity

RESULTS

H3 was rejected; constraints led to fewer conflicts

LIMITATIONS

Wikidata is still at an early stage of development

Metrics need further refinement

Changes were made to constraints after our analysis, which could produce new effects

LESSONS LEARNED

Editors seem to understand the meaning of property constraints

Low level of knowledge diversity and conflict overall

Non-enforcement of constraints seems to have only limited effect on community dynamics

Effects of when and how constraints are introduced not explored yet


CONCLUSIONS


SUMMARY OF FINDINGS

Collaboration between humans and bots is important

Tools needed to identify tasks for bots and continuously study their effects on outcomes and community

References are high quality, though biases exist in terms of choice of sources

Wikidata’s approach to knowledge engineering questions existing theoretical and empirical literature