Gleaning provenance from article similarity

Gleaning provenance from plagiarism detection article similarity Team BBCeX @fantasticlife @ekponk @tristanf UK Parliament + Springer Nature + BBC R&D

Transcript of Gleaning provenance from article similarity

Page 1: Gleaning provenance from article similarity

Gleaning provenance from plagiarism detection article similarity

Team BBCeX

@fantasticlife @ekponk @tristanf

UK Parliament + Springer Nature + BBC R&D

Page 2: Gleaning provenance from article similarity

What are we doing?

Measure the similarity of articles

Are they recycled churnalism or original reporting?

Use similarity to show clusters of articles and outlets

Use similarity, together with publication date, to find the original source of an article
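The slides do not say which similarity measure the team used, so the following is only a minimal sketch of the idea under stated assumptions: similarity is taken to be Jaccard overlap of three-word shingles, and the "original" is guessed to be the earliest-published article among those above a similarity threshold. The function names, the threshold of 0.5, and the sample articles are all hypothetical.

```python
from datetime import date


def shingles(text, k=3):
    """Return the set of k-word shingles in a text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets, between 0 and 1."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def likely_source(target, corpus, threshold=0.5, k=3):
    """Guess the origin of `target`: the earliest-published article among
    the target itself and every corpus article similar enough to it."""
    similar = [
        art for art in corpus
        if art is not target
        and jaccard(target["text"], art["text"], k) >= threshold
    ]
    return min(similar + [target], key=lambda art: art["published"])


# Hypothetical sample data, for illustration only.
first = {"outlet": "Outlet A", "published": date(2017, 1, 1),
         "text": "the minister announced a new policy on housing today"}
rewrite = {"outlet": "Outlet B", "published": date(2017, 1, 2),
           "text": "the minister announced a new policy on housing today in london"}
unrelated = {"outlet": "Outlet C", "published": date(2016, 12, 1),
             "text": "a completely unrelated story about football results this weekend"}
corpus = [first, rewrite, unrelated]

print(likely_source(rewrite, corpus)["outlet"])
```

Publication date alone is a weak signal, since timestamps can be wrong or updated after the fact, so a result like this is better read as a candidate source than as a verdict.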

Page 3: Gleaning provenance from article similarity

Why?

We think similarity could be an indicator of provenance, which is a signal of trust

though we are not entirely sure whether it’s a positive or a negative one!

cf “Original reporting” and “Citations and References”

Probably an indicator to be combined with other things

Page 4: Gleaning provenance from article similarity

An investigation into feasibility and usefulness

Page 5: Gleaning provenance from article similarity

MAP OF EVERYTHING

Page 6: Gleaning provenance from article similarity

A CLUSTER

Page 7: Gleaning provenance from article similarity

http://news-provenance.herokuapp.com/

Page 8: Gleaning provenance from article similarity
Page 9: Gleaning provenance from article similarity

[Chart: axes labelled "Least similar", "Most similar", and "Time"]

Page 10: Gleaning provenance from article similarity
Page 11: Gleaning provenance from article similarity
Page 12: Gleaning provenance from article similarity

[Chart: axes labelled "Least similar", "Most similar", and "Time"]

Page 13: Gleaning provenance from article similarity
Page 14: Gleaning provenance from article similarity

SIMILAR OUTLETS

Page 15: Gleaning provenance from article similarity

Who would use it?

For consumers reading an article, this could:

Show the source (or show if this is the source for others)

Show a churnalism rating (recycled vs original)

Link to other diverse perspectives (using clusters)
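One way a "churnalism rating" could be computed is sketched below. This is an assumption, not the team's method: the rating is taken to be the highest similarity between an article and anything published before it, so 0.0 means nothing earlier resembles it and 1.0 means it repeats an earlier article word for word. The similarity here uses Python's standard `difflib.SequenceMatcher` over word lists, chosen only because it needs no extra dependencies; the function names and sample articles are hypothetical.

```python
from datetime import date
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity ratio between two texts, between 0 and 1."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def churnalism_rating(article, corpus):
    """Highest similarity to any article published strictly earlier.

    Returns 0.0 when no earlier article exists in the corpus.
    """
    earlier = [o for o in corpus if o["published"] < article["published"]]
    return max(
        (similarity(article["text"], o["text"]) for o in earlier),
        default=0.0,
    )


# Hypothetical sample data, for illustration only.
original = {"published": date(2017, 3, 1),
            "text": "council approves plan for new cycle lanes across the city"}
copy = {"published": date(2017, 3, 2),
        "text": "council approves plan for new cycle lanes across the city"}
partial = {"published": date(2017, 3, 3),
           "text": "council approves plan for new cycle lanes after long debate"}
articles = [original, copy, partial]

for art in articles:
    print(art["published"], round(churnalism_rating(art, articles), 2))
```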

Page 16: Gleaning provenance from article similarity

BROWSER EXTENSION?

Page 17: Gleaning provenance from article similarity

Who would use it?

For publishers

Show where coverage is similar to or distinct from competitors

Identify market gaps

Reward original journalism (getting credit)

Journalism about journalism

Page 18: Gleaning provenance from article similarity

Who would use it?

For aggregators and platforms

Cluster sources around stories

Identify the source

Separate signal from noise

Trust indicators
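"Cluster sources around stories" could be approached in many ways, and the slides do not name one. As a rough sketch under stated assumptions, the code below links any two articles whose word-set Jaccard similarity meets a threshold and treats each connected group as one story. The threshold of 0.5, the function names, and the headlines are all hypothetical.

```python
from itertools import combinations


def word_jaccard(a, b):
    """Jaccard similarity of the two texts' word sets, between 0 and 1."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def cluster(texts, threshold=0.5):
    """Group text indices into connected components, where two texts are
    linked if their similarity is at least `threshold` (union-find)."""
    parent = list(range(len(texts)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(texts)), 2):
        if word_jaccard(texts[i], texts[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical headlines, for illustration only.
headlines = [
    "rail strike to hit commuters on monday",
    "rail strike will hit commuters on monday morning",
    "new panda born at city zoo",
    "city zoo celebrates new panda born",
]
print(cluster(headlines))
```

Comparing every pair is quadratic in the number of articles, which ties in with the scalability concern raised in the deck; a real corpus would likely need an approximate method instead.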

Page 19: Gleaning provenance from article similarity

What would we do next?

Validate this

Test other similarity measures

Add the wires + fake news

Scalability

Develop tools for users

Fix the post-truth problem
