To trust or not, is hardly the question! Sai Moturu.
Date posted: 19-Dec-2015
We're never so vulnerable than when we trust someone but paradoxically, if we cannot trust, neither can we find love or joy
- Walter Anderson
Trust Quality
Popularity
Reach
How much we can trust is the right question…
What are the hallmarks of consistently good information?
Objectivity: unbiased information
Completeness: self explanatory
Pluralism: not restricted to a particular viewpoint
Define propositions of trust
Content quality
Six macro-areas: Quality of user, user distribution and leadership, stability, controllability, quality of editing and importance of an article.
Using the ten propositions, 50 sources of trust evidence are identified.
Macro-areas of analysis
Necessary to control the meaning of each trust factor in relationship to the others
IF stability is high AND (length is short OR edit is low OR importance is low) THEN warning
IF leadership is high AND dictatorship is high THEN warning
IF length is high AND importance is low THEN warning
Logic conditions
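The three rules above can be written out as a small check. This is only a sketch: it assumes each trust factor has already been scaled to the range 0 to 1, and the `high` and `low` thresholds, the function name and the rule labels are illustrative choices that do not come from the slides.

```python
def trust_warnings(factors, high=0.7, low=0.3):
    """Return the labels of the rules that raise a warning.

    `factors` maps a factor name to a score in [0, 1] (assumed scaling).
    The thresholds `high` and `low` are hypothetical.
    """
    def is_high(name):
        return factors[name] >= high

    def is_low(name):
        return factors[name] <= low

    warnings = []
    # IF stability is high AND (length is short OR edit is low OR importance is low)
    if is_high("stability") and (
        is_low("length") or is_low("edit") or is_low("importance")
    ):
        warnings.append("stability")
    # IF leadership is high AND dictatorship is high
    if is_high("leadership") and is_high("dictatorship"):
        warnings.append("leadership")
    # IF length is high AND importance is low
    if is_high("length") and is_low("importance"):
        warnings.append("length")
    return warnings
```

Each rule looks at one factor in relation to the others, which is the point of the slide: a high stability score is only reassuring when the article is also long, frequently edited and important.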
Basic: The better the authors, the better the article quality
PeerReview: Assumes that a contributor reviews the content before modifying it, thereby approving the content that he/she does not edit
Models
ProbReview: Improved assumption: A contributor may not review the entire article before modifying it
The farther a word is from another that the author has written, the lower the probability that he/she has read it
In conflicts, the higher probability is considered
Probability is modeled as a monotonically decaying function of the distance between the words
Naïve: The longer the article is, the better its quality
Used as a baseline for comparison
Models
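The ProbReview idea can be illustrated with a short sketch. The slides do not define the decay schemes, so the exponential decay and the `scale` parameter below are assumptions picked only because they are monotonically decaying; the function names are hypothetical too.

```python
import math


def review_probability(distance, scale=10.0):
    """Probability that a contributor has read a word `distance` words
    away from a word they wrote. Exponential decay is an assumption."""
    return math.exp(-distance / scale)


def word_review_probabilities(n_words, authored_positions, scale=10.0):
    """For each word position, the probability that the contributor read it.

    `authored_positions` must be a non-empty list of the positions the
    contributor wrote. When several authored words give different
    probabilities for the same word (a conflict), the higher one is kept.
    """
    return [
        max(review_probability(abs(i - j), scale) for j in authored_positions)
        for i in range(n_words)
    ]
```

For example, `word_review_probabilities(5, [2])` gives probability 1.0 at position 2 and smaller values toward both ends of the text.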
1. Initialize all quality and authority values equally
2. For each iteration:
   Use authority values from the previous iteration to compute quality
   Use quality values to compute authority
   Normalize all quality and authority values
3. Repeat step 2 until convergence (alternatives: repeat until difference is very small or until maximum iterations have been reached)
Iterative computation
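The loop above might look like the following sketch. The update rules are an assumption in the spirit of the Basic model (an article's quality is the contribution-weighted sum of its authors' authority, and an author's authority is the contribution-weighted sum of the quality of their articles); the slides do not give the exact formulas. The `contrib` matrix, `tol` and `max_iter` are hypothetical names.

```python
def normalize(values):
    """Scale values so they sum to 1 (left unchanged if the sum is 0)."""
    total = sum(values)
    return [v / total for v in values] if total else list(values)


def iterate_quality_authority(contrib, max_iter=100, tol=1e-9):
    """contrib[a][u] = amount of article a written by author u (assumed input)."""
    n_articles, n_authors = len(contrib), len(contrib[0])
    # Step 1: initialize all quality and authority values equally.
    quality = [1.0 / n_articles] * n_articles
    authority = [1.0 / n_authors] * n_authors
    for _ in range(max_iter):
        # Step 2: authority from the previous iteration gives quality ...
        new_quality = normalize([
            sum(contrib[a][u] * authority[u] for u in range(n_authors))
            for a in range(n_articles)
        ])
        # ... and the new quality values give authority.
        new_authority = normalize([
            sum(contrib[a][u] * new_quality[a] for a in range(n_articles))
            for u in range(n_authors)
        ])
        change = max(
            max(abs(x - y) for x, y in zip(new_quality, quality)),
            max(abs(x - y) for x, y in zip(new_authority, authority)),
        )
        quality, authority = new_quality, new_authority
        # Step 3: stop when the values barely change (or at max_iter).
        if change < tol:
            break
    return quality, authority
```

Stopping on a small difference and capping the number of iterations correspond to the two alternatives listed in step 3.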
Use a set of articles on countries that have been assigned quality labels by Wikipedia’s Editorial team
Preprocessing: Bot revisions were removed from the analysis. Consecutive edits by the same user were collapsed, keeping only the final edit.
Evaluation
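The preprocessing step can be sketched as follows. The revision format (a chronological list of `(user, text)` pairs) and the set of bot usernames are assumptions made for illustration.

```python
def preprocess(revisions, bots):
    """Drop bot revisions, then collapse runs of edits by the same user.

    revisions: chronological list of (user, text) pairs (assumed format).
    bots: set of usernames known to be bots (assumed to be available).
    """
    kept = []
    for user, text in revisions:
        if user in bots:
            continue  # bot revisions are removed from the analysis
        if kept and kept[-1][0] == user:
            kept[-1] = (user, text)  # same user again: keep only the final edit
        else:
            kept.append((user, text))
    return kept
```

In this sketch, two edits by the same user that are separated only by a bot revision count as consecutive once the bot revision is dropped. Whether the original study treated that case the same way is not stated.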
Normalized discounted cumulative gain at top k (NDCG@k): Suited for ranked articles that have multiple levels of assessment
Spearman’s rank correlation: Relevant for comparing the agreement between two rankings of the same set of objects
Evaluation metrics
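Both metrics are short enough to sketch directly. NDCG has more than one common formulation; the version below uses the gain 2^rel - 1 with a log2 position discount, which is an assumption, since the slides do not say which variant was used. The Spearman formula shown is the simple one for rankings without ties.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, in ranked order."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)
        for i, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (best possible) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def spearman(rank_a, rank_b):
    """Spearman's rank correlation for two rankings of the same n >= 2
    objects, assuming no tied ranks."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Here `relevances` holds the quality label (as an integer level) of each article in the order the model ranked them, so a model that places the highest-labeled articles first scores an NDCG of 1.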
ProbReview works best with decay scheme 2 or 3.
Article length seems to be correlated with article quality
Adding article length to the Basic and PeerReview models showed some improvement, but ProbReview did not benefit
Conclusions
Revision trust model may help address:
Article trust
Fragment trust
Author trust
A dynamic Bayesian network is used to model the evolution of article trust over revisions
Wikipedia featured articles, clean-up articles and normal articles are used for evaluation
Summary
Uses revision history as well as the reputation of the contributing authors
Assigns trust to text
Summary
Propose the use of a trust tab in Wikipedia
Link-ratio: Ratio between the number of citations and the number of non-cited occurrences of the encyclopedia term
Evaluation: compare link ratio values for featured, normal and clean-up articles
Summary
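The link-ratio definition reduces to a one-line computation. The sketch below assumes the two counts have already been gathered for a term; the function name and the handling of a zero denominator are illustrative choices, not part of the proposal.

```python
def link_ratio(cited, non_cited):
    """Ratio of cited (linked) occurrences of a term to its non-cited ones.

    Returning infinity when every occurrence is cited, and 0.0 when the
    term never occurs, is an assumption made for this sketch.
    """
    if non_cited == 0:
        return float("inf") if cited > 0 else 0.0
    return cited / non_cited
```

For instance, a term cited 30 times and mentioned without a citation 10 times has a link-ratio of 3.0.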
Propose a content-driven reputation system for authors
Authors gain reputation when their work is preserved by subsequent authors and lose reputation when edits are undone or quickly rolled back
Evaluation: Edits by low-reputation authors have a larger-than-average probability of being of poor quality, as judged by human observers, and of being undone by later editors
Summary
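A toy version of the content-driven update is sketched below. The slides only state the direction of the change (up when work is preserved, down when it is undone), so the fixed `gain` and `penalty` amounts, the floor at zero and the function name are all assumptions.

```python
def update_reputation(reputation, author, preserved, gain=1.0, penalty=1.0):
    """Return a new reputation table after one of `author`'s edits is judged.

    preserved=True: later authors kept the edit, so reputation rises.
    preserved=False: the edit was undone or rolled back, so it falls.
    The step sizes and the floor at 0 are hypothetical.
    """
    updated = dict(reputation)
    current = updated.get(author, 0.0)
    if preserved:
        updated[author] = current + gain
    else:
        updated[author] = max(0.0, current - penalty)
    return updated
```

The reputation here depends only on what happens to an author's content in later revisions, not on ratings given by other users, which is what makes the system content-driven.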
A different question: What are the controversial articles?
Uses edit and collaboration history
Two models: Basic and Contributor Rank
The Contributor Rank model tries to differentiate between disputes due to the article and those due to the aggressiveness of the contributors, with the former being the one that is to be measured
Evaluation: Identification of labeled controversial articles
Summary
Interesting area to work on
Different angles to consider and different questions too
Data is easily available and has lots of relevant features
Articles classified by Wikipedia’s editorial team help with evaluation
Great scope for more work in this area
I want to look at this from the health perspective
Conclusions