Measuring Self-Focus Bias in Community-Maintained Knowledge Repositories

Brent Hecht and Darren Gergle
Northwestern University



Overview

1. Introduction

2. Study 1

3. Study 2

4. Discussion

5. Conclusion

Introduction

Sum of World Knowledge


• Artificial Intelligence
• Natural Language Processing
• Human-Computer Interaction
• CSCW


World knowledge according to whom?


• self-focus bias: the effect of community-held opinions and interests on the world knowledge in Wikipedia
• if it exists, it can have both positive and negative effects


Introduction: terms and concepts

[Figure: subset of the English Wikipedia Article Graph (WAG), showing the articles “Barack Obama”, “The United States”, and “Joe Biden”]

• “Barack Obama” has 2 inlinks, i.e., an indegree of 2
• indegree → what people are writing about
• indegree → relatedness to the sum of world knowledge in each Wikipedia
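The indegree statistic can be made concrete with a short Python sketch. The link list is hypothetical: it assumes that “The United States” and “Joe Biden” are the two articles linking to “Barack Obama”, and it assumes that repeated links from the same source article count once. The helper name `indegrees` is illustrative only.

```python
from collections import Counter

# Hypothetical toy subset of a Wikipedia Article Graph (WAG).
# Each (source, target) pair is a link from one article to another;
# these edges are assumptions for illustration, not real link data.
links = [
    ("The United States", "Barack Obama"),
    ("Joe Biden", "Barack Obama"),
]

def indegrees(edges):
    """Count inlinks per article; duplicate (source, target) links count once."""
    return Counter(target for _, target in set(edges))

deg = indegrees(links)
print(deg["Barack Obama"])  # 2
print(deg["Joe Biden"])     # 0 (Counter returns 0 for articles with no inlinks)
```

Under this reading, a greater indegree means more articles in that Wikipedia point to the topic, which is what the slides treat as "what people are writing about".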

Study 1 methods

definition of focus


• focus = indegree in the Wikipedia Article Graph (WAG)
• greater indegree = greater focus
• compare across 15 Wikipedias

[Figure: In the English Wikipedia, “Penn State University” has indegree = 3 (other articles shown: “Jonathan Frakes”, “Pennsylvania”, “Interstate 99”). In the French Wikipedia, “Université d'État de Pennsylvanie” has indegree = 1 (other articles shown: “Jonathan Frakes”, “Pennsylvania”).]
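Comparing focus across language editions requires knowing which articles in different Wikipedias describe the same concept. A minimal sketch, assuming a hypothetical interlanguage mapping (`concept_of`) and hypothetical link lists chosen only so that the indegrees match the Penn State example (3 in English, 1 in French). The concept id `PENN_STATE` and the function name are illustrative, not taken from the study.

```python
from collections import Counter

# Hypothetical link lists per language edition; chosen only to reproduce
# the indegrees in the Penn State example, not actual Wikipedia links.
wags = {
    "en": [
        ("Jonathan Frakes", "Penn State University"),
        ("Pennsylvania", "Penn State University"),
        ("Interstate 99", "Penn State University"),
    ],
    "fr": [
        ("Pennsylvanie", "Université d'État de Pennsylvanie"),
    ],
}

# Hypothetical interlanguage alignment: (language, title) -> shared concept id.
concept_of = {
    ("en", "Penn State University"): "PENN_STATE",
    ("fr", "Université d'État de Pennsylvanie"): "PENN_STATE",
}

def focus_by_language(wags, concept_of):
    """Return {concept: {language: indegree}} for every aligned article."""
    focus = {}
    for lang, edges in wags.items():
        # Indegree within this language edition only.
        indeg = Counter(target for _, target in set(edges))
        for (article_lang, title), concept in concept_of.items():
            if article_lang == lang:
                focus.setdefault(concept, {})[lang] = indeg[title]
    return focus

print(focus_by_language(wags, concept_of))
# {'PENN_STATE': {'en': 3, 'fr': 1}}
```

Keying results by a language-independent concept id is what makes the per-language indegrees directly comparable.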

Poutine!

[Image: http://commons.wikimedia.org/wiki/File:Poutine.JPG]

[Figure: English and French Wikipedia subgraphs containing the articles “Poutine”, “French Fries”, “Cheddar Cheese”, and “Chez Ashton”, shown with indegree = 0 in the English Wikipedia and indegree = 3 in the French Wikipedia]

Study 1 methods

sample and statistic

• sample = geographic articles
• statistic = spatial indegree sums

[Figure: geographic articles “Finland”, “Helsinki”, and “Rovaniemi”, shown together with the articles “Flying Finn Airline”, “Finno-Ugric Languages”, “Sub-arctic Climate”, and “Linus Torvalds”]

• Finland has an indegree sum = 4
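Assuming, as the Finland example suggests, that a country's spatial indegree sum is the total indegree of the geographic articles located in that country, the statistic can be sketched as below. The country mapping and the individual links are hypothetical; they were chosen only so that the total matches the example's sum of 4.

```python
from collections import Counter, defaultdict

# Hypothetical geotags: each geographic article mapped to the country it lies in.
country_of = {
    "Finland": "Finland",
    "Helsinki": "Finland",
    "Rovaniemi": "Finland",
}

# Hypothetical links into those geographic articles (not the slide's actual edges).
links = [
    ("Flying Finn Airline", "Helsinki"),
    ("Finno-Ugric Languages", "Finland"),
    ("Sub-arctic Climate", "Rovaniemi"),
    ("Linus Torvalds", "Helsinki"),
]

def spatial_indegree_sums(edges, country_of):
    """Sum the indegrees of all geographic articles located in each country."""
    indeg = Counter(target for _, target in set(edges))
    sums = defaultdict(int)
    for article, country in country_of.items():
        sums[country] += indeg[article]
    return dict(sums)

print(spatial_indegree_sums(links, country_of))  # {'Finland': 4}
```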

Study 1 null hypothesis

H0: Indegree sums will have roughly the same distribution in every Wikipedia

All Wikipedias agree on focus distribution

Self-focus bias does not exist

Study 1 self-focus hypothesis

H1: Each language’s Wikipedia will have higher indegree sums in countries where the language is prominent

Each Wikipedia will demonstrate greater focus on its language’s culture hearth

Self-focus bias exists

[Figure: Indegree Sums in the Russian Wikipedia]

[Figure: Indegree Sums in the English Wikipedia]

[Figure: Indegree Sums in the Polish Wikipedia]

Study 1 results

German Wikipedia

Country           Indegree Sum
Germany                718,668
United States          114,720
France                 110,554
Switzerland            103,387
Austria                 95,986
Italy                   93,116

Finnish Wikipedia

Country           Indegree Sum
Finland                 55,331
United States           25,664
Germany                 11,972
Russia                  10,076
United Kingdom           9,402
Italy                    7,948

Japanese Wikipedia

Country           Indegree Sum
Japan                  453,048
Italy                   70,922
United States           60,384
China                   37,208
Germany                 25,276
United Kingdom          20,690


English Wikipedia

Country           Indegree Sum   English prominent?
United States        1,366,261   Y (numerator)
United Kingdom         439,582   Y
France                 189,698   N (denominator)
Germany                151,503   N
Canada                 146,191   Y
Italy                  129,133   N

Countries are marked Y where English is prominent and N otherwise. The self-focus ratio (SFR) divides the indegree sum of the highest-ranked Y country (the numerator) by that of the highest-ranked N country (the denominator):

$$\mathrm{SFR}(W_{\text{English}}) = \frac{\text{USA}}{\text{France}} = \frac{1{,}366{,}261}{189{,}698} \approx 7.2$$
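The English ratio can be recomputed from the table. This sketch assumes the reading suggested by the Y/N marks: the numerator is the largest indegree sum among countries where the language is prominent, and the denominator is the largest among the remaining countries. How the study handles edge cases (for example, a ranking with no N country) is not covered here, and `self_focus_ratio` is a hypothetical helper.

```python
# Top six countries by indegree sum in the English Wikipedia, each tagged
# with whether English is prominent there (the Y/N marks in the table).
english_wp = [
    ("United States", 1_366_261, True),
    ("United Kingdom", 439_582, True),
    ("France", 189_698, False),
    ("Germany", 151_503, False),
    ("Canada", 146_191, True),
    ("Italy", 129_133, False),
]

def self_focus_ratio(ranked):
    """Largest indegree sum among 'prominent' countries divided by the
    largest among the other countries (assumed reading of the SFR)."""
    num = max(total for _, total, prominent in ranked if prominent)
    den = max(total for _, total, prominent in ranked if not prominent)
    return num / den

print(round(self_focus_ratio(english_wp), 1))  # 7.2
```

Taking the maximum within each group, rather than relying on list order, keeps the result correct even if the rows are not pre-sorted.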

Language    Self-focus Ratio
English     7.2
Japanese    6.4
German      6.3
French      4.2
Italian     3.6
Catalan     2.9
Spanish     2.4
Finnish     2.2
Polish      1.7
Norwegian   1.4
Chinese     1.2
Dutch       0.7
Swedish     0.6
Portuguese  0.3

Study 2 methods

sample and statistic

• sample = geographic articles
• statistic = spatial PageRank score sums (instead of Study 1's spatial indegree sums)
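Study 2 replaces indegree with PageRank and then sums scores by country in the same way. A minimal sketch, assuming standard PageRank computed by power iteration with a damping factor of 0.85 and with rank from articles that have no outlinks spread uniformly over all articles; the study's actual PageRank settings are not given here. The toy graph, the geotags, and the helper names are hypothetical.

```python
from collections import defaultdict

def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank over a directed link list.

    Articles with no outlinks ("dangling" nodes) spread their rank
    uniformly over all articles, so the scores always sum to 1.
    """
    edges = set(edges)  # duplicate links count once
    nodes = sorted({n for edge in edges for n in edge})
    out = defaultdict(list)
    for src, dst in edges:
        out[src].append(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if v not in out)
        base = (1 - damping) / n + damping * dangling / n
        new = {v: base for v in nodes}
        for src, dsts in out.items():
            share = damping * rank[src] / len(dsts)
            for dst in dsts:
                new[dst] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

def spatial_score_sums(scores, country_of):
    """Sum per-article scores over the geographic articles in each country."""
    sums = defaultdict(float)
    for article, country in country_of.items():
        sums[country] += scores.get(article, 0.0)
    return dict(sums)

# Hypothetical toy graph and geotags (assumptions, not study data).
links = [
    ("Flying Finn Airline", "Helsinki"),
    ("Linus Torvalds", "Helsinki"),
    ("Helsinki", "Finland"),
    ("Finland", "Helsinki"),
    ("Rovaniemi", "Finland"),
]
country_of = {"Finland": "Finland", "Helsinki": "Finland", "Rovaniemi": "Finland"}

scores = pagerank(links)
print(spatial_score_sums(scores, country_of))
```

Unlike indegree, PageRank weights each inlink by the importance of the article it comes from, so a link from a heavily linked article contributes more to a country's sum.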

Study 2 results

Language    Self-focus Ratio
Catalan     2.7
Finnish     1.7
Norwegian   0.5

Discussion: hyperlingual approach

• 15 Wikipedias (22)
• over 8 million articles
• over 270 million links
• the English Wikipedia is less than 1/4 of the data
• it was “easy” with WikAPIdia software

Discussion: hyperlingual approach

• general benefits
  • similarities → more robust findings
  • differences → cultural diversity
• mine cultural diversity
  • “culturally-aware applications”
  • very rare in the literature

Discussion: Africa

Conclusion: Cliffs Notes

1. self-focus is a systemic bias in Wikipedia
   • people reorient world knowledge around themselves
   • many implications for technologies
2. the hyperlingual approach proved very useful

[Figure: Indegree Sums in the English Wikipedia]

Acknowledgements

Nada Petrović
Colleagues at the Collabolab
NSF #0705901
Microsoft Research

Contact Info

[email protected]