Playing GWAP with strategies - using ESP as an example

Post on 13-Jan-2016


Wen-Yuan Zhu
CSIE, NTNU

Motivation

• “Games with a purpose” (GWAP) is an innovative concept in computer science

• Many GWAP systems have been created

• Research on enhancing GWAP systems is scarce

State of the art

• “Human Computation” represents a new paradigm of applications
– Some problems are solved by humans, not computers
• “Games with a purpose” (GWAP)
– created by Dr. Luis von Ahn (CMU)
– the most popular application

Our Contribution

• We propose an evaluation metric for GWAP systems

• We study the inner properties of the ESP game analytically

• We propose an “Optimal Puzzle Selection Algorithm” (OPSA)

Our Contribution (2)

• We implement a quasi ESP game, ESP Lite, to demonstrate our proposed algorithms

• We confirm GWAP systems are more efficient if they are designed and played with strategies

Outline

• Introduction
• Analysis
• Proposed algorithm
• Evaluation & Simulation
• Implementation
• Result
• Conclusion & Future work

Human Computation

• There are many things that humans can do easily but computers cannot yet do
– Speech recognition
– Natural language understanding
– Computer graphics

Games with a Purpose

• It combines computation with games
• People spend a lot of time playing games
• It makes Human Computation more efficient
• Many GWAP systems have been created (e.g. ESP game and Google Image Labeler)

What is the ESP game?

[Figure: two players, Alice and Bob, independently type labels for the same image (shoe, flower, rocks); the round ends with agreement on “flower”]
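The matching rule in the figure can be sketched in a few lines of Python. This is a minimal illustration, not the ESP game's actual implementation: the function name `first_agreement`, the turn-by-turn interleaving, and the assignment of guesses to each player are all assumptions made for the example.

```python
from itertools import zip_longest


def first_agreement(guesses_a, guesses_b):
    """Return the first label typed by both players, or None.

    Assumption: players alternate turns (A, then B); the real game
    lets both players type freely in real time.
    """
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        if a is not None:
            seen_a.add(a)
            if a in seen_b:  # A just typed something B already typed
                return a
        if b is not None:
            seen_b.add(b)
            if b in seen_a:  # B just typed something A already typed
                return b
    return None


# Hypothetical guess sequences loosely based on the figure.
alice = ["shoe", "flower"]
bob = ["flower", "rocks"]
print(first_agreement(alice, bob))
```

The agreed word becomes a label for the image; guesses that are never matched are discarded.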

What is the ESP game? (2)

• It is efficient
– 200,000+ players have contributed 50+ million labels
– each player plays for a total of 91 minutes
– 233 labels/human/hour (i.e. one label every 15 seconds)
• Google bought a license to create its own version of the game in 2006

Objectives

1. Design a metric to measure the performance of the ESP game

2. Design proper strategies to improve the ESP game

3. Validate the proposed strategies in real world systems

Observations

• The ESP game has two goals
1. Quantity: the system prefers to maximize the number of images which have been played
2. Quality: the system prefers to take as many labels as possible for each image
• There is a trade-off between the two goals

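System metric

$$G = \text{quantity} \times \text{quality}$$

$$G = \ln(N) \cdot \ln(S^*)$$

$$G = \ln\left(\frac{T}{r}\right) \cdot \ln(r\bar{S})$$

$$G = (\ln(T) - \ln(r))(\ln(r) + \ln(\bar{S}))$$

$$G = -(\ln(r))^2 + (\ln(T) - \ln(\bar{S}))\ln(r) + \ln(T)\ln(\bar{S})$$

$$G = -\left(\ln(r) - \frac{\ln(T) - \ln(\bar{S})}{2}\right)^2 + C$$

where

• $\bar{S} = \frac{1}{n}\sum_{i=1}^{n} S_i$
• $S^* = \frac{1}{N}\sum_{i} S_i$: average scores per labeled image
• $N$: # of labeled images
• $r$: # of labels per labeled image
• $T$: # of labels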

Modeling (2)

$$r = e^{\frac{\ln(T) - \ln(\bar{S})}{2}}$$

$$C = \left(\frac{\ln(T) + \ln(\bar{S})}{2}\right)^2$$
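As a quick numerical check of the model, the sketch below assumes the gain has the form G = ln(T/r) · ln(r · S̄), where T is the total number of labels, r the number of labels per labeled image, and S̄ the average score. Under that assumption the gain should peak at r = exp((ln T − ln S̄)/2) with maximum value C = ((ln T + ln S̄)/2)². The function names and the sample values of T and S̄ are illustrative only.

```python
import math


def system_gain(T, r, s_bar):
    """G = ln(T / r) * ln(r * s_bar): quantity term times quality term."""
    return math.log(T / r) * math.log(r * s_bar)


def optimal_r(T, s_bar):
    """r that maximizes G: exp((ln T - ln s_bar) / 2), i.e. sqrt(T / s_bar)."""
    return math.exp((math.log(T) - math.log(s_bar)) / 2)


def max_gain(T, s_bar):
    """C = ((ln T + ln s_bar) / 2) ** 2, the value of G at the optimal r."""
    return ((math.log(T) + math.log(s_bar)) / 2) ** 2


# Illustrative values, not taken from the experiments.
T, s_bar = 1_000_000, 100
r_star = optimal_r(T, s_bar)
print(r_star, system_gain(T, r_star, s_bar), max_gain(T, s_bar))
```

Evaluating `system_gain` at values of r on either side of `r_star` gives a smaller gain, which matches the trade-off between quantity and quality.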

Puzzle Selection Algorithms

• Optimal Puzzle Selection Algorithm (OPSA)
– select an image based on our analysis
• Random Puzzle Selection Algorithm (RPSA)
– select an image at random
• Fresh-first Puzzle Selection Algorithm (FPSA)
– select the image that has been played least frequently
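The two baseline strategies are simple enough to sketch directly. The snippet below is an illustration under assumptions: the `play_counts` dictionary (image id mapped to number of rounds played) and the function names are hypothetical, and tie-breaking in FPSA is not specified in the slides.

```python
import random


def rpsa(play_counts, rng=random):
    """Random Puzzle Selection: pick any image uniformly at random."""
    return rng.choice(list(play_counts))


def fpsa(play_counts):
    """Fresh-first Puzzle Selection: pick the image played least often.

    Ties go to the first such image in dictionary order (an assumption).
    """
    return min(play_counts, key=play_counts.get)


# Hypothetical state: image id -> rounds played so far.
counts = {"img_a": 3, "img_b": 0, "img_c": 1}
print(rpsa(counts), fpsa(counts))
```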

OPSA

• The idea is that there is an optimal number r of labels per labeled image in the system
• We group images into 3 sets
– $P_0$ contains all the images that have not been played
– $P_1$ contains all the images that have been played at least once, but fewer than r rounds
– $P_2 = P - P_0 - P_1$
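The grouping step can be sketched as follows. Only the partition into the three sets is shown; the rule OPSA uses to choose among the sets is not preserved in this transcript, so it is deliberately left out. The `play_counts` dictionary and the function name are hypothetical.

```python
def partition_images(play_counts, r):
    """Split images into the three OPSA sets given the target r.

    p0: never played
    p1: played at least once but fewer than r rounds
    p2: everything else (played r rounds or more)
    """
    p0 = {img for img, n in play_counts.items() if n == 0}
    p1 = {img for img, n in play_counts.items() if 0 < n < r}
    p2 = set(play_counts) - p0 - p1
    return p0, p1, p2


# Hypothetical state: image id -> rounds played so far.
counts = {"img_a": 0, "img_b": 2, "img_c": 5, "img_d": 7}
print(partition_images(counts, r=5))
```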

OPSA (2)

Simulation

• Setup
– There are 100,000 images
– The simulation is run 20 times
• Observation
– T vs. r → OPSA
– T vs. System gain → 3 strategies
• Discussion

T vs. r

$$r = e^{\frac{\ln(T) - \ln(E[S])}{2}}$$

T vs. System Gain

Discussion

• OPSA is superior to RPSA & FPSA in the simulation

• A systematic and thorough study to verify the proposed strategies in real systems is highly desirable

• To this end, we decided to implement the ESP Lite system

ESP Lite

• We implement ESP using Flash/Java
• It mimics the ESP game
• It implements three playing strategies
– RPSA, FPSA, OPSA

Flow chart

The strategy selection process gives priority to the strategy that has been used least, measured by the number of rounds played previously
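A minimal sketch of this balancing rule, assuming the server keeps a per-strategy count of rounds played. The dictionary layout, the function name, and the tie-breaking order are assumptions, not details from ESP Lite.

```python
def next_strategy(rounds_played):
    """Return the strategy with the fewest rounds played so far.

    Ties go to the first strategy in dictionary order (an assumption).
    """
    return min(rounds_played, key=rounds_played.get)


# Hypothetical counters for the three strategies.
rounds = {"RPSA": 0, "FPSA": 0, "OPSA": 0}
for _ in range(6):
    chosen = next_strategy(rounds)
    rounds[chosen] += 1  # pretend each game lasts exactly one round
print(rounds)
```

Over many games this keeps the number of rounds assigned to each strategy roughly equal, so the three strategies can be compared on a similar amount of play.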

Client interface

Client interface (2)

Score System

• The score system is used to measure the quality of the agreed words

• The quality of each agreed word should depend on its popularity
– high frequency → low quality → low score
– low frequency → high quality → high score

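Score System (2)

• $w_1, w_2, \ldots, w_n$: the n words in the score table
• $f_1 \ge f_2 \ge \cdots \ge f_n$: frequency of the word
• $L_i$: level of the word $w_i$

$$L_i = \left\lfloor k \cdot \frac{\sum_{j=0}^{i-1} f_j}{\sum_{j=0}^{n} f_j} \right\rfloor, \quad f_0 = 0$$

• score of $w_i$:

$$score(w_i) = L_i \times S_{offset} + S_{base}$$

• There are $k + 1$ score levels (e.g. 5 score levels when $k = 4$)
• Level $k$ is reserved for agreed words which are not in the score table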
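The scoring scheme can be sketched as below. This is one reading of the slide, under stated assumptions: words are ranked from most to least frequent, the level of a word is the floor of k times the cumulative frequency share of the words ranked before it, and words outside the table get level k. The function names, the integer frequencies, and the sample values are all hypothetical.

```python
def build_score_table(freqs, k, s_base, s_offset):
    """Map each word to a score; frequent words get low levels.

    freqs: word -> integer frequency (integers keep the floor exact).
    """
    ranked = sorted(freqs, key=freqs.get, reverse=True)
    total = sum(freqs.values())
    table = {}
    cumulative = 0  # sum of frequencies of all words ranked before this one
    for word in ranked:
        level = (k * cumulative) // total  # always in 0 .. k - 1
        table[word] = level * s_offset + s_base
        cumulative += freqs[word]
    return table


def score(word, table, k, s_base, s_offset):
    """Words missing from the table get the reserved top level k."""
    return table.get(word, k * s_offset + s_base)


# Hypothetical frequencies, with k = 4 (five score levels).
freqs = {"the": 50, "dog": 30, "shoe": 15, "zebra": 5}
table = build_score_table(freqs, k=4, s_base=60, s_offset=10)
print(table, score("quokka", table, k=4, s_base=60, s_offset=10))
```

Because the cumulative share counts only the words ranked before the current one, it is always below 1, so table words never reach level k and that level stays free for unlisted words.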

Score System (3)

• Apply the Porter Stemming Algorithm to remove common morphological and inflectional endings of English words
– Prevent words with the same root from receiving different scores (e.g. determinant and determine)
– Reduce the plural form to the singular form (e.g. experiments and experiment)
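To illustrate the plural-reduction case, the sketch below implements only step 1a of the Porter algorithm (the plural-suffix rules). It is not the full stemmer: merging words such as "determinant" and "determine" relies on later steps that are omitted here, and a real system would use a complete implementation.

```python
def porter_step_1a(word):
    """Porter step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (removed)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word       # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # experiments -> experiment
    return word


for w in ["experiments", "experiment", "caresses", "ponies"]:
    print(w, "->", porter_step_1a(w))
```

With this step alone, "experiments" and "experiment" already map to the same entry, so both forms receive the same score.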

Experiment Setting

• 100,000 images from ESP dataset
• ESP dataset
– 100,000 images
– Average 15 labels per image collected from the ESP game
• Score system
– $S_{base} = 60$, $S_{scale} = 10$, $k = 9$
– Range of score (10 levels): $\{60, 70, \ldots, 150\}$

Experiment Setting (2)

• Score system (cont.)
– The 5,000 most frequent words from the Brown Corpus (a general corpus in the field of corpus linguistics)
– Processed by the Porter Stemming Algorithm
– 3,476 words in the score table

Experiment

• http://nrl.iis.sinica.edu.tw/GWAP/ESPLite/
• From 2009/3/9 to 2009/4/9
• 3,103 games
• 9,376 labeled pictures
• 12,312 agreements

Score Statistics

Behavior of OPSA

• What happened in each round?

Behavior of OPSA (2)

• The operation of OPSA depends on r

• Observe players’ behavior

Behavior of OPSA (3)

Performance

Conclusion

• In this thesis
– We propose a metric to evaluate the performance of GWAP
– We play GWAP systems with strategies and make GWAP systems more efficient

Conclusion (2)

• This is the first GWAP study that implements and evaluates an analytical model on real-world GWAP systems

• Our experimental results confirm that GWAP systems are more efficient if they are designed and played with strategies

Future work

• Considering the time factor in the analysis
• Considering an adaptive r for each image
• Constructing a framework for GWAP systems
• Developing a GWAP portal for fast access to more GWAP systems
• Extending GWAP systems toward mobile environments

Thank You!