Anand Kulkarni and Björn Hartmann (University of California, Berkeley), Matthew Can (Stanford University)

Collaboratively Crowdsourcing Complex Work With Turkomatic

Turkomatic

Microtask marketplaces excel at simple, repetitive work.

Transcribe a business card.

Look up a fact online.

Much of the work we do in our daily lives is not simple or repetitive.

“Create algebra problems for my mathematics exam.”

“Write a research paper.”

“Create a small piece of software.”

“Arrange my trip to Seattle.”

“Write a blog about Mechanical Turk with a few good entries.”

How do we crowdsource complex work?

Complex work with crowds

Soylent: Editing word processing documents (Bernstein et al. ’10)

VizWiz: Answering queries about visual scenes (Bigham et al. ’10)

More complex applications: PlateMate [NHZG11], Adrenaline [BBMK11], CrowdForge [KSK11]…

Workflows: Crowd Algorithms

Divide complex tasks into a sequence of microtasks arranged in a workflow.

Soylent, Bernstein et al., UIST 2010

Workflow design is labor-intensive

1. Design individual HITs
2. Implement parallelism to make sure tasks are done correctly
3. Write software to launch HITs and parse worker results
4. Test workflow by running program
5. Identify errors
6. Iterate from step 1

Workflow design is labor-intensive

Difficult and domain-specific: Workflow design requires extensive up-front iteration and experimentation and is specific to a given task domain.

Inaccessible to non-experts: Few have the patience to implement this process in code.

Turkomatic is a system for crowdsourcing high-level complex and creative work where the crowd designs the workflow.

What is Turkomatic?

Create a new blog about Mechanical Turk with two posts.

Price-Divide-Solve (PDS)

How do we induce the crowd to design a workflow?

Price-Divide-Solve (PDS)

PDS is a divide-and-conquer algorithm to create workflows.

Price: Can this task be solved for 20 cents?
If yes: Solve the task and return the answer.
If no: Divide the task into multiple steps. For each step, recurse. Merge the steps into a solution.
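The recursion above can be sketched in a few lines of Python. This is only an illustration of the control flow, not Turkomatic's implementation: the `Crowd` class and its four callbacks are hypothetical stand-ins for posting HITs, and the `max_depth` guard is an added assumption that does not appear on the slides.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Crowd:
    """Hypothetical stand-in for the crowd; each callback would post a HIT."""
    can_solve: Callable[[str], bool]        # Price: solvable for 20 cents?
    solve: Callable[[str], str]             # Solve: produce an answer
    divide: Callable[[str], List[str]]      # Divide: split into steps
    merge: Callable[[str, List[str]], str]  # Merge: combine step answers


def price_divide_solve(task: str, crowd: Crowd,
                       depth: int = 0, max_depth: int = 5) -> str:
    # Price: if the crowd says the task is cheap enough, solve it directly.
    # The depth guard (an assumption of this sketch) forces a solve so the
    # recursion always terminates.
    if depth >= max_depth or crowd.can_solve(task):
        return crowd.solve(task)
    # Divide: ask the crowd for smaller steps, then recurse on each step.
    steps = crowd.divide(task)
    answers = [price_divide_solve(s, crowd, depth + 1, max_depth) for s in steps]
    # Merge: combine the solved steps into one solution.
    return crowd.merge(task, answers)


# A scripted "crowd" that replays the blog example from the slides.
demo = Crowd(
    can_solve=lambda t: "two posts" not in t,
    solve=lambda t: f"[done: {t}]",
    divide=lambda t: [
        "Create a new blog on Wordpress.com.",
        "Write one entry for a blog.",
        "Write a second entry for a blog.",
    ],
    merge=lambda t, answers: " + ".join(answers),
)
result = price_divide_solve(
    "Create a new blog about Mechanical Turk with two posts.", demo)
```

In the scripted run, only the top-level task is judged too large, so it is divided once and each of the three steps is solved directly before the merge.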

Price-Divide-Solve (PDS)

Redundancy is used at each step to ensure quality.

Price check: consensus on price, decided by majority.

Divide task: best subdivision, chosen by vote.

Solve task: best solution, chosen by vote.
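The two aggregation rules named on this slide, majority consensus for pricing and a vote for the best subdivision or solution, might look like the following. The function names, the strict-majority rule, and the tie-breaking choice are assumptions of this sketch; the slides do not specify them.

```python
from collections import Counter
from typing import Sequence


def majority_price(answers: Sequence[bool]) -> bool:
    """Price check: True only if a strict majority of workers answered 'yes'."""
    return sum(answers) * 2 > len(answers)


def best_by_vote(candidates: Sequence[str], votes: Sequence[int]) -> str:
    """Pick the candidate with the most votes.

    Each vote is an index into `candidates`. Ties go to the earliest
    candidate, which is an arbitrary choice made for this sketch.
    """
    counts = Counter(votes)
    winner = max(range(len(candidates)), key=lambda i: (counts[i], -i))
    return candidates[winner]


# Three workers price the task; two say it is solvable for 20 cents.
solvable = majority_price([True, False, True])

# Three workers vote among two proposed subdivisions.
chosen = best_by_vote(["two steps", "three steps"], [1, 1, 0])
```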

Price-Divide-Solve (PDS)

Create a new blog about Mechanical Turk with two posts.

Price: Can we solve it for 20 cents? No.

Price-Divide-Solve (PDS)

Divide: Divide it into two or more steps.

Create a new blog about Mechanical Turk with two posts.
• Create a new blog on Wordpress.com.
• Write one entry for a blog.
• Write a second entry for a blog.

Price-Divide-Solve (PDS)

Price: Can we solve it for 20 cents?

• Create a new blog on Wordpress.com. Yes.
• Write one entry for a blog. Yes.
• Write a second entry for a blog. Yes.

Price-Divide-Solve (PDS)

Solve

Create a new blog about Mechanical Turk with two posts.
• Create a new blog on Wordpress.com.
• Write one entry for a blog.
• Write a second entry for a blog.

“Welcome to my blog about Mechanical Turk! Here, I’ll be posting some of my favorite recipes for Mechanical Turk. You’ll be able to follow along at home and create delicious HITs. From the comfort of your own home! Stay tuned and i’ll show you some of the best strategies for keeping your Turk workers engaged.”

“You may be inclined to price your HITs at the lowest possible rate, but this isn’t always the best choice. Instead, you should base your pricing on:
-How long will the HIT take?
-Is the HIT similar to other HITs? If so, price it slightly less than theirs.
-If the HIT involves a lot of qualifications, you may want to price it higher, to attract more qualified workers.”

Price-Divide-Solve (PDS)

Merge: Combine the results of solved steps.

Create a new blog about Mechanical Turk with two posts.
• Create a new blog on Wordpress.com.
• Write one entry for a blog.
• Write a second entry for a blog.

mtworker.wordpress.com

Can this task be solved for 20 cents?
Yes / No
Write a blog about Mechanical Turk
Submit

Break down the following task.
Write a blog about Mechanical Turk
Step 1: Step 2:
Add Step / Submit

Solve the following task.
Create a new blank blog on Wordpress
Submit

Merge the following subtasks.
Write a blog about Mechanical Turk
Step 1: Step 2:
Submit
Workers previously divided this task into simpler steps and solved each step. Combine their work into a complete solution.
Write a blog post about Mechanical Turk. [answer: This post is…]
Create a blank blog about Mechanical Turk [answer: www...]
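The four worker-facing prompts can be pictured as plain text templates. The sketch below reuses the prompt wording from the slides; the function names, the `PRICE_CENTS` constant, and the way steps and answers are laid out are hypothetical and are not taken from Turkomatic's code.

```python
from typing import List, Tuple

PRICE_CENTS = 20  # price threshold quoted on the slides


def price_hit(task: str) -> str:
    """Prompt asking a worker whether the task fits the price."""
    return f"Can this task be solved for {PRICE_CENTS} cents? (Yes / No)\n{task}"


def divide_hit(task: str, min_steps: int = 2) -> str:
    """Prompt asking a worker to break the task into steps."""
    lines = ["Break down the following task.", task]
    lines += [f"Step {i}:" for i in range(1, min_steps + 1)]
    return "\n".join(lines)


def solve_hit(task: str) -> str:
    """Prompt asking a worker to solve the task directly."""
    return f"Solve the following task.\n{task}"


def merge_hit(task: str, solved: List[Tuple[str, str]]) -> str:
    """Prompt asking a worker to combine solved steps into one solution."""
    lines = [
        "Merge the following subtasks.",
        task,
        "Workers previously divided this task into simpler steps and solved "
        "each step. Combine their work into a complete solution.",
    ]
    lines += [f"{step} [answer: {answer}]" for step, answer in solved]
    return "\n".join(lines)


prompt = divide_hit("Write a blog about Mechanical Turk")
```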

Price-Divide-Solve (PDS)

PDS guides the crowd to design workflows in a particular way.

It can attempt to create a workflow for any task, but it can’t produce all workflows.

Write a sentence.

Improve the previous worker’s answer.

Check that the previous answer was improved.

System Recap

[Figure: system overview, with labels Price, Solve, Divide, Requester Interface, System Output, Algorithm, and Worker Interface.]

Experiment 1: Can the crowd plan and execute workflows using PDS?

Over 150 trials, including:
• Java programming
• Booking restaurants
• Sorting and cleaning data
• Blogging
• Creating self-portraits
• Solving an SAT
• Logo design
• Travel planning
• Writing essays
• Web research
…

Experiment 1: Success Modes

Write a 3-paragraph essay about whether it’s ever OK to lie.
• Write one paragraph arguing it’s OK to lie sometimes.
• Write one paragraph suggesting it’s never OK to lie.
• Write a conclusion reconciling the two.
    – Write one sentence to open the conclusion.
    – Write 2-3 sentences in the middle of the conclusion.
    – Write a concluding sentence.

Experiment 1: Success Modes

Data:
• 6 subnodes were produced
• 44 separate worker judgments were used
• Task completed with a full essay

Experiment 1: Success Modes

“…although many people believe it is always essential to tell the truth, sometimes it may be better to lie. There is credibility in both views. And like many ethical decisions, sometimes the circumstances dictate. When you tell the truth you develop a stronger bond of trust with those around you. A relationship can not exist without trust. If you lie, you end up telling more lies to cover the first….”

Experiment 1: Failure Modes

We found two ways the algorithm could fail:
- Failing to terminate at all
- Completing, but producing wrong answers

Experiment 1: Failing to terminate

Plan a trip from New York to S.F. that visits 5 interesting places.

Think about where to go next in Ohio.

Think about where to go next in Ohio.

Experiment 1: Wrong answers

List the department chairs of the top 20 US programs in CS.

aalto armchair, poang lounge chair, adirondack chair, aeron chair, balans chair, ball chair, …

Why does the crowd lose context?

Turkomatic worker: “…I’ve taken a look at your instructions, and I understand them perfectly. However, this task seems to have been inadvertently sabotaged by other turkers who do not understand what you are asking them to do…”

Long workflows involve increasing chains of trust.

Each individual worker has a ~30% probability of failure [Chi/Kittur/Suh ’08, Bernstein et al ’10].

Weakest link problem: If one worker early in the workflow design process makes mistakes, the subsequent decompositions will fail.
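A back-of-the-envelope calculation shows why long chains are fragile. The sketch below assumes that each step depends on a single worker, that workers fail independently, and that no redundancy or voting is applied; the slides state only the ~30% per-worker figure, so this model is an illustration rather than a result from the talk.

```python
def chain_success_probability(steps: int, p_fail: float = 0.3) -> float:
    """Probability that every one of `steps` sequential workers succeeds,
    assuming independent failures with probability `p_fail` each."""
    return (1.0 - p_fail) ** steps


# Success probability for chains of increasing length.
table = {n: chain_success_probability(n) for n in (1, 3, 5, 10)}
for n, p in table.items():
    print(f"{n:>2} steps: {p:.3f}")
```

Under these assumptions a three-step chain already succeeds only about a third of the time (0.7 × 0.7 × 0.7 = 0.343), which is one way to read the weakest-link problem.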

Including context doesn’t suffice

One explanation

What if we used more competent workers?

Experiment 2: Can expert workers make Turkomatic work?

Setup:

We recruited five graduate students with experience as requesters on Mechanical Turk.

We ran the PDS algorithm on three complex tasks with this crowd: online research, essay writing, and creating a blog.

Results:

Each of the three tested tasks completed correctly when we used only expert workers!

Conclusion:

PDS works well with qualified crowds.

How can we successfully run PDS with unskilled workers?

Experiment 3: Can requester management help the crowd?

Workflow visualizer: Monitor the workflow in real time.

Interactive task editor: Selectively invalidate parts of a workflow.

Workflow seeding: Run previously designed parts of workflows in the crowd.

Task Graphs (Requester)

Task Graph Nodes

[Figure: a task graph node, with labels Task Prompt, Status (completed, queued, in progress), and Submitted Answer.]

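Task Graph Edges: Parallel

[Figure: a Parent Task (Split) connected by parallel edges to Sub Task 1 (Solve) and Sub Task 2 (Decide).]

Task Graph Edges: Sequential

[Figure: a Parent Task (Split) connected by sequential edges to Sub Task 1 (Solve) and Sub Task 2 (Decide).]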
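One way to picture the difference between parallel and sequential edges is a small tree structure in which a parent records how its subtasks are ordered. The `TaskNode` class and `ready_tasks` function below are hypothetical. The scheduling rule, that a sequential subtask waits for its earlier siblings to complete, is an assumption of this sketch and is not spelled out on the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TaskNode:
    prompt: str
    status: str = "queued"        # "queued", "in progress", or "completed"
    answer: Optional[str] = None  # submitted answer, if any
    sequential: bool = False      # True: children must run in order
    children: List["TaskNode"] = field(default_factory=list)


def ready_tasks(node: TaskNode) -> List[TaskNode]:
    """Return the queued leaf tasks that could be handed to workers now."""
    if not node.children:
        return [node] if node.status == "queued" else []
    ready: List[TaskNode] = []
    for child in node.children:
        ready.extend(ready_tasks(child))
        # In a sequential decomposition, later siblings wait for this one.
        if node.sequential and child.status != "completed":
            break
    return ready


parallel = TaskNode("Parent Task", children=[
    TaskNode("Sub Task 1"), TaskNode("Sub Task 2")])
serial = TaskNode("Parent Task", sequential=True, children=[
    TaskNode("Sub Task 1"), TaskNode("Sub Task 2")])

parallel_ready = [t.prompt for t in ready_tasks(parallel)]
serial_ready = [t.prompt for t in ready_tasks(serial)]
```

With parallel edges both subtasks are available at once; with sequential edges only the first is available until it completes.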

Task Graph Example

[Figure: task graph for “Write an essay” (Split), with subtasks “Write an outline” (Solve; answer: “1. Thesis: …”) and “Expand the outline” (Decide).]

Task Graph Editing

[Figure: task graph for “Write a 3-paragraph essay…” with nodes “Think about the topic…”, “Collect information about…”, “Write the paragraphs…”, “Pick one of the topics”, and “List possible topics” (answer: “1. The word…”). An “Edit Task Details” dialog for the task “Think about the topic you want to write about” (status: Split) offers Edit Task, Edit Solution, Edit Subtask, and Delete Node. A second build of the graph adds the node “List three main topics…” (Solve).]

Recomputing Task Graphs

• Delete the subtree of the edited task
• Recursively:
    – Delete stale solutions in parent tasks
    – Delete stale solutions in subsequent sibling tasks (for serial decompositions)
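The recomputation rules above can be sketched as a walk up the tree from the edited node. The `Node` class and `invalidate` function are hypothetical names. Keeping the decomposition of later siblings and clearing only their answers is one interpretation of "stale solutions", not a documented behaviour of Turkomatic.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    prompt: str
    answer: Optional[str] = None
    sequential: bool = False  # True for a serial decomposition
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def clear_answers(node: Node) -> None:
    """Drop the answers of a node and all of its descendants."""
    node.answer = None
    for child in node.children:
        clear_answers(child)


def invalidate(edited: Node, new_prompt: str) -> None:
    """Apply a requester edit and discard everything that depended on it."""
    edited.prompt = new_prompt
    edited.answer = None
    edited.children.clear()  # delete the subtree of the edited task
    node = edited
    while node.parent is not None:
        parent = node.parent
        parent.answer = None  # the parent's merged solution is now stale
        if parent.sequential:
            # Later siblings in a serial decomposition built on this node.
            idx = parent.children.index(node)
            for sibling in parent.children[idx + 1:]:
                clear_answers(sibling)
        node = parent


essay = Node("Write a 3-paragraph essay", answer="full essay", sequential=True)
topic = essay.add(Node("Think about the topic", answer="lying"))
info = essay.add(Node("Collect information", answer="notes"))
paras = essay.add(Node("Write the paragraphs", answer="three paragraphs"))

invalidate(info, "Collect information about the topic")
```

After the edit, the earlier sibling keeps its answer, while the edited task, the later sibling, and the root all need to be recomputed.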

Seeding workflows

We mitigate poor worker performance by starting with partial workflows.

Run Workflow with Crowd

Experiment 3: Collaboration

Setup: We ran the PDS algorithm using Turkers on three sets of tasks, actively monitoring the workflow but intervening only to eliminate errors.

Outcomes:

Each of the three tested tasks completed correctly with 1 to 4 requester interventions.

Experiment 3: Collaboration

[Figure sequence: an essay with slots for Paragraph 1, Paragraph 2, and Paragraph 3 fills in over successive builds. Paragraph 1: “Crowdsourcing is a term…”. A second paragraph, “Chaordix crowd consulting is…”, appears and is gone in the next build; “Crowdsourcing works best on tasks where…” then takes its place. Paragraph 3: “One of the best known crowdsourcing platforms…”.]

Conclusion

We presented Turkomatic, a system that lets requesters harness the crowd to design complex workflows.

Our first experiment showed that letting the crowd design its own tasks can produce both successful and unsuccessful workflows.

Our second experiment showed that expert workers could successfully design workflows using PDS.

Last, we showed that an interactive, real-time interface for visualizing and selectively editing worker interfaces could produce viable workflows.

One finding of note

In Turkomatic, highly motivated workers had no way to correct others’ errors. Excessive structure in workflow design prevents the emergence of leaders. To scale, we may consider giving editing abilities to more capable workers.

Contributions

A simplified interface for crowdsourcing that lowers the threshold for crowdsourcing complex tasks

A new algorithm, techniques, and interfaces enabling the crowd to decompose complex tasks

A new interface for letting requesters edit, visualize, and seed workflows