Some Scaling Laws for MOOC Assessments - Aspiring Minds

Some Scaling Laws for

MOOC Assessments

Nihar B. Shah

Joint work with: J. Bradley, S. Balakrishnan,

A. Parekh, K. Ramchandran, M. J. Wainwright

MOOCs

Information Dissemination Scales Well


Assessment & Feedback Not Easy

Auto-Grading

What is the name of this workshop?

○ Assess
○ Recess
○ Digress
○ Mattress

Restricted Applicability

Need human participation for subjective topics

Peer-Grading


Peer-Grading

Each submission receives several peer grades (e.g., A+, B-, C+, B+, B-), which are aggregated into a single grade.

Potential to scale: Number of graders scales automatically with the number of students!

Coursera HCI 1

Peer grades (e.g., A+, B-, C+, B+, B-) are aggregated by taking the median.
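The median aggregation above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical 12-point letter-grade scale; the mapping itself is not from the talk:

```python
# Hypothetical grade scale; the talk does not specify a numeric mapping.
GRADE_TO_SCORE = {"A+": 12, "A": 11, "A-": 10, "B+": 9, "B": 8, "B-": 7,
                  "C+": 6, "C": 5, "C-": 4, "D+": 3, "D": 2, "F": 1}
SCORE_TO_GRADE = {v: k for k, v in GRADE_TO_SCORE.items()}

def aggregate_median(peer_grades):
    """Aggregate letter grades by taking the (upper) median score."""
    scores = sorted(GRADE_TO_SCORE[g] for g in peer_grades)
    return SCORE_TO_GRADE[scores[len(scores) // 2]]

print(aggregate_median(["A+", "B-", "C+", "B+", "B-"]))  # B-
```

The upper median is used so that the aggregate is always a valid letter grade, even for an even number of graders.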

Many Errors Observed


Other Aggregation Algorithms

No Guarantees

Piech et al. '13, Gutierrez et al. '14, Walsh '14, Díez et al. '13

Scalable Peer-grading?

Which peer-grading algorithms can guarantee that the expected fraction of students misgraded goes to zero (as the class size becomes large)?

None – no aggregation algorithm can give such a guarantee.

(when peer-grading is used as a standalone)

Impossibility Result

THEOREM

Let d = number of students.

If the average grading ability of students is invariant to d, then the expected fraction of students misgraded under any peer-grading algorithm is lower bounded by a constant c > 0 (independent of d).

• The constant c depends on the ability of the graders

• The result holds even if the instructor grades a constant fraction of submissions

Impossibility Result

Let d = number of students.

Intuition:

• Due to noisy graders, there are many errors when d is small

• When d is large, we want to use the largeness of the system to combat noise

• Although the number of graders increases with d, the number of submissions to be graded also increases proportionally

• For any individual student, there is no "improvement" in the peer-grading system as d increases
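The intuition above can be seen in a toy simulation (this is an illustration, not the paper's model or proof): give each submission a binary true grade, have a fixed number k of noisy peers vote on it, and aggregate by majority. Because graders and submissions grow proportionally, k cannot grow with d, so the misgrading fraction hovers around a constant no matter how large d gets:

```python
# Toy model: binary grades, k graders per submission, each correct
# with probability p, majority-vote aggregation. All parameters are
# illustrative assumptions, not from the talk.
import random

def misgraded_fraction(d, k, p, seed=0):
    rng = random.Random(seed)
    errors = 0
    for _ in range(d):
        true_grade = rng.randint(0, 1)
        votes = [true_grade if rng.random() < p else 1 - true_grade
                 for _ in range(k)]
        majority = int(sum(votes) * 2 > k)  # k odd, so no ties
        errors += (majority != true_grade)
    return errors / d

# The misgrading fraction stays near the same constant (~0.10 here)
# as d grows from 100 to 10,000.
for d in (100, 1000, 10000):
    print(d, round(misgraded_fraction(d, k=3, p=0.8), 3))
```

With k = 3 and p = 0.8, the per-submission error is 3(0.2)²(0.8) + (0.2)³ ≈ 0.104, independent of d, matching the lower-bound intuition.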

How to Make Peer-grading Scalable?

Dimensionality reduction! (Clustering)

And then peer-grade.

Cluster Submissions… Then Peer-grade

Theoretical Guarantee

THEOREM

If the d submissions can be clustered into at most d/log(d) clusters with at most o(d) errors, then the expected fraction of students misgraded goes to zero as d gets large.

Intuition:

• Each submission is graded by log(d) students, which grows as d increases.

• Since d is large, the aggregate is reliable even if the graders are extremely noisy.
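A toy check of this intuition (again illustrative, not the paper's analysis): once there are only d/log(d) clusters, each item can afford about log(d) graders, and a majority vote over a growing number of noisy votes has per-item error that shrinks as d grows:

```python
# Toy model: binary grades, ~log2(d) graders per item, each correct
# with probability p. Parameters are illustrative assumptions.
import math
import random

def misgraded_fraction(d, p, seed=0):
    rng = random.Random(seed)
    k = math.ceil(math.log2(d)) | 1  # ~log(d) graders, forced odd
    errors = 0
    for _ in range(d):
        correct_votes = sum(rng.random() < p for _ in range(k))
        errors += (correct_votes * 2 < k)  # majority wrong
    return errors / d

# Unlike the fixed-k case, the misgrading fraction now shrinks
# as d (and hence the number of graders per item) grows.
for d in (100, 1000, 100000):
    print(d, round(misgraded_fraction(d, p=0.7), 4))
```

This is exactly the contrast with the impossibility result: the clustering step frees the grading budget per item to grow with d, which is what drives the error to zero.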

Clustering: In Practice…

Active topic of research:

"Powergrading" - Basu et al. '13

Brooks et al. '14

"ACES" - Rogers et al. '14

Essay grading - Larkey '98

"Codewebs" - Nguyen et al. '14

"OverCode" - Glassman et al. '14

Clustering: In Theory…

THEOREM

Suppose there are d/log(d) or fewer underlying clusters, and suppose there is an algorithm that, given two submissions, answers "Do they belong to the same cluster?" (Yes/No) correctly with probability ≥ ½ + δ (for some δ > 0). Then the expected fraction of students misgraded goes to zero as the number of students becomes large.

Takeaway: It suffices to design a same-cluster comparator that is just better than random.
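Why is a barely-better-than-random comparator enough? One standard way to see it (a sketch of the idea; the paper's actual algorithm may differ): repeat the noisy same-cluster query O(log d / δ²) times and take a majority vote. By a Chernoff bound, the per-pair failure probability then drops polynomially in d:

```python
# Sketch: boosting a (1/2 + delta)-correct comparator by repetition.
# All names and parameters here are illustrative assumptions.
import math
import random

def noisy_comparator(same, delta, rng):
    """Answers a same-cluster query correctly with prob. 1/2 + delta."""
    return same if rng.random() < 0.5 + delta else not same

def boosted_comparator(same, delta, d, rng):
    """Majority vote over ~log(d)/delta^2 noisy queries (forced odd)."""
    t = int(math.ceil(math.log(max(d, 2)) / delta ** 2)) | 1
    yes_votes = sum(noisy_comparator(same, delta, rng) for _ in range(t))
    return yes_votes * 2 > t

rng = random.Random(0)
d, delta, trials = 1000, 0.1, 200
correct = sum(boosted_comparator(True, delta, d, rng) for _ in range(trials))
print(correct / trials)  # almost always 1.0: repetition washes out the noise
```

Since the boosted comparator is wrong on any fixed pair with probability polynomially small in d, a union bound over the pairs needed to recover the d/log(d) clusters still leaves only an o(1) fraction of students misgraded.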

Summary: Peer-grading in MOOCs

• Most literature is empirical; we take a statistical approach

• Takeaways:

1) Peer-grading as a standalone does not scale

2) Dimensionality reduction + peer-grading can scale

3) Any better-than-random comparator suffices