
Machine Teaching

An Inverse Problem to Machine Learning and an Approach Toward Optimal Education

Jerry Zhu
Department of Computer Sciences
University of Wisconsin-Madison

AAAI 2015

Zhu (Wisconsin) Machine Teaching AAAI 2015 1 / 12


Example One

Steve the student runs a linear SVM:

Given a training set with $n$ items $x_i \in \mathbb{R}^d$, $y_i \in \{-1, 1\}$

Steve learns $w \in \mathbb{R}^d$

Tina the teacher wants Steve to learn a target $w^*$

[Figure: the target decision boundary $x^\top w^* = 0$]

What is the smallest training set Tina can give Steve?


Example One

Tina’s non-iid training set with n = 2 items
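To see why two items can suffice, here is a minimal sketch. It assumes Steve runs a hard-margin linear SVM with a bias term (the slides do not say which variant), for which the two-item solution has a closed form. The helper names `svm_two_points` and `teach_svm` are hypothetical, and the construction is one possibility, not necessarily the one pictured on the slide.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def svm_two_points(x_pos, x_neg):
    """Hard-margin SVM trained on one positive and one negative item.

    Minimizes ||w||^2 / 2 subject to w.x_pos + b >= 1 and w.x_neg + b <= -1.
    With only two items both constraints are tight, so w is parallel to
    x_pos - x_neg and the solution is available in closed form.
    """
    diff = [p - q for p, q in zip(x_pos, x_neg)]
    scale = 2.0 / dot(diff, diff)
    w = [scale * c for c in diff]
    b = 1.0 - dot(w, x_pos)
    return w, b

def teach_svm(w_star):
    """Tina's two-item training set: mirror images placed on the margins of w_star."""
    norm_sq = dot(w_star, w_star)
    x_pos = [c / norm_sq for c in w_star]
    x_neg = [-c for c in x_pos]
    return [(x_pos, 1), (x_neg, -1)]

# Hypothetical target chosen for illustration.
w_star = [3.0, -4.0]
(x_pos, _), (x_neg, _) = teach_svm(w_star)
w, b = svm_two_points(x_pos, x_neg)
print(w, b)  # w matches w_star and b is zero, up to floating-point rounding
```

The two items are chosen deliberately rather than sampled, which is what makes the training set non-iid.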


Example Two

Steve estimates a Gaussian density:

Given $x_1, \ldots, x_n \in \mathbb{R}^d$

Steve learns $\mu = \frac{1}{n} \sum_{i=1}^{n} x_i$, $\Sigma = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^\top$

Tina wants Steve to learn a target Gaussian with $(\mu^*, \Sigma^*)$


Example Two

Tina’s minimal training set of $n = d + 1$ tetrahedron vertices
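One way to build such a set is sketched below. The sketch assumes a specific construction that the slides do not spell out: take the $d + 1$ vertices of a regular simplex centered at the origin (obtained here from a Helmert-style orthonormal basis), scale them so their sample covariance is the identity, then map them through a Cholesky factor of $\Sigma^*$ and shift by $\mu^*$. All function names are hypothetical.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def simplex_directions(d):
    """d + 1 vectors in R^d that sum to zero and whose outer products sum to d * I."""
    rows = []
    for k in range(1, d + 1):
        norm = math.sqrt(k * (k + 1))
        rows.append([1.0 / norm] * k + [-k / norm] + [0.0] * (d - k))
    # Vertex i is column i of the d x (d + 1) matrix of rows, scaled by sqrt(d).
    return [[math.sqrt(d) * rows[k][i] for k in range(d)] for i in range(d + 1)]

def teach_gaussian(mu_star, sigma_star):
    """Training set of d + 1 points with sample mean mu_star and covariance sigma_star."""
    d = len(mu_star)
    low = cholesky(sigma_star)
    return [[mu_star[r] + sum(low[r][c] * z[c] for c in range(d)) for r in range(d)]
            for z in simplex_directions(d)]

def steve_estimate(xs):
    """Steve's estimator: sample mean and covariance with the 1 / (n - 1) factor."""
    n, d = len(xs), len(xs[0])
    mu = [sum(x[r] for x in xs) / n for r in range(d)]
    cov = [[sum((x[r] - mu[r]) * (x[c] - mu[c]) for x in xs) / (n - 1)
            for c in range(d)] for r in range(d)]
    return mu, cov

# Hypothetical target chosen for illustration.
mu_star = [1.0, -2.0, 0.5]
sigma_star = [[2.0, 0.5, 0.0], [0.5, 1.0, 0.2], [0.0, 0.2, 1.5]]
lesson = teach_gaussian(mu_star, sigma_star)
print(len(lesson), steve_estimate(lesson))
```

Fewer than $d + 1$ points cannot work for a full-rank $\Sigma^*$, because the sample covariance of $n$ points has rank at most $n - 1$.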


Machine Teaching More Powerful Than Active Learning

[Figure: three panels for learning a threshold $\theta$. Passive learning "waits", with error $O(1/n)$; active learning "explores", with error $O(1/2^n)$; teaching "guides".]

Sample complexity to achieve $\epsilon$ error

passive learning: $1/\epsilon$

active learning: $\log(1/\epsilon)$

machine teaching: 2 (Tina knows $\theta$)
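The three regimes can be compared with a small simulation. This is a sketch under assumptions not stated on the slide: the task is a threshold on $[0, 1]$ with noiseless labels, the learner returns the midpoint of the gap between the closest negative and positive items, and all function names are hypothetical.

```python
import math
import random

def label(x, theta):
    return 1 if x >= theta else -1

def midpoint_learner(data):
    """Estimate the threshold as the midpoint between the closest -1 and +1 items."""
    lo = max((x for x, y in data if y == -1), default=0.0)
    hi = min((x for x, y in data if y == 1), default=1.0)
    return (lo + hi) / 2

def passive(theta, n, rng):
    """Passive learning: n items drawn iid uniformly from [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    return midpoint_learner([(x, label(x, theta)) for x in xs])

def active(theta, n):
    """Active learning: binary search, each query halves the remaining interval."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = (lo + hi) / 2
        if label(mid, theta) == 1:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

def teach(theta, eps):
    """Teaching: two items that bracket theta (assumes theta +/- eps lies in [0, 1])."""
    return [(theta - eps, -1), (theta + eps, 1)]

theta, eps = 0.3, 0.01
rng = random.Random(0)
n_active = math.ceil(math.log2(1 / eps))
print("passive, n=100:", abs(passive(theta, 100, rng) - theta))
print("active, n=%d:" % n_active, abs(active(theta, n_active) - theta))
print("teaching, n=2:", abs(midpoint_learner(teach(theta, eps)) - theta))
```

Any threshold consistent with Tina's two items lies within $\epsilon$ of $\theta$, so the guarantee does not depend on the midpoint rule.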


Machine Teaching

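[Figure: the learning algorithm $A$ maps a training set $D$ in the space of training sets $\mathbb{D}$ to a model $A(D)$ in the model space $\Theta$; the inverse $A^{-1}$ maps the target $\theta^*$ back to the set $A^{-1}(\theta^*)$ of training sets.]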

Tina wants Steve to learn a target model $\theta^*$

  - not machine learning: Tina already knows $\theta^*$

Tina knows Steve’s learning algorithm $A$

Tina seeks the best training set within $A^{-1}(\theta^*)$ for Steve

  - best = the smallest (Teaching Dimension [Goldman Kearns 1995]), or other criteria
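For a finite hypothesis class, the smallest member of $A^{-1}(\theta^*)$ can be found by exhaustive search. The sketch below assumes Steve is a version-space learner (he keeps every hypothesis consistent with the data) and that Tina picks items from a finite pool; both are illustrative assumptions, and the function names are hypothetical.

```python
from itertools import combinations

def consistent(hypotheses, data):
    """Version-space learner: the hypotheses that agree with every labeled item."""
    return [h for h in hypotheses if all(h(x) == y for x, y in data)]

def smallest_teaching_set(hypotheses, target, pool):
    """Smallest labeled subset of the pool that leaves only the target consistent."""
    labeled = [(x, target(x)) for x in pool]
    for size in range(len(labeled) + 1):
        for data in combinations(labeled, size):
            if consistent(hypotheses, data) == [target]:
                return list(data)
    return None

def make_threshold(t):
    return lambda x: 1 if x >= t else -1

pool = list(range(6))
hypotheses = [make_threshold(t) for t in range(7)]  # thresholds 0..6
target = hypotheses[3]
print(smallest_teaching_set(hypotheses, target, pool))  # [(2, -1), (3, 1)]
```

Two items that straddle the threshold rule out every other hypothesis, so the size of the returned set is the teaching dimension of this target within this class.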


Machine Teaching Harder Than Machine Learning

Special case: teaching the exact parameters, minimizing training set size

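$$
\begin{aligned}
\min_{D \in \mathbb{D}} \quad & |D| && \text{(Tina's problem)} \\
\text{s.t.} \quad & \theta^* = \arg\min_{\theta \in \Theta} \frac{1}{|D|} \sum_{z_i \in D} \ell(z_i, \theta) + \Omega(\theta) && \text{(Steve's algorithm } A\text{)}
\end{aligned}
$$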

Bilevel optimization, NP-hard in general
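The two levels can be made concrete with a toy instance. This sketch assumes Steve runs one-dimensional ridge regression without an intercept, so the inner argmin has a closed form, and that Tina searches subsets of a small pool by brute force. The pool, the regularization weight, and the function names are hypothetical.

```python
from itertools import combinations

def steve(data, lam=0.1):
    """Inner problem: argmin over theta of mean (y - theta * x)^2 + lam * theta^2."""
    n = len(data)
    sxy = sum(x * y for x, y in data) / n
    sxx = sum(x * x for x, _ in data) / n
    return sxy / (sxx + lam)

def tina(pool, theta_star, lam=0.1, tol=1e-9):
    """Outer problem: smallest subset of the pool that makes Steve learn theta_star."""
    for size in range(1, len(pool) + 1):
        for data in combinations(pool, size):
            if abs(steve(data, lam) - theta_star) <= tol:
                return list(data)
    return None

# Hypothetical pool of (x, y) items.
pool = [(1.0, 2.0), (2.0, 4.0), (1.0, 2.4), (0.5, 1.0)]
lesson = tina(pool, theta_star=2.0)
print(lesson, steve(lesson))
```

The search visits up to $2^{|\text{pool}|}$ subsets, and each candidate requires solving Steve's problem, which gives a feel for why the general case is hard.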


General Machine Teaching Framework

[poster 819 Wednesday]

$$
\begin{aligned}
\min_{D \in \mathbb{D},\, \theta \in \Theta} \quad & R_T(\theta) + \lambda E_T(D) \\
\text{s.t.} \quad & \theta = A(D)
\end{aligned}
$$

$R_T(\cdot)$: teaching risk function, e.g. $\|\theta - \theta^*\|_2^2$

$E_T(\cdot)$: teaching effort function, e.g. different item costs

Tina’s search space $\mathbb{D}$: constructive or pool-based, batch or sequential

Tractable solutions when Steve runs linear regression, logistic regression, SVM, LDA, etc. [Mei Z 2015a, Mei Z 2015b]
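A small pool-based, batch instance of this objective can be solved by enumeration. The sketch assumes a deliberately simple learner (Steve takes the sample mean of the item values), squared-error teaching risk, and teaching effort equal to the total cost of the chosen items. The pool and the function names are hypothetical, and this is not the method of the cited papers.

```python
from itertools import combinations

def learner_mean(values):
    """Steve's algorithm A (assumed): the sample mean of the item values."""
    return sum(values) / len(values)

def teaching_objective(subset, theta_star, lam):
    """R_T(theta) + lam * E_T(D) for one candidate training set."""
    theta = learner_mean([v for v, _ in subset])
    risk = (theta - theta_star) ** 2          # teaching risk R_T
    effort = sum(c for _, c in subset)        # teaching effort E_T: total item cost
    return risk + lam * effort

def best_lesson(pool, theta_star, lam):
    """Enumerate every non-empty subset of the pool and keep the best one."""
    candidates = (s for k in range(1, len(pool) + 1) for s in combinations(pool, k))
    return min(candidates, key=lambda s: teaching_objective(s, theta_star, lam))

# Hypothetical pool of (value, cost) items.
pool = [(5.0, 3.0), (4.0, 1.0), (6.0, 1.0), (9.0, 0.5)]
print(best_lesson(pool, theta_star=5.0, lam=0.1))  # ((4.0, 1.0), (6.0, 1.0))
print(best_lesson(pool, theta_star=5.0, lam=2.0))
```

With a small $\lambda$ Tina prefers two cheap items that average to the target over one expensive exact item; with a large $\lambda$ she accepts some risk to cut effort.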


Education

Steve may actually be a human!

Tina computes the best lesson D for Steve

Needs Steve’s cognitive model A

  - a “good enough” $A$ is fine


A Case Study

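[Figure: a one-dimensional categorization task on $[0, 1]$ with target threshold $\theta^* = 0.5$; items to the left are labeled $y = -1$ and items to the right $y = 1$.]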

Human categorization [Patil Z Kopec Love 2014]

A: a limited capacity retrieval cognitive model

human training set    human test accuracy
D                     72.5%
iid                   69.8%

(statistically significant)


Open Problems for the AI Community

Optimization: solve Tina’s problem for any Steve

Theory: generalize Teaching Dimension [Goldman Kearns 1995]

Cognitive Science and Education: Steve’s A, real education tasks

Computer Security: data poisoning attacks

References: http://pages.cs.wisc.edu/~jerryzhu/machineteaching/
