Transcript of cs.brown.edu/~amy/papers/tutorial1.pdf (slides1.dvi)

Page 1

Game-Theoretic Learning

Amy Greenwald, with David Gondek, Amir Jafari, and Casey Marks

Brown University

ICML Tutorial I

July 4, 2004

Page 2

Overview

1. Introduction to Game Theory

2. Regret Matching Learning Algorithms

◦ Regret Matching Learns Equilibria

3. Machine Learning Applications

Page 3

Introduction to Game Theory

1. General-Sum Games

◦ Nash Equilibrium

◦ Correlated Equilibrium

2. Zero-Sum Games

◦ Minimax Equilibrium

Page 4

Regret Matching Learning Algorithms

1. Regret Variations

◦ No Φ-Regret Learning

◦ External, Internal, and Swap Regret

2. Sufficient Conditions for No Φ-Regret Learning

◦ Blackwell’s Approachability Theorem

◦ Gordon’s Gradient Descent Theorem

– Potential Function Argument

3. Expected and Observed Regret Matching Algorithms

◦ Polynomial and Exponential Potential Functions

◦ External, Internal, and Swap Regret

4. No Φ-Regret Learning Converges to Φ-Equilibria

So Φ-Regret Matching Learns Φ-Equilibria

Page 5

Machine Learning Applications

1. Online Classification

2. Offline Boosting

Page 6

Game Theory and Economics

◦ Perfect Competition: agents are price-takers

◦ Monopoly: a single entity commands all market power

◦ Game Theory: payoffs in a game are jointly determined by the strategies of all players

Page 7

Knowledge, Rationality, and Equilibrium

Assumption

Players are rational, i.e., they optimize with respect to their beliefs.

Theorem

Mutual knowledge of rationality and common knowledge of beliefs

is sufficient for the deductive justification of Nash equilibrium.

(Aumann and Brandenburger 95)

Question

Can learning provide an inductive justification for equilibrium?

Page 8

Dimensions of Game Theory

◦ zero-sum vs. general-sum

◦ simultaneous vs. sequential-move

– deterministic vs. stochastic transitions

◦ cooperative vs. non-cooperative

◦ one-shot vs. repeated

Learning in Repeated Games

Page 9

Rational Learning in Repeated Games

◦ An Iterative Method of Solving a Game (Robinson 51)

◦ Rational Learning Leads to Nash Equilibrium (Kalai and Lehrer 93)

◦ Prediction, Optimization, and Learning in Repeated Games (Nachbar 97)

Page 10

Low-Rationality Learning in Repeated Games

Evolutionary Learning

No-Regret Learning

◦ No-external-regret learning converges to minimax equilibrium

◦ No-internal-regret learning converges to correlated equilibrium

◦ No-Φ-regret learning does not converge to Nash equilibrium, in general

Page 11

One-Shot Games

1. General-Sum Games

◦ Nash Equilibrium

◦ Correlated Equilibrium

2. Zero-Sum Games

◦ Minimax Equilibrium

Page 12

An Example

Prisoners’ Dilemma

C D

C 4,4 0,5

D 5,0 1,1

C: Cooperate

D: Defect

Unique Nash, Correlated, and “Minimax” Equilibrium: (D, D), since D strictly dominates C

Page 13

Normal Form Games

A normal form game is a 3-tuple Γ = (I, (Ai, ri)_{i∈I}), where

◦ I is a set of players

◦ for all players i ∈ I,

– a set of actions Ai with ai ∈ Ai

– a reward function ri : A → R, where A = ∏_{i∈I} Ai

Normal form games are also called strategic form, or matrix, games.

Notation

Write a = (ai, a−i) ∈ A for ai ∈ Ai and a−i ∈ A−i = ∏_{j≠i} Aj.

Write q = (qi, q−i) ∈ Q for qi ∈ Qi and q−i ∈ Q−i = ∏_{j≠i} Qj,

where Qi = {qi ∈ R^{Ai} | ∑_j qij = 1 and qij ≥ 0, ∀j}.

Page 14

Nash Equilibrium

Preliminaries

E[ri(q)] = ∑_{a∈A} q(a) ri(a), where q(a) = ∏_{i} qi(ai)

BRi(q) ≡ BRi(q−i) = {q∗i ∈ Qi | ∀qi ∈ Qi, E[ri(q∗i, q−i)] ≥ E[ri(qi, q−i)]}

Definition

A Nash equilibrium is an action profile q∗ s.t. q∗ ∈ BR(q∗).

Theorem [Nash 51]

Every finite strategic form game has a mixed strategy Nash equilibrium.
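Because expected reward is linear in each player's own mixed strategy, checking q∗ ∈ BR(q∗) only requires comparing against pure-action deviations. The snippet below is a minimal sketch of such a check for two-player games (assuming NumPy; the helper name is_nash and the tolerance are my own illustrative choices), applied to the Prisoners' Dilemma from the earlier slide.

```python
import numpy as np

def is_nash(R1, R2, q1, q2, tol=1e-9):
    """Check q* = (q1, q2) in BR(q*) for a two-player game.

    R1[k, l], R2[k, l]: rewards to players 1 and 2 at pure profile (k, l).
    By linearity of expected reward in each player's own strategy, it is
    enough to test deviations to pure actions.
    """
    v1 = q1 @ R1 @ q2             # player 1's expected reward at (q1, q2)
    v2 = q1 @ R2 @ q2             # player 2's expected reward at (q1, q2)
    return (R1 @ q2).max() <= v1 + tol and (q1 @ R2).max() <= v2 + tol

# Prisoners' Dilemma from the earlier slide; actions ordered (C, D).
R1 = np.array([[4.0, 0.0], [5.0, 1.0]])
R2 = R1.T                          # the game is symmetric
print(is_nash(R1, R2, np.array([0.0, 1.0]), np.array([0.0, 1.0])))  # True: (D, D)
print(is_nash(R1, R2, np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # False: (C, C)
```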

Page 15

General-Sum Games

Battle of the Sexes

      B     F
B    2,1   0,0
F    0,0   1,2

Stag Hunt

      C     D
C    2,2   0,1
D    1,0   1+ε,1+ε

Coordination Game

      L     C     R
T    3,3   0,0   0,0
M    0,0   2,2   0,0
B    0,0   0,0   1,1

Shapley Game

      L     C     R
T    0,0   1,0   0,1
M    0,1   0,0   1,0
B    1,0   0,1   0,0

Page 16

Correlated Equilibrium

Chicken

      L     R
T    6,6   2,7
B    7,2   0,0

CE

      L     R
T    1/2   1/4
B    1/4    0

max 12πTL + 9πTR + 9πBL + 0πBR

subject to

πTL + πTR + πBL + πBR = 1

πTL, πTR, πBL, πBR ≥ 0

6πL|T + 2πR|T ≥ 7πL|T + 0πR|T

7πL|B + 0πR|B ≥ 6πL|B + 2πR|B

6πT|L + 2πB|L ≥ 7πT|L + 0πB|L

7πT|R + 0πB|R ≥ 6πT|R + 2πB|R
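As a sanity check, this LP can be handed to any solver. Below is a minimal sketch (assuming SciPy is available; the encoding is mine) of the same program over the joint probabilities, with each conditional constraint above multiplied through by the corresponding marginal; the solver recovers the CE distribution shown above.

```python
from scipy.optimize import linprog

# Joint probabilities x = [pi_TL, pi_TR, pi_BL, pi_BR] for Chicken.
# linprog minimizes, so negate the objective 12 pi_TL + 9 pi_TR + 9 pi_BL.
c = [-12.0, -9.0, -9.0, 0.0]

# Rationality constraints (A_ub @ x <= 0), one per player per recommendation,
# e.g. row player told T: 6 pi_TL + 2 pi_TR >= 7 pi_TL + 0 pi_TR.
A_ub = [
    [1.0, -2.0, 0.0, 0.0],   # row player, recommended T
    [0.0, 0.0, -1.0, 2.0],   # row player, recommended B
    [1.0, 0.0, -2.0, 0.0],   # column player, recommended L
    [0.0, -1.0, 0.0, 2.0],   # column player, recommended R
]
b_ub = [0.0] * 4

# Probabilities sum to 1; nonnegativity is linprog's default bound.
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=[[1.0, 1.0, 1.0, 1.0]], b_eq=[1.0])
print(res.x)  # approximately [0.5, 0.25, 0.25, 0.0], the CE table above
```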

Page 17

Correlated Equilibrium

Definition

A joint distribution q∗ over A is a correlated equilibrium iff, for all players i ∈ I and all actions ai, a′i ∈ Ai with q∗(ai) > 0,

∑_{a−i∈A−i} q∗(a−i|ai) ri(ai, a−i) ≥ ∑_{a−i∈A−i} q∗(a−i|ai) ri(a′i, a−i)

Observe

Every Nash equilibrium is a correlated equilibrium ⇒

Every finite normal form game has a correlated equilibrium.

Page 18

Prisoners’ Dilemma

[Figure: “NIR polynomial [p = 2]: Prisoners’ Dilemma”. Left panel: action weights vs. time; right panel: empirical action frequencies vs. time. Legend: Player 1: C, Player 1: D, Player 2: C, Player 2: D.]

Page 19

Battle of the Sexes

[Figure: “NIR polynomial [p = 2]: Battle of the Sexes”. Left panel: action weights vs. time; right panel: empirical action frequencies vs. time. Legend: Player 1: B, Player 1: F, Player 2: B, Player 2: F.]

Page 20

Stag Hunt

[Figure: “NIR exponential [η = 0.052655]: Stag Hunt”. Left panel: action weights vs. time; right panel: empirical action frequencies vs. time. Legend: Player 1: T, Player 1: B, Player 2: L, Player 2: R.]

Page 21

Coordination Game

[Figure: “NIR exponential [η = 0.066291]: Coordination Game”. Left panel: action weights vs. time; right panel: empirical action frequencies vs. time. Legend: Player 1: T, M, B; Player 2: L, C, R.]

Page 22

Shapley Game: No Internal Regret Learning

[Figure: empirical action frequencies vs. time under “NIR polynomial [p = 2]” (left) and “NIR exponential [η = 0.014823]” (right) in the Shapley Game. Legend: Player 1: T, M, B; Player 2: L, C, R.]

Page 23

Shapley Game: No Internal Regret Learning

[Figure: empirical joint-action frequencies vs. time under “NIR polynomial [p = 2]” (left) and “NIR exponential [η = 0.014823]” (right) in the Shapley Game. Legend: the nine joint actions (T,L), (M,L), (B,L), (T,C), (M,C), (B,C), (T,R), (M,R), (B,R).]

Page 24

Shapley Game: No External Regret Learning

[Figure: empirical action frequencies vs. time under “NER polynomial [p = 2]” (left) and “NER exponential [η = 0.014823]” (right) in the Shapley Game. Legend: Player 1: T, M, B; Player 2: L, C, R.]

Page 25

Zero-Sum Games

Matching Pennies

       H      T
H    −1,1   1,−1
T    1,−1   −1,1

Rock-Paper-Scissors

       R      P      S
R     0,0   −1,1   1,−1
P    1,−1    0,0   −1,1
S    −1,1   1,−1    0,0

Definition

Zero-sum: ∑_{k∈I} rk(a) = 0, for all a ∈ A

Constant-sum: ∑_{k∈I} rk(a) = c, for all a ∈ A, for some c ∈ R

Page 26

Zero-Sum Games: Pure Actions

Two Players: mkl ≡ M(k, l) = r1(k, l) = −r2(k, l)

◦ Maximizer: k∗ ∈ argmax_{k∈A1} min_{l∈A2} mkl, with value v(k∗) = max_{k∈A1} min_{l∈A2} mkl

◦ Minimizer: l∗ ∈ argmin_{l∈A2} max_{k∈A1} mkl, with value v(l∗) = min_{l∈A2} max_{k∈A1} mkl

Example

     L   R
T    1   2
B    4   3

Here k∗ = B and l∗ = R, with v(k∗) = v(l∗) = 3, so (B, R) is a saddle point.
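The example can be checked in two lines; this is a minimal sketch (assuming NumPy) that computes both pure-action security levels for the matrix above.

```python
import numpy as np

# Rows = maximizer's actions (T, B); columns = minimizer's actions (L, R).
M = np.array([[1.0, 2.0],
              [4.0, 3.0]])

v_max = M.min(axis=1).max()   # v(k*) = max_k min_l m_kl
v_min = M.max(axis=0).min()   # v(l*) = min_l max_k m_kl
print(v_max, v_min)           # 3.0 3.0: a saddle point exists at (B, R)
```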

Page 27

Zero-Sum Games: Mixed Actions

Two Players: M(p, l) = ∑_{k∈A1} p(k) M(k, l) and M(k, q) = ∑_{l∈A2} q(l) M(k, l)

◦ Maximizer: p∗ ∈ argmax_{p∈Q1} min_{l∈A2} M(p, l), with value v(p∗) = max_{p∈Q1} min_{l∈A2} M(p, l)

◦ Minimizer: q∗ ∈ argmin_{q∈Q2} max_{k∈A1} M(k, q), with value v(q∗) = min_{q∈Q2} max_{k∈A1} M(k, q)

Example

      L    R
T    +1   −1
B    −1   +1

This game has no pure saddle point; the mixed equilibrium is p∗ = q∗ = (1/2, 1/2), with value 0.

Page 28

Minimax Theorem (von Neumann 28)

Theorem

In two-player, zero-sum games, the minimax value equals the maximin value.

Easy Direction v(p∗) ≤ v(q∗)

◦ analogous to weak duality in linear programming

Hard Direction v(q∗) ≤ v(p∗)

◦ akin to strong duality in linear programming
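The duality connection can be made concrete: each player's optimization is a linear program, and the two programs are duals. Below is a minimal sketch (assuming SciPy; the encoding is a standard one, not from the slides) that solves both programs for Matching Pennies and prints matching values.

```python
import numpy as np
from scipy.optimize import linprog

M = np.array([[1.0, -1.0],
              [-1.0, 1.0]])   # maximizer's rewards (Matching Pennies)
m, n = M.shape

# Maximizer: max v s.t. M(p, l) >= v for every column l, p a distribution.
# Variables x = (p_1, ..., p_m, v); linprog minimizes, so minimize -v.
c = np.r_[np.zeros(m), -1.0]
A_ub = np.c_[-M.T, np.ones(n)]          # v - M(p, l) <= 0 for each l
res_p = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                A_eq=[np.r_[np.ones(m), 0.0]], b_eq=[1.0],
                bounds=[(0, None)] * m + [(None, None)])

# Minimizer: min w s.t. M(k, q) <= w for every row k, q a distribution.
c = np.r_[np.zeros(n), 1.0]
A_ub = np.c_[M, -np.ones(m)]            # M(k, q) - w <= 0 for each k
res_q = linprog(c, A_ub=A_ub, b_ub=np.zeros(m),
                A_eq=[np.r_[np.ones(n), 0.0]], b_eq=[1.0],
                bounds=[(0, None)] * n + [(None, None)])

print(-res_p.fun, res_q.fun)  # both ~0.0: v(p*) = v(q*) for Matching Pennies
```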

Page 29

Proof of Easy Direction

Observe

For k∗ and l∗ as above, v(k∗) = min_l M(k∗, l) ≤ M(k∗, l∗) ≤ max_k M(k, l∗) = v(l∗).

Therefore, v(k∗) ≤ v(l∗).

Proof of Hard Direction

Corollary of the existence of no-external-regret learning algorithms (Freund & Schapire 96)
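To give the flavor of that construction, here is a minimal self-play sketch (assuming NumPy; the learning rate η and horizon T are my illustrative choices, not values from the slides) of exponential-weights, no-external-regret updating in Matching Pennies: both players' average strategies approach the minimax strategies (1/2, 1/2).

```python
import numpy as np

M = np.array([[-1.0, 1.0],
              [1.0, -1.0]])          # row player's rewards, as on the slide
eta, T = 0.05, 5000
s1, s2 = np.zeros(2), np.zeros(2)    # cumulative expected rewards per action
avg1, avg2 = np.zeros(2), np.zeros(2)

def hedge(s):
    """Exponential weights over cumulative scores (stabilized softmax)."""
    w = np.exp(eta * (s - s.max()))
    return w / w.sum()

for _ in range(T):
    p, q = hedge(s1), hedge(s2)
    avg1 += p
    avg2 += q
    s1 += M @ q                      # row player's expected rewards (maximizes M)
    s2 += -M.T @ p                   # column player's rewards in the zero-sum game
print(avg1 / T, avg2 / T)            # both approach (1/2, 1/2), value 0
```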

Page 30

Matching Pennies

[Figure: “NER exponential [η = 0.037233]: Matching Pennies”. Left panel: action weights vs. time; right panel: empirical action frequencies vs. time. Legend: Player 1: H, Player 1: T, Player 2: H, Player 2: T.]

Page 31

Rock-Paper-Scissors

[Figure: “NER polynomial [p = 2]: Rock, Paper, Scissors”. Left panel: action weights vs. time; right panel: empirical action frequencies vs. time. Legend: Player 1: R, P, S; Player 2: R, P, S.]
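For a rough sense of the experiment behind plots like this one, the following is a minimal sketch (assuming NumPy; the horizon and the play-uniform-when-no-positive-regret rule are implementation choices, not from the slides) of polynomial (p = 2) no-external-regret matching in self-play on Rock-Paper-Scissors: each player plays each action with probability proportional to its positive cumulative regret, and the empirical frequencies approach (1/3, 1/3, 1/3).

```python
import numpy as np

def rm(regret):
    """Regret matching: play proportional to positive regret, else uniform."""
    pos = np.maximum(regret, 0.0)
    total = pos.sum()
    return pos / total if total > 0 else np.full(len(regret), 1.0 / len(regret))

# Rock-Paper-Scissors rewards for the row player, as on the slide.
M = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
rng = np.random.default_rng(0)
T = 10000
reg1, reg2 = np.zeros(3), np.zeros(3)
freq1, freq2 = np.zeros(3), np.zeros(3)   # empirical action frequencies

for _ in range(T):
    a = rng.choice(3, p=rm(reg1))
    b = rng.choice(3, p=rm(reg2))
    freq1[a] += 1
    freq2[b] += 1
    # External regret: reward of each fixed action minus the realized reward.
    reg1 += M[:, b] - M[a, b]
    reg2 += -M[a, :] + M[a, b]
print(freq1 / T, freq2 / T)               # both approach (1/3, 1/3, 1/3)
```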

Page 32

Summary of Part I

“A little rationality goes a long way” [Hart 03]

No-Regret Learning

◦ No-external-regret learning converges to minimax equilibrium

◦ No-internal-regret learning converges to correlated equilibrium