Source: cs.brown.edu/~amy/papers/tutorial1.pdf (slides1.dvi, created 1/13/2005)
Game-Theoretic Learning
Amy Greenwald, with David Gondek, Amir Jafari, and Casey Marks
Brown University
ICML Tutorial I
July 4, 2004
Overview
1. Introduction to Game Theory
2. Regret Matching Learning Algorithms
◦ Regret Matching Learns Equilibria
3. Machine Learning Applications
Introduction to Game Theory
1. General-Sum Games
◦ Nash Equilibrium
◦ Correlated Equilibrium
2. Zero-Sum Games
◦ Minimax Equilibrium
Regret Matching Learning Algorithms
1. Regret Variations
◦ No Φ-Regret Learning
◦ External, Internal, and Swap Regret
2. Sufficient Conditions for No Φ-Regret Learning
◦ Blackwell’s Approachability Theorem
◦ Gordon’s Gradient Descent Theorem
– Potential Function Argument
3. Expected and Observed Regret Matching Algorithms
◦ Polynomial and Exponential Potential Functions
◦ External, Internal, and Swap Regret
4. No Φ-Regret Learning Converges to Φ-Equilibria
So Φ-Regret Matching Learns Φ-Equilibria
Machine Learning Applications
1. Online Classification
2. Offline Boosting
Game Theory and Economics
◦ Perfect Competition: agents are price-takers
◦ Monopoly: a single entity commands all market power
◦ Game Theory: payoffs in a game are jointly determined by the strategies of all players
Knowledge, Rationality, and Equilibrium
Assumption
Players are rational: i.e., they optimize with respect to their beliefs.
Theorem
Mutual knowledge of rationality and common knowledge of beliefs
is sufficient for the deductive justification of Nash equilibrium.
(Aumann and Brandenburger 95)
Question
Can learning provide an inductive justification for equilibrium?
Dimensions of Game Theory
◦ zero-sum vs. general-sum
◦ simultaneous vs. sequential-move
– deterministic vs. stochastic transitions
◦ cooperative vs. non-cooperative
◦ one-shot vs. repeated
Learning in Repeated Games
Rational Learning in Repeated Games
◦ An Iterative Method of Solving a Game (Robinson 51)
◦ Rational Learning Leads to Nash Equilibrium (Kalai and Lehrer 93)
◦ Prediction, Optimization, and Learning in Repeated Games (Nachbar 97)
Low-Rationality Learning in Repeated Games
Evolutionary Learning
No-Regret Learning
◦ No-external-regret learning converges to minimax equilibrium
◦ No-internal-regret learning converges to correlated equilibrium
◦ No-Φ-regret learning does not converge to Nash equilibrium, in general
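As a concrete preview (a minimal Python sketch of my own, not code from the tutorial, assuming a standard unconditional regret-matching update in the spirit of Hart and Mas-Colell): in the Prisoners' Dilemma, positive regret accumulates only toward the dominant action D, so self-play locks onto the unique equilibrium (D, D).

```python
import random

# Prisoners' Dilemma rewards for the row player (actions 0 = C, 1 = D);
# the game is symmetric, so the same table serves both players.
R = [[4, 0], [5, 1]]

def regret_matching(T=200, seed=0):
    """Self-play with unconditional regret matching: each player plays an
    action with probability proportional to its positive cumulative regret."""
    rng = random.Random(seed)
    regret = [[0.0, 0.0], [0.0, 0.0]]
    counts = [[0, 0], [0, 0]]
    for _ in range(T):
        acts = []
        for i in range(2):
            pos = [max(x, 0.0) for x in regret[i]]
            total = sum(pos)
            if total > 0:
                a = 0 if rng.random() * total < pos[0] else 1
            else:
                a = rng.randrange(2)  # no positive regret yet: play uniformly
            acts.append(a)
        for i in range(2):
            mine, other = acts[i], acts[1 - i]
            for a in range(2):
                regret[i][a] += R[a][other] - R[mine][other]
            counts[i][mine] += 1
    return counts

counts = regret_matching()
# D strictly dominates C, so each player cooperates at most once before
# its regret toward D turns positive and play locks onto (D, D).
```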
One-Shot Games
1. General-Sum Games
◦ Nash Equilibrium
◦ Correlated Equilibrium
2. Zero-Sum Games
◦ Minimax Equilibrium
An Example
Prisoners’ Dilemma
      C     D
C    4,4   0,5
D    5,0   1,1
C: Cooperate
D: Defect
Unique Nash, Correlated, and “Minimax” Equilibrium
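A quick way to see that (D, D) is the unique pure equilibrium is to check, at every cell, that neither player gains by deviating. A small Python sketch (not part of the slides):

```python
# Prisoners' Dilemma payoffs: rewards[i][a1][a2] is player i's reward when
# player 1 plays a1 and player 2 plays a2 (0 = C, 1 = D).
rewards = [
    [[4, 0], [5, 1]],   # player 1
    [[4, 5], [0, 1]],   # player 2
]

def pure_nash(rewards):
    """All pure-action profiles where neither player gains by deviating."""
    eqs = []
    for a1 in range(2):
        for a2 in range(2):
            br1 = all(rewards[0][a1][a2] >= rewards[0][d][a2] for d in range(2))
            br2 = all(rewards[1][a1][a2] >= rewards[1][a1][d] for d in range(2))
            if br1 and br2:
                eqs.append((a1, a2))
    return eqs

# Only (D, D) = (1, 1) survives: D is a strictly dominant action for both.
```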
Normal Form Games
A normal form game is a 3-tuple Γ = (I, (Ai, ri)i∈I), where
◦ I is a set of players
◦ for all players i ∈ I,
– a set of actions Ai with ai ∈ Ai
– a reward function ri : A → R, where A = ∏_{i∈I} Ai

Normal form games are also called strategic form, or matrix, games.

Notation
Write a = (ai, a−i) ∈ A for ai ∈ Ai and a−i ∈ A−i = ∏_{j≠i} Aj.
Write q = (qi, q−i) ∈ Q for qi ∈ Qi and q−i ∈ Q−i = ∏_{j≠i} Qj,
where Qi = {qi ∈ R^{Ai} | ∑_j qij = 1 and qij ≥ 0, ∀j}.
Nash Equilibrium
Preliminaries
E[ri(q)] = ∑_{a∈A} q(a) ri(a), where q(a) = ∏_i qi(ai)

BRi(q) ≡ BRi(q−i) = {q∗i ∈ Qi | ∀qi ∈ Qi, E[ri(q∗i, q−i)] ≥ E[ri(qi, q−i)]}

Definition
A Nash equilibrium is an action profile q∗ s.t. q∗ ∈ BR(q∗).

Theorem [Nash 51]
Every finite strategic form game has a mixed strategy Nash equilibrium.
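To make the definition concrete, here is a small Python sketch (mine, not the tutorial's) verifying a mixed Nash equilibrium of Battle of the Sexes, whose payoff table appears among the general-sum examples in these slides; the mixes (2/3, 1/3) and (1/3, 2/3) are a worked example, derived by making each opponent indifferent.

```python
# Battle of the Sexes: r[i][a1][a2] is player i's reward (actions 0 = B, 1 = F).
r = [
    [[2, 0], [0, 1]],   # player 1
    [[1, 0], [0, 2]],   # player 2
]

def pure_values(i, mix_other):
    """E[r_i] of each of player i's pure actions against the other's mix."""
    if i == 0:
        return [sum(mix_other[b] * r[0][a][b] for b in range(2)) for a in range(2)]
    return [sum(mix_other[a] * r[1][a][b] for a in range(2)) for b in range(2)]

q1, q2 = [2/3, 1/3], [1/3, 2/3]
v1 = pure_values(0, q2)   # both pure actions earn 2/3: player 1 is indifferent
v2 = pure_values(1, q1)   # likewise 2/3 each for player 2
# Each player's mix is therefore a best response to the other's:
# (q1, q2) satisfies q* ∈ BR(q*), i.e., it is a Nash equilibrium.
```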
General-Sum Games
Battle of the Sexes
      B     F
B    2,1   0,0
F    0,0   1,2

Stag Hunt
      C       D
C    2,2     0,1
D    1,0   1+ε,1+ε

Coordination Game
      L     C     R
T    3,3   0,0   0,0
M    0,0   2,2   0,0
B    0,0   0,0   1,1

Shapley Game
      L     C     R
T    0,0   1,0   0,1
M    0,1   0,0   1,0
B    1,0   0,1   0,0
Correlated Equilibrium
Chicken
      L     R
T    6,6   2,7
B    7,2   0,0

CE
      L     R
T    1/2   1/4
B    1/4    0

max 12πTL + 9πTR + 9πBL + 0πBR
subject to
πTL + πTR + πBL + πBR = 1
πTL, πTR, πBL, πBR ≥ 0
6πL|T + 2πR|T ≥ 7πL|T + 0πR|T
7πL|B + 0πR|B ≥ 6πL|B + 2πR|B
6πT|L + 2πB|L ≥ 7πT|L + 0πB|L
7πT|R + 0πB|R ≥ 6πT|R + 2πB|R
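The CE table above can be checked against these constraints directly. A small Python sketch (mine, not from the slides) multiplies the conditional constraints through by the marginal probabilities to obtain unconditional inequalities:

```python
# The CE shown above, as pi[row][col] with rows (T, B) and columns (L, R).
pi = [[0.5, 0.25], [0.25, 0.0]]
r1 = [[6, 2], [7, 0]]   # row player's payoffs in Chicken
r2 = [[6, 7], [2, 0]]   # column player's payoffs

def row_ok(a, b):
    """Row player, told to play a: deviating to b does not help."""
    return (sum(pi[a][l] * r1[a][l] for l in range(2))
            >= sum(pi[a][l] * r1[b][l] for l in range(2)) - 1e-9)

def col_ok(a, b):
    """Column player, told to play a: deviating to b does not help."""
    return (sum(pi[k][a] * r2[k][a] for k in range(2))
            >= sum(pi[k][a] * r2[k][b] for k in range(2)) - 1e-9)

feasible = all(row_ok(a, b) and col_ok(a, b)
               for a in range(2) for b in range(2))
objective = 12 * pi[0][0] + 9 * pi[0][1] + 9 * pi[1][0] + 0 * pi[1][1]
# feasible is True, and objective = 10.5 at the distribution shown.
```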
Correlated Equilibrium
Definition
A joint distribution q∗ over A is a correlated equilibrium iff for all players i and all actions ai, a′i ∈ Ai with q(ai) > 0,

∑_{a−i∈A−i} q(a−i|ai) ri(ai, a−i) ≥ ∑_{a−i∈A−i} q(a−i|ai) ri(a′i, a−i)
Observe
Every Nash equilibrium is a correlated equilibrium ⇒
Every finite normal form game has a correlated equilibrium.
Prisoners’ Dilemma
Weights / Frequencies
[Figure: NIR polynomial (p = 2), Prisoners' Dilemma — left: action weights vs. time; right: empirical action frequencies vs. time; curves for Player 1: C, D and Player 2: C, D.]
Battle of the Sexes
Weights / Frequencies
[Figure: NIR polynomial (p = 2), Battle of the Sexes — left: action weights vs. time; right: empirical action frequencies vs. time; curves for Player 1: B, F and Player 2: B, F.]
Stag Hunt
Weights / Frequencies
[Figure: NIR exponential (η = 0.052655), Stag Hunt — left: action weights vs. time; right: empirical action frequencies vs. time; curves for Player 1: T, B and Player 2: L, R.]
Coordination Game
Weights / Frequencies
[Figure: NIR exponential (η = 0.066291), Coordination Game — left: action weights vs. time; right: empirical action frequencies vs. time; curves for Player 1: T, M, B and Player 2: L, C, R.]
Shapley Game: No Internal Regret Learning
Frequencies
[Figure: empirical action frequencies vs. time in the Shapley Game — left: NIR polynomial (p = 2); right: NIR exponential (η = 0.014823); curves for Player 1: T, M, B and Player 2: L, C, R.]
Shapley Game: No Internal Regret Learning
Joint Frequencies
[Figure: empirical joint-action frequencies vs. time in the Shapley Game — left: NIR polynomial (p = 2); right: NIR exponential (η = 0.014823); curves for (T,L), (M,L), (B,L), (T,C), (M,C), (B,C), (T,R), (M,R), (B,R).]
Shapley Game: No External Regret Learning
Frequencies
[Figure: empirical action frequencies vs. time in the Shapley Game — left: NER polynomial (p = 2); right: NER exponential (η = 0.014823); curves for Player 1: T, M, B and Player 2: L, C, R.]
Zero-Sum Games
Matching Pennies
      H      T
H   −1,1   1,−1
T   1,−1   −1,1

Rock-Paper-Scissors
      R      P      S
R    0,0   −1,1   1,−1
P   1,−1    0,0   −1,1
S   −1,1   1,−1    0,0

Definition
Zero-sum: ∑_{k∈I} rk(a) = 0, for all a ∈ A
Constant-sum: ∑_{k∈I} rk(a) = c, for all a ∈ A, for some c ∈ R
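A sketch of the definition in Python (not from the slides), applied to Rock-Paper-Scissors:

```python
# With r2 = -r1, as in the Rock-Paper-Scissors table above, the rewards
# sum to the same constant c = 0 at every action profile.
R1 = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]   # player 1's rewards
R2 = [[-x for x in row] for row in R1]      # player 2's rewards

def constant_sum(r1, r2):
    """Return c if r1 + r2 equals the same constant everywhere, else None."""
    sums = {r1[k][l] + r2[k][l]
            for k in range(len(r1)) for l in range(len(r1[0]))}
    return sums.pop() if len(sums) == 1 else None

c = constant_sum(R1, R2)   # c == 0, so the game is zero-sum
```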
Zero-Sum Games: Pure Actions
Two Players: mkl ≡ M(k, l) = r1(k, l) = −r2(k, l)

◦ Maximizer: k∗ ∈ argmax_{k∈A1} min_{l∈A2} mkl, with value v(k∗) = max_{k∈A1} min_{l∈A2} mkl
◦ Minimizer: l∗ ∈ argmin_{l∈A2} max_{k∈A1} mkl, with value v(l∗) = min_{l∈A2} max_{k∈A1} mkl

Example
      L   R
T     1   2
B     4   3
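For the example matrix, the two pure values can be computed directly (a quick check in Python, not part of the slides):

```python
# The example above: M[k][l] is the row player's payoff (rows T, B; cols L, R).
M = [[1, 2], [4, 3]]

maximin = max(min(row) for row in M)                              # v(k*)
minimax = min(max(M[k][l] for k in range(2)) for l in range(2))   # v(l*)
# maximin = 3 (row B) and minimax = 3 (column R): the values coincide,
# so this matrix has a pure saddle point at (B, R).
```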
Zero-Sum Games: Mixed Actions
Two Players: M(p, l) = ∑_{k∈A1} p(k) M(k, l) and M(k, q) = ∑_{l∈A2} q(l) M(k, l)

◦ Maximizer: p∗ ∈ argmax_{p∈Q1} min_{l∈A2} M(p, l), with value v(p∗) = max_{p∈Q1} min_{l∈A2} M(p, l)
◦ Minimizer: q∗ ∈ argmin_{q∈Q2} max_{k∈A1} M(k, q), with value v(q∗) = min_{q∈Q2} max_{k∈A1} M(k, q)

Example
      L    R
T    +1   −1
B    −1   +1
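For the example matrix, no pure row guarantees more than −1, but mixing does better. A coarse grid search over the row player's mix (an illustrative sketch, not an LP solver):

```python
# The example matrix above: M[k][l] is the row player's payoff.
M = [[1, -1], [-1, 1]]

def worst_case(p):
    """min over columns l of M(p, l), for the row mix (p, 1 - p)."""
    return min(p * M[0][l] + (1 - p) * M[1][l] for l in range(2))

best_p = max((i / 100 for i in range(101)), key=worst_case)
value = worst_case(best_p)
# best_p = 0.5 and value = 0: the uniform mix guarantees the game's value 0,
# whereas every pure row only guarantees -1.
```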
Minimax Theorem (von Neumann 28)
Theorem
In two-player, zero-sum games, the minimax value equals the maximin value.
Easy Direction v(p∗) ≤ v(q∗)
◦ analogous to weak duality in linear programming
Hard Direction v(q∗) ≤ v(p∗)
◦ akin to strong duality in linear programming
Proof of Easy Direction
Observe: the maximizer's guarantee at k∗ is its worst-case entry in row k∗, and the minimizer's guarantee at l∗ is its worst-case entry in column l∗, so

v(k∗) = min_{l∈A2} M(k∗, l) ≤ M(k∗, l∗) ≤ max_{k∈A1} M(k, l∗) = v(l∗)

Therefore, v(k∗) ≤ v(l∗).
Proof of Hard Direction
Corollary of the existence of no-external-regret learning algorithms
Freund & Schapire 96
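The Freund-Schapire direction can be illustrated empirically (a minimal sketch of my own; the learning rate η and horizon T are illustrative choices, not values from the tutorial): two Hedge-style no-external-regret learners in self-play on the Matching-Pennies-style matrix from the mixed-actions example. Because each learner's average regret vanishes, their time-averaged strategies approach the minimax mix (1/2, 1/2).

```python
import math

# Row player's payoffs; the column player receives -M.
M = [[1, -1], [-1, 1]]

def hedge_selfplay(T=20000, eta=0.02):
    """Two Hedge (multiplicative-weights) learners in self-play; returns the
    row player's time-averaged mixed strategy."""
    p, q = [1/3, 2/3], [0.5, 0.5]   # asymmetric start so the play moves
    avg = [0.0, 0.0]
    for _ in range(T):
        # expected payoff of each pure action against the opponent's mix
        u1 = [sum(q[l] * M[k][l] for l in range(2)) for k in range(2)]
        u2 = [sum(p[k] * -M[k][l] for k in range(2)) for l in range(2)]
        p = [p[k] * math.exp(eta * u1[k]) for k in range(2)]
        q = [q[l] * math.exp(eta * u2[l]) for l in range(2)]
        s, t = sum(p), sum(q)
        p = [x / s for x in p]       # renormalize to keep weights bounded
        q = [x / t for x in q]
        for k in range(2):
            avg[k] += p[k] / T
    return avg

avg = hedge_selfplay()
# avg approaches [0.5, 0.5], the row player's minimax strategy.
```

The individual strategies may cycle; it is the time averages whose exploitability is bounded by the sum of the players' average regrets.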
Matching Pennies
Weights / Frequencies
[Figure: NER exponential (η = 0.037233), Matching Pennies — left: action weights vs. time; right: empirical action frequencies vs. time; curves for Player 1: H, T and Player 2: H, T.]
Rock-Paper-Scissors
Weights / Frequencies
[Figure: NER polynomial (p = 2), Rock, Paper, Scissors — left: action weights vs. time; right: empirical action frequencies vs. time; curves for Player 1: R, P, S and Player 2: R, P, S.]
Summary of Part I
“A little rationality goes a long way” [Hart 03]
No-Regret Learning
◦ No-external-regret learning converges to minimax equilibrium
◦ No-internal-regret learning converges to correlated equilibrium