Single agent or multiple agents

Transcript of lecture slides from poole/aibook/2e/slides/ch11/lect1.pdf.
©D. Poole and A. Mackworth 2019, Artificial Intelligence, Lecture 11.1

Single agent or multiple agents

Many domains are characterized by multiple agents rather than a single agent.

Game theory studies what agents should do in a multi-agent setting. A game is an abstraction of agents interacting.

Agents can be cooperative, competitive, or somewhere in between.

Agents that reason and act autonomously cannot be modeled as nature.

Page 2: Single agent or multiple agentspoole/aibook/2e/slides/ch11/lect1.pdf · than a single agent. Game theorystudies what agents should do in a multi-agent setting. Agameis an abstraction

Single agent or multiple agents

Many domains are characterized by multiple agents ratherthan a single agent.

Game theory studies what agents should do in a multi-agentsetting. A game is an abstraction of agents interacting.

Agents can be cooperative, competitive or somewhere inbetween.

Agents that reason and act autonomoulsly can’t be modeledas nature.

©D. Poole and A. Mackworth 2019 Artificial Intelligence, Lecture 11.1 1 / 26

Page 3: Single agent or multiple agentspoole/aibook/2e/slides/ch11/lect1.pdf · than a single agent. Game theorystudies what agents should do in a multi-agent setting. Agameis an abstraction

Single agent or multiple agents

Many domains are characterized by multiple agents ratherthan a single agent.

Game theory studies what agents should do in a multi-agentsetting. A game is an abstraction of agents interacting.

Agents can be cooperative, competitive or somewhere inbetween.

Agents that reason and act autonomoulsly can’t be modeledas nature.

©D. Poole and A. Mackworth 2019 Artificial Intelligence, Lecture 11.1 1 / 26

Page 4: Single agent or multiple agentspoole/aibook/2e/slides/ch11/lect1.pdf · than a single agent. Game theorystudies what agents should do in a multi-agent setting. Agameis an abstraction

Multi-agent framework

Each agent can have its own utility.

Agents select actions autonomously.

Agents can have different information.

The outcome can depend on the actions of all of the agents.

Each agent’s value depends on the outcome.


Normal Form of a Game

The strategic form of a game, or normal-form game, consists of:

a finite set I of agents, {1, ..., n}

a set of actions Ai for each agent i ∈ I

an action profile σ, which is a tuple ⟨a1, ..., an⟩, meaning each agent i carries out ai

a utility function utility(σ, i) that, for action profile σ and agent i ∈ I, gives the expected utility for agent i when all agents follow action profile σ
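As a concrete illustration, the rock-paper-scissors game on the next slide can be coded directly in this form. This is a minimal sketch (the dictionary representation and names are mine, not the slides'); utility follows the slide's utility(σ, i) notation with agents indexed 0 (Alice) and 1 (Bob).

# Normal form of rock-paper-scissors: action profile -> (Alice's utility, Bob's utility)
payoff = {
    ("rock", "rock"): (0, 0),      ("rock", "paper"): (-1, 1),     ("rock", "scissors"): (1, -1),
    ("paper", "rock"): (1, -1),    ("paper", "paper"): (0, 0),     ("paper", "scissors"): (-1, 1),
    ("scissors", "rock"): (-1, 1), ("scissors", "paper"): (1, -1), ("scissors", "scissors"): (0, 0),
}

def utility(profile, i):
    """Utility for agent i (0 = Alice, 1 = Bob) under the given action profile."""
    return payoff[profile][i]

print(utility(("rock", "scissors"), 0))   # 1: rock beats scissors for Alice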


Rock-Paper-Scissors

                    Bob
            rock    paper   scissors
Alice rock   0,0    −1,1    1,−1
      paper  1,−1   0,0     −1,1
   scissors −1,1    1,−1    0,0


Extensive Form of a Game

[Game tree: Andy moves first, choosing keep, share, or give; Barb then chooses yes or no. Payoffs (Andy, Barb): keep/yes 2,0; keep/no 0,0; share/yes 1,1; share/no 0,0; give/yes 0,2; give/no 0,0.]


Extensive Form of an imperfect-information Game

[Game tree: Alice moves first, choosing rock, paper, or scissors; Bob then chooses r, p, or s without observing Alice's choice, so his three decision nodes form a single information set. Payoffs are those of the rock-paper-scissors matrix above.]

Bob cannot distinguish the nodes in an information set.


Multiagent Decision Networks

[Decision network: chance nodes Fire, Alarm1, Alarm2, Call Works, and Fire Dept Comes; decision nodes Call1 and Call2; utility nodes U1 and U2.]

A value node for each agent.

Each decision node is owned by an agent.

The parents of each decision node specify what that agent will observe when making the decision.


Multiple Agents, shared value


Complexity of Multi-agent decision theory

It can be exponentially harder to find an optimal multi-agent policy, even with shared values.

Why? Because dynamic programming doesn't work:

For a single agent, if a decision node has n binary parents, dynamic programming lets us solve 2^n decision problems.

This is much better than the d^(2^n) possible policies (where d is the number of decision alternatives).

Multiple agents with shared values is equivalent to having a single forgetful agent.
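To get a feel for the gap (an illustrative calculation, not from the slides): with n = 10 binary parents and d = 2 decision alternatives, dynamic programming solves about a thousand one-step decision problems, whereas the number of policies is astronomical.

n, d = 10, 2
print(2 ** n)                      # 1024 decision problems for dynamic programming
print(d ** (2 ** n) > 10 ** 300)   # True: well over 10^300 policies to search otherwise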


Fully Observable + Multiple Agents

If agents act sequentially and can observe the state before acting: perfect-information games.

Can do dynamic programming or search: each agent maximizes for itself.

Multi-agent MDPs: a value function for each agent; each agent maximizes its own value function.

Multi-agent reinforcement learning: each agent has its own Q-function.

Two-person, competitive (zero-sum) ⟹ minimax.


procedure Minimax(n)
    ▷ n is a node; returns the value of n and a best path
    if n is a leaf node then
        return evaluate(n), None
    else if n is a MAX node then
        max_score := −∞; best_path := None
        for each child c of n do
            score, path := Minimax(c)
            if score > max_score then
                max_score := score; best_path := c : path
        return max_score, best_path
    else
        min_score := ∞; best_path := None
        for each child c of n do
            score, path := Minimax(c)
            if score < min_score then
                min_score := score; best_path := c : path
        return min_score, best_path
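A runnable Python version of the procedure above, as a sketch: the tree representation (a leaf is a number, an internal node is a list of children) is my own assumption, not the slides'.

import math

def minimax(node, is_max=True):
    """Return (value, path) for `node`; path is a nested (child_index, rest) chain."""
    if not isinstance(node, list):          # leaf node: evaluate(n)
        return node, None
    best_score = -math.inf if is_max else math.inf
    best_path = None
    for i, child in enumerate(node):
        score, path = minimax(child, not is_max)
        if (is_max and score > best_score) or (not is_max and score < best_score):
            best_score, best_path = score, (i, path)    # prepend this move to the path
    return best_score, best_path

print(minimax([[3, 12], [2, 8]]))   # (3, (0, (0, None))): MAX can guarantee 3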


Pruning Dominated Strategies

[Figure: a four-level game tree with root a, internal nodes b–g, and leaves h–o carrying the values 7, 9, 6, 11, 12, 5, 4; propagated bounds such as ≥11, ≤6, ≤5, and ≤4 mark subtrees that can be pruned without changing the root's minimax value of 7.]

Square MAX nodes are controlled by an agent that wants to maximize the score; round MIN nodes are controlled by an adversary who wants to minimize the score.
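The pruning illustrated here is what alpha-beta search does (the slides show the bounds but do not name the algorithm; this sketch and its tree representation are mine). A MAX node maintains a lower bound alpha and a MIN node an upper bound beta; once beta ≤ alpha, the remaining children are dominated and can be skipped.

import math

def alphabeta(node, is_max=True, alpha=-math.inf, beta=math.inf):
    """Minimax value of `node`, skipping subtrees that the bounds prove dominated."""
    if not isinstance(node, list):          # leaf: return its value
        return node
    for child in node:
        score = alphabeta(child, not is_max, alpha, beta)
        if is_max:
            alpha = max(alpha, score)
        else:
            beta = min(beta, score)
        if beta <= alpha:                   # remaining children cannot change the value
            break
    return alpha if is_max else beta

print(alphabeta([[3, 12], [2, 8]]))   # 3, without ever evaluating the leaf 8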


Partial Observability and Competition

                goalkeeper
                left   right
kicker  left    0.6    0.2
        right   0.3    0.9

Probability of a goal.


Stochastic Policies

[Plot: P(goal) versus the kicker's probability pk (from 0 to 1), with one line for the goalkeeper strategy pj = 1 and one for pj = 0.]


Strategy Profiles

Assume a general n-player game.

A strategy for an agent is a probability distribution over the actions for that agent.

A strategy profile is an assignment of a strategy to each agent.

A strategy profile σ has a utility for each agent. Let utility(σ, i) be the utility of strategy profile σ for agent i.

If σ is a strategy profile: σi is the strategy of agent i in σ, and σ−i is the set of strategies of the other agents. Thus σ is σi σ−i.


Nash Equilibria

σi is a best response to σ−i if, for all other strategies σ′i for agent i,

utility(σi σ−i, i) ≥ utility(σ′i σ−i, i).

A strategy profile σ is a Nash equilibrium if, for each agent i, strategy σi is a best response to σ−i. That is, a Nash equilibrium is a strategy profile such that no agent can do better by unilaterally deviating from that profile.

Theorem [Nash, 1950]: Every finite game has at least one Nash equilibrium.
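To make the definition concrete, here is a small check (my own sketch, reusing the rock-paper-scissors payoffs from the earlier slide) that uniform mixing is a Nash equilibrium: against a uniformly mixing opponent, every pure action has the same expected utility, so no unilateral deviation helps.

actions = ["rock", "paper", "scissors"]
# row player's utility, matching the rock-paper-scissors matrix
u = {("rock", "rock"): 0,      ("rock", "paper"): -1,    ("rock", "scissors"): 1,
     ("paper", "rock"): 1,     ("paper", "paper"): 0,    ("paper", "scissors"): -1,
     ("scissors", "rock"): -1, ("scissors", "paper"): 1, ("scissors", "scissors"): 0}
sigma_other = {a: 1 / 3 for a in actions}   # the opponent's mixed strategy

def expected_utility(action, sigma):
    """Expected utility of a pure action against the opponent's mixed strategy."""
    return sum(p * u[(action, b)] for b, p in sigma.items())

print([expected_utility(a, sigma_other) for a in actions])   # [0.0, 0.0, 0.0]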


Multiple Equilibria

Hawk-Dove Game:

                  Agent 2
                  dove        hawk
Agent 1  dove     R/2, R/2    0, R
         hawk     R, 0        −D, −D

D and R are both positive, with D >> R.
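Both pure equilibria of this game can be found mechanically. A sketch (the helper pure_nash and the concrete values R = 2, D = 100 are mine; only D >> R is from the slide):

def pure_nash(U1, U2, rows, cols):
    """All pure-strategy Nash equilibria of a two-player matrix game."""
    return [(r, c) for r in rows for c in cols
            if all(U1[(r, c)] >= U1[(s, c)] for s in rows)      # r is a best response
            and all(U2[(r, c)] >= U2[(r, t)] for t in cols)]    # c is a best response

R, D = 2, 100   # illustrative values with D >> R
U1 = {("dove", "dove"): R / 2, ("dove", "hawk"): 0, ("hawk", "dove"): R, ("hawk", "hawk"): -D}
U2 = {("dove", "dove"): R / 2, ("dove", "hawk"): R, ("hawk", "dove"): 0, ("hawk", "hawk"): -D}
print(pure_nash(U1, U2, ["dove", "hawk"], ["dove", "hawk"]))
# [('dove', 'hawk'), ('hawk', 'dove')]: two pure equilibria, one per aggressor

The same check applied to the coordination game on the next slide finds both (shopping, shopping) and (football, football).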


Coordination

Just because you know the Nash equilibria doesn't mean you know what to do:

                   Agent 2
                   shopping   football
Agent 1  shopping  2,1        0,0
         football  0,0        1,2


Prisoner’s Dilemma

Two strangers are on a game show. They each have the choice:

Take $100 for yourself.

Give $1000 to the other player.

This can be depicted as the payoff matrix:

                 Player 2
                 take        give
Player 1  take   100,100     1100,0
          give   0,1100      1000,1000
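With the pure_nash helper from the Hawk-Dove sketch above, one can confirm that take strictly dominates give for both players, so (take, take) is the only pure equilibrium, even though (give, give) would leave both better off:

U1 = {("take", "take"): 100, ("take", "give"): 1100,
      ("give", "take"): 0,   ("give", "give"): 1000}
U2 = {(r, c): U1[(c, r)] for (r, c) in U1}   # the game is symmetric
print(pure_nash(U1, U2, ["take", "give"], ["take", "give"]))   # [('take', 'take')]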


Tragedy of the Commons

Example:

There are 100 agents.

There is a common environment that is shared amongst all agents. Each agent has 1/100 of the shared environment.

Each agent can choose to do an action that has a payoff of +10 for itself but a −100 payoff on the environment, or do nothing, with a zero payoff.

For each agent, doing the action has a payoff of 10 − 100/100 = 9.

If every agent does the action, the total payoff is 1000 − 10000 = −9000.
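The slide's arithmetic as a few lines of code (the numbers come straight from the slide):

n = 100
individual_net = 10 - 100 / n            # 9.0: acting is individually rational
total_if_all_act = n * 10 - n * 100      # -9000: collectively disastrous
print(individual_net, total_if_all_act)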


Extensive Form of a Game

What are the Nash equilibria of:

[Game tree as before: Andy chooses keep, share, or give; Barb then chooses yes or no; payoffs (Andy, Barb) are 2,0 / 0,0 / 1,1 / 0,0 / 0,2 / 0,0.]

A strategy for Barb is a choice of what to do in each situation.

Action profile e.g. 1: Andy: keep; Barb: no if keep, otherwise yes.

Action profile e.g. 2: Andy: share; Barb: yes always.

What if the 2,0 payoff was 1.9,0.1?


Computing Nash Equilibria

To compute a Nash equilibrium for a game in strategic form:

Eliminate dominated strategies.

Determine which actions will have non-zero probabilities. This is the support set.

Determine the probabilities for the actions in the support set.


Eliminating Dominated Strategies

               Agent 2
               d2     e2     f2
Agent 1  a1    3,5    5,1    1,2
         b1    1,1    2,9    6,4
         c1    2,6    4,7    0,8
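A sketch of iterated elimination of strictly dominated (pure) strategies, applied to this game; the helper name and representation are mine. Here a1 strictly dominates c1, and after c1 is removed no further pure strategy is dominated.

U1 = {("a1", "d2"): 3, ("a1", "e2"): 5, ("a1", "f2"): 1,
      ("b1", "d2"): 1, ("b1", "e2"): 2, ("b1", "f2"): 6,
      ("c1", "d2"): 2, ("c1", "e2"): 4, ("c1", "f2"): 0}
U2 = {("a1", "d2"): 5, ("a1", "e2"): 1, ("a1", "f2"): 2,
      ("b1", "d2"): 1, ("b1", "e2"): 9, ("b1", "f2"): 4,
      ("c1", "d2"): 6, ("c1", "e2"): 7, ("c1", "f2"): 8}

def eliminate(rows, cols):
    """Repeatedly drop any pure strategy strictly dominated by another pure strategy."""
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            if any(all(U1[(s, c)] > U1[(r, c)] for c in cols) for s in rows if s != r):
                rows.remove(r); changed = True
        for c in cols[:]:
            if any(all(U2[(r, s)] > U2[(r, c)] for r in rows) for s in cols if s != c):
                cols.remove(c); changed = True
    return rows, cols

print(eliminate(["a1", "b1", "c1"], ["d2", "e2", "f2"]))
# (['a1', 'b1'], ['d2', 'e2', 'f2'])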


Computing probabilities in randomized strategies

Given a support set:

Why would an agent randomize between actions a1 ... ak? Because actions a1 ... ak have the same value for that agent, given the strategies of the other agents.

This forms a set of simultaneous equations, where the variables are the probabilities of the actions.

If there is a solution with all the probabilities in the range (0,1), this is a Nash equilibrium.

Search over support sets to find a Nash equilibrium.


Example: computing Nash equilibrium

                goalkeeper
                left   right
kicker  left    0.6    0.2
        right   0.3    0.9

Probability of a goal.

When would the goalkeeper randomize? When the kicker's strategy makes both jumps equally good. Let kr be the probability that the kicker kicks right; the goalkeeper is indifferent when

P(goal | jump left) = P(goal | jump right)

kr ∗ 0.3 + (1 − kr) ∗ 0.6 = kr ∗ 0.9 + (1 − kr) ∗ 0.2

0.6 − 0.2 = (0.6 − 0.3 + 0.9 − 0.2) ∗ kr

kr = 0.4
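The same indifference calculation in code, together with the symmetric one for the goalkeeper (a sketch; gr, the probability that the goalkeeper jumps right, is my notation):

goal = {("left", "left"): 0.6, ("left", "right"): 0.2,    # (kick, jump) -> P(goal)
        ("right", "left"): 0.3, ("right", "right"): 0.9}

# kr makes the goalkeeper indifferent between jumping left and right
kr = (goal[("left", "left")] - goal[("left", "right")]) / (
    goal[("left", "left")] - goal[("right", "left")]
    + goal[("right", "right")] - goal[("left", "right")])
# gr makes the kicker indifferent between kicking left and right
gr = (goal[("left", "left")] - goal[("right", "left")]) / (
    goal[("left", "left")] - goal[("right", "left")]
    + goal[("right", "right")] - goal[("left", "right")])
print(round(kr, 2), round(gr, 2))   # 0.4 0.3: kick right 40% of the time, jump right 30%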


Learning to Coordinate (multiple agents, single state)

Each agent maintains P[A], a probability distribution over actions.

Each agent maintains Q[A], an estimate of the value of doing A given the policies of the other agents.

Repeat:

  select action a using distribution P

  do a and observe the payoff

  update Q: Q[a] ← Q[a] + α (payoff − Q[a])

  increment the probability of the best action by δ

  decrement the probabilities of the other actions
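The loop above can be written directly in Python. This is a sketch: the payoff function, the values of α and δ, and the renormalization step are illustrative assumptions (the slides do not say how the probabilities are kept normalized).

import random

def learn_to_coordinate(actions, payoff, steps=5000, alpha=0.1, delta=0.01):
    """One agent's learner: P maps actions to probabilities, Q to value estimates."""
    P = {a: 1 / len(actions) for a in actions}
    Q = {a: 0.0 for a in actions}
    for _ in range(steps):
        a = random.choices(list(P), weights=list(P.values()))[0]  # select a using P
        r = payoff(a)                                # do a and observe the payoff
        Q[a] += alpha * (r - Q[a])                   # update Q
        best = max(Q, key=Q.get)
        for b in P:                                  # shift probability toward the best action
            P[b] = P[b] + delta if b == best else max(0.0, P[b] - delta / (len(actions) - 1))
        total = sum(P.values())
        P = {b: p / total for b, p in P.items()}     # renormalize
    return P, Q

# stand-in for the other agents: a fixed environment that rewards "football"
P, Q = learn_to_coordinate(["shopping", "football"],
                           lambda a: 1.0 if a == "football" else 0.0)
print(max(P, key=P.get))   # football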
