game - Arizona State Universityyzhan442/teaching/CSE591S19-HAR/Lectures/game.pdfGame...

Human-aware Robotics


Game Theory
• 2019/04/22

Announcement: Slides for this lecture are here

http://www.public.asu.edu/~yzhan442/teaching/CSE591S19-HAR/Lectures/game.pdf

Slides are largely based on information from Mike Conlin

http://www.public.asu.edu/~yzhan442/teaching/CSE471/Lectures/probability.pdf

Game theory

The study of strategic decision making. More formally, it is the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.

Tic-Tac-Toe: a zero-sum game

Game theory is, in essence, the science of strategic thinking: a way of making the best decision possible based on the way you expect other people to act. It was once the domain of Nobel Prize-winning economists and big thinkers on geopolitics, but now parents are getting in on the act. Though game theory assumes, as a technical matter, that its players are rational, it applies just as well to not-always-rational children.

A key lesson in game theory, says Barry Nalebuff, a professor at the Yale School of Management, is to understand the perspective of the other players. It isn't about what you would do in another person's shoes, he says; it's about what they would do in their shoes. "Good game theory," he says, "appreciates the quirks and features that make us unique and takes us as we are."


Game theory applications

• Games, of course
• National Defense – Terrorism and Cold War
• Auctions
• Sports – Cards, Cycling, and race car driving
• Politics – positions taken and $$/time spent on campaigning
• Personnel management
• …

Cake cutting

The party is over, and you're down to the last bit of cake. All three of your children want it. If you're familiar with game theory, you might think of the classic strategy in which one person cuts the cake and the other chooses the slice. But how do you divide it three ways without anyone throwing a fit?

You want a strategy where everyone feels that they are being equally treated!

Game theory terminology

Simultaneous Move Game – Game in which each player makes decisions without knowledge of the other players’ decisions (e.g., the prisoner’s dilemma).

Sequential Move Game – Game in which one player makes a move after observing the other player’s move (e.g., Stackelberg game).

Strategy – In game theory, a decision rule that describes the actions a player will take at each decision point.

Normal Form Game – A representation of a game indicating the players, their possible strategies, and the payoffs resulting from alternative strategies.

Prisoner’s dilemma

Sentences (rows: Peter’s options; columns: Martha’s options; P = Peter, M = Martha)

                   Don’t Confess               Confess
Don’t Confess      P: 2 years, M: 2 years      P: 10 years, M: 1 year
Confess            P: 1 year, M: 10 years      P: 6 years, M: 6 years

What is Peter’s best option if Martha doesn’t confess?
What is Peter’s best option if Martha confesses?


What is Martha’s best option if Peter doesn’t confess?
What is Martha’s best option if Peter confesses?


Dominant Strategy – A strategy that results in the highest payoff to a player regardless of the opponent’s action.
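The definition can be checked mechanically on the prisoner’s dilemma. The sketch below is illustrative only: it encodes each sentence as a negative number of years (so a larger payoff is better), tests for strict dominance, and uses hypothetical names (`PAYOFFS`, `payoff`, `dominant_strategy`) that do not come from the slides.

```python
# Payoffs are negative prison years, so a larger number is better (an assumed encoding).
# Keys are (peter_action, martha_action); values are (peter_payoff, martha_payoff).
ACTIONS = ["Don't Confess", "Confess"]
PAYOFFS = {
    ("Don't Confess", "Don't Confess"): (-2, -2),
    ("Don't Confess", "Confess"): (-10, -1),
    ("Confess", "Don't Confess"): (-1, -10),
    ("Confess", "Confess"): (-6, -6),
}


def payoff(player, own, other):
    """Payoff to `player` (0 = Peter, 1 = Martha) for playing `own` against `other`."""
    profile = (own, other) if player == 0 else (other, own)
    return PAYOFFS[profile][player]


def dominant_strategy(player):
    """Return the strictly dominant action for `player`, or None if there is none."""
    for candidate in ACTIONS:
        rivals = [a for a in ACTIONS if a != candidate]
        if all(
            payoff(player, candidate, other) > payoff(player, rival, other)
            for rival in rivals
            for other in ACTIONS
        ):
            return candidate
    return None


print("Peter:", dominant_strategy(0))
print("Martha:", dominant_strategy(1))
```

Both players come out with Confess: whatever the other does, confessing gives the shorter sentence.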

Nash Equilibrium

A condition describing a set of strategies in which no player can improve her payoff by unilaterally changing her own strategy, given the other player’s strategy.

In other words, every player is doing its best given the other player’s strategy (a best response). For example, for an NE (A, B), B is a best response to the row agent’s strategy A, and A is a best response to the column agent’s strategy B.


A pure strategy Nash Equilibrium here is (Confess, Confess)
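The "no unilateral improvement" condition can be tested profile by profile. The following is a minimal sketch, again assuming sentences are encoded as negative years; `is_nash` is a hypothetical helper name, not something defined in the lecture.

```python
# Keys are (peter_action, martha_action); values are (peter_payoff, martha_payoff).
# Payoffs are negative prison years (an assumed encoding, larger is better).
ACTIONS = ["Don't Confess", "Confess"]
PAYOFFS = {
    ("Don't Confess", "Don't Confess"): (-2, -2),
    ("Don't Confess", "Confess"): (-10, -1),
    ("Confess", "Don't Confess"): (-1, -10),
    ("Confess", "Confess"): (-6, -6),
}


def is_nash(peter, martha):
    """True if neither player gains by changing only their own action."""
    peter_now, martha_now = PAYOFFS[(peter, martha)]
    peter_can_gain = any(PAYOFFS[(alt, martha)][0] > peter_now for alt in ACTIONS)
    martha_can_gain = any(PAYOFFS[(peter, alt)][1] > martha_now for alt in ACTIONS)
    return not (peter_can_gain or martha_can_gain)


print(is_nash("Confess", "Confess"))
print(is_nash("Don't Confess", "Don't Confess"))
```

(Confess, Confess) passes the check. (Don’t Confess, Don’t Confess) fails it even though both players would serve less time there, because each can cut their own sentence from 2 years to 1 by confessing.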

Theorem: Strictly dominated strategies cannot be a part of a Nash equilibrium.
• If all players have a strictly dominant strategy, there is a unique Nash equilibrium.
• Weakly dominated strategies may be part of Nash equilibria.
• A NE may not always be associated with a dominant strategy.

Does a dominant strategy always exist?

C dominates D

BK vs McD

Profits (rows: McDonald’s options; columns: Burger King’s options; PM = McDonald’s profit, PBK = Burger King’s profit)

                                  Enter Tempe Marketplace      Don’t Enter Tempe Marketplace
Enter Tempe Marketplace           PM = -30, PBK = -40          PM = 50, PBK = 0
Don’t Enter Tempe Marketplace     PM = 0, PBK = 40             PM = 0, PBK = 0

Is there a dominant strategy for BK?
Is there a dominant strategy for McD?


Is there a pure strategy Nash Equilibrium?


Yes, there are 2 – (Enter, Don’t Enter) and (Don’t Enter, Enter).
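Both equilibria can be found by brute force over the four cells. The sketch below is illustrative and assumes the two payoff rows are McDonald’s Enter and Don’t Enter, in that order (the row labels did not survive in the slide text, but this reading matches the answer above). Profiles are written as (McDonald’s action, Burger King’s action), and `pure_nash_equilibria` is a hypothetical helper name.

```python
from itertools import product

ACTIONS = ["Enter", "Don't Enter"]
# (mcd_action, bk_action) -> (mcd_profit, bk_profit); row order is an assumption.
PROFITS = {
    ("Enter", "Enter"): (-30, -40),
    ("Enter", "Don't Enter"): (50, 0),
    ("Don't Enter", "Enter"): (0, 40),
    ("Don't Enter", "Don't Enter"): (0, 0),
}


def pure_nash_equilibria(actions, payoffs):
    """List every cell in which each player's action is a best response to the other's."""
    equilibria = []
    for row, col in product(actions, repeat=2):
        row_payoff, col_payoff = payoffs[(row, col)]
        row_is_best = all(payoffs[(alt, col)][0] <= row_payoff for alt in actions)
        col_is_best = all(payoffs[(row, alt)][1] <= col_payoff for alt in actions)
        if row_is_best and col_is_best:
            equilibria.append((row, col))
    return equilibria


print(pure_nash_equilibria(ACTIONS, PROFITS))
```

The search returns exactly the two profiles in which one chain enters and the other stays out.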

Nash Equilibrium

A game does not need a dominant strategy to have a Nash Equilibrium.

Is there always a pure strategy Nash Equilibrium?

Worker Monitoring

Payoffs (rows: Manager’s options; columns: Worker’s options; M = Manager, W = Worker)

                   Work              Shirk
Monitor            M: -1, W: 1       M: 1, W: -1
Don’t Monitor      M: 1, W: -1       M: -1, W: 1

Is there a dominant strategy for the worker?

Is there a dominant strategy for the manager?


Is there a pure strategy Nash Equilibrium?
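One way to see that no cell of this table is stable is to let the two players take turns best-responding and watch where they end up. The sketch below is illustrative; the function names and the choice of starting cell are assumptions made for the example.

```python
# (manager_action, worker_action) -> (manager_payoff, worker_payoff)
PAYOFFS = {
    ("Monitor", "Work"): (-1, 1),
    ("Monitor", "Shirk"): (1, -1),
    ("Don't Monitor", "Work"): (1, -1),
    ("Don't Monitor", "Shirk"): (-1, 1),
}
MANAGER_ACTIONS = ["Monitor", "Don't Monitor"]
WORKER_ACTIONS = ["Work", "Shirk"]


def manager_best_response(worker_action):
    return max(MANAGER_ACTIONS, key=lambda m: PAYOFFS[(m, worker_action)][0])


def worker_best_response(manager_action):
    return max(WORKER_ACTIONS, key=lambda w: PAYOFFS[(manager_action, w)][1])


def best_response_path(start, steps):
    """Alternate best responses (worker moves first), recording each profile visited."""
    manager, worker = start
    path = [(manager, worker)]
    for step in range(steps):
        if step % 2 == 0:
            worker = worker_best_response(manager)
        else:
            manager = manager_best_response(worker)
        path.append((manager, worker))
    return path


for profile in best_response_path(("Monitor", "Shirk"), 4):
    print(profile)
```

The path visits all four cells and returns to where it started, so in every cell one of the two players wants to switch and no pure strategy profile is an equilibrium.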


Mixed Strategy Nash Equilibrium

Definition: A strategy whereby a player randomizes over two or more available actions in order to keep rivals from being able to predict his or her actions.

John Nash proved that a mixed strategy Nash Equilibrium always exists in any finite game.

Mixed Strategy

How to compute the mixed strategy?

Payoffs (rows: Manager’s options; columns: Worker’s options). The manager monitors with probability PM, and the worker works with probability PW.

                         Work (PW)         Shirk (1-PW)
Monitor (PM)             M: -1, W: 1       M: 1, W: -1
Don’t Monitor (1-PM)     M: 1, W: -1       M: -1, W: 1


Consider the manager’s strategy: if the worker best-responds with a mixed strategy, the manager must have made the worker indifferent between Work and Shirk!


Manager selects PM to make Worker indifferent between working and shirking (i.e., same expected payoff).

• Worker’s expected payoff from working: PM * (1) + (1 - PM) * (-1) = -1 + 2 * PM

• Worker’s expected payoff from shirking: PM * (-1) + (1 - PM) * (1) = 1 - 2 * PM

Worker’s expected payoff is the same from working and shirking if PM = 0.5. This expected payoff is 0 (-1 + 2 * 0.5 = 0 and 1 - 2 * 0.5 = 0). Therefore, the worker’s best response is to work, to shirk, or to randomize between working and shirking.



Worker selects PW to make Manager indifferent between monitoring and not monitoring.

• Manager’s expected payoff from monitoring: PW * (-1) + (1 - PW) * (1) = 1 - 2 * PW

• Manager’s expected payoff from not monitoring: PW * (1) + (1 - PW) * (-1) = -1 + 2 * PW

Manager’s expected payoff is the same from monitoring and not monitoring if PW = 0.5. Therefore, the manager’s best response is to monitor, to not monitor, or to randomize between monitoring and not monitoring.
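The two indifference conditions can be solved in closed form for any 2x2 game that has an interior mixed equilibrium. The sketch below is illustrative: it sets the two expected payoffs equal and solves the resulting linear equation, and the function name and matrix layout are assumptions rather than anything defined in the lecture.

```python
from fractions import Fraction

# Rows: Monitor, Don't Monitor. Columns: Work, Shirk.
MANAGER = [[-1, 1], [1, -1]]
WORKER = [[1, -1], [-1, 1]]


def indifference_probabilities(row_payoffs, col_payoffs):
    """Return (p, q) with p = P(first row) and q = P(first column).

    p makes the column player indifferent between the two columns, and
    q makes the row player indifferent between the two rows. Assumes a
    2x2 game whose denominators below are nonzero.
    """
    # Column player: p*a + (1-p)*c == p*b + (1-p)*d, solved for p.
    (a, b), (c, d) = col_payoffs
    p = Fraction(d - c, a - b - c + d)
    # Row player: q*a + (1-q)*b == q*c + (1-q)*d, solved for q.
    (a, b), (c, d) = row_payoffs
    q = Fraction(d - b, a - b - c + d)
    return p, q


p_monitor, p_work = indifference_probabilities(MANAGER, WORKER)
print("PM =", p_monitor, "PW =", p_work)
```

Exact fractions are used so the result can be compared with the hand calculation without rounding. Note that each player’s probability is determined by the other player’s payoffs.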



• Worker works with probability 0.5 and shirks with probability 0.5 (i.e., PW = 0.5)

• Manager monitors with probability 0.5 and doesn’t monitor with probability 0.5 (i.e., PM = 0.5)

Neither the Worker nor the Manager can increase their expected payoff by playing some other strategy (expected payoff for both is zero). They are both playing a best response to the other player’s strategy.
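The claim that neither player can gain by deviating can be checked numerically. The sketch below is illustrative (the helper name `expected_payoffs` is an assumption): it holds one player at the 0.5 mix and evaluates the other player’s expected payoff at several alternative probabilities.

```python
# Rows: Monitor, Don't Monitor. Columns: Work, Shirk.
MANAGER = [[-1, 1], [1, -1]]
WORKER = [[1, -1], [-1, 1]]


def expected_payoffs(p_monitor, p_work):
    """Return (manager, worker) expected payoffs for the given mixing probabilities."""
    row_mix = [p_monitor, 1 - p_monitor]
    col_mix = [p_work, 1 - p_work]
    cells = [(r, c) for r in range(2) for c in range(2)]
    manager = sum(row_mix[r] * col_mix[c] * MANAGER[r][c] for r, c in cells)
    worker = sum(row_mix[r] * col_mix[c] * WORKER[r][c] for r, c in cells)
    return manager, worker


# Hold the opponent at 0.5 and try several deviations for each player.
for p in (0.0, 0.25, 0.5, 1.0):
    print("manager monitors with", p, "->", expected_payoffs(p, 0.5)[0])
    print("worker works with", p, "->", expected_payoffs(0.5, p)[1])
```

Every deviation leaves the deviating player’s expected payoff at zero, which is why no deviation is profitable. By contrast, if the manager monitored with probability 0.8, the worker would earn a positive payoff by always working, so that mix would not be an equilibrium.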

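Mixed Strategy

What if the monitoring cost decreases?

Payoffs (rows: Manager’s options; columns: Worker’s options). With cheaper monitoring, the manager’s payoffs in the Monitor row change from -1 to -0.5 (Work) and from 1 to 1.5 (Shirk).

                         Work (PW)           Shirk (1-PW)
Monitor (PM)             M: -0.5, W: 1       M: 1.5, W: -1
Don’t Monitor (1-PM)     M: 1, W: -1         M: -1, W: 1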


Mixed Strategy Nash Equilibrium

• Worker works with probability 0.625 and shirks with probability 0.375 (i.e., PW = 0.625)

• Manager monitors with probability 0.5 and doesn’t monitor with probability 0.5 (i.e., PM = 0.5)

The decrease in monitoring costs does not change the probability that the manager monitors. However, it increases the probability that the worker works.
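The new value of PW can be checked by hand. The derivation below assumes that the -.5 and 1.5 shown on the slide replace the manager’s payoffs in the Monitor row (-1 becomes -0.5 under Work, 1 becomes 1.5 under Shirk). That reading is an inference from the slide, but it reproduces the stated 0.625.

```latex
\begin{align*}
\text{Monitor:} \quad & P_W(-0.5) + (1 - P_W)(1.5) = 1.5 - 2P_W \\
\text{Don't monitor:} \quad & P_W(1) + (1 - P_W)(-1) = -1 + 2P_W \\
\text{Indifference:} \quad & 1.5 - 2P_W = -1 + 2P_W
  \;\Longrightarrow\; P_W = \tfrac{2.5}{4} = 0.625
\end{align*}
```

The worker’s payoffs are unchanged, so the condition that pins down PM is the same as before and PM stays at 0.5.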
