Page 1: Game Theory

Human-aware Robotics

Game Theory
• 2019/04/22
• Announcement: slides for this lecture are here:
  http://www.public.asu.edu/~yzhan442/teaching/CSE591S19-HAR/Lectures/game.pdf

Slides are largely based on information from Mike Conlin

Page 2: Game theory

The study of strategic decision making. More formally, it is the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.

Tic-Tac-Toe: a zero-sum game

Page 3: Game theory

Game theory is, in essence, the science of strategic thinking, a way of making the best decision possible based on the way you expect other people to act. It was once the domain of Nobel Prize-winning economists and big thinkers on geopolitics, but now parents are getting in on the act. Though game theory assumes, as a technical matter, that its players are rational, it applies just as well to not-always-rational children.

A key lesson in game theory, says Barry Nalebuff, a professor at the Yale School of Management, is to understand the perspective of the other players. It isn't about what you would do in another person's shoes, he says; it's about what they would do in their shoes. "Good game theory," he says, "appreciates the quirks and features that make us unique and takes us as we are."

Page 4: Game theory applications

• Games, of course
• National Defense – Terrorism and Cold War
• Auctions
• Sports – Cards, Cycling, and race car driving
• Politics – positions taken and $$/time spent on campaigning
• Personnel management
• …

Page 5: Cake cutting

The party is over, and you're down to the last bit of cake. All three of your children want it. If you're familiar with game theory, you might think of the classic strategy in which one person cuts the cake and the other chooses the slice. But how do you divide it three ways without anyone throwing a fit?

You want a strategy where everyone feels that they are being equally treated!

Page 6: Game theory terminology

Simultaneous Move Game – Game in which each player makes decisions without knowledge of the other players’ decisions (e.g., the prisoner’s dilemma).

Sequential Move Game – Game in which one player makes a move after observing the other player’s move (e.g., Stackelberg game).

Strategy – In game theory, a decision rule that describes the actions a player will take at each decision point.

Normal Form Game – A representation of a game indicating the players, their possible strategies, and the payoffs resulting from alternative strategies.
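As a rough illustration of the normal form definition above, a two-player game can be stored as the players, their strategies, and a payoff table keyed by strategy pairs. The class name `NormalFormGame` and the toy coordination game are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalFormGame:
    """Two-player normal-form game: players, their strategies, and payoffs."""
    players: tuple      # (row player, column player)
    strategies: tuple   # (row strategies, column strategies)
    payoffs: dict       # (row strategy, column strategy) -> (row payoff, column payoff)

    def payoff(self, row_strategy, col_strategy):
        """Return the (row payoff, column payoff) pair for one strategy profile."""
        return self.payoffs[(row_strategy, col_strategy)]


# Hypothetical toy game (not from the slides): both players gain by matching.
coordination = NormalFormGame(
    players=("Row", "Column"),
    strategies=(("Left", "Right"), ("Left", "Right")),
    payoffs={
        ("Left", "Left"): (1, 1),
        ("Left", "Right"): (0, 0),
        ("Right", "Left"): (0, 0),
        ("Right", "Right"): (1, 1),
    },
)

print(coordination.payoff("Left", "Left"))
```

Keying the payoffs by strategy pairs mirrors the payoff tables used in the rest of the lecture: one cell per combination of choices.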

Page 7: Prisoner’s dilemma

                        Martha: Don’t Confess     Martha: Confess
Peter: Don’t Confess    P: 2 years, M: 2 years    P: 10 years, M: 1 year
Peter: Confess          P: 1 year, M: 10 years    P: 6 years, M: 6 years

What is Peter’s best option if Martha doesn’t confess?
What is Peter’s best option if Martha confesses?

Page 8: Prisoner’s dilemma

                        Martha: Don’t Confess     Martha: Confess
Peter: Don’t Confess    P: 2 years, M: 2 years    P: 10 years, M: 1 year
Peter: Confess          P: 1 year, M: 10 years    P: 6 years, M: 6 years

What is Martha’s best option if Peter doesn’t confess?
What is Martha’s best option if Peter confesses?

Page 9: Prisoner’s dilemma

                        Martha: Don’t Confess     Martha: Confess
Peter: Don’t Confess    P: 2 years, M: 2 years    P: 10 years, M: 1 year
Peter: Confess          P: 1 year, M: 10 years    P: 6 years, M: 6 years

Dominant Strategy – A strategy that results in the highest payoff to a player regardless of the opponent’s action.
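The dominant-strategy definition can be checked mechanically on the prisoner’s dilemma table. The sketch below assumes that a player’s utility is the negated prison sentence (fewer years is better) and tests for strict dominance; the function names are illustrative only.

```python
# Prison sentences in years as (Peter, Martha), copied from the table above.
YEARS = {
    ("Don't Confess", "Don't Confess"): (2, 2),
    ("Don't Confess", "Confess"): (10, 1),
    ("Confess", "Don't Confess"): (1, 10),
    ("Confess", "Confess"): (6, 6),
}
OPTIONS = ("Don't Confess", "Confess")


def utility(player, own, other):
    """Utility of `player` ("Peter" or "Martha") choosing `own` against `other`.

    Assumption: utility is the negated sentence, so fewer years is better.
    """
    if player == "Peter":
        return -YEARS[(own, other)][0]
    return -YEARS[(other, own)][1]


def strictly_dominant_strategy(player):
    """Return the option that beats every alternative against every opponent move, or None."""
    for candidate in OPTIONS:
        alternatives = [o for o in OPTIONS if o != candidate]
        if all(
            utility(player, candidate, opp) > utility(player, alt, opp)
            for opp in OPTIONS
            for alt in alternatives
        ):
            return candidate
    return None


for player in ("Peter", "Martha"):
    print(player, "->", strictly_dominant_strategy(player))
```

Under these assumptions the check reports Confess for both players, since confessing shortens each player’s own sentence whatever the other does.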

Page 10: Nash Equilibrium

A condition describing a set of strategies in which no player can improve her payoff by unilaterally changing her own strategy, given the other players’ strategies.

Every player is doing its best given the other player’s strategy (best response). For example, for an NE (A, B), B is a best response to the row agent’s strategy A, and A is a best response to the column agent’s strategy B.

Page 11: Prisoner’s dilemma

                        Martha: Don’t Confess     Martha: Confess
Peter: Don’t Confess    P: 2 years, M: 2 years    P: 10 years, M: 1 year
Peter: Confess          P: 1 year, M: 10 years    P: 6 years, M: 6 years

A pure strategy Nash Equilibrium here is (Confess, Confess)
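One way to confirm this is to enumerate every strategy pair and keep those from which neither player gains by deviating alone. This is a minimal sketch, again assuming utilities are negated prison years; `pure_nash_equilibria` is an illustrative helper, not part of the lecture.

```python
from itertools import product

OPTIONS = ("Don't Confess", "Confess")

# Utilities as (Peter, Martha): negated prison years from the table
# (an assumed encoding so that larger numbers are better).
UTILITY = {
    ("Don't Confess", "Don't Confess"): (-2, -2),
    ("Don't Confess", "Confess"): (-10, -1),
    ("Confess", "Don't Confess"): (-1, -10),
    ("Confess", "Confess"): (-6, -6),
}


def pure_nash_equilibria(utility, rows, cols):
    """Return all (row, col) profiles where neither player gains by deviating alone."""
    equilibria = []
    for r, c in product(rows, cols):
        # Row player compares against every alternative row, column held fixed.
        row_ok = all(utility[(r, c)][0] >= utility[(r2, c)][0] for r2 in rows)
        # Column player compares against every alternative column, row held fixed.
        col_ok = all(utility[(r, c)][1] >= utility[(r, c2)][1] for c2 in cols)
        if row_ok and col_ok:
            equilibria.append((r, c))
    return equilibria


equilibria = pure_nash_equilibria(UTILITY, OPTIONS, OPTIONS)
print(equilibria)
```

With the table’s numbers the only surviving profile is (Confess, Confess), even though both players would be better off at (Don’t Confess, Don’t Confess).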

Page 12: Nash Equilibrium

Theorem: Strictly dominated strategies cannot be a part of a Nash equilibrium.
• If all players have a strictly dominant strategy, there is a unique Nash equilibrium.
• Weakly dominated strategies may be part of Nash equilibria.
• A NE may not always be associated with a dominant strategy.

Does a dominant strategy always exist?

C dominates D
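The point about weakly dominated strategies is easy to miss, so here is a small sketch with a hypothetical 2x2 game (the payoffs are invented for illustration and do not appear in the slides). Top weakly dominates Bottom for the row player, yet (Bottom, Right) still passes the Nash check because no unilateral deviation strictly improves anyone’s payoff.

```python
ROWS = ("Top", "Bottom")
COLS = ("Left", "Right")

# Hypothetical payoffs as (row player, column player); not taken from the slides.
PAYOFF = {
    ("Top", "Left"): (1, 1),
    ("Top", "Right"): (0, 0),
    ("Bottom", "Left"): (0, 0),
    ("Bottom", "Right"): (0, 0),
}


def row_weakly_dominates(a, b):
    """True if row strategy `a` is never worse than `b` and sometimes strictly better."""
    diffs = [PAYOFF[(a, c)][0] - PAYOFF[(b, c)][0] for c in COLS]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)


def is_nash(r, c):
    """True if neither player can strictly gain by deviating alone from (r, c)."""
    row_ok = all(PAYOFF[(r, c)][0] >= PAYOFF[(r2, c)][0] for r2 in ROWS)
    col_ok = all(PAYOFF[(r, c)][1] >= PAYOFF[(r, c2)][1] for c2 in COLS)
    return row_ok and col_ok


print(row_weakly_dominates("Top", "Bottom"))  # Bottom is weakly dominated
print(is_nash("Bottom", "Right"))             # yet (Bottom, Right) is an equilibrium
print(is_nash("Top", "Left"))
```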

Page 13: BK vs McD

Each firm chooses whether to enter Tempe Marketplace. PM and PBK denote the payoffs to McDonald’s and Burger King.

                             BK: Enter              BK: Don’t Enter
McDonald’s: Enter            PM = -30, PBK = -40    PM = 50, PBK = 0
McDonald’s: Don’t Enter      PM = 0, PBK = 40       PM = 0, PBK = 0

Is there a dominant strategy for BK?
Is there a dominant strategy for McD?

Page 14: BK vs McD

                             BK: Enter              BK: Don’t Enter
McDonald’s: Enter            PM = -30, PBK = -40    PM = 50, PBK = 0
McDonald’s: Don’t Enter      PM = 0, PBK = 40       PM = 0, PBK = 0

Is there a pure strategy Nash Equilibrium?

Page 15: BK vs McD

                             BK: Enter              BK: Don’t Enter
McDonald’s: Enter            PM = -30, PBK = -40    PM = 50, PBK = 0
McDonald’s: Don’t Enter      PM = 0, PBK = 40       PM = 0, PBK = 0

Yes, there are two: (Enter, Don’t Enter) and (Don’t Enter, Enter), written as (McDonald’s choice, Burger King’s choice).
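Both questions about this game (is there a dominant strategy, and which pure profiles are equilibria) can be answered from best responses. The sketch below is one possible way to do it; the helper names are illustrative, and "dominant" is tested in the weak sense of being a best response to every opponent option.

```python
from itertools import product

OPTIONS = ("Enter", "Don't Enter")

# Payoffs as (McDonald's, Burger King), indexed by
# (McDonald's option, Burger King option), from the table above.
PAYOFF = {
    ("Enter", "Enter"): (-30, -40),
    ("Enter", "Don't Enter"): (50, 0),
    ("Don't Enter", "Enter"): (0, 40),
    ("Don't Enter", "Don't Enter"): (0, 0),
}


def best_responses(player, opponent_option):
    """Options of `player` (0 = McDonald's, 1 = Burger King) with the highest
    payoff against a fixed opponent option."""
    def payoff(own):
        profile = (own, opponent_option) if player == 0 else (opponent_option, own)
        return PAYOFF[profile][player]

    best = max(payoff(o) for o in OPTIONS)
    return {o for o in OPTIONS if payoff(o) == best}


def has_dominant_strategy(player):
    """A dominant strategy must be a best response to every opponent option."""
    common = set(OPTIONS)
    for opp in OPTIONS:
        common &= best_responses(player, opp)
    return bool(common)


# A profile is a pure equilibrium when each choice is a best response to the other.
pure_equilibria = [
    (m, b)
    for m, b in product(OPTIONS, OPTIONS)
    if m in best_responses(0, b) and b in best_responses(1, m)
]

print(has_dominant_strategy(0), has_dominant_strategy(1))
print(pure_equilibria)
```

Neither firm has a dominant strategy here, because each firm’s best choice flips with the rival’s choice, yet two pure equilibria remain.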

Page 16: Nash Equilibrium

A dominant strategy is not needed for a Nash Equilibrium to exist.

Is there always a pure strategy Nash Equilibrium?

Page 17: Worker Monitoring

                          Worker: Work    Worker: Shirk
Manager: Monitor          M: -1, W: 1     M: 1, W: -1
Manager: Don’t Monitor    M: 1, W: -1     M: -1, W: 1

Is there a dominant strategy for the worker?

Is there a dominant strategy for the manager?

Page 18: Worker Monitoring

                          Worker: Work    Worker: Shirk
Manager: Monitor          M: -1, W: 1     M: 1, W: -1
Manager: Don’t Monitor    M: 1, W: -1     M: -1, W: 1

Is there a pure strategy Nash Equilibrium in this game?
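A quick way to answer this is to test every cell of the table for stability. The sketch below uses the payoffs from the table; the helper names are illustrative. Best responses are unique in this game, so a plain `max` is enough.

```python
MANAGER = ("Monitor", "Don't Monitor")
WORKER = ("Work", "Shirk")

# Payoffs as (Manager, Worker), indexed by (Manager option, Worker option).
PAYOFF = {
    ("Monitor", "Work"): (-1, 1),
    ("Monitor", "Shirk"): (1, -1),
    ("Don't Monitor", "Work"): (1, -1),
    ("Don't Monitor", "Shirk"): (-1, 1),
}


def manager_best_response(worker_option):
    """Manager's payoff-maximizing option against a fixed worker option."""
    return max(MANAGER, key=lambda m: PAYOFF[(m, worker_option)][0])


def worker_best_response(manager_option):
    """Worker's payoff-maximizing option against a fixed manager option."""
    return max(WORKER, key=lambda w: PAYOFF[(manager_option, w)][1])


pure_equilibria = []
for m in MANAGER:
    for w in WORKER:
        stable = m == manager_best_response(w) and w == worker_best_response(m)
        print((m, w), "stable" if stable else "someone wants to deviate")
        if stable:
            pure_equilibria.append((m, w))

print(pure_equilibria)
```

Every cell has a player who wants to switch, so the list comes back empty: the game has no pure strategy Nash Equilibrium, which motivates mixed strategies.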

Page 19: Mixed Strategy Nash Equilibrium

Definition: A mixed strategy is a strategy whereby a player randomizes over two or more available actions in order to keep rivals from being able to predict his or her actions.

John Nash proved that a mixed strategy Nash Equilibrium always exists in any finite game.

Page 20: Mixed Strategy Nash Equilibrium

How to compute the mixed strategy?

Page 21: Mixed Strategy

PM is the probability that the manager monitors; PW is the probability that the worker works.

                                 Worker: Work (PW)    Worker: Shirk (1-PW)
Manager: Monitor (PM)            M: -1, W: 1          M: 1, W: -1
Manager: Don’t Monitor (1-PM)    M: 1, W: -1          M: -1, W: 1

How to compute the mixed strategy?

Page 22: Mixed Strategy

                                 Worker: Work (PW)    Worker: Shirk (1-PW)
Manager: Monitor (PM)            M: -1, W: 1          M: 1, W: -1
Manager: Don’t Monitor (1-PM)    M: 1, W: -1          M: -1, W: 1

Consider the manager’s strategy: if the worker best-responds with a mixed strategy, the manager must have made the worker indifferent between Work and Shirk!

Page 23: Mixed Strategy

Manager selects PM to make Worker indifferent between working and shirking (i.e., same expected payoff).

• Worker’s expected payoff from working:
  PM * (1) + (1 - PM) * (-1) = -1 + 2 * PM

• Worker’s expected payoff from shirking:
  PM * (-1) + (1 - PM) * (1) = 1 - 2 * PM

Worker’s expected payoff is the same from working and shirking if PM = 0.5. This expected payoff is 0 (-1 + 2 * 0.5 = 0 and 1 - 2 * 0.5 = 0). Therefore, the worker’s best response is to work, to shirk, or to randomize between working and shirking.
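The indifference condition for the worker can also be solved numerically. This sketch uses exact fractions and the worker’s payoffs from the table; because both expected payoffs are linear in PM, the crossing point follows from evaluating their gap at PM = 0 and PM = 1. The function names are illustrative.

```python
from fractions import Fraction

# Worker's payoffs from the table, indexed by (Manager option, Worker option).
WORKER_PAYOFF = {
    ("Monitor", "Work"): 1,
    ("Monitor", "Shirk"): -1,
    ("Don't Monitor", "Work"): -1,
    ("Don't Monitor", "Shirk"): 1,
}


def worker_expected_payoff(action, pm):
    """Worker's expected payoff from `action` when the manager monitors with probability pm."""
    return (pm * WORKER_PAYOFF[("Monitor", action)]
            + (1 - pm) * WORKER_PAYOFF[("Don't Monitor", action)])


def indifference_pm():
    """Solve work(pm) == shirk(pm); the gap between the two is linear in pm."""
    gap_at_0 = (worker_expected_payoff("Work", Fraction(0))
                - worker_expected_payoff("Shirk", Fraction(0)))
    gap_at_1 = (worker_expected_payoff("Work", Fraction(1))
                - worker_expected_payoff("Shirk", Fraction(1)))
    # Root of gap(pm) = gap_at_0 + pm * (gap_at_1 - gap_at_0).
    return gap_at_0 / (gap_at_0 - gap_at_1)


pm = indifference_pm()
print(pm, worker_expected_payoff("Work", pm), worker_expected_payoff("Shirk", pm))
```

The result agrees with the derivation above: PM = 1/2, with an expected payoff of 0 for either action.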

Page 24: Mixed Strategy

Worker selects PW to make Manager indifferent between monitoring and not monitoring.

• Manager’s expected payoff from monitoring:
  PW * (-1) + (1 - PW) * (1) = 1 - 2 * PW

• Manager’s expected payoff from not monitoring:
  PW * (1) + (1 - PW) * (-1) = -1 + 2 * PW

Manager’s expected payoff is the same from monitoring and not monitoring if PW = 0.5. Therefore, the manager’s best response is to monitor, to not monitor, or to randomize between monitoring and not monitoring.
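To see why PW = 0.5 is the only value that leaves the manager willing to randomize, it helps to tabulate the manager’s two expected payoffs over a few values of PW. The grid of probabilities below is an arbitrary choice for illustration.

```python
def manager_expected_payoffs(pw):
    """Return (monitor, do not monitor) expected payoffs when the worker works with probability pw."""
    monitor = pw * (-1) + (1 - pw) * 1
    dont_monitor = pw * 1 + (1 - pw) * (-1)
    return monitor, dont_monitor


def manager_best_response(pw):
    """Name the manager's better action, or report indifference when the payoffs tie."""
    monitor, dont_monitor = manager_expected_payoffs(pw)
    if monitor > dont_monitor:
        return "Monitor"
    if monitor < dont_monitor:
        return "Don't Monitor"
    return "Indifferent"


# Arbitrary illustrative grid; these values are exactly representable as floats.
for pw in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(pw, manager_expected_payoffs(pw), manager_best_response(pw))
```

Below 0.5 the manager strictly prefers to monitor, above 0.5 the manager strictly prefers not to, and only at 0.5 is any mix a best response.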

Page 25: Mixed Strategy

• Worker works with probability 0.5 and shirks with probability 0.5 (i.e., PW = 0.5)

• Manager monitors with probability 0.5 and doesn’t monitor with probability 0.5 (i.e., PM = 0.5)

Neither the Worker nor the Manager can increase their expected payoff by playing some other strategy (expected payoff for both is zero). They are both playing a best response to the other player’s strategy.
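The claim that neither player can do better can be verified directly. Since a player’s expected payoff is linear in her own mixing probability, it is enough to check the two pure deviations for each player. A minimal sketch, with illustrative names:

```python
from fractions import Fraction

# Payoffs as (Manager, Worker), indexed by (Manager option, Worker option).
PAYOFF = {
    ("Monitor", "Work"): (-1, 1),
    ("Monitor", "Shirk"): (1, -1),
    ("Don't Monitor", "Work"): (1, -1),
    ("Don't Monitor", "Shirk"): (-1, 1),
}

PM = Fraction(1, 2)  # probability that the manager monitors
PW = Fraction(1, 2)  # probability that the worker works


def expected_payoffs(pm, pw):
    """Expected (Manager, Worker) payoffs when both randomize independently."""
    weights = {
        ("Monitor", "Work"): pm * pw,
        ("Monitor", "Shirk"): pm * (1 - pw),
        ("Don't Monitor", "Work"): (1 - pm) * pw,
        ("Don't Monitor", "Shirk"): (1 - pm) * (1 - pw),
    }
    manager = sum(weight * PAYOFF[cell][0] for cell, weight in weights.items())
    worker = sum(weight * PAYOFF[cell][1] for cell, weight in weights.items())
    return manager, worker


equilibrium = expected_payoffs(PM, PW)
# Pure deviations: probability 0 or 1 on the first action, opponent's mix held fixed.
manager_deviations = [expected_payoffs(Fraction(p), PW)[0] for p in (0, 1)]
worker_deviations = [expected_payoffs(PM, Fraction(p))[1] for p in (0, 1)]

print(equilibrium, manager_deviations, worker_deviations)
```

Every deviation yields the same expected payoff of zero, so no unilateral change helps either player.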

Page 26: Mixed Strategy

What if the monitoring cost decreases? The manager’s payoffs in the Monitor row change from -1 to -0.5 and from 1 to 1.5:

                                 Worker: Work (PW)    Worker: Shirk (1-PW)
Manager: Monitor (PM)            M: -0.5, W: 1        M: 1.5, W: -1
Manager: Don’t Monitor (1-PM)    M: 1, W: -1          M: -1, W: 1

Page 27: Mixed Strategy Nash Equilibrium

• Worker works with probability 0.625 and shirks with probability 0.375 (i.e., PW = 0.625)

• Manager monitors with probability 0.5 and doesn’t monitor with probability 0.5 (i.e., PM = 0.5)

The decrease in monitoring costs does not change the probability that the manager monitors. However, it increases the probability that the worker works.
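These probabilities can be reproduced with the same indifference argument. The sketch below assumes that the slide’s "-.5" and "1.5" annotations replace the manager’s two payoffs in the Monitor row (a reading that is consistent with PW = 0.625); the worker’s payoffs are unchanged. The helper `indifference_probability` is illustrative.

```python
from fractions import Fraction

# Manager's payoffs after the monitoring cost drops. Assumption: the Monitor row
# rises by 0.5, matching the "-.5" and "1.5" annotations on the slide.
MANAGER_PAYOFF = {
    ("Monitor", "Work"): Fraction(-1, 2),
    ("Monitor", "Shirk"): Fraction(3, 2),
    ("Don't Monitor", "Work"): Fraction(1),
    ("Don't Monitor", "Shirk"): Fraction(-1),
}
# Worker's payoffs are the same as in the original game.
WORKER_PAYOFF = {
    ("Monitor", "Work"): 1,
    ("Monitor", "Shirk"): -1,
    ("Don't Monitor", "Work"): -1,
    ("Don't Monitor", "Shirk"): 1,
}


def indifference_probability(a, b, c, d):
    """Solve p*a + (1-p)*b == p*c + (1-p)*d for p."""
    return Fraction(d - b) / Fraction(a - b - c + d)


# The worker picks PW so the manager is indifferent between Monitor and Don't Monitor.
pw = indifference_probability(
    MANAGER_PAYOFF[("Monitor", "Work")], MANAGER_PAYOFF[("Monitor", "Shirk")],
    MANAGER_PAYOFF[("Don't Monitor", "Work")], MANAGER_PAYOFF[("Don't Monitor", "Shirk")],
)
# The manager picks PM so the worker is indifferent between Work and Shirk.
pm = indifference_probability(
    WORKER_PAYOFF[("Monitor", "Work")], WORKER_PAYOFF[("Don't Monitor", "Work")],
    WORKER_PAYOFF[("Monitor", "Shirk")], WORKER_PAYOFF[("Don't Monitor", "Shirk")],
)

print(float(pw), float(pm))
```

PM depends only on the worker’s payoffs, which did not change, so it stays at 0.5; PW depends on the manager’s payoffs, so the cheaper monitoring shifts it to 0.625.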