Transcript of "Opponent Modeling in Interesting Adversarial Environments," Brett Borghetti, 22 Jul 2008.

Page 1: Opponent Modeling in Interesting Adversarial Environments (Brett Borghetti, 22 Jul 2008)

Page 2: Opponent Modeling in Interesting Adversarial Environments

• Opponent: An entity known to be attempting to maximize its own well-being in an environment where doing so usually implies minimizing ours.

• Modeling: Building a capability to predict something about a target based on observations of the target in its environment.

• Interesting Environments: Where some agents are unlikely to have access to, or desire to use, perfect equilibrium solutions.

Page 3: Motivation

• Systems with opponent modeling capability can learn to beat approximations of equilibrium strategies.

• But leaving equilibrium play opens an agent to exploitation.

• So we must decide whether to model our opponent:
  – How well can we do in a given environment?
  – How well are we doing with a given model?
  – What can we do to improve if we don't know our opponent yet?

Page 4: Overview

• Contributions and Domains

• Related Work

• Opponent Models and Predictions

• Measuring Model Quality

• The Environment-Value of a Model

• Improving Opponent Models

• Conclusions & Future Work

Page 5: Contributions

• Developed a fast-learning model for poker and showcased a method for quantifying the contribution made by the learning component.

• Created a new domain-independent measure of performance prediction quality for opponent models.

• Defined the environment-value of an opponent model and showed how to calculate it.

• Presented new techniques for improving models when the opponents are unknown and observations of their behavior are unavailable.

Page 6: Domains of Interest

Domain                           Information  Action   Length  Use
Simultaneous-Bet High Card Draw  Hidden       Simult.  Single  Perf. Bounds
Full Scale Poker                 Hidden       Seq.     Ext.    Measurement; Improving Models
Simultaneous-Move Strategy Game  Perfect      Simult.  Ext.    Perf. Bounds

Page 7: Related Work

• Agent Interaction
  – Game Theory
  – Opponent Modeling
• Opponent Model Quality
  – Desired Properties
  – Measuring Quality

Page 8: Related Work: Game Theory

Rationality and Equilibrium
– Nash; Aumann; Brandenburger: Nash Eq. & Correlated Eq.; unbounded rationality
– Simon: Bounded Rationality

Representing Environments as Games
– Nash: two-person games, non-cooperative games
– Koller & Megiddo: Mapping environments to extensive-form games

Solving Games
– Koller & Megiddo: Direct solution for extensive-form games
– Allis: Levels of solution; solving Connect 4
– Schaeffer: solving checkers

Intractable Solutions
– Billings; Zinkevich: State Abstraction
– Gilpin & Sandholm: Automated Abstraction

Page 9: Related Work: Opponent Modeling

Representations
– Carmel & Markovitch: DFA
– Gal & Pfeffer: MAID, NID
– Steffens: Case-Based Reasoning
– Riley & Veloso: Archetypes

Learning
– Bowling & Veloso: WoLF (GD)
– Jensen: ELPH (pred. entropy)
– Hoehn: Exp3

Using
– Carmel & Markovitch: M*
– Luckhardt & Irani: max^n
– Sturtevant & Bowling: Soft-max^n

Page 10: Related Work: Model Quality

Desired properties
– Bowling & Veloso: Rationality, Convergence
– Jensen: Fast reaction (tracking)

Performance measurements
– Carmel & Markovitch: Error
– Rogowski: Accuracy
– Blaylock & Allen; Cox & Kerkez: Precision and Recall
– Davidson; Sukthankar & Sycara: Confusion Matrix

Page 11: Related Work: What's Missing?

1. Model prediction quality is usually measured only as the number of correct predictions made.

2. No existing work bounds the performance of opponent models as a function of the game.

3. Dedicated improvement techniques for opponent modeling are lacking.

Page 12: Definition of Opponent Model

• Two types of models:
  – Predict a future action based on the current state
  – Explain a past state based on the current action

• Either type can be expressed as a probability distribution over possible actions or states

Page 13: Prediction

• The opponent is going to make a decision
• The model's job is to predict the conditional probability that the opponent will take each action

[Figure: a poker decision node with the opponent's possible actions F (fold), C (call), and R (raise), each with an unknown probability to be predicted]

Page 14: Calculating the Expected Value (EV) of a Branch

• If we are deciding what to do next, we would like to know the EV of each action
• EV = Σ_i P_i V_i, where P_i is the model's predicted probability of branch i and V_i is the value of that branch
• EV depends on our opponent

[Figure: extensive-form betting tree with raise (r), call (c), and fold (f) branches, pot sizes growing from 1.5b to 8b in bets]
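
To make the branch EV computation concrete, here is a minimal sketch (not from the talk; the action names and payoff numbers are illustrative): the model supplies P_i, a predicted distribution over the opponent's actions, and V_i is our payoff if the opponent takes action i.

```python
def expected_value(prediction: dict[str, float], payoffs: dict[str, float]) -> float:
    """EV = sum_i P_i * V_i over the opponent's possible actions."""
    return sum(p * payoffs[action] for action, p in prediction.items())

# Hypothetical numbers: the model predicts fold/call/raise probabilities,
# and each outcome is worth the listed number of bets to us.
prediction = {"fold": 0.2, "call": 0.5, "raise": 0.3}
payoffs = {
    "bet":   {"fold": +1.5, "call": +0.5, "raise": -1.0},
    "check": {"fold":  0.0, "call":  0.0, "raise": -0.5},
}

# Because EV depends on the opponent, the best action depends on the model.
evs = {a: expected_value(prediction, v) for a, v in payoffs.items()}
print(max(evs, key=evs.get), evs)   # bet {'bet': 0.25, 'check': -0.15}
```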

Page 15: Measuring Model Quality

• Total Performance: How well did the agent perform overall?

• Differential Performance: How much better is the agent when it can learn?

• Prediction Performance: How well did the model predict the opponent?

Page 16: Measuring Model Quality: Total Performance Measurement

• PokeMinnLimit1 vs. the top 3 agents (2007), all equilibrium/fixed-strategy opponents
• Payouts smoothed over 10 matches of 3000 hands each
• Evidence of learning against very strong fixed strategies

Page 17: Measuring Model Quality: Differential Performance

Page 18: Measuring Model Quality: Prediction Performance

• Expected value calculations require correct distributions over actions
• If our predictions are distributions over opponent behaviors, current measures are not well suited:
  – Precision/recall don't make sense when the number of underlying classes is large
• We need a scalar measure of the distance between two distributions

Page 19: Measuring Model Quality: Prediction Performance

• Desired traits of a distribution-distance measurement:
  – Result = 0 if the distributions are the same
  – Result >= 0 for all pairs of distributions
  – Result is symmetric
  – Result obeys the triangle inequality
  – Result obeys the transitive property
  – Result is bounded

Page 20: Measuring Model Quality: Prediction Performance

Given distributions p (prediction) and q (truth) over a set of choices X, the Jensen-Shannon Divergence is

JSD(p, q) = (1/2) D(p || m) + (1/2) D(q || m), where m = (1/2)(p + q),

and D is relative entropy (KL divergence): D(p || q) = Σ_{x in X} p(x) log(p(x) / q(x)).
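
A minimal sketch of these two definitions in code (using base-2 logarithms, under which the JSD is bounded by 1, matching the desired traits on the previous slide); the example distributions are illustrative.

```python
import math

def kl(p, q):
    """Relative entropy D(p || q) = sum_x p(x) * log2(p(x) / q(x))."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: 0.5*D(p||m) + 0.5*D(q||m) with m = (p+q)/2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Prediction vs. truth over the opponent's actions (fold, call, raise):
print(jsd([0.2, 0.5, 0.3], [0.1, 0.6, 0.3]))  # small divergence
print(jsd([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # maximal divergence = 1.0
```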

Page 21: Measuring Model Quality: Prediction Performance

• Weighted Prediction Divergence: for all nodes k in the set N, given each node's JSD and a weighting scheme W, compute the weighted prediction divergence for the set of nodes (a sketch follows below)
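
The WPD formula itself did not survive extraction; this sketch assumes the natural reading of the verbal definition above, a weight-normalized sum of the per-node JSD values (normalization keeps the result on the same bounded scale as JSD).

```python
def wpd(divergences: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed form: WPD = sum_k W_k * JSD_k / sum_k W_k over nodes k in N."""
    total = sum(weights.values())
    return sum(weights[k] * d for k, d in divergences.items()) / total

# Example with frequency weighting (one of the schemes on the next slide):
# nodes visited more often contribute more to the overall score.
divergences = {"preflop": 0.05, "flop": 0.12, "turn": 0.30, "river": 0.41}
visits      = {"preflop": 660,  "flop": 410,  "turn": 280,  "river": 190}
print(wpd(divergences, visits))   # ~0.16
```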

Page 22: Measuring Model Quality: Prediction Performance

• Weighting Schemes
  – Uniform: when no particular biases are required, set all weights to 1
  – Frequency weighting: when some nodes occur more frequently, weight them by their frequency of occurrence
  – Utility weighting: weight by the relative value of outcomes for trajectories passing through the node
  – Risk (reward) weighting: multiplying a node's frequency by its utility yields the node's risk

Page 23: Measuring Model Quality: Prediction Performance

• Empirical results from poker
  – Collected data from PokeMinnLimit1 vs. HyperboreanLimitEq1 from the 2007 Computer Poker Competition
  – Compared two opponent models' predictions to truth data
  – Training set size: 20 observations/hand; test set size: 660 observations/hand

Page 24: Measuring Model Quality: Prediction Performance

• While the learning model performed well during the pre-flop and flop portions of the game, it grew worse over time during the turn and river.

Page 25: Measuring Model Quality: Discussion

• While the agent-level performance of the learning model was better than that of the fixed model, WPD analysis reveals quality issues in the learning model's predictions.

• WPD paves the way for reasoning over which types of models might do well during a specific interaction.

Page 26: Bounding Performance

• Goal: determine the relationship between the structure of an environment and the potential value of an opponent model within that environment
  – Helps us decide whether, and what, to model

• Method (illustrated in the toy example below)
  – Determine equilibrium behavior (and value) in the original game
  – Transform the environment to provide perfect predictions
  – Recalculate the behavior and value in the new game

• Domains
  – Simultaneous-Bet High Card Draw
  – Simultaneous-Move Strategy Game
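
As a toy illustration of the transform-and-recompute method (not one of the talk's domains), matching pennies makes the arithmetic checkable by hand: the simultaneous game has equilibrium value 0, while the transformed game in which the opponent's action is revealed first has value +1, so a perfect action oracle is worth exactly 1 per play.

```python
# Row player's payoffs in matching pennies: +1 on a match, -1 otherwise.
A = [[+1, -1],
     [-1, +1]]

def value_revealed(A):
    """Transformed game: the opponent's action (column) is revealed first, we
    best-respond, and the opponent picks the column minimizing our response."""
    return min(max(row[c] for row in A) for c in range(len(A[0])))

def value_2x2_mixed(A):
    """Equilibrium value of a 2x2 zero-sum game with no saddle point:
    v = (a*d - b*c) / (a + d - b - c)."""
    (a, b), (c, d) = A
    return (a * d - b * c) / (a + d - b - c)

v_original = value_2x2_mixed(A)   # 0.0: equilibrium of the simultaneous game
v_oracle = value_revealed(A)      # 1:   equilibrium once actions are revealed
print("value of a perfect action model:", v_oracle - v_original)   # 1.0
```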

Page 27: Bounding Performance: Simultaneous-Bet High Card Draw

• High Card Draw with simultaneous betting: Ante = $1, Bet = $1
• Strategy = a threshold card (opponent on the X axis, me on the Y axis)
• Expected earnings from our perspective (Z axis)
• Pure Strategy Equilibrium (PSE) at (EIGHT, EIGHT)

[Figure: contour plot of my payouts under normal game circumstances; both players' threshold strategies range over TWO through ACE and the payout contours run from -0.75 to +0.75]

Page 28: Bounding Performance: Simultaneous-Bet High Card Draw

• When we know the opponent's state (card), what strategy should we choose?
  – If we have the higher card, bet; otherwise follow our strategy
• PSE = (ACE, EIGHT)
• Guaranteed at least $0.24 per game

[Figures: best-response payouts of each player when I can see my opponent's cards (Ante = $1, Bet = $1), showing a payout of $0.2443 at thresholds EIGHT and ACE; and the corresponding payout contours with the guaranteed bound marked]

Page 29: Bounding Performance: Simultaneous-Bet High Card Draw


• When we know the opponent's policy, what strategy should we use?
• If the opponent's policy is EIGHT, our payoff = $0.00
• But our payoff could be higher if the opponent is not using that policy

[Figures: best-response payouts of each player when I can see my opponent's strategy (Ante = $1, Bet = $1), with payout $0 against a threshold of EIGHT; and the normal-game payout contours with the guaranteed bound marked]

Page 30: Bounding Performance: Simultaneous-Bet High Card Draw


• If we know the opponent's next action, what strategy should we choose?
  – Equivalent to turn-based betting
• No PSE
• Guaranteed at least $0.10 if I play a fixed strategy (SEVEN)
  – A mixed strategy (SIX and SEVEN, 50% each) guarantees $0.1124
• New expected value: -0.78 < EV < +0.85

[Figures: best-response payouts of each player when I can see my opponent's action (Ante = $1, Bet = $1), with marked payouts of $0.098 at SEVEN and $0.8507 at ACE; and the payout contours given that the opponent's action is to bet, with the guaranteed bound marked]

Page 31: Bounding Performance: Simultaneous-Bet High Card Draw

• Normal expectation is ±$0.78
• If we have perfect knowledge of the opponent's:
  – State: guaranteed at least $0.24
  – Policy: guaranteed at least $0.00
  – Action: guaranteed at least $0.10 ($0.11 mixed)
• This helps us make modeling decisions

Page 32: Bounding Performance: Simultaneous-Bet High Card Draw

• Examine the class of games formed when:
  – Ante = 1 - Bet
  – Bet ranges over [0, 2]
• Where do the guarantees occur?

[Figure: Perfect Model Guarantee Comparisons; my guaranteed payout under the State, Policy, and Action oracles as the bet amount (= 1 - ante) ranges over [0, 2]]

Page 33: Bounding Performance: Simultaneous-Move Strategy Game

• 2-player, perfect-information environment
  – 4 locations
  – 3-unit cap per player, 4-base limit per player
• Multiple phases per turn: Order / Move / Battle / Generate / Destroy / Build
• Solvable with an iterated equilibrium calculation (2136 state isomorphisms with up to 2304 joint action sets per state)

Page 34: Bounding Performance: Simultaneous-Move Strategy Game

• Transform the environment into a new environment where the opponent's action is always revealed
  – Equivalent to a turn-based game
• Calculate the equilibrium in the new game
• Subtract the two values to obtain the performance improvement attributable to the model

Page 35: Bounding Performance: Simultaneous-Move Strategy Game

• When the oracle is used on every turn, our improvement is significant
  – Fair starting state mean improvement = 0.0808
  – All-state mean improvement = 0.0692

Page 36: Bounding Performance: Simultaneous-Move Strategy Game

• When the oracle is used only once, our improvement is small
  – Fair starting state mean improvement = 0.0041
  – All-state mean improvement = 0.0129

Page 37: Improving Opponent Models

• Goal: Formalize methods for assisting developers wishing to improve their modeling algorithms when actual opponents are unknown or unavailable for testing.

• Method: Use co-evolution and near-self-play to generate strong agents

Page 38: Improving Opponent Models: Tools

• Instrumentation
  – Identify and track measurable performance points where quality can be assessed (e.g., WPD)
• Advanced parametric control
  – Identify p key features of the code (functions or variable settings) and rewrite the software so that the features can be chosen at run time

Page 39: Improving Opponent Models: Techniques

• How do we reduce the size of the search?
  – Preprocess the k-parameter space to determine the parameters with the most effect on the outcome
• Rank-order performance
  – Test all pairs of parameter settings after preprocessing (a sketch follows below)
• These techniques are amenable to automation
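
The rank-ordering step lends itself to a short sketch. This is a hedged illustration, not code from the thesis: `play_match` is a hypothetical stand-in for running two parameterized agents against each other in the target environment.

```python
import itertools
from collections import defaultdict

def rank_settings(settings, play_match):
    """Round-robin every pair of surviving parameter settings and rank them
    by total payoff. play_match(a, b) returns the payoff to setting a
    (zero-sum, so setting b is credited with the negation)."""
    score = defaultdict(float)
    for i, j in itertools.combinations(range(len(settings)), 2):
        payoff = play_match(settings[i], settings[j])
        score[i] += payoff
        score[j] -= payoff
    # Best-performing settings first.
    return sorted(range(len(settings)), key=score.__getitem__, reverse=True)
```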

Page 40: Future Work

• Equilibrium deviance detection (JSD)
  – Detect when the opponent is not playing the correct equilibrium distribution
• Automatic performance improvement
  – Genetic algorithm parameter search (offline)
  – Metareasoning (online)

Page 41: Future Work (cont.)

• Bounds calculation for model value
  – Private oracles
  – Distribution of opponent types
  – Non-zero-sum environments
• Non-stationary opponents (anticipation)

Page 42: Questions?

Page 43: Precision and Recall

• Precision = correctly predicted / total predictions

– What percentage of my predictions were correct?

• Recall = correctly predicted / total in that prediction class

– What percentage of the total class was I able to correctly classify?

• F-measure = 2*P*R/(P+R)
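
These definitions translate directly into code; the counts below are an invented single-class example.

```python
def precision(correct: int, total_predictions: int) -> float:
    """What percentage of my predictions were correct?"""
    return correct / total_predictions

def recall(correct: int, class_size: int) -> float:
    """What percentage of the total class did I correctly classify?"""
    return correct / class_size

def f_measure(p: float, r: float) -> float:
    return 2 * p * r / (p + r)

# Example: 30 of 40 "raise" predictions were right, out of 50 actual raises.
p, r = precision(30, 40), recall(30, 50)
print(p, r, f_measure(p, r))   # 0.75 0.6 0.666...
```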

Page 44: Simultaneous-Turn Strategy Game: Iterated Equilibrium Calculation

• Calculate the set of transition probabilities from every state to every other state, conditional on the orders given by each player
• Initialize the values of terminal states to +1, 0, or -1 for win/tie/loss; allow all other state values to float (initialized to zero)
• For each subgame of length L = 1..N, compute:
  – the mixed NE distribution over orders for each player in each state
  – the value of playing the NE for each player, using game (L-1)'s values for the successor states
• Repeat until convergence (a sketch follows below)
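
A minimal sketch of this iteration for a generic simultaneous-move, zero-sum game follows. The game description (states, terminal values, and the payoff function giving expected successor values for each joint pair of orders) is an assumed input, not the strategy game from the talk; each state's one-shot matrix game is solved with the standard minimax linear program via scipy.optimize.linprog.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value (to the row player) of zero-sum matrix game A, via the LP:
    maximize v subject to (A^T x)_j >= v for all columns j, sum(x) = 1, x >= 0."""
    m, n = A.shape
    c = np.concatenate([np.zeros(m), [-1.0]])      # minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])      # v - (A^T x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # sum(x) = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]

def iterate_equilibrium(states, terminal, payoff_matrix, tol=1e-6):
    """states: iterable of state ids; terminal: dict state -> +1/0/-1;
    payoff_matrix(s, V): matrix of expected successor values under V for each
    joint pair of orders in state s. Terminal values are fixed; all other
    values start at zero and float until convergence."""
    V = {s: float(terminal.get(s, 0.0)) for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in terminal:
                continue
            v_new = matrix_game_value(payoff_matrix(s, V))
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V
```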

Page 45: Metareasoning Goal

• For each prediction required within an expected value calculation, estimate:
  – the prediction quality and cost of each model at this node
• Choose the model for each prediction (see the sketch below):
  – lowest cost that meets the minimum quality, or
  – highest quality within a fixed cost budget
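
A hedged sketch of the two selection rules; the model entries and numbers are illustrative (quality could be, say, 1 - WPD, and cost a per-prediction time estimate).

```python
def cheapest_meeting_quality(models, min_quality):
    """Lowest-cost model whose estimated quality meets the minimum."""
    ok = [m for m in models if m["quality"] >= min_quality]
    return min(ok, key=lambda m: m["cost"]) if ok else None

def best_within_budget(models, budget):
    """Highest-quality model whose estimated cost fits the budget."""
    ok = [m for m in models if m["cost"] <= budget]
    return max(ok, key=lambda m: m["quality"]) if ok else None

models = [
    {"name": "frequency table", "quality": 0.60, "cost": 1},
    {"name": "learned model",   "quality": 0.85, "cost": 20},
    {"name": "equilibrium",     "quality": 0.70, "cost": 5},
]
print(cheapest_meeting_quality(models, 0.65)["name"])  # equilibrium
print(best_within_budget(models, 10)["name"])          # equilibrium
```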

Page 46: Proposed System: Overview

[Diagram: three layers: Ground Level (Doing), Object Level (Reasoning), and Meta-Level (Metareasoning); perception and action selection link the ground and object levels, while monitoring and control link the object and meta levels]

Page 47: Proposed System: Object-Level Component

[Diagram: the Object Level couples a Prediction Engine holding Candidate Models with a Strategy/Action Generator; perception feeds in, action selection flows out, and monitoring and control connect to the meta-level]

Page 48: Proposed System: Meta-Level Component

[Diagram: the Meta-Level contains Context Abstraction, an Activity Repository, and Utility Calculation; it monitors predictions, prediction costs, and observations of the world state, and issues control back to the object level]