Pondering Probabilistic Play Policies for Pig
Transcript of Pondering Probabilistic Play Policies for Pig
Todd W. Neller, Gettysburg College
Sow What’s This All About?
• The Dice Game “Pig”
• Odds and Ends
• Playing to Win
  – “Piglet”
  – Value Iteration
• Machine Learning
Pig: The Game
• Object: First to score 100 points
• On your turn, roll until:
  – You roll a 1, and score NOTHING.
  – You hold, and KEEP the sum.
• Simple game → simple strategy?
• Let’s play…
Playing to Score
• Simple odds argument
  – Roll until you risk more than you stand to gain.
  – “Hold at 20”
    • 1/6 of the time: -20 → -20/6
    • 5/6 of the time: +4 (avg. of 2, 3, 4, 5, 6) → +20/6
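The odds argument above can be checked in a few lines of Python (my own sketch, not from the slides; the function name is mine):

```python
# The "hold at 20" odds argument in code: with a turn total of t, one more
# roll loses the t unbanked points with probability 1/6, and otherwise
# gains 4 points on average (the mean of 2, 3, 4, 5, 6).
def expected_gain_of_rolling(turn_total):
    """Expected change in banked points from taking one more roll."""
    return (5 * 4 - turn_total) / 6   # (5/6) * 4 - (1/6) * turn_total

# Positive below 20, exactly zero at 20, negative above 20 -- "hold at 20".
for t in (16, 20, 24):
    print(t, expected_gain_of_rolling(t))
```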
Hold at 20?
• Is there a situation in which you wouldn’t want to hold at 20?
  – Your score: 99; you roll a 2
  – Another case scenario:
    • You: 79, opponent: 99
    • Your turn total stands at 20
What’s Wrong With Playing to Score?
• It’s mathematically optimal!
• But what are we optimizing?
• Playing to score ≠ Playing to win
• Optimizing score per turn ≠ Optimizing probability of a win
Piglet
• Simpler version of Pig with a coin
• Object: First to score 10 points
• On your turn, flip until:
  – You flip tails, and score NOTHING.
  – You hold, and KEEP the # of heads.
• Even simpler: play to 2 points
Essential Information
• What is the information I need to make a fully informed decision?
  – My score
  – The opponent’s score
  – My “turn score”
A Little Notation
• P_{i,j,k} – probability of a win if:
  i = my score
  j = the opponent’s score
  k = my turn score
• Hold: P_{i,j,k} = 1 - P_{j,i+k,0}
• Flip: P_{i,j,k} = ½(1 - P_{j,i,0}) + ½ P_{i,j,k+1}
Assume Rationality
• To make a smart player, assume a smart opponent.
• (To make a smarter player, know your opponent.)
• P_{i,j,k} = max(1 - P_{j,i+k,0}, ½(1 - P_{j,i,0} + P_{i,j,k+1}))
• Probability of a win based on best decisions in any state
The Whole Story
P_{0,0,0} = max(1 - P_{0,0,0}, ½(1 - P_{0,0,0} + P_{0,0,1}))
P_{0,0,1} = max(1 - P_{0,1,0}, ½(1 - P_{0,0,0} + P_{0,0,2}))
P_{0,1,0} = max(1 - P_{1,0,0}, ½(1 - P_{1,0,0} + P_{0,1,1}))
P_{0,1,1} = max(1 - P_{1,1,0}, ½(1 - P_{1,0,0} + P_{0,1,2}))
P_{1,0,0} = max(1 - P_{0,1,0}, ½(1 - P_{0,1,0} + P_{1,0,1}))
P_{1,1,0} = max(1 - P_{1,1,0}, ½(1 - P_{1,1,0} + P_{1,1,1}))
The Whole Story
P_{0,0,0} = max(1 - P_{0,0,0}, ½(1 - P_{0,0,0} + P_{0,0,1}))
P_{0,0,1} = max(1 - P_{0,1,0}, ½(1 - P_{0,0,0} + P_{0,0,2}))
P_{0,1,0} = max(1 - P_{1,0,0}, ½(1 - P_{1,0,0} + P_{0,1,1}))
P_{0,1,1} = max(1 - P_{1,1,0}, ½(1 - P_{1,0,0} + P_{0,1,2}))
P_{1,0,0} = max(1 - P_{0,1,0}, ½(1 - P_{0,1,0} + P_{1,0,1}))
P_{1,1,0} = max(1 - P_{1,1,0}, ½(1 - P_{1,1,0} + P_{1,1,1}))
These are winning states! (P_{0,0,2}, P_{0,1,2}, P_{1,0,1}, and P_{1,1,1} have score + turn score ≥ 2, so each equals 1.)
The Whole Story
P_{0,0,0} = max(1 - P_{0,0,0}, ½(1 - P_{0,0,0} + P_{0,0,1}))
P_{0,0,1} = max(1 - P_{0,1,0}, ½(1 - P_{0,0,0} + 1))
P_{0,1,0} = max(1 - P_{1,0,0}, ½(1 - P_{1,0,0} + P_{0,1,1}))
P_{0,1,1} = max(1 - P_{1,1,0}, ½(1 - P_{1,0,0} + 1))
P_{1,0,0} = max(1 - P_{0,1,0}, ½(1 - P_{0,1,0} + 1))
P_{1,1,0} = max(1 - P_{1,1,0}, ½(1 - P_{1,1,0} + 1))
Simplified…
The Whole Story
P_{0,0,0} = max(1 - P_{0,0,0}, ½(1 - P_{0,0,0} + P_{0,0,1}))
P_{0,0,1} = max(1 - P_{0,1,0}, ½(2 - P_{0,0,0}))
P_{0,1,0} = max(1 - P_{1,0,0}, ½(1 - P_{1,0,0} + P_{0,1,1}))
P_{0,1,1} = max(1 - P_{1,1,0}, ½(2 - P_{1,0,0}))
P_{1,0,0} = max(1 - P_{0,1,0}, ½(2 - P_{0,1,0}))
P_{1,1,0} = max(1 - P_{1,1,0}, ½(2 - P_{1,1,0}))
And simplified more into a hamsome set of equations…
How to Solve It?
P_{0,0,0} = max(1 - P_{0,0,0}, ½(1 - P_{0,0,0} + P_{0,0,1}))
P_{0,0,1} = max(1 - P_{0,1,0}, ½(2 - P_{0,0,0}))
P_{0,1,0} = max(1 - P_{1,0,0}, ½(1 - P_{1,0,0} + P_{0,1,1}))
P_{0,1,1} = max(1 - P_{1,1,0}, ½(2 - P_{1,0,0}))
P_{1,0,0} = max(1 - P_{0,1,0}, ½(2 - P_{0,1,0}))
P_{1,1,0} = max(1 - P_{1,1,0}, ½(2 - P_{1,1,0}))
P_{0,1,0} depends on P_{0,1,1}, which depends on P_{1,0,0}, which depends on P_{0,1,0}, which depends on…
A System of Pigquations
[Figure: dependency graph linking P_{0,0,0}, P_{0,0,1}, P_{0,1,0}, P_{0,1,1}, P_{1,0,0}, and P_{1,1,0}]
Dependencies between non-winning states
How Bad Is It?
• The intersection of a set of bent hyperplanes in a hypercube
• In the general case, no known method (read: PhD research)
• Is there a method that works (without being guaranteed to work in general)?
  – Yes! Value Iteration!
Value Iteration
• Start out with some values (0’s, 1’s, random #’s)
• Do the following until the values converge (stop changing):
  – Plug the values into the RHSs
  – Recompute the LHS values
• That’s easy. Let’s do it!
Value Iteration
P_{0,0,0} = max(1 - P_{0,0,0}, ½(1 - P_{0,0,0} + P_{0,0,1}))
P_{0,0,1} = max(1 - P_{0,1,0}, ½(2 - P_{0,0,0}))
P_{0,1,0} = max(1 - P_{1,0,0}, ½(1 - P_{1,0,0} + P_{0,1,1}))
P_{0,1,1} = max(1 - P_{1,1,0}, ½(2 - P_{1,0,0}))
P_{1,0,0} = max(1 - P_{0,1,0}, ½(2 - P_{0,1,0}))
P_{1,1,0} = max(1 - P_{1,1,0}, ½(2 - P_{1,1,0}))
• Assume P_{i,j,k} is 0 unless it’s a win
• Repeat: compute the RHSs, assign to the LHSs
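The procedure is short enough to sketch directly. Here is a minimal Python version (mine, not from the slides; the function names are my own) that value-iterates the six two-point Piglet equations, starting all probabilities at 0:

```python
# Value iteration for two-point Piglet, exactly as described above: start
# every unknown probability at 0, then repeatedly plug the current values
# into the right-hand sides until nothing changes.
GOAL = 2  # points needed to win

def p_win(P, i, j, k):
    """P_{i,j,k}, treating any state with i + k >= GOAL as an immediate win."""
    return 1.0 if i + k >= GOAL else P[(i, j, k)]

def solve_piglet(tol=1e-9):
    # The non-winning states (i, j, k): both scores and i + k below GOAL.
    states = [(i, j, k) for i in range(GOAL) for j in range(GOAL)
              for k in range(GOAL - i)]
    P = {s: 0.0 for s in states}
    while True:
        biggest_change = 0.0
        for (i, j, k) in states:
            hold = 1 - p_win(P, j, i + k, 0)   # bank k; opponent's turn
            flip = 0.5 * (1 - p_win(P, j, i, 0)) + 0.5 * p_win(P, i, j, k + 1)
            new = max(hold, flip)
            biggest_change = max(biggest_change, abs(new - P[(i, j, k)]))
            P[(i, j, k)] = new
        if biggest_change < tol:
            return P

P = solve_piglet()
for state in sorted(P):
    print(state, round(P[state], 4))
```

The values settle at P_{1,1,0} = 2/3, P_{1,0,0} = 4/5, and P_{0,0,0} = 4/7 ≈ 0.571, so the first player wins two-point Piglet 4 times in 7 with optimal play.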
But That’s GRUNT Work!
• So have a computer do it, slacker!
• Not difficult – end of CS1 level
• Fast! Don’t blink – you’ll miss it
• Optimal play:
  – Compute the probabilities
  – Determine flip/hold from the RHS max’s
  – (For our equations, always FLIP)
Piglet Solved
• Game to 10
• Play to Score: “Hold at 1”
• Play to Win:
[Chart: “Little Pig Hold Values” – hold value (0–10) for each pair of scores]

Little Pig Hold Values (row: your score; column: opponent’s score)

You\Opp   0  1  2  3  4  5  6  7  8  9
   0      2  2  2  2  2  2  2  2  3 10
   1      1  2  2  2  2  2  2  3  3  9
   2      1  1  2  2  2  2  2  2  3  8
   3      1  1  1  2  2  2  2  2  3  7
   4      1  1  1  1  2  2  2  2  2  6
   5      1  1  1  1  1  2  2  2  2  5
   6      1  1  1  1  1  1  2  2  2  4
   7      1  1  1  1  1  1  1  1  3  3
   8      1  1  1  1  1  1  1  2  2  2
   9      1  1  1  1  1  1  1  1  1  1
Pig Probabilities
• Just like Piglet, but more possible outcomes
• P_{i,j,k} = max(1 - P_{j,i+k,0}, 1/6(1 - P_{j,i,0} + P_{i,j,k+2} + P_{i,j,k+3} + P_{i,j,k+4} + P_{i,j,k+5} + P_{i,j,k+6}))
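The same value-iteration loop handles this six-outcome equation. Below is a sketch (mine, not from the slides) scaled down to a ten-point game so it solves in moments; the 100-point game works identically, just with far more states:

```python
# Value iteration for the Pig equation above, scaled down to a 10-point
# game for speed (the 100-point game is the same loop over more states).
GOAL = 10

def p_win(P, i, j, k):
    """P_{i,j,k}, treating any state with i + k >= GOAL as an immediate win."""
    return 1.0 if i + k >= GOAL else P[(i, j, k)]

def solve_pig(tol=1e-9):
    states = [(i, j, k) for i in range(GOAL) for j in range(GOAL)
              for k in range(GOAL - i)]
    P = {s: 0.0 for s in states}
    while True:
        biggest_change = 0.0
        for (i, j, k) in states:
            hold = 1 - p_win(P, j, i + k, 0)
            # Roll: 1/6 chance of a 1 (opponent's turn), else add 2..6 to k.
            roll = (1 - p_win(P, j, i, 0)
                    + sum(p_win(P, i, j, k + r) for r in range(2, 7))) / 6
            new = max(hold, roll)
            biggest_change = max(biggest_change, abs(new - P[(i, j, k)]))
            P[(i, j, k)] = new
        if biggest_change < tol:
            return P

P = solve_pig()
print("P(win) at the start of a 10-point game:", round(P[(0, 0, 0)], 4))
```

Reading the optimal policy back out is just a matter of checking which side of the max is larger in each state; note P_{0,0,0} ≥ ½ must hold at the fixed point, since the max is at least the hold value 1 - P_{0,0,0}.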
Solving Pig
• 505,000 such equations
• Same simple solution method (value iteration)
• Speedup: solve groups of interdependent probabilities
• Watch and see!
Pig Sow-lution
Reachable States
[Figure: reachable states, Player 2 Score (j) = 30]
Sow-lution for Reachable States
Probability Contours
Summary
• Playing to score is not playing to win.
• A simple game is not always simple to play.
• The computer is an exciting power tool for the mind!
When Value Iteration Isn’t Enough
• Value iteration assumes a model of the problem:
  – Probabilities of state transitions
  – Expected rewards for transitions
• Loaded die?
• Optimal play vs. a suboptimal player?
• Game rules unknown?
No Model? Then Learn!
• Can’t write the equations → can’t solve them
• Must learn from experience!
• Reinforcement Learning
  – Learn optimal sequences of actions
  – From experience
  – Given positive/negative feedback
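To make “learn from experience” concrete, here is a toy sketch (my illustration, not from the talk): tabular Q-learning on the single-turn version of Pig, where the goal is simply to bank as many points as possible this turn. The learner never sees the dice model, only experienced rolls and rewards, yet the policy it learns approximates the “hold at 20” rule from earlier:

```python
import random

# Tabular Q-learning on single-turn Pig: state = current turn total,
# actions = roll or hold, reward = points banked when the turn ends.
random.seed(0)

CAP = 60                                  # turn totals above CAP are clamped
ACTIONS = ("roll", "hold")
EPSILON = 0.2                             # exploration rate
Q = {(t, a): 0.0 for t in range(CAP + 1) for a in ACTIONS}
N = {key: 0 for key in Q}                 # visit counts (decaying step sizes)

def step(t, action):
    """Apply one action from turn total t; return (new_total, reward, done)."""
    if action == "hold":
        return t, t, True                 # bank the turn total
    d = random.randint(1, 6)
    if d == 1:
        return 0, 0, True                 # rolled a 1: score NOTHING
    return min(t + d, CAP), 0, False

for episode in range(200_000):
    t, done = 0, False
    while not done:
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)    # explore
        else:
            a = max(ACTIONS, key=lambda x: Q[(t, x)])  # exploit
        t2, reward, done = step(t, a)
        target = reward if done else max(Q[(t2, x)] for x in ACTIONS)
        N[(t, a)] += 1
        Q[(t, a)] += (target - Q[(t, a)]) / N[(t, a)]
        t = t2

policy = {t: max(ACTIONS, key=lambda a: Q[(t, a)]) for t in range(CAP + 1)}
print(min(t for t in range(CAP + 1) if policy[t] == "hold"))  # first hold point
```

The learned hold point hovers near 20: at small turn totals rolling is clearly better, at large ones holding is, and the learner discovers the boundary purely from feedback.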
Clever Mabel the Cat
Clever Mabel the Cat
• Mabel claws the new La-Z-Boy → BAD!
• Cats hate water → spray bottle = negative reinforcement
• Mabel claws La-Z-Boy → Todd gets up → Todd sprays Mabel → Mabel gets negative feedback
• Mabel learns…
Clever Mabel the Cat
• Mabel learns to run when Todd gets up.
• Mabel first learns the local causality:
  – Todd gets up → Todd sprays Mabel
• Mabel eventually sees no reliable correlation there and learns the indirect cause.
• Mabel happily claws the carpet. The End.
Backgammon
• Tesauro’s TD-Gammon
  – Reinforcement Learning + Neural Network (memory for learning)
  – Learned backgammon through self-play
  – Got better than all but a handful of people in the world!
  – Downside: took 1.5 million games to learn
Greased Pig
• My continuous variant of Pig
• Object: First to score 100 points
• On your turn, generate a random number from 0.5 to 6.5 until:
  – Your rounded number is 1, and you score NOTHING.
  – You hold, and KEEP the sum.
• How does this change things?
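A quick simulation makes the rules concrete (my own sketch; I read “the sum” as the sum of the unrounded draws, with rounding used only for the bust test, and the hold-at-20-style policy is just for illustration):

```python
import random

# One turn of Greased Pig: draw uniformly from [0.5, 6.5); if the draw
# rounds to 1 the turn scores NOTHING, otherwise the unrounded value is
# added to the running sum.
rng = random.Random(42)

def greased_pig_turn(hold_at=20.0):
    total = 0.0
    while total < hold_at:
        x = rng.uniform(0.5, 6.5)
        if round(x) == 1:
            return 0.0                # the greased pig slips away
        total += x
    return total                      # hold, and KEEP the sum

# Monte Carlo estimate of the expected score of one turn under this policy.
trials = 100_000
average = sum(greased_pig_turn() for _ in range(trials)) / trials
print("average turn score:", round(average, 2))
```

Every turn now ends on a continuum of possible totals rather than a whole number, which is exactly what makes the state space infinite on the next slide.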
Greased Pig Challenges
• Infinite possible game states
• Infinite possible games
• Limited experience
• Limited memory
• A learning and approximation challenge
Summary
• Solving equations can only take you so far (but much farther than we can fathom).
• Machine learning is an exciting area of research that can take us farther.
• The power of computing will increasingly aid in our learning in the future.