Queen Mary University of London General Video Game AI and ...epia2017/wp-content/uploads/...Talk...
General Video Game AI and Bandit Landscape EAs
Simon Lucas
Queen Mary University of London
(with Jialin Liu, Diego Perez, Raluca Gaina, Kamolwan (Mike) Kunanusont)
Game AI Research Group
History of Artificial Intelligence
Boring AI – not very adaptive; siloed
Exciting Learning AI – adaptive and general
Artificial General Intelligence: Three Pillars
• Evolutionary Algorithms
• Deep Learning
• Simulation-Based Learning / Planning
• Most interesting to use hybrids of all three
• At different temporal scales – including evolution for real-time action selection
Progressing towards AGI: Why Games?
• Games provide the perfect platform:
– Experimental development and testing of theories and systems
– Wide range: from simple to complex
– Fun for humans to engage
– Generally harmless
• Easy to simulate fast and in parallel (unlike real robots)
• Creative aspects as well as performance
– Automated game design and game tuning
Talk Outline
• Motivation in Game AI – and General Video Game AI
• A practical new algorithm for noisy optimisation
• Main features:
– Adapts the simplest evolutionary algorithm (Random Mutation Hill-Climber) to use a model to guide search
– UCB (Bandit) equation to balance exploration versus exploitation
• Initial results: very promising
Games: Great Source of Noisy Optimisation Problems
• Here we treat noise as uncertainty about how to play well
• About:
– Hidden cards (e.g. Poker)
– Dice outcome (Backgammon)
– Hidden Random Seed (Ms Pac-Man)
• Different, but also related:
– Unknown intended opponent actions (Chess)
Video Games
• For more than a decade our community has used video games as a great source of AI challenge
– Check out IEEE CIG 2017 in New York
StarCraft AI (classic e-sport)
http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/custom-ai-programs-take-on-top-ranked-humans-in-starcraft
• Challenging RTS
• Strategy + lightning reactions
• AI currently at amateur level
• I predict AI will be super-human in 2019
Planet Wars – a Simple RTS (Real Time Strategy Game)
Getting Creative: Level Generation (See demo)
General Video Game AI
• Challenge for AI:
– Play any video game
– Don’t know the rules
– But you know the score
– And when you die
• A bit like walking into an arcade in the 80s and playing a new game
VGDL and the GVGAI Framework
(Sokoban)
https://github.com/EssexUniversityMCTS/gvgai/wiki/VGDL-Language
GVGAI Videos
• http://gvgai.net/test/vid/Aliens.mov
• http://gvgai.net/test/vid/Butterflies.mov
• http://gvgai.net/test/vid/Seaquest.mov
• http://gvgai.net/test/vid/Sheriff.mov
Statistical Simulation based CI / AI
• Relies on fast forward model:
– F(s, a) -> s’
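A minimal Java sketch of what a forward model makes possible, assuming a toy one-dimensional game that is not part of the talk. The `ForwardModel` interface, the integer state and the `bestAction` helper are hypothetical names chosen for illustration. Each action is simulated with F(s, a) and the action whose successor state scores best is returned.

```java
// Hypothetical toy example, not the GVGAI API:
// the state is a position on a line, actions are -1 (left), 0 (stay), +1 (right).
class ForwardModelSketch {

    /** F(s, a) -> s': returns the successor state without touching the real game. */
    interface ForwardModel<S, A> {
        S next(S state, A action);
    }

    /** Greedy one-step lookahead: simulate every action, keep the best-scoring one. */
    static int bestAction(ForwardModel<Integer, Integer> model,
                          int state, int[] actions, int goal) {
        int best = actions[0];
        int bestScore = Integer.MIN_VALUE;
        for (int a : actions) {
            int next = model.next(state, a);      // simulated, not played
            int score = -Math.abs(goal - next);   // closer to the goal is better
            if (score > bestScore) {
                bestScore = score;
                best = a;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        ForwardModel<Integer, Integer> model = (s, a) -> s + a;
        System.out.println(bestAction(model, 0, new int[]{-1, 0, 1}, 5)); // prints 1
    }
}
```

The statistical search methods on the following slides call such a model many times per decision, which is why it needs to be fast.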
Real-Time Decision Making: Challenges for Statistical Search Methods
• Rapid reaction needed: e.g. 40ms or less between requests for action
• Branching factor may be high
• Limited horizon
– Can’t look very far ahead
• Limited roll-out budget
– Don’t have time to perform many roll-outs
• Random actions may be terrible!
– And lead to a very flat reward landscape
Monte Carlo Tree Search: the main (CRAZY ?) idea
• Tree policy: choose which node to expand (not necessarily leaf of tree)
• Default (simulation) policy: random playout until end of game
MCTS Builds Asymmetric Trees
• Aims to balance exploration and exploitation
• In video games the limited roll-out budget is a challenge, but not the only one!
Rolling Horizon Evolution
• Evolve action sequences in real time
• Each time pick first action
• Then run evolution again
Rolling Horizon Evolution Example: Where might noise come from?
Int vec: 1444333333322
Translated into game actions: Up, Left, Left, Left, Down, …
Then evaluated by game engine
First action is used after each optimisation run
Re-run EA every 40ms
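The loop described above can be sketched in Java as follows. This is an illustration under stated assumptions, not the GVGAI implementation: the one-dimensional toy game, the action coding (0 = left, 1 = stay, 2 = right), the horizon of 5 and the budget of 200 evaluations are all invented here, and a fixed evaluation count stands in for the 40ms time limit. The toy forward model is deterministic, so it does not show the noise the slide asks about.

```java
import java.util.Random;

class RollingHorizonSketch {

    // Hypothetical toy forward model: move along a line.
    static int step(int pos, int action) {
        return pos + (action - 1);
    }

    // Fitness of a plan: roll it forward with the model and score the final state.
    static double evaluate(int start, int[] plan, int goal) {
        int pos = start;
        for (int a : plan) pos = step(pos, a);
        return -Math.abs(goal - pos);
    }

    // One decision: evolve an action sequence with a random mutation
    // hill-climber, then return only its first action.
    static int chooseAction(int start, int goal, int horizon, int nEvals, Random rnd) {
        int[] best = new int[horizon];
        for (int i = 0; i < horizon; i++) best[i] = rnd.nextInt(3);
        double bestFit = evaluate(start, best, goal);
        for (int e = 1; e < nEvals; e++) {
            int[] child = best.clone();
            child[rnd.nextInt(horizon)] = rnd.nextInt(3); // mutate one gene
            double fit = evaluate(start, child, goal);
            if (fit >= bestFit) {
                best = child;
                bestFit = fit;
            }
        }
        return best[0];
    }

    // Play for a number of ticks, re-running the evolution at every tick.
    static int play(int start, int goal, int ticks, long seed) {
        Random rnd = new Random(seed);
        int pos = start;
        for (int t = 0; t < ticks; t++) {
            pos = step(pos, chooseAction(pos, goal, 5, 200, rnd));
        }
        return pos;
    }

    public static void main(String[] args) {
        System.out.println("final position: " + play(0, 3, 10, 42L));
    }
}
```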
Motivation for Bandit Landscape EA
• Game AI
– Evolving / tuning game parameters / rules / content (e.g. level design)
– Real-time control via rolling horizon evolution
• Applies when the fitness evaluations are:
– Noisy
– Limited in number, either because they are:
• Computationally expensive
• Or need to be done very rapidly (real-time)
• Not limited to games of course!
Evolutionary Algorithms: Simple and Beautiful
• Initialise a RANDOM population of individuals
• Then REPEAT
– Evaluate them all – and rank them by FITNESS
– BREED Offspring from the FITTEST Parents
• UNTIL Satisfied, or Out of Time
• Attractive simplicity, takes a bit of skill to make them really work …
• One of my VERY FAVOURITE APPROACHES to AI
• BUT: Can do even better with a fitness landscape model
• (the part in RED is affected by noise)
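The initialise / evaluate / rank / breed loop can be sketched in Java as below. The details are assumptions made for illustration and are not taken from the talk: the fitness function is OneMax (count of 1 bits) and is noise-free, the top half of the population is kept as parents, and breeding is a single bit-flip mutation with no crossover.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

class SimpleEaSketch {

    // Hypothetical fitness function: number of 1s in the bit string (OneMax).
    static int fitness(int[] x) {
        return Arrays.stream(x).sum();
    }

    static int[] evolve(int nBits, int popSize, int generations, Random rnd) {
        // Initialise a random population of individuals.
        int[][] pop = new int[popSize][nBits];
        for (int[] ind : pop) {
            for (int i = 0; i < nBits; i++) ind[i] = rnd.nextInt(2);
        }
        Comparator<int[]> fittestFirst = Comparator.comparingInt((int[] x) -> -fitness(x));
        for (int g = 0; g < generations; g++) {
            // Evaluate them all and rank them by fitness.
            Arrays.sort(pop, fittestFirst);
            // Breed offspring from the fittest parents:
            // the bottom half is replaced by mutated copies of the top half.
            int nParents = popSize / 2;
            for (int i = nParents; i < popSize; i++) {
                int[] child = pop[i - nParents].clone();
                int k = rnd.nextInt(nBits);
                child[k] = 1 - child[k]; // flip one bit
                pop[i] = child;
            }
        }
        Arrays.sort(pop, fittestFirst);
        return pop[0];
    }

    public static void main(String[] args) {
        int[] best = evolve(20, 10, 100, new Random(1));
        System.out.println(Arrays.toString(best) + " fitness " + fitness(best));
    }
}
```

With a noisy fitness function the ranking step becomes unreliable, which is the problem the landscape model on the later slides is meant to address.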
Simple Evolutionary Algorithm Demo
We use Evolution to Evolve Vectors of Integers (Discrete Noisy Optimisation)
• But these have very different interpretations
• Could be a sequence of actions to take
– Rolling Horizon Evolution
• Or parameters of a game design
– Locations of pills in Pac-Man
– Missile velocity in Asteroids
– Jump height in Mario
– Etc.
Game Design: Also Noisy!
• Can design a game
• But each time an AI agent or human player plays:
– The experience will be different
– Due to the game, or the player actions
• In a population of players each one may play differently
– And therefore have a different experience
• We can measure aspects of this experience
• And view game design as noisy optimisation
Value of Fitness Landscape Modelling
• Can lead to more efficient search
– Fitter solutions are found more quickly
• We learn more about the problem
– Aim now is not just to find fittest possible solutions
– But also estimate value of untested points in the search space
System Diagram
• Note the fat connection between the EA and the landscape model
[Diagram boxes: Bandit EA; Noisy Fitness Evaluator; Bandit Fitness Landscape Model]
The Multi-Armed Bandit Problem
At each step pull one arm
Noisy/random reward signal
In order to:
* Find the best arm
* Minimise regret
* Maximise expected return
Which Arm to Pull? UCB1 Balances Exploration v. Exploitation
UCB1 (Auer et al, 2002)
Choose arm j so as to maximise:
Mean so far (exploit)
Upper bound on variance (explore)
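The equation itself did not survive in this transcript. The standard form of UCB1 from Auer et al. (2002) chooses the arm j that maximises mean_j + sqrt(2 ln(n) / n_j), where mean_j is the average reward of arm j so far, n_j is the number of times arm j has been pulled and n is the total number of pulls. A minimal Java sketch of that selection rule follows; the class and method names are hypothetical, and the slide may have shown a variant with a tunable exploration constant.

```java
class Ucb1Sketch {

    /**
     * Returns the index of the arm maximising mean + sqrt(2 ln(total) / pulls).
     * Arms that have never been pulled are tried first.
     */
    static int select(double[] sumRewards, int[] pulls) {
        int total = 0;
        for (int n : pulls) total += n;

        int best = -1;
        double bestValue = Double.NEGATIVE_INFINITY;
        for (int j = 0; j < pulls.length; j++) {
            if (pulls[j] == 0) return j;                 // try every arm once
            double mean = sumRewards[j] / pulls[j];      // exploit term
            double explore = Math.sqrt(2.0 * Math.log(total) / pulls[j]);
            double value = mean + explore;
            if (value > bestValue) {
                bestValue = value;
                best = j;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Arm 0: mean 0.5 over 10 pulls; arm 1: mean 1.0 over 1 pull.
        System.out.println(select(new double[]{5.0, 1.0}, new int[]{10, 1})); // prints 1
    }
}
```

An arm with few pulls gets a large exploration term, so it is revisited even when its mean so far is unimpressive.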
Example: Simple Space Battle
public enum BattleParamNames {
    DAMAGE_RADIUS, DAMAGE_COST, LOSS, SHIP_SIZE, ROTATION, THRUST
}
…
EvoVectorSet params = new EvoVectorSet();
params.params.add(new EvoDoubleSet(DAMAGE_RADIUS.toString(),
        new double[]{5, 20, 50, 100, 200}));
params.params.add(new EvoDoubleSet(DAMAGE_COST.toString(),
        new double[]{1, 5, 20, 50}));
…
Example Game: Space Battle (Unevolved)
Fitness Landscape Model Interface
public interface FitnessLandscapeModel {
    void addPoint(int[] p, double value);

    // return a Double object - a null return indicates that
    // we know nothing yet
    Double getSimple(int[] x);

    // careful - this can be slow -
    // it iterates over all points in the search space!
    int[] getBestSolution();
    int[] getBestOfSampled();
    int[] getBestOfSampledPlusNeighbours(int nNeighbours);
}
The Bandit Landscape EA
int[] p = SearchSpaceUtil.randomPoint(searchSpace);
while (evaluator.nEvals() < nEvals) {
    // evaluate the current point (noisy) and add the sample to the landscape model
    double fitness = evaluator.evaluate(p);
    banditLandscape.addPoint(p, fitness);
    // collect nNeighbours random mutations of p as candidate next points
    EvaluateChoices evc = new EvaluateChoices(banditLandscape, kExplore);
    while (evc.n() < nNeighbours) {
        int[] pp = mutator.randMut(p);
        evc.add(pp);
    }
    // move to the candidate that the picker rates best
    p = evc.picker.getBest();
}
int[] solution = banditLandscape.getBest();
return solution;
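The listing above depends on classes that are not shown (`SearchSpaceUtil`, `EvaluateChoices`, the mutator and the landscape model). The following self-contained Java sketch mirrors the same loop under explicit assumptions: the model is a plain per-point table rather than an N-tuple system, the neighbour score is an assumed UCB-style formula (mean plus `kExplore` times an exploration bonus), and every name in it is hypothetical. It is meant to show the control flow, not to reproduce the authors' code. It assumes `nEvals >= 1` and `nNeighbours >= 1`.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class BanditLoopSketch {
    // Per-point statistics: [0] = number of samples, [1] = sum of fitness values.
    private final Map<String, double[]> stats = new HashMap<>();
    private final Map<String, int[]> points = new HashMap<>();
    private int total = 0;

    void addPoint(int[] p, double value) {
        String key = Arrays.toString(p);
        double[] s = stats.computeIfAbsent(key, k -> new double[2]);
        s[0]++;
        s[1] += value;
        points.putIfAbsent(key, p.clone());
        total++;
    }

    // Assumed UCB-style score: unseen points receive a very large bonus.
    double ucb(int[] p, double kExplore) {
        double[] s = stats.get(Arrays.toString(p));
        double n = (s == null) ? 0 : s[0];
        double mean = (n == 0) ? 0 : s[1] / n;
        return mean + kExplore * Math.sqrt(Math.log(total + 1) / (n + 1e-6));
    }

    // Sampled point with the highest mean fitness.
    int[] bestOfSampled() {
        String bestKey = null;
        double bestMean = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> e : stats.entrySet()) {
            double mean = e.getValue()[1] / e.getValue()[0];
            if (mean > bestMean) {
                bestMean = mean;
                bestKey = e.getKey();
            }
        }
        return points.get(bestKey);
    }

    static int[] run(int nDims, int nValues, int nEvals, int nNeighbours,
                     double kExplore, ToDoubleFunction<int[]> evaluator, Random rnd) {
        BanditLoopSketch model = new BanditLoopSketch();
        int[] p = new int[nDims];
        for (int i = 0; i < nDims; i++) p[i] = rnd.nextInt(nValues);
        for (int e = 0; e < nEvals; e++) {
            model.addPoint(p, evaluator.applyAsDouble(p)); // one (noisy) evaluation
            int[] best = null;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int j = 0; j < nNeighbours; j++) {
                int[] pp = p.clone();
                pp[rnd.nextInt(nDims)] = rnd.nextInt(nValues); // random mutation
                double score = model.ucb(pp, kExplore);        // model query, no evaluation
                if (score > bestScore) {
                    bestScore = score;
                    best = pp;
                }
            }
            p = best;
        }
        return model.bestOfSampled();
    }
}
```

Only one real fitness evaluation is spent per iteration; the neighbours are compared using the model alone, which is what keeps the evaluation budget small.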
Currently we Implement the Bandit Landscape Model as an N-Tuple System
• N-Tuples are the best function approximator that no-one has ever heard of!
– (bit like random forests)
• Constant time access (independent of number of samples learned)
• Take projections of a high-dimensional space
• Store values for each projection in a look-up table
– Each n-tuple sample provides a table index
– Novel contribution:
• Each table entry stores a Statistical Summary Object
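A minimal Java sketch of the look-up-table idea, restricted to 1-tuples (one table per dimension). The talk does not say which tuples are used or how their estimates are combined, so the choices here are assumptions: each table entry keeps only a count and a sum, and a point's estimate is the plain average of the entry means that have data. All names are hypothetical.

```java
class NTupleSketch {
    private final double[][] sum;  // sum[d][v]: total fitness seen with x[d] == v
    private final int[][] count;   // count[d][v]: number of such samples

    /** nValues[d] is the number of possible values in dimension d. */
    NTupleSketch(int[] nValues) {
        sum = new double[nValues.length][];
        count = new int[nValues.length][];
        for (int d = 0; d < nValues.length; d++) {
            sum[d] = new double[nValues[d]];
            count[d] = new int[nValues[d]];
        }
    }

    /** Each sample updates one entry per table: the value x[d] is the table index. */
    void addPoint(int[] x, double value) {
        for (int d = 0; d < x.length; d++) {
            sum[d][x[d]] += value;
            count[d][x[d]]++;
        }
    }

    /** Mean estimate for x; null means nothing is known about any of its projections. */
    Double mean(int[] x) {
        double total = 0;
        int used = 0;
        for (int d = 0; d < x.length; d++) {
            if (count[d][x[d]] > 0) {
                total += sum[d][x[d]] / count[d][x[d]];
                used++;
            }
        }
        if (used == 0) return null;
        return total / used;
    }

    public static void main(String[] args) {
        NTupleSketch model = new NTupleSketch(new int[]{2, 3});
        model.addPoint(new int[]{0, 1}, 1.0);
        model.addPoint(new int[]{1, 1}, 0.0);
        System.out.println(model.mean(new int[]{0, 1})); // prints 0.75
        System.out.println(model.mean(new int[]{0, 2})); // prints 1.0 (never sampled directly)
    }
}
```

Both `addPoint` and `mean` touch one entry per table, so their cost depends on the number of tuples and not on how many samples have been stored. The second query shows how an untested point still gets an estimate from the projections it shares with sampled points.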
Statistical Summary Object: class StatSummary
Each table entry is of type StatSummary
Provides efficient storage and access to:
Mean, Standard Deviation, Standard Error, Number of Samples, …
For the Bandit EA, we just need to know the mean and number of samples of each point in the search space, but it is also interesting to query other stats
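The source of `StatSummary` is not shown in the slides. As an illustration of how such an object can store these statistics in constant space, here is a small Java sketch using Welford's running-update method; the class name `RunningStats` and its methods are hypothetical and are not the authors' API.

```java
class RunningStats {
    private int n;        // number of samples
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Adds one sample in O(1) time without storing it. */
    void add(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    int n() { return n; }

    double mean() { return mean; }

    /** Sample standard deviation; 0 when fewer than two samples. */
    double sd() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0; }

    /** Standard error of the mean; 0 when there are no samples. */
    double stdErr() { return n > 0 ? sd() / Math.sqrt(n) : 0.0; }

    public static void main(String[] args) {
        RunningStats s = new RunningStats();
        for (double x : new double[]{1, 2, 3}) s.add(x);
        System.out.println(s.n() + " samples, mean " + s.mean() + ", sd " + s.sd());
    }
}
```

The mean and the sample count are the two quantities a UCB-style score needs; the standard error is a convenient extra when inspecting how reliable an estimate is.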
2-Dimensional Example
Summary
• Bandit Landscape Evolutionary Algorithm
• Simple algorithm with attractive properties:
– Balances exploration versus exploitation
– Makes use of all available information during search
– No need to choose a resampling rate
– Result of search is a landscape model in addition to estimate of best solution
Noisy Sample Problem: Noisy Win Rate Optimisation
• Optimise a bit string such that:
• Each fitness evaluation flips a biased coin
– win if Math.rand < (x / (2^n-1)), where x is the value of the n-bit string
– i.e. win prob is given by:
• binary number value of bit string / max possible
• This very roughly models the situation in game parameter optimisation where some parameters are much more sensitive than others
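The test problem can be written down directly. In this Java sketch the bit order (most significant bit first) and all names are assumptions; the slide only specifies that the win probability is the value of the bit string divided by the maximum possible value.

```java
import java.util.Random;

class NoisyWinRateSketch {

    /** Integer value x of the bit string, most significant bit first (assumed order). */
    static long value(int[] bits) {
        long x = 0;
        for (int b : bits) x = (x << 1) | b;
        return x;
    }

    /** True win probability: x / (2^n - 1). Valid for 1 <= n <= 62 bits. */
    static double winProbability(int[] bits) {
        long max = (1L << bits.length) - 1;
        return (double) value(bits) / max;
    }

    /** One noisy fitness evaluation: 1.0 for a win, 0.0 for a loss. */
    static double evaluate(int[] bits, Random rnd) {
        return rnd.nextDouble() < winProbability(bits) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        int[] bits = {1, 0, 0};
        System.out.println(winProbability(bits)); // 4/7
        System.out.println(evaluate(bits, new Random()));
    }
}
```

Flipping the leading bit changes the win probability by roughly one half, while flipping the last bit changes it by only 1 / (2^n - 1), which is how the problem mimics parameters of very different sensitivity.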
Budget: 100 Fitness Evals
Simple Space Battle: Each ship has a Damage Disc
N-Tuple Bandit Landscape Evolved Space Battle Video
Summary
• Games provide great application area for AI
• And for noisy optimisation
– Both for generating smart players
– And designing or tuning new games
• Bandit Landscape Evolutionary Algorithm
• Simple algorithm:
– Balances exploration versus exploitation
• We use the same EAs for automated game design and automated game playing
• More detail in our IEEE CEC 2017 Paper
Thank you!
• Questions?
Some references…
• Kamolwan Kunanusont, Raluca Gaina, Jialin Liu, Diego Perez-Liebana and Simon M. Lucas, The N-Tuple Bandit Evolutionary Algorithm for Game Improvement, in Proceedings of the Congress on Evolutionary Computation (2017).
• Jialin Liu, Julian Togelius, Diego Perez-Liebana and Simon M. Lucas, Evolving Game Skill-Depth using General Video Game AI Agents, in Proceedings of the Congress on Evolutionary Computation (2017).
• Jialin Liu, Diego Perez-Liebana and Simon M. Lucas, Bandit-Based Random Mutation Hill-Climbing, in Proceedings of the Congress on Evolutionary Computation (2017).