Planning and Learning in Games


Transcript of Planning and Learning in Games

Page 1: Planning and Learning in Games

Planning and Learning in Games

Michael van Lent, Institute for Creative Technologies, University of Southern California

Page 2: Planning and Learning in Games

Business of Games

• 60% of Americans play video games
• $25 billion industry worldwide (2004)
• $11 billion in the US (2004)
  • $6.1 billion in 1999, $5.5 billion in 1998, $4.4 billion in 1997
• One-day sales records
  • Halo 2: $125 million in a single day
  • Harry Potter and the Half-Blood Prince: $140 million in a single day
• Consoles dominate the industry
  • 90% of sales (Microsoft, Sony, Nintendo)
• Average age of game players is 29
• Average age of game buyers is 36
• 59% of game players are men

Page 3: Planning and Learning in Games

Game AI: A little context

• History of game AI in 5 bullet points
  • Lots of work on path planning
  • Hand-coded AI
  • Finite state machines
  • Scripted AI
  • Embed hints in the environment
• Things are starting to change
  • Game environments are getting more complex
  • Players are getting more sophisticated
  • Development costs are skyrocketing
  • Incremental improvements are required to get a publisher
• Game developers are adopting new techniques
  • Game AI is becoming more procedural and more adaptive

Page 4: Planning and Learning in Games

Scripted AI: Example 1

Age of Kings, Microsoft

; The AI will attack once at 1100 seconds and then again
; every 1400 sec, provided it has enough defense soldiers.
(defrule
    (game-time > 1100)
=>
    (attack-now)
    (enable-timer 7 1100))

(defrule
    (timer-triggered 7)
    (defend-soldier-count >= 12)
=>
    (attack-now)
    (disable-timer 7)
    (enable-timer 7 1400))

Page 5: Planning and Learning in Games

Scripted AI: Example 2

Age of Kings, Microsoft

; After 3600 seconds (and every 2700 seconds thereafter),
; grant the AI player 700 food, wood, and gold.
(defrule
    (true)
=>
    (enable-timer 4 3600)
    (disable-self))

(defrule
    (timer-triggered 4)
=>
    (cc-add-resource food 700)
    (cc-add-resource wood 700)
    (cc-add-resource gold 700)
    (disable-timer 4)
    (enable-timer 4 2700))

Page 6: Planning and Learning in Games

Procedural AI: The Sims

The Sims, Maxis

Page 7: Planning and Learning in Games
Page 8: Planning and Learning in Games
Page 9: Planning and Learning in Games
Page 10: Planning and Learning in Games

Two Adaptive AI Technologies

• Criteria
  • First-hand experience
  • Support procedural and adaptive AI
  • Early stages of adoption by commercial developers

Page 11: Planning and Learning in Games

Two Adaptive AI Technologies

• Criteria
• Deliberative Planning
  • F.E.A.R. (Monolith/Vivendi Universal for PC)
  • Condemned (Monolith/Sega for Xbox 2)

Page 12: Planning and Learning in Games

Two Adaptive AI Technologies

• Criteria
• Deliberative Planning
• Machine Learning
  • Long considered “scary voodoo”
  • Decision tree induction & neural nets in Black & White
  • Drivatar in Forza Motorsport

Page 13: Planning and Learning in Games

Why Planning and Learning?

• Improving current games
  • More variable & replayable
  • More immersive & engaging
  • More customized experience
  • More robust
  • More challenging
• Improved profits
  • More sales
  • Marketing
  • Cheaper development
• New elements of game play and whole new genres
• Necessary as games advance

Page 14: Planning and Learning in Games

Why not Planning and Learning?

• Costlier development
  • Is the expense worth the result?
• Greater processor/memory load
  • AI typically gets 10-20% of the CPU
  • That time comes in frequent small slices
• Harder to control the player’s experience
• Harder to do quality assurance
  • Double the cost of testing
• Adds technical risk
  • Programmers need to spin up on new technologies
  • Designers need to understand what’s possible
  • Designers create the AI; programmers implement it
• Marketing backlash
  • Once a game is stable it’s too late to add a major feature

Page 15: Planning and Learning in Games

Why Planning and Learning?

• Improving current games
  • More variable & replayable
  • More immersive & engaging
  • More customized experience
  • More robust
  • More challenging
• Improved profits
  • More sales
  • Marketing
  • Cheaper development
• New elements of game play and whole new genres
• Necessary as games advance

Page 16: Planning and Learning in Games

Blah Blah blah Blah?

• Blah blah blah
  • Blah blah & blah
  • Blah blah & blah
  • Blah blah blah
  • Blah blah
  • Blah blah
• Improved profits
  • Blah blah
  • Blah
  • Blah blah
• Blah blah blah blah blah blah blah blah blah
• Blah blah blah blah

Page 17: Planning and Learning in Games

Deliberative Planning

• What is deliberative planning?
  • If you know the current state of the world
  • and the goal state(s) of the world
  • and the actions available
    • When each can be done
    • How each changes the world
  • then search for a sequence of actions that changes the current state into a goal state.
• Deliberative planning is just a search problem (a minimal sketch follows below)
• When to plan?
  • Off-line: Before/after each game session
  • Real-time: During the game session
  • During development: Not part of shipped product
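Since the slide reduces deliberative planning to plain search, here is a minimal sketch of a STRIPS-style breadth-first planner in Python. It is illustrative only: the Operator class, the fact tuples, and the example operators are assumptions for this sketch, not the planner used in any of the games discussed.

from collections import deque

# A world state is a frozenset of facts; an operator is STRIPS-style:
# applicable when its preconditions hold, then it adds/deletes facts.
class Operator:
    def __init__(self, name, pre, add, delete):
        self.name, self.pre, self.add, self.delete = name, pre, add, delete

    def applicable(self, state):
        return self.pre <= state

    def apply(self, state):
        return frozenset((state - self.delete) | self.add)

def plan(init, goal, operators):
    """Breadth-first search from init to any state satisfying goal."""
    frontier = deque([(frozenset(init), [])])
    visited = {frozenset(init)}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:
            return steps                  # sequence of operator names
        for op in operators:
            if op.applicable(state):
                nxt = op.apply(state)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, steps + [op.name]))
    return None                           # no plan exists

# Hypothetical operators echoing the mission-objective examples:
ops = [
    Operator("move-to-building1", {("at", "start")},
             {("at", "building1")}, {("at", "start")}),
    Operator("clear-building1", {("at", "building1")},
             {("clear", "building1")}, set()),
]
print(plan({("at", "start")}, {("clear", "building1")}, ops))
# -> ['move-to-building1', 'clear-building1']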

Page 18: Planning and Learning in Games

Deliberative Planning

• Domain independent planning engine
• Abstract problem description
  • Goal world state (Mission objective)
    • secure(building1)
    • clear(building1) & clear(building2) & clear(building3)
    • captured(OpforLeader) or killed(OpforLeader)

Page 19: Planning and Learning in Games

Deliberative Planning

• Domain independent planning engine
• Abstract problem description
  • Goal world state (Mission objective)
  • Operators

[Operator diagram: Team-Move (opfor, L?) achieves (opfor at L?) and (mobile opfor); it builds on Checkpoint (u1), Checkpoint (u2), and Checkpoint (u3), which require (mobile u1), (mobile u2), (mobile u3) and achieve (u1 at L?), (u2 at L?), (u3 at L?).]

Page 20: Planning and Learning in Games

Deliberative Planning

• Domain independent planning engine
• Abstract problem description
  • Goal world state (Mission objective)
  • Operators

[Operator diagram: Secure-Base-Against-SW-Attack achieves (base-secure) given (at-base u?, u?, u?); it decomposes into Defend-Building (u?, b14), requiring (u? at b14), and Secure-Perimeter-Against-SW-Attack (opfor), which achieves (perimeter-secure) through Ambush (u?, sw-region), requiring (u? at sw-region), and Patrol (u?, s-path), requiring (u? at s-path), plus (at-base u?, u?).]

Page 21: Planning and Learning in Games

Deliberative Planning

• Domain independent planning engine
• Abstract problem description
  • Goal world state
  • Operators
  • Initial world state
• Deliberative Planning: Find a sequence of operators that changes the initial world state into a goal world state.

Page 22: Planning and Learning in Games

Strategic Planning Example

[Plan diagram, from Init to Goal (base-secure): Secure-Base-Against-SW-Attack decomposes into Team-Move (opfor), achieving (opfor at base); Defend-Building (u1, b14), requiring (u1 at b14); and Secure-Perimeter-Against-SW-Attack (opfor), which uses Ambush (u3, sw-region), requiring (u3 at sw-region), and Patrol (u2, s-path), requiring (u2 at s-path). Checkpoint (u1), Checkpoint (u2), and Checkpoint (u3) establish the unit locations, and (mobile opfor) holds initially.]
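The example above is hierarchical: abstract tasks expand into subtasks until only executable actions remain. A minimal, hypothetical way to encode that decomposition in Python (task names are taken from the diagram; the method-table structure is an assumption, not the ICT planner’s actual representation):

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    args: tuple = ()

# Hypothetical HTN-style method table mapping each abstract task to the
# subtasks shown beneath it in the plan diagram.
METHODS = {
    "Secure-Base-Against-SW-Attack": [
        Task("Defend-Building", ("u1", "b14")),
        Task("Secure-Perimeter-Against-SW-Attack", ("opfor",)),
    ],
    "Secure-Perimeter-Against-SW-Attack": [
        Task("Ambush", ("u3", "sw-region")),
        Task("Patrol", ("u2", "s-path")),
    ],
}

def decompose(task, methods):
    """Recursively expand abstract tasks into a flat list of primitives."""
    if task.name not in methods:
        return [task]                     # primitive: execute directly
    return [t for sub in methods[task.name] for t in decompose(sub, methods)]

plan = decompose(Task("Secure-Base-Against-SW-Attack"), METHODS)
print([(t.name, t.args) for t in plan])
# -> [('Defend-Building', ('u1', 'b14')),
#     ('Ambush', ('u3', 'sw-region')), ('Patrol', ('u2', 's-path'))]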

Page 23: Planning and Learning in Games

Plan Execution

• Execute atomic actions from plan
• Move from abstract planning world to “real world”
• Real-time interaction with environment
  • 10+ sense/think/act cycles per second (see the loop sketch below)

[Execution diagram: Ambush (u3, sw-region) expands into the sequence Select-ambush-loc → Move-to-ambush-loc → Wait-to-ambush → Ambush-attack → Report-success, with Report-failure, Abandon-ambush, and Defend as alternative transitions.]
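A minimal sketch of such an execution loop, assuming a hypothetical World object with sense/failed/completed/act methods (none of this reflects an actual game engine API):

import time

# Hypothetical atomic-action sequence from the Ambush expansion above.
PLAN = ["select-ambush-loc", "move-to-ambush-loc",
        "wait-to-ambush", "ambush-attack", "report-success"]

def run_plan(plan, world, hz=10):
    """Sense/think/act loop: each tick, sense the world, decide whether the
    current step has finished or failed, then act on the current step."""
    step = 0
    tick = 1.0 / hz                            # 10+ cycles per second
    while step < len(plan):
        start = time.monotonic()
        state = world.sense()                  # sense
        if world.failed(plan[step], state):    # think: abort or replan
            return "report-failure"
        if world.completed(plan[step], state):
            step += 1                          # advance to the next action
            continue
        world.act(plan[step])                  # act
        time.sleep(max(0.0, tick - (time.monotonic() - start)))
    return "plan-complete"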

Page 24: Planning and Learning in Games

Machine Learning: Behavior Capture

• Also called:
  • Behavioral Cloning
  • Learning by Observation
  • Learning by Imitation
  • A form of Knowledge Capture
• Learn by watching an expert
  • Experts are good at performing the task
  • Experts aren’t always good at teaching/explaining the task
  • Learn believable, human-like behavior
  • Mimic the styles of different players
• When to learn?
  • During development
  • Off-line

Page 25: Planning and Learning in Games

Drivatar

• “Check out the revolutionary A.I. Drivatar™ technology: Train your own A.I. "Drivatars" to use the same racing techniques you do, so they can race for you in competitions or train new drivers on your team. Drivatar technology is the foundation of the human-like A.I. in Forza Motorsport.”
• Collaboration between Microsoft Games and Microsoft Research

Page 26: Planning and Learning in Games

Learning to Fly

• Learn a flight sim autopilot from observing human pilots
  • 30 “observations” each from 3 experts
  • 20 features (elevation, airspeed, twist, fuel, thrust…)
  • 4 controls (elevators, rollers, thrust, flaps)
  • Take off, level out, fly towards a mountain, return and land
• Key idea: Experts react to the same situation in different ways depending on their current goals
  • Divide the flight sim task into 7 phases
  • Learn four decision trees for each phase, one per control (sketched below)
• Second key idea: Don’t combine data from multiple experts
• Sammut, C., Hurst, S., Kedzier, D., and Michie, D. Learning to Fly. In Proceedings of the Ninth International Conference on Machine Learning, pp. 385-393, 1992.
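As a purely illustrative rendering of the “four decision trees per phase, one per control” scheme, here is a sketch using scikit-learn. The data layout, feature count, and discretized control classes are assumptions about the 1992 setup, not its actual code:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

CONTROLS = ["elevators", "rollers", "thrust", "flaps"]
N_PHASES = 7

def train_autopilot(observations):
    """observations[phase] -> (X, y): X is an (n, 20) array of state
    features, y an (n, 4) array of discretized control settings.
    Returns one fitted tree per (phase, control) pair."""
    trees = {}
    for phase in range(N_PHASES):
        X, y = observations[phase]
        for c, control in enumerate(CONTROLS):
            tree = DecisionTreeClassifier()
            tree.fit(X, y[:, c])          # one tree per control
            trees[(phase, control)] = tree
    return trees

def autopilot_action(trees, phase, state):
    """Query all four control trees for the current flight phase."""
    x = np.asarray(state).reshape(1, -1)
    return {c: trees[(phase, c)].predict(x)[0] for c in CONTROLS}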

Page 27: Planning and Learning in Games

KnoMic (Knowledge Mimic)

• Learn air combat in a flight sim and a deathmatch bot in Quake II
  • Dynamic behavior against opponents
  • Can’t divide the task into fixed phases
• Key idea: Experts dynamically select which operator they’re working on based on opponent and environment
  • Also learn when to select operators (pre-conditions)
  • and what those operators do (effects) (a toy sketch follows below)
• Second key idea: Experts annotate observations with their operator selections
• van Lent, M. & Laird, J. E. Learning Procedural Knowledge by Observation. In Proceedings of the First International Conference on Knowledge Capture (K-CAP 2001), October 21-23, 2001, Victoria, BC, Canada, ACM, pp. 179-186.
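One simple, hypothetical way to realize the “learn pre-conditions and effects” idea: intersect the states in which the expert selected an operator (candidate pre-conditions) and the facts that consistently became true afterward (candidate effects). KnoMic’s actual algorithm is more involved; this only illustrates the shape:

def learn_operator_model(traces):
    """traces: list of (state_before, operator, state_after) triples, where
    states are sets of symbolic facts from annotated observations.
    Generalizes across every occasion the expert selected each operator."""
    pre, eff = {}, {}
    for before, op, after in traces:
        added = after - before            # facts the operator made true
        if op not in pre:
            pre[op], eff[op] = set(before), set(added)
        else:
            pre[op] &= before             # keep only always-true facts
            eff[op] &= added              # keep only consistent effects
    return {op: (pre[op], eff[op]) for op in pre}

# Tiny made-up example: the expert attacks only when an enemy is visible.
traces = [
    ({"enemy-visible", "has-ammo", "at-spawn"}, "attack",
     {"enemy-visible", "has-ammo", "at-spawn", "firing"}),
    ({"enemy-visible", "has-ammo", "near-door"}, "attack",
     {"enemy-visible", "has-ammo", "near-door", "firing"}),
]
print(learn_operator_model(traces))
# -> {'attack': ({'enemy-visible', 'has-ammo'}, {'firing'})}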

Page 28: Planning and Learning in Games

The Future

Page 29: Planning and Learning in Games

Where to learn more

• AI and Interactive Digital Entertainment Conference
  • Marina del Rey, June 2006
• Journal of Game Development
  • Charles River Media
• Game Developer Magazine
  • August special issue on AI
• Game Developer’s Conference
• AI Game Programming Wisdom book series
• Historical:
  • 2005 IJCAI workshop on Reasoning, Representation and Learning in Computer Games
  • AAAI Spring Symposiums 1999 – 2003
  • 2004 AAAI Workshop

Page 30: Planning and Learning in Games

Interesting observations

• A few of my own:
  • The most challenging opponent isn’t the most fun.
  • “Never stupid” is better than “sometimes brilliant.”
  • Never underestimate the player’s ability to see intelligence where there is none.
  • Game companies aren’t a source of research funds.
• A few of Will Wright’s:
  • Maximize the ratio of perceived intelligence to internal complexity.
  • The player will build an internal model of your system. If you don’t help them build it, they’ll probably build the wrong one.
  • The flow of information about a system has a huge impact on the player’s perception of its intelligence.
  • From the player’s point of view there is a fine line between complex behavior and random behavior.