Mary Lou Maher University of Sydney AAAI AI and Fun Workshop July 2010

Curious Characters in Multiuser Games: A Study in Motivated Reinforcement Learning for Creative Behavior Policies*


* Based on Merrick, K. and Maher, M.L. (2009) Motivated Reinforcement Learning: Curious Characters for Multiuser Games, Springer.

Outline
Curiosity and Fun
Motivation
Motivated Reinforcement Learning
An Agent Model of a Curious Character
Evaluation of Behavior Policies

Can AI model Fun?

Claim: An agent motivated by curiosity to learn patterns is a model of fun.

Games try to achieve flow: a function of the player's skill and performance.

J. Chen, Flow in games (and everything else). Communications of the ACM 50(4):31-34, 2007

Why Motivated Reinforcement Learning?

More efficient learning: Complement external reward with internal reward

External reward not known at design time
Design tasks
Real-world scenarios: robotics
Virtual-world scenarios: NPCs in computer games

More autonomy in determining learning tasks
Robotics
NPCs in computer games

Models of Motivation

Cognitive: interest, competency, challenge

Biological: stasis variables (energy, blood pressure, etc.)

Social: conformity, peer pressure

MRL Agent Model

Motivation as Interesting Events

[Figure: interest (pleasantness) plotted against novelty (intensity), with component curves F+ and F-; vertical axis from -1 to 1, horizontal axis from 0 to 2.]

An event is a change in observations: $O(t) - O(t') = (\Delta(o_1(t), o_1(t')), \Delta(o_2(t), o_2(t')), \ldots, \Delta(o_L(t), o_L(t')), \ldots)$
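The event definition above can be sketched in a few lines of Python. This is a minimal illustration, not the model from the talk: it assumes that Δ is plain subtraction, that a sensation missing from one observation counts as 0 there, and that an observation is stored as a dictionary from sensation label to value. The names `Observation` and `event` are hypothetical.

```python
from typing import Dict

# Hypothetical encoding: sensation label -> numeric value.
Observation = Dict[str, float]


def event(current: Observation, previous: Observation) -> Observation:
    """Return the non-zero differences between two observations.

    Assumption: delta is plain subtraction, and a sensation that is
    missing from one observation is treated as 0 in that observation.
    """
    labels = sorted(set(current) | set(previous))
    deltas = {k: current.get(k, 0.0) - previous.get(k, 0.0) for k in labels}
    # Keep only the sensations that actually changed.
    return {k: d for k, d in deltas.items() if d != 0.0}


before = {"location": 1.0, "pick": 1.0}
after = {"location": 2.0, "pick": 1.0, "iron": 1.0}
print(event(after, before))
```

Under these assumptions, an unchanged observation produces an empty event, so only changes can attract the agent's attention.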

D.E. Berlyne, Exploration and Curiosity, Science 153:24-33, 1966
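One common way to obtain an interest curve of this shape is to subtract a "punishment" sigmoid (F-) from a "reward" sigmoid (F+), so that interest peaks at moderate novelty. The sketch below assumes that form; the function names and all parameter values are illustrative choices, not values taken from the talk.

```python
import math


def sigmoid(x: float, slope: float, midpoint: float) -> float:
    """Logistic function rising from 0 to 1 around `midpoint`."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def interest(novelty: float,
             reward_slope: float = 10.0, reward_mid: float = 0.5,
             punish_slope: float = 10.0, punish_mid: float = 1.5) -> float:
    """Interest as reward (F+) minus punishment (F-).

    All default parameters are assumptions chosen for illustration.
    """
    f_plus = sigmoid(novelty, reward_slope, reward_mid)
    f_minus = sigmoid(novelty, punish_slope, punish_mid)
    return f_plus - f_minus


for n in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"novelty={n:.1f} interest={interest(n):.3f}")
```

With these defaults, interest is low for both very familiar and very novel stimuli and highest in between, which is the qualitative behaviour the curve is meant to capture.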

Sensed States: Context-Free Grammar (CFG)

CFG = (V_S, Γ_S, Ψ_S, S) where:
V_S is a set of variables or syntactic categories,
Γ_S is a finite set of terminals such that V_S ∩ Γ_S = {},
Ψ_S is a set of productions V -> v, where V is a variable and v is a string of terminals and variables,
S is the start symbol.

Thus, the general form of a sensed state is:
S -> <sensations>
<sensations> -> <PiSensations><sensations> | ε
<PiSensations> -> <sL><PiSensations> | ε
<sL> -> <number> | <string>

MRL for Non Player Characters

[Figure: game world containing the agent, four locations (mine, forest, smithy, carpenter's shop), and the tools pick, forge, lathe, and axe.]

W -> <agent><environment>
<agent> -> <location><inventory>
<location> -> <mine> | <smithy> | <forest> | <carpenter>
<mine> -> 1
<smithy> -> 2
<forest> -> 3
<carpenter> -> 4
<inventory> -> <objects>
<environment> -> <objects>
<objects> -> <object><objects> | ε
<object> -> <pick> | <forge> | <iron> | <weapons> | <axe> | <lathe> | <timber> | <furniture>
<pick> -> 1
<forge> -> 1
<iron> -> 1
<weapons> -> 1
<axe> -> 1
<lathe> -> 1
<timber> -> 1
<furniture> -> 1
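A world state that follows this grammar can be held in a small data structure and flattened into labelled sensations. The sketch below is one possible encoding, not the one used in the talk: the class name `WorldState`, the method `sense`, and the label scheme (`inventory.pick`, `environment.iron`) are assumptions made for illustration. Location codes and object values come from the grammar.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Terminal values taken from the grammar.
LOCATIONS = {"mine": 1, "smithy": 2, "forest": 3, "carpenter": 4}
OBJECTS = {"pick", "forge", "iron", "weapons",
           "axe", "lathe", "timber", "furniture"}


@dataclass
class WorldState:
    """Agent location, agent inventory, and objects in the environment."""
    location: str
    inventory: List[str] = field(default_factory=list)
    environment: List[str] = field(default_factory=list)

    def sense(self) -> Dict[str, int]:
        """Flatten the state into label -> value sensations.

        The label scheme is a hypothetical choice for this sketch.
        """
        if self.location not in LOCATIONS:
            raise ValueError(f"unknown location: {self.location}")
        for obj in self.inventory + self.environment:
            if obj not in OBJECTS:
                raise ValueError(f"unknown object: {obj}")
        state = {"location": LOCATIONS[self.location]}
        for obj in self.inventory:
            state[f"inventory.{obj}"] = 1  # every object has value 1
        for obj in self.environment:
            state[f"environment.{obj}"] = 1
        return state


state = WorldState("mine", inventory=["pick"], environment=["iron"])
print(state.sense())
```

Because objects appear only when present, the sensed state varies in length, which is the situation the grammar-based representation is designed to handle.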

Habituated Self Organizing Map

Behavioral Variety

Behavioural variety measures the number of events for which a near optimal policy is learned.

We characterise the level of optimality of a policy learned to achieve the event E(t) in terms of its structural stability.
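As a rough sketch of how this metric might be computed, the code below counts the events whose behaviour has become stable. The stability test is an assumption: a behaviour is treated as stable when the number of actions needed to repeat its event varies little over the most recent repetitions. The function names, the window size, and the tolerance are all hypothetical.

```python
from statistics import pstdev
from typing import Dict, List


def is_stable(cycle_lengths: List[int],
              window: int = 5, tolerance: float = 0.5) -> bool:
    """Assumed stability test: low spread over the last `window` repeats."""
    if len(cycle_lengths) < window:
        return False
    return pstdev(cycle_lengths[-window:]) <= tolerance


def behavioural_variety(cycles_by_event: Dict[str, List[int]],
                        window: int = 5, tolerance: float = 0.5) -> int:
    """Count events for which a stable behaviour has been learned."""
    return sum(is_stable(cycles, window, tolerance)
               for cycles in cycles_by_event.values())


# Actions taken between successive repetitions of each event (made-up data).
log = {
    "mine_iron": [9, 4, 3, 3, 3, 3, 3],
    "make_axe": [12, 7, 9, 4, 10],
    "cut_timber": [5, 5],
}
print(behavioural_variety(log))
```

Here only `mine_iron` has settled into a repeatable cycle, so the variety under these assumed thresholds is 1.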

[Figure: behavioural variety plotted against time; vertical axis from 0 to 20, horizontal axis from 0 to 20000.]

Behavioral Complexity

The complexity of a policy can be measured by averaging the mean number of actions $\bar{a}_{E(t)}$ required to repeat $E(t)$ at any time when the current behaviour is stable.
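A minimal sketch of this measure follows. It assumes the same kind of stability test as a low spread in the number of actions per repetition, and it reports the mean over the most recent stable window only. The function name, window size, and tolerance are hypothetical, and the data is made up.

```python
from statistics import mean, pstdev
from typing import List, Optional


def behavioural_complexity(cycle_lengths: List[int],
                           window: int = 5,
                           tolerance: float = 0.5) -> Optional[float]:
    """Mean actions per repetition over the last `window` repeats.

    Returns None when the behaviour is not stable under the assumed
    test (too few repeats, or spread above `tolerance`).
    """
    recent = cycle_lengths[-window:]
    if len(recent) < window or pstdev(recent) > tolerance:
        return None
    return mean(recent)


print(behavioural_complexity([9, 4, 3, 3, 3, 3, 3]))
print(behavioural_complexity([12, 7, 9, 4, 10]))
```

A stable behaviour that needs more actions per repetition scores higher, so the measure distinguishes short routines from longer multi-step ones.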

[Figure: behavioural complexity plotted against behaviour number; vertical axis from 0 to 8, horizontal axis from 0 to 20.]

Research Directions

Scalability and dynamics: different RL approaches, such as decision trees and neural network (NN) function approximation

Motivation functions: competence, optimal challenges, social models

Relevance to AI and Fun

Is it more fun to play with a curious NPC?

Can a curious agent play a game to test how fun a game is?
