Curious Characters in Multiuser Games: A Study in Motivated Reinforcement Learning for Creative Behavior Policies*
Mary Lou Maher, University of Sydney
AAAI AI and Fun Workshop, July 2010
* Based on Merrick, K. and Maher, M.L. (2009) Motivated Reinforcement Learning: Curious Characters for Multiuser Games, Springer.
Outline
- Curiosity and Fun
- Motivation
- Motivated Reinforcement Learning
- An Agent Model of a Curious Character
- Evaluation of Behavior Policies
Can AI model Fun?
Claim: An agent motivated by curiosity to learn patterns is a model of fun.
Games try to achieve flow: a function of the player's skill and performance.
J. Chen, Flow in games (and everything else). Communications of the ACM 50(4):31-34, 2007
Why Motivated Reinforcement Learning?
- More efficient learning: complement the external reward with an internal reward (see the sketch after this list)
- External reward not known at design time: design tasks; real-world scenarios such as robotics; virtual-world scenarios such as NPCs in computer games
- More autonomy in determining learning tasks, for robotics and for NPCs in computer games
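Below is a minimal sketch of the idea in Python: a tabular Q-learning update driven by a reward that combines an external reward with an internal, curiosity-based reward. The additive combination, the function name and the toy states are illustrative assumptions, not the formulation from the book.

def motivated_q_update(Q, state, action, next_state, actions,
                       external_reward, internal_reward,
                       alpha=0.1, gamma=0.9):
    """One Q-learning step using a combined external + internal reward."""
    reward = external_reward + internal_reward        # assumed additive combination
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

# When no external reward is available at design time, the internal reward
# alone can still drive learning:
Q = {}
motivated_q_update(Q, "forest", "chop", "forest_with_timber",
                   actions=["chop", "move"], external_reward=0.0,
                   internal_reward=0.8)
print(Q)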
Models of Motivation
- Cognitive: interest, competency, challenge
- Biological: stasis variables such as energy and blood pressure
- Social: conformity, peer pressure
MRL Agent Model
Motivation as Interesting Events
[Figure: Wundt curve. Interest/pleasantness (from -1 to 1) as a function of stimulus intensity/novelty (from 0 to 2), formed by combining a positive feedback curve F+ with a negative feedback curve F-.]
An event is a change in observations: O(t) - O(t') = (Δ(o1(t), o1(t')), Δ(o2(t), o2(t')), ..., Δ(oL(t), oL(t')), ...)
D.E. Berlyne, Exploration and Curiosity, Science 153:24-33, 1966
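A hedged Python sketch of the ideas on this slide: an event as an element-wise change in observations, and a Wundt-curve-style interest function that peaks at moderate novelty by subtracting a negative feedback curve F- from a positive feedback curve F+. The logistic form and all parameter values are illustrative assumptions, not the exact functions used in the study.

import math

def event(obs_before, obs_after):
    """An event as a change in observations: element-wise differences."""
    return [o2 - o1 for o1, o2 in zip(obs_before, obs_after)]

def interest(novelty, f_max=1.0, rho=10.0, turn_pos=0.5, turn_neg=1.5):
    f_pos = f_max / (1.0 + math.exp(-rho * (novelty - turn_pos)))   # F+
    f_neg = f_max / (1.0 + math.exp(-rho * (novelty - turn_neg)))   # F-
    return f_pos - f_neg

# Very familiar and overwhelmingly novel events are uninteresting;
# moderately novel events are the most interesting.
for n in (0.0, 1.0, 2.0):
    print(n, round(interest(n), 3))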
Sensed States: Context-Free Grammar (CFG)
CFG = (VS, ΓS, ΨS, S) where:
- VS is a set of variables or syntactic categories,
- ΓS is a finite set of terminals such that VS ∩ ΓS = {},
- ΨS is a set of productions V -> v, where V is a variable and v is a string of terminals and variables,
- S is the start symbol.
Thus, the general form of a sensed state is:
S -> <sensations>
<sensations> -> <PiSensations><sensations> | ε
<PiSensations> -> <sL><PiSensations> | ε
<sL> -> <number> | <string>
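As a small illustration (an assumed Python representation, not the book's code), the grammar allows any number of sensations, so a sensed state can be held as a variable-length list whose elements are numbers or strings, matching the <sL> -> <number> | <string> production. Agents in different situations can therefore sense different numbers of variables.

def is_valid_sensed_state(state):
    """Check each sensation is a terminal of the grammar: a number or a string."""
    return all(isinstance(s, (int, float, str)) for s in state)

sensed_state_a = [3, "axe"]                     # few sensations
sensed_state_b = [1, "pick", "forge", "iron"]   # richer situation, more sensations

for state in (sensed_state_a, sensed_state_b):
    print(len(state), "sensations, valid:", is_valid_sensed_state(state))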
MRL for Non-Player Characters
[Figure: example game world containing an agent, four locations (a mine, a forest, a smithy and a carpenter's shop) and objects such as a pick, a forge, a lathe and an axe.]
W -> <agent><environment>
<agent> -> <location><inventory>
<location> -> <mine> | <smithy> | <forest> | <carpenter>
<mine> -> 1
<smithy> -> 2
<forest> -> 3
<carpenter> -> 4
<inventory> -> <objects>
<environment> -> <objects>
<objects> -> <object><objects> | <object>
<object> -> <pick> | <forge> | <iron> | <weapons> | <axe> | <lathe> | <timber> | <furniture>
<pick> -> 1
<forge> -> 1
<iron> -> 1
<weapons> -> 1
<axe> -> 1
<lathe> -> 1
<timber> -> 1
<furniture> -> 1
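For example, one sensed state this grammar can generate describes an agent at the mine holding a pick, with a forge and some iron in the environment. The nested-dictionary Python form below is purely for readability and is an assumption, not the representation used in the game.

world_state = {
    "agent": {
        "location": 1,               # <mine> -> 1
        "inventory": {"pick": 1},    # <pick> -> 1
    },
    "environment": {"forge": 1, "iron": 1},
}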
Habituated Self-Organizing Map
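In this agent model the habituated self-organizing map acts as a novelty detector: an SOM clusters observed events, each neuron carries a habituation value that decays whenever that neuron wins, and the novelty of an event is the winning neuron's current habituation. The Python sketch below follows a common Stanley-style habituation update; the constants, class structure and the omission of a neighbourhood update are illustrative assumptions rather than the exact configuration used in the study.

import random

class HSOM:
    def __init__(self, n_neurons, dim, lr=0.5, tau=0.33, alpha=1.05):
        self.weights = [[random.random() for _ in range(dim)]
                        for _ in range(n_neurons)]
        self.habituation = [1.0] * n_neurons   # 1.0 means fully novel
        self.lr, self.tau, self.alpha = lr, tau, alpha

    def _winner(self, event):
        # Index of the neuron closest to the event (squared Euclidean distance).
        dists = [sum((w - e) ** 2 for w, e in zip(neuron, event))
                 for neuron in self.weights]
        return dists.index(min(dists))

    def novelty(self, event):
        """Cluster the event, habituate the winning neuron, return its novelty."""
        k = self._winner(event)
        # SOM update: pull the winner towards the event (no neighbourhood here).
        self.weights[k] = [w + self.lr * (e - w)
                           for w, e in zip(self.weights[k], event)]
        n = self.habituation[k]
        # Habituation: repeated wins drive the value towards a low equilibrium.
        self.habituation[k] += self.tau * (self.alpha * (1.0 - n) - 1.0)
        return n

hsom = HSOM(n_neurons=4, dim=3)
repeated_event = [1.0, 0.0, 1.0]
# Novelty of the same event drops as the map habituates to it.
print([round(hsom.novelty(repeated_event), 3) for _ in range(5)])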
Behavioral Variety
Behavioural variety measures the number of events for which a near-optimal policy is learned.
We characterise the level of optimality of a policy learned to achieve the event E(t) in terms of its structural stability.
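A hedged Python sketch of this measure: count the events whose learned policy has stopped changing over a recent window of snapshots, i.e. has become structurally stable. The window length and the dictionary-of-policy-snapshots input format are illustrative assumptions.

def behavioural_variety(policy_history, window=3):
    """policy_history: {event: [policy_at_t1, policy_at_t2, ...]},
    where each policy is a hashable snapshot (e.g. a tuple of greedy actions)."""
    variety = 0
    for event, snapshots in policy_history.items():
        recent = snapshots[-window:]
        if len(recent) == window and len(set(recent)) == 1:
            variety += 1   # policy for this event is stable, so count it as learned
    return variety

history = {
    "make_weapon": [("mine", "smith"), ("mine", "smith"), ("mine", "smith")],
    "make_chair":  [("chop", "lathe"), ("lathe", "chop"), ("chop", "lathe")],
}
print(behavioural_variety(history))   # 1: only the weapon-making policy is stable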
[Figure: behavioural variety over time. Variety (0 to 20) plotted against time (0 to 20,000 time steps).]
Behavioral Complexity
The complexity of a policy can be measured by averaging the mean number of actions ā_E(t) required to repeat E(t) at any time when the current behaviour is stable.
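A hedged Python sketch of this measure: for each event whose behaviour is stable, take the mean number of actions needed to repeat the event, then average across events. The input format of recorded run lengths is an illustrative assumption.

def behavioural_complexity(actions_to_repeat):
    """actions_to_repeat: {event: [n_actions_run1, n_actions_run2, ...]},
    recorded only while the behaviour for that event is stable."""
    if not actions_to_repeat:
        return 0.0
    means = [sum(runs) / len(runs) for runs in actions_to_repeat.values()]
    return sum(means) / len(means)

runs = {"make_weapon": [4, 4, 5], "make_furniture": [6, 7]}
print(round(behavioural_complexity(runs), 2))   # average policy length across events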
[Figure: behavioural complexity (0 to 8) for each learned behaviour, by behaviour number (0 to 20).]
Research Directions
- Scalability and dynamics: different RL approaches, such as decision trees and neural-network function approximation
- Motivation functions: competence, optimal challenge, social models
Relevance to AI and Fun
Is it more fun to play with a curious NPC?
Can a curious agent play a game to test how fun a game is?