Mary Lou Maher University of Sydney AAAI AI and Fun Workshop July 2010
Curious Characters in Multiuser Games: A Study in Motivated Reinforcement Learning for Creative Behavior Policies*
Mary Lou Maher, University of Sydney
AAAI AI and Fun Workshop, July 2010
* Based on Merrick, K. and Maher, M.L. (2009) Motivated Reinforcement Learning: Curious Characters for Multiuser Games, Springer.
Outline
Curiosity and Fun
Motivation
Motivated Reinforcement Learning
An Agent Model of a Curious Character
Evaluation of Behavior Policies
Can AI model Fun?
Claim: An agent motivated by curiosity to learn patterns is a model of fun.
Games try to achieve flow: a function of the player's skill and performance.
J. Chen, Flow in games (and everything else). Communications of the ACM 50(4):31-34, 2007
Why Motivated Reinforcement Learning?
More efficient learning: Complement external reward with internal reward
External reward not known at design time:
Design tasks
Real-world scenarios: robotics
Virtual-world scenarios: NPCs in computer games
More autonomy in determining learning tasks:
Robotics
NPCs in computer games
Models of Motivation
Cognitive: interest, competency, challenge
Biological: stasis variables such as energy and blood pressure
Social: conformity, peer pressure
MRL Agent Model
Motivation as Interesting Events
[Figure: interest (pleasantness, from -1 to 1) as a function of intensity/novelty (from 0 to 2), with curves labelled F+ and F-.]
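The slides do not give a formula for the interest curve. One common way to obtain a curve of this shape is to subtract a negative-feedback sigmoid (F-) from a positive-feedback sigmoid (F+), so that interest peaks at moderate novelty. The sketch below assumes that form; the function names and all parameter values are illustrative assumptions, not values taken from the talk.

```python
import math


def sigmoid(x: float, midpoint: float, slope: float, height: float) -> float:
    """Logistic curve rising from 0 to `height` around `midpoint`."""
    return height / (1.0 + math.exp(-slope * (x - midpoint)))


def interest(novelty: float,
             reward_mid: float = 0.5,   # assumed midpoint of F+
             punish_mid: float = 1.5,   # assumed midpoint of F-
             slope: float = 10.0,       # assumed steepness of both curves
             height: float = 1.0) -> float:
    """Interest as positive feedback minus negative feedback (assumed form)."""
    f_plus = sigmoid(novelty, reward_mid, slope, height)
    f_minus = sigmoid(novelty, punish_mid, slope, height)
    return f_plus - f_minus


if __name__ == "__main__":
    for n in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"novelty={n:.1f} interest={interest(n):.3f}")
```

With these assumed parameters, very familiar and very novel stimuli both score low, while moderately novel stimuli score highest.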
An event is a change in observations: $E(t) = O(t) - O(t') = (\Delta(o_1(t), o_1(t')), \Delta(o_2(t), o_2(t')), \ldots, \Delta(o_L(t), o_L(t')), \ldots)$
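The event definition above can be sketched as an element-wise difference between two observations. This is a minimal illustration under stated assumptions: observations are represented as dictionaries from a sensation label to a number, a sensation missing from one observation is treated as 0, and only non-zero differences are kept. The function name `event` and the labels are hypothetical.

```python
from typing import Dict

Observation = Dict[str, float]


def event(obs_now: Observation, obs_before: Observation) -> Dict[str, float]:
    """Return the non-zero per-sensation differences between two observations.

    Assumption: a sensation absent from an observation counts as 0.
    """
    diffs: Dict[str, float] = {}
    for label in sorted(set(obs_now) | set(obs_before)):
        change = obs_now.get(label, 0.0) - obs_before.get(label, 0.0)
        if change != 0.0:
            diffs[label] = change
    return diffs


if __name__ == "__main__":
    before = {"location": 1}
    now = {"location": 2, "pick": 1}
    print(event(now, before))
```

An unchanged sensation contributes nothing to the event, so identical observations yield an empty event.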
D.E. Berlyne, Curiosity and exploration. Science 153:25-33, 1966
Sensed States: Context-Free Grammar (CFG)
CFG = (VS, ΓS, ΨS, S) where:
VS is a set of variables or syntactic categories,
ΓS is a finite set of terminals such that VS ∩ ΓS = {},
ΨS is a set of productions V -> v where V is a variable and v is a string of terminals and variables,
S is the start symbol.
Thus, the general form of a sensed state is:
S -> <sensations>
<sensations> -> <PiSensations><sensations> | ε
<PiSensations> -> <sL><PiSensations> | ε
<sL> -> <number> | <string>
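One way to read the grammar above is as a variable-length list of sensor groups (<PiSensations>), each holding a variable-length list of sensations that are numbers or strings. The sketch below checks a Python value against that reading. The type aliases and the function name `is_sensed_state` are assumptions made for illustration.

```python
from typing import List, Union

Sensation = Union[int, float, str]   # <sL> -> <number> | <string>
SensorGroup = List[Sensation]        # <PiSensations>
SensedState = List[SensorGroup]      # <sensations>


def is_sensed_state(state: object) -> bool:
    """Check that `state` is a list of lists of numbers or strings.

    Empty lists are allowed at both levels, mirroring the ε productions.
    """
    if not isinstance(state, list):
        return False
    for group in state:
        if not isinstance(group, list):
            return False
        for sensation in group:
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(sensation, bool):
                return False
            if not isinstance(sensation, (int, float, str)):
                return False
    return True


if __name__ == "__main__":
    print(is_sensed_state([[1, "axe"], []]))
    print(is_sensed_state([[None]]))
```

Because both recursive productions can end in ε, the number of sensor groups and the number of sensations per group can change between time steps.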
MRL for Non Player Characters
[Figure: game world showing the agent and four locations (mine, forest, smithy, carpenter's shop) with a pick, forge, lathe, and axe.]
W -> <agent><environment>
<agent> -> <location><inventory>
<location> -> <mine> | <smithy> | <forest> | <carpenter>
<mine> -> 1
<smithy> -> 2
<forest> -> 3
<carpenter> -> 4
<inventory> -> <objects>
<environment> -> <objects>
<objects> -> <object><objects> | ε
<object> -> <pick> | <forge> | <iron> | <weapons> | <axe> | <lathe> | <timber> | <furniture>
<pick> -> 1
<forge> -> 1
<iron> -> 1
<weapons> -> 1
<axe> -> 1
<lathe> -> 1
<timber> -> 1
<furniture> -> 1
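As an illustration of the world grammar above, the sketch below builds one sensed state for the NPC from its location, its inventory, and the objects in its environment. The location codes and the value 1 for a present object come from the grammar; the function name `sense_world` and the label prefixes `inventory.` and `environment.` are hypothetical choices made for this example.

```python
from typing import Dict, Iterable

# Location codes and object names as given by the grammar.
LOCATIONS = {"mine": 1, "smithy": 2, "forest": 3, "carpenter": 4}
OBJECTS = ("pick", "forge", "iron", "weapons",
           "axe", "lathe", "timber", "furniture")


def sense_world(location: str,
                inventory: Iterable[str],
                environment: Iterable[str]) -> Dict[str, int]:
    """Flatten a world state into labelled sensations.

    Objects that are present are sensed as 1; absent objects are omitted,
    so the state length varies with what the agent can sense.
    """
    state = {"location": LOCATIONS[location]}
    for prefix, names in (("inventory", inventory),
                          ("environment", environment)):
        for name in names:
            if name not in OBJECTS:
                raise ValueError(f"unknown object: {name}")
            state[f"{prefix}.{name}"] = 1
    return state


if __name__ == "__main__":
    print(sense_world("mine", ["pick"], ["iron"]))
```

Omitting absent objects keeps the representation open-ended: a new object type only needs a new label, not a wider fixed-length vector.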
Habituated Self Organizing Map
Behavioral Variety
Behavioural variety measures the number of events for which a near-optimal policy is learned.
We characterise the level of optimality of a policy learned to achieve the event E(t) in terms of its structural stability.
[Figure: behavioural variety (0 to 20) plotted against time (0 to 20,000).]
Behavioral Complexity
The complexity of a policy can be measured by averaging the mean number of actions $\bar{a}_{E(t)}$ required to repeat E(t) at any time when the current behaviour is stable.
[Figure: behavioural complexity (0 to 8) plotted against behaviour number (0 to 20).]
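The complexity measure described above can be sketched as a per-event mean of action counts. This sketch assumes the counts have already been recorded only while the behaviour was stable; how stability is detected is not covered here. The function name and the event names are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def behavioural_complexity(
        actions_per_repeat: Dict[str, List[int]]) -> Dict[str, float]:
    """Mean number of actions needed to repeat each event.

    `actions_per_repeat` maps an event to the number of actions taken in
    each repetition observed while the behaviour was stable (assumed input).
    Events with no recorded repetitions are skipped.
    """
    return {
        event: mean(counts)
        for event, counts in actions_per_repeat.items()
        if counts
    }


if __name__ == "__main__":
    log = {"make_furniture": [4, 6], "mine_iron": [2, 2, 2], "idle": []}
    print(behavioural_complexity(log))
```

A higher mean indicates that the agent has learned a longer action cycle to reproduce that event.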
Research Directions
Scalability and dynamics: different RL approaches, such as decision trees and neural network (NN) function approximation
Motivation functions: competence, optimal challenges, social models
Relevance to AI and Fun
Is it more fun to play with curious NPCs?
Can a curious agent play a game to test how fun a game is?