Mary Lou Maher University of Sydney AAAI AI and Fun Workshop July 2010


Page 1

Curious Characters in Multiuser Games: A Study in Motivated Reinforcement Learning for Creative Behavior Policies*

Mary Lou Maher
University of Sydney

AAAI AI and Fun Workshop
July 2010

* Based on Merrick, K. and Maher, M.L. (2009) Motivated Reinforcement Learning: Curious Characters for Multiuser Games, Springer.

Page 2

Outline

  Curiosity and Fun
  Motivation
  Motivated Reinforcement Learning
  An Agent Model of a Curious Character
  Evaluation of Behavior Policies

Page 3

Can AI model Fun?

Claim: An agent motivated by curiosity to learn patterns is a model of fun.

Page 4

Games try to achieve flow: a function of the player's skill and performance

J. Chen, Flow in games (and everything else). Communications of the ACM 50(4):31-34, 2007

Page 5

Why Motivated Reinforcement Learning?

More efficient learning: Complement external reward with internal reward

External reward not known at design time:
  Design tasks
  Real world scenarios: Robotics
  Virtual world scenarios: NPCs in computer games

More autonomy in determining learning tasks:
  Robotics
  NPCs in computer games

Page 6

Models of Motivation

Cognitive:
  Interest
  Competency
  Challenge

Biological:
  Stasis variables: energy, blood pressure, etc.

Social:
  Conformity
  Peer pressure

Page 7

MRL Agent Model

Page 8

Motivation as Interesting Events

[Figure: curves labelled F+, F- and Interest. x-axis: Intensity / Novelty (0 to 2); y-axis: Pleasantness / Interest (-1 to 1).]
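A plot with a positive curve F+, a negative curve F- and a combined Interest curve is commonly modelled as the difference of two sigmoids over novelty. The following is a minimal sketch under that assumption; the slide shows only the plot, so the function names, the sigmoid form and every parameter value here are illustrative rather than taken from the talk.

```python
import math


def sigmoid(x, slope, midpoint):
    """Logistic curve rising from 0 to 1 around `midpoint`."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def interest(novelty, slope=10.0, reward_mid=0.5, punish_mid=1.5):
    """Hypothetical interest value: positive feedback minus negative feedback.

    All parameter values are assumptions chosen so that the peak falls
    near the middle of the 0..2 novelty range shown on the slide's axis.
    """
    f_plus = sigmoid(novelty, slope, reward_mid)    # reward for novelty
    f_minus = sigmoid(novelty, slope, punish_mid)   # aversion to too much novelty
    return f_plus - f_minus


if __name__ == "__main__":
    for n in (0.0, 0.5, 1.0, 1.5, 2.0):
        print(f"novelty={n:.1f} interest={interest(n):.3f}")
```

With these assumed parameters, moderate novelty scores higher than either very low or very high novelty, which is the qualitative shape the figure conveys.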

Event is a change in observations: $O(t) - O(t') = (\Delta(o_1(t), o_1(t')), \Delta(o_2(t), o_2(t')), \ldots, \Delta(o_L(t), o_L(t')), \ldots)$
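As a rough illustration of an event as an element-wise difference between two observations, the sketch below stores each observation as a dictionary from sensation label to numeric value. The names `delta` and `event` are hypothetical, and the treatment of a sensation that is present in only one observation (counted as appearing or disappearing) is an assumption; the slide does not define the difference function.

```python
def delta(new, old):
    """Difference for one sensation; None means the sensation is absent."""
    if old is None:
        return new       # assumption: a newly sensed value counts in full
    if new is None:
        return -old      # assumption: a vanished value counts negatively
    return new - old


def event(obs_now, obs_before):
    """Return the labelled difference vector between two observations."""
    labels = sorted(set(obs_now) | set(obs_before))
    return {
        label: delta(obs_now.get(label), obs_before.get(label))
        for label in labels
    }


if __name__ == "__main__":
    before = {"location": 1, "pick": 1}
    now = {"location": 2, "iron": 1}
    print(event(now, before))
```

Keying the vector by label rather than by position lets observations of different lengths be compared, which matters when objects enter or leave the sensed state.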

D.E. Berlyne, Exploration and Curiosity, Science 153:24-33, 1966

Page 9

Sensed States: Context Free Grammar (CFG)

CFG = (V_S, Γ_S, Ψ_S, S) where:
  V_S is a set of variables or syntactic categories,
  Γ_S is a finite set of terminals such that V_S ∩ Γ_S = {},
  Ψ_S is a set of productions V -> v where V is a variable and v is a string of terminals and variables,
  S is the start symbol.

Thus, the general form of a sensed state is:
  S -> <sensations>
  <sensations> -> <PiSensations><sensations> | ε
  <PiSensations> -> <sL><PiSensations> | ε
  <sL> -> <number> | <string>

Page 10

MRL for Non Player Characters

[Figure: game world. Labels: Agent; Mine; Forest; Smithy; Carpenter's shop; Pick; Forge; Lathe; Axe.]

W -> <agent><environment>
<agent> -> <location><inventory>
<location> -> <mine> | <smithy> | <forest> | <carpenter>
<mine> -> 1
<smithy> -> 2
<forest> -> 3
<carpenter> -> 4
<inventory> -> <objects>
<environment> -> <objects>
<objects> -> <object><objects> | ε
<object> -> <pick> | <forge> | <iron> | <weapons> | <axe> | <lathe> | <timber> | <furniture>
<pick> -> 1
<forge> -> 1
<iron> -> 1
<weapons> -> 1
<axe> -> 1
<lathe> -> 1
<timber> -> 1
<furniture> -> 1
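To make the grammar concrete, here is a small sketch that builds one sensed state for this world as a dictionary of labelled sensations. The location codes and the value 1 for each object come from the productions; the function name `sense` and the `inventory.` / `environment.` label prefixes are assumptions made for illustration only.

```python
# Terminal values taken from the grammar's productions.
LOCATIONS = {"mine": 1, "smithy": 2, "forest": 3, "carpenter": 4}
OBJECTS = ("pick", "forge", "iron", "weapons",
           "axe", "lathe", "timber", "furniture")


def sense(location, inventory, environment):
    """Build a sensed state: <agent><environment> as labelled sensations."""
    state = {"location": LOCATIONS[location]}
    for prefix, names in (("inventory", inventory),
                          ("environment", environment)):
        for name in names:
            if name not in OBJECTS:
                raise ValueError(f"unknown object: {name}")
            # Every object production in the grammar yields the terminal 1.
            state[f"{prefix}.{name}"] = 1
    return state


if __name__ == "__main__":
    print(sense("smithy", ["iron"], ["forge"]))
```

Because `<objects>` may expand to any number of `<object>` symbols, two states can differ in length; a label-keyed representation keeps them comparable.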

Page 11

Habituated Self Organizing Map

Page 12

Behavioral Variety

Behavioural variety measures the number of events for which a near-optimal policy is learned.

We characterise the level of optimality of a policy learned to achieve the event E(t) in terms of its structural stability.

[Figure: Behavioural Variety (y-axis, 0 to 20) plotted against Time (x-axis, 0 to 20000).]

Page 13

Behavioral Complexity

The complexity of a policy can be measured by averaging the mean numbers of actions $\bar{a}_{E(t)}$ required to repeat $E(t)$ at any time when the current behaviour is stable.
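The averaging described above might be computed as follows. This is a sketch under stated assumptions: the input is a hypothetical log mapping each event to the number of actions taken on each repetition while the behaviour was stable, and how stability is detected is left outside the sketch because the slide does not specify it.

```python
from statistics import mean


def behavioural_complexity(actions_per_repeat):
    """Average, over events, of the mean action count needed to repeat each event.

    `actions_per_repeat` maps an event name to a list of action counts,
    one per repetition observed while the behaviour was stable.
    Events with no recorded repetitions are skipped.
    """
    per_event = {
        name: mean(counts)
        for name, counts in actions_per_repeat.items()
        if counts
    }
    if not per_event:
        raise ValueError("no stable repetitions recorded")
    return mean(per_event.values()), per_event


if __name__ == "__main__":
    log = {"make_iron": [4, 4, 4], "cut_timber": [2, 2], "unseen": []}
    overall, per_event = behavioural_complexity(log)
    print(overall, per_event)
```

Returning the per-event means alongside the overall average makes it possible to plot complexity per behaviour, as the slide's chart does.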

[Figure: Behavioural Complexity (y-axis, 0 to 8) plotted against Behaviour Number (x-axis, 0 to 20).]

Page 14

Research Directions

Scalability and dynamics: different RL approaches, such as decision trees and NN function approximation

Motivation functions: competence, optimal challenges, social models

Page 15

Relevance to AI and Fun

Is it more fun to play with a curious NPC?

Can a curious agent play a game to test how fun a game is?