MACHINE LEARNING: A presentation by Matthew Dilts


Why use learning?

To solve problems that would take a long time to solve manually, e.g. what's the best strategy in a certain game?

Learning AI can adapt to conditions that cannot be anticipated prior to a game's release (such as individual players' tastes and strategies).

http://www.youtube.com/watch?v=SbipWOeqF1c

Why hasn't it been done very often before?

Until recently, the lack of precedent for the successful application of learning in a mainstream top-rated game meant that the technology was unproven and hence perceived as high risk. (Software companies are wusses!)

Learning algorithms are frequently associated with techniques such as neural networks and genetic algorithms, which are difficult to apply in-game due to their relatively low efficiency.

Some games which have used AI learning in the past

Different Learning/Adaptation Strategies

Faking It
Indirect Adaptation
Direct Adaptation
Supervised Learning
Unsupervised Learning

Faking It

Simply degrade an AI that performs very well through the addition of random errors, then over time reduce the number of random errors.

Advantages:
- The 'rate of learning' can be carefully controlled and specified prior to release, as can the behavior of the AI at each stage of its development.
- The state of the AI at any point in time is independent of the details of the player's interaction with the game, simplifying debugging and testing.

Disadvantage:
- It doesn't actually solve any of the problems that a genuinely learning AI would solve.
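A minimal Python sketch of faking it (the strong AI, the move names, and the decay schedule are all hypothetical):

import random

def faked_learning_move(best_move, legal_moves, error_rate):
    # With probability error_rate, discard the strong AI's move for a
    # random legal one; lowering error_rate over time 'fakes' learning.
    if random.random() < error_rate:
        return random.choice(legal_moves)
    return best_move

# Designer-controlled schedule: start at 40% errors, decay each encounter.
error_rate = 0.40
for encounter in range(10):
    move = faked_learning_move("optimal_attack",
                               ["optimal_attack", "wait", "retreat"],
                               error_rate)
    error_rate *= 0.8

Because the schedule is fixed before release, the AI's apparent skill at any point is fully predictable, which is exactly the debugging advantage listed above.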

Indirect Adaptation

AI Agent gathers information which is then used by a “conventional” AI layer to adapt the agent’s behavior.

For example, calculate optimal camping locations in an FPS, then hand them over to the AI layer.

Advantages:
- The information about the game world upon which the changes in behavior are based can often be extracted very easily and reliably, resulting in fast and effective adaptation.
- Since changes in behavior are made by a conventional AI layer, they are well defined and controlled, and hence easy to debug and test.

Disadvantages:
- It requires both the information to be learned and the changes in behavior that occur in response to it to be defined a priori by the AI designer.
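A minimal sketch of the split between the learning layer and the conventional AI layer, assuming a hypothetical grid-cell map representation:

from collections import Counter

# Learning layer: merely record where the player gets killed (cheap, reliable).
player_death_cells = Counter()

def record_player_death(grid_cell):
    player_death_cells[grid_cell] += 1

# Conventional AI layer: consume the statistic in a well-defined way.
def best_camping_cell(default_cell):
    # Camp wherever the player has died most often, if we have any data.
    if not player_death_cells:
        return default_cell
    return player_death_cells.most_common(1)[0][0]

record_player_death((12, 4))
record_player_death((12, 4))
record_player_death((3, 9))
print(best_camping_cell((0, 0)))   # -> (12, 4)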

For fun: http://www.20q.net/

Direct Adaptation

Using learning algorithms to adapt an agent’s behavior directly, usually by testing modifications to it in the game world to see if it can be improved.

Consider a game with no built-in AI whatsoever which evolves rules for controlling AI agents as the game is played. Such a system would be Direct Adaptation.

This type of learning closely mimics human learning, and it is a very bright-eyed, idealistic way to think about learning, but it is not very applicable in its most general form.

Advantages of Direct Adaptation

Direct adaptation is the ultimate form of AI.
http://www.youtube.com/watch?v=s2CS9XijFvs

All the behaviors developed by the AI agents would be learned from their experience in the game world, and would therefore be unconstrained by the preconceptions of the AI designer.

The evolution of the AI would be open ended in the sense that there would be no limit to the complexity and sophistication of the rule sets, and hence the behaviors that could evolve.

Disadvantages of Direct Adaptation

A measure of the agent’s performance must be developed that reflects the real aim of learning and the role of the agent in-game.

Each agent’s performance must be evaluated over a substantial period of time to minimize the impact of random events on the measured performance.

Too many evaluations are likely to be required for each agent’s performance to be measured against a representative sample of human opponents.

The lack of constraints on the types of behavior that can develop makes it impossible to guarantee that the game would remain playable once adaptation had begun. Testing for such cases would be difficult.

Improving Adaptation

Direct adaptation in particular works best when applied to specific subsets of an overall AI goal.

Incorporate as much prior knowledge as possible.

Design a good performance measure. This can be extremely difficult because:
- Many alternative measures of apparently equal merit often exist, requiring an arbitrary choice to be made between them.
- The most obvious or logical measure of performance might produce the same value for wide ranges of parameter values, providing little guide as to how to choose between them.
- Carelessly designed performance measures can encourage undesirable behavior, or introduce locally optimal behavior.

Methods of Learning

Learn by Optimization: Search for sets of parameters that make the agent perform well in-game.

Learn by Reinforcement: Learn the relationship between an action taken by the agent in a particular state of the game world and the performance of the agent.

Learn by Imitation: Imitate a player, or allow the player to provide performance assessments.

http://youtube.com/watch?v=swxtKrVb0m0&feature=related (3:45)
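As a sketch of the first method (learning by optimization): a simple hill climber that perturbs the agent's parameters and keeps any change that improves a performance measure. The parameters and the measure here are hypothetical:

import random

def learn_by_optimization(evaluate, params, step=0.1, iterations=200):
    # Perturb one parameter at a time; keep the change only if the
    # agent's measured in-game performance improves.
    best_score = evaluate(params)
    for _ in range(iterations):
        candidate = dict(params)
        key = random.choice(list(candidate))
        candidate[key] += random.uniform(-step, step)
        score = evaluate(candidate)
        if score > best_score:
            params, best_score = candidate, score
    return params

# Hypothetical measure: performance peaks at aggression 0.7, range 0.3.
measure = lambda p: -((p["aggression"] - 0.7) ** 2 + (p["range"] - 0.3) ** 2)
print(learn_by_optimization(measure, {"aggression": 0.5, "range": 0.5}))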

Improving Learning

Avoid Locally Optimal Behaviors: Behaviors that, while not the best possible, can't be improved upon by making small changes.

Minimize Dependencies: Multiple learned behaviors can depend on one another. Example: an AI with both a 'location' algorithm and a 'weapon choice' algorithm.

Avoid Overfitting: Overfitting means an agent has adapted its behavior to a very specific set of states of the game world and performs poorly in other states.

Explore and Exploit: Should the AI explore new strategies or repeat what it already knows?
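A standard way to balance the two is an epsilon-greedy rule; a sketch with hypothetical strategy names and estimated win rates:

import random

def choose_strategy(known_values, epsilon=0.1):
    # Usually exploit the best-known strategy, but with probability
    # epsilon explore a random one to keep learning.
    if random.random() < epsilon:
        return random.choice(list(known_values))      # explore
    return max(known_values, key=known_values.get)    # exploit

values = {"rush": 0.62, "camp": 0.48, "flank": 0.55}  # estimated win rates
print(choose_strategy(values))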

Computational Requirements

Indirect adaptation works well in-game because the agent's behavior is determined by the AI layer, which is specified during the game's design phase.

Direct adaptation can be performed during a game's development, but in-game it is limited to narrow, specific problems, and only when guided by a well-thought-out heuristic.

Requirements for Adaptive Game AI

Handle learning opportunities efficiently: When do we calculate what has been learned?

Generate effective novel behaviors: The AI should experiment plausibly and effectively.

Be robust with respect to nondeterminism: Be aware that sometimes random choices will work well just by sheer luck.

Require minimal computational resources: Typically AI has access to only a very small proportion of a machine's resources.

Including Adaptive AI in games

Potentially NPCs could have adaptive AI. Why not?

Adaptive AI takes too long to learn and, in general, the search space of behaviors is too large and complex to be explored quickly and efficiently.

Solution: Dynamic Scripting. An adaptive mechanism that uses domain knowledge to restrict the search space enough to make real-time adaptation a reality, while ensuring that ALL NPC behaviors are always plausible.

We can also go back and consider how dynamic scripting satisfies each item on the "Requirements for Adaptive Game AI" slide as well.

Dynamic Scripting

At a tactical level, the number of possible states and actions in a game can be huge, while the number of learning opportunities is often relatively small. In these circumstances, reinforcement learning is unlikely to achieve acceptable results unless the size of the state-action space can be dramatically reduced. This is the approach of dynamic scripting.

This is different from plain reinforcement learning because (although reinforcement learning may be involved) it is a strategy for reducing the number of state-action pairs that can actually be used, and it can also adapt more easily to the strategy in use by the opponent.

Now, how do we actually go about implementing this?

Maintain a rulebase created by the AI programmer

The first step in creating an application of dynamic scripting is to create the rulebases from which it will build its scripts.

Example rules: use a melee attack against the closest enemy. Another rule could be to use a special ability at a certain time on a certain target.

Using the rulebases for each NPC, we then create our script: the strategy the NPCs will use against their foes.

We also need to set priorities so the NPC can determine the order in which to take its actions. Pulling out a weapon should have a higher priority than attacking, for example, since you need to do it first.
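As a sketch, a rulebase can be as simple as a list of weighted, prioritized entries (all rule names here are hypothetical):

# Weights drive how likely a rule is to be picked for a script;
# priorities order the chosen rules so prerequisites come first.
fighter_rulebase = [
    {"rule": "draw_weapon",             "priority": 1, "weight": 1.0},
    {"rule": "special_ability_if_hurt", "priority": 2, "weight": 1.0},
    {"rule": "melee_closest_enemy",     "priority": 3, "weight": 1.0},
    {"rule": "retreat_at_low_health",   "priority": 4, "weight": 1.0},
]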

Building our Script

Choose rules in a random or weighted-random fashion before an encounter with the player.

Choosing the right size for the script is important.

If you set the size too small, the AI won’t be very complex and won’t seem very smart.

If you set the size too large, the AI might spend all its time on high-priority tasks and never get to the lower-priority ones. This would be a problem, for example, for our fighter mentioned earlier if a vast number of potential high-priority tasks were defined.
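A sketch of script building under those constraints, reusing the hypothetical fighter_rulebase above: draw rules weighted-randomly without replacement, then sort the chosen rules by priority:

import random

def build_script(rulebase, script_size):
    # Weighted-random draw of script_size rules, then order them by
    # priority so drawing a weapon precedes attacking with it.
    pool = list(rulebase)
    chosen = []
    for _ in range(min(script_size, len(pool))):
        weights = [r["weight"] for r in pool]
        pick = random.choices(pool, weights=weights, k=1)[0]
        pool.remove(pick)
        chosen.append(pick)
    return sorted(chosen, key=lambda r: r["priority"])

script = build_script(fighter_rulebase, script_size=3)
print([r["rule"] for r in script])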

Fitness

Dynamic scripting requires a fitness function that will assign a value to the results of an encounter that indicates how well the dynamically scripted AI performed.

Evaluate the "fitness" value of the encounter afterwards. Based on these evaluations, we'll change the weights on the rules that we used to create the script.

This fitness function is game specific and needs to be carefully designed by the developers in advance.

Consider individual performance and also team performance. If one bot's performance was terrible but the team's performance was great, maybe its performance wasn't so terrible after all? Then again, we can't consider only team performance, because maybe that bot is dragging the rest of the team down and they could do even better without it.
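One plausible blend of the two (every field and weight here is hypothetical, since the real function is game-specific):

def encounter_fitness(npc, team, w_individual=0.5):
    # Blend individual and team performance so a bot is neither punished
    # for a sacrifice that helped the team nor rewarded for dragging
    # down an otherwise winning team.
    individual = npc["damage_dealt"] / max(1.0, npc["damage_dealt"] + npc["damage_taken"])
    team_score = 1.0 if team["won"] else team["surviving_members"] / team["size"]
    return w_individual * individual + (1.0 - w_individual) * team_score

npc  = {"damage_dealt": 120.0, "damage_taken": 80.0}
team = {"won": False, "surviving_members": 2, "size": 4}
print(encounter_fitness(npc, team))   # 0.5 * 0.6 + 0.5 * 0.5 = 0.55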

Weights

After each encounter, use the fitness function's result to update the weights of the rules in the rulebase.

There are many possibilities for the design of the update formula; the book has its favorite, but it doesn't matter much which one you use.

The main goal here is to make sure that your dynamically learning NPCs generate variability, exploring new behaviors to adapt to the player's strategy while still utilizing strategies that work.
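One possible update (not the book's exact formula): reward the rules that appeared in the script when fitness beats a break-even baseline, punish them when it falls short, and clamp so every rule stays selectable:

def update_weights(rulebase, used_rules, fitness,
                   baseline=0.5, learning_rate=0.2, w_min=0.1, w_max=5.0):
    # fitness > baseline strengthens the script's rules; fitness < baseline
    # weakens them. Clamping keeps weights in a sane, explorable range.
    for rule in rulebase:
        if rule["rule"] in used_rules:
            new_w = rule["weight"] + learning_rate * (fitness - baseline)
            rule["weight"] = min(w_max, max(w_min, new_w))

Keeping a floor on the weights is what preserves variability: even a rule that has failed repeatedly can still be drawn and re-tested against a player who has changed strategy.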

Difficulty Scaling in a game

In games we don't want to create the ultimate AI that defeats the player every time. Not everyone is this good: http://www.youtube.com/watch?v=Jen46qkZVNI&feature=related (3:10)

Therefore we do the following:

When the computer loses a fight, change the weights such that the computer focuses on successful behaviors instead of experimenting with new ones.

When the computer wins a fight, change the weights such that the computer focuses on varying and experimental strategies instead of ones that it knows are successful against the particular player.

If these two things happen correctly, the player will find themselves facing an AI that beats them roughly 50% of the time.
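A hypothetical knob for this, in the spirit of the sketches above: treat the amount of experimentation as an exploration rate that rises after wins and falls after losses:

def adjust_exploration(epsilon, computer_won, step=0.05,
                       eps_min=0.02, eps_max=0.5):
    # After a win, explore more (weaken play with experimental scripts);
    # after a loss, exploit the rules already known to work.
    if computer_won:
        return min(eps_max, epsilon + step)
    return max(eps_min, epsilon - step)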

Supervised Learning

Supervised learning is a machine learning technique for creating a function from training data. The training data consist of pairs of input objects (typically vectors), and desired outputs. The output of the function can be a continuous value, or can predict a class label of the input object. The task of the supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e. pairs of input and target output). To achieve this, the learner has to generalize from the presented data to unseen situations in a "reasonable" way.

Unsupervised learning

Unsupervised learning is a type of machine learning where manual labels of inputs are not used. It is distinguished from supervised learning approaches, which learn how to perform a task, such as classification or regression, using a set of human-prepared examples.

Un/Supervised and In/Direct Adaptation interactions

In reality these 4 different strategies intermingle. You can have direct supervised, indirect unsupervised, etc.

An example: supervised direct adaptation might be a self-learning AI that the developers run for several days' worth of run/test time before releasing the game to the public, in order to figure out some of the best strategies for their AI to implement.

Pattern Recognition and Sequential Prediction

The concept: Many video games, or games in general, boil down to extremely overcomplicated versions of rock-paper-scissors. In a fighting game you have abilities such as kick/punch/block, and many abilities are good at countering other abilities.

If a bot can predict what the player is going to do more reliably, it can win more reliably.

Or if an opponent in an FPS game always does the same thing (moves to pick up his favorite weapon, gets a medkit, gets some armor, then goes to his favorite hiding spot, in that order), we can more reliably counteract his plans.

How to actually do it

Basic Idea: If you had to predict the next number in the following bit sequence, how would you do it?

10010110111000010001101

Find the longest substring that matches the tail end of the string, then figure out what comes after that. What…? The answer to the example might help elaborate.

Answer

10010110111000010001101

It's 1. The substring is 01101, and the following number is 1.

Problem: O(N^2) run time on this algorithm. The solution is to instead update our knowledge base of match sizes incrementally. A picture speaks a thousand words; instead of trying to explain this in words, here comes the next page.
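First, a direct Python sketch of the naive O(N^2) method just described: walk backwards from every earlier position, measure how far it matches the tail, and predict the symbol that followed the longest match:

def predict_naive(history):
    # O(n^2): for each candidate end position, count matching symbols
    # backwards against the tail of the history.
    n = len(history)
    best_i, best_len = None, 0
    for i in range(n - 1):               # exclude the tail itself
        length = 0
        while (length <= i and
               history[i - length] == history[n - 1 - length]):
            length += 1
        if length > best_len:
            best_i, best_len = i, length
    return None if best_i is None else history[best_i + 1]

bits = [int(c) for c in "10010110111000010001101"]
print(predict_naive(bits))   # -> 1, matching the answer above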

RockPaperScissors Program:

PaperRockScissor.application

Incremental Pattern Recognition

Sample code of how it would work
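A minimal Python sketch of the incremental idea (the class name and symbol handling are my own): store, for every position, the length of the match between the substring ending there and the current tail, and extend those lengths by one symbol at a time as new observations arrive:

class SequencePredictor:
    def __init__(self):
        self.history = []   # symbols seen so far
        self.match = []     # match[i]: length of the common suffix between
                            # history[..i] and the full history

    def update(self, symbol):
        # Refresh all match lengths in O(n) per new symbol instead of
        # recomputing every suffix match from scratch.
        new_match = []
        for i, s in enumerate(self.history):
            if s == symbol:
                prev = self.match[i - 1] if i > 0 else 0
                new_match.append(prev + 1)   # extend the previous match
            else:
                new_match.append(0)
        self.history.append(symbol)
        new_match.append(len(self.history))  # the tail trivially matches itself
        self.match = new_match

    def predict(self):
        # Predict the symbol that followed the longest earlier match.
        best_i, best_len = None, 0
        for i in range(len(self.history) - 1):   # skip the trivial match
            if self.match[i] > best_len:
                best_i, best_len = i, self.match[i]
        return None if best_i is None else self.history[best_i + 1]

p = SequencePredictor()
for bit in "10010110111000010001101":
    p.update(bit)
print(p.predict())   # -> '1'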

Improving predictions

Compute all matching substrings, not just the longest one.

Prediction Value = Length(S) / DistanceToTail(S)

Take the sum of the prediction values each time the same prediction occurs.

The idea here is that smaller string matches that occurred recently may be more reasonable than longer ones far in the past, and string matches that have occurred multiple times are much more reasonable than ones that have occurred only once or twice.
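A sketch of the improved scoring, reusing the history and match arrays maintained by the SequencePredictor sketch above:

def predict_weighted(history, match):
    # Sum Length(S)/DistanceToTail(S) over all matches, grouped by the
    # symbol each match predicts; recent, repeated evidence wins.
    n = len(history)
    scores = {}
    for i in range(n - 1):               # skip the trivial self-match
        if match[i] == 0:
            continue
        distance = n - 1 - i             # how far the match ends from the tail
        candidate = history[i + 1]       # symbol that followed this match
        scores[candidate] = scores.get(candidate, 0.0) + match[i] / distance
    return max(scores, key=scores.get) if scores else None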

Overall

There are several different methods of machine learning one can use for various effects on computers.

Any good examples of this happening? Yes.

Black and White

Black and White is an excellent example of a game that uses mostly supervised indirect adaptation.

B&W Videos

http://www.youtube.com/watch?v=sTu41uCOXL8
http://youtube.com/watch?v=LKv-YPhrrx8
http://www.wischik.com/lu/senses/bwcreature.html#basics

Conclusion

There are many simple, non-resource-eating methods to utilize machine learning in a game.

Although it has been done before, the AI in many new games is pathetically boring, and a few simple tweaks would make mountains out of molehills with learning AI that can adapt to, counteract, and learn the strategies used by the player. I would soil myself if an AI ever learned some nonsense like this from a player: http://www.youtube.com/watch?v=wh6JPbIWNU0 (:50)

Should anyone in this class ever make a computer game, consider using some simple machine learning concepts to improve the game greatly.