Markov Games TCM Conference 2016 Chris Gann

Markov Games

TCM Conference 2016Chris [email protected]

A taxi company has divided the city into three regions – Northside, Downtown, and Southside. By keeping track of pickups and deliveries, the company has found that of the fares picked up in Northside, 50% stay in that region, 20% are taken Downtown, and 30% go to Southside. Of the fares picked up Downtown, only 10% go to Northside, 40% stay Downtown, and 50% go to Southside. Of the fares picked up in Southside, 30% go to each of Northside and Downtown, while 40% stay in Southside.

We would like to know what the distribution of taxis will be over time as they pick up and drop off successive fares. Suppose we want to know the probability that a taxi starting off Downtown, will be Downtown after letting off its seventh fare?

The Taxi Problem

State DiagramThis information can be represented in a state diagram which includes

• the three states D, N, and S corresponding to the three regions of the city

• the probabilities of a taxi transitioning from one region/state to another

Markov ChainsThese probabilities are constant and independent of previous behavior – this memorylessness of the system is called the Markov property. We assume that a transition – picking up and dropping off a fare – occurs each time the system is observed, and that observations occur at regular intervals. Systems with these characteristics are called Markov chains or Markov processes.

What is the probability that a taxi that starts off Downtown ends up in Northside after two fares?

One possibility is that the taxi stays Downtown for the first fare, and then transitions to Northside for the second. The probability of this occurring is:

But we could also have the taxi going to either Northside of Southside first, then transitioning to Northside:

Computing the Probabilities

Since the taxi could follow the first, second or third path, the probability of starting and ending Downtown after two fares is

But we could also have the taxi going to either Northside of Southside first, then transitioning to Northside:

Computing the Probabilities

If we want to know the probability of a taxi transitioning from one region to another after just three fares, the computation would have more possible paths.

Suppose we were interested in the probability of a taxi both starting and ending up Downtown. We can use a tree diagram to represent this calculation.

If we multiply along all of the paths and sum the results we find that this probability is 0.309.

Tree Diagram

However, we originally asked what would be the probability that a taxi starting Downtown is back Downtown after seven fares. Considering the tree diagram for only three transitions, trying to compute this probability using a tree diagram would be unwieldy.

Let’s look again at our first calculation – the probability of transitioning from D to N in two transitions.

This has the structure of an inner product. That is it can be represented as the dot product of two vectors, or equivalently, the product of a row and a column of two matrices.

Matrix Multiplication

Labeling these with matrices with D, S, and N indicates which probability is found in each entry. That is, each entry of these two simple matrices is the probability of going from the row label to the column label in one transition.

Developing the Transition Matrix

We can create a square matrix, T, called the transition matrix, by constructing rows for the probabilities going from Southside and Northside as well.

An entry Tij of this matrix is the probability of a transition from region i to region j. For example, T32, is the probability of a fare that originates in Northside going to Southside. (Note the sum of entries across rows of T.)

The Transition Matrix

What results when we multiply the transition matrix by itself?

The highlighted entry results from the same computation that we already considered for of a taxi going from D to N in two fares.

What are the other entries of ? What are the entries of ? ?

The Transition Matrix

Some general features of a transition matrix for a Markov chain can be observed from this problem:

1. A transition matrix is square. Clearly the number of rows and the number of columns are both the same as the number of states.2. All entries are between 0 and 1. This follows from the fact that the entries correspond to transition probabilities from one state to another.3. The sum of the entries in any row must be 1. The sum of the entries in an entire row is the sum of the transition probabilities from one state to all other states. Since a transition is sure to take place, this sum must be 1.4. The ij entry in the matrix gives the probability of being in state j after n transitions, with state i as the initial state.5. The entries in the transition matrix are constant. A Markov chain model depends upon the assumption that the transition matrix does not change throughout the process.

Hi Ho Cherry O – A Markov Game• A player starts with 0 cherries.• A spinner determines how many

cherries you gain or lose each turn.

• If the spinner lands on 1, 2, 3, or 4 cherries then you gain that many cherries.

• If the spinner lands on the bird or dog then you lose 2 cherries (or lose all your cherries if you have 2 or less).

• If the spinner lands on the spilled basket then you lose all your cherries.

• You must collect 10 or more cherries to win the game.

Markov Games TCM Conference 2016 Chris Gann

Documents

Transcript of Markov Games TCM Conference 2016 Chris Gann