
Introduction to Sequence Models

Sequence Models

Sequence models try to describe how an element of a sequence depends on previous (or sometimes following) elements.

For instance, financial models might try to predict a stock price tomorrow, given the stock prices for the past few weeks.

As another example, robot motion models try to predict where a robot will be, given its current location and the commands given to its motors.

Types of Sequence Models

“Continuous-time” models:
• These try to describe situations where things change continuously, or smoothly, as a function of time.
• For instance, weather models, models from physics and engineering describing how gases or liquids behave over time, some financial models, …
• Typically, these involve differential equations.
• We won’t be talking about these.

Types of Sequence Models

“Discrete-time” models:
• These try to describe situations where the environment provides information periodically, rather than continuously.
– For instance, if stock prices are quoted once per day, or once per hour, or once per time period T, then it’s a discrete sequence of data.
– The price of a stock as it fluctuates all day, at any time point, is a continuous sequence of data.
• We’ll cover 2 examples of discrete-time sequence models:
– Hidden Markov Models (used in NLP, machine learning)
– Particle Filters (primarily used in robotics)

Hidden Markov Models

How students spend their time (observed once per time interval T):

[Diagram: three states, Sleep, Study, and Video games, with transition probabilities
Sleep → Sleep 0.6, Sleep → Study 0.1, Sleep → Video games 0.3;
Study → Study 0.4, Study → Sleep 0.3, Study → Video games 0.3;
Video games → Video games 0.8, Video games → Study 0.2, Video games → Sleep 0.0]

Markov Model:
- A set of states
- A set of transitions (edges) from one state to the next
- A conditional probability P(destination state | source state)
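These three ingredients map directly onto code. Below is a minimal sketch (not from the slides) of the student-activity chain, assuming the transition probabilities shown in the diagram above; the names TRANSITIONS, step, and simulate are just illustrative:

import random

# Transition probabilities P(destination state | source state),
# taken from the student-activity diagram above.
TRANSITIONS = {
    "Study": {"Study": 0.4, "Sleep": 0.3, "Video games": 0.3},
    "Sleep": {"Sleep": 0.6, "Study": 0.1, "Video games": 0.3},
    "Video games": {"Video games": 0.8, "Study": 0.2, "Sleep": 0.0},
}

def step(state):
    """Sample the next state given the current state."""
    destinations = list(TRANSITIONS[state])
    weights = [TRANSITIONS[state][d] for d in destinations]
    return random.choices(destinations, weights=weights)[0]

def simulate(start, n_steps):
    """Generate a sequence of n_steps + 1 states, starting from `start`."""
    states = [start]
    for _ in range(n_steps):
        states.append(step(states[-1]))
    return states

print(simulate("Study", 10))   # e.g. ['Study', 'Sleep', 'Sleep', ...]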

Quiz: Markov Models

How students spend their time (observed once per time interval T):

[Same Sleep / Study / Video games transition diagram as above]

Suppose a student starts in the Study state.
What is P(Study) in the next time step?
What about P(Study) after two time steps?
And P(Study) after three time steps?

Answer: Markov Models

How students spend their time (observed once per time interval T):

[Same Sleep / Study / Video games transition diagram as above]

Suppose a student starts in the Study state.
What is P(Study) in the next time step? 0.4
What about P(Study) after two time steps? 0.4*0.4 + 0.3*0.2 + 0.3*0.1 = 0.16 + 0.06 + 0.03 = 0.25
And P(Study) after three time steps? … complicated
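Enumerating every three-step path by hand gets tedious. Repeated matrix–vector multiplication does the same bookkeeping automatically; a minimal sketch, assuming the transition probabilities from the diagram above:

import numpy as np

states = ["Sleep", "Study", "Video games"]

# T[i][j] = P(next = states[j] | current = states[i]),
# using the probabilities from the diagram above.
T = np.array([
    [0.6, 0.1, 0.3],   # from Sleep
    [0.3, 0.4, 0.3],   # from Study
    [0.0, 0.2, 0.8],   # from Video games
])

p = np.array([0.0, 1.0, 0.0])   # start in Study with probability 1

for t in range(1, 4):
    p = p @ T                   # advance the distribution one time step
    print(t, dict(zip(states, p.round(3))))

# P(Study) comes out as 0.4 after one step, 0.25 after two, and 0.22 after three.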

Simpler Example

Suppose the student starts asleep.
What is P(Sleep) after 1 time step?
What is P(Sleep) after 2 time steps?
What is P(Sleep) after 3 time steps?

[Diagram: two states, Sleep and Study, with transition probabilities
Sleep → Sleep 0.5, Sleep → Study 0.5, Study → Sleep 1.0, Study → Study 0.0]

Answer: Simpler Example

Suppose the student starts asleep.
What is P(Sleep) after 1 time step? 0.5
What is P(Sleep) after 2 time steps? 0.5*0.5 + 0.5*1 = 0.75
What is P(Sleep) after 3 time steps? 0.5*0.5*0.5 + 0.5*1*0.5 + 0.5*0.5*1 + 0.5*0*1 = 0.625

[Same two-state Sleep / Study transition diagram as above]

Stationary Distribution

What happens after many, many time steps?
We’ll make three assumptions about the transition probabilities:
1. It’s possible to get from any state to any other state.
2. On average, the number of time steps it takes to get from one state back to itself is finite.
3. There are no cycles (or periods).
Any Markov chains in this course will have these properties; in practice, most do anyway.

[Same two-state Sleep / Study transition diagram as above]

Stationary Distribution

What happens after many, many time steps?
If those assumptions are true, then:
- After enough time steps, the probability of each state converges to a stationary distribution.
- This means that the probability at one time step is the same as the probability at the next time step, and the one after that, and the one after that, …

[Same two-state Sleep / Study transition diagram as above]

Stationary Distribution

Let’s compute the stationary distribution for this Markov chain.
Let Pt be the probability distribution over states at time step t.

For big enough t, Pt(Sleep) = Pt-1(Sleep).

Pt(Sleep) = Pt-1(Sleep)*0.5 + Pt-1(Study)*1

Let x = Pt(Sleep):
x = 0.5x + 1*(1-x)
1.5x = 1
x = 2/3

[Same two-state Sleep / Study transition diagram as above]
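You can also check the 2/3 result numerically: run the chain forward many steps and watch the distribution stop changing. A minimal sketch, where the matrix encodes the two-state diagram above:

import numpy as np

# Sleep -> Sleep 0.5, Sleep -> Study 0.5, Study -> Sleep 1.0, Study -> Study 0.0
T = np.array([
    [0.5, 0.5],   # from Sleep
    [1.0, 0.0],   # from Study
])

p = np.array([1.0, 0.0])   # start asleep
for _ in range(50):        # after enough steps the distribution stops changing
    p = p @ T
print(p)                   # approximately [0.6667, 0.3333], i.e. P(Sleep) = 2/3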

Quiz: Stationary Distribution

Compute the stationary distribution for this Markov chain.

[Diagram: two states, A and B, with transition probabilities
A → A 0.75, A → B 0.25, B → A 0.6, B → B 0.4]

Answer: Stationary Distribution

Compute the stationary distribution for this Markov chain.
Pt(A) = Pt-1(A)
Pt(A) = Pt-1(A) * 0.75 + Pt-1(B) * 0.6

x = 0.75x + 0.6(1-x)
0.85x = 0.6
x = 0.6 / 0.85 ≈ 0.71

[Same A / B transition diagram as above]

Learning Markov Model Parameters

There are six probabilities associated with this two-state Markov model:
1. Initial state probabilities P0(A), P0(B)
2. Transition probabilities P(A|A), P(B|A), P(A|B), and P(B|B).

[Diagram: two states, A and B, with all four transition probabilities and both initial state probabilities shown as unknowns (?)]

Learning Markov Model Parameters

Here is a sequence of observations from our Markov model:
BAAABABBAAA
Use maximum likelihood to estimate these parameters.
1. P0(A) = 0/1, P0(B) = 1/1
2. P(A|A) = 4/6 = 2/3. P(B|A) = 2/6 = 1/3.
3. P(A|B) = 3/4. P(B|B) = 1/4.

[Same A / B diagram with unknown parameters as above]
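The maximum-likelihood estimates are just counts: how often each transition occurs, divided by how often its source state occurs (excluding the final symbol). A minimal sketch of that counting, with illustrative names (mle_markov is not from the slides):

from collections import Counter

def mle_markov(seq):
    """Maximum-likelihood estimates of initial and transition probabilities
    from a single observed state sequence."""
    # Initial-state estimate: the observed first state gets probability 1.
    p0 = {s: (1.0 if s == seq[0] else 0.0) for s in set(seq)}

    # Count transitions, and how often each state occurs as a source.
    transitions = Counter(zip(seq, seq[1:]))
    sources = Counter(seq[:-1])

    p = {(a, b): transitions[(a, b)] / sources[a]
         for a in sources for b in set(seq)}
    return p0, p

p0, p = mle_markov("BAAABABBAAA")
print(p0)                              # {'A': 0.0, 'B': 1.0}
print(p[("A", "A")], p[("A", "B")])    # 0.666..., 0.333...
print(p[("B", "A")], p[("B", "B")])    # 0.75, 0.25

Running the same counts on the quiz sequence below reproduces the estimates on its answer slide.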

Quiz: Learning Markov Model Parameters

Here is a sequence of observations from our Markov model:
AAABBBBBABBBA
Use maximum likelihood to estimate these parameters.

[Same A / B diagram with unknown parameters as above]

Answer: Learning Markov Model Parameters

Here is a sequence of observations from our Markov model:
AAABBBBBABBBA
Use maximum likelihood to estimate these parameters.
1. P0(A) = 1/1. P0(B) = 0/1.
2. P(A|A) = 2/4. P(B|A) = 2/4.
3. P(A|B) = 2/8 = 1/4. P(B|B) = 6/8 = 3/4.

[Same A / B diagram with unknown parameters as above]

Restrictions on Markov Models

[Same Sleep / Study / Video games transition diagram as above]

- The probability of the next state depends only on the previous state, not on any of the states before that (called the Markov assumption)

- Transition probabilities cannot change over time (called the stationary assumption)

Observations and Latent States

Markov models don’t get used much in AI.

The reason is that Markov models assume that you know exactly what state you are in, at each time step.

This is rarely true for AI agents.

Instead, we will say that the agent has a set of possible latent states – states that are not directly observed by, or known to, the agent.

In addition, the agent has sensors that allow it to sense some aspects of the environment, to take measurements or observations.

Hidden Markov Models

Suppose you are the parent of a college student, and would like to know how studious your child is.

You can’t observe them at all times, but you can periodically call, and see if your child answers.

[Diagram: hidden states H1 → H2 → H3, each either Sleep or Study, with transition probabilities
Sleep → Sleep 0.6, Sleep → Study 0.4, Study → Sleep 0.5, Study → Study 0.5;
each Ht emits an observation Ot: answer call or not?]

Hidden Markov Models

[Diagram: hidden states H1 → H2 → H3 → …, each emitting an observation O1, O2, O3]

Transition probabilities (the same at every time step):
Ht      Ht+1    P(Ht+1 | Ht)
Sleep   Sleep   0.6
Study   Sleep   0.5

Observation probabilities (the same at every time step):
Ht      Ot      P(Ot | Ht)
Sleep   Ans     0.1
Study   Ans     0.8

Initial state probabilities:
H1      P(H1)
Sleep   0.5
Study   0.5

Here’s the same model, with probabilities in tables.
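The tables can also be written down directly in code; the sketches on the later slides reuse these numbers. A minimal version, where the NoAns rows are filled in as complements so each distribution sums to 1 (the variable names are just illustrative):

# Initial state distribution P(H1)
P_H1 = {"Sleep": 0.5, "Study": 0.5}

# Transition model P(Ht+1 | Ht), the same at every time step
P_TRANS = {
    "Sleep": {"Sleep": 0.6, "Study": 0.4},
    "Study": {"Sleep": 0.5, "Study": 0.5},
}

# Observation model P(Ot | Ht), the same at every time step
P_OBS = {
    "Sleep": {"Ans": 0.1, "NoAns": 0.9},
    "Study": {"Ans": 0.8, "NoAns": 0.2},
}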

Hidden Markov Models

HMMs (and MMs) are a special type of Bayes Net. Everything you have learned about BNs applies here.

[Same graphical model and probability tables as on the previous slide]

Quick Review of BNs for HMMs

[Diagrams: a single observation arc H1 → O1, and a single transition arc H1 → H2]

Hidden Markov Models

Suppose a parent calls and gets an answer at time step 1. What is P(H1=Sleep | O1=Ans)?

By Bayes’ rule:
P(H1=Sleep | O1=Ans) = P(O1=Ans | Sleep) P(Sleep) / [P(O1=Ans | Sleep) P(Sleep) + P(O1=Ans | Study) P(Study)]
= 0.1*0.5 / (0.1*0.5 + 0.8*0.5) = 0.05 / 0.45 ≈ 0.111

Notice: before the observation, P(Sleep) was 0.5. By making a call and getting an answer, the parent’s belief in Sleep drops to P(Sleep) ≈ 0.111.

[Diagram: H1 → O1, with the same transition, observation, and initial probability tables as above]
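The same update in code, as a minimal sketch using the table values above (prior times likelihood, then normalize):

# P(H1 | O1 = Ans) by Bayes' rule
P_H1 = {"Sleep": 0.5, "Study": 0.5}
P_ANS = {"Sleep": 0.1, "Study": 0.8}                     # P(O1 = Ans | H1)

unnormalized = {h: P_H1[h] * P_ANS[h] for h in P_H1}     # prior * likelihood
total = sum(unnormalized.values())
posterior = {h: round(v / total, 3) for h, v in unnormalized.items()}
print(posterior)                                         # {'Sleep': 0.111, 'Study': 0.889}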

Hidden Markov Models

Suppose a parent calls and gets an answer at time step 2. What is P(H2=Sleep | O2=Ans)?

[Diagram: H1 → H2, each with an observation O1, O2, and the same probability tables as above]
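One way to work this out (assuming only O2 is observed here, not O1) is to predict H2 from the prior on H1, then update on the answered call. A minimal sketch with the table values above:

# Predict P(H2) from P(H1), then update on O2 = Ans (O1 is not observed here).
P_H1 = {"Sleep": 0.5, "Study": 0.5}
P_TRANS = {"Sleep": {"Sleep": 0.6, "Study": 0.4},
           "Study": {"Sleep": 0.5, "Study": 0.5}}
P_ANS = {"Sleep": 0.1, "Study": 0.8}          # P(O = Ans | H)

# Predict: P(H2 = h) = sum over h1 of P(H1 = h1) * P(h | h1)
P_H2 = {h: sum(P_H1[h1] * P_TRANS[h1][h] for h1 in P_H1) for h in P_H1}
# P_H2 == {'Sleep': 0.55, 'Study': 0.45}

# Update on the answered call at time step 2
unnormalized = {h: P_H2[h] * P_ANS[h] for h in P_H2}
total = sum(unnormalized.values())
print({h: round(v / total, 3) for h, v in unnormalized.items()})
# -> {'Sleep': 0.133, 'Study': 0.867}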

Quiz: Hidden Markov Models

[Diagram: H1 → H2, each with an observation O1, O2, and the same probability tables as above]

Suppose a parent calls twice, once at time step 1 and once at time step 2. The first time, the child does not answer, and the second time the child does.

Now what is P(H2=Sleep)?

Answer: Hidden Markov Models

[Diagram: H1 → H2, each with an observation O1, O2, and the same probability tables as above]

Suppose a parent calls twice, once at time step 1 and once at time step 2. The first time, the child does not answer, and the second time the child does.

Now what is P(H2=Sleep)?

Numerator: P(O1=NoAns, O2=Ans, H2=Sleep)
= P(Sleep)·P(NoAns|Sleep)·P(Sleep|Sleep)·P(Ans|Sleep) + P(Study)·P(NoAns|Study)·P(Sleep|Study)·P(Ans|Sleep)
= 0.5*0.9*0.6*0.1 + 0.5*0.2*0.5*0.1 = 0.027 + 0.005 = 0.032

Denominator: P(O1=NoAns, O2=Ans)
= 0.032 + [0.5*0.9*0.4*0.8 + 0.5*0.2*0.5*0.8] = 0.032 + 0.184 = 0.216

P(H2=Sleep | O1=NoAns, O2=Ans) = 0.032 / 0.216 ≈ 0.148
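This is exactly one pass of the forward algorithm: weight the prior by P(O1=NoAns | H1), push it through the transition model, weight by P(O2=Ans | H2), and normalize. A minimal sketch with the table values above (the NoAns probabilities are the complements of the Ans entries):

# Forward filtering over two calls: O1 = NoAns, then O2 = Ans.
P_H1 = {"Sleep": 0.5, "Study": 0.5}
P_TRANS = {"Sleep": {"Sleep": 0.6, "Study": 0.4},
           "Study": {"Sleep": 0.5, "Study": 0.5}}
P_OBS = {"Sleep": {"Ans": 0.1, "NoAns": 0.9},
         "Study": {"Ans": 0.8, "NoAns": 0.2}}

# alpha1(h) = P(H1 = h) * P(O1 = NoAns | h)
alpha = {h: P_H1[h] * P_OBS[h]["NoAns"] for h in P_H1}

# alpha2(h) = [sum over h1 of alpha1(h1) * P(h | h1)] * P(O2 = Ans | h)
alpha = {h: sum(alpha[h1] * P_TRANS[h1][h] for h1 in alpha) * P_OBS[h]["Ans"]
         for h in P_H1}

total = sum(alpha.values())               # the denominator, 0.216
print({h: round(v / total, 3) for h, v in alpha.items()})
# -> {'Sleep': 0.148, 'Study': 0.852}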