Milner Lecture
From bisimulation to representation learning via metrics
Prakash Panangaden
School of Computer Science, McGill University and Montreal Institute of Learning Algorithms
30 September 2021
Panangaden Milner Lecture 2021 30 September 2021 1 / 40


Outline

1 Introduction
2 Bisimulation for LTSs
3 Probabilistic bisimulation
4 Continuous state spaces
5 Metrics
6 Representation learning
7 The MICo Distance
8 Experimental results
9 Conclusions


Behavioural equivalence is fundamental

When do two states have exactly the same behaviour?
What can one observe of the behaviour?
What should be guaranteed?
(i) If two states are equivalent we should not be able to “see” any differences in observable behaviour.
(ii) If two states are equivalent they should stay equivalent as they evolve.


Heroes of concurrency theory: Milner and Park

Inspiration for my work I: Dexter Kozen

Inspiration for my work II: Lawvere and Giry

Special thanks I

Special thanks II

A bit of history

Cantor and the back-and-forth argument
Lumpability in queueing theory: 1960s
Bisimulation of nondeterministic automata (1970s) and process algebras (1980s): Milner and Park
Probabilistic bisimulation, discrete systems: Larsen and Skou 1989
Bisimulation of Markov processes on continuous state spaces: Desharnais, Edalat, P. 1997...
Bisimulation metrics for Markov processes: Desharnais, Gupta, Jagadeesan, P. 1999
Fixed-point version: van Breugel and Worrell 2001
Bisimulation for MDPs: Givan and Dean 2003
Bisimulation metrics for MDPs: Ferns, Precup, P. 2004
Representation learning using “metrics”: Castro, Kastner, P., Rowland 2021


The definition

A set of states S,
a set of labels or actions, L or A, and
a transition relation $\to\ \subseteq S \times A \times S$, usually written $\to_a\ \subseteq S \times S$.
The transitions could be indeterminate (nondeterministic).
We write $s \xrightarrow{a} s'$ for $(s, s') \in\ \to_a$.


Vending machine LTSs

[Figure: first vending machine LTS. Text in figure: Place cup, Insert money, Choose, Wait (twice), Cup, £1, Coffee, Tea, Dispense coffee, Dispense tea.]

[Figure: second vending machine LTS. Text in figure: Place cup, Insert money, Choose (twice), Wait (twice), Cup, £1 (twice), Coffee, Tea, Dispense tea, Dispense coffee.]

Are the two LTSs equivalent?

One gives us the choice whereas the other makes the choice internally.
The sequences that the machines can perform are identical: [Cup; £1; (Cof + Tea)]∗
We need to go beyond language equivalence.


Formal definition

[Diagram: the transitions $s \xrightarrow{a} s'$ and $t \xrightarrow{a} t'$.]

[Bisimulation definition] If s ∼ t then

$\forall s' \in S,\ \forall a \in A:\quad s \xrightarrow{a} s' \;\Rightarrow\; \exists t'.\ t \xrightarrow{a} t' \text{ with } s' \sim t'$

and vice versa with s and t interchanged.
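The transfer condition above can be checked mechanically on a finite LTS. The following is a minimal sketch, not taken from the lecture: it starts from the full relation and deletes pairs that violate the condition until nothing changes. The state names `m0`..`m4` and `n0`..`n5` and the action names are a hypothetical encoding of the two vending machines, with the dispensing and waiting steps left out.

```python
from itertools import product

ACTIONS = ["cup", "pound", "coffee", "tea"]

# Hypothetical encoding of the two machines (an assumption, not the lecture's figures):
# m* offers the choice after payment, n* makes the choice internally on "pound".
TRANS = {
    ("m0", "cup"): {"m1"},
    ("m1", "pound"): {"m2"},
    ("m2", "coffee"): {"m3"},
    ("m2", "tea"): {"m4"},
    ("n0", "cup"): {"n1"},
    ("n1", "pound"): {"n2", "n3"},
    ("n2", "coffee"): {"n4"},
    ("n3", "tea"): {"n5"},
}
STATES = sorted({s for (s, _) in TRANS} | {t for ts in TRANS.values() for t in ts})


def successors(state, action):
    """States reachable from state by one transition labelled action."""
    return TRANS.get((state, action), set())


def traces(state, depth):
    """All action sequences of length at most depth that state can perform."""
    result = {()}
    if depth > 0:
        for a in ACTIONS:
            for nxt in successors(state, a):
                result |= {(a,) + rest for rest in traces(nxt, depth - 1)}
    return result


def bisimilarity():
    """Largest relation satisfying the transfer condition in both directions."""
    rel = set(product(STATES, STATES))
    changed = True
    while changed:
        changed = False
        for (s, t) in sorted(rel):
            # every move of s is matched by a move of t into a related pair
            forth = all(any((s2, t2) in rel for t2 in successors(t, a))
                        for a in ACTIONS for s2 in successors(s, a))
            # and vice versa with s and t interchanged
            back = all(any((s2, t2) in rel for s2 in successors(s, a))
                       for a in ACTIONS for t2 in successors(t, a))
            if not (forth and back):
                rel.discard((s, t))
                changed = True
    return rel


REL = bisimilarity()
SAME_TRACES = traces("m0", 4) == traces("n0", 4)
BISIMILAR = ("m0", "n0") in REL
```

Under this encoding `SAME_TRACES` holds while `BISIMILAR` does not: after `pound` the first machine can still do both `coffee` and `tea`, and neither `n2` nor `n3` can match that.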

Discrete probabilistic transition systems

Just like a labelled transition system with probabilities associated with the transitions.

$(S, A, \forall a \in A\ \ T_a : S \times S \to [0, 1])$

The model is reactive: all probabilistic data is internal; no probabilities are associated with environment behaviour.


Probabilistic bisimulation: Larsen and Skou

[Figure: two probabilistic transition systems. From s0, action a leads to s1, s2 and s3 with probability 1/3 each; the edges below s1, s2, s3 are labelled (b, 1), (c, 1), (c, 1). From t0, action a leads to t1 with probability 1/3 and to t2 with probability 2/3; the edges below t1, t2 are labelled (b, 1), (c, 1).]

Are s0 and t0 bisimilar?

Yes, but one needs to add up the probabilities to s2 and s3.

If s is a state, a an action and C a set of states, we write $T_a(s, C) = \sum_{s' \in C} T_a(s, s')$ for the probability of jumping on an a-action to one of the states in C.

Definition: R is a bisimulation relation if whenever s R t and C is an equivalence class of R, then $T_a(s, C) = T_a(t, C)$ for every action a.
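As a small illustration of the definition, the sketch below computes $T_a(s, C)$ by summing over the class C and checks the condition for a candidate partition. It assumes the example system has a-transitions with probability 1/3 from s0 to each of s1, s2, s3, and with probabilities 1/3 and 2/3 from t0 to t1 and t2; the terminal state `end` is a hypothetical stand-in for the unlabelled targets of the b and c edges.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)
ONE = Fraction(1)

# T[action][state] maps each successor to its probability.
# "end" is a hypothetical terminal state (an assumption, not in the slides).
T = {
    "a": {"s0": {"s1": THIRD, "s2": THIRD, "s3": THIRD},
          "t0": {"t1": THIRD, "t2": 2 * THIRD}},
    "b": {"s1": {"end": ONE}, "t1": {"end": ONE}},
    "c": {"s2": {"end": ONE}, "s3": {"end": ONE}, "t2": {"end": ONE}},
}


def prob(action, state, targets):
    """T_a(s, C): probability of jumping from state into the set targets."""
    row = T[action].get(state, {})
    return sum((p for succ, p in row.items() if succ in targets), Fraction(0))


def is_bisimulation(partition):
    """Check T_a(s, C) == T_a(t, C) for all related s, t, all actions and classes."""
    return all(
        prob(a, s, C) == prob(a, t, C)
        for block in partition
        for s in block
        for t in block
        for a in T
        for C in partition
    )


# s2 and s3 lumped together with t2: the probabilities 1/3 + 1/3 match 2/3.
GOOD = [{"s0", "t0"}, {"s1", "t1"}, {"s2", "s3", "t2"}, {"end"}]
# s3 kept apart: s0 reaches the class {s2, t2} with 1/3, t0 with 2/3.
BAD = [{"s0", "t0"}, {"s1", "t1"}, {"s2", "t2"}, {"s3"}, {"end"}]
```

With this data `is_bisimulation(GOOD)` holds and `is_bisimulation(BAD)` fails, which is the "add up the probabilities" point made on the slide.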


Markov decision processes?

Markov decision processes are probabilistic versions of labelled transition systems: labelled transition systems where the final state is governed by a probability distribution, with no other indeterminacy.
There is a reward associated with each transition.
We observe the interactions and the rewards, not the internal states.


Markov decision processes: formal definition

$(S, A, \forall a \in A,\ P_a : S \to D(S),\ R : A \times S \to \mathbb{R})$

where
S: the state space; we will take it to be a finite set.
A: the actions, a finite set.
$P_a$: the transition function; $D(S)$ denotes distributions over S.
R: the reward; could readily make it stochastic.
Will write $P_a(s, C)$ for $P_a(s)(C)$.

Policies

MDP

$(S, A, \forall a \in A,\ P_a : S \to D(S),\ R : A \times S \to \mathbb{R})$

We control the choice of action; it is not some external scheduler.

Policy

$\pi : S \to D(A)$

The goal is to choose the best policy: there are numerous algorithms to find or approximate the optimal policy.
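One standard algorithm of this kind is value iteration, sketched below for a finite MDP. This is an illustration only, not necessarily the algorithm the lecture has in mind: the discount factor `gamma`, the tolerance `tol` and the two-state example are assumptions that do not appear in the slides. The returned policy is deterministic, a special case of $\pi : S \to D(A)$ that puts all mass on one action.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-10):
    """Approximate optimal values and a greedy policy.

    P[a][s] is a dict successor -> probability; R[a][s] is the reward for
    taking action a in state s. gamma and tol are assumed parameters.
    """
    V = {s: 0.0 for s in states}
    while True:
        # one-step lookahead value of each action in each state
        Q = {
            s: {a: R[a][s] + gamma * sum(p * V[s2] for s2, p in P[a][s].items())
                for a in actions}
            for s in states
        }
        new_V = {s: max(Q[s].values()) for s in states}
        if max(abs(new_V[s] - V[s]) for s in states) < tol:
            policy = {s: max(actions, key=lambda a: Q[s][a]) for s in states}
            return new_V, policy
        V = new_V


# Hypothetical two-state example: only staying in "high" is rewarded.
STATES = ["low", "high"]
ACTIONS = ["stay", "move"]
P = {
    "stay": {"low": {"low": 1.0}, "high": {"high": 1.0}},
    "move": {"low": {"high": 1.0}, "high": {"low": 1.0}},
}
R = {
    "stay": {"low": 0.0, "high": 1.0},
    "move": {"low": 0.0, "high": 0.0},
}

VALUES, POLICY = value_iteration(STATES, ACTIONS, P, R)
```

In this example the greedy policy moves from `low` to `high` and then stays there.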


Bisimulation

Let R be an equivalence relation. R is a bisimulation if, whenever s R t, for all actions a and all equivalence classes C of R:

(i) $R(a, s) = R(a, t)$
(ii) $P_a(s, C) = P_a(t, C)$

(In (i), R denotes the reward function of the MDP, not the relation.)

s, t are bisimilar if there is a bisimulation relation R relating them.
Basic pattern: immediate rewards match (initiation), stay related after the transition (coinduction).
Bisimulation can be defined as the greatest fixed point of a relation transformer.
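For a finite MDP, the greatest-fixed-point view can be sketched as partition refinement: start with all states in one class and split classes until conditions (i) and (ii) hold. The code below is a minimal sketch under that assumption; the five-state MDP with the single action `go` is a hypothetical example, and exact `Fraction` arithmetic is used so that probabilities can be compared with `==`.

```python
from fractions import Fraction


def coarsest_bisimulation(states, actions, P, R):
    """Coarsest partition satisfying (i) equal rewards and (ii) equal class probabilities.

    P[a][s] is a dict successor -> probability; R[a][s] is the reward.
    """
    partition = [frozenset(states)]
    while True:
        def signature(s):
            # per action: the reward and the probability of landing in each current class
            return tuple(
                (R[a][s],
                 tuple(sum(P[a][s].get(succ, 0) for succ in block) for block in partition))
                for a in actions
            )

        groups = {}
        for index, block in enumerate(partition):
            for s in block:
                # states stay together only if they were together and agree on the signature
                groups.setdefault((index, signature(s)), set()).add(s)
        refined = sorted((frozenset(g) for g in groups.values()), key=sorted)
        if len(refined) == len(partition):
            return refined  # nothing was split, so the partition is stable
        partition = refined


# Hypothetical MDP with a single action "go".
HALF = Fraction(1, 2)
ONE = Fraction(1)
STATES = ["v", "w", "x", "y", "z"]
ACTIONS = ["go"]
P = {"go": {"x": {"z": HALF, "w": HALF}, "y": {"z": ONE},
            "z": {"z": ONE}, "w": {"w": ONE}, "v": {"v": ONE}}}
R = {"go": {"x": 0, "y": 0, "z": 1, "w": 1, "v": 0}}

CLASSES = coarsest_bisimulation(STATES, ACTIONS, P, R)
```

Here x and y end up in the same class even though their transition distributions differ state by state: both reach the class {w, z} with probability 1. The state v has the same immediate reward as x and y but is split off in the second round.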


Continuous state spaces: why?

Software controllers attached to physical devices or sensors: robots, controllers.
Continuous state space but discrete time.
Applications to control systems.
Applications to probabilistic programming languages.


Some remarks on the use of continuous spaces

Can be used for reasoning, but much better if we could have a finite-state version.
Why not discretize right away and never worry about the continuous case?
How can we say that our discrete approximation is “accurate”?
We lose the ability to refine the model later.


The Need for Measure Theory

Basic fact: There are subsets of R for which no sensible notion of size can be defined.
More precisely, there is no translation-invariant measure defined on all the subsets of the reals.


Logical Characterization

Very austere logic:

$\mathcal{L} ::= \mathsf{T} \mid \phi_1 \wedge \phi_2 \mid \langle a \rangle_q \phi$

$s \models \langle a \rangle_q \phi$ means that if the system is in state s, then after the action a, with probability at least q the new state will satisfy the formula φ.
Two systems are bisimilar iff they obey the same formulas of L. [DEP 1998 LICS, I and C 2002]
No finite branching assumption.
No negation in the logic, so one can obtain a logical characterization result for simulation, but it needs disjunction.
The proof uses tools from descriptive set theory and measure theory.
Such a theorem was originally proved for LTSs with finite-branching restrictions by Hennessy and Milner in 1977 and van Benthem in 1976.
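On a finite system the satisfaction relation for this logic can be evaluated directly. The sketch below is an illustration under assumptions: formulas are encoded as nested tuples (a hypothetical representation), and the system is the s0/t0 example with a hypothetical terminal state `end` for the b and c edges.

```python
from fractions import Fraction

THIRD = Fraction(1, 3)
ONE = Fraction(1)

# T[action][state] maps each successor to its probability ("end" is assumed).
T = {
    "a": {"s0": {"s1": THIRD, "s2": THIRD, "s3": THIRD},
          "t0": {"t1": THIRD, "t2": 2 * THIRD}},
    "b": {"s1": {"end": ONE}, "t1": {"end": ONE}},
    "c": {"s2": {"end": ONE}, "s3": {"end": ONE}, "t2": {"end": ONE}},
}

TOP = ("T",)


def conj(f, g):
    """The formula f ∧ g."""
    return ("and", f, g)


def dia(action, q, f):
    """The formula <action>_q f."""
    return ("dia", action, q, f)


def sat(state, formula):
    """Decide state |= formula."""
    kind = formula[0]
    if kind == "T":
        return True
    if kind == "and":
        return sat(state, formula[1]) and sat(state, formula[2])
    if kind == "dia":
        _, action, q, inner = formula
        row = T[action].get(state, {})
        # probability mass of the successors that satisfy the inner formula
        mass = sum((p for succ, p in row.items() if sat(succ, inner)), Fraction(0))
        return mass >= q
    raise ValueError("unknown formula: %r" % (formula,))


CAN_B = dia("b", ONE, TOP)
CAN_C = dia("c", ONE, TOP)
PHI = dia("a", 2 * THIRD, CAN_C)
```

Both s0 and t0 satisfy `PHI`, as expected of bisimilar states, while `CAN_B` separates s1 from s2.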


Page 76: Milner Lecture From bisimulation to representation ...

Logical Characterization

Very austere logic:

L ::== T|φ1 ∧ φ2|〈a〉qφs |= 〈a〉qφ means that if the system is in state s, then after theaction a, with probability at least q the new state will satisfy theformula φ.Two systems are bisimilar iff they obey the same formulas of L.[DEP 1998 LICS, I and C 2002]No finite branching assumption.No negation in the logic,

so one can obtain a logical characterization result for simulationbut it needs disjunction.The proof uses tools from descriptive set theory and measuretheory.Such a theorem originally proved for LTS with finite-branchingrestrictions by Hennessy and Milner in 1977 and van Benthem in1976.

Panangaden Milner Lecture 2021 30 September 2021 25 / 40

Page 77: Milner Lecture From bisimulation to representation ...

Logical Characterization

Very austere logic:

L ::== T|φ1 ∧ φ2|〈a〉qφs |= 〈a〉qφ means that if the system is in state s, then after theaction a, with probability at least q the new state will satisfy theformula φ.Two systems are bisimilar iff they obey the same formulas of L.[DEP 1998 LICS, I and C 2002]No finite branching assumption.No negation in the logic,so one can obtain a logical characterization result for simulation

but it needs disjunction.The proof uses tools from descriptive set theory and measuretheory.Such a theorem originally proved for LTS with finite-branchingrestrictions by Hennessy and Milner in 1977 and van Benthem in1976.

Panangaden Milner Lecture 2021 30 September 2021 25 / 40

Page 78: Milner Lecture From bisimulation to representation ...

Logical Characterization

Very austere logic:

L ::== T|φ1 ∧ φ2|〈a〉qφs |= 〈a〉qφ means that if the system is in state s, then after theaction a, with probability at least q the new state will satisfy theformula φ.Two systems are bisimilar iff they obey the same formulas of L.[DEP 1998 LICS, I and C 2002]No finite branching assumption.No negation in the logic,so one can obtain a logical characterization result for simulationbut it needs disjunction.

The proof uses tools from descriptive set theory and measuretheory.Such a theorem originally proved for LTS with finite-branchingrestrictions by Hennessy and Milner in 1977 and van Benthem in1976.

Panangaden Milner Lecture 2021 30 September 2021 25 / 40

Page 79: Milner Lecture From bisimulation to representation ...

Logical Characterization

Very austere logic:

L ::== T|φ1 ∧ φ2|〈a〉qφs |= 〈a〉qφ means that if the system is in state s, then after theaction a, with probability at least q the new state will satisfy theformula φ.Two systems are bisimilar iff they obey the same formulas of L.[DEP 1998 LICS, I and C 2002]No finite branching assumption.No negation in the logic,so one can obtain a logical characterization result for simulationbut it needs disjunction.The proof uses tools from descriptive set theory and measuretheory.

Such a theorem originally proved for LTS with finite-branchingrestrictions by Hennessy and Milner in 1977 and van Benthem in1976.

Panangaden Milner Lecture 2021 30 September 2021 25 / 40

Page 80: Milner Lecture From bisimulation to representation ...

Logical Characterization

Very austere logic:

L ::== T|φ1 ∧ φ2|〈a〉qφs |= 〈a〉qφ means that if the system is in state s, then after theaction a, with probability at least q the new state will satisfy theformula φ.Two systems are bisimilar iff they obey the same formulas of L.[DEP 1998 LICS, I and C 2002]No finite branching assumption.No negation in the logic,so one can obtain a logical characterization result for simulationbut it needs disjunction.The proof uses tools from descriptive set theory and measuretheory.Such a theorem originally proved for LTS with finite-branchingrestrictions by Hennessy and Milner in 1977 and van Benthem in1976.

Panangaden Milner Lecture 2021 30 September 2021 25 / 40

Page 81: Milner Lecture From bisimulation to representation ...

The proof “engine” Josée Desharnais

But...

In the context of probability, is exact equivalence reasonable?

We say “no”. A small change in the probability distributions may result in bisimilar processes no longer being bisimilar, though they may be very “close” in behaviour.

Instead one should have a (pseudo)metric for probabilistic processes.

A metric-based approximate viewpoint

Move from equality between processes to distances between processes (Jou and Smolka 1990).

Quantitative measurement of the distinction between processes.

In lieu of several slides of Greek letters and symbols

If two states are not bisimilar, there is some observation on which they disagree.

They may disagree on the reward or on the probability distribution that results from a transition.

To measure the latter, we use the Wasserstein-Kantorovich metric between probability distributions.

Intuitively, if the difference shows up only after a long and elaborate observation, then we should make the states “nearby” in the bisimulation metric.

All this can be formalized and was originally done by Desharnais et al. and later with a beautiful fixed-point construction by van Breugel and Worrell.

Ferns et al. added rewards and showed that the bisimulation metric bounds the difference in optimal value functions.
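For readers who do want a few symbols, the construction can be summarised as follows. The notation is an assumption and does not come from the slides: a finite action set $A$, reward function $R$, transition kernel $P$, discount factor $\gamma \in [0,1)$, and $\Lambda(\mu,\nu)$ for the set of couplings of $\mu$ and $\nu$. This is the form commonly used in the reinforcement-learning literature, and other normalisations of the constants exist.

```latex
\begin{align*}
W(d)(\mu,\nu) &= \inf_{\lambda \in \Lambda(\mu,\nu)} \int d(x,y)\, \mathrm{d}\lambda(x,y)
  && \text{(Kantorovich lifting of } d\text{)} \\
\mathcal{F}(d)(s,t) &= \max_{a \in A} \Big( \big|R(s,a) - R(t,a)\big|
  + \gamma\, W(d)\big(P(\cdot \mid s,a),\, P(\cdot \mid t,a)\big) \Big)
  && \text{(one-step update)} \\
d^{\sim} &= \mathcal{F}(d^{\sim})
  && \text{(bisimulation metric as least fixed point)} \\
\big|V^{*}(s) - V^{*}(t)\big| &\le d^{\sim}(s,t)
  && \text{(value-function bound)}
\end{align*}
```

The discount factor is what makes differences that only show up after long observations count for less, matching the intuition stated above.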

Quantitative equational logic

It is possible to generalize the notion of equation to capture approximate equality.

$s =_\varepsilon t$ means $s$ is within $\varepsilon$ of $t$.

Much of the theory of equational logic carries over to this setting.

Algebras for such equations are naturally equipped with metrics and give a way of reasoning about bisimulation metrics.

Mardare, P., Plotkin LICS 2016, 2017, 2021; Bacci, Mardare, P., Plotkin LICS 2018, CALCO 2021.
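As an illustration of how the indexed equations behave, the basic deduction rules can be sketched roughly as below. The rule names and exact formulation are given from memory of the quantitative-algebra papers cited above and should be checked against them; the rule for non-expansiveness of the algebra's operations is omitted.

```latex
\begin{align*}
\textbf{(Refl)}   \quad & \vdash t =_{0} t \\
\textbf{(Symm)}   \quad & t =_{\varepsilon} s \;\vdash\; s =_{\varepsilon} t \\
\textbf{(Triang)} \quad & t =_{\varepsilon} u,\; u =_{\varepsilon'} s \;\vdash\; t =_{\varepsilon + \varepsilon'} s \\
\textbf{(Max)}    \quad & t =_{\varepsilon} s \;\vdash\; t =_{\varepsilon + \varepsilon'} s
                          \qquad (\varepsilon' > 0) \\
\textbf{(Arch)}   \quad & \{\, t =_{\varepsilon'} s \mid \varepsilon' > \varepsilon \,\} \;\vdash\; t =_{\varepsilon} s
\end{align*}
```

The first three rules mirror the pseudometric axioms, which is one way to see why models of such theories come equipped with a metric.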

Basic goals in RL

We are often dealing with large or infinite transition systems whose behaviour is probabilistic.

The system responds to stimuli (actions) and moves to a new state probabilistically and outputs a (possibly) random reward.

We seek optimal policies for extracting the largest possible reward in expectation.

A plethora of algorithms and techniques, but the cost depends on the size of the state space.

Can we learn representations of the state space that accelerate the learning process?
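As one concrete instance of such an algorithm, here is a minimal sketch of value iteration on a tiny finite Markov decision process. The dictionary encoding (`P`, `R`), the discount factor and the two-state example are assumptions made for illustration and are not taken from the lecture.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iter=100_000):
    """Compute optimal values and a greedy policy for a finite MDP.

    P[s][a] maps each successor state to its probability.
    R[s][a] is the expected immediate reward for taking a in s.
    """
    def backup(V, s, a):
        # Expected discounted return of taking a in s, then following V.
        return R[s][a] + gamma * sum(p * V[t] for t, p in P[s][a].items())

    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        # One sweep touches every state and every action.
        new_V = {s: max(backup(V, s, a) for a in P[s]) for s in P}
        delta = max(abs(new_V[s] - V[s]) for s in P)
        V = new_V
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a: backup(V, s, a)) for s in P}
    return V, policy

# Hypothetical two-state example: staying in "y" pays 1 per step.
P = {
    "x": {"stay": {"x": 1.0}, "go": {"y": 1.0}},
    "y": {"stay": {"y": 1.0}, "go": {"x": 1.0}},
}
R = {
    "x": {"stay": 0.0, "go": 0.0},
    "y": {"stay": 1.0, "go": 0.0},
}

V, policy = value_iteration(P, R)
print(policy)  # {'x': 'go', 'y': 'stay'}
```

Every sweep visits each state-action pair, which is the dependence on the size of the state space mentioned above and the motivation for looking for compact representations.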

Representation learning

For large state spaces, learning value functions $S \times A \to \mathbb{R}$ is not feasible.

Instead we define a new space of features $M$ and try to come up with an embedding $\phi : S \to \mathbb{R}^M$.

Then we can try to use this to predict values associated with state-action pairs.

Representation learning means learning such a $\phi$.

The elements of $M$ are the “features” that are chosen. They can be based on any kind of knowledge or experience about the task at hand.
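The role of the embedding can be sketched with a linear value predictor on top of a fixed feature map. Everything here is hypothetical: the three hand-picked features, the integer states, the action names and the learning rate. In representation learning proper, the map `phi` would itself be learned (for example by a neural network) rather than written by hand.

```python
def phi(state):
    """Hypothetical embedding of an integer state into R^3."""
    return [1.0, float(state), float(state % 2)]

def q_value(weights, state, action):
    """Linear prediction: Q(s, a) is the dot product of w_a and phi(s)."""
    return sum(w * f for w, f in zip(weights[action], phi(state)))

def sgd_step(weights, state, action, target, lr=0.1):
    """Move Q(state, action) toward `target` by one gradient step on squared error."""
    error = target - q_value(weights, state, action)
    weights[action] = [w + lr * error * f
                       for w, f in zip(weights[action], phi(state))]
    return error

weights = {"left": [0.0, 0.0, 0.0], "right": [0.0, 0.0, 0.0]}

# Repeatedly fit a single observed value: Q(2, "right") should be 1.0.
for _ in range(60):
    sgd_step(weights, 2, "right", target=1.0)

print(round(q_value(weights, 2, "right"), 6))  # 1.0
print(round(q_value(weights, 4, "right"), 6))  # 1.8
```

State 4 was never observed, yet it receives a prediction because it shares features with state 2. Whether that generalisation is sensible depends entirely on the choice of `phi`, which is why the embedding should place behaviourally similar states close together.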

The MICo distance

The Kantorovich metric is expensive to compute and difficult to estimate from samples.

We (Castro et al.) invented a version that is easy to estimate from samples.

In spirit it is closely related to the bisimulation metric, but it is a crude approximation and is not even technically a metric!
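The slides do not state the definition, so the following is a sketch under an assumption: that the MICo distance for a fixed policy is the fixed point of U(x, y) = |r(x) − r(y)| + γ · E[U(x', y')], where x' and y' are drawn independently from the successor distributions of x and y. Replacing the optimal coupling of the Kantorovich metric by this independent coupling is what makes the quantity easy to estimate from sampled transitions. The dictionary encoding and the three-state example are hypothetical.

```python
def mico_distance(P, r, gamma=0.9, tol=1e-12, max_iter=100_000):
    """Iterate the assumed MICo update to a fixed point on a finite chain.

    P[x] maps each successor of x to its probability (policy already fixed).
    r[x] is the expected reward received in state x.
    """
    states = list(P)
    U = {(x, y): 0.0 for x in states for y in states}
    for _ in range(max_iter):
        new_U = {}
        for x in states:
            for y in states:
                # Successors are sampled independently, so the expectation
                # is a plain double sum with no optimisation over couplings.
                expected = sum(px * py * U[(x2, y2)]
                               for x2, px in P[x].items()
                               for y2, py in P[y].items())
                new_U[(x, y)] = abs(r[x] - r[y]) + gamma * expected
        delta = max(abs(new_U[k] - U[k]) for k in U)
        U = new_U
        if delta < tol:
            break
    return U

# "a" moves to "b" or "c" with equal probability; "b" and "c" are absorbing.
P = {"a": {"b": 0.5, "c": 0.5}, "b": {"b": 1.0}, "c": {"c": 1.0}}
r = {"a": 0.0, "b": 1.0, "c": 0.0}

U = mico_distance(P, r)
print(round(U[("b", "c")], 6))  # 10.0
print(round(U[("a", "a")], 6))  # 4.5
```

The self-distance of `a` is strictly positive, because two independent runs from `a` can end up in different absorbing states. This is the sense in which the distance fails to be a metric.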

A new type of distance

Diffuse metric

1 d(x, y) ≥ 0

2 d(x, y) = d(y, x)

3 d(x, y) ≤ d(x, z) + d(z, y)

4 Do not require d(x, x) = 0
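The axioms above can be checked mechanically on a finite set of points. The sketch below uses hypothetical helper names and a deliberately simple example, the usual distance on numbers shifted up by one, which satisfies the three required conditions but has non-zero self-distances.

```python
from itertools import product

def is_diffuse_metric(points, d, tol=1e-9):
    """Check non-negativity, symmetry and the triangle inequality on `points`."""
    nonneg = all(d(x, y) >= -tol for x, y in product(points, repeat=2))
    symmetric = all(abs(d(x, y) - d(y, x)) <= tol
                    for x, y in product(points, repeat=2))
    triangle = all(d(x, y) <= d(x, z) + d(z, y) + tol
                   for x, y, z in product(points, repeat=3))
    return nonneg and symmetric and triangle

def has_zero_self_distance(points, d, tol=1e-9):
    """The extra condition a pseudometric needs and a diffuse metric drops."""
    return all(abs(d(x, x)) <= tol for x in points)

points = [0.0, 1.0, 2.5]

def shifted(x, y):
    return abs(x - y) + 1.0   # diffuse metric: d(x, x) = 1

def usual(x, y):
    return abs(x - y)         # ordinary metric: d(x, x) = 0

print(is_diffuse_metric(points, shifted), has_zero_self_distance(points, shifted))  # True False
print(is_diffuse_metric(points, usual), has_zero_self_distance(points, usual))      # True True
```

Every pseudometric passes the first check, so diffuse metrics are a strictly weaker notion.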

MICo loss

Nearly all machine learning algorithms are optimization algorithms.

One often introduces extra terms into the objective function that push the solution in a desired direction.

We defined a loss term based on the MICo distance.

For details read https://psc-g.github.io/posts/research/rl/mico/

Experimental setup

Experiments

Added the MICo loss term to a variety of existing agents: all those available in the Dopamine library; 5 in all.

Ran each game 5 times with new seeds, so 300 runs for each agent (60 games × 5 seeds).

Each game is run for 200 million environment interactions.

We look at final scores and learning curves.

We tried each agent with and without the MICo loss term on 60 different Atari games.

Every agent performed better on about 2/3 of the games.

Results for Rainbow

Results for DQN

Conclusions

Bisimulation has a rich and venerable history.

The metric analogue holds promise for quantitative reasoning and approximation.

Perhaps a fruitful line of research would be equation solving in quantitative algebras and automating equational reasoning in the quantitative setting.

Research is alive and well and there are new areas where bisimulation is being “discovered”.
