Transcript of “Vingean reflection: Reliable reasoning for self-improving agents”


Vingean reflection: Reliable reasoning for self-improving agents

Benja Fallenstein

Machine Intelligence Research Institute

May 16, 2015


Motivation

Smarter-than-human intelligence isn’t around the corner

but it’ll (probably) be developed eventually.

Important to ensure it’s aligned with our interests

But how do we specify beneficial goals?

How do we make sure system actually pursues them?

How do we correct the system if we get it wrong?

Want solid theoretical understanding of problem & solution

What is correct reasoning and decision making?

Probability theory, decision theory, game theory, statistical learning theory, Bayesian networks, formal verification, . . .

. . . go in the right direction, but are not enough.

Need for foundational research—which can be done today.


Vingean reflection

Can we create a self-modifying system. . .

. . . that goes through a billion modifications. . .

. . . without ever going wrong?

Need extremely reliable way for an AI to reason about agents smarter than itself — much more reliable than a human!

Need to use abstract reasoning

Vinge: Can’t know exactly what a smarter successor will do

Instead, have abstract reasons to think its choices are good

Standard decision theory doesn’t model this

Formal logic as a model of abstract reasoning


1 The “procrastination paradox”

2 A formal toy model

3 Partial solutions

4 Logical uncertainty

5 Conclusions


The “procrastination paradox”

Agent in a deterministic, known world; discrete timesteps.

In each timestep, the agent chooses whether to press a button:

If pressed in 1st round: Utility = 1/2
If pressed in 2nd round (and not before): Utility = 2/3
If pressed in 3rd round (and not before): Utility = 3/4
. . .

If never pressed: Utility = 0

(No optimal strategy, but sure can beat 0!)

The agent is programmed to press the button immediately. . .

. . . unless it finds a “good argument” that the button will get pressed later.
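
A minimal Python sketch of the payoff structure just described (not part of the original slides; the closed form n/(n+1) for a first press in round n is my reading of the pattern above):

    # Sketch only: utility in the button world above. A first press in round n
    # (n = 1, 2, 3, ...) pays n / (n + 1); never pressing pays 0.
    def utility(press_round=None):
        if press_round is None:          # button never pressed
            return 0.0
        return press_round / (press_round + 1)

    # Any fixed press beats never pressing, but there is no optimal strategy:
    # waiting one more round is always (slightly) better.
    assert utility() == 0.0
    assert all(utility(n + 1) > utility(n) > 0 for n in range(1, 1000))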


The agent reasons:

Suppose I don’t press the button now.

Either I press the button in the next step, or I don’t.

If I do, the button gets pressed, good.
If I don’t, I must have found a good argument that the button gets pressed later. So the button gets pressed, good!

Either way, the button gets pressed.

So the agent can always find a “good argument” that the button will get pressed later. . .

. . . and therefore never presses the button!

If we want to have reliable self-referential reasoning, we must understand how to avoid this paradox (and others like it).


So what went wrong? (And how do we fix it?)

The paradox doesn’t go through with finite time horizons—

—or with temporal discounting: Utility = ∑_{t=0}^∞ γ_t · R_t, where ∑_{t=0}^∞ γ_t < ∞ and R_t ∈ [0, 1].

Does using temporal discounting fix all such problems?

In our toy model: No, not by itself.

Still get (more technical) paradoxes of self-reference.

But: there are ways to fix these problems. . .
. . . which work if we use finite horizons or discounting.

(Suggests this is key to avoiding the problem.)
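
A small numerical illustration of why discounting blocks the “press later” argument (my own example, assuming a geometric discount sequence γ_t = γ^t with 0 < γ < 1, and assuming a first press in round n pays its n/(n+1) reward at time n):

    # Discounted value of a first press in round n, under the assumptions above.
    def discounted_value(n, gamma=0.9):
        return gamma ** n * n / (n + 1)

    best = max(range(1, 200), key=discounted_value)
    # With gamma = 0.9 the value peaks at a small finite round (round 3) and then
    # decays toward 0, so "I'll press later" can no longer be repeated forever.
    print(best, discounted_value(best))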


For our toy model, use formal logic.

But not because we think realistic smarter-than-human agents work like this.

The problem seems to be much more general.

Any scheme for highly reliable self-referential reasoning will need to deal with it somehow.

Rather: because we can prove theorems about it—

and then see what this tells us about the real problem.


Write P(n) for “the button is pressed in the nth timestep”.

Define computable function f(n):

f(n) searches for proofs in Peano Arithmetic (PA), of length ≤ 10^{100+n}, of “∃k > n. P(k)” — i.e., “button pressed later”.

If proof found =⇒ returns 0 (“don’t press button”).
Else =⇒ returns 1 (“press button”).

PA ⊢ P(n) ↔ [f(n) = 1].

(Self-referential definition by Kleene’s second recursion thm.)
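
A schematic Python rendering of f (again not from the slides): pa_proves is a hypothetical stand-in for a bounded proof search in PA, and the self-referential tie between P(n) and f(n) via Kleene’s second recursion theorem is only noted in a comment, not implemented.

    def pa_proves(formula: str, max_length: int) -> bool:
        """Hypothetical helper: search for a PA-proof of `formula` of length
        at most `max_length`. Not implemented in this sketch."""
        raise NotImplementedError

    def f(n: int) -> int:
        # PA proves P(n) <-> [f(n) = 1]; that self-reference comes from Kleene's
        # second recursion theorem and is not reproduced here.
        goal = f"∃k > {n}. P(k)"                        # "button pressed later"
        if pa_proves(goal, max_length=10 ** (100 + n)):
            return 0                                    # proof found: don't press
        return 1                                        # no proof found: press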


By looking at f(n + 1), can prove (in ≪ 10^{100+n} symbols):

“Either the button will be pressed in the next timestep or not”:
PA ⊢ P(n + 1) ∨ ¬P(n + 1)

“If button not pressed in next step, must have found proof it will be pressed later”:
PA ⊢ ¬P(n + 1) → □_PA⌜∃k > n + 1. P(k)⌝

(???) “If there’s a proof that the button will be pressed, then it will indeed be pressed.”
PA ⊢ □_PA⌜∃k > n + 1. P(k)⌝ → ∃k > n + 1. P(k)

“Hence, either way, the button is pressed.”
PA ⊢ P(n + 1) ∨ ∃k > n + 1. P(k)
PA ⊢ ∃k > n. P(k)

Hence, f(n) = 0 (for all n ∈ N). . . button never pressed.

=⇒ So PA ⊬ □_PA⌜ϕ⌝ → ϕ.

(Notation: □_PA⌜ϕ⌝ means “ϕ is provable in PA”.)


PA avoids the paradox since PA ⊬ □_PA⌜ϕ⌝ → ϕ.

→ Generalize this beyond our logic-based toy example?

Why do we think our agent will work correctly?

We reason: “It will only take actions if it has very good reason to believe these actions will be safe — therefore, any actions it will take will be almost certainly safe.”

An agent should be able to use the same argument when reasoning about rewriting itself!

Need something like T ⊢ □_T⌜ϕ⌝ → ϕ. . .

Gödel/Löb: But that’s inconsistent, finite time horizons or not!
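
For reference, the Gödel/Löb obstruction invoked here is a standard theorem and can be stated precisely (a brief aside, not on the slide):

    % Löb's theorem: for any recursively axiomatized T extending PA and any ϕ,
    %   T ⊢ □_T⌜ϕ⌝ → ϕ   implies   T ⊢ ϕ.
    % So if T proved the reflection schema for every ϕ, it would in particular
    % prove it for ϕ = ⊥, and hence prove ⊥; i.e., T would be inconsistent.
    \[
      T \vdash \Box_T\ulcorner\varphi\urcorner \to \varphi
      \;\Longrightarrow\;
      T \vdash \varphi
      \qquad\text{(L\"ob's theorem)}
    \]
    \[
      T \vdash \Box_T\ulcorner\bot\urcorner \to \bot
      \;\Longrightarrow\;
      T \vdash \bot
      \qquad(\varphi = \bot)
    \]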


Partial solutions

1 Can have theories T_0, T_1, T_2, . . . s.t. T_{n+1} ⊢ □_{T_n}⌜ϕ⌝ → ϕ.

Agent using T_{n+1} can rewrite into agent using T_n.
Stops working when we reach T_0.
Works for finite time horizons.

2 Can have theories s.t. T_n ⊢ □_{T_{n+1}}⌜ϕ⌝ → ϕ for all ϕ ∈ Π_1.

Agent using T_n can rewrite into agent using T_{n+1}.
Can rewrite forever!

(But: Agent doesn’t know this! :-()

Works with temporal discounting (Fallenstein & Soares, 2014).

Do these approaches generalize beyond formal logic?
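
An informal Python simulation of approach 1, with names of my own invention (a sketch of the idea, not the construction from the talk): each agent carries a trust index n, standing in for “reasons using theory T_n”, and only approves a rewrite into a successor with a strictly smaller index, so any chain of self-rewrites is finite.

    from dataclasses import dataclass

    @dataclass
    class Agent:
        trust_index: int                 # stands in for "reasons using theory T_n"

    def approve_rewrite(current: Agent, successor: Agent) -> bool:
        # T_{n+1} proves the soundness of T_n, so an agent at index n+1 can verify
        # a successor at index n -- but not one at its own index or higher.
        return successor.trust_index < current.trust_index

    agent, rewrites = Agent(trust_index=5), 0
    while agent.trust_index > 0:
        successor = Agent(agent.trust_index - 1)
        assert approve_rewrite(agent, successor)
        agent, rewrites = successor, rewrites + 1
    print(rewrites)                      # 5: the chain stops once we reach T_0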


Logical uncertainty

Standard probability theory = environmental uncertainty.

Agents are assumed to be logically omniscient.

No theoretical understanding of mathematical uncertainty!

Example: Choose between O(n²) and O(n log n) algorithm

Realistic Vingean reflection needs logical uncertainty.

Approach for study:

Probability distribution over complete theories in some first-order language.
e.g. complete theories extending Peano Arithmetic (PA)

→ uncertainty about whether PA is consistent

Reflection is still difficult


Reflection in probabilistic logic

Assign probabilities P[ϕ] to sentences ϕ. . .

. . . in a language with a symbol for P[·].

Require e.g.: if ZFC ⊢ ϕ → ψ, then P[ϕ] ≤ P[ψ].

Reflection: α ≤ P[ϕ] ≤ β =⇒ P[α ≤ P[ϕ] ≤ β] = 1.

But let ZFC ⊢ ϕ ↔ P[ϕ] < 1 (diagonal lemma).

Suppose P[ϕ] = 1. Then P[ϕ] = P[P[ϕ] < 1] = 0.

Suppose P[ϕ] ≤ 1 − ε < 1. Then P[ϕ] = P[P[ϕ] < 1] ≥ P[P[ϕ] ≤ 1 − ε] = 1.

Contradiction!

Christiano (2013): consistent to have for all α, β ∈ Q, all ϕ:
α < P[ϕ] < β =⇒ P[α < P[ϕ] < β] = 1


Procrastination in probabilistic logic

Christiano (2013): consistent to have for all α, β ∈ Q, all ϕ:
α < P[ϕ] < β =⇒ P[α < P[ϕ] < β] = 1.

Let ZFC ⊢ P(n) ↔ P[∃k > n. P(k)] < 1 − 1/n

“Button pressed in step n unless very sure it’s pressed later”

P[∃n.P(n)] = 1

For all n, P[P(n)] = 0

Unclear how to interpret this!

P can’t be σ-additive probability measure on standard models

But can be finitely additive measure

Clearer understanding needed!


Conclusions

Gave example of self-referential reasoning gone wrong.

Any reliable system for self-referential reasoning will need to deal with this somehow.

Analyzed the problem using a toy model, and looked for solutions that generalize.

Can extend to utility-based agents (Fallenstein & Soares, 2014)

Looked for extensions to logical uncertainty.

Reflection is still difficult.

Still get versions of the procrastination paradox.

Better understanding needed.

Extremely reliable self-referential reasoning isn’t trivial. . .

but we can make progress towards it! Thanks for listening!
