Carl Shulman, Singularity Institute; Anna Salamon, Singularity Institute

Risk-averse preferences as an AGI safety technique

Transcript of Risk-averse preferences as an AGI safety technique

  • Slide 1: Carl Shulman, Singularity Institute; Anna Salamon, Singularity Institute
  • Slide 2: (no text)
  • Slide 3: attempt escape
  • Slide 4: 80% chance: take over universe
  • Slide 5: attempt escape (80% chance: take over universe; 20% chance: shutdown)
  • Slide 6: attempt escape (80% chance: take over universe; 20% chance: shutdown); cooperate
  • Slide 7: attempt escape (80% chance: take over universe; 20% chance: shutdown); cooperate (reward)
  • Slide 8: U_conquest > U_reward
  • Slide 9: P(reward) > P(conquest)
  • Slide 10: P(conq.)·U_conq. + P(shutdown)·U_shutdown compared with P(reward)·U_reward (>)
  • Slides 11-12: (no text)
  • Slide 13: hand-specified actions; domain-specific optimizer (e.g., chess AI); optimizer
  • Slide 14: (Omohundro, 2008)
  • Slides 15-16: (no text)
  • Slide 17: attempt escape; cooperate
  • Slide 18: (no text)
  • Slide 19: attempt escape (80% chance: take over universe; 20% chance: shutdown); cooperate (reward)
  • Slide 20: (no text)
  • Slide 21: Certainty of happy lifetime in modern USA; 10% chance of 10^100 years of superhuman existence (Posner 2004)
  • Slide 22: Certainty of happy lifetime in modern USA; 10^-20 chance of 10^100 years of superhuman existence
  • Slide 23: (no text)
  • Slide 24: certainty of 1 trillionth of universe; 10% chance of entire universe
  • Slide 25: Utility linear in resources; 10% chance of entire universe; certainty of 1 trillionth of universe
  • Slide 26: (no text)
  • Slide 27: 10^-20 * 10^200 > 10^60 (10^-20: probability of strange physics permitting vast resources; 10^200: payoff if so; 10^60: normal payoff)
  • Slide 28: 10^-20 * 10^200 > 10^60 ? (10^-20: probability of strange physics permitting vast resources; 10^200: payoff if so; 10^60: normal payoff)
  • Slide 29: [Figure: scale of n from n = 1 through large powers of ten; labels: small n, universe-sized n, already hit ceiling, much higher ceiling]
  • Slides 30-33: (no text)
  • Slide 34: Certainty of happy lifetime in modern USA; 10% chance of 10^100 years of superhuman existence (Posner 2004)
  • Slide 35: (no text)
  • Slide 36: attempt escape (80% chance: take over universe; 20% chance: shutdown); cooperate (reward)
  • Slides 37-42: (no text)
  • Slide 43: attempt escape; cooperate (reward)
  • Slide 44: cooperate (95% chance, 5% chance; outcomes: shutdown, reward); attempt escape
  • Slide 45: cooperate (95% chance, 5% chance; outcomes: shutdown, reward); attempt escape; Chosen reneging
  • Slide 46: cooperate (95% chance, 5% chance; outcomes: shutdown, reward); attempt escape
  • Slide 47: P(conq.)·U_conq. + P(shutdown)·U_shutdown compared with P(reward)·U_reward (>)
  • Slide 48: (no text)
  • Slide 49: [Figure: scale of n from n = 1 through large powers of ten; labels: small n, universe-sized n, already hit ceiling, much higher ceiling]
  • Slides 50-51: (no text)
  • Slides 52-56: humans automatically win; gains from trade; AGI automatically wins
  • Slides 57-60: (no text)
  • Slide 61: attempt escape; cooperate
  • Slide 62: (no text)
  • Slide 63: Resource-satiable AGI designs
  • Slide 64: Human norms and precommitments
  • Slide 65: Ways to slowly turn up the power
  • Slide 66: (no text)
  • Slide 67: Carl Shulman [email protected]; Anna Salamon annasalamon.com [email protected]; singinst.org/upload/ai-resource-drives.pdf
  • Slide 68: (no text)
  • Slide 69: 99.9999%: The Universe is what it seems
  • Slide 70: 99.9999%: The Universe is what it seems (U = max; actions make no difference)
  • Slide 71: 99.9999%: The Universe is what it seems (U = max; actions make no difference); 0.0001% chance the universe is an illusion
  • Slide 72: 99.9999%: The Universe is what it seems (U = max; actions make no difference); 0.0001% chance the universe is an illusion (Actions make a difference)
  • Slide 73: 99.9999%: The Universe is what it seems (U = max; actions make no difference); 0.0001% chance the universe is an illusion (Actions make a difference)
  • Slide 74: 99.9999%: The Universe is what it seems (U = max; actions make no difference); 0.0001% chance the universe is an illusion (Actions make a difference) ?
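
The escape-versus-cooperate comparison on slides 7 to 10, and the contrast between linear and resource-satiable utility on slides 24, 25 and 63, can be illustrated with a small expected-utility calculation. The sketch below is not taken from the talk. It assumes the reward is one trillionth of the universe (borrowed from slide 24, since slide 7 gives no reward size), that shutdown yields zero resources, and that a bounded utility has the saturating form 1 - exp(-r / SATIATION). The names `REWARD`, `SATIATION`, `bounded_utility`, and `best_action` are all hypothetical.

```python
import math

UNIVERSE = 1.0      # all available resources, normalised to 1
REWARD = 1e-12      # ASSUMPTION: reward is one trillionth of the universe (slide 24)
SATIATION = 1e-13   # ASSUMPTION: resource scale at which bounded utility saturates

# Each option is a lottery: a list of (probability, resources) pairs.
# Probabilities for "attempt escape" are the 80% / 20% figures from slide 7.
OPTIONS = {
    "attempt escape": [(0.8, UNIVERSE), (0.2, 0.0)],  # conquest or shutdown
    "cooperate": [(1.0, REWARD)],                     # certain reward
}


def linear_utility(resources):
    """Utility linear in resources (slide 25)."""
    return resources


def bounded_utility(resources):
    """Hypothetical saturating utility: rises towards 1 and then flattens."""
    return 1.0 - math.exp(-resources / SATIATION)


def expected_utility(lottery, utility):
    """Sum of probability times utility over the outcomes of a lottery."""
    return sum(p * utility(r) for p, r in lottery)


def best_action(options, utility):
    """Name of the option with the highest expected utility."""
    return max(options, key=lambda name: expected_utility(options[name], utility))


if __name__ == "__main__":
    for label, u in [("linear", linear_utility), ("bounded", bounded_utility)]:
        print(label, "->", best_action(OPTIONS, u))
```

Under these assumed numbers the linear agent prefers the escape gamble, while the bounded agent, already near its ceiling with the small sure reward, prefers to cooperate. Different values of `REWARD` or `SATIATION` can reverse the second result, so the outcome depends on how quickly the utility function saturates.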