1 December, 2019matusima/chinou3/JSCPB...24/12/2019 1 Give us today our daily bread. (The Lord’s...

24/12/2019

1

Give us today our daily bread. (The Lord’s Prayer)

What is kotowari (reason, principle) for an animal?

Predisposed developments of economic, social and mathematical comprehension in domestic chicks

1 December 2019, Yoshida Memorial Lecture

JSCPB 2019, University of Tokyo

Toshiya Matsushima, Faculty of Science, Hokkaido University, Sapporo

Center for Mind/Brain Science, University of Trento, Rovereto



domestic chicks





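“Survival of the fittest”

• Passed from H. Spencer to C. Darwin, at A. R. Wallace's suggestion.

• It originated in a sociology that presupposed social progress.

• It was adopted as a substitute term for “natural selection”.

• When biological terms come into general use, their concepts are often misused.

• It has been used to justify a greedy economy.

• Is our world monochromatic?

• Is it a world dominated by thousands of the fittest (Agent Smiths)?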

http://en.wikipedia.org/wiki/File:Smithposter.jpg

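“Survival of the fittest”

• No.

http://ecogreensalonika.wordpress.com/2009/07/22/tremopoulos-question-biodiversity/

• The world is full of truly diverse and colorful things.

• How does optimality work?

• Do animals implement the kotowari (principle) of optimality?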

Neuro-Ecology of Foraging: Matsushima et al. 2001 NeuroReport; Yanagihara et al. 2001 NeuroReport; Izawa et al. 2001 NeuroReport; Izawa et al. 2002 Behav Brain Res; Izawa et al. 2003 J Neurosci; Aoki et al. 2003 European J Neurosci; Ichikawa et al. 2004 Cognitive Brain Res; Izawa et al. 2005 European J Neurosci; Aoki et al. 2006a Behav Brain Res; Aoki et al. 2006b European J Neurosci; Matsushima et al. 2008 Brain Res Bull; Amita et al. 2010 Biology Letters; Kawamori & Matsushima 2010 Animal Cognition; Amita et al. 2011 Frontiers Neurosci; Ogura et al. 2011 Frontiers Neurosci; Kawamori & Matsushima 2012 Animal Behaviour; Matsunami, Ogura et al. 2012 Behav Brain Res; Amita & Matsushima 2014 Behav Brain Res; Ogura et al. 2015 Behav Brain Res; Tsutsui-Kimura et al. 2016 Behav Brain Res; Wen et al. 2016 Frontiers Neurosci; Mizuyama et al. 2016 Animal Behaviour; Xin, Ogura et al. 2017 Behav Processes; Xin et al. 2017 European J Neurosci; Ogura et al. 2018 Frontiers Applied Math & Stat

Imprinting and Development: Matsushima & Aoki 1995 Neurosci Lett; Takeuchi et al. 1996 Neurosci Lett; Yazaki et al. 1997a Zool Sci; Yazaki et al. 1997b Zool Sci; Yanagihara et al. 1998 Brain Res; Yamaguchi et al. 2007 NeuroReport; Yamaguchi et al. 2008a Brain Res Bull; Yamaguchi et al. 2008b Brain Res Bull; Yamaguchi et al. 2010 Neurosci Res; Yamaguchi et al. 2011 NeuroReport; Miura & Matsushima 2012 Animal Cognition; Yamaguchi et al. 2012 Nature Comm; Aoki et al. 2015 Neuroscience; Yamaguchi et al. 2016 Neurosci Lett; Miura & Matsushima 2016 Animal Behaviour; Yamaguchi et al. 2017 Plos One; Yamaguchi et al. 2018 Horm & Behav; Takemura et al. 2018 Behav Brain Res; Miura et al. 2018 Frontiers Physiology; Aoki et al. 2018 Frontiers Physiology; Miura et al. 2019 Animal Cognition


Tell me what you eat, and I will tell you what you are.

(Brillat-Savarin)

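1. Optimality: maximizing something
✓ Charnov's framework: stochastic decisions under limited knowledge
✓ The optimal diet menu model and intertemporal choice: the profitability principle
✓ The optimal patch model and the sunk-cost effect

2. Sociality: shifting the optimum
✓ Giraldeau's framework: the producer–scrounger game and Nash equilibrium
✓ Choice impulsiveness: the kotowari of producers
✓ Social facilitation: the kotowari of scroungers

3. Unique internal representation: bijection f: brain → cognition
✓ Double dissociation of two ecological time factors
✓ The brain is born with biases and keeps generating predictions
✓ Temporal differences are robustly computed from junky value representations

4. Future issues: prototypes of knowledge
✓ The hard knot of imprinting
✓ The biological foundations of mathematics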

1. Optimality: maximizing something

Charnov's optimal foraging theory

Lost opportunity

Profitability = energy gain / handling time

Prey

Eric L. Charnov

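Optimal foraging theory (diet menu model)

⚫ Encounters with prey are stochastic; an encounter cannot be known in advance. ⚫ Passing over an item forfeits the gain it would have yielded (lost opportunity). ⚫ Pursuing immediate profitability maximizes the long-term rate of gain.

To take or not to take…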

Charnov (1976a)
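The decision rule above can be sketched in Python. This is a minimal illustration of the classic zero-one rule of the diet menu model as it is usually stated, not code from the lecture: rank prey types by profitability e/h, and add each type only while its profitability exceeds the long-term rate of gain R of the diet built so far (handling an item for time h costs the lost opportunity R·h). The class `Prey`, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prey:
    name: str
    encounter_rate: float  # lambda: encounters per unit search time (hypothetical)
    energy: float          # e: energy gain per item
    handling: float        # h: handling time per item


def long_term_rate(diet):
    """Long-term rate of gain E/T = sum(lambda * e) / (1 + sum(lambda * h))."""
    gain = sum(p.encounter_rate * p.energy for p in diet)
    time = 1.0 + sum(p.encounter_rate * p.handling for p in diet)
    return gain / time


def optimal_diet(prey_types):
    """Zero-one rule: add prey types in order of profitability e/h while each
    type's profitability exceeds the long-term rate of the diet built so far."""
    ranked = sorted(prey_types, key=lambda p: p.energy / p.handling, reverse=True)
    diet = [ranked[0]]
    for p in ranked[1:]:
        if p.energy / p.handling > long_term_rate(diet):
            diet.append(p)
        else:
            break  # every lower-ranked type is excluded as well
    return diet


small = Prey("small", encounter_rate=1.0, energy=1.0, handling=1.0)
big = Prey("big", encounter_rate=1.0, energy=10.0, handling=2.0)
rare_big = Prey("big", encounter_rate=0.1, energy=10.0, handling=2.0)

print([p.name for p in optimal_diet([big, small])])       # ['big']
print([p.name for p in optimal_diet([rare_big, small])])  # ['big', 'small']
```

When large prey are common, the long-term rate is high and handling a small item costs too much lost opportunity; when large prey are rare, the rate drops and taking the small item pays.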

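To peck or not to peck, that is the question…

Profitability of a known prey item = e/h

Lost opportunity

=?

The near prey or the large prey, that is the question…

Proximity 1/h

Amount e

G. Ainslie “Breakdown of Will” (2001)

The optimal foraging problem resembles intertemporal choice. Are they equivalent?

1. Profitability = amount × proximity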


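Choices by chicks

Operant color-discrimination task reinforced with delayed rewards

Delayed reward after pecking a bead bar

Izawa et al. (2005)

Choices by chicks

Two-alternative choice

(S+ vs. S+)

Izawa et al. (2005)

Delay after pecking the bead bar

[Figure: choices in 8 trials: 8 vs. 0.]

Choices by chicks

[Figure: choices in 8 trials: 7 vs. 1.]

Choices by chicks

[Figure: choices in 8 trials: 7 vs. 1.]

Choices by chicks: we calculated profitability explicitly.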


As a first approximation, we assume the matching law.

Prediction 2: if D1 = D6 (= D) …

Prediction 1: if D1 ≪ D6 …

Prediction 1: if D1 < D6 … $1,500 vs. $500

3 times vs. 1 time

An ideal free distribution is realized.

Sociality is hidden behind individual choice.
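The matching law assumed above can be sketched as follows, assuming that profitability is amount / delay and that each trial's choice is an independent draw with the matching probability. The function names and numbers are hypothetical; the 3:1 case mirrors the $1,500 vs. $500 example.

```python
import random


def profitability(amount, delay):
    """Profitability = amount x proximity, where proximity = 1 / delay."""
    return amount / delay


def choice_ratio(amount_ll, delay_ll, amount_ss, delay_ss):
    """Matching law: P(choose LL) = prof(LL) / (prof(LL) + prof(SS))."""
    p_ll = profitability(amount_ll, delay_ll)
    p_ss = profitability(amount_ss, delay_ss)
    return p_ll / (p_ll + p_ss)


def simulate_choices(p_ll, n_trials, rng):
    """Execute choices stochastically: each trial picks LL with probability p_ll."""
    return ["LL" if rng.random() < p_ll else "SS" for _ in range(n_trials)]


# Equal delays cancel: 6 pellets vs. 1 pellet gives 6/7 for any common delay.
print(choice_ratio(6, 1.4, 1, 1.4))

# D1 fixed at 0.15 s; lengthening D6 lowers the predicted ratio for 6 pellets.
for extra in (0, 1, 2, 3):
    print(extra, round(choice_ratio(6, 0.15 + extra, 1, 0.15), 3))

# $1,500 vs. $500 at equal delays: 3 choices to 1.
print(choice_ratio(1500, 1.0, 500, 1.0))  # 0.75
print(simulate_choices(choice_ratio(6, 1.15, 1, 0.15), 8, random.Random(0)))
```

Under this simple A/D rule, equal delays drop out and only the amounts matter; whether the chicks follow it is what the experiments test.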

Prediction 1

[Figure: choice ratio for the larger reward (6 pellets vs. 1 pellet) plotted against the delay associated with the larger reward, D6 (0–4 s), for D1 = 0.1, 0.2, 0.3, 0.4 and 0.5 s; the even-choice level is marked. Experimental condition: D1 = 0.15 s, D2 = D1 + (0, 1, 2, 3) s.]

Izawa et al. (2003), Matsushima et al. (2008)

Prediction 2

[Figure: choice ratio for the larger reward (6 pellets vs. 1 pellet) plotted against kappa (K), the number of pecks needed to gain a pellet (1–10), for D = 0.2, 0.5, 1.0, 1.5, 2.0, 3.0 and 4.0 s; the even-choice level is marked. Experimental condition: D = 1.4 s (constant).]

Aoki et al. (2006), Matsushima et al. (2008)


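The economics of chicks: profitability ⇒ maximizing the long-term rate of gain

⚫ Chicks prefer food. ⚫ Chicks prefer larger food.

⚫ Chicks prefer food that is available sooner.

⚫ Chicks choose in accordance with profitability. ⚫ Chicks estimate, in advance, the product of reward amount and proximity (profitability), and stochastically execute choices in proportion to the ratio of these values.

⚫ Adopting the matching law compromises optimality.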
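Why matching compromises optimality can be seen with a little arithmetic. A minimal sketch, assuming two options with fixed profitabilities a and b and no carry-over between choices (both simplifications are ours): matching earns $(a^2 + b^2)/(a + b)$ per choice, which is less than $\max(a, b)$ whenever $a \neq b$.

```python
def expected_gain_matching(prof_a, prof_b):
    """Expected profitability per choice when choices match the profitability ratio."""
    p_a = prof_a / (prof_a + prof_b)
    return p_a * prof_a + (1 - p_a) * prof_b


def expected_gain_maximizing(prof_a, prof_b):
    """Expected profitability per choice when the better option is always chosen."""
    return max(prof_a, prof_b)


print(expected_gain_matching(3.0, 1.0))    # 2.5
print(expected_gain_maximizing(3.0, 1.0))  # 3.0
```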

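2. Optimal patch use and the sunk-cost effect

Perch

Color + operant key

Feeder

16 flights > Red ON > Peck Red > Food

(same reward amount)

European starling

Training

4 flights > Blue ON > Peck Blue > Food

Kacelnik & Marsh (2002)

Sunk-cost effect (Concorde fallacy)

*The colors actually used were white and orange, not blue and red.

food tray

Two-alternative choice between red and blue

Red > Blue; Kacelnik & Marsh (2002)

European starling

Test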








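⚫ Unrecoverable costs that have already been invested distort forward-looking decisions. (sunk-cost effect)

⚫ The invested effort cost forms a preference bias between two predictors associated with equivalent rewards. (labor theory of value)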

Eric L. Charnov

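Optimal foraging theory (optimal patch-use model)

Assumptions: (1) Food is distributed in patches (non-uniformly) in the world. (2) The food resources of a patch gradually decrease as they are consumed (diminishing returns). (3) The forager cannot know patch encounters in advance.

The forager should leave a patch early, before the food runs out.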

Charnov (1976b)


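Diminishing returns and the marginal value theorem: it is optimal to leave a patch when the instantaneous rate of gain (the tangent) equals the long-term average rate of gain in that foraging environment.

Cumulative food

Patch-use time / travel time

Optimal

Too long

Too short

Charnov (1976b)

Cumulative food

Patch-use time / travel time

Diminishing returns and the marginal value theorem: if the travel time becomes longer, the patch-use time should become correspondingly longer.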

Charnov (1976b)
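The marginal value theorem can be illustrated numerically. A minimal sketch, assuming a hypothetical exponential gain curve $g(t) = a(1 - e^{-rt})$ and travel times in arbitrary units (not the treadmill runs themselves): the optimal residence time $t^*$ solves $g'(t^*) = g(t^*)/(\tau + t^*)$, the tangent condition described above, and it grows with the travel time $\tau$.

```python
import math


def gain(t, a=10.0, r=0.5):
    """Cumulative food after t time units in a patch (diminishing returns)."""
    return a * (1.0 - math.exp(-r * t))


def gain_rate(t, a=10.0, r=0.5):
    """Instantaneous rate of gain g'(t): the slope of the tangent."""
    return a * r * math.exp(-r * t)


def optimal_residence(travel, a=10.0, r=0.5, hi=100.0, iters=200):
    """Marginal value theorem: leave when g'(t) = g(t) / (travel + t).

    h(t) = g'(t) * (travel + t) - g(t) is positive at t = 0 and strictly
    decreasing (h'(t) = g''(t) * (travel + t) < 0), so bisection finds its root.
    """
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gain_rate(mid, a, r) * (travel + mid) - gain(mid, a, r) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for travel in (2.0, 8.0, 12.0):  # hypothetical travel times
    print(travel, round(optimal_residence(travel), 3))
```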

A: Forced running on a treadmill

x2 runs x8 runs x12 runs

Cumulative food

Patch-use time

B

Cumulative food



Fujikawa and Matsushima (unpublished)

⚫ Longer travel led to longer stays. ⚫ However, there was no correlation.

⚫ Leaving is stochastic and follows a Poisson rule: variance ≈ mean².
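Why a Poisson rule implies variance ≈ mean²: if leaving occurs with a constant probability per unit time, stay durations are exponentially distributed, and an exponential distribution has a variance equal to the square of its mean. A minimal simulation sketch, with a hypothetical mean stay of 30 time units:

```python
import random
import statistics


def simulate_leaving_times(mean_stay, n, rng):
    """Leaving as a Poisson process: a constant hazard of 1/mean_stay per unit
    time gives exponentially distributed stay times."""
    return [rng.expovariate(1.0 / mean_stay) for _ in range(n)]


rng = random.Random(42)
stays = simulate_leaving_times(mean_stay=30.0, n=100_000, rng=rng)
m = statistics.fmean(stays)
v = statistics.pvariance(stays)
print(round(m, 2), round(v / m ** 2, 3))  # the second number should be close to 1
```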

[Figure, unbiased condition: (A) forced runs on a treadmill (x2, x8 or x12 runs); (B) cumulative food against patch-use time in patches A+B for each run count.]

The stays deviate from the optimal level.

Fujikawa and Matsushima (unpublished)


What if the travel costs differ? Patch A is reached in 2 runs; patch B requires 2–12 runs.

[Figure: cumulative food in patches A and B; x2 runs to A; x2, x8 or x12 runs to B.]


[Figure: patch A and patch B with x2, x8 and x12 runs; unbiased (n=5 chicks) vs. biased (n=5 chicks) conditions.]


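[Figure: cumulative food in patches A and B.]

⚫ Making the travel unequal does not make the stays unequal. ⚫ Foraging is optimized over the whole foraging space, according to the marginal value theorem.

[Figure, biased condition: (A) forced runs on a treadmill (fixed x2 runs); (B) cumulative food against patch-use time.]

A > B: because the labor cost of A is lower than that of B. (utility value)

A = B: because A and B provide the same amount of food. (objective value)

A < B: because more labor was invested in B than in A. (labor value)

⚫ Chicks do not show the sunk-cost effect (Concorde fallacy).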
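The claim that foraging is optimized over the whole foraging space can be illustrated with a two-patch version of the marginal value theorem. A minimal sketch under our own assumptions (identical, hypothetical gain curves in A and B; one visit to each patch per cycle; travel in arbitrary units): maximizing the long-term rate over the whole cycle yields equal stays in A and B however costly the travel to B is, because both patches are left at the same environment-wide rate.

```python
import math


def gain(t, a=10.0, r=0.5):
    """Cumulative food in one patch after t time units (diminishing returns)."""
    return a * (1.0 - math.exp(-r * t))


def cycle_rate(t_a, t_b, travel_a, travel_b):
    """Long-term rate for a forager that visits patch A and then patch B each cycle."""
    return (gain(t_a) + gain(t_b)) / (travel_a + t_a + travel_b + t_b)


def best_stays(travel_a, travel_b, step=0.05, t_max=10.0):
    """Brute-force grid search for the pair of stays that maximizes the cycle rate."""
    grid = [step * k for k in range(1, round(t_max / step) + 1)]
    return max(((ta, tb) for ta in grid for tb in grid),
               key=lambda pair: cycle_rate(pair[0], pair[1], travel_a, travel_b))


# Hypothetical travel: A always costs 2 units; B costs 2, 8 or 12 units.
for travel_b in (2.0, 8.0, 12.0):
    t_a, t_b = best_stays(2.0, travel_b)
    print(travel_b, round(t_a, 2), round(t_b, 2))  # t_a and t_b stay (nearly) equal
```

Costlier travel to B lengthens both stays together, since it lowers the rate of the whole environment, but it never makes the stay in B longer than the stay in A.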

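The economics of chicks: the profitability principle holds

⚫ Chicks choose options in accordance with profitability (amount × proximity). Because of the matching law, their choices are suboptimal.

⚫ Chicks like large food. (amount)

⚫ Chicks like food delivered immediately. (proximity)

⚫ They set choice probabilities according to profitability. (matching law)

⚫ They decide stochastically when to leave, according to the marginal value theorem. Stay times are longer than the optimal level (suboptimal), but leaving decisions are based on the objective yield.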








2. Sociality: shifting the optimum

Giraldeau's social foraging theory

Social life imposes costs

Energetic budget of sociality

Reproductive interference

Foraging competition

Increased extra-parasite load

Alcock (2001) “Animal Behavior”

Blood sharing in vampire bats; public information

Danchin et al. (2004) Science; dilution of predation risk

http://www.justinunderwater.com/gallery/album96/Schooling_Fish

Giraldeau & Caraco (2000) Social Foraging Theory

Producer-scrounger game

Social life brings benefits

http://asa10.eiga.com/2016/cinema/611.html

Producer-scrounger game

Samurai (scroungers)

… and many farmers (producers)

Kikuchiyo (Toshiro Mifune)

Social foraging: the producer–scrounger game

⚫ Frequency dependence of fitness produces a stable Nash equilibrium between the two tactics.
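The frequency-dependent equilibrium can be sketched with the rate-maximizing producer–scrounger model in the form commonly used in social foraging theory (for example, Giraldeau & Caraco 2000). Assumptions, all ours: G individuals play pure producer or pure scrounger; each producer finds patches of F items at the same rate; the finder eats a items alone (the finder's share) and splits the rest equally with every scrounger. Equating the two payoffs gives the stable scrounger fraction $\hat{q} = 1 - a/F - 1/G$; below it scrounging pays better and spreads, above it producing pays better.

```python
def payoffs(q, group_size, patch_food, finders_share, find_rate=1.0):
    """Intake rates of producers and scroungers when a fraction q scrounges.

    A producer eats the finder's share first; the remaining food is split
    equally between the finder and all q * G scroungers, who join every patch.
    """
    n_scroungers = q * group_size
    split = (patch_food - finders_share) / (n_scroungers + 1)
    producer = find_rate * (finders_share + split)
    n_producers = (1 - q) * group_size
    scrounger = find_rate * n_producers * split
    return producer, scrounger


def equilibrium_q(group_size, patch_food, finders_share, iters=200):
    """Bisection on the payoff difference, which changes sign once in (0, 1)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        prod, scro = payoffs(mid, group_size, patch_food, finders_share)
        if prod < scro:
            lo = mid  # scroungers do better: scrounging spreads
        else:
            hi = mid  # producers do better: scrounging declines
    return 0.5 * (lo + hi)


G, F, a = 10, 20.0, 5.0  # hypothetical group size, patch size, finder's share
q_hat = equilibrium_q(G, F, a)
print(round(q_hat, 4), round(1 - a / F - 1 / G, 4))  # numerical vs. closed form, both about 0.65
```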


3. Competitive foraging and impulsiveness


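Social foraging causes choice shifts

The expected reward amount does not change with distance.

The producer gains the “finder's advantage”: resources are allocated according to distance.

⚫ Assume that each individual is a producer and a scrounger at the same time.

⚫ In the presence of scroungers, producers should behave more impulsively.

Producers can secure more reward from food resources that are closer.

The producer tactic enhances impulsiveness.

Day 1 to 3; Day 5

SS short latency, small food: 1 grain, delay = 0 s

LL long latency, large food: 6 grains, delay = 0, 1.5, 3 s

Amita et al. (2010)

⚫ Under real competition, the impact of delay almost doubled.

⚫ Pseudo (perceived) competition also enhanced impulsiveness.

⚫ Risk alone does not enhance it.

Producer

Scrounger

⚫ The producer and the scrounger were separated by an acrylic partition.

⚫ Competition was only apparent; no actual competition for food occurred.

⚫ Risk (variance of outcomes) was simulated on the basis of a binomial distribution.

Amita et al. (2010); Mizuyama et al. (2016)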
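How a binomial distribution can simulate risk while holding the mean constant: a minimal sketch, assuming (hypothetically) that the variable option delivers Binomial(12, 0.5) grains per trial, so that it matches a constant 6 grains on average but has variance 12 × 0.5 × 0.5 = 3. The parameters actually used by Amita et al. (2010) and Mizuyama et al. (2016) are not given here.

```python
import random
import statistics


def variable_food(n_trials, rng, max_grains=12, p=0.5):
    """Risky option: grains per trial ~ Binomial(max_grains, p), mean = max_grains * p."""
    return [sum(rng.random() < p for _ in range(max_grains)) for _ in range(n_trials)]


def constant_food(n_trials, grains=6):
    """Safe option: the same number of grains on every trial."""
    return [grains] * n_trials


rng = random.Random(7)
risky = variable_food(20_000, rng)
safe = constant_food(20_000)
print(statistics.fmean(risky), statistics.pvariance(risky))  # about 6 and about 3
print(statistics.fmean(safe), statistics.pvariance(safe))    # 6 and 0
```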

[Figure (a): number of LL choices per 20 trials with variable vs. constant food, under competition (Comp) vs. isolation (Isol); LL delay = 1.5 s; N = 5, 5, 6, 8; letters a, b, c, bc mark significance groups. (b): the same with LL delay = 0 s; N = 7, 7, 6, 6; NS.]

⚫ Risk and competition

⚫ Impulsiveness is enhanced when risk (variance of outcomes) meets (perceived) competition.

Producer

Scrounger

[Figure: number of LL choices per 20 trials when variable (Var) or constant (Const) food is assigned to the SS and LL options; N = 9, 8, 8, 8; letters a, a, b, ab mark significance groups.]

LL long latency, large food: 6 grains, delay = 0, 1.5, 3 s. Mizuyama et al. (2016)



⚫ Impulsiveness is enhanced whichever option carries the risk. Discounting does not occur separately for each option.


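[Figure: producer–scrounger model. The producer P is at distance d from the food patch F (amount A), and the scrounger S is at distance δ from P. P monopolizes the patch for a delay $T_p = \delta / v$ before S arrives; gain over time is plotted for P and for S, together with the resulting profitability (Φ, Ω).]

• The producer (P) finds the food F.

• The scrounger (S) follows P and shares the food F.

The mathematics of intertemporal choice: what the producer–scrounger conflict of interest brings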

Ogura, Amita & Matsushima (2018)


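long δ → pulling away

→ producer's share ↑

short δ → catching up

→ scrounger's share ↑

• Everyone tries to make their own payoff “a little better”.

• An arms-race game arises.

Arms race

Under social foraging, even when the amount-to-delay ratio (A/D) of SS is smaller than that of LL, the profitability of SS can exceed that of LL.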

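Premise: $A_{SS}/D_{SS} < A_{LL}/D_{LL}$

Case (1): $A_{SS} < A_{LL} \le sT$. Then $\mathrm{prof}(SS) > \mathrm{prof}(LL)$ for all $T > 0$ and all $s > 0$.

Case (2): $A_{SS} \le sT < A_{LL}$. Then there exists $f > 0$ such that, for all $s > 0$, $\mathrm{prof}(SS) > \mathrm{prof}(LL)$ for all $T \in (0, f)$.

Case (3): $sT < A_{SS} < A_{LL}$. Then there exists $g > 0$ such that, for all $s > 0$, $\mathrm{prof}(SS) > \mathrm{prof}(LL)$ for all $T \in (g, \infty)$.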


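long δ (pulling away) ↔ short δ (catching up)

Arms race: the mathematics of intertemporal choice

What the producer–scrounger conflict of interest brings

⚫ SS is more robust against scrounging than LL.

⚫ Under the severe competition of the arms-race game, the robustness of SS outweighs the larger amount of LL.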
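A hypothetical toy model, not the model of Ogura, Amita & Matsushima (2018), shows why a small item is more robust to scrounging. Assume the producer eats alone at rate `eat_rate` until the scrounger arrives after `monopoly_time` (the role played by $T_p = \delta / v$), after which the remainder is shared equally. The producer then keeps all of a small item but only part of a large one. Both parameters and the sharing rule are our assumptions.

```python
def producer_gain(amount, monopoly_time, eat_rate):
    """Hypothetical toy: the producer eats alone at eat_rate until the scrounger
    arrives after monopoly_time; whatever is left is then split equally."""
    alone = min(amount, eat_rate * monopoly_time)
    remainder = amount - alone
    return alone + remainder / 2


for amount in (1.0, 6.0):  # SS = 1 grain, LL = 6 grains
    kept = producer_gain(amount, monopoly_time=1.0, eat_rate=1.0)
    print(amount, kept, kept / amount)  # fraction of the item the producer keeps
```

Here the producer keeps the whole 1-grain item but only 3.5 of 6 grains; a longer head start (larger δ) raises the producer's share, which is what drives the arms race.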

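Arms race: right at the limit of the brain's processing time. Would a stimulus–response reflex be too slow?

4. Social facilitation and synchronization

The producer gains the “finder's advantage”: resources are allocated according to distance.

Producers can secure more reward from food resources that are closer.

Scroungers can scrounge more from producers that are closer.

Social foraging causes choice shifts

⚫ Scroungers, too, should target closer producers.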


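[Figure: apparatus with a red feeder (left) and a blue feeder (right) on either side of a center line; dimensions labeled 88 cm and 12 cm.]

⚫ Each feeder delivers a small food item (one millet grain) at a low frequency (once every 10–20 s).

⚫ No cue precedes food delivery.

⚫ Chicks quickly begin to shuttle between the two feeders.

Ogura & Matsushima (2011)

The scrounger tactic leads, through social facilitation of foraging effort, to synchronization and excessive labor investment.

No real competition; the chicks can see each other.

No real competition; the chicks cannot see each other.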

Ogura & Matsushima (2011)

Xin, Ogura and Matsushima (2017)

⚫ Foraging effort is facilitated to a similar degree. Pair: seeing a competing other; Mirror: seeing oneself foraging.

⚫ The matching of feeder use differs. Pair: exact matching; Mirror: like Single (undermatching).

Foraging effort is socially facilitated: more opportunities, or the acquisition of public information?

⚫ When the feeders are reversed… Pair: strong perseveration at the formerly better feeder. Mirror, Single: rapid switching to the currently better feeder.

Xin, Ogura and Matsushima (2017)
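The contrast between exact matching and undermatching is commonly quantified with the generalized matching law, $B_1/B_2 = b\,(R_1/R_2)^{s}$, where $s = 1$ is strict matching and $s < 1$ is undermatching (behavior pulled toward indifference). A minimal sketch with hypothetical reinforcement ratios and sensitivity values; it is not a fit to the Pair, Mirror or Single data.

```python
def generalized_matching(r1, r2, sensitivity=1.0, bias=1.0):
    """Generalized matching law: B1/B2 = bias * (R1/R2) ** sensitivity.

    Returns the proportion of responses allocated to option 1.
    """
    ratio = bias * (r1 / r2) ** sensitivity
    return ratio / (1.0 + ratio)


print(generalized_matching(3, 1))                   # 0.75 (strict matching)
print(generalized_matching(3, 1, sensitivity=0.5))  # about 0.634 (undermatching)
```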







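⚫ Resource competition enhances impulsiveness and leads to excessive labor.

⚫ Resource allocation is optimized and a Nash equilibrium is realized.

⚫ Competition leads to synchronization and strengthens the matching law and reference memory.

3. Unique internal representation: bijection f : brain → cognition

Two doubly dissociated cost variables

Junky representations of prediction and the temporal difference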


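5. Two cost variables

[Figure: schematic of the chick brain: GP, LSt, optic tectum, MSt/NAc, VTA/SNc, nidopallium, arcopallium, dopamine.]

The basal ganglia (striatum and nucleus accumbens) govern decisions based on the proximity of reward.

Effects of local lesions of the nucleus accumbens

Izawa et al. (2003)

[Figure: lesioned vs. control chicks; ITIs = 30–60 s.]

⚫ Lesions caused impulsive choices.

⚫ Amount-based choice and relearning were unaffected.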


Izawa et al. (2003)

⚫ When proximity is recalled from a cue, lesions enhance impulsiveness.

Red: 20 cm; Yellow: 80 cm; Green: 140 cm; Blue: no reward

(fixed amount for the sides; 1 for left, 6 for right)

Aoki et al. (2006a)


Aoki et al. (2006a)

When colors signaled the proximity: lesions caused impulsive choices.







Red: 6 grains; Yellow: 1 grain; Blue: no reward

⚫ When proximity is recalled from a cue, lesions enhance impulsiveness.

⚫ When amount is recalled from a cue, lesions have no effect.

Aoki et al. (2006a)

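[Figure: schematic of the chick brain: GP, LSt, optic tectum, MSt/NAc, VTA/SNc, nidopallium, arcopallium, dopamine.]

The arcopallium (equivalent to the cortical association areas) governs cost-based decisions.

One-way ANOVA revealed a significant difference between groups at p<0.005.

⚫ The cost coefficient (κ) is one of the key factors in choice.

Effects of local lesions of the arcopallium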

Aoki et al. (2006b)

[Figure: choices before (pre-ope) and after (post-ope) surgery in the sham, MSt-NAc and arcopallium lesion groups.]

One-way ANOVA of the post-ope data revealed a significant interaction of group × test type at p < 0.01.

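Effects of local lesions of the arcopallium

⚫ Lesions of the arcopallium cause avoidance of costly options.

⚫ Lesions of the nucleus accumbens do not cause cost avoidance.

Aoki et al. (2006b)

Delay (time to reach food), small-immediate vs. large-late reward: nucleus accumbens lesion → impulsive choice; arcopallium lesion → no effect.

Effort (to consume food), small-easy vs. large-effortful reward: nucleus accumbens lesion → no effect; arcopallium lesion → effort avoidance.

Two costs, time and effort, are doubly dissociated in the brain.

• At the level of brain nuclei, the mapping from brain to cognition is injective.

6. Neuronal representation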


micro-drive and buffer amplifiers

LEGO Mindstorm-controlled feeder

colored cue bead

Yanagihara et al. (2001)

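Neurons in the medial striatum and nucleus accumbens represent what will be obtained in the immediate future (the expected reward).

[Figure: trial epochs (cue, delay, reward; timing labels 4 s, 4 s, 1 s) for three trial types: peck → reward, no peck → reward, no peck → no reward.]

Medial striatum–nucleus accumbens neurons represent neither the sensory input (color cue) nor the motor output (Go/No-Go), but the expected reward.

Cue-period activity

red = yellow > green

corresponds to the proximity of the expected reward (the inverse of delay).

They also represent the reward expectation based on the chick's own decision (delay period).

Izawa et al. (2005)

Medial striatum–nucleus accumbens neurons represent the proximity of what will be obtained in the immediate future (the expected reward).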


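Medial striatum–nucleus accumbens neurons represent the amount of what will be obtained in the immediate future (the expected reward).

Cue-period activity

red = green > yellow

corresponds to the amount of the expected reward.

They also represent the value of the actual reward.

Izawa et al. (2005)

[Figure: cue-period activity following the color cue reflects the “goodness” of food (proximity / amount) through recall of the cue–reward association.]

The nucleus accumbens is a sushi train: a substrate for temporal-difference learning

Amita & Matsushima (2014)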

[Figure: “sushi train” schematic along time: color cue → cue-period activity (recall of the cue–reward association) → decision → delay-period activity (interval-timing mechanism) → food delivery → reward-period activity (reward perception); error computation by DA-ergic neurons.]

• The value of the expected reward has inertia and persists even after the actual reward has been obtained.

Wen & Matsushima (2016)

The nucleus accumbens is a sushi train: a substrate for temporal-difference learning

[Figure: activity in cue1 (rewarding), cue3 (non-rewarding) and cue1 (omission) trials, aligned to cue and food (time, s): actual reward AR_str and reward prediction RP_str, both in the medial striatum–nucleus accumbens.]



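Temporal difference (midbrain tegmentum): $\mathrm{TD} \approx \beta_1 \cdot AR_{str} + \beta_2 \cdot RP_{str}$

[Figure: z score of the mean firing rate (time, s; aligned to cue and food) in cue1 (rewarding), cue3 (non-rewarding) and cue1 (omission) trials, with the fitted linear combination.]

(β1 = 0.7006, β2 = 0.6623)

• Predictions are updated by the temporal difference.

• This happens continuously, not only in the reward period. (The Rescorla–Wagner rule is extended.)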
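The weights β1 and β2 above come from fitting the dopaminergic signal as a linear sum of the two striatal signals. A minimal sketch of such a two-regressor least-squares fit (no intercept, as in the formula above), applied to hypothetical noise-free traces built with the reported weights so that the fit recovers them up to floating-point error; the real analysis and data are not reproduced here.

```python
def fit_two_regressors(x1, x2, y):
    """Least-squares fit of y ≈ b1*x1 + b2*x2 (no intercept) via the 2x2 normal equations."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("regressors are collinear")
    b1 = (s1y * s22 - s2y * s12) / det  # Cramer's rule
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# Hypothetical, noise-free traces (z scores) built with the reported weights.
ar_str = [0.0, 1.0, 2.0, 0.0, 1.0]
rp_str = [1.0, 0.0, 1.0, 2.0, 2.0]
td = [0.7006 * a + 0.6623 * b for a, b in zip(ar_str, rp_str)]
print(fit_two_regressors(ar_str, rp_str, td))  # recovers about (0.7006, 0.6623)
```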



• At the neuronal level, the mapping from brain to cognition is not injective.

cue-reward association

food value (amount)

food value (proximity)

decision-based prediction

approach to the reward

real reward

MSt-NAc



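• At the neuronal level, the mapping from brain to cognition is not injective. • Yet, somehow, the temporal difference is computed as a robust linear sum of various cognitive representations.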

glu/GABA


SNc/VTA

TD signal = dopamine

• Sutton R.S. & Barto A.G. (1998) “Reinforcement Learning”

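• Temporal-difference learning generalizes the Rescorla–Wagner rule (prediction error) to real time.

• The critic computes the temporal difference between the actual reward and the value: $V(s_t) \leftarrow V(s_t) + \alpha\,[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)]$

• The actor's policy is updated at the same time.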
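The critic update above can be sketched as tabular TD(0) on a fixed chain of states standing in, hypothetically, for cue → delay → food. A minimal sketch (the state chain, α, γ and the episode count are our assumptions): after learning, value propagates back from the rewarded state, so each earlier state's value is γ times that of the next.

```python
def td0_chain(n_states=3, reward=1.0, alpha=0.1, gamma=0.9, episodes=1000):
    """Tabular TD(0) on a fixed chain s_0 -> s_1 -> ... -> s_{n-1} -> end.

    Only the last transition is rewarded (e.g. cue -> delay -> food).
    V(s_t) <- V(s_t) + alpha * [r_{t+1} + gamma * V(s_{t+1}) - V(s_t)]
    """
    values = [0.0] * (n_states + 1)  # values[n_states] is the terminal state, kept at 0
    for _ in range(episodes):
        for t in range(n_states):
            r_next = reward if t == n_states - 1 else 0.0
            td_error = r_next + gamma * values[t + 1] - values[t]
            values[t] += alpha * td_error
    return values[:n_states]


print([round(v, 3) for v in td0_chain()])  # [0.81, 0.9, 1.0]
```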








• The possibility of reservoir computing



Nidopallium

Arcopallium?

dopamine









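In fact, there are very few. The projection to LSt is far larger.

Actor learning must be bootstrapped by the plasticity here alone.

• The value function $V(s_t)$ gives the goodness of the current state $s_t$ on the basis of predictions of future reward. • “My happiness now is determined by the total amount of beer I will drink in my lifetime.” > Unrealistic.

• Discounting the beer of the distant future and summing it defines a finite value.

• The critic computes the temporal difference between the actual reward and the value (value has inertia): $V(s_t) \leftarrow V(s_t) + \alpha\,[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)]$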
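The discounting argument can be checked numerically. A minimal sketch, assuming one unit of beer per time step forever and a discount factor γ = 0.9 (both hypothetical): the undiscounted total grows without bound, but the discounted sum converges to the geometric-series limit $r/(1 - \gamma) = 10$.

```python
def discounted_value(rewards, gamma):
    """V = sum_k gamma**k * r_k: finite for bounded rewards when 0 <= gamma < 1."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))


beer_per_step = 1.0  # hypothetical: one glass of beer at every time step
gamma = 0.9          # hypothetical discount factor

for horizon in (10, 100, 1000):
    undiscounted = beer_per_step * horizon
    discounted = discounted_value([beer_per_step] * horizon, gamma)
    print(horizon, undiscounted, round(discounted, 6))

print("limit:", beer_per_step / (1 - gamma))  # geometric series r / (1 - gamma), about 10
```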






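• Reflexology still constrains our understanding.

• The chick brain keeps generating predictions, leaning forward. • Within the “cocoon” of the nucleus accumbens, predictions are generated from predictions. • Actions, always destined to lag, catch up with real time thanks to prediction. • The chick survives in a world where a reflexive “brain” would be too slow.

• Built-in social biases enhance fitness. • Right after hatching, chicks show rational behavioral biases. • Untuned, optimistic initial values work. • The chick survives in a world where learning would be too slow.

• This requires a huge number of neurons in the background. • The possibility of reservoir computing.

4. Summary and outlook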


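Kotowari has many aspects.

1. Optimality: there is something to maximize
✓ Charnov's framework: stochastic decisions under limited knowledge
✓ The optimal diet menu model and intertemporal choice: the profitability principle
✓ The optimal patch model and the sunk-cost effect

2. Sociality: shifting the optimum
✓ Giraldeau's framework: the producer–scrounger game and Nash equilibrium
✓ Choice impulsiveness: the kotowari of producers
✓ Social facilitation: the kotowari of scroungers

3. Unique internal representation: bijection f: brain → cognition
✓ Double dissociation of two ecological time factors
✓ The brain is born with biases and keeps generating predictions
✓ Temporal differences are robustly computed from junky value representations

4. Future issues: prototypes of knowledge
✓ The hard knot of imprinting
✓ The biological foundations of mathematics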

Yes?

Yes!

???

formation of imprinting memory

thyroid hormone action

✓ Yamaguchi et al. (2012) Nature Comm.

The hard knot of imprinting

✓ Yamaguchi et al. (2012) Nature Comm.
✓ Miura & Matsushima (2012) Animal Cognition
✓ Miura & Matsushima (2016) Animal Behaviour
✓ Miura et al. (2018) Frontiers Physiology
✓ Miura et al. (2019) Animal Cognition

Johansson’s Biological Motion, or BM

The hard knot of imprinting: induction of BM preference; formation of imprinting memory

reversal learning

task switching

economical reasoning

✓ Aoki et al. (ongoing, unpublished)



induction of BM preference


number sense

geometric sense

reversal learning

task switching

economical reasoning

✓ Aoki et al. (ongoing, unpublished)
✓ Rugani (2017) Phil Trans B
✓ Vallortigara (2017) Phil Trans B
✓ … maybe in the next 10 years to come

mathematical reasoning





domestic chicks






Hokkaido University: Matsunami S, Kawamori A, Amita H, Ogura Y, Miura M, Wen C, Xin Q, Mizunami R, Watanabe K, Uno L, Saheki Y, Huang J, and many students

Nagoya University: Yanagihara S, Izawa E-I, Aoki N, Ichikawa Y, Suzuki R, and many students

Semmelweis University (Hungary): Csillag A, Mezey Sz, and students

St. Istvan University (Hungary): Kabai P, Zacher G, and students

University of Trento & Padova (Italy): Vallortigara G, Pecchia T, Chiandetti C, and students

Teikyo University (Japan): Homma K J, Yamaguchi S, and students

Inst Biocybernetics, Polish Academy of Science (Poland): Bem-Sojka T

students & colleagues

Financial supports from JSPS-MEXT (Japan), TeT (Hungary) and CNR (Italy)

Hidetoshi AMITA: neural control of impulsive choices

Post-doc (Dr. O. Hikosaka’s lab, NIH, Maryland, USA); Assistant Professor (Primate Research Institute, Kyoto University, Inuyama)

Optogenetic Study of the Basal ganglia in Monkeys

Chentao WEN: neuro-computation for prediction error

Post-doc at Dr Kimura’s lab (Osaka Univ./Nagoya City Univ.): whole-neuron imaging in freely behaving C. elegans

Qiuhong XIN: social facilitation and limbic system. Post-doc (Dr. Hailan Hu’s lab, Zhejiang University, Hangzhou, China): limbic system of the mouse

Yukiko OGURA: social facilitation of work investment. Assistant Professor (Social Psychology Lab, University of Tokyo, Tokyo): fMR imaging of human social behaviors

Give us today our daily bread. (The Lord’s Prayer)

Mental toolkit: every animal has a “mental toolkit”.

Some tools (modules) are universal across species; others belong only to particular species.

(Hauser M., “Wild Minds” 2000)

Core knowledge: animals have innate “core knowledge” for processing important events in the environment.

It is universal across species and developmental stages. (Spelke E.S. & Kinzler K.D., 2007; Vallortigara G., 2009)


[Figure: normalized population activity of 37 neurons (0–6 s after the cue) in S-, SS (small reward, short delay) and LL (large reward, long delay) trials, isolation (isol) vs. competition (comp); cue and operant periods marked.]

Pseudo-competition selectively suppressed the cue responses.

Contextual effect

⚫ 2-way ANOVA with repeated measures: ⚫ isol vs. comp (F(1,36) = 29.70, p < 0.001); ⚫ S-, SS and LL (F(2,72) = 36.30, p < 0.001); ⚫ no significant interaction (F(2,72) = 0.67, NS).

Amita and Matsushima (2014) Behav Brain Res

Contextual modulations (cf. conditioned impulsiveness)

[Figure: sushi-train schematic: cue-, delay- and reward-period activities along time (association, decision, interval-timing mechanism, food delivery, reward perception); temporal difference computed by DA-ergic neurons.]

“Sushi Train” hypothesis: multiple codes of rewards

⚫ TD error (or prediction error) is computed by the lasting representations of prediction during the reward period. (Wen et al. (2016) Front Neurosci)

⚫ Concurrent LTP-LTD in striatum occurs under DA-R1 activation. (Matsushima et al. (2001) NeuroReport)


social suppression by visual perception of the competitor

Amita and Matsushima (2014) Behav Brain Res

profitability rule and decision-making

⚫ Relative profitability stochastically determines choices.

⚫ Each cue is associated with amount and proximity (inverse of delay), both of which are represented by neurons in the basal ganglia.

⚫ The product (amount × proximity) gives the profitability as the unique and fixed value of each option.

⚫ The choice probability is uniquely determined by the pair of profitabilities.

⚫ A Poisson process executes the choices based on the probability.

[Diagram: cue 1 → amount 1 × proximity 1 → profitability 1; cue 2 → amount 2 × proximity 2 → profitability 2; the pair sets the choice probability. If p1 < p2, then ↑↓↓↑↓↓ …]

profitability rule (alternative)

⚫ Parallel decision processes

⚫ Each cue is separately represented by the amount module and the proximity module.

⚫ Each module makes its decision independently of the other's decision.

⚫ An inter-modular competitive process occurs, allowing one of these to control the actual outcomes.

[Diagram: SS cue 1 and LL cue 2 are compared by an “amount” decision (amount 1 < amount 2) and a “proximity” decision (proximity 2 < proximity 1); module competition determines the outcome, so choices run ↑↓↑↓↑↓ …]

profitability rule (alternative)

[Diagram: the same parallel amount and proximity modules with module competition; here choices run ↑↑↑↑↑↑ …]

social suppression by visual perception of the competitor

localized lesion of MSt/NAc

HOW?
