研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3...

33
研究室のトピック紹介 略歴 下平英寿 1995.3 東大計数 博士(工学) 1995.10~1996.3 Dept. Gene0cs, University of Washington ポスドク 学振PD 1996.7~2002.5 統計数理研究所 助手 1999.7~2001.1 Dept. Sta0s0cs, Stanford University 客員研究員 2002.6~2012.3 東工大情報理工 講師,助教授,准教授 2012.4~現在 阪大基礎工 教授 1

Transcript of 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3...

Page 1: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

研究室のトピック紹介

略歴  下平英寿    1995.3 東大計数 博士(工学)  1995.10~1996.3  Dept.  Gene0cs,  University  of  Washington  ポスドク 学振PD  1996.7~2002.5 統計数理研究所 助手  1999.7~2001.1 Dept.  Sta0s0cs,  Stanford  University  客員研究員  2002.6~2012.3 東工大情報理工 講師,助教授,准教授  2012.4~現在  阪大基礎工 教授

1

Page 2: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

データ解析の統計的信頼度を計算

マルチスケール・ブートストラップ法の理論研究  【指数型分布族の場合】  Shimodaira  (2004)  Annals  of  Sta0s0cs  【滑らかでない場合】  Shimodaira  (2008)  J.  Stat.  Plan.  Inf.    【高次漸近理論】  Shimodaira  (2013)  arXiv:1312.6348

Function seplot provides a graphical interface for examiningstandard errors, while print gives more detailed informationabout p-values in text-based format. See online instruction onour website for the usage of these facilities.In the multiscale bootstrap resampling, we intentionally alter the

data size of bootstrap samples to several values. Let N be theoriginal data size, and N0 be that for bootstrap samples. In theexample of Figure 1, N ¼ 916, and N0 ¼ 458, 549, 641, 732,824, 916, 1007, 1099, 1190 and 1282. For each cluster, an observedBP value is obtained for each value of N0, and we look at change inz¼"F"1 (BP) values, whereF"1 (·) is the inverse function ofF (·),the standard normal distribution function. For the cluster labeled 62in Figure 1, the observed BP values are 0.8554, 0.8896, 0.9132,0.9335, 0.9498, 0.9636, 0.9656, 0.9756, 0.9795 and 0.9859 (Fig. 2).Then, a theoretical curve zðN0Þ ¼ v

ffiffiffiffiffiffiffiffiffiffiffiN0=N

p+ c

ffiffiffiffiffiffiffiffiffiffiffiN=N0

pis fitted to

the observed values, and the coefficients v, c are estimated for eachcluster. The AU p-value is computed by AU ¼ F("v + c). For thecluster labeled 62, v ¼ "2.01, c ¼ 0.26, and thus AU ¼ F(2.01 +0.26) ¼ F(2.27) ¼ 0.988, where BP ¼ 0.964 for N0 ¼ N. Anasymptotic theory proves that the AU p-value is less biased thanthe BP value.Currently only the simplest form of the bootstrapping, i.e. the

non-parametric bootstrap resampling, is implemented in pvclust.More elaborate models designed for specific applications, such

Hei

ght

feta

l_lu

ngX

306.

99_n

orm

alX

219.

97_n

orm

alX

222.

97_n

orm

alX

314.

99_n

orm

alX

315.

99_n

orm

alX

222.

97_A

deno

X22

6.97

_Ade

noX

306.

99_A

deno

X30

6.99

_nod

eX

313.

99P

T_A

deno

X31

3.99

MT

_Ade

noX

165.

96_A

deno

X17

8.96

_Ade

noX

185.

96_A

deno

X11

.00_

Ade

noX

181.

96_A

deno

X15

6.96

_Ade

noX

198.

96_A

deno

X18

0.96

_Ade

noX

187.

96_A

deno

X68

.96_

Ade

noX

137.

96_A

deno

X12

.00_

Ade

noX

199.

97_A

deno

_cX

199.

97_A

deno

_p X20

4.97

_Ade

noX

257.

97_A

deno

X13

2.95

_Ade

noX

319.

00P

T_A

deno

X32

0.00

_Ade

no_p

X32

0.00

_Ade

no_c

X20

7.97

_SC

LCX

314.

99_S

CLC

X23

0.97

_SC

LCX

315.

99_S

CLC

X31

5.99

_nod

eX

161.

96_A

deno

X29

9.99

_Ade

noX

75.9

5_co

mbi

ned

X69

.96_

Ade

noX

157.

96_S

CC

X21

9.97

_SC

CX

166.

96_S

CC

X3_

SC

CX

58.9

5_S

CC

X23

2.97

_SC

CX

232.

97_n

ode

X22

0.97

_SC

CX

220.

97_n

ode

X42

.95_

SC

CX

239.

97_S

CC

X24

5.97

_SC

CX

245.

97_n

ode

X24

6.97

_SC

C_p

X24

6.97

_SC

C_c

X23

7.97

_Ade

noX

147.

96_A

deno

X6.

00_L

CLC

X25

6.97

_LC

LCX

248.

97_L

CLC

X19

1.96

_Ade

noX

139.

97_L

CLC

X59

.96_

SC

CX

234.

97_A

deno

X18

4.96

_Ade

noX

184.

96_n

ode

X21

8.97

_Ade

noX

319.

00M

T1_

Ade

noX

319.

00M

T2_

Ade

noX

265.

98_A

deno

X80

.96_

Ade

noX

223.

97_A

deno

0.0

0.2

0.4

0.6

0.8

1.0

Cluster dendrogram with AU/BP values (%)

Cluster method: averageDistance: correlation

100

99 100100 10079 100100

100 100 100100100

9910010061100 96 100

10095 84 9696 75 7583 8797100 99 9296 91 9899

63 93 10087 100 5967 9975 916166 80 9571 7188 10076 9362 7168 739984

7269

797083

7774

85

au

100

100 100100 10066 100100

100 100 100100100

9910010042100 90 100

10092 49 9293 47 3060 5989100 96 8697 79 9493

53 79 10058 100 4731 9721 772745 17 9539 3756 9962 6822 4421 599658

163

16427

3029

26

bp

1

2 34 56 78

9 10 111213

1415161718 19 20

2122 23 2425 26 2728 293031 32 3334 35 3637

38 39 4041 42 4344 4546 474849 50 5152 5354 5556 5758 5960 616263

6465

666768

6970

71

edge #

Fig. 1. Hierarchical clustering of 73 lung tumors. The data are expression pattern of 916 genes of Garber et al. (2001). Values at branches are AU p-values (left),BP values (right), and cluster labels (bottom).ClusterswithAU% 95 are indicated by the rectangles. The fourth rectangle from the right is a cluster labeled 62with

AU ¼ 0.99 and BP ¼ 0.96.

Fig. 2. Diagnostic plot of the multiscale bootstrap for the cluster labeled 62.The observed z-values are plotted for

ffiffiffiffiffiffiffiffiffiffiffiN0=N

p, and the theoretical curve is

obtained by the weighted least squares fitting. This plot is obtained by

command: msplot(result, edges¼62). When the curve fitting is

poor, a breakdown of the asymptotic theory may be suspected.

Pvclust

1541

分子進化系統樹や遺伝子発現解析へ応用に関する統計手法  【SH-­‐test】  Shimodaira  and  Hasegawa  (1999)  Molecular  Biology  and  Evolu0on 引用数=2916  【CONSEL】  Shimodaira  and  Hasegawa  (2001)  Bioinforma0cs 引用数=1156 【AU  test】  Shimodaira  (2002)  Systema0c  Biology 引用数=1175 【pvclust】  Suzuki  and  Shimodaira  (2006)  Bioinforma0cs 引用数=450

ネットワーク推定への応用  【GGM】Kamimura,Shimodaira  (2004)  GIW  【LiNGAM】Komatsu,  Shimizu,  Shimodaira  (2010)  ICANN

4 Hidetoshi Shimodaira and Masami Hasegawa

0.1

623239

elephant shrew

golden mole 82 89100

aardvark

595248

tenrec

78 92100

dugong

elephant 77 86100

hyrax

93100100

98100100

armadillo

524836

wallaroo

opossum100100100

echidna

platypus

71 92100

human

baboon100100100

capuchin

slow loris

74 85100

guinea pig

mouse686095

red squirrel

69 59100

rabbit

755693

harbor seal

dog

cat

100100100

horse

white rhinoceros100100100

64 61100

hippopotamus

blue whale 99100100

cow

100100100

pig

95 99100

69 72100

Ryukyu flying fox

77 92100

greater moonrat

Japanese mole 74 79100

72 81100

6178100

100100100

100100100 100

100100

G1

G2

G3

G4

G5

Fig. 1. The ML tree for 32 mammalian species estimated from a part of the mito-chondrial protein sequences of Nikaido et al. (2003). Biological discussion is givenin Section 5. This ML topology is represented as ((G1, G2), (G3, G4), G5) usingthe five groups of taxa defined above. The MAP topology of Bayesian inference(Section 2.2) is represented as ((G1,(G2,G3)),G4,G5), where the subtrees are thesame as above except for (((elephant shrew, golden mole), aardvark), tenrec) in G3being changed to ((tenrec, aardvark), (elephant shrew, golden mole)). Our datasetconsists of the sequences of n = 3392 amino acids for s = 32 species. The mtREVmodel (Adachi and Hasegawa 1996) was used for amino acid substitutions, and thesite-heterogeneity was modeled by the discrete-gamma distribution (Yang 1996).A limited list of 2502 candidate topologies was obtained from heated MCMCMCsimulations as explained in Fig. 2 and Fig. 3, and their log-likelihood values werecalculated by PAML to find the ML topology. Three numbers near the branches arethe approximately unbiased p-value (AU) (top) and the bootstrap probability (BP)(middle) calculated by CONSEL (Section 3), and the posterior probability (PP)(bottom) calculated by MrBayes (Section 2.2).

2

Page 3: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

Suzuki  and  Shimodaira  (2006)

Function seplot provides a graphical interface for examiningstandard errors, while print gives more detailed informationabout p-values in text-based format. See online instruction onour website for the usage of these facilities.In the multiscale bootstrap resampling, we intentionally alter the

data size of bootstrap samples to several values. Let N be theoriginal data size, and N0 be that for bootstrap samples. In theexample of Figure 1, N ¼ 916, and N0 ¼ 458, 549, 641, 732,824, 916, 1007, 1099, 1190 and 1282. For each cluster, an observedBP value is obtained for each value of N0, and we look at change inz¼"F"1 (BP) values, whereF"1 (·) is the inverse function ofF (·),the standard normal distribution function. For the cluster labeled 62in Figure 1, the observed BP values are 0.8554, 0.8896, 0.9132,0.9335, 0.9498, 0.9636, 0.9656, 0.9756, 0.9795 and 0.9859 (Fig. 2).Then, a theoretical curve zðN0Þ ¼ v

ffiffiffiffiffiffiffiffiffiffiffiN0=N

p+ c

ffiffiffiffiffiffiffiffiffiffiffiN=N0

pis fitted to

the observed values, and the coefficients v, c are estimated for eachcluster. The AU p-value is computed by AU ¼ F("v + c). For thecluster labeled 62, v ¼ "2.01, c ¼ 0.26, and thus AU ¼ F(2.01 +0.26) ¼ F(2.27) ¼ 0.988, where BP ¼ 0.964 for N0 ¼ N. Anasymptotic theory proves that the AU p-value is less biased thanthe BP value.Currently only the simplest form of the bootstrapping, i.e. the

non-parametric bootstrap resampling, is implemented in pvclust.More elaborate models designed for specific applications, such

Hei

ght

feta

l_lu

ngX

306.

99_n

orm

alX

219.

97_n

orm

alX

222.

97_n

orm

alX

314.

99_n

orm

alX

315.

99_n

orm

alX

222.

97_A

deno

X22

6.97

_Ade

noX

306.

99_A

deno

X30

6.99

_nod

eX

313.

99P

T_A

deno

X31

3.99

MT

_Ade

noX

165.

96_A

deno

X17

8.96

_Ade

noX

185.

96_A

deno

X11

.00_

Ade

noX

181.

96_A

deno

X15

6.96

_Ade

noX

198.

96_A

deno

X18

0.96

_Ade

noX

187.

96_A

deno

X68

.96_

Ade

noX

137.

96_A

deno

X12

.00_

Ade

noX

199.

97_A

deno

_cX

199.

97_A

deno

_p X20

4.97

_Ade

noX

257.

97_A

deno

X13

2.95

_Ade

noX

319.

00P

T_A

deno

X32

0.00

_Ade

no_p

X32

0.00

_Ade

no_c

X20

7.97

_SC

LCX

314.

99_S

CLC

X23

0.97

_SC

LCX

315.

99_S

CLC

X31

5.99

_nod

eX

161.

96_A

deno

X29

9.99

_Ade

noX

75.9

5_co

mbi

ned

X69

.96_

Ade

noX

157.

96_S

CC

X21

9.97

_SC

CX

166.

96_S

CC

X3_

SC

CX

58.9

5_S

CC

X23

2.97

_SC

CX

232.

97_n

ode

X22

0.97

_SC

CX

220.

97_n

ode

X42

.95_

SC

CX

239.

97_S

CC

X24

5.97

_SC

CX

245.

97_n

ode

X24

6.97

_SC

C_p

X24

6.97

_SC

C_c

X23

7.97

_Ade

noX

147.

96_A

deno

X6.

00_L

CLC

X25

6.97

_LC

LCX

248.

97_L

CLC

X19

1.96

_Ade

noX

139.

97_L

CLC

X59

.96_

SC

CX

234.

97_A

deno

X18

4.96

_Ade

noX

184.

96_n

ode

X21

8.97

_Ade

noX

319.

00M

T1_

Ade

noX

319.

00M

T2_

Ade

noX

265.

98_A

deno

X80

.96_

Ade

noX

223.

97_A

deno

0.0

0.2

0.4

0.6

0.8

1.0

Cluster dendrogram with AU/BP values (%)

Cluster method: averageDistance: correlation

100

99 100100 10079 100100

100 100 100100100

9910010061100 96 100

10095 84 9696 75 7583 8797100 99 9296 91 9899

63 93 10087 100 5967 9975 916166 80 9571 7188 10076 9362 7168 739984

7269

797083

7774

85

au

100

100 100100 10066 100100

100 100 100100100

9910010042100 90 100

10092 49 9293 47 3060 5989100 96 8697 79 9493

53 79 10058 100 4731 9721 772745 17 9539 3756 9962 6822 4421 599658

163

16427

3029

26

bp

1

2 34 56 78

9 10 111213

1415161718 19 20

2122 23 2425 26 2728 293031 32 3334 35 3637

38 39 4041 42 4344 4546 474849 50 5152 5354 5556 5758 5960 616263

6465

666768

6970

71

edge #

Fig. 1. Hierarchical clustering of 73 lung tumors. The data are expression pattern of 916 genes of Garber et al. (2001). Values at branches are AU p-values (left),BP values (right), and cluster labels (bottom).ClusterswithAU% 95 are indicated by the rectangles. The fourth rectangle from the right is a cluster labeled 62with

AU ¼ 0.99 and BP ¼ 0.96.

Fig. 2. Diagnostic plot of the multiscale bootstrap for the cluster labeled 62.The observed z-values are plotted for

ffiffiffiffiffiffiffiffiffiffiffiN0=N

p, and the theoretical curve is

obtained by the weighted least squares fitting. This plot is obtained by

command: msplot(result, edges¼62). When the curve fitting is

poor, a breakdown of the asymptotic theory may be suspected.

Pvclust

1541

3

Page 4: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

Extended Data Figure 6 | Characterization of ES-like cells converted fromFgf4-induced stem cells and comparison of STAP cells with early embryos.a, Immunohistochemistry of ES-like cells for trophoblast and pluripotencymarkers. ES-like cells converted from Fgf4-induced stem cells no longerexpressed the trophoblast marker (integrin alpha 7), but they did express thepluripotency markers (Oct4, Nanog and SSEA-1). Scale bar, 100mm.b, Pluripotency of ES-like cells converted from Fgf4-induced stem cells asshown by teratoma formation. Those cells successfully formed teratomascontaining tissues from all three germ layers: neuroepithelium (left, arrowindicates), muscle tissue (middle, arrow indicates) and bronchial-likeepithelium (right). Scale bar, 100mm. c, MEK inhibitor treatment assay for

Oct4-gfp Fgf4-induced stem cells in trophoblast stem-cell medium containingFgf4. No substantial formation of Oct4-GFP1 colonies was observed fromdissociated Fgf4-induced stem cells in MEK-inhibitor-containing medium.Scale bar, 100mm. d, Cluster tree diagram from hierarchical clustering of globalexpression profiles. Red, AU P values. As this analysis included morula andblastocyst embryos from which only small amounts of RNA could be obtained,we used pre-amplification with the SMARTer Ultra Low RNA kit for IlluminaSequencing (Clontech Laboratories). e, f, Volcano plot of the expressionprofile of STAP cells compared to the morula (e) and blastocyst (f). Genesshowing greater than 10-fold change and P value 1.0 3 1026 are highlighted inred and are considered up- (or down-) regulated in the STAP cells.

LETTER RESEARCH

Macmillan Publishers Limited. All rights reserved©2014

Extended Data Figure 6 | Characterization of ES-like cells converted fromFgf4-induced stem cells and comparison of STAP cells with early embryos.a, Immunohistochemistry of ES-like cells for trophoblast and pluripotencymarkers. ES-like cells converted from Fgf4-induced stem cells no longerexpressed the trophoblast marker (integrin alpha 7), but they did express thepluripotency markers (Oct4, Nanog and SSEA-1). Scale bar, 100mm.b, Pluripotency of ES-like cells converted from Fgf4-induced stem cells asshown by teratoma formation. Those cells successfully formed teratomascontaining tissues from all three germ layers: neuroepithelium (left, arrowindicates), muscle tissue (middle, arrow indicates) and bronchial-likeepithelium (right). Scale bar, 100mm. c, MEK inhibitor treatment assay for

Oct4-gfp Fgf4-induced stem cells in trophoblast stem-cell medium containingFgf4. No substantial formation of Oct4-GFP1 colonies was observed fromdissociated Fgf4-induced stem cells in MEK-inhibitor-containing medium.Scale bar, 100mm. d, Cluster tree diagram from hierarchical clustering of globalexpression profiles. Red, AU P values. As this analysis included morula andblastocyst embryos from which only small amounts of RNA could be obtained,we used pre-amplification with the SMARTer Ultra Low RNA kit for IlluminaSequencing (Clontech Laboratories). e, f, Volcano plot of the expressionprofile of STAP cells compared to the morula (e) and blastocyst (f). Genesshowing greater than 10-fold change and P value 1.0 3 1026 are highlighted inred and are considered up- (or down-) regulated in the STAP cells.

LETTER RESEARCH

Macmillan Publishers Limited. All rights reserved©2014

Extended Data Figure 6 | Characterization of ES-like cells converted fromFgf4-induced stem cells and comparison of STAP cells with early embryos.a, Immunohistochemistry of ES-like cells for trophoblast and pluripotencymarkers. ES-like cells converted from Fgf4-induced stem cells no longerexpressed the trophoblast marker (integrin alpha 7), but they did express thepluripotency markers (Oct4, Nanog and SSEA-1). Scale bar, 100mm.b, Pluripotency of ES-like cells converted from Fgf4-induced stem cells asshown by teratoma formation. Those cells successfully formed teratomascontaining tissues from all three germ layers: neuroepithelium (left, arrowindicates), muscle tissue (middle, arrow indicates) and bronchial-likeepithelium (right). Scale bar, 100mm. c, MEK inhibitor treatment assay for

Oct4-gfp Fgf4-induced stem cells in trophoblast stem-cell medium containingFgf4. No substantial formation of Oct4-GFP1 colonies was observed fromdissociated Fgf4-induced stem cells in MEK-inhibitor-containing medium.Scale bar, 100mm. d, Cluster tree diagram from hierarchical clustering of globalexpression profiles. Red, AU P values. As this analysis included morula andblastocyst embryos from which only small amounts of RNA could be obtained,we used pre-amplification with the SMARTer Ultra Low RNA kit for IlluminaSequencing (Clontech Laboratories). e, f, Volcano plot of the expressionprofile of STAP cells compared to the morula (e) and blastocyst (f). Genesshowing greater than 10-fold change and P value 1.0 3 1026 are highlighted inred and are considered up- (or down-) regulated in the STAP cells.

LETTER RESEARCH

Macmillan Publishers Limited. All rights reserved©2014

passages) and in their vulnerability to dissociation1. However, whencultured in the presence of ACTH and LIF for 7 days, STAP cells, at amoderate frequency, further convert into pluripotent ‘stem’ cells thatrobustly proliferate (STAP stem cells).

Here we have investigated the unique nature of STAP cells, focusingon their differentiation potential into the two major categories (embry-onic and placental lineages) of cells in the blastocyst5–8. We becameparticularly interested in this question after a blastocyst injection assayrevealed an unexpected finding. In general, progeny of injected ES cellsare found in the embryonic portion of the chimaera, but rarely in theplacental portion5,7 (Fig. 1a; shown with Rosa26-GFP). Surprisingly,injected STAP cells contributed not only to the embryo but also to theplacenta and fetal membranes (Fig. 1b and Extended Data Fig. 1a–c) in60% of the chimaeric embryos (Fig. 1c).

In quantitative polymerase chain reaction (qPCR) analysis, STAP cells(sorted for strong Oct4-GFP signals) expressed not only pluripotencymarker genes but also trophoblast marker genes such as Cdx2 (Fig. 1dand Supplementary Table 1 for primers), unlike ES cells. Therefore,the blastocyst injection result is not easily explained by the idea thatSTAP cells are composed of a simple mixture of pluripotent cells(Oct41Cdx22) and trophoblast-stem-like cells (Oct42Cdx21).

In contrast to STAP cells, STAP stem cells did not show the ability tocontribute to placental tissues (Fig. 1e, lanes 2–4), indicating that the

derivation of STAP stem cells from STAP cells involves the loss ofcompetence to differentiate into placental lineages. Consistent withthis idea, STAP stem cells show little expression of trophoblast markergenes (Fig. 1f).

We next examined whether an alteration in culture conditions couldinduce in vitro conversion of STAP cells into cells similar to tropho-blast stem cells8,9, which can be derived from blastocysts during pro-longed adhesion culture in the presence of Fgf4. When we culturedSTAP cell clusters under similar conditions (Fig. 2a; one cluster per wellin a 96-well plate), flat cell colonies grew out by days 7–10 (Fig. 2b, left;typically in ,30% of wells). The Fgf4-induced cells strongly expressedthe trophoblast marker proteins9–12 integrin a7 (Itga7) and eomesoder-min (Eomes) (Fig. 2c, d) and marker genes (for example, Cdx2; Fig. 2e).

These Fgf4-induced cells with trophoblast marker expression couldbe expanded efficiently in the presence of Fgf4 by passaging for morethan 30 passages with trypsin digestion every third day. Hereafter,these proliferative cells induced from STAP cells by Fgf4 treatmentare referred to as Fgf4-induced stem cells. This type of derivation intotrophoblast-stem-like cells is not common with ES cells (unless genet-ically manipulated)13 or STAP stem cells.

In the blastocyst injection assay, unlike STAP stem cells, the pla-cental contribution of Fgf4-induced stem cells (cag-GFP-labelled) wasobserved with 53% of embryos (Fig. 2f, g; n 5 60). In the chimaeric

STAP clusters

Passage

Outgrowth of trophoblast-like cells

Fgf4 medium7–10 days

a

b c

e

Integrin α7 Eomes

f

DAPI

cag-GFP

Placenta

d

h

Per

cent

age

of G

FP+

cells

Placental contribution

cag-GFP

Bright-field

Cross-section g Chimaeric fetus

1 32 4

FI-S

C 1FI

-SC 2

FI-S

C 3GFP

-ES 1

GFP-E

S 2GFP

-ES 3

5 6

0

20

10

18161412

2468

Day 1

Day 7

Passage 1(on feeder)

Passage 1(enlarged

view)

i j

Rel

ativ

e ex

pres

sion

leve

ls(T

S=1

.0)

cag-GFP

Rel

ativ

e ex

pres

sion

leve

ls(E

S=1

.0)

Fgf4 medium

Oct4-GFPBright-field

***

NS

NS

k

ES TSFG

F-ind

uced

STAP

CD45+

Oct4 Nanog Rex1

JAKi– JAKi+ JAKi– JAKi+ES FI-SC

00.20.4

0.60.8

1.2

1.0

00.20.40.60.8

1.2

1.0

Cdx2 Eomes Elf5 Itga7

JAKi–JAKi+

FI-SC

TS

0

5

10

15

20

25

30

Oct4

Cdx2

Eomes

Rel

ativ

e ex

pres

sion

leve

ls(E

S=1

.0)

ES STAP-SC

CD45

FI-SCTS

STAP

Hei

ght

0.00

0.10

0.15

0.05

au

100

100100

100

Figure 2 | Fgf4 treatment induces sometrophoblast-lineage character in STAP cells.a, Schematic of Fgf4 treatment to induceFgf4-induced stem cells from STAP cells. b, Fgf4treatment promoted the generation of flat cellclusters that expressed Oct4-GFP at moderatelevels (right). Top and middle: days 1 and 7 ofculture with Fgf4, respectively. Bottom: cultureafter the first passage. Scale bar, 50mm.c, d, Immunostaining of Fgf4-induced cells withthe trophoblast stem cell markers integrin a7(c) and eomesodermin (d). Scale bar, 50mm.e, qPCR analysis of marker expression.f, g, Placental contribution of Fgf4-induced stemcells (FI-SCs) (genetically labelled with constitutiveGFP expression). Scale bars: 5.0 mm (f (left panel)and g); 50mm (f, right panel). In addition toplacental contribution, Fgf4-induced stem cellscontributed to the embryonic portion at amoderate level (g). h, Quantification of placentalcontribution by FACS analysis. Unlike Fgf4-induced cells, ES cells did not contribute toplacental tissues at a detectable level. i, Cluster treediagram from hierarchical clustering of globalexpression profiles. Red, approximately unbiasedP values. j, qPCR analysis of Fgf4-induced cells(cultured under feeder-free conditions) with orwithout JAK inhibitor (JAKi) treatment forpluripotent marker genes. k, qPCR analysis ofFI-SCs with or without JAK inhibitor (JAKi)treatment for trophoblast marker genes. Values areshown as ratio to the expression level in ES cells(j) or trophoblast stem cells (k). ***P , 0.001;NS, not significant; t-test for each gene betweengroups with and without JAK inhibitor treatment.n 5 3. Statistical significance was all the same withthree pluripotency markers. None of thetrophoblast marker genes showed statisticalsignificance. Error bars represent s.d.

LETTER RESEARCH

3 0 J A N U A R Y 2 0 1 4 | V O L 5 0 5 | N A T U R E | 6 7 7

Macmillan Publishers Limited. All rights reserved©2014

passages) and in their vulnerability to dissociation1. However, whencultured in the presence of ACTH and LIF for 7 days, STAP cells, at amoderate frequency, further convert into pluripotent ‘stem’ cells thatrobustly proliferate (STAP stem cells).

Here we have investigated the unique nature of STAP cells, focusingon their differentiation potential into the two major categories (embry-onic and placental lineages) of cells in the blastocyst5–8. We becameparticularly interested in this question after a blastocyst injection assayrevealed an unexpected finding. In general, progeny of injected ES cellsare found in the embryonic portion of the chimaera, but rarely in theplacental portion5,7 (Fig. 1a; shown with Rosa26-GFP). Surprisingly,injected STAP cells contributed not only to the embryo but also to theplacenta and fetal membranes (Fig. 1b and Extended Data Fig. 1a–c) in60% of the chimaeric embryos (Fig. 1c).

In quantitative polymerase chain reaction (qPCR) analysis, STAP cells(sorted for strong Oct4-GFP signals) expressed not only pluripotencymarker genes but also trophoblast marker genes such as Cdx2 (Fig. 1dand Supplementary Table 1 for primers), unlike ES cells. Therefore,the blastocyst injection result is not easily explained by the idea thatSTAP cells are composed of a simple mixture of pluripotent cells(Oct41Cdx22) and trophoblast-stem-like cells (Oct42Cdx21).

In contrast to STAP cells, STAP stem cells did not show the ability tocontribute to placental tissues (Fig. 1e, lanes 2–4), indicating that the

derivation of STAP stem cells from STAP cells involves the loss ofcompetence to differentiate into placental lineages. Consistent withthis idea, STAP stem cells show little expression of trophoblast markergenes (Fig. 1f).

We next examined whether an alteration in culture conditions couldinduce in vitro conversion of STAP cells into cells similar to tropho-blast stem cells8,9, which can be derived from blastocysts during pro-longed adhesion culture in the presence of Fgf4. When we culturedSTAP cell clusters under similar conditions (Fig. 2a; one cluster per wellin a 96-well plate), flat cell colonies grew out by days 7–10 (Fig. 2b, left;typically in ,30% of wells). The Fgf4-induced cells strongly expressedthe trophoblast marker proteins9–12 integrin a7 (Itga7) and eomesoder-min (Eomes) (Fig. 2c, d) and marker genes (for example, Cdx2; Fig. 2e).

These Fgf4-induced cells with trophoblast marker expression couldbe expanded efficiently in the presence of Fgf4 by passaging for morethan 30 passages with trypsin digestion every third day. Hereafter,these proliferative cells induced from STAP cells by Fgf4 treatmentare referred to as Fgf4-induced stem cells. This type of derivation intotrophoblast-stem-like cells is not common with ES cells (unless genet-ically manipulated)13 or STAP stem cells.

In the blastocyst injection assay, unlike STAP stem cells, the pla-cental contribution of Fgf4-induced stem cells (cag-GFP-labelled) wasobserved with 53% of embryos (Fig. 2f, g; n 5 60). In the chimaeric

STAP clusters

Passage

Outgrowth of trophoblast-like cells

Fgf4 medium7–10 days

a

b c

e

Integrin α7 Eomes

f

DAPI

cag-GFP

Placenta

d

h

Per

cent

age

of G

FP+

cells

Placental contribution

cag-GFP

Bright-field

Cross-section g Chimaeric fetus

1 32 4

FI-S

C 1FI

-SC 2

FI-S

C 3GFP

-ES 1

GFP-E

S 2GFP

-ES 3

5 6

0

20

10

18161412

2468

Day 1

Day 7

Passage 1(on feeder)

Passage 1(enlarged

view)

i j

Rel

ativ

e ex

pres

sion

leve

ls(T

S=1

.0)

cag-GFP

Rel

ativ

e ex

pres

sion

leve

ls(E

S=1

.0)

Fgf4 medium

Oct4-GFPBright-field

***

NS

NS

k

ES TSFG

F-ind

uced

STAP

CD45+

Oct4 Nanog Rex1

JAKi– JAKi+ JAKi– JAKi+ES FI-SC

00.20.4

0.60.8

1.2

1.0

00.20.40.60.8

1.2

1.0

Cdx2 Eomes Elf5 Itga7

JAKi–JAKi+

FI-SC

TS

0

5

10

15

20

25

30

Oct4

Cdx2

Eomes

Rel

ativ

e ex

pres

sion

leve

ls(E

S=1

.0)

ES STAP-SC

CD45

FI-SCTS

STAP

Hei

ght

0.00

0.10

0.15

0.05

au

100

100100

100

Figure 2 | Fgf4 treatment induces sometrophoblast-lineage character in STAP cells.a, Schematic of Fgf4 treatment to induceFgf4-induced stem cells from STAP cells. b, Fgf4treatment promoted the generation of flat cellclusters that expressed Oct4-GFP at moderatelevels (right). Top and middle: days 1 and 7 ofculture with Fgf4, respectively. Bottom: cultureafter the first passage. Scale bar, 50mm.c, d, Immunostaining of Fgf4-induced cells withthe trophoblast stem cell markers integrin a7(c) and eomesodermin (d). Scale bar, 50mm.e, qPCR analysis of marker expression.f, g, Placental contribution of Fgf4-induced stemcells (FI-SCs) (genetically labelled with constitutiveGFP expression). Scale bars: 5.0 mm (f (left panel)and g); 50mm (f, right panel). In addition toplacental contribution, Fgf4-induced stem cellscontributed to the embryonic portion at amoderate level (g). h, Quantification of placentalcontribution by FACS analysis. Unlike Fgf4-induced cells, ES cells did not contribute toplacental tissues at a detectable level. i, Cluster treediagram from hierarchical clustering of globalexpression profiles. Red, approximately unbiasedP values. j, qPCR analysis of Fgf4-induced cells(cultured under feeder-free conditions) with orwithout JAK inhibitor (JAKi) treatment forpluripotent marker genes. k, qPCR analysis ofFI-SCs with or without JAK inhibitor (JAKi)treatment for trophoblast marker genes. Values areshown as ratio to the expression level in ES cells(j) or trophoblast stem cells (k). ***P , 0.001;NS, not significant; t-test for each gene betweengroups with and without JAK inhibitor treatment.n 5 3. Statistical significance was all the same withthree pluripotency markers. None of thetrophoblast marker genes showed statisticalsignificance. Error bars represent s.d.

LETTER RESEARCH

3 0 J A N U A R Y 2 0 1 4 | V O L 5 0 5 | N A T U R E | 6 7 7

Macmillan Publishers Limited. All rights reserved©2014

passages) and in their vulnerability to dissociation1. However, whencultured in the presence of ACTH and LIF for 7 days, STAP cells, at amoderate frequency, further convert into pluripotent ‘stem’ cells thatrobustly proliferate (STAP stem cells).

Here we have investigated the unique nature of STAP cells, focusingon their differentiation potential into the two major categories (embry-onic and placental lineages) of cells in the blastocyst5–8. We becameparticularly interested in this question after a blastocyst injection assayrevealed an unexpected finding. In general, progeny of injected ES cellsare found in the embryonic portion of the chimaera, but rarely in theplacental portion5,7 (Fig. 1a; shown with Rosa26-GFP). Surprisingly,injected STAP cells contributed not only to the embryo but also to theplacenta and fetal membranes (Fig. 1b and Extended Data Fig. 1a–c) in60% of the chimaeric embryos (Fig. 1c).

In quantitative polymerase chain reaction (qPCR) analysis, STAP cells(sorted for strong Oct4-GFP signals) expressed not only pluripotencymarker genes but also trophoblast marker genes such as Cdx2 (Fig. 1dand Supplementary Table 1 for primers), unlike ES cells. Therefore,the blastocyst injection result is not easily explained by the idea thatSTAP cells are composed of a simple mixture of pluripotent cells(Oct41Cdx22) and trophoblast-stem-like cells (Oct42Cdx21).

In contrast to STAP cells, STAP stem cells did not show the ability tocontribute to placental tissues (Fig. 1e, lanes 2–4), indicating that the

derivation of STAP stem cells from STAP cells involves the loss ofcompetence to differentiate into placental lineages. Consistent withthis idea, STAP stem cells show little expression of trophoblast markergenes (Fig. 1f).

We next examined whether an alteration in culture conditions couldinduce in vitro conversion of STAP cells into cells similar to tropho-blast stem cells8,9, which can be derived from blastocysts during pro-longed adhesion culture in the presence of Fgf4. When we culturedSTAP cell clusters under similar conditions (Fig. 2a; one cluster per wellin a 96-well plate), flat cell colonies grew out by days 7–10 (Fig. 2b, left;typically in ,30% of wells). The Fgf4-induced cells strongly expressedthe trophoblast marker proteins9–12 integrin a7 (Itga7) and eomesoder-min (Eomes) (Fig. 2c, d) and marker genes (for example, Cdx2; Fig. 2e).

These Fgf4-induced cells with trophoblast marker expression couldbe expanded efficiently in the presence of Fgf4 by passaging for morethan 30 passages with trypsin digestion every third day. Hereafter,these proliferative cells induced from STAP cells by Fgf4 treatmentare referred to as Fgf4-induced stem cells. This type of derivation intotrophoblast-stem-like cells is not common with ES cells (unless genet-ically manipulated)13 or STAP stem cells.

In the blastocyst injection assay, unlike STAP stem cells, the pla-cental contribution of Fgf4-induced stem cells (cag-GFP-labelled) wasobserved with 53% of embryos (Fig. 2f, g; n 5 60). In the chimaeric

STAP clusters

Passage

Outgrowth of trophoblast-like cells

Fgf4 medium7–10 days

a

b c

e

Integrin α7 Eomes

f

DAPI

cag-GFP

Placenta

d

h

Per

cent

age

of G

FP+

cells

Placental contribution

cag-GFP

Bright-field

Cross-section g Chimaeric fetus

1 32 4

FI-S

C 1FI

-SC 2

FI-S

C 3GFP

-ES 1

GFP-E

S 2GFP

-ES 3

5 6

0

20

10

18161412

2468

Day 1

Day 7

Passage 1(on feeder)

Passage 1(enlarged

view)

i j

Rel

ativ

e ex

pres

sion

leve

ls(T

S=1

.0)

cag-GFP

Rel

ativ

e ex

pres

sion

leve

ls(E

S=1

.0)

Fgf4 medium

Oct4-GFPBright-field

***

NS

NS

k

ES TSFG

F-ind

uced

STAP

CD45+

Oct4 Nanog Rex1

JAKi– JAKi+ JAKi– JAKi+ES FI-SC

00.20.4

0.60.8

1.2

1.0

00.20.40.60.8

1.2

1.0

Cdx2 Eomes Elf5 Itga7

JAKi–JAKi+

FI-SC

TS

0

5

10

15

20

25

30

Oct4

Cdx2

Eomes

Rel

ativ

e ex

pres

sion

leve

ls(E

S=1

.0)

ES STAP-SC

CD45

FI-SCTS

STAP

Hei

ght

0.00

0.10

0.15

0.05

au

100

100100

100

Figure 2 | Fgf4 treatment induces sometrophoblast-lineage character in STAP cells.a, Schematic of Fgf4 treatment to induceFgf4-induced stem cells from STAP cells. b, Fgf4treatment promoted the generation of flat cellclusters that expressed Oct4-GFP at moderatelevels (right). Top and middle: days 1 and 7 ofculture with Fgf4, respectively. Bottom: cultureafter the first passage. Scale bar, 50mm.c, d, Immunostaining of Fgf4-induced cells withthe trophoblast stem cell markers integrin a7(c) and eomesodermin (d). Scale bar, 50mm.e, qPCR analysis of marker expression.f, g, Placental contribution of Fgf4-induced stemcells (FI-SCs) (genetically labelled with constitutiveGFP expression). Scale bars: 5.0 mm (f (left panel)and g); 50mm (f, right panel). In addition toplacental contribution, Fgf4-induced stem cellscontributed to the embryonic portion at amoderate level (g). h, Quantification of placentalcontribution by FACS analysis. Unlike Fgf4-induced cells, ES cells did not contribute toplacental tissues at a detectable level. i, Cluster treediagram from hierarchical clustering of globalexpression profiles. Red, approximately unbiasedP values. j, qPCR analysis of Fgf4-induced cells(cultured under feeder-free conditions) with orwithout JAK inhibitor (JAKi) treatment forpluripotent marker genes. k, qPCR analysis ofFI-SCs with or without JAK inhibitor (JAKi)treatment for trophoblast marker genes. Values areshown as ratio to the expression level in ES cells(j) or trophoblast stem cells (k). ***P , 0.001;NS, not significant; t-test for each gene betweengroups with and without JAK inhibitor treatment.n 5 3. Statistical significance was all the same withthree pluripotency markers. None of thetrophoblast marker genes showed statisticalsignificance. Error bars represent s.d.

LETTER RESEARCH

3 0 J A N U A R Y 2 0 1 4 | V O L 5 0 5 | N A T U R E | 6 7 7

Macmillan Publishers Limited. All rights reserved©2014

LETTERdoi:10.1038/nature12969

Bidirectional developmental potential inreprogrammed cells with acquired pluripotencyHaruko Obokata1,2,3, Yoshiki Sasai4, Hitoshi Niwa5, Mitsutaka Kadota6, Munazah Andrabi6, Nozomu Takata4, Mikiko Tokoro2,Yukari Terashita1,2, Shigenobu Yonemura7, Charles A. Vacanti3 & Teruhiko Wakayama2,8

We recently discovered an unexpected phenomenon of somatic cellreprogramming into pluripotent cells by exposure to sublethal stim-uli, which we call stimulus-triggered acquisition of pluripotency(STAP)1. This reprogramming does not require nuclear transfer2,3

or genetic manipulation4. Here we report that reprogrammed STAPcells, unlike embryonic stem (ES) cells, can contribute to both embry-onic and placental tissues, as seen in a blastocyst injection assay.Mouse STAP cells lose the ability to contribute to the placenta aswell as trophoblast marker expression on converting into ES-likestem cells by treatment with adrenocorticotropic hormone (ACTH)and leukaemia inhibitory factor (LIF). In contrast, when culturedwith Fgf4, STAP cells give rise to proliferative stem cells with enhancedtrophoblastic characteristics. Notably, unlike conventional tropho-blast stem cells, the Fgf4-induced stem cells from STAP cells con-tribute to both embryonic and placental tissues in vivo and transforminto ES-like cells when cultured with LIF-containing medium. Taken

together, the developmental potential of STAP cells, shown by chi-maera formation and in vitro cell conversion, indicates that theyrepresent a unique state of pluripotency.

We recently discovered an intriguing phenomenon of cellular fateconversion: somatic cells regain pluripotency after experiencing sub-lethal stimuli such as a low-pH exposure1. When splenic CD451 lym-phocytes are exposed to pH 5.7 for 30 min and subsequently culturedin the presence of LIF, a substantial portion of surviving cells start toexpress the pluripotent cell marker Oct4 (also called Pou5f1) at day 2.By day 7, pluripotent cell clusters form with a bona fide pluripotencymarker profile and acquire the competence for three-germ-layer differ-entiation as shown by teratoma formation. These STAP cells can alsoefficiently contribute to chimaeric mice and undergo germline trans-mission using a blastocyst injection assay1. Although these charac-teristics resemble those of ES cells, STAP cells seem to differ from EScells in their limited capacity for self-renewal (typically, for only a few

1Laboratory for Cellular Reprogramming, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan. 2Laboratory for Genomic Reprogramming, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan. 3Laboratory for Tissue Engineering and Regenerative Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA. 4Laboratory for Organogenesisand Neurogenesis, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan. 5Laboratory for Pluripotent Stem Cell Studies, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan.6Genome Resource and Analysis Unit, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan. 7Electron Microscopy Laboratory, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan.8Faculty of Life and Environmental Sciences, University of Yamanashi, Yamanashi 400-8510, Japan.

STAP chimera cag-GFPSTAP chimaera

Bright-fieldES chimaera cag-GFP

Bright-field

Long exposure

Long exposure

c

Contribution patternEmbryo-specificEmbryo+placenta+yolk sac

Per

cent

age

of e

mbr

yos

with

eac

h co

ntrib

utio

n pa

tter

n 100

0

908070605040302010

STAPES

d

Rel

ativ

e ex

pres

sion

leve

ls(E

S=1

.0)

Rel

ativ

e ex

pres

sion

leve

ls(T

S=1

.0)

0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

1.6

ES STAP CD45 TS

Oct4 Nanog Rex1

00.20.40.60.81.01.21.41.6

TS STAP CD45 ES

Cdx2 Eomes Elf5

***

eP

erce

ntag

e of

GFP

+ ce

lls

Placental contribution

1 32 4

0

20

10

18161412

2468

0

0.2

0.4

0.6

0.8

1.0

1.2

1.4

Cdx2

Eomes

Elf5

Rel

ativ

e ex

pres

sion

leve

ls(T

S=1

.0)

f

a

b n = 50 n = 10

TS

STAP-S

C 1

STAP-S

C 2

STAP-S

C 3 TS

STAP-S

C ES

Figure 1 | STAP cells contribute to both embryonic and placental tissuesin vivo. a, b, E12.5 embryos from blastocysts injected with ES cells (a) andSTAP cells (b). Both cells are genetically labelled with GFP driven by aconstitutive promoter. Progeny of STAP cells also contributed to placentaltissues and fetal membranes (b), whereas ES-cell-derived cells were not foundin these tissues (a). Scale bar, 5.0 mm. c, Percentages of fetuses in which injectedcells contributed only to the embryonic portion (red) or also to placentaland yolk sac tissues (blue). ***P , 0.001 with Fisher’s exact test. d, qPCR

analysis of FACS-sorted Oct4-GFP-strong STAP cells for pluripotent markergenes (left) and trophoblast marker genes (right). Values are shown as ratio tothe expression level in ES cells. Error bars represent s.d. e, Contribution toplacental tissues. Unlike parental STAP cells and trophoblast stem (TS) cells,STAP stem cells (STAP-SCs) did not retain the ability for placentalcontributions. Three independent lines were tested and all showed substantialcontributions to the embryonic portions. f, qPCR analysis of trophoblastmarker gene expression in STAP stem cells. Error bars represent s.d.

6 7 6 | N A T U R E | V O L 5 0 5 | 3 0 J A N U A R Y 2 0 1 4

Macmillan Publishers Limited. All rights reserved©2014

4

Page 5: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

統計的モデル選択の情報量規準

Z1

Z2 Y

観測部分 Y 潜在部分 Z

ここを評価したい!

興味がない!

【不完全データの情報量規準  PDIO】    H.  Shimodaira  (1994)    Lecture  Notes  in  Sta0s0cs  【不完全データの一部に興味がある場合の情報量規準】 原照雅,下平英寿  (2011) 統計関連学会連合大会    【covariate  shi`,共変量シフト】  H.  Shimodaira  (2000)    Journal  of  sta0s0cal  planning  and  inference 引用数=357

230 H. Shimodaira / Journal of Statistical Planning and Inference 90 (2000) 227–244

Fig. 1. Fitting of polynomial regression with degree d=1. (a) Samples (xt ; yt) of size n=100 are generatedfrom q(y|x)q0(x) and plotted as circles, where the underlying true curve is indicated by the thin dotted line.The solid line is obtained by OLS, and the dotted line is WLS with weight q1(x)=q0(x). (b) Samples ofn = 100 are generated from q(y|x)q1(x), and the regression line is obtained by OLS.

On the other hand, MWLE !w is obtained by weighted least squares (WLS) withweights w(xt) for the normal regression. We again consider the model with d = 1,and the regression line !tted by WLS with w(x) = q1(x)=q0(x) is drawn in dotted linein Fig. 1a. Here, the density q1(x) for imaginary “future” observations or that for thewhole population in sample surveys is speci!ed in advance by

x ∼ N("1; #21); (2.4)

where "1 = 0:0; #21 = 0:32. The ratio of q1(x) to q0(x) is

q1(x)q0(x)

=exp(−(x − "1)2=2#21)=#1exp(−(x − "0)2=2#20)=#0

˙ exp!

− (x − "")2

2 "#2

"

; (2.5)

where "#2 = (#−21 − #−20 )−1 = 0:382, and "" = "#2(#−21 "1 − #−20 "0) =−0:28.

The obtained lines in Fig. 1a are very di#erent for OLS and WLS. The questionis: which is better than the other? It is known that OLS is the best linear unbiasedestimate and makes small mean squared error of prediction in terms of q(y|x)q0(x)which generated the data. On the other hand, WLS with weight (2.5) makes smallprediction error in terms of q(y|x)q1(x) which will generate future observations, andthus WLS is better than OLS here. To con!rm this, a dataset of size n = 100 isgenerated from q(y|x)q1(x) speci!ed by (2.2) and (2.4). The regression line of d= 1!tted by OLS is shown in Fig. 1b, which is considered to have small prediction errorfor the “future” data. The regression line of WLS !tted to the past data in Fig. 1a isquite similar to the line of OLS !tted to the future data in Fig. 1b. In practice, onlythe past data is available. The WLS gave almost the equivalent result to the futureOLS by using only the past data.

直接観測できない変数に興味がある場合

H. Shimodaira / Journal of Statistical Planning and Inference 90 (2000) 227–244 233

For su!ciently large n, loss1(!∗w) is the dominant term on the right-hand side of(4.1), and the optimal weight is w(x) = q1(x)=q0(x) as seen in Section 3. If n is notlarge enough compared with the extent of the misspeci"cation, the O(n−1) terms relatedto the "rst and second moments of !w−!∗w cannot be ignored in (4.1), and the optimalweight changes. In an extreme case where the model is correctly speci"ed, we onlyhave to look at the O(n−1) terms as shown in the following lemma.

Lemma 4. Assume there exists ! ∗ ∈" such that q(y|x) =p(y|x; ! ∗). Then; !∗w = ! ∗

and q(y|x) = p(y|x; !∗w) for all proper w(x). The expected loss E(n)0 (loss1(!w)) is

minimized when w(x) ≡ 1 for su!ciently large n.

5. Information criterion

The performance of MWLE for a speci"ed w(x) is given by (4.1). However, wecannot calculate the value of the expected loss from it in practice, because q(y|x) isunknown. We provide a variant of the information criterion as an estimate of (4.1).

Theorem 1. Let the information criterion for MWLE be

ICw:=− 2L1(!w) + 2 tr(JwH−1w ); (5.1)

where

L1(!) =n!

t=1

q1(x)q0(x)

logp(yt |xt ; !);

Jw =−E0

"

q1(x)q0(x)

@ logp(y|x; !)@!

#

#

#

#

!∗w

@lw(x; y|!)@!′

#

#

#

#

!∗w

$

:

The matrices Jw and Hw may be replaced by their consistent estimates

J w =−1n

n!

t=1

q1(xt)q0(xt)

@ logp(yt |xt ; !)@ !

#

#

#

#

!w

@lw(xt ; yt |!)@!′

#

#

#

#

!w

;

H w =1n

n!

t=1

@2lw(xt ; yt |!)@!@!′

#

#

#

#

!w

:

Then; ICw=2n is an estimate of the expected loss unbiased up to O(n−1) term:

E(n)0 (ICw=2n) = E(n)0 (loss1(!w)) + o(n

−1): (5.2)

Fortunately, expression (5.1) turns out to be rather simpler than that of (4.1), andwe do not have to worry about the calculation of the third derivatives appeared in(4.2). The explicit form of (5.1) for the normal regression is given in (6.1) and (6.2).Given the model p(y|x; !) and the data (x(n); y(n)), we choose a weight function w(x)

which attains the minimum of ICw over a certain class of weights. This is selectionof the weight rather than model selection. Searching the optimal weight over all the

H. Shimodaira / Journal of Statistical Planning and Inference 90 (2000) 227–244 233

For su!ciently large n, loss1(!∗w) is the dominant term on the right-hand side of(4.1), and the optimal weight is w(x) = q1(x)=q0(x) as seen in Section 3. If n is notlarge enough compared with the extent of the misspeci"cation, the O(n−1) terms relatedto the "rst and second moments of !w−!∗w cannot be ignored in (4.1), and the optimalweight changes. In an extreme case where the model is correctly speci"ed, we onlyhave to look at the O(n−1) terms as shown in the following lemma.

Lemma 4. Assume there exists ! ∗ ∈" such that q(y|x) =p(y|x; ! ∗). Then; !∗w = ! ∗

and q(y|x) = p(y|x; !∗w) for all proper w(x). The expected loss E(n)0 (loss1(!w)) is

minimized when w(x) ≡ 1 for su!ciently large n.

5. Information criterion

The performance of MWLE for a speci"ed w(x) is given by (4.1). However, wecannot calculate the value of the expected loss from it in practice, because q(y|x) isunknown. We provide a variant of the information criterion as an estimate of (4.1).

Theorem 1. Let the information criterion for MWLE be

ICw:=− 2L1(!w) + 2 tr(JwH−1w ); (5.1)

where

L1(!) =n!

t=1

q1(x)q0(x)

logp(yt |xt ; !);

Jw =−E0

"

q1(x)q0(x)

@ logp(y|x; !)@!

#

#

#

#

!∗w

@lw(x; y|!)@!′

#

#

#

#

!∗w

$

:

The matrices Jw and Hw may be replaced by their consistent estimates

J w =−1n

n!

t=1

q1(xt)q0(xt)

@ logp(yt |xt ; !)@ !

#

#

#

#

!w

@lw(xt ; yt |!)@!′

#

#

#

#

!w

;

H w =1n

n!

t=1

@2lw(xt ; yt |!)@!@!′

#

#

#

#

!w

:

Then; ICw=2n is an estimate of the expected loss unbiased up to O(n−1) term:

E(n)0 (ICw=2n) = E(n)0 (loss1(!w)) + o(n

−1): (5.2)

Fortunately, expression (5.1) turns out to be rather simpler than that of (4.1), andwe do not have to worry about the calculation of the third derivatives appeared in(4.2). The explicit form of (5.1) for the normal regression is given in (6.1) and (6.2).Given the model p(y|x; !) and the data (x(n); y(n)), we choose a weight function w(x)

which attains the minimum of ICw over a certain class of weights. This is selectionof the weight rather than model selection. Searching the optimal weight over all the

説明変数の分布が変化する場合

5

Page 6: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

複雑ネットワークを仮定して推定

6

Paul Sheridan December 2010

Breast cancer discovery

35

2

3

1

2

3

1

Random structure prior Scale-free structure prior

�� = 2.28

1:FOXA1 is a known hub necessary for the expression of estrogen regulated genes.2:SLC39A6 is a previously unknown hub that has been implicated with the spread of breast cancer to the lymph nodes.3:E2F3 is a related to the expression of estrogen regulated genes.

P(G)=ランダムグラフ P(G)=複雑ネットワーク

【MCMCでベイズ事後確率の計算】 Sheridan, Kamimura, Shimodaira (2010) PLoS ONE

 【L1正則化によるスパース推定】 小倉,廣瀬,下平 (2013) 日本計算機統計学会シンポ

1:FOXA1,  2:SLC39A6,  3:E2F3

breast  cancer  expression  dataset    

psf (G) proved versatile enough to recover random networks onpar with pr(G) itself. Above all, our analysis of the breast cancerexpression data illustrates the practical value of the scale-freestructure prior as an instrument to aid in the identification ofcandidate hub genes with the potential to direct the hypotheses ofmolecular biologists, and thus drive future experiments.

A node labeling s~ s1,s2, . . . ,sp

! ", that is, a permutation of

the integers 1,2, . . . ,p applied to the nodes of G so that each vi[Vis represented by the integer si, is an essential prerequisite for anyMCMC implementation of the scale-free structure prior. Thereason is that the scale-free network model underlying psf (G), orany other scale-free network model for that matter, is so elaboratethat the nodes are not interchangeable in regard to computing theprobability of G. And, although easily overshadowed by psf (G)itself, our new Metropolis-Hastings sampler for s is an innovativecontribution in its own right. Our sampler uses a simple pairswapping strategy for updating s, and one future topic of researchis to investigate the comparative performance of more ingeniousupdate schemes. More research is also required in order to assesseshow accurately s can be estimated.

We take pains to point out that while our implementation is forGGMs, the methodology described here applies to graphicalmodels more generally. For instance, psf (G) could be appliedcrudely to Bayesian network inference by simply ignoring edgedirectionality, or else the underlying static model could bemodified to have directed edges. The latter approach raises aninteresting consideration: in gene regulatory networks, accordingto the prevailing wisdom [31], it is actually only the out-degreedistribution that follows a power-law. By contrast, the in-degree of

a node is usually small and its distribution is better approximatedby a sort of restricted exponential function. While this distinctiongets blurred when inference is conducted with undirectedgraphical models, Bayesian networks provide an obvious incentivefor taking it into account. Indeed, Bayesian networks may prove tobe a more promising area of application because they currentlyable to handle much larger networks than GGMs [63].

Although the static model is not biologically motivated, it is adefensible choice as an underlying model for psf (G) on thegrounds that it is a simple model with the potential to describe anynetwork topology; not to mention that it includes the ER model asa limiting case. But there is more, implementing a structure priorbased on a growing network model poses some added difficultiesbecause not only will the probability of a network depend on thechoice of seed network, but evaluating P(GDh,s) will result in agreater expenditure of computational resources as the edgeinclusion probabilities depend on the order in which they wereadded to the network.

All the same, we implemented two other scale-free structurepriors based on growing models; one on the Poisson-growth,preferential attachment model [64], and another on thebiologically meaningful duplication model. In the former case,we were able to get away with using a single node as the seednetwork, and we found that while this prior recovered heavy-tailednetworks as well as psf (G), yet it understandably struggled toaccurately recover random networks. Meanwhile, the duplicationmodel based structure prior was highly sensitive to the choice ofseed network in addition to being unstable due to the complexityof the model. One future avenue of research is to adapt these

Figure 5. Gene interactions identified by all methods. A subnetwork of the gene interactions involved with the estrogen receptor gene, ESR1,that were commonly identified by the random structure prior, pr(G), the scale-free structure prior, psf (G), and the relevance network (estimated withARACNE). The directed edges indicate the gene interactions on which the Bayesian network (estimated with BANJO) agreed with the other threemethods.doi:10.1371/journal.pone.0013580.g005

A Scale-Free Structure Prior

PLoS ONE | www.plosone.org 10 November 2010 | Volume 5 | Issue 11 | e13580

Page 7: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

ネットワーク成長メカニズムの推定

7

【複雑ネットワークの生成モデルにおける優先的選択関数のノンパラメトリック推定】 Pham, Sheridan, 下平 (2013) 統計関連学会連合大会

優先的選択関数𝐴

2

有名人の アカウント

k = 10000

自分の アカウント

k = 2

新しいアカウント

Youtubeのアカウント間のフォロー関係

𝑃 𝑘 ∝ 𝐴 :次数𝑘のノードに繋がる確率 𝐴 = 𝑘 Barabasi-Albert(1999) 𝐴 = 𝑘 Newman(2001)

恐らく次数の高い方に

どっちにフォローするか?

提案手法で推定した𝐴

3

𝑘

1e-08

10e-07

10e-05

1e+00 1e+01 1e+02 1e+03 1e+04 1e+05

𝐴

Youtube データ ノード数 : 約3 ∗ 10 エッジ 数 : 約20 ∗ 10

Page 8: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

ブートストラップ法による信頼度計算    

遺伝子発現のクラスタリングや分子進化の系統樹を例として  ベイズ事後確率とp値の関係を探る

下平英寿  大阪大学 大学院基礎工学研究科

8

Page 9: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

チンパンジー

ヒト

ゴリラ

オランウータン

マウス

塩基置換数から系統樹を推定 NC_001807 NC_012920 NC_001643 NC_001645 NC_002083 NC_012920 18

NC_001643 1188 1185 NC_001645 1481 1479 1420

NC_002083 1994 2001 2090 2116

NC_010339 4069 4068 4052 4102 4127

推定された系統樹 (無根系統樹) 推定された系統樹 (有根系統樹)

置換数  (推定値)

ココに根があると考える(マウス以外)  

4052

4127

2116

1479

1188

9

Page 10: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

紙テープで系統樹を工作

10

hdp://www.youtube.com/watch?v=jqqJODQytzk YouTubeにも置きました

Page 11: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

11  

ヒトの起源はアフリカ

Ingman  et  al.  (2000)  Nature  

アフリカ以外

アフリカ

チンパンジー

ヒトとチンプの 共通祖先

アフリカ大陸

ヨーロッパ大陸

進出

98  

98  

98  

82  

100  

100  

Page 12: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

哺乳類の系統樹 (32種)

12

ML  tree  topology:  ((G1,G2),  (G3,G4),  G5)

Fig  1  of  Shimodaira  and  Hasegawa  (2005)  from  the  book  (ed.  Nielsen)    Data:  mt  protein  sequences  of  n=3392  amino  acids  for  s=32  species

枝の小さい数字は推定結果の信頼度

Page 13: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

Comparing  15  trees  (cont.)

13

Page 14: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

系統樹の信頼度は?

human  cow  

rabbit  mouse  

opposum

nD 計算結果

サンプルサイズ n

D∞計算 ?

計算

14

Page 15: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

「計算」はブラックボックスとみなす

human  cow  

rabbit  mouse  

opposum

ℓ1(θ1)

ℓ2(θ2 )

ℓ15(θ15)

max

nD

計算

計算結果

15

Page 16: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

反復計算してカウントだけ使う

nD 計算 Yes

Bootstrap  Probability

ブートストラップ確率

*1nD

*2nD

*10000nD

計算 Yes

計算 No

計算 Yes

16

BP = #{Yes}

10000≈ P(Yes | Data)

Page 17: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

1 2 3 n

1 2 3 m

ランダムに要素をコピーする

ブートストラップ・リサンプリング Efron  (1979)  Felsenstein  (1985)

nD

Dm*1

Dm*2

Dm*10000

17

Page 18: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

Bootstrap:  n=10,242  columns human  

cow  rabbit  mouse  

opposum

18

Dn

Assume  i.i.d.  for  columns

Dm*1,..., Dm

*100

0 < m nIn  the  “m-­‐out-­‐of-­‐n  bootstrap”

m = O(n)We  assume   ,  typically  m=10242,  5000,  20000,  say  

,  typically  m=30,  say  

Page 19: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

BPはベイズ事後確率

1 2 3 4 5

1 2 3 4 5

1 2 3 4 5

1 2 3 4 5

nD D∞ 19

*1 *2 *3, , ,...n n nD D D BP = P(Tree | data) ≈ #{Tree}

100

Efron  and  Tibshirani  (1998)

Page 20: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

New  Discovery?

58% 2%

20

Page 21: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

ベイズの事後確率

21

1=human 3=rabbit 4=mouse 1=human 3=rabbit 4=mouse

0.05p ≥

ブートストラップ法(m=n)で計算

Page 22: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

Shimodaira-­‐Hasegawa  testは安全すぎる

SH 0.05≥

22

Page 23: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

頻度論のp-­‐値  (近似的に不偏  AU)

23

1=human 3=rabbit 4=mouse

マルチスケール・ブートストラップ法(m=-­‐n)で計算???

Page 24: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

ベイズと頻度論のギャップを埋める

NBP(σ 2 ) ! Φ σ Φ−1 BP(σ 2 )( ){ }

正規化したブートストラップ確率(NBP)

σ 2 = n

m2 1σ =2 1σ = − 0

NBP(σ 2 )

AU = NBP(−1)

BP = NBP(+1)

m = n m = −n m = ∞

24

Thomas  Bayes  (1702-­‐1761)

Jerzy  Neyman  (1894-­‐1981)  

頻度論 ベイズ統計学

スケーリング則

AUの計算手順: いくつかのm>0でBPを計算して,NBPに変換してからm=-­‐nへ外挿

Page 25: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

正規分布の上側確率

( ) ( )z

z x dxφ∞

Φ = ∫

2 /212

( ) xx eπ

φ −=

-4 -2 0 2 4

0.0

0.1

0.2

0.3

0.4

-4 -2 0 2 4

0.0

0.2

0.4

0.6

0.8

1.0

1.64

(1.64) 0.05Φ =

25

Page 26: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

もしパラメータにアクセスすれば簡単

θ

θ

ブートストラップ分布をひっくり返す

データからブートストラップ分布

p-­‐value 帰無仮説からシミュレーション

BP(1)

0:H θ θ≤

26

Page 27: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

=

確率分布の空間の幾何学で証明

θ θ

頻度論のp-­‐値

AU = Φ β0 − β1( ) +O(n−3/2 )

ベイズの事後確率

BP = Φ β0 + β1( ) +O(n−3/2 )

0β1β

1β−

27

Jerzy  Neyman  (1894-­‐1981)  

頻度論

Thomas  Bayes  (1702-­‐1761)

ベイズ統計学

Page 28: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

Rescaling  the  whole  picture

2 3/201( ) ( )BP O nβσ β σ

σ−⎡ ⎤=Φ + +⎢ ⎥⎣ ⎦

BP = Φ β0 + β1( ) +O(n−3/2 )

2 1σ =

Shimodaira  (2002)

28

2 1σ =

×

NBP(σ 2 ) = Φ σ Φ−1 BP(σ 2 )( ){ } = Φ β0 + β1σ

2( ) +O(n−3/2 )

Page 29: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

Scaling-­‐law  and  geometry

NBP(σ 2 ) = Φ β0 + β1σ

2 + β2σ4 + β3σ

6 +( ) +O(h2 )

Using  a  new  asympto0c  theory  with  “nearly  flat  surface    h”    instead  of    large  n.

Shimodaira  (2008)

29

β j (u) = 1

2 j j!j!

j1! jp−1!∂2 j h(u)

∂u12 j1∂up−1

2 jp−1j1++ jp−1= j∑

distance mean  curvature

v  

u=(u1,..., up-­‐1)  ( )v h u= −

p-value = Φ β0 − β1 + β2 − β3 +( ) +O(h2 )

Page 30: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

k=3:  AU=0.052 p=0.05

BP=0.0078 k=2:  AU=0.057

2σ-­‐1 1

30

Extrapola0ng  NBP  to  m=-­‐n

σ Φ−1 BP(σ 2 )( )

ψ (σ 2 ) = β0 + β1σ2 + β2σ

4 ++ βk−1σ2(k−1)Fiqng  a  model  :

Ploqng  :   NBP(σ 2 ) ! Φ σ Φ−1 BP(σ 2 )( ){ }

Page 31: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

What  NBP  is  calcula0ng  ?  

2 1σ = 2 1σ = −

Shimodaira  (2008)

31

NBP(−1) = p-value +O(n−3/2 )

NBP(σ 2 ) = Φ β0 + β1σ

2( ) +O(n−3/2 )

curvature

Page 32: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

Applica0ons  to        Life  Science

Brambrink  et  al    (PNAS  2006)

http://cran.r-project.org/web/packages/pvclust/index.html

Mathema0cal  Sta0s0cs/Computer  Science

7mins@700cpus 80hours@1cpu

TSUBAME 32

Page 33: 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3 東大計数 博士(工学)! 1995.10~1996.3!Dept.!Gene0cs,!University!of!Washington!ポスドク

GPGPUでブートストラップを並列計算

3

GPGPU でどれくらい高速化されるか遺伝子発現データの階層型クラスタリング

永田(2012)によるGPUとCPUのハイブリット並列

Suzuki  and  Shimodaira  (2006)  のpvclustパッケージ

高速化 33