研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3...
Transcript of 研究室のトピック紹介...研究室のトピック紹介 略歴 下平英寿!! 1995.3...
研究室のトピック紹介
略歴 下平英寿 1995.3 東大計数 博士(工学) 1995.10~1996.3 Dept. Gene0cs, University of Washington ポスドク 学振PD 1996.7~2002.5 統計数理研究所 助手 1999.7~2001.1 Dept. Sta0s0cs, Stanford University 客員研究員 2002.6~2012.3 東工大情報理工 講師,助教授,准教授 2012.4~現在 阪大基礎工 教授
1
データ解析の統計的信頼度を計算
マルチスケール・ブートストラップ法の理論研究 【指数型分布族の場合】 Shimodaira (2004) Annals of Sta0s0cs 【滑らかでない場合】 Shimodaira (2008) J. Stat. Plan. Inf. 【高次漸近理論】 Shimodaira (2013) arXiv:1312.6348
Function seplot provides a graphical interface for examiningstandard errors, while print gives more detailed informationabout p-values in text-based format. See online instruction onour website for the usage of these facilities.In the multiscale bootstrap resampling, we intentionally alter the
data size of bootstrap samples to several values. Let N be theoriginal data size, and N0 be that for bootstrap samples. In theexample of Figure 1, N ¼ 916, and N0 ¼ 458, 549, 641, 732,824, 916, 1007, 1099, 1190 and 1282. For each cluster, an observedBP value is obtained for each value of N0, and we look at change inz¼"F"1 (BP) values, whereF"1 (·) is the inverse function ofF (·),the standard normal distribution function. For the cluster labeled 62in Figure 1, the observed BP values are 0.8554, 0.8896, 0.9132,0.9335, 0.9498, 0.9636, 0.9656, 0.9756, 0.9795 and 0.9859 (Fig. 2).Then, a theoretical curve zðN0Þ ¼ v
ffiffiffiffiffiffiffiffiffiffiffiN0=N
p+ c
ffiffiffiffiffiffiffiffiffiffiffiN=N0
pis fitted to
the observed values, and the coefficients v, c are estimated for eachcluster. The AU p-value is computed by AU ¼ F("v + c). For thecluster labeled 62, v ¼ "2.01, c ¼ 0.26, and thus AU ¼ F(2.01 +0.26) ¼ F(2.27) ¼ 0.988, where BP ¼ 0.964 for N0 ¼ N. Anasymptotic theory proves that the AU p-value is less biased thanthe BP value.Currently only the simplest form of the bootstrapping, i.e. the
non-parametric bootstrap resampling, is implemented in pvclust.More elaborate models designed for specific applications, such
Hei
ght
feta
l_lu
ngX
306.
99_n
orm
alX
219.
97_n
orm
alX
222.
97_n
orm
alX
314.
99_n
orm
alX
315.
99_n
orm
alX
222.
97_A
deno
X22
6.97
_Ade
noX
306.
99_A
deno
X30
6.99
_nod
eX
313.
99P
T_A
deno
X31
3.99
MT
_Ade
noX
165.
96_A
deno
X17
8.96
_Ade
noX
185.
96_A
deno
X11
.00_
Ade
noX
181.
96_A
deno
X15
6.96
_Ade
noX
198.
96_A
deno
X18
0.96
_Ade
noX
187.
96_A
deno
X68
.96_
Ade
noX
137.
96_A
deno
X12
.00_
Ade
noX
199.
97_A
deno
_cX
199.
97_A
deno
_p X20
4.97
_Ade
noX
257.
97_A
deno
X13
2.95
_Ade
noX
319.
00P
T_A
deno
X32
0.00
_Ade
no_p
X32
0.00
_Ade
no_c
X20
7.97
_SC
LCX
314.
99_S
CLC
X23
0.97
_SC
LCX
315.
99_S
CLC
X31
5.99
_nod
eX
161.
96_A
deno
X29
9.99
_Ade
noX
75.9
5_co
mbi
ned
X69
.96_
Ade
noX
157.
96_S
CC
X21
9.97
_SC
CX
166.
96_S
CC
X3_
SC
CX
58.9
5_S
CC
X23
2.97
_SC
CX
232.
97_n
ode
X22
0.97
_SC
CX
220.
97_n
ode
X42
.95_
SC
CX
239.
97_S
CC
X24
5.97
_SC
CX
245.
97_n
ode
X24
6.97
_SC
C_p
X24
6.97
_SC
C_c
X23
7.97
_Ade
noX
147.
96_A
deno
X6.
00_L
CLC
X25
6.97
_LC
LCX
248.
97_L
CLC
X19
1.96
_Ade
noX
139.
97_L
CLC
X59
.96_
SC
CX
234.
97_A
deno
X18
4.96
_Ade
noX
184.
96_n
ode
X21
8.97
_Ade
noX
319.
00M
T1_
Ade
noX
319.
00M
T2_
Ade
noX
265.
98_A
deno
X80
.96_
Ade
noX
223.
97_A
deno
0.0
0.2
0.4
0.6
0.8
1.0
Cluster dendrogram with AU/BP values (%)
Cluster method: averageDistance: correlation
100
99 100100 10079 100100
100 100 100100100
9910010061100 96 100
10095 84 9696 75 7583 8797100 99 9296 91 9899
63 93 10087 100 5967 9975 916166 80 9571 7188 10076 9362 7168 739984
7269
797083
7774
85
au
100
100 100100 10066 100100
100 100 100100100
9910010042100 90 100
10092 49 9293 47 3060 5989100 96 8697 79 9493
53 79 10058 100 4731 9721 772745 17 9539 3756 9962 6822 4421 599658
163
16427
3029
26
bp
1
2 34 56 78
9 10 111213
1415161718 19 20
2122 23 2425 26 2728 293031 32 3334 35 3637
38 39 4041 42 4344 4546 474849 50 5152 5354 5556 5758 5960 616263
6465
666768
6970
71
edge #
Fig. 1. Hierarchical clustering of 73 lung tumors. The data are expression pattern of 916 genes of Garber et al. (2001). Values at branches are AU p-values (left),BP values (right), and cluster labels (bottom).ClusterswithAU% 95 are indicated by the rectangles. The fourth rectangle from the right is a cluster labeled 62with
AU ¼ 0.99 and BP ¼ 0.96.
Fig. 2. Diagnostic plot of the multiscale bootstrap for the cluster labeled 62.The observed z-values are plotted for
ffiffiffiffiffiffiffiffiffiffiffiN0=N
p, and the theoretical curve is
obtained by the weighted least squares fitting. This plot is obtained by
command: msplot(result, edges¼62). When the curve fitting is
poor, a breakdown of the asymptotic theory may be suspected.
Pvclust
1541
分子進化系統樹や遺伝子発現解析へ応用に関する統計手法 【SH-‐test】 Shimodaira and Hasegawa (1999) Molecular Biology and Evolu0on 引用数=2916 【CONSEL】 Shimodaira and Hasegawa (2001) Bioinforma0cs 引用数=1156 【AU test】 Shimodaira (2002) Systema0c Biology 引用数=1175 【pvclust】 Suzuki and Shimodaira (2006) Bioinforma0cs 引用数=450
ネットワーク推定への応用 【GGM】Kamimura,Shimodaira (2004) GIW 【LiNGAM】Komatsu, Shimizu, Shimodaira (2010) ICANN
4 Hidetoshi Shimodaira and Masami Hasegawa
0.1
623239
elephant shrew
golden mole 82 89100
aardvark
595248
tenrec
78 92100
dugong
elephant 77 86100
hyrax
93100100
98100100
armadillo
524836
wallaroo
opossum100100100
echidna
platypus
71 92100
human
baboon100100100
capuchin
slow loris
74 85100
guinea pig
mouse686095
red squirrel
69 59100
rabbit
755693
harbor seal
dog
cat
100100100
horse
white rhinoceros100100100
64 61100
hippopotamus
blue whale 99100100
cow
100100100
pig
95 99100
69 72100
Ryukyu flying fox
77 92100
greater moonrat
Japanese mole 74 79100
72 81100
6178100
100100100
100100100 100
100100
G1
G2
G3
G4
G5
Fig. 1. The ML tree for 32 mammalian species estimated from a part of the mito-chondrial protein sequences of Nikaido et al. (2003). Biological discussion is givenin Section 5. This ML topology is represented as ((G1, G2), (G3, G4), G5) usingthe five groups of taxa defined above. The MAP topology of Bayesian inference(Section 2.2) is represented as ((G1,(G2,G3)),G4,G5), where the subtrees are thesame as above except for (((elephant shrew, golden mole), aardvark), tenrec) in G3being changed to ((tenrec, aardvark), (elephant shrew, golden mole)). Our datasetconsists of the sequences of n = 3392 amino acids for s = 32 species. The mtREVmodel (Adachi and Hasegawa 1996) was used for amino acid substitutions, and thesite-heterogeneity was modeled by the discrete-gamma distribution (Yang 1996).A limited list of 2502 candidate topologies was obtained from heated MCMCMCsimulations as explained in Fig. 2 and Fig. 3, and their log-likelihood values werecalculated by PAML to find the ML topology. Three numbers near the branches arethe approximately unbiased p-value (AU) (top) and the bootstrap probability (BP)(middle) calculated by CONSEL (Section 3), and the posterior probability (PP)(bottom) calculated by MrBayes (Section 2.2).
2
Suzuki and Shimodaira (2006)
Function seplot provides a graphical interface for examiningstandard errors, while print gives more detailed informationabout p-values in text-based format. See online instruction onour website for the usage of these facilities.In the multiscale bootstrap resampling, we intentionally alter the
data size of bootstrap samples to several values. Let N be theoriginal data size, and N0 be that for bootstrap samples. In theexample of Figure 1, N ¼ 916, and N0 ¼ 458, 549, 641, 732,824, 916, 1007, 1099, 1190 and 1282. For each cluster, an observedBP value is obtained for each value of N0, and we look at change inz¼"F"1 (BP) values, whereF"1 (·) is the inverse function ofF (·),the standard normal distribution function. For the cluster labeled 62in Figure 1, the observed BP values are 0.8554, 0.8896, 0.9132,0.9335, 0.9498, 0.9636, 0.9656, 0.9756, 0.9795 and 0.9859 (Fig. 2).Then, a theoretical curve zðN0Þ ¼ v
ffiffiffiffiffiffiffiffiffiffiffiN0=N
p+ c
ffiffiffiffiffiffiffiffiffiffiffiN=N0
pis fitted to
the observed values, and the coefficients v, c are estimated for eachcluster. The AU p-value is computed by AU ¼ F("v + c). For thecluster labeled 62, v ¼ "2.01, c ¼ 0.26, and thus AU ¼ F(2.01 +0.26) ¼ F(2.27) ¼ 0.988, where BP ¼ 0.964 for N0 ¼ N. Anasymptotic theory proves that the AU p-value is less biased thanthe BP value.Currently only the simplest form of the bootstrapping, i.e. the
non-parametric bootstrap resampling, is implemented in pvclust.More elaborate models designed for specific applications, such
Hei
ght
feta
l_lu
ngX
306.
99_n
orm
alX
219.
97_n
orm
alX
222.
97_n
orm
alX
314.
99_n
orm
alX
315.
99_n
orm
alX
222.
97_A
deno
X22
6.97
_Ade
noX
306.
99_A
deno
X30
6.99
_nod
eX
313.
99P
T_A
deno
X31
3.99
MT
_Ade
noX
165.
96_A
deno
X17
8.96
_Ade
noX
185.
96_A
deno
X11
.00_
Ade
noX
181.
96_A
deno
X15
6.96
_Ade
noX
198.
96_A
deno
X18
0.96
_Ade
noX
187.
96_A
deno
X68
.96_
Ade
noX
137.
96_A
deno
X12
.00_
Ade
noX
199.
97_A
deno
_cX
199.
97_A
deno
_p X20
4.97
_Ade
noX
257.
97_A
deno
X13
2.95
_Ade
noX
319.
00P
T_A
deno
X32
0.00
_Ade
no_p
X32
0.00
_Ade
no_c
X20
7.97
_SC
LCX
314.
99_S
CLC
X23
0.97
_SC
LCX
315.
99_S
CLC
X31
5.99
_nod
eX
161.
96_A
deno
X29
9.99
_Ade
noX
75.9
5_co
mbi
ned
X69
.96_
Ade
noX
157.
96_S
CC
X21
9.97
_SC
CX
166.
96_S
CC
X3_
SC
CX
58.9
5_S
CC
X23
2.97
_SC
CX
232.
97_n
ode
X22
0.97
_SC
CX
220.
97_n
ode
X42
.95_
SC
CX
239.
97_S
CC
X24
5.97
_SC
CX
245.
97_n
ode
X24
6.97
_SC
C_p
X24
6.97
_SC
C_c
X23
7.97
_Ade
noX
147.
96_A
deno
X6.
00_L
CLC
X25
6.97
_LC
LCX
248.
97_L
CLC
X19
1.96
_Ade
noX
139.
97_L
CLC
X59
.96_
SC
CX
234.
97_A
deno
X18
4.96
_Ade
noX
184.
96_n
ode
X21
8.97
_Ade
noX
319.
00M
T1_
Ade
noX
319.
00M
T2_
Ade
noX
265.
98_A
deno
X80
.96_
Ade
noX
223.
97_A
deno
0.0
0.2
0.4
0.6
0.8
1.0
Cluster dendrogram with AU/BP values (%)
Cluster method: averageDistance: correlation
100
99 100100 10079 100100
100 100 100100100
9910010061100 96 100
10095 84 9696 75 7583 8797100 99 9296 91 9899
63 93 10087 100 5967 9975 916166 80 9571 7188 10076 9362 7168 739984
7269
797083
7774
85
au
100
100 100100 10066 100100
100 100 100100100
9910010042100 90 100
10092 49 9293 47 3060 5989100 96 8697 79 9493
53 79 10058 100 4731 9721 772745 17 9539 3756 9962 6822 4421 599658
163
16427
3029
26
bp
1
2 34 56 78
9 10 111213
1415161718 19 20
2122 23 2425 26 2728 293031 32 3334 35 3637
38 39 4041 42 4344 4546 474849 50 5152 5354 5556 5758 5960 616263
6465
666768
6970
71
edge #
Fig. 1. Hierarchical clustering of 73 lung tumors. The data are expression pattern of 916 genes of Garber et al. (2001). Values at branches are AU p-values (left),BP values (right), and cluster labels (bottom).ClusterswithAU% 95 are indicated by the rectangles. The fourth rectangle from the right is a cluster labeled 62with
AU ¼ 0.99 and BP ¼ 0.96.
Fig. 2. Diagnostic plot of the multiscale bootstrap for the cluster labeled 62.The observed z-values are plotted for
ffiffiffiffiffiffiffiffiffiffiffiN0=N
p, and the theoretical curve is
obtained by the weighted least squares fitting. This plot is obtained by
command: msplot(result, edges¼62). When the curve fitting is
poor, a breakdown of the asymptotic theory may be suspected.
Pvclust
1541
3
Extended Data Figure 6 | Characterization of ES-like cells converted fromFgf4-induced stem cells and comparison of STAP cells with early embryos.a, Immunohistochemistry of ES-like cells for trophoblast and pluripotencymarkers. ES-like cells converted from Fgf4-induced stem cells no longerexpressed the trophoblast marker (integrin alpha 7), but they did express thepluripotency markers (Oct4, Nanog and SSEA-1). Scale bar, 100mm.b, Pluripotency of ES-like cells converted from Fgf4-induced stem cells asshown by teratoma formation. Those cells successfully formed teratomascontaining tissues from all three germ layers: neuroepithelium (left, arrowindicates), muscle tissue (middle, arrow indicates) and bronchial-likeepithelium (right). Scale bar, 100mm. c, MEK inhibitor treatment assay for
Oct4-gfp Fgf4-induced stem cells in trophoblast stem-cell medium containingFgf4. No substantial formation of Oct4-GFP1 colonies was observed fromdissociated Fgf4-induced stem cells in MEK-inhibitor-containing medium.Scale bar, 100mm. d, Cluster tree diagram from hierarchical clustering of globalexpression profiles. Red, AU P values. As this analysis included morula andblastocyst embryos from which only small amounts of RNA could be obtained,we used pre-amplification with the SMARTer Ultra Low RNA kit for IlluminaSequencing (Clontech Laboratories). e, f, Volcano plot of the expressionprofile of STAP cells compared to the morula (e) and blastocyst (f). Genesshowing greater than 10-fold change and P value 1.0 3 1026 are highlighted inred and are considered up- (or down-) regulated in the STAP cells.
LETTER RESEARCH
Macmillan Publishers Limited. All rights reserved©2014
Extended Data Figure 6 | Characterization of ES-like cells converted fromFgf4-induced stem cells and comparison of STAP cells with early embryos.a, Immunohistochemistry of ES-like cells for trophoblast and pluripotencymarkers. ES-like cells converted from Fgf4-induced stem cells no longerexpressed the trophoblast marker (integrin alpha 7), but they did express thepluripotency markers (Oct4, Nanog and SSEA-1). Scale bar, 100mm.b, Pluripotency of ES-like cells converted from Fgf4-induced stem cells asshown by teratoma formation. Those cells successfully formed teratomascontaining tissues from all three germ layers: neuroepithelium (left, arrowindicates), muscle tissue (middle, arrow indicates) and bronchial-likeepithelium (right). Scale bar, 100mm. c, MEK inhibitor treatment assay for
Oct4-gfp Fgf4-induced stem cells in trophoblast stem-cell medium containingFgf4. No substantial formation of Oct4-GFP1 colonies was observed fromdissociated Fgf4-induced stem cells in MEK-inhibitor-containing medium.Scale bar, 100mm. d, Cluster tree diagram from hierarchical clustering of globalexpression profiles. Red, AU P values. As this analysis included morula andblastocyst embryos from which only small amounts of RNA could be obtained,we used pre-amplification with the SMARTer Ultra Low RNA kit for IlluminaSequencing (Clontech Laboratories). e, f, Volcano plot of the expressionprofile of STAP cells compared to the morula (e) and blastocyst (f). Genesshowing greater than 10-fold change and P value 1.0 3 1026 are highlighted inred and are considered up- (or down-) regulated in the STAP cells.
LETTER RESEARCH
Macmillan Publishers Limited. All rights reserved©2014
Extended Data Figure 6 | Characterization of ES-like cells converted fromFgf4-induced stem cells and comparison of STAP cells with early embryos.a, Immunohistochemistry of ES-like cells for trophoblast and pluripotencymarkers. ES-like cells converted from Fgf4-induced stem cells no longerexpressed the trophoblast marker (integrin alpha 7), but they did express thepluripotency markers (Oct4, Nanog and SSEA-1). Scale bar, 100mm.b, Pluripotency of ES-like cells converted from Fgf4-induced stem cells asshown by teratoma formation. Those cells successfully formed teratomascontaining tissues from all three germ layers: neuroepithelium (left, arrowindicates), muscle tissue (middle, arrow indicates) and bronchial-likeepithelium (right). Scale bar, 100mm. c, MEK inhibitor treatment assay for
Oct4-gfp Fgf4-induced stem cells in trophoblast stem-cell medium containingFgf4. No substantial formation of Oct4-GFP1 colonies was observed fromdissociated Fgf4-induced stem cells in MEK-inhibitor-containing medium.Scale bar, 100mm. d, Cluster tree diagram from hierarchical clustering of globalexpression profiles. Red, AU P values. As this analysis included morula andblastocyst embryos from which only small amounts of RNA could be obtained,we used pre-amplification with the SMARTer Ultra Low RNA kit for IlluminaSequencing (Clontech Laboratories). e, f, Volcano plot of the expressionprofile of STAP cells compared to the morula (e) and blastocyst (f). Genesshowing greater than 10-fold change and P value 1.0 3 1026 are highlighted inred and are considered up- (or down-) regulated in the STAP cells.
LETTER RESEARCH
Macmillan Publishers Limited. All rights reserved©2014
passages) and in their vulnerability to dissociation1. However, whencultured in the presence of ACTH and LIF for 7 days, STAP cells, at amoderate frequency, further convert into pluripotent ‘stem’ cells thatrobustly proliferate (STAP stem cells).
Here we have investigated the unique nature of STAP cells, focusingon their differentiation potential into the two major categories (embry-onic and placental lineages) of cells in the blastocyst5–8. We becameparticularly interested in this question after a blastocyst injection assayrevealed an unexpected finding. In general, progeny of injected ES cellsare found in the embryonic portion of the chimaera, but rarely in theplacental portion5,7 (Fig. 1a; shown with Rosa26-GFP). Surprisingly,injected STAP cells contributed not only to the embryo but also to theplacenta and fetal membranes (Fig. 1b and Extended Data Fig. 1a–c) in60% of the chimaeric embryos (Fig. 1c).
In quantitative polymerase chain reaction (qPCR) analysis, STAP cells(sorted for strong Oct4-GFP signals) expressed not only pluripotencymarker genes but also trophoblast marker genes such as Cdx2 (Fig. 1dand Supplementary Table 1 for primers), unlike ES cells. Therefore,the blastocyst injection result is not easily explained by the idea thatSTAP cells are composed of a simple mixture of pluripotent cells(Oct41Cdx22) and trophoblast-stem-like cells (Oct42Cdx21).
In contrast to STAP cells, STAP stem cells did not show the ability tocontribute to placental tissues (Fig. 1e, lanes 2–4), indicating that the
derivation of STAP stem cells from STAP cells involves the loss ofcompetence to differentiate into placental lineages. Consistent withthis idea, STAP stem cells show little expression of trophoblast markergenes (Fig. 1f).
We next examined whether an alteration in culture conditions couldinduce in vitro conversion of STAP cells into cells similar to tropho-blast stem cells8,9, which can be derived from blastocysts during pro-longed adhesion culture in the presence of Fgf4. When we culturedSTAP cell clusters under similar conditions (Fig. 2a; one cluster per wellin a 96-well plate), flat cell colonies grew out by days 7–10 (Fig. 2b, left;typically in ,30% of wells). The Fgf4-induced cells strongly expressedthe trophoblast marker proteins9–12 integrin a7 (Itga7) and eomesoder-min (Eomes) (Fig. 2c, d) and marker genes (for example, Cdx2; Fig. 2e).
These Fgf4-induced cells with trophoblast marker expression couldbe expanded efficiently in the presence of Fgf4 by passaging for morethan 30 passages with trypsin digestion every third day. Hereafter,these proliferative cells induced from STAP cells by Fgf4 treatmentare referred to as Fgf4-induced stem cells. This type of derivation intotrophoblast-stem-like cells is not common with ES cells (unless genet-ically manipulated)13 or STAP stem cells.
In the blastocyst injection assay, unlike STAP stem cells, the pla-cental contribution of Fgf4-induced stem cells (cag-GFP-labelled) wasobserved with 53% of embryos (Fig. 2f, g; n 5 60). In the chimaeric
STAP clusters
Passage
Outgrowth of trophoblast-like cells
Fgf4 medium7–10 days
a
b c
e
Integrin α7 Eomes
f
DAPI
cag-GFP
Placenta
d
h
Per
cent
age
of G
FP+
cells
Placental contribution
cag-GFP
Bright-field
Cross-section g Chimaeric fetus
1 32 4
FI-S
C 1FI
-SC 2
FI-S
C 3GFP
-ES 1
GFP-E
S 2GFP
-ES 3
5 6
0
20
10
18161412
2468
Day 1
Day 7
Passage 1(on feeder)
Passage 1(enlarged
view)
i j
Rel
ativ
e ex
pres
sion
leve
ls(T
S=1
.0)
cag-GFP
Rel
ativ
e ex
pres
sion
leve
ls(E
S=1
.0)
Fgf4 medium
Oct4-GFPBright-field
***
NS
NS
k
ES TSFG
F-ind
uced
STAP
CD45+
Oct4 Nanog Rex1
JAKi– JAKi+ JAKi– JAKi+ES FI-SC
00.20.4
0.60.8
1.2
1.0
00.20.40.60.8
1.2
1.0
Cdx2 Eomes Elf5 Itga7
JAKi–JAKi+
FI-SC
TS
0
5
10
15
20
25
30
Oct4
Cdx2
Eomes
Rel
ativ
e ex
pres
sion
leve
ls(E
S=1
.0)
ES STAP-SC
CD45
FI-SCTS
STAP
Hei
ght
0.00
0.10
0.15
0.05
au
100
100100
100
Figure 2 | Fgf4 treatment induces sometrophoblast-lineage character in STAP cells.a, Schematic of Fgf4 treatment to induceFgf4-induced stem cells from STAP cells. b, Fgf4treatment promoted the generation of flat cellclusters that expressed Oct4-GFP at moderatelevels (right). Top and middle: days 1 and 7 ofculture with Fgf4, respectively. Bottom: cultureafter the first passage. Scale bar, 50mm.c, d, Immunostaining of Fgf4-induced cells withthe trophoblast stem cell markers integrin a7(c) and eomesodermin (d). Scale bar, 50mm.e, qPCR analysis of marker expression.f, g, Placental contribution of Fgf4-induced stemcells (FI-SCs) (genetically labelled with constitutiveGFP expression). Scale bars: 5.0 mm (f (left panel)and g); 50mm (f, right panel). In addition toplacental contribution, Fgf4-induced stem cellscontributed to the embryonic portion at amoderate level (g). h, Quantification of placentalcontribution by FACS analysis. Unlike Fgf4-induced cells, ES cells did not contribute toplacental tissues at a detectable level. i, Cluster treediagram from hierarchical clustering of globalexpression profiles. Red, approximately unbiasedP values. j, qPCR analysis of Fgf4-induced cells(cultured under feeder-free conditions) with orwithout JAK inhibitor (JAKi) treatment forpluripotent marker genes. k, qPCR analysis ofFI-SCs with or without JAK inhibitor (JAKi)treatment for trophoblast marker genes. Values areshown as ratio to the expression level in ES cells(j) or trophoblast stem cells (k). ***P , 0.001;NS, not significant; t-test for each gene betweengroups with and without JAK inhibitor treatment.n 5 3. Statistical significance was all the same withthree pluripotency markers. None of thetrophoblast marker genes showed statisticalsignificance. Error bars represent s.d.
LETTER RESEARCH
3 0 J A N U A R Y 2 0 1 4 | V O L 5 0 5 | N A T U R E | 6 7 7
Macmillan Publishers Limited. All rights reserved©2014
passages) and in their vulnerability to dissociation1. However, whencultured in the presence of ACTH and LIF for 7 days, STAP cells, at amoderate frequency, further convert into pluripotent ‘stem’ cells thatrobustly proliferate (STAP stem cells).
Here we have investigated the unique nature of STAP cells, focusingon their differentiation potential into the two major categories (embry-onic and placental lineages) of cells in the blastocyst5–8. We becameparticularly interested in this question after a blastocyst injection assayrevealed an unexpected finding. In general, progeny of injected ES cellsare found in the embryonic portion of the chimaera, but rarely in theplacental portion5,7 (Fig. 1a; shown with Rosa26-GFP). Surprisingly,injected STAP cells contributed not only to the embryo but also to theplacenta and fetal membranes (Fig. 1b and Extended Data Fig. 1a–c) in60% of the chimaeric embryos (Fig. 1c).
In quantitative polymerase chain reaction (qPCR) analysis, STAP cells(sorted for strong Oct4-GFP signals) expressed not only pluripotencymarker genes but also trophoblast marker genes such as Cdx2 (Fig. 1dand Supplementary Table 1 for primers), unlike ES cells. Therefore,the blastocyst injection result is not easily explained by the idea thatSTAP cells are composed of a simple mixture of pluripotent cells(Oct41Cdx22) and trophoblast-stem-like cells (Oct42Cdx21).
In contrast to STAP cells, STAP stem cells did not show the ability tocontribute to placental tissues (Fig. 1e, lanes 2–4), indicating that the
derivation of STAP stem cells from STAP cells involves the loss ofcompetence to differentiate into placental lineages. Consistent withthis idea, STAP stem cells show little expression of trophoblast markergenes (Fig. 1f).
We next examined whether an alteration in culture conditions couldinduce in vitro conversion of STAP cells into cells similar to tropho-blast stem cells8,9, which can be derived from blastocysts during pro-longed adhesion culture in the presence of Fgf4. When we culturedSTAP cell clusters under similar conditions (Fig. 2a; one cluster per wellin a 96-well plate), flat cell colonies grew out by days 7–10 (Fig. 2b, left;typically in ,30% of wells). The Fgf4-induced cells strongly expressedthe trophoblast marker proteins9–12 integrin a7 (Itga7) and eomesoder-min (Eomes) (Fig. 2c, d) and marker genes (for example, Cdx2; Fig. 2e).
These Fgf4-induced cells with trophoblast marker expression couldbe expanded efficiently in the presence of Fgf4 by passaging for morethan 30 passages with trypsin digestion every third day. Hereafter,these proliferative cells induced from STAP cells by Fgf4 treatmentare referred to as Fgf4-induced stem cells. This type of derivation intotrophoblast-stem-like cells is not common with ES cells (unless genet-ically manipulated)13 or STAP stem cells.
In the blastocyst injection assay, unlike STAP stem cells, the pla-cental contribution of Fgf4-induced stem cells (cag-GFP-labelled) wasobserved with 53% of embryos (Fig. 2f, g; n 5 60). In the chimaeric
STAP clusters
Passage
Outgrowth of trophoblast-like cells
Fgf4 medium7–10 days
a
b c
e
Integrin α7 Eomes
f
DAPI
cag-GFP
Placenta
d
h
Per
cent
age
of G
FP+
cells
Placental contribution
cag-GFP
Bright-field
Cross-section g Chimaeric fetus
1 32 4
FI-S
C 1FI
-SC 2
FI-S
C 3GFP
-ES 1
GFP-E
S 2GFP
-ES 3
5 6
0
20
10
18161412
2468
Day 1
Day 7
Passage 1(on feeder)
Passage 1(enlarged
view)
i j
Rel
ativ
e ex
pres
sion
leve
ls(T
S=1
.0)
cag-GFP
Rel
ativ
e ex
pres
sion
leve
ls(E
S=1
.0)
Fgf4 medium
Oct4-GFPBright-field
***
NS
NS
k
ES TSFG
F-ind
uced
STAP
CD45+
Oct4 Nanog Rex1
JAKi– JAKi+ JAKi– JAKi+ES FI-SC
00.20.4
0.60.8
1.2
1.0
00.20.40.60.8
1.2
1.0
Cdx2 Eomes Elf5 Itga7
JAKi–JAKi+
FI-SC
TS
0
5
10
15
20
25
30
Oct4
Cdx2
Eomes
Rel
ativ
e ex
pres
sion
leve
ls(E
S=1
.0)
ES STAP-SC
CD45
FI-SCTS
STAP
Hei
ght
0.00
0.10
0.15
0.05
au
100
100100
100
Figure 2 | Fgf4 treatment induces sometrophoblast-lineage character in STAP cells.a, Schematic of Fgf4 treatment to induceFgf4-induced stem cells from STAP cells. b, Fgf4treatment promoted the generation of flat cellclusters that expressed Oct4-GFP at moderatelevels (right). Top and middle: days 1 and 7 ofculture with Fgf4, respectively. Bottom: cultureafter the first passage. Scale bar, 50mm.c, d, Immunostaining of Fgf4-induced cells withthe trophoblast stem cell markers integrin a7(c) and eomesodermin (d). Scale bar, 50mm.e, qPCR analysis of marker expression.f, g, Placental contribution of Fgf4-induced stemcells (FI-SCs) (genetically labelled with constitutiveGFP expression). Scale bars: 5.0 mm (f (left panel)and g); 50mm (f, right panel). In addition toplacental contribution, Fgf4-induced stem cellscontributed to the embryonic portion at amoderate level (g). h, Quantification of placentalcontribution by FACS analysis. Unlike Fgf4-induced cells, ES cells did not contribute toplacental tissues at a detectable level. i, Cluster treediagram from hierarchical clustering of globalexpression profiles. Red, approximately unbiasedP values. j, qPCR analysis of Fgf4-induced cells(cultured under feeder-free conditions) with orwithout JAK inhibitor (JAKi) treatment forpluripotent marker genes. k, qPCR analysis ofFI-SCs with or without JAK inhibitor (JAKi)treatment for trophoblast marker genes. Values areshown as ratio to the expression level in ES cells(j) or trophoblast stem cells (k). ***P , 0.001;NS, not significant; t-test for each gene betweengroups with and without JAK inhibitor treatment.n 5 3. Statistical significance was all the same withthree pluripotency markers. None of thetrophoblast marker genes showed statisticalsignificance. Error bars represent s.d.
LETTER RESEARCH
3 0 J A N U A R Y 2 0 1 4 | V O L 5 0 5 | N A T U R E | 6 7 7
Macmillan Publishers Limited. All rights reserved©2014
passages) and in their vulnerability to dissociation1. However, whencultured in the presence of ACTH and LIF for 7 days, STAP cells, at amoderate frequency, further convert into pluripotent ‘stem’ cells thatrobustly proliferate (STAP stem cells).
Here we have investigated the unique nature of STAP cells, focusingon their differentiation potential into the two major categories (embry-onic and placental lineages) of cells in the blastocyst5–8. We becameparticularly interested in this question after a blastocyst injection assayrevealed an unexpected finding. In general, progeny of injected ES cellsare found in the embryonic portion of the chimaera, but rarely in theplacental portion5,7 (Fig. 1a; shown with Rosa26-GFP). Surprisingly,injected STAP cells contributed not only to the embryo but also to theplacenta and fetal membranes (Fig. 1b and Extended Data Fig. 1a–c) in60% of the chimaeric embryos (Fig. 1c).
In quantitative polymerase chain reaction (qPCR) analysis, STAP cells(sorted for strong Oct4-GFP signals) expressed not only pluripotencymarker genes but also trophoblast marker genes such as Cdx2 (Fig. 1dand Supplementary Table 1 for primers), unlike ES cells. Therefore,the blastocyst injection result is not easily explained by the idea thatSTAP cells are composed of a simple mixture of pluripotent cells(Oct41Cdx22) and trophoblast-stem-like cells (Oct42Cdx21).
In contrast to STAP cells, STAP stem cells did not show the ability tocontribute to placental tissues (Fig. 1e, lanes 2–4), indicating that the
derivation of STAP stem cells from STAP cells involves the loss ofcompetence to differentiate into placental lineages. Consistent withthis idea, STAP stem cells show little expression of trophoblast markergenes (Fig. 1f).
We next examined whether an alteration in culture conditions couldinduce in vitro conversion of STAP cells into cells similar to tropho-blast stem cells8,9, which can be derived from blastocysts during pro-longed adhesion culture in the presence of Fgf4. When we culturedSTAP cell clusters under similar conditions (Fig. 2a; one cluster per wellin a 96-well plate), flat cell colonies grew out by days 7–10 (Fig. 2b, left;typically in ,30% of wells). The Fgf4-induced cells strongly expressedthe trophoblast marker proteins9–12 integrin a7 (Itga7) and eomesoder-min (Eomes) (Fig. 2c, d) and marker genes (for example, Cdx2; Fig. 2e).
These Fgf4-induced cells with trophoblast marker expression couldbe expanded efficiently in the presence of Fgf4 by passaging for morethan 30 passages with trypsin digestion every third day. Hereafter,these proliferative cells induced from STAP cells by Fgf4 treatmentare referred to as Fgf4-induced stem cells. This type of derivation intotrophoblast-stem-like cells is not common with ES cells (unless genet-ically manipulated)13 or STAP stem cells.
In the blastocyst injection assay, unlike STAP stem cells, the pla-cental contribution of Fgf4-induced stem cells (cag-GFP-labelled) wasobserved with 53% of embryos (Fig. 2f, g; n 5 60). In the chimaeric
STAP clusters
Passage
Outgrowth of trophoblast-like cells
Fgf4 medium7–10 days
a
b c
e
Integrin α7 Eomes
f
DAPI
cag-GFP
Placenta
d
h
Per
cent
age
of G
FP+
cells
Placental contribution
cag-GFP
Bright-field
Cross-section g Chimaeric fetus
1 32 4
FI-S
C 1FI
-SC 2
FI-S
C 3GFP
-ES 1
GFP-E
S 2GFP
-ES 3
5 6
0
20
10
18161412
2468
Day 1
Day 7
Passage 1(on feeder)
Passage 1(enlarged
view)
i j
Rel
ativ
e ex
pres
sion
leve
ls(T
S=1
.0)
cag-GFP
Rel
ativ
e ex
pres
sion
leve
ls(E
S=1
.0)
Fgf4 medium
Oct4-GFPBright-field
***
NS
NS
k
ES TSFG
F-ind
uced
STAP
CD45+
Oct4 Nanog Rex1
JAKi– JAKi+ JAKi– JAKi+ES FI-SC
00.20.4
0.60.8
1.2
1.0
00.20.40.60.8
1.2
1.0
Cdx2 Eomes Elf5 Itga7
JAKi–JAKi+
FI-SC
TS
0
5
10
15
20
25
30
Oct4
Cdx2
Eomes
Rel
ativ
e ex
pres
sion
leve
ls(E
S=1
.0)
ES STAP-SC
CD45
FI-SCTS
STAP
Hei
ght
0.00
0.10
0.15
0.05
au
100
100100
100
Figure 2 | Fgf4 treatment induces sometrophoblast-lineage character in STAP cells.a, Schematic of Fgf4 treatment to induceFgf4-induced stem cells from STAP cells. b, Fgf4treatment promoted the generation of flat cellclusters that expressed Oct4-GFP at moderatelevels (right). Top and middle: days 1 and 7 ofculture with Fgf4, respectively. Bottom: cultureafter the first passage. Scale bar, 50mm.c, d, Immunostaining of Fgf4-induced cells withthe trophoblast stem cell markers integrin a7(c) and eomesodermin (d). Scale bar, 50mm.e, qPCR analysis of marker expression.f, g, Placental contribution of Fgf4-induced stemcells (FI-SCs) (genetically labelled with constitutiveGFP expression). Scale bars: 5.0 mm (f (left panel)and g); 50mm (f, right panel). In addition toplacental contribution, Fgf4-induced stem cellscontributed to the embryonic portion at amoderate level (g). h, Quantification of placentalcontribution by FACS analysis. Unlike Fgf4-induced cells, ES cells did not contribute toplacental tissues at a detectable level. i, Cluster treediagram from hierarchical clustering of globalexpression profiles. Red, approximately unbiasedP values. j, qPCR analysis of Fgf4-induced cells(cultured under feeder-free conditions) with orwithout JAK inhibitor (JAKi) treatment forpluripotent marker genes. k, qPCR analysis ofFI-SCs with or without JAK inhibitor (JAKi)treatment for trophoblast marker genes. Values areshown as ratio to the expression level in ES cells(j) or trophoblast stem cells (k). ***P , 0.001;NS, not significant; t-test for each gene betweengroups with and without JAK inhibitor treatment.n 5 3. Statistical significance was all the same withthree pluripotency markers. None of thetrophoblast marker genes showed statisticalsignificance. Error bars represent s.d.
LETTER RESEARCH
3 0 J A N U A R Y 2 0 1 4 | V O L 5 0 5 | N A T U R E | 6 7 7
Macmillan Publishers Limited. All rights reserved©2014
LETTERdoi:10.1038/nature12969
Bidirectional developmental potential inreprogrammed cells with acquired pluripotencyHaruko Obokata1,2,3, Yoshiki Sasai4, Hitoshi Niwa5, Mitsutaka Kadota6, Munazah Andrabi6, Nozomu Takata4, Mikiko Tokoro2,Yukari Terashita1,2, Shigenobu Yonemura7, Charles A. Vacanti3 & Teruhiko Wakayama2,8
We recently discovered an unexpected phenomenon of somatic cellreprogramming into pluripotent cells by exposure to sublethal stim-uli, which we call stimulus-triggered acquisition of pluripotency(STAP)1. This reprogramming does not require nuclear transfer2,3
or genetic manipulation4. Here we report that reprogrammed STAPcells, unlike embryonic stem (ES) cells, can contribute to both embry-onic and placental tissues, as seen in a blastocyst injection assay.Mouse STAP cells lose the ability to contribute to the placenta aswell as trophoblast marker expression on converting into ES-likestem cells by treatment with adrenocorticotropic hormone (ACTH)and leukaemia inhibitory factor (LIF). In contrast, when culturedwith Fgf4, STAP cells give rise to proliferative stem cells with enhancedtrophoblastic characteristics. Notably, unlike conventional tropho-blast stem cells, the Fgf4-induced stem cells from STAP cells con-tribute to both embryonic and placental tissues in vivo and transforminto ES-like cells when cultured with LIF-containing medium. Taken
together, the developmental potential of STAP cells, shown by chi-maera formation and in vitro cell conversion, indicates that theyrepresent a unique state of pluripotency.
We recently discovered an intriguing phenomenon of cellular fateconversion: somatic cells regain pluripotency after experiencing sub-lethal stimuli such as a low-pH exposure1. When splenic CD451 lym-phocytes are exposed to pH 5.7 for 30 min and subsequently culturedin the presence of LIF, a substantial portion of surviving cells start toexpress the pluripotent cell marker Oct4 (also called Pou5f1) at day 2.By day 7, pluripotent cell clusters form with a bona fide pluripotencymarker profile and acquire the competence for three-germ-layer differ-entiation as shown by teratoma formation. These STAP cells can alsoefficiently contribute to chimaeric mice and undergo germline trans-mission using a blastocyst injection assay1. Although these charac-teristics resemble those of ES cells, STAP cells seem to differ from EScells in their limited capacity for self-renewal (typically, for only a few
1Laboratory for Cellular Reprogramming, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan. 2Laboratory for Genomic Reprogramming, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan. 3Laboratory for Tissue Engineering and Regenerative Medicine, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA. 4Laboratory for Organogenesisand Neurogenesis, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan. 5Laboratory for Pluripotent Stem Cell Studies, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan.6Genome Resource and Analysis Unit, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan. 7Electron Microscopy Laboratory, RIKEN Center for Developmental Biology, Kobe 650-0047, Japan.8Faculty of Life and Environmental Sciences, University of Yamanashi, Yamanashi 400-8510, Japan.
STAP chimera cag-GFPSTAP chimaera
Bright-fieldES chimaera cag-GFP
Bright-field
Long exposure
Long exposure
c
Contribution patternEmbryo-specificEmbryo+placenta+yolk sac
Per
cent
age
of e
mbr
yos
with
eac
h co
ntrib
utio
n pa
tter
n 100
0
908070605040302010
STAPES
d
Rel
ativ
e ex
pres
sion
leve
ls(E
S=1
.0)
Rel
ativ
e ex
pres
sion
leve
ls(T
S=1
.0)
0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
1.6
ES STAP CD45 TS
Oct4 Nanog Rex1
00.20.40.60.81.01.21.41.6
TS STAP CD45 ES
Cdx2 Eomes Elf5
***
eP
erce
ntag
e of
GFP
+ ce
lls
Placental contribution
1 32 4
0
20
10
18161412
2468
0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
Cdx2
Eomes
Elf5
Rel
ativ
e ex
pres
sion
leve
ls(T
S=1
.0)
f
a
b n = 50 n = 10
TS
STAP-S
C 1
STAP-S
C 2
STAP-S
C 3 TS
STAP-S
C ES
Figure 1 | STAP cells contribute to both embryonic and placental tissuesin vivo. a, b, E12.5 embryos from blastocysts injected with ES cells (a) andSTAP cells (b). Both cells are genetically labelled with GFP driven by aconstitutive promoter. Progeny of STAP cells also contributed to placentaltissues and fetal membranes (b), whereas ES-cell-derived cells were not foundin these tissues (a). Scale bar, 5.0 mm. c, Percentages of fetuses in which injectedcells contributed only to the embryonic portion (red) or also to placentaland yolk sac tissues (blue). ***P , 0.001 with Fisher’s exact test. d, qPCR
analysis of FACS-sorted Oct4-GFP-strong STAP cells for pluripotent markergenes (left) and trophoblast marker genes (right). Values are shown as ratio tothe expression level in ES cells. Error bars represent s.d. e, Contribution toplacental tissues. Unlike parental STAP cells and trophoblast stem (TS) cells,STAP stem cells (STAP-SCs) did not retain the ability for placentalcontributions. Three independent lines were tested and all showed substantialcontributions to the embryonic portions. f, qPCR analysis of trophoblastmarker gene expression in STAP stem cells. Error bars represent s.d.
6 7 6 | N A T U R E | V O L 5 0 5 | 3 0 J A N U A R Y 2 0 1 4
Macmillan Publishers Limited. All rights reserved©2014
4
統計的モデル選択の情報量規準
Z1
Z2 Y
観測部分 Y 潜在部分 Z
ここを評価したい!
興味がない!
【不完全データの情報量規準 PDIO】 H. Shimodaira (1994) Lecture Notes in Sta0s0cs 【不完全データの一部に興味がある場合の情報量規準】 原照雅,下平英寿 (2011) 統計関連学会連合大会 【covariate shi`,共変量シフト】 H. Shimodaira (2000) Journal of sta0s0cal planning and inference 引用数=357
230 H. Shimodaira / Journal of Statistical Planning and Inference 90 (2000) 227–244
Fig. 1. Fitting of polynomial regression with degree d=1. (a) Samples (xt ; yt) of size n=100 are generatedfrom q(y|x)q0(x) and plotted as circles, where the underlying true curve is indicated by the thin dotted line.The solid line is obtained by OLS, and the dotted line is WLS with weight q1(x)=q0(x). (b) Samples ofn = 100 are generated from q(y|x)q1(x), and the regression line is obtained by OLS.
On the other hand, MWLE !w is obtained by weighted least squares (WLS) withweights w(xt) for the normal regression. We again consider the model with d = 1,and the regression line !tted by WLS with w(x) = q1(x)=q0(x) is drawn in dotted linein Fig. 1a. Here, the density q1(x) for imaginary “future” observations or that for thewhole population in sample surveys is speci!ed in advance by
x ∼ N("1; #21); (2.4)
where "1 = 0:0; #21 = 0:32. The ratio of q1(x) to q0(x) is
q1(x)q0(x)
=exp(−(x − "1)2=2#21)=#1exp(−(x − "0)2=2#20)=#0
˙ exp!
− (x − "")2
2 "#2
"
; (2.5)
where "#2 = (#−21 − #−20 )−1 = 0:382, and "" = "#2(#−21 "1 − #−20 "0) =−0:28.
The obtained lines in Fig. 1a are very di#erent for OLS and WLS. The questionis: which is better than the other? It is known that OLS is the best linear unbiasedestimate and makes small mean squared error of prediction in terms of q(y|x)q0(x)which generated the data. On the other hand, WLS with weight (2.5) makes smallprediction error in terms of q(y|x)q1(x) which will generate future observations, andthus WLS is better than OLS here. To con!rm this, a dataset of size n = 100 isgenerated from q(y|x)q1(x) speci!ed by (2.2) and (2.4). The regression line of d= 1!tted by OLS is shown in Fig. 1b, which is considered to have small prediction errorfor the “future” data. The regression line of WLS !tted to the past data in Fig. 1a isquite similar to the line of OLS !tted to the future data in Fig. 1b. In practice, onlythe past data is available. The WLS gave almost the equivalent result to the futureOLS by using only the past data.
直接観測できない変数に興味がある場合
H. Shimodaira / Journal of Statistical Planning and Inference 90 (2000) 227–244 233
For su!ciently large n, loss1(!∗w) is the dominant term on the right-hand side of(4.1), and the optimal weight is w(x) = q1(x)=q0(x) as seen in Section 3. If n is notlarge enough compared with the extent of the misspeci"cation, the O(n−1) terms relatedto the "rst and second moments of !w−!∗w cannot be ignored in (4.1), and the optimalweight changes. In an extreme case where the model is correctly speci"ed, we onlyhave to look at the O(n−1) terms as shown in the following lemma.
Lemma 4. Assume there exists ! ∗ ∈" such that q(y|x) =p(y|x; ! ∗). Then; !∗w = ! ∗
and q(y|x) = p(y|x; !∗w) for all proper w(x). The expected loss E(n)0 (loss1(!w)) is
minimized when w(x) ≡ 1 for su!ciently large n.
5. Information criterion
The performance of MWLE for a speci"ed w(x) is given by (4.1). However, wecannot calculate the value of the expected loss from it in practice, because q(y|x) isunknown. We provide a variant of the information criterion as an estimate of (4.1).
Theorem 1. Let the information criterion for MWLE be
ICw:=− 2L1(!w) + 2 tr(JwH−1w ); (5.1)
where
L1(!) =n!
t=1
q1(x)q0(x)
logp(yt |xt ; !);
Jw =−E0
"
q1(x)q0(x)
@ logp(y|x; !)@!
#
#
#
#
!∗w
@lw(x; y|!)@!′
#
#
#
#
!∗w
$
:
The matrices Jw and Hw may be replaced by their consistent estimates
J w =−1n
n!
t=1
q1(xt)q0(xt)
@ logp(yt |xt ; !)@ !
#
#
#
#
!w
@lw(xt ; yt |!)@!′
#
#
#
#
!w
;
H w =1n
n!
t=1
@2lw(xt ; yt |!)@!@!′
#
#
#
#
!w
:
Then; ICw=2n is an estimate of the expected loss unbiased up to O(n−1) term:
E(n)0 (ICw=2n) = E(n)0 (loss1(!w)) + o(n
−1): (5.2)
Fortunately, expression (5.1) turns out to be rather simpler than that of (4.1), andwe do not have to worry about the calculation of the third derivatives appeared in(4.2). The explicit form of (5.1) for the normal regression is given in (6.1) and (6.2).Given the model p(y|x; !) and the data (x(n); y(n)), we choose a weight function w(x)
which attains the minimum of ICw over a certain class of weights. This is selectionof the weight rather than model selection. Searching the optimal weight over all the
H. Shimodaira / Journal of Statistical Planning and Inference 90 (2000) 227–244 233
For su!ciently large n, loss1(!∗w) is the dominant term on the right-hand side of(4.1), and the optimal weight is w(x) = q1(x)=q0(x) as seen in Section 3. If n is notlarge enough compared with the extent of the misspeci"cation, the O(n−1) terms relatedto the "rst and second moments of !w−!∗w cannot be ignored in (4.1), and the optimalweight changes. In an extreme case where the model is correctly speci"ed, we onlyhave to look at the O(n−1) terms as shown in the following lemma.
Lemma 4. Assume there exists ! ∗ ∈" such that q(y|x) =p(y|x; ! ∗). Then; !∗w = ! ∗
and q(y|x) = p(y|x; !∗w) for all proper w(x). The expected loss E(n)0 (loss1(!w)) is
minimized when w(x) ≡ 1 for su!ciently large n.
5. Information criterion
The performance of MWLE for a speci"ed w(x) is given by (4.1). However, wecannot calculate the value of the expected loss from it in practice, because q(y|x) isunknown. We provide a variant of the information criterion as an estimate of (4.1).
Theorem 1. Let the information criterion for MWLE be
ICw:=− 2L1(!w) + 2 tr(JwH−1w ); (5.1)
where
L1(!) =n!
t=1
q1(x)q0(x)
logp(yt |xt ; !);
Jw =−E0
"
q1(x)q0(x)
@ logp(y|x; !)@!
#
#
#
#
!∗w
@lw(x; y|!)@!′
#
#
#
#
!∗w
$
:
The matrices Jw and Hw may be replaced by their consistent estimates
J w =−1n
n!
t=1
q1(xt)q0(xt)
@ logp(yt |xt ; !)@ !
#
#
#
#
!w
@lw(xt ; yt |!)@!′
#
#
#
#
!w
;
H w =1n
n!
t=1
@2lw(xt ; yt |!)@!@!′
#
#
#
#
!w
:
Then; ICw=2n is an estimate of the expected loss unbiased up to O(n−1) term:
E(n)0 (ICw=2n) = E(n)0 (loss1(!w)) + o(n
−1): (5.2)
Fortunately, expression (5.1) turns out to be rather simpler than that of (4.1), andwe do not have to worry about the calculation of the third derivatives appeared in(4.2). The explicit form of (5.1) for the normal regression is given in (6.1) and (6.2).Given the model p(y|x; !) and the data (x(n); y(n)), we choose a weight function w(x)
which attains the minimum of ICw over a certain class of weights. This is selectionof the weight rather than model selection. Searching the optimal weight over all the
説明変数の分布が変化する場合
5
複雑ネットワークを仮定して推定
6
Paul Sheridan December 2010
Breast cancer discovery
35
2
3
1
2
3
1
Random structure prior Scale-free structure prior
�� = 2.28
1:FOXA1 is a known hub necessary for the expression of estrogen regulated genes.2:SLC39A6 is a previously unknown hub that has been implicated with the spread of breast cancer to the lymph nodes.3:E2F3 is a related to the expression of estrogen regulated genes.
P(G)=ランダムグラフ P(G)=複雑ネットワーク
【MCMCでベイズ事後確率の計算】 Sheridan, Kamimura, Shimodaira (2010) PLoS ONE
【L1正則化によるスパース推定】 小倉,廣瀬,下平 (2013) 日本計算機統計学会シンポ
1:FOXA1, 2:SLC39A6, 3:E2F3
breast cancer expression dataset
psf (G) proved versatile enough to recover random networks onpar with pr(G) itself. Above all, our analysis of the breast cancerexpression data illustrates the practical value of the scale-freestructure prior as an instrument to aid in the identification ofcandidate hub genes with the potential to direct the hypotheses ofmolecular biologists, and thus drive future experiments.
A node labeling s~ s1,s2, . . . ,sp
! ", that is, a permutation of
the integers 1,2, . . . ,p applied to the nodes of G so that each vi[Vis represented by the integer si, is an essential prerequisite for anyMCMC implementation of the scale-free structure prior. Thereason is that the scale-free network model underlying psf (G), orany other scale-free network model for that matter, is so elaboratethat the nodes are not interchangeable in regard to computing theprobability of G. And, although easily overshadowed by psf (G)itself, our new Metropolis-Hastings sampler for s is an innovativecontribution in its own right. Our sampler uses a simple pairswapping strategy for updating s, and one future topic of researchis to investigate the comparative performance of more ingeniousupdate schemes. More research is also required in order to assesseshow accurately s can be estimated.
We take pains to point out that while our implementation is forGGMs, the methodology described here applies to graphicalmodels more generally. For instance, psf (G) could be appliedcrudely to Bayesian network inference by simply ignoring edgedirectionality, or else the underlying static model could bemodified to have directed edges. The latter approach raises aninteresting consideration: in gene regulatory networks, accordingto the prevailing wisdom [31], it is actually only the out-degreedistribution that follows a power-law. By contrast, the in-degree of
a node is usually small and its distribution is better approximatedby a sort of restricted exponential function. While this distinctiongets blurred when inference is conducted with undirectedgraphical models, Bayesian networks provide an obvious incentivefor taking it into account. Indeed, Bayesian networks may prove tobe a more promising area of application because they currentlyable to handle much larger networks than GGMs [63].
Although the static model is not biologically motivated, it is adefensible choice as an underlying model for psf (G) on thegrounds that it is a simple model with the potential to describe anynetwork topology; not to mention that it includes the ER model asa limiting case. But there is more, implementing a structure priorbased on a growing network model poses some added difficultiesbecause not only will the probability of a network depend on thechoice of seed network, but evaluating P(GDh,s) will result in agreater expenditure of computational resources as the edgeinclusion probabilities depend on the order in which they wereadded to the network.
All the same, we implemented two other scale-free structurepriors based on growing models; one on the Poisson-growth,preferential attachment model [64], and another on thebiologically meaningful duplication model. In the former case,we were able to get away with using a single node as the seednetwork, and we found that while this prior recovered heavy-tailednetworks as well as psf (G), yet it understandably struggled toaccurately recover random networks. Meanwhile, the duplicationmodel based structure prior was highly sensitive to the choice ofseed network in addition to being unstable due to the complexityof the model. One future avenue of research is to adapt these
Figure 5. Gene interactions identified by all methods. A subnetwork of the gene interactions involved with the estrogen receptor gene, ESR1,that were commonly identified by the random structure prior, pr(G), the scale-free structure prior, psf (G), and the relevance network (estimated withARACNE). The directed edges indicate the gene interactions on which the Bayesian network (estimated with BANJO) agreed with the other threemethods.doi:10.1371/journal.pone.0013580.g005
A Scale-Free Structure Prior
PLoS ONE | www.plosone.org 10 November 2010 | Volume 5 | Issue 11 | e13580
ネットワーク成長メカニズムの推定
7
【複雑ネットワークの生成モデルにおける優先的選択関数のノンパラメトリック推定】 Pham, Sheridan, 下平 (2013) 統計関連学会連合大会
優先的選択関数𝐴
2
有名人の アカウント
k = 10000
自分の アカウント
k = 2
新しいアカウント
Youtubeのアカウント間のフォロー関係
𝑃 𝑘 ∝ 𝐴 :次数𝑘のノードに繋がる確率 𝐴 = 𝑘 Barabasi-Albert(1999) 𝐴 = 𝑘 Newman(2001)
恐らく次数の高い方に
どっちにフォローするか?
提案手法で推定した𝐴
3
𝑘
1e-08
10e-07
10e-05
1e+00 1e+01 1e+02 1e+03 1e+04 1e+05
𝐴
Youtube データ ノード数 : 約3 ∗ 10 エッジ 数 : 約20 ∗ 10
ブートストラップ法による信頼度計算
遺伝子発現のクラスタリングや分子進化の系統樹を例として ベイズ事後確率とp値の関係を探る
下平英寿 大阪大学 大学院基礎工学研究科
8
チンパンジー
ヒト
ゴリラ
オランウータン
マウス
塩基置換数から系統樹を推定 NC_001807 NC_012920 NC_001643 NC_001645 NC_002083 NC_012920 18
NC_001643 1188 1185 NC_001645 1481 1479 1420
NC_002083 1994 2001 2090 2116
NC_010339 4069 4068 4052 4102 4127
推定された系統樹 (無根系統樹) 推定された系統樹 (有根系統樹)
置換数 (推定値)
ココに根があると考える(マウス以外)
4052
4127
2116
1479
1188
9
紙テープで系統樹を工作
10
hdp://www.youtube.com/watch?v=jqqJODQytzk YouTubeにも置きました
11
ヒトの起源はアフリカ
Ingman et al. (2000) Nature
アフリカ以外
アフリカ
チンパンジー
ヒトとチンプの 共通祖先
アフリカ大陸
ヨーロッパ大陸
進出
98
98
98
82
100
100
哺乳類の系統樹 (32種)
12
ML tree topology: ((G1,G2), (G3,G4), G5)
Fig 1 of Shimodaira and Hasegawa (2005) from the book (ed. Nielsen) Data: mt protein sequences of n=3392 amino acids for s=32 species
枝の小さい数字は推定結果の信頼度
Comparing 15 trees (cont.)
13
系統樹の信頼度は?
human cow
rabbit mouse
opposum
nD 計算結果
サンプルサイズ n
D∞計算 ?
計算
14
「計算」はブラックボックスとみなす
human cow
rabbit mouse
opposum
ℓ1(θ1)
ℓ2(θ2 )
ℓ15(θ15)
max
nD
計算
計算結果
15
反復計算してカウントだけ使う
nD 計算 Yes
Bootstrap Probability
ブートストラップ確率
*1nD
*2nD
*10000nD
計算 Yes
計算 No
計算 Yes
16
BP = #{Yes}
10000≈ P(Yes | Data)
1 2 3 n
1 2 3 m
ランダムに要素をコピーする
ブートストラップ・リサンプリング Efron (1979) Felsenstein (1985)
nD
Dm*1
Dm*2
Dm*10000
17
Bootstrap: n=10,242 columns human
cow rabbit mouse
opposum
18
Dn
Assume i.i.d. for columns
Dm*1,..., Dm
*100
0 < m nIn the “m-‐out-‐of-‐n bootstrap”
m = O(n)We assume , typically m=10242, 5000, 20000, say
, typically m=30, say
BPはベイズ事後確率
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
nD D∞ 19
*1 *2 *3, , ,...n n nD D D BP = P(Tree | data) ≈ #{Tree}
100
Efron and Tibshirani (1998)
New Discovery?
58% 2%
20
ベイズの事後確率
21
1=human 3=rabbit 4=mouse 1=human 3=rabbit 4=mouse
0.05p ≥
ブートストラップ法(m=n)で計算
Shimodaira-‐Hasegawa testは安全すぎる
SH 0.05≥
22
頻度論のp-‐値 (近似的に不偏 AU)
23
1=human 3=rabbit 4=mouse
マルチスケール・ブートストラップ法(m=-‐n)で計算???
ベイズと頻度論のギャップを埋める
NBP(σ 2 ) ! Φ σ Φ−1 BP(σ 2 )( ){ }
正規化したブートストラップ確率(NBP)
σ 2 = n
m2 1σ =2 1σ = − 0
NBP(σ 2 )
AU = NBP(−1)
BP = NBP(+1)
m = n m = −n m = ∞
24
Thomas Bayes (1702-‐1761)
Jerzy Neyman (1894-‐1981)
頻度論 ベイズ統計学
スケーリング則
AUの計算手順: いくつかのm>0でBPを計算して,NBPに変換してからm=-‐nへ外挿
正規分布の上側確率
( ) ( )z
z x dxφ∞
Φ = ∫
2 /212
( ) xx eπ
φ −=
-4 -2 0 2 4
0.0
0.1
0.2
0.3
0.4
-4 -2 0 2 4
0.0
0.2
0.4
0.6
0.8
1.0
1.64
(1.64) 0.05Φ =
25
もしパラメータにアクセスすれば簡単
0θ
0θ
θ
θ
*θ
*θ
ブートストラップ分布をひっくり返す
データからブートストラップ分布
p-‐value 帰無仮説からシミュレーション
BP(1)
0:H θ θ≤
26
=
確率分布の空間の幾何学で証明
θ θ
頻度論のp-‐値
AU = Φ β0 − β1( ) +O(n−3/2 )
0β
1β
Hθ
ベイズの事後確率
BP = Φ β0 + β1( ) +O(n−3/2 )
0β1β
1β−
27
Jerzy Neyman (1894-‐1981)
頻度論
Thomas Bayes (1702-‐1761)
ベイズ統計学
Rescaling the whole picture
2 3/201( ) ( )BP O nβσ β σ
σ−⎡ ⎤=Φ + +⎢ ⎥⎣ ⎦
BP = Φ β0 + β1( ) +O(n−3/2 )
2 1σ =
Shimodaira (2002)
28
2 1σ =
1σ
×
NBP(σ 2 ) = Φ σ Φ−1 BP(σ 2 )( ){ } = Φ β0 + β1σ
2( ) +O(n−3/2 )
Scaling-‐law and geometry
NBP(σ 2 ) = Φ β0 + β1σ
2 + β2σ4 + β3σ
6 +( ) +O(h2 )
Using a new asympto0c theory with “nearly flat surface h” instead of large n.
Shimodaira (2008)
29
β j (u) = 1
2 j j!j!
j1! jp−1!∂2 j h(u)
∂u12 j1∂up−1
2 jp−1j1++ jp−1= j∑
distance mean curvature
v
u=(u1,..., up-‐1) ( )v h u= −
p-value = Φ β0 − β1 + β2 − β3 +( ) +O(h2 )
k=3: AU=0.052 p=0.05
BP=0.0078 k=2: AU=0.057
2σ-‐1 1
30
Extrapola0ng NBP to m=-‐n
σ Φ−1 BP(σ 2 )( )
ψ (σ 2 ) = β0 + β1σ2 + β2σ
4 ++ βk−1σ2(k−1)Fiqng a model :
Ploqng : NBP(σ 2 ) ! Φ σ Φ−1 BP(σ 2 )( ){ }
What NBP is calcula0ng ?
2 1σ = 2 1σ = −
Shimodaira (2008)
31
NBP(−1) = p-value +O(n−3/2 )
NBP(σ 2 ) = Φ β0 + β1σ
2( ) +O(n−3/2 )
curvature
Applica0ons to Life Science
Brambrink et al (PNAS 2006)
http://cran.r-project.org/web/packages/pvclust/index.html
Mathema0cal Sta0s0cs/Computer Science
7mins@700cpus 80hours@1cpu
TSUBAME 32
GPGPUでブートストラップを並列計算
3
GPGPU でどれくらい高速化されるか遺伝子発現データの階層型クラスタリング
永田(2012)によるGPUとCPUのハイブリット並列
Suzuki and Shimodaira (2006) のpvclustパッケージ
高速化 33