How not to make a fool of yourself with model selection
Paul Johnson ([email protected], @paulcdjo)
IBAHCM PostDoc/PI seminar, 29 May 2015
“Reproducibility is the ability of an entire experiment or study to be reproduced, either by the researcher or by someone else working independently. It is one of the main principles of the scientific method” (Wikipedia)

“non-reproducible single occurrences are of no significance to science” (Karl Popper, The Logic of Scientific Discovery)
A crisis of irreproducibility?

[Figure: Schoenfeld & Ioannidis, Am J Clin Nutr 2013;97(1):127–34. Most foods are associated with both increased and decreased risk of cancer.]
How not to make a fool of yourself with P values
• “You make a fool of yourself if you declare that you have discovered something, when all you are observing is random chance.”
• “…what matters is the probability that, when you find that a result is ‘statistically significant’, there is actually a real effect.”
• “If you find a ‘significant’ result when there is nothing but chance at play, your result is a false positive, and the chance of getting a false positive is often alarmingly high.”
• This probability is called the false discovery rate (FDR): FDR = P(no effect | significant result)

Colquhoun D. 2014. An investigation of the false discovery rate and the misinterpretation of p-values. R. Soc. Open Sci. 1: 140216
The false discovery rate (FDR)
• No. of discoveries = 80 + 45 = 125
• No. of false discoveries = 45
• False discovery rate = 45/125 = 36%
• We know the number of discoveries, but not the number of false discoveries…
• …though we can guess it: no. of false discoveries < 50, so FDR < 50/125 = 40%
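The arithmetic behind these bullets follows Colquhoun's standard worked example; the exact mix (1000 tests, 10% of hypotheses real, power 0.8, α = 0.05) is an assumption read off the slide's numbers. A minimal sketch:

```python
# FDR arithmetic for a batch of significance tests.
# Assumed inputs (read off the slide's example): 1000 tests, 10% of
# hypotheses correspond to real effects, power = 0.8, alpha = 0.05.
n_tests = 1000
prevalence = 0.10  # P(real effect)
power = 0.80       # P(significant | real effect)
alpha = 0.05       # P(significant | no effect)

true_pos = n_tests * prevalence * power          # 80 true discoveries
false_pos = n_tests * (1 - prevalence) * alpha   # 45 false discoveries
fdr = false_pos / (true_pos + false_pos)         # 45/125 = 0.36
print(f"discoveries = {true_pos + false_pos:.0f}, FDR = {fdr:.0%}")
```

Note that the FDR depends on the prevalence of real effects, which is exactly the quantity we never observe directly.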
OK, I’m alarmed, but what has this got to do with model selection?
• The alarmingly high risk of a “significant” result being false potentially applies to any statistical method (not only significance testing) that divides hypotheses into “hits” and “misses”
• Simple model selection methods, in particular stepwise selection, are prone to making random noise look like discoveries
• The false discovery rate provides a simple way to:
  o illustrate the unreliability of stepwise selection
  o potentially make it more reliable
What is model selection?
• A method of statistical inference (learning from data) that selects the best model from a set of several candidate models
• Very commonly applied to regression models
• There’s a great deal of debate about how (and how not) to do model selection; no time to get into this here
What is stepwise model selection?
• Aims to identify the subset of the p explanatory variables x1, x2, …, xp that best explains variation in the response variable, y
• Backwards stepwise selection:
  1. Fit the full regression model: y = β0 + β1x1 + β2x2 + … + βpxp + ε
  2. Drop the weakest x (e.g. largest P value) and refit
  3. Repeat step 2 until all surviving x are significant
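The backwards loop can be sketched in a few lines; this is an illustrative Python implementation (plain least squares with t-tests), not the talk's actual R code:

```python
import numpy as np
from scipy import stats

def backwards_stepwise(X, y, alpha=0.05):
    """Backwards stepwise selection by largest P value (illustrative
    sketch). Returns indices of the columns of X that survive."""
    selected = list(range(X.shape[1]))
    while selected:
        # Fit OLS with intercept on the currently selected predictors.
        Xs = np.column_stack([np.ones(len(y)), X[:, selected]])
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta
        df = len(y) - Xs.shape[1]
        sigma2 = resid @ resid / df
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xs.T @ Xs)))
        # Two-sided t-test P values; drop the intercept's entry.
        pvals = 2 * stats.t.sf(np.abs(beta / se), df)[1:]
        worst = int(np.argmax(pvals))
        if pvals[worst] < alpha:   # every survivor significant: stop
            return selected
        selected.pop(worst)
    return selected
```

With the talk's simulated setup (weak effects, n = 50) the selected set varies from run to run; that instability is part of the point.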
• Forwards stepwise selection:
  1. Fit the minimal regression model: y = β0 + ε
  2. Add the strongest x (smallest P value) if significant
  3. Repeat step 2 until the strongest remaining predictor is not significant
• For both directions, we divide the p predictors into:
  o Selected: β ≠ 0
  o Not selected: β = 0
Problems with simple stepwise model selection
• Overconfidence and bias in the selected model:
  o P values underestimated due to multiple testing, which leads to selection of too many variables and is difficult to adjust for
  o Uncertainty (standard errors, CIs) underestimated
  o Effect sizes (slope, R², etc.) overestimated
• Poor search algorithm:
  o Inconsistent, e.g. forwards ≠ backwards (sometimes)
  o The majority of candidate models go unexplored
Example using simulated data
• We have a continuous response, y, that we would like to explain using 20 continuous explanatory variables, x1 to x20
• n = 50 observations
• We plan to use backwards stepwise selection:
  1. Fit the maximal model
  2. Drop the weakest x (largest P value) and refit
  3. Repeat step 2 until P < 0.05 for all surviving x
• Initial (full) model: y = β0 + β1x1 + β2x2 + … + β20x20 + ε
• The true values of the slopes β1–β20 are:
  o β1–β3 = 0.3
  o β4–β20 = 0
• So the correct model is: y = β0 + β1x1 + β2x2 + β3x3 + ε
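Data like this can be generated in a few lines. The error standard deviation of 1 and the seed are assumptions; the slides do not state them:

```python
import numpy as np

rng = np.random.default_rng(1)     # seed chosen arbitrarily
n, p = 50, 20
beta = np.zeros(p)
beta[:3] = 0.3                     # beta_1..beta_3 = 0.3; the rest are 0
X = rng.normal(size=(n, p))        # 20 continuous predictors
y = X @ beta + rng.normal(size=n)  # epsilon ~ N(0, 1) is an assumption
```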
[Figure: scatterplots of y against each of the 20 predictors x1–x20 (n = 50); x-axes run from −3 to 3, y-axes from −2 to 4.]
[Figure sequence: slope estimates ±95% CI as backwards selection runs from 20 down to 7 predictors remaining. The true R² is 22% throughout, but the estimated adjusted R² is inflated at every step, ranging from 34% to 46%.]
[Figure sequence: the same backwards selection applied to null data (true R² = 0%), from 20 down to 5 predictors remaining. The estimated adjusted R² ranges from 21% to 39% even though no predictor has any real effect.]
Recap
• Backwards stepwise selection gave:
  o 7 discoveries on the real data
  o 5 false discoveries on data with permuted y
  o FDR = 5/7 = 71%?
• The number of false discoveries is random, so we need to average over many permutations
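The permutation recipe can be sketched as below. For brevity the selection rule here is a stand-in (marginal correlation tests rather than the talk's full backwards stepwise run); any selection procedure can be substituted for `n_discoveries`:

```python
import numpy as np
from scipy import stats

def n_discoveries(X, y, alpha=0.05):
    """Stand-in selection rule: count predictors whose marginal
    correlation with y is significant at alpha. Swap in the full
    stepwise procedure here to reproduce the talk's analysis."""
    n = len(y)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return int((2 * stats.t.sf(np.abs(t), n - 2) < alpha).sum())

def estimate_fdr(X, y, n_perm=1000, seed=0):
    """Average the number of discoveries over permutations of y (which
    destroys any real x-y association), then divide by the number of
    discoveries on the unpermuted data."""
    rng = np.random.default_rng(seed)
    observed = n_discoveries(X, y)
    if observed == 0:
        return 0.0
    false = np.mean([n_discoveries(X, rng.permutation(y))
                     for _ in range(n_perm)])
    return false / observed
```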
[Figure: distribution of the number of false discoveries from 1000 permutations at stringency k = 3.84. With 7 discoveries and a mean of 2 false discoveries, the estimated FDR = 2/7 = 28%.]
Now what?
• Having an estimate of FDR = P(making a fool of ourselves) is useful in itself
• Now that we can estimate the FDR, we can increase the stringency of the selection criterion until the FDR is acceptable
Effect on FDR of increasing selection criterion stringency

[Figure: relationship between FDR and test stringency (k). The estimated FDR falls as k increases; the conventional thresholds min(AIC) and P < 0.05 sit at the low-stringency, high-FDR end, while k = 7.5 gives FDR = 20%.]
Final model aiming for FDR = 20%

[Figure: slope estimates ±95% CI for the final model selected at FDR = 20%. True R² = 22%; estimated adjusted R² = 14%.]
How can we avoid making fools of ourselves with model selection?
• Is model selection necessary at all?
• Is automatic selection appropriate, i.e. are all hypotheses equally plausible?
• Avoid stepwise selection; use superior methods, e.g. the lasso
• If using stepwise selection:
  o gauge the reliability of the results (e.g. by monitoring the FDR)
  o control their reliability (e.g. by controlling the FDR)
• Transparency is the last defence against folly!
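A glimpse of the lasso alternative mentioned above: the L1 penalty shrinks weak coefficients exactly to zero, doing selection and estimation in one step. This is a minimal coordinate-descent sketch; in practice you would use a tuned library implementation (e.g. glmnet or scikit-learn) with the penalty chosen by cross-validation:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso via cyclic coordinate descent, minimising
    0.5*||y - X b||^2 + lam*||b||_1 (illustrative sketch)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)           # per-column sum of squares
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: remove all predictors except x_j.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            # Soft-thresholding update sets weak coefficients to 0.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return beta
```

Unlike stepwise P-value thresholds, the penalty `lam` shrinks every coefficient, which tempers the overestimated effect sizes seen in the figures above.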
Conclusions
• The “crisis of irreproducibility” is harmful and we need to avoid contributing to it
• We need to be aware of the (un)reliability of our findings…
  o “Some scientists have unreasonable expectations of replication of results” (Stephen Senn)
• …so we need to understand the properties of our statistical analyses:
  o Banning P values is not the answer
  o Better statistical understanding, from design to analysis, is part of the answer