Feature selection can hurt model inference

Feature Selection Can Hurt Model Inference Wayne Tai Lee

Transcript of Feature selection can hurt model inference

Page 1: Feature selection can hurt model inference

Feature Selection Can Hurt Model Inference

Wayne Tai Lee

Page 2: Feature selection can hurt model inference

I can see this happen...

- Deep diving into your A/B test data:
  - Your A/B test said the new feature did not improve the target metric
  - It is tempting to deep-dive into the A/B test data to see if you can detect an effect through fancy modeling
  - Part of your modeling involves some feature selection
  - You now detected an impact of your feature!

Page 3: Feature selection can hurt model inference

Prep work - what is a collider?

[Diagram: A → C ← B]

- Let A be a coin toss
- Let B be a separate coin toss
- Let C be “Were the coin toss outcomes from A and B the same?”
- C is a “collider”

Page 4: Feature selection can hurt model inference

Colliders can pass knowledge between their causes

- Knowing A doesn’t tell you anything about B.

- BUT if you know C in addition, you know B, even if B is not measured.
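
The coin-toss collider can be checked numerically. The following is a minimal sketch in R; the seed, the sample size, and the variable names `cor_marginal`, `recovered_B` and `all_recovered` are illustrative choices, not part of the talk.

```r
set.seed(1)                 # arbitrary seed, only for reproducibility
n <- 10000
A <- rbinom(n, 1, 0.5)      # first coin toss
B <- rbinom(n, 1, 0.5)      # separate, independent coin toss
C <- as.integer(A == B)     # collider: did the two tosses match?

# On its own, A tells you nothing about B: the correlation is near 0
cor_marginal <- cor(A, B)

# Once C is also known, B follows from A exactly:
# if the tosses matched, B equals A; otherwise B is the opposite of A
recovered_B <- ifelse(C == 1, A, 1 - A)
all_recovered <- all(recovered_B == B)

print(cor_marginal)
print(all_recovered)
```

B never has to be observed for this to work: A and C together determine it.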

Page 5: Feature selection can hurt model inference

Possible Tech World Issue

[Diagram: nodes A, B, C and S; the link from A to S is marked “?”]

- A is a new Ad campaign
- C is the total Clicks (engagement) of the user
- B is the Background of the user
- S indicates whether the user signed up

Page 6: Feature selection can hurt model inference

We want to know if the Ad affected sign-up chances

Page 7: Feature selection can hurt model inference

Ads affect click rate (with calls to action)

Page 8: Feature selection can hurt model inference

Your user’s background affects their clicks

Page 9: Feature selection can hurt model inference

A user’s background, such as age, industry, and personal interests, will also affect sign-up

Page 10: Feature selection can hurt model inference

Your A/B testing should have eliminated the correlation between background and the exposure to Ads

[Diagram: the link between B and A is crossed out]

Page 11: Feature selection can hurt model inference

If you did not perform an A/B test, your ad could have been shown to people who would sign up anyway

Page 12: Feature selection can hurt model inference

Clicks themselves should not drive sign-ups; it is more common that something else is driving both clicks and sign-ups

[Diagram: the link from C to S is crossed out]

Page 13: Feature selection can hurt model inference

Most likely you cannot or do not know what to measure for the background

Page 14: Feature selection can hurt model inference

Feature selection that predicts sign-ups will likely pick up C since B drives both C and S.

Pred(S) = func(A, C)

Page 15: Feature selection can hurt model inference

Even if A and C do not affect S….

Prob(S) = f(B)

Page 16: Feature selection can hurt model inference

Even if A and C do not affect S…. Now if you predict S based on A and C, you will detect an effect from A!

Prob(S) = f(B), but Pred(S) = func(A, C)

Page 17: Feature selection can hurt model inference

Implication: you will think your ads matter for sign-ups, even though you performed an A/B test!

Page 18: Feature selection can hurt model inference

Intuition: your predictive model picked up knowledge of B through C.

Page 19: Feature selection can hurt model inference

Solution: stick to classic A/B testing and do not add features without careful thought!

Page 20: Feature selection can hurt model inference

Simulation: Generate the data

B = Uniform(0, 1)
A = Bernoulli(0.2)
C = 1[B > 0.5] * ceiling(Exponential(1/(B + A)))
S = Bernoulli(B)
------------------------------------
n = 10000
B = runif(n, 0, 1)
A = rbinom(n, 1, 0.2)
C = ifelse(B > 0.5, ceiling(rexp(n, 1/(B + A))), 0)
S = rbinom(n, 1, B)
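
As a sanity check on the generated data, one can confirm that the simulated A/B test behaves like a randomized experiment: exposure A should be uncorrelated with background B, and the raw sign-up rate should be about the same in both arms. The sketch below reuses the generating code above; the seed and the names `cor_AB`, `signup_by_arm` and `arm_gap` are assumptions added for illustration.

```r
set.seed(7)                                           # arbitrary seed (none is given on the slide)
n <- 10000
B <- runif(n, 0, 1)                                   # background, normally unobserved
A <- rbinom(n, 1, 0.2)                                # randomized ad exposure
C <- ifelse(B > 0.5, ceiling(rexp(n, 1/(B + A))), 0)  # clicks, driven by both A and B
S <- rbinom(n, 1, B)                                  # sign-up, driven by B only

cor_AB <- cor(A, B)                                   # randomization: should be near 0
signup_by_arm <- tapply(S, A, mean)                   # mean sign-up rate per arm ("0", "1")
arm_gap <- unname(signup_by_arm["1"] - signup_by_arm["0"])

print(cor_AB)
print(signup_by_arm)
```

Both quantities should sit close to zero, which is what the plain A/B comparison is expected to report, since S does not depend on A in this setup.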

Page 21: Feature selection can hurt model inference

Simulation: Feature selection is often some form of correlation check

> cor(S, C)
[1] 0.3083319
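
A correlation-based feature screen of this kind might look like the sketch below. The cut-off `threshold <- 0.1` and the names `candidates`, `cors` and `selected` are hypothetical; the talk does not specify a selection rule, and B is left out of the candidates because it is assumed to be unmeasured.

```r
set.seed(42)                                          # arbitrary seed
n <- 10000
B <- runif(n, 0, 1)
A <- rbinom(n, 1, 0.2)
C <- ifelse(B > 0.5, ceiling(rexp(n, 1/(B + A))), 0)
S <- rbinom(n, 1, B)

# Candidate features available to the analyst (B is treated as unmeasured)
candidates <- data.frame(A = A, C = C)

# Screen each candidate by its correlation with the outcome
cors <- sapply(candidates, function(x) cor(x, S))
threshold <- 0.1                                      # hypothetical cut-off
selected <- names(cors)[abs(cors) > threshold]

print(round(cors, 3))
print(selected)
```

C passes the screen even though clicks do not cause sign-ups in the simulation; it is selected only because B drives both.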

Page 22: Feature selection can hurt model inference

Simulation: We detect a strong effect from A

> summary(lm(S ~ A + C))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.381158   0.006409  59.470  < 2e-16 ***
A           0.056145   0.011937   4.704 2.59e-06 ***
C           0.119463   0.003646  32.766  < 2e-16 ***

Page 23: Feature selection can hurt model inference

Simulation: No problems if you include everything (often not feasible in real life)

> summary(lm(S ~ A + C + B))

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.0002407  0.0085286   0.028    0.977
A            0.0108208  0.0103250   1.048    0.295
C           -0.0048050  0.0037920  -1.267    0.205
B            1.0031360  0.0171023  58.655   <2e-16 ***

Page 24: Feature selection can hurt model inference

Simulation: Also no problems if you don’t add the extra variables from feature selection

> summary(lm(S ~ A))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.498498   0.005593  89.125   <2e-16 ***
A           0.012458   0.012482   0.998    0.318
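
The three fits can be put side by side by pulling out only the A term from each model. This is a sketch under the same data-generating code as the simulation slide; the helper `a_term`, the table `results` and the seed are illustrative additions. The printed values will not match the slides, which do not record a seed; what matters is whether the A term looks significant.

```r
set.seed(2024)                                        # arbitrary seed
n <- 10000
B <- runif(n, 0, 1)
A <- rbinom(n, 1, 0.2)
C <- ifelse(B > 0.5, ceiling(rexp(n, 1/(B + A))), 0)
S <- rbinom(n, 1, B)

# Hypothetical helper: estimate and p-value of the A term from a fitted lm
a_term <- function(fit) coef(summary(fit))["A", c("Estimate", "Pr(>|t|)")]

results <- rbind(
  "S ~ A + C"     = a_term(lm(S ~ A + C)),      # selected feature C added
  "S ~ A + C + B" = a_term(lm(S ~ A + C + B)),  # everything included
  "S ~ A"         = a_term(lm(S ~ A))           # classic A/B comparison
)

print(round(results, 4))
```

Only the first specification, which adjusts for C without B, should show a clearly non-zero A term.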

Page 25: Feature selection can hurt model inference

Even with experimental data, adding features to your model without careful thought can lead to wrong inference!

Page 26: Feature selection can hurt model inference

How to spot this?

Page 27: Feature selection can hurt model inference

Recall that engagement metrics are ultimately proxies of more important indicators

Page 28: Feature selection can hurt model inference

If your model detected an impact when you know it is not true, your model likely picked up something else...

> summary(lm(S ~ A + C))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.381158   0.006409  59.470  < 2e-16 ***
A           0.056145   0.011937   4.704 2.59e-06 ***
C           0.119463   0.003646  32.766  < 2e-16 ***

Page 29: Feature selection can hurt model inference

Yes, looking at the features that you don’t really care about can help!

> summary(lm(S ~ A + C))

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.381158   0.006409  59.470  < 2e-16 ***
A           0.056145   0.011937   4.704 2.59e-06 ***
C           0.119463   0.003646  32.766  < 2e-16 ***
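
One way to act on this is to inspect the "uninteresting" feature directly. The sketch below is a suggested diagnostic, not a procedure from the talk: it checks whether the treatment A moves C, and whether C appears to "drive" S even though clicks are not believed to cause sign-ups. If both hold, C is a post-treatment proxy for something else and adjusting for it is suspect. The names `p_A_on_C`, `p_C_on_S` and `post_treatment_warning`, the seed, and the 0.05 level are assumptions.

```r
set.seed(11)                                          # arbitrary seed
n <- 10000
B <- runif(n, 0, 1)
A <- rbinom(n, 1, 0.2)
C <- ifelse(B > 0.5, ceiling(rexp(n, 1/(B + A))), 0)
S <- rbinom(n, 1, B)

# Does the treatment change the selected feature?
p_A_on_C <- coef(summary(lm(C ~ A)))["A", "Pr(>|t|)"]

# Does the selected feature look like it drives the outcome?
p_C_on_S <- coef(summary(lm(S ~ A + C)))["C", "Pr(>|t|)"]

# Flag the feature when both checks fire (0.05 is a conventional, arbitrary level)
post_treatment_warning <- (p_A_on_C < 0.05) && (p_C_on_S < 0.05)

print(c(p_A_on_C = p_A_on_C, p_C_on_S = p_C_on_S))
print(post_treatment_warning)
```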

Page 30: Feature selection can hurt model inference

Questions?

Send me a LinkedIn message!
https://www.linkedin.com/in/waynetailee/