How radiotherapy influences the survival of lung cancer patients

Luqi Kong

2016/05/05

Contents

1 Introduction
2 Data description
3 Check for balance
3.1 Chi-square test of independence
3.2 Standardized Mean Difference
4 Matching
5 Test for causal effect
6 Further implications
7 Bibliography
Appendix

1 Introduction

Lung cancer is the leading cause of cancer deaths in both men and women in the U.S. and worldwide. Treatment of lung cancer can involve a combination of surgery, chemotherapy, and radiation therapy, as well as newer experimental methods. None of these treatments is guaranteed to benefit the patient. Radiation therapy, for example, may help to control the cancer and prolong the patient's life, but it also has some chance of spreading the tumor to other sites of the body.

This paper examines how radiation therapy influences the survival of lung and bronchus cancer patients: 1) whether radiation has any causal effect on patients' survival; 2) the direction of the effect, if there is a causal relationship between radiotherapy and survival; and 3) the magnitude of the effect, if there is such a causal relationship.

The first part of the paper describes the data extracted from the SEER database. The second part checks the balance of the covariates using the chi-square test of independence and the standardized mean difference. The third part matches on the covariates that are not well balanced, using nearest neighbor matching, optimal matching and full matching. The fourth part estimates the treatment effect from the matched data and draws conclusions about the research questions. The final part discusses further implications of the paper, including unobserved biases and instrumental variables.

2 Data description

The data in this paper come from the SEER database, 1973-2013. The Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute provides information on cancer statistics in an effort to reduce the burden of cancer among the U.S. population.

1,321 cases were extracted from the database using the following criteria: 1) the cancer site is lung and bronchus; 2) the year of diagnosis is 2005; 3) the SEER registry is San Francisco-Oakland SMSA-1973+, San Jose-Monterey-1992+, Los Angeles-1992+, or California excluding SF/SJM/LA-2000+; 4) the patient underwent surgery; 5) the patient died in or before 2013; 6) the patient was over 50 years old.

The outcome of this study is survival in months. The treatment is beam radiation therapy, and the control group received no radiation therapy. The covariates are age, gender, registry site, cancer stage, tumor size, lymph node involvement and metastasis. (See Table 1 for a detailed description.)

Patients in the treatment group lived 27 months on average (SD: 23.34), while patients in the control group lived 34 months on average (SD: 27.19). Patients who are older, have a higher cancer stage, or have lymph node or metastasis involvement tend to have shorter survival. Female patients survive longer than male patients. Patients in the Los Angeles registry have shorter survival on average than patients in the other registries. (See Table 2 and Figure 1. The violin plots in Figure 1 show the distribution of survival months for each value of the covariates; a wider violin indicates a larger sample size.)

3 Check for balance

3.1 Chi-square test of independence

The chi-square test of independence takes as its null hypothesis that the treatment a patient receives is independent of a given covariate. Let $f_{ij}$ be the observed frequency count of events belonging to both the i-th category of x and the j-th category of y, and let $e_{ij}$ be the corresponding expected count if x and y are independent.

The null hypothesis of independence is rejected if the p-value of the following chi-square test statistic is less than a given significance level α:

$$\chi^2 = \sum_{i,j} \frac{(f_{ij} - e_{ij})^2}{e_{ij}}$$

For example, if treatment is independent of lymph node involvement, then $e_{11}$ = e(treatment=0, lymph nodes=0) = 816 × 0.809 = 660, while the observed count is $f_{11}$ = f(treatment=0, lymph nodes=0) = 731. The contribution of this cell is $\chi^2_{11} = (731 - 660)^2 / 660 = 7.638$ (slightly different from the 7.562 given by R, because of rounding in my calculation).

             | Lymph nodes involvement
   Treatment |         0 |         1 | Row Total |
-------------|-----------|-----------|-----------|
           0 |       731 |       338 |      1069 |
             |     7.562 |    12.219 |           |
             |     0.684 |     0.316 |     0.809 |
             |     0.896 |     0.669 |           |
             |     0.553 |     0.256 |           |
-------------|-----------|-----------|-----------|
           1 |        85 |       167 |       252 |
             |    32.078 |    51.833 |           |
             |     0.337 |     0.663 |     0.191 |
             |     0.104 |     0.331 |           |
             |     0.064 |     0.126 |           |
-------------|-----------|-----------|-----------|
Column Total |       816 |       505 |      1321 |
             |     0.618 |     0.382 |           |

See Table 3 for a summary of this test and Table 4 for details.

Among the 7 covariates, only gender (p-value: 0.9123) and site (p-value: 0.2313) pass the chi-square test. For the other 5 covariates, the null hypothesis of independence is rejected at the 0.05 significance level, which indicates that these covariates are imbalanced between the treatment and control groups.
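
The lymph node example can be recomputed without intermediate rounding. The following is a minimal Python sketch, not the R code used for the analysis, and the function names are hypothetical. Without rounding, the expected count is about 660.34 and the cell contribution about 7.562, in line with the R output. The total with Yates' continuity correction comes out near the 102.23 reported in Table 3, which suggests (as an assumption) that the tests were run with R's default correction for 2×2 tables.

```python
# Observed counts from the lymph node crosstable:
# rows = treatment (0, 1), columns = lymph node involvement (0, 1).
observed = [[731, 338],
            [85, 167]]


def expected_counts(table):
    """Expected counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(table, yates=False):
    """Pearson chi-square statistic, optionally with Yates' correction."""
    expected = expected_counts(table)
    total = 0.0
    for obs_row, exp_row in zip(table, expected):
        for f, e in zip(obs_row, exp_row):
            diff = abs(f - e)
            if yates:
                # Continuity correction: shrink each |f - e| by 0.5.
                diff = max(diff - 0.5, 0.0)
            total += diff ** 2 / e
    return total


expected = expected_counts(observed)
# Contribution of the cell (treatment=0, lymph nodes=0).
cell_11 = (observed[0][0] - expected[0][0]) ** 2 / expected[0][0]

print(round(expected[0][0], 2))
print(round(cell_11, 3))
print(round(chi_square(observed), 2))
print(round(chi_square(observed, yates=True), 2))
```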

3.2 Standardized Mean Difference

Another way to check balance is the standardized mean difference (SMD), calculated as the difference in the mean of a covariate between the treatment and control groups divided by the pooled standard deviation:

$$\text{SMD} = \frac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}}{s_{\text{pooled}}}$$

A threshold of 0.2 is used for the balance check. If the absolute standardized mean difference of a covariate is greater than 0.2, the treatment and control groups are judged to be unbalanced on that covariate. Except for gender (SMD: 0.013) and site (SMD: -0.025), the other 5 covariates all have an absolute standardized mean difference larger than 0.2. This method gives the same result as the chi-square test. (See Table 5 and Figure 3.)

Therefore, the covariates to match on in the next part are age, stage, tumor size, lymph nodes and metastasis. Gender and site are relatively balanced and are not included in the matching process.
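
The standardized mean difference of a binary covariate can be computed directly from the counts in Table 4. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the pooled standard deviation weights each group's sample variance by its degrees of freedom. Under that assumption the results come out close to the values in Table 5 for lymph nodes (0.742) and metastasis (0.389).

```python
import math


def sample_variance_binary(successes, n):
    """Sample variance (n - 1 denominator) of a 0/1 variable."""
    p = successes / n
    return n * p * (1 - p) / (n - 1)


def pooled_sd(var_t, n_t, var_c, n_c):
    """Pooled standard deviation, weighted by degrees of freedom (assumed)."""
    return math.sqrt(((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2))


def smd_binary(succ_t, n_t, succ_c, n_c):
    """Standardized mean difference of a 0/1 covariate (treated - control)."""
    mean_t, mean_c = succ_t / n_t, succ_c / n_c
    sd = pooled_sd(sample_variance_binary(succ_t, n_t), n_t,
                   sample_variance_binary(succ_c, n_c), n_c)
    return (mean_t - mean_c) / sd


# Counts from Table 4: involved patients among 252 treated and 1069 controls.
smd_lymph = smd_binary(167, 252, 338, 1069)
smd_metastasis = smd_binary(60, 252, 115, 1069)

THRESHOLD = 0.2  # balance threshold used in the text
for name, smd in [("lymph nodes", smd_lymph), ("metastasis", smd_metastasis)]:
    status = "unbalanced" if abs(smd) > THRESHOLD else "balanced"
    print(f"{name}: SMD = {smd:.3f} ({status})")
```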

4 Matching

Three matching methods are applied using the MatchIt package in R: nearest neighbor matching, optimal matching (ratio 1:1) and full matching.

Nearest neighbor matching selects the r (default = 1) best control matches for each individual in the treatment group (excluding those discarded using the discard option). Matches are chosen for each treated unit one at a time, in the order specified by the m.order option (default = largest to smallest). At each matching step, the control unit that is not yet matched but is closest to the treated unit on the distance measure is chosen.

The default nearest neighbor matching method in MatchIt is “greedy” matching, where the closest control match for each treated unit is chosen one at a time, without trying to minimize a global distance measure. In contrast, “optimal” matching finds the matched samples with the smallest average absolute distance across all the matched pairs. The ratio (the number of control subjects per treated subject in each matched subclass) can be chosen; in this paper, a 1:1 ratio is used.
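
The difference between greedy and optimal matching can be seen on a toy example. The sketch below is not MatchIt's implementation. The propensity scores are hypothetical, the function names are invented for illustration, and a brute-force search stands in for the specialized algorithms that real optimal matching software uses. It only shows that picking the closest control one treated unit at a time can give a larger total distance than the globally best assignment.

```python
from itertools import permutations


def greedy_match(treated, controls):
    """1:1 greedy nearest neighbor matching without replacement.

    Treated units are processed from largest to smallest score; each one
    takes the closest control that has not been used yet.
    """
    available = list(range(len(controls)))
    pairs = {}
    order = sorted(range(len(treated)), key=lambda i: treated[i], reverse=True)
    for t in order:
        best = min(available, key=lambda c: abs(treated[t] - controls[c]))
        pairs[t] = best
        available.remove(best)
    return pairs


def optimal_match(treated, controls):
    """1:1 matching that minimizes the total absolute distance (brute force)."""
    best_pairs, best_cost = None, float("inf")
    for perm in permutations(range(len(controls)), len(treated)):
        cost = sum(abs(treated[t] - controls[c]) for t, c in enumerate(perm))
        if cost < best_cost:
            best_pairs, best_cost = dict(enumerate(perm)), cost
    return best_pairs


def total_distance(pairs, treated, controls):
    """Sum of absolute score differences over all matched pairs."""
    return sum(abs(treated[t] - controls[c]) for t, c in pairs.items())


# Hypothetical propensity scores, chosen so that the two methods disagree.
treated_scores = [0.50, 0.40]
control_scores = [0.42, 0.70]

greedy = greedy_match(treated_scores, control_scores)
optimal = optimal_match(treated_scores, control_scores)
print(greedy, total_distance(greedy, treated_scores, control_scores))
print(optimal, total_distance(optimal, treated_scores, control_scores))
```

Here the greedy method gives the first treated unit its closest control, which leaves the second treated unit with a poor match; the optimal assignment accepts a slightly worse match for the first unit to reduce the total distance.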

Full matching forms matched sets that each contain one treated unit and one or more controls (or one control unit and one or more treated units). The only units not placed into a subclass are those discarded (if a discard option is specified) because they are outside the range of common support. Full matching is optimal in terms of minimizing a weighted average of the estimated distance measure between each treated subject and each control subject within each subclass. With full matching, 242 matched sets are formed from the 252 treated units and 1,069 control units.

Full matching gives the largest improvement in balance on the propensity score (99.73%), although it gives the smallest improvement in balance on the covariate metastasis. Overall, full matching achieves the best balance, so the data matched by full matching are used to estimate the treatment effect. (See Table 6 for a detailed comparison of the three matching methods, and Figures 4 and 5 for the balance before and after matching.)
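
The improvement in balance can be checked against the group means. The sketch below assumes that the percentage is computed from the absolute mean differences before and after matching; the function name is hypothetical. With the metastasis means under full matching from Table 6, this assumption gives a value close to the reported 84.14%.

```python
def percent_balance_improvement(diff_before, diff_after):
    """Percent reduction in the absolute mean difference (assumed definition)."""
    return 100.0 * (abs(diff_before) - abs(diff_after)) / abs(diff_before)


# Metastasis under full matching (means taken from Table 6).
diff_before = 0.2381 - 0.1076   # treated mean - control mean, before matching
diff_after = 0.2381 - 0.2174    # treated mean - control mean, after matching

improvement = percent_balance_improvement(diff_before, diff_after)
print(round(improvement, 2))
```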

5 Test for causal effect

After matching, every subject receives a weight. Each treated unit has weight 1, while each control unit receives a weight of

$$w_i = \frac{n_2}{n_1} \cdot \frac{1}{m_i}$$

where $n_2$ is the control group size, $n_1$ is the treated group size, and $m_i$ is the ratio of control units to treated units in the i-th matched subclass.
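
The weighting rule can be illustrated on a small set of matched subclasses. The subclass sizes below are hypothetical and the function names are invented for this sketch. It shows one property of the formula: the control weights add up to the control group size $n_2$, so the weighted control group is rescaled to resemble the treated group without changing its total size.

```python
# Hypothetical matched subclasses: (number of treated, number of controls).
subclasses = [(1, 1), (1, 3), (2, 1), (1, 5)]

n1 = sum(t for t, _ in subclasses)   # treated group size
n2 = sum(c for _, c in subclasses)   # control group size


def control_weight(n_treated_i, n_control_i):
    """Weight of each control unit in subclass i: (n2 / n1) * (1 / m_i)."""
    m_i = n_control_i / n_treated_i   # controls per treated unit in subclass i
    return (n2 / n1) / m_i


def weighted_mean(values, weights):
    """Weighted mean, as used for the survival months after matching."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


weights = [control_weight(t, c) for t, c in subclasses]

# Every control in subclass i carries the same weight, so the total control
# weight is the sum over subclasses of weight * number of controls.
total_control_weight = sum(w * c for w, (_, c) in zip(weights, subclasses))
print(weights)
print(total_control_weight, n2)
```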

The weighted mean of survival months after matching is 27.08 in the treatment group and 28.40 in the control group, a weighted mean difference of -1.32. Compared with the raw mean difference before matching (-7.37), the magnitude of the treatment effect decreases. This is probably due to the imbalance in the original data: patients in worse condition are more likely to receive the treatment, which lowers the survival months in the treatment group and exaggerates the negative treatment effect.

To test whether the weighted mean difference is significantly different from zero, a two-sample t-test is used. The t statistic is -0.7616 with 482 degrees of freedom, the p-value is 0.4467, and the 95% confidence interval is [-5.6778, 2.5059]. At the 5% significance level, we fail to reject the null hypothesis that the true difference in means is equal to 0. We therefore find no statistically significant evidence that radiation has a causal effect on survival months. (See Figure 6 for a visualization.)
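
The reported p-value can be sanity-checked from the t statistic alone. The sketch below is an approximation, not the R test itself: it assumes that with 482 degrees of freedom the t distribution is close enough to the standard normal, and it uses only the t statistic reported above.

```python
from statistics import NormalDist

t_score = -0.7616   # t statistic reported in the text
alpha = 0.05        # significance level used in the text

# Two-sided p-value under the normal approximation to the t distribution.
p_approx = 2 * (1 - NormalDist().cdf(abs(t_score)))

# The null hypothesis of zero mean difference is rejected only if p < alpha.
reject_null = p_approx < alpha
print(round(p_approx, 3), reject_null)
```

The approximate p-value is close to the reported 0.4467 and well above 0.05, so the decision not to reject the null hypothesis does not depend on the approximation.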

Sensitivity analysis is not included in this paper: as gamma increases from 1, the p-value only becomes larger, so the original conclusion is strengthened rather than overturned. A sensitivity analysis is therefore unnecessary, since it would not change the original conclusion.

6 Further implications

Only 7 covariates are included in this paper. Unobserved covariates such as psychological factors, economic status, emotional support and surgery type may all influence patients' chance of survival, and the conclusion of this paper may change once these factors are considered. Further research needs to incorporate more information about the patients. For example, information about chemotherapy should be included, and records of the surgery should be more detailed, including the surgery type and hospital. In addition, if a patient's mental positivity can be measured in some way, it would be a good source of information.

An instrumental variable (IV) can also be introduced into the research to distinguish the units in the matched groups. Doctors' recommendations for treatment may serve as an IV: some patients comply with the recommendations while others do not. These different responses to the incentive (IV) may allow a more accurate analysis.

In addition, machine learning techniques such as gradient boosted models can be tried for estimating the propensity score. The method in this paper is logistic regression, which is a parametric model. Non-parametric models rely less on modeling assumptions.

Last but not least, multiple treatment groups can be formed, such as “radiation before surgery”, “radiation during surgery” and “radiation after surgery”. These different treatments may have different effects on the survival of the patients.

7 Bibliography

[1] D. Ho, K. Imai, G. King and E. A. Stuart, "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference," Political Analysis, vol. 15, no. 3, pp. 199-236, 2007.

[2] D. F. McCaffrey, G. Ridgeway and A. R. Morral, "Propensity Score Estimation With Boosted Regression for Evaluating Causal Effects in Observational Studies," Psychological Methods, vol. 9, no. 4, pp. 403-425, 2004.

[3] B. B. Hansen and J. Bowers, "Covariate Balance in Simple, Stratified and Clustered Comparative Studies," Statistical Science, vol. 23, no. 2, pp. 219-236, 2008.

[4] P. R. Rosenbaum, Design of Observational Studies, Springer-Verlag New York, 2010.

[5] B. B. Hansen and S. O. Klopfer, "Optimal Full Matching and Related Designs via Network Flows," Journal of Computational and Graphical Statistics, vol. 15, pp. 609-627, 2006.

[6] H. Wickham, ggplot2: Elegant Graphics for Data Analysis, Springer-Verlag New York, 2009.

[7] D. Meyer, A. Zeileis and K. Hornik, "The Strucplot Framework: Visualizing Multi-Way Contingency Tables with vcd," Journal of Statistical Software, vol. 17, no. 3, pp. 1-48, 2006.

Appendix

Table 1: Variable Description

Variable          Type of Variable      Value                  Recode
Survival Months   Continuous            >=0
Treatment         Binary                Beam Radiation         1
                                        No Radiation           0
Age               Ordered Categorical   50-60 years old        1
                                        60-70 years old        2
                                        70-80 years old        3
                                        >80 years old          4
Sex               Categorical           Male                   1
                                        Female                 2
Registry Site     Categorical           CA(Not SF/SJM/LA)      1
                                        Los Angeles            2
                                        San Francisco          3
                                        San Jose-Monterey      4
Stage             Ordered Categorical   Localized              1
                                        Regional               2
                                        Distant                3
Tumor Size        Ordered Categorical   0-20mm                 1
                                        20-40mm                2
                                        40-100mm               3
Lymph Nodes       Binary                No involvement         0
                                        Involvement            1
Metastasis        Binary                No involvement         0
                                        Involvement            1

Table 2: Survival by different covariates (numbers are rounded to integers)

Variable      Value                       Mean   Standard Deviation   1st Quartile   Median   3rd Quartile
Treatment     Radiation                   27     23                   10             20       37
              No Radiation                34     27                   12             27       55
Age           50-60                       33     26                   12             26       50
              60-70                       36     27                   14             28       55
              70-80                       34     27                   11             26       52
              80-90                       26     25                   6              16       41
Gender        Male                        30     26                   10             21       47
              Female                      36     27                   13             30       56
Site          CA(Not the 3 sites below)   32     27                   11             24       50
              LA                          31     26                   10             22       50
              SF                          38     28                   15             33       61
              SJM                         37     27                   15             34       57
Stage         Localized                   40     28                   16             36       62
              Regional                    31     25                   11             23       46
              Distant                     22     23                   5              15       31
Tumor Size    0-20mm                      39     27                   16             35       60
              20-40mm                     35     27                   13             27       54
              40-100mm                    27     25                   9              18       40
Lymph nodes   No                          36     28                   13             30       58
              Yes                         27     24                   9              20       41
Metastasis    No                          35     27                   13             27       54
              Yes                         23     23                   6              14       32

Figure 1: Survival by different covariates

Figure 2: Treatment by different covariates

Table 3: Pearson’s chi-square test for independence between treatment and covariates

Covariate     χ²       Degrees of freedom   P value
Age           43.33    3                    2.094e-09
Sex           0.01     1                    0.9123
Site          4.30     3                    0.2313
Stage         94.74    2                    < 2.2e-16
Tumor Size    40.71    2                    1.442e-09
Lymph Nodes   102.23   1                    < 2.2e-16
Metastasis    29.12    1                    6.865e-08

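Table 4: Crosstable for each covariate and treatment

Cell contents, from top to bottom: count, chi-square contribution, row proportion, column proportion, table proportion.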

Age and treatment

             | Age
   Treatment |         1 |         2 |         3 |         4 | Row Total |
-------------|-----------|-----------|-----------|-----------|-----------|
           0 |       117 |       318 |       436 |       198 |      1069 |
             |     3.778 |     1.121 |     1.314 |     2.052 |           |
             |     0.109 |     0.297 |     0.408 |     0.185 |     0.809 |
             |     0.676 |     0.763 |     0.855 |     0.896 |           |
             |     0.089 |     0.241 |     0.330 |     0.150 |           |
-------------|-----------|-----------|-----------|-----------|-----------|
           1 |        56 |        99 |        74 |        23 |       252 |
             |    16.026 |     4.756 |     5.575 |     8.707 |           |
             |     0.222 |     0.393 |     0.294 |     0.091 |     0.191 |
             |     0.324 |     0.237 |     0.145 |     0.104 |           |
             |     0.042 |     0.075 |     0.056 |     0.017 |           |
-------------|-----------|-----------|-----------|-----------|-----------|
Column Total |       173 |       417 |       510 |       221 |      1321 |
             |     0.131 |     0.316 |     0.386 |     0.167 |           |

Gender and treatment

             | Gender
   Treatment |         1 |         2 | Row Total |
-------------|-----------|-----------|-----------|
           0 |       537 |       532 |      1069 |
             |     0.003 |     0.003 |           |
             |     0.502 |     0.498 |     0.809 |
             |     0.811 |     0.807 |           |
             |     0.407 |     0.403 |           |
-------------|-----------|-----------|-----------|
           1 |       125 |       127 |       252 |
             |     0.013 |     0.013 |           |
             |     0.496 |     0.504 |     0.191 |
             |     0.189 |     0.193 |           |
             |     0.095 |     0.096 |           |
-------------|-----------|-----------|-----------|
Column Total |       662 |       659 |      1321 |

Site and treatment

             | Site
   Treatment |         1 |         2 |         3 |         4 | Row Total |
-------------|-----------|-----------|-----------|-----------|-----------|
           0 |       120 |        73 |       252 |       624 |      1069 |
             |     0.309 |     0.045 |     0.416 |     0.050 |           |
             |     0.112 |     0.068 |     0.236 |     0.584 |     0.809 |
             |     0.769 |     0.830 |     0.843 |     0.802 |           |
             |     0.091 |     0.055 |     0.191 |     0.472 |           |
-------------|-----------|-----------|-----------|-----------|-----------|
           1 |        36 |        15 |        47 |       154 |       252 |
             |     1.309 |     0.190 |     1.767 |     0.210 |           |
             |     0.143 |     0.060 |     0.187 |     0.611 |     0.191 |
             |     0.231 |     0.170 |     0.157 |     0.198 |           |
             |     0.027 |     0.011 |     0.036 |     0.117 |           |
-------------|-----------|-----------|-----------|-----------|-----------|
Column Total |       156 |        88 |       299 |       778 |      1321 |
             |     0.118 |     0.067 |     0.226 |     0.589 |           |

Stage and treatment

             | Stage
   Treatment |         1 |         2 |         3 | Row Total |
-------------|-----------|-----------|-----------|-----------|
           0 |       466 |       453 |       150 |      1069 |
             |    10.687 |     2.734 |     4.651 |           |
             |     0.436 |     0.424 |     0.140 |     0.809 |
             |     0.941 |     0.749 |     0.679 |           |
             |     0.353 |     0.343 |     0.114 |           |
-------------|-----------|-----------|-----------|-----------|
           1 |        29 |       152 |        71 |       252 |
             |    45.335 |    11.599 |    19.730 |           |
             |     0.115 |     0.603 |     0.282 |     0.191 |
             |     0.059 |     0.251 |     0.321 |           |
             |     0.022 |     0.115 |     0.054 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |       495 |       605 |       221 |      1321 |
             |     0.375 |     0.458 |     0.167 |           |

Tumor Size and treatment

             | Size
   Treatment |         1 |         2 |         3 | Row Total |
-------------|-----------|-----------|-----------|-----------|
           0 |       240 |       497 |       332 |      1069 |
             |     0.794 |     1.937 |     5.036 |           |
             |     0.225 |     0.465 |     0.311 |     0.809 |
             |     0.857 |     0.861 |     0.716 |           |
             |     0.182 |     0.376 |     0.251 |           |
-------------|-----------|-----------|-----------|-----------|
           1 |        40 |        80 |       132 |       252 |
             |     3.369 |     8.215 |    21.363 |           |
             |     0.159 |     0.317 |     0.524 |     0.191 |
             |     0.143 |     0.139 |     0.284 |           |
             |     0.030 |     0.061 |     0.100 |           |
-------------|-----------|-----------|-----------|-----------|
Column Total |       280 |       577 |       464 |      1321 |
             |     0.212 |     0.437 |     0.351 |           |

Lymph Nodes Involvement and treatment

             | Lymph Nodes Involvement
   Treatment |         0 |         1 | Row Total |
-------------|-----------|-----------|-----------|
           0 |       731 |       338 |      1069 |
             |     7.562 |    12.219 |           |
             |     0.684 |     0.316 |     0.809 |
             |     0.896 |     0.669 |           |
             |     0.553 |     0.256 |           |
-------------|-----------|-----------|-----------|
           1 |        85 |       167 |       252 |
             |    32.078 |    51.833 |           |
             |     0.337 |     0.663 |     0.191 |
             |     0.104 |     0.331 |           |
             |     0.064 |     0.126 |           |
-------------|-----------|-----------|-----------|
Column Total |       816 |       505 |      1321 |
             |     0.618 |     0.382 |           |

Metastasis and treatment

             | Metastasis
   Treatment |         0 |         1 | Row Total |
-------------|-----------|-----------|-----------|
           0 |       954 |       115 |      1069 |
             |     0.764 |     5.002 |           |
             |     0.892 |     0.108 |     0.809 |
             |     0.832 |     0.657 |           |
             |     0.722 |     0.087 |           |
-------------|-----------|-----------|-----------|
           1 |       192 |        60 |       252 |
             |     3.240 |    21.221 |           |
             |     0.762 |     0.238 |     0.191 |
             |     0.168 |     0.343 |           |
             |     0.145 |     0.045 |           |
-------------|-----------|-----------|-----------|
Column Total |      1146 |       175 |      1321 |
             |     0.868 |     0.132 |           |

Table 5: Check for balance: standardized mean difference

Variable     Treatment   Control   Mean difference   Standard Deviation   Standardized Mean Difference   Z score   P value
Age          2.250       2.670     -0.415            0.064                -0.460                         -6.470    9.95e-11
Gender       1.500       1.500     0.006             0.035                0.013                          0.180     8.57e-01
Site         3.270       3.290     -0.025            0.072                -0.025                         -0.350    7.27e-01
Stage        2.170       1.700     0.462             0.050                0.677                          9.340     9.49e-21
Size         2.370       2.090     0.279             0.052                0.382                          5.400     6.62e-08
Lymph        0.663       0.316     0.347             0.034                0.742                          10.200    2.46e-24
Metastasis   0.238       0.108     0.131             0.024                0.389                          5.500     3.89e-08

Figure 3: Check for balance: standardized mean difference (axes: Standardized Mean Difference, Covariates)

Table 6: Comparison of different matching methods

Note: "Treated" and "Control" are the means in the treated and control groups, and "SMD" is the standardized mean difference, each reported before and after matching. "Improvement (%)" is the improvement in balance after matching.

Nearest Neighbor (252 treated, 252 control)

Feature            Treated   Control   SMD       Treated   Control   SMD       Improvement
                   before    before    before    after     after     after     (%)
Propensity Score   0.2972    0.1657    0.8595    0.2972    0.2948    0.0152    98.2354
Age                2.2540    2.6688    -0.4582   2.2540    2.2698    -0.0175   96.1741
Stage              2.1667    1.7044    0.7594    2.1667    2.1746    -0.0130   98.2831
Tumor Size         2.3651    2.0861    0.3757    2.3651    2.3611    0.0053    98.5778
Lymph Nodes        0.6627    0.3162    0.7315    0.6627    0.6548    0.0168    97.7096
Metastasis         0.2381    0.1076    0.3058    0.2381    0.2302    0.0186    93.9192

Optimal (1:1) (252 treated, 252 control)

Feature            Treated   Control   SMD       Treated   Control   SMD       Improvement
                   before    before    before    after     after     after     (%)
Propensity Score   0.2972    0.1657    0.8595    0.2972    0.2948    0.0154    98.2109
Age                2.2540    2.6688    -0.4582   2.2540    2.2857    -0.0351   92.3482
Stage              2.1667    1.7044    0.7594    2.1667    2.1865    -0.0326   95.7079
Tumor Size         2.3651    2.0861    0.3757    2.3651    2.3492    0.0214    94.3111
Lymph Nodes        0.6627    0.3162    0.7315    0.6627    0.6587    0.0084    98.8548
Metastasis         0.2381    0.1076    0.3058    0.2381    0.2421    -0.0093   96.9596

Full (252 treated, 1069 control)

Feature            Treated   Control   SMD       Treated   Control   SMD       Improvement
                   before    before    before    after     after     after     (%)
Propensity Score   0.2972    0.1657    0.8595    0.2972    0.2975    -0.0023   99.7299
Age                2.2540    2.6688    -0.4582   2.2540    2.2647    -0.0118   97.4221
Stage              2.1667    1.7044    0.7594    2.1667    2.1756    -0.0147   98.0610
Tumor Size         2.3651    2.0861    0.3757    2.3651    2.3657    -0.0008   99.7888
Lymph Nodes        0.6627    0.3162    0.7315    0.6627    0.6605    0.0047    99.3513
Metastasis         0.2381    0.1076    0.3058    0.2381    0.2174    0.0485    84.1393

Figure 4: Covariate balance before and after matching

Figure 5: Overall balance (Propensity Score) before and after matching

The values on the y-axis are due to “scaling” in R.

Figure 6: Treatment Effect