Post on 29-Dec-2015
• 1
Active learning based survival regression for censored data
Bhanukiran Vinzamuri (bhanukiranv@wayne.edu)
Yan Li (rockli_yan@wayne.edu)
Chandan K. Reddy (reddy@cs.wayne.edu)
• 2
Index of presentation
Introduction
Problem Description
Cox Regression
Regularized Cox Regression
Coordinate Majorization Descent solver for EN-COX
KEN-COX
Model discriminative Gradient based Sampling
Active learning based Regularized Cox Regression
Experimental Evaluation
Conclusions and Future Work
References
• 3
Introduction
Censored data are observed in many real-world applications, such as clinical health informatics, genomic analysis and finance.
Mining censored data poses a unique challenge to the data mining community because standard regression models cannot be applied to them directly: for censored instances, the true time-to-event is unknown.
In addition to the feature vector, each censored data instance has two entries:
A time-to-event variable, which records the observed time (event time or censoring time).
A binary indicator variable, which records the censoring status.
• 4
Problem description
Consider 10 patients observed for 12 days after discharge from their index hospitalization.
The event of interest is observed for patients 2, 3 and 5. Patients 1, 8 and 9 do not experience the event before the end of day 12. Finally, patients 4 and 7 drop out of the study. Both of these last two groups are right-censored.
• 5
Notations used in this paper
Name     Description
X        n × m matrix of feature vectors
T        n × 1 vector of failure times
K        number of unique failure times
δ        n × 1 binary vector of censored status
R_i      set of all patients at risk at time T_i
β        m × 1 regression coefficient vector
L(β)     partial log-likelihood
h(t|X)   conditional hazard probability
h_0(t)   base hazard rate
S_0(t)   base survival rate
S(t|X)   conditional survival probability
Ke       column-wise kernel matrix
• 6
Cox regression
Cox regression models the effect of covariates on the hazard rate but leaves the baseline hazard rate unspecified.
It does NOT assume knowledge of absolute risk; it estimates relative rather than absolute risk.
It relies on the proportional hazards assumption, which states that the hazard for any individual is a fixed proportion of the hazard for any other individual, so the ratio of their hazards does not change over time.
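For reference, the standard textbook form of the Cox model (not reproduced from the slides; X_i denotes the i-th row of X and h_0(t) the baseline hazard) shows why the hazard ratio between two individuals is constant over time: the baseline hazard cancels.

```latex
h(t \mid X_i) = h_0(t)\,\exp(X_i \beta)

\frac{h(t \mid X_i)}{h(t \mid X_j)}
  = \frac{h_0(t)\,\exp(X_i \beta)}{h_0(t)\,\exp(X_j \beta)}
  = \exp\!\big((X_i - X_j)\,\beta\big)
```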
• 7
Mathematical formulation of Cox regression
Baseline Hazard Function
Baseline Survival Function
Conditional Survival Probability
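The equations on this slide did not survive extraction. As a hedged reference, these are the standard forms found in survival analysis textbooks (e.g. Klein and Moeschberger), assuming the Breslow treatment of tied event times; the authors' exact expressions may differ. Here R_i = {j : T_j ≥ T_i}, t_1 < … < t_K are the unique failure times and d_k is the number of events at t_k.

```latex
L(\beta) = \sum_{i=1}^{n} \delta_i \Big[ X_i \beta - \log \sum_{j \in R_i} \exp(X_j \beta) \Big]

\hat{h}_0(t_k) = \frac{d_k}{\sum_{j :\, T_j \ge t_k} \exp(X_j \hat{\beta})}, \qquad k = 1, \dots, K

\hat{S}_0(t) = \exp\Big( - \sum_{k :\, t_k \le t} \hat{h}_0(t_k) \Big)

\hat{S}(t \mid X) = \hat{S}_0(t)^{\exp(X \hat{\beta})}
```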
• 8
Hazard probabilities predicted for EHR data
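Hazard and survival predictions of this kind can be computed from a fitted β with the Breslow estimator. The pure-Python sketch below assumes the Breslow form of the baseline hazard; the function names and the toy data in the usage lines are hypothetical and not taken from the study.

```python
import math


def breslow_baseline(times, events, risk_scores):
    """Breslow baseline hazard increments h0(t_k) at each unique event time t_k.

    times:       observed times T_i (event or censoring time)
    events:      censoring indicators delta_i (1 = event observed, 0 = censored)
    risk_scores: exp(X_i beta) for every instance
    Returns a list of (t_k, h0_k) pairs sorted by t_k.
    """
    unique_event_times = sorted({t for t, d in zip(times, events) if d == 1})
    increments = []
    for t_k in unique_event_times:
        d_k = sum(1 for t, d in zip(times, events) if d == 1 and t == t_k)
        at_risk = sum(r for t, r in zip(times, risk_scores) if t >= t_k)
        increments.append((t_k, d_k / at_risk))
    return increments


def linear_predictor(x, beta):
    return sum(xi * bi for xi, bi in zip(x, beta))


def conditional_hazard(t_k, x, beta, increments):
    """h(t_k | x) = h0(t_k) * exp(x beta); zero at times that are not event times."""
    h0 = dict(increments).get(t_k, 0.0)
    return h0 * math.exp(linear_predictor(x, beta))


def conditional_survival(t, x, beta, increments):
    """S(t | x) = S0(t) ** exp(x beta), with S0(t) = exp(-sum of h0(t_k) for t_k <= t)."""
    s0 = math.exp(-sum(h for t_k, h in increments if t_k <= t))
    return s0 ** math.exp(linear_predictor(x, beta))


# Hypothetical toy data: one feature and beta = 0, so every risk score is 1.
times, events = [2, 3, 3, 5, 8], [1, 1, 0, 1, 0]
beta = [0.0]
risk = [math.exp(linear_predictor([1.0], beta)) for _ in times]
inc = breslow_baseline(times, events, risk)
print(inc)  # [(2, 0.2), (3, 0.25), (5, 0.5)]
print(conditional_survival(4, [1.0], beta, inc))
```

The censored instance at time 3 contributes to the risk set at t = 2 and t = 3 but adds no event, which is how censoring enters the estimate.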
• 9
Regularized Cox Regression
Cox regression models tend to overfit, which limits how well they generalize to new data.
LASSO yields a sparse solution but handles correlated features poorly: among a group of highly correlated features it tends to select only one. Elastic net provides sparsity and also handles correlated features effectively.
We use a coordinate majorization descent (CMD) based solver for elastic net Cox regression (EN-COX).
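The two penalties can be written in one common parameterization. This is an assumption: the slides do not give the scaling, and the mixing weight α ∈ [0, 1] and tuning parameter λ are notation introduced here. Setting α = 1 recovers the LASSO; the ridge part, weighted by (1 − α), is what lets correlated features share weight.

```latex
\text{LASSO-Cox:}\quad \min_{\beta}\; -\tfrac{1}{n} L(\beta) + \lambda \lVert \beta \rVert_1

\text{EN-Cox:}\quad \min_{\beta}\; -\tfrac{1}{n} L(\beta)
   + \lambda \Big( \alpha \lVert \beta \rVert_1 + \tfrac{1-\alpha}{2} \lVert \beta \rVert_2^2 \Big)
```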
• 10
CMD solver for EN-COX
Composite objective (negative log partial likelihood plus elastic net penalty) to be minimized.
Coordinate-wise component of the composite function.
Pre-computed term to accelerate the computation.
Coordinate-wise update for the regression coefficient vector.
Soft-threshold operator.
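A sketch of how these pieces fit together, derived here for the objective f(β) + λ(α||β||_1 + (1 − α)/2 ||β||_2^2) with f(β) = −(1/n) L(β); it is assumed, not confirmed, to match the slide's lost equations. The pre-computed term is taken to be a constant γ_j that upper-bounds the j-th diagonal entry of the Hessian of f, so Q_j below majorizes the objective along coordinate j around the current value β̃_j (up to terms not depending on b), and g_j is ∂f/∂β_j evaluated at β̃.

```latex
Q_j(b) = f(\tilde\beta) + g_j\,(b - \tilde\beta_j) + \tfrac{\gamma_j}{2}\,(b - \tilde\beta_j)^2
         + \lambda\alpha\,\lvert b \rvert + \tfrac{\lambda(1-\alpha)}{2}\, b^2

\beta_j \leftarrow \arg\min_b Q_j(b)
  = \frac{\mathcal{S}\big(\gamma_j \tilde\beta_j - g_j,\; \lambda\alpha\big)}{\gamma_j + \lambda(1-\alpha)}

\mathcal{S}(z, \kappa) = \operatorname{sign}(z)\,\max\big(\lvert z \rvert - \kappa,\, 0\big)
```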
• 11
Regularized Cox Regression Algorithm
Algorithm 1 Regularized Cox Regression (RegCox)
Require: Feature set X, censored status δ, time-to-event T, regularization parameter
1: Initialize β
2: repeat
3: Compute from using Equations (4),(5)
4: for j = 1,……,m do
5: Set the objective function and apply the CMD procedure
6: Compute the updating factor for computing using Equation (6)
7:
8: end for
9: Update =
10: until Convergence of
11: Output
12: Output base hazard function using and
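Algorithm 1 can be sketched in pure Python as below. This is a minimal, unoptimized illustration under stated assumptions, not the authors' implementation: ties use the Breslow form, the majorization constant γ_j uses the bound that a weighted variance over a risk set is at most (max − min)²/4, and the mixing weight alpha and all function names are hypothetical. Step 12 (the baseline hazard) is omitted; it can be computed from the returned β with the Breslow estimator.

```python
import math


def soft_threshold(z, kappa):
    """S(z, kappa) = sign(z) * max(|z| - kappa, 0)."""
    if z > kappa:
        return z - kappa
    if z < -kappa:
        return z + kappa
    return 0.0


def grad_j(X, T, delta, beta, j):
    """d/d beta_j of f(beta) = -(1/n) * log partial likelihood (Breslow ties)."""
    n = len(X)
    w = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
    g = 0.0
    for i in range(n):
        if delta[i] == 1:
            risk = [k for k in range(n) if T[k] >= T[i]]
            mean_j = sum(w[k] * X[k][j] for k in risk) / sum(w[k] for k in risk)
            g -= X[i][j] - mean_j
    return g / n


def majorization_constants(X, T, delta):
    """Pre-computed gamma_j >= j-th diagonal Hessian entry of f.

    The Hessian diagonal is a sum of weighted variances over risk sets, and a
    weighted variance is at most (max - min)^2 / 4, whatever the weights.
    """
    n, m = len(X), len(X[0])
    gammas = []
    for j in range(m):
        total = 0.0
        for i in range(n):
            if delta[i] == 1:
                col = [X[k][j] for k in range(n) if T[k] >= T[i]]
                total += (max(col) - min(col)) ** 2 / 4.0
        gammas.append(total / n)
    return gammas


def reg_cox_cmd(X, T, delta, lam, alpha=0.5, max_iter=500, tol=1e-8):
    """Cyclic coordinate majorization descent for elastic net Cox regression."""
    m = len(X[0])
    beta = [0.0] * m                                  # step 1: initialize beta
    gammas = majorization_constants(X, T, delta)      # computed once, reused
    for _ in range(max_iter):                         # step 2: repeat
        max_change = 0.0
        for j in range(m):                            # step 4: for j = 1..m
            denom = gammas[j] + lam * (1.0 - alpha)
            if denom == 0.0:
                continue  # gamma_j = 0 and no ridge term: feature constant on every risk set
            g = grad_j(X, T, delta, beta, j)
            new = soft_threshold(gammas[j] * beta[j] - g, lam * alpha) / denom
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:                          # step 10: until convergence
            break
    return beta                                       # step 11: output beta
```

Because γ_j depends only on the data, it is computed once outside the loop, which mirrors the role of the pre-computed term in the CMD solver.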
• 12
Extending solver for KEN-COX
The original elastic net penalty is extended with an additional term that incorporates a column-wise kernel similarity matrix (Ke) computed among the features.
This additional term is intended to make the regularization even more effective at handling correlated features and groups of correlated features.
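The slides do not give the exact form of the kernel term. As a hedged illustration only, the sketch below builds a column-wise Gaussian (RBF) kernel matrix Ke over the feature columns and uses it in a grouping penalty, the sum over j < l of Ke[j][l](β_j − β_l)², which is small when similar features receive similar coefficients. The RBF choice, the bandwidth sigma and the penalty form are all assumptions, not the KEN-COX definition.

```python
import math


def column_kernel_matrix(X, sigma=1.0):
    """Ke[j][l] = exp(-||x_j - x_l||^2 / (2 sigma^2)) for feature columns x_j, x_l."""
    n, m = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(m)]
    Ke = [[0.0] * m for _ in range(m)]
    for j in range(m):
        for l in range(m):
            sq_dist = sum((a - b) ** 2 for a, b in zip(cols[j], cols[l]))
            Ke[j][l] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return Ke


def grouping_penalty(beta, Ke):
    """Hypothetical kernel term: sum over j < l of Ke[j][l] * (beta_j - beta_l)^2."""
    m = len(beta)
    return sum(Ke[j][l] * (beta[j] - beta[l]) ** 2
               for j in range(m) for l in range(j + 1, m))
```

Two identical columns get similarity 1, so giving them different coefficients is penalized most heavily; dissimilar columns are barely coupled.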
• 13
Model discriminative gradient based sampling
This sampling method chooses the instance that maximizes a criterion consisting of two components. The first component is the hazard probability computed at each unique time-to-event value.
The second component is the absolute value of the gradient of the log partial likelihood function computed at the given instance.
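The sampling equation itself was lost in extraction. One way to combine the two components, modelled on expected-gradient-length sampling from the active learning literature (Settles), is sketched below as an assumption: for a candidate x, each unique event time t_k is hypothetically assigned as x's label, the gradient of the log partial likelihood of the augmented training set is computed at the current β, and its Euclidean norm (the vector analogue of the absolute value) is weighted by h(t_k|x) = h_0(t_k) exp(xβ). The `baseline` input, a list of (t_k, h_0(t_k)) pairs, and all function names are hypothetical.

```python
import math


def grad_log_pl(X, T, delta, beta):
    """Gradient of the log partial likelihood (Breslow ties) at beta."""
    n, m = len(X), len(beta)
    w = [math.exp(sum(x * b for x, b in zip(row, beta))) for row in X]
    grad = [0.0] * m
    for i in range(n):
        if delta[i] != 1:
            continue
        risk = [k for k in range(n) if T[k] >= T[i]]
        denom = sum(w[k] for k in risk)
        for j in range(m):
            grad[j] += X[i][j] - sum(w[k] * X[k][j] for k in risk) / denom
    return grad


def sampling_score(x, X, T, delta, beta, baseline):
    """Sum over unique event times t_k of h(t_k | x) * ||gradient with (x, t_k, event) added||."""
    risk_x = math.exp(sum(a * b for a, b in zip(x, beta)))
    score = 0.0
    for t_k, h0_k in baseline:
        g = grad_log_pl(X + [x], T + [t_k], delta + [1], beta)
        score += h0_k * risk_x * math.sqrt(sum(v * v for v in g))
    return score


def select_instance(pool, X, T, delta, beta, baseline):
    """Index of the pool instance with the largest sampling score."""
    return max(range(len(pool)),
               key=lambda idx: sampling_score(pool[idx], X, T, delta, beta, baseline))
```

Instances that would change the current model the most, weighted by how likely each candidate time is under that model, are ranked first.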
• 14
Active Regularized Cox Regression (ARC)
Algorithm 2 ARC Algorithm
Require: Training set Train, unlabelled pool Pool, time-to-event T, censored status δ, number of active learning rounds max
1:
2: repeat
3:
4: for each instance in Pool do
5: Compute the model discriminative gradient sampling criterion for the instance
6: end for
7:
8: Query domain expert for label (time-to-event) of
9:
10:
11:
12: until
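The loop structure of Algorithm 2 can be sketched generically as below. Several steps of the listing did not survive extraction, so this is an assumption-laden outline rather than the authors' procedure: `fit`, `score` and `oracle` are hypothetical callables standing in for regularized Cox training (e.g. KEN-COX), the model discriminative sampling criterion, and the domain expert who supplies the time-to-event and censored status.

```python
def active_learning_loop(train, pool, fit, score, oracle, max_rounds):
    """Pool-based active learning in the spirit of Algorithm 2 (ARC).

    train:  list of labelled instances (x, t, delta)
    pool:   list of unlabelled feature vectors x
    fit:    fit(train) -> model
    score:  score(model, x) -> float (higher = more useful to label)
    oracle: oracle(x) -> (t, delta), i.e. the domain expert's label
    """
    train, pool = list(train), list(pool)
    model = fit(train)
    for _ in range(max_rounds):
        if not pool:
            break
        best = max(range(len(pool)), key=lambda i: score(model, pool[i]))
        x = pool.pop(best)            # remove the chosen instance from the pool
        t, delta = oracle(x)          # query the expert for its label
        train.append((x, t, delta))   # add it to the training set
        model = fit(train)            # retrain the regularized Cox model
    return model, train, pool
```

Passing the model, criterion and oracle as callables keeps the loop independent of which regularizer (LASSO, EN or KEN) is used, matching the ARC-L, ARC-EN and ARC-KEN variants compared later.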
• 15
Flowchart for ARC with KEN-COX
• 16
Metrics for evaluation in survival analysis
Survival AUC is also called the concordance index (C-index); it measures how well the predicted risks rank pairs of patients by their observed times-to-event.
In survival analysis, physicians and researchers are often more interested in the relative risk of a disease between patients with different covariates than in the absolute survival times of these patients, which makes a rank-based measure such as the concordance index appropriate.
The root mean squared error (RMSE) measures the goodness of fit of the survival model.
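Both metrics are simple to compute. The sketch below implements Harrell's concordance index, a common definition of the C-index, and a plain RMSE; the slides do not specify how ties or censored instances are handled, so counting tied risk predictions as 0.5 and computing RMSE over whichever times are passed in are assumptions.

```python
import math


def concordance_index(times, events, risk):
    """Harrell's C-index.

    A pair (i, j) is comparable when instance i has an observed event and
    T_i < T_j; it is concordant when the model assigns i the higher risk.
    Tied risk predictions count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted times."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(observed, predicted)) / len(observed))
```

A C-index of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking.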
• 17
Experimental Setup
We evaluate our ARC framework on EHR data from Henry Ford Hospital in Detroit, Michigan. In addition, we use publicly available censored datasets for our evaluation.
Time-to-event values (30-day readmission) are calculated from the prior admission and discharge dates, and patients who are not readmitted within the 30-day study period are right-censored. For the other survival datasets, right-censoring information is provided with the data.
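Deriving (T, δ) for the 30-day readmission task can be sketched with Python's datetime module. The exact rule is an assumption: here T is the number of days from discharge to the next admission, an event is recorded only if that gap is at most 30 days, and every other patient is right-censored at day 30. The function name and the dates in the usage lines are made up.

```python
from datetime import date


def readmission_label(discharge, next_admission=None, window_days=30):
    """Return (T, delta) for a 30-day readmission study.

    discharge:      discharge date of the index hospitalization
    next_admission: date of the next admission, or None if there is none
    """
    if next_admission is not None:
        gap = (next_admission - discharge).days
        if gap <= window_days:
            return gap, 1        # readmitted within the study window: event observed
    return window_days, 0        # no readmission in the window: right-censored


print(readmission_label(date(2012, 1, 1), date(2012, 1, 11)))  # (10, 1)
print(readmission_label(date(2012, 1, 1), date(2012, 3, 1)))   # (30, 0)
```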
We also generate synthetic datasets, drawing the feature vectors from a normal distribution and the response times from a Weibull distribution.
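A generator of this kind can be sketched by drawing X from a standard normal distribution and inverting a Weibull proportional hazards survival function, S(t|x) = exp(−λ t^ν exp(xβ)), which gives T = (−log U / (λ exp(xβ)))^(1/ν) for U uniform on (0, 1]. The scale λ, shape ν, the uniform censoring scheme, the seed and the function name are illustrative assumptions; the slides do not give the parameters actually used.

```python
import math
import random


def generate_synthetic(n, beta, scale=1.0, shape=1.5, censor_max=None, seed=0):
    """Normal features and Weibull PH survival times (inverse transform sampling).

    If censor_max is given, censoring times C ~ Uniform(0, censor_max) are drawn
    and the observed time is min(T, C) with delta = 1 only when T <= C.
    """
    rng = random.Random(seed)
    m = len(beta)
    X, times, events = [], [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(m)]
        eta = sum(a * b for a, b in zip(x, beta))
        u = 1.0 - rng.random()                 # uniform on (0, 1], avoids log(0)
        t = (-math.log(u) / (scale * math.exp(eta))) ** (1.0 / shape)
        if censor_max is None:
            times.append(t)
            events.append(1)
        else:
            c = rng.uniform(0.0, censor_max)
            times.append(min(t, c))
            events.append(1 if t <= c else 0)
        X.append(x)
    return X, times, events
```

For example, `generate_synthetic(500, [0.5] * 15, censor_max=3.0)` produces data with the same shape as Syn1 (500 instances, 15 features), with hypothetical coefficients.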
• 18
Number of instances, number of features, training set size and active learning sampling size for each dataset
Dataset   #Inst   #Feat   Train (Sampling Size)
Breast      686      10   100 (20)
Colon       311      19    50 (10)
PBC         888      15   200 (20)
EHR 1      5675      98   500 (100)
EHR 2      4379      98   500 (100)
EHR 3      3543      98   500 (100)
EHR 4      2826      98   500 (50)
Syn1        500      15   100 (15)
Syn2        500      50   100 (15)
Syn3        100      50    50 (1)
• 19
Experimental Results
Dataset  L-COX  EN-COX  C-Boost  RSF    GBCI   ARC-L   ARC-EN  ARC-KEN
Breast   0.61   0.63    0.67     0.68   0.69   0.65    0.6856  0.734
Colon    0.651  0.65    0.62     0.60   0.64   0.738   0.735   0.859
PBC      0.735  0.759   0.86     0.863  0.79   0.81    0.825   0.862
EHR 1    0.54   0.55    0.59     0.58   0.59   0.60    0.64    0.671
EHR 2    0.56   0.5822  0.60     0.61   0.601  0.66    0.68    0.71
EHR 3    0.533  0.553   0.59     0.59   0.58   0.575   0.58    0.601
EHR 4    0.54   0.55    0.58     0.569  0.56   0.585   0.581   0.645
Syn1     0.59   0.628   0.60     0.61   0.589  0.7823  0.838   0.92
Syn2     0.801  0.815   0.86     0.94   0.93   0.86    0.867   0.921
Syn3     0.67   0.688   0.64     0.64   0.664  0.73    0.78    0.81
• 20
Comparison of RMSE (std) values of ARC
Dataset ARC(LASSO) ARC(EN) ARC(KEN)
Breast
Colon
PBC
EHR 1
EHR 2
EHR 3
EHR 4
Syn1
Syn2
Syn3
• 21
Active learning curves (panels: Breast, Colon, EHR 1, EHR 2)
• 22
Active learning curves, continued (panels: PBC, Synthetic1, Synthetic2, Synthetic3)
• 23
Conclusions and Future Work
We present an active learning extension to the Cox regression framework using a novel model discriminative gradient based sampling procedure.
The proposed method uses a fast and scalable optimization method that converges efficiently. Experimental results on EHR data and other censored datasets indicate that the proposed models have good discriminative ability and outperform competing survival regression methods.
Future work includes extending this active learning model to transfer learning using regularized Cox regression and accelerated failure time models.
• 24
References
J. P. Klein and M. L. Moeschberger. Survival analysis: techniques for censored and truncated data. Springer, 2003.
P. Sasieni. Cox regression model. Encyclopedia of Biostatistics, 1999.
N. Simon, J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for Cox proportional hazards model via coordinate descent. Journal of Statistical Software, 39(5):1–13, 2011.
B. Vinzamuri and C. K. Reddy. Cox regression with correlation based regularization for electronic health records. In Proceedings of the IEEE International Conference on Data Mining (ICDM), pages 757–767. IEEE, 2013.
Y. Chen, Z. Jia, D. Mercola, and X. Xie. A gradient boosting algorithm for survival analysis via direct optimization of concordance index. Computational and Mathematical Methods in Medicine, 2013.
B. Settles. Active learning literature survey. University of Wisconsin, Madison, 2010.
Y. Yang and H. Zou. A cocktail algorithm for solving the elastic net penalized Cox regression in high dimensions. Statistics and Its Interface, 2012.
http://dmkd.cs.wayne.edu/survival