Disease Outcome Prediction Using GAE
Disease Outcome Prediction Using Graph Auto-encoders
Juliette Valenchon
Department of Electrical and Computer Engineering, McGill University, Montreal
Under the supervision of Mark Coates
February 13, 2019
Introduction
Motivation
Head and Neck Cancer¹, Breast Cancer², Cardiovascular Disease³, Colorectal Cancer⁴, Lung Cancer⁵, Alzheimer’s Disease⁶
Image sources:
¹ Reproduced from "Head and Neck Cancer is not Just a Smoker’s Disease Anymore", Mount Sinai News.
² S. Roan, "Early Stage Breast Cancer: Do You Really Need Your Lymph Nodes Removed?", Everyday Health.
³ "Conquering Cardiovascular Disease", NIH.
⁴ "Colorectal Cancers", Dr. Fuhrman.
⁵ K. O’Sullivan, "New drug approved for advanced lung cancer by HSE", The Irish Times.
⁶ M. Casalino, "Alzheimer’s Association Offers Virtual Dementia Tour", Patch.
Introduction
Motivation
- Traditionally: risk calculator for the possibility of disease development
- Framingham study: prediction of hospitalization for long-term cardiovascular disease
Patient information → Risk calculator → Accuracy of 56 %
Patient information → Machine learning algorithm → Accuracy of 82 %
Figure 2: Comparison of a risk calculator and a machine learning algorithm⁷
⁷ W. Dai et al., "Prediction of hospitalization due to heart diseases by supervised learning methods", Int. J. Medical Informatics, vol. 84, no. 3, pp. 189–197, 2015.
Introduction
Motivation
Mild Cognitive Impairment (MCI): clinical diagnosis
Possible Alzheimer’s Disease (AD): irreversible disease, destroys brain cells
Stable
Dementia: 46.8 million people in 2015
→ Early and accurate diagnosis for an early treatment to improve the quality of life for some time
Introduction
Goal
- Predict conversion from MCI to AD
- Multimodal data with missing values: (a) MRI⁸, (b) DTI⁹, (c) PET¹⁰
- Use characteristics of subjects
⁸ Clinica developers, "Volume pre-processing - Clinica Documentation".
⁹ Rachel VanCott, "NOVA | scienceNOW | Diagnosing Damage image 3 | PBS".
¹⁰ University of California - Berkeley, "PET scans reveal key details of Alzheimer’s protein growth in aging brains".
Introduction
Problem formulation
Deal with missing values → Matrix completion
Perform classification → Label as feature
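One way to read "label as feature" is to append the label as an extra column of the subjects × features matrix and to hide the labels of the test subjects, so that classification becomes the completion of that column. The sketch below illustrates this reading with NumPy; the function name `append_label_column` and the toy values are hypothetical and do not come from the slides.

```python
import numpy as np

def append_label_column(features, labels, test_idx):
    """Stack the labels as the last column and mask test labels as missing (NaN)."""
    M = np.column_stack([features, labels.astype(float)])
    M[test_idx, -1] = np.nan      # labels that have to be predicted
    observed = ~np.isnan(M)       # True where a value is available
    return M, observed

# Toy data (assumed): 4 subjects, 2 features, one feature value already missing.
features = np.array([[0.2, 1.0],
                     [np.nan, 0.5],
                     [0.7, 0.1],
                     [0.9, 0.3]])
labels = np.array([1, 0, 1, 0])
M, observed = append_label_column(features, labels, test_idx=[2, 3])
```

Under this reading, a completion method that fills the NaN entries of `M` produces both the imputed features and the predicted labels in one pass.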
Introduction
Matrix completion
- Recover missing values by solving an optimization problem
- Loss function:
$$\ell = \lVert \Omega * (M - \hat{M}) \rVert + \gamma\, \ell_{\Omega_b}(M, \hat{M}) + \beta \sum_{i=1}^{q} W_i \qquad (1)$$
Figure: matrix M (m × n) with m subjects as rows and n−1 features plus one label column; each entry is either an available value or a missing value
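The loss of Eq. (1) combines a masked reconstruction term, a label term and a penalty on the weights. The slide does not specify the norm, the form of the label loss, or how the weights W_i are penalised, so the sketch below makes three explicit assumptions: a Frobenius norm for the reconstruction, binary cross-entropy for the label term, and the Frobenius norm of each W_i for the penalty. The function name `completion_loss` and all argument names are hypothetical.

```python
import numpy as np

def completion_loss(M, M_hat, omega, label_mask, weights, gamma=1.0, beta=1e-3):
    """Masked reconstruction + label term + weight penalty (all forms assumed)."""
    M_filled = np.nan_to_num(M)                          # NaNs are masked out below
    recon = np.linalg.norm(omega * (M_filled - M_hat))   # Frobenius norm (assumption)
    # Label term: binary cross-entropy on subjects with a known label (assumption).
    y = M_filled[label_mask, -1]
    p = np.clip(M_hat[label_mask, -1], 1e-7, 1 - 1e-7)
    label_term = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    reg = sum(np.linalg.norm(W) for W in weights)        # norm of each W_i (assumption)
    return recon + gamma * label_term + beta * reg

# Toy example (assumed values): last column holds the labels.
M = np.array([[0.5, np.nan, 1.0],
              [0.1, 0.4, 0.0],
              [0.3, 0.2, np.nan]])
M_hat = np.array([[0.5, 0.9, 0.8],
                  [0.1, 0.4, 0.2],
                  [0.3, 0.2, 0.6]])
omega = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [1, 1, 0]], dtype=float)       # 1 where a feature value is observed
label_mask = np.array([True, True, False])      # subjects whose label is known
loss = completion_loss(M, M_hat, omega, label_mask,
                       weights=[np.eye(2)], gamma=1.0, beta=0.0)
```

In this toy case the observed features are reconstructed exactly, so only the label term contributes to `loss`.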
Architecture
Graph methods for the prediction of MCI to AD conversion
Figure 5: Overview of the pipeline used for classification of population graphs using Graph Convolutional Networks. Reproduced from Parisot et al. [1]
- No missing data
- One graph
Architecture
Graph methods for the prediction of MCI to AD conversion
Figure 6: Vivar et al. [2]
- Matrix completion
- Missing data
- One graph
Architecture
A novel graph-based method for the prediction of MCI to AD conversion
Figure 7: Proposed architecture (subjects × features matrix M, bipartite graph, GAE)
- Matrix completion: Van den Berg et al. [3]
- Missing data
- Multiple graphs
Architecture
Defining the feature dependencies
Figure 8: Relationships of age and sex (men and women) with six different features in the case of Alzheimer’s disease. Panels: age-related features, sex-related features, age & sex features (x-axis: age; y-axis: feature value; color: sex).
Architecture
Bipartite graph
Figure: bipartite graph connecting subjects 1 to m with features 1 to n; the edge between subject i and feature j carries the value M(i,j), and the edges are split into groups 1 to K
- Relationship between a group of subjects and a group of features
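The grouping described above can be sketched as one weighted biadjacency matrix per (group of subjects, group of features) pair, with an edge wherever the value M(i,j) is observed. How the groups are actually formed is not given on the slide, so the masks below (women, men, sex-related and age-related features) are illustrative assumptions, and `bipartite_adjacencies` is a hypothetical helper name.

```python
import numpy as np

def bipartite_adjacencies(M, groups):
    """Return one weighted m x n biadjacency matrix per (subjects, features) group.

    An edge (i, j) belongs to group k when subject i and feature j are both in
    the group and M[i, j] is observed; its weight is M[i, j].
    """
    observed = ~np.isnan(M)
    values = np.nan_to_num(M)
    adjs = []
    for subj_mask, feat_mask in groups:
        in_group = subj_mask[:, None] & feat_mask[None, :]   # m x n boolean block
        adjs.append(np.where(in_group & observed, values, 0.0))
    return adjs

# Toy example (assumed): 3 subjects, 2 features, two missing values.
M = np.array([[1.0, 2.0],
              [np.nan, 3.0],
              [4.0, np.nan]])
women = np.array([True, False, True])
men = ~women
everyone = np.array([True, True, True])
sex_feat = np.array([True, False])    # feature 0 assumed sex-related
age_feat = np.array([False, True])    # feature 1 assumed age-related
adjs = bipartite_adjacencies(M, [(women, sex_feat), (men, sex_feat), (everyone, age_feat)])
```

A zero weight is ambiguous in this representation, since it is used both for "no edge" and for an observed value of 0; a real implementation would probably keep a separate edge mask.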
Architecture
Graph Auto-encoder
Figure: graph auto-encoder. Subjects are grouped into women and men; features into age-related, sex-related, and age & sex-related features. A graph encoder maps subject i and feature j to a subject embedding and a feature embedding, and a decoder produces M(i,j) from the two embeddings.
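The encoder and decoder stages can be illustrated with a single message-passing layer over the K bipartite graphs followed by a bilinear decoder, loosely in the spirit of the graph convolutional matrix completion model of Van den Berg et al. [3]. This is only a sketch under stated assumptions: the weights are random and untrained, edge values are used directly as message weights, and the names `encode`, `decode`, `W_subj`, `W_feat` and `Q` are hypothetical, not taken from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(adjs, W_subj, W_feat):
    """One message-passing layer over K bipartite graphs (m subjects, n features)."""
    m, n = adjs[0].shape
    d = W_subj[0].shape[1]
    H_subj, H_feat = np.zeros((m, d)), np.zeros((n, d))
    for A, Ws, Wf in zip(adjs, W_subj, W_feat):
        deg_s = np.maximum((A != 0).sum(axis=1, keepdims=True), 1)  # edges per subject
        deg_f = np.maximum((A != 0).sum(axis=0, keepdims=True), 1)  # edges per feature
        H_subj += (A / deg_s) @ Wf       # features send messages to subjects
        H_feat += (A / deg_f).T @ Ws     # subjects send messages to features
    return np.maximum(H_subj, 0), np.maximum(H_feat, 0)             # ReLU

def decode(H_subj, H_feat, Q):
    """Bilinear decoder (assumption): predicted M(i, j) = h_i^T Q h_j."""
    return H_subj @ Q @ H_feat.T

# Toy sizes (assumed): 4 subjects, 3 features, embedding size 2, 2 groups.
m, n, d, K = 4, 3, 2, 2
adjs = [rng.random((m, n)) for _ in range(K)]
W_subj = [rng.normal(size=(m, d)) for _ in range(K)]
W_feat = [rng.normal(size=(n, d)) for _ in range(K)]
H_subj, H_feat = encode(adjs, W_subj, W_feat)
M_hat = decode(H_subj, H_feat, Q=rng.normal(size=(d, d)))
```

Each group has its own weight matrices, so messages passing through different groups are transformed differently before being summed into one embedding per subject and per feature.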
Implementation details
Datasets
TADPOLE dataset
- 779 subjects
- 564 features
- 21 % missing data
Figure: matrix M (779 × 564), subjects × features, with available and missing values
Creation of a synthetic dataset
- 779 subjects
- 564 features
- No missing data
Figure: matrix M (779 × 564), subjects × features, with all values available
Implementation details
Creation of the synthetic dataset
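Each feature is age-, sex-, or age & sex-related:
$$M(i,j) = m_j f_{ij} + {}_{ij} + \varepsilon_{ij} + v_j * y_i \qquad (2)$$
$$f_{ij} = x_i \quad \text{if age} \qquad (3)$$
$$f_{ij} = s_i \quad \text{if sex} \qquad (4)$$
$$f_{ij} = s_i x_i \quad \text{if age \& sex} \qquad (5)$$
$$m_j \sim \mathcal{U}[-m, m] \qquad (6)$$
$${}_{ij} \sim \mathcal{U}[a, b] \qquad (7)$$
$$\varepsilon_{ij} \sim \mathcal{N}(0, \sigma) \qquad (8)$$
$$v_j \sim \mathcal{U}[c, d] \qquad (9)$$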
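Equations (2) to (9) can be turned into a small generator. The sketch below assumes that x_i is the (normalised) age, s_i the sex encoded as 0/1 and y_i the binary label, and that σ in Eq. (8) is a standard deviation. The uniform term of Eq. (7), whose symbol is not legible in the source, is called `offset` here. The function name `make_synthetic` and every default hyper-parameter are placeholders, not values from the presentation.

```python
import numpy as np

def make_synthetic(x, s, y, kinds, m_max=1.0, a=0.0, b=0.1,
                   sigma=0.05, c=0.0, d=0.5, seed=0):
    """Generate M following Eq. (2)-(9); default hyper-parameters are placeholders."""
    rng = np.random.default_rng(seed)
    n_subj, n_feat = len(x), len(kinds)
    # f_ij depends on the feature type: age, sex, or their product.
    base = {"age": x, "sex": s, "age_sex": s * x}
    F = np.column_stack([base[k] for k in kinds])          # Eq. (3)-(5)
    m_j = rng.uniform(-m_max, m_max, size=n_feat)          # Eq. (6)
    offset = rng.uniform(a, b, size=(n_subj, n_feat))      # Eq. (7), unnamed term
    eps = rng.normal(0.0, sigma, size=(n_subj, n_feat))    # Eq. (8), sigma as std
    v_j = rng.uniform(c, d, size=n_feat)                   # Eq. (9)
    return m_j * F + offset + eps + v_j * y[:, None]       # Eq. (2)

# Toy inputs (assumed values).
x = np.array([0.1, -0.4, 0.8, 0.3])    # normalised ages
s = np.array([0.0, 1.0, 1.0, 0.0])     # sex encoded as 0/1
y = np.array([0, 1, 0, 1])             # conversion label
M = make_synthetic(x, s, y, kinds=["age", "sex", "age_sex"])
```

The term v_j * y_i is what ties each feature to the label: the larger v_j, the more informative feature j is about the outcome.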
Implementation details
Evaluation measure for performance
- Chosen metric: AUC (Area Under the Curve), the integral of the ROC curve
- The ROC curve plots the true positive rate against the false positive rate as the decision threshold varies
- AUC ranges from 0 to 1, with 1 being the best; a random classifier scores 0.5
Figure 9: Perfect and random ROC curves (x-axis: false positive rate; y-axis: true positive rate)
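The AUC can be computed without drawing the curve, because the area under the ROC curve equals the probability that a randomly chosen positive subject receives a higher score than a randomly chosen negative one (ties counted as one half). The sketch below uses this pairwise formulation; the function name `roc_auc` and the toy scores are illustrative and not taken from the presentation.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    diff = pos[:, None] - neg[None, :]     # all positive/negative score pairs
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])   # 3 of 4 pairs ranked correctly
```

The pairwise comparison costs memory proportional to the number of positives times the number of negatives, which is fine for a few hundred subjects; a rank-based implementation would scale better.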
Results
Results on the real dataset
Figure: Comparison of four algorithms (Linear SVM, sRGCNN, GC-MC, RF) by AUC
Results
Results on the synthetic dataset
Figure: Comparison of three algorithms (Linear SVM, GC-MC, RF) by AUC
Results
Conclusion
- The proposed method performs better than the baseline methods, linear SVM and MLP
- Better performance than sRGCNN by 2.9 %
- Random Forest still performs better than the proposed method
Future work:
- Remove missing values in the dataset