A Deep Learning Model to Detect Anemia from Echocardiography


J Weston Hughes
Computer Science
Stanford University, Stanford CA
[email protected]

James Zou
Biomedical Data Science
Stanford University, Stanford CA
[email protected]

David Ouyang
Smidt Heart Institute
Cedars-Sinai Medical Center, Los Angeles CA
[email protected]

Abstract

Computer vision models applied to medical imaging can diagnose diseases beyond what human physicians can identify unassisted. This is especially the case in cardiology, where echocardiograms, electrocardiograms, and other imaging methods have been shown to contain large amounts of information beyond that described by simple clinical observation. Using 67,762 echocardiograms and temporally associated laboratory hemoglobin test results, we trained a video-based deep learning algorithm to predict abnormal lab values. On held-out test data, the model achieved an area under the curve (AUC) of 0.80 in predicting low hemoglobin. We applied smoothgrad to further understand the features used by the model, and compared its performance with a linear model based on demographics and features derived from the echocardiogram. These results suggest that advanced algorithms can obtain additional value from diagnostic imaging and identify phenotypic information beyond the ability of expert clinicians.

1 Introduction

Recent advances in computational techniques have shown that deep learning applied to medical images can identify phenotypes beyond what is currently possible by observation from human clinicians alone [3, 8, 2]. Such discoveries span a variety of medical imaging modalities and disease states, suggesting that medical imaging holds rich additional information that can predict a wide range of patient phenotypes, diagnoses, and outcomes. In cardiovascular imaging, deep learning has uncovered demographics such as age and sex [3], more complicated phenotypes such as heart failure [7], arrhythmias [1], and hypertrophic cardiomyopathy [4], and rivals human physicians in assessing ejection fraction [6]. Recent work has shown that electrocardiograms, electrical signals from the heart, can be used to diagnose anemia [5], a life-threatening disorder characterized by a low concentration of hemoglobin in the blood. Given that video understanding models have had great success in interpreting echocardiogram studies, or ultrasound videos of the heart, we hypothesized that applying computer vision to echocardiography could also be used to diagnose anemia. Anemia can be associated with a hyperdynamic cardiac state, but would be challenging for human clinicians to identify on echocardiograms.

Echocardiography is the most common form of cardiovascular imaging, combining cost-effective image acquisition, lack of ionizing radiation, and high temporal resolution to capture spatiotemporal information on cardiac motion and function. Variations in physiologic state, including disturbances in electrolytes and biomarkers, can greatly influence cardiac function and, over time, structure. In this study, we applied a deep learning algorithm to a large dataset of transthoracic echocardiograms collected over twenty years to answer whether echocardiograms can predict anemia.

Figure 1: Our experimental setup. We compiled a large dataset of echocardiograms and hemoglobin lab values acquired through the course of clinical care. We used a video understanding model (ResNet2+1D) to predict hemoglobin concentration from each echocardiogram, and used the prediction to diagnose anemia.

2 Methods

We compiled a dataset of 67,762 echocardiograms performed in the course of clinical care at Stanford hospital from patients who also received a hemoglobin lab test. A single apical-4-chamber 2D gray-scale video was identified from each echocardiogram study and used to represent the individual patient for mapping to laboratory values. Echocardiograms were pre-processed to standard resolution, and identifying information outside of the ultrasound sector such as text, ECG, and respirometer information was removed. From the electronic health record, laboratory values were extracted and paired with the representative echocardiogram video. A total of 67,762 videos were split by patient identifier into 58,116 videos for training, 4,841 videos for validation, and 4,805 videos for testing model performance, such that the same patient was never seen in multiple splits of the data. This research was approved by the Stanford University Institutional Review Board.
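In pseudocode form, the pairing and patient-level splitting can be sketched roughly as follows; the dataframe column names and split fractions are illustrative assumptions rather than the study's actual schema.

```python
# Sketch of pairing each echocardiogram with its nearest-in-time hemoglobin
# value and splitting by patient. Column names and fractions are assumptions.
import numpy as np
import pandas as pd


def pair_echo_with_nearest_lab(echos, labs, max_days=None):
    """Attach to each echo the hemoglobin value closest in time; drop echos
    with no paired lab (or none within max_days, for validation/test)."""
    rows = []
    for _, echo in echos.iterrows():
        patient_labs = labs[labs.patient_id == echo.patient_id]
        if patient_labs.empty:
            continue
        dt = (patient_labs.lab_time - echo.echo_time).abs()
        if max_days is not None and dt.min() > pd.Timedelta(days=max_days):
            continue
        row = echo.to_dict()
        row["hemoglobin"] = patient_labs.loc[dt.idxmin(), "hemoglobin"]
        rows.append(row)
    return pd.DataFrame(rows)


def split_by_patient(df, val_frac=0.07, test_frac=0.07, seed=0):
    """Split so that no patient appears in more than one split."""
    patients = df.patient_id.unique()
    rng = np.random.default_rng(seed)
    rng.shuffle(patients)
    n_val, n_test = int(len(patients) * val_frac), int(len(patients) * test_frac)
    val_ids = set(patients[:n_val])
    test_ids = set(patients[n_val:n_val + n_test])
    train = df[~df.patient_id.isin(val_ids | test_ids)]
    return train, df[df.patient_id.isin(val_ids)], df[df.patient_id.isin(test_ids)]
```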

Models were built using Python 3.8 and PyTorch 1.4. Following [6], we chose the ResNet2+1D architecture [10] with 32 frames of input, sampling every second frame from the original video. The model was trained to regress hemoglobin concentration using mean squared error. During model training, the hemoglobin value closest in time to each video was used as the training label, and videos were excluded from training if the patient did not have a corresponding laboratory value pair. In the validation and test sets, the same process was applied with the additional constraint that only labels acquired within 30 days of the echocardiogram were included. During evaluation, anemia was defined as having a concentration of hemoglobin below 10 grams/100 ml, which was used as a threshold for converting to a binary diagnosis and when calculating area under the receiver operating curve (AUROC).
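A minimal sketch of this setup, assuming the ResNet(2+1)D backbone available in torchvision; the study's exact implementation may differ.

```python
# Hemoglobin regression model and clip sampling, roughly as described above.
# Assumes torchvision's r2plus1d_18 backbone; the study's code may differ.
import torch
import torch.nn as nn
from torchvision.models.video import r2plus1d_18


def build_model():
    model = r2plus1d_18(pretrained=False)
    model.fc = nn.Linear(model.fc.in_features, 1)  # single regression output (g/100 ml)
    return model


def sample_clip(video, start=0, frames=32, stride=2):
    """video: (C, T, H, W) tensor; take 32 frames, sampling every second frame."""
    idx = torch.arange(start, start + frames * stride, stride)
    return video[:, idx]
```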

The model was trained to minimize the squared loss between the prediction and the true lab value. Model training used a stochastic gradient descent optimizer with an initial learning rate of 0.001, momentum of 0.9, and batch size of 20 for 45 epochs. The learning rate was decayed by a factor of 0.1 every 15 epochs. The weights from the epoch with the best validation AUROC were selected for final testing. Our final model averages predictions over all possible 32-frame clips across the entire echocardiogram video to account for potential variance between beats.
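A rough sketch of the training loop and test-time clip averaging with the hyperparameters reported above; dataloader wiring and device placement are omitted and assumed.

```python
# Training loop (SGD, lr 0.001, momentum 0.9, 45 epochs, lr x0.1 every 15 epochs)
# and test-time averaging over all 32-frame clips. Dataloader wiring is assumed.
import torch
import torch.nn as nn


def train(model, train_loader, epochs=45):
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=15, gamma=0.1)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        model.train()
        for clips, hemoglobin in train_loader:  # clips: (B, C, 32, H, W)
            optimizer.zero_grad()
            pred = model(clips).squeeze(1)
            loss = loss_fn(pred, hemoglobin.float())
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model


@torch.no_grad()
def predict_video(model, video, frames=32, stride=2):
    """Average the model's prediction over every possible 32-frame clip."""
    model.eval()
    num_frames = video.shape[1]  # video: (C, T, H, W)
    preds = []
    for start in range(num_frames - frames * stride + 1):
        clip = video[:, start:start + frames * stride:stride].unsqueeze(0)
        preds.append(model(clip).item())
    return sum(preds) / len(preds)
```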

Table 1: Results. AUC and R2 of the convolutional model based on the echocardiogram, and of a linear model based on features known to be present in the echocardiogram. The linear model fails to perform better than random, demonstrating that the convolutional model picks up novel features.

Metric   Vision Model   Regression Baseline
AUC      0.80           0.49
R2       0.34           0.04


Figure 2: Two examples with low predicted hemoglobin and two with high predicted hemoglobin, with an overlaid smoothgrad saliency map in green. The areas of highest importance are around the mitral valves and crux of the heart.

One way a model might learn to predict anemia is to predict other statistics which are contained in the echocardiogram data, and use those statistics to predict anemia. For example, age, sex, and left ventricular ejection fraction can all be predicted from echocardiogram with high accuracy [3, 6]. To determine if the model truly learned novel features, we trained a logistic regression model based on age, binary sex, and ejection fraction, and compared results with our deep learning model.
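A sketch of this baseline comparison, assuming the tabulated features are already available; the exact model settings are not specified here and are illustrative.

```python
# Baseline described above: logistic regression on age, binary sex, and
# ejection fraction to predict anemia (and a linear model for hemoglobin).
# Feature extraction and model settings are assumptions, not the study's code.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import r2_score, roc_auc_score


def baseline_performance(X_train, hgb_train, X_test, hgb_test, threshold=10.0):
    """X columns: [age, binary_sex, ejection_fraction]; hemoglobin in g/100 ml."""
    y_train = hgb_train < threshold
    y_test = hgb_test < threshold
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
    reg = LinearRegression().fit(X_train, hgb_train)
    r2 = r2_score(hgb_test, reg.predict(X_test))
    return auc, r2
```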

To further understand the features the model is learning, we applied smoothgrad [9] to the examples with the most extreme hemoglobin predictions. Smoothgrad is a method for generating sensitivity maps for single model inputs: several copies of the example are perturbed with Gaussian noise, and the gradient of the model output with respect to each perturbed example is computed and averaged. This produces a heat map of gradients over different regions of the input, highlighting important regions. While smoothgrad is generally used on still image data, it extends naturally to video by independently perturbing every pixel in every frame and taking the gradient with respect to each of them.
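A minimal sketch of smoothgrad extended to video in this way; the number of noise samples and the noise scale are illustrative assumptions.

```python
# SmoothGrad [9] for a video clip: add Gaussian noise to every pixel of every
# frame, average the absolute input gradients. Hyperparameters are assumptions.
import torch


def smoothgrad_video(model, clip, n_samples=25, noise_frac=0.15):
    """clip: (C, T, H, W) tensor. Returns a saliency map of the same shape."""
    model.eval()
    sigma = noise_frac * (clip.max() - clip.min())
    saliency = torch.zeros_like(clip)
    for _ in range(n_samples):
        noisy = (clip + sigma * torch.randn_like(clip)).unsqueeze(0)
        noisy = noisy.detach().requires_grad_(True)
        model(noisy).sum().backward()
        saliency += noisy.grad.squeeze(0).abs()
    return saliency / n_samples
```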

3 Results

We predicted blood hemoglobin concentration from patient echocardiograms and present accuracy results in Table 1. Prediction was set up as a binary classification task, predicting normal versus abnormal lab value based on standard thresholds. To understand model generalization, the model was evaluated on a held-out test set not used in any way during model development, from a set of patients completely disjoint from those used during training. Using a binary threshold of 10 grams/100 ml for diagnosing anemia, the model achieved an AUC of 0.80 (95% CI: 0.79-0.81). Additionally, the model achieved an R2 score of 0.34 (95% CI: 0.316-0.362) in predicting hemoglobin concentration.
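One plausible way to compute these metrics from the continuous hemoglobin predictions; the scoring convention of negating the prediction for AUROC is an assumption, and bootstrapped confidence intervals are omitted.

```python
# Evaluation sketch: AUROC uses the continuous predicted hemoglobin as the
# score (lower predicted hemoglobin -> higher anemia risk); R2 compares the
# regression output to the true lab value. Confidence intervals are omitted.
import numpy as np
from sklearn.metrics import r2_score, roc_auc_score


def evaluate(pred_hgb, true_hgb, threshold=10.0):
    pred_hgb = np.asarray(pred_hgb, dtype=float)
    true_hgb = np.asarray(true_hgb, dtype=float)
    is_anemic = true_hgb < threshold             # binary label at 10 g/100 ml
    auroc = roc_auc_score(is_anemic, -pred_hgb)  # negate: lower hgb = positive class
    r2 = r2_score(true_hgb, pred_hgb)
    return auroc, r2
```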

To understand the features used by our model, we performed two additional analyses. Previous work [3, 6] has shown that age, binary sex, and left ventricular ejection fraction are readily predictable from echocardiogram. To determine if the model was using these known existing features to make its prediction, we trained linear models to predict anemia and hemoglobin based on age, binary sex, and ejection fraction. The models achieved a test AUROC of 0.49 (95% CI: 0.49-0.51) in predicting anemia, and an R2 of 0.04 in predicting hemoglobin. This demonstrates that the model is making use of features other than those previously shown to be present in echocardiogram. Second, we used smoothgrad to highlight the features the model "focuses" on, shown in Figure 2.

4 Discussion

Echocardiography is a common, noninvasive imaging modality without ionizing radiation and with rapid acquisition and high temporal resolution, and recent advances in deep learning models applied to echocardiography have shown impressive results in identifying features from the videos that directly relate to the clinical interpretation of echocardiograms. Echocardiography, however, is not used to determine laboratory test values such as hemoglobin. Surprisingly, hemoglobin can be readily predicted from echocardiography videos, extending prior work in other modalities that shows a similar relationship [5]. One exciting future direction is exploring whether this result extends to other common lab values, such as NT-proBNP, A1c, and troponin. External validation at a different hospital would demonstrate the generalizability of the results.


5 Broader Impact

Human interpretation of medical imaging relies on experience, has significant person-to-person heterogeneity, and has been suggested to not capture all the information of the imaging. Computer vision applied to medical imaging can identify previously unknown relationships, standardize interpretations, and leverage information to limit additional testing. The most obvious societal impact of this work is to improve clinical care for patients receiving echocardiograms. It also contributes to the broader scientific and clinical understanding of what information echocardiograms contain, potentially suggesting future research directions with further clinical impact.

References

[1] Zachi I Attia et al. "An artificial intelligence-enabled ECG algorithm for the identification of patients with atrial fibrillation during sinus rhythm: a retrospective analysis of outcome prediction". In: The Lancet 394.10201 (2019), pp. 861–867.

[2] Antonin Dauvin et al. "Machine learning can accurately predict pre-admission baseline hemoglobin and creatinine in intensive care patients". In: NPJ Digital Medicine 2.1 (2019), pp. 1–10.

[3] Amirata Ghorbani et al. "Deep learning interpretation of echocardiograms". In: NPJ Digital Medicine 3.1 (2020), pp. 1–10.

[4] Wei-Yin Ko et al. "Detection of hypertrophic cardiomyopathy using a convolutional neural network-enabled electrocardiogram". In: Journal of the American College of Cardiology 75.7 (2020), pp. 722–733.

[5] Joon-myoung Kwon et al. "A deep learning algorithm to detect anaemia with ECGs: a retrospective, multicentre study". In: The Lancet Digital Health 2.7 (2020), e358–e367.

[6] David Ouyang et al. "Video-based AI for beat-to-beat assessment of cardiac function". In: Nature (2020), pp. 1–5.

[7] Perry J Pickhardt et al. "Automated CT biomarkers for opportunistic prediction of future cardiovascular events and mortality in an asymptomatic screening population: a retrospective cohort study". In: The Lancet Digital Health (2020).

[8] Ryan Poplin et al. "Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning". In: Nature Biomedical Engineering 2.3 (2018), p. 158.

[9] Daniel Smilkov et al. "SmoothGrad: removing noise by adding noise". In: arXiv preprint arXiv:1706.03825 (2017).

[10] Du Tran et al. "A closer look at spatiotemporal convolutions for action recognition". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018, pp. 6450–6459.
