Implementation of a classification strategy of Raman data ...


RESEARCH PAPER

Implementation of a classification strategy of Raman data collected in different clinical conditions: application to the diagnosis of chronic lymphocytic leukemia

M. Féré 1 & C. Gobinet 1 & L. H. Liu 1 & A. Beljebbar 1 & V. Untereiner 2 & D. Gheldof 3 & M. Chollat 4 & J. Klossa 4 & B. Chatelain 3 & O. Piot 1,2

Received: 16 September 2019 / Revised: 31 October 2019 / Accepted: 3 December 2019
© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract

The literature is rich in proof of concept studies demonstrating the potential of Raman spectroscopy for disease diagnosis. However, few studies are conducted in a clinical context to demonstrate its applicability in current clinical practice and workflow. Indeed, this translational research remains far from the patient’s bedside for several reasons. First, samples are often cultured cell lines. Second, they are prepared on non-standard substrates for clinical routine. Third, a unique supervised classification model is usually constructed using an inadequate cross-validation strategy. Finally, the implemented models maximize classification accuracy without taking into account the clinician’s needs. In this paper, we address these issues through a diagnosis problem in real clinical conditions, i.e., the diagnosis of chronic lymphocytic leukemia from fresh unstained blood smears spread on glass slides. From Raman data acquired in different experimental conditions, a repeated double cross-validation strategy was combined with different cross-validation approaches, a consensus label strategy and adaptive thresholds able to adapt to the clinician’s needs. Combined with validation at the patient level, classification results were improved compared to traditional strategies.

Keywords Raman spectroscopy . Chronic lymphocytic leukemia . Pre-processing . Supervised classification algorithms . Label consensus . Clinical practice

Introduction

Raman spectroscopy is a label-free biophotonic technique, applicable to the analysis of complex biological samples, such as biofluids, cells, and tissues. It gives access to the global molecular composition of biological samples. With this technology, it is possible to detect biochemical changes caused by various diseases such as cancer [1–3] or metabolic pathologies [4–7].

The literature is rich in proof of concept studies demonstrating that Raman spectroscopy has a high potential to improve disease diagnosis [3, 8–11] or predict its progression [12, 13] for a better personalization of patient care. However, beyond the proofs of concept, few studies are conducted in a clinical context in order to demonstrate the applicability of Raman spectroscopy in current clinical practice and workflow. Indeed, this translational research remains far from the patient’s bedside for several reasons.

For example, samples are often cultured cell lines [14–16] or prepared on non-standard substrates [17–20].

Furthermore, concerning the construction of prediction models, the literature is highly heterogeneous in its choices of training–validation strategy and supervised classification algorithm [8, 10, 21–24]. Inappropriate choices can lead to overfitting, i.e., models with poor generalization properties on unknown data. Indeed, models that are too specialized to the training set are unable to correctly predict the class of new unknown data, resulting in a large difference between training and validation accuracies. Overfitting is highly dependent on how well the training set represents the population. A training set that covers all the possible sources of variability will be less exposed to overfitting. As a direct consequence, classifiers trained on small datasets, which capture only a part of the population characteristics, are more exposed to overfitting. Overfitting is also intimately linked to model complexity. Thus, the challenge is to collect a dataset that is representative enough of the population variability to train a model whose training and validation accuracies are both as high and as close to each other as possible.

* C. Gobinet [email protected]

1 BioSpecT EA 7506, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096 Reims, France

2 Cellular and Tissular Imaging Platform PICT, Faculty of Pharmacy, University of Reims Champagne-Ardenne, 51 rue Cognacq-Jay, 51096 Reims, France

3 CHU UCL Namur, Namur Thrombosis and Hemostasis Center, Hematology Laboratory, Rue Dr Gaston Therasse, Catholic University of Louvain, 5530 Yvoir, Belgium

4 TRIBVN, 39 Rue Louveau, 92320 Châtillon, France

Analytical and Bioanalytical Chemistry
https://doi.org/10.1007/s00216-019-02321-z

Moreover, in most studies, a single model is built. However, the selection of the training set has a direct impact on the model estimation and thus on its performance. The training set should therefore be as representative as possible of the population characteristics. Other strategies, known as ensemble learning, consist in combining the predictions of several classifiers, resulting in better generalization performance than a unique classification model [25].

Finally, the implemented models are generally balanced between sensitivity and specificity, i.e., the final prediction is based on a decision threshold equal to 0.5. However, some clinical applications may require the prioritization of either sensitivity or specificity, which can be achieved by decision threshold optimization [26].

In our approach, we have tried to address these different points by establishing a training, validation and optimization strategy to create prediction models that are stable, not prone to overfitting, and able to adapt to clinical demands. For this, we worked in the context of chronic lymphocytic leukemia (CLL). Raman data were collected under different experimental conditions on unstained blood smears, spread manually or automatically on standard glass slides, during two different measurement campaigns. In our study, we used an innovative approach based on the strategy of repeated double cross-validation (rdCV) combined with validation at the patient level, in order to limit overfitting as much as possible. We also tested different cross-validation approaches (leave-one-patient-out cross-validation [27] (LOPOCV), K-fold cross-validation [28] (KFCV), and Monte-Carlo cross-validation [29] (MCCV)) to evaluate their effects on classification performance. For the problem of model stability, we have developed a solution using the consensus label strategy, which makes a decision using a combination of models, unlike unique model prediction, where performance fluctuates according to the training set used. In order to adapt the final prediction to the clinical context, our method has adaptive thresholds that allow the clinician either to promote sensitivity or to strike a balance between sensitivity and specificity.

Materials and methods

Patients

In this study, one group of 61 healthy patients and one group of 79 untreated B-CLL patients, with a Matutes score over 3 and a stage A in the Binet classification, were formed from two different measurement campaigns.

The first one was carried out in the years 2010–2011 during the ANR TecSan IHMO project and included 25 healthy and 45 B-CLL patients recruited at the Reims Champagne-Ardenne hospital center (RCA-HC) [28]. The second campaign, performed in the years 2015–2016 during the EU CIP ICT PSP M3S project, included 36 healthy and 34 B-CLL patients recruited at Mont Godinne-Namur hospital center (MGN-HC). Written informed consent was obtained from the patients in accordance with the approved local ethics committees (NUB: B039201628170).

Sample preparation

For each patient, one smear was prepared directly in the recruitment hospital, by spreading a blood drop deposited on a conventional glass slide in order to work in classical clinical conditions. At RCA-HC, during the IHMO project, manual spreading was used, while automatic spreading with an automated blood-smearing device (HemaPrep) was preferred at MGN-HC, during the M3S project, both without prior chemical treatment. Each patient’s blood was systematically analyzed at the hospital by flow cytometry in order to know with certainty the pathophysiological label of the patient (healthy or CLL).

Acquisition of Raman spectra

For both IHMO and M3S measurement campaigns, Raman data were acquired with a multimodal device developed by the TRIBVN company (Châtillon, France) combining a conventional microscope (ECLIPSE FN1, Nikon SA, Champigny-sur-Marne, France) and a Raman spectrometer (HORIBA FRANCE SAS, France).

The microscope was equipped with a motorized XYZ stage (Ludl Electronic Products Ltd., New York, USA) and two dry lenses (Nikon): (i) a × 40 lens (NA 0.6) used to localize the position of approximately eighty lymphocytes on each smear, (ii) a high magnification lens (× 100/NA 0.9 in IHMO project and × 150/NA 0.95 in M3S project) used for Raman acquisitions on lymphocytes.

The Raman spectrometer was composed of a 532 nm laser excitation source (Toptica Photonics, Munich, Germany) delivering a power of 13 mW on the sample, a holographic grating of 1200 grooves/mm, a Peltier-cooled (−70 °C) CCD (charge-coupled device) detector (Andor Technology, South Windsor, CT, USA) of 1024 × 256 pixels, and a 100 μm confocal hole.

This setup leads to an XY spatial resolution of 1 μm, an axial resolution of 2 μm, a spectral range from 700 to 3170 cm−1, and a spectral resolution of 4 cm−1.


For the IHMO campaign, one Raman spectrum was acquired on the nucleus of each of 2596 healthy and 4257 B-CLL cells with an acquisition time fixed to two accumulations of 10 s, corresponding to 6853 acquired Raman spectra.

For the M3S project, three Raman spectra were acquired at three different positions on the nucleus of 1804 healthy and 3128 B-CLL cells, corresponding to 14,796 Raman spectra, each acquired during one accumulation of 10 s. For this study, the 3 spectra of each cell were averaged.

In summary, the experimental differences between IHMO and M3S measurement campaigns were (i) the smear spreading method (manual for IHMO project and automatic for M3S project), (ii) the high magnification lens (× 100/NA 0.9 for IHMO project and × 150/NA 0.95 for M3S project), (iii) the number of accumulations for each spectrum (two for IHMO project and one for M3S project), and (iv) the number of spectra acquired per lymphocyte (one for IHMO project, and three for M3S project).

Each cell is thus characterized by its nucleus Raman spectrum and its physiopathological label (healthy or B-CLL) used in further supervised classification.

The reader can refer to our previous article to visualize the means and standard deviations of the Raman spectra acquired during the IHMO and M3S campaigns [30].

Quality tests

The classification results are highly dependent on the quality of training data. Only high quality Raman spectra should be retained for the construction of robust and optimal models [31]. Different statistical methods exist to identify abnormal spectra, such as the Hotelling’s T-squared statistic and the Q residuals [32]. Obtained from a principal component analysis, these statistics depend on the whole dataset and are efficient at finding outliers in datasets that follow a multivariate normal distribution, which is not the case in our study. We thus preferred to detect outliers individually from their known characteristics in Raman spectroscopy applied to lymphocytes. Four quality tests were designed in order to quantify the contribution of various disturbing factors [28].

The signal-to-noise ratio (SNR) was computed by dividing the standard deviation of the first derivative of the lymphocyte signal in the 2800–3150 cm−1 range by the standard deviation of the first derivative of the noise in the 1800–2200 cm−1 range. First derivatives were used in order to remove linear baseline effects from the computation of the signal standard deviation. Spectra with an SNR smaller than a predefined threshold were removed from further analysis. A nominal threshold of two was determined as optimal. With this quality test, 5.8% of spectra from the M3S project and 7.8% of spectra from the IHMO project were discarded. More spectra could be kept by decreasing this threshold, but it was intentionally chosen high in this study in order to train the supervised classification models on high quality spectra exclusively.
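The SNR test described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the first derivative is approximated by a finite difference between neighbouring points, and the population standard deviation is used, none of which is specified in the text.

```python
from statistics import pstdev

def first_derivative(values):
    # Finite-difference approximation of the first derivative (assumption).
    return [b - a for a, b in zip(values, values[1:])]

def band(wavenumbers, intensities, low, high):
    # Intensities whose wavenumber lies in [low, high] (cm^-1).
    return [y for x, y in zip(wavenumbers, intensities) if low <= x <= high]

def snr(wavenumbers, intensities,
        signal_band=(2800, 3150), noise_band=(1800, 2200)):
    signal = first_derivative(band(wavenumbers, intensities, *signal_band))
    noise = first_derivative(band(wavenumbers, intensities, *noise_band))
    noise_sd = pstdev(noise)
    if noise_sd == 0:
        return float("inf")  # perfectly flat noise region
    return pstdev(signal) / noise_sd

def passes_snr_test(wavenumbers, intensities, threshold=2.0):
    # Spectra with an SNR smaller than the threshold are discarded.
    return snr(wavenumbers, intensities) >= threshold
```

Because a linear baseline only adds a constant to the finite difference, it leaves both standard deviations unchanged, which is the property the text relies on.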

Lymphocytes may be close to or superimposed on red blood cells, inducing Raman spectra contaminated by a red blood cell Raman signature. The lymphocyte signal to hemoglobin signal ratio (LSHSR) was computed as the ratio between the maximum intensity of the lymphocyte signal in the 1656–1720 cm−1 range and the maximum intensity of the hemoglobin signal in the 1540–1656 cm−1 range. Spectra with an LSHSR smaller than a predefined threshold were removed from the analysis. A nominal threshold of 1 was determined as optimal. Thus, 0.2% of spectra from the M3S project and 1.6% of spectra from the IHMO project were discarded by this quality test.
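As an illustration, the LSHSR test can be written as follows. The function names are hypothetical, both band limits are treated as inclusive (an assumption, since the two ranges share the 1656 cm−1 boundary), and intensities are assumed to be positive.

```python
def band_max(wavenumbers, intensities, low, high):
    # Maximum intensity among points whose wavenumber lies in [low, high].
    return max(y for x, y in zip(wavenumbers, intensities) if low <= x <= high)

def lshsr(wavenumbers, intensities,
          lymphocyte_band=(1656, 1720), hemoglobin_band=(1540, 1656)):
    lymphocyte = band_max(wavenumbers, intensities, *lymphocyte_band)
    hemoglobin = band_max(wavenumbers, intensities, *hemoglobin_band)
    return lymphocyte / hemoglobin

def passes_lshsr_test(wavenumbers, intensities, threshold=1.0):
    # Spectra with an LSHSR smaller than the threshold are discarded.
    return lshsr(wavenumbers, intensities) >= threshold
```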

Due to the use of a CCD detector, acquired Raman spectra can be contaminated by spikes generated by cosmic rays [31]. A spectrum is discarded if at least one of its intensities exceeds the mean spectrum of the whole dataset plus five times its standard deviation. In these conditions, 2.1% of spectra from the M3S project and 3.1% of spectra from the IHMO project were discarded by this quality test. Other strategies consist in the correction of the contaminated spectra, but they were not implemented in this work because they require parameter optimization, which is often manual, with mixed results.
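A possible reading of this spike test is sketched below, assuming that the mean and standard deviation are computed wavenumber by wavenumber over the whole dataset and that the population standard deviation is used. The function names are hypothetical.

```python
from statistics import mean, pstdev

def spike_limits(spectra, k=5.0):
    # Per-wavenumber limit: dataset mean plus k times the dataset standard deviation.
    columns = list(zip(*spectra))
    return [mean(column) + k * pstdev(column) for column in columns]

def has_spike(spectrum, limits):
    # True if at least one intensity exceeds its limit.
    return any(y > limit for y, limit in zip(spectrum, limits))

def spike_free_indices(spectra, k=5.0):
    limits = spike_limits(spectra, k)
    return [i for i, s in enumerate(spectra) if not has_spike(s, limits)]
```

Note that a spike inflates the standard deviation it is compared against, so such a rule is only meaningful on datasets of more than a few dozen spectra, which is the case here.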

Due to cell fluorescence, Raman spectra can be saturated. Saturation is characterized by several consecutive wavenumbers having the same recorded intensity, equal to the maximum allowed by the CCD. Spectra presenting such saturation are removed from the analysis. Here, 0.1% of spectra from the M3S project were discarded by this quality test. No spectrum from the IHMO project was saturated.
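The saturation test amounts to searching for a run of consecutive points at the detector maximum. In the sketch below, the minimum run length (`min_run`) and the detector maximum (`ccd_max`) are illustrative parameters that are not given in the text.

```python
def is_saturated(spectrum, ccd_max, min_run=3):
    # True if at least `min_run` consecutive intensities reach the CCD maximum.
    run = 0
    for intensity in spectrum:
        run = run + 1 if intensity >= ccd_max else 0
        if run >= min_run:
            return True
    return False
```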

The application of these four quality tests discarded approximately 416 spectra (8%) from the M3S project and 822 spectra (12%) from the IHMO project. Even if the proportion of discarded spectra may seem high, it should be noted that 84 lymphocytes on average were studied for each patient by Raman spectroscopy. Thus, on average, the quality tests discarded 8 cells per patient from further analysis. The final diagnosis was therefore made for each patient using 76 lymphocytes on average, which is still statistically sufficient to make a decision.

Spectral data pre-processing

To make data from the two campaigns comparable for further classification, a specific pre-processing was carried out. The first step consists in the application of a Savitzky-Golay smoothing [33], using a window length of 9 points and a third order polynomial function, in order to reduce random noise in the Raman spectra. The second step consists in the application of a collective method based on extended multiplicative signal correction (EMSC) [34], using the mean spectrum S of the entire dataset as the reference spectrum to guide all corrections. Using an EMSC model, the baseline and the glass signal were neutralized simultaneously using a fourth order polynomial function and a mean Raman spectrum of glass [35], respectively. Finally, the spectra were normalized around the reference spectrum S.

The efficiency of this pre-processing protocol has been demonstrated in our previous article [30].
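For readers unfamiliar with EMSC, one common way of writing a model consistent with the description above is given here. The notation is an assumption introduced for illustration: s_i is the i-th smoothed spectrum, S the reference (mean) spectrum, G the mean glass spectrum, and b_i, g_i, c_{i,k} the coefficients estimated by least squares for each spectrum.

```latex
% EMSC model: reference contribution, glass contribution,
% fourth order polynomial baseline, and residual e_i
s_i(\tilde{\nu}) = b_i \, S(\tilde{\nu}) + g_i \, G(\tilde{\nu})
                 + \sum_{k=0}^{4} c_{i,k} \, \tilde{\nu}^{k} + e_i(\tilde{\nu})

% Corrected spectrum: interferents subtracted, then scaled to the reference
s_i^{\mathrm{corr}}(\tilde{\nu}) =
  \frac{s_i(\tilde{\nu}) - g_i \, G(\tilde{\nu})
        - \sum_{k=0}^{4} c_{i,k} \, \tilde{\nu}^{k}}{b_i}
```

Under this formulation, dividing by b_i is what normalizes each spectrum around the reference spectrum S.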

Datasets for numerical processing

The pre-processed Raman data were divided into two sets. The first one (Dataset 1) is composed of 41 healthy patients (15 from IHMO and 26 from M3S campaigns) and 59 B-CLL patients (35 from IHMO and 24 from M3S campaigns). This dataset is used for feature selection and optimization of supervised classification models as described below. The second set (Dataset 2) is composed of 20 healthy patients (10 from IHMO and 10 from M3S campaigns) and 20 B-CLL patients (10 from IHMO and 10 from M3S campaigns). The optimized classification models are blindly tested on this independent test set in order to assess their performance. These two datasets were constructed by random selection of patients.

Feature selection

In this work, a canonical correlation analysis [30, 36] was applied to Dataset 1 in order to achieve a supervised feature selection aiming at identifying the discriminant variables between Raman spectra acquired on lymphocytes of healthy and CLL patients.

To further reduce the number of selected variables, a rapid and simple supervised data dimension reduction was developed in order to remove redundant selected variables. The first step consisted in the computation of the Fisher-score at each selected wavenumber. The second step computed the wavenumber correlation coefficient matrix Rλ from the dataset. In a third step, the elements of Rλ smaller than a threshold fixed to 0.7 were removed. Each row of Rλ was thus composed of highly correlated wavenumbers. In the fourth step, each row was reduced to its wavenumber having the highest Fisher-score computed in the first step, i.e., the most discriminant wavenumber. The last step computed the unique wavenumbers composing Rλ, i.e., removed repeated wavenumbers. At the end of this fast procedure, all the spectral information is summarized in its most representative and discriminant wavenumbers. The aim of this supervised data dimension reduction is to save computational time during the forthcoming supervised classification steps. However, it can be omitted without noticeable consequences on the forthcoming supervised classification results (data not shown).
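The five steps of this redundancy removal can be sketched as follows. This is an illustrative reading, not the authors' code: the two-class Fisher score formula, the use of the signed Pearson coefficient, and all function names are assumptions.

```python
from statistics import mean, pvariance

def fisher_score(healthy, cll):
    # Two-class Fisher score (one common definition, assumed here).
    return (mean(healthy) - mean(cll)) ** 2 / (pvariance(healthy) + pvariance(cll))

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def reduce_redundancy(spectra, labels, threshold=0.7):
    """Return the indices of the retained wavenumbers.

    spectra: list of spectra (rows), labels: 0 = healthy, 1 = B-CLL.
    """
    columns = list(zip(*spectra))
    # Step 1: Fisher score at each wavenumber.
    scores = [
        fisher_score([v for v, l in zip(col, labels) if l == 0],
                     [v for v, l in zip(col, labels) if l == 1])
        for col in columns
    ]
    kept = set()
    for col_i in columns:
        # Steps 2-3: wavenumbers correlated with this one above the threshold.
        group = [j for j, col_j in enumerate(columns)
                 if pearson(col_i, col_j) >= threshold]
        # Step 4: keep the most discriminant wavenumber of the row.
        kept.add(max(group, key=lambda j: scores[j]))
    # Step 5: unique wavenumbers only.
    return sorted(kept)
```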

Supervised classification algorithms

The aim of this study was to optimize a model able to automatically distinguish B-CLL patients from healthy ones based on their lymphocyte Raman signature. In this project, three different supervised classification algorithms were tested for their known high performance in vibrational spectroscopy: partial least squares–discriminant analysis (PLS-DA) [37, 38], support vector machine (SVM) [39–41], and random forest (RF) [39, 42].

PLS-DA is a linear supervised classification algorithm based on the extraction of latent variables explaining both the maximum variance in the data and the covariance between data and group labels. This method has proved particularly effective for collinear data composed of more features than observations, as typically observed in Raman spectroscopy studies [43, 44]. This method is parametrized by the number of latent variables n_lv. In this study, this parameter can take the following values: n_lv ∈ {1, 2, …, 40}.

By definition, SVM is a linear method searching for a hyperplane maximizing the margin between classes in order to separate them. To handle non-linear problems, the kernel trick has been introduced. It consists in the implicit mapping of the data, using a non-linear kernel function, into a high dimensional feature space in which a linear separation possibly exists between classes. In this study, a Gaussian Radial Basis Function (RBF) kernel parameterized by γ was chosen because of its widespread popularity. The SVM model estimation was achieved by the ν-SVM algorithm implemented by the LIBSVM library [45]. In this study, the parameters γ and ν were chosen among the following values: γ ∈ {10^−8, 10^−7, …, 10^3} and ν ∈ {0.1, 0.2, …, 0.9}.

RF is a non-linear method based on the construction of a large number n_dt of decision trees. Each tree, with a predefined depth d, is built using a random selection of the data and a predefined number of randomly selected features. The final RF decision is based on majority voting of all the decision trees. In this study, we selected these parameters in: n_dt ∈ {10, 20, …, 200} and d ∈ {10, 20, …, 100}.

Linear and non-linear classification methods have both been tested in this study because there was no evidence in favor of linear or non-linear separability of our Raman spectral data.
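To give an idea of the size of the search space, the three parameter grids listed above can be enumerated as follows. The dictionary keys are hypothetical names chosen for illustration and do not correspond to any particular library.

```python
from itertools import product

# PLS-DA: number of latent variables, 1 to 40.
PLSDA_GRID = [{"n_latent_variables": n} for n in range(1, 41)]

# nu-SVM with RBF kernel: gamma from 1e-8 to 1e3, nu from 0.1 to 0.9.
SVM_GRID = [{"gamma": 10.0 ** exponent, "nu": tenth / 10}
            for exponent, tenth in product(range(-8, 4), range(1, 10))]

# Random forest: 10 to 200 trees, depth 10 to 100, both in steps of 10.
RF_GRID = [{"n_trees": n, "max_depth": d}
           for n, d in product(range(10, 201, 10), range(10, 101, 10))]

print(len(PLSDA_GRID), len(SVM_GRID), len(RF_GRID))  # 40 108 200
```

PLS-DA therefore requires far fewer model fits per grid search than SVM or RF.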

Repeated double cross-validation for classifier optimization

Classifier construction and parameter optimization are carried out during a training step, which is necessary to construct predictive models. Among the different existing training strategies, a repeated double cross-validation (rdCV) [46, 47] was used in this work because it is known to give a more reliable estimate of model prediction performance than a simple cross-validation (CV), which can overestimate the model’s performance [48]. Furthermore, as the splitting of a dataset must always be carried out at the highest hierarchical level in order to properly evaluate classification models [49], data were considered at the patient level in our study (not at the spectrum level). After the quality tests, each patient corresponds to a set of about 76 cells associated with their Raman spectra.

The rdCV consists in two nested loops, as represented in Fig. 1. The inner loop performs a grid search model tuning in order to optimize the parameters associated with each classification algorithm. At each iteration of the inner loop, the parameters of the classification algorithm are fixed to predefined values {i, j} (j is empty for PLS-DA), and an inner cross-validation loop procedure (simple CV), coupled to feature selection, is run in order to estimate the most discriminant and uncorrelated wavenumbers and to evaluate the percentages of CLL and normal cells of each patient from models constructed with this set of parameters. As suggested in the literature [49], feature selection is included in the cross-validation loop in order to prevent the model from overfitting. At the end of this inner loop, the estimated percentages of cells are used to construct a receiver operating characteristic (ROC) curve and optimize a patient decision threshold τ. More information about this threshold is given in the next sub-section entitled “Patient decision threshold.” Then, a new classification model is constructed on the entire training set reduced to the selected wavenumbers using the best performing parameters and threshold τ(t), i.e., those maximizing a predefined clinical objective such as (i) balanced sensitivity and specificity (objective 1) in order to have the best compromise between sensitivity and specificity, or (ii) sensitivity maximization (objective 2) in order to reduce the risk of misdiagnosing B-CLL patients. Other classification criteria could of course be applied depending on the study objectives, such as, for example, Bayesian probability [50].
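The optimization of τ from the per-patient percentages can be illustrated with a simple scan over candidate thresholds. The exact criteria are not given in the text, so the sketch below makes two assumptions: objective 1 is implemented as maximizing the smaller of sensitivity and specificity (ties broken by their sum), and objective 2 as maximizing sensitivity (ties broken by specificity). The function names and the 1% threshold step are also hypothetical.

```python
def sens_spec(fractions, labels, tau):
    # fractions: per-patient fraction of cells predicted B-CLL; labels: 1 = B-CLL.
    tp = sum(1 for f, y in zip(fractions, labels) if y == 1 and f > tau)
    tn = sum(1 for f, y in zip(fractions, labels) if y == 0 and f <= tau)
    return tp / labels.count(1), tn / labels.count(0)

def optimise_threshold(fractions, labels, objective="balanced"):
    candidates = [i / 100 for i in range(101)]

    def key(tau):
        sens, spec = sens_spec(fractions, labels, tau)
        if objective == "balanced":      # objective 1 (assumed criterion)
            return (min(sens, spec), sens + spec)
        return (sens, spec)              # objective 2 (assumed criterion)

    return max(candidates, key=key)
```

With `objective="sensitivity"`, the tie-break on specificity prevents the trivial solution τ = 0, which would label every patient as B-CLL.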

The outer loop repeats the inner loop T = 100 times for different training sets. Indeed, at each iteration of the outer loop, data from Dataset 1 are randomly divided into two sets, i.e., a training set composed of 70% of patients is used to optimize the classifier during the inner loop, while the remaining 30% of patients compose the external validation set used to independently evaluate the predictive performance of this optimized model in terms of sensitivity and specificity. At the end of this outer loop, 100 optimal models estimated on different training sets and predicting the physiopathological state of a patient, i.e., healthy or B-CLL, are obtained.
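The outer loop splits can be generated at the patient level as in the sketch below. The function name, the fixed seed, and the absence of class stratification are assumptions; the text only states that the division is random.

```python
import random

def outer_splits(patient_ids, n_repeats=100, train_fraction=0.7, seed=0):
    # Yield (training patients, external validation patients) for each repetition.
    rng = random.Random(seed)
    n_train = round(train_fraction * len(patient_ids))
    for _ in range(n_repeats):
        shuffled = list(patient_ids)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]
```

Splitting patient identifiers rather than spectra guarantees that all cells of a given patient fall on the same side of the split.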

To summarize, rdCV is based on the computation of several training-test splits of the data. For each split, cross-validation is applied twice: (i) on the training part, to optimize the choice of model parameters, and (ii) on the test part, to evaluate the performance of the selected model. Using this procedure, model parameter choice and model performance evaluation are completely separated.

Among the existing cross-validation strategies, three popular methods were tested in the inner loop, i.e., the methods named leave-one-patient-out cross-validation (LOPOCV), K-fold cross-validation (KFCV), and Monte-Carlo cross-validation (MCCV). Of course, other cross-validation strategies exist and could be tested, such as bootstrap. In LOPOCV, the N patients are divided into N one-patient-folds, inducing N iterations. At each iteration, the training set is composed of N − 1 patients and the validation set is composed of the remaining patient. In KFCV, K folds of patients are randomly constructed, inducing K iterations. In this study, the classical value K = 10 was chosen. At each iteration, the training set is composed of K − 1 folds and the validation set is composed of the remaining fold. At each iteration of the MCCV, the validation set is composed of M patients randomly selected. The remaining N − M patients compose the training set. In this work, the number M of patients selected for the validation set and the number of repetitions of MCCV were chosen equal to 10 and 100, respectively.
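The three inner-loop strategies differ only in how the validation patients are drawn, as the following sketch shows. Function names and seeds are hypothetical; each generator yields (training patients, validation patients) pairs.

```python
import random

def lopocv(patients):
    # N iterations, one validation patient each.
    for i in range(len(patients)):
        yield patients[:i] + patients[i + 1:], [patients[i]]

def kfcv(patients, k=10, seed=0):
    # K iterations over K randomly constructed folds of patients.
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, fold in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        yield train, fold

def mccv(patients, m=10, n_repeats=100, seed=0):
    # n_repeats iterations, each with M validation patients drawn at random.
    rng = random.Random(seed)
    for _ in range(n_repeats):
        validation = rng.sample(patients, m)
        held_out = set(validation)
        yield [p for p in patients if p not in held_out], validation
```

For a training set of 70 patients, these settings give 70, 10, and 100 model fits per parameter combination, respectively.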

Patient decision threshold

After the pre-processing steps, each patient is represented by approximately 76 Raman spectra, each one being representative of one lymphocyte. A patient is classified as B-CLL by a classification model if the proportion of their cells classified as B-CLL is greater than a patient decision threshold τ, and as healthy otherwise. In the following, τ was first fixed arbitrarily to 50%. The impact of the choice of this threshold will be discussed in the section “Adaptive patient decision threshold in function of clinical objective”.
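The patient-level decision rule translates directly into code. In this sketch, the label strings and the function name are illustrative choices.

```python
def classify_patient(cell_predictions, tau=0.5):
    # A patient is B-CLL if the fraction of cells predicted B-CLL exceeds tau.
    n_cll = sum(1 for label in cell_predictions if label == "B-CLL")
    fraction_cll = n_cll / len(cell_predictions)
    return "B-CLL" if fraction_cll > tau else "healthy"
```

For a patient with 76 cells and τ = 50%, at least 39 cells must be classified as B-CLL for the patient to be labeled B-CLL; lowering τ to 30% reduces this to 23 cells.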

Blind test of the optimized classifiers

The last step consists in testing the 100 models on the independent test set (see Fig. 2) represented by Dataset 2, which has not been used in the rdCV. Each optimized model reduces the data to its specific features and predicts the physiopathological state of each test patient, i.e., healthy or B-CLL, using the model-specific patient decision threshold τ(t). For a patient, the final decision is obtained using majority voting of the predictions of the 100 optimized models.
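The consensus decision for one test patient can be sketched as follows, given the fraction of cells each model predicts as B-CLL and each model's own threshold τ(t). The function name is hypothetical, and the handling of a tied vote (resolved here as healthy) is an assumption, since the text does not specify it.

```python
def consensus_label(fractions_cll, thresholds):
    # One vote per model: B-CLL if its cell fraction exceeds its own threshold.
    votes = [fraction > tau for fraction, tau in zip(fractions_cll, thresholds)]
    # Majority voting; a tie is resolved as healthy (assumption).
    return "B-CLL" if sum(votes) > len(votes) / 2 else "healthy"
```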

Results and discussion

Evaluation of the performance of the different cross-validation methods

One important element of the development of a supervised classification scheme is the choice of a CV method. Most spectroscopic studies are performed using a simple CV method, such as LOPOCV [27], KFCV [28], or MCCV [29]. However, several studies [51, 52] have demonstrated that repeated CV gives a better estimate of the performance.

That is the reason why rdCV was used in our study. As previously explained in the section “Materials and methods,” its inner loop includes a CV method, such as LOPOCV, KFCV, and MCCV, which were evaluated in this work. Because the performance of a CV strategy relies more on data statistics than on the choice of the supervised classification algorithm, this study of the inner CV of rdCV was conducted using PLS-DA exclusively. The mean and standard deviation of sensitivities and specificities estimated by each CV method on internal and external validation sets are summarized in Fig. 3. Whatever the validation set, no significant difference can be observed between the CV methods. Furthermore, each CV method gives the same mean results for internal and external validation sets, with a mean sensitivity around 88% and mean specificity around 74%. As expected, standard deviations of sensitivity and specificity are higher for the external validation set than for the internal one. The similarity between the performances of cross-validation methods can be explained by the high number of repetitions T = 100 of the external loop of rdCV.


These results are in accordance with a previous study [51], which showed similar performance for LOPOCV and KFCV with k = 10. However, LOPOCV and MCCV are more time-consuming than KFCV, which is also known to offer a good compromise between the variance and bias of predictive models [31]. Therefore, KFCV was applied in the rest of the study.
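As an illustration of how the three inner CV schemes differ, the following minimal Python sketch generates patient-level splits for LOPOCV, KFCV, and MCCV. It is not the authors' implementation: the function names, the number of Monte Carlo splits, the validation fraction, and the seeds are assumptions chosen for the example, and splitting is done on patient identifiers so that all spectra of a patient stay on the same side of a split.

```python
import random


def lopocv(patients):
    """Leave-one-patient-out: each patient forms the validation set once."""
    for held_out in patients:
        train = [p for p in patients if p != held_out]
        yield train, [held_out]


def kfcv(patients, k=10, seed=0):
    """K-fold CV on shuffled patients; fold sizes differ by at most one."""
    shuffled = list(patients)
    random.Random(seed).shuffle(shuffled)
    for i in range(k):
        fold = shuffled[i::k]
        if not fold:  # more folds than patients: skip empty folds
            continue
        held = set(fold)
        yield [p for p in shuffled if p not in held], fold


def mccv(patients, n_splits=50, val_fraction=0.3, seed=0):
    """Monte Carlo CV: repeated random train/validation splits.

    n_splits and val_fraction are illustrative values (assumptions).
    """
    rng = random.Random(seed)
    n_val = max(1, round(len(patients) * val_fraction))
    for _ in range(n_splits):
        shuffled = list(patients)
        rng.shuffle(shuffled)
        yield shuffled[n_val:], shuffled[:n_val]
```

The number of models to train per parameter set is the number of splits each generator yields: one per patient for LOPOCV, k for KFCV, and a user-chosen number for MCCV, which is consistent with KFCV being the least time-consuming of the three for small k.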

Assessment of the classifying capability of Raman spectroscopy

Three supervised algorithms (SVM, RF, and PLS-DA) based on different principles were applied to the dataset. As presented in Fig. 4, these three methods have the same mean performance when applied 100 times, for both the internal validation set, with mean sensitivity and specificity around 87% and 75%, respectively, and the external validation set, with mean sensitivity and specificity around 84% and 75%, respectively. Note also that the standard deviation is smaller for PLS-DA than for the other algorithms. In the literature, non-linear supervised classification methods, such as RF and SVM, have been shown to perform better than linear methods, such as PLS-DA, when data are not linearly separable [53, 54]. Thus, our results, especially the mean sensitivity and specificity of the supervised algorithms, show that linear methods are sufficient to solve the present B-CLL/healthy classification issue.

Moreover, PLS-DA depends on a single parameter, i.e., the number of latent variables. Its optimization is thus simpler than that of SVM and RF. The mean and the standard deviation of the optimal number of latent variables estimated over the 100 PLS-DA models constructed during rdCV were found to be 13 and 6, respectively. Taken together, these results show that PLS-DA is a valuable method for the classification of Raman spectra acquired on B-CLL and healthy lymphocytes in clinical conditions. It should be noted that the similar results obtained on both the internal and external validation sets indicate that no overfitting occurs during training, thanks to the rdCV strategy [46, 55].

Fig. 1 Flowchart of the application on Dataset 1 of rdCV, which is a combination of an inner loop composed of a conventional CV method (LOPOCV, KFCV, or MCCV), a supervised feature selection, and a supervised classification algorithm (PLS-DA, SVM, or RF) in order to train a classifier and optimize its parameters, and an outer loop aiming at constructing several models based on different training sets. (1) At each iteration of the outer loop, patients from Dataset 1 are randomized and divided into a training set composed of 70% of patients and an external validation set composed of 30% of patients. (2) At each iteration of the inner loop, a set of classification parameters is selected as part of a grid search. (3) Using these parameters, a conventional CV is run. (4) At each step of the CV loop, the internal training set feeds the supervised feature selection in order to identify the most discriminant and decorrelated features. (5) On these reduced data, a supervised classification model is trained using the parameters selected at step (2). (6) The internal validation set is reduced to the features estimated at step (4). (7) These data are injected into the model constructed at step (5) in order to predict the percentages of CLL and normal cells of each patient composing the internal validation set. (8) At the end of the CV loop, these percentages are used to construct a ROC curve by varying the decision threshold τ (i.e., the percentage of patient CLL cells). (9) From the ROC curve, the optimal patient decision threshold τij(t) and the corresponding sensitivity Sen(i, j) and specificity Spe(i, j) are determined according to the clinical objective. (10) At the end of the inner loop, i.e., at the end of the grid search, the optimal classification parameters and the corresponding patient decision threshold τ(t) are estimated as those maximizing the clinical objective. (11) The entire training set is now used to identify a unique set FS(t) of the most discriminant and decorrelated features by the supervised feature selection. (12) The resulting reduced training set is used to compute a unique classification model using the optimal parameters estimated at step (10). (13) The external validation data are reduced to the features estimated at step (11). (14) They are injected into the model constructed at step (12) using the patient decision threshold determined at step (10) in order to estimate its generalization performance in terms of sensitivity and specificity. Steps (1) to (14) are repeated T = 100 times, resulting in the construction of 100 optimized models
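The loop structure described in the caption of Fig. 1 can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the authors' code: the supervised feature selection (steps 4, 6, 11, and 13) is omitted, each cell is reduced to a single number, and PLS-DA is replaced by a toy classifier whose only tunable parameter `w` places a cutoff between the two class means. The stratified 70/30 split, the tie-breaking rules, the grid values, and all function names are assumptions. Patients are `(label, cells)` tuples with label 1 for B-CLL and 0 for healthy, and τ is expressed as a fraction rather than a percentage.

```python
import random
from statistics import mean


def fit(train, w):
    """Toy per-cell classifier: a cutoff placed between the two class means."""
    healthy = [x for label, cells in train if label == 0 for x in cells]
    cll = [x for label, cells in train if label == 1 for x in cells]
    return mean(healthy) + w * (mean(cll) - mean(healthy))


def cll_fraction(cells, cutoff):
    """Fraction of a patient's cells predicted as CLL (step 7)."""
    return sum(x > cutoff for x in cells) / len(cells)


def sens_spec(labels, fractions, tau):
    """Patient-level sensitivity and specificity at decision threshold tau."""
    pred = [int(f >= tau) for f in fractions]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, labels))
    tn = sum(p == 0 and y == 0 for p, y in zip(pred, labels))
    return tp / labels.count(1), tn / labels.count(0)


def best_tau(labels, fractions):
    """Objective 1 (steps 8-9): smallest |Se - Sp|, ties broken by Se + Sp."""
    def key(tau):
        se, sp = sens_spec(labels, fractions, tau)
        return (abs(se - sp), -(se + sp))
    return min(sorted(set(fractions)), key=key)


def inner_loop(train, grid, k, rng):
    """Grid search (step 2) with pooled k-fold CV (step 3); returns (w, tau)."""
    order = list(train)
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    best = None
    for w in grid:
        labels, fractions = [], []
        for i, fold in enumerate(folds):
            rest = [p for j, f in enumerate(folds) if j != i for p in f]
            cutoff = fit(rest, w)  # step 5
            labels += [label for label, _ in fold]
            fractions += [cll_fraction(cells, cutoff) for _, cells in fold]
        tau = best_tau(labels, fractions)
        se, sp = sens_spec(labels, fractions, tau)
        if best is None or se + sp > best[0]:  # step 10 (assumed criterion)
            best = (se + sp, w, tau)
    return best[1], best[2]


def rdcv(patients, grid=(0.3, 0.5, 0.7), repeats=100, k=5, seed=0):
    """Outer loop: 70/30 patient splits (step 1), one tuned model per repeat."""
    rng = random.Random(seed)
    models = []
    for _ in range(repeats):
        train, external = [], []
        for cls in (0, 1):  # stratified split: an assumption of this sketch
            group = [p for p in patients if p[0] == cls]
            rng.shuffle(group)
            cut = round(0.7 * len(group))
            train += group[:cut]
            external += group[cut:]
        w, tau = inner_loop(train, grid, k, rng)
        cutoff = fit(train, w)  # step 12
        labels = [label for label, _ in external]
        fractions = [cll_fraction(cells, cutoff) for _, cells in external]
        models.append((cutoff, tau, *sens_spec(labels, fractions, tau)))  # step 14
    return models
```

Each entry of the returned list holds the fitted cutoff, the selected threshold τ(t), and the external validation sensitivity and specificity of one outer-loop iteration, so the mean and standard deviation over the list correspond to the kind of summary reported in the figures.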

Fig. 2 Flowchart of label consensus. For each patient of the independent test set and each model t estimated during rdCV, the pre-processed Raman spectra are reduced to the features specific to model t and estimated at step (11) of the workflow presented in Fig. 1, then injected into the model t constructed at step (12) of the workflow presented in Fig. 1 in order to predict the patient class Pt (CLL or normal) based on the patient decision threshold τ(t) specific to this model and estimated at step (10) of the workflow presented in Fig. 1. Finally, a majority vote predicts the patient status P based on the T patient status predictions Pt


Adaptive patient decision threshold as a function of the clinical objective

The previously presented results were obtained using a patient decision threshold τ of 50%, which is considered the default value for classifying patients based on the Raman spectra of their cells. However, this arbitrary value is questionable since it may influence the determination of the optimized classification model at each iteration of the outer loop. It is thus important to evaluate the impact of this parameter on the classification results.

A naïve approach would be to fix τ to the same value for all iterations of the outer loop. However, since the training dataset changes at each outer loop iteration, a more objective strategy is to adaptively select the optimal τ value at each iteration in order to maximize the diagnostic performance linked to the clinical objective. In this study, two different clinical objectives have been considered: (i) balanced sensitivity and specificity (objective 1), in order to have the best compromise between sensitivity and specificity, and (ii) sensitivity maximization (objective 2), in order to reduce the risk of misdiagnosing B-CLL patients.

Technically, at the end of each inner loop iteration, i.e., for a set of classification parameters, each patient has been classified during an iteration of the CV loop. Thus, for each patient, the percentage of B-CLL cells estimated by the model was obtained (step 7 of Fig. 1). From these estimated B-CLL cell percentages, a receiver operating characteristic (ROC) curve was constructed by varying τ (step 8 of Fig. 1). Concretely, for a high value of τ, all patients were classified as healthy, inducing a sensitivity of 0% and a specificity of 100%. A progressive reduction of τ led to an increase of sensitivity and a decrease of specificity. For a low value of τ, all patients were classified as CLL, inducing a sensitivity of 100% and a specificity of 0%. Figure 5a depicts an example of a ROC curve for a model estimated at an inner loop. If objective 1 is considered, an increase of specificity from 77 to 84% and a small decrease of sensitivity from 85 to 84% were obtained when the threshold of the black curve was slightly increased from 50% to the optimized value of 53%.

Fig. 3 Comparison of different cross-validation techniques (LOPOCV, KFCV, MCCV) for the inner loop of rdCV using PLS-DA, expressed in terms of mean sensitivity and specificity computed on the internal and external validation sets

Fig. 4 Comparison of different supervised classification algorithms (RF, SVM, PLS-DA), using KFCV for the inner loop of rdCV, expressed in terms of mean sensitivity and specificity computed on the internal and external validation sets

The point of the ROC curve that (i) minimized the distance between sensitivity and specificity for objective 1 or (ii) maximized the sensitivity for objective 2 defined the best τ value and the sensitivity and specificity associated with each model of the inner loop. At the end of each inner loop, the parameters of PLS-DA and the threshold τ of the model with the highest sensitivity and specificity were selected (step 9 of Fig. 1). After the inner loop of each outer loop iteration t, the final classification parameters (I(t), J(t)) and the corresponding patient decision threshold τ(t) optimizing the clinical objective were selected (step 10 of Fig. 1).
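The threshold sweep and the two selection rules can be illustrated with a short Python sketch. It assumes that a patient is called B-CLL when the estimated percentage of CLL cells is at least τ, that τ is scanned over integer percentages, and that ties are broken as noted in the comments; none of these details is specified in the text, and the function names are hypothetical.

```python
def roc_points(labels, cll_percent):
    """Return (tau, sensitivity, specificity) for every candidate threshold.

    labels: 1 for B-CLL, 0 for healthy.
    cll_percent: estimated percentage of CLL cells per patient (0 to 100).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = []
    for tau in range(0, 102):  # tau = 101 classifies every patient as healthy
        tp = sum(y == 1 and p >= tau for y, p in zip(labels, cll_percent))
        tn = sum(y == 0 and p < tau for y, p in zip(labels, cll_percent))
        points.append((tau, tp / n_pos, tn / n_neg))
    return points


def select_tau(points, objective):
    """Pick a ROC point for objective 1 (balanced) or 2 (max sensitivity)."""
    if objective == 1:
        # Smallest |Se - Sp|; ties go to the larger Se + Sp (assumption).
        return min(points, key=lambda p: (abs(p[1] - p[2]), -(p[1] + p[2])))
    # Highest sensitivity; ties go to the higher specificity (assumption).
    return max(points, key=lambda p: (p[1], p[2]))


# Hypothetical example: four B-CLL and four healthy patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
cll_percent = [90, 70, 55, 40, 60, 45, 30, 10]
points = roc_points(labels, cll_percent)
balanced = select_tau(points, objective=1)
sensitive = select_tau(points, objective=2)
```

On this toy example, the balanced rule settles on a threshold where sensitivity and specificity are equal, whereas the sensitivity-driven rule moves the threshold down until every B-CLL patient is detected, at the cost of specificity.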

As illustrated in Fig. 5b, the optimal threshold τ is highly variable from one outer loop iteration to another, with values ranging from 35 to 85%. This high variability is due to the random selection of the training set at each outer loop iteration. This result demonstrates the importance of optimizing the threshold τ in order to adapt the selection of the best classification model to each training set.

As can be seen in Fig. 6, this result can be generalized to the 100 models estimated during the outer loop. The proposed adaptive patient decision threshold procedure is efficient since (i) a rebalancing is visible between the sensitivity and the specificity (around 80%) for objective 1, and (ii) a significant increase of sensitivity (around 95%) can be observed for objective 2, both for the internal and external validation sets, compared to the results obtained using a 50% threshold (see Fig. 4).

Due to the definition of the ROC curve, the choice of a threshold inducing an increase of sensitivity will necessarily induce a decrease of specificity, and vice versa [56]. Indeed, if we compare the results for PLS-DA in Fig. 4 and the results in Fig. 6 for internal and external validation for objective 1, the specificity increased from around 74 to 80%, while the sensitivity decreased from around 89 to 80%, compared to the results obtained with a 50% threshold. In addition, for objective 2, the increase in sensitivity from around 89 to 95% is accompanied by a sharp decrease in specificity from around 74 to 54%. These results demonstrate the importance of parameterization and the possibility of adapting these parameters according to the clinician's needs.

It is important to note that the possible clinical objectives are not limited to the two presented in this paper. In particular, for applications where disease prevalence must be considered, other classification performance metrics such as the positive predictive value (PPV) or the negative predictive value (NPV) can be maximized in the proposed approach instead of sensitivity or balanced sensitivity and specificity, as already done in the literature [57].
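To make the role of prevalence concrete, the standard relations between PPV, NPV, sensitivity, specificity, and prevalence (Bayes' rule) can be written as a small Python function. This is a generic sketch rather than part of the published workflow; the function name and the prevalence values in the example are hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' rule; all arguments are fractions in [0, 1]."""
    tp = sensitivity * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    fn = (1 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)


# Same classifier (Se = 95%, Sp = 85%) at two hypothetical prevalences.
ppv_balanced, npv_balanced = predictive_values(0.95, 0.85, 0.50)
ppv_rare, npv_rare = predictive_values(0.95, 0.85, 0.05)
```

With these inputs, the PPV drops from about 0.86 in a balanced population to 0.25 when only 5% of the tested population is affected, which illustrates why a threshold tuned on sensitivity and specificity alone may not suit a low-prevalence setting.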

Improvement of classification predictive performance using label consensus

Fig. 5 a An example of ROC curve computed on the internal validation set by varying the patient decision threshold τ for a model estimated at an inner loop of rdCV. b Variability of the optimal threshold τ, of sensitivity, and of specificity as a function of the model number of the outer loop. The models are sorted according to the increasing value of τ

Most biomedical studies related to vibrational spectroscopy evaluated the predictive performance of a unique optimized model on an independent test set unseen during the training phase [58–60]. For objective 1 (balanced sensitivity and specificity), we have evaluated independently the predictive performance of each of the 100 optimized models on an independent test set (Dataset 2) composed of 40 patients (20 healthy and 20 B-CLL patients). These results are summarized in Fig. 7a as the mean and standard deviation of sensitivity and specificity. The results show a mean sensitivity and specificity similar to those obtained during training and validation (see Fig. 6). This coherence of predictive performance indicates that the classifiers are not overfitted. However, the individual performances of the 100 models varied from 56 to 94% for sensitivity and from 43 to 95% for specificity, showing the great influence of the selection of the training set on the performance of the classifiers. Since some models are less efficient, the strategies consisting in the training of a unique model or in the averaging of several models are thus unsuccessful approaches.

In our study, a different approach based on label consensus was proposed. The principle is to use a set of individual models whose predictions are combined by majority voting. In the case of an unknown patient, each of the 100 optimized models makes a prediction and assigns a label (healthy or B-CLL) to the patient. The predictions of the 100 models are then merged and the algorithm chooses the class label that receives the most votes. Figure 7b shows the performance obtained with this strategy, i.e., a sensitivity of 95% and a specificity of 85%. Compared to the average of all models (Fig. 7a), label consensus led to increases of 14% for sensitivity and 8% for specificity.
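The voting step itself is simple and can be sketched in Python as follows. The function name is hypothetical, and the handling of an exact tie, which the text does not specify, is left to `Counter.most_common`, which returns the label encountered first among equally frequent ones.

```python
from collections import Counter


def label_consensus(predictions):
    """Majority vote over per-model labels; returns (label, vote share)."""
    votes = Counter(predictions)
    label, count = votes.most_common(1)[0]
    return label, count / len(predictions)


# Hypothetical patient: 62 of the 100 models predict B-CLL.
label, share = label_consensus(["B-CLL"] * 62 + ["healthy"] * 38)
```

Returning the vote share alongside the label makes it possible to flag weakly supported decisions, for example a patient for whom only a narrow majority of models agree.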

Looking more closely at the results, label consensus applied at the patient level on the test set resulted in the misclassification of one healthy patient and three B-CLL patients. Two of these decisions are unclear since less than 65% of the optimized models wrongly classified this healthy patient and one of these B-CLL patients. On the contrary, the two other B-CLL patients are clearly misclassified by more than 85% of the models.

In order to assess the stability of the results of our method, the whole methodology (including the random construction of the training Dataset 1 and the blind test Dataset 2, rdCV, decision threshold optimization, and label consensus) was repeated four more times for the balanced sensitivity and specificity objective using the PLS-DA algorithm. The results presented in Table 1 show the stability of the proposed procedure. The small observed variability is due to the random split of the datasets. It should be noted that the results of the first split correspond to the results presented above and shown in Figs. 6 and 7.

Fig. 6 Mean sensitivity and specificity computed on the internal and external validation sets after optimization of the threshold τ for two different objectives, balanced sensitivity and specificity, and sensitivity maximization

Fig. 7 Comparison of sensitivity and specificity for independent models (a) and label consensus (b)

Based on the prediction of 100 different models, label consensus is thus 100 times more time-consuming than the traditional use of a unique classification model. On average, the prediction of the pathophysiological state of a new patient based on label consensus requires 13 s on a laptop equipped with an Intel Core i3 CPU, with a sequential application of each model. However, this increase in computational time benefits the classification accuracy, as demonstrated above. Of course, this time can be greatly reduced by using a dedicated processing workstation with a recent CPU, and by parallelizing the prediction of the 100 models across CPU cores.

Improvement of model transferability

The different experimental conditions of Raman acquisitions for the M3S and IHMO measurement campaigns and the limited number of patients have been shown to be limiting factors for model transferability [30]. However, in this previous study, a classic LOPOCV was used to estimate the optimal number of latent variables of a single PLS-DA classifier (i) trained on the IHMO dataset and blindly tested on the M3S dataset (Configuration 1), and (ii) trained on the M3S dataset and blindly tested on the IHMO dataset (Configuration 2). For Configuration 1, the averaged accuracies were 88.5%, 70%, and 70.5% for the internal validation, external validation, and blind test sets, respectively, while for Configuration 2, they were 83.5%, 72%, and 73%. Furthermore, the sensitivity and specificity associated with these results were clearly unbalanced. The reader can refer to [30] for complete detailed results.

We tested our new methodology based on the combination of rdCV, decision threshold optimization, and label consensus on the two previous configurations. The results are presented in Tables 2 and 3. Our new methodology is clearly not subject to overfitting since the internal and external validations have the same sensitivity and specificity, contrary to our previous results. Furthermore, we can see that the blind test results are balanced with our new methodology and are 15% greater than the internal validation sensitivity and specificity, contrary to our previous results. Finally, on the blind test, our new methodology has an averaged accuracy of 88% for Configuration 1, and of 76.5% (73% for our previous results) for Configuration 2, which are better results than in our previous study. Altogether, these results show that our new methodology is better adapted for learning classification models from data acquired in different experimental conditions, and can be helpful to improve model transferability.

Assignment of the discriminant Raman vibrations between healthy and B-CLL patients

As can be seen in Fig. 8, the mean Raman spectra of healthy and B-CLL patients after the spectral data pre-processing step are highly correlated and no evident difference is visible. In order to highlight the subtle discriminant features between these two groups of spectra, the supervised feature selection algorithm described in the section “Feature selection” was used at each rdCV loop. On average, 169 features, with a standard deviation of 6 features, among the 752 original features, were selected by this method at each rdCV loop. Each of the 100 models was thus optimized on its selected spectral features to differentiate the B-CLL group from the healthy group. To summarize, the 50 most frequently selected variables among the 100 runs are shown in Fig. 8 as black vertical bands.
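Ranking features by how often they are selected across runs amounts to a frequency count, as in the minimal Python sketch below. The function name and the representation of each run as a list of feature indices are assumptions made for illustration.

```python
from collections import Counter


def most_frequent_features(selected_per_model, top=50):
    """Count how often each feature index is selected and return the top ones."""
    counts = Counter(i for selected in selected_per_model for i in set(selected))
    return [feature for feature, _ in counts.most_common(top)]


# Hypothetical example with three runs and feature indices 1 to 5.
runs = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
top_two = most_frequent_features(runs, top=2)
```

Applied to the 100 sets of selected features, a call with `top=50` would return the indices of the 50 most frequently retained wavenumbers.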

The spectral bands selected by the algorithm are assigned in Table 4 and mainly correspond to nucleic acid vibrations [61], i.e., 745 cm−1 for adenine, thymine, and cytosine ring breathing and 1577 cm−1 for guanine and adenine. Other bands are associated with RNA and DNA phosphate stretching bands at 813 cm−1, 840 cm−1, and 1100 cm−1. The predominance of these bands is related to the fact that the spectra were acquired on the nucleus of the cells. Other typical contributions are mainly due to proteins, such as vibrations within the Amide I and Amide III bands at 1660 cm−1 and 1252 cm−1, respectively. The band at 1006 cm−1 originates from the aromatic amino acid phenylalanine. Cytochrome c bands [61, 62] are visible at 745 cm−1, 755 cm−1, and 1134 cm−1.

Table 1 Sensitivity (Se) and specificity (Sp), in percentage, of the complete proposed methodology using PLS-DA to achieve objective 1 for 5 different random splits of the training (Dataset 1) and test (Dataset 2) sets

| | Split 1 Se | Split 1 Sp | Split 2 Se | Split 2 Sp | Split 3 Se | Split 3 Sp | Split 4 Se | Split 4 Sp | Split 5 Se | Split 5 Sp |
|---|---|---|---|---|---|---|---|---|---|---|
| Internal validation | 83 | 81 | 86 | 84 | 79 | 77 | 89 | 85 | 87 | 89 |
| External validation | 79 | 78 | 82 | 81 | 78 | 75 | 82 | 83 | 86 | 82 |
| Blind test | 95 | 85 | 86 | 90 | 82 | 85 | 80 | 81 | 95 | 91 |

Table 2 Sensitivity and specificity, in percentage, of the complete proposed methodology trained on the IHMO dataset and blindly tested on the M3S dataset, using PLS-DA to achieve objective 1

| | Sensitivity | Specificity |
|---|---|---|
| Internal validation | 71 | 74 |
| External validation | 64 | 69 |
| Blind test | 87 | 89 |

Table 3 Sensitivity and specificity, in percentage, of the complete proposed methodology trained on the M3S dataset and blindly tested on the IHMO dataset, using PLS-DA to achieve objective 1

| | Sensitivity | Specificity |
|---|---|---|
| Internal validation | 67 | 70 |
| External validation | 62 | 65 |
| Blind test | 75 | 78 |


These observations show that DNA, protein, and cytochrome c bands can be biochemical markers that discriminate between healthy and diseased states. These results are in accordance with the literature, which has shown that Raman spectroscopy is capable of distinguishing healthy cells from cancer cells by DNA condensation or protein modifications [15, 63]. Furthermore, in the case of chronic lymphocytic leukemia, pathological lymphocytes have a modified chromatin [64, 65] (clumped chromatin). Cytochrome c plays a central role in cellular apoptosis [66] and in the cellular respiration cycle [66]. Variations in cytochrome c bands could reveal a metabolic acceleration of leukemic cells that divide anarchically or show dysfunction at the apoptosis level.

Conclusion

This paper demonstrated that the combination of Raman spectroscopy with an effective supervised classification strategy is able to accurately diagnose chronic lymphocytic leukemia. First, the use of a repeated double cross-validation strategy is efficient whatever the considered cross-validation technique and supervised classification algorithm, due to the high number of repetitions inducing an exhaustive exploration of the training set. Second, we introduced an adaptive patient decision threshold, based on the number of cells per patient, which is necessary to decide on the patient classification. In fact, the choice of the threshold is automatically driven by the clinical objective, i.e., balanced sensitivity and specificity, or maximization of sensitivity, or maximization of specificity. Finally, label consensus exploited the diversity of trained classifiers to aggregate their predictions and improve classification results on an independent test set.

Funding information The authors received financial support from the Agence Nationale de la Recherche (ANR) and the European Community.

Compliance with ethical standards

Conflict of interest The authors declare that they have no competing interests.

References

1. Huang Z, McWilliams A, Lui H, McLean DI, Lam S, Zeng H. Near-infrared Raman spectroscopy for optical diagnosis of lung cancer. Int J Cancer. 2003;107(6):1047–52.

2. Talari ACS, Movasaghi Z, Rehman S, Rehman I. Raman spectroscopy of biological tissues. Appl Spectrosc Rev. 2015;50(1):46–111.

3. Kong K, Kendall C, Stone N, Notingher I. Raman spectroscopy for medical diagnostics - from in-vitro biofluid assays to in-vivo cancer detection. Adv Drug Deliv Rev. 2015;89:121–34.

Fig. 8 Average spectra of healthy (gray curve) and B-CLL (black curve) patients after the spectral data pre-processing step, and their discriminant wavenumbers represented by black vertical bands

Table 4 Position and assignment of characteristic bands to differentiate healthy from CLL patients [26, 56, 61, 62]

| Raman shift (cm−1) | Attributions |
|---|---|
| 745 | A, T, C, Trp (sym br), Cytochrome c |
| 755 | Cytochrome c |
| 833 | PO2 asym str (DNA), Tyr |
| 929 | DNA bk, C-C str |
| 1006 | Phe (phenyl ring) |
| 1120 | PO2 bk, C-N str, Cytochrome c |
| 1134 | Cytochrome c |
| 1206 | Phe, Amide III |
| 1252 | Amide III |
| 1370 | A, T, G, Cytochrome c |
| 1420 | CH def, CH2 def |
| 1485 | A, G, CH def |
| 1520 | A |
| 1577 | A, G, Phe, Cytochrome c |
| 1660 | Amide I |

A adenine, C cytosine, G guanine, T thymine, Tyr tyrosine, Trp tryptophan, Phe phenylalanine, sym symmetric, asym asymmetric, br breathing, str stretching, def deformation, bk backbone


4. Vuiblet V, Fere M, Bankole E, Wynckel A, Gobinet C, Birembaut P, et al. Raman-based detection of hydroxyethyl starch in kidney allograft biopsies as a potential marker of allograft quality in kidney transplant recipients. Sci Rep. 2016;6:33045.

5. Vuiblet V, Nguyen TT, Wynckel A, Fere M, Van-Gulick L, Untereiner V, et al. Contribution of Raman spectroscopy in nephrology: a candidate technique to detect hydroxyethyl starch of third generation in osmotic renal lesions. Analyst. 2015;140(21):7382–90.

6. Sharma N, Takeshita N, Ho KY. Raman spectroscopy for the endoscopic diagnosis of esophageal, gastric, and colonic diseases. Clin Endosc. 2016;49(5):404–7.

7. Rohleder D, Kiefer W, Petrich W. Quantitative analysis of serum and serum ultrafiltrate by means of Raman spectroscopy. Analyst. 2004;129(10):906–11.

8. Khan S, Ullah R, Khan A, Ashraf R, Ali H, Bilal M, et al. Analysis of hepatitis B virus infection in blood sera using Raman spectroscopy and machine learning. Photodiagn Photodyn Ther. 2018;23:89–93.

9. Pinto J. Cancer classification in human brain and prostate using Raman spectroscopy and machine learning [Master’s thesis]: University of Waterloo; 2017.

10. Teh SK, Zheng W, Ho KY, Teh M, Yeoh KG, Huang Z. Diagnostic potential of near-infrared Raman spectroscopy in the stomach: differentiating dysplasia from normal tissue. Br J Cancer. 2008;98(2):457–65.

11. Austin LA, Osseiran S, Evans CL. Raman technologies in cancer diagnostics. Analyst. 2016;141(2):476–503.

12. Ellis DI, Goodacre R. Metabolic fingerprinting in disease diagnosis: biomedical applications of infrared and Raman spectroscopy. Analyst. 2006;131(8):875–85.

13. Hobro AJ, Konishi A, Coban C, Smith NI. Raman spectroscopic analysis of malaria disease progression via blood and plasma samples. Analyst. 2013;138(14):3927–33.

14. Crow P, Barrass B, Kendall C, Hart-Prieto M, Wright M, Persad R, et al. The use of Raman spectroscopy to differentiate between different prostatic adenocarcinoma cell lines. Br J Cancer. 2005;92(12):2166–70.

15. Chan JW, Taylor DS, Zwerdling T, Lane SM, Ihara K, Huser T. Micro-Raman spectroscopy detects individual neoplastic and normal hematopoietic cells. Biophys J. 2006;90(2):648–56.

16. Managò S, Zito G, De Luca AC. Raman microscopy based sensing of leukemia cells: a review. Opt Laser Technol. 2018;108:7–16.

17. Vanna R, Ronchi P, Lenferink ATM, Tresoldi C, Morasso C, Mehn D, et al. Label-free imaging and identification of typical cells of acute myeloid leukaemia and myelodysplastic syndrome by Raman microspectroscopy. Analyst. 2015;140(4):1054–64.

18. Draux F, Jeannesson P, Beljebbar A, Tfayli A, Fourre N, Manfait M, et al. Raman spectral imaging of single living cancer cells: a preliminary study. Analyst. 2009;134(3):542–8.

19. Ramoji A, Neugebauer U, Bocklitz T, Foerster M, Kiehntopf M, Bauer M, et al. Toward a spectroscopic hemogram: Raman spectroscopic differentiation of the two most abundant leukocytes from peripheral blood. Anal Chem. 2012;84(12):5335–42.

20. Meade AD, Lyng FM, Knief P, Byrne HJ. Growth substrate induced functional changes elucidated by FTIR and Raman spectroscopy in in-vitro cultured human keratinocytes. Anal Bioanal Chem. 2007;387(5):1717–28.

21. Del Mistro G, Cervo S, Mansutti E, Spizzo R, Colombatti A, Belmonte P, et al. Surface-enhanced Raman spectroscopy of urine for prostate cancer detection: a preliminary study. Anal Bioanal Chem. 2015;407(12):3271–5.

22. Liu W, Wang H, Du J, Jing C. Raman microspectroscopy of nucleus and cytoplasm for human colon cancer diagnosis. Biosens Bioelectron. 2017;97:70–4.

23. Larraona-Puy M, Ghita A, Zoladek AB, Perkins W, Varma S, Leach IH, et al. Development of Raman microspectroscopy for automated detection and imaging of basal cell carcinoma. JBO. 2009;14(5):054031.

24. Crow P, Stone N, Kendall CA, Uff JS, Farmer JAM, Barr H, et al. The use of Raman spectroscopy to identify and grade prostatic adenocarcinoma in vitro. Br J Cancer. 2003;89(1):106–8.

25. Opitz D, Maclin R. Popular ensemble methods: an empirical study. J Artif Intell Res. 1999;11:169–98.

26. Chen JJ, Tsai C-A, Moon H, Ahn H, Young JJ, Chen C-H. Decision threshold adjustment in class prediction. SAR QSAR Environ Res. 2006;17(3):337–52.

27. Lyng FM, Faoláin EÓ, Conroy J, Meade AD, Knief P, Duffy B, et al. Vibrational spectroscopy for cervical cancer pathology, from biochemical analysis to diagnostic tool. Exp Mol Pathol. 2007;82(2):121–9.

28. Happillon T, Untereiner V, Beljebbar A, Gobinet C, Daliphard S, Cornillet-Lefebvre P, et al. Diagnosis approach of chronic lymphocytic leukemia on unstained blood smears using Raman microspectroscopy and supervised classification. Analyst. 2015;140(13):4465–72.

29. Graça G, Moreira AS, Correia AJV, Goodfellow BJ, Barros AS, Duarte IF, et al. Mid-infrared (MIR) metabolic fingerprinting of amniotic fluid: a possible avenue for early diagnosis of prenatal disorders? Anal Chim Acta. 2013;764:24–31.

30. Féré M, Piot O, Liu LH, Beljebbar A, Untereiner V, Gheldof D, et al. Focus on pre-processing step to ensure the clinical transferability of Raman data acquired on lymphocytes in different experimental and instrumental conditions. Vib Spectrosc. 2019;103:102931.

31. Bocklitz T, Walter A, Hartmann K, Rösch P, Popp J. How to pre-process Raman spectra for reliable and stable models? Anal Chim Acta. 2011;704(1):47–56.

32. Brereton RG. Chemometrics for pattern recognition: John Wiley & Sons; 2009. 524 p.

33. Savitzky A, Golay MJE. Smoothing and differentiation of data by simplified least squares procedures. Anal Chem. 1964;36(8):1627–39.

34. Afseth NK, Kohler A. Extended multiplicative signal correction in vibrational spectroscopy, a tutorial. Chemom Intell Lab Syst. 2012;117:92–9.

35. Kerr LT, Hennelly BM. A multivariate statistical investigation of background subtraction algorithms for Raman spectra of cytology samples recorded on glass slides. Chemom Intell Lab Syst. 2016;158:61–8.

36. Hardoon DR, Szedmak S, Shawe-Taylor J. Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 2004;16(12):2639–64.

37. Ming LC, Gangodu NR, Loh T, Zheng W, Wang J, Lin K, et al. Real time near-infrared Raman spectroscopy for the diagnosis of nasopharyngeal cancer. Oncotarget. 2017;8(30):49443–50.

38. Barker M, Rayens W. Partial least squares for discrimination. J Chemom. 2003;17(3):166–73.

39. Maguire A, Vega-Carrascal I, Bryant J, White L, Howe O, Lyng FM, et al. Competitive evaluation of data mining algorithms for use in classification of leukocyte subtypes with Raman microspectroscopy. Analyst. 2015;140(7):2473–81.

40. Neugebauer U, Bocklitz T, Clement JH, Krafft C, Popp J. Towards detection and identification of circulating tumour cells using Raman spectroscopy. Analyst. 2010;135(12):3178–82.

41. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20(3):273–97.

42. Breiman L. Random forests. Mach Learn. 2001;45(1):5–32.

43. Gaydou V, Polette M, Gobinet C, Kileztky C, Angiboust J-F, Manfait M, et al. Vibrational analysis of lung tumor cell lines: implementation of an invasiveness scale based on the cell infrared signatures. Anal Chem. 2016;88(17):8459–67.

44. Palermo A, Fosca M, Tabacco G, Marini F, Graziani V, SantarsiaMC, et al. Raman spectroscopy applied to parathyroid tissues: anew diagnostic tool to discriminate normal tissue from adenoma.Anal Chem. 2018;90(1):847–54.

45. Chang C-C, Lin C-J. LIBSVM: a library for support vector ma-chines. ACM Trans Intell Syst Technol. 2011;2(3):27.

46. Filzmoser P, Liebmann B, Varmuza K. Repeated double cross val-idation. J Chemom. 2009;23(4):160–71.

47. Varmuza K, Filzmoser P. Repeated double cross validation (rdCV)–a strategy for optimizing empirical multivariate models, and forcomparing their prediction performances. In: KhanmohammadiM, editor. Current Applications of Chemometrics. Hauppauge:Nova Science Publishers; 2014. p. 15–32.

48. Hastie T, Tibshirani R, Friedman J. The elements of statistical learn-ing: data mining, inference, and prediction, second edition. 2e éd.New York: Springer-Verlag; 2009. (Springer Series in Statistics)

49. Guo S, Bocklitz T, Neugebauer U, Popp J. Common mistakes incross-validating classification models. Anal Methods. 2017;9(30):4410–7.

50. Botelho BG, Reis N, Oliveira LS, Sena MM. Development and analytical validation of a screening method for simultaneous detection of five adulterants in raw milk using mid-infrared spectroscopy and PLS-DA. Food Chem. 2015;181:31–7.

51. Molinaro AM, Simon R, Pfeiffer RM. Prediction error estimation: a comparison of resampling methods. Bioinformatics. 2005;21(15):3301–7.

52. Kim J-H. Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Comput Stat Data Anal. 2009;53(11):3735–45.

53. Sattlecker M, Bessant C, Smith J, Stone N. Investigation of support vector machines and Raman spectroscopy for lymph node diagnostics. Analyst. 2010;135(5):895–901.

54. Bergner N, Bocklitz T, Romeike BFM, Reichart R, Kalff R, Krafft C, et al. Identification of primary tumors of brain metastases by Raman imaging and support vector machines. Chemom Intell Lab Syst. 2012;117:224–32.

55. Schoeller DA, Westerterp M. Advances in the Assessment of Dietary Intake. CRC Press; 2017.

56. Florkowski CM. Sensitivity, specificity, receiver-operating characteristic (ROC) curves and likelihood ratios: communicating the performance of diagnostic tests. Clin Biochem Rev. 2008;29(Suppl 1):S83–7.

57. Saha A, Barman I, Dingari NC, McGee S, Volynskaya Z, Galindo LH, et al. Raman spectroscopy: a real-time tool for identifying microcalcifications during stereotactic breast core needle biopsies. Biomed Opt Express. 2011;2(10):2792–803.

58. Ramos IR, Meade AD, Ibrahim O, Byrne HJ, McMenamin M, McKenna M, et al. Raman spectroscopy for cytopathology of exfoliated cervical cells. Faraday Discuss. 2016;187(0):187–98.

59. Haifler M, Pence I, Sun Y, Kutikov A, Uzzo RG, Mahadevan-Jansen A, et al. Discrimination of malignant and normal kidney tissue with short wave infrared dispersive Raman spectroscopy. J Biophotonics. 2018;11(6):e201700188.

60. Hlaing MM, Dunn M, Stoddart PR, McArthur SL. Raman spectroscopic identification of single bacterial cells at different stages of their lifecycle. Vib Spectrosc. 2016;86:81–9.

61. Managò S, Mirabelli P, Napolitano M, Zito G, Luca ACD. Raman detection and identification of normal and leukemic hematopoietic cells. J Biophotonics. 2018;11(5):e201700265.

62. Hobro AJ, Kumagai Y, Akira S, Smith NI. Raman spectroscopy as a tool for label-free lymphocyte cell line discrimination. Analyst. 2016;141(12):3756–64.

63. Poplineau M, Trussardi-Régnier A, Happillon T, Dufer J, Manfait M, Bernard P, et al. Raman microspectroscopy detects epigenetic modifications in living Jurkat leukemic cells. Epigenomics. 2011;3(6):785–94.

64. Peterson LC, Bloomfield CD, Sundberg RD, Gajl-Peczalska KJ, Brunning RD. Morphology of chronic lymphocytic leukemia and its relationship to survival. Am J Med. 1975;59(3):316–24.

65. Oscier D, Else M, Matutes E, Morilla R, Strefford JC, Catovsky D. The morphology of CLL revisited: the clinical significance of prolymphocytes and correlations with prognostic/molecular markers in the LRF CLL4 trial. Br J Haematol. 2016;174(5):767–75.

66. Hüttemann M, Pecina P, Rainbolt M, Sanderson TH, Kagan VE, Samavati L, et al. The multiple functions of cytochrome c and their regulation in life and death decisions of the mammalian cell: from respiration to apoptosis. Mitochondrion. 2011;11(3):369–81.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
