ORIGINAL ARTICLE

Eye state classification from electroencephalography recordings using machine learning algorithms

Łukasz Piątek1,2*, Patrique Fiedler2, Jens Haueisen2

1Department of Expert Systems and Artificial Intelligence, University of Information Technology and Management in Rzeszów, Rzeszów, Poland; 2Institute of Biomedical Engineering and Informatics, Technical University Ilmenau, Ilmenau, Germany

ABSTRACT

Background and Objectives: Current developments in electroencephalography (EEG) foster medical and nonmedical applications outside hospitals. For example, continuous monitoring of mental and cognitive states can contribute to avoiding critical and potentially dangerous situations in daily life. An important prerequisite for successful EEG at home is real-time classification of mental states. In this article, we compare different machine learning algorithms for the classification of eye states based on EEG recordings. Materials and Methods: We tested 23 machine learning algorithms from the Waikato Environment for Knowledge Analysis toolkit. Each classifier was analyzed on four different datasets, since two separate approaches – called sample-wise and segment-wise – were applied in combination with raw and filtered data. These datasets were recorded for 27 volunteers. The different approaches are compared in terms of accuracy, complexity, training time, and classification time. Results: Ten out of 23 classifiers fulfilled the determined requirements of high classification accuracy and short time of classification and can be denoted as applicable for real-time EEG eye state classification. Conclusions: We found that it is possible to predict eye states using EEG recordings with an accuracy from about 96% to over 99% in a real-time system. On the other hand, we found no single best, universal method of classifying EEG eye states in all volunteers. Therefore, we conclude that the best algorithm should be chosen individually, using the optimal classification accuracy in combination with the time of classification as the criterion.

Keywords: Brain-computer-interface, classification, decision rules, decision trees, electroencephalography, eye state, machine learning, Waikato environment for knowledge analysis

*Address for correspondence: Dr. Łukasz Piątek, Institute of Biomedical Engineering and Informatics, Technical University Ilmenau, Gustav-Kirchoff 2 Str., 98684 Ilmenau, Germany. E-mail: lukasz.piatek@tu-ilmenau.de

Website: www.digitmedicine.com
DOI: 10.4103/digm.digm_41_17

How to cite this article: Piątek L, Fiedler P, Haueisen J. Eye state classification from electroencephalography recordings using machine learning algorithms. Digit Med 2018;4:84-95.

This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms. For reprints contact: [email protected]

INTRODUCTION

Electroencephalography (EEG) nowadays enables mobile, noninvasive measurement and analysis of human brain activity during daily activities. EEG measures voltage changes on the scalp resulting from ionic currents within the neurons of the brain.[1] In contrast to methods such as computed tomography or magnetic resonance imaging, which image brain anatomy, EEG allows evaluation of brain physiology with a temporal resolution of milliseconds. Thus, measuring and analyzing human brain activity using EEG offers essential benefits in comparison to other modalities for functional brain imaging and enables applications beyond the clinical routine. Recent research aims at real-time eye state classification,[2] used as a parameter for individual fatigue,[3] drowsiness,[4] and communication based on brain-computer interfacing.[4,5] For instance, analysis of brain stimuli can help paralyzed persons control devices using commands based on eye blinking patterns.[5,6] Yeo et al.[4] developed a specialized system for detecting dangerous levels of sleepiness during driving based on the eye blinking frequency. Jongkees and Colzato reported on recent research using the eye blink rate as a predictor of cognitive function.[7] Furthermore, monitoring eye blink rates can help users avoid eye diseases such as computer vision syndrome.[8]

The main goal of our current research is to choose efficient algorithms applicable to the classification of eye states from EEG recordings in real time. This research attempts to expand the results of Rösler and Suendermann.[2] Specifically, Rösler and Suendermann tested 42 different classifiers from the Waikato Environment for Knowledge Analysis (WEKA) machine learning toolkit[9,10] and showed that it is possible to predict the eye state retrieved from EEG with a classification accuracy higher than 97%. They chose the KStar algorithm as the best performing classifier,[11] having the lowest classification error rate of only 2.7%. The main drawback of the KStar algorithm is its long time of classification; as a result, it cannot be used to classify large datasets in real-time systems. Evaluation based solely on accuracy is consequently unsuitable for real-time applications and online EEG processing. In addition, Rösler and Suendermann performed tests on a dataset of only one volunteer, considerably limiting the generalizability of the results. Therefore, in our research, the classifiers are evaluated in terms of three factors: classification accuracy (Acc(C)), time of classification of new, unseen cases (ToC(C)), and time of training the classifier (ToL(C)). The first two factors are prioritized for the evaluation of a classifier's efficiency, while the third factor is not critical. We tested each classifier on data acquired from 27 volunteers. Furthermore, the data were analyzed using two separate approaches – called sample-wise and segment-wise – in combination with raw and prefiltered data. Thus, our results extend the previous publications and develop the field toward real-life applications.

MATERIALS AND METHODS

Data acquisition
Data acquisition was performed in a research EEG laboratory environment using medically approved equipment. Electrodes were placed in accordance with the 10–20 system for electrode placement[12] using a commercial EEG cap system (Waveguard, Advanced Neuro Technologies B.V., Enschede, Netherlands) comprising silver/silver chloride electrodes. The layout of the EEG electrodes is shown in Figure 1. All electrode positions were manually filled with commercial electrolyte gel (ECI Electro-Gel, Electro-Gel International, USA). Electrode-skin impedances were checked to be below 25 kΩ at all electrodes. For data acquisition, we used a commercial setup (Eego Mylab, Advanced Neuro Technologies B.V., Enschede, Netherlands) at a sampling rate of 1024 samples/s. The reference electrode was placed at the right mastoid, while the ground (GND) electrode was placed at the left mastoid position. Before the recordings, the volunteers were informed about the equipment, measurement processes, and test procedures. Every volunteer gave written informed consent. Ethics committee approval was obtained before the study.

All 27 volunteers were healthy, with no known record of neurological, psychological, ophthalmological, or dermatological diseases. The average age of the volunteers was 27.6 ± 8.2 years. Three different EEG sequences were analyzed: eyes open, eyes closed, and externally triggered eye blinks. During the eyes open session, the volunteers were asked not to blink their eyes for about 1 min during a recording of 2 min. During the eyes closed session, the volunteers had to keep their eyes closed for about 2 min. Within the eye blink session, the volunteers had to blink their eyes naturally after a beep tone while avoiding additional eye blinks. The interval between beep tones was 3 s.

Datasets
The input data for training and subsequent testing of the classifiers were saved in the form of two-dimensional decision tables (datasets), in which the cases – i.e., single samples-in-time or segments consisting of ten subsequent samples [Figure 2] – are described by conditional attributes and a decision.

Figure 1: A map of the electrode positions of the standard 10–20 electrode placement system used during the electroencephalography measurements


Attributes are independent variables, whereas the decision is one dependent variable.[13] A simplified example of a decision table is given in Table 1, with the electrode positions (C3, Cz, and C4) as attributes and the decision Eye_state. The set of all cases labeled with the same decision value is called a concept. For instance, in Table 1, the case set (1, 2, 4) is the concept of all cases for which the value of Eye_state is closed eyes. The two remaining concepts are the sets (3, 5, 8) (for these cases, Eye_state has the value open eyes) and (6, 7) (with an Eye_state value of blinking).

The eye states were manually annotated by visual inspection and selection of the corresponding EEG recordings. For eyes open and eyes closed, sequences of 10 s were selected from the overall 2 min recordings in which no eye blink or other artifact was visible in any of the 19 electrode signals. For eye blinks, ten representative individual eye blink segments were selected, using 1 s of data following a beep tone trigger. No multiple eye blinks were allowed. The eye blink state was defined to start at the initial deviation from the EEG baseline and to end at the return to baseline. Figure 3 shows an example of an eye blink signal at electrode Fp1. All data before and after the eye blink were labeled as the eyes open state.

In our experiments, four differently processed datasets – recorded for 27 volunteers – were used, including (i) raw sample-wise, (ii) filtered sample-wise, (iii) raw segment-wise, and (iv) filtered segment-wise datasets [Figure 2]. We ensured that all prepared and tested datasets were complete, without any missing attribute values. Furthermore, all cases were preclassified by an expert, i.e., every sample-in-time or segment was assigned to one of the three classes.

Sample-wise approach
The raw sample-wise dataset contains separate individual samples-in-time acquired in parallel at 19 electrode positions. Each case (i.e., sample-in-time) is thus described by 20 attributes [Figure 2], comprising 19 descriptive attributes representing the EEG values acquired from the analyzed electrodes and one decision attribute for the eye state (closed, open, or blinking). The raw sample-wise dataset (denoted SamRD) contains 829,494 samples-in-time, including 276,507 cases for closed eyes, 474,641 cases for open eyes, and 78,346 cases for eye blinking [Figure 4a].

The second dataset – filtered sample-wise (denoted SamFD) – contains prefiltered data, comprising the same samples used for the raw data. However, before extraction of the selected data sequences, the whole recording was filtered using a Butterworth bandpass filter with cutoff frequencies of 1 and 40 Hz as well as a Butterworth bandstop filter with cutoff frequencies of 49 and 51 Hz. The filtered dataset (SamFD) contains an identical number of cases for each concept as the original (raw) dataset.

Table 1: An illustrative artificial dataset comprising three attributes/channels (C3, Cz, and C4) and one decision for eight cases (samples-in-time)

Case   C3        Cz       C4        Decision (Eye_state)
1      7,500     7,000    −5,000    Closed eyes
2      7,800     7,000    −5,100    Closed eyes
3      −21,000   50       −10,000   Open eyes
4      7,500     7,300    −5,000    Closed eyes
5      −21,000   50       −11,000   Open eyes
6      3,500     13,000   17,000    Blinking
7      3,900     13,000   16,000    Blinking
8      −22,000   70       −10,000   Open eyes

Figure 2: Process of data acquisition, and the sample-wise versus the segment-wise classification approach. Note that the colored dots in the electroencephalography traces are for illustrative purposes only; no subsampling was used

Figure 3: An example of an eye blink recorded at electrode Fp1 using an Eego electroencephalography amplifier. Unfiltered raw data shown


Segment-wise approach
For the segment-wise datasets, we used a specific approach based on grouping and flattening the data. First, ten subsequent samples-in-time were grouped into a single segment of data. Subsequently, the data were flattened by conversion from the source matrix, consisting of 19 rows (the number of used electrodes) and 10 columns (the number of subsequent samples), into a single data vector. The first 19 attributes of the output vector are the values of all 19 electrodes registered at time t1, the next 19 attributes are the values of the electrodes at time t2, and so on. Each case – i.e., a segment comprising ten subsequent samples-in-time [Figure 2] – is thus described by 191 attributes, including 190 condition attributes and one decision attribute. Therefore, the newly generated classifiers include time dependencies (within the individual segments), i.e., values recorded at different electrodes at different time points ti (i ∈ ℕ, 1 ≤ i ≤ 10). Example decision rules induced by the PART algorithm[9] are as follows:

F8_t4 <= −42036 AND T8_t1 > −30032 => Eye_state = closed_eyes

O1_t2 > −26977 AND F7_t7 > 102460 => Eye_state = open_eyes

P3_t1 > 33327 AND P3_t3 <= 34317 AND Fp2_t10 > 38483 AND Fp1_t10 > −7841.4 => Eye_state = blinking

In the case of both segment-wise datasets – raw and filtered, denoted SegRD and SegFD, respectively – each dataset contains 82,836 segments, including 27,648 cases for closed eyes, 47,379 cases for open eyes, and 7,809 cases for eye blinking [Figure 4b].
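To make the grouping and flattening step concrete, the minimal Java sketch below converts one segment of ten consecutive samples from 19 channels into a single 190-element attribute vector; the class name, channel ordering, and helper methods are illustrative assumptions, not part of our processing pipeline.

/** Illustrative sketch of the segment flattening step (names and ordering assumed). */
public class SegmentFlattener {

    // The 19 electrode labels of the 10-20 montage (the ordering here is an assumption).
    static final String[] CHANNELS = {
        "Fp1", "Fp2", "F7", "F3", "Fz", "F4", "F8", "T7", "C3", "Cz",
        "C4", "T8", "P7", "P3", "Pz", "P4", "P8", "O1", "O2"};

    /**
     * Flattens a [19 channels][10 samples] segment into one 190-element vector,
     * ordered as all channels at t1, then all channels at t2, and so on.
     */
    static double[] flatten(double[][] segment) {
        int channels = segment.length;   // 19
        int samples = segment[0].length; // 10
        double[] vector = new double[channels * samples];
        for (int t = 0; t < samples; t++) {
            for (int ch = 0; ch < channels; ch++) {
                vector[t * channels + ch] = segment[ch][t];
            }
        }
        return vector;
    }

    /** Attribute name for position i of the flattened vector, e.g., "C3_t4". */
    static String attributeName(int i) {
        return CHANNELS[i % CHANNELS.length] + "_t" + (i / CHANNELS.length + 1);
    }
}

The resulting 190 attribute names of the form Channel_ti match the rule conditions shown above (e.g., F8_t4, Fp2_t10).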

Technical information
To enable other researchers to further explore or reproduce the results presented in this article, we released a public repository. After registering on the web page of the Institute of Biomedical Engineering and Informatics of TU Ilmenau, 98684 Ilmenau, Germany, and subsequent approval, the datasets can be downloaded.[14]

The main tool used in our research for classification was the WEKA machine learning toolkit, developed at the University of Waikato, New Zealand.[9,10] Thus, all analyzed datasets were prepared in the required Attribute-Relation File Format (ARFF). ARFF files have two distinct sections, containing header and data information. The format of the first line in the header section is @relation <relation-name>, where <relation-name> is a string with the name (description) of the dataset. The next lines contain attribute declarations in the form of an ordered sequence of @attribute statements, using the notation @attribute <attribute-name> <datatype>. Each attribute in the dataset must have its own @attribute statement, which uniquely defines its name and data type. The WEKA toolkit supports four attribute types: numeric, nominal, string, and date. In the case of our EEG data, all conditional attributes are numeric, i.e., containing real or integer numbers, and the decision is a nominal attribute with three allowed values: closed_eyes, open_eyes, and blinking. The data section in the second part of the file contains the data declaration line and the actual instance (i.e., case) lines. The @data declaration is a single line indicating the beginning of the data segment. Each instance is represented on a single line, with attribute values delimited by commas and a carriage return denoting the end of the instance. The structure of an example dataset containing raw samples-in-time saved in an ARFF file is presented in Figure 5.
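For illustration, a minimal, syntactically complete ARFF sketch restricted to the three channels of Table 1 could look as follows (our actual files declare all 19 electrode attributes; the values below are taken from Table 1):

@relation eeg_eye_state_sample_wise_excerpt

@attribute C3 numeric
@attribute Cz numeric
@attribute C4 numeric
@attribute Eye_state {closed_eyes, open_eyes, blinking}

@data
7500,7000,-5000,closed_eyes
-21000,50,-10000,open_eyes
3500,13000,17000,blinking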

Experiments for all classifiers were performed on a compute server (128 GB RAM, four 6-core Xeon E7450 CPUs, 2.4 GHz).

Validation and testing
During the training and the subsequent testing processes, the hold-out validation method was used, as follows:
• Sixty-six percent of all cases (i.e., 547,466 for both sample-wise datasets and 54,672 for both segment-wise datasets) were used in the training phase
• The remaining 34% of all cases (i.e., 282,028 for both sample-wise datasets and 28,164 for both segment-wise datasets) were used for testing the classifiers.

Figure 4: Number of cases in the raw and filtered sample-wise (i.e., single samples-in-time) datasets (a), and in the raw and filtered segment-wise (i.e., segments of ten subsequent samples each) datasets (b)

Each of the algorithms was tested on the four different datasets, comprising the raw and filtered datasets containing individual samples-in-time (denoted SamRD and SamFD, respectively) and the raw and filtered datasets containing data segments (SegRD and SegFD). Furthermore, all classifiers were tested with the simultaneous evaluation of three factors (a WEKA-based sketch of this procedure follows the list):
• Acc(C) – classification accuracy
• ToC(C) – time of classification of new cases
• ToL(C) – time of the learning phase.
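The following minimal Java sketch shows how such a hold-out evaluation of the three factors can be run against the WEKA API; the file name, random seed, and the J48 example classifier are illustrative choices, not prescriptions from this study.

import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class HoldOutEvaluation {
    public static void main(String[] args) throws Exception {
        // Load the ARFF dataset and mark the last attribute (Eye_state) as the class.
        Instances data = DataSource.read("SamRD.arff"); // illustrative file name
        data.setClassIndex(data.numAttributes() - 1);

        // Hold-out split: 66% training, 34% testing.
        data.randomize(new Random(1));
        int trainSize = (int) Math.round(data.numInstances() * 0.66);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);

        // ToL(C): time of training the classifier.
        J48 classifier = new J48(); // any of the tested classifiers fits here
        long t0 = System.nanoTime();
        classifier.buildClassifier(train);
        long toL = System.nanoTime() - t0;

        // Acc(C) and ToC(C): accuracy and time of classifying the test set.
        Evaluation eval = new Evaluation(train);
        t0 = System.nanoTime();
        eval.evaluateModel(classifier, test);
        long toC = System.nanoTime() - t0;

        System.out.printf("Acc(C) = %.3f%%, ToL(C) = %.2f s, ToC(C) = %.2f s%n",
                eval.pctCorrect(), toL / 1e9, toC / 1e9);
    }
}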

RESULTS

In this research, 23 machine learning algorithms were tested, including (i) three Bayes-type classifiers, (ii) three function-based classifiers, (iii) one "lazy" classifier, (iv) five decision rule-based algorithms, (v) six decision tree-based algorithms (including the aggregated Random Forest classifier), and (vi) four meta-type classifiers [Figures 6 and 7].

Based on the performed tests, 10 of the 23 machine learning algorithms turned out to be effective in the analyzed domain of EEG eye state classification.

Classification accuracy
The accuracy of a classifier C (Acc(C)) is the percentage of testing dataset cases classified correctly. The error rate (or misclassification rate) of a classifier C is equal to 1 − Acc(C). To accept a classifier as effective, or at least acceptable, we set a threshold for its classification accuracy of at least 90%. Second, besides an acceptable average classification accuracy, the analyzed classifier must correctly differentiate cases from the different concepts (i.e., eye states) at the same time.
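In symbols, writing N_test for the number of testing cases and N_correct for the number of correctly classified ones, this reads:

\[ \mathrm{Acc}(C) = \frac{N_{\text{correct}}}{N_{\text{test}}}, \qquad \mathrm{Err}(C) = 1 - \mathrm{Acc}(C), \]

with both values typically reported as percentages, as in the tables below.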

Unexpectedly, for 10 out of the 23 tested classifiers, very poor classification accuracy was achieved [Figures 6 and 7]. The highest classification error rates, of over 65%, were obtained by standard classifiers such as Naive Bayes and Naive Bayes Updatable. A much better result for the Bayes-type classifier subgroup was obtained using Bayes Net (about 83%–87% correct classifications); however, the accuracy was still below the acceptable value of at least 90%. Weak results were also noted for all function-based classifiers, including classification error rates above 40% for the Logistic and Simple Logistic methods and about 25% for the Multilayer Perceptron network. Furthermore, the weakest decision rule and decision tree-based classifiers – i.e., ZeroR and Decision Stump, respectively – had to be rejected. In the case of the Decision Stump classifier, even its aggregation using two different methods – AdaBoost and LogitBoost – did not considerably improve the classification results. For instance, application of the LogitBoost algorithm improved the classification accuracy of the baseline Decision Stump classifier by about 7%, which was still insufficient to reach an acceptable value of Acc(C). Therefore, only 13 out of the 23 classifiers achieved average values of Acc(C) > 90%, including:


• IBk (k = 1) – Implementation of the k-nearest neighbor (kNN) algorithm
• Decision Table – Implementation of the Decision Table Majority (DTMaj) algorithm[15]
• OneR – Simple classifier[16] that extracts a set of rules using only a single attribute
• PART – Indirect approach to decision rule induction using the C4.5 decision tree-based algorithm
• JRip – Implementation of the decision rule learner Repeated Incremental Pruning to Produce Error Reduction, proposed by William W. Cohen[17] as an optimized version of the IREP algorithm
• ZeroR – Class for building and using a 0-R type classifier, which predicts the mean for a numeric class or the mode for a nominal class
• REPTree – Fast decision tree learner
• Random Tree – Class for constructing a tree that considers K randomly chosen attributes at each node[18]
• LMT – Classifier for building "logistic model trees," which are classification trees with logistic regression functions at the leaves
• J48 – Implementation of the C4.5 algorithm (used in two versions – with and without the pruning procedure)
• Random Forest – The aggregated form of the random tree classifier
• Classification via Regression – Multiresponse linear regression (tested with the M5P decision tree-based baseline classifier)
• Randomizable Filtered Classifier – A simple version of the filtered classifier that implements the Randomizable interface, useful for building ensemble classifiers with the RandomCommittee meta-learner (tested with the baseline classifier IBk, k = 1).

Figure 5: An example electroencephalography dataset in an Attribute-Relation File Format file prepared for use in the Waikato Environment for Knowledge Analysis toolkit

Figure 6: Classification accuracy of all 23 tested classifiers for the raw and filtered sample-type datasets

Subsequently, analysis of the confusion matrix of OneR [Table 2] showed that, despite its high average classification accuracy, this classifier is useless due to its poor differentiation of cases belonging to the different concepts. For instance, about 19% of the blinking cases (i.e., 5,078 out of 26,746) were classified as open eyes.

Therefore, in the first phase, 12 out of the 23 classifiers were chosen as potential candidates for real-time classification of EEG eye states [Tables 3 and 4], since each of them (i) achieved Acc(C) > 90% and (ii) simultaneously differentiated cases from the different concepts correctly.
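To make the link between a confusion matrix and the per-concept rates explicit, the short sketch below recomputes the TP rates (recall) from the Random Forest confusion matrix of Table 2; the matrix values are copied from that table, while the class ordering and names are ours.

/** Recomputes per-concept TP rates (recall) from a 3x3 confusion matrix. */
public class ConfusionMatrixRates {
    public static void main(String[] args) {
        String[] concepts = {"Eyes closed", "Eyes open", "Blinking"};
        // Random Forest on SamRD [Table 2]; rows = original class, columns = classified as.
        long[][] m = {
            {94090, 0, 0},
            {0, 160881, 311},
            {0, 1725, 25021}};
        for (int i = 0; i < m.length; i++) {
            long rowTotal = 0;
            for (long count : m[i]) rowTotal += count;
            // TP rate (recall) = correctly classified cases / all cases of that concept.
            System.out.printf("%s: TP rate = %.3f%n", concepts[i], (double) m[i][i] / rowTotal);
        }
        // Prints 1.000, 0.998, and 0.936, matching the Random Forest rows of Table 2.
    }
}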

Time of classification
The time of classification should be short, at least relative to the other classifiers belonging to the same subgroup as the analyzed one. Furthermore, at an EEG sampling frequency of 1024 samples/s, the time of classification must not exceed 0.98 ms; a longer classification time would inhibit real-time classification. Two of the 12 initially preselected classifiers do not fulfill this additional condition. First, the IBk classifier – i.e., the implementation of the kNN algorithm in the WEKA toolkit – must be rejected from the list. The kNN algorithm is a type of instance-based learner (or lazy learner), in which the function is approximated only locally and all computation is postponed until classification. Owing to the necessity of computing the distances from the test case to all other cases of the analyzed dataset, the kNN algorithm is computationally expensive, especially for large training datasets. The results in Tables 3 and 4 show that, in our case, its time of classification is the longest in comparison to the other classifiers. For instance, on an example segment-wise dataset (the SegRD dataset), IBk classifies only about 79 cases/s, whereas most classifiers classify over 100,000 cases per second [Table 4]. A similar situation can be observed for the Randomizable Filtered Classifier aggregated algorithm when IBk is used as the baseline classifier: the speed of classification increased almost twofold on the sample-wise SamRD dataset (from 5.4 to 10.4 cases/s) and roughly threefold on the segment-wise SegRD dataset (from 79.2 to 252.7 cases/s), but this is still not acceptable.
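The 0.98 ms budget follows directly from the sampling rate: a new sample arrives every

\[ \Delta t = \frac{1}{1024\ \text{samples/s}} \approx 0.98\ \text{ms}, \]

so any classifier whose per-case ToC(C) exceeds this interval falls behind the incoming data stream.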

Figure 7: Classification accuracy of all 23 tested classifiers for the raw and filtered segment-type datasets


Based on the results reported in Tables 3 and 4, it can be clearly seen that all 10 remaining (i.e., finally selected) classifiers allow real-time classification, assuming a sampling rate of 1024 samples/s (or less). Specifically, the classification time of a single case – an individual sample-in-time or a segment – ranges from only 2.6 µs to a maximum of 104.9 µs and is consequently much shorter than the interval between consecutive measurements (<0.98 ms). In other words, the ten best classifiers finish classifying a single case well before the next measurement arrives from the data acquisition.

Time of learning
The last important factor is the time required to build the machine learning model (i.e., the training time). If two comparable classifiers have similar values of Acc(C) and ToC(C), the classifier with the shorter training time is selected as more efficient. For instance, the classification accuracy of JRip and PART differs by no more than 0.4% for the sample-wise and 0.071% for the segment-wise approach, and the classification times of both algorithms are almost identical, at a few microseconds. On the other hand, PART greatly outperformed JRip with respect to the time of training the classifier [Table 5]; consequently, the PART algorithm was chosen as superior to JRip.

Table 2: Performance of one less efficient (OneR) and one efficient (Random Forest) decision rule and decision tree-based classifier with reference to the quality of differentiating cases from each concept; both have an average classification accuracy greater than 90% (example results for the raw sample-wise dataset [SamRD])

OneR (classification accuracy: 91.304%)
Concept            TP rate   FP rate   Precision   Recall   F-measure   ROC area
Eyes closed        0.960     0.049     0.908       0.960    0.933       0.956
Eyes open          0.954     0.106     0.923       0.954    0.938       0.924
Blinking           0.501     0.010     0.838       0.501    0.627       0.745
Weighted average   0.913     0.078     0.910       0.913    0.907       0.918

Confusion matrix (OneR)
Original class     Classified as: a        b          c
a = Eyes closed                   90,330   3,418      342
b = Eyes open                     5,159    153,498    1,413
c = Blinking                      441      5,078      21,227

Random Forest (classification accuracy: 99.278%)
Concept            TP rate   FP rate   Precision   Recall   F-measure   ROC area
Eyes closed        1.000     0.000     1.000       1.000    1.000       1.000
Eyes open          0.998     0.014     0.989       0.998    0.994       1.000
Blinking           0.936     0.001     0.988       0.936    0.961       0.999
Weighted average   0.993     0.008     0.993       0.993    0.993       1.000

Confusion matrix (Random Forest)
Original class     Classified as: a        b          c
a = Eyes closed                   94,090   0          0
b = Eyes open                     0        160,881    311
c = Blinking                      0        1,725      25,021

ROC: receiver operating characteristic; TP: true positive; FP: false positive

Table 3: Classification accuracy and time of classification of the 12 preselected classifiers (results for the raw sample-wise dataset [SamRD], sorted by accuracy in descending order)

Classifier                               Accuracy   ToC (entire testing dataset)   ToC (single case)   Speed (cases/s)
Random Forest                            99.278%    29.61 s                        104.9 µs            9,525
Classification via Regression (M5P)      99.118%    2.14 s                         7.6 µs              131,789
LMT                                      98.959%    1.46 s                         5.2 µs              193,170
J48 (C4.5) with pruning                  98.885%    1.11 s                         3.9 µs              254,079
J48 (C4.5) without pruning               98.862%    0.95 s                         3.4 µs              296,871
REP Tree                                 98.750%    0.73 s                         2.6 µs              386,339
Random Tree                              98.630%    0.73 s                         2.6 µs              386,339
JRip                                     98.609%    2.38 s                         8.4 µs              118,499
PART                                     98.267%    3.08 s                         10.9 µs             91,568
DT Majority                              97.399%    2.01 s                         7.1 µs              140,312
IBk (k=1)                                99.389%    52,006.45 s                    184.4 ms            5.4
Randomizable Filtered Classifier (IBk)   98.995%    27,152.9 s                     96.3 ms             10.4

LMT: logistic model trees



DISCUSSION

General information
Based on the achieved results, it is clearly visible that most of the ten selected best classifiers outperformed the KStar classifier proposed by Rösler and Suendermann.[2] KStar belongs to the group of instance-based learning algorithms, the so-called lazy learners. Lazy classifiers – for example, kNN or locally weighted learning[19] – perform classification by comparing the analyzed case to the entire dataset of preclassified cases. Thus, the biggest disadvantage of KStar is that it is far too slow for the envisaged real-time classification.

Furthermore, Rösler and Suendermann achieved the 97.3% classification accuracy of the KStar algorithm on a binary classification problem, since their analysis was performed only for closed and open eyes, whereas in our research, three different concepts (closed, open, and blinking eyes) were investigated. In addition, Rösler and Suendermann used a much smaller dataset with only 14,977 cases, containing information about only one volunteer. In contrast, we analyzed data from 27 different volunteers. Thus, our results advance the field in terms of reliability and generalizability.

Sample-wise versus segment-wise approach
The comparison of the results of the ten selected (i.e., best) classifiers for the sample-wise and segment-wise approaches shows that the classification accuracy is on a similar level. The differences between the results of the two approaches do not exceed 1.5% for any classifier. The smallest difference, of only 0.77%, was noted for Random Forest, and the largest, of 1.43% for the unpruned and 1.45% for the pruned version, for the decision trees generated using the J48 (C4.5) algorithm. These differences result from the fact that, in the segment-wise approach, each segment – containing ten consecutive samples – is more complex for any machine learning algorithm to analyze than an individual sample-in-time, i.e., a case in the sample-wise approach. On the other hand, the segment-wise dataset stores the time dependencies among the ten grouped subsequent samples-in-time. As a result, the possibility of detecting and recognizing patterns characteristic of the different eye states (e.g., eye blinking) is higher. Despite these differences, both proposed approaches proved to be effective, since the classification accuracy ranges from above 96% for the simple DTMaj algorithm up to over 99% for the Random Forest classifier [details in Tables 3 and 4]. Analysis of the results showed that, in general, the classification accuracy is higher for classifiers generated from the raw data than from the filtered data, especially for the segment-wise data [Figure 7]. However, this principle is not universal and cannot be applied arbitrarily to every dataset.

Table 4: Classification accuracy and time of classification of the 12 preselected classifiers (results for the raw segment-wise dataset [SegRD], sorted by accuracy in descending order)

Classifier                               Accuracy   ToC (entire testing dataset)   ToC (single case)   Speed (cases/s)
Random Forest                            98.505%    2.55 s                         90.5 µs             11,045
Classification via Regression (M5P)      98.069%    0.56 s                         19.9 µs             50,293
LMT                                      97.706%    0.75 s                         26.6 µs             37,552
J48 (C4.5) with pruning                  97.433%    0.2 s                          7.1 µs              140,820
J48 (C4.5) without pruning               97.426%    0.33 s                         11.7 µs             85,345
JRip                                     97.390%    0.21 s                         7.5 µs              134,114
REP Tree                                 97.376%    0.16 s                         5.7 µs              176,025
Random Tree                              97.323%    0.17 s                         6.0 µs              165,670
PART                                     97.319%    0.26 s                         9.2 µs              108,323
DT Majority                              96.272%    0.77 s                         27.3 µs             36,576
IBk (k=1)                                98.328%    355.59 s                       12.6 ms             79.2
Randomizable Filtered Classifier (IBk)   97.145%    111.44 s                       3.96 ms             252.7

LMT: logistic model trees

Table 5: Time of training/model generation of the selected classifiers (best 10 out of 23); total time of the training process performed on the entire raw sample-in-time (SamRD) and segment-type (SegRD) datasets

Classifier                               Time of learning (s), SamRD   Time of learning (s), SegRD
Random Forest                            4,404.93                      563.72
Classification via Regression (M5P)      804.85                        5,883.60
LMT                                      9,892.26                      2,625.09
J48 (C4.5) with pruning                  1,805.65                      338.07
J48 (C4.5) without pruning               891.56                        295.23
REP Tree                                 357.64                        134.57
Random Tree                              91.19                         8.77
JRip                                     106,141.39                    4,432.24
PART                                     5,914.41                      1,430.08
DT Majority                              518.94                        477.14

LMT: logistic model trees



With respect to time of classification and time of training, both the sample-wise and the segment-wise approach are efficient [Tables 3 and 5]. Nevertheless, a direct comparison of ToC(C) and ToL(C) between the two approaches is not meaningful, due to their different dimensions. Put simply, the sample-wise dataset contains ten times more cases than the segment-wise dataset, and each case denoted as an individual sample-in-time is described by ten times fewer conditional attributes than a case denoted as a segment. To clarify this, one should reconsider how the data are stored in both approaches. Each sample-wise dataset (SamRD and SamFD) contains 829,494 cases in the form of individual samples-in-time; each case is described by 20 attributes, comprising 19 descriptive attributes holding the voltages recorded from the 19 electrodes and one decision attribute holding the assigned eye state. For both segment-wise datasets (SegRD and SegFD), the data were flattened, so that each of the 82,836 cases – i.e., segments containing ten samples-in-time – is described by 191 variables, comprising 190 descriptive attributes and one decision attribute. In conclusion, we achieved:
• In most cases, a shorter time of classification (or higher speed of classification) of a single new case for the sample-wise approach than for the segment-wise approach [Tables 3 and 4]
• An efficient time of classification of a new case for both approaches, since the value of ToC(C) for a single case – i.e., an individual sample-in-time or an entire segment of data – ranges from a few to several tens of microseconds (not more than 104.9 µs, for the Random Forest classifier); furthermore, the time of classification of a single case is very similar for each selected classifier in both approaches
• A shorter time of training/building the classifier model [Table 5], and a shorter time of its validation – i.e., total time of testing performed on the entire testing dataset – for the segment-wise approach.

Optimal classifiers
The ten best selected classifiers belong to supervised machine learning, including:
• Three algorithms for the induction of decision rules
• Six decision tree-based algorithms
• One meta-type classifier, using the M5P decision tree as its baseline model.

Thus, each output model can easily be interpreted and further investigated by an expert in the EEG domain. Among the three decision rule-based classifiers, the highest classification accuracy and the shortest time of classification are observed for the JRip algorithm. Nevertheless, the PART algorithm was chosen as superior to JRip: the classification accuracy of JRip and PART differs by no more than 0.4% for the sample-wise and 0.071% for the segment-wise approach, and the classification times of both algorithms are almost identical, at a few microseconds, but PART outperformed JRip with respect to the time of training the classifier model, being almost 18 times faster for samples-in-time and three times faster for segments [Table 5]. The same criterion was applied to the six decision tree-based classifiers, where the optimal one appears to be the Random Tree algorithm, mainly because of its shortest training time combined with results for the two remaining factors very similar to those of the other decision trees. In the case of the meta-type classifiers, Classification via Regression (with M5P) is the only one in that subgroup fulfilling the criterion of high classification accuracy together with acceptable values of ToC(C) and ToL(C). The most efficient (optimal) classifier of each subgroup is presented in Table 6.

Based on the aforementioned facts, we furthermore found that there is no single best, universal method of classifying EEG eye states. For instance, the highest classification accuracy was obtained with the Random Forest classifier; the fastest classification could be performed using the Random Tree and/or REP Tree classifiers; and the shortest training time was noted for Random Tree. Further, it can be observed that classifiers built for sample-wise datasets are more complex than those for segment-wise datasets [Table 7], since they contain more decision rules or larger decision trees. Analysis of the classification results showed the classification accuracy to be, in general, slightly higher for classifiers generated from the raw data than from the filtered data. However, this principle is not universal and cannot be applied arbitrarily to every result. Thus, on the basis of our current research reviewing different machine learning methods for EEG eye state classification, we conclude that, for every specific dataset, the best algorithm should be chosen individually, using as the criterion of optimality the classification accuracy in conjunction with the time of classification.

Table 6: Performance of the most efficient classifiers, i.e., the best classifier from each selected subgroup (decision rules, decision trees, and meta-type classifiers); results for the raw sample-wise dataset (SamRD)

Measure                                        PART     Random Tree   Classification via Regression
TP rate (eyes closed)                          1.000    1.000         1.000
TP rate (eyes open)                            0.989    0.988         0.996
TP rate (blinking)                             0.885    0.927         0.929
TP rate (weighted average)                     0.983    0.986         0.991
FP rate (eyes closed)                          0.000    0.000         0.000
FP rate (eyes open)                            0.026    0.016         0.016
FP rate (blinking)                             0.007    0.007         0.002
FP rate (weighted average)                     0.015    0.010         0.009
Classification accuracy (%)                    98.257   98.630        98.118
Classification time, entire test dataset (s)   3.08     0.73          2.14
Classification time, single case (µs)          10.9     2.6           7.6

TP: true positive rate; FP: false positive rate



Further research
Many machine learning algorithms require symbolic (also called categorical) attributes. However, many real-world datasets (e.g., EEG data) comprise numerical (continuous) attributes, i.e., integers or real numbers. Thus, in the fields of data mining and knowledge discovery in databases, a conversion of numerical attributes into symbolic ones – called discretization or quantization – must be performed for such input data during preprocessing. In most cases, discretization leads to more efficient knowledge discovery, where the most important point is maximizing the classification accuracy of the classifier generated from the discretized dataset.[20] An example of a dataset with numerical attributes saved in the form of a decision table is presented in Table 1. This decision table has eight samples, with the variables C3, Cz, and C4 as conditional attributes. These values are too specific for any immediate generalization. For instance, the C3 attribute alone determines the class membership of any sample as well as the set of all three attributes does. Thus, a set of decision rules can be induced by choosing variable C3 as the only attribute, as follows:

(C3, −22000) -> (Eye_state, open_eyes)

(C3, −21000) -> (Eye_state, open_eyes)

(C3, 3500) -> (Eye_state, blinking)

(C3, 3900) -> (Eye_state, blinking)

(C3, 7500) -> (Eye_state, closed_eyes)

(C3, 7800) -> (Eye_state, closed_eyes)

However, it can clearly be seen that classifiers generated in this way would be weak or even useless, because such decision rules are over-specialized. Namely, for any new case (sample-in-time) characterized by a value of attribute (electrode) C3 different from those listed in the training decision table, it will not be possible to determine the Eye_state. Consequently, the classification accuracy of the aforementioned set of decision rules will be close to 0%. Such problems can be resolved using discretization. The conversion is performed by partitioning the domain of each numerical attribute into selected intervals.

Table 7: Classification accuracy and complexity of two example decision rule-based and four decision tree-based classifiers

Classification accuracy / number of decision rules:
Classifier                                SamRD            SamFD            SegRD            SegFD
JRip                                      98.61% / 276     97.27% / 409     97.39% / 129     95.89% / 50
PART (indirect decision rule induction)   98.24% / 816     97.12% / 1549    97.32% / 216     95.89% / 222

Classification accuracy / size of the decision tree:
Classifier                                SamRD            SamFD            SegRD            SegFD
Random Tree                               98.63% / 15759   98.73% / 17667   97.32% / 2825    94.92% / 4873
REP Tree                                  98.75% / 3479    98.12% / 7173    97.38% / 575     95.99% / 519
J48 (C4.5) with pruning                   98.89% / 6927    98.71% / 11003   97.43% / 1337    95.41% / 2155
J48 (C4.5) without pruning                98.86% / 8077    98.73% / 11961   97.43% / 1441    95.21% / 2409


For instance, if the analyzed attribute takes numerical values from the interval [a, b], then this interval is divided into subintervals (a1, b1), (a2, b2), …, (an, bn), where a1 = a, b1 = a2, …, bn−1 = an, and bn = b; the calculated values a2, a3, …, an are called cut points. Methods of discretization can be categorized into different subgroups, including:
• Supervised and unsupervised – Information about class membership is used in supervised methods and is not taken into account in unsupervised[21] (or class-blind[22]) methods
• Dynamic and static – Dynamic[21] methods (also called global in[23]) process all attributes, selecting not only cut points but also attributes, whereas static[21] methods (called local in[23]) work on one attribute at a time; at present, most discretization methods in use are static (or local)
• Global and local – According to Kohavi and Sahami,[24] methods that discretize all numerical attributes before training the classifier are called global; if discretization is accomplished during the training of the classifier, such methods are called local
• Bottom-up and top-down – In bottom-up approaches, small intervals are recursively merged into larger ones, whereas in top-down approaches, intervals are recursively split into smaller ones.

Among the most commonly used techniques for discretization are equal interval frequency, equal interval width, minimal class entropy, the minimum description length principle, and clustering.[20,25]
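As a minimal illustration of the simplest of these techniques, the sketch below derives equal interval width cut points for a single numerical attribute; the choice of three bins and the reuse of the C3 values from Table 1 are our illustrative assumptions (in WEKA, the unsupervised Discretize filter provides equal-width binning for whole datasets).

import java.util.Arrays;

/** Illustrative sketch of equal interval width discretization for one attribute. */
public class EqualWidthDiscretizer {

    /** Returns the n - 1 interior cut points splitting [min, max] into n equal-width bins. */
    static double[] cutPoints(double[] values, int n) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double width = (max - min) / n;
        double[] cuts = new double[n - 1];
        for (int i = 1; i < n; i++) {
            cuts[i - 1] = min + i * width;
        }
        return cuts;
    }

    /** Maps a value to its bin index (0 .. n-1), given ascending cut points. */
    static int binOf(double value, double[] cuts) {
        int bin = 0;
        while (bin < cuts.length && value > cuts[bin]) {
            bin++;
        }
        return bin;
    }

    public static void main(String[] args) {
        // The eight C3 values from Table 1, discretized into three bins.
        double[] c3 = {7500, 7800, -21000, 7500, -21000, 3500, 3900, -22000};
        double[] cuts = cutPoints(c3, 3);
        System.out.println("Cut points: " + Arrays.toString(cuts)); // approx. [-12066.67, -2133.33]
        for (double v : c3) {
            System.out.println(v + " -> bin " + binOf(v, cuts));
        }
    }
}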

Therefore, not all machine learning algorithms can be directly applied to the classification of EEG data. For instance, the Iterative Dichotomizer 3 (ID3) developed by Ross Quinlan[26] is a well-known algorithm, very often used to generate decision trees from different types of datasets using Shannon entropy. However, ID3 requires preprocessing of datasets with numerical values, by initial conversion of all numerical attributes into discrete values. An extension of Quinlan's ID3 algorithm is C4.5,[27] which enables (i) handling both numerical and symbolic attributes, (ii) handling data with missing attribute values, and (iii) pruning trees. The decision tree created using the J48 (C4.5)[22] algorithm for the data from Table 1 is presented in Figure 8. Two other machine learning models, built with the Modified Learning from Examples Module, version 2 (MLEM2)[25] and Random Tree[18] algorithms, are presented in Figures 9 and 10, respectively. Each of these classifiers correctly covers all cases from the training dataset; the only difference is the structure of the models. In both the decision rules induced by MLEM2 and the decision tree generated by Random Tree, two attributes (C3 and Cz) are used in the process of classification. In the decision tree generated by J48 (C4.5), cases are differentiated using three intervals determined by two cut point values, −21,000 and 3,900, of the single attribute C3.

Therefore, to improve the current results in future research, we propose using (i) selected methods of discretization and (ii) feature extraction algorithms in a preprocessing phase. With regard to discretization, Grzymała-Busse recently showed[28] that the new multiple scanning method is significantly better than other methods. Namely, multiple scanning outperforms the internal discretization used in the C4.5 algorithm as well as two other globalized discretization methods: equal interval width and equal frequency per interval.[20] Multiple scanning achieves higher classification accuracy and significantly simpler decision trees than other methods of discretization. Second, feature extraction algorithms should filter the data by selecting only the most important (i.e., dominant) attributes, which are strongly correlated with the decision, while at the same time rejecting so-called redundant attributes. Thus, using algorithms from these two domains should further improve our current results in classifying EEG eye states, mainly in terms of error rate and the complexity of the newly generated machine learning models.

Figure 8: Decision tree for Table 1 generated with the J48 (C4.5)[27] algorithm in the Waikato Environment for Knowledge Analysis toolkit[9,10]

Figure 9: Decision rules for Table 1 induced with the Modified Learning from Examples Module, version 2[25] algorithm (our own implementation in Microsoft Visual Studio 2015 and C#)

Figure 10: Decision tree for Table 1 generated with the random tree[18] algorithm in the Waikato Environment for Knowledge Analysis toolkit[9,10]

CONCLUSIONS

Ten out of the 23 tested classifiers fulfilled the determined requirements of high classification accuracy and short time of classification and can be regarded as applicable for real-time EEG eye state classification. Therefore, in this article, we showed that it is possible to predict eye states from EEG recordings with an accuracy from about 96% to over 99% in a real-time system. Furthermore, all selected classifiers belong to supervised machine learning algorithms, including decision rules and decision trees. Thus, all output models – for example, a set of decision rules induced by the PART algorithm – can easily be interpreted and further investigated by an expert in the EEG domain.

Financial support and sponsorship
Nil.

Conflicts of interest
There are no conflicts of interest.

REFERENCES

1. Niedermeyer E, da Silva FL. Electroencephalography: Basic Principles, Clinical Applications, and Related Fields. 5th ed. Philadelphia, USA: Lippincott Williams and Wilkins, a Wolters Kluwer Company; 2005.

2. Rösler O, Suendermann D. A first step towards eye state prediction using EEG. Proc. of the International Conference on Applied Informatics for Health and Life Science (AIHLS'2013). Istanbul, Turkey; 2013.

3. Stern JA, Boyer D, Schroeder D. Blink rate: A possible measure of fatigue. Hum Factors 1994;36:285-97.

4. Yeo MV, Li X, Shen K, Wilder-Smith EP. Can SVM be used for automatic EEG detection of drowsiness during car driving? Saf Sci 2009;47:115-24.

5. Fuangkaew S, Patanukhom K. Eye state detection and eye sequence classification for paralyzed patient interaction. Proc. of the 2nd IAPR Asian Conference on Pattern Recognition (ACPR'2013). Okinawa, Japan; 2013.

6. Chambayil B, Rajesh S, Rameshwar J. Virtual keyboard BCI using eye blinks in EEG. Proc. of the 6th IEEE International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob'2010). Niagara Falls, ON, Canada; 2010.

7. Jongkees BJ, Colzato LS. Spontaneous eye blink rate as predictor of dopamine-related cognitive function – A review. Neurosci Biobehav Rev 2016;71:58-82.

8. Portello JK, Rosenfield M, Chu CA. Blink rate, incomplete blinks and computer vision syndrome. Optom Vis Sci 2013;90:482-7.

9. Available from: https://www.cs.waikato.ac.nz/ml/index.html. [Last accessed on 2017 Oct 31].

10. Sharma TC, Jain M. WEKA approach for comparative study of classification algorithms. Int J Adv Res Comput Commun Eng 2013;2:2013.

11. Cleary JG, Trigg LE. K*: An instance-based learner using an entropic distance measure. Proc. of the 12th International Conference on Machine Learning (ICML'1995). Lake Tahoe, USA; 1995.

12. Klem GH, Lüders HO, Jasper HH, Elger C. The ten-twenty electrode system of the International Federation. The International Federation of Clinical Neurophysiology. Electroencephalogr Clin Neurophysiol Suppl 1999;52:3-6.

13. Grzymała-Busse JW. Rule induction. In: Maimon O, Rokach L, editors. Data Mining and Knowledge Discovery Handbook. Ch. 13. Boston, MA, USA: Springer; 2005. p. 218-25.

14. The EEG Repository of BMTI. Available from: https://www.tu-ilmenau.de/en/institute-of-biomedical-engineering-and-informatics/research/. [Last accessed on 2017 Oct 31].

15. Kohavi R, Sommerfield D. Targeting business users with decision table classifiers. Proc. of the 4th International Conference on Knowledge Discovery and Data Mining (KDD'1998). New York, USA; 1998.

16. Holte RC. Very simple classification rules perform well on most commonly used datasets. Mach Learn 1993;11:63-90.

17. Cohen WW. Fast effective rule induction. Proc. of the 12th International Conference on Machine Learning (ML'1995). Tahoe City, California, USA; 1995. p. 115-23.

18. Available from: https://www.cs.waikato.ac.nz/ml/weka/book.html. [Last accessed on 2017 Oct 31].

19. Frank E, Hall M, Pfahringer B. Locally weighted naive Bayes. Proc. of the 19th Conference on Uncertainty in Artificial Intelligence (UAI'2003). Acapulco, Mexico; 2003. p. 249-56.

20. Grzymała-Busse JW. C3.4 Discretization of numerical attributes. In: Klösgen W, Żytkow J, editors. Handbook on Data Mining and Knowledge Discovery. Oxford: Oxford University Press; 2004. p. 218-25.

21. Dougherty J, Kohavi R, Sahami M. Supervised and unsupervised discretization of continuous features. Proc. of the 12th International Conference on Machine Learning. Tahoe City, CA, USA; 1995. p. 194-202.

22. Pfahringer B. Compression-based discretization of continuous attributes. Proc. of the 12th International Conference on Machine Learning. Tahoe City, CA, USA; 1995. p. 456-63.

23. Chmielewski MR, Grzymała-Busse JW. Global discretization of continuous attributes as preprocessing for machine learning. Int J Approx Reason 1996;15:319-31.

24. Kohavi R, Sahami M. Error-based and entropy-based discretization of continuous features. Proc. of the 2nd International Conference on Knowledge Discovery and Data Mining. Portland, OR, USA; 1996. p. 114-9.

25. Grzymała-Busse JW. MLEM2 – Discretization during rule induction. Proc. of the International Conference on Intelligent Information Processing and WEB Mining Systems (IIPWM'2003). Zakopane, Poland: Springer-Verlag; 2003. p. 499-508.

26. Quinlan JR. C4.5: Programs for Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers, Inc.; 1993.

27. Quinlan JR. Bagging, boosting, and C4.5. Proc. of the 13th National Conference on Artificial Intelligence. Cambridge, MA, USA: AAAI Press/MIT Press; 1996. p. 725-30.

28. Grzymała-Busse JW. Discretization based on entropy and multiple scanning. Entropy 2013;15:1486-502.
