Classificationofmulticlassmotor imageryEEGsignalsusingsparsity...

Classification of multi class motorimagery EEG signals using sparsitybased dictionary learning approach

Thesis submitted in fulfillment of the

requirements for the degree of

Master of Technology

in

Computer Science and Engineeringby

Joytirmoy Rabha15CS60R12

under the supervision of

Dr. Debasis Samanta

Department of Computer Science and Engineering

Indian Institute of Technology, Kharagpur

West Bengal, India

May, 2017

CERTIFICATE

This is to certify that the thesis titled Classification of multi class motorimagery EEG signals using sparsity based dictionary learning approach,submitted by Joytirmoy Rabha, in the Department of Computer Science andEngineering, Indian Institute of Technology, Kharagpur, for the award of thedegree of Master of Technology, is a record of research work carried out byhim under my supervision and guidance.

The thesis fulfils all the requirements as per the regulations of this institute.Neither this thesis nor any part of it has been submitted for any degree oracademic award elsewhere.

Date: 5th May 2017

Dr. Debasis Samanta

Associate Professor


Indian Institute of Technology Kharagpur

Kharagpur - 721302, India.

CERTIFICATE

This is to certify that the thesis titled Classification of multi class motorimagery EEG signals using sparsity based dictionary learning approach,submitted by Joytirmoy Rabha, in the Department of Computer Science andEngineering, Indian Institute of Technology, Kharagpur, for the award of thedegree of Master of Technology, is a record of research work carried out byhim under the supervision and guidance of Dr. Debasis Samanta.

The thesis fulfils all the requirements as per the regulations of this institute.Neither this thesis nor any part of it has been submitted for any degree oracademic award elsewhere.

Date: 5th May 2017

Dr. Sudeshna Sarkar

Head of Department


Indian Institute of Technology Kharagpur

Kharagpur - 721302, India.

CERTIFICATE

This is to certify that the thesis titled Classification of multi class motorimagery EEG signals using sparsity based dictionary learning approach,submitted by Joytirmoy Rabha, in the Department of Computer Science andEngineering, Indian Institute of Technology, Kharagpur, for the award of thedegree of Master of Technology, has been examined

Date: 5th May 2017

Prof. M. S. Manikandan

External Examinar

Indian Institute of Technology Bhubaneswar

ACKNOWLEDGEMENT

I would first like to thankmy thesis advisor Dr. Debasis Samanta of the De-partment of Computer Science and Engineering at IIT Kharagpur. The doorto Prof. Samanta’s office was always open for advice whenever needed. Heconsistently allowed this paper to be my own work, but steered me in theright direction whenever he thought I needed it.

Secondly, i would like to thank the BCI team with special mention to MrsSreeja S.R for all your constant support and discussions over the past oneyear which has been one of the main reasons for the completion of this work.

Lastly, i would like to express my deepest gratitude to my parents andfriends for providing me with constant support and motivation throughoutmy studies and during the process of researching and writing this thesis. Thisaccomplishment wouldn’t have been possible without the collaboration of allthe aforementioned people. Thank you.

AuthorJOYTIRMOY RABHA

Abstract

Brain Computer Interfaces based on Motor Imagery (MI) signals hasbeen used to classify various imagined motor movement and has an arrayof applications ranging from prosthesis to entertainment. The classificationof MI signals is commonly done using spatial filtering and then using anyconventional classification method. There has been recent developments inDictionary and Sparsity based classification for MI based BCIs. The motorimagery signals captured from an EEG device have low spatial resolution dueto volume conduction. Although its temporal resolution is high, the signal isnoisy and highly correlated. Keeping this in mind its important to extract thediscerning features and select a good classification model for MI based BCIsystems.

The aim of this project is analyze and build a classification modelwhich can discriminate between different motor intentions accurately. In thiswork we attempt to propose a classification model for BCI systems usingdictionary learning and sparsity based classification technique. We analyzedthe frequency bands of various sensory motor rhythms. Band Power, Waveletdomain, DCT based and autoregressive features were extracted to build a dic-tionary which is in turn used to find a sparse representation of test signals.The analysis of the performance of the method on different datasets are pro-vided. The proposed method provides high classification accuracy which isverified using cross-validation.

Keywords:Electroencephalogram (EEG); Brain Computer Interface (BCI);Dictionary Learning; Sparsity; Discrete Cosine Transform (DCT); Motor Im-agery (MI); Feature Extraction.

Contents

Abstract v

1 Introduction 11.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.2 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11.3 State-of-the-art . . . . . . . . . . . . . . . . . . . . . . . . 51.4 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . 61.5 Scope of Work . . . . . . . . . . . . . . . . . . . . . . . . 71.6 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 71.7 Organization of Thesis . . . . . . . . . . . . . . . . . . . . 8

2 Literature Survey 92.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 92.2 Biological background of Motor imagery signals . . . . . . 92.3 EEG signal pre-processing . . . . . . . . . . . . . . . . . . 112.4 Feature Engineering and Modeling of MI signals . . . . . . 122.5 Dictionary learning and Sparse representation . . . . . . . . 14

3 Dataset and Preprocessing 163.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 163.2 Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.2.1 Dataset 1: BCI competition III Dataset IVa . . . . . 163.2.2 Dataset 2: BCI competition III Dataset IIIa . . . . . 19

3.3 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 213.3.1 Bandpass Filtering . . . . . . . . . . . . . . . . . . 213.3.2 Common Spatial Patterns . . . . . . . . . . . . . . . 22

4 Feature Engineering 254.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 254.2 Taxonomy of Features . . . . . . . . . . . . . . . . . . . . 25

4.3 Feature Extraction . . . . . . . . . . . . . . . . . . . . . . . 264.3.1 Time Domain Features . . . . . . . . . . . . . . . . 264.3.2 Frequency Domain Features . . . . . . . . . . . . . 264.3.3 Wavelet Domain Features . . . . . . . . . . . . . . 274.3.4 Auto-regressive Features . . . . . . . . . . . . . . . 284.3.5 Discrete Cosine Transform . . . . . . . . . . . . . . 28

5 Dictionary based Sparse Classification 315.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 315.2 Proposed BCI framework . . . . . . . . . . . . . . . . . . . 315.3 Design of dictionary matrix . . . . . . . . . . . . . . . . . . 32

5.3.1 Dictionary design for two-class MI . . . . . . . . . 325.3.2 Dictionary design for four-class MI . . . . . . . . . 32

5.4 Linear model . . . . . . . . . . . . . . . . . . . . . . . . . 335.5 Sparse Approximation . . . . . . . . . . . . . . . . . . . . 345.6 Sparsity based classification . . . . . . . . . . . . . . . . . 36

6 Results and Conclusion 386.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . 386.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

6.2.1 Classification of two-class MI signals . . . . . . . . 386.2.2 Classification of four-class MI signals . . . . . . . . 46

6.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 48

Bibliography 53

List of Figures

1.1 General BCI System . . . . . . . . . . . . . . . . . . . . . 21.2 Motor imagery based BCI System . . . . . . . . . . . . . . 31.3 International 10-20 system of electrodes. . . . . . . . . . . . 4

2.1 HumanBrain: (Left) PrimaryMotor Cortex, (Right) Homuncu-lus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3.1 Single trail signal for dataset 1. . . . . . . . . . . . . . . . . 173.2 Single trail signal for dataset 2. . . . . . . . . . . . . . . . . 193.3 Splitting of the signal trial into epochs. . . . . . . . . . . . . 203.4 Power Spectral Density of some of the EEG channels. (Left)

PSDof central electrodes (C5,C3,C1,C2,C4,C6). (Right) PSDof frontal electrodes (F5,F3,F1,F2,F4,F6) . . . . . . . . . . 21

3.5 Colormap of the magnitutde of coefficients of CSP filtersprojected on the scalp. . . . . . . . . . . . . . . . . . . . . 23

4.1 Scalp plot of bandpower and wavelet energy features for twoclass . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

5.1 Framework of proposed BCI . . . . . . . . . . . . . . . . . 315.2 Two class dictionary with each column as a feature vector of

an epoch. . . . . . . . . . . . . . . . . . . . . . . . . . . . 335.3 Dictionary design for four class motor imagery data. (Left)

Four three dimensional matrix one for each class of features.(Right) Grid view of each matrix where l and b are real num-bers such that l×b = N (no of training signals for each class).The third axis contains the features vectors of each of thoseN training data. . . . . . . . . . . . . . . . . . . . . . . . . 34

6.1 Sparse representation of Right hand (left) and right Foot (right). 39

LIST OF FIGURES LIST OF FIGURES

6.2 Confusion matrix of classifier 1 using wavelet energy fea-tures dictionary . . . . . . . . . . . . . . . . . . . . . . . . 40




6.6 Confusion matrix of classifier 1 using bandpower featuresdictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . 42




6.10 Precision recall curve for for the two-class data using waveletfeature dictionary . . . . . . . . . . . . . . . . . . . . . . . 44

6.11 Precision recall curve for for the two-class data using band-power feature dictionary . . . . . . . . . . . . . . . . . . . 44

6.12 Average ROC curve for wavelet energy dictionary . . . . . . 456.13 Average ROC curve for bandpass dictionary . . . . . . . . . 456.14 Sparse representation of (top-left) Left class, (top-right) Right

class, (bottom-left) Foot class and (bottom-right) Tongue classsignals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47





6.19 The Precision recall curve of all the classes with bandpowerdictionary . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

6.20 The ROC curves of all classes with bandpower dictionary . . 51

ix

List of Tables

4.1 Time Domain Features . . . . . . . . . . . . . . . . . . . . 264.2 Frequency Domain Features . . . . . . . . . . . . . . . . . 274.3 Wavelet domain features . . . . . . . . . . . . . . . . . . . 28

6.1 Acuuracies of the classifiers using 10 fold cross validationscheme and bandpower dictionary. . . . . . . . . . . . . . . 39

6.2 Acuuracies of the classifiers using 10 fold cross validationscheme and wavelet energy dictionary. . . . . . . . . . . . . 39

6.3 Accuracies of the classifiers using 10 fold cross validationand bandpower dictionary . . . . . . . . . . . . . . . . . . . 48

Chapter 1

Introduction

1.1 Overview

In this chapter wewill give a brief view of the work done in the developmentof a modernMotor Imagery based brain computer interface. We will begin bydiscussing the various keywords related to BCIs followed by the current stateof the art. The motivation,scope and objective of the work done is discussednext and lastly, we will discuss the contribution and the organization of thethesis.

1.2 Context

Human Computer Interaction has been through input devices such as key-board and mouse ever since the pioneering of the personal computers. Withthe exponentially developing world of technology and information, our handscan barely keep up with the interaction. Our brains on the other hand areknown to be one of the fastest processing units out there. As we live in thegeneration where our digital selves are as important as our physical selves,it is also important that we look into ways to communicate with technologybetter. In another perspective, there are around 1 billion in the world todaywho suffer from some form of disability. These people can utilize moderntechnologies such as BCIs (Brain-computer interface systems) to live life onequal terms with the able bodied.

BCI (Brain-computer interface) is a system for controlling a device, for ex-ample, wheelchair or a neuroprothesis by human intention which does notdepend on the brain’s normal output pathways of peripheral nerves and mus-cles. They provide a direct communication between the brain and the com-puter without muscle control. For those people with severe physical disabil-

1.2 Context Introduction

Figure 1.1: General BCI System

ities, such as damaged limbs, brain-stem stroke, spinal cord injury, cerebralpalsy, amyotrophic lateral sclerosis (ALS) or other neuro-muscular diseases,BCI is the only method to communicate. Currently, several functional imag-ing modalities like EEG, MEG, fMRI, etc., are available for research. BCIresearch leads to variety of applications in various fields such as MedicalSciences, Artificial Intelligence and Gaming. Initially, the applications weremore aimed towards the disabled. The Brain controlled wheelchair and pros-thesis being two of the famous ones. In recent times, BCI applications for thephysically able population has also seen progress in the form of games, at-tention monitoring, working memory and vigilance detection among others.Fig 1.1 shows an pictorial representation of a BCI system.

If we take a look at the various assistive technologies [1] for disabled users,be it screen readers, eye tracker or something else, we find that all of theseuse the same data flow path from human brain to hands or some other bodypart to computer peripherals like camera, keyboard to the computer memoryor CPU. What if the humans only think actively and computers somehow un-derstands the users’ intention? This forms the underlying capability of BCIwhere it distinguishes different patterns of brain activity, each being associ-ated to a particular intention or mental task, to directly control the HCI ap-plication. Hence, creating BCI based hands-free, touch-free applications cangreatly enhance the life of people with disabilities. Moreover, it is not easy

2


Figure 1.2: Motor imagery based BCI System

to apply BCI systems to operate an application like virtual keyboard.

Among the available non-invasive devices Electroencephalography (EEG)is unique and most often used, since it provides high temporal resolution ofthe measured brain signals, is relatively convenient, affordable, safe and im-proves user comfort level [2]. In BCI systems, EEG signals provide an effec-tive way to help people who have severe motor disabilities to communicatewith the outside world just by brain signals. In recent years, increasing atten-tion has been devoted to motor imagery (MI) task problems related to BCIapplications as MI activities represent an efficient mental strategy to operatea BCI system. A MI task may be seen as a mental rehearsal of a motor actsuch as movements of hands, feet, fingers and tongue without any overt mo-tor activity. The general concept diagram of a motor imagery based BCI isshown in Fig 1.2.

EEG is a method used to detect the brain’s electrical activity when it isattached to the scalp of a subject using electrodes. The electrodes of the EEGdevice are placed on the scalp using 10-20 international system as shownin Fig 1.3. It records and stores the voltage levels of brain activity over a

3


Figure 1.3: International 10-20 system of electrodes.

period of time. Modern BCIs involves the study of these brain patterns tobuild mathematical models to predict and interpret the signals based on themental and physiological activities of the subject. The interpreted signals inturn acts as commands to a computer to carry out specific tasks.

MI is a mental process of simulating a motor action in one’s mind. Thesesignals have a frequency band of 8-30 Hz in which activity is maximum. Theobjective of this project is to take the raw signals which are marked with thedifferent motor imagery movements at specific timestamps and teach a math-ematical model to distinguish the different classes of MI tasks. In this work,we attempted to use a new approach using dictionary learning and sparsitybased classification for MI signal classification. we have focused on the al-pha (α) and beta (β) rhythms of the brain to find relative features from theMI signals to build a dictionary of various motor imagery simulations. Thisdictionary is used to find a sparse representation of an input signal. A greedy,inner product based technique is used to find the sparse representation. Power,

4

1.3 State-of-the-art Introduction

variance or peak value is used to finally classify the sparse representation.

1.3 State-of-the-art

Non-Invasive BCIs suffer from volume conduction and noise and thus be-comes difficult to correctly identify the intent of the person. EEG-based BCIsystems are mostly build using visually evoked potentials (VEPs), event-related potentials (ERPs), slow cortical potentials (SCPs) and sensorimotorrhythms (SMR). Out of these potentials SMR based BCI provides high de-grees of freedom in association with real and imaginary movements of hands,arms, feet and tongue. The neural activities associated with SMR based mo-tor imagery (MI) BCI are the so-called mu (7-13 Hz) and beta (13-30 Hz)rhythms. These rhythms are readily measurable in both healthy and disabledpeople with neuromuscular injuries. Upon executing real or imaginary motormovements, it causes amplitude supression or enhancement of mu rhythmand these phenomena are called event-related desynchronization (ERD) andevent-related synchronization (ERS), respectively. Traditional BCIs rely onthis neurophysiological phenomenon to determine whether the user is per-forming a motor task or not. A simple way to detect these events is by calcu-lating the band powers in frequency domain.

As the dynamics of brain potentials associated with motor imagery taskscan form spatio-temporal patterns, the common spatial pattern (CSP) [3] isa highly successful algorithm to extract relevant MI features. This algorithmis designed to capture the spatial projections of ERD/ERS in such a way thatthe power ratio differ greatly between two classes. It performs source local-ization which is otherwise difficult to perform due to volume conduction.Although CSP is a popular method, it is non-robust to outliers, noise andover-fits with data. Hence, several variants of CSP have been devised suchas common spatial spectral pattern (CSSP) [4], spectrally weighted commonspatial pattern [5], iterative spatio-spectral patterns learning (ISSPL) [6], fil-ter bank common spatial pattern [7], and augmented complex common spatialpattern [8].

Since raw EEG data is of very high dimension, it is not possible to use itdirectly for classification. The goal of the feature extraction is to pull outfeatures (special patterns) from the original data for a reliable classification.Feature extraction is the most important part of EEG signal analysis because

5

1.4 Motivation Introduction

the classification performance will be degraded if the features are not cho-sen well [9]. The feature extraction stage must reduce the original data toa lower dimension that contains most of the useful information included inthe original vector. It is, therefore, necessary to find out the key features thatrepresent the whole dataset, depending on the characteristics of the dataset.Feature extraction is usually followed by assigning EEG segments to differ-ent groups/classes. Supervised and unsupervised classification methods maybe used. Various combinations of classifiers may also be used to increase theaccuracy and robustness of the classification.

In recent years, sparse representation based classification has received agreat deal of attention in image recognition [10] and speech recognition [11]field. In compressive sensing (CS), the sparse representation idea has beenused and according to CS theory, natural signals can be represented as sparsesignals on certain constraints [12, 13]. If the signal and an over-completedictionary matrix is given, then the objective of the sparse representation isto compute the sparse coefficients, so that the signal can be represented asa sparse linear combination of atoms (columns) in dictionary. If the over-complete dictionary matrix is not choosen properly in sparse representationbased classification, then it might result in very poor performance.

1.4 Motivation

Communication comes easy to the able-bodied, whereas it poses a seriouschallenge to the disabled. With human computer interaction (HCI), it is con-ceivable to build systems and have yielded satisfactory results. However, notmuch focus has been given on disabled people. It is time that we take this areaa step further and help the disabled. This is where BCI can play a pivotal role.The current BCI works ranges from laboratory research to gaming and enter-tainment. Although many BCIs have been developed, there is still scope forimprovement in its prediction accuracy and real time feedback. Such issueshave restricted BCIs from being widely used and fulfilling the requirement ofbeing a full fledged Human Computer Interaction method. The offline meth-ods for MI signal classification shows promising accuracy but it requires alot of pre-processing techniques which is computationally expensive and re-sults in a delay in real-time system. Another major issue with the existingclassification methods are it might work for binary MI signals but gives very

6

1.5 Scope of Work Introduction

poor classification accuracy with multi-class MI signals. The aforementionedissues motivate us to build a classification model that works for any class ofMI signals and produces high accuracy for real-time BCI applications.

1.5 Scope of Work

We have had generations of ways for Human Computer Interaction. We havecome a long from using punch cards to feed instructions to the computer, toour modern day state of the art speech recognition softwares. There has beenvarious ways to interact with the computer in the last few decades. For mod-ern PCs the command line interface and graphical user interface has beenand still is the most preferred medium for interaction. Today we have tech-nologies like Augmented and Virtual reality and real time speech recognitionat the palm of our hands. Brain Computer Interface technology is still to beutilized to its full potential . An accurate and online method to classify anyclass of motor imagery signals will allow a platform of applications to bebuilt using it as an interface between applications and user’s brain. AlthoughEEG signals depends on various aspects of the subject and his environment,a well optimized BCI system to provide a hands free human computer inter-action does not seem like a distant dream for mankind. To move towards thisgoal, we build up a classification model using dictionary learning and sparserepresentation based classification. This model works for any class of motorimagery signals and the dictionary created is scalable which changes accord-ing to the number of MI tasks performed. This novel dictionary learning andsparse representation based technique produces high accuracy for multi-classMI signal classification.

1.6 Objectives

The objective of this project is to build an online BCI system using motorimagery EEG signals with accurate predictions and real time feedback. Thiswork focuses on the analysis of existing feature extraction and classificationof EEG signals followed by the design of a BCI framework. The objectivesas the part of the research study include:

• Collecting signals related to motor imagery and signal preprocessing:The experiment is carried out to collect public dataset of sensorimotorrhythms synchronized to motor activities. Then, the acquired raw EEG

7

1.7 Organization of Thesis Introduction

signals are preprocessed using bandpass filtering and common spatialpattern filtering.

• Feature engineering : In this process, each preprocessed EEG signal isrepresented by a set of parameters (features). Then, Feature Normaliza-tion is performed in order to avoid possible problems caused by inade-quately scaled features. This part of the EEG processing is crucial, be-cause it provides the ability to distinguish between different classes. Itthus directly affects the accuracy of the final classification.

• Modeling for command detection : Various supervised and unsupervisedclassification methods can be used to obtain the final results of the anal-ysis. Here, we planned to implement dictionary learning and sparse rep-resentation based classification method to classify the MI signals.

1.7 Organization of Thesis

The thesis is organized as follows:

• Chapter 2 is the literature survey of the existingmethods in pre-processingof MI based EEG signals, feature engineering and classification of MIsignals.

• Chapter 3 gives us details of the datasets used in this work and prelimi-nary processing done on the dataset.

• Chapter 4 describes the various features that can be extracted from theMI based EEG signals.

• Chapter 5 will explain the proposed methodology and the various stagesof feature engineering and classification.

• Chapter 6 will conclude the work with results of the experiments andanalysis and will give a plan for the future work.

8

Chapter 2

Literature Survey

2.1 Overview

In this chapter we present a thorough survey of the brain and motor activities,the recent developments in brain computer interfaces using motor imagerysignals.

2.2 Biological background of Motor imagery signals

The brain is one of the most fascinating and mysterious organs of the humanbody. It controls all the our physiological activities of human beings.There arebillions of neurons interconnected in our brain. These transfer the impulsesthrough a network of neurons and control our movement, feelings and otheraspects of human life. The transfer of ions from neuron to neuron duringevery physiological or mental activities creates an electric field which can becaptured through EEG for various applications [14].

The brain generates signals when electrical impulses pass through the neu-rons in the brain. Since EEG depends largely on the subject present state andhis environment, brain signals differ widely in terms of frequency, amplitudeand shape. Our brains have two main type of processes. These include EventRelated Potentials (ERPs) [15] and Oscillatory processes [16]. ERPs is themeasure of brain potential in direct response to a sensory, cognitive or mo-tor action. The ERPs related to motor actions are further subdivided into twoclasses.

• Event Related Synchronization : The potential of the brain increasesafter certain events. This increase in potential is called Event related Syn-chronization.

2.2 Biological background of Motor imagery signals Literature Survey

• Event RelatedDesynchronization :The potential of the brain decreasesafter certain events. This decrease in potential is called Event relatedDesynchronization.

The above metioned ERPs are widely used to classify various events in thebrain and is well versed in literature. Apart fromERPs our brains also exhibitssome repetitive activity which are characterized by amplitude, frequency andphase of the signals. These brain rhythms can be classified into the followingtypes:

• Delta Waves : Delta brainwaves are slow, loud brainwaves (low fre-quency and deeply penetrating, like a drum beat). They are generated indeepest meditation and dreamless sleep. Their frequency range from 0.5Hz to 3 Hz.

• Theta Waves : Theta brainwaves occur most often in sleep but are alsodominant in deep meditation. It acts as our gateway to learning andmem-ory. Their frequency range from 3 Hz to 8 Hz.

• Alpha Waves : Alpha brainwaves are dominant during quietly flowingthoughts, and in some meditative states. Alpha is the power of now, be-ing here, in the present. Alpha is the resting state for the brain. Alphawaves aid overall mental coordination, calmness, alertness, mind/bodyintegration and learning. Their frequency range from 8 Hz to 13 Hz.

• BetaWaves :Beta brainwaves dominate our normal waking state of con-sciousness when attention is directed towards cognitive tasks and the out-side world. Their frequency range from 13 Hz to 38 Hz.

• GammaWaves :Gamma brainwaves are the fastest of brain waves, andrelate to simultaneous processing of information from different brain ar-eas. Their frequency range from 38Hz to 42Hz.

In this work we are focused onMotor Imagery based BCIs. MI based BCIsuse sensory motor rhythms such as Alpha (α) and Beta (β) over the motorcortex area of the brain. The sensory motor cortex cross section is knownas homunculus which provides a mapping of which part of the brain controlwhich part of the body. Fig 2.1 shows the motor cortex and the homunculusregion of the brain

10

2.3 EEG signal pre-processing Literature Survey

Figure 2.1: Human Brain: (Left) Primary Motor Cortex, (Right) Homunculus

2.3 EEG signal pre-processing

EEG signal recorded using scalp electrodes are usually noisy and non sta-tionary. These include unwanted signals, artifacts, noise from electrical sys-tem and environmental noise. Thus we need powerful preprocessing to makesense of the highly noisy data. EEG signals are known to have poor spatialinformation due to volume conduction [17]. Spatial filtering is the process toperform point source localization and get rid of the volume conduction effect.Some spatial filtering methods include Laplacian [18], common average ref-erence [19],cross-coorelation [20] and Common Spatial Patterns [19].

As the dynamics of brain potentials associated with motor imagery taskscan form spatio-temporal patterns, the common spatial pattern (CSP) [3] isa highly successful algorithm to extract relevant MI features. This algorithmis designed to capture the spatial projections of ERD/ERS in such a way thatthe power ratio differ greatly between two classes. In our work we have usedthe Common Spatial Patterns method.

Although CSP is a popular method, it is non-robust to outliers, noise andover-fits with data. Hence, several variants of CSP have been devised suchas common spatial spectral pattern (CSSP) [4], spectrally weighted commonspatial pattern [5], iterative spatio-spectral patterns learning (ISSPL) [6], fil-ter bank common spatial pattern [7], and augmented complex common spatialpattern [8].

11

2.4 Feature Engineering and Modeling of MI signals Literature Survey

2.4 Feature Engineering and Modeling of MI signals

The goal of the feature extraction is to pull out features (special patterns) fromthe original data for a reliable classification. Feature extraction is the mostimportant part of EEG signal analysis because the classification performancewill be degraded if the features are not chosen well [9]. The feature extractionstage must reduce the original data to a lower dimension that contains mostof the useful information included in the original vector. It is, therefore, nec-essary to find out the key features that represent the whole dataset, dependingon the characteristics of the dataset. Because of the large numbers of features,various feature reduction methods can be used. Feature engineering is usuallyfollowed by assigning EEG segments to different groups/classes. Supervisedand unsupervised classification methods may be used. Various combinationsof classifiers may also be used to increase the accuracy and robustness of theclassification.

Tam et al. [21] used Fisher’s criterion and the support-vector machine re-cursive feature elimination (SVM-RFE) techniques to reduce 64 channels to18 channels. Then common spatial pattern (CSP) was used to preprocess thetwo class MI EEG signals. The log-variance of each channel was used asthe feature. Then the log-variance of the spatially filtered signal was usedto train a Fisher’s linear discriminant (FLD) classifier and it classified motorimagery signals with an accuracy of 91.5 ± 2.6%. To attain better classifica-tion accuracy Tam et al. suggested to select frequency band as feature alongwith log-variance, primarily for two classes. In [22], Yuan et al. used fre-quency domain features by applying fourier transform for MI based BCI andin [23] Park et al. uses the empirical mode decomposition (EMD) to extracttime-frequency domain features from MI signals.

In [24], for a two class MI signals, nine statistical features are extractedfrom each data points as they are the most representative values to describethe distribution of the EEG signals. The extracted features are the minimum,maximum, mean, median, mode, first quartile, third quartile, inter-quartilerange and standard deviation of the EEG data. For a symmetric distribu-tions, the mean and the standard deviation and for a skewed distributions,the inter-quartile range are the appropriate measures for measuring the cen-tre and spread of the data. Like mean and median, the mode is used as a way

12

2.4 Feature Engineering and Modeling of MI signals Literature Survey

of capturing important information about a data set. For these reasons, theyconsidered these nine statistical features as the valuable parameters for repre-senting the distribution of EEG signals. The obtained features are employedas the input for the Least Square Support Vector Machine (LSSVM) classifierand it achieved an accuracy of about 95.72%.

Li et al. in 2014 [25] extended the above work and they used cross corre-lation (CC) technique to extract the representative features from the MI tasksEEG data. From each cross-corrologram the features extracted are the mean,standard deviation, skewness, kurtosis, maximum and minimum. Skewnessdescribes the shape of a distribution that characterizes the degree of asym-metry of a distribution around its mean. Kurtosis measures of whether thedata are peaked or flat relative to a normal distribution. Then, the logistic re-gression (LR) model is used for the classification ofMI tasks and achieved anaccuracy of about 98.57%. This method has been compared with eight recentclassification methods and it performs the best for the motor imagery signalclassification.

It is known that characteristic motor imagery causes modulations of EEGpower spectra. Schlogl et al. [26] showed that power spectrum changes arereflected in changes of the autoregressive (AR) coefficients. Hence, in [27]Kus et al. used power spectral features and AR coefficients to train multi-nomial logistic regression classifier. It achieved an accuracy of 74.84% onclassifying three class motor imagery EEG signals. In [28], Almonacid et al.used the same power spectral features and Neuro-fuzzy algorithm S-dFasArtwith rule pruning and voting strategy for classification purpose. It classifiesthree class motor imagery EEG signals with 80-90% accuracy.

Baali et al. in 2015 [29] reported an improved method on classifying mul-ticlass EEG signals. He employed his proposed method on the four class mo-tor imagery tasks (left hand, right hand, left foot and right foot) EEG data forDataset IIIa from BCI competition III. They applied Linear prediction sin-gular value decomposition (LP-SVD) as feature extraction method and theyextracted the LP coefficients, the prediction error variance, the Q-statisticsand the Hotelling’s T2 statistics as features from the transformed data. Thenthey applied sequential forward selection (SFS) algorithm to find the most in-formative subset of features to be used in classification. Logistic model tree

13

2.5 Dictionary learning and Sparse representation Literature Survey

with simple regression functions as base learners was used as classifier andit achieved an accuracy of about 81.38%.

In [30], Nicolas et al. used mutual information based best individual fea-ture (MIBIF) algorithm to select the most discriminative spatial-spectral fea-tures. Then Stacked Regularized Linear Discriminant Analysis (SRLDA)wasused as a classifier to classify four class motor imagery signals. The NaïveBayes classifier continues to be a popular learning algorithm for data miningapplications due to its simplicity and linear run-time. Many techniques havebeen developed to improve the Naïve Bayes algorithm. In [31], the combina-tion of Naïve Bayes Parzen window (NBPW) classifier and linear minimummean distance classifier achieved an accuracy of about 90% on classifyingfour class motor imagery signals.

2.5 Dictionary learning and Sparse representation

In recent years, a lot of work has been done using dictionaries in applying itin some state of the art technologies like image denoising [32], inpainting[33], clustering [34], classification [35] etc. We have used the same conceptto build a dictionary for motor imagery EEG signals using band power andwavelet transform energy features.

Sparse representation has recieved considerable attention in recent yearsand is one of the most representative methodology in Linear representativemodel [36, 37]. It has had huge success in wide range of applications suchas signal processing, image processing, machine learning, image denoising,computer vision, image segmentation etc. Sparse representation has its originin Compressed Sensing theory [38]. In the literature of sparse problem mod-elling [39, 40], sparse representation algorithms are divided into two groups:greedy algorithms and convex relaxation algorithms.

A greedy algorithm named Orthogonal Pursuit Matching [41] has beenapplied in our work, which is an improvement of the Matching Pursuit al-gorithm [42] to get the sparse representation. We note that there has beenrelated works in the BCI application. For example [43] uses sparse represen-tation for source separation to enhance the efficiency of feature extraction,[44] aims to enforce sparsity condition on the selection procedure of spatial

14

2.5 Dictionary learning and Sparse representation Literature Survey

filter coefficients by a sparsity-regularization term in the CSP optimizationproblem and a recent work gives a sparse representation based classificationscheme of two class MI signals [45].

15

Chapter 3

Dataset and Preprocessing

3.1 Overview

In this chapter we will give a detailed description of the two motor imagerydatasets used for our analysis. We will follow this by the preprocessing ofthese discrete signal trials before it can be used further for classification.

3.2 Datasets

The two datasets were taken fromBerlin Brain Computer Interface BCI Com-petition III which is a competition whose goal was to validate classificationand signal processing for Brain Computer Interfaces. The first is the DatasetIV a. The other is the Dataset III a. The difference between the two datasets isthat they used different number of channels and there are different number ofclasses of motor imagery signals. The description of he two datasets is givenbelow.

3.2.1 Dataset 1: BCI competition III Dataset IVa

The data set is a two class (Right hand, Foot) motor imagery data provided byFraunhofer FIRST, Intelligent Data Analysis Group (Klaus-Robert Müller,Benjamin Blankertz), and Campus Benjamin Franklin of the Charité - Uni-versityMedicine Berlin, Department of Neurology, NeurophysicsGroup (GabrielCurio).

Experimental Setup

This data set was recorded from five healthy subjects. Subjects sat in a com-fortable chair with arms resting on armrests. This data set contains only datafrom the 4 initial sessions without feedback. Visual cues indicated for 3.5 s

3.2 Datasets Dataset and Preprocessing

Figure 3.1: Single trail signal for dataset 1.

which of the following 2 motor imageries the subject should perform: (R)right hand, (F) right foot. The presentation of target cues were intermitted byperiods of random length, 1.75 to 2.25 s, in which the subject could relax. Thetime sequence of each signal trial is depicted in Fig 3.1 . There were two typesof visual stimulation: (1) where targets were indicated by letters appearing be-hind a fixation cross (which might nevertheless induce little target-correlatedeye movements), and (2) where a randomly moving object indicated targets(inducing target-uncorrelated eyemovements). From subjects al and aw 2 ses-sions of both types were recorded, while from the other subjects 3 sessionsof type (2) and 1 session of type (1) were recorded.

Format of the Data

Given are continuous signals of 118 EEG channels and markers that indicatethe time points of 280 cues for each of the 5 subjects (aa, al, av, aw, ay).For some markers no target class information is provided (value NaN) forcompetition purpose. Only cues for the classes ’right’ and ’foot’ are providedfor the competition. Data are provided in Matlab format (*.mat) containingvariables:

• cnt: the continuous EEG signals, size [time x channels]. The array isstored in datatype INT16. To convert it to uV values, use cnt= 0.1*dou-

17


ble(cnt); in Matlab.

• mrk: structure of target cue information with fields

– pos: vector of positions of the cue in the EEG signals given in unitsample, length: no. of cues

– y: vector of target classes (1, 2, or NaN), length: no. of cues– className: cell array of class names.

• info: structure providing additional information with fields

– name: name of the data set,– fs: sampling rate,– clab: cell array of channel labels,– xpos: x-position of electrodes in a 2d-projection,– ypos: y-position of electrodes in a 2d-projection.

An alternative, data is also provided in zipped ASCII format (splitted intothree files for each subject):

• *cnt.txt: the continuous EEG signals, where each row holds the valuesfor all channels at a specific time point

• *mrk.txt: target cue information, each row represents one cue where thefirst value defines the time point (given in unit sample) and the secondvalue the target class (1= right, 2=foot, or 0 for test trials).

• *info.txt: contains other information as described for the matlab format.

Technical Information

The recordingwasmade usingBrainAmp amplifiers and a 128 channel Ag/AgClelectrode cap from ECI. 118 EEG channels were measured at positions ofthe extended international 10/20-system. Signals were band-pass filtered be-tween 0.05 and 200 Hz and then digitized at 1000 Hz with 16 bit (0.1 uV)accuracy. Another version of the data that is downsampled at 100 Hz (bypicking each 10th sample) that we typically use for analysis is also provided.Each subject has three files: the data signal file, the marker file for the differ-ent event positions and an extra information file containing channel namesand locations.

18


Figure 3.2: Single trail signal for dataset 2.

3.2.2 Dataset 2: BCI competition III Dataset IIIa

This dataset is a cued motor imagery (multi-class) data with 4 classes (lefthand, right hand, foot, tongue) provided by the Laboratory of Brain-ComputerInterfaces (BCI-Lab), Graz University of Technology, (Gert Pfurtscheller,Alois Schlögl).

Experimental Setup

During the experiment the subject sat in a relaxing chair with armrests. Thetask was to perform imagery left hand, right hand, foot or tongue movementsaccording to a cue. The order of cues was random. The experiment consistsof several runs (at least 6) with 40 trials each each; after trial begin, the first2s were quite, at t=2s an acoustic stimulus indicated the beginning of the trial,and a cross “+” is displayed; then from t=3s an arrow to the left, right, up ordownwas displayed for 1 s; at the same time the subject was asked to imaginea left hand, right hand, tongue or foot movement, respectively, until the crossdisappeared at t=7s. Each of the 4 cues was displayed 10 times within eachrun in a randomized order. The timing of a single trial is shown in Fig 3.2.

Format of the Data

The data is stored in the GDF format [1] and can be loaded into Matlab orOctave with Biosig-toolbox [2] (version 0.81 or higher) using the command

19


Figure 3.3: Splitting of the signal trial into epochs.

[s,HDR] = sload(filename). The data s can contains NaN’s; these NaN’s in-dicate the breaks in between the runs or saturation of the analog-to-digitalconverter.

The beginning of each trial (t = 0s according to Fig. 3.2) can be obtainedfromHDR.TRIG; the classlabels are stored inHDR.Classlabel. HDR.Classlabelcan contain the values ‘1’,’2’,’3’,’4’, and ‘NaN’. Values ‘1’,’2’,’3’,’4’ indi-cate the labels of the training set, NaN indicates the trials of the test set.

Technical Information

The recording was made with a 64-channel EEG amplifier from Neuroscan,using the left mastoid for reference and the right mastoid as ground. The EEGwas sampledwith 250Hz, it was filtered between 1 and 50Hzwith Notchfilteron. The data of all runs was concatenated and converted into the GDF format.

Each of the two datasets have a training period during which the subject per-forms the intended motor imagery. We have generated epochs of one secondwindow from each trial as shown in an example in Fig 3.3 and split them intotraining and test set with 90% of the data as the training set and the remaining10% as test. We made 10 folds of the data, so that every epoch window of thedata can be taken as a test case in at least one occasion.

20

3.3 Preprocessing Dataset and Preprocessing

Figure 3.4: Power Spectral Density of some of the EEG channels. (Left) PSD of central electrodes(C5,C3,C1,C2,C4,C6). (Right) PSD of frontal electrodes (F5,F3,F1,F2,F4,F6)

3.3 Preprocessing

The signals obtained by the signal acquisition step are usually very noisy andcontains a a lot of artifacts including high frequency noise due to electricalinterference and physiological artifacts such as EOGs and EMGs. Apart fromthe noise, there is the issue of volume conduction[] in EEG due to distancebetween the scalp and the neurons which makes it difficult to pin point theexact location of where an activation took place. Preprocessing these signalsis a necessary step for our experiment. The following are the steps involvedin preprocessing:

3.3.1 Bandpass Filtering

A bandpass filter is an electronic device or circuit that allows signals betweentwo specific frequencies to pass, but that discriminates against signals at otherfrequencies. From our experiments and domain knowledge we know that,most brain activities related to motor imagery are within the frequency bandof (8-30)Hz. So a bandpass filter of (8-30)Hz window is used. Fig 3.4. showsthe power spectral density plots of few of the channels which depicts activityin the same frequency band. The bandpass filter filters out most of the highfrequency noise. The filter can have as many sub bands as one wants. Wehave experimented with two sub bands of 8-14Hz and 15-30Hz in the twoclass and three sub bands namely 8-13Hz, 14-20Hz and 20-30Hz in the fourclass classification problem. The choice of the sub bands is due to the factthat Aplha,Beta brain rhythms reside within those frequency bands.

21


3.3.2 Common Spatial Patterns

EEG devices are usually multi channel devices but for a particular experimentnot all signals are required. In other words, there are important and unimpor-tant channels for classification. This is done with the help of a spatial filter.The spatial filter reduces the number of channels in such a way that the re-sulting signals are easy to classify. Common Spatial Patterns is one of themost commonly used spatial filter used in building MI based BCIs.

The signals which are segmented into one second time samples are spatiallyfiltered using a CSP filter. CSP is a very commonly method used in motorimagery BCI for two classes. It maximizes the variance of one class whileminimizing it for the other class.

Let X1 and X2 be two windows of a multivariate signal belonging to twodifferent classes of motor imagery. They are both of size (n,c) where n is thenumber of sampled time points and c is the number of channels. We denotethe spatial filter by

S = WT E

where S ∈ R(d,n) is the spatially filtered signal window, W ∈ R(d,c) is thespatial filter and E ∈ R(n.c) is the input signal to the spatial filter.

The Common Spatial Pattern algorithm determines the spatial filter w suchthat the variance between the two class is maximized:

w = argmax|wX1 ||wX2|

where w is a component of the filterW The solution to the spatial filter prob-lem is given by computing the two covariance matirices:

R1 =X1XT1n

R1 =X2XT2n

Then, the generalized eigenvalue decomposition is realized. We find the ma-trix of eigenvectors P = [p1, ... , pn] and the diagonal matrixD of eigen valuessuch that

22


Figure 3.5: Colormap of the magnitutde of coefficients of CSP filters projected on the scalp.

P−1R1P = D

andP−1R2P = In

where In is an Identity matrix.

The columns of P will give us the spatial filters in decreasing order of vari-ance for one class and increasing order of variance for another class. So wwill be the leftmost column of P We take the first m and last m columnsfrom P to make the spatial filter matrixW. Thus, we will have d = 2m chan-nels after the signals are passed through the CSP filter. For our experimentwe computed the two class windows X1 and X2 by taking the average of alltraining windows of each class.

For the multi-class MI signal case we performed a one versus rest CSPwhere in a four-class scenario like ours, there will be four CSP filters eachmaximizing the variance of one class and minimizing the variance of another.The CSP filters can be plotted back on the different channels to see the acti-vations of various regions of the brain. Fig 3.5 gives us the color map of few

23


of the spatial filters on the scalp which increases the variance of one classwith respect to the other.

The spatial filter transforms the signal to a new more discriminative signalbut also causes changes to the real data. This might lead to losing valuableinformation from the EEG signals. In our method CSP filtering is not con-sidered to be a core components and the bandpass filtered signals are directlyused for the feature engineering and classification steps.

24

Chapter 4

Feature Engineering

4.1 Overview

In this chapter we will discuss the various features to be extracted from theEEG signals to reduce the dimensionality of data and use those features toclassification.

4.2 Taxonomy of Features

Each class window in our experiment is a multivariate signal window with100 samples. The high dimensionality of the data windowmakes it difficult totake the window as an input to a model for classification. The dimensionalityis reduced by extracting meaningful features from the time windows. Thetime window, after feature extraction gets converted into a feature vectorV ∈R(ch,1), where ch is the number of channels and each element of the vector isa specific feature of the corresponding channel’s signal. Some of the widelyused features in BCIs is broadly divided into the following categories:

• Time Domain Features

• Frequency Domain Features

• Wavelet Domain Features

• Autoregressive Features

• Discrete Cosine Transform Features

4.3 Feature Extraction Feature Engineering

FEATURE DESCRIPTIONHjorth Parameters Measure of signal complexity

1st Differential(Mean/Max/Min) Mean, maximum and minimum value of solution offirst differential

2nd Differential(Mean/Max/Min) Mean, maximum and minimum value of solution ofsecond differentialKurtosis A measure of flatness of the distributionSkewness A measure of asymmetry of the distribution

Coefficient of variation A statistical measure of the deviation of a variable fromits mean, standard deviation divided by mean.Mean of vertex to vertex slope Mean of vertex-to-vertex slopeMaximum amplitude Maximum amplitude of the signalRoot Mean Square(RMS) Root mean square value of the signalStandard deviation Standard deviation of the input signal

Table 4.1: Time Domain Features

4.3 Feature Extraction

4.3.1 Time Domain Features

Features in time domain have been used in medical and engineering practicesand researches. Time domain features are used in signal classification due toits easy and quick implementation. Furthermore, it does not need any trans-formation, which the features are calculated based on raw EEG time series.Non-stationary property of EEG signal, changing in statistical properties overtime has come to be disadvantage for the features in time domain, but timedomain features assume the data as a stationary signal. Moreover, much inter-ference that is acquired through the recording because of their calculations isbased on the EEG signal amplitude. However, compared to frequency domainand time-frequency domain, time domain features have been widely used be-cause of their performances of signal classification in low noise environmentsand their lower computational complexity. Some of the time domain featuresextracted are given in table 4.1.

4.3.2 Frequency Domain Features

The frequency domain features captures the information based on the fre-quency of brain rhythms during motor imagery tasks. As mentioned in Chap-ter 2 there are brain rhythms in specific frequency bands where most of theneural activities take place. To analyze the characteristics of the EEG signalsthe Power Spectral Density(PSD) can be realised as shown in Chapter 3 inFig 3.4.

26


FEATURE DESCRIPTIONFFT DELTA 0.1 - 3 HzFFT THETA 3 - 7 HzFFT ALPHA 7 - 12 HzFFT BETA 12 - 30 HzFFT GAMMA 30 - 40 HzFFT DT RATIO DELTA / THETAFFT DA RATIO DELTA / ALPHAFFT TA RATIO THETA / ALPHA

Table 4.2: Frequency Domain Features

The PSD can be calculated by the Fourier Transform of the time domainsignals into frequency domain by using non parametric methods. One of thesemethods is known asWelch’s method. For each channel we calculate the bandpower of the Fourier transformed signal and build our feature vector.

bandpowerch=i = log(1 + X2)

where X is the signal in frequency domain having a specific frequency con-tent and ch = i represents the specific channel whose band power is to befound. Some of the frequency domain features extracted are given in table4.2.

4.3.3 Wavelet Domain Features

Wavelet transform is a spectral estimation technique in which any generalfunction can be expressed as an infinite series of wavelets. The basic idea be-hind wavelet analysis consists of expressing a signal as a linear combinationof a particular set of functions, obtained by shifting and dilating one singlefunction called a mother wavelet. The decomposition of the signal leads to aset of coefficients called wavelet coefficients. In our work we have used theDiscrete Wavelet Transform(DWT) which employs two functions, namely,scaling function and wavelet function. The DWT gives rise to two details, Diand Ai which are the down sampled outputs of the high pass and low passfilters at wach decomposition level. The mother wavelet used is the coif1wavelet as it gives the most distinguishing features. The features extractedare energy and entropy of both Ds and As. Mathematically wavelet transformcan be expressed as

F(a, b) =∫ +∞−∞ f (x)g

∗(x)dx

where the * is the complex conjugate symbol and function ψ is some function.Table 4.3 shows some of the extracted features.

27


FEATURE DESCRIPTIONMIN WAV VALUE Minimum valueMAX WAV VALUE Maximum valueMEAN WAV VALUE Mean valueMEDIAN WAV VALUE Median valueENTROPY SPECTRAL WAV The spectral entropy1st DIFF WAV MEAN Mean value of first derivative1st DIFF WAV MAX Max value of first derivative2nd DIFF WAV MEAN Mean value of second derivative2nd DIFF WAV MAX Max value of second derivativeENERGY PERCENT WAV Percentage of total energyWAV TOTAL ENERGY Total energyWAV COEFF OF VARAITION Coefficient of variation

Table 4.3: Wavelet domain features

4.3.4 Auto-regressive Features

Auto-Regressive(AR)methodmodels the signal at any given time as aweightedsum of signals at previous time and some noise. Burg’s algorithm was used tofind the auto-regressive coefficients. The coefficients were used as featuresfor the analysis. Mathematically it can be formulated as:

X(t) = a1X(t1) + a2X(t2) + ..... + apX(tp) + Et

where, X(t) is the measured signal at time t, Et is the noise term, a1 to ap arethe auto-regressive parameters and p is the model order. The coefficients ofthe autoregrssive model gives us the features from this paradigm

4.3.5 Discrete Cosine Transform

A discrete cosine transform (DCT) expresses a finite sequence of data pointsin terms of a sum of cosine functions oscillating at different frequencies. TheDCT is similar to the discrete Fourier transform: it transforms a signal fromthe spatial domain to the frequency domain. The general formulation for a1D (N data items) DCT is defined by the following equation:

F(u) = (2

N)12

N−1∑i=1

A(i)cos[π.u2N

(2i + 1)] f (i)

The DCT coefficients are then used to find the power of the DCT signalto build our feature vector. The DCT is better than the Fourier transform asDCT can approximate lines better with fewer coefficients.

The different features are not all necessary for the classification task. Tooptimize the features selected we can perform feature selection algorithms

28


such as JointMutual Information [46],MinimumRedundancy andmaximumRelevance [47] to get a subset of the features extracted. In our work wehave not applied any feature selection algorithm but rather taken two featuresnamely: FFT band power at Alpha and Beta rhythms, wavelet energy using’coif1’ wavelet. These two features are plotted on the scalp to give it a spatialrepresentation. In Fig 4.1 (a) the bandpower features within 8-20 Hz is plottedon the scalp for two different classes, (b) Wavelet energy for the sampledoutput for the high pass filter is shown in the plot (c) Wavelet energy for thesampled outputs for the loss pass filter in wavelet transform. The plots showthat the two classes are differentiable even visually.

The plots can be obtained similarly for multi- class case. The plot givesus one instance of the features extracted from each channel. These featureswill help us build the dictionary based on which the classification will bedone. this ends our feature extraction step. We can of course try out differentcombination of features or new features to build the dictionary but for nowwe will keep our discussion to the aforementioned two extracted features asit encompasses most of the information related to Motor Imagery signals.

This concludes our feature engineering step and we are left with featurevectors of all the epochs in our training data. This has reduced the dimensionof the signal to the number of channels which is the length of the featurevector. In our work we have used sub band powers for our features. In sucha case, the feature vector size increases by a multiple of the number of subbands. In the next chapter we will discuss the proposed methodology of thedictionary based method.

29


Figure 4.1: Scalp plot of bandpower and wavelet energy features for two class

30

Chapter 5

Dictionary based Sparse Classification

5.1 Overview

This chapter will discuss the proposed framework and methodology for MIbased BCI system. The various steps involved will be discussed in details inthe following sections.

5.2 Proposed BCI framework

The framework design is an important element in BCIs. Fig 5.1 shows theproposed framework of our BCI system.

The raw stream of EEG signals is segmented by a one second window andbandpass filtered into one or more sub bands depending on the frequencyband of the brain rhythmic activities. The bandpass filtered signal is thenspatially filtered by using a Common Spatial Pattern (CSP). The CSP stepis optional as we have obtained satisfactory results even without the spatialfilter. Since these signals are of multi channel and each channel is of high

Figure 5.1: Framework of proposed BCI

5.3 Design of dictionary matrix Dictionary based Sparse Classification

dimensional time series, we extract features from each channel to reduce thedimensionality of the data.

A precomputed dictionary of the feature vectors from the training set of thedata is used to find the sparse representation of the input signal. The dictio-nary must be properly designed to get good classification accuracy. At last,we perform a simple sparse based classification to get the desired output. Thefollowing section will describe each of the steps involved in the BCI frame-work in details.

5.3 Design of dictionary matrix

As we have already discussed the preprocessing and feature extraction step,the design of the dictionary is the next important element in the sparse basedclassification. Our dictionarywill contain feature vectors of each of the epochsand the epoch of the same class are clubbed together in the dictionary. Thesefeature vectors are called atoms of the dictionary. The dictionary generatedfor two-class MI and four-class MI is discussed in detail in the followingsubsections.

5.3.1 Dictionary design for two-class MI

For the two-class MI signal, the dictionary matrix D is created, that is:

D: = [DL:DR]

where Di=[di,1,di,2,di,3,....,di,N] where i = R for right hand class and i = Lfor left hand say, for our example,N is the number of training signals for eachclass and each column vector di, j ∈ Rm,1, j = 1, 2, ...., N having dimensionm containing m features.The same procedure is followed for left and righthand MI signals to construct the dictionary D. Fig 5.2 gives an example ofthe construction of a two class dictionary. The dictionary can be made withone type of feature or a combination of different features.

5.3.2 Dictionary design for four-class MI

For the four class MI data, the design of the dictionary will change, andwe propose a three dimensional dictionary as shown in Fig 5.3. The finaldictionary:

D = [DL:DR:DF:DT ]

32

5.4 Linear model Dictionary based Sparse Classification

Figure 5.2: Two class dictionary with each column as a feature vector of an epoch.

is a concatenation of four three dimensional matrix Di ∈ R l,b,m where i =(L, R, F,T), l, b ∈ R such that l × b ≤ N and m is the size of the feature vec-tor; similar to the two class case. So Di(x,y) = [fx,y,1,fx,y,2,......,fx,y,m] wherex ∈ (1, l), y ∈ (1, b) are the different feature vectors which will be used forclassification.

The purpose of this structure is to store the dictionary it in a distributeddatabase system for parallel processing and it helps us to perform various ac-tions on the existing dictionary for better performance. It provides scalabilityand helps maintain structure of the high dimensional data. In our analysiswe have use a two dimensional version of our 3D dictionary for our analysiswhich is similar to the two class case in Fig 5.2 but with more dividing classesto make our analysis simpler.

5.4 Linear model

After the construction of the dictionary, we have our linear system of equa-tions to get the sparse representation of the input signals. The test signal isfirst converted to a feature vector y ∈ Rm,1,in the same way as the columnsin the dictionary D. So the input vector can be represented as a linear combi-

33

5.5 Sparse Approximation Dictionary based Sparse Classification

Figure 5.3: Dictionary design for four class motor imagery data. (Left) Four three dimensional matrix one foreach class of features. (Right) Grid view of each matrix where l and b are real numbers such that l × b = N (noof training signals for each class). The third axis contains the features vectors of each of those N training data.

nation of the columns of D, the columns of D can be called as its atoms:

y =∑

i

xi,1di,1 + xi,2di,2 + ... + xi,Ndi,N

where xi, j ∈ R, j = 1, 2, ..., N are scalar coefficients and i = (L,R) for the twoclass and i = (L,R,T,F) for the five class. In matrix form it can be representedas:

y = Dx

where x = [xi,1, xi,2, ..., xi,N ]

The task is to estimate the scalar coefficients so that we can sparsely repre-sent the test signals as a linear combination of some columns of dictionary Dto successfully classify the signal.

5.5 Sparse Approximation

The sparse representation of an input signal y can be obtained by performingL0 norm minimization as follows:

34

5.5 Sparse Approximation Dictionary based Sparse Classification

min ∥x∥0 subject to y = Dx

L0 norm optimization gives us the sparse representation but the algorithmin an NP-hard problem and therefore a good alternative is the L1 norm whichhas been can also be used to obtain sparsity. Recent development tells us thatthe representation obtained by L1 norm optimization is also content with thecondition of sparsity and is comparable to the L0 norm. Furthermore, the L1norm can be solved in polynomial time. The L1 norm is shown geometricallyin Fig . Thus the optimization problem becomes:

min ∥x∥1 subject to y = Dx

One can choose from a variety of L1 norm algorithms available to obtainsparsity.

The sparse approximation algorithms can be classified into four classesnamely : the greedy strategy approximation, constrained optimization strat-egy, proximity algorithm based optimization strategy, and homotopy algo-rithm based sparse representation.

In our work we have utilized a greedy based approximation technique tosolve the minimization problem. The greedy technique does not solve theminimization problem but gives an approximate solution by finding the bestoptimal solution at each iteration. The greedy always gives a global optimalapproximate.

The Orthogonal Matching pursuit (OMP) is a greedy algorithm for sparserepresentation and is one of the oldest greedy algorithms. It employs theconcept of orthogonalization to get orthogonal projections at each iterationand is known to converge in few iterations. For OMP to work in the de-sired way, all the feature vectors in dictionary D should be normalized suchthat ∥Di( j)∥ = 1,where i = (L, F) and j = 1, 2, ..., N for two classes and∥Di( j, k)∥ = 1 where i = (L, R, F,T), j = 1, 2, ..., l and k = 1, 2, ..., b forfour classes. The steps followed in OMP algorithm is given below in Algo-rithm 1:

We get the sparse representation x of the feature vector y which will beused for further classifying two-class and four-class MI signals.

35

5.6 Sparsity based classification Dictionary based Sparse Classification

Algorithm 1 Orthogonal Matching pursuit (OMP)Tasks: Approximate the problem x̂ = argmax(∥x∥0) s.t y = DxInput: Input feature vector y, Dictionary matrix D and the scalar coefficients vector xInitialization : t = 1 is the counter, residual r0 = y, RD0 = ϕ , index set Λ0 = ϕ , ρ is a small constant and ϕdenotes the empty set.while ∥rt ∥ ≥ ρ do

Step 1 : Find the most matched atom from the dictionary D by finding the largest inner product betweenrt−1 and dj s.t j < Λ

Step 2 :Update the index set Λt = Λt−1 ∪ λtwhereλt = argmaxj

5.6 Sparsity based classification Dictionary based Sparse Classification

where max() is a function that returns the maximum value of a vector,Variance() to find the variance of data and nonzero() to count the num-ber of non-zero elements in a vector.Classi f ier1,Classi f ier2,Classi f ier3and Classi f ier4 are the different classifiers. In the next chapter we will re-view the results of the classifiers and the performance of the dictionary basedmodel.

37

Chapter 6

Results and Conclusion

6.1 Overview

In this chapter we will verify the performance of the model based on variousresults and conclude with some future prospects of the work done.

6.2 Results

The data sets used to obtained the results are those mentioned in Chapter 3.The performance of the model in our case is the prediction performance of theclassifier. A 10-fold cross validation was performed on the dataset to split theentire data into 10 folds, from which 9 folds were used to build the dictionaryand 1 fold for testing the model. Each fold was used for testing iteratively andthe accuracies were calculated. Two different dictionaries were built: onewithsub-band power features and the other with energies of a wavelet transform.We will denote the first dictionary as Dsbp and the later as Dwe from nowonwards.

Accuracy of a model based on training and testing test, is not a good enoughmetric by itself to calculate the performance of the classifier. So here we havecalculated other metrics such as confusionmatrix, precision and recall and thereceiver operating characteristics (ROC) curve.

6.2.1 Classification of two-class MI signals

For the two-class MI signals, we had right hand and right foot MI signals thatneeded to be classified. To illustrate how sparsity plays an important role inour classification, Fig 6.1 shows the sparse representation of two sample sig-nals belonging to the two different class. Here there are around 1400 atoms inthe dictionary and so the 1st 800 elements corresponds to the first class and

6.2 Results Results and Conclusion

Figure 6.1: Sparse representation of Right hand (left) and right Foot (right).

i = 1 i = 2 i = 3 i = 4 i = 5 i = 6 i = 7 i = 8 i = 9 i = 10 AverageClassifier1 94.64 94.64 96.42 90.47 94.64 91.66 92.85 92.85 93.45 92.26 93.38Classifier2 96.42 95.23 95.23 91.07 94.04 91.07 92.26 90.47 95.23 90.47 93.6Classifier3 94.64 94.64 96.42 90.47 94.64 91.66 92.85 92.85 93.45 92.26 93.38Classifier4 94.64 95.83 92.26 91.07 93.45 89.88 93.45 91.07 95.64 89.28 92.65

Table 6.1: Acuuracies of the classifiers using 10 fold cross validation scheme and bandpower dictionary.

the rest for the second class. We can clearly see that the sparse representationis classifying the input with accuracy.

Table 6.1 and 6.2 shows the accuracies of each of the four classifiers in the10 fold cross validation using the different dictionaries Dsbp and Dwe respec-tively.

The confusion matrices: normalized and non-normalized, of each of theclassifiers using dictionaryDwe is given from Fig 6.2-6.5. Similarly the band-power dictionarty Dsbp is used for classification and the confusion matricesare given from Fig 6.6-6.9.

Precision and recall are two metric but are commonly mentioned together.Precision gives us information about the percentage of samples predicted tobe true that were actually true. Whereas, recall tell us about the percentage of


Table 6.2: Acuuracies of the classifiers using 10 fold cross validation scheme and wavelet energy dictionary.

39


Figure 6.2: Confusion matrix of classifier 1 using wavelet energy features dictionary


40




41


Figure 6.6: Confusion matrix of classifier 1 using bandpower features dictionary


42




43


Figure 6.10: Precision recall curve for for the two-class data using wavelet feature dictionary

Figure 6.11: Precision recall curve for for the two-class data using bandpower feature dictionary

44


Figure 6.12: Average ROC curve for wavelet energy dictionary

Figure 6.13: Average ROC curve for bandpass dictionary

45


true samples that were actually predicted true.

Precision =TruePositives

TruePositives + FalsePositives

Recall =TruePositives

TruePositives + FalseNegativesprecision recall curve shows the relationship between the two metrics. The

area under the curve is a good measure of the accuracy of the model. Theaverage precision recall curve for the two dictionaries are shown in Fig 6.10and Fig 6.11.

A receiver operating characteristic (ROC), or ROC curve, is a graphical plotthat illustrates the performance of a binary classifier system as its predictionthreshold is varied. The ROC curve plots true positive rate on the y-axis andfalse positive rate on the x-axis. So the top left point in the plot is the idealpoint. The area under the ROC curve gives us a performance parameter anda bigger area is usually better. Fig 6.12 and 6.13 shows the two ROC curvesfor the two different dictionaries used for classification.

All the metrics for the two class scenario gives us a good response aboutthe performance of the model. The wavelet energy feature based dictionaryDwe outperforms the sub-band-power dictionary Dsbp in the two class classi-fication.

6.2.2 Classification of four-class MI signals

Now we present the results of the four-class MI signal classification usingonly dictionaryDsbp as dictionaryDwe was giving unsatisfactory results. Thebandpower features for the four class data were calculated from three fre-quency sub-bands (8-13)Hz, (14-20)Hz and (20-30)Hz .Like in the two classcase the various performance metrics were found and a 10 fold cross valida-tion was performed to get the accuracy. Fig 6.14 illustrates the sparse repre-sentation of the four class input signal and as the two class case if we dividethe sparse signals into four equal parts, we can clearly see the different classescorresponding to different sparse signals.

46


Figure 6.14: Sparse representation of (top-left) Left class, (top-right) Right class, (bottom-left) Foot class and(bottom-right) Tongue class signals

47

6.3 Conclusion Results and Conclusion


Table 6.3: Accuracies of the classifiers using 10 fold cross validation and bandpower dictionary

Table 6.3 gives us the accuracies of the classifier. The confusion matricesfor the 4 classifiers are shown from Fig 6.15-6.18.Here the Precision recallcurve and ROC curve is given in one vs all manner as the two curves are orig-inally for binary classification. The Precision recall and ROC curve is givenin Fig 6.19 and Fig 6.20 respectively.

The performance metrics of our model for both the two-class and four-classscenario illustrates very positively about its performance.

6.3 Conclusion

The aim of the thesis was to be analyze existing methods for BCI to build bet-ter BCI. In this study we looked into a dictionary based learning technique toclassify motor imagery signals. The sparse based classification was a simpleyet an effective way of classifying the different signals. The design of thedictionary was an important issue for the performance of the classifier. Themodel used was applied to two data sets with satisfactory results on all fronts.

There is still scope of improvement in speed and robustness of the model aswell as in the scalability of the brain signal processing.The dictionary struc-ture can be stored in a distributed file system for faster access and parallelprocessing. The classification rate will only increase as the dictionary sizeincreases given that the newly added atoms of the dictionary are not Falsepositives. There can be faster implementations with the help of parallelismboth in the programming as well as the computer architecture level. The de-sign of the dictionary is keeping in view of the database where a huge dic-tionary of this sort have to be stored for easy and fast access to the data. Theband power features and wavelet energy features were calculated by usingtransformation. To make it even faster for real time applications, the Fouriertransform and wavelet transform which are the two main transformations forour feature extraction module,can be implemented in hardware using a field

48




49




50


Figure 6.19: The Precision recall curve of all the classes with bandpower dictionary

Figure 6.20: The ROC curves of all classes with bandpower dictionary

51


programmable gate array as described in [48, 49]. To give a rough estimate,I ran my instructions on a general purpose processor and it took 850 millisec-onds for every classification input. This can be made much faster with hard-ware implementations. From the dictionary point of view there can be otherfeatures or a combination of features to build different dictionaries. Moreclasses can be added to make the dictionary more powerful to develop ap-plications with higher degrees of freedom. The same model structure can beused to classify other brain signals related to emotions and cognitive loadamong others to build the future BCI devices. In this work, I contribute a lit-tle to the beginning of what could be the future of our computing world withbetter understanding of our human brain.

52

Bibliography

[1] K. L. Crow, “Four types of disabilities: Their impact on online learning,”TechTrends, vol. 52, no. 1, pp. 51–55, 2008. 2

[2] J. Wolpaw and E. W. Wolpaw, Brain-computer interfaces: principlesand practice. Oxford University Press, USA, 2012. 3

[3] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, “Optimal spatialfiltering of single trial eeg during imagined hand movement,” IEEEtransactions on rehabilitation engineering, vol. 8, no. 4, pp. 441–446,2000. 5, 11

[4] S. Lemm, B. Blankertz, G. Curio, and K.-R. Muller, “Spatio-spectralfilters for improving the classification of single trial eeg,” IEEE trans-actions on biomedical engineering, vol. 52, no. 9, pp. 1541–1548, 2005.5, 11

[5] R. Tomioka, G. Dornhege, G. Nolte, B. Blankertz, K. Aihara, and K.-R.Müller, “Spectrally weighted common spatial pattern algorithm for sin-gle trial eeg classification,” Department of Mathematical Informatics,University of Tokyo, Tokyo, Japan, Tech. Rep, vol. 40, 2006. 5, 11

[6] W. Wu, X. Gao, B. Hong, and S. Gao, “Classifying single-trial eeg dur-ing motor imagery by iterative spatio-spectral patterns learning (isspl),”IEEE Transactions on Biomedical Engineering, vol. 55, no. 6, pp. 1733–1743, 2008. 5, 11

[7] H. Higashi and T. Tanaka, “Simultaneous design of fir filter banksand spatial patterns for eeg signal classification,” IEEE transactions onbiomedical engineering, vol. 60, no. 4, pp. 1100–1110, 2013. 5, 11

[8] C. Park, C. C. Took, and D. P. Mandic, “Augmented complex commonspatial patterns for classification of noncircular eeg frommotor imagerytasks,” IEEE Transactions on Neural Systems and Rehabilitation Engi-neering, vol. 22, no. 1, pp. 1–10, 2014. 5, 11

BIBLIOGRAPHY BIBLIOGRAPHY

[9] D. Hanbay, “An expert system based on least square support vector ma-chines for diagnosis of the valvular heart disease,” Expert Systems withApplications, vol. 36, no. 3, pp. 4232–4238, 2009. 6, 12

[10] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma, “Robust facerecognition via sparse representation,” IEEE transactions on patternanalysis and machine intelligence, vol. 31, no. 2, pp. 210–227, 2009.6

[11] J. F. Gemmeke, T. Virtanen, and A. Hurmalainen, “Exemplar-basedsparse representations for noise robust automatic speech recogni-tion,” IEEE Transactions on Audio, Speech, and Language Processing,vol. 19, no. 7, pp. 2067–2080, 2011. 6

[12] D. L. Donoho, Y. Tsaig, I. Drori, and J.-L. Starck, “Sparse solution ofunderdetermined systems of linear equations by stagewise orthogonalmatching pursuit,” IEEE Transactions on Information Theory, vol. 58,no. 2, pp. 1094–1121, 2012. 6

[13] E. J. Candès and M. B. Wakin, “An introduction to compressive sam-pling,” IEEE signal processing magazine, vol. 25, no. 2, pp. 21–30,2008. 6

[14] e. a. Ramadan, Rabie A., “Basics of brain computer interface.,” SpringerInternational Publishing, pp. pp. 31–60. 9

[15] e. a. Levine, Simon P., “A direct brain interface based on event-relatedpotentials,” IEEE Transactions on Rehabilitation Engineering, vol. 8,no. 2. 9

[16] e. a. Neuper, Christa, “Motor imagery and action observation: modu-lation of sensorimotor brain rhythms during mental control of a brain–computer interface.,” Clinical neurophysiology, vol. 120, no. 2. 9

[17] S. R. Westdorp AF Wijesinghe RS Tucker DM Silberstein RB CaduschPJ.R Nunez PL, “Eeg coherency. i: Statistics, reference elec- trode, vol-ume conduction, laplacians, cortical imaging, and interpretation at mul-tiple scales,” Electroencephalogr Clin Neurophysiol, vol. 103, no. 23,pp. 499–515, 1997. 11

[18] M. L. M. D. S. V. . W. J. R. McFarland, D. J., “Spatial filter selection foreeg-based communication,” Electroencephalography and clinical Neu-rophysiology, vol. 103, no. 3, pp. 386–394, 1997. 11

54


[19] M.-G. J. . P. G. Ramoser, H., “Optimal spatial filtering of single trial eegduring imagined hand movement,” IEEE transactions on rehabilitationengineering., vol. 8, no. 4, pp. 441–446, 2000. 11

[20] S. Siuly and Y. Li., “Improving the separability of motor imagery eegsignals using a cross correlation-based least square support vector ma-chine for brain–computer interface,” IEEE Transactions on Neural Sys-tems and Rehabilitation Engineering., vol. 20, no. 4, pp. 526–538, 2012.11

[21] W.-K. Tam, K.-y. Tong, F. Meng, and S. Gao, “A minimal set of elec-trodes for motor imagery BCI to control an assistive device in chronicstroke subjects: a multi-session study,” IEEE Transactions on NeuralSystems and Rehabilitation Engineering, vol. 19, no. 6, pp. 617–627,2011. 12

[22] D.-A. G. A. . H. B. Yuan, H., “Cortical imaging of event-related (de)synchronization during online control of brain-computer interface usingminimum-norm estimates in frequency domain,” IEEE Transactions onNeural Systems and Rehabilitation Engineering, vol. 16, no. 5, pp. 425–431, 2008. 12

[23] L.-D. u. R. N. A. A. . M. D. P. Park, C., “Classification of motor imagerybci using multivariate empirical mode decomposition,” IEEE Transac-tions on neural systems and rehabilitation engineering, vol. 21, no. 1,pp. 10–22, 2013. 12

[24] S. Siuly and Y. Li, “Improving the separability of motor imagery EEGsignals using a cross correlation-based least square support vector ma-chine for brain–computer interface,” IEEE Transactions on Neural Sys-tems and Rehabilitation Engineering, vol. 20, no. 4, pp. 526–538, 2012.12

[25] Y. Li, P. P. Wen, et al., “Modified CC-LR algorithm with three diversefeature sets for motor imagery tasks classification in eeg based brain–computer interface,” Computer methods and programs in biomedicine,vol. 113, no. 3, pp. 767–780, 2014. 13

[26] A. Schlögl, K. Lugger, and G. Pfurtscheller, “Adaptive autoregressiveparameters for a brain-computer-interface experiment,” in Engineering

55


in Medicine and Biology Society, vol. 4, pp. 1533–1535, Citeseer, 1997.13

[27] V.-D. Z. J. M. T. G. A. . D. P. Kus, R., “Asynchronous bci based onmotor imagery with automated calibration and neurofeedback training,”IE

Classificationofmulticlassmotor imageryEEGsignalsusingsparsity...

Documents

Transcript of Classificationofmulticlassmotor imageryEEGsignalsusingsparsity...