
Towards 24-7 Brain Mapping Technology

Brian Nielsen

Master’s thesis, March 2009

Towards 24-7 Brain Mapping Technology

This master’s thesis is written by: Brian Nielsen, s974663 Supervisor: Professor Lars Kai Hansen

DTU Informatics Intelligent Signal Processing Technical University of Denmark Richard Petersens Plads Building 321 2800 Kgs. Lyngby Denmark www.imm.dtu.dk Phone: (+45) 45 25 33 51 Email: [email protected]

Date of publication:

18 March 2009

Edition:

1st edition

This master’s thesis documents the final project required for the degree of Master of Science in Engineering. The report represents 30 ECTS points.

© Brian Nielsen, 2009

The front page illustration shows a computer simulation of pyramidal cells in the neocortex (Markram, 2009).

Towards 24-7 brain mapping technology DTU, March 2009


Abstract

The use of closely spaced, subcutaneously implanted electrodes for EEG recording is examined. A comparison between conventional electrodes and subcutaneous electrodes is made; only a limited amount of data is available. Several methods are employed for the comparison: frequency spectra, ERPs, the amount of artifacts, and ICA decomposition. The analysis shows that the data recorded with the two recording methods are almost identical, although some differences are found. The differences found do not give a clear picture of whether the subcutaneous electrodes provide better or worse data than the conventional electrodes.

A classification of two different data sets is carried out in order to investigate the use of a limited number of electrodes. 1) Classification of a data set containing visual evoked potential (VEP) trials is performed with three different classification methods: Fisher’s linear discriminant (FLD), linear support vector machines (SVM), and Gaussian SVM. A good classification from the supplied data is not possible. 2) Classification of a data set containing tasks based on motor imagery is performed, again using FLD, linear SVM, and Gaussian SVM as classifiers. Feature extraction is based on event related potentials (ERP) and event related spectral perturbation (ERSP). Using only four electrodes, a classification accuracy of 94% is obtained. The results of the second classification show that a successful classification is possible with only a few electrodes.


Preface

This master’s thesis was carried out from 25 August 2008 to 18 March 2009 at the Intelligent Signal Processing group at DTU Informatics, Technical University of Denmark. The work was supervised by Professor Lars Kai Hansen, whom I would like to thank for the inspiration and guidance he provided. I would also like to thank the people at Hypo-Safe A/S, who also offered inspiration and willingly provided part of the data used in this project.

Kgs. Lyngby, 18 March 2009

Brian Nielsen, s974663


Contents

1 Introduction
1.1 Problem description
1.2 Guide to this report
2 The brain and EEG
2.1 Structure of the brain
2.2 Dipoles
2.3 Brain current sources
2.4 From current sources to scalp potential
2.5 Measuring the EEG
2.6 Characteristics of the EEG
2.6.1 EEG rhythms/spectral information
2.6.2 Mu rhythm
2.6.3 ERP and EP
2.6.4 Readiness potential
2.6.5 ERD, ERS, and ERSP
3 Methods and tools
3.1 Classification
3.1.1 Fisher’s linear discriminant (FLD) classification
3.1.2 Support vector machine (SVM) classification
3.1.3 Cross-validation
3.2 EEGLAB
3.2.1 ERP images
3.2.2 Event related spectral perturbation (ERSP)
3.2.3 ICA
4 Data analysis
4.1 Comparison of subcutaneous electrodes with conventional electrodes
4.1.1 Description of dataset
4.1.2 Preprocessing
4.1.3 Artifacts
4.1.4 Frequency spectrum
4.1.5 Event related potentials (ERP)
4.1.6 ICA components
4.1.7 Conclusion on electrode comparison
4.2 Classification of visual stimuli
4.2.1 Preprocessing
4.2.2 Features
4.2.3 Classification
4.2.4 Conclusion on classification of visual stimuli
4.3 Classification of motor imagery tasks
4.3.1 Description of dataset
4.3.2 Preprocessing
4.3.3 Selection of electrodes
4.3.4 Extraction of features
4.3.5 Classification
4.3.6 Conclusion on motor imagery classification
5 Discussion
5.1 Validity of data from subcutaneous electrodes
5.2 Limits in classification performance using few electrodes
5.3 Improvements
5.4 Perspective
6 Conclusion
7 References


Abbreviations

BCI Brain computer interface

EEG Electroencephalography

EP Evoked potential

EPSP Excitatory postsynaptic potential

ERD Event related desynchronization

ERP Event related potential

ERS Event related synchronization

ERSP Event related spectral perturbation

FLD Fisher’s linear discriminant

fMRI Functional magnetic resonance imaging

IPSP Inhibitory postsynaptic potential

LOO Leave-one-out (cross-validation)

MEG Magnetoencephalography

MI Motor imagery

PET Positron emission tomography

SPECT Single photon emission computed tomography

SVM Support vector machines

VEP Visual evoked potential


1 Introduction

The brain is a fascinating organ. For a long time, though, it has been a closed black box to us: we could see the inputs and outputs but had no understanding of what went on inside. Different strategies have been employed in trying to unravel the mysteries inside the black box. In the bottom-up approach, one starts from an understanding of how the individual neurons behave and connect with each other; on this basis, the goal is to discover more and more complex structures until a complete understanding of the brain is achieved. In contrast, the top-down approach looks at the behavior of the individual under different circumstances and uses this information to draw conclusions about the underlying mechanisms.

In recent years a revolution in neuroscience has taken place. New tools for imaging the brain, such as fMRI, EEG, MEG, PET, and SPECT, have provided us with a way to investigate the human brain while the subject performs a variety of tasks. Together with progress in mathematical models and raw computing power, this has led to the new research field called systems neuroscience, which tries to bridge the gap between the top-down and bottom-up approaches.

One of the tools receiving much attention right now is the electroencephalogram (EEG). Since the start of EEG studies on humans at the beginning of the 20th century, its use as a clinical tool has been explored. The epileptiform spikes seen in epileptics were demonstrated in 1934 by Fisher and Lowenback. Today the EEG is used diagnostically in clinical settings for epilepsy as well as many other conditions, such as sleep disorders, strokes, infectious diseases, brain tumors, mental retardation, severe head injury, drug overdose, and brain death (Nunez, 2005).

Even today the EEG is often examined manually by experts who are able to make a diagnosis based on the raw EEG measurements, but there is a trend towards automated and objective methods of extracting information from an EEG measurement.

Many other applications of the EEG are being investigated today. Areas of research include psychiatric disorders, depression, and metabolic disorders. Another area under investigation is the use of EEG as a brain-computer interface (BCI). A BCI system enables the user to send commands to an electronic device by means of brain activity alone, which would be useful to many people with motor difficulties.

Thus EEG has great potential as a diagnostic tool as well as an aid to people living with chronic diseases or disabilities.


The recording of EEG is not without problems, though. Setting up a recording session is rather slow, especially with many electrodes, and the results are influenced by noise from the surroundings. When using EEG as a diagnostic tool for conditions such as drug overdose, brain tumors, infectious diseases, or head injuries, one can live with the difficulties of carrying out an EEG recording; it just takes a bit more time to acquire it.

But for other applications the limitations of a conventional EEG recording inhibit the potential of the specific application. Among the applications limited by these recording difficulties are disorders that affect the daily life of a patient, such as sleep disorders, metabolic disorders, and epilepsy. Within these areas, long-term recordings of high quality are required. The conventional EEG recording apparatus has many problems associated with long-term recordings: many cumbersome wires, electrodes with poor skin contact, loosening of electrode contact when the subject moves or sweats, equipment that is sensitive to electric noise in the surrounding area, etc. Because of these issues it is problematic to obtain EEG recordings over longer periods of time, and impossible to do so without restricting the daily activities of the subject. Today, when recording over longer periods of time, patients are often admitted to a hospital, where the recording can be made under relatively controlled conditions.

If these problems could be solved, EEG and BCIs could potentially aid the daily life of patients who must live with diseases or disabilities. In general, three major obstacles restrict the use of EEG in daily life: 1) the lack of a convenient recording method (a small device, easy to use, allowing freedom of movement); 2) recordings that are not “drowned” in noise from the environment or from the movements of the patient; 3) efficient analysis of the signal.

The analysis of the massive amounts of data obtainable from long-term EEG recordings can require substantial computing power, and in the case of a BCI the computer probably needs to be mobile. Efficient algorithms and increasing computing power in less space have solved, or will solve, this problem. As an example of algorithms becoming more efficient, a consumer-oriented BCI device will soon be on the market. This device detects EEG signals from 14 electrodes for real-time control of games, instant messaging, and music listening on a personal computer (Emotiv Systems, 2009). The device also demonstrates how cheaply an EEG device can be constructed: it is expected to retail at US$299.


In order to control a BCI, or to discover whether a patient has a disease, differences in brain activity patterns need to be identified. Most methods rely on classification algorithms, i.e. algorithms that automatically group recorded signals into different classes (e.g. having epilepsy or not having epilepsy). Creating a classification algorithm that works for long-term recordings is a challenge, due to the very different patterns the EEG exhibits during the daily life of a subject: when sleeping, exercising, relaxing, speaking, etc. Robust mathematical models describing the state of the brain and the variations in the EEG under many different conditions are probably needed.
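As an illustration of the kind of classifier discussed later in this report, the following is a minimal sketch of Fisher’s linear discriminant on synthetic two-class feature data. It is written in Python/NumPy rather than the Matlab used in this work, and the class means, spreads, and two-dimensional feature space are invented purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D features for two classes (e.g. "task" vs. "rest");
# the means and spread are invented for illustration only.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[3.0, 3.0], scale=1.0, size=(100, 2))

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)

# Within-class scatter matrix: sum of the per-class scatter matrices
S_w = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)

# Fisher direction: w proportional to S_w^{-1} (mu1 - mu0)
w = np.linalg.solve(S_w, mu1 - mu0)

# Classify by projecting onto w and thresholding at the midpoint of the means
threshold = w @ (mu0 + mu1) / 2.0
pred0 = X0 @ w > threshold   # "class 1" flags for class-0 samples
pred1 = X1 @ w > threshold   # "class 1" flags for class-1 samples

accuracy = (np.sum(~pred0) + np.sum(pred1)) / 200.0
print(f"training accuracy: {accuracy:.2f}")
```

Note that this reports training accuracy on well-separated toy data; a real evaluation, as in the analyses later in this report, would use cross-validation.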

Even if robust models are developed, the problem of noisy data remains. To alleviate the problems of poor skin contact and poor signal-to-noise ratio, subdural electrodes implanted under the skull have been tested for use in BCI systems. But the risks associated with open brain surgery make this worthwhile only under the most severe circumstances.

A way to obtain some of the advantages of sub-cranial electrodes without the risk would be to use a device under the skin (subcutaneous) for recording EEG. Since the device stays outside the skull, the risks are small. Such a device would make it possible to obtain long-term recordings of people who are not restricted in their daily life during the recording.

A device based on subcutaneously implanted electrodes and a recording unit is presently being developed by a company called Hypo-Safe (Hypo-Safe, 2009). The subcutaneous implant features a 5 cm long electrode with four contact points attached to a coin-sized device. The purpose of the device is to use the brain as a sensor for detecting hypoglycemia in diabetes patients, and the company’s vision is to use it for other diseases as well. This device would be a unique tool because it would allow 24-7 brain mapping under near-normal life conditions. No other tool used in neuroscience today offers this possibility.

1.1 Problem description

The use of subcutaneously implanted electrodes in EEG is novel. Consequently, there is a need to examine the data obtained from subcutaneous electrodes. This can be done by comparing them with data obtained from conventional surface electrodes. Once we know the relationship between the data acquired using these two methods, it will be possible to take advantage of all the knowledge concerning conventional EEG we have today.

One drawback of a subcutaneously implanted EEG recording device, such as the one Hypo-Safe is developing, will be the small number of electrodes and the low spatial


distribution of these. This will limit the accuracy of the classification algorithms used, to a certain degree, compared to conventional EEG with many channels and high spatial distribution.

1.2 Guide to this report

The structure of this report is presented here.

The next section gives the reader a basic understanding of the brain and the electric potential we can measure on the scalp. The section starts with a short description of the microscopic and macroscopic structures of the brain, followed by an explanation of the EEG that begins with the individual current sources in the brain and ends with the measured EEG.

Section 3 presents the algorithms used later in the report. EEGLAB (a Matlab toolbox) is also introduced.

After these introductory sections follow three data analysis sections. In the first, a comparison between subcutaneous electrodes and ordinary surface electrodes is made. In the second, a classification of data recorded with subcutaneous electrodes is performed. The third uses a data set recorded with conventional EEG with many electrodes; with this data set the effect of reducing the number of electrodes is examined. A conclusion for each analysis is given in the respective section.

Afterwards, the results from all the analysis sections are discussed. The analysis sections and the discussion lead to a conclusion which will hopefully answer the problem description.


2 The brain and EEG

When studying the EEG, a proper understanding of the underlying mechanisms that generate it is helpful. With knowledge about what is measured (and what is not), more informed decisions and stronger conclusions can be made.

This section starts with a brief introduction to the structure of the brain. Afterwards, an explanation of what causes the potential measurable at the scalp by EEG is presented, including the mathematical framework describing the electric potential in the brain. Although the cause of the potential measurable by EEG is not proven, one theory is generally accepted. The primary source used for this description is (Nunez, 2005).

2.1 Structure of the brain

To get one’s bearings when talking about the brain, a brief description of its microscopic and macroscopic structures is presented.

The brain contains 100 billion neurons. 10 billion of these are pyramidal cells, which lie in the outer 2-4 millimeters of the brain, called the cerebral cortex. Other names for the cerebral cortex are grey matter and neocortex (the latter used only in mammals). An illustration of a pyramidal cell is seen in figure 1. Each pyramidal cell is covered with as many as 10,000 to 100,000 connections, called synapses, from other neurons. The outgoing connection from a pyramidal cell is a single branched fiber called an axon. Connections from other cells are received through the dendrites. The connections between pyramidal cells take the form of either short intracortical fibers (<1 mm in length) or corticocortical fibers roughly 1-15 cm in length. The corticocortical fibers pass through the deep parts of the brain, called white matter, to connect one part of the cortex with more distant parts. (Nunez, 2005)


Figure 1: Illustration of a pyramidal cell. Taken from (Cotterill, 1998)

The surface of the cerebral cortex is highly folded; in fact, two-thirds of the surface is buried in the folds. The upper parts of the folds are called gyri and the grooves are called sulci.

The pyramidal cells are organized in so-called mini columns and macro columns. A mini column is 0.03 mm in radius and 3 mm in height and contains 100 pyramidal cells with 1 million synapses in total. 1,000 mini columns make up one macro column.

The cortex can be divided into areas based on their general function, as seen in figure 2. The two areas of interest to this work are marked in bold, namely the primary motor cortex and the primary visual cortex. The primary motor cortex is involved in the execution of all voluntary movements. Each body part has a precise representation in the motor cortex, as seen in figure 3. The arrangement of these representations is called a motor homunculus (Latin: little man). A corresponding sensory homunculus is found in the primary somatosensory cortex.


Figure 2: Functional division of the cerebral cortex. The areas of main interest in this work are marked in bold. Taken from (University of Colorado at Boulder, 2009)

Figure 3: Localization of each of the body parts in the primary motor cortex. This representation is known as a motor homunculus. Taken from (Dubuc, 2009)

2.2 Dipoles

A fundamental concept when trying to describe what causes an electric potential at the scalp is the dipole. In order to describe the dipole, we first need to establish a few equations.


Coulomb’s law describes the force $\mathbf{F}$ between two charges $q_1$ and $q_2$ at vector location $\mathbf{r}$:

$$\mathbf{F}(\mathbf{r}) = \frac{q_1 q_2 \,\mathbf{a}}{4\pi\varepsilon_0 R^2} \qquad (2.1)$$

where $\varepsilon_0$ is the permittivity of empty space, $R$ is the charge separation in meters, and $\mathbf{a}$ is a unit vector pointing along the line between the two charges.

From Coulomb’s law, the electric field at vector location $\mathbf{r}$ due to a point charge $q$ at location $\mathbf{r}_1$ is given by

$$\mathbf{E}(\mathbf{r}) = \frac{q\,\mathbf{a}}{4\pi\varepsilon_0 R^2} \qquad (2.2)$$

where $R$ is the scalar distance between the charge and the field point. This expression applies only to a single point charge. With more than one charge, the individual contributions are summed:

$$\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0} \sum_{n=1}^{N} \frac{q_n \mathbf{a}_n}{R_n^2} \qquad (2.3)$$

Next we connect the electric field to the electric potential $\Phi$. Oscillations in an electric field induce a magnetic field, but if the oscillation frequency is below the order of MHz, the induction is negligible (Nunez, 2005). It is then possible to express the electric field in terms of the potential:

$$\mathbf{E}(\mathbf{r}) = -\nabla\Phi(\mathbf{r}) \;\Rightarrow\; \Phi(\mathbf{r}) = \frac{1}{4\pi\varepsilon_0} \sum_{n=1}^{N} \frac{q_n}{R_n} \qquad (2.4)$$

Now that we have an expression for the potential of a number of charges as a function of location in space, we can look at what happens when two charges of opposite sign and equal magnitude are placed close to each other. At moderate to large distances from the charges, the length of the vector $\mathbf{r}$ is approximately the same as $R_n$. The potential can then be approximated as

$$\Phi(r,\theta) \cong \frac{q\,d\cos\theta}{4\pi\varepsilon_0 r^2} \qquad (2.5)$$


where $d$ is the distance between the charges, $r$ is the length of the vector $\mathbf{r}$, and $\theta$ is the angle between $\mathbf{r}$ and the line connecting the two charges. The equation becomes a good approximation when $r \cong 3d$ or $4d$.
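The quality of the dipole approximation (2.5) can be checked numerically against the exact two-charge potential obtained from (2.4). The sketch below, in Python rather than the Matlab used in this work, sets $q = d = 4\pi\varepsilon_0 = 1$ for convenience and compares the two expressions on the dipole axis; the specific distances are chosen only for illustration.

```python
import numpy as np

d = 1.0  # charge separation; +q at z = +d/2, -q at z = -d/2

def phi_exact(r, theta):
    """Exact potential of the two point charges, via eq. (2.4)."""
    # Field point in the x-z plane, at distance r and angle theta from the z-axis
    p = np.array([r * np.sin(theta), 0.0, r * np.cos(theta)])
    plus = np.array([0.0, 0.0, +d / 2])
    minus = np.array([0.0, 0.0, -d / 2])
    return 1.0 / np.linalg.norm(p - plus) - 1.0 / np.linalg.norm(p - minus)

def phi_dipole(r, theta):
    """Dipole approximation, eq. (2.5): q d cos(theta) / r^2."""
    return d * np.cos(theta) / r**2

# On the dipole axis (theta = 0) the relative error shrinks with distance
for r in (3 * d, 5 * d, 10 * d):
    err = abs(phi_dipole(r, 0.0) - phi_exact(r, 0.0)) / abs(phi_exact(r, 0.0))
    print(f"r = {r:4.1f} d: relative error {err:.4f}")
```

Already at $r = 3d$ the approximation is within a few percent of the exact value, consistent with the rule of thumb above.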

Figure 4 shows the electric field around such a pair of charges, called a dipole. From the figure it can be seen that, moving along the z-direction, the potential falls off with distance as $1/r^2$. Along the y-axis the potential is 0, i.e. no potential is found perpendicular to the dipole.

Figure 4: The electric field close to a dipole in a homogeneous, isotropic dielectric medium. Solid lines are the field lines signifying the direction of the local electric field. The dashed lines are the isopotentials. Reproduced from (Nunez, 2005).

Now that we have described what a dipole is, we can consider what happens when more than two charges are placed close together. Many different and complicated configurations are possible, but it can be shown that the potential due to all charges can be expressed in simplified form as a series of terms called a multipole expansion (Nunez, 2005):

$$\Phi(r) = \underbrace{\text{monopole}}_{\sim 1/r} + \underbrace{\text{dipole}}_{\sim 1/r^2} + \underbrace{\text{quadrupole}}_{\sim 1/r^3} + \underbrace{\text{octupole}}_{\sim 1/r^4} + \cdots \qquad (2.6)$$


For each contribution, the dependency on the distance to the pole source is stated. This expression tells us that the monopole contribution is the most significant at long distances. But in practice, because of electroneutrality, there will always be an equal amount of positive and negative sources, and the monopole terms will thus be zero. Furthermore, the n-pole terms beyond the dipole fall off with distance much more rapidly than the dipole. Therefore the dipole contribution will be the most significant when looking at a complicated charge configuration from some distance. A number of parallel dipoles can simply be summed, because from a distance they act as one larger dipole.
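The falloff rates in the multipole expansion (2.6) can be illustrated numerically. The sketch below (Python, with unit charges and spacing and $4\pi\varepsilon_0 = 1$) builds a pure dipole and a linear quadrupole from point charges and fits the slope of $\log\Phi$ versus $\log r$ on the axis; the charge layouts are standard textbook configurations chosen for the example, not taken from this work.

```python
import numpy as np

def axial_potential(charges, r):
    """Potential on the z-axis at distance r from a set of (q, z) point charges."""
    return sum(q / abs(r - z) for q, z in charges)

# Dipole: +1 at z = +1, -1 at z = -1.
# Linear quadrupole: +1, -2, +1 at z = +1, 0, -1 (net charge and net dipole zero).
dipole = [(+1.0, +1.0), (-1.0, -1.0)]
quadrupole = [(+1.0, +1.0), (-2.0, 0.0), (+1.0, -1.0)]

r = np.linspace(20.0, 100.0, 50)  # distances well beyond the charge spacing

def falloff_exponent(charges):
    phi = np.array([axial_potential(charges, ri) for ri in r])
    slope, _ = np.polyfit(np.log(r), np.log(np.abs(phi)), 1)
    return slope

print(f"dipole falloff exponent:     {falloff_exponent(dipole):.2f}")      # close to -2
print(f"quadrupole falloff exponent: {falloff_exponent(quadrupole):.2f}")  # close to -3
```

The fitted exponents come out close to $-2$ and $-3$, matching the $1/r^2$ and $1/r^3$ terms of the expansion and showing why the dipole term dominates at a distance once the monopole vanishes.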

What has been described here is the so-called charge dipole. In electrophysiology a more important concept is the current dipole. The mathematical formulation is identical, since the potential of a current dipole can be described as

$$\Phi(r,\theta) \cong \frac{I\,d\cos\theta}{4\pi\sigma r^2}, \qquad r \gg d \qquad (2.7)$$

where $I$ is the current source and $\sigma$ is the fluid conductivity. Figure 4 also looks the same for a current source and a current sink in a homogeneous, isotropic conductor (e.g. a salt water tank); the solid lines then show the current lines, and the dashed lines still show the isopotentials. Consequently the multipole expansion and the dominance of the dipole term also hold.

2.3 Brain current sources

As described in the previous section, the potential is influenced by current dipoles and charge dipoles. In the context of the macroscopic potential, the current dipoles are by far the most important. One reason is that the charge separation occurring in the neurons, e.g. across a membrane, is orders of magnitude smaller than the distance between current sources and sinks. Furthermore, a shielding effect (Debye shielding) occurs in a fluid with mobile charge carriers. At macroscopic distances this makes the contribution of charge dipoles to the potential negligible (Nunez, 2005).

Many different sources and mechanisms in the brain can act as a dipole. Some of the possible sources are presented in this section.

An important current source in the brain is that produced by the synapses. The process starts when an action potential travels down the axon of the presynaptic cell to the synapse. In the synaptic knob the depolarization of the membrane causes an inflow of calcium ions through the presynaptic membrane. The calcium ions in turn cause a release of


neurotransmitters from the presynaptic cell into the synaptic cleft between the two neurons. The neurotransmitters diffuse to the subsynaptic membrane (on the postsynaptic neuron) and attach to receptors there. Two main types of synapses exist: excitatory synapses and inhibitory synapses. Each type employs different neurotransmitters, receptors, and ion channels. In an excitatory synapse the release of neurotransmitters causes an opening of ion channels which mainly transport positive ions into the postsynaptic neuron. This depolarizes the subsynaptic membrane, making the postsynaptic neuron more susceptible to firing a new action potential. In an inhibitory synapse the neurotransmitter release primarily causes an outflow of positive ions into the synaptic cleft. This hyperpolarizes the postsynaptic membrane, making it less susceptible to firing a new action potential. The change of potential in the postsynaptic neuron is called either an excitatory postsynaptic potential (EPSP), in the case of excitatory synapses, or an inhibitory postsynaptic potential (IPSP), in the case of inhibitory synapses.

For an EPSP the membrane of the postsynaptic neuron (the subsynaptic membrane) acts as a current sink, since positive ions move inward. The current then flows in the intracellular fluid and exits the membrane at more distant and distributed locations, as shown in figure 5. The opposite happens for an IPSP, which acts as a current source.

Figure 5: Membrane current caused by an excitatory synaptic action. An incoming action potential releases neurotransmitter substances into the space between the synaptic knob and the subsynaptic membrane (the synaptic cleft). This changes the permeability of the subsynaptic membrane to select positive ions, thereby producing a local current sink. This creates a current flow in which more distant and distributed locations provide the current source. Taken from (Nunez, 2005).

Another type of current sinks and sources is the action potentials in themselves. The

action potential travels as a wave along the axon. As it travels past a section of the

Towards 24-7 brain mapping technology DTU, March 2009

12

membrane an inflow of mostly Na+ ions happens. This inflow of ions functions as a

current sink. After the action potential has travelled past the section of membrane an

outflow of mostly K+ ions happen to reestablish the membrane potential. Thus the

propagation of an action potential can be seen as a current sink which is travelling fast

along an axon followed closely by a current source. Many axons are covered by an

electrically insulating material called a myelin sheath. Along the length of the axon are

regular points at which the axon is not covered by the myelin sheath (called nodes of

Ranvier – see also Figure 1). The action potential cannot travel through the membrane

covered in myelin. Instead the action potential jumps from one node of Ranvier to the

next (called saltatory conduction).

Many other processes in the brain can also influence the potential. But these are

believed to be less important for generating a potential (Nunez, 2005). Below some of

them are mentioned:

- Active or passive transport of ions across the membrane, e.g. in order to re-establish ion concentrations inside the cell after an action potential.

- Electrical synapses where neurons connect mechanically and electrically.

- Reciprocal synapses, which are synapses between dendrites.

- Fast chemical transport where the membranes of adjacent neurons connect directly, creating a "short-circuit".

- Retrograde signalling where postsynaptic neurons release substances which prevent the release of neurotransmitters from the presynaptic neuron.

2.4 From current sources to scalp potential

Now that we have looked at what the sources of potential inside the brain are, we need

to integrate that knowledge into an understanding of how the sources create the

potential measured at the scalp. Of all the possible sources mentioned in the previous

section the action potential and the synapses are the ones likely to be important at the

scalp level.

It is generally accepted that action potentials only contribute locally to the potential but

not at scalp level. There are several explanations for this: Only if a large number of

action potentials happen concurrently is the effect measurable at the surface of the

scalp. Since action potentials have very short duration the probability of the action

potentials happening at the same time is small. Another issue is the distribution of the

sources and sinks. Axons in the neocortex are not aligned in a single direction with respect to

the surface of the brain. Therefore their contributions will most likely cancel each other out. Axons also

form fibers containing a number of axons. The longest and most numerous of these

fibers – the corticocortical fibers that travel through the white matter – are much more

distant from the surface of the scalp, and will thus only produce a negligible potential.

This leaves us with the IPSP and EPSP as current source and current sink, respectively. As

suggested earlier the magnitude of electric potential or dipole moment at distance

depends heavily on distribution and synchrony. It is known from anatomical data that

inhibitory and excitatory synapses are distributed in a specific way (Megias, Emri,

Freund, & Gulyas, 2001). Almost all synapses close to the cell bodies are inhibitory

synapses. In the dendritic tree, which is closer to the cortical surface, mostly excitatory

synapses are found. Since the cortex is 2-4 mm thick, this leads to a separation of the

current source and sinks on the same order of scale. The spatial distribution of the

synapses gives the effect of a dipole placed normal to the local cortical surface.

As mentioned earlier little or no potential is found perpendicular to the axis of a dipole.

Because the surface of the cortex is folded, a greater contribution to the scalp potential

is received from areas where the cortex is parallel with the surface of the scalp. As seen

in figure 6 the cortex is parallel to the scalp surface roughly at the top of the gyri and at the

bottom of the sulci (Srinivasan, Winter, Ding, & Nunez, 2007). The distance from the top

of the gyri to the bottom of the sulci is on the order of a centimetre or more. The shortest

distance between scalp surface and the top of the gyri is about 1-1.5 cm. As the

potential drops off with the square of the distance, a doubling of the distance means

only ¼ of the potential. Consequently most of the measured EEG signal is likely to

originate from the cortical surface on the top of the gyri (Nunez, 2005).

Figure 6: Drawing of the cortex in relation to the skull and scalp. The triangles in the cortex signify pyramidal cells. The figure shows that the orientation of the pyramidal cells with respect to the surface of the scalp varies with the location in the cortex. The axis of the pyramidal cells are perpendicular to the scalp surface at the top of the gyri. At some locations between the gyri and sulci the pyramidal cells are parallel with the scalp surface.

Next we take a look at the synchrony between the sources measured by a value called

the coherence. Scalp potential amplitude depends strongly on the amount of source

synchronization because the individual contributions sum up, but naturally only if

synchronous. A single macrocolumn containing about 100,000 pyramidal cells is not able

to produce a dipole moment strong enough to produce scalp potentials. As a rough

estimate synchronous activity is needed in about 6 cm2 of cortex (approx. 600

macrocolumns or 60,000,000 neurons) located on the gyri in order to produce

recordable scalp potentials.

Using equation 2.7 we can estimate the ratio between cortical potential and scalp

potential. Looking at a single dipole in the cortex the distance to the surface of the

cortex would be in the order of a few mm and the distance to the scalp surface would be

1-2 cm. The ratio is then on the order of 100 ((10 mm)²/(1 mm)²). More realistically, the

potential is generated by many synchronous dipoles over a larger area. For such a dipole

layer it can be shown that the ratio between cortical potential and scalp potential

becomes smaller with increasing area of dipoles (Nunez, 2005). This relationship is

shown in figure 7.

Figure 7: Theoretical estimates of ratio between cortical and scalp potential as a function of area size of synchronously active cortex. A larger area of synchronously active dipoles decreases this ratio. The three curves are for the skull to brain resistivity ratios shown in the figure (40, 80, and 120). Two known experimental data points are also plotted in the figure. Reproduced from (Nunez, 2005).
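The single-dipole estimate above can be checked with a back-of-the-envelope calculation, assuming only the inverse-square falloff stated in the text (tissue conductivities and the dipole-layer effect of figure 7 are ignored):

```python
# Rough inverse-square estimate: ratio of the potential at ~1 mm (cortical
# surface) to the potential at ~10 mm (scalp) from a single dipole.
def potential_ratio(r_near_mm: float, r_far_mm: float) -> float:
    """Ratio V(r_near)/V(r_far) when V falls off as 1/r^2."""
    return (r_far_mm / r_near_mm) ** 2

print(potential_ratio(1.0, 10.0))  # -> 100.0, matching the estimate in the text
```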

2.5 Measuring the EEG

An EEG consists of the measured potential difference between two electrodes placed

different places on the scalp. One of the electrodes is called the reference electrode and

the other the recording electrode. When using many electrodes usually the same

reference electrode is used for each of the recording electrodes. Calling this single

electrode a reference electrode is perhaps a bit misleading as the potential variations

occurring in this electrode affect the measured signal as well. Other types of electrode

references are also used. A commonly used one is the average reference, where the

outputs of all the electrodes are averaged, and the averaged signal is used as reference

for each of the electrode channels.
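The average-referencing step described above amounts to subtracting the cross-channel mean from every channel at each time point. A minimal pure-Python sketch (real pipelines would use numpy or EEGLAB itself):

```python
# Re-reference multi-channel EEG samples to the average reference: for each
# time point, subtract the mean over channels from every channel's value.
def average_reference(samples):
    """samples: list of time points, each a list of channel potentials (uV)."""
    rereferenced = []
    for channels in samples:
        mean = sum(channels) / len(channels)
        rereferenced.append([v - mean for v in channels])
    return rereferenced

eeg = [[10.0, 20.0, 30.0], [5.0, 5.0, 5.0]]  # 2 time points, 3 channels
print(average_reference(eeg))  # -> [[-10.0, 0.0, 10.0], [0.0, 0.0, 0.0]]
```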

In a clinical setting when recording a routine EEG 21 to 25 electrodes are used (including

reference and ground electrode) (Nunez, 2005). EEG recorded with many more

electrodes is also being used today, mostly for research purposes. Systems with up to

256 electrodes have been developed.

When recording an EEG it is estimated that a single electrode channel records average

synaptic action from 100 million to 1 billion individual neurons. There is still room for

improvement in this respect but not much. The theoretical limit for distance between

electrodes is 1 cm which would record averages of “only” 10 million neurons. (Nunez,

2005)

The advantage of EEG does not lie in its spatial resolution, which, as can be seen, is rather

poor compared to the array of other neuroscience tools available such as PET, SPECT,

fMRI and microelectrode recordings. The main strength of EEG lies in the quantification

of spatio-temporal patterns with excellent temporal and mediocre (or worse) spatial

resolution. High spatial resolution and precise source localization is not a natural

application of EEG. The signal from the electrodes can be sampled at almost any

rate, giving temporal resolution down to the millisecond level and below. In clinical EEG a sampling frequency

of typically 200 or 256 Hz is used (Nuwer, et al., 1998).

When using EEG systems with a large number of electrodes, several practical problems

present themselves. It becomes very time consuming to set up the recording.

Furthermore the number of wires becomes very cumbersome for the subject.

Consequently, depending on the application it is not always desirable to increase the

number of electrodes.

The measured potential at the scalp is 20-100 μV in amplitude (Aurlien, 2004). This faint

signal is easily drowned out by the many sources of noise. Many sources of noise giving

rise to artifacts in the measured signal are generated by the body itself. Some common

types of artifacts are eye-induced artifacts (blinking and eye movement), muscle induced

artifacts, and cardiac artifacts. Artifacts from other sources include movement of the

electrodes, power line noise due to poor grounding, and poor contact with the skin,

particularly when the subject is sweating.

As mentioned in the introduction a company called Hypo-Safe is developing a device

which takes a new approach to recording the EEG. Figure 8 shows a concept drawing of

the device. The idea is to implant the electrode and the recording unit fully under the

skin. The device consists of an electrode attached to a small recording unit, both of

which are implanted subcutaneously. On the outside of the skin, a device resembling a

hearing aid is placed behind the ear. This device collects the recorded information.

(Hypo-Safe, 2009)

The advantages of implanting the EEG device subcutaneously are many. All the problems

with establishing and maintaining proper skin contact are eliminated. Another

advantage is that the device can be carried by the subject 24 hours a day for years

without compromising the comfort of the user. This allows for long-term recordings

under normal life conditions.

Figure 8: The EEG recording device being developed by Hypo-Safe. The device consists of two parts. The part with the electrode is implanted under the skin and the second part, which is placed behind the ear, collects the recorded signal.

2.6 Characteristics of the EEG

In order to use the EEG in practice it is often necessary to look at specific features of the

signal. The features can either be used to obtain a better understanding or model of a

specific signal, or the features can be used as an input to a classification algorithm. This

section describes some different characteristics of the EEG signal.

In general the EEG signal can be divided into rhythmic/spontaneous activity and

transient activity. The spontaneous activity occurs in the absence of any sensory stimuli,

while transient activity happens as a response to a stimulus.

2.6.1 EEG rhythms/spectral information

The rhythmic activity in the brain can be divided into characteristic frequency bands. The

limits between these frequency bands are not set in stone but vary between individual

subjects (especially with age), as well as with other influences such as use of drugs. Table 1

summarizes the different bands, their frequency range, their spatial location, and under

which circumstances the activity is seen.

In general the amplitude of EEG decreases with increasing frequency.

Type  | Frequency (Hz) | Location | Conditions for activity
Delta | up to 4 | frontally in adults, posteriorly in children | slow-wave sleep in adults; in babies
Theta | 4 - 7 | – | in young children; drowsiness or arousal in older children and adults; idling
Alpha | 8 - 13 | posterior regions of the head, both sides, higher in amplitude on the dominant side; central sites (C3-C4) at rest | relaxed/reflecting; closing the eyes
Beta  | 13 - 30 | both sides, symmetrical distribution, most evident frontally; low-amplitude waves | alert/working; active, busy or anxious thinking; active concentration
Gamma | 30 and up | – | certain cognitive or motor functions

Table 1: Summary of the spectral bands in the rhythmic activity of EEG. The frequency limits for each band are approximate values, because they vary over subjects and conditions. The location where each band is primarily seen is in the third column. The rightmost column states the conditions that normally increase activity in a given band. Summary of (Nunez, 2005) and (Wikipedia - Electroencephalography, 2009).
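As a small illustration, the approximate frequency limits of table 1 can be expressed as a lookup. The helper below is hypothetical; the boundary handling (e.g. at 7-8 Hz) is an arbitrary choice, since the limits vary between subjects:

```python
# Hypothetical helper mapping a frequency in Hz to the band names of table 1.
# The limits are the approximate ones from the table; they vary with subject,
# age, and drug use, so the exact cut-offs here are an arbitrary convention.
def eeg_band(freq_hz: float) -> str:
    if freq_hz < 4:
        return "delta"
    if freq_hz < 8:
        return "theta"
    if freq_hz <= 13:
        return "alpha"
    if freq_hz <= 30:
        return "beta"
    return "gamma"

print(eeg_band(10.0))  # -> alpha (e.g. the posterior rhythm with eyes closed)
```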

2.6.2 Mu rhythm

The mu rhythm is a spontaneous activity usually seen in the alpha range (8-13 Hz) over

the sensorimotor cortex. The amplitude of the signal is strongest when no movements

are performed. A decrease in amplitude is seen over the corresponding area in the

contralateral motor cortex when a movement is made. Even just thinking about doing a

movement (motor imagery) will decrease the amplitude of the mu rhythm.

(Pfurtscheller, Brunner, Schlogl, & Lopesdasilva, 2006)

2.6.3 ERP and EP

As mentioned, a stimulus can provoke a transient activity in the EEG. The measured EEG

signal can be time-locked to the stimulus event and is then called an event related

potential (ERP). Usually it is difficult to see any actual change in the EEG from a single stimulus

event. But if many trials are conducted and the results averaged, the random (untimed)

activity of the brain is averaged out and only the time-locked part caused by the

stimulus event remains. An example of an ERP is seen in figure 9. Depending on the

conditions for the recording and the location of the recording a number of peaks are

seen in the ERP. Ordinarily these peaks are referred to by a letter indicating polarity

(negative (N) or positive (P)) and a number indicating the latency from the event in

milliseconds. For example, P300 is a positive peak occurring 300 ms after the event.

Figure 9: Example of an event related potential (ERP) showing the typical components encountered. Notice that the vertical axis is reversed on the plot – this is for historical reasons.
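The trial-averaging idea can be demonstrated with synthetic data: a fixed time-locked bump buried in strong zero-mean noise becomes visible after averaging, since the noise shrinks roughly as 1/sqrt(n_trials). All numbers below are illustrative, not from the thesis data:

```python
import random

random.seed(0)  # deterministic noise for the illustration

n_trials, n_samples = 200, 50
# Synthetic time-locked component: a unit bump between samples 20 and 29.
erp_true = [1.0 if 20 <= t < 30 else 0.0 for t in range(n_samples)]

# Each trial = time-locked component + strong zero-mean background "EEG".
trials = [[erp_true[t] + random.gauss(0.0, 5.0) for t in range(n_samples)]
          for _ in range(n_trials)]

# Averaging over trials leaves the time- and phase-locked part standing.
average = [sum(trial[t] for trial in trials) / n_trials
           for t in range(n_samples)]

bump = sum(average[20:30]) / 10                       # mean inside the bump
rest = (sum(average[:20]) + sum(average[30:])) / 40   # mean elsewhere
print(bump - rest)  # close to 1.0: the bump stands above the residual noise
```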

The immediate and automatic change recorded after an event is called an evoked

potential (EP). The EP reflects the processing of the sensory stimulus by the brain. An

example of an EP is the visual evoked potential (VEP), which is caused by stimulation of

the subject’s visual field. Figure 10 shows a normal VEP performed by a paradigm set out

in the visual evoked potential standard (Odom, et al., 2004). As can be seen from the

figure, the peaks are all located within the first few hundred milliseconds after the

stimulus event – this is a general characteristic for EPs.

Figure 10: A normal visual evoked potential (VEP) recorded from a subject looking at a checkerboard pattern where the colors are reversed 1-3 times a second (Odom, et al., 2004).

2.6.4 Readiness potential

The readiness potential (RP) is also called premotor potential or bereitschaftspotential.

The RP is an activity that occurs contralaterally in the motor cortex before the actual

voluntary muscle movement is performed. The magnitude of the RP is quite small and

can only be visualized by averaging (Fabiani, Gratton, & Coles, 2000).

2.6.5 ERD, ERS, and ERSP

When calculating the ERP, a model consisting of a time-locked signal (the ERP) added to

uncorrelated noise is assumed. By averaging the individual events all activity that is not

both time-locked and phase-locked is averaged out through phase cancellation. This

simple model is useful and sufficient in some respects, but sometimes the desired part

of the signal is averaged out by this simple averaging method.

It has been shown that some events cause a change that is time-locked to the event but

not phase-locked. This type of event is believed to be caused by a decrease or increase

in synchrony between the individual neurons. A decrease in synchronization is called

event related desynchronization (ERD) and an increase in synchronization is called event

related synchronization (ERS) (Pfurtscheller & Lopes da Silva, 1999). An ERD or ERS might

not be phase-locked to the event; therefore it might not be seen on the ERP. But an ERD

or ERS would cause a decrease or an increase in power of a given frequency band.

In order to quantify ERD and ERS in the time domain several methods are being used.

The one used in this work is called event related spectral perturbation (ERSP) (Makeig S.

, 1993). The ERSP measures average changes in amplitude of the frequency spectrum

relative to the baseline before the event. The average changes are plotted as a function

of time from the event. The calculation of ERSP is described further in section 3.2.2.

3 Methods and tools

3.1 Classification

When measuring an EEG you will often have large amounts of data, in particular if the

recording is done over a long time period. To extract information from such a large

amount of data, automated methods are needed. Automated methods are also needed

for a real time BCI system.

A common problem when looking at EEG data is to work out if a certain observation can

tell us something about the conditions under which the observation is made. For

instance, one may want to detect whether a diabetes patient has critically low blood sugar based

on an EEG measurement.

In general, the area concerning development of algorithms which can tell us something

about the organization of data is called machine learning. The purpose of the algorithms

is to make a model of the data. When new data is encountered the rules of the model

can then give us information about the data.

Several branches of machine learning exist. One branch is unsupervised learning.

Here an algorithm tries to determine the organization of the data by using only the data

itself, i.e. without knowledge of the source signals or the process by which the data is

mixed. An example of unsupervised learning is independent component analysis.

Another branch of machine learning is supervised learning. In this case some sort of

knowledge about the data is at hand. The knowledge can for instance be information

about the circumstances under which a specific EEG is measured. In other words a given

observation is supplied with a label containing information about the observation. If a

number of observations are available, this set of observations and labels can be used to

train a function. The input to the function is the observation and ideally the output is the

label. This function or model can then be supplied with new observations not used in

the training process and hopefully supply the correct label.

In this section two different classification methods are presented: a simple linear

classifier called Fisher's linear discriminant (FLD) and a more complex linear classifier

called support vector machines (SVM). SVM can be extended into a non-linear classifier.

This non-linear SVM, called Gaussian SVM, is also presented.

FLD was chosen as it is a very common method used with success in many BCI systems

(Lotte, Congedo, Lécuyer, Lamarche, & Arnaldi, 2007). Furthermore, the classifier is

simple to apply and requires little computational power. The drawback of the method

is that it is linear and does not cope well with nonlinear data. Linear SVM is selected

because it, unlike FLD, is insensitive to high-dimensional data (Lotte, Congedo, Lécuyer,

Lamarche, & Arnaldi, 2007). Furthermore it can be customized to the data used by

changing an adjustable parameter. Because EEG data often can be of non-linear nature,

Gaussian SVM is also tried.

In general for all the presented algorithms we can consider a set of (training) data

T = {(𝒙1, 𝑦1), … , (𝒙𝑘, 𝑦𝑘)}, containing k pairs of observation/feature vectors 𝒙𝑖 ∈ ℝ𝑛 and

the class label for each of the k observations, 𝑦𝑖 ∈ 𝒴. In the classification problems

looked at in this work, only one of two labels is possible for each observation – thus only

the binary case of 𝒴 = {1,2} is considered. We then want to find a classification rule, q,

which is able to predict the label only using the information given in the observation,

𝑞:𝒳 → 𝒴 = {1,2}. The classification rule can then be used to predict the labels of any

new observation from the same distribution.

3.1.1 Fisher’s linear discriminant (FLD) classification

Fisher’s linear discriminant classification was presented by the statistician and biologist

Sir Ronald Aylmer Fisher in 1936 (Wikipedia - Linear discriminant analysis, 2009). The

implementation used in this work is from the statistical pattern recognition (STPR)

toolbox for Matlab (Franc & Hlavac, 2008).

This supervised method for classification is based on a simple linear weighting of the

features or observations, with the discriminant function being:

f(𝒙) = 𝒘 ∙ 𝒙 + 𝑏 (3.1)

defined by the parameter or weight vector 𝒘 ∈ ℝ𝑛 and bias 𝑏 ∈ ℝ. The assignment of a

label to a given observation is then done by the following rule q,

q(𝒙) = 1 if f(𝒙) = 𝒘 ∙ 𝒙 + 𝑏 ≥ 0;  q(𝒙) = 2 if f(𝒙) = 𝒘 ∙ 𝒙 + 𝑏 < 0 (3.2)

In other words we want to separate the points of the two classes with a hyperplane. In

order to find the hyperplane that best divides the two classes, we define the class

separability as a function of 𝒘:

F(𝒘) = (𝒘 ∙ 𝑺𝑩𝒘) / (𝒘 ∙ 𝑺𝑾𝒘) (3.3)

where 𝑺𝑩 is the scatter matrix between the classes: 𝑺𝑩 = (𝝁1 − 𝝁2)(𝝁1 − 𝝁2)ᵀ (𝝁𝑦 is the

mean vector of the observations in class y). 𝑺𝑾 is the scatter matrix within the classes, defined

as:

𝑺𝑾 = 𝑺𝟏 + 𝑺𝟐,   𝑺𝑦 = Σ𝑖∈ℐ𝑦 (𝒙𝑖 − 𝝁𝑦)(𝒙𝑖 − 𝝁𝑦)ᵀ,   ℐ𝑦 = {i : 𝑦𝑖 = y},   y ∈ {1,2} (3.4)

We then want to find the parameter vector 𝒘 which maximizes the class separability.

Several solutions to this problem are being used. In the implementation used in this

work the following solution is used:

𝒘 = 𝑺𝑾⁻¹(𝝁1 − 𝝁2) (3.5)

In order to find the full discriminant function we also need to find 𝑏. In the used

implementation this is done by solving

𝒘 ∙ 𝝁𝟏 + 𝑏 = −(𝒘 ∙ 𝝁𝟐 + 𝑏)  ⇒  𝑏 = −0.5 (𝒘 ∙ 𝝁𝟏 + 𝒘 ∙ 𝝁𝟐) (3.6)

3.1.2 Support vector machine (SVM) classification

Support vector machines are a group of supervised classification methods. The original

method was presented by Vladimir Vapnik in 1963 (Wikipedia - Support vector machine,

2009). The basis for this method is, like the FLD classifier, to find a hyperplane which

separates the two classes. The idea of SVM is to maximize the margin between the

hyperplane and the points of the two datasets closest to the dividing hyperplane. A

dataset that is linearly separable has many hyperplanes that will correctly classify all

the training data. The approach used by SVM will ensure that the found hyperplane will

also classify future observations (test data) as correctly as possible.

The data points closest to the boundary between the two classes are the only ones that

contribute to the location of the hyperplane. These “deciding” data points are called

support vectors.

Originally SVM was only a linear classification method, but the method has later been

extended to also work as a non-linear classifier.

The implementation of SVM used in this work is from the STPR toolbox for Matlab (Franc

& Hlavac, 2008).

Linear classifier

The linear SVM uses the same discriminant function as FLD, namely:

f(𝒙) = 𝒘 ∙ 𝒙 + 𝑏 (3.7)

The classification rule in the binary case is also the same as for FLD:

q(𝒙) = 1 if f(𝒙) ≥ 0;  q(𝒙) = 2 if f(𝒙) < 0 (3.8)

The difference between the two methods is, as mentioned, in the way the hyperplane is

found. When the data is completely separable by a hyperplane, the so-called hard

margin SVM can be used. In this case all data points are used when calculating the

weight and bias. In many cases this is not desirable since excluding a few outliers can

improve the margin considerably. Furthermore, when dealing with data that is not

linearly separable, it is necessary to allow the classifier to misclassify data points in order

to find a hyperplane at all. The practice of allowing the classifier to misclassify some data

points is called soft margin SVM.

When using the soft margin SVM the optimal parameters 𝒘∗ and 𝑏∗ corresponding to

the maximum margin hyperplane is calculated by solving the following quadratic

programming optimization problem:

minimize over 𝒘, 𝑏:   ½‖𝒘‖² + C Σ𝑖=1..𝑙 ξ𝑖

subject to:   𝒘 ∙ 𝒙𝑖 + 𝑏 ≥ +1 − ξ𝑖,   i ∈ ℐ1

              𝒘 ∙ 𝒙𝑖 + 𝑏 ≤ −1 + ξ𝑖,   i ∈ ℐ2

              ξ𝑖 ≥ 0,   i ∈ ℐ1 ∪ ℐ2 (3.9)

𝜉𝑖 ≥ 0 are slack variables, which will allow observations to be misclassified. ℐ1 =

{i : 𝑦𝑖 = 1} and ℐ2 = {i : 𝑦𝑖 = 2} are sets of indices for the two classes in the training

data. The regularization constant 𝐶 > 0 defines the importance of the slack variables. A

large value of C means that classification errors incur a large penalty – setting C to

infinity gives the same results as using a hard margin SVM. Setting a smaller value of C

lets more data points close to the boundary be ignored and thus creates a larger margin.

It should be noted here that the scale of C has no direct meaning. It is therefore not

intuitively easy to set the value of the regularization parameter. An illustration of the

effect brought about by changing the regularization parameter is shown in figure 11.

Figure 11: The effect of changing the regularization parameter C on the decision boundary when using linear SVM. Synthetic data is used. The support vectors are marked with a black circle. The C values used are left: C=1, middle: C=10, and right C= 100. Using a low C-value of 1 increases the margin and allows the classifier to ignore the points close to the boundary. In this case the low C-value allows the classifier to put all data points within the margin and a poor result is obtained as two data points are misclassified. By increasing the C-value the margin becomes narrower and fewer points are used as support vectors. With a C-value of 100 only the four closest points are used as support vectors and only the one misclassified data point is within the margin.
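The soft-margin objective can be approximated without a quadratic programming solver. The sketch below minimizes the equivalent hinge-loss form ½‖𝒘‖² + C Σ max(0, 1 − g𝑖(𝒘 ∙ 𝒙𝑖 + b)), with g𝑖 = +1 for class 1 and −1 for class 2, by simple subgradient descent. It is a pedagogical stand-in, not the STPR toolbox routine, and the learning rate and epoch count are arbitrary choices:

```python
# Pedagogical soft-margin linear SVM via subgradient descent on the
# hinge-loss form of equation 3.9.
def svm_train(xs, ys, C=10.0, lr=0.01, epochs=200):
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            g = 1.0 if y == 1 else -1.0
            margin = g * (w[0] * x[0] + w[1] * x[1] + b)
            grad_w, grad_b = [w[0], w[1]], 0.0   # gradient of 0.5*||w||^2
            if margin < 1:                       # hinge term is active
                grad_w[0] -= C * g * x[0]
                grad_w[1] -= C * g * x[1]
                grad_b -= C * g
            w = [w[0] - lr * grad_w[0], w[1] - lr * grad_w[1]]
            b -= lr * grad_b
    return w, b

def svm_classify(w, b, x):  # classification rule of equation 3.8
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 2

xs = [[2.0, 2.0], [3.0, 1.0], [-2.0, -2.0], [-1.0, -3.0]]
ys = [1, 1, 2, 2]
w, b = svm_train(xs, ys)
print(svm_classify(w, b, [2.5, 2.0]), svm_classify(w, b, [-2.0, -2.5]))
```

Increasing C here has the same qualitative effect as in figure 11: misclassifications are penalized more heavily and the margin narrows.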

Non-linear classifier

For some problems a linear classifier will yield unsatisfactory results, simply because

they have non-linear decision boundaries. As mentioned earlier SVM classification can

be extended to take care of non-linear problems as well. The procedure for doing this is

to map the data to another vector space using a so-called kernel-function 𝒌𝑆.

The discriminant function then becomes:

f(𝒙) = 𝜶 ∙ 𝒌𝑠(𝒙) + 𝑏 (3.10)

𝜶 ∈ ℝ𝑙 is a vector containing weights and 𝑏 ∈ ℝ is the bias. The data vector is mapped

into another space using a kernel function 𝒌𝑠(𝒙) = [k(𝒙, 𝒔1), … , k(𝒙, 𝒔𝑑)]ᵀ. The support

vectors 𝒮 = {𝒔1, … , 𝒔𝑑} are the subset of training vectors used when calculating the

discriminant function. Many different kernel functions can be used. In this work only

one kernel function has been selected, namely the widely used Gaussian kernel function

(also called radial basis function (RBF)) – SVM classification using this kernel is called

Gaussian SVM. The Gaussian kernel is described by the function

k(𝒙, 𝒔) = exp(−‖𝒙 − 𝒔‖² / (2σ²)) (3.11)

The parameter 𝜎 > 0 in this kernel function determines how wide the Gaussian function

centered on each of the support vectors is. When the squared distance ‖𝒙 − 𝒔‖² is

much larger than 2σ² the function is close to zero. If σ is set to a high value the Gaussian

“bump” around each support vector will be relatively wide and vice versa. When using a

high σ the entire set of support vectors will thus influence a given data point, leading to

a smoother decision boundary. If σ is set lower, the Gaussian becomes more narrow

leading to a possibility for greater bends in the decision boundary. If σ is set too low, the

discriminant function will essentially be zero outside the close vicinity of each of the

support vectors. The decision boundary will then most likely lie close to the support

vectors, leading to poor classification of new observations. An illustration of the effect

of changing σ is shown in figure 12.

Figure 12: The effect of changing the σ parameter when using Gaussian SVM. The data used in this figure is the same as the data used in figure 11. The σ-values used are: left: σ=5, middle plot: σ=1, and right plot: σ=0.2. When using a high σ-value in the left plot, the decision boundary is only allowed a slight curve, and the result is close to a linear classifier. Using a σ-value of 1 is just the right setting as a perfect decision boundary is achieved. If the σ-value is set too low (right plot), then the decision boundary can curve too much and the data is overfitted. Although this gives a perfect classification for the training data, a poor classification of any test data will probably be seen.
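Equation 3.11 is a one-liner in code. The example below shows numerically how σ controls the width of the "bump" around a support vector, using an illustrative point pair with squared distance 25:

```python
import math

# Gaussian (RBF) kernel of equation 3.11: k(x, s) = exp(-||x - s||^2 / (2*sigma^2)).
def gaussian_kernel(x, s, sigma):
    sq_dist = sum((xi - si) ** 2 for xi, si in zip(x, s))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

x, s = [0.0, 0.0], [3.0, 4.0]              # squared distance 25
print(gaussian_kernel(x, s, sigma=5.0))    # exp(-0.5) ~ 0.61: s still "felt" at x
print(gaussian_kernel(x, s, sigma=0.5))    # exp(-50) ~ 0: far points barely count
```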

Training of the binary non-linear SVM equates to solving the following quadratic

programming problem:

maximize over 𝜷:   𝜷 ∙ 𝟏 − ½ 𝜷 ∙ (𝑯𝜷)

subject to:   𝜷 ∙ 𝜸 = 0,   𝜷 ≥ 𝟎,   𝜷 ≤ 𝟏C (3.12)

𝟎 is a vector of zeros of size [l × 1] and 𝟏 is a vector of ones of size [l × 1]. 𝜸 and 𝑯 are defined as

follows:

γ𝑖 = +1 if 𝑦𝑖 = 1;  γ𝑖 = −1 if 𝑦𝑖 = 2

𝐻𝑖,𝑗 = k(𝒙𝑖, 𝒙𝑗) if 𝑦𝑖 = 𝑦𝑗;  𝐻𝑖,𝑗 = −k(𝒙𝑖, 𝒙𝑗) if 𝑦𝑖 ≠ 𝑦𝑗

The weight vector 𝜶 in the discriminant function is then found as

α𝑖 = β𝑖 if 𝑦𝑖 = 1;  α𝑖 = −β𝑖 if 𝑦𝑖 = 2

The bias 𝑏 can be found from the Karush-Kuhn-Tucker (KKT) conditions where the

following constraints must hold for the optimal solution:

f(𝒙𝑖) = 𝜶 ∙ 𝒌𝒔(𝒙𝑖) + 𝑏 = +1 for i ∈ {j : 𝑦𝑗 = 1, 0 < 𝛽𝑗 < C}

f(𝒙𝑖) = 𝜶 ∙ 𝒌𝒔(𝒙𝑖) + 𝑏 = −1 for i ∈ {j : 𝑦𝑗 = 2, 0 < 𝛽𝑗 < C}

𝑏 is then found as an average over all the constraints so that

𝑏 = (1 / |ℐ𝑏|) Σ𝑖∈ℐ𝑏 (𝛾𝑖 − 𝜶 ∙ 𝒌𝒔(𝒙𝑖)) (3.13)

where the indices of the vectors on the boundary are denoted by ℐ𝑏 = {𝑖: 0 < 𝛽𝑖 < 𝐶}.

3.1.3 Cross-validation

After training a classifier on a training data set, you will want to test it afterwards on a

new data set which the classifier has not seen during training. This is done to assess

how accurate the classifier is and to estimate how well it will perform in practice. Often

the number of observations you have available for training and testing the classifier is

very limited. With a given amount of observations you would want to use as many of the

observations as possible to train the best possible classifier, but this would leave you with too

few observations to properly evaluate the classifier afterwards.

In order to alleviate this issue, cross-validation methods are used. The principle behind

this is to partition the entire amount of observations into a training subset and a test

subset. After having trained and tested a classifier, a new partitioning of the

observations is made. The new subsets are then used to train and test a new classifier.

This is repeated a number of times. The validation results are then averaged over the

individual rounds.

Many types of cross-validations are in use. In this work leave-one-out (LOO) cross-

validation is used (presented in (Lunts & Brailovskiy, 1967)). As the name suggests, this

method involves taking out a single observation as the test set and using all the rest as

the training set. This is repeated so that each observation is tested once. This is

the most thorough of the cross-validation methods and is only used for relatively small

data sets.

3.2 EEGLAB

Most of the processing and visualization of the data in this work is done in the MATLAB

toolbox called EEGLAB (Delorme & Makeig, 2004). This toolbox simplifies many

commonly used procedures when working with EEG data, as well as offering many

different visualization options. Below, some of the visualizations used in this work are

presented.

3.2.1 ERP images

An ERP (event related potential) image is a useful method for visualizing many epochs of EEG data in one diagram. Figure 13 shows a sample ERP image. Each horizontal line in the rectangular colored image is a single epoch, color coded as shown in the color bar on the right. Below the colored image, the average of all the epochs is shown.

Figure 13: Sample ERP image of a surface electrode. The rectangular colored image at the top shows all epochs of data – epoch number is seen on the vertical axis to the left. The bottom plot shows the ERP averaged over all epochs.
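The underlying data structure is simply a trials-by-samples matrix. A minimal NumPy sketch of building an ERP image array and its epoch average follows; the synthetic epochs and the "P110"-like bump are purely illustrative:

```python
import numpy as np

fs = 256                          # sampling rate in Hz (as in this data set)
t = np.arange(-0.2, 0.5, 1 / fs)  # epoch time axis: -200 ms to 500 ms
n_epochs = 50

# Synthetic epochs: a Gaussian "P110"-like bump at 110 ms plus noise.
rng = np.random.default_rng(1)
erp_shape = np.exp(-((t - 0.11) ** 2) / (2 * 0.02 ** 2))
epochs = erp_shape + rng.normal(0, 0.5, (n_epochs, t.size))

# The ERP image is this matrix displayed row by row (e.g. with matplotlib's
# imshow); the bottom trace of figure 13 is simply the column mean.
erp_image = epochs                 # shape: (n_epochs, n_samples)
average_erp = epochs.mean(axis=0)  # the classical averaged ERP
print(erp_image.shape, average_erp.shape)
```

Averaging over epochs suppresses the noise (by the square root of the number of epochs), which is why the bump is much clearer in the bottom trace than in any single row.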

3.2.2 Event related spectral perturbation (ERSP)

In section 2.6.5 the ERSP was described; here the EEGLAB implementation of this measure is presented. Figure 14 shows a sample ERSP image as plotted by EEGLAB. To calculate the ERSP values shown in the main plot, we start by computing baseline spectra from the epoch values just before each event. The mean of these baseline spectra is shown in the panel to the left of the ERSP plot in figure 14. Each epoch is divided into overlapping time windows, and a moving average of the amplitude spectrum of these windows is computed. Each amplitude spectrum is then divided by the mean baseline spectrum in order to normalize it. This yields a time-frequency decomposition of each trial, and averaging these decompositions over trials gives the ERSP plotted in the figure. Below the main ERSP plot in figure 14, the minimum and maximum values of the ERSP are shown. The spectra can be calculated either by FFT or by wavelet transform; the former is used in all cases in this work.

Figure 14: Event related spectral perturbation (ERSP) image. The plot is based on 280 trials of hand and foot motor imagery tasks from the data set used in section 4.3. The plot is calculated from electrode C3, i.e. over the hand motor cortex. The main plot shows the ERSP values in dB with the mean baseline spectral activity subtracted. The panel to the left of the ERSP plot presents the mean spectrum values during the baseline period, i.e. the values subtracted from the values in the ERSP plot. The panel below the ERSP image shows the maximum (green) and minimum (blue) ERSP values relative to baseline power at each frequency.
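The FFT-based procedure above can be sketched in NumPy. This is an illustrative reimplementation, not the EEGLAB code; the window length and step size are arbitrary choices, and the input epochs are synthetic noise:

```python
import numpy as np

def ersp(epochs, fs, baseline_end, win=32, step=8):
    """Simple FFT-based ERSP in dB.

    epochs: (n_trials, n_samples) array; samples before index
    `baseline_end` form the pre-event baseline period."""
    n_trials, n_samples = epochs.shape
    starts = range(0, n_samples - win + 1, step)
    # Amplitude spectra of overlapping windows: (trials, windows, freqs).
    tf = np.array([[np.abs(np.fft.rfft(ep[s:s + win])) for s in starts]
                   for ep in epochs])
    # Mean baseline spectrum per trial, from windows ending before the event.
    base_idx = [i for i, s in enumerate(starts) if s + win <= baseline_end]
    baseline = tf[:, base_idx, :].mean(axis=1, keepdims=True)
    # Normalize each spectrum by its baseline, average over trials, to dB.
    return 20 * np.log10((tf / baseline).mean(axis=0))

fs = 256
rng = np.random.default_rng(2)
epochs = rng.normal(size=(30, 180))    # 30 trials of 700 ms at 256 Hz
E = ersp(epochs, fs, baseline_end=51)  # first ~200 ms is the baseline
print(E.shape)                         # (n_windows, n_freqs)
```

For pure noise input the ERSP hovers around 0 dB, since post-event spectra do not differ from the baseline; event-locked power changes would show up as sustained positive or negative dB values.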

3.2.3 ICA

ICA decomposition is done with the EEGLAB method "runica". This method is based on the extended Infomax algorithm (Lee & Sejnowski, 1998). The original Infomax algorithm was presented in (Bell & Sejnowski, 1995), and the implementation used in EEGLAB is presented in (Makeig, Bell, Jung, & Sejnowski, 1996).
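The idea of ICA decomposition, unmixing channel recordings into statistically independent component activations, can be sketched as follows. Note that scikit-learn's FastICA is used here purely as a stand-in for EEGLAB's Infomax-based "runica" (a different algorithm with the same goal), and the two synthetic sources are illustrative:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Two independent sources mixed into two "electrode" channels.
rng = np.random.default_rng(3)
t = np.linspace(0, 1, 2000)
sources = np.c_[np.sin(2 * np.pi * 9 * t),           # 9 Hz "alpha-like" wave
                np.sign(np.sin(2 * np.pi * 3 * t))]  # square-wave "artifact"
mixing = np.array([[1.0, 0.5], [0.4, 1.2]])
channels = sources @ mixing.T                        # observed recordings

# Unmix: recover component activations (up to sign and scale).
ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(channels)
print(components.shape)
```

As with runica, the number of recoverable components equals the number of input channels, which is why four electrodes yield four components in section 4.1.6.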


4 Data analysis

4.1 Comparison of subcutaneous electrodes with conventional electrodes

Currently, knowledge about the use of subcutaneous electrodes is almost non-existent. Only one reference to the practical use of subcutaneous electrodes can be found in the literature (Kamphuisen, et al., 1991). Although this reference states that the EEG observed by the subcutaneous electrodes is similar to the EEG from ordinary skin electrodes, no evidence is presented for the basis of this similarity. When contemplating the use of subcutaneous electrodes for a variety of applications, a better understanding of the differences relative to ordinary (well known) electrodes is advantageous.

Using EEG data recorded simultaneously by subcutaneous electrodes and ordinary surface electrodes, this section compares the two types of electrodes by various methods.

4.1.1 Description of dataset

The data used is from a visual evoked potential (VEP) experiment carried out by HypoSafe A/S at Odense University Hospital (OUH) on the 13th of December 2007.

Prior to the experiment, a subcutaneous electrode with four contact points was inserted in the subcutis layer at the back of the head (visual area). The electrode was inserted vertically in the center, due to constraints associated with blood vessel locations. Four surface electrodes were placed on top of the subcutaneous ones to allow for comparison. In addition to the 2 times 4 electrodes, two reference electrodes were placed on the top of the head (approx. Cz), one for each set (one subcutaneous and one surface). Both subcutaneous and surface electrodes were recorded with the same equipment.

The full recording consists of approx. 6 minutes of EEG, during which the subject is looking at a computer screen. During three intervals, with breaks in between, the screen flickers a checkerboard-like image. The colors of the checkerboard change with a frequency of 2 Hz (500 ms between changes).

The data file contains 9 signals: the first four channels are the surface channels; the fifth channel is the stimulus channel (500 ms between events); the following four channels are the subcutaneous channels. The sampling rate is fS = 256 Hz.

4.1.2 Preprocessing

The data is processed in the following way:

1. Data is imported into EEGLAB.

2. The event latencies are extracted from the stimulus channel.

3. 50 Hz noise is filtered out using a non-linear infinite impulse response (IIR) filter.

4. The data is divided into epochs based on the event data. Epoch start is set to 200 ms before the event and epoch end is set to 500 ms after.

5. In order to remove low frequency drifts and similar artifacts, a mean baseline value (based on the -200 to 0 ms interval) is calculated and subtracted from each epoch individually.

6. Epochs containing artifacts are removed using some of EEGLAB's standard methods. Using these methods, 101 out of 432 epochs are rejected, leaving 331 epochs for further analysis. Figure 15 shows ERP images of all data before rejection of artifacts and figure 16 shows the same images after artifact rejection. The employed artifact rejection methods are the following:

   a. Epochs containing abnormal values above 40 μV or below -40 μV are rejected.

   b. The spectrum of each epoch is computed. If a spectrum contains values above 25 dB or below -25 dB in the interval 0 Hz to 50 Hz, the epoch is rejected.
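Steps 4, 5, and 6a can be sketched as follows. This is an illustrative NumPy version (the actual processing was done with EEGLAB's standard routines), and the signal and event positions are synthetic:

```python
import numpy as np

fs = 256
pre, post = int(0.2 * fs), int(0.5 * fs)  # 200 ms before, 500 ms after

def epoch_and_clean(signal, event_samples, threshold_uv=40.0):
    """Cut epochs around events, subtract the pre-event baseline mean,
    and reject epochs exceeding +/- threshold_uv (step 6a)."""
    epochs = np.array([signal[e - pre:e + post] for e in event_samples])
    baseline = epochs[:, :pre].mean(axis=1, keepdims=True)
    epochs = epochs - baseline                       # step 5
    keep = np.all(np.abs(epochs) <= threshold_uv, axis=1)
    return epochs[keep], keep

rng = np.random.default_rng(4)
signal = rng.normal(0, 10, fs * 60)                  # 1 minute of "EEG" in μV
signal[5000:5010] += 200                             # one large artifact
events = np.arange(1000, fs * 60 - post, fs // 2)    # events every 500 ms
clean, keep = epoch_and_clean(signal, events)
print(len(events), int(keep.sum()))
```

The epoch containing the injected 200 μV artifact exceeds the 40 μV threshold and is dropped, mirroring how the 101 artifactual epochs were removed from the 432 recorded ones.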

Figure 15: ERP images of all the data before the artifacts have been removed. The top four plots show the four surface electrodes and the bottom four plots show the four subcutaneous electrodes. Data from electrodes which are positioned on top of each other on the head are shown above and below each other in the plots.

Figure 16: ERP images of the same data as shown in figure 15, but epochs containing possible artifacts have been removed.

4.1.3 Artifacts

Looking again at figure 15 and comparing the surface electrodes with the subcutaneous electrodes, more numerous and/or more noticeable artifacts are seen in the subcutaneous electrodes (for instance subcutaneous electrode 4 at approx. epochs 160-170 and subcutaneous electrode 2 at approx. epoch 280). The nature of these artifacts is not known.

Figure 17 shows the number of rejected artifacts as a function of the threshold value for the two artifact rejection methods described in section 4.1.2. From figure 17 it is seen that, although there apparently are more artifacts in the subcutaneous electrodes than in the surface electrodes, the number of rejected artifacts is almost the same in the two groups of electrodes; only a few more artifacts are rejected when using rejection by threshold of the data values. At a threshold of 40 μV (the value used when processing the data), an extra 5% of the epochs in the subcutaneous electrodes are rejected.

Further experiments are needed in order to investigate whether this increased amount of artifacts is significant. Perhaps a different result would be obtained with electrodes which are fully implanted under the skin for a longer period of time.

Figure 17: The proportion of epochs which are rejected as a function of the threshold value used for rejection. Two different rejection methods are used. Each method is used on both the surface data channels and the subcutaneous data channels. In the top plot every epoch containing a data value outside the interval [-threshold; threshold] in any data channel will be considered an artifact, and the epoch is rejected. Using this method, slightly more epochs are rejected in the subcutaneous data than in the surface data. In the lower plot the rejection is based on the spectrum of each epoch between 0 Hz and 50 Hz. If the power at any frequency in any channel is outside the interval [-threshold; threshold], then the epoch is rejected. Using this method the percentage of rejected epochs is almost the same for both the surface and subcutaneous data channels.

4.1.4 Frequency spectrum

From a first look at the frequency spectrum of the raw data (figure 18) it is seen that the data from the surface electrodes and the subcutaneous electrodes are almost identical. But a few differences are immediately noticeable:

- 50 Hz noise is more pronounced in the surface electrodes. The power of the 50 Hz noise is 123% higher in the surface electrodes.

- A 100 Hz component is seen exclusively in the subcutaneous electrodes.

- The signal from the subcutaneous electrodes has more power in the low frequency range (< 1 Hz).

- A 60 Hz component is seen on both the subcutaneous and surface electrodes.


The 50 Hz noise might be explained by radiation of line noise from the surroundings. Because the subcutaneous electrodes are "protected" by a layer of skin, they might be less influenced by this radiated line noise.

The 100 Hz component might be a harmonic of the 50 Hz noise, but why it is only seen in the subcutaneous electrodes, and not in the surface electrodes where the 50 Hz noise is larger, is unknown.

As described earlier, more artifacts of various kinds are seen in the subcutaneous electrodes. This might be the reason for the higher power at low frequencies in the subcutaneous electrodes.

A 60 Hz component is usually seen when using a power supply of this frequency (i.e. in the United States). This cannot be the case in this experiment. The 60 Hz component might instead be caused by the refresh rate of the computer screen the test subject is looking at; this phenomenon is known as frequency driving.

Figure 19 shows the spectrum after the data has been processed as described in section 4.1.2. In this graph a 9 Hz alpha activity is clearly seen. Furthermore a peak at around 26 Hz is seen, corresponding to high beta waves.


Figure 18: Plot of the frequency spectrum of the raw data, i.e. before filtering and removal of artifacts. The top plot displays the spectrum for the four surface electrodes. The bottom plot displays the spectrum for the four subcutaneous electrodes. Note the higher power of the low frequency components in the plot of the subcutaneous electrodes.

Figure 19: Frequency spectrum of processed data (see section 4.1.2 Preprocessing), i.e. after filtering of 50 Hz noise and removal of artifacts. The top plot displays the four surface electrodes. The bottom plot displays the four subcutaneous electrodes. Note the 60 Hz component in both plots and the 100 Hz component in the bottom plot.
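Spectra of the kind shown in figures 18 and 19 can be estimated with Welch's averaged-periodogram method. Below is an illustrative NumPy sketch; the channel data is synthetic (a 9 Hz "alpha" sine plus 50 Hz line noise and broadband noise), and the window and step sizes are arbitrary choices:

```python
import numpy as np

def welch_psd(x, fs, win=512, step=256):
    """Averaged-periodogram (Welch) estimate of the power spectral density."""
    w = np.hanning(win)
    segs = [x[s:s + win] * w for s in range(0, len(x) - win + 1, step)]
    psd = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs], axis=0)
    psd /= fs * (w ** 2).sum()  # normalize to power density
    return np.fft.rfftfreq(win, 1 / fs), psd

fs = 256
t = np.arange(0, 60, 1 / fs)
rng = np.random.default_rng(5)
# Synthetic "EEG": 9 Hz alpha plus weak 50 Hz line noise plus white noise.
x = (np.sin(2 * np.pi * 9 * t) + 0.3 * np.sin(2 * np.pi * 50 * t)
     + rng.normal(0, 1, t.size))
f, psd = welch_psd(x, fs)
print(f[np.argmax(psd)])  # frequency of the dominant peak
```

Averaging over overlapping windows trades frequency resolution for a much smoother noise floor, which makes narrow peaks like the 9 Hz alpha and 50 Hz line components easy to spot.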


4.1.5 Event related potentials (ERP)

Figure 20 shows the ERP for each of the eight electrodes. Some characteristic components of the ERP are seen: a positive peak at 110 ms after visual stimulus (P110), a negative peak at 170 ms (N170), and a (small) positive peak at 220 ms (P220).

From the plot in figure 20 some small differences between the subcutaneous and surface electrodes can be seen. Larger amplitudes are seen in the subcutaneous electrodes, most pronounced in electrode locations 1 and 2. The amplitude of the peak at 110 ms is on average 19% larger in the subcutaneous electrodes compared to the surface electrodes. In electrode 1 the amplitude is 37% larger in the subcutaneous electrode, while in electrode 4 only a 1% increase is seen.

These larger amplitudes for the subcutaneous electrodes, which are closer to the cortex of the brain, might be expected, since the potential of a dipole falls off as the inverse square of the distance (Nunez, 2005). The distance between the electrodes and the cortex can be as little as 1 cm, and with the thickness of the epidermis and dermis being up to several millimeters, this decrease in distance could lead to a significant increase in amplitude.
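As a rough illustration of this inverse-square argument (the distances below are assumptions for the sake of the example, not measured values): if a surface electrode sits 1.2 cm from a dipole source and the subcutaneous electrode is 2 mm closer, the expected amplitude ratio is (1.2/1.0)² = 1.44, i.e. a 44% increase:

```python
# Inverse-square falloff of a dipole potential: V proportional to 1/r^2.
# Hypothetical distances: surface electrode 1.2 cm from the source,
# subcutaneous electrode 2 mm closer at 1.0 cm.
r_surface, r_subcut = 1.2, 1.0       # distances in cm (assumed)
ratio = (r_surface / r_subcut) ** 2  # amplitude gain for the closer electrode
print(f"{(ratio - 1) * 100:.0f}% larger amplitude")
```

A 2 mm difference in skin thickness is thus enough, under the inverse-square model, to produce amplitude differences of the magnitude observed (19-37%).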


Figure 20: Average ERP for the four electrode locations. Each plot shows the average ERP for surface electrode (green), subcutaneous electrode (blue), and difference between the two (red). The largest difference in maximal amplitude is seen in electrode number one and two (top left and top right).

4.1.6 ICA components

ICA decomposition is run on the four surface electrodes, yielding four components. Likewise, the four subcutaneous electrodes yield four components. The ICA decomposition is done with EEGLAB.

Figure 21 shows ERP images of the two times four components. Component 1 of the surface electrodes and component 1 of the subcutaneous electrodes are seen to contain most of the ERP. Figure 22 shows the cross correlation between these two components; a significant cross correlation coefficient of 86% is seen between the two components.

Figure 21: ERP images of ICA components. The four ICA components for the four surface electrodes are shown in the four top plots, and the four components for the four subcutaneous electrodes are shown in the bottom four plots.


Figure 22: The red curve shows the cross-correlation of ICA component one for subcutaneous electrodes and component one for surface electrodes (as shown in figure 21). The blue curve is constructed by calculating the cross-correlation between one of the components and a random permutation of the time indexes of the other component. 1000 different random permutations are made and the blue curve is the minimal and maximal values of the cross-correlation using these random permutations. The boundaries of the blue curve are then comparable to a significance level of 0.1%.
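The permutation test behind figure 22 can be sketched as follows. This illustrative NumPy version computes only the zero-lag correlation (the figure sweeps over lags) and uses fewer permutations than the 1000 used for the figure; the two "components" are synthetic signals sharing a common source:

```python
import numpy as np

def xcorr_significance(a, b, n_perm=200, seed=0):
    """Zero-lag correlation of a and b, plus a null band obtained by
    correlating a against random time-permutations of b."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    observed = np.mean(a * b)
    rng = np.random.default_rng(seed)
    null = np.array([np.mean(a * rng.permutation(b)) for _ in range(n_perm)])
    return observed, null.min(), null.max()

rng = np.random.default_rng(6)
common = rng.normal(size=2000)  # shared underlying signal
comp_surface = common + 0.3 * rng.normal(size=2000)
comp_subcut = common + 0.3 * rng.normal(size=2000)
obs, lo, hi = xcorr_significance(comp_surface, comp_subcut)
print(obs > hi)  # observed correlation exceeds the permutation band
```

Permuting the time indices destroys any real temporal relationship while preserving the amplitude distribution, so a correlation well outside the [lo, hi] band, like the 86% in figure 22, is unlikely to arise by chance.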

4.1.7 Conclusion on electrode comparison

This section has investigated the similarities and differences between surface electrodes and subcutaneous electrodes used for measuring EEG. The signals obtained using the two kinds of electrodes exhibited some small differences:

- Slightly more, as well as more pronounced, artifacts are seen in the subcutaneous electrodes than in the surface electrodes. When artifacts are rejected by threshold of the data values, 14% of the epochs in the subcutaneous electrodes are rejected, compared to 9% of the epochs in the surface electrodes. The cause of this increase in the number of artifacts is unknown.

- Less 50 Hz noise is seen in the subcutaneous electrodes. The power of the noise in the subcutaneous electrodes is reduced by 55% compared to the surface electrodes. This is possibly due to a decreased effect of radiated line noise from the surroundings.

- Higher signal amplitude is seen in the subcutaneous electrodes. In electrode 1 the peak amplitude at 110 ms is 37% higher in the subcutaneous electrode compared to the surface electrode. This is most likely caused by the subcutaneous electrodes simply being closer to the cerebral cortex.

The ICA components also show a significant similarity between the subcutaneous and the surface electrodes: when comparing the primary components of the surface and subcutaneous electrodes, a cross correlation coefficient of 86% is found.

This section shows that the signal obtained using subcutaneous electrodes could possibly be of better quality than that from the surface electrodes, due to less noise and higher amplitude. But the data material for this comparison is quite limited, so this conclusion is not certain.


4.2 Classification of visual stimuli

This section is based on the VEP experiment presented in section 4.1. In this experiment the triggering event was the inversion of the black and white colors in a checkerboard pattern on a computer monitor. The colors of the checkerboard changed every 0.5 seconds. The recording was performed in three continuous segments, containing respectively 186, 120, and 124 epochs, with a break of about 30 seconds between each segment.

As an exercise in classification, it will be investigated whether it is possible to make a classifier capable of distinguishing the color of the checkerboard; by color is meant whether the checkerboard is, for instance, black or white in the top left corner.

4.2.1 Preprocessing

The data set was preprocessed in exactly the same way as described in section 4.1.2. This processing includes filtering of 50 Hz noise by an IIR filter, division into epochs from 200 ms before the event to 500 ms after the event (i.e. epochs are 700 ms in total), subtraction of the mean baseline value, and removal of artifactual epochs. After removal of epochs containing obvious artifacts, the three segments contain respectively 140, 90, and 90 epochs.

The epochs were labeled either "black" or "white" before removal of artifactual epochs: the first epoch in each of the three segments was labeled black, the next epoch white, and so on.

4.2.2 Features

Two different approaches have been tried with respect to the selection of features. One approach uses the data values after preprocessing. Another approach uses the primary ICA component, as described in section 4.1.6 and visualized in figure 21.

In all cases only the subcutaneous electrodes have been used, since these yield results comparable to the surface electrodes. When using the primary ICA component, four components have been calculated based on the four subcutaneous electrodes, but only component 1 has been used. When using the data values, several options have been tried: using all four electrodes, or only electrode number 1 or 4 in isolation. Unless stated otherwise, subcutaneous electrode number 1 has been used, since section 4.1 showed that it provides the highest amplitude and contains the fewest artifacts.


For both the ICA values and the data values, using all values from the entire epoch has been tried. As an alternative, shorter windows within the epoch have been tried, as will be seen in later figures.

4.2.3 Classification

Three different classification algorithms have been used: Fisher's linear discriminant (FLD), linear SVM, and Gaussian SVM.

The first of the three continuous segments is used for training the classifier (training data set). The trained classifier is afterwards tested on epochs from each of the two other segments separately (test data set 1 and test data set 2).

When testing the classifier on the two test data sets, a classification error of 0% in, for instance, test data set 1 means that all labels in test data set 1 are correct; thus the first showing of the checkerboard in this set has the same color as the first showing of the checkerboard in the training data set. Conversely, a classification error of 100% means that the first showing of the checkerboard in a test data set has the opposite color of the first showing in the training data set.

Each of the three sets contains an equal number of epochs of each checkerboard color (each label). Because of this, we expect an error of 50% if labels are assigned randomly. Likewise, if all epochs in a test data set are given the same label, either black or white, the classification error will be 50%.

With this in mind, we now train different classifiers using different parameters and different features. Figure 23 shows plots of the classification error using FLD as classification method. All possible window lengths are shown in the figure. From the top plot it is seen that the classifier is able to obtain a good separation of the two classes of the training data, as long as the window length is longer than about 300 ms. But on the test data no clear separation occurs: the classification error is close to 50% for almost all windows, meaning that the data is either randomly classified or all classified with the same label. Another observation can be made from the top plot in figure 23: when the dimension of the data becomes too large, i.e. a window length of more than 540 ms, the classifier will in some cases give an error of 100% for the training data. This is due to the fact that FLD, unlike SVM, is not robust to data of high dimension (Lotte, Congedo, Lécuyer, Lamarche, & Arnaldi, 2007).
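The window sweep described above can be sketched as follows. This is an illustrative NumPy/scikit-learn version (the actual analysis used MATLAB): `LinearDiscriminantAnalysis` plays the role of FLD, and the synthetic epochs, the 100-200 ms discriminative offset, and the 10-sample sweep step are all assumptions for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

fs = 256
rng = np.random.default_rng(7)

def make_epochs(n, sep):
    """Synthetic two-class 700 ms epochs; classes differ by an offset in
    the 100-200 ms post-event range when sep > 0. Labels alternate, as
    the checkerboard colors do."""
    X = rng.normal(0, 1, (n, int(0.7 * fs)))
    y = np.arange(n) % 2
    X[y == 1, int(0.3 * fs):int(0.4 * fs)] += sep
    return X, y

X_train, y_train = make_epochs(140, sep=1.0)
X_test, y_test = make_epochs(90, sep=1.0)

# Sweep window start and length (in samples), training an FLD on each
# window and recording the test error, as in figure 23.
best_err, best_win = 1.0, None
for start in range(0, 150, 10):
    for length in range(10, 180 - start, 10):
        clf = LinearDiscriminantAnalysis()
        clf.fit(X_train[:, start:start + length], y_train)
        err = 1 - clf.score(X_test[:, start:start + length], y_test)
        if err < best_err:
            best_err, best_win = err, (start, length)
print(best_err, best_win)
```

In this synthetic setting the sweep finds the window covering the discriminative interval; on the real data in figure 23 no window separates the test sets, which is exactly what the sweep is designed to reveal.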


Figure 23: Plots showing the classification error using different windows of the data. The top plot shows the error on the training set, the middle plot is for test data set 1, and the bottom plot is for test data set 2. Each point in the plots shows the classification error for a classifier trained and tested on a specific window of the data. The classification error can be read from the colorbar to the right (dark blue: 0%-20%, light blue: 20%-40%, green: 40%-60%, yellow: 60%-80%, orange: 80%-100%). The vertical axis is the start point of the chosen window and the horizontal axis is the length of the window. The plots are not valid for interval start + interval length > 500 ms (the lower right half). The plots for the test data sets also show the location and value of the minimal and maximal classification errors. The classifier used in these plots is FLD, and the data set used is the processed data values. From the top plot it can be seen that with a sufficient window length (above 300 ms) a good separation of the training data can be obtained (less than 20% error). If the window becomes too long (more than 540 ms) the classifier yields erroneous results, giving a classification error of either 0% or 100%; this shows that FLD is not robust to data of high dimension. Looking at the test data sets, nearly all windows yield a classification error of 40%-60%; in other words the classifier does not give any usable results. Although the minimum and maximum errors deviate somewhat from 50%, this is not significant.

In figure 23 the processed data values were used as the feature. Figure 24 shows the results when using the primary ICA component instead. It is seen that this does not dramatically improve the results. A slightly interesting area in the plot for test data set 2 can be seen when the window starts at 300-400 ms and the window extends to the end of the epoch. In this area the error is above 60%, with a maximum error of 70%. The location of this area might indicate that the values of the ICA component around 450 ms after the event are giving this result.

Figure 24: This plot has been made using FLD as classifier like in figure 23, but using the ICA data instead of the processed data. These plots show almost the same results as the previous figure, although for test data set 2, with a window start of about 400 ms and window lengths of 10 ms to 100 ms, an error above 60% is seen. A maximum error of 70% is also seen in this same area.


Figure 25: Classification results using linear SVM as classification method. The regularization constant C is set to 10. The top plot shows that, unlike FLD, no error occurs with data of high dimensionality. The two lower plots show no conclusive results, i.e. almost all windows give a classification error of 50%.

Next, another classification method is tried, namely linear SVM. From the top plot in figure 25 (the training data) it is seen that, unlike FLD, no error occurs when a high-dimensional feature vector is used. This is expected of the SVM method, which is robust to high-dimensional data. A regularization constant of 10 (the default value in the STPR toolbox) has been used. Other values of the regularization constant have also been tried, but with no significant changes in the results (figures not shown). Using ICA as the feature vector has also been tried, likewise with no significant changes (results not shown).


Figure 26: Classification results using Gaussian SVM as classification method. The parameters are set to the default values used in the STPR toolbox: regularization constant C = 10 and σ = 1. The top plot shows that, for almost all window sizes and locations, an almost perfect division of the training data is obtained. When testing the classifier on the test data sets (middle and bottom plots), a classification error of 50% is seen in almost all cases. This might signify overfitting of the training data.

The final classification method used is the Gaussian SVM. To begin with, the default values for the regularization constant (C = 10) and the σ of the kernel function (σ = 1) are used. The results are shown in figure 26. From the top plot it is seen that even the shortest window length (i.e. a low-dimensional feature vector) yields an error close to 0% when training the classifier. This might signify that the decision boundary of the classifier is too flexible, meaning that the data points of the training data set may be overfitted. Looking at the results from the test data sets, almost all window sizes give a classification error of exactly 50%. This is because all observations of the test data sets are assigned the same label.

In order to prevent overfitting of the training data, the value of σ is increased. This leads to a less flexible decision boundary, which does not fit as closely to the individual data points. Furthermore, the regularization constant is increased, meaning that classification errors are penalized more heavily. Figure 27 shows the results when using a regularization constant of 75 and a σ of 50. This combination of parameters was found to give the most conclusive results. Looking at the results for test data set 1, a relatively large area with a classification error below 40% is seen, compared to the previous figures. This area also contains the smallest classification error (31%) of all the tried classifications. For test data set 2, a large fraction of the possible windows gives a classification error above 60%, and a maximum classification error of 73% is seen in this data set, larger than in any of the other tried classifications. The location of the area of high classification error suggests that the data values from 150 ms to 350 ms after the event might be the significant values for the classifier, although values before the event are also necessary for good results.

In order to get comparison values for the minimum and maximum errors, the exact same run is made, with the only difference that the labels are randomly set before calculating each point in the plot. In this random run, a minimum classification error of approximately 30% and a maximum error of 70% are found.
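The Gaussian SVM settings above can be sketched with scikit-learn, where the kernel width σ of exp(-||x - x'||² / (2σ²)) maps to `gamma = 1/(2σ²)`. This is an illustrative example with synthetic, weakly separated feature windows, not the STPR-toolbox run from the figures; C = 75 and σ = 50 are taken from figure 27:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(8)
# Synthetic two-class feature windows: 140 training and 90 test epochs,
# with a weak class offset (labels alternate like the checkerboard colors).
X_train = rng.normal(0, 1, (140, 50))
y_train = np.arange(140) % 2
X_train[y_train == 1] += 0.3
X_test = rng.normal(0, 1, (90, 50))
y_test = np.arange(90) % 2
X_test[y_test == 1] += 0.3

# Map the Gaussian kernel width sigma to scikit-learn's gamma parameter.
C, sigma = 75.0, 50.0
clf = SVC(C=C, kernel="rbf", gamma=1.0 / (2 * sigma ** 2))
clf.fit(X_train, y_train)
err = 1 - clf.score(X_test, y_test)
print(round(err, 2))
```

A large σ flattens the kernel and makes the boundary nearly linear, which is exactly the overfitting countermeasure described above; the higher C then compensates by penalizing training errors more heavily.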

An example of the effect of changing the feature vector can be seen in figure 27. Looking at the results for test data set 1, it can be seen that, for a given set of electrodes and a given classification method, very different results are obtained. Using almost the same window start point (150 ms), a change in window length of only 40 ms changes the classification error from the minimum value of 31% to the maximum value of 62%. This goes to show that blindly selecting a feature vector can produce seemingly random results.


Figure 27: Classification results using Gaussian SVM. The regularization constant is set to 75 and σ to 50. The middle plot (test data set 1) shows a low classification error for several different window sizes and placements; a classification error of 31% is the minimum found for test data set 1. For test data set 2, a high classification error (maximum 73%) is seen for many different window sizes and placements.

Using ICA as the feature vector instead of the data values was also tried under the same conditions as in figure 27. This was found to decrease the significance of the results.

For all the shown classifiers, it was also tried to use channels other than the first subcutaneous channel, as well as using all four subcutaneous channels. This did not improve the results, and in many cases worsened them.

4.2.4 Conclusion on classification of visual stimuli

This section looked at the classification of a visual stimulus in the form of a checkerboard changing color with a frequency of 2 Hz. The goal of the classification is to determine the color of the top-left square of the checkerboard for the individual epochs. With the paradigm of this test, we also know that the checkerboard changes to the other color with each observation. Therefore we can base our conclusion about the checkerboard color on all the observations from a given segment.

From the literature it is known that a light flickering at a certain frequency gives a signal of the same frequency in the visual cortex (Herrmann, 2001). Whether a 2 Hz frequency can be found in the EEG signal is hard to determine, due to the large amount of noise in the low frequency range, as seen in figure 19; this has not been investigated further.

Three different classification methods have been tried: FLD, linear SVM, and Gaussian SVM. Classification with FLD and linear SVM did not produce any noteworthy results.

The processed data values as well as the primary ICA component have been used as features. Using the ICA component does not improve the results, and in some cases worsens them compared to using the data values. This could be because the important part of the data is contained in a component other than the primary one.

Regarding the location and length of the window of data values selected within the epoch, all combinations were tried. This parameter was found to be very important for the performance of the classifier.

The results shown in figure 27 (Gaussian SVM, C = 75, σ = 50) give the most compelling argument for whether this classification is possible. If a guess should be ventured as to the conclusion of this classification exercise, it would be that the second segment of showing the checkerboard starts with the same color as the first segment, and the third segment starts with the opposite color of the first segment. But considering the number of tests that have been tried, as well as the insignificant maximal and minimal classification errors (31% for test data set 1 and 73%, equating to 27%, for test data set 2), this result is not considered significant.


4.3 Classification of motor imagery tasks

The foundation of this section is EEG data measured with a large number of electrodes.

The data is recorded from subjects performing two different cued motor imagery tasks.

Classification of these tasks into the two groups will be done. There are two main

purposes of this section:

1. To find ways to reduce the large amount of recorded data to a smaller feature vector

which more easily can be trained and tested. This will be done by applying various

methods for visualizing the content of the data.

2. To reduce the number of electrodes used in the classification. A large number of electrodes can be advantageous, especially for investigative purposes, e.g. if you are

trying to locate an unknown dipole source. But the approach of this work is to learn

more about EEG for use in a BCI. In this regard a large number of electrodes is

inconvenient and should be avoided if possible. This section will thus focus on whether

the number of electrodes used can be reduced. A further advantage of reducing the number of electrodes is that the computing time and hardware requirements for

calculation can be significantly reduced.

4.3.1 Description of dataset

The data was provided by Fraunhofer FIRST, Intelligent Data Analysis Group, and

Campus Benjamin Franklin of the Charité - University Medicine Berlin, Department of

Neurology, Neurophysics Group (Blankertz, 2004). The data was provided for BCI

competition III, wherein participants try to achieve the best possible classification. The

aim of the experimental paradigm for this dataset (set IVa) was to discriminate trials of

different motor imagery tasks, based on a relatively small training set.

The subjects sat in a comfortable chair with their arms resting on armrests. The subjects did not receive any form of feedback during the testing. Visual cues lasting 3.5 s were shown

on a computer screen. The cues indicated which of 3 motor imageries the subject should

perform: left hand, right hand, or right foot. Only trials for the classes right hand and

right foot are used in the analysis. Between the cues were relaxation periods of random

length (1.75 s to 2.25 s). Two different types of visual stimulation were used: either the

target cue on the screen was stationary or moving. 280 trials (140 hand trials, and 140

foot trials) are provided for each of five subjects. The 280 trials are divided into training

and test sets. In this analysis only one of the subjects (designated al) is used.

For the recording a 128-channel Ag/AgCl electrode cap was used. 118 EEG channels were

measured at positions established by the extended international 10/20 system as seen


on figure 28 (Oostenveld & Praamstra, 2001). The data was band-pass filtered between

0.05 and 200 Hz and digitized at 100 Hz with 0.1 μV accuracy.

Figure 28: Locations of electrodes in BCI competition data set IVa. 118 electrodes are shown labeled according to the extended international 10/20 system. Figure made in EEGLAB. The electrode system is described in (Oostenveld & Praamstra, 2001).

4.3.2 Preprocessing

The provided data is processed by the following steps:

1. The data is imported into EEGLAB.

2. For each channel the location was provided by a label, as per the extended

10/20 system. The label is translated to coordinates by EEGLAB on a head

model.

3. Unless otherwise noted, no filters (for cancelling noise) were used.

4. The data is divided into epochs according to the start of the visual cues. Each

epoch starts at -1 second and ends 3.5 seconds from the start of the cue.

5. Low frequency drifts and artifacts are removed by removing the mean baseline

(based on the pre-event interval) from each epoch individually.

6. No artifact rejection was run, in order to get results comparable with other

literature on this dataset.
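Steps 4 and 5 above can be sketched as follows. This is a minimal numpy illustration of epoching and baseline removal on synthetic data; the thesis used EEGLAB for these steps, and the array sizes here are arbitrary:

```python
import numpy as np

def epoch_and_baseline(data, fs, cue_samples, t_min=-1.0, t_max=3.5):
    """Cut continuous data (channels x samples) into epochs around each cue
    and subtract each epoch's pre-event mean, channel by channel."""
    n_pre = int(round(-t_min * fs))
    n_post = int(round(t_max * fs))
    epochs = []
    for cue in cue_samples:
        ep = data[:, cue - n_pre:cue + n_post].astype(float)
        baseline = ep[:, :n_pre].mean(axis=1, keepdims=True)  # pre-event interval
        epochs.append(ep - baseline)
    return np.stack(epochs)  # shape: (n_epochs, n_channels, n_samples)

# Toy example: 2 channels, 100 Hz sampling, two cue onsets.
fs = 100
data = np.cumsum(np.random.default_rng(1).normal(size=(2, 2000)), axis=1)  # drifting signal
epochs = epoch_and_baseline(data, fs, cue_samples=[500, 1200])
print(epochs.shape)                                      # (2, 2, 450)
print(np.allclose(epochs[:, :, :100].mean(axis=2), 0))   # True: baselines removed
```

Subtracting the pre-event mean per epoch removes slow drifts without a filter, which is why step 3 can skip noise filtering.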

4.3.3 Selection of electrodes

As described this data set contains recordings from 118 electrodes. In the paradigm used

in this experiment, we are looking for a response specifically in the sensorimotor cortex.


Thus, it might be possible to reduce the number of electrodes needed for the

classification.

As described in section 2.12.1 and section 2.6.2, specific areas of the sensorimotor

cortex show decreased activity when imagining a movement of specific body parts. For

instance imagining movement of the right hand leads to decreased power in the

contralateral sensorimotor cortex (left side), possibly in the area corresponding to the

electrode labeled C3. The decreased activity corresponding to the imagined foot

movement is located in the central area (possibly close to Cz).

Electrodes in the areas around C3 and Cz are considered likely candidates for the best

possible electrodes. This is also shown in (Wang, Gao, & Gao, 2005), where the authors

find that electrodes C3 and CPz are the best channels for calculating a feature based on

ERD, and Cz and FCz are best candidates for a ERP based feature. Based on prior

knowledge as well as the results from (Wang, Gao, & Gao, 2005) the following

electrodes are investigated in closer detail: the three central electrodes CPz, Cz, and FCz; the electrode above the hand motor cortex, C3; as well as the four electrodes closest to C3: FCC3, C5, C1, and CCP3.

4.3.4 Extraction of features

In order to find features which can separate the two groups of trials, several

visualizations are used. In the visualizations the data is divided into the two groups:

Hand imagery and foot imagery. Although the classification needs to work on single trial

data, a look at plots averaged over all trials might give a good idea of which features are

useful. But one has to keep in mind that any discovered differences based on averaging

might not be useful as a feature for classification, since the variance might be so large as

to cause too many errors in the classification.

Figure 29 shows the ERP from the eight different channels. This is plotted in order to see

if a simple difference between ERP (see section 2.6.4) can be found in any of the

channels. As can be seen, all eight channels have some difference between hand and

foot imagery, but the most distinct difference is seen in channels Cz and FCz.

Interestingly, a much smaller difference between the two groups is seen in electrode C3.


Figure 29: ERP plots from 8 different channels. The blue line is the ERP from hand imagery, the green line is from foot imagery, and the red line is the difference between the two. The data has been low-pass filtered at 7 Hz in order to give smoother curves. Although all channels show a difference between hand and foot imagery, the most marked difference is seen in channels Cz and CPz.

Next, the time-frequency characteristics of the same eight channels are investigated.

Again each plot is the average of all trials. Only the channels exhibiting the greatest

difference between the two conditions are shown. In figure 30 the ERSP plot for

electrode C3 is displayed. A marked difference between the two conditions is seen,

especially the frequency range 11-14 Hz (mu rhythm). It can also be noted, that as

expected for an electrode above the hand area of the motor cortex, the power (above 8

Hz) drops when hand motor imagery is performed. This decrease in power seems to be

divided into three distinct frequency bands: Lower mu band or alpha rhythm (7-9 Hz), a

higher mu band (11-14 Hz), and a beta band (16-20 Hz). For foot motor imagery an

increase in the power compared to baseline conditions of the mu rhythm is seen – the

cause for this increase is unknown.


Figure 30: ERSP plot for electrode C3 located over the hand motor cortex (see description of this type of plot in section 3.2.2). Condition 1 is hand motor imagery and condition 2 is foot motor imagery. The panel to the right shows condition 1 minus condition 2. A significant difference is seen in the frequency range 11-14 Hz starting at 1000 ms. The difference is caused by a decrease of power (ERD) in the hand motor imagery tasks as well as an increase in power (ERS) during the foot motor imagery tasks.

Figure 31 and figure 32 display the ERSP plots for the centrally located electrodes Cz and CPz, respectively. A significant difference is seen in the beta band, but at a higher

frequency (19-24 Hz) than was seen in the C3 electrode (16-20 Hz). As expected the

trials for the foot motor imagery show a decrease in power in this higher beta band

shortly after the event (400 ms to 1400 ms). Another detail to take note of in these

figures is the early low frequency component (0-3 Hz, 400-1000 ms). This corroborates

the ERP findings seen in figure 29. The greater difference between hand and foot motor imageries in electrodes Cz and CPz compared to C3 is also reflected in these plots: the low frequency component is less pronounced in electrode C3.

When looking at the foot imagery conditions for the electrodes that supposedly are

positioned close to the foot motor cortex, no significant ERD is seen in the mu band as

would be expected for a motor imagery. Only electrode CPz shows any kind of

desynchronization in this band, but the desynchronization is seen for both conditions

with no significant difference between the two.


Figure 31: ERSP plot for electrode Cz close to the centrally located foot motor cortex. Condition 1 is hand motor imagery and condition 2 is foot motor imagery. A marked difference between the two conditions is seen in the 19-24 Hz frequency band. The cause of this difference is an ERD during the foot imagery condition. A marked difference between the two conditions is seen in a low frequency (0-3 Hz) component 400-1000 ms after the event.

Figure 32: ERSP plot for electrode CPz located close to the foot motor cortex. The results are almost the same as in figure 31, although a less marked difference between the two conditions in the 19-24 Hz window can be noticed.

Based on the found differences, two sets of features are selected. One set is based on

the ERP, the other on the ERSP. The feature based on the ERP is calculated simply by low-pass filtering the EEG signal at 7 Hz. In order to reduce the dimensionality of the feature

vector, each of the epochs is then divided into 8 time windows of 500 ms (-1000 ms to

3000 ms) and the mean of each window is used as a feature.


The same time windowing is used for the ERSP based feature. Two different frequency

windows are selected: One at 11-14 Hz and another at 19-24 Hz. The mean of each of

the two-dimensional windows (combination of time window and frequency window) is

found and used as a feature.

If both feature types are used, each channel provides 24 features.
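The feature construction described above - 8 ERP window means plus 8 window means for each of the two ERSP frequency bands, giving 24 features per channel - might be sketched like this. The ERP and ERSP inputs below are random placeholders, not the thesis data:

```python
import numpy as np

def window_means(x, n_windows=8):
    """Mean of each of n_windows equal-length time windows of a 1-D signal."""
    return np.array([w.mean() for w in np.array_split(x, n_windows)])

def channel_features(erp_signal, ersp_map, freqs):
    """8 ERP features + 8 features per ERSP frequency band = 24 per channel.
    erp_signal: low-pass-filtered epoch, shape (samples,)
    ersp_map:   time-frequency power, shape (freq_bins, samples)
    freqs:      center frequency of each ERSP bin (Hz)"""
    feats = [window_means(erp_signal)]
    for f_lo, f_hi in [(11, 14), (19, 24)]:   # mu and beta windows from the text
        band = ersp_map[(freqs >= f_lo) & (freqs <= f_hi)].mean(axis=0)
        feats.append(window_means(band))
    return np.concatenate(feats)

rng = np.random.default_rng(2)
erp = rng.normal(size=450)                 # one 4.5 s epoch at 100 Hz
ersp = rng.normal(size=(30, 450))          # 30 frequency bins (placeholder)
freqs = np.linspace(1, 30, 30)
print(channel_features(erp, ersp, freqs).shape)  # (24,)
```

Averaging over fixed time-frequency windows is what keeps the feature vector small enough for the classifiers in the next section.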

4.3.5 Classification

The two groups of features described in the previous section can now be applied to a

classifier. Three different classifiers have been tested, namely Fisher’s linear

discriminant, linear SVM, and Gaussian SVM. The regularization parameter of both the

linear SVM and the Gaussian SVM has been set to 10^4, and σ for the Gaussian SVM has also been set to 10^4. These parameters were found to yield the best results in this case.

The classification accuracy has been determined by leave-one-out cross-validation.
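For illustration, Fisher's linear discriminant combined with leave-one-out cross-validation can be sketched in a few lines of numpy. This is a generic sketch on synthetic, well-separated data, not the toolbox implementation used in the thesis:

```python
import numpy as np

def fld_train(X, y):
    """Fisher's linear discriminant: w = Sw^-1 (m1 - m0), threshold midway."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(X.shape[1]), m1 - m0)  # small ridge for stability
    b = -0.5 * w @ (m0 + m1)
    return w, b

def loo_accuracy(X, y):
    """Leave-one-out cross-validation: train on all trials but one, test on it."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        w, b = fld_train(X[mask], y[mask])
        hits += int((X[i] @ w + b > 0) == (y[i] == 1))
    return hits / len(y)

# Synthetic stand-in for 80 trials with 24 features each.
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0.0, 1, (40, 24)), rng.normal(1.5, 1, (40, 24))])
y = np.repeat([0, 1], 40)
print(loo_accuracy(X, y) > 0.9)
```

Leave-one-out is attractive here because the training set is small: every trial but one contributes to training in each fold.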

The results of the classifications are seen in table 2 using different combinations of features (ERP, ERSP, or both) and different combinations of channels for which the features have been calculated.

The following channel combinations have been tried:

1) All 118 electrodes

2) CPz, Cz, FCz, C3, FCC3, C5, C1, CCP3

3) C5, C3, C1, Cz

4) FCz, Cz, CPz for ERP and C3, Cz, CPz for ERSP

Combination 1 is chosen in order to see how the classification algorithms handle feature

vectors of relatively high dimension. Combination 2 is the eight electrodes described

earlier. Combination 3 consists of four electrodes located in a straight line from the ear

to the top of the head. This combination is chosen to simulate the location of electrodes

that might be obtained by a device such as Hypo-Safe’s. Combination 4 is the electrodes

found to give the best classification accuracy with the FLD classifier.


Channels                                ERP        ERSP       Both
All 118 channels                        48/49/56   45/80/77   45/75/71
CPz, Cz, FCz, C3, FCC3, C5, C1, CCP3    90/54/69   84/93/94   90/90/88
C5, C3, C1, Cz                          87/50/68   88/89/90   92/91/88
Optimal combination                     89/55/74   86/85/89   94/88/87

Table 2: Classification accuracy (%) using different combinations of selected channels and different combinations of selected features. The three values in each cell are the accuracies obtained with Fisher's linear discriminant, linear SVM, and Gaussian SVM, respectively. The optimal combination of channels in the bottom row uses the ERP feature calculated on channels FCz, Cz, and CPz and the ERSP feature calculated only on channels C3, Cz, and CPz.

From the classification results in table 2 it is seen that using all channels with FLD cannot

deliver a working classifier. This is as expected because FLD does not handle feature

vectors of high dimension well (Lotte, Congedo, Lécuyer, Lamarche, & Arnaldi, 2007).

This result corroborates the finding made in section 4.2.3.

The results in table 2 also show that the SVM classifiers deliver relatively poor accuracies when using only the ERP feature. The reason for this is unknown, but it can most likely be corrected by adjusting the classification parameters (C and σ). This probably holds in general for the SVM results: tuning the parameters might improve each of the classification accuracies.

4.3.6 Conclusion on motor imagery classification

This section has looked at the classification of motor imagery tasks. The data is recorded

with a large number of electrodes (118). The goal of this section is to find methods that

will lower the dimensionality of the data and furthermore investigate whether a good

classification can be made with a small number of electrodes.

Two methods for visualizing the data are used: ERP plots and ERSP plots. These are used

to select the subset of electrodes which exhibit the greatest amount of variability

between the two classes of motor imagery. The electrodes situated on top of the

relevant areas of the motor cortex are investigated further with the two plots. Eight

electrodes are selected based on the two plot types, and out of these, four look the most promising. The ERP and ERSP plots are then used to find the time and

frequency intervals of the data that contain the greatest difference between the two

classes. This is done in order to reduce the dimensionality of the feature vector by taking

the mean of specific windows in the plots.


Although the chosen method of reducing the feature vector was not compared with any other method, it seems to provide reasonable classification accuracy. A secondary benefit of reducing the size of the feature vector is that the training and testing of the classifier can be performed with considerably less computing power.

The best classification accuracy achieved is 94%. This result was obtained either by Gaussian SVM classification using only the ERSP feature on 8 channels, or by FLD classification using both features on only 3 channels per feature group (4 unique channels).

The achieved accuracy can be compared with results from the BCI competition

(Blankertz, 2004). In the competition there is data from five subjects, but only one of the subjects has been used in this work. Furthermore, LOO cross-validation has been used in this work but not in the competition results, making the results not directly comparable. Looking only at the same subject as was used in this section, the 14 participants in the competition achieved accuracies between 53.6% and 100%. The obtained accuracy of 94% would correspond to a 7th-9th place.

All in all this section showed that it is possible to reduce the number of electrodes to 4 while still maintaining a classification accuracy of 94%. If an additional constraint is put on the selection of electrodes - that they must lie on a straight line on the scalp - a classification accuracy of 92% is achieved. This constraint is meant to simulate the limitations Hypo-Safe's device has in electrode placement. By carefully selecting the features these classifications can be done with the fast classification algorithm FLD.


5 Discussion

As described in the introduction, the focus of this study is to examine the possible

problems and limits of using a device like the one Hypo-Safe is developing at the

moment. Two main areas have been considered: the validity of the data recorded with

subcutaneous electrodes and the limits encountered in classification of data by using

only a few electrodes.

5.1 Validity of data from subcutaneous electrodes

In section 4.1 the data recorded from subcutaneous electrodes is examined in

comparison with data recorded simultaneously from ordinary surface electrodes. Only a

limited amount of data is available, making any conclusion of this comparison uncertain.

An additional factor that possibly reduces the certainty of the conclusion is the following

issue: The subcutaneous electrode used in the experiment was inserted under the skin

and the wire attached to ordinary EEG recording equipment. Consequently, one might

imagine that the presence of an open puncture wound might influence the collection of

the data. Oozing blood around the wound might cause artifacts and give a poor

electrode contact. The subcutaneous electrode closest to the puncture site was found to

yield the poorest results. This may corroborate the theory that oozing blood causes artifacts or poor electrode contact. The oozing blood will not be present if both the electrode and the recording device are implanted, and the skin has healed tightly around the

electrode.

As concluded in section 4.1.7 the data obtained from subcutaneous electrodes is very

similar to the data obtained from surface electrodes. But a number of small differences

between the subcutaneous and surface electrodes are seen (more artifacts, less noise,

and higher amplitude in the subcutaneous electrodes). Considering the small amount of data as well as the lack of confidence in the results, these differences are not significant enough to conclude whether they speak in favor of the subcutaneous electrodes or the surface electrodes.

5.2 Limits in classification performance using few electrodes

In section 4.2 a classification of VEP events using between 1 and 4 electrodes is

performed. Section 4.3 describes a classification of motor imagery events using data

from between 4 and 118 electrodes.

The classifiers found in section 4.2 cannot reliably classify the data from the few closely spaced subcutaneous electrodes, but the reason that the classification did not yield conclusive results is not known. It can be due to the electrodes not being placed in the area of the


brain responsible for the needed signal; it can be because the effect is simply not

measurable with EEG at all; or perhaps that more than the few electrodes supplied are

needed in order to record the differences.

The results from section 4.3 show that if the features and electrodes are chosen carefully, FLD produces very good results. SVM has the advantage that you can take any

combination of electrodes and get a reasonable classification without optimization.

These results also show that even when reducing the number of electrodes from 118 to

4 a good classification is still obtainable. Even though it is shown that a few electrodes are enough to get a good classification, this result is not directly comparable with the

recordings that will be made with Hypo-Safe’s device. The spatial distribution of the

subcutaneous electrodes in the device is much tighter than the spatial distribution of the

four electrodes in the motor imagery classification.

The tight spatial distribution can perhaps prove to be a problem if the device is to be

used for other purposes such as a BCI system. One way to solve this would be to attach a

longer electrode to the device if possible. Another way to solve the problem of close

spatial distribution would be to select other motor imageries than hand and foot.

Tongue and lips are an example of a pair of motor imageries that are closer together on

the motor cortex as shown in figure 3 on page 7.

5.3 Improvements

This section suggests various improvements that can be made to the analysis performed

in section 4.

Comparison of the subcutaneous and surface electrodes by looking at the continuous data: Further knowledge about the subcutaneous electrodes might be obtained by employing comparison methods usable on the continuous data. This would allow one to look at all the data and not just the epoched data. An example of such a method is scatter plots, which can compare the data values of each electrode with those of the other electrodes. This can be done using the plotmatrix function from MATLAB.
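A numeric counterpart of such a scatter-plot comparison is the pairwise correlation between channels. A minimal sketch, with synthetic signals standing in for the surface and subcutaneous recordings:

```python
import numpy as np

def channel_correlations(data):
    """Pairwise Pearson correlations between channels of continuous data
    (channels x samples) - the numeric counterpart of a plotmatrix view."""
    return np.corrcoef(data)

# Two hypothetical channels sharing a common source plus independent noise.
rng = np.random.default_rng(5)
common = rng.normal(size=5000)
surface = common + 0.3 * rng.normal(size=5000)
subcut = common + 0.3 * rng.normal(size=5000)

R = channel_correlations(np.vstack([surface, subcut]))
print(R[0, 1] > 0.8)  # True: the shared source dominates
```

A high off-diagonal correlation would indicate that the two recording methods pick up essentially the same signal, which is the question section 4.1 tries to answer on epoched data.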

ERSP plots for VEP classification: The classification of the VEP trials is only performed with the data values as the feature vector. Investigating specific time-frequency differences between the two classes might provide new information.

PCA before FLD: In a few cases the FLD classification gave poor results due to the

dimensionality of the feature vector being too high. This might be alleviated by

reducing the dimensionality beforehand with, for instance, PCA or ICA.
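PCA via the singular value decomposition is straightforward to sketch; the component count and data below are arbitrary placeholders, not values from the thesis:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project feature vectors onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T               # scores in the reduced space

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 118 * 24))   # e.g. 80 trials, 118 channels x 24 features
X_red = pca_reduce(X, n_components=20)
print(X_red.shape)  # (80, 20)
```

With far fewer dimensions than trials, the within-class scatter matrix that FLD inverts becomes well conditioned, which is exactly the failure mode seen with all 118 channels.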


Examination of the weight vector: One way to examine the importance of the

individual features in the feature vector would be to look at the weight vector

of the classifier and remove features that are less important.
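A sketch of such weight-based feature pruning; the weights and the fraction kept are made up for illustration:

```python
import numpy as np

def rank_features(w, keep_fraction=0.5):
    """Indices of the features with the largest absolute classifier weights."""
    order = np.argsort(np.abs(w))[::-1]           # largest |weight| first
    n_keep = max(1, int(len(w) * keep_fraction))
    return np.sort(order[:n_keep])                # indices of features to keep

w = np.array([0.02, -1.3, 0.4, -0.05, 0.9, 0.1])  # hypothetical FLD weight vector
print(rank_features(w, keep_fraction=0.5))        # [1 2 4]
```

For a linear classifier such as FLD the magnitude of each weight is a rough importance measure (assuming comparably scaled features), so the low-weight features are natural candidates for removal.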

Artifact rejection combined with classification: You might obtain better results

by investigating epochs with an artifact rejection method before classifying. The

individual trials could be labeled with a measure of the certainty that they are artifactual epochs. This would allow a further investigation of trials which have

been classified with the wrong label.

5.4 Perspective

It seems possible that Hypo-Safe's device might be usable as the foundation of a BCI

system as well as many other applications. Some problems still exist though. Further

research into the limits of the low spatial resolution of the electrodes is needed. In

relation to this a good control input to such a BCI system needs to be found. A more practical issue is the calibration of the BCI system. When establishing a BCI, long calibration sessions are usually needed in order to find the specific features in the data that give the best classification for a given subject. If a BCI system is to be put into

widespread clinical use, this is a problem that needs to be addressed. The generalized

methods for feature selection used in this work, e.g. ERSP plots, might be advantageous

for selecting exactly the best feature for a given subject. The selection of features based

on such a measure can be (semi)automated. Recently, work has been done on reducing the time needed for the calibration session by using well-established knowledge of the EEG changes in response to the function used to control the BCI (Blankertz, Dornhege, Krauledat, Müller, & Curio, 2007). Use of prior knowledge combined with an automated system might reduce the time needed for calibration of a BCI system.


6 Conclusion

This section sums up the conclusions from each of the three data analysis sections.

Further details about the conclusions can be seen in sections 4.1.7, 4.2.4, and 4.3.6 on pages 38, 47, and 56, respectively.

The use of closely spaced subcutaneously implanted electrodes for EEG recording is

examined.

A comparison between conventional electrodes and subcutaneous electrodes is made.

Only a limited amount of data material is available. Several methods are employed for

the comparison: frequency spectra, ERP, the amount of artifacts, and ICA

decomposition. The analysis shows that the data recorded from the two different

recording methods is almost identical, although some differences are found. The found

differences do not give a clear picture of whether the subcutaneous electrodes provide

better or worse data compared to the conventional electrodes.

A classification of two different data sets is done in order to investigate the use of a limited number of electrodes: 1) Classification of a data set containing visual evoked

potential (VEP) trials is performed by three different classification methods: Fisher’s

linear discriminant (FLD), linear support vector machines (SVM), and Gaussian SVM. A

good classification from the supplied data is not possible. 2) Classification of a data set

containing tasks based on motor imagery is performed. FLD, linear SVM, and Gaussian

SVM are used as classifiers. Feature extraction is performed on the basis of event

related potential (ERP) and event related spectral perturbation (ERSP). Using only four

electrodes a classification accuracy of 94% is obtained. The results from the second

classification show that it is possible to perform a successful classification using only a

few electrodes.


7 References

Aurlien, H. (2004). EEG background activity described by a large computerized database.

Clinical Neurophysiology , 115 (3), 665-673.

Bell, A. J., & Sejnowski, T. J. (1995). An information-maximization approach to blind

separation and blind deconvolution. Neural Comput , 7 (6), 1129-1159.

Blankertz, B. (2004, 12 12). BCI Competition III. Retrieved 02 25, 2009, from BCI

Competition III: http://ida.first.fraunhofer.de/projects/bci/competition_iii/

Blankertz, B., Dornhege, G., Krauledat, M., Müller, K. R., & Curio, G. (2007). The non-

invasive Berlin Brain-Computer Interface: fast acquisition of effective performance in

untrained subjects. NeuroImage , 37 (2), 539-550.

Cotterill, R. (1998). Enchanted Looms: Conscious Networks in Brains and Computers.

Cambridge University Press.

Delorme, A., & Makeig, S. (2004). EEGLAB: an open source toolbox for analysis of single-

trial EEG dynamics including independent component analysis. J Neurosci Methods , 134

(1), 9-21.

Dubuc, B. (2009, 03 15). The Motor Cortex: The Brain From Top to Bottom. Retrieved 03

15, 2009, from The Brain From Top to Bottom: http://thebrain.mcgill.ca/

Emotiv Systems. (2009, 03 02). Retrieved 03 02, 2009, from Emotiv Systems:

http://emotiv.com/

Fabiani, M., Gratton, G., & Coles, M. G. (2000). Event-related brain potentials - methods,

theory, and applications. (J. T. Cacioppo, L. G. Tassinary, & G. G. Bertson, Eds.)

Cambridge: University Press.

Franc, V., & Hlavac, V. (2008, 08 15). Statistical Pattern Recognition Toolbox for Matlab v. 2.09. Retrieved 12 03, 2008, from Statistical Pattern Recognition Toolbox for Matlab:

http://cmp.felk.cvut.cz/cmp/software/stprtool/

Herrmann, C. S. (2001). Human EEG responses to 1-100 Hz flicker: resonance

phenomena in visual cortex and their potential correlation to cognitive phenomena.

Experimental Brain Research , 137 (3), 346-353.

Hypo-Safe. (2009, 02 23). Retrieved 02 23, 2009, from Hypo-Safe:

http://www.hyposafe.com


Kamphuisen, H. A., Dulken, H., Janssen, A. J., Huyser, W. W., Sweden, B., Kemp, B., et al.

(1991). Sleep-Wake Recording of Forty Day Duration Using Subcutaneous Electrodes.

Epilepsia , 32 (3), 347-350.

Lee, T.-W., & Sejnowski, T. J. (1998). Independent component analysis for mixed

subgaussian and super-gaussian sources. Technical report, Computational Neurobiology

Lab, The Salk Institute, La Jolla .

Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., & Arnaldi, B. (2007). A review of

classification algorithms for EEG-based brain-computer interfaces. J. Neural Eng. , 4 (2),

1.

Lunts, A., & Brailovskiy, V. (1967). Evaluation of attributes obtained in statistical decision

rules. Engineering Cybernetics , 3, 98-109.

Makeig, S. (1993). Auditory event-related dynamics of the EEG spectrum and effects of

exposure to tones. Electroencephalography and clinical neurophysiology , 86 (4), 283-

293.

Makeig, S., Bell, A. J., Jung, T.-P., & Sejnowski, T. J. (1996). Independent component

analysis of electroencephalographic data. in Advances in Neural Information Processing

Systems, 8, pp. 145-151.

Markram, H. (2009, 03 15). Gallery: Blue Brain Project. Retrieved 03 15, 2009, from Blue

Brain Project: http://bluebrain.epfl.ch/

Megias, M., Emri, Z., Freund, T. F., & Gulyas, A. I. (2001). Total number and distribution

of inhibitory and excitatory synapses on hippocampal CA1 pyramidal cells. Neuroscience

, 102 (3), 527-540.

Nunez, P. L. (2005). Electric Fields of the Brain: The Neurophysics of EEG. Oxford Univ Pr

(Txt).

Nuwer, M. R., Comi, G., Emerson, R., Fuglsang-Frederiksen, A., Guerit, J.-M., Hinrichs, H.,

et al. (1998). IFCN standards for digital recording of clinical EEG. Electroencephalography

and Clinical Neurophysiology , 106 (3), 259-261.

Odom, J. V., Bach, M., Barber, C., Brigell, M., Marmor, M. F., Tormene, A. P., et al.

(2004). Visual evoked potentials standard (2004). Documenta ophthalmologica.

Advances in ophthalmology , 108 (2), 115-123.


Oostenveld, R., & Praamstra, P. (2001). The five percent electrode system for high-

resolution EEG and ERP measurements. Clinical neurophysiology : official journal of the

International Federation of Clinical Neurophysiology , 112 (4), 713-719.

Pfurtscheller, G., & Lopes da Silva, F. H. (1999, November). Event-related eeg/meg

synchronization and desynchronization: basic principles. Clin Neurophysiol , 110 (11), pp.

1842-1857.

Pfurtscheller, G., Brunner, C., Schlogl, A., & Lopes da Silva, F. H. (2006). Mu rhythm

(de)synchronization and EEG single-trial classification of different motor imagery tasks.

NeuroImage , 31 (1), 153-159.

Srinivasan, R., Winter, W. R., Ding, J., & Nunez, P. L. (2007). EEG and MEG coherence:

Measures of functional connectivity at distinct spatial scales of neocortical dynamics. J

Neurosci Methods , 166 (1), 41-52.

University of Colorado at Boulder. (2009, 03 15). Functional organization of the brain.

Retrieved from University of Colorado at Boulder - Department of Integrative

Physiology: http://www.colorado.edu/intphys/

Wang, Y., Gao, S., & Gao, X. (2005). Common Spatial Pattern Method for Channel Selection in Motor Imagery Based Brain-Computer Interface. Conference proceedings:

... Annual International Conference of the IEEE Engineering in Medicine and Biology

Society. IEEE Engineering in Medicine and Biology Society. Conference , 5, 5392-5395.

Wikipedia - Electroencephalography. (2009, 03 10). Retrieved 03 10, 2009, from

Wikipedia: http://en.wikipedia.org/wiki/Electroencephalography

Wikipedia - Linear discriminant analysis. (2009, February 23). Retrieved from Wikipedia:

http://en.wikipedia.org/wiki/Linear_discriminant_analysis

Wikipedia - Support vector machine. (2009, 02 25). Retrieved from Wikipedia:

http://en.wikipedia.org/wiki/Support_vector_machine

Wurtz, R. H., & Kandel, E. R. (2000). Central Visual Pathways (Fourth ed.). (E. R. Kandel, J.

H. Schwartz, & T. M. Jessell, Eds.) New York: McGraw-Hill.