INPRO Project

Spike Discrimination for Hybrid Bio-CMOS Chip

Semester Thesis

Dominic R. Frutiger, ETH Zurich

Summer Semester 2003

Author

Dominic R. Frutiger

Supervisor

Sadik Hafizovic

Host

Prof. Henry Baltes, PEL, ETH Zurich

Zurich, July 2003

Contents

Preface
Topical Framework
  INPRO: INformation PROcessing by Natural Neural Networks
  Semester Thesis: Task Specification
Biological Background: Information Processing in Neural Networks
  Listening to the Cell: Basic Neural Activity
  Dynamics of Neural Networks: Recent Insights
Technical Background: The Interface
  The Prototype Chip
  The Physical Neuron-Electrode Interface
Multi-Unit Recordings: Critical Issues
Methodical Background: The Challenge of Spike Sorting
  Introduction to the Problem
  Sketching a Recognition System
    Statistical Approaches
    Artificial Neural Networks
    Template Matching
  General Issues
Learning by Doing: A Partial Implementation
  INPRO Status: Available Data
  Preliminary Data Analysis in Mathematica
  Implementation of Unsupervised Clustering
    Mathematical Background
      Feature Extraction Stage: Projection Pursuit and Negentropy Maximization
      Unsupervised Classification Stage: Mixture of Gaussian Model
    Implementation in Matlab
    Results
    Obtained Results and Remaining Challenges
    Discussion of the Method by Kim and Alternative Approaches
Spike Sorting: Necessity and Effectiveness
Lessons Learned
  Summary
  Conclusions and Recommendations
Acknowledgements
References


Preface

This project fascinated me because of my strong interest in biological and artificial intelligence, in both their philosophical and practical aspects. Neural interfacing is one of the most obvious steps towards a deeper understanding of how biological intelligence comes about and of what we can learn when attempting to build artificially intelligent systems.

This can, of course, only be a long-term goal. Contributing a small share to cutting-edge technology such as the INPRO-chip, and learning as much as possible about this field in the rather tight time frame of a semester thesis, was therefore a very attractive prospect.

The goal of INPRO is “to establish a novel method of information processing by combining natural neurons with silicon technology”. In this context, the main purpose of this semester project was to investigate state-of-the-art spike sorting algorithms in the literature and, if possible, to implement the most promising ones in Matlab or Mathematica.

Unfortunately, it was not possible during the project to obtain our own recordings from the prototype chip. While reading and programming, I became increasingly aware of how immensely complicated the analysis of neural signals actually is, and of how unreliable even recent techniques still seem to be. This led me to put much more weight on the theoretical side of the project, and specifically on the biological side. The aim was to learn more about recent findings on neural information processing and their impact on the spike sorting challenge, and thereby to provide a better starting point for future hardware and software considerations.



1. Topical Framework

This section gives the definition of the overall project and the task specification of this semester thesis.

1.1 INPRO: INformation PROcessing by Natural Neural Networks

The official project definition of INPRO reads as follows (quotation):

“The overall goal of INPRO is to establish a novel method of information processing by combining natural neurons with silicon technology. This will be pursued by (a) realizing a prototype of a bioelectronic circuit, and (b) establishing its information processing capabilities.

Sophisticated information processing is required for the huge amounts of data routinely generated today. Taking a cue from nature, this project proposes to combine natural neurons, in themselves highly efficient, parallel processing units, with custom-designed microelectronic chips. Cell colonies will be grown on arrays of electrodes, which will serve to both electrically stimulate and record cellular response.

The Federal Institute of Technology Zurich (ETHZ) will develop novel CMOS integrated circuits the advantage of which is obvious: the cells grow on transducers monolithically integrated with microelectronics on the same chip. The combination of miniaturized transducers (detection by miniaturized electrodes) and on-chip circuitry (biasing, amplification, signal preprocessing, generation of digital outputs etc.) in the immediate vicinity of the signal generating processes will be extremely beneficial. Signal-to-noise ratios of expectedly weak signals will be significantly increased by amplifying and processing the signal while still on-chip, thus eliminating the signal deterioration by parasitic elements. A stable output (extracted features) then will be transmitted to computers, instead of trying to transfer weak electrical signals via wires to external measuring equipment. The application of CMOS technology thus improves the signal-to-noise ratio, allows for on-chip data processing and reduction, and provides a novel tool for cell stimulation. It should also facilitate the implementation of on-board microfluidics, since the technologies used to make microelectronic and microfluidic chips are inherently compatible.”

1.2 Semester Thesis: Task Specification

Sadik Hafizovic, Ph.D. student at the Physical Electronics Laboratory of the ETHZ during the period of this project, is responsible for the final implementation of pattern recognition algorithms in fast hardware. A suitable spike detection algorithm will be realized on a reconfigurable logic device, presumably a Spartan-II FPGA, while higher-level spike sorting will run on a standard PC.

The goal of this semester thesis within the framework of INPRO is (a) to provide background on state-of-the-art approaches related to spike sorting in the broad domain of pattern recognition, and (b) if possible, to implement such algorithms, or parts of them, in standard software like Matlab or Mathematica, both for preliminary testing and to attain a better understanding of the problem.

While working on the project, an additional task became increasingly important: (c) to investigate recent findings concerning information processing in natural neural networks and their possible consequences for the design of spike sorting algorithms and communication protocols.


2. Biological Background: Information Processing in Neural Networks

There is an immense interdisciplinary community worldwide investigating neural interfacing and neural information processing, and a review is beyond the scope of this project. Here, only a brief introduction to the most relevant points can be given.

2.1 Listening to the Cell: Basic Neural Activity

The human brain consists of roughly 10¹¹ nerve cells with up to 10,000 connections each. These so-called neurons are highly specialized, nonspherical cells consisting of a cell body (soma), one long protrusion called the axon, and many shorter dendritic extensions. The axon is basically a tube, coated with a membrane, whose diameter ranges from less than 1 µm to more than 20 µm depending on the cell type [1].

There is a transmembrane resting potential of approximately -60 mV (inside versus outside). This polarization is maintained by the so-called sodium (Na+)-potassium (K+) pump. The membrane potential is time dependent and decays back towards its resting value when the neuron receives no spikes through its numerous connections to other cells via biochemical synapses.

Neural activity, and thus the communication between neural cells, is based on the depolarization of the membrane potential caused by ion flows through opening and closing ion channels. This electrical discharge is referred to as an action potential (AP). The typical duration of an AP is about 1-2 ms, followed by a refractory period of about the same duration [2]. This partially nonlinear voltage change over time creates a characteristic waveform, the so-called spike shape.

On average, roughly 15-20 spikes have to arrive at the soma through its numerous connections within a few milliseconds to trigger an AP of its own. This depends on the cell type and on the sensitivity of the neuron and of its connections. If the neuron is very sensitive, 2-5 spikes might be sufficient, whereas with strongly depressed synapses it can take 100 incoming spikes before a new AP is fired. The maximum spiking frequency of a single neuron is approximately 200 Hz, and the average activity of active neurons is around 5-50 Hz. This is the so-called neural background activity level. Some neurons are also quiet most of the time. Homogeneous in vitro neuronal networks are known to exhibit synchronized bursting events, in which most of the neurons fire rapidly within a short time interval (~200 ms). These bursts are separated by periods of sporadic firing of individual neurons [3]. The shape of an AP changes dramatically within these bursts of activity due to the dynamics of membrane excitability [4]. Recent reports have shown that the mechanism of single action potential generation is very reliable when a suitable stimulation is applied directly to the neuron. Pinato et al. [5] claimed that such direct stimulation can evoke spike trains with a very reproducible pattern. However, they also stated that neurons are connected to each other via biochemical interfaces, the synapses, which in general exhibit high variability [6].
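The threshold behavior described above can be illustrated with a leaky integrate-and-fire (LIF) neuron, a simplified version of the integrate-and-fire concept mentioned later in this report. This is a minimal sketch and not a model used in the thesis. The parameter values are hypothetical:

- time constant: 10 ms;
- depolarization per input spike: 1 mV;
- threshold: 15 mV above rest;
- refractory period: 2 ms.

With these values, roughly 15-20 near-coincident inputs are needed to fire, while the same number of inputs spread out in time is cancelled by the leak.

```python
import math


def lif_output_spikes(input_times_ms, tau_ms=10.0, weight_mv=1.0,
                      threshold_mv=15.0, refractory_ms=2.0,
                      dt_ms=0.1, t_end_ms=100.0):
    """Output spike times (ms) of a leaky integrate-and-fire neuron.

    v is the depolarization relative to the resting potential (mV). It
    decays exponentially towards 0 with time constant tau_ms, and every
    incoming spike adds weight_mv. When v reaches threshold_mv an output
    spike is emitted, v is reset to 0 and inputs are ignored for
    refractory_ms. All default parameter values are hypothetical.
    """
    inputs = sorted(input_times_ms)
    decay = math.exp(-dt_ms / tau_ms)
    v, idx, blocked_until, out = 0.0, 0, -1.0, []
    for step in range(int(round(t_end_ms / dt_ms))):
        t = step * dt_ms
        v *= decay  # passive leak towards the resting potential
        # integrate all input spikes that fall into the bin [t, t + dt)
        while idx < len(inputs) and inputs[idx] < t + dt_ms:
            if t >= blocked_until:
                v += weight_mv
            idx += 1
        if v >= threshold_mv:
            out.append(round(t, 3))
            v = 0.0
            blocked_until = t + refractory_ms
    return out


# 20 inputs within 2 ms: summation overcomes the leak, one output spike
burst = [10.0 + 0.1 * k for k in range(20)]
# the same 20 inputs spread over 100 ms: the leak wins, no output spike
sparse = [5.0 * k for k in range(20)]
print(lif_output_spikes(burst), lif_output_spikes(sparse))
```

Making the neuron more or less "sensitive" in the sense of the text amounts to changing `weight_mv` or `threshold_mv`.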

All physical aspects underlying the neural activity of a single cell, such as connectivity, sensitivity, and thus signaling strength in a group of interconnected neurons, constantly change depending on the surrounding neural group activity. The key term in this context is spike-timing-dependent plasticity (STDP), a field that has received a lot of attention lately [7][8].
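A common textbook formalization of STDP is the pair-based exponential learning window. A synapse is strengthened when the presynaptic spike precedes the postsynaptic one and weakened in the opposite order, and the effect decays with the time difference. The sketch below assumes this rule. The amplitudes (0.01 and 0.012), the 20 ms time constants and the weight bounds are hypothetical values; the works cited in [7][8] may use different formulations.

```python
import math


def stdp_weight_change(t_pre_ms, t_post_ms, a_plus=0.01, a_minus=0.012,
                       tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Weight change for one pre/post spike pair (pair-based exponential STDP).

    Default amplitudes and time constants are hypothetical.
    """
    dt = t_post_ms - t_pre_ms
    if dt > 0:   # pre before post: potentiation
        return a_plus * math.exp(-dt / tau_plus_ms)
    if dt < 0:   # post before pre: depression
        return -a_minus * math.exp(dt / tau_minus_ms)
    return 0.0   # simultaneous spikes: no change (a convention)


def apply_stdp(weight, pre_spikes_ms, post_spikes_ms, w_min=0.0, w_max=1.0):
    """Sum the contributions of all spike pairs and clip to [w_min, w_max]."""
    for t_pre in pre_spikes_ms:
        for t_post in post_spikes_ms:
            weight += stdp_weight_change(t_pre, t_post)
    return min(max(weight, w_min), w_max)


# pre fires 5 ms before post, twice: the synapse is strengthened
print(apply_stdp(0.5, [0.0, 50.0], [5.0, 55.0]))
# post fires 5 ms before pre, twice: the synapse is weakened
print(apply_stdp(0.5, [5.0, 55.0], [0.0, 50.0]))
```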


2.2 Dynamics of Neural Networks: Recent Insights

The nervous system represents signals by sequences of action potentials, or spikes, which typically occur in an irregular temporal pattern. In the past decades, numerous investigations have been reported on network dynamics and on the information processing properties of both networks and single cells. They cover in vivo and in vitro preparations, real-world experiments and simulations, and stimulated and unstimulated conditions.

Shaw et al. investigated a cooperative network of approximately 30 neurons in 1982 and found that they emit similar spike trains and that neighboring neurons receiving shared inputs tend to belong to the same functional assembly [5][9][10][11].

In 2000, Pinato et al. [5] investigated the input/output relations in an isolated leech ganglion (consisting of ~400 neurons) using six to eight suction pipettes and two intracellular electrodes. They claimed that their results were consistent with the notion that neural computation in the leech ganglion is distributed, and they demonstrated a significant spatial and temporal variability of this distributed processing.

In 1999, Jimbo et al. [12] investigated the stimulation-dependent dynamics of a culture of dissociated cortical neurons from neonatal rats on a multi-electrode array. The results showed that the neuronal network may exist in two different dynamical states, depending on the strength of the stimulation signal. In one state, the network seemingly behaves as a non-chaotic deterministic system. In the other, the system exhibits large spatio-temporal fluctuations, characteristic of stochastic or chaotic systems [12].

Other findings on the influence of stimulation on network plasticity, both in real-world experiments and in simulation, showed that repeated stimulation induces changes in network responsiveness [4][14]. This adaptiveness is certainly an essential property of learning and memory [7][15]. Amit et al. [7], who investigated spike-driven synaptic dynamics in a simulated network, stated that the repeated presentation of a set of external stimuli structures the network to the point of sustaining working memory (selective delay activity).

In 2001, Shahaf et al. [4] performed closed-loop experiments coupling a computer with neural networks in vitro. The underlying assumption was that the basic properties of neural cells do not differ from those in vivo. These properties include the average number of connections, the stability of these connections, and their general modifiability by external stimuli. Using a Multi Channel Systems electrode array [16], Shahaf succeeded in teaching a network, through repeated cycles of a stimulation pattern, to exhibit a reproducible response to a specific stimulus. The stimulation protocols consisted of biphasic current pulses at a constant frequency over a defined time period. This repeated stimulation induces changes in network responsiveness. Shahaf found that the magnitude of such modifications increases with stimulation time and that the changes last for many hours. He suggests that no concept of reward is needed; stimulation itself is the driving force. While stimulated, the network explores the space of possible response modes. When the external stimulus is removed, it relaxes, without any intention, into a specific mode.

De Ruyter van Steveninck et al. [13] investigated the motion-sensitive neuron in the fly visual system. They found that it makes efficient use of the capacity of temporal variability to transmit information. This efficiency is achieved by establishing precise temporal relations between individual action potentials and events in the sensory stimulus.

Although this is certainly not a thorough review of recent biological findings, it can be said that the work of the past twenty years has revealed complex dynamics with large fluctuations and waves of electrical activity propagating through neuronal tissue.

The details of these patterns may just be noise that should be averaged out to reveal meaningful signals. Alternatively, if the precise arrival time of each spike is significant, then temporal variability provides a large capacity for carrying information. The issue has been debated for decades; however, it is now widely agreed that these dynamics are an essential feature of information processing in the brain [5][13][17][18].


3. Technical Background: The Interface

This section details the relevant physical properties of the INPRO prototype chip. It also describes the constraints imposed by the interface between the recording circuitry and the neural cells.

3.1 The Prototype Chip

The current prototype with 16 electrodes is capable of on-chip signal filtering, analog-to-digital conversion, and simultaneous recording and stimulation at each electrode [19]. The electrode pitch (center-to-center distance of electrodes) is 250 µm, and the electrode area is 30 × 30 µm². Because the CMOS process used employs neurotoxic aluminium for the metal layers, the biocompatible electrodes are deposited in an electrochemical post-processing step. The adhesion layer is made of laminin or poly-L-lysine. The final chip will consist of an array of up to 1024 electrodes. Stimulation voltages are in the range of ±2 V at 120 kHz and 8 bit resolution. There is a latency of 50 µs between two stimulation signals.

The data are continuously recorded at 20 kHz with 8 bit resolution. The frequency of the neural activity is expected to be approximately 1 kHz. Consequently, a bandpass filter with its center frequency at 1 kHz was implemented. The corner frequencies of the filter are 100 Hz and 50 kHz, respectively. Multiplexing takes place after signal amplification and filtering, which makes the design less sensitive to switching noise and to noise picked up along connection lines.

The expected signal-to-noise ratio (SNR) is around 20 dB. In combination with on-chip preprocessing, this should provide signals that are clearly distinguishable from background noise.

It is expected that the activity of up to four neurons will be recorded simultaneously per site. Background noise from neuronal activity picked up simultaneously on multiple sites is expected to be minimal due to the relatively wide electrode pitch of 250 µm.

Fig. 3.1.A Micrograph of the INPRO prototype CMOS chip [19]
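The figures given above allow a quick estimate of the raw data volume that the recording side has to handle, which motivates on-chip data reduction. This back-of-the-envelope sketch rests on three assumptions:

- every electrode records continuously at the stated 20 kHz and 8 bit;
- protocol overhead and compression are ignored;
- the SNR in dB is read as an amplitude ratio (20·log10 convention).

```python
def raw_data_rate_bps(n_electrodes, sample_rate_hz=20_000, bits_per_sample=8):
    """Uncompressed data rate in bit/s, assuming continuous recording on all electrodes."""
    return n_electrodes * sample_rate_hz * bits_per_sample


def snr_db_to_amplitude_ratio(snr_db):
    """Signal-to-noise amplitude ratio for an SNR given in dB (20*log10 convention)."""
    return 10 ** (snr_db / 20)


print(raw_data_rate_bps(16) / 1e6)     # prototype, 16 electrodes: 2.56 Mbit/s
print(raw_data_rate_bps(1024) / 1e6)   # final chip, 1024 electrodes: 163.84 Mbit/s
print(snr_db_to_amplitude_ratio(20))   # 20 dB: signal amplitude 10x the noise
print(20_000 / 2)                      # Nyquist frequency at 20 kHz sampling, in Hz
```

At 1024 electrodes, the raw stream already exceeds 160 Mbit/s. This is one reason to transmit extracted features rather than raw signals.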


3.2 The Physical Neuron-Electrode Interface

There are three electrically relevant components in a neuron-electrode interface: the neuron, the microelectrode, and the medium in between [1].

Action potentials arising from different neurons have different shapes in extracellular recordings. The shape depends on the neuron type, its distance from the recording site, and its contact quality with the recording site, as well as on the specific characteristics of the electrode and the intervening medium. Moreover, the waveform depends on the location where the spike is picked up (axonal vs. somatic spikes) and, if present, on the properties of the stimulation. Many attempts have been made to provide mathematical models of these processes [1][20].

Metal electrodes function as electrochemical transducers, either capacitively or through faradaic mechanisms in which electrons are exchanged with ions. These processes are nonlinear and not yet fully understood [1]. When an action potential occurs in a neuron above a recording electrode on the INPRO-chip, ions flowing across the membrane capacitively induce charge on the electrode. The signal is expected to be bipolar and weak, in the range of hundreds of microvolts [19]. The chip will record up to four neurons per site; in other words, its selectivity is very low. It will, however, enable the experimenter to stimulate the network at various positions in a two-dimensional plane. When stimulating the cell culture, an action potential can be triggered, e.g., by extracellular application of a short cathodic current pulse [1][21].


4. Multi-Unit Recordings: Critical Issues

When working with neurons in vitro (ex vivo), one should keep in mind that the behavior of the culture might differ significantly from that in an in vivo experiment under more natural conditions. This is especially true for investigations of neural cultures on multi-electrode arrays, where specifically prepared cultures live in an artificial environment. An investigation in this direction is not part of this report. However, it seems that one can assume the single neural cell to have similar properties in vitro as in vivo, at least as long as a comparable biochemical environment is provided [4].

Recent investigations by Segev et al. in 2003 [3] showed interesting clustering properties of neural cultures in vitro. In vitro, neurons normally self-organize into homogeneous networks of single neurons linked by dendrites and axons. Under special conditions, however, they can also self-organize into neuronal clusters linked by bundles of axons. These clusters seem to exhibit synchronized bursting events similar to those observed in homogeneous networks [3]. Chemotactic signaling is known to guide neurite navigation but does not seem to play a major role during cluster migration. Experiments by Segev et al. suggest two key features during self-organization via cell migration: the mechanical tension along stretched axons connecting two neurons, and cell adhesion to the substrate. The movement is not smooth, as neurons can stop for short durations. The migration velocity is 1.4 µm/min. Once formed, the clusterized network retains its organization until its degradation (usually after 3-4 weeks).

On a multi-electrode array like the INPRO-chip, such in vitro clustering might lead to problems. On the one hand, the selectivity of each active electrode will decrease significantly as more and more sources contribute to the signal. On the other hand, the efficiency of the chip will decrease, since many electrodes between such clusters might be inactive and record no signal at all. Electrode positioning and temporal stability are very common problems in most experimental setups [22]. Since the configuration of the chip is fixed, however, individual electrode positioning is entirely impossible, resulting in random contact quality. This may be compensated for by a higher number of electrodes, as has been done with the INPRO-chip.

Furthermore, guided growth [1][23] or cell immobilization [24][25] might remedy the problem of inactive electrodes. However, Rutten stated in his 2002 review on neural interfaces [1] that, in his opinion, network growth is not yet sufficiently understood to yield reliable and functional connections.

Another issue closely related to the spatial organization of neuronal clusters is that of nonobvious mechanisms of intercellular interaction. Holt et al. pointed out in 1998 [26] that only a small fraction (usually about 20%) of the space in the brain is actually extracellular. Extracellular action potentials outside axons are small in amplitude and spatially spread out, whereas near cell bodies they are larger in amplitude and much more spatially confined. A single spike from a neuron can cause an extracellular potential of a few mV near the cell body, and coupling between neurons through such fields is called ephaptic interaction. How much effect do this and similar modes of interaction, such as electromagnetic fields [27] or chemical agents, have on the behavior of nearby neural elements? This raises the question of whether networks that are mechanically immobilized, or restricted by other means, will still exhibit the desired information processing properties.

Fig. 4.0.A Examples of different cell clustering. Courtesy of Frauke Greve, PEL, ETH Zurich.

2.4±


Neuronal networks are able to self-regulate (rescale) both the strength of synaptic connections and the morphology of their circuitry (self-wiring) to sustain the desired level and temporal structure of their electrical activity [3]. In general, the somata of cells seem to be rather inert after outgrowth of the culture [1]; the axons and dendrites, however, wander through the substrate following growth factor gradients. Neurons can react to certain information within minutes to hours by changing their physical shape. Membranes can adapt to the environment they adhere to, and the distribution of ion channels might also change. Other important parameters of the integrate-and-fire concept underlying neural information processing, such as threshold and refractory period, can adapt as well. As a result, the sensitivity and efficiency of a cell can change significantly over time. Some of these changes at the physiological and molecular level take place within seconds.

This adaptivity is obviously a desired and essential component of neural information processing. However, it also has a significant impact on the recorded waveforms. A static multi-electrode array listening to clusters of neurons that might change their position, signaling intensity, and activity level will certainly record large temporal fluctuations of the spike shapes. This intrinsic unpredictability of the system poses a major challenge to the classic spike sorting problem at hand. These issues will be discussed further later in this report.

Other relevant topics when working with neurons in vitro are biocompatibility, mechanical damage when preparing the culture, and the relatively fast degradation of these cultures after 3-5 weeks. Artificially reduced cell diversity might also have an impact on the information processing capabilities of neural cultures. Also, the effects of strong, unnatural stimulation signals broadcast through the medium are unknown; this is a typical problem when working with multi-electrode arrays. Experiments with unselective stimulation techniques yield observations on the emergent properties of a cooperative network, and the biological relevance of such observations is questionable.

These and other topics cannot be discussed in this report, but they should not be underestimated when designing the overall system.


5. Methodical Background: The Challenge of Spike Sorting

This section describes the challenge of spike sorting from the perspective of pattern recognition. The relevant properties of the data are listed and a theoretical background on spike sorting techniques is given.

5.1 Introduction to the Problem

The detection of neural spike activity is a technical challenge widely considered to be a prerequisite for studying many types of brain function.

Looking at recordings of extracellular waveforms, one can immediately observe different types of spike shapes (Fig. 6.3.B and Fig. 6.3.C). It is not clear whether they actually belong to different neurons, or whether their shapes change over time because of other, unknown time-dependent processes, or both. Apart from the background noise introduced in the recording circuitry, there might be residuals of distant action potentials. In the case of multi-electrode arrays such as the INPRO-chip, multiple neurons with varying signal intensity are recorded per site. With the INPRO technology, a significant improvement of the signal-to-noise ratio will be achieved, while cross-talk between the channels is expected to be minimal [19]. However, up to four neurons are expected to contribute to the signal on each recording channel. This also means that many spike shapes from different units will overlap, resulting in new, unpredictable spike shapes.

Pattern recognition is the study of how machines can observe the environment, learn to detect and distinguish patterns of interest for their purpose, and make reasonable decisions about the categories of these patterns [28]. A pattern can be roughly defined as the opposite of chaos: some form of structure or entity one could give a name. Here, the patterns of interest are spike shapes that stand out from the noisy background of a continuous recording of neural activity.

The problem of detecting and classifying action potentials, decomposing overlaps, and assigning single spike shapes to the corresponding neurons is commonly referred to as spike sorting. Spike sorting is a subdomain of the much broader domain of pattern recognition.

Many of the earlier approaches focused very much on the interpretation of single spikes. By now, a considerable amount of effort has gone into the spike sorting problem and the aspect of overlap decomposition. However, it is now widely accepted that no single method stands out from the others. Instead, there are several methods of varying complexity, and the choice of an appropriate one depends on the requirements of the specific experiment [28][37]. Since there is no optimal method, combinations of different techniques have received a lot of attention lately.

Here, a brief overview of the basic concepts is provided, since a more thorough review is simply beyond the scope of a semester thesis. Some other interesting approaches will be mentioned in section [S6.3.5]. General issues will be discussed later in section [S7.], such as the biological justification of such algorithms and the necessity and effectiveness of spike sorting.

5.2 Sketching a Recognition System

The design of a pattern recognition system basically consists of three aspects: 1) data acquisition and preprocessing, 2) data representation, and 3) decision making [28].

In the case of multi-electrode arrays such as the INPRO-chip, the data are acquired through multiple electrodes. Both the number of units present on each electrode and the recording quality itself may vary. Thus, some preprocessing, such as filtering, might be required. Also, spikes must first be detected and aligned so that suitable representations can be found in an intermediate stage. This usually goes along with compression (dimensionality reduction) of the data, so that the recognition system can later make reasonable decisions in real time. In multi-unit recordings, the number of sources present on a channel is typically unknown. The task at the representation stage is therefore twofold: to figure out how many neurons contribute to the signal on an electrode, and to establish a set of rules for discriminating between those sources in the final stage of decision making (classification).

Thus, a recognition system is operated in two modes: training (learning) and classification (testing) [28].
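The detection and alignment step described above can be sketched with simple amplitude thresholding. Everything here is an assumption for illustration and not the detection scheme planned for the INPRO FPGA:

- the noise level is estimated robustly as median(|x|)/0.6745, a common choice in the spike sorting literature;
- the threshold factor is k = 4;
- each window holds 10 samples before and 22 after the peak (1.6 ms at 20 kHz);
- the dead time is 20 samples.

All of these values are hypothetical.

```python
import random
import statistics


def detect_and_align(signal, k=4.0, pre=10, post=22, dead_time=20):
    """Threshold detection with peak alignment.

    Returns a list of (peak_index, window) pairs, where window holds the
    samples signal[peak - pre : peak + post], i.e. every spike is cut out
    aligned on its largest absolute deflection. k, pre, post and
    dead_time are hypothetical defaults.
    """
    # robust noise estimate: for Gaussian noise, median(|x|) / 0.6745 = sigma
    sigma = statistics.median([abs(x) for x in signal]) / 0.6745
    threshold = k * sigma
    spikes = []
    i, n = pre, len(signal)
    while i < n - post:
        if abs(signal[i]) > threshold:
            # align on the largest deflection within the next `post` samples
            end = min(i + post, n - post)
            peak = max(range(i, end), key=lambda j: abs(signal[j]))
            spikes.append((peak, signal[peak - pre:peak + post]))
            i = peak + dead_time  # skip the rest of this spike
        else:
            i += 1
    return spikes


# synthetic test: bounded noise plus three copies of one spike shape
rng = random.Random(0)
sig = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
shape = [-2, -8, -15, -8, -2, 3, 5, 3, 1]
for start in (200, 600, 1200):
    for j, a in enumerate(shape):
        sig[start + j] += a
print([peak for peak, _ in detect_and_align(sig)])
```

The returned windows are the aligned patterns that a representation stage would then compress.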


Depending on the actual method, different types of learning might be necessary at different levels of the system. The three best-known approaches for spike sorting are 1) statistical classification, 2) template matching, and 3) neural networks. These models are not necessarily independent, and sometimes the same pattern recognition method exists under different interpretations. The main aspects of the statistical approach are discussed first, since it incorporates many of the underlying concepts that the other approaches share.

5.2.1 Statistical Approaches

In statistical approaches, patterns are represented in terms of d features or measurements and are viewed as points in a d-dimensional space. The goal is to choose features that allow pattern vectors belonging to different categories to occupy compact and disjoint regions in the d-dimensional feature space. The effectiveness of the representation space (feature space) is determined by how well patterns from different classes can be separated [28].
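As an illustration of such a representation, an aligned spike can be mapped to a small feature vector. The four features below form a hypothetical set for this sketch, giving d = 4; the thesis does not prescribe a particular feature set at this point. The features are:

- peak-to-peak amplitude;
- trough depth;
- trough-to-peak width;
- energy.

```python
def spike_features(window):
    """Map one aligned spike waveform to a d = 4 feature vector (hypothetical feature set)."""
    i_min = min(range(len(window)), key=lambda i: window[i])
    i_max = max(range(len(window)), key=lambda i: window[i])
    return (
        window[i_max] - window[i_min],  # peak-to-peak amplitude
        window[i_min],                  # trough depth
        abs(i_max - i_min),             # trough-to-peak width in samples
        sum(x * x for x in window),     # energy of the waveform
    )


# two spikes of different shape become two distinct points in feature space
print(spike_features([0, -4, -10, -3, 2, 5, 1]))
print(spike_features([0, -2, -5, -1, 1, 2, 1]))
```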

The role of the preprocessing module in the recognition system is to segment the pattern of interest from the background, remove noise, normalize the pattern, and perform any other operation that contributes to a compact representation of the pattern. In training mode, the feature extraction/selection module finds the appropriate features for representing the input patterns, and the classifier is trained to partition the feature space. In classification mode, the trained classifier assigns the input pattern to one of the pattern classes under consideration based on the measured features.

It is important to distinguish between feature selection and feature extraction. The term feature selection refers to algorithms that select the (hopefully) best subset of the input feature set. Methods that create new features based on transformations or combinations of the original feature set are called feature extraction algorithms. Thus, feature extraction precedes feature selection, in which features with low discrimination ability are discarded [28].
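Feature selection in the sense defined above can be sketched with a simple per-feature ranking criterion. This example assumes two labeled classes and uses the Fisher score (μ_a − μ_b)² / (σ_a² + σ_b²). This is one of many possible selection criteria and is not prescribed by [28]. The sketch keeps the m original features with the highest score instead of constructing new ones.

```python
import statistics


def fisher_scores(class_a, class_b):
    """Two-class Fisher score per feature: (mean_a - mean_b)^2 / (var_a + var_b).

    The choice of the Fisher score as selection criterion is an assumption.
    """
    scores = []
    for f in range(len(class_a[0])):
        a = [x[f] for x in class_a]
        b = [x[f] for x in class_b]
        num = (statistics.fmean(a) - statistics.fmean(b)) ** 2
        den = statistics.pvariance(a) + statistics.pvariance(b)
        scores.append(num / den if den > 0 else (float("inf") if num > 0 else 0.0))
    return scores


def select_features(class_a, class_b, m):
    """Indices of the m original features with the highest Fisher score."""
    scores = fisher_scores(class_a, class_b)
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return sorted(ranked[:m])


# hypothetical data: feature 0 separates the two units, features 1 and 2 barely do
unit_a = [(1.0, 5.0, 0.0), (1.2, 4.0, 1.0), (0.8, 6.0, 0.5)]
unit_b = [(3.0, 5.1, 0.2), (3.2, 4.1, 0.9), (2.8, 5.9, 0.6)]
print(select_features(unit_a, unit_b, 1))
```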

Feature extraction methods determine an appropriate subspace of dimensionality m (m ≤ d), in either a linear or a nonlinear way, within the original feature space of dimensionality d.

Linear transforms, such as principal component analysis, factor analysis, linear discriminant analysis, and projection pursuit [29], have been widely used in pattern recognition for feature extraction and dimensionality reduction [28]. There are also several nonlinear feature extraction methods, e.g., multidimensional scaling and artificial neural networks [30].
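Principal component analysis, the most common of these linear transforms, can be written down in a few lines. The sketch below computes the leading m eigenvectors of the sample covariance matrix by power iteration with deflation. It then projects the centered data onto these vectors (m ≤ d). It is a dependency-free illustration; in practice a library routine such as `numpy.linalg.svd` or `numpy.linalg.eigh` would be used. The iteration count and the random seed are arbitrary assumptions.

```python
import math
import random


def pca(data, m, n_iter=200, seed=0):
    """Project d-dimensional points onto their first m principal components.

    Returns (components, projected): m unit vectors of length d and, for every
    input point, its m coordinates in the principal subspace.
    n_iter and seed are arbitrary choices for this sketch.
    """
    n, d = len(data), len(data[0])
    mean = [sum(x[j] for x in data) / n for j in range(d)]
    centered = [[x[j] - mean[j] for j in range(d)] for x in data]
    # sample covariance matrix (d x d)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(m):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(n_iter):  # power iteration -> dominant eigenvector
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                raise ValueError("m exceeds the rank of the covariance matrix")
            v = [x / norm for x in w]
        # Rayleigh quotient = eigenvalue (variance along v)
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # deflation: remove the found direction before searching for the next one
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(r[j] * c[j] for j in range(d)) for c in components]
                 for r in centered]
    return components, projected


# points on the line y = 2x in d = 2 collapse onto m = 1 coordinate
points = [(t, 2.0 * t) for t in range(-5, 6)]
components, projected = pca(points, m=1)
print(components[0], projected[-1])
```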

Independent component analysis (ICA) is a technique that exploits higher-order statistical structure in data. It finds a linear, nonorthogonal coordinate system in multivariate data, determined by second- and higher-order statistics. The goal of ICA is to linearly transform the data such that the transformed variables are as statistically independent from each other as possible. ICA generalizes principal component analysis (PCA) and, like PCA, has proven a useful tool for finding structure in data [31][32][33][34][35][36].
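For two channels, the idea of ICA can be sketched without any library in two steps:

1. Whiten the data, i.e. decorrelate it and scale it to unit variance (the PCA step).
2. Search for the rotation that makes the outputs as non-Gaussian as possible, measured here by the absolute excess kurtosis.

This is a minimal projection-pursuit style illustration. It assumes two independent, non-Gaussian sources mixed linearly and is not the algorithm of any of the cited works. Practical implementations such as FastICA use more efficient fixed-point iterations.

```python
import math
import random


def _excess_kurtosis(xs):
    """Excess kurtosis of a sample that already has zero mean and unit variance."""
    return sum(x ** 4 for x in xs) / len(xs) - 3.0


def ica_2d(x1, x2, n_angles=180):
    """Unmix two linearly mixed signals (kurtosis contrast, grid search: illustrative).

    Sources are recovered only up to order, sign and scale (inherent to ICA).
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    a = [v - m1 for v in x1]
    b = [v - m2 for v in x2]
    # 2 x 2 covariance matrix [[caa, cab], [cab, cbb]]
    caa = sum(u * u for u in a) / n
    cbb = sum(v * v for v in b) / n
    cab = sum(u * v for u, v in zip(a, b)) / n
    # closed-form eigen-decomposition: eigenvalues l1 >= l2, principal axis angle phi
    half_tr, det = (caa + cbb) / 2, caa * cbb - cab * cab
    disc = math.sqrt(max(half_tr ** 2 - det, 0.0))
    l1, l2 = half_tr + disc, half_tr - disc
    phi = 0.5 * math.atan2(2 * cab, caa - cbb)
    c, s = math.cos(phi), math.sin(phi)
    # whitening: project onto the eigenvectors and scale to unit variance
    z1 = [(c * u + s * v) / math.sqrt(l1) for u, v in zip(a, b)]
    z2 = [(-s * u + c * v) / math.sqrt(l2) for u, v in zip(a, b)]

    def rotate(theta):
        ct, st = math.cos(theta), math.sin(theta)
        return ([ct * u + st * v for u, v in zip(z1, z2)],
                [-st * u + ct * v for u, v in zip(z1, z2)])

    def contrast(theta):  # non-Gaussianity of both outputs
        y1, y2 = rotate(theta)
        return abs(_excess_kurtosis(y1)) + abs(_excess_kurtosis(y2))

    # after whitening only a rotation remains; a quarter turn covers all cases
    angles = [0.5 * math.pi * k / n_angles for k in range(n_angles)]
    return rotate(max(angles, key=contrast))


def abs_corr(p, q):
    """Absolute Pearson correlation, used to compare estimates with true sources."""
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    cov = sum((x - mp) * (y - mq) for x, y in zip(p, q))
    vp = sum((x - mp) ** 2 for x in p)
    vq = sum((y - mq) ** 2 for y in q)
    return abs(cov) / math.sqrt(vp * vq)


# two independent, uniformly distributed (non-Gaussian) sources, linearly mixed
rng = random.Random(1)
s1 = [rng.uniform(-1, 1) for _ in range(2000)]
s2 = [rng.uniform(-1, 1) for _ in range(2000)]
x1 = [0.8 * u + 0.6 * v for u, v in zip(s1, s2)]
x2 = [0.3 * u + 0.9 * v for u, v in zip(s1, s2)]
y1, y2 = ica_2d(x1, x2)
print(max(abs_corr(y1, s1), abs_corr(y2, s1)), max(abs_corr(y1, s2), abs_corr(y2, s2)))
```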

Well-known concepts from statistical decision theory are used to establish decision boundaries in the feature space that optimally separate the patterns belonging to different classes. In the statistical decision-theoretic approach, the decision boundaries are determined by the probability distributions of the patterns belonging to each class, which must either be specified or learned [28]. In some sense, most approaches in statistical pattern recognition attempt to implement the Bayes decision rule [28][37][38].
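The following is a minimal sketch of the Bayes decision rule. As a simplification not made in the text, it assumes that each class-conditional density is a Gaussian with independent features. Class priors, means and variances are learned from labeled training spikes. A new feature vector is assigned to the class with the largest posterior probability, computed in log space for numerical stability. The unit labels and feature values are made up for illustration.

```python
import math
import statistics


def fit_gaussian_classes(samples_by_class):
    """Learn prior, per-feature mean and variance for every class.

    Assumes Gaussian class-conditional densities with independent features
    (diagonal covariance); a small constant keeps variances positive.
    """
    total = sum(len(s) for s in samples_by_class.values())
    model = {}
    for label, samples in samples_by_class.items():
        d = len(samples[0])
        means = [statistics.fmean(x[f] for x in samples) for f in range(d)]
        variances = [statistics.pvariance([x[f] for x in samples]) + 1e-6
                     for f in range(d)]
        model[label] = (len(samples) / total, means, variances)
    return model


def bayes_classify(model, x):
    """Bayes decision rule: pick the class maximizing log p(class) + log p(x | class)."""
    def log_posterior(params):
        prior, means, variances = params
        log_lik = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                      for xi, m, v in zip(x, means, variances))
        return math.log(prior) + log_lik
    return max(model, key=lambda label: log_posterior(model[label]))


# hypothetical training spikes described by (trough depth, width in samples)
training = {
    "unit A": [(-10.0, 3.0), (-11.0, 4.0), (-9.0, 3.5)],
    "unit B": [(-4.0, 8.0), (-5.0, 9.0), (-3.0, 8.5)],
}
model = fit_gaussian_classes(training)
print(bayes_classify(model, (-10.5, 3.2)), bayes_classify(model, (-4.2, 8.8)))
```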

5.2.2 Artificial Neural Networks

Artificial neural networks are also widely used for pattern recognition problems. They try to model key properties of biological neural networks, such as learning, generalization, adaptivity, fault tolerance, and distributed representation and computation. They can learn complex nonlinear input-output relationships, use sequential training procedures, adapt themselves to the data, and require little prior knowledge [28]. They provide nonlinear algorithms for feature extraction and classification and are often efficient in hardware implementations. However, one should keep in mind that most neural network architectures conceal the statistics, i.e. the actual “algorithmic solution”, from the user and have to be used as black boxes with intrinsically unpredictable behavior.

The most commonly used family of neural networks for pattern classification tasks is the feed-forward network, which includes multilayer perceptrons and radial basis function (RBF) networks [30]. Another popular network is the self-organizing map (SOM), or Kohonen network [39], which is mainly used for clustering and feature mapping.

A common problem when working with artificial neural networks is the appropriate choice of various parameters needed for setup and training [40].

5.2.3 Template Matching

In template matching, the pattern to be recognized is matched against stored templates while taking into account different shifts and scale changes. The templates either have to be provided by the human operator or learned from a training set.
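A minimal Python sketch of the matching step described above, assuming a single stored template and a window that contains at most one spike (overlaps, discussed next, are not handled). Shifts are handled by sliding the template across the window; scale changes are handled by comparing shapes with a cosine similarity and then estimating the amplitude factor by least squares. The function name and the test waveform are hypothetical.

```python
import math

def match_template(window, template):
    """Slide `template` across `window` and return (similarity, offset, scale).

    similarity is the cosine between the template and the window segment
    (insensitive to amplitude); scale is the least-squares amplitude factor
    a = <w, t> / <t, t> at the best-matching offset.
    """
    m = len(template)
    t_norm = math.sqrt(sum(t * t for t in template))
    best = None
    for offset in range(len(window) - m + 1):
        seg = window[offset:offset + m]
        w_norm = math.sqrt(sum(w * w for w in seg))
        if w_norm == 0.0 or t_norm == 0.0:
            continue  # a flat segment carries no shape information
        dot = sum(w * t for w, t in zip(seg, template))
        sim = dot / (w_norm * t_norm)
        if best is None or sim > best[0]:
            best = (sim, offset, dot / (t_norm * t_norm))
    return best

template = [0.0, -2.0, -8.0, -3.0, 1.0, 0.0]
# Hypothetical spike: the template scaled by 1.5 and starting at sample 2.
window = [0.0] * 10
for i, t in enumerate(template):
    window[i + 2] = 1.5 * t
sim, offset, scale = match_template(window, template)
print(offset, round(scale, 3), round(sim, 3))  # 2 1.5 1.0
```

Separating shape (cosine similarity) from amplitude (least-squares scale) avoids the trivial zero-error fits that a plain residual criterion can produce on nearly flat segments.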

This approach has several critical issues. It can fail due to all sorts of shape distortion during the acquisition or preprocessing phase. A major problem is overlapping spikes in multi-unit recordings, which can render the underlying single spike shapes unrecognizable through summation and cancellation effects at all possible relative time shifts. Template-based overlap decomposition is therefore still computationally very demanding [40][41][42][43].

5.3 General Issues

Given the unpredictability of the data (see sections [S3.2] and [S4]), you will need a separate learning phase for each channel to derive the information needed for making reasonable decisions in supervised classification.

There are basically two different tasks: supervised and unsupervised classification. In supervised classification, we know what we are looking for, and a pattern is identified as a member of a predefined class [28]. In unsupervised classification, patterns are clustered by an algorithm so that they form groups according to some measure of similarity.

When attempting to sort recordings of neural activity with multiple signal sources on each track, including overlaps and noise, you would typically use unsupervised classification, or clustering, to find the number of sources present and to group their spikes in order to obtain templates for later supervised classification with template matching.

In supervised learning, labeled training samples are provided to train the classifier, whereas in unsupervised learning the number of classes often has to be learned along with the structure of each class [28]. The latter is normally the case in multi-unit recordings, where the electrode placement is random and, as a result, an unknown number of different neurons can contribute to the signal.

No matter which classification system or decision rule is used, it must be trained using the available training samples. As a result, the performance of a classifier depends both on the number of available training samples and on their specific values. The generalization ability of a classifier refers to its performance in classifying test patterns that were not used during the training stage [28]. Poor generalization can be attributed to any of the following factors: 1) the number of features is too large relative to the number of training samples (curse of dimensionality), 2) the number of unknown parameters associated with the classifier is large, and 3) the classifier is too intensively optimized on the training set (overtraining) [28]. Moreover, the pitfalls of overadapting estimators to the given training set are observed at several stages of a pattern recognition system, such as dimensionality reduction, density estimation, and classifier design [28].

It is beyond the scope of this report to describe all related topics in depth. However, there are many relevant subtopics that are widely discussed in the community, such as the ‘curse of dimensionality’, dimensionality reduction, classifier design, classifier combination, error estimation, and unsupervised classification [28].


6. Learning by Doing: A Partial Implementation

This section details the practical side of this project. It describes the boundary conditions of the semester thesis and presents results from a preliminary data analysis in Mathematica and from a partial implementation of unsupervised clustering in Matlab. A discussion of the implemented approach and of alternative strategies can be found at the end of this section.

6.1 INPRO Status: Available Data

In pattern recognition, as a general rule, the less information is available to the system designer, the more difficult the classification problem becomes [28]. The more I read on cell cultivation and pattern recognition, the more reasonable it seemed to work with recordings as close as possible to the data expected from the INPRO-chip. The reasons, sketched in more detail in the previous sections, concern the dependence of spike shapes on biological factors as well as on the properties of the interface. Moreover, both the activity of neurons and their spike shapes change significantly when they are stimulated from outside.

However, due to several complications with the cell cultivation on the INPRO-chip in Kaiserslautern, Germany, it was impossible for me to work with real recordings from the INPRO-chip. For example, undesired chemical reactions that are highly toxic to the neural cell culture occurred between the chip components and the nutrient solution. Towards the end of this semester thesis, the functionality of the INPRO-chip could be verified with heart cells, although certain technical problems, e.g. with the power source, were found. In short, no recordings of neural cultures on the INPRO-chip were available during the period of this semester thesis.

Thus, the only data available to work with were the recordings from 2000 by Jimbo et al. [12], kindly provided by P. Bonifazi, assistant to V. Torre, who worked on this project with Jimbo et al. These recordings were made with dissociated cortical neurons of neonatal rats on a multielectrode dish with 64 active sites after 4 weeks of culture [12]. The neural activity of a mixture of excitatory and inhibitory neurons and interneurons with randomized connections was investigated.

During the project, we received several datasets from P. Bonifazi. There were two different types of recordings: (a) continuous recordings without stimulation, and (b) recordings of stimulated activity consisting of short frames around the stimulation event.

6.2 Preliminary Data Analysis in Mathematica

To gain better insight into real neural spike data, and thus into the spike sorting problem at hand, I spent the following four weeks writing several routines in Mathematica and Matlab to extract features, detect spikes, plot overviews, perform frequency analysis of spike shapes, etc. Because I am new to the field and had very limited experience with Mathematica, the programming took a significant amount of time, even though Sadik Hafizovic kindly helped me with both Mathematica and the routines.

Fig. 6.2.A shows a sample overview of a continuous recording of neural background activity on the 64 channels of a multielectrode array used by Jimbo et al. [12]. As expected when working with multielectrode arrays, some channels show much more activity than others, which is most likely due to the random relative position of electrodes and neurons.

The data from P. Bonifazi can be viewed with the MCRack software package [16]. It is also possible to convert the raw electrode data to plain ASCII text files, with the time in milliseconds in the first column and the measurements for each channel in the other columns. The extracted files are huge, and thus only a few columns can be viewed at a time. The routine written in Mathematica loads these text files and extracts a segment with an arbitrary starting point and length. Fig. 6.2.B shows the neural activity on a single channel.


Fig. 6.2.A Overview of the channel activity of a continuous recording. It is clearly visible how much the activity and/or signal intensity vary from electrode to electrode.

Fig. 6.2.B Sample time slices from a continuous recording. Top: activity on a long segment. Bottom: a shorter spike trail in more detail.

[Plot: recording 20020705_e13, channel 13, sampling rate 20 kHz, offset 10k samples, no smoothing; top panel 50k samples, bottom panel 1k samples; x: samples, y: µV]


As sketched in section [S5.2], the first stage of data preprocessing often includes some classic smoothing. This was done with very simple binomial filters. To obtain a binomial filter, a row of Pascal's triangle (e.g. {1,2,1} or {1,4,6,4,1}) is used as a weighting vector, or convolution kernel, and multiplied element-wise with the corresponding number of consecutive data points. The weighted values are then summed and divided by the sum of the weights (4 for {1,2,1}, 16 for {1,4,6,4,1}) to give the new value of the center point.
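The binomial smoothing described above can be sketched in Python as follows (the original routines were written in Mathematica; this is an illustration, not that code). The kernel is a row of Pascal's triangle normalised by its sum; leaving the first and last order/2 samples unchanged is an assumption of this sketch, and the order must be even so that the kernel has a center point.

```python
from math import comb

def binomial_kernel(order):
    """Row `order` of Pascal's triangle, e.g. order 4 -> [1, 4, 6, 4, 1]."""
    return [comb(order, k) for k in range(order + 1)]

def binomial_smooth(data, order=4):
    """Smooth `data` with a normalised binomial filter of even `order`.

    Each interior sample is replaced by the weighted sum of its neighbours
    divided by the sum of the weights (2**order). Samples closer than
    order // 2 to either end are copied unchanged.
    """
    if order % 2:
        raise ValueError("order must be even so the kernel has a center point")
    kernel = binomial_kernel(order)
    norm = sum(kernel)
    half = order // 2
    out = list(data)
    for i in range(half, len(data) - half):
        out[i] = sum(w * data[i - half + k] for k, w in enumerate(kernel)) / norm
    return out

spike = [0, 0, -10, -80, -30, 0, 0]   # hypothetical sharp negative spike, in uV
print(binomial_smooth(spike))  # [0, 0, -25.625, -40.0, -31.875, 0, 0]
```

On this hypothetical spike the -80 µV peak is halved and its energy spread over the neighbouring samples, the same kind of distortion that is discussed below for Fig. 6.2.D.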

Starting with channels that showed promising activity with a good signal-to-noise ratio, I implemented several primitive types of peak detection algorithms. Their objective is to detect spikes based on some predefined features and to align them in a frame of fixed length.

The most primitive detection algorithm considers a point to be a spike when it has crossed a predefined threshold and is greater (in magnitude, since the spikes here are negative) than the preceding point. This, however, only works if the data has previously been smoothed such that the peaks have monotonically increasing or decreasing flanks. Without smoothing, the same spike was listed multiple times, since noise was interpreted as distinct peaks. Fig. 6.2.C shows a list of smoothed waveforms with aligned peaks. When a larger number of individual spikes is aligned, it is possible to visually distinguish between different types of waveforms. This, however, does not guarantee that they directly relate to distinct sources, since the spike shapes may vary over time due to the decreasing membrane potential and other factors (see section [S2.1]), although in many cases it is very likely.
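A minimal Python sketch of this primitive detector and of its failure on unsmoothed data. Because the spikes in these recordings are negative, the peak criterion is interpreted here as "below the threshold and more negative than both neighbours", i.e. a local minimum; this interpretation, the test trace and the threshold value are assumptions.

```python
def detect_peaks_simple(data, threshold):
    """Return indices of negative peaks that lie below `threshold`.

    A sample counts as a peak if it is below the threshold and more negative
    than both neighbours. On raw (noisy) data one spike can produce several
    such local minima; on smoothed data with monotonic flanks it produces one.
    """
    return [i for i in range(1, len(data) - 1)
            if data[i] < threshold and data[i] < data[i - 1] and data[i] <= data[i + 1]]

noisy = [0, -5, -40, -35, -70, -60, -65, -20, 0]   # hypothetical raw spike with noise ripples
print(detect_peaks_simple(noisy, threshold=-30))    # [2, 4, 6]: one spike, three detections

# Smoothing with the binomial kernel {1, 2, 1} / 4 (end samples copied unchanged).
smoothed = ([noisy[0]]
            + [(noisy[i - 1] + 2 * noisy[i] + noisy[i + 1]) / 4 for i in range(1, len(noisy) - 1)]
            + [noisy[-1]])
print(detect_peaks_simple(smoothed, threshold=-30))  # [5]
```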

It is well known that whatever type of filter is used, it will distort the signal in one way or another. A simple binomial filter will typically distort the spike shapes by increasing the peak half-width, decreasing the peak height, and shifting the peak position of asymmetric shapes, as is the case with neural waveforms. This effect can be seen in Fig. 6.2.D, where the raw waveforms are compared to the smoothed ones.

Because any form of smoothing distorts the signal, which might also influence the performance of the spike sorting system, and in addition requires computation power, I tried to modify the detection algorithm so that it also works with raw data.

A still very simple algorithm turned out to perform surprisingly well. It checks whether the threshold was crossed and whether the data points next to the candidate have smaller magnitudes. It verifies that this is also true for a pair of points at a predefined offset from the center point, and that there is a minimal distance to the last detected peak.
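The four checks of this raw-data detector can be sketched in Python as follows, again assuming negative-going spikes. The function name, the default offset of 2 samples and the minimal distance of 10 samples are hypothetical choices, not values from the original routine.

```python
def detect_peaks_raw(data, threshold, offset=2, min_distance=10):
    """Detect negative peaks in unsmoothed data.

    Sample i is accepted if (1) it is below `threshold`, (2) it is more
    negative than its direct neighbours, (3) it is also more negative than
    the samples `offset` positions away on either side, and (4) it lies at
    least `min_distance` samples after the previously accepted peak.
    """
    peaks = []
    for i in range(offset, len(data) - offset):
        x = data[i]
        if x >= threshold:
            continue
        if not (x < data[i - 1] and x <= data[i + 1]):
            continue
        if not (x < data[i - offset] and x < data[i + offset]):
            continue  # a noise ripple on the flank of a larger peak
        if peaks and i - peaks[-1] < min_distance:
            continue
        peaks.append(i)
    return peaks

noisy = [0, -5, -40, -35, -70, -60, -65, -20, 0]   # hypothetical raw spike with noise ripples
print(detect_peaks_raw(noisy, threshold=-30))        # [4]
two = noisy + [0] * 3 + noisy                         # the same spike twice, 12 samples apart
print(detect_peaks_raw(two, threshold=-30))          # [4, 16]
```

The offset check rejects the ripples at indices 2 and 6 that fooled the primitive detector, without any smoothing of the trace.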


Fig. 6.2.C Top: examples of isolated spike shapes. Bottom: when a larger number of spike shapes is aligned, it seems possible to visually distinguish between distinct spike shapes. Here, two groups with peaks above and below -75 µV can be seen.

[Plot: spike shapes list, recording 20020705_e13, channel 13, offset 10k, length 50k samples; sampling rate 20 kHz; smoothing: binomial {1,4,6,4,1}; amplitude range -30 to -450 µV; x: samples 0 to 35, y: µV]


Fig. 6.2.D Impact of smoothing on the spike waveforms. Top: raw data. Middle: data smoothed with the binomial filter {1,4,6,4,1}. Bottom: comparison of the raw and smoothed waveforms. It is visible how the peaks are lowered and broadened.

[Plot: shape distortion, recording 20020705_e13, channel 13, offset 10k, length 1k samples; rows: raw, smoothed, both; sampling rate 20 kHz; smoothing: binomial {1,4,6,4,1}; amplitude range -50 to -450 µV; x: samples 0 to 35, y: µV]


To see whether a Fourier analysis of the data reveals other useful properties that might be exploited in a spike sorting system, we also implemented a simple windowed Fourier analysis. The routine multiplies the data with a parameterized Hamming window placed at multiples of an arbitrary step size and applies a discrete Fourier transform to each of these snapshots in time. In the Fourier transform, the effect of smoothing is clearly visible at the higher frequencies (Fig. 6.2.E and Fig. 6.2.F).
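A minimal sketch of such a windowed Fourier analysis in Python, assuming the standard symmetric Hamming window 0.54 - 0.46 cos(2πk/(N-1)). The DFT is evaluated directly for clarity (an FFT would be used in practice), and the window length, step and test signal are illustrative choices rather than the exact settings of the original Mathematica routine. With sampling rate fs and window length N, bin k corresponds to the frequency k·fs/N.

```python
import cmath
import math

def hamming(n):
    """Symmetric Hamming window of length n: 0.54 - 0.46*cos(2*pi*k/(n-1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..len(frame)//2 (direct O(N^2) evaluation)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def windowed_spectra(data, win_len, step):
    """Short-time spectra of Hamming-weighted frames starting every `step` samples."""
    w = hamming(win_len)
    spectra = []
    for start in range(0, len(data) - win_len + 1, step):
        frame = [w[k] * data[start + k] for k in range(win_len)]
        spectra.append(dft_magnitude(frame))
    return spectra

fs = 20000.0                                                     # sampling rate, Hz
tone = [math.sin(2 * math.pi * 2500.0 * t / fs) for t in range(200)]  # hypothetical test tone
spectra = windowed_spectra(tone, win_len=40, step=20)
peak_bin = max(range(len(spectra[0])), key=spectra[0].__getitem__)
print(len(spectra), peak_bin, peak_bin * fs / 40)  # 9 5 2500.0
```

Applied to a raw and a smoothed data trail, comparing the upper bins of such spectra shows the high-frequency attenuation described above.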

Fig. 6.2.E Fourier transform using a Hamming window as weighting function. Upper half: raw data trail with its Fourier transform, on a linear scale and in dB. Lower half: corresponding plots for smoothed data. The Fourier transform clearly shows how smoothing eliminates high-frequency noise.

[Plot: window Fourier analysis, recording 20020705_e13, channel 13, offset 9k, length 2k samples; sampling rate 20 kHz; Hamming window of length 40; upper: no smoothing, lower: smoothed with binomial {1,4,6,4,1}]


The routines described in this section were implemented in Mathematica. They are automated and allow all graphics to be exported directly to user-specified formats. They can serve as a tool to quickly investigate new recordings, localize individual spikes and get an overview of the different waveforms present on a specific channel.

6.3 Implementation of Unsupervised Clustering

After getting a feeling for the data, I began studying pattern recognition for spike sorting. There are many papers describing all sorts of investigations and sorting problems. However, many of them completely disregard overlaps [4][5][12][22].

One possible process from raw data to sorted spike shapes can be sketched as follows: (a) a spike detector module isolates action potentials in the multi-unit extracellular recording, (b) a dimensionality reduction scheme provides effective discriminative features, (c) an unsupervised classification method yields distinct clusters of spike shapes, which do not necessarily correspond to the unknown number of contributing sources, (d) templates for individual spike shapes are generated based on these clusters, and (e) overlaps are decomposed and spikes are classified by a supervised method based on some measure of similarity between spike shapes and the corresponding templates.

While reading and browsing for new papers on the subject at the beginning of June, I became aware of an article by Kim et al., recently published in IEEE Transactions on Biomedical Engineering (April 2003) [44].

The paper details a promising combination of projection pursuit and negentropy maximization with a mixture of Gaussians (MoG) to obtain unsupervised clustering of spikes in multi-unit recordings; in other words, a method to extract waveform templates from a set of previously detected and extracted spikes. These templates could then be used in a second step for supervised classification, e.g. by training a neural network or by employing them in a template matching approach. The authors report high performance at low SNR compared to approaches based on other feature extraction methods, such as PCA, Sammon's mapping or generative topographic mapping (GTM).

Fig. 6.2.F Surface plot of the Fourier transform of raw and smoothed data using a Hamming window as weighting function. It is clearly visible how smoothing eliminates the higher frequencies (flat area in the rear of the plots).

The paper by Kim et al. focuses on steps (b) and (c), i.e. feature extraction and unsupervised classification. A companion paper on spike detection algorithms based on discrete wavelet transforms will be published in IEEE Transactions on Biomedical Engineering in August (personal communication) [45]. Most of the basic mathematical tools employed in their approach are well known in the field of pattern recognition and are also used in various other implementations. However, the combination seemed very interesting and a good starting point for a partial Matlab implementation by a newcomer, since feature extraction and clustering are the key steps in the process. Fig. 6.3.A illustrates how the unsupervised clustering by Kim et al. is done.

6.3.1 Mathematical Background

Kim et al. employ a projection pursuit method [29] based on negentropy maximization (PP/NEM) for feature extraction and dimensionality reduction. They claim that this method extracts discriminative features for spike sorting better and yields higher separability than other methods such as PCA, or nonlinear methods such as GTM. Because these features form hyper-ellipsoidal clusters, they use a mixture of Gaussians (MoG) model of the probability density function (pdf) as the clustering algorithm. Gaussians can match the hyper-ellipsoidal shape, whereas methods based on Euclidean distance (i.e. assuming hyper-spherical clusters) would lead to high classification errors [28][44]. The MoG model is frequently used for cluster assignment [28][38]. The problem of determining the appropriate number of Gaussians in the mixture model is addressed by a rough estimate of the number of Gaussians followed by mode-seeking, i.e. the detection of maxima in the overall density surface [46].

6.3.1.1 Feature Extraction Stage: Projection Pursuit and Negentropy Maximization

The feature extraction stage is based on the excellent work of Hyvärinen on fast independent component analysis (ICA) [31][47][48]. It uses a projection pursuit [29][42] approach with negentropy maximization.

The best-known automatic linear feature extractor in statistical pattern recognition is principal component analysis (PCA), which computes the eigenvectors belonging to the m largest eigenvalues of the $d \times d$ covariance matrix of the n d-dimensional patterns. These eigenvectors form an ordered set of orthogonal basis vectors that capture the directions of largest variation in the data. Because the components are ordered by how much variability they capture, the first k components together describe the largest share of the variation in the data; each additional component yields a progressively smaller correction until the spike is described exactly [28][37]. One study comparing clustering methods found that principal components as features yield more accurate classification than other features, but are not as accurate as template matching [50].
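PCA as described above can be sketched in dependency-free Python. The eigenvectors are obtained here by power iteration with deflation, a choice made only to keep the example self-contained (a numerical library's symmetric eigensolver would be used in practice); power iteration assumes distinct leading eigenvalues and a start vector that is not orthogonal to the eigenvector sought. The two-dimensional test data are hypothetical and constructed so that the answer is known: they vary strongly along (2, 1) and weakly along (-1, 2).

```python
import math

def covariance_matrix(patterns):
    """Sample covariance matrix (divisor n - 1) of n d-dimensional patterns."""
    n, d = len(patterns), len(patterns[0])
    mean = [sum(p[j] for p in patterns) / n for j in range(d)]
    centred = [[p[j] - mean[j] for j in range(d)] for p in patterns]
    return [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
            for a in range(d)]

def leading_eigenvector(matrix, iterations=200):
    """Unit eigenvector of the largest eigenvalue of a symmetric matrix (power iteration)."""
    d = len(matrix)
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(matrix[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def principal_components(patterns, m, iterations=200):
    """Return [(eigenvalue, eigenvector), ...] for the m largest eigenvalues."""
    cov = covariance_matrix(patterns)
    d = len(cov)
    components = []
    for _ in range(m):
        v = leading_eigenvector(cov, iterations)
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append((lam, v))
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return components

ts = [-2, -1, 0, 1, 2]
es = [0.1, -0.1, 0.1, -0.1, 0.1]
patterns = [[2 * t - e, t + 2 * e] for t, e in zip(ts, es)]  # hypothetical 2-D features
for lam, v in principal_components(patterns, m=2):
    print(round(lam, 4), [round(x, 4) for x in v])
# 12.5 [0.8944, 0.4472]
# 0.06 [-0.4472, 0.8944]
```

Projecting a centred spike vector onto the first k returned eigenvectors gives its k principal-component features.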

However, projection pursuit and ICA are more appropriate for non-Gaussian distributions, since they do not rely on the second-order properties of the data alone. ICA has been successfully used in blind source separation (BSS), extracting linear feature combinations that define independent sources. This demixing is possible if at most one of the sources has a Gaussian distribution.

Fig. 6.3.A Diagram of the unsupervised spike sorting system by Kim et al. [44]

Using a linear transform for feature extraction has the advantage that the underlying shape of the distribution is preserved, so that modeling the pdf of the extracted feature vectors with a Gaussian mixture is feasible. A linear transform can be expressed as $y = W^T x$, where $x$ is an m-dimensional observed vector representing a spike shape, $y$ is an n-dimensional vector of extracted discriminant features with $n < m$, and $W$ is an $m \times n$ projection matrix. The goal of projection pursuit is to find a projection matrix $W$ such that the components of $y$ become discriminative features and the separability among clusters is thus maximized.

Negentropy is used as the objective function for finding the $W$ that maximizes separability. The multimodality of given high-dimensional data may be represented most lucidly in the directions where non-Gaussianity is maximized [47][48], and negentropy can be roughly described as a measure of non-Gaussianity. Negentropy is defined as $J(y) = H(y_G) - H(y)$, where $H(y)$ denotes the entropy of $y$ and $y_G$ denotes a Gaussian random vector with the same mean and covariance as $y$.

Since a Gaussian random vector has the largest entropy among all random vectors with the same covariance, $J(y)$ is always nonnegative and equals zero only for a Gaussian random vector; the negentropy therefore directly measures non-Gaussianity.

The linear transformation matrix W that maximizes separability, and thus the independence of the components, can be found by maximizing the negentropy with a batch-type learning algorithm.

This procedure is not computationally intensive, because the actual dimension of the feature vector is limited to a low value.

Kim et al. use this algorithm by Hyvärinen [47][48] to find a limited number of basis vectors w and then choose the best ones according to the resulting negentropy $J(y)$. In practice, $J(y)$ is approximated using a nonquadratic function.
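To illustrate what such an approximation can look like, the Python sketch below uses Hyvärinen's widely cited one-unit approximation $J(y) \propto [E\{G(y)\} - E\{G(\nu)\}]^2$ with $G(u) = \log\cosh u$, where $y$ is standardized to zero mean and unit variance and $\nu$ is a standard Gaussian variable. The constant factor is dropped, $E\{G(\nu)\}$ is obtained by numerical integration, and whether Kim et al. used exactly this $G$ is an assumption; the test samples are hypothetical.

```python
import math
from statistics import NormalDist

def log_cosh(u):
    """Numerically stable log(cosh(u)) = |u| + log(1 + exp(-2|u|)) - log(2)."""
    a = abs(u)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)

def gaussian_expectation(g, steps=4000, limit=10.0):
    """E{g(v)} for a standard normal v, by the trapezoidal rule on [-limit, limit]."""
    h = 2.0 * limit / steps
    total = 0.0
    for k in range(steps + 1):
        u = -limit + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * g(u) * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi)

def negentropy_approx(samples, g=log_cosh):
    """Approximate negentropy (up to a constant factor) of a 1-D projection y."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    z = [(s - mean) / std for s in samples]  # standardize: zero mean, unit variance
    return (sum(g(u) for u in z) / n - gaussian_expectation(g)) ** 2

# Hypothetical projections: a near-Gaussian one and a bimodal one (two clusters).
near_gaussian = [NormalDist().inv_cdf((k + 0.5) / 1000) for k in range(1000)]
bimodal = [-1.0] * 500 + [1.0] * 500
print(negentropy_approx(near_gaussian), negentropy_approx(bimodal))
```

The bimodal projection, which would separate two clusters, scores far higher than the near-Gaussian one; choosing basis vectors w with the largest such score is the selection criterion described above.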

6.3.1.2 Unsupervised Classification Stage: Mixture of Gaussian Model

Density estimation with a mixture of Gaussians has been used in many unsupervised pattern classification problems [38]. Here, Gaussians seem well suited because they can account for the differently elongated shapes of the feature clusters. As in the setting addressed by Kim et al., the number of units recorded per channel is unknown for the data provided by the INPRO-chip.

They therefore suggest first roughly estimating the number of Gaussians in the mixture and finding the global modes (local maxima of the MoG density) of the total distribution. In a second step, the individual Gaussians are assigned to groups: each Gaussian is assigned to the group of the global mode closest to its mean.

The Gaussians are estimated iteratively by the expectation-maximization (EM) algorithm [28]. The number of Gaussians K in the mixture model, however, is unknown and must also be found iteratively. Kim et al. suggest computing the MoG with the EM algorithm for different K and using the log-likelihood to determine a suitable K.
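A minimal one-dimensional sketch of EM for a mixture of K Gaussians, together with the log-likelihood used to compare different K. It simplifies under stated assumptions: Kim et al. work in a multi-dimensional feature space with full covariance matrices, whereas this sketch uses scalar features; the means are initialized deterministically at evenly spaced quantiles instead of randomly; and a variance floor (`min_var`) guards against collapsing components. The feature values are generated deterministically and are hypothetical.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def fit_mog(data, k, iterations=200, min_var=1e-6):
    """Fit a 1-D mixture of k Gaussians by EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(data)
    srt = sorted(data)
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]  # quantile initialization
    mu = sum(data) / n
    variances = [sum((x - mu) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component j for each sample x.
        resp = []
        for x in data:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(min_var, sum(r[j] * (x - means[j]) ** 2
                                            for r, x in zip(resp, data)) / nj)
    log_lik = sum(math.log(sum(weights[j] * normal_pdf(x, means[j], variances[j])
                               for j in range(k))) for x in data)
    return weights, means, variances, log_lik

# Hypothetical 1-D feature values of two units (e.g. peak amplitudes in uV).
unit_a = [-100 + 5 * NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
unit_b = [-50 + 5 * NormalDist().inv_cdf((i + 0.5) / 100) for i in range(100)]
data = unit_a + unit_b
for k in (1, 2, 3):
    print(k, round(fit_mog(data, k)[3], 1))
```

The printed log-likelihoods rise steeply from K = 1 to K = 2 and only slightly beyond, which is the shape of curve from which K is read off.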

The number of Gaussians K can therefore differ from the true number of units. Kim et al. use a gradient ascent method to identify the modes of the MoG and then assign each Gaussian to the local maximum closest to its mean. The basic idea is that the pdf of a single cluster (i.e., a single unit) can be represented by the sum of the corresponding group of Gaussians.
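The mode-seeking and grouping step can be sketched in one dimension as follows. Each component mean is used as a starting point for gradient ascent on the logarithm of the mixture density; distinct end points are taken as the modes, and each Gaussian is then assigned to the mode closest to its mean, as described above. The adaptive step size (the inverse of the posterior-weighted mean of 1/variance, which turns the update into the mean-shift fixed-point iteration for Gaussian mixtures), the tolerances and the example mixture are assumptions of this sketch, not details taken from Kim et al.

```python
import math

def log_density_gradient(x, weights, means, variances):
    """Gradient of log f(x) for the 1-D mixture f, and an adaptive step size."""
    dens = [w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2.0 * math.pi * v)
            for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    post = [d / total for d in dens]  # posterior probability of each component at x
    grad = sum(p * (m - x) / v for p, m, v in zip(post, means, variances))
    step = 1.0 / sum(p / v for p, v in zip(post, variances))
    return grad, step

def find_mode(x, weights, means, variances, tol=1e-9, max_iter=1000):
    """Gradient ascent on log f starting at x; stops when the update is below tol."""
    for _ in range(max_iter):
        grad, step = log_density_gradient(x, weights, means, variances)
        x_new = x + step * grad
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def group_gaussians(weights, means, variances, merge_tol=1e-3):
    """Find the modes reached from every component mean and label each Gaussian."""
    modes = []
    for m in means:
        peak = find_mode(m, weights, means, variances)
        if all(abs(peak - q) >= merge_tol for q in modes):
            modes.append(peak)
    # Assign each Gaussian to the mode closest to its mean.
    labels = [min(range(len(modes)), key=lambda i: abs(m - modes[i])) for m in means]
    return modes, labels

# Hypothetical mixture: two overlapping Gaussians form one unit, a third is separate.
modes, labels = group_gaussians([0.4, 0.3, 0.3], [-100.0, -95.0, -50.0], [25.0, 25.0, 25.0])
print(len(modes), labels)  # 2 [0, 0, 1]
```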

These clusters could then be used to identify different spike shapes and to generate corresponding templates for later supervised online classification.

6.3.2 Implementation in Matlab

To find out more about the less obvious problems at the various levels of a pattern recognition system, I implemented the critical parts of the approach suggested by Kim et al., as their work is recent and their description is exceptionally detailed [44][48].

First of all, I had to write a routine in Matlab to extract a segment of arbitrary length and starting point from a continuous recording in ASCII format. The data can be smoothed with binomial filters; however, it is unclear how much this distorts the spike waveforms and influences the feature extraction stage, so the further analysis has so far been done without smoothing.

Working only with channels that show a rather high signal-to-noise ratio and apparently few different sources, and thus few overlaps (by visual inspection), classic threshold detection was sufficient for localizing action potentials. The goal of this detection algorithm was to find spiking events rather than single spike peaks. An event can include overlaps and bursts as well as isolated single spikes present in the data trail. There are several reasons for this. First, a real-time decomposition of overlaps would be computationally very demanding and is not feasible on an FPGA, but would have to be done on a standard PC. Thus, it seems reasonable to list overlaps within a predefined time frame only once, even if multiple peaks are present. The same holds for bursting events. If overlap decomposition turns out to be technically feasible, it will most likely be done by passing windows with the detected and aligned events to a PC with the required computational power. Second, it was interesting to see what a clustering algorithm comes up with when real-world data is streaming in, rather than the perfectly isolated spike shapes of many generated data sets.

To isolate events in raw data, a window with a length of up to 5 ms is moved through the data. This is very similar to online data acquisition, where an FPGA might pipeline such windows (data vectors). The windows overlap to a certain degree so that no events are missed; the overlap depends on the position where the last valid event was detected. Inside this larger window of approximately 5 ms, a slightly offset subregion of about 2 ms is searched for possible events. Only peaks that cross the predefined threshold and have a certain offset from the time flag of the last detected event are considered. Thanks to the data buffer provided by the larger window, the event can then be centered without going back to the original data stream. All events have to be aligned in their time window in order to apply PP/NEM in the feature extraction stage.
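A simplified offline sketch of this event extraction in Python, assuming negative-going spikes. Instead of pipelining overlapping 5 ms windows, it scans the trace once: a threshold crossing that occurs at least `dead_time` samples after the previous event opens a search region, the most negative sample in that region becomes the alignment point, and a fixed-length window is cut out centered on it. Further peaks inside the dead time, such as overlaps or bursts, are thus listed only once as part of the same event. The parameter values (40 samples, i.e. 2 ms at 20 kHz) and the synthetic trace are hypothetical.

```python
def extract_events(data, threshold, window=40, search=20, dead_time=40):
    """Return (peak_index, aligned_window) pairs for negative events.

    Scanning left to right, a threshold crossing at least `dead_time` samples
    after the previous event starts a new event. The most negative sample
    within the next `search` samples is taken as the alignment point, and
    `window` samples centred on it are cut out. Events whose window would
    run past either end of the recording are skipped.
    """
    events = []
    last = -dead_time
    half = window // 2
    i = 0
    while i < len(data):
        if data[i] < threshold and i - last >= dead_time:
            seg_end = min(len(data), i + search)
            peak = min(range(i, seg_end), key=data.__getitem__)
            if peak - half >= 0 and peak + half <= len(data):
                events.append((peak, data[peak - half:peak + half]))
            last = peak
            i = peak + 1
        else:
            i += 1
    return events

trace = [0.0] * 200                                  # hypothetical 10 ms at 20 kHz
trace[48:53] = [-10.0, -40.0, -90.0, -40.0, -10.0]  # spike A
trace[59:62] = [-20.0, -60.0, -30.0]                 # overlapping spike inside the dead time
trace[149:152] = [-50.0, -120.0, -50.0]              # spike B
events = extract_events(trace, threshold=-30.0)
print([p for p, _ in events])  # [50, 150]
```

Every returned window has the same length and carries its peak at the same position, which is the alignment that the PP/NEM stage expects.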

A block consisting of a predefined number of sample events collected during a learning stage can then be used to construct the linear projection matrix W, which extracts from a sample x a feature vector y that maximizes non-Gaussianity. Fortunately, Matlab code for FastICA by Hyvärinen is available online [47][48]. Although it was not exactly trivial for me to understand, it was possible to use the routines directly for our purpose. The algorithm returns the centered and whitened data matrix X holding the training samples and the corresponding transformation matrix W.

In the next stage, the subset of basis vectors w that maximizes the negentropy $J(y)$ for the training set X has to be found. For testing, one would usually select no more than three basis vectors w, so that the results can still be plotted.

For the mixture of Gaussians, a piece of code by S. Savarese and M. Welling at Caltech came in very handy. With a couple of my own extensions, it was possible to use it for computing and displaying the mixture of Gaussians for our data and for plotting the log-likelihood curve for different numbers of Gaussians K, as suggested by Kim et al.

The number of Gaussians has to be found iteratively, i.e. by fitting several models with different numbers K of Gaussians to the data and deciding afterwards which K is most appropriate. Kim et al. suggest finding the “knee” in the curve of log-likelihood versus K and using the next higher K. This typically means that there are more Gaussians than distinct clusters visible in the data.
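The knee rule can be sketched in Python as follows. Since the text does not define the knee numerically, the sketch assumes one possible operationalization: the knee is the first K after which adding another Gaussian improves the log-likelihood by less than a fraction (here 10%) of the largest single improvement, and the function then returns the next higher K. Both the rule and the example log-likelihood values are hypothetical.

```python
def choose_k(log_likelihoods, fraction=0.1):
    """Choose K from log-likelihoods of MoG fits with K = 1, 2, ..., len(log_likelihoods).

    Hypothetical knee rule: the knee is the first K whose successor improves the
    log-likelihood by less than `fraction` of the largest improvement; return K + 1.
    Requires at least two log-likelihood values.
    """
    gains = [b - a for a, b in zip(log_likelihoods, log_likelihoods[1:])]
    largest = max(gains)
    for k, gain in enumerate(gains, start=1):  # gain of going from K = k to K = k + 1
        if gain < fraction * largest:
            return k + 1
    return len(log_likelihoods)  # no knee found: use the largest K that was fitted

print(choose_k([-931.0, -741.0, -735.0, -732.0, -730.0]))  # 3
```

Here the curve flattens after K = 2, so K = 3 is returned: one Gaussian more than the number of clusters, in line with the observation above.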

The last step detailed in the paper by Kim et al. is a global mode-finding algorithm [46]. Using gradient ascent, the maxima of the hypersurface formed by the mixture of Gaussians model can be found. Subgroups of single Gaussians whose means are closest to a given maximum can then be formed, yielding as many groups as there are clusters visible in the data. The overall shape of such a group of Gaussians may better account for certain forms of spike shape variation. This step has not been implemented so far, due to time constraints and because it seemed a much better investment to learn more about the biological background and its impact on spike sorting. Furthermore, it is unclear how desirable it is to let more dispersed spike shapes, such as overlaps, contribute to a group defining a template.



6.3.3 Results

In a first stage, a data trail of arbitrary length can be extracted, smoothed and examined for events by sliding a window through the data as described in the previous section. Fig. 6.3.A shows the effect of binomial smoothing on a typical spike waveform. Fig. 6.3.B and Fig. 6.3.C show a typical data trail from a continuous recording with modest SNR at different resolutions. They also show how the events are detected and aligned by the algorithm.

In a second stage, the previously generated list of events is loaded, the FastICA algorithm by Hyvärinen extracts independent components from a learning subset (e.g. the first 50 samples), and the most promising projections are chosen. These projection vectors form the projection matrix W, and the same matrix is applied to all following events, which hopefully form clusters in an n-dimensional feature space. The algorithm has various parameters, such as the number of independent components to be extracted and the size of the subset of promising projection vectors, as well as several parameters of FastICA itself. With the current code, applying PP/NEM to events extracted from channel 34 yields distinct clusters in around 60% of the trials. In approximately 20% of the trials, the clusters only exhibit boundaries that seem very weak on visual inspection. The remaining 20% do not seem to be separable at all. Fig. 6.3.D shows a random collection of different cluster shapes in the two-dimensional feature space, along with the corresponding independent components found by FastICA. It is not clear what underlying rule determines how the independent components lead to these quite different cluster shapes.

In the third stage, a mixture of Gaussian model is fitted to the clusters obtained in the previous step. Trials with an increasing number of Gaussians K are computed (Fig. 6.3.E and following), and the log-likelihood of each model is plotted against K. Because Matlab initializes the fit randomly, different trials lead to different results. However, the best fits are found for a number of Gaussians slightly higher than the number of visually distinct clusters. This matches the result described by Kim et al. [44]. In addition, an algorithm was implemented that generates artificial data windows from three different spike shapes that overlap to a certain degree. This very rudimentary algorithm did not seem to perform very well. It was, however, good enough to serve as a test data generator during programming (Fig. 6.3.J).
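The model selection in the third stage can be sketched with a plain EM fit of a mixture of Gaussians and its log-likelihood for increasing K. For brevity, this Python sketch works on one-dimensional features, whereas the thesis used a two-dimensional feature space. It also uses a deterministic quantile initialization instead of Matlab's random one. Both are simplifications, and all names are hypothetical.

```python
import math
import random


def _densities(v, weights, means, variances):
    """Weighted component densities of a 1-D Gaussian mixture at v."""
    return [w * math.exp(-(v - m) ** 2 / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
            for w, m, s in zip(weights, means, variances)]


def log_likelihood(x, weights, means, variances):
    return sum(math.log(sum(_densities(v, weights, means, variances))) for v in x)


def fit_mog_1d(x, k, iters=200, min_var=1e-6):
    """EM for a 1-D mixture of k Gaussians; returns (weights, means, variances, log-likelihood)."""
    n = len(x)
    xs = sorted(x)
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]  # deterministic quantile start
    mu = sum(x) / n
    variances = [sum((v - mu) ** 2 for v in x) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample
        resp = []
        for v in x:
            d = _densities(v, weights, means, variances)
            tot = sum(d)
            resp.append([di / tot for di in d])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            means[c] = sum(r[c] * v for r, v in zip(resp, x)) / nc
            var = sum(r[c] * (v - means[c]) ** 2 for r, v in zip(resp, x)) / nc
            variances[c] = max(var, min_var)
    return weights, means, variances, log_likelihood(x, weights, means, variances)


def demo_data(seed=1):
    """Hypothetical 1-D feature values drawn from two well-separated clusters."""
    rng = random.Random(seed)
    return [rng.gauss(-3.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
```

Evaluating `fit_mog_1d(x, K)[3]` for K from 1 to 5 gives a log-likelihood curve of the kind whose "knee" is discussed for Fig. 6.3.E.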

Fig. 6.3.A The impact of smoothing on a typical spike waveform. The dotted line is the original waveform, the continuous line the resulting shape after the application of a binomial filter.

(Plot panel: “Smoothing: Binomial [1 4 6 4 1]”)


Fig. 6.3.B A typical data trail from a continuous recording, this time extracted with Matlab. Many events are visible, and two different waveforms seem to be present. They might be separable at around -75 µV.


Fig. 6.3.C Two shorter time segments. Complex overlaps are visible in the shorter trail (bottom). Four peaks are listed because they cross the predefined threshold of -25 µV and fulfill the other constraints of the algorithm.


Fig. 6.3.D Examples of cluster shapes in a two-dimensional feature space found by FastICA, with the corresponding independent components. The degree of separability differs strongly between trials. No underlying rule seems to be visible when comparing the independent components with the resulting clusters.


Fig. 6.3.E MoG modelling for different numbers of Gaussians K. The logarithm of the likelihood of each model is plotted (lower right). According to Kim et al. [44], the number of clusters present in the feature space does not need to be matched exactly, and a slightly higher number leads to better results. Choosing K somewhat beyond the “knee” therefore gives good results. In these plots the “knee” is at K=3. K+1 seems to be a reasonable choice, since the inner three Gaussians in the second plot of the left column account quite well for the two clusters present in the feature space.


Fig. 6.3.F Feature clusters based on data from real recording.


Fig. 6.3.G Feature clusters based on data from real recording.


Fig. 6.3.H Feature clusters based on data from real recording.


Fig. 6.3.I Feature clusters based on data from real recording.


Fig. 6.3.J Feature clusters based on artificial data generated with three different randomly overlapping waveforms extracted from real recordings and additional noise.


Fig. 6.3.K Feature clusters based on artificial data generated with three different randomly overlapping waveforms extracted from real recordings and additional noise.


6.3.4 Obtained Results and Remaining Challenges

As the results show, PP/NEM based on ICA can extract distinct features that form clusters. However, the initialization procedures at several stages of the code introduce a fair amount of randomness into the process, and the separability of the clusters varies from run to run (Fig. 6.3.D). So far, the algorithm does not reliably lead to good results. The separability in feature space depends directly on the choice of the independent components. Kim et al. suggest letting FastICA compute more independent components than are actually used for projection. I found no significant difference, however, between choosing a subset from a larger number of independent components and directly computing the desired number (the dimension of the feature space, typically 2 for visualization). According to Kim, the projection might vary, but the quality of the resulting clusters is virtually the same (personal communication). The current code therefore has to be verified thoroughly before it can be used for further development.

However, if the projected features form separated clusters, MoG models can be fitted to the features found by FastICA. These models match the data quite well when K is chosen slightly greater than the number of visually distinct clusters. This part of the code seems to work reliably. Virtually any cluster shape can be matched with superimposed Gaussians as long as the clusters are separable (Fig. 6.3.E and following). An obvious next step with the existing code is to determine which spike shapes the obtained clusters actually correspond to, and whether these shapes are suitable for generating distinct templates. One could also check whether the more scattered data points are due to overlaps, which is very likely. Unfortunately, it is unclear whether the order of the original vector holding the events is preserved throughout the process. Checking this is straightforward, but it takes time that is no longer available within this project.
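Keeping each event's original index attached to its waveform from detection onward removes the ordering question. Averaging the waveforms per cluster is then one simple way to turn cluster labels into templates. The following is a minimal sketch, assuming the clustering returns labels in the same order as the events it received; the function name is hypothetical.

```python
def templates_from_labels(events, labels):
    """Average the waveforms assigned to each cluster.

    events: list of (event_index, waveform) pairs kept together from detection onward,
            so that every cluster label can be traced back to its original waveform.
    labels: cluster label for each entry of `events`, in the same order.
    Returns (templates, members): mean waveform and list of event indices per label.
    """
    sums, counts, members = {}, {}, {}
    for (idx, wave), lab in zip(events, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(wave)
            counts[lab] = 0
            members[lab] = []
        sums[lab] = [s + w for s, w in zip(sums[lab], wave)]
        counts[lab] += 1
        members[lab].append(idx)
    templates = {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}
    return templates, members
```

The `members` lists also make it easy to inspect the scattered points of a cluster and check whether they are overlaps.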

Another direction for further research is to investigate the stimulated data provided by P. Bonifazi. The aim would be to verify whether stimulation increases the number of overlaps and bursting events, which pose the main problem in spike sorting approaches [37]. However, the available data was stimulated by applying a single, very strong trigger event, which most likely does not resemble natural biological events [12] closely enough. Furthermore, the available data consists of time windows around the trigger event. These windows can of course be concatenated to form a pseudo-continuous stream. The concatenation, however, introduces new, unnatural artefacts at the window boundaries. These artefacts might distort the performance of the classifier and possibly confuse statistical analysis. Fig. 6.3.A shows to what extent the intensity of the trigger event differs from the elicited activity. A more detailed view reveals complex overlaps and bursts.
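If trigger-locked windows are concatenated, the seam positions can at least be recorded. Detections close to a seam can then be flagged rather than silently trusted. The following minimal sketch illustrates this bookkeeping; the guard width and the function names are hypothetical.

```python
def concatenate_windows(windows):
    """Join trigger-locked windows into one trace; return the trace and the seam indices."""
    trace, seams = [], []
    for w in windows:
        if trace:
            seams.append(len(trace))  # index of the first sample of the next window
        trace.extend(w)
    return trace, seams


def flag_near_seams(event_positions, seams, guard):
    """True for every detection lying within `guard` samples of a seam."""
    return [any(abs(p - s) <= guard for s in seams) for p in event_positions]
```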

An obvious problem with the detection algorithm is that it still needs prior information about various parameters, such as reasonable window lengths, offsets and the signal-to-noise ratio. Such assumptions rest on seemingly characteristic ad-hoc features of the data. They are therefore very dangerous and unlikely to hold for other data sets. More sophisticated detection routines have to be found that still keep computational costs low through simplicity, e.g. [49].

Kim et al. earlier employed a nonlinear energy operator for the detection of action potentials [41]. They are currently investigating the potential of wavelet transforms, and their paper is scheduled for August 2003 (personal communication) [45].
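The nonlinear energy operator mentioned above is ψ[n] = x[n]² - x[n-1]·x[n+1]. It emphasizes signal segments that are both large and fast. The following is a Python sketch. Thresholding at a multiple of the mean of ψ, and the factor 8 in particular, are assumptions for illustration, not the settings of Kim et al. [41].

```python
def neo(x):
    """Nonlinear energy operator psi[n] = x[n]^2 - x[n-1] * x[n+1]; both endpoints are set to 0."""
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def neo_detect(x, scale=8.0):
    """Indices where psi exceeds `scale` times its mean (scale is an assumed tuning constant)."""
    psi = neo(x)
    thr = scale * sum(psi) / len(psi)
    return [n for n, v in enumerate(psi) if v > thr]
```

For a slowly varying signal, the product of the neighbours nearly cancels the squared centre sample. A sharp spike leaves a large positive ψ. This makes the operator a cheap detector that needs fewer assumptions about the waveform than a fixed amplitude threshold.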

The artificial data used for preliminary tests certainly does not reflect the true properties of real recordings. The same is likely true, however, for almost any generated data set used for benchmarking in the various papers I came across, e.g. [44][51][52]. These data sets do not account for the more complex case of stimulated activity, which may distort both the statistics and the classifier performance results.
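The generator used in this project is not described in detail. A hypothetical generator of the same kind would superimpose randomly placed copies of a few template waveforms on Gaussian noise. It would also keep the ground truth, so that a classifier's output can be scored. Independent, uniformly distributed onsets are an assumption, and exactly the kind of simplification criticized above.

```python
import random


def make_artificial_trace(templates, length, n_spikes, noise_sd, seed=0):
    """Superimpose randomly placed copies of the given templates on Gaussian noise.

    Returns the trace and the ground truth as sorted (onset, template_index) pairs.
    Onsets are drawn independently, so spikes may overlap.
    """
    rng = random.Random(seed)
    trace = [rng.gauss(0.0, noise_sd) for _ in range(length)]
    truth = []
    for _ in range(n_spikes):
        k = rng.randrange(len(templates))
        t = templates[k]
        onset = rng.randrange(length - len(t) + 1)
        for i, v in enumerate(t):
            trace[onset + i] += v
        truth.append((onset, k))
    truth.sort()
    return trace, truth
```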


6.3.5 Discussion of the Method by Kim and Alternative Approaches

Programming the core steps, writing new routines and modifying code packages from others helped a great deal in understanding the overall problems with this and many related approaches. This understanding was the main goal of the author's implementation.

The approach investigated by Kim et al. [44] seems promising with regard to the feature extraction capabilities of the modified ICA algorithm. However, like many other approaches, it depends heavily on various factors. Both the data and the preprocessing routines, such as the spike detection algorithm, significantly affect the performance of the classifier. As pointed out in many parts of this report, its generalization capability is very hard to judge without data obtained from the specific target device and without robust detection routines. If the INPRO-chip proves capable of providing an SNR as high as the expected 20 dB, the detection of spiking events might, in the best case, be greatly facilitated. Fig. 6.3.A shows such a case, where the spike shapes are perfectly separated. However, only one electrode out of 64 in the continuous recording by P. Bonifazi et al. used here shows such a favorable SNR. The same holds for other data sets we received. Furthermore, only background activity was recorded, and this most likely does not reflect the signal complexity of stimulated activity.
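For orientation, a 20 dB SNR corresponds to an amplitude ratio of 10 (a power ratio of 100). The sketch below assumes that SNR is defined as spike amplitude over noise RMS, which the report does not specify. With a hypothetical noise level of 5 µV RMS, 20 dB would require spikes of about 50 µV.

```python
import math


def snr_db(spike_amplitude, noise_rms):
    """SNR in dB for an amplitude ratio (assumed definition: amplitude / noise RMS)."""
    return 20.0 * math.log10(spike_amplitude / noise_rms)


def amplitude_for_snr(target_db, noise_rms):
    """Spike amplitude required to reach target_db at the given noise RMS."""
    return noise_rms * 10.0 ** (target_db / 20.0)
```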

Fig. 6.3.A A trail from a pseudo-continuous stimulated recording. The time frames around the trigger event were attached together (top). The induced activity is very complex: overlaps and bursts are visible (bottom). Both trails have been smoothed.

(Plot, top: SR 20 kHz, x in samples, y in µV; DATE: 2003.5.1, FN: 20020919n2_e87, CH: 87, OFF: 1, LEN: 50k, SMTH: BN14641)

(Plot, bottom: SR 20 kHz, x in samples, y in µV; DATE: 2003.5.1, FN: 20020919n2_e87, CH: 87, OFF: 2k, LEN: 9k, SMTH: BN14641)


Fig. 6.3.A The best possible case: perfectly identifiable spikes due to a very high SNR. The raw and smoothed trails are shown along with their windowed Fourier transforms. However, only one single channel out of 64 had such a high SNR.

(Plot: WINDOW FOURIER, DATE: 2003.5.1, FN: 20020705_e54, SR: 20kHz, CH: 54, OFF: 1k, LEN: 2k, WIN: Tp=HAMM Ln=40, Pd=20, Sz=5, FREQ(y) = y * 333.333; upper panels: no smoothing, lower panels: smoothed BN14641)
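The panel labels suggest a Hamming window of 40 samples, zero-padded by 20 samples and advanced in steps of 5 samples. This reading of `Ln`, `Pd` and `Sz` is an assumption. It is, however, consistent with the stated bin spacing, since 20 kHz / 60 = 333.333 Hz. The following is a minimal Python sketch of such a windowed Fourier transform, using a plain DFT for clarity.

```python
import cmath
import math


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_fourier(x, win_len=40, pad=20, hop=5):
    """Magnitude spectra of Hamming-windowed, zero-padded frames (plain DFT, one list per frame)."""
    w = hamming(win_len)
    n_fft = win_len + pad
    frames = []
    for start in range(0, len(x) - win_len + 1, hop):
        seg = [x[start + i] * w[i] for i in range(win_len)] + [0.0] * pad
        spectrum = [abs(sum(seg[t] * cmath.exp(-2j * math.pi * f * t / n_fft) for t in range(n_fft)))
                    for f in range(n_fft // 2 + 1)]
        frames.append(spectrum)
    return frames


def bin_frequency(f, sample_rate=20000.0, win_len=40, pad=20):
    """Frequency of DFT bin f in Hz."""
    return f * sample_rate / (win_len + pad)
```

For a 1 kHz test tone, the strongest bin of each frame is bin 3 (3 × 333.333 Hz = 1000 Hz).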


Like the majority of techniques, the approach by Kim et al. relies on the assumption that different cells exhibit action potentials with unique amplitudes and waveforms. Since identical cell types exhibit very similar waveforms, the variation is mainly due to signal distortion at various levels of the interface (see sections [S3.2] and [S4.]). This assumption can fail when spike shapes are similar among different cells or when complex spikes with declining intra-burst amplitude are present. In such cases, these methods produce classification errors. The learning procedure can lump two classes with closely spaced spikes together into one class. Conversely, it can also mistakenly split one class into two [22][53]. Such errors might not even be detected by visual inspection of the data, as suggested in the previous section.

In an unstimulated environment, overlapping spikes are less frequent than single spikes, since the probability of more than two neural sources firing simultaneously during spontaneous activity is negligible. Furthermore, overlaps usually vary so much in shape and peak-to-peak value that they are very unlikely to produce separate clusters that the algorithm would identify as new templates [22]. The results obtained with this preliminary implementation suggest the same, although this has not yet been verified. In multisite recordings that include stimulus-driven activity, however, overlaps and bursting events increase, and the probability of overlaps is usually significant [54]. The worst case arises when multiple sources of similar intensity are present on a channel and stimulation increases their activity. The resulting bursts and overlaps might then produce so much variation that no distinct clusters can be extracted anymore.
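How quickly overlaps become common as activity increases can be estimated under a strong simplifying assumption: independent Poisson firing. The probability that a given spike has a spike of another unit within ±Δ is then 1 - exp(-2rΔ), where r is the other unit's rate. The sketch below uses a ±1 ms window. Correlated, stimulus-driven firing can make overlaps considerably more frequent than this estimate suggests.

```python
import math


def overlap_probability(rate_other_hz, window_s):
    """P(an independent Poisson unit fires within +/- window_s of a given spike)."""
    return 1.0 - math.exp(-2.0 * rate_other_hz * window_s)


def expected_overlaps(rates_hz, window_s, duration_s):
    """Expected number of coincident pairs (|lag| <= window_s) among independent Poisson units."""
    total = 0.0
    for i in range(len(rates_hz)):
        for j in range(i + 1, len(rates_hz)):
            total += rates_hz[i] * rates_hz[j] * 2.0 * window_s * duration_s
    return total
```

With Δ = 1 ms, a neighbouring unit firing at 5 Hz overlaps with about 1% of the spikes. At 50 Hz the figure rises to about 9.5%. Since the number of pairs grows with the product of the rates, increased activity raises the overlap count quickly.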

A possible cure would be to detect overlaps and bursts already in the preprocessing stage and to use only well-isolated spikes for template generation. Such an approach is likely to fail, since it requires a lot of prior information on the waveforms. This information cannot be obtained online in an arbitrary recording that includes all the difficulties described in sections [S3.2] and [S4.]. Of course, it might be possible to have reduced activity and isolated spikes during an unstimulated learning phase at the beginning of an experiment, and to derive the templates from them. This can only work to some extent, and only if the whole system with all its parameters is temporally stable. The information gathered throughout this project indicates that this will almost certainly not be the case.

Waveforms are likely to be temporally unreliable, and so are their discriminative features and, consequently, the derived templates. To account for this, the system must be adaptive and refresh its representation while in operation. This problem is very common to all approaches, whether artificial neural networks, statistical methods or other means are used [43]. It is not clear how this could be done with a system like the PP/NEM approach or with other statistical approaches such as PCA, linear discriminant analysis (LDA) or optimal filters [31][35][44][55]. These methods attempt to maximize the discriminative information content of the patterns in some way. When such an algorithm is reapplied to a new data subset from the same recording, different features might be more discriminative. The classifier system will automatically choose these features. As a result, spikes are no longer labelled the same way: different sources might be lumped together, while others might be artificially separated.
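The relabeling problem can be made concrete. Two clusterings of the same spikes may agree perfectly up to a permutation of labels and still show zero agreement when compared label by label. The brute-force matching below is only an illustrative check for a small number of classes, not a method from the cited works.

```python
from itertools import permutations


def raw_agreement(a, b):
    """Fraction of spikes that carry the same label in both labelings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def best_permuted_agreement(a, b):
    """Agreement after the best one-to-one renaming of b's labels (brute force, small K only)."""
    labels_a = sorted(set(a))
    labels_b = sorted(set(b))
    # extra None targets let surplus labels of b map to "no counterpart in a"
    targets = labels_a + [None] * max(0, len(labels_b) - len(labels_a))
    best = 0.0
    for perm in permutations(targets, len(labels_b)):
        mapping = dict(zip(labels_b, perm))
        best = max(best, raw_agreement(a, [mapping[y] for y in b]))
    return best
```

A gap between the two numbers only reveals renamed classes. Lumped or split sources still show up as agreement below 1 even after the best renaming.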

Another problem with statistical approaches is that they usually make strong assumptions about the statistical properties of the data. Many waveforms have a large deterministic component of variability associated with the preceding interspike interval, resulting in temporal modes with non-Gaussian distributions. Furthermore, multiple units on each channel are very likely to exhibit highly correlated activity, at least under stimulation, and neural waveforms are temporally unstable. These mathematically unfortunate intrinsic properties of the overall system lead to full or partial failure of such algorithms [2][31][35][37][44][55][60].

The goal of the unsupervised classification approach partially reproduced here is to obtain templates that can be used in an on-line classifier system. Template generation is obviously a very important step in the overall process. Zouridakis [51] suggested using fuzzy c-means clustering both for determining the exact number of sources present and for template generation. Fuzzy clustering can deal with incomplete and ambiguous information. Every spike belongs to all possible classes, but with different degrees of membership. Membership values may vary continuously between zero and one. The higher a spike's membership, the more it contributes to the template of the corresponding class. The resulting templates are therefore more reliable representatives of their respective classes. The validity criterion used was the fuzzy clustering validity function S, expressed as the ratio of the compactness to the separation of the clusters. At the end of fuzzy clustering, the algorithm provides for each class a fuzzy template, and these templates provide maximum separation of the classes. To obtain exact values of amplitude and latency, the fuzzy partitions have to be converted into hard (crisp) ones. As with the MoG model employed by Kim et al., the clustering procedure has to be run for different numbers of clusters K. The partition giving the smallest S was considered optimal for that value of K. Classification with c-means clustering uses a Euclidean distance measure. Thus, only the information about the means is considered, and the distribution of data within each cluster is ignored. This approach is adequate when the clusters are well separated. It breaks down when clusters overlap significantly or when their shapes differ significantly from a spherical distribution [37][44].
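The following is a minimal Python sketch of fuzzy c-means on two-dimensional features, with hardening by maximum membership. The report does not give the exact validity function. The sketch assumes the Xie-Beni form: compactness divided by n times the minimum squared distance between centers. The fuzzifier m = 2 and the farthest-point initialization are also assumptions, and all names are hypothetical.

```python
import math


def memberships(points, centers, m=2.0):
    """Fuzzy memberships u[j][i] of point j in cluster i (each row sums to 1)."""
    u = []
    for p in points:
        d = [max(math.dist(p, c), 1e-12) for c in centers]  # floor avoids division by zero
        u.append([1.0 / sum((d[i] / dl) ** (2.0 / (m - 1.0)) for dl in d) for i in range(len(d))])
    return u


def fuzzy_c_means(points, k, m=2.0, iters=100):
    """Fuzzy c-means on 2-D points with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = []
        for i in range(k):
            w = [row[i] ** m for row in u]  # membership-weighted contribution to center i
            sw = sum(w)
            centers.append((sum(wj * p[0] for wj, p in zip(w, points)) / sw,
                            sum(wj * p[1] for wj, p in zip(w, points)) / sw))
    return centers, memberships(points, centers, m)


def validity(points, centers, u, m=2.0):
    """Compactness divided by separation (Xie-Beni form assumed); smaller is better.

    Requires at least two distinct centers.
    """
    compact = sum(u[j][i] ** m * math.dist(p, c) ** 2
                  for j, p in enumerate(points) for i, c in enumerate(centers))
    separation = min(math.dist(a, b) ** 2
                     for i, a in enumerate(centers) for b in centers[i + 1:])
    return compact / (len(points) * separation)


def harden(u):
    """Convert fuzzy memberships into crisp labels (index of the largest membership)."""
    return [max(range(len(row)), key=row.__getitem__) for row in u]
```

As the text notes, the Euclidean distance in `memberships` ignores the shape of each cluster. Elongated or strongly overlapping clusters are therefore exactly the cases where this sketch, like the original method, breaks down.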

Like other methods, the PP/NEM approach used for feature extraction could also be implemented in an artificial neural network. Girolami [42] developed an unsupervised generalized learning algorithm for neural networks based on an information-theoretic projection index derived from negentropy. It may be applied both to unsupervised exploratory data analysis and to independent component analysis with an arbitrary number of outputs.

The final step in a spike sorting process like the one implemented here is supervised classification. In this step, the templates obtained through unsupervised clustering are used to discriminate between different waveforms and to decompose overlaps by some measure of similarity.

According to Kim, no satisfactory method for this problem is available in the literature (personal communication). Many others seem to agree [43][50].

Template matching compares the putative superposition waveform with all possible combinations of each pair of templates, at all possible delays in the range of -1 to +1 ms between the two templates. This is clearly impractical and cumbersome when a multichannel recording device with a high channel count records a large number of cells simultaneously [54].
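The cost of exhaustive pairwise matching is easy to quantify. At the 20 kHz sampling rate of the recordings used here, ±1 ms corresponds to ±20 samples. Each putative overlap must therefore be compared with K²·41 candidate superpositions. For K = 10 templates, that is 4100 comparisons per event and channel. The sketch assumes that the onset of the first template is already known from alignment, and it uses residual energy as the similarity measure. Both are simplifying assumptions, and all names are hypothetical.

```python
def place(template, onset, length):
    """Zero window of `length` samples with `template` added at `onset` (clipped at the edges)."""
    out = [0.0] * length
    for i, v in enumerate(template):
        if 0 <= onset + i < length:
            out[onset + i] += v
    return out


def candidate_count(n_templates, max_delay):
    """Number of (a, b, delay) combinations searched by match_pair."""
    return n_templates * n_templates * (2 * max_delay + 1)


def match_pair(window, templates, max_delay=20, base=20):
    """Exhaustive search for the template pair and delay that best explain `window`.

    Template a is assumed to start at sample `base`; template b starts at base + delay,
    with delay in [-max_delay, max_delay]. Returns (residual_energy, a, b, delay).
    """
    n = len(window)
    best = None
    for a, ta in enumerate(templates):
        ya = place(ta, base, n)
        for b, tb in enumerate(templates):
            for delay in range(-max_delay, max_delay + 1):
                yb = place(tb, base + delay, n)
                res = sum((w - p - q) ** 2 for w, p, q in zip(window, ya, yb))
                if best is None or res < best[0]:
                    best = (res, a, b, delay)
    return best
```

The three nested loops make the K²·(2D+1) growth explicit. The full residual computed in every iteration adds a further factor equal to the window length.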

However, more recent methods try to employ wavelet transforms to handle the problem [56][57]. In 1996, Zouridakis and Tam [51] claimed to have implemented a system that successfully discriminates spikes using a shift-invariant wavelet transform with a magnitude and phase representation. Oweiss et al. [54] propose using a Discrete Wavelet Packet Transform (DWPT) to represent multichannel neural data in space, with multiresolution analysis and a binary tree representation. They claim to have no constraints on the time window in which the superposition occurs or on the amount of overlap between spike waveforms. They apparently succeed to a substantial degree in resolving an arbitrary number of spikes overlapping in time, without the need for template matching procedures. In their approach, each source contributing to the signal can be characterized by a pruned binary tree in the wavelet domain that describes the best basis for that neural source. When the analysis window contains a single spike waveform, this amounts to tracking the eigenvector associated with the largest eigenvalue in each node of the binary tree. The higher the degree of correlation with the spike waveform, the larger the corresponding eigenvalues will be. The result is a library of pruned binary trees for the template spike waveforms in the data record. These trees serve as the basis for comparison with the complex spike waveforms.
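The binary tree underlying a wavelet packet representation can be illustrated with the simplest orthonormal filter, the Haar wavelet. This choice is an assumption; Oweiss et al. do not necessarily use it. The sketch decomposes one channel into the full tree and reports the energy per node. It does not implement best-basis pruning or the multichannel eigen-analysis described above.

```python
import math


def haar_step(x):
    """One orthonormal Haar analysis step: (approximation, detail); len(x) must be even."""
    s = 1.0 / math.sqrt(2.0)
    a = [(x[2 * i] + x[2 * i + 1]) * s for i in range(len(x) // 2)]
    d = [(x[2 * i] - x[2 * i + 1]) * s for i in range(len(x) // 2)]
    return a, d


def wavelet_packet(x, levels):
    """Full wavelet packet tree as a dict (level, node) -> coefficients.

    Node 2n holds the approximation branch and node 2n+1 the detail branch of parent n
    (natural ordering, not frequency ordering). len(x) must be divisible by 2**levels.
    """
    tree = {(0, 0): list(x)}
    for lev in range(levels):
        for node in range(2 ** lev):
            a, d = haar_step(tree[(lev, node)])
            tree[(lev + 1, 2 * node)] = a
            tree[(lev + 1, 2 * node + 1)] = d
    return tree


def node_energies(tree, level):
    """Sum of squared coefficients in every node of one tree level."""
    return [sum(c * c for c in tree[(level, n)]) for n in range(2 ** level)]
```

Because the Haar step is orthonormal, every level preserves the total signal energy and only redistributes it among nodes. A best-basis search exploits this by comparing how compactly different subtrees concentrate a spike's energy.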


7. Spike Sorting: Necessity and Effectiveness

Section [S2.2] briefly described the high temporal fluctuations and the phenomenon of clusterized activity in living neural networks. It is still a matter of debate whether the details of these patterns are just noise that should be averaged out to reveal meaningful signals, or whether the precise arrival time of each spike is significant. However, it is now widely agreed that these dynamics are an essential feature of information processing in the brain [5][12][18].

The frequency of spikes increases monotonically with the amplitude of the injected current. An analog input signal is therefore encoded (modulated) into a spike train, which is often referred to as rate coding [58]. De Ruyter van Steveninck et al. [13][59] propose that temporal variability provides a large capacity for carrying information. They further propose that the coding efficiency adapts to the dynamic demands of an agent in its environment. This efficiency is achieved by establishing precise temporal relations between individual action potentials and events in the sensory stimulus. If this is true, it has major implications for the reproducibility and variability of neural spike trains. The authors studied the motion-sensitive neuron of the fly visual system. There, variability of the response to constant stimuli coexists with extreme reproducibility for more natural dynamic stimuli, and the reproducibility directly affects the information content of the spike train. De Ruyter van Steveninck et al. conclude that models that accurately account for the neural response to static stimuli in an experimental setup can significantly underestimate the reliability of signal transfer under more natural conditions. They claim that their observations on the encoding of naturalistic stimuli cannot be understood by extrapolation from quasi-static experiments. Nor do such experiments provide any hint of the timing and counting accuracy that the brain can achieve.

Different views of the neural code may be appropriate in different contexts. However, criticism of classic investigation methods, such as trial averaging and unnatural stimulation techniques, has been steadily increasing over the past few years.

All spike sorting techniques to date fail, either fully or partially, to identify spikes from multiple neurons when these spikes overlap because they occur within a short time interval. Bar-Gad et al. [60] investigated the effects of this failure, termed the ‘shadowing effect’. In general, all methods in pattern recognition suffer from problems common in classical signal detection. These include false positives (noise in the signal is classified as real spikes) and false negatives (real spikes are rejected as noise). Spike sorting methods, however, add further error types. A false match occurs when a spike generated by one unit is assigned to a different unit. A double match occurs when a single waveform is classified as belonging to more than one class. A better SNR and good sorting algorithms can reduce a number of these errors, and some methods to reduce the misidentification caused by overlaps have been developed [22][37][38][43][51]. However, whichever methods are used, overlapping spikes are identified significantly worse than well-separated spikes. An overlap can have several consequences: none of the spikes is identified (complete false negative); only one of the spikes is identified (partial false negative); or the overlapped signal is identified as another, different spike (false match). When the errors in either sorting or identification are systematic, they might cause effects that seem to derive from the properties of the neurons rather than from the classification procedure [60]. Auto- and cross-correlations are significantly affected. Short- and long-term synchronization might thus appear in the cross-correlograms because of sorting problems, even though the neurons fire independently. In 2001, Bar-Gad et al. [60] stated that the methods available at that time did not overcome the shadowing effect completely. This calls for awareness of possible false phenomena derived from the systematic misidentification of overlapping spikes in multi-unit recordings from a single electrode.
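The shadowing effect can be reproduced with a toy simulation of two independent Poisson units. Every spike of the second unit that falls within ±1 ms of a spike of the first is lost, as a complete false negative. Although the units fire independently, the cross-correlogram then shows an artificial empty trough around zero lag. The rates, durations and the overlap window are hypothetical values.

```python
import bisect
import random


def poisson_train(rate_hz, duration_s, rng):
    """Sorted spike times (s) of a homogeneous Poisson process on [0, duration_s)."""
    t, spikes = 0.0, []
    while True:
        t += rng.expovariate(rate_hz)
        if t >= duration_s:
            return spikes
        spikes.append(t)


def shadow(train_a, train_b, window_s):
    """Drop every spike of b lying within +/- window_s of a spike of a (complete false negatives)."""
    kept = []
    for tb in train_b:
        i = bisect.bisect_left(train_a, tb - window_s)
        if i < len(train_a) and train_a[i] <= tb + window_s:
            continue  # overlapping spike of b is lost
        kept.append(tb)
    return kept


def cross_correlogram(train_a, train_b, max_lag_s, bin_s):
    """Counts of lags tb - ta in [-max_lag_s, max_lag_s), binned with width bin_s."""
    n_bins = int(round(2 * max_lag_s / bin_s))
    counts = [0] * n_bins
    for ta in train_a:
        lo = bisect.bisect_left(train_b, ta - max_lag_s)
        hi = bisect.bisect_left(train_b, ta + max_lag_s)
        for tb in train_b[lo:hi]:
            k = int((tb - ta + max_lag_s) / bin_s)
            counts[min(k, n_bins - 1)] += 1
    return counts
```

With 1 ms bins from -10 to +10 ms, the two central bins are populated for the original trains but are empty after shadowing. A naive analysis would read this as a mutual inhibition between the two neurons.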

Masuda et al. [18] investigated ergodicity in models, i.e. the consistency between trial averaging and population averaging. If the modulation of firing rates or correlated firing is relevant to information processing, the brain must calculate these statistics from the spike trains of many neurons. It cannot use many trials, as many statistical investigations do [18]. They point out that “in real situations, what is available to the brain is not the trial statistics but the population statistics.” In brain-machine interface (BMI) applications, multi-channel recordings are spike sorted before the single-unit spike trains are used in computational analysis to decode the neural response. Nevertheless, questions about the necessity and effectiveness of spike sorting remain. Recent investigations [17] have shown that the mutual information between a stimulus signal and the resulting single-unit response decreases exponentially as random spike sorting errors, whether false positives or false negatives, are introduced.

In summary, these and other findings suggest one of two possibilities. Either every single spike of an individual neuron holds crucial information, and its response must be both precise and reliable. Or the brain does not rely on a code that can be corrupted so easily, but on highly redundant and thus robust information from many neurons firing at the same time.

Spike discrimination has been investigated extensively since the mid-1960s. The necessity of extracting single-unit signals from multi-unit recordings has been taken for granted [17]. This report has discussed in great detail how any spike sorting system depends on biological constraints. The author knows of no online, real-time pattern recognition system that can reliably discriminate spike shapes despite their temporal instability. All pattern recognition systems depend highly on the predictability of repeating patterns in their environment. Many authors suggest that the firing patterns observed in neural networks might indeed be entirely deterministic and theoretically computable. It is very doubtful, however, that this still holds once a multielectrode array introduces additional random elements at the neuron-electrode interface (see sections [S3.2] and [S4.]).


8. Lessons Learned

8.1 Summary

This report attempts to shed light on the complex problems involved in building an interface between a machine and a culture of living neurons.

The report briefly described the generation of an action potential in a single cell and the basic properties of the prototype chip. It showed how the chip-neuron interface can introduce unpredictable elements into the recorded spike shape. It also described the inherent physical adaptiveness of interconnected neural cells and the resulting temporal variation of signaling strength in the group. Recent insights into the complex dynamics of natural neural networks were mentioned. The report detailed how spiking patterns might encode information that may degrade significantly when current spike sorting systems are relied upon. It also mentioned criticism of statistics on networks under unnatural conditions, such as static stimulation and trial averaging, and of neglected correlation and shadowing effects.

An unknown number of neurons can contribute to the signal recorded on each electrode of the chip. The resulting problem of overlaps has been discussed extensively. The report gave a brief overview of different approaches to spike sorting and described the implemented unsupervised clustering system. It discussed the potential of the algorithm proposed by Kim et al. [44], as well as related approaches and alternatives. Finally, it questioned the necessity and effectiveness of spike sorting itself.

Although I put much effort into structuring the interdisciplinary information contained in this report, the actual work was not as linear. Many critical aspects surfaced while implementing the unsupervised clustering system suggested by Kim et al. [44]. Most of the biological information, however, was acquired at the end of this thesis, while writing this report. All information known to the author that is relevant to the challenge of spike sorting has been discussed here, to provide a good starting point for future work at PEL.

8.2 Conclusions and Recommendations

Vapnik once stated: “If you possess a restricted amount of information for solving some problem, try to solve the problem directly and never solve a more general problem as an intermediate step. It is possible that the available information is sufficient for a direct solution but is insufficient for solving a more general intermediate problem”.

On the other hand, incorporating as much prior knowledge into the design as possible can considerably increase robustness and reduce computational cost. This is a common rule in all pattern recognition problems.

This report has shown how immensely complex a pattern recognition task like spike sorting can become under two conditions. First, the system producing the signals is still largely unknown and seems to exhibit large variability. Second, the interface adds further unpredictability to the setup, since neither the contact quality nor the number of contributing sources is predictable.

Currently, there are three distinct strategies or perspectives on the problem: (I) to continue along the path of classic spike sorting, attempting to isolate and identify every single spike with sophisticated algorithms, (II) to redesign the interface to some extent in order to improve contact quality, minimize chaotic elements and provide as much information to the spike discrimination system as possible, and (III) not to sort at all, but to treat each channel as if it were a single source. Under the third strategy, results may be obtained simply by training the network with biologically inspired protocols.

Of course, these strategies should be combined. Iteratively, they might lead to communication reliable enough to exploit the processing power of natural neural nets.

(I) New approaches are published on a daily basis. An in-depth study of more recent results and further investigation of the biological constraints are certainly a good path to follow. Judging from the information collected throughout this semester thesis, a hybrid system based on discrete wavelet transforms seems promising.

(II) A partially prestructured network, obtained through cell immobilization or guided growth in specific structures (holes or channels in the surface of the chip), might provide more reliable contact with specific single neurons. The immobilization would only affect a two-dimensional interface plane and not the whole spatially distributed network, thus minimizing the effects of structurally unrealistic conditions. Alternatively, work could continue with an unstructured network, as planned for the INPRO prototype chip. In that case, a smaller spacing between the electrodes could introduce more information into the system, although it would also mean higher background noise levels. Such additional information has also been exploited in more traditional approaches with stereotrodes and tetrodes [53]. When the same source is recorded on several channels at the same time, classic algorithms like independent component analysis can exploit this redundancy for overlap decomposition.

(III) The third strategy would assume that information is processed with a certain fuzziness and redundancy, which makes the system robust. This is common for solutions found by natural evolution and seems very likely in the case of neural information processing as well. On the one hand, neurons very close to the electrode will be highly interconnected. They will hopefully form a cluster with highly correlated activity that can be interpreted as a single signal. On the other hand, neural networks are known to be very adaptive, and this adaptiveness depends directly on the incoming stimulus. Training might induce changes in the network structure, as has been observed in other approaches. In that case, the network might evolve to satisfy our needs: a physical adaptation of the cell to the stimulating electrode might improve contact quality. The cell could also become more significant in its region, acting as a biological gateway that elicits activity in the corresponding functional unit. This approach could save a lot of computational power.

Spike sorting will probably remain a necessary tool for two purposes: observing network activity at the cell level, beneath its functional level, and detecting progressive changes induced by training or other manipulations. However, for the various reasons detailed in this report, statistics based on spike sorting algorithms must be handled wisely, with the underlying biological concepts in mind.

Despite worldwide efforts to understand the brain, still very little is known about neural information processing. It is widely agreed that the human brain is the most complex system known to man. Thus, one should be well aware that, even with the most recent insights, a subunit formed by a small group of cells is in fact still a black box. Future work on the INPRO chip will hopefully reveal more of the mystery of biological systems.

Acknowledgements

I would like to thank my supervisor Sadik Hafizovic for helping me with the mathematical and computational side of this work as well as for proofreading, and especially for encouraging me with his positive attitude throughout the project. We are grateful to Paolo Bonifazi for providing us with real-world data. Thanks also to Pascal Kaufmann for fruitful discussions and encouragement during the early stages of this work.


References

[1] W. L. C. Rutten, “Selective Electrical Interfaces With The Nervous System”, Annu. Rev. Biomed. Eng., 4:407-52, (2002)

[2] M. S. Fee, P. P. Mitra, D. Kleinfeld, “Automatic sorting of multiple unit neuronal signals in the presence of anisotropic and non-Gaussian variability”, Journal of Neuroscience Methods, vol. 69, 175-188, (1996)

[3] R. Segev, “Formation of Electrically Active Clusterized Neural Networks”, Physical Review Letters, vol. 90, No. 16, (2003)

[4] G. Shahaf, S. Marom, “Learning in Networks of Cortical Neurons”, The Journal of Neuroscience, 21(22):8782-8788, (2001)

[5] F. Pinato, S. Battistone, V. Torre, “Statistical independence and neural computation in the leech ganglion”, Biol. Cybern., 83, 119-130, (2000)

[6] A. M. Thomson, A. P. Bannister, A. Mercer, O. T. Morris, “Target and temporal pattern selection at neocortical synapses”, Phil. Trans. R. Soc. Lond. B, 357, 1781-1791, (2002)

[7] D. J. Amit, G. Mongillo, “Spike-Driven Synaptic Dynamics Generating Working Memory States”, Neural Computation, 15, 565-596, (2003)

[8] L. F. Abbott, S. Song, “Temporally asymmetric Hebbian learning and neuronal response variability”, in M. S. Kearns, S. A. Solla, D. A. Cohn (Eds.), Advances in Neural Information Processing Systems 11, Cambridge, MA: MIT Press, (1999)

[9] M. N. Shadlen, W. T. Newsome, “The variable discharge of cortical neurons: implications for connectivity, computation, and information coding”, J. Neurosci., 18(10), 3870-3896, (1998)

[10] G. L. Shaw, E. Harth, A. B. Scheibel, “Cooperativity in brain function: Assemblies of approximately 30 neurons”, Experimental Neurology, 77, 324-358, (1982)

[11] G. L. Shaw, D. J. Silverman, “Simulations of the trion model and the search for the code of higher cortical processing”, in R. M. J. Cotterill (Ed.), Computer Simulation in Brain Science, pp. 189-209, Cambridge: Cambridge University Press, (1988)

[12] Y. Jimbo, A. Kawana, P. Parodi, V. Torre, “The dynamics of a neuronal culture of dissociated cortical neurons of neonatal rats”, Biol. Cybern., vol. 83, 1-20, (2000)

[13] R. R. de Ruyter van Steveninck, G. D. Lewen, S. P. Strong, R. Koberle, W. Bialek, “Reproducibility and Variability in Neural Spike Trains”, Science, vol. 275, (March 1997)

[14] Y. Jimbo, T. Tateno, H. P. C. Robinson, “Simultaneous induction of pathway-specific potentiation and depression in networks of cortical neurons”, Biophysical Journal, vol. 76, 670-678, (Feb. 1999)

[15] G. Chechik, “Spike-Timing-Dependent Plasticity and Relevant Mutual Information Maximization”, Neural Computation, 15, 1481-1510, (2003)

[16] Multichannel Systems MCS GmbH, Germany, http://www.multichannelsystems.com/

[17] D. S. Won, D. Y. Chong, P. D. Wolf, “Effects Of Spike Sorting Error On Information Content In Multi-Neuron Recordings”, Proc. of the 1st Internat. IEEE EMBS Conference on Neural Engineering, (2003)

[18] N. Masuda, K. Aihara, “Ergodicity of Spike Trains: When Does Trial Averaging Make Sense?”, Neural Computation, 15, 1341-1372, (2003)

[19] W. Franks, F. Heer, I. McKay, S. Taschini, R. Sunier, C. Hagleitner, A. Hierlemann, H. Baltes, “CMOS monolithic microelectrode array for stimulation and recording of natural neural networks”, Proc. IEEE, (2003)

[20] U. S. Bhalla, “Understanding complex signalling networks through models and metaphors”, Progress in Biophysics & Molecular Biology, 81, 45-65, (2003)

[21] B. Frankenhaeuser, A. F. Huxley, “The action potential in the myelinated nerve of Xenopus laevis as computed on the basis of voltage clamp data”, J. Physiol., 171:302-15, (1964)

[22] A. F. Atiya, “Recognition of Multiunit Neural Signals”, IEEE Trans. Biomed. Eng., vol. 39, No. 7, (July 1992)

[23] M. Scholl, C. Sprössler, M. Denyer, M. Krause, K. Nakajima, A. Maelicke, W. Knoll, A. Offenhäuser, “Ordered networks of rat hippocampal neurons attached to silicon oxide surfaces”, Journal of Neuroscience Methods, 104, 65-75, (2000)

[24] M. P. Maher, J. Pine, J. Wright, Y. Tai, “The neurochip: a new multielectrode device for stimulating and recording from cultured neurons”, Journal of Neuroscience Methods, 87, 45-56, (1999)

[25] G. Zeck, P. Fromherz, “Noninvasive neuroelectronic interfacing with synaptically connected snail neurons immobilized on a semiconductor chip”, PNAS, (2001)

[26] G. R. Holt, C. Koch, “Electrical Interactions via the Extracellular Potential Near Cell Bodies”, Journal of Computational Neuroscience, 6, 169-184, (1999)

[27] J. McFadden, “Evidence For An Electromagnetic Field Theory Of Consciousness”, (2002)

[28] A. K. Jain, R. P. W. Duin, J. Mao, “Statistical Pattern Recognition: A Review”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, No. 1, (2000)

[29] P. J. Huber, “Projection pursuit”, Ann. Statist., vol. 13, pp. 435-475, (1985)

[30] X. Fu, L. Wang, “Data Dimensionality Reduction With Application to Simplifying RBF Network Structure and Improving Classification Performance”, IEEE Trans. on Systems, Man, and Cybernetics, vol. 33, No. 3, (June 2003)

[31] T. W. Lee, M. Girolami, T. J. Sejnowski, “Independent component analysis using an extended infomax algorithm for mixed sub-Gaussian and super-Gaussian sources”, Neural Computat., vol. 11, pp. 417-441, (1999)

[32] J. Karhunen, A. Hyvärinen, R. Vigario, J. Hurri, E. Oja, “Applications of Neural Blind Source Separation to Signal and Image Processing”, IEEE, (1997)

[33] T. Lee, M. Girolami, T. Sejnowski, “Independent Component Analysis Using an Extended Infomax Algorithm for Mixed Subgaussian and Supergaussian Sources”, Neural Computation, 11, 417-441, (1999)

[34] T. Lee, M. S. Lewicki, M. Girolami, T. J. Sejnowski, “Blind Source Separation of More Sources Than Mixtures Using Overcomplete Representations”, IEEE Signal Processing Letters, vol. 6, No. 4, (April 1999)

[35] T. Lee, M. S. Lewicki, “Unsupervised Image Classification, Segmentation, and Enhancement Using ICA Mixture Models”, IEEE, (2002)

[36] X. Giannakopoulos, J. Karhunen, E. Oja, “A Comparison of Neural ICA Algorithms Using Real-World Data”, IEEE, (1999)

[37] M. S. Lewicki, “A review of methods for spike sorting: The detection and classification of neural action potentials”, Network: Computation Neural Syst., vol. 9, pp. R53-R78, (1998)

[38] M. S. Lewicki, “Bayesian modeling and classification of neural signals”, Neural Computat., vol. 6, pp. 1005-1030, (1994)

[39] T. Kohonen, “Self-Organizing Maps”, 2nd ed., Berlin, Germany: Springer-Verlag, (1997)

[40] F. Öhberg, H. Johansson, M. Bergenheim, J. Pedersen, M. Djupsjöbacka, “A neural network approach to real-time spike discrimination during simultaneous recording from several multi-unit nerve filaments”, Journal of Neuroscience Methods, 64, 181-187, (1996)

[41] K. H. Kim, S. J. Kim, “Neural spike sorting under nearly 0-dB signal-to-noise ratio using nonlinear energy operator and artificial neural network classifier”, IEEE Trans. Biomed. Eng., vol. 47, pp. 1406-1411, (Oct. 2000)

[42] M. Girolami, A. Cichocki, S. I. Amari, “A common neural-network model for unsupervised exploratory data analysis and independent component analysis”, IEEE Trans. Neural Networks, vol. 9, pp. 1495-1501, (1998)

[43] R. Chandra, L. M. Optican, “Detection, classification, and superposition resolution of action potentials in multiunit single channel recordings by an on-line real-time neural network”, IEEE Trans. Biomed. Eng., vol. 44, pp. 403-412, (May 1997)

[44] K. H. Kim, S. J. Kim, “Method for Unsupervised Classification of Multiunit Neural Signal Recording Under Low SNR”, IEEE Trans. Biomed. Eng., vol. 50, No. 4, (April 2003)

[45] K. H. Kim, S. J. Kim, “Method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio”, IEEE Trans. Biomed. Eng., (2003), to be published

[46] M. A. Carreira-Perpinan, “Mode-finding for mixtures of Gaussian distributions”, IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, pp. 1318-1323, (Nov. 2000)

[47] A. Hyvärinen, “Fast and robust fixed-point algorithms for independent component analysis”, IEEE Trans. Neural Networks, vol. 10, pp. 626-634, (May 1999)

[48] A. Hyvärinen, E. Oja, “Independent Component Analysis: A Tutorial”, Neural Networks, (1999)

[49] S. N. Kudoh, T. Taguchi, “A simple exploratory algorithm for the accurate and fast detection of spontaneous synaptic events”, Biosensors and Bioelectronics, 17, 773-782, (2002)

[50] B. C. Wheeler, W. J. Heetderks, “A comparison of techniques for classification of multiple neural signals”, IEEE Trans. Biomed. Eng., vol. BME-29, pp. 752-759, (1982)

[51] G. Zouridakis, D. C. Tam, “Identification of reliable spike templates in multi-unit extracellular recordings using fuzzy clustering”, Comp. Biol. Med., vol. 27, No. 1, pp. 9-18, (1997)

[52] G. Zouridakis, D. C. Tam, “Identification of reliable spike templates in multi-unit extracellular recordings using fuzzy clustering”, Comp. Meth. Prog. Biomed., vol. 61, pp. 91-98, (2000)

[53] C. M. Gray, P. E. Maldonado, M. Wilson, B. McNaughton, “Tetrodes markedly improve the reliability and yield of multiple single-unit isolation from multi-unit recordings in cat striate cortex”, Journal of Neuroscience Methods, 63, 43-54, (1995)

[54] K. G. Oweiss, D. J. Anderson, “Spike Superposition Resolution In Multichannel Extracellular Neural Recordings: A Novel Approach”, Proc. of the 1st International IEEE EMBS, 20-22, (March 2003)

[55] S. N. Gozani, J. P. Miller, “Optimal discrimination and classification of neuronal action potential waveforms from multiunit, multichannel recordings using software-based linear filters”, IEEE Trans. Biomed. Eng., vol. 41, no. 4, pp. 358-372, (1994)

[56] S. Mallat, “Zero-crossing of a wavelet transform”, IEEE Trans. Inform. Theory, 37, (1991)

[57] S. Mallat, S. Zhong, “Characterization of signals from multiscale edges”, IEEE Trans. Pattern Anal. Mach. Intell., 14, (1992)

[58] Z. Nenadic, B. Ghosh, “Computation with Biological Neurons”, Proc. of the American Control Conference, 25-27, (June 2001)

[59] F. Rieke, D. Warland, R. de Ruyter van Steveninck, W. Bialek, “Spikes: Exploring the Neural Code”, Cambridge, MA: The MIT Press, (1996)

[60] I. Bar-Gad, Y. Ritov, E. Vaadia, H. Bergman, “Failure in identification of overlapping spikes from multiple neuron activity causes artificial correlations”, Journal of Neuroscience Methods, 107, 1-13, (2001)

[61] X. Yang, S. A. Shamma, “A Totally Automated System for the Detection and Classification of Neural Spikes”, IEEE Trans. Biomed. Eng., vol. 35, No. 10, (Oct. 1988)
