Magnification control for batch neural gas

    Barbara Hammer

    Institute of Computer Science, Clausthal University of Technology, Germany

    Alexander Hasenfuss

    Institute of Computer Science, Clausthal University of Technology, Germany

    Thomas Villmann

Clinic for Psychotherapy, Universität Leipzig, Germany

    Abstract

Neural gas (NG) constitutes a very robust clustering algorithm which can be derived as stochastic gradient descent from a cost function closely connected to the quantization error. In the limit, an NG network samples the underlying data distribution. Thereby, the connection is not linear; rather, it follows a power law with a magnification exponent different from the information theoretically optimum one in adaptive map formation. A couple of schemes exist to explicitly control the exponent, such as local learning, which leads to a small change of the learning algorithm of NG. Batch NG constitutes a fast alternative optimization scheme for NG vector quantizers which has been derived from the same cost function and which constitutes a fast Newton optimization scheme. It possesses the same magnification factor (different from 1) as standard online NG. In this paper, we propose a method to integrate magnification control by local learning into batch NG. Thereby, the key observation is a link of local learning to an underlying cost function which opens the way towards alternative, e.g. batch, optimization schemes. We validate the learning rule derived from this altered cost function in an artificial experimental setting, and we demonstrate the benefit of magnification control for sampling rare events in a real data set.

    1 Introduction

Vector quantization constitutes an important technical problem in different areas of application such as data mining, control, image compression, or information representation.

    Email address: [email protected] (Barbara Hammer).

    Preprint submitted to Elsevier Science 29 September 2006


Thereby, the tasks are diverse, such as the minimization of the quantization error, optimum information transfer, classification, visualization, or topographic map formation [17,25]. Self-organizing quantization processes are a common property of many regions of the brain, including the visual, auditory, and somatosensory cortex. Inspired by nature, a variety of self-organization mechanisms has been proposed, such as [9,23]. Apart from their wide applicability, these algorithms captivate by their intuitive learning rules and easy interpretability due to prototypes. Neural gas (NG) as proposed in [14] constitutes a particularly robust vector quantization method. It has the benefit that, unlike the self-organizing map [8,9], its dynamics follow a cost function which is closely connected to the standard quantization error of vector quantization. The behavior of NG can be interpreted as an overlay of standard vector quantization and a diffusion process introduced into the behavior by means of neighborhood cooperation. This diffusion process accounts for a robustness of the algorithm with respect to the specific initialization. NG networks do not propose an explicit neighborhood topology for the final map formation; however,

they can be extended towards topographic maps with data optimum topology by connecting the respective first and second winner for all data points [15]. As shown in [15], this yields a Delaunay triangulation of the underlying data manifold if the data points are sufficiently dense (as defined in [15]). In particular, an adaptation of the prototypes as provided by NG according to their rank corresponds to this respective topology.

A characteristic property of vector quantizers consists in a selective magnification of regions of interest. This corresponds to a specific connection between the density of prototypes and the density of stimuli. Usually, regions with high data density attract more prototypes than regions which are only sparsely covered.

An information theoretically optimum magnification factor of one corresponds to an exact adjustment of the prototypes according to the underlying data distribution. That means, the amount of data is the same in the receptive field of every prototype. In this case, the amount of information which is conserved when substituting the points in a receptive field by its prototype is maximized. Therefore, optimum information transfer corresponds to this setting. A magnification factor of one is achieved by approaches which explicitly optimize the information transfer or related quantities [13]. For a variety of popular alternatives, however, the magnification follows a power law with an exponent different from one [4,14,24]. Starting with the work [1], schemes to control the magnification factor have been proposed in the literature; for recent results see e.g. [20]. Popular methods include local learning, where the learning rate of the training algorithm is adjusted according to the local data density; winner relaxing strategies, where the learning rate of the winner is enlarged by an additional correction to achieve optimum information transfer; and convex and concave learning, where an exponent is added to the adaptation vector of the prototypes into the direction of the actual data point. In all cases, magnification control changes the learning scheme and allows one to achieve a magnification factor of one or beyond. An explicit control is particularly interesting for application areas where rare events should be suppressed or, contrarily, emphasized. A magnification factor larger or smaller than one, respectively, allows this goal to be achieved.


Explicit magnification control has proven beneficial in several tasks in robotics and image inspection [16,22,21]. In addition, this effect corresponds to biological phenomena such as the perceptual magnet effect, which leads to an emphasis of rarely occurring stimuli [7,12].

The magnification factor of online NG is D/(D + 2) [14], D being the intrinsic (Hausdorff) dimension of the data manifold of stimuli. Thus, it is different from one in general, and it approaches one only for very large intrinsic dimensionality, which is usually not the case. The magnification factor can be controlled using e.g. local learning, as already mentioned above. Local learning changes the learning rate by a factor depending on the local data density [19]. It constitutes a particularly intuitive learning scheme which is plausible from a biological point of view. We will focus on the local learning method in the following.

Neural gas is a very robust but computationally complex method, since several thousand learning cycles or even more might be necessary for convergence [14]. For training examples that are known a priori, an alternative batch update scheme becomes possible [3]: the cost function of neural gas is optimized in turn for the prototype locations and for the hidden variables given by the ranks. Since each step takes all training patterns into account and moves directly into the next local optimum, far fewer training cycles are necessary. One can show that batch optimization can be interpreted as a (fast) Newton optimization scheme for the cost function [3]. Unlike the batch self-organizing map (SOM), which easily suffers from topological mismatches unless it is initialized appropriately [5], a data optimum topology is achieved for NG schemes.

Since it optimizes the original NG cost function, batch NG follows the same power law for the magnification factor as online NG. Thus, a factor D/(D + 2), which is different from one, is achieved. Here we introduce magnification control for batch NG by including local learning into the update formulas. The link becomes possible because local learning can be related to a modified cost function which can be optimized in batch mode. Intuitive update formulas arise where the new prototype locations are determined as the average of the data points weighted according to the rank and the local data density. As for standard batch NG, one can prove the convergence of batch optimization of this altered cost function. We demonstrate this strategy in a controlled example where the property of optimum information transfer can be tested.

In addition, we demonstrate the benefit of explicit magnification control in one real life experiment from remote sensing image analysis.

    2 Neural Gas

Assume data vectors v ∈ ℝ^d are given as stimuli, distributed according to P(v). The goal of neural gas is to represent the data points faithfully by prototypes w_i ∈ ℝ^d, i = 1, …, n.


This objective is formalized as minimizing the cost function

E(W) = \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \int h_\sigma(k_i(v, W)) \, \|v - w_i\|^2 \, P(v) \, dv \qquad (1)

where W denotes the set of prototypes, k_i(v, W) = |{ w_j : ‖v − w_j‖² < ‖v − w_i‖² }| is the rank of prototype i, h_σ(t) is an exponentially shaped curve for t ≥ 0 such as h_σ(t) = exp(−t/σ) with neighborhood range σ > 0, and C(σ) is a normalization constant. This function is closely related to the standard quantization error [14]. In order to avoid the sensitivity of standard vector quantization to initialization, it integrates neighborhood cooperation into the cost function. The corresponding online adaptation rule of NG is derived as stochastic gradient descent by taking the derivative of (1). Given a stimulus v_j, the adaptation

\Delta w_i = \epsilon \cdot h_\sigma(k_i(v_j, W)) \, (v_j - w_i) \qquad (2)

with learning rate ε > 0 results. This learning rule adapts all prototypes according to their rank given v_j.
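To make the online rule concrete, the following minimal sketch implements one stochastic update step of eq. (2) in Python/NumPy. The function name and the array layout (prototypes stored as rows of W) are illustrative choices and not part of the original formulation.

    import numpy as np

    def online_ng_step(W, v, eps, sigma):
        """One online neural gas step, eq. (2): every prototype moves towards the
        stimulus v, weighted by exp(-rank / sigma)."""
        dists = np.sum((W - v) ** 2, axis=1)   # squared distances ||v - w_i||^2
        ranks = np.argsort(np.argsort(dists))  # k_i(v, W): 0 for the winner, n-1 for the farthest
        h = np.exp(-ranks / sigma)             # neighborhood function h_sigma
        W += eps * h[:, None] * (v - W)        # rank-weighted move towards v
        return W

A complete training run would present randomly drawn stimuli repeatedly while annealing eps and sigma towards zero.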

This adaptation can be applied in online scenarios such as a robot exploring an environment; however, usually several thousand steps or even more are necessary for convergence, and the procedure can become quite costly due to the usually only linear convergence of stochastic gradient descent methods. If data are given a priori, an alternative batch adaptation scheme can be applied. For a given (finite) training set v_1, …, v_p of training data the cost function (1) becomes

E(W) = \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, \|v_j - w_i\|^2 . \qquad (3)

This function cannot be optimized analytically in a direct manner. Therefore, in batch optimization hidden variables are introduced: the term k_i(v_j, W) in eq. (3) is substituted by a free variable k_{ij} which is to be optimized under the condition that k_{1j}, …, k_{nj} yields a permutation of {0, …, n − 1} for each j. The cost function (3) becomes

E(W, K) = \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \sum_{j=1}^{p} h_\sigma(k_{ij}) \, \|v_j - w_i\|^2 \qquad (4)

where K collects the vector of hidden variables. Note that a global optimum of E(W, K) under the constraint that k_{ij} constitutes a permutation of {0, …, n − 1} for fixed j necessarily assigns the ranks to k_{ij}; thus, the global optima of E(W, K) and E(W) coincide. It is possible to optimize E(W, K) analytically if either W or K is fixed.


For fixed w_i, the optimum variables k_{ij} are given by the rank k_i(v_j, W). In turn, for fixed k_{ij}, the optimum assignments of the prototypes have the form

w_i = \frac{\sum_{j=1}^{p} h_\sigma(k_{ij}) \, v_j}{\sum_{j=1}^{p} h_\sigma(k_{ij})} .

Batch NG consecutively performs these two optimization steps until convergence, which can usually be observed after only a few epochs [3]. Batch NG always converges to a fixed point of the assignments, which is a local optimum of the original (discrete) cost function of NG unless two prototypes have the same distance from one data point for the final prototype locations (a set of measure zero in concrete settings), as shown in [3].
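A compact sketch of the two alternating batch steps (rank assignment, then prototype averaging) is given below, using the exponential annealing of the neighborhood range described in Section 4; initialization on randomly chosen data points and the default number of epochs are illustrative assumptions, not prescriptions from the paper.

    import numpy as np

    def batch_ng(V, n_prototypes, n_epochs=50, sigma0=None, seed=None):
        """Batch neural gas: alternate optimal ranks k_ij and optimal prototypes w_i."""
        rng = np.random.default_rng(seed)
        p, d = V.shape
        sigma0 = n_prototypes / 2 if sigma0 is None else sigma0
        W = V[rng.choice(p, n_prototypes, replace=False)].copy()    # init on data points
        for t in range(n_epochs):
            sigma = sigma0 * (0.01 / sigma0) ** (t / n_epochs)      # annealed neighborhood range
            d2 = ((V[:, None, :] - W[None, :, :]) ** 2).sum(-1)     # (p, n) squared distances
            ranks = np.argsort(np.argsort(d2, axis=1), axis=1)      # hidden variables k_ij
            H = np.exp(-ranks / sigma)                              # h_sigma(k_ij)
            W = (H.T @ V) / H.sum(axis=0)[:, None]                  # weighted means of the data
        return W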

3 Magnification control

Optimization schemes for the NG cost function result in a map formation which obeys a magnification power law with a magnification exponent different from one, as demonstrated in the literature [14]. In this argumentation, the effect of an average update Δw_i on the map behavior is investigated. Thereby, several properties of the map are used, such as the fact that the neighborhood function h_σ(k_i(v, W)) converges sufficiently fast to 0 such that terms of higher order can be neglected.

In addition, the system is considered in the limit of many prototypes, such that a continuum can be assumed. Then, the weight density ρ(w_i) of the map is linked to the density P(w_i) given by the input space by

\rho(w_i) \propto P(w_i)^{\alpha}

with a magnification factor α = D/(D + 2), where D ≤ d is the effective dimensionality of the data embedded in ℝ^d [14]. For a given finite number of prototypes and patterns this law approximately describes concrete maps. An information theoretic optimum is achieved for α = 1, in which case each prototype accumulates the same number of patterns in its respective receptive field. For low intrinsic dimensionality D, which is often the case in concrete settings, the magnification factor is considerably smaller than 1. This has the consequence that regions of the input space with low data density are emphasized.

    Local learning extends the learning rate in eq. (2) by a factor which depends on thelocal data density:

\Delta w_i = \epsilon_0 \, P(w_{s(v_j)})^m \, h_\sigma(k_i(v_j, W)) \, (v_j - w_i) \qquad (5)


where ε_0 > 0 is the learning rate and s(v_j) is the winner index for stimulus v_j. P describes the data density, and m is a constant which controls the magnification exponent. The factor P(w_{s(v_j)})^m reduces to one for m = 0, leading to standard NG. For different values, a local learning factor depending on the data density at the winner location is added.

For this learning rule, the power law ρ(w_i) ∝ P(w_i)^{α̂} results, where

\hat{\alpha} = (m + 1)\,\alpha = (m + 1)\, D/(D + 2) \qquad (6)

as shown in [19]. An information theoretically optimum factor is obtained for m = 2/D. Larger values of m emphasize input regions with high density, whereas smaller values focus on regions with rare stimuli. To apply the learning rule (5), the distribution P as well as the effective data dimensionality D have to be estimated from the data (using e.g. Parzen windows and the box counting dimension, respectively).
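For illustration, a simple Parzen window estimate of P at the training points with a Gaussian kernel might look as follows; the bandwidth heuristic (average pairwise distance divided by 3) is the one used in the experiments of Section 4, while the function name and the exact kernel normalization are illustrative choices.

    import numpy as np

    def parzen_density(V, bandwidth=None):
        """Parzen window (Gaussian kernel) estimate of the density P(v_j) at every training point."""
        p, d = V.shape
        d2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(-1)       # pairwise squared distances
        if bandwidth is None:
            # heuristic used in the experiments: average pairwise distance divided by 3
            bandwidth = np.sqrt(d2)[~np.eye(p, dtype=bool)].mean() / 3.0
        K = np.exp(-d2 / (2.0 * bandwidth ** 2))
        return K.mean(axis=1) / (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)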

    Here, we consider the similar learning rule

\Delta w_i = \epsilon_0 \, P(v_j)^m \, h_\sigma(k_i(v_j, W)) \, (v_j - w_i) \qquad (7)

where the local density at the location of the stimulus is taken instead of the density at the winner. The average of the learning rule (7) can be formulated as an integral

\Delta w_i \propto \int P(v)^m \, h_\sigma(k_i(v, W)) \, (v - w_i) \, P(v) \, dv .

In the limit of a continuum of prototypes, w_{s(v)} = v holds; thus, the average update yields exactly the same result as the original one (5) proposed in [19]. Since the magnification factor of local learning has been derived under the assumption of a continuum of prototypes with w_{s(v)} = v, the same magnification factor α̂ = (m + 1)α results for this altered learning rule. This alternative update (7) has the benefit that it constitutes a stochastic gradient descent of the cost function

E_m(W) = \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \int P(v)^m \, h_\sigma(k_i(v, W)) \, \|v - w_i\|^2 \, P(v) \, dv \qquad (8)

as shown in Appendix A. Thus, learning schemes which optimize the cost function E_m(W) yield a map formation with magnification factor α̂ as given by eq. (6).

The formulation of local learning by means of a cost function opens the way towards an extension of control schemes to batch learning: for a given discrete set of training stimuli, the cost function (8) becomes

E_m(W) = \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, \|v_j - w_i\|^2 \, P(v_j)^m . \qquad (9)


As before with eq. (4), we substitute the terms k_i(v_j, W) by hidden variables k_{ij} which are chosen from {0, …, n − 1} such that k_{1j}, …, k_{nj} constitutes a permutation of {0, …, n − 1}:

E_m(W, K) = \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \sum_{j=1}^{p} h_\sigma(k_{ij}) \, \|v_j - w_i\|^2 \, P(v_j)^m . \qquad (10)

Batch optimization in turn determines optimum k_{ij}, given the prototype locations, and optimum prototype locations, given the values k_{ij}, as follows. Obviously, the optimum k_{ij} are given by the ranks for fixed W. It can further be seen, by setting the partial derivatives of eq. (10) to zero,

\frac{\partial E_m(W, K)}{\partial w_i} = \frac{1}{C(\sigma)} \sum_{j=1}^{p} h_\sigma(k_{ij}) \, (w_i - v_j) \, P(v_j)^m \overset{!}{=} 0 ,

that the optimum w_i are given by the average of the points weighted by the rank and the local data density. The following iterative update scheme results:

(i) \; k_{ij} = |\{\, w_l : \|v_j - w_l\|^2 < \|v_j - w_i\|^2 \,\}| ,

(ii) \; w_i = \frac{\sum_j h_\sigma(k_{ij}) \, P(v_j)^m \, v_j}{\sum_j h_\sigma(k_{ij}) \, P(v_j)^m} \qquad (11)

It is shown in Appendix B that this procedure converges in a finite number of steps towards a fixed point. The fixed point is a local minimum of the cost function (9) if the distances of the final prototype locations from the data points are mutually different (which is almost surely the case in concrete settings). In analogy to [2,3] one can show that the adaptation scheme (11) can be interpreted as a Newton optimization scheme for the weights, as shown in Appendix C. Thus, a fast batch adaptation scheme is obtained with magnification coefficient (m + 1) D/(D + 2) which can explicitly be controlled by the quantity m. As before, the local data density P(v_j) has to be estimated, e.g. using Parzen windows. The intrinsic data dimensionality D can be estimated using e.g. a Grassberger-Procaccia analysis [6], such that a value m which yields optimum information transfer can be determined.
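Putting eq. (11) into the batch loop sketched above gives the following outline; the densities P(v_j) are assumed to be precomputed (e.g. with the Parzen sketch above), and all names are again illustrative.

    import numpy as np

    def batch_ng_magnification(V, P, m, n_prototypes, n_epochs=50, sigma0=None, seed=None):
        """Batch NG with magnification control, eq. (11): prototypes become averages
        of the data weighted by h_sigma(k_ij) * P(v_j)^m."""
        rng = np.random.default_rng(seed)
        p, d = V.shape
        sigma0 = n_prototypes / 2 if sigma0 is None else sigma0
        W = V[rng.choice(p, n_prototypes, replace=False)].copy()
        pm = P ** m                                                 # density weights P(v_j)^m
        for t in range(n_epochs):
            sigma = sigma0 * (0.01 / sigma0) ** (t / n_epochs)
            d2 = ((V[:, None, :] - W[None, :, :]) ** 2).sum(-1)
            ranks = np.argsort(np.argsort(d2, axis=1), axis=1)      # step (i): ranks k_ij
            G = np.exp(-ranks / sigma) * pm[:, None]                # h_sigma(k_ij) * P(v_j)^m
            W = (G.T @ V) / G.sum(axis=0)[:, None]                  # step (ii): weighted means
        return W

For optimum information transfer one would choose m = 2/D, with the intrinsic dimensionality D estimated e.g. by a Grassberger-Procaccia analysis.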

    4 Experiments

For all experiments the initial neighborhood range σ_0 is chosen as n/2, with n the number of neurons used. The neighborhood range σ(t) decreases exponentially with the number of adaptation steps t according to σ(t) = σ_0 · (0.01/σ_0)^{t/t_max} (cf. [14]). The value t_max is given by the number of epochs of a training run.


[Figure 1: entropy of the trained maps plotted against the control parameter m, one curve for each of d = 1, 2, 3.]

Fig. 1. Entropy of the map formation for different values m of magnification control and training sets of intrinsic dimensionality d ∈ {1, 2, 3}. The arrows indicate the expected optima of the entropy according to the underlying theory.

    Control experiment

In a first control experiment we use the setting as proposed e.g. in [20]. We use the distribution (x_1, …, x_d, ∏_{i=1}^{d} sin(π x_i)) for d ∈ {1, 2, 3}, with x_i uniformly distributed in [0, 1]. Thus, the intrinsic data dimensionality is known in these examples. The number of stimuli is 2500 for d = 1, 5000 for d = 2, and 10000 for d = 3. These numbers account for the fact that the number of data points necessary to sufficiently sample a d-dimensional data space grows exponentially with d. Due to the computational complexity, a c · 2^{d−1} scheme with a large constant c = 2500 was chosen instead of c^d.
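A sketch for generating these training sets, under the assumption that the embedding is indeed the product of sines with a factor π inside the sine (as reconstructed above), could look as follows:

    import numpy as np

    def sine_manifold(d, n_samples, seed=None):
        """d-dimensional manifold embedded in R^(d+1): (x_1, ..., x_d, prod_i sin(pi * x_i))."""
        rng = np.random.default_rng(seed)
        X = rng.uniform(0.0, 1.0, size=(n_samples, d))
        last = np.prod(np.sin(np.pi * X), axis=1, keepdims=True)
        return np.hstack([X, last])

    # training set sizes of the control experiment: 2500, 5000, 10000 for d = 1, 2, 3
    V1, V2, V3 = (sine_manifold(d, 2500 * 2 ** (d - 1)) for d in (1, 2, 3))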

Optimum information transfer for NG with magnification control can be expected for values which yield α̂ = (m + 1) D/(D + 2) = 1, hence m = 2 (d = 1), m = 1 (d = 2), and m = 2/3 (d = 3). We train an NG network with magnification control for control values m ∈ [−1.5, 3.5] (step size 0.25) such that the overall behavior of the local learning rule for different m can be observed. An NG network with 50 neurons, initial neighborhood range 25, and 200 epochs per training run has been used. The reported results have been averaged over 20 runs. The data density P(v) has been estimated by a Parzen window estimator with bandwidth given by the average distance between training points divided by 3.

class number   percentage   surface cover type
1              17.30%       scotch pine
2              10.46%       Douglas fir
3               5.32%       pine/fir
4               7.93%       mixed pine forest
5               4.26%       supple/prickle pine
6               6.13%       aspen/mixed pine forest
7               5.05%       without vegetation
8               8.12%       aspen
9               0.48%       water
10              2.83%       moist meadow
11              3.78%       bush land
12              7.78%       grass/pastureland
13             19.77%       dry meadow
14              0.79%       alpine vegetation

Table 1. Surface cover classes and their respective percentages.

The information theoretic quality of the information transfer of the map can be judged by counting the balance of patterns in the receptive fields: optimum information transfer is achieved when all receptive fields contain equal shares of the data. We count the number of data points of the training set in the receptive field of a given prototype, normalized by the total number of data points, and report the entropy of these relative frequencies. The resulting values are reported in Fig. 1. The entropy should be maximal for optimum information transfer, i.e. we expect the optimum at m = 2 (d = 1), m = 1 (d = 2), and m = 2/3 (d = 3), respectively. As indicated by the arrows, the experimental optimum of the curves is very close to the expected theoretical value in all cases, thus confirming the theory presented in this paper.
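The evaluation described above can be sketched as follows: assign every training point to its winner, take the relative sizes of the receptive fields, and compute their entropy (function name illustrative).

    import numpy as np

    def receptive_field_entropy(V, W):
        """Entropy of the relative number of training points per receptive field;
        it is maximal, log(n), when every prototype wins equally often."""
        d2 = ((V[:, None, :] - W[None, :, :]) ** 2).sum(-1)
        winners = d2.argmin(axis=1)                        # closest prototype per data point
        q = np.bincount(winners, minlength=W.shape[0]) / V.shape[0]
        q = q[q > 0]
        return -(q * np.log(q)).sum()

With the 50 prototypes of the control experiment, the maximal attainable value is log 50 ≈ 3.91.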

    Remote sensing image analysis

In geophysics, geology, astronomy, and many environmental applications, airborne and satellite-borne remote sensing spectral imaging has become one of the most advanced tools for collecting vital information about the surface of the Earth and other planets. Thereby, automatic classification of intricate spectral signatures has turned out to be far from trivial: discrimination among many surface cover classes and discovery of spatially small, interesting species proved to be an insurmountable challenge to many traditional methods. Usually, spectral images consist of millions [...]


This behavior can be quantified by counting the number of neurons responsible for a given class. Thereby, neurons are labeled according to a majority vote over their receptive field. The number of hits for every class, averaged over 10 runs, is depicted in Fig. 6. m = 0 corresponds to standard NG; m = 0.64 is close to the information theoretic optimum. m < 0 focuses on rare events, whereas large values of m lead to final prototype locations in typical regions, with the center of gravity as the limit. The depicted values are m ∈ {−0.9836, −0.91817, −0.59, 0, 0.64, 2.27, 3.9}, which correspond to magnification factors α̂ ∈ {0.01, 0.05, 0.25, 0.61, 1.0, 2.0, 3.0}, assuming D ≈ 3.1414. One can see that small classes at the borders of the data set (class 9, water, and class 14, alpine vegetation) are represented by neurons only for small values of m which emphasize rare events. In the limit of large m, class 1 (scotch pine), which is the second largest class and located near the center of gravity, accumulates most neurons. Thus, magnification control allows, depending on the control value, to detect rare cover types or, conversely, to focus on the most representative surface cover types in inspections.
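The counting procedure can be sketched as follows, assuming integer class labels 0, …, n_classes − 1 for the training points; labeling by majority vote over each receptive field and the per-class hit count follow the description in the text, while the function name is an illustrative choice.

    import numpy as np

    def neurons_per_class(V, labels, W, n_classes):
        """Label each prototype by a majority vote over its receptive field and
        count how many prototypes ('hits') every class obtains."""
        d2 = ((V[:, None, :] - W[None, :, :]) ** 2).sum(-1)
        winners = d2.argmin(axis=1)
        hits = np.zeros(n_classes, dtype=int)
        for i in range(W.shape[0]):
            field = labels[winners == i]                   # labels falling into receptive field i
            if field.size:                                 # prototypes with empty fields get no label
                hits[np.bincount(field, minlength=n_classes).argmax()] += 1
        return hits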

    5 Discussion

By linking local learning to a general cost function, we have transferred magnification control by local learning to fast batch NG. We demonstrated the applicability in a controlled experiment as well as in one real life example. Thereby, because of the fast convergence of batch NG, the control scheme is quite robust. Magnification control opens the way towards interesting applications: NG itself puts a slight focus on regions of the data space which are sparsely sampled. This might lead to unwanted effects if outliers are present. Magnification control can achieve factors larger than one and thus suppress outliers of the data set. However, a further magnification of sparsely sampled regions, which goes beyond the standard exponent of NG, might also be useful, as demonstrated in the real life experiment. Generally, a magnification of rare events is relevant for the classification of unbalanced classes, the visualization of uncommon effects, or the modeling of attention [18,11,12,16,22,21].

Apart from local learning, further schemes for magnification control exist. Among these methods, so-called convex/concave learning has the benefit that the local data density need not be estimated during training [20]. Concave/convex learning adds an exponent to the adaptation term, resulting in a term of the form (v − w_i)^ξ. Obviously, this term can be obtained as the derivative of (v − w_i)^{ξ+1}/(ξ + 1), such that an underlying cost function exists at least for the discrete case. Hence this scheme can also be transferred to batch NG, which will be the subject of future investigations.

We would like to mention that batch NG can naturally be transferred to proximity data for which no embedding into a Euclidean vector space is available. The key technique is thereby the substitution of the mean of the data points by the generalized median, as proposed in [10].


The update formulas for batch NG which incorporate magnification control can immediately be applied to this important scenario.

    Appendix A

    The derivative of cost function (8) is given by

\frac{\partial E_m(W)}{\partial w_l} = -\frac{1}{C(\sigma)} \int h_\sigma(k_l(v, W)) \, (v - w_l) \, P(v)^{m+1} \, dv
\; + \; \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \int h'_\sigma(k_i(v, W)) \, \frac{\partial k_i(v, W)}{\partial w_l} \, \|v - w_i\|^2 \, P(v)^{m+1} \, dv .

Here k_i(v, W) = \sum_{o=1}^{n} \theta(\|v - w_i\|^2 - \|v - w_o\|^2), where θ is the Heaviside function with derivative δ, which is symmetric and nonvanishing only for inputs equal to 0. The first term yields the update rule. The second term equals

    1C ()

    n

    o=1 h (kl(v , W )) ( v w l 2 v w o 2) (v w l) v w l 2+

    n

    i=1h (ki(v , W )) ( v w i 2 v w l 2)( v w l) v w i 2 P (v )m +1 dv .

This term vanishes due to the properties of δ.

    Appendix B

    Consider the cost function (9)

E_m(W) = \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, \|v_j - w_i\|^2 \, P(v_j)^m .

For each W, batch NG determines unique optimum assignments k_{ij}(W) := k_i(v_j, W), where we assume a fixed priority in case of ties. These values stem from a finite set. Conversely, for given k_{ij}(W), unique optimum assignments W' of the prototypes are determined by batch NG. We consider the auxiliary function

Q(W', W) := \frac{1}{2C(\sigma)} \sum_{i=1}^{n} \sum_{j=1}^{p} h_\sigma(k_{ij}(W)) \, \|v_j - w'_i\|^2 \, P(v_j)^m


which is connected to E_m(W) via E_m(W) = Q(W, W). Assume prototype locations W are given and new prototype locations W' are computed in one cycle of batch NG. It holds E_m(W') = Q(W', W') ≤ Q(W', W) because k_{ij}(W') are optimum assignments of the hidden variables k_{ij} given W'. In addition, Q(W', W) ≤ Q(W, W) = E_m(W), because W' are optimum assignments of the prototypes given the values k_{ij}(W). Thus, E_m(W') − E_m(W) = Q(W', W') − Q(W', W) + Q(W', W) − Q(W, W) ≤ 0, i.e. the value of the cost function decreases in each step. Because there exists only a finite number of different values k_{ij}, the procedure converges after a finite number of steps towards a fixed point W̄.

Assume that the distances of the training points from W̄ are mutually different. Then the assignment W' ↦ k_{ij}(W') is constant in a vicinity of W̄. Thus, E_m(·) and Q(W̄, ·) are identical in a neighborhood of W̄, and a local optimum of Q(W̄, ·) is also a local optimum of E_m. Hence, W̄ is a local optimum of E_m.

    Appendix C

The batch NG update for the prototypes can be written in the form

\Delta w_i = \frac{\sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, P(v_j)^m \, (v_j - w_i)}{\sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, P(v_j)^m} .

Newton optimization yields the formula

\Delta w_i = -J(w_i) \, H^{-1}(w_i)

with J being the Jacobian of the cost function and H the Hessian matrix. We can ignore constant factors of the cost function. Further, k_i(v_j, W) is locally constant. Thus we get, up to sets of measure zero,

J(w_i) = -\sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, P(v_j)^m \, (v_j - w_i)

and the Hessian equals the diagonal matrix with entries

\sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, P(v_j)^m .
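Inserting J and H into the Newton step makes the connection explicit: the Newton update coincides with the batch update stated at the beginning of this appendix,

\Delta w_i = -J(w_i)\, H^{-1}(w_i) = \frac{\sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, P(v_j)^m \, (v_j - w_i)}{\sum_{j=1}^{p} h_\sigma(k_i(v_j, W)) \, P(v_j)^m} ,

so that w_i + Δw_i is exactly the density-weighted average of step (ii) in eq. (11).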


    References

[1] H. Bauer, R. Der, and M. Herrmann (1996). Controlling the magnification factor of self-organizing feature maps. Neural Computation, 8(4):757–771.

[2] L. Bottou and Y. Bengio (1995). Convergence properties of the k-means algorithm. In NIPS 1994, pp. 585–592, G. Tesauro, D. S. Touretzky, and T. K. Leen (eds.), MIT Press.

[3] M. Cottrell, B. Hammer, A. Hasenfuß, and T. Villmann (2006). Batch and median neural gas. Neural Networks, 19:762–771.

[4] D. Dersch and P. Tavan (1995). Asymptotic level density in topological feature maps. IEEE Transactions on Neural Networks, 6(1):230–236.

[5] J.-C. Fort, P. Letrémy, and M. Cottrell (2002). Advantages and drawbacks of the batch Kohonen algorithm. In M. Verleysen (Ed.), European Symposium on Artificial Neural Networks, Bruges (Belgium), pp. 223–230.

[6] P. Grassberger and I. Procaccia (1983). Measuring the strangeness of strange attractors. Physica D, 9:189–208.

[7] M. Herrmann, H.-U. Bauer, and R. Der (1994). The perceptual magnet effect: a model based on self-organizing feature maps. In L. Smith and P. Hancock (Eds.), Neural Computation and Psychology, Springer, pp. 107–116.

[8] T. Heskes (2001). Self-organizing maps, vector quantization, and mixture modeling. IEEE Transactions on Neural Networks, 12:1299–1305.

[9] T. Kohonen (1995). Self-Organizing Maps. Springer.

[10] T. Kohonen and P. Somervuo (2002). How to make large self-organizing maps for nonvectorial data. Neural Networks, 15:945–952.

[11] P. K. Kuhl (1991). Human adults and human infants show a perceptual magnet effect for the prototypes of speech categories, monkeys do not. Perception and Psychophysics, 50:93–107.

[12] P. K. Kuhl, K. A. Williams, F. Lacerda, K. N. Stevens, and B. Lindblom (1992). Linguistic experience alters phonetic perception in infants by 6 months of age. Science, 255:606–608.

[13] R. Linsker (1989). How to generate maps by maximizing the mutual information between input and output signals. Neural Computation, 1:402–411.

[14] T. Martinetz, S. Berkovich, and K. Schulten (1993). Neural gas network for vector quantization and its application to time series prediction. IEEE Transactions on Neural Networks, 4(4):558–569.

[15] T. Martinetz and K. Schulten (1994). Topology representing networks. Neural Networks, 7(3):507–522.

[16] E. Merényi and A. Jain (2004). Forbidden magnification? II. In M. Verleysen (Ed.), European Symposium on Artificial Neural Networks, Bruges (Belgium), pp. 57–62.


[17] M. N. Murty, A. K. Jain, and P. J. Flynn (1999). Data clustering: a review. ACM Computing Surveys, 31:264–323.

[18] H. Ritter, T. Martinetz, and K. Schulten (1992). Neural Computation and Self-Organizing Maps: An Introduction. Addison-Wesley.

[19] T. Villmann (2000). Controlling strategies for the magnification factor in the neural gas network. Neural Network World, 10(4):739–750.

[20] T. Villmann and J. Claussen (2005). Magnification control in self-organizing maps and neural gas. Neural Computation, 18:446–469.

[21] T. Villmann and A. Heinze (2000). Application of magnification control for the neural gas network in a sensorimotor architecture for robot navigation. In H.-M. Groß, K. Debes, and H.-J. Böhme (Eds.), SOAVE 2000 - Selbstorganisation von adaptivem Verhalten, VDI-Verlag, pp. 125–134.

[22] T. Villmann, E. Merényi, and B. Hammer (2003). Neural maps in remote sensing image analysis. Neural Networks, 16(3-4):389–403.

[23] D. J. Willshaw and C. von der Malsburg (1979). A marker induction mechanism for the establishment of ordered neural mappings: its application to the retinotectal problem. Philosophical Transactions of the Royal Society B, 287:203–243.

[24] P. Zador (1982). Asymptotic quantization error of continuous signals and the quantization dimension. IEEE Transactions on Information Theory, 28:149–159.

[25] S. Zhong and J. Ghosh (2003). A unified framework for model-based clustering. Journal of Machine Learning Research, 4:1001–1037.


[Figure 2: 15 panels (Class 1 through Class 14, and all data), each showing where the respective class is located on the 300 x 300 image.]

Fig. 2. Location of the classes on the map: a large overlap of the classes can be observed since many classes are centered around different kinds of forest. Classes 9 (water) and 14 (alpine vegetation), respectively, constitute two extremal classes at the borders.


[Figure 3]

Fig. 3. Final prototype locations for α̂ = 0.01.


[Figure 4]

Fig. 4. Final prototype locations for α̂ = 0.61.


[Figure 5]

Fig. 5. Final prototype locations for α̂ = 2.0.


[Figure 6: 15 panels (Class 1 through Class 14, and all classes) showing the number of neurons assigned to each class, plotted against the control parameter m.]

Fig. 6. Number of neurons per class in dependence of the control parameter m.
