3.1 159.302 Stephen Marsland
Ch. 9 Unsupervised Learning
Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009. Based on slides from Stephen Marsland and some slides from the Internet.
Collected and modified by Longin Jan Latecki, Temple University, [email protected]
Introduction
Suppose we don’t have good training data:
Hard and boring to generate targets
Don’t always know target values
Biologically implausible to have targets?
Two cases:
Know when we’ve got it right
No external information at all
Unsupervised Learning
We have no external error information
No task-specific error criterion
Generate internal error
Must be general
Usual method is to cluster data together according to activation of neurons: competitive learning
Competitive Learning
Set of neurons compete to fire
Neuron that ‘best matches’ the input (has the highest activation) fires: winner-take-all
Neurons ‘specialise’ to recognise some input: grandmother cells
The k-Means Algorithm
Suppose that you know the number of clusters, but not what the clusters look like. How do you assign each data point to a cluster?
Position k centers at random in the space
Assign each point to its nearest center according to some chosen distance measure
Move the center to the mean of the points that it represents
Iterate
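The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: plain lists of tuples, squared Euclidean distance, and a fixed iteration count are all my assumptions.

```python
import random

def kmeans(points, k, iters=100):
    """Plain k-means: place k centers at random, assign, move, iterate."""
    centers = [list(c) for c in random.sample(points, k)]
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((pi - ci) ** 2
                                      for pi, ci in zip(p, centers[c])))
            clusters[j].append(p)
        # Move each center to the mean of the points it represents.
        for j, members in enumerate(clusters):
            if members:  # leave a center alone if it won no points
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers
```

For two well-separated blobs of points, the two returned centers end up near the blob means; as the next slides show, though, unlucky initial placements can get stuck in local minima.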
k-means Clustering
Euclidean Distance
[Figure: two points in the (x, y) plane; the distance between them is d = √((x1 − x2)² + (y1 − y2)²)]
[Figure: ‘4 means’ — a 2-D point set clustered with four randomly placed centers; symbols mark the four clusters]
The k-Means Algorithm
[Figure: two different 4-means clusterings of the same data]
These are local-minimum solutions
The k-Means Algorithm
[Figure: two more 4-means clusterings of the same data]
More perfectly valid, but wrong, solutions
The k-Means Algorithm
[Figure: clusterings of the same data with different numbers of means]
If you don’t know the number of means, the problem is worse
The k-Means Algorithm
The k-Means Algorithm
One solution is to run the algorithm for many values of k
Pick the one with lowest error
But this risks overfitting
Run the algorithm from many starting points
Avoids local minima?
What about noise?
Median instead of mean?
k-Means Neural Network
Neuron activation measures distance between input and neuron position in weight space
Weight Space
Imagine we plot neuronal positions according to their weights
[Figure: three neurons plotted as points in (w1, w2, w3) weight space]
k-Means Neural Network
Use winner-take-all neurons
Winning neuron is the one closest to the input: the best-matching cluster
How do we do training?
Update weights - move neuron positions
Move winning neuron towards current input
Ignore the rest
Normalisation
Suppose the weights are:
(0.2, 0.2, -0.1)
(0.15, -0.15, 0.1)
(10, 10, 10)
The input is (0.2, 0.2, -0.1)
[Figure: the three weight vectors in (w1, w2, w3) weight space]
Normalisation
For a perfect match with the first neuron, the activations are:
0.2*0.2 + 0.2*0.2 + (-0.1)*(-0.1) = 0.09
0.15*0.2 + (-0.15)*0.2 + 0.1*(-0.1) = -0.01
10*0.2 + 10*0.2 + 10*(-0.1) = 3
Can only compare activations if the weights are about the same size
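The effect is easy to check numerically. A small sketch (the helper names are mine, not from the slides):

```python
def dot(w, x):
    """Activation of a neuron with weights w for input x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def normalise(v):
    """Scale v to unit length, so it lies on the unit hypersphere."""
    length = sum(vi * vi for vi in v) ** 0.5
    return [vi / length for vi in v]

x = [0.2, 0.2, -0.1]
weights = [[0.2, 0.2, -0.1], [0.15, -0.15, 0.1], [10, 10, 10]]

# Raw activations, roughly [0.09, -0.01, 3]: the huge third weight
# vector "wins" even though it matches the input worst.
raw = [dot(w, x) for w in weights]

# With weights and input normalised, the perfect match scores 1.0 and wins.
xn = normalise(x)
unit = [dot(normalise(w), xn) for w in weights]
```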
Normalisation
Make the distance between each neuron and the origin be 1
All neurons lie on the unit hypersphere
Need to stop the weights growing unboundedly
k-Means Neural Network
Normalise inputs too
Then use the winner update rule, moving the winning neuron towards the input: w ← w + η(x − w)
That’s it: simple and easy
Vector Quantisation (VQ)
Think about the problem of data compression
Want to store a set of data (say, sensor readings) in as small an amount of memory as possible
We don’t mind some loss of accuracy
Could make a codebook of typical data and index each data point by reference to a codebook entry
Thus, VQ is a coding method: we encode each data point x by replacing it with the closest codeword.
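A minimal encode/decode sketch of this idea. It is my own illustration: it treats each reading as a 5-bit string and uses Hamming distance as the "closest" measure (any distance measure works), and ties may break differently than in the worked example on the slides that follow.

```python
def hamming(a, b):
    """Count differing bit positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def vq_encode(data, codebook):
    """Replace each data word by the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda i: hamming(word, codebook[i]))
            for word in data]

def vq_decode(indices, codebook):
    """The receiver looks each index up in its copy of the codebook."""
    return [codebook[i] for i in indices]
```

Sending short indices instead of full words saves space, at the cost of reconstruction error for words that are not in the codebook.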
S. R. Subramanya
Outline of Vector Quantization of Images
Vector Quantisation
The codebook … is sent to the receiver:
index:    0     1     2     3     4
codeword: 10110 01001 11010 11100 11001
At least 30 bits
Vector Quantisation
The data … 01001, 11100, 11101, 00101, 11110 … is encoded: for each data word, pick the nearest codeword according to some measure
And send its index … 3 bits each, but information is lost
Vector Quantisation
The data … 01001, 11100, 11101, 00101, 11110 … is sent as 1 3 3 1 3
… which takes 15 bits instead of 30
Of course, sending the codebook is inefficient for this data, but if there were a lot more information, the cost would be reduced
Vector Quantisation
The problem is that we have only sent 2 different codewords - 11100 and 01001 - instead of the 5 data words we had.
If the codebook had been picked more carefully, this would have been a lot better
How can you pick the codebook? Usually k-means is used; Learning Vector Quantisation is another option.
Voronoi Tesselation
Join neighbouring points
Draw lines equidistant to each pair of points
These are perpendicular to the lines joining the points
Codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated with boundary lines.
Two Dimensional Voronoi Diagram
Self Organizing Maps
Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen
Also called Kohonen Networks, Competitive Learning, Winner-Take-All Learning
Generally reduces the dimensions of data through the use of self-organizing neural networks
Useful for data visualization; humans cannot visualize high-dimensional data, so this is often a useful technique to make sense of large data sets
Neurons in the Brain
Although heterogeneous, at a low level the brain is composed of neurons
A neuron receives input from other neurons (generally thousands) through its synapses
Inputs are approximately summed
When the input exceeds a threshold, the neuron sends an electrical spike that travels from the body, down the axon, to the next neuron(s)
Feature Maps
[Figure: tonotopic feature map, with neurons ordered from low pitch through higher pitch to high pitch]
Sounds that are similar (‘close together’) excite neurons that are near to each other
Sounds that are very different excite neurons that are a long way off
This is known as topology preservation
The ordering of the inputs is preserved, if possible (a perfectly topology-preserving map)
Feature Maps
Topology Preservation
[Figure: inputs mapped to outputs]
November 24, 2009 - Introduction to Cognitive Science, Lecture 21: Self-Organizing Maps
Self-Organizing Maps (Kohonen Maps)
Common output-layer structures:
One-dimensional (completely interconnected for determining “winner” unit)
Two-dimensional (connections omitted, only neighborhood relations shown)
[Figure: the neighborhood of neuron i in each structure]
The Self-Organising Map
[Figure: the input layer connected to the map neurons]
Neuron Connections?
We don’t actually need the inhibitory connections - just use a neighbourhood of positive connections
How large should this neighbourhood be?
Early in learning, the network is unordered: big neighbourhood
Later on, just fine-tuning the network: small neighbourhood
The weight vectors are randomly initialised
Input vectors are presented to the network
The neurons are activated proportional to the Euclidean distance between the input and the weight vector
The winning node has its weight vector moved closer to the input
So do the neighbours of the winning node
Over time, the network self-organises so that the input topology is preserved
The Self-Organising Map
Self-Organisation
Global ordering from local interactions
Each neuron sees its neighbours
The whole network becomes ordered
Understanding self-organisation is part of complexity science
It appears all over the place
Basic “Winner Take All” Network
Two-layer network: input units and output units, with each input unit connected to each output unit
[Figure: input units I1, I2, I3 fully connected by weights Wi,j to output units O1, O2]
Basic Algorithm (the same as k-Means Neural Network)
Initialize Map (randomly assign weights)
Loop over training examples:
Assign input unit values according to the values in the current example
Find the “winner”, i.e. the output unit that most closely matches the input units, using some distance metric: for all output units j = 1 to m and input units i = 1 to n, find the j that minimizes
  sum_{i=1}^{n} (W_ij - I_i)^2
Modify weights on the winner to more closely match the input:
  W_i(t+1) = W_i(t) + c (X_i - W_i(t))
where c is a small positive learning constant that usually decreases as the learning proceeds
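The two formulas fit together in a short sketch. One assumption of mine: weights are initialised from the first m examples rather than randomly, to keep the run deterministic; the slide's random initialisation works the same way.

```python
def train_wta(inputs, m, epochs=20, c0=0.5):
    """Winner-take-all training: only the best-matching output unit learns."""
    n = len(inputs[0])
    # Initialise each output unit's weights from one of the first m examples.
    W = [list(inputs[j]) for j in range(m)]
    for t in range(epochs):
        c = c0 * (1 - t / epochs)  # small positive constant, decreasing over time
        for x in inputs:
            # Winner: the unit j minimising sum_i (W[j][i] - x[i])^2
            win = min(range(m),
                      key=lambda j: sum((W[j][i] - x[i]) ** 2 for i in range(n)))
            # Move only the winner towards the input: W <- W + c (X - W)
            W[win] = [w + c * (xi - w) for w, xi in zip(W[win], x)]
    return W
```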
Result of Algorithm
Initially, some output nodes will randomly be a little closer to some particular type of input
These nodes become “winners” and the weights move them even closer to the inputs
Over time nodes in the output become representative prototypes for examples in the input
Note there is no supervised training here
Classification: given a new input, the class is the output node that is the winner
Typical Usage: 2D Feature Map
In typical usage the output nodes form a 2D “map” organized in a grid-like fashion and we update weights in a neighborhood around the winner
[Figure: input units I1, I2, I3 connected to a 5×5 grid of output nodes O11 … O55]
Modified Algorithm
Initialize Map (randomly assign weights)
Loop over training examples:
Assign input unit values according to the values in the current example
Find the “winner”, i.e. the output unit that most closely matches the input units, using some distance metric
Modify weights on the winner to more closely match the input
Modify weights in a neighborhood around the winner so the neighbors on the 2D map also become closer to the input
Over time this will tend to cluster similar items closer on the map
Unsupervised Learning in SOMs
For n-dimensional input space and m output neurons:
(1) Choose random weight vector wi for neuron i, i = 1, ..., m
(2) Choose random input x
(3) Determine winner neuron k: ||wk – x|| = mini ||wi – x|| (Euclidean distance)
(4) Update all weight vectors of all neurons i in the neighborhood of neuron k: wi := wi + η·h(i, k)·(x – wi) (wi is shifted towards x)
(5) If convergence criterion met, STOP. Otherwise, narrow neighborhood function h and learning parameter η and go to (2).
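Steps (1)-(5) can be sketched directly. This is my own illustration: a Gaussian neighbourhood h, linearly shrinking η and σ, and a fixed epoch count standing in for the convergence test are all assumptions.

```python
import math
import random

def train_som(data, grid_w, grid_h, epochs=30):
    """Train a grid_w x grid_h SOM following steps (1)-(5)."""
    dim = len(data[0])
    random.seed(0)
    # (1) Random weight vector for each of the m = grid_w * grid_h neurons.
    W = {(r, q): [random.random() for _ in range(dim)]
         for r in range(grid_h) for q in range(grid_w)}
    for t in range(epochs):
        frac = 1 - t / epochs
        eta = 0.5 * frac                                    # shrinking learning rate
        sigma = max(0.5, 0.5 * max(grid_w, grid_h) * frac)  # shrinking radius
        for x in random.sample(data, len(data)):            # (2) inputs in random order
            # (3) Winner k minimises the Euclidean distance ||w_k - x||.
            k = min(W, key=lambda p: sum((wi - xi) ** 2
                                         for wi, xi in zip(W[p], x)))
            # (4) Shift every neuron towards x, weighted by neighbourhood h(i, k).
            for p in W:
                d2 = (p[0] - k[0]) ** 2 + (p[1] - k[1]) ** 2  # grid distance
                h = math.exp(-d2 / (2 * sigma ** 2))
                W[p] = [wi + eta * h * (xi - wi) for wi, xi in zip(W[p], x)]
    return W
```

Trained on the four corners of the unit square, a small map spreads out so that some neuron sits near each corner, mirroring the before/after pictures on the next slides.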
The Self-Organising Map
Before training (large neighbourhood)
The Self-Organising Map
After training (small neighbourhood)
Updating the Neighborhood
Node O44 is the winner
Color indicates the scaling c used to update neighbors, via W_i(t+1) = W_i(t) + c (X_i - W_i(t)):
[Figure: 5×5 output grid O11 … O55; the winner O44 is updated with c = 1, its immediate neighbors with c = 0.75, and the next ring with c = 0.5]
Selecting the Neighborhood
Typically, a “Sombrero Function” or Gaussian function is used
Neighborhood size usually decreases over time to allow initial “jockeying for position” and then “fine-tuning” as the algorithm proceeds
[Figure: neighborhood strength as a function of distance from the winner]
Color Example
http://davis.wpi.edu/~matt/courses/soms/applet.html
Kohonen Network Examples
Document Map: http://websom.hut.fi/websom/milliondemo/html/root.html
Poverty Map
http://www.cis.hut.fi/research/som-research/worldmap.html
SOM for Classification
A generated map can also be used for classification
A human can assign a class to a data point, or the strongest weight can be used as the prototype for the data point
For a new test case, calculate the winning node and classify the case by the class associated with that node
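A small sketch of that classification scheme. The helper names are mine; the "map" here is just a list of trained weight vectors, labelled by majority vote over the training examples each node wins.

```python
def winner(W, x):
    """Index of the weight vector closest to x (squared Euclidean distance)."""
    return min(range(len(W)),
               key=lambda j: sum((w - xi) ** 2 for w, xi in zip(W[j], x)))

def label_nodes(W, examples):
    """Give each map node the majority label of the training examples it wins."""
    votes = [{} for _ in W]
    for x, y in examples:
        counts = votes[winner(W, x)]
        counts[y] = counts.get(y, 0) + 1
    return [max(c, key=c.get) if c else None for c in votes]

def classify(W, labels, x):
    """A new test case gets the class of its winning node."""
    return labels[winner(W, x)]
```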
Network Size
We have to predetermine the network size
Big network: each neuron represents an exact feature; not much generalisation
Small network: too much generalisation; no differentiation
Try different sizes and pick the best