Ch. 9 Unsupervised Learning

Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC Press, 2009.
Based on slides from Stephen Marsland and some slides from the Internet.
Collected and modified by Longin Jan Latecki, Temple University, [email protected]


Introduction

- Suppose we don't have good training data
- Hard and boring to generate targets
- Don't always know target values

Biologically implausible to have targets?

Two cases:
- Know when we've got it right
- No external information at all


Unsupervised Learning

- We have no external error information
  - No task-specific error criterion
- Generate internal error
  - Must be general
- Usual method is to cluster data together according to activation of neurons
  - Competitive learning


Competitive Learning

- Set of neurons compete to fire
- Neuron that 'best matches' the input (has the highest activation) fires
  - Winner-take-all
- Neurons 'specialise' to recognise some input
  - Grandmother cells


The k-Means Algorithm

Suppose that you know the number of clusters, but not what the clusters look like

How do you assign each data point to a cluster?
- Position k centers at random in the space
- Assign each point to its nearest center according to some chosen distance measure
- Move the center to the mean of the points that it represents
- Iterate (a minimal sketch follows below)
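The steps above translate almost directly into code. Below is a minimal NumPy sketch; the function and parameter names are illustrative, not from the book:

```python
import numpy as np

def kmeans(data, k, n_iterations=100, rng=None):
    rng = np.random.default_rng(rng)
    n_points, _ = data.shape
    # Position k centers at random in the space (here: k distinct data points)
    centers = data[rng.choice(n_points, size=k, replace=False)].astype(float)
    for _ in range(n_iterations):
        # Assign each point to its nearest center (Euclidean distance)
        distances = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        nearest = np.argmin(distances, axis=1)
        # Move each center to the mean of the points it represents
        for j in range(k):
            members = data[nearest == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, nearest
```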


k-means Clustering


Euclidean Distance

[Figure: two points in the plane; the Euclidean distance is the hypotenuse of a right triangle with sides x1 - x2 and y1 - y2]

$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$


The k-Means Algorithm

[Figure: a scatter of data points partitioned by 4 means; symbols mark the cluster assignments]


The k-Means Algorithm

[Figure: two different converged placements of the means on the same data]

These are local minima solutions.


The k-Means Algorithm

[Figure: two more converged placements of the means]

More perfectly valid, wrong solutions.


The k-Means Algorithm

[Figure: clusterings obtained with different numbers of means]

If you don't know the number of means, the problem is worse.


The k-Means Algorithm

- One solution is to run the algorithm for many values of k
  - Pick the one with the lowest error
  - Up to overfitting
- Run the algorithm from many starting points (see the sketch below)
  - Avoids local minima?
- What about noise?
  - Median instead of mean?
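A hedged sketch of the many-restarts idea, reusing the kmeans() sketch from earlier. Note the sum-of-squared-distances error only compares runs fairly for a fixed k; comparing it across values of k always favours larger k, which is the overfitting problem noted above:

```python
import numpy as np

def kmeans_restarts(data, k, n_restarts=10):
    best_error, best = np.inf, None
    for seed in range(n_restarts):
        centers, nearest = kmeans(data, k, rng=seed)
        # Sum-of-squared-distances error of this run
        error = sum(np.sum((data[nearest == j] - centers[j]) ** 2)
                    for j in range(k))
        if error < best_error:
            best_error, best = error, (centers, nearest)
    return best, best_error
```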


k-Means Neural Network

Neuron activation measures distance between input and neuron position in weight space


Weight Space

Imagine we plot neuronal positions according to their weights

[Figure: three neurons plotted as points w1, w2, w3 in weight space]


k-Means Neural Network

- Use winner-take-all neurons: the winning neuron is the one closest to the input (the best-matching cluster)
- How do we do training? Update weights, i.e. move neuron positions
- Move the winning neuron towards the current input; ignore the rest (a one-step sketch follows below)
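A one-step sketch of this training rule; eta is an illustrative learning rate, and nothing here is taken from the book's code:

```python
import numpy as np

def train_step(weights, x, eta=0.1):
    """weights: (n_neurons, n_dims) array; x: (n_dims,) input vector."""
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))  # closest neuron wins
    weights[winner] += eta * (x - weights[winner])           # move winner towards input
    return winner
```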


Normalisation

Suppose the weights are:
(0.2, 0.2, -0.1)
(0.15, -0.15, 0.1)
(10, 10, 10)

The input is (0.2, 0.2, -0.1)



Normalisation

For a perfect match with the first neuron, the dot-product activations are:
0.2*0.2 + 0.2*0.2 + (-0.1)*(-0.1) = 0.09
0.15*0.2 + (-0.15)*0.2 + 0.1*(-0.1) = -0.01
10*0.2 + 10*0.2 + 10*(-0.1) = 3

We can only compare activations if the weights are about the same size (demonstrated in the sketch below)
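The slide's arithmetic can be checked in a few lines: with unnormalised weights, the huge third weight vector wins the dot-product comparison even though the first neuron matches the input exactly:

```python
import numpy as np

weights = np.array([[0.2, 0.2, -0.1],
                    [0.15, -0.15, 0.1],
                    [10.0, 10.0, 10.0]])
x = np.array([0.2, 0.2, -0.1])
print(weights @ x)  # [ 0.09 -0.01  3.  ] -- the third neuron 'wins'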


Normalisation

- Make the distance between each neuron and the origin be 1
- All neurons lie on the unit hypersphere
- Need to stop the weights growing unboundedly (a one-line sketch follows below)
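Normalisation is essentially one line in NumPy; a sketch, applied after every weight update to keep the vectors on the unit hypersphere:

```python
import numpy as np

def normalise(weights):
    # Divide each weight vector by its Euclidean norm -> unit hypersphere
    return weights / np.linalg.norm(weights, axis=1, keepdims=True)
```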


k-Means Neural Network

Normalise inputs too. Then use the same rule as before: move the winning neuron towards the current input, w := w + η(x - w)

That's it: simple and easy


Vector Quantisation (VQ)

- Think about the problem of data compression
- Want to store a set of data (say, sensor readings) in as small an amount of memory as possible
- We don't mind some loss of accuracy

Could make a codebook of typical data and index each data point by reference to a codebook entry

Thus, VQ is a coding method that maps each data point x to the closest codeword: we encode x by replacing it with the index of the closest codeword (see the sketch below).
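A minimal VQ sketch under these definitions; the codebook is assumed to be given (e.g. built by k-means), and the names are illustrative:

```python
import numpy as np

def vq_encode(data, codebook):
    # Index of the closest codeword for each data point
    distances = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(distances, axis=1)

def vq_decode(indices, codebook):
    # Lossy reconstruction: every point becomes its codeword
    return codebook[indices]
```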


Outline of Vector Quantization of Images


Vector Quantisation

The codebook … is sent to the receiver (at least 30 bits):

Index:    0      1      2      3      4
Codeword: 10110  01001  11010  11100  11001


Vector Quantisation

The data…

01001
11100
11101
00101
11110

… is encoded one point at a time: for each data point, pick the nearest codeword according to some measure and send its index (3 bits), but information is lost.


Vector Quantisation

The data…

01001
11100
11101
00101
11110

… is sent as 1 3 3 1 3, which takes 15 bits instead of 30.

Of course, sending the codebook is inefficient for this data, but if there were a lot more information, the cost would be reduced.


The problem is that we have only sent 2 different pieces of data, 01001 and 11100, instead of the 5 we had.

If the codebook had been picked more carefully, this would have been a lot better

How can you pick the codebook?
- Usually k-means is used
- Learning Vector Quantisation


Voronoi Tessellation

- Join neighbouring points
- Draw lines equidistant to each pair of points
- These are perpendicular to the joining lines


Two-Dimensional Voronoi Diagram

[Figure: codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated by boundary lines.]


Self Organizing Maps

- Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen
- Also called Kohonen Networks, Competitive Learning, Winner-Take-All Learning
- Generally reduces the dimensions of data through the use of self-organizing neural networks
- Useful for data visualization; humans cannot visualize high-dimensional data, so this is often a useful technique to make sense of large data sets


Neurons in the Brain

- Although heterogeneous, at a low level the brain is composed of neurons
- A neuron receives input from other neurons (generally thousands) through its synapses
- Inputs are approximately summed
- When the input exceeds a threshold, the neuron sends an electrical spike that travels from the body, down the axon, to the next neuron(s)


Feature Maps

[Figure: a feature map in which neurons respond, in order, to low pitch, higher pitch, and high pitch sounds]


Feature Maps

- Sounds that are similar ('close together') excite neurons that are near to each other
- Sounds that are very different excite neurons that are a long way off
- This is known as topology preservation: the ordering of the inputs is preserved, if possible (perfectly topology-preserving)


Topology Preservation

[Figure: inputs mapped to outputs, with neighbouring inputs mapping to neighbouring outputs]


Topology Preservation

[Figure: a further example of topology preservation]


Self-Organizing Maps (Kohonen Maps)

Common output-layer structures:
- One-dimensional (completely interconnected for determining the "winner" unit)
- Two-dimensional (connections omitted, only neighborhood relations shown)

[Figure: the neighborhood of neuron i in the one- and two-dimensional layouts]


The Self-Organising Map

[Figure: the inputs feeding the self-organising map]


Neuron Connections?

- We don't actually need the inhibitory connections: just use a neighbourhood of positive connections
- How large should this neighbourhood be?
  - Early in learning, the network is unordered: big neighbourhood
  - Later on, we are just fine-tuning the network: small neighbourhood


The Self-Organising Map

- The weight vectors are randomly initialised
- Input vectors are presented to the network
- The neurons are activated in proportion to the Euclidean distance between the input and the weight vector
- The winning node has its weight vector moved closer to the input
- So do the neighbours of the winning node
- Over time, the network self-organises so that the input topology is preserved


Self-Organisation

- Global ordering from local interactions
- Each neuron sees its neighbours; the whole network becomes ordered
- Understanding self-organisation is part of complexity science
- It appears all over the place


Basic “Winner Take All” Network

- Two-layer network: input units and output units, with each input unit connected to each output unit

[Figure: input layer (I1, I2, I3) fully connected by weights Wi,j to output layer (O1, O2)]


Basic Algorithm (the same as k-Means Neural Network)

Initialize the map (randomly assign weights)

Loop over training examples:
- Assign input unit values according to the values in the current example
- Find the "winner", i.e. the output unit that most closely matches the input units, using some distance metric: for all output units j = 1 to m and input units i = 1 to n, find the j that minimizes
  $\sqrt{\sum_{i=1}^{n} (W_{ij} - I_i)^2}$
- Modify weights on the winner to more closely match the input:
  $W^{t+1} = W^t + c\,(X - W^t)$
  where c is a small positive learning constant that usually decreases as the learning proceeds (a sketch of the full loop follows below)
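Putting the two formulas together, a hedged sketch of the whole loop; the decay schedule for c and all names are illustrative:

```python
import numpy as np

def train_wta(examples, n_outputs, n_epochs=20, c=0.5, decay=0.95, seed=0):
    rng = np.random.default_rng(seed)
    weights = rng.random((n_outputs, examples.shape[1]))  # random initial map
    for _ in range(n_epochs):
        for x in examples:
            # Winner: output unit minimising the Euclidean distance to the input
            j = np.argmin(np.sqrt(np.sum((weights - x) ** 2, axis=1)))
            weights[j] += c * (x - weights[j])  # W_(t+1) = W_t + c (X - W_t)
        c *= decay  # learning constant decreases as learning proceeds
    return weights
```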


Result of Algorithm

- Initially, some output nodes will randomly be a little closer to some particular type of input
- These nodes become "winners" and the weights move them even closer to the inputs
- Over time, nodes in the output become representative prototypes for examples in the input
- Note there is no supervised training here
- Classification: given a new input, the class is the output node that is the winner


Typical Usage: 2D Feature Map

In typical usage the output nodes form a 2D “map” organized in a grid-like fashion and we update weights in a neighborhood around the winner

[Figure: inputs I1, I2, I3 fully connected to a 5x5 grid of output nodes:

O11 O12 O13 O14 O15
O21 O22 O23 O24 O25
O31 O32 O33 O34 O35
O41 O42 O43 O44 O45
O51 O52 O53 O54 O55]


Modified Algorithm

Initialize the map (randomly assign weights)

Loop over training examples:
- Assign input unit values according to the values in the current example
- Find the "winner", i.e. the output unit that most closely matches the input units, using some distance metric (as before)
- Modify weights on the winner to more closely match the input
- Modify weights in a neighborhood around the winner so the neighbors on the 2D map also become closer to the input (a sketch follows below)

Over time this will tend to cluster similar items closer on the map
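A sketch of the extra neighborhood step on a 2D grid of weight vectors. The per-ring scaling mirrors the c = 1 / 0.75 / 0.5 picture on a later slide; the exact values and names are illustrative:

```python
import numpy as np

def update_neighbourhood(grid_weights, x, winner, c=1.0, radius=2):
    """grid_weights: (rows, cols, n_dims); winner: (row, col) of winning node."""
    rows, cols, _ = grid_weights.shape
    wr, wc = winner
    for r in range(max(0, wr - radius), min(rows, wr + radius + 1)):
        for col in range(max(0, wc - radius), min(cols, wc + radius + 1)):
            d = max(abs(r - wr), abs(col - wc))   # grid distance from the winner
            scale = c * (1.0 - 0.25 * d)          # 1.0, 0.75, 0.5 for d = 0, 1, 2
            grid_weights[r, col] += scale * (x - grid_weights[r, col])
```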


Unsupervised Learning in SOMs

For n-dimensional input space and m output neurons:

(1) Choose a random weight vector w_i for each neuron i, i = 1, ..., m
(2) Choose a random input x
(3) Determine the winner neuron k: ||w_k - x|| = min_i ||w_i - x|| (Euclidean distance)
(4) Update the weight vectors of all neurons i in the neighborhood of neuron k: w_i := w_i + η·h(i, k)·(x - w_i) (w_i is shifted towards x)
(5) If the convergence criterion is met, STOP. Otherwise, narrow the neighborhood function h and the learning parameter η and go to (2). A compact sketch follows below.
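Steps (1)-(5) in a compact sketch, with a Gaussian neighborhood standing in for h(i, k) and a fixed number of epochs in place of an explicit convergence test; all schedules and names are illustrative:

```python
import numpy as np

def train_som(data, rows, cols, n_epochs=50, eta=0.5, sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    m = rows * cols
    w = rng.random((m, data.shape[1]))                    # (1) random weight vectors
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)])
    for _ in range(n_epochs):
        for x in rng.permutation(data):                   # (2) random input
            k = np.argmin(np.linalg.norm(w - x, axis=1))  # (3) winner neuron k
            d2 = np.sum((coords - coords[k]) ** 2, axis=1)
            h = np.exp(-d2 / (2 * sigma ** 2))            # neighborhood h(i, k)
            w += eta * h[:, None] * (x - w)               # (4) shift w_i towards x
        eta *= 0.9
        sigma *= 0.9                                      # (5) narrow h and eta
    return w.reshape(rows, cols, -1)
```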


The Self-Organising Map

[Figure: the map before training (large neighbourhood)]


The Self-Organising Map

[Figure: the map after training (small neighbourhood)]


Updating the Neighborhood

- Node O44 is the winner
- Color indicates the scaling c used to update the neighbors, via $W^{t+1} = W^t + c\,(X - W^t)$: c = 1, c = 0.75, c = 0.5 for successively more distant nodes

[Figure: the 5x5 output grid O11 … O55 with the neighborhood of O44 highlighted]


Selecting the Neighborhood

- Typically, a "Sombrero Function" (Mexican hat) or a Gaussian function is used (sketched below)
- Neighborhood size usually decreases over time to allow initial "jockeying for position" and then "fine-tuning" as the algorithm proceeds

[Figure: neighborhood strength as a function of distance from the winner]
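Two illustrative strength-versus-distance curves: a Gaussian, and a crude "sombrero" built as a difference of Gaussians. The widths and the 0.5 factor are arbitrary choices, not values from the slides:

```python
import numpy as np

def gaussian_strength(d, sigma=2.0):
    return np.exp(-d ** 2 / (2 * sigma ** 2))

def sombrero_strength(d, sigma=2.0):
    # Positive near the winner, slightly inhibitory further out, fading to zero
    return gaussian_strength(d, sigma) - 0.5 * gaussian_strength(d, 2 * sigma)
```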


Color Example

http://davis.wpi.edu/~matt/courses/soms/applet.html


Kohonen Network Examples

Document Map: http://websom.hut.fi/websom/milliondemo/html/root.html


Poverty Map

http://www.cis.hut.fi/research/som-research/worldmap.html


SOM for Classification

- A generated map can also be used for classification
- A human can assign a class to a data point, or use the strongest weight as the prototype for the data point
- For a new test case, calculate the winning node and classify the case as the class of that node (see the sketch below)
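A sketch of the classification step, assuming each node has already been given a label (e.g. by majority vote of the training points it wins); node_labels is an illustrative name:

```python
import numpy as np

def classify(x, weights, node_labels):
    """weights: (n_nodes, n_dims); node_labels: (n_nodes,) of class labels."""
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    return node_labels[winner]
```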


Network Size

- We have to predetermine the network size
- Big network: each neuron represents an exact feature; not much generalisation
- Small network: too much generalisation; no differentiation
- Try different sizes and pick the best