3.1 159.302 Stephen Marsland
Ch. 9 Unsupervised Learning
Stephen Marsland, Machine Learning: An Algorithmic Perspective. CRC 2009. Based on slides from Stephen Marsland and some slides from the Internet.
Collected and modified by Longin Jan Latecki, Temple University, [email protected]
Introduction
Suppose we don’t have good training data:
Hard and boring to generate targets
Don’t always know target values
Biologically implausible to have targets?
Two cases:
Know when we’ve got it right
No external information at all
Unsupervised Learning
We have no external error information
No task-specific error criterion
Generate internal error
Must be general
Usual method is to cluster data together according to activation of neurons: competitive learning
Competitive Learning
Set of neurons compete to fire
Neuron that ‘best matches’ the input (has the highest activation) fires: winner-take-all
Neurons ‘specialise’ to recognise some input: grandmother cells
The k-Means Algorithm
Suppose that you know the number of clusters, but not what the clusters look like. How do you assign each data point to a cluster?
Position k centers at random in the space
Assign each point to its nearest center according to some chosen distance measure
Move the center to the mean of the points that it represents
Iterate
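The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the book's code: plain lists of tuples, squared Euclidean distance, and a fixed iteration count are all my assumptions.

```python
import random

def kmeans(points, k, iters=100):
    """Plain k-means: place k centers at random, assign, move, iterate."""
    centers = [list(c) for c in random.sample(points, k)]
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((pi - ci) ** 2
                                      for pi, ci in zip(p, centers[c])))
            clusters[j].append(p)
        # Move each center to the mean of the points it represents.
        for j, members in enumerate(clusters):
            if members:  # leave a center alone if it won no points
                centers[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centers
```

For two well-separated blobs of points, the two returned centers end up near the blob means; as the next slides show, though, unlucky initial placements can get stuck in local minima.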
k-means Clustering
Euclidean Distance
[Figure: two points in the (x, y) plane; the distance between them is d = √((x1 − x2)² + (y1 − y2)²)]
[Figure: ‘4 means’ — a 2-D point set clustered with four randomly placed centers; symbols mark the four clusters]
The k-Means Algorithm
[Figure: two different 4-means clusterings of the same data]
These are local-minimum solutions
The k-Means Algorithm
[Figure: two more 4-means clusterings of the same data]
More perfectly valid, but wrong, solutions
The k-Means Algorithm
[Figure: clusterings of the same data with different numbers of means]
If you don’t know the number of means, the problem is worse
The k-Means Algorithm
The k-Means Algorithm
One solution is to run the algorithm for many values of k
Pick the one with lowest error
But this risks overfitting
Run the algorithm from many starting points
Avoids local minima?
What about noise?
Median instead of mean?
k-Means Neural Network
Neuron activation measures distance between input and neuron position in weight space
Weight Space
Imagine we plot neuronal positions according to their weights
[Figure: three neurons plotted as points in (w1, w2, w3) weight space]
k-Means Neural Network
Use winner-take-all neurons
Winning neuron is the one closest to the input: the best-matching cluster
How do we do training?
Update weights - move neuron positions
Move winning neuron towards current input
Ignore the rest
Normalisation
Suppose the weights are:
(0.2, 0.2, -0.1)
(0.15, -0.15, 0.1)
(10, 10, 10)
The input is (0.2, 0.2, -0.1)
[Figure: the three weight vectors in (w1, w2, w3) weight space]
Normalisation
For a perfect match with the first neuron, the activations are:
0.2*0.2 + 0.2*0.2 + (-0.1)*(-0.1) = 0.09
0.15*0.2 + (-0.15)*0.2 + 0.1*(-0.1) = -0.01
10*0.2 + 10*0.2 + 10*(-0.1) = 3
Can only compare activations if the weights are about the same size
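The effect is easy to check numerically. A small sketch (the helper names are mine, not from the slides):

```python
def dot(w, x):
    """Activation of a neuron with weights w for input x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def normalise(v):
    """Scale v to unit length, so it lies on the unit hypersphere."""
    length = sum(vi * vi for vi in v) ** 0.5
    return [vi / length for vi in v]

x = [0.2, 0.2, -0.1]
weights = [[0.2, 0.2, -0.1], [0.15, -0.15, 0.1], [10, 10, 10]]

# Raw activations, roughly [0.09, -0.01, 3]: the huge third weight
# vector "wins" even though it matches the input worst.
raw = [dot(w, x) for w in weights]

# With weights and input normalised, the perfect match scores 1.0 and wins.
xn = normalise(x)
unit = [dot(normalise(w), xn) for w in weights]
```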
Normalisation
Make the distance between each neuron and the origin be 1
All neurons lie on the unit hypersphere
Need to stop the weights growing unboundedly
k-Means Neural Network
Normalise inputs too
Then use the winner update rule, moving the winning neuron towards the input: w ← w + η(x − w)
That’s it: simple and easy
Vector Quantisation (VQ)
Think about the problem of data compression
Want to store a set of data (say, sensor readings) in as small an amount of memory as possible
We don’t mind some loss of accuracy
Could make a codebook of typical data and index each data point by reference to a codebook entry
Thus, VQ is a coding method: we encode each data point x by replacing it with the closest codeword.
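A minimal encode/decode sketch of this idea. It is my own illustration: it treats each reading as a 5-bit string and uses Hamming distance as the "closest" measure (any distance measure works), and ties may break differently than in the worked example on the slides that follow.

```python
def hamming(a, b):
    """Count differing bit positions between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))

def vq_encode(data, codebook):
    """Replace each data word by the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda i: hamming(word, codebook[i]))
            for word in data]

def vq_decode(indices, codebook):
    """The receiver looks each index up in its copy of the codebook."""
    return [codebook[i] for i in indices]
```

Sending short indices instead of full words saves space, at the cost of reconstruction error for words that are not in the codebook.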
S. R. Subramanya
Outline of Vector Quantization of Images
Vector Quantisation
The codebook … is sent to the receiver:
index:    0     1     2     3     4
codeword: 10110 01001 11010 11100 11001
At least 30 bits
Vector Quantisation
The data … 01001, 11100, 11101, 00101, 11110 … is encoded: for each data word, pick the nearest codeword according to some measure
And send its index … 3 bits each, but information is lost
Vector Quantisation
The data … 01001, 11100, 11101, 00101, 11110 … is sent as 1 3 3 1 3
… which takes 15 bits instead of 30
Of course, sending the codebook is inefficient for this data, but if there were a lot more information, the cost would be reduced
Vector Quantisation
The problem is that we have only sent 2 different codewords - 11100 and 01001 - instead of the 5 data words we had.
If the codebook had been picked more carefully, this would have been a lot better
How can you pick the codebook? Usually k-means is used; Learning Vector Quantisation is another option.
Voronoi Tesselation
Join neighbouring points
Draw lines equidistant to each pair of points
These are perpendicular to the lines joining the points
Codewords in 2-dimensional space. Input vectors are marked with an x, codewords are marked with red circles, and the Voronoi regions are separated with boundary lines.
Two Dimensional Voronoi Diagram
Self Organizing Maps
Self-organizing maps (SOMs) are a data visualization technique invented by Professor Teuvo Kohonen
Also called Kohonen Networks, Competitive Learning, Winner-Take-All Learning
Generally reduces the dimensions of data through the use of self-organizing neural networks
Useful for data visualization; humans cannot visualize high-dimensional data, so this is often a useful technique to make sense of large data sets
Neurons in the Brain
Although heterogeneous, at a low level the brain is composed of neurons
A neuron receives input from other neurons (generally thousands) through its synapses
Inputs are approximately summed
When the input exceeds a threshold, the neuron sends an electrical spike that travels from the body, down the axon, to the next neuron(s)
Feature Maps
[Figure: tonotopic feature map, with neurons ordered from low pitch through higher pitch to high pitch]
Sounds that are similar (‘close together’) excite neurons that are near to each other
Sounds that are very different excite neurons that are a long way off
This is known as topology preservation
The ordering of the inputs is preserved, if possible (a perfectly topology-preserving map)
Feature Maps
Topology Preservation
[Figure: inputs mapped to outputs]
November 24, 2009 - Introduction to Cognitive Science, Lecture 21: Self-Organizing Maps
Self-Organizing Maps (Kohonen Maps)
Common output-layer structures:
One-dimensional (completely interconnected for determining “winner” unit)
Two-dimensional (connections omitted, only neighborhood relations shown)
[Figure: the neighborhood of neuron i in each structure]
The Self-Organising Map
[Figure: the input layer connected to the map neurons]
Neuron Connections?
We don’t actually need the inhibitory connections - just use a neighbourhood of positive connections
How large should this neighbourhood be?
Early in learning, the network is unordered: big neighbourhood
Later on, just fine-tuning the network: small neighbourhood
The weight vectors are randomly initialised
Input vectors are presented to the network
The neurons are activated proportional to the Euclidean distance between the input and the weight vector
The winning node has its weight vector moved closer to the input
So do the neighbours of the winning node
Over time, the network self-organises so that the input topology is preserved
The Self-Organising Map
Self-Organisation
Global ordering from local interactions
Each neuron sees its neighbours
The whole network becomes ordered
Understanding self-organisation is part of complexity science
It appears all over the place
Basic “Winner Take All” Network
Two-layer network: input units and output units, with each input unit connected to each output unit
[Figure: input units I1, I2, I3 fully connected by weights Wi,j to output units O1, O2]
Basic Algorithm (the same as k-Means Neural Network)
Initialize Map (randomly assign weights)
Loop over training examples:
Assign input unit values according to the values in the current example
Find the “winner”, i.e. the output unit that most closely matches the input units, using some distance metric: for all output units j = 1 to m and input units i = 1 to n, find the j that minimizes
  sum_{i=1}^{n} (W_ij - I_i)^2
Modify weights on the winner to more closely match the input:
  W_i(t+1) = W_i(t) + c (X_i - W_i(t))
where c is a small positive learning constant that usually decreases as the learning proceeds
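The two formulas fit together in a short sketch. One assumption of mine: weights are initialised from the first m examples rather than randomly, to keep the run deterministic; the slide's random initialisation works the same way.

```python
def train_wta(inputs, m, epochs=20, c0=0.5):
    """Winner-take-all training: only the best-matching output unit learns."""
    n = len(inputs[0])
    # Initialise each output unit's weights from one of the first m examples.
    W = [list(inputs[j]) for j in range(m)]
    for t in range(epochs):
        c = c0 * (1 - t / epochs)  # small positive constant, decreasing over time
        for x in inputs:
            # Winner: the unit j minimising sum_i (W[j][i] - x[i])^2
            win = min(range(m),
                      key=lambda j: sum((W[j][i] - x[i]) ** 2 for i in range(n)))
            # Move only the winner towards the input: W <- W + c (X - W)
            W[win] = [w + c * (xi - w) for w, xi in zip(W[win], x)]
    return W
```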
Result of Algorithm
Initially, some output nodes will randomly be a little closer to some particular type of input
These nodes become “winners” and the weights move them even closer to the inputs
Over time nodes in the output become representative prototypes for examples in the input
Note there is no supervised training here
Classification: given a new input, the class is the output node that is the winner
Typical Usage: 2D Feature Map
In typical usage the output nodes form a 2D “map” organized in a grid-like fashion and we update weights in a neighborhood around the winner
[Figure: input units I1, I2, I3 connected to a 5×5 grid of output nodes O11 … O55]
Modified Algorithm
Initialize Map (randomly assign weights)
Loop over training examples:
Assign input unit values according to the values in the current example
Find the “winner”, i.e. the output unit that most closely matches the input units, using some distance metric
Modify weights on the winner to more closely match the input
Modify weights in a neighborhood around the winner so the neighbors on the 2D map also become closer to the input
Over time this will tend to cluster similar items closer on the map
Unsupervised Learning in SOMs
For n-dimensional input space and m output neurons:
(1) Choose random weight vector wi for neuron i, i = 1, ..., m
(2) Choose random input x
(3) Determine winner neuron k: ||wk – x|| = mini ||wi – x|| (Euclidean distance)
(4) Update all weight vectors of all neurons i in the neighborhood of neuron k: wi := wi + η·h(i, k)·(x – wi) (wi is shifted towards x)
(5) If convergence criterion met, STOP. Otherwise, narrow neighborhood function h and learning parameter η and go to (2).
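Steps (1)-(5) can be sketched directly. This is my own illustration: a Gaussian neighbourhood h, linearly shrinking η and σ, and a fixed epoch count standing in for the convergence test are all assumptions.

```python
import math
import random

def train_som(data, grid_w, grid_h, epochs=30):
    """Train a grid_w x grid_h SOM following steps (1)-(5)."""
    dim = len(data[0])
    random.seed(0)
    # (1) Random weight vector for each of the m = grid_w * grid_h neurons.
    W = {(r, q): [random.random() for _ in range(dim)]
         for r in range(grid_h) for q in range(grid_w)}
    for t in range(epochs):
        frac = 1 - t / epochs
        eta = 0.5 * frac                                    # shrinking learning rate
        sigma = max(0.5, 0.5 * max(grid_w, grid_h) * frac)  # shrinking radius
        for x in random.sample(data, len(data)):            # (2) inputs in random order
            # (3) Winner k minimises the Euclidean distance ||w_k - x||.
            k = min(W, key=lambda p: sum((wi - xi) ** 2
                                         for wi, xi in zip(W[p], x)))
            # (4) Shift every neuron towards x, weighted by neighbourhood h(i, k).
            for p in W:
                d2 = (p[0] - k[0]) ** 2 + (p[1] - k[1]) ** 2  # grid distance
                h = math.exp(-d2 / (2 * sigma ** 2))
                W[p] = [wi + eta * h * (xi - wi) for wi, xi in zip(W[p], x)]
    return W
```

Trained on the four corners of the unit square, a small map spreads out so that some neuron sits near each corner, mirroring the before/after pictures on the next slides.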
The Self-Organising Map
Before training (large neighbourhood)
The Self-Organising Map
After training (small neighbourhood)
Updating the Neighborhood
Node O44 is the winner
Color indicates the scaling c used to update neighbors, via W_i(t+1) = W_i(t) + c (X_i - W_i(t)):
[Figure: 5×5 output grid O11 … O55; the winner O44 is updated with c = 1, its immediate neighbors with c = 0.75, and the next ring with c = 0.5]
Selecting the Neighborhood
Typically, a “Sombrero Function” or Gaussian function is used
Neighborhood size usually decreases over time to allow initial “jockeying for position” and then “fine-tuning” as the algorithm proceeds
[Figure: neighborhood strength as a function of distance from the winner]
Color Example
http://davis.wpi.edu/~matt/courses/soms/applet.html
Kohonen Network Examples
Document Map: http://websom.hut.fi/websom/milliondemo/html/root.html
Poverty Map
http://www.cis.hut.fi/research/som-research/worldmap.html
SOM for Classification
A generated map can also be used for classification
A human can assign a class to a data point, or the strongest weight can be used as the prototype for the data point
For a new test case, calculate the winning node and classify the case by the class associated with that node
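A small sketch of that classification scheme. The helper names are mine; the "map" here is just a list of trained weight vectors, labelled by majority vote over the training examples each node wins.

```python
def winner(W, x):
    """Index of the weight vector closest to x (squared Euclidean distance)."""
    return min(range(len(W)),
               key=lambda j: sum((w - xi) ** 2 for w, xi in zip(W[j], x)))

def label_nodes(W, examples):
    """Give each map node the majority label of the training examples it wins."""
    votes = [{} for _ in W]
    for x, y in examples:
        counts = votes[winner(W, x)]
        counts[y] = counts.get(y, 0) + 1
    return [max(c, key=c.get) if c else None for c in votes]

def classify(W, labels, x):
    """A new test case gets the class of its winning node."""
    return labels[winner(W, x)]
```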
Network Size
We have to predetermine the network size
Big network: each neuron represents an exact feature; not much generalisation
Small network: too much generalisation; no differentiation
Try different sizes and pick the best