Unsupervised learning Networks
ELE571 Digital Neural Networks
Associative Memory Networks
An associative memory network recalls the original undistorted pattern from a distorted or partially missing pattern. Two types:
•feedforward type (one-shot recovery)
•feedback type, e.g. the Hopfield network (iterative recovery)
Associative Memory Model (feedforward type)
[Figure: feedforward associative memory; input b maps through W and a nonlinear unit to output a.]
W could be (1) symmetric or not, (2) square or not
nonlinear unit: e.g. threshold
W = Σm b(m)T a(m), so that b(k) W = b(k) Σm b(m)T a(m) = a(k)  (assuming the key vectors b(m) are orthonormal).
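As an illustration, a minimal MATLAB sketch of this outer-product construction, assuming orthonormal key vectors b(m) so that the one-shot recall is exact:

% Feedforward associative memory: W = sum_m b(m)'*a(m); recall b(k)*W = a(k)
[Q, ~] = qr(randn(4));        % rows of Q: 4 orthonormal key vectors b(m)
B = Q;
A = sign(randn(4,9));         % 4 stored bipolar pattern vectors a(m) (rows)
W = B' * A;                   % W = sum over m of b(m)'*a(m)
k = 2;
recalled = B(k,:) * W;        % one-shot recall with key b(k)
disp(norm(recalled - A(k,:))) % ~0: recall is (numerically) exact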
Bidirectional Associative Memory
a1 = [ 1  1  1  1 -1  1  1  1  1 ]
a2 = [ 1 -1  1 -1  1 -1  1 -1  1 ]

X = [ a1 ; a2 ] =
[  1  1  1  1 -1  1  1  1  1
   1 -1  1 -1  1 -1  1 -1  1 ]

The weight matrix W = XTX =
[  2  0  2  0  0  0  2  0  2
   0  2  0  2 -2  2  0  2  0
   2  0  2  0  0  0  2  0  2
   0  2  0  2 -2  2  0  2  0
   0 -2  0 -2  2 -2  0 -2  0
   0  2  0  2 -2  2  0  2  0
   2  0  2  0  0  0  2  0  2
   0  2  0  2 -2  2  0  2  0
   2  0  2  0  0  0  2  0  2 ]
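The recall can be checked directly in MATLAB with the two patterns above (a small sketch; the sign threshold stands in for the nonlinear unit):

% BAM example: W = X'*X, recall with a sign threshold
a1 = [1  1 1  1 -1  1 1  1 1];
a2 = [1 -1 1 -1  1 -1 1 -1 1];
X  = [a1; a2];                       % 2 x 9 pattern matrix
W  = X' * X;                         % the 9 x 9 weight matrix above
disp(isequal(sign(a1*W), a1))        % 1: a1 is recalled exactly
a1n = a1;  a1n(1) = -1;              % distort one entry of a1
disp(isequal(sign(a1n*W), a1))       % 1: the stored pattern is still recovered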
Associative Memory Model (feedback type)
[Figure: feedback loop; aold is passed through W to produce anew, which is fed back.]
W must be
•normalized
•Hermitian: W = WH, e.g. W = X+X  (X+ denotes the pseudoinverse of X)
Each iteration in AMM(W) comprises two substeps:
(a) Projection of aold onto W-plane
anet = W aold
(b) Remap the net vector to the closest symbol vector:
anew = T[anet]
Equivalently, anew = arg min over all symbol vectors a of || a - W aold ||.
The two substeps in one iteration can be summarized as one procedure:
[Figure: the 2 steps in one AMM iteration; the current vector is linearly projected onto the x-plane (the g-update in DML), then resymbolized by a nonlinear mapping to the nearest symbol vector (the s-update in DML); repeated iterations drive the initial vector toward a perfect attractor.]
anet is the (least-squares) projection of aold onto the (column) subspace of W.
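A toy MATLAB rendering of one such iteration, assuming W = X+X and a bipolar (±1) alphabet so that the nearest-symbol map T[·] reduces to the sign function:

% One AMM(W) iteration: projection onto the W-plane, then resymbolization
X     = sign(randn(3,50));      % stored +/-1 pattern rows
W     = pinv(X) * X;            % projection onto the row space of X
a_old = sign(randn(50,1));      % arbitrary starting symbol vector
a_net = W * a_old;              % (a) least-squares projection of a_old
a_new = sign(a_net);            % (b) remap to the closest symbol vector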
Common Assumptions on Signals for Associative Retrieval
Inherent properties of the signals (patterns) to be retrieved:
•Orthogonality
•Higher-order statistics
•Constant-modulus property
•FAE-property
•others
Blind Recovery of MIMO System
[Figure: source S passes through channel H, with additive noise, to give the observation X; an equalizer g is applied to X.]
g X = g H S = v S = ŝ
[Figure: q×p MIMO channel with gains h11 … hqp mixing the sources s1 … sp; the q observations are combined by equalizer taps g1 … gq.]
Goal: to find g such that v = g H, and
v S = [ 0 … 0 1 0 … 0 ] S = sj
Signal Recoverability
H is PR (perfectly recoverable) if and only if H has full column rank, i.e. a (left) inverse H+ exists.
Assumptions on MIMO System
For deterministic H, ……
Examples for Flat MIMO:
non-recoverable (rank 1, not full column rank):
H = [ 1 2
      1 2
      1 2 ]
recoverable (rank 2, full column rank):
H = [ 1 2
      1 3
      1 2 ]
If perfectly recoverable, e.g. H = [ 1 2 ; 1 3 ; 1 2 ], a parallel equalizer exists:
ŝ = H+X = G X
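A quick MATLAB check of this parallel equalizer with the recoverable example above (noise-free observations and bipolar sources are assumed for the sketch):

% Parallel equalizer s_hat = H^+ * X = G*X for the recoverable example
H = [1 2; 1 3; 1 2];        % full column rank (rank 2): perfectly recoverable
S = sign(randn(2,100));     % 2 bipolar source sequences
X = H * S;                  % 3 x 100 noise-free observations
G = pinv(H);                % equalizer G = H^+
S_hat = G * X;
disp(rank(H))               % 2: full column rank
disp(norm(S_hat - S))       % ~0: the sources are recovered exactly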
Example: Wireless MIMO System
[Figure: source symbols drawn from a signal constellation Ci pass through channel H, with additive noise, to give the observation X; an equalizer g is applied.]
g X = g H S = v S = ŝ
FAE-Property: Finite-Alphabet Exclusiveness
Signal recovery via g:
Given v s = ŝ, the estimate ŝ is always a valid symbol for any valid symbol vector s if and only if
v = [ 0 … 0 ±1 0 … 0 ]
Theorem: FAE-Property
Suppose that v W = b. For the output b to always be a valid symbol sequence, the necessary and sufficient condition is that
v = [ 0 … 0 ±1 0 … 0 ], i.e. v = ±e(k).
In other words, it is impossible to produce a valid but different output symbol vector.
FAE-Property: Finite-Alphabet Exclusiveness
S = [ -1 +1 +1 +1 +1 +1 …
      +1 +1 +1 -1 -1 +1 …
      -1 -1 +1 -1 +1 +1 … ]
If v = [ 0 … 0 ±1 0 … 0 ]:  v S = [valid symbols], i.e. ŝ ∈ finite alphabet.
If v ≠ [ 0 … 0 ±1 0 … 0 ]:  ŝ ∉ finite alphabet.
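Numerically, the exclusiveness is easy to see (a sketch with a bipolar alphabet): only a signed unit vector v keeps every entry of v S inside the alphabet.

% FAE-property demo: v*S stays in the +/-1 alphabet only for v = +/- e_k
S  = sign(randn(3,1000));           % columns are valid +/-1 symbol vectors
v1 = [0 1 0];                       % a signed unit vector
v2 = [0.5 0.5 0];                   % any other combining vector
disp(all(ismember(v1*S, [-1 1])))   % 1: every output is a valid symbol
disp(all(ismember(v2*S, [-1 1])))   % 0: some outputs leave the alphabet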
"EM": Blind-BLAST
g X = g H S = v S = ŝ
E-step: ŝ = T[ g X ]
M-step: ĝ = ŝ X+
•The E-step determines the best guess of the membership function zj.
•The M-step determines the best parameters, θn, which maximize the likelihood function.
Combined EM: ŝnew = T[ ĝ X ] = T[ ŝold X+ X ] = T[ ŝold W ]
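A minimal MATLAB sketch of this alternation (a noise-free square channel, bipolar sources, and T[·] = sign are assumed; it tracks a single source with one equalizer row, not the full algorithm):

% Blind-BLAST "EM" iteration: alternate s_hat = T[g*X] and g = s_hat*X^+
p = 4;  N = 400;
S = sign(randn(p, N));          % p bipolar sources
H = randn(p, p) + eye(p);       % unknown mixing channel
X = H * S;                      % observations
s_hat = sign(randn(1, N));      % random initial symbol guess
for it = 1:20
    g     = s_hat * pinv(X);    % M-step: re-estimate the equalizer row
    s_hat = sign(g * X);        % E-step: resymbolize, T[.] = sign
end
% Often s_hat converges to one source row (up to a sign flip);
% the value below is then close to 1:
disp(max(abs(S * s_hat')) / N)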
Associative Memory Network
W = X+X
ŝnew = Sign[ ŝold W ]    (T[·]: threshold)
[Figure: one AMM iteration; the initial vector ŝ is linearly projected onto the gX-plane via g = ŝold X+, then the FA nonlinear mapping ŝ' = T[ ŝ W ] returns the nearest symbol vector.]
Definition: Perfect Attractor of AMM(W)
A symbol vector a* is a "perfect attractor" of AMM(W) if and only if
•a* is a symbol vector
•a* = W a*
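Both conditions can be checked directly (a MATLAB sketch assuming a bipolar alphabet; a stored pattern row is used as the candidate a*):

% Check whether a stored symbol vector is a perfect attractor of AMM(W)
X = sign(randn(4,60));                   % stored +/-1 pattern rows
W = pinv(X) * X;                         % W = X^+ X
a = X(1,:)';                             % candidate a*: the first stored pattern
is_symbol = all(ismember(a, [-1 1]));    % condition 1: a* is a symbol vector
is_fixed  = norm(a - W*a) < 1e-9;        % condition 2: a* = W a*
disp(is_symbol && is_fixed)              % 1: a* is a perfect attractor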
Let v = [ f(1) ≠ 0  f(2) … f(p) ].
Thus f(p) = 0.
Let ûi = [ ui(1) ui(2) … ui(p) ]T.
Let v' = [ f(1) ≠ 0  f(2) … f(p-1)  0 ].
Let ǖi = v ûi and ǖ'i = v' ûi.
MATLAB Exercise
Compare the two programs below and determine the differences in performance.
Why such a difference?
% Program 1: restart the AMM iteration until it lands near a perfect attractor
p = zeros(1,100);
for j = 1:100
    S = sign(randn(5,200));            % 5 bipolar source sequences
    A = randn(5,5) + eye(5);           % mixing matrix
    X = A*S + 0.01*randn(5,200);       % noisy observations
    s = sign(randn(200,1));            % random initial symbol vector
    W = X'*inv(X*X')*X;                % W = X^+ X (projection matrix)
    for i = 1:20
        sold = s;
        s = tanh(100*W*s);
        s = sign(s);                   % AMM iteration: s = sign(W*s)
    end
    while norm(s - W*s) > 5.0          % restart unless s is near a fixed point
        s = sign(randn(200,1));
        for i = 1:20
            sold = s;
            s = tanh(100*W*s);
            s = sign(s);
        end
    end
    p(j) = max(abs(S*s));              % ~200 when s matches a source (up to sign)
end
hist(p)
% Program 2: identical to Program 1, but without the restart (while) loop
p = zeros(1,100);
for j = 1:100
    S = sign(randn(5,200));
    A = randn(5,5) + eye(5);
    X = A*S + 0.01*randn(5,200);
    s = sign(randn(200,1));
    W = X'*inv(X*X')*X;
    for i = 1:20
        sold = s;
        s = tanh(100*W*s);
        s = sign(s);
    end
    p(j) = max(abs(S*s));
end
hist(p)