Wavelets and Sparse Signal Processing
Instructor: Hongkai Xiong (熊红凯), Distinguished Professor
http://min.sjtu.edu.cn
Department of Electronic Engineering; Department of Computer Science and Engineering
Shanghai Jiao Tong University
2019
Hongkai Xiong, Distinguished Professor. Office: 1-309, No. 1, SEIEE Bldg. Email: [email protected] Web page: http://min.sjtu.edu.cn
Wenrui Dai, Associate Professor. Office: 1-304, No. 1, SEIEE Bldg. Email: [email protected]
Teaching Assistants:
• Ph.D. candidate: Mr. Wen Fei. Email: [email protected]
• Ph.D. candidate: Ms. Tianran Wu. Email: [email protected]
Office: 1-304, No. 1, SEIEE Bldg.
Part I - fundamentals
• Continuous-time Fourier transform
• Discrete-time Fourier transform
• Discrete Fourier transform
• Z-transform
Part II – wavelets and sparse signal processing
• Time meets frequency
• Wavelet frames
• Wavelet zoom
• Wavelet bases
• Multiscale geometric analysis
• Lifting wavelet and filter banks
• Sparse representation
• Scattering transform
• Graph signal processing
Textbooks and references
• Stéphane Mallat, A Wavelet Tour of Signal Processing: The Sparse Way, Third Edition, Elsevier, 2009
• Michael Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, 2010
• Alan V. Oppenheim, Signals & Systems, Second Edition, Publishing House of Electronics Industry of China
Website: http://min.sjtu.edu.cn/courses/wt.htm
Related Sources
• “Sparse and Redundant Representations and Their Applications in Signal and Image Processing”: https://elad.cs.technion.ac.il/236862-course-webpage-winter-semester-2018-2019/
• “Wavelets in Signal Processing”: http://www.ifp.illinois.edu/~minhdo/teaching/wavelets.html
• “Wavelets, Filter Banks and Applications”: https://ocw.mit.edu/courses/mathematics/18-327-wavelets-filter-banks-and-applications-spring-2003/
• http://www.numerical-tours.com/
Requirements and grading
Homework and attendance (20%)
Projects (40%)
Final Examination (40%)
Projects (report + source code)
Harmonic analysis
Multiscale geometric analysis
Wavelet and filter bank design
Compressive sensing
Sparse coding, representation, dictionary learning
Generalized source coding and subband coding
Multidimensional signal processing
Other relevant topics
Final Examination (online)
• 3 mandatory + 2 optional problems (3 days)
• Theoretical analysis
• Algorithm implementations
Check Yourself
• Computer-generated music f(t)
• Listen to the following three manipulated signals f1(t), f2(t), f3(t) and try to match each one to the correct operation:
-f(t), 0.5f(t), f(2t)
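The three candidate operations can also be told apart numerically rather than by ear. This minimal sketch assumes a pure 440 Hz tone as a stand-in for the music and an 8 kHz sampling rate (both hypothetical choices): f(2t) doubles the number of zero crossings (the clip plays twice as fast and an octave higher), 0.5f(t) halves the peak amplitude, and -f(t) changes neither, which is why a polarity flip is usually hard to hear.

```python
import math

FS = 8000    # sampling rate in Hz (assumed for illustration)
F0 = 440.0   # pitch of the pure tone standing in for the music (assumed)

def f(t):
    """Hypothetical stand-in for f(t): a 440 Hz sine tone."""
    return math.sin(2 * math.pi * F0 * t)

def sample(g, duration=1.0):
    """Sample a continuous-time function g(t) at FS Hz for `duration` seconds."""
    return [g(k / FS) for k in range(int(FS * duration))]

def zero_crossings(x):
    """Count sign changes; for a sinusoid this is roughly 2 * frequency * duration."""
    return sum(1 for a, b in zip(x, x[1:]) if (a < 0) != (b < 0))

def peak(x):
    """Largest absolute sample value (a crude loudness proxy)."""
    return max(abs(v) for v in x)

original = sample(f)
negated = sample(lambda t: -f(t))      # -f(t): polarity flip
halved = sample(lambda t: 0.5 * f(t))  # 0.5 f(t): half the amplitude
faster = sample(lambda t: f(2 * t))    # f(2t): time compression by 2

for name, x in [("f(t)", original), ("-f(t)", negated),
                ("0.5f(t)", halved), ("f(2t)", faster)]:
    print(f"{name:8s} peak={peak(x):.3f}  zero crossings={zero_crossings(x)}")
```

Zero crossings are used only as a cheap pitch estimate; a spectrum (DFT) would show the same octave shift more directly.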
SCIENCE
Multimedia Signal
Genome
Computer Vision
Real-Time Detection and Recognition
Multi-person pose estimation
Multi-object detection
3D object tracking
Biomedical Data
Biomedical Signal
Cell segmentation; scale-space shape representation (detail preservation)
Nuclei tracking; multi-cell segmentation and tracking
19/9/18 熊红凯
Nuclei Tracking
[Figure: tracked nuclei in X-Y-time, frames 7, 30, 48 and 84; AVI result with marked cell indices]
Nuclei Segmentation and Tracking
H. Xiong, B. Wang, and Y. F. Zheng, “A Structured Learning-Based Graph Matching Method for Tracking Dynamic Multiple Objects,” IEEE Trans. Circuits and Systems for Video Technology, Mar. 2013.
Zebrafish Cell Tracking
Zebrafish Cell Segmentation
Biomedical Imaging 3D Axon cell rendering and manual effect
Biomedical Imaging Volume Modeling (e.g. Vessel Modeling)
Biomedical Imaging Shifted normal plane (TMI 2002) Minimum Cost (Optimization) Sphere-kernel Decomposition
Biomedical Imaging Sphere-kernel Current improvement
Light Field
Applications:
• 3D dynamic model reconstruction (PLEX Inc.)
• Refocusing images (Lytro Inc.)
• …..
The original Lytro camera: collection of light-field information, reconstruction, refocusing
Light-Field Camera: Refocusing
This video shows the refocusing results at different depths.
Trillion Frames per Second Imaging
http://web.media.mit.edu/~raskar//trillionfps/
Fruit Bottle: light bullet
Ultrafast camera: 2 picoseconds per frame
Femtosecond laser: femtosecond-long laser pulse
Separation Based on Time-Resolved Images
Virtual Reality
Light Field Recording
Share
Point Cloud
[Figure: dictionary atoms α1, α2, …, αp]
Ongoing: 3D hierarchical sparse representation; point cloud color compression
Multiscale Dictionary Learning for Hierarchical Sparse Representation
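$$\min_{\mathbf{D}\in\mathcal{C},\,\mathbf{A}\in\mathbb{R}^{p\times n}}\ \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{2}\left\|\mathbf{x}_i-\mathbf{M}_i\mathbf{D}\boldsymbol{\alpha}_i\right\|_2^2+\lambda\,\mathcal{H}(\boldsymbol{\alpha}_i)\right)$$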
[Figure panels: octree decomposition of the input voxels; hierarchical representation; multiscale dictionary]
Signal Processing
H. Xiong, et al., “Scalable Video Compression Framework with Adaptive Orientational Multiresolution Transform and Nonuniform Directional Filterbank Design”, IEEE Trans. CSVT, 2011.
[Figure panels: reconstruction; multiscale multi-directional transform; implementation in image coding; implementation in video coding]
Protein; Phenotype; DNA (Genotype); RNA
Purine bases: adenine (A), guanine (G); pyrimidine bases: thymine (T), cytosine (C)
Conceptual Diagram: Genome Sequence, Genome Coding
Signal Processing Road
Many signal processing techniques are based on transform methods.
• Signal in original domain
• Fourier transform: global basis, no location information
• Short-time Fourier transform: uniform time-frequency resolution
• Wavelet transform
• Multidimensional signals? 2D separable wavelet transform: undesirable bias toward the coordinate-axis directions
A large family of alternative multiscale transforms has been developed.
1928: H. Nyquist, an engineer at Bell Laboratories, first found the so-called sampling theorem.
1933: V. Kotelnikov, an information theory and radar astronomy pioneer from the Soviet Union, was the first to write down a precise statement of the sampling theorem.
1949: C. E. Shannon, an American mathematician and electronic engineer known as "the father of information theory", stated and proved the sampling theorem.
Theoretic problem: the conventional signal processing system
[Figure: original continuous-time signal → sampling → transform → coding/compression → storage/transmission → reconstruction → recovered discrete-time signal]
H. Nyquist, "Certain topics in telegraph transmission theory," Trans. AIEE, vol. 47, pp. 617–644, Apr. 1928. Reprinted as a classic paper in Proc. IEEE, vol. 90, no. 2, Feb. 2002.
C. E. Shannon, "Communication in the presence of noise," Proc. Institute of Radio Engineers, vol. 37, no. 1, pp. 10–21, Jan. 1949. Reprinted as a classic paper in Proc. IEEE, vol. 86, no. 2, Feb. 1998.
V. A. Kotelnikov, "On the carrying capacity of the ether and wire in telecommunications," Material for the First All-Union Conference on Questions of Communication, Izd. Red. Upr. Svyazi RKKA, Moscow, 1933 (in Russian).
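The sampling theorem says a signal band-limited to B Hz is determined by samples taken at fs > 2B and can be rebuilt by sinc interpolation, x(t) = Σ_n x(nT) sinc((t - nT)/T) with T = 1/fs. A minimal sketch, assuming a hypothetical test signal whose highest frequency is 3 Hz; the infinite sum is truncated to finitely many samples, so the reconstruction is only approximate.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi u) / (pi u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(samples, T, t):
    """Whittaker-Shannon interpolation x(t) = sum_n x[n] sinc((t - nT) / T).

    `samples` maps integer n -> x(nT); the sum runs only over the samples given,
    so accuracy degrades near the edges of the sampling window.
    """
    return sum(xn * sinc((t - n * T) / T) for n, xn in samples.items())

def x(t):
    """Hypothetical band-limited test signal; highest frequency 3 Hz."""
    return math.cos(2 * math.pi * 3 * t) + 0.5 * math.sin(2 * math.pi * 1 * t)

fs = 10.0                 # above the Nyquist rate 2 * 3 Hz = 6 Hz (assumed)
T = 1.0 / fs
M = 2000                  # truncation: keep samples n = -M..M
samples = {n: x(n * T) for n in range(-M, M + 1)}

for t in (0.0, 0.123, 0.37, 0.5):
    print(f"t={t:5.3f}  x(t)={x(t):+.5f}  reconstructed={reconstruct(samples, T, t):+.5f}")
```

Lowering fs below 6 Hz in this sketch makes the reconstruction fail between samples, which is the aliasing the theorem rules out.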
Sparse Representation
$x = \Psi\theta$, where
$x \in \mathbb{R}^{N\times 1},\quad \Psi \in \mathbb{R}^{N\times L},\quad \theta \in \mathbb{R}^{L\times 1},\quad \|\theta\|_0 = K,\quad K < N < L$
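In the special case L = N with an orthonormal Ψ, the model x = Ψθ is inverted exactly by θ = Ψᵀx, so a signal that looks dense in its original domain can have only K non-zero coefficients. The sketch assumes the orthonormal DCT-II basis and hypothetical coefficient values; the overcomplete case L > N has no such closed form and needs a sparse-coding algorithm instead.

```python
import math

def dct_basis(N):
    """Orthonormal DCT-II basis as an N x N matrix Psi[n][k] = c_k cos(pi (n + 1/2) k / N)."""
    def c(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [[c(k) * math.cos(math.pi * (n + 0.5) * k / N) for k in range(N)]
            for n in range(N)]

def synthesize(Psi, theta):
    """x = Psi theta."""
    return [sum(row[k] * theta[k] for k in range(len(theta))) for row in Psi]

def analyze(Psi, x):
    """theta = Psi^T x, valid because the columns of Psi are orthonormal."""
    return [sum(Psi[n][k] * x[n] for n in range(len(x))) for k in range(len(Psi[0]))]

N = 32
Psi = dct_basis(N)
theta = [0.0] * N
theta[2], theta[7], theta[19] = 3.0, -1.5, 0.8   # K = 3 non-zeros (hypothetical)

x = synthesize(Psi, theta)
theta_hat = analyze(Psi, x)
print("non-zeros in x:    ", sum(abs(v) > 1e-9 for v in x))
print("non-zeros in theta:", sum(abs(v) > 1e-9 for v in theta_hat))
```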
“General” measurements instead of samples [Candes, Romberg, & Tao ’04, Donoho ’06, Candes ’06, Tsaig & Donoho ’06]
Directly obtain compressed data?
• Linear sampling vs. nonlinear compressing of a signal with K nonzero components
Abel Prize 2017
Yves Meyer École normale supérieure Paris-Saclay, France
“for his pivotal role in the development of the mathematical theory of wavelets.”
Gauss Prize 2018 David Donoho Stanford University, USA
“for his fundamental contributions to the mathematical, statistical and computational analysis of signal processing.”
Wavelet Transform: Fast 2D wavelet transform
Wavelet Transform: Inverse 2D wavelet transform
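A separable 2D wavelet transform filters the rows, then the columns, leaving an approximation quadrant (top left) and three detail quadrants; repeating on the approximation quadrant gives the multilevel fast transform, whose total cost grows linearly with the number of pixels. The sketch assumes the orthonormal Haar wavelet (the slides do not fix a filter) and shows one forward level plus its exact inverse.

```python
import math

S = 1 / math.sqrt(2)

def haar_1d(v):
    """One orthonormal Haar level: approximations followed by details."""
    a = [(v[2 * i] + v[2 * i + 1]) * S for i in range(len(v) // 2)]
    d = [(v[2 * i] - v[2 * i + 1]) * S for i in range(len(v) // 2)]
    return a + d

def ihaar_1d(c):
    """Inverse of haar_1d."""
    h = len(c) // 2
    out = []
    for a, d in zip(c[:h], c[h:]):
        out += [(a + d) * S, (a - d) * S]
    return out

def transpose(m):
    return [list(r) for r in zip(*m)]

def haar_2d(img):
    """Forward 2D step: transform every row, then every column."""
    rows = [haar_1d(r) for r in img]
    return transpose([haar_1d(c) for c in transpose(rows)])

def ihaar_2d(coef):
    """Inverse 2D step: undo the column transform, then the row transform."""
    rows = transpose([ihaar_1d(c) for c in transpose(coef)])
    return [ihaar_1d(r) for r in rows]

img = [[10, 10, 10, 10],
       [10, 10, 10, 10],
       [10, 10, 50, 50],
       [10, 10, 50, 50]]   # hypothetical image with a block-aligned bright square
coef = haar_2d(img)
rec = ihaar_2d(coef)
for row in coef:
    print([round(v, 6) for v in row])
```

Because the square's edges fall on 2 x 2 block boundaries, all three detail quadrants are zero here; shifting the square by one pixel would make them non-zero.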
Multiscale geometric analysis is an emerging area of high-dimensional signal processing and data analysis.
(a) Example ridgelet function $\psi_{a,b,\theta}(x_1, x_2)$; (b) relations between transforms
Applying the 1-D wavelet transform to the slices of the Radon transform
Ridgelet transform: good at capturing line singularities
(c) Reconstructed image
Wavelet vs. ridgelet
Not good at handling curves
Multiscale Geometric Analysis: Ridgelet Transform
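The phrase "1-D wavelet transform applied to the slices of the Radon transform" can be written out. The following uses the standard continuous ridgelet definition (after Candès), not formulas taken from the slides, and assumes a real 1-D wavelet ψ:

```latex
% Ridgelet: a 1-D wavelet profile, constant along lines x1 cos(theta) + x2 sin(theta) = const
\psi_{a,b,\theta}(x_1, x_2) = a^{-1/2}\,
  \psi\!\left(\frac{x_1\cos\theta + x_2\sin\theta - b}{a}\right),
  \qquad a > 0,\ b \in \mathbb{R},\ \theta \in [0, 2\pi)

% Ridgelet coefficients of f
\mathcal{R}_f(a, b, \theta)
  = \int_{\mathbb{R}^2} \psi_{a,b,\theta}(\mathbf{x})\, f(\mathbf{x})\, d\mathbf{x}
  = \int_{\mathbb{R}} a^{-1/2}\,\psi\!\left(\frac{t - b}{a}\right) (Rf)(\theta, t)\, dt

% Radon transform: integrals of f along the lines indexed by (theta, t)
(Rf)(\theta, t) = \int_{\mathbb{R}^2} f(\mathbf{x})\,
  \delta(x_1\cos\theta + x_2\sin\theta - t)\, d\mathbf{x}
```

The middle equality is the point of the slide: for each angle θ, the ridgelet coefficients are the 1-D wavelet coefficients of the Radon slice t ↦ (Rf)(θ, t), so a straight edge (a point in Radon space) is captured by few coefficients.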
Applying the ridgelet transform to small blocks (a curved edge is almost straight at sufficiently fine scales)
Curvelet transform: good at capturing curves
Has no ideal discrete implementation
[Panel: ridgelet transform]
Multiscale Geometric Analysis: Curvelet Transform
Graph Signal Processing: Spectrum of Graphs
Key elements, from the Discrete Fourier Transform (DFT) to the Graph Fourier Transform (GFT):
• Basis: eigenvectors of the Laplacian matrix L
• Frequency: eigenvalues of the Laplacian matrix L
• Index: eigenvalue index
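Why the GFT generalizes the DFT can be checked on one graph where everything is known in closed form: for the cycle graph, the Laplacian L = D - A has the complex exponentials as eigenvectors, with eigenvalues 2 - 2cos(2πk/N) acting as graph frequencies. The cycle graph and the signal values are assumptions for illustration; on a general graph the eigenvectors would be computed numerically (for example with numpy.linalg.eigh).

```python
import cmath
import math

def cycle_laplacian(N):
    """Combinatorial Laplacian L = D - A of the cycle graph 0-1-...-(N-1)-0."""
    L = [[0.0] * N for _ in range(N)]
    for n in range(N):
        L[n][n] = 2.0                 # every node has degree 2
        L[n][(n + 1) % N] -= 1.0      # edge to the next node
        L[n][(n - 1) % N] -= 1.0      # edge to the previous node
    return L

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def fourier_mode(N, k):
    """u_k[n] = exp(2j*pi*k*n/N) / sqrt(N): a unit-norm eigenvector of the cycle Laplacian."""
    return [cmath.exp(2j * math.pi * k * n / N) / math.sqrt(N) for n in range(N)]

def gft(x, modes):
    """Forward GFT: x_hat[k] = <x, u_k> = sum_n x[n] * conj(u_k[n])."""
    return [sum(xn * u.conjugate() for xn, u in zip(x, uk)) for uk in modes]

def igft(x_hat, modes):
    """Inverse GFT: x[n] = sum_k x_hat[k] * u_k[n]."""
    return [sum(c * uk[n] for c, uk in zip(x_hat, modes)) for n in range(len(modes[0]))]

N = 8
L = cycle_laplacian(N)
modes = [fourier_mode(N, k) for k in range(N)]
freqs = [2 - 2 * math.cos(2 * math.pi * k / N) for k in range(N)]   # eigenvalues of L

# Largest deviation from L u_k = lambda_k u_k over all modes.
eig_residual = max(max(abs(a - lam * b) for a, b in zip(matvec(L, uk), uk))
                   for lam, uk in zip(freqs, modes))

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0]   # hypothetical graph signal
x_hat = gft(x, modes)
x_rec = igft(x_hat, modes)
print("eigen-residual:", eig_residual)
print("graph frequencies:", [round(lam, 3) for lam in freqs])
```

On this graph the GFT coefficients equal the DFT coefficients divided by sqrt(N); the unitary normalization is the only difference.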
Convolutional Neural Network
Fig. 2 Structure of AlexNet: feature extractor and classifier
A CNN model can be divided into two parts: a feature extractor and a classifier.
The feature extractor is reminiscent of some commonly used signal processing techniques, including filter banks and the operators used in edge detection.
The difference between convolutional kernels and the filters commonly used in signal processing is that the former are learned from huge datasets, whereas the latter are handcrafted.
So, is it possible to construct interpretable models similar to deep CNNs with signal processing methods?
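As a concrete example of a handcrafted filter of the kind the feature extractor resembles, the sketch below applies the Sobel operator to a hypothetical step image. It computes a "valid" 2D cross-correlation, which is the operation deep learning libraries call convolution; a CNN would learn the 3 x 3 weights instead of fixing them.

```python
def correlate2d_valid(img, kernel):
    """2D cross-correlation, keeping only positions where the kernel fits entirely."""
    kh, kw = len(kernel), len(kernel[0])
    H, W = len(img), len(img[0])
    out = []
    for i in range(H - kh + 1):
        row = []
        for j in range(W - kw + 1):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    acc += kernel[u][v] * img[i + u][j + v]
            row.append(acc)
        out.append(row)
    return out

# Handcrafted Sobel kernel: responds to left-to-right intensity increases (vertical edges).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Hypothetical 5 x 6 image: dark left half, bright right half.
img = [[0, 0, 0, 9, 9, 9] for _ in range(5)]
edges = correlate2d_valid(img, SOBEL_X)
for row in edges:
    print(row)   # each row: [0.0, 36.0, 36.0, 0.0]
```

The response is zero in flat regions and large only where the window straddles the edge, which is the same qualitative behavior visualized for first-layer CNN kernels.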
Interpretable CNN
Deep convolutional neural networks have been widely applied since AlexNet (Krizhevsky et al., 2012) achieved its breakthrough in the 2012 ImageNet competition (Russakovsky et al., 2015).
Convolutional neural networks (CNNs) have achieved superior performance in many visual tasks, such as object classification and detection. However, interpretability has always been the Achilles’ heel of neural networks.
Fig. 1 Top-1 accuracies on ImageNet of different networks
Things we want to know:
• Why do CNNs perform so well?
• What knowledge do CNNs learn from huge datasets?
• What can we learn from CNNs to construct further signal processing tools?
Frequency
Salvador Dali “Gala Contemplating the Mediterranean Sea, which at 30 meters becomes the portrait of Abraham Lincoln”, 1976
Jean Baptiste Joseph Fourier (1768-1830)
• Had a crazy idea (1807):
  ▫ Any periodic function can be rewritten as a weighted sum of sines and cosines of different frequencies.
• Don’t believe it?
  ▫ Neither did Lagrange, Laplace, Poisson and other bigwigs
  ▫ Not translated into English until 1878!
• But it’s true!
  ▫ Called Fourier series
Frequency Spectra
• Example: g(t) = sin(2πft) + (1/3) sin(2π(3f)t)
Slides: Efros
$= A\sum_{k=1}^{\infty}\frac{1}{k}\sin(2\pi k t)$
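The series A Σ (1/k) sin(2πkt) can be explored through its partial sums. Summing only the odd harmonics k = 1, 3, 5, ... (the f, 3f, ... pattern of the g(t) example) converges to a square wave of height Aπ/4, while summing every k converges to a sawtooth, Aπ(1/2 - t) on (0, 1). The evaluation points and term counts below are arbitrary choices for illustration.

```python
import math

def partial_sum(t, n_terms, A=1.0, odd_only=True):
    """A * sum_k (1/k) sin(2 pi k t) over the first n_terms admissible k.

    odd_only=True uses k = 1, 3, 5, ...; odd_only=False uses every k = 1, 2, 3, ...
    """
    ks = range(1, 2 * n_terms, 2) if odd_only else range(1, n_terms + 1)
    return A * sum(math.sin(2 * math.pi * k * t) / k for k in ks)

# Odd harmonics: tends to a square wave, +pi/4 on (0, 1/2) and -pi/4 on (1/2, 1).
for n in (1, 2, 10, 100, 1000):
    print(n, round(partial_sum(0.1, n), 4), "target", round(math.pi / 4, 4))

# All harmonics: tends to the sawtooth pi * (1/2 - t) on (0, 1).
print(round(partial_sum(0.1, 1000, odd_only=False), 4), "target", round(0.4 * math.pi, 4))
```

Near the jumps the partial sums overshoot by a fixed fraction however many terms are added (the Gibbs phenomenon), so convergence is pointwise, not uniform.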
Deep Networks using Fourier Analysis
DNNs can exploit the geometry of low-dimensional data manifolds to approximate complex functions that exist along the manifold with simple functions when seen with respect to the input space. The magnitude of a particular frequency component (k) of a deep ReLU network function decays at least as fast as $O(k^{-2})$, with width and depth helping polynomially and exponentially (respectively) in modeling higher frequencies. This shows, for instance, why DNNs cannot perfectly memorize peaky, delta-like functions. DNN parameters corresponding to functions with higher-frequency components occupy a smaller volume in the parameter space.
Sparseland: A Formal Description
[Figure: signal $\mathbf{x} \in \mathbb{R}^{n}$ = dictionary $\mathbf{D} \in \mathbb{R}^{n\times m}$ × sparse vector $\alpha \in \mathbb{R}^{m}$]
• Every column in D (dictionary) is a prototype signal (atom)
• The vector α is generated with few non-zeros at arbitrary locations and values
$\min_{\alpha} \|\alpha\|_0 \ \text{s.t.}\ \mathbf{x} = \mathbf{D}\alpha$
$\min_{\alpha} \|\alpha\|_0 \ \text{s.t.}\ \|\mathbf{D}\alpha - \mathbf{y}\|_2 \le \varepsilon$
Approximation algorithms:
• Greedy methods: thresholding, OMP
• Relaxation methods: Basis Pursuit
ℓ0: counts the number of non-zeros in the vector
This is a projection onto the Sparseland model
These problems are known to be NP-hard.
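Orthogonal Matching Pursuit (OMP), one of the greedy methods listed above, picks the atom most correlated with the residual, refits all chosen coefficients by least squares, and repeats. This minimal pure-Python sketch assumes a hypothetical dictionary D = [I, H/√N] (spikes plus normalized Walsh-Hadamard columns), chosen because its mutual coherence μ = 1/√N is known; the classical coherence guarantee then says OMP recovers every K-sparse α exactly when K < (1 + 1/μ)/2.

```python
import math

def hadamard(n):
    """Sylvester Hadamard matrix of size n (a power of two) with entries +-1."""
    H = [[1.0]]
    while len(H) < n:
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve(A, b):
    """Solve the square system A c = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    sol = [0.0] * n
    for i in reversed(range(n)):
        sol[i] = (M[i][n] - sum(M[i][j] * sol[j] for j in range(i + 1, n))) / M[i][i]
    return sol

def omp(atoms, y, K):
    """Orthogonal Matching Pursuit with K iterations over unit-norm atoms."""
    support, coef, r = [], [], list(y)
    for _ in range(K):
        # Greedy step: the atom most correlated with the current residual.
        j = max(range(len(atoms)), key=lambda i: abs(dot(atoms[i], r)))
        support.append(j)
        # Orthogonal step: least-squares refit on the support via normal equations.
        G = [[dot(atoms[a], atoms[b]) for b in support] for a in support]
        coef = solve(G, [dot(atoms[a], y) for a in support])
        r = [y[n] - sum(c * atoms[a][n] for c, a in zip(coef, support))
             for n in range(len(y))]
    alpha = [0.0] * len(atoms)
    for c, a in zip(coef, support):
        alpha[a] = c
    return alpha

N = 16
H = hadamard(N)
spikes = [[1.0 if n == k else 0.0 for n in range(N)] for k in range(N)]
waves = [[H[n][k] / math.sqrt(N) for n in range(N)] for k in range(N)]
atoms = spikes + waves                       # columns of D = [I, H / sqrt(N)]

# Mutual coherence: largest |<d_i, d_j>| over distinct atoms.
mu = max(abs(dot(atoms[i], atoms[j]))
         for i in range(len(atoms)) for j in range(i + 1, len(atoms)))

alpha_true = [0.0] * len(atoms)
alpha_true[3], alpha_true[N + 5] = 2.0, -1.5   # K = 2 < (1 + 1/mu) / 2 = 2.5
y = [sum(a * atom[n] for a, atom in zip(alpha_true, atoms)) for n in range(N)]

alpha_hat = omp(atoms, y, K=2)
print("coherence:", mu, "recovered support:",
      [i for i, a in enumerate(alpha_hat) if abs(a) > 1e-9])
```

The least-squares refit is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to every selected atom, so no atom is picked twice.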
Convolutional Sparse Coding (CSC)
• When handling images, Sparseland is typically deployed on small overlapping patches, due to the desire to train the model to fit the data better
• The model assumption: each patch in the image is believed to have a sparse representation w.r.t. a common local dictionary
• What is the corresponding global model? This brings us to … the Convolutional Sparse Coding (CSC) model
Convolutional Sparse Coding (CSC)
$\mathbf{X} = \sum_{i=1}^{m} \mathbf{d}_i * \boldsymbol{\Gamma}_i$
• $\mathbf{X}$: an image with $N$ pixels
• $\mathbf{d}_i$: the i-th filter, of small size $n$
• $\boldsymbol{\Gamma}_i$: the i-th feature map, an image of the same size as $\mathbf{X}$ holding the sparse representation related to the i-th filter
• $m$ filters convolved with their sparse representations
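The CSC synthesis X = Σ_i d_i * Γ_i is a single matrix-vector product in disguise: stacking the banded circulant matrices of the filters gives a global dictionary D of size N × Nm, and stacking the feature maps gives Γ, so X = DΓ. This sketch simplifies images to 1-D circular signals; the filter and feature-map values are hypothetical.

```python
def circular_conv(d, gamma):
    """(d * gamma)[n] = sum_k d[k] * gamma[(n - k) mod N] for a short filter d."""
    N = len(gamma)
    return [sum(d[k] * gamma[(n - k) % N] for k in range(len(d))) for n in range(N)]

def csc_synthesis(filters, feature_maps):
    """X = sum_i d_i * Gamma_i: every local filter convolved with its own sparse map."""
    X = [0.0] * len(feature_maps[0])
    for d, gamma in zip(filters, feature_maps):
        X = [a + b for a, b in zip(X, circular_conv(d, gamma))]
    return X

def global_dictionary(filters, N):
    """Global N x (N*m) matrix D = [C_1, ..., C_m], C_i the banded circulant of d_i."""
    D = []
    for n in range(N):
        row = []
        for d in filters:
            # Column j of C_i holds d_i placed (circularly) starting at position j.
            row += [d[(n - j) % N] if (n - j) % N < len(d) else 0.0 for j in range(N)]
        D.append(row)
    return D

N = 12
filters = [[1.0, -1.0, 0.5], [0.25, 0.5, 0.25]]     # m = 2 hypothetical filters, size n = 3
maps = [[0.0] * N for _ in filters]
maps[0][2], maps[0][8], maps[1][5] = 1.0, -2.0, 3.0  # sparse feature maps Gamma_i

X = csc_synthesis(filters, maps)
D = global_dictionary(filters, N)
Gamma = maps[0] + maps[1]                            # stacked representation, length N*m
X_global = [sum(a * b for a, b in zip(row, Gamma)) for row in D]
print(len(D), "x", len(D[0]), "global dictionary")
```

Only 3 of the 24 entries of Γ are non-zero, yet every patch of X is explained; this is the global sparsity the patch-based model lacked.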
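Why CSC?
$\mathbf{X} = \mathbf{D}\boldsymbol{\Gamma}$
[Figure: the i-th patch $\mathbf{R}_i\mathbf{X} \in \mathbb{R}^{n}$ is explained by the stripe dictionary $\boldsymbol{\Omega} \in \mathbb{R}^{n\times(2n-1)m}$ times the stripe vector $\boldsymbol{\gamma}_i$; likewise for the next patch $\mathbf{R}_{i+1}\mathbf{X}$]
$\mathbf{R}_i\mathbf{X} = \boldsymbol{\Omega}\boldsymbol{\gamma}_i, \qquad \mathbf{R}_{i+1}\mathbf{X} = \boldsymbol{\Omega}\boldsymbol{\gamma}_{i+1}$
• Every patch has a sparse representation w.r.t. the same local dictionary Ω, just as assumed for images
• There is a rough analogy between CSC and CNN:
  1. Convolutional structure
  2. Data-driven model
  3. ReLU is a sparsifying operator
• We shall now propose a principled way to analyze CNNs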
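From CSC to Multi-Layered CSC
[Figure: $\mathbf{X} = \mathbf{D}_1\boldsymbol{\Gamma}_1$ with $\mathbf{X} \in \mathbb{R}^{N}$, $\mathbf{D}_1 \in \mathbb{R}^{N\times Nm_1}$ (built from $m_1$ filters of size $n_0$) and $\boldsymbol{\Gamma}_1 \in \mathbb{R}^{Nm_1}$; $\boldsymbol{\Gamma}_1 = \mathbf{D}_2\boldsymbol{\Gamma}_2$ with $\mathbf{D}_2 \in \mathbb{R}^{Nm_1\times Nm_2}$ (built from $m_2$ filters of size $n_1 m_1$) and $\boldsymbol{\Gamma}_2 \in \mathbb{R}^{Nm_2}$]
Convolutional sparsity (CSC) assumes that an inherent structure is present in natural signals.
We propose to impose the same structure on the representations themselves.
→ Multi-Layer CSC (ML-CSC)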
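Multi-Layer CSC
[Figure: $\mathbf{X} \in \mathbb{R}^{N}$, $\mathbf{D}_1 \in \mathbb{R}^{N\times Nm_1}$, $\boldsymbol{\Gamma}_1 \in \mathbb{R}^{Nm_1}$, $\mathbf{D}_2 \in \mathbb{R}^{Nm_1\times Nm_2}$, $\boldsymbol{\Gamma}_2 \in \mathbb{R}^{Nm_2}$]
• We can chain all the dictionaries into one effective dictionary $\mathbf{D}_{\text{eff}} = \mathbf{D}_1\mathbf{D}_2\mathbf{D}_3\cdots\mathbf{D}_K \rightarrow \mathbf{x} = \mathbf{D}_{\text{eff}}\boldsymbol{\Gamma}_K$
• This is a special Sparseland (indeed, a CSC) model
• However, a key property of this model is the sparsity of the intermediate representations
• The effective atoms: atoms → molecules → cells → tissue → body parts → …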
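Two facts behind the ML-CSC view can be checked on toy matrices: chaining layers gives one effective dictionary (x = D1Γ1 with Γ1 = D2Γ2 means x = (D1D2)Γ2), and one layer of non-negative soft thresholding, max(D1ᵀx - β, 0), is exactly a ReLU layer with weights D1ᵀ and bias -β. The second point follows the layered-thresholding interpretation of the CNN forward pass in the ML-CSC work of Papyan, Romano and Elad. The sketch uses small dense hypothetical matrices rather than convolutional ones and makes no claim that the thresholded estimate recovers Γ1.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def soft_threshold_nonneg(z, beta):
    """Non-negative soft thresholding S+_beta(z) = max(z - beta, 0), entrywise."""
    return [max(v - beta, 0.0) for v in z]

def relu_layer(W, b, x):
    """One CNN-style layer: ReLU(W x + b)."""
    return [max(v + bi, 0.0) for v, bi in zip(matvec(W, x), b)]

# Hypothetical small dense dictionaries (the convolutional structure is dropped).
D1 = [[1, 0, 1, 0, 0, 1],
      [0, 1, 1, 0, 1, 0],
      [0, 0, 0, 1, 1, 0],
      [1, 1, 0, 1, 0, -1]]              # 4 x 6
D2 = [[1, 0, 0, 0, 1, 0, 0, 0],
      [0, 1, 0, 0, 0, 1, 0, 0],
      [0, 0, 1, 0, 0, 0, 1, 0],
      [0, 0, 0, 1, 0, 0, 0, 1],
      [1, 1, 0, 0, 0, 0, 0, 0],
      [0, 0, 1, 1, 0, 0, 0, 0]]         # 6 x 8
Gamma2 = [0, 0, 2, 0, 0, 0, 0, 1]       # deepest sparse representation
Gamma1 = matvec(D2, Gamma2)             # intermediate representation
X = matvec(D1, Gamma1)                  # signal

D_eff = matmul(D1, D2)                  # effective dictionary D1 D2 (4 x 8)

beta1 = 1.0                             # hypothetical threshold
layer1 = soft_threshold_nonneg(matvec(transpose(D1), X), beta1)
print("Gamma1:", Gamma1, "X:", X, "first-layer estimate:", layer1)
```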
• Scattering Convolutional Networks [S. Mallat, PAMI13]
  ▫ Convolution + modulus pooling network architecture
  ▫ Translation invariance
  ▫ Deformation stability
  ▫ Energy propagation
  ▫ Geometric image priors
[Figure: input signals; wavelet decomposition (stable to deformation) with various wavelet filters at different scalings and rotations; convolution, modulus and average pooling; outputs labeled U and S]
A special type of CNN with predefined complex wavelet filters and the modulus operator
Convolutional Networks vs. Scattering Networks
Not invertible!
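The "convolution + modulus + average pooling" recipe can be illustrated with first-order scattering coefficients S1x(λ) = mean over n of |x * ψ_λ|[n]. This sketch makes simplifying assumptions: 1-D circular signals, complex Gabor filters in place of Morlet wavelets (no zero-mean correction), and a global average in place of a finite-scale low-pass φ. Mallat's networks also cascade further layers, which are not shown. A circular shift changes the modulus maps but leaves S1 unchanged.

```python
import cmath
import math

def gabor_filter(xi, sigma):
    """Complex Gabor filter exp(i xi n) exp(-n^2 / (2 sigma^2)) on n = -4 sigma..4 sigma.

    A Morlet wavelet would also subtract a small constant to make the filter
    zero-mean; that correction is omitted in this sketch.
    """
    hw = int(4 * sigma)
    return {n: cmath.exp(1j * xi * n) * math.exp(-n * n / (2 * sigma ** 2))
            for n in range(-hw, hw + 1)}

def circ_conv(x, h):
    """Circular convolution (x * h)[n] = sum_k h[k] x[(n - k) mod N]."""
    N = len(x)
    return [sum(hk * x[(n - k) % N] for k, hk in h.items()) for n in range(N)]

def scattering_order1(x, filters):
    """First-order coefficients: wavelet convolution, modulus, then global average."""
    return [sum(abs(v) for v in circ_conv(x, psi)) / len(x) for psi in filters]

# Hypothetical dyadic filter bank: centre frequency halves as the scale doubles.
filters = [gabor_filter(xi=0.75 * math.pi / 2 ** j, sigma=2.0 ** j) for j in range(3)]

N = 64
x = [math.exp(-((n - 20) / 3.0) ** 2) * math.cos(0.8 * n) for n in range(N)]
shifted = x[-7:] + x[:-7]               # circular shift by 7 samples

U = [abs(v) for v in circ_conv(x, filters[0])]           # modulus map: not invariant
U_shifted = [abs(v) for v in circ_conv(shifted, filters[0])]
S, S_shifted = scattering_order1(x, filters), scattering_order1(shifted, filters)
print("max change in |x * psi|:", max(abs(a - b) for a, b in zip(U, U_shifted)))
print("max change in S1:       ", max(abs(a - b) for a, b in zip(S, S_shifted)))
```

The modulus is what makes averaging useful: averaging x * ψ directly would give a value near zero for these band-pass filters, while averaging |x * ψ| keeps the energy in each band.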
Sparse Auto-encoder
The auto-encoder tries to learn a function $h_{W,b}(x) \approx x$. In other words, it is trying to learn an approximation to the identity function, so that its output $\hat{x}$ is similar to $x$.
Convolutional Autoencoder
Winner-Take-All Autoencoders (Makhzani and Frey, 2015): proposes the convolutional winner-take-all autoencoder, which combines the benefits of convolutional architectures and autoencoders for learning sparse representations.
Autoencoder with adversarial training (Rippel and Bourdev, 2017): introduces adversarial training into a convolutional autoencoder, which produces visually pleasing reconstructions at very low bitrates.
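The sparsity mechanism of the convolutional winner-take-all autoencoder can be sketched in a few lines: in its spatial-sparsity step, only the largest activation in each encoder feature map is kept and all others are set to zero before decoding. The feature-map values below are hypothetical, and the paper's additional lifetime-sparsity step across a minibatch is not shown.

```python
def spatial_wta(feature_maps):
    """Spatial winner-take-all: in each feature map keep only the largest activation.

    Ties are broken by keeping the first maximum in row-major order.
    """
    sparse = []
    for fmap in feature_maps:
        width = len(fmap[0])
        flat = [v for row in fmap for v in row]
        winner = max(range(len(flat)), key=lambda i: flat[i])
        sparse.append([[fmap[r][c] if r * width + c == winner else 0.0
                        for c in range(width)] for r in range(len(fmap))])
    return sparse

# Two hypothetical 2 x 3 feature maps produced by an encoder.
maps = [[[0.1, 0.7, 0.2],
         [0.0, 0.3, 0.6]],
        [[0.4, 0.1, 0.0],
         [0.9, 0.2, 0.5]]]
print(spatial_wta(maps))
```

During training, gradients flow only through the surviving winners, so each filter is pushed to explain one localized structure well.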
Autoencoder with Recurrent Neural Networks
(Toderici et al., 2017) The architecture consists of a recurrent neural network (RNN)-based encoder and decoder, a binarizer, and a neural network for entropy coding.
Ongoing Work
Introduce structural sparsity learning into the convolutional autoencoder.
The sparsity penalty (group lasso) helps the architecture adaptively produce fewer feature maps while keeping structural information.
[Figure: encoder and decoder trained with a reconstruction loss plus a structural loss (group-lasso penalty)]
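The way a group-lasso penalty removes whole feature maps can be sketched with its proximal operator (group soft thresholding): each group's ℓ2 norm is shrunk by τ, and groups whose norm is at most τ become exactly zero. This assumes each group holds the weights that produce one feature map; the weights and τ are hypothetical, and proximal steps are only one common way to optimize such a penalty, not necessarily the method of the ongoing work.

```python
import math

def l2norm(g):
    return math.sqrt(sum(w * w for w in g))

def group_lasso_penalty(groups, lam):
    """lam * sum_g ||w_g||_2: l2 norm inside each group, summed over groups."""
    return lam * sum(l2norm(g) for g in groups)

def group_soft_threshold(groups, tau):
    """Proximal operator of tau * sum_g ||w_g||_2 (group soft thresholding).

    Each group's l2 norm shrinks by tau; groups with norm <= tau become exactly
    zero, which switches off the whole feature map they produce.
    """
    out = []
    for g in groups:
        norm = l2norm(g)
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * w for w in g])
    return out

# Hypothetical flattened weights of four encoder filters, one group per feature map.
groups = [[0.9, -1.2, 0.4],
          [0.05, -0.02, 0.03],
          [0.3, 0.3, -0.3],
          [0.0, 0.01, 0.0]]
tau = 0.1
pruned = group_soft_threshold(groups, tau)
kept = [i for i, g in enumerate(pruned) if any(w != 0.0 for w in g)]
print("feature maps kept:", kept)
print("penalty before/after:", group_lasso_penalty(groups, 1.0), group_lasso_penalty(pruned, 1.0))
```

An ordinary ℓ1 penalty would zero individual weights scattered across filters; grouping by feature map makes the zeros line up so entire channels can be dropped.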
Many Thanks
Q & A