omp-and-k-svd - Gdc2013
-
Upload
manchor-ko -
Category
Technology
-
view
2.214 -
download
4
description
Transcript of omp-and-k-svd - Gdc2013
![Page 1: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/1.jpg)
Orthogonal Matching Pursuit and K-SVD for Sparse Encoding Manny Ko Senior Software Engineer, Imaginations Technologies
Robin Green SSDE, Microsoft Xbox ATG
![Page 2: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/2.jpg)
Outline
● Signal Representation
● Orthonormal Bases vs. Frames
● Dictionaries
● The Sparse Signal Model
● Matching Pursuit
● Implementing Orthonomal Matching Pursuit
● Learned Dictionaries and KSVD
● Image Processing with Learned Dictionaries
● OMP for GPUs
![Page 3: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/3.jpg)
Representing Signals
● We represent most signals as linear combinations of things we already know, called a projection
× 𝛼1 +
× 𝛼2 +
× 𝛼3 +⋯
=
× 𝛼0 +
![Page 4: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/4.jpg)
Representing Signals
● Each function we used is a basis and the scalar weights are coefficients
● The reconstruction is an approximation to the original 𝑥 ● We can measure and control the error 𝑥 − 𝑥 2
𝑥 𝑡 = 𝑏𝑖 𝑡 𝛼𝑖
𝑁
𝑖=0
![Page 5: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/5.jpg)
Orthonormal Bases (ONBs)
● The simplest way to represent signals is using a set of orthonormal bases
𝑏𝑖 𝑡 𝑏𝑗(𝑡)
+∞
−∞
𝑑𝑡 = 0 𝑖 ≠ 𝑗 1 𝑖 = 𝑗
![Page 6: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/6.jpg)
Example ONBs
● Fourier Basis
𝑏𝑘 𝑡 = 𝑒𝑖2𝑝𝑘𝑡
● Wavelets
𝑏𝑚,𝑛 𝑡 = 𝑎−𝑚 2 𝑥 𝑎−𝑚𝑡 − 𝑏𝑚
● Gabor Functions
𝑏𝑘,𝑛 𝑡 = 𝜔 𝑡 − 𝑏𝑛 𝑒𝑖2𝑝𝑘𝑡
● Contourlet
𝑏𝑗,𝑘,𝐧 𝑡 = λ𝑗,𝑘 𝑡 − 2𝑗−1𝐒𝑘n
![Page 7: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/7.jpg)
Benefits of ONB
● Analytic formulations
● Well understood mathematical properties
● Fast algorithms for projection
![Page 8: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/8.jpg)
Limitations
● Orthonormal bases are optimal only for specific synthetic signals ● If your signal looks exactly like your basis, you only need one
coefficient
● Limited expressiveness, all signals behave the same
● Real world signals often take a lot of coefficients ● Just truncate the series, which leads to artifacts like aliasing
![Page 9: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/9.jpg)
Smooth vs. Sharp
Haar Wavelet Basis
● Sharp edges
● Local support
Discrete Cosine Transform
● Smooth signals
● Global support
![Page 10: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/10.jpg)
Overcomplete Bases
● Frames are overcomplete bases ● There is now more than one way to represent a signal
● By relaxing the ONB rules on minimal span, we can better approximate signals using more coefficients
Φ = 𝑒1|𝑒2|𝑒3
=1 0 10 1 −1
Φ = 𝑒 1| 𝑒 2|𝑒 3
=2 −1 −10 1 0
![Page 11: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/11.jpg)
Dictionaries
● A dictionary is an overcomplete basis made of atoms
● A signal is represented using a linear combination of only a few atoms
● Atoms work best when zero-mean and normalized
𝑑𝑖𝑖∈𝐼
𝛼𝑖 = 𝑥
𝑫𝛼 = 𝑥
![Page 12: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/12.jpg)
Dictionaries
𝐃 α
=
𝑥
![Page 13: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/13.jpg)
Mixed Dictionaries
● A dictionary of Haar + DCT gives the best of both worlds But now how do we pick which coefficients to use?
![Page 14: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/14.jpg)
The Sparse Signal Model
𝐃 A fixed dictionary
𝛼
=
𝑥
𝑁 𝑁
𝐾
resulting signal
Sparse vector of
coefficients
![Page 15: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/15.jpg)
The Sparse Signal Model
It’s Simple ● Every result is built from a combination of a few atoms
It’s Rich ● It’s a general model, signals are a union of many low dimensional parts
It’s Used Everywhere ● The same model is used for years in Wavelets, JPEG compression,
anything where we’ve been throwing away coefficients
![Page 16: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/16.jpg)
Solving for Sparsity
What is the minimum number of coefficients we can use?
1. Sparsity Constrained
keep adding atoms until we reach a maximum count
2. Error Constrained Keep adding atoms until we reach a certain accuracy
𝛼 = argmin 𝛼
𝛼 0 s. t. 𝐃𝛼 − 𝑥 22 ≤ 𝜖
𝛼 = argmin𝛼
𝐃𝛼 − 𝑥 22 s. t. 𝛼 0 ≤ 𝐾
![Page 17: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/17.jpg)
Naïve Sparse Methods
● We can directly find 𝛼 using Least Squares
● Given K=1000 and L=10 at one LS per nanosecond this would complete in ~8 million years.
1. set 𝐿 = 1
2. generate 𝑆 = { 𝒫𝐿 𝑫 }
3. for each set solve the Least Squares problem min𝛼
𝐃𝛼 − 𝑥 22
where 𝑠𝑢𝑝𝑝 𝛼 ∈ 𝑆𝑖
4. if LS error ≤ 𝜖 finish!
5. set 𝐿 = 𝐿 + 1
6. goto 2
![Page 18: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/18.jpg)
Matching Pursuit
1. Set the residual 𝑟 = 𝑥
2. Find an unselected atom that best matches the residual 𝐃𝛼 − 𝑟
3. Re-calculate the residual from matched atoms 𝑟 = 𝑥 − 𝐃𝛼
4. Repeat until 𝑟 ≤ 𝜖
Greedy Methods
𝐃 𝛼
=
𝑥
![Page 19: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/19.jpg)
Problems with Matching Pursuit (MP)
● If the dictionary contains atoms that are very similar, they tend to match the residual over and over
● Similar atoms do not help the basis span the space of representable values quickly, wasting coefficients in a sparsity constrained solution
● Similar atoms may match strongly but will not have a large effect in reducing the absolute error in an error constrained solution
![Page 20: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/20.jpg)
Orthogonal Matching Pursuit (OMP)
● Add an Orthogonal Projection to the residual calculation
1. set 𝐼 ∶= ∅ , 𝑟 ≔ 𝑥, 𝛾 ≔ 0
2. while (𝑠𝑡𝑜𝑝𝑝𝑖𝑛𝑔 𝑡𝑒𝑠𝑡 𝑓𝑎𝑙𝑠𝑒) do
3. 𝑘 ≔ argmax𝑘
𝑑𝑘𝑇𝑟
4. 𝐼 ≔ 𝐼, 𝑘
5. 𝛾𝐼 ≔ 𝐃𝐼+𝑥
6. 𝑟 ≔ 𝑥 − 𝐃𝐼𝛾𝐼
7. end while
![Page 21: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/21.jpg)
Uniqueness and Stability
● OMP has guaranteed reconstruction (provided the dictionary is overcomplete)
● By projecting the input into the range-space of the atoms, we know that that the residual will be orthogonal to the selected atoms
● Unlike Matching Pursuit (MP) that atom, and all similar ones, will not be reselected so more of the space is spanned per iteration
![Page 22: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/22.jpg)
Orthogonal Projection
● If the dictionary 𝐃 was square, we could use an inverse
● Instead we use the Pseudo-inverse 𝐃+ = 𝐃𝑇𝐃 −1𝐃𝑇
𝐃+
× =
−1
𝐃𝑇 𝐃𝑇 𝐃𝑇 𝑖𝑛𝑣 × 𝐃
=
![Page 23: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/23.jpg)
Pseudoinverse is Fragile
● In floating point, the expression 𝐃T𝐃−1
is notoriously
numerically troublesome – the classic FP example
● Picture mapping all the points on a sphere using 𝐃𝑇𝐃 then inverting
![Page 24: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/24.jpg)
Implementing the Pseudoinverse
● To avoid this, and reduce the cost of inversion, we can note that 𝐃T𝐃 is always symmetric and positive definite
● We can break the matrix into two triangular matrices using Cholesky
Decomposition 𝐀 = 𝐋𝐋𝑇
● Incremental Cholesky Decomp reuses the results of the previous iteration, adding a single new row and column each time
𝐋𝑛𝑒𝑤 =
𝐋 0
𝑤𝑇 1 − 𝑤𝑇𝑤 where 𝑤 = 𝐋−1𝐷𝐼𝑑𝑘
![Page 25: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/25.jpg)
OMP-Cholesky 1. set 𝐼 ∶= ∅ , 𝐿 ≔ 1 , 𝑟 ≔ 𝑥, 𝛾 ≔ 0, 𝛼 ≔ 𝐃𝑇𝑥, 𝑛 ≔ 1
2. while (𝑠𝑡𝑜𝑝𝑝𝑖𝑛𝑔 𝑡𝑒𝑠𝑡 𝑓𝑎𝑙𝑠𝑒) do
3. 𝑘 ≔ argmax𝑘
𝑑𝑘𝑇𝑟
4. if 𝑛 > 1 then 𝑤 ≔ Solve for 𝑤 𝐋𝑤 = 𝐃𝐼
𝑇𝑑𝑘
𝐋 ≔ 𝐋 𝟎
𝑤𝑇 1 − 𝑤𝑇𝑤
5. 𝐼 ≔ 𝐼, 𝑘
6. 𝛾𝐼 ≔ Solve for 𝑐 𝐋𝐋𝑇𝑐 = 𝛼𝐼
7. 𝑟 ≔ 𝑥 − 𝐃𝐼𝛾𝐼
8. 𝑛 ≔ 𝑛 + 1
9. end while
![Page 26: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/26.jpg)
OMP compression of Barbara
2 atoms 3 atoms 4 atoms
![Page 27: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/27.jpg)
![Page 28: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/28.jpg)
![Page 29: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/29.jpg)
![Page 30: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/30.jpg)
Batch OMP (BOMP)
● By pre-computing matrices, Batch OMP can speed up OMP on large numbers (>1000) of inputs against one dictionary
● To avoid computing 𝐃𝑇𝑟 at each iteration
● Precompute 𝐃𝑇𝑥 and the Gram-matrix 𝐆 = 𝐃𝑇𝐃
𝐃𝑇𝑟 = 𝐃𝑇(𝑥 − 𝐃𝐼(𝐃𝐈)+𝑥)
= 𝐃𝑇𝑥 − 𝐆𝐼(𝐃𝐼)+𝑥
= 𝐃𝑇𝑥 − 𝐆𝐼(𝐃𝐼𝑇𝐃𝐼)
−1𝐃𝐼𝑇𝑥
= 𝐃𝑇𝑥 − 𝐆𝐼(𝐆𝐼,𝐼)−1𝐃𝐼
𝑇𝑥
![Page 31: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/31.jpg)
Learned Dictionaries and K-SVD
● OMP works well for a fixed dictionary, but it would work better if we could optimize the dictionary to fit the data
𝐃 ≈ ⋯ 𝐗 ⋯ 𝐀
![Page 32: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/32.jpg)
Sourcing Enough Data
● For training you will need a large number of samples compared to the size of the dictionary.
● Take blocks from all integer offsets on the pixel grid
𝑥 = …
![Page 33: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/33.jpg)
1. Sparse Encode
● Sparse encode all entries in 𝐱. Collect these sparse vectors into a square array 𝐀
𝐃
𝐀T
𝐗T
⋯
⋯
![Page 34: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/34.jpg)
2. Dictionary Update
● Find all 𝐗 that use atom in column 𝒅𝑘
⋯
⋯
![Page 35: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/35.jpg)
2. Dictionary Update
● Find all 𝐗 that use atom in column 𝑑𝑘
● Calculate the error without 𝑑𝑘
by 𝐄 = 𝐗𝐼 − 𝑑𝑖𝐀𝐢𝑖≠𝑘
● Solve the LS problem:
● Update 𝑑𝑖 with the new 𝑑 and 𝐀 with the new 𝑎
𝑑, 𝑎 = Argmin𝑑,𝑎
𝐄 − 𝑑𝑎𝑇 𝐹2 𝑠. 𝑡. 𝑑 2 = 1
𝑎
![Page 36: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/36.jpg)
Atoms after K-SVD Update
![Page 37: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/37.jpg)
How many iterations of update?
200000
250000
300000
350000
400000
450000
500000
550000
600000
0 10 20 30 40 50 60 70
Batch OMP
K-SVD
![Page 38: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/38.jpg)
Sparse Image Compression
● As we have seen, we can control the number of atoms used per block
● We can also specify the exact size of the dictionary and optimize it for each data source
● The resulting coefficient stream can be coded using a Predictive Coder like Huffman or Arithmetic coding
![Page 39: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/39.jpg)
![Page 40: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/40.jpg)
Domain Specific Compression
● Using just 550 bytes per image
1. Original
2. JPEG
3. JPEG2000
4. PCA
5. KSVD per block
![Page 41: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/41.jpg)
Sparse Denoising
● Uniform noise is incompressible and OMP will reject it
● KSVD can train a denoising dictionary from noisy image blocks
Source Result 30.829dB Noisy image
20
![Page 42: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/42.jpg)
Sparse Inpainting
● Missing values in 𝐱 means missing rows in 𝐃
● Remove these rows and refit α to recover 𝐱 ● If 𝛼 was sparse enough, the recovery will be perfect
=
![Page 43: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/43.jpg)
Sparse Inpainting
Original 80% missing Result
![Page 44: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/44.jpg)
Super Resolution
![Page 45: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/45.jpg)
Super Resolution
The Original Bicubic Interpolation SR result
![Page 46: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/46.jpg)
Block compression of Voxel grids
● “A Compression Domain output-sensitive volume rendering architecture based on sparse representation of voxel blocks” Gobbetti, Guitian and Marton [2012]
● COVRA sparsely represents each voxel block as a dictionary of 8x8x8 blocks and three coefficients
● The voxel patch is reconstructed only inside the GPU shader so voxels are decompressed just-in-time
● Huge bandwidth improvements, larger models and faster rendering
![Page 47: omp-and-k-svd - Gdc2013](https://reader034.fdocuments.net/reader034/viewer/2022042515/55583725d8b42ac6078b4915/html5/thumbnails/47.jpg)
Thank you to:
● Ron Rubstein & Michael Elad
● Marc LeBrun
● Enrico Gobbetti