On Missing Data Prediction using Sparse Signal Models: A Comparison of Atomic Decompositions with Iterated Denoising

Onur G. Guleryuz

DoCoMo USA Labs, San Jose, CA 95110

guleryuz@docomolabs-usa.com

(google: onur guleryuz)

(Please view in full screen mode. The presentation tries to squeeze in too much; please feel free to email me any questions you may have.)

Overview

•Problem statement: Prediction of missing data.

•Formulation as a sparse linear expansion over overcomplete basis.

•AD (l0 regularized) and ID formulations.

•Short simulation results (l1 regularized).

•Why ID is better than AD.

•Adaptive predictors on general data: all methods are mathematically the same.

Key issues are basis selection, and utilizing what you have effectively.

Mini FAQ:

1. Is ID the same as l1? No.

2. Is ID the same as l0, except implemented iteratively? No.

3. Are predictors that yield the sparsest set of expansion coefficients the best? No, predictors that yield the smallest mse are the best.

4. On images, look for performance over large missing chunks (with edges).

Some results from Ivan W. Selesnick, Richard Van Slyke, and Onur G. Guleryuz, “Pixel Recovery via l1 Minimization in the Wavelet Domain,” Proc. IEEE Int'l Conf. on Image Proc. (ICIP2004), Singapore, Oct. 2004.

(Some software available at my webpage.)

Pretty ID pictures: Onur G. Guleryuz, “Nonlinear Approximation Based Image Recovery Using Adaptive Sparse Reconstructions and Iterated Denoising: Part II – Adaptive Algorithms,” IEEE Tr. on IP, to appear.

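Problem Statement

1. Original image: $x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$, where $x_0$ ($n_0$ entries) holds the available pixels and $x_1$ ($n_1$ entries) holds the lost region pixels (assume zero mean), with $n_0 + n_1 = N$.

2. Lost region: $P_0$ is the available data projection (“mask”).

3. Derive predicted $\hat{x}_1$, giving the estimate $y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}$.

[Figure: equation (1), drawn as a sum of two images; signal space with x and y; noisy signal (noise correlated with the data); type 1 iterations]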

Recipe for y

1. Take NxM matrix of overcomplete basis, $H = [\,h_1\ h_2\ \ldots\ h_M\,]$

2. Write y in terms of the basis: $y = Hc = \sum_{i} c_i h_i$

3. Find “sparse” expansion coefficients (AD v.s. ID)
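As a small illustration of steps 1 and 2, the sketch below builds an overcomplete dictionary and expands a signal over it. The specific dictionary (orthonormal DCT atoms stacked with pixel spikes, so M = 2N) and the helper name `dct_basis` are assumptions made for this example only; the slides do not prescribe them.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom."""
    idx = np.arange(n)
    H = np.cos(np.pi * (idx[:, None] + 0.5) * idx[None, :] / n)
    H *= np.sqrt(2.0 / n)
    H[:, 0] /= np.sqrt(2.0)
    return H

N = 8
# Hypothetical overcomplete dictionary: DCT atoms plus pixel (identity) atoms.
H = np.hstack([dct_basis(N), np.eye(N)])   # N x M with M = 2N
M = H.shape[1]

# A sparse coefficient vector: one cosine atom plus one spike.
c = np.zeros(M)
c[2] = 3.0         # third DCT atom
c[N + 5] = -1.0    # spike at sample 5

y = H @ c                                        # y = Hc
y_sum = sum(c[i] * H[:, i] for i in range(M))    # y = sum_i c_i h_i
```

Both expressions give the same y; step 3 is about finding a c like this one when only part of y is known.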

Any y has to be sparse

$$y = \begin{bmatrix} x_0 \\ \hat{x}_1 \end{bmatrix}, \qquad \hat{x}_1 = A(x_0)\, x_0$$

The Nx1 estimate y is fixed by the $n_0$ available samples, so it is orthogonal to a null space of dimension $n_1 = N - n_0$: y has to be sparse.

Onur’s trivial sparsity theorem:

Estimation algorithms are equivalent to a basis in which estimates are sparse.

Who cares about y, what about the original x?

If successful prediction is possible x also has to be ~sparse, i.e., if $\|x - y\|_2$ is small, then x ~ sparse.

1. Predictable means sparse.

2. Sparsity of x is a necessary leap of faith to make in estimation.

•Caveat: Any estimator is putting up a sparse y. Assuming x is sparse, the estimator that wins is the one that matches the sparsity “correctly”!

•Putting up sparse estimates is not the issue, putting up estimates that minimize mse is.

•Can we be proud of the formulation? Not really. It is honest, but ambitious.

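AD: Find the expansion coefficients to minimize the l0 norm

$$\min_{c} \|c\|_0 \quad \text{subject to} \quad \Big\| P_0 \Big( x - \sum_{i=1}^{M} c_i h_i \Big) \Big\|_2 \le T$$

Regularization: $\|c\|_0$, the l0 norm of expansion coefficients. Available data constraint: the inequality on the masked residual.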

Getting to the heart of the matter:

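AD with Significance Sets

$$\min_{S} \operatorname{card}(S) \quad \text{subject to} \quad \| P_0 (x - y) \|_2 \le T$$

Here S is the significance set, the indices of the atoms $h_i$ used to build y.

Finds the sparsest (the most predictable) signal consistent with the available data.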
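To make the significance-set formulation concrete, here is a brute-force sketch that searches supports of increasing size and returns the first one whose least-squares fit meets the available-data constraint. It is only feasible for toy sizes; the orthonormal DCT dictionary, the test signal, and the names `dct_basis` and `sparsest_fit` are assumptions for this example, not anything taken from the slides.

```python
import itertools
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom."""
    idx = np.arange(n)
    H = np.cos(np.pi * (idx[:, None] + 0.5) * idx[None, :] / n)
    H *= np.sqrt(2.0 / n)
    H[:, 0] /= np.sqrt(2.0)
    return H

def sparsest_fit(x_avail, H, avail, T):
    """Smallest support S whose atoms fit the available samples within T."""
    M = H.shape[1]
    HA = H[avail, :]                      # masked atoms, P0 h_i
    for size in range(M + 1):
        for S in itertools.combinations(range(M), size):
            cols = list(S)
            if cols:
                c_S, *_ = np.linalg.lstsq(HA[:, cols], x_avail, rcond=None)
            else:
                c_S = np.zeros(0)
            resid = x_avail - HA[:, cols] @ c_S
            if np.linalg.norm(resid) <= T:
                return S, H[:, cols] @ c_S    # support and full-length y
    return None

N = 8
H = dct_basis(N)
x = 10.0 * H[:, 3]                 # signal that is 1-sparse in H
avail = [0, 1, 2, 5, 6, 7]         # samples 3 and 4 are missing
S, y = sparsest_fit(x[avail], H, avail, T=1e-8)
```

On this toy signal the search stops at a single atom and the missing samples are filled in from it. The exhaustive search is exactly what makes the l0 problem hard at realistic sizes.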

Iterated Denoising with Insignificant Sets

Pick I(T):

$$\min_{y} \sum_{i \in I(T)} |h_i^T y|^2 \quad \text{subject to} \quad P_0 y = P_0 x$$

(Once the insignificant set is determined, ID uses well defined denoising operators to construct mathematically sound equations.)

y = denoising_recons(x_0, H, T)

Progressions

y_1 = denoising_recons(y, H, T - dT)

y_2 = denoising_recons(y_1, H, T - 2dT)

y_P = denoising_recons(y_{P-1}, H, T_f)

Recipe for using your transform based image denoiser (to justify progressions, think decaying coefficients): …
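The progression above can be sketched as a loop over decreasing thresholds. This is a minimal illustration under stated assumptions, not the author's implementation: the denoiser is plain hard thresholding in an orthonormal DCT basis, the function `denoise_stage` (with its extra `x` and `avail` arguments) is a hypothetical stand-in for `denoising_recons`, and the signal, mask, and threshold schedule are chosen only for the demo.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom."""
    idx = np.arange(n)
    H = np.cos(np.pi * (idx[:, None] + 0.5) * idx[None, :] / n)
    H *= np.sqrt(2.0 / n)
    H[:, 0] /= np.sqrt(2.0)
    return H

def denoise_stage(y, H, T, x, avail, n_iter=1):
    """Hard-threshold at T (H assumed orthonormal), then enforce P0 y = P0 x."""
    for _ in range(n_iter):
        c = H.T @ y
        c[np.abs(c) < T] = 0.0     # coefficients below T: insignificant set
        y = H @ c
        y[avail] = x[avail]        # keep the available samples untouched
    return y

N = 32
H = dct_basis(N)
x = 10.0 * H[:, 3]                 # toy signal, sparse in H
avail = np.ones(N, dtype=bool)
avail[12:16] = False               # four missing samples

y0 = np.where(avail, x, 0.0)       # start from the zero-filled signal
y = y0.copy()
for T in [6.0, 5.0, 4.0, 3.0, 2.0]:    # T, T - dT, ..., T_f
    y = denoise_stage(y, H, T, x, avail)

err_zero_fill = np.linalg.norm(x - y0)
err_id = np.linalg.norm(x - y)
```

Starting with a large threshold removes the "noise" created by the missing region before smaller coefficients are trusted, which is the role the slides assign to progressions.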

Mini Formulation Comparison

No progression ID: $\min_{y} \sum_{i \in I} |h_i^T y|^2$ subject to $P_0 y = P_0 x$

AD: $\min_{S} \operatorname{card}(S)$ subject to $\| P_0 (x - y) \|_2 \le T$

•If H is orthonormal the two formulations come close.

•Important thing is how you determine the sets/sparsity (ID: Robust DSP, AD: sparsest)

•ID uses progressions, progressions change everything!

Simulation Comparison

H: Two times expansive M=2N, real, isotropic, dual-tree, DWT. Real part of:

N. G. Kingsbury, “Complex wavelets for shift invariant analysis and filtering of signals,” Appl. Comput. Harmon. Anal., 10(3):234-253, May 2002.

ID (no layering and no selective thresholding)

AD (l0 → l1): $\min_{c} \|c\|_1$ subject to $\Big\| P_0 \Big( x - \sum_{i=1}^{M} c_i h_i \Big) \Big\|_2 \le T$

D. Donoho, M. Elad, and V. Temlyakov, “Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise”.

Simulation Results

[Figure, first test image: Original; Missing; l1: 21.40 dB; ID: 30.38 dB]

[Figure, second test image: Original; Missing; l1: 23.49 dB; ID: 25.39 dB]

(l1 results are doctored!)

What is wrong with AD?

Problems in l1? Yes and no.

•I will argue that even if we used an “l0 solver”, ID will in general prevail.

•Specific issues with l1.

•How to fix the problems with l1 based AD.

•How to do better.

So let’s assume we can solve the l0 problem ...

Bottom Up (AD) v.s. Top Down (ID)

Prediction as signal construction:

•AD is a builder that tries to accomplish constructions using as few bricks as possible. Requires very good basis.

•ID is a sculptor that removes portions that do not belong in the final construction by using as many carving steps as needed. Requires good denoising.

AD Builder ID Sculptor

Application is not compression! (“Where will the probe hit the meteor?”, “What is the value of S&P500 tomorrow?”)

Significance v.s. Insignificance, The Sherlock Holmes Principle

•Both ID and AD do well with very good basis. But ID can also use unintuitive basis for sophisticated results.

E.g.: ID can use “unsophisticated”, “singularity unfriendly” DCT basis to recover singularities. AD cannot!

Secret: DCTs are not great on singularities but they are very good on everything else!

"How often have I said to you that when you have eliminated the impossible whatever remains, however improbable, must be the truth?"

Sherlock Holmes, in "The Sign of the Four"

[Figure: non-singularities; singularities]

•DCTs are very good at eliminating non-singularities.

•ID is more robust to basis selection compared to AD (secretly violate coherency restrictions).

•You can add to the AD dictionary but solvers won’t be able to handle it.

Sherlock Holmes Principle using overcomplete DCTs for elimination

Predicting missing edge pixels: basis: DCT 16x16

Predicting missing wavelet coefficients over edges: basis: DCT 8x8

Onur G. Guleryuz, “Predicting Wavelet Coefficients Over Edges Using Estimates Based on Nonlinear Approximants,” Proc. Data Compression Conference, IEEE DCC-04, April 2004.

Do not abandon isotropic *lets, use a framework that can extract the most mileage from the chosen basis (“sparsest”).

Progressions

y = denoising_recons(x_0, H, T)

“Annealing” Progressions (think decaying coefficients)

y_P = denoising_recons(y_{P-1}, H, T_f)

basis: DCT 16x16, best threshold

[Figure: type 1 iterations of simple denoising]

Progressions generate up to tens of dBs. If the data was very sparse with respect to H, if we were solving a convex problem, why should progressions matter? Modeling assumptions…

Sparse Modeling Generates Non-Convex Problems

[Figure: a “two pixel” image drawn in pixel coordinates (available pixel, missing pixel) and in transform coordinates; the available pixel constraint on x intersects the transform axes at equally sparse solutions]

More skeptical picture:

How does this affect some “AD solvers”, i.e., l1?

[Figure: geometry of the l1 ball against the available pixel constraint; Case 1, Case 2, Case 3]

Linear/Quadratic program, …, Not sparse!

Case 3: the magic is gone…
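The two-pixel picture can be reproduced numerically. The sketch below assumes the transform is a 2x2 rotation by an angle `theta`, which is an illustrative choice and not something the slides specify. It lists the two 1-sparse solutions that satisfy the available pixel constraint and shows which one the l1 norm prefers.

```python
import numpy as np

def two_pixel_candidates(theta, x0):
    """Both 1-sparse coefficient vectors that reproduce the available pixel x0."""
    H = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])   # assumed rotation basis
    a = H[0]                  # row of H acting on the available pixel
    sols = []
    for i in range(2):
        c = np.zeros(2)
        c[i] = x0 / a[i]      # satisfy the constraint with atom i alone
        sols.append((c, H @ c))
    return sols

sols = two_pixel_candidates(np.pi / 6, 1.0)
l1_norms = [np.abs(c).sum() for c, _ in sols]
best = int(np.argmin(l1_norms))              # solution picked by l1
missing_pixel = [y[1] for _, y in sols]      # the two competing predictions

# At 45 degrees the two candidates tie in l1 norm, so any mix of them is
# also l1-optimal and the minimizer need not be sparse.
tie = two_pixel_candidates(np.pi / 4, 1.0)
tie_norms = [np.abs(c).sum() for c, _ in tie]
```

Both candidates are equally sparse yet predict different missing pixels, so sparsity alone does not settle the answer; l1 breaks the tie by geometry, not by mse.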

You now have to argue:

“Under i.i.d. Laplacian model for the joint probability of expansion coefficients, ... $\max p(c_1, c_2, \ldots, c_M) \Leftrightarrow \min$ l1 norm”

Problems with the l1 norm I

What about all the optimality/sparsest results?

Results such as: D. Donoho et al., “Stable Recovery of Sparse Overcomplete Representations in the Presence of Noise”…

are very impressive, but they are closely tied to H providing the sparsest decomposition for x. Not every problem has this structure.

)( 2212 aNn

Worst case noise robustness results, but overwhelming noise:

$\mathrm{mse}_{\ell_1}(x, y)$ is a function f of two terms: the modeling error and the error due to missing data.

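Problems with the l1 norm II

(problem due to $P_0$)

$$\min_{c} \|c\|_1 \quad \text{subject to} \quad \Big\| P_0 \Big( x - \sum_{i=1}^{M} c_i h_i \Big) \Big\|_2 \le T$$

which is the same constraint as

$$\Big\| P_0 x - \sum_{i=1}^{M} c_i \,(P_0 h_i) \Big\|_2 \le T$$

$H = [\,h_1\ h_2\ \ldots\ h_M\,]$: “nice” basis, “decoherent”

$P_0 H = [\,P_0 h_1\ P_0 h_2\ \ldots\ P_0 h_M\,]$: “not nice” basis (due to masking), may become very “coherent”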

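Example

$$H = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 0 & -2/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \end{bmatrix}$$

H orthonormal, coherency = 0

$$P_0 H \ \text{(nonzero row)} = \begin{bmatrix} 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \end{bmatrix}$$

unnormalized coherency = $1/\sqrt{6}$, normalized coherency = 1 (worst possible)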
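The coherency numbers in this example can be checked directly. The sketch assumes the 3x3 orthonormal H with columns proportional to (1, 1, 1), (1, 0, -1) and (1, -2, 1), that only the first pixel is available, and that coherency means the largest absolute inner product between distinct columns; the function name `coherence` is introduced here for illustration.

```python
import numpy as np

def coherence(B, normalize):
    """Largest |<b_i, b_j>| over distinct columns of B (columns assumed nonzero)."""
    if normalize:
        B = B / np.linalg.norm(B, axis=0)
    G = np.abs(B.T @ B)
    np.fill_diagonal(G, 0.0)      # ignore each column paired with itself
    return G.max()

s2, s3, s6 = np.sqrt([2.0, 3.0, 6.0])
H = np.array([[1 / s3,  1 / s2,  1 / s6],
              [1 / s3,  0.0,    -2 / s6],
              [1 / s3, -1 / s2,  1 / s6]])
P0H = H[:1, :]                    # masking keeps only the first pixel's row

coh_H = coherence(H, normalize=False)
coh_masked = coherence(P0H, normalize=False)
coh_masked_normalized = coherence(P0H, normalize=True)
```

With a single available sample every masked atom is a scalar, so after normalization all atoms point the same way and the dictionary is as coherent as it can be.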

Optimal solution sometimes tries to make coefficients of scaling functions zero.

Possible fix using Progressions

y = l1_recons(x_0, H, T)

y_1 = l1_recons(y, H, T - dT)

y_P = l1_recons(y_{P-1}, H, T_f)

1. $\min \|c\|_1$ subject to …

2. Enforce available data

•If you pick a large T maybe you can pretend the first one is a convex problem.

•This is not an l1 problem! No single l1 solution will generate the final.

•After the first few solutions, you may start hitting l1 issues.

The fix is ID!

y_1 = l1_recons(y, H, T - dT) v.s. y_1 = denoising_recons(y, H, T - dT)

l1_recons: You can do soft thresholding, “block descent”, or

Daubechies, Defrise, De Mol, “An iterative thresholding algorithm for linear inverse problems with a sparsity constraint”,

Figueiredo and Nowak, “An EM Algorithm for Wavelet-Based Image Restoration”.
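For readers who want to try the soft thresholding route, here is a generic iterative soft-thresholding loop for the masked problem. It is a sketch under assumptions: it minimizes the Lagrangian form 0.5 * ||P0 (x - Hc)||^2 + lam * ||c||_1 in place of the constrained form with T, it takes H to be orthonormal so that a unit step is safe, and the names `soft`, `objective` and `ista_masked` are hypothetical.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal DCT-II basis; column k is the k-th cosine atom."""
    idx = np.arange(n)
    H = np.cos(np.pi * (idx[:, None] + 0.5) * idx[None, :] / n)
    H *= np.sqrt(2.0 / n)
    H[:, 0] /= np.sqrt(2.0)
    return H

def soft(c, lam):
    """Soft thresholding: shrink every coefficient toward zero by lam."""
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def objective(c, H, x, avail, lam):
    r = (x - H @ c)[avail]
    return 0.5 * r @ r + lam * np.abs(c).sum()

def ista_masked(x, H, avail, lam, n_iter=200):
    """Iterative soft thresholding on the available samples only."""
    c = np.zeros(H.shape[1])
    for _ in range(n_iter):
        r = np.zeros_like(x)
        r[avail] = (x - H @ c)[avail]    # masked residual P0 (x - Hc)
        c = soft(c + H.T @ r, lam)       # gradient step, then shrinkage
    return c

N = 32
H = dct_basis(N)
x = 10.0 * H[:, 3]
avail = np.ones(N, dtype=bool)
avail[12:16] = False

lam = 0.1
c_hat = ista_masked(x, H, avail, lam)
y_l1 = H @ c_hat                         # l1-based prediction of the full signal
```

A loop like this could play the role of one `l1_recons` call inside a progression; comparing its output with hard-thresholding ID on the same mask is a quick way to test the claims in the bullets that follow.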

>>Experience suggests:

•There are many “denoising” techniques that discover the “true” sparsity.

•Pick the technique that is cross correlation robust.

Conclusion

•Smallest mse not necessarily = sparsest. Somebody putting up really bad estimates may be very sparse (sparser than us) with respect to some basis.

•Good denoisers should be cross correlation robust (hard thresholding tends to beat soft).

•How many iterations you do within each l1_recons() or denoising_recons() is not very important.

•Progressions!

•Will l1 generate sparse results? In the sense of the trivial sparsity theorem, of course! (Sparsity may not be in terms of your intended basis :). Please check the assumptions for your problem!

$$\min_{c} \|c\|_1 \quad \text{subject to} \quad \Big\| P_0 \Big( x - \sum_{i=1}^{M} c_i h_i \Big) \Big\|_2 \le T$$

•To see its limitations, go ahead and solve the real l1 (with or without masking setups, you can even cheat on T) and compare to ID.

The trivial sparsity theorem is true. The prediction problem is all about the basis. ID simply allows the construction of a sophisticated, signal adaptive basis, by starting with a simple dictionary!
