Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Ramnad
Pondicherry | Salem | Erode | Tirunelveli
http://www.elysiumtechnologies.com, [email protected]
ETPL NT-001 Answering “What-If” Deployment and Configuration Questions With WISE: Techniques and Deployment Experience
ETPL NT-002 Complexity Analysis and Algorithm Design for Advance Bandwidth Scheduling in Dedicated Networks
ETPL NT-003 Diffusion Dynamics of Network Technologies With Bounded Rational Users: Aspiration-Based Learning
ETPL NT-004 Delay-Based Network Utility Maximization
ETPL NT-005 A Distributed Control Law for Load Balancing in Content Delivery Networks
ETPL NT-006 Efficient Algorithms for Neighbor Discovery in Wireless Networks
ETPL NT-007 Stochastic Game for Wireless Network Virtualization
ETPL NT-008 ABC: Adaptive Binary Cuttings for Multidimensional Packet Classification
ETPL NT-009 A Utility Maximization Framework for Fair and Efficient Multicasting in Multicarrier Wireless Cellular Networks
ETPL NT-010 Achieving Efficient Flooding by Utilizing Link Correlation in Wireless Sensor Networks
ETPL NT-011 Random Walks and Green's Function on Digraphs: A Framework for Estimating Wireless Transmission Costs
ETPL NT-012 A Flexible Platform for Hardware-Aware Network Experiments and a Case Study on Wireless Network Coding
ETPL NT-013 Exploring the Design Space of Multichannel Peer-to-Peer Live Video Streaming Systems
ETPL NT-014 Secondary Spectrum Trading—Auction-Based Framework for Spectrum Allocation and Profit Sharing
ETPL NT-015 Towards Practical Communication in Byzantine-Resistant DHTs
ETPL NT-016 Semi-Random Backoff: Towards Resource Reservation for Channel Access in Wireless LANs
ETPL NT-017 Entry and Spectrum Sharing Scheme Selection in Femtocell Communications Markets
ETPL NT-018 On Replication Algorithm in P2P VoD
ETPL NT-019 Back-Pressure-Based Packet-by-Packet Adaptive Routing in Communication Networks
ETPL NT-020 Scheduling in a Random Environment: Stability and Asymptotic Optimality
ETPL NT-021 An Empirical Interference Modeling for Link Reliability Assessment in Wireless Networks
ETPL NT-022 On Downlink Capacity of Cellular Data Networks With WLAN/WPAN Relays
ETPL NT-023 Centralized and Distributed Protocols for Tracker-Based Dynamic Swarm Management
ETPL NT-024 Localization of Wireless Sensor Networks in the Wild: Pursuit of Ranging Quality
ETPL NT-025 Control of Wireless Networks With Secrecy
ETPL NT-026 ICTCP: Incast Congestion Control for TCP in Data-Center Networks
ETPL NT-027 Context-Aware Nanoscale Modeling of Multicast Multihop Cellular Networks
ETPL NT-028 Moment-Based Spectral Analysis of Large-Scale Networks Using Local Structural Information
ETPL NT-029 Internet-Scale IPv4 Alias Resolution With MIDAR
ETPL NT-030 Time-Bounded Essential Localization for Wireless Sensor Networks
ETPL NT-031 Stability of FIPP p-Cycles Under Dynamic Traffic in WDM Networks
ETPL NT-032 Cooperative Carrier Signaling: Harmonizing Coexisting WPAN and WLAN Devices
ETPL NT-033 Mobility Increases the Connectivity of Wireless Networks
ETPL NT-034 Topology Control for Effective Interference Cancellation in Multiuser MIMO Networks
ETPL NT-035 Distortion-Aware Scalable Video Streaming to Multinetwork Clients
ETPL NT-036 Combined Optimal Control of Activation and Transmission in Delay-Tolerant Networks
ETPL NT-037 A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless
In inverse synthetic aperture radar (ISAR) imaging, a target is usually regarded as consisting of a few
strong (specular) scatterers, and the distribution of these strong scatterers is sparse in the imaging volume. In
this paper, we propose to incorporate a sparse signal recovery method into a 3D multiple-input multiple-
output (MIMO) radar imaging algorithm. A sequential order one negative exponential (SOONE) function, which forms
a homotopy between the $\ell_1$ and $\ell_0$ norms, is proposed to measure sparsity. Gradient projection is
used to solve a constrained nonconvex SOONE-minimization problem and recover the sparse signal.
However, while the gradient projection method is computationally simple, it is not robust when a matrix in the
algorithm is ill conditioned. We therefore further propose diagonal loading and singular value decomposition
methods to improve the robustness of the algorithm. To handle targets with large flat surfaces,
a combined amplitude and total-variation objective function is also proposed to regularize the shapes of the
flat surfaces. Simulation results show that the proposed gradient projection of the SOONE function method is
better than orthogonal matching pursuit, CoSaMP, $\ell_1$-magic, the Bayesian method with Laplace prior, the
smoothed $\ell_0$ method, and $\ell_1$-$\ell_s$ in high-SNR cases for recovery of $\pm 1$ random-spike
sparse signals. The quality of the simulated 3D images and real-data ISAR images obtained using the
new method is better than that of the conventional correlation method and the minimum
$\ell_2$-norm method, and competitive with the aforementioned sparse signal recovery algorithms.
ETPL DIP-001 MIMO Radar 3D Imaging Based on Combined Amplitude and Total Variation Cost Function With Sequential Order One Negative Exponential Form
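The SOONE homotopy between the $\ell_1$ and $\ell_0$ norms lends itself to a quick numerical illustration. The exact form used in the paper is not reproduced in this summary, so the sketch below assumes the common order-one negative exponential measure G_sigma(x) = sum_i (1 - exp(-|x_i|/sigma)); the function and parameter names are our own.

```python
import numpy as np

def soone(x, sigma):
    """SOONE sparsity measure: sum_i (1 - exp(-|x_i| / sigma)).

    As sigma -> 0 the sum approaches the l0 "norm" (nonzero count);
    for sigma much larger than max|x_i| it behaves like ||x||_1 / sigma,
    which is how it forms a homotopy between the two norms.
    """
    return np.sum(1.0 - np.exp(-np.abs(x) / sigma))

x = np.array([0.0, 0.5, -2.0, 0.0, 1.0])
l0_like = soone(x, 1e-3)         # close to 3.0, the number of nonzeros
l1_like = soone(x, 1e3) * 1e3    # close to 3.5, the l1 norm of x
```

A gradient-projection solver would descend on G_sigma while projecting iterates onto the data-consistency constraint, typically shrinking sigma over iterations to approach the $\ell_0$ objective.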
Capturing aerial imagery at high resolutions often leads to very low frame-rate video streams, well
under full-motion-video standards, due to bandwidth, storage, and cost constraints. Low frame rates
make registration difficult when an aircraft is moving at high speed or when the global positioning system (GPS)
contains large errors or fails. We present a method that takes advantage of persistent cyclic video data
collections to perform online registration with drift correction. We split the persistent aerial imagery
collection into individual cycles of the scene, identify and correct the registration errors on the first cycle in a
batch operation, and then use the corrected base cycle as a reference pass to register and correct subsequent
passes online. A set of multi-view panoramic mosaics is then constructed for each aerial pass for
representation, presentation, and exploitation of the 3D dynamic scene. These sets of mosaics are all in
alignment with the reference cycle, allowing their direct use in change detection, tracking, and 3D
reconstruction/visualization algorithms. Stereo viewing with adaptive baselines and varying view angles is
realized by choosing a pair of mosaics from a set of multi-view mosaics. Further, the mosaics for the second
and later passes can be generated and visualized online, as there is no further batch error correction.
ETPL DIP-002 Persistent Aerial Video Registration and Fast Multi-View Mosaicing
In this paper, we propose FeatureMatch, a generalised approximate nearest-neighbour field (ANNF)
computation framework between a source and a target image. The proposed algorithm can estimate ANNF maps
between any image pair, not necessarily related. This generalisation is achieved through appropriate spatial-
range transforms. To compute ANNF maps, global colour adaptation is applied as a range transform on the
source image. Image patches from the pair of images are approximated using low-dimensional features, which
are used along with a KD-tree to estimate the ANNF map. This ANNF map is further improved based on image
coherency and spatial transforms. The proposed generalisation enables us to handle a wider range of
vision applications that have not been tackled using the ANNF framework. We illustrate two
such applications, namely: 1) optic disk detection and 2) super resolution. The first application deals with
medical imaging, where we locate optic disks in retinal images using a healthy optic disk image as a common
target image. The second application deals with super resolution of synthetic images using a common source
image as a dictionary. We make use of ANNF mappings in both applications and show experimentally that
our proposed approaches are faster and more accurate than the state-of-the-art techniques.
ETPL DIP-003 FeatureMatch: A General ANNF Estimation Technique and its Applications
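The core ANNF step described above (low-dimensional patch features followed by a KD-tree lookup) can be sketched as follows. The block-mean features, patch size, and use of SciPy's `cKDTree` are illustrative assumptions, not the paper's actual feature design.

```python
import numpy as np
from scipy.spatial import cKDTree

def extract_patch_features(img, p=4):
    """Describe every p x p patch by 5 block means (a stand-in for the
    paper's low-dimensional features; this choice is illustrative)."""
    H, W = img.shape
    feats, coords = [], []
    h = p // 2
    for y in range(H - p + 1):
        for x in range(W - p + 1):
            pa = img[y:y + p, x:x + p]
            feats.append([pa.mean(),
                          pa[:h, :h].mean(), pa[:h, h:].mean(),
                          pa[h:, :h].mean(), pa[h:, h:].mean()])
            coords.append((y, x))
    return np.asarray(feats), coords

def annf(src, tgt, p=4):
    """For each source patch, the coordinates of the nearest target
    patch in feature space, found with a KD-tree."""
    fs, cs = extract_patch_features(src, p)
    ft, ct = extract_patch_features(tgt, p)
    _, idx = cKDTree(ft).query(fs)
    return {c: ct[j] for c, j in zip(cs, idx)}

rng = np.random.default_rng(0)
img = rng.random((16, 16))
m = annf(img, img)   # identical images: every patch finds a zero-distance match
```

A real pipeline would add the spatial-range transforms and coherency refinement on top of this raw feature lookup.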
Newly developed hypertext transfer protocol (HTTP)-based video streaming technologies enable
flexible rate adaptation under varying channel conditions. Accurately predicting the users' quality of
experience (QoE) for rate-adaptive HTTP video streams is thus critical to achieving efficiency. An important
aspect of understanding and modeling QoE is predicting the up-to-the-moment subjective quality of a video as
it is played, which is difficult due to hysteresis effects and nonlinearities in human behavioral responses. This
paper presents a Hammerstein-Wiener model for predicting the time-varying subjective quality (TVSQ)
of rate-adaptive videos. To collect data for model parameterization and validation, a database of longer-
duration videos with time-varying distortions was built, and the TVSQs of the videos were measured in a large-
scale subjective study. The proposed method is able to reliably predict the TVSQ of rate-adaptive videos.
Since the Hammerstein-Wiener model has a very simple structure, the proposed method is suitable for online
TVSQ prediction in HTTP-based streaming.
ETPL DIP-004 Modeling the Time-Varying Subjective Quality of HTTP Video Streams With Rate Adaptations
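A Hammerstein-Wiener model is simply a static input nonlinearity, a linear dynamic block, and a static output nonlinearity in cascade. The sketch below shows that structure on a toy bitrate trace; the specific nonlinearities, filter taps, and rating scale are hypothetical placeholders, not the fitted model from the study.

```python
import numpy as np

def hammerstein_wiener(u, taps, f_in, f_out):
    """Hammerstein-Wiener cascade: static nonlinearity -> linear FIR
    filter (the dynamic/hysteresis part) -> static nonlinearity."""
    x = f_in(u)                         # input nonlinearity
    v = np.convolve(x, taps)[:len(u)]   # causal FIR dynamics
    return f_out(v)                     # output nonlinearity

# Hypothetical parameterization for a toy bitrate trace (Mbit/s):
bitrate = np.array([1.0, 4.0, 4.0, 0.5, 0.5, 4.0])
taps = np.array([0.5, 0.3, 0.2])        # ~3 samples of quality "memory"
q = hammerstein_wiener(bitrate, taps,
                       f_in=np.log1p,   # diminishing returns in rate
                       f_out=lambda v: np.clip(5 * v / np.log1p(4.0), 1, 5))
```

The FIR block is what lets the model capture hysteresis: a quality drop keeps depressing predicted TVSQ for a few samples after the bitrate recovers.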
We present a noniterative multiresolution motion estimation strategy involving block-based
comparisons in each detail band of a Laplacian pyramid. A novel matching score is developed and
analyzed. The proposed matching score is based on a class of nonlinear transformations of Laplacian detail
bands, yielding 1-bit or 2-bit representations. The matching score is evaluated in a dense full-
search motion estimation setting, with synthetic video frames and an optical flow data set. Together with a
strategy for combining the matching scores across resolutions, the proposed method is shown to produce
smoother and more robust estimates than mean square error (MSE) in each detail band and combined. It
tolerates more nontranslational motion, such as rotation, validating the analysis, while providing much
better localization of motion discontinuities. We also provide an efficient implementation of
the motion estimation strategy and show that its computational complexity is close to that of the traditional
MSE block-based full-search motion estimation procedure.
ETPL DIP-005 Nonlinear Transform for Robust Dense Block-Based Motion Estimation
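The 1-bit matching idea above can be sketched as a sign-of-Laplacian transform whose blocks are compared by Hamming distance under a full search. This is a simplified stand-in for the paper's family of nonlinear transforms; the block size, search range, and Laplacian stencil are our choices.

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian (zero at the borders)."""
    out = np.zeros_like(img, dtype=float)
    out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] -
                       4 * img[1:-1, 1:-1])
    return out

def onebit(img):
    """1-bit nonlinear transform: sign of the Laplacian detail band."""
    return laplacian(img) > 0

def match_block(prev_bits, cur_bits, y, x, b=8, r=4):
    """Full-search block matching under a Hamming-distance score."""
    block = cur_bits[y:y + b, x:x + b]
    best, best_mv = b * b + 1, (0, 0)
    H, W = prev_bits.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and yy + b <= H and 0 <= xx and xx + b <= W:
                score = np.count_nonzero(prev_bits[yy:yy + b, xx:xx + b] ^ block)
                if score < best:
                    best, best_mv = score, (dy, dx)
    return best_mv

rng = np.random.default_rng(1)
frame0 = rng.random((32, 32))
frame1 = np.roll(frame0, (2, 3), axis=(0, 1))   # known shift of (2, 3)
mv = match_block(onebit(frame0), onebit(frame1), 12, 12)
```

The 1-bit codes make each candidate comparison a 64-bit XOR-and-popcount, which is where the efficiency relative to full MSE matching comes from.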
Photo cropping is a widely used tool in the printing industry, photography, and cinematography.
Conventional cropping models suffer from three challenges. First, they deemphasize semantic
content, which is often far more important than low-level features in photo aesthetics. Second, the
existing models lack a sequential ordering; in contrast, humans look at semantically important regions
sequentially when viewing a photo. Third, it is difficult to leverage inputs from multiple users, although
experience from multiple users is particularly critical in cropping since photo assessment is quite a
subjective task. To address these challenges, this paper proposes semantics-aware photo cropping,
which crops a photo by simulating the process of humans sequentially perceiving its semantically
important regions. We first project the local features (graphlets in this paper) onto the semantic
space, which is constructed based on the category information of the training photos. An efficient
learning algorithm is then derived to sequentially select semantically representative graphlets of a
photo, and the selection process can be interpreted as a path, which simulates humans actively
perceiving semantics in a photo. Furthermore, we learn a prior distribution of such active graphlet
paths from training photos that are marked as aesthetically pleasing by multiple users. The learned
priors enforce the active graphlet path of a test photo to be maximally similar to those from the
training photos. Experimental results show that: 1) the active graphlet path accurately predicts human
gaze shifting, and thus is more indicative of photo aesthetics than conventional saliency maps; and
2) the cropped photos produced by our approach outperform those of its competitors in both
qualitative and quantitative comparisons.
ETPL DIP-006 Actively Learning Human Gaze Shifting Paths for Semantics-Aware Photo Cropping
In the framework of texture image retrieval, a new family of stochastic multivariate models is
proposed based on Gaussian copulas and wavelet decompositions. We take advantage of the copula paradigm,
which makes it possible to separate dependence structure from marginal behavior. We introduce two
new multivariate models using, respectively, generalized Gaussian and Weibull densities.
These models capture both the subband marginal distributions and the correlation
between wavelet coefficients. We derive, as a similarity measure, a closed-form expression for the Jeffrey
divergence between Gaussian copula-based multivariate models. Experimental results on well-known
databases show significant improvements in retrieval rates using the proposed method compared with the best
known state-of-the-art approaches.
ETPL DIP-007 Gaussian Copula Multivariate Modeling for Texture Image Retrieval Using Wavelet Transforms
Visual features are successfully exploited in several applications (e.g., visual search, object
recognition, and tracking) due to their ability to efficiently represent image content. Several visual analysis
tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques
that reduce the required bit budget while attaining a target level of efficiency. In this paper, we propose, for the
first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences.
To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe
and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion
optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-
compress (ATC) paradigm in the context of visual sensor networks. That is, sets
of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central
controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA)
paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for
further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted
to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and
tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the
proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.
ETPL DIP-008 Coding Visual Features Extracted From Video Sequences
In this paper, a novel fuzzy rule-based prediction framework is developed for high-quality
image zooming. In classical interpolation-based image zooming, resolution is increased by inserting
pixels using certain interpolation techniques. Here, we propose a patch-based image zooming
technique in which each low-resolution (LR) image patch is replaced by an estimated high-
resolution (HR) patch. Since an LR patch can be generated from any of many possible HR
patches, it is natural to develop rules that find different possible HR patches and then
combine them according to rule strength to obtain the estimated HR patch. We generate a large
number of LR-HR patch pairs from a collection of natural images, group them into different clusters,
and then generate a fuzzy rule for each of these clusters. The rule parameters are also learned from
these LR-HR patch pairs. As a result, an efficient mapping from the LR patch space to the HR patch
space can be formulated. The performance of the proposed method is tested on different images and
compared with representative as well as state-of-the-art image zooming techniques.
Experimental results show that the proposed method is better than the competing methods and is
capable of efficiently reconstructing thin lines, edges, fine details, and textures in the image.
ETPL DIP-009 A Fuzzy-Rule-Based Approach for Single Frame Super Resolution
In this paper, we propose a novel approach for integrating multiple tracking cues within a
unified probabilistic graph-based Markov random fields (MRFs) representation. We show how to integrate
temporal and spatial cues encoded by unary and pairwise probabilistic potentials. As the inference of such
high-order MRF models is known to be NP-hard, we propose an efficient spectral relaxation-based inference
scheme. The proposed scheme is exemplified by applying it to a mixture of five tracking cues, and is shown to
be applicable to wider sets of cues. This paves the way for a modular plug-and-play tracking framework that
can be easily adapted to diverse tracking scenarios. The proposed scheme is experimentally shown to compare
favorably with contemporary state-of-the-art schemes, and provides accurate tracking results.
ETPL DIP-010 A Probabilistic Graph-Based Framework for Plug-and-Play Multi-Cue Visual Tracking
This paper deals with fast and accurate visualization of pushbroom image data from airborne and
spaceborne platforms. A pushbroom sensor acquires images in a line-scanning fashion, and this results
in scattered input data that need to be resampled onto a uniform grid for geometrically correct visualization. To
this end, we model the anisotropic spatial dependence structure caused by the acquisition process. Several
methods for scattered data interpolation are then adapted to handle the induced anisotropic metric and compared
for the pushbroom image rectification problem. A trick that exploits the semiordered line structure
of pushbroom data to reduce the computational cost by several orders of magnitude is also presented.
ETPL DIP-011 Anisotropic Scattered Data Interpolation for Pushbroom Image Rectification
Two model-based algorithms for edge detection in spectral imagery are developed that specifically
target capturing intrinsic features such as isoluminant edges, which are characterized by a jump in color but not
in intensity. Given prior knowledge of the classes of reflectance or emittance spectra associated with candidate
objects in a scene, a small set of spectral-band ratios, which most profoundly identify the edge between each
pair of materials, is selected to define an edge signature. The bands that form the edge signature are fed into a
spatial mask, producing a sparse joint spatiospectral nonlinear operator. The first algorithm
achieves edge detection for every material pair by matching the response of the operator at every pixel with
the edge signature for the pair of materials. The second algorithm is a classifier-enhanced extension of the first
that adaptively accentuates distinctive features before applying the spatiospectral operator. Both
algorithms are extensively verified using spectral imagery from an airborne hyperspectral imager and from a
dots-in-a-well midinfrared imager. In both cases, the multicolor gradient (MCG) and the hyperspectral/spatial
detection of edges (HySPADE) edge detectors are used as benchmarks for comparison. The results
demonstrate that the proposed algorithms outperform the MCG and HySPADE edge detectors in accuracy,
especially when isoluminant edges are present. By requiring only a few bands as input to
the spatiospectral operator, the algorithms enable significant levels of data compression in band selection. In
the presented examples, the required operations per pixel are reduced by a factor of 71 with respect to those
required by the MCG edge detector.
ETPL DIP-012 Model-Based Edge Detector for Spectral Imagery Using Sparse Spatiospectral Masks
Disparity estimation is a fundamental task in stereo imaging and a well-studied problem. Recently,
methods have been adapted to the video domain, where motion is used as a matching criterion to help
disambiguate spatially similar candidates. In this paper, we analyze the validity of the underlying assumptions
of spatio-temporal disparity estimation and determine the extent to which motion aids the matching process.
By analyzing the error signal for spatio-temporal block matching under the sum-of-squared-differences criterion
and treating motion as a stochastic process, we determine the probability of a false match as a function of
image features, motion distribution, image noise, and the number of frames in the spatio-temporal patch. This
performance quantification provides insight into when spatio-temporal matching is most beneficial in terms of
the scene and motion, and can be used as a guide for selecting parameters of stereo matching algorithms. We
validate our results through simulation and experiments on stereo video.
ETPL DIP-013 Discriminability Limits in Spatio-Temporal Stereo Block Matching
We propose a novel representation for stereo videos, namely 2D-plus-depth-cue. This representation
encodes stereo videos compactly by leveraging the by-product of a stereo video conversion process.
Specifically, the depth cues are derived from an interactive labeling process during 2D-to-
stereo video conversion: they are contour points of image regions, their corresponding depth models, and so
forth. Using such cues and the image features of the 2D video frames, the scene depth can be reliably recovered.
Experimental results demonstrate that about 10%-50% of the bit rate can be saved in coding
a stereo video compared with multiview video coding and 2D-plus-depth methods. In addition, since the
objects are segmented during the conversion process, it is convenient to adopt region-of-interest (ROI) coding
in the proposed stereo video coding system. Experimental results show that using ROI coding, the bit rate is
reduced by 30%-40%, or the video quality is increased by 1.5-4 dB at a fixed bit rate.
ETPL DIP-014 A Compact Representation for Compressing Converted Stereo Videos
In this paper, we propose a robust object tracking algorithm based on
a sparse collaborative model that exploits both holistic templates and local representations to account for
drastic appearance changes. Within the proposed collaborative appearance model, we develop
a sparse discriminative classifier (SDC) and a sparse generative model (SGM) for object tracking. In the SDC
module, we present a classifier that separates the foreground object from the background based on holistic
templates. In the SGM module, we propose a histogram-based method that takes the spatial information of
each local patch into consideration. The update scheme considers both the most recent observations and the
original templates, thereby enabling the proposed algorithm to deal with appearance changes effectively and
alleviate the tracking drift problem. Numerous experiments on various challenging videos demonstrate that the
proposed tracker performs favorably against several state-of-the-art algorithms.
ETPL DIP-015 Robust Object Tracking via Sparse Collaborative Appearance Model
We propose a new set of moment invariants based on Krawtchouk polynomials
for the comparison of local patches in 2D images. Being computed from discrete functions, these moments do
not carry discretization error. Unlike many orthogonal moments, which usually capture global features,
Krawtchouk moments can be used to compute local descriptors from a region of interest in an image. This can
be achieved by changing two parameters, thereby shifting the center of the interest region horizontally,
vertically, or both. This property enables comparison of two arbitrary local regions. We show that
Krawtchouk moments can be written as a linear combination of geometric moments, and so are easily converted
to rotation-, size-, and position-independent invariants. We also construct local Hu-
based invariants by applying Hu invariants to images localized by the weight function given in the
definition of Krawtchouk polynomials. We give the formulation of the local Krawtchouk-based and Hu-
based invariants and evaluate their discriminative performance on local comparison of artificially generated
test images.
ETPL DIP-016 Comparison of Image Patches Using Local Moment Invariants
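The Hu-invariant idea above rests on ordinary geometric and central moments. The sketch below computes the first two Hu invariants of an image and checks their translation invariance; localization by the Krawtchouk weight function is omitted for brevity, so this is only the classical building block.

```python
import numpy as np

def central_moments(img, order=2):
    """Central geometric moments mu_pq up to the given order."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]].astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    return {(p, q): (((x - xc) ** p) * ((y - yc) ** q) * img).sum()
            for p in range(order + 1) for q in range(order + 1)}

def hu_first_two(img):
    """First two Hu invariants from scale-normalized central moments."""
    mu = central_moments(img)
    eta = lambda p, q: mu[(p, q)] / mu[(0, 0)] ** (1 + (p + q) / 2)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return np.array([n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2])

rng = np.random.default_rng(2)
patch = rng.random((8, 8))
a = np.zeros((24, 24)); a[2:10, 3:11] = patch     # patch at one position
b = np.zeros((24, 24)); b[12:20, 9:17] = patch    # same patch, translated
```

Because the moments are taken about the intensity centroid and normalized by m00, the two placements of the patch yield identical invariants.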
We present a novel scale-invariant image feature detection algorithm (D-SIFER) using a newly
proposed scale-space-optimal 10th-order Gaussian derivative (GDO-10) filter, which reaches the jointly
optimal Heisenberg uncertainty of its impulse response in scale and space simultaneously (i.e., we minimize
the maximum of the two moments). The D-SIFER algorithm using this filter leads to an outstanding quality
of image feature detection, with a factor-of-three quality improvement over the state-of-the-art scale-
invariant feature transform (SIFT) and speeded-up robust features (SURF) methods, which use second-order
Gaussian derivative filters. To reach low computational complexity, we also present a technique approximating
the GDO-10 filters with a fixed-length implementation that is independent of the scale. The final
approximation error remains far below the noise margin, providing constant-time, low-cost, but nevertheless
high-quality feature detection and registration capabilities. D-SIFER is validated on a real-life
hyperspectral image registration application, precisely aligning up to hundreds of successive narrowband
color images despite the strong artifacts (blurring, low-light noise) typically occurring in such delicate
optical system setups.
ETPL DIP-017 Derivative-Based Scale Invariant Image Feature Detector With Error Resilience
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) of the kidneys requires proper
motion correction and segmentation to enable estimation of the glomerular filtration rate through
pharmacokinetic modeling. Traditionally, co-registration, segmentation, and pharmacokinetic modeling have
been applied sequentially as separate processing steps. In this paper, a combined 4D model for
simultaneous registration and segmentation of the whole kidney is presented. To demonstrate the model in
numerical experiments, we used normalized gradients as the data term in the registration and, for supervised
segmentation, a Mahalanobis distance from the time courses of the segmented regions to a training set. By
applying this framework to an input consisting of 4D image time series, we conduct simultaneous motion
correction and two-region segmentation into kidney and background. The potential of the new approach is
demonstrated on real DCE-MRI data from ten healthy volunteers.
ETPL DIP-018 Segmentation-Driven Image Registration: Application to 4D DCE-MRI Recordings of the Moving Kidneys
We present a sequential framework for change detection. This framework allows us to use
multiple images from reference and mission passes of a scene of interest in order to
improve detection performance. It includes a change statistic that is easily updated when additional data
become available. Detection performance using this statistic is predictable when the reference and image data
are drawn from known distributions. We verify our performance prediction by simulation. Additionally, we
show that detection performance improves with additional measurements on a set of synthetic aperture
radar images and a set of visible images with unknown probability distributions.
ETPL DIP-019 A Sequential Framework for Image Change Detection
We introduce a family of novel image regularization penalties
called generalized higher degree total variation (HDTV). These penalties further extend our previously
introduced HDTV penalties, which generalize the popular total variation (TV) penalty to
incorporate higher-degree image derivatives. We show that many of the proposed second-degree extensions of
TV are special cases of, or are closely approximated by, a generalized HDTV penalty. Additionally, we propose
a novel fast alternating minimization algorithm for solving image recovery problems
with HDTV and generalized HDTV regularization. The new algorithm enjoys a tenfold speedup compared
with the iteratively reweighted majorize-minimize algorithm proposed in a previous paper. Numerical
experiments on 3D magnetic resonance images and 3D microscopy images show
that HDTV and generalized HDTV improve image quality significantly compared with TV.
ETPL DIP-020 Generalized Higher Degree Total Variation (HDTV) Regularization
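As background for the HDTV penalties, the first-degree TV baseline they generalize can be minimized directly. The sketch below runs plain gradient descent on a smoothed isotropic TV denoising objective; the smoothing parameter, step size, and regularization weight are our choices, and the paper's actual algorithm (fast alternating minimization with higher-degree derivatives) is considerably more elaborate.

```python
import numpy as np

def tv_energy(u, f, lam, eps=0.1):
    """0.5 * ||u - f||^2 + lam * sum sqrt(|grad u|^2 + eps^2)."""
    ux = np.diff(u, axis=1, append=u[:, -1:])   # forward differences,
    uy = np.diff(u, axis=0, append=u[-1:, :])   # zero at the far border
    return 0.5 * ((u - f) ** 2).sum() + lam * np.sqrt(ux**2 + uy**2 + eps**2).sum()

def tv_denoise(f, lam=0.15, steps=100, tau=0.05, eps=0.1):
    """Gradient descent on the smoothed TV objective above."""
    u = f.copy()
    for _ in range(steps):
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + eps**2)
        px, py = ux / mag, uy / mag
        dx = px.copy(); dx[:, 1:] -= px[:, :-1]   # negative adjoint of the
        dy = py.copy(); dy[1:, :] -= py[:-1, :]   # forward-difference gradient
        u -= tau * ((u - f) - lam * (dx + dy))
    return u

rng = np.random.default_rng(3)
f = np.zeros((16, 16)); f[:, 8:] = 1.0            # step edge
f += 0.1 * rng.standard_normal(f.shape)           # plus noise
u = tv_denoise(f)                                 # smoother, edge-preserving
```

With the step size below 1/L for the objective's Lipschitz constant, each iteration decreases the energy, which is the property HDTV methods preserve while penalizing higher-degree derivatives to avoid TV's staircasing.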
In biometrics research and industry, it is critical yet challenging to match infrared face images to
optical face images. The major difficulty lies in the great discrepancy between an infrared face image and the
corresponding optical face image, because they are captured by different devices (an optical imaging device
and an infrared imaging device). This paper presents a new approach called common feature discriminant
analysis to reduce this discrepancy and improve optical-infrared face recognition performance. In this
approach, a new learning-based face descriptor is first proposed to extract the common features from
heterogeneous face images (infrared face images and optical face images), and an effective matching method is
then applied to the resulting features to obtain the final decision. Extensive experiments are conducted on two
large and challenging optical-infrared face data sets to show the superiority of our approach over the state-of-
the-art.
ETPL
DIP - 022
Common Feature Discriminant Analysis for Matching Infrared Face Images to
Optical Face Images
In this paper, a probability-based rendering (PBR) method is described for reconstructing an intermediate view
with a steady-state matching probability (SSMP) density function. Conventionally, given multiple reference
images, the intermediate view is synthesized via the depth image-based rendering technique in which
geometric information (e.g., depth) is explicitly leveraged, thus leading to serious rendering artifacts on the
synthesized view even with small depth errors. We address this problem by formulating the rendering process
as an image fusion in which the textures of all probable matching points are adaptively blended with the SSMP
representing the likelihood that points among the input reference images are matched. The PBR hence
becomes more robust against depth estimation errors than existing view synthesis approaches. The MP in the
steady-state, SSMP, is inferred for each pixel via the random walk with restart (RWR). The RWR always
guarantees visually consistent MP, as opposed to conventional optimization schemes (e.g., diffusion or
filtering-based approaches), the accuracy of which heavily depends on parameters used. Experimental results
demonstrate the superiority of the PBR over the existing view synthesis approaches both qualitatively and
quantitatively. In particular, the PBR is effective in suppressing flicker artifacts in virtual video rendering,
even though no temporal aspect is considered. Moreover, it is shown that the depth map itself calculated from our
RWR-based method (by simply choosing the most probable matching point) is also comparable with that of
the state-of-the-art local stereo matching methods.
ETPL
DIP - 021
Probability-Based Rendering for View Synthesis
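The steady-state matching probability above rests on the random walk with restart (RWR), whose fixed point satisfies p = (1-c) W p + c r. The sketch below is a generic RWR solver iterated to that fixed point on a toy graph, not the paper's pixel-wise formulation; the transition matrix `W`, restart vector `r`, and restart probability `c` are illustrative.

```python
import numpy as np

def rwr_steady_state(W, restart, c=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p <- (1-c) W p + c r to a fixed point.

    W must be column-stochastic (each column sums to 1); `restart` is the
    restart distribution r. The fixed point is the steady-state probability,
    which also equals c (I - (1-c) W)^{-1} r in closed form.
    """
    p = restart.copy()
    for _ in range(max_iter):
        p_next = (1.0 - c) * W @ p + c * restart
        if np.linalg.norm(p_next - p, 1) < tol:
            return p_next
        p = p_next
    return p

# Tiny 3-node chain: column-stochastic transition matrix
W = np.array([[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
r = np.array([1.0, 0.0, 0.0])  # restart at node 0
p = rwr_steady_state(W, r)
```

Because the update is a contraction with factor (1-c), the iteration always converges, which is what makes the steady-state MP well defined independently of tuning parameters.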
Objective measures to automatically predict the perceptual quality of images or videos can reduce the
time and cost requirements of end-to-end quality monitoring. For reliable quality predictions,
these objective quality measures need to respond consistently with the behavior of the human
visual system (HVS). In practice, many important HVS mechanisms are too complex to be modeled directly.
Instead, they can be mimicked by machine learning systems, trained on subjective quality assessment
databases, and applied on predefined objective quality measures for specific content or distortion classes. On
the downside, machine learning systems are often difficult to interpret and may even contradict the
input objective quality measures, leading to unreliable quality predictions. To address this problem, we
developed an interpretable machine learning system for objective quality assessment, namely
the locally adaptive fusion (LAF). This paper describes the LAF system and compares its performance with
traditional machine learning. As it turns out, the LAF system is more consistent with the input measures and
can better handle heteroscedastic training data.
ETPL
DIP - 023
A Locally Adaptive System for the Fusion of Objective Quality Measures
Natural image statistics plays an important role in image denoising, and various natural image priors,
including gradient-based, sparse representation-based, and nonlocal self-similarity-based ones, have been
widely studied and exploited for noise removal. In spite of the great success of many denoising algorithms,
they tend to smooth the fine-scale image textures when removing noise, degrading the image visual quality. To
address this problem, in this paper, we propose a texture-enhanced image denoising method by enforcing
the gradient histogram of the denoised image to be close to a reference gradient histogram of the
original image. Given the reference gradient histogram, a novel gradient histogram preservation (GHP)
algorithm is developed to enhance the texture structures while removing noise. Two region-based variants of
GHP are proposed for the denoising of images consisting of regions with different textures. An algorithm is
also developed to effectively estimate the reference gradient histogram from the noisy observation of the
unknown image. Our experimental results demonstrate that the proposed GHP algorithm can well preserve
the texture appearance in the denoised images, making them look more natural.
ETPL
DIP - 024
Gradient Histogram Estimation and Preservation for Texture Enhanced Image
Denoising
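The core idea of pushing a denoised image's gradient histogram toward a reference can be illustrated with plain histogram specification applied to gradient magnitudes. This is a generic sketch, not the paper's GHP algorithm; the function and variable names are illustrative assumptions.

```python
import numpy as np

def match_histogram(values, reference):
    """Monotone histogram specification: remap `values` so their empirical
    distribution matches that of `reference` (both 1-D arrays)."""
    order = np.argsort(values)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(values))
    q = (ranks + 0.5) / len(values)       # quantile of each value in its own distribution
    return np.quantile(reference, q)      # map to the same quantile of the reference

# Toy example: an over-smoothed gradient-magnitude distribution is stretched
# back toward a texture-rich reference distribution.
rng = np.random.default_rng(0)
smooth_grads = np.abs(rng.normal(0.0, 0.2, 10000))     # over-smoothed denoised result
reference_grads = np.abs(rng.normal(0.0, 1.0, 10000))  # estimated reference histogram
restored = match_histogram(smooth_grads, reference_grads)
```

The remapping is monotone, so relative edge strengths are preserved while the overall gradient statistics (and thus perceived texture) follow the reference.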
In this paper, we investigate the impact of spatial, temporal, and amplitude resolution on
the perceptual quality of a compressed video. Subjective quality tests were carried out on a mobile device with
a total of 189 processed video sequences derived from 10 source sequences. Subjective data reveal
that the impact of spatial resolution (SR), temporal resolution (TR), and quantization step size (QS) can each be
captured by a function with a single content-dependent parameter, which indicates the decay rate of
the quality with each resolution factor. The joint impact of SR, TR, and QS can be accurately modeled by the
product of these three functions with only three parameters. The impact of SR and QS on the quality is
independent of that of TR, but there are significant interactions between SR and QS. Furthermore,
the model parameters can be predicted accurately from a few content features derived from the original video.
The proposed model correlates well with the subjective ratings, with a Pearson correlation coefficient of 0.985
when the model parameters are predicted from content features. The quality model is further validated on six
other subjective rating data sets with very high accuracy and outperforms several well-known quality models.
ETPL
DIP - 025
Q-STAR: A Perceptual Video Quality Model Considering Impact of Spatial,
Temporal, and Amplitude Resolutions
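The product-form model described above can be sketched generically: three normalized factors, each governed by one content-dependent decay-rate parameter, multiplied together. The exponential forms and parameter names below are assumptions for illustration, not the paper's fitted functions.

```python
import numpy as np

def qstar_like(sr, tr, qs, sr_max, tr_max, qs_min, a_s=1.0, a_t=1.0, a_q=1.0):
    """Illustrative product-form quality model: three normalized decay
    functions, one content-dependent rate parameter each (a_s, a_t, a_q).
    Each factor equals 1 at the best setting and decays as that resolution
    factor degrades; the overall quality is their product."""
    f_s = np.exp(-a_s * (1.0 - sr / sr_max))   # spatial-resolution factor
    f_t = np.exp(-a_t * (1.0 - tr / tr_max))   # temporal-resolution factor
    f_q = np.exp(-a_q * (qs / qs_min - 1.0))   # quantization-step factor
    return f_s * f_t * f_q

# Full resolution and the finest quantization step give quality 1.0
q_best = qstar_like(1080, 30, 16, sr_max=1080, tr_max=30, qs_min=16)
```

The appeal of the product form is that each factor can be fitted separately, yet the joint prediction needs only three parameters per content.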
In Part 1 of this two-part study, we present a method
of imaging and velocity estimation of ground moving targets using passive synthetic aperture radar. Such a
system uses a network of small, mobile receivers that collect waves scattered from transmitters of
opportunity, such as commercial television, radio, and cell phone towers. Therefore, passive imaging systems
have significant cost, manufacturing, and stealth advantages over active systems. We describe a novel
generalized Radon transform-type forward model and a corresponding filtered-backprojection-
type image formation and velocity estimation method. We form a stack of position images over a range of
hypothesized velocities, and show that the targets can be reconstructed at the correct position whenever the
hypothesized velocity equals the true velocity of the targets. We then use entropy to determine the most
accurate velocity and image pair for each moving target. We present extensive numerical simulations to verify
the reconstruction method. Our method does not require a priori knowledge of transmitter locations and
transmitted waveforms. It can determine the location and velocity of multiple targets moving at
different velocities. Furthermore, it can accommodate arbitrary imaging geometries. In Part 2, we present the
resolution analysis and the analysis of positioning errors in passive SAR images due to
erroneous velocity estimation.
ETPL
DIP - 026
Passive Synthetic Aperture Hitchhiker Imaging of Ground Moving Targets—
Part 1: Image Formation and Velocity Estimation
We present double random projection methods for the reconstruction of imaging data. The methods draw
upon recent results in the random projection literature, particularly on low-rank matrix approximations. The
reconstruction algorithm has only two simple, noniterative steps, while the reconstruction error is close
to the error of the optimal low-rank approximation obtained by the truncated singular value decomposition. We
extend the often-required symmetric distributions of entries in a random projection matrix to asymmetric
distributions, which can be more easily implemented on imaging devices. Experimental results are provided
on the subsampling of natural images and hyperspectral images, and on simulated compressible matrices.
Comparisons with other random projection methods are also provided.
ETPL
DIP - 027
Image Reconstruction From Double Random Projection
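A two-step, noniterative reconstruction from two independent random sketches can be sketched in the generalized-Nyström style below. The exact construction in the paper may differ; the matrix sizes, sketch dimensions, and variable names here are illustrative, and the data matrix is exactly low-rank so the reconstruction is exact.

```python
import numpy as np

rng = np.random.default_rng(1)

# Exactly low-rank ground truth: rank-5 matrix standing in for imaging data
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150))

# Two independent random projections (sketches), sizes modestly above the rank
k = 15
Omega = rng.standard_normal((150, k))   # right sketch matrix
Psi = rng.standard_normal((k, 200))     # left sketch matrix

Y = A @ Omega            # range sketch
Z = Psi @ A              # co-range sketch
core = Psi @ A @ Omega   # small k-by-k core matrix

# Step 1: pseudoinvert the small core; Step 2: multiply the sketches back.
A_hat = Y @ np.linalg.pinv(core) @ Z
```

For an exactly rank-r matrix with Gaussian sketches of size k >= r, this recovers A exactly (with probability one); for compressible matrices, the error stays close to that of the truncated SVD, which matches the guarantee quoted in the abstract.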
Incorporating image classification into an image retrieval system brings many attractive advantages. For
instance, the search space can be narrowed down by rejecting images in categories irrelevant to the query, and
the retrieved images can be made more consistent in semantics by indexing and returning images in the relevant
categories together. However, due to their different goals regarding recognition accuracy and retrieval
scalability, it is hard to efficiently incorporate most image classification works into large-scale image search.
To study this problem, we propose cascade category-aware visual search, which utilizes a weak category clue
to achieve better retrieval accuracy, efficiency, and memory consumption. To capture the category and visual
clues of an image, we first learn category-visual words, which are discriminative and repeatable local features
labeled with categories. By identifying category-visual words in database images, we are able to discard noisy
local features and extract image visual and category clues, which are then recorded in a hierarchical index
structure. Our retrieval system narrows down the search space by: 1) filtering the noisy local features in the
query; 2) rejecting irrelevant categories in the database; and 3) performing discriminative visual search in
relevant categories. The proposed algorithm is tested on object search, landmark search, and large-scale similar
image search on the large-scale LSVRC10 data set. Although the category clue introduced is weak, our
algorithm still shows substantial advantages in retrieval accuracy, efficiency, and memory consumption over
the state-of-the-art.
ETPL
DIP - 028
Cascade Category-Aware Visual Search
This paper develops a distributed dictionary learning algorithm for sparse representation of data distributed
across the nodes of sensor networks, a setting in which data may be sensitive or private, no fusion center may
exist, or big data applications may arise. The main contributions of this paper are: 1) we decouple
the combined dictionary atom update and nonzero coefficient revision procedure into a two-stage operation to
facilitate distributed computations, first updating the dictionary atom via the eigenvalue decomposition
of the sum of the residual (correlation) matrices across the nodes, then implementing a local projection
operation to obtain the related representation coefficients for each node; 2) we cast the aforementioned atom
update problem as a set of decentralized optimization subproblems with consensus constraints, then
simplify the multiplier update for symmetric undirected graphs in sensor networks and minimize the
separable subproblems to attain consistent estimates iteratively; and 3) since dictionary atoms are typically
constrained to be of unit norm to avoid the scaling ambiguity, we efficiently solve the resultant
hidden convex subproblems by determining the optimal Lagrange multiplier. Experiments show that the
proposed algorithm is a viable alternative distributed dictionary learning approach and is suitable for
the sensor network environment.
ETPL
DIP - 029
Distributed Dictionary Learning for Sparse Representation in Sensor Networks
A method is proposed for fully restoring the local image structures of an unknown continuous-tone patch
from an input halftoned patch with homogeneously distributed dot patterns, based on a
locally learned dictionary pair obtained via feature clustering. First, many training sets consisting of paired
halftone and continuous-tone patches are collected, and histogram-of-oriented-gradient (HOG) feature vectors
that describe the edge orientations are calculated from every continuous-tone patch to group the training sets.
Next, a dictionary learning algorithm is conducted separately on the categorized training sets to obtain
halftone and continuous-tone dictionary pairs optimized for edge-oriented patch representation. Finally, an
adaptive smoothing filter is applied to the input halftone patch to predict the HOG feature vector of the
unknown continuous-tone patch and to select one of the previously learned dictionary pairs, based on the
Euclidean distance between the HOG mean feature vectors of the grouped training sets and the predicted HOG
vector. In addition to using the local dictionary pairs, a patch fusion technique is used to reduce artifacts
such as color noise and overemphasized edges in smooth regions. Experimental results show that the use of the
paired dictionary selected by the local edge orientation, together with the patch fusion technique, not only
reduced the artifacts in smooth regions, but also provided well-expressed fine details and outlines, especially
in areas of textures, lines, and regular patterns.
ETPL
DIP - 030
Local Learned Dictionaries Optimized to Edge Orientation for Inverse
Halftoning
Effective characterization of texture images requires exploiting multiple visual cues from the image
appearance. The local binary pattern (LBP) and its variants achieve great success in texture description.
However, because the LBP(-like) feature is an index of discrete patterns rather than a numerical feature, it is
difficult to combine the LBP(-like) feature with other discriminative ones in a compact descriptor. To
overcome the problem derived from the nonnumerical constraint of the LBP, this paper proposes a numerical
variant accordingly, named the LBP difference (LBPD). The LBPD characterizes the extent to which
one LBP varies from the average local structure of an image region of interest. It is simple, rotation invariant,
and computationally efficient. To achieve enhanced performance, we combine the LBPD with other
discriminative cues via a covariance matrix. The proposed descriptor, termed the covariance and LBPD
descriptor (COV-LBPD), is able to capture the intrinsic correlation between the LBPD and other features in a
compact manner. Experimental results show that the COV-LBPD achieves promising results on publicly
available data sets.
ETPL
DIP - 031
Combining LBP Difference and Feature Correlation for Texture Description
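The idea of a numerical LBP difference can be sketched as follows: compute each pixel's 8-neighbour LBP bit vector and measure its distance from the region's average bit vector. This is one plausible reading of "how much one LBP varies from the average local structure", not the paper's exact formula; all names are illustrative.

```python
import numpy as np

def lbp_bits(img):
    """8-neighbour local binary pattern as a bit vector per interior pixel:
    bit = 1 where the neighbour is >= the centre pixel."""
    c = img[1:-1, 1:-1]
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
                  img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
                  img[2:, :-2], img[1:-1, :-2]]
    return np.stack([(n >= c).astype(float) for n in neighbours], axis=-1)

def lbp_difference(img):
    """LBPD-style numeric feature: Euclidean distance between each pixel's
    LBP bit vector and the mean bit vector over the whole region."""
    bits = lbp_bits(img)
    mean_bits = bits.reshape(-1, 8).mean(axis=0)
    return np.linalg.norm(bits - mean_bits, axis=-1)

rng = np.random.default_rng(2)
img = rng.random((16, 16))
lbpd = lbp_difference(img)   # one scalar per interior pixel
```

Because the result is a scalar per pixel rather than a pattern index, it can be stacked with other numeric cues (gradients, intensities) inside a covariance descriptor, which is exactly the obstacle the abstract describes for raw LBP codes.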
We address single image super-resolution using a statistical prediction model based on sparse representations of
low- and high-resolution image patches. The suggested model allows us to avoid any invariance assumption,
which is a common practice in sparsity-based approaches treating this task. Prediction of
high-resolution patches is obtained via MMSE estimation, and the resulting scheme has the useful
interpretation of a feedforward neural network. To further enhance performance, we suggest data clustering
and cascading several levels of the basic algorithm. We suggest a training scheme for the resulting network
and demonstrate the capabilities of our algorithm, showing its advantages over existing methods based on a
low- and high-resolution dictionary pair, in terms of computational complexity, numerical criteria, and visual
appearance. The suggested approach offers a desirable compromise between low computational complexity
and reconstruction quality when compared with state-of-the-art methods for single image super-resolution.
ETPL
DIP - 032
A Statistical Prediction Model Based on Sparse Representations for Single
Image Super-Resolution
Feature description for local image patches is widely used in computer vision. While the conventional
way to design a local descriptor is based on expert experience and knowledge, learning-based methods for
designing local descriptors have become more and more popular because of their good performance and data-driven
property. This paper proposes a novel data-driven method for designing a binary feature descriptor, which we
call the receptive fields descriptor (RFD). Technically, the RFD is constructed by thresholding the responses of a set
of receptive fields, which are selected from a large number of candidates according to their distinctiveness and
correlations in a greedy way. Using two different kinds of receptive fields (namely, rectangular pooling areas
and Gaussian pooling areas) for selection, we obtain two binary descriptors, RFD-R and RFD-G,
accordingly. Image matching experiments on the well-known patch data set and the Oxford data
set demonstrate that the RFD significantly outperforms the state-of-the-art binary descriptors, and is comparable
with the best float-valued descriptors at a fraction of the processing time. Finally, experiments on object
recognition tasks confirm that both RFD-R and RFD-G successfully bridge the
performance gap between binary descriptors and their floating-point competitors.
ETPL
DIP - 033
Receptive Fields Selection for Binary Feature Description
Pan-sharpening is a common postprocessing operation for captured multispectral satellite imagery,
where the spatial resolution of images gathered in various spectral bands is enhanced by fusing them with a
panchromatic image captured at a higher resolution. In this paper, pan-sharpening is formulated as the problem
of jointly estimating the high-resolution (HR) multispectral images to minimize an objective function
comprising the sum of squared residual errors in physically motivated observation models of the low-
resolution (LR) multispectral and the HR panchromatic images and a correlation-dependent regularization
term. The objective function differs from and improves upon previously reported model-
based optimization approaches to pan-sharpening in two major aspects: 1) a new regularization term is
introduced and 2) a highpass filter, complementary to the lowpass filter for the LR spectral observations, is
introduced for the residual error corresponding to the panchromatic observation model. To obtain pan-
sharpened images, an iterative algorithm is developed to solve the proposed joint minimization. The proposed
algorithm is compared with previously proposed methods both visually and using established quantitative
measures of SNR, spectral angle mapper, relative dimensionless global error in synthesis, Q, and Q4 indices.
Both the quantitative results and visual evaluation demonstrate that the proposed joint formulation provides
superior results compared with pre-existing methods. A software implementation is provided.
ETPL
DIP - 034
A Regularized Model-Based Optimization Framework for Pan-Sharpening
Fluorescence diffuse optical tomography (FDOT) is an emerging molecular imaging modality that
uses near-infrared light to excite a fluorophore injected into tissue and reconstructs the fluorophore
concentration from boundary measurements. FDOT image reconstruction is a highly ill-posed inverse
problem due to the large number of unknowns and the limited number of measurements. However, the fluorophore
distribution is often very sparse in the imaging domain, since fluorophores are typically designed to accumulate
in relatively small regions. In this paper, we use the compressive sensing (CS) framework to
design light illumination and detection patterns to improve the reconstruction of sparse fluorophore
concentration maps. Unlike conventional FDOT imaging, where spatially distributed light sources illuminate the
imaging domain one at a time and the corresponding boundary measurements are used for image
reconstruction, we assume that the light sources illuminate the imaging domain simultaneously several times
and that the corresponding boundary measurements are linearly filtered prior to image reconstruction. We design a
set of optical intensities (illumination patterns) and a linear filter (detection pattern) applied to the boundary
measurements to improve the reconstruction of sparse fluorophore concentration maps. We show that the
FDOT sensing matrix can be expressed as a columnwise Kronecker product of two matrices determined by the
excitation and emission light fields. We derive relationships between the incoherence of the FDOT forward
matrix and these two matrices, and use these results to reduce the incoherence of the FDOT forward matrix.
We present extensive numerical simulations and the results of a real phantom experiment to demonstrate the
improvements in image reconstruction due to the CS-based light illumination and detection patterns in
conjunction with relaxation and greedy-type reconstruction algorithms.
ETPL
DIP - 035
Light Illumination and Detection Patterns for Fluorescence Diffuse Optical
Tomography Based on Compressive Sensing
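The columnwise Kronecker (Khatri-Rao) factorization of the sensing matrix, and the mutual-coherence quantity that CS design seeks to reduce, can be sketched as follows. The matrices here are random stand-ins for the excitation- and emission-field matrices; sizes and names are illustrative.

```python
import numpy as np

def khatri_rao(E, M):
    """Column-wise Kronecker product: column j of the result is
    kron(E[:, j], M[:, j]). The abstract states the FDOT sensing matrix
    factors this way into excitation- and emission-field matrices."""
    assert E.shape[1] == M.shape[1]
    return np.einsum('ik,jk->ijk', E, M).reshape(
        E.shape[0] * M.shape[0], E.shape[1])

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns;
    smaller coherence is better for sparse recovery."""
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(3)
E = rng.standard_normal((6, 10))   # stand-in excitation-field matrix
M = rng.standard_normal((7, 10))   # stand-in emission-field matrix
S = khatri_rao(E, M)               # (6*7) x 10 sensing matrix
mu = mutual_coherence(S)
```

The factorization means the coherence of the full sensing matrix can be bounded via the two small factors, which is what lets the illumination and detection patterns be optimized separately.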
Many saliency detection models for 2D images have been proposed for various multimedia
processing applications during the past decades. Currently, the emerging applications of stereoscopic display
require new saliency detection models for salient region extraction. Different from saliency detection for
2D images, the depth feature has to be taken into account in saliency detection for stereoscopic images. In this
paper, we propose a novel stereoscopic saliency detection framework based on the feature contrast of color,
luminance, texture, and depth. Four types of features, namely color, luminance, texture, and depth, are
extracted from discrete cosine transform coefficients for feature contrast calculation. A Gaussian model of the
spatial distance between image patches is adopted for consideration of local and global contrast calculation.
Then, a new fusion method is designed to combine the feature maps to obtain the final saliency map
for stereoscopic images. In addition, we adopt the center bias factor and human visual acuity, important
characteristics of the human visual system, to enhance the final saliency map for stereoscopic images.
Experimental results on eye tracking databases show the superior performance of the proposed model over
other existing methods.
ETPL
DIP - 036
Saliency Detection for Stereoscopic Images
Because of the lack of disciplined and efficient mechanisms, most modern area charge-coupled-
device-based barcode scanning technologies are not capable of handling out-of-focus (OOF) image blur and
rely heavily on camera systems to capture good-quality, well-focused barcode images. In this paper, we
present a novel linear barcode scanning system based on a dynamic template matching scheme. The proposed
system works entirely in the spatial domain and is capable of reading linear barcodes from low-
resolution images containing severe OOF blur. This paper treats linear barcode scanning from the perspective
of deformed binary waveform analysis and classification. A directed graphical model is designed to
characterize the relationship between the blurred barcode waveform and its corresponding symbol value at any
specific blur level. Under this model, linear barcode scanning is cast as finding the optimal state sequence
associated with the deformed barcode waveform segments. A dynamic programming-based inference
algorithm is designed to retrieve the optimal state sequence, enabling real-time decoding on mobile devices of
limited processing power.
ETPL
DIP - 037
On Scanning Linear Barcodes From Out-of-Focus Blurred Images: A Spatial
Domain Dynamic Template Matching Approach
Mixed noise removal from natural images is a challenging task, since the noise distribution usually
does not have a parametric model and has a heavy tail. One typical kind of mixed noise is additive white
Gaussian noise (AWGN) coupled with impulse noise (IN). Many mixed noise removal methods are detection-
based: they first detect the locations of IN pixels and then remove the mixed noise. However, such
methods tend to generate many artifacts when the mixed noise is strong. In this paper, we propose a simple yet
effective method, namely weighted encoding with sparse nonlocal regularization (WESNR),
for mixed noise removal. In WESNR, there is no explicit step of impulse pixel detection; instead, soft
impulse pixel detection via weighted encoding is used to deal with IN and AWGN simultaneously. Meanwhile,
the image sparsity prior and nonlocal self-similarity prior are integrated into a regularization term and
introduced into the variational encoding framework. Experimental results show that the proposed WESNR
method achieves leading mixed noise removal performance in terms of both quantitative measures and visual
quality.
ETPL
DIP - 038
Mixed Noise Removal by Weighted Encoding With Sparse Nonlocal
Regularization
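The "soft impulse detection via weighted encoding" idea can be illustrated with iteratively reweighted least squares, where weights derived from the residuals down-weight likely impulse pixels instead of hard-detecting them. This is a generic robust-coding sketch, not the WESNR objective with its sparse nonlocal regularizer; the dictionary, weighting function, and parameters are illustrative.

```python
import numpy as np

def weighted_encoding(D, y, n_iter=10, sigma=0.5):
    """Robust coding sketch: solve min_x sum_i w_i (y_i - (D x)_i)^2 with
    weights recomputed from the residuals each pass, so observations with
    large residuals (likely impulses) are softly down-weighted."""
    x = np.linalg.lstsq(D, y, rcond=None)[0]    # unweighted initial code
    for _ in range(n_iter):
        r = y - D @ x
        w = np.exp(-(r / sigma) ** 2)           # near-zero weight on likely impulses
        sw = np.sqrt(w)                          # apply sqrt(w) to rows for weighted LS
        x = np.linalg.lstsq(D * sw[:, None], sw * y, rcond=None)[0]
    return x

rng = np.random.default_rng(4)
D = rng.standard_normal((100, 10))              # illustrative dictionary
x_true = rng.standard_normal(10)
y = D @ x_true + 0.01 * rng.standard_normal(100)  # AWGN component
y[::10] += 5.0                                  # impulse-corrupted observations
x_hat = weighted_encoding(D, y)
```

After a few passes the impulse-hit rows carry negligible weight while clean rows return to weight near one, so the code is recovered almost as if the impulses had been detected and removed.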
This paper presents a nonlinear mixing model for hyperspectral image unmixing. The proposed model
assumes that the pixel reflectances are post-nonlinear functions of unknown pure spectral components
contaminated by additive white Gaussian noise. These nonlinear functions are approximated using second-
order polynomials, leading to a polynomial post-nonlinear mixing model. A Bayesian algorithm is proposed to
estimate the parameters involved in the model, yielding an unsupervised nonlinear unmixing algorithm. Due to
the large number of parameters to be estimated, an efficient Hamiltonian Monte Carlo algorithm is
investigated. The classical leapfrog steps of this algorithm are modified to handle the parameter constraints.
The performance of the unmixing strategy, including convergence and parameter tuning, is first evaluated on
synthetic data. Simulations conducted with real data finally show the accuracy of the
proposed unmixing strategy for the analysis of hyperspectral images.
ETPL
DIP - 039
Unsupervised Post-Nonlinear Unmixing of Hyperspectral Images Using a
Hamiltonian Monte Carlo Algorithm
We aim to realize a new and simple compensation method that robustly handles multiple-
projector systems without recourse to the linearization of projector response functions. We introduce state
equations, which distribute arbitrary brightness among the individual projectors, and control the state
equations according to feedback from a camera. By employing the color-mixing matrix with the gradient
of projector responses, we compensate the controlled brightness input to each projector. Our method dispenses
with cooperation among multiple projectors as well as time-consuming photometric calibration. Compared
with existing methods, our method is shown to offer superior compensation performance and a more effective
way of compensating multiple-projector systems.
ETPL
DIP - 040
An Iterative Compensation Approach Without Linearization of Projector
Responses for Multiple-Projector System
Camera motion blur is drastically nonuniform for large depth-range scenes: the nonuniformity
caused by camera translation is depth-dependent, while that caused by camera rotation is not. To restore blurry
images of large depth-range scenes deteriorated by arbitrary camera motion, we build an image blur model
considering the 6 degrees of freedom (DoF) of camera motion with a given scene depth map. To make this
6D depth-aware model tractable, we propose a novel parametrization strategy to reduce the number of
variables, as well as an effective method to estimate the high-dimensional camera motion. The number of
variables is reduced by a temporal sampling motion function, which describes the 6-DoF camera motion by
sampling the camera trajectory uniformly in the time domain. To effectively estimate the high-
dimensional camera motion parameters, we construct a probabilistic motion density function (PMDF) to
describe the probability distribution of camera poses during exposure, and apply it as a unified constraint to
guide the convergence of the iterative deblurring algorithm. Specifically, the PMDF is computed through a back
projection from 2D local blur kernels to the 6D camera motion parameter space and robust voting. We conduct a
series of experiments on both synthetic and real captured data, and validate that our method achieves better
performance than existing uniform and nonuniform methods on large depth-range scenes.
ETPL
DIP - 041
High-Dimensional Camera Shake Removal With Given Depth Map
Automatic video summarization is indispensable for fast browsing and efficient management of
large video libraries. In this paper, we introduce an image feature that we
refer to as the heterogeneity image patch (HIP) index. The proposed HIP index provides a new entropy-based
measure of the heterogeneity of patches within any picture. By evaluating this index for every frame in
a video sequence, we generate a HIP curve for that sequence. We exploit the HIP curve in solving two
categories of video summarization applications: key frame extraction and dynamic video skimming. Under the
key frame extraction framework, a set of candidate key frames is selected from abundant video frames based
on the HIP curve. Then, a proposed patch-based image dissimilarity measure is used to create an affinity matrix
of these candidates. Finally, a set of key frames is extracted from the affinity matrix using a min-max based
algorithm. Under video skimming, we propose a method to measure the distance between
a video and its skimmed representation. The video skimming problem is then mapped into an optimization
framework and solved by minimizing a HIP-based distance for a set of extracted excerpts. The HIP framework
is pixel-based and does not require semantic information or complex camera motion estimation. Our
simulation results are based on experiments performed on consumer videos and are compared with state-of-
the-art methods. It is shown that the HIP approach outperforms other leading methods, while maintaining low
complexity.
ETPL
DIP - 042
Heterogeneity Image Patch Index and Its Application to Consumer Video
Summarization
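The abstract does not give the exact HIP definition, but an entropy-of-patches heterogeneity score per frame can be sketched as follows. The patch size, bin count, and simple averaging here are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def patch_entropy(patch, bins=16):
    """Shannon entropy (in bits) of the intensity histogram of one patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def hip_index(frame, patch=8, bins=16):
    """Mean patch entropy over a frame -- a stand-in heterogeneity score."""
    h, w = frame.shape
    ents = [patch_entropy(frame[i:i + patch, j:j + patch], bins)
            for i in range(0, h - patch + 1, patch)
            for j in range(0, w - patch + 1, patch)]
    return float(np.mean(ents))

rng = np.random.default_rng(0)
flat = np.full((32, 32), 0.5)      # homogeneous frame -> low heterogeneity
noisy = rng.random((32, 32))       # heterogeneous frame -> high heterogeneity
assert hip_index(flat) < hip_index(noisy)
```

Evaluating such a score per frame yields a curve over the sequence; frames at its extrema are natural key-frame candidates.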
Successful image-based object recognition techniques have been built on powerful tools such as sparse representation, in lieu of the popular vector quantization approach. However, one
serious drawback of sparse-space-based methods is that local features that are quite similar can be quantized
into quite distinct visual words. We address this problem with a novel approach for object recognition,
called sparse spatial coding, which efficiently combines a sparse coding dictionary learning stage with a spatial constraint coding stage. We performed experimental evaluation using the Caltech 101, Caltech 256,
Corel 5000, and Corel 10000 data sets, which were specifically designed for object recognition evaluation.
Our results show that our approach achieves high accuracy, comparable with the best single-feature method
previously published on those databases. Our method outperformed, on the same databases, several multiple-feature methods, and provided equivalent, and in a few cases slightly less accurate, results than other techniques
specifically designed to that end. Finally, we report state-of-the-art results for scene recognition on COsy
Localization Dataset (COLD) and high performance results on the MIT-67 indoor scene recognition, thus
demonstrating the generalization of our approach for such tasks.
ETPL
DIP - 043
Sparse Spatial Coding: A Novel Approach to Visual Recognition
The stimulus response of the classical receptive field (CRF) of a neuron in the primary visual cortex is
affected by its periphery [i.e., non-CRF (nCRF)]. This modulation exerts inhibition, which depends primarily
on the correlation of both visual stimulations. The theory of periphery and center interaction with visual
characteristics can be applied in night-vision information processing. In this paper, a weighted kernel principal
component analysis (WKPCA) degree-of-homogeneity (DH) amended inhibition model inspired by visual
perceptual mechanisms is proposed to extract salient contours from complex natural scenes in low-light-level
images. The core idea is that multifeature analysis can recognize the homogeneity in modulation coverage
effectively. Computationally, a novel WKPCA algorithm is presented to eliminate outliers and anomalous
distribution in CRF and accomplish principal component analysis precisely. On this basis, a new concept and
computational procedure for DH is defined to evaluate the dissimilarity between periphery and center
comprehensively. By amending the inhibition from the nCRF to the CRF with DH, our model can reduce the
interference of noise and accurately suppress details and textures in homogeneous regions. It further helps to
avoid mutual suppression among inhomogeneous regions and contour elements. This paper provides an
improved computational visual model with high performance for contour detection in cluttered natural
scenes in night-vision images.
ETPL
DIP - 044
Weighted KPCA Degree of Homogeneity Amended Nonclassical Receptive Field
Inhibition Model for Salient Contour Extraction in Low-Light-Level Image
We present an effective image boundary processing method for $M$-channel ($M \in \mathbb{N}$, $M \geq 2$) lifting-based linear-phase filter banks that are applied to unified lossy and lossless image compression (coding), i.e.,
lossy-to-lossless image coding. The reversible symmetric extension we propose is achieved by manipulating
building blocks on the image boundary and restoring the symmetry of each building block that has been
lost due to rounding error at each lifting step. In addition, complexity is reduced by
extending nonexpansive convolution, in a scheme called reversible symmetric nonexpansive convolution, because the
number of input signals does not even temporarily increase. Our method not only
achieves reversible boundary processing, but is also comparable with irreversible symmetric extension in
lossy image coding and outperforms periodic extension in lossy-to-lossless image coding.
ETPL
DIP - 045
Reversible Symmetric Nonexpansive Convolution: An Effective Image
Boundary Processing for $M$-Channel Lifting-Based Linear-Phase Filter
Banks
The behavior and performance of denoising algorithms are governed by one or several parameters,
whose optimal settings depend on the content of the processed image and the characteristics of the noise, and
are generally chosen to minimize the mean squared error (MSE) between the denoised image returned by the
algorithm and a virtual ground truth. In this paper, we introduce a new Poisson-
Gaussian unbiased risk estimator (PG-URE) of the MSE applicable to a mixed Poisson-Gaussian noise model
that unifies the widely used Gaussian and Poisson noise models in fluorescence bioimaging applications. We
propose a stochastic methodology to evaluate this estimator in the case when little is known about the internal
machinery of the considered denoising algorithm, and we analyze both theoretically and empirically the
characteristics of the PG-URE estimator. Finally, we evaluate the PG-URE-driven parametrization for three
standard denoising algorithms, with and without variance-stabilizing transforms, and different characteristics
of the Poisson-Gaussian noise mixture.
ETPL
DIP - 046
An Unbiased Risk Estimator for Image Denoising in the Presence of Mixed
Poisson–Gaussian Noise
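The PG-URE derivation itself is involved, but the mixed noise model it targets is simple to state: a scaled Poisson (shot-noise) component plus additive Gaussian readout noise. A minimal simulation sketch, with illustrative parameter names:

```python
import numpy as np

def add_poisson_gaussian(x, gain=1.0, sigma=0.1, rng=None):
    """Mixed model: y = gain * Poisson(x / gain) + N(0, sigma^2).
    Variance at intensity x is gain * x + sigma^2, which unifies the pure
    Poisson (sigma = 0) and pure Gaussian (gain -> 0) special cases."""
    rng = rng or np.random.default_rng(0)
    return gain * rng.poisson(x / gain) + rng.normal(0.0, sigma, size=x.shape)

x = np.full((256, 256), 20.0)
y = add_poisson_gaussian(x, gain=0.5, sigma=0.3)
assert abs(y.mean() - 20.0) < 0.1                 # the model is unbiased
assert abs(y.var() - (0.5 * 20.0 + 0.3 ** 2)) < 0.5
```

The signal-dependent variance `gain * x + sigma**2` is exactly why a single Gaussian MSE estimator such as SURE is insufficient here.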
We present a new efficient edge-preserving filter, the "tree filter," to achieve strong image smoothing. The
proposed filter can smooth out high-contrast details while preserving major edges, which is not achievable with
bilateral-filter-like techniques. The tree filter is a weighted-average filter whose kernel is derived by viewing
pixel affinity in a probabilistic framework that simultaneously considers pixel spatial distance, color/intensity
difference, and connectedness. Pixel connectedness is obtained by treating pixels as nodes in
a minimum spanning tree (MST) extracted from the image. The fact that an MST makes all image pixels
connected through the tree endows the filter with the power to smooth out high-contrast, fine-scale details
while preserving major image structures, since pixels in small isolated regions will be closely connected to
the surrounding majority pixels through the tree, while pixels inside large homogeneous regions will
automatically be dragged away from pixels outside the region. The tree filter can be separated into two
other filters, both of which turn out to have fast algorithms. We also propose an efficient linear-time MST
extraction algorithm to further improve the overall filtering speed. These algorithms give the tree filter a great
advantage in low computational complexity (linear in the number of image pixels) and speed: it can process a
1-megapixel 8-bit image in ~0.25 s on an Intel 3.4 GHz Core i7 CPU (including construction of the MST).
The proposed tree filter is demonstrated on a variety of applications.
ETPL
DIP - 047
Tree Filtering: Efficient Structure-Preserving Smoothing With a Minimum
Spanning Tree
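The connectedness idea can be illustrated with an MST over a pixel grid whose edge weights are intensity differences plus a small spatial cost; the tree distance between two pixels then drives the kernel weight. A toy sketch using SciPy (the epsilon cost and the tiny test image are illustrative, not the paper's exact construction):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path

def tree_distances(img, eps=0.01):
    """All-pairs tree distances on the MST of a 4-connected pixel grid.
    Edge weight = |intensity difference| + eps (eps keeps zero-cost edges
    explicit in the sparse graph and stands in for the spatial term)."""
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    flat = img.ravel()
    rows, cols = [], []
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        rows.extend(a.ravel())
        cols.extend(b.ravel())
    rows, cols = np.array(rows), np.array(cols)
    wts = np.abs(flat[rows] - flat[cols]) + eps
    g = coo_matrix((wts, (rows, cols)), shape=(h * w, h * w))
    mst = minimum_spanning_tree(g)
    # on a tree, the shortest path IS the unique tree path
    return shortest_path(mst, directed=False)

img = np.array([[0., 0., 1.],
                [0., 0., 1.],
                [0., 0., 1.]])
d = tree_distances(img)
# same-region pixels are close in the tree; crossing the edge costs ~1
assert d[0, 3] < d[0, 2]
```

A weighted-average kernel would then use something like `exp(-d / sigma)`: large within a homogeneous region, tiny across a strong edge.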
The development of energy-selective, photon-counting X-ray detectors allows for a wide range of new
possibilities in the area of computed tomographic image formation. Under the assumption of perfect energy
resolution, we propose a tensor-based iterative algorithm that simultaneously reconstructs the X-ray
attenuation distribution for each energy. We use a multilinear image model rather than a more standard
stacked vector representation in order to develop novel tensor-based regularizers. In particular, we model the
multispectral unknown as a three-way tensor where the first two dimensions are space and the third dimension
is energy. This approach allows for the design of tensor nuclear norm regularizers, which, like their 2D
counterpart, are convex functions of the multispectral unknown. The solution to the resulting convex
optimization problem is obtained using an alternating direction method of multipliers approach. Simulation
results show that the generalized tensor nuclear norm can be used as a standalone regularization technique for
the energy-selective (spectral) computed tomography problem, and when combined with total variation
regularization it enhances the regularization capabilities, especially in low-energy images where the effects of
noise are most prominent.
ETPL
DIP - 048
Tensor-Based Formulation and Nuclear Norm Regularization for Multienergy
Computed Tomography
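In ADMM solvers for nuclear-norm problems, the key building block is singular-value thresholding, the proximal operator of the nuclear norm, applied here to a matrix unfolding of the space x space x energy tensor. A minimal sketch (the matrix sizes and threshold are illustrative):

```python
import numpy as np

def svt(mat, tau):
    """Singular-value thresholding: prox of tau * nuclear norm.
    Shrinks every singular value by tau and zeroes the small ones."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vt

rng = np.random.default_rng(0)
# a rank-2 "multienergy unfolding" plus noise stands in for CT data
clean = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
noisy = clean + 0.1 * rng.standard_normal((20, 30))
den = svt(noisy, tau=1.5)
sv = np.linalg.svd(den, compute_uv=False)
assert int(np.sum(sv > 1e-8)) == 2     # the low-rank structure is recovered
```

Thresholding the unfolding's spectrum is what exploits the strong correlation of attenuation images across energy bins.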
This paper studies the impact of secure watermark embedding in digital images by proposing a
practical implementation of secure spread-spectrum watermarking using distortion optimization. Because
strong security properties (key security and subspace security) can be achieved using natural watermarking
(NW), since this particular embedding leaves the distributions of the host and watermarked signals unchanged, we
use elements of transportation theory to minimize the global distortion. Next, we apply this new modulation,
called transportation NW (TNW), to design a secure watermarking scheme for grayscale images. The TNW
uses a multiresolution image decomposition combined with a multiplicative embedding which is taken into
account at the distribution level. We show that the distortion solely relies on the variance of the wavelet
subbands used during the embedding. In order to maximize a target robustness after JPEG compression, we
select different combinations of subbands offering the lowest Bit Error Rates for a target PSNR ranging from
35 to 55 dB, and we propose an algorithm to select them. The use of transportation theory also provides an
average gain of 3.6 dB in PSNR with respect to the previous embedding over a set of 2000 images.
ETPL
DIP - 049
Optimal Transport for Secure Spread-Spectrum Watermarking of Still Images
In this paper, we propose an efficient algorithm, called vector field consensus, for establishing robust
point correspondences between two sets of points. Our algorithm starts by creating a set of putative
correspondences which can contain a very large number of false correspondences, or outliers, in addition to a
limited number of true correspondences (inliers). Next, we solve for correspondence by interpolating a vector
field between the two point sets, which involves estimating a consensus of inlier points whose matching
follows a nonparametric geometrical constraint. We formulate this as a maximum a posteriori (MAP) estimation
of a Bayesian model with hidden/latent variables indicating whether matches in the putative set are outliers or
inliers. We impose nonparametric geometrical constraints on the correspondence, as a prior distribution, using
Tikhonov regularizers in a reproducing kernel Hilbert space. MAP estimation is performed by the EM
algorithm which by also estimating the variance of the prior model (initialized to a large value) is able to
obtain good estimates very quickly (e.g., avoiding many of the local minima inherent in this formulation). We
illustrate this method on data sets in 2D and 3D and demonstrate that it is robust to a very large number of
outliers (even up to 90%). We also show that, in the special case where there is an underlying parametric
geometrical model (e.g., the epipolar line constraint), we obtain better results than standard alternatives
like RANSAC when a large number of outliers are present. This suggests a two-stage strategy, where we use our
nonparametric model to reduce the size of the putative set and then apply a parametric variant of our approach
to estimate the geometric parameters. Our algorithm is computationally efficient and we provide code for
others to use it. In addition, our approach is general and can be applied to other problems, such as learning
with a badly corrupted training data set.
ETPL
DIP - 050
Robust Point Matching via Vector Field Consensus
In this paper, we address the problem of the high annotation cost of acquiring training data for
semantic segmentation. Most modern approaches to semantic segmentation are based upon graphical models,
such as conditional random fields, and rely on sufficient training data in the form of object contours. To reduce
the manual effort of pixel-wise contour annotation, we consider the setting in which the training data set for
semantic segmentation is a mixture of a few object contours and an abundant set of bounding boxes of objects.
Our idea is to borrow the knowledge derived from the object contours to infer the unknown object contours
enclosed by the bounding boxes. The inferred contours can then serve as training data for semantic
segmentation. To this end, we generate multiple contour hypotheses for each bounding box with the
assumption that at least one hypothesis is close to the ground truth. This paper proposes an approach, called
augmented multiple instance regression (AMIR), that formulates the task of hypothesis selection as the
problem of multiple instance regression (MIR), and augments information derived from the object contours to
guide and regularize the training process of MIR. In this way, a bounding box is treated as a bag with its
contour hypotheses as instances, and the positive instances refer to the hypotheses close to the ground truth.
The proposed approach has been evaluated on the Pascal VOC segmentation task. The promising results
demonstrate that AMIR can precisely infer the object contours in the bounding boxes, and hence provide
effective alternatives to manually labeled contours for semantic segmentation.
ETPL
DIP - 051
Augmented Multiple Instance Regression for Inferring Object Contours in
Bounding Boxes
The parsimonious nature of sparse representations has been successfully exploited for the
development of highly accurate classifiers for various scientific applications. Despite the successes of Sparse
Representation techniques, a large number of dictionary atoms as well as the high dimensionality of the data
can make these classifiers computationally demanding. Furthermore, sparse classifiers are subject to the
adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose
may affect identity and expression recognition. We analyze the interaction between dimensionality reduction
and sparse representations, and propose a technique, called Linear extension of Graph Embedding K-means-
based Singular Value Decomposition (LGE-KSVD) to address both issues of computational intensity and
coefficient contamination. In particular, the LGE-KSVD utilizes variants of the LGE to optimize the K-SVD,
an iterative technique for learning small yet overcomplete dictionaries. The dimensionality reduction matrix,
sparse representation dictionary, sparse coefficients, and sparsity-based classifier are jointly learned through
the LGE-KSVD. The atom optimization process is redefined to allow variable support using graph embedding
techniques and produce a more flexible and elegant dictionary learning algorithm. Results are presented on a
wide variety of facial and activity recognition problems that demonstrate the robustness of the proposed
method.
ETPL
DIP - 052
LGE-KSVD: Robust Sparse Representation Classification
Speckle noise filtering on polarimetric SAR (PolSAR) images remains a challenging task due to the
difficulty of reducing scatterer-dependent noise while preserving the polarimetric and spatial
information. This challenge is particularly acute for single-look complex images, where little information about
the scattering process can be derived from a rank-1 covariance matrix. This paper proposes to analyze and to
evaluate the performances of a set of PolSAR speckle filters. The filter performances are measured by a set of
ten different indicators, including relative errors on incoherent target decomposition parameters, coherences,
polarimetric signatures, point target, and edge preservation. The result is a performance profile for each
individual filter. The methodology consists of simulating a set of artificial PolSAR images on which the
various filters will be evaluated. The image morphology is stochastic and determined by a Markov random
field and the number of scattering classes is allowed to vary so that we can explore a large range of image
configurations. Evaluation on real PolSAR images is also considered. Results show that filter performances
need to be assessed using a complete set of indicators, including distributed scatterer parameters, radiometric
parameters, and spatial information preservation.
ETPL
DIP - 053
Analysis, Evaluation, and Comparison of Polarimetric SAR Speckle Filtering
Techniques
Detecting generic object categories in images and videos is a fundamental issue in computer vision.
However, it faces challenges from inter- and intraclass diversity, as well as distortions caused by
viewpoints, poses, deformations, and so on. To handle such object variations, this paper constructs a structure kernel
and proposes a multiscale part-based model incorporating the discriminative power of kernels. The structure
kernel would measure the resemblance of part-based objects in three aspects: 1) the global similarity term to
measure the resemblance of the global visual appearance of relevant objects; 2) the part similarity term to
measure the resemblance of the visual appearance of distinctive parts; and 3) the spatial similarity term to
measure the resemblance of the spatial layout of parts. In essence, the deformation of parts in the structure
kernel is penalized in a multiscale space with respect to horizontal displacement, vertical displacement, and
scale difference. Part similarities are combined with different weights, which are optimized efficiently to
maximize the intraclass similarities and minimize the interclass similarities by the normalized stochastic
gradient ascent algorithm. In addition, the parameters of the structure kernel are learned during the training
process with regard to the distribution of the data in a more discriminative way. With flexible part sizes on
scale and displacement, it can be more robust to the intraclass variations, poses, and viewpoints. Theoretical
analysis and experimental evaluations demonstrate that the proposed multiscale part-based representation
model with structure kernel exhibits accurate and robust performance, and outperforms state-of-the-art object
classification approaches.
ETPL
DIP - 054
Data-Driven Hierarchical Structure Kernel for Multiscale Part-Based Object
Recognition
This paper investigates the use of local prediction in difference expansion reversible watermarking.
For each pixel, a least square predictor is computed on a square block centered on the pixel and the
corresponding prediction error is expanded. The same predictor is recovered at detection without any
additional information. The proposed local prediction is general and it applies regardless of the predictor order
or the prediction context. For the particular cases of least square predictors with the same context as the
median edge detector, gradient-adjusted predictor or the simple rhombus neighborhood, the local prediction-
based reversible watermarking clearly outperforms the state-of-the-art schemes based on the classical
counterparts. Experimental results are provided.
ETPL
DIP - 055
Local-Prediction-Based Difference Expansion Reversible Watermarking
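The underlying difference-expansion mechanism (Tian's classical scheme, on which the local-prediction variant builds) can be shown on a single pixel pair: expand the difference, slot one payload bit into the new least significant bit, and invert exactly at detection. Overflow/underflow handling is omitted here for brevity:

```python
def de_embed(x, y, bit):
    """Embed one bit in a pixel pair by difference expansion (no overflow check)."""
    l, h = (x + y) // 2, x - y          # integer average and difference
    h2 = 2 * h + bit                    # expand difference, append payload bit
    return l + (h2 + 1) // 2, l - h2 // 2

def de_extract(x2, y2):
    """Recover the original pair and the embedded bit exactly."""
    l, h2 = (x2 + y2) // 2, x2 - y2
    bit, h = h2 & 1, h2 >> 1            # arithmetic shift = floor division
    return l + (h + 1) // 2, l - h // 2, bit

assert de_extract(*de_embed(100, 98, 1)) == (100, 98, 1)
assert de_extract(*de_embed(57, 60, 0)) == (57, 60, 0)
```

The reversibility relies on the integer average `l` being invariant under the expansion, so the detector can recompute it from the watermarked pair alone; the paper's contribution is replacing the fixed difference with a locally estimated prediction error.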
We consider a wireless relay network with a single source, a single destination, and multiple relays.
The relays are half-duplex and use the decode-and-forward protocol. The transmitted source is a layered video
bitstream, which can be partitioned into two layers, a base layer (BL) and an enhancement layer (EL), where
the BL is more important than the EL in terms of the source distortion. The source broadcasts both layers to
the relays and the destination using hierarchical 16-QAM. Each relay detects and transmits successfully
decoded layers to the destination using either hierarchical 16-QAM or QPSK. The destination can thus receive
multiple signals, each of which can include either only the BL or both the BL and the EL. We derive the
optimal linear combining method at the destination, where the uncoded bit error rate is minimized. We also
present a suboptimal combining method with a closed-form solution, which performs very close to the
optimal. We use the proposed double-layer transmission scheme with our combining methods for transmitting
layered video bitstreams. Numerical results show that the double-layer scheme can gain 2-2.5 dB in channel
signal-to-noise ratio or 5-7 dB in video peak signal-to-noise ratio, compared with the classical single-layer
scheme using conventional modulation.
ETPL
DIP - 056
Double-Layer Video Transmission Over Decode-and-Forward Wireless Relay
Networks Using Hierarchical Modulation
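Hierarchical 16-QAM gives the BL bits a larger decision distance than the EL bits. A sketch of one possible mapping (the constellation geometry and the priority parameter `alpha` are illustrative, not the paper's exact design):

```python
def hier16qam(bl, el, alpha=2.0):
    """Map 2 base-layer bits + 2 enhancement bits to one 16-QAM symbol.
    BL bits choose the quadrant; EL bits choose the point inside it.
    alpha > 1 widens quadrant spacing, protecting the BL at the EL's expense."""
    def axis(b_bl, b_el):
        # BL bit sets the sign of the axis, EL bit picks inner/outer position
        return (1 if b_bl else -1) * (alpha + (1 if b_el else -1))
    return complex(axis(bl[0], el[0]), axis(bl[1], el[1]))

assert hier16qam((1, 0), (0, 1)) == complex(1, -3)
assert hier16qam((1, 0), (1, 1)) == complex(3, -3)
```

Because a quadrant decision needs only the sign of each axis, a relay in a poor channel can still decode the BL (effectively seeing QPSK) even when the fine EL positions are lost.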
Once an image is decomposed into a number of visual primitives, e.g., local interest points or regions,
it is of great interest to discover meaningful visual patterns from them. Conventional clustering of visual
primitives, however, usually ignores the spatial and feature structure among them, and thus cannot discover high-
level visual patterns of complex structure. To overcome this problem, we propose to consider spatial and
feature contexts among visual primitives for pattern discovery. By discovering spatial co-occurrence patterns
among visual primitives and feature co-occurrence patterns among different types of features, our method can
better address the ambiguities of clustering visual primitives. We formulate the pattern discovery problem as a
regularized k-means clustering where spatial and feature contexts serve as constraints to improve the
pattern discovery results. A novel self-learning procedure is proposed to utilize the discovered spatial or
feature patterns to gradually refine the clustering result. Our self-learning procedure is guaranteed to converge
and experiments on real images validate the effectiveness of our method.
ETPL
DIP - 057
Context-Aware Discovery of Visual Co-Occurrence Patterns
We propose a genuine 3D texture synthesis algorithm based on a probabilistic 2D Markov random
field conceptualization, capable of capturing the visual characteristics of a texture into a unique statistical
texture model. We intend to reproduce, in the volumetric texture, the interactions between pixels learned in an
input 2D image. The learning is done by nonparametric Parzen windowing. Optimization is handled voxel by voxel by
a relaxation algorithm, aiming at maximizing the likelihood of each voxel in terms of its local conditional
probability function. Variants are proposed regarding the relaxation algorithm and the heuristic strategies used
for the simultaneous handling of the orthogonal slices containing the voxel. The procedures are materialized
on various textures through a comparative study and a sensitivity analysis, highlighting the variants' strengths
and weaknesses. Finally, the probabilistic model is compared objectively with a nonparametric neighborhood-
search-based algorithm.
ETPL
DIP - 058
Maximum-Likelihood Based Synthesis of Volumetric Textures From a 2D
Sample
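The Parzen-windowing step estimates a conditional density nonparametrically, directly from observed samples. In 1D with a Gaussian kernel it reduces to a few lines (kernel choice and bandwidth are illustrative):

```python
import numpy as np

def parzen_density(samples, x, h=0.3):
    """Parzen-window (Gaussian kernel) density estimate at the points x:
    average of one kernel bump centred on each training sample."""
    k = np.exp(-0.5 * ((x - samples[:, None]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return k.mean(axis=0)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=5000)
p = parzen_density(samples, np.array([0.0, 3.0]))
assert p[0] > p[1]            # the estimate peaks near the sample mean
```

In the synthesis setting, the "samples" are neighbourhood configurations gathered from the 2D exemplar, and each voxel is relaxed toward high values of its local conditional density.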
Multiplicative noise (also known as speckle) reduction is a prerequisite for many image-processing
tasks in coherent imaging systems, such as the synthetic aperture radar. One approach extensively used in this
area is based on total variation (TV) regularization, which can recover significantly sharp edges of an image,
but suffers from staircase-like artifacts. To overcome this undesirable deficiency, we propose two
novel models for removing multiplicative noise based on total generalized variation (TGV) penalty. The TGV
regularization has been mathematically proven to be able to eliminate the staircasing artifacts by being aware
of higher order smoothness. Furthermore, an efficient algorithm is developed for solving the TGV-based
optimization problems. Numerical experiments demonstrate that our proposed methods achieve state-of-the-art
results, both visually and quantitatively. In particular, when the image has some higher order smoothness, our
methods outperform the TV-based algorithms.
ETPL
DIP - 059
Speckle Reduction via Higher Order Total Variation Approach
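The multiplicative (speckle) model that TV/TGV-based despeckling targets is easy to simulate: an L-look intensity image is the clean image times unit-mean Gamma noise, so noise power scales with local brightness. A sketch (the number of looks is illustrative):

```python
import numpy as np

def add_speckle(x, looks=4, rng=None):
    """L-look multiplicative speckle: y = x * n with n ~ Gamma(L, 1/L),
    so E[n] = 1 and Var[n] = 1/L -- stronger noise on brighter pixels."""
    rng = rng or np.random.default_rng(0)
    n = rng.gamma(shape=looks, scale=1.0 / looks, size=x.shape)
    return x * n

x = np.full((200, 200), 10.0)
y = add_speckle(x, looks=4)
assert abs(y.mean() - 10.0) < 0.1            # unit-mean noise preserves radiometry
assert abs(y.var() - 10.0 ** 2 / 4) < 2.0    # variance grows with intensity^2
```

This signal-dependence is why multiplicative-noise models (often handled in the log domain) need dedicated variational formulations rather than plain additive-Gaussian ones.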
In order to quantitatively analyze biological images and study underlying mechanisms of the cellular
and subcellular processes, it is often required to track a large number of particles involved in these processes.
Manual tracking can be performed by the biologists, but the workload is very heavy. In this paper, we present
an automatic particle tracking method for analyzing an essential subcellular process, namely clathrin mediated
endocytosis. The framework of the tracking method is an extension of the classical multiple hypothesis
tracking (MHT), and it is designed to manage trajectories, solve data association problems, and handle pseudo-
splitting/merging events. In the extended MHT framework, particle tracking becomes evaluating two types of
hypotheses. The first one is the trajectory-related hypothesis, to test whether a recovered trajectory is correct,
and the second one is the observation-related hypothesis, to test whether an observation from an image
belongs to a real particle. Here, an observation refers to a detected particle and its feature vector. To detect the
particles in 2D fluorescence images taken using total internal reflection microscopy, the images are segmented
into regions, and the features of the particles are obtained by fitting Gaussian mixture models to each of the
image regions. Specific models are developed according to the properties of the particles. The proposed
tracking method is demonstrated on synthetic data under different scenarios and applied to real data.
ETPL
DIP - 060
A Novel Multiple Hypothesis Based Particle Tracking Method for Clathrin
Mediated Endocytosis Analysis Using Fluorescence Microscopy
This paper focuses on the problem of detecting a number of different class objects in images. We
present a novel part-based model for object detection with cascaded classifiers. The coarse root and fine part
classifiers are combined into the model. Different from the existing methods which learn root and part
classifiers independently, we propose a shared-Boost algorithm to jointly train multiple classifiers. This paper
is distinguished by two key contributions. The first is to introduce a new definition of shared features for
similar pattern representation among multiple classifiers. Based on this, a shared-Boost algorithm which
jointly learns multiple classifiers by reusing the shared feature information is proposed. The second
contribution is a method for constructing a discriminatively trained part-based model, which fuses the outputs
of cascaded shared-Boost classifiers as high-level features. The proposed shared-Boost-based part model is
applied for both rigid and deformable object detection experiments. Compared with the state-of-the-art
method, the proposed model achieves higher or comparable performance. In particular, it can raise the
detection rates in low-resolution images. The proposed procedure also provides a systematic framework for
information reuse among multiple classifiers in part-based object detection.
ETPL
DIP - 061
Learning Cascaded Shared-Boost Classifiers for Part-Based Object Detection
In this paper, we cast the tracking problem as finding the candidate that scores highest in the
evaluation model based upon a matrix called discriminative sparse similarity map (DSS map). This map
demonstrates the relationship between all the candidates and the templates, and it is constructed based on the
solution to an innovative optimization formulation named multitask reverse sparse representation formulation,
which searches multiple subsets from the whole candidate set to simultaneously reconstruct multiple templates
with minimum error. A customized accelerated proximal gradient (APG) method is derived to obtain the optimum solution (in matrix form)
within several iterations. This formulation allows the candidates to be evaluated accurately in parallel rather
than one by one, as most sparsity-based trackers do, while also considering the relationships between
candidates; it is therefore superior in terms of cost-performance ratio. The discriminative information
contained in this map comes from a large template set with multiple positive target templates and hundreds of
negative templates. A Laplacian term is introduced to keep the similarity of the coefficients consistent with
the similarities of the candidates, thereby making our tracker more robust. A pooling approach is proposed to extract
the discriminative information in the DSS map to easily yet effectively select good candidates from bad
ones and finally obtain the optimum tracking results. Extensive experimental evaluations on challenging image
sequences demonstrate that the proposed tracking algorithm performs favorably against the state-of-the-art
methods.
ETPL
DIP - 062
Visual Tracking via Discriminative Sparse Similarity Map
In this paper, we propose a novel example-based method for denoising and super-resolution of
medical images. The objective is to estimate a high-resolution image from a single noisy low-resolution
image, with the help of a given database of high- and low-resolution image patch pairs. Denoising and super-resolution in this paper are performed on each image patch. For each given input low-resolution patch, its high-
resolution version is estimated based on finding a nonnegative sparse linear representation of the input patch
over the low-resolution patches from the database, where the coefficients of the representation strongly depend
on the similarity between the input patch and the sample patches in the database. The problem of finding the
nonnegative sparse linear representation is modeled as a nonnegative quadratic programming problem. The
proposed method is especially useful for noise-corrupted, low-resolution images. Experimental
results show that the proposed method outperforms other state-of-the-art super-resolution methods while
effectively removing noise.
ETPL
DIP - 063
Novel Example-Based Method for Super-Resolution and Denoising of Medical
Images
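The core step described above, finding a nonnegative sparse linear representation of a patch over database patches cast as a nonnegative quadratic program, can be sketched as follows. The dictionary `D` and patch `y` here are synthetic stand-ins, and `scipy.optimize.nnls` substitutes for the paper's own solver:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
D = np.abs(rng.normal(size=(16, 40)))            # 40 low-resolution database patches (columns)
y = D @ np.maximum(rng.normal(size=40), 0) / 10  # synthetic input low-resolution patch

# Nonnegative sparse linear representation:  min ||y - D x||_2  s.t.  x >= 0
x, residual = nnls(D, y)
```

The nonnegativity constraint itself tends to drive many coefficients to exactly zero, which is what makes the representation sparse.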
Compressive spectral imaging (CSI) senses the spatio-spectral information of a scene by measuring
2D coded projections on a focal plane array. An ℓ1-norm-based optimization algorithm is then used to recover
the underlying discretized spectral image. The coded aperture snapshot spectral imager (CASSI) is an
architecture realizing CSI where the reconstruction image quality relies on the design of a 2D set of binary
coded apertures which block-unblock the light from the scene. This paper extends the compressive capabilities
of CASSI by replacing the traditional blocking-unblocking coded apertures by a set of colored coded
apertures. The colored coded apertures are optimized such that the number of projections is minimized while
the quality of reconstruction is maximized. The optimal design of the colored coded apertures aims to better
satisfy the restricted isometry property in CASSI. The optimal designs are compared with random colored
coded aperture patterns and with the traditional blocking-unblocking coded apertures. Extensive simulations
show the improvement in reconstruction PSNR attained by the optimal colored coded aperture designs.
ETPL
DIP - 064
Colored Coded Aperture Design by Concentration of Measure in Compressive
Spectral Imaging
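The ℓ1-norm recovery step mentioned above can be illustrated with a generic iterative soft-thresholding (ISTA) sketch; the sensing matrix `H` below is a random stand-in, not an actual CASSI projection operator:

```python
import numpy as np

def ista(H, y, lam=0.01, iters=500):
    """Iterative soft-thresholding for min_f 0.5*||y - H f||^2 + lam*||f||_1."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2       # 1/L with L the Lipschitz constant
    f = np.zeros(H.shape[1])
    for _ in range(iters):
        g = f + step * (H.T @ (y - H @ f))       # gradient step on the data term
        f = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-threshold
    return f

rng = np.random.default_rng(1)
H = rng.normal(size=(60, 120)) / np.sqrt(60)     # stand-in for coded projections
f_true = np.zeros(120)
f_true[rng.choice(120, 5, replace=False)] = 1.0  # sparse signal to recover
y = H @ f_true                                   # compressive measurements
f_hat = ista(H, y)
```

With enough incoherent measurements relative to the sparsity level, the recovered `f_hat` closely matches the true sparse signal.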
ETPL
DIP - 065
Regularized Tree Partitioning and Its Application to Unsupervised Image
Segmentation
A prior work by Chung and Wu used an edge-based lookup table to obtain good inverse-halftoned image quality, yet it suffers from drawbacks in terms of image quality, memory consumption, and complexity. In this correspondence, an improved scheme is proposed to address these issues.
ETPL
DIP - 066
Inverse Halftoning With Context Driven Prediction
This paper proposes a novel saliency detection framework termed the saliency tree. For effective
saliency measurement, the original image is first simplified using adaptive color quantization and region
segmentation to partition the image into a set of primitive regions. Then, three measures, i.e., global contrast,
spatial sparsity, and object prior are integrated with regional similarities to generate the initial regional
saliency for each primitive region. Next, a saliency-directed region merging approach with a dynamic scale control scheme is proposed to generate the saliency tree, in which each leaf node represents a primitive region
and each non-leaf node represents a non-primitive region generated during the region merging process.
Finally, by exploiting a node selection criterion based on a regional center-surround scheme, a systematic saliency tree analysis, including salient node selection and regional saliency adjustment and selection, is performed to obtain
final regional saliency measures and to derive the high-quality pixel-wise saliency map. Extensive
experimental results on five datasets with pixel-wise ground truths demonstrate that the proposed saliency tree
model consistently outperforms the state-of-the-art saliency models.
ETPL
DIP - 067
Saliency Tree: A Novel Saliency Detection Framework
This paper proposes two sets of novel edge-texture features, Discriminative Robust Local Binary Pattern (DRLBP) and Discriminative Robust Local Ternary Pattern (DRLTP), for object recognition. By investigating the limitations of Local Binary Pattern (LBP), Local Ternary Pattern (LTP), and Robust LBP (RLBP), DRLBP and DRLTP are proposed as new features. They solve the problem, inherent in LBP and LTP, of discriminating between a bright object against a dark background and vice versa. DRLBP also resolves the problem of RLBP, whereby
LBP codes and their complements in the same block are mapped to the same code. Furthermore, the proposed
features retain contrast information necessary for proper representation of object contours that LBP, LTP, and
RLBP discard. Our proposed features are tested on seven challenging data sets: INRIA Human, Caltech
Pedestrian, UIUC Car, Caltech 101, Caltech 256, Brodatz, and KTH-TIPS2-a. Results demonstrate that the
proposed features outperform the compared approaches on most data sets.
ETPL
DIP - 068
LBP-Based Edge-Texture Features for Object Recognition
Many imaging applications require the implementation of space-varying convolution for accurate
restoration and reconstruction of images. Here, we use the term space-varying convolution to refer to linear
operators whose impulse response has slow spatial variation. In addition, these space-varying convolution
operators are often dense, so direct implementation of the convolution operator is typically computationally
impractical. One such example is the problem of stray light reduction in digital cameras, which requires the
implementation of a dense space-varying deconvolution operator. However, other inverse problems, such as
iterative tomographic reconstruction, can also depend on the implementation of dense space-varying
convolution. While space-invariant convolution can be efficiently implemented with the fast Fourier
transform, this approach does not work for space-varying operators. So direct convolution is often the only
option for implementing space-varying convolution. In this paper, we develop a general approach to the
efficient implementation of space-varying convolution, and demonstrate its use in the application of stray light
reduction. Our approach, which we call matrix source coding, is based on lossy source coding of the dense
space-varying convolution matrix. Importantly, by coding the transformation matrix, we not only reduce the
memory required to store it; we also dramatically reduce the computation required to implement matrix-vector
products. Our algorithm is able to reduce computation by approximately factoring the dense space-varying
convolution operator into a product of sparse transforms. Experimental results show that our method can
dramatically reduce the computation required for stray light reduction while maintaining high accuracy.
ETPL
DIP - 069
Fast Space-Varying Convolution Using Matrix Source Coding With
Applications to Camera Stray Light Reduction
The goal of this paper is to propose a statistical model of quantized discrete cosine transform (DCT)
coefficients. It relies on a mathematical framework that studies the image processing pipeline of a typical digital camera, instead of fitting empirical data with the variety of popular models proposed in the literature. To highlight the accuracy of the proposed model, this paper exploits it for the detection of hidden information in JPEG images. By formulating hidden data detection as a hypothesis testing problem, this paper studies the most powerful likelihood ratio test for the steganalysis of the Jsteg algorithm and theoretically establishes its statistical
performance. Based on the proposed model of DCT coefficients, a maximum likelihood estimator for
embedding rate is also designed. Numerical results on simulated and real images emphasize the accuracy of
the proposed model and the performance of the proposed test.
ETPL
DIP - 070
Statistical Model of Quantized DCT Coefficients: Application in the
Steganalysis of Jsteg Algorithm
In image classification tasks, one of the most successful algorithms is the bag-of-features (BoFs)
model. Although the BoF model has many advantages, such as simplicity, generality, and scalability, it still
suffers from several drawbacks, including the limited semantic description of local descriptors, the lack of robust structures upon single visual words, and the absence of efficient spatial weighting. To overcome these shortcomings, various techniques have been proposed, such as extracting multiple descriptors, spatial context modeling, and interest region detection. Though they have been proven to improve the BoF model to some extent, a coherent scheme to integrate the individual modules is still lacking. To address the
problems above, we propose a novel framework with spatial pooling of complementary features. Our model
expands the traditional BoF model on three aspects. First, we propose a new scheme for combining texture and
edge-based local features together at the descriptor extraction level. Next, we build geometric visual phrases to
model spatial context upon complementary features for midlevel image representation. Finally, based on a
smoothed edgemap, a simple and effective spatial weighting scheme is performed to capture the image
saliency. We test the proposed framework on several benchmark data sets for image classification. The
extensive results show the superior performance of our algorithm over the state-of-the-art methods.
ETPL
DIP - 071
Spatial Pooling of Heterogeneous Features for Image Classification
We present a novel domain adaptation approach for solving cross-domain pattern recognition
problems, i.e., the data or features to be processed and recognized are collected from different
domains of interest. Inspired by canonical correlation analysis (CCA), we utilize the derived
correlation subspace as a joint representation for associating data across different domains, and we
advance reduced kernel techniques for kernel CCA (KCCA) if nonlinear correlation subspaces are desirable. Such techniques not only make KCCA computationally more efficient but also alleviate potential over-fitting problems. Instead of directly performing recognition in the derived
CCA subspace (as prior CCA-based domain adaptation methods did), we advocate the exploitation of
domain transfer ability in this subspace, in which each dimension has a unique capability in
associating cross-domain data. In particular, we propose a novel support vector machine (SVM) with
a correlation regularizer, named correlation-transfer SVM, which incorporates the domain adaptation
ability into classifier design for cross-domain recognition. We show that our proposed domain
adaptation and classification approach can be successfully applied to a variety of cross-domain
recognition tasks such as cross-view action recognition, handwritten digit recognition with different
features, and image-to-text or text-to-image classification. From our empirical results, we verify that
our proposed method outperforms state-of-the-art domain adaptation approaches in terms of
recognition performance.
ETPL
DIP - 072
Heterogeneous Domain Adaptation and Classification by Exploiting the
Correlation Subspace
Image reranking is effective for improving the performance of a text-based image search. However,
existing reranking algorithms are limited for two main reasons: 1) the textual meta-data associated with
images is often mismatched with their actual visual content and 2) the extracted visual features do not
accurately describe the semantic similarities between images. Recently, user click information has been used
in image reranking, because clicks have been shown to more accurately describe the relevance of retrieved
images to search queries. However, a critical problem for click-based methods is the lack of click data, since
only a small number of web images have actually been clicked on by users. Therefore, we aim to solve this
problem by predicting image clicks. We propose a multimodal hypergraph learning-based sparse coding
method for image click prediction, and apply the obtained click data to the reranking of images. We adopt a
hypergraph to build a group of manifolds, which explore the complementarity of different features through a
group of weights. Unlike a graph that has an edge between two vertices, a hyperedge in a hypergraph connects
a set of vertices, and helps preserve the local smoothness of the constructed sparse codes. An alternating
optimization procedure is then performed, and the weights of different modalities and the sparse codes are
simultaneously obtained. Finally, a voting strategy is used to describe the predicted click as a binary event
(click or no click), from the images' corresponding sparse codes. Thorough empirical studies on a large-scale
database including nearly 330 K images demonstrate the effectiveness of our approach for click prediction
when compared with several other methods. Additional image reranking experiments on real-world data show that the use of click prediction is beneficial for improving the performance of prominent graph-based image
reranking algorithms.
ETPL
DIP - 073
Click Prediction for Web Image Reranking Using Multimodal Sparse Coding
We propose a new mathematical and algorithmic framework for unsupervised image segmentation,
which is a critical step in a wide variety of image processing applications. We have found that most existing
segmentation methods are not successful on histopathology images, which prompted us to investigate
segmentation of a broader class of images, namely those without clear edges between the regions to be
segmented. We model these images as occlusions of random images, which we call textures, and show that
local histograms are a useful tool for segmenting them. Based on our theoretical results, we describe a flexible
segmentation framework that draws on existing work on nonnegative matrix factorization and image
deconvolution. Results on synthetic texture mosaics and real histology images show the promise of the
method.
ETPL
DIP - 074
Images as Occlusions of Textures: A Framework for Segmentation
In recent years, there has been growing interest in mapping visual features into compact binary codes
for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes
reduces the memory cost for storage. Besides, it benefits the computational efficiency since the computation of
similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale
invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB
algorithm explores the magnitude patterns of the SIFT descriptor. It is unsupervised, and the generated binary codes are demonstrated to be distance-preserving. In addition, we propose a new search strategy to find target
features based on the cross-indexing in the binary SIFT space and original SIFT space. We evaluate our
approach on two publicly released data sets. The experiments on large-scale partial duplicate image retrieval
system demonstrate the effectiveness and efficiency of the proposed algorithm.
ETPL
DIP - 075
Cross-Indexing of Binary SIFT Codes for Large-Scale Image Search
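The Hamming-distance comparison that makes binary codes efficient can be sketched as follows; the 256-bit code length here is an illustrative assumption, not the paper's actual FSB code size:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between binary codes packed into uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two hypothetical 256-bit binarized descriptors (32 bytes each).
a = np.arange(32, dtype=np.uint8)
b = a.copy()
b[0] ^= 0b10100000        # flip two bits of the first byte
```

Because the distance reduces to XOR plus a bit count, it is far cheaper than a Euclidean comparison of the original floating-point descriptors.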
We propose a new strategy to evaluate the quality of multi- and hyperspectral images, from the
perspective of human perception. We define the spectral image difference as the overall perceived difference
between two spectral images under a set of specified viewing conditions (illuminants). First, we analyze the
stability of seven image-difference features across illuminants, by means of an information-theoretic strategy.
We demonstrate, in particular, that in the case of common spectral distortions (spectral gamut mapping,
spectral compression, spectral reconstruction), chromatic features vary much more than achromatic ones
despite considering chromatic adaptation. Then, we propose two computationally efficient spectral image
difference metrics and compare them to the results of a subjective visual experiment. A significant
improvement is shown over existing metrics such as the widely used root-mean-square error.
ETPL
DIP - 076
Image-Difference Prediction: From Color to Spectral
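For reference, the root-mean-square error baseline the authors compare against can be computed over all pixels and bands of a spectral image pair; the toy 31-band images below are illustrative:

```python
import numpy as np

def spectral_rmse(img1, img2):
    """Root-mean-square error over all pixels and spectral bands."""
    d = img1.astype(float) - img2.astype(float)
    return float(np.sqrt(np.mean(d ** 2)))

ref = np.zeros((4, 4, 31))    # toy 31-band spectral image
dist = ref + 2.0              # uniform distortion of 2 units per band
```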
Multimedia communication is becoming pervasive because of the progress in wireless
communications and multimedia coding. Estimating the quality of the visual content accurately is crucial in
providing satisfactory service. State-of-the-art visual quality assessment approaches are effective when the input image and reference image have the same resolution. However, estimating the quality of an image whose spatial resolution differs from that of the reference image is still a challenging problem. To solve this
problem, we develop a quality estimator (QE), which computes the quality of the input image without
resampling the reference or the input images. In this paper, we begin by identifying the potential weaknesses
of previous approaches used to estimate the quality of experience. Next, we design a QE to estimate the
quality of a distorted image with a lower resolution compared with the reference image. We also propose a
subjective test environment to explore the success of the proposed algorithm in comparison with other QEs.
When the input and test images have different resolutions, the subjective tests demonstrate that in most cases
the proposed method works better than other approaches. In addition, the proposed algorithm also performs
well when the reference image and the test image have the same resolution.
ETPL
DIP - 077
Full-Reference Quality Estimation for Images With Different Spatial
Resolutions
Content-aware image resizing techniques allow the visual content of images to be taken into account during the resizing process. The basic idea behind these algorithms is the removal of vertical and/or horizontal paths of pixels (i.e., seams) containing low-salience information. In this paper, we present a method which exploits the gradient vector flow (GVF) of the image to establish the paths to be considered during the resizing. The relevance of each GVF path is straightforwardly derived from an energy map related to the magnitude of the GVF associated with the image to be resized. To make the visual content of the images more relevant during content-aware resizing, we also propose to select the generated GVF paths based on their visual saliency properties. In this way, visually important image regions are better preserved in the final
resized image. The proposed technique has been tested, both qualitatively and quantitatively, by considering a
representative data set of 1000 images labeled with corresponding salient objects (i.e., ground-truth maps).
Experimental results demonstrate that our method preserves crucial salient regions better than other state-of-
the-art algorithms.
ETPL
DIP - 078
Saliency-Based Selection of Gradient Vector Flow Paths for Content Aware
Image Resizing
Texture enhancement presents an ongoing challenge, in spite of the considerable progress made in
recent years. Whereas most of the effort so far has been devoted to the enhancement of regular textures, stochastic textures, which are encountered in most natural images, still pose an outstanding problem. The purpose of
enhancement of stochastic textures is to recover details, which were lost during the acquisition of the image. In
this paper, a texture model, based on fractional Brownian motion (fBm), is proposed. The model is global and
does not entail using image patches. The fBm is a self-similar stochastic process. Self-similarity is known to
characterize a large class of natural textures. The fBm-based model is evaluated and a single-image
regularized superresolution algorithm is derived. The proposed algorithm is useful for enhancement of a wide
range of textures. Its performance is compared with single-image superresolution methods and its advantages
are highlighted.
ETPL
DIP - 079
Single-Image Superresolution of Natural Stochastic Textures Based on
Fractional Brownian Motion
This paper addresses the problem of automatic figure-ground segmentation, which aims at
automatically segmenting out all foreground objects from background. The underlying idea of this approach is
to transfer segmentation masks of globally and locally (glocally) similar exemplars into the query image. For
this purpose, we propose a novel high-level image representation method named the object-oriented descriptor.
Using this descriptor, a set of exemplar images glocally similar to the query image is retrieved. Then, using
over-segmented regions of these retrieved exemplars, a discriminative classifier is learned on-the-fly and
subsequently used to predict foreground probability for the query image. Finally, the optimal segmentation is
obtained by combining the online prediction with typical energy optimization of Markov random field. The
proposed approach has been extensively evaluated on three datasets, including Pascal VOC 2010, VOC 2011
segmentation challenges, and iCoseg dataset. Experiments show that the proposed approach outperforms state-
of-the-art methods and has the potential to segment large-scale images containing unknown objects, which
never appear in the exemplar images.
ETPL
DIP - 080
Online Glocal Transfer for Automatic Figure-Ground Segmentation
This paper presents a method for learning overcomplete dictionaries of atoms composed of two
modalities that describe a 3D scene: 1) image intensity and 2) scene depth. We propose a novel joint basis
pursuit (JBP) algorithm that finds related sparse features in two modalities using conic programming and we
integrate it into a two-step dictionary learning algorithm. The JBP differs from related convex algorithms
because it finds joint sparsity models with different atoms and different coefficient values for intensity and
depth. This is crucial for recovering generative models where the same sparse underlying causes (3D features)
give rise to different signals (intensity and depth). We give a bound for recovery error of sparse coefficients
obtained by JBP, and show numerically that JBP is superior to the group lasso algorithm. When applied to the
Middlebury depth-intensity database, our learning algorithm converges to a set of related features, such as
pairs of depth and intensity edges or image textures and depth slants. Finally, we show that JBP outperforms
state of the art methods on depth inpainting for time-of-flight and Microsoft Kinect 3D data.
ETPL
DIP - 081
Learning Joint Intensity-Depth Sparse Representations
Geodesic distance, as an essential measurement for data dissimilarity, has been successfully used in
manifold learning. However, most geodesic distance-based manifold learning algorithms have two limitations
when applied to classification: 1) class information is rarely used in computing the geodesic distances between
data points on manifolds and 2) little attention has been paid to building an explicit dimension reduction
mapping for extracting the discriminative information hidden in the geodesic distances. In this paper, we
regard geodesic distance as a kind of kernel, which maps data from a linearly inseparable space to a linearly separable distance space. In doing this, a new semisupervised manifold learning algorithm, namely the regularized
geodesic feature learning algorithm, is proposed. The method consists of three techniques: a semisupervised
graph construction method, replacement of original data points with feature vectors which are built by
geodesic distances, and a new semisupervised dimension reduction method for feature vectors. Experiments
on the MNIST, USPS handwritten digit data sets, MIT CBCL face versus nonface data set, and an intelligent
traffic data set show the effectiveness of the proposed algorithm.
ETPL
DIP - 082
A Regularized Approach for Geodesic-Based Semisupervised Multimanifold
Learning
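The geodesic distances the method builds feature vectors from are commonly approximated by shortest paths on a k-nearest-neighbour graph; a minimal sketch of that approximation (not the paper's semisupervised graph construction) follows:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def geodesic_distances(X, k=5):
    """Approximate geodesic distances by shortest paths on a kNN graph."""
    D = cdist(X, X)                              # pairwise Euclidean distances
    W = np.zeros_like(D)                         # 0 = no edge for csgraph routines
    idx = np.argsort(D, axis=1)[:, 1:k + 1]      # k nearest neighbours, skipping self
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, idx.ravel()] = D[rows, idx.ravel()]
    return shortest_path(W, method="D", directed=False)

# Points on a quarter circle: the geodesic between the endpoints approaches
# the arc length (pi/2), which exceeds the straight-line chord.
t = np.linspace(0.0, np.pi / 2, 50)
X = np.c_[np.cos(t), np.sin(t)]
G = geodesic_distances(X, k=3)
```

On curved data like this, the graph-based distance follows the manifold rather than cutting across it, which is exactly the dissimilarity structure the abstract exploits.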
This paper presents a nonlinear mixing model for joint hyperspectral image unmixing and
nonlinearity detection. The proposed model assumes that the pixel reflectances are linear combinations of
known pure spectral components corrupted by an additional nonlinear term affecting the endmembers, and
contaminated by an additive Gaussian noise. A Markov random field is considered for nonlinearity detection
based on the spatial structure of the nonlinear terms. The observed image is segmented into regions where
nonlinear terms, if present, share similar statistical properties. A Bayesian algorithm is proposed to estimate
the parameters involved in the model yielding a joint nonlinear unmixing and nonlinearity detection algorithm.
The performance of the proposed strategy is first evaluated on synthetic data. Simulations conducted with real
data show the accuracy of the proposed unmixing and nonlinearity detection strategy for the analysis of
hyperspectral images.
ETPL
DIP - 083
Residual Component Analysis of Hyperspectral Images—Application to Joint
Nonlinear Unmixing and Nonlinearity Detection
In this paper, we present an efficient multiscale low-rank representation for image segmentation. Our
method begins with partitioning the input images into a set of superpixels, followed by seeking the optimal
superpixel-pair affinity matrix, both of which are performed at multiple scales of the input images. Since low-
level superpixel features are usually corrupted by image noise, we propose to infer the low-rank refined
affinity matrix. The inference is guided by two observations on natural images. First, looking into a single
image, local small-size image patterns tend to recur frequently within the same semantic region, but may not
appear in semantically different regions. These internal image statistics are referred to as the replication prior, and we quantitatively justify it on real image databases. Second, the affinity matrices at different scales should be
consistently solved, which leads to the cross-scale consistency constraint. We formulate these two purposes
with one unified formulation and develop an efficient optimization procedure. The proposed representation
can be used for both unsupervised and supervised image segmentation tasks. Our experiments on public data
sets demonstrate the presented method can substantially improve segmentation accuracy.
ETPL
DIP - 084
MsLRR: A Unified Multiscale Low-Rank Representation for Image
Segmentation
The success of many image restoration algorithms is often due to their ability to sparsely describe the
original signal. Shukla proposed a compression algorithm, based on a sparse quadtree decomposition model, which could optimally represent piecewise polynomial images. In this paper, we adapt this model to image restoration by changing the rate-distortion penalty to a description-length penalty. In addition, one
of the major drawbacks of this type of approximation is the computational complexity required to find a
suitable subspace for each node of the quadtree. We address this issue by searching for a suitable subspace
much more efficiently using the mathematics of updating matrix factorisations. Algorithms are developed to
tackle denoising and interpolation. Simulation results indicate that we beat state-of-the-art results when the
original signal is in the model (e.g., depth images) and are competitive for natural images when the
degradation is high.
ETPL
DIP - 085
Quadtree Structured Image Approximation for Denoising and Interpolation
Line scratch detection in old films is a particularly challenging problem due to the variable
spatiotemporal characteristics of this defect. Some of the main problems include sensitivity to noise and
texture, and false detections due to thin vertical structures belonging to the scene. We propose a
robust and automatic algorithm for frame-by-frame line scratch detection in old films, as well as a temporal
algorithm for the filtering of false detections. In the frame-by-frame algorithm, we relax some of the
hypotheses used in previous algorithms in order to detect a wider variety of scratches. This step's robustness and lack of external parameters are ensured by the combined use of an a contrario methodology and local
statistical estimation. In this manner, over-detection in textured or cluttered areas is greatly reduced. The
temporal filtering algorithm eliminates false detections due to thin vertical structures by exploiting the
coherence of their motion with that of the underlying scene. Experiments demonstrate the ability of the
resulting detection procedure to deal with difficult situations, in particular in the presence of noise, texture,
and slanted or partial scratches. Comparisons show significant advantages over previous work.
ETPL
DIP - 086
Robust Automatic Line Scratch Detection in Films
The behavior and performance of denoising algorithms are governed by one or several parameters,
whose optimal settings depend on the content of the processed image and the characteristics of the noise, and
are generally designed to minimize the mean squared error (MSE) between the denoised image returned by the algorithm and a virtual ground truth. In this paper, we introduce a new Poisson-Gaussian unbiased risk estimator (PG-URE) of the MSE, applicable to a mixed Poisson-Gaussian noise model
that unifies the widely used Gaussian and Poisson noise models in fluorescence bioimaging applications. We
propose a stochastic methodology to evaluate this estimator in the case when little is known about the internal
machinery of the considered denoising algorithm, and we analyze both theoretically and empirically the
characteristics of the PG-URE estimator. Finally, we evaluate the PG-URE-driven parametrization for three
standard denoising algorithms, with and without variance stabilizing transforms, and different characteristics
of the Poisson-Gaussian noise mixture.
ETPL
DIP - 087
An Unbiased Risk Estimator for Image Denoising in the Presence of Mixed
Poisson–Gaussian Noise
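The stochastic evaluation idea can be illustrated in the simpler pure-Gaussian setting: the divergence of a black-box denoiser is probed with a single random perturbation, giving a Monte-Carlo SURE estimate of the MSE. This is a minimal sketch assuming a known noise level `sigma` and a denoiser exposed as a plain function; the PG-URE of the paper extends the same idea to the mixed Poisson-Gaussian model.

```python
import numpy as np

def mc_sure(y, denoise, sigma, eps=1e-3, rng=None):
    """Monte-Carlo SURE for i.i.d. Gaussian noise of std sigma: the
    divergence of the black-box denoiser is probed with one random
    direction b instead of being derived analytically."""
    rng = np.random.default_rng(rng)
    n = y.size
    fy = denoise(y)
    b = rng.standard_normal(y.shape)
    # directional-derivative estimate of div f(y)
    div = b.ravel() @ (denoise(y + eps * b) - fy).ravel() / eps
    return np.sum((fy - y) ** 2) / n - sigma ** 2 + 2.0 * sigma ** 2 * div / n
```

Sweeping a denoiser parameter and keeping the value that minimizes `mc_sure` is the parametrization strategy the abstract describes, here in its Gaussian-only form.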
Block Truncation Coding (BTC) has been considered as a highly efficient compression technique for
decades. However, the annoying blocking effect and false contour under low bit rate configuration are its key
problems. In this work, an improved BTC, namely Dot-Diffused BTC (DDBTC), is proposed to solve these
problems. On one hand, the DDBTC can provide excellent processing efficiency by exploiting the innate
parallelism advantage of dot diffusion. On the other hand, the DDBTC can provide excellent image quality by
co-optimizing the class matrix and diffused matrix of the dot diffusion. The experimental results demonstrate
that the proposed DDBTC is fully superior to the previous Error-Diffused BTC (EDBTC) in terms of image
quality and processing efficiency, and has much better image quality than that of the Ordered-Dither BTC
(ODBTC).
ETPL
DIP - 088
Improved Block Truncation Coding Using Optimized Dot Diffusion
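The dot-diffusion co-optimization itself does not fit a short example, but the BTC baseline that DDBTC improves on is compact: each block is encoded as a one-bit-per-pixel bitmap plus two levels chosen to preserve the block mean and standard deviation. A sketch of that classic moment-preserving step:

```python
import numpy as np

def btc_block(block):
    """Classic moment-preserving BTC: encode a block as a bitmap plus
    two quantization levels that preserve the block mean and std."""
    m = block.size
    mean, std = block.mean(), block.std()
    bitmap = block >= mean
    q = int(bitmap.sum())
    if q in (0, m):                            # flat block: one level suffices
        return bitmap, mean, mean
    low = mean - std * np.sqrt(q / (m - q))    # level assigned to the 0-bits
    high = mean + std * np.sqrt((m - q) / q)   # level assigned to the 1-bits
    return bitmap, low, high

def btc_decode(bitmap, low, high):
    return np.where(bitmap, high, low)
```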
Mathematical morphology is a very popular framework for processing binary or grayscale images.
One of the key problems in applying this framework to color images is the notorious false color problem. We
discuss the nature of this problem and its origins. In doing so, it becomes apparent that the lack of invariance
of operators to certain transformations (forming a group) plays an important role. The main culprits are the
basic join and meet operations, and the associated lattice structure that forms the theoretical basis for
mathematical morphology. We show how a lattice that is not group invariant can be related to another lattice
that is. When all transformations in a group are linear, these lattices can be related to one another via the
theory of frames. This provides all the machinery to let us transform any (grayscale or color) morphological
filter into a group-invariant filter on grayscale or color images. We then demonstrate the potential for both
subjective and objective improvement in selected tasks.
ETPL
DIP - 089
Group-Invariant Colour Morphology Based on Frames
In this paper, we present an extension of the iterative closest point (ICP) algorithm that
simultaneously registers multiple 3D scans. While ICP fails to utilize the multiview constraints available, our
method exploits the information redundancy in a set of 3D scans by using the averaging of relative motions.
This averaging method utilizes the Lie group structure of motions, resulting in a 3D registration method that is
both efficient and accurate. In addition, we present two variants of our approach, i.e., a method that solves
for multiview 3D registration while obeying causality and a transitive correspondence variant that efficiently
solves the correspondence problem across multiple scans. We present experimental results to characterize our
method and explain its behavior as well as those of some other multiview registration methods in the literature.
We establish the superior accuracy of our method in comparison to these multiview methods with
registration results on a set of well-known real datasets of 3D scans.
ETPL
DIP - 090
On Averaging Multiview Relations for 3D Scan Registration
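The motion-averaging ingredient can be sketched for rotations alone: relative rotations are averaged in the Lie algebra of SO(3) with a first-order Karcher-mean iteration. This NumPy-only sketch writes out the Rodrigues exponential and logarithm by hand, and omits translations and the registration loop itself.

```python
import numpy as np

def exp_so3(w):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    k = w / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

def log_so3(Rm):
    """Inverse Rodrigues: rotation matrix -> rotation vector."""
    th = np.arccos(np.clip((np.trace(Rm) - 1) / 2, -1.0, 1.0))
    if th < 1e-12:
        return np.zeros(3)
    v = np.array([Rm[2, 1] - Rm[1, 2], Rm[0, 2] - Rm[2, 0], Rm[1, 0] - Rm[0, 1]])
    return th / (2 * np.sin(th)) * v

def rotation_mean(Rs, iters=20):
    """Karcher mean: average the residual rotation vectors around the
    current estimate, then step along the group exponential."""
    mu = Rs[0]
    for _ in range(iters):
        delta = np.mean([log_so3(mu.T @ Rm) for Rm in Rs], axis=0)
        if np.linalg.norm(delta) < 1e-12:
            break
        mu = mu @ exp_so3(delta)
    return mu
```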
The distributions of discrete cosine transform (DCT) coefficients of images are revisited on a per-
image basis. To better handle the heavy tail phenomenon commonly seen in the DCT coefficients, a
new model dubbed a transparent composite model (TCM) is proposed and justified for both
modeling accuracy and an additional data reduction capability. Given a sequence of the DCT coefficients, a
TCM first separates the tail from the main body of the sequence. Then, a uniform distribution is used
to model the DCT coefficients in the heavy tail, whereas a different parametric distribution is used to
model data in the main body. The separation boundary and other parameters of the TCM can be estimated via
maximum likelihood estimation. Efficient online algorithms are proposed for parameter estimation and their
convergence is also proved. Experimental results based on Kullback-Leibler divergence and χ2 test show that
for real-valued continuous ac coefficients, the TCM based on truncated Laplacian offers the best tradeoff
between modeling accuracy and complexity. For discrete or integer DCT coefficients, the discrete TCM based
on truncated geometric distributions (GMTCM) models the ac coefficients more accurately than pure
Laplacian models and generalized Gaussian models in the majority of cases, while having simplicity and practicality
similar to those of pure Laplacian models. In addition, it is demonstrated that the GMTCM also exhibits a
good capability of data reduction or feature extraction: the DCT coefficients in the heavy tail identified by the
GMTCM are truly outliers, and these outliers represent an outlier image revealing some unique global features
of the image. Overall, the modeling performance and the data reduction feature of the GMTCM make it a
desirable choice for modeling discrete or integer DCT coefficients in the real-world image or video
applications, as summarized in a few of our further studies on quantization design, entropy coding design, and
image understanding and management.
ETPL
DIP - 091
Transparent Composite Model for DCT Coefficients: Design and Analysis
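A much-simplified version of the tail separation can be written as a grid search over the boundary, with a folded-Laplacian body and a uniform tail scored by log-likelihood. The fixed grid, the Laplacian-only body, and the plain ML scale estimate are simplifications of this sketch; the paper's truncated models and online estimators are considerably more refined.

```python
import numpy as np

def fit_tcm(coeffs, grid=20):
    """Grid-search a separation boundary yc: model |x| <= yc with a
    folded Laplacian truncated to [0, yc] and the tail with a uniform
    density on (yc, xmax], keeping the maximum-likelihood boundary."""
    x = np.abs(np.asarray(coeffs, float))
    xmax = x.max()
    best_ll, best_yc = -np.inf, None
    for yc in np.linspace(xmax / grid, xmax * (grid - 1) / grid, grid - 1):
        body, tail = x[x <= yc], x[x > yc]
        if len(body) < 2 or body.mean() == 0:
            continue
        p = len(body) / len(x)        # probability mass assigned to the body
        b = body.mean()               # ML scale of the folded Laplacian
        ll = len(body) * (np.log(p) - np.log(b)
                          - np.log1p(-np.exp(-yc / b))) - body.sum() / b
        if len(tail):                 # uniform tail on (yc, xmax]
            ll += len(tail) * (np.log(1 - p) - np.log(xmax - yc))
        if ll > best_ll:
            best_ll, best_yc = ll, yc
    return best_yc
```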
Privacy is a critical issue when the data owners outsource data storage or processing to a third party
computing service, such as the cloud. In this paper, we identify a cloud computing application scenario that
requires simultaneously performing secure watermark detection and privacy-preserving multimedia
data storage. We then propose a compressive sensing (CS)-based framework using secure multiparty
computation (MPC) protocols to address such a requirement. In our framework, the multimedia data and
secret watermark pattern are presented to the cloud for secure watermark detection in a CS domain to protect
the privacy. During CS transformation, the privacy of the CS matrix and the watermark pattern is protected by
the MPC protocols under the semi-honest security model. We derive the expected watermark detection
performance in the CS domain, given the target image, watermark pattern, and the size of the CS matrix (but
without the CS matrix itself). The correctness of the derived performance has been validated by our
experiments. Our theoretical analysis and experimental results show that secure watermark detection in the CS
domain is feasible. Our framework can also be extended to other collaborative secure signal processing and
data-mining applications in the cloud.
ETPL
DIP - 092
A Compressive Sensing Based Secure Watermark Detection and Privacy
Preserving Storage Framework
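The reason detection can work in a CS domain at all is that random projections approximately preserve inner products, so a correlation detector can run on projected data without exposing the raw image or watermark. A minimal sketch with a plain Gaussian sensing matrix and no MPC layer (the paper's protocols additionally keep the matrix itself private):

```python
import numpy as np

def cs_correlation(x, w, m, rng=None):
    """Correlation detector in a compressive domain: a random Gaussian
    projection phi approximately preserves inner products, so
    (phi x) . (phi w) ~ x . w without using x or w directly."""
    rng = np.random.default_rng(rng)
    n = x.size
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    return float((phi @ x) @ (phi @ w))
```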
Noise is present in all images captured by real-world image sensors. Poisson distribution is said to
model the stochastic nature of the photon arrival process and agrees with the distribution of measured pixel
values. We propose a method for estimating
unknown noise parameters from Poisson-corrupted images using properties of variance stabilization. With a
significantly lower computational complexity and improved stability, the proposed estimation technique
yields noise parameters that are comparable in accuracy to state-of-the-art methods.
ETPL
DIP - 093
Noise Parameter Estimation for Poisson Corrupted Images Using Variance
Stabilization Transforms
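The variance-stabilization viewpoint rests on the `var = gain * mean` law of scaled-Poisson data. A toy gain estimator from local patch statistics, assuming mostly flat patches (the paper's method is more robust to image structure):

```python
import numpy as np

def estimate_poisson_gain(img, patch=8):
    """Estimate the gain a of scaled-Poisson noise (var = a * mean) from
    per-patch sample means and variances."""
    h, w = (s - s % patch for s in img.shape)
    blocks = img[:h, :w].reshape(h // patch, patch, w // patch, patch)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    means = blocks.mean(axis=1)
    variances = blocks.var(axis=1, ddof=1)
    # median is robust to the occasional patch containing structure
    return float(np.median(variances / np.maximum(means, 1e-9)))
```

Once the gain `a` is known, an Anscombe-type transform `2*sqrt(x/a + 3/8)` approximately stabilizes the noise variance to one.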
The left ventricular myocardium plays a key role in the entire circulation system and
an automatic delineation of the myocardium is a prerequisite for most of the subsequent functional analysis. In
this paper, we present a complete system for an automatic segmentation of
the left ventricular myocardium from cardiac computed tomography (CT) images using the shape information
from images to be segmented. The system follows a coarse-to-fine strategy by first localizing the left ventricle
and then deforming the myocardial surfaces of the left ventricle to refine the segmentation. In particular, the
blood pool of a CT image is extracted and represented as a triangulated surface. Then, the left ventricle is
localized as a salient component on this surface using geometric and anatomical characteristics. After that, the
myocardial surfaces are initialized from the localization result and evolved by applying forces from
the image intensities with a constraint based on the initial myocardial surface locations. The proposed
framework has been validated on 34 human and 12 pig CT images, and the robustness and accuracy are
demonstrated.
ETPL
DIP - 094
A Complete System for Automatic Extraction of Left Ventricular Myocardium
From CT Images Using Shape Segmentation and Contour Evolution
We propose a blind (no-reference or NR) video quality evaluation model that is not distortion
specific. The approach relies on a spatio-temporal model of video scenes in the discrete cosine transform
domain, and on a model that characterizes the type of motion occurring in the scenes, to predict video quality.
We use the models to define video statistics and perceptual features that are the basis of a video
quality assessment (VQA) algorithm that does not require the presence of a pristine video to compare against
in order to predict a perceptual quality score. The contributions of this paper are threefold. 1) We propose a
spatio-temporal natural scene statistics (NSS) model for videos. 2) We propose a motion model that quantifies
motion coherency in video scenes. 3) We show that the proposed NSS and motion coherency models are
appropriate for quality assessment of videos, and we utilize them to design a blind VQA algorithm that
correlates highly with human judgments of quality. The proposed algorithm, called video BLIINDS, is tested
on the LIVE VQA database and on the EPFL-PoliMi video database and shown to perform close to the level
of top performing reduced and full reference VQA algorithms.
ETPL
DIP - 095
Blind Prediction of Natural Video Quality
While image-difference metrics show good prediction performance on visual data, they often yield
artifact-contaminated results if used as objective functions for optimizing complex image-processing tasks.
We investigate in this regard the recently proposed color-image-difference (CID) metric particularly
developed for predicting gamut-mapping distortions. We present an algorithm for optimizing gamut mapping
employing the CID metric as the objective function. Resulting images contain various visual artifacts, which
are addressed by multiple modifications yielding the improved color-image-difference (iCID) metric. The
iCID-based optimizations are free from artifacts and retain contrast, structure, and color of the
original image to a great extent. Furthermore, the prediction performance on visual data is improved by the
modifications.
ETPL
DIP - 096
Color-Image Quality Assessment: From Prediction to Optimization
This paper presents an accelerated iterative Landweber method
for nonlinear ultrasonic tomographic imaging in a multiple-input multiple-output (MIMO) configuration under
a sparsity constraint on the image. The proposed method introduces the emerging MIMO signal processing
techniques and target sparseness constraints in the traditional computational imaging field, thus significantly
improving the speed of image reconstruction compared with the conventional imaging method while producing
high quality images. Using numerical examples, we demonstrate that incorporating prior knowledge about
the imaging field, such as target sparseness, significantly accelerates the convergence of the iterative
imaging method, which provides considerable benefits to real-time tomographic imaging applications.
ETPL
DIP - 097
Accelerated Nonlinear Multichannel Ultrasonic Tomographic Imaging Using Target
Sparseness
Blur kernel estimation is a crucial step in the deblurring process for images. Estimation of the kernel,
especially in the presence of noise, is easily perturbed, and the quality of the resulting deblurred images is
hence degraded. Since every motion blur in a single exposure image can be represented by 2D parametric
curves, we adopt a piecewise-linear model to approximate the curves for the reliable blur kernel estimation.
The model is found to be an effective tradeoff between flexibility and robustness as it takes advantage of two
extremes: (1) the generic model, represented by a discrete 2D function, which has a high degree of freedom
(DOF) for the maximum flexibility but suffers from noise and (2) the linear model, which enhances robustness
and simplicity but has limited expressiveness due to its low DOF. We evaluate several deblurring methods
based on not only the generic model, but also the piecewise-linear model as an alternative. After analyzing the
experiment results using real-world images with significant levels of noise and a benchmark data set, we
conclude that the proposed model is not only robust with respect to noise, but also flexible in dealing with
various types of blur.
ETPL
DIP - 098
Robust Estimation of Motion Blur Kernel Using a Piecewise-Linear Model
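Synthesizing a kernel from the piecewise-linear model is straightforward: a list of control points (hypothetical, in kernel coordinates) is rasterized segment by segment into a normalized kernel. This sketch covers only kernel synthesis, not the estimation procedure the paper develops:

```python
import numpy as np

def polyline_kernel(points, size=15, samples_per_seg=200):
    """Rasterize a piecewise-linear motion path into a normalized
    size x size blur kernel by dense sampling of each segment."""
    k = np.zeros((size, size))
    pts = np.asarray(points, float)
    for p, q in zip(pts[:-1], pts[1:]):          # one linear piece at a time
        for t in np.linspace(0.0, 1.0, samples_per_seg):
            r, c = (1.0 - t) * p + t * q         # point on the segment
            k[int(round(r)), int(round(c))] += 1.0
    return k / k.sum()                           # kernel sums to one
```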
A directed graph (or digraph) approach is proposed in this paper for identifying all the visual objects
commonly presented in the two images under comparison. As a model, the directed graph is superior to the
undirected graph, since there are two link weights with opposite orientations associated with each link of
the graph. However, it inevitably draws two main challenges: 1) how to compute the two link weights for each
link and 2) how to extract the subgraph from the digraph. For 1), a novel n-ranking process for computing the
generalized median and the Gaussian link-weight mapping function are developed that basically map the
established undirected graph to the digraph. To achieve this graph mapping, the proposed process and function
are applied to each vertex independently for computing its directed link weight by not only considering the
influences inserted from its immediately adjacent neighboring vertices (in terms of their link-weight values),
but also offering other desirable merits, i.e., link-weight enhancement and computational complexity reduction.
For 2), an evolutionary iterative process for solving the non-cooperative game theory is exploited to handle the
non-symmetric weighted adjacency matrix. The abovementioned two stages of processes will be conducted for
each assumed scale-change factor, experimented over a range of possible values, one factor at a time. If there
is a match on the scale-change factor under experiment, the common visual patterns with the same scale-
change factor will be extracted. If more than one pattern is extracted, the proposed topological splitting
method is able to further differentiate among them, provided that the visual objects are sufficiently far apart
from each other. Extensive simulation results have clearly demonstrated the superior performance
accomplished by the proposed digraph approach, compared with those of using the
undirected graph approach.
ETPL
DIP - 099
Common Visual Pattern Discovery via Directed Graph
Photo aesthetic quality evaluation is a fundamental yet underaddressed task in computer vision and
image processing fields. Conventional approaches are frustrated by the following two drawbacks. First, both
the local and global spatial arrangements of image regions play an important role in photo aesthetics. However,
existing rules, e.g., visual balance, heuristically define which spatial distribution among the salient regions of
a photo is aesthetically pleasing. Second, it is difficult to adjust visual cues from multiple channels
automatically in photo aesthetics assessment. To solve these problems, we propose a
new photo aesthetics evaluation framework, focusing on learning the image descriptors that
characterize local and global structural aesthetics from multiple visual channels. In particular, to describe the
spatial structure of the image local regions, we construct graphlets (small-sized connected graphs) by connecting
spatially adjacent atomic regions. Since spatially adjacent graphlets distribute closely in their feature space, we
project them onto a manifold and subsequently propose an embedding algorithm. The embedding algorithm
encodes the photo's global spatial layout into graphlets. Simultaneously, the importance of graphlets from
multiple visual channels is dynamically adjusted. Finally, these post-embedding graphlets are integrated
for photo aesthetics evaluation using a probabilistic model. Experimental results show that: 1) the visualized
graphlets explicitly capture the aesthetically arranged atomic regions; 2) the proposed approach generalizes
and improves four prominent aesthetic rules; and 3) our approach significantly outperforms state-of-the-art
algorithms in photo aesthetics prediction.
ETPL
DIP - 100
Fusion of Multichannel Local and Global Structural Cues for Photo Aesthetics
Evaluation
Supervised machine learning techniques have been applied to
multilabel image classification problems with tremendous success. Despite disparate learning mechanisms,
their performances heavily rely on the quality of training images. However, the acquisition of
training images requires significant efforts from human annotators. This hinders the applications of
supervised learning techniques to large scale problems. In this paper, we propose a high-
order label correlation driven active learning (HoAL) approach that allows the iterative learning algorithm
itself to select the informative example-label pairs from which it learns, so as to learn an accurate classifier
with less annotation effort. Four crucial issues are considered by the proposed HoAL: 1) unlike binary cases,
the selection granularity for multilabel active learning needs to be refined from example to example-label pair; 2)
different labels are seldom independent, and label correlations provide critical information for
efficient learning; 3) in addition to pair-wise label correlations, high-order label correlations are also
informative for multilabel active learning; and 4) since the number of label combinations increases
exponentially with respect to the number of labels, an efficient mining method is required to discover
informative label correlations. The proposed approach is tested on public data sets, and the empirical results
demonstrate its effectiveness.
ETPL
DIP - 101
Multilabel Image Classification Via High-Order Label Correlation Driven
Active Learning
We present a novel image superpixel segmentation approach using the proposed lazy random walk
(LRW) algorithm. Our method begins with initializing the seed positions and runs the LRW algorithm on the
input image to obtain the probabilities of each pixel. Then, the boundaries of the initial superpixels are
obtained according to the probabilities and the commute time. The initial superpixels are iteratively optimized
by the new energy function, which is defined on the commute time and the texture measurement. Our LRW
algorithm with self-loops has the merit of segmenting weak boundaries and complicated texture regions very
well through the new global probability maps and the commute time strategy. The superpixel performance is
further improved by relocating the center positions of superpixels and dividing the large superpixels into small
ones with the proposed optimization algorithm. The experimental results demonstrate that our method achieves
better performance than previous superpixel approaches.
ETPL
DIP - 102
Lazy Random Walks for Superpixel Segmentation
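The lazy-walk construction can be sketched on a small affinity graph: the transition matrix mixes a self-loop with a row-normalized affinity step, and seed-to-pixel affinities are read off the resolvent (I - alpha*P)^-1, a standard closed form for discounted random walks. The fixed 1/2 self-loop weight and the resolvent scoring are simplifying choices of this sketch; the paper's commute-time boundaries and energy optimization are omitted.

```python
import numpy as np

def lrw_scores(W, seeds, alpha=0.9):
    """Affinity of every node to each seed under a lazy random walk:
    with probability 1/2 the walker stays put (self-loop), otherwise it
    moves along the row-normalized affinity matrix W.  Scores come from
    the resolvent (I - alpha*P)^-1, which sums over discounted walks."""
    n = W.shape[0]
    P = 0.5 * np.eye(n) + 0.5 * W / W.sum(axis=1, keepdims=True)
    F = np.linalg.inv(np.eye(n) - alpha * P)
    S = F[:, seeds]
    return S / S.sum(axis=1, keepdims=True)    # per-node seed probabilities
```

Labeling each node by the seed with the largest score segments a two-cluster graph correctly even when the clusters are connected by a weak bridge.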
Conventionally, data embedding techniques aim at maintaining high-output image quality so that the
difference between the original and the embedded images is imperceptible to the naked eye. Recently, as a
new trend, some researchers exploited reversible data embedding techniques to deliberately degrade image
quality to a desirable level of distortion. In this paper, a unified data embedding-scrambling technique called
UES is proposed to achieve two objectives simultaneously, namely, high payload and adaptive scalable quality
degradation. First, a pixel intensity value prediction method called checkerboard-based prediction is proposed
to accurately predict 75% of the pixels in the image based on the information obtained from 25% of the image.
Then, the locations of the predicted pixels are vacated to embed information while degrading the image
quality. Given a desirable quality (quantified in SSIM) for the output image, UES guides the embedding-
scrambling algorithm to handle the exact number of pixels, i.e., the perceptual quality of the embedded-
scrambled image can be controlled. In addition, the prediction errors are stored at a predetermined precision
using the structure side information to perfectly reconstruct or approximate the original image. In particular,
given a desirable SSIM value, the precision of the stored prediction errors can be adjusted to control the
perceptual quality of the reconstructed image. Experimental results confirmed that UES is able to perfectly
reconstruct or approximate the original image at the desired SSIM value after completely degrading its perceptual
quality while embedding at 7.001 bpp on average.
ETPL
DIP - 103
A Unified Data Embedding and Scrambling Method
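The flavor of the prediction step can be shown with a simplified stand-in for the paper's checkerboard-based predictor: keep the 25% of pixels on even rows and columns as references and predict the remaining 75% by bilinear interpolation. The residuals of such a predictor are what an embedding-scrambling scheme stores and vacates for payload.

```python
import numpy as np

def predict_from_quarter(img):
    """Predict all pixels from the 25% reference grid (even rows and
    columns) by bilinear interpolation; a simplified stand-in for the
    checkerboard-based predictor, not the paper's exact method."""
    h, w = img.shape
    ref = img[::2, ::2].astype(float)          # the 25% reference pixels
    nr, nc = ref.shape
    pred = np.empty((h, w))
    for r in range(h):
        for c in range(w):
            r0, c0 = min(r // 2, nr - 1), min(c // 2, nc - 1)
            r1, c1 = min(r0 + 1, nr - 1), min(c0 + 1, nc - 1)
            fr, fc = (r % 2) / 2.0, (c % 2) / 2.0
            top = (1 - fc) * ref[r0, c0] + fc * ref[r0, c1]
            bot = (1 - fc) * ref[r1, c0] + fc * ref[r1, c1]
            pred[r, c] = (1 - fr) * top + fr * bot
    return pred
```

On a linear ramp the bilinear stand-in is exact, so all prediction errors are zero; real images leave small residuals.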
We describe a new 3D saliency prediction model that accounts for diverse low-level luminance,
chrominance, motion, and depth attributes of 3D videos as well as high-level classifications of scenes by type.
The model also accounts for perceptual factors, such as the nonuniform resolution of the human
eye, stereoscopic limits imposed by Panum's fusional area, and the predicted degree of (dis)comfort felt when
viewing the 3D video. The high-level analysis involves classification of each 3D video scene by type with
regard to estimated camera motion and the motions of objects in the videos. Decisions regarding the
relative saliency of objects or regions are supported by data obtained through a series of eye-tracking
experiments. The algorithm developed from the model elements operates by finding and segmenting salient
3D space-time regions in a video, then calculating the saliency strength of each segment using measured
attributes of motion, disparity, texture, and the predicted degree of visual discomfort experienced.
The saliency energy of both segmented objects and frames is weighted using models of human foveation and
Panum's fusional area, yielding a single predictor of 3D saliency.
ETPL
DIP - 104
Saliency Prediction on Stereoscopic Videos
Recovering images from corrupted observations is necessary for many real-world applications. In this
paper, we propose a unified framework to perform progressive image recovery based
on hybrid graph Laplacian regularized regression. We first construct a multiscale representation of the
target image by a Laplacian pyramid, then progressively recover the degraded image in the scale space from
coarse to fine so that the sharp edges and texture can be eventually recovered. On one hand, within each scale,
a graph Laplacian regularization model represented by implicit kernel is learned, which simultaneously
minimizes the least square error on the measured samples and preserves the geometrical structure of
the image data space. In this procedure, the intrinsic manifold structure is explicitly considered using both
measured and unmeasured samples, and the nonlocal self-similarity property is utilized as a fruitful resource
for abstracting a priori knowledge of the images. On the other hand, between two successive scales, the
proposed model is extended to a projected high-dimensional feature space through explicit kernel mapping to
describe the interscale correlation, in which the local structure regularity is learned and propagated from
coarser to finer scales. In this way, the proposed algorithm gradually recovers more and more image details
and edges that could not be recovered at previous scales. We test our algorithm on one typical image
recovery task: impulse noise removal. Experimental results on benchmark test images demonstrate that the
proposed method achieves better performance than state-of-the-art algorithms.
ETPL
DIP - 105
Progressive Image Denoising Through Hybrid Graph Laplacian Regularization:
A Unified Framework
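The within-scale regression step can be sketched in its simplest explicit form: solve (M + lam*L) f = M y, where M masks the measured samples and L is the graph Laplacian of an affinity matrix W, so the recovered signal fits the measurements while staying smooth on the graph. The explicit small graph and combinatorial Laplacian are simplifications; the paper works with implicit kernels and nonlocal affinities.

```python
import numpy as np

def laplacian_regression(W, measured, values, lam=0.1):
    """Graph-Laplacian regularized regression: minimize
    ||M(f - y)||^2 + lam * f^T L f and return the full signal f."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    M = np.zeros((n, n))
    M[measured, measured] = 1.0             # mask of the measured samples
    y = np.zeros(n)
    y[measured] = values
    return np.linalg.solve(M + lam * L, M @ y)
```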
Labeled training data are used for challenging medical image segmentation problems to learn
different characteristics of the relevant domain. In this paper, we examine random forest (RF) classifiers, their
learned knowledge during training and ways to exploit it for improved image segmentation. Apart from
learning discriminative features, RFs also quantify their importance in classification. Feature importance is
used to design a feature selection strategy critical for high segmentation and classification accuracy, and also
to design a smoothness cost in a second-order MRF framework for graph cut segmentation. The cost function
combines the contribution of different image features like intensity, texture, and curvature information.
Experimental results on medical images show that this strategy leads to better segmentation accuracy than
conventional graph cut algorithms that use only intensity information in the smoothness cost.
ETPL
DIP - 106
Analyzing Training Information From Random Forests for Improved Image
Segmentation
In this paper, we propose saliency-driven multiscale nonlinear diffusion filtering of images.
The resulting scale space in general preserves or even enhances semantically important structures
such as edges, lines, or flow-like structures in the foreground, and inhibits and smoothes clutter in the
background. The image is classified using multiscale information fusion based on the original image,
the image at the final scale at which the diffusion process converges, and the image at a midscale.
Our algorithm emphasizes the foreground features, which are important for image classification. The
background image regions, whether considered as contexts of the foreground or noise to the
foreground, can be globally handled by fusing information from different scales. Experimental tests
of the effectiveness of the multiscale space for the image classification are conducted on the
following publicly available datasets: 1) the PASCAL 2005 dataset; 2) the Oxford 102 flowers
dataset; and 3) the Oxford 17 flowers dataset, with high classification rates.
ETPL
DIP - 107
Image Classification Using Multiscale Information Fusion Based on Saliency
Driven Nonlinear Diffusion Filtering
The objective approaches of 3D image quality assessment play a key role for the development
of compression standards and various 3D multimedia applications. The quality assessment of 3D images faces
more new challenges, such as asymmetric stereo compression, depth perception, and virtual view synthesis,
than its 2D counterparts. In addition, the widely used 2D image quality metrics (e.g., PSNR and SSIM) cannot
be directly applied to deal with these newly introduced challenges. This statement can be verified by the low
correlation between the computed objective measures and the subjectively measured mean opinion scores
(MOSs), when 3D images are the tested targets. In order to meet these newly introduced challenges, in this
paper, besides traditional 2D image metrics, the binocular integration behaviors (the binocular combination and
the binocular frequency integration) are utilized as the bases for measuring the quality of
stereoscopic 3D images. The effectiveness of the proposed metrics is verified by conducting subjective
evaluations on publicly available stereoscopic image databases. Experimental results show that significant
consistency could be reached between the measured MOS and the proposed metrics, in which the correlation
coefficient between them can go up to 0.88. Furthermore, we found that the proposed metrics can also address
the quality assessment of the synthesized color-plus-depth 3D images well. Therefore, it is our belief that the
binocular integration behaviors are important factors in the development of
objective quality assessment for 3D images.
ETPL
DIP - 108
Quality Assessment of Stereoscopic 3D Image Compression by Binocular
Integration Behaviors
Path openings and closings are morphological tools used to preserve long, thin, and tortuous
structures in gray level images. They explore all paths from a defined class, and filter them with a length
criterion. However, most paths are redundant, making the process generally
slow. Parsimonious path openings and closings are introduced in this paper to solve this problem. These
operators only consider a subset of the paths considered by classical path openings, thus achieving a
substantial speed-up, while obtaining similar results. In addition, a recently introduced 1D opening algorithm
is applied along each selected path. Its complexity is linear with respect to the number of pixels, independent
of the size of the opening. Furthermore, it is fast for any input data accuracy (integer or floating point) and
works in stream. Parsimonious path openings are also extended to incomplete paths, i.e., paths containing
gaps. Noise-corrupted paths can thus be processed with the same approach and complexity. These
parsimonious operators achieve a several orders of magnitude speed-up. Examples are shown for
incomplete path openings, where computing times are brought from minutes to tens of milliseconds, while
obtaining similar results.
ETPL
DIP - 109
Parsimonious Path Openings and Closings
Matching visual appearances of the target over consecutive video frames is a fundamental yet
challenging task in visual tracking. Its performance largely depends on the distance metric that determines the
quality of visual matching. Rather than using fixed and predefined metric, recent attempts of
integrating metric learning-based trackers have shown more robust and promising results, as the
learned metric can be more discriminative. In general, these global metric adjustment methods are
computationally demanding in real-time visual tracking tasks, and they tend to underfit the data when the
target exhibits dynamic appearance variation. This paper presents a nonparametric data-driven local metric
adjustment method. The proposed method finds a spatially adaptive metric that exhibits different properties at
different locations in the feature space, due to the differences of the data distribution in a local neighborhood.
It minimizes the deviation of the empirical misclassification probability to obtain an optimal metric, such that the asymptotic error attained with an infinite set of training samples can be approximated. Moreover, by taking
the data local distribution into consideration, it is spatially adaptive. Integrating this new local metric learning
method into target tracking leads to efficient and robust tracking performance. Extensive experiments have
demonstrated the superiority and effectiveness of the proposed tracking method in various tracking scenarios.
ETPL
DIP - 110
Data-Driven Spatially-Adaptive Metric Adjustment for Visual Tracking
This paper considers the recognition of realistic human actions in videos based on spatio-
temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity
representations of the image data. Because of this, these approaches are sensitive to disturbing photometric
phenomena, such as shadows and highlights. In addition, valuable information is neglected by discarding
chromaticity from the photometric representation. These issues are addressed by color STIPs. Color STIPs are
multichannel reformulations of STIP detectors and descriptors, for which we consider a number of chromatic
and invariant representations derived from the opponent color space. Color STIPs are shown to outperform
their intensity-based counterparts on the challenging UCF sports, UCF11 and UCF50 action recognition
benchmarks by more than 5% on average, where most of the gain is due to the multichannel descriptors. In
addition, the results show that color STIPs are currently the single best low-level feature choice for STIP-
based approaches to human action recognition.
ETPL
DIP - 111
Evaluation of Color Spatio-Temporal Interest Points for Human Action
Recognition
An analysis of the relationship between multipath ghosts and the
direct target image for radar imaging is presented. A multipath point spread function (PSF) is defined that
allows for specular reflections in the local environment and can allow the ghost images to be
localized. Analysis of the multipath PSF shows that certain ghosts can only be focused for the far field
synthetic aperture radar case and not the full array case. Importantly, the ghosts are shown to be equivalent to
direct target images taken from different observation angles. This equivalence suggests that exploiting
the ghosts would improve target classification performance, and this improvement is demonstrated using experimental data and a naïve Bayesian classifier. The maximum performance gain achieved is 32%.
ETPL
DIP - 112
Analysis and Exploitation of Multipath Ghosts in Radar Target Image
Classification
This paper presents a novel manifold learning algorithm for high-dimensional data sets. The scope of the
application focuses on the problem of motion tracking in video sequences. The framework presented is
twofold. First, it is assumed that the samples are time ordered, providing valuable information that is not
presented in the current methodologies. Second, the manifold topology comprises multiple charts, which
contrasts to the most current methods that assume one single chart, being overly restrictive. The proposed
algorithm, Gaussian process multiple local models (GP-MLM), can deal with arbitrary manifold topology by
decomposing the manifold into multiple local models that are probabilistically combined using Gaussian process
regression. In addition, the paper presents a multiple filter architecture where standard filtering techniques are
integrated within the GP-MLM. The proposed approach exhibits comparable performance of state-of-the-art
trackers, namely multiple model data association and deep belief networks, and compares favorably with
Gaussian process latent variable models. Extensive experiments are presented using real video data, including
a publicly available database of lip sequences and left ventricle ultrasound images, in which the GP-MLM
achieves state of the art results.
ETPL
DIP - 113
Manifold Learning for Object Tracking With Multiple Nonlinear Models
With the explosive growth of the multimedia data on the Web, content-based image search has
attracted considerable attention in the multimedia and computer vision communities. The most popular
approach is based on the bag-of-visual-words model with invariant local features. Since the spatial context
information among local features is critical for visual content identification, many methods exploit the
geometric clues of local features, including the location, the scale, and the orientation, for explicit geometric post-verification. However, usually only a few initially top-ranked results are geometrically verified,
considering the high computational cost in full geometric verification. In this paper, we propose to represent
the spatial context of local features into binary codes, and implicitly achieve geometric verification by efficient
comparison of the binary codes. Besides, we explore the multimode property of local features to further boost
the retrieval performance. Experiments on holidays, Paris, and Oxford building benchmark data sets
demonstrate the effectiveness of the proposed algorithm.
ETPL
DIP - 114
Contextual Hashing for Large-Scale Image Search
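The speed-up described above comes from reducing geometric verification to bitwise operations. The sketch below is illustrative only: the function names and the sector-based code construction are assumptions, not the paper's actual definition. It shows how a spatial-context binary code can be compared by XOR and popcount:

```python
import math

def context_code(neighbors, n_bits=16):
    """Toy spatial-context hash: set bit i when the i-th angular sector
    around a local feature contains at least one neighboring feature.
    An illustrative stand-in for the paper's binary-code construction."""
    code = 0
    for dx, dy in neighbors:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        sector = min(int(angle / (2 * math.pi) * n_bits), n_bits - 1)
        code |= 1 << sector
    return code

def hamming(a, b):
    # implicit geometric verification reduces to XOR + popcount
    return bin(a ^ b).count("1")
```

Comparing two codes costs one XOR and one popcount per machine word, which is what makes implicit verification feasible at large scale.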
Video retargeting is a useful technique to adapt a video to a desired display resolution. It aims to
preserve the information contained in the original video and the shapes of salient objects while maintaining the
temporal coherence of contents in the video. Existing video retargeting schemes achieve temporal coherence
via constraining each region/pixel to be deformed consistently with its corresponding region/pixel in
neighboring frames. However, these methods often distort the shapes of salient objects, since they do not
ensure the content consistency for regions/pixels constrained to be coherently deformed along time axis. In
this paper, we propose a video retargeting scheme to simultaneously meet the two requirements. Our method
first segments a video clip into spatiotemporal grids called grid flows, where the consistency of the content
associated with a grid flow is maintained while retargeting the grid flow. After that, due to the coarse granularity of the grid, content inconsistency may still exist in some grid flows. We exploit the temporal
redundancy in a grid flow to prevent grids with inconsistent content from being incorrectly constrained to deform coherently. In particular, we use grid flows to select a set of key-frames that summarize
a video clip, and resize subgrid-flows in these key-frames. We then resize the remaining non-key-frames by
simply interpolating their grid contents from the two nearest retargeted key-frames. With the key-frame-based
scheme, we only need to solve a small-scale quadratic programming problem to resize subgrid-flows and
perform grid interpolation, leading to low computation and memory costs. The experimental results
demonstrate the superior performance of our scheme.
ETPL
DIP - 115
Spatiotemporal Grid Flow for Video Retargeting
This paper proposes three clustering-based discriminant analysis (CDA) models to address the
problem that the Fisher linear discriminant may not be able to extract adequate features for satisfactory
performance, especially for two class problems. The first CDA model, CDA-1, divides each class into a
number of clusters by means of the k-means clustering technique. In this way, a new within-cluster scatter
matrix Swc and a new between-cluster scatter matrix Sbc are defined. The second and the third CDA models,
CDA-2 and CDA-3, define a nonparametric form of the between-cluster scatter matrices N-Sbc. The
nonparametric nature of the between-cluster scatter matrices inherently leads to the derived features that
preserve the structure important for classification. The difference between CDA-2 and CDA-3 is that the
former computes the between-cluster matrix N-Sbc on a local basis, whereas the latter computes the between-
cluster matrix N-Sbc on a global basis. This paper then presents an accurate CDA-based eye detection method.
Experiments on three widely used face databases show the feasibility of the proposed three CDA models and
the improved eye detection performance over some state-of-the-art methods.
ETPL
DIP - 116
Clustering-Based Discriminant Analysis for Eye Detection
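The cluster-wise scatter matrices at the heart of CDA-1 can be sketched as follows. This is a minimal illustration assuming cluster assignments are already available (e.g., from k-means run within each class); the function name is hypothetical:

```python
import numpy as np

def cluster_scatter_matrices(X, cluster_ids):
    """Within-cluster (Swc) and between-cluster (Sbc) scatter matrices,
    as in CDA-1, given cluster assignments for the rows of X."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(cluster_ids):
        Xc = X[cluster_ids == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)      # scatter inside cluster c
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)    # cluster mean vs. global mean
    return Sw, Sb
```

The two matrices satisfy Sw + Sb = St, the total scatter, mirroring the classical Fisher decomposition at the cluster level.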
While numerous algorithms have been proposed for object tracking with demonstrated success, it
remains a challenging problem for a tracker to handle large appearance change due to factors such as scale,
motion, shape deformation, and occlusion. One of the main reasons is the lack of effective image
representation schemes to account for appearance variation. Most of the trackers use high-level appearance
structure or low-level cues for representing and matching target objects. In this paper, we propose
a tracking method from the perspective of midlevel vision with structural information captured in superpixels.
We present a discriminative appearance model based on superpixels, thereby facilitating a tracker to
distinguish the target and the background with midlevel cues. The tracking task is then formulated by
computing a target-background confidence map, and obtaining the best candidate by maximum a posteriori
estimate. Experimental results demonstrate that our tracker is able to handle heavy occlusion and recover from
drifts. In conjunction with online update, the proposed algorithm is shown to perform favorably against
existing methods for object tracking. Furthermore, the proposed algorithm facilitates foreground and
background segmentation during tracking.
ETPL
DIP - 117
Robust Superpixel Tracking
An effective, low-complexity method for lossy compression of scenic bilevel images, called lossy cutset coding, is proposed based on a Markov random field model. It operates by losslessly
encoding pixels in a square grid of lines, which is a cutset with respect to a Markov random field model, and
preserves key structural information, such as borders between black and white regions. Relying on
the Markov random field model, the decoder takes a MAP approach to reconstructing the interior of each grid
block from the pixels on its boundary, thereby creating a piecewise smooth image that is consistent with the
encoded grid pixels. The MAP rule, which reduces to finding the block interiors with fewest black-white
transitions, is directly implementable for the most commonly occurring block boundaries, thereby avoiding the
need for brute force or iterative solutions. Experimental results demonstrate that the new method is
computationally simple, outperforms the current lossy compression technique most suited to scenic
bilevel images, and provides substantially lower rates than lossless techniques, e.g., JBIG, with little loss in
perceived image quality.
ETPL
DIP - 118
Lossy Cutset Coding of Bilevel Images Based on Markov Random Fields
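The MAP rule of minimizing black-white transitions can be illustrated in one dimension. This is a toy sketch under the stated assumption that only the two boundary pixels of a run are known; `fill_row` is a hypothetical helper, not the paper's 2D decoder:

```python
def fill_row(left, right, n):
    """1-D analogue of the cutset MAP rule: fill n interior pixels between
    boundary pixels `left` and `right` with the fewest black-white
    transitions. Equal endpoints -> constant fill (0 transitions);
    different endpoints -> a single transition, placed at the midpoint
    here (any position is equally good under the transition count alone)."""
    if left == right:
        return [left] * n
    half = n // 2
    return [left] * half + [right] * (n - half)
```

In 2D the same counting argument applies to block interiors bounded by the encoded grid pixels, which is what makes the decoder directly implementable for common boundary configurations.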
Text in an image provides vital information for interpreting its contents, and text in a scene can aid a
variety of tasks from navigation to obstacle avoidance and odometry. Despite its value, however, detecting
general text in images remains a challenging research problem. Motivated by the need to consider the widely
varying forms of natural text, we propose a bottom-up approach to the problem, which reflects the
characterness of an image region. In this sense, our approach mirrors the move from saliency detection
methods to measures of objectness. In order to measure the characterness, we develop three novel cues that are
tailored for character detection and a Bayesian method for their integration. Because text is made up of sets of
characters, we then design a Markov random field model so as to exploit the inherent dependencies between
characters. We experimentally demonstrate the effectiveness of our characterness cues as well as the
advantage of Bayesian multicue integration. The proposed text detector outperforms state-of-the-art methods
on a few benchmark scene text detection data sets. We also show that our measurement of characterness is
superior to state-of-the-art saliency detection models when applied to the same task.
ETPL
DIP - 119
Characterness: An Indicator of Text in the Wild
The development of energy selective, photon counting X-ray detectors allows for a wide range of new
possibilities in the area of computed tomographic image formation. Under the assumption of perfect energy
resolution, here we propose a tensor-based iterative algorithm that simultaneously reconstructs the X-ray
attenuation distribution for each energy. We use a multilinear image model rather than a more standard
stacked vector representation in order to develop novel tensor-based regularizers. In particular, we model the
multispectral unknown as a three-way tensor where the first two dimensions are space and the third dimension
is energy. This approach allows for the design of tensor nuclear norm regularizers, which, like their 2D counterpart, are convex functions of the multispectral unknown. The solution to the resulting convex
optimization problem is obtained using an alternating direction method of multipliers approach. Simulation
results show that the generalized tensor nuclear norm can be used as a standalone regularization technique for
the energy selective (spectral) computed tomography problem, and that, when combined with total variation regularization, it enhances the regularization capabilities, especially in low-energy images where the effects of noise are most prominent.
ETPL
DIP - 120
Tensor-Based Formulation and Nuclear Norm Regularization for Multienergy
Computed Tomography
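One common way to generalize the nuclear norm to a three-way tensor, and a plausible reading of the regularizer above, is a weighted sum of the nuclear norms of the mode unfoldings; the exact definition used in the paper may differ. A sketch:

```python
import numpy as np

def tensor_nuclear_norm(T, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the matrix nuclear norms of the three mode
    unfoldings of a 3-way tensor (space x space x energy). Like the 2D
    nuclear norm, this is a convex function of the tensor entries."""
    norms = []
    for mode in range(3):
        # unfold: bring `mode` to the front, flatten the rest into columns
        unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        norms.append(np.linalg.norm(unfolded, ord="nuc"))
    return sum(w * n for w, n in zip(weights, norms))
```

Convexity of each unfolding's nuclear norm carries over to the weighted sum, which is what allows solvers such as the alternating direction method of multipliers to be applied.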
Visual tracking is an important but challenging problem in the computer vision field. In the
real world, the appearances of the target and its surroundings change continuously over space and
time, which provides effective information for tracking the target robustly. However, previous works have paid insufficient attention to this spatio-temporal appearance information. In this paper, a
robust spatio-temporal context model based tracker is presented to complete the tracking task in
unconstrained environments. The tracker is constructed with temporal and spatial appearance context
models. The temporal appearance context model captures the historical appearance of the target to
prevent the tracker from drifting to the background in a long-term tracking. The spatial appearance
context model integrates contributors to build a supporting field. The contributors are the patches
with the same size of the target at the key-points automatically discovered around the target. The
constructed supporting field provides much more information than the appearance of the target itself,
and thus, ensures the robustness of the tracker in complex environments. Extensive experiments on
various challenging databases validate the superiority of our tracker over other state-of-the-art
trackers.
ETPL
DIP - 121
Robust Online Learned Spatio-Temporal Context Model for Visual Tracking
In this paper, we present a compressed-domain video retargeting solution that operates without
compromising the resizing quality. Existing video retargeting methods operate in the spatial (or pixel) domain.
Such a solution is not practical if it is implemented in mobile devices due to its large memory requirement. In
the proposed solution, each component of the retargeting system is designed to exploit the low-level
compressed domain features extracted from the coded bit stream. For example, motion information is obtained
directly from motion vectors. An efficient column shape mesh deformation is employed to solve the difficulty
of sophisticated quad-shape mesh deformation in the compressed domain. The proposed solution achieves
comparable (or slightly better) visual quality performance as compared with several state-of-the-art pixel-
domain retargeting methods at lower computational and memory costs, making content-aware video resizing
both scalable and practical in real-world applications.
ETPL
DIP - 122
Compressed-Domain Video Retargeting
Modeling the temporal structure of sub-activities is an important yet challenging problem in complex
activity classification. This paper proposes a latent hierarchical model (LHM) to describe the decomposition of
complex activity into sub-activities in a hierarchical way. The LHM has a tree-structure, where each node
corresponds to a video segment (sub-activity) at certain temporal scale. The starting and ending time points of
each sub-activity are represented by two latent variables, which are automatically determined during the
inference process. We formulate the training problem of the LHM in a latent kernelized SVM framework and
develop an efficient cascade inference method to speed up classification. The advantages of our methods come
from: 1) LHM models the complex activity with a deep structure, which is decomposed into sub-activities in a
coarse-to-fine manner and 2) the starting and ending time points of each segment are adaptively determined to
deal with the temporal displacement and duration variation of sub-activity. We conduct experiments on three
datasets: 1) the KTH; 2) the Hollywood2; and 3) the Olympic Sports. The experimental results show the
effectiveness of the LHM in complex activity classification. With dense features, our LHM achieves the state-
of-the-art performance on the Hollywood2 dataset and the Olympic Sports dataset.
ETPL
DIP - 123
Latent Hierarchical Model of Temporal Structure for Complex Activity
Classification
mCENTRIST, a new multichannel feature generation mechanism for recognizing scene categories, is
proposed in this paper. mCENTRIST explicitly captures the image properties that are encoded jointly by two
image channels, which is different from popular multichannel descriptors. In order to avoid the curse of
dimensionality, tradeoffs at both the feature and channel levels have been made to keep mCENTRIST
computationally practical. As a result, mCENTRIST is both efficient and easy to implement. In addition, a
hyper opponent color space is proposed by embedding Sobel information into the opponent color space for
further performance improvements. Experiments show that mCENTRIST outperforms established
multichannel descriptors on four RGB and RGB-near infrared data sets, including aerial orthoimagery, indoor,
and outdoor scene category recognition tasks. Experiments also verify that the hyper opponent color space
enhances descriptors' performance effectively.
ETPL
DIP - 124
mCENTRIST: A Multi-Channel Feature Generation Mechanism for Scene
Categorization
Dictionary learning has been widely used in many image processing tasks. In most of these methods,
the number of basis vectors is either set by experience or coarsely evaluated empirically. In this paper, we
propose a new scale adaptive dictionary learning framework, which jointly estimates suitable scales and
corresponding atoms in an adaptive fashion according to the training data, without the need of prior
information. We design an atom counting function and develop a reliable numerical scheme to solve the
challenging optimization problem. Extensive experiments on texture and video data sets demonstrate
quantitatively and visually that our method can estimate the scale, without damaging the sparse reconstruction
ability.
ETPL
DIP - 125
Scale Adaptive Dictionary Learning
The Richardson-Lucy algorithm is one of the most important in image deconvolution. However, a
drawback is its slow convergence. A significant acceleration was obtained using the technique proposed by
Biggs and Andrews (BA), which is implemented in the deconvlucy function of the image processing
MATLAB toolbox. The BA method was developed heuristically with no proof of convergence. In this paper,
we introduce the heavy-ball (H-B) method for Poisson data optimization and extend it to a scaled H-B method,
which includes the BA method as a special case. The method has a proven convergence rate of O(1/k^2),
where k is the number of iterations. We demonstrate the superior convergence performance, by a speedup
factor of five, of the scaled H-B method on both synthetic and real 3D images.
ETPL
DIP - 126
Scaled Heavy-Ball Acceleration of the Richardson-Lucy Algorithm for 3D
Microscopy Image Restoration
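A heavy-ball (momentum) step grafted onto the multiplicative Richardson-Lucy update can be sketched as below. This is a simplified illustration, not the paper's scaled H-B scheme: the operator arguments `psf_conv`/`psf_conv_adj` and the fixed momentum weight `beta` are assumptions for the sketch.

```python
import numpy as np

def richardson_lucy_hb(y, psf_conv, psf_conv_adj, n_iter=60, beta=0.3):
    """Richardson-Lucy deconvolution with a heavy-ball momentum term.

    y            : observed (blurred, Poisson-noisy) image
    psf_conv     : function computing the forward blur H x
    psf_conv_adj : function computing the adjoint blur H^T x
    beta         : momentum weight (beta = 0 recovers plain RL)
    """
    x = np.full_like(y, y.mean())          # flat nonnegative start
    x_prev = x.copy()
    for _ in range(n_iter):
        ratio = y / np.maximum(psf_conv(x), 1e-12)
        rl_step = x * psf_conv_adj(ratio)          # multiplicative RL update
        x_new = rl_step + beta * (x - x_prev)      # heavy-ball extrapolation
        x_new = np.maximum(x_new, 1e-12)           # keep iterates positive
        x_prev, x = x, x_new
    return x
```

Setting `beta = 0` gives the classical algorithm; the extrapolation term reuses the previous iterate to accelerate convergence, which is the idea the BA technique implements heuristically.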
Active contours are a popular approach for object segmentation that uses an energy minimizing spline
to extract an object's boundary. Nonparametric approaches can be computationally complex, whereas
parametric approaches can be impacted by parameter sensitivity. A decoupled active contour (DAC)
overcomes these problems by decoupling the external and internal energies and optimizing them separately.
However, a drawback of this approach is its reliance on the edge gradient as the external energy. This can lead
to poor convergence toward the object boundary in the presence of weak object and strong background edges.
To overcome these issues with convergence, a novel approach is proposed that takes advantage of a sparse
texture model, which explicitly considers texture for boundary detection. The approach then defines the
external energy as a weighted combination of textural and structural variation maps and feeds it into a
multifunctional hidden Markov model for more robust object boundary detection. The enhanced DAC
(EDAC) is qualitatively and visually analyzed on two natural image data sets as well as Brodatz images. The
results demonstrate that EDAC effectively combines texture and structural information to extract the object
boundary without impact on computation time and a reliance on color.
ETPL
DIP - 127
Enhanced Decoupled Active Contour Using Structural and Textural Variation
Energy Functionals
We provide conditions under which 2D digital images preserve their topological properties under rigid
transformations. We consider the two most common digital topology models, namely dual adjacency and well-
composedness. This paper leads to the proposal of optimal preprocessing strategies that ensure the topological
invariance of images under arbitrary rigid transformations. These results and methods are proved to be valid
for various kinds of images (binary, gray-level, label), thus providing generic and efficient tools, which can be
used in particular in the context of image registration and warping.
ETPL
DIP - 128
Topology-Preserving Rigid Transformation of 2D Digital Images
We propose a texture learning approach that exploits local organizations of scales and directions. First,
linear combinations of Riesz wavelets are learned using kernel support vector machines. The resulting texture
signatures model optimal class-wise discriminatory properties. The visualization of the obtained
signatures allows verifying the visual relevance of the learned concepts. Second, the local orientations of the
signatures are optimized to maximize their responses, which is carried out analytically and can still be
expressed as a linear combination of the initial steerable Riesz templates. The global process is iteratively
repeated to obtain final rotation-covariant texture signatures. Rapid convergence of class-wise signatures is
observed, which demonstrates that the instances are projected into a feature space that leverages the local
organizations of scales and directions. Experimental evaluation reveals average classification accuracies in the
range of 97% to 98% for the Outex_TC_00010, the Outex_TC_00012, and the Contrib_TC_00000 suites for
even orders of the Riesz transform, and suggests high robustness to changes in image orientation and
illumination. The proposed framework requires no arbitrary choices of scales and directions and is expected to
perform well in a large range of computer vision applications.
ETPL
DIP - 129
Rotation–Covariant Texture Learning Using Steerable Riesz Wavelets
X-ray computed tomography (CT) is a powerful tool for noninvasive imaging of time-varying objects.
In the past, methods have been proposed to reconstruct images from continuously changing objects. For
discretely or structurally changing objects, however, such methods fail to reconstruct high quality images,
mainly because assumptions about continuity are no longer valid. In this paper, we propose a method to
reconstruct structurally changing objects. Starting from the observation that there exist regions within the
scanned object that remain unchanged over time, we introduce an iterative optimization routine that can
automatically determine these regions and incorporate this knowledge in an algebraic reconstruction method.
The proposed algorithm was validated on simulation data and experimental μCT data, illustrating its capability to reconstruct structurally changing objects more accurately in comparison to current techniques.
ETPL
DIP - 130
Region-Based Iterative Reconstruction of Structurally Changing Objects in CT
As a general framework, Laplacian embedding, based on a pairwise similarity matrix, infers low
dimensional representations from high dimensional data. However, it generally suffers from three issues: 1)
algorithmic performance is sensitive to the neighborhood size; 2) the algorithm encounters the well-known
small sample size (SSS) problem; and 3) the algorithm de-emphasizes small distance pairs. To address these
issues, here we propose exponential embedding using matrix exponential and provide a general framework for
dimensionality reduction. In the framework, the matrix exponential can be roughly interpreted by the random
walk over the feature similarity matrix, and thus is more robust. The positive definite property of matrix
exponential deals with the SSS problem. The behavior of the decay function of exponential embedding is more
significant in emphasizing small distance pairs. Under this framework, we apply matrix exponential to extend
many popular Laplacian embedding algorithms, e.g., locality preserving projections, unsupervised
discriminant projections, and marginal Fisher analysis. Experiments conducted on synthesized data, UCI,
and the Georgia Tech face database show that the proposed new framework can well address the issues
mentioned above.
ETPL
DIP - 131
A General Exponential Framework for Dimensionality Reduction
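The positive-definiteness argument can be made concrete with a small sketch: a Gaussian similarity matrix is symmetric, so its matrix exponential shares its eigenvectors while mapping every eigenvalue λ to exp(λ) > 0. The function below is an illustrative reduction, not one of the extended algorithms in the paper; its name and the Gaussian-similarity choice are assumptions.

```python
import numpy as np

def exponential_embedding(X, sigma=1.0, n_components=2):
    """Embed the rows of X using the matrix exponential of a Gaussian
    similarity matrix. exp(S) is positive definite for any symmetric S,
    which is how exponential embedding sidesteps the small-sample-size
    problem of Laplacian-style methods."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-sq / (2 * sigma ** 2))      # pairwise Gaussian similarities
    lam, U = np.linalg.eigh(S)              # S = U diag(lam) U^T
    top = np.argsort(lam)[::-1][:n_components]
    # exp(S) shares the eigenvectors U; its eigenvalues are exp(lam) > 0
    return U[:, top] * np.sqrt(np.exp(lam[top]))
```

Because exp(λ) grows fastest for the largest eigenvalues, the leading components dominate the embedding, which is the decay-function behavior the framework exploits to emphasize small-distance pairs.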
I formulate an optimization framework for computing the transmission actions of streaming multi-
view video content over bandwidth constrained channels. The optimization finds the schedule for sending the
packetized data that maximizes the reconstruction quality of the content, for the given network bandwidth.
Two prospective multi-view content representation formats are considered: 1) MVC and 2) video plus depth.
In the case of each, I formulate directed graph models that characterize the interdependencies between the data
units that comprise the content. For the video plus depth format, I develop a novel space-time error
concealment strategy that reconstructs the missing content based on received data units from multiple views. I
design multiple techniques to solve the optimization problem of interest, at varying degrees of complexity and
accuracy. In conjunction, I derive spatiotemporal models of the reconstruction error for the multi-view content
that I employ to reduce the computational requirements of the optimization. I study the performance of my
framework via simulation experiments. Significant gains in terms of rate-distortion efficiency are
demonstrated over various reference methods.
ETPL
DIP - 132
Transmission Policy Selection for Multi-View Content Delivery Over
Bandwidth Constrained Channels
This paper proposes a new approach to label-equivalence-based two-scan connected-component
labeling. We use two strategies to reduce repeated checking-pixel work for labeling. The first is that instead of
scanning image lines one by one and processing pixels one by one as in most conventional two-scan labeling algorithms, we scan alternate image lines and process pixels two by two. The second is that, by
considering the transition of the configuration of pixels in the mask, we utilize the information detected in
processing the last two pixels as much as possible for processing the current two pixels. With our method, any
pixel checked in the mask when processing the current two pixels will not be checked again when the next two
pixels are processed; thus, the efficiency of labeling can be improved. Experimental results demonstrated that
our method was more efficient than all conventional labeling algorithms.
ETPL
DIP - 133
Configuration-Transition-Based Connected-Component Labeling
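For context, the label-equivalence-based two-scan scheme that this work accelerates can be sketched as a baseline (4-connectivity with union-find equivalence resolution; a simplified illustration, not the paper's two-line/two-pixel method, and the function name is my own):

```python
import numpy as np

def two_scan_label(binary):
    """Baseline two-scan connected-component labeling (4-connectivity)."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = [0]                          # union-find forest over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First scan: assign provisional labels and record equivalences.
    for y in range(h):
        for x in range(w):
            if not binary[y, x]:
                continue
            up = labels[y - 1, x] if y > 0 else 0
            left = labels[y, x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))       # new provisional label
                labels[y, x] = len(parent) - 1
            elif up and left:
                labels[y, x] = min(up, left)
                union(up, left)                  # neighbors belong together
            else:
                labels[y, x] = up or left
    # Second scan: replace each provisional label by its representative.
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```

The paper's contribution is to cut the repeated neighborhood checks in the first scan by scanning two lines at a time and reusing the mask configuration detected for the previous pixel pair.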
A novel application of the Hough transform (HT) neighborhood approach to collinear segment
detection was proposed in [1]. It, however, suffered from one major weakness in that it could not provide an
effective solution to the case of segment intersection. This paper analyzes a vital prerequisite step, disturbance
elimination in the Hough space, and shows why this method alone is incapable of distinguishing the true
segment endpoints. To address the problem, a unique HT butterfly separation method is proposed in this
correspondence, as an essential complement to the above publication.
ETPL
DIP - 134
Comment on “Collinear Segment Detection Using HT Neighborhoods”
In this paper, we propose a novel joint data-hiding and compression scheme for digital images using
side match vector quantization (SMVQ) and image inpainting. The two functions of data hiding and image
compression can be integrated into a single module seamlessly. On the sender side, except for the blocks in
the leftmost column and topmost row of the image, each of the other residual blocks, in raster-scanning order, can be
embedded with secret data and compressed simultaneously by SMVQ or image inpainting adaptively
according to the current embedding bit. Vector quantization is also utilized for some complex blocks to control
the visual distortion and error diffusion caused by the progressive compression. After segmenting the image
compressed codes into a series of sections by the indicator bits, the receiver can achieve the extraction of
secret bits and image decompression successfully according to the index values in the segmented sections.
Experimental results demonstrate the effectiveness of the proposed scheme.
ETPL
DIP - 135
A Novel Joint Data-Hiding and Compression Scheme Based on SMVQ and
Image Inpainting
Building an accurate training database is challenging in supervised classification. For instance, in
medical imaging, radiologists often delineate malignant and benign tissues without access to the histological
ground truth, leading to uncertain data sets. This paper addresses the pattern classification problem arising
when available target data include some uncertainty information. Target data considered here are either
qualitative (a class label) or quantitative (an estimate of the posterior probability). In this context, usual
discriminative methods, such as the support vector machine (SVM), fail either to learn a robust classifier or to
predict accurate probability estimates. We generalize the regular SVM by introducing a new formulation of the
learning problem to take into account class labels as well as class probability estimates. This original
reformulation into a probabilistic SVM (P-SVM) can be efficiently solved by adapting existing flexible SVM
solvers. Furthermore, this framework allows deriving a unique learned prediction function for both decision
and posterior probability estimation providing qualitative and quantitative predictions. The method is first
tested on synthetic data sets to evaluate its properties as compared with the classical SVM and fuzzy-SVM. It
is then evaluated on a clinical data set of multiparametric prostate magnetic resonance images to assess its
performances in discriminating benign from malignant tissues. P-SVM is shown to outperform classical SVM
as well as the fuzzy-SVM in terms of probability predictions and classification performance, and
demonstrates its potential for the design of an efficient computer-aided decision system for prostate cancer
diagnosis based on multiparametric magnetic resonance (MR) imaging.
ETPL
DIP - 136
Kernel-Based Learning From Both Qualitative and Quantitative labels:
Application to Prostate Cancer Diagnosis Based on Multiparametric MR
Imaging
Otsu's algorithm for thresholding images is widely used, and the computational complexity of
determining the threshold from the histogram is O(N) where N is the number of histogram bins. When the
algorithm is adapted to circular rather than linear histograms then two thresholds are required for binary
thresholding. We show that, surprisingly, it is still possible to determine the optimal threshold in O(N) time.
The efficient optimal algorithm is over 300 times faster than traditional approaches for typical histograms and
is thus particularly suitable for real-time applications. We further demonstrate the usefulness of circular
thresholding using the adapted Otsu criterion for various applications, including analysis of optical flow data,
indoor/outdoor image classification, and non-photorealistic rendering. In particular, by combining the circular
Otsu feature with other colour/texture features, a 96.9% correct rate is obtained for indoor/outdoor
classification on the well-known IITM-SCID2 data set, outperforming the state-of-the-art result by 4.3%.
ETPL
DIP - 137
Efficient Circular Thresholding
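For reference, the standard linear-histogram Otsu threshold, the O(N) baseline that the paper extends to circular histograms with two thresholds, can be sketched as follows (function name is my own):

```python
import numpy as np

def otsu_threshold(hist):
    """Threshold maximizing between-class variance, O(N) in the bin count."""
    hist = np.asarray(hist, dtype=np.float64)
    total = hist.sum()
    bins = np.arange(hist.size)
    w0 = np.cumsum(hist) / total            # class-0 probability up to each bin
    mu = np.cumsum(hist * bins) / total     # cumulative normalized first moment
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)             # both classes must be non-empty
    sigma_b = np.zeros_like(w0)
    # Between-class variance: (mu_T * w0 - mu)^2 / (w0 * w1)
    sigma_b[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(sigma_b))          # bin index of the best cut
```

The circular case replaces this single cut with a pair of cuts on a circular histogram; the paper's result is that the optimum can still be found in linear time.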
Image label prediction is a critical issue in computer vision and machine learning. In this paper, we
propose and develop sparse label-indicator optimization methods for image classification problems. Sparsity is
introduced in the label-indicator such that relevant and irrelevant images with respect to a given class can be
distinguished. Also, when we deal with multi-class image classification problems, the number of possible
classes of a given image can be constrained to be small, an assumption that is valid for natural images. The
resulting sparsity model can be formulated as a convex optimization problem, and it can be solved very
efficiently. Experimental results are reported to illustrate the effectiveness of the proposed model, and
demonstrate that the classification performance of the proposed method is better than that of the other methods
tested in this paper.
ETPL
DIP - 138
Sparse Label-Indicator Optimization Methods for Image Classification
In this paper, we propose a novel coding and transmission scheme, called LineCast, for broadcasting
satellite images to a large number of receivers. The proposed LineCast matches well with the line-scanning
cameras widely adopted in orbiting satellites to capture high-resolution images. On the sender
side, each captured line is immediately compressed by a transform-domain scalar modulo quantization.
Without syndrome coding, the transmission power is directly allocated to quantized coefficients by scaling the
coefficients according to their distributions. Finally, the scaled coefficients are transmitted over a dense
constellation. This line-based distributed scheme features low delay, low memory cost, and low complexity.
On the receiver side, our proposed line-based prediction is used to generate side information from previously
decoded lines, which fully utilizes the correlation among lines. The quantized coefficients are decoded by the
linear least square estimator from the received data. The image line is then reconstructed by the scalar modulo
dequantization using the generated side information. Since there is neither syndrome coding nor channel
coding, the proposed LineCast enables a large number of receivers to achieve qualities matching their channel
conditions. Our theoretical analysis shows that the proposed LineCast can achieve Shannon's optimum
performance by using a high-dimensional modulo-lattice quantization. Experiments on satellite images
demonstrate that it achieves up to 1.9-dB gain over the state-of-the-art 2D broadcasting scheme and a gain of
more than 5 dB over JPEG 2000 with forward error correction.
ETPL
DIP - 139
LineCast: Line-Based Distributed Coding and Transmission for Broadcasting
Satellite Images
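The modulo-quantization idea at LineCast's core can be illustrated with a toy scalar sketch (illustrative only; the actual scheme operates on transform coefficients with power-scaled transmission over a dense constellation, and the function names are my own):

```python
def modulo_quantize(x, delta):
    """Centered residue of x modulo delta, in [-delta/2, delta/2)."""
    return (x + delta / 2.0) % delta - delta / 2.0

def modulo_dequantize(residue, side_info, delta):
    """Recover the value congruent to `residue` (mod delta) that is nearest
    to the side information; exact whenever |x - side_info| < delta / 2."""
    return side_info + modulo_quantize(residue - side_info, delta)
```

Transmitting only the bounded residue is what keeps the per-line cost low; accurate side information generated from previously decoded lines resolves the modulo ambiguity at the receiver.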
The goal of multilabel classification is to reveal the underlying label correlations to boost the accuracy
of classification tasks. Most of the existing multilabel classifiers attempt to exhaustively explore dependency
between correlated labels. This increases the risk of involving unnecessary label dependencies, which are
detrimental to classification performance. In fact, not all label correlations are indispensable to the
multilabel model. Negligible or fragile label correlations cannot be generalized well to the testing data,
especially if there is a discrepancy in label correlations between the training and testing sets. To minimize such
negative effect in the multilabel model, we propose to learn a sparse structure of label dependency. The
underlying philosophy is that as long as the multilabel dependency cannot be well explained, the principle of
parsimony should be applied to the modeling process of the label correlations. The obtained sparse label
dependency structure discards the outlying correlations between labels, which makes the learned model more
generalizable to future samples. Experiments on real-world data sets show competitive results compared
with existing algorithms.
ETPL
DIP - 140
Multi-Label Image Categorization With Sparse Factor Representation
We present a new method in image segmentation that is based on Otsu's method but iteratively
searches for subregions of the image for segmentation, instead of treating the full image as a whole region for
processing. The iterative method starts with Otsu's threshold and computes the mean values of the two classes
as separated by the threshold. Based on Otsu's threshold and the two mean values, the method separates the
image into three classes instead of two as the standard Otsu's method does. The first two classes are
determined as the foreground and background and they will not be processed further. The third class is
denoted as a to-be-determined (TBD) region that is processed at next iteration. At the succeeding iteration,
Otsu's method is applied on the TBD region to calculate a new threshold and two class means and the TBD
region is again separated into three classes, namely, foreground, background, and a new TBD region, which by
definition is smaller than the previous TBD region. Then, the new TBD region is processed in a similar
manner. The process stops when the difference between Otsu's thresholds calculated in two successive iterations is less than a preset
threshold. Then, all the intermediate foreground and background regions are, respectively, combined to create
the final segmentation result. Tests on synthetic and real images showed that the new iterative method can
achieve better performance than the standard Otsu's method in many challenging cases, such as identifying
weak objects and revealing fine structures of complex objects while the added computational cost is minimal.
ETPL
DIP - 141
A New Iterative Triclass Thresholding Technique in Image Segmentation
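The iterative triclass procedure described above can be sketched as follows (a simplified reading of the abstract; a brute-force Otsu helper is used for clarity, and the function names are my own):

```python
import numpy as np

def _otsu(vals):
    """Brute-force Otsu over unique candidate thresholds (clarity over speed)."""
    best_t, best_var = float(vals.min()), -1.0
    for t in np.unique(vals)[:-1]:
        lo, hi = vals[vals <= t], vals[vals > t]
        w0, w1 = lo.size / vals.size, hi.size / vals.size
        var = w0 * w1 * (lo.mean() - hi.mean()) ** 2
        if var > best_var:
            best_var, best_t = var, float(t)
    return best_t

def triclass_segment(image, eps=1.0, max_iter=20):
    """Iterative triclass thresholding: at each pass, pixels above the upper
    class mean become foreground, below the lower mean become background, and
    the in-between band is re-thresholded until the threshold stabilizes."""
    img = np.asarray(image, dtype=np.float64)
    fg = np.zeros(img.shape, dtype=bool)
    bg = np.zeros(img.shape, dtype=bool)
    tbd = np.ones(img.shape, dtype=bool)   # to-be-determined region
    t = float(img.mean())
    prev_t = None
    for _ in range(max_iter):
        vals = img[tbd]
        if np.unique(vals).size < 2:
            break
        t = _otsu(vals)
        mu0 = vals[vals <= t].mean()
        mu1 = vals[vals > t].mean()
        fg |= tbd & (img > mu1)                # confidently foreground
        bg |= tbd & (img < mu0)                # confidently background
        tbd &= (img >= mu0) & (img <= mu1)     # shrinking TBD band
        if prev_t is not None and abs(t - prev_t) < eps:
            break
        prev_t = t
    fg |= tbd & (img > t)   # final binary split of the remaining band
    return fg
```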
Camera-enabled mobile devices are commonly used as interaction platforms for linking the user's
virtual and physical worlds in numerous research and commercial applications, such as serving an augmented
reality interface for mobile information retrieval. The various application scenarios give rise to a key technique
of daily-life visual object recognition. On-premise signs (OPSs), a popular form of commercial advertising, are
widely used in everyday life. OPSs often exhibit great visual diversity (e.g., appearing in arbitrary sizes),
accompanied with complex environmental conditions (e.g., foreground and background clutter). Observing
that such real-world characteristics are lacking in most of the existing image data sets, in this paper, we first
proposed an OPS data set, namely OPS-62, in which a total of 4649 OPS images of 62 different businesses are
collected from Google's Street View. Further, for addressing the problem of real-world OPS learning and
recognition, we developed a probabilistic framework based on the distributional clustering, in which we
proposed to exploit the distributional information of each visual feature (the distribution of its associated OPS
labels) as a reliable selection criterion for building discriminative OPS models. Experiments on the OPS-62
data set demonstrated that our approach outperforms state-of-the-art probabilistic latent
semantic analysis models, achieving more accurate recognition and fewer false alarms, with a significant 151.28%
relative improvement in the average recognition rate. Moreover, our approach is simple, linear, and can be
executed in a parallel fashion, making it practical and scalable for large-scale multimedia applications.
ETPL
DIP - 142
Learning and Recognition of On-Premise Signs From Weakly Labeled Street
View Images
This paper addresses a new learning algorithm for the recently introduced co-sparse analysis model. First, we
give new insights into the co-sparse analysis model by establishing connections to filter-based MRF models,
such as the field of experts model of Roth and Black. For training, we introduce a technique called bi-level
optimization to learn the analysis operators. Compared with existing analysis operator learning approaches,
our training procedure has the advantage that it is unconstrained with respect to the analysis operator. We
investigate the effect of different aspects of the co-sparse analysis model and show that the sparsity promoting
function (also called penalty function) is the most important factor in the model. In order to demonstrate the
effectiveness of our training approach, we apply our trained models to various classical image restoration
problems. Numerical experiments show that our trained models clearly outperform existing analysis operator
learning approaches and are on par with state-of-the-art image denoising algorithms. Our approach develops a
framework that is intuitive to understand and easy to implement.
ETPL
DIP - 143
Insights Into Analysis Operator Learning: From Patch-Based Sparse Models to
Higher Order MRFs
The classification of retinal vessels into artery/vein (A/V) is an important phase for automating the
detection of vascular changes, and for the calculation of characteristic signs associated with several systemic
diseases such as diabetes, hypertension, and other cardiovascular conditions. This paper presents an automatic
approach for A/V classification based on the analysis of a graph extracted from the retinal vasculature. The
proposed method classifies the entire vascular tree deciding on the type of each intersection point (graph
nodes) and assigning one of two labels to each vessel segment (graph links). Final classification of a vessel
segment as A/V is performed through the combination of the graph-based labeling results with a set of
intensity features. The results of this proposed method are compared with manual labeling for three public
databases. Accuracy values of 88.3%, 87.4%, and 89.8% are obtained for the images of the INSPIRE-AVR,
DRIVE, and VICAVR databases, respectively. These results demonstrate that our method outperforms recent
approaches for A/V classification.
ETPL
DIP - 144
An Automatic Graph-Based Approach for Artery/Vein Classification in Retinal
Images
In this paper, we propose a novel image interpolation algorithm via graph-based Bayesian label
propagation. The basic idea is to first create a graph with known and unknown pixels as vertices and with edge
weights encoding the similarity between vertices, then the problem of interpolation converts to how to
effectively propagate the label information from known points to unknown ones. This process can be posed as
a Bayesian inference, in which we try to combine the principles of local adaptation and global consistency to
obtain accurate and robust estimation. Specifically, our algorithm first constructs a set of local interpolation
models, which predict the intensity labels of all image samples, and a loss term will be minimized to keep the
predicted labels of the available low-resolution (LR) samples sufficiently close to the original ones. Then, all
of the losses evaluated in local neighborhoods are accumulated together to measure the global consistency on
all samples. Moreover, a graph-Laplacian-based manifold regularization term is incorporated to penalize the
global smoothness of intensity labels, such smoothing can alleviate the insufficient training of the local models
and make them more robust. Finally, we construct a unified objective function that combines the global
loss of the locally linear regression, square error of prediction bias on the available LR samples, and the
manifold regularization term. It can be solved with a closed-form solution as a convex optimization problem.
Experimental results demonstrate that the proposed method achieves competitive performance with the state-
of-the-art image interpolation algorithms.
ETPL
DIP - 145
Image Interpolation via Graph-Based Bayesian Label Propagation
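The propagation step, fixing known pixel intensities and solving for the unknowns under a graph-Laplacian smoothness penalty, can be illustrated on a tiny graph (a harmonic-interpolation sketch under simplifying assumptions, not the paper's full model with local regression terms; the function name is my own):

```python
import numpy as np

def propagate_labels(W, f_known, known_mask, lam=1e-6):
    """Harmonic label propagation: with L = D - W the graph Laplacian,
    solve L_uu f_u = -L_uk f_k for the unknown vertices, keeping known
    vertex values fixed (lam slightly regularizes the linear solve)."""
    L = np.diag(W.sum(axis=1)) - W
    u = ~known_mask
    L_uu = L[np.ix_(u, u)] + lam * np.eye(int(u.sum()))
    L_uk = L[np.ix_(u, known_mask)]
    f = np.zeros(W.shape[0])
    f[known_mask] = f_known
    f[u] = np.linalg.solve(L_uu, -L_uk @ f_known)
    return f
```

On a path graph 0-1-2 with unit edge weights and endpoints fixed at 0 and 10, the unknown middle vertex is propagated to the harmonic value 5, the intuition behind interpolating missing pixels from similarity-weighted neighbors.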
Rain removal is a very useful and important technique in applications such as security surveillance
and movie editing. Several rain removal algorithms have been proposed in recent years, in which photometric,
chromatic, and probabilistic properties of rain have been exploited to detect and remove the rain effect.
Current methods generally work well with light rain and relatively static scenes; when dealing with heavier
rainfall in dynamic scenes, however, they give very poor visual results. The proposed algorithm is based on
motion segmentation of dynamic scene. After applying photometric and chromatic constraints for rain
detection, rain removal filters are applied to pixels such that their dynamic properties as well as motion
occlusion cues are considered; both spatial and temporal information is then adaptively exploited during rain
pixel recovery. Results show that the proposed algorithm has a much better performance for rainy scenes with
large motion than existing algorithms.
ETPL
DIP - 146
A Rain Pixel Recovery Algorithm for Videos With Highly Dynamic Scenes
Multiview face recognition has become an active research area in the last few years. In this paper, we
present an approach for video-based face recognition in camera networks. Our goal is to handle pose
variations by exploiting the redundancy in the multiview video data. However, unlike traditional approaches
that explicitly estimate the pose of the face, we propose a novel feature for robust face recognition in the
presence of diffuse lighting and pose variations. The proposed feature is developed using the spherical
harmonic representation of the face texture-mapped onto a sphere; the texture map itself is generated by back-
projecting the multiview video data. Video plays an important role in this scenario. First, it provides an
automatic and efficient way for feature extraction. Second, the data redundancy renders the recognition
algorithm more robust. We measure the similarity between feature sets from different videos in a
reproducing kernel Hilbert space. We demonstrate that the proposed approach outperforms traditional
algorithms on a multiview video database.
ETPL
DIP - 147
Robust Face Recognition From Multi-View Videos
In this paper, we propose two novel shape descriptors, angular pattern (AP) and binary angular pattern
(BAP), and a multiscale integration of them for shape retrieval. Both AP and BAP are intrinsically invariant to
scale and rotation. More importantly, being global shape descriptors, the proposed shape descriptors are
computationally very efficient, while possessing similar discriminability as state-of-the-art local descriptors.
As a result, the proposed approach is attractive for real-world shape retrieval applications. The experiments on
the widely used MPEG-7 and TARI-1000 data sets demonstrate the effectiveness of the proposed method in
comparison with existing methods.
ETPL
DIP - 148
Angular Pattern and Binary Angular Pattern for Shape Retrieval
Using a novel characterization of texture, we propose an image decomposition technique that
effectively decomposes an image into its cartoon and texture components. The characterization rests on our
observation that the texture component enjoys a blockwise low-rank nature with possible overlap and shear,
because texture, in general, is globally dissimilar but locally well patterned. More specifically, one can
observe that any local block of the texture component consists of only a few individual patterns. Based on this
premise, we first introduce a new convex prior, named the block nuclear norm (BNN), leading to a suitable
characterization of the texture component. We then formulate a cartoon-texture decomposition model as a
convex optimization problem, where the simultaneous estimation of the cartoon and texture components from
a given image or degraded observation is executed by minimizing the total variation and BNN. In addition,
patterns of texture extending in different directions are extracted separately, which is a special feature of the
proposed model and of benefit to texture analysis and other applications. Furthermore, the model can handle
various types of degradation occurring in image processing, including blur+missing pixels with several types
of noise. By rewriting the problem via variable splitting, the so-called alternating direction method of
multipliers becomes applicable, resulting in an efficient algorithmic solution to the problem. Numerical
examples illustrate that the proposed model is highly selective for patterns of texture, which makes it produce
better results than state-of-the-art decomposition models.
ETPL
DIP - 149
Cartoon-Texture Image Decomposition Using Blockwise Low-Rank Texture
Characterization
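The block nuclear norm prior can be illustrated directly: partition the image into blocks and sum the singular values of each (a sketch under the simplest reading; the paper's BNN additionally handles overlap and shear, and the function name is my own):

```python
import numpy as np

def block_nuclear_norm(X, bs=8):
    """Sum of nuclear norms (singular values) of non-overlapping bs x bs
    blocks. Small values indicate blockwise low-rank structure, i.e.
    locally well-patterned texture."""
    h, w = X.shape
    total = 0.0
    for i in range(0, h - bs + 1, bs):
        for j in range(0, w - bs + 1, bs):
            # compute_uv=False returns only the singular values
            total += np.linalg.svd(X[i:i + bs, j:j + bs], compute_uv=False).sum()
    return total
```

A rank-1 stripe pattern attains the minimum possible BNN for its energy (equal to its Frobenius norm), while an unstructured noise block of the same size has a strictly larger nuclear norm, which is why minimizing BNN favors texture-like components.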
This paper investigates a convex-relaxed kernel mapping formulation of image segmentation. We
optimize, under some partition constraints, a functional containing two characteristic terms: 1) a data term,
which maps the observation space to a higher (possibly infinite) dimensional feature space via a kernel
function, thereby evaluating nonlinear distances between the observations and segment parameters, and 2) a
total-variation term, which favors smooth segment surfaces (or boundaries). The algorithm iterates two steps:
1) a convex-relaxation optimization with respect to the segments by solving an equivalent constrained problem
via the augmented Lagrange multiplier method and 2) a convergent fixed-point optimization with respect to
the segment parameters. The proposed algorithm can handle a variety of image types without the need for
complex and application-specific statistical modeling, while having the computational benefits of convex
relaxation. Our solution is amenable to parallelized implementations on graphics processing units (GPUs) and
extends easily to high dimensions. We evaluated the proposed algorithm with several sets of comprehensive
experiments and comparisons, including: 1) computational evaluations over 3D medical-imaging examples
and high-resolution large-size color photographs, which demonstrate that a parallelized implementation of the
proposed method run on a GPU can bring a significant speed-up and 2) accuracy evaluations against five state-of-the-art
methods over the Berkeley color-image database and a multimodel synthetic data set, which
demonstrate the competitive performance of the algorithm.
ETPL
DIP - 150
Convex-Relaxed Kernel Mapping for Image Segmentation
High frame rate cameras capture sharp videos of highly dynamic scenes by trading off signal-to-noise ratio
and image resolution, so combined super-resolving and denoising is crucial for enhancing high-speed
videos and extending their applications. The solution is nontrivial due to the fact that two deteriorations co-
occur during capturing and noise is nonlinearly dependent on signal strength. To handle this problem, we
propose conducting noise separation and super resolution under a unified optimization framework, which
models both spatiotemporal priors of high-quality videos and signal-dependent noise. Mathematically, we
align the frames along the temporal axis and pursue the solution under the following three criteria: 1) the sharp
noise-free image stack is low rank with some missing pixels denoting occlusions; 2) the noise follows a given
nonlinear noise model; and 3) the recovered sharp image can be reconstructed well with sparse coefficients
and an overcomplete dictionary learned from high-quality natural images. Computationally, we propose
to obtain the final result by solving a convex optimization problem using modern local linearization techniques. In
the experiments, we validate the proposed approach on both synthetic and real captured data.
ETPL
DIP - 151
Joint Non-Gaussian Denoising and Superresolving of Raw High Frame Rate
Videos
ETPL
DIP - 152
Does Deblurring Improve Geometrical Hyperspectral Unmixing?
In computed tomography (CT), partial volume effects impede accurate segmentation of structures that
are small with respect to the pixel size. In this paper, it is shown that for objects consisting of a small number
of homogeneous materials, the reconstruction resolution can be substantially increased without altering the
acquisition process. A super-resolution reconstruction approach is introduced that is based on discrete
tomography, in which prior knowledge about the materials in the object is assumed. Discrete tomography has
already been used to create reconstructions from a low number of projection angles, but in this paper, it is
demonstrated that it can also be applied to increase the reconstruction resolution. Experiments on simulated
and real μCT data of bone and foam structures show that the proposed method indeed leads to significantly
improved structure segmentation and quantification compared with what can be achieved from conventional
reconstructions.
ETPL
DIP - 153
Super-Resolution for Computed Tomography Based on Discrete Tomography
Illumination estimation is an important component of color constancy and automatic white balancing.
A number of methods of combining illumination estimates obtained from multiple subordinate illumination
estimation methods now appear in the literature. These combinational methods aim to provide better
illumination estimates by fusing the information embedded in the subordinate solutions. The existing
combinational methods are surveyed and analyzed here with the goals of determining: 1) the effectiveness of
fusing illumination estimates from multiple subordinate methods; 2) the best method of combination; 3) the
underlying factors that affect the performance of a combinational method; and 4) the effectiveness of
combination for illumination estimation in multiple-illuminant scenes. The various combinational methods are
categorized in terms of whether or not they require supervised training and whether or not they rely on high-level
scene content cues (e.g., indoor versus outdoor). Extensive tests and analyses using three data
sets of real-world images are conducted. For consistency in testing, the images were labeled according to their
high-level features (3D stages, indoor/outdoor), and this label data is made available online. The tests reveal
that the trained combinational methods (direct combination by support vector regression in particular) clearly
outperform both the non-combinational methods and those combinational methods based on scene content
cues.
ETPL
DIP - 154
Evaluating Combinational Illumination Estimation Methods on Real-World
Images
The automatic clustering of time-varying characteristics and phenomena in natural scenes has recently
received great attention. While there exist many algorithms for motion segmentation, an important issue
arising from these studies concerns which attributes of the data should be used to cluster phenomena
with a certain repetitiveness in both space and time. This is difficult because there is no knowledge of the
labels of the phenomena to guide the search. In this paper, we present a feature selection dynamic mixture
model for motion segmentation. The advantage of our method is that it is intuitively appealing, avoids any
combinatorial search, and allows us to prune the feature set. Numerical experiments on various phenomena
are conducted. The performance of the proposed model is compared with that of other motion segmentation
algorithms, demonstrating the robustness and accuracy of our method.
ETPL
DIP - 155
An Unsupervised Feature Selection Dynamic Mixture Model for Motion
Segmentation
This paper presents a new lossless color image compression algorithm based on hierarchical
prediction and context-adaptive arithmetic coding. For the lossless compression of an RGB image, it is first
decorrelated by a reversible color transform, and then the Y component is encoded by a conventional lossless
grayscale image compression method. For encoding the chrominance images, we develop a hierarchical
scheme that enables the use of upper, left, and lower pixels for the pixel prediction, whereas the conventional
raster scan prediction methods use upper and left pixels. An appropriate context model for the prediction error
is also defined and the arithmetic coding is applied to the error signal corresponding to each context. For
several sets of images, it is shown that the proposed method further reduces the bit rates compared with
JPEG2000 and JPEG-XR.
ETPL
DIP - 156
Hierarchical Prediction and Context Adaptive Coding for Lossless Color Image
Compression
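The reversible decorrelation step described above can be illustrated with the JPEG 2000-style reversible color transform (shown as an illustrative assumption; the paper does not specify its exact transform). Because the transform is integer-valued and exactly invertible, the pipeline stays lossless:

```python
import numpy as np

def rct_forward(r, g, b):
    # JPEG 2000-style reversible color transform (integer, lossless).
    y = (r + 2 * g + b) // 4   # integer luma approximation
    u = r - g                  # chroma differences
    v = b - g
    return y, u, v

def rct_inverse(y, u, v):
    # Exact inverse: floor((u+v)/4) cancels the floor in the forward step.
    g = y - (u + v) // 4
    r = u + g
    b = v + g
    return r, g, b
```

The round trip is bit-exact for any integer inputs, which is the property a lossless coder needs before entropy coding the Y, U, and V planes separately.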
One of the most challenging ongoing issues in the field of 3D visual research is how to perceptually
quantify object and surface visualizations that are displayed within a virtual 3D space between a human eye
and 3D display. To seek an effective method of quantification, it is necessary to measure various elements
related to the perception of 3D objects at different depths. We propose a new framework for quantifying 3D
visual information that we call 3D visual activity (3DVA), which utilizes natural scene statistics measured
over 3D visual coordinates. We account for important aspects of 3D perception by carrying out a 3D
coordinate transform reflecting the nonuniform sampling resolution of the eye and the process of stereoscopic
fusion. The 3DVA fits the empirical distributions of wavelet coefficients to a parametric generalized Gaussian probability distribution model and applies a set of 3D perceptual weights. We conducted a series of
simulations that demonstrate the effectiveness of the 3DVA for quantifying the statistical dynamics of visual
3D space with respect to disparity, motion, texture, and color. A successful example application is also
provided, whereby 3DVA is applied to the problem of predicting visual fatigue experienced when viewing 3D
displays.
ETPL
DIP - 157
3D Visual Activity Assessment Based on Natural Scene Statistics
This paper presents a new method to estimate the parameters of two types of blurs, linear uniform
motion (approximated by a line characterized by angle and length) and out-of-focus (modeled as a uniform
disk characterized by its radius), for blind restoration of natural images. The method is based on the spectrum
of the blurred images and relies on a weak assumption, valid for most natural images: the
power-spectrum is approximately isotropic and has a power-law decay with the spatial frequency. We
introduce two modifications to the Radon transform, which allow the identification of the blur spectrum pattern of the two types of blur mentioned above. The blur parameters are identified by fitting an appropriate
function that accounts separately for the natural image spectrum and the blur frequency response. The
accuracy of the proposed method is validated by simulations, and the effectiveness of the proposed method is
assessed by testing the algorithm on real natural blurred images and comparing it with state-of-the-art blind
deconvolution methods.
ETPL
DIP - 158
Parametric Blur Estimation for Blind Restoration of Natural Images: Linear
Motion and Out-of-Focus
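The power-law assumption on natural-image spectra can be made concrete with a small sketch: fitting the decay exponent of a radially averaged power spectrum by least squares in log-log coordinates. The data and function names here are illustrative, not the paper's actual fitting procedure:

```python
import numpy as np

def fit_power_law(freqs, power):
    # Least-squares fit of log P = log c - alpha * log f.
    lf, lp = np.log(freqs), np.log(power)
    A = np.vstack([np.ones_like(lf), -lf]).T
    (logc, alpha), *_ = np.linalg.lstsq(A, lp, rcond=None)
    return np.exp(logc), alpha

# Synthetic isotropic spectrum with power-law exponent 1.8.
f = np.linspace(0.1, 1.0, 50)
p = 2.0 * f ** -1.8
c, alpha = fit_power_law(f, p)
```

Separating this smooth power-law component from the blur frequency response is what lets the oscillatory pattern of the motion or out-of-focus kernel be isolated.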
In pedestrian detection, as sophisticated feature descriptors are used for improving detection accuracy,
its processing speed becomes a critical issue. In this paper, we propose a novel speed-up scheme based on
multiple-instance pruning (MIP), one of the soft cascade methods, to enhance the processing speed of support
vector machine (SVM) classifiers. Our scheme mainly consists of three steps. First, we regularly split an SVM
classifier into multiple parts and build a cascade structure using them. Next, we rearrange the cascade structure
for enhancing the rejection rate, and then train the rejection threshold of each stage composing the cascade
structure using the MIP. To verify the validity of our scheme, we apply it to a pedestrian classifier using co-
occurrence histograms of oriented gradients trained by an SVM, and experimental results show that the
classification time of the proposed scheme is as low as one-hundredth that of the original classifier
without sacrificing detection accuracy.
ETPL
DIP - 159
A Speed-Up Scheme Based on Multiple-Instance Pruning for Pedestrian
Detection Using a Support Vector Machine
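The staged evaluation idea above can be sketched as follows: a linear SVM score is accumulated over blocks of features, and evaluation stops as soon as a partial sum falls below that stage's rejection threshold. The thresholds here are illustrative placeholders, not MIP-trained values:

```python
import numpy as np

def staged_svm_score(w, x, bias, stage_ends, thresholds):
    """Evaluate a linear SVM score w.x in stages; reject early when the
    running partial sum falls below a stage's rejection threshold."""
    score = 0.0
    start = 0
    for end, thr in zip(stage_ends, thresholds):
        score += float(np.dot(w[start:end], x[start:end]))
        if score < thr:           # early rejection: skip remaining stages
            return False, score
        start = end
    return score + bias > 0, score + bias
```

Most negative windows are rejected after the first stages, which is where the claimed speed-up comes from.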
The development of video quality metrics requires methods for measuring perceived video quality.
Most of these metrics are designed and tested using databases of images degraded by compression and scored
using opinion ratings. We studied video quality preferences for enhanced images of normally-sighted
participants using the method of paired comparisons with a thorough statistical analysis. Participants (n=40)
made pair-wise comparisons of high definition video clips enhanced at four different levels using a
commercially available enhancement device. Perceptual scales were computed with binary logistic regression
to estimate preferences for each level and to provide statistical inference of the differences among levels and
the impact of other variables. While moderate preference for enhanced videos was found, two unexpected
effects were also uncovered: 1) participants could be broadly classified into two groups: a) those who
preferred enhancement (“Sharp”) and b) those who disliked enhancement (“Smooth”); and 2) enhancement preferences depended on video content, with a particular preference for human faces to be enhanced less. The results suggest
that algorithms to evaluate image quality (at least for enhancement) may need to be adjusted or applied
differentially based on video content and viewer preferences. The possible impact of similar effects on image
quality of compressed video needs to be evaluated.
ETPL
DIP - 160
Factors Affecting Enhanced Video Quality Preferences
A new fingerprint compression algorithm based on sparse representation is introduced. Obtaining an
overcomplete dictionary from a set of fingerprint patches allows us to represent them as a sparse linear
combination of dictionary atoms. In the algorithm, we first construct a dictionary for predefined fingerprint
image patches. For a new fingerprint image, its patches are represented according to the dictionary by computing an l0-minimization, and the representation is then quantized and encoded. In this paper, we consider the
effect of various factors on compression results. Three groups of fingerprint images are tested. The
experiments demonstrate that our algorithm is efficient compared with several competing compression
techniques (JPEG, JPEG 2000, and WSQ), especially at high compression ratios. The experiments also
illustrate that the proposed algorithm is robust with respect to minutiae extraction.
ETPL
DIP - 161
Fingerprint Compression Based on Sparse Representation
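The l0-minimization step is commonly approximated with a greedy pursuit; below is a minimal orthogonal matching pursuit sketch, a standard stand-in rather than necessarily the solver used in the paper:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedy approximation of the
    l0-constrained coding  min ||y - D x||  s.t.  ||x||_0 <= k,
    where D has unit-norm columns (atoms)."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # best-correlated atom
        if j not in support:
            support.append(j)
        # Re-fit coefficients on the current support, update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```

The sparse coefficient vector (index/value pairs) is what would then be quantized and entropy coded.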
The plenoptic function is a powerful tool to analyze the properties of multi-view image data sets. In
particular, the understanding of the spectral properties of the plenoptic function is essential in many computer
vision applications, including image-based rendering. In this paper, we derive for the first time an exact
closed-form expression of the plenoptic spectrum of a slanted plane with finite width and use this expression
as the elementary building block to derive the plenoptic spectrum of more sophisticated scenes. This is
achieved by approximating the geometry of the scene with a set of slanted planes and evaluating the closed-
form expression for each plane in the set. We then use this closed-form expression to revisit uniform plenoptic
sampling. In this context, we derive a new Nyquist rate for the plenoptic sampling of a slanted plane and a new
reconstruction filter. Through numerical simulations, on both real and synthetic scenes, we show that the new
filter outperforms alternative existing filters.
ETPL
DIP - 162
On the Spectrum of the Plenoptic Function
In digital forensics, recovery of a damaged or altered video file plays a crucial role in searching for
evidences to resolve a criminal case. This paper presents a frame-based recovery technique of a corrupted
video file using the specifications of a codec used to encode the video data. A video frame is the minimum
meaningful unit of video data. Many existing approaches attempt to recover a video file using file structure
rather than frame structure. When a target video file is severely fragmented, or even has a portion of video overwritten by other content, however, recovery with existing approaches may fail. The
proposed approach addresses how to extract video frames from a portion of video to be restored as well as
how to connect extracted video frames together according to the codec specifications. Experimental results show that the proposed technique successfully restores fragmented video files regardless of the degree of fragmentation. For a corrupted video file containing overwritten segments, the proposed technique can
recover most of the video content in non-overwritten segments of the video file.
ETPL
DIP - 163
Frame-Based Recovery of Corrupted Video Files Using Video Codec
Specifications
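Locating frame boundaries in raw bytes is the first step of such frame-based carving. A minimal sketch scans for the Annex-B-style 0x000001 start-code prefix used by H.264/MPEG byte streams; real recovery must additionally parse the frame headers per the codec specification:

```python
def find_start_codes(data: bytes) -> list:
    """Return byte offsets of 0x000001 start-code prefixes in a stream."""
    offsets = []
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0:
            return offsets
        offsets.append(i)
        i += 3  # skip past this prefix before searching again
```

Each offset marks a candidate frame unit, which can then be validated and reassembled according to the codec's header fields.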
Vector-valued images such as RGB color images or multimodal medical images show a strong
interchannel correlation, which is not exploited by most image processing tools. We propose a new notion of
treating vector-valued images which is based on the angle between the spatial gradients of their channels.
Through minimizing a cost functional that penalizes large angles, images with parallel level sets can be
obtained. After formally introducing this idea and the corresponding cost functionals, we discuss their Gâteaux
derivatives that lead to a diffusion-like gradient descent scheme. We illustrate the properties of this cost
functional by several examples in denoising and demosaicking of RGB color images. They show that parallel
level sets are a suitable concept for color image enhancement. Demosaicking with parallel level sets gives
visually perfect results for low noise levels. Furthermore, the proposed functional yields sharper images than
the compared approaches.
ETPL
DIP - 164
Vector-Valued Image Processing by Parallel Level Sets
In region-of-interest (ROI)-based video coding, ROI parts of the frame are encoded with higher
quality than non-ROI parts. At low bit rates, such encoding may produce attention-grabbing coding artifacts,
which may draw the viewer's attention away from the ROI, thereby degrading visual quality. In this paper, we present a saliency-aware video compression method for ROI-based video coding. The proposed method aims at reducing salient coding artifacts in non-ROI parts of the frame in order to keep the user's attention on the ROI.
Further, the method allows saliency to increase in high quality parts of the frame, and allows saliency to
reduce in non-ROI parts. Experimental results indicate that the proposed method is able to improve visual
quality of encoded video relative to conventional rate-distortion optimized video coding, as well as two state-of-the-art perceptual video coding methods.
ETPL
DIP - 165
Saliency-Aware Video Compression
A new algorithm for calculating the metamer mismatch volumes that arise in colour vision and colour imaging
is introduced. Unlike previous methods, the proposed method places no restrictions on the set of possible
object reflectance spectra. As a result of such restrictions, previous methods have only been able to provide
approximate solutions to the mismatch volume. The proposed new method is the first to characterize precisely
the metamer mismatch volume for any possible reflectance.
ETPL
DIP - 166
Metamer Mismatching
This paper is devoted to the study of a directional lifting transform for wavelet frames. A
nonsubsampled lifting structure is developed to maintain the translation invariance as it is an important
property in image denoising. Then, the directionality of the lifting-based tight frame is explicitly discussed,
followed by a specific translation invariant directional framelet transform (TIDFT). The TIDFT has two
framelets ψ1 and ψ2, with vanishing moments of order two and one, respectively, which are able to detect
singularities in a given direction set. It provides an efficient and sparse representation for images containing
rich textures along with properties of fast implementation and perfect reconstruction. In addition, an adaptive
block-wise orientation estimation method based on Gabor filters is presented instead of the conventional
minimization of residuals. Furthermore, the TIDFT is utilized to exploit the capability of image denoising,
incorporating the MAP estimator for multivariate exponential distribution. Consequently, the TIDFT is able to
eliminate the noise effectively while preserving the textures simultaneously. Experimental results show that
the TIDFT outperforms some other frame-based denoising methods, such as contourlet and shearlet, and is
competitive to the state-of-the-art denoising approaches.
ETPL
DIP - 167
Translation Invariant Directional Framelet Transform Combined With Gabor
Filters for Image Denoising
We propose a segmentation method based on the geometric representation of images as two-
dimensional manifolds embedded in a higher dimensional space. The segmentation is formulated as a
minimization problem, where the contours are described by a level set function and the objective functional
corresponds to the surface of the image manifold. In this geometric framework, both data-fidelity and
regularity terms of the segmentation are represented by a single functional that intrinsically aligns the
gradients of the level set function with the gradients of the image and exploits this directional information to
overcome image inhomogeneities and fragmented contours. The proposed formulation combines this robust
alignment of gradients with attractive properties of previous methods developed in the same geometric
framework: the natural coupling of image channels proposed for anisotropic diffusion in [1] and the ability of
subjective surfaces [2] to detect weak edges and close fragmented boundaries. The potential of such a
geometric approach lies in the general definition of Riemannian manifolds, which naturally generalizes
existing segmentation methods (the geodesic active contours of Caselles et al. [3], the active contours without
edges of Chan and Vese [4] and the robust edge integrator of Kimmel and Bruckstein [5]) to higher
dimensional spaces, non-flat images, and feature spaces. Our experiments show that the proposed technique improves the segmentation of multichannel images, images subject to inhomogeneities, and images with weak or fragmented edges.
ETPL
DIP - 168
Harmonic Active Contours
Most existing color constancy algorithms assume uniform illumination. However, in real-world
scenes, this is not often the case. Thus, we propose a novel framework for estimating the colors of multiple
illuminants and their spatial distribution in the scene. We formulate this problem as an energy minimization
task within a conditional random field over a set of local illuminant estimates. In order to quantitatively
evaluate the proposed method, we created a novel data set of two-dominant-illuminant images comprised of
laboratory, indoor, and outdoor scenes. Unlike prior work, our database includes accurate pixel-wise ground
truth illuminant information. The performance of our method is evaluated on multiple data sets. Experimental
results show that our framework clearly outperforms single illuminant estimators as well as a recently
proposed multi-illuminant estimation approach.
ETPL
DIP - 169
Multi-Illuminant Estimation with Conditional Random Fields
Time multiplexing (TM) and spatial neighborhood (SN) are two mainstream structured light
techniques widely used for depth sensing. The former is well known for its high accuracy and the latter for its
low delay. In this paper, we explore a new paradigm of scalable depth sensing to integrate the advantages of
both the TM and SN methods. Our contribution is twofold. First, we design a set of hybrid structured light
patterns composed of phase-shifted fringe and pseudo-random speckle. Under the illumination of the hybrid
patterns, depth can be decently reconstructed either from a few consecutive frames with the TM principle for
static scenes or from a single frame with the SN principle for dynamic scenes. Second, we propose a scene-
adaptive depth sensing framework based on which a global or region-wise optimal depth map can be generated
through motion detection. To validate the proposed scalable paradigm, we develop a real-time (20 fps) depth sensing system. Experimental results demonstrate that our method achieves an efficient balance between accuracy and speed during depth sensing that has rarely been exploited before.
ETPL
DIP - 170
Real-Time Scalable Depth Sensing With Hybrid Structured Light Illumination
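The TM principle can be illustrated with classic three-step phase shifting, one common fringe-decoding scheme; this is a generic textbook example, not the paper's specific hybrid patterns:

```python
import numpy as np

def decode_three_step(i1, i2, i3):
    """Recover wrapped phase from three fringe images
    I_k = A + B*cos(phi + (k-2)*2*pi/3), k = 1, 2, 3."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
```

The arctangent cancels the unknown ambient term A and modulation B, so the wrapped phase (and hence depth, after unwrapping and triangulation) depends only on the three intensity samples per pixel.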
We address the two inherently related problems of segmentation and interpolation of 3D and 4D
sparse data and propose a new method to integrate these stages in a level set framework. The interpolation
process uses segmentation information rather than pixel intensities for increased robustness and accuracy. The
method supports any spatial configurations of sets of 2D slices having arbitrary positions and orientations. We
achieve this by introducing a new level set scheme based on the interpolation of the level set function by radial
basis functions. The proposed method is validated quantitatively and/or subjectively on artificial data and MRI
and CT scans and is compared against the traditional sequential approach, which interpolates the images first,
using a state-of-the-art image interpolation method, and then segments the interpolated volume in 3D or 4D. In
our experiments, the proposed framework yielded similar segmentation results to the sequential approach but
provided a more robust and accurate interpolation. In particular, the interpolation was more satisfactory in
cases of large gaps, due to the method taking into account the global shape of the object, and it recovered
better topologies at the extremities of the shapes where the objects disappear from the image slices. As a
result, the complete integrated framework provided more satisfactory shape reconstructions than the sequential
approach.
ETPL
DIP - 171
Integrated Segmentation and Interpolation of Sparse Data
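The core interpolation step, fitting a function through scattered samples with radial basis functions by solving the collocation system, can be sketched as follows (Gaussian RBFs are an illustrative choice; the shape parameter `eps` is a hypothetical knob):

```python
import numpy as np

def rbf_interpolator(centers, values, eps=1.0):
    """Fit a Gaussian RBF interpolant through scattered samples."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    phi = np.exp(-(eps * d) ** 2)
    w = np.linalg.solve(phi, values)      # collocation: phi @ w = values
    def interp(x):
        dx = np.linalg.norm(x[None, :] - centers, axis=-1)
        return float(np.exp(-(eps * dx) ** 2) @ w)
    return interp
```

Applied to level set values sampled on arbitrarily oriented 2D slices, such an interpolant yields a smooth implicit function between slices, which is what allows segmentation and interpolation to be coupled.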
We present four novel point-to-set distances defined for fuzzy or gray-level image data, two based on
integration over α-cuts and two based on the fuzzy distance transform. We explore their theoretical properties.
Inserting the proposed point-to-set distances in existing definitions of set-to-set distances, among which are
the Hausdorff distance and the sum of minimal distances, we define a number of distances between fuzzy sets.
These set distances are directly applicable for comparing gray-level images or fuzzy segmented objects, but
also for detecting patterns and matching parts of images. The distance measures integrate shape and
intensity/membership of observed entities, providing a highly applicable tool for image processing and
analysis. Performance evaluation of derived set distances in real image processing tasks is conducted and
presented. It is shown that the considered distances have a number of appealing theoretical properties and
exhibit very good performance in template matching and object classification for fuzzy segmented images as
well as when applied directly on gray-level intensity images. Examples include recognition of hand written
digits and identification of virus particles. The proposed set distances perform excellently on the MNIST digit
classification task, achieving the best reported error rate for classification using only rigid body
transformations and a kNN classifier.
ETPL
DIP - 172
Linear Time Distances between Fuzzy Sets with Applications to Pattern
Matching and Classification
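The α-cut integration idea behind two of the proposed point-to-set distances can be sketched in 1D: the distance from a point to a fuzzy set is its minimal distance to each α-cut, averaged over α. This is a simplified illustration of the principle, not the paper's exact definitions:

```python
import numpy as np

def point_to_fuzzy_set_distance(x, positions, membership, n_levels=100):
    """Distance from point x to a fuzzy set on a 1D grid, averaged over
    alpha-cuts: d(x, S) = mean_alpha min_{p in S_alpha} |x - p|.
    Empty alpha-cuts contribute the largest finite grid distance."""
    dmax = np.abs(x - positions).max()
    total = 0.0
    alphas = np.linspace(1e-6, 1.0, n_levels)
    for a in alphas:
        cut = positions[membership >= a]
        total += np.abs(x - cut).min() if cut.size else dmax
    return total / n_levels
```

For a crisp set this reduces to the ordinary point-to-set distance; for graded memberships, points with higher membership dominate more α-cuts and so pull the distance more strongly.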
In this paper, we develop an efficient bit allocation strategy for subband-based image
coding systems. More specifically, our objective is to design a new optimization algorithm based on a
rate-distortion optimality criterion. To this end, we consider the uniform scalar quantization of a class
of mixed distributed sources following a Bernoulli-generalized Gaussian distribution. This model
appears to be particularly well-adapted for image data, which have a sparse representation in a
wavelet basis. In this paper, we propose new approximations of the entropy and the distortion
functions using piecewise affine and exponential forms, respectively. Because of these
approximations, bit allocation is reformulated as a convex optimization problem. Solving the
resulting problem allows us to derive the optimal quantization step for each subband. Experimental
results show the benefits that can be drawn from the proposed bit allocation method in a typical
transform-based coding application.
ETPL
DIP - 173
A Bit Allocation Method for Sparse Source Coding
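For context, the classic high-rate bit allocation result, which gives each subband a rate offset proportional to the log of its variance relative to the geometric mean, can be sketched as below. This is a textbook baseline, not the paper's Bernoulli-generalized-Gaussian solution:

```python
import numpy as np

def allocate_bits(variances, total_rate_per_sample):
    """High-rate optimal bit allocation across subbands:
    R_i = R_avg + 0.5 * log2(var_i / geometric_mean(var))."""
    v = np.asarray(variances, dtype=float)
    geo = np.exp(np.mean(np.log(v)))       # geometric mean of variances
    return total_rate_per_sample + 0.5 * np.log2(v / geo)

# Four subbands with decreasing energy, 2 bits/sample on average.
rates = allocate_bits([16.0, 4.0, 1.0, 1.0], 2.0)
```

By construction the per-subband rates average exactly to the target, with high-variance subbands receiving proportionally more bits.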
When matching images for applications such as mosaicking and homography estimation, the
distribution of features across the overlap region affects the accuracy of the result. This paper uses the spatial
statistics of these features, measured by Ripley's K-function, to assess whether feature matches are clustered
together or spread around the overlap region. A comparison of the performances of a dozen state-of-the-art
feature detectors is then carried out using analysis of variance and a large image database. Results show that
SFOP introduces significantly less aggregation than the other detectors tested. When the detectors are rank-
ordered by this performance measure, the order is broadly similar to those obtained by other means, suggesting
that the ordering reflects genuine performance differences. Experiments on stitching images into mosaics
confirm that better coverage values yield better quality outputs.
ETPL
DIP - 174
Spatial Statistics of Image Features for Performance Comparison
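Ripley's K-function can be estimated naively as the average number of feature pairs within radius r, scaled by the window area; the sketch below omits edge correction for simplicity:

```python
import numpy as np

def ripley_k(points, r, area):
    """Naive Ripley's K estimate for 2D points in a window of the given
    area (no edge correction): K(r) = area * mean pair count within r."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    within = (d <= r).sum() - n            # exclude self-pairs
    return area * within / (n * (n - 1))
```

For a homogeneous Poisson process K(r) ≈ πr², so values above that curve indicate clustering of feature matches and values below indicate regular spread, which is exactly the aggregation measure used to compare detectors.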
We present a novel method to incorporate prior knowledge into normalized cuts. The prior is
incorporated into the cost function by maximizing the similarity of the prior to one partition and the
dissimilarity to the other. This simple formulation can also be extended to multiple priors to allow the
modeling of the shape variations. A shape model obtained by PCA on a training set can be easily integrated
into the new framework. This is in contrast to other methods that usually incorporate prior knowledge by hard
constraints during optimization. The eigenvalue problem inferred by spectral relaxation is not sparse, but can
still be solved efficiently. We apply this method to biomedical data sets as well as natural images of people
from a public database and compare it with other normalized cut based segmentation algorithms. We
demonstrate that our method gives promising results and can still give a good segmentation even when the
prior is not accurate.
ETPL
DIP - 175
Shape-Based Normalized Cuts Using Spectral Relaxation for Biomedical
Segmentation
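The spectral relaxation of a plain two-way normalized cut (without the prior term) can be sketched as thresholding the second eigenvector of the symmetric normalized Laplacian; the median threshold here is an illustrative choice:

```python
import numpy as np

def ncut_partition(W):
    """Two-way normalized cut by spectral relaxation: threshold the
    second-smallest eigenvector of the normalized Laplacian."""
    d = W.sum(axis=1)
    Dh = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - Dh @ W @ Dh       # symmetric normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    fiedler = Dh @ vecs[:, 1]              # relaxed cluster indicator
    return fiedler > np.median(fiedler)
```

The paper's contribution amounts to adding prior-similarity terms to the cut objective before this relaxation, so the eigenproblem changes but the overall recipe stays the same.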
The selection of optimal camera configurations (camera locations, orientations, etc.) for multi-camera
networks remains an unsolved problem. Previous approaches largely focus on proposing various objective
functions to achieve different tasks. Most of them, however, do not generalize well to large scale networks. To
tackle this, we propose a statistical formulation of the problem together with a trans-dimensional
simulated annealing algorithm to effectively deal with it. We compare our approach with a state-of-the-art
method based on binary integer programming (BIP) and show that our approach offers similar performance on
small scale problems. However, we also demonstrate the capability of our approach in dealing with large scale
problems and show that our approach produces better results than two alternative heuristics designed to deal
with the scalability issue of BIP. Last, we show the versatility of our approach using a number of specific
scenarios.
ETPL
DIP - 176
Optimal Camera Planning Under Versatile User Constraints in Multi-Camera
Image Processing Systems
We propose an analytical model to estimate the synthesized view quality in 3D video. The model relates
errors in the depth images to the synthesis quality, taking into account texture image characteristics, texture
image quality, and the rendering process. Especially, we decompose the synthesis distortion into texture-error
induced distortion and depth-error induced distortion. We analyze the depth-error induced distortion using an
approach combining frequency- and spatial-domain techniques. Experimental results with video sequences and
coding/rendering tools used in MPEG 3DV activities show that our analytical model can accurately estimate
the synthesis noise power. Thus, the model can be used to estimate the rendering quality for different system
designs.
ETPL
DIP - 177
An Analytical Model for Synthesis Distortion Estimation in 3D Video
Contrast sensitivity of the human visual system to visual stimuli can be significantly affected by
several mechanisms, e.g., vision foveation and attention. Existing studies on foveation based video quality
assessment only take into account static foveation mechanism. This paper first proposes an advanced foveal
imaging model to generate the perceived representation of video by integrating visual attention into the
foveation mechanism. For accurately simulating the dynamic foveation mechanism, a novel approach to
predict video fixations is proposed by mimicking the essential functionality of eye movement. Consequently,
an advanced contrast sensitivity function, derived from the attention driven foveation mechanism, is modeled
and then integrated into a wavelet-based distortion visibility measure to build a full reference attention driven
foveated video quality (AFViQ) metric. AFViQ exploits adequately perceptual visual mechanisms in video
quality assessment. Extensive evaluation results with respect to several publicly available eye-tracking and
video quality databases demonstrate promising performance of the proposed video attention model, fixation
prediction approach, and quality metric.
ETPL
DIP - 178
Attention Driven Foveated Video Quality Assessment
Acquiring scenery depth is a fundamental task in computer vision, with many applications in
manufacturing, surveillance, or robotics relying on accurate scenery information. Time-of-flight cameras can
provide depth information in real time and overcome shortcomings of traditional stereo analysis. However,
they provide limited spatial resolution and sophisticated upscaling algorithms are sought after. In this paper,
we present a sensor fusion approach to time-of-flight super resolution, based on the combination of depth and
texture sources. Unlike other texture guided approaches, we interpret the depth upscaling process as a
weighted energy optimization problem. Three different weights are introduced, employing different available
sensor data. The individual weights address object boundaries in depth, depth sensor noise, and temporal
consistency. Applied in consecutive order, they form three weighting strategies for time-of-flight super
resolution. Objective evaluations show advantages in depth accuracy and for depth image based rendering
compared with state-of-the-art depth upscaling. Subjective view synthesis evaluation shows a significant
increase in viewer preference by a factor of four in stereoscopic viewing conditions. To the best of our
knowledge, this is the first extensive subjective test performed on time-of-flight depth upscaling. Objective
and subjective results prove the suitability of our time-of-flight super resolution approach for depth scenery capture.
ETPL
DIP - 179
A Weighted Optimization Approach to Time-of-Flight Sensor Fusion
This paper presents a novel learning method for precise eye localization, a challenge to be solved in
order to improve the performance of face processing algorithms. Few existing approaches can directly detect
and localize eyes at arbitrary angles in predicted eye regions, face images, and original portraits at the same
time. To preserve rotation invariant property throughout the entire eye localization framework, a codebook of
invariant local features is proposed for the representation of eye patterns. A heat map is then generated by
integrating a 2-class sparse representation classifier with a pyramid-like detecting and locating strategy to
fulfill the task of discriminative classification and precise localization. Furthermore, a series of prior
information is adopted to improve the localization precision and accuracy. Experimental results on three
different databases show that our method is capable of effectively locating eyes in arbitrary rotation situations
(360° in plane).
ETPL
DIP - 180
A Novel Eye Localization Method with Rotation Invariance
Directionlets allow a construction of perfect reconstruction and critically sampled multidirectional
anisotropic basis, yet retaining the separable filtering of standard wavelet transform. However, due to the
spatially varying filtering and downsampling direction, it is forced to apply spatial segmentation and process
each segment independently. Because of this independent processing of the image segments, directionlets
suffer from the following two major limitations when applied to, say, image coding. First, failure to exploit the
correlation across block boundaries degrades the coding performance and also induces blocking artifacts, thus
making it mandatory to use de-blocking filter at low bit rates. Second, spatial scalability, i.e., minimum
segment size or the number of levels of the transform, is limited due to independent processing of segments.
We show that, with simple modifications in the block boundaries, we can overcome these limitations by, what
we call, in-phase lifting implementation of directionlets. In the context of directionlets using in-phase lifting,
we identify different possible groups of downsampling matrices that would allow the construction of a
multilevel transform without forcing independent processing of segments both with and without any
modifications in the segment boundary. Experimental results in image coding show objective and subjective
improvements when compared with the directionlets applied independently on each image segment. As an
application, using both the in-phase lifting implementation of directionlets and adaptive directional lifting,
we have constructed an adaptive directional wavelet transform, which shows improved image coding
performance over the existing adaptive directional wavelet transforms.
ETPL
DIP - 181
Directionlets Using In-Phase Lifting for Image Representation
The goal of this paper is to design a statistical test for the camera model identification problem. The
approach is based on the heteroscedastic noise model, which more accurately describes a natural raw image.
This model is characterized by only two parameters, which are considered as unique fingerprint to identify
camera models. The camera model identification problem is cast in the framework of hypothesis testing
theory. In an ideal context where all model parameters are perfectly known, the likelihood ratio test (LRT) is
presented and its performance is theoretically established. For practical use, two generalized LRTs are
designed to deal with unknown model parameters so that they can meet a prescribed false alarm probability
while ensuring a high detection performance. Numerical results on simulated images and real natural raw
images highlight the relevance of the proposed approach.
ETPL
DIP - 182
Camera Model Identification Based on the Heteroscedastic Noise Model
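The two-parameter heteroscedastic model described above (per-patch variance linear in the patch mean) can be sketched with a plain least-squares fit; this is an illustrative stand-in, not the paper's hypothesis-testing estimator, and the function name is our own:

```python
import numpy as np

def heteroscedastic_fit(means, variances):
    """Fit var = a * mean + b over per-patch (mean, variance) samples.
    The pair (a, b) plays the role of the camera-model fingerprint."""
    means = np.asarray(means, dtype=np.float64)
    variances = np.asarray(variances, dtype=np.float64)
    # Ordinary least squares on the design matrix [mean, 1].
    A = np.column_stack([means, np.ones_like(means)])
    (a, b), *_ = np.linalg.lstsq(A, variances, rcond=None)
    return float(a), float(b)
```

In practice the (mean, variance) pairs would come from small homogeneous patches of a raw image; two cameras with different sensors yield different (a, b) pairs.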
Vector bilateral filtering has been shown to provide a good tradeoff between noise removal and edge
degradation when applied to multispectral/hyperspectral image denoising. It has also been demonstrated to
provide dynamic range enhancement of bands that have impaired signal to noise ratios (SNRs). Typical vector
bilateral filtering described in the literature does not use parameters satisfying optimality criteria. We
introduce an approach for selection of the parameters of a vector bilateral filter through an optimization
procedure rather than by ad hoc means. The approach is based on posing the filtering problem as one of
nonlinear estimation and minimization of Stein's unbiased risk estimate of this nonlinear estimator. Along
the way, we provide a plausibility argument through an analytical example as to why vector bilateral filtering
outperforms band-wise 2D bilateral filtering in enhancing SNR. Experimental results show that the optimized
vector bilateral filter provides improved denoising performance on multispectral images when compared with
several other approaches.
ETPL
DIP - 183
Multispectral Image Denoising With Optimized Vector Bilateral Filter
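The vector bilateral filter that this entry optimizes can be sketched as follows. This is a minimal illustration of the key idea (the range weight uses the joint vector distance across all bands, so an edge in any band suppresses averaging in every band), assuming a float image in [0, 1]; the parameter names and defaults are ours, not the paper's optimized values:

```python
import numpy as np

def vector_bilateral_filter(img, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Naive vector bilateral filter for an (H, W, bands) float image."""
    h, w, bands = img.shape
    pad = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="reflect")
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))   # fixed spatial kernel
    for i in range(h):
        for j in range(w):
            window = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1, :]
            diff = window - img[i, j]                        # vector difference per pixel
            # Joint range weight over ALL bands (this is what distinguishes
            # the vector filter from band-wise 2D bilateral filtering).
            range_w = np.exp(-np.sum(diff**2, axis=2) / (2 * sigma_r**2))
            wgt = spatial * range_w
            out[i, j] = np.tensordot(wgt, window, axes=([0, 1], [0, 1])) / wgt.sum()
    return out
```

The paper's contribution is choosing sigma_s and sigma_r by minimizing Stein's unbiased risk estimate instead of fixing them ad hoc, which the sketch above does not attempt.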
In this paper, we investigate a new inter-channel coding mode called LM mode proposed for the next
generation video coding standard called high efficiency video coding. This mode exploits inter-channel
correlation using reconstructed luma to predict chroma linearly with parameters derived from neighboring
reconstructed luma and chroma pixels at both encoder and decoder to avoid overhead signaling. In this paper,
we analyze the LM mode and prove that the LM parameters for predicting original chroma and reconstructed
chroma are statistically the same. We also analyze the error sensitivity of the LM parameters. We identify
some LM mode problematic situations and propose three novel LM-like modes called LMA, LML, and LMO
to address the situations. To limit the increase in complexity due to the LM-like modes, we propose some fast
algorithms with the help of some new cost functions. We further identify some potentially-problematic
conditions in the parameter estimation (including regression dilution problem) and introduce a novel model
correction technique to detect and correct those conditions. Simulation results suggest that considerable BD-
rate reduction can be achieved by the proposed LM-like modes and model correction technique. In addition,
the performance gain of the two techniques appears to be essentially additive when combined.
ETPL
DIP - 184
Chroma Intra Prediction Based on Inter-Channel Correlation for HEVC
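The least-squares derivation of the LM-mode parameters (chroma predicted linearly from reconstructed luma, with alpha and beta fitted on neighboring reconstructed pixels so no signaling is needed) can be sketched as below. This is a floating-point illustration only; the actual HEVC design uses integer arithmetic and specific neighbor subsampling:

```python
import numpy as np

def lm_parameters(neigh_luma, neigh_chroma):
    """Fit chroma = alpha * luma + beta over the causal neighbours."""
    L = np.asarray(neigh_luma, dtype=np.float64)
    C = np.asarray(neigh_chroma, dtype=np.float64)
    n = L.size
    denom = n * np.sum(L * L) - np.sum(L) ** 2
    if denom == 0:                       # flat luma neighbourhood: fall back to DC
        return 0.0, float(C.mean())
    alpha = (n * np.sum(L * C) - np.sum(L) * np.sum(C)) / denom
    beta = (np.sum(C) - alpha * np.sum(L)) / n
    return alpha, beta

def lm_predict(rec_luma_block, alpha, beta):
    """Predict a chroma block from the co-located reconstructed luma."""
    return alpha * np.asarray(rec_luma_block, dtype=np.float64) + beta
```

The degenerate `denom == 0` branch corresponds to one of the problematic situations the abstract mentions (regression dilution and ill-conditioned parameter estimation), which the paper's LMA/LML/LMO modes and model correction address more carefully.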
This paper proposes a quadratic classification approach on the subspace of Extended
Histogram of Gradients (ExHoG) for human detection. By investigating the limitations of Histogram
of Gradients (HG) and Histogram of Oriented Gradients (HOG), ExHoG is proposed as a new feature
for human detection. ExHoG alleviates the problem of discrimination between a dark object against a
bright background and vice versa inherent in HG. It also resolves an issue of HOG whereby gradients
of opposite directions in the same cell are mapped into the same histogram bin. We reduce the
dimensionality of ExHoG using Asymmetric Principal Component Analysis (APCA) for improved
quadratic classification. APCA also addresses the asymmetry issue in training sets of human
detection, where there are far fewer human samples than non-human samples. Our proposed
approach is tested on three established benchmarking data sets - INRIA, Caltech, and Daimler - using
a modified Minimum Mahalanobis distance classifier. Results indicate that the proposed approach
outperforms current state-of-the-art human detection methods.
ETPL
DIP - 185
Human Detection by Quadratic Classification on Subspace of Extended
Histogram of Gradients
In this paper, we address the problem of recovering a color image from a grayscale one. The input
color data comes from a source image considered as a reference image. Reconstructing the missing color of a
grayscale pixel is here viewed as the problem of automatically selecting the best color among a set of color
candidates while simultaneously ensuring the local spatial coherency of the reconstructed color information.
To solve this problem, we propose a variational approach where a specific energy is designed to model the
color selection and the spatial constraint problems simultaneously. The contributions of this paper are twofold.
First, we introduce a variational formulation modeling the color selection problem under spatial constraints
and propose a minimization scheme, which computes a local minimum of the defined nonconvex energy.
Second, we combine different patch-based features and distances in order to construct a consistent set of
possible color candidates. This set is used as input data and our energy minimization automatically selects the
best color to transfer for each pixel of the grayscale image. Finally, the experiments illustrate the potential
of our simple methodology and show that our results are very competitive with respect to the state-of-the-art
methods.
ETPL
DIP - 186
Variational Exemplar-Based Image Colorization
Depth-map merging based 3D modeling is an effective approach for reconstructing large-scale scenes
from multiple images. In addition to generating high-quality depth maps for each image, selecting suitable
neighboring images for each image is an important step in the reconstruction pipeline to which, unfortunately,
little attention has been paid in the literature until now. This paper tackles this issue for
large-scale scene reconstruction, where many unordered images are captured and used with substantial
scale and view-angle variations. We formulate the neighboring image selection as a combinatorial optimization
problem and use the quantum-inspired evolutionary algorithm to seek its optimal solution. Experimental
results on the ground truth data set show that our approach can significantly improve the quality of the depth-
maps as well as final 3D reconstruction results with high computational efficiency.
ETPL
DIP - 187
How to Select Good Neighboring Images in Depth-Map Merging Based 3D
Modeling
Joint source-channel coding has attracted substantial attention with the aim of further exploiting the
residual correlation residing in the encoded video signals for the sake of improving the reconstructed video
quality. In our previous paper, a first-order Markov process model was utilized as an error concealment tool
for exploiting the intra-frame correlation residing in the Wyner--Ziv (WZ) frame in the context of pixel-
domain distributed video coding. In this contribution, we exploit the inter-view correlation with the aid of an
inter-view motion search in distributed multi-view video coding (DMVC). Initially, we rely on the system
architecture of WZ coding invoked for multi-view video. Then, we construct a novel mesh-structured pixel-
correlation model from the inter-view motion vectors and derive its decoding rules for joint source-channel
decoding. Finally, we benchmark the attainable system performance against the existing pixel-domain WZ
coding based DMVC scheme, where the classic turbo codec is employed. Our simulation results show that
substantial bitrate reductions are achieved by employing the proposed motion-aware mesh-structured
correlation modelling technique in a DMVC scheme.
ETPL
DIP - 188
Motion-Aware Mesh-Structured Trellis for Correlation Modelling Aided
Distributed Multi-View Video Coding
During the acquisition process with the Compton gamma-camera, integrals of the intensity distribution
of the source on conical surfaces are measured. They represent the Compton projections of the intensity. The
inversion of the Compton transform relies on a particular Fourier-slice theorem. This paper proposes a
filtered backprojection algorithm for image reconstruction from planar Compton camera data. We show how
different projections are related to one another and how they may be combined in the tomographic reconstruction
step. Considering a simulated Compton imaging system, we conclude that the proposed method yields
accurate reconstructed images for simple sources. An elongation of the source in the direction orthogonal to
the camera may be observed and is to be related to the truncation of the projections induced by the finite extent
of the device. This phenomenon was previously observed with other reconstruction methods, e.g., iterative
maximum likelihood expectation maximization. The redundancy of the Compton transform is thus an
important feature for the reduction of noise in Compton images, since the ideal assumptions of infinite width
and observation time are never met in practice. We show that a selection applied to the data set allows us to
partially circumvent projection truncation, at the expense of enhanced noise in the images.
ETPL
DIP - 189
Filtered Backprojection Reconstruction and Redundancy in Compton Camera
Imaging
In this paper, we present a novel view synthesis method named Visto, which uses a reference input
view to generate synthesized views in nearby viewpoints. We formulate the problem as a joint optimization of
inter-view texture and depth map similarity, a framework that is significantly different from other traditional
approaches. As such, Visto tends to implicitly inherit the image characteristics from the reference view
without the explicit use of image priors or texture modeling. Visto assumes that each patch is available in both
the synthesized and reference views and thus can be applied to the common area between the two views but
not the out-of-region area at the border of the synthesized view. Visto uses a Gauss-Seidel-like iterative
approach to minimize the energy function. Simulation results suggest that Visto can generate seamless virtual
views and outperform other state-of-the-art methods.
ETPL
DIP - 190
Seamless View Synthesis Through Texture Optimization
In this paper, a novel technique to speed-up a non-local means (NLM) filter is proposed. In the
original NLM filter, most of its computational time is spent on finding distances for all the patches in the
search window. Here, we build a dictionary in which patches with similar photometric structures are clustered
together. The dictionary is built only once, using high-resolution images from different scenes. Since the
dictionary is well organized in terms of indexing its entries, it is used to search similar patches very quickly for
efficient NLM denoising. We achieve a substantial reduction in computational cost compared with the original
NLM method, especially when the NLM search window is large, without significantly affecting the PSNR.
Second, we show that by building a dictionary for edge patches as opposed to intensity patches, it is possible
to reduce the dictionary size; thus, further improving the computational speed and memory requirement. The
proposed method preclassifies similar patches using the same distance measure as the NLM method. The
proposed algorithm is shown to outperform other prefiltering based fast NLM algorithms computationally as
well as qualitatively.
ETPL
DIP - 191
Novel Speed-Up Strategies for Non-Local Means Denoising With Patch and
Edge Patch Based Dictionaries
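The dictionary idea above can be illustrated with a toy sketch: cluster the patches once, then compare a query patch only against members of its nearest cluster rather than every patch in the search window. This uses a plain k-means dictionary and flattened 1-D patch vectors; it is our illustration, not the paper's indexing scheme:

```python
import numpy as np

def nlm_with_dictionary(patches, query, h=0.2, n_clusters=8, seed=0):
    """Dictionary-accelerated NLM estimate for one query patch.
    patches: (N, P) array of flattened patches; query: (P,) vector."""
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), n_clusters, replace=False)]
    for _ in range(10):                                    # plain k-means refinement
        d = ((patches[:, None, :] - centroids[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = patches[labels == k].mean(0)
    # Look up the query's cluster and run NLM weighting only on its members,
    # with the same squared-distance measure the full NLM method uses.
    qk = ((query - centroids) ** 2).sum(-1).argmin()
    members = patches[labels == qk]
    w = np.exp(-((members - query) ** 2).sum(-1) / (h * h))
    return (w[:, None] * members).sum(0) / w.sum()
```

The paper's edge-patch variant would cluster gradient/edge patches instead of intensity patches, shrinking the dictionary further; the sketch keeps intensity patches for simplicity.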
The concept of tension is introduced in the framework of active contours with prior shape information,
and it is used to improve image segmentation. In particular, two properties of this new quantity are shown: 1)
high values of the tension correspond to undesired equilibrium points of the cost function under minimization
and 2) tension decreases if a curve is split into two or more parts. Based on these ideas, a tree is generated
whose nodes are different local minima of the cost function. Deeper nodes in the tree are expected to
correspond to lower values of the cost function. In this way, the search for the global optimum is reduced to
visiting and pruning a binary tree. The proposed method has been applied to the problem of fish segmentation
from low quality underwater images. Qualitative and quantitative comparison with existing algorithms based
on the Euler-Lagrange diffusion equations shows the superiority of the proposed approach in avoiding
undesired local minima.
ETPL
DIP - 192
Tension in Active Shapes
To evaluate multitarget video tracking results, one needs to quantify the accuracy of the estimated
target-size and the cardinality error as well as measure the frequency of occurrence of ID changes. In this
paper, we survey existing multitarget tracking performance scores and, after discussing their limitations, we
propose three parameter-independent measures for evaluating multitarget video tracking. The measures
consider target-size variations, combine accuracy and cardinality errors, quantify long-term tracking accuracy
at different accuracy levels, and evaluate ID changes relative to the duration of the track in which they occur.
We conduct an extensive experimental validation of the proposed measures by comparing them with existing
ones and by evaluating four state-of-the-art trackers on challenging real-world publicly-available data sets.
The software implementing the proposed measures is made available online to facilitate their use by the
research community.
ETPL
DIP - 193
Measures of Effective Video Tracking
Seed-based methods for region-based image segmentation are known to provide satisfactory results
for several applications and are usually easy to extend to multidimensional images. However, while boundary-
based methods like live wire can easily incorporate a preferred boundary orientation, region-based methods
are usually conceived for undirected graphs, and do not resolve well between boundaries with opposite
orientations. This motivated researchers to investigate extensions for some region-based frameworks, seeking
to better handle oriented transitions. In the same spirit, we discuss how to incorporate this orientation
information in a region-based approach called “IFT segmentation by seed competition” by exploring digraphs.
We give direct proof for the optimality of the proposed extensions in terms of energy functions associated with
the cuts. To stress these theoretical results, we also present an experimental evaluation that shows the obtained
gains in accuracy for some 2D and 3D data sets of medical images.
ETPL
DIP - 194
Oriented Image Foresting Transform Segmentation by Seed Competition
This paper presents a new framework for motion compensated frame rate up conversion (FRUC)
based on variational image fusion. The proposed algorithm consists of two steps: 1) generation of multiple
intermediate interpolated frames and 2) fusion of those intermediate frames. In the first step, we determine
four different sets of the motion vector field using four neighboring frames. We then generate intermediate
interpolated frames corresponding to the determined four sets of the motion vector field, respectively. Multiple
sets of the motion vector field are used to solve the occlusion problem in motion estimation. In the second
step, the four intermediate interpolated frames are fused into a single frame via a variational image fusion
process. For effective fusion, we determine fusion weights for each intermediate interpolated frame by
minimizing the energy, which consists of a weighted L1-norm based data energy and a gradient-driven
smoothness energy. Experimental results demonstrate that the proposed algorithm improves the performance
of FRUC compared with the existing algorithms.
ETPL
DIP - 195
Frame Rate Up Conversion Based on Variational Image Fusion
We introduce an adaptive continuous-domain modeling approach to texture and natural images. The
continuous-domain image is assumed to be a smooth function, and we embed it in a parameterized Sobolev
space. We point out a link between Sobolev spaces and stochastic auto-regressive models, and exploit it for
optimally choosing Sobolev parameters from available pixel values. To this aim, we use exact continuous-to-
discrete mapping of the auto-regressive model that is based on symmetric exponential splines. The mapping is
computationally efficient, and we exploit it for maximizing an approximated Gaussian likelihood function. We
account for non-Gaussian Lévy-type processes by deriving a more robust estimator that is based on the sample
auto-correlation sequence. Both estimators use multiple initialization values for overcoming the local minima
structure of the fitting criteria. Experimental image resizing results indicate that the auto-correlation criterion
can cope better with non-Gaussian processes and model mismatch. Our work demonstrates the importance of
the auto-correlation function in adaptive image interpolation and image modeling tasks, and we believe it is
instrumental in other image processing tasks as well.
ETPL
DIP - 196
Adaptive Image Resizing Based on Continuous-Domain Stochastic Modeling
A new method of image resolution up-conversion (image interpolation) based on maximum a
posteriori sequence estimation is proposed. Instead of making a hard decision about the value of each missing
pixel, we estimate the missing pixels in groups. At each missing pixel of the high resolution (HR) image, we
consider an ensemble of candidate interpolation methods (interpolation functions). The interpolation functions
are interpreted as states of a Markov model. In other words, the proposed method undergoes state transitions
from one missing pixel position to the next. Accordingly, the interpolation problem is translated to the
problem of estimating the optimal sequence of interpolation functions corresponding to the sequence of
missing HR pixel positions. We derive a parameter-free probabilistic model for this to-be-estimated sequence
of interpolation functions. Then, we solve the estimation problem using a trellis representation and the Viterbi
algorithm. Using directional interpolation functions and sequence estimation techniques, we classify the new
algorithm as an adaptive directional interpolation using soft-decision estimation techniques. Experimental
results show that the proposed algorithm yields images with higher or comparable peak signal-to-noise ratios
compared with some benchmark interpolation methods in the literature while being efficient in terms of
implementation and complexity considerations.
ETPL
DIP - 197
A MAP-Based Image Interpolation Method via Viterbi Decoding of Markov
Chains of Interpolation Functions
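The trellis/Viterbi step the abstract describes can be sketched generically: states are candidate interpolation functions, positions are missing HR pixels, and we seek the minimum-cost state sequence. Here `cost` and `trans` are illustrative stand-ins for the paper's data and transition terms:

```python
import numpy as np

def viterbi(cost, trans):
    """Minimum-cost state sequence through a trellis.
    cost[t, s]: cost of using interpolation function s at position t.
    trans[s0, s1]: cost of switching from function s0 to s1."""
    T, S = cost.shape
    acc = np.zeros((T, S))              # accumulated best cost per state
    back = np.zeros((T, S), dtype=int)  # backpointers for traceback
    acc[0] = cost[0]
    for t in range(1, T):
        step = acc[t - 1][:, None] + trans   # (S_prev, S_next) candidate costs
        back[t] = step.argmin(0)
        acc[t] = step.min(0) + cost[t]
    # Trace the optimal path backwards from the best final state.
    path = [int(acc[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In the paper's setting the result is a soft, sequence-level choice of directional interpolation function per missing pixel, rather than an independent hard decision at each pixel.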
An optical approach using confocal parabolic reflectors is used to transform 2D input data based on
spatial position to a 1D sequenced serial string. The optical input data are set up as a 2D array. Individual
channels are established between the input array and the final output detector, which reads the data as a
time-based serial stream. The transformation is achieved by changing the optical path length associated with each
pixel and its channel to the output detector. The 2D data can be images or individual sources but the light must
be parallel. This paper defines how to establish the channels and the calculations required to achieve the
desired transformation.
ETPL
DIP - 198
A Two Dimensional Optical Input to One Dimensional Serial Pulse
Transformation Using Confocal Reflectors
This paper presents a new lossless color image compression algorithm based on hierarchical
prediction and context-adaptive arithmetic coding. For the lossless compression of an RGB image, the image is first
decorrelated by a reversible color transform, and then the Y component is encoded by a conventional lossless
grayscale image compression method. For encoding the chrominance images, we develop a hierarchical
scheme that enables the use of upper, left, and lower pixels for the pixel prediction, whereas the conventional
raster scan prediction methods use upper and left pixels. An appropriate context model for the prediction error
is also defined, and arithmetic coding is applied to the error signal corresponding to each context. For
several sets of images, it is shown that the proposed method further reduces the bit rates compared with
JPEG2000 and JPEG-XR.
ETPL
DIP - 199
Hierarchical Prediction and Context Adaptive Coding for Lossless Color Image
Compression
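The abstract does not specify which reversible color transform is used; the JPEG2000 reversible color transform (RCT) is a standard integer-lossless choice and illustrates the decorrelation step that precedes the Y/chroma encoding:

```python
def rct_forward(r, g, b):
    """JPEG2000-style reversible color transform (integer, lossless)."""
    y = (r + 2 * g + b) >> 2          # floor((R + 2G + B) / 4)
    u = r - g                         # chroma difference
    v = b - g
    return y, u, v

def rct_inverse(y, u, v):
    """Exact inverse: G is recovered first, then R and B."""
    g = y - ((u + v) >> 2)            # uses floor((U + V)/4) = floor((R+2G+B)/4) - G
    r = u + g
    b = v + g
    return r, g, b
```

Losslessness follows because R + 2G + B = (U + V) + 4G, so the floored quarter cancels exactly in the inverse; Python's arithmetic right shift implements floor division for negative values of U + V as well.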