Elysium Technologies Private Limited Singapore | Madurai | Chennai | Trichy | Coimbatore | Ramnad
Pondicherry | Salem | Erode | Tirunelveli
http://www.elysiumtechnologies.com, [email protected]
ETPL NT-001 Answering “What-If” Deployment and Configuration Questions With WISE: Techniques and Deployment Experience
ETPL NT-002 Complexity Analysis and Algorithm Design for Advance Bandwidth Scheduling in Dedicated Networks
ETPL NT-003 Diffusion Dynamics of Network Technologies With Bounded Rational Users: Aspiration-Based Learning
ETPL NT-004 Delay-Based Network Utility Maximization
ETPL NT-005 A Distributed Control Law for Load Balancing in Content Delivery Networks
ETPL NT-006 Efficient Algorithms for Neighbor Discovery in Wireless Networks
ETPL NT-007 Stochastic Game for Wireless Network Virtualization
ETPL NT-008 ABC: Adaptive Binary Cuttings for Multidimensional Packet Classification
ETPL NT-009 A Utility Maximization Framework for Fair and Efficient Multicasting in Multicarrier Wireless Cellular Networks
ETPL NT-010 Achieving Efficient Flooding by Utilizing Link Correlation in Wireless Sensor Networks
ETPL NT-011 Random Walks and Green's Function on Digraphs: A Framework for Estimating Wireless Transmission Costs
ETPL NT-012 A Flexible Platform for Hardware-Aware Network Experiments and a Case Study on Wireless Network Coding
ETPL NT-013 Exploring the Design Space of Multichannel Peer-to-Peer Live Video Streaming Systems
ETPL NT-014 Secondary Spectrum Trading—Auction-Based Framework for Spectrum Allocation and Profit Sharing
ETPL NT-015 Towards Practical Communication in Byzantine-Resistant DHTs
ETPL NT-016 Semi-Random Backoff: Towards Resource Reservation for Channel Access in Wireless LANs
ETPL NT-017 Entry and Spectrum Sharing Scheme Selection in Femtocell Communications Markets
ETPL NT-018 On Replication Algorithm in P2P VoD
ETPL NT-019 Back-Pressure-Based Packet-by-Packet Adaptive Routing in Communication Networks
ETPL NT-020 Scheduling in a Random Environment: Stability and Asymptotic Optimality
ETPL NT-021 An Empirical Interference Modeling for Link Reliability Assessment in Wireless Networks
ETPL NT-022 On Downlink Capacity of Cellular Data Networks With WLAN/WPAN Relays
ETPL NT-023 Centralized and Distributed Protocols for Tracker-Based Dynamic Swarm Management
ETPL NT-024 Localization of Wireless Sensor Networks in the Wild: Pursuit of Ranging Quality
ETPL NT-025 Control of Wireless Networks With Secrecy
ETPL NT-026 ICTCP: Incast Congestion Control for TCP in Data-Center Networks
ETPL NT-027 Context-Aware Nanoscale Modeling of Multicast Multihop Cellular Networks
ETPL NT-028 Moment-Based Spectral Analysis of Large-Scale Networks Using Local Structural Information
ETPL NT-029 Internet-Scale IPv4 Alias Resolution With MIDAR
ETPL NT-030 Time-Bounded Essential Localization for Wireless Sensor Networks
ETPL NT-031 Stability of FIPP p-Cycles Under Dynamic Traffic in WDM Networks
ETPL NT-032 Cooperative Carrier Signaling: Harmonizing Coexisting WPAN and WLAN Devices
ETPL NT-033 Mobility Increases the Connectivity of Wireless Networks
ETPL NT-034 Topology Control for Effective Interference Cancellation in Multiuser MIMO Networks
ETPL NT-035 Distortion-Aware Scalable Video Streaming to Multinetwork Clients
ETPL NT-036 Combined Optimal Control of Activation and Transmission in Delay-Tolerant Networks
ETPL NT-037 A Low-Complexity Congestion Control and Scheduling Algorithm for Multihop Wireless
In inverse synthetic aperture radar (ISAR) imaging, a target is usually regarded as consisting of a few
strong (specular) scatterers, and the distribution of these strong scatterers is sparse in the imaging volume. In
this paper, we propose to incorporate a sparse signal recovery method into a 3D multiple-input multiple-
output (MIMO) radar imaging algorithm. A sequential order one negative exponential (SOONE) function, which forms
a homotopy between the $\ell_1$ and $\ell_0$ norms, is proposed to measure sparsity. Gradient projection is
used to solve a constrained nonconvex SOONE-minimization problem and recover the sparse signal.
However, while the gradient projection method is computationally simple, it is not robust when a matrix in the
algorithm is ill conditioned. We therefore further propose diagonal loading and singular value decomposition
methods to improve the robustness of the algorithm. To handle targets with large flat surfaces,
a combined amplitude and total-variation objective function is also proposed to regularize the shapes of the
flat surfaces. Simulation results show that the proposed gradient projection of the SOONE function method is
better than orthogonal matching pursuit, CoSaMP, $\ell_1$-magic, the Bayesian method with Laplace prior, the
smoothed $\ell_0$ method, and $\ell_1$-$\ell_s$ in high-SNR cases for recovery of $\pm 1$ random-spike
sparse signals. The quality of the simulated 3D images and real-data ISAR images obtained using the
new method is better than that of the conventional correlation method and the minimum
$\ell_2$-norm method, and competitive with the aforementioned sparse signal recovery algorithms.
ETPL DIP-001 MIMO Radar 3D Imaging Based on Combined Amplitude and Total Variation Cost Function With Sequential Order One Negative Exponential Form
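The SOONE homotopy between the $\ell_1$ and $\ell_0$ norms lends itself to a quick numerical illustration. The exact form used in the paper is not reproduced in this summary, so the sketch below assumes the common order-one negative exponential measure G_sigma(x) = sum_i (1 - exp(-|x_i|/sigma)); the function and parameter names are our own.

```python
import numpy as np

def soone(x, sigma):
    """SOONE sparsity measure: sum_i (1 - exp(-|x_i| / sigma)).

    As sigma -> 0 the sum approaches the l0 "norm" (nonzero count);
    for sigma much larger than max|x_i| it behaves like ||x||_1 / sigma,
    which is how it forms a homotopy between the two norms.
    """
    return np.sum(1.0 - np.exp(-np.abs(x) / sigma))

x = np.array([0.0, 0.5, -2.0, 0.0, 1.0])
l0_like = soone(x, 1e-3)         # close to 3.0, the number of nonzeros
l1_like = soone(x, 1e3) * 1e3    # close to 3.5, the l1 norm of x
```

A gradient-projection solver would descend on G_sigma while projecting iterates onto the data-consistency constraint, typically shrinking sigma over iterations to approach the $\ell_0$ objective.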
Capturing aerial imagery at high resolutions often leads to very low frame-rate video streams, well
under full-motion-video standards, due to bandwidth, storage, and cost constraints. Low frame rates
make registration difficult when an aircraft is moving at high speed or when the global positioning system (GPS)
contains large errors or fails. We present a method that takes advantage of persistent cyclic video data
collections to perform online registration with drift correction. We split the persistent aerial imagery
collection into individual cycles of the scene, identify and correct the registration errors on the first cycle in a
batch operation, and then use the corrected base cycle as a reference pass to register and correct subsequent
passes online. A set of multi-view panoramic mosaics is then constructed for each aerial pass for
representation, presentation, and exploitation of the 3D dynamic scene. These sets of mosaics are all in
alignment with the reference cycle, allowing their direct use in change detection, tracking, and 3D
reconstruction/visualization algorithms. Stereo viewing with adaptive baselines and varying view angles is
realized by choosing a pair of mosaics from a set of multi-view mosaics. Further, the mosaics for the second
and later passes can be generated and visualized online, as there is no further batch error correction.
ETPL DIP-002 Persistent Aerial Video Registration and Fast Multi-View Mosaicing
In this paper, we propose FeatureMatch, a generalised approximate nearest-neighbour field (ANNF)
computation framework between a source and a target image. The proposed algorithm can estimate ANNF maps
between any image pair, not necessarily related. This generalisation is achieved through appropriate spatial-
range transforms. To compute ANNF maps, global colour adaptation is applied as a range transform on the
source image. Image patches from the pair of images are approximated using low-dimensional features, which
are used along with a KD-tree to estimate the ANNF map. This ANNF map is further improved based on image
coherency and spatial transforms. The proposed generalisation enables us to handle a wider range of
vision applications that have not been tackled using the ANNF framework. We illustrate two
such applications, namely: 1) optic disk detection and 2) super resolution. The first application deals with
medical imaging, where we locate optic disks in retinal images using a healthy optic disk image as a common
target image. The second application deals with super resolution of synthetic images using a common source
image as a dictionary. We make use of ANNF mappings in both applications and show experimentally that
our proposed approaches are faster and more accurate than the state-of-the-art techniques.
ETPL DIP-003 FeatureMatch: A General ANNF Estimation Technique and its Applications
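The core ANNF step described above (low-dimensional patch features followed by a KD-tree lookup) can be sketched as follows. The block-mean features, patch size, and use of SciPy's `cKDTree` are illustrative assumptions, not the paper's actual feature design.

```python
import numpy as np
from scipy.spatial import cKDTree

def extract_patch_features(img, p=4):
    """Describe every p x p patch by 5 block means (a stand-in for the
    paper's low-dimensional features; this choice is illustrative)."""
    H, W = img.shape
    feats, coords = [], []
    h = p // 2
    for y in range(H - p + 1):
        for x in range(W - p + 1):
            pa = img[y:y + p, x:x + p]
            feats.append([pa.mean(),
                          pa[:h, :h].mean(), pa[:h, h:].mean(),
                          pa[h:, :h].mean(), pa[h:, h:].mean()])
            coords.append((y, x))
    return np.asarray(feats), coords

def annf(src, tgt, p=4):
    """For each source patch, the coordinates of the nearest target
    patch in feature space, found with a KD-tree."""
    fs, cs = extract_patch_features(src, p)
    ft, ct = extract_patch_features(tgt, p)
    _, idx = cKDTree(ft).query(fs)
    return {c: ct[j] for c, j in zip(cs, idx)}

rng = np.random.default_rng(0)
img = rng.random((16, 16))
m = annf(img, img)   # identical images: every patch finds a zero-distance match
```

A real pipeline would add the spatial-range transforms and coherency refinement on top of this raw feature lookup.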
Newly developed hypertext transfer protocol (HTTP)-based video streaming technologies enable
flexible rate adaptation under varying channel conditions. Accurately predicting the users' quality of
experience (QoE) for rate-adaptive HTTP video streams is thus critical to achieving efficiency. An important
aspect of understanding and modeling QoE is predicting the up-to-the-moment subjective quality of a video as
it is played, which is difficult due to hysteresis effects and nonlinearities in human behavioral responses. This
paper presents a Hammerstein-Wiener model for predicting the time-varying subjective quality (TVSQ)
of rate-adaptive videos. To collect data for model parameterization and validation, a database of longer-
duration videos with time-varying distortions was built, and the TVSQs of the videos were measured in a large-
scale subjective study. The proposed method is able to reliably predict the TVSQ of rate-adaptive videos.
Since the Hammerstein-Wiener model has a very simple structure, the proposed method is suitable for online
TVSQ prediction in HTTP-based streaming.
ETPL DIP-004 Modeling the Time-Varying Subjective Quality of HTTP Video Streams With Rate Adaptations
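A Hammerstein-Wiener model is simply a static input nonlinearity, a linear dynamic block, and a static output nonlinearity in cascade. The sketch below shows that structure on a toy bitrate trace; the specific nonlinearities, filter taps, and rating scale are hypothetical placeholders, not the fitted model from the study.

```python
import numpy as np

def hammerstein_wiener(u, taps, f_in, f_out):
    """Hammerstein-Wiener cascade: static nonlinearity -> linear FIR
    filter (the dynamic/hysteresis part) -> static nonlinearity."""
    x = f_in(u)                         # input nonlinearity
    v = np.convolve(x, taps)[:len(u)]   # causal FIR dynamics
    return f_out(v)                     # output nonlinearity

# Hypothetical parameterization for a toy bitrate trace (Mbit/s):
bitrate = np.array([1.0, 4.0, 4.0, 0.5, 0.5, 4.0])
taps = np.array([0.5, 0.3, 0.2])        # ~3 samples of quality "memory"
q = hammerstein_wiener(bitrate, taps,
                       f_in=np.log1p,   # diminishing returns in rate
                       f_out=lambda v: np.clip(5 * v / np.log1p(4.0), 1, 5))
```

The FIR block is what lets the model capture hysteresis: a quality drop keeps depressing predicted TVSQ for a few samples after the bitrate recovers.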
We present a noniterative multiresolution motion estimation strategy involving block-based
comparisons in each detail band of a Laplacian pyramid. A novel matching score is developed and
analyzed. The proposed matching score is based on a class of nonlinear transformations of Laplacian detail
bands, yielding 1-bit or 2-bit representations. The matching score is evaluated in a dense full-
search motion estimation setting, with synthetic video frames and an optical flow data set. Together with a
strategy for combining the matching scores across resolutions, the proposed method is shown to produce
smoother and more robust estimates than mean square error (MSE) in each detail band and combined. It
tolerates more nontranslational motion, such as rotation, validating the analysis, while providing much
better localization of motion discontinuities. We also provide an efficient implementation of
the motion estimation strategy and show that its computational complexity is close to that of the traditional
MSE block-based full-search motion estimation procedure.
ETPL DIP-005 Nonlinear Transform for Robust Dense Block-Based Motion Estimation
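The 1-bit matching idea above can be sketched as a sign-of-Laplacian transform whose blocks are compared by Hamming distance under a full search. This is a simplified stand-in for the paper's family of nonlinear transforms; the block size, search range, and Laplacian stencil are our choices.

```python
import numpy as np

def laplacian(img):
    """4-neighbour discrete Laplacian (zero at the borders)."""
    out = np.zeros_like(img, dtype=float)
    out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] +
                       img[1:-1, :-2] + img[1:-1, 2:] -
                       4 * img[1:-1, 1:-1])
    return out

def onebit(img):
    """1-bit nonlinear transform: sign of the Laplacian detail band."""
    return laplacian(img) > 0

def match_block(prev_bits, cur_bits, y, x, b=8, r=4):
    """Full-search block matching under a Hamming-distance score."""
    block = cur_bits[y:y + b, x:x + b]
    best, best_mv = b * b + 1, (0, 0)
    H, W = prev_bits.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy and yy + b <= H and 0 <= xx and xx + b <= W:
                score = np.count_nonzero(prev_bits[yy:yy + b, xx:xx + b] ^ block)
                if score < best:
                    best, best_mv = score, (dy, dx)
    return best_mv

rng = np.random.default_rng(1)
frame0 = rng.random((32, 32))
frame1 = np.roll(frame0, (2, 3), axis=(0, 1))   # known shift of (2, 3)
mv = match_block(onebit(frame0), onebit(frame1), 12, 12)
```

The 1-bit codes make each candidate comparison a 64-bit XOR-and-popcount, which is where the efficiency relative to full MSE matching comes from.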
Photo cropping is a widely used tool in the printing industry, photography, and cinematography.
Conventional cropping models suffer from three challenges. First, they deemphasize semantic
content, which is often far more important than low-level features in photo aesthetics. Second, the
existing models lack a sequential ordering; in contrast, humans look at semantically important regions
sequentially when viewing a photo. Third, it is difficult to leverage inputs from multiple users, although
experience from multiple users is particularly critical in cropping since photo assessment is quite a
subjective task. To address these challenges, this paper proposes semantics-aware photo cropping,
which crops a photo by simulating the process of humans sequentially perceiving its semantically
important regions. We first project the local features (graphlets in this paper) onto the semantic
space, which is constructed based on the category information of the training photos. An efficient
learning algorithm is then derived to sequentially select semantically representative graphlets of a
photo, and the selection process can be interpreted as a path, which simulates humans actively
perceiving semantics in a photo. Furthermore, we learn a prior distribution of such active graphlet
paths from training photos that are marked as aesthetically pleasing by multiple users. The learned
priors enforce the active graphlet path of a test photo to be maximally similar to those from the
training photos. Experimental results show that: 1) the active graphlet path accurately predicts human
gaze shifting, and thus is more indicative of photo aesthetics than conventional saliency maps; and
2) the cropped photos produced by our approach outperform those of its competitors in both
qualitative and quantitative comparisons.
ETPL DIP-006 Actively Learning Human Gaze Shifting Paths for Semantics-Aware Photo Cropping
In the framework of texture image retrieval, a new family of stochastic multivariate models is
proposed based on Gaussian copulas and wavelet decompositions. We take advantage of the copula paradigm,
which makes it possible to separate dependence structure from marginal behavior. We introduce two
new multivariate models using, respectively, generalized Gaussian and Weibull densities.
These models capture both the subband marginal distributions and the correlation
between wavelet coefficients. We derive, as a similarity measure, a closed-form expression for the Jeffrey
divergence between Gaussian copula-based multivariate models. Experimental results on well-known
databases show significant improvements in retrieval rates using the proposed method compared with the best
known state-of-the-art approaches.
ETPL DIP-007 Gaussian Copula Multivariate Modeling for Texture Image Retrieval Using Wavelet Transforms
Visual features are successfully exploited in several applications (e.g., visual search, object
recognition, and tracking) due to their ability to efficiently represent image content. Several visual analysis
tasks require features to be transmitted over a bandwidth-limited network, thus calling for coding techniques
that reduce the required bit budget while attaining a target level of efficiency. In this paper, we propose, for the
first time, a coding architecture designed for local features (e.g., SIFT, SURF) extracted from video sequences.
To achieve high coding efficiency, we exploit both spatial and temporal redundancy by means of intraframe
and interframe coding modes. In addition, we propose a coding mode decision based on rate-distortion
optimization. The proposed coding scheme can be conveniently adopted to implement the analyze-then-
compress (ATC) paradigm in the context of visual sensor networks. That is, sets
of visual features are extracted from video frames, encoded at remote nodes, and finally transmitted to a central
controller that performs visual analysis. This is in contrast to the traditional compress-then-analyze (CTA)
paradigm, in which video sequences acquired at a node are compressed and then sent to a central unit for
further processing. In this paper, we compare these coding paradigms using metrics that are routinely adopted
to evaluate the suitability of visual features in the context of content-based retrieval, object recognition, and
tracking. Experimental results demonstrate that, thanks to the significant coding gains achieved by the
proposed coding scheme, ATC outperforms CTA with respect to all evaluation metrics.
ETPL DIP-008 Coding Visual Features Extracted From Video Sequences
In this paper, a novel fuzzy rule-based prediction framework is developed for high-quality
image zooming. In classical interpolation-based image zooming, resolution is increased by inserting
pixels using certain interpolation techniques. Here, we propose a patch-based image zooming
technique in which each low-resolution (LR) image patch is replaced by an estimated high-
resolution (HR) patch. Since an LR patch can be generated from any of many possible HR
patches, it is natural to develop rules that find different possible HR patches and then
combine them according to rule strength to obtain the estimated HR patch. We generate a large
number of LR-HR patch pairs from a collection of natural images, group them into different clusters,
and then generate a fuzzy rule for each of these clusters. The rule parameters are also learned from
these LR-HR patch pairs. As a result, an efficient mapping from the LR patch space to the HR patch
space can be formulated. The performance of the proposed method is tested on different images and
compared with representative as well as state-of-the-art image zooming techniques.
Experimental results show that the proposed method is better than the competing methods and is
capable of efficiently reconstructing thin lines, edges, fine details, and textures in the image.
ETPL DIP-009 A Fuzzy-Rule-Based Approach for Single Frame Super Resolution
In this paper, we propose a novel approach for integrating multiple tracking cues within a
unified probabilistic graph-based Markov random fields (MRFs) representation. We show how to integrate
temporal and spatial cues encoded by unary and pairwise probabilistic potentials. As the inference of such
high-order MRF models is known to be NP-hard, we propose an efficient spectral relaxation-based inference
scheme. The proposed scheme is exemplified by applying it to a mixture of five tracking cues, and is shown to
be applicable to wider sets of cues. This paves the way for a modular plug-and-play tracking framework that
can be easily adapted to diverse tracking scenarios. The proposed scheme is experimentally shown to compare
favorably with contemporary state-of-the-art schemes, and provides accurate tracking results.
ETPL DIP-010 A Probabilistic Graph-Based Framework for Plug-and-Play Multi-Cue Visual Tracking
This paper deals with fast and accurate visualization of pushbroom image data from airborne and
spaceborne platforms. A pushbroom sensor acquires images in a line-scanning fashion, and this results
in scattered input data that need to be resampled onto a uniform grid for geometrically correct visualization. To
this end, we model the anisotropic spatial dependence structure caused by the acquisition process. Several
methods for scattered data interpolation are then adapted to handle the induced anisotropic metric and compared
for the pushbroom image rectification problem. A trick that exploits the semiordered line structure
of pushbroom data to reduce the computational cost by several orders of magnitude is also presented.
ETPL DIP-011 Anisotropic Scattered Data Interpolation for Pushbroom Image Rectification
Two model-based algorithms for edge detection in spectral imagery are developed that specifically
target capturing intrinsic features such as isoluminant edges, which are characterized by a jump in color but not
in intensity. Given prior knowledge of the classes of reflectance or emittance spectra associated with candidate
objects in a scene, a small set of spectral-band ratios, which most profoundly identify the edge between each
pair of materials, is selected to define an edge signature. The bands that form the edge signature are fed into a
spatial mask, producing a sparse joint spatiospectral nonlinear operator. The first algorithm
achieves edge detection for every material pair by matching the response of the operator at every pixel with
the edge signature for the pair of materials. The second algorithm is a classifier-enhanced extension of the first
that adaptively accentuates distinctive features before applying the spatiospectral operator. Both
algorithms are extensively verified using spectral imagery from an airborne hyperspectral imager and from a
dots-in-a-well midinfrared imager. In both cases, the multicolor gradient (MCG) and the hyperspectral/spatial
detection of edges (HySPADE) edge detectors are used as benchmarks for comparison. The results
demonstrate that the proposed algorithms outperform the MCG and HySPADE edge detectors in accuracy,
especially when isoluminant edges are present. By requiring only a few bands as input to
the spatiospectral operator, the algorithms enable significant levels of data compression in band selection. In
the presented examples, the required operations per pixel are reduced by a factor of 71 with respect to those
required by the MCG edge detector.
ETPL DIP-012 Model-Based Edge Detector for Spectral Imagery Using Sparse Spatiospectral Masks
Disparity estimation is a fundamental task in stereo imaging and a well-studied problem. Recently,
methods have been adapted to the video domain, where motion is used as a matching criterion to help
disambiguate spatially similar candidates. In this paper, we analyze the validity of the underlying assumptions
of spatio-temporal disparity estimation and determine the extent to which motion aids the matching process.
By analyzing the error signal for spatio-temporal block matching under the sum-of-squared-differences criterion
and treating motion as a stochastic process, we determine the probability of a false match as a function of
image features, motion distribution, image noise, and the number of frames in the spatio-temporal patch. This
performance quantification provides insight into when spatio-temporal matching is most beneficial in terms of
the scene and motion, and can be used as a guide for selecting parameters of stereo matching algorithms. We
validate our results through simulation and experiments on stereo video.
ETPL DIP-013 Discriminability Limits in Spatio-Temporal Stereo Block Matching
We propose a novel representation for stereo videos, namely 2D-plus-depth-cue. This representation
encodes stereo videos compactly by leveraging the by-product of a stereo video conversion process.
Specifically, the depth cues are derived from an interactive labeling process during 2D-to-
stereo video conversion: they are contour points of image regions, their corresponding depth models, and so
forth. Using such cues and the image features of the 2D video frames, the scene depth can be reliably recovered.
Experimental results demonstrate that about 10%-50% of the bit rate can be saved in coding
a stereo video compared with multiview video coding and 2D-plus-depth methods. In addition, since the
objects are segmented during the conversion process, it is convenient to adopt region-of-interest (ROI) coding
in the proposed stereo video coding system. Experimental results show that using ROI coding, the bit rate is
reduced by 30%-40%, or the video quality is increased by 1.5-4 dB at a fixed bit rate.
ETPL DIP-014 A Compact Representation for Compressing Converted Stereo Videos
In this paper, we propose a robust object tracking algorithm based on
a sparse collaborative model that exploits both holistic templates and local representations to account for
drastic appearance changes. Within the proposed collaborative appearance model, we develop
a sparse discriminative classifier (SDC) and a sparse generative model (SGM) for object tracking. In the SDC
module, we present a classifier that separates the foreground object from the background based on holistic
templates. In the SGM module, we propose a histogram-based method that takes the spatial information of
each local patch into consideration. The update scheme considers both the most recent observations and the
original templates, thereby enabling the proposed algorithm to deal with appearance changes effectively and
alleviate the tracking drift problem. Numerous experiments on various challenging videos demonstrate that the
proposed tracker performs favorably against several state-of-the-art algorithms.
ETPL DIP-015 Robust Object Tracking via Sparse Collaborative Appearance Model
We propose a new set of moment invariants based on Krawtchouk polynomials
for the comparison of local patches in 2D images. Being computed from discrete functions, these moments do
not carry discretization error. Unlike many orthogonal moments, which usually capture global features,
Krawtchouk moments can be used to compute local descriptors from a region of interest in an image. This can
be achieved by changing two parameters, thereby shifting the center of the interest region horizontally,
vertically, or both. This property enables comparison of two arbitrary local regions. We show that
Krawtchouk moments can be written as a linear combination of geometric moments, and so are easily converted
to rotation-, size-, and position-independent invariants. We also construct local Hu-
based invariants by applying Hu invariants to images localized by the weight function given in the
definition of Krawtchouk polynomials. We give the formulation of the local Krawtchouk-based and Hu-
based invariants and evaluate their discriminative performance on local comparison of artificially generated
test images.
ETPL DIP-016 Comparison of Image Patches Using Local Moment Invariants
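The Hu-invariant idea above rests on ordinary geometric and central moments. The sketch below computes the first two Hu invariants of an image and checks their translation invariance; localization by the Krawtchouk weight function is omitted for brevity, so this is only the classical building block.

```python
import numpy as np

def central_moments(img, order=2):
    """Central geometric moments mu_pq up to the given order."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]].astype(float)
    m00 = img.sum()
    xc, yc = (x * img).sum() / m00, (y * img).sum() / m00
    return {(p, q): (((x - xc) ** p) * ((y - yc) ** q) * img).sum()
            for p in range(order + 1) for q in range(order + 1)}

def hu_first_two(img):
    """First two Hu invariants from scale-normalized central moments."""
    mu = central_moments(img)
    eta = lambda p, q: mu[(p, q)] / mu[(0, 0)] ** (1 + (p + q) / 2)
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    return np.array([n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2])

rng = np.random.default_rng(2)
patch = rng.random((8, 8))
a = np.zeros((24, 24)); a[2:10, 3:11] = patch     # patch at one position
b = np.zeros((24, 24)); b[12:20, 9:17] = patch    # same patch, translated
```

Because the moments are taken about the intensity centroid and normalized by m00, the two placements of the patch yield identical invariants.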
We present a novel scale-invariant image feature detection algorithm (D-SIFER) using a newly
proposed scale-space-optimal 10th-order Gaussian derivative (GDO-10) filter, which reaches the jointly
optimal Heisenberg uncertainty of its impulse response in scale and space simultaneously (i.e., we minimize
the maximum of the two moments). The D-SIFER algorithm using this filter leads to an outstanding quality
of image feature detection, with a factor-of-three quality improvement over the state-of-the-art scale-
invariant feature transform (SIFT) and speeded-up robust features (SURF) methods, which use second-order
Gaussian derivative filters. To reach low computational complexity, we also present a technique approximating
the GDO-10 filters with a fixed-length implementation that is independent of the scale. The final
approximation error remains far below the noise margin, providing constant-time, low-cost, but nevertheless
high-quality feature detection and registration capabilities. D-SIFER is validated on a real-life
hyperspectral image registration application, precisely aligning up to hundreds of successive narrowband
color images despite the strong artifacts (blurring, low-light noise) typically occurring in such delicate
optical system setups.
ETPL DIP-017 Derivative-Based Scale Invariant Image Feature Detector With Error Resilience
Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) of the kidneys requires proper
motion correction and segmentation to enable estimation of the glomerular filtration rate through
pharmacokinetic modeling. Traditionally, co-registration, segmentation, and pharmacokinetic modeling have
been applied sequentially as separate processing steps. In this paper, a combined 4D model for
simultaneous registration and segmentation of the whole kidney is presented. To demonstrate the model in
numerical experiments, we used normalized gradients as the data term in the registration and, for supervised
segmentation, a Mahalanobis distance from the time courses of the segmented regions to a training set. By
applying this framework to an input consisting of 4D image time series, we conduct simultaneous motion
correction and two-region segmentation into kidney and background. The potential of the new approach is
demonstrated on real DCE-MRI data from ten healthy volunteers.
ETPL DIP-018 Segmentation-Driven Image Registration: Application to 4D DCE-MRI Recordings of the Moving Kidneys
We present a sequential framework for change detection. This framework allows us to use
multiple images from reference and mission passes of a scene of interest in order to
improve detection performance. It includes a change statistic that is easily updated when additional data
become available. Detection performance using this statistic is predictable when the reference and image data
are drawn from known distributions. We verify our performance prediction by simulation. Additionally, we
show that detection performance improves with additional measurements on a set of synthetic aperture
radar images and a set of visible images with unknown probability distributions.
ETPL DIP-019 A Sequential Framework for Image Change Detection
We introduce a family of novel image regularization penalties
called generalized higher degree total variation (HDTV). These penalties further extend our previously
introduced HDTV penalties, which generalize the popular total variation (TV) penalty to
incorporate higher-degree image derivatives. We show that many of the proposed second-degree extensions of
TV are special cases of, or are closely approximated by, a generalized HDTV penalty. Additionally, we propose
a novel fast alternating minimization algorithm for solving image recovery problems
with HDTV and generalized HDTV regularization. The new algorithm enjoys a tenfold speedup compared
with the iteratively reweighted majorize-minimize algorithm proposed in a previous paper. Numerical
experiments on 3D magnetic resonance images and 3D microscopy images show
that HDTV and generalized HDTV improve image quality significantly compared with TV.
ETPL DIP-020 Generalized Higher Degree Total Variation (HDTV) Regularization
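As background for the HDTV penalties, the first-degree TV baseline they generalize can be minimized directly. The sketch below runs plain gradient descent on a smoothed isotropic TV denoising objective; the smoothing parameter, step size, and regularization weight are our choices, and the paper's actual algorithm (fast alternating minimization with higher-degree derivatives) is considerably more elaborate.

```python
import numpy as np

def tv_energy(u, f, lam, eps=0.1):
    """0.5 * ||u - f||^2 + lam * sum sqrt(|grad u|^2 + eps^2)."""
    ux = np.diff(u, axis=1, append=u[:, -1:])   # forward differences,
    uy = np.diff(u, axis=0, append=u[-1:, :])   # zero at the far border
    return 0.5 * ((u - f) ** 2).sum() + lam * np.sqrt(ux**2 + uy**2 + eps**2).sum()

def tv_denoise(f, lam=0.15, steps=100, tau=0.05, eps=0.1):
    """Gradient descent on the smoothed TV objective above."""
    u = f.copy()
    for _ in range(steps):
        ux = np.diff(u, axis=1, append=u[:, -1:])
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux**2 + uy**2 + eps**2)
        px, py = ux / mag, uy / mag
        dx = px.copy(); dx[:, 1:] -= px[:, :-1]   # negative adjoint of the
        dy = py.copy(); dy[1:, :] -= py[:-1, :]   # forward-difference gradient
        u -= tau * ((u - f) - lam * (dx + dy))
    return u

rng = np.random.default_rng(3)
f = np.zeros((16, 16)); f[:, 8:] = 1.0            # step edge
f += 0.1 * rng.standard_normal(f.shape)           # plus noise
u = tv_denoise(f)                                 # smoother, edge-preserving
```

With the step size below 1/L for the objective's Lipschitz constant, each iteration decreases the energy, which is the property HDTV methods preserve while penalizing higher-degree derivatives to avoid TV's staircasing.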
In biometrics research and industry, it is critical yet challenging to match infrared face images to
optical face images. The major difficulty lies in the great discrepancy between an infrared face image and the
corresponding optical face image, because they are captured by different devices (an optical imaging device
and an infrared imaging device). This paper presents a new approach called common feature discriminant
analysis to reduce this discrepancy and improve optical-infrared face recognition performance. In this
approach, a new learning-based face descriptor is first proposed to extract the common features from
heterogeneous face images (infrared face images and optical face images), and an effective matching method is
then applied to the resulting features to obtain the final decision. Extensive experiments are conducted on two
large and challenging optical-infrared face data sets to show the superiority of our approach over the state-of-
the-art.
ETPL
DIP - 022
Common Feature Discriminant Analysis for Matching Infrared Face Images to
Optical Face Images
In this paper, a probability-based rendering (PBR) method is described for reconstructing an intermediate view
with a steady-state matching probability (SSMP) density function. Conventionally, given multiple reference
images, the intermediate view is synthesized via the depth image-based rendering technique in which
geometric information (e.g., depth) is explicitly leveraged, thus leading to serious rendering artifacts on the
synthesized view even with small depth errors. We address this problem by formulating the rendering process
as an image fusion in which the textures of all probable matching points are adaptively blended with the SSMP
representing the likelihood that points among the input reference images are matched. The PBR hence
becomes more robust against depth estimation errors than existing view synthesis approaches. The MP in the
steady-state, SSMP, is inferred for each pixel via the random walk with restart (RWR). The RWR always
guarantees visually consistent MP, as opposed to conventional optimization schemes (e.g., diffusion or
filtering-based approaches), the accuracy of which heavily depends on parameters used. Experimental results
demonstrate the superiority of the PBR over the existing view synthesis approaches both qualitatively and
quantitatively. In particular, the PBR is effective in suppressing flicker artifacts in virtual video rendering,
even though no temporal aspect is considered. Moreover, it is shown that the depth map itself calculated from our
RWR-based method (by simply choosing the most probable matching point) is also comparable with that of
the state-of-the-art local stereo matching methods.
ETPL
DIP - 021
Probability-Based Rendering for View Synthesis
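The steady-state matching probability above rests on the random walk with restart (RWR), whose fixed point satisfies p = (1-c) W p + c r. The sketch below is a generic RWR solver iterated to that fixed point on a toy graph, not the paper's pixel-wise formulation; the transition matrix `W`, restart vector `r`, and restart probability `c` are illustrative.

```python
import numpy as np

def rwr_steady_state(W, restart, c=0.15, tol=1e-10, max_iter=1000):
    """Random walk with restart: iterate p <- (1-c) W p + c r to a fixed point.

    W must be column-stochastic (each column sums to 1); `restart` is the
    restart distribution r. The fixed point is the steady-state probability,
    which also equals c (I - (1-c) W)^{-1} r in closed form.
    """
    p = restart.copy()
    for _ in range(max_iter):
        p_next = (1.0 - c) * W @ p + c * restart
        if np.linalg.norm(p_next - p, 1) < tol:
            return p_next
        p = p_next
    return p

# Tiny 3-node chain: column-stochastic transition matrix
W = np.array([[0.0, 0.5, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 0.5, 0.0]])
r = np.array([1.0, 0.0, 0.0])  # restart at node 0
p = rwr_steady_state(W, r)
```

Because the update is a contraction with factor (1-c), the iteration always converges, which is what makes the steady-state MP well defined independently of tuning parameters.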
Objective measures to automatically predict the perceptual quality of images or videos can reduce the
time and cost requirements of end-to-end quality monitoring. For reliable quality predictions,
these objective quality measures need to respond consistently with the behavior of the human
visual system (HVS). In practice, many important HVS mechanisms are too complex to be modeled directly.
Instead, they can be mimicked by machine learning systems, trained on subjective quality assessment
databases, and applied on predefined objective quality measures for specific content or distortion classes. On
the downside, machine learning systems are often difficult to interpret and may even contradict the
input objective quality measures, leading to unreliable quality predictions. To address this problem, we
developed an interpretable machine learning system for objective quality assessment, namely
the locally adaptive fusion (LAF). This paper describes the LAF system and compares its performance with
traditional machine learning. As it turns out, the LAF system is more consistent with the input measures and
can better handle heteroscedastic training data.
ETPL
DIP - 023
A Locally Adaptive System for the Fusion of Objective Quality Measures
Natural image statistics plays an important role in image denoising, and various natural image priors,
including gradient-based, sparse representation-based, and nonlocal self-similarity-based ones, have been
widely studied and exploited for noise removal. In spite of the great success of many denoising algorithms,
they tend to smooth the fine-scale image textures when removing noise, degrading the image visual quality. To
address this problem, in this paper, we propose a texture-enhanced image denoising method by enforcing
the gradient histogram of the denoised image to be close to a reference gradient histogram of the
original image. Given the reference gradient histogram, a novel gradient histogram preservation (GHP)
algorithm is developed to enhance the texture structures while removing noise. Two region-based variants of
GHP are proposed for the denoising of images consisting of regions with different textures. An algorithm is
also developed to effectively estimate the reference gradient histogram from the noisy observation of the
unknown image. Our experimental results demonstrate that the proposed GHP algorithm can well preserve
the texture appearance in the denoised images, making them look more natural.
ETPL
DIP - 024
Gradient Histogram Estimation and Preservation for Texture Enhanced Image
Denoising
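The core idea of pushing a denoised image's gradient histogram toward a reference can be illustrated with plain histogram specification applied to gradient magnitudes. This is a generic sketch, not the paper's GHP algorithm; the function and variable names are illustrative assumptions.

```python
import numpy as np

def match_histogram(values, reference):
    """Monotone histogram specification: remap `values` so their empirical
    distribution matches that of `reference` (both 1-D arrays)."""
    order = np.argsort(values)
    ranks = np.empty_like(order)
    ranks[order] = np.arange(len(values))
    q = (ranks + 0.5) / len(values)       # quantile of each value in its own distribution
    return np.quantile(reference, q)      # map to the same quantile of the reference

# Toy example: an over-smoothed gradient-magnitude distribution is stretched
# back toward a texture-rich reference distribution.
rng = np.random.default_rng(0)
smooth_grads = np.abs(rng.normal(0.0, 0.2, 10000))     # over-smoothed denoised result
reference_grads = np.abs(rng.normal(0.0, 1.0, 10000))  # estimated reference histogram
restored = match_histogram(smooth_grads, reference_grads)
```

The remapping is monotone, so relative edge strengths are preserved while the overall gradient statistics (and thus perceived texture) follow the reference.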
In this paper, we investigate the impact of spatial, temporal, and amplitude resolution on
the perceptual quality of a compressed video. Subjective quality tests were carried out on a mobile device with
a total of 189 processed video sequences derived from 10 source sequences. Subjective data reveal
that the impact of spatial resolution (SR), temporal resolution (TR), and quantization step size (QS) can each be
captured by a function with a single content-dependent parameter, which indicates the decay rate of
the quality with each resolution factor. The joint impact of SR, TR, and QS can be accurately modeled by the
product of these three functions with only three parameters. The impact of SR and QS on the quality is
independent of that of TR, but there are significant interactions between SR and QS. Furthermore,
the model parameters can be predicted accurately from a few content features derived from the original video.
The proposed model correlates well with the subjective ratings, with a Pearson correlation coefficient of 0.985
when the model parameters are predicted from content features. The quality model is further validated on six
other subjective rating data sets with very high accuracy and outperforms several well-known quality models.
ETPL
DIP - 025
Q-STAR: A Perceptual Video Quality Model Considering Impact of Spatial,
Temporal, and Amplitude Resolutions
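The product-form model described above can be sketched generically: three normalized factors, each governed by one content-dependent decay-rate parameter, multiplied together. The exponential forms and parameter names below are assumptions for illustration, not the paper's fitted functions.

```python
import numpy as np

def qstar_like(sr, tr, qs, sr_max, tr_max, qs_min, a_s=1.0, a_t=1.0, a_q=1.0):
    """Illustrative product-form quality model: three normalized decay
    functions, one content-dependent rate parameter each (a_s, a_t, a_q).
    Each factor equals 1 at the best setting and decays as that resolution
    factor degrades; the overall quality is their product."""
    f_s = np.exp(-a_s * (1.0 - sr / sr_max))   # spatial-resolution factor
    f_t = np.exp(-a_t * (1.0 - tr / tr_max))   # temporal-resolution factor
    f_q = np.exp(-a_q * (qs / qs_min - 1.0))   # quantization-step factor
    return f_s * f_t * f_q

# Full resolution and the finest quantization step give quality 1.0
q_best = qstar_like(1080, 30, 16, sr_max=1080, tr_max=30, qs_min=16)
```

The appeal of the product form is that each factor can be fitted separately, yet the joint prediction needs only three parameters per content.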
In Part 1 of this two-part study, we present a method
of imaging and velocity estimation of ground moving targets using passive synthetic aperture radar. Such a
system uses a network of small, mobile receivers that collect waves scattered from transmitters of
opportunity, such as commercial television, radio, and cell phone towers. Therefore, passive imaging systems
have significant cost, manufacturing, and stealth advantages over active systems. We describe a novel
generalized Radon transform-type forward model and a corresponding filtered-backprojection-
type image formation and velocity estimation method. We form a stack of position images over a range of
hypothesized velocities, and show that the targets can be reconstructed at the correct position whenever the
hypothesized velocity equals the true velocity of the targets. We then use entropy to determine the most
accurate velocity and image pair for each moving target. We present extensive numerical simulations to verify
the reconstruction method. Our method does not require a priori knowledge of transmitter locations and
transmitted waveforms. It can determine the location and velocity of multiple targets moving at
different velocities. Furthermore, it can accommodate arbitrary imaging geometries. In Part 2, we present the
resolution analysis and the analysis of positioning errors in passive SAR images due to
erroneous velocity estimation.
ETPL
DIP - 026
Passive Synthetic Aperture Hitchhiker Imaging of Ground Moving Targets—
Part 1: Image Formation and Velocity Estimation
We present double random projection methods for the reconstruction of imaging data. The methods draw
upon recent results in the random projection literature, particularly on low-rank matrix approximations. The
reconstruction algorithm has only two simple, noniterative steps, while the reconstruction error is close
to the error of the optimal low-rank approximation obtained by the truncated singular value decomposition. We
extend the often-required symmetric distributions of entries in a random projection matrix to asymmetric
distributions, which can be more easily implemented on imaging devices. Experimental results are provided
on the subsampling of natural images and hyperspectral images, and on simulated compressible matrices.
Comparisons with other random projection methods are also provided.
ETPL
DIP - 027
Image Reconstruction From Double Random Projection
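A two-step, noniterative reconstruction from two independent random sketches can be sketched in the generalized-Nyström style below. The exact construction in the paper may differ; the matrix sizes, sketch dimensions, and variable names here are illustrative, and the data matrix is exactly low-rank so the reconstruction is exact.

```python
import numpy as np

rng = np.random.default_rng(1)

# Exactly low-rank ground truth: rank-5 matrix standing in for imaging data
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 150))

# Two independent random projections (sketches), sizes modestly above the rank
k = 15
Omega = rng.standard_normal((150, k))   # right sketch matrix
Psi = rng.standard_normal((k, 200))     # left sketch matrix

Y = A @ Omega            # range sketch
Z = Psi @ A              # co-range sketch
core = Psi @ A @ Omega   # small k-by-k core matrix

# Step 1: pseudoinvert the small core; Step 2: multiply the sketches back.
A_hat = Y @ np.linalg.pinv(core) @ Z
```

For an exactly rank-r matrix with Gaussian sketches of size k >= r, this recovers A exactly (with probability one); for compressible matrices, the error stays close to that of the truncated SVD, which matches the guarantee quoted in the abstract.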
Incorporating image classification into an image retrieval system brings many attractive advantages. For
instance, the search space can be narrowed down by rejecting images in categories irrelevant to the query, and
the retrieved images can be made more consistent in semantics by indexing and returning images in the relevant
categories together. However, due to their different goals regarding recognition accuracy and retrieval
scalability, it is hard to efficiently incorporate most image classification works into large-scale image search.
To study this problem, we propose cascade category-aware visual search, which utilizes a weak category clue
to achieve better retrieval accuracy, efficiency, and memory consumption. To capture the category and visual
clues of an image, we first learn category-visual words, which are discriminative and repeatable local features
labeled with categories. By identifying category-visual words in database images, we are able to discard noisy
local features and extract image visual and category clues, which are then recorded in a hierarchical index
structure. Our retrieval system narrows down the search space by: 1) filtering the noisy local features in the
query; 2) rejecting irrelevant categories in the database; and 3) performing discriminative visual search in
relevant categories. The proposed algorithm is tested on object search, landmark search, and large-scale similar
image search on the large-scale LSVRC10 data set. Although the category clue introduced is weak, our
algorithm still shows substantial advantages in retrieval accuracy, efficiency, and memory consumption over
the state-of-the-art.
ETPL
DIP - 028
Cascade Category-Aware Visual Search
This paper develops a distributed dictionary learning algorithm for sparse representation of data distributed
across the nodes of sensor networks, a setting in which data may be sensitive or private, no fusion center may
exist, or big data applications may arise. The main contributions of this paper are: 1) we decouple
the combined dictionary atom update and nonzero coefficient revision procedure into a two-stage operation to
facilitate distributed computations, first updating the dictionary atom via the eigenvalue decomposition
of the sum of the residual (correlation) matrices across the nodes, then implementing a local projection
operation to obtain the related representation coefficients for each node; 2) we cast the aforementioned atom
update problem as a set of decentralized optimization subproblems with consensus constraints, then
simplify the multiplier update for symmetric undirected graphs in sensor networks and minimize the
separable subproblems to attain consistent estimates iteratively; and 3) since dictionary atoms are typically
constrained to be of unit norm to avoid the scaling ambiguity, we efficiently solve the resultant
hidden convex subproblems by determining the optimal Lagrange multiplier. Experiments show that the
proposed algorithm is a viable alternative distributed dictionary learning approach and is suitable for
the sensor network environment.
ETPL
DIP - 029
Distributed Dictionary Learning for Sparse Representation in Sensor Networks
A method is proposed for fully restoring the local image structures of an unknown continuous-tone patch
from an input halftoned patch with homogeneously distributed dot patterns, based on a
locally learned dictionary pair obtained via feature clustering. First, many training sets consisting of paired
halftone and continuous-tone patches are collected, and histogram-of-oriented-gradient (HOG) feature vectors
that describe the edge orientations are calculated from every continuous-tone patch to group the training sets.
Next, a dictionary learning algorithm is conducted separately on the categorized training sets to obtain
halftone and continuous-tone dictionary pairs optimized for edge-oriented patch representation. Finally, an
adaptive smoothing filter is applied to the input halftone patch to predict the HOG feature vector of the
unknown continuous-tone patch and to select one of the previously learned dictionary pairs, based on the
Euclidean distance between the HOG mean feature vectors of the grouped training sets and the predicted HOG
vector. In addition to using the local dictionary pairs, a patch fusion technique is used to reduce artifacts
such as color noise and overemphasized edges in smooth regions. Experimental results show that the use of the
paired dictionary selected by the local edge orientation, together with the patch fusion technique, not only
reduced the artifacts in smooth regions, but also provided well-expressed fine details and outlines, especially
in areas of textures, lines, and regular patterns.
ETPL
DIP - 030
Local Learned Dictionaries Optimized to Edge Orientation for Inverse
Halftoning
Effective characterization of texture images requires exploiting multiple visual cues from the image
appearance. The local binary pattern (LBP) and its variants achieve great success in texture description.
However, because the LBP(-like) feature is an index of discrete patterns rather than a numerical feature, it is
difficult to combine the LBP(-like) feature with other discriminative ones in a compact descriptor. To
overcome the problem derived from the nonnumerical constraint of the LBP, this paper proposes a numerical
variant accordingly, named the LBP difference (LBPD). The LBPD characterizes the extent to which
one LBP varies from the average local structure of an image region of interest. It is simple, rotation invariant,
and computationally efficient. To achieve enhanced performance, we combine the LBPD with other
discriminative cues via a covariance matrix. The proposed descriptor, termed the covariance and LBPD
descriptor (COV-LBPD), is able to capture the intrinsic correlation between the LBPD and other features in a
compact manner. Experimental results show that the COV-LBPD achieves promising results on publicly
available data sets.
ETPL
DIP - 031
Combining LBP Difference and Feature Correlation for Texture Description
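The idea of a numerical LBP difference can be sketched as follows: compute each pixel's 8-neighbour LBP bit vector and measure its distance from the region's average bit vector. This is one plausible reading of "how much one LBP varies from the average local structure", not the paper's exact formula; all names are illustrative.

```python
import numpy as np

def lbp_bits(img):
    """8-neighbour local binary pattern as a bit vector per interior pixel:
    bit = 1 where the neighbour is >= the centre pixel."""
    c = img[1:-1, 1:-1]
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:],
                  img[1:-1, 2:], img[2:, 2:], img[2:, 1:-1],
                  img[2:, :-2], img[1:-1, :-2]]
    return np.stack([(n >= c).astype(float) for n in neighbours], axis=-1)

def lbp_difference(img):
    """LBPD-style numeric feature: Euclidean distance between each pixel's
    LBP bit vector and the mean bit vector over the whole region."""
    bits = lbp_bits(img)
    mean_bits = bits.reshape(-1, 8).mean(axis=0)
    return np.linalg.norm(bits - mean_bits, axis=-1)

rng = np.random.default_rng(2)
img = rng.random((16, 16))
lbpd = lbp_difference(img)   # one scalar per interior pixel
```

Because the result is a scalar per pixel rather than a pattern index, it can be stacked with other numeric cues (gradients, intensities) inside a covariance descriptor, which is exactly the obstacle the abstract describes for raw LBP codes.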
We address single image super-resolution using a statistical prediction model based on sparse representations of
low- and high-resolution image patches. The suggested model allows us to avoid any invariance assumption,
which is a common practice in sparsity-based approaches treating this task. Prediction of
high-resolution patches is obtained via MMSE estimation, and the resulting scheme has the useful
interpretation of a feedforward neural network. To further enhance performance, we suggest data clustering
and cascading several levels of the basic algorithm. We suggest a training scheme for the resulting network
and demonstrate the capabilities of our algorithm, showing its advantages over existing methods based on a
low- and high-resolution dictionary pair, in terms of computational complexity, numerical criteria, and visual
appearance. The suggested approach offers a desirable compromise between low computational complexity
and reconstruction quality when compared with state-of-the-art methods for single image super-resolution.
ETPL
DIP - 032
A Statistical Prediction Model Based on Sparse Representations for Single
Image Super-Resolution
Feature description for local image patches is widely used in computer vision. While the conventional
way to design a local descriptor is based on expert experience and knowledge, learning-based methods for
designing local descriptors have become more and more popular because of their good performance and data-driven
property. This paper proposes a novel data-driven method for designing a binary feature descriptor, which we
call the receptive fields descriptor (RFD). Technically, the RFD is constructed by thresholding the responses of a set
of receptive fields, which are selected from a large number of candidates according to their distinctiveness and
correlations in a greedy way. Using two different kinds of receptive fields (namely, rectangular pooling areas
and Gaussian pooling areas) for selection, we obtain two binary descriptors, RFD-R and RFD-G,
accordingly. Image matching experiments on the well-known patch data set and the Oxford data
set demonstrate that the RFD significantly outperforms the state-of-the-art binary descriptors, and is comparable
with the best float-valued descriptors at a fraction of the processing time. Finally, experiments on object
recognition tasks confirm that both RFD-R and RFD-G successfully bridge the
performance gap between binary descriptors and their floating-point competitors.
ETPL
DIP - 033
Receptive Fields Selection for Binary Feature Description
Pan-sharpening is a common postprocessing operation for captured multispectral satellite imagery,
where the spatial resolution of images gathered in various spectral bands is enhanced by fusing them with a
panchromatic image captured at a higher resolution. In this paper, pan-sharpening is formulated as the problem
of jointly estimating the high-resolution (HR) multispectral images to minimize an objective function
comprising the sum of squared residual errors in physically motivated observation models of the low-
resolution (LR) multispectral and the HR panchromatic images and a correlation-dependent regularization
term. The objective function differs from and improves upon previously reported model-
based optimization approaches to pan-sharpening in two major aspects: 1) a new regularization term is
introduced and 2) a highpass filter, complementary to the lowpass filter for the LR spectral observations, is
introduced for the residual error corresponding to the panchromatic observation model. To obtain pan-
sharpened images, an iterative algorithm is developed to solve the proposed joint minimization. The proposed
algorithm is compared with previously proposed methods both visually and using established quantitative
measures of SNR, spectral angle mapper, relative dimensionless global error in synthesis, Q, and Q4 indices.
Both the quantitative results and visual evaluation demonstrate that the proposed joint formulation provides
superior results compared with pre-existing methods. A software implementation is provided.
ETPL
DIP - 034
A Regularized Model-Based Optimization Framework for Pan-Sharpening
Fluorescence diffuse optical tomography (FDOT) is an emerging molecular imaging modality that
uses near-infrared light to excite a fluorophore injected into tissue and reconstructs the fluorophore
concentration from boundary measurements. FDOT image reconstruction is a highly ill-posed inverse
problem due to the large number of unknowns and the limited number of measurements. However, the fluorophore
distribution is often very sparse in the imaging domain, since fluorophores are typically designed to accumulate
in relatively small regions. In this paper, we use the compressive sensing (CS) framework to
design light illumination and detection patterns to improve the reconstruction of sparse fluorophore
concentration maps. Unlike conventional FDOT imaging, where spatially distributed light sources illuminate the
imaging domain one at a time and the corresponding boundary measurements are used for image
reconstruction, we assume that the light sources illuminate the imaging domain simultaneously several times
and that the corresponding boundary measurements are linearly filtered prior to image reconstruction. We design a
set of optical intensities (illumination patterns) and a linear filter (detection pattern) applied to the boundary
measurements to improve the reconstruction of sparse fluorophore concentration maps. We show that the
FDOT sensing matrix can be expressed as a columnwise Kronecker product of two matrices determined by the
excitation and emission light fields. We derive relationships between the incoherence of the FDOT forward
matrix and these two matrices, and use these results to reduce the incoherence of the FDOT forward matrix.
We present extensive numerical simulations and the results of a real phantom experiment to demonstrate the
improvements in image reconstruction due to the CS-based light illumination and detection patterns in
conjunction with relaxation and greedy-type reconstruction algorithms.
ETPL
DIP - 035
Light Illumination and Detection Patterns for Fluorescence Diffuse Optical
Tomography Based on Compressive Sensing
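The columnwise Kronecker (Khatri-Rao) factorization of the sensing matrix, and the mutual-coherence quantity that CS design seeks to reduce, can be sketched as follows. The matrices here are random stand-ins for the excitation- and emission-field matrices; sizes and names are illustrative.

```python
import numpy as np

def khatri_rao(E, M):
    """Column-wise Kronecker product: column j of the result is
    kron(E[:, j], M[:, j]). The abstract states the FDOT sensing matrix
    factors this way into excitation- and emission-field matrices."""
    assert E.shape[1] == M.shape[1]
    return np.einsum('ik,jk->ijk', E, M).reshape(
        E.shape[0] * M.shape[0], E.shape[1])

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns;
    smaller coherence is better for sparse recovery."""
    An = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(An.T @ An)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(3)
E = rng.standard_normal((6, 10))   # stand-in excitation-field matrix
M = rng.standard_normal((7, 10))   # stand-in emission-field matrix
S = khatri_rao(E, M)               # (6*7) x 10 sensing matrix
mu = mutual_coherence(S)
```

The factorization means the coherence of the full sensing matrix can be bounded via the two small factors, which is what lets the illumination and detection patterns be optimized separately.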
Many saliency detection models for 2D images have been proposed for various multimedia
processing applications during the past decades. Currently, the emerging applications of stereoscopic display
require new saliency detection models for salient region extraction. Different from saliency detection for
2D images, the depth feature has to be taken into account in saliency detection for stereoscopic images. In this
paper, we propose a novel stereoscopic saliency detection framework based on the feature contrast of color,
luminance, texture, and depth. Four types of features, namely color, luminance, texture, and depth, are
extracted from discrete cosine transform coefficients for feature contrast calculation. A Gaussian model of the
spatial distance between image patches is adopted for consideration of local and global contrast calculation.
Then, a new fusion method is designed to combine the feature maps to obtain the final saliency map
for stereoscopic images. In addition, we adopt the center bias factor and human visual acuity, important
characteristics of the human visual system, to enhance the final saliency map for stereoscopic images.
Experimental results on eye tracking databases show the superior performance of the proposed model over
other existing methods.
ETPL
DIP - 036
Saliency Detection for Stereoscopic Images
Because of the lack of disciplined and efficient mechanisms, most modern area charge-coupled-
device-based barcode scanning technologies are not capable of handling out-of-focus (OOF) image blur and
rely heavily on camera systems to capture good-quality, well-focused barcode images. In this paper, we
present a novel linear barcode scanning system based on a dynamic template matching scheme. The proposed
system works entirely in the spatial domain and is capable of reading linear barcodes from low-
resolution images containing severe OOF blur. This paper treats linear barcode scanning from the perspective
of deformed binary waveform analysis and classification. A directed graphical model is designed to
characterize the relationship between the blurred barcode waveform and its corresponding symbol value at any
specific blur level. Under this model, linear barcode scanning is cast as finding the optimal state sequence
associated with the deformed barcode waveform segments. A dynamic programming-based inference
algorithm is designed to retrieve the optimal state sequence, enabling real-time decoding on mobile devices of
limited processing power.
ETPL
DIP - 037
On Scanning Linear Barcodes From Out-of-Focus Blurred Images: A Spatial
Domain Dynamic Template Matching Approach
Mixed noise removal from natural images is a challenging task, since the noise distribution usually
does not have a parametric model and has a heavy tail. One typical kind of mixed noise is additive white
Gaussian noise (AWGN) coupled with impulse noise (IN). Many mixed noise removal methods are detection-
based: they first detect the locations of IN pixels and then remove the mixed noise. However, such
methods tend to generate many artifacts when the mixed noise is strong. In this paper, we propose a simple yet
effective method, namely weighted encoding with sparse nonlocal regularization (WESNR),
for mixed noise removal. In WESNR, there is no explicit step of impulse pixel detection; instead, soft
impulse pixel detection via weighted encoding is used to deal with IN and AWGN simultaneously. Meanwhile,
the image sparsity prior and nonlocal self-similarity prior are integrated into a regularization term and
introduced into the variational encoding framework. Experimental results show that the proposed WESNR
method achieves leading mixed noise removal performance in terms of both quantitative measures and visual
quality.
ETPL
DIP - 038
Mixed Noise Removal by Weighted Encoding With Sparse Nonlocal
Regularization
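The "soft impulse detection via weighted encoding" idea can be illustrated with iteratively reweighted least squares, where weights derived from the residuals down-weight likely impulse pixels instead of hard-detecting them. This is a generic robust-coding sketch, not the WESNR objective with its sparse nonlocal regularizer; the dictionary, weighting function, and parameters are illustrative.

```python
import numpy as np

def weighted_encoding(D, y, n_iter=10, sigma=0.5):
    """Robust coding sketch: solve min_x sum_i w_i (y_i - (D x)_i)^2 with
    weights recomputed from the residuals each pass, so observations with
    large residuals (likely impulses) are softly down-weighted."""
    x = np.linalg.lstsq(D, y, rcond=None)[0]    # unweighted initial code
    for _ in range(n_iter):
        r = y - D @ x
        w = np.exp(-(r / sigma) ** 2)           # near-zero weight on likely impulses
        sw = np.sqrt(w)                          # apply sqrt(w) to rows for weighted LS
        x = np.linalg.lstsq(D * sw[:, None], sw * y, rcond=None)[0]
    return x

rng = np.random.default_rng(4)
D = rng.standard_normal((100, 10))              # illustrative dictionary
x_true = rng.standard_normal(10)
y = D @ x_true + 0.01 * rng.standard_normal(100)  # AWGN component
y[::10] += 5.0                                  # impulse-corrupted observations
x_hat = weighted_encoding(D, y)
```

After a few passes the impulse-hit rows carry negligible weight while clean rows return to weight near one, so the code is recovered almost as if the impulses had been detected and removed.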
This paper presents a nonlinear mixing model for hyperspectral image unmixing. The proposed model
assumes that the pixel reflectances are post-nonlinear functions of unknown pure spectral components
contaminated by additive white Gaussian noise. These nonlinear functions are approximated using second-
order polynomials, leading to a polynomial post-nonlinear mixing model. A Bayesian algorithm is proposed to
estimate the parameters involved in the model, yielding an unsupervised nonlinear unmixing algorithm. Due to
the large number of parameters to be estimated, an efficient Hamiltonian Monte Carlo algorithm is
investigated. The classical leapfrog steps of this algorithm are modified to handle the parameter constraints.
The performance of the unmixing strategy, including convergence and parameter tuning, is first evaluated on
synthetic data. Simulations conducted with real data finally show the accuracy of the
proposed unmixing strategy for the analysis of hyperspectral images.
ETPL
DIP - 039
Unsupervised Post-Nonlinear Unmixing of Hyperspectral Images Using a
Hamiltonian Monte Carlo Algorithm
We aim to realize a new and simple compensation method that robustly handles multiple-
projector systems without recourse to the linearization of projector response functions. We introduce state
equations, which distribute arbitrary brightness among the individual projectors, and control the state
equations according to feedback from a camera. By employing the color-mixing matrix with the gradient
of projector responses, we compensate the controlled brightness input to each projector. Our method dispenses
with cooperation among multiple projectors as well as time-consuming photometric calibration. Compared
with existing methods, our method is shown to offer superior compensation performance and a more effective
way of compensating multiple-projector systems.
ETPL
DIP - 040
An Iterative Compensation Approach Without Linearization of Projector
Responses for Multiple-Projector System
Camera motion blur is drastically nonuniform for large depth-range scenes: the nonuniformity
caused by camera translation is depth-dependent, while that caused by camera rotation is not. To restore blurry
images of large depth-range scenes deteriorated by arbitrary camera motion, we build an image blur model
considering the 6 degrees of freedom (DoF) of camera motion with a given scene depth map. To make this
6D depth-aware model tractable, we propose a novel parametrization strategy to reduce the number of
variables, as well as an effective method to estimate the high-dimensional camera motion. The number of
variables is reduced by a temporal sampling motion function, which describes the 6-DoF camera motion by
sampling the camera trajectory uniformly in the time domain. To effectively estimate the high-
dimensional camera motion parameters, we construct a probabilistic motion density function (PMDF) to
describe the probability distribution of camera poses during exposure, and apply it as a unified constraint to
guide the convergence of the iterative deblurring algorithm. Specifically, the PMDF is computed through a back
projection from 2D local blur kernels to the 6D camera motion parameter space and robust voting. We conduct a
series of experiments on both synthetic and real captured data, and validate that our method achieves better
performance than existing uniform and nonuniform methods on large depth-range scenes.
ETPL
DIP - 041
High-Dimensional Camera Shake Removal With Given Depth Map
Automatic video summarization is indispensable for fast browsing and efficient management of
large video libraries. In this paper, we introduce an image feature that we
refer to as the heterogeneity image patch (HIP) index. The proposed HIP index provides a new entropy-based
measure of the heterogeneity of patches within any picture. By evaluating this index for every frame in
a video sequence, we generate a HIP curve for that sequence. We exploit the HIP curve in solving two
categories of video summarization applications: key frame extraction and dynamic video skimming. Under the
key frame extraction framework, a set of candidate key frames is selected from abundant video frames based
on the HIP curve. Then, a proposed patch-based image dissimilarity measure is used to create an affinity matrix
of these candidates. Finally, a set of key frames is extracted from the affinity matrix using a min-max based
algorithm. Under video skimming, we propose a method to measure the distance between
a video and its skimmed representation. The video skimming problem is then mapped into an optimization
framework and solved by minimizing a HIP-based distance for a set of extracted excerpts. The HIP framework
is pixel-based and does not require semantic information or complex camera motion estimation. Our
simulation results are based on experiments performed on consumer videos and are compared with state-of-
the-art methods. It is shown that the HIP approach outperforms other leading methods, while maintaining low
complexity.
ETPL
DIP - 042
Heterogeneity Image Patch Index and Its Application to Consumer Video
Summarization
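The abstract does not give the exact HIP definition, but an entropy-of-patches heterogeneity score per frame can be sketched as follows. The patch size, bin count, and simple averaging here are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def patch_entropy(patch, bins=16):
    """Shannon entropy (in bits) of the intensity histogram of one patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def hip_index(frame, patch=8, bins=16):
    """Mean patch entropy over a frame -- a stand-in heterogeneity score."""
    h, w = frame.shape
    ents = [patch_entropy(frame[i:i + patch, j:j + patch], bins)
            for i in range(0, h - patch + 1, patch)
            for j in range(0, w - patch + 1, patch)]
    return float(np.mean(ents))

rng = np.random.default_rng(0)
flat = np.full((32, 32), 0.5)      # homogeneous frame -> low heterogeneity
noisy = rng.random((32, 32))       # heterogeneous frame -> high heterogeneity
assert hip_index(flat) < hip_index(noisy)
```

Evaluating such a score per frame yields a curve over the sequence; frames at its extrema are natural key-frame candidates.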
Successful image-based object recognition techniques have been built on powerful tools such as sparse representation, in lieu of the popular vector quantization approach. However, one
serious drawback of sparse-space-based methods is that local features that are quite similar can be quantized
into quite distinct visual words. We address this problem with a novel approach for object recognition,
called sparse spatial coding, which efficiently combines a sparse coding dictionary learning stage with a spatial constraint coding stage. We performed experimental evaluation using the Caltech 101, Caltech 256,
Corel 5000, and Corel 10000 data sets, which were specifically designed for object recognition evaluation.
Our results show that our approach achieves high accuracy, comparable with the best single-feature method
previously published on those databases. Our method outperformed, on the same databases, several multiple-feature methods, and provided equivalent, and in a few cases slightly less accurate, results than other techniques
specifically designed to that end. Finally, we report state-of-the-art results for scene recognition on COsy
Localization Dataset (COLD) and high performance results on the MIT-67 indoor scene recognition, thus
demonstrating the generalization of our approach for such tasks.
ETPL
DIP - 043
Sparse Spatial Coding: A Novel Approach to Visual Recognition
The stimulus response of the classical receptive field (CRF) of a neuron in the primary visual cortex is
affected by its periphery [i.e., non-CRF (nCRF)]. This modulation exerts inhibition, which depends primarily
on the correlation of both visual stimulations. The theory of periphery and center interaction with visual
characteristics can be applied in night-vision information processing. In this paper, a weighted kernel principal
component analysis (WKPCA) degree-of-homogeneity (DH) amended inhibition model inspired by visual
perceptual mechanisms is proposed to extract salient contours from complex natural scenes in low-light-level
images. The core idea is that multifeature analysis can recognize the homogeneity in modulation coverage
effectively. Computationally, a novel WKPCA algorithm is presented to eliminate outliers and anomalous
distribution in CRF and accomplish principal component analysis precisely. On this basis, a new concept and
computational procedure for DH is defined to evaluate the dissimilarity between periphery and center
comprehensively. By amending the inhibition from the nCRF to the CRF with DH, our model can reduce the
interference of noise and accurately suppress details and textures in homogeneous regions. It further helps to
avoid mutual suppression among inhomogeneous regions and contour elements. This paper provides an
improved computational visual model with high performance for contour detection in cluttered natural
scenes in night-vision images.
ETPL
DIP - 044
Weighted KPCA Degree of Homogeneity Amended Nonclassical Receptive Field
Inhibition Model for Salient Contour Extraction in Low-Light-Level Image
We present an effective image boundary processing method for $M$-channel ($M \in \mathbb{N}$, $M \geq 2$) lifting-based linear-phase filter banks that are applied to unified lossy and lossless image compression (coding), i.e.,
lossy-to-lossless image coding. The reversible symmetric extension we propose is achieved by manipulating
building blocks on the image boundary and restoring the symmetry of each building block that has been
lost due to rounding error at each lifting step. In addition, complexity is reduced by
extending nonexpansive convolution, in a scheme called reversible symmetric nonexpansive convolution, because the
number of input signals does not even temporarily increase. Our method not only
achieves reversible boundary processing, but is also comparable with irreversible symmetric extension in
lossy image coding and outperforms periodic extension in lossy-to-lossless image coding.
ETPL
DIP - 045
Reversible Symmetric Nonexpansive Convolution: An Effective Image
Boundary Processing for $M$-Channel Lifting-Based Linear-Phase Filter
Banks
The behavior and performance of denoising algorithms are governed by one or several parameters,
whose optimal settings depend on the content of the processed image and the characteristics of the noise, and
are generally chosen to minimize the mean squared error (MSE) between the denoised image returned by the
algorithm and a virtual ground truth. In this paper, we introduce a new Poisson-
Gaussian unbiased risk estimator (PG-URE) of the MSE applicable to a mixed Poisson-Gaussian noise model
that unifies the widely used Gaussian and Poisson noise models in fluorescence bioimaging applications. We
propose a stochastic methodology to evaluate this estimator in the case when little is known about the internal
machinery of the considered denoising algorithm, and we analyze both theoretically and empirically the
characteristics of the PG-URE estimator. Finally, we evaluate the PG-URE-driven parametrization for three
standard denoising algorithms, with and without variance-stabilizing transforms, and different characteristics
of the Poisson-Gaussian noise mixture.
ETPL
DIP - 046
An Unbiased Risk Estimator for Image Denoising in the Presence of Mixed
Poisson–Gaussian Noise
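The PG-URE derivation itself is involved, but the mixed noise model it targets is simple to state: a scaled Poisson (shot-noise) component plus additive Gaussian readout noise. A minimal simulation sketch, with illustrative parameter names:

```python
import numpy as np

def add_poisson_gaussian(x, gain=1.0, sigma=0.1, rng=None):
    """Mixed model: y = gain * Poisson(x / gain) + N(0, sigma^2).
    Variance at intensity x is gain * x + sigma^2, which unifies the pure
    Poisson (sigma = 0) and pure Gaussian (gain -> 0) special cases."""
    rng = rng or np.random.default_rng(0)
    return gain * rng.poisson(x / gain) + rng.normal(0.0, sigma, size=x.shape)

x = np.full((256, 256), 20.0)
y = add_poisson_gaussian(x, gain=0.5, sigma=0.3)
assert abs(y.mean() - 20.0) < 0.1                 # the model is unbiased
assert abs(y.var() - (0.5 * 20.0 + 0.3 ** 2)) < 0.5
```

The signal-dependent variance `gain * x + sigma**2` is exactly why a single Gaussian MSE estimator such as SURE is insufficient here.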
We present a new efficient edge-preserving filter, the "tree filter," to achieve strong image smoothing. The
proposed filter can smooth out high-contrast details while preserving major edges, which is not achievable with
bilateral-filter-like techniques. The tree filter is a weighted-average filter whose kernel is derived by viewing
pixel affinity in a probabilistic framework that simultaneously considers pixel spatial distance, color/intensity
difference, and connectedness. Pixel connectedness is obtained by treating pixels as nodes in
a minimum spanning tree (MST) extracted from the image. The fact that an MST makes all image pixels
connected through the tree endows the filter with the power to smooth out high-contrast, fine-scale details
while preserving major image structures, since pixels in small isolated regions will be closely connected to
the surrounding majority pixels through the tree, while pixels inside large homogeneous regions will
automatically be dragged away from pixels outside the region. The tree filter can be separated into two
other filters, both of which turn out to have fast algorithms. We also propose an efficient linear-time MST
extraction algorithm to further improve the overall filtering speed. These algorithms give the tree filter a great
advantage in low computational complexity (linear in the number of image pixels) and speed: it can process a
1-megapixel 8-bit image in ~0.25 s on an Intel 3.4 GHz Core i7 CPU (including construction of the MST).
The proposed tree filter is demonstrated on a variety of applications.
ETPL
DIP - 047
Tree Filtering: Efficient Structure-Preserving Smoothing With a Minimum
Spanning Tree
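The connectedness idea can be illustrated with an MST over a pixel grid whose edge weights are intensity differences plus a small spatial cost; the tree distance between two pixels then drives the kernel weight. A toy sketch using SciPy (the epsilon cost and the tiny test image are illustrative, not the paper's exact construction):

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, shortest_path

def tree_distances(img, eps=0.01):
    """All-pairs tree distances on the MST of a 4-connected pixel grid.
    Edge weight = |intensity difference| + eps (eps keeps zero-cost edges
    explicit in the sparse graph and stands in for the spatial term)."""
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    flat = img.ravel()
    rows, cols = [], []
    for a, b in [(idx[:, :-1], idx[:, 1:]), (idx[:-1, :], idx[1:, :])]:
        rows.extend(a.ravel())
        cols.extend(b.ravel())
    rows, cols = np.array(rows), np.array(cols)
    wts = np.abs(flat[rows] - flat[cols]) + eps
    g = coo_matrix((wts, (rows, cols)), shape=(h * w, h * w))
    mst = minimum_spanning_tree(g)
    # on a tree, the shortest path IS the unique tree path
    return shortest_path(mst, directed=False)

img = np.array([[0., 0., 1.],
                [0., 0., 1.],
                [0., 0., 1.]])
d = tree_distances(img)
# same-region pixels are close in the tree; crossing the edge costs ~1
assert d[0, 3] < d[0, 2]
```

A weighted-average kernel would then use something like `exp(-d / sigma)`: large within a homogeneous region, tiny across a strong edge.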
The development of energy-selective, photon-counting X-ray detectors allows for a wide range of new
possibilities in the area of computed tomographic image formation. Under the assumption of perfect energy
resolution, we propose a tensor-based iterative algorithm that simultaneously reconstructs the X-ray
attenuation distribution for each energy. We use a multilinear image model rather than a more standard
stacked vector representation in order to develop novel tensor-based regularizers. In particular, we model the
multispectral unknown as a three-way tensor where the first two dimensions are space and the third dimension
is energy. This approach allows for the design of tensor nuclear norm regularizers, which, like their 2D
counterpart, are convex functions of the multispectral unknown. The solution to the resulting convex
optimization problem is obtained using an alternating direction method of multipliers approach. Simulation
results show that the generalized tensor nuclear norm can be used as a standalone regularization technique for
the energy-selective (spectral) computed tomography problem, and when combined with total variation
regularization it enhances the regularization capabilities, especially in low-energy images where the effects of
noise are most prominent.
ETPL
DIP - 048
Tensor-Based Formulation and Nuclear Norm Regularization for Multienergy
Computed Tomography
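In ADMM solvers for nuclear-norm problems, the key building block is singular-value thresholding, the proximal operator of the nuclear norm, applied here to a matrix unfolding of the space x space x energy tensor. A minimal sketch (the matrix sizes and threshold are illustrative):

```python
import numpy as np

def svt(mat, tau):
    """Singular-value thresholding: prox of tau * nuclear norm.
    Shrinks every singular value by tau and zeroes the small ones."""
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    return u @ np.diag(np.maximum(s - tau, 0.0)) @ vt

rng = np.random.default_rng(0)
# a rank-2 "multienergy unfolding" plus noise stands in for CT data
clean = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 30))
noisy = clean + 0.1 * rng.standard_normal((20, 30))
den = svt(noisy, tau=1.5)
sv = np.linalg.svd(den, compute_uv=False)
assert int(np.sum(sv > 1e-8)) == 2     # the low-rank structure is recovered
```

Thresholding the unfolding's spectrum is what exploits the strong correlation of attenuation images across energy bins.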
This paper studies the impact of secure watermark embedding in digital images by proposing a
practical implementation of secure spread-spectrum watermarking using distortion optimization. Because
strong security properties (key security and subspace security) can be achieved using natural watermarking
(NW), since this particular embedding leaves the distributions of the host and watermarked signals unchanged, we
use elements of transportation theory to minimize the global distortion. Next, we apply this new modulation,
called transportation NW (TNW), to design a secure watermarking scheme for grayscale images. The TNW
uses a multiresolution image decomposition combined with a multiplicative embedding which is taken into
account at the distribution level. We show that the distortion solely relies on the variance of the wavelet
subbands used during the embedding. In order to maximize a target robustness after JPEG compression, we
select different combinations of subbands offering the lowest Bit Error Rates for a target PSNR ranging from
35 to 55 dB, and we propose an algorithm to select them. The use of transportation theory also provides an
average gain of 3.6 dB in PSNR with respect to the previous embedding over a set of 2000 images.
ETPL
DIP - 049
Optimal Transport for Secure Spread-Spectrum Watermarking of Still Images
In this paper, we propose an efficient algorithm, called vector field consensus, for establishing robust
point correspondences between two sets of points. Our algorithm starts by creating a set of putative
correspondences which can contain a very large number of false correspondences, or outliers, in addition to a
limited number of true correspondences (inliers). Next, we solve for correspondence by interpolating a vector
field between the two point sets, which involves estimating a consensus of inlier points whose matching
follows a nonparametric geometrical constraint. We formulate this as a maximum a posteriori (MAP) estimation
of a Bayesian model with hidden/latent variables indicating whether matches in the putative set are outliers or
inliers. We impose nonparametric geometrical constraints on the correspondence, as a prior distribution, using
Tikhonov regularizers in a reproducing kernel Hilbert space. MAP estimation is performed by the EM
algorithm which by also estimating the variance of the prior model (initialized to a large value) is able to
obtain good estimates very quickly (e.g., avoiding many of the local minima inherent in this formulation). We
illustrate this method on data sets in 2D and 3D and demonstrate that it is robust to a very large number of
outliers (even up to 90%). We also show that, in the special case where there is an underlying parametric
geometrical model (e.g., the epipolar line constraint), we obtain better results than standard alternatives
like RANSAC when a large number of outliers are present. This suggests a two-stage strategy, where we use our
nonparametric model to reduce the size of the putative set and then apply a parametric variant of our approach
to estimate the geometric parameters. Our algorithm is computationally efficient and we provide code for
others to use it. In addition, our approach is general and can be applied to other problems, such as learning
with a badly corrupted training data set.
ETPL
DIP - 050
Robust Point Matching via Vector Field Consensus
In this paper, we address the problem of the high annotation cost of acquiring training data for
semantic segmentation. Most modern approaches to semantic segmentation are based upon graphical models,
such as conditional random fields, and rely on sufficient training data in the form of object contours. To reduce
the manual effort of pixel-wise contour annotation, we consider the setting in which the training data set for
semantic segmentation is a mixture of a few object contours and an abundant set of bounding boxes of objects.
Our idea is to borrow the knowledge derived from the object contours to infer the unknown object contours
enclosed by the bounding boxes. The inferred contours can then serve as training data for semantic
segmentation. To this end, we generate multiple contour hypotheses for each bounding box with the
assumption that at least one hypothesis is close to the ground truth. This paper proposes an approach, called
augmented multiple instance regression (AMIR), that formulates the task of hypothesis selection as the
problem of multiple instance regression (MIR), and augments information derived from the object contours to
guide and regularize the training process of MIR. In this way, a bounding box is treated as a bag with its
contour hypotheses as instances, and the positive instances refer to the hypotheses close to the ground truth.
The proposed approach has been evaluated on the Pascal VOC segmentation task. The promising results
demonstrate that AMIR can precisely infer the object contours in the bounding boxes, and hence provide
effective alternatives to manually labeled contours for semantic segmentation.
ETPL
DIP - 051
Augmented Multiple Instance Regression for Inferring Object Contours in
Bounding Boxes
The parsimonious nature of sparse representations has been successfully exploited for the
development of highly accurate classifiers for various scientific applications. Despite the successes of Sparse
Representation techniques, a large number of dictionary atoms as well as the high dimensionality of the data
can make these classifiers computationally demanding. Furthermore, sparse classifiers are subject to the
adverse effects of a phenomenon known as coefficient contamination, where, for example, variations in pose
may affect identity and expression recognition. We analyze the interaction between dimensionality reduction
and sparse representations, and propose a technique, called Linear extension of Graph Embedding K-means-
based Singular Value Decomposition (LGE-KSVD) to address both issues of computational intensity and
coefficient contamination. In particular, the LGE-KSVD utilizes variants of the LGE to optimize the K-SVD,
an iterative technique for learning small yet overcomplete dictionaries. The dimensionality reduction matrix,
sparse representation dictionary, sparse coefficients, and sparsity-based classifier are jointly learned through
the LGE-KSVD. The atom optimization process is redefined to allow variable support using graph embedding
techniques and produce a more flexible and elegant dictionary learning algorithm. Results are presented on a
wide variety of facial and activity recognition problems that demonstrate the robustness of the proposed
method.
ETPL
DIP - 052
LGE-KSVD: Robust Sparse Representation Classification
Speckle noise filtering on polarimetric SAR (PolSAR) images remains a challenging task due to the
difficulty of reducing scatterer-dependent noise while preserving the polarimetric and spatial
information. This challenge is particularly acute for single-look complex images, where little information about
the scattering process can be derived from a rank-1 covariance matrix. This paper proposes to analyze and to
evaluate the performances of a set of PolSAR speckle filters. The filter performances are measured by a set of
ten different indicators, including relative errors on incoherent target decomposition parameters, coherences,
polarimetric signatures, point target, and edge preservation. The result is a performance profile for each
individual filter. The methodology consists of simulating a set of artificial PolSAR images on which the
various filters will be evaluated. The image morphology is stochastic and determined by a Markov random
field and the number of scattering classes is allowed to vary so that we can explore a large range of image
configurations. Evaluation on real PolSAR images is also considered. Results show that filter performances
need to be assessed using a complete set of indicators, including distributed scatterer parameters, radiometric
parameters, and spatial information preservation.
ETPL
DIP - 053
Analysis, Evaluation, and Comparison of Polarimetric SAR Speckle Filtering
Techniques
Detecting generic object categories in images and videos is a fundamental issue in computer vision.
However, it faces challenges from inter- and intraclass diversity, as well as distortions caused by
viewpoints, poses, deformations, and so on. To handle such object variations, this paper constructs a structure kernel
and proposes a multiscale part-based model incorporating the discriminative power of kernels. The structure
kernel would measure the resemblance of part-based objects in three aspects: 1) the global similarity term to
measure the resemblance of the global visual appearance of relevant objects; 2) the part similarity term to
measure the resemblance of the visual appearance of distinctive parts; and 3) the spatial similarity term to
measure the resemblance of the spatial layout of parts. In essence, the deformation of parts in the structure
kernel is penalized in a multiscale space with respect to horizontal displacement, vertical displacement, and
scale difference. Part similarities are combined with different weights, which are optimized efficiently to
maximize the intraclass similarities and minimize the interclass similarities by the normalized stochastic
gradient ascent algorithm. In addition, the parameters of the structure kernel are learned during the training
process with regard to the distribution of the data in a more discriminative way. With flexible part sizes on
scale and displacement, it can be more robust to the intraclass variations, poses, and viewpoints. Theoretical
analysis and experimental evaluations demonstrate that the proposed multiscale part-based representation
model with structure kernel exhibits accurate and robust performance, and outperforms state-of-the-art object
classification approaches.
ETPL
DIP - 054
Data-Driven Hierarchical Structure Kernel for Multiscale Part-Based Object
Recognition
This paper investigates the use of local prediction in difference expansion reversible watermarking.
For each pixel, a least square predictor is computed on a square block centered on the pixel and the
corresponding prediction error is expanded. The same predictor is recovered at detection without any
additional information. The proposed local prediction is general and it applies regardless of the predictor order
or the prediction context. For the particular cases of least square predictors with the same context as the
median edge detector, gradient-adjusted predictor or the simple rhombus neighborhood, the local prediction-
based reversible watermarking clearly outperforms the state-of-the-art schemes based on the classical
counterparts. Experimental results are provided.
ETPL
DIP - 055
Local-Prediction-Based Difference Expansion Reversible Watermarking
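The underlying difference-expansion mechanism (Tian's classical scheme, on which the local-prediction variant builds) can be shown on a single pixel pair: expand the difference, slot one payload bit into the new least significant bit, and invert exactly at detection. Overflow/underflow handling is omitted here for brevity:

```python
def de_embed(x, y, bit):
    """Embed one bit in a pixel pair by difference expansion (no overflow check)."""
    l, h = (x + y) // 2, x - y          # integer average and difference
    h2 = 2 * h + bit                    # expand difference, append payload bit
    return l + (h2 + 1) // 2, l - h2 // 2

def de_extract(x2, y2):
    """Recover the original pair and the embedded bit exactly."""
    l, h2 = (x2 + y2) // 2, x2 - y2
    bit, h = h2 & 1, h2 >> 1            # arithmetic shift = floor division
    return l + (h + 1) // 2, l - h // 2, bit

assert de_extract(*de_embed(100, 98, 1)) == (100, 98, 1)
assert de_extract(*de_embed(57, 60, 0)) == (57, 60, 0)
```

The reversibility relies on the integer average `l` being invariant under the expansion, so the detector can recompute it from the watermarked pair alone; the paper's contribution is replacing the fixed difference with a locally estimated prediction error.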
We consider a wireless relay network with a single source, a single destination, and multiple relays.
The relays are half-duplex and use the decode-and-forward protocol. The transmitted source is a layered video
bitstream, which can be partitioned into two layers, a base layer (BL) and an enhancement layer (EL), where
the BL is more important than the EL in terms of the source distortion. The source broadcasts both layers to
the relays and the destination using hierarchical 16-QAM. Each relay detects and transmits successfully
decoded layers to the destination using either hierarchical 16-QAM or QPSK. The destination can thus receive
multiple signals, each of which can include either only the BL or both the BL and the EL. We derive the
optimal linear combining method at the destination, where the uncoded bit error rate is minimized. We also
present a suboptimal combining method with a closed-form solution, which performs very close to the
optimal. We use the proposed double-layer transmission scheme with our combining methods for transmitting
layered video bitstreams. Numerical results show that the double-layer scheme can gain 2-2.5 dB in channel
signal-to-noise ratio or 5-7 dB in video peak signal-to-noise ratio, compared with the classical single-layer
scheme using conventional modulation.
ETPL
DIP - 056
Double-Layer Video Transmission Over Decode-and-Forward Wireless Relay
Networks Using Hierarchical Modulation
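Hierarchical 16-QAM gives the BL bits a larger decision distance than the EL bits. A sketch of one possible mapping (the constellation geometry and the priority parameter `alpha` are illustrative, not the paper's exact design):

```python
def hier16qam(bl, el, alpha=2.0):
    """Map 2 base-layer bits + 2 enhancement bits to one 16-QAM symbol.
    BL bits choose the quadrant; EL bits choose the point inside it.
    alpha > 1 widens quadrant spacing, protecting the BL at the EL's expense."""
    def axis(b_bl, b_el):
        # BL bit sets the sign of the axis, EL bit picks inner/outer position
        return (1 if b_bl else -1) * (alpha + (1 if b_el else -1))
    return complex(axis(bl[0], el[0]), axis(bl[1], el[1]))

assert hier16qam((1, 0), (0, 1)) == complex(1, -3)
assert hier16qam((1, 0), (1, 1)) == complex(3, -3)
```

Because a quadrant decision needs only the sign of each axis, a relay in a poor channel can still decode the BL (effectively seeing QPSK) even when the fine EL positions are lost.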
Once an image is decomposed into a number of visual primitives, e.g., local interest points or regions,
it is of great interest to discover meaningful visual patterns from them. Conventional clustering of visual
primitives, however, usually ignores the spatial and feature structure among them, and thus cannot discover high-
level visual patterns of complex structure. To overcome this problem, we propose to consider spatial and
feature contexts among visual primitives for pattern discovery. By discovering spatial co-occurrence patterns
among visual primitives and feature co-occurrence patterns among different types of features, our method can
better address the ambiguities of clustering visual primitives. We formulate the pattern discovery problem as a
regularized k-means clustering where spatial and feature contexts serve as constraints to improve the
pattern discovery results. A novel self-learning procedure is proposed to utilize the discovered spatial or
feature patterns to gradually refine the clustering result. Our self-learning procedure is guaranteed to converge
and experiments on real images validate the effectiveness of our method.
ETPL
DIP - 057
Context-Aware Discovery of Visual Co-Occurrence Patterns
We propose a genuine 3D texture synthesis algorithm based on a probabilistic 2D Markov random
field conceptualization, capable of capturing the visual characteristics of a texture into a unique statistical
texture model. We intend to reproduce, in the volumetric texture, the interactions between pixels learned in an
input 2D image. The learning is done by nonparametric Parzen windowing. Optimization is handled voxel by voxel by
a relaxation algorithm, aiming at maximizing the likelihood of each voxel in terms of its local conditional
probability function. Variants are proposed regarding the relaxation algorithm and the heuristic strategies used
for the simultaneous handling of the orthogonal slices containing the voxel. The procedures are materialized
on various textures through a comparative study and a sensitivity analysis, highlighting the variants' strengths
and weaknesses. Finally, the probabilistic model is compared objectively with a nonparametric neighborhood-
search-based algorithm.
ETPL
DIP - 058
Maximum-Likelihood Based Synthesis of Volumetric Textures From a 2D
Sample
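The Parzen-windowing step estimates a conditional density nonparametrically, directly from observed samples. In 1D with a Gaussian kernel it reduces to a few lines (kernel choice and bandwidth are illustrative):

```python
import numpy as np

def parzen_density(samples, x, h=0.3):
    """Parzen-window (Gaussian kernel) density estimate at the points x:
    average of one kernel bump centred on each training sample."""
    k = np.exp(-0.5 * ((x - samples[:, None]) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return k.mean(axis=0)

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=5000)
p = parzen_density(samples, np.array([0.0, 3.0]))
assert p[0] > p[1]            # the estimate peaks near the sample mean
```

In the synthesis setting, the "samples" are neighbourhood configurations gathered from the 2D exemplar, and each voxel is relaxed toward high values of its local conditional density.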
Multiplicative noise (also known as speckle) reduction is a prerequisite for many image-processing
tasks in coherent imaging systems, such as the synthetic aperture radar. One approach extensively used in this
area is based on total variation (TV) regularization, which can recover significantly sharp edges of an image,
but suffers from staircase-like artifacts. To overcome this undesirable deficiency, we propose two
novel models for removing multiplicative noise based on total generalized variation (TGV) penalty. The TGV
regularization has been mathematically proven to be able to eliminate the staircasing artifacts by being aware
of higher order smoothness. Furthermore, an efficient algorithm is developed for solving the TGV-based
optimization problems. Numerical experiments demonstrate that our proposed methods achieve state-of-the-art
results, both visually and quantitatively. In particular, when the image has some higher order smoothness, our
methods outperform the TV-based algorithms.
ETPL
DIP - 059
Speckle Reduction via Higher Order Total Variation Approach
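The multiplicative (speckle) model that TV/TGV-based despeckling targets is easy to simulate: an L-look intensity image is the clean image times unit-mean Gamma noise, so noise power scales with local brightness. A sketch (the number of looks is illustrative):

```python
import numpy as np

def add_speckle(x, looks=4, rng=None):
    """L-look multiplicative speckle: y = x * n with n ~ Gamma(L, 1/L),
    so E[n] = 1 and Var[n] = 1/L -- stronger noise on brighter pixels."""
    rng = rng or np.random.default_rng(0)
    n = rng.gamma(shape=looks, scale=1.0 / looks, size=x.shape)
    return x * n

x = np.full((200, 200), 10.0)
y = add_speckle(x, looks=4)
assert abs(y.mean() - 10.0) < 0.1            # unit-mean noise preserves radiometry
assert abs(y.var() - 10.0 ** 2 / 4) < 2.0    # variance grows with intensity^2
```

This signal-dependence is why multiplicative-noise models (often handled in the log domain) need dedicated variational formulations rather than plain additive-Gaussian ones.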
In order to quantitatively analyze biological images and study underlying mechanisms of the cellular
and subcellular processes, it is often required to track a large number of particles involved in these processes.
Manual tracking can be performed by the biologists, but the workload is very heavy. In this paper, we present
an automatic particle tracking method for analyzing an essential subcellular process, namely clathrin mediated
endocytosis. The framework of the tracking method is an extension of the classical multiple hypothesis
tracking (MHT), and it is designed to manage trajectories, solve data association problems, and handle pseudo-
splitting/merging events. In the extended MHT framework, particle tracking becomes evaluating two types of
hypotheses. The first one is the trajectory-related hypothesis, to test whether a recovered trajectory is correct,
and the second one is the observation-related hypothesis, to test whether an observation from an image
belongs to a real particle. Here, an observation refers to a detected particle and its feature vector. To detect the
particles in 2D fluorescence images taken using total internal reflection microscopy, the images are segmented
into regions, and the features of the particles are obtained by fitting Gaussian mixture models to each of the
image regions. Specific models are developed according to the properties of the particles. The proposed
tracking method is demonstrated on synthetic data under different scenarios and applied to real data.
ETPL
DIP - 060
A Novel Multiple Hypothesis Based Particle Tracking Method for Clathrin
Mediated Endocytosis Analysis Using Fluorescence Microscopy
This paper focuses on the problem of detecting a number of different class objects in images. We
present a novel part-based model for object detection with cascaded classifiers. The coarse root and fine part
classifiers are combined into the model. Different from the existing methods which learn root and part
classifiers independently, we propose a shared-Boost algorithm to jointly train multiple classifiers. This paper
is distinguished by two key contributions. The first is to introduce a new definition of shared features for
similar pattern representation among multiple classifiers. Based on this, a shared-Boost algorithm which
jointly learns multiple classifiers by reusing the shared feature information is proposed. The second
contribution is a method for constructing a discriminatively trained part-based model, which fuses the outputs
of cascaded shared-Boost classifiers as high-level features. The proposed shared-Boost-based part model is
applied for both rigid and deformable object detection experiments. Compared with the state-of-the-art
method, the proposed model achieves higher or comparable performance. In particular, it can raise the
detection rates in low-resolution images. The proposed procedure also provides a systematic framework for
information reuse among multiple classifiers in part-based object detection.
ETPL
DIP - 061
Learning Cascaded Shared-Boost Classifiers for Part-Based Object Detection
In this paper, we cast the tracking problem as finding the candidate that scores highest in the
evaluation model based upon a matrix called discriminative sparse similarity map (DSS map). This map
demonstrates the relationship between all the candidates and the templates, and it is constructed based on the
solution to an innovative optimization formulation named multitask reverse sparse representation formulation,
which searches multiple subsets from the whole candidate set to simultaneously reconstruct multiple templates
with minimum error. A customized accelerated proximal gradient (APG) method is derived to obtain the optimum solution (in matrix form)
within several iterations. This formulation allows the candidates to be evaluated accurately in parallel rather
than one by one, as most sparsity-based trackers do, while also considering the relationships between
candidates; it is therefore superior in terms of cost-performance ratio. The discriminative information
contained in this map comes from a large template set with multiple positive target templates and hundreds of
negative templates. A Laplacian term is introduced to keep the similarity of the coefficients consistent with
the similarities of the candidates, thereby making our tracker more robust. A pooling approach is proposed to extract
the discriminative information in the DSS map to easily yet effectively select good candidates from bad
ones and finally obtain the optimum tracking results. Extensive experimental evaluations on challenging image
sequences demonstrate that the proposed tracking algorithm performs favorably against the state-of-the-art
methods.
ETPL
DIP - 062
Visual Tracking via Discriminative Sparse Similarity Map
In this paper, we propose a novel example-based method for denoising and super-resolution of
medical images. The objective is to estimate a high-resolution image from a single noisy low-resolution
image, with the help of a given database of high- and low-resolution image patch pairs. Denoising and super-resolution in this paper are performed on each image patch. For each given input low-resolution patch, its high-
resolution version is estimated based on finding a nonnegative sparse linear representation of the input patch
over the low-resolution patches from the database, where the coefficients of the representation strongly depend
on the similarity between the input patch and the sample patches in the database. The problem of finding the
nonnegative sparse linear representation is modeled as a nonnegative quadratic programming problem. The
proposed method is especially useful for noise-corrupted, low-resolution images. Experimental
results show that the proposed method outperforms other state-of-the-art super-resolution methods while
effectively removing noise.
ETPL
DIP - 063
Novel Example-Based Method for Super-Resolution and Denoising of Medical
Images
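The core step described above, finding a nonnegative sparse linear representation of a patch over database patches cast as a nonnegative quadratic program, can be sketched as follows. The dictionary `D` and patch `y` here are synthetic stand-ins, and `scipy.optimize.nnls` substitutes for the paper's own solver:

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
D = np.abs(rng.normal(size=(16, 40)))            # 40 low-resolution database patches (columns)
y = D @ np.maximum(rng.normal(size=40), 0) / 10  # synthetic input low-resolution patch

# Nonnegative sparse linear representation:  min ||y - D x||_2  s.t.  x >= 0
x, residual = nnls(D, y)
```

The nonnegativity constraint itself tends to drive many coefficients to exactly zero, which is what makes the representation sparse.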
Compressive spectral imaging (CSI) senses the spatio-spectral information of a scene by measuring
2D coded projections on a focal plane array. An ℓ1-norm-based optimization algorithm is then used to recover
the underlying discretized spectral image. The coded aperture snapshot spectral imager (CASSI) is an
architecture realizing CSI where the reconstruction image quality relies on the design of a 2D set of binary
coded apertures which block-unblock the light from the scene. This paper extends the compressive capabilities
of CASSI by replacing the traditional blocking-unblocking coded apertures by a set of colored coded
apertures. The colored coded apertures are optimized such that the number of projections is minimized while
the quality of reconstruction is maximized. The optimal design of the colored coded apertures aims to better
satisfy the restricted isometry property in CASSI. The optimal designs are compared with random colored
coded aperture patterns and with the traditional blocking-unblocking coded apertures. Extensive simulations
show the improvement in reconstruction PSNR attained by the optimal colored coded aperture designs.
ETPL
DIP - 064
Colored Coded Aperture Design by Concentration of Measure in Compressive
Spectral Imaging
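The ℓ1-norm recovery step mentioned above can be illustrated with a generic iterative soft-thresholding (ISTA) sketch; the sensing matrix `H` below is a random stand-in, not an actual CASSI projection operator:

```python
import numpy as np

def ista(H, y, lam=0.01, iters=500):
    """Iterative soft-thresholding for min_f 0.5*||y - H f||^2 + lam*||f||_1."""
    step = 1.0 / np.linalg.norm(H, 2) ** 2       # 1/L with L the Lipschitz constant
    f = np.zeros(H.shape[1])
    for _ in range(iters):
        g = f + step * (H.T @ (y - H @ f))       # gradient step on the data term
        f = np.sign(g) * np.maximum(np.abs(g) - step * lam, 0.0)  # soft-threshold
    return f

rng = np.random.default_rng(1)
H = rng.normal(size=(60, 120)) / np.sqrt(60)     # stand-in for coded projections
f_true = np.zeros(120)
f_true[rng.choice(120, 5, replace=False)] = 1.0  # sparse signal to recover
y = H @ f_true                                   # compressive measurements
f_hat = ista(H, y)
```

With enough incoherent measurements relative to the sparsity level, the recovered `f_hat` closely matches the true sparse signal.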
ETPL
DIP - 065
Regularized Tree Partitioning and Its Application to Unsupervised Image
Segmentation
A prior work by Chung and Wu used an edge-based lookup table to obtain good inverse-halftoned image quality, yet it suffers from drawbacks in terms of image quality, memory consumption, and complexity. In this correspondence, an improved scheme is proposed to address these issues.
ETPL
DIP - 066
Inverse Halftoning With Context Driven Prediction
This paper proposes a novel saliency detection framework termed the saliency tree. For effective
saliency measurement, the original image is first simplified using adaptive color quantization and region
segmentation to partition the image into a set of primitive regions. Then, three measures, i.e., global contrast,
spatial sparsity, and object prior are integrated with regional similarities to generate the initial regional
saliency for each primitive region. Next, a saliency-directed region merging approach with a dynamic scale control scheme is proposed to generate the saliency tree, in which each leaf node represents a primitive region
and each non-leaf node represents a non-primitive region generated during the region merging process.
Finally, by exploiting a node selection criterion based on a regional center-surround scheme, a systematic saliency tree analysis, including salient node selection and regional saliency adjustment and selection, is performed to obtain
final regional saliency measures and to derive the high-quality pixel-wise saliency map. Extensive
experimental results on five datasets with pixel-wise ground truths demonstrate that the proposed saliency tree
model consistently outperforms the state-of-the-art saliency models.
ETPL
DIP - 067
Saliency Tree: A Novel Saliency Detection Framework
This paper proposes two sets of novel edge-texture features, Discriminative Robust Local Binary Pattern (DRLBP) and Discriminative Robust Local Ternary Pattern (DRLTP), for object recognition. By investigating the limitations of Local Binary Pattern (LBP), Local Ternary Pattern (LTP), and Robust LBP (RLBP), DRLBP and DRLTP are proposed as new features. They solve the problem, inherent in LBP and LTP, of discriminating between a bright object against a dark background and vice versa. DRLBP also resolves the problem of RLBP, whereby
LBP codes and their complements in the same block are mapped to the same code. Furthermore, the proposed
features retain contrast information necessary for proper representation of object contours that LBP, LTP, and
RLBP discard. Our proposed features are tested on seven challenging data sets: INRIA Human, Caltech
Pedestrian, UIUC Car, Caltech 101, Caltech 256, Brodatz, and KTH-TIPS2-a. Results demonstrate that the
proposed features outperform the compared approaches on most data sets.
ETPL
DIP - 068
LBP-Based Edge-Texture Features for Object Recognition
Many imaging applications require the implementation of space-varying convolution for accurate
restoration and reconstruction of images. Here, we use the term space-varying convolution to refer to linear
operators whose impulse response has slow spatial variation. In addition, these space-varying convolution
operators are often dense, so direct implementation of the convolution operator is typically computationally
impractical. One such example is the problem of stray light reduction in digital cameras, which requires the
implementation of a dense space-varying deconvolution operator. However, other inverse problems, such as
iterative tomographic reconstruction, can also depend on the implementation of dense space-varying
convolution. While space-invariant convolution can be efficiently implemented with the fast Fourier
transform, this approach does not work for space-varying operators. So direct convolution is often the only
option for implementing space-varying convolution. In this paper, we develop a general approach to the
efficient implementation of space-varying convolution, and demonstrate its use in the application of stray light
reduction. Our approach, which we call matrix source coding, is based on lossy source coding of the dense
space-varying convolution matrix. Importantly, by coding the transformation matrix, we not only reduce the
memory required to store it; we also dramatically reduce the computation required to implement matrix-vector
products. Our algorithm is able to reduce computation by approximately factoring the dense space-varying
convolution operator into a product of sparse transforms. Experimental results show that our method can
dramatically reduce the computation required for stray light reduction while maintaining high accuracy.
ETPL
DIP - 069
Fast Space-Varying Convolution Using Matrix Source Coding With
Applications to Camera Stray Light Reduction
The goal of this paper is to propose a statistical model of quantized discrete cosine transform (DCT)
coefficients. It relies on a mathematical framework that studies the image processing pipeline of a typical digital camera, instead of fitting empirical data with the variety of popular models proposed in the literature. To highlight the accuracy of the proposed model, this paper exploits it for the detection of hidden information in JPEG images. By formulating hidden data detection as a hypothesis testing problem, this paper studies the most powerful likelihood ratio test for the steganalysis of the Jsteg algorithm and theoretically establishes its statistical
performance. Based on the proposed model of DCT coefficients, a maximum likelihood estimator for
embedding rate is also designed. Numerical results on simulated and real images emphasize the accuracy of
the proposed model and the performance of the proposed test.
ETPL
DIP - 070
Statistical Model of Quantized DCT Coefficients: Application in the
Steganalysis of Jsteg Algorithm
In image classification tasks, one of the most successful algorithms is the bag-of-features (BoFs)
model. Although the BoF model has many advantages, such as simplicity, generality, and scalability, it still
suffers from several drawbacks, including the limited semantic description of local descriptors, the lack of robust structures upon single visual words, and the absence of efficient spatial weighting. To overcome these shortcomings, various techniques have been proposed, such as extracting multiple descriptors, spatial context modeling, and interest region detection. Though they have been proven to improve the BoF model to some extent, a coherent scheme to integrate the individual modules is still lacking. To address the
problems above, we propose a novel framework with spatial pooling of complementary features. Our model
expands the traditional BoF model on three aspects. First, we propose a new scheme for combining texture and
edge-based local features together at the descriptor extraction level. Next, we build geometric visual phrases to
model spatial context upon complementary features for midlevel image representation. Finally, based on a
smoothed edgemap, a simple and effective spatial weighting scheme is performed to capture the image
saliency. We test the proposed framework on several benchmark data sets for image classification. The
extensive results show the superior performance of our algorithm over the state-of-the-art methods.
ETPL
DIP - 071
Spatial Pooling of Heterogeneous Features for Image Classification
We present a novel domain adaptation approach for solving cross-domain pattern recognition
problems, i.e., the data or features to be processed and recognized are collected from different
domains of interest. Inspired by canonical correlation analysis (CCA), we utilize the derived
correlation subspace as a joint representation for associating data across different domains, and we
advance reduced kernel techniques for kernel CCA (KCCA) if nonlinear correlation subspaces are desirable. Such techniques not only make KCCA computationally more efficient but also alleviate potential over-fitting problems. Instead of directly performing recognition in the derived
CCA subspace (as prior CCA-based domain adaptation methods did), we advocate the exploitation of
domain transfer ability in this subspace, in which each dimension has a unique capability in
associating cross-domain data. In particular, we propose a novel support vector machine (SVM) with
a correlation regularizer, named correlation-transfer SVM, which incorporates the domain adaptation
ability into classifier design for cross-domain recognition. We show that our proposed domain
adaptation and classification approach can be successfully applied to a variety of cross-domain
recognition tasks such as cross-view action recognition, handwritten digit recognition with different
features, and image-to-text or text-to-image classification. From our empirical results, we verify that
our proposed method outperforms state-of-the-art domain adaptation approaches in terms of
recognition performance.
ETPL
DIP - 072
Heterogeneous Domain Adaptation and Classification by Exploiting the
Correlation Subspace
Image reranking is effective for improving the performance of a text-based image search. However,
existing reranking algorithms are limited for two main reasons: 1) the textual meta-data associated with
images is often mismatched with their actual visual content and 2) the extracted visual features do not
accurately describe the semantic similarities between images. Recently, user click information has been used
in image reranking, because clicks have been shown to more accurately describe the relevance of retrieved
images to search queries. However, a critical problem for click-based methods is the lack of click data, since
only a small number of web images have actually been clicked on by users. Therefore, we aim to solve this
problem by predicting image clicks. We propose a multimodal hypergraph learning-based sparse coding
method for image click prediction, and apply the obtained click data to the reranking of images. We adopt a
hypergraph to build a group of manifolds, which explore the complementarity of different features through a
group of weights. Unlike a graph that has an edge between two vertices, a hyperedge in a hypergraph connects
a set of vertices, and helps preserve the local smoothness of the constructed sparse codes. An alternating
optimization procedure is then performed, and the weights of different modalities and the sparse codes are
simultaneously obtained. Finally, a voting strategy is used to describe the predicted click as a binary event
(click or no click), from the images' corresponding sparse codes. Thorough empirical studies on a large-scale
database including nearly 330 K images demonstrate the effectiveness of our approach for click prediction
when compared with several other methods. Additional image reranking experiments on real-world data show that the use of click prediction is beneficial for improving the performance of prominent graph-based image
reranking algorithms.
ETPL
DIP - 073
Click Prediction for Web Image Reranking Using Multimodal Sparse Coding
We propose a new mathematical and algorithmic framework for unsupervised image segmentation,
which is a critical step in a wide variety of image processing applications. We have found that most existing
segmentation methods are not successful on histopathology images, which prompted us to investigate
segmentation of a broader class of images, namely those without clear edges between the regions to be
segmented. We model these images as occlusions of random images, which we call textures, and show that
local histograms are a useful tool for segmenting them. Based on our theoretical results, we describe a flexible
segmentation framework that draws on existing work on nonnegative matrix factorization and image
deconvolution. Results on synthetic texture mosaics and real histology images show the promise of the
method.
ETPL
DIP - 074
Images as Occlusions of Textures: A Framework for Segmentation
In recent years, there has been growing interest in mapping visual features into compact binary codes
for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes
reduces the memory cost for storage. Besides, it benefits the computational efficiency since the computation of
similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale
invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB
algorithm explores the magnitude patterns of the SIFT descriptor. It is unsupervised, and the generated binary codes are demonstrated to be distance-preserving. In addition, we propose a new search strategy to find target
features based on the cross-indexing in the binary SIFT space and original SIFT space. We evaluate our
approach on two publicly released data sets. The experiments on large-scale partial duplicate image retrieval
system demonstrate the effectiveness and efficiency of the proposed algorithm.
ETPL
DIP - 075
Cross-Indexing of Binary SIFT Codes for Large-Scale Image Search
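The Hamming-distance comparison that makes binary codes efficient can be sketched as follows; the 256-bit code length here is an illustrative assumption, not the paper's actual FSB code size:

```python
import numpy as np

def hamming(a, b):
    """Hamming distance between binary codes packed into uint8 arrays."""
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

# Two hypothetical 256-bit binarized descriptors (32 bytes each).
a = np.arange(32, dtype=np.uint8)
b = a.copy()
b[0] ^= 0b10100000        # flip two bits of the first byte
```

Because the distance reduces to XOR plus a bit count, it is far cheaper than a Euclidean comparison of the original floating-point descriptors.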
We propose a new strategy to evaluate the quality of multi- and hyperspectral images, from the
perspective of human perception. We define the spectral image difference as the overall perceived difference
between two spectral images under a set of specified viewing conditions (illuminants). First, we analyze the
stability of seven image-difference features across illuminants, by means of an information-theoretic strategy.
We demonstrate, in particular, that in the case of common spectral distortions (spectral gamut mapping,
spectral compression, spectral reconstruction), chromatic features vary much more than achromatic ones
despite considering chromatic adaptation. Then, we propose two computationally efficient spectral image
difference metrics and compare them to the results of a subjective visual experiment. A significant
improvement is shown over existing metrics such as the widely used root-mean-square error.
ETPL
DIP - 076
Image-Difference Prediction: From Color to Spectral
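For reference, the root-mean-square error baseline the authors compare against can be computed over all pixels and bands of a spectral image pair; the toy 31-band images below are illustrative:

```python
import numpy as np

def spectral_rmse(img1, img2):
    """Root-mean-square error over all pixels and spectral bands."""
    d = img1.astype(float) - img2.astype(float)
    return float(np.sqrt(np.mean(d ** 2)))

ref = np.zeros((4, 4, 31))    # toy 31-band spectral image
dist = ref + 2.0              # uniform distortion of 2 units per band
```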
Multimedia communication is becoming pervasive because of the progress in wireless
communications and multimedia coding. Estimating the quality of the visual content accurately is crucial in
providing satisfactory service. State-of-the-art visual quality assessment approaches are effective when the input image and reference image have the same resolution. However, estimating the quality of an image whose spatial resolution differs from that of the reference image is still a challenging problem. To solve this
problem, we develop a quality estimator (QE), which computes the quality of the input image without
resampling the reference or the input images. In this paper, we begin by identifying the potential weaknesses
of previous approaches used to estimate the quality of experience. Next, we design a QE to estimate the
quality of a distorted image with a lower resolution compared with the reference image. We also propose a
subjective test environment to explore the success of the proposed algorithm in comparison with other QEs.
When the input and test images have different resolutions, the subjective tests demonstrate that in most cases
the proposed method works better than other approaches. In addition, the proposed algorithm also performs
well when the reference image and the test image have the same resolution.
ETPL
DIP - 077
Full-Reference Quality Estimation for Images With Different Spatial
Resolutions
Content-aware image resizing techniques allow the visual content of images to be taken into account during the resizing process. The basic idea behind these algorithms is the removal of vertical and/or horizontal paths of pixels (i.e., seams) containing low-salience information. In this paper, we present a method which exploits the gradient vector flow (GVF) of the image to establish the paths to be considered during the resizing. The relevance of each GVF path is straightforwardly derived from an energy map related to the magnitude of the GVF associated with the image to be resized. To make the visual content of the images more relevant during content-aware resizing, we also propose to select the generated GVF paths based on their visual saliency properties. In this way, visually important image regions are better preserved in the final
resized image. The proposed technique has been tested, both qualitatively and quantitatively, by considering a
representative data set of 1000 images labeled with corresponding salient objects (i.e., ground-truth maps).
Experimental results demonstrate that our method preserves crucial salient regions better than other state-of-
the-art algorithms.
ETPL
DIP - 078
Saliency-Based Selection of Gradient Vector Flow Paths for Content Aware
Image Resizing
Texture enhancement presents an ongoing challenge, in spite of the considerable progress made in
recent years. Whereas most of the effort so far has been devoted to the enhancement of regular textures, stochastic textures, which are encountered in most natural images, still pose an outstanding problem. The purpose of
enhancement of stochastic textures is to recover details, which were lost during the acquisition of the image. In
this paper, a texture model, based on fractional Brownian motion (fBm), is proposed. The model is global and
does not entail using image patches. The fBm is a self-similar stochastic process. Self-similarity is known to
characterize a large class of natural textures. The fBm-based model is evaluated and a single-image
regularized superresolution algorithm is derived. The proposed algorithm is useful for enhancement of a wide
range of textures. Its performance is compared with single-image superresolution methods and its advantages
are highlighted.
ETPL
DIP - 079
Single-Image Superresolution of Natural Stochastic Textures Based on
Fractional Brownian Motion
This paper addresses the problem of automatic figure-ground segmentation, which aims at
automatically segmenting out all foreground objects from background. The underlying idea of this approach is
to transfer segmentation masks of globally and locally (glocally) similar exemplars into the query image. For
this purpose, we propose a novel high-level image representation method named the object-oriented descriptor.
Using this descriptor, a set of exemplar images glocally similar to the query image is retrieved. Then, using
over-segmented regions of these retrieved exemplars, a discriminative classifier is learned on-the-fly and
subsequently used to predict foreground probability for the query image. Finally, the optimal segmentation is
obtained by combining the online prediction with typical energy optimization of Markov random field. The
proposed approach has been extensively evaluated on three datasets, including Pascal VOC 2010, VOC 2011
segmentation challenges, and iCoseg dataset. Experiments show that the proposed approach outperforms state-
of-the-art methods and has the potential to segment large-scale images containing unknown objects, which
never appear in the exemplar images.
ETPL
DIP - 080
Online Glocal Transfer for Automatic Figure-Ground Segmentation
This paper presents a method for learning overcomplete dictionaries of atoms composed of two
modalities that describe a 3D scene: 1) image intensity and 2) scene depth. We propose a novel joint basis
pursuit (JBP) algorithm that finds related sparse features in two modalities using conic programming and we
integrate it into a two-step dictionary learning algorithm. The JBP differs from related convex algorithms
because it finds joint sparsity models with different atoms and different coefficient values for intensity and
depth. This is crucial for recovering generative models where the same sparse underlying causes (3D features)
give rise to different signals (intensity and depth). We give a bound for recovery error of sparse coefficients
obtained by JBP, and show numerically that JBP is superior to the group lasso algorithm. When applied to the
Middlebury depth-intensity database, our learning algorithm converges to a set of related features, such as
pairs of depth and intensity edges or image textures and depth slants. Finally, we show that JBP outperforms
state of the art methods on depth inpainting for time-of-flight and Microsoft Kinect 3D data.
ETPL
DIP - 081
Learning Joint Intensity-Depth Sparse Representations
Geodesic distance, as an essential measurement for data dissimilarity, has been successfully used in
manifold learning. However, most geodesic distance-based manifold learning algorithms have two limitations
when applied to classification: 1) class information is rarely used in computing the geodesic distances between
data points on manifolds and 2) little attention has been paid to building an explicit dimension reduction
mapping for extracting the discriminative information hidden in the geodesic distances. In this paper, we
regard geodesic distance as a kind of kernel, which maps data from a linearly inseparable space to a linearly separable distance space. In doing this, a new semisupervised manifold learning algorithm, namely the regularized
geodesic feature learning algorithm, is proposed. The method consists of three techniques: a semisupervised
graph construction method, replacement of original data points with feature vectors which are built by
geodesic distances, and a new semisupervised dimension reduction method for feature vectors. Experiments
on the MNIST, USPS handwritten digit data sets, MIT CBCL face versus nonface data set, and an intelligent
traffic data set show the effectiveness of the proposed algorithm.
ETPL
DIP - 082
A Regularized Approach for Geodesic-Based Semisupervised Multimanifold
Learning
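The geodesic distances the method builds feature vectors from are commonly approximated by shortest paths on a k-nearest-neighbour graph; a minimal sketch of that approximation (not the paper's semisupervised graph construction) follows:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from scipy.spatial.distance import cdist

def geodesic_distances(X, k=5):
    """Approximate geodesic distances by shortest paths on a kNN graph."""
    D = cdist(X, X)                              # pairwise Euclidean distances
    W = np.zeros_like(D)                         # 0 = no edge for csgraph routines
    idx = np.argsort(D, axis=1)[:, 1:k + 1]      # k nearest neighbours, skipping self
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, idx.ravel()] = D[rows, idx.ravel()]
    return shortest_path(W, method="D", directed=False)

# Points on a quarter circle: the geodesic between the endpoints approaches
# the arc length (pi/2), which exceeds the straight-line chord.
t = np.linspace(0.0, np.pi / 2, 50)
X = np.c_[np.cos(t), np.sin(t)]
G = geodesic_distances(X, k=3)
```

On curved data like this, the graph-based distance follows the manifold rather than cutting across it, which is exactly the dissimilarity structure the abstract exploits.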
This paper presents a nonlinear mixing model for joint hyperspectral image unmixing and
nonlinearity detection. The proposed model assumes that the pixel reflectances are linear combinations of
known pure spectral components corrupted by an additional nonlinear term affecting the endmembers, and
contaminated by an additive Gaussian noise. A Markov random field is considered for nonlinearity detection
based on the spatial structure of the nonlinear terms. The observed image is segmented into regions where
nonlinear terms, if present, share similar statistical properties. A Bayesian algorithm is proposed to estimate
the parameters involved in the model yielding a joint nonlinear unmixing and nonlinearity detection algorithm.
The performance of the proposed strategy is first evaluated on synthetic data. Simulations conducted with real
data show the accuracy of the proposed unmixing and nonlinearity detection strategy for the analysis of
hyperspectral images.
ETPL
DIP - 083
Residual Component Analysis of Hyperspectral Images—Application to Joint
Nonlinear Unmixing and Nonlinearity Detection
In this paper, we present an efficient multiscale low-rank representation for image segmentation. Our
method begins with partitioning the input images into a set of superpixels, followed by seeking the optimal
superpixel-pair affinity matrix, both of which are performed at multiple scales of the input images. Since low-
level superpixel features are usually corrupted by image noise, we propose to infer the low-rank refined
affinity matrix. The inference is guided by two observations on natural images. First, looking into a single
image, local small-size image patterns tend to recur frequently within the same semantic region, but may not
appear in semantically different regions. These internal image statistics are referred to as the replication prior, and we quantitatively justify it on real image databases. Second, the affinity matrices at different scales should be
consistently solved, which leads to the cross-scale consistency constraint. We formulate these two purposes
with one unified formulation and develop an efficient optimization procedure. The proposed representation
can be used for both unsupervised and supervised image segmentation tasks. Our experiments on public data
sets demonstrate the presented method can substantially improve segmentation accuracy.
ETPL
DIP - 084
MsLRR: A Unified Multiscale Low-Rank Representation for Image
Segmentation
The success of many image restoration algorithms is often due to their ability to sparsely describe the
original signal. Shukla proposed a compression algorithm, based on a sparse quadtree decomposition model, which could optimally represent piecewise polynomial images. In this paper, we adapt this model to image restoration by changing the rate-distortion penalty to a description-length penalty. In addition, one
of the major drawbacks of this type of approximation is the computational complexity required to find a
suitable subspace for each node of the quadtree. We address this issue by searching for a suitable subspace
much more efficiently using the mathematics of updating matrix factorisations. Algorithms are developed to
tackle denoising and interpolation. Simulation results indicate that we beat state-of-the-art results when the
original signal is in the model (e.g., depth images) and are competitive for natural images when the
degradation is high.
ETPL
DIP - 085
Quadtree Structured Image Approximation for Denoising and Interpolation
Line scratch detection in old films is a particularly challenging problem due to the variable
spatiotemporal characteristics of this defect. Some of the main problems include sensitivity to noise and
texture, and false detections due to thin vertical structures belonging to the scene. We propose a
robust and automatic algorithm for frame-by-frame line scratch detection in old films, as well as a temporal
algorithm for the filtering of false detections. In the frame-by-frame algorithm, we relax some of the
hypotheses used in previous algorithms in order to detect a wider variety of scratches. This step's robustness and lack of external parameters are ensured by the combined use of an a contrario methodology and local
statistical estimation. In this manner, over-detection in textured or cluttered areas is greatly reduced. The
temporal filtering algorithm eliminates false detections due to thin vertical structures by exploiting the
coherence of their motion with that of the underlying scene. Experiments demonstrate the ability of the
resulting detection procedure to deal with difficult situations, in particular in the presence of noise, texture,
and slanted or partial scratches. Comparisons show significant advantages over previous work.
ETPL
DIP - 086
Robust Automatic Line Scratch Detection in Films
The behavior and performance of denoising algorithms are governed by one or several parameters,
whose optimal settings depend on the content of the processed image and the characteristics of the noise, and
are generally designed to minimize the mean squared error (MSE) between the denoised image returned by the algorithm and a virtual ground truth. In this paper, we introduce a new Poisson-Gaussian unbiased risk estimator (PG-URE) of the MSE, applicable to a mixed Poisson-Gaussian noise model
that unifies the widely used Gaussian and Poisson noise models in fluorescence bioimaging applications. We
propose a stochastic methodology to evaluate this estimator in the case when little is known about the internal
machinery of the considered denoising algorithm, and we analyze both theoretically and empirically the
characteristics of the PG-URE estimator. Finally, we evaluate the PG-URE-driven parametrization for three
standard denoising algorithms, with and without variance stabilizing transforms, and different characteristics
of the Poisson-Gaussian noise mixture.
ETPL
DIP - 087
An Unbiased Risk Estimator for Image Denoising in the Presence of Mixed
Poisson–Gaussian Noise
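The stochastic evaluation idea can be illustrated in the simpler pure-Gaussian setting: the divergence of a black-box denoiser is probed with a single random perturbation, giving a Monte-Carlo SURE estimate of the MSE. This is a minimal sketch assuming a known noise level `sigma` and a denoiser exposed as a plain function; the PG-URE of the paper extends the same idea to the mixed Poisson-Gaussian model.

```python
import numpy as np

def mc_sure(y, denoise, sigma, eps=1e-3, rng=None):
    """Monte-Carlo SURE for i.i.d. Gaussian noise of std sigma: the
    divergence of the black-box denoiser is probed with one random
    direction b instead of being derived analytically."""
    rng = np.random.default_rng(rng)
    n = y.size
    fy = denoise(y)
    b = rng.standard_normal(y.shape)
    # directional-derivative estimate of div f(y)
    div = b.ravel() @ (denoise(y + eps * b) - fy).ravel() / eps
    return np.sum((fy - y) ** 2) / n - sigma ** 2 + 2.0 * sigma ** 2 * div / n
```

Sweeping a denoiser parameter and keeping the value that minimizes `mc_sure` is the parametrization strategy the abstract describes, here in its Gaussian-only form.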
Block Truncation Coding (BTC) has been considered as a highly efficient compression technique for
decades. However, the annoying blocking effect and false contour under low bit rate configuration are its key
problems. In this work, an improved BTC, namely Dot-Diffused BTC (DDBTC), is proposed to solve these
problems. On one hand, the DDBTC can provide excellent processing efficiency by exploiting the innate
parallelism advantage of dot diffusion. On the other hand, the DDBTC can provide excellent image quality by
co-optimizing the class matrix and diffused matrix of the dot diffusion. The experimental results demonstrate
that the proposed DDBTC is fully superior to the previous Error-Diffused BTC (EDBTC) in terms of image
quality and processing efficiency, and has much better image quality than that of the Ordered-Dither BTC
(ODBTC).
ETPL
DIP - 088
Improved Block Truncation Coding Using Optimized Dot Diffusion
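The dot-diffusion co-optimization itself does not fit a short example, but the BTC baseline that DDBTC improves on is compact: each block is encoded as a one-bit-per-pixel bitmap plus two levels chosen to preserve the block mean and standard deviation. A sketch of that classic moment-preserving step:

```python
import numpy as np

def btc_block(block):
    """Classic moment-preserving BTC: encode a block as a bitmap plus
    two quantization levels that preserve the block mean and std."""
    m = block.size
    mean, std = block.mean(), block.std()
    bitmap = block >= mean
    q = int(bitmap.sum())
    if q in (0, m):                            # flat block: one level suffices
        return bitmap, mean, mean
    low = mean - std * np.sqrt(q / (m - q))    # level assigned to the 0-bits
    high = mean + std * np.sqrt((m - q) / q)   # level assigned to the 1-bits
    return bitmap, low, high

def btc_decode(bitmap, low, high):
    return np.where(bitmap, high, low)
```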
Mathematical morphology is a very popular framework for processing binary or grayscale images.
One of the key problems in applying this framework to color images is the notorious false color problem. We
discuss the nature of this problem and its origins. In doing so, it becomes apparent that the lack of invariance
of operators to certain transformations (forming a group) plays an important role. The main culprits are the
basic join and meet operations, and the associated lattice structure that forms the theoretical basis for
mathematical morphology. We show how a lattice that is not group invariant can be related to another lattice
that is. When all transformations in a group are linear, these lattices can be related to one another via the
theory of frames. This provides all the machinery to let us transform any (grayscale or color) morphological
filter into a group-invariant filter on grayscale or color images. We then demonstrate the potential for both
subjective and objective improvement in selected tasks.
ETPL
DIP - 089
Group-Invariant Colour Morphology Based on Frames
In this paper, we present an extension of the iterative closest point (ICP) algorithm that
simultaneously registers multiple 3D scans. While ICP fails to utilize the multiview constraints available, our
method exploits the information redundancy in a set of 3D scans by using the averaging of relative motions.
This averaging method utilizes the Lie group structure of motions, resulting in a 3D registration method that is
both efficient and accurate. In addition, we present two variants of our approach, i.e., a method that solves
for multiview 3D registration while obeying causality and a transitive correspondence variant that efficiently
solves the correspondence problem across multiple scans. We present experimental results to characterize our
method and explain its behavior as well as those of some other multiview registration methods in the literature.
We establish the superior accuracy of our method in comparison to these multiview methods with
registration results on a set of well-known real datasets of 3D scans.
ETPL
DIP - 090
On Averaging Multiview Relations for 3D Scan Registration
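The motion-averaging ingredient can be sketched for rotations alone: relative rotations are averaged in the Lie algebra of SO(3) with a first-order Karcher-mean iteration. This NumPy-only sketch writes out the Rodrigues exponential and logarithm by hand, and omits translations and the registration loop itself.

```python
import numpy as np

def exp_so3(w):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    th = np.linalg.norm(w)
    if th < 1e-12:
        return np.eye(3)
    k = w / th
    K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
    return np.eye(3) + np.sin(th) * K + (1 - np.cos(th)) * (K @ K)

def log_so3(Rm):
    """Inverse Rodrigues: rotation matrix -> rotation vector."""
    th = np.arccos(np.clip((np.trace(Rm) - 1) / 2, -1.0, 1.0))
    if th < 1e-12:
        return np.zeros(3)
    v = np.array([Rm[2, 1] - Rm[1, 2], Rm[0, 2] - Rm[2, 0], Rm[1, 0] - Rm[0, 1]])
    return th / (2 * np.sin(th)) * v

def rotation_mean(Rs, iters=20):
    """Karcher mean: average the residual rotation vectors around the
    current estimate, then step along the group exponential."""
    mu = Rs[0]
    for _ in range(iters):
        delta = np.mean([log_so3(mu.T @ Rm) for Rm in Rs], axis=0)
        if np.linalg.norm(delta) < 1e-12:
            break
        mu = mu @ exp_so3(delta)
    return mu
```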
The distributions of discrete cosine transform (DCT) coefficients of images are revisited on a per-
image basis. To better handle the heavy tail phenomenon commonly seen in the DCT coefficients, a
new model dubbed a transparent composite model (TCM) is proposed and justified for both
modeling accuracy and an additional data reduction capability. Given a sequence of the DCT coefficients, a
TCM first separates the tail from the main body of the sequence. Then, a uniform distribution is used
to model the DCT coefficients in the heavy tail, whereas a different parametric distribution is used to
model data in the main body. The separation boundary and other parameters of the TCM can be estimated via
maximum likelihood estimation. Efficient online algorithms are proposed for parameter estimation and their
convergence is also proved. Experimental results based on Kullback-Leibler divergence and χ2 test show that
for real-valued continuous ac coefficients, the TCM based on truncated Laplacian offers the best tradeoff
between modeling accuracy and complexity. For discrete or integer DCT coefficients, the discrete TCM based
on truncated geometric distributions (GMTCM) models the ac coefficients more accurately than pure
Laplacian models and generalized Gaussian models in the majority of cases, while having simplicity and practicality
similar to those of pure Laplacian models. In addition, it is demonstrated that the GMTCM also exhibits a
good capability of data reduction or feature extraction: the DCT coefficients in the heavy tail identified by the
GMTCM are truly outliers, and these outliers represent an outlier image revealing some unique global features
of the image. Overall, the modeling performance and the data reduction feature of the GMTCM make it a
desirable choice for modeling discrete or integer DCT coefficients in the real-world image or video
applications, as summarized in a few of our further studies on quantization design, entropy coding design, and
image understanding and management.
ETPL
DIP - 091
Transparent Composite Model for DCT Coefficients: Design and Analysis
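A much-simplified version of the tail separation can be written as a grid search over the boundary, with a folded-Laplacian body and a uniform tail scored by log-likelihood. The fixed grid, the Laplacian-only body, and the plain ML scale estimate are simplifications of this sketch; the paper's truncated models and online estimators are considerably more refined.

```python
import numpy as np

def fit_tcm(coeffs, grid=20):
    """Grid-search a separation boundary yc: model |x| <= yc with a
    folded Laplacian truncated to [0, yc] and the tail with a uniform
    density on (yc, xmax], keeping the maximum-likelihood boundary."""
    x = np.abs(np.asarray(coeffs, float))
    xmax = x.max()
    best_ll, best_yc = -np.inf, None
    for yc in np.linspace(xmax / grid, xmax * (grid - 1) / grid, grid - 1):
        body, tail = x[x <= yc], x[x > yc]
        if len(body) < 2 or body.mean() == 0:
            continue
        p = len(body) / len(x)        # probability mass assigned to the body
        b = body.mean()               # ML scale of the folded Laplacian
        ll = len(body) * (np.log(p) - np.log(b)
                          - np.log1p(-np.exp(-yc / b))) - body.sum() / b
        if len(tail):                 # uniform tail on (yc, xmax]
            ll += len(tail) * (np.log(1 - p) - np.log(xmax - yc))
        if ll > best_ll:
            best_ll, best_yc = ll, yc
    return best_yc
```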
Privacy is a critical issue when the data owners outsource data storage or processing to a third party
computing service, such as the cloud. In this paper, we identify a cloud computing application scenario that
requires simultaneously performing secure watermark detection and privacy-preserving multimedia
data storage. We then propose a compressive sensing (CS)-based framework using secure multiparty
computation (MPC) protocols to address such a requirement. In our framework, the multimedia data and
secret watermark pattern are presented to the cloud for secure watermark detection in a CS domain to protect
the privacy. During CS transformation, the privacy of the CS matrix and the watermark pattern is protected by
the MPC protocols under the semi-honest security model. We derive the expected watermark detection
performance in the CS domain, given the target image, watermark pattern, and the size of the CS matrix (but
without the CS matrix itself). The correctness of the derived performance has been validated by our
experiments. Our theoretical analysis and experimental results show that secure watermark detection in the CS
domain is feasible. Our framework can also be extended to other collaborative secure signal processing and
data-mining applications in the cloud.
ETPL
DIP - 092
A Compressive Sensing Based Secure Watermark Detection and Privacy
Preserving Storage Framework
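The reason detection can work in a CS domain at all is that random projections approximately preserve inner products, so a correlation detector can run on projected data without exposing the raw image or watermark. A minimal sketch with a plain Gaussian sensing matrix and no MPC layer (the paper's protocols additionally keep the matrix itself private):

```python
import numpy as np

def cs_correlation(x, w, m, rng=None):
    """Correlation detector in a compressive domain: a random Gaussian
    projection phi approximately preserves inner products, so
    (phi x) . (phi w) ~ x . w without using x or w directly."""
    rng = np.random.default_rng(rng)
    n = x.size
    phi = rng.standard_normal((m, n)) / np.sqrt(m)
    return float((phi @ x) @ (phi @ w))
```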
Noise is present in all images captured by real-world image sensors. Poisson distribution is said to
model the stochastic nature of the photon arrival process and agrees with the distribution of measured pixel
values. We propose a method for estimating
unknown noise parameters from Poisson-corrupted images using properties of variance stabilization. With a
significantly lower computational complexity and improved stability, the proposed estimation technique
yields noise parameters that are comparable in accuracy to state-of-the-art methods.
ETPL
DIP - 093
Noise Parameter Estimation for Poisson Corrupted Images Using Variance
Stabilization Transforms
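The variance-stabilization viewpoint rests on the `var = gain * mean` law of scaled-Poisson data. A toy gain estimator from local patch statistics, assuming mostly flat patches (the paper's method is more robust to image structure):

```python
import numpy as np

def estimate_poisson_gain(img, patch=8):
    """Estimate the gain a of scaled-Poisson noise (var = a * mean) from
    per-patch sample means and variances."""
    h, w = (s - s % patch for s in img.shape)
    blocks = img[:h, :w].reshape(h // patch, patch, w // patch, patch)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    means = blocks.mean(axis=1)
    variances = blocks.var(axis=1, ddof=1)
    # median is robust to the occasional patch containing structure
    return float(np.median(variances / np.maximum(means, 1e-9)))
```

Once the gain `a` is known, an Anscombe-type transform `2*sqrt(x/a + 3/8)` approximately stabilizes the noise variance to one.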
The left ventricular myocardium plays a key role in the entire circulation system and
an automatic delineation of the myocardium is a prerequisite for most of the subsequent functional analysis. In
this paper, we present a complete system for an automatic segmentation of
the left ventricular myocardium from cardiac computed tomography (CT) images using the shape information
from images to be segmented. The system follows a coarse-to-fine strategy by first localizing the left ventricle
and then deforming the myocardial surfaces of the left ventricle to refine the segmentation. In particular, the
blood pool of a CT image is extracted and represented as a triangulated surface. Then, the left ventricle is
localized as a salient component on this surface using geometric and anatomical characteristics. After that, the
myocardial surfaces are initialized from the localization result and evolved by applying forces from
the image intensities with a constraint based on the initial myocardial surface locations. The proposed
framework has been validated on 34 human and 12 pig CT images, and the robustness and accuracy are
demonstrated.
ETPL
DIP - 094
A Complete System for Automatic Extraction of Left Ventricular Myocardium
From CT Images Using Shape Segmentation and Contour Evolution
We propose a blind (no-reference or NR) video quality evaluation model that is not distortion
specific. The approach relies on a spatio-temporal model of video scenes in the discrete cosine transform
domain, and on a model that characterizes the type of motion occurring in the scenes, to predict video quality.
We use the models to define video statistics and perceptual features that are the basis of a video
quality assessment (VQA) algorithm that does not require the presence of a pristine video to compare against
in order to predict a perceptual quality score. The contributions of this paper are threefold. 1) We propose a
spatio-temporal natural scene statistics (NSS) model for videos. 2) We propose a motion model that quantifies
motion coherency in video scenes. 3) We show that the proposed NSS and motion coherency models are
appropriate for quality assessment of videos, and we utilize them to design a blind VQA algorithm that
correlates highly with human judgments of quality. The proposed algorithm, called video BLIINDS, is tested
on the LIVE VQA database and on the EPFL-PoliMi video database and shown to perform close to the level
of top performing reduced and full reference VQA algorithms.
ETPL
DIP - 095
Blind Prediction of Natural Video Quality
While image-difference metrics show good prediction performance on visual data, they often yield
artifact-contaminated results if used as objective functions for optimizing complex image-processing tasks.
We investigate in this regard the recently proposed color-image-difference (CID) metric particularly
developed for predicting gamut-mapping distortions. We present an algorithm for optimizing gamut mapping
employing the CID metric as the objective function. Resulting images contain various visual artifacts, which
are addressed by multiple modifications yielding the improved color-image-difference (iCID) metric. The
iCID-based optimizations are free from artifacts and retain contrast, structure, and color of the
original image to a great extent. Furthermore, the prediction performance on visual data is improved by the
modifications.
ETPL
DIP - 096
Color-Image Quality Assessment: From Prediction to Optimization
This paper presents an accelerated iterative Landweber method
for nonlinear ultrasonic tomographic imaging in a multiple-input multiple-output (MIMO) configuration under
a sparsity constraint on the image. The proposed method introduces the emerging MIMO signal processing
techniques and target sparseness constraints in the traditional computational imaging field, thus significantly
improving the speed of image reconstruction compared with the conventional imaging method while producing
high quality images. Using numerical examples, we demonstrate that incorporating prior knowledge about
the imaging field, such as target sparseness, significantly accelerates the convergence of the iterative
imaging method, which provides considerable benefits to real-time tomographic imaging applications.
ETPL
DIP - 097
Accelerated Nonlinear Multichannel Ultrasonic Tomographic Imaging Using Target
Sparseness
Blur kernel estimation is a crucial step in the deblurring process for images. Estimation of the kernel,
especially in the presence of noise, is easily perturbed, and the quality of the resulting deblurred images is
hence degraded. Since every motion blur in a single exposure image can be represented by 2D parametric
curves, we adopt a piecewise-linear model to approximate the curves for the reliable blur kernel estimation.
The model is found to be an effective tradeoff between flexibility and robustness as it takes advantage of two
extremes: (1) the generic model, represented by a discrete 2D function, which has a high degree of freedom
(DOF) for the maximum flexibility but suffers from noise and (2) the linear model, which enhances robustness
and simplicity but has limited expressiveness due to its low DOF. We evaluate several deblurring methods
based on not only the generic model, but also the piecewise-linear model as an alternative. After analyzing the
experiment results using real-world images with significant levels of noise and a benchmark data set, we
conclude that the proposed model is not only robust with respect to noise, but also flexible in dealing with
various types of blur.
ETPL
DIP - 098
Robust Estimation of Motion Blur Kernel Using a Piecewise-Linear Model
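Synthesizing a kernel from the piecewise-linear model is straightforward: a list of control points (hypothetical, in kernel coordinates) is rasterized segment by segment into a normalized kernel. This sketch covers only kernel synthesis, not the estimation procedure the paper develops:

```python
import numpy as np

def polyline_kernel(points, size=15, samples_per_seg=200):
    """Rasterize a piecewise-linear motion path into a normalized
    size x size blur kernel by dense sampling of each segment."""
    k = np.zeros((size, size))
    pts = np.asarray(points, float)
    for p, q in zip(pts[:-1], pts[1:]):          # one linear piece at a time
        for t in np.linspace(0.0, 1.0, samples_per_seg):
            r, c = (1.0 - t) * p + t * q         # point on the segment
            k[int(round(r)), int(round(c))] += 1.0
    return k / k.sum()                           # kernel sums to one
```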
A directed graph (or digraph) approach is proposed in this paper for identifying all the visual objects
commonly presented in the two images under comparison. As a model, the directed graph is superior to the
undirected graph, since there are two link weights with opposite orientations associated with each link of
the graph. However, it inevitably draws two main challenges: 1) how to compute the two link weights for each
link and 2) how to extract the subgraph from the digraph. For 1), a novel n-ranking process for computing the
generalized median and the Gaussian link-weight mapping function are developed that basically map the
established undirected graph to the digraph. To achieve this graph mapping, the proposed process and function
are applied to each vertex independently for computing its directed link weight by not only considering the
influences inserted from its immediately adjacent neighboring vertices (in terms of their link-weight values),
but also offering other desirable merits, i.e., link-weight enhancement and computational complexity reduction.
For 2), an evolutionary iterative process for solving the non-cooperative game theory is exploited to handle the
non-symmetric weighted adjacency matrix. The abovementioned two stages of processes will be conducted for
each assumed scale-change factor, experimented over a range of possible values, one factor at a time. If there
is a match on the scale-change factor under experiment, the common visual patterns with the same scale-
change factor will be extracted. If more than one pattern is extracted, the proposed topological splitting
method is able to further differentiate among them, provided that the visual objects are sufficiently far apart
from each other. Extensive simulation results have clearly demonstrated the superior performance
accomplished by the proposed digraph approach, compared with those of using the
undirected graph approach.
ETPL
DIP - 099
Common Visual Pattern Discovery via Directed Graph
Photo aesthetic quality evaluation is a fundamental yet underaddressed task in computer vision and
image processing fields. Conventional approaches are frustrated by the following two drawbacks. First, both
the local and global spatial arrangements of image regions play an important role in photo aesthetics. However,
existing rules, e.g., visual balance, heuristically define which spatial distribution among the salient regions of
a photo is aesthetically pleasing. Second, it is difficult to adjust visual cues from multiple channels
automatically in photo aesthetics assessment. To solve these problems, we propose a
new photo aesthetics evaluation framework, focusing on learning the image descriptors that
characterize local and global structural aesthetics from multiple visual channels. In particular, to describe the
spatial structure of the image local regions, we construct graphlets (small-sized connected graphs) by connecting
spatially adjacent atomic regions. Since spatially adjacent graphlets distribute closely in their feature space, we
project them onto a manifold and subsequently propose an embedding algorithm. The embedding algorithm
encodes the photo's global spatial layout into graphlets. Simultaneously, the importance of graphlets from
multiple visual channels is dynamically adjusted. Finally, these post-embedding graphlets are integrated
for photo aesthetics evaluation using a probabilistic model. Experimental results show that: 1) the visualized
graphlets explicitly capture the aesthetically arranged atomic regions; 2) the proposed approach generalizes
and improves four prominent aesthetic rules; and 3) our approach significantly outperforms state-of-the-art
algorithms in photo aesthetics prediction.
ETPL
DIP - 100
Fusion of Multichannel Local and Global Structural Cues for Photo Aesthetics
Evaluation
Supervised machine learning techniques have been applied to
multilabel image classification problems with tremendous success. Despite disparate learning mechanisms,
their performances heavily rely on the quality of training images. However, the acquisition of
training images requires significant efforts from human annotators. This hinders the applications of
supervised learning techniques to large scale problems. In this paper, we propose a high-
order label correlation driven active learning (HoAL) approach that allows the iterative learning algorithm
itself to select the informative example-label pairs from which it learns, so as to learn an accurate classifier
with less annotation effort. Four crucial issues are considered by the proposed HoAL: 1) unlike binary cases,
the selection granularity for multilabel active learning needs to be refined from example to example-label pair; 2)
different labels are seldom independent, and label correlations provide critical information for
efficient learning; 3) in addition to pair-wise label correlations, high-order label correlations are also
informative for multilabel active learning; and 4) since the number of label combinations increases
exponentially with respect to the number of labels, an efficient mining method is required to discover
informative label correlations. The proposed approach is tested on public data sets, and the empirical results
demonstrate its effectiveness.
ETPL
DIP - 101
Multilabel Image Classification Via High-Order Label Correlation Driven
Active Learning
We present a novel image superpixel segmentation approach using the proposed lazy random walk
(LRW) algorithm. Our method begins with initializing the seed positions and runs the LRW algorithm on the
input image to obtain the probabilities of each pixel. Then, the boundaries of the initial superpixels are
obtained according to the probabilities and the commute time. The initial superpixels are iteratively optimized
by the new energy function, which is defined on the commute time and the texture measurement. Our LRW
algorithm with self-loops has the merit of segmenting weak boundaries and complicated texture regions very
well through the new global probability maps and the commute time strategy. The superpixel performance is
further improved by relocating the center positions of superpixels and dividing the large superpixels into small
ones with the proposed optimization algorithm. The experimental results demonstrate that our method achieves
better performance than previous superpixel approaches.
ETPL
DIP - 102
Lazy Random Walks for Superpixel Segmentation
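The lazy-walk construction can be sketched on a small affinity graph: the transition matrix mixes a self-loop with a row-normalized affinity step, and seed-to-pixel affinities are read off the resolvent (I - alpha*P)^-1, a standard closed form for discounted random walks. The fixed 1/2 self-loop weight and the resolvent scoring are simplifying choices of this sketch; the paper's commute-time boundaries and energy optimization are omitted.

```python
import numpy as np

def lrw_scores(W, seeds, alpha=0.9):
    """Affinity of every node to each seed under a lazy random walk:
    with probability 1/2 the walker stays put (self-loop), otherwise it
    moves along the row-normalized affinity matrix W.  Scores come from
    the resolvent (I - alpha*P)^-1, which sums over discounted walks."""
    n = W.shape[0]
    P = 0.5 * np.eye(n) + 0.5 * W / W.sum(axis=1, keepdims=True)
    F = np.linalg.inv(np.eye(n) - alpha * P)
    S = F[:, seeds]
    return S / S.sum(axis=1, keepdims=True)    # per-node seed probabilities
```

Labeling each node by the seed with the largest score segments a two-cluster graph correctly even when the clusters are connected by a weak bridge.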
Conventionally, data embedding techniques aim at maintaining high-output image quality so that the
difference between the original and the embedded images is imperceptible to the naked eye. Recently, as a
new trend, some researchers exploited reversible data embedding techniques to deliberately degrade image
quality to a desirable level of distortion. In this paper, a unified data embedding-scrambling technique called
UES is proposed to achieve two objectives simultaneously, namely, high payload and adaptive scalable quality
degradation. First, a pixel intensity value prediction method called checkerboard-based prediction is proposed
to accurately predict 75% of the pixels in the image based on the information obtained from 25% of the image.
Then, the locations of the predicted pixels are vacated to embed information while degrading the image
quality. Given a desirable quality (quantified in SSIM) for the output image, UES guides the embedding-
scrambling algorithm to handle the exact number of pixels, i.e., the perceptual quality of the embedded-
scrambled image can be controlled. In addition, the prediction errors are stored at a predetermined precision
using the structure side information to perfectly reconstruct or approximate the original image. In particular,
given a desirable SSIM value, the precision of the stored prediction errors can be adjusted to control the
perceptual quality of the reconstructed image. Experimental results confirmed that UES is able to perfectly
reconstruct or approximate the original image at the desired SSIM value after completely degrading its perceptual
quality while embedding at 7.001 bpp on average.
ETPL
DIP - 103
A Unified Data Embedding and Scrambling Method
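The flavor of the prediction step can be shown with a simplified stand-in for the paper's checkerboard-based predictor: keep the 25% of pixels on even rows and columns as references and predict the remaining 75% by bilinear interpolation. The residuals of such a predictor are what an embedding-scrambling scheme stores and vacates for payload.

```python
import numpy as np

def predict_from_quarter(img):
    """Predict all pixels from the 25% reference grid (even rows and
    columns) by bilinear interpolation; a simplified stand-in for the
    checkerboard-based predictor, not the paper's exact method."""
    h, w = img.shape
    ref = img[::2, ::2].astype(float)          # the 25% reference pixels
    nr, nc = ref.shape
    pred = np.empty((h, w))
    for r in range(h):
        for c in range(w):
            r0, c0 = min(r // 2, nr - 1), min(c // 2, nc - 1)
            r1, c1 = min(r0 + 1, nr - 1), min(c0 + 1, nc - 1)
            fr, fc = (r % 2) / 2.0, (c % 2) / 2.0
            top = (1 - fc) * ref[r0, c0] + fc * ref[r0, c1]
            bot = (1 - fc) * ref[r1, c0] + fc * ref[r1, c1]
            pred[r, c] = (1 - fr) * top + fr * bot
    return pred
```

On a linear ramp the bilinear stand-in is exact, so all prediction errors are zero; real images leave small residuals.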
We describe a new 3D saliency prediction model that accounts for diverse low-level luminance,
chrominance, motion, and depth attributes of 3D videos as well as high-level classifications of scenes by type.
The model also accounts for perceptual factors, such as the nonuniform resolution of the human
eye, stereoscopic limits imposed by Panum's fusional area, and the predicted degree of (dis)comfort felt when
viewing the 3D video. The high-level analysis involves classification of each 3D video scene by type with
regard to estimated camera motion and the motions of objects in the videos. Decisions regarding the
relative saliency of objects or regions are supported by data obtained through a series of eye-tracking
experiments. The algorithm developed from the model elements operates by finding and segmenting salient
3D space-time regions in a video, then calculating the saliency strength of each segment using measured
attributes of motion, disparity, texture, and the predicted degree of visual discomfort experienced.
The saliency energy of both segmented objects and frames is weighted using models of human foveation and
Panum's fusional area, yielding a single predictor of 3D saliency.
ETPL
DIP - 104
Saliency Prediction on Stereoscopic Videos
Recovering images from corrupted observations is necessary for many real-world applications. In this
paper, we propose a unified framework to perform progressive image recovery based
on hybrid graph Laplacian regularized regression. We first construct a multiscale representation of the
target image by a Laplacian pyramid, then progressively recover the degraded image in the scale space from
coarse to fine so that the sharp edges and texture can be eventually recovered. On one hand, within each scale,
a graph Laplacian regularization model represented by implicit kernel is learned, which simultaneously
minimizes the least square error on the measured samples and preserves the geometrical structure of
the image data space. In this procedure, the intrinsic manifold structure is explicitly considered using both
measured and unmeasured samples, and the nonlocal self-similarity property is utilized as a fruitful resource
for abstracting a priori knowledge of the images. On the other hand, between two successive scales, the
proposed model is extended to a projected high-dimensional feature space through explicit kernel mapping to
describe the interscale correlation, in which the local structure regularity is learned and propagated from
coarser to finer scales. In this way, the proposed algorithm gradually recovers more and more image details
and edges that could not be recovered at previous scales. We test our algorithm on one typical image
recovery task: impulse noise removal. Experimental results on benchmark test images demonstrate that the
proposed method achieves better performance than state-of-the-art algorithms.
ETPL
DIP - 105
Progressive Image Denoising Through Hybrid Graph Laplacian Regularization:
A Unified Framework
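The within-scale regression step can be sketched in its simplest explicit form: solve (M + lam*L) f = M y, where M masks the measured samples and L is the graph Laplacian of an affinity matrix W, so the recovered signal fits the measurements while staying smooth on the graph. The explicit small graph and combinatorial Laplacian are simplifications; the paper works with implicit kernels and nonlocal affinities.

```python
import numpy as np

def laplacian_regression(W, measured, values, lam=0.1):
    """Graph-Laplacian regularized regression: minimize
    ||M(f - y)||^2 + lam * f^T L f and return the full signal f."""
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    M = np.zeros((n, n))
    M[measured, measured] = 1.0             # mask of the measured samples
    y = np.zeros(n)
    y[measured] = values
    return np.linalg.solve(M + lam * L, M @ y)
```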
Labeled training data are used for challenging medical image segmentation problems to learn
different characteristics of the relevant domain. In this paper, we examine random forest (RF) classifiers, their
learned knowledge during training and ways to exploit it for improved image segmentation. Apart from
learning discriminative features, RFs also quantify their importance in classification. Feature importance is
used to design a feature selection strategy critical for high segmentation and classification accuracy, and also
to design a smoothness cost in a second-order MRF framework for graph cut segmentation. The cost function
combines the contribution of different image features like intensity, texture, and curvature information.
Experimental results on medical images show that this strategy leads to better segmentation accuracy than
conventional graph cut algorithms that use only intensity information in the smoothness cost.
ETPL
DIP - 106
Analyzing Training Information From Random Forests for Improved Image
Segmentation
In this paper, we propose saliency-driven multiscale nonlinear diffusion filtering of images.
The resulting scale space in general preserves or even enhances semantically important structures
such as edges, lines, or flow-like structures in the foreground, and inhibits and smoothes clutter in the
background. The image is classified using multiscale information fusion based on the original image,
the image at the final scale at which the diffusion process converges, and the image at a midscale.
Our algorithm emphasizes the foreground features, which are important for image classification. The
background image regions, whether considered as contexts of the foreground or noise to the
foreground, can be globally handled by fusing information from different scales. Experimental tests
of the effectiveness of the multiscale space for the image classification are conducted on the
following publicly available datasets: 1) the PASCAL 2005 dataset; 2) the Oxford 102 flowers
dataset; and 3) the Oxford 17 flowers dataset, with high classification rates.
ETPL
DIP - 107
Image Classification Using Multiscale Information Fusion Based on Saliency
Driven Nonlinear Diffusion Filtering
The objective approaches of 3D image quality assessment play a key role for the development
of compression standards and various 3D multimedia applications. The quality assessment of 3D images faces
more new challenges, such as asymmetric stereo compression, depth perception, and virtual view synthesis,
than its 2D counterparts. In addition, the widely used 2D image quality metrics (e.g., PSNR and SSIM) cannot
be directly applied to deal with these newly introduced challenges. This statement can be verified by the low
correlation between the computed objective measures and the subjectively measured mean opinion scores
(MOSs), when 3D images are the tested targets. In order to meet these newly introduced challenges, in this
paper, besides traditional 2D image metrics, the binocular integration behaviors (the binocular combination and
the binocular frequency integration) are utilized as the bases for measuring the quality of
stereoscopic 3D images. The effectiveness of the proposed metrics is verified by conducting subjective
evaluations on publicly available stereoscopic image databases. Experimental results show that significant
consistency could be reached between the measured MOS and the proposed metrics, in which the correlation
coefficient between them can go up to 0.88. Furthermore, we found that the proposed metrics can also address
the quality assessment of the synthesized color-plus-depth 3D images well. Therefore, it is our belief that the
binocular integration behaviors are important factors in the development of
objective quality assessment for 3D images.
ETPL
DIP - 108
Quality Assessment of Stereoscopic 3D Image Compression by Binocular
Integration Behaviors
Path openings and closings are morphological tools used to preserve long, thin, and tortuous
structures in gray level images. They explore all paths from a defined class, and filter them with a length
criterion. However, most paths are redundant, making the process generally
slow. Parsimonious path openings and closings are introduced in this paper to solve this problem. These
operators only consider a subset of the paths considered by classical path openings, thus achieving a
substantial speed-up, while obtaining similar results. In addition, a recently introduced 1D opening algorithm
is applied along each selected path. Its complexity is linear with respect to the number of pixels, independent
of the size of the opening. Furthermore, it is fast for any input data accuracy (integer or floating point) and
works in stream. Parsimonious path openings are also extended to incomplete paths, i.e., paths containing
gaps. Noise-corrupted paths can thus be processed with the same approach and complexity. These
parsimonious operators achieve a several orders of magnitude speed-up. Examples are shown for
incomplete path openings, where computing times are brought from minutes to tens of milliseconds, while
obtaining similar results.
ETPL
DIP - 109
Parsimonious Path Openings and Closings
Matching visual appearances of the target over consecutive video frames is a fundamental yet
challenging task in visual tracking. Its performance largely depends on the distance metric that determines the
quality of visual matching. Rather than using fixed and predefined metric, recent attempts of
integrating metric learning-based trackers have shown more robust and promising results, as the
learned metric can be more discriminative. In general, these global metric adjustment methods are
computationally demanding in real-time visual tracking tasks, and they tend to underfit the data when the
target exhibits dynamic appearance variation. This paper presents a nonparametric data-driven local metric
adjustment method. The proposed method finds a spatially adaptive metric that exhibits different properties at
different locations in the feature space, due to the differences of the data distribution in a local neighborhood.
It minimizes the deviation of the empirical misclassification probability to obtain an optimal metric, such that the asymptotic error attained with an infinite set of training samples can be approximated. Moreover, by taking
the data local distribution into consideration, it is spatially adaptive. Integrating this new local metric learning
method into target tracking leads to efficient and robust tracking performance. Extensive experiments have
demonstrated the superiority and effectiveness of the proposed tracking method in various tracking scenarios.
ETPL
DIP - 110
Data-Driven Spatially-Adaptive Metric Adjustment for Visual Tracking
This paper considers the recognition of realistic human actions in videos based on spatio-
temporal interest points (STIPs). Existing STIP-based action recognition approaches operate on intensity
representations of the image data. Because of this, these approaches are sensitive to disturbing photometric
phenomena, such as shadows and highlights. In addition, valuable information is neglected by discarding
chromaticity from the photometric representation. These issues are addressed by color STIPs. Color STIPs are
multichannel reformulations of STIP detectors and descriptors, for which we consider a number of chromatic
and invariant representations derived from the opponent color space. Color STIPs are shown to outperform
their intensity-based counterparts on the challenging UCF sports, UCF11 and UCF50 action recognition
benchmarks by more than 5% on average, where most of the gain is due to the multichannel descriptors. In
addition, the results show that color STIPs are currently the single best low-level feature choice for STIP-
based approaches to human action recognition.
ETPL
DIP - 111
Evaluation of Color Spatio-Temporal Interest Points for Human Action
Recognition
An analysis of the relationship between multipath ghosts and the
direct target image for radar imaging is presented. A multipath point spread function (PSF) is defined that
allows for specular reflections in the local environment and can allow the ghost images to be
localized. Analysis of the multipath PSF shows that certain ghosts can only be focused for the far field
synthetic aperture radar case and not the full array case. Importantly, the ghosts are shown to be equivalent to
direct target images taken from different observation angles. This equivalence suggests that exploiting
the ghosts would improve target classification performance, and this improvement is demonstrated using experimental data and a naïve Bayesian classifier. The maximum performance gain achieved is 32%.
ETPL
DIP - 112
Analysis and Exploitation of Multipath Ghosts in Radar Target Image
Classification
This paper presents a novel manifold learning algorithm for high-dimensional data sets. The scope of the
application focuses on the problem of motion tracking in video sequences. The framework presented is
twofold. First, it is assumed that the samples are time ordered, providing valuable information that is not
presented in the current methodologies. Second, the manifold topology comprises multiple charts, which
contrasts to the most current methods that assume one single chart, being overly restrictive. The proposed
algorithm, Gaussian process multiple local models (GP-MLM), can deal with arbitrary manifold topology by
decomposing the manifold into multiple local models that are probabilistically combined using Gaussian process
regression. In addition, the paper presents a multiple filter architecture where standard filtering techniques are
integrated within the GP-MLM. The proposed approach exhibits comparable performance of state-of-the-art
trackers, namely multiple model data association and deep belief networks, and compares favorably with
Gaussian process latent variable models. Extensive experiments are presented using real video data, including
a publicly available database of lip sequences and left ventricle ultrasound images, in which the GP-MLM
achieves state of the art results.
ETPL
DIP - 113
Manifold Learning for Object Tracking With Multiple Nonlinear Models
With the explosive growth of the multimedia data on the Web, content-based image search has
attracted considerable attention in the multimedia and computer vision communities. The most popular
approach is based on the bag-of-visual-words model with invariant local features. Since the spatial context
information among local features is critical for visual content identification, many methods exploit the
geometric clues of local features, including the location, the scale, and the orientation, for explicit geometric post-verification. However, usually only a few initially top-ranked results are geometrically verified,
considering the high computational cost in full geometric verification. In this paper, we propose to represent
the spatial context of local features into binary codes, and implicitly achieve geometric verification by efficient
comparison of the binary codes. Besides, we explore the multimode property of local features to further boost
the retrieval performance. Experiments on holidays, Paris, and Oxford building benchmark data sets
demonstrate the effectiveness of the proposed algorithm.
ETPL
DIP - 114
Contextual Hashing for Large-Scale Image Search
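The speed-up described above comes from reducing geometric verification to bitwise operations. The sketch below is illustrative only: the function names and the sector-based code construction are assumptions, not the paper's actual definition. It shows how a spatial-context binary code can be compared by XOR and popcount:

```python
import math

def context_code(neighbors, n_bits=16):
    """Toy spatial-context hash: set bit i when the i-th angular sector
    around a local feature contains at least one neighboring feature.
    An illustrative stand-in for the paper's binary-code construction."""
    code = 0
    for dx, dy in neighbors:
        angle = math.atan2(dy, dx) % (2 * math.pi)
        sector = min(int(angle / (2 * math.pi) * n_bits), n_bits - 1)
        code |= 1 << sector
    return code

def hamming(a, b):
    # implicit geometric verification reduces to XOR + popcount
    return bin(a ^ b).count("1")
```

Comparing two codes costs one XOR and one popcount per machine word, which is what makes implicit verification feasible at large scale.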
Video retargeting is a useful technique to adapt a video to a desired display resolution. It aims to
preserve the information contained in the original video and the shapes of salient objects while maintaining the
temporal coherence of contents in the video. Existing video retargeting schemes achieve temporal coherence
via constraining each region/pixel to be deformed consistently with its corresponding region/pixel in
neighboring frames. However, these methods often distort the shapes of salient objects, since they do not
ensure the content consistency for regions/pixels constrained to be coherently deformed along time axis. In
this paper, we propose a video retargeting scheme to simultaneously meet the two requirements. Our method
first segments a video clip into spatiotemporal grids called grid flows, where the consistency of the content
associated with a grid flow is maintained while retargeting the grid flow. After that, due to the coarse granularity of the grid, content inconsistency may still exist in some grid flows. We exploit the temporal
redundancy in a grid flow to prevent grids with inconsistent content from being incorrectly constrained to deform coherently. In particular, we use grid flows to select a set of key-frames that summarize
a video clip, and resize subgrid-flows in these key-frames. We then resize the remaining non-key-frames by
simply interpolating their grid contents from the two nearest retargeted key-frames. With the key-frame-based
scheme, we only need to solve a small-scale quadratic programming problem to resize subgrid-flows and
perform grid interpolation, leading to low computation and memory costs. The experimental results
demonstrate the superior performance of our scheme.
ETPL
DIP - 115
Spatiotemporal Grid Flow for Video Retargeting
This paper proposes three clustering-based discriminant analysis (CDA) models to address the
problem that the Fisher linear discriminant may not be able to extract adequate features for satisfactory
performance, especially for two class problems. The first CDA model, CDA-1, divides each class into a
number of clusters by means of the k-means clustering technique. In this way, a new within-cluster scatter
matrix Swc and a new between-cluster scatter matrix Sbc are defined. The second and the third CDA models,
CDA-2 and CDA-3, define a nonparametric form of the between-cluster scatter matrices N-Sbc. The
nonparametric nature of the between-cluster scatter matrices inherently leads to the derived features that
preserve the structure important for classification. The difference between CDA-2 and CDA-3 is that the
former computes the between-cluster matrix N-Sbc on a local basis, whereas the latter computes the between-
cluster matrix N-Sbc on a global basis. This paper then presents an accurate CDA-based eye detection method.
Experiments on three widely used face databases show the feasibility of the proposed three CDA models and
the improved eye detection performance over some state-of-the-art methods.
ETPL
DIP - 116
Clustering-Based Discriminant Analysis for Eye Detection
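The cluster-wise scatter matrices at the heart of CDA-1 can be sketched as follows. This is a minimal illustration assuming cluster assignments are already available (e.g., from k-means run within each class); the function name is hypothetical:

```python
import numpy as np

def cluster_scatter_matrices(X, cluster_ids):
    """Within-cluster (Swc) and between-cluster (Sbc) scatter matrices,
    as in CDA-1, given cluster assignments for the rows of X."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(cluster_ids):
        Xc = X[cluster_ids == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)      # scatter inside cluster c
        diff = (mc - mu)[:, None]
        Sb += len(Xc) * (diff @ diff.T)    # cluster mean vs. global mean
    return Sw, Sb
```

The two matrices satisfy Sw + Sb = St, the total scatter, mirroring the classical Fisher decomposition at the cluster level.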
While numerous algorithms have been proposed for object tracking with demonstrated success, it
remains a challenging problem for a tracker to handle large appearance change due to factors such as scale,
motion, shape deformation, and occlusion. One of the main reasons is the lack of effective image
representation schemes to account for appearance variation. Most of the trackers use high-level appearance
structure or low-level cues for representing and matching target objects. In this paper, we propose
a tracking method from the perspective of midlevel vision with structural information captured in superpixels.
We present a discriminative appearance model based on superpixels, thereby facilitating a tracker to
distinguish the target and the background with midlevel cues. The tracking task is then formulated by
computing a target-background confidence map, and obtaining the best candidate by maximum a posteriori
estimate. Experimental results demonstrate that our tracker is able to handle heavy occlusion and recover from
drifts. In conjunction with online update, the proposed algorithm is shown to perform favorably against
existing methods for object tracking. Furthermore, the proposed algorithm facilitates foreground and
background segmentation during tracking.
ETPL
DIP - 117
Robust Superpixel Tracking
An effective, low-complexity method for lossy compression of scenic bilevel images, called lossy cutset coding, is proposed based on a Markov random field model. It operates by losslessly
encoding pixels in a square grid of lines, which is a cutset with respect to a Markov random field model, and
preserves key structural information, such as borders between black and white regions. Relying on
the Markov random field model, the decoder takes a MAP approach to reconstructing the interior of each grid
block from the pixels on its boundary, thereby creating a piecewise smooth image that is consistent with the
encoded grid pixels. The MAP rule, which reduces to finding the block interiors with fewest black-white
transitions, is directly implementable for the most commonly occurring block boundaries, thereby avoiding the
need for brute force or iterative solutions. Experimental results demonstrate that the new method is
computationally simple, outperforms the current lossy compression technique most suited to scenic
bilevel images, and provides substantially lower rates than lossless techniques, e.g., JBIG, with little loss in
perceived image quality.
ETPL
DIP - 118
Lossy Cutset Coding of Bilevel Images Based on Markov Random Fields
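The MAP rule of minimizing black-white transitions can be illustrated in one dimension. This is a toy sketch under the stated assumption that only the two boundary pixels of a run are known; `fill_row` is a hypothetical helper, not the paper's 2D decoder:

```python
def fill_row(left, right, n):
    """1-D analogue of the cutset MAP rule: fill n interior pixels between
    boundary pixels `left` and `right` with the fewest black-white
    transitions. Equal endpoints -> constant fill (0 transitions);
    different endpoints -> a single transition, placed at the midpoint
    here (any position is equally good under the transition count alone)."""
    if left == right:
        return [left] * n
    half = n // 2
    return [left] * half + [right] * (n - half)
```

In 2D the same counting argument applies to block interiors bounded by the encoded grid pixels, which is what makes the decoder directly implementable for common boundary configurations.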
Text in an image provides vital information for interpreting its contents, and text in a scene can aid a
variety of tasks from navigation to obstacle avoidance and odometry. Despite its value, however, detecting
general text in images remains a challenging research problem. Motivated by the need to consider the widely
varying forms of natural text, we propose a bottom-up approach to the problem, which reflects the
characterness of an image region. In this sense, our approach mirrors the move from saliency detection
methods to measures of objectness. In order to measure the characterness, we develop three novel cues that are
tailored for character detection and a Bayesian method for their integration. Because text is made up of sets of
characters, we then design a Markov random field model so as to exploit the inherent dependencies between
characters. We experimentally demonstrate the effectiveness of our characterness cues as well as the
advantage of Bayesian multicue integration. The proposed text detector outperforms state-of-the-art methods
on a few benchmark scene text detection data sets. We also show that our measurement of characterness is
superior to state-of-the-art saliency detection models when applied to the same task.
ETPL
DIP - 119
Characterness: An Indicator of Text in the Wild
The development of energy selective, photon counting X-ray detectors allows for a wide range of new
possibilities in the area of computed tomographic image formation. Under the assumption of perfect energy
resolution, here we propose a tensor-based iterative algorithm that simultaneously reconstructs the X-ray
attenuation distribution for each energy. We use a multilinear image model rather than a more standard
stacked vector representation in order to develop novel tensor-based regularizers. In particular, we model the
multispectral unknown as a three-way tensor where the first two dimensions are space and the third dimension
is energy. This approach allows for the design of tensor nuclear norm regularizers, which, like their 2D counterpart, are convex functions of the multispectral unknown. The solution to the resulting convex
optimization problem is obtained using an alternating direction method of multipliers approach. Simulation
results show that the generalized tensor nuclear norm can be used as a standalone regularization technique for
the energy selective (spectral) computed tomography problem, and that, when combined with total variation regularization, it enhances the regularization capabilities, especially in low-energy images where the effects of noise are most prominent.
ETPL
DIP - 120
Tensor-Based Formulation and Nuclear Norm Regularization for Multienergy
Computed Tomography
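One common way to generalize the nuclear norm to a three-way tensor, and a plausible reading of the regularizer above, is a weighted sum of the nuclear norms of the mode unfoldings; the exact definition used in the paper may differ. A sketch:

```python
import numpy as np

def tensor_nuclear_norm(T, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the matrix nuclear norms of the three mode
    unfoldings of a 3-way tensor (space x space x energy). Like the 2D
    nuclear norm, this is a convex function of the tensor entries."""
    norms = []
    for mode in range(3):
        # unfold: bring `mode` to the front, flatten the rest into columns
        unfolded = np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)
        norms.append(np.linalg.norm(unfolded, ord="nuc"))
    return sum(w * n for w, n in zip(weights, norms))
```

Convexity of each unfolding's nuclear norm carries over to the weighted sum, which is what allows solvers such as the alternating direction method of multipliers to be applied.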
Visual tracking is an important but challenging problem in the computer vision field. In the
real world, the appearances of the target and its surroundings change continuously over space and
time, which provides effective information for tracking the target robustly. However, previous works have paid insufficient attention to this spatio-temporal appearance information. In this paper, a
robust spatio-temporal context model based tracker is presented to complete the tracking task in
unconstrained environments. The tracker is constructed with temporal and spatial appearance context
models. The temporal appearance context model captures the historical appearance of the target to
prevent the tracker from drifting to the background in a long-term tracking. The spatial appearance
context model integrates contributors to build a supporting field. The contributors are the patches
with the same size of the target at the key-points automatically discovered around the target. The
constructed supporting field provides much more information than the appearance of the target itself,
and thus, ensures the robustness of the tracker in complex environments. Extensive experiments on
various challenging databases validate the superiority of our tracker over other state-of-the-art
trackers.
ETPL
DIP - 121
Robust Online Learned Spatio-Temporal Context Model for Visual Tracking
In this paper, we present a compressed-domain video retargeting solution that operates without
compromising the resizing quality. Existing video retargeting methods operate in the spatial (or pixel) domain.
Such a solution is not practical if it is implemented in mobile devices due to its large memory requirement. In
the proposed solution, each component of the retargeting system is designed to exploit the low-level
compressed domain features extracted from the coded bit stream. For example, motion information is obtained
directly from motion vectors. An efficient column shape mesh deformation is employed to solve the difficulty
of sophisticated quad-shape mesh deformation in the compressed domain. The proposed solution achieves
comparable (or slightly better) visual quality performance as compared with several state-of-the-art pixel-
domain retargeting methods at lower computational and memory costs, making content-aware video resizing
both scalable and practical in real-world applications.
ETPL
DIP - 122
Compressed-Domain Video Retargeting
Modeling the temporal structure of sub-activities is an important yet challenging problem in complex
activity classification. This paper proposes a latent hierarchical model (LHM) to describe the decomposition of
complex activity into sub-activities in a hierarchical way. The LHM has a tree-structure, where each node
corresponds to a video segment (sub-activity) at certain temporal scale. The starting and ending time points of
each sub-activity are represented by two latent variables, which are automatically determined during the
inference process. We formulate the training problem of the LHM in a latent kernelized SVM framework and
develop an efficient cascade inference method to speed up classification. The advantages of our methods come
from: 1) LHM models the complex activity with a deep structure, which is decomposed into sub-activities in a
coarse-to-fine manner and 2) the starting and ending time points of each segment are adaptively determined to
deal with the temporal displacement and duration variation of sub-activity. We conduct experiments on three
datasets: 1) the KTH; 2) the Hollywood2; and 3) the Olympic Sports. The experimental results show the
effectiveness of the LHM in complex activity classification. With dense features, our LHM achieves the state-
of-the-art performance on the Hollywood2 dataset and the Olympic Sports dataset.
ETPL
DIP - 123
Latent Hierarchical Model of Temporal Structure for Complex Activity
Classification
mCENTRIST, a new multichannel feature generation mechanism for recognizing scene categories, is
proposed in this paper. mCENTRIST explicitly captures the image properties that are encoded jointly by two
image channels, which is different from popular multichannel descriptors. In order to avoid the curse of
dimensionality, tradeoffs at both the feature and channel levels have been made to keep mCENTRIST
computationally practical. As a result, mCENTRIST is both efficient and easy to implement. In addition, a
hyper opponent color space is proposed by embedding Sobel information into the opponent color space for
further performance improvements. Experiments show that mCENTRIST outperforms established
multichannel descriptors on four RGB and RGB-near infrared data sets, including aerial orthoimagery, indoor,
and outdoor scene category recognition tasks. Experiments also verify that the hyper opponent color space
enhances descriptors' performance effectively.
ETPL
DIP - 124
mCENTRIST: A Multi-Channel Feature Generation Mechanism for Scene
Categorization
Dictionary learning has been widely used in many image processing tasks. In most of these methods,
the number of basis vectors is either set by experience or coarsely evaluated empirically. In this paper, we
propose a new scale adaptive dictionary learning framework, which jointly estimates suitable scales and
corresponding atoms in an adaptive fashion according to the training data, without the need of prior
information. We design an atom counting function and develop a reliable numerical scheme to solve the
challenging optimization problem. Extensive experiments on texture and video data sets demonstrate
quantitatively and visually that our method can estimate the scale, without damaging the sparse reconstruction
ability.
ETPL
DIP - 125
Scale Adaptive Dictionary Learning
The Richardson-Lucy algorithm is one of the most important in image deconvolution. However, a
drawback is its slow convergence. A significant acceleration was obtained using the technique proposed by
Biggs and Andrews (BA), which is implemented in the deconvlucy function of the image processing
MATLAB toolbox. The BA method was developed heuristically with no proof of convergence. In this paper,
we introduce the heavy-ball (H-B) method for Poisson data optimization and extend it to a scaled H-B method,
which includes the BA method as a special case. The method has a proven convergence rate of O(1/k^2),
where k is the number of iterations. We demonstrate the superior convergence performance, by a speedup
factor of five, of the scaled H-B method on both synthetic and real 3D images.
ETPL
DIP - 126
Scaled Heavy-Ball Acceleration of the Richardson-Lucy Algorithm for 3D
Microscopy Image Restoration
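A heavy-ball (momentum) step grafted onto the multiplicative Richardson-Lucy update can be sketched as below. This is a simplified illustration, not the paper's scaled H-B scheme: the operator arguments `psf_conv`/`psf_conv_adj` and the fixed momentum weight `beta` are assumptions for the sketch.

```python
import numpy as np

def richardson_lucy_hb(y, psf_conv, psf_conv_adj, n_iter=60, beta=0.3):
    """Richardson-Lucy deconvolution with a heavy-ball momentum term.

    y            : observed (blurred, Poisson-noisy) image
    psf_conv     : function computing the forward blur H x
    psf_conv_adj : function computing the adjoint blur H^T x
    beta         : momentum weight (beta = 0 recovers plain RL)
    """
    x = np.full_like(y, y.mean())          # flat nonnegative start
    x_prev = x.copy()
    for _ in range(n_iter):
        ratio = y / np.maximum(psf_conv(x), 1e-12)
        rl_step = x * psf_conv_adj(ratio)          # multiplicative RL update
        x_new = rl_step + beta * (x - x_prev)      # heavy-ball extrapolation
        x_new = np.maximum(x_new, 1e-12)           # keep iterates positive
        x_prev, x = x, x_new
    return x
```

Setting `beta = 0` gives the classical algorithm; the extrapolation term reuses the previous iterate to accelerate convergence, which is the idea the BA technique implements heuristically.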
Active contours are a popular approach for object segmentation that uses an energy minimizing spline
to extract an object's boundary. Nonparametric approaches can be computationally complex, whereas
parametric approaches can be impacted by parameter sensitivity. A decoupled active contour (DAC)
overcomes these problems by decoupling the external and internal energies and optimizing them separately.
However, a drawback of this approach is its reliance on the edge gradient as the external energy. This can lead
to poor convergence toward the object boundary in the presence of weak object and strong background edges.
To overcome these issues with convergence, a novel approach is proposed that takes advantage of a sparse
texture model, which explicitly considers texture for boundary detection. The approach then defines the
external energy as a weighted combination of textural and structural variation maps and feeds it into a
multifunctional hidden Markov model for more robust object boundary detection. The enhanced DAC
(EDAC) is qualitatively and visually analyzed on two natural image data sets as well as Brodatz images. The
results demonstrate that EDAC effectively combines texture and structural information to extract the object
boundary without impact on computation time and a reliance on color.
ETPL
DIP - 127
Enhanced Decoupled Active Contour Using Structural and Textural Variation
Energy Functionals
We provide conditions under which 2D digital images preserve their topological properties under rigid
transformations. We consider the two most common digital topology models, namely dual adjacency and well-
composedness. This paper leads to the proposal of optimal preprocessing strategies that ensure the topological
invariance of images under arbitrary rigid transformations. These results and methods are proved to be valid
for various kinds of images (binary, gray-level, label), thus providing generic and efficient tools, which can be
used in particular in the context of image registration and warping.
ETPL
DIP - 128
Topology-Preserving Rigid Transformation of 2D Digital Images
We propose a texture learning approach that exploits local organizations of scales and directions. First,
linear combinations of Riesz wavelets are learned using kernel support vector machines. The resulting texture
signatures model optimal class-wise discriminatory properties. The visualization of the obtained
signatures allows verifying the visual relevance of the learned concepts. Second, the local orientations of the
signatures are optimized to maximize their responses, which is carried out analytically and can still be
expressed as a linear combination of the initial steerable Riesz templates. The global process is iteratively
repeated to obtain final rotation-covariant texture signatures. Rapid convergence of class-wise signatures is
observed, which demonstrates that the instances are projected into a feature space that leverages the local
organizations of scales and directions. Experimental evaluation reveals average classification accuracies in the
range of 97% to 98% for the Outex_TC_00010, the Outex_TC_00012, and the Contrib_TC_00000 suites for
even orders of the Riesz transform, and suggests high robustness to changes in image orientation and
illumination. The proposed framework requires no arbitrary choices of scales and directions and is expected to
perform well in a large range of computer vision applications.
ETPL
DIP - 129
Rotation–Covariant Texture Learning Using Steerable Riesz Wavelets
X-ray computed tomography (CT) is a powerful tool for noninvasive imaging of time-varying objects.
In the past, methods have been proposed to reconstruct images from continuously changing objects. For
discretely or structurally changing objects, however, such methods fail to reconstruct high quality images,
mainly because assumptions about continuity are no longer valid. In this paper, we propose a method to
reconstruct structurally changing objects. Starting from the observation that there exist regions within the
scanned object that remain unchanged over time, we introduce an iterative optimization routine that can
automatically determine these regions and incorporate this knowledge in an algebraic reconstruction method.
The proposed algorithm was validated on simulation data and experimental μCT data, illustrating its capability to reconstruct structurally changing objects more accurately in comparison to current techniques.
ETPL
DIP - 130
Region-Based Iterative Reconstruction of Structurally Changing Objects in CT
As a general framework, Laplacian embedding, based on a pairwise similarity matrix, infers low
dimensional representations from high dimensional data. However, it generally suffers from three issues: 1)
algorithmic performance is sensitive to the neighborhood size; 2) the algorithm encounters the well-known
small sample size (SSS) problem; and 3) the algorithm de-emphasizes small distance pairs. To address these
issues, here we propose exponential embedding using matrix exponential and provide a general framework for
dimensionality reduction. In the framework, the matrix exponential can be roughly interpreted by the random
walk over the feature similarity matrix, and thus is more robust. The positive definite property of matrix
exponential deals with the SSS problem. The behavior of the decay function of exponential embedding is more
significant in emphasizing small distance pairs. Under this framework, we apply matrix exponential to extend
many popular Laplacian embedding algorithms, e.g., locality preserving projections, unsupervised
discriminant projections, and marginal Fisher analysis. Experiments conducted on synthesized data, UCI,
and the Georgia Tech face database show that the proposed new framework can well address the issues
mentioned above.
ETPL
DIP - 131
A General Exponential Framework for Dimensionality Reduction
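The positive-definiteness argument can be made concrete with a small sketch: a Gaussian similarity matrix is symmetric, so its matrix exponential shares its eigenvectors while mapping every eigenvalue λ to exp(λ) > 0. The function below is an illustrative reduction, not one of the extended algorithms in the paper; its name and the Gaussian-similarity choice are assumptions.

```python
import numpy as np

def exponential_embedding(X, sigma=1.0, n_components=2):
    """Embed the rows of X using the matrix exponential of a Gaussian
    similarity matrix. exp(S) is positive definite for any symmetric S,
    which is how exponential embedding sidesteps the small-sample-size
    problem of Laplacian-style methods."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-sq / (2 * sigma ** 2))      # pairwise Gaussian similarities
    lam, U = np.linalg.eigh(S)              # S = U diag(lam) U^T
    top = np.argsort(lam)[::-1][:n_components]
    # exp(S) shares the eigenvectors U; its eigenvalues are exp(lam) > 0
    return U[:, top] * np.sqrt(np.exp(lam[top]))
```

Because exp(λ) grows fastest for the largest eigenvalues, the leading components dominate the embedding, which is the decay-function behavior the framework exploits to emphasize small-distance pairs.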
I formulate an optimization framework for computing the transmission actions of streaming multi-
view video content over bandwidth constrained channels. The optimization finds the schedule for sending the
packetized data that maximizes the reconstruction quality of the content, for the given network bandwidth.
Two prospective multi-view content representation formats are considered: 1) MVC and 2) video plus depth.
In the case of each, I formulate directed graph models that characterize the interdependencies between the data
units that comprise the content. For the video plus depth format, I develop a novel space-time error
concealment strategy that reconstructs the missing content based on received data units from multiple views. I
design multiple techniques to solve the optimization problem of interest, at varying degrees of complexity and
accuracy. In conjunction, I derive spatiotemporal models of the reconstruction error for the multi-view content
that I employ to reduce the computational requirements of the optimization. I study the performance of my
framework via simulation experiments. Significant gains in terms of rate-distortion efficiency are
demonstrated over various reference methods.
ETPL
DIP - 132
Transmission Policy Selection for Multi-View Content Delivery Over
Bandwidth Constrained Channels
This paper proposes a new approach to label-equivalence-based two-scan connected-component
labeling. We use two strategies to reduce repeated checking-pixel work for labeling. The first is that instead of
scanning image lines one by one and processing pixels one by one as in most conventional two-scan labeling algorithms, we scan alternate image lines and process pixels two by two. The second is that, by
considering the transition of the configuration of pixels in the mask, we utilize the information detected in
processing the last two pixels as much as possible for processing the current two pixels. With our method, any
pixel checked in the mask when processing the current two pixels will not be checked again when the next two
pixels are processed; thus, the efficiency of labeling can be improved. Experimental results demonstrated that
our method was more efficient than all conventional labeling algorithms.
ETPL
DIP - 133
Configuration-Transition-Based Connected-Component Labeling
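For context, the label-equivalence-based two-scan scheme that this work accelerates can be sketched as a baseline (4-connectivity with union-find equivalence resolution; a simplified illustration, not the paper's two-line/two-pixel method, and the function name is my own):

```python
import numpy as np

def two_scan_label(binary):
    """Baseline two-scan connected-component labeling (4-connectivity)."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    parent = [0]                          # union-find forest over provisional labels

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # First scan: assign provisional labels and record equivalences.
    for y in range(h):
        for x in range(w):
            if not binary[y, x]:
                continue
            up = labels[y - 1, x] if y > 0 else 0
            left = labels[y, x - 1] if x > 0 else 0
            if up == 0 and left == 0:
                parent.append(len(parent))       # new provisional label
                labels[y, x] = len(parent) - 1
            elif up and left:
                labels[y, x] = min(up, left)
                union(up, left)                  # neighbors belong together
            else:
                labels[y, x] = up or left
    # Second scan: replace each provisional label by its representative.
    for y in range(h):
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```

The paper's contribution is to cut the repeated neighborhood checks in the first scan by scanning two lines at a time and reusing the mask configuration detected for the previous pixel pair.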
A novel application of the Hough transform (HT) neighborhood approach to collinear segment
detection was proposed in [1]. It, however, suffered from one major weakness in that it could not provide an
effective solution to the case of segment intersection. This paper analyzes a vital prerequisite step, disturbance
elimination in the Hough space, and shows why this method alone is incapable of distinguishing the true
segment endpoints. To address the problem, a unique HT butterfly separation method is proposed in this
correspondence, as an essential complement to the above publication.
ETPL
DIP - 134
Comment on “Collinear Segment Detection Using HT Neighborhoods”
In this paper, we propose a novel joint data-hiding and compression scheme for digital images using
side match vector quantization (SMVQ) and image inpainting. The two functions of data hiding and image
compression can be integrated into a single module seamlessly. On the sender side, except for the blocks in
the leftmost column and topmost row of the image, each of the other residual blocks, in raster-scanning order, can be
embedded with secret data and compressed simultaneously by SMVQ or image inpainting adaptively
according to the current embedding bit. Vector quantization is also utilized for some complex blocks to control
the visual distortion and error diffusion caused by the progressive compression. After segmenting the image
compressed codes into a series of sections by the indicator bits, the receiver can achieve the extraction of
secret bits and image decompression successfully according to the index values in the segmented sections.
Experimental results demonstrate the effectiveness of the proposed scheme.
ETPL
DIP - 135
A Novel Joint Data-Hiding and Compression Scheme Based on SMVQ and
Image Inpainting
Building an accurate training database is challenging in supervised classification. For instance, in
medical imaging, radiologists often delineate malignant and benign tissues without access to the histological
ground truth, leading to uncertain data sets. This paper addresses the pattern classification problem arising
when available target data include some uncertainty information. Target data considered here are either
qualitative (a class label) or quantitative (an estimate of the posterior probability). In this context, usual
discriminative methods, such as the support vector machine (SVM), fail either to learn a robust classifier or to
predict accurate probability estimates. We generalize the regular SVM by introducing a new formulation of the
learning problem to take into account class labels as well as class probability estimates. This original
reformulation into a probabilistic SVM (P-SVM) can be efficiently solved by adapting existing flexible SVM
solvers. Furthermore, this framework allows deriving a unique learned prediction function for both decision
and posterior probability estimation providing qualitative and quantitative predictions. The method is first
tested on synthetic data sets to evaluate its properties as compared with the classical SVM and fuzzy-SVM. It
is then evaluated on a clinical data set of multiparametric prostate magnetic resonance images to assess its
performances in discriminating benign from malignant tissues. P-SVM is shown to outperform classical SVM
as well as the fuzzy-SVM in terms of probability predictions and classification performance, and
demonstrates its potential for the design of an efficient computer-aided decision system for prostate cancer
diagnosis based on multiparametric magnetic resonance (MR) imaging.
ETPL
DIP - 136
Kernel-Based Learning From Both Qualitative and Quantitative labels:
Application to Prostate Cancer Diagnosis Based on Multiparametric MR
Imaging
Otsu's algorithm for thresholding images is widely used, and the computational complexity of
determining the threshold from the histogram is O(N) where N is the number of histogram bins. When the
algorithm is adapted to circular rather than linear histograms then two thresholds are required for binary
thresholding. We show that, surprisingly, it is still possible to determine the optimal threshold in O(N) time.
The efficient optimal algorithm is over 300 times faster than traditional approaches for typical histograms and
is thus particularly suitable for real-time applications. We further demonstrate the usefulness of circular
thresholding using the adapted Otsu criterion for various applications, including analysis of optical flow data,
indoor/outdoor image classification, and non-photorealistic rendering. In particular, by combining the circular
Otsu feature with other colour/texture features, a 96.9% correct rate is obtained for indoor/outdoor
classification on the well-known IITM-SCID2 data set, outperforming the state-of-the-art result by 4.3%.
ETPL
DIP - 137
Efficient Circular Thresholding
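For reference, the standard linear-histogram Otsu threshold, the O(N) baseline that the paper extends to circular histograms with two thresholds, can be sketched as follows (function name is my own):

```python
import numpy as np

def otsu_threshold(hist):
    """Threshold maximizing between-class variance, O(N) in the bin count."""
    hist = np.asarray(hist, dtype=np.float64)
    total = hist.sum()
    bins = np.arange(hist.size)
    w0 = np.cumsum(hist) / total            # class-0 probability up to each bin
    mu = np.cumsum(hist * bins) / total     # cumulative normalized first moment
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)             # both classes must be non-empty
    sigma_b = np.zeros_like(w0)
    # Between-class variance: (mu_T * w0 - mu)^2 / (w0 * w1)
    sigma_b[valid] = (mu[-1] * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(sigma_b))          # bin index of the best cut
```

The circular case replaces this single cut with a pair of cuts on a circular histogram; the paper's result is that the optimum can still be found in linear time.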
Image label prediction is a critical issue in computer vision and machine learning. In this paper, we
propose and develop sparse label-indicator optimization methods for image classification problems. Sparsity is
introduced in the label-indicator such that relevant and irrelevant images with respect to a given class can be
distinguished. Also, when we deal with multi-class image classification problems, the number of possible
classes of a given image can be constrained to be small, an assumption that is valid for natural images. The
resulting sparsity model can be formulated as a convex optimization problem, and it can be solved very
efficiently. Experimental results are reported to illustrate the effectiveness of the proposed model, and
demonstrate that the classification performance of the proposed method is better than that of the other methods
tested in this paper.
ETPL
DIP - 138
Sparse Label-Indicator Optimization Methods for Image Classification
In this paper, we propose a novel coding and transmission scheme, called LineCast, for broadcasting
satellite images to a large number of receivers. The proposed LineCast matches well with the line-scanning
cameras widely adopted in orbiting satellites to capture high-resolution images. On the sender
side, each captured line is immediately compressed by a transform-domain scalar modulo quantization.
Without syndrome coding, the transmission power is directly allocated to quantized coefficients by scaling the
coefficients according to their distributions. Finally, the scaled coefficients are transmitted over a dense
constellation. This line-based distributed scheme features low delay, low memory cost, and low complexity.
On the receiver side, our proposed line-based prediction is used to generate side information from previously
decoded lines, which fully utilizes the correlation among lines. The quantized coefficients are decoded by the
linear least square estimator from the received data. The image line is then reconstructed by the scalar modulo
dequantization using the generated side information. Since there is neither syndrome coding nor channel
coding, the proposed LineCast enables a large number of receivers to achieve qualities matching their channel
conditions. Our theoretical analysis shows that the proposed LineCast can achieve Shannon's optimum
performance by using a high-dimensional modulo-lattice quantization. Experiments on satellite images
demonstrate that it achieves up to 1.9-dB gain over the state-of-the-art 2D broadcasting scheme and a gain of
more than 5 dB over JPEG 2000 with forward error correction.
ETPL
DIP - 139
LineCast: Line-Based Distributed Coding and Transmission for Broadcasting
Satellite Images
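The modulo-quantization idea at LineCast's core can be illustrated with a toy scalar sketch (illustrative only; the actual scheme operates on transform coefficients with power-scaled transmission over a dense constellation, and the function names are my own):

```python
def modulo_quantize(x, delta):
    """Centered residue of x modulo delta, in [-delta/2, delta/2)."""
    return (x + delta / 2.0) % delta - delta / 2.0

def modulo_dequantize(residue, side_info, delta):
    """Recover the value congruent to `residue` (mod delta) that is nearest
    to the side information; exact whenever |x - side_info| < delta / 2."""
    return side_info + modulo_quantize(residue - side_info, delta)
```

Transmitting only the bounded residue is what keeps the per-line cost low; accurate side information generated from previously decoded lines resolves the modulo ambiguity at the receiver.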
The goal of multilabel classification is to reveal the underlying label correlations to boost the accuracy
of classification tasks. Most of the existing multilabel classifiers attempt to exhaustively explore dependency
between correlated labels. This increases the risk of involving unnecessary label dependencies, which are
detrimental to classification performance. In fact, not all label correlations are indispensable to the
multilabel model. Negligible or fragile label correlations cannot be generalized well to the testing data,
especially if there is a discrepancy in label correlations between the training and testing sets. To minimize such
negative effect in the multilabel model, we propose to learn a sparse structure of label dependency. The
underlying philosophy is that as long as the multilabel dependency cannot be well explained, the principle of
parsimony should be applied to the modeling process of the label correlations. The obtained sparse label
dependency structure discards the outlying correlations between labels, which makes the learned model more
generalizable to future samples. Experiments on real-world data sets show competitive results compared
with existing algorithms.
ETPL
DIP - 140
Multi-Label Image Categorization With Sparse Factor Representation
We present a new method in image segmentation that is based on Otsu's method but iteratively
searches for subregions of the image for segmentation, instead of treating the full image as a whole region for
processing. The iterative method starts with Otsu's threshold and computes the mean values of the two classes
as separated by the threshold. Based on Otsu's threshold and the two mean values, the method separates the
image into three classes instead of two as the standard Otsu's method does. The first two classes are
determined as the foreground and background and they will not be processed further. The third class is
denoted as a to-be-determined (TBD) region that is processed at next iteration. At the succeeding iteration,
Otsu's method is applied on the TBD region to calculate a new threshold and two class means and the TBD
region is again separated into three classes, namely, foreground, background, and a new TBD region, which by
definition is smaller than the previous TBD region. Then, the new TBD region is processed in a similar
manner. The process stops when the difference between Otsu's thresholds calculated in two successive iterations is less than a preset
threshold. Then, all the intermediate foreground and background regions are, respectively, combined to create
the final segmentation result. Tests on synthetic and real images showed that the new iterative method can
achieve better performance than the standard Otsu's method in many challenging cases, such as identifying
weak objects and revealing fine structures of complex objects while the added computational cost is minimal.
ETPL
DIP - 141
A New Iterative Triclass Thresholding Technique in Image Segmentation
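The iterative triclass procedure described above can be sketched as follows (a simplified reading of the abstract; a brute-force Otsu helper is used for clarity, and the function names are my own):

```python
import numpy as np

def _otsu(vals):
    """Brute-force Otsu over unique candidate thresholds (clarity over speed)."""
    best_t, best_var = float(vals.min()), -1.0
    for t in np.unique(vals)[:-1]:
        lo, hi = vals[vals <= t], vals[vals > t]
        w0, w1 = lo.size / vals.size, hi.size / vals.size
        var = w0 * w1 * (lo.mean() - hi.mean()) ** 2
        if var > best_var:
            best_var, best_t = var, float(t)
    return best_t

def triclass_segment(image, eps=1.0, max_iter=20):
    """Iterative triclass thresholding: at each pass, pixels above the upper
    class mean become foreground, below the lower mean become background, and
    the in-between band is re-thresholded until the threshold stabilizes."""
    img = np.asarray(image, dtype=np.float64)
    fg = np.zeros(img.shape, dtype=bool)
    bg = np.zeros(img.shape, dtype=bool)
    tbd = np.ones(img.shape, dtype=bool)   # to-be-determined region
    t = float(img.mean())
    prev_t = None
    for _ in range(max_iter):
        vals = img[tbd]
        if np.unique(vals).size < 2:
            break
        t = _otsu(vals)
        mu0 = vals[vals <= t].mean()
        mu1 = vals[vals > t].mean()
        fg |= tbd & (img > mu1)                # confidently foreground
        bg |= tbd & (img < mu0)                # confidently background
        tbd &= (img >= mu0) & (img <= mu1)     # shrinking TBD band
        if prev_t is not None and abs(t - prev_t) < eps:
            break
        prev_t = t
    fg |= tbd & (img > t)   # final binary split of the remaining band
    return fg
```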
Camera-enabled mobile devices are commonly used as interaction platforms for linking the user's
virtual and physical worlds in numerous research and commercial applications, such as serving an augmented
reality interface for mobile information retrieval. The various application scenarios give rise to a key technique
of daily-life visual object recognition. On-premise signs (OPSs), a popular form of commercial advertising, are
widely used in everyday life. OPSs often exhibit great visual diversity (e.g., appearing in arbitrary sizes),
accompanied with complex environmental conditions (e.g., foreground and background clutter). Observing
that such real-world characteristics are lacking in most of the existing image data sets, in this paper, we first
proposed an OPS data set, namely OPS-62, in which a total of 4649 OPS images of 62 different businesses are
collected from Google's Street View. Further, for addressing the problem of real-world OPS learning and
recognition, we developed a probabilistic framework based on the distributional clustering, in which we
proposed to exploit the distributional information of each visual feature (the distribution of its associated OPS
labels) as a reliable selection criterion for building discriminative OPS models. Experiments on the OPS-62
data set demonstrated that our approach outperforms state-of-the-art probabilistic latent
semantic analysis models, achieving more accurate recognition and fewer false alarms, with a significant 151.28%
relative improvement in the average recognition rate. Moreover, our approach is simple, linear, and can be
executed in a parallel fashion, making it practical and scalable for large-scale multimedia applications.
ETPL
DIP - 142
Learning and Recognition of On-Premise Signs From Weakly Labeled Street
View Images
This paper addresses a new learning algorithm for the recently introduced co-sparse analysis model. First, we
give new insights into the co-sparse analysis model by establishing connections to filter-based MRF models,
such as the field of experts model of Roth and Black. For training, we introduce a technique called bi-level
optimization to learn the analysis operators. Compared with existing analysis operator learning approaches,
our training procedure has the advantage that it is unconstrained with respect to the analysis operator. We
investigate the effect of different aspects of the co-sparse analysis model and show that the sparsity promoting
function (also called penalty function) is the most important factor in the model. In order to demonstrate the
effectiveness of our training approach, we apply our trained models to various classical image restoration
problems. Numerical experiments show that our trained models clearly outperform existing analysis operator
learning approaches and are on par with state-of-the-art image denoising algorithms. Our approach develops a
framework that is intuitive to understand and easy to implement.
ETPL
DIP - 143
Insights Into Analysis Operator Learning: From Patch-Based Sparse Models to
Higher Order MRFs
The classification of retinal vessels into artery/vein (A/V) is an important phase for automating the
detection of vascular changes, and for the calculation of characteristic signs associated with several systemic
diseases such as diabetes, hypertension, and other cardiovascular conditions. This paper presents an automatic
approach for A/V classification based on the analysis of a graph extracted from the retinal vasculature. The
proposed method classifies the entire vascular tree deciding on the type of each intersection point (graph
nodes) and assigning one of two labels to each vessel segment (graph links). Final classification of a vessel
segment as A/V is performed through the combination of the graph-based labeling results with a set of
intensity features. The results of this proposed method are compared with manual labeling for three public
databases. Accuracy values of 88.3%, 87.4%, and 89.8% are obtained for the images of the INSPIRE-AVR,
DRIVE, and VICAVR databases, respectively. These results demonstrate that our method outperforms recent
approaches for A/V classification.
ETPL
DIP - 144
An Automatic Graph-Based Approach for Artery/Vein Classification in Retinal
Images
In this paper, we propose a novel image interpolation algorithm via graph-based Bayesian label
propagation. The basic idea is to first create a graph with known and unknown pixels as vertices and with edge
weights encoding the similarity between vertices, then the problem of interpolation converts to how to
effectively propagate the label information from known points to unknown ones. This process can be posed as
a Bayesian inference, in which we try to combine the principles of local adaptation and global consistency to
obtain accurate and robust estimation. Specifically, our algorithm first constructs a set of local interpolation
models, which predict the intensity labels of all image samples, and a loss term will be minimized to keep the
predicted labels of the available low-resolution (LR) samples sufficiently close to the original ones. Then, all
of the losses evaluated in local neighborhoods are accumulated together to measure the global consistency on
all samples. Moreover, a graph-Laplacian-based manifold regularization term is incorporated to penalize the
global smoothness of intensity labels, such smoothing can alleviate the insufficient training of the local models
and make them more robust. Finally, we construct a unified objective function that combines the global
loss of the locally linear regression, square error of prediction bias on the available LR samples, and the
manifold regularization term. It can be solved with a closed-form solution as a convex optimization problem.
Experimental results demonstrate that the proposed method achieves competitive performance with the state-
of-the-art image interpolation algorithms.
ETPL
DIP - 145
Image Interpolation via Graph-Based Bayesian Label Propagation
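The propagation step, fixing known pixel intensities and solving for the unknowns under a graph-Laplacian smoothness penalty, can be illustrated on a tiny graph (a harmonic-interpolation sketch under simplifying assumptions, not the paper's full model with local regression terms; the function name is my own):

```python
import numpy as np

def propagate_labels(W, f_known, known_mask, lam=1e-6):
    """Harmonic label propagation: with L = D - W the graph Laplacian,
    solve L_uu f_u = -L_uk f_k for the unknown vertices, keeping known
    vertex values fixed (lam slightly regularizes the linear solve)."""
    L = np.diag(W.sum(axis=1)) - W
    u = ~known_mask
    L_uu = L[np.ix_(u, u)] + lam * np.eye(int(u.sum()))
    L_uk = L[np.ix_(u, known_mask)]
    f = np.zeros(W.shape[0])
    f[known_mask] = f_known
    f[u] = np.linalg.solve(L_uu, -L_uk @ f_known)
    return f
```

On a path graph 0-1-2 with unit edge weights and endpoints fixed at 0 and 10, the unknown middle vertex is propagated to the harmonic value 5, the intuition behind interpolating missing pixels from similarity-weighted neighbors.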
Rain removal is a very useful and important technique in applications such as security surveillance
and movie editing. Several rain removal algorithms have been proposed in recent years, in which photometric,
chromatic, and probabilistic properties of rain have been exploited to detect and remove the rain effect.
Current methods generally work well with light rain and relatively static scenes; when dealing with heavier
rainfall in dynamic scenes, however, they give very poor visual results. The proposed algorithm is based on
motion segmentation of dynamic scene. After applying photometric and chromatic constraints for rain
detection, rain removal filters are applied to pixels such that their dynamic properties as well as motion
occlusion cues are considered; both spatial and temporal information is then adaptively exploited during rain
pixel recovery. Results show that the proposed algorithm has a much better performance for rainy scenes with
large motion than existing algorithms.
ETPL
DIP - 146
A Rain Pixel Recovery Algorithm for Videos With Highly Dynamic Scenes
Multiview face recognition has become an active research area in the last few years. In this paper, we
present an approach for video-based face recognition in camera networks. Our goal is to handle pose
variations by exploiting the redundancy in the multiview video data. However, unlike traditional approaches
that explicitly estimate the pose of the face, we propose a novel feature for robust face recognition in the
presence of diffuse lighting and pose variations. The proposed feature is developed using the spherical
harmonic representation of the face texture-mapped onto a sphere; the texture map itself is generated by back-
projecting the multiview video data. Video plays an important role in this scenario. First, it provides an
automatic and efficient way for feature extraction. Second, the data redundancy renders the recognition
algorithm more robust. We measure the similarity between feature sets from different videos in a
reproducing kernel Hilbert space. We demonstrate that the proposed approach outperforms traditional
algorithms on a multiview video database.
ETPL
DIP - 147
Robust Face Recognition From Multi-View Videos
In this paper, we propose two novel shape descriptors, angular pattern (AP) and binary angular pattern
(BAP), and a multiscale integration of them for shape retrieval. Both AP and BAP are intrinsically invariant to
scale and rotation. More importantly, being global shape descriptors, the proposed shape descriptors are
computationally very efficient, while possessing similar discriminability as state-of-the-art local descriptors.
As a result, the proposed approach is attractive for real-world shape retrieval applications. The experiments on
the widely used MPEG-7 and TARI-1000 data sets demonstrate the effectiveness of the proposed method in
comparison with existing methods.
ETPL
DIP - 148
Angular Pattern and Binary Angular Pattern for Shape Retrieval
Using a novel characterization of texture, we propose an image decomposition technique that
effectively decomposes an image into its cartoon and texture components. The characterization rests on our
observation that the texture component enjoys a blockwise low-rank nature with possible overlap and shear,
because texture, in general, is globally dissimilar but locally well patterned. More specifically, one can
observe that any local block of the texture component consists of only a few individual patterns. Based on this
premise, we first introduce a new convex prior, named the block nuclear norm (BNN), leading to a suitable
characterization of the texture component. We then formulate a cartoon-texture decomposition model as a
convex optimization problem, where the simultaneous estimation of the cartoon and texture components from
a given image or degraded observation is executed by minimizing the total variation and BNN. In addition,
patterns of texture extending in different directions are extracted separately, which is a special feature of the
proposed model and of benefit to texture analysis and other applications. Furthermore, the model can handle
various types of degradation occurring in image processing, including blur+missing pixels with several types
of noise. By rewriting the problem via variable splitting, the so-called alternating direction method of
multipliers becomes applicable, resulting in an efficient algorithmic solution to the problem. Numerical
examples illustrate that the proposed model is highly selective for patterns of texture, which makes it produce
better results than state-of-the-art decomposition models.
ETPL
DIP - 149
Cartoon-Texture Image Decomposition Using Blockwise Low-Rank Texture
Characterization
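The block nuclear norm prior can be illustrated directly: partition the image into blocks and sum the singular values of each (a sketch under the simplest reading; the paper's BNN additionally handles overlap and shear, and the function name is my own):

```python
import numpy as np

def block_nuclear_norm(X, bs=8):
    """Sum of nuclear norms (singular values) of non-overlapping bs x bs
    blocks. Small values indicate blockwise low-rank structure, i.e.
    locally well-patterned texture."""
    h, w = X.shape
    total = 0.0
    for i in range(0, h - bs + 1, bs):
        for j in range(0, w - bs + 1, bs):
            # compute_uv=False returns only the singular values
            total += np.linalg.svd(X[i:i + bs, j:j + bs], compute_uv=False).sum()
    return total
```

A rank-1 stripe pattern attains the minimum possible BNN for its energy (equal to its Frobenius norm), while an unstructured noise block of the same size has a strictly larger nuclear norm, which is why minimizing BNN favors texture-like components.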
This paper investigates a convex-relaxed kernel mapping formulation of image segmentation. We
optimize, under some partition constraints, a functional containing two characteristic terms: 1) a data term,
which maps the observation space to a higher (possibly infinite) dimensional feature space via a kernel
function, thereby evaluating nonlinear distances between the observations and segment parameters, and 2) a
total-variation term, which favors smooth segment surfaces (or boundaries). The algorithm iterates two steps:
1) a convex-relaxation optimization with respect to the segments by solving an equivalent constrained problem
via the augmented Lagrange multiplier method and 2) a convergent fixed-point optimization with respect to
the segment parameters. The proposed algorithm can handle a variety of image types without the need for
complex and application-specific statistical modeling, while having the computational benefits of convex
relaxation. Our solution is amenable to parallelized implementations on graphics processing units (GPUs) and
extends easily to high dimensions. We evaluated the proposed algorithm with several sets of comprehensive
experiments and comparisons, including: 1) computational evaluations over 3D medical-imaging examples
and high-resolution large-size color photographs, which demonstrate that a parallelized implementation of the
proposed method run on a GPU can bring a significant speed-up and 2) accuracy evaluations against five state-of-the-art
methods over the Berkeley color-image database and a multimodel synthetic data set, which
demonstrate the competitive performance of the algorithm.
ETPL
DIP - 150
Convex-Relaxed Kernel Mapping for Image Segmentation
High frame rate cameras capture sharp videos of highly dynamic scenes by trading off signal-to-noise ratio
and image resolution, so combined super-resolving and denoising is crucial for enhancing high-speed
videos and extending their applications. The solution is nontrivial due to the fact that two deteriorations co-
occur during capturing and noise is nonlinearly dependent on signal strength. To handle this problem, we
propose conducting noise separation and super resolution under a unified optimization framework, which
models both spatiotemporal priors of high-quality videos and signal-dependent noise. Mathematically, we
align the frames along the temporal axis and pursue the solution under the following three criteria: 1) the sharp
noise-free image stack is low rank with some missing pixels denoting occlusions; 2) the noise follows a given
nonlinear noise model; and 3) the recovered sharp image can be reconstructed well with sparse coefficients
and an overcomplete dictionary learned from high-quality natural images. Computationally, we propose
to obtain the final result by solving a convex optimization problem using modern local linearization techniques. In
the experiments, we validate the proposed approach on both synthetic and real captured data.
ETPL
DIP - 151
Joint Non-Gaussian Denoising and Superresolving of Raw High Frame Rate
Videos
ETPL
DIP - 152
Does Deblurring Improve Geometrical Hyperspectral Unmixing?
In computed tomography (CT), partial volume effects impede accurate segmentation of structures that
are small with respect to the pixel size. In this paper, it is shown that for objects consisting of a small number
of homogeneous materials, the reconstruction resolution can be substantially increased without altering the
acquisition process. A super-resolution reconstruction approach is introduced that is based on discrete
tomography, in which prior knowledge about the materials in the object is assumed. Discrete tomography has
already been used to create reconstructions from a low number of projection angles, but in this paper, it is
demonstrated that it can also be applied to increase the reconstruction resolution. Experiments on simulated
and real μCT data of bone and foam structures show that the proposed method indeed leads to significantly
improved structure segmentation and quantification compared with what can be achieved from conventional
reconstructions.
ETPL
DIP - 153
Super-Resolution for Computed Tomography Based on Discrete Tomography
Illumination estimation is an important component of color constancy and automatic white balancing.
A number of methods of combining illumination estimates obtained from multiple subordinate illumination
estimation methods now appear in the literature. These combinational methods aim to provide better
illumination estimates by fusing the information embedded in the subordinate solutions. The existing
combinational methods are surveyed and analyzed here with the goals of determining: 1) the effectiveness of
fusing illumination estimates from multiple subordinate methods; 2) the best method of combination; 3) the
underlying factors that affect the performance of a combinational method; and 4) the effectiveness of
combination for illumination estimation in multiple-illuminant scenes. The various combinational methods are
categorized in terms of whether or not they require supervised training and whether or not they rely on high-level
scene content cues (e.g., indoor versus outdoor). Extensive tests and analyses using three data
sets of real-world images are conducted. For consistency in testing, the images were labeled according to their
high-level features (3D stages, indoor/outdoor), and this label data is made available online. The tests reveal
that the trained combinational methods (direct combination by support vector regression in particular) clearly
outperform both the non-combinational methods and those combinational methods based on scene content
cues.
ETPL
DIP - 154
Evaluating Combinational Illumination Estimation Methods on Real-World
Images
The automatic clustering of time-varying characteristics and phenomena in natural scenes has recently
received great attention. While there exist many algorithms for motion segmentation, an important issue
arising from these studies concerns which attributes of the data should be used to cluster phenomena
with a certain repetitiveness in both space and time. This is difficult because there is no knowledge of the
labels of the phenomena to guide the search. In this paper, we present a feature selection dynamic mixture
model for motion segmentation. The advantage of our method is that it is intuitively appealing, avoids any
combinatorial search, and allows us to prune the feature set. Numerical experiments on various phenomena
are conducted. The performance of the proposed model is compared with that of other motion segmentation
algorithms, demonstrating the robustness and accuracy of our method.
ETPL
DIP - 155
An Unsupervised Feature Selection Dynamic Mixture Model for Motion
Segmentation
This paper presents a new lossless color image compression algorithm based on hierarchical
prediction and context-adaptive arithmetic coding. For the lossless compression of an RGB image, it is first
decorrelated by a reversible color transform, and then the Y component is encoded by a conventional lossless
grayscale image compression method. For encoding the chrominance images, we develop a hierarchical
scheme that enables the use of upper, left, and lower pixels for the pixel prediction, whereas the conventional
raster scan prediction methods use upper and left pixels. An appropriate context model for the prediction error
is also defined and the arithmetic coding is applied to the error signal corresponding to each context. For
several sets of images, it is shown that the proposed method further reduces the bit rates compared with
JPEG2000 and JPEG-XR.
ETPL
DIP - 156
Hierarchical Prediction and Context Adaptive Coding for Lossless Color Image
Compression
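The reversible decorrelation step described above can be illustrated with the JPEG 2000-style reversible color transform (shown as an illustrative assumption; the paper does not specify its exact transform). Because the transform is integer-valued and exactly invertible, the pipeline stays lossless:

```python
import numpy as np

def rct_forward(r, g, b):
    # JPEG 2000-style reversible color transform (integer, lossless).
    y = (r + 2 * g + b) // 4   # integer luma approximation
    u = r - g                  # chroma differences
    v = b - g
    return y, u, v

def rct_inverse(y, u, v):
    # Exact inverse: floor((u+v)/4) cancels the floor in the forward step.
    g = y - (u + v) // 4
    r = u + g
    b = v + g
    return r, g, b
```

The round trip is bit-exact for any integer inputs, which is the property a lossless coder needs before entropy coding the Y, U, and V planes separately.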
One of the most challenging ongoing issues in the field of 3D visual research is how to perceptually
quantify object and surface visualizations that are displayed within a virtual 3D space between a human eye
and 3D display. To seek an effective method of quantification, it is necessary to measure various elements
related to the perception of 3D objects at different depths. We propose a new framework for quantifying 3D
visual information that we call 3D visual activity (3DVA), which utilizes natural scene statistics measured
over 3D visual coordinates. We account for important aspects of 3D perception by carrying out a 3D
coordinate transform reflecting the nonuniform sampling resolution of the eye and the process of stereoscopic
fusion. The 3DVA fits the empirical distributions of wavelet coefficients to a parametric generalized Gaussian probability distribution model and applies a set of 3D perceptual weights. We conducted a series of
simulations that demonstrate the effectiveness of the 3DVA for quantifying the statistical dynamics of visual
3D space with respect to disparity, motion, texture, and color. A successful example application is also
provided, whereby 3DVA is applied to the problem of predicting visual fatigue experienced when viewing 3D
displays.
ETPL
DIP - 157
3D Visual Activity Assessment Based on Natural Scene Statistics
This paper presents a new method to estimate the parameters of two types of blurs, linear uniform
motion (approximated by a line characterized by angle and length) and out-of-focus (modeled as a uniform
disk characterized by its radius), for blind restoration of natural images. The method is based on the spectrum
of the blurred images and relies on a weak assumption, valid for most natural images: the
power-spectrum is approximately isotropic and has a power-law decay with the spatial frequency. We
introduce two modifications to the Radon transform, which allow the identification of the blur spectrum pattern of the two types of blur mentioned above. The blur parameters are identified by fitting an appropriate
function that accounts separately for the natural image spectrum and the blur frequency response. The
accuracy of the proposed method is validated by simulations, and the effectiveness of the proposed method is
assessed by testing the algorithm on real natural blurred images and comparing it with state-of-the-art blind
deconvolution methods.
ETPL
DIP - 158
Parametric Blur Estimation for Blind Restoration of Natural Images: Linear
Motion and Out-of-Focus
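The power-law assumption on natural-image spectra can be made concrete with a small sketch: fitting the decay exponent of a radially averaged power spectrum by least squares in log-log coordinates. The data and function names here are illustrative, not the paper's actual fitting procedure:

```python
import numpy as np

def fit_power_law(freqs, power):
    # Least-squares fit of log P = log c - alpha * log f.
    lf, lp = np.log(freqs), np.log(power)
    A = np.vstack([np.ones_like(lf), -lf]).T
    (logc, alpha), *_ = np.linalg.lstsq(A, lp, rcond=None)
    return np.exp(logc), alpha

# Synthetic isotropic spectrum with power-law exponent 1.8.
f = np.linspace(0.1, 1.0, 50)
p = 2.0 * f ** -1.8
c, alpha = fit_power_law(f, p)
```

Separating this smooth power-law component from the blur frequency response is what lets the oscillatory pattern of the motion or out-of-focus kernel be isolated.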
In pedestrian detection, as sophisticated feature descriptors are used for improving detection accuracy,
its processing speed becomes a critical issue. In this paper, we propose a novel speed-up scheme based on
multiple-instance pruning (MIP), one of the soft cascade methods, to enhance the processing speed of support
vector machine (SVM) classifiers. Our scheme mainly consists of three steps. First, we regularly split an SVM
classifier into multiple parts and build a cascade structure using them. Next, we rearrange the cascade structure
for enhancing the rejection rate, and then train the rejection threshold of each stage composing the cascade
structure using the MIP. To verify the validity of our scheme, we apply it to a pedestrian classifier using co-
occurrence histograms of oriented gradients trained by an SVM, and experimental results show that the
classification time of the proposed scheme is as low as one-hundredth that of the original classifier
without sacrificing detection accuracy.
ETPL
DIP - 159
A Speed-Up Scheme Based on Multiple-Instance Pruning for Pedestrian
Detection Using a Support Vector Machine
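The staged evaluation idea above can be sketched as follows: a linear SVM score is accumulated over blocks of features, and evaluation stops as soon as a partial sum falls below that stage's rejection threshold. The thresholds here are illustrative placeholders, not MIP-trained values:

```python
import numpy as np

def staged_svm_score(w, x, bias, stage_ends, thresholds):
    """Evaluate a linear SVM score w.x in stages; reject early when the
    running partial sum falls below a stage's rejection threshold."""
    score = 0.0
    start = 0
    for end, thr in zip(stage_ends, thresholds):
        score += float(np.dot(w[start:end], x[start:end]))
        if score < thr:           # early rejection: skip remaining stages
            return False, score
        start = end
    return score + bias > 0, score + bias
```

Most negative windows are rejected after the first stages, which is where the claimed speed-up comes from.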
The development of video quality metrics requires methods for measuring perceived video quality.
Most of these metrics are designed and tested using databases of images degraded by compression and scored
using opinion ratings. We studied video quality preferences for enhanced images of normally-sighted
participants using the method of paired comparisons with a thorough statistical analysis. Participants (n=40)
made pair-wise comparisons of high definition video clips enhanced at four different levels using a
commercially available enhancement device. Perceptual scales were computed with binary logistic regression
to estimate preferences for each level and to provide statistical inference of the differences among levels and
the impact of other variables. While moderate preference for enhanced videos was found, two unexpected
effects were also uncovered: 1) participants could be broadly classified into two groups: a) those who
preferred enhancement (“Sharp”) and b) those who disliked enhancement (“Smooth”); and 2) enhancement preferences depended on video content, with a particular preference for human faces to be enhanced less. The results suggest
that algorithms to evaluate image quality (at least for enhancement) may need to be adjusted or applied
differentially based on video content and viewer preferences. The possible impact of similar effects on image
quality of compressed video needs to be evaluated.
ETPL
DIP - 160
Factors Affecting Enhanced Video Quality Preferences
A new fingerprint compression algorithm based on sparse representation is introduced. Obtaining an
overcomplete dictionary from a set of fingerprint patches allows us to represent them as a sparse linear
combination of dictionary atoms. In the algorithm, we first construct a dictionary for predefined fingerprint
image patches. For a new fingerprint image, its patches are represented according to the dictionary by computing an l0-minimization, and the representation is then quantized and encoded. In this paper, we consider the
effect of various factors on compression results. Three groups of fingerprint images are tested. The
experiments demonstrate that our algorithm is efficient compared with several competing compression
techniques (JPEG, JPEG 2000, and WSQ), especially at high compression ratios. The experiments also
illustrate that the proposed algorithm is robust with respect to minutiae extraction.
ETPL
DIP - 161
Fingerprint Compression Based on Sparse Representation
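The l0-minimization step is commonly approximated with a greedy pursuit; below is a minimal orthogonal matching pursuit sketch, a standard stand-in rather than necessarily the solver used in the paper:

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: greedy approximation of the
    l0-constrained coding  min ||y - D x||  s.t.  ||x||_0 <= k,
    where D has unit-norm columns (atoms)."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))  # best-correlated atom
        if j not in support:
            support.append(j)
        # Re-fit coefficients on the current support, update the residual.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```

The sparse coefficient vector (index/value pairs) is what would then be quantized and entropy coded.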
The plenoptic function is a powerful tool to analyze the properties of multi-view image data sets. In
particular, the understanding of the spectral properties of the plenoptic function is essential in many computer
vision applications, including image-based rendering. In this paper, we derive for the first time an exact
closed-form expression of the plenoptic spectrum of a slanted plane with finite width and use this expression
as the elementary building block to derive the plenoptic spectrum of more sophisticated scenes. This is
achieved by approximating the geometry of the scene with a set of slanted planes and evaluating the closed-
form expression for each plane in the set. We then use this closed-form expression to revisit uniform plenoptic
sampling. In this context, we derive a new Nyquist rate for the plenoptic sampling of a slanted plane and a new
reconstruction filter. Through numerical simulations, on both real and synthetic scenes, we show that the new
filter outperforms alternative existing filters.
ETPL
DIP - 162
On the Spectrum of the Plenoptic Function
In digital forensics, recovery of a damaged or altered video file plays a crucial role in searching for
evidences to resolve a criminal case. This paper presents a frame-based recovery technique of a corrupted
video file using the specifications of a codec used to encode the video data. A video frame is the minimum
meaningful unit of video data. Many existing approaches attempt to recover a video file using file structure
rather than frame structure. When a target video file is severely fragmented, or even has a portion of video overwritten by other content, however, recovery with existing approaches may fail. The
proposed approach addresses how to extract video frames from a portion of video to be restored as well as
how to connect extracted video frames together according to the codec specifications. Experimental results show that the proposed technique successfully restores fragmented video files regardless of the degree of fragmentation. For a corrupted video file containing overwritten segments, the proposed technique can
recover most of the video content in non-overwritten segments of the video file.
ETPL
DIP - 163
Frame-Based Recovery of Corrupted Video Files Using Video Codec
Specifications
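Locating frame boundaries in raw bytes is the first step of such frame-based carving. A minimal sketch scans for the Annex-B-style 0x000001 start-code prefix used by H.264/MPEG byte streams; real recovery must additionally parse the frame headers per the codec specification:

```python
def find_start_codes(data: bytes) -> list:
    """Return byte offsets of 0x000001 start-code prefixes in a stream."""
    offsets = []
    i = 0
    while True:
        i = data.find(b"\x00\x00\x01", i)
        if i < 0:
            return offsets
        offsets.append(i)
        i += 3  # skip past this prefix before searching again
```

Each offset marks a candidate frame unit, which can then be validated and reassembled according to the codec's header fields.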
Vector-valued images such as RGB color images or multimodal medical images show a strong
interchannel correlation, which is not exploited by most image processing tools. We propose a new notion of
treating vector-valued images which is based on the angle between the spatial gradients of their channels.
Through minimizing a cost functional that penalizes large angles, images with parallel level sets can be
obtained. After formally introducing this idea and the corresponding cost functionals, we discuss their Gâteaux
derivatives that lead to a diffusion-like gradient descent scheme. We illustrate the properties of this cost
functional by several examples in denoising and demosaicking of RGB color images. They show that parallel
level sets are a suitable concept for color image enhancement. Demosaicking with parallel level sets gives
visually perfect results for low noise levels. Furthermore, the proposed functional yields sharper images than
the compared approaches.
ETPL
DIP - 164
Vector-Valued Image Processing by Parallel Level Sets
In region-of-interest (ROI)-based video coding, ROI parts of the frame are encoded with higher
quality than non-ROI parts. At low bit rates, such encoding may produce attention-grabbing coding artifacts,
which may draw the viewer's attention away from the ROI, thereby degrading visual quality. In this paper, we present a saliency-aware video compression method for ROI-based video coding. The proposed method aims at reducing salient coding artifacts in non-ROI parts of the frame in order to keep the user's attention on the ROI.
Further, the method allows saliency to increase in high quality parts of the frame, and allows saliency to
reduce in non-ROI parts. Experimental results indicate that the proposed method is able to improve visual
quality of encoded video relative to conventional rate-distortion optimized video coding, as well as two state-of-the-art perceptual video coding methods.
ETPL
DIP - 165
Saliency-Aware Video Compression
A new algorithm for calculating the metamer mismatch volumes that arise in colour vision and colour imaging
is introduced. Unlike previous methods, the proposed method places no restrictions on the set of possible
object reflectance spectra. As a result of such restrictions, previous methods have only been able to provide
approximate solutions to the mismatch volume. The proposed new method is the first to characterize precisely
the metamer mismatch volume for any possible reflectance.
ETPL
DIP - 166
Metamer Mismatching
This paper is devoted to the study of a directional lifting transform for wavelet frames. A
nonsubsampled lifting structure is developed to maintain the translation invariance as it is an important
property in image denoising. Then, the directionality of the lifting-based tight frame is explicitly discussed,
followed by a specific translation invariant directional framelet transform (TIDFT). The TIDFT has two
framelets ψ1 and ψ2, with vanishing moments of order two and one, respectively, which are able to detect
singularities in a given direction set. It provides an efficient and sparse representation for images containing
rich textures along with properties of fast implementation and perfect reconstruction. In addition, an adaptive
block-wise orientation estimation method based on Gabor filters is presented instead of the conventional
minimization of residuals. Furthermore, the TIDFT is utilized to exploit the capability of image denoising,
incorporating the MAP estimator for multivariate exponential distribution. Consequently, the TIDFT is able to
eliminate the noise effectively while preserving the textures simultaneously. Experimental results show that
the TIDFT outperforms some other frame-based denoising methods, such as contourlet and shearlet, and is
competitive to the state-of-the-art denoising approaches.
ETPL
DIP - 167
Translation Invariant Directional Framelet Transform Combined With Gabor
Filters for Image Denoising
We propose a segmentation method based on the geometric representation of images as two-
dimensional manifolds embedded in a higher dimensional space. The segmentation is formulated as a
minimization problem, where the contours are described by a level set function and the objective functional
corresponds to the surface of the image manifold. In this geometric framework, both data-fidelity and
regularity terms of the segmentation are represented by a single functional that intrinsically aligns the
gradients of the level set function with the gradients of the image and exploits this directional information to
overcome image inhomogeneities and fragmented contours. The proposed formulation combines this robust
alignment of gradients with attractive properties of previous methods developed in the same geometric
framework: the natural coupling of image channels proposed for anisotropic diffusion in [1] and the ability of
subjective surfaces [2] to detect weak edges and close fragmented boundaries. The potential of such a
geometric approach lies in the general definition of Riemannian manifolds, which naturally generalizes
existing segmentation methods (the geodesic active contours of Caselles et al. [3], the active contours without
edges of Chan and Vese [4] and the robust edge integrator of Kimmel and Bruckstein [5]) to higher
dimensional spaces, non-flat images, and feature spaces. Our experiments show that the proposed technique improves the segmentation of multichannel images, images subject to inhomogeneities, and images with weak or fragmented edges.
ETPL
DIP - 168
Harmonic Active Contours
Most existing color constancy algorithms assume uniform illumination. However, in real-world
scenes, this is not often the case. Thus, we propose a novel framework for estimating the colors of multiple
illuminants and their spatial distribution in the scene. We formulate this problem as an energy minimization
task within a conditional random field over a set of local illuminant estimates. In order to quantitatively
evaluate the proposed method, we created a novel data set of two-dominant-illuminant images comprised of
laboratory, indoor, and outdoor scenes. Unlike prior work, our database includes accurate pixel-wise ground
truth illuminant information. The performance of our method is evaluated on multiple data sets. Experimental
results show that our framework clearly outperforms single illuminant estimators as well as a recently
proposed multi-illuminant estimation approach.
ETPL
DIP - 169
Multi-Illuminant Estimation with Conditional Random Fields
Time multiplexing (TM) and spatial neighborhood (SN) are two mainstream structured light
techniques widely used for depth sensing. The former is well known for its high accuracy and the latter for its
low delay. In this paper, we explore a new paradigm of scalable depth sensing to integrate the advantages of
both the TM and SN methods. Our contribution is twofold. First, we design a set of hybrid structured light
patterns composed of phase-shifted fringe and pseudo-random speckle. Under the illumination of the hybrid
patterns, depth can be decently reconstructed either from a few consecutive frames with the TM principle for
static scenes or from a single frame with the SN principle for dynamic scenes. Second, we propose a scene-
adaptive depth sensing framework based on which a global or region-wise optimal depth map can be generated
through motion detection. To validate the proposed scalable paradigm, we develop a real-time (20 fps) depth sensing system. Experimental results demonstrate that our method achieves an efficient balance between accuracy and speed during depth sensing that has rarely been exploited before.
ETPL
DIP - 170
Real-Time Scalable Depth Sensing With Hybrid Structured Light Illumination
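The TM principle can be illustrated with classic three-step phase shifting, one common fringe-decoding scheme; this is a generic textbook example, not the paper's specific hybrid patterns:

```python
import numpy as np

def decode_three_step(i1, i2, i3):
    """Recover wrapped phase from three fringe images
    I_k = A + B*cos(phi + (k-2)*2*pi/3), k = 1, 2, 3."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)
```

The arctangent cancels the unknown ambient term A and modulation B, so the wrapped phase (and hence depth, after unwrapping and triangulation) depends only on the three intensity samples per pixel.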
We address the two inherently related problems of segmentation and interpolation of 3D and 4D
sparse data and propose a new method to integrate these stages in a level set framework. The interpolation
process uses segmentation information rather than pixel intensities for increased robustness and accuracy. The
method supports any spatial configurations of sets of 2D slices having arbitrary positions and orientations. We
achieve this by introducing a new level set scheme based on the interpolation of the level set function by radial
basis functions. The proposed method is validated quantitatively and/or subjectively on artificial data and MRI
and CT scans and is compared against the traditional sequential approach, which interpolates the images first,
using a state-of-the-art image interpolation method, and then segments the interpolated volume in 3D or 4D. In
our experiments, the proposed framework yielded similar segmentation results to the sequential approach but
provided a more robust and accurate interpolation. In particular, the interpolation was more satisfactory in
cases of large gaps, due to the method taking into account the global shape of the object, and it recovered
better topologies at the extremities of the shapes where the objects disappear from the image slices. As a
result, the complete integrated framework provided more satisfactory shape reconstructions than the sequential
approach.
ETPL
DIP - 171
Integrated Segmentation and Interpolation of Sparse Data
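The core interpolation step, fitting a function through scattered samples with radial basis functions by solving the collocation system, can be sketched as follows (Gaussian RBFs are an illustrative choice; the shape parameter `eps` is a hypothetical knob):

```python
import numpy as np

def rbf_interpolator(centers, values, eps=1.0):
    """Fit a Gaussian RBF interpolant through scattered samples."""
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    phi = np.exp(-(eps * d) ** 2)
    w = np.linalg.solve(phi, values)      # collocation: phi @ w = values
    def interp(x):
        dx = np.linalg.norm(x[None, :] - centers, axis=-1)
        return float(np.exp(-(eps * dx) ** 2) @ w)
    return interp
```

Applied to level set values sampled on arbitrarily oriented 2D slices, such an interpolant yields a smooth implicit function between slices, which is what allows segmentation and interpolation to be coupled.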
We present four novel point-to-set distances defined for fuzzy or gray-level image data, two based on
integration over α-cuts and two based on the fuzzy distance transform. We explore their theoretical properties.
Inserting the proposed point-to-set distances in existing definitions of set-to-set distances, among which are
the Hausdorff distance and the sum of minimal distances, we define a number of distances between fuzzy sets.
These set distances are directly applicable for comparing gray-level images or fuzzy segmented objects, but
also for detecting patterns and matching parts of images. The distance measures integrate shape and
intensity/membership of observed entities, providing a highly applicable tool for image processing and
analysis. Performance evaluation of derived set distances in real image processing tasks is conducted and
presented. It is shown that the considered distances have a number of appealing theoretical properties and
exhibit very good performance in template matching and object classification for fuzzy segmented images as
well as when applied directly on gray-level intensity images. Examples include recognition of hand written
digits and identification of virus particles. The proposed set distances perform excellently on the MNIST digit
classification task, achieving the best reported error rate for classification using only rigid body
transformations and a kNN classifier.
ETPL
DIP - 172
Linear Time Distances between Fuzzy Sets with Applications to Pattern
Matching and Classification
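The α-cut integration idea behind two of the proposed point-to-set distances can be sketched in 1D: the distance from a point to a fuzzy set is its minimal distance to each α-cut, averaged over α. This is a simplified illustration of the principle, not the paper's exact definitions:

```python
import numpy as np

def point_to_fuzzy_set_distance(x, positions, membership, n_levels=100):
    """Distance from point x to a fuzzy set on a 1D grid, averaged over
    alpha-cuts: d(x, S) = mean_alpha min_{p in S_alpha} |x - p|.
    Empty alpha-cuts contribute the largest finite grid distance."""
    dmax = np.abs(x - positions).max()
    total = 0.0
    alphas = np.linspace(1e-6, 1.0, n_levels)
    for a in alphas:
        cut = positions[membership >= a]
        total += np.abs(x - cut).min() if cut.size else dmax
    return total / n_levels
```

For a crisp set this reduces to the ordinary point-to-set distance; for graded memberships, points with higher membership dominate more α-cuts and so pull the distance more strongly.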
In this paper, we develop an efficient bit allocation strategy for subband-based image
coding systems. More specifically, our objective is to design a new optimization algorithm based on a
rate-distortion optimality criterion. To this end, we consider the uniform scalar quantization of a class
of mixed distributed sources following a Bernoulli-generalized Gaussian distribution. This model
appears to be particularly well-adapted for image data, which have a sparse representation in a
wavelet basis. In this paper, we propose new approximations of the entropy and the distortion
functions using piecewise affine and exponential forms, respectively. Because of these
approximations, bit allocation is reformulated as a convex optimization problem. Solving the
resulting problem allows us to derive the optimal quantization step for each subband. Experimental
results show the benefits that can be drawn from the proposed bit allocation method in a typical
transform-based coding application.
ETPL
DIP - 173
A Bit Allocation Method for Sparse Source Coding
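For context, the classic high-rate bit allocation result, which gives each subband a rate offset proportional to the log of its variance relative to the geometric mean, can be sketched as below. This is a textbook baseline, not the paper's Bernoulli-generalized-Gaussian solution:

```python
import numpy as np

def allocate_bits(variances, total_rate_per_sample):
    """High-rate optimal bit allocation across subbands:
    R_i = R_avg + 0.5 * log2(var_i / geometric_mean(var))."""
    v = np.asarray(variances, dtype=float)
    geo = np.exp(np.mean(np.log(v)))       # geometric mean of variances
    return total_rate_per_sample + 0.5 * np.log2(v / geo)

# Four subbands with decreasing energy, 2 bits/sample on average.
rates = allocate_bits([16.0, 4.0, 1.0, 1.0], 2.0)
```

By construction the per-subband rates average exactly to the target, with high-variance subbands receiving proportionally more bits.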
When matching images for applications such as mosaicking and homography estimation, the
distribution of features across the overlap region affects the accuracy of the result. This paper uses the spatial
statistics of these features, measured by Ripley's K-function, to assess whether feature matches are clustered
together or spread around the overlap region. A comparison of the performances of a dozen state-of-the-art
feature detectors is then carried out using analysis of variance and a large image database. Results show that
SFOP introduces significantly less aggregation than the other detectors tested. When the detectors are rank-
ordered by this performance measure, the order is broadly similar to those obtained by other means, suggesting
that the ordering reflects genuine performance differences. Experiments on stitching images into mosaics
confirm that better coverage values yield better quality outputs.
ETPL
DIP - 174
Spatial Statistics of Image Features for Performance Comparison
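Ripley's K-function can be estimated naively as the average number of feature pairs within radius r, scaled by the window area; the sketch below omits edge correction for simplicity:

```python
import numpy as np

def ripley_k(points, r, area):
    """Naive Ripley's K estimate for 2D points in a window of the given
    area (no edge correction): K(r) = area * mean pair count within r."""
    n = len(points)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    within = (d <= r).sum() - n            # exclude self-pairs
    return area * within / (n * (n - 1))
```

For a homogeneous Poisson process K(r) ≈ πr², so values above that curve indicate clustering of feature matches and values below indicate regular spread, which is exactly the aggregation measure used to compare detectors.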
We present a novel method to incorporate prior knowledge into normalized cuts. The prior is
incorporated into the cost function by maximizing the similarity of the prior to one partition and the
dissimilarity to the other. This simple formulation can also be extended to multiple priors to allow the
modeling of the shape variations. A shape model obtained by PCA on a training set can be easily integrated
into the new framework. This is in contrast to other methods that usually incorporate prior knowledge by hard
constraints during optimization. The eigenvalue problem inferred by spectral relaxation is not sparse, but can
still be solved efficiently. We apply this method to biomedical data sets as well as natural images of people
from a public database and compare it with other normalized cut based segmentation algorithms. We
demonstrate that our method gives promising results and can still give a good segmentation even when the
prior is not accurate.
ETPL
DIP - 175
Shape-Based Normalized Cuts Using Spectral Relaxation for Biomedical
Segmentation
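The spectral relaxation of a plain two-way normalized cut (without the prior term) can be sketched as thresholding the second eigenvector of the symmetric normalized Laplacian; the median threshold here is an illustrative choice:

```python
import numpy as np

def ncut_partition(W):
    """Two-way normalized cut by spectral relaxation: threshold the
    second-smallest eigenvector of the normalized Laplacian."""
    d = W.sum(axis=1)
    Dh = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - Dh @ W @ Dh       # symmetric normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    fiedler = Dh @ vecs[:, 1]              # relaxed cluster indicator
    return fiedler > np.median(fiedler)
```

The paper's contribution amounts to adding prior-similarity terms to the cut objective before this relaxation, so the eigenproblem changes but the overall recipe stays the same.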
The selection of optimal camera configurations (camera locations, orientations, etc.) for multi-camera
networks remains an unsolved problem. Previous approaches largely focus on proposing various objective
functions to achieve different tasks. Most of them, however, do not generalize well to large scale networks. To
tackle this, we propose a statistical formulation of the problem together with a trans-dimensional
simulated annealing algorithm to effectively deal with it. We compare our approach with a state-of-the-art
method based on binary integer programming (BIP) and show that our approach offers similar performance on
small scale problems. However, we also demonstrate the capability of our approach in dealing with large scale
problems and show that our approach produces better results than two alternative heuristics designed to deal
with the scalability issue of BIP. Last, we show the versatility of our approach using a number of specific
scenarios.
ETPL
DIP - 176
Optimal Camera Planning Under Versatile User Constraints in Multi-Camera
Image Processing Systems
We propose an analytical model to estimate the synthesized view quality in 3D video. The model relates
errors in the depth images to the synthesis quality, taking into account texture image characteristics, texture
image quality, and the rendering process. Especially, we decompose the synthesis distortion into texture-error
induced distortion and depth-error induced distortion. We analyze the depth-error induced distortion using an
approach combining frequency- and spatial-domain techniques. Experimental results with video sequences and
coding/rendering tools used in MPEG 3DV activities show that our analytical model can accurately estimate
the synthesis noise power. Thus, the model can be used to estimate the rendering quality for different system
designs.
ETPL
DIP - 177
An Analytical Model for Synthesis Distortion Estimation in 3D Video
Contrast sensitivity of the human visual system to visual stimuli can be significantly affected by
several mechanisms, e.g., vision foveation and attention. Existing studies on foveation based video quality
assessment only take into account static foveation mechanism. This paper first proposes an advanced foveal
imaging model to generate the perceived representation of video by integrating visual attention into the
foveation mechanism. For accurately simulating the dynamic foveation mechanism, a novel approach to
predict video fixations is proposed by mimicking the essential functionality of eye movement. Consequently,
an advanced contrast sensitivity function, derived from the attention driven foveation mechanism, is modeled
and then integrated into a wavelet-based distortion visibility measure to build a full reference attention driven
foveated video quality (AFViQ) metric. AFViQ exploits adequately perceptual visual mechanisms in video
quality assessment. Extensive evaluation results with respect to several publicly available eye-tracking and
video quality databases demonstrate promising performance of the proposed video attention model, fixation
prediction approach, and quality metric.
ETPL
DIP - 178
Attention Driven Foveated Video Quality Assessment
Acquiring scenery depth is a fundamental task in computer vision, with many applications in
manufacturing, surveillance, or robotics relying on accurate scenery information. Time-of-flight cameras can
provide depth information in real time and overcome shortcomings of traditional stereo analysis. However,
they provide limited spatial resolution and sophisticated upscaling algorithms are sought after. In this paper,
we present a sensor fusion approach to time-of-flight super resolution, based on the combination of depth and
texture sources. Unlike other texture guided approaches, we interpret the depth upscaling process as a
weighted energy optimization problem. Three different weights are introduced, employing different available
sensor data. The individual weights address object boundaries in depth, depth sensor noise, and temporal
consistency. Applied in consecutive order, they form three weighting strategies for time-of-flight super
resolution. Objective evaluations show advantages in depth accuracy and for depth image based rendering
compared with state-of-the-art depth upscaling. Subjective view synthesis evaluation shows a significant
increase in viewer preference by a factor of four in stereoscopic viewing conditions. To the best of our
knowledge, this is the first extensive subjective test performed on time-of-flight depth upscaling. Objective
and subjective results prove the suitability of our time-of-flight super resolution approach for depth scenery capture.
ETPL
DIP - 179
A Weighted Optimization Approach to Time-of-Flight Sensor Fusion
This paper presents a novel learning method for precise eye localization, a challenge to be solved in
order to improve the performance of face processing algorithms. Few existing approaches can directly detect
and localize eyes at arbitrary angles in predicted eye regions, face images, and original portraits at the same
time. To preserve rotation invariant property throughout the entire eye localization framework, a codebook of
invariant local features is proposed for the representation of eye patterns. A heat map is then generated by
integrating a 2-class sparse representation classifier with a pyramid-like detecting and locating strategy to
fulfill the task of discriminative classification and precise localization. Furthermore, a series of prior
information is adopted to improve the localization precision and accuracy. Experimental results on three
different databases show that our method is capable of effectively locating eyes in arbitrary rotation situations
(360° in plane).
ETPL
DIP - 180
A Novel Eye Localization Method with Rotation Invariance
Directionlets allow a construction of perfect reconstruction and critically sampled multidirectional
anisotropic basis, yet retaining the separable filtering of standard wavelet transform. However, due to the
spatially varying filtering and downsampling direction, it is forced to apply spatial segmentation and process
each segment independently. Because of this independent processing of the image segments, directionlets
suffer from the following two major limitations when applied to, say, image coding. First, failure to exploit the
correlation across block boundaries degrades the coding performance and also induces blocking artifacts, thus
making it mandatory to use de-blocking filter at low bit rates. Second, spatial scalability, i.e., minimum
segment size or the number of levels of the transform, is limited due to independent processing of segments.
We show that, with simple modifications in the block boundaries, we can overcome these limitations by, what
we call, in-phase lifting implementation of directionlets. In the context of directionlets using in-phase lifting,
we identify different possible groups of downsampling matrices that would allow the construction of a
multilevel transform without forcing independent processing of segments both with and without any
modifications in the segment boundary. Experimental results in image coding show objective and subjective
improvements when compared with the directionlets applied independently on each image segment. As an
application, using both the in-phase lifting implementation of directionlets and adaptive directional lifting,
we have constructed an adaptive directional wavelet transform, which shows improved image coding
performance over the existing adaptive directional wavelet transforms.
ETPL
DIP - 181
Directionlets Using In-Phase Lifting for Image Representation
The goal of this paper is to design a statistical test for the camera model identification problem. The
approach is based on the heteroscedastic noise model, which more accurately describes a natural raw image.
This model is characterized by only two parameters, which are considered as unique fingerprint to identify
camera models. The camera model identification problem is cast in the framework of hypothesis testing
theory. In an ideal context where all model parameters are perfectly known, the likelihood ratio test (LRT) is
presented and its performance is theoretically established. For practical use, two generalized LRTs are
designed to deal with unknown model parameters so that they can meet a prescribed false alarm probability
while ensuring a high detection performance. Numerical results on simulated images and real natural raw
images highlight the relevance of the proposed approach.
ETPL
DIP - 182
Camera Model Identification Based on the Heteroscedastic Noise Model
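The two-parameter heteroscedastic model described above (per-patch variance linear in the patch mean) can be sketched with a plain least-squares fit; this is an illustrative stand-in, not the paper's hypothesis-testing estimator, and the function name is our own:

```python
import numpy as np

def heteroscedastic_fit(means, variances):
    """Fit var = a * mean + b over per-patch (mean, variance) samples.
    The pair (a, b) plays the role of the camera-model fingerprint."""
    means = np.asarray(means, dtype=np.float64)
    variances = np.asarray(variances, dtype=np.float64)
    # Ordinary least squares on the design matrix [mean, 1].
    A = np.column_stack([means, np.ones_like(means)])
    (a, b), *_ = np.linalg.lstsq(A, variances, rcond=None)
    return float(a), float(b)
```

In practice the (mean, variance) pairs would come from small homogeneous patches of a raw image; two cameras with different sensors yield different (a, b) pairs.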
Vector bilateral filtering has been shown to provide a good tradeoff between noise removal and edge
degradation when applied to multispectral/hyperspectral image denoising. It has also been demonstrated to
provide dynamic range enhancement of bands that have impaired signal to noise ratios (SNRs). Typical vector
bilateral filtering described in the literature does not use parameters satisfying optimality criteria. We
introduce an approach for selection of the parameters of a vector bilateral filter through an optimization
procedure rather than by ad hoc means. The approach is based on posing the filtering problem as one of
nonlinear estimation and minimization of Stein's unbiased risk estimate of this nonlinear estimator. Along
the way, we provide a plausibility argument through an analytical example as to why vector bilateral filtering
outperforms band-wise 2D bilateral filtering in enhancing SNR. Experimental results show that the optimized
vector bilateral filter provides improved denoising performance on multispectral images when compared with
several other approaches.
ETPL
DIP - 183
Multispectral Image Denoising With Optimized Vector Bilateral Filter
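The vector bilateral filter that this entry optimizes can be sketched as follows. This is a minimal illustration of the key idea (the range weight uses the joint vector distance across all bands, so an edge in any band suppresses averaging in every band), assuming a float image in [0, 1]; the parameter names and defaults are ours, not the paper's optimized values:

```python
import numpy as np

def vector_bilateral_filter(img, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Naive vector bilateral filter for an (H, W, bands) float image."""
    h, w, bands = img.shape
    pad = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="reflect")
    out = np.zeros_like(img)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    spatial = np.exp(-(ys**2 + xs**2) / (2 * sigma_s**2))   # fixed spatial kernel
    for i in range(h):
        for j in range(w):
            window = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1, :]
            diff = window - img[i, j]                        # vector difference per pixel
            # Joint range weight over ALL bands (this is what distinguishes
            # the vector filter from band-wise 2D bilateral filtering).
            range_w = np.exp(-np.sum(diff**2, axis=2) / (2 * sigma_r**2))
            wgt = spatial * range_w
            out[i, j] = np.tensordot(wgt, window, axes=([0, 1], [0, 1])) / wgt.sum()
    return out
```

The paper's contribution is choosing sigma_s and sigma_r by minimizing Stein's unbiased risk estimate instead of fixing them ad hoc, which the sketch above does not attempt.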
In this paper, we investigate a new inter-channel coding mode called LM mode proposed for the next
generation video coding standard called high efficiency video coding. This mode exploits inter-channel
correlation using reconstructed luma to predict chroma linearly with parameters derived from neighboring
reconstructed luma and chroma pixels at both encoder and decoder to avoid overhead signaling. In this paper,
we analyze the LM mode and prove that the LM parameters for predicting original chroma and reconstructed
chroma are statistically the same. We also analyze the error sensitivity of the LM parameters. We identify
some LM mode problematic situations and propose three novel LM-like modes called LMA, LML, and LMO
to address the situations. To limit the increase in complexity due to the LM-like modes, we propose some fast
algorithms with the help of some new cost functions. We further identify some potentially-problematic
conditions in the parameter estimation (including regression dilution problem) and introduce a novel model
correction technique to detect and correct those conditions. Simulation results suggest that considerable BD-
rate reduction can be achieved by the proposed LM-like modes and model correction technique. In addition,
the performance gain of the two techniques appears to be essentially additive when combined.
ETPL
DIP - 184
Chroma Intra Prediction Based on Inter-Channel Correlation for HEVC
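The least-squares derivation of the LM-mode parameters (chroma predicted linearly from reconstructed luma, with alpha and beta fitted on neighboring reconstructed pixels so no signaling is needed) can be sketched as below. This is a floating-point illustration only; the actual HEVC design uses integer arithmetic and specific neighbor subsampling:

```python
import numpy as np

def lm_parameters(neigh_luma, neigh_chroma):
    """Fit chroma = alpha * luma + beta over the causal neighbours."""
    L = np.asarray(neigh_luma, dtype=np.float64)
    C = np.asarray(neigh_chroma, dtype=np.float64)
    n = L.size
    denom = n * np.sum(L * L) - np.sum(L) ** 2
    if denom == 0:                       # flat luma neighbourhood: fall back to DC
        return 0.0, float(C.mean())
    alpha = (n * np.sum(L * C) - np.sum(L) * np.sum(C)) / denom
    beta = (np.sum(C) - alpha * np.sum(L)) / n
    return alpha, beta

def lm_predict(rec_luma_block, alpha, beta):
    """Predict a chroma block from the co-located reconstructed luma."""
    return alpha * np.asarray(rec_luma_block, dtype=np.float64) + beta
```

The degenerate `denom == 0` branch corresponds to one of the problematic situations the abstract mentions (regression dilution and ill-conditioned parameter estimation), which the paper's LMA/LML/LMO modes and model correction address more carefully.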
This paper proposes a quadratic classification approach on the subspace of Extended
Histogram of Gradients (ExHoG) for human detection. By investigating the limitations of Histogram
of Gradients (HG) and Histogram of Oriented Gradients (HOG), ExHoG is proposed as a new feature
for human detection. ExHoG alleviates the problem of discrimination between a dark object against a
bright background and vice versa inherent in HG. It also resolves an issue of HOG whereby gradients
of opposite directions in the same cell are mapped into the same histogram bin. We reduce the
dimensionality of ExHoG using Asymmetric Principal Component Analysis (APCA) for improved
quadratic classification. APCA also addresses the asymmetry issue in training sets of human
detection, where there are far fewer human samples than non-human samples. Our proposed
approach is tested on three established benchmarking data sets - INRIA, Caltech, and Daimler - using
a modified Minimum Mahalanobis distance classifier. Results indicate that the proposed approach
outperforms current state-of-the-art human detection methods.
ETPL
DIP - 185
Human Detection by Quadratic Classification on Subspace of Extended
Histogram of Gradients
In this paper, we address the problem of recovering a color image from a grayscale one. The input
color data comes from a source image considered as a reference image. Reconstructing the missing color of a
grayscale pixel is here viewed as the problem of automatically selecting the best color among a set of color
candidates while simultaneously ensuring the local spatial coherency of the reconstructed color information.
To solve this problem, we propose a variational approach where a specific energy is designed to model the
color selection and the spatial constraint problems simultaneously. The contributions of this paper are twofold.
First, we introduce a variational formulation modeling the color selection problem under spatial constraints
and propose a minimization scheme, which computes a local minimum of the defined nonconvex energy.
Second, we combine different patch-based features and distances in order to construct a consistent set of
possible color candidates. This set is used as input data and our energy minimization automatically selects the
best color to transfer for each pixel of the grayscale image. Finally, the experiments illustrate the potential
of our simple methodology and show that our results are very competitive with respect to the state-of-the-art
methods.
ETPL
DIP - 186
Variational Exemplar-Based Image Colorization
Depth-map merging based 3D modeling is an effective approach for reconstructing large-scale scenes
from multiple images. In addition to generating high-quality depth maps for each image, selecting suitable
neighboring images for each image is an important step in the reconstruction pipeline to which, unfortunately,
little attention has been paid in the literature until now. This paper tackles this issue for
large-scale scene reconstruction, where many unordered images are captured and used with substantial
scale and view-angle variations. We formulate the neighboring image selection as a combinatorial optimization
problem and use the quantum-inspired evolutionary algorithm to seek its optimal solution. Experimental
results on the ground truth data set show that our approach can significantly improve the quality of the depth-
maps as well as final 3D reconstruction results with high computational efficiency.
ETPL
DIP - 187
How to Select Good Neighboring Images in Depth-Map Merging Based 3D
Modeling
Joint source-channel coding has attracted substantial attention with the aim of further exploiting the
residual correlation residing in the encoded video signals for the sake of improving the reconstructed video
quality. In our previous paper, a first-order Markov process model was utilized as an error concealment tool
for exploiting the intra-frame correlation residing in the Wyner--Ziv (WZ) frame in the context of pixel-
domain distributed video coding. In this contribution, we exploit the inter-view correlation with the aid of an
inter-view motion search in distributed multi-view video coding (DMVC). Initially, we rely on the system
architecture of WZ coding invoked for multi-view video. Then, we construct a novel mesh-structured pixel-
correlation model from the inter-view motion vectors and derive its decoding rules for joint source-channel
decoding. Finally, we benchmark the attainable system performance against the existing pixel-domain WZ
coding based DMVC scheme, where the classic turbo codec is employed. Our simulation results show that
substantial bitrate reductions are achieved by employing the proposed motion-aware mesh-structured
correlation modelling technique in a DMVC scheme.
ETPL
DIP - 188
Motion-Aware Mesh-Structured Trellis for Correlation Modelling Aided
Distributed Multi-View Video Coding
During the acquisition process with the Compton gamma-camera, integrals of the intensity distribution
of the source on conical surfaces are measured. They represent the Compton projections of the intensity. The
inversion of the Compton transform relies on a particular Fourier-slice theorem. This paper proposes a
filtered backprojection algorithm for image reconstruction from planar Compton camera data. We show how
different projections are related to one another and how they may be combined in the tomographic reconstruction
step. Considering a simulated Compton imaging system, we conclude that the proposed method yields
accurate reconstructed images for simple sources. An elongation of the source in the direction orthogonal to
the camera may be observed and is to be related to the truncation of the projections induced by the finite extent
of the device. This phenomenon was previously observed with other reconstruction methods, e.g., iterative
maximum likelihood expectation maximization. The redundancy of the Compton transform is thus an
important feature for the reduction of noise in Compton images, since the ideal assumptions of infinite width
and observation time are never met in practice. We show that a selection applied to the data set allows us to
partially circumvent projection truncation, at the expense of enhanced noise in the images.
ETPL
DIP - 189
Filtered Backprojection Reconstruction and Redundancy in Compton Camera
Imaging
In this paper, we present a novel view synthesis method named Visto, which uses a reference input
view to generate synthesized views in nearby viewpoints. We formulate the problem as a joint optimization of
inter-view texture and depth map similarity, a framework that is significantly different from other traditional
approaches. As such, Visto tends to implicitly inherit the image characteristics from the reference view
without the explicit use of image priors or texture modeling. Visto assumes that each patch is available in both
the synthesized and reference views and thus can be applied to the common area between the two views but
not the out-of-region area at the border of the synthesized view. Visto uses a Gauss-Seidel-like iterative
approach to minimize the energy function. Simulation results suggest that Visto can generate seamless virtual
views and outperform other state-of-the-art methods.
ETPL
DIP - 190
Seamless View Synthesis Through Texture Optimization
In this paper, a novel technique to speed-up a non-local means (NLM) filter is proposed. In the
original NLM filter, most of its computational time is spent on finding distances for all the patches in the
search window. Here, we build a dictionary in which patches with similar photometric structures are clustered
together. The dictionary is built only once, using high-resolution images from different scenes. Since the
dictionary is well organized in terms of indexing its entries, it is used to search similar patches very quickly for
efficient NLM denoising. We achieve a substantial reduction in computational cost compared with the original
NLM method, especially when the NLM search window is large, without significantly affecting the PSNR.
Second, we show that by building a dictionary for edge patches as opposed to intensity patches, it is possible
to reduce the dictionary size; thus, further improving the computational speed and memory requirement. The
proposed method preclassifies similar patches using the same distance measure as the NLM method. The
proposed algorithm is shown to outperform other prefiltering based fast NLM algorithms computationally as
well as qualitatively.
ETPL
DIP - 191
Novel Speed-Up Strategies for Non-Local Means Denoising With Patch and
Edge Patch Based Dictionaries
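The dictionary idea above can be illustrated with a toy sketch: cluster the patches once, then compare a query patch only against members of its nearest cluster rather than every patch in the search window. This uses a plain k-means dictionary and flattened 1-D patch vectors; it is our illustration, not the paper's indexing scheme:

```python
import numpy as np

def nlm_with_dictionary(patches, query, h=0.2, n_clusters=8, seed=0):
    """Dictionary-accelerated NLM estimate for one query patch.
    patches: (N, P) array of flattened patches; query: (P,) vector."""
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), n_clusters, replace=False)]
    for _ in range(10):                                    # plain k-means refinement
        d = ((patches[:, None, :] - centroids[None]) ** 2).sum(-1)
        labels = d.argmin(1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = patches[labels == k].mean(0)
    # Look up the query's cluster and run NLM weighting only on its members,
    # with the same squared-distance measure the full NLM method uses.
    qk = ((query - centroids) ** 2).sum(-1).argmin()
    members = patches[labels == qk]
    w = np.exp(-((members - query) ** 2).sum(-1) / (h * h))
    return (w[:, None] * members).sum(0) / w.sum()
```

The paper's edge-patch variant would cluster gradient/edge patches instead of intensity patches, shrinking the dictionary further; the sketch keeps intensity patches for simplicity.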
The concept of tension is introduced in the framework of active contours with prior shape information,
and it is used to improve image segmentation. In particular, two properties of this new quantity are shown: 1)
high values of the tension correspond to undesired equilibrium points of the cost function under minimization
and 2) tension decreases if a curve is split into two or more parts. Based on these ideas, a tree is generated
whose nodes are different local minima of the cost function. Deeper nodes in the tree are expected to
correspond to lower values of the cost function. In this way, the search for the global optimum is reduced to
visiting and pruning a binary tree. The proposed method has been applied to the problem of fish segmentation
from low quality underwater images. Qualitative and quantitative comparison with existing algorithms based
on the Euler-Lagrange diffusion equations shows the superiority of the proposed approach in avoiding
undesired local minima.
ETPL
DIP - 192
Tension in Active Shapes
To evaluate multitarget video tracking results, one needs to quantify the accuracy of the estimated
target-size and the cardinality error as well as measure the frequency of occurrence of ID changes. In this
paper, we survey existing multitarget tracking performance scores and, after discussing their limitations, we
propose three parameter-independent measures for evaluating multitarget video tracking. The measures
consider target-size variations, combine accuracy and cardinality errors, quantify long-term tracking accuracy
at different accuracy levels, and evaluate ID changes relative to the duration of the track in which they occur.
We conduct an extensive experimental validation of the proposed measures by comparing them with existing
ones and by evaluating four state-of-the-art trackers on challenging real-world publicly-available data sets.
The software implementing the proposed measures is made available online to facilitate their use by the
research community.
ETPL
DIP - 193
Measures of Effective Video Tracking
Seed-based methods for region-based image segmentation are known to provide satisfactory results
for several applications and are usually easy to extend to multidimensional images. However, while boundary-
based methods like live wire can easily incorporate a preferred boundary orientation, region-based methods
are usually conceived for undirected graphs, and do not resolve well between boundaries with opposite
orientations. This motivated researchers to investigate extensions for some region-based frameworks, seeking
to better handle oriented transitions. In the same spirit, we discuss how to incorporate this orientation
information in a region-based approach called “IFT segmentation by seed competition” by exploring digraphs.
We give direct proof for the optimality of the proposed extensions in terms of energy functions associated with
the cuts. To stress these theoretical results, we also present an experimental evaluation that shows the obtained
gains in accuracy for some 2D and 3D data sets of medical images.
ETPL
DIP - 194
Oriented Image Foresting Transform Segmentation by Seed Competition
This paper presents a new framework for motion compensated frame rate up conversion (FRUC)
based on variational image fusion. The proposed algorithm consists of two steps: 1) generation of multiple
intermediate interpolated frames and 2) fusion of those intermediate frames. In the first step, we determine
four different sets of the motion vector field using four neighboring frames. We then generate intermediate
interpolated frames corresponding to the determined four sets of the motion vector field, respectively. Multiple
sets of the motion vector field are used to solve the occlusion problem in motion estimation. In the second
step, the four intermediate interpolated frames are fused into a single frame via a variational image fusion
process. For effective fusion, we determine fusion weights for each intermediate interpolated frame by
minimizing the energy, which consists of a weighted L1-norm based data energy and a gradient-driven
smoothness energy. Experimental results demonstrate that the proposed algorithm improves the performance
of FRUC compared with the existing algorithms.
ETPL
DIP - 195
Frame Rate Up Conversion Based on Variational Image Fusion
We introduce an adaptive continuous-domain modeling approach to texture and natural images. The
continuous-domain image is assumed to be a smooth function, and we embed it in a parameterized Sobolev
space. We point out a link between Sobolev spaces and stochastic auto-regressive models, and exploit it for
optimally choosing Sobolev parameters from available pixel values. To this aim, we use exact continuous-to-
discrete mapping of the auto-regressive model that is based on symmetric exponential splines. The mapping is
computationally efficient, and we exploit it for maximizing an approximated Gaussian likelihood function. We
account for non-Gaussian Lévy-type processes by deriving a more robust estimator that is based on the sample
auto-correlation sequence. Both estimators use multiple initialization values for overcoming the local minima
structure of the fitting criteria. Experimental image resizing results indicate that the auto-correlation criterion
can cope better with non-Gaussian processes and model mismatch. Our work demonstrates the importance of
the auto-correlation function in adaptive image interpolation and image modeling tasks, and we believe it is
instrumental in other image processing tasks as well.
ETPL
DIP - 196
Adaptive Image Resizing Based on Continuous-Domain Stochastic Modeling
A new method of image resolution up-conversion (image interpolation) based on maximum a
posteriori sequence estimation is proposed. Instead of making a hard decision about the value of each missing
pixel, we estimate the missing pixels in groups. At each missing pixel of the high resolution (HR) image, we
consider an ensemble of candidate interpolation methods (interpolation functions). The interpolation functions
are interpreted as states of a Markov model. In other words, the proposed method undergoes state transitions
from one missing pixel position to the next. Accordingly, the interpolation problem is translated to the
problem of estimating the optimal sequence of interpolation functions corresponding to the sequence of
missing HR pixel positions. We derive a parameter-free probabilistic model for this to-be-estimated sequence
of interpolation functions. Then, we solve the estimation problem using a trellis representation and the Viterbi
algorithm. Using directional interpolation functions and sequence estimation techniques, we classify the new
algorithm as an adaptive directional interpolation using soft-decision estimation techniques. Experimental
results show that the proposed algorithm yields images with higher or comparable peak signal-to-noise ratios
compared with some benchmark interpolation methods in the literature while being efficient in terms of
implementation and complexity considerations.
ETPL
DIP - 197
A MAP-Based Image Interpolation Method via Viterbi Decoding of Markov
Chains of Interpolation Functions
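The trellis/Viterbi step the abstract describes can be sketched generically: states are candidate interpolation functions, positions are missing HR pixels, and we seek the minimum-cost state sequence. Here `cost` and `trans` are illustrative stand-ins for the paper's data and transition terms:

```python
import numpy as np

def viterbi(cost, trans):
    """Minimum-cost state sequence through a trellis.
    cost[t, s]: cost of using interpolation function s at position t.
    trans[s0, s1]: cost of switching from function s0 to s1."""
    T, S = cost.shape
    acc = np.zeros((T, S))              # accumulated best cost per state
    back = np.zeros((T, S), dtype=int)  # backpointers for traceback
    acc[0] = cost[0]
    for t in range(1, T):
        step = acc[t - 1][:, None] + trans   # (S_prev, S_next) candidate costs
        back[t] = step.argmin(0)
        acc[t] = step.min(0) + cost[t]
    # Trace the optimal path backwards from the best final state.
    path = [int(acc[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```

In the paper's setting the result is a soft, sequence-level choice of directional interpolation function per missing pixel, rather than an independent hard decision at each pixel.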
An optical approach using confocal parabolic reflectors is used to transform 2D input data based on
spatial position to a 1D sequenced serial string. The optical input data are set up as a 2D array. Individual
channels are established between the input array and the final output detector, which reads the data as a
time-based serial stream. The transformation is achieved by changing the optical path length associated with each
pixel and its channel to the output detector. The 2D data can be images or individual sources but the light must
be parallel. This paper defines how to establish the channels and the calculations required to achieve the
desired transformation.
ETPL
DIP - 198
A Two Dimensional Optical Input to One Dimensional Serial Pulse
Transformation Using Confocal Reflectors
This paper presents a new lossless color image compression algorithm based on hierarchical
prediction and context-adaptive arithmetic coding. For the lossless compression of an RGB image, the image is first
decorrelated by a reversible color transform, and then the Y component is encoded by a conventional lossless
grayscale image compression method. For encoding the chrominance images, we develop a hierarchical
scheme that enables the use of upper, left, and lower pixels for the pixel prediction, whereas the conventional
raster scan prediction methods use upper and left pixels. An appropriate context model for the prediction error
is also defined, and arithmetic coding is applied to the error signal corresponding to each context. For
several sets of images, it is shown that the proposed method further reduces the bit rates compared with
JPEG2000 and JPEG-XR.
ETPL
DIP - 199
Hierarchical Prediction and Context Adaptive Coding for Lossless Color Image
Compression
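The abstract does not specify which reversible color transform is used; the JPEG2000 reversible color transform (RCT) is a standard integer-lossless choice and illustrates the decorrelation step that precedes the Y/chroma encoding:

```python
def rct_forward(r, g, b):
    """JPEG2000-style reversible color transform (integer, lossless)."""
    y = (r + 2 * g + b) >> 2          # floor((R + 2G + B) / 4)
    u = r - g                         # chroma difference
    v = b - g
    return y, u, v

def rct_inverse(y, u, v):
    """Exact inverse: G is recovered first, then R and B."""
    g = y - ((u + v) >> 2)            # uses floor((U + V)/4) = floor((R+2G+B)/4) - G
    r = u + g
    b = v + g
    return r, g, b
```

Losslessness follows because R + 2G + B = (U + V) + 4G, so the floored quarter cancels exactly in the inverse; Python's arithmetic right shift implements floor division for negative values of U + V as well.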