
Saliency Enabled Compression in JPEG Framework

Article in IET Image Processing · February 2018

DOI: 10.1049/iet-ipr.2017.0554


IET Research Journals
ISSN 1751-8644
www.ietdl.org

Kumar Rahul¹, Anil Kumar Tiwari¹
¹Department of Electrical Engineering, Indian Institute of Technology Jodhpur, Rajasthan-342011, India

E-mail: [email protected]

Abstract: Under low bit-rate requirements, JPEG baseline degrades the perceptual quality of high-frequency regions and introduces compression artifacts in the image. The aim of this paper is to propose a novel region-of-interest (ROI) dependent quantization method within the JPEG framework with small computational overhead. The proposed method judiciously quantizes DCT coefficients belonging to salient and non-salient regions of the image. In this work, multiple ROIs, i.e. salient regions, are optimally identified and ranked by using their variances. The number of classes is adaptively calculated using the goodness-of-segmentation (GOS). After the number of regions and their ranks are obtained, the image is divided into blocks of size 8×8. These blocks may belong to more than one region, and hence the blocks are ranked based on their membership to the various regions. 2D-DCT coefficients of each block are obtained and quantized adaptively based on the block rank. The overhead for the rank information of the blocks is minimized by applying delta encoding. The results of the proposed method are analyzed in terms of objective quality parameters and visual perception; the blocking artifacts in our method are found to be significantly lower than in JPEG. The efficiency of the proposed method is demonstrated by comparison with recently published similar methods, and the proposed method is found superior in terms of the quality of the reconstructed image.

1 Introduction

Usage of image data over the Internet has increased exponentially [1]. Compression is essential to manage this high data-rate of images without degrading the quality to an unacceptable level. The need to access high-definition images, with quality of paramount importance, has become the major issue in designing such algorithms to operate in real time. Image compression can broadly be categorized into lossy and lossless. Lossless image compression mainly focuses on the identification and removal of redundancy that can be recovered on the decoder side [2]. It is preferred for archival purposes and often for medical imaging. Lossy compression techniques usually remove the irrelevant information from the image space. These methods are especially suitable for natural images such as photographs, where minor (sometimes imperceptible) loss of fidelity is acceptable to achieve a substantial reduction in bit-rate. Lossy compression which yields imperceptible differences may be called visually lossless [3]. Lossy image compression methods can be further enumerated into two categories:

The methods that fall into the first category are called direct methods [1], which act directly on the image samples in the spatial domain. Block truncation coding (BTC) [4–6] and vector quantization [7, 8] based methods are widely used under this category.

The methods under the second category are called transform methods [1], where the image is transformed into the frequency domain. Principal component analysis (PCA) [9], the discrete cosine transform (DCT) [6, 10–12], and the discrete wavelet transform (DWT) [13–15] are the most popular transformations used for this purpose. Such transformations concentrate the energy of the image in a small number of coefficients, making it suitable for removing perceptual redundancies.

Among all transform-based image compression methods, DWT achieves the best energy compaction. A comparative study between DCT- and DWT-based image and video coding techniques [16] suggests that, although DWT-based methods yield slightly better reconstructed image quality, i.e. a higher peak signal-to-noise ratio (PSNR) by less than 0.7 dB compared to DCT at the same compression ratio, the DCT-based coder has significantly lower complexity than its DWT counterpart. For this reason, state-of-the-art image and video coding standards and multimedia devices prefer DCT over DWT [17].

Fig. 1: Rectangular approximation in multi-level saliency-based compression techniques (panels: input image, salient regions, rectangular approximation).

Due to its computationally efficient encoding and decoding structures, JPEG baseline [10] is a well-accepted standard for lossy image compression. It is widely used in digital cameras and other photographic image capturing devices for storing and transmitting images on the World Wide Web (WWW). During the encoding process, JPEG divides the image into blocks of size 8×8 and applies the 2D-DCT on each of them. To achieve compression, the DCT coefficients of the blocks are quantized by fixed quantization parameters, irrespective of the region-of-interest (ROI) of the block. In the case of high compression requirements, scaled-up quantization parameters are used with the DCT coefficients. We observed that the reconstructed blocks with higher variance (i.e., the second central moment of the pixel intensities within a block) are highly degraded compared to the blocks with lower variance [18, 19]. This happens due to the structure of the JPEG baseline quantization table, which is designed to quantize higher-frequency regions more heavily than lower-frequency ones. A non-homogeneous compression is observed, causing compression artifacts in the reconstructed image. ROI-independent quantization of DCT coefficients degrades the overall perceptual quality of the reconstructed image, particularly at high compression ratios (CRs), i.e. at low bit rates.

There are some regions in images where more information is anticipated, and these can be delineated as ROIs or salient regions. The amount of attention steered among the regions of an image is non-identical, as per the human visual and cognitive systems [20]. A saliency-guided compression method is ideally suited to preserve perceptually important regions. These methods can intelligently compress the salient regions lightly and the non-salient ones heavily, to ensure as small a perceptual loss as possible for the required CR.

There have been efforts towards saliency-based image and video compression techniques [15, 21–26]. Mostly, these approaches segment the image into two regions [15, 24–26]: salient and non-salient. On the segmented regions, different compression algorithms are applied to obtain a good combination of reconstructed image quality and CR. However, our cognitive system does not always classify images into salient and non-salient. The human visual system bestows multi-level attention on different regions. This leads to the requirement of segmenting images into saliency-driven multiple regions.

The work done on multi-level saliency-based compression techniques [15, 21, 26, 27] exhibits an improved trade-off between CR and perceptual quality compared with using only two-level saliency. Texture-based methods [21, 27] classify the image into edges, textures and flat regions with the aim of preserving the edge and important texture information of the image after compression. The JPEG 2000 standard [26] incorporates both two-level and multi-level models of ROI encoding using the maximum shift (MAXSHIFT) and the general scaling based method (GSBM), respectively. The major challenge in multi-level saliency-based compression techniques is the requirement of sending the overhead for the shape of the salient regions and the ranks used to grade their saliency. The overhead is proportional to the complexity of the shape and the number of ranks: complex-shaped regions require a larger number of model parameters, which increases the overhead. Also, the overhead for rank information increases with the number of salient regions, i.e. for R ranks, ⌈log₂ R⌉ bits per rank are needed. If the ROI mask is generated for an arbitrarily shaped ROI, the decoder needs to reproduce the ROI mask [26], making the decoder computationally complex and increasing the memory requirement on the decoder side. To reduce the overhead information and to keep the decoder simple, the ROI shape is approximated as a rectangular box [15, 26], as shown in Fig. 1. The coordinates of the opposite vertices of the rectangular boxes and their rank information are sent to the decoder. This approach saves the overhead information to a good extent, but the CR is compromised, as the actual ROI has been approximated by a rectangular bounding box.

Because of the computational efficiency of DCT-based methods [16], with only a small loss in performance compared to DWT-based methods, our focus has been on the former. We propose a novel multi-level saliency-based image compression algorithm that provides a statistically optimal trade-off between the overhead, perceptual quality, and CR. The proposed method chooses variance as a basis to classify and rank the image into an optimal number of classes. The aim is to enable the JPEG standard to judiciously retain high-frequency regions in the image, which provides perceptual homogeneity in compression, particularly in the case of high compression requirements.

The rest of the paper is organized as follows. Section 2 briefly reviews the quantization phase of JPEG compression. Section 3 explains the proposed multi-level saliency detection and image coding algorithm. Section 4 presents the experimental results and their analysis. Concluding remarks are given in Section 5.

Fig. 2: Encoding and decoding process of JPEG baseline. Encoding: raw image → color transform → down-sampling → forward DCT → quantization → encoding → compressed data. Decoding: decoding → de-quantization → inverse DCT → up-sampling → color transform → reconstructed image.

2 JPEG Compression

The encoding and decoding process of JPEG baseline is shown in Fig. 2. There are three basic steps in JPEG compression [10]: color sub-sampling, block-DCT coefficient quantization, and entropy coding. The bit-rate obtained after encoding with JPEG baseline is mainly controlled in the quantization phase. The DCT coefficients of all the blocks are quantized by the fixed quantization parameters of the quantization table (T, i.e. a matrix of quantization step sizes). According to the JPEG standard, the quantization table can be configured as per the bit-rate requirement [18]. A series of quantization tables has been developed and is widely used to meet requirements of higher compression ratio or improved reconstructed image quality. The JPEG-recommended series of quantization tables T_F is given in (1).

\[
T_F =
\begin{cases}
\left\lfloor T_{50} \times \dfrac{50}{F} + \dfrac{1}{2} \right\rfloor, & 1 \le F < 50 \\[6pt]
\left\lfloor T_{50} \times \left(2 - \dfrac{F}{50}\right) + \dfrac{1}{2} \right\rfloor, & 50 \le F \le 100
\end{cases}
\tag{1}
\]

The quality factor F ranges between 1 and 100. A small value of F corresponds to a large quantization step size and thus results in a high CR. T_50, the standard quantization table [10] used for the luminance component of an image, is given in (2). JPEG uses a separate quantization table for the chrominance components of color images, given in (3). For illustration purposes, a few other quantization tables at different F values are included in Appendix A.

\[
T_{50} =
\begin{bmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{bmatrix}
\tag{2}
\]

\[
T^{C}_{50} =
\begin{bmatrix}
17 & 18 & 24 & 47 & 99 & 99 & 99 & 99 \\
12 & 12 & 14 & 19 & 99 & 99 & 99 & 99 \\
14 & 13 & 16 & 19 & 99 & 99 & 99 & 99 \\
14 & 17 & 19 & 19 & 99 & 99 & 99 & 99 \\
19 & 19 & 19 & 19 & 99 & 99 & 99 & 99 \\
19 & 19 & 19 & 19 & 99 & 99 & 99 & 99 \\
19 & 19 & 19 & 19 & 99 & 99 & 99 & 99 \\
19 & 19 & 19 & 19 & 99 & 99 & 99 & 99
\end{bmatrix}
\tag{3}
\]
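To make the scaling rule in (1) concrete, the following Python/NumPy sketch (our own illustration, not the authors' code) derives T_F from T_50 for an arbitrary quality factor; the names scaled_table and the clamp to a minimum step size of 1 (which reproduces the all-ones T_100 of Appendix A) are our assumptions.

```python
import numpy as np

# Standard JPEG luminance quantization table T50 from (2).
T50 = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

def scaled_table(F, base=T50):
    """Scale the base table to quality factor F following (1)."""
    if not 1 <= F <= 100:
        raise ValueError("quality factor F must lie in [1, 100]")
    if F < 50:
        T = np.floor(base * (50.0 / F) + 0.5)
    else:
        T = np.floor(base * (2.0 - F / 50.0) + 0.5)
    return np.maximum(T, 1).astype(int)   # clamp: step sizes of at least 1

# Example: the first rows of T20 and T60 match the tables in Appendix A.
print(scaled_table(20)[0])   # 40 28 25 40 60 100 128 153
print(scaled_table(60)[0])   # 13 9 8 13 19 32 41 49
```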

To achieve low bit-rate requirements, the visual quality of the blocks with higher variance (or frequency) degrades more after reconstruction than that of the blocks with lower variance. To illustrate this, we compressed an image at different bit-rates using JPEG baseline and highlighted a high-variance region to analyze the resulting distortion, as shown in Fig. 9.


Fig. 3: Encoding process of the proposed method. The original image (M×N) is processed along two paths: Path 1 performs number-of-regions identification, multiple saliency identification, and saliency ranking; Path 2 decomposes the image into 8×8 blocks, ranks the blocks, applies the DCT and adaptive quantization, and encodes the result. The output is the compressed data plus the rank information, whose overhead is reduced by delta encoding.

Fig. 4: Block ranking process: (a) to (d) show 3×3 example blocks whose pixels carry different combinations of ranks 1 to 4; block (a), for instance, contains three pixels of rank 1 and six pixels of rank 4.

Fig. 9(a) is the original color image of dimension 512×512 at 24 bits-per-pixel (bpp), and Fig. 9(b) is the reconstructed image after applying JPEG compression with quantization table T_5. It can be observed that the distortion in the highlighted region increases at lower bit-rates, causing degradation in the perceptual quality of the overall image. A similar effect is also shown in Fig. 10.

3 Proposed Method

The proposed image encoding process is shown in Fig. 3. For ease of implementation, the various steps are briefly explained as follows:

The encoder processes a given image through two paths. The first path generates a multi-level saliency map for the input image. The optimal number of classes is adaptively calculated by using the efficiency of segmentation [28]. The image is then segmented into that number of salient classes by maximizing the between-class variance [29]. Every class is then given a rank based on its importance, using its weighted variance: a class with a high weighted variance is given a higher rank, i.e. more importance, and vice versa.

The second path is used for the adaptive quantization. The image is decomposed into blocks of size 8×8, and each block is ranked based on the saliency map obtained from the first path by applying a probability bound. The 2D-DCT coefficients of each block are quantized adaptively by quantization parameters modified according to the rank of the block: a higher-ranked block (i.e. a more salient one) is quantized more lightly, and vice versa. The quantized coefficients are then entropy coded, and the overhead for the ranks of the blocks is reduced by using a delta encoding method [30].

Unlike the method in [26], where the decoder is required to reproduce the ROI mask, making it complex, the decoder of our method is simple because the ROI information is sent to the decoder by the encoder. The reconstruction of the image is the inverse of the encoding steps. A detailed description of the key steps in the encoding process is given as follows:

3.1 Number of Regions Identification & Multiple Saliency Identification

Salient regions are identified by segmenting the image into K classes (the choice of K is discussed later) by maximizing the between-class variance. For segmenting the image, Otsu's segmentation method [28, 29, 31] is extended to K classes. Let these K classes be bounded by K+1 intensity levels (t_0, t_1, t_2, ..., t_K) with t_0 < t_1 < t_2 < ... < t_{K-1} < t_K. For an image with L intensity levels, t_i (0 ≤ i ≤ K) is an intensity value, with t_0 = 0 and t_K = L−1. Let the ith class (1 ≤ i ≤ K−1) consist of all the pixels with intensities in the range [t_{i−1}, t_i − 1], whereas the Kth class consists of pixels with intensity values in the range [t_{K−1}, t_K]. With these initial assumptions, the probability of occurrence of the ith class (ω_i) and the class mean (μ_i) are obtained as follows:

\[
\omega_i = \sum_{j=t_{i-1}}^{t_i} p_j, \qquad
\mu_i = \frac{1}{\omega_i} \sum_{j=t_{i-1}}^{t_i} j\,p_j, \qquad
\mu_T = \sum_{i=1}^{K} \omega_i \mu_i
\tag{4}
\]

Here p_j is the probability of the pixels with intensity value j, and μ_T is the mean intensity of the image. Thereafter, the between-class variance σ²_K can be obtained using (5), given below.

\[
\sigma_K^2 = \sum_{i=1}^{K} \omega_i (\mu_i - \mu_T)^2
\tag{5}
\]

σ²_K is a function of ω_i and μ_i, and these parameters, in turn, are functions of the chosen class boundaries t_1, t_2, ..., t_{K-1}. It is desired to obtain an optimal set of class boundaries that results in a maximum value of σ²_K. This can be obtained by iteratively evaluating (5) over the possible boundary values in (4). The maximum value of σ²_K is called the maximum between-class variance.

To identify the total number of classes K, it is proposed to first obtain the goodness-of-segmentation (GOS) η_K, given in (6). The between-class variance σ²_K, given in (5), and the weighted variance S_i, given in (8), are used to calculate the total variance σ²_T and η_K for an initial value of K = 2. K is incremented until the inequality given in (7) is satisfied for the required value of η_r, typically chosen in the range 0.8 to 0.99.

\[
\sigma_T^2 = \sigma_K^2 + \sum_{i=1}^{K} S_i, \qquad
\eta_K = \frac{\sigma_K^2}{\sigma_T^2}
\tag{6}
\]

As can be observed from (6), η_K will be less than 1.

\[
\eta_K \ge \eta_r, \qquad 0 \le \eta_r \le 1
\tag{7}
\]

Choosing the number of classes K based on GOS helps to avoid over-segmentation and under-segmentation. For a good segmentation, the required GOS η_r should be at least 0.80, as observed in the proposed work. The effect of the η_r parameter is discussed in Section 4.
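To make Section 3.1 concrete, the following Python/NumPy sketch (our own illustration, not the authors' code) performs an exhaustive multi-level Otsu search over a coarse threshold grid and grows K until the GOS criterion (7) is met. The function names (class_stats, segment_with_gos), the grid step, and the synthetic test image are assumptions made for illustration; the paper itself points to the optimization approaches in [29, 31] rather than brute-force search.

```python
import numpy as np
from itertools import combinations

def class_stats(p, bounds):
    """Per-class probability (omega_i), mean (mu_i) and weighted variance
    (S_i = omega_i * sigma_i^2) induced on histogram p by thresholds `bounds`."""
    edges = [0] + list(bounds) + [len(p)]       # class i covers [edges[i], edges[i+1])
    levels = np.arange(len(p))
    omega, mu, S = [], [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        w = p[lo:hi].sum()
        m = (levels[lo:hi] * p[lo:hi]).sum() / w if w > 0 else 0.0
        v = (((levels[lo:hi] - m) ** 2) * p[lo:hi]).sum() / w if w > 0 else 0.0
        omega.append(w); mu.append(m); S.append(w * v)
    return np.array(omega), np.array(mu), np.array(S)

def segment_with_gos(image, eta_r=0.95, k_max=5, step=8):
    """Grow K from 2 until the GOS of (6) satisfies eta_K >= eta_r, as in (7)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    mu_T = (np.arange(256) * p).sum()
    candidates = range(step, 256, step)         # coarse threshold grid, for brevity
    for K in range(2, k_max + 1):
        best, best_bounds = -1.0, None
        for bounds in combinations(candidates, K - 1):
            omega, mu, _ = class_stats(p, bounds)
            sigma2_K = (omega * (mu - mu_T) ** 2).sum()   # between-class variance (5)
            if sigma2_K > best:
                best, best_bounds = sigma2_K, bounds
        omega, mu, S = class_stats(p, best_bounds)
        eta_K = best / (best + S.sum())                   # GOS, eq. (6)
        if eta_K >= eta_r or K == k_max:
            return best_bounds, omega, mu, S, eta_K

# Example on a synthetic image with three intensity populations.
rng = np.random.default_rng(1)
img = np.clip(np.concatenate([rng.normal(40, 5, 2000),
                              rng.normal(120, 8, 2000),
                              rng.normal(210, 6, 2000)]), 0, 255).astype(np.uint8)
bounds, omega, mu, S, eta = segment_with_gos(img)
print(bounds, round(eta, 3))
```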

3.2 Saliency Ranking

In order to rank each class, consisting of randomly distributed pixels, the weighted variance of its pixels is obtained using (8).

\[
S_i = \omega_i \sigma_i^2, \qquad i = 1, 2, \ldots, K
\tag{8}
\]

where σ²_i is the variance of the ith class, as given in (9).

\[
\sigma_i^2 = \frac{1}{\omega_i} \sum_{j=t_{i-1}}^{t_i} (j - \mu_i)^2 p_j
\tag{9}
\]

S_i for 1 ≤ i ≤ K is sorted in descending order, and the pixels of the ith class get rank q, where q is the position of S_i in the sorted list. The highest weighted variance corresponds to the most salient class and gets the highest rank, and vice versa, i.e. pixels corresponding to max(S_i) get rank r = 1 and those corresponding to min(S_i) get rank r = K. The aim is to give more importance to a class with a considerable area and high variance. As expressed in (8), a class with a high variance but a very small area (small ω_i) may get less importance than a class with a relatively lower variance but a larger area.
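A minimal sketch of this ranking rule, assuming the per-class weighted variances S_i of (8) are already available (for example from the segmentation sketch above); the helper name rank_classes is ours.

```python
import numpy as np

def rank_classes(S):
    """Assign rank 1 to the class with the largest weighted variance S_i and
    rank K to the smallest, following Section 3.2."""
    S = np.asarray(S, dtype=float)
    order = np.argsort(-S)                   # class indices, largest S_i first
    ranks = np.empty(len(S), dtype=int)
    ranks[order] = np.arange(1, len(S) + 1)
    return ranks                             # ranks[i] is the rank of class i

# Example: a large, moderately varied class outranks a tiny, highly varied one.
print(rank_classes([0.80 * 120.0, 0.05 * 400.0, 0.15 * 250.0]))  # -> [1 3 2]
```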

3.3 Block Ranking

After classifying pixels based on the threshold values t_1, t_2, ..., t_{K-1} and ranking them according to the sorted series of S_i given in (8), we propose to use the probability mass function (PMF) of pixel ranks to rank every 8×8 block used in JPEG. Blocks containing pixels with more than one rank are typically those on region borders, or sometimes on edges. A block is assigned rank r whenever the empirically proposed probability bound (10) is satisfied, starting with r = 1.

\[
\sum_{i=1}^{r} p_i \ge \frac{1}{K - r + 1}, \qquad r = 1, 2, \ldots, K
\tag{10}
\]

where p_i is the probability of pixels of rank i in the block. To illustrate the use of the probability bound and the ranking of blocks, let us assume K = 4 and apply (10) to the four different blocks of size 3×3 shown in Fig. 4(a) to (d). Considering Fig. 4(a), for example, it is found that p_1 = 0.33, p_2 = p_3 = 0, and p_4 = 0.66. The probability bound (10) is then applied, starting with r = 1. The bound is satisfied for r = 1, so the block rank is 1. Similarly, the blocks in Fig. 4(b) to (d) get ranks 2, 3, and 4, respectively.
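The block ranking rule of (10) can be sketched as below (our own illustration; rank_block is a hypothetical helper name). The example reproduces the Fig. 4(a) case with K = 4.

```python
import numpy as np

def rank_block(pixel_ranks, K):
    """Rank a block from its per-pixel class ranks using the bound in (10):
    the block takes the smallest r with sum_{i<=r} p_i >= 1 / (K - r + 1)."""
    pixel_ranks = np.asarray(pixel_ranks)
    n = pixel_ranks.size
    # p[i-1] is the fraction of pixels in the block carrying rank i.
    p = np.array([(pixel_ranks == i).sum() / n for i in range(1, K + 1)])
    cum = np.cumsum(p)
    for r in range(1, K + 1):
        if cum[r - 1] >= 1.0 / (K - r + 1):
            return r
    return K   # the bound always holds at r = K (cum = 1); kept for safety

# Fig. 4(a)-style example with K = 4: three rank-1 pixels, six rank-4 pixels.
block = np.array([[1, 1, 1],
                  [4, 4, 4],
                  [4, 4, 4]])
print(rank_block(block, K=4))   # -> 1, since p1 = 1/3 >= 1/4
```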

3.4 Adaptive Quantization of DCT Coefficients

The DCT coefficients of every ranked block of size 8×8 are adaptively quantized as per the block's importance, estimated in terms of its rank value r. The quantization table T_50 used in JPEG baseline [10] is proposed to be scaled by a factor F_r for an rth-ranked block (1 ≤ r ≤ K), where F_r is controlled by the two variables V_ar and Q_am given in (11).

Table 1: Effect of varying the parameter η_r on the image data-set [32]

η_r    Number of regions   R1 area (%)   R2 area (%)
0.78   2.83                36.18         42.23
0.81   3                   35.38         30.14
0.84   3.33                34.37         25.37
0.87   3.83                36.79         33.98
0.90   4.33                38.17         30.71
0.93   5.17                29.66         28.46
0.96   7.33                38.49         23.82
0.99   11.5                28.88         15.52

Fig. 5: Test images: (a) Airplane, (b) Peppers, (c) Lena, (d) Girl, (e) Couple, (f) Zelda. (a)–(c) are of dimension 512×512; (d)–(f) are of dimension 256×256.

\[
F_r = V_{ar} + (r - 1)\,Q_{am}
\tag{11}
\]

F_r = V_ar for r = 1, i.e. for the most salient blocks, and the value of F_r increases by Q_am for each unit decrease in the saliency of the block (i.e. as r increases). Details of the choice of V_ar and Q_am are given in Section 4.
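Below is a sketch of the adaptive quantization step, under the assumption that F_r scales the entries of T_50 multiplicatively, which is one plausible reading of Section 3.4; the function names and the default V_ar, Q_am values (taken from the ranges explored in Table 2) are ours, and SciPy's orthonormal 2D DCT stands in for a full JPEG codec.

```python
import numpy as np
from scipy.fft import dctn, idctn   # 2D type-II DCT and its inverse

T50 = np.array([                    # luminance table from (2)
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=float)

def rank_scale(r, v_ar=2.0, q_am=2.5):
    """Scaling factor F_r = V_ar + (r - 1) * Q_am from (11)."""
    return v_ar + (r - 1) * q_am

def quantize_block(block, r, v_ar=2.0, q_am=2.5):
    """Quantize one 8x8 block with a table scaled by its rank r
    (assumes F_r multiplies T50 element-wise)."""
    table = np.maximum(np.round(rank_scale(r, v_ar, q_am) * T50), 1)
    coeffs = dctn(block - 128.0, norm='ortho')      # level shift + 2D DCT
    return np.round(coeffs / table).astype(int), table

def dequantize_block(q, table):
    """Inverse of quantize_block: rescale, inverse DCT, undo level shift."""
    return idctn(q * table, norm='ortho') + 128.0

# Example: a salient block (r = 1) retains far more detail than one with r = 4.
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(8, 8)).astype(float)
for r in (1, 4):
    q, table = quantize_block(block, r)
    err = np.abs(dequantize_block(q, table) - block).mean()
    print(f"rank {r}: mean absolute error {err:.1f}")
```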

3.5 Overhead Reduction

The number of bits for sending the rank (r) information associated with each block will be ⌈log₂(K)⌉, and its value in terms of bpp, denoted n_o, is given in (12).

\[
n_o = \frac{\lceil \log_2(K) \rceil}{8 \times 8}
\tag{12}
\]

The high correlation between the ranks of adjacent blocks, due to the Markovian property of images, helps to reduce this overhead by applying delta encoding [30] to the rank matrix. In delta encoding, only the difference between the current block's rank and the previous block's rank is encoded.
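A minimal sketch of the delta encoding applied to the block-rank matrix follows (raster scan assumed; the paper does not state the scan order, and the subsequent entropy coding of the residuals is omitted). The point of the transform is visible in the example: most residuals are zero because neighbouring blocks usually share a rank.

```python
import numpy as np

def delta_encode(ranks):
    """Raster-scan delta encoding of the block-rank matrix: keep the first
    rank, then store only the difference from the previously scanned block."""
    flat = np.asarray(ranks).ravel()
    return np.concatenate(([flat[0]], np.diff(flat)))

def delta_decode(deltas, shape):
    """Inverse of delta_encode."""
    return np.cumsum(deltas).reshape(shape)

ranks = np.array([[1, 1, 1, 2],
                  [1, 1, 2, 2],
                  [3, 3, 3, 3]])
d = delta_encode(ranks)
print(d)   # [ 1  0  0  1 -1  0  1  0  1  0  0  0]
assert np.array_equal(delta_decode(d, ranks.shape), ranks)
```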

4 Results and Analysis

In order to evaluate the robustness and efficiency of the proposed method, the USC-SIPI image data-set [32] is used. The data set contains 397 images in 5 different volumes: 155 texture images, 91 rotated texture images, 38 high-altitude aerial images, 44 miscellaneous outdoor images, and 69 frames taken from 4 video sequences. Images in each volume have dimensions between 256×256 and 2048×2048 pixels. All images are 8 bits/pixel for gray-scale images, or 24 bits/pixel for color images.


Table 2: Effect of varying the parameters V_ar and Q_am on the 24 bpp Lena image compressed to 0.5 bpp at η_r = 0.95. R_i denotes the regions in the image with rank i

V_ar   Q_am   PSNR R1   PSNR R2   PSNR overall   SSIM overall
1.4    4      31.28     26.43     26.12          0.9351
1.6    3.5    30.95     26.48     26.48          0.9450
1.8    3      30.59     26.63     26.40          0.9443
2      2.5    30.33     26.98     27.05          0.9509
2.3    2      29.87     27.18     27.39          0.9520
2.6    1.5    29.47     27.32     27.59          0.9555
2.8    1      29.25     27.54     28.14          0.9623
3      0.5    29.01     27.89     28.75          0.9666
JPEG baseline  29.00    28.41     29.42          0.9714
Region weight (%)  38   20        100            100

Fig. 6: Rate-distortion comparison (at η_r = 0.95) between the proposed method and JPEG baseline on the data-set [32], plotting PSNR (dB) against bit-rate (bpp) for the most salient regions (R1) and for the overall image. R1 denotes the most salient regions in the image.

Fig. 7: Multiple salient regions after applying the proposed saliency detection technique at η_r = 0.92 (panels: input image, after saliency ranking, after block ranking). Brighter regions indicate higher saliency and vice versa.

We used 4:4:4 chroma sub-sampling, as in [15], on the color images, so that quantization is applied to all luminance and chrominance components. To compare the quality of the reconstructed images, PSNR [33] and SSIM [34] are used.

The key parameters in the proposed method are η_r, V_ar and Q_am. The number of classes into which the image is segmented depends on η_r. Table 1 shows the average number of regions and the percentage of area with rank r = 1 and r = 2 as a function of η_r, for the image data-set in [32]. From Table 1, it is observed that about 3 and 12 classes are obtained for η_r = 0.8 and η_r = 0.99, respectively. The value of η_r can be chosen as per the target bit-rate: a lower value of η_r yields fewer salient regions and a smaller overhead per block, which is suitable for achieving a lower bit-rate. In order to control the overhead, an appropriate value of η_r between 0.8 and 0.99 can be selected for an image.

The region-wise effect of changing the parameters V_ar and Q_am, at 0.5 bpp, for the Lena test image is shown in Table 2. To achieve 0.5 bpp, the quantization table T_25 is used for JPEG baseline. Increasing Q_am yields improved quality in the most salient regions (i.e. r = 1), which cover 38% of the image, whereas decreasing Q_am makes the results approach those of JPEG baseline. Thus, by changing the parameters V_ar and Q_am, we obtain more flexibility in controlling quality and compression ratio than can be obtained with the JPEG baseline method, as shown in Table 2.

Fig. 8: Comparison of ROI reconstruction at the decoder side using rectangular approximation and the proposed method (panels: after saliency ranking, rectangular approximation, after block ranking).

The rate-distortion curves in Fig. 6 show that the PSNR of the proposed method is always higher than that of JPEG for the images' most salient regions (r = 1), which constitute an average of 31.1% of the total area (or number of pixels) of the test images. However, the overall quality shows an average degradation of 1.57% compared to JPEG baseline. Since there is a significant improvement in quality at the salient regions, i.e. the regions of perceptual importance, the overall visual quality of the images is improved. It can also be observed from Fig. 6 that, between 0.4 and 0.6 bpp, there is a fluctuation in the JPEG baseline curve for the most important regions, i.e. regions with large variance. The reason for this is non-homogeneous compression in JPEG, leading to compression artifacts. In contrast, the proposed method gives a stable curve for the most important regions as well as for the overall image.

Table 3 presents a performance comparison of the proposed method with the recently published DCT- and DWT-based algorithms in [6, 11, 12, 14] and JPEG baseline [10] on the test images shown in Fig. 5. The PSNR values (in dB) shown in Table 3 for the proposed method and JPEG baseline are for the most important regions (r = 1) of the image; for the remaining methods they are for the whole image. When any of the DWT or DCT methods in [6, 10–12, 14] is applied to the most important regions (r = 1), the mean-square error (MSE) of these regions is higher than the overall MSE. The reason is that, when these transform-based methods are applied to a region with high variance, the energy compaction is lower than for a region with lower variance [18], which results in a higher MSE after quantization to achieve a lower bit-rate. A similar example can be seen in the rate-distortion curves of Fig. 6, where the PSNR after applying JPEG to the most important regions of the image is always lower than that of the overall image. This suggests that the PSNR values provided in Table 3 for the methods in [6, 11, 12, 14], which are for the whole image, would be lower for the regions with r = 1. It is clear that the proposed method outperforms those reported in [6, 10–12, 14].

Fig. 7 shows the performance of the proposed multi-level saliency ranking and block ranking on Lena. For illustration purposes, the saliency of the regions is shown in gray-scale, with brighter regions implying higher saliency and vice versa. The face region of the Lena image gets high importance despite having comparatively low variance, because the face region has a large area compared to other regions. It is observed that the proposed approach of block ranking provides an effective means of labeling the region of interest (ROI).

Fig. 8 shows a comparison, in terms of the accuracy of the ROI reconstructed at the decoder side, between the proposed method of sending the ROI and the rectangular approximation used in state-of-the-art saliency-enabled methods [15, 26]. The reference images can be seen in Fig. 1. It is observed that the proposed approach of sending ROI information to the decoder by using block ranks retains the ROI structure better than the rectangular approximation of the ROI. The average overhead is found to be 0.00038 bpp when using the rectangular approximation, and 0.0091 bpp when using the proposed method.


Table 3: Performance comparison between the proposed method, JPEG baseline [10], CBTC-PF [6], CDABS [11], GA-DWT [14], and dLUT [12] algorithms

                      JPEG [10]      CBTC-PF [6]    CDABS [11]     GA-DWT [14]    dLUT [12]      Proposed
Image     R1 area %   PSNR   bpp     PSNR   bpp     PSNR   bpp     PSNR   bpp     PSNR   bpp     PSNR   bpp
Airplane  19.6        29.71  0.97    30.36  1.04    31.40  0.72    31.16  0.49    31.16  0.48    31.43  0.45
Peppers   45.2        30.16  1.47    30.15  1.5     30.33  0.88    31.20  0.83    31.19  0.88    31.49  0.85
Lena      51.1        32.57  1.03    31.93  1.17    32.77  1       32.76  0.66    32.65  0.74    33.37  0.72
Girl      21.58       34.98  0.62    35.13  0.6     36.96  0.69    35.90  0.41    35.86  0.38    36.26  0.37
Couple    44.4        31.49  0.94    32.44  1       33.07  1.13    32.87  0.89    32.62  0.79    32.51  0.81
Zelda     30.8        31.24  1       31.31  1.12    32.05  1.09    31.98  0.76    32.01  0.82    32.89  0.80
Average   35.45       31.69  1.01    31.89  1.07    32.76  0.92    32.65  0.67    32.58  0.68    32.99  0.67

Although the rate-distortion curves in Fig. 6 show that at 0.2 bpp the PSNR of the proposed method converges to that of JPEG for the images' most salient regions (r = 1), the overall perceptual quality of the proposed method is significantly better than JPEG. To illustrate this, Fig. 9 and Fig. 10 provide a visual comparison of the reconstructed images after applying JPEG baseline and the proposed method. In Fig. 9(a), the Lena image of dimension 512×512 is shown with a portion of its face area highlighted. Fig. 9(b) and (c) are the reconstructed images after applying JPEG baseline and the proposed method, respectively, to achieve a bit-rate of 0.2 bpp. Similarly, in Fig. 10(a), the Baboon image of dimension 512×512 is shown with a high-variance region highlighted. Fig. 10(b) and (c) are the reconstructed images after applying JPEG baseline and the proposed method, respectively, to achieve a bit-rate of 0.31 bpp. The overall perceptual quality of the reconstructed images obtained from the proposed method is significantly better than that of the reconstructed images from JPEG baseline. Also, the highlighted areas of the images obtained from the proposed method show much less distortion than the reconstructed images from JPEG baseline.

The overhead of sending the ROI information to the decoder depends on η_r. We set the parameter η_r = 0.95 and obtained an average overhead for the data set, without any post-processing, of 0.047 bpp. Although this overhead seems low, a further reduction is desired at high CR requirements (i.e. low bpp). For this, after applying delta encoding, the average overhead reduced to 0.0299 bpp.

5 Conclusions

To meet the requirement of a high compression ratio without any significant quality loss in the salient regions of the reconstructed image, we propose to classify a given image into multiple ranked ROIs and quantize the corresponding DCT coefficients judiciously. The goodness-of-segmentation (GOS) is used as a parameter to adaptively identify the number of classes in the image. The multiple ROIs are obtained by maximizing the between-class variance and are ranked by using the within-class variances. The coefficients of the quantization table used in JPEG are adaptively changed as a function of the rank of the ROI, and its application within JPEG's framework results in a significant improvement in compression performance.

We achieved, on average, 2.88% better quality in the most salient regions, which cover an average of 31.1% of the area in the 397 test images. Due to the improvement in the salient regions, the overall perceptual quality of the reconstructed image is better than JPEG. The experimental results obtained on different color images clearly show that our proposed method outperforms recently published similar methods in terms of the perceptual quality of the reconstructed images. By ranking 8×8 blocks, we were able to reconstruct the ROI at the decoder side more accurately than in recent state-of-the-art works where the ROI is approximated by a rectangular bounding box. The average overhead is also found to be reduced by 36.38% by using delta encoding as a post-processing of the rank information matrix.

Fig. 9: Reconstructed Lena image from JPEG baseline and the proposed compression technique. (a) Original image (24 bpp). (b) Reconstructed image compressed at 0.2 bpp using JPEG baseline (T_5), PSNR 23.29 dB. (c) Reconstructed image at 0.2 bpp using the proposed method, PSNR 28.37 dB.

Fig. 10: Reconstructed Baboon image from JPEG baseline and the proposed compression technique. (a) Original image (24 bpp). (b) Reconstructed image compressed at 0.31 bpp using JPEG baseline (T_5), PSNR 19.89 dB. (c) Reconstructed image at 0.31 bpp using the proposed method, PSNR 22.11 dB.

6 Appendix A: Standard JPEG Quantization Tables

JPEG quantization tables at quality factors F = 100, 60, 20, and 1 are presented as follows:


\[
T_{100} =
\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1
\end{bmatrix}
\]

\[
T_{60} =
\begin{bmatrix}
13 & 9 & 8 & 13 & 19 & 32 & 41 & 49 \\
10 & 10 & 11 & 15 & 21 & 46 & 48 & 44 \\
11 & 10 & 13 & 19 & 32 & 46 & 55 & 45 \\
11 & 14 & 18 & 23 & 41 & 70 & 64 & 50 \\
14 & 18 & 30 & 45 & 54 & 87 & 82 & 62 \\
19 & 28 & 44 & 51 & 65 & 83 & 90 & 74 \\
39 & 51 & 62 & 70 & 82 & 97 & 96 & 81 \\
58 & 74 & 76 & 78 & 90 & 80 & 82 & 79
\end{bmatrix}
\]

\[
T_{20} =
\begin{bmatrix}
40 & 28 & 25 & 40 & 60 & 100 & 128 & 153 \\
30 & 30 & 35 & 48 & 65 & 145 & 150 & 138 \\
35 & 33 & 40 & 60 & 100 & 143 & 173 & 140 \\
35 & 43 & 55 & 73 & 128 & 218 & 200 & 155 \\
45 & 55 & 93 & 140 & 170 & 273 & 258 & 193 \\
60 & 88 & 138 & 160 & 203 & 260 & 283 & 230 \\
123 & 160 & 195 & 218 & 258 & 303 & 300 & 253 \\
180 & 230 & 238 & 245 & 280 & 250 & 258 & 248
\end{bmatrix}
\]

\[
T_{1} =
\begin{bmatrix}
800 & 550 & 500 & 800 & 1200 & 2000 & 2550 & 3050 \\
600 & 600 & 700 & 950 & 1300 & 2900 & 3000 & 2750 \\
700 & 650 & 800 & 1200 & 2000 & 2850 & 3450 & 2800 \\
700 & 850 & 1100 & 1450 & 2550 & 4350 & 4000 & 3100 \\
900 & 1100 & 1850 & 2800 & 3400 & 5450 & 5150 & 3850 \\
1200 & 1750 & 2750 & 3200 & 4050 & 5200 & 5650 & 4600 \\
2450 & 3200 & 3900 & 4350 & 5150 & 6050 & 6000 & 5050 \\
3600 & 4600 & 4750 & 4900 & 5600 & 5000 & 5150 & 4950
\end{bmatrix}
\]

7 Acknowledgments

This work was supported by the Visvesvaraya Ph.D. scheme for Electronics and IT Research Fellowship (MeitY, India).

8 References

1 Sayood, K.: 'Introduction to data compression' (Newnes, 2012)
2 Weinberger, M.J., Seroussi, G., Sapiro, G.: 'The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS', IEEE Transactions on Image Processing, 2000, 9, (8), pp. 1309–1324
3 Wang, Z., Simon, S., Baroud, Y., Najmabadi, S.M.: 'Visually lossless image compression extension for JPEG based on just-noticeable distortion evaluation'. In: 2015 International Conference on Systems, Signals and Image Processing (IWSSIP), IEEE, 2015, pp. 237–240
4 Kurita, T., Otsu, N.: 'A method of block truncation coding for color image compression', IEEE Transactions on Communications, 1993, 41, (9), pp. 1270–1274
5 Yang, C.K., Lin, J.C., Tsai, W.H.: 'Color image compression by moment-preserving and block truncation coding techniques'. In: Proceedings of the IEEE International Conference on Image Processing (ICIP-94), vol. 3, IEEE, 1994, pp. 972–976
6 Dhara, B.C., Chanda, B.: 'Color image compression based on block truncation coding using pattern fitting principle', Pattern Recognition, 2007, 40, (9), pp. 2408–2417
7 Feng, Y., Nasrabadi, N.: 'Dynamic address-vector quantisation of RGB colour images', IEE Proceedings I (Communications, Speech and Vision), 1991, 138, (4), pp. 225–231
8 Lee, W.F., Chan, C.K.: 'Dynamic finite state VQ of colour images using stochastic learning', Signal Processing: Image Communication, 1994, 6, (1), pp. 1–11
9 Abadpour, A., Kasaei, S.: 'Color PCA eigenimages and their application to compression and watermarking', Image and Vision Computing, 2008, 26, (7), pp. 878–890
10 Wallace, G.K.: 'The JPEG still picture compression standard', IEEE Transactions on Consumer Electronics, 1992, 38, (1), pp. xviii–xxxiv
11 Douak, F., Benzid, R., Benoudjit, N.: 'Color image compression algorithm based on the DCT transform combined to an adaptive block scanning', AEU - International Journal of Electronics and Communications, 2011, 65, (1), pp. 16–26
12 Messaoudi, A., Srairi, K.: 'Colour image compression algorithm based on the DCT transform using difference lookup table', Electronics Letters, 2016, 52, (20), pp. 1685–1686
13 Skodras, A., Christopoulos, C., Ebrahimi, T.: 'The JPEG 2000 still image compression standard', IEEE Signal Processing Magazine, 2001, 18, (5), pp. 36–58
14 Boucetta, A., Melkemi, K.: 'DWT based-approach for color image compression using genetic algorithm', Image and Signal Processing, 2012, pp. 476–484
15 Barua, S., Mitra, K., Veeraraghavan, A.: 'Saliency guided wavelet compression for low-bitrate image and video coding'. In: 2015 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, 2015, pp. 1185–1189
16 Xiong, Z., Ramchandran, K., Orchard, M.T., Zhang, Y.Q.: 'A comparative study of DCT- and wavelet-based image coding', IEEE Transactions on Circuits and Systems for Video Technology, 1999, 9, (5), pp. 692–695
17 Sullivan, G.J., Ohm, J., Han, W.J., Wiegand, T.: 'Overview of the high efficiency video coding (HEVC) standard', IEEE Transactions on Circuits and Systems for Video Technology, 2012, 22, (12), pp. 1649–1668
18 Yang, J., Zhu, G., Shi, Y.Q.: 'Analyzing the effect of JPEG compression on local variance of image intensity', IEEE Transactions on Image Processing, 2016, 25, (6), pp. 2647–2656
19 Lam, E.Y., Goodman, J.W.: 'A mathematical analysis of the DCT coefficient distributions for images', IEEE Transactions on Image Processing, 2000, 9, (10), pp. 1661–1666
20 Borji, A., Itti, L.: 'State-of-the-art in visual attention modeling', IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35, (1), pp. 185–207
21 Xia, Q., Li, X., Zhuo, L., Lam, K.: 'Visual sensitivity-based low-bit-rate image compression algorithm', IET Image Processing, 2012, 6, (7), pp. 910–918
22 Yang, H., Long, M., Tai, H.M.: 'Region-of-interest image coding based on EBCOT', IEE Proceedings - Vision, Image and Signal Processing, 2005, 152, (5), pp. 590–596
23 Kaur, L., Chauhan, R., Saxena, S.: 'Adaptive compression of medical ultrasound images', IEE Proceedings - Vision, Image and Signal Processing, 2006, 153, (2), pp. 185–190
24 Hadizadeh, H., Bajic, I.V.: 'Saliency-aware video compression', IEEE Transactions on Image Processing, 2014, 23, (1), pp. 19–33
25 Guo, C., Zhang, L.: 'A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression', IEEE Transactions on Image Processing, 2010, 19, (1), pp. 185–198
26 Christopoulos, C., Askelof, J., Larsson, M.: 'Efficient methods for encoding regions of interest in the upcoming JPEG2000 still image coding standard', IEEE Signal Processing Letters, 2000, 7, (9), pp. 247–249
27 Bruckstein, A.M., Elad, M., Kimmel, R.: 'Down-scaling for better transform compression', IEEE Transactions on Image Processing, 2003, 12, (9), pp. 1132–1144
28 Otsu, N.: 'A threshold selection method from gray-level histograms', IEEE Transactions on Systems, Man, and Cybernetics, 1979, 9, (1), pp. 62–66
29 Huang, D.Y., Wang, C.H.: 'Optimal multi-level thresholding using a two-stage Otsu optimization approach', Pattern Recognition Letters, 2009, 30, (3), pp. 275–284
30 Schindler, H.: 'Delta modulation', IEEE Spectrum, 1970, 7, (10), pp. 69–78
31 Lagarias, J.C., Reeds, J.A., Wright, M.H., Wright, P.E.: 'Convergence properties of the Nelder–Mead simplex method in low dimensions', SIAM Journal on Optimization, 1998, 9, (1), pp. 112–147
32 Weber, A.G.: 'The USC-SIPI image database version 5', USC-SIPI Report, 1997, 315, pp. 1–24
33 Wang, Z., Bovik, A.C.: 'Image and multidimensional signal processing - a universal image quality index', IEEE Signal Processing Letters, 2002, 9, (3), pp. 81–84
34 Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: 'Image quality assessment: from error visibility to structural similarity', IEEE Transactions on Image Processing, 2004, 13, (4), pp. 600–612
